Well, it’s that time again. The facebook did something you don’t like, and now you’re angry. Well, yeah. But what did they really do? And why are you really angry? More than that, why are you really surprised, and will you really be angry tomorrow? Tomorrow, will you even remember this happened?
Here’s the breakdown of what seems to have happened.
At some point in 2012, someone at the facebook likely came up with an empirical question: ‘does the content of the user feed impact the user experience?’
Start discussing this idea for a while and you’ll get to the meat of the discussion: ‘by altering content of the user feed, can we impact the user experience?’
They decided to try, and instead of doing this kind of research on a reasonable sample of individuals, decided to just go nuts and try it on about 700,000 people.
Then, once they did this, and had this data, the story seems to be that they had no idea what to do with it, and got in touch with some guy at the cornell. He was like ‘cool, this is an easy publication’, ran the numbers on it, and got it published in an academic journal.
On the surface, most of the individual pieces of this are nothing surprising. The heart of it (the idea that a company might change the user experience differentially and then see which seems to work better) is standard practice.
This part, alone, isn’t surprising for the facebook, or for the Google, or the Apple, or the JCPenney. User experience testing is nothing new. Think about any website or store that you’ve visited over the course of a decade or so and you’ll see small changes over time. Only the most poorly run of these companies make these changes blindly and without planning and testing. There are levels of this, though, and there is also a thing called restraint.
Let’s think of a simple example. There is a lot you can learn from sitting people down in a room and asking them what they think about a change to something they know. Sometimes they like it, sometimes they don’t. Collecting human subjects data in this way is rarely controversial. It’s not very deep, and there’s not really that much to it. Things only go south when you really have no idea what you’re doing, or what you plan to do. Even then it’s usually just the case that you’ll get back poor quality data, not that you’ll, I don’t know, withhold treatment of syphilis from your participants.
If you’ve ever completed an IRB training course you might know what I’m talking about. If not, we’ll come back to it in a bit.
I hate to be the one to have to tell you, but websites do this sort of user experience testing all the time. Can you really think of any website (sure, other than the Google homepage) that has remained static for the last decade? Take a spin on the Wayback Machine and check out 2005 Amazon:
https://web.archive.org/web/20050714084608/http://www.amazon.com/exec/obidos/subst/home/home.html
Or 2001 Yahoo:
https://web.archive.org/web/20010815022655/http://www.yahoo.com/
Or 1997 Geocities:
https://web.archive.org/web/19970702235214/http://www15.geocities.com/
Or any number of sites that you will find to be almost unrecognizable from their current iterations. The changes to these websites over time aren’t random blundering in the darkness (okay, except maybe for Yahoo). They actually follow a fairly tried-and-true system of focus-testing-driven evolution.
The use of the term evolution here is no accident, as what these websites have the potential to do is differentially manipulate the underlying digital makeup of their webpage to see which fares best in the wild.
Have you ever been part of an early roll out? Probably, and you probably didn’t even think about it that much. Remember when you had to have an invite to sign up for gmail? You don’t? Ask your grandparents, youngin. It is often the case that you might even be mad to not be in the early roll out.
Google does this sort of small scale testing all the time. To be fair, a large part of this is ostensibly stress testing the system in ways that regular testers can’t, but the process also has great potential to just see how people react to changes. You would never know, at least in the moment, if you were in a Google early roll out that actually had two different experiences. If it was subtle you might never know.
People are already clamoring to be part of these roll outs, and they are already there to test the functionality of the product. So what if half the roll out had a minor tweak to the experience that didn’t alter the functionality? Think of Gmail’s Google Chat integration, years ago. It almost certainly had a soft roll out at some point. Nowadays, Google Chat has a firmly rooted place on the left side of the screen. In a roll out to a few thousand people, you could split the sample in half and give half the left-side chat and half the right-side chat. They’re already there to comment on the functionality, and if that variability of position matters it should be one of the things they mention.
You really should be more worried that giants like Google might not be doing this. It’s lost opportunity, and it really is some low-hanging fruit.
Think about it. Yahoo wants to know if making all the tables on their fantasy sports site look like semi-opaque vomit scattered with misdirecting links will make people visit those pages less. Pull down a subset of the user base (read: thousands or millions of people) and change the tables for half of them so they look horrible, and leave the other half of your sample alone. Record visit and click data over time.
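Mechanically, this kind of split is almost embarrassingly simple. Here’s a minimal sketch in Python of what it might look like; the salt, the click rates, and the five-point penalty for ugly tables are all invented for illustration, and none of this is Yahoo’s (or the facebook’s) actual code.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical sketch of a bare-bones A/B split. Every name and number in here
# is invented for illustration; this is not anyone's real testing pipeline.

def assign_condition(user_id, salt="ugly-tables-test"):
    """Deterministically bucket a user into 'control' or 'variant' by hashing their id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "variant" if int(digest, 16) % 2 else "control"

tallies = defaultdict(lambda: {"visits": 0, "clicks": 0})

def log_visit(user_id, clicked):
    """Record one page visit and whether the user clicked through."""
    cond = assign_condition(user_id)
    tallies[cond]["visits"] += 1
    tallies[cond]["clicks"] += int(clicked)

# Fake traffic: pretend the vomit tables cost five points of click-through.
rng = random.Random(0)
for uid in range(100_000):
    rate = 0.25 if assign_condition(uid) == "variant" else 0.30
    log_visit(uid, clicked=rng.random() < rate)

for cond, t in sorted(tallies.items()):
    print(cond, round(t["clicks"] / t["visits"], 3))
```

That’s the whole trick: deterministic assignment, a log table, and a comparison at the end.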
This is as far as the facebook got, this time. It also appears to be the point where Yahoo stops, coincidentally, before just rolling out the half-coded vomit tables. Thanks, Yahoo. Thanks.
There seems to be a line in the sand here that is occasionally washed away by the tide and then redrawn. This time, on this day, the facebook seems to be on the wrong side of it. It has incurred the wrath of the internet (are we keeping score? because this isn’t the first time).
I said in the title that this is something you really shouldn’t be surprised by. The facebook makes absolutely no secret of the fact that your data is their data, and your data is how the Zuck gets paid. You are not the customer, you are the product. This is not news. I am not some prophet coming down from the mountains telling you this. This is common knowledge. Advertisers are the customer, and your page impressions (visits) are what grease those wheels. It is not really that tricky of a business model to grasp. Let’s break it down:
Phase 1: Collect users
Phase 2: ?
Phase 3: Profit
Simple, right? Unfortunately, the above seems to be pretty much how people perceive the facebook as operating. Best not to think too hard about that Phase 2.
It boils down to the simple fact that [more clicks] = [more money].
Have you been to a “news” website in, say, the last five years? Have you noticed how the headlines are really 80-90% clickbait? We could go to any “news” site out there and pick a dozen headlines that sound like:
“4 signs the stock market is overheating”
“6 months in, how’s Colorado pot?”
“10 places we dare you to go”
“James and the worst headline ever”
“What a shot! 32 sports photos”
“The most powerful celebrity is…”
“THIS is to blame for car accidents”
“$32 for a hot dog?!”
“Watch dude’s crazy pants dance”
“The upside of Pippa’s backside”
“These celebs are sexy in their 50s”
Those headlines all happened to be from CNN, from the top half of the page (it gets worse down below). I was thinking of doing more sites, but really lost the will for it after that set.
Is that enough baiting? Have you started to wonder ‘why’ yet?
Well, people in the comments on CNN wonder about it all the time. Go to any of these stories and start reading comments (warning, strong fortitude recommended to ever read internet comments) and you’ll quickly get to handfuls upon handfuls of people posting some variant of:
‘wow cnn great clickbait this article sux’
Quickly followed by comments from those even more unfortunate souls of the internet, who don’t realize their own advice also applies to them, both at the original level and at the additional feeding-the-trolls level:
‘but here you are still reading it, and still posting about it’
One can only hope (for CNN’s sake, not for humanity’s sake) that CNN has actually done this kind of experiment and found that people are more likely to click on the sensationalized clickbait than the normal, well, journalism. They’ve found their new model, and it’s pretty simple, too:
Phase 1: Collect clicks
Phase 2: ?
Phase 3: Profit
Are you sad about that? Angry? As angry as at the facebook? What makes it different? Really start to ask yourself: ‘what makes it different?’
We have a fairly bad history of cherry-picking the evil du jour based on some pretty sketchy foundations, without considering how many other things we should actually consider evil.
Now, to get it out there, the facebook is decidedly evil. I’m going to tip my hand on that one. There is simply no way around it in my book. Frankly, though, that’s my opinion. You might love them. Sure, go for it. I long ago stopped proselytizing against the facebook, as the only really good place to do it is on the facebook. It just starts to feel a bit too much akin to a steadily growing ouroboros.
That said, my political views on the facebook are (happily) still listed as ‘anti-facebook’. Small victories. Take ’em where you can get ’em.
The astute reader might note that while I consider the facebook evil, I still use it. Well, yeah. Sometimes it is all about the devil you know.
So maybe you’ve had a chance to think about the question I asked above. What makes this thing, this time, different from all the others?
Oh, it’s because the facebook was actually trying to manipulate your emotions. Well, judging from the outcry, just talking about this study had a much larger effect size than actually running the study. In terms of bang for your buck, the real punchline would be if talking about the earlier study were itself the actual study. Soooo meta.
I’m sorry to say that the facebook does not seem to be so clever.
So the facebook moved around the contents of the crap your friends had to say and gave you a crappier or rosier version of the world outside your window. They did this for a week, for something like 700,000 people. Then they looked at the things you posted, and the quantity of the things you posted, because that’s how they operationalized your emotions.
Guess what. Half a million people might be right around the point where some of these statistical tests become overpowered. You know, give or take a half a million people. I guess no one at the facebook knows what a power analysis is? Or maybe the software to do one was just too expensive? Oh, that software is free? Well, maybe they didn’t have administrator rights to install software on their machines. That’s probably it. Always so hard to find the tech guy at such a big company.
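For the record, a power analysis for a two-sample z-test really is a handful of free lines. Here’s a back-of-the-envelope version; the effect size of d = 0.02 and the alpha of .05 plugged in below are my own illustrative assumptions, not numbers pulled from the paper.

```python
from math import sqrt
from scipy.stats import norm

# Back-of-the-envelope power for a two-tailed, two-sample z-test with equal group sizes.
# The effect sizes and sample sizes used below are illustrative assumptions.

def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Probability of a significant result given a true standardized difference d."""
    z_crit = norm.ppf(1 - alpha / 2)
    lam = d * sqrt(n_per_group / 2)  # noncentrality of the z statistic
    return norm.cdf(lam - z_crit) + norm.cdf(-lam - z_crit)

def n_for_power(d, power=0.80, alpha=0.05):
    """Per-group n needed to detect a standardized difference d at the given power."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_pow = norm.ppf(power)
    return 2 * ((z_crit + z_pow) / d) ** 2

print(power_two_sample_z(d=0.02, n_per_group=350_000))  # ~1.0: significance is a foregone conclusion
print(n_for_power(d=0.02, power=0.80))                  # ~39,000 per group would have been plenty
```

Run something like that before collecting data and you find out that a few tens of thousands of people would have been plenty to chase an effect this small.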
So, they found some /significant/ results. I put slashes around it because I honestly have no idea what kind of quotes would even be appropriate around the word significant here, for so many reasons.
Sure, there’s a difference between the groups, so they were able to manipulate something (noise?). Sure, that difference, albeit small, is statistically significant. Once you have a few thousand people you really need to start watching for significant but small (SBS) effects. The difference between these groups is non-zero. That is uninteresting. Frankly, the burden starts to fall on the “statisticians” at “the cornell” who even accepted a “sample” of this size. You don’t need a crystal ball to know that this exact result is almost guaranteed from a sample like this. Finding no significant effect here would have been the impressive result.
I will leave it to the statistically inclined (or reclined) reader to run the odds (they are calculable) on finding a result in the absence of a result with this sample size on this test (they ran two sample z-tests, it would appear).
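If you’d rather simulate than derive, here’s a toy version: two groups of 350,000 fake ‘users’ whose true difference is a hundredth of a standard deviation, run through a two-sample z-test. None of these numbers come from the actual study; they’re only there to show how a trivially small difference comes back ‘significant’ at this scale.

```python
import numpy as np
from scipy import stats

# Toy simulation, not the actual study data: two groups of 350,000 'users'
# whose true means differ by 0.01 standard deviations.
rng = np.random.default_rng(42)
n = 350_000
control = rng.normal(loc=0.00, scale=1.0, size=n)
treated = rng.normal(loc=0.01, scale=1.0, size=n)

# Two-sample z-test on the group means.
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
z = diff / se
p = 2 * stats.norm.sf(abs(z))

print(f"difference = {diff:.4f} SD, z = {z:.2f}, p = {p:.2g}")
# A ~0.01 SD difference -- invisible to any actual human -- comes back
# 'significant' almost every time you run this at n = 350,000 per group.
```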
So the facebook did this, and they revealed a few things. First off, they’re particularly bad experimentalists (if not good showmen). Also, they might have been able to change some people’s moods a little. They also might have caused people to post a little more or a little less than normal.
Don’t miss this among the noise, because this is the point that the facebook really cares about. Clicks are dollar dollar bills y’all, and the more things people are posting the more clicking they are doing and the more clicking they are causing their friends to do. If they had simply been happy with that result and taken it to the bank no one would be any the wiser and they’d be that much richer.
But for some reason, they decided they wanted to publish this. Beyond that, they wanted to call it an effect that people should care about. Turns out people do care about it, but maybe not for the reasons they expected. Hint: it isn’t because this is a large effect. How big is the effect?
Well, the long and the short of it is that this is not the Stroop Effect.
It can really be said that, other than collecting a huge amount of data, the facebook study has no results to speak of. Have you seen all those decimal placeholder zeros in their effect sizes? If I can express an effect size just as concisely in scientific notation as in decimal notation, I think we’re safely in the zone of not very big. There’s also a joke in there about placeholder zeros, maybe something like ‘I haven’t seen this many placeholder zeros since the line at the last midnight showing of [insert popular movie you dislike].’
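If you want an intuition for what a number like d = 0.001 even means, one standard translation is the common-language effect size: the probability that a randomly drawn person from one group scores higher than a randomly drawn person from the other, assuming roughly normal distributions. This is computed from the d value alone, nothing else from the paper.

```python
from math import sqrt
from scipy.stats import norm

# Common-language effect size: P(random 'treated' person > random 'control' person),
# under normality. Purely illustrative; computed from the d value alone.
def common_language(d):
    return norm.cdf(d / sqrt(2))

print(common_language(0.001))  # ~0.5003 -- a coin flip, out to the third decimal place
print(common_language(1.0))    # ~0.76   -- what a genuinely large effect looks like
```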
Let’s put this in a different framework. I mentioned the IRBs before, and if you don’t work at a place that has an IRB you might not know that it stands for Institutional Review Board (now you do). These are the folks that give the ethical green light to research conducted on human and/or animal subjects at, well, academic institutions. The facebook doesn’t have an IRB, because they’re a corporation, and they don’t have to worry about ethical research. *shrug* Tell me I’m wrong.
IRBs exist because it turns out that humans wanting to do research tend to kind of turn into jerks when left to their own devices without any regulation of their work. What kind of jerks? Bad ones. Much worse than those I’m about to talk about, if you can believe it and bring yourself to Google it.
Anyway, if you’ve run into a few studies that set the groundwork for modern IRBs you might be familiar with Zimbardo’s Stanford Prison Experiment.
We talk about this experiment for a lot of reasons. That said, it is genuinely hard to argue against the fact that the main reason we talk about it to this day is because it worked. The effect sizes were huge. The guards – normal people – accepted and internalized their roles so completely that the result became downright mortifying. Like, torture. The study was stopped early, after six days, when it finally became clear that things had been out of control for, give or take, six days.
What if the Zimbardo Prison Experiment had an effect size of d=0.001 (the size of the facebook effect)? What if the guards acted pleasant and, well, normal? What if everyone carried on as happy and friendly folks and after two weeks everyone just went their separate ways? Or after a week everyone decided to switch roles, and then everyone was still pleasant and cheerful? Would we care as much?
The answer should be yes, but I would imagine that many of you might say no.
This is one of the most worrisome parts of this the facebook study that no one is talking about. We seem to be concerned about the after the fact ethical ramifications of research based largely on how big of an effect was found. People are giving the facebook a pass here because they didn’t really find anything of substance, but what if this the facebook mood manipulation study had worked as well as something like the Zimbardo Prison Experiment? What is the end result of a normal effect size in this case?
At this point, I guess we don’t know who these 700,000 people were. I’ve seen no reports of them being debriefed (something IRBs make you do), so to my knowledge the people involved don’t even know they were in these groups.
What we do know is that we’re well within the law of large numbers. The odds scale (not necessarily in a linear fashion, mind you) with the number of people manipulated. If this effect had been large it is not outside of the zone of possibility (in fact it is decidedly well within the zone of possibility) that this mood manipulation might have made someone who was already sad just that much sadder, just enough to push them over the edge to something a bit more drastic, like suicide, or homicide, or both. Even as a small effect on this large a sample there is still that risk, just smaller. It is a bit morbid to think about, but it is our job as ethical researchers to think about these things before the fact.
I’m having trouble finding 2012 data on the number of suicides in the US, but the 2010 number is right around 38,000. What are the odds that this the facebook study drove at least one person to suicide in 2012? Well, the odds might be small, but they are at least non-zero.
Think about it. If their sample was randomly drawn from the population then we can run some really quick back of the envelope math to show that if 38,000 (# of suicides) of 314,000,000 (population of US) people are committing suicide in a given year then we’re looking at a little more than 1 suicide per 10,000 people per year.
Do you see where I’m going? With a sample size of 700,000 people, something like 85 of the people they selected into this study would be expected to commit suicide at some point over the course of the year, just by sheer numerical chance.
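Here’s that back-of-the-envelope arithmetic spelled out, using nothing but the base rates above. To be clear, this is what you’d expect in any 700,000 people pulled at random; it says nothing about anything the facebook did or didn’t cause.

```python
# Base-rate arithmetic only; this says nothing about causation.
us_suicides_2010 = 38_000
us_population = 314_000_000
sample_size = 700_000

rate = us_suicides_2010 / us_population          # ~1.2 per 10,000 per year
expected_in_sample = rate * sample_size          # ~85 people, by base rate alone
p_at_least_one = 1 - (1 - rate) ** sample_size   # effectively 1

print(f"rate per 10,000 per year: {rate * 10_000:.2f}")
print(f"expected in a 700,000-person sample: {expected_in_sample:.0f}")
print(f"P(at least one): {p_at_least_one:.6f}")
```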
You should really think about that. I’m not just making up these numbers.
A whole bunch of you are going to throw up your hands like a wacky waving inflatable arm flailing tube man and say ‘oh hai hypocrite for talking earlier about sensationalism but now being super sensationalist’, but hear me out.
Philip Zimbardo has his supporters and his detractors, and his general stance on the Stanford Prison Experiment (after the fact) is that he never expected it to go so far and he found himself wrapped up in it and continuing it against his better judgment. It’s the sort of thing you have to say, but it is also believable that he never expected it to go as far as it did. The effect size was drastically larger than what a normal person might expect. The lesson to learn here is that thinking things can’t go horribly wrong is no reason to proceed with a situation where things could go horribly wrong.
Like I said, this experiment is one of the things that led to IRBs having more and more control over this process. The job of an IRB is not to look at a study after the fact and say how much damage it did, but to look at a study before the fact and say how much damage it might do.
If you’ve filled out an IRB application you might know that you always have to specify risks to participants that might never happen, even if you tend to do very low-risk research. The reason you do this is because the whole point of an IRB is to consider the worst case scenarios. Let’s call them the places where d > 0.001.
So what are the potential risks of this the facebook study?
Take a few hundred thousand people, and you’re likely to find a few people in there who are particularly sad. You are going to find a few who are walking that ledge between rational and irrational decision making. By making those people sadder you are walking a fine line, and by walking that fine line 700,000 times you’re just increasing the odds that something bad will happen. That’s what we should be worried about if we are the IRB reviewing it before the fact. We are now after the fact, and it’s already done. If you want to do some crazy investigations, those people are the ones to hunt down and make sure they are okay. That is where the continuation of this story is, not in ‘the facebook made me angry but not sure why’ or ‘the facebook are a bunch of meanies and they won’t even apologize and omg the Google+’.
Take note, though, that even if you find a few people who were part of this study who committed suicide you are already expecting a few in both conditions by chance. You’d really need to piece together that full contingency table to say anything definitive.
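For the curious, ‘piecing together the full contingency table’ would look something like this. The counts below are entirely hypothetical, invented only to show the shape of the analysis; with events this rare, Fisher’s exact test is the natural choice over a chi-square.

```python
from scipy.stats import fisher_exact

# Entirely hypothetical counts, invented for illustration only.
# Rows: condition; columns: [suicides, no suicides].
negative_condition = [45, 349_955]
positive_condition = [40, 349_960]

odds_ratio, p_value = fisher_exact([negative_condition, positive_condition])
print(odds_ratio, p_value)
# With made-up counts like these, p is nowhere near significance -- which is
# exactly why a handful of tragic cases, on its own, proves nothing either way.
```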
Oh, or you know, take a reasonable sample from their sample of 700,000 and run your secondary stats on that.
The bottom line, though, is that the facebook is well within their rights in the current system to do whatever they want to their users with no regulation or oversight. Welcome to the 21st century. The only check on this free rein is if the things they do stop people from using their product, or if they break the law. They are a corporation, and they do not have an IRB, nor do they seem to have any of the ethical constraints that they would have at an institution, like, I don’t know, the cornell.
The fact that some guy at the cornell entered this process after data had been collected fits into an odd loophole of IRB lore. As long as the data is already in existence there is much less intense questioning of how ethically or unethically that data was obtained. Was this data collected unethically? Well, the lack of informed consent and/or debriefing are the major red flags pointing at yes.
It’s hard not to see the ways to con this system. It might sound fairly conspiratorial (it is), but all a corporation like the facebook would have to have is one guy with a passing knowledge of research (it actually kind of sounds like they got the D student in this case) and then an exceptionally mutually beneficial collusion with someone at a research institution who runs their numbers to give it some publication cred (and IRB cred, which is often required in the publication process).
There are a lot of people out there not trying to con this system, but maybe this is a point where we are all just due for a collective slap on the wrist and a firm ‘this is why you can’t have nice things’. Maybe data not collected under the supervision of an IRB should never be granted IRB approval for analysis? That might be too harsh, but look at where we are. Look at what we’ve become.
The facebook shouldn’t get a pass here just because their experiment sucked and they found a really weak effect. The facebook should be responsible for the whole range of things that could have happened, including effect sizes greater than d=0.001. Is it possible that the facebook caused the death of at least one of their 700,000 test subjects? I hate to sound like this guy, but yes, it is at least possible. If they didn’t, it is only another testament to how small an effect they found.
The facebook also shouldn’t get a pass on failing to meet the basic expectations of human subjects research like informed consent and debriefing just because they’re a corporation and don’t have an IRB. Unfortunately, this is something that they’ve already gotten a pass on from our past-selves. This is just regular corporate research, and the reasons that the facebook got called out this time are 1) they are huge, 2) the study sample size was huge, 3) they got greedy.
If you’re mad about this the facebook thing, this is what you should be mad about. You should be mad at all of us, yourself included, for not worrying about this giant loophole until someone stepped right through it guns blazing, and also the bullets coming out of the guns are bad research.
At the same time, though, you shouldn’t be surprised. You shouldn’t hold the facebook to some personal standards that they have no hope of ever adhering to, and you shouldn’t try to put them on some pedestal like this is something that you’re shocked they did. The facebook is not kid George Washington bravely stating that they will never lie. This is not outside of the facebook’s comfort zone, and they are likely to do very similar if not identical things in the future. Other companies are probably doing pretty similar stuff at this very moment. If you don’t like it, stop using their service. Yeah, go ahead and try.
At the same time, you should again be more mad at the system that allows the facebook and the cornells to do this with absolutely no limitations or repercussions. It appears, at least at this point with the information that we currently have, that everyone was working within the bounds of the system we have put in place. They might have colluded or taken advantage of weird loopholes, but unless we find out something weird it does appear that they were technically correct in their actions.
I said ‘technically correct’, so I guess that just leaves us this to wrap things up: