As of five years ago, in 2011, the replication crisis was barely a cloud on the horizon. Here’s what I see as the timeline of important events:

1960s-1970s: Paul Meehl argues that the standard paradigm of experimental psychology doesn’t work, that “a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of ‘an integrated research program,’ without ever once refuting or corroborating so much as a single strand of the network.” Psychologists all knew who Paul Meehl was, but they pretty much ignored his warnings.
1971: Tversky and Kahneman write “Belief in the law of small numbers,” one of their first studies of persistent biases in human cognition. This early work focuses on researchers’ misunderstanding of uncertainty and variation (particularly but not limited to p-values and statistical significance), but they and their colleagues soon move into more general lines of inquiry and don’t fully recognize the implications of their work for research practice.

1980s-1990s: Null hypothesis significance testing becomes increasingly controversial within the world of psychology. Unfortunately this was framed more as a methods question than a research question, and I think the idea was that research protocols were just fine, and all that was needed was a tweaking of the analysis.

2011: Daryl Bem publishes his article, “Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect,” in a top journal in psychology. “Four reviewers made comments on the manuscript,” the journal’s editor said, “and these are very trusted people.” In retrospect, Bem’s paper had huge, obvious multiple comparisons problems (the editor and his four reviewers just didn’t know what to look for), but back in 2011 we weren’t so good at noticing this sort of thing. I’ve written elsewhere on my problems with this attitude: in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work.

2011: Joseph Simmons, Leif Nelson, and Uri Simonsohn publish a paper, “False-positive psychology,” in Psychological Science, introducing the useful term “researcher degrees of freedom.” Later they come up with the term p-hacking, and Eric Loken and I speak of the garden of forking paths to describe the processes by which researcher degrees of freedom are employed to attain statistical significance. (Correction: Uri emailed to inform me that their paper actually had nothing to do with the subfield of positive psychology and that they intended no such pun.)

That same year, Simonsohn also publishes a paper shooting down the dentist-named-Dennis paper, not a major moment in the history of psychology but important to me because that was a paper whose conclusions I’d uncritically accepted when it had come out. I too had been unaware of the fundamental weakness of so much empirical research.

At this point, certain earlier work was seen to fit into this larger pattern: certain methodological flaws in standard statistical practice were not merely isolated mistakes, or even patterns of mistakes, but could be doing serious damage to the scientific process.
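The researcher-degrees-of-freedom idea can be made concrete with a small simulation. Below is a minimal sketch, not anything from the papers discussed: both groups are drawn from the same distribution (no true effect), and a hypothetical researcher tests the full sample plus two arbitrary subgroups, reporting a finding if any comparison crosses the usual threshold. The subgroup split and sample sizes are my own illustrative assumptions.

```python
# Sketch of "researcher degrees of freedom" under the null: even with no
# true effect, trying several analyses and reporting any |t| > 1.96
# inflates the false-positive rate above the nominal 5%.
import random
import statistics

random.seed(1)

def significant(xs, ys, crit=1.96):
    """Welch t-statistic with a normal-approximation cutoff (illustrative)."""
    nx, ny = len(xs), len(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    t = (statistics.mean(xs) - statistics.mean(ys)) / ((vx / nx + vy / ny) ** 0.5)
    return abs(t) > crit

def one_study(n_per_group=50):
    """Null data: 'treatment' and 'control' come from the same distribution."""
    treat = [random.gauss(0, 1) for _ in range(n_per_group)]
    ctrl = [random.gauss(0, 1) for _ in range(n_per_group)]
    honest = significant(treat, ctrl)  # the one preregistered comparison
    # Forking paths: also test two arbitrary subgroups (standing in for
    # splits like men/women), counting a "finding" if any test succeeds.
    half = n_per_group // 2
    hacked = (honest
              or significant(treat[:half], ctrl[:half])
              or significant(treat[half:], ctrl[half:]))
    return honest, hacked

sims = 2000
results = [one_study() for _ in range(sims)]
fp_single = sum(h for h, _ in results) / sims
fp_hacked = sum(k for _, k in results) / sims
print(f"one preregistered test: {fp_single:.3f}")   # near the nominal 0.05
print(f"best of three analyses: {fp_hacked:.3f}")   # noticeably above 0.05
```

Three correlated tests are a very mild version of the garden of forking paths; with more outcomes, covariates, and stopping rules, the effective false-positive rate climbs much higher, which is the Simmons, Nelson, and Simonsohn point.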