Inferential Statistics and Replications

When you drop a glass it’ll crash to the floor. Wherever you are on this planet, and whatever glass it is you are disposing of, gravity will ensure its swift demise. The replicability of phenomena is one of the hallmarks of science: once we understand a natural “law”, we expect it to yield the same outcome in any situation in which it is applicable. (This outcome may have error bars associated with it, but that doesn’t affect our basic conclusion.)

Nobel-winning cognitive scientist Dan Kahneman recently voiced his concern about the apparent lack of replicability of some results in an area of social psychology that concerns itself with “social priming”, the modification of people’s behavior without their awareness. For example, it has been reported that people walk out of the lab more slowly after being primed with words that relate to the concept “old age” (Bargh et al., 1996). Alas, notes Kahneman, those effects have at least sometimes failed to be reproduced by other researchers. Kahneman’s concern is therefore understandable.

How can experiments fail to replicate? There are several possible reasons, but here we focus on the role of inferential statistics in scientific research generally. It isn’t just social psychology that relies on statistics; many other disciplines do too. In a nutshell, statistics enables us to decide whether an observed effect is likely to have occurred simply by chance. Researchers routinely test whether an observed effect is “significant”. A “significant” effect is one that is so large that it is unlikely to arise from chance alone. An effect is declared “significant” if the probability of observing an effect this large or larger by chance alone is smaller than a pre-defined “significance level”, usually set to 5% (.05).

So, statistics can help us decide whether people walked down the hallway more slowly by chance or because they were primed by “old” words. However, our conclusion that the effect is “real” and not due to chance is inevitably accompanied by some uncertainty.
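
To make that logic concrete, here is a minimal sketch of such a significance test in Python, using made-up walking times and SciPy’s standard two-sample t-test; the numbers, group sizes, and choice of test are illustrative assumptions, not the analysis used in the original priming study.

```python
# Hypothetical illustration of a significance test (invented data, not the
# original study's): compare walking times, in seconds, of a "primed" group
# against a control group with a two-sample t-test.
from scipy import stats

control = [7.2, 6.9, 7.4, 7.1, 6.8, 7.3, 7.0, 7.2]  # made-up walking times
primed = [7.6, 7.8, 7.3, 7.9, 7.5, 7.7, 7.4, 8.0]   # made-up walking times

result = stats.ttest_ind(primed, control)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# Decision rule: declare the slowdown "significant" if p < .05, i.e. if a
# difference this large would arise by chance alone less than 5% of the time.
print("significant at the .05 level" if result.pvalue < 0.05 else "not significant")
```

The test’s only job is to quantify how surprising a difference of this size would be if chance alone were at work.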

Here is the rub: if the significance level is .05 (5%), then even when an effect is entirely due to chance, there is still a 1 in 20 chance that we erroneously conclude it is real. Put another way, out of 20 experiments in which no true effect exists, on average 1 will nonetheless report a “significant” effect. This possibility can never be ruled out (although the probability can be minimized by various means).
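
The “1 in 20” figure can be checked with a short simulation, sketched below under the assumption that each simulated experiment compares two groups drawn from the very same distribution, so that any “significant” result is by definition a false alarm.

```python
# Simulate many experiments in which there is NO real effect, test each at the
# .05 level, and count how often a "significant" effect is (wrongly) declared.
# The group sizes and number of experiments are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    control = rng.normal(size=30)  # no true effect: both groups come
    primed = rng.normal(size=30)   # from the same distribution
    if stats.ttest_ind(primed, control).pvalue < 0.05:
        false_positives += 1

print(false_positives / n_experiments)  # close to 0.05, i.e. roughly 1 in 20
```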

There is one more catch: as an experimenter, when reporting a single experiment, one can never be 100% sure whether one’s effect is real or due to chance. One can be very confident that the effect is real if such an effect is extremely unlikely to arise by chance alone, but the possibility that one’s experiment will fail to replicate can never be ruled out with absolute certainty.

So does this mean that we can never be sure of the resilience of an effect in psychological research?

No.

Quite the contrary: we know much about how people function and how they think.

This is readily illustrated with Dan Kahneman’s own work, as he has produced several benchmark results in cognitive science. Consider the following brief passage about a hypothetical person called Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. In university, she was involved in several social issues, including the environment, the peace campaign, and the anti-nuclear campaign.

Do you think Linda is more likely to be a bank teller or a bank teller and an active feminist?

Every time this experiment is done—and we have performed it literally dozens of times in our classes—most people think Linda is more likely to be a feminist bank teller than a mere bank teller. After all, she was engaged in environmental issues, wasn’t she?

However, this conclusion is false, and people’s propensity to endorse it is known as the “conjunction fallacy” (Tversky & Kahneman, 1983). It’s a fallacy because an event defined by multiple conditions can never be more likely than an event requiring only one of the constituent conditions: Because there are bound to be some bank tellers who are not feminists, Linda is necessarily more likely to be a bank teller than a bank teller and an active feminist.
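
The arithmetic behind the fallacy can be spelled out with a toy example; the population counts below are entirely invented, but the conclusion holds for any counts one might choose.

```python
# Entirely made-up population of 100 people, just to illustrate the arithmetic.
n_feminist_tellers = 10
n_other_tellers = 5
n_everyone_else = 85
n_total = n_feminist_tellers + n_other_tellers + n_everyone_else

p_teller = (n_feminist_tellers + n_other_tellers) / n_total  # 0.15
p_teller_and_feminist = n_feminist_tellers / n_total         # 0.10

# The conjunction can never exceed the single condition, whatever numbers we
# pick, because every feminist bank teller is also counted as a bank teller.
assert p_teller_and_feminist <= p_teller
print(p_teller, p_teller_and_feminist)
```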

Replicable effects such as the conjunction fallacy are obviously not confined to cognitive science. In climate science, for example, the iconic “hockey stick”, which shows that the current increase in global temperatures is unprecedented during the past several centuries if not millennia, has been replicated numerous times since Mann et al. published their seminal paper in 1998 (Briffa et al., 2001; Briffa et al., 2004; Cook et al., 2004; D’Arrigo et al., 2006; Esper et al., 2002; Hegerl et al., 2006; Huang et al., 2000; Juckes et al., 2007; Kaufman et al., 2009; Ljungqvist, 2010; Moberg et al., 2005; Oerlemans, 2005; Pollack & Smerdon, 2004; Rutherford et al., 2005; Smith et al., 2006).

Crucially, those replications relied on a variety of proxy measures to reconstruct past climates—from tree rings to boreholes to sediments and so on. The fact that all reconstructions arrive at the same conclusion therefore increases our confidence in the robustness of the hockey stick. The sum total of replications has provided future generations with a very strong scientific (and moral) signal by which to evaluate our collective response to climate change at the beginning of the 21st century.

Let us now illustrate the process of replication in the context of one of my recent papers, written with colleagues Klaus Oberauer and Gilles Gignac, which showed (among other things) that conspiracist ideation predicted rejection of a range of scientific propositions, from the link between smoking and lung cancer to the fact that the globe is warming due to human greenhouse gas emissions. This effect was highly significant, but the possibility that it represented a statistical fluke, though seemingly unlikely, cannot be ruled out.

To buttress one’s confidence in the result, a replication of the study would thus be helpful.

But that doesn’t mean it should be the same exact study done over again. On the contrary, this next study should differ slightly, so that the replication of the effect would underscore its breadth and resilience, and would buttress its theoretical impact.

For example, one might want to conduct the study using a large representative sample of the U.S. population, the kind of sample that professional survey and market research companies specialize in.

One might refine the set of items based on the results of the first study. One might provide a “neutral” option for the items this time round: the literature recognizes both strengths and weaknesses of including a neutral response option, so running the survey both ways and getting the same result would be particularly helpful.

One might also expand the set of potential worldview predictors, and one might query other controversial scientific propositions, such as the safety of GM foods and vaccinations—both said to be rejected by the political Left even though data on that claim are sparse.

Yes, such a replication would be quite helpful.

References

Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230-244.

Briffa, K. R., et al. (2001). Low-frequency temperature variations from a northern tree ring density network. J. Geophys. Res., 106(D3), 2929–2941.

Briffa, K. R., Osborn, T. J., & Schweingruber, F. H. (2004). Large-scale temperature inferences from tree rings: A review. Global and Planetary Change, 40(1–2), 11–26.

Cook, E. R., Esper, J., & D’Arrigo, R. D. (2004). Extra-tropical Northern Hemisphere land temperature variability over the past 1000 years. Quat. Sci. Rev., 23(20–22), 2063–2074.

D’Arrigo, R., Wilson, R., & Jacoby, G. (2006). On the long-term context for late twentieth century warming. J. Geophys. Res., 111(D3), doi:10.1029/2005JD006352.

Esper, J., Cook, E. R., & Schweingruber, F. H. (2002). Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science, 295(5563), 2250–2253.

Hegerl, G. C., Crowley, T. J., Hyde, W. T., & Frame, D. J. (2006). Climate sensitivity constrained by temperature reconstructions over the past seven centuries. Nature, 440, 1029–1032.

Huang, S., Pollack, H. N., & Shen, P.-Y. (2000). Temperature trends over the past five centuries reconstructed from borehole temperatures. Nature, 403, 756–758.

Juckes, M. N., et al. (2007). Millennial temperature reconstruction intercomparison and evaluation. Climate of the Past, 3, 591–609.

Kaufman, D. S., et al. (2009). Recent warming reverses long-term Arctic cooling. Science, 325, 1236.

Ljungqvist, F. C. (2010). A new reconstruction of temperature variability in the extra-tropical Northern Hemisphere during the last two millennia. Geografiska Annaler, 92A, 339–351.

Mann, M. E., Bradley, R. S., & Hughes, M. K. (1998). Global-scale temperature patterns and climate forcing over the past six centuries. Nature, 392, 779–787.

Moberg, A., et al. (2005). Highly variable Northern Hemisphere temperatures reconstructed from low- and high-resolution proxy data. Nature, 433(7026), 613–617.

Oerlemans, J. (2005). Extracting a climate signal from 169 glacier records. Science, 308(5722), 675–677.

Pollack, H. N., & Smerdon, J. E. (2004). Borehole climate reconstructions: Spatial structure and hemispheric averages. J. Geophys. Res., 109(D11), D11106, doi:10.1029/2003JD004163.

Smith, C. L., Baker, A., Fairchild, I. J., Frisia, S., & Borsato, A. (2006). Reconstructing hemispheric-scale climates from multiple stalagmite records. International Journal of Climatology, 26, 1417–1424.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

Wahl, E. R., & Ammann, C. R. (2007). Robustness of the Mann, Bradley, Hughes reconstruction of Northern Hemisphere surface temperatures: Examination of criticisms based on the nature and processing of proxy climate evidence. Climatic Change, 85, 33–69.