Drilling into noise

The science of statistics is all about differentiating signal from noise. This exercise is far from trivial: Although there is enough computing power in today’s laptops to churn out very sophisticated analyses, it is easily overlooked that data analysis is also a cognitive activity.

Numerical skills alone are often insufficient to understand a data set—indeed, number-crunching ability that’s unaccompanied by informed judgment can often do more harm than good.

This fact frequently becomes apparent in the climate arena, where the ability to use pivot tables in Excel or to do a simple linear regression is often over-interpreted as deep statistical competence.

The graph below illustrates this problem with the global temperature data: although there is no question that the trend is increasing, it is always possible to cherry-pick periods for analysis during which there is no significant increase in temperature. Of course, those “analyses” are a meaningless distraction from what is actually happening on our planet (the only one we’ve got, by the way).
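For readers who want to try this themselves, here is a minimal sketch in Python. It does not use the real temperature record: the series is synthetic, and the trend size, noise level, and 8-year window are all assumptions chosen purely for illustration.

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def simulate_series(n_years=50, trend=0.017, noise_sd=0.1, seed=0):
    """Synthetic anomalies (assumed values): steady linear trend plus yearly noise."""
    rng = random.Random(seed)
    return [trend * t + rng.gauss(0, noise_sd) for t in range(n_years)]

def count_flat_windows(series, window=8):
    """Count the short windows whose fitted slope is zero or negative."""
    starts = range(len(series) - window + 1)
    return sum(1 for s in starts if ols_slope(series[s:s + window]) <= 0)

series = simulate_series()
full_slope = ols_slope(series)
flat = count_flat_windows(series)
n_windows = len(series) - 8 + 1

print(f"slope over all {len(series)} years: {full_slope:.4f} per year")
print(f"8-year windows showing no increase: {flat} of {n_windows}")
```

The upward trend is built into every year of the simulated series, so any short window that shows no increase reflects noise rather than a pause in the trend.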

Similar comments apply to some of the analyses reported in the blogosphere of our recent data on rejection of science and conspiracist ideation. We have already dealt with the “scamming” issue here and here, and we will not take it up again in this post.

Instead, we focus on the in-principle problems exhibited by some of the blog-analyses of our data. Two related problems and misconceptions appear to be pervasive: first, blog analysts have failed to differentiate between signal and noise, and second, no one who has toyed with our data has thus far exhibited any knowledge of the crucial notion of a latent construct or latent variable.

Let’s consider the signal vs. noise issue first. We use the item in our title, viz. that NASA faked the moon landing, for illustration. Several commentators have argued that the title was misleading because if one only considers level X of climate “skepticism” and level Y of moon endorsement, then there were none or only very few data points in that cell in the Excel spreadsheet.


But that is drilling into the noise and ignoring the signal.

The signal turns out to be there and it is quite unambiguous: computing a Pearson correlation across all data points between the moon-landing item and HIV denial reveals a correlation of -.25. Likewise, for lung cancer, the correlation is -.23. Both are highly significant at p < .0000…0001 (the exact value is 10⁻¹⁶, which is another way of saying that the probability of those correlations arising by chance is infinitesimally small).

What about climate? The correlation between the Moon item and the “CauseCO2” item is smaller, around -.12, but also highly significant, p < .0001.
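The contrast between counting one cell and correlating all data points can be sketched in a few lines of Python. Everything here is hypothetical: the responses are simulated rather than taken from our survey, and the 4-point response scale, the sample size, and the strength of the built-in association are assumptions made for the illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed across all paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def to_scale(value, low=1, high=4):
    """Round to the nearest response option and clip to the assumed 1..4 scale."""
    return max(low, min(high, round(value)))

def simulate_responses(n=1000, seed=0):
    """Hypothetical respondents: one underlying trait pushes moon-hoax
    endorsement up and acceptance of a scientific claim down."""
    rng = random.Random(seed)
    moon, science = [], []
    for _ in range(n):
        trait = rng.gauss(0, 1)
        moon.append(to_scale(1.5 + 0.6 * trait + rng.gauss(0, 0.8)))
        science.append(to_scale(3.2 - 0.6 * trait + rng.gauss(0, 0.8)))
    return moon, science

moon, science = simulate_responses()

# Drilling into the noise: one extreme cell of the cross-tabulation.
cell = sum(1 for m, s in zip(moon, science) if m == 4 and s == 1)

# Extracting the signal: every respondent contributes to the estimate.
r = pearson_r(moon, science)

print(f"respondents in the (moon=4, science=1) cell: {cell}")
print(f"Pearson r across all {len(moon)} respondents: {r:.2f}")
```

An extreme cell is sparsely populated simply because few people strongly endorse the moon item at all, whereas the correlation draws on the whole sample.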

Now you know why the title of our paper was “NASA faked the moon landing—Therefore (Climate) Science is a Hoax: An Anatomy of the Motivated Rejection of Science.” We put the “(climate)” in parentheses before “science” because the association between conspiracist ideation and rejection of science was greater for the other sciences than for climate science.

(As an intriguing aside, by the logic that’s been applied to our data by some critics, the larger correlations involving other sciences would suggest that AIDS researchers—keen to get their grants renewed?— scammed our survey to make AIDS deniers look bad.)

But we can do better than extract a signal by simple correlations.

Far better.

This brings us to our second issue, the role of latent variables.

To understand this concept, one must first consider the fact that on any cognitive test or survey, any one item, however well designed, will not provide an error-free measure of the psychological construct of interest. No single puzzle can tell you about a person’s IQ, no single question will reveal one’s personality, and no single moon landing will reveal a person’s propensity for conspiracist ideation.

So the correlations we just reported constitute a better signal than the noise that overwhelms a selected few cells of an Excel spreadsheet, but they are still “contaminated” by measurement error or item variance. That is, the data reflect the idiosyncrasy of the particular item in addition to information about the construct of interest, in this case conspiracist ideation.

What to do?

Enter latent variable analysis, also known as structural equation modeling (SEM).

SEM is a technique that estimates latent constructs—that is, the hypothesized psychological construct of interest, such as intelligence or personality or conspiracist ideation. SEM does this by considering multiple items, thereby removing the measurement error that besets individual test items.

We cannot get into the details here, but basically SEM permits computation of the error-free associations between constructs, such as one’s attitudes towards science and one’s conspiracist ideation. Because measurement error has been reduced or eliminated, correlations between constructs are higher in magnitude than the pairwise correlations between items might suggest.
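Why item-level correlations understate the association between constructs can be illustrated with a small simulation. This is a sketch under stated assumptions, not SEM and not our analysis: the two constructs, their true correlation of -.6, the five items per construct, and the noise level are all invented, and averaging the items serves only as a crude stand-in for what a latent variable model does more rigorously.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed across all paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_items(n=2000, rho=-0.6, n_items=5, noise_sd=1.5, seed=1):
    """Two hypothetical constructs correlated at rho, each measured by
    n_items items that add independent measurement error."""
    rng = random.Random(seed)
    a_items = [[] for _ in range(n_items)]
    b_items = [[] for _ in range(n_items)]
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        for k in range(n_items):
            a_items[k].append(a + rng.gauss(0, noise_sd))
            b_items[k].append(b + rng.gauss(0, noise_sd))
    return a_items, b_items

def respondent_means(items):
    """Average each respondent's answers across the items of one construct."""
    return [sum(vals) / len(vals) for vals in zip(*items)]

a_items, b_items = simulate_items()

# One item per construct: heavily contaminated by item-specific error.
r_single = pearson_r(a_items[0], b_items[0])

# Several items combined: the errors partly cancel, so less attenuation.
r_composite = pearson_r(respondent_means(a_items), respondent_means(b_items))

print(f"single-item correlation:   {r_single:.2f}")
print(f"multi-item correlation:    {r_composite:.2f}")
print("true construct correlation: -0.60 (assumed)")
```

The single-item correlation is pulled towards zero by measurement error, and combining items recovers part of the underlying association. A simple average still retains some error, which is why it remains short of the assumed construct-level value.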

And because SEM removes measurement error, the associations between constructs are particularly resilient, as we showed earlier by removing all observations that might conceivably represent “scamming.”

When the long-term temperature trend is ignored in favour of a few years of declining temperatures after a unique scorcher, this is missing the statistical forest not just for a tree but for a little twig on a tree.

Likewise, when the associations between latent constructs in our data are ignored in favour of one or two cells in an Excel spreadsheet, that’s missing the statistical forest not just for a little twig on a single tree but for a single leg of a pinebark beetle on that twig that’s eating its way northbound through the Rockies as the globe is warming.