# A simple recipe for the manufacturing of doubt

Mr. McIntyre, a self-declared expert in statistics, recently posted an ostensibly unsuccessful attempt to replicate several exploratory factor analyses in our study on the motivated rejection of (climate) science. His wordy post creates the appearance of potential problems with our analysis.

There are no such problems, and it is illustrative to examine how Mr. McIntyre manages to manufacture this erroneous impression.

Our explanation focuses on the factor analysis of the five “climate science” items as just one example, because this is the case where his re-“analysis” deviated most from our actual results.

The trick is simple when you know a bit about exploratory factor analysis (EFA). EFA serves to reduce the dimensionality in a data set. To this end, EFA represents the variance and covariance of a set of observed variables by a smaller number of latent variables (factors) that represent the variance shared among some or all observed variables.

EFA is a non-trivial analysis technique that requires considerable training to be used competently, and a full explanation is far beyond the scope of a single blog post. Suffice it to say that EFA takes a set of variables, such as items on a questionnaire, and replaces the multitude of items with a small number of “factors” that represent the common information picked up by those items. In a nutshell, EFA permits you to go from 100 items on an IQ test to a single factor that one might call “intelligence.” (It’s more nuanced than that, but that captures the essential idea for now.)

One core aspect of EFA is that the researcher must decide on the number of factors to be extracted from a covariance matrix. There are several well-established criteria that guide this selection. In the case of our data, all acknowledged criteria yield the same conclusion.

For illustrative purposes we focus on the simplest and most straightforward criterion, which states that one should extract factors with an eigenvalue > 1. (If you don’t know what an eigenvalue is, that’s not a problem: all you need to know is that this quantity should be > 1 for a factor to be extracted.) The reason is that a factor with an eigenvalue < 1 represents less variance than a single standardized variable, which negates the entire purpose of EFA, namely to represent the most important dimensions of variation in the data in an economical way.

Applied to the five “climate science” items, the first factor had an eigenvalue of 4.3, representing 86% of the variance. The second factor had an eigenvalue of only .30, representing a mere 6% of the variance. Factors are ordered by their eigenvalues, so all further factors represent even less variance.
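The arithmetic behind those percentages is easy to check. The sketch below uses only the two eigenvalues reported above and assumes, as is standard when a correlation matrix is analyzed, that each of the five standardized items contributes one unit of variance. The variable names are ours and purely illustrative.

```r
# Eigenvalues reported in this post for the five "climate science" items.
# Only the first two are given in the text; the other three are not reported.
n_items <- 5
reported_eigenvalues <- c(first = 4.3, second = 0.30)

# Assumption: items are standardized, so total variance equals the item count.
proportion_of_variance <- reported_eigenvalues / n_items

# Eigenvalue > 1 criterion: keep only factors that carry more variance
# than a single standardized item.
retain <- reported_eigenvalues > 1

print(round(proportion_of_variance, 2))
print(retain)
```

Only the first factor clears the threshold, which is why a one-factor solution follows from this criterion.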

Our EFA of the climate items thus provides clear evidence that a single factor is sufficient to represent the largest part of the variance in the five “climate science” items.  Moreover, adding further factors with eigenvalues < 1 is counterproductive because they represent less information than the original individual items. (Remember that all acknowledged standard criteria yield the same conclusions.)

Practically, this means that people’s responses to the five questions regarding climate science were so highly correlated that they reflect, to the largest part, variability on a single dimension, namely the acceptance or rejection of climate science. The remaining variance in individual items is most likely mere measurement error.

How could Mr. McIntyre fail to reproduce our EFA?

Simple: In contravention of normal practice, he forced the analysis to extract two factors. This is obvious in his R command line:

```r
pc=factanal(lew[,1:6],factors=2)
```

In this and all other EFAs posted on Mr. McIntyre’s blog, the number of factors to be extracted was chosen by fiat and without justification.

Remember, the second factor in our EFA for the climate items had an eigenvalue far below 1, and hence its extraction is nonsensical. (As it is by all other criteria as well.)
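For readers who want to see what checking the criterion before extraction might look like in R, here is a minimal sketch. It runs on simulated data, not on our survey data: five hypothetical items are generated from a single latent variable plus noise, and names such as `latent` and `items` are illustrative assumptions. The functions `eigen`, `cor`, and `factanal` ship with base R.

```r
set.seed(1)

# Simulated stand-in data (NOT the study's data): five hypothetical items
# that all reflect one latent variable plus independent noise.
n <- 500
latent <- rnorm(n)
items <- sapply(1:5, function(i) latent + rnorm(n, sd = 0.5))
colnames(items) <- paste0("item", 1:5)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues <- eigen(cor(items))$values

# Eigenvalue > 1 criterion: count the factors worth extracting.
n_factors <- sum(eigenvalues > 1)

# Let the criterion, not fiat, set the number of factors.
fit <- factanal(items, factors = n_factors)

print(round(eigenvalues, 2))
print(n_factors)
print(fit$loadings)
```

The point of the sketch is the order of operations: the number of factors is derived from the data first and only then passed to `factanal`.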

But that’s not everything.

When more than one factor is extracted, researchers can rotate factors so that each factor represents a substantial, and approximately equal, part of the variance. In R's factanal, the default rotation method, which Mr. McIntyre did not override, is Varimax, an orthogonal rotation that forces the factors to be uncorrelated. As a result of rotation, the variance is split about evenly among the factors extracted.
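What rotation does and does not change can be sketched with the `varimax` function from R's stats package. The loading matrix below is entirely hypothetical (it is not output from our analysis or from Mr. McIntyre's); it merely mimics a solution with one dominant factor and one weak one. How evenly the variance ends up being split depends on the loadings, but the total variance explained and each item's communality are left untouched by an orthogonal rotation.

```r
# Hypothetical unrotated loadings: a dominant first factor, a weak second one.
unrotated <- cbind(
  F1 = c(0.93, 0.92, 0.94, 0.91, 0.93),
  F2 = c(0.30, 0.20, 0.05, -0.10, -0.20)
)

# Varimax is an orthogonal rotation, so the factors stay uncorrelated.
rotated <- varimax(unrotated)
rotated_loadings <- unclass(rotated$loadings)

# Variance attributed to each factor (column sums of squared loadings).
share_before <- colSums(unrotated^2)
share_after <- colSums(rotated_loadings^2)

# Variance of each item explained by both factors together (communality).
communalities_before <- rowSums(unrotated^2)
communalities_after <- rowSums(rotated_loadings^2)

print(round(share_before, 2))
print(round(share_after, 2))
print(round(communalities_after - communalities_before, 6))
```

Comparing `share_before` with `share_after` shows how the rotation reapportions variance between the two factors, even though nothing new has been explained.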

Of course, this analysis is nonsensical because there is no justification for extracting more than one factor from the set of “climate science” items.

There are two explanations for this obvious flaw in Mr. McIntyre’s re-“analysis”. Either he made a beginner’s mistake, in which case he should stop posing as an expert in statistics and take a refresher course in Multivariate Analysis 101. Or else he intentionally rigged his re-“analysis” so that it deviated from our EFAs, in the hope that no one would see through his manufacture of doubt.