The Joys of Statistical Mapping

Statistical maps, which display both the geographic distribution and the magnitude of a variable of interest, have become an increasingly common tool in data analysis. From crime rates to forest fires, it is now routine to color a map in proportion to the magnitude of a variable at each location. There are many ways to represent magnitude, and, as my colleagues and I showed some 20 years ago, different plotting techniques can create very different impressions of identical data. Choices such as granularity (the size of the areas used for display, e.g., states vs. counties) or color (e.g., red vs. yellow or purple shading) can affect how people perceive the data and how accurately they read them.

In the climate arena, maps are routinely used to display temperature anomalies. Typically, shades of red represent positive anomalies (i.e., above-average temperatures), whereas shades of blue represent negative anomalies (below-average temperatures). This opposing-colors scheme arguably works well for drawing the reader’s attention to particularly warm or cool areas of the globe.

Last Sunday, James Risbey, our colleagues, and I published a paper that used a set of maps in one of its figures to show modeled and observed decadal trends (Kelvin/decade) in sea surface temperature (SST). The observations from Figure 5c of our paper are shown again below in a virtually identical format. The figure was created with MATLAB, using white around the zero trend and a high-resolution colormap combined with a coarse contour interval:

These data present an opportunity to explore the impact of subtle graphical choices on the observer’s perception of the data.

The next figure shows the same data, also plotted with MATLAB, but without a white zone in the red-blue colorbar and with a coarse colormap that matches the contour interval used.

Note that this figure “runs hotter” than Figure 5c as published: because the white (“neutral”) option is no longer available, some very small long-term trends are “forced” into a pink band.
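The mechanism can be illustrated with a minimal sketch of discrete color binning. It assumes a hypothetical contour interval of 0.1 K/decade and a handful of made-up trend values; the function names `color_band` and `shade` are illustrative and are not MATLAB or Panoply functions. With bands centered on zero, near-zero trends share a white band. When zero is instead a band edge, every positive trend, however small, lands in the palest pink band (and every negative one in the palest blue).

```python
import math


def color_band(trend, interval, neutral_zone):
    """Index of the discrete color band a trend value falls into.

    neutral_zone=True:  bands are centred on zero, so band 0 spans
                        [-interval/2, +interval/2) and is drawn white.
    neutral_zone=False: zero is a band edge, so band 0 is [0, interval)
                        (palest pink) and band -1 is [-interval, 0)
                        (palest blue); no band is white.
    """
    if neutral_zone:
        return math.floor(trend / interval + 0.5)
    return math.floor(trend / interval)


def shade(trend, interval, neutral_zone):
    """Coarse color family ('white', 'red' or 'blue') for a trend value."""
    band = color_band(trend, interval, neutral_zone)
    if neutral_zone and band == 0:
        return "white"
    return "red" if band >= 0 else "blue"


if __name__ == "__main__":
    interval = 0.1  # hypothetical contour interval, K/decade
    trends = [0.01, 0.03, -0.02, 0.12]  # made-up near-zero and larger trends
    for neutral in (True, False):
        reds = sum(shade(t, interval, neutral) == "red" for t in trends)
        print(f"neutral_zone={neutral}: {reds} of {len(trends)} cells red")
```

With these made-up values, one of four cells is red when a neutral band exists and three of four are red without it, even though the data are identical.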

One more figure drawn with MATLAB, this time with a white zone around near-zero trends but without contouring. This shows the raw trends more directly, but the white zone and the high-resolution colorbar change the look considerably.
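Without contouring, each grid cell is colored along a continuous scale rather than snapped to a band. The sketch below is a minimal stand-in for such a high-resolution colorbar, assuming a simple linear blue-white-red ramp in RGB and an optional white plateau of half-width `white_halfwidth`. Both the ramp and the parameter names are hypothetical, not the colormaps MATLAB or Panoply actually use. The plateau makes every value within ±`white_halfwidth` an identical white, which the plain ramp would still distinguish, and it squeezes the remaining shades into a narrower range.

```python
def diverging_rgb(value, vmax, white_halfwidth=0.0):
    """Map a value to an (r, g, b) triple in [0, 1] on a continuous
    blue-white-red scale running from -vmax to +vmax.

    Values with |value| <= white_halfwidth are drawn pure white; outside
    that zone, color ramps linearly to full blue or red at +/-vmax.
    Values beyond +/-vmax are clipped to the end colors.
    """
    if not 0.0 <= white_halfwidth < vmax:
        raise ValueError("need 0 <= white_halfwidth < vmax")
    v = max(-vmax, min(vmax, value))  # clip to the colorbar range
    if abs(v) <= white_halfwidth:
        return (1.0, 1.0, 1.0)
    # 0 at the edge of the white zone, 1 at full saturation
    t = (abs(v) - white_halfwidth) / (vmax - white_halfwidth)
    fade = 1.0 - t  # amount of white left in the color
    if v > 0:
        return (1.0, fade, fade)  # white -> red
    return (fade, fade, 1.0)      # white -> blue


if __name__ == "__main__":
    vmax = 0.5  # hypothetical colorbar limit, K/decade
    for trend in (0.02, 0.06, 0.3):
        smooth = diverging_rgb(trend, vmax)
        plateau = diverging_rgb(trend, vmax, white_halfwidth=0.05)
        print(trend, smooth, plateau)
```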

Finally, let’s try a different software package. The figure below was plotted with Panoply, using the same contour interval and matching colorbar resolution as the first figure above, which is nearly identical to Figure 5c in our paper.


What conclusions can we draw from these comparisons?

These graphical choices change the appearance of the data considerably. It follows that one should be very cautious when comparing figures from different publications or research groups: visual differences may reflect not differences in the data but subtle differences in how the graphs were produced, not all of which can be identified by inspecting the figure alone.

A second conclusion is that, regardless of which map is used, all of them show warming over the last 15 years in the northern and central Pacific, accompanied by cooling in the western and eastern Pacific. As we argued in the paper, this spatial pattern is washed out when the full CMIP5 model ensemble is considered. When models are instead selected according to how well their natural variability is synchronized with that of the world’s oceans, those that are in phase with Earth’s natural variability capture the spatial pattern of ocean heating better than those that are maximally out of phase.
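Why averaging the full ensemble washes out the pattern can be seen in a toy calculation. This is not the selection procedure used in the paper. It assumes, purely for illustration, that each model's natural variability is a single unit-amplitude sinusoid sin(ωt + φ) whose phase φ is evenly spread across models, with the observations at phase 0. Writing each sinusoid as the imaginary part of a complex exponential, the part of the ensemble mean that oscillates in step with the observations has amplitude Re(mean of e^{iφ}). All names here are hypothetical.

```python
import cmath
import math


def phase_distance(a, b):
    """Smallest absolute angle between two phases, in radians (0..pi)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def in_phase_component(phases):
    """Amplitude of the ensemble-mean oscillation that is in step with an
    observed sin(wt) (phase 0). Negative means anti-phase.

    mean_k sin(wt + phi_k) = Re(m) * sin(wt) + Im(m) * cos(wt),
    where m = mean_k exp(i * phi_k).
    """
    m = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return m.real


if __name__ == "__main__":
    n_models = 36  # hypothetical ensemble size
    phases = [2 * math.pi * k / n_models for k in range(n_models)]
    in_phase = [p for p in phases if phase_distance(p, 0.0) <= math.pi / 4]
    out_phase = [p for p in phases if phase_distance(p, math.pi) <= math.pi / 4]
    print("full ensemble:", in_phase_component(phases))
    print("in-phase subset:", in_phase_component(in_phase))
    print("out-of-phase subset:", in_phase_component(out_phase))
```

In this toy setup the full ensemble's mean oscillation cancels to essentially zero, the in-phase subset retains most of the observed signal, and the maximally out-of-phase subset reproduces it with the opposite sign.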