Simpson's paradox

From wiki.gis.com

Simpson’s paradox, also known as the amalgamation paradox, the reversal paradox, or the Yule-Simpson effect[1], is a phenomenon studied in statistics. A trend that appears within each of several groups of data can disappear, or even reverse, when the groups are combined into a single aggregate.

Example

Suppose the on-time completion record for city road projects is compared between two cities over three years. In Year 1, city A completes 1 of 4 projects on time (25%) and city B completes 3 of 9 (33%). In Year 2, city A completes 5 of 6 (83%) and city B completes 2 of 2 (100%). In Year 3, city A completes 3 of 5 (60%) and city B completes 3 of 4 (75%). Year by year, city B’s completion percentage is higher than city A’s in every case, which easily leads to the conclusion that city B has the better record. Over the full three years, however, city A completed 9 of 15 projects (60%) and city B completed 8 of 15 (53%), so city A outperformed city B in aggregate. This is Simpson’s paradox. The reversal arises because the projects are distributed unevenly across the years: 9 of city B’s 15 projects fell in Year 1, the year with the lowest completion rate for both cities, while only 4 of city A’s 15 did.
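The arithmetic in this example can be checked with a short script. The following is a minimal sketch in Python; the variable and function names (city_a, city_b, yearly_rates, aggregate_rate) are illustrative choices and not part of the original example. Only the project counts are taken from the text.

```python
from fractions import Fraction

# (projects completed on time, total projects) for Years 1-3,
# copied from the example in the text.
city_a = [(1, 4), (5, 6), (3, 5)]
city_b = [(3, 9), (2, 2), (3, 4)]


def yearly_rates(records):
    """Completion rate for each year, as exact fractions."""
    return [Fraction(done, total) for done, total in records]


def aggregate_rate(records):
    """Pooled rate: all completed projects over all projects."""
    done = sum(d for d, _ in records)
    total = sum(t for _, t in records)
    return Fraction(done, total)


# City B has the higher rate in every individual year ...
b_wins_every_year = all(
    b > a for a, b in zip(yearly_rates(city_a), yearly_rates(city_b))
)

# ... yet city A has the higher rate once the years are pooled.
a_wins_overall = aggregate_rate(city_a) > aggregate_rate(city_b)

print(b_wins_every_year, a_wins_overall)  # prints: True True
print(aggregate_rate(city_a), aggregate_rate(city_b))  # prints: 3/5 8/15
```

Exact fractions are used rather than floating-point percentages so that the comparisons do not depend on rounding. The pooled rate is a weighted average of the yearly rates, weighted by the number of projects in each year, which is why an uneven spread of projects across years can reverse the ordering.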

The challenge then becomes deciding from which perspective to evaluate the data. Which view should drive decisions? Which city should be commended, and which should be given incentives to improve? How should future funds be allocated? Work on causal inference argues that such decisions cannot be made from the data alone; they depend on knowing the story behind the data and the causal relationships involved. Once that context is taken into account, the non-intuitive choice often turns out to be the better one.[2]

One area of GIS where Simpson’s paradox can easily surface is the design and interpretation of choropleth maps. Because choropleth maps are almost always based on grouped data (very commonly, census data), it is easy to represent group-level patterns or trends, intentionally or unintentionally, without considering what the same data imply at a larger level of aggregation. Even where such implications are carefully considered, the map reader may not know how to interpret what is presented, so it is the map maker’s responsibility to make the intended audience and purpose clear.

See Also

Ecological fallacy

References

  1. I. J. Good, Y. Mittal (June 1987). "The Amalgamation and Geometry of Two-by-Two Contingency Tables". The Annals of Statistics. 15 (2): 694–711.
  2. Judea Pearl (2000; 2nd edition 2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.