Confusion matrix

From wiki.gis.com

A confusion matrix (also known as an error matrix or contingency table) is a table that shows the difference between the actual and predicted classifications of a model.[1] It makes it easy to see how often a classification system mislabels one class as another. This is helpful for testing the strength of hypotheses in statistical models, because it shows how often a model reproduces accurate results on further data sets. In short, a confusion matrix shows where the model is wrong and where it is correct.
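
As a minimal sketch of the idea, the snippet below tallies how often each actual class is given each predicted label. The class names, the sample data, and the helper `confusion_counts` are all hypothetical and invented for illustration; they do not come from any cited source.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


# Hypothetical labels, invented for illustration only.
actual = ["forest", "forest", "water", "urban", "water", "forest"]
predicted = ["forest", "urban", "water", "urban", "forest", "forest"]

counts = confusion_counts(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# Rows are actual classes, columns are predicted classes.
matrix = [[counts[(a, p)] for p in labels] for a in labels]

print("columns (predicted):", labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}", row)
```

Counts on the diagonal are correct classifications; every off-diagonal count is one class being mislabeled as another.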

Cartographic Application

In cartography, "An error matrix is frequently employed to organize and display information used to assess the thematic accuracy of a land-cover map."[2] It is also commonly used to assess the accuracy of species distribution models (models that predict the locations of certain species based on a specified set of data criteria).[3] That is, the confusion matrix shows how often the model correctly places a species in a given area or correctly recognizes that it will not be found there. This helps to ensure that a map based on the species distribution model presents the most accurate information possible.

Example

The confusion matrix is based on the frequency recorded for each of the four possible types of prediction:

  • True positive: presence predicted by the model and confirmed by the data
  • False positive: presence predicted by the model but the data indicate absence
  • False negative: absence predicted by the model but the data indicate presence
  • True negative: absence predicted by the model and confirmed by the data

The following table is a basic example of how a confusion matrix records these frequencies, adapted from "Species Distribution Models" by Richard Pearson.[4]

                     Recorded Present      Recorded Absent
Predicted Present    A (true positive)     B (false positive)
Predicted Absent     C (false negative)    D (true negative)
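
The four cells A to D can be tallied directly from paired presence/absence observations. The following is a minimal sketch, assuming predictions and records are stored as booleans (True for presence); the function name `presence_absence_matrix` and the site data are hypothetical.

```python
def presence_absence_matrix(predicted, recorded):
    """Return the cell counts (A, B, C, D) of the table above.

    A: predicted present, recorded present (true positive)
    B: predicted present, recorded absent  (false positive)
    C: predicted absent,  recorded present (false negative)
    D: predicted absent,  recorded absent  (true negative)
    """
    if len(predicted) != len(recorded):
        raise ValueError("predicted and recorded must have the same length")
    a = b = c = d = 0
    for p, r in zip(predicted, recorded):
        if p and r:
            a += 1
        elif p and not r:
            b += 1
        elif not p and r:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical survey sites, invented for illustration only.
predicted = [True, True, False, False, True, False, True, False]
recorded = [True, False, False, True, True, False, True, False]

a, b, c, d = presence_absence_matrix(predicted, recorded)
print(f"{'':18}{'Recorded Present':>18}{'Recorded Absent':>18}")
print(f"{'Predicted Present':18}{a:>18}{b:>18}")
print(f"{'Predicted Absent':18}{c:>18}{d:>18}")
```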

Statistical Testing

The confusion matrix has the same layout as a statistical contingency table, which is the input to Pearson's chi-square test. Before the "goodness-of-fit" test that the chi-square statistic provides can be completed, observed and expected frequencies must be calculated for insertion into the formula. The cells of the confusion matrix (or contingency table) supply the observed frequencies, and the expected frequencies are calculated from its row and column totals. Without this table, the "goodness-of-fit" between the model's predictions and the recorded data cannot be tested with the chi-square statistic.
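
The calculation described above can be sketched for a 2×2 table as follows. This is an illustrative sketch, not a full statistical routine: it assumes the standard Pearson statistic with no continuity correction, the function name `chi_square_2x2` is hypothetical, and the cell counts are invented.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and the table of expected frequencies.
    No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be non-zero")

    # Expected frequency of a cell = (row total * column total) / grand total.
    expected = [
        [row_totals[i] * col_totals[j] / n for j in range(2)]
        for i in range(2)
    ]

    # Sum of (observed - expected)^2 / expected over all four cells.
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return chi2, expected


# Hypothetical counts for cells A, B, C, D, invented for illustration only.
chi2, expected = chi_square_2x2(30, 10, 5, 55)
print("expected frequencies:", expected)
print("chi-square statistic:", round(chi2, 3))
```

The resulting statistic would then be compared against a chi-square distribution with one degree of freedom, which is the usual choice for a 2×2 table.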

References

  1. "Confusion Matrix." Accessed 29 September 2012.
  2. Stehman, Stephen V. "Selecting and interpreting measures of thematic classification accuracy." Remote Sensing of Environment, Volume 62, Issue 1, October 1997, Pages 77–89. Accessed 29 September 2012.
  3. Pearson, Richard. "Species Distribution Models." Center for Biodiversity and Conservation at the American Museum of Natural History. Accessed 9 October 2012.
  4. Ibid.


Further reading

Dale, Peter. Introduction to Mathematical Techniques Used in GIS, CRC Press: New York, 2005.