Classification
Classification is the process of organizing individuals into groups according to shared qualities or characteristics. Classification is used in GIS, cartography and remote sensing to generalize complexity in, and extract meaning from, geographic phenomena and geospatial data. There are different kinds of classifications, but all will generally involve a classification schema or key, which is a set of criteria (usually based on the attributes of the individuals) for deciding which individuals go into each class. ^{[1]} Changing the classification of a data set can create a variety of different maps. ^{[2]}
Types of Classification Schema
Classification schema can take a number of forms:
- The simplest is to divide the range of values of a single quantitative attribute into ordinal classes. This is the method usually used for choropleth and isarithmic maps. For example, the incomes of families in a county could be classified as "high" ($200,000 or more), "medium" ($40,000-$199,999), and "low" (less than $40,000). There are several techniques for developing this type of schema, based on patterns in the data:
  - Equal Interval: When classifying data for map symbolization, equal interval classification divides the range of attribute values into classes of equal width, so that each class spans the same range of values. For example, values from 0 to 100 divided into four classes give class breaks at 25, 50, and 75. This works well when the values are spread fairly evenly across the range. However, this doesn't often occur in geographic phenomena, and with skewed data most observations can end up in one or two classes.
  - Quantile: assigns an equal number of observations to each of a predefined number of classes. The attribute values are ranked, and the total number of observations is divided by the number of classes to give the number of observations in each class. The advantages of this method are that the classes are easy to compute and each class is equally represented on the map. Ordinal data can be easily classified using this method since the class assignment of quantiles is based on ranked data. ^{[3]}
  - Jenks Natural Breaks: The Jenks Natural Breaks Classification (or Optimization) method ^{[4]} is a data classification method designed to optimize the arrangement of a set of values into "natural" classes. It seeks to minimize each value's average deviation from the mean of its own class, while maximizing each class mean's deviation from the means of the other classes. In other words, the method reduces the variance within classes and maximizes the variance between classes. ^{[5]}^{[6]}
  - Geometric Interval: This classification method is used for visualizing continuous data that are not normally distributed. It was designed to work on data that contain an excessive number of duplicate values, e.g., 35% of the features have the same value.
  - Standard Deviation: The Standard Deviation Classification method finds the mean value of the observations, then places class breaks above and below the mean at intervals of 0.25, 0.5, or 1 standard deviation until all the data values are contained within the classes. ^{[7]} This classification method shows how much each feature's attribute value varies from the mean. A diverging color scheme is useful here because it emphasizes which observations are above the mean and which are below it.
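The techniques above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the implementation used by any particular GIS package. All function names and the sample values are hypothetical. The natural breaks function is an exhaustive search that minimizes the within-class sum of squared deviations, which is practical only for very small data sets and is not the algorithm Jenks published. The standard deviation function assumes the population standard deviation and treats the mean itself as a class break. Geometric interval is left out.

```python
from itertools import combinations
from statistics import mean, pstdev


def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper class limits so each class holds about len(values) / k observations."""
    ranked = sorted(values)
    n = len(ranked)
    # Class i ends at the observation whose rank is ceil(i * n / k).
    return [ranked[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]


def natural_breaks(values, k):
    """Brute-force search for the k classes with the smallest within-class
    sum of squared deviations (assumption: small input only)."""
    ranked = sorted(values)
    n = len(ranked)

    def squared_deviations(group):
        m = mean(group)
        return sum((v - m) ** 2 for v in group)

    best_cost, best_limits = None, None
    # Try every way of cutting the ranked list into k contiguous groups.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [ranked[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(squared_deviations(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_limits = cost, [g[-1] for g in groups]
    return best_limits


def std_dev_breaks(values, step=1.0):
    """Breaks every `step` standard deviations from the mean, extended
    outward until the lowest and highest values are covered."""
    m, sd = mean(values), pstdev(values)
    breaks = [m]
    below = above = m
    while below > min(values):
        below -= step * sd
        breaks.insert(0, below)
    while above < max(values):
        above += step * sd
        breaks.append(above)
    return breaks


def classify(value, upper_limits):
    """0-based index of the first class whose upper limit covers value."""
    for i, limit in enumerate(upper_limits):
        if value <= limit:
            return i
    return len(upper_limits) - 1  # guard against floating-point rounding


# Hypothetical, strongly skewed attribute values.
sample = [1, 2, 3, 4, 10, 11, 12, 41]
print("equal interval:", equal_interval_breaks(sample, 4))
print("quantile:      ", quantile_breaks(sample, 4))
print("natural breaks:", natural_breaks(sample, 4))
print("std deviation: ", std_dev_breaks(sample, step=1.0))
```

Each function except `std_dev_breaks` returns the upper limit of every class, so `classify(10, quantile_breaks(sample, 4))` gives the class index of the value 10 under the quantile scheme. `std_dev_breaks` returns break points in units of the standard deviation rather than a fixed number of classes, because the number of classes depends on how far the data extend from the mean.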
Choosing the Appropriate Method
Choosing which classification method to use can be one of the hardest and most complicated tasks an analyst faces. Information can be misrepresented if a method that does not suit the data is used. ^{[8]}
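A small sketch can show how much the choice matters. The values and both sets of class limits below are hypothetical: the equal-interval limits split the range 1 to 41 into four classes of width 10, and the quantile limits place two of the eight observations in each class. The helper name `class_counts` is likewise only illustrative.

```python
def class_counts(values, upper_limits):
    """Count how many values fall into each class, given upper class limits."""
    counts = [0] * len(upper_limits)
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # value above the last limit goes in the top class
    return counts


# Hypothetical, strongly skewed attribute values.
values = [1, 2, 3, 4, 10, 11, 12, 41]

# Assumed class limits for four classes under two schemes.
equal_interval_limits = [11.0, 21.0, 31.0, 41.0]
quantile_limits = [2, 4, 11, 41]

print("equal interval:", class_counts(values, equal_interval_limits))
print("quantile:      ", class_counts(values, quantile_limits))
```

With these values, the equal-interval scheme puts six of the eight observations in the lowest class and leaves the third class empty, while the quantile scheme puts two observations in every class. A map reader would see two very different patterns from the same data.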
References
- Longley et al. "Chapter 3: Representing Geography", Geographic Information Systems and Science. 2011.
- ArcGIS 10.1 Help, "Data classification." Accessed 22 October 2012.
- Geographic Information Technology Training Alliance. Accessed 08 November 2015.
- https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization
- Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", International Yearbook of Cartography 7: 186-190.
- McMaster, Robert, "In Memoriam: George F. Jenks (1916-1996)". Cartography and Geographic Information Science. 24(1) p.56-59.
- Standard Deviation Classification, GIS Dictionary. http://support.esri.com/en/knowledgebase/GISDictionary/term/standard%20deviation%20classification
- Data Classification, University of California Santa Barbara. Accessed 22 October 2012.