Classification is the process of organizing individuals into groups according to shared qualities or characteristics. Classification is used in GIS, cartography and remote sensing to generalize complexity in, and extract meaning from, geographic phenomena and geospatial data. There are different kinds of classifications, but all will generally involve a classification schema or key, which is a set of criteria (usually based on the attributes of the individuals) for deciding which individuals go into each class.  Changing the classification of a data set can create a variety of different maps. 
Humans use categories in every aspect of everyday life to make sense of the world. Common nouns are categories of entities; most adjectives are categories of the attributes of those entities. Even regions can be thought of as spatial categories.
A useful category is one that creates cognitive efficiency by allowing us to think about the groups as a whole without having to think about the individuals. To accomplish this, the ideal category would consist of individuals that share a wide variety of characteristics with other members of the category, while having very little in common with individuals outside the category. That is, good categories should have minimal intra-category variation and maximal inter-category variation. In such an ideal situation, one can deduce the characteristics of an individual from the categories to which it belongs with greater certainty (that is, there is less danger of the ecological fallacy). By this criterion, the worst possible set of categories would be to randomly assign individuals to categories; the lack of any intra-category similarity means that one cannot safely deduce anything about an individual based on the characteristics of the category (although this lack of predictability makes random categorization an ideal approach for statistical sampling).
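The "minimal intra-category variation, maximal inter-category variation" ideal can be made concrete for a single numeric attribute. The following is a minimal sketch, not a standard GIS routine: the function name `within_and_between_ss` and the sample values are hypothetical, and sums of squared deviations are assumed as the measure of variation.

```python
from statistics import mean

def within_and_between_ss(groups):
    """Return (within-class, between-class) sums of squared deviations.

    `groups` is a list of non-empty lists of numbers, one list per category.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of each individual around its own category mean.
    within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    # Variation of the category means around the overall mean, weighted by size.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    return within, between

# The same nine values, grouped two ways (illustrative data only).
similar_together = [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
arbitrary_groups = [[1, 11, 21], [2, 10, 22], [3, 12, 20]]

print(within_and_between_ss(similar_together))
print(within_and_between_ss(arbitrary_groups))
```

The grouping that keeps similar values together has a much smaller within-class sum and a much larger between-class sum than the arbitrary grouping, even though both contain exactly the same individuals.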
An ideal set of categories should be able to classify each individual unambiguously. Therefore, classification schema should be mutually exclusive and collectively exhaustive. Mutually exclusive means that there is no overlap between any two categories (i.e., an individual cannot belong to two categories simultaneously), while collectively exhaustive means that the categories "exhaust" or include all individuals (i.e., an individual cannot belong to zero categories). Thus, in an ideal classification schema, each individual belongs to one and only one category.
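Both properties can be checked mechanically for a schema built from numeric ranges. The sketch below assumes each class is a half-open range [low, high); the function names, the two schemas, and the test values are hypothetical illustrations.

```python
def classify(value, schema):
    """Return the names of every class whose range [low, high) contains value."""
    return [name for name, (low, high) in schema.items() if low <= value < high]

def check_schema(values, schema):
    """Return (unclassified, ambiguous) values for the given schema.

    unclassified: values in zero classes (schema is not collectively exhaustive)
    ambiguous:    values in two or more classes (schema is not mutually exclusive)
    """
    unclassified = [v for v in values if len(classify(v, schema)) == 0]
    ambiguous = [v for v in values if len(classify(v, schema)) > 1]
    return unclassified, ambiguous

# Hypothetical schemas, in thousands of dollars.
sound = {"low": (0, 40), "medium": (40, 200), "high": (200, float("inf"))}
flawed = {"low": (0, 40), "medium": (35, 200), "high": (201, float("inf"))}

values = [10, 37, 40, 200, 200.5, 350]
print(check_schema(values, sound))   # ([], [])
print(check_schema(values, flawed))  # ([200, 200.5], [37])
```

In the flawed schema, the overlap between "low" and "medium" makes 37 ambiguous, and the gap between "medium" and "high" leaves 200 and 200.5 unclassified.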
Everyday categories often violate these ideals. Boundary cases (individuals that seem to fall in between two categories, such as a plant that has some characteristics of a "tree" and some characteristics of a "shrub") and outliers (individuals that don't seem to have much in common with anything else) are common. An individual is usually similar to a set of others in some characteristics, but very different from them in other characteristics. Because geospatial technology is typically not good at dealing with these vagaries, operational classification in geospatial applications typically requires artificial decisions and thresholds.
Types of Classification Schema
Classification schema can take a number of forms and can be derived using a variety of methods. The choice of schema type and classification methodology depends largely on the nature of the source data and the nature of the criteria for putting each individual into a class. Choosing which classification method to use can be one of the hardest and most complicated decisions an analyst makes. In many cases, multiple schemes are available that are equally valid but portray very different patterns in analysis and visualization. Information can even be misrepresented if an unsuitable method is used.
The simplest method is to divide the range of values of a single quantitative attribute into ordinal classes. This is the method usually used for choropleth and isarithmic maps. For example, the incomes of families in a county could be classified as "high" (≥$200,000), "medium" ($40,000-199,999), and "low" (<$40,000). There are several techniques for developing this type of schema, based on patterns in the data:
- Equal Interval: When classifying data for map symbolization, equal interval classification arranges a set of attribute values into classes that each span an equal range of values. This works well when the values are spread fairly evenly across the range, so that the classes end up similar in size. However, this doesn't often occur in geographic phenomena.
- Quantile: divides the attribute values into a predefined number of classes so that each class holds the same number of observations. The values are ranked, and the total number of observations is divided by the number of classes to give the number of observations in each class. One advantage of this method is that the classes are easy to compute and each class is equally represented on the map. Ordinal data can be easily classified using this method, since the class assignment of quantiles is based on ranked data.
- Jenks Natural Breaks: The Jenks Natural Breaks Classification (or Optimization) system is a data classification method designed to optimize the arrangement of a set of values into "natural" classes. This is done by seeking to minimize the average deviation from the class mean, while maximizing the deviation from the means of the other groups. In other words, the method reduces the variance within classes and maximizes the variance between classes.
- Geometric Interval: This classification method is used for visualizing continuous data that is not distributed normally. This method was designed to work on data that contains excessive duplicate values, e.g., 35% of the features have the same value.
- Standard Deviation: The Standard Deviation Classification method finds the mean value of the observations, then places class breaks above and below the mean at intervals of 0.25, 0.5, or 1 standard deviation until all the data values are contained within the classes. This classification method shows how much each feature's attribute value varies from the mean. A diverging color scheme is useful for emphasizing which observations are above the mean and which are below it.
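Three of the techniques above (equal interval, quantile, and standard deviation) are simple enough to sketch with the Python standard library. This is an illustrative sketch under stated assumptions rather than the implementation used by any particular GIS package: the function names and sample incomes are hypothetical, class breaks are expressed as inclusive upper bounds, and the population standard deviation is assumed.

```python
from bisect import bisect_left
from statistics import mean, pstdev

def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]

def quantile_breaks(values, k):
    """Upper bounds of k classes holding (nearly) equal counts of ranked values."""
    ranked = sorted(values)
    n = len(ranked)
    # The i-th class ends at the ceil(n * i / k)-th ranked observation.
    return [ranked[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]

def std_dev_breaks(values, step=1.0):
    """Breaks at the mean and at multiples of step * sd until all values are covered."""
    interval = step * pstdev(values)  # assumption: population standard deviation
    breaks = [mean(values)]
    while breaks[0] > min(values):
        breaks.insert(0, breaks[0] - interval)
    while breaks[-1] < max(values):
        breaks.append(breaks[-1] + interval)
    return breaks

def assign_class(value, upper_bounds):
    """Index of the first class whose upper bound is >= value."""
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)

# Hypothetical family incomes, in thousands of dollars, with one extreme value.
incomes = [12, 15, 18, 22, 25, 31, 38, 45, 60, 200]

print(equal_interval_breaks(incomes, 4))  # [59.0, 106.0, 153.0, 200]
print(quantile_breaks(incomes, 4))        # [18, 25, 45, 200]
print(std_dev_breaks(incomes, 1.0))
```

With this skewed sample, equal interval puts eight of the ten families in the lowest class and leaves one class empty, while quantile spreads them across all four classes. The same data can therefore produce very different maps depending on the method chosen.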
A Decision Tree is an ordered set of questions applied to each individual entity (or each point in space) to determine the category to which it belongs. The answer to each question either results in a final choice of category, or leads to another more specific question. The questions can involve a wide range of attributes and criteria. As a geographic example, the Köppen climate classification system is usually implemented as a decision tree. Also, the biological species of an organism is usually determined using a decision tree.
In GIS, decision tree classification schemes are typically implemented by evaluating each question for the entire study area using relevant data and GIS analysis techniques like query and overlay (in raster or vector). The "answer" for a given question will thus be a set of regions for each possible answer, each of which can be attributed with a final class, or used as a mask for where to apply the next question. For example, the Köppen climate classification system can be modeled using 24 raster grids (long-term mean precipitation and temperature for each month) and Map Algebra.
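The mask-based evaluation described above can be sketched with small rasters stored as nested lists. This is a heavily simplified illustration, not the Köppen system itself: it uses only the 12 monthly temperature grids, omits precipitation and the arid class entirely, and the thresholds (10, 18, and 0 degrees Celsius), class names, function names, and sample grids are all assumptions made for the example.

```python
def local_stat(grids, func):
    """Cell-by-cell statistic (e.g. min or max) across a stack of equal-sized grids."""
    return [[func(cells) for cells in zip(*rows)] for rows in zip(*grids)]

def apply_rule(classes, condition, label):
    """Assign label where condition is true and the cell is still unclassified."""
    return [[label if c is None and ok else c for c, ok in zip(crow, okrow)]
            for crow, okrow in zip(classes, condition)]

def classify_climate(monthly_temp):
    """Run an ordered set of questions over the whole study area at once."""
    coldest = local_stat(monthly_temp, min)
    warmest = local_stat(monthly_temp, max)
    classes = [[None for _ in row] for row in coldest]
    # Each question classifies some cells; the rest pass on to the next question.
    classes = apply_rule(classes, [[w < 10 for w in row] for row in warmest], "polar")
    classes = apply_rule(classes, [[c >= 18 for c in row] for row in coldest], "tropical")
    classes = apply_rule(classes, [[c > 0 for c in row] for row in coldest], "temperate")
    classes = apply_rule(classes, [[True for _ in row] for row in coldest], "continental")
    return classes

# Hypothetical 2 x 2 study area: annual mean and seasonal amplitude per cell.
mean_t = [[26.0, 14.0], [4.0, -12.0]]
amp = [[1.0, 8.0], [16.0, 10.0]]
shape = [-1.0, -0.9, -0.5, 0.0, 0.5, 0.9, 1.0, 0.9, 0.5, 0.0, -0.5, -0.9]
monthly_temp = [[[m + a * s for m, a in zip(mrow, arow)]
                 for mrow, arow in zip(mean_t, amp)] for s in shape]

print(classify_climate(monthly_temp))
# [['tropical', 'temperate'], ['continental', 'polar']]
```

Because `apply_rule` only writes to cells that are still unclassified, the growing set of classified cells acts as the mask that restricts where each later question applies.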
Clustering is a classification method that is most commonly used in data mining and remote sensing image analysis. It is based on the premise that if a set of meaningful categories exists in a phenomenon (e.g., types of land cover), they should appear as patterns in the characteristics of the phenomenon. Specifically, there should be clusters of individuals that are similar in several attributes while being very different from other individuals in the same attributes (i.e., the minimal intra-variability, maximal inter-variability ideal discussed above). For example, if humans are able to intuitively identify different types of land cover in an aerial photograph by recognizing similarities and differences in color and texture, then remote sensing software should be able to identify the same patterns in multispectral imagery data.
While analytically identifying the perfect set of clusters in a multivariate dataset is computationally difficult (NP-hard), there are a variety of analysis methods and heuristic optimization algorithms for searching for clusters, such as Lloyd's k-means algorithm. The Jenks optimization algorithm discussed above is essentially k-means performed on a single variable.
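To illustrate the connection, here is a minimal sketch of Lloyd's k-means iteration applied to a single variable. It is not Jenks's own procedure: the function name, the equal-count initialization, and the sample values are assumptions, and like any heuristic of this kind it can settle on a local optimum that depends on the starting centers.

```python
from statistics import mean

def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters by alternating assignment and mean updates."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ranked = sorted(values)
    n = len(ranked)
    # Assumed starting point: the means of k equal-count slices of the ranked data.
    centers = [mean(ranked[n * i // k: n * (i + 1) // k]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in ranked:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return clusters

values = [30, 1, 11, 4, 2, 31, 10, 3, 12]  # illustrative data only
print(kmeans_1d(values, 3))  # [[1, 2, 3, 4], [10, 11, 12], [30, 31]]
```

The equal-count start initially places 4 with 10 and 11, and 12 with 30 and 31; the first assignment step moves both values to the nearer center, after which the clusters are stable.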
In many geographic classification schemes, each class is defined by a set of criteria that manifests as a spatial region. In this case, finding the region corresponding to each class can be implemented as a multi-criteria evaluation or suitability analysis procedure, using GIS analysis methods such as queries, buffers, overlay, and map algebra.
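A boolean overlay is the simplest form of such a procedure: each criterion becomes a true/false grid, and the class region is wherever all of them are true. The sketch below uses nested lists as rasters; the class definition (gentle slope, within 100 m of a stream, grass cover), the grids, and the function names are hypothetical, and the distance grid stands in for the result of a buffer operation.

```python
def query(grid, predicate):
    """Boolean grid marking the cells where predicate(value) is true."""
    return [[predicate(v) for v in row] for row in grid]

def overlay_and(*masks):
    """Cell-by-cell logical AND of any number of equal-sized boolean grids."""
    return [[all(cells) for cells in zip(*rows)] for rows in zip(*masks)]

# Hypothetical input layers for a 2 x 3 study area.
slope_pct = [[2, 3, 12], [4, 8, 1]]
dist_to_stream_m = [[50, 150, 30], [90, 40, 100]]
land_cover = [["grass", "grass", "grass"], ["forest", "grass", "grass"]]

region = overlay_and(
    query(slope_pct, lambda s: s < 5),           # gentle slope
    query(dist_to_stream_m, lambda d: d <= 100),  # inside the stream buffer
    query(land_cover, lambda c: c == "grass"),    # required cover type
)
print(region)  # [[True, False, False], [False, False, True]]
```

Only the two cells that satisfy all three criteria belong to the class region; weighted suitability scoring would replace the logical AND with a weighted sum of ranked criteria.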