Classification is "the process of sorting or arranging entities into groups or categories; on a map, the process of representing members of a group by the same symbol, usually defined in a legend." Classification is used in GIS, cartography and remote sensing to generalize complexity in, and extract meaning from, geographic phenomena and geospatial data. There are different kinds of classifications, but all will generally involve a classification schema or key, which is a set of criteria (usually based on the attributes of the individuals) for deciding which individuals go into each class.  Changing the classification of a data set can create a variety of different maps. 
Humans use categories in every aspect of everyday life to make sense of the world. Common nouns are categories of entities; most adjectives are categories of the attributes of those entities. Even regions can be thought of as spatial categories.
A useful category is one that creates cognitive efficiency by allowing us to think about the groups as a whole without having to think about the individuals. To accomplish this, the ideal category would consist of individuals that share a wide variety of characteristics with other members of the category, while having very little in common with individuals outside the category. That is, good categories should have minimal intra-category variation and maximal inter-category variation. In such an ideal situation, one can deduce the characteristics of an individual from the categories to which it belongs with greater certainty (that is, there is less danger of ecological fallacy). By this criterion, the worst possible set of categories would be to randomly assign individuals to categories; the lack of any intra-category similarity means that one cannot safely deduce anything about an individual based on the characteristics of the category (although this lack of predictability makes random categorization an ideal approach for statistical sampling).
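The idea of minimal intra-category variation and maximal inter-category variation can be made concrete with sums of squared deviations. The following is a minimal sketch, not a standard GIS routine: the function name `variation_summary` and the elevation values are hypothetical, chosen only to contrast a sensible grouping with an arbitrary one.

```python
from statistics import mean

def variation_summary(classes):
    """Return (within, between) sums of squared deviations.

    `classes` maps a class label to the list of numeric values in that class.
    """
    all_values = [v for values in classes.values() for v in values]
    grand_mean = mean(all_values)
    within = 0.0
    between = 0.0
    for values in classes.values():
        class_mean = mean(values)
        # Intra-category variation: spread of members around their class mean.
        within += sum((v - class_mean) ** 2 for v in values)
        # Inter-category variation: distance of the class mean from the grand mean.
        between += len(values) * (class_mean - grand_mean) ** 2
    return within, between

# Hypothetical elevations (metres), grouped two different ways.
good = {"lowland": [10, 12, 14], "upland": [100, 102, 104]}
poor = {"group_a": [10, 100, 14], "group_b": [12, 102, 104]}

print(variation_summary(good))  # small within-class, large between-class variation
print(variation_summary(poor))  # large within-class, small between-class variation
```

Because the two groupings contain the same six values, the within and between sums add up to the same total in both cases; a better grouping simply shifts more of that total into the between-class term.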
An ideal set of categories should be able to classify each individual unambiguously. Therefore, classification schema should be mutually exclusive and collectively exhaustive. Mutually exclusive means that there is no overlap between any two categories (i.e., an individual cannot belong to two categories simultaneously), while collectively exhaustive means that the categories "exhaust" or include all individuals (i.e., an individual cannot belong to zero categories). Thus, in an ideal classification schema, each individual belongs to one and only one category.
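Whether a schema is mutually exclusive and collectively exhaustive can be checked mechanically by counting how many classes each individual falls into. The sketch below assumes a schema expressed as a dictionary of class label to membership test; the function `check_schema` and the plant-height thresholds are hypothetical illustrations.

```python
def check_schema(schema, individuals):
    """Return (unclassified, ambiguous) for a dict of label -> membership test."""
    unclassified, ambiguous = [], []
    for item in individuals:
        matches = [label for label, belongs in schema.items() if belongs(item)]
        if not matches:
            unclassified.append(item)          # schema is not collectively exhaustive
        elif len(matches) > 1:
            ambiguous.append((item, matches))  # schema is not mutually exclusive
    return unclassified, ambiguous

# Hypothetical plant-height classes in metres; thresholds are illustrative only.
flawed = {
    "shrub": lambda h: 0.5 <= h <= 5.0,
    "tree": lambda h: h >= 4.0,
}
fixed = {
    "herb": lambda h: h < 0.5,
    "shrub": lambda h: 0.5 <= h < 5.0,
    "tree": lambda h: h >= 5.0,
}
heights = [0.2, 1.5, 4.5, 12.0]

print(check_schema(flawed, heights))  # one unclassified and one ambiguous plant
print(check_schema(fixed, heights))   # both lists are empty
```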
Everyday categories often violate these ideals. Boundary cases (individuals that seem to fall in between two categories, such as a plant that has some characteristics of a "tree" and some characteristics of a "shrub") and outliers (individuals that don't seem to have much in common with anything else) are common. An individual is usually similar to a set of others in some characteristics, but very different from them in other characteristics. Because geospatial technology is typically not good at dealing with these vagaries, operational classification in geospatial applications typically requires artificial decisions and thresholds.
Types of Classification Schema
Classification schema can take a number of forms and can be derived using a variety of methods. The choice of schema type and classification methodology depends largely on the nature of the source data and the nature of the criteria for putting each individual into a class. Choosing a classification method can be one of the hardest decisions an analyst makes. In many cases, multiple schemes are available that are equally valid but portray very different patterns in analysis and visualization, and an unsuitable method can misrepresent the data.
The simplest method is to divide the range of values of a single quantitative attribute into ordinal classes. This is the method usually used for choropleth and isarithmic maps. For example, the incomes of families in a county could be classified as "high" (>$200,000), "medium" ($40,000-199,999), and "low" (<$40,000). There are several techniques for developing this type of schema, based on patterns in the data:
- Equal Interval: arranges a set of attribute values into groups that contain an equal range of values. The class width is the range of the data (maximum - minimum) divided by the chosen number of classes. This method shows groups well when the values are spread fairly evenly across the range, but that does not often occur in geographic phenomena.
- Quantile: divides the observations equally among a predefined number of classes. The attribute values are ranked, and the total number of observations is divided by the number of classes to give the number of observations in each class. One advantage of this method is that the classes are easy to compute and each class is equally represented on the map. Ordinal data can be easily classified using this method since the class assignment of quantiles is based on ranked data.
- Jenks Natural Breaks: The Jenks Natural Breaks Classification (or Optimization) system is a data classification method designed to optimize the arrangement of a set of values into "natural" classes. This is done by seeking to minimize the average deviation from the class mean while maximizing the deviation from the means of the other groups. In other words, the method reduces the variance within classes and maximizes the variance between classes.
- Geometric Interval: This classification method is used for visualizing continuous data that is not distributed normally. This method was designed to work on data that contains excessive duplicate values, e.g., 35% of the features have the same value.
- Standard Deviation: The Standard Deviation Classification method finds the mean value of the observations, then places class breaks above and below the mean at intervals of 0.25, 0.5, or 1 standard deviation until all the data values are contained within the classes. This classification method shows how much each feature's attribute value varies from the mean. A diverging color scheme is useful for emphasizing which observations are above the mean and which are below it.
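Three of the techniques above (equal interval, quantile, and standard deviation) reduce to short calculations. The following is a minimal sketch under stated assumptions rather than the procedure of any particular GIS package: the function names and sample values are hypothetical, breaks are returned as upper class limits, the population standard deviation is used, and one break is placed exactly at the mean.

```python
from bisect import bisect_left
from statistics import mean, pstdev

def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + i * width for i in range(1, k + 1)]
    breaks[-1] = hi  # guard against floating-point drift in the last limit
    return breaks

def quantile_breaks(values, k):
    """Upper class limits so that each class holds about len(values) / k observations."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer ceiling of i * n / k is the rank of the last observation in class i.
    return [ordered[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]

def std_dev_breaks(values, interval=1.0):
    """Class boundaries stepping outward from the mean until all values are covered."""
    centre = mean(values)
    step = interval * pstdev(values)
    if step == 0:
        return [centre]
    breaks = [centre]
    while breaks[0] > min(values):
        breaks.insert(0, breaks[0] - step)
    while breaks[-1] < max(values):
        breaks.append(breaks[-1] + step)
    return breaks

def class_counts(values, upper_limits):
    """Count observations per class; a value equal to a limit falls in the lower class."""
    counts = [0] * len(upper_limits)
    for v in values:
        counts[bisect_left(upper_limits, v)] += 1
    return counts

# Hypothetical skewed data set with one extreme value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

ei = equal_interval_breaks(values, 3)
qu = quantile_breaks(values, 3)
print(ei, class_counts(values, ei))  # [34.0, 67.0, 100] [9, 0, 1]
print(qu, class_counts(values, qu))  # [4, 7, 100] [4, 3, 3]
print(std_dev_breaks(values, 1.0))   # five boundaries centred on the mean of 14.5
```

On this skewed sample, equal interval leaves the middle class empty while quantile fills every class, which illustrates how the same data can yield very different maps. Tied values can make quantile classes unequal in practice.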
A Decision Tree is an ordered set of questions applied to each individual entity (or each point in space) to determine the category to which it belongs. The answer to each question either results in a final choice of category or leads to another more specific question. As these questions branch out into more possibilities, the diagram takes on the shape of a horizontal tree.  The questions can involve a wide range of attributes and criteria. As a geographic example, the Köppen climate classification system is usually implemented as a decision tree. Also, the biological species of an organism is usually determined using a decision tree.
In GIS, decision tree classification schemes are typically implemented by evaluating each question for the entire study area using relevant data and GIS analysis techniques, such as query and overlay (in raster or vector). The "answer" for a given question will thus be a set of regions for each possible answer, each of which can be attributed with a final class, or used as a mask for where to apply the next question. For example, the Köppen climate classification system can be modeled using 24 raster grids (long-term mean precipitation and temperature for each month) and Map Algebra.
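The ordered-question structure of a decision tree can be sketched for a single location as follows. This is a deliberately simplified, Köppen-inspired illustration and not the full Köppen system: the function name `climate_group` is hypothetical, the temperature thresholds approximate commonly cited limits for the main groups, and the fixed `arid_limit_mm` parameter is a crude assumed stand-in for the real aridity rule. In a raster implementation, the same questions would be evaluated cell by cell with Map Algebra.

```python
def climate_group(monthly_temp_c, monthly_precip_mm, arid_limit_mm=250):
    """Assign one main climate group by answering an ordered series of questions.

    Inputs are 12 long-term monthly means for one location. All thresholds
    here are illustrative assumptions, not the official classification rules.
    """
    warmest = max(monthly_temp_c)
    coldest = min(monthly_temp_c)
    annual_precip = sum(monthly_precip_mm)

    # Question 1: is even the warmest month cold?
    if warmest < 10:
        return "polar"
    # Question 2: is the location very dry? (assumed fixed limit)
    if annual_precip < arid_limit_mm:
        return "arid"
    # Question 3: is every month warm?
    if coldest >= 18:
        return "tropical"
    # Question 4: does the coldest month stay above freezing?
    if coldest > 0:
        return "temperate"
    return "continental"

# Hypothetical sites: (monthly temperatures in deg C, monthly precipitation in mm).
sites = {
    "site_a": ([26] * 12, [200] * 12),
    "site_b": ([-25, -24, -20, -12, -3, 3, 6, 5, 0, -8, -17, -22], [20] * 12),
    "site_c": ([12, 15, 20, 25, 30, 34, 35, 34, 30, 24, 17, 13], [5] * 12),
    "site_d": ([-10, -8, -2, 6, 13, 18, 20, 19, 14, 7, 0, -7], [50] * 12),
    "site_e": ([4, 5, 8, 11, 15, 19, 22, 21, 18, 13, 8, 5], [60] * 12),
}
for name, (temps, precip) in sites.items():
    print(name, climate_group(temps, precip))
```

Each question either ends with a final class or passes the location on to the next, more specific question, which is the branching structure described above.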
Clustering is a classification method that is most commonly used in data mining and remote sensing image analysis. It is based on the premise that if a set of meaningful categories exists in a phenomenon (e.g., types of land cover), they should appear as patterns in the characteristics of the phenomena. Specifically, there should be clusters of individuals that are similar in several attributes while being very different from other individuals in the same attributes (i.e., the minimal intra-variability, maximal inter-variability ideal discussed above). For example, if humans are able to intuitively identify different types of land cover in an aerial photograph by recognizing similarities and differences in color and texture, then remote sensing software should be able to identify the same patterns in multispectral imagery data.
While analytically identifying the perfect set of clusters in a multivariate dataset is computationally difficult (NP-Hard), there are a variety of analysis methods and heuristic optimization algorithms for searching for clusters, such as Lloyd's K-Means Algorithm. The Jenks optimization algorithm discussed above is essentially k-means performed on a single variable.
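To illustrate the connection between clustering and natural breaks, the sketch below runs Lloyd's algorithm on a single variable. It is a heuristic illustration under stated assumptions: the function name `kmeans_1d`, the even-rank initialisation, and the sample values are hypothetical, and GIS packages may compute Jenks breaks with different algorithms.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one variable; returns (centres, clusters)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: centres at evenly spaced ranks of the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centre.
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Update step: each centre moves to the mean of its members.
        new_centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # converged: assignments can no longer change
        centres = new_centres
    return centres, clusters

# Hypothetical attribute values with three visible groups.
values = [1, 2, 3, 10, 11, 12, 50, 52, 54]
centres, clusters = kmeans_1d(values, 3)
breaks = [max(c) for c in clusters if c]  # upper limit of each class
print(clusters)  # [[1, 2, 3], [10, 11, 12], [50, 52, 54]]
print(breaks)    # [3, 12, 54]
```

Because Lloyd's algorithm only guarantees a local optimum, a different initialisation can give different classes on less clearly separated data.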
In many geographic classification schemes, each class is defined by a set of criteria that manifests as a spatial region. In this case, finding the region corresponding to each class can be implemented as a multi-criteria evaluation or suitability analysis procedure. This is done using GIS analysis methods such as queries, buffers, overlay, and map algebra.
For example, imagine that a GIS analyst is searching for the best site to build a hypothetical waste management facility, based on certain spatial criteria. Such criteria could be that the imagined facility needs to be near existing roads, far from wildlife reserves, and far from land use areas zoned as residential. In order to classify the better areas versus areas that are less than ideal, the analyst could use multi-criteria evaluation to consider the several variables that would affect where the facility would ultimately be located. With this method, it is also possible to set weights for each criterion so certain variables can be considered more strongly than others if needed. For example, in the case of the waste management facility, if it were more important for the facility to be located far from residential areas than near existing roads, this could be accounted for in the classification. This method of classification is especially useful in GIS because it is often necessary to consider multiple spatial criteria when working with data.
If the final set of classes is ordinal (for example, low-medium-high earthquake hazard potential), then it can be modeled as an index, a pseudo-measurement of something that cannot directly be measured (in the above example, hazard potential on a scale of 1-10), typically based on factors that can be measured. The most common way this is done is that each contributing factor is mapped, with attributes scaled to a common quantitative scale, then combined using a formula such as Weighted Linear Combination to produce a final score.
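A Weighted Linear Combination can be sketched for the waste management example as follows. This is a minimal illustration under stated assumptions: the function names, the min-max rescaling to a 0-1 scale, the distances, and the weights are all hypothetical, and in practice each factor would be a raster layer rather than a short list of candidate sites.

```python
def rescale(values, higher_is_better=True):
    """Min-max rescale to 0-1 so that 1 is always the most suitable value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # no variation: treat all sites as equal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def weighted_linear_combination(factors, weights):
    """Combine rescaled factors into one score per site; weights are normalised."""
    total = sum(weights.values())
    n_sites = len(next(iter(factors.values())))
    return [
        sum(weights[name] * factors[name][i] for name in factors) / total
        for i in range(n_sites)
    ]

# Hypothetical distances (km) for three candidate sites.
factors = {
    "near_roads": rescale([0.5, 2.0, 5.0], higher_is_better=False),
    "far_from_reserves": rescale([1.0, 4.0, 10.0]),
    "far_from_residential": rescale([0.5, 3.0, 6.0]),
}
# Assumed weights: distance from residential areas matters most.
weights = {"near_roads": 0.2, "far_from_reserves": 0.3, "far_from_residential": 0.5}

scores = weighted_linear_combination(factors, weights)
best = max(range(len(scores)), key=lambda i: scores[i])
print([round(s, 3) for s in scores], "best site index:", best)
```

The resulting scores form an index that can then be divided into ordinal classes such as low, medium, and high suitability.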