Choropleth map

A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the value of a particular variable measured for each area. Most often the variable is quantitative, with each color associated with a value or range of values. Though less common, a choropleth map can also be created from nominal data. Because the color changes from one area to the next, a choropleth map is an effective way to visualize how a measurement varies across a geographic area.

The earliest known choropleth map was created in 1826 by Baron Pierre Charles Dupin.

=Designing a Choropleth Map=
A choropleth map is constructed from several elements:
 * 1) A thematic variable of interest (e.g., median family income, population density, percent Latino)
 * 2) A set of districts subdividing the area of interest (e.g., cities, counties, provinces), as a polygon GIS data set
 * 3) A table of values, with a statistical summary of values of the variable in each district, often obtained as a non-spatial table
 * 4) A classification scheme for organizing the range of values of the variable (or a decision to not classify the values)
 * 5) A color scheme for mapping each value (or class of values) to a particular color
 * 6) A legend that clearly lists the values associated with each color

In a well-crafted choropleth map, each of these elements is carefully developed to clearly portray the natural geographic variation in the variable. That is, map readers should be able to intuitively see the real-world patterns without needing to work to decipher the elements themselves.

Choice of Districts
Choropleth maps are based on statistical data summarized over a set of districts (such as counties). This summarization is called "standardization of data": it sets either a geographic or a numerical standard on which to base the data. In choropleth data, the boundaries of the districts are defined a priori (before adding data to the map) and are not based on patterns in the variable being mapped; that is, they are arbitrary with respect to the data. Typically, the value of the given variable for each district is a summary of a large number of individuals or smaller regions within that district, and any variation within the district is not reported. For example, a choropleth map of median family income by county reports a single value for a county, which may contain neighborhoods of very high and very low income. In contrast, chorochromatic (area-class) and isarithmic maps use regions that are defined by patterns in the phenomenon being mapped. The popularity of choropleth maps is largely due to the convenience of obtaining this kind of data, since governments typically report statistical information (e.g., census data) that has been aggregated into well-known districts such as cities, counties, and provinces.

Where the defined regions are important to a discussion (as in an election map divided by constituent jurisdictions or making policy for the regions), choropleth maps are ideal. However, when real-world patterns in the variable may not conform to the chosen regions, a choropleth map can mask the true pattern and give rise to interpretation issues like the ecological fallacy and the modifiable areal unit problem (MAUP), so other techniques may be preferable. For example, a map showing world population density by country will show the same color over Canada's entire extent, even though most of the population lives along the coasts and southern border of the country. Unfortunately, choropleth maps are frequently used in inappropriate applications due to the abundance of data in this form and the ease of choropleth map creation using Geographic Information Systems.

While these issues are inherent to the a priori nature of the districts and cannot be eliminated, they can be mitigated by choosing districts that are very small relative to the scale of the map. Map readers are then more likely to base their interpretations on large collections of districts rather than looking at a single large district and making assumptions about the variability within it. For example, a choropleth map of the more than 3,000 counties in the United States is likely to be misinterpreted far less often than a choropleth map of the 50 states (although some counties in the West are as large as states in the East and may still be misread).

The dasymetric technique can be thought of as a solution to the districting problem in some situations. This technique uses other data sources to adjust the district boundaries. For example, in a map of a human population variable, land ownership or land use/land cover data can be used to exclude areas that are known to be uninhabited. Because the effective area of the district is changed, some variables such as population density need to be adjusted.
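
The density adjustment described above can be sketched in a few lines. This is a minimal illustration rather than a full dasymetric workflow: the function name `dasymetric_density` and the district figures are hypothetical, and it assumes the uninhabited area has already been measured from ancillary land-cover data.

```python
def dasymetric_density(population, total_area, excluded_area):
    """Return population density over the inhabitable part of a district.

    excluded_area is the portion known to be uninhabited (e.g., water or
    protected land); both areas must use the same unit.
    """
    effective_area = total_area - excluded_area
    if effective_area <= 0:
        raise ValueError("excluded area must be smaller than the total area")
    return population / effective_area


# Hypothetical district: 12,000 people, 400 sq km, of which 250 sq km is lake.
naive = 12_000 / 400                              # density over the whole district
adjusted = dasymetric_density(12_000, 400, 250)   # density over inhabitable land
print(naive, adjusted)
```

The adjusted density is higher than the naive one because the same population is spread over a smaller effective area.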

Choice of Variable
Technically, GIS software can create a choropleth map from any statistical variable aggregated into districts. However, some variables are preferred while others are generally inappropriate. As with the choice of districts, this distinction is based on avoiding the likelihood of misinterpretation.

The best variables for choropleth maps are those that can be conceptualized as continuous fields (also called statistical surfaces or spatially intensive variables), in which the variable could theoretically be measured at an arbitrary point or small region. Thus, a choropleth map is a discrete representation of a continuous field. For example, population density, median family income, and annual precipitation are all fields that can be appropriately mapped this way. Nominal field-type variables, such as "most prevalent primary language," are also appropriate for choropleth maps, as they are also statistical aggregations. However, a colored map of a nominal variable that has only a single value for each district, such as the religious affiliation of the representative of each legislative constituency, is not technically a choropleth map, because it does not represent a statistical summary of more detailed data, and because the district boundaries and the variable are intimately related (in this case, because a single legislator represents that district). Instead, such a map should be considered a chorochromatic map.

Conversely, variables that are only meaningful for the entire district (spatially extensive variables, such as total counts) are typically avoided because they can be easily misinterpreted. Representing data types such as total counts is not ideal because "large areas as a consequence of their size, are likely to include more of whatever the map is about, and thus contain darker symbols than smaller areas with equal or greater density." Other thematic mapping techniques, such as proportional symbols, are much more appropriate for visualizing total count variables.

A simple method for determining whether a variable is spatially intensive (appropriate for choropleth maps) or spatially extensive (problematic for choropleth maps) is the "addition test." Imagine you have two neighboring districts, each with a value of 50 in the chosen variable. Next, suppose you realign the districts so that these two become a single district. If you would expect the new district to have a value of 50 (e.g., population per square mile, percent Hispanic, annual precipitation), then it is a continuous field or intensive variable. If you expect it to have a value of 100 (e.g., total population, acres of farmland), then it is extensive.
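
The addition test can be tried on a toy example. The sketch below uses two invented districts (the dictionaries `a` and `b` and the `density` helper are hypothetical) and merges them to show which quantity adds up and which stays the same.

```python
# Two hypothetical neighboring districts (all numbers invented).
a = {"population": 5_000, "area_sq_mi": 100}
b = {"population": 5_000, "area_sq_mi": 100}


def density(d):
    """People per square mile, an intensive variable."""
    return d["population"] / d["area_sq_mi"]


# Realign the districts so that the two become one.
merged = {
    "population": a["population"] + b["population"],
    "area_sq_mi": a["area_sq_mi"] + b["area_sq_mi"],
}

# Extensive: the merged total is the sum of the two parts.
print(merged["population"])
# Intensive: both districts had a density of 50, and so does the merged one.
print(density(a), density(b), density(merged))
```

Total population doubles when the districts are merged, so it is extensive; density is unchanged, so it is intensive and suitable for a choropleth map.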

Normalization


The problem with total counts arises when the districts are not all the same size (in either area or total population), as in the figure at right. Because a single color, representing a single value, is spread over the entire area of the district, large areas will be more dominant (i.e., higher in the visual hierarchy) than they should be, and are commonly misinterpreted as having larger values than smaller districts with the same color. To solve this issue, one can normalize the variable by dividing it by the total area, thus deriving density, which is a field.

Another misinterpretation can arise when one is mapping a variable that represents a subgroup of a larger population (e.g., a particular ethnicity). A choropleth map of this subgroup may show large total numbers in major cities, but it is unclear whether this is significant, because there are likely more of all of the other subgroups in the city as well. This too can be solved through normalization, by dividing the total in the subgroup by the total population to create a proportion (e.g., percent Hispanic).

Other valid forms of normalization for choropleth maps can be derived by computing ratios between two total amounts, such as rates of change (population growth = 2010 population / 2000 population) and mean allocations (mean family income = total income / total families), or other descriptive statistics such as the median or standard deviation.
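
The three kinds of normalization discussed in this section (density, proportion, and a ratio of two totals) can be sketched as follows. The district records, field names, and the `normalize` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical district records; all figures are invented for illustration.
districts = [
    {"name": "A", "area_sq_km": 50.0, "pop_2000": 80_000,
     "pop_2010": 100_000, "subgroup_2010": 25_000},
    {"name": "B", "area_sq_km": 2000.0, "pop_2000": 100_000,
     "pop_2010": 100_000, "subgroup_2010": 10_000},
]


def normalize(d):
    """Derive intensive variables from the raw totals of one district."""
    return {
        "name": d["name"],
        # Total divided by area gives a density.
        "density": d["pop_2010"] / d["area_sq_km"],
        # Subgroup divided by total population gives a proportion (percent).
        "subgroup_pct": 100.0 * d["subgroup_2010"] / d["pop_2010"],
        # Ratio of two totals gives a rate of change.
        "growth_ratio": d["pop_2010"] / d["pop_2000"],
    }


normalized = [normalize(d) for d in districts]
for row in normalized:
    print(row)
```

Both districts have the same 2010 population, so a map of totals would color them identically; the derived variables show that district A is far denser, has a larger subgroup share, and grew faster.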

Data Classification
The way data is classified and represented on a choropleth map determines how the data will be perceived and interpreted by the viewer. Choosing a classification method is an important decision because in most cases there are multiple available schemes that are equally valid but show the spatial patterns in the variable very differently. Common classification methods include:
 * Equal Interval (also known as Arithmetic Progression): classifying data by making all class ranges equal, so that the number of observations per class may vary. In normally distributed data, this tends to show more variation in (and thus draws attention to) the outlying values and groups the main cluster of "normal" values into fewer classes.
 * Quantile or Equal Frequency: classifying data by making the number of observations per class equal, so that the class ranges vary. In normally distributed data, this tends to show more variation in (and thus draws attention to) the main cluster of "normal" values, and groups the outliers into fewer classes.
 * Geometric Interval: the class breaks have an equal ratio (e.g., 1, 10, 100, 1000); useful for extremely skewed distributions.
 * Mean and Standard Deviations: can be used to classify data with a normal frequency distribution
 * Nested Means: similar to Mean and Standard Deviations, but does not require a normal frequency distribution
 * Natural Break Methods: grouping values by minimizing the within-class variance and maximizing the between-class variance; also called maximum homogeneity classification; uses traditional natural breaks or Jenks optimization
 * User Defined: user creates own classification system if specific divisions are required
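
To make the contrast between the first two methods concrete, here is a minimal sketch of equal interval and quantile classification. The function names are hypothetical, the quantile rule shown is one simple convention among several, and it assumes there are at least as many values as classes.

```python
import bisect


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding (nearly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k)] + [ordered[-1]]


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    return min(bisect.bisect_left(breaks, value), len(breaks) - 1)


def class_counts(values, breaks):
    """Number of observations that fall in each class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[classify(v, breaks)] += 1
    return counts


# Invented data: a tight cluster plus one extreme outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
ei = equal_interval_breaks(values, 3)
qt = quantile_breaks(values, 3)
print(ei, class_counts(values, ei))
print(qt, class_counts(values, qt))
```

With this data, equal interval puts the whole cluster into one class and leaves the middle class empty, while quantile spreads the cluster across classes and groups the outlier with the upper values, which is the behavior described in the list above.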

Color Scheme
When producing a choropleth map, the cartographer should choose a set of colors that clearly and intuitively portrays the geographic and statistical patterns in the data. In both choropleth maps and heat maps, the intensity of the color, varied through either saturation or hue, denotes the intensity of the variable. The resulting palette is referred to as a color gradient or sometimes a color ramp. A quantitative variable that has been classified is essentially an ordinal variable, and should thus be represented by colors that have a clear order that suggests "more" and "less." Value (light vs. dark colors) is probably the most intuitive way to do this. Conversely, a rainbow color scheme (one that varies only hue) is not advisable in a quantitative choropleth map because each color has the same visual "weight," so the spatial pattern will be hard to see in the map. Rainbow color schemes work better for nominal data. ColorBrewer, a tool created by Cynthia Brewer of Pennsylvania State University, is useful for choosing color schemes for choropleth maps.

Cartographers have developed several different types of color progression schemes to show choropleth data; each has advantages and disadvantages for portraying different kinds of patterns in different kinds of data.
 * Single-hue Sequential progressions fade from a dark shade of the chosen color to a very light shade of the same hue. This is a common method used to show the magnitude of the data being represented on the map. Shades of gray are a simple form of this progression.
 * Part-spectral Sequential progressions also use value as the primary difference, but also vary somewhat in hue (in order around the color wheel), such as progressing from a pale yellow to a medium orange to a dark red. This has two primary advantages if designed well: it makes the map more interesting and attractive; and the hue difference strengthens the contrast between the categories, making interpretation easier and/or allowing for more categories.
 * Bi-polar progressions are normally used with two opposite hues to show a change in value from negative to positive or emphasize values on either side of a central tendency, such as the mean of the variable being mapped or other significant values like room temperature. For example, a typical progression when mapping temperatures is from dark blue (for cold) to dark red (for hot) with white in the middle.
 * A Qualitative progression is often used when working with nominal or qualitative data. The colors shown on the map seem unrelated to one another or are arbitrarily chosen. For example, a choropleth map of "most prevalent religion" would be best shown with this type of scheme.
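
The sequential and bi-polar progressions above can be approximated by interpolating between anchor colors. The sketch below uses plain linear interpolation in RGB, which is an assumption made for brevity; it is not perceptually uniform, and carefully designed palettes such as those from ColorBrewer are generally preferable. The function names and anchor colors are hypothetical.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as #rrggbb."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_ramp(light, dark, k):
    """k colors running from light (low values) to dark (high values); k >= 2."""
    return [to_hex(lerp_color(light, dark, i / (k - 1))) for i in range(k)]


def diverging_ramp(low, mid, high, k):
    """k colors (odd k >= 3) running from low through mid to high."""
    half = k // 2
    lower = [lerp_color(low, mid, i / half) for i in range(half)]
    upper = [lerp_color(mid, high, i / half) for i in range(half + 1)]
    return [to_hex(c) for c in lower + upper]


WHITE = (255, 255, 255)
DARK_BLUE = (0, 0, 128)
DARK_RED = (128, 0, 0)

single_hue = sequential_ramp(WHITE, DARK_BLUE, 5)
bipolar = diverging_ramp(DARK_BLUE, WHITE, DARK_RED, 5)
print(single_hue)
print(bipolar)
```

The bi-polar ramp mirrors the temperature example: dark blue at one end, dark red at the other, and white for the central class.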

Dot Density Maps
Many dot density maps are based on the same data model as choropleth maps: pre-defined districts with aggregate attribute values. These can be considered a kind of choropleth map that "shades" each district using randomly placed dots instead of solid color fills. This kind of map has all the same inherent interpretation issues as the choropleth maps explained above. This is a very different conceptualization from dot density maps in which each dot represents the location of an individual feature, like a city, and in which apparent density is the result of spatial clustering of these features rather than statistical aggregation. In fact, some cartographers have argued that the former type should not even be termed a dot density map, to avoid confusion.
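
Random dot placement within a district can be sketched with rejection sampling: draw points in the district's bounding box and keep those that fall inside the polygon. The district shape, the dot ratio, and the function names below are hypothetical, and the point-in-polygon test is a basic ray-casting routine that ignores holes.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, count, rng):
    """Rejection-sample `count` points uniformly inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# Hypothetical L-shaped district with 1,200 people, one dot per 100 people.
district = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
population = 1_200
people_per_dot = 100
dots = random_dots(district, population // people_per_dot, random.Random(42))
print(len(dots))
```

Because the dots are placed at random, their positions carry no information about where people actually live within the district, which is why the interpretation issues of choropleth maps still apply.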

Choropleth Map Legends
The inclusion of a legend for a choropleth map is extremely important; without it, the colors have no meaning. All map colors and symbols represent a value, which is defined and explained in the map legend. The different values are represented in boxes within the legend and are usually listed vertically, but can be listed horizontally if the map is much wider than it is tall.

Frequency Histogram Legend
A frequency histogram legend is an alternative to the traditional choropleth legend. It consists of a histogram showing the statistical distribution of the variable, with each bar colored according to the map class in which it falls. While it is more difficult to construct in most GIS software, it has the advantage of helping map readers visualize both statistical and geographic patterns in the data simultaneously.
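
The data behind such a legend can be prepared as sketched below: values are binned into a histogram, and each bar takes the color of the map class that contains the midpoint of its bin. The function name, the class breaks, and the colors are hypothetical, and the sketch assumes the values are not all identical. In practice, bin edges are usually aligned with the class breaks so that no bar straddles two classes.

```python
import bisect


def histogram_legend(values, class_breaks, class_colors, n_bins):
    """Return (bin_start, bin_end, count, color) rows for a histogram legend.

    class_breaks are the upper bounds of the map classes, in ascending order;
    each bar takes the color of the class containing its bin midpoint.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    rows = []
    for i, count in enumerate(counts):
        start = lo + i * width
        mid = start + width / 2
        cls = min(bisect.bisect_left(class_breaks, mid), len(class_colors) - 1)
        rows.append((start, start + width, count, class_colors[cls]))
    return rows


# Invented data, three classes with arbitrary light-to-dark colors.
values = [2, 3, 5, 7, 8, 12, 13, 14, 18, 20]
colors = ["#fee8c8", "#fdbb84", "#e34a33"]
rows = histogram_legend(values, [8, 14, 20], colors, 6)
for start, end, count, color in rows:
    print(f"{start:5.1f}-{end:5.1f} {color} {'#' * count}")
```

The rows can then be drawn as colored bars by any plotting tool, placed alongside or in place of the usual list of class boxes.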