Modifiable areal unit problem

From Wiki.GIS.com

Background

The modifiable areal unit problem (MAUP) is a source of statistical bias that can radically affect the results of statistical hypothesis tests. Depending on how the areal units are drawn, the correlation, or association, between the same two variables can range from -0.99 to +0.99. MAUP arises when point-based measures of spatial phenomena (e.g., population density) are aggregated into districts: the resulting summary values (e.g., totals, rates, proportions) are influenced by the choice of district boundaries. For example, census data may be aggregated into census enumeration districts, postcode areas, police precincts, or any other spatial partition (thus, the 'areal units' are 'modifiable'). Variation in the spatial units used for aggregation causes variation in statistical results.

The issue was discovered in 1934. The term MAUP was coined and described in detail by Openshaw (1984), who lamented that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating" (Openshaw, 1984, p. 3). The problem is especially crucial when aggregate data are used for cluster analysis in spatial epidemiology, spatial statistics, or choropleth mapping, where misinterpretations can easily go unnoticed. Many fields of science, especially human geography, are prone to disregard the MAUP when drawing inferences from statistics based on aggregated data. MAUP is closely related to the ecological fallacy and ecological bias.

Ecological bias caused by MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. The scale effect causes variation in statistical results between different levels of aggregation. As a result, the measured association between variables depends on the size of the areal units for which data are reported. Generally, correlation increases as areal unit size increases. The zone effect describes variation in correlation statistics caused by regrouping the data into different configurations at the same scale.
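
The two effects can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not taken from any of the cited studies. The simulated data, the square-block zoning, and the helper names zone_means, aggregate_corr and grid_labels are all assumptions made for illustration. Two variables share a smooth spatial signal but carry large independent noise, so their cell-level correlation is weak; averaging within zones cancels much of the noise.

```python
import numpy as np

def zone_means(values, labels):
    """Mean of `values` within each zone; `labels` are zone ids 0..k-1."""
    return np.bincount(labels, weights=values) / np.bincount(labels)

def aggregate_corr(x, y, labels):
    """Pearson correlation between the zone means of x and y."""
    return np.corrcoef(zone_means(x, labels), zone_means(y, labels))[0, 1]

def grid_labels(n, block, shift=0):
    """Zone id for each cell of an n x n grid cut into block x block squares.

    `shift` moves the zone boundaries by that many cells (wrapping at the
    edges), which changes the zoning without changing the zone size.
    """
    rows, cols = np.indices((n, n))
    zone_row = ((rows + shift) % n) // block
    zone_col = ((cols + shift) % n) // block
    return (zone_row * (n // block) + zone_col).ravel()

# Simulated cell-level data (an assumption for illustration): x and y share
# a smooth spatial signal but carry large independent noise.
rng = np.random.default_rng(0)
n = 60
rows, cols = np.indices((n, n))
signal = np.sin(2 * np.pi * rows / n) + np.cos(2 * np.pi * cols / n)
x = (signal + rng.normal(0, 3, (n, n))).ravel()
y = (signal + rng.normal(0, 3, (n, n))).ravel()

# Scale effect: the same data aggregated into increasingly large zones
# (block size 1 means no aggregation).
scale_effect = {b: aggregate_corr(x, y, grid_labels(n, b)) for b in (1, 4, 12)}

# Zone effect: the same zone size, with the boundaries moved.
zone_effect = [aggregate_corr(x, y, grid_labels(n, 12, s)) for s in (0, 3, 6, 9)]

print("scale effect:", {b: round(r, 2) for b, r in scale_effect.items()})
print("zone effect: ", [round(r, 2) for r in zone_effect])
```

In this setup the correlation between zone means should rise with block size, and shifting the boundaries at a fixed block size should give a different correlation for each zoning. Real data need not behave so regularly.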

Research since the 1930s has found extra variation in statistical results because of MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. MAUP can be used as a methodology to calculate upper and lower limits, as well as average regression parameters, for multiple sets of spatial groupings.
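
The point about within-group and between-group variance can be made concrete with a short sketch. The data, the three groupings and the function name variance_split below are hypothetical and chosen only for illustration. For any single grouping the total sum of squares splits exactly into a within-group and a between-group part, but the split itself changes when the same values are regrouped, and that change is not captured by a calculation conditioned on one grouping.

```python
import numpy as np

def variance_split(values, labels):
    """Return (within_ss, between_ss) for one grouping of the values."""
    counts = np.bincount(labels)
    means = np.bincount(labels, weights=values) / counts
    between = np.sum(counts * (means - values.mean()) ** 2)
    within = np.sum((values - means[labels]) ** 2)
    return within, between

# Simulated transect (an assumption for illustration): a linear spatial
# trend plus independent noise at 120 locations.
rng = np.random.default_rng(1)
m = 120
position = np.arange(m)
values = 0.05 * position + rng.normal(0, 1, m)

groupings = {
    "contiguous": position // 10,               # 12 zones of 10 neighbours
    "shifted": ((position + 5) % m) // 10,      # boundaries moved by 5 cells
    "scattered": rng.permutation(position // 10),  # same sizes, no contiguity
}

total_ss = np.sum((values - values.mean()) ** 2)
splits = {name: variance_split(values, lab) for name, lab in groupings.items()}
shares = {name: between / total_ss for name, (within, between) in splits.items()}

for name, share in shares.items():
    print(f"{name:>10}: between-group share of variance = {share:.2f}")
```

With spatially structured values, contiguous groupings would be expected to place much more of the variance between groups than a scattered grouping with the same group sizes.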

Suggested solutions

Several suggestions have been made in the literature to reduce aggregation bias during regression analysis. A researcher might correct the variance-covariance matrix using samples from individual-level data (Holt et al., 1996). Alternatively, one might focus on local spatial regression rather than global regression, or attempt to design areal units that maximize a particular statistical result (Openshaw, 1984). Others have argued that it may be difficult to construct a single set of optimal aggregation units for multiple variables, each of which may exhibit non-stationarity and spatial autocorrelation in different ways. Further proposals include developing statistics that change across scales in a predictable way, perhaps using the fractal dimension as a scale-independent measure of spatial relationships, and using Bayesian hierarchical models as a general methodology for combining aggregated and individual-level data for ecological inference.

Studies of the MAUP based on empirical data can provide only limited insight because the relationships between multiple spatial variables cannot be controlled. Data simulation is necessary to control the various properties of individual-level data. Simulation studies such as those by Swift et al. (2008) have demonstrated that the spatial support of variables can affect the magnitude of ecological bias caused by spatial data aggregation.

MAUP sensitivity analysis

Using simulations for univariate data, Larsen (2000) advocated the use of a variance ratio to investigate the effect of spatial configuration, spatial association and data aggregation. A detailed description of the variation of statistics due to MAUP is presented by Reynolds (1998), whose research demonstrates the importance of the spatial arrangement and spatial autocorrelation of data values. Reynolds's simulation experiments were expanded by Swift (2009). A series of nine exercises begins with simulated regression analysis and a spatial trend, then focuses on MAUP in the context of spatial epidemiology. A method of MAUP sensitivity analysis is presented, demonstrating that MAUP is not entirely a problem: it can also be used as an analytical tool to help understand spatial heterogeneity and spatial autocorrelation.

This topic is of particular importance because, in some cases, data aggregation can obscure a strong correlation between variables, making the relationship appear weak or even negative. Conversely, MAUP can make unrelated random variables appear to be significantly associated when they are not. Multivariate regression parameters are more sensitive to MAUP than correlation coefficients. Until a more analytical solution to MAUP is available, spatial sensitivity analysis using a variety of areal units is recommended as a methodology to estimate the uncertainty of correlation and regression coefficients due to ecological bias.
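
One way such a sensitivity analysis might be set up is sketched below. It is not the procedure of any of the cited authors. The simulated data, the nearest-seed zoning scheme, the number of zones and zonings, and the function names random_zoning and zone_means are all assumptions made for illustration. The same cell-level data are aggregated under many alternative zonings, and the spread of the resulting correlation and regression slope is reported as an indication of uncertainty.

```python
import numpy as np

def random_zoning(coords, k, rng):
    """Assign each cell to the nearest of k randomly chosen seed cells.

    This yields k contiguous zones of varying size; every zone is
    non-empty because each seed cell is closest to itself.
    """
    seeds = coords[rng.choice(len(coords), size=k, replace=False)]
    dist2 = ((coords[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

def zone_means(values, labels):
    """Mean of `values` within each zone; `labels` are zone ids 0..k-1."""
    return np.bincount(labels, weights=values) / np.bincount(labels)

# Simulated cell-level data (an assumption for illustration): a shared
# smooth spatial signal plus large independent noise in x and y.
rng = np.random.default_rng(2)
n = 30
coords = np.indices((n, n)).reshape(2, -1).T
signal = np.sin(2 * np.pi * coords[:, 0] / n) + np.cos(2 * np.pi * coords[:, 1] / n)
x = signal + rng.normal(0, 3, n * n)
y = signal + rng.normal(0, 3, n * n)
cell_level_r = np.corrcoef(x, y)[0, 1]

# Re-aggregate the same data under 200 alternative zonings of 25 zones each.
results = []
for _ in range(200):
    labels = random_zoning(coords, 25, rng)
    mx, my = zone_means(x, labels), zone_means(y, labels)
    r = np.corrcoef(mx, my)[0, 1]
    slope = np.polyfit(mx, my, 1)[0]   # OLS slope of zone-mean y on zone-mean x
    results.append((r, slope))
results = np.array(results)

# Lower limit, average and upper limit across zonings.
summary = {
    name: (col.min(), col.mean(), col.max())
    for name, col in zip(("correlation", "slope"), results.T)
}

print(f"cell-level correlation: {cell_level_r:.2f}")
for name, (low, mean, high) in summary.items():
    print(f"{name:>11}: min={low:.2f} mean={mean:.2f} max={high:.2f}")
```

The width of the interval between the minimum and maximum gives a rough sense of how strongly a reported coefficient depends on the zoning. In practice the alternative zonings would more likely be existing administrative partitions than random ones.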
