Inferential Statistics

Much work in statistics is inferential: it uses information gathered from samples to draw general conclusions about a larger population. In standard statistical inference, samples are obtained independently. In geographic information systems, however, the dataset is often all the data there is for a given area; it is the population.

Many geographers work with data obtained from samples rather than with comprehensive data covering an entire population. When geographers measure these samples, they want the measurements to generalize to the entire population they are studying. Drawing conclusions about a population from a sample, along with an estimate of how accurate those conclusions are likely to be, is the task of inferential statistics.
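To make the idea concrete, here is a minimal sketch that measures a small sample and generalizes to a larger population by reporting the sample mean with an approximate 95% confidence interval. The "population" of elevations is simulated, and the function name `estimate_mean` is hypothetical; the interval uses the normal approximation (z = 1.96), which is an assumption that suits moderately large samples.

```python
import math
import random
import statistics

def estimate_mean(sample, z=1.96):
    """Return the sample mean and an approximate 95% confidence interval.

    Uses the normal approximation (z = 1.96); small samples would call for
    a t critical value instead.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean, (mean - z * se, mean + z * se)

# Hypothetical population: elevations (m) at 10,000 locations.
random.seed(42)
population = [random.gauss(250, 40) for _ in range(10_000)]

# Measure only 100 locations, then generalize to the whole population.
sample = random.sample(population, 100)
mean, (low, high) = estimate_mean(sample)
print(f"sample mean {mean:.1f} m, 95% CI {low:.1f} to {high:.1f} m")
print(f"true population mean {statistics.mean(population):.1f} m")
```

In a real study the population mean is unknown; the simulation only lets you check that the interval usually contains it.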

Sampling
It is impossible to replicate the real world in its entirety for experimentation. Instead, sampling is used to represent the complexity of the world at a smaller scale that is practical to analyze, and the results can then be extrapolated to the larger scale. The choice of sampling method matters, because geographic data are only as good as the sample from which they are created. Continuous surfaces have an infinite number of locations that could be examined, so it is necessary to measure only some of them and extrapolate from the values found there. In GIS, this is often done by generating random points, each with a random x and y value, and comparing them with the measured or observed dataset.
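A minimal sketch of generating random points from random x and y values and reading a continuous surface at those points. The bounding box, the function names, and the `elevation` function are all hypothetical; a simple formula stands in for a real raster or interpolated surface.

```python
import random

def random_points(n, xmin, ymin, xmax, ymax, seed=None):
    """Generate n points with uniformly random x and y inside a bounding box."""
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def sample_surface(surface, points):
    """Read a continuous surface (any function f(x, y)) at the sample points."""
    return [surface(x, y) for x, y in points]

# Hypothetical continuous surface: a smooth elevation-like trend.
def elevation(x, y):
    return 100 + 0.5 * x + 0.2 * y

pts = random_points(5, 0, 0, 1000, 1000, seed=1)
for (x, y), z in zip(pts, sample_surface(elevation, pts)):
    print(f"({x:7.1f}, {y:7.1f}) -> {z:6.1f}")
```

Fixing the seed makes the point set reproducible, which helps when the same sample must be revisited or shared.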

There are three main types of sampling:
 * Simple Random
 * Stratified
 * Systematic
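The three designs listed above can be sketched for a rectangular study area as follows. This is an illustration under simplifying assumptions: the strata are equal grid cells (in practice they might be land-cover classes or administrative zones), the systematic grid places one point at each cell centre, and all function names are hypothetical.

```python
import random

def simple_random(n, xmin, ymin, xmax, ymax, rng):
    """Simple random: every location in the box is equally likely to be chosen."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def systematic(nx, ny, xmin, ymin, xmax, ymax):
    """Systematic: a regular grid with one point at the centre of each cell."""
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for i in range(nx) for j in range(ny)]

def stratified(nx, ny, per_cell, xmin, ymin, xmax, ymax, rng):
    """Stratified: split the box into nx * ny strata and sample randomly in each."""
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    pts = []
    for i in range(nx):
        for j in range(ny):
            x0, y0 = xmin + i * dx, ymin + j * dy
            pts += simple_random(per_cell, x0, y0, x0 + dx, y0 + dy, rng)
    return pts

rng = random.Random(0)
print(len(simple_random(16, 0, 0, 100, 100, rng)))    # 16
print(systematic(2, 2, 0, 0, 100, 100))
# [(25.0, 25.0), (25.0, 75.0), (75.0, 25.0), (75.0, 75.0)]
print(len(stratified(4, 4, 1, 0, 0, 100, 100, rng)))  # 16, one per cell
```

Stratified sampling guarantees coverage of every stratum, which a simple random sample of the same size does not.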

Note that simple random and systematic sampling are only appropriate where each observation can be given equal weight. Where Tobler's First Law of Geography applies ("everything is related to everything else, but near things are more related than distant things") and the data are spatially autocorrelated, these methods are unlikely to represent the data accurately. Other sampling methods can address this; in stratified sampling, for example, areas that have a greater impact on the results can be sampled more frequently.
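One way to sample influential areas more frequently is to allocate a fixed total number of sample points among strata according to an importance weight rather than equally. The sketch below assumes such weights are already known; the stratum names and weights are hypothetical, and the largest-remainder rounding is just one reasonable way to make the counts sum to the total.

```python
def allocate(total, weights):
    """Split `total` sample points among strata in proportion to `weights`.

    Uses largest-remainder rounding so the counts add up exactly to `total`.
    """
    wsum = sum(weights.values())
    raw = {k: total * w / wsum for k, w in weights.items()}
    counts = {k: int(v) for k, v in raw.items()}
    leftover = total - sum(counts.values())
    # Give any remaining points to the strata with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts

# Hypothetical strata: the river corridor is judged to matter more to the
# result, so it receives three times the weight of the other strata.
weights = {"river corridor": 3, "farmland": 1, "upland": 1}
print(allocate(50, weights))  # {'river corridor': 30, 'farmland': 10, 'upland': 10}
```

The resulting counts can then be fed to a within-stratum random sampler, one call per stratum.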

These considerations apply most directly when collecting one's own data. In GIS, data are often collected by other parties, so it is important to review the metadata carefully in order to be aware of problems that may arise from the sampling technique used.

Hypothesis Testing
Hypothesis testing is the third step of the scientific method. It is the practice of evaluating a hypothesis against the null hypothesis, which typically states that there is no effect or no difference. The null hypothesis is rejected only if the p-value, the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is true, falls below a significance level chosen in advance (commonly 0.05). If the null hypothesis is rejected, the result is said to be statistically significant at that level. Randomization tests are particularly well suited to testing hypotheses about spatial data.
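A minimal sketch of a randomization (permutation) test, here comparing mean values at two groups of sites. The data and the function name are hypothetical. Under the null hypothesis the group labels are interchangeable, so shuffling them many times builds the distribution of differences expected by chance; the p-value is the share of shuffles at least as extreme as the observed difference. The same shuffle-and-compare idea can be applied to values among spatial locations.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_perm=10_000, seed=None):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random under H0
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so floating-point round-off cannot hide exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Count the observed labelling itself so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical temperatures (deg C) at urban and rural sites.
urban = [31.2, 29.8, 33.5, 30.9, 32.1, 34.0]
rural = [27.4, 28.9, 26.8, 29.1, 27.7, 28.2]

alpha = 0.05  # significance level fixed before looking at the data
p = permutation_test(urban, rural, seed=1)
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "fail to reject H0")
```

Because the test relies only on shuffling, it does not assume the data follow a normal distribution.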

In GIS, hypothesis testing can be used even when we analyze all the data there is about a given area rather than a sample, and it applies equally to sampled data. For example, one can take a survey of whom people will vote for in the next presidential election and apply scientific reasoning and a hypothesis-testing approach: we take a sample, then ask whether the evidence from the sample supports our hypothesis. The dataset can also be loaded into ArcMap and represented visually.
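For the voting example, a natural hypothesis is that a candidate has majority support. The sketch below runs a one-sided z-test for a proportion, H0: p = 0.5 against H1: p > 0.5, using the normal approximation to the binomial. The survey counts and the function name are hypothetical, and the approximation assumes a reasonably large sample.

```python
import math

def proportion_z_test(successes, n, p0=0.5):
    """One-sided z-test of H0: p = p0 against H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return p_hat, z, p_value

# Hypothetical survey: 540 of 1,000 sampled voters support candidate A.
p_hat, z, p_value = proportion_z_test(540, 1000)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

If the p-value is below the chosen significance level, the sample is taken as evidence that support exceeds 50% in the population.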

Some examples of hypothesis tests include Student's t-test and analysis of variance (ANOVA).
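As a sketch of what these two tests compute, the functions below return Student's two-sample t statistic (pooled variance, assuming equal variances) and the one-way ANOVA F statistic, each with its degrees of freedom. The function names and data are hypothetical. Converting the statistics to p-values requires the t and F distributions; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` perform the full tests.

```python
import math
import statistics

def student_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic and degrees of freedom

def one_way_anova(*groups):
    """One-way ANOVA F statistic: between-group vs within-group variability."""
    all_vals = [v for g in groups for v in g]
    grand = statistics.mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)  # statistic and (numerator, denominator) df

# Hypothetical soil pH readings in three land-use zones.
forest = [5.8, 6.0, 5.6, 5.9]
pasture = [6.4, 6.6, 6.3, 6.5]
cropland = [6.9, 7.1, 6.8, 7.0]
print(student_t(forest, pasture))
print(one_way_anova(forest, pasture, cropland))
```

With exactly two groups the two tests agree: the ANOVA F statistic equals the square of the t statistic.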