Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, such as to test for normality of residuals, whether two samples are drawn from identical distributions (see Kolmogorov-Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

Chi-Square Test

The one-sample chi-square test is used to test how well observed categorical data fit an expected distribution.

One-Sample: The chi-square statistic is the sum, over all categories, of the squared difference between the observed and expected outcome frequencies, divided by the expected frequency. The resulting value is compared to the chi-square distribution to determine the goodness of fit, that is, how well the observed data match the expected data.
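The one-sample calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name chi_square_statistic and the die-roll counts are hypothetical, and the critical value 11.07 is the standard table value for 5 degrees of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die that is assumed to be fair.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6  # 120 rolls / 6 faces

stat = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # categories minus one, no parameters estimated

# Table value of the chi-square distribution for dof = 5, alpha = 0.05.
CRITICAL_VALUE_DOF5_ALPHA05 = 11.07

print(f"chi-square = {stat:.2f}, dof = {dof}")
print("reject fit" if stat > CRITICAL_VALUE_DOF5_ALPHA05 else "fit not rejected")
```

A statistic below the critical value means the observed frequencies are consistent with the expected ones at the chosen significance level.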

Two-or-more samples: With two or more samples, Pearson's chi-square test determines whether the samples differ in how their observations are distributed across categories. The difference between observed and expected frequencies is still calculated, but the expected frequencies now come from the row and column totals of the combined table, so the samples are compared to each other rather than to a fixed distribution. An example is soil types on two different farms: the two farms are the samples, and the soil types are the categories being compared.
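The farm example can be sketched as a contingency-table calculation. The soil counts below are invented for illustration, and chi_square_independence is a hypothetical helper name; the expected count of each cell is taken as (row total * column total) / grand total.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Each row of `table` is one sample; each column is one category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count for the cell
            stat += (obs - exp) ** 2 / exp

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical soil samples: columns are clay, sand, loam.
farms = [
    [30, 10, 20],  # farm A
    [20, 20, 20],  # farm B
]

stat, dof = chi_square_independence(farms)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

With 2 degrees of freedom the 0.05 critical value is about 5.99, so in this made-up case the statistic (about 5.33) would not indicate a significant difference between the farms.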

Cohen's Kappa

Cohen's Kappa measures the agreement between two raters who each assign a finite number of subjects to categories, with agreement due to chance factored out.[1] For example, if two judges rate the concerts they attended over the summer, Cohen's Kappa quantifies how much the judges agreed with each other beyond what chance alone would produce.

Consider an example of calculating Cohen's Kappa for two concertgoers who rated each concert as "awesome", "terrible", or "just okay". Cohen's Kappa measures how much these two judges agree. The agreement expected by chance is calculated by multiplying the proportions of each judge's total ratings that fall in a given category, e.g. 10 * (34/74) * (20/74) ≈ 1.2.
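As a rough sketch, Kappa can be computed from a square table of rating counts using the usual definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance. The counts below are hypothetical and are not the data from the example above; cohens_kappa is an illustrative name.

```python
def cohens_kappa(table):
    """Compute Cohen's Kappa from a square table of counts.

    table[i][j] is the number of subjects that rater A placed in
    category i and rater B placed in category j.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # rater A's totals
    col_totals = [sum(col) for col in zip(*table)]  # rater B's totals

    # Observed agreement: proportion of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of the two raters' proportions per category.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 50 concerts; order: awesome, terrible, just okay.
ratings = [
    [10, 2, 3],
    [1, 12, 2],
    [4, 1, 15],
]

kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 37 of 50 concerts (p_o = 0.74), chance alone would give p_e = 0.34, and Kappa comes out at roughly 0.61.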

How to Interpret Cohen's Kappa

Kappa is always less than or equal to 1. A value of 1 implies perfect agreement, and values less than 1 imply less than perfect agreement. In rare situations, Kappa can be negative, which is a sign that the two observers agreed less than would be expected by chance alone. Perfect agreement is rare in practice, and opinions differ on what counts as a good level of agreement.

Here is one possible interpretation of Kappa.[2]

Poor agreement = Less than 0.20

Fair agreement = 0.20 to 0.40

Moderate agreement = 0.40 to 0.60

Good agreement = 0.60 to 0.80

Very good agreement = 0.80 to 1.00
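The scale above can be expressed as a small lookup. Because the listed ranges share their endpoints, this sketch assumes each lower bound is inclusive (so 0.40 counts as moderate); that convention and the name interpret_kappa are assumptions, not part of the cited scale.

```python
def interpret_kappa(kappa):
    """Map a Kappa value to the agreement labels listed above.

    Assumption: each band includes its lower bound.
    """
    if kappa > 1:
        raise ValueError("Kappa cannot exceed 1")
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "good"
    return "very good"


for value in (-0.10, 0.35, 0.61, 0.95):
    print(value, interpret_kappa(value))
```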

GIS&T Body of Knowledge

This topic is covered by sections AM1-2 and AM5-1 of the GIS&T Body of Knowledge.

References

1. http://www.real-statistics.com/reliability/cohens-kappa/
2. Simon, Steve. What is a Kappa Coefficient? - http://www.pmean.com/definitions/kappa.htm