Geographic Information Retrieval


Geographic Information Retrieval (GIR) or Geographical Information Retrieval is the augmentation of Information Retrieval with geographic metadata.

Information Retrieval generally views documents as a collection or "bag" of words. In contrast, Geographic Information Retrieval requires a small amount of semantic data to be present, namely a location or geographic feature associated with a document. Because of this, it is common in GIR to separate the text indexing and analysis from the geographic indexing.
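The separation of text indexing from geographic indexing can be sketched as two independent structures that are intersected at query time. This is a minimal illustration, not the design of any particular GIR system: the documents, the point footprints, the one-degree grid, and every function name below are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy documents: (doc_id, text, (lat, lon) point footprint).
DOCS = [
    (1, "harbour bridge climb tickets", (-33.85, 151.21)),
    (2, "bridge repair schedule", (51.51, -0.08)),
    (3, "museum opening hours", (51.52, -0.13)),
]

def build_text_index(docs):
    """Inverted index over the bag-of-words view: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text, _ in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the (row, col) grid cell that contains it."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def build_geo_index(docs, cell_deg=1.0):
    """Grid index over footprints: cell -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, _, (lat, lon) in docs:
        index[grid_cell(lat, lon, cell_deg)].add(doc_id)
    return index

def search(text_index, geo_index, term, lat, lon, cell_deg=1.0):
    """Return docs that contain the term AND fall in the query's grid cell."""
    text_hits = text_index.get(term, set())
    geo_hits = geo_index.get(grid_cell(lat, lon, cell_deg), set())
    return text_hits & geo_hits

text_index = build_text_index(DOCS)
geo_index = build_geo_index(DOCS)
print(search(text_index, geo_index, "bridge", 51.5, -0.1))
```

Because the two indexes are built independently, either one can be replaced (for example, swapping the grid for a tree-based spatial index) without touching the other. The sketch only looks in the query's own cell, so a document just across a cell boundary would be missed; a fuller version would also check neighbouring cells.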

GIR systems can commonly be broken down into the following stages: GeoTagging, Text and Geographic indexing, Data storage, Geographic relevance ranking (with respect to a geographic query), and Browsing results (commonly with a map interface).
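As an illustration of the geographic relevance ranking stage, the sketch below combines a text score with a distance-based score for a point query. The exponential distance decay, the 50 km scale, the equal weighting, and all function names are assumptions chosen for the example; real systems may use quite different footprint models and scoring functions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def geo_score(distance_km, scale_km=50.0):
    """Assumed decay: 1.0 at the query point, falling towards 0 with distance."""
    return math.exp(-distance_km / scale_km)

def rank(candidates, query_lat, query_lon, text_weight=0.5):
    """Order candidates by a weighted sum of text and geographic scores.

    candidates: list of (doc_id, text_score in [0, 1], lat, lon).
    """
    scored = []
    for doc_id, text_score, lat, lon in candidates:
        distance = haversine_km(query_lat, query_lon, lat, lon)
        combined = text_weight * text_score + (1 - text_weight) * geo_score(distance)
        scored.append((combined, doc_id))
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored]

# Hypothetical candidates for a query located in central London.
candidates = [
    ("far", 0.9, 48.86, 2.35),    # strong text match, but in Paris
    ("near", 0.6, 51.51, -0.08),  # weaker text match, close to the query
]
print(rank(candidates, 51.50, -0.12))
```

With these assumed weights the nearby document outranks the distant one despite its lower text score; raising `text_weight` shifts the balance back towards purely textual relevance.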

GIR Systems

GIR involves extracting and resolving the meaning of locations in unstructured text. This is known as Geoparsing. A few tools offer this kind of capability, including GeoLocator and MetaCarta's GeoTagger.
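The two steps of geoparsing, extracting place names and resolving each to a location, can be sketched with a toy gazetteer lookup. This is not how GeoLocator or MetaCarta's GeoTagger work; the gazetteer entries, their rough coordinates and population figures, and the "prefer the most populous candidate" rule are all assumptions made for illustration.

```python
import re

# Hypothetical toy gazetteer: lowercase name -> candidate places as
# (label, lat, lon, population). Values are rough and illustrative only.
GAZETTEER = {
    "paris": [
        ("Paris, France", 48.86, 2.35, 2_100_000),
        ("Paris, Texas, USA", 33.66, -95.56, 25_000),
    ],
    "edmonton": [
        ("Edmonton, Canada", 53.55, -113.49, 1_000_000),
    ],
}

def geoparse(text):
    """Find gazetteer names in text and resolve each to a single place.

    Returns a list of (surface_form, char_offset, label, lat, lon).
    """
    results = []
    for match in re.finditer(r"[A-Za-z]+", text):
        candidates = GAZETTEER.get(match.group().lower())
        if not candidates:
            continue  # word is not a known place name
        # Assumed disambiguation rule: pick the most populous candidate.
        label, lat, lon, _ = max(candidates, key=lambda c: c[3])
        results.append((match.group(), match.start(), label, lat, lon))
    return results

for hit in geoparse("The workshop moved from Edmonton to Paris."):
    print(hit)
```

Single-word matching keeps the sketch short but cannot handle multi-word names or words that are only sometimes place names; production geoparsers typically rely on surrounding context for both extraction and disambiguation.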

After identifying location references in text, a GIR system must index this information for search and retrieval. Only a few such systems exist: Google Maps, Tumba, MetaCarta's Geographic Text Search (GTS) system, and the EU-funded SPIRIT (Spatially-Aware Information Retrieval on the Internet) project.


In 2005, the Cross Language Evaluation Forum added a geographic track, GeoCLEF. GeoCLEF was the first TREC-style evaluation forum for GIR systems and gave participants a chance to compare systems.

A paper by Andras Kornai, "Evaluating geographic information retrieval" [1], describes issues involved in evaluating GIR systems.


In 2004, Chris Jones and Ross Purves held the first GIR workshop at SIGIR. Following its success, the workshop was repeated at CIKM in 2005 and 2008 and at SIGIR in 2006.

In 2003, a workshop on geographic references was held in conjunction with HLT-NAACL in Edmonton, Canada. The proceedings of the HLT-NAACL symposium are hosted by MetaCarta.

See also