To Learn or to Rule: Two Approaches for Extracting Geographical Information from Unstructured Text
Katz, P. and Schill, A.
Geographical data plays an important role on the Web:
recent search engine statistics regularly confirm that
a growing number of search queries have a locale context
or contain terms referring to locations. Assessing
geographical relevance for Web pages and text documents
requires information extraction techniques for
recognizing and disambiguating geographical entities
from unstructured text. We present a new corpus for
evaluation purposes, which we make publicly available
for research, describe two approaches for extracting
geographical entities from English text, one based
on heuristics, the other relying on machine learning
techniques, and provide an extensive discussion of
those two approaches. Furthermore, we compare our
approaches to other publicly available location extraction
services. Our results show that the presented
approaches outperform current state-of-the-art systems.
Cite as: Katz, P. and Schill, A. (2013). To Learn or to Rule: Two Approaches for Extracting Geographical Information from Unstructured Text. In Proc. Eleventh Australasian Data Mining Conference (AusDM13), Canberra, Australia. CRPIT, 146. Christen, P., Kennedy, P., Liu, L., Ong, K.L., Stranieri, A. and Zhao, Y. Eds., ACS. 117-127.