Toponym extraction and disambiguation are key topics recently addressed in the fields of information extraction and geographical information retrieval. The two processes are highly interdependent: not only does the effectiveness of toponym extraction affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). The HMM is used for extraction with high recall and low precision. An SVM is then used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments conducted on a set of holiday home descriptions, with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is also robust. Robustness is shown in three respects: language independence, high and low HMM threshold settings, and limited training data.
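The extract-then-filter idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a capitalization-plus-gazetteer heuristic stands in for the HMM, a linear rule with hand-set weights stands in for the trained SVM, and the gazetteer, the documents, the feature definitions, and all function names are hypothetical choices made only for illustration.

```python
import math

# Hypothetical toy gazetteer: toponym -> countries of its candidate referents.
GAZETTEER = {
    "Paris": ["FR"],
    "Lyon": ["FR"],
    "Nice": ["FR"],
    "Reading": ["GB"],
}

# Hypothetical tokenized holiday-home descriptions.
DOCS = [
    ["Nice", "villa", "near", "Lyon", "and", "Paris", "with", "Reading", "room"],
    ["Nice", "flat", "with", "Reading", "lamp"],
    ["Nice", "cottage", "with", "garden"],
    ["Quiet", "house", "with", "pool"],
]


def extract_candidates(tokens):
    """Stage 1 (stand-in for the HMM): favor recall over precision by
    keeping every capitalized token that the gazetteer knows."""
    return [t for t in tokens if t[:1].isupper() and t in GAZETTEER]


def idf(term, documents):
    """Informativeness feature: smoothed inverse document frequency."""
    df = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + df))


def coherence(term, candidates):
    """Coherence feature: share of the other candidates that have a
    referent in the same country as some referent of this term."""
    others = [c for c in candidates if c != term]
    if not others:
        return 0.0
    mine = set(GAZETTEER[term])
    return sum(1 for c in others if mine & set(GAZETTEER[c])) / len(others)


def filter_candidates(candidates, documents, w_idf=1.0, w_coh=1.0, bias=-1.0):
    """Stage 2 (stand-in for the SVM): a linear decision rule with
    hand-set, untrained weights that drops likely false positives."""
    kept = []
    for term in candidates:
        score = w_idf * idf(term, documents) + w_coh * coherence(term, candidates) + bias
        if score > 0:
            kept.append(term)
    return kept


candidates = extract_candidates(DOCS[0])
print(candidates)                           # ['Nice', 'Lyon', 'Paris', 'Reading']
print(filter_candidates(candidates, DOCS))  # ['Lyon', 'Paris']
```

In this toy run, "Nice" is dropped because it occurs in most documents and so scores low on informativeness, while "Reading" is dropped because none of its referents agrees with the other candidates. A trained classifier would learn such trade-offs from labeled data instead of relying on fixed weights.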
| Field | Value |
|---|---|
| Title of host publication | Proceedings of the International Conference on Language Processing and Intelligent Information Systems (LPIIS 2013), Warsaw, Poland |
| Place of Publication | Berlin |
| Publication status | Published - 1 Jun 2013 |
| Series | Lecture Notes in Computer Science |
- Toponym Extraction
- Toponym Disambiguation
- Hybrid System
- Multilingual Extraction and Disambiguation