TwitterNEED: A hybrid approach for named entity extraction and disambiguation for tweet

M. B. Habib; M. van Keulen

doi:10.1017/S1351324915000194

M. B. Habib^*, M. van Keulen

^*Corresponding author for this work

Robots, Agents, Interaction

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Twitter is a rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach for Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates using features derived from the disambiguation phase in addition to other word shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.
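The recall-first extraction followed by a filtering step, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base `TOY_KB`, the three features, and the hand-set `WEIGHTS` and `BIAS` are all hypothetical, and a fixed linear score stands in for the trained Support Vector Machine.

```python
import re

# Hypothetical toy knowledge base: lowercased surface form -> candidate entities.
TOY_KB = {
    "paris": ["Paris_(France)", "Paris_Hilton"],
    "paris hilton": ["Paris_Hilton"],
    "louvre": ["Louvre"],
}

def candidate_mentions(tweet, max_len=3):
    """Recall-first step: treat every n-gram of up to max_len tokens as a candidate."""
    tokens = re.findall(r"[#@]?\w+", tweet)
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            spans.append(" ".join(tokens[i:j]))
    return spans

def features(mention):
    """Word-shape and knowledge-base features, plus one disambiguation-derived feature."""
    key = mention.lstrip("#@").lower()
    entities = TOY_KB.get(key, [])
    return {
        "capitalized": float(mention[:1].isupper()),
        "in_kb": float(bool(entities)),
        # Disambiguation-derived: exactly one candidate entity for this mention.
        "unambiguous": float(len(entities) == 1),
    }

# Hand-set weights standing in for a trained SVM decision function (assumption).
WEIGHTS = {"capitalized": 0.5, "in_kb": 1.0, "unambiguous": 0.5}
BIAS = -1.2

def keep(mention):
    """Filtering step: keep a candidate only if its linear score is positive."""
    f = features(mention)
    score = sum(WEIGHTS[name] * value for name, value in f.items()) + BIAS
    return score > 0

def extract(tweet):
    """Generate candidates generously, then filter out likely false positives."""
    return [m for m in candidate_mentions(tweet) if keep(m)]
```

With these toy settings, `extract("just saw Paris Hilton at the Louvre")` keeps `Paris`, `Paris Hilton`, and `Louvre` and drops candidates such as `Hilton` or `the Louvre`. Resolving the overlap between `Paris` and `Paris Hilton` is left out of the sketch.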

Original language	English
Pages (from-to)	423-456
Number of pages	34
Journal	Natural Language Engineering
Volume	22
Issue number	3
DOIs	https://doi.org/10.1017/S1351324915000194
Publication status	Published - 1 May 2016

Keywords

Named entity recognition, named entity extraction, named entity linking, named entity disambiguation, microblogs, Twitter, tweets, short messages

Cite this

@article{9216f7d111aa4af9b9f8682c2e337df0,

title = "TwitterNEED: A hybrid approach for named entity extraction and disambiguation for tweet",

abstract = "Twitter is a rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach for Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates using features derived from the disambiguation phase in addition to other word shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.",

keywords = "Named entity recognition, named entity extraction, named entity linking, named entity disambiguation, microblogs, Twitter, tweets, short messages",

author = "Habib, M. B. and van Keulen, M.",

note = "http://eprints.eemcs.utwente.nl/26014/",

year = "2016",

month = may,

day = "1",

doi = "10.1017/S1351324915000194",

language = "English",

volume = "22",

pages = "423--456",

journal = "Natural Language Engineering",

issn = "1351-3249",

publisher = "Cambridge University Press",

number = "3",

}

TY - JOUR

T1 - TwitterNEED

T2 - A hybrid approach for named entity extraction and disambiguation for tweet

AU - Habib, M. B.

AU - van Keulen, M.

N1 - http://eprints.eemcs.utwente.nl/26014/

PY - 2016/5/1

Y1 - 2016/5/1

AB - Twitter is a rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach for Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates using features derived from the disambiguation phase in addition to other word shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.

KW - Named entity recognition

KW - named entity extraction

KW - named entity linking

KW - named entity disambiguation

KW - microblogs

KW - Twitter

KW - tweets

KW - short messages

U2 - 10.1017/S1351324915000194

DO - 10.1017/S1351324915000194

M3 - Article

SN - 1351-3249

VL - 22

SP - 423

EP - 456

JO - Natural Language Engineering

JF - Natural Language Engineering

IS - 3

ER -