Authorship Disambiguation and Alias Resolution in Email Data

Jan Scholtes, F. Maes

Research output: Contribution to conference › Paper › Academic

Abstract

Given a data set of email messages, we are interested in how to resolve aliases and disambiguate authors even if their names are misspelled, if they use completely different email addresses, or if they deliberately use aliases. We do this by combining string similarity metrics with techniques from authorship attribution and link analysis. These techniques are combined by a voting algorithm based on a Support Vector Machine. The approach is tested on a cleaned subset of the ENRON email data set. The results show that a combination of Jaro-Winkler email address similarity, a Support Vector Machine on writing style attributes, and Jaccard similarity of the link network outperforms each of these techniques used separately.
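
As a rough illustration of two of the similarity measures named in the abstract, the following sketch computes Jaro-Winkler similarity between two strings and Jaccard similarity between two sets of correspondents. It is not the authors' implementation: the function names, the prefix scale of 0.1, the prefix cap of 4, and the sample addresses are all assumptions made for this example, and the writing-style features and the SVM-based voting step are left out.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical addresses and contact sets, for illustration only.
address_score = jaro_winkler("jan.smith@example.com", "jan.smyth@example.com")
contacts_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
contacts_b = {"bob@example.com", "carol@example.com", "dave@example.com"}
link_score = jaccard(contacts_a, contacts_b)
print(address_score, link_score)
```

In a pipeline like the one the abstract describes, scores of this kind would be per-pair features passed to a combining classifier; how the paper weights or thresholds them is not stated in this record.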

Original language: English
Publication status: Published - 25 Oct 2012
Event: 24th Benelux Conference on Artificial Intelligence - Maastricht, Netherlands
Duration: 25 Oct 2012 - 26 Oct 2012

Conference

Conference: 24th Benelux Conference on Artificial Intelligence
Abbreviated title: BNAIC 2012
Country/Territory: Netherlands
City: Maastricht
Period: 25/10/12 - 26/10/12
