Whose story is it anyway? Automatic extraction of accounts from news articles

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Narratives comprise stories that provide insight into social processes. To analyse narratives more efficiently, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained using a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach that combines semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Attribution extraction was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that, relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22-14.20 percentage points (reaching as high as 92.60% on one data set). The use of Levin's verb classes in attribution extraction obtains the best F-score, outperforming a baseline method by 7.64-11.96 percentage points.
Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.
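
As an illustration of the final step described above, in which extracted events are grouped into accounts by the actor they are attributed to, the following is a minimal sketch in Python. It is not the authors' implementation: the `Event` and `Attribution` classes, the `build_accounts` function, the event type labels and the sample data are all hypothetical, and the sketch assumes that named entity recognition, event extraction and attribution extraction have already been run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    trigger: str      # word that signals the event in the text
    event_type: str   # hypothetical label, not taken from the paper's scheme


@dataclass(frozen=True)
class Attribution:
    source: str       # named entity the content is attributed to
    cue: str          # reporting verb linking source and content
    events: tuple     # events found inside the attributed content


def build_accounts(attributions):
    """Group attributed events by source, keeping order of appearance."""
    accounts = defaultdict(list)
    for attribution in attributions:
        accounts[attribution.source].extend(attribution.events)
    return dict(accounts)


# Invented sample data standing in for the output of the three extractors.
attributions = [
    Attribution("the council", "said", (Event("closure", "Closure"),)),
    Attribution("the union", "claimed", (Event("layoffs", "JobLoss"),)),
    Attribution("the council", "announced", (Event("investment", "Investment"),)),
]

accounts = build_accounts(attributions)
for source, events in accounts.items():
    print(source, [event.trigger for event in events])
```

Each key of `accounts` is one actor, and each value is the ordered list of events that actor reports, which is one simple way to represent the nested structure the abstract describes.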

Original language: English
Pages (from-to): 1837-1848
Number of pages: 12
Journal: Information Processing & Management
Volume: 56
Issue number: 5
Publication status: Published - 1 Sep 2019
Externally published: Yes

Keywords

  • Attribution extraction
  • Counter-narratives
  • Corpus annotation
  • Event extraction
  • Graphs
  • Named entity recognition
  • Narrative analysis
