TY - JOUR
T1 - Whose story is it anyway? Automatic extraction of accounts from news articles
AU - Zhang, Hao
AU - Boons, Frank
AU - Batista-Navarro, Riza
N1 - Funding Information: The research on which this article is based was partially funded by the Alliance Manchester Business School Strategic Investment Fund. Publisher Copyright: © 2019
PY - 2019/9/1
Y1 - 2019/9/1
N2 - Narratives comprise stories that provide insight into social processes. To facilitate more efficient analysis of narratives, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained based on a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining the use of semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Meanwhile, attribution extraction was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22-14.20 percentage points (reaching as high as 92.60% on one data set). Meanwhile, the use of Levin's verb classes in attribution extraction obtains optimal performance in terms of F-score, outperforming a baseline method by 7.64-11.96 percentage points. Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.
AB - Narratives comprise stories that provide insight into social processes. To facilitate more efficient analysis of narratives, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained based on a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining the use of semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Meanwhile, attribution extraction was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22-14.20 percentage points (reaching as high as 92.60% on one data set). Meanwhile, the use of Levin's verb classes in attribution extraction obtains optimal performance in terms of F-score, outperforming a baseline method by 7.64-11.96 percentage points. Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.
KW - Attribution extraction
KW - Counter-narratives
KW - Corpus annotation
KW - Event extraction
KW - Graphs
KW - Named entity recognition
KW - Narrative analysis
U2 - 10.1016/j.ipm.2019.02.012
DO - 10.1016/j.ipm.2019.02.012
M3 - Article
SN - 0306-4573
VL - 56
SP - 1837
EP - 1848
JO - Information Processing & Management
JF - Information Processing & Management
IS - 5
ER -