Whose story is it anyway? Automatic extraction of accounts from news articles

Hao Zhang, Frank Boons, Riza Batista-Navarro*

* Corresponding author for this work

doi:10.1016/j.ipm.2019.02.012

Research output: Contribution to journal (academic article, peer-reviewed)

Abstract

Narratives are composed of stories that provide insight into social processes. To make the analysis of narratives more efficient, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained using a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach that combines semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Attribution extraction, meanwhile, was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that, relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22–14.20 percentage points (reaching as high as 92.60% on one data set). Meanwhile, the use of Levin's verb classes in attribution extraction yields the best performance in terms of F-score, outperforming a baseline method by 7.64–11.96 percentage points.
Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.
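
The account-extraction idea summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hand-built dependency parse, the small reporting-verb set (standing in for Levin's verb classes), the toy event-trigger lexicon (standing in for the combination of semantic role labelling, FrameNet and event nouns) and the company name "Acme" are all hypothetical. The sketch treats an entity-tagged subject of a reporting verb as the source actor, searches the verb's clausal complement for event triggers, and groups the resulting events into one account per actor.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Token:
    idx: int        # 0-based position in the sentence
    text: str
    lemma: str
    dep: str        # dependency label (Universal Dependencies style)
    head: int       # index of the syntactic head; the root points to itself
    ent: str = ""   # entity label from an upstream NER step, "" if none

# Hypothetical stand-in for reporting verbs drawn from Levin's verb classes.
REPORTING_VERBS = {"say", "announce", "claim", "state", "report", "explain"}

# Hypothetical toy lexicon mapping verb and noun triggers to event types.
EVENT_TRIGGERS = {
    "close": "Closure", "closure": "Closure",
    "invest": "Investment", "investment": "Investment",
    "cut": "JobLoss", "redundancy": "JobLoss",
}

def children(sent, i):
    """Direct dependents of token i (excluding the root's self-loop)."""
    return [t for t in sent if t.head == i and t.idx != i]

def subtree(sent, i):
    """Indices of token i and all of its descendants."""
    seen, stack = set(), [i]
    while stack:
        j = stack.pop()
        seen.add(j)
        stack.extend(c.idx for c in children(sent, j))
    return seen

def extract_accounts(sentences):
    """Group events by the entity-tagged source actor that reports them."""
    accounts = defaultdict(list)
    for sent in sentences:
        for verb in sent:
            if verb.lemma not in REPORTING_VERBS:
                continue
            kids = children(sent, verb.idx)
            # Attribution: an entity-tagged subject is the source actor ...
            sources = [c for c in kids if c.dep == "nsubj" and c.ent]
            # ... and the clausal complement carries the reported content.
            contents = [c for c in kids if c.dep == "ccomp"]
            if not sources or not contents:
                continue
            span = set().union(*(subtree(sent, c.idx) for c in contents))
            # Event extraction (toy version): lexicon lookup inside the content.
            events = [(sent[j].text, EVENT_TRIGGERS[sent[j].lemma])
                      for j in sorted(span) if sent[j].lemma in EVENT_TRIGGERS]
            for src in sources:
                accounts[src.text].extend(events)
    return dict(accounts)

if __name__ == "__main__":
    # "Acme said the plant will close." (hypothetical sentence, hand-built parse)
    sent = [
        Token(0, "Acme", "Acme", "nsubj", 1, "ORG"),
        Token(1, "said", "say", "root", 1),
        Token(2, "the", "the", "det", 3),
        Token(3, "plant", "plant", "nsubj", 5),
        Token(4, "will", "will", "aux", 5),
        Token(5, "close", "close", "ccomp", 1),
        Token(6, ".", ".", "punct", 1),
    ]
    print(extract_accounts([sent]))  # {'Acme': [('close', 'Closure')]}
```

A real pipeline would obtain tokens, lemmas, dependency labels and entity tags from a trained parser and NER model rather than building them by hand. Restricting sources to entity-tagged subjects reflects the idea that each account belongs to a specific actor; a subject without an entity label (e.g. a generic "workers") produces no account in this sketch.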

Original language	English
Pages (from-to)	1837-1848
Number of pages	12
Journal	Information Processing & Management
Volume	56
Issue number	5
DOIs	https://doi.org/10.1016/j.ipm.2019.02.012
Publication status	Published - 1 Sept 2019
Externally published	Yes

Keywords

Attribution extraction
COUNTER-NARRATIVES
Corpus annotation
Event extraction
GRAPHS
Named entity recognition
Narrative analysis

Access to Document

10.1016/j.ipm.2019.02.012 (Licence: CC BY)

https://linkinghub.elsevier.com/retrieve/pii/S0306457318306101

Cite this

@article{a15ba3ae4e314b1092f10fb9c9eb26e2,

title = "Whose story is it anyway? Automatic extraction of accounts from news articles",

abstract = "Narratives are comprised of stories that provide insight into social processes. To facilitate the analysis of narratives in a more efficient manner, natural language processing (NLP) methods have been employed in order to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained based on a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining the use of semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Meanwhile, attribution extraction was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach optimises recall by 12.22--14.20 percentage points (reaching as high as 92.60\% on one data set). Meanwhile, the use of Levin's verb classes in attribution extraction obtains optimal performance in terms of F-score, outperforming a baseline method by 7.64--11.96 percentage points. 
Our proposed approach was applied on news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.",

keywords = "Attribution extraction, COUNTER-NARRATIVES, Corpus annotation, Event extraction, GRAPHS, Named entity recognition, Narrative analysis",

author = "Hao Zhang and Frank Boons and Riza Batista-Navarro",

note = "Funding Information: The research on which this article is based was partially funded by the Alliance Manchester Business School Strategic Investment Fund. Publisher Copyright: {\textcopyright} 2019",

year = "2019",

month = sep,

day = "1",

doi = "10.1016/j.ipm.2019.02.012",

language = "English",

volume = "56",

pages = "1837--1848",

journal = "Information Processing \& Management",

issn = "0306-4573",

publisher = "Elsevier Limited",

number = "5",

}

TY - JOUR

T1 - Whose story is it anyway? Automatic extraction of accounts from news articles

AU - Zhang, Hao

AU - Boons, Frank

AU - Batista-Navarro, Riza

PY - 2019/9/1

Y1 - 2019/9/1

N2 - Narratives are comprised of stories that provide insight into social processes. To facilitate the analysis of narratives in a more efficient manner, natural language processing (NLP) methods have been employed in order to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained based on a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining the use of semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Meanwhile, attribution extraction was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach optimises recall by 12.22-14.20 percentage points (reaching as high as 92.60% on one data set). Meanwhile, the use of Levin's verb classes in attribution extraction obtains optimal performance in terms of F-score, outperforming a baseline method by 7.64-11.96 percentage points. 
Our proposed approach was applied on news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.

KW - Attribution extraction

KW - COUNTER-NARRATIVES

KW - Corpus annotation

KW - Event extraction

KW - GRAPHS

KW - Named entity recognition

KW - Narrative analysis

U2 - 10.1016/j.ipm.2019.02.012

DO - 10.1016/j.ipm.2019.02.012

M3 - Article

SN - 0306-4573

VL - 56

SP - 1837

EP - 1848

JO - Information Processing & Management

JF - Information Processing & Management

IS - 5

ER -