Abusive Language on Social Media Through the Legal Looking Glass

Thales Costa Bertaglia; Andreea Grigoriu; Michel Dumontier; Gijs van Dijck

Corresponding author: Thales Costa Bertaglia

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Abusive language is a growing phenomenon on social media platforms. Its effects can reach beyond the online context, contributing to mental or emotional stress on users. Automatic tools for detecting abuse can alleviate the issue. In practice, developing automated methods to detect abusive language relies on good quality data. However, there is currently a lack of standards for creating datasets in the field. These standards include definitions of what is considered abusive language, annotation guidelines and reporting on the process. This paper introduces an annotation framework inspired by legal concepts to define abusive language in the context of online harassment. The framework uses a 7-point Likert scale for labelling instead of class labels. We also present ALYT – a dataset of Abusive Language on YouTube. ALYT includes YouTube comments in English extracted from videos on different controversial topics and labelled by Law students. The comments were sampled from the actual collected data, without artificial methods for increasing the abusive content. The paper describes the annotation process thoroughly, including all its guidelines and training steps.

Original language	English
Title of host publication	WOAH 2021: THE 5TH WORKSHOP ON ONLINE ABUSE AND HARMS
Publisher	Association for Computational Linguistics
Pages	191-200
Number of pages	10
ISBN (Print)	9781954085596
Publication status	Published - Jun 2021
Event	5th Workshop on Online Abuse and Harms (WOAH), Virtual. Duration: 5 Aug 2021 → 6 Aug 2021. https://www.aclweb.org/portal/content/fifth-workshop-online-abuse-and-harms

Conference

Conference	5th Workshop on Online Abuse and Harms (WOAH)
Abbreviated title	WOAH 5
Period	5/08/21 → 6/08/21
Internet address	https://www.aclweb.org/portal/content/fifth-workshop-online-abuse-and-harms

Access to Document

https://aclanthology.org/2021.woah-1.20.pdf

Cite this

@inproceedings{ff4b91ffa6b84333931209ffe1fb5896,
title = "Abusive Language on Social Media Through the Legal Looking Glass",
author = "{Costa Bertaglia}, Thales and Andreea Grigoriu and Michel Dumontier and {van Dijck}, Gijs",
year = "2021",
month = jun,
language = "English",
isbn = "9781954085596",
pages = "191--200",
booktitle = "WOAH 2021: THE 5TH WORKSHOP ON ONLINE ABUSE AND HARMS",
publisher = "Association for Computational Linguistics",
note = "5th Workshop on Online Abuse and Harms (WOAH), WOAH 5 ; Conference date: 05-08-2021 Through 06-08-2021",
url = "https://www.aclweb.org/portal/content/fifth-workshop-online-abuse-and-harms",
}

Abusive Language on Social Media Through the Legal Looking Glass. / Costa Bertaglia, Thales ; Grigoriu, Andreea ; Dumontier, Michel et al.
WOAH 2021: THE 5TH WORKSHOP ON ONLINE ABUSE AND HARMS. Association for Computational Linguistics, 2021. p. 191-200.

TY - GEN
T1 - Abusive Language on Social Media Through the Legal Looking Glass
AU - Costa Bertaglia, Thales
AU - Grigoriu, Andreea
AU - Dumontier, Michel
AU - van Dijck, Gijs
PY - 2021/6
Y1 - 2021/6
M3 - Conference article in proceeding
SN - 9781954085596
SP - 191
EP - 200
BT - WOAH 2021: THE 5TH WORKSHOP ON ONLINE ABUSE AND HARMS
PB - Association for Computational Linguistics
T2 - 5th Workshop on Online Abuse and Harms (WOAH)
Y2 - 5 August 2021 through 6 August 2021
ER -