Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives

Rohan Nanda; Giovanni Siragusa; Luigi Di Caro; Guido Boella; Lorenzo Grossio; Marco Gerbaudo; Francesco Costamagna

doi:10.1007/s10506-018-9236-y

Corresponding author: Rohan Nanda

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.

Original language	English
Pages (from-to)	199-225
Number of pages	27
Journal	Artif. Intell. Law
Volume	27
Issue number	2
DOIs	https://doi.org/10.1007/s10506-018-9236-y
Publication status	Published - Jun 2019
Externally published	Yes

Keywords

Machine learning
Text similarity
Transposition

Cite this

@article{070f2c1e5de142b2a74a4038b509db3b,

title = "Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives",

abstract = "The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.",

keywords = "Machine learning, Text similarity, Transposition",

author = "Rohan Nanda and Giovanni Siragusa and {Di Caro}, Luigi and Guido Boella and Lorenzo Grossio and Marco Gerbaudo and Francesco Costamagna",

note = "DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",

year = "2019",

month = jun,

doi = "10.1007/s10506-018-9236-y",

language = "English",

volume = "27",

pages = "199--225",

journal = "Artif. Intell. Law",

number = "2",

}

TY - JOUR

T1 - Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives

AU - Nanda, Rohan

AU - Siragusa, Giovanni

AU - Di Caro, Luigi

AU - Boella, Guido

AU - Grossio, Lorenzo

AU - Gerbaudo, Marco

AU - Costamagna, Francesco

N1 - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

PY - 2019/6

Y1 - 2019/6

AB - The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.

KW - Machine learning

KW - Text similarity

KW - Transposition

U2 - 10.1007/s10506-018-9236-y

DO - 10.1007/s10506-018-9236-y

M3 - Article

VL - 27

SP - 199

EP - 225

JO - Artif. Intell. Law

JF - Artif. Intell. Law

IS - 2

ER -