TY - CONF
T1 - A unifying similarity measure for automated identification of national implementations of European Union directives
AU - Nanda, Rohan
AU - Di Caro, Luigi
AU - Boella, Guido
AU - Konstantinov, Hristo
AU - Tyankov, Tenyo
AU - Traykov, Daniel
AU - Hristov, Hristo
AU - Costamagna, Francesco
AU - Humphreys, Llio
AU - Robaldo, Livio
AU - Romano, Michele
N1 - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2017
Y1 - 2017
N2 - This paper presents a unifying text similarity measure (USM) for automated identification of national implementations of European Union (EU) directives. The proposed model retrieves the transposed provisions of national law at a fine-grained level for each article of the directive. USM incorporates methods for matching common words, common sequences of words and approximate string matching. It was used for identifying transpositions on a multilingual corpus of four directives and their corresponding national implementing measures (NIMs) in three different languages: English, French and Italian. We further utilized a corpus of four additional directives and their corresponding NIMs in English language for a thorough test of the USM approach. We evaluated the model by comparing our results with a gold standard consisting of official correlation tables (where available) or correspondences manually identified by domain experts. Our results indicate that USM was able to identify transpositions with average F-score values of 0.808, 0.736 and 0.708 for French, Italian and English Directive-NIM pairs respectively in the multilingual corpus. A comparison with state-of-the-art methods for text similarity illustrates that USM achieves a higher F-score and recall across both the corpora.
AB - This paper presents a unifying text similarity measure (USM) for automated identification of national implementations of European Union (EU) directives. The proposed model retrieves the transposed provisions of national law at a fine-grained level for each article of the directive. USM incorporates methods for matching common words, common sequences of words and approximate string matching. It was used for identifying transpositions on a multilingual corpus of four directives and their corresponding national implementing measures (NIMs) in three different languages: English, French and Italian. We further utilized a corpus of four additional directives and their corresponding NIMs in English language for a thorough test of the USM approach. We evaluated the model by comparing our results with a gold standard consisting of official correlation tables (where available) or correspondences manually identified by domain experts. Our results indicate that USM was able to identify transpositions with average F-score values of 0.808, 0.736 and 0.708 for French, Italian and English Directive-NIM pairs respectively in the multilingual corpus. A comparison with state-of-the-art methods for text similarity illustrates that USM achieves a higher F-score and recall across both the corpora.
U2 - 10.1145/3086512.3086527
DO - 10.1145/3086512.3086527
M3 - Paper
SP - 149
EP - 158
ER -