A Statutory Article Retrieval Dataset in French

Antoine Louis*, Gerasimos Spanakis

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains largely untouched due to the scarcity of large-scale, high-quality annotated datasets. To address this bottleneck, we introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles. Using BSARD, we benchmark several state-of-the-art retrieval approaches, including lexical and dense architectures, in both zero-shot and supervised setups. We find that fine-tuned dense retrieval models significantly outperform other systems. Our best-performing baseline achieves 74.8% R@100, which is promising for the feasibility of the task and indicates there is still room for improvement. Given the specificity of its domain and task, BSARD presents a unique challenge for future research on legal information retrieval. Our dataset and source code are publicly available.
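The abstract reports retrieval quality as R@100 (recall at 100). The sketch below shows one plausible way to compute that kind of score. It assumes recall@k is taken per question as the fraction of its gold articles that appear among the top k retrieved articles, then averaged over questions (a macro average). The function names, the `run`/`qrels` dictionaries, and the article IDs are hypothetical and are not taken from the BSARD code release.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant article IDs found in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each question needs at least one relevant article")
    top_k = set(ranked[:k])  # a set, so duplicate IDs in a ranking are not double-counted
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 100
) -> float:
    """Macro-averaged recall@k over every question in qrels (assumed averaging scheme)."""
    # A question missing from the run gets an empty ranking, i.e. recall 0.
    scores = [recall_at_k(run.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data with hypothetical question and article IDs.
qrels = {"q1": {"art_12", "art_40"}, "q2": {"art_7"}}
run = {
    "q1": ["art_40", "art_3", "art_12"],
    "q2": ["art_5", "art_9", "art_1"],
}
print(mean_recall_at_k(run, qrels, k=2))  # q1: 1/2, q2: 0/1 -> 0.25
```

Because a legal question can be labeled with several relevant articles, recall is normalized by the size of each question's gold set rather than by k, which makes a large cutoff such as 100 a measure of how completely a retriever covers the relevant provisions.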
Original language: English
Title of host publication: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics
Pages: 6789–6803
Number of pages: 14
ISBN (Print): 9781955917216
Publication status: Published - May 2022
Event: 60th Annual Meeting of the Association for Computational Linguistics - Convention Centre Dublin, Dublin, Ireland
Duration: 22 May 2022 to 27 May 2022
https://www.2022.aclweb.org/

Publication series

Series: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN: 0736-587X

Conference

Conference: 60th Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 22/05/22 to 27/05/22

Keywords

  • ONLINE
