Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models

Antoine Louis; Gijs van Dijck; Gerasimos Spanakis

doi:10.1609/aaai.v38i20.30232

Research output: Contribution to journal › Conference article in journal › Academic › peer-review

Abstract

Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law question, utilizing a "retrieve-then-read" pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA datasets, LLeQA has the potential not only to accelerate research towards resolving a significant real-world issue, but also to act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.
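The "retrieve-then-read" pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF-style scorer, the prompt wording, and the names `retrieve`, `build_prompt`, `answer`, and `generate` are all assumptions made for illustration, and the generator is left as a caller-supplied function rather than a specific language model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def retrieve(question, provisions, k=2):
    """Rank provisions by a simple TF-IDF overlap with the question.

    `provisions` maps a hypothetical provision id to its text.
    Returns the ids of the top-k provisions with a non-zero score.
    """
    docs = {pid: Counter(tokenize(text)) for pid, text in provisions.items()}
    n_docs = len(docs)
    # Document frequency: in how many provisions each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)
    q_terms = set(tokenize(question))
    scores = {
        pid: sum(
            counts[t] * math.log(1 + n_docs / df[t])
            for t in q_terms
            if t in counts
        )
        for pid, counts in docs.items()
    }
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return [pid for pid in ranked[:k] if scores[pid] > 0]


def build_prompt(question, provisions, retrieved_ids):
    """Assemble the reader input: retrieved provisions, then the question."""
    context = "\n".join(f"[{pid}] {provisions[pid]}" for pid in retrieved_ids)
    return (
        "Answer the question using only the provisions below "
        "and cite their identifiers.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, provisions, generate, k=2):
    """Retrieve, then read. `generate` is any callable mapping prompt -> text."""
    ids = retrieve(question, provisions, k)
    prompt = build_prompt(question, provisions, ids)
    # Returning the ids alongside the answer exposes which provisions
    # the generated text was conditioned on.
    return generate(prompt), ids
```

In this sketch the answer is returned together with the identifiers of the retrieved provisions, which is one simple way to let a reader check a generated answer against its supporting text. A real system would replace the lexical scorer with a trained retriever and pass an actual language model as `generate`.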

Original language	English
Pages (from-to)	22266-22275
Number of pages	10
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue number	20
DOIs	https://doi.org/10.1609/aaai.v38i20.30232
Publication status	Published - 25 Mar 2024
Event	38th AAAI Conference on Artificial Intelligence 2024, Vancouver, Canada (20 Feb 2024 → 27 Feb 2024), https://aaai.org/aaai-conference/

Cite this

@article{f934e7f64af6416fbb0002ff56cfc851,

title = "Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models",

abstract = "Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law question, utilizing a {"}retrieve-then-read{"} pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA datasets, LLeQA has the potential not only to accelerate research towards resolving a significant real-world issue, but also to act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.",

author = "Antoine Louis and {van Dijck}, Gijs and Gerasimos Spanakis",

note = "Publisher Copyright: {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence 2024, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i20.30232",

language = "English",

volume = "38",

pages = "22266--22275",

journal = "Proceedings of the AAAI Conference on Artificial Intelligence",

issn = "2159-5399",

publisher = "PKP Publishing Services",

number = "20",

url = "https://aaai.org/aaai-conference/",

}

TY - JOUR

T1 - Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models

AU - Louis, Antoine

AU - van Dijck, Gijs

AU - Spanakis, Gerasimos

PY - 2024/3/25

Y1 - 2024/3/25

AB - Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law question, utilizing a "retrieve-then-read" pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA datasets, LLeQA has the potential not only to accelerate research towards resolving a significant real-world issue, but also to act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.

U2 - 10.1609/aaai.v38i20.30232

DO - 10.1609/aaai.v38i20.30232

M3 - Conference article in journal

SN - 2159-5399

VL - 38

SP - 22266

EP - 22275

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 20

T2 - 38th AAAI Conference on Artificial Intelligence 2024

Y2 - 20 February 2024 through 27 February 2024

ER -