Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceeding; Academic; peer-review


Abstract

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
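The abstract contrasts fusion in general with score fusion that uses carefully tuned weights. The sketch below illustrates both ideas with two widely used schemes. The first is reciprocal rank fusion (RRF), shown with the conventional constant k = 60. The second is a weighted convex combination of min-max normalized scores. This code is not taken from the paper, and it is an assumption that these two schemes are among the fusion methods evaluated there. The function names, run dictionaries and document IDs are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so different retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: give every document the same neutral score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_score_fusion(runs: list[dict[str, float]], weights: list[float]) -> list[tuple[str, float]]:
    """Weighted sum of min-max normalized scores; a document absent from a run contributes 0 for it."""
    fused: dict[str, float] = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def reciprocal_rank_fusion(runs: list[dict[str, float]], k: int = 60) -> list[tuple[str, float]]:
    """RRF: sum 1 / (k + rank) across runs; only ranks are used, so no score normalization is needed."""
    fused: dict[str, float] = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)  # best-scoring document gets rank 1
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical runs: a lexical retriever (e.g. BM25) and a dense retriever over legal articles.
    bm25 = {"art_1240": 12.3, "art_1382": 9.1, "art_544": 4.0}
    dense = {"art_1382": 0.82, "art_2279": 0.80, "art_1240": 0.78}

    print([d for d, _ in weighted_score_fusion([bm25, dense], [0.3, 0.7])])
    # → ['art_1382', 'art_2279', 'art_1240', 'art_544']
    print([d for d, _ in weighted_score_fusion([bm25, dense], [0.7, 0.3])])
    # → ['art_1382', 'art_1240', 'art_2279', 'art_544']
    print([d for d, _ in reciprocal_rank_fusion([bm25, dense])])
    # → ['art_1382', 'art_1240', 'art_2279', 'art_544']
```

Consider these hypothetical runs. Moving the weights from (0.3, 0.7) to (0.7, 0.3) swaps the second- and third-ranked documents, which shows why weighted score fusion depends on how the weights are set. RRF has no weights to tune, but it also discards the score margins between documents.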
Original language: English
Title of host publication: Proceedings of the 31st International Conference on Computational Linguistics
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Place of publication: Kerrville
Publisher: Association for Computational Linguistics (ACL)
Pages: 4293-4312
Number of pages: 20
ISBN (electronic): 9798891761964
Publication status: Published, 1 Jan 2025
Event: 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, United Arab Emirates
Duration: 19 Jan 2025 to 24 Jan 2025
https://coling2025.org/

Conference

Conference: 31st International Conference on Computational Linguistics, COLING 2025
Abbreviated title: COLING 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 19/01/25 to 24/01/25

