Investigating the effect of different fine-tuning configuration scenarios on agricultural term extraction using BERT

Hercules Panoutsopoulos*, Borja Espejo-Garcia, Stephan Raaijmakers, Xu Wang, Spyros Fountas, Christopher Brewster

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

This paper compares different transformer-based language models for automatic term extraction from agriculture-related texts. Agriculture is an important economic sector faced with severe environmental and societal challenges. The collection, annotation, and sharing of agricultural scientific knowledge is key to enabling the agricultural sector to address these challenges. Automatic term extraction is a Natural Language Processing task that can provide solutions to text tagging and annotation towards better knowledge and information exchange. It is concerned with the identification of terms pertaining to a domain, or area of expertise, in text, and is an important step in knowledge base creation and update pipelines. Transformer-based language modeling technologies like BERT have become popular for automatic term extraction, but limited work has been undertaken so far in applying these methods to agriculture. This paper systematically compares Agriculture-BERT to Sci-BERT, RoBERTa, and vanilla BERT, each fine-tuned for the automatic extraction of agricultural terms from English texts. The greatest challenge faced in our research was the scarcity of agriculture-related gold standard corpora for measuring automatic term extraction performance. Our results show that, with a few exceptions, Agriculture-BERT performs better than the other models considered in our research. The main contribution and novelty of the presented research is the investigation of the impact that different language model fine-tuning configuration scenarios have on the term extraction task. More specifically, we tested different scenarios related to which model layers are kept frozen, or updated, during training, to measure the impact they may have on Agriculture-BERT's performance in automatic term extraction.
Our results show that the best performance was achieved by: (i) the “embedding layer updated + all encoder layers updated” scenario for the identification of terms also seen during training; (ii) the “embedding layer frozen + all encoder layers updated” scenario for the identification of terms that are synonyms of those seen during training; and (iii) the “embedding layer updated + top 4 encoder layers updated” scenario for the identification of terms neither seen during training nor synonymous with seen terms (novel terms).
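The fine-tuning configuration scenarios above can be sketched in code. This is a minimal illustration, not the authors' actual implementation: it assumes the standard Hugging Face transformers BERT layer names (`bert.embeddings`, `bert.encoder.layer`), and uses a tiny randomly initialized config so it runs without downloading pretrained weights. The helper `apply_scenario` is a hypothetical name introduced here for illustration.

```python
# Sketch of the freeze/update scenarios using Hugging Face transformers.
# Assumption: standard BERT module layout (bert.embeddings, bert.encoder.layer).
from transformers import BertConfig, BertForTokenClassification

def apply_scenario(model, update_embeddings, top_k_encoder_layers=None):
    """Set requires_grad per parameter group.
    top_k_encoder_layers=None -> update all encoder layers;
    top_k_encoder_layers=4    -> update only the top 4, freeze the rest.
    """
    for p in model.bert.embeddings.parameters():
        p.requires_grad = update_embeddings
    layers = model.bert.encoder.layer
    for i, layer in enumerate(layers):
        trainable = (top_k_encoder_layers is None
                     or i >= len(layers) - top_k_encoder_layers)
        for p in layer.parameters():
            p.requires_grad = trainable
    return model

# Tiny config purely for illustration (no pretrained weights needed).
config = BertConfig(hidden_size=32, num_hidden_layers=6, num_attention_heads=2,
                    intermediate_size=64, vocab_size=100, num_labels=3)
model = BertForTokenClassification(config)

# Scenario (iii): embedding layer updated + top 4 encoder layers updated.
apply_scenario(model, update_embeddings=True, top_k_encoder_layers=4)
frozen = [i for i, layer in enumerate(model.bert.encoder.layer)
          if not next(layer.parameters()).requires_grad]
print(frozen)  # with 6 layers and top 4 trainable, layers 0 and 1 stay frozen
```

Scenario (i) would be `apply_scenario(model, True)` and scenario (ii) `apply_scenario(model, False)`; the frozen parameters are then excluded from gradient updates during fine-tuning.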
Original language: English
Article number: 109268
Journal: Computers and Electronics in Agriculture
Volume: 225
DOIs
Publication status: Published - Oct 2024

Keywords

  • Agriculture
  • Agriculture-BERT
  • Automatic term extraction
  • Fine tuning configuration scenarios
  • Silver standard corpus
