Automated anonymization of radiology reports: comparison of publicly available natural language processing and large language models

Marcel C. Langenbach*, Borek Foldyna, Ibrahim Hadzic, Isabel L. Langenbach*, Vineet K. Raghu, Michael T. Lu, Tomas G. Neilan, Julius C. Heemelaar

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Purpose: Medical reports, governed by HIPAA regulations, contain personal health information (PHI), restricting secondary data use. Utilizing natural language processing (NLP) and large language models (LLM), we sought to employ publicly available methods to automatically anonymize PHI in free-text radiology reports.

Materials and methods: We compared two publicly available rule-based NLP models (spaCy; NLPac, accuracy-optimized; NLPsp, speed-optimized; iteratively improved on 400 free-text CT-reports (test set)) and one offline LLM approach (LLM-model, LLaMa-2, Meta-AI) for PHI-anonymization. The three models were tested on 100 randomly selected chest CT reports. Two investigators assessed the anonymization of occurring PHI entities and whether clinical information was removed. Subsequently, precision, recall, and F1 scores were calculated.

Results: NLPac and NLPsp successfully removed all instances of dates (n = 333), medical record numbers (MRN) (n = 6), and accession numbers (ACC) (n = 92). The LLM model removed all MRNs, 96% of ACCs, and 32% of dates. NLPac was most consistent with a perfect F1-score of 1.00, followed by NLPsp with lower precision (0.86) and F1-score (0.92) for dates. The LLM model had perfect precision for MRNs, ACCs, and dates but the lowest recall for ACC (0.96) and dates (0.52), corresponding F1 scores of 0.98 and 0.68, respectively. Names were removed completely or majorly (i.e., one first or family name non-anonymized) in 100% (NLPac), 72% (NLPsp), and 90% (LLM-model). Importantly, NLPac and NLPsp did not remove medical information, while the LLM model did in 10% (n = 10).

Conclusion: Pre-trained NLP models can effectively anonymize free-text radiology reports, while anonymization with the LLM model is more prone to deleting medical information.

Key Points

  • Question: This study compares NLP and locally hosted LLM techniques to ensure PHI anonymization without losing clinical information.
  • Findings: Pre-trained NLP models effectively anonymized radiology reports without removing clinical data, while a locally hosted LLM was less reliable, risking the loss of important information.
  • Clinical relevance: Fast, reliable, automated anonymization of PHI from radiology reports enables HIPAA-compliant secondary use, facilitating advanced applications like LLM-driven radiology analysis while ensuring ethical handling of sensitive patient data.

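
The F1 scores quoted in the Results can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch rather than the authors' evaluation code: the helper name `f1_score` is hypothetical, and the recall of 1.00 for NLPsp on dates is inferred from the statement that it removed all date instances.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the abstract.
# Assumption: NLPsp recall for dates is 1.00 because all dates were removed.
reported = {
    "NLPsp, dates": (0.86, 1.00),
    "LLM model, ACC": (1.00, 0.96),
    "LLM model, dates": (1.00, 0.52),
}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, these values agree with the F1 scores of 0.92, 0.98, and 0.68 given in the abstract.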
Original language: English
Number of pages: 8
Journal: European Radiology
Publication status: Published - 1 Oct 2024

Keywords

  • Natural language processing
  • Electronic health records
  • Machine learning
  • Data anonymization
  • Diagnostic imaging
  • De-identification
