Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017

Jinzhong Yang*, Harini Veeraraghavan, Samuel G. Armato, Keyvan Farahani, Justin S. Kirby, Jayashree Kalpathy-Cramer, Wouter van Elmpt, Andre Dekker, Xiao Han, Xue Feng, Paul Aljabar, Bruno Oliveira, Brent van der Heyden, Leonid Zamdborg, Dao Lam, Mark Gooding, Gregory C. Sharp

*Corresponding author for this work

Research output: Contribution to journal / Article / Academic / peer-review


Abstract

Purpose: This report presents the methods and results of the Thoracic Auto-Segmentation Challenge organized at the 2017 Annual Meeting of the American Association of Physicists in Medicine (AAPM). The purpose of the challenge was to provide a benchmark dataset and platform for evaluating the performance of autosegmentation methods for organs at risk (OARs) in thoracic CT images.

Methods: Sixty thoracic CT scans provided by three different institutions were separated into 36 training, 12 offline testing, and 12 online testing scans. Eleven participants completed the offline challenge, and seven completed the online challenge. The OARs were the left and right lungs, heart, esophagus, and spinal cord. Clinical contours used for treatment planning were quality-checked and edited to adhere to the RTOG 1106 contouring guidelines. Algorithms were evaluated using the Dice coefficient, Hausdorff distance, and mean surface distance. A consolidated score was computed by normalizing the metrics against interrater variability and averaging over all patients and structures.
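The three evaluation metrics named above can be illustrated with a minimal pure-Python sketch. It assumes each structure is given as a set of voxel coordinates (for Dice) and as a list of surface points already scaled to millimetres (for the two distance metrics). Surface extraction, voxel-spacing handling, any percentile variant of the Hausdorff distance, and the interrater normalization used for the consolidated score are not shown. The function names are hypothetical, the pooled definition of mean surface distance is one common convention chosen here as an assumption, and this is not the challenge's evaluation code.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def dice(a: Set[Tuple[int, int, int]], b: Set[Tuple[int, int, int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, the distance to its nearest point in dst (brute force)."""
    if not src or not dst:
        raise ValueError("surfaces must be non-empty")
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff(s1: Sequence[Point], s2: Sequence[Point]) -> float:
    """Symmetric (maximum) Hausdorff distance between two surface point sets."""
    return max(max(directed_distances(s1, s2)), max(directed_distances(s2, s1)))


def mean_surface_distance(s1: Sequence[Point], s2: Sequence[Point]) -> float:
    """Mean of all nearest-neighbour distances pooled over both directions (assumed definition)."""
    d = directed_distances(s1, s2) + directed_distances(s2, s1)
    return sum(d) / len(d)


if __name__ == "__main__":
    # Toy example: two 2-voxel masks sharing one voxel.
    print(dice({(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)}))  # 0.5
    s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    s2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(hausdorff(s1, s2))              # 2.0
    print(mean_surface_distance(s1, s2))  # 0.75
```

The nearest-neighbour search here is quadratic in the number of surface points, which is fine for illustration; practical evaluation tools typically use distance transforms or spatial indexes (e.g. k-d trees) instead.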

Results: The interrater study revealed the highest variability in Dice for the esophagus and spinal cord, and in surface distances for the lungs and heart. Five of the seven algorithms that participated in the online challenge employed deep-learning methods. Although the top three participants, who used deep learning, produced the best segmentations for all structures, there was no significant difference in performance among them. The fourth-place participant used a multi-atlas-based approach. The highest Dice scores were obtained for the lungs, with averages ranging from 0.95 to 0.98, while the lowest were obtained for the esophagus, ranging from 0.55 to 0.72.

Conclusion: The results of the challenge showed that the lungs and heart can be segmented fairly accurately by various algorithms, while deep-learning methods performed better on the esophagus. Our dataset, together with the manual contours for all training cases, remains publicly available as an ongoing benchmarking resource.

Original language: English
Pages (from-to): 4568-4581
Number of pages: 14
Journal: Medical Physics
Volume: 45
Issue number: 10
Publication status: Published - Oct 2018

Keywords

  • automatic segmentation
  • grand challenge
  • lung cancer
  • radiation therapy
  • segmentation methods
  • atlas
  • cancer
  • radiotherapy
  • esophagus
  • guidance
  • therapy
  • risk
  • lung
  • head
