Automatic classification of dental artifact status for efficient image veracity checks: effects of image resolution and convolutional neural network depth

Mattea L. Welch; Chris McIntosh; Tom G. Purdie; Leonard Wee; Alberto Traverso; Andre Dekker; Benjamin Haibe-Kains; David A. Jaffray

doi:10.1088/1361-6560/ab5427

Mattea L. Welch^*, Chris McIntosh, Tom G. Purdie, Leonard Wee, Alberto Traverso, Andre Dekker, Benjamin Haibe-Kains, David A. Jaffray

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Enabling automated pipelines, image analysis and big data methodology in cancer clinics requires thorough understanding of the data. Automated quality assurance steps could improve the efficiency and robustness of these methods by verifying possible data biases. In particular, in head and neck (H&N) computed-tomography (CT) images, dental artifacts (DA) obscure visualization of structures and the accuracy of Hounsfield units; a challenge for image analysis tasks, including radiomics, where poor image quality can lead to systemic biases. In this work we analyze the performance of three-dimensional convolutional neural networks (CNN) trained to classify DA statuses. 1538 patient images were scored by a single observer as DA positive or negative. Stratified five-fold cross-validation was performed to train and test CNNs using various isotropic resampling grids (64³, 128³ and 256³), with CNN depths designed to produce 32³, 16³, and 8³ machine-generated features. These parameters were selected to determine if more computationally efficient CNNs could be utilized to achieve the same performance. The area under the precision-recall curve (PR-AUC) was used to assess CNN performance. The highest PR-AUC (0.92 ± 0.03) was achieved with a CNN depth = 5, resampling grid = 256³. The CNN performance with 256³ resampling grid size is not significantly better than 64³ and 128³ after 20 epochs, which had PR-AUC = 0.89 ± 0.03 (p-value = 0.28) and 0.91 ± 0.02 (p-value = 0.93) at depths of 3 and 4, respectively. Our experiments demonstrate the potential to automate specific quality assurance tasks required for unbiased and robust automated pipeline and image analysis research. Additionally, we determined that there is an opportunity to simplify CNNs with smaller resampling grids to make the process more amenable to very large datasets that will be available in the future.
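The relationship between resampling grid, CNN depth and feature-map size can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes that each depth level halves every spatial dimension (for example through stride-2 pooling), which the abstract does not state. The function names `feature_grid` and `summarize` are hypothetical. Under that assumption, the three configurations whose PR-AUC values are reported (grid 64 at depth 3, grid 128 at depth 4, grid 256 at depth 5) all end in 8 x 8 x 8 feature maps, while the input volume grows 64-fold between the smallest and largest grid.

```python
def feature_grid(grid: int, depth: int) -> int:
    """Edge length of the feature map after `depth` halvings.

    ASSUMPTION: each depth level halves every spatial dimension
    (e.g. stride-2 pooling). The abstract does not confirm this.
    """
    if depth < 0 or grid % (2 ** depth) != 0:
        raise ValueError("grid must be divisible by 2**depth")
    return grid // (2 ** depth)


def summarize(grids=(64, 128, 256), targets=(32, 16, 8)):
    """List (grid, depth, feature edge, input voxels) for each combination."""
    rows = []
    for g in grids:
        for t in targets:
            # g // t is a power of two here, so bit_length() - 1 is its log2.
            depth = (g // t).bit_length() - 1
            rows.append((g, depth, feature_grid(g, depth), g ** 3))
    return rows


if __name__ == "__main__":
    for g, d, t, voxels in summarize():
        print(f"grid {g}^3 ({voxels:,} voxels) -> depth {d} -> {t}^3 features")
```

Read this way, the smaller grids reach the same feature-map size with fewer levels and far fewer input voxels, which is one way to interpret the abstract's point about computational efficiency.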

Original language	English
Article number	015005
Number of pages	9
Journal	Physics in Medicine and Biology
Volume	65
Issue number	1
DOIs	https://doi.org/10.1088/1361-6560/ab5427
Publication status	Published - Jan 2020

Keywords

CT imaging
HEAD
RADIOMICS
REDUCTION
automation
deep learning
dental artifacts
quality classification
NECK COMPUTED-TOMOGRAPHY

Access to Document

10.1088/1361-6560/ab5427

Full text: Final published version, 781 KB. Licence: Taverne

Cite this

Welch, M. L., McIntosh, C., Purdie, T. G., Wee, L., Traverso, A., Dekker, A., Haibe-Kains, B., & Jaffray, D. A. (2020). Automatic classification of dental artifact status for efficient image veracity checks: effects of image resolution and convolutional neural network depth. Physics in Medicine and Biology, 65(1), Article 015005. https://doi.org/10.1088/1361-6560/ab5427

@article{ebb59271ef764e9691f7fcfe1eed6105,
title = "Automatic classification of dental artifact status for efficient image veracity checks: effects of image resolution and convolutional neural network depth",
abstract = "Enabling automated pipelines, image analysis and big data methodology in cancer clinics requires thorough understanding of the data. Automated quality assurance steps could improve the efficiency and robustness of these methods by verifying possible data biases. In particular, in head and neck (H\&N) computed-tomography (CT) images, dental artifacts (DA) obscure visualization of structures and the accuracy of Hounsfield units; a challenge for image analysis tasks, including radiomics, where poor image quality can lead to systemic biases. In this work we analyze the performance of three-dimensional convolutional neural networks (CNN) trained to classify DA statuses. 1538 patient images were scored by a single observer as DA positive or negative. Stratified five-fold cross-validation was performed to train and test CNNs using various isotropic resampling grids ($64^3$, $128^3$ and $256^3$), with CNN depths designed to produce $32^3$, $16^3$, and $8^3$ machine-generated features. These parameters were selected to determine if more computationally efficient CNNs could be utilized to achieve the same performance. The area under the precision-recall curve (PR-AUC) was used to assess CNN performance. The highest PR-AUC (0.92 $\pm$ 0.03) was achieved with a CNN depth = 5, resampling grid = $256^3$. The CNN performance with $256^3$ resampling grid size is not significantly better than $64^3$ and $128^3$ after 20 epochs, which had PR-AUC = 0.89 $\pm$ 0.03 (p-value = 0.28) and 0.91 $\pm$ 0.02 (p-value = 0.93) at depths of 3 and 4, respectively. Our experiments demonstrate the potential to automate specific quality assurance tasks required for unbiased and robust automated pipeline and image analysis research. Additionally, we determined that there is an opportunity to simplify CNNs with smaller resampling grids to make the process more amenable to very large datasets that will be available in the future.",
keywords = "CT imaging, HEAD, RADIOMICS, REDUCTION, automation, deep learning, dental artifacts, quality classification, NECK COMPUTED-TOMOGRAPHY",
author = "Welch, {Mattea L.} and Chris McIntosh and Purdie, {Tom G.} and Leonard Wee and Alberto Traverso and Andre Dekker and Benjamin Haibe-Kains and Jaffray, {David A.}",
note = "Publisher Copyright: {\textcopyright} 2020 Institute of Physics and Engineering in Medicine.",
year = "2020",
month = jan,
doi = "10.1088/1361-6560/ab5427",
language = "English",
volume = "65",
journal = "Physics in Medicine and Biology",
issn = "0031-9155",
publisher = "IOP Publishing Ltd.",
number = "1",
}

TY  - JOUR
T1  - Automatic classification of dental artifact status for efficient image veracity checks
T2  - effects of image resolution and convolutional neural network depth
AU  - Welch, Mattea L.
AU  - McIntosh, Chris
AU  - Purdie, Tom G.
AU  - Wee, Leonard
AU  - Traverso, Alberto
AU  - Dekker, Andre
AU  - Haibe-Kains, Benjamin
AU  - Jaffray, David A.
PY  - 2020/1
Y1  - 2020/1
N2  - Enabling automated pipelines, image analysis and big data methodology in cancer clinics requires thorough understanding of the data. Automated quality assurance steps could improve the efficiency and robustness of these methods by verifying possible data biases. In particular, in head and neck (H&N) computed-tomography (CT) images, dental artifacts (DA) obscure visualization of structures and the accuracy of Hounsfield units; a challenge for image analysis tasks, including radiomics, where poor image quality can lead to systemic biases. In this work we analyze the performance of three-dimensional convolutional neural networks (CNN) trained to classify DA statuses. 1538 patient images were scored by a single observer as DA positive or negative. Stratified five-fold cross-validation was performed to train and test CNNs using various isotropic resampling grids (64³, 128³ and 256³), with CNN depths designed to produce 32³, 16³, and 8³ machine-generated features. These parameters were selected to determine if more computationally efficient CNNs could be utilized to achieve the same performance. The area under the precision-recall curve (PR-AUC) was used to assess CNN performance. The highest PR-AUC (0.92 ± 0.03) was achieved with a CNN depth = 5, resampling grid = 256³. The CNN performance with 256³ resampling grid size is not significantly better than 64³ and 128³ after 20 epochs, which had PR-AUC = 0.89 ± 0.03 (p-value = 0.28) and 0.91 ± 0.02 (p-value = 0.93) at depths of 3 and 4, respectively. Our experiments demonstrate the potential to automate specific quality assurance tasks required for unbiased and robust automated pipeline and image analysis research. Additionally, we determined that there is an opportunity to simplify CNNs with smaller resampling grids to make the process more amenable to very large datasets that will be available in the future.
AB  - Enabling automated pipelines, image analysis and big data methodology in cancer clinics requires thorough understanding of the data. Automated quality assurance steps could improve the efficiency and robustness of these methods by verifying possible data biases. In particular, in head and neck (H&N) computed-tomography (CT) images, dental artifacts (DA) obscure visualization of structures and the accuracy of Hounsfield units; a challenge for image analysis tasks, including radiomics, where poor image quality can lead to systemic biases. In this work we analyze the performance of three-dimensional convolutional neural networks (CNN) trained to classify DA statuses. 1538 patient images were scored by a single observer as DA positive or negative. Stratified five-fold cross-validation was performed to train and test CNNs using various isotropic resampling grids (64³, 128³ and 256³), with CNN depths designed to produce 32³, 16³, and 8³ machine-generated features. These parameters were selected to determine if more computationally efficient CNNs could be utilized to achieve the same performance. The area under the precision-recall curve (PR-AUC) was used to assess CNN performance. The highest PR-AUC (0.92 ± 0.03) was achieved with a CNN depth = 5, resampling grid = 256³. The CNN performance with 256³ resampling grid size is not significantly better than 64³ and 128³ after 20 epochs, which had PR-AUC = 0.89 ± 0.03 (p-value = 0.28) and 0.91 ± 0.02 (p-value = 0.93) at depths of 3 and 4, respectively. Our experiments demonstrate the potential to automate specific quality assurance tasks required for unbiased and robust automated pipeline and image analysis research. Additionally, we determined that there is an opportunity to simplify CNNs with smaller resampling grids to make the process more amenable to very large datasets that will be available in the future.
KW  - CT imaging
KW  - HEAD
KW  - RADIOMICS
KW  - REDUCTION
KW  - automation
KW  - deep learning
KW  - dental artifacts
KW  - quality classification
KW  - NECK COMPUTED-TOMOGRAPHY
U2  - 10.1088/1361-6560/ab5427
DO  - 10.1088/1361-6560/ab5427
M3  - Article
C2  - 31683260
SN  - 0031-9155
VL  - 65
JO  - Physics in Medicine and Biology
JF  - Physics in Medicine and Biology
IS  - 1
M1  - 015005
ER  - 