Hippocampal volume change measurement: quantitative assessment of the reproducibility of expert manual outlining and the automated methods FreeSurfer and FIRST

Emma R Mulder; Remko A de Jong; Dirk L Knol; Ronald A van Schijndel; Keith S Cover; Pieter J Visser; Frederik Barkhof; Alzheimer's Disease Neuroimaging Initiative; Hugo Vrenken

doi:10.1016/j.neuroimage.2014.01.058

Corresponding author: Hugo Vrenken

MHeNs - Cognitive Neuropsychiatry and Clinical Neuroscience

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

BACKGROUND: Expert manual delineation is often used to measure hippocampal volume change in Alzheimer's disease (AD) or mild cognitive impairment (MCI) because of its supposed accuracy. It has been suggested that expert outlining is less reproducible than automated methods, but this has not been investigated.

AIM: To determine the reproducibility of expert manual outlining and of two common automated methods for measuring hippocampal atrophy rates in healthy aging, MCI and AD.

METHODS: From the Alzheimer's Disease Neuroimaging Initiative (ADNI), 80 subjects were selected: 20 patients with AD, 40 patients with MCI and 20 healthy controls (HCs). Left and right hippocampal volume change between the baseline and month-12 visits was assessed using expert manual delineation and the automated software packages FreeSurfer (longitudinal processing stream) and FIRST. To assess the reproducibility of the measured hippocampal volume change, both back-to-back (BTB) MPRAGE scans available for each visit were analyzed. Hippocampal volume change was expressed in μL and as a percentage of baseline volume. Reproducibility of the 1-year hippocampal volume change was estimated from the BTB measurements using a linear mixed model to calculate the limits of agreement (LoA) of each method, which reflect its measurement uncertainty. Using the delta method, approximate p-values were calculated for the pairwise comparisons between methods. Statistical analyses were performed both with and without visibly incorrect segmentations.

RESULTS: Visibly incorrect automated segmentation in one or both scans of a longitudinal scan pair occurred in 7.5% of the hippocampi for FreeSurfer and in 6.9% of the hippocampi for FIRST. After excluding these failed cases, reproducibility analysis for 1-year percentage volume change yielded LoA of ±7.2% for FreeSurfer, ±9.7% for expert manual delineation, and ±10.0% for FIRST. Methods ranked the same for reproducibility of 1-year μL volume change, with LoA of ±218 μL for FreeSurfer, ±319 μL for expert manual delineation, and ±333 μL for FIRST. Approximate p-values indicated that reproducibility was better for FreeSurfer than for manual delineation or FIRST, and that manual delineation and FIRST did not differ. Including the failed automated segmentations worsened the reproducibility of both automated methods for both 1-year raw (μL) and percentage volume change.

CONCLUSION: Quantitative reproducibility values of 1-year microliter and percentage hippocampal volume change were roughly similar among expert manual outlining, FIRST and FreeSurfer, but FreeSurfer reproducibility was statistically significantly superior to that of both manual outlining and FIRST after exclusion of failed segmentations.
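The two quantities reported in the abstract, percentage volume change and limits of agreement from back-to-back scans, can be illustrated with a minimal sketch. It is not the authors' analysis: the study estimated LoA with a linear mixed model, whereas the code below assumes a simpler Bland-Altman-style estimate (1.96 times the standard deviation of the paired differences). The function names and the example volumes are hypothetical and chosen only for illustration.

```python
import statistics


def percent_change(baseline_ul, followup_ul):
    """Volume change expressed as a percentage of baseline volume."""
    return 100.0 * (followup_ul - baseline_ul) / baseline_ul


def loa_half_width(change_scan_a, change_scan_b):
    """Half-width of approximate 95% limits of agreement.

    Simplified Bland-Altman-style estimate (an assumption, not the
    linear mixed model used in the study): 1.96 times the sample SD of
    the differences between the two back-to-back change measurements.
    """
    diffs = [a - b for a, b in zip(change_scan_a, change_scan_b)]
    return 1.96 * statistics.stdev(diffs)


# Hypothetical 1-year volume changes (in microliters) for three
# hippocampi, measured once on each of the two back-to-back scans.
change_a = [-100.0, -50.0, -80.0]
change_b = [-90.0, -70.0, -60.0]

print(percent_change(3000.0, 2900.0))
print(loa_half_width(change_a, change_b))
```

A smaller half-width means the two back-to-back measurements of the same change agree more closely, which is the sense in which the abstract ranks FreeSurfer ahead of manual delineation and FIRST.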

Original language	English
Pages (from-to)	169-181
Number of pages	13
Journal	Neuroimage
Volume	92
DOIs	https://doi.org/10.1016/j.neuroimage.2014.01.058
Publication status	Published - 15 May 2014

Keywords

Aged
Algorithms
Alzheimer Disease
Artificial Intelligence
Atrophy
Female
Hippocampus
Humans
Image Enhancement
Image Interpretation, Computer-Assisted
Imaging, Three-Dimensional
Magnetic Resonance Imaging
Male
Mild Cognitive Impairment
Observer Variation
Organ Size
Pattern Recognition, Automated
Reproducibility of Results
Sensitivity and Specificity
Software
Software Validation


Cite this

Mulder, E. R., de Jong, R. A., Knol, D. L., van Schijndel, R. A., Cover, K. S., Visser, P. J., Barkhof, F., Alzheimer's Disease Neuroimaging Initiative, & Vrenken, H. (2014). Hippocampal volume change measurement: quantitative assessment of the reproducibility of expert manual outlining and the automated methods FreeSurfer and FIRST. Neuroimage, 92, 169-81. https://doi.org/10.1016/j.neuroimage.2014.01.058

@article{3bb4d1ab40e44843a9e98a4e7ddeba9f,
title = "Hippocampal volume change measurement: quantitative assessment of the reproducibility of expert manual outlining and the automated methods FreeSurfer and FIRST",
keywords = "Aged, Algorithms, Alzheimer Disease, Artificial Intelligence, Atrophy, Female, Hippocampus, Humans, Image Enhancement, Image Interpretation, Computer-Assisted, Imaging, Three-Dimensional, Magnetic Resonance Imaging, Male, Mild Cognitive Impairment, Observer Variation, Organ Size, Pattern Recognition, Automated, Reproducibility of Results, Sensitivity and Specificity, Software, Software Validation",
author = "Mulder, {Emma R} and {de Jong}, {Remko A} and Knol, {Dirk L} and {van Schijndel}, {Ronald A} and Cover, {Keith S} and Visser, {Pieter J} and Barkhof, Frederik and {Alzheimer's Disease Neuroimaging Initiative} and Vrenken, Hugo",
year = "2014",
month = may,
day = "15",
doi = "10.1016/j.neuroimage.2014.01.058",
language = "English",
volume = "92",
pages = "169--181",
journal = "Neuroimage",
issn = "1053-8119",
publisher = "Elsevier Science",
}

Mulder, ER, de Jong, RA, Knol, DL, van Schijndel, RA, Cover, KS, Visser, PJ, Barkhof, F, Alzheimer's Disease Neuroimaging Initiative & Vrenken, H 2014, 'Hippocampal volume change measurement: quantitative assessment of the reproducibility of expert manual outlining and the automated methods FreeSurfer and FIRST', Neuroimage, vol. 92, pp. 169-81. https://doi.org/10.1016/j.neuroimage.2014.01.058

TY - JOUR
T1 - Hippocampal volume change measurement
T2 - quantitative assessment of the reproducibility of expert manual outlining and the automated methods FreeSurfer and FIRST
AU - Mulder, Emma R
AU - de Jong, Remko A
AU - Knol, Dirk L
AU - van Schijndel, Ronald A
AU - Cover, Keith S
AU - Visser, Pieter J
AU - Barkhof, Frederik
AU - Alzheimer's Disease Neuroimaging Initiative
AU - Vrenken, Hugo
PY - 2014/5/15
Y1 - 2014/5/15
KW - Aged
KW - Algorithms
KW - Alzheimer Disease
KW - Artificial Intelligence
KW - Atrophy
KW - Female
KW - Hippocampus
KW - Humans
KW - Image Enhancement
KW - Image Interpretation, Computer-Assisted
KW - Imaging, Three-Dimensional
KW - Magnetic Resonance Imaging
KW - Male
KW - Mild Cognitive Impairment
KW - Observer Variation
KW - Organ Size
KW - Pattern Recognition, Automated
KW - Reproducibility of Results
KW - Sensitivity and Specificity
KW - Software
KW - Software Validation
U2 - 10.1016/j.neuroimage.2014.01.058
DO - 10.1016/j.neuroimage.2014.01.058
M3 - Article
C2 - 24521851
SN - 1053-8119
VL - 92
SP - 169
EP - 181
JO - Neuroimage
JF - Neuroimage
ER -