Recently, there has been increased interest in the use of automatically segmented subfields of the human hippocampal formation derived from magnetic resonance imaging (MRI). However, little is known about the test‐retest reproducibility of such measures, particularly in the context of multisite studies. Here, we report the reproducibility of automated FreeSurfer hippocampal subfield segmentations in 65 healthy elderly participants enrolled in a consortium of 13 3T MRI sites (five subjects per site). Participants were scanned in two sessions (test and retest) at least one week apart. Each session included two anatomical 3D T1 MRI acquisitions harmonized across the consortium. We evaluated the test‐retest reproducibility of subfield segmentation (i) to assess the effects of averaging two within‐session T1 images and (ii) to compare the volumetric and spatial reliability of the subfields with that of the whole hippocampus. We found that within‐session averaging of two T1 images significantly improved the reproducibility of all hippocampal subfields but not that of the whole hippocampus. Volumetric and spatial reproducibility across MRI sites were very good for the whole hippocampus, CA2‐3, CA4‐dentate gyrus (DG), and subiculum (reproducibility error ∼ 2% and Dice > 0.90), good for CA1 and presubiculum (reproducibility error ∼ 5% and Dice ∼ 0.90), and poorer for fimbria and hippocampal fissure (reproducibility error ∼ 15% and Dice < 0.80). Spearman's correlations confirmed that test‐retest reproducibility improved with volume size. Despite considerable differences in MRI scanner configurations, we found consistent hippocampal subfield volume estimates. CA2‐3, CA4‐DG, and sub‐CA1 (subiculum, presubiculum, and CA1 pooled together) gave test‐retest reproducibility similar to that of the whole hippocampus. Our findings suggest that the volumes of the larger hippocampal subfields may be reliable longitudinal markers in multisite studies. Hum Brain Mapp 36:3516–3527, 2015.
- within‐session T1 averaging
- test-retest reproducibility
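
The abstract reports two kinds of reproducibility measure without giving their formulas: a volumetric reproducibility error (in percent) and a Dice coefficient for spatial overlap. The sketch below shows one common way such quantities are computed. It is an illustration under stated assumptions, not the study's own implementation: the reproducibility error is assumed to be the absolute test‐retest volume difference relative to the mean of the two volumes, and the function names, the representation of masks as sets of voxel coordinates, and the toy values are all hypothetical.

```python
def reproducibility_error(vol_test, vol_retest):
    """Absolute test-retest volume difference as a percentage of the mean volume.

    Assumed definition; the abstract does not state the exact formula.
    """
    mean_vol = (vol_test + vol_retest) / 2.0
    if mean_vol == 0:
        raise ValueError("mean volume is zero; the error is undefined")
    return 100.0 * abs(vol_test - vol_retest) / mean_vol


def dice(mask_a, mask_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two masks.

    Each mask is an iterable of voxel coordinates (hypothetical representation).
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy values, made up for illustration only (not data from the study).
test_volume, retest_volume = 102.0, 98.0
test_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
retest_mask = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}

error_percent = reproducibility_error(test_volume, retest_volume)
overlap = dice(test_mask, retest_mask)
```

Under these definitions, a lower reproducibility error and a Dice value closer to 1 both indicate better test‐retest agreement. Because the same absolute volume difference or boundary shift weighs more heavily on a small structure, measures of this kind are consistent with the reported trend that reproducibility improves with volume size.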