TY - JOUR
T1 - Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation
AU - Steyaert, Wouter
AU - Haer-Wigman, Lonneke
AU - Pfundt, Rolph
AU - Hellebrekers, Debby
AU - Steehouwer, Marloes
AU - Hampstead, Juliet
AU - de Boer, Elke
AU - Stegmann, Alexander
AU - Yntema, Helger
AU - Kamsteeg, Erik Jan
AU - Brunner, Han
AU - Hoischen, Alexander
AU - Gilissen, Christian
N1 - Funding Information:
The aims of this study contribute to the Solve-RD project (to H.B., A.H. and C.G.) which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779257.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/10/27
Y1 - 2023/10/27
N2 - The short lengths of short-read sequencing reads challenge the analysis of paralogous genomic regions in exome and genome sequencing data. Most genetic variants within these homologous regions therefore remain unidentified in standard analyses. Here, we present a method (Chameleolyser) that accurately identifies single nucleotide variants and small insertions/deletions (SNVs/Indels), copy number variants and ectopic gene conversion events in duplicated genomic regions using whole-exome sequencing data. Application to a cohort of 41,755 exome samples yields 20,432 rare homozygous deletions and 2,529,791 rare SNVs/Indels, of which we show that 338,084 are due to gene conversion events. None of the SNVs/Indels are detectable using regular analysis techniques. Validation by high-fidelity long-read sequencing in 20 samples confirms >88% of called variants. Focusing on variation in known disease genes leads to a direct molecular diagnosis in 25 previously undiagnosed patients. Our method can readily be applied to existing exome data.
U2 - 10.1038/s41467-023-42531-9
DO - 10.1038/s41467-023-42531-9
M3 - Article
SN - 2041-1723
VL - 14
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 6845
ER -