Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs

A.G. Heidema; E.J. Feskens; P.A. Doevendans; H.J. Ruven; H.C. van Houwelingen; E.C. Mariman; J.M. Boer

doi:10.1002/gepi.20251

Corresponding author: A.G. Heidema

NUTRIM School of Nutrition and Translational Research in Metabolism

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p<.01) and MDR (p<.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.

Original language	English
Pages (from-to)	910-921
Journal	Genetic Epidemiology
Volume	31
Issue number	8
DOIs	https://doi.org/10.1002/gepi.20251
Publication status	Published - 1 Jan 2007

Cite this

@article{29211bc479f14e9db6c9622a2b780048,
title = "Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs",
abstract = "Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p<.01) and MDR (p<.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.",
author = "A.G. Heidema and E.J. Feskens and P.A. Doevendans and H.J. Ruven and {van Houwelingen}, H.C. and E.C. Mariman and J.M. Boer",
year = "2007",
month = jan,
day = "1",
doi = "10.1002/gepi.20251",
language = "English",
volume = "31",
pages = "910--921",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Blackwell",
number = "8",
}

TY  - JOUR
T1  - Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs
AU  - Heidema, A.G.
AU  - Feskens, E.J.
AU  - Doevendans, P.A.
AU  - Ruven, H.J.
AU  - van Houwelingen, H.C.
AU  - Mariman, E.C.
AU  - Boer, J.M.
PY  - 2007/1/1
Y1  - 2007/1/1
N2  - Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p<.01) and MDR (p<.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.
U2  - 10.1002/gepi.20251
DO  - 10.1002/gepi.20251
M3  - Article
C2  - 17615573
SN  - 0741-0395
VL  - 31
SP  - 910
EP  - 921
JO  - Genetic Epidemiology
JF  - Genetic Epidemiology
IS  - 8
ER  -