Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p<.01) and MDR (p<.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.
Heidema, A. G., Feskens, E. J., Doevendans, P. A., Ruven, H. J., van Houwelingen, H. C., Mariman, E. C., & Boer, J. M. (2007). Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs. Genetic Epidemiology, 31(8), 910-921. https://doi.org/10.1002/gepi.20251