TY - JOUR
T1 - Genome-wide association studies: Quality control and population-based measures
AU - Ziegler, Andreas
PY - 2009
Y1 - 2009
N2 - Genome-wide association studies, using hundreds of thousands of single-nucleotide polymorphism (SNP) markers, have become a standard approach for identifying disease susceptibility genes. The change in the technology poses substantial computational and statistical challenges that have been addressed in the quality control, imputation, and population-based measure groups of the Genetic Analysis Workshop 16. The computational challenges pertain to efficient memory management and computational speed of the statistical procedures, and we discuss an approach for efficient SNP storage. Accuracy and computational speed is relevant for genotype calling, and the results from a comparison of three calling algorithms are discussed. The first statistical challenge is related to statistical quality control, and we discuss two novel quality control procedures. These low-level analyses have an effect on subsequent preparatory steps for high-level analyses, e.g., the quality of genotype imputation approaches. After the conduct of a genome-wide association study with successful replication and/or validation, measures of diagnostic accuracy, including the area under the curve, are investigated. The area under the curve can be constructed from summary data in some situations. Finally, we discuss how the populationattributable risk of a genetic variant that is only measured in a reference data set can be determined.
AB - Genome-wide association studies, using hundreds of thousands of single-nucleotide polymorphism (SNP) markers, have become a standard approach for identifying disease susceptibility genes. The change in the technology poses substantial computational and statistical challenges that have been addressed in the quality control, imputation, and population-based measure groups of the Genetic Analysis Workshop 16. The computational challenges pertain to efficient memory management and computational speed of the statistical procedures, and we discuss an approach for efficient SNP storage. Accuracy and computational speed is relevant for genotype calling, and the results from a comparison of three calling algorithms are discussed. The first statistical challenge is related to statistical quality control, and we discuss two novel quality control procedures. These low-level analyses have an effect on subsequent preparatory steps for high-level analyses, e.g., the quality of genotype imputation approaches. After the conduct of a genome-wide association study with successful replication and/or validation, measures of diagnostic accuracy, including the area under the curve, are investigated. The area under the curve can be constructed from summary data in some situations. Finally, we discuss how the populationattributable risk of a genetic variant that is only measured in a reference data set can be determined.
UR - http://www.scopus.com/inward/record.url?scp=71449096845&partnerID=8YFLogxK
U2 - 10.1002/gepi.20472
DO - 10.1002/gepi.20472
M3 - Journal articles
C2 - 19924716
AN - SCOPUS:71449096845
SN - 0741-0395
VL - 33
SP - S45-S50
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - SUPPL. 1
ER -