TY - JOUR
T1 - Interpretive Risk Assessment on GWA Data with Sparse Linear Regression
AU - Brænne, Ingrid
AU - Labusch, Kai
AU - Martinetz, Thomas
AU - Mamlouk, Amir Madany
PY - 2010
Y1 - 2010
N2 - Genome-wide association (GWA) studies provide large amounts of high-dimensional data. They aim to identify variables, i.e., single nucleotide polymorphisms (SNPs), that increase the risk for a given phenotype, and they have been successful in identifying susceptibility loci for several complex diseases. A remaining challenge, however, is to predict individual risk from the genetic pattern. Counting the number of unfavorable alleles is a standard approach to estimating disease risk. However, this approach limits risk prediction by considering only a predefined subset of SNPs. Recent studies that apply SVM learning have improved risk prediction for Type I and Type II diabetes. A drawback of the SVM, however, is the poor interpretability of the classifier. The aim is thus to classify based on only a small number of SNPs so that the resulting classifier also permits a genetic interpretation. In this work we propose an algorithm that does exactly this. We use a recently proposed approximation method for sparse linear regression problems, which can be applied to large data sets, to search for the best sparse risk-predicting pattern among the complete set of SNPs.
AB - Genome-wide association (GWA) studies provide large amounts of high-dimensional data. They aim to identify variables, i.e., single nucleotide polymorphisms (SNPs), that increase the risk for a given phenotype, and they have been successful in identifying susceptibility loci for several complex diseases. A remaining challenge, however, is to predict individual risk from the genetic pattern. Counting the number of unfavorable alleles is a standard approach to estimating disease risk. However, this approach limits risk prediction by considering only a predefined subset of SNPs. Recent studies that apply SVM learning have improved risk prediction for Type I and Type II diabetes. A drawback of the SVM, however, is the poor interpretability of the classifier. The aim is thus to classify based on only a small number of SNPs so that the resulting classifier also permits a genetic interpretation. In this work we propose an algorithm that does exactly this. We use a recently proposed approximation method for sparse linear regression problems, which can be applied to large data sets, to search for the best sparse risk-predicting pattern among the complete set of SNPs.
UR - https://www.researchgate.net/publication/263962653_Interpretive_Risk_Assessment_on_GWA_Data_with_Sparse_Linear_Regression
UR - https://www.semanticscholar.org/paper/Interpretive-Risk-Assessment-on-GWA-Data-with-Br%C3%A6nne-Labusch/b0a8e39ada37b1680c8b55dfbabcacdd22e5640f
M3 - Journal articles
SP - 61
EP - 68
JO - Machine Learning Reports
JF - Machine Learning Reports
ER -