Abstract
Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. So far, however, most of the identified associations have only a limited impact on risk prediction. Recent studies applying support vector machines (SVMs) have improved risk prediction for type I and type II diabetes; a drawback, however, is the poor interpretability of the classifier. Training the SVM on only a subset of SNPs would require a preselection, typically by p-value. Especially for complex diseases, this might not be the optimal selection strategy. In this work, we propose an extension of AdaBoost for GWA data, called SNPboost. SNPboost successively selects a subset of SNPs in order to improve classification. On real GWA data (German MI family study II), SNPboost outperformed a linear SVM and, when used as a preselector, further improved the performance of a non-linear SVM. Finally, we argue that the selected SNPs can be placed in a biological context.
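
The abstract does not spell out the SNPboost procedure, so the following is only a minimal sketch of the general idea it builds on: plain discrete AdaBoost whose weak learners are decision stumps on a single SNP, so that each boosting round selects one SNP. It should not be read as the authors' algorithm. The genotype coding (0, 1, 2 minor-allele counts), the labels in {-1, +1}, and the names `stump_predict`, `fit_boosted_stumps`, `predict`, and `selected_snps` are all assumptions made for illustration.

```python
import math


def stump_predict(x, snp, threshold, polarity):
    """Single-SNP stump: vote `polarity` if the genotype exceeds `threshold`."""
    return polarity if x[snp] > threshold else -polarity


def fit_boosted_stumps(X, y, n_rounds):
    """Discrete AdaBoost over single-SNP stumps (illustrative, not SNPboost itself).

    X: list of samples, each a list of genotypes coded 0/1/2 (assumed coding).
    y: list of labels in {-1, +1}.
    Returns a list of (alpha, snp, threshold, polarity) tuples.
    """
    n_samples = len(X)
    n_snps = len(X[0])
    weights = [1.0 / n_samples] * n_samples
    ensemble = []
    for _ in range(n_rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for snp in range(n_snps):
            for threshold in (0, 1):
                for polarity in (1, -1):
                    err = sum(
                        w
                        for w, x, label in zip(weights, X, y)
                        if stump_predict(x, snp, threshold, polarity) != label
                    )
                    if best is None or err < best[0]:
                        best = (err, snp, threshold, polarity)
        err, snp, threshold, polarity = best
        # Clamp the error so the logarithm stays finite for a perfect stump.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, snp, threshold, polarity))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(x, snp, threshold, polarity))
            for w, x, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, s, t, p) for a, s, t, p in ensemble)
    return 1 if score >= 0 else -1


def selected_snps(ensemble):
    """Indices of the SNPs used by at least one stump."""
    return sorted({snp for _, snp, _, _ in ensemble})


# Toy data (made up): the label is +1 exactly when SNP 1 carries a minor allele.
X_toy = [[0, 2, 1], [1, 2, 0], [2, 0, 1], [0, 0, 2], [1, 1, 0], [2, 1, 1]]
y_toy = [1, 1, -1, -1, 1, 1]
model = fit_boosted_stumps(X_toy, y_toy, n_rounds=2)
```

In a sketch like this, the SNPs returned by `selected_snps(model)` could serve as the reduced feature set handed to another classifier, which is the preselector role the abstract describes; the exhaustive stump search shown here would need a more efficient implementation at genome-wide scale.
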
Original language | English |
---|---|
Title of host publication | Artificial Neural Networks and Machine Learning – ICANN 2011 |
Editors | Timo Honkela, Włodzisław Duch, Mark Girolami, Samuel Kaski |
Number of pages | 8 |
Volume | 6792 |
Publisher | Springer Verlag |
Publication date | 2011 |
Pages | 111-118 |
ISBN (Print) | 978-3-642-21737-1 |
ISBN (Electronic) | 978-3-642-21738-8 |
Publication status | Published - 2011 |
Event | 21st International Conference on Artificial Neural Networks, Espoo, Finland; Duration: 14.06.2011 → 17.06.2011 |