Abstract
Genome-wide association (GWA) studies provide large amounts of high-dimensional data and aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but most diseases appear to be affected by interactions of multiple factors, which can only be identified through multivariate analysis. However, multivariate analysis of discrete, high-dimensional, low-sample-size GWA data is made more difficult by the presence of random effects and nonspecific coupling. In this work, we investigate the suitability of three standard techniques (p-values, SVM, PCA) for analyzing GWA data on several simulated datasets. We compare these standard techniques against a sparse coding approach and demonstrate that sparse coding clearly outperforms them and can identify interacting factors in far higher-dimensional datasets than the other three approaches.
Original language | English |
---|---|
Title of host publication | Artificial Neural Networks – ICANN 2010 |
Editors | Konstantinos Diamantaras, Wlodek Duch, Lazaros S. Iliadis |
Number of pages | 10 |
Volume | 6352 |
Publisher | Springer Verlag |
Publication date | 08.2010 |
Pages | 337-346 |
ISBN (Print) | 978-3-642-15818-6 |
ISBN (Electronic) | 978-3-642-15819-3 |
Publication status | Published - 08.2010 |
Event | 20th International Conference on Artificial Neural Networks, Thessaloniki, Greece (15.09.2010 → 18.09.2010) |