Abstract
Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through a multivariate analysis. However, multivariate analysis on the discrete, high-dimensional and low-sample-size GWA data is made more difficult by the presence of random effects and nonspecific coupling. In this work, we investigate the suitability of three standard techniques (p-values, SVM, PCA) for analyzing GWA data on several simulated datasets. We compare these standard techniques against a sparse coding approach; we demonstrate that sparse coding clearly outperforms the other approaches and can identify interacting factors in far higher-dimensional datasets than the other three approaches.
| Original language | English |
|---|---|
| Title of host publication | Artificial Neural Networks – ICANN 2010 |
| Editors | Konstantinos Diamantaras, Wlodek Duch, Lazaros S. Iliadis |
| Number of pages | 10 |
| Volume | 6352 |
| Publisher | Springer Verlag |
| Publication date | 08.2010 |
| Pages | 337-346 |
| ISBN (Print) | 978-3-642-15818-6 |
| ISBN (Electronic) | 978-3-642-15819-3 |
| Publication status | Published - 08.2010 |
| Event | 20th International Conference Artificial Neural Networks - Thessaloniki, Greece. Duration: 15.09.2010 → 18.09.2010 |