Sparse Coding for Feature Selection on Genome-wide Association Data

Ingrid Brænne, Kai Labusch, Amir Madany Mamlouk


Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through a multivariate analysis. However, multivariate analysis of the discrete, high-dimensional, low-sample-size GWA data is complicated by random effects and nonspecific coupling. In this work, we investigate the suitability of three standard techniques (p-values, SVM, PCA) for analyzing GWA data on several simulated datasets. We compare these standard techniques with a sparse coding approach and demonstrate that sparse coding clearly outperforms them, identifying interacting factors in far higher-dimensional datasets than the other three approaches can.
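The claim that interacting factors can escape univariate analysis can be illustrated with a small toy sketch. This is not the simulation or method used in the paper. It assumes two hypothetical binary-coded SNPs (carrier = 1), a balanced design in which every genotype combination occurs 25 times, case status defined as the XOR of the two SNPs, and a Pearson chi-square test as the per-SNP "p-value" technique. Each SNP on its own shows no association, while the pair jointly determines the phenotype.

```python
import math
from itertools import product


def chi2_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1 or df = 3 are implemented in this sketch")


# Hypothetical toy data (assumption): (snp_a, snp_b, case) with case = snp_a XOR snp_b,
# each of the four genotype combinations repeated 25 times.
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(25)]


def marginal_table(snp_index):
    """Rows: genotype 0/1 of one SNP; columns: control/case counts."""
    table = [[0, 0], [0, 0]]
    for sample in samples:
        table[sample[snp_index]][sample[2]] += 1
    return table


def pairwise_table():
    """Rows: the four joint genotypes of both SNPs; columns: control/case counts."""
    table = [[0, 0] for _ in range(4)]
    for a, b, case in samples:
        table[2 * a + b][case] += 1
    return table


p_a = chi2_sf(chi2_statistic(marginal_table(0)), df=1)
p_b = chi2_sf(chi2_statistic(marginal_table(1)), df=1)
p_ab = chi2_sf(chi2_statistic(pairwise_table()), df=3)
print(f"univariate p(SNP A) = {p_a:.3f}, univariate p(SNP B) = {p_b:.3f}")
print(f"joint p(SNP A x SNP B) = {p_ab:.2e}")
```

Both univariate p-values equal 1, so a p-value ranking would discard both SNPs, whereas the joint test on the 4 x 2 table is highly significant. Exhaustively testing all pairs this way becomes infeasible as the number of SNPs grows, which is the setting in which multivariate approaches such as those compared in the paper are needed.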
Original language: English
Title of host publication: Artificial Neural Networks – ICANN 2010
Editors: Konstantinos Diamantaras, Wlodek Duch, Lazaros S. Iliadis
Number of pages: 10
Publisher: Springer Verlag
Publication date: 08.2010
ISBN (Print): 978-3-642-15818-6
ISBN (Electronic): 978-3-642-15819-3
Publication status: Published - 08.2010
Event: 20th International Conference on Artificial Neural Networks, Thessaloniki, Greece
Duration: 15.09.2010 to 18.09.2010

