Abstract
Associations between single nucleotide polymorphisms and gene expression have received increasing interest. The aim of such studies is to detect differences in the location parameter of gene expression across genotypes. Because gene expression data are often highly skewed or heavy-tailed, or vary in dispersion between genotypes, the median is the most appropriate measure of location. In this setting, the model assumptions of standard statistical methods for comparing locations, such as the analysis of variance (ANOVA) or the Kruskal-Wallis (KW) test, are violated. Alternatives that might be more appropriate are the median test (MED) and tests based on mutual information (MI). In simulation studies, these approaches and a novel MI test are compared with ANOVA and KW. The location, dispersion and skewness parameters of the gene expression distributions given genotype are varied, as are the genotype frequencies. The MED test and the novel MI-based method keep the nominal significance level when comparing medians if gene expression data are non-normally distributed, whereas ANOVA and KW have substantially inflated type I error rates. ANOVA and KW are, however, optimal if the standard model assumptions are fulfilled. The MED test generally has larger power than MI and is therefore recommended if the model assumptions of standard procedures are violated. A 300 kb region on chromosome 9p21.3, which is associated with coronary artery disease, was analyzed using the HapMap data. Only the alternative approaches were able to identify three genes (ADM, FCGR3B and ADORA1) as promising candidates to clarify the molecular mechanism of the genetic association.
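
To make the median test (MED) named in the abstract concrete, the following is a minimal sketch of a Brown-Mood median test for three genotype groups. It is not the authors' implementation: the function name `median_test_3groups` and the sample values are hypothetical, ties with the grand median are counted as "not above" by assumption, and no continuity correction is applied. With exactly three groups the chi-square statistic has 2 degrees of freedom, so its p-value reduces to `exp(-stat / 2)` and only the standard library is needed.

```python
import math
from statistics import median


def median_test_3groups(groups):
    """Brown-Mood median test for exactly three groups (chi-square, df = 2).

    Illustrative sketch only: ties with the grand median are counted as
    "not above", and no continuity correction is applied.
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three genotype groups")
    if any(len(g) == 0 for g in groups):
        raise ValueError("every genotype group needs at least one observation")

    pooled = [x for g in groups for x in g]
    grand_median = median(pooled)

    # 2 x 3 contingency table: counts above / not above the grand median.
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]

    n = len(pooled)
    total_above, total_below = sum(above), sum(below)
    if total_above == 0 or total_below == 0:
        raise ValueError("all observations fall on one side of the grand median")

    # Pearson chi-square statistic against the expected counts.
    stat = 0.0
    for g, a, b in zip(groups, above, below):
        expected_above = len(g) * total_above / n
        expected_below = len(g) * total_below / n
        stat += (a - expected_above) ** 2 / expected_above
        stat += (b - expected_below) ** 2 / expected_below

    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical expression values for genotypes AA, AB and BB.
    expression_by_genotype = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
    ]
    stat, p_value = median_test_3groups(expression_by_genotype)
    print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

Because the test only uses whether each observation lies above the pooled median, it does not depend on the shape or spread of the expression distribution within each genotype, which is the property the abstract relies on.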
Original language | English |
---|---|
Journal | Statistics in Medicine |
Volume | 28 |
Issue number | 29 |
Pages (from - to) | 3581-3596 |
Number of pages | 16 |
ISSN | 0277-6715 |
DOIs | |
Publication status | Published - 20 December 2009 |