TY - JOUR
T1 - Analyzing Illumina Gene Expression Microarray Data from Different Tissues: Methodological Aspects of Data Analysis in the MetaXpress Consortium
AU - Schurmann, Claudia
AU - Heim, Katharina
AU - Schillert, Arne
AU - Blankenberg, Stefan
AU - Carstensen, Maren
AU - Dörr, Marcus
AU - Endlich, Karlhans
AU - Felix, Stephan B.
AU - Gieger, Christian
AU - Grallert, Harald
AU - Herder, Christian
AU - Hoffmann, Wolfgang
AU - Homuth, Georg
AU - Illig, Thomas
AU - Kruppa, Jochen
AU - Meitinger, Thomas
AU - Müller, Christian
AU - Nauck, Matthias
AU - Peters, Annette
AU - Rettig, Rainer
AU - Roden, Michael
AU - Strauch, Konstantin
AU - Völker, Uwe
AU - Völzke, Henry
AU - Wahl, Simone
AU - Wallaschofski, Henri
AU - Wild, Philipp S.
AU - Zeller, Tanja
AU - Teumer, Alexander
AU - Prokisch, Holger
AU - Ziegler, Andreas
PY - 2012/12/7
Y1 - 2012/12/7
N2 - Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array. Gene expression signal intensities were similar after applying the log2 or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33-48% of the variance), the RNA amplification batch (12-24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2-3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1-2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency >5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni corrected p-value<0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency. In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.
AB - Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array. Gene expression signal intensities were similar after applying the log2 or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33-48% of the variance), the RNA amplification batch (12-24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2-3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1-2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency >5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni corrected p-value<0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency. In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.
UR - http://www.scopus.com/inward/record.url?scp=84870895128&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0050938
DO - 10.1371/journal.pone.0050938
M3 - Journal articles
C2 - 23236413
AN - SCOPUS:84870895128
VL - 7
JO - PLoS ONE
JF - PLoS ONE
IS - 12
M1 - e50938
ER -