Evaluating and Improving Power of Whole Genome Products
Emerging technologies make it possible for the first time to genotype hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously, enabling whole genome association studies. Using empirical genotype data from the International HapMap Project, we evaluate the extent to which the sets of SNPs contained on several whole genome genotyping arrays capture common SNPs across the genome, and find that the majority of common SNPs are well captured by many of these products either directly or through linkage disequilibrium (LD). We explore analytical strategies that utilize HapMap data to improve power of association studies conducted with these fixed sets of markers, and show that inclusion of specified haplotype tests in the downstream analysis can increase the fraction of common variants captured by 25% to 100%. Finally, a Bayesian approach to association analysis can improve power by weighting the likelihood of each statistical test detecting a true positive signal to reflect the number of putative causal alleles to which it is correlated.
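The idea of a common SNP being captured "directly or through linkage disequilibrium" can be illustrated with a minimal sketch. It is not the published pipeline: it assumes phased haplotypes stored as 0/1 allele lists in a plain dictionary, a minor allele frequency cutoff of 5% for "common", and a pairwise r^2 threshold of 0.8 for "captured". The function names `r_squared` and `coverage` and all parameter defaults are hypothetical, and only single-marker tests are considered (no multimarker haplotype tests).

```python
def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    a, b: equal-length lists of 0/1 alleles, one entry per phased chromosome.
    """
    n = len(a)
    pa = sum(a) / n                      # frequency of allele 1 at SNP a
    pb = sum(b) / n                      # frequency of allele 1 at SNP b
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0                       # r^2 is undefined for a monomorphic SNP
    pab = sum(x & y for x, y in zip(a, b)) / n   # frequency of the 1-1 haplotype
    d = pab - pa * pb                    # LD coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def coverage(haplotypes, array_snps, min_maf=0.05, r2_threshold=0.8):
    """Fraction of common SNPs captured by the SNPs on an array.

    haplotypes: dict mapping SNP id -> list of 0/1 alleles (assumed layout).
    array_snps: iterable of SNP ids present on the genotyping product.
    A common SNP counts as captured if it is on the array or has
    r^2 >= r2_threshold with at least one array SNP.
    """
    def maf(h):
        ones = sum(h)
        return min(ones, len(h) - ones) / len(h)

    common = [s for s, h in haplotypes.items() if maf(h) >= min_maf]
    tags = [s for s in array_snps if s in haplotypes]
    captured = 0
    for s in common:
        best = max(
            (r_squared(haplotypes[s], haplotypes[t]) for t in tags),
            default=0.0,
        )
        if best >= r2_threshold:
            captured += 1
    return captured / len(common) if common else 0.0
```

In practice a search window around each SNP would likely be needed to keep the pairwise comparison tractable at genome scale; the sketch compares every common SNP against every array SNP for simplicity.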
Published descriptions of the rationale and computational methods used in our baseline evaluations are available in:
Pe'er I, de Bakker PIW, Maller J, Yelensky R, Altshuler D, Daly MJ. Evaluating and Improving Power in Whole Genome Association Studies using Fixed Marker Sets. Nat Genet. 2006;38(6):663-667.
de Bakker PIW, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet. 2005;37(11):1217-1223.
Due to popular demand, especially when R01 deadlines are approaching, we have applied our methods for evaluating and improving array coverage, and present results here for each continental ancestry group represented in the HapMap samples (see description):
Raw data needed for further evaluations and for application of multimarker predictors are also available in the following format.
Evaluation is performed with HapMap Phase II phased data, split into chunks of 10,000 markers at a time. Potential limitations of the method include: