Evaluating and Improving Power of Whole Genome Products

Overview

Emerging technologies make it possible, for the first time, to genotype hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously, enabling whole genome association studies. Using empirical genotype data from the International HapMap Project, we evaluate the extent to which the sets of SNPs contained on several whole genome genotyping arrays capture common SNPs across the genome, and find that the majority of common SNPs are well captured by many of these products, either directly or through linkage disequilibrium (LD). We explore analytical strategies that use HapMap data to improve the power of association studies conducted with these fixed sets of markers, and show that including specified haplotype tests in the downstream analysis can increase the fraction of common variants captured by 25% to 100%. Finally, a Bayesian approach to association analysis can improve power by weighting each statistical test's likelihood of detecting a true positive signal to reflect the number of putative causal alleles with which it is correlated.
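To make the notion of "capture" concrete, here is a minimal sketch of a pairwise coverage calculation. It is an illustration under stated assumptions, not the evaluation pipeline itself: the function name `coverage`, the r² threshold of 0.8, and the minor allele frequency cutoff of 0.05 are all assumed here, and r² is approximated from unphased 0/1/2 genotype counts rather than from phased haplotypes.

```python
import numpy as np

def coverage(genotypes, array_idx, r2_threshold=0.8, maf_threshold=0.05):
    """Fraction of common SNPs captured by a fixed set of array SNPs.

    genotypes: samples x SNPs matrix of minor-allele counts (0, 1, 2).
    array_idx: column indices of the SNPs present on the array.
    Thresholds are illustrative assumptions, not values from the source.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)

    # Common SNPs are the targets we want the array to capture.
    common = np.flatnonzero(maf >= maf_threshold)

    # Monomorphic array SNPs carry no information (correlation undefined).
    tags = np.asarray(array_idx, dtype=int)
    tags = tags[maf[tags] > 0]
    if common.size == 0 or tags.size == 0:
        return 0.0

    def standardize(x):
        return (x - x.mean(axis=0)) / x.std(axis=0)

    # Pearson correlation between every common SNP and every array SNP.
    r = standardize(g[:, common]).T @ standardize(g[:, tags]) / g.shape[0]

    # A SNP is captured if its best array proxy reaches the threshold;
    # a SNP that is itself on the array has r^2 = 1 with itself.
    best_r2 = (r ** 2).max(axis=1)
    return float((best_r2 >= r2_threshold).mean())

# Toy data: SNP 1 duplicates SNP 0, SNP 2 is weakly correlated, SNP 3 is monomorphic.
toy = np.array([
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [2, 2, 0, 0],
    [2, 2, 1, 0],
    [0, 0, 2, 0],
    [1, 1, 2, 0],
])
print(coverage(toy, [0]))
```

In the toy data, an array containing only SNP 0 captures SNP 0 directly and SNP 1 through LD, but not SNP 2. This sketch considers only single-marker proxies; haplotype (multimarker) tests would need an additional step.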


Background Work

Published descriptions of the rationale and computational methods used in our baseline evaluations are available in:


Data for the Community

Due to popular demand, especially when R01 deadlines are approaching, we applied our methods for evaluating and improving array coverage, and present results here for each continental ancestry group, as represented in the HapMap samples (see description):

Additional summary statistics will be provided upon request.

Raw data needed for further evaluations and application of multimarker predictors are also available in the following format.


Method Details

Evaluation is performed with HapMap Phase II phased data, split into chunks of 10,000 markers at a time. Potential limitations of the method include:
