Overview
I study, develop and apply novel computational methods in human genetics.
How is it best to measure, describe and quantify differences
between individual DNA sequences?
How does sequence variation affect biological processes?
How can we use it to understand and influence human disease?
All these questions pose complex analytical challenges,
with direct impact on medical research.
Human genetics is as ancient as human history.
Its computational foundations are intertwined with the most fundamental
developments in statistics.
Such quantifications reveal the tremendous degree to which medical traits
are heritable, and motivate a large research community to investigate the
interconnections between gene variants (genotypes)
and observed traits (phenotypes).
The third millennium finds genetics flourishing more than ever,
with high-throughput technologies generating large-scale data sets,
yet more in need than ever of computational innovation and methods to process
these data into meaningful biomedical insights.
The upcoming era,
in which complete genotype information is available to each individual
on the planet, therefore holds the potential for great discoveries
and poses the challenge of powerful and rigorous analysis of these data.
Genetics of special populations
A model system where complete genetics can be practiced at present is a small,
isolated population.
In such a place everybody can be genotyped for markers
along the entire genome.
Because of the bottleneck effect on gene flow
during the founding of the population,
markers exhibit high correlation with disease-causing variants,
so the marker density offered by current technology
has the potential to dissect the genetics of heritable disease.
The Pacific island of
Kosrae,
Federated States of Micronesia,
is exactly such a population.
Furthermore, the high prevalence of diabetes, obesity,
hypertension and hyperlipidemia makes the island a compelling population
in which to dissect the genetics of their combination, the Metabolic Syndrome.
Jointly with
Jeff Friedman's
and other labs at the
Rockefeller Institute and the
Altshuler/Daly group at the
Broad Institute,
we have quantified the effects of population isolation in Kosrae,
and mapped European admixture into the island.
We have recently generated genomewide data for essentially the entire adult
population, enabling an association study for metabolic disorders of unprecedented
magnitude and complexity.
We are continually developing computational methods to handle the complexities
of the Kosrae data and other special populations
in terms of magnitude, population isolation, internal family relatedness,
and ethnic admixture.
Predicting variation in the general population
Current technologies allow only a small fraction of human genetic variants
to be typed experimentally.
Yet, due to the correlation among alleles of nearby markers, known as
linkage disequilibrium, we are able to infer the genotypes of other markers
in the same samples.
We are using data from the
Human Haplotype Map (HapMap)
to train advanced computational models that predict most of the 10 million
human variants based on observing two orders of magnitude
fewer data points.
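As a rough illustration of the correlation described above, the sketch below computes the standard r-squared measure of linkage disequilibrium between two biallelic markers from phased haplotypes. The function names and the toy haplotypes are hypothetical and are not taken from HapMap or from our models; alleles are simply coded 0 and 1.

```python
def allele_frequency(haplotypes, site):
    """Frequency of allele 1 at one site across all haplotypes."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)


def ld_r2(haplotypes, site_a, site_b):
    """r^2 between two biallelic sites, from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = allele_frequency(haplotypes, site_a)
    p_b = allele_frequency(haplotypes, site_b)
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for h in haplotypes if h[site_a] == 1 and h[site_b] == 1) / n
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site carries no information
    return d * d / denom


# Hypothetical toy panel: sites 0 and 1 always agree, site 2 does not track them.
toy_haplotypes = [
    (0, 0, 1),
    (0, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 1),
    (1, 1, 0),
]

r2_near = ld_r2(toy_haplotypes, 0, 1)
r2_far = ld_r2(toy_haplotypes, 0, 2)
```

When r-squared between a typed marker and an untyped one is close to 1, observing the first largely determines the second, which is what makes inference of untyped genotypes feasible.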
Active research projects include:
- Using haplotypes (multi-marker predictors) to increase in-silico
coverage of the
genome by standard genomewide genotyping products.
- Using Markov random fields to improve prediction of variation
and genetic discovery.
- Combining sparse marker data with densely-typed family members
to facilitate high-coverage genotype inference.
- Using coalescence to predict rare variants.
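The first item in the list above, multi-marker prediction, can be sketched in a few lines. This is a deliberately simplified majority-vote lookup under assumed inputs, not the models used in these projects: `train_haplotype_predictor`, `predict_allele` and the reference panel are hypothetical names and data, with alleles coded 0 and 1.

```python
from collections import Counter, defaultdict


def train_haplotype_predictor(reference, typed_sites, target_site):
    """Map each typed-marker haplotype to its most common target allele."""
    counts = defaultdict(Counter)
    for hap in reference:
        key = tuple(hap[s] for s in typed_sites)
        counts[key][hap[target_site]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict_allele(predictor, observed, default=None):
    """Predict the untyped allele from the observed typed alleles."""
    return predictor.get(tuple(observed), default)


# Hypothetical reference panel: the allele at site 2 is 1 only when
# both typed sites (0 and 1) carry allele 1.
reference_panel = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (1, 1, 1),
    (0, 0, 0),
]

predictor = train_haplotype_predictor(
    reference_panel, typed_sites=(0, 1), target_site=2
)
```

In this toy panel neither typed marker alone determines the allele at site 2, but the two-marker haplotype does, which is the sense in which haplotypes can extend the coverage of a fixed marker set.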
Somatic genetic variation
Cancer is a genetic illness of cells.
Somatic lineages suffer loss and gain of parts of the genome.
As somatic copy number changes occur randomly,
but may sometimes be selected for in tumorigenesis,
we are investigating the allele specificity of this selection,
i.e., the connections between an individual's germline genotype and
the observed copy number changes in the tumor's somatic genotypes.
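One simple way to frame allele specificity, offered here only as an illustrative assumption and not as the method used in this work, is an exact binomial test: among tumors from individuals heterozygous at a marker, if copy number gains were indifferent to the germline allele, each allele would be gained about half the time. The counts below are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: in 10 heterozygous tumors with a gain at the locus,
# the same germline allele was the one gained in 9 of them.
p_skewed = binomial_two_sided_p(9, 10)
p_balanced = binomial_two_sided_p(5, 10)
```

A small p-value for a marker would suggest that the gain is not indifferent to the germline allele; a real analysis would also need to account for multiple testing across markers and for tumor purity.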