Overview:
The Pe'er Lab Studies, Develops and Applies Novel Computational Methods in Human Genetics
How is it best to measure, describe and quantify differences
between individual DNA sequences?
How does sequence variation affect biological processes?
How can we use it to understand and influence human disease?
All these questions pose complex analytical challenges,
with direct impact on medical research.
Human genetics is as ancient as human history.
Its computational foundations are intertwined with the most fundamental
developments in statistics.
Statistical quantification reveals the tremendous degree to which medical traits
are heritable, and motivates a large research community to investigate the
interconnections between gene variants (genotypes)
and observed traits (phenotypes).
The third millennium finds genetics flourishing more than ever,
with high-throughput technologies generating large-scale data sets,
yet more in need than ever of computational innovation and methods to process
these data into meaningful biomedical insights.
The upcoming era of complete genotype information,
available to each individual on the planet,
therefore holds the potential for great discoveries,
and poses the challenge of powerful and rigorous analysis of these data.
Genetics of Common Variants: Genomewide Association Studies
Whole genome arrays of Single Nucleotide Polymorphisms (SNPs)
and Copy Number Variants (CNVs) have been designed to represent
common variation in the human genome, aiming to associate such variation with health-related traits.
We are involved in the analysis of such Genomewide Association Study (GWAS) data across multiple phenotypes.
Genetics of Rare Variants: Identity by Descent between Purportedly Unrelated Individuals
The co-inheritance of long haplotypes in recent generations is key to the analysis of
rare variants carried on the background of these segments that are Identical By Descent (IBD).
We have developed a linear-time algorithm to scan a large population for
IBD segments without the quadratic exhaustive search of all pairs of individuals.
This enables genomewide analysis of thousands of samples, and paves the way to multiple avenues of research.
Specifically, we have been using IBD between unrelated individuals to inform phasing,
deletion detection, population structure, and fine-mapping of associated variants.
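As an illustration of how a scan can avoid comparing every pair of individuals, here is a minimal sketch that buckets haplotypes by identical fixed-size windows, so that only samples falling in the same bucket are ever paired. Hashing windows is an assumption made for this example, not a description of the lab's algorithm, and the function name, window size and toy haplotypes are all hypothetical. Bucketing costs time linear in the number of samples per window; pairs are enumerated only within a bucket, so the remaining work is proportional to the number of matches reported.

```python
from collections import defaultdict
from itertools import combinations


def candidate_ibd_seeds(haplotypes, window):
    """Map each pair of sample ids to the window starts where they are identical.

    haplotypes: dict of sample id -> string of alleles (all the same length).
    window: number of markers per non-overlapping window (hypothetical parameter).
    """
    seeds = defaultdict(list)
    n_markers = len(next(iter(haplotypes.values())))
    for start in range(0, n_markers - window + 1, window):
        # One pass over the samples: group ids by the exact content of this window.
        buckets = defaultdict(list)
        for sample_id, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(sample_id)
        # Only samples that share a bucket are paired up.
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                seeds[(a, b)].append(start)
    return dict(seeds)


# Toy data, invented for illustration only.
haps = {
    "s1": "0101100110",
    "s2": "0101111111",
    "s3": "1110000110",
}
seeds = candidate_ibd_seeds(haps, window=5)
print(seeds)
```

A real scan would go on to extend each seed into a longer segment and tolerate genotyping errors; this sketch stops at exact seed matches.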
High Throughput Sequencing for Comprehensively Cataloging Variants
While most current genetic data is based on SNP markers,
DNA sequencing has been increasing in throughput and cost-effectiveness,
outpacing Moore's law.
We have been developing and applying computational methods to tackle this torrent of sequence data.
Specifically, we have refined models of genomic coverage in worm resequencing data,
observing that coverage fits a Gamma distribution rather than a Poisson model.
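One quick way to see why a Poisson model might be rejected is to compare the variance of per-site coverage with its mean: a Poisson distribution forces the two to be equal, whereas a Gamma distribution has a separate shape parameter and can absorb extra spread. The sketch below computes that ratio and a method-of-moments Gamma fit. The coverage values are invented for illustration, and the moment-based fit is an assumption of this example rather than the procedure used in the work described above.

```python
from statistics import mean, pvariance


def dispersion_summary(coverage):
    """Summarize read depth and fit a Gamma distribution by matching moments.

    A variance-to-mean ratio near 1 is what a Poisson model predicts;
    a ratio well above 1 indicates overdispersion.
    """
    m = mean(coverage)
    v = pvariance(coverage)
    if m <= 0 or v <= 0:
        raise ValueError("need positive mean and variance to fit a Gamma")
    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,
        "gamma_shape": m * m / v,  # shape k: mean = k * scale
        "gamma_scale": v / m,      # scale theta: variance = k * scale**2
    }


# Hypothetical per-site read depths, not real data.
coverage = [2, 4, 4, 6, 14]
summary = dispersion_summary(coverage)
print(summary)
```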
We further developed a novel method for sequencing of DNA from pools of individuals.
The method designs overlapping pools, so that an individual carrier of a discovered variant
can be traced through the intersection of such pools.
Error-correcting codes make such pool designs robust to experimental and statistical error.
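The idea of tracing a carrier through intersecting pools can be sketched as follows. Each individual is assigned a distinct set of pools (a codeword), chosen so that any two codewords differ in at least `min_distance` pools; the set of pools in which a variant is observed is then decoded to the nearest codeword. This is a toy greedy construction under the assumption of a single carrier per variant, not the lab's actual design, and `design_pools`, `decode_carrier` and their parameters are hypothetical names.

```python
from itertools import combinations


def design_pools(individuals, n_pools, pools_per_individual, min_distance=4):
    """Greedily assign each individual a pool subset far from all others."""
    chosen = []
    for subset in combinations(range(n_pools), pools_per_individual):
        candidate = frozenset(subset)
        # Keep the subset only if it differs enough from every codeword so far.
        if all(len(candidate ^ other) >= min_distance for other in chosen):
            chosen.append(candidate)
        if len(chosen) == len(individuals):
            break
    if len(chosen) < len(individuals):
        raise ValueError("not enough well-separated pool subsets")
    return dict(zip(individuals, chosen))


def decode_carrier(design, positive_pools, max_errors=0):
    """Return the individual whose pools best match the observed ones, or None."""
    observed = frozenset(positive_pools)
    scored = sorted((len(observed ^ pools), person)
                    for person, pools in design.items())
    best_distance, best_person = scored[0]
    tied = len(scored) > 1 and scored[1][0] == best_distance
    if best_distance > max_errors or tied:
        return None  # too many errors, or ambiguous between individuals
    return best_person


design = design_pools(["A", "B", "C", "D"], n_pools=6, pools_per_individual=3)
exact = decode_carrier(design, {1, 3, 5})                    # variant seen in all of C's pools
one_dropout = decode_carrier(design, {1, 5}, max_errors=1)   # pool 3 missed the variant
strict = decode_carrier(design, {1, 5})                      # no errors tolerated
print(design, exact, one_dropout, strict)
```

With a minimum distance of 4 between codewords, a single missed or spurious pool still leaves the true carrier strictly closer than anyone else, which is the sense in which the code makes the design robust.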
Somatic vs. Germline Genetic Variation and Its Effect on Cancer
Cancer is a genetic illness of cells.
Somatic lineages suffer loss and gain of parts of the genome.
As somatic copy number changes occur randomly,
but may sometimes be selected for in tumorigenesis,
we are investigating the allele specificity of this selection,
i.e., the connections between an individual's germline genotype and
the observed copy number changes in the tumor's somatic genotypes.
Contact person: Ninad
A word about updates and backward compatibility: Ideally, this page would always be up to date.
In practice, however, it is overhauled on occasion and then left as it is.
During the current update (Oct. 2008), I found it refreshing, humbling and altogether educational to
reflect on my research interests as I wrote them down
in August 2006.