Columbia Computer Science
Faculty Candidate Colloquium

Spring 2004

An Accurate System for Analysis of Genetic Disease Association

Eran Halperin


Department of Computer Science
Tel-Aviv University

Monday, April 19th, 11 AM, Interschool Lab, 7th floor, CEPSR

Abstract

Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome on its own is called a haplotype. Determining the haplotypes present in a population is essential for understanding genetic variation and the inheritance of complex diseases. The International HapMap Project, a successor to the Human Genome Project, seeks to determine the common haplotypes in the human population.
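To make the genotype/haplotype distinction concrete, the following minimal sketch lists every pair of haplotypes that could produce a given genotype. It assumes a hypothetical encoding in which a genotype is a list of per-site base pairs whose order carries no meaning; the function name `haplotype_pairs` is illustrative only and is not part of the system described in this talk.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    Hypothetical encoding: the genotype is a list of per-site base pairs,
    e.g. [("A", "A"), ("A", "G")]; the order within a pair carries no meaning.
    """
    # Sites where the two chromosomes carry different bases.
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = []
    # Keep the first heterozygous site in a fixed orientation so that
    # (h1, h2) and (h2, h1) are not both listed; every other heterozygous
    # site may be oriented either way.
    for flips in product((False, True), repeat=max(len(het_sites) - 1, 0)):
        orient = dict(zip(het_sites[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if orient.get(i, False):
                a, b = b, a  # swap which chromosome gets which base
            h1.append(a)
            h2.append(b)
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


g = [("A", "G"), ("C", "C"), ("T", "G")]
for h1, h2 in haplotype_pairs(g):
    print(h1, h2)
# ACT GCG
# ACG GCT
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so a genotype with 30 heterozygous sites already admits over half a billion explanations. The genotype alone does not determine the haplotypes.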

Determining a person's haplotypes experimentally is an expensive and time-consuming process. It is more practical to determine genotypes experimentally, which is relatively simple, and then use them to compute the haplotypes. This computation is not straightforward, and it is further complicated by the fact that current technology often yields DNA sequences with missing bases at some positions. Consequently, it is important to find efficient algorithms for reconstructing haplotypes from such noisy data. In this talk I will introduce a system that accurately reconstructs haplotypes from genotype data, including data with missing entries. I will show how this system applies to various biological data sets and to disease association studies.
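The effect of missing data on haplotype inference can be illustrated with a toy brute-force count. This is not the system presented in the talk; it is a sketch that assumes biallelic sites, a hypothetical 0/1/2 genotype encoding (the number of copies of allele 1 at a site, with `None` marking a missing entry), and illustrative function names. It counts the unordered haplotype pairs consistent with a genotype, which is feasible only for very short sequences.

```python
from itertools import combinations_with_replacement, product


def consistent(h1, h2, genotype):
    # A known genotype value must equal the sum of the two haplotype
    # alleles at that site; a missing entry (None) constrains nothing.
    return all(g is None or a + b == g for a, b, g in zip(h1, h2, genotype))


def count_explanations(genotype):
    # Brute force over all 2**n binary haplotypes (tiny examples only),
    # counting each unordered pair {h1, h2} once.
    haplotypes = list(product((0, 1), repeat=len(genotype)))
    return sum(
        consistent(h1, h2, genotype)
        for h1, h2 in combinations_with_replacement(haplotypes, 2)
    )


print(count_explanations([1, 0, 1]))     # 2
print(count_explanations([1, None, 1]))  # 8
```

In this example, a single missing entry quadruples the number of haplotype pairs that explain the data, because the missing site may be homozygous for either allele or heterozygous.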