Computational Genomics

CBMF W4761
Computer Science · Biomedical Engineering · Medical Informatics
Columbia University
Spring Semester, 2004

Course description

In this course, we explore new computational approaches for studying genomic data, including biological sequence data and gene expression ("gene chip") data. Our focus is on machine learning techniques: probabilistic models such as hidden Markov models, learning algorithms such as support vector machines for classification problems, clustering algorithms, and Bayesian networks for inferring regulatory networks.

Instructional staff

Instructor: Prof. Christina Leslie
Email: cleslie@cs.columbia.edu
Phone: (212) 939-7043
Office: 466 Mudd
Office Hours: Wed 4-6pm (tentative) and by appointment

Teaching Assistant: Eugene Ie
Email: eie@cs.columbia.edu
Office: TBA
Office Hours: TBA

Courseworks site and homepage

The course home page can be found at http://www.cs.columbia.edu/~cleslie/cs4761.

We plan to use the CBMF 4761 Courseworks website to host a bulletin board discussion of class material and for electronic submission of homework. The Courseworks site will be accessible to all students soon. Students should read the course bulletin board on a daily basis and are responsible for information posted there.

Course Goals

The goals for students who take this course:

Some notes on our approach:

Topics to be covered

Sequence alignment, hidden Markov models, information-based sequence analysis. Learning algorithms for classification problems: support vector machines, kernel techniques, and clustering algorithms for gene expression and sequence data. Computational signal finding. Inferring regulatory networks using Bayesian nets.

Prerequisites

A sufficient prerequisite for the course is

If you haven't taken this course, the following list should provide an idea of the necessary background:

Computer Science

Biology

Mathematics

Course materials

The following textbook is required for all students:

The text will (soon) be available in the Columbia University Bookstore in Lerner Hall.

The following textbook is recommended for background in machine learning:

For students without significant background in biology, the following textbook is recommended:

The following books are not required but may be of interest.

Computational Biology:

Computer Science/Machine Learning:

Biology:

Additional readings will be available on the web (see the links in the lecture schedule below).

Course requirements

As the final project for the course, students will complete a group research-oriented project (ideally, teams of 3-4 people). Projects will consist of writing a computer program (or using an existing one), running experiments on real biological data, summarizing the results on a web site, and writing up the results in a technical report. Suggestions for projects will be made available during the term.

In addition, 3-4 homework assignments consisting of theory and programming problems will be assigned during the semester. Late homeworks are penalized 10% per calendar day.

Examinations

There will be two in-class, 75-minute, open-book tests. The tests are scheduled for Monday, March 8 and Monday, May 3.

Course grade