Lecture Schedule

CBMF 4761
Department of Computer Science
Spring Semester, 2003
For a tentative schedule, see last year's lecture page.

Lecture 1 (Tues Jan 21)

Most of the following slides are borrowed from Prof. Russ Altman (Stanford Medical Informatics). Online repositories for biological data:

Lecture 2 (Thurs Jan 23)

Reading: Section 1.3 in Durbin. There is also good but terse material on probabilistic methods in Chapter 11 of the text -- see in particular Section 11.3 on inference.

Some background material and references for the splice site recognition problem (supplied for your interest only -- you aren't required to know details about splicing beyond what I present in class):

Lecture 3 (Tues Jan 28)

Reading: Durbin, Sections 2.1, 2.2 and 2.3 until the end of the subsection on global alignment (Needleman-Wunsch algorithm).
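
For concreteness, here is a minimal Python sketch of the Needleman-Wunsch dynamic programming recurrence for the global alignment score. The match/mismatch/gap values are made-up placeholders rather than a scoring scheme from the text, and the traceback to recover the alignment itself is omitted.

    def needleman_wunsch_score(x, y, match=1, mismatch=-1, gap=-2):
        # F[i][j] = best score of a global alignment of x[:i] with y[:j]
        n, m = len(x), len(y)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if x[i - 1] == y[j - 1] else mismatch
                F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i] with y[j]
                              F[i - 1][j] + gap,     # gap in y
                              F[i][j - 1] + gap)     # gap in x
        return F[n][m]

    print(needleman_wunsch_score("HEAGAWGHEE", "PAWHEAE"))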

Lecture 4 (Thurs Jan 30)

Reading: Durbin, Section 2.3 until the end of the subsection on local alignment (Smith-Waterman). Also take a look at the affine gap penalty part of Section 2.4. We won't do every variant of pairwise alignment in class, but it's useful to see how many different versions there are.
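
To show how the local alignment recurrence differs, here is a sketch of the Smith-Waterman score computation, again with made-up linear gap and match/mismatch scores. Note the extra 0 option, which lets a local alignment start anywhere, and the best score taken over all cells rather than the corner.

    def smith_waterman_score(x, y, match=1, mismatch=-1, gap=-2):
        n, m = len(x), len(y)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        best = 0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if x[i - 1] == y[j - 1] else mismatch
                F[i][j] = max(0,                      # start a fresh local alignment here
                              F[i - 1][j - 1] + s,
                              F[i - 1][j] + gap,
                              F[i][j - 1] + gap)
                best = max(best, F[i][j])
        return best

    print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))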

Lecture 5 (Tues Feb 4)

Reading: Read through Section 2.5 on heuristic alignment algorithms, Section 2.7 on significance of scores (the "classical approach" subsection is most important), and Section 2.8 on deriving score parameters from data. Some of the explanation is quite sketchy, and the links below provide clearer exposition. Also start in on Section 3.1 for Markov chains.

Lecture 6 (Thurs Feb 6)

Reading: Sections 3.1 and start of 3.2 on Markov chains and Hidden Markov Models for CpG island detection.
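
To make the Markov chain discrimination idea concrete, here is a toy sketch of the log-odds score from Section 3.1. The transition probabilities below are rough placeholders, not the maximum likelihood estimates given in the text.

    import math

    plus = {   # "inside CpG island" model: elevated C->G transitions
        "A": {"A": 0.18, "C": 0.27, "G": 0.43, "T": 0.12},
        "C": {"A": 0.17, "C": 0.37, "G": 0.27, "T": 0.19},
        "G": {"A": 0.16, "C": 0.34, "G": 0.38, "T": 0.12},
        "T": {"A": 0.08, "C": 0.36, "G": 0.38, "T": 0.18},
    }
    minus = {  # background model: C->G strongly depleted
        "A": {"A": 0.30, "C": 0.20, "G": 0.29, "T": 0.21},
        "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
        "G": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
        "T": {"A": 0.18, "C": 0.24, "G": 0.29, "T": 0.29},
    }

    def log_odds(seq):
        # sum over adjacent pairs of log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )
        return sum(math.log2(plus[a][b] / minus[a][b]) for a, b in zip(seq, seq[1:]))

    print(log_odds("CGCGCGATCG"))  # positive score suggests CpG-island-like sequence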

Lecture 7 (Tues Feb 11)

Reading: Continue with Section 3.2 on the Viterbi algorithm for Hidden Markov Models.
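
Here is a minimal log-space sketch of the Viterbi recursion and traceback. The two-state "island"/"background" HMM, with its hypothetical transition and emission probabilities, is only an illustration, not the model from the text.

    import math

    states = ["island", "background"]
    start = {"island": 0.5, "background": 0.5}
    trans = {"island": {"island": 0.9, "background": 0.1},
             "background": {"island": 0.1, "background": 0.9}}
    emit = {"island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
            "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

    def viterbi(seq):
        # V[k] = best log probability of any path ending in state k at the current position
        V = {k: math.log(start[k]) + math.log(emit[k][seq[0]]) for k in states}
        back = []  # backpointers for traceback
        for x in seq[1:]:
            ptr, newV = {}, {}
            for k in states:
                prev, score = max(((j, V[j] + math.log(trans[j][k])) for j in states),
                                  key=lambda t: t[1])
                newV[k] = score + math.log(emit[k][x])
                ptr[k] = prev
            V, back = newV, back + [ptr]
        # traceback from the best final state
        path = [max(V, key=V.get)]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return list(reversed(path))

    print(viterbi("ACGCGCGCTATATA"))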

Lecture 8 (Thurs Feb 13)

Reading: Section 3.2 on posterior decoding (the forward and backward algorithms). We'll fully cover the Viterbi algorithm and hopefully finish posterior decoding also.
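
The following sketch implements the forward and backward recursions and posterior decoding for a tiny made-up two-state HMM (a fair vs. loaded coin). It works in raw probabilities with no scaling, so it is only safe for short sequences.

    states = ["F", "L"]                          # fair vs. loaded coin
    start = {"F": 0.5, "L": 0.5}
    trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
    emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

    def forward(seq):
        # f[i][k] = P(x_1..x_{i+1}, state at position i is k), 0-based i
        f = [{k: start[k] * emit[k][seq[0]] for k in states}]
        for x in seq[1:]:
            f.append({k: emit[k][x] * sum(f[-1][j] * trans[j][k] for j in states)
                      for k in states})
        return f

    def backward(seq):
        # b[i][k] = P(x_{i+2}..x_n | state at position i is k), 0-based i
        b = [{k: 1.0 for k in states}]
        for x in reversed(seq[1:]):
            b.insert(0, {k: sum(trans[k][j] * emit[j][x] * b[0][j] for j in states)
                         for k in states})
        return b

    def posterior(seq):
        f, b = forward(seq), backward(seq)
        px = sum(f[-1][k] for k in states)       # P(sequence) from the final forward column
        # P(state at position i is k | sequence) = f[i][k] * b[i][k] / P(sequence)
        return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(seq))]

    for i, p in enumerate(posterior("HHHHHTHTHH")):
        print(i, {k: round(v, 3) for k, v in p.items()})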

Lecture 9 (Tues Feb 18)

Reading: Training HMMs -- the parameter estimation problem (Section 3.3). We'll finish discussion of the forward/backward algorithms for posterior decoding. We'll also discuss maximum likelihood estimation when (1) states for the training data are known and (2) states for the training data are unknown (Expectation Maximization). Also read about scaling probabilities for the forward/backward algorithms in the last section of Chapter 3.
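
For the known-state case, maximum likelihood estimation of the transition probabilities reduces to counting and normalizing, as in this small sketch (the labeled state path is a made-up toy example):

    from collections import Counter, defaultdict

    def estimate_transitions(state_path):
        counts = Counter(zip(state_path, state_path[1:]))    # A_{kl}: observed transition counts
        totals = defaultdict(float)
        for (k, l), c in counts.items():
            totals[k] += c
        # a_{kl} = A_{kl} / sum over l' of A_{kl'}
        return {(k, l): c / totals[k] for (k, l), c in counts.items()}

    path = list("IIIBBBBIIBB")   # toy labeled state path: I = island, B = background
    print(estimate_transitions(path))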

Lecture 10 (Thurs Feb 20)

Reading: Section 3.3 on the Baum-Welch algorithm. We'll finish discussing the learning algorithm (a special case of Expectation Maximization) used to train the parameters of an HMM when the state sequence for the training data is unknown. If there's time, we'll briefly go over Chapter 4 on pair HMMs, used to produce alignments.
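
Here is a minimal sketch of one Baum-Welch iteration for a made-up two-state HMM. It reuses the same unscaled forward/backward recursions as the sketch under Lecture 8 (kept inline so the example is self-contained), so it is only an illustration for short sequences; the initial state distribution is not re-estimated.

    states = ["F", "L"]
    start = {"F": 0.5, "L": 0.5}
    trans = {"F": {"F": 0.8, "L": 0.2}, "L": {"F": 0.2, "L": 0.8}}
    emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.7, "T": 0.3}}

    def forward(seq):
        f = [{k: start[k] * emit[k][seq[0]] for k in states}]
        for x in seq[1:]:
            f.append({k: emit[k][x] * sum(f[-1][j] * trans[j][k] for j in states)
                      for k in states})
        return f

    def backward(seq):
        b = [{k: 1.0 for k in states}]
        for x in reversed(seq[1:]):
            b.insert(0, {k: sum(trans[k][j] * emit[j][x] * b[0][j] for j in states)
                         for k in states})
        return b

    def baum_welch_step(seqs):
        A = {k: {l: 0.0 for l in states} for k in states}   # expected transition counts
        E = {k: {} for k in states}                          # expected emission counts
        for seq in seqs:
            f, b = forward(seq), backward(seq)
            px = sum(f[-1][k] for k in states)
            for i, x in enumerate(seq):
                for k in states:
                    # posterior of being in state k at position i
                    E[k][x] = E[k].get(x, 0.0) + f[i][k] * b[i][k] / px
                    if i + 1 < len(seq):
                        for l in states:
                            # expected use of transition k -> l at position i
                            A[k][l] += f[i][k] * trans[k][l] * emit[l][seq[i + 1]] * b[i + 1][l] / px
        # M-step: renormalize expected counts into new parameters
        new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        new_emit = {k: {x: c / sum(E[k].values()) for x, c in E[k].items()} for k in states}
        return new_trans, new_emit

    print(baum_welch_step(["HHTHHHTHHH", "THTTHTHTTT"]))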

Lecture 11 (Tues Feb 25)

Reading: Chapter 4 on pair HMMs, used to produce alignments. If you want to read more about Expectation Maximization in general and the Baum-Welch algorithm in particular (material from last time), you can check out Chapter 11 in the text (beware of typos in equations).

Lecture 12 (Thurs Feb 27)

Reading: Chapter 5 on profile HMMs for modeling protein families.

Lecture 13 (Tues March 4)

We'll finish profile HMMs with a discussion of the sophisticated prior distributions, i.e., mixtures of Dirichlet priors, used in parameter estimation for these models. We'll also do a quick tour of the secondary structure prediction problem:

Lecture 14 (Thurs March 6)

In this lecture, we'll give an introduction to microarray technology, gene expression data, and an overview of some of the main learning problems of interest for this data: classification of samples, clustering, and inference of regulatory relationships.

Lecture 15 (Tues March 11)

First, to help you get groups together for the project -- please post an information web page about yourself and your interests for the project. Please see the project guidelines page for required information. Send the URL to Omar (osa2001@cs.columbia.edu), or send him the HTML page itself if you prefer. Please do this sometime during the week. I would like to have information for everyone in the class posted before Spring Break.

In this lecture, we'll discuss clustering algorithms for gene expression data, such as hierarchical clustering and K-means. We'll also touch on some other learning problems in functional genomics, including regulatory network inference.
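
As an illustration, here is a small sketch of K-means on expression profiles (each gene is a vector of expression values across experiments). The toy data are made up, and a real analysis would normalize the profiles and choose K and the distance measure more carefully.

    import random

    def kmeans(profiles, k, iters=100):
        centers = random.sample(profiles, k)
        for _ in range(iters):
            # assignment step: each profile goes to its nearest center (squared Euclidean distance)
            clusters = [[] for _ in range(k)]
            for p in profiles:
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                clusters[j].append(p)
            # update step: each center becomes the mean of its assigned profiles
            new_centers = [
                [sum(vals) / len(cl) for vals in zip(*cl)] if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        return centers, clusters

    genes = [[2.1, 1.9, 0.2], [2.0, 2.2, 0.1], [0.1, 0.3, 2.5], [0.2, 0.1, 2.4]]
    centers, clusters = kmeans(genes, k=2)
    print(centers)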

Lecture 16 (Thurs March 13)

Midterm test

Lecture 17 (Tues March 25)

We'll talk about approaches to the classification problem -- including k-nearest neighbor and Fisher's linear discriminant -- and introduce support vector machines and kernel methods.
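
To give a flavor of the simplest classifier we'll cover, here is a toy k-nearest neighbor sketch for classifying expression profiles. The training vectors and labels are made up.

    from collections import Counter

    def knn_predict(train, labels, query, k=3):
        # rank training samples by squared Euclidean distance to the query profile
        order = sorted(range(len(train)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], query)))
        # majority vote among the k closest samples
        votes = Counter(labels[i] for i in order[:k])
        return votes.most_common(1)[0][0]

    train = [[1.0, 0.2], [0.9, 0.1], [0.2, 1.1], [0.1, 0.9]]
    labels = ["tumor", "tumor", "normal", "normal"]
    print(knn_predict(train, labels, [0.8, 0.3], k=3))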

Lecture 18 (Thurs March 27)

Presentation of the SVM hard margin ("maximal margin") classifier and the slack variable idea for soft margin SVMs.
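
The following sketch trains a linear soft-margin SVM by subgradient descent on the hinge-loss objective (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b)). The toy data, step size, and C are made up, and a real implementation would solve the quadratic program (or use SMO) rather than take plain gradient steps.

    def svm_train(X, y, C=1.0, lr=0.01, epochs=2000):
        d = len(X[0])
        w, b = [0.0] * d, 0.0
        for _ in range(epochs):
            gw, gb = list(w), 0.0              # gradient of the (1/2)||w||^2 term is w
            for xi, yi in zip(X, y):
                margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
                if margin < 1:                 # point inside the margin or misclassified
                    gw = [gwj - C * yi * xj for gwj, xj in zip(gw, xi)]
                    gb -= C * yi
            w = [wj - lr * gwj for wj, gwj in zip(w, gw)]
            b -= lr * gb
        return w, b

    X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]]
    y = [1, 1, -1, -1]
    w, b = svm_train(X, y)
    print(w, b, [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x in X])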

Lecture 19 (Tues April 1)

We'll quickly finish soft-margin SVM classifiers, kernels, and feature selection. We'll also start a discussion of transcriptional regulation, in preparation for talking about Bayes nets and other models for learning regulatory networks.

Lecture 20 (Thurs April 3)

We'll present an overview of Bayes nets for inferring regulatory networks and start discussing the papers listed below.

Lecture 21/Lecture 22 (Tues April 8)

We'll have back-to-back lectures today, going into detail on the three papers introduced in the last lecture on using Bayes nets to learn regulatory networks. In particular, we'll talk about the Bayesian score for scoring network structures and several approaches for learning structures. Note that there will be no new lecture on Thursday, April 10, but CVN has agreed to show the video of Lecture 22 during the regular class time on Thursday.

Lecture 23 (Tues April 15)

We'll have a guest lecture by Prof. Harmen Bussemaker of the Biology Department, who will talk about the REDUCE algorithm, which detects regulatory elements (motifs) in promoter sequences via correlation with gene expression.

Lecture 24 (Thurs April 17)

We'll discuss some approaches to motif discovery, also called computational signal finding. In particular, we'll cover MEME, a popular motif discovery algorithm based on expectation maximization.
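
To illustrate the EM idea behind MEME, here is a toy sketch of one iteration under a "one occurrence per sequence" assumption: the E-step computes the posterior over motif start positions given the current position weight matrix (PWM) and a uniform background, and the M-step re-estimates the PWM from expected counts. The sequences, motif width, and starting PWM are all made up; the real MEME program also handles background estimation, multiple motif models, and starting-point search.

    ALPHABET = "ACGT"
    W = 3                                        # motif width
    seqs = ["ACGTACG", "TTACGTT", "GGACGAA"]

    # initial PWM: pwm[j][a] = P(letter a at motif position j); slightly biased toward "ACG"
    pwm = [{"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
           {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
           {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2}]
    background = {a: 0.25 for a in ALPHABET}

    def site_likelihood(seq, start):
        # P(sequence | motif starts at `start`): motif positions use the PWM,
        # all other positions use the background model
        p = 1.0
        for i, a in enumerate(seq):
            p *= pwm[i - start][a] if start <= i < start + W else background[a]
        return p

    def em_step():
        counts = [{a: 0.01 for a in ALPHABET} for _ in range(W)]   # small pseudocounts
        for seq in seqs:
            # E-step: posterior over motif start positions for this sequence
            like = [site_likelihood(seq, s) for s in range(len(seq) - W + 1)]
            total = sum(like)
            for s, l in enumerate(like):
                post = l / total
                for j in range(W):
                    counts[j][seq[s + j]] += post
        # M-step: renormalize expected counts into a new PWM
        return [{a: c / sum(col.values()) for a, c in col.items()} for col in counts]

    print(em_step())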

Lecture 25 (Tues April 22)

Introduction to computational gene-finding for eukaryotes (in particular, vertebrates and humans). The main reference is Chris Burge's paper on GENSCAN, one of the best-known gene-finding programs. The second reference is David Haussler's review article on computational gene-finding.

Lecture 26 (Thurs April 24)

We'll do a quick discussion of TWINSCAN, a new gene-finding algorithm that uses both the GENSCAN model and a model of conservation across two organisms to improve prediction. We'll also talk about a new comparative genomics paper from Eric Lander's group (the computational companion paper to an upcoming Nature paper), which used comparative annotation of four species of yeast to do regulatory motif discovery.

Lecture 27 (Tues April 29)

We'll talk about protein classification and remote homology detection, one of the central problems in computational biology.

Lecture 28 (Thurs May 1)

Final test