Lecture schedule from Spring 2002

Lecture 1

Most of the following slides are borrowed from Prof. Russ Altman (Stanford Medical Informatics). The topic is online repositories for biological data:

Lecture 2

Reading: Section 1.3 in Durbin. There is also good but terse material on probabilistic methods in Chapter 11 of the text -- see in particular Section 11.3 on inference.

Some background material and references for the splice site recognition problem (supplied for your interest only -- you aren't required to know details about splicing beyond what I present in class):

Lecture 3

Reading: Durbin, Sections 2.1, 2.2 and 2.3 until the end of the subsection on global alignment (Needleman-Wunsch algorithm).
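To make the Needleman-Wunsch recurrence concrete, here is a minimal Python sketch of global alignment with a linear gap penalty. The match/mismatch/gap values are arbitrary illustration numbers, not the scoring scheme from the text, and only the optimal score (no traceback) is computed.

    # Minimal Needleman-Wunsch global alignment score (linear gap penalty).
    # The scores below (match=+1, mismatch=-1, gap=-2) are illustrative only.
    def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
        n, m = len(x), len(y)
        # F[i][j] = best score for aligning x[:i] with y[:j]
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if x[i - 1] == y[j - 1] else mismatch
                F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                              F[i - 1][j] + gap,     # gap in y
                              F[i][j - 1] + gap)     # gap in x
        return F[n][m]

    print(needleman_wunsch("GATTACA", "GCATGCU"))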

Lecture 4

Reading: Durbin, Section 2.3 until the end of the subsection on local alignment (Smith-Waterman). Also take a look at the affine gap penalty part of Section 2.4. We won't do every variant of pairwise alignment in class, but it's useful to see how many different versions there are.
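Below is the local-alignment analogue as a rough sketch: the only changes from the global version are the zero floor in the recurrence and reporting the best cell anywhere in the matrix. The parameters are again made-up illustration values, and the gap penalty is linear rather than affine.

    # Minimal Smith-Waterman local alignment score (linear gap penalty).
    # Parameters are illustrative defaults, not the text's.
    def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
        n, m = len(x), len(y)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        best = 0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if x[i - 1] == y[j - 1] else mismatch
                F[i][j] = max(0,                      # start a new local alignment
                              F[i - 1][j - 1] + s,
                              F[i - 1][j] + gap,
                              F[i][j - 1] + gap)
                best = max(best, F[i][j])
        return best

    print(smith_waterman("ACACACTA", "AGCACACA"))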

Lecture 5

Reading: This lecture we'll be finishing up pairwise alignment. Read through Section 2.5 on heuristic alignment algorithms, Section 2.7 on the significance of scores (the "classical approach" subsection is most important), and Section 2.8 on deriving score parameters from data. Some of the explanation is quite sketchy, and the links below provide clearer exposition. Also start in on Section 3.1 for Markov chains.
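As a small illustration of the Section 2.8 idea of deriving score parameters from data, the sketch below turns made-up counts of aligned residue pairs into log-odds substitution scores, s(a,b) = log2(p_ab / (q_a q_b)). The counts and the two-letter alphabet are purely hypothetical.

    import math

    # Toy derivation of log-odds substitution scores from aligned-pair counts.
    # The counts and the two-letter alphabet are invented for illustration.
    pair_counts = {("A", "A"): 60, ("A", "G"): 15, ("G", "A"): 15, ("G", "G"): 10}
    total = sum(pair_counts.values())
    p = {k: v / total for k, v in pair_counts.items()}

    # Background frequencies q_a from the marginals of the pair frequencies.
    q = {}
    for (a, b), f in p.items():
        q[a] = q.get(a, 0.0) + f / 2
        q[b] = q.get(b, 0.0) + f / 2

    for (a, b), pab in sorted(p.items()):
        score = math.log(pab / (q[a] * q[b]), 2)   # score in bits
        print("s(%s,%s) = %+.2f" % (a, b, score))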

Lecture 6

Reading: Sections 3.1 and start of 3.2 on Markov chains and Hidden Markov Models for CpG island detection.
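As a preview of the CpG-island example, here is a toy two-Markov-chain discriminator: score a sequence by the log-odds ratio of a "+" (island) chain to a "-" (background) chain. The transition probabilities below are illustrative numbers, not the tables estimated from data in the text.

    import math

    # Toy CpG-island discriminator using two first-order Markov chains.
    # Transition probabilities are illustrative, not the text's estimates.
    plus = {   # "+" model: inside a CpG island
        "A": {"A": 0.18, "C": 0.27, "G": 0.43, "T": 0.12},
        "C": {"A": 0.17, "C": 0.37, "G": 0.27, "T": 0.19},
        "G": {"A": 0.16, "C": 0.34, "G": 0.38, "T": 0.12},
        "T": {"A": 0.08, "C": 0.36, "G": 0.38, "T": 0.18},
    }
    minus = {  # "-" model: background
        "A": {"A": 0.30, "C": 0.20, "G": 0.29, "T": 0.21},
        "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
        "G": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
        "T": {"A": 0.18, "C": 0.24, "G": 0.29, "T": 0.29},
    }

    def log_odds(seq):
        """Sum over adjacent pairs of log2 P+(b|a) - log2 P-(b|a)."""
        return sum(math.log2(plus[a][b] / minus[a][b])
                   for a, b in zip(seq, seq[1:]))

    print(log_odds("CGCGCGAT"))   # positive score suggests island-like
    print(log_odds("ATATTTAA"))   # negative score suggests background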

Lecture 7

Reading: Continue with Section 3.2 on the Viterbi algorithm for Hidden Markov Models.
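To see the shape of the Viterbi recursion, here is a minimal log-space implementation for a small two-state HMM in the spirit of the text's occasionally-dishonest-casino example; the probabilities are illustrative, not the book's exact numbers.

    import math

    # Minimal Viterbi decoding for a toy two-state HMM, in log space.
    # States and probabilities are illustrative (a fair/loaded die setup).
    states = ["F", "L"]
    start = {"F": 0.5, "L": 0.5}
    trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
    emit = {"F": {c: 1 / 6 for c in "123456"},
            "L": {"1": 0.1, "2": 0.1, "3": 0.1, "4": 0.1, "5": 0.1, "6": 0.5}}

    def viterbi(obs):
        V = [{k: math.log(start[k]) + math.log(emit[k][obs[0]]) for k in states}]
        ptr = [{}]
        for x in obs[1:]:
            V.append({})
            ptr.append({})
            for k in states:
                prev, score = max(((j, V[-2][j] + math.log(trans[j][k]))
                                   for j in states), key=lambda t: t[1])
                V[-1][k] = score + math.log(emit[k][x])
                ptr[-1][k] = prev
        # Trace back the most probable state path.
        path = [max(states, key=lambda k: V[-1][k])]
        for t in range(len(obs) - 1, 0, -1):
            path.append(ptr[t][path[-1]])
        return "".join(reversed(path))

    print(viterbi("315116246446644245311321631164"))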

Lecture 8

Reading: Section 3.2 on posterior decoding (the forward and backward algorithms).
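Here is a matching sketch of posterior decoding for the same kind of toy HMM: a forward pass, a backward pass, and the posterior P(state at i is k | x) = f_k(i) b_k(i) / P(x). It works with raw probabilities and no scaling, so it is only safe for short sequences; the model numbers are again illustrative.

    # Minimal forward/backward posterior decoding for a toy two-state HMM.
    # No scaling is done, so this is only safe for short sequences.
    states = ["F", "L"]
    start = {"F": 0.5, "L": 0.5}
    trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
    emit = {"F": {c: 1 / 6 for c in "123456"},
            "L": {"1": 0.1, "2": 0.1, "3": 0.1, "4": 0.1, "5": 0.1, "6": 0.5}}

    def posterior(obs):
        n = len(obs)
        # Forward pass: one dict of forward probabilities per position.
        f = [{k: start[k] * emit[k][obs[0]] for k in states}]
        for x in obs[1:]:
            f.append({k: emit[k][x] * sum(f[-1][j] * trans[j][k] for j in states)
                      for k in states})
        px = sum(f[-1][k] for k in states)          # P(x)
        # Backward pass: b[i][k], initialized to 1 at the last position.
        b = [{k: 1.0 for k in states} for _ in range(n)]
        for i in range(n - 2, -1, -1):
            b[i] = {k: sum(trans[k][j] * emit[j][obs[i + 1]] * b[i + 1][j]
                           for j in states) for k in states}
        # Posterior probability of each state at each position.
        return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(n)]

    for i, post in enumerate(posterior("6661226")):
        print(i, {k: round(v, 3) for k, v in post.items()})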

Lecture 9

Reading: Training HMMs -- the parameter estimation problem (Section 3.3). Maximum likelihood estimation when (1) the states for the training data are known and (2) the states for the training data are unknown (Expectation Maximization). Also read about scaling probabilities for the forward/backward algorithms in the last section of Chapter 3.
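For the easy case (1), where the state path is known, maximum likelihood estimation is just counting and normalizing; a tiny sketch with made-up labeled data and +1 pseudocounts is below. The EM case (2) is next lecture's Baum-Welch algorithm.

    # ML estimation of HMM parameters when the state path is known:
    # normalized transition and emission counts. The labeled data and
    # pseudocounts here are made up for illustration.
    seq  = "6646136662"     # observed symbols
    path = "LLLFFFLLLF"     # known (annotated) state for each symbol
    states, symbols = "FL", "123456"

    # +1 pseudocounts so unseen transitions/emissions don't get probability zero.
    a_count = {s: {t: 1 for t in states} for s in states}
    e_count = {s: {x: 1 for x in symbols} for s in states}
    for s1, s2 in zip(path, path[1:]):
        a_count[s1][s2] += 1
    for s, x in zip(path, seq):
        e_count[s][x] += 1

    trans = {s: {t: c / sum(a_count[s].values()) for t, c in a_count[s].items()}
             for s in states}
    emit = {s: {x: c / sum(e_count[s].values()) for x, c in e_count[s].items()}
            for s in states}
    print(trans)
    print(emit)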

Lecture 10

Reading: Section 3.3 on the Baum-Welch algorithm. We'll finally cover the learning algorithm (a special case of Expectation Maximization) used to train the parameters of an HMM when the state sequence for the training data is unknown. If there's time, we'll start Chapter 4 on pair HMMs, used to produce alignments.

Lecture 11

Reading: Chapter 4 on pair HMMs, used to produce alignments. If you want to read more about Expectation Maximization in general and the Baum-Welch algorithm in particular (material from last time), you can check out Chapter 11 in the text (beware of typos in equations).

Lecture 12

Reading: Chapter 5 on profile HMMs for modelling protein families.

Lecture 13

First, to help you get groups together for the project -- please copy and fill out the HTML information sheet template with information about yourself, your background, and your project interests. Either post the template yourself and mail the URL to Ilan, or send the HTML file to Ilan and he'll post it. Please do this sometime in the next week. I would like to have information for everyone in the class posted before Spring Break.

In this lecture, we'll do a brief introduction to gene expression data and machine learning approaches to classification and clustering problems for vector-valued data -- this class is preparation for the guest lecture by Dr. Paul Pavlidis next time. This material is not in the text -- I'll work on finding and posting some good reference material for these topics. In the meantime, the links below provide some background and pictures.

Lecture 14

Guest lecture by Dr. Paul Pavlidis, head of the Gene Expression Informatics Group at the Columbia Genome Center.

Lecture 15

In this lecture, we'll discuss two clustering algorithms, hierarchical clustering and K-means.
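As a rough sketch of K-means, here is a minimal implementation run on toy 2-D points standing in for expression profiles (real profiles would have one dimension per array or condition; the data and K=3 are arbitrary).

    import numpy as np

    # Minimal K-means sketch on toy 2-D data drawn around three centers.
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(loc, 0.3, size=(20, 2))
                      for loc in ([0, 0], [3, 0], [0, 3])])

    def kmeans(X, k, iters=20):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest center.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Move each center to the mean of its assigned points.
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return centers, labels

    centers, labels = kmeans(data, k=3)
    print(centers)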

Lecture 16

Midterm test.

Lecture 17

We'll start giving more details on support vector machines and kernel methods (for classification problems).

Lecture 18

Presentation of the SVM hard margin ("maximal margin") classifier.

Lecture 19

Soft-margin SVM classifiers, kernels, and feature selection.
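A small sketch of the soft-margin and kernel ideas, using scikit-learn (a modern library that postdates this course) purely to show the roles of the C parameter and the kernel choice on a toy problem that has no linear separator:

    import numpy as np
    from sklearn.svm import SVC

    # Toy data: labels depend on distance from the origin, so no linear
    # separator exists in the original feature space.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

    # C controls the soft margin (how much slack is tolerated); the kernel
    # determines the implicit feature space.
    linear = SVC(kernel="linear", C=1.0).fit(X, y)
    rbf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
    print("linear kernel training accuracy:", linear.score(X, y))
    print("RBF kernel training accuracy:   ", rbf.score(X, y))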

Lecture 20

We'll discuss kernels for SVMs in general, and in particular the paper on the Fisher-SVM approach to remote homology detection.

Lecture 21

We'll finish up with standard kernels and operations on kernels, and we'll discuss principal component analysis (PCA) -- a standard dimension reduction technique -- and kernel PCA.
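Here is a minimal numpy sketch of ordinary (linear) PCA via the SVD of the centered data matrix, on made-up data; kernel PCA performs the analogous eigen-decomposition on a centered kernel matrix instead.

    import numpy as np

    # Minimal PCA via SVD of the centered data matrix. The data is random toy
    # data; in practice rows might be genes or samples, columns conditions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))

    Xc = X - X.mean(axis=0)                  # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                      # top two principal directions
    projected = Xc @ components.T            # coordinates in the reduced space
    explained = (s ** 2) / (s ** 2).sum()
    print("variance fraction in first two PCs:", explained[:2].sum())
    print("projected shape:", projected.shape)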

Lecture 22

Introduction to Bayes nets for inferring regulatory networks from gene expression data. Overview of the following three papers:

Lecture 23

We'll start discussing the use of Bayes nets in the above three papers in more detail, starting with the Pe'er paper.

Lecture 24

Finish discussion of Pe'er paper and Ong paper on dynamic Bayes nets.

Lecture 25

Introduction to computational gene-finding for eukaryotes (in particular, vertebrates and humans). The main reference is Chris Burge's paper on GENSCAN, one of the best-known gene-finding programs. The second reference is David Haussler's review article on computational gene-finding.

Lecture 26

More details on GENSCAN and a quick discussion of TWINSCAN, a new gene-finding algorithm that uses both the GENSCAN model and a model of conservation across two organisms to improve prediction.

Lecture 27

For the last week, we'll discuss approaches to computational signal finding. This lecture covers MEME, a popular motif discovery algorithm based on expectation maximization.
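To give a feel for the EM core of MEME, here is a heavily simplified sketch: one motif occurrence per sequence, fixed width, uniform background, and tiny made-up sequences with a planted ACGT motif. Real MEME has many more model variants, starting-point heuristics, and statistics, and on such a small toy example the result can depend on the random start.

    import numpy as np

    # Heavily simplified EM motif finder (one occurrence per sequence, fixed
    # width W, uniform background). Sequences, width, and pseudocounts are
    # made up for illustration.
    ALPHABET = "ACGT"
    IDX = {c: i for i, c in enumerate(ALPHABET)}
    seqs = ["ACGTTGCATGT", "TTACGTAACCG", "GGGACGTTTTT", "CCCCTACGTAA"]
    W = 4                                    # the planted motif is ACGT

    def em_motif(seqs, W, iters=30, seed=0):
        rng = np.random.default_rng(seed)
        # Position weight matrix, initialized near uniform with a little noise.
        pwm = np.full((W, 4), 0.25) + rng.uniform(0, 0.05, size=(W, 4))
        pwm /= pwm.sum(axis=1, keepdims=True)
        background = np.full(4, 0.25)
        encoded = [[IDX[c] for c in s] for s in seqs]
        for _ in range(iters):
            counts = np.ones((W, 4))         # pseudocounts for the M-step
            for s in encoded:
                starts = len(s) - W + 1
                # E-step: posterior over where the motif starts in this sequence.
                like = np.array([np.prod([pwm[k, s[j + k]] / background[s[j + k]]
                                          for k in range(W)])
                                 for j in range(starts)])
                post = like / like.sum()
                # M-step contribution: expected letter counts inside the motif.
                for j, p in enumerate(post):
                    for k in range(W):
                        counts[k, s[j + k]] += p
            pwm = counts / counts.sum(axis=1, keepdims=True)
        return pwm

    pwm = em_motif(seqs, W)
    print("consensus:", "".join(ALPHABET[i] for i in pwm.argmax(axis=1)))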

Lecture 28

More computational approaches to motif discovery.