APR 22, APR 27, APR 29 AND MAY 04, 2016





1. These are 4-person team projects. It is up to you to form a team of 4 people and to produce a co-authored paper representing the entire team's work. Machine learning is increasingly a multi-person effort at companies (and in academia), so collaboration is a crucial skill.


2. Use the TA and Professor's office hours to discuss your project ideas and make sure that they are reasonable. Be prepared to discuss broadly what you plan to do and are doing, what results you expect, and so on. If you are really stuck, look at the topics below and those covered in class to find a direction; we can also try to help you if you come to office hours.


3. The final presentations should be PowerPoint or PDF files and must be no more than 10 minutes long. Since you are in a group of 4 people, you don't all have to present; one person can be the designated speaker for the group, and everyone will get the same grade for the presentation and the write-up. We strongly suggest that your team use no more than 10 slides total. Time yourself to make sure you do not ramble on past your allowed time. We will deduct points if you exceed your allotted time, and we will also stop you if you go over (that is how things work at real conferences).


4. After presentations, submit a write-up in a two-column conference-paper-style document as a PostScript file or a Portable Document Format file project.pdf, whichever is more convenient for you to produce. Please do not send your work as a Microsoft Office document, LaTeX source code, or something more exotic. Include images within your document as figures. Keep your total write-up no longer than 5 pages (two-column), although for 2-person projects you can write up to 8 pages. If you go over the page limits, you will also lose points (that is how conferences enforce limits). To see how to write a good paper and present it, check out this link:
In particular see Simon Peyton Jones on "How to Write a Good Research Paper".
We recommend using LaTeX to write up your report:


Submit your homework via Courseworks. If you are unable to, please email it to both the TAs and the Instructor. Please tar.gz everything in your current directory and then send it to us. Make sure you send us a write-up of your results as a PostScript or PDF file containing any figures, tables, and equations, as well as your MATLAB or C code and scripts as separate files.


For examples of previous years' projects, take a look at:

 (some links may be broken; just follow the ones that work)




Unlike the assignments, for the projects there is no fixed recipe to follow. Rather, you are free to pick a topic and direction that you find motivating and to leverage the tools covered in class. Here are a few themes we suggest as well as a few papers to look into.

    Combine discriminative and generative learning. Consider new ways to fuse the two, either via the tools we have discussed or new ideas of your own. Also consider structured prediction and use a novel type of structural constraint (not just linear chains or HMM dependence).


Maximum margin Markov Networks

B. Taskar, C. Guestrin, D. Koller


Maximum entropy discrimination

T. Jaakkola, M. Meila and T. Jebara


Machine Learning: Discriminative and Generative

T. Jebara


Cutting-Plane Training of Structural SVMs

Joachims, Finley and Yu.

Structured Prediction with Relative Margin

Shivaswamy and Jebara.

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Lafferty, McCallum and Pereira

Majorization for CRFs and Latent Likelihoods

Jebara and Choromanska
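As a concrete starting point for the structured-prediction direction, here is a minimal structured perceptron for linear-chain sequence labeling (a hypothetical toy sketch of ours, not code from the papers above). The project idea would be to replace the linear-chain structure with a richer structural constraint.

```python
import numpy as np

# Minimal structured perceptron for linear-chain sequence labeling.
# Toy setup (hypothetical): observations are integer symbols, labels are 0/1.

def viterbi(obs, emit, trans, n_labels):
    """Best label sequence under additive emission + transition scores."""
    T = len(obs)
    score = np.zeros((T, n_labels))
    back = np.zeros((T, n_labels), dtype=int)
    score[0] = emit[:, obs[0]]
    for t in range(1, T):
        for y in range(n_labels):
            cand = score[t - 1] + trans[:, y]
            back[t, y] = np.argmax(cand)
            score[t, y] = cand[back[t, y]] + emit[y, obs[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def train_structured_perceptron(data, n_labels, n_obs, epochs=10):
    emit = np.zeros((n_labels, n_obs))
    trans = np.zeros((n_labels, n_labels))
    for _ in range(epochs):
        for obs, gold in data:
            pred = viterbi(obs, emit, trans, n_labels)
            if pred != list(gold):
                # Reward the gold sequence's features, penalize the prediction's.
                for t, (o, g, p) in enumerate(zip(obs, gold, pred)):
                    emit[g, o] += 1.0
                    emit[p, o] -= 1.0
                    if t > 0:
                        trans[gold[t - 1], g] += 1.0
                        trans[pred[t - 1], p] -= 1.0
    return emit, trans

# Toy data: the label is simply the observation symbol's parity.
data = [([0, 1, 2, 3], [0, 1, 0, 1]), ([2, 3, 0, 1], [0, 1, 0, 1])]
emit, trans = train_structured_perceptron(data, n_labels=2, n_obs=4)
pred = viterbi([1, 2, 3, 0], emit, trans, 2)
```

Swapping the chain for a tree or a more general graph structure changes only the decoding step.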



    Graph-based learning.


Graph Transduction via Alternating Minimization

J. Wang, T. Jebara and S.F. Chang

Graph Construction and b-Matching for Semi-Supervised Learning

T. Jebara, J. Wang and S.F. Chang

Graph reconstruction with degree-constrained subgraphs

S. Andrews and T. Jebara
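To get a feel for graph transduction, here is a minimal label-propagation sketch (a simplified illustration in the spirit of the papers above; the toy chain graph and the clamping scheme are our own hypothetical example, not the papers' algorithms).

```python
import numpy as np

def label_propagation(W, labels, n_classes, n_iter=100):
    """W: symmetric affinity matrix; labels[i] is a class id, or -1 if unknown."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    F = np.zeros((W.shape[0], n_classes))
    known = labels >= 0
    F[known, labels[known]] = 1.0
    for _ in range(n_iter):
        F = P @ F                          # diffuse label mass along edges
        F[known] = 0.0                     # clamp the labeled nodes back
        F[known, labels[known]] = 1.0
    return F.argmax(axis=1)

# Toy chain graph 0-1-2-3-4 with only the two endpoints labeled.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
labels = np.array([0, -1, -1, -1, 1])
pred = label_propagation(W, labels, 2)
```

The unlabeled nodes inherit the label of the nearer endpoint, which is the intuition behind graph transduction.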


    Manifold learning. Consider ways to constrain or represent data that lives on a non-linear manifold.


Neighborhood Components Analysis

J. Goldberger, S. Roweis, G. Hinton and R. Salakhutdinov


Minimum Volume Embedding

B. Shaw and T. Jebara


Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization

K. Weinberger, B. Packer and L. Saul


Action Respecting Embedding

M. Bowling, A. Ghodsi and D. Wilkinson


GTM: The generative topographic mapping

C. Bishop


Nonlinear dimensionality reduction by locally linear embedding

S. Roweis and L. Saul


Kernel PCA and de-noising in feature spaces

S. Mika et al.


A generalization of principal component analysis to the exponential family

M. Collins et al.
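As one concrete manifold-flavored baseline, here is a small kernel PCA sketch (following the standard centered-kernel formulation as in the kernel PCA line of work above; the RBF bandwidth and the toy two-cluster data are hypothetical choices of ours).

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    # RBF kernel matrix between all pairs of points.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    # Center the kernel matrix in feature space.
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Top eigenvectors give the nonlinear principal components.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scale so each component reflects its variance in feature space.
    return vecs * np.sqrt(np.maximum(vals, 1e-12))

# Two well-separated toy clusters; the first component should separate them.
X = np.vstack([np.random.RandomState(0).randn(20, 2) * 0.1,
               np.random.RandomState(1).randn(20, 2) * 0.1 + 3.0])
Z = kernel_pca(X, n_components=1, gamma=0.5)
```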


    Feature selection. Aggressively discard irrelevant features in a classification problem.


Feature selection for SVMs

J. Weston et al.


Feature selection and dualities in maximum entropy discrimination

T. Jebara and T. Jaakkola
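Before trying the papers' methods, a simple filter-style baseline helps calibrate results. The sketch below (our own hypothetical example, not from the papers above) scores each feature by a signal-to-noise ratio between two classes and keeps the top-scoring one.

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise score per feature for a binary labeling y in {0, 1}."""
    pos, neg = X[y == 1], X[y == 0]
    return np.abs(pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0) + 1e-12)

rng = np.random.RandomState(0)
n = 200
y = (rng.rand(n) > 0.5).astype(int)
X = rng.randn(n, 10)       # 10 pure-noise features...
X[:, 3] += 2.0 * y         # ...except feature 3, which carries the label
scores = s2n_scores(X, y)
top = np.argsort(scores)[::-1][:1]
```

A method like the papers' should beat this baseline, or at least explain why it does not.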


    Novel Kernels. Try building kernels on unusual spaces (not just standard vectors).


String matching kernels for text classification

H. Lodhi et al.


A kernel between sets of vectors

R. Kondor and T. Jebara


Probability Product Kernels

T. Jebara, R. Kondor and A. Howard


Exploiting generative models in discriminative classifiers

T. Jaakkola and D. Haussler.

Density Estimation under Independent Similarly Distributed Sampling Assumptions

T. Jebara, Y. Song and K. Thadani
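As a concrete example of a kernel on a non-vector space, here is a k-spectrum string kernel sketch (a simple relative of the string-matching kernels above): two strings are compared by counting their shared substrings of length k.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """Inner product of k-mer count vectors of two strings."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Counter returns 0 for missing keys, so only shared k-mers contribute.
    return sum(cs[g] * ct[g] for g in cs)

k_same = spectrum_kernel("machine learning", "machine learning")
k_diff = spectrum_kernel("machine learning", "xq")
```

Because it is an inner product of count vectors, this is a valid positive semi-definite kernel and plugs directly into an SVM.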


    Meta-Learning, Multi-Class and Multi-Task Learning. Can learning from one task help with other tasks?


Multitask Learning

R. Caruana


Multitask Sparsity via Maximum Entropy Discrimination

T. Jebara


Learning Internal Representations

J. Baxter


Solving multiclass learning problems via error-correcting output codes

T. Dietterich and G. Bakiri
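The decoding step of error-correcting output codes is easy to sketch: each class gets a binary codeword, one binary classifier is trained per bit, and a test point is assigned the class whose codeword is nearest in Hamming distance to the predicted bits. Below, the per-bit classifiers are stubbed out with their (possibly noisy) outputs and the codewords are hypothetical, to show only the decoding.

```python
import numpy as np

# Hypothetical 5-bit codewords for 3 classes.
codes = np.array([[0, 0, 1, 1, 0],   # class 0
                  [0, 1, 0, 1, 1],   # class 1
                  [1, 0, 0, 0, 1]])  # class 2

def ecoc_decode(bits, codes):
    """Return the class whose codeword is closest in Hamming distance."""
    hamming = np.abs(codes - bits).sum(axis=1)
    return int(np.argmin(hamming))

# Even with one flipped bit, decoding recovers the right class:
pred = ecoc_decode(np.array([0, 1, 0, 1, 0]), codes)  # class 1's code, last bit flipped
```

The redundancy in the code is what buys robustness to individual classifier errors.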


    Temporal Modeling. How to model complicated dynamic systems, particularly if they have interactions, couplings and hierarchy.


Learning switching linear models of human motion

V. Pavlovic et al.


Dynamical Systems Trees

A. Howard and T. Jebara


Nonlinear prediction of chaotic time series using support vector machines

S. Mukherjee et al.


Coupled hidden Markov models for modeling interacting processes

M. Brand
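The switching and coupled models above all build on basic HMM machinery, so a scaled forward-algorithm likelihood is a useful baseline building block (the 2-state parameters below are hypothetical).

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm. pi[i]: initial; A[i, j]: i -> j; B[i, o]: emission."""
    alpha = pi * B[:, obs[0]]
    s = alpha.sum()
    log_lik = np.log(s)
    alpha = alpha / s
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()          # rescale each step to avoid underflow
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

pi = np.array([1.0, 0.0])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
ll = hmm_log_likelihood([0, 0, 1], pi, A, B)
```

A coupled or switching model should improve on this plain-HMM likelihood on held-out sequences.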


    Approximate Methods for Bayesian models.


Variational Bayes for mixture models

H. Attias

W. Penny


Expectation-propagation for approximate inference in dynamic Bayesian nets  

Heskes & Zoeter



    SVMs and variants, transduction, universum, etc.


Inference with the Universum

J. Weston et al.


Learning with Local and Global Consistency

D. Zhou et al.


Transductive inference for text classification using SVMs

T. Joachims


Relative margin machines

P. Shivaswamy and T. Jebara


The relevance vector machine

M. Tipping


Estimating the Support of a High-Dimensional Distribution

B. Schölkopf et al., Microsoft Technical Report MSR-TR-99-87, 1999


    Invariance. Learning a model despite some nuisance source of variation that must be separated away.


Rotation and Translation Invariance for Images

Come see me for the hardcopy of the paper.


Orbit Learning using Convex Optimization

T. Jebara and Y. Bengio


Separating style and content with bilinear models

J. Tenenbaum and W. Freeman


Estimating mixture models of images and inferring spatial transformations using EM

B. Frey and N. Jojic


Kernelizing Sorting, Permutation and Alignment for Minimum Volume PCA

T. Jebara


Transformation Invariance in Pattern Recognition

P. Simard et al.


    Information Theoretic Learning. Using information theory in learning.


Multivariate information bottleneck

N. Friedman et al.


    Clustering and Learning mixtures without EM.


On Spectral Clustering: Analysis and an Algorithm

A. Ng, M. Jordan and Y. Weiss.


B-Matching for Spectral Clustering

T. Jebara and V. Shchogolev.


Expander Flows, Geometric Embeddings and Graph Partitioning

S. Arora, S. Rao, U. Vazirani.
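For clustering without EM, here is a compact spectral bipartition sketch. Note this is a simplified variant of our own: it uses the sign of the unnormalized graph Laplacian's second eigenvector (the Fiedler vector) for k = 2, not the full Ng-Jordan-Weiss algorithm, which uses the normalized Laplacian plus k-means on the eigenvector rows. The toy data is hypothetical.

```python
import numpy as np

def spectral_bipartition(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-gamma * sq)                # RBF affinity matrix
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A         # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    fiedler = vecs[:, 1]                   # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)       # sign split gives the two clusters

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(15, 2) * 0.2,
               rng.randn(15, 2) * 0.2 + 2.5])
labels = spectral_bipartition(X)
```

The eigenvector relaxes the combinatorial min-cut, which is why its sign pattern tracks the cluster structure.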



    General application areas (vision, text, audio, compbio)


Kernel Independent Component Analysis

F. Bach and M. Jordan


Bayesian Out-Trees (applied to images)

T. Jebara


Probabilistic latent semantic analysis

T. Hofmann



Or... any topic that you can convince us about and that involves advanced machine learning techniques! Typically, this would be a method published in a top machine learning conference in the past 15 years. Feel free to also bring new papers to the lists above and suggest them as well. Places to look for papers include recent machine learning conferences such as:

Neural Information Processing Systems, NIPS

Uncertainty in Artificial Intelligence, UAI

International Conference on Machine Learning, ICML

Computer Vision and Pattern Recognition, CVPR

Conference on Learning Theory, COLT

and some machine learning journals such as the Journal of Machine Learning Research, the Journal of Artificial Intelligence Research, Machine Learning, Pattern Recognition, Neural Computation, IEEE Transactions on Pattern Analysis and Machine Intelligence, and so forth. Many recent articles from these venues are available online or in the library. You can find copies of the papers (PostScript and PDF) through CiteSeer, a popular search engine for computer science publications:


What are examples of bad project choices? Anything that only involves easy algorithms from the introductory class (just logistic regression, just SVMs, just HMMs, just perceptrons, just EM for mixtures of Gaussians, just the junction tree algorithm, etc.). These are fine methods to use as baselines to compare against while you develop or implement a better method, but the whole point is to go beyond the things we learned about in 4771. Also, do not spend too much time setting up and presenting the domain of your problem (say, motivating and setting up a problem from finance, genomics, etc.). This course is about the machine learning side of things, not about the domains.

Potential datasets on which to try some of your learning algorithms:

Stanford Large Network Dataset Collection: