ADVANCED MACHINE LEARNING    

            

CLASS PROJECT
PROF. TONY JEBARA

 

 

PRESENTATIONS ON

APR 22, APR 27, APR 29 AND MAY 04  2016

 

WRITE UP DUE ON

MAY 7th 2016 BY MIDNIGHT

 

1. These are 4-person team projects. It is up to you to form a team of four people to carry out the project and to produce a co-authored paper representing the entire team's work. Machine learning is increasingly a multi-person effort at companies (and in academia), so collaboration is a crucial skill.

 

2. Use the TA and Professor's office hours to discuss your project ideas and make sure they are reasonable. Be prepared to discuss broadly what you plan to do, what you are doing, what results you expect, and so on. If you are really stuck, look at the topics below and those covered in class to think of a direction; we can also help you if you come to office hours.

 

3. The final presentations should be PowerPoint or PDF files and must be no more than 10 minutes long. Since you are in a group of 4 people, you don't all have to present: one person can be the designated speaker for the group, and everyone will get the same grade for the presentation and the write-up. We strongly suggest that your team use no more than 10 slides in total. Time yourself to make sure you do not run over. We will deduct points if you exceed your allotted time, and we will also stop you when it runs out (that is how things work at real conferences).

 

4. After the presentations, submit a write-up as a two-column, conference-style document, either a PostScript file project.ps or a PDF file project.pdf, whichever is more appropriate and convenient for you to produce. Please do not send your work as a Microsoft Office document, LaTeX source code, or something more exotic. Include images within your document as figures. Keep your total write-up to no more than 5 pages (two-column), although 2-person projects may write up to 8 pages. If you go over the page limits, you will also lose points (that's how conferences enforce limits). To see how to write a good paper and present it, check out this link:
http://www.cs.iastate.edu/~honavar/grad-advice.html
In particular, see Simon Peyton Jones on "How to Write a Good Research Paper".
We recommend using LaTeX to write your report: http://www.latex-project.org

 

Submit your project via Courseworks. If you are unable to, please email it to both the TAs and the Instructor. Please tar.gz everything in your project directory and then send it to us. Make sure you send us a write-up of your results as a PostScript or PDF file containing any figures, tables, and equations, as well as your Matlab or C code and scripts as separate files.

 

For examples of previous years' projects, take a look at:

http://www1.cs.columbia.edu/~jebara/6772/proj/

http://www1.cs.columbia.edu/~jebara/6998-01/projects/

(some links may be broken; just follow the ones that work)

 

PROJECT DESCRIPTION

 

Unlike the assignments, the projects have no fixed recipe to follow. Rather, you are free to pick a topic and direction you find motivating and to leverage the tools covered in class. Here are a few suggested themes, along with a few papers to look into.


    Combine discriminative and generative learning. Consider new ways to fuse the two, either via the tools we have discussed or via new ideas of your own. Also consider structured prediction with a novel type of structural constraint (not just linear chains or HMM dependence). A short code sketch follows the papers below.

 

Maximum margin Markov Networks

B. Taskar, C. Guestrin, D. Koller http://books.nips.cc/papers/files/nips16/NIPS2003_AA04.pdf

 

Maximum entropy discrimination

T. Jaakkola, M. Meila and T. Jebara http://www1.cs.columbia.edu/~jebara/papers/maxent.pdf

 

Machine Learning: Discriminative and Generative

T. Jebara

 

Cutting-Plane Training of Structural SVMs

Joachims, Finley and Yu. http://www.cs.cornell.edu/People/tj/publications/joachims_etal_09a.pdf


Structured Prediction with Relative Margin

Shivaswamy and Jebara. http://www.cs.columbia.edu/~jebara/papers/icmla09structrmm.pdf


Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Lafferty, McCallum and Pereira http://www.cs.columbia.edu/~jebara/6772/papers/crf.pdf


Majorization for CRFs and Latent Likelihoods

Jebara and Choromanska http://www.cs.columbia.edu/~jebara/papers/nips2012.pdf
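
If you want a quick baseline before implementing any of the methods above, one low-tech way to combine the two views is to feed the outputs of a fitted generative model into a discriminative classifier. The sketch below (Python with scikit-learn; the data, the models, and the way the log-probabilities are appended are all illustrative choices, not the methods from these papers) compares a generative model, a discriminative model, and a crude hybrid:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary classification problem.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    # Generative model: class-conditional Gaussians (naive Bayes).
    gen = GaussianNB().fit(X_tr, y_tr)

    # Discriminative model on the raw features.
    disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # Crude hybrid: append the generative per-class posterior log-probabilities
    # to the raw features and train a discriminative classifier on top.
    def augment(Z):
        return np.hstack([Z, gen.predict_log_proba(Z)])

    hybrid = LogisticRegression(max_iter=1000).fit(augment(X_tr), y_tr)

    print("generative    :", gen.score(X_te, y_te))
    print("discriminative:", disc.score(X_te, y_te))
    print("hybrid        :", hybrid.score(augment(X_te), y_te))

Treat this only as a starting point; the papers above (maximum entropy discrimination, structured SVMs, majorization for CRFs) fuse the two views in far more principled ways.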

 

 


    Graph-based learning. A short code sketch follows the papers below.

 

Graph Transduction via Alternating Minimization

J. Wang, T. Jebara and S.F. Chang http://www.cs.columbia.edu/~jebara/papers/icml08.pdf


Graph Construction and b-Matching for Semi-Supervised Learning

T. Jebara, J. Wang and S.F. Chang http://www.cs.columbia.edu/~jebara/papers/JebWanCha09.pdf


Graph reconstruction with degree-constrained subgraphs

S. Andrews and T. Jebara http://www1.cs.columbia.edu/~jebara/papers/stu-andrews-workshop-submission-nips2007.pdf
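
To get a feel for graph-based semi-supervised learning before implementing the alternating-minimization or b-matching constructions above, here is a minimal label-propagation sketch on a k-nearest-neighbour graph (pure NumPy plus scikit-learn utilities; the kNN construction, k=8, and the iteration count are placeholder choices, and the papers above are precisely about doing the graph construction and propagation better):

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph

    # Two-moons data with only one labelled point per class.
    X, y = make_moons(n_samples=200, noise=0.08, random_state=0)
    labelled = np.zeros(len(y), dtype=bool)
    labelled[np.where(y == 0)[0][0]] = True
    labelled[np.where(y == 1)[0][0]] = True

    # Symmetrized kNN graph and row-stochastic propagation matrix.
    W = kneighbors_graph(X, n_neighbors=8, mode='connectivity').toarray()
    W = np.maximum(W, W.T)
    P = W / W.sum(axis=1, keepdims=True)

    # Push labels along edges; clamp the labelled nodes after each step.
    F = np.zeros((len(y), 2))
    F[labelled, y[labelled]] = 1.0
    for _ in range(200):
        F = P @ F
        F[labelled] = 0.0
        F[labelled, y[labelled]] = 1.0

    pred = F.argmax(axis=1)
    print("accuracy on unlabelled points:",
          (pred[~labelled] == y[~labelled]).mean())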

 


    Manifold learning. Consider ways to constrain or represent data that lives on a non-linear manifold. A short code sketch follows the papers below.

 

Neighborhood Components Analysis

J. Goldberger, S. Roweis, G. Hinton and R. Salakhutdinov,

http://www.cs.toronto.edu/~hinton/absps/nca.pdf


Minimum Volume Embedding

B. Shaw and T. Jebara,

http://www1.cs.columbia.edu/~jebara/papers/aistatsMVE07.pdf

 

Nonlinear Dimensionality Reduction by Semidefinite Programming

and Kernel Matrix Factorization

K. Weinberger, B. Packer and L. Saul,

http://www.seas.upenn.edu/~kilianw/publications/PDFs/kfactor_aistats05.pdf

 

Action Respecting Embedding 

M. Bowling, A. Ghodsi and D. Wilkinson, http://www.machinelearning.org/proceedings/icml2005/papers/009_Action_BowlingEtAl.pdf

 

GTM: The generative topographic mapping 

C. Bishop, http://www.ncrg.aston.ac.uk/Papers/postscript/NCRG_96_015.ps.Z

 

Nonlinear dimensionality reduction by locally linear embedding

S. Roweis and L. Saul, http://www.sciencemag.org/cgi/reprint/290/5500/2323.pdf

 

Kernel PCA and de-noising in feature spaces

S. Mika et al., http://www.kernelmachines.org/papers/MikSchSmoMueRaeSch99.ps.gz

 

A generalization of principal component analysis to the exponential family

M. Collins et al., http://www.research.att.com/~dasgupta/pca.pdf
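
A quick way to see what is at stake is to compare a linear embedding with two non-linear ones on data that lives on a curved manifold. The sketch below uses scikit-learn's KernelPCA and LocallyLinearEmbedding on a swiss roll (the kernel width, the neighbourhood size, and the correlation "sanity check" are all ad hoc choices for illustration):

    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.manifold import LocallyLinearEmbedding

    # Swiss roll: a 2-D manifold embedded non-linearly in 3-D.
    X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

    # Linear PCA mixes up points that are far apart along the manifold.
    Z_pca = PCA(n_components=2).fit_transform(X)

    # Kernel PCA with an RBF kernel (gamma is a placeholder to tune).
    Z_kpca = KernelPCA(n_components=2, kernel='rbf', gamma=0.02).fit_transform(X)

    # Locally linear embedding (Roweis & Saul) preserves local neighbourhoods.
    Z_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12).fit_transform(X)

    # Crude sanity check: how well does each embedding's best axis track the
    # true manifold coordinate t?
    for name, Z in [('PCA', Z_pca), ('kernel PCA', Z_kpca), ('LLE', Z_lle)]:
        corr = max(abs(np.corrcoef(t, Z[:, i])[0, 1]) for i in range(2))
        print(f"{name:10s} best |corr| with manifold coordinate: {corr:.2f}")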

 


    Feature selection. Aggressively discard irrelevant features in a classification problem. A short code sketch follows the papers below.

 

Feature selection for SVMs

J. Weston et al., http://www.ai.mit.edu/people/sayan/webPub/feature.ps

 

Feature selection and dualities in maximum entropy discrimination

T. Jebara and T. Jaakkola, http://www.cs.columbia.edu/~jebara/papers/uai.pdf
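
As a simple baseline to compare against the methods in these papers, you can run recursive feature elimination with a linear SVM: repeatedly drop the features with the smallest |w|. The sketch below uses scikit-learn; this is the classic RFE wrapper, not the exact algorithm of either paper above, and the dataset and hyperparameters are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import cross_val_score

    # 200 features, only a handful of which are informative.
    X, y = make_classification(n_samples=400, n_features=200, n_informative=10,
                               n_redundant=10, random_state=0)

    svm = LinearSVC(C=1.0, max_iter=10000)

    # Recursive feature elimination: drop 10% of the remaining features per step.
    selector = RFE(estimator=svm, n_features_to_select=10, step=0.1).fit(X, y)
    X_sel = X[:, selector.support_]

    print("all 200 features:", cross_val_score(svm, X, y, cv=5).mean())
    print("10 selected     :", cross_val_score(svm, X_sel, y, cv=5).mean())

Note that the sketch selects features on the full dataset and then cross-validates, which is optimistic; in your project, do the selection inside each training fold.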

 


    Novel Kernels. Try building kernels on unusual spaces (not just standard vectors). A short code sketch follows the papers below.

 

String matching kernels for text classification

H. Lodhi et al., http://www.support-vector.net/papers/string.ps

 

A kernel between sets of vectors

R. Kondor and T. Jebara

http://www.cs.columbia.edu/~jebara/papers/Kondor,Jebara_point_set.pdf

 

Probability Product Kernels

T. Jebara, R. Kondor and A. Howard

http://www1.cs.columbia.edu/~jebara/papers/jebara04a.pdf

 

Exploiting generative models in discriminative classifiers

T. Jaakkola and D. Haussler. http://www.ai.mit.edu/~tommi/papers/gendisc.ps


Density Estimation under Independent Similarly Distributed Sampling Assumptions

T. Jebara, Y. Song and K. Thadani

http://www1.cs.columbia.edu/~jebara/papers/nips07isd.pdf
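
A concrete way to start is to define a kernel directly on a non-vector space, precompute the Gram matrix, and hand it to an SVM. The sketch below implements a simple k-spectrum kernel on strings (counts of shared length-k substrings), a much simpler relative of the Lodhi et al. subsequence kernel; the toy strings and labels are made up for illustration:

    import numpy as np
    from collections import Counter
    from sklearn.svm import SVC

    def spectrum_kernel(a, b, k=3):
        """Count shared substrings of length k (an inner product of
        substring-count vectors, hence a valid positive semidefinite kernel)."""
        ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
        cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
        return sum(ca[s] * cb[s] for s in ca.keys() & cb.keys())

    # Toy strings: class 1 repeats the motif "abc", class 0 repeats "xyz".
    docs = ["abcabcabcxyz", "zzabcabcabc", "xyzxyzxyzabc", "qqxyzxyzxyz"]
    labels = [1, 1, 0, 0]

    # Precompute the Gram matrix and train an SVM on it.
    G = np.array([[spectrum_kernel(a, b) for b in docs] for a in docs], dtype=float)
    clf = SVC(kernel='precomputed').fit(G, labels)
    print("training predictions:", clf.predict(G))

The same precomputed-kernel pattern works for kernels between sets of vectors or between probability distributions, as in the papers above.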

 


    Meta-Learning, Multi-Class and Multi-Task Learning. Can learning from one task help with other tasks? A short code sketch follows the papers below.

 

Multitask Learning

R. Caruana,  http://citeseer.nj.nec.com/10214.html

 

Multitask Sparsity via Maximum Entropy Discrimination

T. Jebara,  http://jmlr.csail.mit.edu/papers/volume12/jebara11a/jebara11a.pdf

 

Learning Internal Representations

J. Baxter,  http://citeseer.nj.nec.com/baxter95learning.html

 

Solving multiclass learning problems via error-correcting output codes

T. Dietterich and G. Bakiri, ftp.cs.orst.edu/pub/tgd/papers/jair-ecoc.ps.gz
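
Error-correcting output codes are easy to try directly: each class is assigned a binary codeword, one binary classifier is trained per code bit, and test points are decoded to the nearest codeword. Here is a minimal sketch with scikit-learn's OutputCodeClassifier on the digits data (the base learner, code length, and dataset are arbitrary illustrative choices):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.multiclass import OneVsRestClassifier, OutputCodeClassifier
    from sklearn.svm import LinearSVC

    # 10-class digit recognition built out of binary base learners.
    X, y = load_digits(return_X_y=True)
    base = LinearSVC(max_iter=10000)

    # ECOC (Dietterich & Bakiri): code_size controls the redundancy of the code
    # (2.0 means the codewords are twice as long as the number of classes).
    ecoc = OutputCodeClassifier(estimator=base, code_size=2.0, random_state=0)
    ovr = OneVsRestClassifier(base)

    print("one-vs-rest:", cross_val_score(ovr, X, y, cv=3).mean())
    print("ECOC       :", cross_val_score(ecoc, X, y, cv=3).mean())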

 


    Temporal Modeling. How to model complicated dynamic systems, particularly if they have interactions, couplings and hierarchy. A short code sketch follows the papers below.

 

Learning switching linear models of human motion

V. Pavlovic et al., http://www.cc.gatech.edu/~rehg/Papers/SLDS-NIPS00.pdf

 

Dynamical Systems Trees

A. Howard and T. Jebara http://www1.cs.columbia.edu/~jebara/papers/uai04.pdf

 

Nonlinear prediction of chaotic time series using support vector machines

S. Mukherjee et al., http://www.ai.mit.edu/people/girosi/home-page/nnsp97.pdf

 

Coupled hidden Markov models for modeling interacting processes

M. Brand, http://www.media.mit.edu/people/brand/papers/brand-chmm.ps.gz
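
An easy entry point into the Mukherjee et al. line of work is one-step-ahead prediction of a chaotic series from a short delay embedding. The sketch below uses the logistic map as a stand-in for their data and an RBF support vector regressor (the map, the embedding dimension, and the SVR hyperparameters are all placeholder choices):

    import numpy as np
    from sklearn.svm import SVR

    # A chaotic series from the logistic map.
    x = np.empty(1200)
    x[0] = 0.5
    for t in range(1199):
        x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

    # Delay embedding: predict x[t] from the previous d values.
    d = 5
    X = np.array([x[t - d:t] for t in range(d, len(x))])
    y = x[d:]
    X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

    # RBF support vector regression for one-step-ahead prediction.
    svr = SVR(kernel='rbf', C=10.0, gamma=1.0, epsilon=0.001).fit(X_tr, y_tr)
    pred = svr.predict(X_te)
    print("test RMSE:", np.sqrt(np.mean((pred - y_te) ** 2)))

The switching linear models, dynamical systems trees and coupled HMMs above go much further by modelling structure in the dynamics rather than just regressing the next value.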

 


    Approximate Methods for Bayesian models. A short code sketch follows the papers below.

 

Variational Bayes for mixture models

H. Attias, http://research.microsoft.com/~hagaia/uai99.ps

W. Penny, http://www.fil.ion.ucl.ac.uk/~wpenny/publications/vgbmm.ps

 

Expectation-propagation for approximate inference in dynamic Bayesian nets  

Heskes & Zoeter ftp://ftp.mbfys.kun.nl/pub/snn/pub/reports/Heskes.uai2002.ps.gz

 

 


    SVMs and variants, transduction, universum, etc. A short code sketch follows the papers below.

 

Inference with the Universum

J. Weston et al., http://www.icml2006.org/icml_documents/camera-ready/127_Inference_with_the_U.pdf

 

Learning with Local and Global Consistency

D. Zhou et al., http://research.microsoft.com/~denzho/papers/LLGC.pdf

 

Transductive inference for text classification using SVMs 

T. Joachims, http://www-ai.cs.uni-dortmund.de/DOKUMENTE/Joachims_99c.ps.gz

 

Relative margin machines 

P. Shivaswamy and T. Jebara, http://www.cs.columbia.edu/~jebara/papers/nips08.pdf

 

The relevance vector machine 

M. Tipping, ftp.research.microsoft.com/users/mtipping/rvm_nips.ps.gz

 

Estimating the Support of a High-Dimensional Distribution.

Scholkopf et al. Microsoft Technical Report MSR-TR-99-87, 1999.
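
The last reference above is straightforward to experiment with: a one-class SVM estimates a region that contains most of the training data, so points outside it can be flagged as outliers. A minimal sketch with scikit-learn (the synthetic data and the nu and gamma values are placeholders to tune):

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Points drawn from a 2-D Gaussian, plus a few far-away outliers.
    rng = np.random.default_rng(0)
    X_in = rng.normal(size=(200, 2))
    X_out = rng.uniform(low=4, high=6, size=(10, 2)) * rng.choice([-1, 1], size=(10, 2))

    # One-class SVM: nu upper-bounds the fraction of training points
    # allowed to fall outside the estimated support region.
    ocsvm = OneClassSVM(kernel='rbf', nu=0.05, gamma=0.5).fit(X_in)

    print("inliers  predicted in-support :", (ocsvm.predict(X_in) == 1).mean())
    print("outliers predicted in-support :", (ocsvm.predict(X_out) == 1).mean())

Transduction, the universum and relative margin machines require modifying the SVM objective itself, so for those you will likely be writing your own solver or using a specialized package.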

 


    Invariance. Learning a model despite some nuisance source of variation that must be separated away.

 

Rotation and Translation Invariance for Images

Come see me for the hardcopy of the paper.

 

Orbit Learning using Convex Optimization

T. Jebara and Y. Bengio, http://www1.cs.columbia.edu/~jebara/papers/snowbird3.pdf

 

Separating style and content with bilinear models

J. Tenenbaum and W. Freeman, http://www.merl.com/reports/docs/TR99-04.pdf

 

Estimating mixture models of images and  inferring

spatial transformations using EM

B. Frey and N. Jojic, http://www.psi.toronto.edu/~frey/papers/tmg-cvpr99.ps.Z

 

Kernelizing Sorting, Permutation and Alignment for Minimum Volume PCA

T. Jebara, http://www.cs.columbia.edu/~jebara/papers/permkern.pdf

 

Transformation Invariance in Pattern Recognition

Simard et al., http://yann.lecun.com/exdb/publis/psgz/simard-00.ps.gz

 


    Information Theoretic Learning. Using information theory in learning.

 

Multivariate information bottleneck

N. Friedman et al., http://www.cs.huji.ac.il/~noamm/publications/UAI2001.ps.gz

 


    Clustering and learning mixtures without EM. A short code sketch follows the papers below.

 

On Spectral Clustering: Analysis and an Algorithm

A. Ng, M. Jordan and Y. Weiss. http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf

 

B-Matching for Spectral Clustering

T. Jebara and V. Shchogolev. http://www1.cs.columbia.edu/~jebara/papers/bmatching.pdf

 

Expander Flows, Geometric Embeddings and Graph Partitioning

S. Arora, S. Rao, U. Vazirani. http://www.cs.princeton.edu/~arora/pubs/arvstoc.pdf
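
The Ng, Jordan & Weiss algorithm is only a few lines of linear algebra, which makes it a good starting point before moving to b-matched graphs or expander-flow partitioning. Here is a minimal NumPy sketch on two half-moons (the RBF bandwidth and the number of clusters are placeholder choices):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_moons

    # Two interleaving half-moons: k-means on the raw coordinates fails here.
    X, y = make_moons(n_samples=300, noise=0.06, random_state=0)

    # RBF affinity matrix with zero diagonal (sigma is a placeholder to tune).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * 0.1 ** 2))
    np.fill_diagonal(A, 0.0)

    # Normalized affinity D^{-1/2} A D^{-1/2}, as in Ng, Jordan & Weiss.
    deg = A.sum(axis=1)
    L = A / np.sqrt(np.outer(deg, deg))

    # Embed with the top-2 eigenvectors, row-normalize, then run k-means.
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, -2:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(U)

    acc = max((labels == y).mean(), (labels != y).mean())
    print("clustering accuracy (up to label swap):", acc)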

 

 


    General application areas (vision, text, audio, compbio)

 

Kernel Independent Component Analysis

F. Bach and M. Jordan, http://cmm.ensmp.fr/~bach/kernelICA-jmlr.pdf

 

Bayesian Out-Trees (applied to images)

T. Jebara, http://www.cs.columbia.edu/~jebara/papers/uai08tree.pdf

 

Probabilistic latent semantic analysis

T. Hofmann, http://www.cs.brown.edu/people/th/papers/Hofmann-UAI99.pdf

 


 

Or... any topic you can convince us is worthwhile and that involves advanced machine learning techniques! In particular, this should be a method published in a top machine learning conference in the past 15 years. Feel free to suggest new papers beyond those listed above. Places to look for papers include recent machine learning conferences such as:

Neural Information Processing Systems, NIPS

Uncertainty in Artificial Intelligence, UAI

International Conference on Machine Learning, ICML

Computer Vision and Pattern Recognition, CVPR

Conference on Learning Theory, COLT

and some machine learning journals like the Journal of Machine Learning Research, Journal of Artificial Intelligence Research, Machine Learning, Pattern Recognition, Neural Computation, IEEE Transactions on Pattern Analysis and Machine Intelligence, and so forth. Many recent articles from these venues are available online or in the library. You can find copies of the papers (PostScript and PDF) through Citeseer, a popular search engine for computer science publications: http://citeseer.nj.nec.com/cs

 

What are examples of bad choices for projects? Anything that only involves easy algorithms from the introductory class (just logistic regression, just SVMs, just HMMs, just perceptrons, just EM for mixtures of Gaussians, just the junction tree algorithm, etc.). These are fine methods to use as baselines to compare against while you develop or implement a better method, but the whole point is to go beyond the things we learned about in 4771. Also, do not waste too much time setting up and presenting the domain of your problem (say, motivating and setting up a problem from finance, genomics, etc.). This course is about the machine learning side of things and not about the domains.

Potential datasets on which to try some of your learning algorithms:

http://www1.ics.uci.edu/~mlearn/MLRepository.html

Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/

http://mldata.org

http://www.cs.toronto.edu/~delve/

http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/