CS 4252 Final Project

For the final project, you will do independent research/reading on a topic not covered in COMS 4252 lectures (or go into significantly greater depth on some topic that we did touch upon) and write up a report in your own words as clearly and thoroughly as you can. Your project should demonstrate that you have synthesized information from several different sources (typically research papers). Projects absolutely do not need to have an experimental component but it is OK to have an experimental component if it is relevant. For projects that include an experimental component the report should include a clear description of the experiments that were performed and code should be made available.

Ten to fifteen pages is a reasonable length for your written report, though there is neither a hard upper limit nor lower limit on the length of the report. (Note that your report should be no longer than it needs to be -- concision will be rewarded and excessive wordiness or "padding" will be frowned on).

You may work in teams of up to three people for the final project, but you do not have to work in a team. Expectations for team projects will be calibrated according to the number of team members -- so a three-person project would need to demonstrate three times as much work as a one-person project on the same topic in order to get the same grade.


The timeline for final projects is:


  • Wed October 19: initial project proposal due.
  • Describe the topic you propose for your final project. This can be brief (no longer than a single paragraph) but should include some description of the sources you anticipate using for your project (names of papers, books, etc). Email this to the course account with subject line "Project Proposal".


  • Mon November 21: progress report due.
  • Send a 1-2 page high-level description of what you have accomplished so far and what you still have to do for your project. This should be substantially more detailed than your initial project proposal. Email this to the course account with subject line "Project Progress Report."


  • Mon December 12: final project report due.
  • Email this to the course account with subject line "Final Project Report".


    If you are not sure what to work on for your final project, here are some sample topics. It is OK to choose from this list but you are encouraged to spend some time browsing the literature since you may encounter a great topic that interests you more than any of these.

  • Kernels: Kernel functions and large margin classifiers are an important topic in contemporary machine learning and learning theory research. A starting point here is the book by Cristianini and Shawe-Taylor ("An introduction to support vector machines"). Such a project could include an experimental component, perhaps testing the performance of the Perceptron algorithm or other kernel-based algorithms with various kernels.
  • Random projection: papers by Arriaga/Vempala (FOCS 1999) and Klivans/Servedio (COLT 2004) (see also papers by Sanjoy Dasgupta on this topic) use random projection of data from n-dimensional space to k-dimensional space (with k much less than n) as an essential step of learning algorithms with provable performance guarantees. Read these and/or related papers on learning with random projection and write a summary/synthesis explaining what you learned. Here too experiments are an option.
  • Learning under the uniform distribution / Fourier analysis: learning Boolean functions from uniformly distributed random examples on {0,1}^n is an important topic in learning theory. Much more is learnable in these frameworks than in the standard PAC model where you must succeed w.r.t. any distribution. Some good papers to look at are Verbeurgt's 1989 COLT paper on learning DNF in quasipolynomial time under the uniform distribution; the paper of Kushilevitz and Mansour (SIAM J on Computing, 1993) on learning decision trees using the Fourier spectrum; Mansour's survey article on "Learning Boolean functions using the Fourier transform"; Linial, Mansour and Nisan's 1993 JACM article on "Constant depth circuits, Fourier transform and learnability"; and others.
  • Boosting: We will only touch the tip of the iceberg w.r.t. boosting algorithms -- many have been studied and analyzed. Experiments could play a role in a project here.
  • Multiclass learning: in the class we only discuss learning Boolean (i.e. two-valued) functions but there is much work on learning multivalued functions where there are more than two possible outputs.
  • Best expert / weighted majority type algorithms: here too there is a large literature that we just scratched the surface of. The original paper "The weighted majority algorithm" by Littlestone and Warmuth is one good source to get started with.
  • "Follow-the-(perturbed)-leader": this is an alternate approach to ``best-expert'' algorithms (different from multiplicative weight approaches) but one that also gives good bounds. The paper "Efficient algorithms for online decision problems" by Kalai and Vempala is a good starting point.

  • Topics in exact learning: toward the end of the course we will study (only a little bit) a model in which the learner can make equivalence queries and membership queries, and must exactly identify the unknown target concept. We will show that monotone DNF are efficiently learnable in this framework, but in fact several other interesting and expressive concept classes are known to be efficiently learnable in this model; these include finite automata (a result due to Dana Angluin; this is covered in the Kearns/Vazirani text, and references are given there) and decision trees (due to Nader Bshouty, Information and Computation, "Exact learning Boolean functions via the monotone theory"). Study and give an exposition of these results.
  • Learning in the presence of noise: In class we will discuss the models of PAC learning with random classification noise and PAC learning with malicious noise, but many other noise models have been studied as well (so called nasty noise; attribute noise; agnostic learning can be viewed as a noise model; etc.) There is a lot of interesting work on these topics.

  • Algorithms for learning linear threshold functions: One possible project would be to do a detailed study of linear programming based algorithms for learning linear threshold functions (good if you have a strong interest in linear programming). Another would be to study simple iterative algorithms such as Perceptron and Winnow in more detail; there are many variants of these algorithms for related problems such as linear regression, modifications/analyses for noise-tolerance, etc. A project of the latter sort could easily have an experimental component.

  • Inductive inference: This is a whole other way to look at learning from a theoretical CS perspective (more of a recursion theory flavor). Much work in inductive inference predates the work we've discussed in 4252. The survey by Angluin and Smith is a good place to get started here.

  • Learning rich concept classes: Some learning algorithms which run in more than polynomial time (but less than exponential 2^n time) are known for rich concept classes in the PAC learning model. These include algorithms for decision trees (Ehrenfeucht and Haussler, 1989; see also Blum, 1992, Information Processing Letters, "Rank-r Decision Trees are a subclass of r-decision lists") and algorithms for DNF formulas (see Bshouty, 1994, "A subexponential exact algorithm for learning DNF using equivalence queries"; also see Klivans and Servedio, STOC 2001). Study and given an exposition of these results.
  • Active learning: In active learning, the learner is given a large data set of unlabelled examples, and is allowed to choose which of these examples should receive a label; she must "pay" for each example that she chooses to have labelled. This topic has received renewed attention recently; papers by Dasgupta (NIPS 2004), Freund/Seung/Shamir/Tishby (Machine Learning 1997), and Dasgupta/Kalai/Monteleoni (COLT 2005) are good initial references, and there is a lot of work just in the past few years as well.