CS 4252 Final Project

For the final project, you and a partner will choose one research paper in computational learning theory, read it, and submit a written report giving a clear and coherent explanation in your own words of what you learned from the paper. The point of the project is (a) to give you a chance to explore a CLT topic outside of the lectures/book in conjunction with a partner, and (b) for you to demonstrate to me that you can read and understand (at least some portion of) a research paper in computational learning theory.

Note that your report does not have to provide detailed coverage of the entire paper that you choose (some papers are quite long), but it is a good idea to go into at least some moderate level of detail about at least some portion of the paper. Ten pages is a reasonable length for your written report, though there is neither a hard upper limit nor lower limit on the length of the report. (Note that your report should be no longer than it needs to be -- concision will be rewarded and excessive wordiness or "padding" will be frowned on).

You must work in a team with at least one other person for the final project -- teams of up to three people are acceptable, but expectations for team projects will be calibrated according to the number of team members, so a three-person project would need to demonstrate more total work/understanding than a two-person project on the same topic in order to get the same grade.

The rough timeline for final projects is:

  • Tues November 10: initial project proposal due.
  • Identify your partner(s) and specify the paper you plan to cover for your final project. Email this directly to Rocco at rocco at cs dot columbia dot edu, with subject line containing the words "Project Proposal" and the unis of the project team members. If you don't plan to cover the entire paper you should specify (approximately) the portion of the paper that you are planning to cover. Only one email needs to be sent per project team.

  • Thurs December 17, 11:59pm EST: final project report due.
  • Email your final project to coms4252columbia.projects2015@gmail.com (not the usual course account) with subject line "Final Project Report". Your project should be submitted in the form of a pdf file.

    If you are not sure what to do for your final project, here are some sample topics and papers to consider as possibilities. There is nothing particularly special about the items listed below, many other great topics/papers could have been given. It is OK to choose from this list but you are encouraged to spend some time browsing the literature since you may encounter a great topic that interests you more than any of these.

    Note that if you decide to choose your own paper, it is not an option to choose a survey paper -- you should choose a paper that actually presents original research results. You also may not be allowed to pick such a "classic" paper that its contents have already been extensively covered in surveys, since in such cases the survey papers have already by and large done the work that you are expected to do for this project, that of reading, understanding, and giving a nice exposition of (at least a portion of) the paper.

  • Learning rich concept classes: Some learning algorithms which run in more than polynomial time (but less than exponential 2^n time) are known for rich concept classes in the PAC learning model. These include algorithms for decision trees (Ehrenfeucht and Haussler, 1989, Information and Computation, "Learning Decision Trees from Random Examples") and algorithms for DNF formulas (Bshouty, 1994, Information Processing Letters, "A subexponential exact algorithm for learning DNF using equivalence queries", or the 2004 Klivans/Servedio JCSS paper on DNF.).

  • Learning under the uniform distribution / Fourier analysis: Learning Boolean functions from uniformly distributed random examples on {0,1}^n is an important topic in learning theory. Much more is learnable in these frameworks than in the standard PAC model where you must succeed w.r.t. any distribution. For this project you could choose from the 1989 COLT paper of Karsten Verbeurgt on learning DNF in quasipolynomial time under the uniform distribution; the paper of Kushilevitz and Mansour (SIAM J on Computing, 1993) on learning decision trees using the Fourier spectrum; or Linial, Mansour and Nisan's 1993 JACM article on "Constant depth circuits, Fourier transform and learnability".
  • "Follow-the-(perturbed)-leader": this is an alternate approach to ``best-expert'' algorithms (different from multiplicative weight approaches) but one that also gives good bounds. The 2005 JCSS paper "Efficient algorithms for online decision problems" by Kalai and Vempala is a good choice for this topic.

  • Statistical Query learning: The STOC 1994 paper of Blum, Furst, Jackson, Kearns, Mansour and Rudich on "Weakly learning DNF and characterizing statistical query learning using Fourier analysis" covers aspects of the SQ learning model that we won't do in class. Another good paper here is Vitaly Feldman's 2012 JCSS paper "A complete characterization of statistical query learning with applications to evolvability."

  • Topics in exact learning: toward the end of the course we will study (only a little bit) a model in which the learner can make equivalence queries and membership queries, and must exactly identify the unknown target concept. We will show that monotone DNF are efficiently learnable in this framework, but in fact several other interesting and expressive concept classes are known to be efficiently learnable in this model; these include finite automata (a result due to Dana Angluin; this is covered in the Kearns/Vazirani text, and references are given there) and decision trees. Good papers to choose for this topic are Dana Angluin's 1987 Information and Computation paper "Learning Regular Sets from Queries and Counterexamples", or her 1990 Machine Learning paper "Negative results for Equivalence Queries", or Nader Bshouty's 1995 Information and Computation paper "Exact learning Boolean functions via the monotone theory".
  • Complexity-theoretic aspects of learning: There are many interesting connections between complexity theory (complexity classes) and learning theory. Good papers in this area are the 1996 JCSS paper of Bshouty, Cleve, Gavalda, Kannan, Tamon "Oracles and queries that are sufficient for exact learning" and the 2009 Fortnow/Klivans JCSS paper "Efficient learning algorithms yield circuit lower bounds."

  • Random projection: A useful technique (which we did not cover in class) is to perform a 'random projection' of data from n-dimensional space to k-dimensional space (with k much less than n). If you are interested in this topic you may want to choose the 2006 Machine Learning paper "An algorithmic theory of learning: Robust concepts and random projection" of Arriaga and Vempala, or the 2008 Klivans/Servedio paper "Learning intersections of halfspaces with a margin".
  • On-line mistake-bound learning with ``I don't know'' answers: One can consider a variant of the OLMB learning model in which the learning algorithm is allowed to answer ``I don't know''. In some versions of this model the learning algorithm must never make a mistake, and in other versions the goal is to achieve a good tradeoff between the number of mistakes and the number of examples for which ``I don't know'' is answered. Relevant papers here include the 2008 ICML paper of Li, Littman and Walsh; the 2010 NIPS paper of Sayedi, Zadimoghaddam and Blum; and the 2013 SODA paper of Demaine and Zadimoghaddam.

  • Multiclass learning: in the class we only discuss learning Boolean (i.e. two-valued) functions but there is much work on learning multivalued functions where there are more than two possible outputs. Some good possible papers on this topic would be the 1999 Guruswami/Sahai paper or the 1995 JAIR multiclass learning paper of Dietterich/Bakiri.

  • Evolvability: Leslie Valiant has introduced a computational model of evolution which views the evolution process as a restricted form of learning. See his 2009 JACM paper "Evolvability."

  • Noise-tolerant learning of linear threshold functions: The 1998 Algorithmica paper of Blum, Frieze, Kannan and Vempala gives a polynomial-time algorithm to learn arbitrary linear threshold functions in the presence of random classification noise.