#
CS 4252 Final Project

For the final project, you and a partner will choose **one** research
paper in computational
learning theory, read it, and submit a written report giving a clear
and coherent explanation **in your own words** of what you learned from
the paper. The point of the project is (a) to give you a chance to explore
a CLT topic outside of the lectures/book in conjunction with a partner,
and (b) for you to demonstrate to me
that you can read and understand (at least some portion of) a research
paper in computational learning theory.
Note that your report does not have to provide detailed coverage of
the entire paper that you choose (some papers are quite long), but it
is a good idea to go into at least some moderate level of detail
about at least some portion of the paper. Ten pages
is a reasonable length for your written report, though there is neither
a hard upper limit nor lower limit on the length of the report.
(Note that your report should be no longer than it needs to be -- concision
will be rewarded and excessive wordiness or "padding" will be frowned on).

You must work in a team with at least one other person for the final project -- teams of up to three people are acceptable, but
**
expectations for team projects will be calibrated according to the
number of team members, so a three-person project would need to demonstrate
more total work/understanding than a two-person project on the
same topic in order to get the same grade.
**

The rough timeline for final projects is:

###

**
**

**Tues November 10:
initial project proposal due.**
Identify your partner(s) and specify the paper
you plan to cover for your final project.
Email this **directly to Rocco** at rocco at cs dot columbia dot edu, with subject line containing the words "Project Proposal" and the unis of the project
team members.
If you don't plan to cover the entire paper you should specify
(approximately) the portion
of the paper that you are planning to cover. Only one email needs to be
sent per project team.
###

**
Thurs December 17, 11:59pm EST: final project report due.
**

Email your final project to **coms4252columbia.projects2015@gmail.com
** (not the usual course account)
with subject line "Final Project Report".
Your project should be submitted in the form of a pdf file.

If you are not sure what to do for your final project,
here are some sample topics and papers to consider as possibilities.
There is nothing particularly special about the items listed below,
many other great topics/papers could have been given.
It is OK to choose from this list
but you are encouraged to spend some time browsing the literature
since you may encounter a great topic that interests you more than
any of these.

Note that if you decide to choose your own paper, it is
**not** an option to choose a survey paper -- you should choose a
paper that actually presents original research results.
You also may not be allowed to pick such a "classic" paper that its contents
have already been extensively covered in surveys, since in such cases the
survey papers have already by and large
done the work that you are expected to do
for this project, that of reading, understanding, and giving a nice exposition
of (at least a portion of) the paper.

Learning rich concept classes: Some learning algorithms which
run in more than polynomial time (but less than exponential 2^n time) are
known for rich concept classes in the PAC learning model. These include
algorithms for decision trees (Ehrenfeucht and Haussler, 1989,
Information and Computation, "Learning Decision Trees from Random Examples")
and algorithms for DNF formulas (Bshouty, 1994, Information Processing
Letters, "A subexponential exact algorithm for learning DNF using
equivalence queries", or the 2004 Klivans/Servedio JCSS paper on DNF.).

Learning under the uniform distribution / Fourier analysis:
Learning Boolean functions from uniformly distributed random examples on
{0,1}^n is an important topic in learning theory. Much more is learnable
in these frameworks than in the standard PAC model where you must succeed
w.r.t. any distribution. For this project you could choose from the
1989 COLT paper of Karsten Verbeurgt on learning DNF in quasipolynomial time
under the uniform distribution;
the paper of Kushilevitz and Mansour (SIAM J on Computing,
1993) on learning decision trees using the Fourier spectrum; or
Linial, Mansour and Nisan's 1993 JACM article on "Constant
depth circuits, Fourier transform and learnability".

"Follow-the-(perturbed)-leader": this is an alternate approach to
``best-expert'' algorithms (different from multiplicative weight approaches)
but one that also gives good bounds. The 2005 JCSS paper "Efficient algorithms
for online decision problems" by Kalai and Vempala is a good choice
for this topic.

Statistical Query learning:
The STOC 1994 paper of Blum, Furst, Jackson, Kearns, Mansour and Rudich
on "Weakly learning DNF and characterizing statistical query learning using
Fourier analysis" covers aspects of the SQ learning model that we won't
do in class. Another good paper here is Vitaly Feldman's 2012 JCSS
paper "A complete characterization of statistical query learning with applications to evolvability."

Topics in exact learning: toward the end of the course we will
study (only a little bit) a model in which the learner can make
equivalence queries and membership queries, and must exactly identify the
unknown target concept. We will show that monotone DNF are efficiently
learnable in this framework, but in fact several other interesting and
expressive concept classes are known to be efficiently learnable in this
model; these include finite automata (a result due to Dana Angluin; this
is covered in the Kearns/Vazirani text, and references are given there)
and decision trees. Good papers to choose for this topic are
Dana Angluin's 1987 Information and Computation
paper "Learning Regular Sets from Queries and Counterexamples",
or her 1990 Machine Learning paper "Negative results for Equivalence Queries",
or Nader Bshouty's 1995 Information and Computation paper
"Exact learning Boolean functions via the monotone theory".

Complexity-theoretic aspects of learning: There are many interesting
connections between complexity theory (complexity classes) and learning theory.
Good papers in this area are the 1996 JCSS paper of
Bshouty, Cleve, Gavalda, Kannan, Tamon "Oracles and queries that are
sufficient for exact learning" and the 2009 Fortnow/Klivans JCSS
paper "Efficient learning algorithms yield circuit lower bounds."

Random projection: A useful technique (which we did not cover in
class) is to perform a 'random projection' of data
from n-dimensional space to k-dimensional space (with k much less than n).
If you are interested in this topic you may want to choose the 2006 Machine
Learning paper "An algorithmic theory of learning: Robust concepts and random projection" of Arriaga and Vempala, or the 2008 Klivans/Servedio
paper "Learning intersections of halfspaces with a margin".

On-line mistake-bound learning with ``I don't know'' answers:
One can consider a variant of the OLMB learning model in which the learning
algorithm is allowed to answer ``I don't know''. In some versions of this
model the learning algorithm must never make a mistake, and in other versions
the goal is to achieve a good tradeoff between the number of mistakes
and the number of examples for which ``I don't know'' is answered. Relevant
papers here include the 2008 ICML paper of Li, Littman and Walsh; the
2010 NIPS paper of Sayedi, Zadimoghaddam and Blum; and the 2013
SODA paper of Demaine and Zadimoghaddam.

Multiclass learning: in the class we only discuss learning
Boolean (i.e. two-valued) functions but there is much work on learning
multivalued functions where there are more than two possible outputs.
Some good possible papers on this topic would be the 1999 Guruswami/Sahai
paper or the 1995 JAIR multiclass learning paper of Dietterich/Bakiri.

Evolvability: Leslie Valiant has introduced a computational model
of evolution which views the evolution process as a restricted
form of learning. See his 2009 JACM paper "Evolvability."

Noise-tolerant learning of linear threshold functions: The 1998 Algorithmica
paper of Blum, Frieze, Kannan and Vempala gives a polynomial-time
algorithm to learn arbitrary linear threshold functions in the
presence of random classification noise.