COMS 6253 Projects
In general there are two main goals for a project in this class:
(1) You should acquire a substantial body of knowledge about the topic of
your project. This will involve closely and carefully reading
literature on your specific project topic
(likely to be several papers).
You'll demonstrate this aspect of your project in the "background"
section(s) of your project report, which should be a clear synthesis
and exposition in your own words of what you learned.
(2) You should gain research experience in this area; i.e. make a
serious effort to contribute to the state of knowledge on your project
topic by (i) identifying an interesting open question or direction for
future research related to your project topic; (ii) coming up with an
approach to make progress; and (iii) working to carry out your approach.
You'll demonstrate this aspect of your project by
explaining in detail what you did for (i), (ii) and (iii) in the rest of your
project report.
The ratio of (1) to (2) may vary between different projects. There are
some projects that might involve relatively less background (but in that
case you will be expected to spend more time -- and give more evidence of
time well spent) -- on trying to make research progress; and there are
other projects where you'll need to acquire more extensive background.
The timetable for projects will be as follows:
-
March 1: Turn in a brief (1-2 paragraph) project proposal to
me by email.
-
April 5: Turn in a progress report (1-2 pages) summarizing
your topic, research question, and progress to date.
-
April 19/26: In-class project presentations.
-
May 3: Final project reports due (email to Rocco)
For the first phase of your project (the initial proposal) you
should not only select a general topic, but come up with
some rough idea of what research question/direction
you will pursue. This not as easy as it sounds! You should expect to
spend a nontrivial amount of
time before March 1 on coming up with a good project
proposal.
The end products of your project will be:
(A) An in-class presentation. You should expect to
give a 15-30 minute presentation on what you did. This is not much time
so good preparation for your presentation is a must.
(B) A project report that you'll turn in to me (Rocco). This will be the
major component of your course grade and should reflect your best effort.
A rough guide is that you should expect to write at least ten (but
hopefully not more than twenty unless it's really necessary)
pages giving clear and convincing evidence that
goals (1) and (2) above were achieved.
You may work in groups of up to three on the project. Expectations for
projects will be calibrated according to group size.
Below are some potential broad project topics with associated jumping-off
points into the literature. If you go with one of these
projects, you should also do your own literature search
beyond the sources referenced below.
You are also encouraged to come up with your
own project topic (if you do this, send me a quick email with the
project topic when you settle on it -- hopefully before March 3 --
to make sure it's suitable for
this class). Good places to look if you are browsing/brainstorming
for ideas are COLT (Conference on Computational Learning Theory)
proceedings or journals such as Machine Learning or Journal of Machine
Learning Research. There are also many learning theory papers that appear
in more "mainstream" CS theory venues such as the STOC (Symposium on
Theory of Computing) proceedings, FOCS (Foundations of Computer Science)
proceedings, and journals such as JACM, J. Computer & System Sciences,
SIAM Journal on Computing, Information and Computation, and many others.
-
Boosting: Boosting is an automatic way of converting "weak"
PAC learning algorithms into "strong" PAC learning algorithms.
We will talk about boosting in class, but only briefly.
Rob Schapire and Yoav Freund have good webpages with many
references (and survey papers to get started) on boosting.
-
Random projection: papers by Dasgupta (FOCS 1999),
Arriaga/Vempala (FOCS 1999) and Klivans/Servedio (COLT 2004) and others use
random projection of data from n-dimensional space to k-dimensional space
(with k much less than n) as an essential step to get
computationally efficient learning algorithms.
- Neural models of computation and learning: The homepages of Leslie
Valiant and Wolfgang Maass are good starting points.
-
Connections with complexity theory: Two of the main tools we will use in this
class, polynomial threshold functions and Fourier analysis
over the Boolean hypercube, also are widely used in complexity theory.
For example, PTFs play a crucial role in showing that the complexity
class PP is closed under intersection, and Fourier analysis of Boolean
functions is heavily used in the study of inapproximability
(and is a topic of independent interest).
Even though strictly speaking it isn't a learning theory topic,
it would be a great project to do an in-depth study of one of these
two topics (either PTFs or Fourier).
For PTFs, good sources are the paper by Beigel, Reingold and Spielman,
followup work by Beigel and Fortnow, and the paper of Vereschagin
on lower bounds for Perceptrons.
For Fourier analysis, good starting points are the course notes
of Khot, of Mossel, of Friedgut/Dinur, and of Dubhashi/Hegarty/Steif.
-
Unsupervised learning: There has been much recent work on
learning an unknown probability distribution, given access to draws
from the distribution. Some starting points here are the
1999 FOCS paper of Dasgupta on learning mixtures of Gaussians
(there have been many followups), the SICOMP paper of
Cryan, Goldberg and Goldberg on learning evolutionary trees,
and the 2005 FOCS paper of Feldman, O'Donnell and Servedio
on learning mixtures of product distributions over discrete
domains.
- Kernels: Kernel functions and large margin classifiers are an
important topic in contemporary machine learning and learning theory
research. A good first source here is the book by Cristianini and Shawe-Taylor
or recent "Kernel Workshop" proceedings (these have been held with the
NIPS and COLT conferences in recent years).
- Simple iterative linear threshold learning algorithms like
Perceptron and Winnow: There is a lot of interesting theory
about these algorithms. Good sources are Littlestone's
original Winnow paper (on the "Readings" page) and his followups;
the Freund/Schapire COLT 98 paper on the Perceptron algorithm;
Kivinen, Warmuth and Auer's COLT 1995 paper on
"The Perceptron Algorithm versus Winnow"; the COLT 2002
paper by Cesa-Bianchi, Conconi and Gentile on second-order
Perceptron algorithms; the NIPS paper by
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer on the ``forgetron'';
and Blum and Dunagan's recent paper on smoothed analysis of the
Perceptron algorithm.
-
Learning linear threshold functions in polynomial time:
the paper of Maass and Turan (see Readings page) and related work
on polynomial-time linear programming.
There have been some recent interesting developments here including
a ``perceptron rescaling algorithm'' by Dunagan and Vempala.
- Multiplicity automata: Toward the end of the course we
will discuss a learning model known as exact learning
from membership and equivalence queries. Many of the most
powerful learning algorithms in this model are based on
multiplicity automata; these are nice results that we won't
get to cover in class. Good references here are the JACM 2000
paper by Amos Beimel, Francesco Bergadano, Nader H. Bshouty, Eyal Kushilevitz,
and Stefano Varricchio on learning multiplicity automata and the
more recent work of Klivans and Shpilka from COLT 2003.
- Quantum variants of learning models: good
first sources here are the 2004 SICOMP paper of Servedio/Gortler
and the 1999 SICOMP paper of Bshouty/Jackson. Blum and Yang
have an interesting CCC paper on connections between quantum computing
and the Statistical Query model.
-
Cryptographic hardness results for learning algorithms: In
COMS 4252 we discussed some relations between cryptography
and (distribution-independent) PAC learning algorithms. There is a nice
body of work giving cryptographic hardness results for various
uniform distribution learning problems as well. Good
sources are Michael Kharitonov's 1992, 1993 and 1995 COLT, STOC and J. Computer & System Sciences papers,
as well as a recent paper by Klivans and Sherstov. This is a
good project if you're interested in
(and know something about) crypto.
- Attribute-efficient learning algorithms:
These are algorithms that can learn simple concepts from
very few examples (sublinear in the dimension n).
Such algorithms have been studied in a range of learning models.
Papers on Winnow are relevant for the online mistake-bound model;
papers by Bshouty, Jackson and Tamon are relevant for the
uniform distribution model; and papers by Bshouty and
Hellerstein are relevant for models of learning from queries.
-
Learning in the presence of noise: Many different models have been studied
for the phenomenon of learning from noisy data. One such model
is ``agnostic learning;'' there has been recent progress both on
positive results for agnostic learning (Kalai, Klivans, Mansour,
Servedio, FOCS 2005) and negative results (Feldman and Guruswami/Raghavendra
papers in CCC 2006 and FOCS 2006).
- Active learning: In active learning, the learner is given a large
data set of unlabelled examples, and is allowed to choose which
of these examples should receive a label; she must "pay" for each
example that she chooses to have labelled. This is something of a
``hot topic'' that has
received renewed attention recently; papers by Dasgupta (NIPS 2004),
Freund/Seung/Shamir/Tishby (Machine Learning 1997), and
Dasgupta/Kalai/Monteleoni (COLT 2005) are good references.
-
Learning linear threshold functions in polynomial time:
the paper of Maass and Turan (see Readings page) and related work
on polynomial-time linear programming.
There have been some recent interesting developments here including
a ``perceptron rescaling algorithm'' by Dunagan and Vempala.
-
Inductive inference: A completely
different way to look at learning from a theoretical CS perspective.
A somewhat old but still useful introduction is "Inductive
Inference: Theory and Methods" by Angluin & Smith.