MACHINE LEARNING                                                March 4, 2010
COMS 4771
HOMEWORK #3
PROF. TONY JEBARA
DUE: Thursday, March 25th, 2010, before class
Your write-up should be in ASCII plain text format (.txt), Postscript (.ps), or the Adobe Portable Document Format (.pdf). Please do not submit Microsoft Office documents, LaTeX source code, or anything more exotic; we will not be able to read it. LaTeX is preferred and highly recommended, but it is not mandatory. You can use any document editing software you wish, as long as the final product is in .ps or .pdf. Even if you do not use LaTeX to prepare your document, you can use LaTeX notation to mark up complicated mathematical expressions, for example, in comments in your Matlab code. All your code should be written in Matlab.

Please submit all your source files, each function in a separate file. Clearly denote what each function does, which problem it belongs to, and what the inputs and the outputs are. Do not resubmit code or data provided to you. Do not submit code written by others. Identical submissions will be detected and both parties will get zero credit. Shorter code is better. Sample code is available on the “Tutorials” web page. Datasets are available from the “Handouts” web page.

You may include figures directly in your write-up or include them separately as .jpg, .gif, .ps or .eps files, and refer to them by filename. Submit work via courseworks.columbia.edu. If something goes wrong, ask the TAs for help. Thanks.
1. (20 points) Principal Components Analysis:
Download the dataset of 100 teapot images of size 38x50 in the file datasetPCA.mat. The images are organized as vectors of length 1900 in the matrix teapotImages. To view an image, say the second one in the dataset, type:
load datasetPCA.mat
imagesc(reshape(teapotImages(2,:),38,50));
colormap gray;
Compute the mean, the covariance matrix, and the top 3 eigenvectors of the covariance matrix of the dataset. Show the mean and the top 3 eigenvectors as images. Reconstruct the data using principal components analysis (PCA) with least-squares error, i.e. using only the mean and a linear combination of the top 3 eigenvectors. Show ten different images from the dataset as they originally appeared and after reconstruction from the mean and the top 3 eigenvectors.
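As a rough sketch of one possible way to carry this out (the variable names such as mu, V, and recon are mine, not part of the assignment):

load datasetPCA.mat
n = size(teapotImages, 1);                        % 100 images, each 1 x 1900

% Mean image and covariance matrix of the data.
mu = mean(teapotImages, 1);                       % 1 x 1900
C  = cov(teapotImages);                           % 1900 x 1900

% Top 3 eigenvectors of the covariance matrix.
[V, D] = eigs(C, 3);                              % V is 1900 x 3

% Show the mean and each eigenvector as a 38 x 50 image.
figure; imagesc(reshape(mu, 38, 50)); colormap gray;
for k = 1:3
    figure; imagesc(reshape(V(:,k), 38, 50)); colormap gray;
end

% Least-squares reconstruction from the mean plus the top 3 components.
centered = teapotImages - repmat(mu, n, 1);
recon    = repmat(mu, n, 1) + (centered * V) * V';

% Compare ten originals with their reconstructions, side by side.
for i = 1:10
    figure;
    subplot(1,2,1); imagesc(reshape(teapotImages(i,:), 38, 50)); colormap gray;
    subplot(1,2,2); imagesc(reshape(recon(i,:), 38, 50)); colormap gray;
end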
2. (15 points) Multinomial Classification: Write an implementation for fitting 2-class multinomial models to the dataset in dataset3.data. Just type ‘load dataset3.data’ in Matlab. This dataset3 is a matrix of 18 columns and 10025 rows. The data are the word counts from 9 plays by William Shakespeare and 9 plays by Thomas Middleton. The first 9 columns correspond to the Shakespeare plays.
The texts were downloaded (and parsed to remove caps, punctuation, etc.) from
http://www.tech.org/~cleary/middhome.html
http://the-tech.mit.edu/Shakespeare/
The ‘words’ (about 10025 words total were used in the dictionary of all the above 18 plays) for each row are shown in the first column of dataset3.txt if you are curious what the word counts correspond to. We also added a 1 to the counts, so that words that never appear in a document get a count of 1, words that appear once get a count of 2, and so forth.
Start by training your multinomials on only the first 2 Shakespeare plays and the first 2 Middleton plays, then use the learned models to classify the remaining plays and report which author each play is attributed to.
Note: you should compare log probabilities instead of regular probabilities to avoid numerical problems.
Note: the code for this problem is actually very short!
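A minimal sketch of the kind of code this asks for, assuming the train/test split described above (first 2 plays of each author for training); the variable names pShake and pMidd are mine:

load dataset3.data                 % 10025 x 18 matrix of (count + 1) values
X = dataset3;

% Fit one multinomial per class by normalizing the summed word counts.
% Columns 1-9 are Shakespeare, 10-18 Middleton; train on the first 2 of each.
% The added 1 in the counts guarantees no zero probabilities below.
pShake = sum(X(:, 1:2),   2); pShake = pShake / sum(pShake);
pMidd  = sum(X(:, 10:11), 2); pMidd  = pMidd  / sum(pMidd);

% Classify every play by comparing log-likelihoods (logs avoid underflow);
% up to the multinomial coefficient, log P(x) = sum_w count_w * log(p_w).
for j = 1:18
    llShake = X(:,j)' * log(pShake);
    llMidd  = X(:,j)' * log(pMidd);
    if llShake > llMidd
        fprintf('play %2d -> Shakespeare\n', j);
    else
        fprintf('play %2d -> Middleton\n', j);
    end
end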
3. (15 points) How predictable is human behavior?
This question looks at the rock-paper-scissors game. You will play this game against the computer in Matlab, but have it use learning to find patterns of play and hopefully start to beat you. We will assume that we have a fully interconnected graphical model, or fully general distribution, with no conditional independencies. The distribution is over the past M moves:

P(X_M, X_{M-1}, X_{M-2}, ..., X_1)

where each X is in the set {1,2,3}, which stands for {Rock, Paper, Scissors}; X_M is the oldest move in the sequence and the newest is X_1. The way we will do learning here is to look at the past M moves each time you play the game and choose rock, paper, or scissors. By simply counting the frequency of the various combinations of rock, paper and scissors from the past plays, we can form a multidimensional multinomial model (or probability table). The counts will start off with ones everywhere.

Download the code rockpaper.m from the Matlab tutorial page and modify the parts that are commented to compute statistics on the M-th order multinomial distribution from the clicks (currently it always guesses ‘Rock’, which is not very good). You will need to write code for orders M=1,2,3,4,5. You will also need code to find the slice of the M-th order distribution given the past M-1 clicks and compute the most likely next guess (R, P, or S); a sketch of this indexing is given below. In Matlab, the 3 configurations are labeled 1,2,3 (i.e. R, P, S). You will only need to add about 20 or so lines of code (which are actually quite trivial). Hand in your code as well as a short discussion of how the algorithm behaves for 1st order, 2nd order, ... up to 5th order probability tables and how it compares against a human player.
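For illustration, one way to store and index such a table (the actual variable names in rockpaper.m may differ; these are assumptions):

M = 3;                                 % order of the model
counts = ones(3 * ones(1, M));         % 3x3x3 table of ones (for M=1, use ones(3,1))

% After each click, increment the count for the last M moves
% (entries are 1=Rock, 2=Paper, 3=Scissors, oldest first).
history = [1 2 3];                     % hypothetical last M clicks
idx = num2cell(history);               % lets us write counts(idx{:}), i.e. counts(1,2,3)
counts(idx{:}) = counts(idx{:}) + 1;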
As an example, for order M=2, we would have the following initial values for the counts cube:

cube = [1 1 1; 1 1 1; 1 1 1];
After observing the following sequence (the code assumes we have RRRR... in our memory), and with the last click being on the top of the window, which quits the program, we get:

RPSRPSRPS
Basically, this is generated by clicking in the bottom part of the window with the left, middle and then right mouse buttons, repeating three times. This gives us the following data, which we would treat as iid samples for our multidimensional multinomial:

RR, RP, PS, SR, RP, PS, SR, RP, PS
For the counts cube, we would get something like this:

cube = [2 4 1; 1 1 4; 3 1 1];
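As a quick check, this short loop (the variable names are mine) reproduces the counts above from the example sequence, with rows indexing the older click and columns the newer one:

seq  = [1 1 2 3 1 2 3 1 2 3];          % one R of memory, then RPSRPSRPS (1=R, 2=P, 3=S)
cube = ones(3, 3);                     % counts start at one everywhere
for t = 2:length(seq)
    cube(seq(t-1), seq(t)) = cube(seq(t-1), seq(t)) + 1;
end
cube                                   % gives [2 4 1; 1 1 4; 3 1 1]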
The code should predict that your next move is most likely rock as you click the last scissors with the cursor at the top of the window to quit. You don’t need to normalize the counts cube since this is trivial, but in general a true probability table would be normalized. Also, it is easier to accumulate more counts in an online manner if we don’t normalize.
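For the prediction step, one possible sketch for the M=2 case, using the cube above (for higher orders you would slice by the last M-1 clicks, e.g. counts(i,j,:) for M=3); the names are mine, not from rockpaper.m:

last = 3;                          % previous human click was Scissors
[mx, guess] = max(cube(last, :));  % slice cube(3,:) = [3 1 1], so guess = 1 (Rock)

% To beat the guessed move: Paper beats Rock, Scissors beats Paper,
% Rock beats Scissors.
counter = mod(guess, 3) + 1;       % here 1 -> 2, i.e. the computer plays Paper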