ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING

COMS 4772/6772 HOMEWORK #1
PROF. TONY JEBARA

DUE	FEBRUARY 25^th, 2015 BEFORE CLASS

Write all code in Matlab. Submit your work via Courseworks. If unable to, contact the TA via email. Make sure to tar.gz everything into a directory and then send it. Also send us a write up of your results as a postscript or pdf file containing discussion, explanation, figures, tables and equations as well as your Matlab code and scripts as separate files. Try to make the submitted report look like a conference paper but keep the text brief. We recommend using Latex to write up your report: http://www.latex-project.org

Clearly denote the various components and the function calls or scripts that execute your Matlab functions. Shorter code is better. Do not submit copied code from others. You can use any draw package or Matlab to make illustrations and figures. Note, to save the current figure in Matlab as a postscript file you can type:

print –depsc filename.eps

Sample code is available on the “Tutorials” web page. Datasets are available from the “Handouts” web page.

Dimensionality reduction with MVE and different distances (kernels)

Part 1: download and install the Minimum Volume Embedding code and supporting components.
a) First download CSDP for your platform (linux, pc, mac) here:
https://projects.coin-or.org/Csdp/
b) Edit csdp.m so that both the dos and system calls point to the full path of the CSDP executable, e.g.
info=system([csdp ' fname '.dat-s' ' 'fname '.sol']);
should become
info=system(['/Users/yourname/bin/matlab/csdp 'fname '.dat-s' ' ' fname '.sol']);
c) Download MVE from here:
http://www1.cs.columbia.edu/~jebara/code/mve/
d) Edit the setuppath.m file for MVE so that it points to your CSDP installation
e) From the MVE directory, run setuppath in matlab, you are now ready to run MVE, you can run mvedriver to see an example. If you have out-of-memory errors try running matlab with the -nojvm option.

You can also download the MVE paper here:
http://www.cs.columbia.edu/~jebara/papers/aistatsMVE07.pdf

First try embedding the dataset here in 2D:
http://www.cs.columbia.edu/~jebara/4772/teapots100.mat
using MVE (with d=2) with Euclidean distances as well as different numbers of k-nearest neighbors to build the neighborhood graph (k=2,3, and 4). This dataset is 100 images of size 76x101 pixels of a teapot rotating. To watch a movie of the 100 images in Matlab type:
load teapots100.mat; for i=1:100; image(reshape(teapots(:,i),76,101,3)/255); pause(0.01); end;
You should be able to use the image function to plot each image at a specific coordinate given by the embedding coordinates MVE gives you to visualize how they inter-relate. Also, it helps to show the connectivity between the images that the k-nearest neighbor procedure gives you by drawing lines between the embeddings.

Part 2: Next, visualize in 2D (d=2) this dataset of 155 digit drawings (of the digit 9):
http://www.cs.columbia.edu/~jebara/4772/digits9.mat
to visualize these type:
load digits9.mat; for i=1:155; plot(digits9(i,1:2:140),digits9(i,2:2:140),'.'); pause(0.4); end
Here, these are not quite images but a list of coordinates for each digit as 70 black dots in the order x1,y1,x2,y2,...,x70,y70. Each digit is a row of 140 numbers in the matrix digits9. So, using a linear kernel or Euclidean distance is not very good and you will probably need to explore different distance functions and kernels. Recall that a generalized distance between points x and y can be recovered using a kernel as:
D(x,y) = (k(x,x)-2k(x,y)+k(y,y))^0.5
If you use innerproducts instead of kernels, you get the usual Euclidean distance. Modify the MVE code and the k-nearest neighbor code so that different distances are used by picking different kernels. You can try linear, RBF and polynomial kernels (with different paramter settings) to see if you can improve the visualization. Also, think about this dataset more closely. Each digit is represented as a cloud of points in some arbitrary order, not as a regular image of a grid of pixels. Try to come up with a better kernel of your own design for calculating the similarity between two clouds of 70 points (try to make sure it is a Mercer kernel or else MVE may not work). HINT: you could try to make images out of the point clouds (like the tea pot images) or you can try to build a novel kernel which finds the similarity between two point clouds by solving some kind of matching or some other way of summarizing the point clouds.

Part 3: For the digits data, try different value of beta and compute the fidelity across various choices. Plot the fidelity as a function of beta to identify which beta gives the highest fidelity in two dimensional visualization. Specifically, sweep a few values of beta for a linear kernel with k=2, 3, and 4 and with d=2.

Part 4: find any dataset of your OWN choosing (try to keep the number of points to around 400 or else MVE gets fairly slow) and embed it and explore different settings of the kernel, the number of neighbors, the target dimension d and so forth used in the MVE code to get a good visualization.

When presenting your results, compare the visualization to PCA and kernel PCA. When the spectra (eigenvalues) are very different for the different methods, plot them and discuss what dimensionality the data really needs for the various datasets. Also, report the fidelity score across your various experiments. Note that if you set k too large when you build the k-nearest-neighbor graph, MVE gives the same result as kernel PCA. Discuss why. You can also optionally try comparing with SDE. It is straightforward to modify the MVE code so that it produces that answer as well.