MACHINE LEARNING
January 27, 2011
COMS 4771
HOMEWORK #1
PROF. TONY JEBARA
DUE FEBRUARY 10th, 2011 BY 2:00pm EST
PLEASE NOTE SUBMISSION INSTRUCTIONS.
1. (10 points) Cross-validation for Polynomial Fitting: Download the Matlab code
in “polyreg.m” (on the tutorial web page) to do polynomial curve fitting. Also
download the dataset “dataset1.mat”. Type “load dataset1” in Matlab and you will
have the variables X (scalar inputs) and Y (scalar outputs) in your Matlab
workspace. Split the data into training (the first 100 points) and testing (the
second 100 points). Compute the training and testing error for polynomial fits
of every order from 0 through 20, and plot both error curves against the
polynomial order. Which polynomial order fits best, and why? Show the fit of the
best polynomial order to the data as an attached image.
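As a starting point, the split-and-sweep procedure described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Matlab's built-in polyfit and polyval in place of “polyreg.m” (whose interface is not reproduced here), it measures error as mean squared error (the error defined in “polyreg.m” may be scaled differently), and it generates synthetic data as a stand-in for “dataset1.mat”.

```matlab
% ASSUMPTION: synthetic stand-in for dataset1.mat (200 scalar pairs).
% With the real data, replace these two lines with: load dataset1
X = 2*rand(200,1) - 1;
Y = sin(3*X) + 0.1*randn(200,1);

% First 100 points for training, second 100 for testing.
Xtrain = X(1:100);   Ytrain = Y(1:100);
Xtest  = X(101:200); Ytest  = Y(101:200);

orders   = 0:20;
trainErr = zeros(size(orders));
testErr  = zeros(size(orders));
for i = 1:numel(orders)
    % Least-squares polynomial fit on the training half only.
    % polyfit may warn about conditioning at high orders.
    p = polyfit(Xtrain, Ytrain, orders(i));
    % ASSUMPTION: error is measured as mean squared error.
    trainErr(i) = mean((polyval(p, Xtrain) - Ytrain).^2);
    testErr(i)  = mean((polyval(p, Xtest)  - Ytest).^2);
end

% Order with the lowest testing error.
[~, best] = min(testErr);
bestOrder = orders(best);

% Both error curves against polynomial order.
plot(orders, trainErr, 'b-o', orders, testErr, 'r-x');
xlabel('polynomial order'); ylabel('mean squared error');
legend('training', 'testing');
```

Selecting the order by testing error rather than training error is the point of the exercise: training error can only shrink as the model gains parameters, so it cannot by itself indicate overfitting.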
2. (15 points) RBF Basis Regression: Modify the Matlab code in “polyreg.m” so
that it does RBF basis curve fitting instead of polynomial regression. Fit the
data in “dataset1.mat” with the first 100 points as training and the second 100
points as testing, and set the RBF’s sigma parameter equal to 1.0. Compute and
report the training and testing error for this model, and show the fit
graphically by saving the Matlab plot of f(x) overlaid on the data.
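One possible shape for the RBF fit is sketched below. It is not a modification of “polyreg.m” itself, and it rests on several assumptions that are not stated in the assignment: one Gaussian basis function is centered on each training input, the basis has the form exp(-(x - c)^2 / (2*sigma^2)), the weights are found by a pseudo-inverse least-squares solve, and error is mean squared error. The data are a synthetic stand-in for “dataset1.mat”.

```matlab
% ASSUMPTION: synthetic stand-in for dataset1.mat (200 scalar pairs).
% With the real data, replace these two lines with: load dataset1
X = 6*rand(200,1) - 3;
Y = sin(2*X) + 0.1*randn(200,1);

Xtrain = X(1:100);   Ytrain = Y(1:100);
Xtest  = X(101:200); Ytest  = Y(101:200);

sigma   = 1.0;       % RBF width given in the assignment
centers = Xtrain;    % ASSUMPTION: one basis function per training point

% Design matrix: entry (i,j) is the Gaussian bump at center j,
% evaluated at input i. bsxfun keeps this working on older Matlab.
rbf = @(x) exp(-bsxfun(@minus, x, centers').^2 / (2*sigma^2));

PhiTrain = rbf(Xtrain);            % 100-by-100
w = pinv(PhiTrain) * Ytrain;       % least-squares weights

% ASSUMPTION: error is measured as mean squared error.
trainErr = mean((PhiTrain*w   - Ytrain).^2);
testErr  = mean((rbf(Xtest)*w - Ytest).^2);

% f(x) overlaid on the data.
xs = linspace(min(X), max(X), 400)';
plot(X, Y, 'k.', xs, rbf(xs)*w, 'r-');
xlabel('x'); ylabel('y'); legend('data', 'f(x)');
```

The pseudo-inverse is used here because a square Gaussian design matrix with overlapping basis functions tends to be badly conditioned; a plain matrix inverse would amplify that.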
3. (15 points) Perceptron: Implement the linear perceptron in Matlab (using
stochastic or batch gradient descent). Train it on “dataset2.mat”. Type “load
dataset2” and you will have the variables X (inputs) and Y (+/- 1 labels) in
your Matlab workspace. Use the whole dataset as training. Show with figures the
resulting linear decision boundary on the 2D X data. Show the binary
classification error and the perceptron error you obtain throughout the run,
from random initialization until convergence, on a successful run (some random
initializations may not converge or may require many iterations). Note the
number of iterations needed. If using the non-stochastic algorithm, discuss the
convergence behavior as you vary the step size (η).
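The batch (non-stochastic) variant can be sketched as below. This is a minimal illustration, not a reference solution: the data are a synthetic, linearly separable stand-in for “dataset2.mat”, the bias is handled by appending a constant 1 to each input, the perceptron error is taken to be the sum of -y*(theta'*x) over misclassified points, and the names theta, eta and maxIter are chosen here for illustration.

```matlab
% ASSUMPTION: synthetic, linearly separable stand-in for dataset2.mat.
% With the real data, replace this block with: load dataset2
N = 100;
X = [ 1 + 2*rand(N/2,2);      % class +1 in the box [1,3] x [1,3]
     -3 + 2*rand(N/2,2)];     % class -1 in the box [-3,-1] x [-3,-1]
Y = [ones(N/2,1); -ones(N/2,1)];

Xa      = [X, ones(N,1)];     % append 1 so theta(3) acts as the bias
theta   = randn(3,1);         % random initialization
eta     = 0.1;                % step size
maxIter = 1000;

classErr = zeros(maxIter,1);  % fraction of misclassified points
percErr  = zeros(maxIter,1);  % perceptron criterion
for t = 1:maxIter
    margins = Y .* (Xa*theta);
    mis     = margins <= 0;               % misclassified points
    classErr(t) = mean(mis);
    percErr(t)  = -sum(margins(mis));
    if ~any(mis), break; end              % converged
    % Batch gradient step on the perceptron criterion.
    theta = theta + eta * (Xa(mis,:)' * Y(mis));
end
classErr = classErr(1:t);
percErr  = percErr(1:t);
numIters = t;

% Data and decision boundary theta(1)*x1 + theta(2)*x2 + theta(3) = 0
% (assumes theta(2) is nonzero).
x1 = linspace(min(X(:,1)), max(X(:,1)), 2);
x2 = -(theta(1)*x1 + theta(3)) / theta(2);
plot(X(Y==1,1), X(Y==1,2), 'b+', X(Y==-1,1), X(Y==-1,2), 'ro', x1, x2, 'k-');
```

Plotting classErr and percErr against the iteration index gives the two error traces the problem asks for; rerunning with different values of eta shows how the step size affects the batch update.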
4. (10 points) Multi-Class Discrimination: There are several possible ways in
which to generalize the concept of a linear discriminant function from two
classes to c classes. One possibility would be to use (c-1) linear discriminant
functions, such that $y_k(x) > 0$ for inputs $x$ in class $C_k$ and
$y_k(x) < 0$ for inputs not in class $C_k$. By drawing a simple example in two
dimensions for $c = 3$, show that this approach can lead to regions of x-space
for which the classification is ambiguous. Another approach would be to use one
discriminant function $y_{jk}(x)$ for each possible pair of classes $C_j$ and
$C_k$, such that $y_{jk}(x) > 0$ for patterns in class $C_j$ and
$y_{jk}(x) < 0$ for patterns in class $C_k$. For c classes we would need
$c(c-1)/2$ discriminant functions. Again, by drawing a specific example in two
dimensions for $c = 3$, show that this approach can also lead to ambiguous
regions.