COMS 4771 Section 2 Fall-B 2020 (Machine Learning)

This is the website for COMS 4771 Section 2, which is taught during Fall 2020 Subterm B (October 26–December 14, 2020).

Course information


Homework and quizzes

These will be made available on Courseworks.

  • HW 1: due 11/4
  • Quiz 1: 11/6
  • HW 2: due 11/18
  • Quiz 2: 11/20
  • HW 3: due 12/7
  • Quiz 3: 12/4
  • HW 4: due 12/16

Lecture & reading schedule

Below is the planned schedule. I will try to keep it accurate, at least for the next lecture.

Dates Lecture topic Required reading Optional reading
10/27 Overview Dietterich article on ML 1.0-3.3 CML 1.1-1.2
10/27 Nearest neighbor classification CML 3.1-3.3; PC 4.5.5, 4.6
10/29 Prediction theory Coin tosses handout CML 5.5-5.6, 9.1-9.2; PC 3.2
11/5, 11/10 Linear regression Linear regression handout MML 9.1-9.2.1, 9.4; ESL 3.1
11/10, 11/12 Regularization MML 9.2.2-9.3.3; ESL 3.3-3.4.3
11/12, 11/17 Multivariate Gaussians and PCA PCA handout 5.1-5.4 MLPP 4.3.1, 4.3.2.1, 4.3.4
11/19 Kernels CML 11.1, 11.4
11/19, 11/24 Linear classification Logistic regression handout CML 7.1-7.4; PC 5.1-5.4
11/24, 12/1 Convex optimization Gradient descent handout MML 7.1, 7.3-7.3.1; CML 7.5
12/3, 12/8 Neural networks CML 10.1-10.5; efficient backprop
12/8 Margins and SVMs Perceptron handout CML 7.7; MML 12.1-12.1
12/8, 12/10 Classification objectives COMPAS article; OAA handout CML 6.1-6.2, 8.1, 8.4
12/10 Ensemble methods AdaBoost handout BFA 1.1-1.3, 3.4.3

Office hours

Zoom links for office hours are available on Courseworks.

Monday Tuesday Wednesday Thursday Friday
Andy Wonjun Recitation
10am-noon 10am-noon 10:10am-noon
 
Lecture Lecture Andrea
Serena 1:10-3:40pm 1:10-3:40pm 1:00-3:00pm
2:00-4:00pm
William Daniel
4:00-6:00pm 4:00-6:00pm

Syllabus

Course description

COMS 4771 is a graduate-level introduction to machine learning. The course covers basic statistical principles of supervised machine learning, as well as some common algorithmic paradigms.

Note: The course description for COMS 4771 elsewhere (e.g., SSOL, Vergil) is out of date.

Hybrid format

This course is designated as a “hybrid course”. This means that roughly 20% of the instruction will happen in person for “On Campus” students.

Lectures will be recorded and made available to students. It will be possible to complete all of the required coursework, quizzes, and exams remotely (i.e., online). Synchronous participation in lectures and recitations will not be necessary.

Attendance (for either the lectures or recitations) will not be formally checked.

International students should consult Columbia ISSO about concerns regarding visa eligibility and related issues.

I don’t know whether it is okay to enroll in courses that meet in overlapping time slots. I suggest you check with your academic program officers to determine whether this is allowed.

Learning goals

  • Apply mathematical and statistical principles to understand and reason about machine learning problems and algorithms.
  • Apply algorithmic techniques to construct machine learning algorithms.

Prerequisites

You must know multivariate calculus, linear algebra, and basic probability. You must be comfortable writing code to process and analyze data in Python, and you must be familiar with basic algorithm design and analysis. You must have general mathematical maturity.

Note: COMS 4701 (Artificial Intelligence) is not a prerequisite.

A more detailed list of topics is available here.

Online resources for course prerequisites

If you are unsure whether you satisfy the prerequisites for this course (or would like to “page in” this knowledge), please check the following links.

Course content

Topics

  • Overview of machine learning
  • Nearest neighbors
  • Prediction theory
  • Regression I: Linear regression
  • Regression II: Regularization
  • Multivariate Gaussians and PCA
  • Regression III: Kernels
  • Classification I: Linear classification
  • Optimization I: Convex optimization
  • Classification II: Margins and SVMs
  • Classification III: Classification objectives
  • Optimization II: Neural networks

If time permits, we may also cover other topics, such as boosting, unsupervised learning, or online decision making, depending on student interest.

Readings

The lectures will be mostly self-contained, but required reading assignments (which should be completed prior to lecture) will be posted on the website. Additional reading material from some of the following texts will be suggested.

(All of these texts are available online, possibly through Columbia University Libraries.)

Assignments

The overall course grade is composed of:

  • homework assignments (40%)
  • quizzes (30%)
  • final exam (30%); projected to be Tuesday, December 22
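As an illustration only, the weighting above can be read as a weighted sum of the three components. The sketch below assumes each component has already been reduced to a single score in [0, 1]; how individual homeworks and quizzes are combined within a component, and how the total maps to a letter grade, are not specified here, and the helper name `overall_grade` is hypothetical.

```python
# Hypothetical sketch: weights are the 40/30/30 split listed above.
# Assumption (not from the syllabus): each component is a single score in [0, 1].
WEIGHTS = {"homework": 0.40, "quizzes": 0.30, "final_exam": 0.30}


def overall_grade(scores: dict[str, float]) -> float:
    """Return the weighted course score in [0, 1], before any letter-grade mapping."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for {sorted(WEIGHTS)}")
    for name, s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {s}")
    # Weighted sum: each component contributes weight * score.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = overall_grade({"homework": 0.9, "quizzes": 0.8, "final_exam": 0.7})
print(f"{example:.3f}")
```

With these made-up scores the weighted sum is 0.4(0.9) + 0.3(0.8) + 0.3(0.7) = 0.81.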

Please submit all assignments by the specified due dates. Extensions are generally granted only for medical reasons. (Please ask your academic advisor to confirm documentation from a physician or other medical practitioner, and then ask them to email me their confirmation.)

All written portions of assignments should be neatly typeset as PDF documents. This will make grading much easier! You can use LaTeX, Microsoft Word, or any other system that produces high-quality PDFs with neatly typeset equations. If you have not used LaTeX before, or if you only have a passing familiarity with it, it is recommended that you read and complete the lessons and exercises in The Bates LaTeX Manual or on learnlatex.org. This video by Ryan O’Donnell on writing math in LaTeX is also recommended.

Disability services

If you require accommodations or support services from Disability Services, please make necessary arrangements in accordance with their policies within the first two weeks of the semester.

Academic rules of conduct

You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the following course-specific policies.

Collaboration on homework assignments

You are welcome and encouraged to discuss homework assignments with fellow students. Your discussions should respect the following rules.

  • Homework assignments should be completed individually or in groups of at most three students (including yourself).
    • We will provide instructions for submitting assignments as a group.
    • Every group member must contribute to every part of the assignment; no one should be just “along for the ride”.
    • Every group member must take responsibility for the entire submitted write-up.
    • The submitted write-up should be completely in your own words. If you need to quote or reference a source, you must include proper citations in your write-up.
  • Discussion between groups may include brainstorming and verbally discussing possible solution approaches, but must not go as far as one person telling another how to solve the problem.
    • You may not take any notes (whether handwritten or typeset) from the discussions.
    • Any written/electronic discussions (e.g., over messaging platforms, email) should be discarded/deleted immediately after they take place.
  • You may not look at another group’s homework write-up/solutions (whether partial or complete).
  • You may not show your homework write-up/solutions (whether partial or complete) to another group.

Collaboration or discussion between students is NOT PERMITTED on quizzes or exams.

Use of outside references on homework assignments

Outside reference materials and sources (i.e., texts and sources beyond the assigned reading materials for the course) may be used on homework only if given explicit written permission from the instructor and if the following rules are followed.

  • Any outside reference must be acknowledged and cited in the write-up.
  • Sources obtained by searching the literature/internet for answers or hints on homework assignments are never permitted.
  • You are permitted to use texts and sources on course prerequisites (e.g., a linear algebra textbook).
    • If you need to look up a result in such a source, provide a citation in your homework write-up.
  • If you inadvertently come across a solution to (or substantial hint about) a problem, you must:
    • acknowledge this source and document the circumstance in your homework write-up;
    • produce a solution without looking at the source; and
    • as always, write your solution in your own words.
  • If you have already seen one of the homework problems before (e.g., in a different course), please re-solve the problem without referring to any previous solutions.
    • In your write-up, please also indicate that you had seen the problem before. (You won’t lose any credit for this; it would just be helpful for us to know about this fact.)

Outside references CANNOT be used on quizzes or exams unless you have received explicit written permission from the instructor.

Violations

Violation of any portion of these policies will result in a penalty to be assessed at the instructor’s discretion (e.g., a zero grade for the assignment in question, a failing letter grade for the course). All violations are reported to the relevant dean’s office.

Getting help

You are encouraged to use office hours and Piazza to discuss and ask questions about course material and reading assignments, and to ask for high-level clarification on and possible approaches to homework problems. If you need to ask a detailed question specific to your solution, please do so on Piazza and mark the post as “private” so only the instructors can see it.

Questions, of course, are also welcome during lecture. If something is not clear to you during lecture, there is a chance it may also not be clear to other students. So please raise your hand to ask for clarification during lecture. Some questions may need to be handled “off-line”; we’ll do our best to handle these questions in office hours or on Piazza.