# COMS 4721 Spring 2022 (Machine Learning)

This is the website for COMS 4721, which is taught during Spring 2022.

## Essential information

• Lecture time: Tue/Thu 1:10pm-2:25pm
• Lecture venue: 301 Uris
• Instructor: Daniel Hsu
• Course assistants: Manohar Anantha, Katie Kim, Bin Li, Han Lin, Abhinava Sikdar, Jincheng Xu
• Office hours: See course calendar

Machine learning is the study of computational mechanisms that “learn” from data to make predictions and decisions. Such mechanisms execute algorithms intended to identify useful patterns from relevant data in order to devise a rule for prediction or decision-making. But in what kinds of data is it possible to find such patterns, and under what circumstances would such patterns lead to accurate predictions or prudent decisions? These are the types of questions addressed by the computational and statistical approach to machine learning, which is at the heart of this course.

## Homework schedule

Instructions and materials for homework assignments can be found on Courseworks under “Files”.

• HW0: due Monday, January 24 at 11:59 PM
• HW1: due Friday, February 11 at 11:59 PM
• HW2: due Friday, March 4 at 11:59 PM
• HW3: due Friday, April 1 at 11:59 PM
• HW4: due Friday, April 29 at 11:59 PM

## Course calendar

Unless otherwise stated, office hours are conducted over Zoom. The Zoom link can be found in Courseworks.

The calendar is also available in iCal format.

## Syllabus

### Course description

COMS 4721 is a graduate-level introduction to machine learning. The course covers basic statistical principles of supervised machine learning, as well as some common algorithmic paradigms.

Note: The course description for COMS 4721 elsewhere (e.g., SSOL, Vergil) is out-of-date.

#### Relation to COMS 4771

This course is the same as COMS 4771, except that enrollment is restricted to students in the Data Science MS program.

If you are not in the Data Science MS program, consider taking COMS 4771 instead.

#### Instruction modality

As announced by the university, the first two weeks of lectures will be held online via Zoom. The Zoom links will be made available on Courseworks; recordings of these lectures are expected to be made available there a little while after each lecture (assuming someone remembers to click ‘record’ …).

If/when in-person instruction resumes, it is unclear whether Zoom-based attendance or lecture recordings will be possible, given the lack of technical support.

### Learning goals

• Apply mathematical and statistical principles to understand and reason about machine learning problems and algorithms.
• Apply algorithmic techniques to construct machine learning algorithms.

### Prerequisites

You must know multivariate calculus, linear algebra, and basic probability. You must be comfortable with writing code to process and analyze data in Python, and be familiar with basic algorithmic design and analysis. You must have general mathematical maturity. A detailed list of prerequisite topics is available here.

#### Caution

In previous offerings of this course, some students who only had a passing familiarity with the prerequisite topics have found the course material extraordinarily difficult.

### Resources for prerequisite topics

If you are unsure about whether you satisfy the prerequisites for this course (or would like to “page-in” this knowledge), please check the following links.

### Topics

• Overview of machine learning
• Decision trees
• Model selection
• Statistical models
• Ensemble methods
• Linear regression
• Linear classification
• Regularization, margins, and SVM
• Numerical optimization
• Classification objectives
• Multivariate Gaussians and PCA
• Kernel machines and neural networks

If time permits, we may also cover other topics such as boosting, unsupervised learning, and online decision making, depending on student interest.

The lectures will be mostly self-contained, but required reading assignments (which should be completed prior to lecture) will be posted on the website. Additional reading material from some of the following texts will be suggested.

(All of these texts are available online, possibly through Columbia University Libraries.)

The overall course grade is composed of:

• homework assignments (40%)
• quizzes (30%)
• final exam (30%)

Submission of assignments and grading will be handled on Gradescope and Courseworks.

The final letter grade is determined by the overall percentage $$p$$ of points you earn. For each $$y \in \{ {\operatorname{A}}, \operatorname{B}, \operatorname{C}, \operatorname{D}, \operatorname{F} \}$$, if $$p \geq l_y$$, then the letter grade is $$y$$ or better (possibly with a plus or minus modifier). The values $$l_{\operatorname{A}}, l_{\operatorname{B}}, l_{\operatorname{C}}, l_{\operatorname{D}}, l_{\operatorname{F}}$$ are guaranteed to satisfy $$l_{\operatorname{A}} \leq 90$$, $$l_{\operatorname{B}} \leq 80$$, $$l_{\operatorname{C}} \leq 70$$, $$l_{\operatorname{D}} \leq 60$$, $$l_{\operatorname{F}} \leq 0$$, and $$l_{\operatorname{F}} \leq l_{\operatorname{D}} \leq l_{\operatorname{C}} \leq l_{\operatorname{B}} \leq l_{\operatorname{A}}$$, although their final values may only be determined at the end of the semester. Note that it is not a priori expected that the distribution of final letter grades (over all students in the course) will neatly match any particular probability distribution or “curve”.
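The grading rule above can be sketched in a few lines of Python. This is only an illustration, not the official grade calculation: the function names are hypothetical, each component score is assumed to already be a percentage in [0, 100] (the syllabus does not say how individual homeworks or quizzes are aggregated within a component), and the cutoffs used are the stated upper bounds on $$l_y$$, so the result is the letter range you are guaranteed to reach or exceed, ignoring plus/minus modifiers.

```python
# Component weights from the syllabus, in percent of the overall grade.
WEIGHTS = {"homework": 40, "quizzes": 30, "final": 30}

# Upper bounds on the cutoffs l_y stated in the syllabus. The actual cutoffs
# may be lower; they are only determined at the end of the semester.
CUTOFF_UPPER_BOUNDS = [("A", 90), ("B", 80), ("C", 70), ("D", 60), ("F", 0)]


def overall_percentage(scores):
    """Weighted overall percentage p.

    `scores` maps each component name to a percentage in [0, 100]
    (an assumption; per-assignment aggregation is not specified).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_letter(p):
    """Best letter y whose cutoff upper bound is at most p.

    Since l_y is no larger than its upper bound, p >= bound implies
    p >= l_y, so the final grade is y or better.
    """
    for letter, bound in CUTOFF_UPPER_BOUNDS:
        if p >= bound:
            return letter
    raise ValueError("p must be nonnegative")


p = overall_percentage({"homework": 95, "quizzes": 80, "final": 85})
print(p, guaranteed_letter(p))  # 87.5 B
```

Because the bounds are only upper limits on the cutoffs, a student with 87.5% is guaranteed a grade in the B range or better, but could still receive a higher letter if $$l_{\operatorname{A}}$$ ends up below 87.5.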

### Homework policy

All written portions of assignments must be neatly and legibly typeset as PDF documents; handwritten submissions will not be accepted. The purpose of this requirement is to make it easier to grade your work. Before submission, please read the rendered PDF document to make sure that everything is legible and neat!

You can use LaTeX, Microsoft Word, or any other system that produces high-quality PDFs with neatly typeset mathematical expressions.

• Jupyter Notebook allows one to input some mathematical expressions in a Markdown cell using LaTeX syntax. But you should export (via the “Download as” menu option) the notebook using “LaTeX” or “PDF via LaTeX”. Please do not export using “PDF via HTML”, since this may cut off some parts of the cells. (You can use Overleaf or a local LaTeX system to create a PDF file from the LaTeX source that is exported by Jupyter Notebook.)

If you plan to use LaTeX but have not used it before, or if you only have a passing familiarity with it, it is recommended that you read and complete the lessons and exercises in The Bates LaTeX Manual or on learnlatex.org. This video by Ryan O’Donnell on writing math in LaTeX is also recommended.

Programming portions of assignments will be submitted on Courseworks; instructions will be given in the homework assignments.

### Extensions

All assignments must be submitted by the specified due dates unless an extension has been approved. Late submissions are not accepted unless explicitly stated in the assignment instructions.

Extensions are generally granted only for medical reasons. To request an extension, ask your academic advisor to (1) confirm documentation from a physician or medical practitioner and (2) communicate this confirmation and your extension request to the instructor by email.

### Disability services

In order to receive disability-related academic accommodations for this course, students must first be registered with their school’s Disability Services office. Detailed information is available online for both the Columbia and Barnard registration processes. Refer to the appropriate website for information regarding deadlines, disability documentation requirements, and drop-in hours (Columbia) / intake session (Barnard).

You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the course-specific policies described below.

#### Collaboration on homework assignments

You are welcome and encouraged to discuss homework assignments with fellow students. Your discussions should respect the following rules.

• Homework assignments should be completed individually or, if explicitly permitted, in groups of two.
• The submitted write-up should be completely in your own words. If you need to quote or reference a source, you must include proper citations in your write-up.
• Discussion between groups may include brainstorming and verbally discussing possible solution approaches, but must not go as far as one person/group telling another how to solve the problem.
• You may not take any notes (whether handwritten or typeset) from the discussions.
• Any written/electronic discussions (e.g., over messaging platforms, email) should be discarded/deleted immediately after they take place.
• You may not look at another group’s homework write-up/solutions (whether partial or complete).
• You may not show your homework write-up/solutions (whether partial or complete) to another group.

COLLABORATION OR DISCUSSION BETWEEN STUDENTS ON QUIZZES OR EXAMS IS NOT PERMITTED.

#### Special rules for working in groups

• We will provide instructions for submitting assignments as a group.
• The group members must “practice” making a submission as a group well before the deadline so as to not run into any technical problems with the final submission.
• Each group member must take responsibility for the entire submitted write-up.
• Each group member must contribute to every part of the assignment; no one should be just “along for the ride”.
• If a homework group must dissolve before an assignment is submitted (e.g., because one student falls ill and requires an extension), both students must (1) immediately inform the instructor of the circumstance by email, and (2) indicate on their independent submission which parts were done in collaboration.

#### Use of outside references on homework assignments

Outside reference materials and sources (i.e., texts and sources beyond the assigned reading materials for the course) may be used on homework only if given explicit written permission from the instructor and if the following rules are followed.

• Any outside reference must be acknowledged and cited in the write-up.
• Sources obtained by searching the literature/internet for answers or hints on homework assignments are never permitted.
• You are permitted to use texts and sources on course prerequisites (e.g., a linear algebra textbook).
• If you need to look up a result in such a source, provide a citation in your homework write-up.
• If you inadvertently come across a solution to (or substantial hint about) a problem, you must:
• acknowledge this source and document the circumstance in your homework write-up; and
• produce a solution without looking at the source.
• If you have already seen one of the homework problems before (e.g., in a different course), please re-solve the problem without referring to any previous solutions.

OUTSIDE REFERENCES CANNOT BE USED ON QUIZZES OR EXAMS UNLESS YOU HAVE RECEIVED EXPLICIT WRITTEN PERMISSION FROM THE INSTRUCTOR.

#### Violations

Violation of any portion of these policies will result in a penalty to be assessed at the instructor’s discretion. This may range from a zero grade for the assignment in question to a failing letter grade for the course. All violations are automatically reported to Student Conduct and Community Standards.

### Getting help

You are encouraged to use office hours and Ed to discuss and ask questions about course material and reading assignments, and to ask for high-level clarification on and possible approaches to homework problems.

• If you need to ask a detailed question specific to your solution on a homework problem, please do so on Ed and mark the post as “private” so only the instructor and course assistants can see it.

Questions, of course, are also welcome during lecture. If something is not clear to you during lecture, there is a chance it may also not be clear to other students. So please raise your hand to ask for clarification during lecture. Some questions may need to be handled “off-line”; we’ll do our best to handle these questions in office hours or on Ed.

When asking questions on Ed or in office hours, please be as specific as possible and give all of the relevant context.

• Questions like “can you explain X” and “how do I solve Y” are not questions that can be usefully answered on Ed or in office hours.

To help us (the instructor, TAs, other students) provide a useful answer, make your question specific and accompanied by relevant context.

• E.g., “It seems to me that Theorems X and Y from last week’s lecture (discussed in textbook Z) have contradicting conclusions. I believe Theorem X applies in the following premise […], but applying Theorem Y to the same premise gives an opposite conclusion. Why does Theorem Y not apply?”

## Quiz 1 instructions

Quiz 1 will be released on February 24 at 4:00 PM. You can choose to do it in any contiguous one-hour period between February 24 at 4:00 PM and February 25 at 4:00 PM.

For Quiz 1, you can use the following:

1. Pen and/or pencil.
2. As many sheets of scratch paper as you like.
5. Basic calculator (though you probably won’t need one).

You cannot use any of the following:

1. Phones, tablets, smartwatches, pagers, walkie-talkies, smoke signal apparatus, etc.
2. Any computer or internet resources beyond Gradescope and those that are explicitly allowed above.

You may not communicate with anyone while taking the quiz. The answers you submit must be your own. You must not give or receive any unauthorized assistance.

There is a “Fake Quiz” on Gradescope that demonstrates the format of the quiz.

## Quiz 2 instructions

Quiz 2 will be released on March 24 at 4:00 PM. You can choose to do it in any contiguous one-hour period between March 24 at 4:00 PM and March 25 at 4:00 PM.

For Quiz 2, you can use the following:

1. Pen and/or pencil.
2. As many sheets of scratch paper as you like.
5. Basic calculator (though you probably won’t need one).

You cannot use any of the following:

1. Phones, tablets, smartwatches, pagers, walkie-talkies, smoke signal apparatus, etc.
2. Any computer or internet resources beyond Gradescope and those that are explicitly allowed above.

You may not communicate with anyone while taking the quiz. The answers you submit must be your own. You must not give or receive any unauthorized assistance.

The format of the quiz will be similar to that of Quiz 1.

## Quiz 3 instructions

Quiz 3 will take place in the first thirty minutes of lecture on April 21. It will cover course material through the lecture on kernels and neural networks.

For Quiz 3, you can use the following:

1. Pen and/or pencil.
2. One letter-size page of notes (handwritten or typed; both sides ok).
3. Basic calculator (though you probably won’t need one).

You cannot use any of the following:

1. Books or notes beyond what is explicitly allowed above.
2. Computers, phones, tablets, smartwatches, pagers, walkie-talkies, smoke signals, etc.
3. The internet.
4. Friends, enemies, classmates, strangers, family members, pets, etc.

(We will provide pages to use as scratch paper as part of the quiz; do not bring your own.)

You may not communicate with anyone (except the instructor) while taking the quiz. The answers you submit must be your own. You must not give or receive any unauthorized assistance.

The first page of the quiz is available on Courseworks.

## Final exam

• Date: May 10
• Time: 1:10 PM to 2:25 PM

Please plan to be in a location with a desktop/laptop computer with a camera and reliable internet connection during the scheduled final exam time period.

• The exam will be conducted using Courseworks; you’ll find it under “Quizzes”.
• You will also need to be connected to the final exam Zoom meeting during the exam time period; the link will be made available through Courseworks (“FINAL EXAM FOR COMSW4721_001_2022_1 - MACHINE LEARNING”).
• “Attendance” in the Zoom meeting is required and will be logged.
• You should be in the meeting for the full 75 minutes of the exam.
• The Zoom meeting will be recorded.

For the final exam, you can use the following:

1. Pen and/or pencil.
2. As many sheets of notes as you like (handwritten or typed) and as many sheets of scratch paper as you like.
3. Basic calculator.
4. Laptop or desktop computer for the purposes of (1) connecting to the final exam Zoom meeting, and (2) connecting to Courseworks to read the exam questions and to submit your answers.

Use of Zoom during the final exam:

• You will be able to ask clarification questions by sending private chat messages to the instructor over Zoom.
• You must have your camera on and positioned in a way to show your face at all times.
• Please turn your audio on, so that you can hear announcements.
• You must be in the meeting for the full 75 minutes of the exam.

You cannot use any of the following:

1. Phones, tablets, smartwatches, pagers, walkie-talkies, smoke signals, etc.
2. Any computer or internet resources beyond those that are explicitly allowed above. For instance, you should not read or ask questions on Ed.
3. Friends, enemies, classmates, strangers, family members, pets, etc.

During the exam, you must not communicate with anyone except for the instructor to ask questions. The answers you submit must be your own. You must not give or receive any unauthorized assistance.

There is a “Fake Quiz” on Courseworks that demonstrates the format of the exam.