COMS 4771 is a graduate-level introduction to machine learning. The course covers basic statistical principles of supervised machine learning, as well as some common algorithmic paradigms.

- **Time**: Mon & Wed 10:10am–11:25am
- **Venue**: 451 Computer Science Building
- **Instructor**: Daniel Hsu (office hours: Wed 2:30pm–4:30pm in 426 Mudd)
- **Course assistants**:
  - Palakorn (Gary) Buranasampatanon (office hours: Thu 1:00pm–3:00pm in IA room)
  - Sihyun Lee (office hours: Mon 5:00pm–7:00pm, Tue 11:00am–1:00pm in IA room)

- Apply mathematical and statistical principles to understand and reason about machine learning problems and algorithms.
- Apply algorithmic techniques to construct machine learning algorithms.

You must know multivariate calculus, linear algebra, and basic probability. You must be comfortable writing code to process and analyze data in Python, and be familiar with basic algorithmic design and analysis. You must have general mathematical maturity.

A more detailed list of topics is available here.

Some Python tips are available here.

If you are unsure about whether you satisfy the prerequisites for this course (or would like to “page-in” this knowledge), please check the following links.

- *Multivariate calculus*: textbook by Marsden and Weinstein; MIT open courseware.
- *Linear algebra*: immersive linear algebra; lecture notes from UC Davis; MIT open courseware; book chapter by Goodfellow, Bengio, and Courville; additional review notes.
- *Probability*: textbook by Grinstead and Snell; book chapter by Goodfellow, Bengio, and Courville.
- *Algorithms*: Chapter 0 of textbook by Dasgupta, Papadimitriou, and Vazirani (with discussion of asymptotic notation); “booksite” by Sedgewick and Wayne.
- *Mathematical maturity*: notes on writing math in paragraph style from SJSU; notes on writing proofs from SJSU.

- Nearest neighbor classifiers
- Predictions
- Generative models
- Risk estimation, model selection/averaging
- Linear regression
- Logistic regression and linear classifiers
- Support vector machines
- Generalization theory
- Convex optimization
- Optimization algorithms
- Neural networks
- Classification objectives
- Societal consequences
- Ensemble methods
- Clustering
- Principal components analysis

Readings will be assigned from various sources, including the following texts:

- *Pattern Classification* (PC) by Duda, Hart, and Stork (also on reserve at Science & Engineering Library; call number: Q327 .D83 2001)
- *A Course in Machine Learning* (CML) by Daumé
- *Convex Optimization* (CO) by Boyd and Vandenberghe
- *Boosting: Foundations and Algorithms* (BFA) by Schapire and Freund

(All of these texts are available online, possibly through Columbia University Libraries.)

- Complete assigned reading before each lecture.
- Attend lecture.
- Complete homework assignments (34% of total points).
- Complete two in-class exams (October 17 and December 10; 33% of total points each).

Homework assignments (along with instructions) will be posted on the course website. We aim to have 4–6 homework assignments (not counting Homework 0): 2–3 before the first exam and 2–3 after. This is subject to change as the semester progresses.

The first exam covers topics from the first part of the course; the second exam covers topics from the entire course, but with an emphasis on the second part of the course.

**No late homework assignments will be accepted, and there will be no make-up exams.** The lowest homework score (besides that of Homework 0) will be dropped before determining the final grade. If you have to miss an exam due to a valid medical or family emergency, please present any confirmatory documentation (e.g., from a physician) to your academic adviser, and then have your adviser e-mail me about the circumstance. In such a case, some accommodation will be made (e.g., your grade composition may be adjusted).

Final grades are not “curved” to fit any particular distribution. Instead, overall course letter grades are assigned according to the percentage of total points earned: 90%–100% is some kind of A, 80%–89% is some kind of B, etc.
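The grading arithmetic described above can be sketched in Python. This is only an illustration, not an official grade calculator: the function names are hypothetical, and it assumes that every score is expressed as a percentage, that the homework scores passed in exclude Homework 0 and are weighted equally, and that the A and B bands begin exactly at 90% and 80%. Bands below B are not spelled out in this syllabus, so the sketch does not guess at them.

```python
def course_percentage(homework_scores, exam1, exam2):
    """Overall percentage (0-100): 34% homework, 33% for each exam.

    homework_scores: percentages for each homework, excluding Homework 0
    (assumption); the single lowest score is dropped before averaging.
    """
    if len(homework_scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(homework_scores)[1:]  # drop the lowest homework score
    homework_average = sum(kept) / len(kept)  # equal weighting (assumption)
    return 0.34 * homework_average + 0.33 * exam1 + 0.33 * exam2


def letter_band(percentage):
    """Map an overall percentage to the bands the syllabus states."""
    if percentage >= 90:
        return "some kind of A"
    if percentage >= 80:
        return "some kind of B"
    return "below the B range (not specified here)"


# Homework scores 100, 80, 60, 90 (the 60 is dropped); exams 80 and 90.
total = course_percentage([100, 80, 60, 90], exam1=80, exam2=90)
print(round(total, 1), letter_band(total))  # 86.7 some kind of B
```

In this example the homework average after dropping the lowest score is 90, so the total is 0.34 × 90 + 0.33 × 80 + 0.33 × 90 = 86.7.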

If you require accommodations or support services from Disability Services, make necessary arrangements in accordance with their policies within the first two weeks of the semester.

You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the following course-specific policies.

You are welcome and encouraged to discuss course materials, reading assignments, and homework assignments **other than Homework 0** with each other in small groups (two to three people). You must list all discussants in your homework write-up. Discussion about homework assignments may include brainstorming and verbally discussing possible solution approaches, but **must not go as far as one person telling others how to solve a problem**. In addition, **you must write up your solutions by yourself**, and **you may not look at another student’s homework write-up/solutions (whether partial or complete)**.

**Collaboration of any kind on Homework 0 and exams is not permitted**.

Outside reference materials and sources (i.e., texts and sources beyond the assigned reading materials for the course) may be used on homework assignments *only if given explicit written permission from the instructor*. Such references must be appropriately acknowledged in the homework write-up. You must always write up your solutions in your own words.

- Sources obtained by searching the internet for answers or hints on homework assignments are *never permitted*.
- You are permitted to use texts and sources on course prerequisites (e.g., your linear algebra textbook). If you need to look up a result in such a source, provide a citation in your homework write-up.
- If you *inadvertently* come across the solution to a homework problem: you must acknowledge this source and document the circumstance in your homework write-up, and then do your best to produce a solution without looking at the source. You must, as always, write your solution in your own words.

Violation of any portion of these policies will result in a penalty to be assessed at the instructor’s discretion. **This may include receiving a zero grade for the assignment in question AND a failing grade for the whole course, even for the first infraction.** Students who violate these policies are also reported to the relevant Deans’ offices that handle cases of academic dishonesty.

You are encouraged to use office hours and Piazza to discuss and ask questions about course material and reading assignments, and to ask for high-level clarification on and possible approaches to homework problems. If you need to ask a detailed question specific to your solution, please do so on Piazza and mark the post as “private” so only the instructors can see it.

Questions, of course, are also welcome during lecture. If something is not clear to you during lecture, there is a chance it may also not be clear to other students. So please raise your hand to ask for clarification during lecture. Some questions may need to be handled “off-line” if they take the lecture discussion too far off-track; we’ll do our best to handle these questions in office hours or on Piazza.

Please don’t expect questions to be answered in the time period *immediately after* lecture. Save your questions for office hours, Piazza, or the next lecture!

Course materials (e.g., lecture slides, lecture notes, homework assignments, homework solutions, exams, exam solutions) are copyrighted and may not be re-distributed without explicit permission from the instructor.