- Time: Monday/Wednesday 1:10pm - 2:25pm
- Location: 417 International Affairs Building
- Instructor: Andreas C. Müller
- Office hours: Monday, 10am-12pm, Interchurch 320K
- Course Assistants:
  - Shubham Agrawal sa3762 Wednesday 3pm-5pm
  - Deka Auliya Akbar da2897 Thursday 10am-12pm
  - Ishaan Arora ia2419 Thursday 3:40pm-5:40pm
  - Ritesh Baldva rb3447 Monday 11am-1pm
  - Satvik Jain sj2995 Friday 4pm-6pm
  - Hritik Jain hj2533 Tuesday 2pm-4pm
  - Amogh Mishra am5323 Friday 7pm-9pm
  - Kumari Nishu kn2492 Thursday 1:30pm-3:30pm
  - Kartik Parnami kp2844 Monday 7pm-9pm
  - Neelam Patodia np2723 Thursday 7pm-9pm
  - Jiayin Yang jy3016 Tuesday 2pm-4pm
MS DS students can enroll via SSOL. External students need to follow the Cross-Registration Instructions provided by DSI. During early registration, the course will be restricted to DSI students. If there is space, we will open the waiting list to non-DSI students on January 14th.
Waiting list
Undergraduate and master's students not in the DSI program will be admitted from the waiting list in order of their position on it, as spaces become available. The class has been extended to 225 students to accommodate the demand. If you are an undergraduate or master's student, please don’t contact the instructor to be let into the class.
Auditing Applied Machine Learning
If you are not able to enroll in the course or would like to audit it for other reasons, you can get access to the CourseWorks and Piazza platforms. Please email a CA to be added once the class has started. If you are auditing, your exams and homework will not be graded, and the class will not appear on your transcript.
Each lecture will be recorded and posted on YouTube. Attending the lectures in person is recommended but not required. Please be aware that the lecture audio may capture questions asked during the lecture.
This class offers a hands-on approach to machine learning and data science. The class discusses the application of machine learning methods such as SVMs, random forests, gradient boosting and neural networks to real-world datasets, including data preparation, model selection and evaluation. This class complements COMS W4721 in that it relies entirely on available open-source implementations in scikit-learn and TensorFlow. Apart from applying models, we will also discuss software development tools and practices relevant to productionizing machine learning models.
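As an illustration of the workflow described above (data preparation, model selection and evaluation with scikit-learn), here is a minimal sketch. It is not taken from the course materials: the bundled breast-cancer dataset, the SVC model and the parameter grid are arbitrary choices for illustration, and `fit_and_evaluate` is a hypothetical helper. The code assumes a recent scikit-learn release.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_and_evaluate(random_state=0):
    """Hypothetical helper: prepare data, select a model, evaluate it."""
    # A small bundled dataset stands in for a real-world dataset here.
    X, y = load_breast_cancer(return_X_y=True)
    # Hold out a test set that is never used during model selection.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=random_state)
    # Data preparation (scaling) lives inside the pipeline, so it is
    # re-fit on each cross-validation training fold and does not leak.
    pipe = make_pipeline(StandardScaler(), SVC())
    # Model selection: 5-fold cross-validated grid search over C and gamma.
    param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
    grid = GridSearchCV(pipe, param_grid, cv=5)
    grid.fit(X_train, y_train)
    # Evaluation: score the selected model once on the held-out test set.
    return grid.best_params_, grid.score(X_test, y_test)


if __name__ == "__main__":
    best_params, test_accuracy = fit_and_evaluate()
    print("best parameters:", best_params)
    print("test accuracy: {:.3f}".format(test_accuracy))
```

Keeping the scaler inside the pipeline is the design choice worth noting: scaling the full dataset before splitting would let test-set statistics leak into training.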
Students should be familiar with Python programming and with basic use of NumPy, pandas and matplotlib. A good reference is the Python Data Science Handbook by Jake VanderPlas. It is available online for free, including as notebooks. I highly recommend going through it before starting the class.
Grading / course grade
Six homework assignments (60%), a midterm exam (20%) and a final in-class exam (20%). All homework assignments are programming assignments and must be submitted via GitHub (as will be explained in class). The midterm will test material from the first half of the class, while the final exam will test material from the second half.
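The grading breakdown above amounts to a weighted average. The sketch below assumes that every score is on a 0-100 scale and that the six homework assignments count equally (10% each of the course grade); the syllabus states neither, and `course_grade` is a hypothetical helper, not an official grading script.

```python
NUM_HOMEWORKS = 6
# Weights in percent, taken from the syllabus: homework 60, midterm 20, final 20.
HOMEWORK_WEIGHT = 60
MIDTERM_WEIGHT = 20
FINAL_WEIGHT = 20


def course_grade(homework_scores, midterm, final):
    """Return the weighted course score on a 0-100 scale.

    Assumption (not stated in the syllabus): all scores are out of 100
    and each of the six homework assignments counts equally.
    """
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError("expected {} homework scores".format(NUM_HOMEWORKS))
    homework_avg = sum(homework_scores) / NUM_HOMEWORKS
    # Integer percentage weights, divided by 100 at the end.
    return (HOMEWORK_WEIGHT * homework_avg
            + MIDTERM_WEIGHT * midterm
            + FINAL_WEIGHT * final) / 100
```

For example, homework scores averaging 90 with a midterm of 75 and a final of 85 give (60 × 90 + 20 × 75 + 20 × 85) / 100 = 86.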
Homework policy
All homework assignments are due at 1pm. Late submissions (or commits) will not be accepted, and there are no deadline extensions. The last commit before the deadline will be counted as your submission. All code is expected to run on Python 3.4 and adhere to the PEP 8 style guide.
The exams are written; no computers or course materials are allowed. Everything on the slides or in the notes to the slides may be tested. There may be some minor coding questions, but most questions will be conceptual or multiple choice.
The general syntax of the libraries discussed in class may be tested, but you are not expected to memorize all functions and arguments.
Academic rules of conduct
You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the following course-specific policies.
You are welcome and encouraged to discuss course materials and reading assignments with other students. Please limit discussion of homework to general approaches. You are not allowed to share code between submissions or submission groups. For homework submitted individually, each student must write their own solution. For homework submitted in groups (if allowed), the group should submit a single write-up. Collaboration is not permitted on any of the exams.
Use of outside references
Students are welcome to use any outside sources on general machine learning and data science topics. However, you are not permitted to use solutions to specific homework tasks or problems that you find online. Code that is provided during the lectures or as part of the GitHub repository can be reused for the homework, but should be marked as such.
Violating any portion of these policies will result in a penalty assessed at the instructor’s discretion. This may include a zero grade for the assignment in question AND a failing grade for the whole course, even for a first infraction. Such students will also be reported to the relevant Deans’ offices that handle cases of academic dishonesty.
Copyright notice
Lecture slides, notes, illustrations and notebooks are licensed under CC0 and can be used for any purpose without acknowledgement (though acknowledgement is appreciated). Homework assignments, homework solutions, exams and exam solutions are copyrighted and may not be redistributed without explicit permission from the instructor.