Robustness and Security in ML Systems: Junfeng Yang

Location: 222 Pupin
Time: M 10:10am-12:00pm
Credits: 3

Instructor: Junfeng Yang
Address: 519 CSB
Office Hours: By appointment

TA: Chengzhi Mao
Office Hours: By appointment

Discussions (via Piazza)

Email: sysml-course@lists.cs. Best way to reach us.

Course Description

Over the past few years, Machine Learning (ML) has made tremendous progress, achieving or surpassing human-level performance on a diverse set of tasks including image classification, speech recognition, and playing games such as Go. These advances have led to widespread adoption and deployment of ML in security- and safety-critical systems such as self-driving cars, malware detection, and aircraft collision avoidance systems. This wide adoption of ML techniques presents new challenges, as the predictability and correctness of such systems are of crucial importance. Unfortunately, ML systems, despite their impressive capabilities, often demonstrate unexpected or incorrect behaviors in corner cases for several reasons, such as biased training data and overfitting or underfitting of the models. In safety- and security-critical settings, such incorrect behaviors can lead to disastrous consequences such as a fatal collision of a self-driving car. For example, a Google self-driving car recently crashed into a bus because it expected the bus to yield under a set of rare conditions but the bus did not. A Tesla car in autopilot crashed into a trailer because the autopilot system failed to recognize the trailer as an obstacle due to its “white color against a brightly lit sky” and the “high ride height.” Such corner cases were not part of Google’s or Tesla’s test set and thus never showed up during testing. Other examples include Microsoft’s Tay chatbot tweeting racist remarks after being misled by malicious Twitter users, and Google removing “gorilla” as an image class after its image classification algorithm incorrectly classified dark-skinned people as gorillas.

These challenges have drawn huge attention from researchers in machine learning, security, systems, and programming language communities. A number of techniques and theories have been proposed to increase the robustness and security of machine learning. In this course, we will study the most practical and most important of these techniques and theories with a focus on deep learning. For details on the topics we'll cover, please go to the Course Syllabus page.

Course Goal

The general goal of this course is to help you understand the challenges of making ML robust and secure, and the solutions proposed so far. This understanding will make you a more effective ML programmer or scientist. If you are interested in doing research in this emerging area, this course will help you get started.

Course Format and Student Workload

This course will center around paper readings, presentations, and discussions, plus a final project. The course readings include a list of research papers selected from top machine learning, security, systems, and programming language conferences. We will discuss roughly two papers every class meeting. For in-depth discussions to be possible, you will have to read the papers carefully before class.

You have three main responsibilities in the course:

Read the assigned papers carefully, before class. One of the main goals of the course is to have interesting in-class discussions so that students can understand the topics better. This goal is reflected in grading: 40% of the total grade will come from class participation, which includes talking in class as well as how you do on pop quizzes and (possibly) pop presentations. To truly understand a paper, I recommend you read each paper at least three times: twice very carefully, the last time focusing on the hard parts. You should also form reading groups and discuss the papers before class. Reading and thoroughly understanding a paper is not easy; you may find the reading advice on the advice page helpful.
Present some of the papers. Students sign up to present papers to the class. The key is that you need to really understand the paper and come up with a good way to explain it. Student presenters must send draft slides three days before class to get feedback from the teaching staff. Presenting well is not easy; you may find the presentation advice on the advice page helpful.
Complete the final project. The final project is essentially a mini-research project that may involve building a new system, designing a new algorithm, improving an existing technique, or performing a large case study. You are encouraged to come up with a topic of your own, which I'll help refine; alternatively, you can choose one of the projects I suggest.

Prerequisite

COMS W3137 Data Structures and Algorithms, COMS W3157 Advanced Programming (or good working knowledge of C/C++), and COMS W3827 Fundamentals of Computer Systems; or equivalents of these three courses. Good working knowledge of machine learning and deep learning.

Familiarity with the Linux environment. For instance, you should know how to write a Makefile.

Enrollment

Enrollment for this class will be limited this semester. Please register early if you plan to take it. If the class is full and you would still like to take it, please email the instructor and come to the first day of class.

Enrollment is open to PhD, MS, and undergraduate students. If you are an undergraduate and would like to take the course, please email the instructor for permission.

Materials

There is no required textbook; all relevant materials will be made available online at the Course Syllabus page.

Grading

40%:	Class participation. To encourage in-depth discussion, 40% of the grade will be assigned to in-class participation and paper presentation.
60%:	Final project.

E6998 Robustness and Security in ML Systems