Course Description
Despite our increasing reliance on computing platforms, making reliable software systems remains difficult. Software errors have been reported to take lives and cost billions of dollars annually. Making reliable software is one of the most important problems in computer science. In recent years, this problem has drawn considerable attention from researchers in the systems, software engineering, and programming languages communities. A number of automated techniques have been developed to increase system reliability. In this course, we will study the most practical and most important of these reliability techniques. Specifically, we will study:
- Automated debugging. Debugging (i.e. finding the root cause of a bug) is usually a strenuous and painful manual process; we will learn automated techniques to make debugging easier.
- Automated repair. Wouldn't it be nice to have tools to automatically patch your programs to fix errors or prevent malicious attacks? We will learn techniques that can do so.
- Concurrency. CPUs are getting more cores, which means multi-threaded programs will become mainstream. Unfortunately, programmers tend to think sequentially and are thus prone to mistakes when writing multi-threaded programs. We will learn the characteristics of concurrency errors to better avoid them in our own programs; we will also learn effective techniques to detect, debug, and fix concurrency errors.
- Practical bug-finding techniques. We will learn effective bug-finding techniques that have found thousands of serious errors in large systems, some as large as the entire Linux kernel. Some of these bugs may allow attackers to run arbitrary code; others may cause permanent loss of an entire file system.
For a complete list of topics, go to the Course Syllabus page.
Course Goal
The general goal of this course is to help you make reliable systems. It will help you gain a better understanding of software bugs and techniques to detect, debug, and fix them. This understanding will make you a more effective programmer.
If you are interested in doing research in the area of system reliability, this course can help you get started; if you are currently involved in research in other areas such as operating systems, networking, security, or databases, this course can help you apply these techniques to your own research area.
Course Format and Student Workload
This course will center on readings and discussions; it has an optional project component (described in the Project section below). The course readings include a list of research papers selected from top systems, software engineering, and programming languages conferences. We will discuss two papers every class meeting. In the first half of a class meeting, the instructor or students will present the papers; in the second half, we will discuss the papers in depth. For the in-depth discussions to be possible, you will have to read the papers carefully before class. To help achieve this, I will post reading questions, and you will have to answer them and turn in your answers before the day of the class.
You have three basic responsibilities for the papers covered in the course:
- Read the assigned papers carefully, before class. One of the main goals of the course is to have interesting in-class discussions so that students can understand the topics better. This goal is reflected in grading: 40% of the total grade will come from class participation, which includes speaking in class as well as how you do on pop quizzes and (possibly) pop presentations. To truly understand a paper, I recommend you read it at least three times: twice very carefully, and a final time focusing on the hard parts. You should also form reading groups and discuss the papers before class. Reading and thoroughly understanding a paper is not easy; you can find reading advice on the advice page.
- Answer the reading questions. I will post one or two reading questions for each paper when it is assigned. The purpose of these questions is to make you think critically as you read the papers. You are encouraged to discuss these questions within your reading group, but you must write the answers individually. Your answer to each question must be fewer than 100 words. Turn in your answers via email (reliability-course@lists.cs) before the day of the class in which the paper is discussed, with the class and date in the subject line (e.g., E6998 Reading 9/14). You should send a plain text email with no attachments so I can easily read it with any mailer.
- Present papers. I will present the papers and lead the discussions for the first half of the semester. In the second half of the semester, each student is expected to present one paper of her or his choice. Each presentation should cover the key points of the paper within 15 minutes. Student presenters should have their slides ready and show them to me one week before they present to the class; I will give you feedback on how to revise your slides. Giving good presentations is not easy; you can find presentation advice on the advice page.
Project
The project component is optional. It is essentially a mini research project that may involve building a new system, designing a new algorithm, improving an existing technique, or performing a large case study. If you are interested in doing a project, please discuss it with me. With my approval, you can register for a project course with me and get 3 additional units. You are encouraged to come up with a topic of your own, which I'll help refine; alternatively, you can choose one of the projects I suggest. More details will be available once you sign up for a project.
Prerequisite
COMS W3137 Data Structures and Algorithms, COMS W3157 Advanced Programming (or good working knowledge of C), and COMS W3827 Fundamentals of Computer Systems.
In addition, students are expected to have done significant programming, by taking an advanced programming course (e.g., COMS W4118 OS or COMS W4115 PLT) or working in industry.
Enrollment
The Fall 2009 enrollment for this class will be limited. Please register early if you plan to take this class in Fall 2009. If the class is full and you would like to take the class, please email the instructor and come to the first day of class.
The enrollment is open to PhD, MS and undergraduate students. If you are an undergraduate and would like to take the course, please email the instructor for permission.
Materials
There is no required textbook; all relevant materials will be made available online at the Course Syllabus page.
Grading
- 40%: Class participation. To encourage in-depth discussion, 40% of the grade will be assigned to class participation, which includes speaking in class as well as how well you do on pop quizzes and (possibly) pop presentations.
- 30%: Answers to reading questions. I will post one or two reading questions for each paper. You should answer these questions and turn in your answers before the day of the class. Your answer to each question must be fewer than 100 words.
- 30%: Paper presentation. You will present one paper. You should have your slides ready and go through them with me one week before your presentation day. I will give you feedback so you can revise your slides. Your presentation must be less than 15 minutes.