COMS E6998 (Spring 2019)


Lecture Details

Instructor: Suman Jana
Office: Mudd 412
Office hours: by appointment
TA: Kyle Matoba (km3227@columbia.edu)
TA office hours: Mondays 2:30-4:00 pm over Skype
Classroom: 1127 Seeley W. Mudd Building
Class hours: Thursday (1:10-3:40 pm)

Description

This class focuses on improving program analysis using Machine Learning (ML). Traditionally, program analysis has relied on formal logic for its mathematical precision and expressiveness. However, such approaches often struggle to scale to large programs. In this class, I plan to explore the challenges and possibilities of combining ML with formal logic to make such analysis scalable without major sacrifices in precision.

Note: There is no assigned textbook for the class; you are expected to read the assigned articles, papers, and slides carefully.

Prerequisite

There is no formal prerequisite for this class, but you should be generally comfortable with ML. Feel free to email me if you have any specific questions.

Grading

Schedule

Date Topics Lecture notes & Reading
Jan 24 Basics of program analysis Class notes
Jan 31 Class cancelled
Feb 7 Static Analysis & abstract interpretation
Control Flow (Slides: Control Flow Analysis.pptx, Control Flow Analysis.pdf, Notes: Class notes) Control flow reading
Data Flow (Data Flow Analysis.pptx, Data Flow Analysis.pdf) Data flow reading
Abstract interpretation Reading
Feb 14 Symbolic analysis Symbolic Execution.pptx, Symbolic Execution.pdf
Additional reading:
Symbolic Execution for Software Testing: Three Decades Later (Cadar and Sen)
KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs (Cadar et al.)
CUTE: A Concolic Unit Testing Engine for C (Sen et al.)
DART: Directed Automated Random Testing (Godefroid et al.)
Symbolic execution and program testing (King)
Feb 21 Dynamic analysis & fuzzing fuzzing.pptx, fuzzing.pdf
Feb 28 Improving fuzzing with ML (Neuzz, Learn&Fuzz)
Mar 7 Student Presenter: Gabriel Ryan (Slides) Neural Code Comprehension: A Learnable Representation of Code Semantics by Ben-Nun et al. NIPS 2018
Mar 14 Student Presenter: Harry Smith (Slides) DeepCoder: Learning to Write Programs by Balog et al. ICLR 2017
1-page preliminary project proposals due
Mar 21 No Class (Spring Break)
Mar 28 Student Presenters: Noah Gallant (Slides)
Justin Wong
AppFlow: Using Machine Learning to Synthesize Robust, Reusable UI Tests by Hu et al. FSE'18
code2vec: Learning Distributed Representations of Code by Alon et al. POPL 2019
Apr 4 Student Presenters: Joshua Learn
Saikat Chakraborty/Yufan Zhuang
Neural-Augmented Static Analysis of Android Communication by Zhao et al. FSE'18
Improving Neural Program Synthesis with Inferred Execution Traces by Shin et al. NIPS'18
Leveraging Grammar And Reinforcement Learning For Neural Program Synthesis by Bunel et al. ICLR'18
Apr 11 Christian Doan
Jonas Duan
Avik Laha
Apr 18 Dennis Roellke
Yoongbok Lee
Jeevan Farias/Dmitiri Leggas
Apr 25 Shiqi Wang+Justin Whitehouse
Kyra Busser
May 2 Abhishek Shah/Dongdong She/Kexin Pei, Ben Meerovitch


Online presentations:

Andrew Calvano (Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection) Video, Slides