Complete List of Publications
-
Verifying Systems Rules Using Rule-Directed Symbolic Execution
 [bib]
Eighteenth International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '13), 2013
Describes Woodpecker, a system that leverages path slicing to speed up symbolic execution. It enables users to check systems rules, and avoids checking program paths irrelevant to the rule, drastically reducing the amount of redundant work. -
Sound and Precise Analysis of Parallel Programs through Schedule Specialization
 [bib]
Proceedings of the ACM SIGPLAN 2012 Conference on Programming Language Design and Implementation (PLDI '12), June, 2012
Describes a program analysis framework for analyzing multithreaded programs with high precision. The key idea is to statically analyze a multithreaded program toward only a small set of schedules to improve precision, then enforce these schedules at runtime for soundness. -
Concurrency Attacks
 [bib]
the Fourth USENIX Workshop on Hot Topics in Parallelism (HOTPAR '12), June, 2012
Studies the security consequences of concurrency errors. -
Efficient Deterministic Multithreading through Schedule Relaxation
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Describes Peregrine, a system for efficiently making threads deterministic, addressing a key open challenge within the field of deterministic execution. -
Pervasive Detection of Process Races in Deployed Systems
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Describes RacePro, a system for finding process races (e.g., multiple processes accessing a shared resource such as a file without proper synchronization). -
Practical Software Model Checking via Dynamic Interface Reduction
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Our latest model checking paper. It describes a new reduction technique that decomposes a full distributed system into components and then explores the executions of these components in a divide-and-conquer way. -
Concurrency Attacks
 [bib]
Columbia University Technical Report CUCS-028-11Studies the security consequences of concurrency errors. -
Finding Concurrency Errors in Sequential Code---OS-level, In-vivo Model Checking of Process Races
 [bib]
Proceedings of the 131th USENIX workshop on Hot topics in operating systems (HOTOS '11), 2011
Argues that process races (e.g., multiple processes accessing a shared resource such as a file without proper synchronization) are bad and that the research community has not given them their due share of attention. -
Optimizing Data Partitioning for Data-Parallel Computing
 [bib]
Proceedings of the 131th USENIX workshop on Hot topics in operating systems (HOTOS '11), 2011
Presents a crucial problem in data-parallel computing: how to evenly partition data, and sketches a solution to this problem. -
Context-based Online Configuration-Error Detection
 [bib]
Proceedings of the USENIX Annual Technical Conference (USENIX '11), 2011
Describes CODE, a system for automatically detecting software configuration errors. The key insight is to infer configuration access invariants that predict what access events follow what contexts. -
Stable Deterministic Multithreading through Schedule Memoization
 [bib]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), October, 2010
Describes Tern, a system for making threads more deterministic and stable. The key idea is to memoize past schedules and reuse them when possible, much like the natural tendencies in animals and humans to repeat familiar routes to avoid possible hazards along unknown routes. -
Bypassing Races in Live Applications with Execution Filters
 [bib]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), October, 2010
Describes LOOM, a live-workaround system for fixing races in live applications. LOOM is safe (live-update will not introduce new errors), fast (negligible overhead for most benchmarks), and flexible (able to fix all bugs evaluated). -
Bypassing Races in Live Applications with Execution Filters
 [bib]
Columbia University Technical Report CUCS-036-10A more complete description of LOOM compared to our OSDI '10 paper. -
Scalable and Systematic Detection of Buggy Inconsistencies in Source Code
 [bib]
Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '10), October, 2010
Describes how we found many copy-and-paste bugs in a large commercial code base. -
MODIST: Transparent Model Checking of Unmodified Distributed Systems
 [talk | bib]
Proceedings of the Sixth Symposium on Networked Systems Design and Implementation (NSDI '09), April, 2009
Describes how we applied our in-situ model checking approach to find 10 protocol-level errors in three real distributed systems, including a production system that has been managing more than 100 thousand machines for over two years. Note this version has two minor calculation errors fixed. -
A Software Checking Framework Using Distributed Model Checking and Checkpoint/Resume of Virtualized PrOcess Domains
 [bib]
CS Department, Columbia University Technical Report CUCS-032-09Describes a version of eXplode for in-vivo model checking. -
Distributed eXplode: A High-Performance Model Checking Engine to Scale Up State-Space Coverage
 [bib]
CS Department, Columbia University Technical Report CUCS-051-08Describes a distributed version of eXplode. -
EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors
 [talk | bib]
Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI '06), November, 2006
Describes our in-situ model checking approach, which made it easy to thoroughly check real systems. We applied eXplode to 17 storage systems and found serious data-loss errors in every system checked. This paper is my favorite in describing our model checking approach. -
Using Model Checking to Find Serious File System Errors
 [bib]
ACM Transactions on Computer Systems, Volume 24, Issue 4 (November, 2006)
A journal version of our FiSC work, forwarded from OSDI 04. -
Automatically Generating Malicious Disks using Symbolic Execution
 [talk | bib]
Proceedings of the 2006 IEEE Symposium on Security and Privacy (S\&P '06), May, 2006
Describes how we generated disks-of-death using symbolic execution. These disks, when mounted, can crash or take over control of your machine. -
eXplode: A Lightweight, General Approach for Finding Serious Errors in Storage Systems
 [bib]
Proceedings of the first Workshop on the Evaluation of Software Defect Detection Tools (BUGS '05), June, 2005
Describes a preliminary version of the eXplode storage system checker -
Using Model Checking to Find Serious File System Errors
 [talk | bib]  (Best paper award)
Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI '04), December, 2004
Describes how we leveraged model checking to find serious errors in three widely used, well tested Linux file system errors. Some of these errors can vaporize your entire file systems. -
MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties
 [bib]
Proceedings of the 10th ACM conference on Computer and communications security (CCS '03), October, 2003
Describes an extensible and lightweight annotation system that allows programmers to write a small set of domain-specific annotations to effectively annotate large bodies of code. -
An Empirical Study of Operating Systems Errors
 [bib]
Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), November, 2001
Presents our study of the errors found using meta-compilation. Some interesting conclusions include drivers are up to three to seven times buggier than the rest of the kernel. -
Correlation exploitation in error ranking
 [bib]
Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT '04/FSE-12), November, 2004
Describes how we can exploit the correlations of error messages emitted by static analysis tools to cluster false positives together, thus improve the effectiveness of the static analysis tools. -
Kinesis: A new approach to replica placement in distributed storage systems
 [bib]
ACM Transactions on Storage Systems, Volume 4, Issue 4 (January, 2009)
Describes a new replica placement strategy that uses multiple linear hash functions to achieve high performance, scalability, and availability.