Complete List of Publications
-
Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading
 [bib]
Communications of the ACM (2014)
This paper is geared toward a general audience. If you have time to read just one paper on our concurrency work, this is the paper to read. It describes our vision of stable multithreading (StableMT), a radical approach to making multithreading reliable, and summarizes our last five years of work on designing, building, and applying stable multithreading systems. The final version of this paper will appear in CACM. -
Parrot: a Practical Runtime for Deterministic, Stable, and Reliable Threads
 [bib]
Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), November, 2013
Describes Parrot, a simple, deployable thread runtime system for improving reliability with low overhead. This is our most recent and best paper on stable and deterministic multithreading. -
Effective Dynamic Detection of Alias Analysis Errors
 [bib]
Proceedings of the Ninth Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC-FSE '13), August, 2013
Describes NeonGoby, a system for effectively detecting errors in alias analysis, one of the most important and widely used program analyses. -
Determinism Is Overrated: What Really Makes Multithreaded Programs Hard to Get Right and What Can Be Done about It?
 [bib]
the Fifth USENIX Workshop on Hot Topics in Parallelism (HOTPAR '13), June, 2013
A position paper describing our vision of stable multithreading, a radically new approach to making multithreading reliable. -
Verifying Systems Rules Using Rule-Directed Symbolic Execution
 [bib]
Eighteenth International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '13), 2013
Describes Woodpecker, a system that leverages path slicing to speed up symbolic execution. It enables users to check systems rules, and avoids checking program paths irrelevant to the rule, drastically reducing the amount of redundant work. -
Effective Dynamic Detection of Alias Analysis Errors
 [bib]
Columbia University Technical Report CUCS-003-13Technical report version of NeonGoby, a system for effectively detecting errors in alias analysis, one of the most important and widely used program analyses -
Sound and Precise Analysis of Parallel Programs through Schedule Specialization
 [bib]
Proceedings of the ACM SIGPLAN 2012 Conference on Programming Language Design and Implementation (PLDI '12), June, 2012
Describes a program analysis framework for analyzing multithreaded programs with high precision. The key idea is to statically analyze a multithreaded program w.r.t. only a small set of schedules to improve precision and then enforce these schedules at runtime for soundness. -
Concurrency Attacks
 [bib]
the Fourth USENIX Workshop on Hot Topics in Parallelism (HOTPAR '12), June, 2012
Studies the security consequences of concurrency errors. -
Efficient Deterministic Multithreading through Schedule Relaxation
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Describes Peregrine, a system for efficiently making threads deterministic, addressing a key open challenge within the field of deterministic execution. -
Pervasive Detection of Process Races in Deployed Systems
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Describes RacePro, a system for finding process races (e.g., multiple processes accessing a shared resource such as a file without proper synchronization). -
Practical Software Model Checking via Dynamic Interface Reduction
 [bib]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October, 2011
Our most recent and best model checking paper. It describes a new reduction technique that decomposes a full distributed system into components and then explores the executions of these components in a divide-and-conquer way. -
Concurrency Attacks
 [bib]
Columbia University Technical Report CUCS-028-11Studies the security consequences of concurrency errors. -
Finding Concurrency Errors in Sequential Code---OS-level, In-vivo Model Checking of Process Races
 [bib]
Proceedings of the 131th USENIX workshop on Hot topics in operating systems (HOTOS '11), 2011
Argues that process races (e.g., multiple processes accessing a shared resource such as a file without proper synchronization) are bad and that the research community has not given them their due share of attention. -
Optimizing Data Partitioning for Data-Parallel Computing
 [bib]
Proceedings of the 131th USENIX workshop on Hot topics in operating systems (HOTOS '11), 2011
Presents a crucial problem in data-parallel computing: how to evenly partition data, and sketches a solution to this problem. -
Context-based Online Configuration-Error Detection
 [bib]
Proceedings of the USENIX Annual Technical Conference (USENIX '11), 2011
Describes CODE, a system for automatically detecting software configuration errors. The key insight is to infer configuration access invariants that predict what access events follow what contexts. -
Stable Deterministic Multithreading through Schedule Memoization
 [bib]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), October, 2010
Describes Tern, a system for making threads more deterministic and stable. The key idea is to memoize past schedules and reuse them when possible, much like the natural tendencies in animals and humans to repeat familiar routes to avoid possible hazards along unknown routes. -
Bypassing Races in Live Applications with Execution Filters
 [bib]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), October, 2010
Describes LOOM, a live-workaround system for fixing races in live applications. LOOM is safe (live-update will not introduce new errors), fast (negligible overhead for most benchmarks), and flexible (able to fix all bugs evaluated). -
Bypassing Races in Live Applications with Execution Filters
 [bib]
Columbia University Technical Report CUCS-036-10A more complete description of LOOM compared to our OSDI '10 paper. -
Scalable and Systematic Detection of Buggy Inconsistencies in Source Code
 [bib]
Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '10), October, 2010
Describes how we found many copy-and-paste bugs in a large commercial code base. -
MODIST: Transparent Model Checking of Unmodified Distributed Systems
 [talk | bib]
Proceedings of the Sixth Symposium on Networked Systems Design and Implementation (NSDI '09), April, 2009
Describes how we applied our in-situ model checking approach to find 10 protocol-level errors in three real distributed systems, including a production system that has been managing more than 100 thousand machines for over two years. Note this version has two minor calculation errors fixed. -
A Software Checking Framework Using Distributed Model Checking and Checkpoint/Resume of Virtualized PrOcess Domains
 [bib]
CS Department, Columbia University Technical Report CUCS-032-09Describes a version of eXplode for in-vivo model checking. -
Distributed eXplode: A High-Performance Model Checking Engine to Scale Up State-Space Coverage
 [bib]
CS Department, Columbia University Technical Report CUCS-051-08Describes a distributed version of eXplode. -
EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors
 [talk | bib]
Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI '06), November, 2006
Describes our in-situ model checking approach, which made it easy to thoroughly check real systems. We applied eXplode to 17 storage systems and found serious data-loss errors in every system checked. This paper is my favorite in describing our model checking approach, which forms the basis of my PhD thesis work. -
Using Model Checking to Find Serious File System Errors
 [bib]
ACM Transactions on Computer Systems, Volume 24, Issue 4 (November, 2006)
A journal version of our FiSC work, forwarded from OSDI 04. -
Automatically Generating Malicious Disks using Symbolic Execution
 [talk | bib]
Proceedings of the 2006 IEEE Symposium on Security and Privacy (S\&P '06), May, 2006
Describes how we generated disks-of-death using symbolic execution. These disks, when mounted, can crash or take over control of your machine. -
eXplode: A Lightweight, General Approach for Finding Serious Errors in Storage Systems
 [bib]
Proceedings of the first Workshop on the Evaluation of Software Defect Detection Tools (BUGS '05), June, 2005
Describes a preliminary version of the eXplode storage system checker -
Using Model Checking to Find Serious File System Errors
 [talk | bib]  (Best paper award)
Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI '04), December, 2004
Describes how we leveraged model checking to find serious errors in three widely used, well tested Linux file system errors. Some of these errors can vaporize your entire file systems. -
MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties
 [bib]
Proceedings of the 10th ACM conference on Computer and communications security (CCS '03), October, 2003
Describes an extensible and lightweight annotation system that allows programmers to write a small set of domain-specific annotations to effectively annotate large bodies of code. -
An Empirical Study of Operating Systems Errors
 [bib]
Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), November, 2001
Presents our study of the errors found using meta-compilation. Some interesting conclusions include drivers are up to three to seven times buggier than the rest of the kernel. -
Correlation exploitation in error ranking
 [bib]
Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT '04/FSE-12), November, 2004
Describes how we can exploit the correlations of error messages emitted by static analysis tools to cluster false positives together, thus improve the effectiveness of the static analysis tools. -
Kinesis: A new approach to replica placement in distributed storage systems
 [bib]
ACM Transactions on Storage Systems, Volume 4, Issue 4 (January, 2009)
Describes a new replica placement strategy that uses multiple linear hash functions to achieve high performance, scalability, and availability.