Broadly speaking, I am interested in computer systems research, including
distributed systems, the Web, security and privacy, operating systems, and
databases. More specifically, my current research focuses on the challenges
and opportunities created by today's emerging technologies, such as the Web,
cloud computing, and powerful mobile devices.
Brief descriptions of my research projects follow.
A one-page research statement outlining my vision for privacy in today's data-driven world can be found here.
For descriptions of the exciting projects in our broader systems group
at Columbia, please see our Software
Systems Lab page.
XRay: Increasing the Web's Transparency. Today's Web services accumulate
enormous amounts of sensitive information -- such as emails, search logs, or locations -- and use it
to target advertisements, prices, or products at users. Presently, users have little insight
into how their data is used for such purposes. To enhance transparency, we are building
XRay, a Chrome plugin that predicts what data -- such as emails or searches -- is used to
target which ads in Gmail, which prices in Amazon, etc. The mechanism is Web-service
independent, though the plugin is not. The insight is to compare ads/prices witnessed by
different accounts with similar, but not identical, subsets of the data.
A paper describing XRay appeared at
USENIX Security 2014. The project's
website is here.
This work is in collaboration with Prof.
Augustin Chaintreau from Columbia.
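The comparison idea behind XRay can be illustrated with a toy sketch. This is not XRay's actual algorithm: the function name, the scoring rule (how much more often an ad appears in accounts that hold a data item than in accounts that lack it), and the sample accounts are all assumptions made for illustration.

```python
def guess_targeting(accounts, observed_ads):
    """Toy differential correlation (hypothetical, not XRay's real method).

    accounts:     dict account_id -> set of data items held by that account
    observed_ads: dict account_id -> set of ads shown to that account
    Returns dict ad -> (most likely targeted item, score in [0, 1]).
    """
    all_items = set().union(*accounts.values())
    all_ads = set().union(*observed_ads.values())
    guesses = {}
    for ad in all_ads:
        best_item, best_score = None, 0.0
        for item in sorted(all_items):
            with_item = [a for a in accounts if item in accounts[a]]
            without_item = [a for a in accounts if item not in accounts[a]]
            if not with_item or not without_item:
                continue  # need both groups to make a comparison
            rate_with = sum(ad in observed_ads.get(a, set()) for a in with_item) / len(with_item)
            rate_without = sum(ad in observed_ads.get(a, set()) for a in without_item) / len(without_item)
            score = rate_with - rate_without
            if score > best_score:
                best_item, best_score = item, score
        guesses[ad] = (best_item, best_score)
    return guesses


# Hypothetical accounts holding overlapping, but not identical, subsets of emails.
accounts = {
    "a1": {"e_travel", "e_loan"},
    "a2": {"e_travel", "e_pets"},
    "a3": {"e_loan", "e_pets"},
}
observed_ads = {
    "a1": {"flights", "credit"},
    "a2": {"flights"},
    "a3": {"credit"},
}
guesses = guess_targeting(accounts, observed_ads)
```

In this made-up data, the "flights" ad appears only in the accounts that hold `e_travel`, so that email is the sketch's best guess for what the ad targets.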
- Modern protection abstractions for modern OSes.
Data storage abstractions in OSes have evolved enormously.
While traditional OSes used to provide fairly low-level
abstractions -- files and directories -- modern OSes, including
Android, iOS, OSX, and recent Windows, embed much higher-level
abstractions, such as relational databases or object-relational
models. Despite the change in abstraction, many crucial
protection systems, such as encryption or deniable systems,
still operate at the old file level, which often renders them
ineffective. We are investigating new data protection abstractions
that are more suitable for modern operating systems, including a
new logical data object abstraction, which corresponds directly
to user-level objects, such as emails, documents, or pictures.
Thus far, we've investigated two end-of-spectrum approaches for
implementing logical data objects: (1) expose new APIs to app
programmers (CleanOS system, described in an
OSDI 2012 paper)
and (2) recognize objects automatically by leveraging structural
information from modern storage abstractions (lOS system,
described in an upcoming OSDI 2014 paper, done in collaboration with
Prof. Gail Kaiser).
- Synapse: Data Exchange Done Right.
Data has become the principal asset of the Internet era, which everyone strives
to acquire and process. A new economy is emerging, in which striking amounts of continuously
changing user data are being sold and shared for others to process. That economy
needs to be controlled so that information can be shared efficiently, with strong semantic
guarantees, and securely across multiple applications. To this end, we are building Synapse,
an easy-to-use, strong-semantics, secure Web programming framework for large-scale,
data-driven Web service integrations. This project is in collaboration with Prof.
Jason Nieh from Columbia.
- Evade: Cloud Architectures for Dealing with Intrusion Alarms.
Large-scale clouds are magnets for increasingly sophisticated intrusion attacks. Intrusion
detection systems have long been designed to detect and prevent such attacks, but they
present a difficult challenge: achieving high detection rates results in huge numbers of
false positives, for which no good handling mechanisms exist. We are building Evade,
a new cloud architecture designed to deal with intrusion alarms cheaply and gracefully.
- vTube: An Interactive Repository of Executable Content for Long-term Content Preservation.
Humongous amounts of data, software, and research artifacts are being
produced today, including photos, documents, games, applications, and
scientific models. While many techniques
exist to ensure the availability of these digital contents in
the short to medium term, little is known about long-term preservation.
What will happen with these contents 50 years from
now, when most of the libraries, codecs, OSes, and hardware platforms
that they were created for will have evolved in non-backward-compatible
ways? In collaboration with Prof. Mahadev Satyanarayanan from CMU
and Dr. Kaustubh Joshi
from AT&T, we are building the infrastructure necessary for long-term
preservation of, and convenient access to, such digital artifacts despite
infrastructural change. Read about our first step toward this
goal in our SoCC 2013 paper,
which describes vTube,
a YouTube-like system that stores, archives, and streams executable
content, i.e., data or software plus all of their dependencies.
Notable Recent Projects
- Keypad: Auditing File System for Mobile Devices.
With today's limited anti-theft tools, users can neither assuredly restrict
nor remotely monitor a thief's data accesses on a stolen or lost mobile device.
I am currently building Keypad, a new file system that enhances data security
on mobile devices by providing users with post-theft remote control and
fine-grained access auditing. Details about Keypad are available in our
EuroSys 2011 paper,
which won a "Best Student Paper" award.
- Vanish: Self-destructing Data.
Users' migration to cloud and Web services is causing them to lose
control over the lifetime of their data. Vanish is a self-destructing data
system that allows users to impose timeouts on their Web data, such as emails,
Facebook messages, or Google Docs. The project's
web page includes a detailed
description of our Vanish work and links to our prototype. Our initial Vanish
design was described in our
USENIX Security 2009
paper, which received an "Outstanding Student Paper" award.
- Comet: Active Distributed Storage Systems.
Today's cloud storage services, such as Amazon S3 or peer-to-peer DHTs,
are highly inflexible and impose a variety of constraints on their clients:
specific replication and consistency schemes, fixed data timeouts,
limited logging, etc. We witnessed such inflexibility first-hand as part of our
Vanish work, where we used a DHT to store encryption keys
temporarily. To address this issue, we built Comet, an extensible storage
service that allows clients to inject snippets of code that control their
data's behavior inside the storage service. Details about Comet can be found
in our OSDI 2010 paper.
- CloudViews: Web Service Composition in Public Clouds.
Today's migration of Web services to public clouds such as Amazon AWS creates
an unprecedented environment where a myriad of Web services are co-located on
the same cloud or the same data center. CloudViews investigates the unique
opportunities for Web service sharing and composition that are spawned
by the public-cloud environment. CloudViews' vision is described in our
HotCloud 2009 paper.
- Menagerie: Personal Web-Data Organization.
The radical shift from the desktop to Web-based services is scattering
personal data across a myriad of Web sites, such as Yahoo! Mail, Google Docs,
Facebook, and Amazon S3. Menagerie addresses some of the data management
challenges raised by this dispersion by providing users and applications
with a uniform view of all scattered Web data. More details about Menagerie
can be found in our
WWW 2008 paper.
- Fault-tolerant File System Specifications.
During my summer internship with Microsoft
Research (Silicon Valley) in 2007, I worked on creating and analyzing
formal specifications for several fault-tolerant file systems: Niobe, GFS,
and Chain Replication. Our goal was to explore the extent to which formal
methods could help in fault-tolerant file system analysis, design, and
comparison. Our results and experience are described in our
DSN-DCCS 2008 paper.
- HomeViews: Personal File Organization and Sharing.
Today's users possess enormous amounts of data. To facilitate the organization
and sharing of their data, we designed HomeViews, a peer-to-peer data
management system that allows users to create database-style views of their
data and share them securely with other users. Details about this system are
available in our SIGMOD paper.
- FlowDB: Using Relational Databases in Network Intrusion Analysis.
In the FlowDB
project, we investigated whether out-of-the-box relational databases are
amenable for use as backends for network intrusion detection systems (NIDSs).
To cope with high input rates, these systems typically come with their own
custom storage backends. These custom solutions, however, impose severe
limitations on query processing at forensic analysis time. In FlowDB, we
evaluated a set of techniques for making relational databases amenable for use
under NIDSs. Our results are described in our
NetDB 2007 paper.
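Returning to the Comet item above, the idea of clients attaching code that controls their data's behavior inside the storage service can be illustrated with a small in-memory sketch. Everything here is hypothetical (the `ActiveStore` class, the `on_get` handler signature, and the `read_limit` policy), and it omits the sandboxing and distribution that a real service would need; see the OSDI 2010 paper for the actual design.

```python
class ActiveStore:
    """Toy key-value store whose objects may carry a client-supplied handler."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value, on_get=None):
        # Each object keeps its own private state for its handler to use.
        self._objects[key] = {"value": value, "on_get": on_get, "state": {}}

    def get(self, key):
        obj = self._objects.get(key)
        if obj is None:
            return None
        handler = obj["on_get"]
        if handler is None:
            return obj["value"]
        # The handler decides what to return and whether the object survives.
        keep, result = handler(obj["value"], obj["state"])
        if not keep:
            del self._objects[key]
        return result


def read_limit(max_reads):
    """Hypothetical policy: the object deletes itself after max_reads reads."""
    def handler(value, state):
        state["reads"] = state.get("reads", 0) + 1
        return state["reads"] < max_reads, value
    return handler


store = ActiveStore()
store.put("key-share", "secret", on_get=read_limit(2))
first = store.get("key-share")
second = store.get("key-share")
third = store.get("key-share")  # the object removed itself after two reads
```

The point of the sketch is that the deletion policy lives with the data and is chosen by the client, not fixed by the storage service.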