Equipe Regal, LIP6 / INRIA Paris-Rocquencourt
Paris, France
Tuesday, February 1, 2011, 11AM-Noon, CSB 453 (CS Conference Room)
Abstract:
Replicating shared data is a fundamental mechanism in large-scale
distributed systems, but suffers from a fundamental tension between
scalability and data consistency. Eventual consistency sidesteps the
(foreground) synchronisation bottleneck, but remains ad-hoc,
error-prone, and difficult to prove correct.
We present a promising new approach that is simple, scales almost
indefinitely, and provably ensures eventual consistency: A CRDT is a
data type that demonstrates some simple properties, that its
concurrent operations commute, or that its states form a semi-lattice.
Any CRDT provably converges, provided all replicas eventually receive
all operations. A CRDT requires no synchronisation: an update can
execute immediately, irrespective of network latency, faults, or
disconnection; it is highly scalable and fault-tolerant.
The approach is necessarily limited since any task requiring consensus
is out of reach. Nonetheless, many interesting and useful data types
can be designed as a CRDT. We previously published the Treedoc CRDT, a
sequence data type suited to concurrent editing tasks (as in a p2p
wiki). This talk presents a portfolio of generally-useful, non-trivial,
composable CRDTs, including variations on counters, registers, sets,
maps (key-value stores), graphs and sequences. This research is part of
a systematic and principled study of CRDTs, to discover their power and
limitations, and to better understand the underlying mechanisms and
requirements. The challenges ahead include scaling garbage collection
and integrating occasional non-commuting operations.
This is joint work with Marek Zawirski of LIP6, Nuno Preguica of
Universidade Nova de Lisboa, and Carlos Baquero, of Universidade do
Minho. The research is supported by ANR grant ConcoRDanT
(ANR-10-BLAN-0208), a Google Research Award 2009, and a Google European
Doctoral Fellowship 2010.
Speaker Biography: Marc Shapiro is a researcher at INRIA, focusing on large-scale distributed computing systems. His research topics include data replication and consistency, especially in the wide area and in disconnected operation, in the presence of faults and speculative computation. After his PhD at LAAS (Toulouse), Marc Shapiro did his research at MIT (Cambridge, USA), CMIRH (Paris, France), INRIA (Rocquencourt, France), Cornell University (Ithaca, USA), at Sun Microsystems (Chelmsford, USA), and Microsoft Research (Cambridge, UK). He is currently a senior researcher for INRIA in the Regal group (INRIA-LIP6). Home page: http://lip6.fr/Marc.Shapiro/