Equipe Regal, LIP6 / INRIA Paris-Rocquencourt
Tuesday, February 1, 2011, 11AM-Noon, CSB 453 (CS Conference Room)
Abstract: Replicating shared data is a fundamental mechanism in large-scale distributed systems, but suffers from a fundamental tension between scalability and data consistency. Eventual consistency sidesteps the (foreground) synchronisation bottleneck, but remains ad-hoc, error-prone, and difficult to prove correct.
We present a promising new approach that is simple, scales almost indefinitely, and provably ensures eventual consistency: A CRDT is a data type that demonstrates some simple properties, that its concurrent operations commute, or that its states form a semi-lattice. Any CRDT provably converges, provided all replicas eventually receive all operations. A CRDT requires no synchronisation: an update can execute immediately, irrespective of network latency, faults, or disconnection; it is highly scalable and fault-tolerant.
The approach is necessarily limited since any task requiring consensus is out of reach. Nonetheless, many interesting and useful data types can be designed as a CRDT. We previously published the Treedoc CRDT, a sequence data type suited to concurrent editing tasks (as in a p2p wiki). This talk presents a portfolio of generally-useful, non-trivial, composable CRDTs, including variations on counters, registers, sets, maps (key-value stores), graphs and sequences. This research is part of a systematic and principled study of CRDTs, to discover their power and limitations, and to better understand the underlying mechanisms and requirements. The challenges ahead include scaling garbage collection and integrating occasional non-commuting operations.
This is joint work with Marek Zawirski of LIP6, Nuno Preguica of Universidade Nova de Lisboa, and Carlos Baquero, of Universidade do Minho. The research is supported by ANR grant ConcoRDanT (ANR-10-BLAN-0208), a Google Research Award 2009, and a Google European Doctoral Fellowship 2010.
Speaker Biography: Marc Shapiro is a researcher at INRIA, focusing on large-scale distributed computing systems. His research topics include data replication and consistency, especially in the wide area and in disconnected operation, in the presence of faults and speculative computation. After his PhD at LAAS (Toulouse), Marc Shapiro did his research at MIT (Cambridge, USA), CMIRH (Paris, France), INRIA (Rocquencourt, France), Cornell University (Ithaca, USA), at Sun Microsystems (Chelmsford, USA), and Microsoft Research (Cambridge, UK). He is currently a senior researcher for INRIA in the Regal group (INRIA-LIP6). Home page: http://lip6.fr/Marc.Shapiro/