Checking Distributed Systems
Introduction
This project basically tends to demenstrate the implementation of DMT systems
in practical environment. Most DMT systems have a common assumption that the
environment input remains unchanged for different runs. This assumption raises
many doubts about practical distribution of DMT systems, because the outside
environment is always an undecidalbe factor to the whole system.
How to make the DMT systems dealing with various external environment? Our
observation is that most events will happen in an expected manner. For example,
in Apache, the main thread is always receiving the first request from a client.
Also, in aget (a multithreaded downloader), every network event is expected to
obtain a specific piece of data from the server.
Besides the special events, it's possible to make the whole system determinitic
with a DMT scheduler in most times. On the other hand, since the number of
special, or say unexpected events, is supposed to be reasonably small, we tend
to apply an input exploration algorithm to expose all possible patterns of
unexpected events. For example, we can adopt a model checker to do this job.
Regular project meeting: 12pm - 1pm, every Wednesday in System Lab.
What's been done
We have a LD_PRELOAD based dynamic instrumentation library and a LLVM-based
static instrumentation library. These libraries allow recording and controlling
program execution.
We have a DMT scheduler framework and implementations of multiple scheduling
algorithms, e.g. Round-Robin, First Come First Serve.
We are able to print execution trace in a log file and produce partial order
view or total order view of the execution trace. So does trace signature.
We have setup multiple benchmarks, including aget, Apache, Berkeley DB
replication. Setup of some other benchmarks (like MongoDB, COPS, and NYU's YFS)
are still on going.
What's our current plan
Build a DMT scheduler algorithm that enforces most determinism even if the
input enviroment changes. Some simple ideas could be applied in this algorithm:
Delay and batch input messages
Run network events as internal operations unless some unexpected behavior is
observed. For example, unexpected delay or message content changes.
Application evaluation. Each application will be executed in a certain
configuration with DMT for many times. We compare the difference between
different runs to determine unexpected events and the portion of
deterministic part. In this part, we want to demenstrate
The number of unexpected events is relatively small.
DMT system is applicable in most times.
More discussion on program behavior in different configurations.
|