Recent updates
Feb 29, 2012
Application's dependency on physical time turns out to be an
obstacle on the road of DMT. Take a glimpse at the following
code appeared in YFS.
void async_call(int socket, timespec finaldeadline)
{
do {
clock_timespec(now);
nextdeadline = now + delta;
if (pthread_cond_timedwait(sth, nextdeadline))
process message queue and return;
} while (nextdeadline < finaldeadline);
}
First, function pthread_cond_timedwait itself is non-deterministic.
The physical time elapses all the way so one cannot guarantee a
precise waiting time of delta. Secondly, it's hard to determine how
long this timedwait is gonna take. The variant cost of timedwait leads
to an undecidable number of loop iterations.
In addition, physical time also matters in the interaction
between different nodes in the distributed system. Think about the
lease protocol. The lease server grants a client a lease for one
minute. And the client guarantees the period by doing
pthread_cond_timedwait with a deadline after 50 seconds. However,
if the physical time is converted into turn numbers, the client is
not able to make the same physical time guarantee it used to make.
To solve this issue, we need sort of timing model of physical
time in applications.
ST-local
Single threaded local programs doesn't care about physical time
at all.
This assertion is easy to make: if you only have one thread, and you
don't interact with any other threads, processes or remote party, why
do you need physical time?
MT-local
In multithreaded local programs, physical time operations are based on
the assumption of enough CPU speed.
Let's look at sleep and timedwait operation. A thread makes a
sleep mostly for two purposes, waiting for some operations to finish,
or delay the current thread to save CPU resources. On the other
hand, the length of physical time used in timewait operation basically
means some operations are expected to be finished within the period
of time.
MT-distributed
Between two nodes in the distributed system, physical time ise used to
measure network latency. In addition, the physical time eclapsing
speed should be almost the same on different nodes.
Applicaion should not make any assumption of the absolute clock on
another computer.
Epoch-based Timing Model
Based on discussions above, here we propose a realime model for DMT
systems based on the concept of epoch: an epoch is a small period
of time, when each core processes at most one task.
Here we change the concept of epoch a little bit. At most one thread
is allowed to run on one CPU in a single epoch, and in addition, the
length of epoch is almost fixed.
With the definition of epoch, the good thing is that physical time
can be measured by the number of epochs taken. On the other hand,
making epochs with a fixed length slows down the execution. While
this approach is very similar with the one that fixes the realtime
length of each turn, it ticks the clock by the computation capabitliy,
instead of the number of threads/turns. The latter leads to abnormal
faster (measured by turns) or slower (measured by threads) speed of
time elapsing.
Question 1. How timedwait and sleep work with epochs?
We can do almost precise convertion between turn number and realtime.
Suppose each epoch's length is .1 ms, and a thread is goning to sleep
for 100 ms. We can convert the time into 1,000 turns. When the thread
wakes up from sleeping, the real clock elapses approximately
0.1 * 1000 = 100 ms.
Question 2. How lease protocol works with epochs?
The issue here is we want to make the time elapsing in the same speed
on different machines. This property can be implemented based on the
clock of local computer because the real clocks are running in almost
the same speed. If we have the assumption, lease protocol will work
like this. The lease server issues a lease with one minute's
availability including 5 seconds’ roundtrip cost. When the client
receives the lease, it launches a thread waiting for 40 seconds,
and the thread then returns the lease after waken up.
Based on the assumption, 40 seconds on the client takes approximately
40 seconds on the server, the lease availability is guaranteed.
Question 3. What if the system is suffering load imbalance?
Epoch + CPU approach is able to measure the computation scalability
of a machine. When a machine is running too many threads/processes,
it suffers an inherent slow down and epoch approach only makes an
extra slow down.
Question 4. What if an operatoins takes longer than an epoch?
This should be an interesting problem. If you explored any potential
performance optimization of the epoch approach, you would find out
that the longer operations are certainly be an issue.
First, the scheduler should always approach the exact epoch point
in execution: If an operation finishes before the next epoch point,
we delay it. And if it finishes after the next epoch point, we launch
the next operation immediately and hope it could balance the delay.
Question 5. How's the potential performance issue?
A simple solution is balance epoch length and operation cost.
If we set the epoch length long, we have determinism but the
application is slowed down. If we shorten the epoch length, the number
of longer operations increases and will lead to non-determinism, but
we gain a better performance. An easy way to obtain an appropriate
conversion rate is sampling.
An alternative optimization is to make coarse-grain clock tick. That
allows more chance to balance delay of late operation.
Implementation Plan
Hook real time functions. (done)
Add real time checker in schedule functions like getTurn
Merge previous code of realtime-turn conversion.
Add tests (almost done, but lack of inter-proc test)
Feb 12, 2012
Round-Robin scheduler is making a huge overhead without knowledge of blocking operations!
In this case we have to adopt a monitor thread checking and skiping blocked threads.
Because we have to wait for a reasonable time to make sure the operation is not
gonna return in this round. This blocks other thread for a threshold lenght every round!
In YFS, the configuration has to wait for 20 seconds for each client's setup.
This time slot used to be 1 second without DMT system!
Feb 9, 2012
Some notes on the handling of physical time.
It seems like some codes rely on physical time to accomplish operations,
and a timed out could be treated as error or something.
Round-Robin scheduler could cause thread imblance in terms of physical time.
Because some operations finsh quickly, while others take some time.
How to handle timedwait and sleep will an issue.
Feb 8, 2012
Discussion today focuses on evaluation and measurement.
Basically we have two targets before next week's meeting.
The first is to make a latency analysis of each operation,
and to use this data to figure out expected events and unexpected events step-by-step.
Secondly, after building a working DMT scheduler with expected/unexpected events,
try to obtain the approximate number of essential schedules required for an application.
Here are the working items.
Enable recording latency in log files together with insid. (Check with YiHong)
Enable Round-Robin scheduler with Jingyang's YFS. (Huayang)
Try to speed up Round-Robin schedule from the ugly wait-skip manner.
Setup benchmark configurations.
AGET (done). Download www.cs.columbia.edu/~huayang/files/demeter.pdf with 4 threads.
YFS. 3 lock servers, 1 extent server and 2 yfs_clients.
Execute a fixed sequence of operations on each clients in parallel.
Do latency analysis for AGET and YFS.
Figure out expected/unexpected events in the benchmarks.
Run YFS 1000 times to get the approx number of schedules.
Try some other configurations.
Feb 1, 2012
Discussion today focuses on how to demenstrate expected and unexpected in real programs.
The basic intuition here is to look at the expected response time of a blocking function.
Generally, an expected event should behave like following.
|
Latency Graph
This graph shows the relationship between response time and possibility.
x axis shows the response time, from 0 to INF.
y axis shows the possibility.
An expected event should have most responses appear in a relatvely short interval.
Events out of the expected interval are treated as unexpected event.
|
Jan 29, 2012
Design for a non-blocking runtime scheduler algorithm.
A simple timer to skip threads?
Dec 7, 2011
Currently the project focuses on measuring the effect of changing
external inputs for DMT system. So the concern is mostly about timing
issue.
What's done.
Instrumented external blocking functions.
One schedule point is inserted before the function call,
and another schedule point is inserted after the call.
Instrumented functions include
accept(),
connect(),
recv(),
recvfrom(),
recvmsg(),
read(),
select().
Being able to record a message trace.
The basic trace is a sequence of tuple (tid, op, turn).
Being able to reproduce (prefix of) the input message timing.
The current algorithm is to enforce schedule points after external call.
Being able to run a modified the message trace (prefix).
There are two possible outputs for a modified message trace (prefix):
either a successful run with corresponding new message trace,
or an error message like invalid message trace (prefix).
Being able to merge message trace and synchronization operations for a global (partial order) view.
What's next.
Sequantial transaction based scheduling algorithm
void block() // schedule point before ex_call
{
Remove the thread from scheduler
}
void wakeup() // schedule point after ex_call
{
while (true)
{
lock(scheduler);
if Round-Robin-Queue.empty()
{
add myself into the schedule
} else
{
donothing
}
unlock(scheduler);
}
general schedule
}
Timing Exploration Algorithm
void DFS(Sequence s, int prefix_fixed)
{
DFS(s, prefix_fixed + 1); // fix the next operation
op = s.get_operation(prefix_fixed); // the next variable operation
new_op_rt = op.inc_rt();
s_1 = s.prefix(prefix_fixed) + op;
s_2 = run_DMT_with_prefix(s_1); // notice op's real return time might not be new_op_rt, it's determined by the DMT scheduler.
// In addition, the next operation might not be op, and other operations can be brought forward.
if (s_2 is valid)
DFS(s_2, prefix_fixed);
}
|