COMS W4118: Homework Assignment 5

You will build your own Linux kernel for this class. For each homework assignment, you will build a new Linux kernel. The image of the kernel you build for this assignment should be vmlinuz.hmwk5. Grading for this assignment will be done based on vmlinuz.hmwk5.

For this problem, you will implement a new kernel synchronization primitive. Read the "How Processes Are Organized" section in Chapter 3, and the "Synchronization Primitives", "Atomic Operations", "Spin Locks" and "Read/Write Spin Locks" sections in Chapter 5 in Understanding the Linux Kernel. In Chapter 3, pay particular attention to the material about wait queues. Note that the synchronization primitives in Linux generally do not have owners and are not recursive. No owners means that there's nothing stopping any piece of code from unlocking every lock in the kernel (and bringing the kernel to a fiery screeching halt, probably). Not recursive means that the owner of a lock (even if there was such a thing) can't obtain the lock twice.

(25 pts.) Design and implement a new kernel synchronization primitive: a barrier. Barriers are used in parallel programming to ensure that all processes complete one stage of computation before moving on to another (for example, a parallelized matrix multiply). Given a barrier of size N, all processes calling the barrier primitive will block until all N processes have called it, after which each may continue. The barrier is then reset back to N. Note that no other ordering is expressed or implied; e.g., the last process to arrive at the barrier may be the first process to leave or vice versa. Implement the following new system calls in the Linux kernel, which should be numbered 318 to 320, respectively:
- int barriercreate(int num); Creates a new barrier of size num, returning the barrier ID on success, -1 on failure.
- int barrierdestroy(int barrierID); Destroy the barrier with the given barrier ID and notifies any processes waiting on the barrier to leave it. Return number of processes notified on success and -1 on failure.
- int barrierwait(int barrierID); Blocks an individual process until N processes have arrived; must verify that the barrier ID is valid. Return 0 on success and -1 on failure. The barrier being destroyed while a process waits is also a failure condition that should return -1.
You should begin by thinking carefully about the data structures that you will need to solve this problem. Your system will need to support having multiple barriers open at the same time, so you will probably need a set of barrier descriptor data structures, each of which identifies a barrier. Those data structures will need to be put in a list from which your code can find the appropriate barrier descriptor correponding to the barrier that you need. Space for the barrier descriptors should be dynamically allocated, most likely using the kernel functions kmalloc() and kfree().

You should not make any assumptions about whether the system is a uniprocessor or multiprocessor system. Be sure to properly synchronize access to your data structures. Moreover, be sure to select the appropriate synchronization primitives such that they are both correct and efficient, in this order. For instance, you should prefer a spinlock over a semaphore if-and-only-if you don't plan to sleep while holding it.

You can choose to work at the level of wait queues using either the associated low-level routines such as add_wait_queue(), remove_wait_queue(), or the higher-level routines such as prepare_to_wait(), finish_wait(). You can find code examples both in the book and in the kernel. If you prefer to use functions such as interruptible_sleep_on() and sleep_on(), then plan carefully because they can be racy in certain circumstances.

HINT: a useful method to guarantee the validity of a data structure in the presence of concurrent create, access and delete operations, is to maintain a reference count for the object. Then, the object should only be freed by the last user. The file system, for example, keeps a reference count for inodes: when a process opens a file the reference count of the inode is incremented. Thus if another process deletes the file, the inode remains valid in the system (albeit invisible) because its count is positive. The count is decremented when the process closes the corresponding file descriptor, and if it reaches zero then the inode is freed. A reference count must be protected against concurrent access, either using explicit synchronization or using atomic types (see atomic_t in Chapter 5 in Understanding the Linux Kernel).

You are designing a completely new facility to add to the kernel. You will need to create a new implementation file and change the Makefile so that it compiles your new file. You are also likely to need to change a few parts of the existing kernel. You should properly handle errors that may occur and report meaningful error codes, e.g. -ENOMEM in the barrier that a memory allocation fails. As always, you are encouraged to look for existing kernel code that performs similar tasks and follow the coventions and practices it provides.
(5 pts.) Write an application program to test your new kernel functions. Your test program should show that the kernel functions work for the usual cases, for example, when all N processes wait on a barrier, and also for boundary conditions such as (but not limited to)
- creating and using a barrier where N is 1;
- N-1 tasks waiting when barrierwait() is called;
- Have M tasks calling a barrier of size N, where M > N;
- multiple barriers open at one time, and
- processes waiting when barrierdestroy() is called.

COMS W4118 Spring 2008: Homework Assignment 5

Last Updated Apr 19th 2008

Due Friday, April 25th at 11:59 PM

Written Assignment

Programming Assignment: Synchronization

Description

Purpose

Before you start

Programming Assignment

Special Notes - Clarifications

What and How to Submit

References