Spring 2009 -- Junfeng Yang

Homework 3 is due 3/5 at 3:09pm EST

Individual written problems are to be done individually. Group programming problems are to be done in your assigned programming groups. All homework submissions and discussions are to be made via Courseworks.

Individual Written Problems (40 pts)

Exercise numbers refer to the course textbook, Operating System Concepts (8th Edition). Each problem is worth 5 points.

  1. Ch 4 Exercise 7
  2. Ch 4 Exercise 9
  3. Ch 4 Exercise 12
  4. Ch 6 Exercise 9
  5. Ch 6 Exercise 12
  6. Ch 6 Exercise 13
  7. Show how counting semaphore can be implemented using only binary semaphores and ordinary machine instructions.

  8. The solution to the readers and writers problem we learned in class favors readers and may starve writers. Provide another solution to the readers and writers problem that favors writers, i.e., once a writer is waiting to write to shared data, no new readers may start reading. Write your solution using pthread synchronization primitives. You are free to use mutex, semaphore, or condition variable.

Group Programming Problems (70 pts)

Programming problems are to be done in your assigned groups, using the VM that has been assigned to your group for any kernel programming that is needed. For all programming problems you will be required to submit source code, a kernel patch, a README file documenting your files and code, and a test run of your programs. In addition, you should submit a cover sheet using either homework_work.txt or homework_nonwork.txt, depending on whether or not the programming assignment is completely working or not. For source code submissions, you only need to submit new source code files that you created and kernel source code files that you changed. Please do not submit binaries or kernel images. You should clearly indicate your names, email addresses, and assigned group number on your submission. Each group is required to submit one writeup for the programming assignment.

For the user threads package part of this assignment, be sure to Submit a Makefile and a README file with your code. The Makefile should have at least a default target that builds all assigned programs. The README should explain the details of your solution, describe any way in which your solution differs from what was assigned, and state any assumptions you made (5 pts). Also, use the -Wall option to gcc when compiling. You will lose points if your code produces warnings when compiled. Note that the user threads package should be tested and run on any host 32-bit CLIC machine, not inside your guest VMs. The threads package is designed to only work on 32-bit x86 machines. To connect to those machines, you can ssh to clic32.cs.columbia.edu.

For the kernel programming part of this assignment, the image of the kernel you build for this assignment should be vmlinuz.hmwk3. Grading for this assignment will be done based on vmlinuz.hmwk3. You should use base your changes on the 2.6.11 kernel that you installed for Homework 2, but without the modifications you made for the new systems calls you wrote for Homework 2. In particular, submitted kernel patches should be patched against your modified KDB-enabled kernel from Homework 2, i.e. do not include KDB code in your patch.

  1. (25 pts.)Synchronization and threads in userspace

    In this problem you will add synchronization functionality to the implementation of SThreads, a simple threading library. Get the source files found in this directory. The files provide the source, header, and Makefile for the SThreads library. The publicly available functions and datatypes are in the file sthread.h. The library is used as follows:

    • Threads are manipulated using variables of type sthread_t. Consider sthread_t to be an opaque data type (i.e. only functions in sthread.c get to know what it really is).

    • sthread_init() must be called exactly once, as the first thing in main(). It returns 0 normally, and -1 on error.

    • A new thread is created using the function:

      int sthread_create(sthread_t *t, sthread_main_t main, void *arg).

      The first argument is where the sthread_t object is returned. The second is the function the new thread should run. The third argument is passed to this function. sthread_create() returns 0 normally and -1 on error.

    • You are also provided with sthread_self(), which returns the sthread_t associated with the currently running thread, as well as sthread_suspend(), which puts the currently running thread to sleep, and sthread_wake(sthread_t t), which wakes up a thread, given the thread's sthread_t object. Note that for the SThreads library, first waking up a running thread and then suspending it leaves the thread in a running state.

    • Usage example:

      #define _REENTRANT
      #include <stdio.h>
      #include <string.h>
      #include <errno.h>
      #include <unistd.h>
      #include "sthread.h"
      threadmain(void *arg)
        int threadno = (int)arg;
        for (;;) {
          printf("thread %d: I'm going to sleep\n", threadno);
          printf("thread %d: I woke up!\n", threadno);
        return 0;
      main(int argc, char *argv[])
        sthread_t thr1, thr2;
        if (sthread_init() == -1)
          fprintf(stderr, "%s: sthread_init: %s\n", argv[0], strerror(errno));
        if (sthread_create(&thr1, threadmain, (void *)1) == -1)
          fprintf(stderr, "%s: sthread_create: %s\n", argv[0], strerror(errno));
        if (sthread_create(&thr2, threadmain, (void *)2) == -1)
          fprintf(stderr, "%s: sthread_create: %s\n", argv[0], strerror(errno));
        return 0;

    Note the #define _REENTRANT at the top of the file. This is necessary in any multithreaded program, as it tells subsequent include files to use reentrant versions of library functions and thread-safe variables. For example, the errno variable, since it is a global variable, is normally not thread-safe. A function in thread A can set it on error, and thread B can set it to something else before the caller in thread A ever gets to read the value. This is an example of a race condition. If you #define _REENTRANT, though, errno is redefined to (*__errno_location()), which references a different address for each thread. An implementation of this function is provided in sthread.c.

    You will also be provided with the function test_and_set(int *x), which atomically sets the integer that x points to to 1, and returns its previous value. Using sthread_suspend(), sthread_wake(), and test_and_set(), you are to implement the missing semaphore primitives in the SThreads library.

    You may use test_and_set() to implement spinlocks, in which you repeatedly call test_and_set() and a no-op in a tight loop, waiting for the test result to be 0. sched_yield() is a good no-op function to use, but you don't have to worry about precisely what it does (not until the next homework, anyway!). Note that you can use spinlocks to synchronize access on your semaphores' shared data structures, but not to implement the semaphores themselves. In other words, if I call sthread_sem_down() on an unavailable semaphore (the count of the semaphore is negative or 0), I should suspend rather than spinning until the semaphore is available (the count of the semaphore is positive).

    Now comes the fun part. For this assignment, implement counting semaphores in the SThreads library. Put your function implementations in a single file, called sync.c, and your structure definitions in sync.h. Skeleton files are provided, but you may have to add functions and datatypes to these files. You shouldn't have to change sthread.c or sthread.h. In fact, you shouldn't even have to look at the implementations of the functions in sthread.c (but feel free if you want to, of course). Unless otherwise noted, all functions should return 0 on success and -1 on error.

    The prototypes for the five functions you must implement are found in sync.h, and are all named sthread_sem_*. You must define struct sthread_sem_struct. sthread_sem_init(sthread_sem_t *sem, int count) and sthread_sem_destroy() should be used to initialize and free resources related to this struct. The argument int count in sthread_sem_init(sthread_sem_t *sem, int count) indicates the count of the semaphore. The value of int count could be any positive number.

    sthread_sem_down() will decrement the semaphore by 1 if the value of which is greater than 0 (these two steps must be atomic), or it will block until another thread releases the semaphore and wakes it up. sthread_sem_up() will increment the value of semaphore by 1 if nobody is being blocked on it; if there are threads waiting for the semaphore, it should wake up one of the waiting threads; these two steps must also be atomic. sthread_sem_try_down() should obtain the semaphore and return 0 if the semaphore is available, otherwise return non-zero immediately. This function does not cause the caller to block.

    Your implementation should not allow starvation of any thread waiting for the semaphore. That is, if two threads are repeatedly obtaining and releasing a semaphore, that should not prevent a third thread from eventually getting the semaphore. (Of course, if one thread never release the semaphore, the others will starve, but that's a programming error, i.e. not our problem!)

  2. Synchronization in the Linux kernel

    For this problem, you will implement a new kernel synchronization primitive. Read the "How Processes Are Organized" section in Chapter 3, and the "Synchronization Primitives", "Atomic Operations", "Spin Locks" and "Read/Write Spin Locks" sections in Chapter 5 in Understanding the Linux Kernel. In Chapter 3, pay particular attention to the material about wait queues. Note that unlike the userspace semaphores you implemented for question 1, the synchronization primitives in Linux generally do not have owners and are not recursive. No owners means that there's nothing stopping any piece of code from unlocking every lock in the kernel (and bringing the kernel to a fiery screeching halt, probably). Not recursive means that the owner of a lock (even if there was such a thing) can't obtain the lock twice.

    1. (35 pts.) Design and implement a new kernel synchronization primitive: a barrier. Barriers are used in parallel programming to ensure that all processes complete one stage of computation before moving on to another (for example, a parallelized matrix multiply). Given a barrier of size N, all processes calling the barrier primitive will block until all N processes have called it, after which each may continue. The barrier is then reset back to N. Note that no other ordering is expressed or implied; e.g., the last process to arrive at the barrier may be the first process to leave or vice versa. Implement the following new system calls in the Linux kernel, which should be numbered 289 to 291, respectively.

      • int barriercreate(int num); Creates a new barrier of size num, returning the barrier ID on success, -1 on failure. Each barriercreate() call creates a distinct barrier object with a unique barrier ID.

      • int barrierdestroy(int barrierID); Destroy the barrier with the given barrier ID and notifies any processes waiting on the barrier to leave it (i.e., wake up and resume execution). Return number of processes notified on success and -1 on failure. Make sure that all processes being blocked at this barrier are waken up before you destroy this barrier (this is why you may need a reference count, see the HINT below).

      • int barrierwait(int barrierID); Blocks an individual process until N (the size of the barrier) processes have arrived; must verify that the barrier ID is valid. Return 0 on success and -1 on failure. The barrier being destroyed while a process waits is also a failure condition that should return -1.

      You should begin by thinking carefully about the data structures that you will need to solve this problem. Your system will need to support having multiple barriers open at the same time, so you will probably need a set of barrier descriptor data structures, each of which identifies an barrier. Those data structures will need to be put in a list from which your code can find the appropriate barrier descriptor correponding to the barrier that you need. Space for the barrier descriptors should be dynamically allocated, most likely using the kernel functions kmalloc() and kfree().

      You should not make any assumptions about whether the system is a uniprocessor or multiprocessor system. Be sure to properly synchronize access to your data structures. Moreover, be sure to select the appropriate synchronization primitives such that they are both correct and efficient, in this order. For instance, you should prefer a spinlock over a semaphore if-and-only-if you don't plan to sleep while holding it.

      You can choose to work at the level of wait queues using either the associated low-level routines such as add_wait_queue(), remove_wait_queue(), or the higher-level routines such as prepare_to_wait(), finish_wait(). You can find code examples both in the book and in the kernel. If you prefer to use functions such as interruptible_sleep_on() and sleep_on(), then plan carefully because they can be racy in certain circumstances.

      HINT: You should not use kmalloc() or kfree() in a critical section protected by spinlocks because the two functions may sleep (not efficient). As a result, a useful method to guarantee the validity of a data structure in the presence of concurrent create, access and delete operations, is to maintain a reference count for the object. Then, the object should only be freed by the last user. The file system, for example, keeps a reference count for inodes: when a process opens a file the reference count of the inode is incremented. Thus if another process deletes the file, the inode remains valid in the system (albeit invisible) because its count is positive. The count is decremented when the process closes the corresponding file descriptor, and if it reaches zero then the inode is freed. A reference count must be protected against concurrent access, either using explicit synchronization or using atomic types (see atomic_t in Chapter 5 in Understanding the Linux Kernel). Note that this mechanism may not be the best, you can design you own one. Just pay attention to efficiency.

      You are designing a completely new facility to add to the kernel. You will need to create a new implementation file and change the Makefile so that it compiles your new file. You are also likely to need to change a few parts of the existing kernel. You should properly handle errors that may occur and report meaningful error codes, e.g. -ENOMEM in the event that a memory allocation fails. As always, you are encouraged to look for existing kernel code that performs similar tasks and follow the coventions and practices it provides.

    2. (10 pts.) Write an application program to test your new kernel functions. Your test program should show that the kernel functions work for the usual cases, for example, when all N processes wait on a barrier, and also for boundary conditions such as (but not limited to)

      • Using a barrier of size 1;
      • M tasks calling a barrier of size N, where M > N;
      • Multiple barriers open simultaneously;
      • barrierdestroy() called, when processes are waiting;
      • N processes calling a barrier of size N.