Spring 2009 -- Junfeng Yang

UPDATE: Homework 4 deadline is extended to 3:09pm EDT, 03/31

Homework 4 is due 03/26 at 3:09pm EDT

The written problems are to be done individually. The programming problems are to be done in your assigned programming groups. All homework submissions are to be made via Courseworks. Refer to the homework policy section on the class web site for further details.

Written Assignment (30 pts)

Exercise numbers refer to the course textbook, Operating System Concepts. Each problem is worth 5 points.

  1. Write a code example in less than 20 lines of C code that causes the lock order algorithm for deadlock detection (discussed in class) to report a false positive.

  2. Eraser uses per-thread locksets, i.e., the locks in a thread's lockset are only those that the thread acquired. Of course, the locks that are currently locked is a global property and do not depend on which thread is running. What would happen if Eraser used this global rather than per-thread lockset when checking for races? Give a concrete code example in less than 20 lines of C.

  3. Ch 5 Exercise 12
  4. Ch 5 Exercise 15
  5. Ch 5 Exercise 17
  6. Describe how the time slice for a SCHED_NORMAL process can be updated in Linux 2.6.11. Be sure to list the relevant procedures, data structures, and variables; explain the meaning of each.

Please complete and submit a private group programming assignment evaluation. The evaluation should be in a separate file called evaluation.txt that is submitted with your individual written assignment submission.

Programming Assignment: Adding Fair Scheduling to Linux (70 pts)

Programming problems are to be done in your assigned groups using copies of the virtual machine (VM) image already provided. For all programming problems you will be required to submit source code, a README file documenting your files and code, and a test run of your programs. In addition, you should submit a cover sheet using either homework_work.txt or homework_nonwork.txt, depending on whether or not the programming assignment is completely working or not. For source code submissions, you only need to submit new source code files that you created and kernel source code files that you changed. You should clearly indicate your names, email addresses, and assigned group number on your submission. Each group is required to submit one writeup for the programming assignment.

You will build your own Linux kernel for this class. The image of the kernel you build for this assignment should be vmlinuz.hmwk4. Grading for this assignment will be done based on vmlinuz.hmwk4. You will base your changes on the 2.6.11 kernel that you installed for Homework 2, but without the modifications you made for the new systems calls you wrote for Homework 2. In particular, submitted kernel patches should be patched against your modified KDB-enabled kernel from Homework 2, i.e. do not include KDB code in your patch.


The assignment is to add a fair share scheduler to the kernel.

The Linux scheduler repeatedly switches between all the running tasks on the system, attempting to give a fair amount of CPU time to each task. Fair-share scheduling is a strategy that shares the CPU among sets of processes according to the users that own those processes. For example, suppose that Alice, Bob, and Carol belong to different users and are logged in to a machine that uses fair-share scheduling. Suppose Alice has one runnable process, Bob has three, and Carol has six. Fair-share scheduling views the ten runnable processes as belonging to the three users, and each user receives one-third of the CPU cycles to allocate to the processes that it owns. So Alice's single process would get about 33 percent of the available CPU cycles, each of Bob's three processes would get roughly 11 percent of the available CPU cycles, and each of Carol's six processes would get about 5.5 percent of the CPU cycles.

Add a new scheduling policy to support fair-share scheduling. Call this policy UWRR, for user weighted round-robin scheduling. UWRR should use fair-share scheduling based on each process's UNIX user identification. At each invocation of the scheduler, we will use a hierachical scheme to choose which task to run: first, a user is chosen, then, a task within that user's set of tasks is chosen. If we allow each user's chosen task the same amount of time on the CPU, each user should be represented equally.

Since we are making two choices in each scheduling decision, we need two algorithms: one to choose the user, and one to choose one of that user's tasks. Let's start by assuming we will use a Round-Robin scheme to decide which user to choose. We keep a queue of users, and whichever user was chosen last time we scheduled, we choose the next user in the queue for this schedule. We then choose a task within this user's set of tasks (more on this later), and let it run for a fixed amount of time (the time quantum). Now every user gets an equal amount of CPU time.

Now let's say that some users are more equal than others. Imagine that we associate an integer, which we'll call a weight, with each user, and when a user is chosen, we let the user's chosen tasks run for W time quanta (instead of a single time quantum), where W is the user's weight. In this way we can specify that some users get more CPU time than others, and also how much more. A user with a weight of 3 will get 50% more CPU time than a user with weight 2. This is called proportional sharing. More specifically, this implementation of proportional sharing is called Weighted Round-Robin.

We still haven't specified how a task is to be chosen once we choose a user. For simplicity, use a simple round-robin (RR) scheduling algorithm to do that. The intra-user RR scheduler should use the same default timeslice as the Linux scheduler uses for a task with the default nice value. Otherwise, the RR scheduler should work the same as UWRR except that it schedules tasks, not users, and there are no different task weights.

You are strongly advised to read Understanding Linux Kernel Chapter 7 Process Scheduling, before starting the programming assignment. Most of the code of interest for this assignment is in kernel/sched.c and include/linux/sched.h. These are probably not the only files you will need to look at, but they're a good start. While there is a fair amount of code in these files, a key goal of this assignment is for you to understand how to abstract the scheduler code so that you learn in detail the parts of the scheduler that are crucial for this assignment and ignore the parts that are not.

  1. (60 pts.) Implement a new task scheduler - User-based Weighted Round-Robin (UWRR)

    • Implement user weights. Add two system calls, getuserweight() and setuserweight(), with the following prototypes:

      int getuserweight(int uid);

      int setuserweight(int uid, int weight);

      Give getuserweight() syscall number 318, and setuserweight() number 319. getuserweight() should return the weight of the specified user, or -1 on error. The default user weight should be 10. setuserweight() should give the specified user the specified weight, and return 0, or -1 on error.

      For either call, a uid value of -1 indicates the current user (that is, the owner of the task that is making the system call). Otherwise, the range of valid uids is 0-65535. A bad uid should result in a return value of -1 and errno set to EINVAL. Only root is allowed to call setuserweight(); a call by any other user should return -1 and set errno to EACCES. Also, a user's weight must be greater than zero. Any attempt to set a bad weight is an error, and errno should be set to EINVAL.

      Put your code for these two system calls at the bottom of kernel/sched.c.

    • Your scheduler should operate alongside the existing Linux scheduler. Therefore, you should add a new scheduling policy, SCHED_UWRR. The value of SCHED_UWRR should be 4. Only tasks whose policy is set to SCHED_UWRR (normally done via the system call sched_setscheduler()) should be considered for selection by your new scheduler.

      Your scheduler and the existing scheduler should interoperate as follows:

      • If there are any running tasks whose policy is SCHED_RR or SCHED_FIFO (the real-time tasks), one of them should be chosen before any other, according to the existing Linux scheduler rules.

      • If there are any running tasks whose policy is SCHED_UWRR (your scheduler), one of them should be chosen before any SCHED_NORMAL (the default scheduler) tasks, according to the rules outlined.

      • If all running tasks have a policy of SCHED_NORMAL (or SCHED_BATCH), the default scheduler should be used to choose a task, according to the default scheduler's rules.

      In other words, there should be a strict priority relationship where SCHED_RR and SCHED_FIFO are first, SCHED_UWRR is second, and SCHED_NORMAL and SCHED_BATCH are last.

      Making your scheduler operate alongside the existing may seem like an unnecessary difficulty, but it spares you the (significant) difficulty of dealing with scheduling during initialization, so you can get a system up and running before any of your own code goes into action.

    • If a user who has no runnable SCHED_UWRR tasks has a SCHED_UWRR task become runnable, that user should go to the end of the round-robin queue (should be the last user to get a turn).

    • The base time quantum should be 100 ms. This means that if a user has a weight of 5, she gets 500 ms at a time with which to run her SCHED_UWRR tasks. Note that this does not mean that every one of her SCHED_UWRR tasks runs for 500 ms. How long a task is allowed to run depends on the intra-user RR scheduler.

      Note that you will need to change the timer interrupt handler so that schedule gets called at the right time. See scheduler_tick() in kernel/sched.c.

    • Linux uses an array data structure for keeping track of tasks for scheduling. Tasks in this array are indexed by priority. You may find this data structure useful for implementing UWRR. However, your scheduler should be designed to work with any number of users and should not assume only a fixed number of users.

    • Hint: when you see the need_resched field of the currently running task get set to 1, this is a signal to some assembly code on the return path of the interrupt handler. It causes schedule() to get called just before returning from the current interrupt handler, and is the correct way to make a schedule happen from within an interrupt handler, where you can't call schedule() directly.

    • Hint: the user_struct structure in include/linux/sched.h is used for tracking users. This structure may be helpful in creating a useful way to keep track of users.

  2. (10 pts.) Create a simple test program to show that your scheduler works

    Write a program that goes into an infinite while(1) loop. This program would normally take up 100% of the cpu.  Now create three different users Alice, Bob, and Carol and have them each run one or more copies of the loop program. Use the program top to monitor the CPU usage, both before and after you make your scheduler modifications.  Show that your scheduler is both fair (i.e., if the weights are set equally Alice, Bob, and Carol each get 1/3 of the CPU) and can be weighted (i.e., set weights 3, 2, and 1 and show that Alice gets three time the CPU of Carol and 50% more than Bob).  Users can be added to the system using the useradd program (e.g., useradd alice).

  3. Create the patch file for submission

    Your submission should be a kernel patch of the Linux kernel with KDB that you created in the previous problem. A patch is used to store changes you've made to the kernel and to submit your changes for this problem. For example, if you've changed 50 lines of code, a patch will be about 50 lines long. This is much easier to store, email, or move around than the full 283,000-line source tree. To create a patch, first back up your .config file, then do a "make distclean" in your changed source tree (you don't want object files, config files, etc. in your patch). Then, from the directory above your source tree (e.g. /usr/src), run

    $ diff -ruN linux- linux- > homework4.patch

    Notice that the original source tree is first, and the one containing your modifications is second. There is a particular "algebra" to patches. You may find it helpful to think of diff as subtracting two source trees. The result of this subtraction is a patch.

Additional Information

  • Backup your kernel changes outside your VM often. Sometimes your virtual disk can get corrupted, and you may lose all changes you've made.
  • Remember, as a safety measure, you are strongly encouraged to backup your VMware image from time to time.
  • printk statements in system calls will print their output to the console whenever they are invoked. To request printk messages be sent to a log file, insert the following line into the /etc/syslog.conf file:
    kern.* /var/kern.log
    This will cause printk messages to be written to /var/kern.log. You can send a signal to request the syslog daemon re-read the updated /etc/syslog.conf file, with the command "kill -1 pid" where pid is the pid of the syslogd process.
  • A lot of your problems will come from system administration issues. If nobody in your group in familiar with unix, you might want to pick up a book on Unix/Linux system administration.
  • You can find some Linux programming tips at the course resources page.