Homework 2
W4118 Fall 2008
UPDATED: 9/30/2008 at 4:45pm EST
DUE: Friday 10/3/2008 at 11:59pm EST

Individual written problems are to be done individually. Group programming problems are to be done in your assigned programming groups. All homework submissions are to be made via Courseworks. Refer to the homework policy section on the class web site for further details.

Individual Written Problems:

Exercise numbers refer to the course textbook, Modern Operating Systems. Each problem is worth 5 points.
  1. Chapter 1, Exercise 25

  2. Chapter 1, Exercise 26

  3. Chapter 2, Exercise 3

  4. Chapter 2, Exercise 4

  5. Read the Linux source code and construct a state diagram to represent the relevant states that a process may be in at any given time. For each state, give the exact name used for the state in the source code.

  6. Describe what happens in the Linux kernel when a timer interrupt occurs. Be sure to list the procedures that are called and explain the function of each.

  7. Write the C code for a new Linux 2.6.11 system call getstate() that takes no arguments and returns the process running state of the current process. The system call should be numbered 223. You will need to modify the code in  entry.S  and unistd.h. Indicate which lines in those files need to be changed by referring to the line numbers given by the Linux source code navigator for the 2.6.11 kernel. You should create a new kernel function  sys_getstate() that performs the actual work for returning the running state.

  8. Describe what happens in the Linux kernel when a fork system call occurs. Be sure to list the procedures that are called and explain the function of each.

  9. In problem 3 of the programming assignment below, there is assembly code to manipulate each register sycall-logger pushes/pops EAX, ECX, EDX, EBX, ESP, EBP, ESI and EDI. What does each register hold and why do we need to do this?

Please complete and submit a private group programming assignment evaluation. The evaluation should be in a separate file called evaluation.txt that is submitted with your individual written assignment submission.

Group Programming Problems:

Programming problems are to be done in your assigned groups using one of the VMs that has been assigned to your group members. For all programming problems you will be required to submit source code, a README file documenting your files and code, and a test run of your programs. In addition, you should submit a cover sheet using either homework_work.txt or homework_nonwork.txt, depending on whether or not the programming assignment is completely working or not. For source code submissions, you only need to submit new source code files that you created and kernel source code files that you changed. You should clearly indicate your names, email addresses, and assigned group number on your submission. Each group is required to submit one writeup for the programming assignment.

You will build your own Linux kernel for this class. For each homework assignment, you will build a new Linux kernel. The image of the kernel you build for this assignment should be vmlinuz.hmwk2. Grading for this assignment will be done based on vmlinuz.hmwk2.

Note that for this and subsequent kernel programming assignments, you will be using a VMware VM to test and debug your kernels. To use VMware, you need an MRL account . If you setup this account for the first time, or reset your password, you must then change your password in the MRL lab at least once. Each student has already been allocated a VM, which can be accessed at http://vm.cs.columbia.edu. Use your MRL username and password to login to the VMware infrastructure. Once you have powered on your VM, use the username student and password given to you in class to login to the guest OS in the VM.

  1. (5 pts.) Compile and install your own Linux kernel on your VM. This can be a daunting task at first, and if you aren't careful, you can run into a lot of problems whose causes are not easily traced. There are many web-based resources with instructions on compiling your own kernel, such as The Linux Kernel HOWTO, which describes how to do this in some detail. Note that there are several issues that we must address:
  2. (5 pts) Patch a kernel to install SGI's the KDB kernel debugger. You will be installing KDB version 4.4. The files kdb-v4.4-2.6.11-common-3.bz2 and kdb-v4.4-2.6.11-i386-2.bz2 file have already been downloaded for you in /usr/src. The two files are both patches. To apply patches to a kernel tree, first see the patching instructions at kernelnewbies.org . You may also find the man page for patch(1) helpful. Move each patch into your linux kernel source directory, extract the files from the archive, and apply it with the patch command by running patch -p1 < patchfile from the toplevel source tree directory. Then recompile the kernel with the patch. After you have patched the kernel, the Documentation/ subdirectory will contain man pages on kdb. Read them to familiarize yourself with its operation. You can also read a summary of kernel debugging techniques (which include kdb) at: http://www.xml.com/ldd/chapter/book/ch04.html. At any time you may press the pause button to enter the debugger and, once in the the debugger, type go to exit to the ordinary command prompt. Though we won't require you to use the debugger for this assignment, we will be testing to make sure it is installed! Save a snapshot of your VM with KDB installed so that you can show it to the TAs as part of a demo.
  3. (40 pts.) Write two new system calls in Linux: In the previous homework you learned that the GNU Standard C library implements wrappers to kernel-level system calls that abstract how users interact with the Operating System. As we learned in Chapter 1.6, system calls are the means through which user-space program can access kernel services. In this assignment you will learn the intracacies of how the underlying architecture and Operating System interact and provide a clean interface to user-level applications. You will develop a system call that will allow us to tabulate and record the usage of all system calls by a specific process.

    Note that you will be adding this system call to the kernel based on the 2.6.11.12 kernel you previously built with kernel debugging installed. For the following discussion all relative paths refer to the top of your kernel source directory. For example, if your kernel is located in /usr/src/linux-2.6.11.12.hmwk2 then the relative path include/linux/types.h refers to the file /usr/src/linux-2.6.11.12.hmwk2/include/linux/types.h.

    The system call you write should take two arguments: int enable  to enable and disable logging for the current process, int max_entries  to define the number of system calls to record, and return an integer describing whether the operation was successful or not. As a side-effect, it will trigger a syscall-logger (described below) for the current process. The system call number must be 289. The prototype for this system call is:
    	int setrecording(int enable, int max_entries);

    Be aware that kernel resources are limited and memory should not be plainly allocated as in user-mode. So, we will pre-define a global maximum of 4000 entries in the system you should verify this global maximum against int max_entries  in the setrecording  system call. You will only record up to the first 'max_entries'.

    Your second system call will be used to retrieve the information stored in the kernel, it should take two arguments: an array of log structures allocated by the user-application and the number of log structures you have allocated in user-land. The return value should be similar to the return value of read(): on success, the number of struct reclog copied is returned. It is not an error if this number is smaller than the number of struct reclog requested; this may happen because there are fewer  struct reclog than requested such as when there is nothing to retrieve return 0. The system call number must be 290. The prototype for this system call is:
    	int getrecording(struct reclog *log, int nr_logs);
    Once this function finishes,  struct reclog *log will have an array of ordered system calls. NOTE that you are still recording until you call  setrecording to disabled the syscall-logger. For example, if max_entries is 5, 5 system calls were recorded, and nr_logs was 3, you should now have room to record 3 more system calls.

    Define struct reclog as
    struct reclog {
    long syscall_nr; /* system call number */
    long arg1; /* arguments of system call */
    long arg2;
    long arg3;
    long arg4;
    long arg5;
    long arg6;
    };

    in include/linux/recinfo.h as part of your solution. You should also be able to #include it as part of a user-land application (see prob. 4).

    Your code should handle all errors that can occur. At a minimum, your system call should detect and report the following error conditions:

    The referenced error codes are defined in include/asm-generic/errno.h.


    Syscall-logger: System calls are implemented as traps or software-interrupts that transfer control to a specific kernel code, Linux has registered software interrupt 0x80 in the interrupt descriptor table for invoking system calls. Utilizing the %eax register as the index into the sys_call_table in arch/i386/kernel/entry.S the system call handler jumps to the specific code in the kernel for that system function.



    You will add to arch/i386/kernel/entry.S the following assembly code provided to create the system call logger. Line numbers are clearly indicated where they should be added (251-267 or with KDB patch 263-279):

    250 syscall_call:
    251         pushl %eax                     # pushad is the ISA standard
    252         pushl %esp
    253         pushl %ebp
    254         pushl %edi
    255         pushl %esi
    256         pushl %edx
    257         pushl %ecx
    258         pushl %ebx
    259         call record_syscall            # HW2: SYSCALL LOGGER FUNCTION
    260         popl %ebx
    261         popl %ecx
    262         popl %edx
    263         popl %esi
    264         popl %edi
    265         popl %ebp
    266         popl %esp
    267         popl %eax
    268         call *sys_call_table(,%eax,4)
    269         movl %eax,EAX(%esp)             # store the return value
    270 syscall_exit:

    Create the function  record_syscall() store it in arch/i386/kernel/sysrecord.c you must add sysrecord.c to the correct compilation directive in the Makefile.

    	asmlinkage void record_syscall(long arg1, long arg2, long arg3, long arg4, long arg5, long arg6);
    The above call will record information on the arguments given to the original system call, but be aware that the system call number is not in the parameters it must be fetched from the %eax register with the following code (must be the first instruction executed in function):
    	long index;
    __asm__("movl %%eax, %0;" : "=r"(index) ); /* Save EAX into 'index' */

    Additionally record_syscall() should maintain struct kern_reclog in a linked list constructed using the kernel list structures in include/linux/list.h:

    struct kern_reclog {
    long syscall_nr; /* system call number */
    long arg1; /* arguments of system call */
    long arg2;
    long arg3;
    long arg4;
    long arg5;
    long arg6;
    ... /* free to use structures to maintain this information */
    };

    This structure will be dynamically allocated when record_syscall() is called and will be free'd in getrecording() once they've been copied over to the user buffer.


    Your submission should be a kernel patch off of the Linux kernel with KDB that you created in the previous problem. A patch is used to store changes you've made to the kernel and to submit your changes for this problem. For example, if you've changed 50 lines of code, a patch will be about 50 lines long. This is much easier to store, email, or move around than the full 283,000-line source tree. To create a patch, first back up your .config file, then do a ``make distclean'' in your changed source tree (you don't want object files, config files, etc. in your patch). Then, from the directory above your source tree (e.g. /vmware1/src), run

    diff -ruN linux-2.6.11.12 linux-2.6.11.12-changed > changed.patch

    Notice that the original source tree is first, and the one containing your modifications is second. There is a particular ``algebra'' to patches. You may find it helpful to think of diff as subtracting two source trees. The result of this subtraction is a patch. To apply a patch, then, is to add this difference back. If you supply the -R option to the patch program, you subtract the difference, instead of adding. If you thus ``subtract the difference'' from your changed tree, you end up with the original tree. So, to contain all the information of n kernel trees, all you need is the n - 1 patches between them, and only one full kernel tree (and it can be any one of the n).

    Big Hint:
    In order to learn about system calls, you may also find it helpful to search the Linux Kernel for other system calls and see how they are defined. The file kernel/timer.c might give some useful examples of this. The getpid and getuid system calls might be useful starting points. The system call sys_getpid defined in kernel/timer.c uses current macro and provides a good reference point to how you should define your system call.

  4. (5 pts.) To test your system call, write a simple program called test-record that calls setrecording & getrecording. You should have modified the appropriate fields in linux/include/asm-386/unistd.h so as to be able to utilize the _syscallN() macros in your user-application. Your program should
    1. Enable the systemcall-logger
    2. Call some system calls: any number of system calls you want. Verify this with the linux command strace.
    3. Disable the systemcall-logger
    4. Retrieve the logged entries
      e.g:
    user> test-record
    Calling setrecording(1)...
    Calling some system calls to log:
    Testing printf()
    Testing open()
    Testing close()
    Calling setrecording(0)...
    Calling getrecording()... SUCCESS

    -------------------------------
    Here is the list of calls made:
    ...
    N. __NR_[289] (0, 5, 123, -1207956256, 0, -831438848)
    ...
    ...
    ...
    ...
    ...
NOTE: Although system calls are generally accessed through a library (libc) as in the figure in problem 3, your test program should access your system call directly. This is accomplished by utilizing the generall purpose syscall(2) system call (consult its man page for more details).

The output of the program should be easy to read, and the program should check for errors such as invalid input or too many/few arguments. 

Your program should be compiled using the -static option to gcc to ensure that it captures all system calls. This is due to some optimized methods for executing system calls that are used by Linux for more modern CPUs.
  1. Additional Information:

Image reference: http://www.linux.it/~rubini/docs/ksys/