W4118 OPERATING SYSTEMS I

Spring 2010 -- Junfeng Yang

Homework 2 is due Tuesday 2/16 at 12:01 AM ET.

The written problems are to be done individually. The programming problems are to be done in your assigned programming groups. All homework submissions are to be made via Courseworks. Refer to the homework policy section on the class web site for further details.

Written Assignment (40 pts)

Exercise numbers refer to the course textbook, Modern Operating Systems. Each problem is worth 5 points. Make your answers concise. You'll lose points for verbosity.
  1. Assume x86 hardware. What's the current privilege level (CPL) when a regular user program is executing on the CPU? What's the CPL when this program is started by the superuser? Explain your answer.
  2. MOS 2.4
  3. MOS 2.7
  4. MOS 2.13
  5. Run "strace ls" and report what functions are used to (1) open the current directory, (2) get the list of directory entries, and (3) print the output to your screen. Do the same for "ltrace ls."
  6. Describe what happens when a user processes passes a pointer as a system call argument. Why?
  7. Read the Linux source code and describe what happens when a user process calls libc fork(). Be sure to list the procedures that are called and explain the function of each.
  8. Read the Linux source code and construct a state diagram to represent the relevant states that a process may be in at any given time. For each state, give the exact name used for the state in the source code.

Please complete and submit a private group programming assignment evaluation. The evaluation should be in a separate file called evaluation.txt that is submitted with your individual written assignment submission.

Programming Assignment: System Call Fault Injector (60 pts)

Programming problems are to be done in your assigned groups using copies of the virtual machine (VM) image we provide (more on how to acquire a copy of our VM image later). For all programming problems you will be required to submit source code, a README file documenting your files and code, and a test run of your programs. In addition, you should submit a cover sheet using either homework_work.txt or homework_nonwork.txt, depending on whether or not the programming assignment is completely working or not. For source code submissions, you only need to submit new source code files that you created and kernel source code files that you changed. You should clearly indicate your names, email addresses, and assigned group number on your submission. Each group is required to submit one writeup for the programming assignment.

You will build your own Linux kernel for this class. For each homework assignment, you will build a new Linux kernel. The image of the kernel you build for this assignment should be vmlinuz.hmwk2. Grading for this assignment will be done based on vmlinuz.hmwk2.

Note that for this and subsequent kernel programming assignments, you will be using a VMware VM to test and debug your kernels.

  1. (5 pts.) Install VMWare and acquire a copy of the class VM. Follow the instructions at the main homework page to acquire a copy of the class VM.
  2. (10 pts.) Compile and install your own Linux kernel on your VM.

    This can be a daunting task at first, and if you aren't careful, you can run into a lot of problems whose causes are not easily traced. There are many web-based resources with instructions on compiling your own kernel, such as The Linux Kernel HOWTO, which describes how to do this in some detail. Note that there are several issues that we must address.

    NOTE: All the commands to be used for building and deploying the require root privileges. You're advised to prepend "sudo" to these commands to get the required privilege level

    2.1. Extract the sources
    Kernel sources have already been installed in your VM in /home/student/linux-2.6.11.12.tar.bz2 . You will use these sources for building the kernel. As far as building the kernel is concerned, it doesn't actually matter where you put the sources, so we'll use /home/student:

    $ cd /home/student
    $ tar -xjvf linux-2.6.11.12.tar.bz2


    We recommend that you keep a backup of your VMware image from time to time, as the data in the VMs is not automatically backed up. Thus, should your VM image crash, you can restart with a backed up image.

    2.2. Apply an assembler patch
    The Linux version you are using utilizes a newer assembler than the one originally available to compile the Linux Kernel 2.6.11. Because of this you must apply a patch to the linux source code before compiling. Please be aware that this is not typically the case!!

    $ cd /home/student/linux-2.6.11.12/
    $ patch -p1 < ../linux-2.6-seg-5.patch

    2.3. Patch a kernel to install SGI's KDB kernel debugger

    You will be installing KDB version 4.4. The files kdb-v4.4-2.6.11-common-3.bz2 and kdb-v4.4-2.6.11-i386-2.bz2 file have already been downloaded and extracted for you in /home/student. The two files are both patches. To apply patches to a kernel tree, first see the patching instructions at kernelnewbies.org. You may also find the man page for patch(1) helpful. Move each patch into your linux kernel source directory, and apply it with the patch command by running $ patch -p1 < patchfile from the top-level source tree directory. Then recompile the kernel as specified in step 2 earlier. After you have patched the kernel, the Documentation/ subdirectory will contain man pages on kdb. Read them to familiarize yourself with its operation. You can also read a summary of kernel debugging techniques (which include kdb) here. At any time you may press the pause button to enter the debugger and, once in the debugger, type go to exit to the ordinary command prompt. Though we won't require you to use the debugger for this assignment, we will be testing to make sure it is installed!

    2.4. Taking new patches (this is only FYI; will be useful later)
    The source will extract into a directory called linux-2.6.11.12 (for a version 2.6.11.12 kernel). Don't touch this directory! As we'll see, it's useful to have a copy of the pristine sources lying around. Instead, make a copy with the command ``cp -al linux-2.6.11.12 linux-2.6.11.12-pudding'', replacing pudding with the desired suffix that will uniquely identifies your kernel, such as "-hmwk2". The -al flags to cp make the copied tree use all hard links instead of full copies of files. What does this mean? Run ``du linux-2.6.11.12'', then ``du -c linux-2.6.11.12 linux-2.6.11.12-pudding''. One tree takes up 268 MB, but two trees together take up 273 MB! The extra 5 MB go to the directories themselves, which can't be hardlinked (think about why this is). You can now make all your changes to the copy. Since you will generally only change a few files in the kernel, this is a big win in terms of disk space when keeping around multiple slightly different source trees.

    A little care is needed to make sure the linked files are copied when written. If you use emacs to edit, you should have no problem: when you first make a change to a file in emacs, it moves the original file link to a backup (the ~ file), and makes a copy to work on. Likewise, the patch program moves the original link out of the way and makes a copy before patching a file. However, IF YOU USE VI, YOU WILL HAVE TO RUN ``mv file file~; cp file~ file'' BEFORE YOU START EDITING. Yes, this may seem like a pain in the neck right now, but you'll be glad for the time saved. You may be able to avoid this pain by setting a vi option: put "set bkc=no" in your ~/.vimrc and then you can use vim on the hardlinks tree without having to do that annoying mv/cp dance (from Adam Joseph Weiss).

    $ time cp -a linux-2.6.11.12 linux-2.6.11.12-realcopy
    real 0m41.476s user 0m0.140s sys 0m3.340s
    $ time cp -al linux-2.6.11.12 linux-2.6.11.12-linkcopy
    real 0m0.428s user 0m0.030s sys 0m0.400s


    Then, make some changes to the copies, and make a diff with the original sources (more about this later):

    $ time diff -Naur linux-2.6.11.12 linux-2.6.11.12-realcopy > /tmp/patch1
    real 0m2.615s user 0m1.250s sys 0m1.360s
    $ time diff -Naur linux-2.6.11.12 linux-2.6.11.12-linkcopy > /tmp/patch2
    real 0m0.278s user 0m0.160s sys 0m0.120s

    2.5. Compiling the kernel
    To compile the kernel so that it recognizes your hardware, you need to set up a config file. You can find a sample config file in your VM at /home/student/w4118.config that you may use within the VMware environment. This config file is confirmed to work with your recently downloaded kernel version, as well as with the patched one you will be building for part 2 of this assignment. In order to use this config file, simply copy it to .config in the top-level of your source tree and type $ make oldconfig. The kernel should configure itself without any input necessary on your part. You can alter the configuration options of the kernel by running $ make menuconfig. In particular, one useful option is the "General setup ---> Local version" option: a string that is appended to kernel release, and can be used to identify your kernel. For instance, you may change the local version to ".hmwk2" for homework#2.

    Once the kernel is compiled, you must install the new boot image and tell the boot manager where to find it. To install a new kernel, do: $ make install

    On the first time that you install a new kernel, and every time you make changes that may affect modules (e.g. modifying an important include file) you also need to install the modules. Assuming that your kernel version suffix is ".hmwk2", you can do these by:

    $ make modules_install
    [edit /boot/grub/menu.lst, as described below]
    [reboot with new kernel]


    To let the bootloader grub, know where to find your kernel, be sure to enter the appropriate information into your /boot/grub/menu.lst file. You can create a new entry by copying an existing one and changing all instances of the kernel version to 2.6.11.12.hmwk2, for example:

    title Ubuntu, kernel 2.6.11.12.hmwk2
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.11.12.hmwk2 root=/dev/sda1 ro
    savedefault


    Verify that the kernel image is placed in /boot

  3. (40 pts) Add a new system call in Linux.

    4.1. Implement the system call
    In the previous homework you learned that the GNU Standard C library implements wrappers to kernel-level system calls that abstract how users interact with the Operating System. As we learned in class, system calls are the means through which user-space program can access kernel services. In this assignment you will learn the intracacies of how the underlying architecture and Operating System interact and provide a clean interface to user-level applications. Specifically, you will develop one system call that can inject an artificial failure to a future system call made by the current process.

    Many system call failures are not fatal, and must be properly handle by important programs such as the daemon programs in /etc/init.d. However, since normally system calls won't fail, it's difficult to comprehensively test a user program against these failures. As a result, many system call failures are mishandled, causing bigger problems when system calls do fail occasionally.

    NOTE: For the following discussion all relative paths refer to the top of your kernel source directory. For example, if your kernel is located in /home/student/linux-2.6.11.12.hmwk2 then the relative path include/linux/types.h refers to the file /home/student/linux-2.6.11.12.hmwk2/include/linux/types.h

    In this assignment, you will implement a system call failure injector. So how to inject an artificial failure to a system call? Without actually doing the real system call, you can simply return an error code. To specify what system calls we want to inject failures to, and what error code to return, you will use the following struct:

                struct syscall_failure {
                        long syscall_nr; /* what system call to fail */
                        long error; /* what error code to return */
                }; 

    To tell the kernel to fail a system call, you'll add a new system call fail() with the following signature:

                int fail(int ith, int ncall, struct syscall_failure* calls); 

    The arguments have the following meanings. The argument calls is an array of syscall_failure structures. The syscall_nr fields of this array specify the system calls we want to inject failures to, and the error codes we want to return. Its length is specified as the ncall argument. This array helps to make our failure injector more targeted. The ith argument tells the kernel to fail the ith subsequent system call that matches one of the system calls listed in calls. Note ith starts from 0, i.e., the immediate next system call to fail is the 0'th call. In addition, whenever a system call matches one of the system calls in calls, you should count it as a matching call toward ith. Another way to put it is: for each fail() invocation we only inject one failure; we don't need to inject a failure for each different syscall_nr listed in calls.

    An important thing to note is that our fail() should only count the number of times the selected system calls are called by the current process. This necessitates storing the state information in a process-specific way. Linux maintains a list of all processes in a doubly linked list. Each entry in this list is a task_struct structure, which is defined in include/linux/sched.h. Read Understanding the LINUX KERNEL chap.3 for more information on this.

    Note: you must make sure that the data structure you add to task_struct is properly initialized. In addition, when the current process exits, you should free the memory you use to avoid memory leaks. You don't need to copy your failure injection data structure in fork().

    Your code should handle all errors that can occur when users call fail(). At a minimum, your system call should detect and report the following error conditions:

    • -EINVAL: if ith is negative or ncall is non-positive.
    • -EFAULT: if the argument calls is outside the accessible address space.
    • -EINVAL: if a fail() has been issued, and the ith matching system call hasn't been reached, and the current process calls calls fail() again. That is, we only allow one failure injection attempt at a time.
    The referenced error codes are defined in include/asm-generic/errno.h.

    Hint: In order to learn about system calls, you may also find it helpful to search the Linux kernel for other system calls and see how they are defined. The file kernel/sched.c might give some useful examples of this. The getpid and getuid system calls might be useful starting points.

    4.2. Add the system call to the kernel
    You will be adding this system call to the kernel based on the 2.6.11.12 kernel you previously built with kernel debugging installed. Define struct syscall_failure in include/linux/syscall_fail.h as part of your solution. You should also be able to #include this header file as part of a user-space application (see section 5).

    Define the function sys_fail() and store it in arch/i386/kernel/syscall_fail.c. You must add syscall_fail.cto the correct compilation directive in the Makefile of that directory. You need to create a system call entry (in arch/i386/kernel/entry.S); use number 223, which is currently unused. Your kernel has partially reserved a slot in the syscall table for ni_syscall (when you see it, you will know what we are talking about). Since ni_syscall means "not implemented", do not be afraid to commandeer that slot.

    On some occasions the process can die suddenly due to some fault and if you do not clean the structure you create for failure injection you will be leaking memory. You must handle this by defining a function free_fail() and storing it in arch/i386/kernel/syscall_fail.c. This function should free memory you use for failure injection. You should call this function from free_task in kernel/fork.c

    4.3. Add a hook to the system call handler
    The correct functioning of fail() requires that we keep track of all the system calls invoked by our process. In other words, we've to create a hook into the system call handler, to be able to update our process-specific state information and return the specified error code, when required. System calls are implemented as traps or software-interrupts that transfer control to a specific kernel code. Linux has registered software interrupt 0x80 in the interrupt descriptor table for invoking system calls. Utilizing the %eax register as the index into the sys_call_table in arch/i386/kernel/entry.S the system call handler jumps to the specific code in the kernel for that system function. You will have to make changes to arch/i386/kernel/entry.S to create a hook into the system call. Line numbers are indicated for your reference (250-263 or with KDB patch 263-276):

    250 syscall_call:
                                            ##### begin of new code #####
    251     pushl %eax                      # save caller-saved registers,
    252     pushl %ecx                      # just in case
    253     pushl %edx
    254     pushl %eax                      # push syscall_fail() argument
    255     call syscall_fail               # call our internal function
    256	cmpl $0, %eax                   # compare return value with 0
    257   	jl syscall_skip                 # if less than 0, don't invoke the 
                                            # system call, instead return our error code
    258     addl $4,%esp                    # skip syscall_fail() argument
    259	popl %edx                       # restore caller-saved registers
    260	popl %ecx
    261	popl %eax
                                            ##### end of new code #####
    262 	call *sys_call_table(,%eax,4)
    263     movl %eax,EAX(%esp)	     # store the return value
    264   syscall_exit:
              

    After inserting the above pice of code, the following lines need to be added (line no. 268-275, or with KDB patch 275-282 ):

    265   restore_all:
    266        RESTORE_ALL
    267
    268    ALIGN                           ###### begin of new code #####
    269  syscall_skip:
    270    movl %eax,(16+EAX)(%esp)        # store injected error as syscall 
                                           # return in current stack.  
                                           # +16 because we just pushed 
                                           # 4 registers on stack.
    271    addl $4,%esp                    # skip syscall_fail() argument
    272    popl %edx                       # restore caller-saved registers
    273    popl %ecx
    274    popl %eax
    275    jmp syscall_exit                ###### end of new code #####
              

    Our hook code, besides performing register store and restore, invokes an internal function, syscall_fail, which takes a long argument, the system call number. If the return value from this function is negative (which means that we have an error value to return for this invocation of the specified system call), then we directly jump to the syscall_skip label (i.e. we do not perform the actual system call). If the return value from syscall_fail is non-negative, we let the actual system call to be invoked (line no. 256). We need to save registers to current kernel stack using pushl (line 250-252) before we call syscall_fail because a function call may modify registers. Once we return from syscall_fail, we must pop the registers from the stack using poplin reverse order (line 258-260, and also line 272-274). Line 253 sets up the argument to syscall_fail by pushing the system call number from %eax to current stack. You can make a similar modification around another invocation of call *sys_call_table() in entry.S, to handle the case that a system call uses the sysenter instruction to enter the kernel; this is not required.

    Create the function syscall_fail() store it in arch/i386/kernel/syscall_fail.c and you must add its name to the correct compilation directive in the Makefile. This internal function should update the process-specific information about the system call that is just now invoked, and increment the number of times that system call is called by the current process, if it matches the syscall number, and finally decide if an error value is to be returned or if it should let the actual system call be executed. As we don't want to expose this function to the user space, we don't have to make it a system call.

    	asmlinkage long syscall_fail(long syscall_nr) {
    
    		...
    		...
    		Logic for determining if error code needs to be returned
    		...
    		If an error code needs to be returned, 
                         free up memory used for failure injection
                         and return error code.
    	} 
            

    4.4. Create the patch file for submission
    Your submission should be a kernel patch of the Linux kernel with KDB that you created in the previous problem. A patch is used to store changes you've made to the kernel and to submit your changes for this problem. For example, if you've changed 50 lines of code, a patch will be about 50 lines long. This is much easier to store, email, or move around than the full 283,000-line source tree. To create a patch, first back up your .config file, then do a "make distclean" in your changed source tree (you don't want object files, config files, etc. in your patch). Then, from the directory above your source tree (e.g. /home/student), run

    $ diff -ruN linux-2.6.11.12 linux-2.6.11.12-modified > homework2.patch

    Notice that the original source tree is first, and the one containing your modifications is second. There is a particular ``algebra'' to patches. You may find it helpful to think of diff as subtracting two source trees. The result of this subtraction is a patch.

  4. (5 pts.) Write a simple program to test your system call.

    The test program should be called test-fail. You should have modified the appropriate fields in linux/include/asm-386/unistd.h so as to be able to utilize the _syscallN() macros in your user-application. You'll likely need the -I flag for gcc to set up an include path. An example compilation command is gcc -I/include/linux> -o test-syscall -Wall test-syscall.c. Your program should loop several times, and in each iteration, it should do the following things:

    1. Enable the fault injector for the i'th system call, where i is the loop index.
    2. Make N system calls: any number of system calls you want, where N is the total number of iterations. Verify this with the linux command strace.
    3. Check if the i'th system call fails.

    NOTE: Although system calls are generally accessed through a library (libc), your test program should access your system call directly. This is accomplished by utilizing the general purpose _syscallN macros. This is important, because if you call a library function directly, it will enter the kernel using the sysenter instruction, thus bypass our hook function syscall_fail.

    The output of the program should be easy to read, and the program should check for errors such as invalid input or too many/few arguments. Also, your program should be compiled using the -static option to gcc to ensure that it captures all system calls. Below shows an example output of the test program:

    ----------------------------------------------
    Part-1 Testing bad argument handling in fail()
    ----------------------------------------------
    [1a] Invalid ith => fail() returned -1
    ...
    ------------------------------------
    Part-2 Testing wrong usage of fail()
    ------------------------------------
    [2a] A valid call to fail() => returned 0
         Same call to fail() repeated immediately => returned -1
    [2b] Repetitive syscall_nr in *calls => fail() returned -1
    ...
    -------------------------------
    Part-3 Calling fail() in a loop
    -------------------------------
    Calling fail() with ith = 0
    Syscall returned -1 in iteration 0
    Syscall returned 1768 in iteration 1
    Calling fail() with ith = 1
    Syscall returned 1768 in iteration 0
    ...
    

Additional Information

  • Backup your kernel changes outside your VM often. Sometimes your virtual disk can get corrupted, and you may lose all changes you've made.
  • Remember, as a safety measure, you are strongly encouraged to backup your VMware image from time to time.
  • printk statements in system calls will print their output to the console whenever they are invoked. To request printk messages be sent to a log file, insert the following line into the /etc/syslog.conf file:
    kern.* /var/kern.log
    This will cause printk messages to be written to /var/kern.log. You can send a signal to request the syslog daemon re-read the updated /etc/syslog.conf file, with the command "kill -1 pid" where pid is the pid of the syslogd process.
  • A lot of your problems will come from system administration issues. If nobody in your group in familiar with unix, you might want to pick up a book on Unix/Linux system administration.
  • You can find some Linux programming tips at the course resources page.