OPERATING SYSTEMS I: Junfeng Yang

Homework 2 is due Wednesday 2/16 at 12:01 AM ET.

The written problems are to be done individually. The programming problems are to be done in your assigned programming groups. All homework submissions are to be made via Courseworks. Refer to the homework policy section on the class web site for further details.

Written Assignment (40 pts)

Exercise numbers refer to the course textbook, Modern Operating Systems. Each problem is worth 5 points. Make your answers concise. You'll lose points for verbosity.

Read swtch.S in xv6 and draw a figure to show the stacks when xv6 is about to switch from the scheduler context to a user process, right before the CPU executes the instruction movl %edx, %esp. Be sure to label everything you know on the stacks. For example, if you know that memory location %esp+4 stores the address of function foo(), clearly label so in your figure.
Assume x86 hardware. What's the current privilege level (CPL) when a regular user program is executing on the CPU? What's the CPL when this program is started by the superuser? Explain your answer.
strace is a useful tool for tracing the system calls a process issues. Run "strace ls" and report which functions are used to (1) open the current directory, (2) get the list of directory entries, and (3) print the output to your screen. Do the same for "ltrace ls"; ltrace traces the calls a process makes to functions in dynamically linked libraries.
MOS 2.4
MOS 2.7
MOS 2.22
MOS 3.7
MOS 3.10

Please complete and submit a private group programming assignment evaluation. The evaluation should be in a separate file called evaluation.txt that is submitted with your individual written assignment submission.

Programming Assignment (60 pts)

This programming assignment has five parts. The purpose of parts A-D is to introduce you to x86 assembly language and the PC bootstrap process, get you started with QEMU and QEMU/GDB debugging, and warm you up with xv6's system call dispatch mechanism. In the last part, part E, you'll write a system call recorder.

You will not have to write any code or turn in any answers for parts A-D of the lab, but you should go through them anyway for your own understanding and be prepared to answer the questions posed below.

Part A: Setup

The files needed for this assignment are distributed using the Git distributed version control software. To learn about Git, take a look at the Git user's manual, or, if you are already familiar with other version control systems, you may find this Git overview useful.

To get started, run the following commands

$ git clone http://repair.cs.columbia.edu/git/xv6 xv6 -b hw2

This command does not work with the version of Git installed on the CLIC machines; it only works with newer versions of Git. Please use the following commands instead:

$ git clone http://repair.cs.columbia.edu/git/xv6 xv6
$ cd xv6
$ git checkout -b hw2 origin/hw2

to clone the course Git repository onto your local machine (the "clone" command), and switch to the "hw2" branch (the "checkout" command). Then you can modify the source files, and Git automatically tracks your changes. If you add a new file, you should tell Git to track this file by running

$ git add <new file>

To generate an intermediate checkpoint of the source files, you can commit your changes to your local repository by running

$ git commit -am 'checkpoint 1: added new system call sys_startrecording()'
[hw2 9bad18e] checkpoint 1: added new system call sys_startrecording()
 1 files changed, 1 insertions(+), 1 deletions(-)

You can also run git diff HEAD to see what you have changed since your last commit. To see what you've changed since you cloned the repository, run git diff origin/hw2.

To better collaborate with your teammates, you and your teammates may want to set up a shared repository. Each team member can then clone this shared repository to get a local copy of the repository. Each member can then git push their changes to the shared repository or git pull changes made by others from it. If you choose to set up your repositories like this, make sure you know what you're doing.

Submission instructions

When you are ready to hand in your solution, put your UNI in conf.mk, commit all your changes, and run make handin in the xv6 directory. This will generate hw2-<your uni>.tar.gz for you, which you can then upload via Courseworks. This tarball includes a snapshot of all xv6 source code and a patch generated using git diff origin/hw2 hw2. Thus, be sure you commit all your changes (including the new files you add) before running make handin.

We will be grading your solutions with a grading program. You can run make grade to test your solutions with the grading program.

Part B: PC Bootstrap

Getting Started with x86 assembly

If you are not already familiar with x86 assembly language, you will quickly become familiar with it during this course! The PC Assembly Language Book is an excellent place to start. Hopefully, the book contains a mixture of new and old material for you.

Warning: Unfortunately the examples in the book are written for the NASM assembler, whereas we will be using the GNU assembler. NASM uses the so-called Intel syntax while GNU uses the AT&T syntax. While semantically equivalent, an assembly file will differ quite a lot, at least superficially, depending on which syntax is used. Luckily the conversion between the two is pretty simple, and is covered in Brennan's Guide to Inline Assembly.

Exercise 1. Familiarize yourself with the assembly language materials available on the course resources page. You don't have to read them now, but you'll almost certainly want to refer to some of this material when reading and writing x86 assembly. We do recommend reading the section "The Syntax" in Brennan's Guide to Inline Assembly. It gives a good (and quite brief) description of the AT&T assembly syntax we'll be using with the GNU assembler in xv6.

Certainly the definitive reference for x86 assembly language programming is Intel's instruction set architecture reference, which comes in two flavors: an HTML edition of the 80386 Programmer's Reference Manual from MIT, and the full, latest and greatest IA-32 Intel Architecture Software Developer's Manuals from Intel, which cover all the features of the most recent processors; we won't need those in class, but you may be interested in learning about them. An equivalent set of manuals is available from AMD.

You should skim the recommended chapters of the PC Assembly book, and "The Syntax" section in Brennan's Guide now. Save the Intel/AMD architecture manuals for later or use them for reference when you want to look up the definitive explanation of a particular processor feature or instruction.

Simulating the x86

Instead of developing the operating system on a real, physical personal computer (PC), we use the QEMU emulator, which faithfully emulates a complete PC: the code you write for QEMU will boot on a real PC too. Using an emulator instead of a real PC simplifies debugging; you can, for example, set breakpoints inside of the emulated x86, which is difficult to do with the silicon version of an x86.

While QEMU's built-in monitor provides only limited debugging support, QEMU can act as a remote debugging target for the GNU debugger (GDB), which we'll use in this assignment to step through the early boot process.

To get started, clone our xv6 repository as described above in "Setup", then type make in the xv6 directory to build the boot loader and kernel you will start with.

$ cd xv6
$ make
...
dd if=kernel of=xv6.img seek=1 conv=notrunc
281+1 records in
281+1 records out
144356 bytes transferred in 0.061452 secs (2349080 bytes/sec)
rm wc.o grep.o mkdir.o rm.o ln.o stressfs.o kill.o echo.o init.o usertests.o
zombie.o cat.o sh.o ls.o

Now you're ready to run QEMU, supplying the files xv6.img and fs.img created above. xv6.img serves as the emulated PC's "virtual hard disk", which contains both our boot loader (bootblock) and our kernel (kernel). The second virtual disk, fs.img, contains a file system you will later examine after the system boots up.

To run xv6 under QEMU, type make qemu:

$ make qemu
qemu -serial mon:stdio -hdb fs.img xv6.img -smp 2
xv6...

cpu0: starting xv6

lapicinit: 1 0xfee00000
cpu1: starting
init: starting sh
cpu0: starting
Spurious IDE interrupt.

A separate window should appear containing the display of the virtual machine. After a few seconds, QEMU's virtual BIOS will load xv6's boot loader from a virtual hard drive image contained in the file xv6.img, and the boot loader will in turn load and run the xv6 kernel.

After everything is loaded, you should get a '$' prompt in the xv6 display window and be able to enter commands into the rudimentary but functional xv6 shell. For example, try:

$ ls
.              1 1 512
..             1 1 512
README         2 2 1926
cat            2 3 12491
...
$ echo Hello
Hello
$ cat README
xv6 is a re-implementation of Dennis Ritchie's and Ken Thompson's Unix
Version 6 (v6).  xv6 loosely follows the structure and style of v6,
but is implemented for a modern x86-based multiprocessor using ANSI C.
...

Now close this QEMU session, destroying the state of the xv6 virtual machine. You can do so either by closing the QEMU window or by pressing CTRL-C in the terminal where you typed make qemu.

The PC's Physical Address Space

We will now dive into a bit more detail about how a PC starts up. A PC's physical address space is hard-wired to have the following general layout:

 
+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000

The first PCs, which were based on the 16-bit Intel 8088 processor, were only capable of addressing 1MB of physical memory. The physical address space of an early PC would therefore start at 0x00000000 but end at 0x000FFFFF instead of 0xFFFFFFFF. The 640KB area marked "Low Memory" was the only random-access memory (RAM) that an early PC could use; in fact the very earliest PCs could only be configured with 16KB, 32KB, or 64KB of RAM!

The 384KB area from 0x000A0000 through 0x000FFFFF was reserved by the hardware for special uses such as video display buffers and firmware held in non-volatile memory. The most important part of this reserved area is the Basic Input/Output System (BIOS), which occupies the 64KB region from 0x000F0000 through 0x000FFFFF. In early PCs the BIOS was held in true read-only memory (ROM), but current PCs store the BIOS in updateable flash memory. The BIOS is responsible for performing basic system initialization such as activating the video card and checking the amount of memory installed. After performing this initialization, the BIOS loads the operating system from some appropriate location such as floppy disk, hard disk, CD-ROM, or the network, and passes control of the machine to the operating system.
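The sizes quoted in this paragraph follow directly from the hex boundaries in the figure. As a quick illustration (the region table and region_kb are illustrative names, not part of xv6), the arithmetic can be written out in C:

```c
#include <assert.h>
#include <stdint.h>

/* Region boundaries taken from the physical address space figure.
 * Each region covers [start, end): end is the first byte past it. */
struct region {
    const char *name;
    uint32_t start;
    uint32_t end;
};

static const struct region low_1mb[] = {
    { "Low Memory",           0x00000000, 0x000A0000 },
    { "VGA Display",          0x000A0000, 0x000C0000 },
    { "16-bit devices, ROMs", 0x000C0000, 0x000F0000 },
    { "BIOS ROM",             0x000F0000, 0x00100000 },
};

/* The reserved area between low memory and the 1MB mark is 384KB. */
static_assert(0x00100000 - 0x000A0000 == 384 * 1024, "reserved area is 384KB");

/* Size of a region in KB (1KB = 1024 bytes). */
static uint32_t region_kb(const struct region *r)
{
    return (r->end - r->start) / 1024;
}
```

The four regions come out to 640KB, 128KB, 192KB, and 64KB; the last three add up to the 384KB reserved area.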

When Intel finally "broke the one megabyte barrier" with the 80286 and 80386 processors, which supported 16MB and 4GB physical address spaces respectively, the PC architects nevertheless preserved the original layout for the low 1MB of physical address space in order to ensure backward compatibility with existing software. Modern PCs therefore have a "hole" in physical memory from 0x000A0000 to 0x00100000, dividing RAM into "low" or "conventional memory" (the first 640KB) and "extended memory" (everything else). In addition, some space at the very top of the PC's 32-bit physical address space, above all physical RAM, is now commonly reserved by the BIOS for use by 32-bit PCI devices.

Recent x86 processors can support more than 4GB of physical RAM, so RAM can extend further above 0xFFFFFFFF. In this case the BIOS must arrange to leave a second hole in the system's RAM at the top of the 32-bit addressable region, to leave room for these 32-bit devices to be mapped. For now we will pretend that all PCs have "only" a 32-bit physical address space. But dealing with complicated physical address spaces and other aspects of hardware organization that evolved over many years is one of the important practical challenges of OS development.

The ROM BIOS

In this portion of the lab, you'll use QEMU's debugging facilities to investigate how an IA-32 compatible computer boots.

Open two terminal windows. In one, enter make qemu-gdb (or make qemu-nox-gdb). This starts up QEMU, but QEMU stops just before the processor executes the first instruction and waits for a debugging connection from GDB. In the second terminal, from the same directory you ran make, run gdb. You should see something like this:

$ gdb
...
+ target remote localhost:25501
The target architecture is assumed to be i8086
[f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file kernel
(gdb)

The following line:

 
[f000:fff0] 0xffff0:  ljmp   $0xf000,$0xe05b

is GDB's disassembly of the first instruction to be executed. From this output you can conclude a few things:

The IBM PC starts executing at physical address 0x000ffff0, which is at the very top of the 64KB area reserved for the ROM BIOS.
The PC starts executing with CS = 0xf000 and IP = 0xfff0.
The first instruction to be executed is a jmp instruction, which jumps to the segmented address CS = 0xf000 and IP = 0xe05b.

Why does QEMU start like this? This is how Intel designed the 8088 processor, which IBM used in their original PC. Because the BIOS in a PC is "hard-wired" to the physical address range 0x000f0000-0x000fffff, this design ensures that the BIOS always gets control of the machine first after power-up or any system restart - which is crucial because on power-up there is no other software anywhere in the machine's RAM that the processor could execute. The QEMU emulator comes with its own BIOS, which it places at this location in the processor's simulated physical address space. On processor reset, the (simulated) processor enters real mode and sets CS to 0xf000 and the IP to 0xfff0, so that execution begins at that (CS:IP) segment address. How does the segmented address 0xf000:fff0 turn into a physical address?

To answer that we need to know a bit about real mode addressing. In real mode (the mode in which the PC starts), address translation works according to the formula: physical address = 16 * segment + offset. So, when the PC sets CS to 0xf000 and IP to 0xfff0, the physical address referenced is:

 
   16 * 0xf000 + 0xfff0   # in hex multiplication by 16 is
   = 0xf0000 + 0xfff0     # easy--just append a 0.
   = 0xffff0

0xffff0 is 16 bytes before the end of the BIOS (0x100000). Therefore we shouldn't be surprised that the first thing that the BIOS does is jmp backwards to an earlier location in the BIOS; after all, how much could it accomplish in just 16 bytes?
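The same translation can be written as a small C function for experimenting with segment:offset pairs. This is a sketch, not code from xv6; real_mode_phys is a hypothetical name, and the 20-bit wraparound it models (the 8088 had only 20 address lines) goes slightly beyond the formula given above.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = 16 * segment + offset.
 * The 8088 had only 20 address lines, so a sum above 0xFFFFF wraps
 * around to low memory; the final mask models that. */
static uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;   /* << 4 is * 16 */
    return linear & 0xFFFFF;                        /* keep 20 bits */
}

/* The worked example above, checked at compile time. */
static_assert(((0xf000u << 4) + 0xfff0u) == 0xffff0u, "f000:fff0 -> 0xffff0");
```

Different segment:offset pairs can name the same byte: real_mode_phys(0x07c0, 0x0000) and real_mode_phys(0x0000, 0x7c00) both give 0x7c00. The BIOS's first jump target, f000:e05b, is physical address 0xfe05b.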

When the BIOS runs, it sets up an interrupt descriptor table and initializes various devices such as the VGA display. This is where the "Starting SeaBIOS" message you see in the QEMU window comes from.

Exercise 2. Use GDB's si (Step Instruction) command to trace into the ROM BIOS for a few more instructions, and try to guess what it might be doing. You might want to look at Phil Storrs I/O Ports Description. There is no need to figure out all the details, just the general idea of what the BIOS is doing first.

After initializing the PCI bus and all the important devices the BIOS knows about, it searches for a bootable device such as a floppy, hard drive, or CD-ROM. Eventually, when it finds a bootable disk, the BIOS reads the boot loader from the disk and transfers control to it.

Part C: The Boot Loader

Floppy and hard disks for PCs are divided into 512 byte regions called sectors. A sector is the disk's minimum transfer granularity: each read or write operation must be one or more sectors in size and aligned on a sector boundary. If the disk is bootable, the first sector is called the boot sector, since this is where the boot loader code resides. When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set the CS:IP to 0000:7c00, passing control to the boot loader. Like the BIOS load address, these addresses are fairly arbitrary - but they are fixed and standardized for PCs.
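Because transfers must be whole, aligned sectors, a loader that wants an arbitrary byte range has to round it out to sector boundaries; bootmain.c's readseg does a similar round-down before calling readsect. A minimal sketch of the arithmetic, where cover_bytes and struct sector_span are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512   /* bytes per sector */

/* Whole sectors covering bytes [offset, offset + count) of the disk. */
struct sector_span {
    uint32_t first;   /* first sector to read */
    uint32_t nsect;   /* how many sectors to read */
    uint32_t skip;    /* bytes to discard at the start of the first sector */
};

static struct sector_span cover_bytes(uint32_t offset, uint32_t count)
{
    assert(count > 0);
    uint32_t end = offset + count;   /* one past the last wanted byte */
    struct sector_span s;
    s.first = offset / SECTSIZE;                          /* round start down */
    s.skip = offset % SECTSIZE;
    s.nsect = (end + SECTSIZE - 1) / SECTSIZE - s.first;  /* round end up */
    return s;
}
```

For example, bytes 500 through 519 straddle a sector boundary, so reading them costs two full sectors. The boot sector itself is exactly one sector, which is why it occupies 0x7c00 through 0x7c00 + 511 = 0x7dff.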

The ability to boot from a CD-ROM came much later during the evolution of the PC, and as a result the PC architects took the opportunity to rethink the boot process slightly. As a result, the way a modern BIOS boots from a CD-ROM is a bit more complicated (and more powerful). CD-ROMs use a sector size of 2048 bytes instead of 512, and the BIOS can load a much larger boot image from the disk into memory (not just one sector) before transferring control to it. For more information, see the "El Torito" Bootable CD-ROM Format Specification.

Xv6 uses the conventional hard drive boot mechanism, which means that its boot loader must fit into a measly 512 bytes. The boot loader consists of one assembly language source file bootasm.S, and one C source file, bootmain.c. Look through these source files carefully and make sure you understand what's going on. The boot loader must perform two main functions:

First, the boot loader switches the processor from real mode to 32-bit protected mode, because it is only in this mode that software can access all the memory above 1MB in the processor's physical address space. Protected mode is described briefly in Bootstrap, and in great detail in the Intel architecture manuals. At this point you only have to understand that translation of segmented addresses (segment:offset pairs) into physical addresses happens differently in protected mode, and that after the transition offsets are 32 bits instead of 16.
Second, the boot loader reads the kernel from the hard disk by directly accessing the IDE disk device registers via the x86's special I/O instructions. You will not need to learn much about programming specific devices in this class: writing device drivers is in practice a very important part of OS development, but from a conceptual or architectural viewpoint it is also one of the least interesting.
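To give a feel for what "directly accessing the IDE disk device registers" means, the sketch below records the port writes that request one sector. The port numbers and the 0x20 (read sectors) command are modeled on readsect in bootmain.c (check them against your copy); struct port_write and ide_read_cmd are hypothetical stand-ins for the real outb calls, so the sequence can be inspected without touching hardware. After issuing these writes, the loader waits for the disk and then reads 512 bytes from port 0x1F0 with insl, which is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* One outb(port, value) the boot loader would issue. */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Fill cmd[0..5] with the writes that ask the primary IDE controller
 * to read one sector at 28-bit LBA 'lba'. Returns the number of writes. */
static int ide_read_cmd(uint32_t lba, struct port_write cmd[6])
{
    assert(lba < (1u << 28));                                     /* 28-bit LBA limit */
    cmd[0] = (struct port_write){ 0x1F2, 1 };                     /* sector count = 1 */
    cmd[1] = (struct port_write){ 0x1F3, lba & 0xFF };            /* LBA bits 0-7 */
    cmd[2] = (struct port_write){ 0x1F4, (lba >> 8) & 0xFF };     /* LBA bits 8-15 */
    cmd[3] = (struct port_write){ 0x1F5, (lba >> 16) & 0xFF };    /* LBA bits 16-23 */
    cmd[4] = (struct port_write){ 0x1F6, ((lba >> 24) & 0x0F) | 0xE0 }; /* LBA mode, drive 0, bits 24-27 */
    cmd[5] = (struct port_write){ 0x1F7, 0x20 };                  /* command: read sectors */
    return 6;
}
```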

After you understand the boot loader source code, look at the file bootblock.asm. This file is a disassembly of the boot loader that the Makefile creates after compiling the boot loader. This disassembly file makes it easy to see exactly where in physical memory all of the boot loader's code resides, and makes it easier to track what's happening while stepping through the boot loader in GDB. Likewise, kernel.asm contains a disassembly of the xv6 kernel, which can often be useful for debugging.

Loading the Kernel

We will now look in further detail at the C language portion of the boot loader, in bootmain.c. To make sense out of bootmain.c you'll need to know what an ELF binary is. When you compile and link a C program such as the xv6 kernel, the compiler transforms each C source ('.c') file into an object ('.o') file containing assembly language instructions encoded in the binary format expected by the hardware. The linker then combines all of the compiled object files into a single binary image such as kernel, which in this case is a binary in the ELF format, which stands for "Executable and Linkable Format".

Full information about this format is available in the ELF specification, but you will not need to delve very deeply into the details of this format in this class. Although as a whole the format is quite powerful and complex, most of the complex parts are for supporting dynamic loading of shared libraries, which we will not do in this class.

For the purposes of this course, you can consider an ELF executable to be a header with loading information, followed by several program sections, each of which is a contiguous chunk of code or data intended to be loaded into memory at a specified address. The boot loader does not modify the code or data; it loads it into memory and starts executing it.

An ELF binary starts with a fixed-length ELF header, followed by a variable-length program header listing each of the program sections to be loaded. The C definitions for these ELF headers are in elf.h. The program sections we're interested in are:

.text: The program's executable instructions.
.rodata: Read-only data, such as ASCII string constants produced by the C compiler. (We will not bother setting up the hardware to prohibit writing, however.)
.data: The data section holds the program's initialized data, such as global variables declared with initializers like int x = 5;.

When the linker computes the memory layout of a program, it reserves space for uninitialized global variables, such as int x;, in a section called .bss that immediately follows .data in memory. C requires that "uninitialized" global variables start with a value of zero. Thus there is no need to store contents for .bss in the ELF binary; instead, the linker records just the address and size of the .bss section. The loader or the program itself must arrange to zero the .bss section.
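Putting the header, the program headers, and the .bss rule together, a loader copies each segment's file contents to its load address and zeroes whatever is left of its memory size. The sketch below does this into a byte array standing in for physical memory. The struct layouts follow the 32-bit ELF format as mirrored in xv6's elf.h; load_elf and the mem buffer are hypothetical, and unlike this sketch the real boot loader reads segments straight from disk into physical memory and then jumps to the entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU   /* "\x7FELF" read as a little-endian uint32 */
#define ELF_PROG_LOAD 1         /* program header type: loadable segment */

struct elfhdr {
    uint32_t magic;
    uint8_t  elf[12];
    uint16_t type, machine;
    uint32_t version, entry, phoff, shoff, flags;
    uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};

struct proghdr {
    uint32_t type, off, vaddr, paddr, filesz, memsz, flags, align;
};

/* Load image 'img' into simulated physical memory 'mem'. Copies filesz
 * bytes of each loadable segment to paddr, then zeroes the remaining
 * memsz - filesz bytes (the .bss part). Returns 0 and sets *entry on
 * success, or -1 if the image is malformed or does not fit. */
static int load_elf(const uint8_t *img, size_t imglen,
                    uint8_t *mem, size_t memlen, uint32_t *entry)
{
    struct elfhdr eh;
    if (imglen < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);            /* avoid unaligned access */
    if (eh.magic != ELF_MAGIC)
        return -1;

    for (uint16_t i = 0; i < eh.phnum; i++) {
        struct proghdr ph;
        size_t at = (size_t)eh.phoff + (size_t)i * sizeof ph;
        if (at + sizeof ph > imglen)
            return -1;
        memcpy(&ph, img + at, sizeof ph);
        if (ph.type != ELF_PROG_LOAD)
            continue;                       /* skip non-loadable segments */
        if (ph.filesz > ph.memsz ||
            (size_t)ph.off + ph.filesz > imglen ||
            (size_t)ph.paddr + ph.memsz > memlen)
            return -1;
        memcpy(mem + ph.paddr, img + ph.off, ph.filesz);             /* code/data */
        memset(mem + ph.paddr + ph.filesz, 0, ph.memsz - ph.filesz); /* .bss */
    }
    *entry = eh.entry;
    return 0;
}
```

Note that nothing for .bss is read from the image: only memsz in the program header tells the loader how much zeroed space to reserve.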

You can display a full list of the names, sizes, and link addresses of all the sections in the kernel executable by typing:

$ objdump -h kernel

You will see many more sections than the ones we listed above, but the others are not important for our purposes. Most of the others are to hold debugging information, which is typically included in the program's executable file but not loaded into memory by the program loader.

Take particular note of the "VMA" (or link address) of the .text section. We'll reexamine this shortly.

Besides the section information, there is one more field in the ELF header that is important to us, named e_entry. This field holds the link address of the entry point in the program: the memory address in the program's text section at which the program should begin executing. To see the entry point, run:

 
$ objdump -f kernel

To examine memory in GDB, you use the x command with different arguments. The GDB manual has full details. For now, it is enough to know that the recipe x/Nx ADDR prints N words of memory at ADDR. (Note that both 'x's in the command are lowercase.)

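Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes). GDB's x command, by contrast, starts out using 4-byte words (its 'w' unit size); you can request them explicitly with x/Nxw.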

Exercise 3. Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)

Link vs. Load Address

The load address of a binary is the memory address at which a binary is actually loaded. For example, the BIOS is loaded by the PC hardware at address 0xf0000. So this is the BIOS's load address. Similarly, the BIOS loads the boot sector at address 0x7c00. So this is the boot sector's load address.

The link address of a binary is the memory address for which the binary is linked. Linking a binary for a given link address prepares it to be loaded at that address. The linker encodes the link address in the binary in various ways, for example when the code needs the address of a global variable, with the result that a binary usually won't work if it is not loaded at the address that it is linked for.

In one sentence: the link address is the location where a binary assumes it is going to be loaded, while the load address is the location where a binary is loaded. It's up to us to make sure that they turn out to be the same.
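To make the mismatch concrete, here is a small arithmetic sketch. An absolute reference compiled into the code uses a symbol's link address, but the symbol's bytes actually sit at the load address plus the symbol's offset within the binary; the two agree only when the link and load addresses are equal. The function where_symbol_is is hypothetical, and 0x7e00 below is just an example of a wrong link address.

```c
#include <assert.h>
#include <stdint.h>

/* A binary linked at link_addr contains absolute addresses computed as
 * link_addr + (offset within the binary). If it is actually loaded at
 * load_addr, the symbol the code refers to as sym_link_addr really
 * lives at the address returned here. */
static uint32_t where_symbol_is(uint32_t link_addr, uint32_t load_addr,
                                uint32_t sym_link_addr)
{
    assert(sym_link_addr >= link_addr);
    uint32_t offset = sym_link_addr - link_addr;   /* position inside the binary */
    return load_addr + offset;
}
```

With the boot sector loaded at 0x7c00 but linked at 0x7e00, code referring to a variable at link address 0x7e40 would access 0x7e40, while the variable actually sits at where_symbol_is(0x7e00, 0x7c00, 0x7e40) = 0x7c40.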

Look at the -Ttext linker options in the Makefile. These set the link addresses for the boot loader and the kernel, respectively.

Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in Makefile to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

When object code contains no absolute addresses that encode the link address in this fashion, we say that the code is position-independent: it will behave correctly no matter where it is loaded. GCC can generate position-independent code using the -fpic option, and this feature is used extensively in modern shared libraries that use the ELF executable format. Position independence typically has some performance cost, however, because it restricts the ways in which the compiler may choose instructions to access the program's data. Xv6 does not use -fpic.

Part D: System Call Dispatch

When a user process issues a system call to request OS service or a device generates an interrupt, the hardware must switch from user mode to kernel mode so that the kernel can handle this system call or interrupt. On the x86, this switch is done by a hardware mechanism called "interrupt" or "trap". An interrupt pauses the execution of the current user process, saves its context, switches from user mode to kernel mode, and executes a piece of code in the kernel called an interrupt handler.

In xv6, each interrupt handler sets up a trap frame in struct trapframe containing the processor registers at the time of the interrupt. It then calls the C function trap defined in trap.c. This function looks at the hardware trap number in the trap frame to decide why it has been called and what needs to be done. If the trap number is T_SYSCALL, trap calls the system call handler syscall.

Each system call has a system call number to distinguish it from others. In xv6, the user program stores the system call number in register %eax before issuing the system call. Function syscall reads the system call number from the trap frame, looks it up in the table syscalls, and decides which sys_* routine to call.
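A user-space sketch of this table-driven dispatch. Here struct trapframe keeps only %eax, the system call numbers and sys_* bodies are placeholders rather than xv6's, and dispatch is a hypothetical name playing the role of syscall: it reads the number from the saved %eax, calls the table entry, and writes the result back into the saved %eax, which is where the user program finds the return value.

```c
#include <assert.h>

/* Stand-in for the saved registers; xv6's struct trapframe has many more. */
struct trapframe {
    int eax;   /* system call number on entry, return value on exit */
};

#define SYS_fork   1   /* hypothetical numbering for this sketch */
#define SYS_getpid 2

static int sys_fork(void)   { return -1; }   /* placeholder bodies */
static int sys_getpid(void) { return 42; }

/* Table indexed by system call number; unlisted slots are NULL. */
static int (*syscalls[])(void) = {
    [SYS_fork]   = sys_fork,
    [SYS_getpid] = sys_getpid,
};

#define NELEM(a) ((int)(sizeof(a) / sizeof((a)[0])))

static void dispatch(struct trapframe *tf)
{
    int num = tf->eax;
    if (num >= 0 && num < NELEM(syscalls) && syscalls[num] != 0)
        tf->eax = syscalls[num]();   /* return value goes back in %eax */
    else
        tf->eax = -1;                /* unknown system call */
}
```

The bounds and NULL checks matter because %eax comes from user space and can hold any value.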

Xv6 does not copy the arguments of a system call to the kernel stack. Instead, it fetches these arguments from the user stack using the helper functions argint, argptr, and argstr. These helpers read the user stack pointer %esp saved in the trap frame and locate the requested argument relative to it.
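A sketch of how argint can locate argument n. It assumes, as in xv6, that each system call is reached through a small user-level stub entered with a call instruction, so the saved user %esp points at the stub's return address and argument 0 sits 4 bytes above it. struct uproc and its flat mem array are hypothetical stand-ins for the process's address space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy user address space: user address a lives at mem[a]. */
struct uproc {
    uint8_t *mem;
    uint32_t sz;    /* size of user memory in bytes */
    uint32_t esp;   /* user %esp saved in the trap frame */
};

/* Fetch the 4-byte int at user address addr, rejecting out-of-range reads. */
static int fetchint(const struct uproc *p, uint32_t addr, int *ip)
{
    if (addr >= p->sz || p->sz - addr < 4)
        return -1;
    memcpy(ip, p->mem + addr, sizeof *ip);
    return 0;
}

/* n-th 32-bit argument: skip the return address at %esp, then 4 bytes per argument. */
static int argint(const struct uproc *p, int n, int *ip)
{
    return fetchint(p, p->esp + 4 + 4 * (uint32_t)n, ip);
}
```

The bounds check is what keeps a buggy or malicious user program from making the kernel read outside the process's memory.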

Exercise 4. Read vector.S, trapasm.S, trap.c, and syscall.c to understand how xv6 dispatches interrupts and system calls. Set a breakpoint at alltraps and trace how xv6 handles timer interrupts. Write a simple program that repeatedly calls getpid(), and use gdb to trace how these system calls are handled.

Part E: Implement a System Call Recorder (60 pts)

Developers often want to trace the system calls a program issues for program understanding and debugging. In this part of the assignment, you will modify the xv6 kernel to record system calls a process issues. For each system call, you should record the system call number, the return value, and the arguments if any. There are three types of system call arguments in xv6:

Integer argument: you should record its numeric value.
Pointer argument: you should record the value of the pointer instead of the value pointed to by the pointer. That is, given a pointer argument p, record p instead of *p.
String argument: record up to 19 characters of the string. We limit the number of characters to record to save memory.
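MAX_STR_LEN in record.h is 20, which leaves room for 19 characters plus the terminating NUL. A sketch of the truncating copy; record_string is a hypothetical name, and in the kernel src would be a string that argstr has already located and checked in user memory.

```c
#include <assert.h>
#include <string.h>

#define MAX_STR_LEN (20)   /* from record.h */

/* Copy at most MAX_STR_LEN - 1 (= 19) characters of src into dst and
 * always NUL-terminate, in the spirit of xv6's safestrcpy(). */
static void record_string(char dst[MAX_STR_LEN], const char *src)
{
    size_t i = 0;
    while (i < MAX_STR_LEN - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}
```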

In order to control the recording and retrieve the results, you need to implement the following three system calls:

int startrecording()
int stoprecording()
int fetchrecords(struct record *records, int num_records)

A process is either in the recording mode or the normal mode. Initially, a process is in the normal mode. startrecording puts the calling process into the recording mode, and stoprecording switches it back to normal. These two system calls should return 0 on success, or return -1 if the current process is already in the target mode. When a process forks, the child process inherits the mode of the parent.

All the system calls issued by a process, except the above three system calls, should be recorded when this process is in the recording mode. You should store the records in a process's PCB, e.g. a linked list associated with struct proc. These records should be removed when the process exits. When a process forks, the child process starts without any record.

fetchrecords retrieves all the system call records of the current process.

If the first argument records is NULL, the system call returns the number of records, and the second argument num_records is simply ignored.
If records is not NULL, it points to a pre-allocated array used to store the records. In this case, num_records indicates the size of the array (the number of struct record elements). fetchrecords retrieves at most num_records records and returns the number of records actually retrieved.
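One possible way to organize the per-process state described above, written as user-space C so the logic can be tried outside the kernel. struct rproc, struct recnode, and all helper names are hypothetical; in xv6 these fields would live in struct proc, and malloc stands in for whatever kernel allocation scheme you choose. struct record is copied from record.h.

```c
#include <assert.h>
#include <stdlib.h>

enum recordtype { SYSCALL_NO, ARG_INTEGER, ARG_POINTER, ARG_STRING, RET_VALUE };
#define MAX_STR_LEN (20)
struct record {
  enum recordtype type;
  union recordvalue { int intval; void *ptrval; char strval[MAX_STR_LEN]; } value;
};

/* Hypothetical per-process recording state (would live in struct proc). */
struct recnode { struct record rec; struct recnode *next; };
struct rproc {
    int recording;                 /* 1 = recording mode, 0 = normal mode */
    struct recnode *head, *tail;   /* records in the order they were made */
    int nrecords;
};

static int startrecording(struct rproc *p)
{
    if (p->recording) return -1;   /* already in recording mode */
    p->recording = 1;
    return 0;
}

static int stoprecording(struct rproc *p)
{
    if (!p->recording) return -1;  /* already in normal mode */
    p->recording = 0;
    return 0;
}

/* Append at the tail so records keep their issue order. */
static int addrecord(struct rproc *p, const struct record *r)
{
    struct recnode *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->rec = *r;
    n->next = NULL;
    if (p->tail) p->tail->next = n; else p->head = n;
    p->tail = n;
    p->nrecords++;
    return 0;
}

/* NULL buffer: report the count. Otherwise copy up to num_records records. */
static int fetchrecords(const struct rproc *p, struct record *records, int num_records)
{
    if (records == NULL) return p->nrecords;
    int i = 0;
    for (const struct recnode *n = p->head; n != NULL && i < num_records; n = n->next)
        records[i++] = n->rec;
    return i;
}

/* fork: the child inherits the mode but starts with no records. */
static void fork_records(const struct rproc *parent, struct rproc *child)
{
    child->recording = parent->recording;
    child->head = child->tail = NULL;
    child->nrecords = 0;
}

/* exit: free every record. */
static void free_records(struct rproc *p)
{
    struct recnode *n = p->head;
    while (n) { struct recnode *next = n->next; free(n); n = next; }
    p->head = p->tail = NULL;
    p->nrecords = 0;
}
```

A call to addrecord belongs only where the process is in recording mode and the system call is not one of the three control calls; that check is left out of this sketch.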

For the purpose of grading, please use the struct record defined in record.h to record system calls. This struct is shown below:

enum recordtype { SYSCALL_NO, ARG_INTEGER, ARG_POINTER, ARG_STRING, RET_VALUE };

#define MAX_STR_LEN (20)

struct record {
  enum recordtype type;
  union recordvalue {
    int intval;
    void *ptrval;
    char strval[MAX_STR_LEN];
  } value;
};

A record may be a system call number, an argument, or a return value. The field type indicates the type of the record, and value stores the corresponding value.

One system call invocation may generate multiple records. They should appear in the following order: system call number, argument 0, argument 1, ... , return value. For example, a system call open("README", 0) = 3 generates four records:

SYSCALL_NO: SYS_open
ARG_STRING: "README"
ARG_INTEGER: 0
RET_VALUE: 3
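A sketch that produces the four records of the open example in the required order. record_open and its helpers are hypothetical, and since the numeric value of SYS_open comes from xv6's syscall.h, it is passed in as sysno rather than hard-coded.

```c
#include <assert.h>
#include <string.h>

enum recordtype { SYSCALL_NO, ARG_INTEGER, ARG_POINTER, ARG_STRING, RET_VALUE };
#define MAX_STR_LEN (20)
struct record {
  enum recordtype type;
  union recordvalue { int intval; void *ptrval; char strval[MAX_STR_LEN]; } value;
};

/* Append one integer-valued record (number, int argument, or return value). */
static void rec_int(struct record *recs, int *n, enum recordtype t, int v)
{
    recs[*n].type = t;
    recs[*n].value.intval = v;
    (*n)++;
}

/* Append a string argument, keeping at most 19 characters plus the NUL. */
static void rec_str(struct record *recs, int *n, const char *s)
{
    recs[*n].type = ARG_STRING;
    strncpy(recs[*n].value.strval, s, MAX_STR_LEN - 1);
    recs[*n].value.strval[MAX_STR_LEN - 1] = '\0';
    (*n)++;
}

/* Records for open(path, omode) = ret: number, arg 0, arg 1, return value. */
static int record_open(struct record *recs, int sysno, const char *path,
                       int omode, int ret)
{
    int n = 0;
    rec_int(recs, &n, SYSCALL_NO, sysno);
    rec_str(recs, &n, path);
    rec_int(recs, &n, ARG_INTEGER, omode);
    rec_int(recs, &n, RET_VALUE, ret);
    return n;
}
```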

W4118 OPERATING SYSTEMS I

Spring 2011 -- Junfeng Yang