Homework 1
W4118 Fall 2009
DUE: Tuesday 9/22/2009 at 11:59pm EST

All non-programming and programming problems in this assignment are to be done by yourself. Group collaboration is not permitted for this assignment. All homework submissions are to be made via Git using these instructions. The Git repository you will use is: ssh+git://UNI@os1.cs.columbia.edu/git/UNI/hmwk1 (Replace UNI with your own UNI). Be aware that commits pushed after the deadline will not be considered. Refer to the homework policy section on the class web site for further details.

Non-Programming Problems:

Exercise numbers refer to the course textbook, Operating System Concepts. Each problem is worth 4 points.

  1. Exercise 1.8

  2. Exercise 1.9

  3. Exercise 1.10

  4. Exercise 1.13

  5. Exercise 1.15

  6. Exercise 1.22

  7. Exercise 2.20

  8. Exercise 2.21

  9. Exercise 2.25

  10. Consider the various definitions of an operating system. Consider whether the operating system should include applications such as web browsers and mail programs. Argue both that it should and that it should not, and support your answer.

Programming Problems:

For all programming problems you will be required to submit source code, a README file documenting your files and code, and a test run of your programs. The README should explain any way in which your solution differs from what was assigned, and any assumptions you made. Refer to the homework submission page on the class web site for additional submission instructions.

  1. Shell Utility (20 Pts.)
    1. Using system calls to handle all file operations, write a program in C that copies a file from one location to another, and deletes the original version. The program should take two file names as its arguments, the first one being the file to copy and the second being the new location. The new location should not be overwritten if it already exists. The program should check for bad inputs and should work for files of arbitrary size. In addition, your program should gracefully handle all error conditions while copying the file. Make sure to use system calls (ie open), not the C library for manipulating files (ie fopen).
    2. Write a program in C to create files of 1, 10 and 100 MB in size, and then time how long it takes to copy these files. Are the results what you imagine they would be?

  2. Simple Shell (25 Pts.) Even the shell itself is just another user program. The files /bin/sh, /bin/bash, and /bin/tcsh are all executable files for shells. The only thing special about your login shell is that it is listed in your login record so that /bin/login (the program that prompts you for your password) knows what program to start when you log in. If you run ``ypmatch id passwd'', replacing id with your user ID, you'll see your login record, with your login shell program listed as the last field.

    1. Write a simple shell in C. Your shell should read a line from standard input, parse the line to get the command and its arguments, fork, and exec the command. Your shell should wait for commands to finish, and should exit when it receives the command exit. When executing a command, the shell should first look for the command in the current directory, and, if not found, search the directories defined in a special variable, path.

      • Using the system function is not allowed, as it just invokes the system's /bin/sh shell to do all the work.

      • You may assume command line arguments are separated by whitespace. Don't do anything special for backslashes, quotes, ampersands or other characters that are ``special'' in other shells. Note that this means commands in your shell will not be able to operate on filenames with spaces in them!

      • You may set a reasonable maximum on the number of command line arguments, but your shell should handle input lines of any length.

      • The executable file for your shell should be named sh. When testing, make sure you execute your sh and not /bin/sh.

    2. A few shell commands are not separate programs at all, but are handled directly by the shell program. You have already implemented exit, which is an example of this. Now implement cd, the change directory command. You will need to invoke the chdir system call to do this. Note that if the call to chdir fails, you probably don't want to exit the shell but instead should handle the error appropriately.

    3. Some convenient built-ins present in many shells are pushd, popd, and dirs. Implement these in your shell. You should not assume any limit on the number of commands that may be executed.

      • pushd works like cd, but pushes the directory you switched from onto a stack.

      • popd pops the top directory off the stack, and cd's to it. In other words, a pushd followed by a popd should bring you back to the directory you were in before the pushd.

      • dirs prints the contents of the stack.

    4. The path variable holds a list of possible paths in which to search for executables. The list of paths is empty by default, but may grow to any arbitrary size. You should implement a built-in command to control this variable: path [+|- /some/dir]

      • path (without arguments) displays all the entries in the list separated by colons, e.g. "/bin:/usr/bin".

      • path + /some/dir appends the given pathname to the path list.

      • path - /some/dir removes the given pathname from the path list.

  3. Timing Methodologies (15 Pts.) Modern x86 PCs have a tsc hardware register that counts processor cycles. The register can be read using the rdtsc instruction, which can be used to provide very accurate timing measurements. The macro rdtsc below can be used to read this register. This macro can be wrapped through a function like getcycles, which returns the number of cpu cycles used since the system was last booted. #define rdtsc(low,high) \ __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high)) static inline long long int getcycles(void) { unsigned long low; long high; rdtsc(low, high); return ((long long int)high << 32) | low; }
    1. Write a function in C called gethosttime that takes a long long cycles value as an argument and returns the equivalent long long in nanoseconds. Your function should be portable across machines, so it will need to determine the CPU speed of the machine on which it is being called. One way to do this in Linux is to parse the data contained in the file /proc/cpuinfo. The /proc filesystem (where /proc/cpuinfo resides), is a virtual filesystem that presents a way for userspace to communicate with the kernel and vice-versa. Various statistics can be read from the files in the /proc filesystem and /proc/cpuinfo, as its name suggests, contains information regarding the CPU.
    2. Using gethosttime, figure out how long it takes to execute the getcycles function. In addition, time the gettimeofday system call. You should be careful to do the measurement in such a way that minimizes the overhead of doing the measurement. In particular, you cannot directly measure the time of a function accurately using another function whose overhead to execute is comparable to the function being measured.
    3. Use both gethosttime and the gettimeofday system call to time the inner for-loop of the following bit of code: for (x=0; x<1000; x++) { for (y=0; y<100; y++) { /* inner loop starts here */ z = x + y; } /* inner loop ends here */ } Your writeup should include the mean and standard deviation of loop iterations, and provide an explanation for any large variations in your results. Based on your measurements, indicate which timing method is more accurate and explain why.

    Additional Requirements

  1. All code (including test programs) must be written in C.
  2. Make at least ten commits with Git. The point is to make incremental changes and use an iterative development cycle.
  3. Follow the following coding style rules:
    • Tab size: 8 spaces.
    • Do not have more that 3 levels of indentations (unless the function is extremely simple).
    • Do not have lines that goes after the 80th column (with rare exceptions).
    • Do not comment your code to say obvious things. Use /* ... */ and not // ...
    • Try to follow the Linux kernel coding style.
  4. Use a makefile to control the compilation of your code. The makefile should have at least a default target that builds all assigned programs.
  5. When using gcc to compile your code, use the -Wall switch to ensure that all warnings are displayed. Do not be satisfied with code that merely compiles; it should compile with no warnings. You will lose points if your code produces warnings when compiled.
  6. Check the return values of all functions utilizing system resources. Do not blithely assume all requests for memory will succeed and all writes to a file will occur correctly. Your code should handle errors properly. Many failed function calls should not be fatal to a program. Typically, a system call will return -1 in the case of an error (malloc will return NULL). If a function call sets the errno variable (see the function's man page to find out if it does), you should use the result of strerror(errno) as part of your error message. As far as system calls are concerned, you will want to use one of the mechanisms described in Reporting Errors or Error Reporting.

    Tips

  1. For this assignment, your primary reference will be Programming in C. You might also find the Glibc Manual useful.
  2. Many questions about functions and system behaviour can be found in the system manual pages; type in man function to get more information about function. If function is a system call, man 2 function can ensure that you receive the correct man page, rather than one for a system utility of the same name.
  3. If you are having trouble with basic Unix/Linux behavior, you might want to check out the resources section of the class webpage.
  4. A lot of your problems have happened to other people. If you have a strange error message, you might want to try searching for the message on Google.