Homework 2

W4118 Fall 2022

UPDATED: Monday 9/26/2022 at 9:51pm EST

DUE: Wednesday 10/5/2022 at 11:59pm EST

All homework submissions are to be made via Git. You must submit a detailed list of references as part of your homework submission, indicating clearly what sources you referenced for each homework problem. You do not need to cite the course textbooks and instructional staff. All other sources must be cited. Please edit and include this file in the top-level directory of your homework submission in the test branch of your team repo. Homeworks submitted without this file will not be graded. Be aware that commits pushed after the deadline will not be considered. Refer to the homework policy section on the class web site for further details.

Programming Problems:

Group programming problems are to be done in your assigned groups. The Git repository for your group has already been set up on GitHub. You don't need the GitHub Classroom link for the group assignment. It can be cloned using:


git clone git@github.com:W4118/f22-hmwk2-teamN.git
    
(Replace teamN with the name of your team, e.g. team0). This repository will be accessible to all members of your team, and all team members are expected to commit (local) and push (update the server) changes / contributions to the repository equally. You should become familiar with team-based shared repository Git commands such as git-pull, git-merge, git-fetch.

All team members should make at least five commits to the team's Git repository. The point is to make incremental changes and use an iterative development cycle. Follow the Linux kernel coding style and check your commits with the checkpatch.pl script on the default path in the provided VM. Errors from the script in your submission will cause a deduction of points.

The kernel programming for this assignment will be run using your Linux VM. As a part of this assignment, you will be experimenting with Linux platforms and gaining familiarity with the development environment. Linux platforms can run on many different architectures, but the specific platforms we will be targeting are the X86_64 or Arm64 CPU families. All of your kernel builds will be done in the same Linux VM from homework 1. You will be developing with the Linux 5.10.138 kernel.

For this assignment, you will write a system call to dump the process tree and a user space program to use the system call.

For students on Arm Mac computers (e.g. with M1 or M2 CPU): if you want your submission to be built/tested for Arm, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:

cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add .armpls && git commit -m "Arm pls"

You should do this first so that this file is present in any code you submit for grading.

For all programming problems you should submit your source code as well as a README file documenting your files and code. Please do NOT submit kernel images. The README should explain any way in which your solution differs from what was assigned, and any assumptions you made. You are welcome to include a test run in your README showing how your system call works. It should also state explicitly how each group member contributed to the submission. The README should be placed in the top level directory of the test branch of your team repo.


  1. Build your own Linux 5.10.138 kernel and install and run it in your Linux VM.

    Kernel building instructions for VM
    Build and run a custom kernel for your VM. The source code of the VM kernel is located on the master branch of your team repo. You can check out that branch. Clone the master branch in a separate directory, which will be the root of your kernel tree.
    1. In your repo, run git checkout master to switch to the "master" branch.
    2. The first thing you need to do is make the config file for the VM kernel, which means that you will create a .config file in the root of your kernel tree that has the appropriate configuration options set for the kernel you are going to build. This guide provides detailed instructions for you to follow to create the .config, though wherever it refers to kernel version 5.10.57 you should use 5.10.138 instead. As described in the guide, the two steps of the process are to execute the following commands in the root directory of your kernel tree:
      
      make olddefconfig
      make menuconfig
                      
    3. You will then build and install the kernel in your VM. Run make -jN, where N is the number of parallel compilation jobs to run, which should correspond to the number of cores in your VM. Wait for the kernel to compile.
    4. Install the new kernel by running:
      
      sudo make modules_install && sudo make install
                      
    5. Reboot your VM.
    6. In the boot selection screen (called grub), which shows up immediately after the VMware logo screen, select "Advanced options for Ubuntu GNU/Linux" and choose the kernel identified by "cs4118".
    7. You are now running your custom kernel! You can use uname -a to check the version of the kernel. You now have an unmodified kernel from the 5.10.138 Linux source provided in the master repo. You should name it 5.10.138-cs4118, and keep it around as your fallback kernel for all future assignments (including this one), in case you run into any trouble booting into the kernel you’re working on. Additionally, make sure that the CONFIG_BLK_DEV_LOOP option is set to y in your .config file before you build and install your unmodified kernel. This will come in handy in later assignments.
    8. Optimize your kernel compile time. A large amount of time is spent compiling and installing kernel modules you never use. To reduce your kernel compilation time, you can optionally regenerate a .config so that it only contains modules you are using by following these instructions:
      • Back up your .config to something like .config.-from-lts. Make sure to keep your local version the same; that is, your kernel should still be named 5.10.138-cs4118.
      • Run make localmodconfig in your Linux kernel source tree. This will take your current .config and turn off all modules that you are not using. It will ask you a few questions. You can hit ENTER to accept the defaults, or just have yes do so for you:
        
        $ yes '' | make localmodconfig
                        
        Make sure that CONFIG_BLK_DEV_LOOP is still set to y before building and installing this kernel. Now you have a much smaller .config. You can follow the rest of the steps starting from make (step 3). Note that you only need to do make localmodconfig once, not each time you build the kernel.
      • When you are hacking kernel code, you’ll often make simple changes to only a handful of .c files. If you didn’t touch any header files, the modules will not be rebuilt when you run make; thus there is no reason to reinstall all modules every time you rebuild your kernel. In other words, in lieu of step 4, you can just do:
        
        sudo make install
                        
        assuming you have already done step 4 with the kernel configuration you are using.

    Just to reemphasize the earlier point, when you are hacking kernel code, the standard workflow will be to modify kernel code, then build the kernel and install the updated kernel using the following two steps:

    
    make -jN
    sudo make install
                    
    Then reboot your VM and select your kernel in grub. In other words, no need to do step 2 or step 8 each time you build your kernel; those steps only need to be done once.
  2. Write a new system call in Linux

    General Description

    The system call you write should take three arguments and copy the process tree information to a buffer in a breadth-first-search (BFS) order. You should only include processes in your buffer and count only the number of processes, not threads.

    The prototype for your system call will be:
    int ptree(struct prinfo *buf, int *nr, int root_pid);
    You should define struct prinfo as:
    
    struct prinfo {
            pid_t parent_pid;       /* process id of parent */
            pid_t pid;              /* process id */
            uid_t uid;              /* user id of process owner */
            char comm[16];          /* name of program executed */
            int level;              /* level of this process in the subtree */
    };
            
    in include/linux/prinfo.h as part of your solution.

    Use the following function to get the task_struct given pid (Don't worry about namespaces and virtual pids):
    
    static struct task_struct *get_root(int root_pid)
    {
            if (root_pid == 0)
                    return &init_task;
    
            return find_task_by_vpid(root_pid);
    }
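
The breadth-first ordering and the level field described above can be illustrated outside the kernel. The following is only a user-space sketch over a toy table of (pid, parent_pid) pairs, not the system call itself: struct toy_proc, struct toy_entry, and toy_bfs are hypothetical names invented for this illustration, and the output array doubles as the BFS queue. In the kernel, children would come from each task's own child list rather than from scanning a table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy input: one row per process. */
struct toy_proc {
	int pid;
	int parent_pid;
};

/* Hypothetical toy output: the subset of prinfo that BFS determines. */
struct toy_entry {
	int pid;
	int parent_pid;
	int level;	/* 0 for the root, parent's level + 1 otherwise */
};

/*
 * Fill out[] in breadth-first order starting at root_pid.
 * out[] is also the queue: entries between head and tail are waiting
 * to have their children appended. Returns the number of entries
 * written, which never exceeds cap.
 */
size_t toy_bfs(const struct toy_proc *procs, size_t n, int root_pid,
	       struct toy_entry *out, size_t cap)
{
	size_t head = 0, tail = 0;
	size_t i;

	assert(procs && out);

	/* Enqueue the root if it exists and there is room for it. */
	for (i = 0; i < n && cap > 0; i++) {
		if (procs[i].pid == root_pid) {
			out[tail++] = (struct toy_entry){
				root_pid, procs[i].parent_pid, 0 };
			break;
		}
	}

	while (head < tail) {
		const struct toy_entry cur = out[head++];

		/* Append every child of cur; skip a self-parented entry. */
		for (i = 0; i < n && tail < cap; i++) {
			if (procs[i].parent_pid == cur.pid &&
			    procs[i].pid != cur.pid)
				out[tail++] = (struct toy_entry){
					procs[i].pid, cur.pid, cur.level + 1 };
		}
	}
	return tail;
}
```

Because all children of one level are appended before any of them is dequeued, entries with the same level end up contiguous in the buffer.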
            

    Parameters description

    Additional Requirements

    Hints

  3. Test your new system call

    General Description

    Write a simple C program which calls ptree with the root_pid as an argument. If no argument is provided, your program should return the entire process tree. The program should be in the test branch of your team repo, and your makefile should generate an executable named test. Since you do not know the tree size in advance, you should start with some reasonable buffer size for calling ptree, then if the buffer size is not sufficient for storing the tree, repeatedly double the buffer size and call ptree until you have captured the full process tree requested. Print the contents of the buffer from index 0 to the end. For each process, you must use the following format for program output:
    
    printf("%s,%d,%d,%d,%d\n", buf[i].comm, buf[i].pid,
                buf[i].parent_pid, buf[i].uid, buf[i].level);
            

    Example program output:
    
    swapper/0,0,0,0,0
    systemd,1,0,0,1
    kthreadd,2,0,0,1
    systemd-journal,385,1,0,2
    ....
    kworker/u16:4,5169,2,0,2
    kworker/4:0,8280,2,0,2
    ...
    sh,1516,1472,1000,7
    ...
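
The buffer-doubling loop described above could look roughly like the sketch below. It rests on an assumption that this handout does not spell out: that on success the call returns 0 and stores in *nr the number of entries actually copied, never more than the capacity passed in. The names ptree_fn, collect_tree, and fake_ptree are hypothetical; fake_ptree is only a stand-in that pretends the tree has 40 processes, so the loop can be exercised without the real system call.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

struct prinfo {
	pid_t parent_pid;
	pid_t pid;
	uid_t uid;
	char comm[16];
	int level;
};

/* Signature of whatever wrapper issues the real system call. */
typedef int (*ptree_fn)(struct prinfo *buf, int *nr, int root_pid);

/*
 * Call ptree with a growing buffer until the result clearly fits.
 * ASSUMPTION: ptree returns 0 on success and sets *nr to the number of
 * entries copied (at most the capacity passed in). A result that fills
 * the buffer exactly might be truncated, so it triggers another round.
 * Returns a malloc'ed array and sets *count, or NULL on failure.
 */
struct prinfo *collect_tree(ptree_fn ptree, int root_pid, int *count)
{
	struct prinfo *buf = NULL;
	int cap = 16;	/* arbitrary starting size */

	assert(ptree && count);
	for (;;) {
		struct prinfo *tmp = realloc(buf, (size_t)cap * sizeof(*buf));
		int nr = cap;

		if (!tmp)
			break;
		buf = tmp;
		if (ptree(buf, &nr, root_pid) != 0)
			break;
		if (nr < cap) {
			*count = nr;
			return buf;
		}
		if (cap > INT_MAX / 2)
			break;
		cap *= 2;
	}
	free(buf);
	return NULL;
}

/* Stand-in for the system call: pretends the tree has 40 processes. */
int fake_ptree(struct prinfo *buf, int *nr, int root_pid)
{
	int total = 40;
	int i;

	(void)root_pid;
	if (*nr < total)
		total = *nr;
	for (i = 0; i < total; i++)
		buf[i] = (struct prinfo){ .pid = i };
	*nr = total;
	return 0;
}
```

In a real test program, the function passed in would wrap syscall() with whatever system call number your kernel assigns to ptree.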
            

    Hints

    Compiling for the VM:
    Your test program runs on the VM, so you can compile it there with the standard toolchain; you don't need to specify any extra arguments, just run make.
  4. Investigate the Linux process tree

    gdb is a debugger on the Linux platform. It has a command line interface. Use gdb to debug an arbitrary program on the VM: set a breakpoint, run the program, and let it halt there. Then, use the program you developed in part 3 to find the program that gdb is attached to. How do gdb and the program it is attached to appear in your process tree?
    Learn to use GDB in 3 seconds:

    
    $: gdb ./test
    GNU gdb (Ubuntu 8.1-0ubuntu3) ...
    ...
    Reading symbols from ./test...done.
    (gdb) break 49
    Breakpoint 1 at 0x4007e7: file test.c, line 49
    (gdb) run
    Starting program: ..../test
    Breakpoint 1, .... at test.c:50
    (gdb)
                    
  5. Create your own process tree

    Write another program that creates processes such that the program you developed in part 3 outputs the following process tree:

    
    foo,5000,1,x,y
    foo,5001,5000,x,y+1
    foo,5002,5000,x,y+1
    foo,5003,5001,x,y+2
    foo,5004,5002,x,y+2
    
    
    where x and y are each some integer value. For example, any value of x is okay, so long as it is the same value of x for all the processes in this tree. Other than x and y, all of the other fields shown in the output above should exactly match the strings and integers shown. This program should also be in the test branch of your team repo, and your makefile should generate the executable required. You may find it helpful to change the maximum possible pid value so that pid values roll over sooner, which makes it easier to test your program.
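
The shape of the tree above, and the order in which its processes must be created, can be sketched with fork and a few pipes. This is only a partial illustration: it does not arrange for the pids to be 5000 through 5004, for the name to be foo, or for the root's parent to be pid 1. All names in it (struct report, spawn_tree, release_tree, and the pipe variables) are hypothetical. The calling process plays the root; the two second-level processes wait for a go signal so that both exist before either one forks, which matches the pid order shown in the expected output.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What each descendant tells the root: its slot in the tree and its ids. */
struct report {
	int slot;
	pid_t pid;
	pid_t ppid;
};

static int report_fd[2];	/* descendants write, root reads */
static int release_fd[2];	/* root closes the write end to end the run */
static int go_fd[2][2];		/* go_fd[k]: root tells middle process k to fork */

/* Report this process, then block until the root releases the tree. */
static void report_and_wait(int slot)
{
	struct report r = { slot, getpid(), getppid() };
	char c;

	if (write(report_fd[1], &r, sizeof(r)) != (ssize_t)sizeof(r))
		_exit(1);
	while (read(release_fd[0], &c, 1) > 0)
		;
	_exit(0);
}

/* Second-level process: wait for the go signal, fork one child, report. */
static void middle(int k)
{
	char c;
	pid_t kid;

	/* Drop inherited write ends so EOF is seen if the root goes away. */
	close(release_fd[1]);
	close(go_fd[0][1]);
	close(go_fd[1][1]);
	if (read(go_fd[k][0], &c, 1) != 1)
		_exit(1);
	kid = fork();
	if (kid < 0)
		_exit(1);
	if (kid == 0)
		report_and_wait(3 + k);
	report_and_wait(1 + k);
}

/* Build the five-process shape with the caller as the root (slot 0). */
int spawn_tree(struct report out[5])
{
	struct report r;
	pid_t a, b;
	int seen = 0;

	assert(out);
	if (pipe(report_fd) || pipe(release_fd) ||
	    pipe(go_fd[0]) || pipe(go_fd[1]))
		return -1;
	out[0] = (struct report){ 0, getpid(), getppid() };

	a = fork();
	if (a == 0)
		middle(0);
	b = a > 0 ? fork() : -1;
	if (b == 0)
		middle(1);
	close(report_fd[1]);

	/* Both middle processes exist now, so the first one may fork. */
	if (b < 0 || write(go_fd[0][1], "x", 1) != 1)
		return -1;
	while (seen < 4 &&
	       read(report_fd[0], &r, sizeof(r)) == (ssize_t)sizeof(r)) {
		out[r.slot] = r;
		seen++;
		/* Slot 3 exists, so the second middle process may fork. */
		if (r.slot == 3 && write(go_fd[1][1], "x", 1) != 1)
			return -1;
	}
	return seen == 4 ? 0 : -1;
}

/* Let every descendant exit and reap the two direct children. */
void release_tree(void)
{
	close(release_fd[1]);
	while (wait(NULL) > 0)
		;
}
```

Between spawn_tree and release_tree all five processes are alive and blocked, which is the window in which the program from part 3 could be run to inspect them.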