E6121 Reliable Software

Fall 2012 -- Junfeng Yang

Lab 2 is due 11:30pm ET 10/14.

Lab 2: Bounds Checker

This lab will introduce you to static instrumentation and runtime checking, in the context of defending against buffer overflows. We learned in lab 1 that buffer overflows can let attackers take control of a program. In lab 2, we'll build a bounds checker to prevent such attacks. Like the Baggy Bounds Checking system, our system consists of a runtime component that tracks and checks bounds and a static component that instruments a program to call into that runtime.

We've set up the working environment for you on workbench.cs.columbia.edu. To ssh into workbench, use the username and password sent to you by email. You must change your password when you first log in. (You may also do this lab on a machine of your choice, but the teaching staff won't have the resources to help you set it up.)

You also need to tell git your email and name. The examples below assume your username is jy2324:

jy2324@workbench:~$ git config --global user.email "your-email@example.com"
jy2324@workbench:~$ git config --global user.name "Your Name"

Check out the lab 2 source code as follows:

jy2324@workbench:~$ git clone http://debug.cs.columbia.edu/e6121/2012/lab2.git
Initialized empty Git repository in /home/jy2324/lab2/.git/
...
jy2324@workbench:~$ cd lab2
jy2324@workbench:~/lab2$ 

We'll build our bounds checker within the LLVM compiler framework. Download and build LLVM as follows:

jy2324@workbench:~/lab2$ wget http://llvm.org/releases/3.1/llvm-3.1.src.tar.gz
...
2012-09-28 16:39:24 (5.33 MB/s) - `llvm-3.1.src.tar.gz' saved [11077429/11077429]
jy2324@workbench:~/lab2$ tar xzvf llvm-3.1.src.tar.gz
llvm-3.1.src/
...
llvm-3.1.src/autoconf/m4/visibility_inlines_hidden.m4
jy2324@workbench:~/lab2$ cd llvm-3.1.src
jy2324@workbench:~/lab2/llvm-3.1.src$ mkdir build
jy2324@workbench:~/lab2/llvm-3.1.src$ cd build
jy2324@workbench:~/lab2/llvm-3.1.src/build$ ../configure --target=x86_64 #build only for x86_64 architecture
checking for clang... clang
...
config.status: executing tools/sample/Makefile commands
jy2324@workbench:~/lab2/llvm-3.1.src/build$ make ENABLE_OPTIMIZED=0 -j #build LLVM with debug information, -j means parallel build
llvm[0]: Constructing LLVMBuild project information.
...
llvm[0]: ***** Completed Debug+Asserts Build
llvm[0]: ***** Note: Debug build can be 10 times slower than an
llvm[0]: ***** optimized build. Use make ENABLE_OPTIMIZED=1 to
llvm[0]: ***** make an optimized build. Alternatively you can
llvm[0]: ***** configure with --enable-optimized.

The last command may take roughly 10 minutes. Now build the lab 2 source code as follows:

jy2324@workbench:~/lab2/llvm-3.1.src/build$ cd ../../bounds
jy2324@workbench:~/lab2/bounds$ mkdir build
jy2324@workbench:~/lab2/bounds$ cd build
jy2324@workbench:~/lab2/bounds/build$ ../configure --with-llvmsrc=$PWD/../../llvm-3.1.src --with-llvmobj=$PWD/../../llvm-3.1.src/build
../configure: line 1654: cd: /home/junfeng/work/e6121/lab2-ta/llvm-3.1.src: Not a directory
...
config.status: executing instr/Makefile commands
jy2324@workbench:~/lab2/bounds/build$ make ENABLE_OPTIMIZED=0
make[1]: Entering directory `/home/jy2324/lab2/bounds/build/runtime'
llvm[1]: Compiling check.cpp for Debug+Asserts build
llvm[1]: Building Debug+Asserts Archive Library librt.a
clang++ -emit-llvm -o /home/jy2324/lab2/bounds/build/Debug+Asserts/lib/rt.bc -c /home/jy2324/lab2/bounds/runtime/check.cpp
make[1]: Leaving directory `/home/jy2324/lab2/bounds/build/runtime'
make[1]: Entering directory `/home/jy2324/lab2/bounds/build/instr'
llvm[1]: Compiling instr.cpp for Debug+Asserts build
llvm[1]: Compiling link.cpp for Debug+Asserts build
llvm[1]: Compiling main.cpp for Debug+Asserts build
llvm[1]: Linking Debug+Asserts executable bounds
llvm[1]: ======= Finished Linking Debug+Asserts Executable bounds
make[1]: Leaving directory `/home/jy2324/lab2/bounds/build/instr'

To rebuild the bounds checker after you make changes to the source code in the bounds directory, rerun the last command (make ENABLE_OPTIMIZED=0). To clean up the build, run make ENABLE_OPTIMIZED=0 clean.

Now let's get familiar with LLVM and the bounds checker workflow. Using the Clang compiler frontend, we'll compile a simple C file into LLVM's intermediate representation, called bitcode:

jy2324@workbench:~/lab2/bounds/build$ clang -emit-llvm -c ../test/t0.c -o t0.bc

The output file t0.bc is a binary bitcode file. You can disassemble it into a human readable representation as follows:

jy2324@workbench:~/lab2/bounds/build$ llvm-dis t0.bc
jy2324@workbench:~/lab2/bounds/build$ cat -n t0.ll
     1	; ModuleID = '../test/t0.c'
     2	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
     3	target triple = "x86_64-unknown-linux-gnu"
     4	
     5	define i32 @main() nounwind uwtable {
     6	entry:
     7	  %retval = alloca i32, align 4
     8	  %x = alloca i32, align 4
     9	  store i32 0, i32* %retval
    10	  store i32 10, i32* %x, align 4
    11	  %0 = load i32* %x, align 4
    12	  ret i32 %0
    13	}

You can also compile C directly to the human-readable form (LLVM assembly) by adding the -S switch:

jy2324@workbench:~/lab2/bounds/build$ clang -emit-llvm -c ../test/t0.c -S -o t0.ll

We briefly explain the lines in t0.ll here; refer to the LLVM Language Reference Manual for the detailed semantics of LLVM instructions. Lines starting with ";", such as line 1, are comments. Lines 2 and 3 provide architecture and OS information for later code generation; they are not relevant to this assignment. Lines 5 to 13 define a function @main, compiled from t0.c's main function. In LLVM, a function is a global value, and the names of all global values start with @. The i32 on line 5 specifies that @main returns a 32-bit integer. Line 7 allocates stack space to hold the return value of @main, and line 8 allocates stack space for the local variable x and names the resulting pointer %x. The names of all local values start with %.

The alloca instruction allocates space for a stack variable in the current function's frame and returns a pointer to that space. The space is reclaimed automatically when the function returns via a ret instruction. In this example, t0.c's main() function defines a stack variable int x, so clang emits %x = alloca i32, align 4, where i32 is the 32-bit integer type. Note that %x is a pointer to a 32-bit integer, so its type is i32*.
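Array allocas are the ones a bounds checker cares most about, because the allocated type fixes the object's size. The C function below is a hypothetical example, not one of the lab's test files; its comments sketch roughly what unoptimized clang output for it looks like (register names and alignments in your own output may differ).

```c
#include <stddef.h>

/* Hypothetical example: sums the first n elements of a local array.
 * At -O0, clang reserves the whole array with one alloca whose type
 * records its size, roughly:
 *   %a = alloca [4 x i32]      ; 4 * 4 = 16 bytes, known statically
 * and each a[i] becomes an address computation, roughly:
 *   %arrayidx = getelementptr inbounds [4 x i32]* %a, i32 0, i64 %idx
 * A bounds checker can record [%a, %a + 16) when the alloca runs and
 * check every address computed from %a against that range. */
int sum_prefix(size_t n)
{
    int a[4] = {1, 2, 3, 4};
    int sum = 0;
    for (size_t i = 0; i < n && i < 4; i++)
        sum += a[i];
    return sum;
}
```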

LLVM bitcode instructions are in Static Single Assignment (SSA) form, meaning that each local value is assigned exactly once. For example, line 8 defines %x, and no other line defines %x.
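A C variable may still be assigned many times. At -O0, clang keeps such a variable in the stack slot returned by alloca, so the pointer %x is defined once and each C assignment becomes a store to memory. A minimal sketch; the function name is hypothetical and the IR in the comment is approximate.

```c
/* Hypothetical example: x is assigned twice in C.
 * Unoptimized clang output looks roughly like:
 *   %x = alloca i32            ; the pointer %x is defined exactly once
 *   store i32 1, i32* %x       ; x = 1
 *   %0 = load i32* %x
 *   %add = add nsw i32 %0, 1   ; a new SSA value, not a redefinition
 *   store i32 %add, i32* %x    ; x = x + 1 updates memory, not %x
 *   %1 = load i32* %x
 *   ret i32 %1
 */
int increment_once(void)
{
    int x = 1;
    x = x + 1;
    return x;
}
```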

To process t0.ll with our bounds checker and generate a hardened version called t0-hardened.ll, run:

jy2324@workbench:~/lab2/bounds/build$ ./Debug+Asserts/bin/bounds t0.ll -S

Since we haven't added the code to track and check bounds, right now t0-hardened.ll is identical to t0.ll except that t0-hardened.ll links in our bounds checker's runtime methods such as AllocVar and FreeVar. In the next part of this lab, you'll add the missing code so the hardened versions of the programs will actually track and check bounds.
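To picture what the finished pass should produce, it can help to write an instrumented function back out as C. The sketch below assumes one plausible placement: a registration call right after each stack allocation and a matching release before each return. track_alloc and track_free are hypothetical stand-ins for runtime methods such as AllocVar and FreeVar; the real declarations are in bounds/runtime/check.h and may take different arguments.

```c
#include <stddef.h>

/* Hypothetical stand-ins for runtime entry points such as AllocVar and
 * FreeVar. These stubs only count live objects so the sketch compiles
 * on its own; a real runtime would record and forget the bounds. */
static int live_objects = 0;

static void track_alloc(void *base, size_t size)
{
    (void)base;
    (void)size;
    live_objects++;
}

static void track_free(void *base)
{
    (void)base;
    live_objects--;
}

/* An instrumented function written back as C: the calls marked
 * "inserted" are the kind an instrumentation pass would add. */
int instrumented_example(void)
{
    int buf[8];
    track_alloc(buf, sizeof buf);   /* inserted after the alloca */

    for (int i = 0; i < 8; i++)
        buf[i] = i;
    int result = buf[7];

    track_free(buf);                /* inserted before the ret */
    return result;
}
```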

Part 1: Building a Bounds Checker

Our bounds checker will record bounds information for global variables, stack variables, and heap-allocated buffers. It will check bounds on pointer dereferences and on pointer arithmetic. Whenever it detects an error, it terminates the program. For simplicity, our bounds checker will not add padding or change the alignment of the original program. In addition, it will eagerly flag any out-of-bounds pointer produced by pointer arithmetic, even if the pointer might be brought back in bounds later or might never be dereferenced. For instance, our bounds checker will flag an error for the code below, even though C itself permits computing the one-past-the-end pointer a + N:

int a[N] = {...};
int *p = a + N; // ERROR! p is out of bounds!
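The eager rule amounts to a half-open interval test on the address produced by the arithmetic. The hypothetical helpers below contrast it with ISO C's own rule, which also permits the one-past-the-end address. They take addresses as uintptr_t, assuming a flat address space such as x86_64, so the check itself never forms an out-of-bounds pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper implementing the eager rule: the result of pointer
 * arithmetic must lie inside [base, base + size). The one-past-the-end
 * address base + size is rejected. */
static int eager_arith_ok(uintptr_t base, size_t size, uintptr_t result)
{
    return result >= base && result - base < size;
}

/* For comparison: ISO C's rule for pointer arithmetic, which also
 * allows the one-past-the-end address base + size. */
static int iso_c_arith_ok(uintptr_t base, size_t size, uintptr_t result)
{
    return result >= base && result - base <= size;
}
```

With int a[N], the eager check rejects a + N while the ISO C check accepts it, which is exactly the difference the example above illustrates.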

The source code of our bounds checker is split into two parts. The first part, in directory bounds/instr, operates at compile time. Specifically, instr.cpp implements an LLVM FunctionPass that is invoked on each Function. This pass instruments the relevant instructions to call into our runtime methods. It is incomplete; we've marked the places where your code is needed with Lab 2 TODO. link.cpp implements a simple ModulePass that links a bitcode program with our bounds checker runtime. main.cpp parses a bitcode program, invokes these passes on it, and writes out the hardened bitcode program. You don't need to change link.cpp or main.cpp.

The second part, in directory bounds/runtime, operates at runtime. File check.h declares all methods in our runtime, and check.cpp implements these methods. It is also incomplete, and you'll need to fill in the missing code.
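One way to organize the runtime is a table of live objects, each recorded as a base address and a size. The sketch below is only an illustration under that assumption; rt_register, rt_unregister, rt_lookup, and rt_check_access are hypothetical names, not the methods declared in check.h.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime sketch: a fixed-size table of live objects. */
#define MAX_OBJECTS 1024

struct object { uintptr_t base; size_t size; };

static struct object objects[MAX_OBJECTS];
static size_t num_objects = 0;

/* Records a new object [base, base + size). */
void rt_register(const void *base, size_t size)
{
    if (num_objects == MAX_OBJECTS) {
        fprintf(stderr, "bounds: object table full\n");
        abort();
    }
    objects[num_objects].base = (uintptr_t)base;
    objects[num_objects].size = size;
    num_objects++;
}

/* Forgets the object starting at base (swap-remove keeps it O(1)). */
void rt_unregister(const void *base)
{
    for (size_t i = 0; i < num_objects; i++) {
        if (objects[i].base == (uintptr_t)base) {
            objects[i] = objects[--num_objects];
            return;
        }
    }
}

/* Returns the object containing address p, or NULL if none does. */
const struct object *rt_lookup(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < num_objects; i++)
        if (a >= objects[i].base && a - objects[i].base < objects[i].size)
            return &objects[i];
    return NULL;
}

/* Checks a len-byte access at p and terminates the program on a
 * violation, matching the policy described in Part 1. */
void rt_check_access(const void *p, size_t len)
{
    const struct object *o = rt_lookup(p);
    if (o == NULL || len > o->size - ((uintptr_t)p - o->base)) {
        fprintf(stderr, "bounds: invalid %zu-byte access at %p\n",
                len, (void *)p);
        abort();
    }
}
```

A linear scan keeps the sketch short; a real checker would want a faster structure, such as a balanced tree keyed by base address. Note also that the sketch rejects any address outside all registered objects, so pointers into memory the checker never saw (for example, buffers allocated inside uninstrumented library code) would be reported as errors.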

Thanks to LLVM's clean design, we can instrument a program for bounds checking with a couple of passes totaling a few hundred lines of code. However, this is likely the first time you've hacked on a production-quality compiler, so be prepared to read a lot of code and documentation. A few tips:

Exercise 1. Read the LLVM tutorials. Study the bounds checker skeleton code. Complete the bounds checker by filling in the missing code in bounds/instr/instr.cpp and bounds/runtime/check.cpp.

Create 10 test cases in directory bounds/test. Design them to test different aspects of your bounds checker. Run your bounds checker over the test cases and report the results in answers.txt. Commit your changes using git commit.

Part 2: Preventing Buffer Overflow Attacks for a Web Server

Next we'll use the bounds checker you built to defend against attacks on the zookws web server. For simplicity, we'll attack zookws on workbench rather than in the VM you downloaded in lab 1.

To get started, download, patch, and build zookws as follows:

 
jy2324@workbench:~/lab2$ git clone git://g.csail.mit.edu/6.858-lab-2012 zookws
Initialized empty Git repository in /home/jy2324/lab2/zookws/.git/
...
jy2324@workbench:~/lab2$ cd zookws
jy2324@workbench:~/lab2/zookws$ wget http://www.cs.columbia.edu/~junfeng/12fa-e6121/hw/Makefile.patch -O - | patch -p1
...
patching file Makefile
jy2324@workbench:~/lab2/zookws$ make
clang -emit-llvm zookld.c -c -o zookld.bc -g -O0 -std=c99 -Wall -Werror -D_GNU_SOURCE -emit-llvm -fno-stack-protector
...

Now run the bounds-checking version of zookws:

jy2324@workbench:~/lab2/zookws$ ./clean-env.sh ./zookld zook-exstack.conf

If port 8080 is already taken by a fellow student, change the port number in zook-exstack.conf.

Exercise 2. Run zookws and send a few legitimate requests to the server. Check whether your bounds checker has false positives by verifying that the server sends back correct replies. Document any false positives you encounter in answers.txt and fix them.

Run zookws again with the two exploits you created in lab 1. Check whether your bounds checker has false negatives by verifying that it prevents these exploits. Document any false negatives you encounter in answers.txt and fix them.

You are done! Commit all your changes to lab 2 and generate a patch as follows:

jy2324@workbench:~/lab2$ git diff origin > submit.patch

Upload submit.patch to the submission folder of lab 2 in Courseworks.