Reliable Software: Junfeng Yang

Lab 2 due time: 11:30pm EST 02/20/2017

Lab 2: Bounds Checker

This lab will introduce you to static instrumentation and runtime checking, in the context of defending against buffer overflows. We've learned from lab 1 that buffer overflows can allow attackers to gain control of a program. In lab 2, we'll build a bounds checker to prevent such attacks. Similar to the baggy bounds checking system, our system consists of a runtime component that tracks and checks bounds and a static component that instruments a program to call into our runtime.

We've set up a virtual machine image (GDrive link) for you as the working environment (username/password: e6121). You can also set up your own working environment, though we may not be able to provide support for it. This lab has been tested on an x64 virtual machine with 32 GB of disk and 4 GB of memory running Ubuntu 16.10, and it should work on most Linux distributions. Make sure the following packages are installed:

gcc, g++, gdb
wget, git
cmake
libssl-dev (development package)
python, python-flask, python-sqlalchemy

Before starting with the lab, you need to tell git your email and name (suppose your username is jy2324, and the machine name is workbench):

jy2324@workbench:~/lab2$ git config --global user.email "your-email@example.com"
jy2324@workbench:~/lab2$ git config --global user.name "Your Name"

Check out the lab 2 source code as follows:

jy2324@workbench:~$ git clone https://bitbucket.org/xinhaoyuan/e6121-lab2.git lab2

We'll build our bounds checker within the LLVM compiler framework and the Clang frontend. Download and build them as follows:

jy2324@workbench:~$ wget http://releases.llvm.org/3.9.1/llvm-3.9.1.src.tar.xz
jy2324@workbench:~$ wget http://releases.llvm.org/3.9.1/cfe-3.9.1.src.tar.xz
jy2324@workbench:~$ tar xJvf llvm-3.9.1.src.tar.xz
jy2324@workbench:~$ tar xJvf cfe-3.9.1.src.tar.xz -C llvm-3.9.1.src/tools
jy2324@workbench:~$ cd llvm-3.9.1.src
jy2324@workbench:~/llvm-3.9.1.src$ mkdir build
jy2324@workbench:~/llvm-3.9.1.src$ cd build
jy2324@workbench:~/llvm-3.9.1.src/build$ cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD="X86" .. #build only for x86 and x86_64 architecture
jy2324@workbench:~/llvm-3.9.1.src/build$ make #build LLVM with debug information, use "make -j" for parallel build

The last command may take anywhere from 30 minutes to several hours. Now build the lab 2 source code as follows:

jy2324@workbench:~/llvm-3.9.1.src/build$ cd ../../lab2/bounds
jy2324@workbench:~/lab2/bounds$ mkdir build
jy2324@workbench:~/lab2/bounds$ cd build
jy2324@workbench:~/lab2/bounds/build$ LLVM_DIR=../../../llvm-3.9.1.src/build cmake -DCMAKE_BUILD_TYPE=Debug ..
jy2324@workbench:~/lab2/bounds/build$ make

To rebuild the bounds checker after you make changes to the source code in the bounds directory, rerun the make command. To clean up the build, run make clean.

Now let's get familiar with LLVM and the bounds checker workflow. We'll compile a simple C file into the LLVM intermediate representation, called bitcode, using the Clang compiler frontend as follows:

jy2324@workbench:~/lab2/bounds/build$ export LLVM_DIR=`cd ../../../llvm-3.9.1.src/build;pwd`
jy2324@workbench:~/lab2/bounds/build$ ${LLVM_DIR}/bin/clang -emit-llvm -c ../test/t0.c -o t0.bc

The output file t0.bc is a binary bitcode file. You can disassemble it into a human readable representation as follows:

jy2324@workbench:~/lab2/bounds/build$ ${LLVM_DIR}/bin/llvm-dis t0.bc
jy2324@workbench:~/lab2/bounds/build$ cat -n t0.ll
   1  ; ModuleID = 't0.bc'
   2  source_filename = "../test/t0.c"
   3  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
   4  target triple = "x86_64-unknown-linux-gnu"
   5
   6  ; Function Attrs: nounwind uwtable
   7  define i32 @main() #0 {
   8  entry:
   9    %retval = alloca i32, align 4
  10    %x = alloca i32, align 4
  11    store i32 0, i32* %retval, align 4
  12    store i32 10, i32* %x, align 4
  13    %0 = load i32, i32* %x, align 4
  14    ret i32 %0
  15  }
  16
  17  attributes #0 = { nounwind uwtable ... }
  18  
  19  !llvm.ident = !{!0}
  20  
  21  !0 = !{!"clang version 3.9.1 (tags/RELEASE_391/final)"}

You can also directly compile C to a human readable bitcode file using the -S switch:

jy2324@workbench:~/lab2/bounds/build$ ${LLVM_DIR}/bin/clang -emit-llvm -c ../test/t0.c -S -o t0.ll

We briefly explain the lines in t0.ll here; refer to the LLVM Language Reference Manual for the detailed semantics of LLVM instructions. Lines starting with ";", such as line 1, are comments. Lines 3 and 4 provide architecture and OS information for later executable code generation; they are not relevant for this assignment. Lines 7 to 15 define a function @main, which is compiled from t0.c's main function. In LLVM, a function is a global value, just like a global variable, and the names of all global values start with @. The i32 on line 7 specifies that the return type of @main is the 32-bit integer type. Line 9 allocates stack space to hold the return value of @main, and line 10 allocates stack space for the local variable %x. The names of all local variables start with %.

The alloca instruction allocates space for a stack variable of the current function call and returns a pointer to the space. The space will be automatically reclaimed when the call returns via a ret instruction. In this example, t0.c's main() function defines a stack variable int x, so clang emits %x = alloca i32, align 4 where i32 represents the 32-bit integer type. Note that %x is a pointer to a 32-bit integer and its type is i32*.

LLVM bitcode instructions are in Static Single Assignment (SSA) form, meaning that each variable is defined exactly once. For example, line 10 defines %x, and no other line defines %x.
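To make the SSA property concrete, here is a small sketch. The function name bump is ours, and the IR in the comments is only roughly what clang emits without optimization; exact names and attributes may differ. A C variable that is assigned twice does not break SSA, because the variable lives in memory obtained from alloca. Each load or arithmetic result gets a fresh name, and a store defines no name at all.

```cpp
// A local that is assigned twice at the source level.
// Comments show approximately the corresponding unoptimized IR.
int bump() {
    int x = 10;   // %x = alloca i32            ; %x (a pointer) is defined once
                  // store i32 10, i32* %x      ; a store defines no name
    x = x + 1;    // %0 = load i32, i32* %x     ; fresh name %0
                  // %add = add nsw i32 %0, 1   ; fresh name %add
                  // store i32 %add, i32* %x
    return x;     // %1 = load i32, i32* %x     ; fresh name %1
                  // ret i32 %1
}
```

Calling bump() returns 11. The source variable x changes value, but every %-name in the IR has a single definition.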

To process t0.ll with our bounds checker and generate a hardened version of the file called t0-hardened.ll, run

jy2324@workbench:~/lab2/bounds/build$ ./instr/bounds-instr t0.ll -S

Since we haven't added the code to track and check bounds, right now t0-hardened.ll is identical to t0.ll except that t0-hardened.ll links in our bounds checker's runtime methods such as AllocVar and FreeVar. In the next part of this lab, you'll add the missing code so the hardened versions of the programs will actually track and check bounds.
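As a rough picture of what a hardened program might do once the pass is complete, the sketch below writes the effect of instrumenting t0's main by hand in C++. The hook names sketch_alloc_var and sketch_free_var, their signatures, the live-object counter, and the placement of the calls are all assumptions made for illustration; the real runtime declarations are in the lab's check.h and may look different.

```cpp
#include <cstddef>

// Hypothetical stand-ins for runtime hooks (not the lab's actual API).
static int g_live_objects = 0;   // number of currently tracked objects

void sketch_alloc_var(void *base, std::size_t size) {
    (void)base; (void)size;      // a real runtime would record [base, base + size)
    ++g_live_objects;
}

void sketch_free_var(void *base) {
    (void)base;                  // a real runtime would drop the record
    --g_live_objects;
}

// Hand-written equivalent of main in t0.ll with tracking calls added.
int hardened_main() {
    int x;                           // %x = alloca i32
    sketch_alloc_var(&x, sizeof x);  // assumed placement: right after the alloca
    x = 10;                          // store i32 10, i32* %x
    int result = x;                  // %0 = load i32, i32* %x
    sketch_free_var(&x);             // assumed placement: right before the ret
    return result;                   // ret i32 %0
}
```

The function still returns 10, and every tracked stack object is released before the return, so g_live_objects is back to 0 afterwards.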

Part 1: Building a Bounds Checker

Our bounds checker will record bounds information for global variables, stack variables, and heap-allocated buffers. It will perform bounds checks on pointer dereferences and pointer arithmetic. Whenever an error is detected, it will terminate the execution. For simplicity, our bounds checker will not add padding or change the alignment of the original program. In addition, it will eagerly flag any out-of-bounds pointer produced by pointer arithmetic, even if the pointer may later be brought back in bounds or is never dereferenced. For instance, our bounds checker will flag an error for the code below:

int a[N] = {...};
int *p = a + N; // ERROR! p is off bound!
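The eager policy can be sketched as a small lookup table. This is a minimal sketch under our own assumptions, not the lab's runtime: the class name BoundsTable, the methods track, untrack, and arith_ok, and the choice of std::map keyed by base address are all hypothetical. It works on integer addresses and returns a bool instead of terminating the process, so that it can be exercised in isolation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

class BoundsTable {
public:
    // Record an object occupying [base, base + size).
    void track(std::uintptr_t base, std::size_t size) { objects_[base] = size; }

    // Forget an object, e.g. when its stack frame is popped or it is freed.
    void untrack(std::uintptr_t base) { objects_.erase(base); }

    // Eager check for pointer arithmetic: `to` was derived from `from`
    // and must stay inside the object that contains `from`.
    // Assumption: an untracked `from` is treated as an error here; a
    // real checker might choose to let such pointers pass.
    bool arith_ok(std::uintptr_t from, std::uintptr_t to) const {
        auto it = objects_.upper_bound(from);      // first object with base > from
        if (it == objects_.begin()) return false;  // nothing starts at or below from
        --it;                                      // closest object at or below from
        const std::uintptr_t base = it->first;
        const std::size_t size = it->second;
        if (from - base >= size) return false;     // from is not inside that object
        return to >= base && to - base < size;     // one-past-the-end is rejected
    }

private:
    std::map<std::uintptr_t, std::size_t> objects_;  // base address -> size in bytes
};
```

With an int a[4] tracked, arith_ok accepts the address of a + 3 but rejects the address of a + 4, which is the behavior described for a + N above.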

The source code of our bounds checker is split into two parts. The first part, in directory bounds/instr, operates during compilation. Specifically, file instr.cpp implements an LLVM FunctionPass that will be invoked on each Function. This pass instruments the relevant instructions to call into our runtime methods. It is incomplete; we've marked the places where your code is needed using Lab 2 TODO. File link.cpp implements a simple ModulePass that links a bitcode program with our bounds checker runtime. File main.cpp parses a bitcode program, invokes these passes on it, and writes the hardened bitcode program. You don't need to change link.cpp or main.cpp.

The second part, in directory bounds/runtime, operates at runtime. File check.h declares all methods in our runtime, and check.cpp implements these methods. It is also incomplete, and you'll need to fill in the missing code.

Thanks to the nice design of LLVM, we can instrument a program for bounds checking by writing a couple of passes in fewer than a few hundred lines of code. However, this is likely the first time you have hacked on a production-quality compiler, so be prepared to read a lot of code and programming manuals. A few tips:

To understand how LLVM invokes the methods provided by an LLVM pass, read this tutorial.
The value returned by an instruction is represented by the instruction itself. For instance, the pointer returned by an AllocaInst is represented by the AllocaInst itself. In other words, an LLVM instruction object is also a Value object, implemented via C++ inheritance. If you need to use the value returned by an instruction X in another instruction, simply use the instruction object X.
A GlobalVariable represents a pointer to the global data. This is similar to what alloca returns.
The LLVM IRBuilder and TypeBuilder can be quite handy for building instructions. See the tutorials (1, 2). These tutorials are slightly outdated, and we've noted the changes in the lab skeleton code.
Pointer arithmetic is implemented using the GetElementPtrInst instruction.
LLVM uses doxygen to generate documentation for its code. You can browse the documentation here.
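The tip that an instruction object is also the value it produces can be illustrated with a toy class hierarchy. These are not LLVM's classes: ToyValue, ToyInstruction, ToyAlloca, ToyCall, and make_tracking_call are hypothetical stand-ins that only mirror the inheritance relationship described in the tips, and the callee name is made up.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a value: anything that can be used as an operand.
struct ToyValue {
    explicit ToyValue(std::string n) : name(std::move(n)) {}
    virtual ~ToyValue() = default;
    std::string name;
};

// An instruction is a value: its result is the instruction object itself.
struct ToyInstruction : ToyValue {
    explicit ToyInstruction(std::string n) : ToyValue(std::move(n)) {}
};

// Toy stand-in for an alloca; the object represents the returned pointer.
struct ToyAlloca : ToyInstruction {
    explicit ToyAlloca(std::string n) : ToyInstruction(std::move(n)) {}
};

// Toy stand-in for a call; operands are plain ToyValue pointers.
struct ToyCall : ToyInstruction {
    ToyCall(std::string callee, std::vector<ToyValue *> args)
        : ToyInstruction(std::move(callee)), operands(std::move(args)) {}
    std::vector<ToyValue *> operands;
};

// Build a call whose argument is the alloca's result. The ToyAlloca
// object is passed directly, because it is the pointer value.
ToyCall make_tracking_call(ToyAlloca &slot) {
    return ToyCall("sketch_alloc_var", {&slot});
}
```

In LLVM itself the analogous step is passing an AllocaInst pointer wherever a Value pointer is expected as an operand; no separate "result" object has to be extracted from the instruction.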

Exercise 1. Read the LLVM tutorials. Study the bounds checker skeleton code. Complete the bounds checker by filling in the missing code in bounds/instr/instr.cpp and bounds/runtime/check.cpp.

Create 10 testcases in directory bounds/test. Your testcases should be designed to test various aspects of your bounds checker. Run your bounds checker over the testcases and report the results in answers.txt. Commit your changes using git commit.

Part 2: Preventing Buffer Overflow Attacks for a Web Server

Next we'll apply the bounds checker you built to defend against attacks on the zookws web server. For simplicity, we'll perform the attacks against zookws on workbench, instead of on the VM you downloaded in lab 1.

To get started, download, patch, and build zookws as follows:

 
jy2324@workbench:~/lab2$ git clone git://g.csail.mit.edu/6.858-lab-2012 zookws
jy2324@workbench:~/lab2$ cd zookws
jy2324@workbench:~/lab2/zookws$ wget http://www.cs.columbia.edu/~junfeng/17sp-e6121/hw/zookws.patch -O - | patch -p1
jy2324@workbench:~/lab2/zookws$ PATH="${LLVM_DIR}/bin:$PATH" make
clang -emit-llvm zookld.c -c -o zookld.bc -g -O0 -std=c99 -Wall -Werror -D_GNU_SOURCE -emit-llvm -fno-stack-protector
...

Now you can run the bounds-checking version of zookws by running:

jy2324@workbench:~/lab2/zookws$ ./clean-env.sh ./zookld zook-exstack.conf

Exercise 2. Run zookws and send a few legitimate requests to the server. Check whether your bounds checker has false positives by verifying that the server sends back correct replies. Document the false positives you encounter in answers.txt and fix them.

Run zookws again with the two exploits you created in lab 1. Check whether your bounds checker has false negatives by verifying that it successfully prevents these exploits. Document the false negatives you encounter in answers.txt and fix them.

You are done! Commit all your changes to lab 2 and generate a patch as follows:

jy2324@workbench:~/lab2$ git diff origin > submit.patch

Upload submit.patch to the submission folder of lab 2 in Courseworks.

Note: git diff does not include new (untracked) files by default. Make sure the patch contains all the code you need to submit.

E6121 Reliable Software

Spring 2017 -- Junfeng Yang
