HW3: HTTP Server

Group project
15 parts: part 0 - part 14
220 points: 20 points for each part, except for parts 0, 4, 11, 14
- part 0 is required, but not graded
- part 4, 11, 4 are optional, and not graded

Submission

One member of a group will submit a single gzipped tar file named group_xx_hw3.tar.gz where xx is your two digit group number (with leading zeros, if applicable).
- For example, group #2 and #27 should submit group_02_hw3.tar.gz and group_27_hw3.tar.gz, respectively.
- How to make a gzipped tar file: tar czvf group_xx_hw3.tar.gz hw3
- Only one person in your group submits. If that person forgets to submit, everyone gets zero, no exception. If there are more than 1 submission, you get 20% penalty. I recommend you get together with your group at the final stage, finish it and submit together.
The tar file will contain a single top level directory named hw3, which will contain a git repo, README.txt, and directories for each part. The directory structure looks like this:
```
hw3/.git
hw3/README.txt
hw3/part0
hw3/part1
hw3/part2
...
hw3/part13
hw3/part14
```
README.txt
- There should be only one README.txt, and it is in hw3 directory.
- In the beginning of README.txt, please write your group number, and list each group member’s full name and UNI in the following format: FIRST LAST <UNI@columbia.edu>
Each part’s directory will contain the following:
- http-server.c – Please do NOT change the file name.
- Any other source file(s) – You are allowed to put your code in separate source files, but it’s perfectly fine to put everything in http-server.c too.
- Makefile – Graders will simply type “make” and run the executable named “http-server”, and nothing else. If your program requires additional steps to build or is not named correctly, you will receive ZERO for that part.
- You MUST compile with -Wall -Werror flags.

Part 0: Basic single-process / single-threaded web server (Not graded)

Required tasks

Read the skeleton code provided (http-server.c). Make sure you understand the code completely.
Test http-server.c using netcat (nc)
- To install netcat in Arch Linux, run sudo pacman -S openbsd-netcat
- You may need to setup host-only networking if you’re running your http-server inside VirtualBox

Recommended tasks

Measure the performance of the basic web server
- Use a HTTP traffic generator to measure how many requests the web server can handle in a second
- You can find an open-source HTTP traffic generator
- Use a sizable file (a hi-res image or a short movie) for testing so that it takes a measurable time for a request to complete
- Note that I am not suggesting that you conduct a serious performance measurement study. Measuring performance correctly and accurately is not an easy thing to do – many researchers build their careers around it. The actual numbers from your measurements don’t mean much. Your goal here is twofold:
  1. To test your server to make sure you implemented it correctly
  2. To gain a deeper understanding of server architectures by comparing performance characteristics of different strategies
- Performance measurements are optional and will not be graded, but recommended. We will be using benchmarking tools to test your implementations.

Deliverables

None

Part 1: Multi-process web server

The basic version of the HTTP server has a limitation: it can handle only one connection at a time. This is serious limitation because a malicious client could take an advantage of this weakness and prevent the server from processing additional requests by sending an incomplete HTTP request. In this part we improve the situation by creating additional processes to handle requests.

The easiest way (from programmer’s point of view) to handle multiple connections simultaneously is to create additional child processes with the fork() system call. Each time a new connection is accepted, instead of processing the request within the same process, we create a new child process by calling fork() and let it handle the request.

The child process inherits the open client socket and processes the request, generating a response. After the response has been sent, the child process terminates.

Required tasks

Modify the code so that the web server forks after it accepts a new client connection, and the child process handles the request and terminates afterwards.
Test this implementation by connecting to it from multiple netcat clients simultaneously.

Recommended tasks

Do performance testing. Do you see any difference from part 1?

Requirements (and hints)

Note that the two socket descriptors – the server socket and the new connected client socket – are duplicated when the server forks. Make sure to close anything you don’t need as early as possible. Think about these:
- Does the parent process need the client socket? Should it close it? If so, when? If the parent closes it, should the child close it again?
- Does the child process need the server socket? Should it close it? What would happen if it doesn’t close it?
Don’t let your children become zombies… At least not for too long. Make sure the parent process calls waitpid() immediately after one or more child process have terminated.
- How do you do this? Can you call waitpid() inside the main for (;;) loop? Obviously we cannot let waitpid() block until a child process terminates – we’d be back to where we started then. You will need to call waitpid() in a non-blocking way. (Hint: look into WNOHANG flag.) But even if you make it non-blocking, can you make your parent process call it immediately after a child process terminates? What if the parent process is being blocked on accept()?
Modify the logging so that it includes the process id of the child process that handled the request.

Part 2: Interprocess communication through shared memory

Reading assignment

APUE 15.2
APUE 14.8: read page 525–527, skim or skip the rest
APUE 15.9: skim or skip page 571–575, read page 576–578
APUE 15.10

Required tasks

Modify the code so that the web server keeps request statistics. The web server should respond to a special admin URL /statistics with a statistics page that looks something like this:
```
Server Statistics

Requests: 50

2xx:      20
3xx:      10
4xx:      10
5xx:      10
```
Feel free to beautify the output.

Requirements, hints, and recommended order of tasks

Since multiple child processes will need to update the stats, you need to keep them in a shared memory segment. Use anonymous memory mapping described in APUE 15.9.
Perform the hit test from Part 1 and see if your code keeps accurate stats. The request counts may or may not be correct due to race conditions.
Now use POSIX semaphore described in APUE 15.10 to synchronize access to the stats. A few things to think about:
- POSIX semaphore can be named or unnamed. Which is a better choice here?
- Where should you put the sem_t structure?
- Are we using it as a counting semaphore or a binary semaphore?
- Is any of the semaphore functions you are calling a “slow” system call? If so, what do you need to handle?
Repeat the performance test and verify that the stats are accurate.

Part 3: Directory listing

The skeleton http-server.c does not handle directory listing. When a requested URL is a directory, it simply responds with 403 Forbidden.

Tasks

Modify the code so that it will provide directory listings.

Requirements and hints

Run /bin/ls -al on the requested directory and send out the result. You can format it in HTML if you wish, but the raw output is fine too.
In order to take the output of the ls command, you need to call pipe, fork, and exec. Arrange the file descriptors so that the ls output comes through the pipe.
Make sure you do not lose the multi-processing capability; that is, you still need to be able to serve multiple requests (whether they are files or directory listings) simultaneously.
Be diligent in closing the file descriptors that you don’t need as early as possible.
If ls encounters an error, it will print things to stderr. Make sure that the result you send to the browser includes them.

Part 4: Directory listing without running /bin/ls (0 points)

This part is optional and will not be graded.

Tasks

Serve directory listing without running /bin/ls.

Requirements and hints

This part is easy. Instead of forking and execing /bin/ls, just use opendir() and readdir() functions. See APUE 1.4 for an example.
You don’t have to mimic the output of ls -al. Just the list of filenames is fine – i.e., mimic the output of ls -a.

Part 5: Multi-threaded web server

POSIX threads provide a light-weight alternative to child processes. Instead of creating child processes to handle multiple HTTP requests simultaneously, we will create a new POSIX thread for each HTTP request.

Required tasks

Modify the original skeleton code (i.e., part0/http-server.c) so that the web server creates a new POSIX thread after it accepts a new client connection, and the new thread handles the request and terminates afterwards.
Two library functions used by the skeleton http-server.c are not thread-safe. You must replace them with their thread-safe counterparts in your code.

In your README.txt, identify the two functions and describe how you fixed them.
Test this implementation by connecting to it from multiple netcat clients simultaneously.

Recommended tasks

Perform benchmark measurements as you did in part 0 and part 1. Do you see any improvement over the skeleton version (part 0)? Any improvement over the multi-process version from part 1?

Requirements and hints

Call pthread_create() to create a new thread, passing the client socket descriptor as an argument to the thread start function.
Make sure that the newly created threads do not remain as thread zombies when they are done. You can prevent a thread from remaining as a thread zombie by either joining with it from another thread (i.e., call pthread_join()) or making it a detached thread (i.e., call pthread_detach(pthread_self())). Which method makes more sense in this situation?
Note that malloc() is not async-signal-safe, but is actually thread-safe.

Deliverables

Makefile, http-server.c, and other source files as usual
In your README.txt: how you fixed the two non-thread-safe function calls

Part 6: Pre-created pool of threads

Reading

Read the following Q&A at StackOverflow.com:

Calling accept() from multiple threads

In part 6 & 7, we will implement the two methods described in the article.

Required tasks

Instead of creating a new thread for each new client connection, pre-create a fixed number of worker threads in the beginning. Each of the pre-created worker threads will act like the original skeleton web server – i.e., each thread will be in a for(;;) loop, repeatedly calling accept().
Test this implementation by connecting to it from multiple netcat clients simultaneously.

Recommended tasks

Perform benchmark testing. How does it compare with part 5?

Requirements and hints

You can use a global array of pthread_t like this:

#define N_THREADS 16
static pthread_t thread_pool[N_THREADS];

After creating N_THREADS worker threads, make sure your main thread does not exit. A clean way to ensure it is to call pthread_join().

Part 7: Blocking queue

Required tasks

Modify the code so that only the main thread calls accept(). The main thread puts the client socket descriptor into a blocking queue, and wakes up the worker threads which have been blocked waiting for client requests to handle.
- After the main thread puts a client socket descriptor into the blocking queue, should it call pthread_cond_signal() or pthread_cond_broadcast()? Or will the server behave correctly in both ways (assuming everything else is correct)?
Test this implementation by connecting to it from multiple netcat clients simultaneously.

Recommended tasks

Perform benchmark testing. How does it compare with part 6?

Requirements and hints

You must use the following structures for your blocking queue:

/*
 * A message in a blocking queue
 */
struct message {
    int sock; // Payload, in our case a new client connection
    struct message *next; // Next message on the list
};

/*
 * This structure implements a blocking queue. 
 * If a thread attempts to pop an item from an empty queue 
 * it is blocked until another thread appends a new item.
 */
struct queue {
    pthread_mutex_t mutex; // mutex used to protect the queue
    pthread_cond_t cond;   // condition variable for threads to sleep on
    struct message *first; // first message in the queue
    struct message *last;  // last message in the queue
    unsigned int length;   // number of elements on the queue
};

Implement the following queue API:

// initializes the members of struct queue
void queue_init(struct queue *q)

// deallocate and destroy everything in the queue
void queue_destroy(struct queue *q)

// put a message into the queue and wake up workers if necessary
void queue_put(struct queue *q, int sock)

// take a socket descriptor from the queue; block if necessary
int queue_get(struct queue *q)

The members of struct queue should be accessed ONLY using the four API functions.

The main thread is in a for(;;) loop, accept()ing and putting the client socket into the queue.
The worker threads are in a for(;;) loop, taking out a socket descriptor from the queue and handling the connection.

Part 8: Listening on multiple ports

Required tasks

Modify the code so that the web server takes not just one, but multiple port numbers as command line arguments (followed by the web root as the last argument.) The web server will bind and listen on all of the ports.
Test this implementation by connecting to it from multiple netcat clients simultaneously to different ports.

Recommended tasks

Perform benchmark testing, hitting multiple ports. Compared to part 7, performance penalty should be negligible, if any.

Requirements and hints

Here is a piece code you can use in main():

if (argc < 3) {
    fprintf(stderr, 
        "usage: %s <server_port> [<server_port> ...] <web_root>\n",
        argv[0]);
    exit(1);
}

int servSocks[32];
memset(servSocks, -1, sizeof(servSocks));

// Create server sockets for all ports we listen on
for (i = 1; i < argc - 1; i++) {
    if (i >= (sizeof(servSocks) / sizeof(servSocks[0])))
        die("Too many listening sockets");
    servSocks[i - 1] = createServerSocket(atoi(argv[i]));
}

webRoot = argv[argc - 1];

The code will create server sockets for all of the ports specified in the command line, up to 31 of them. The servSocks array is initially filled with –1 so that we can tell where the list of socket descriptors ends.
- BTW, do you understand how memset(servSocks, -1, sizeof(servSocks)) fills an array of ints with –1 when it is supposed to fill the memory byte-by-byte?
In your main thread, before you call accept(), you need to find out which server sockets currently have a client pending so that you can call accept() knowing that it won’t block.
- You can accomplish that task using select() system call.
- You pass a read set containing all your server socket descriptors.
- When select() returns, you can go through the server socket descriptors, calling accept() on only those descriptors that are ready for reading.
Note that select() is special in that, even if the SA_RESTART option is specified, the select() function is not restarted under most UNIX systems. Make sure you handle this behavior properly.

Part 9: Nonblocking accept()

Part 8 has a flaw. Between select() and accept(), there is a chance that the client connection gets reset. If that happens, accept() will block. In order to handle that case, we need to make the server socket nonblocking.

Tasks

Modify the code so that createServerSocket() sets the server socket into a nonblocking mode.
- You can use fcntl() to turn on nonblocking right after you create a server socket with a socket() call.
Now accept() will never block. In those cases where it would have blocked, it will now fail with certain errno values. Read the man page to find out which errno values you need to handle.
- Also don’t forget to handle the interruption by a signal.

Part 10: Printing request statistics on SIGUSR1

Recall part 2, where we implemented a special admin URL /statistics to fetch a web server request statistics page. In this part, we will implement an alternate mechanism to print statistics.

For this part, we have to go back and start from our part 3 code, which is the last version of http-server with multiple processes (before we switched to multi-threading in part 5.)

Tasks

Modify the code from part 3 so that when the web server receives a SIGUSR1 signal, it will print the statistics at that time to standard error.
Test it by sending the signal with the kill command while the web server is blocked on accept call.
- Make sure the web server prints out the stats immediately, not when it receives the next HTTP request.
Test it by sending the signal with the kill command to a child process while the child process is in the middle of receiving an HTTP request. Describe what happens and explain why.
- You are not expected to do anything special about the child processes in this part, which means that the children will inherit the parent’s signal handler. This is not the right design. (Think about why.) You don’t have to fix the behavior. Just explain what happens and why.

Requirements and hints

Use sigaction to install a handler for SIGUSR1. You need to decide if you should set SA_RESTART flag or not.
Note that what you can do inside a signal handler is very limited. For example, you can’t call fprintf because it is not an async signal safe function.
Don’t forget to lock the semaphore when you access the stats. Again, you can’t lock stuff in signal handlers because you can then deadlock.
The web server should respond immediately to SIGUSR1 when it’s blocked on accept. If a SIGUSR1 signal comes during the short period of time between two accept() calls, it will miss it. You don’t have to handle this case.

Deliverables

Makefile, http-server.c, and other source files as usual
In your README.txt: explanation for task #3.

Part 11: Server-side bash scripts (0 points)

This part is optional and will not be graded. You may skip to part 12.

This part is a challenge for those of you hackers, who are complaining that this assignment has been too easy so far.

In this part, we will enable server-side bash scripts. When a requested URL is an executable script, the web server will run it using /bin/bash, and send back the output of the script.

The web server will ensure that the script will not run longer than a fixed amount of time. The server will also terminate the script if the HTTP client (i.e. the browser) closes the TCP connection while the script is still running.

Getting this right is actually pretty hard. You are not expected to handle every single corner cases. (In fact, our solution doesn’t handle all cases either.) But you can get close. We suggest you approach this in the following order:

Implement support for server-side scripts
- If the requested file has the execute permission, pass it as an argument to /bin/bash -c.
- This is pretty much the same as part 3. Replace /bin/ls -al with /bin/bash -c.
- You can test it with the hostinfo script provided.
Terminate the script when the HTTP connection is closed
- If the client HTTP connection gets closed while the script is still running (send() will fail in that case), you need to kill the script because there is no point running it when you don’t have anyone to send the result.
- Killing the script is a bit tricky. Since a bash script by definition will run child processes of its own, you need to send SIGTERM to all of them, not just the bash process. An easy way to achieve this is to make the bash process a group leader by calling setpgid(0, 0) (see the man page for detail), and then later sending SIGTERM to the entire group. The kill and waitpid functions have a way to refer to a group rather than an individual process.
- You can test it with the loop script provided.
If the script does not respond to SIGTERM (because it’s catching it or ignoring it), send SIGKILL.
- Set an alarm so that waitpid() is interrupted after 5 seconds.
- When you return from waitpid(), you need to check if the alarm has fired (if it did, the signal handler was just called), and send SIGKILL only if waitpid() got interrupted by SIGALRM.
- You can test it with the undying script provided.
Limit the time that the script can run even if the HTTP client is willing to wait.
- Set an alarm for 10 seconds before you begin reading the bash process’s output.
- Same SIGTERM & SIGKILL sequence as before. Thus, a script that catches SIGTERM can run up to 15 seconds if the HTTP client does not quit within 10 seconds.

Part 12: Pre-forked pool of processes

Recall that in part 6 we pre-created a pool of worker threads. Here, we will pre-fork a pool of worker child processes.

What the child processes do is also similar to what the threads did in part 6. The child processes will all be in an infinite loop repeatedly calling accept(). In part 13, we will change this model in a similar way we did in part 7. In part 7, we passed open socket descriptors to worker threads using a blocking queue. In part 13, we will pass open socket descriptors to worker processes using a UNIX domain socket.

Required tasks

Pre-fork a fixed number of processes. Each child process will run a for (;;) loop, in which it will call accept() and handle the client connection.
- After giving birth to all its children, the parent process will wait for all the child processes to terminate, which will never happen because the child processes are in a perpetual loop. See the man page for waitpid to figure out how to wait for any of one’s multiple child processes.
Change the code so that only the parent process will handle SIGUSR1 for dumping statistics.
- Here is how you can do this. Before you start forking, you set it up so that SIGUSR1 is ignored. The disposition will be inherited by the child processes when you fork. After you are done forking, you then set the signal handler. You will also have to move the code that prints the stats.

Recommended tasks

Perform benchmark testing to compare this implementation with part 6 where we had threads instead of processes. Do you see any difference? For this comparison, which would be more revealing, serving big files or tiny files, if at all?

Part 13: Passing socket descriptors to child processes

In this part, instead of all the child processes calling accept(), only the parent process will call accept(), and it will pass each connected socket to a child process (chosen by round robin) through a UNIX domain socket.

Here are sendConnection() & recvConnection() functions that sends and receives open file descriptors through a UNIX domain socket. (You don’t need to understand this code. These are for you to copy & paste, and use it in your http-server.c.)

// Send clntSock through sock.
// sock is a UNIX domain socket.
static void sendConnection(int clntSock, int sock)
{
    struct msghdr msg;
    struct iovec iov[1];

    union {
      struct cmsghdr cm;
      char control[CMSG_SPACE(sizeof(int))];
    } ctrl_un;
    struct cmsghdr *cmptr;

    msg.msg_control = ctrl_un.control;
    msg.msg_controllen = sizeof(ctrl_un.control);

    cmptr = CMSG_FIRSTHDR(&msg);
    cmptr->cmsg_len = CMSG_LEN(sizeof(int));
    cmptr->cmsg_level = SOL_SOCKET;
    cmptr->cmsg_type = SCM_RIGHTS;
    *((int *) CMSG_DATA(cmptr)) = clntSock;

    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov[0].iov_base = "FD";
    iov[0].iov_len = 2;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    if (sendmsg(sock, &msg, 0) != 2)
        die("Failed to send connection to child");
}

// Returns an open file descriptor received through sock.
// sock is a UNIX domain socket.
static int recvConnection(int sock)
{
    struct msghdr msg;
    struct iovec iov[1];
    ssize_t n;
    char buf[64];

    union {
      struct cmsghdr cm;
      char control[CMSG_SPACE(sizeof(int))];
    } ctrl_un;
    struct cmsghdr *cmptr;

    msg.msg_control = ctrl_un.control;
    msg.msg_controllen = sizeof(ctrl_un.control);

    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov[0].iov_base = buf;
    iov[0].iov_len = sizeof(buf);
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    for (;;) {
        n = recvmsg(sock, &msg, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            die("Error in recvmsg");
        }
        // Messages with client connections are always sent with 
        // "FD" as the message. Silently skip unsupported messages.
        if (n != 2 || buf[0] != 'F' || buf[1] != 'D')
            continue;

        if ((cmptr = CMSG_FIRSTHDR(&msg)) != NULL
            && cmptr->cmsg_len == CMSG_LEN(sizeof(int))
            && cmptr->cmsg_level == SOL_SOCKET
            && cmptr->cmsg_type == SCM_RIGHTS)
            return *((int *) CMSG_DATA(cmptr));
    }
}

Required tasks

Pre-fork a fixed number of processes. Each child process will run a for (;;) loop, in which it will call recvConnection() and handle the client connection it receives.

Recommended tasks

Perform benchmark testing to compare this implementation with part 7 where we had a fixed number of worker threads receiving open sockets from a blocking queue.

Requirements and hints

The parent process, when it forks the child processes, will also create the same number of connected UNIX domain socket pairs, one pair for each child process.
- Close the unused end of the socket pair from parent and child process.
The parent process will no longer call waitpid. Instead, it will be in a for (;;) loop repeatedly calling accept(). After each accept(), it will pick a child process by round robin, and call sendConnection() to pass the client socket.
- The parent process can safely close the client socket when sendConnection() returns (even if the child process might have not received it yet.) The descriptor has been “dup”ed when the underlying sendmsg() call has returned.
- The child process will also close the client socket when it’s done handling the connection (by closing the FILE* wrapper of it.)

Part 14: Daemonization (0 points)

This part is optional and will not be graded.

In this part, we will make our web server a daemon process. Daemons in UNIX systems are programs that run as background processes typically providing essential system services to users and other programs. See APUE chapter 13 for more information.

Tasks

Daemonizing your web server is super-easy. Here are all you have to do:

At program start-up (i.e. in the beginning of the main() function maybe after checking arguments), call daemonize() from APUE 13.3.
The daemonize() function will detach the running process from its controlling terminal, so printing to stdout or stderr won’t work anymore. You need to replace the printf() and fprintf() statements with syslog(), described in APUE 13.4.

If you are doing this part, I recommend that you read APUE chapter 13 to learn about daemon processes.

Good luck!

Acknowledgment

This series of assignments were co-designed by Jae Woo Lee and Jan Janak as a prototype for a mini-course on advanced UNIX systems and network programming.

Jan Janak wrote the solution code.

Jae Woo Lee is a lecturer, and Jan Janak is a researcher, both at Columbia University. Jan Janak is a founding developer of the SIP Router Project, the leading open-source VoIP platform.

Last updated: 2016–02–07