Simplified HTTPS

The full HTTP protocol is complex. You will be implementing a very small subset. The basic syntax of a command is:

<verb> <url> <version>
<option-lines>
<newline>

where <verb> will be either GET or POST, <option-lines> is zero or more non-empty lines, and <newline> is an empty line. Note that lines may be terminated by \r\n; you should always simply ignore \r characters. The server will read a command line and then an arbitrary number of non-empty lines terminated by an empty line. In real HTTP, those non-empty lines are used to pass assorted options. By accepting and ignoring them, you can use a standard web browser to talk to your stripped-down server. One of these lines must be accepted and interpreted: Content-Length: (and yes, the colon is mandatory). Note that the option name is case-insensitive; you must accept, e.g., conTent-lEngth: as well.
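As an illustration, here is a minimal Python sketch of how a server might read the command line and option lines. It assumes the connection has been wrapped in a binary file object (for example with socket.makefile("rb") on the TLS socket); the helper names read_line and read_request_head are hypothetical, not a required interface. It drops every \r, ignores all option lines except Content-Length:, and matches that name case-insensitively.

```python
import io


def read_line(stream):
    """Read one line from a binary stream, dropping carriage returns and the final newline.

    Returns None if end-of-file is reached before any byte is read.
    """
    raw = stream.readline()
    if raw == b"":
        return None
    return raw.replace(b"\r", b"").rstrip(b"\n").decode("latin-1")


def read_request_head(stream):
    """Read '<verb> <url> <version>' and the option lines up to the empty line.

    Returns (verb, url, content_length); content_length is None when no
    Content-Length: line was sent. Raises ValueError on malformed input.
    """
    command = read_line(stream)
    if command is None:
        raise ValueError("end-of-file before the command line")
    parts = command.split()
    if len(parts) != 3 or parts[0] not in ("GET", "POST"):
        raise ValueError("malformed command line: %r" % command)
    verb, url, _version = parts  # the version is ignored on receipt

    content_length = None
    while True:
        line = read_line(stream)
        if line is None:
            raise ValueError("end-of-file before the empty line")
        if line == "":
            break  # the empty line ends the option lines
        name, colon, value = line.partition(":")
        # Every other option is accepted and ignored; only Content-Length:
        # is interpreted, and its name is matched case-insensitively.
        if colon and name.lower() == "content-length":
            value = value.strip()  # optional white space after the colon
            if not (value.isascii() and value.isdigit()):
                raise ValueError("Content-Length: is not a string of digits")
            content_length = int(value)
    return verb, url, content_length


if __name__ == "__main__":
    demo = io.BytesIO(b"POST /msg HTTP/1.1\r\nconTent-lEngth: 5\r\n\r\nhello")
    print(read_request_head(demo))  # ('POST', '/msg', 5)
```

Decoding option lines as latin-1 means stray non-ASCII bytes from a browser cannot crash the parser; since those lines are ignored anyway, their exact decoding does not matter.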

<version> is, of course, the version number; always send HTTP/1.0 but ignore the version on receipt.

A <url> is a URL. All URLs for this project will be of one of two forms:

https://host/file/path[?parameters]

https://host:port/file/path[?parameters]

The host field is, of course, a hostname or IP address. Always assume port 443 unless a port field (a string of digits) is present. Why? You may find it useful to use a second port number when doing client-side authentication. The meaning of the /file/path string is up to you, but you can (and should) impose reasonable length restrictions.
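One possible way to pick apart the two URL forms, again as a hedged Python sketch: parse_url and its 256-character path limit are illustrative choices, not requirements. It falls back to port 443 when no port field is present, rejects a port that is not a string of digits, and (by assumption) does not handle bracketed IPv6 literals.

```python
def parse_url(url, max_path=256):
    """Split https://host[:port]/file/path[?parameters] into its parts.

    Returns (host, port, path, params): port defaults to 443, and params is
    the raw text after '?', or None if there is no '?'. max_path is an
    arbitrary example limit. IPv6 literals such as [::1] are not handled.
    """
    prefix = "https://"
    if not url.startswith(prefix):
        raise ValueError("URL must begin with https://")
    rest = url[len(prefix):]
    slash = rest.find("/")
    if slash == -1:
        raise ValueError("URL has no /file/path part")
    hostport, target = rest[:slash], rest[slash:]

    host, colon, port_text = hostport.partition(":")
    if not host:
        raise ValueError("URL has an empty host")
    if colon:
        # A port field is present: it must be a string of digits.
        if not (port_text.isascii() and port_text.isdigit()):
            raise ValueError("port is not a string of digits")
        port = int(port_text)
        if not 0 < port < 65536:
            raise ValueError("port out of range")
    else:
        port = 443  # default when no port field is given

    path, qmark, params = target.partition("?")
    if len(path) > max_path:
        raise ValueError("file path too long")
    return host, port, path, (params if qmark else None)
```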

The parameter list is a ? followed by &-separated keyword=value sequences; we've all seen these. It is up to you whether to use parameters; I suspect you will find it easier not to. Note that passwords and other secret values must not be passed in URLs, because URLs are often logged.
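If you do decide to use parameters, splitting them is short work. The sketch below takes the raw text after the ? and returns a dictionary; parse_params is a hypothetical helper, and it deliberately does no percent-decoding.

```python
def parse_params(params):
    """Turn 'keyword1=value1&keyword2=value2' into a dict of strings.

    params is the raw text after '?', or None. No percent-decoding is done.
    """
    result = {}
    if not params:
        return result
    for item in params.split("&"):
        keyword, eq, value = item.partition("=")
        if not eq or not keyword:
            raise ValueError("malformed parameter: %r" % item)
        result[keyword] = value
    return result
```

If you need percent-decoding, Python's standard urllib.parse.parse_qsl already does it.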

A GET request simply asks the server to send some data; the URL identifies which data. A POST request is used when uploading data. When using POST, the Content-Length: line must be used. After the colon and optional white space, there is a length in bytes given as a sequence of digits; that denotes the number of bytes of data to read. You will use this, for example, when uploading a message. If the server receives an end-of-file indication before that number of bytes, it may discard everything.
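Once the Content-Length: value has been parsed, the server must read exactly that many bytes. A minimal Python sketch, assuming a binary file object such as socket.makefile("rb"); read_body is a hypothetical name. On a premature end-of-file it returns None and throws away whatever was read, as the paragraph above allows.

```python
import io


def read_body(stream, length):
    """Read exactly `length` bytes of POST data from a binary stream.

    Returns None if end-of-file arrives first, in which case everything
    already read is discarded.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            return None  # premature end-of-file: discard the partial data
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_body(io.BytesIO(b"hello world"), 5))  # b'hello'
    print(read_body(io.BytesIO(b"hi"), 5))           # None
```

The loop matters for unbuffered streams, where a single read may return fewer bytes than requested before end-of-file.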

I do not require any particular format for input; I strongly suggest that you make it as simple as possible, e.g., a line for the username, a line for the password, etc.
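To make the "one value per line" suggestion concrete, here is a hedged Python sketch of a client building such a POST (for instance a username and a password, kept out of the URL) and of the matching split on the server. build_post, parse_lines and the UTF-8 encoding are assumptions for illustration. Note that Content-Length: counts bytes of the encoded body, not characters.

```python
def build_post(url, lines):
    """Build a POST request whose body holds one value per line.

    Content-Length: is the byte count of the encoded body, not the number
    of characters or lines.
    """
    for line in lines:
        if "\n" in line or "\r" in line:
            raise ValueError("a value may not contain a line break")
    body = "".join(line + "\n" for line in lines).encode("utf-8")
    head = "POST %s HTTP/1.0\r\nContent-Length: %d\r\n\r\n" % (url, len(body))
    return head.encode("ascii") + body


def parse_lines(body):
    """Split a received body back into values, ignoring carriage returns."""
    text = body.decode("utf-8").replace("\r", "")
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    return lines
```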

A real web server can receive and send back either ASCII or binary data, depending on option lines; don't do that. Instead, have your programs know from context what's coming and decode it accordingly. I strongly suggest using simple hexadecimal to send binary data, though you can use base64 if you're really concerned about efficiency (you shouldn't be for this project).
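In Python both encodings are one call each in the standard library. The wrapper below is a sketch (encode_binary and decode_binary are hypothetical names) that turns binary data, such as a certificate or a signature, into a single ASCII line and back; hex is the default, with base64 as an option.

```python
import base64


def encode_binary(data, use_base64=False):
    """Encode bytes as one ASCII line: hex by default, base64 if requested."""
    if use_base64:
        return base64.b64encode(data).decode("ascii")
    return data.hex()


def decode_binary(text, use_base64=False):
    """Decode a line produced by encode_binary; raises ValueError on bad input."""
    text = text.strip()  # tolerate a trailing newline or \r
    if use_base64:
        return base64.b64decode(text, validate=True)
    return bytes.fromhex(text)
```

Hex doubles the size of the data and base64 adds about a third, which is negligible at the scale of this project.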

A response from the server is:

<version> <status-code> <text>
<option-lines>
<newline>
<body>

where <version> will be HTTP/1.0 (but accept anything), <status-code> is a 3-digit number, and <text> should be ignored for non-error situations and displayed to the user for error situations. 

Always send 200 as the status code, but accept any status code whose first (decimal) digit is a 2 as indicating success. If the first digit is a 3, there must be a Location: option line showing a new URL to go to instead. Why? This allows web servers to redirect you to a different URL, e.g., one that includes a port number as shown above. A Content-Length: option indicates that the server is sending back data, e.g., a message or a certificate; interpret it as above.
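On the server side, a success response is small enough to build in one step. This Python sketch (build_response is a hypothetical name) always sends HTTP/1.0 and status 200, and includes a Content-Length: line so the client knows how many bytes of body follow.

```python
def build_response(body=b"", text="OK"):
    """Build a success response: always HTTP/1.0 and status 200.

    A Content-Length: line tells the client how many bytes of body
    (e.g. a message or a certificate) follow the empty line.
    """
    head = "HTTP/1.0 200 %s\r\nContent-Length: %d\r\n\r\n" % (text, len(body))
    return head.encode("ascii") + body
```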

Status codes beginning with 4 or 5 are error codes; display <text> to the user and exit.
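The client-side rules can be sketched as follows, assuming the response arrives on a binary file object. read_response_head and interpret_status are hypothetical names. Option names are lower-cased so that Location: and Content-Length: are found regardless of case, and the function returns a tag rather than exiting, leaving it to the caller to display the error text and exit.

```python
import io


def read_response_head(stream):
    """Read '<version> <status-code> <text>' and the option lines.

    Returns (status_line, options); option names are lower-cased so lookups
    are case-insensitive. Carriage returns are ignored.
    """
    def next_line():
        raw = stream.readline()
        if raw == b"":
            raise ValueError("end-of-file inside the response head")
        return raw.replace(b"\r", b"").rstrip(b"\n").decode("latin-1")

    status_line = next_line()
    options = {}
    while True:
        line = next_line()
        if line == "":
            break  # the empty line ends the option lines
        name, colon, value = line.partition(":")
        if colon:
            options[name.lower()] = value.strip()
    return status_line, options


def interpret_status(status_line, options):
    """Map a status line to ("ok", None), ("redirect", url) or ("error", text)."""
    parts = status_line.split(" ", 2)  # any version is accepted
    code = parts[1] if len(parts) >= 2 else ""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("malformed status line: %r" % status_line)
    text = parts[2] if len(parts) == 3 else ""
    if code[0] == "2":
        return ("ok", None)  # any 2xx means success; the text is ignored
    if code[0] == "3":
        location = options.get("location")
        if not location:
            raise ValueError("redirect without a Location: line")
        return ("redirect", location)
    if code[0] in "45":
        return ("error", text)  # caller displays text to the user and exits
    raise ValueError("unexpected status code %s" % code)


if __name__ == "__main__":
    reply = io.BytesIO(b"HTTP/1.0 404 No such mailbox\r\n\r\n")
    print(interpret_status(*read_response_head(reply)))  # ('error', 'No such mailbox')
```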

Here is an actual transcript of me connecting to the CS department web server and getting a 301 redirect:

$ telnet www.cs.columbia.edu 80
Trying 128.59.11.206...
Connected to webcluster.cs.columbia.edu.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 301 Moved Permanently
content-length: 0
location: https:///
connection: close

Connection closed by foreign host.

Everything up to and including the "Escape character" line, plus the last line, comes from the telnet command itself rather than from the web server.

Servers need not send Content-Length:; if they don't, read until end-of-file. As before, ignore all \r characters.
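Putting the two cases together, a client might read the response body like this. The sketch assumes a buffered binary stream (for example socket.makefile("rb")), where read(n) returns fewer than n bytes only at end-of-file, and an options dictionary whose names are already lower-cased; read_response_body is a hypothetical name. Dropping \r from the body is safe here only because bodies in this project are text (lines or hex), never raw binary.

```python
import io


def read_response_body(stream, options):
    """Read a response body given lower-cased option names from the head.

    With Content-Length: read exactly that many bytes (None on early
    end-of-file); without it, read until end-of-file. Carriage returns are
    dropped, which is safe because bodies here are text (lines or hex).
    """
    value = options.get("content-length")
    if value is None:
        data = stream.read()  # no length given: read until end-of-file
    else:
        if not (value.isascii() and value.isdigit()):
            raise ValueError("Content-Length: is not a string of digits")
        length = int(value)
        data = stream.read(length)
        if len(data) < length:
            return None  # end-of-file before the promised number of bytes
    return data.replace(b"\r", b"")


if __name__ == "__main__":
    print(read_response_body(io.BytesIO(b"line1\r\nline2\r\n"), {}))  # b'line1\nline2\n'
```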