Communication Protocol for Wireless Group Data Sharing
(yet to be named)

Purpose
- allow a host to create requests for data when not connected to internet
- allow a host to service these requests

Basic Implementation Ideas
The protocol is intended to make it appear to the user that a computer is connected to the internet when it actually is not. This is achieved by switching over seamlessly to a peer-to-peer network (i.e. one with no direct internet access). Within this network, hosts can then share data as if they were connected to the internet.

To make this switchover seamless to the user, the IP driver on the wireless NIC (Network Interface Card) needs to start communicating with its peers using this protocol. The user should not have to do anything to make this switchover happen. At most, he or she may notice that internet connectivity is gone and that access to data may be limited.

For the case of the Vadem Clio and probably most other devices, one NIC needs to support both off-line and direct connectivity. This issue will need to be resolved at some point, but it will most likely involve work at the NIC driver level which is closed-source and copyrighted code. Considering the fact that we should be using Linux laptops at some point soon, getting the wireless communication working with this protocol is more important than thinking about the switch-over from internet connectivity to no connectivity.

Technology
UDP packets sent to an IP multicast address. The system-wide default public address is 233.19.5.0 (picked arbitrarily), but private groups (for institutions or certain types of common interests) are welcome to define their own. Default groups for topics ranging from sports to current news will be defined later (see "Channels").

Protocol Header
The header for version 0 will be as follows:

Note: the "|" spacers don't count as bits

 0    4    8                        32                 80             96
 |    |    |                        |                  |              |
+----+----+------------------------+------      ------+----------------+
| 1  | 2  |           3            |    4  ...   4    |       5        |
+----+----+------------------------+------      ------+----------------+
  4    4              24                    48                16


 number | length | description
        | (bits) |
--------+--------+---------------------------------------------------
   1    |   4    | version number (currently 0)
   2    |   4    | type of packet (request, response, etc.)
   3    |   24   | ID number
   4    |   48   | sender's MAC (Ethernet Hardware) address
   5    |   16   | sequence number


This makes the header length 96 bits (12 bytes).
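
As an illustration of the layout above, here is a minimal sketch of packing and unpacking the 12-byte header. It assumes network (big-endian) byte order and that field 1 occupies the most significant bits of the first byte, neither of which is fixed by this document yet. The function names are hypothetical.

```python
import struct

HEADER_LEN = 12  # 96 bits


def pack_header(version, ptype, ident, mac, seq):
    """Pack the five header fields into 12 bytes (big-endian is an assumption)."""
    if not (0 <= version < 16 and 0 <= ptype < 16):
        raise ValueError("version and type are 4-bit fields")
    if not 0 <= ident < (1 << 24):
        raise ValueError("ID number is a 24-bit field")
    if len(mac) != 6:
        raise ValueError("MAC address is 48 bits (6 bytes)")
    if not 0 <= seq < (1 << 16):
        raise ValueError("sequence number is a 16-bit field")
    # Fields 1-3 share the first 32 bits: 4 + 4 + 24.
    first_word = (version << 28) | (ptype << 24) | ident
    return struct.pack("!I6sH", first_word, mac, seq)


def unpack_header(data):
    """Return (version, type, id, mac, sequence) from the start of a packet."""
    if len(data) < HEADER_LEN:
        raise ValueError("packet is shorter than the header")
    first_word, mac, seq = struct.unpack("!I6sH", data[:HEADER_LEN])
    return (first_word >> 28, (first_word >> 24) & 0xF,
            first_word & 0xFFFFFF, mac, seq)
```

Packing fields 1 to 3 into a single 32-bit word keeps the bit arithmetic in one place; the MAC address and sequence number then fall on byte boundaries.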

Protocol Header Layout
1) Version Number
The version number for this incarnation is 0. The purpose of this field is to allow the protocol header to be altered in the future. With this, multiple versions of the protocol can operate at the same time, just like IPv4 and IPv6 can coexist. The intent is to only increase version numbers if the header format changes or if there are drastic changes in the protocol.

The first thing any client should do when it receives a packet is read this version number and check whether it supports it.
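
A minimal sketch of that check, assuming the version number sits in the upper four bits of the first byte (as the header diagram suggests). The names SUPPORTED_VERSIONS, packet_version and is_supported are hypothetical.

```python
SUPPORTED_VERSIONS = {0}  # this document only defines version 0


def packet_version(packet: bytes) -> int:
    """Read the 4-bit version number from the first byte of a packet."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] >> 4  # assumption: version is the high nibble


def is_supported(packet: bytes) -> bool:
    """True if this client understands the packet's header version."""
    return packet_version(packet) in SUPPORTED_VERSIONS
```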

2) Type of Packet
Note: This is the area of greatest uncertainty for version 0 of the protocol. As well thought-out as this section may be, it will surely be changed and expanded often in the early stages of this project.

The following table maps each packet type to its code:

 4-bit code |       name        |     description
 in binary  |                   |
============+===================+==============================================
    0000    | document request  | Represents general request for a
            |                   | document.
------------+-------------------+----------------------------------------------
    0001    | have document     | Response to a document request containing the
            |                   | date of the document. This enables the client
            |                   | to choose the newest document or document
            |                   | from a specific time period.
------------+-------------------+----------------------------------------------
    0010    | specific document | A request for a specific document from a 
            | request           | specific host. This is a response to the 
            |                   | "have document" command.
------------+-------------------+----------------------------------------------
    0011    | document send     | Transmits the document that was
            |                   | requested.
------------+-------------------+----------------------------------------------
    0100    | packet request    | Represents a request for a specific
            |                   | packet. Using the
            |                   | sequence number, this is used to retransmit
            |                   | lost packets.
------------+-------------------+----------------------------------------------
    0101    | packet response   | Response to a specific packet request.
------------+-------------------+----------------------------------------------
    0110    | EOL message       | Indicates that the document has been
            |                   | received fully.
------------+-------------------+----------------------------------------------
The body for each of these packet types is described below in "Packet Body".

Using 4-bit identifiers for the packet type allows 2^4 = 16 packet types which should be plenty for future expansion.
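
For reference, the table above could be written down in code roughly as follows. This is only a sketch; the constant names are made up here, and unassigned codes are reported as None so that a client can ignore packet types it does not know.

```python
from enum import IntEnum


class PacketType(IntEnum):
    """The seven packet types defined for version 0 (names are illustrative)."""
    DOCUMENT_REQUEST = 0b0000
    HAVE_DOCUMENT = 0b0001
    SPECIFIC_DOCUMENT_REQUEST = 0b0010
    DOCUMENT_SEND = 0b0011
    PACKET_REQUEST = 0b0100
    PACKET_RESPONSE = 0b0101
    EOL_MESSAGE = 0b0110


def decode_type(code: int):
    """Map a 4-bit code to a PacketType, or None if the code is unassigned."""
    try:
        return PacketType(code)
    except ValueError:
        return None
```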

3) ID Number
The purpose of this number is to uniquely identify the requests and responses that are sent. A host should choose a random 8-bit number and couple it with the last 2 bytes (16 bits) of its current IP address, which together fill the 24-bit field. This should almost uniquely identify requests that are made in the network.

A response to a request will use this exact number as its ID.
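
One way to build such an ID is sketched below. It assumes that the "last 2 digits" of the IP address means its last two bytes, and that the random byte goes in the most significant position; neither is settled by this document. The function name make_id is hypothetical.

```python
import ipaddress
import secrets


def make_id(ip: str, random_byte=None) -> int:
    """Combine a random 8-bit number with the last two bytes of an IPv4 address."""
    if random_byte is None:
        random_byte = secrets.randbelow(256)
    if not 0 <= random_byte < 256:
        raise ValueError("random part must fit in 8 bits")
    last_two = ipaddress.IPv4Address(ip).packed[-2:]
    # Assumed layout: random byte in the top 8 bits, address bytes below it.
    return (random_byte << 16) | int.from_bytes(last_two, "big")
```

For example, a host at 192.168.5.17 that draws the random byte 0xAB would use the ID 0xAB0511.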

4) Sender's MAC Address
Although this is an IP-based system, MAC addresses will need to be used to uniquely identify hosts. This is because IP addresses can change if a user moves out of a certain area and into another one.

5) Sequence Number
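The sequence number is needed to be able to reassemble the data in order. UDP does not make any guarantees that the packets will arrive in the order that they are sent, so the application layer protocol (this protocol) needs to maintain data consistency.

Sequence numbers are relevant to "document send" packets and to the "packet request" and "packet response" packets that retransmit them. The "document request", "have document" and "specific document request" packets leave the field unused.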

Packet Contents
0) General Information
The packet header is defined in the previous section. The following subsections will explain the body for each type of request and specify any special subtleties in the header for that type of packet. The total number of bytes available in the body is 1460 bytes, assuming a 1500-byte Ethernet MTU and a 20-byte IPv4 header without options:

           1500       -       20   -   8   -    12  = 1460
            MTU               IP      UDP      this
(Maximum Transmission Unit) header   header   header

1) 0000 - Document Request
Header: The requester chooses a randomly generated ID number for the request to be placed in the header. The sequence number field is unused.

Body: Since packets for each topic are sent on a different multicast address, the body of the Document Request packet does not need to name a topic. It contains only the piece of data being requested and a date representing the oldest acceptable version of the document.

The piece of data being requested will be in the form of a Uniform Resource Locator (URL) to allow maximum flexibility in terms of what is being requested. Not only can regular HTML files be requested via an "http://..." URL, but FTP requests could be made as well as other new services which use a URL to identify themselves. More information on Uniform Resource Identifiers (URIs) and URLs can be found in RFC 2396 at http://www.ietf.org/rfc/rfc2396.txt

The separator for the URL and date will be a semicolon (";"). If the need for a semicolon arises in the URL, it should just be placed there without any "\;" or similar escape sequence, because the date is assumed not to contain any semicolons.

The date will be sent as a 64-bit long representing the number of milliseconds since the epoch, January 1, 1970. If it does not matter how old the document is, one should send 0x1000000000000000.
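
The body format described above might be built and parsed as in the following sketch. It assumes the order "URL;date", a signed big-endian encoding for the 64-bit date, and an ASCII-encoded URL; these details are not fixed by this document. Because the date has a fixed length, the parser reads it from the end of the body, so semicolons inside the URL need no escaping. The function names are hypothetical.

```python
import struct


def build_request_body(url: str, oldest_ms: int) -> bytes:
    """Return URL + ';' + 8-byte date (milliseconds since January 1, 1970)."""
    return url.encode("ascii") + b";" + struct.pack("!q", oldest_ms)


def parse_request_body(body: bytes):
    """Split a Document Request body into (url, oldest_ms)."""
    # The last 8 bytes are the date; the byte before them is the separator.
    if len(body) < 9 or body[-9:-8] != b";":
        raise ValueError("malformed Document Request body")
    (oldest_ms,) = struct.unpack("!q", body[-8:])
    return body[:-9].decode("ascii"), oldest_ms
```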

2) 0001 - Have Document
Header: The ID number has to be the same as the one from the document request. Again, the sequence number field is unused, since the body will fit into 1 packet.

Body: The "Have Document" response will reply with the date of the document that the host has stored so that the original requester can then decide whether it wants to actually have that document. In addition, a 64-bit unsigned long will be used to represent the number of subsequent packets that will be sent. This number is used so the receiver knows how many packets to expect and when one or more packets have been missed.
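
A possible encoding of this body is sketched below. It assumes the date comes first (in the same 64-bit millisecond format as in the Document Request), followed directly by the unsigned 64-bit packet count, both big-endian. The field order and the function names are assumptions.

```python
import struct

HAVE_BODY_FORMAT = "!qQ"  # signed 64-bit date, unsigned 64-bit packet count


def build_have_body(doc_date_ms: int, packet_count: int) -> bytes:
    """Encode the document date and the number of packets that will follow."""
    return struct.pack(HAVE_BODY_FORMAT, doc_date_ms, packet_count)


def parse_have_body(body: bytes):
    """Return (doc_date_ms, packet_count) from a Have Document body."""
    if len(body) != struct.calcsize(HAVE_BODY_FORMAT):
        raise ValueError("Have Document body must be 16 bytes")
    return struct.unpack(HAVE_BODY_FORMAT, body)
```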

3) 0010 - Specific Document Request
Header: The ID number again is the same as from the previous two packets. The requester does not create a new random number for this request. Once again, the sequence number field is unused.

Body: The "Specific Document Request" packet allows a host to ask a specific other host for a specific document with a specific date. The intent is for the original requester to have selected a host to get the data from and then get it.

The format for this request will have the MAC address of the recipient host (the one that has the data) as the first and only field.

4) 0011 - Document Send
Header: The ID number is also unchanged here. It has to be the same as from the previous packets. The sequence number is important though and is discussed below.

Body: There will probably be multiple packets of this type for every request. Each of the packets should have the same ID number. The data is split up into chunks of the maximum body size (see "General Information" above), with the final chunk possibly shorter, and each chunk is placed in its own packet. The sequence number starts at 0 and increases by 1 each time a new packet is sent out for this response (ID number), allowing assembly of the data even for cases where packets arrive out of sequence.

The end of the data will be signified by a packet that is not the maximum size or if necessary by a completely empty packet. In case this last packet is lost and the recipient is left dangling, there should be a timeout and the requester should use the "Packet Request" packet to re-request this packet.
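
The splitting and end-of-data rules above can be sketched as follows. The maximum body size is passed in as a parameter rather than hard-coded, and the function name split_document is hypothetical.

```python
def split_document(data: bytes, max_body: int):
    """Return a list of (sequence_number, chunk) pairs for Document Send packets."""
    if max_body <= 0:
        raise ValueError("max_body must be positive")
    chunks = [data[i:i + max_body] for i in range(0, len(data), max_body)]
    # The end of the data is signalled by a packet that is not full. If the
    # last chunk is exactly full (or there is no data), add an empty packet.
    if not chunks or len(chunks[-1]) == max_body:
        chunks.append(b"")
    if len(chunks) > (1 << 16):
        raise ValueError("document needs more packets than a 16-bit sequence number allows")
    return list(enumerate(chunks))
```

With a 3-byte body, for example, the 8 bytes "abcdefgh" become three packets ("abc", "def", "gh"), while the 6 bytes "abcdef" need a third, empty packet to mark the end.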

5) 0100 - Packet Request
Header: The ID number for this packet request needs to be the same as the one that was used in the data stream from which this packet was lost and is being re-requested. The sequence number in this "Packet Request" packet is the sequence number that is being re-requested.

Body: The body of this packet contains the URL of the document and the MAC address of the host that it's requesting from, separated by a ";".

6) 0101 - Packet Response
Header: The ID number and sequence number are those from the "Packet Request" packet, since this is a response to that request.

Body: The body contains the data for the packet that was re-requested. It follows the same guidelines as the "Document Send" packet.
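
On the receiving side, deciding which packets to re-request and putting the document back together might look like the sketch below. It assumes the receiver keeps the chunks in a dictionary keyed by sequence number and knows the expected packet count from the "Have Document" response. The function names are hypothetical.

```python
def missing_sequences(received: dict, expected_count: int):
    """Sequence numbers that have not arrived yet and should be re-requested."""
    return [seq for seq in range(expected_count) if seq not in received]


def reassemble(received: dict, expected_count: int) -> bytes:
    """Join the chunks in sequence order once every packet has arrived."""
    missing = missing_sequences(received, expected_count)
    if missing:
        raise ValueError("still missing packets: %r" % missing)
    return b"".join(received[seq] for seq in range(expected_count))
```

Chunks delivered by "Document Send" and by "Packet Response" packets can go into the same dictionary, since both carry the sequence number of the data they contain.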

7) 0110 - EOL message
Header: The ID number and sequence number are those from the "Packet Request" packet, since this is a response to that request.

Body: The body only contains a 64-bit unsigned long which indicates the number of unique information-carrying packets sent (type 0011) so that the client can verify that they have been received.

Protocol Implementation in an Application
Sending a query
Since a wireless network is inherently lossy, we may need to resend queries. There could easily be situations where a packet just didn't reach a host that actually could have replied. By the same token, a reply could have gotten lost, so it is important for a host to resubmit its query.

This raises the question of how often queries should be resent and at what interval. THIS IS OPEN FOR DISCUSSION.

Getting a query and replying
A host chooses to listen to certain multicast addresses ("channels", see section "Channels"). There is a system-wide default channel (233.19.5.0 as mentioned in the "Technology" section) that each host should listen to by default.
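
Listening to a channel amounts to joining the multicast group on a UDP socket. The sketch below uses the standard socket options for this. Note that this document does not define a UDP port yet, so the port argument is purely a placeholder, and the function names are hypothetical.

```python
import socket

DEFAULT_CHANNEL = "233.19.5.0"  # system-wide default channel


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_channel(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives packets sent to the given channel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # port is an assumption; none is specified here
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A host that listens to several channels would call open_channel once per multicast address, or add several memberships to one socket.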

For each packet that a host comes across, it needs to examine the packet to see whether (in this case) it is a query for data. The packet type for this is "0000". If it is, the host should look up whether it has the data and reply accordingly if it does. The reply is of type "0001" and the host then continues to scan the packets on the multicast address.

The requesting host now takes some time to gather all the replies it gets and pick the one it wants. If it chooses a host X, a "Specific Document Request" packet (type "0010") will then be sent to the multicast address which the host X should pick up and reply to by a series of "Document Send" (type "0011") packets.
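
How the requesting host picks among the replies is left to the application. One simple policy, sketched below, is to take the host offering the newest document that is not older than the requested date. The function name choose_host and the (MAC, date) tuple format are assumptions made for this example.

```python
def choose_host(replies, oldest_allowed_ms=0):
    """Pick the MAC address of the host with the newest acceptable document.

    replies is a list of (mac, doc_date_ms) pairs collected from
    "Have Document" packets. Returns None if no reply is acceptable.
    """
    eligible = [reply for reply in replies if reply[1] >= oldest_allowed_ms]
    if not eligible:
        return None
    newest = max(eligible, key=lambda reply: reply[1])
    return newest[0]
```

The chosen MAC address is what goes into the body of the "Specific Document Request" packet.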

Channels
TBD

Outstanding Issues
1) Short headers. Leaving out the sequence number for everything except for a "document send" where it is needed makes the packet size smaller for most packets. However, there isn't going to be much data in those packets anyway, so leaving out 16 bits or 2 bytes is not a huge issue.


© 2000 Columbia University
Written by: Tim Trampedach (ttt5@cs.columbia.edu)
Last updated: Friday, 05-May-2000 14:04:49 EDT