INTERNET DRAFT                                           9 January 1995

         A Profile for the Transmission of Video Data over RTP

                         Document Revision 0.3
                      Revision Date: 02 Nov 1994

                           Frank Kastenholz
                          FTP Software, Inc
                            2 High Street
                  North Andover, Mass 01845-2620 USA
                            kasten@ftp.com

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   ``working draft'' or ``work in progress.''

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net,
   nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the
   current status of any Internet Draft.

   This is a working document only; it should neither be cited nor
   quoted in any formal document.

   This document will expire before 14 July 1995.  Distribution of
   this document is unlimited.  Please send comments to the author(s).

IETF Exp.                  14 July 1995                         [Page 1]

Internet Draft        A Video Transmission Profile          January 1995

1.  INTRODUCTION

   This document presents the specification for Loki, a profile to
   carry video traffic over RTP[1].  Loki is an experimental protocol
   developed at FTP Software to conduct research into the issues
   surrounding the development, performance, and use of network video
   applications in the PC/Microsoft Windows environment.

   Several factors in that environment affected the decision to
   develop our own video profile, as opposed to doing a straight port
   of NV, or at least a re-implementation of NV's protocols in a
   Windows application, and then influenced the design of that
   protocol:
   1. A main element of the PC/Windows environment is Microsoft's
      "Video for Windows."  Video for Windows is a set of libraries
      and APIs that provide a common environment for writing video
      applications.  Of most concern to network video is a "standard"
      API for controlling video capture devices and obtaining captured
      images.  The use of Video for Windows imposes certain
      constraints on the use of the video-capture hardware, including
      the formats of the data received from that hardware.

   2. PC/Windows machines are rather limited in their performance when
      compared to typical Unix workstations.  Low speed (20, 25, 33
      MHz) 386 processors with 4 or 8 Meg of memory are still very
      common.  There are other performance considerations as well,
      such as Windows' use of 16-bit mode, the use of DOS, and so on.

   3. Network interface adaptors, and their drivers, exhibit widely
      varying levels of performance.  Furthermore, the PC world has
      the notion of "server" cards and "client" cards, with "server"
      cards tending to exhibit higher performance than "client" cards.
      Of course, "server" cards are also more expensive.  In many
      instances, it turns out that the key performance issue is the
      card and driver used by the PC.

   4. The byte ordering native to PCs is reversed when compared to the
      standard Network Byte Order.  Whilst Loki headers are all in
      Network Byte Order, one particular pixel format type is
      transferred as 16-bit integers in PC order rather than Network
      Byte Order.

   5. The programming environment in Windows is vastly different from
      the typical X/Unix environment for which NV was written.
      Therefore, a straight port of NV was eliminated from
      consideration rather early on.

   6. A system should be able to receive and display Loki video
      transmissions without any special display processing hardware
      and with a minimum of processing.
1.1.  Note On Terminology

   While network video is symmetric - a node can be both the sender of
   video data to others and the receiver/displayer of data received
   from others - this symmetry is composed of many asymmetric
   relationships.  That is, a single, two-way video conference between
   two people is actually composed of two, one-way conferences.

   The following terms are used in this document to describe the two
   nodes of a single, one-way, video conference:

   SOURCE
      The SOURCE is the node that is actually transmitting video image
      data.

   SINK
      The SINK is the node(s) that actually receives the video data
      from the SOURCE and displays it (or otherwise processes it).

   Note that a single physical node can be both a SOURCE and a SINK.
   Furthermore, both SOURCEs and SINKs will transmit and receive
   packets.  E.g., both will send and receive SDES packets; a SOURCE
   will send SR packets and receive RR packets from the SINKs, while a
   SINK receives the SR packets and sends RR packets.

1.2.  Change Log

   The following changes have been made to the Loki specification
   since the previous document.

   (1)  The Loki header is no longer considered an RTP Header
        Extension.

   (2)  The Loki header has been rearranged since it is no longer an
        RTP header extension (the length field is no longer needed).
        In addition, some of the unused fields have been removed to
        save space.  Fields have been gratuitously rearranged to make
        full use of the 4 bytes before the version number.  This way
        the version number is kept 'in position', allowing easy
        version differentiation.

   (3)  The Loki Header version number has been changed to version 2.

   (4)  The Can I See request packet has been restructured.  The
        packet now includes a field specifying the network address (IP
        Address/UDP Port) to send the Can I See video stream to.  This
        is needed because of the curious behavior of some PC TCP/IP
        stacks w.r.t. multicasts.
   (5)  The version number for the Loki RTCP Application header has
        been changed to version number 2.

   (6)  A description of the Can I See function has been added.

2.  PROTOCOL SPECIFICATION

   This chapter specifies the Loki video protocol.

2.1.  Byte Order

   Unless otherwise mentioned, all data are transferred in Network
   Byte Order, as specified in [2].

2.2.  Video Data

   This section specifies the protocol used to carry the video data.
   Loki video data packets are carried in RTP Data Packets[1].  Loki
   adds an additional header to the packet, behind the RTP header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         RTP Header...                         |
   :                                                               :
   |                        ... RTP Header                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Loki Header...                        |
   :                                                               :
   |                        ... Loki Header                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Loki Data...                          |
   :                                                               :

                            RTP Data Packet

              Each tick mark represents one bit position.

2.2.1.  Packet Size

   Loki implementations MUST be able to send and receive packets
   containing at least 1,000 bytes of Loki Data.

2.2.2.  RTP Header Profile

   This section specifies the use of several fields within the RTP
   header.

2.2.2.1.  Marker Bit

   The Marker (M) bit of the RTP header is not used by Loki.  Loki
   implementations should not generate packets with the M bit set, and
   they should ignore the bit in received packets.

2.2.2.2.  Payload Type

   The payload type field for Loki packets is TBD.  (Current
   experiments use the value 0x11, but this is subject to change.)

2.2.2.3.  Extension Bit

   The RTP Header Extension (E) bit is not used by Loki.

2.2.3.  Loki Header

   This section specifies the Loki header (which follows the RTP
   header).
   All Loki video packets contain the following header between the RTP
   header and the actual video data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Width             |            Height             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |    Unused 1   |            Format             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Loki Video Header

              Each tick mark represents one bit position.

   Where:

   Width
      This field contains the width of the image, in pixels.  By
      carrying the width and height information in every data packet,
      the image size can be changed dynamically during the session.
      All Loki SINKs MUST support this.

   Height
      This field contains the height of the image, in pixels.  By
      carrying the width and height information in every data packet,
      the image size can be changed dynamically during the session.
      All Loki SINKs MUST support this.

   Version
      This field contains the version number of the Loki protocol.
      The version of the protocol specified in this document is 2.
      The value 0 is explicitly reserved for research use.  Any
      received packet with an invalid version number, or a number
      identifying a version that is not supported, MUST be discarded.

   Unused 1
      This field is currently unused.  It is present to preserve the
      alignment of the following fields.

   Format
      This field contains a value that describes the video data
      format.  Loki allows for several different video data formats,
      covering both how each pixel is encoded and how the video frame
      is chopped up and transmitted.  All of these formats are
      described in the "Video Encoding Specifications" chapter, below.

2.3.  RTCP Extensions

   With the exception of application-defined packets (see next
   section), Loki does not extend any of the RTCP packets in any way.
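   The 8-byte Loki video header of section 2.2.3 is fixed-format, so
   it can be illustrated with a short sketch (Python is used here
   purely for illustration; the field names follow the diagram, while
   the helper names are my own):

```python
import struct

LOKI_VERSION = 2

def pack_loki_header(width, height, fmt):
    """Pack the 8-byte Loki video header in Network Byte Order."""
    # !HHBBH = 16-bit Width, 16-bit Height, 8-bit Version,
    #          8-bit Unused 1 (sent as 0), 16-bit Format.
    return struct.pack("!HHBBH", width, height, LOKI_VERSION, 0, fmt)

def parse_loki_header(data):
    """Return (width, height, format), or None if the packet must be
    discarded because the version is invalid or unsupported."""
    width, height, version, _unused, fmt = struct.unpack("!HHBBH", data[:8])
    if version != LOKI_VERSION:
        return None
    return width, height, fmt
```

   The `!` prefix selects big-endian packing, matching the 2.1 rule
   that all header fields are in Network Byte Order.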
2.4.  Loki Control Packets

   Loki defines several control packets.  These are all carried in
   RTCP APP packets.  Loki control packets are identified by the four-
   character string "loki" (in 7-bit ASCII) in the "name" field of the
   RTCP APP packet[1].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RTCP Flags  | RTCP Payload  |          RTCP Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Application Name -- "loki"                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Version    |            Unused             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Type Specific...                        |

                       Loki Control Packet Header

              Each tick mark represents one bit position.

   The RTCP fields, the SSRC field, and the Application Name field are
   as described for the RTCP APP packet in the RTP specification[1].

   Type
      This field identifies the type of Loki Control packet this is.
      The specific types are described in the following sections.

   Version
      This field identifies the version of the Loki Control protocol.
      The version number described in this document is version 2.
      This field must contain a 2.

   Unused
      This field is unused.

2.4.1.  Discard

   This is the DISCARD Loki Control packet.  If this packet is
   received it must be discarded.  No error conditions or
   notifications may be generated as a result of receiving this
   packet.  This packet must be included in any packet accounting done
   by the receiver (e.g., bytes or packets received on the control
   port).

   There are no extensions to the Loki Control Packet Header.  If an
   application appends more data, that data MUST be ignored by the
   receiver.

   This packet is packet type 0.
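   The Loki-specific portion of a control packet (the "loki" name plus
   the Type/Version/Unused word of section 2.4) can be sketched as
   follows.  The RTCP APP framing that precedes it is left to the RTCP
   layer, and the helper name is illustrative:

```python
import struct

LOKI_CTRL_VERSION = 2
TYPE_DISCARD = 0        # packet type codes from sections 2.4.1 - 2.4.4
TYPE_SUSPEND = 1
TYPE_CNIC_REQUEST = 2
TYPE_CNIC_STATUS = 3

def loki_control_body(pkt_type, type_specific=b""):
    """Build the name + Type/Version/Unused portion of a Loki control
    packet, followed by any type-specific fields."""
    body = b"loki"                                              # 4-byte ASCII name
    body += struct.pack("!BBH", pkt_type, LOKI_CTRL_VERSION, 0)  # Type, Version, Unused
    return body + type_specific
```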
2.4.2.  Suspend

   This packet tells the SINKs that the SOURCE has temporarily stopped
   transmission but will probably resume shortly.  A SINK may then
   notify the human users of this condition.

   This packet is packet type 1.

   A SOURCE should retransmit this packet periodically while
   suspended.  The exact retransmission strategy is not important.
   One strategy which implementation experience has shown to be
   effective is for a SOURCE to send an initial burst of 3 or 4
   Suspend packets at short intervals (such as 100ms) and then go to a
   longer period, such as once a minute.

   There are no extensions to the Loki Control Packet Header.  If an
   application appends more data, that data MUST be ignored by the
   receiver.

2.4.3.  Can I See Request

   This packet is used to submit a "Can I See" request to a known
   video source.  The operation of the Can I See function is described
   in a following section.

   The Can I See (CNIC) request packet is packet type 2.

   The CNIC Request packet adds the following fields to the basic Loki
   Control Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Loki Control Packet Header ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Request Handle                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Requested Frame Count     |        Desired Format         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Requested Epsilon       |  Addr Family  |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Server Csrc                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RTCP Request Address...                    |
   |                   ...RTCP Request Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RTP Request Address...                     |
   |                    ... RTP Request Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          CNIC Request Packet

              Each tick mark represents one bit position.

   Where

   Request Handle
      This is a 32-bit value assigned by the SINK to the request.  The
      SOURCE will include this value in any Can I See status packets
      to assist the SINK in matching status indications to requests.

   Requested Frame Count
      This is the desired number of video frames which the SINK wishes
      to see.  The SOURCE must make every effort to send this number
      of frames to the SINK.

      A frame count of 0 indicates that the SINK wishes the SOURCE to
      immediately stop video transmission.

      A frame count of 65535 (0xffff) indicates that the SINK wishes
      the SOURCE to send video data indefinitely.

   Desired Format
      The video transmit format (see "Video Encoding Specifications")
      which the SINK wishes to receive.  A SOURCE is not required to
      use this value in satisfying the request; it is merely an
      indication of the SINK's desire.

   Requested Epsilon
      This field carries a desired value to use for epsilon in any
      pixel comparisons done in satisfying the request.  See the
      section on "Pixel Comparisons", below, for a description of the
      use of the epsilon value.  A SOURCE is not required to use this
      value in satisfying the request; it is merely an indication of
      the SINK's desire.

   Address Family
      This field identifies the address family for the Request
      Address.  The following values are supported:

      0    Reserved.
      1    UDP/IPv4
      2    Reserved for UDP/IPv6

   Flags
      This field contains some flag bits:

      0x01 Indicates that the Server's CSRC field actually contains a
           valid CSRC.

   Server's CSRC
      This field contains the Content Source that identifies the
      server to which the request is directed.  This field may assist
      Mixers/Bridges and Translators in passing requests.
      No particular CSRC value is reserved, so a separate flag bit has
      been defined to indicate that the field actually contains a
      CSRC.

   RTP Request Address
   RTCP Request Address
      These two fields indicate where the responses should be sent.
      The actual format of the fields depends on the address family.
      Currently, only the UDP/IPv4 family is specified:

      +--------+--------+--------+--------+
      |             IP Address            |
      +--------+--------+--------+--------+
      |     UDP Port    |      unused     |
      +--------+--------+--------+--------+

      There are two such fields.  The first one contains the
      addressing information specifying where RTCP packets are to be
      sent.  The second one contains the information specifying where
      the RTP data should be sent.

      If any element of the Request Address fields contains 0, then
      the value should be derived from the source address and port
      information of the packet.  For example, if the IP Address
      fields are 0, then the IP address to which the data are sent
      should be the source address of the packet containing the
      request.

2.4.4.  Can I See Status

   This packet is used to indicate status information to a "Can I See"
   requestor.

   The Can I See (CNIC) Status packet is packet type 3.

   The CNIC Status packet adds the following fields to the basic Loki
   Control Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Loki Control Packet Header ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Request Handle                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Status Code          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          CNIC Status Packet

              Each tick mark represents one bit position.

   Where

   Request Handle
      Is the Request Handle field of the CNIC Request packet to which
      this status indication applies.
      This information may assist the SINK in matching requests with
      status indications.

   Status Code
      This is the actual status code.  The following codes are
      defined:

      0 - NO-OP
         This is a no-op.  It may be ignored.

      1 - No Resources
         This indicates that the CNIC request is being rejected
         because the SOURCE does not have the resources to honor the
         request.

      2 - Disabled
         This indicates that the CNIC request is being rejected
         because the CNIC service at the SOURCE is either
         administratively disabled or not supported at all.

      3 - Too Many Frames
         This value indicates that the request is being rejected
         because the requested number of frames exceeds an
         administrative limit set at the SOURCE on the number of
         frames allowed for any single request.

      4 - Prohibited to You
         This value indicates that the request is being rejected
         because the SOURCE is administratively prohibited from
         honoring requests from the SINK.

      5 - Terminated
         This code is sent to the SINK when the SOURCE has completed
         honoring the request.

      6 - Too Many Requests
         This value indicates that the request is being rejected
         because the request would exceed the SOURCE's administrative
         limit on the number of requests that it can handle.

3.  Video Encoding Specifications

   This section describes the data formats used by Loki.  There are
   two separate parts to the data format specifications: the
   specifications of the formats for the data for the individual
   pixels (called Pixel Formats), and the specifications for how the
   pixel data are assembled, formed into packets, and transmitted
   (called Transfer Formats).

3.1.  Pixel Formats

   This section contains the specifications of how individual pixels
   are represented.

3.1.1.  24 Bit RGB

   In this format the pixel is transferred as a 24-bit RGB triple.
   Each of the Red, Green, and Blue values is transferred as a single
   8-bit quantity, each one in a single byte.  The elements are
   transferred in the order Blue, Green, Red:

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
   |      Blue     | |     Green     | |      Red      |
   +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+

                       24-bit RGB Pixel Format

                  Each tick-mark represents one bit

   Note that this format is in the color ordering that is native to
   Microsoft Windows.

3.1.2.  16 Bit RGB

   This format transfers each pixel as a 16-bit RGB value.  Each of
   the Red, Green, and Blue values is transferred in 5 bits.  The
   three 5-bit values are packed into a single 16-bit element for
   transmission.  The elements are packed as follows:

    0                               1
    0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |G2 G1 G0 B4 B3 B2 B1 B0 xx R4 R3 R2 R1 R0 G4 G3|
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

                       16-bit RGB Pixel Format

                  Each tick-mark represents one bit

   Where:

   xx
      Is an unused bit.

   B0...B4
      Are the 5 bits representing the BLUE value, B0 being the least
      significant bit and B4 the most significant bit.

   G0...G4
      Are the 5 bits representing the GREEN value, G0 being the least
      significant bit and G4 the most significant bit.

   R0...R4
      Are the 5 bits representing the RED value, R0 being the least
      significant bit and R4 the most significant bit.

   This bit and byte ordering is an artifact of the Intel '86
   processor family's byte ordering.  The ordering is "correct" for
   this processor family in that, when treating this pixel as a single
   16-bit integer value, the BLUE bits are the least significant, the
   RED bits the most significant, and all of the GREEN bits are
   adjacent.

3.1.3.  8 Bit Monochrome

   This format transfers each pixel as a single 8-bit monochrome
   value.  Each value is transferred in a single byte.
   One possible method for conversion from a 24-bit RGB value would
   be:

      mono_value = (pixel.red + pixel.green + pixel.blue) / 3;

3.1.4.  4 Bit Monochrome

   This format transfers each pixel as a single 4-bit monochrome
   value.  Each transferred byte contains two pixels' worth of data:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |Pixel 1|Pixel 2|
   +-+-+-+-+-+-+-+-+

                    4-bit Monochrome Pixel Format

   One possible method for conversion from a 24-bit RGB value would
   be:

      mono_byte = (((pixel[0].red + pixel[0].green + pixel[0].blue)
                       / 3) & 0xf0)
                + (((pixel[1].red + pixel[1].green + pixel[1].blue)
                       / 3) >> 4);

3.1.5.  8 Bit Palette

   This pixel format type is reserved for future use.  The intent is
   to use it to transfer image data using Microsoft Windows' 8-bit
   palettized data format.

3.1.6.  4 Bit Palette

   This pixel format type is reserved for future use.  The intent is
   to use it to transfer image data using Microsoft Windows' 4-bit
   palettized data format.

3.1.7.  411 YUV

   This format is available from some video capture hardware (notably
   Intel's Indeo card).  In this format there are 8 bits of combined
   Luminance/Y data per pixel.  In addition, for every 2 pixels there
   is one U and one V bit.

   When transferring 411 YUV data in the "Simple Transfer Format" (see
   the following section), the data are transferred exactly as
   received from the video capture hardware.  Specifically, for an N
   pixel image:

   -  N bytes of Luminance/Y data, followed by

   -  N/16 bytes of V data (1 bit per 2 pixels), followed by

   -  N/16 bytes of U data (also 1 bit per 2 pixels).

   When transferring 411 YUV data in the "Block Transfer Format" (see
   the following section), all data for a single 8x8 pixel block are
   transferred together.
   Specifically, each 64-pixel block is transferred as:

   -  64 bytes of Luminance/Y data, followed by

   -  4 bytes of V Chrominance Data (1 bit for every 2 pixels),
      followed by

   -  4 bytes of U Chrominance Data (1 bit for every 2 pixels).

   The Luminance/Y bytes are transferred in the following order (given
   in X/Y coordinates within the 8x8 block):

      (0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (0,6), (0,7),
      (1,0), (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (1,7),
      (2,0), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (2,7),
      (3,0), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (3,7),
      (4,0), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (4,7),
      (5,0), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (5,7),
      (6,0), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6), (6,7),
      (7,0), (7,1), (7,2), (7,3), (7,4), (7,5), (7,6), (7,7)

   The U and V data are then transferred as follows:

      Bit Number   Pixel Coordinates
          0        (0,0), (0,1)
          1        (0,2), (0,3)
          2        (0,4), (0,5)
          3        (0,6), (0,7)
          4        (1,0), (1,1)
          5        (1,2), (1,3)
          6        (1,4), (1,5)
          7        (1,6), (1,7)
          8        (2,0), (2,1)
          9        (2,2), (2,3)
         10        (2,4), (2,5)
         11        (2,6), (2,7)
         12        (3,0), (3,1)
         13        (3,2), (3,3)
         14        (3,4), (3,5)
         15        (3,6), (3,7)
         16        (4,0), (4,1)
         17        (4,2), (4,3)
         18        (4,4), (4,5)
         19        (4,6), (4,7)
         20        (5,0), (5,1)
         21        (5,2), (5,3)
         22        (5,4), (5,5)
         23        (5,6), (5,7)
         24        (6,0), (6,1)
         25        (6,2), (6,3)
         26        (6,4), (6,5)
         27        (6,6), (6,7)
         28        (7,0), (7,1)
         29        (7,2), (7,3)
         30        (7,4), (7,5)
         31        (7,6), (7,7)

3.2.  Transfer Formats

   This section describes how the data for the individual pixels
   (encoded per the previous section) are assembled together into
   packets for transmission.  There are two transfer formats, simple
   and block.  In simple mode, the data as received from the video
   capture hardware are gathered up into chunks and transmitted "as
   is".
   In block mode, the image data are broken up into 8x8 pixel cells,
   with some "diff" analysis and some backgrounding done.

   There are, in fact, two "simple" modes of transfer.  One is for all
   video formats EXCEPT the Indeo format.  The other is JUST for the
   Indeo YVU format.  The reason that there is a separate Simple mode
   for Indeo is that Indeo represents the pixels in YVU format and it
   breaks up the three values (Y, V, and U) into separate locations in
   the video capture buffers.  The "non-Indeo" Simple Mode requires
   that all the data for a specific pixel be transferred together, so
   sending Indeo YVU data in this mode would require that three
   separate chunks of data be brought together, transmitted, and then
   broken up again.  The role of the Simple Mode is to be simple, and
   this all seemed rather complex.  In addition, since the Indeo YVU
   data format uses 9 bits for each pixel, it would be very likely
   that a non-integral number of bytes would have to be transferred,
   further complicating matters.

3.2.1.  Simple Mode

   This mode applies to all data formats except the Indeo data format.

   Simple Mode is a simple transfer of all the data received from the
   video capture hardware.  No additional compressions or
   transformations are performed on the data.

   In Simple Mode, data from the video capture hardware are gathered
   up into blocks of up to 255 pixels.  Each block is preceded by a
   4-byte "Video Element Header" which gives the length of the element
   and that element's position within the image.

   Each packet MUST contain an integral number of pixels.  That is to
   say, for encodings that use more than one byte to represent a
   pixel, all bytes for a pixel must be in the same packet.  E.g., if
   24-Bit RGB data is being transmitted, it is NOT allowed to send the
   bytes containing the Blue and Green information in packet N and the
   byte containing the Red information in packet N+1.
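   The Simple Mode chunking rules above (blocks of at most 255 pixels,
   each block a whole number of pixels, 4 bytes of per-block header)
   can be sketched as follows; the function name is illustrative, and
   with a 1,000-byte budget and 3-byte RGB-24 pixels the sketch
   reproduces the 998-byte worked example given below:

```python
ELEMENT_HEADER_SIZE = 4      # each block is preceded by a 4-byte header
MAX_PIXELS_PER_BLOCK = 255   # the Pixel Count field is 8 bits; 0 means "no pixels"

def chunk_pixels(total_pixels, bytes_per_pixel, max_payload):
    """Split an image into (pixel_count, bytes_on_wire) blocks fitting one
    packet budget, keeping every block to a whole number of pixels."""
    blocks = []
    remaining = total_pixels
    budget = max_payload
    while remaining and budget >= ELEMENT_HEADER_SIZE + bytes_per_pixel:
        fit = (budget - ELEMENT_HEADER_SIZE) // bytes_per_pixel
        count = min(remaining, fit, MAX_PIXELS_PER_BLOCK)
        size = ELEMENT_HEADER_SIZE + count * bytes_per_pixel
        blocks.append((count, size))
        remaining -= count
        budget -= size
    return blocks
```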
   The format of the Video Element Header is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Pixel Count  |      X Coordinate     |      Y Coordinate     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Video Element Header

              Each tick mark represents one bit position.

   Where

   Pixel Count
      This is the number of pixels being transferred under this
      header.  The maximum number of pixels is 255.  A count of 0
      means "no pixels".

   X Coordinate
      This is the X coordinate (0-based) of the first pixel in the
      block being transferred.  Coordinate (0,0) is the upper-left
      corner of the screen.

   Y Coordinate
      This is the Y coordinate (0-based) of the first pixel in the
      block being transferred.  Coordinate (0,0) is the upper-left
      corner of the screen.

   A given packet may have more than one block of data in it.  For
   example, if an image is being transmitted in RGB-24 format, and a
   packet may have up to 1,000 data bytes, the packet could contain:

   1. A 4-byte Video Element Header,

   2. 765 bytes of RGB-24 data (255 pixels at 3 bytes per pixel),

   3. A second 4-byte Video Element Header, and

   4. 225 more bytes of RGB-24 data (75 pixels at 3 bytes per pixel).

   This gives a total of 998 bytes of data.  The remaining 2 bytes are
   not used, since the packet must hold an integral number of pixels
   and the pixel format for this example uses 3 bytes per pixel.

3.2.2.  Indeo Simple Mode

   When sending Indeo-formatted pixels in the Simple Mode, a single
   additional header is added to the packet.  This header is added
   after the Loki Video Header and before the image data.
   The header's format is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Buffer Size                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Offset                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where

   Buffer Size
      Is the total size, in bytes, of the buffer needed to hold all of
      the Indeo data.

   Offset
      Is the offset (0-based) in the Indeo data buffer of the first
      byte in this packet.

   The rest of the packet contains the image data.

3.2.3.  Block Mode

   Each block is preceded by a 4-byte header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |      X Coordinate     |      Y Coordinate     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Block Mode Header

              Each tick mark represents one bit position.

   Where

   Flags
      Contains flag information:

      0x80 Insert the background image information (stored in the
           SINK's background buffer) into the display at the location
           specified in the X/Y coordinates.  This is the
           DISPLAY_BACKGROUND_DATA flag in the algorithm presented
           below.

      0x40 Copy the data for this block from the packet into the
           SINK's background buffer.  This is the COPY_DATA_TO_BG flag
           in the algorithm presented below.

   X Coordinate
      This is the X coordinate (0-based) of the block being
      transferred.  This coordinate is in units of 8x8 blocks; thus,
      the 8x8 block at coordinates (1,1) would be at pixel location
      (8,8) in the image.  Coordinate (0,0) is the upper-left corner
      of the screen.

   Y Coordinate
      This is the Y coordinate (0-based) of the block being
      transferred.  This coordinate is in units of 8x8 blocks; thus,
      the 8x8 block at coordinates (1,1) would be at pixel location
      (8,8) in the image.  Coordinate (0,0) is the upper-left corner
      of the screen.
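   Building the 4-byte Block Mode header can be sketched as follows.
   Note that the 8/12/12 bit split for Flags and the two coordinates
   is read off the diagram rather than stated explicitly in the text,
   so it is an assumption here, as are the helper names:

```python
# Flag bits from the Flags field of the Block Mode Header
DISPLAY_BACKGROUND_DATA = 0x80
COPY_DATA_TO_BG = 0x40

def pack_block_header(flags, block_x, block_y):
    """Pack Flags (8 bits) and the X/Y block coordinates (assumed 12 bits
    each) into a 4-byte big-endian Block Mode header."""
    word = (flags << 24) | ((block_x & 0xFFF) << 12) | (block_y & 0xFFF)
    return word.to_bytes(4, "big")
```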
   An advantage of block mode is that the quantum of transfer is a
   single block of 64 pixels.  Thus it allows for improved efficiency
   when sending only image changes.  A SOURCE might examine only 2 or
   3 pixels of each block and, if any of those pixels had changed,
   send the entire block.

3.3.  Format Values

   The Loki Header Format field may contain the following values,
   describing the format of the data carried in the packet:

      +-------+----------+----------------+
      | Value | Transfer | Pixel Encoding |
      |       | Format   | Format         |
      +-------+----------+----------------+
      |   0   |         Reserved          |
      +-------+----------+----------------+
      |   1   | Simple   | 24 Bit RGB     |
      +-------+----------+----------------+
      |   2   | Block    | 24 Bit RGB     |
      +-------+----------+----------------+
      |   3   | Simple   | 16 Bit RGB     |
      +-------+----------+----------------+
      |   4   | Block    | 16 Bit RGB     |
      +-------+----------+----------------+
      |   5   | Simple   | 8 Bit Palette  |
      +-------+----------+----------------+
      |   6   | Block    | 8 Bit Palette  |
      +-------+----------+----------------+
      |   7   | Simple   | 4 Bit Palette  |
      +-------+----------+----------------+
      |   8   | Block    | 4 Bit Palette  |
      +-------+----------+----------------+
      |   9   | Simple   | 8 Bit Mono     |
      +-------+----------+----------------+
      |  10   | Block    | 8 Bit Mono     |
      +-------+----------+----------------+
      |  11   | Simple   | 4 Bit Mono     |
      +-------+----------+----------------+
      |  12   | Block    | 4 Bit Mono     |
      +-------+----------+----------------+
      |  13   |          Unused           |
      +-------+----------+----------------+
      |  14   | Simple   | Raw Indeo      |
      +-------+----------+----------------+
      |  15   | Block    | Raw Indeo      |
      +-------+----------+----------------+

   In keeping with the robustness principle, if some value other than
   the ones specified above is received in a packet, a receiver will
   ignore the packet.
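   The table above, together with the 16-bit RGB packing of section
   3.1.2 (the one field sent in PC byte order rather than Network Byte
   Order), can be illustrated with a sketch; the names are
   illustrative:

```python
import struct

# Format field values from the table above: value -> (transfer, encoding)
FORMATS = {1: ("Simple", "24 Bit RGB"),   2: ("Block", "24 Bit RGB"),
           3: ("Simple", "16 Bit RGB"),   4: ("Block", "16 Bit RGB"),
           5: ("Simple", "8 Bit Palette"), 6: ("Block", "8 Bit Palette"),
           7: ("Simple", "4 Bit Palette"), 8: ("Block", "4 Bit Palette"),
           9: ("Simple", "8 Bit Mono"),   10: ("Block", "8 Bit Mono"),
           11: ("Simple", "4 Bit Mono"),  12: ("Block", "4 Bit Mono"),
           14: ("Simple", "Raw Indeo"),   15: ("Block", "Raw Indeo")}

def pack_rgb16(r5, g5, b5):
    """Pack 5-bit R, G, B into the 16-bit pixel of section 3.1.2 and emit
    it low byte first (PC order), giving the G2..G0 B4..B0 / xx R4..R0
    G4 G3 byte layout shown in the diagram."""
    value = (r5 & 0x1F) << 10 | (g5 & 0x1F) << 5 | (b5 & 0x1F)
    return struct.pack("<H", value)
```

   A receiver would drop any packet whose Format value is absent from
   such a table, per the robustness rule above.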
4. Algorithms

   This chapter describes several key algorithms in the FTP
   Software implementation of Loki.  In general, these algorithms
   have no effect on interoperability; however, they can improve
   performance and are offered on an informational basis.
   Implementation experience has shown them to be effective.

4.1. Pixel Comparisons

   Some video digitization hardware introduces a fairly large
   amount of "noise" into the system.  Even if a scene is truly
   unchanging, the numeric values for the pixels of the scene can
   vary over a fairly large range.  So, when comparing two pixel
   values (such as the "current" and "previous" values for
   block-mode transmissions), a false "unequal" may occur because
   of this noise.

   Therefore, in order to suppress this noise, pixel values are
   compared not for "strict equality" but for "close equality".
   Specifically, comparisons are done as:

      if ((pixel_a <= (pixel_b + epsilon)) &&
          (pixel_a >= (pixel_b - epsilon))) {
              The pixels are considered "equal"
      }

   One particular piece of video-capture hardware has a "noise
   level" of +/- 32 when capturing RGB-24 images.  That is, when
   capturing an unchanging scene, the values of a pixel's
   components in two successive images are deemed to be "unequal"
   for epsilon values less than 32.

   Other than tending to reduce the amount of data that needs to be
   transmitted, this comparison algorithm has no effect on the
   protocol or the network.  However, in order to support the
   algorithm, the "Can I See" request includes a field that permits
   the requestor to specify a desired "epsilon" value.

4.2. Adaptive Transmit

   In order to make the best of changing network conditions and
   differing receiver capabilities, an adaptive transmission
   algorithm is used.
   This algorithm operates at the SOURCE, which evaluates the
   performance perceived by the SINKs.  The SOURCE inserts an
   artificial delay between transmitted packets and varies this
   delay in an attempt to maximise the throughput and frame rate
   while minimising the loss rate.

   The algorithm uses a high and a low "receiver quality
   threshold".  If the receiver quality falls below the low
   threshold, the algorithm increases the added inter-packet gap
   until the quality rises above the high threshold (or until some
   administrative limit on the size of the gap is reached).  If the
   receiver quality exceeds the high threshold, the gap is reduced
   until the quality falls below the high threshold (or the added
   gap is reduced to 0).  The intent is to keep the receiver
   quality within some range.

   The algorithm operates as follows:

   (1)  The SINKs all periodically report their reception
        characteristics to the SOURCE via the RR packet.  This
        packet includes, among other things, a count of the number
        of packets actually received by the SINK as well as a count
        of the number of packets expected by the SINK (based on the
        RTP sequence number field).  This is a standard function of
        RTP.

   (2)  For each known SINK, the SOURCE maintains a "percent
        received" value:

           Number of Packets Received by the Sink * 100
           --------------------------------------------
             Number of Packets Expected by the Sink

   (3)  Periodically, the SOURCE takes the average "percent
        received" value over all known SINKs.  This value, Q, is
        then fed into the following algorithm:

           static BOOL correcting = FALSE;

           if (Q > HI_THRESHOLD) {
               decrease interpacket gap;
               correcting = FALSE;
           } else if (Q < LOW_THRESHOLD) {
               correcting = TRUE;
               increase interpacket gap;
           } else if (correcting == TRUE) {
               increase interpacket gap;
           }

   The two values HI_THRESHOLD and LOW_THRESHOLD define the desired
   performance at the SINKs.
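The threshold logic above can be rendered as a small state machine. This is an illustrative sketch: the threshold values, step size, and administrative bound on the gap are implementation choices assumed here, not part of the protocol.

```python
HI_THRESHOLD = 95   # percent received; illustrative value
LOW_THRESHOLD = 85  # illustrative value
GAP_STEP_MS = 1     # adjust the gap in small increments
MAX_GAP_MS = 50     # administrative bound on the added gap


class AdaptiveGap:
    """Track the added inter-packet gap at the SOURCE."""

    def __init__(self):
        self.gap_ms = 0
        self.correcting = False

    def update(self, q):
        """q is the average percent-received over all known SINKs."""
        if q > HI_THRESHOLD:
            # Quality is good: back the gap off, stop correcting.
            self.gap_ms = max(0, self.gap_ms - GAP_STEP_MS)
            self.correcting = False
        elif q < LOW_THRESHOLD:
            # Quality is poor: widen the gap until Q recovers.
            self.correcting = True
            self.gap_ms = min(MAX_GAP_MS, self.gap_ms + GAP_STEP_MS)
        elif self.correcting:
            # Between thresholds but still recovering: keep widening.
            self.gap_ms = min(MAX_GAP_MS, self.gap_ms + GAP_STEP_MS)


g = AdaptiveGap()
g.update(80)  # below LOW_THRESHOLD: start correcting
g.update(90)  # between thresholds while correcting: keep increasing
g.update(98)  # above HI_THRESHOLD: back off
```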
   If the overall SINK performance, Q, falls below LOW_THRESHOLD,
   the inter-packet gap is steadily increased until Q rises above
   HI_THRESHOLD.  If Q is above HI_THRESHOLD, the inter-packet gap
   is decreased until Q falls below that value.

   Increasing and decreasing the inter-packet gap is done in fairly
   small increments.  Significant increases in image quality at the
   SINKs have been observed after adding only a few milliseconds of
   gap between packets.  Ideally, the gap would be adjusted by the
   smallest effective time quantum available on the SOURCE system.
   Obviously, an implementation would wish to bound the maximum
   amount of time added as an inter-packet gap.

   Decreasing the gap has the effect of increasing the load on the
   network.  To be a "good network citizen", one should decrease
   the gap more slowly than one increases it - perhaps only every
   n'th pass through the algorithm (with n > 1).

   The basis for this algorithm is similar to that of the Van
   Jacobson congestion control algorithm for TCP: networks are
   generally very reliable and do not lose packets because of error
   conditions; lost packets are the result of congestion someplace
   in the network.  Therefore, to reduce packet loss rates (the
   goal for video transmission, since reduced loss means better
   image quality), we must reduce congestion.

   Network video transmission by PCs has a bi-modal network load
   characteristic.  There is a period of time when a frame is
   captured and digitized and no network traffic is generated.
   Then there is a period of time during which the digitized frame
   is packetized and transmitted.  The congestion that must be
   reduced is the congestion that occurs during this second period,
   when the frame is actually transmitted.  Simply reducing the
   frame capture rate, or increasing the compression, will not
   solve this problem.
   Decreasing the frame capture rate will only increase the length
   of the quiet time.  Increasing the compression on the frames
   will only reduce the length of time during which data are
   transmitted (reducing the number of packets and bytes sent).
   However, experience has indicated that packet loss occurs
   because packets arrive at interfaces (receiving hosts or
   intermediate routers) too fast -- the interface cannot be
   'turned around' in time to receive the next packet.  The chosen
   solution was to add a delay between packet transmissions.

   An improvement to the algorithm would be to add the gap between
   every N packets, on the theory that most interfaces can receive
   several packets in rapid succession but then need some time to
   "reset" for the next batch of packets.  This behavior has, in
   fact, been seen on some combinations of PC and network
   interface.  Due to time constraints, this work was not done.

4.3. Jitter

   The RTP Receiver Reports carry an interarrival jitter field;
   however, no algorithm is specified.  For Loki, the jitter is the
   standard deviation of the inter-packet arrival time, expressed
   in milliseconds.

   A filter is applied to the incoming packets.  The inter-arrival
   time between two packets, X and Y, is calculated and used for
   jitter calculations if and only if those packets meet the
   following two criteria:

   (1)  Both packets must be from the same video frame.  That is,
        the RTP Timestamp must be the same for both packets.  This
        criterion filters out the gap while the transmitter grabs,
        digitizes, and starts packetizing the "next" frame.

   (2)  The RTP Sequence Number for packet Y must be X's sequence
        number plus one.  This filters out any gaps that occur
        because of missing packets.
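The two filtering criteria, together with the running sums used by the standard-deviation calculation, might be implemented as follows. This is a sketch: the RTP timestamp, sequence number, and arrival time are passed in explicitly, and sequence-number wrap-around is ignored for brevity.

```python
import math


class JitterEstimator:
    """Accumulate n, Sigma(x), and Sigma(x**2) over accepted gaps."""

    def __init__(self):
        self.prev = None  # (timestamp, seq, arrival_ms) of last packet
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def packet(self, timestamp, seq, arrival_ms):
        p = self.prev
        self.prev = (timestamp, seq, arrival_ms)
        if p is None:
            return
        # Criterion 1: same video frame (same RTP Timestamp).
        # Criterion 2: consecutive RTP Sequence Numbers.
        if p[0] != timestamp or seq != p[1] + 1:
            return
        x = arrival_ms - p[2]  # accepted inter-arrival time, ms
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def stddev_ms(self):
        """Computed only when a Receiver Report is generated."""
        if self.n < 2:
            return 0.0
        return math.sqrt((self.n * self.sum_x2 - self.sum_x ** 2)
                         / (self.n * (self.n - 1)))
```

Only the three running sums are updated per packet; the square root is deferred to Receiver Report generation, as the text below notes.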
   The standard deviation, s, is calculated according to the
   following formula:

               _________________________________
              /  n * Sigma(x**2) - (Sigma(x))**2
        s =  /   --------------------------------
           \/              n * (n - 1)

   where x is an inter-packet arrival time in milliseconds, n is
   the number of packets over which the calculation is performed,
   and Sigma is the summation operator.

   As an implementation matter, only n, Sigma(x**2), and Sigma(x)
   are updated when a packet is received.  The remaining
   calculations are performed only when the Receiver Report packet
   is generated.

4.4. Block Mode Algorithm

   In the block transmission mode, the image is divided into
   8-pixel by 8-pixel blocks of data, and each 8x8 block is treated
   as a single quantum.  This makes it reasonable to retain a large
   set of state information about the actual image data, allowing
   for more sophisticated transmit features:

   IMAGE DIFFING
        In block mode it is reasonably efficient to examine a few
        pixels of each block in order to determine whether to
        transmit that block's data or not.  The pixels chosen for
        the old vs. new comparison should be randomly selected from
        within the block; otherwise it is possible that a corner of
        a block would change but that change would never be
        detected.

   BACKGROUNDING
        If a block doesn't change for several frames then that
        block is presumed to be 'background'.  The SOURCE then
        informs the SINKs that a particular block's data is now
        background data.  As the image changes, the background data
        can be re-displayed by a short header (which just gives the
        coordinates of the block and a "display background here"
        command) rather than by sending the entire block's worth of
        data.

   TIMEOUT
        It is conceivable that portions of a scene will not change
        for an appreciable period of time (if ever).  Late-arriving
        SINKs would then never get the actual data for those
        blocks.
        The algorithm accounts for this: a SOURCE will time out
        portions of the image and then send those blocks.  This
        ensures that all SINKs eventually get all parts of the
        image.

   PROBABLE DELIVERY
        In order to increase the probability that all the SINKs
        receive a particular update, a SOURCE will send the update
        for several frames beyond the frame in which it would
        otherwise stop sending the data.  For example, if a
        particular block of the image changes from "A" to "B" in
        frame N (and doesn't change thereafter), the SOURCE will
        transmit the "B" data for several additional frames (e.g.,
        frames N, N+1, and N+2).  This increases the likelihood
        that all SINKs receive the new data.

   When deciding whether to send image data and flags, the
   following algorithm is used:

   (1)  if ((background_valid == TRUE) &&
            (background_timeout <= NOW)) {
            background_valid = FALSE;
            send_background = TRUE;
            retrans_count = RETRANS;
            background_timeout = NOW + LIFE;
        }

   (2)  if ((background_valid == TRUE) &&
            (current_data == background_data)) {
            previous_data = current_data;
            TRANSMIT(NULL, DISPLAY_BACKGROUND_DATA);
            return;
        }

   (3)  if (current_data != previous_data) {
            send_background = FALSE;
            retrans_count = RETRANS;
            previous_data = current_data;
            TRANSMIT(current_data, 0);
            return;
        }

   (4)  if (retrans_count != 0) {
            retrans_count = retrans_count - 1;
            if (send_background == TRUE) {
                TRANSMIT(current_data, COPY_DATA_TO_BG);
                return;
            }
            TRANSMIT(current_data, 0);
            return;
        }

   (5)  if (send_background == FALSE) {
            send_background = TRUE;
            retrans_count = RETRANS;
            background_data = current_data;
            TRANSMIT(current_data, COPY_DATA_TO_BG);
            return;
        }

   (6)  background_timeout = NOW + LIFE;
        background_valid = TRUE;
        send_background = FALSE;
        TRANSMIT(NULL, DISPLAY_BACKGROUND_DATA);
        return;

   TRUE and FALSE are constants with the obvious meaning.
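For illustration, the six-step decision procedure can be transcribed into runnable form. This is a sketch, not a normative restatement: the per-block state is gathered into a small class (an implementation choice), TRANSMIT is modeled by recording (data, flags) pairs, and the RETRANS and LIFE values are arbitrary.

```python
RETRANS = 2    # illustrative value
LIFE = 100     # illustrative value, in captured frames
DISPLAY_BACKGROUND_DATA = 0x80
COPY_DATA_TO_BG = 0x40


class Block:
    """Per-block transmit state for the block-mode decision procedure."""

    def __init__(self):
        self.background_valid = False
        self.background_timeout = 0
        self.retrans_count = 0
        self.send_background = False
        self.current_data = None
        self.background_data = None
        self.previous_data = None
        self.sent = []  # (data, flags) pairs, standing in for TRANSMIT()

    def transmit(self, now):
        d = self.current_data
        # (1) Background data timed out: refresh it at the SINKs.
        if self.background_valid and self.background_timeout <= now:
            self.background_valid = False
            self.send_background = True
            self.retrans_count = RETRANS
            self.background_timeout = now + LIFE
        # (2) Block matches the background: display-background only.
        if self.background_valid and d == self.background_data:
            self.previous_data = d
            self.sent.append((None, DISPLAY_BACKGROUND_DATA))
            return
        # (3) Block is changing: send current data, arm retransmission.
        if d != self.previous_data:
            self.send_background = False
            self.retrans_count = RETRANS
            self.previous_data = d
            self.sent.append((d, 0))
            return
        # (4) Unchanged, but still retransmitting (probable delivery).
        if self.retrans_count != 0:
            self.retrans_count -= 1
            flags = COPY_DATA_TO_BG if self.send_background else 0
            self.sent.append((d, flags))
            return
        # (5) Stable: start pushing it to the SINKs' background buffers.
        if not self.send_background:
            self.send_background = True
            self.retrans_count = RETRANS
            self.background_data = d
            self.sent.append((d, COPY_DATA_TO_BG))
            return
        # (6) Background established: display-background, reset timer.
        self.background_timeout = now + LIFE
        self.background_valid = True
        self.send_background = False
        self.sent.append((None, DISPLAY_BACKGROUND_DATA))


b = Block()
b.current_data = "A"
for now in range(7):
    b.transmit(now)
# After roughly 2 x RETRANS frames of no change, the block has become
# background data and only the display-background command is sent.
assert b.sent[-1] == (None, DISPLAY_BACKGROUND_DATA)
```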
   In the algorithm, RETRANS, LIFE, and NOW are global variables:

   LIFE
        Is the lifetime of background data, in units of captured
        frames.  The SOURCE will retransmit the background data to
        the SINKs if the data has not changed in LIFE frames.  The
        SOURCE will send the data RETRANS times (see next item)
        before it stops.

   RETRANS
        Is the number of times data will be retransmitted after the
        data stops changing.  In other words, it defines how
        "probable" the Probable Delivery is.  It is also the basis
        for defining background data: if a block does not change
        for 2xRETRANS frames then the block's data is declared
        background data.

   NOW
        The current frame number being transmitted.

   All other variables are state variables that are kept on a
   per-block basis:

   background_valid
        A boolean which indicates whether the data in
        "background_data" is valid or not.

   background_timeout
        The timeout for the background data.  When the current
        frame number, NOW, exceeds this value, the SOURCE will
        retransmit the background data.

   retrans_count
        The number of times that data will be transmitted.  This
        counts down; when it reaches 0, retransmission stops.

   send_background
        A boolean which indicates whether the SOURCE is telling the
        SINKs to place the transmitted data into their background
        buffers.

   current_data
        The current image data for the block.

   background_data
        The background data for the block.

   previous_data
        The data for the previous frame of the image.

   The algorithm works as follows:

   (1)  This step checks to see if the background data has timed
        out.  The intent is to periodically refresh the background
        information at the SINKs.  The background data is
        retransmitted by the expedient of simply declaring that the
        background is "invalid" but that we are in the process of
        transmitting it.
   (2)  This step checks to see if the block in the current image
        is the same as the background data.  If it is, no image
        data is transmitted; only the "display the background data
        for this block" command is sent.  The algorithm then exits.

   (3)  This step checks whether the data for the current image is
        the same as in the previous image.  If not, the image is
        changing and we have to transmit the "current" data.  The
        retransmission logic is also set up so that the data will
        be retransmitted several times (RETRANS) once the image
        'stops' changing.  The algorithm then exits.

   (4)  At this point, we know that the current frame is A) not the
        same as the background and B) the same as the previous
        frame.  This step then checks to see if the block is being
        retransmitted.  If it is, the retransmit count is
        decremented (we only retransmit a small, finite number of
        times) and the current image data are transmitted.  If the
        data are being sent as "background" data to the SINKs, the
        COPY_DATA_TO_BG flag is set; otherwise it isn't.  The
        algorithm then exits.

   (5)  Once the retransmit count goes to 0 we fall through into
        here.  At this point the decision is made as to whether the
        data should be considered background data or not.  In this
        step, we check to see if we were transmitting background
        information.  If we were not, we start sending the current
        data as background data (i.e., send the COPY_DATA_TO_BG
        flag with the data).  Note that we only get to this point
        if the data have not changed for RETRANS frames, implying
        that this portion of the image is fairly stable.  Also, we
        remember that we are now sending background information
        (send_background is set to TRUE) and we set up the
        retransmission counter to send the background data a few
        times.
   (6)  At this point there is no new data to send to the SINKs,
        all retransmissions are done, and we know that what the
        SINKs should display is what is in the background.  We
        transmit no data, just the command telling the SINKs to
        display what is in their background buffers.  We also
        initialize the retransmission logic so that the background
        data will be re-sent LIFE frames in the future (assuming no
        other changes occur).

   Note that this algorithm declares a particular block to be
   "background" data if that data does not change for 2xRETRANS
   frames.

5. Future Work

5.1. RTP -06

   The Loki work was done on RTP version 05 [1].  Since that time,
   a new version of RTP has been produced, the 06 version [3].  The
   Loki protocol and implementation should be revised to use RTP
   06.  The most significant modification is that the 06 version of
   RTP includes implementation notes and algorithm specifications
   which will supplant the algorithms used in Loki.  In particular,
   the jitter algorithm specified in [3] will replace the one
   specified in this document.

5.2. Additional Transmit Adaptivity

   The current adaptive transmit algorithm:

   (1)  Uses only the packet loss rate to determine received image
        quality,

   (2)  Uses only the inter-packet transmit delay as the tuning
        "knob", and

   (3)  Adjusts the delay for every transmitted packet.

   There are additional strategies that could be investigated, both
   in determining received image quality and in adjusting transmit
   behavior to fix any perceived quality problems.  For example:

   (1)  The inter-packet transmit delay could be inserted only
        every N'th packet.  Investigations indicate that some
        interfaces are able to receive a few packets back-to-back
        but then start to fail (the typical failure mode here is
        that on-card buffers fill up).  By inserting the gap only
        every N'th packet, the effective transmit rate can be
        increased without sacrificing quality.
   (2)  The current algorithm will increase the transmit delay
        until either the high threshold or the maximum allowed
        delay is reached.  Instead, work could be done to make the
        algorithm cognizant of how the receiver quality changes as
        the delay is increased; the algorithm could then stop
        increasing the delay when the reported quality stops
        getting better.  Any such modification must be aware of
        possible non-linear relationships between the delay and the
        quality and be able to deal with them.

   (3)  The packet size could be increased.  This would add a small
        amount of time between transmitting packets (more time is
        needed to packetize more data).  It would also reduce the
        number of packets sent per frame, reducing the number of
        inter-packet gaps and increasing the overall data
        transmission rate.

   (4)  The pixel formatting could be changed.  This would have two
        effects: the time required to reformat the pixels (say,
        from RGB-24 to MONO-4) would add to the inter-packet gap,
        and the number of packets required to transmit an image
        would be decreased.

5.3. Selective Retransmissions

   Early development of Loki included a facility whereby a SINK
   could request that the SOURCE retransmit certain portions of the
   image.  This facility was not kept when Loki was ported to run
   over RTP (mostly as a matter of time constraints on the
   implementation work).  Experiments with that early version had
   shown that, in certain types of environment (such as one-to-one
   video conferencing), this capability provides noticeable
   improvements to the video image.

   This capability worked as follows.

   1.   It was only available in block mode.

   2.   For each block, the SINK monitored how long it had been
        since any real data had been received for the block, either
        the foreground or the background.
        If a period of time had elapsed during which no data had
        been received, a request was sent to the SOURCE
        specifically asking that the data for the block be
        transmitted.

   3.   The SOURCE then had the option of honoring the request,
        ignoring it, or queueing it up for a short time in order to
        combine several requests into a single response.  The
        SOURCE also had the capability of sending a response to the
        SINK(s) indicating that the SOURCE would not do selective
        retransmissions and therefore the SINK(s) should not bother
        to ask.

   4.   A SINK had the responsibility of A) not asking for
        retransmissions too often (i.e., throttling its own
        behavior), B) noticing if there were a "large number" of
        SINKs and, if so, refraining from asking (to avoid flooding
        the SOURCE with too many requests), and C) noticing whether
        it was losing too many parts of the image and, if so,
        refraining from asking (on the theory that if the SINK was
        losing very badly, there was probably a real network
        problem and lots of requests and retransmissions would not
        solve it).

   This capability proved most useful in dealing with situations
   where the SINK simply never received a certain portion of the
   image.  This could occur quite often as a result of
   peculiarities in network interface adaptors (e.g., every N'th
   packet might be lost because the card only had N-1 buffers and
   it spent the time when it should have received the N'th packet
   emptying out its buffers), or of various inadvertent
   "self-synchronizations" that occurred between the SOURCE and
   SINK.  One might have thought that the various random events
   that occur in networks would have been enough to prevent this
   from occurring, but these things did occur.

6. Can I See

6.1.
Overview

   The Can I See (aka CNIC) feature is for unicast, one-to-one,
   video traffic.  The model of operation is that a Requestor
   submits a CNIC request to a known Server.  The Server then
   unicasts back to the Requestor some number of video frames, or a
   rejection notification giving the reason.

   The user must know the particulars of the specific Server that
   is desired (IP address and UDP ports).  Propagation of this
   information is not a function of Loki; it could easily be
   obtained via some directory or session control protocol.

   Some preliminary experimentation has been done on PCs that
   combine a Loki receiver and a Web browser.  A configuration file
   that contains the details for a particular CNIC request is
   obtained and downloaded via the Web.  The browser then invokes
   the Loki receiver with that configuration file and the receiver
   automatically submits the CNIC request.  This worked reasonably
   well.

   CNIC requests should never be multicast or broadcast; implosion
   problems would occur.  Using "N" unicast Can I See sessions is
   not a substitute for a single one-to-N multicast session.

6.2. Bridges/Mixers and Translators

   The use of Can I See may be problematic when intermediate
   devices such as bridges/mixers or translators are present.  In
   order to properly handle CNIC requests, an intermediate system
   would have to:

   1.   Receive the request,

   2.   Be able to map the request to some other, known, Server(s)
        (the Server CSRC field of the CNIC request is provided to
        aid in this mapping),

   3.   Generate one or more new requests, based on the mapping in
        the previous step (which would then cause one or more
        streams to be unicast to the intermediate system), and

   4.   Take the received unicast streams, properly process them,
        and then unicast them on to the original Requestor.
   This sort of higher-level control protocol 'remapping' is beyond
   the original scope of behavior planned for intermediate devices.
   Current work has not made use of bridges/mixers or translators.

6.3. Historical Perspective

   The Can I See (aka CNIC) feature was added to the protocol early
   on as a debugging and demonstration tool.  As the first
   implementations were developed and tested, we'd test them at
   various places within FTP Software, as well as at selected other
   locations.  We could have left the software "transmitting" to
   the intended receivers, either via unicasts or directed
   broadcasts or multicasts; this would have been, at the very
   least, impolite.  Instead, we developed CNIC, inspired by the
   general idea of Cornell's CU See Me tool.  This allowed
   demonstrations and tests from remote sites without generating a
   lot of unnecessary network traffic.  Since then it has proven to
   be a popular and continually useful facility, so it was retained
   in Loki.

7. Design Comments

   A goal of the protocol, and its associated algorithms, is that
   it be usable in an MS Windows environment by applications using
   the standard, documented, MS Windows video application
   programming interface (specifically, Microsoft's Video for
   Windows).  This environment supports a rich suite of video
   encoding formats, image sizes, and so on.  Furthermore, there is
   not necessarily a "common" video encoding format for all Video
   for Windows compliant capture cards.  Since the transmission
   functions are somewhat tied to the capabilities of the hardware,
   this has led to a design decision to require that receivers be
   flexible: they must be able to receive and properly process
   anything that a transmitter may send, and they must not consider
   it an error to receive something in a format that they did not
   expect.

8.
References

   [1]  Schulzrinne, Casner, Frederick, and Jacobson, "RTP: A
        Transport Protocol for Real-Time Applications", version 05,
        18 July 1994.  draft-ietf-avt-rtp-05.  This document may no
        longer be available; contact the author for copies.

   [2]  Reynolds, J. K., and J. Postel, "Assigned Numbers", RFC
        1700, October 1994.

   [3]  Schulzrinne, Casner, Frederick, and Jacobson, "RTP: A
        Transport Protocol for Real-Time Applications", version 06,
        28 November 1994.  draft-ietf-avt-rtp-06.

9. Security Considerations

   Loki does not contain any provisions for security.  Loki assumes
   that the underlying protocols provide any desired security
   (i.e., that the IP security work and/or the IP6 security work
   actually produce usable security features).

   Network video applications can generate a large amount of
   traffic.  This traffic could severely load various elements of
   the network (64 kbit lines can easily be saturated with video
   traffic).  In addition, the high packet rates can overload some
   interface cards.  These two conditions, while not security
   issues per se, can end up degrading, or even denying, service to
   other users of the networks, media, routers, and hosts.  This is
   not dissimilar to a denial-of-service attack.

10. Author's Address

   Frank J. Kastenholz
   FTP Software
   2 High Street
   North Andover, MA 01845-2620
   USA

   Phone: +1 508-685-4000
   EMail: kasten@ftp.com

Table of Contents

   Status of this Memo ....................................    i
   1 INTRODUCTION ..........................................    1
   1.1 Note On Terminology .................................    2
   1.2 Change Log ..........................................    2
   2 PROTOCOL SPECIFICATION ................................    4
   2.1 Byte Order ..........................................    4
   2.2 Video Data ..........................................    4
   2.2.1 Packet Size .......................................    4
   2.2.2 RTP Header Profile ................................    5
   2.2.2.1 Marker Bit ......................................    5
   2.2.2.2 Payload Type ....................................    5
   2.2.2.3 Extension Bit ...................................    5
   2.2.3 Loki Header .......................................    5
   2.3 RTCP Extensions .....................................    6
   2.4 Loki Control Packets ................................    7
   2.4.1 Discard ...........................................    8
   2.4.2 Suspend ...........................................    8
   2.4.3 Can I See Request .................................    8
   2.4.4 Can I See Status ..................................   11
   3 Video Encoding Specifications .........................   14
   3.1 Pixel Formats .......................................   14
   3.1.1 24 Bit RGB ........................................   14
   3.1.2 16 Bit RGB ........................................   14
   3.1.3 8 Bit Monochrome ..................................   15
   3.1.4 4 Bit Monochrome ..................................   16
   3.1.5 8 Bit Palette .....................................   16
   3.1.6 4 Bit Palette .....................................   16
   3.1.7 411 YUV ...........................................   16
   3.2 Transfer Formats ....................................   19
   3.2.1 Simple Mode .......................................   19
   3.2.2 Indeo Simple Mode .................................   21
   3.2.3 Block Mode ........................................   22
   3.3 Format Values .......................................   23
   4 Algorithms ............................................   25
   4.1 Pixel Comparisons ...................................   25
   4.2 Adaptive Transmit ...................................   26
   4.3 Jitter ..............................................   28
   4.4 Block Mode Algorithm ................................   30
   5 Future Work ...........................................   35
   5.1 RTP -06 .............................................   35
   5.2 Additional Transmit Adaptivity ......................   35
   5.3 Selective Retransmissions ...........................   36
   6 Can I See .............................................   38
   6.1 Overview ............................................   38
   6.2 Bridges/Mixers and Translators ......................   38
   6.3 Historical Perspective ..............................   39
   7 Design Comments .......................................   40
   8 References ............................................   41
   9 Security Considerations ...............................   42
   10 Author's Address .....................................   43