Internet Engineering Task Force                              W. Fenner
Audio-Video Transport Working Group                         Xerox PARC
INTERNET-DRAFT                                                 L. Berc
draft-ietf-avt-jpeg-01.txt               Digital Equipment Corporation
March 23, 1995                                            R. Frederick
Expires: 11/1/95                                            Xerox PARC
                                                            S. McCanne
                                          Lawrence Berkeley Laboratory


               RTP Encapsulation of JPEG-compressed Video


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   Distribution of this document is unlimited.

Abstract

   This draft describes the RTP payload format for JPEG video streams.
   It is optimized for real-time video streams whose JPEG parameters
   remain constant, rather than for individual JPEG images coming from
   different sources.

   This document is a product of the Audio-Video Transport working
   group within the Internet Engineering Task Force. Comments are
   solicited and should be addressed to the working group's mailing
   list at rem-conf@es.net and/or the author(s).


Expires November 1995                                           [Page 1]

Internet Draft         draft-ietf-avt-jpeg-01.txt             March 1995


1. Introduction

   This document describes the transport of JPEG-compressed video over
   RTP. JPEG-compressed video has several distinctive features:

   + Each frame is large, requiring fragmentation and reassembly.

   + There is no easy way to recover from a lost segment: losing one
     segment means losing the whole frame. The JPEG specification
     defines a recovery mechanism (restart markers, see Section 4.3),
     but not all hardware decoders can handle it.
2. RTP Usage

   The RTP timestamp is in units of 65536 Hz. The same timestamp value
   is used for all fragments of a single frame. The RTP marker bit is
   set on the last packet of each frame.

3. JPEG Header

   A special 8-byte header is added to each packet, ahead of the JPEG
   data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      MBZ      |                Fragment Offset                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |       Q       |     Width     |    Height     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1. MBZ: 8 bits

   This field is reserved for future use and must be zero.

3.2. Fragment Offset: 24 bits

   The Fragment Offset is the offset, in bytes, of the data in the
   current packet within the full JPEG frame.

3.3. Type: 8 bits

   The Type field describes the format of the JPEG data. It encodes all
   of the JFIF options. Types 0-127 are predefined by profiles, and
   types 128-255 are free to be redefined by the session protocol.

3.4. Q Factor (Q): 8 bits

   The Q Factor describes the current JPEG quantization table. If
   1 <= Q <= 99, the algorithm devised by the Independent JPEG Group is
   used to calculate the JPEG quantization table (see Appendix A). If
   Q >= 100, a custom quantization table is being used. The standard
   quantization tables are expected to handle almost every case, and
   custom tables should rarely be needed. Q = 0 is invalid.

3.5. Width: 8 bits

   Width encodes the image width in pixels, divided by 8 (e.g., a
   width of 40 denotes an image 320 pixels wide).

3.6. Height: 8 bits

   Height encodes the image height in pixels, divided by 8 (e.g., a
   height of 30 denotes an image 240 pixels tall).

3.7. Data

   The data following the header is an entropy-coded JPEG stream as
   defined in the JPEG standard. JPEG markers are introduced by a 0xFF
   byte in the data stream. To keep coded data from being mistaken for
   a marker, a "stuffed" 0x00 byte follows any 0xFF byte generated by
   the entropy coder (as per section B.1.1.5 of the JPEG standard).

4. Discussion

4.1. The Type Field

   The Type field encodes, in a single byte, all of the JFIF parameters
   that are expected to remain constant over the lifetime of a session.
   Two types are currently defined:

      Type 0: YUV 4:2:2, square pixels, 16x8 MCU, standard Huffman tables
      Type 1: YUV 4:2:0, square pixels, 16x16 MCU, standard Huffman tables

   Additional types in the range 128-255 may be defined by external
   means, such as a session protocol.

4.2. Fragmentation and Reassembly

   Since JPEG frames are large, they must be fragmented. Frames should
   be fragmented into packets in a way that avoids fragmentation at a
   lower level (for example, IP fragmentation). When restart markers
   are used, frames should be fragmented such that each packet starts
   with a restart interval (see Section 4.3).

   All packets that make up a single frame carry the same timestamp.
   The fragment offset field is set to the byte offset of the packet's
   data within the original frame. The RTP marker bit is set on the
   last packet of a frame.

4.3. Restart Markers

   Restart markers indicate a point in the JPEG stream at which the
   Huffman coder is reset, allowing partial decoding starting at that
   point. The use of restart markers therefore provides robustness in
   the face of packet loss.

   However, not all hardware decoders support restart markers; such
   hardware can decode only the first portion of a frame, up to the
   first restart marker, and then fails. Thus, for maximum
   interoperability, restart markers should not be included in the
   JPEG data.

   If restart markers are included, each packet should begin with a
   restart interval. Since there is no way to tell a priori how much
   data will occur between restart markers, a restart interval might
   span multiple packets. If a restart interval must be fragmented, it
   is preferable to end it with a short packet so that the next
   restart interval again begins at the start of a packet.

5. Security Considerations

   Security issues are not discussed in this memo.

6. Authors' Addresses

   William C. Fenner
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA 94304

   Phone: +1 415 812 4816
   Email: fenner@cmf.nrl.navy.mil


   Lance M. Berc
   Systems Research Center
   Digital Equipment Corporation
   130 Lytton Ave
   Palo Alto, CA 94301

   Phone: +1 415 853 2100
   Email: berc@pa.dec.com


   Ron Frederick
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA 94304

   Phone: +1 415 812 4459
   Email: frederick@parc.xerox.com


   Steven McCanne
   Lawrence Berkeley Laboratory
   M/S 46A-1123
   One Cyclotron Road
   Berkeley, CA 94720

   Phone: +1 510 486 7520
   Email: mccanne@ee.lbl.gov

Appendix A

   The following C code creates the luminance and chrominance
   quantization tables from a Q factor in the range 1 to 99:

   /*
    * Table K.1 from JPEG spec.
    */
   static const int jpeg_luma_quantizer[64] = {
           16, 11, 10, 16, 24, 40, 51, 61,
           12, 12, 14, 19, 26, 58, 60, 55,
           14, 13, 16, 24, 40, 57, 69, 56,
           14, 17, 22, 29, 51, 87, 80, 62,
           18, 22, 37, 56, 68, 109, 103, 77,
           24, 35, 55, 64, 81, 104, 113, 92,
           49, 64, 78, 87, 103, 121, 120, 101,
           72, 92, 95, 98, 112, 100, 103, 99
   };

   /*
    * Table K.2 from JPEG spec.
    */
   static const int jpeg_chroma_quantizer[64] = {
           17, 18, 24, 47, 99, 99, 99, 99,
           18, 21, 26, 66, 99, 99, 99, 99,
           24, 26, 56, 99, 99, 99, 99, 99,
           47, 66, 99, 99, 99, 99, 99, 99,
           99, 99, 99, 99, 99, 99, 99, 99,
           99, 99, 99, 99, 99, 99, 99, 99,
           99, 99, 99, 99, 99, 99, 99, 99,
           99, 99, 99, 99, 99, 99, 99, 99
   };

   /*
    * Call MakeTables with the Q factor and two int[64] return arrays.
    */
   void MakeTables(int q, int *lum_q, int *chr_q)
   {
           int i;
           int factor = q;

           /* Clamp the Q factor to the valid range 1..99. */
           if (factor < 1)
                   factor = 1;
           if (factor > 99)
                   factor = 99;

           /* Convert the Q factor into a percentage scaling of the tables. */
           if (factor < 50)
                   q = 5000 / factor;
           else
                   q = 200 - factor * 2;

           for (i = 0; i < 64; i++) {
                   lum_q[i] = (jpeg_luma_quantizer[i] * q + 50) / 100;
                   chr_q[i] = (jpeg_chroma_quantizer[i] * q + 50) / 100;

                   /* Limit the quantizers to 1 <= q <= 255. */
                   if (lum_q[i] < 1)
                           lum_q[i] = 1;
                   if (chr_q[i] < 1)
                           chr_q[i] = 1;
                   if (lum_q[i] > 255)
                           lum_q[i] = 255;
                   if (chr_q[i] > 255)
                           chr_q[i] = 255;
           }
   }
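   The timestamp rule in Section 2 can be illustrated with a short
   sketch. Because the clock runs at 65536 Hz (2^16 ticks per second),
   whole seconds map to a 16-bit left shift and only the sub-second
   part needs scaling. The function name rtp_jpeg_timestamp and its
   (seconds, microseconds) input are assumptions made for illustration;
   a real sender also adds the random initial timestamp offset that RTP
   calls for, which this sketch leaves to the caller.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a (seconds, microseconds) time into a
 * 32-bit RTP timestamp at the 65536 Hz clock of Section 2.
 * The result wraps modulo 2^32, as RTP timestamps do.
 * Adding a random initial offset is left to the caller.
 */
uint32_t rtp_jpeg_timestamp(uint32_t sec, uint32_t usec)
{
    /* Whole seconds: 65536 ticks each, i.e. a left shift by 16. */
    uint32_t ticks = sec << 16;

    /* Fraction of a second: usec * 65536 / 1000000, computed in
     * 64 bits so the intermediate product cannot overflow. */
    ticks += (uint32_t)(((uint64_t)usec << 16) / 1000000u);

    return ticks;
}
```

   A sender computes this value once per frame and places it in every
   fragment of that frame.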
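   The 8-byte header of Section 3 can be packed and parsed as shown
   below. This is a minimal sketch, not a normative implementation: the
   struct jpeg_hdr type and the jpeg_hdr_pack and jpeg_hdr_unpack names
   are hypothetical, and the receiver here rejects only a short buffer
   or Q = 0 (Section 3.4). Width and Height are stored in units of 8
   pixels, so an 8-bit field covers images up to 2040 pixels on a side.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side view of the 8-byte JPEG header (Section 3). */
struct jpeg_hdr {
    uint32_t offset;   /* Fragment Offset: 24 bits, bytes into the frame */
    uint8_t  type;     /* Type: 0 or 1 (Section 4.1), 128-255 external */
    uint8_t  q;        /* Q: 1-99 IJG tables, >= 100 custom, 0 invalid */
    uint8_t  width;    /* image width in pixels / 8 */
    uint8_t  height;   /* image height in pixels / 8 */
};

#define JPEG_HDR_LEN 8

/* Write the header in network byte order with MBZ set to zero.
 * The caller must keep offset below 2^24; higher bits are dropped. */
void jpeg_hdr_pack(const struct jpeg_hdr *h, uint8_t buf[JPEG_HDR_LEN])
{
    buf[0] = 0;                            /* MBZ */
    buf[1] = (uint8_t)(h->offset >> 16);   /* Fragment Offset, MSB first */
    buf[2] = (uint8_t)(h->offset >> 8);
    buf[3] = (uint8_t)(h->offset);
    buf[4] = h->type;
    buf[5] = h->q;
    buf[6] = h->width;
    buf[7] = h->height;
}

/* Parse a received header; returns 0 on success, -1 if malformed.
 * The reserved MBZ byte is ignored rather than checked. */
int jpeg_hdr_unpack(const uint8_t *buf, size_t len, struct jpeg_hdr *h)
{
    if (len < JPEG_HDR_LEN)
        return -1;
    h->offset = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    h->type   = buf[4];
    h->q      = buf[5];
    h->width  = buf[6];
    h->height = buf[7];
    if (h->q == 0)                         /* Section 3.4: Q = 0 is invalid */
        return -1;
    return 0;
}
```

   For a 320x240 frame, Width and Height carry 40 and 30; the receiver
   multiplies each by 8 to recover the pixel dimensions.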
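   The scaling performed by MakeTables can also be written as a
   formula. Here B_i is entry i of Table K.1 (luminance) or Table K.2
   (chrominance), Q has already been clamped to 1..99, and the floor
   brackets stand for the C integer division of non-negative values:

```latex
S(Q) =
\begin{cases}
  \left\lfloor 5000 / Q \right\rfloor, & 1 \le Q < 50 \\
  200 - 2Q, & 50 \le Q \le 99
\end{cases}
\qquad
T_i = \min\!\left(255,\ \max\!\left(1,\
      \left\lfloor \frac{B_i \, S(Q) + 50}{100} \right\rfloor\right)\right)
```

   For example, Q = 75 gives S = 50, so the first luminance entry (16)
   becomes floor(850/100) = 8 and the first chrominance entry (17)
   becomes floor(900/100) = 9; Q = 50 gives S = 100 and reproduces
   Tables K.1 and K.2 unchanged.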
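   Section 3.7 requires a stuffed 0x00 after every 0xFF produced by the
   entropy coder. A software encoder that does not already stuff its
   output could do so with a loop like the one below, and a receiver
   that needs the raw entropy-coded bits can reverse it. Both function
   names are hypothetical; jpeg_unstuff_bytes copies any 0xFF that is
   not followed by 0x00 unchanged, treating it as the start of a
   marker.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder-side helper: copy entropy-coded bytes from 'in'
 * to 'out', inserting a stuffed 0x00 after every 0xFF (JPEG standard,
 * section B.1.1.5). 'out' must have room for 2 * in_len bytes.
 * Returns the number of bytes written.
 */
size_t jpeg_stuff_bytes(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        out[n++] = in[i];
        if (in[i] == 0xFF)
            out[n++] = 0x00;               /* stuffed byte */
    }
    return n;
}

/*
 * Hypothetical receiver-side helper: undo the stuffing by dropping the
 * 0x00 that follows each data 0xFF. Any other 0xFF xx pair is a marker
 * and is copied unchanged. Returns the number of bytes written.
 */
size_t jpeg_unstuff_bytes(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        out[n++] = in[i];
        if (in[i] == 0xFF && i + 1 < in_len && in[i + 1] == 0x00)
            i++;                           /* skip the stuffed byte */
    }
    return n;
}
```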
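   The fragmentation rules of Section 4.2 amount to walking the frame
   in bounded steps. In the sketch below, struct jpeg_frag and
   jpeg_fragment are hypothetical names, and max_data (JPEG data bytes
   per packet) is an assumed parameter that a sender would derive from
   the path MTU minus the RTP header and the 8-byte JPEG header, so that
   no IP fragmentation occurs. Restart intervals are ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* One packet's share of a fragmented frame (hypothetical type). */
struct jpeg_frag {
    uint32_t offset;  /* value for the Fragment Offset field */
    size_t   len;     /* JPEG data bytes carried in this packet */
    int      marker;  /* RTP marker bit: 1 only on the last packet */
};

/*
 * Split a frame of frame_len bytes into packets carrying at most
 * max_data bytes of JPEG data each. Returns the number of fragments
 * written to 'out', or 0 on bad input or if max_frags is too small.
 */
size_t jpeg_fragment(size_t frame_len, size_t max_data,
                     struct jpeg_frag *out, size_t max_frags)
{
    size_t n = 0, off = 0;

    /* Conservatively require the whole frame to fit in 24 bits. */
    if (frame_len == 0 || max_data == 0 || frame_len > 0xFFFFFF)
        return 0;

    while (off < frame_len) {
        size_t len = frame_len - off;
        if (len > max_data)
            len = max_data;
        if (n == max_frags)
            return 0;                 /* caller's array is too small */
        out[n].offset = (uint32_t)off;
        out[n].len = len;
        out[n].marker = (off + len == frame_len);
        n++;
        off += len;
    }
    return n;
}
```

   Each resulting packet carries the frame's single RTP timestamp, its
   own Fragment Offset, and the marker bit only on the final fragment.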
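   When restart markers are used, Section 4.3 asks that each packet
   begin with a restart interval and that an interval too long for one
   packet end in a short packet. The sketch below chooses packet start
   offsets under those rules. It assumes that an interval begins
   immediately after its RSTm marker (0xFF followed by 0xD0 through
   0xD7) and relies on the stuffing of Section 3.7 to ensure that no
   other such byte pair appears in the data; jpeg_rst_boundaries and
   its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset just past the next RSTm marker (0xFF, 0xD0..0xD7) found at or
 * after p, or len if there is none. */
static size_t next_interval(const uint8_t *d, size_t len, size_t p)
{
    for (size_t j = p; j + 1 < len; j++)
        if (d[j] == 0xFF && d[j + 1] >= 0xD0 && d[j + 1] <= 0xD7)
            return j + 2;
    return len;
}

/*
 * Write packet start offsets into 'starts'. Whole restart intervals
 * are packed while they fit in max_data bytes; an interval longer than
 * max_data is split, its last piece is sent as a short packet, and the
 * next interval starts a new packet. Returns the number of offsets,
 * or 0 on bad input or if max_starts is too small.
 */
size_t jpeg_rst_boundaries(const uint8_t *d, size_t len, size_t max_data,
                           size_t *starts, size_t max_starts)
{
    size_t n = 0;
    size_t pkt = 0;    /* start offset of the packet being filled */
    size_t prev = 0;   /* end of the last whole interval in that packet */

    if (len == 0 || max_data == 0 || max_starts == 0)
        return 0;
    starts[n++] = 0;

    while (prev < len) {
        size_t s = next_interval(d, len, prev);

        if (s - pkt <= max_data) {         /* interval fits: extend packet */
            prev = s;
            continue;
        }
        if (prev > pkt) {                  /* close packet at interval edge */
            pkt = prev;
        } else {                           /* one interval exceeds max_data */
            while (s - pkt > max_data) {   /* split it into full packets */
                pkt += max_data;
                if (n == max_starts)
                    return 0;
                starts[n++] = pkt;
            }
            prev = s;                      /* remainder is a short packet */
            if (s == len)
                break;
            pkt = s;                       /* next interval starts a packet */
        }
        if (n == max_starts)
            return 0;
        starts[n++] = pkt;
    }
    return n;
}
```

   With no restart markers in the data, the function reduces to plain
   fixed-size fragmentation.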