INTERNET-DRAFT                                              Eric Edwards
draft-ietf-avt-rtp-jpeg2000-01.txt                       Satoshi Futemma
                                                        Eisaburo Itakura
                                                        Nobuyoshi Tomita
                                                            Andrew Leung
                                                       Takahiro Fukuhara
                                                        Sony Corporation
                                                           June 30, 2002
Expires: December 30, 2002

             RTP Payload Format for JPEG 2000 Video Streams

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference materials or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Drafts Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document describes a payload format for transporting JPEG 2000 video streams using RTP (Real-time Transport Protocol). JPEG 2000 video streams are formed as a continuous series of JPEG 2000 still images. The JPEG 2000 payload format described in this document has three features: (1) improved robustness to packet loss through intelligent fragmentation of JPEG 2000 packet units, (2) persistency of the main header to minimize the effect of loss and maximize recovery, (3) a priority information field for scalable delivery from the same code stream. These features allow the scalability and robustness of JPEG 2000 to be exploited fully in streaming applications.

1. Introduction

This document specifies payload formats for JPEG 2000 video streams over the Real-time Transport Protocol (RTP). JPEG 2000 is an ISO/IEC International Standard developed for next-generation still image encoding. Its basic encoding technology is described in [1].

Edwards, et al.                                                 [Page 1]
INTERNET-DRAFT      draft-ietf-avt-rtp-jpeg2000-01.txt         June 2002

Part 3 of the JPEG 2000 standard defines Motion JPEG 2000 [2]. However, it defines only the file format, not the transmission format for streaming on the Internet. For this reason, it is necessary to define the RTP format for JPEG 2000 video streams.

JPEG 2000 supports many features over the current JPEG standard [3][4][5]:

o Higher compression efficiency than JPEG with less visual loss, especially at extreme compression ratios.

o A single code stream that offers both lossy and superior lossless compression.

o Transmission over noisy environments.

o Progressive transmission by pixel accuracy and resolution.

o Random code stream access and processing.

First, the JPEG 2000 algorithm is briefly explained below. Fig. 1 shows a block diagram of the JPEG 2000 encoding method.

                                                  +-----+
                                                  | ROI |
                                                  +-----+
                                                     |
                                                     V
                 +----------+   +----------+   +------------+
                 |DC, comp. |   | Wavelet  |   |            |
   raw image ==> |transform-|==>|transform-|==>|Quantization|==+
                 |  ation   |   |  ation   |   |            |  |
                 +----------+   +----------+   +------------+  |
                                                               |
              +-------------+   +----------+   +------------+  |
              |             |   |          |   |            |  |
JPEG 2000  <==|Data ordering|<==|Arithmetic|<==|Coefficient |<=+
code stream   |             |   |  coding  |   |bit modeling|
              +-------------+   +----------+   +------------+

Fig. 1: Block diagram of the JPEG 2000 encoder

Each color component or tile is transformed into wavelet coefficients. The component or tile is sub-sampled into various levels, usually vertically and horizontally, from the high frequencies (which contain all the sharp details) to the low frequencies (which contain all the flat areas).

Quantization is performed on the coefficients within each subband. The wavelet coefficient is divided by the quantization step size and the result is truncated.

After quantization, code blocks are formed from within the precincts within the tiles. Precincts are a finer separation than tiles, and code blocks are the smallest separation of the image data.
Entropy coding is performed within each code block, and the coefficients are arithmetically encoded by bit plane.

After the coefficients of all code blocks have been coded into a short bit stream, a header is added, turning it into a packet. The header has all the information needed to decompress the packet into code blocks. A group of packets is called a layer.

For additional features in transmission, the formed packets must be re-ordered. The standard has four ways to transmit and decode a compressed image: by resolution, quality, position, or component.

This is only to serve as an introduction to JPEG 2000 and to aid in understanding the rest of this document. Further details of the encoder can be found in various texts on JPEG 2000 [1].

To decompress a JPEG 2000 code stream, one would follow the reverse of the encoding order, minus the quantization step. It is outside the scope of this document to describe this procedure in detail. Please refer to various JPEG 2000 texts for details [1].

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [7].

1.2 Author's comments, responses, and changes relative to -00

[ this section will be removed in a future version of this document ]

1.2.1 Author's comments on this draft

Changes required from implementation of the last draft of the document

Implementation of the last draft of this document revealed some potential problems with the previous draft. Some markers would never be used, and some situations that may always occur could not be indicated by any combination of markers, so that inefficient usage of packets would be encountered (i.e. packing multiple tiles into a single RTP payload packet). Revisions have been added to handle these cases and redundant markers have been removed.

Removal of redundant texts from this document

A lot of text has been removed from the introduction of this document. This document cannot possibly cover JPEG 2000 in any comprehensive way compared to other resources available or cited. Implementors of this standard should have a more comprehensive understanding of JPEG 2000 than anything that was written in the introduction previously. Please refer to the cited texts for further information on JPEG 2000.

1.2.2 Response to comments

This section responds to comments made on previous drafts of this document and describes the design methodology used here. Comments from the IETF and WG01 are responded to here.

H.263 picture header redundancy technique

The picture header redundancy technique from RFC2429, an RTP payload format for H.263+, is quite intelligent and useful. In JPEG 2000, there can be instances where the Main Header of the codestream becomes incredibly large, larger than the MTU size, if many encoding options are used. In such a situation, sending the Main Header with each codestream packet would not be viable at all. The codestream header is already quite compressed as a result of basic JPEG 2000 development. Another technique will be used in this standard to do something similar. Through the optional payload header extension using the optional Marker Segment Optional Header, the sender can include all the data that it feels to be most important inside this optional header.

Scalable audio technique

The scalable audio technique from RFC2198 is quite interesting and, in some ways, applicable to our standard. This standard's target market is quite wide and very unique. JPEG 2000 was developed to be a highly flexible standard for digital imaging, with target applications ranging from ultra-thin clients to image archiving.
At the image archiving level, the technique would be useful; as we move down to thinner clients, such a technique may not be optimal because memory resources are scarce.

Optimal packet reordering

JPEG 2000 packet reordering and transmission may give a much lower error rate when packets are lost or dropped, as the error would not be immediately apparent and can just "smear" over from frame to frame. With packet reordering, the client must store all the packets and rearrange them in memory for the decode. The authors feel this would be incredibly taxing on some target devices and are not sure whether such a scheme would be effective. There should be some investigation into this area, with testing, to find perhaps a single best reordering scheme.

JPIP Interoperability

JPIP is new work taking place in the ISO/WG01 JPEG group to develop a new part of the JPEG 2000 Standard. As the new JPIP targets different application areas than this standard, interoperability is highly desired. While this is an RTP standard and JPIP is an RTSP standard, we have made provisions for compatibility within the optional header. This standard currently only reserves a definition for a JPIP header within the optional header. As JPIP is in the early stages of development and standardization, this standard shall incorporate JPIP as a peer standard and strive for interoperability as both become more mature.

1.2.3 Changes from the -00 version

The changes from the -00 version of this Internet Draft are:

Tiling bit removed and MTL has become MHF

The tile bit has been removed from the MTL field and the MTL field has been renamed to MHF.

Tiling flag introduced

The T flag field comes before the tile number field in the payload header.

Fragment offset shortened from a 32-bit to a 24-bit field

The justification is that even for QHD images, the fragment offset value will not exceed 24 bits.
(Our target applications are at most QHD size, which is at most 4000 pixels wide and 3000 pixels high. Even if we encode that QHD size with 2 bpp, the encoded size is 4000x3000x2 / 8 = 3 MB, which is less than 2^24 bytes.) Additionally, the bits saved have been reserved for future use.

Examples of packetization

The packetization methods are much simpler than in the first version of the document, which required examples to help illustrate the packetization method.

Introduction to JPEG 2000 shortened

As mentioned previously in the comments section.

X and E bit swapped positions in the header

The fields have swapped positions, as implementations demonstrated that this is the optimal data layout for this information.

Default priority table introduced

When there is no user table defined, a default table will be used. This table is based on the JPEG 2000 packet number in the codestream. Most JPEG 2000 images have at most 90 (=6x5x3) jp2-packets, which are constructed from 5 decomposition levels (6 resolutions), 5 layers and 3 components (YCrCb). Therefore, we suppose that 254 priority levels are enough in the worst case.

2. JPEG 2000 Video Features

JPEG 2000 video streams are formed as a continuous series of JPEG 2000 still images, so the above features of JPEG 2000 can be used effectively. A JPEG 2000 video stream has the following merits:

SNR is improved at a low bit rate. The format can be used as a video stream format at a low bit rate.

This is a full intra-frame format: each frame is independently compressed, so the encoding and decoding delay is low.

JPEG 2000 has flexible and accurate rate control. This is suitable for traffic control and congestion control in network transmission.

JPEG 2000 can provide its own code stream error resilience markers to aid in code stream recovery.

3. Design of RTP payload format for JPEG 2000 video streams

To provide a payload format that exploits the JPEG 2000 video stream described in the previous section, the following must be taken into consideration:

- Provisions for packet loss

On the Internet, 5% packet loss is common and this percentage may sometimes reach 20% or more. To split JPEG 2000 video streams into RTP packets, efficient packetization of the code stream is required to minimize the effect of code-blocks that cannot be decoded because they went missing in error-prone environments. If the main header is lost in transmission, the ability to decode is lost. Accordingly, a system that compensates for the loss of the main header as much as possible is required.

- A packetizing scheme that exploits JPEG 2000 functionality

The packetizing scheme should allow an image to be transmitted progressively and reconstructed progressively by the receiver using JPEG 2000 functionality, maximizing performance over various network conditions and various levels of computing power on receiving platforms.

4. Proposal for an RTP payload format for JPEG 2000 video streams

4.1 RTP fixed header usage

For each RTP packet, the RTP fixed header is followed by the JPEG 2000 payload header, which is followed by the JPEG 2000 code stream. The RTP header fields that have a meaning specific to JPEG 2000 video are described as follows:

Payload type (PT): The payload type is dynamically assigned by means outside the scope of this document. A payload type in the dynamic range shall be chosen by means of an out-of-band signaling protocol (e.g., RTSP, SIP, etc.).

Marker bit (M): The marker bit of the RTP fixed header MUST be set to 1 on the last RTP packet of a video frame; otherwise it must be 0. When transmission is performed over multiple RTP sessions, the bit is set in the last packet of the frame in each session.

Timestamp: The RTP timestamp is in units of 90 kHz.
The same timestamp must appear in each fragment of a given frame. The initial value of the timestamp is random to make known-plaintext attacks on encryption more difficult, even if the source itself does not encrypt, as the packets may flow through a translator that does.

4.2 RTP Payload header format

The RTP payload header format for a JPEG 2000 video stream is as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|E|MHF|mh_id|T|   priority    |          tile number          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   reserved    |                fragment offset                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fig. 2: RTP payload header format for JPEG 2000

X : 1 bit

Extension bit flag. This bit MUST be set to 1 when a JPEG 2000 optional payload header follows this JPEG 2000 payload header; otherwise it MUST be set to 0. The details of optional payload headers are described in Section 8 of this document.

E : 1 bit

Enable bit flag. If this bit is set to 1, it indicates the "intelligent packetization" described in Section 5.2. If the E bit is 0, it indicates "non-intelligent packetization", and a receiver MUST ignore all payload header information other than the extension bit flag and the fragment offset.

MHF (Main Header Flag) : 2 bits

MHF shows whether the main header is packed into the RTP packet or not. When the main header exists in the RTP packet, the sender MUST set the first bit to 1; otherwise this field MUST be set to 0. If the first bit is 1, the second bit is valid, and if the last part of the main header is included (either whole or fragmented), the sender MUST set the second bit to 1. In other words, this field is either 3 (=0b11) or 2 (=0b10) if the main header exists in the RTP packet, otherwise 0.
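
As an illustration only, the fixed eight-byte layout of Fig. 2 can be unpacked as in the following Python sketch. It assumes the usual network bit order (bit 0 in the figure is the most significant bit of the first octet). The names Jp2PayloadHeader and parse_payload_header are hypothetical and are not defined by this document.

```python
import struct
from typing import NamedTuple


class Jp2PayloadHeader(NamedTuple):
    """Fields of the 8-byte payload header in Fig. 2 (names are illustrative)."""
    x: int                # extension bit flag (1 bit)
    e: int                # enable bit flag (1 bit)
    mhf: int              # main header flag (2 bits)
    mh_id: int            # main header identification (3 bits)
    t: int                # tile flag (1 bit)
    priority: int         # 8 bits
    tile_number: int      # 16 bits
    fragment_offset: int  # 24 bits


def parse_payload_header(data: bytes) -> Jp2PayloadHeader:
    """Unpack the payload header, assuming bit 0 is the most significant bit."""
    if len(data) < 8:
        raise ValueError("the payload header is 8 bytes long")
    # "!" selects network byte order: two octets, a 16-bit word, a 32-bit word.
    flags, priority, tile_number, word2 = struct.unpack("!BBHI", data[:8])
    return Jp2PayloadHeader(
        x=(flags >> 7) & 0x1,
        e=(flags >> 6) & 0x1,
        mhf=(flags >> 4) & 0x3,
        mh_id=(flags >> 1) & 0x7,
        t=flags & 0x1,
        priority=priority,
        tile_number=tile_number,
        # The upper 8 bits of the second word are reserved; keep the low 24.
        fragment_offset=word2 & 0xFFFFFF,
    )


# Example: E=1, MHF=3, mh_id=1, T=1 gives a first octet of 0b01110011 (0x73).
example = parse_payload_header(bytes([0x73, 0, 0, 0, 0, 0, 0, 0]))
```

A receiver following this sketch would look only at the x and fragment_offset fields when e is 0, as required by the description of the E bit.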
The MHF values are listed below:

+----+-------------------------------------------------------+
|MHF | Description                                           |
+----+-------------------------------------------------------+
| 00 | no main header is packed at all                       |
| 01 | reserved for future use.                              |
| 10 | the fragmented main header (not last part) is packed. |
| 11 | a whole main header or the last part of the           |
|    | fragmented main header is packed.                     |
+----+-------------------------------------------------------+

Table 1: MHF usage values

The receiver checks MHF to determine the main header range and may perform the main header compensation described in Section 7 if the main header is lost.

mh_id : 3 bits

Main header identification value. This is used for JPEG 2000 main header recovery. The same mh_id is used as long as the coding parameters described in the main header remain unchanged. The mh_id starts at the value 1 when the first main header is transmitted. The mh_id value must increase by 1 every time a new main header is transmitted. Once the mh_id value would exceed 7, it must roll over and start at 1 again. Usage of this field is described in Section 7 of this document. This field is only valid when the E bit is 1. If the E bit is 0, then this field SHOULD be zero.

priority : 8 bits

The priority field indicates the importance of the JPEG 2000 packet included in the payload. Typically, a higher priority is set in the packets containing the JPEG 2000 packets of the lower layers and the lower subbands.

T (Tile flag) : 1 bit

This field shows whether the tile number field is valid or not:

T=0 means that the tile number field is valid and shows the tile number of the tile-part. The sender MUST set the T flag to 0 when only one tile-part is packed into the RTP packet, regardless of whether it is a whole tile-part or a fragment of the tile-part.

T=1 means that the tile number field is invalid.
The sender MUST set the T flag to 1 when multiple whole tile-parts are packed into the RTP packet or when there is no tile-part (in other words, only a main header) in the RTP packet.

tile number : 16 bits

The interpretation of this field depends on the value of the T flag.

When T=0, this field shows the tile number.

When T=1, the tile number field is invalid. The sender SHOULD set the tile number to 0, and the receiver MUST ignore this field.

fragment offset : 24 bits

This value must be set to the byte offset in the JPEG 2000 data stream of this RTP packet's contents. Because JPEG 2000 frames are typically larger than the underlying network's maximum transmission unit (MTU), frames might be fragmented into several packets. The fragment offset is the data offset in bytes of the current packet from the start of the JPEG 2000 code stream. This field helps the receiver to reassemble the JPEG 2000 code stream.

To perform scalable video delivery by using multiple RTP sessions, the offset value from the first byte of the same frame is set as the fragment offset. Accordingly, in scalable video delivery using multiple RTP sessions, the fragment offset may not start with 0 in some RTP sessions even if the packet is the first one of the frame.

5. Fragmentation of JPEG 2000 code stream and Type Field

Fig. 3 shows the construction of the JPEG 2000 code stream. The JPEG 2000 code stream consists of a main header beginning with the SOC marker, one or more tiles (only one tile if there is no tile division), and the EOC marker to indicate the end of the code stream. Each tile consists of a tile-part header that starts with the SOT marker and ends with the SOD marker, and a bit stream (a series of JPEG 2000 packets).

      +--   +------------+
Main  |     |    SOC     |  Required as the first marker.
header|     +------------+
      |     |    main    |  Main header marker segments
      +--   +------------+
      |     |    SOT     |  Required at the beginning of each tile-part
Tile- |     +------------+  header.
part  |     |   T0,TP0   |  Tile 0, tile-part 0 header marker segments
header|     +------------+
      |     |    SOD     |  Required at the end of each tile-part header
      +--   +------------+
            | bit stream |  Tile-part bit stream.
      +--   +------------+  Might include SOP and EPH
      |     |    SOT     |
Tile- |     +------------+
part  |     |   T1,TP0   |
header|     +------------+
      |     |    SOD     |
      +--   +------------+
            | bit stream |
            +------------+
            |    EOC     |  Required as the last marker in the code stream
            +------------+

Fig. 3: Construction of the JPEG 2000 code stream

The JPEG 2000 code stream consists of a main header, tile-part headers, and JPEG 2000 packets. When we packetize the JPEG 2000 code stream, these construction units of the code stream must be maintained. Each RTP packet will consist of a main header, tile-part header, or JPEG 2000 packet.

If the sender does not understand the JPEG 2000 code stream (i.e. the sender is not intelligent), it should pack the JPEG 2000 code stream in the largest possible MTU data size for the RTP packet. The sender must segment the JPEG 2000 code stream at arbitrary lengths into RTP-sized packets for the receiver. In this case, the E bit MUST be set to 0. This type of packetization is called "non-intelligent packetization".

If the sender understands JPEG 2000 code streams and can read the JPEG 2000 packets from the code stream (i.e. the sender is intelligent), the packetization is called "intelligent packetization". JPEG 2000 packets should be packed into RTP payload packets in the following way:

1. If the JPEG 2000 packets are smaller than the MTU size, the sender should put as many whole JPEG 2000 packets as possible into a single RTP packet. That is, the JPEG 2000 payload data should begin with one of the SOC marker, the SOT marker, or the SOP marker (if it exists in the JPEG 2000 data stream).

2. If the JPEG 2000 packets are larger than the MTU size, the sender should segment the JPEG 2000 packets at the largest possible MTU size, but JPEG 2000 packets must not overlap.

Regardless of the sender's capabilities, the receiver MUST be able to handle RTP packets of any size. If the sender does not fragment, any packets larger than the MTU size might be fragmented by the IP layer into multiple IP packets smaller than the MTU size. If one fragmented IP packet is lost during transmission, it is recognized as a loss of the whole RTP packet because the receiving host might not be able to reassemble the RTP packet.

The segmentation of the JPEG 2000 code stream into RTP packets must fit within the RTP payload size.

For intelligent packetization, all packets SHOULD be 32-bit aligned. If padding bits are required, then the padding bits MUST come at the end of the payload. Any required padding bits MUST NOT appear between the header and the payload or at the beginning.

In the following, all the possible packetization cases are described with diagrams.

5.1 Separation at arbitrary lengths

In this case, a JPEG 2000 code stream is split into several fragments at arbitrary byte positions (Fig. 4). The E bit MUST be set to 0 for this packetization type.

+---+---+---+----------------------+
|RTP|PL |SOC| jpeg 2000 codestream |
|hdr|hdr|   |     fragment (1)     |
+---+---+---+----------------------+

+---+---+--------------------------+
|RTP|PL |   jpeg 2000 codestream   |
|hdr|hdr|       fragment (2)       |
+---+---+--------------------------+
                ...
+---+---+----------------------+---+
|RTP|PL | jpeg 2000 codestream |EOC|
|hdr|hdr|     fragment (N)     |   |
+---+---+----------------------+---+

*PL hdr = payload header

Fig. 4: Arbitrary length fragmentation.

The E (Enable) bit flag in the payload header MUST be 0 for this packetization type.
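
The arbitrary-length case of Fig. 4 can be sketched in Python as shown below. This is a minimal illustration, not a normative procedure. The function name fragment_codestream and the parameter max_payload (the number of codestream bytes the sender chooses to carry per RTP packet) are hypothetical. The header bytes follow the layout of Fig. 2 with network bit order assumed, and the returned marker value corresponds to the RTP marker bit of Section 4.1.

```python
from typing import List, Tuple


def fragment_codestream(codestream: bytes, max_payload: int) -> List[Tuple[int, bytes]]:
    """Split one frame's codestream at arbitrary byte positions (E bit = 0).

    Returns (marker_bit, payload) pairs, where payload is the 8-byte
    payload header followed by the codestream fragment.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # Conservative check: every offset must fit in the 24-bit field.
    if len(codestream) >= 1 << 24:
        raise ValueError("fragment offset does not fit in 24 bits")
    packets = []
    for offset in range(0, len(codestream), max_payload):
        chunk = codestream[offset:offset + max_payload]
        # First five octets (flags, priority, tile number, reserved) are zero,
        # so X=0 and E=0; the last three octets carry the fragment offset.
        header = bytes(5) + offset.to_bytes(3, "big")
        # The RTP marker bit is 1 only on the last packet of the frame.
        marker = 1 if offset + len(chunk) == len(codestream) else 0
        packets.append((marker, header + chunk))
    return packets
```

For example, a 10-byte codestream with max_payload set to 4 yields three packets with fragment offsets 0, 4 and 8, and only the last one carries the marker bit.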
All fields in the payload header other than the X bit and the fragment offset SHOULD be 0, and the receiver MUST ignore any values in them when the E bit is 0.

Such an RTP packetization scheme is not recommended from the standpoint of error resilience. It is desirable to use it only in the limited environments shown below:

- The sender finds it difficult to distinguish the main header, tile header, and JPEG 2000 packets from one another. Such a situation is likely to occur when the sender has poor computational power and there is no SOP marker in the JPEG 2000 code stream.

- The network environment is error free.

- The JPEG 2000 error resilience markers (TLM, PLM, PLT, PPM, and PPT markers) are present in the code stream. In this case, error resilience will be handled outside of RTP. Its description is not within the scope of this document. Using these markers may improve error resilience and recovery. Producing JPEG 2000 bit streams with these markers is highly recommended in all cases.

5.2 General JPEG 2000 RTP packet types

For all of the following packetization types, the E bit MUST be set to 1.

(1) The JPEG 2000 main header (starting with the SOC marker) must come first, just after the RTP payload header. If a whole main header is packed into the RTP packet, the MHF value must be 3 (=0b11). The tile-part header and jp2-packets MAY follow the main header in the same packet. When only the main header is in the RTP packet, the T flag MUST be set to 1 and the tile number field is ignored. The sender SHOULD set the tile number to 0x00, and the receiver MUST ignore this field.

(2) If two or more tile-parts are packed into a single RTP packet, only whole tile-parts MUST be packed into the RTP packet. Segmented tile-parts MUST NOT be packed or spread over several RTP packets.
When multiple tile-parts exist in a single RTP packet, the T flag MUST be set to 1, which shows that the tile number field is invalid.

(3) If one tile-part is packed into the RTP packet, the tile-part header, if any, MUST come first. Note that the tile-part header just after the main header MAY either be packed with the main header or be separated into another RTP packet. In this case, the T flag MUST be set to 0 and the tile number of the tile-part is set in the "tile number" field. Jp2-packets MAY follow the tile-part header and may be packed into the same RTP packet.

(4) If no headers of any kind are in the RTP packet, the T flag MUST be set to 0 and the tile number field MUST be set to the number of the tile to which the jp2-packets belong.

(5) If the main header, a tile-part header, or a jp2-packet is split into multiple RTP packets, only one fragment SHALL be packed into an RTP packet. If the main header is split, only the last fragment's MHF is 3 (=0b11), and the rest are 2 (=0b10). The MHF value of all other fragmented RTP packets shall be 0.

6. Scalable Delivery and Priority field

The JPEG 2000 code stream has rich functionality built into it so that decoders can easily handle scalable delivery or progressive transmission. Progressive transmission that allows images to be reconstructed with increasing pixel accuracy or spatial resolution is essential for many applications. This feature allows the reconstruction of images with different resolutions and pixel accuracy, as needed or desired, for different target devices. The largest image source devices can provide a code stream that is easily processed for the smallest image display device.

The JPEG 2000 packets contain all compressed image data from a specific layer, a specific component, a specific resolution level, and a specific precinct. The order in which these packets are found in the code stream is called the "progression order".
The ordering of the packets can progress along four axes: layer, component, resolution level and precinct.

Providing a priority field that shows the importance of the data contained in a given RTP packet makes it possible to exploit the progressive and scalable functions of JPEG 2000.

The lower the priority value, the higher the priority. In other words, the priority value 0 is the highest priority, and 255 is the lowest priority. We define the priority values 0 and 255 as special priorities: 0 for the headers (the main header or tile-part header), and 255 for no priority. When any headers (the main header or tile-part header) are packed into the RTP packet, the sender MUST set the priority value to 0. When the sender will not use the priority field, the sender MUST set the priority value to 255 to inform the receiver that the sender does not use the priority field.

6.1 Priority mapping table

For the progression order, the priority value to be given to each JPEG 2000 packet is defined by the priority mapping table.

In principle, the priority mapping table is negotiated between the sender and the receiver through external protocols (such as RTSP, SIP, etc.), which are not within the scope of this document. However, in some environments, such as a multicast videoconference environment, it might be difficult to negotiate the priority mapping table between senders and receivers. We define the default priority mapping for such a situation.

The receiver interprets the priority as a user-defined priority value only when the priority mapping table has been negotiated; otherwise the receiver interprets it as the default priority.

6.1.1 Default priority mapping

The JPEG 2000 codestream is ordered in some progression order and, in most cases, the foremost jp2-packets are more important than the later ones. In the default priority table, the jp2-packet number is used as a priority value.
The jp2-packet number is the "packet sequence number" defined for SOP marker segments, described in Annex A.8.1 [1].

The default priority values have a range from 1 to 254. If the number of packets is larger than 254, that is, if a sequence number exceeds 254, the sender MUST set the priority values of the following jp2-packets to 254.

6.1.2 User-defined priority table

The user-defined priority table is freely defined by users, but the priority values 0 and 255 MUST be used as special priorities: 0 for the headers and 255 for no priority.

For example, for an LRCP order codestream with 3 layers and 3 resolutions, the user-defined priority table can be defined as below (the format is not significant). It has 4 priority levels.

priority 1: L=0,   R=0,   C=any, P=any
priority 2: L=0,   R=1-2, C=any, P=any
priority 3: L=1,   R=any, C=any, P=any
priority 4: L=2,   R=any, C=any, P=any

As another example, a resolution-based priority table can be defined as below:

Priority 1: R=0,   L=0,   C=any, P=any
Priority 2: R=0,   L=1-2, C=any, P=any
Priority 3: R=1,   L=any, C=any, P=any
Priority 4: R=2,   L=any, C=any, P=any

As another example, a component-based priority table can be defined as below:

Priority 1: C=0,   L=0,   R=0,   P=any
Priority 2: C=0,   L=0,   R=any, P=any
            C=0,   L=any, R=0,   P=any
Priority 3: C=1-2, L=any, R=any, P=any

To change the priority mapping table, a new priority mapping table must be sent from the sender to the receiver as needed.

6.2 Sender's Actions

Priority is given in accordance with the priority mapping table. For RTP packets that consist only of a whole or fragmented main/tile header, the sender MUST set priority 0 when a priority mapping table is used. If a priority mapping table is not used, the priority value must be 0xFF for the same RTP packets.

When several jp2-packets are packed into the same RTP packet, the priority values of these jp2-packets are sometimes different.
In such a case, a sender MUST set the packet priority to the highest priority (that is, the lowest priority value) of all the jp2-packets inside the RTP packet.

If the sender does not use any priority mapping table, it MUST set 0xFF in the priority field.

The sender may transmit each priority using separate RTP sessions defined by the priority value. For example, different priorities may be allocated to different multicast groups. The sender may also transmit RTP packets of all priority values using a single RTP session.

6.3 Receiver's Action

Progressive transmission that allows images to be reconstructed with increasing pixel accuracy or spatial resolution is essential for many applications. This feature allows the reconstruction of images with different resolutions and pixel accuracy, as needed or desired, for different target devices. The image architecture provides for the efficient delivery of image data in many applications, such as client/server applications.

The receiver should decode packets above a certain priority to obtain maximum performance depending on the receiver's platform. The receiver can determine on its own (using or not using the mapping table or other variables) the priority level of the RTP packets it should decode.

For example, when a less powerful CPU is used or the terminal has only a low-resolution display, decoding only RTP packets whose priority value is below a certain threshold permits optimal performance.

If any high-priority RTP packet is not received because of packet loss, the frame(s) can be skipped, because no significant visual loss may be perceived even if decoding could be successfully performed.

When an uninterpretable or unexpected priority is received, the receiver must interpret the packets as having no priority (i.e. priority = 0xFF).

7. JPEG 2000 main header compensation

The JPEG 2000 image main header describes various encode parameters, and the decoder decodes by using the parameters described in the main header.
If the RTP packet that contains the main header is lost, the
corresponding JPEG 2000 code stream cannot and should not be decoded.
Even in the extremely rare case where the main header is dropped and
all the remaining JPEG 2000 packets are received successfully, the
receiver cannot decode the frame without the main header information.
A lost main header can nevertheless be recovered to a certain extent
by using the following method.

Recovering a lost main header is very simple with this procedure. In
JPEG 2000 video, it is common that the encode parameters do not vary
greatly between successive frames. Even if the RTP packet carrying the
main header of a frame is dropped, the frame may be decoded by using
the main header of the previous frame, provided that the previous
frame was encoded with the same encode parameters.

The mh_id field of the payload header is used to recognize whether the
encode parameters of the main header are the same as those of the
previous frame. All RTP packets of the same frame carry the same mh_id
value. An mh_id value does not identify one particular set of encode
parameters (the association is not 1:1); it only indicates whether the
encode parameters are the same as those of the previous frame.

The mh_id value SHOULD be saved from previous frames so that it can be
used to recover the current frame's main header if it is lost. If the
main header of the current frame is lost and the mh_id of the current
frame equals the mh_id of the previous frame, the previous frame's
main header SHOULD be used to decode the current frame.

The sender MUST increment mh_id when parameters in the header change,
and MUST send a new main header accordingly.

The receiver MAY use the mh_id and MAY retain the header for such
compensation.
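
The compensation rule above can be sketched in code. This is only a
minimal illustration, not part of the payload format: the names
MainHeaderCache, on_main_header, recover, and next_mh_id are
hypothetical, and the sketch assumes the mh_id rules given in Sections
7.1 and 7.2 (values cycle from 1 to 7, and 0 disables compensation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedHeader:
    mh_id: int          # mh_id of the frame that supplied the header
    main_header: bytes  # raw JPEG 2000 main header bytes


class MainHeaderCache:
    """Hypothetical receiver-side helper; not defined by this draft."""

    def __init__(self) -> None:
        self._saved: Optional[SavedHeader] = None

    def on_main_header(self, mh_id: int, main_header: bytes) -> None:
        # mh_id 0 means the header must not be saved (Section 7.1).
        if mh_id == 0:
            return
        # Only the most recently received main header is kept.
        self._saved = SavedHeader(mh_id, main_header)

    def recover(self, mh_id: int) -> Optional[bytes]:
        """Return a substitute main header for a frame that lost its own."""
        if mh_id == 0 or self._saved is None:
            return None
        if self._saved.mh_id == mh_id:
            return self._saved.main_header
        return None  # parameters changed; the frame cannot be decoded


def next_mh_id(current: int) -> int:
    """Sender side: the mh_id to use after the encode parameters change."""
    # Values cycle through 1..7; 0 is never produced by incrementing.
    return 1 if current >= 7 else current + 1
```

A receiver would call on_main_header() whenever a complete main header
arrives, and recover() with the mh_id taken from any received RTP
packet of a frame whose main header is missing.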

7.1 Sender processing

The sender must transmit RTP packets with the same mh_id value unless
the encode parameters differ from those of the previous frame. The
encode parameters are the fixed information marker segment (SIZ
marker) and the functional marker segments (COD, COC, RGN, QCD, QCC,
and POC) specified in JPEG 2000 Part 1 Annex A [1].

If the encode parameters have changed, the sender MUST increment the
mh_id value by one. The initial mh_id value should be 1. When the
mh_id value exceeds 7, it MUST return to 1.

If the mh_id field is set to 0, the receiver MUST NOT save the main
header and MUST NOT compensate for lost headers using the above
method.

7.2 Receiver processing

When the receiver has received the main header correctly, it should
save the RTP sequence number, the mh_id, and the main header, except
when the mh_id value is 0. Only the last main header that was received
correctly SHOULD be saved. That is, if a main header has already been
saved, it is deleted and the new main header is saved in its place.

When the main header is not received, the receiver compares the
current mh_id value (which can be obtained from any one received RTP
packet of the frame) with the saved mh_id value. When the values are
the same, decoding may be performed by using the saved main header.

Knowing whether the main header has been lost may be difficult,
especially when the main header is fragmented. In all cases, the main
header starts at fragment offset = 0. When the main header is
fragmented, only its first fragment has fragment offset = 0.

8. Optional Payload Header

When the extension bit of the JPEG 2000 payload header is 1, an
optional payload header follows the payload header. The JPEG 2000
video stream payload comes after the optional payload header. Fig. 5
shows the general format of the optional payload header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   optype    |X|            length             |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                 option specific format .....                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 5 : JPEG 2000 video stream optional payload header generic
            format

optype : 7 bits

   optype describes the optional payload header type. Optype value 0
   MUST NOT be used. Optype values from 1 to 63 are defined in this
   specification. In this draft, 1 and 2 are defined and the rest are
   reserved for future use. Optype values from 64 to 127 are
   application-defined and can be used freely for an application's
   own definitions. Any optype value in the range 0-63 that is not
   specified within this document MUST be ignored, and the
   accompanying header must be ignored as well.

   +--------------------+----------------------------------+
   | Optype value range | Defined in                       |
   +--------------------+----------------------------------+
   | 0                  | Not allowed.                     |
   | 1-63               | In this specification.           |
   | 64-127             | Free for application definition. |
   +--------------------+----------------------------------+

   Table 2: Optype value definition range

X : 1 bit

   Further extension bit. This must be set to 1 if another optional
   payload header follows this optional payload header; otherwise it
   must be set to 0. When the extension bit of the optional header is
   1, another optional payload header MUST come immediately after
   this optional payload header.

length : 16 bits

   This value must be the length of the optional header in bytes.

The receiver shall process the optional header when the extension bit
of the JPEG 2000 payload header is 1.
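
As an illustration of the generic format above, the following sketch
walks a chain of optional payload headers. It is not a normative
parser. It rests on two assumptions that this draft does not state
explicitly: bit 0 in Fig. 5 is the most significant bit of the first
byte, and length counts the option-specific bytes that follow the
3-byte optype/X/length prefix (this reading agrees with
length = Lxxx - 1 in Section 8.1). The names parse_optional_headers
and DEFINED_OPTYPES are hypothetical.

```python
import struct
from typing import List, Tuple

# Optype values defined in this draft (1: marker segment, 2: JPIP).
DEFINED_OPTYPES = {1, 2}


def parse_optional_headers(buf: bytes) -> Tuple[List[Tuple[int, bytes]], int]:
    """Walk the optional payload header chain at the start of buf.

    Returns ([(optype, option_specific_bytes), ...], bytes_consumed).
    Assumption: `length` counts only the option-specific bytes that
    follow the 3-byte optype/X/length prefix.
    """
    headers: List[Tuple[int, bytes]] = []
    offset = 0
    more = True
    while more:
        if len(buf) - offset < 3:
            raise ValueError("truncated optional payload header")
        first, length = struct.unpack_from("!BH", buf, offset)
        optype = first >> 1          # upper 7 bits of the first byte
        more = bool(first & 0x01)    # X bit: another header follows
        offset += 3
        if len(buf) - offset < length:
            raise ValueError("truncated option specific data")
        body = buf[offset:offset + length]
        offset += length
        # Values 0-63 that this draft does not define are ignored.
        if optype < 64 and optype not in DEFINED_OPTYPES:
            continue
        headers.append((optype, body))
    return headers, offset
```

The second return value is the offset at which the JPEG 2000 video
stream payload begins, so a caller can slice the remaining bytes
directly.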

8.1 Marker Segment Optional Header

The marker segment optional header allows changes to almost any
property carried by the functional markers of the JPEG 2000 main or
tile header (SIZ, COD, COC, RGN, QCD, QCC, POC, etc.). As an optional
header, it can be used to duplicate critical data from the main or
tile header redundantly with each packet. It also makes small changes
to a larger header simple to signal.

The format of this optional header is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  optype=1   |X|            length             |F|   JP2code   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      marker segment data                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 6 : Marker segment optional header format

optype : 7 bits

   This value MUST be 1 for this optional header.

X : 1 bit

   Extension bit. This signifies whether another optional header
   follows this one. If there is another, the X bit MUST be set to 1;
   otherwise it must be 0. For multiple changes to the header,
   chaining these headers together is recommended.

length : 16 bits

   Length value. The length of this optional header should be the
   length of the corresponding JP2 functional marker segment minus 1
   (i.e. Lxxx - 1). See section 8.1.1 and section 8.1.2 for specific
   examples.

F : 1 bit

   Functional bit. Indicates whether the optional header makes a
   change in the main header or in a tile header: F = 0 for a tile
   header and F = 1 for the main header.

JP2code : 7 bits

   JP2 functional code value. This value contains the lower 7 bits of
   the original JPEG 2000 functional marker code (e.g. COD marker =
   0xFF52, lower 7 bits = 0x52 -> 0b1010010).

marker segment data : variable (size given by length)

   The data in this area MUST be the same as the corresponding JPEG
   2000 marker segment data specified in Annex A of [1], but not
   including the length field of the marker segment.
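
The field layout above can be exercised with a small sketch that wraps
one JPEG 2000 marker segment as a marker segment optional header. It
is illustrative only. The function name build_marker_segment_option
and its parameters are hypothetical, and the sketch assumes that bit 0
in the figure is the most significant bit of each byte and that Lxxx
is the marker segment length as defined in Annex A of [1] (it counts
its own two bytes but not the two marker bytes).

```python
import struct


def build_marker_segment_option(segment: bytes, in_main_header: bool,
                                more_follow: bool = False) -> bytes:
    """Wrap one marker segment as an optype=1 optional header.

    `segment` is the complete marker segment: the 2-byte marker
    (0xFFxx), the 2-byte length Lxxx, then the marker parameters.
    """
    if len(segment) < 4:
        raise ValueError("marker segment is too short")
    marker, lxxx = struct.unpack_from("!HH", segment, 0)
    if marker >> 8 != 0xFF:
        raise ValueError("not a JPEG 2000 marker")
    if len(segment) != 2 + lxxx:
        raise ValueError("segment size does not match Lxxx")
    data = segment[4:]                   # parameters, without marker and Lxxx
    byte0 = (1 << 1) | int(more_follow)  # optype = 1 (7 bits), then X
    byte3 = (int(in_main_header) << 7) | (marker & 0x7F)  # F, then JP2code
    return struct.pack("!BHB", byte0, lxxx - 1, byte3) + data
```

For a main-header COD segment with Lcod = 12, the sketch emits the
prefix bytes 0x02 0x00 0x0B 0xD2 (optype=1, X=0, length=11, F=1,
JP2code=0x52) followed by the 10 bytes of Scod, SGcod, and SPcod.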

A limitation of this optional header is that the functional markers it
carries MUST be present in the original main or tile header. Markers
other than the ones found in main or tile headers MUST NOT be present
in this header.

8.1.1 Specific example of marker segment header: COD

Here is a specific marker segment header for a COD functional segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  optype=1   |0|        length=(Lcod-1)        |1|   1010010   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Scod      |                     SGcod                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     SGcod     |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                      SPcod (Lcod length)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 7 : COD marker segment optional header example

   - Optype = 1. As specified in this recommendation.
   - X = 0. For this instance.
   - Length = Lcod - 1. The length of the original COD marker
     segment minus 1.
   - F = 1. This change is in the main header, so F=1.
   - JP2Code = 1010010 -> 0x52. COD marker in JPEG 2000 value:
     0xFF52.

8.1.2 Specific example of marker segment header: QCD

Here is a specific marker segment header for a QCD functional segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  optype=1   |0|        length=(Lqcd-1)        |0|   1011100   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Sqcd      |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                      SPqcd (Lqcd length)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 8 : QCD marker segment optional header example

   - Optype = 1. As specified in this recommendation.
   - X = 0. For this instance.
   - Length = Lqcd - 1. The length of the original QCD marker
     segment minus 1.
   - F = 0. This change is in the tile header, so F=0.
   - JP2Code = 1011100 -> 0x5C.
     QCD marker in JPEG 2000 value: 0xFF5C.

8.2 JPIP Optional Header

Interoperability with different standards is extremely useful. The ISO
WG1 group has also put forth a transmission protocol standard called
JPIP. This standard is a protocol for viewing JPEG 2000 images
interactively using RTSP. To embrace this standard, an optional JPIP
header for handling the RTP data in JPIP-compatible clients is defined
here.

At the time of this writing, the JPIP work is still in an early stage
of standardization. For now, optype value 2 is reserved for JPIP and
will be assigned once the JPIP work is complete. The option specific
information in this optional header shall be the same as the server
response data packet from a JPIP server, or a description of the JPEG
2000 packets contained in the RTP packet.

Optype : 7 bits

   The optype value for a compatible JPIP optional header must be 2.

Option specific format : X bits

   This shall be determined at a later date.

9. Security Consideration

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [3]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data, so there is no conflict between the
two operations.

10. Recommended Practices

As the JPEG 2000 coding standard is highly flexible, many different
but compliant data streams can be produced and still be labeled as a
JPEG 2000 data stream. The following is a set of recommendations drawn
from our experience in developing JPEG 2000 and this payload
specification.

Implementations of this standard must handle all possibilities
mentioned in this specification. The following is a listing of items
an implementation could optimize.

Error Resilience Markers

   The use of error resilience markers in the JPEG 2000 data stream
   is highly recommended in all situations. Error recovery with these
   markers helps the decoder and saves external resources. Examples
   of such markers are RESET, RESTART, and ERTERM.

Packetization Ordering

   Packetization ordering is completely dependent on the client's
   capabilities. Some orderings reduce the amount of distortion in
   the event of loss, at the expense of memory storage and packet
   reordering.

YCbCr Color space

   The YCbCr color space provides the greatest amount of color
   compression with respect to the human visual system. When used
   with JPEG 2000, this color space can provide excellent visual
   results at extreme bit rates.

Progression Ordering

   JPEG 2000 offers many different ways to order the final code
   stream to match the transfer to the presentation. The most useful
   orderings in our use cases have been layer progression and
   resolution progression.

Tiling and Packets

   JPEG 2000 packets are formed regardless of the encoding method.
   The encoder has little control over the size of these JPEG 2000
   packets, which may be large or small.

   Tiling splits the image into smaller areas, each of which is
   encoded separately. With tiles, the JPEG 2000 packet sizes are
   also reduced. When tiling is used, almost all JPEG 2000 packets
   have an acceptable size (i.e. smaller than the MTU size of most
   networks). It is highly recommended that tiling be used so that
   packetization of JPEG 2000 packets for transport is simpler.

11. Intellectual Property Right

Some of the formats and mechanisms in this document are included in a
pending patent application that has been FILED with the Japanese
Patent Office. It must be stressed that, as of this document's
submission, the application has only been filed and has not been
granted.
We wish to contribute to the development of the Internet community and
to continue our good relationship with the IETF. Our primary concern
is to promote technology that others may find useful and worthwhile,
in the spirit of the IETF. If the mechanisms are granted as patents,
the patents will be licensed under reasonable and non-discriminatory
conditions to any person(s) who wish to implement such mechanisms.

12. References

[1] ISO/IEC JTC1/SC29: ISO/IEC 15444-1 "Information technology - JPEG
    2000 image coding system - Part 1: Core coding system", December
    2000.

[2] ISO/IEC JTC1/SC29/WG1: "Motion JPEG 2000 Committee Draft 1.0",
    http://www.jpeg.org/public/cd15444-3.pdf, December 2000.

[3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson: "RTP: A
    Transport Protocol for Real Time Applications", RFC 1889, January
    1996.

[4] ISO/IEC JTC1/SC29/WG1: "JPEG2000 requirements and profiles
    version 6.3", draft in progress,
    http://www.jpeg.org/public/wg1n1803.pdf

[5] Diego Santa-Cruz, Touradj Ebrahimi, Joel Askelof, Mathias Larsson
    and Charilaos Christopoulos: "JPEG 2000 still image coding versus
    other standards", In Proc. of SPIE's 45th annual meeting,
    Application of Digital Image Processing XXIII, vol.4115,
    pp.446-454, July 2000.

13. Authors' Addresses

   Eric Edwards
   Sony Corporation
   Media Processing Division
   Network & Software Technology Center of America
   3300 Zanker Road, MD: SJ2C4
   San Jose, CA 95134
   Phone: +1 408 955 6462
   Fax:   +1 408 955 5724
   Email: Eric.Edwards@am.sony.com

   Satoshi Futemma/Eisaburo Itakura/Nobuyoshi Tomita
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo 141-0001 JAPAN
   Phone: +81 3 5448 3096
   Fax:   +81 3 5448 4622
   Email: {satosi-f|itakura|n-tomita}@sm.sony.co.jp

   Andrew Leung/Takahiro Fukuhara
   Sony Corporation
   1-11-1 Osaki Shinagawa-ku
   Tokyo 141-0032 JAPAN
   Phone: +81 3 5435 3665
   Fax:   +81 3 5435 3891
   Email: {andrew|fukuhara}@av.crl.sony.co.jp