Internet Draft                                              S. Wenger
Document: draft-ietf-avt-rtp-h264-01.txt              M.M. Hannuksela
Expires: August 2003                                   T. Stockhammer
                                                        February 2003


                   RTP Payload Format for JVT Video

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This memo describes an RTP payload format for the ITU-T Recommendation H.264 video codec. The most up-to-date draft of the video codec was specified in February 2003, is due for final approval at the committee level in late March 2003, and is available for public review [1]. This codec was designed as a joint project of the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. ISO/IEC International Standard 14496-10 will be technically identical to ITU-T Recommendation H.264.

Wenger et. al.            Expires August 2003                 [Page 1]
Internet Draft                                          01 March, 2003

Table of Contents

   1. Introduction
      1.1. The JVT codec
      1.2. Parameter Set Concept
      1.3. Network Abstraction Layer Packet (NALU) Types
   2. Conventions
   3. Changes relative to draft-ietf-avt-rtp-h264-00.txt
      3.1. Status of the JVT standardization, and recent changes to JVT
      3.2. Changes relative to draft-ietf-avt-rtp-h264-00.txt
   4. Scope
   5. Definitions
   6. RTP Payload Format
      6.1. RTP Header Usage
      6.2. Simple Packet
      6.3. Aggregation Packets
      6.4. Fragmentation Units
   7. Packetization Rules
      7.1. Unrestricted Mode (Multiple Picture Model)
      7.2. Restricted Mode (Single Picture Model)
   8. De-Packetization Process
   9. MIME Considerations
      9.1. MIME Registration
      9.2. SDP Parameters
   10. Security Considerations
   11. Informative Appendix: Application Examples
      11.1. Video Telephony, no Data Partitioning, no packet aggregation
      11.2. Video Telephony, Interleaved Packetization using Packet Aggregation
      11.3. Video Telephony, with Data Partitioning
      11.4. Low-Bit-Rate Streaming
      11.5. Robust Packet Scheduling in Video Streaming
   12. Open Issues
   13. Full Copyright Statement
   14. Intellectual Property Notice
   15. References
      15.1. Normative References
      15.2. Informative References

1. Introduction

1.1. The JVT codec

This memo specifies an RTP payload format for a new video codec that is currently under development by the Joint Video Team (JVT), which is formed of video coding experts of MPEG and the ITU-T. After the likely approval by the two parent bodies, the codec specification will have the status of ITU-T Recommendation H.264 and will become part of the MPEG-4 specification (ISO/IEC 14496 Part 10). According to the current timeline of the JVT project, a technically frozen specification has existed since February 2003 (pending bug fixes). It is believed that very few, if any, technical details that directly affect this draft will change in the future. Before JVT was formed in late 2001, this project used the ITU-T project name H.26L, and the JVT project inherited all the technical concepts of the H.26L project.

The JVT video codec has a very broad application range that covers all forms of digital compressed video, from low bit rate Internet streaming applications to HDTV broadcast and Digital Cinema applications with near-lossless coding. Most, if not all, relevant companies in all of these fields (including video conferencing, streaming, TV broadcast, and Digital Cinema) have participated in the standardization, which gives hope that this wide application range is realistic and will materialize, probably in a relatively short time frame. The overall performance of the JVT codec is such that bit rate savings of 50% or more, compared to the current state of technology, are reported.
Digital Satellite TV quality, for example, was reported to be achievable at 1.5 Mbit/s, compared to the current operation point of MPEG-2 video at around 3.5 Mbit/s [5].

The codec specification [1] itself distinguishes conceptually between a video coding layer (VCL) and a network abstraction layer (NAL). The VCL contains the signal processing functionality of the codec: transform, quantization, motion search/compensation, and the loop filter. It follows the general concept of most of today's video codecs, a macroblock-based coder that uses inter-picture prediction with motion compensation and transform coding of the residual signal. The output of the VCL consists of slices. A slice is a bit string that contains the macroblock data of an integer number of macroblocks, plus the slice header information (containing the spatial address of the first macroblock in the slice, the initial quantization parameter, and similar). Macroblocks in slices are ordered in scan order unless a different macroblock allocation is specified using the so-called Flexible Macroblock Ordering syntax. In-picture prediction is used only within a slice.

The NAL encapsulates the slice output of the VCL into Network Abstraction Layer Units (NALUs), which are suitable for transmission over packet networks or for use in packet-oriented multiplex environments. Annex B of the JVT specification defines an encapsulation process to transmit such NALUs over byte-stream oriented networks; Annex B is not relevant within the scope of this memo.

Neither the VCL nor the NAL is claimed to be media or network independent. The VCL needs to know transmission characteristics in order to select the appropriate error resilience strength, slice size, etc., whereas the NAL needs information such as the importance of a bit string provided by the VCL in order to select the appropriate application layer protection.

Internally, the NAL uses NAL Units or NALUs.
A NALU consists of a one-byte header and the payload byte string. The header co-serves as the RTP payload header and indicates the type of the NALU, the (potential) presence of bit errors in the NALU payload, and the relative importance of the NALU for the decoding process. This RTP payload specification is designed to be unaware of the bit string in the NALU payload.

One of the main properties of the JVT codec is the complete decoupling of the transmission time, the decoding time, and the sampling or presentation time of slices and pictures. The codec itself is unaware of time and does not carry information such as the number of skipped frames (commonly carried as the Temporal Reference in earlier video compression standards). Also, some NAL units affect many pictures and are, hence, inherently timeless. For this reason, the handling of the RTP timestamp requires special consideration for those NALUs whose sampling or presentation time is not defined or is unknown at transmission time.

1.2. Parameter Set Concept

One very fundamental design concept of the JVT codec is to generate self-contained packets, which makes mechanisms such as the header duplication of RFC2429 [6] or MPEG-4's HEC [7] unnecessary. This is achieved by decoupling information that is relevant to more than one slice from the media stream. This higher-layer meta information should be sent reliably, asynchronously, and in advance of the RTP packet stream that contains the slice packets. (Provisions for sending this information in-band are also available for applications that do not have an out-of-band transport channel appropriate for the purpose.) The combination of the higher-level parameters is called a Parameter Set. The Parameter Set contains information such as:

   o picture size,
   o display window,
   o optional coding modes employed,
   o macroblock allocation map,
   o and others.
In order to be able to change picture parameters (such as the picture size) without the need to transmit Parameter Set updates synchronously to the slice packet stream, the encoder and decoder can maintain a list of more than one Parameter Set. Each slice header contains a codeword that indicates the Parameter Set to be used.

This mechanism allows the transmission of the Parameter Sets to be decoupled from the packet stream, so that they can be transmitted by external means, e.g., as a side effect of the capability exchange, or through a (reliable or unreliable) control protocol. They may even never be transmitted at all, but instead be fixed by an application design specification. Although, conceptually, Parameter Set updates are not designed to be sent in the synchronous packet stream, this memo contains means to convey them in the RTP packet stream.

1.3. Network Abstraction Layer Packet (NALU) Types

Tutorial information on the NAL design can be found in [8], [9], and [10]. For the precise definition of the NAL, refer to [1].

Every NALU consists of a single NALU type octet, which also co-serves as the payload header, immediately followed by the NALU payload. The NALU type octet has the following format:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |F|NRI|  Type   |
    +---------------+

F: 1 bit
   The Forbidden bit, when zero, indicates a NAL unit free of bit errors. The JVT specification declares a value of 1 as a syntax violation. Hence, when the bit is set, the decoder is advised that bit errors may be present in the payload or in the NALU type octet. A prudent reaction of decoders that are incapable of handling bit errors is to discard such packets.

NRI: 2 bits
   NAL Reference IDC. A value of 00 indicates that the content of the NALU is not used to reconstruct stored pictures (that can be used for future reference).
   Such NALUs can be discarded without risking the integrity of the reference pictures. Values above 00 indicate that the decoding of the NALU is required to maintain the integrity of the reference pictures. Furthermore, values above 00 indicate the relative transport priority, as determined by the encoder. Intelligent network elements can use this information to protect more important NALUs better than less important NALUs. 11 is the highest transport priority, followed by 10, then by 01; finally, 00 is the lowest.

Type: 5 bits
   The NAL Unit payload type as defined in table 7.1 of [1] and later within this memo. Note that the NAL unit types defined in this memo are marked as reserved for external use in [1]. For a reference of all currently defined NALU types and their semantics, please refer to section 7.4.1 in [1]. In particular, note that VCL NAL units refer to coded slice and data partition NAL units as well as filler data NAL units.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

3. Changes relative to draft-ietf-avt-rtp-h264-00.txt

[This section will be removed in a future version of this draft.]

3.1. Status of the JVT standardization, and recent changes to JVT

None that affect this draft.

3.2. Changes relative to draft-ietf-avt-rtp-h264-00.txt

This memo contains the following technical changes relative to the previous I-D:

   o The MTAPs with timestamp offset lengths of 8 and 32 bits are removed, as discussed in Atlanta.

   o The remarks about application layer protection are aligned with the current thinking of the AVT group regarding congestion control.

   o A fragmentation NALU has been introduced to allow fragmenting long NALUs into several RTP packets.
   o Recovering the decoding order of NALUs carried in MTAPs is clarified, and transmission of NALUs out of decoding order is allowed, which can be used for robust packet scheduling in streaming systems (see section 11.5 for further details).

   o The rule of assigning the RTP timestamp to non-slice NALUs has been changed: the RTP timestamp is set to the RTP timestamp of the primary coded picture with which the NALU is associated according to section 7.4.1.2 of [1].

   o MIME type registration and SDP usage have been specified.

4. Scope

This payload specification can only be used to carry the "naked" JVT NALU stream over RTP. Likely, the first applications of a Standards Track RFC resulting from this draft will be in the conversational multimedia field: video telephony or video conferencing. The draft is not intended for use in conjunction with the Byte Stream format of Annex B of the JVT working draft.

5. Definitions

This document uses the definitions of [1]. In addition, the following definitions apply:

NAL unit decoding order: A NAL unit order that conforms to the constraints on NAL unit order given in section 7.4.1.1 in [1].

Transmission order: The order of packets in ascending RTP sequence number order (in modulo arithmetic). Within an Aggregation Packet, the NAL unit transmission order is the same as the order of appearance of NAL units in the packet.

6. RTP Payload Format

6.1. RTP Header Usage

The format of the RTP header is specified in RFC 1889 [3] and reprinted in Figure XXXX for convenience. This payload format uses the fields of the header in a manner consistent with that specification.

When one NALU is encapsulated per RTP packet, the RECOMMENDED RTP payload format is specified in section 6.2. The RTP payload formats (and the settings for some RTP header bits) for aggregation packets and fragmentation units are specified in sections 6.3 and 6.4, respectively.
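The header handling of section 6.1 and the NALU type octet of section 1.3 can be illustrated with a short receiver-side sketch in Python. It parses the fixed RTP header of RFC 1889 (skipping any CSRC list and header extension, and stripping RTP padding) and then splits the first payload octet into its F, NRI, and Type fields. The names `RtpPacket`, `parse_rtp`, and `parse_nalu_type_octet` are hypothetical, and error handling is limited to a few basic length checks.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: list
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the fixed RTP header (RFC 1889) and return header fields plus payload."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("unexpected RTP version %d" % version)
    padding = bool(b0 & 0x20)
    extension = bool(b0 & 0x10)
    cc = b0 & 0x0F
    offset = 12
    # CSRC list: CC entries of 32 bits each.
    csrcs = list(struct.unpack("!%dI" % cc, packet[offset:offset + 4 * cc]))
    offset += 4 * cc
    if extension:
        # Header extension: 16 profile-defined bits, then a 16-bit length in 32-bit words.
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if padding:
        end -= packet[-1]  # the last octet counts the padding octets, itself included
    if end < offset:
        raise ValueError("inconsistent header, extension, or padding length")
    return RtpPacket(version, padding, extension, cc, bool(b1 & 0x80), b1 & 0x7F,
                     seq, ts, ssrc, csrcs, packet[offset:end])


def parse_nalu_type_octet(octet: int) -> tuple:
    """Split the NALU type octet into its (F, NRI, Type) fields."""
    return (octet >> 7) & 0x1, (octet >> 5) & 0x3, octet & 0x1F
```

A receiver would then use the returned Type value to decide whether the payload is a Simple Packet (section 6.2), an aggregation packet (section 6.3), or a fragmentation unit (section 6.4).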
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           synchronization source (SSRC) identifier            |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |            contributing source (CSRC) identifiers             |
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx: RTP header according to RFC 1889.

The RTP header information is set as follows:

Version (V): 2 bits
   Set to 2 according to RFC 1889.

Padding (P): 1 bit
   Used according to RFC 1889.

Extension (X): 1 bit
   Specified in the RTP profile in use.

CSRC count (CC): 4 bits
   Used according to RFC 1889.

Marker bit (M): 1 bit
   Set for the very last packet of the picture indicated by the RTP timestamp, in line with the normal use of the M bit and to allow efficient playout buffer handling. Decoders MAY use this bit as an early indication of the last packet of a coded picture, but MUST NOT rely on this property, because the last packet of the picture may get lost and because the use of MTAPs does not always preserve the M bit.

Payload type (PT): 7 bits
   The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. It is expected that the RTP profile under which this payload format is being used will assign a payload type for this encoding or specify that the payload type is to be bound dynamically.

Sequence number (SN): 16 bits
   Incremented by one for each packet sent. The initial value is random, as per RFC 1889.

Timestamp: 32 bits
   The RTP timestamp is set to the sampling timestamp of the content. If the NALU has no timing properties of its own (e.g.
   parameter set and SEI NAL units), the RTP timestamp is set to the RTP timestamp of the primary coded picture with which the NALU is associated according to section 7.4.1.2 of [1]. The setting of the RTP timestamp for MTAPs is defined in section 6.3.2.

Synchronization source (SSRC) identifier: 32 bits
   Used according to RFC 1889.

Contributing source (CSRC) identifiers: 0 to 15 items, 32 bits each
   Used according to RFC 1889.

6.2. Simple Packet

The RTP payload of a Simple Packet according to this specification consists of one NALU, as depicted in Figure xxxx. The type of the NALU MUST be specified in [1], i.e., the NALU MUST NOT be an aggregation packet or a fragmentation unit. A NAL unit stream composed by decapsulating Simple Packets in RTP sequence number order MUST conform to the NAL unit decoding order.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    |                             NALU                              |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :...OPTIONAL RTP padding        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. RTP payload format for Simple Packet.

6.3. Aggregation Packets

Aggregation packets are the packet aggregation scheme of this payload specification. The scheme is introduced to reflect the dramatically different MTU sizes of two key target networks: wireline IP networks (with an MTU size that is often limited by the Ethernet MTU size of roughly 1500 bytes) and IP or non-IP (e.g., H.324/M) based wireless networks with preferred transmission unit sizes of 254 bytes or less. In order to prevent media transcoding between the two worlds, and to avoid undesirable packetization overhead, a packet aggregation scheme is introduced.

Two types of Aggregation packets are defined by this specification:

   o Single-Time Aggregation Packets (STAPs) aggregate NALUs with identical NALU-time.
   o Multi-Time Aggregation Packets (MTAPs) aggregate NALUs with potentially differing NALU-time. Two different MTAPs are defined that differ in the length of the NALU timestamp offset.

The term NALU-time is defined as the value the RTP timestamp would have if that NALU were transported in its own RTP packet.

The structure of the RTP payload format for aggregation packets is presented in Figure xxxx.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|NRI|  type   |                                               |
    +-+-+-+-+-+-+-+-+                                               |
    |                                                               |
    |                         NALU payload                          |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :...OPTIONAL RTP padding        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. RTP payload format for aggregation packets.

MTAPs and STAPs share the following packetization rules: The RTP timestamp MUST be set to the minimum of the NALU-times of all the NALUs to be aggregated. The Type field of the NALU type octet MUST be set to the appropriate value as indicated in table xxx. The F bit MUST be cleared if all F bits of the aggregated NALUs are zero; otherwise, it MUST be set.

   Table xxx: Type field for STAP and MTAPs

   Type   Packet   Timestamp offset field length (in bits)
   -------------------------------------------------------
   0x18   STAP      0
   0x19   MTAP16   16
   0x20   MTAP24   24

The Marker bit in the RTP header MUST be set to the value that the marker bit of the last NALU of the aggregated packet would have if it were transported in its own RTP packet.

The NALU payload of an aggregation packet consists of one or more aggregation units. See sections 6.3.1 and 6.3.2 for the two different types of aggregation units.
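The shared rules above for setting the RTP timestamp, the F bit, and the M bit of an aggregation packet can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `PendingNalu` and `aggregation_header` are hypothetical names, the caller supplies the Type value from table xxx and the NRI bits (which the rules above do not determine), and RTP timestamp wraparound is ignored when taking the minimum.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingNalu:
    data: bytes      # complete NALU, starting with its NALU type octet
    nalu_time: int   # RTP timestamp the NALU would carry in its own packet
    marker: bool     # M bit the NALU would carry in its own packet


def aggregation_header(nalus: List[PendingNalu], agg_type: int, nri: int) -> Tuple[int, bool, int]:
    """Return (RTP timestamp, M bit, NALU type octet) for an aggregation packet.

    agg_type: Type value of the chosen aggregation packet (caller-supplied).
    nri: NRI bits for the aggregation packet (caller-supplied assumption).
    """
    if not nalus:
        raise ValueError("an aggregation packet carries at least one NALU")
    if not 0 <= agg_type < 32 or not 0 <= nri < 4:
        raise ValueError("Type is a 5-bit field and NRI a 2-bit field")
    # RTP timestamp: minimum NALU-time (no timestamp wraparound handling here).
    timestamp = min(n.nalu_time for n in nalus)
    # F bit: cleared only if every aggregated NALU has F == 0.
    f_bit = 1 if any(n.data[0] & 0x80 for n in nalus) else 0
    # M bit: the value the last aggregated NALU would have in its own packet.
    marker = nalus[-1].marker
    return timestamp, marker, (f_bit << 7) | (nri << 5) | agg_type
```

The list order is taken to be the order of the aggregation units in the packet, which is why the M bit comes from the final element.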
An aggregation packet can carry as many aggregation units as necessary; however, the total amount of data in an aggregation packet obviously MUST fit into an IP packet, and the size SHOULD be chosen such that the resulting IP packet is smaller than the MTU size. An aggregation packet MUST NOT contain fragmentation units as specified in section 6.4. A NAL unit stream composed by decapsulating Aggregation Packets in RTP sequence number order is NOT REQUIRED to conform to the NAL unit decoding order. Requirements on the NAL unit transmission order are specified in section 7, and means to recover the NAL unit decoding order are given in section 8.

6.3.1. Single-Time Aggregation Packet

The Single-Time Aggregation Packet (STAP) SHOULD be used whenever NALUs that share the same NALU-time are aggregated. The NALU payload of an STAP consists of a 16-bit unsigned decoding order number (DON) followed by at least one Single-Picture Aggregation Unit, as presented in Figure XXXX.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :  decoding order number (DON)  |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |               single-picture aggregation units                |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. NALU payload format for STAP.

DON indicates the NAL unit decoding order specified in this document. The DON of the first NALU in transmission order MAY be set to any value. Let the DON of one NAL unit be D1 and the DON of another NAL unit be D2. If D1 < D2 and D2 - D1 < 32768, or if D1 > D2 and D1 - D2 >= 32768, then the NAL unit having DON equal to D1 precedes the NAL unit having DON equal to D2 in NAL unit decoding order.
If D1 < D2 and D2 - D1 >= 32768, or if D1 > D2 and D1 - D2 < 32768, then the NAL unit having DON equal to D2 precedes the NAL unit having DON equal to D1 in NAL unit decoding order. NAL units associated with different primary coded pictures according to subclause 7.4.1.2 of [1] MUST NOT have the same value of DON. NAL units associated with the same primary coded picture according to subclause 7.4.1.2 of [1] MAY have the same value of DON. If all NAL units of a primary coded picture have the same value of DON, NAL units of a redundant coded picture associated with the primary coded picture SHOULD have the same value of DON as the NAL units of the primary coded picture.

The NAL unit decoding order of NAL units that have the same value of DON is the following:

   1. Picture delimiter NAL unit, if any
   2. Sequence parameter set NAL units, if any
   3. Picture parameter set NAL units, if any
   4. SEI NAL units, if any
   5. Coded slice and slice data partition NAL units of the primary coded picture, if any
   6. Coded slice and slice data partition NAL units of the redundant coded pictures, if any
   7. Filler data NAL units, if any
   8. End of sequence NAL unit, if any
   9. End of stream NAL unit, if any

A Single-Picture Aggregation Unit consists of 16-bit unsigned size information that indicates the size of the following NALU in bytes (excluding these two octets, but including the NALU type octet of the NALU), followed by the NALU itself, including its NALU type octet. A Single-Picture Aggregation Unit is byte-aligned within the RTP payload, but it may not be aligned on a 32-bit word boundary. Figure xxxx presents the structure of the Single-Picture Aggregation Unit.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :           NALU size           |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |                             NALU                              |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. Structure for single-picture aggregation unit.

6.3.2. Multi-Time Aggregation Packets (MTAPs)

The NALU payload of an MTAP consists of a 16-bit unsigned decoding order number base (DONB) and one or more Multi-Picture Aggregation Units, as presented in Figure xxxx. DONB MUST contain the smallest value of DON among the NAL units of the MTAP. The choice between the different MTAP fields is application dependent: the larger the timestamp offset, the higher the flexibility of the MTAP, but also the higher the overhead.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :  decoding order number base   |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |                multi-picture aggregation units                |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. NALU payload format for MTAPs.

Two different Multi-Time Aggregation Units are defined in this specification. Both of them consist of 16 bits of unsigned size information for the following NALU, an 8-bit unsigned decoding order number delta (DOND), and n bits of timing information for this NALU, where n can be 16 or 24. The structures of the Multi-Time Aggregation Units for MTAP16 and MTAP24 are presented in figures XXXX and XXXX, respectively. Note that the starting or ending position of an aggregation unit within a packet is NOT REQUIRED to be on a 32-bit word boundary. The DON of the following NALU is equal to DONB + DOND and MUST NOT be larger than 65535. This memo does not specify how the NALUs within an MTAP are ordered, but, in most cases, NAL unit decoding order, i.e., ascending order of DONDs, SHOULD be used. The timing information field MUST be set so that the RTP timestamp of an RTP packet of each NALU in the MTAP (the NALU-time) can be generated by adding the timing information to the RTP timestamp of the MTAP.
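The DON comparison rule of section 6.3.1, and the DON and NALU-time derivations for STAPs and MTAP16, can be sketched in Python as below. The functions `don_precedes`, `sort_by_don`, `parse_stap`, and `parse_mtap16` are hypothetical; both parsers assume that the NALU type octet has already been removed from the payload, and `sort_by_don` does not apply the type-based ordering that section 6.3.1 prescribes for NAL units sharing the same DON.

```python
import struct
from functools import cmp_to_key
from typing import List, Tuple


def don_precedes(d1: int, d2: int) -> bool:
    """True if DON d1 precedes DON d2 in NAL unit decoding order (modulo 2^16)."""
    if d1 < d2:
        return d2 - d1 < 32768
    if d1 > d2:
        return d1 - d2 >= 32768
    return False  # equal DONs are ordered by NAL unit type instead


def sort_by_don(units: List[tuple]) -> List[tuple]:
    """Sort tuples whose first element is a DON into NAL unit decoding order.

    Assumes all DONs lie within a window of fewer than 32768 values, and keeps
    the received order for equal DONs (no type-based ordering is applied).
    """
    def compare(a: tuple, b: tuple) -> int:
        if don_precedes(a[0], b[0]):
            return -1
        if don_precedes(b[0], a[0]):
            return 1
        return 0
    return sorted(units, key=cmp_to_key(compare))


def parse_stap(payload: bytes) -> List[Tuple[int, bytes]]:
    """Split an STAP payload (NALU type octet removed) into (DON, NALU) pairs."""
    (don,) = struct.unpack("!H", payload[:2])
    units, offset = [], 2
    while offset < len(payload):
        (size,) = struct.unpack("!H", payload[offset:offset + 2])
        offset += 2
        nalu = payload[offset:offset + size]
        if len(nalu) != size:
            raise ValueError("truncated single-picture aggregation unit")
        offset += size
        units.append((don, nalu))  # the single DON field applies to every unit
    return units


def parse_mtap16(payload: bytes, mtap_timestamp: int) -> List[Tuple[int, int, bytes]]:
    """Split an MTAP16 payload (NALU type octet removed) into (DON, NALU-time, NALU)."""
    (donb,) = struct.unpack("!H", payload[:2])
    units, offset = [], 2
    while offset < len(payload):
        # 16-bit NALU size, 8-bit DOND, 16-bit timestamp offset
        size, dond, ts_offset = struct.unpack("!HBH", payload[offset:offset + 5])
        offset += 5
        nalu = payload[offset:offset + size]
        if len(nalu) != size:
            raise ValueError("truncated multi-time aggregation unit")
        offset += size
        don = donb + dond
        if don > 65535:
            raise ValueError("DONB + DOND must not exceed 65535")
        nalu_time = (mtap_timestamp + ts_offset) & 0xFFFFFFFF  # 32-bit RTP timestamp
        units.append((don, nalu_time, nalu))
    return units
```

An MTAP24 parser would differ only in reading a 24-bit timestamp offset after DOND.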
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :           NALU size           |     DOND      |  timing info  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  timing info  |                                               |
    +-+-+-+-+-+-+-+-+                     NALU                      |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx: Multi-Time Aggregation Unit for MTAP16

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :           NALU size           |     DOND      |  timing info  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          timing info          |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
    |                             NALU                              |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx: Multi-Time Aggregation Unit for MTAP24

For the "earliest" multi-picture Aggregation Unit in an MTAP, the timing offset MUST be zero. Hence, the RTP timestamp of the MTAP itself is identical to the earliest NALU-time.

6.4. Fragmentation Units

Fragmentation units (FUs) are the packet fragmentation scheme of this payload specification. Among other purposes, the scheme is introduced to complement the aggregation scheme introduced in section 6.3 and to deliver pre-encoded packetized video over networks with limited MTU size. FUs contain fragments of one single NALU, which is referred to as the fragmented NALU. STAPs and MTAPs MUST NOT be fragmented.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |NALU type octet|   FU header   |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
    |                                                               |
    |                          FU payload                           |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :...OPTIONAL RTP padding        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure xxxx. RTP payload format for Fragmentation Unit.
The NALU type octet of a fragmentation unit is indicated by the type value 0x21 and has the following format:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |F|NRI|  0x21   |
    +---------------+

The NALU payload of a fragmentation unit consists of a fragmentation unit header of one octet and a fragmentation unit payload. The FU header has the following format:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |S|E|R|  Type   |
    +---------------+

S: 1 bit
   The Start bit, when one, indicates the start of a fragmented NALU. When the following FU payload is not the start of the fragmented NALU payload, the Start bit is set to zero.

E: 1 bit
   The End bit, when one, indicates the end of a fragmented NALU, i.e., the last byte of the payload is also the last byte of the fragmented NALU. When the following FU payload is not the last fragment of a fragmented NALU, the End bit is set to zero.

R: 1 bit
   The Reserved bit MUST be 0.

Type: 5 bits
   The NAL Unit payload type as defined in table 7.1 of [1].

The FU payload consists of fragments of the payload of the fragmented NALU such that, if the fragmentation unit payloads of consecutive FUs are sequentially concatenated, the payload of the fragmented NALU is reconstructed. Note that the NALU type octet of the fragmented NALU is not included as such in the fragmentation unit payload; rather, the information of the NALU type octet of the fragmented NALU is conveyed in the F and NRI fields of the NALU type octet of the fragmentation unit and in the Type field of the FU header. An FU payload can have any number of octets and can be empty.

The following rules apply to fields in the RTP header and in the NALU type octet of the RTP payload:

The RTP timestamp is set to the NALU-time of the fragmented NALU.

The F bit MUST be set according to the F bit of the fragmented NALU.
The value of the NRI field MUST be set according to the value of the NRI field in the fragmented NALU.

7. Packetization Rules

Two cases of packetization rules are distinguished, depending on whether packets belonging to more than a single picture may be put into a single aggregation packet (using STAPs or MTAPs).

7.1. Unrestricted Mode (Multiple Picture Model)

This mode MAY be supported by some receivers. Usually, the capability of a receiver to support this mode is implied by the application or indicated by external control protocol means. The use of this mode MUST be signaled with the optional aggregation-mode MIME or SDP parameter, if MIME or SDP signaling is in use.

The following packetization rules MUST be enforced by the sender:

   o Single slice NALUs or Data Partition NALUs belonging to the same picture (and hence sharing the same RTP timestamp value) MAY be sent in any order permitted by the applicable profile defined in [1], although, for delay-critical systems, they SHOULD be sent in their original coding order to minimize the delay. Note that the coding order is not necessarily the scan order, but the order in which the NAL packets become available to the RTP stack.

   o The transmission order of NALUs MUST conform to the NAL unit decoding order unless signaled otherwise with the optional num-reorder-VCL-NAL-units MIME parameter or by other means. Some receivers may not support a transmission order that does not conform to the NAL unit decoding order.

   o Both MTAPs and STAPs MAY be used.

   o FUs MAY be used. If a NALU is transmitted as a fragmented NALU, the following rules apply. For the first FU of a fragmented NALU, the Start bit is set to one, the End bit is set to zero, and any number of initial bytes of the fragmented NALU payload are transported in this FU. Any number of additional FUs belonging to this fragmented NALU may be transmitted with the Start bit set to zero and the End bit set to zero.
  If the FU contains the last byte of the fragmented NALU, the End bit is set to one. A fragmented NALU MUST NOT be transmitted in one FU, i.e., the Start bit and the End bit MUST NOT both be set to one in the same FU header. A Fragmentation Unit MUST NOT contain an aggregation packet. Fragmentation units of a NALU MUST be sent in consecutive packets.

o SEI packets MAY be sent at any time.

o Parameter set NALUs MUST NOT be sent in an RTP session whose parameter sets were already changed by control protocol messages during the lifetime of the RTP session. If this condition allows parameter set NALUs, they MAY be sent at any time.

o An MTAP or a STAP MUST NOT contain an FU.

o An Aggregation Packet MUST succeed a Simple Packet in transmission order if the NAL units in the Simple Packet precede the NAL units in the Aggregation Packet in NAL unit decoding order. An Aggregation Packet MUST precede a Simple Packet in transmission order if the NAL units in the Simple Packet succeed the NAL units in the Aggregation Packet in NAL unit decoding order.

o An Aggregation Packet MUST succeed a Fragmentation Unit in transmission order if the NAL unit conveyed in the Fragmentation Unit precedes the NAL units in the Aggregation Packet in NAL unit decoding order. An Aggregation Packet MUST precede a Fragmentation Unit in transmission order if the NAL unit conveyed in the Fragmentation Unit succeeds the NAL units in the Aggregation Packet in NAL unit decoding order.

o All NALU types MAY be mixed freely, provided that the above rules are obeyed. In particular, it is allowed to mix slices in data-partitioned and single-slice mode.

o Network elements MAY convert multiple RTP packets carrying individual NALUs into one aggregated RTP packet, convert an aggregated RTP packet into several RTP packets carrying individual NALUs, or mix both concepts.
  However, when doing so they SHOULD take into account at least the following parameters: path MTU size, unequal protection mechanisms (e.g., packet-based FEC according to RFC 2733 [13], carried in RFC 2198 [12], especially for parameter set NALUs and Type A Data Partitioning NALUs), bearable latency of the system, and buffering capabilities of the receiver.

o NALUs of all types except for FUs MAY be conveyed as aggregation units of an STAP or MTAP rather than as individual RTP packets. Special care SHOULD be taken (particularly in gateways) to avoid placing more than one copy of an identical NALU in a single STAP/MTAP, because duplicates increase the amount of transferred data without improving QoS.

7.2. Restricted Mode (Single Picture Model)

This mode MUST be supported by all receivers. It is primarily intended for low-delay applications. Its main difference from the Unrestricted Mode is that it forbids packetizing data belonging to more than one picture into a single RTP packet. Hence, MTAPs MUST NOT be used.

The following packetization rules MUST be enforced by the sender:

o All rules of the Unrestricted Mode above, with the following addition:

o Only STAPs MAY be used; MTAPs MUST NOT be used. This implies that aggregation packets MUST NOT include slices or data partitions belonging to different pictures.

8. De-Packetization Process

The de-packetization process is implementation dependent. Hence, the following description should be seen as an example of a suitable implementation. Other schemes MAY be used as well. Optimizations relative to the described algorithms are likely possible.

The general concept behind these de-packetization rules is to reorder NALUs from transmission order to the NAL unit decoding order. All fragmentation units of a NALU are collected, and the resulting NALU is processed as if it were received as a Simple Packet.
Aggregation packets are handled by unloading their payload into individual NALUs, which are processed as if they were received in separate RTP packets, in the order in which they were arranged in the Aggregation Packet.

Hereinafter, let N be the value of the optional num-reorder-VCL-NAL-units MIME type parameter (see section 9.1). When the RTP session is initialized, the receiver buffers at least N VCL NAL units before passing any packet to the decoder. For each NAL unit stored in the buffer, the RTP sequence number of the packet that contained the NAL unit is stored and associated with the stored NAL unit. Moreover, the type of the packet (Simple Packet or Aggregation Packet) that contained the NAL unit is stored and associated with each stored NAL unit. Furthermore, for NAL units carried in aggregation packets, the decoding order number (DON) is calculated and stored.

If the receiver buffer contains at least N VCL NAL units, NAL units are removed from the receiver buffer and passed to the decoder in the order specified below until the buffer contains N-1 VCL NAL units.

Hereinafter, let PDON be the DON of the previous NAL unit of an aggregation packet in NAL unit decoding order. If no such previous NAL unit exists, PDON is 0.

The order in which NAL units are passed to the decoder is specified as follows:

o If the oldest RTP sequence number in the buffer corresponds to a Simple Packet, the NALU in the Simple Packet is the next NALU in the NAL unit decoding order.

o If the oldest RTP sequence number in the buffer corresponds to an Aggregation Packet, the NAL unit decoding order is recovered among the NALUs conveyed in Aggregation Packets in RTP sequence number order, up to (but excluding) the next Simple Packet or FU. This set of NALUs is hereinafter referred to as the candidate NALUs. If no NALUs conveyed in Simple Packets or FUs reside in the buffer, all NALUs in the buffer are candidate NALUs.
o For each NAL unit among the candidate NALUs, a DON distance is calculated as follows. If the DON of the NAL unit is larger than PDON, the DON distance is equal to DON - PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON + 1. NAL units are delivered to the decoder in ascending order of DON distance.

o If several NAL units share the same DON distance, they are passed to the decoder in the following order:

  1. Picture delimiter NAL unit, if any
  2. Sequence parameter set NAL units, if any
  3. Picture parameter set NAL units, if any
  4. SEI NAL units, if any
  5. Coded slice and slice data partition NAL units of the primary coded picture, if any
  6. Coded slice and slice data partition NAL units of the redundant coded pictures, if any
  7. Filler data NAL units, if any
  8. End of sequence NAL unit, if any
  9. End of stream NAL unit, if any

o If the video decoder in use does not support Arbitrary Slice Ordering, slices and A data partitions are arranged in ascending order of the first_mb_in_slice syntax element in the slice header. Moreover, B and C data partitions immediately follow the corresponding A data partition in decoding order.

The following additional de-packetization rules MAY be used to implement an operational JVT de-packetizer:

o Intelligent RTP receivers (e.g., in gateways) MAY identify lost DPAs. If a lost DPA is found, the gateway MAY decide not to send the corresponding DPB and DPC partitions, as their information is meaningless for the JVT decoder. In this way a network element can reduce network load by discarding useless packets, without parsing a complex bit stream.

o Intelligent receivers MAY discard all packets that have a NAL Reference Idc of 0. However, they SHOULD process those packets if possible, because the user experience may suffer if the packets are discarded.
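The DON distance rule and the tie-break order listed earlier in this section can be sketched in Python as follows. This is an illustrative helper, not a normative algorithm: `CandidateNalu`, `don_distance` and `delivery_order` are hypothetical names, each candidate is assumed to carry its DON and a `rank` from 1 (picture delimiter) to 9 (end of stream) that the caller derives from the NAL unit type, and the first_mb_in_slice reordering for decoders without Arbitrary Slice Ordering is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateNalu:
    don: int    # decoding order number, 0..65535
    rank: int   # hypothetical tie-break rank: 1 = picture delimiter ... 9 = end of stream
    data: bytes


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, with 16-bit wrap-around."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def delivery_order(candidates: List[CandidateNalu], pdon: int) -> List[CandidateNalu]:
    """Ascending DON distance; equal distances are ordered by rank."""
    return sorted(candidates, key=lambda n: (don_distance(n.don, pdon), n.rank))
```

With PDON = 65534, a NAL unit with DON 0 has distance 2 and is therefore delivered after a NAL unit with DON 65535 (distance 1), which is how the formula handles wrap-around of the 16-bit DON.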
o If a Fragmentation Unit is lost, all Fragmentation Units corresponding to the same NALU SHOULD be discarded.

9. Payload Format Parameters

This section specifies the parameters that MAY be used to select optional features of the payload format. The parameters are specified here as part of the MIME subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec. A mapping of the parameters into the Session Description Protocol (SDP) [4] is also provided for applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use MIME or SDP.

9.1. MIME Registration

The MIME subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is allocated from the IETF tree. Any unspecified parameter MUST be ignored by the receiver.

Media Type name: video

Media subtype name: H264

Required parameters: none

Optional parameters:

profile-level-id: A profile-level element of this parameter is formed by concatenating the hexadecimal representations of the following two bytes of the sequence parameter set NAL unit specified in [1]: 1) profile_idc and 2) level_idc. The value of profile-level-id is a sequence of profile-level elements. If this parameter is used to indicate properties of a NAL unit stream, it indicates the profiles that are in use in the stream and the highest level in use for each signaled profile. If this parameter is used in a capability exchange or session setup procedure, it indicates the profiles that the codec supports and the highest level supported for each signaled profile. For example, if a codec supports the Baseline Profile at Level 3 and below and the Main Profile at Level 2.1 and below, profile-level-id is 421E4D15 (profile_idc 0x42 and level_idc 0x1E, followed by profile_idc 0x4D and level_idc 0x15). If no profile-level-id is present, the Baseline Profile at Level 1 MUST be implied.
profile-interoperability: This parameter MAY be used to signal the properties of a NAL unit stream. It MUST NOT be used to signal the capabilities of a codec implementation. The parameter indicates which of the coding tools that are included in the Baseline Profile but not in the Main Profile are in use in the NAL unit stream. The value of the parameter is a 3-character string of "1"s and "0"s indicating the values of more_than_one_slice_group_allowed_flag, arbitrary_slice_order_allowed_flag, and redundant_pictures_allowed_flag (respectively) of the sequence parameter set NAL units that are in use in the NAL unit stream. If the value of any one of these flags changes among the sequence parameter sets used in the NAL unit stream, a value of "1" MUST be indicated for the corresponding flag in the profile-interoperability parameter. If no profile-interoperability is present, its value is undefined.

parameter-sets: This parameter MAY be used to convey parameter set NAL units, herein referred to as the initial parameter set NAL units, that MUST precede any other NAL units in decoding order. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The value of the parameter is the hexadecimal representation of the initial parameter set NAL units as specified in sections 7.3.2.1 and 7.3.2.2 of [1]. The parameter sets are conveyed in decoding order, and no framing of the parameter set NAL units takes place. Note that a parameter set NAL unit is typically shorter than 10 bytes, but a picture parameter set NALU can contain several hundred bytes.

num-reorder-VCL-NAL-units: This parameter MAY be used to signal the properties of a NAL unit stream or the capabilities of a transmitter or receiver implementation.
The parameter specifies the maximum number of VCL NAL units that precede any VCL NAL unit in the NAL unit stream in NAL unit decoding order and follow that VCL NAL unit in RTP sequence number order or in the composition order of the aggregation packet containing it. If the parameter is not present, num-reorder-VCL-NAL-units equal to 0 MUST be implied. The value of num-reorder-VCL-NAL-units MUST be an integer in the range from 0 to 32767, inclusive.

aggregation-mode: Permissible values are 0 and 1. If 0 or not present, STAPs MAY be present and MTAPs MUST NOT be present in the NAL unit stream. If 1, both STAPs and MTAPs MAY be present in the NAL unit stream.

Encoding considerations: This type is defined for transfer via RTP (RFC 1889) [3].

Security considerations: See section 10 of RFC XXXX.

Public specification: Please refer to RFC XXXX and its section 15.

Additional information: None

File extensions: none

Macintosh file type code: none

Object identifier or OID: none

Person & email address to contact for further information: stewe@cs.tu-berlin.de

Intended usage: COMMON

Author/Change controller: stewe@cs.tu-berlin.de
IETF Audio/Video transport working group

9.2. SDP Parameters

The MIME media type video/H264 string is mapped to fields in the Session Description Protocol (SDP) [4] as follows:

o The media name in the "m=" line of SDP MUST be video.

o The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the MIME subtype).

o The "a=fmtp" line of SDP MUST contain the optional parameters "profile-level-id", "profile-interoperability", "parameter-sets", "num-reorder-VCL-NAL-units", and "aggregation-mode", if any, to indicate the codec capability or stream configuration. These parameters are expressed as a MIME media type string, in the form of a semicolon-separated list of parameter=value pairs.
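A receiver-side sketch of this mapping, in Python, assuming a single "a=fmtp" line with semicolon-separated parameter=value pairs as in the example that follows. All function names are hypothetical. Unknown parameters are dropped, since any unspecified parameter MUST be ignored; profile-level-id is split into (profile_idc, level_idc) pairs; and absent parameters take the defaults of section 9.1. The default level_idc of 10 for Level 1 is an assumption, inferred from the 421E4D15 example, where level_idc is ten times the level number (0x1E = 30 for Level 3, 0x15 = 21 for Level 2.1).

```python
from typing import Dict, List, Tuple

# Parameter names matched exactly as spelled in section 9.1.
KNOWN_PARAMS = {
    "profile-level-id", "profile-interoperability", "parameter-sets",
    "num-reorder-VCL-NAL-units", "aggregation-mode",
}


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split 'a=fmtp:<pt> k=v;k=v' into the payload type and known parameters."""
    head, _, body = line.partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(head[len("a=fmtp:"):])
    params = {}
    for item in body.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key in KNOWN_PARAMS:   # unspecified parameters are ignored
            params[key] = value
    return payload_type, params


def profile_levels(params: Dict[str, str]) -> List[Tuple[int, int]]:
    """Decode profile-level-id into (profile_idc, level_idc) pairs."""
    value = params.get("profile-level-id")
    if value is None:
        return [(0x42, 10)]   # Baseline, Level 1 (level_idc 10 is an assumption)
    if len(value) % 4:
        raise ValueError("each profile-level element is four hex digits")
    raw = bytes.fromhex(value)
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]


def num_reorder(params: Dict[str, str]) -> int:
    """N for the de-packetization buffer; 0 when the parameter is absent."""
    n = int(params.get("num-reorder-VCL-NAL-units", "0"))
    if not 0 <= n <= 32767:
        raise ValueError("num-reorder-VCL-NAL-units out of range")
    return n


def mtaps_allowed(params: Dict[str, str]) -> bool:
    """aggregation-mode 1 permits MTAPs; 0 or absent permits only STAPs."""
    return params.get("aggregation-mode", "0") == "1"
```

Applied to the example below, `parse_fmtp` yields payload type 98 and profile-level-id 421E, which `profile_levels` decodes as profile_idc 66 (Baseline) and level_idc 30 (Level 3).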
An example of media representation in SDP is as follows (Baseline Profile, Level 3.0, more than one slice group, arbitrary slice ordering, and redundant slices are in use):

      m=video 49170/2 RTP/AVP 98
      a=rtpmap:98 H264/90000
      a=fmtp:98 profile-level-id=421E;profile-interoperability=111

10. Security Considerations

So far, no security considerations beyond those of RFC 1889 [3] have been identified. Currently, the JVT CD does not allow carrying any type of active payload. However, the inclusion of a "user data" mechanism is under consideration, which could potentially be used for mechanisms such as remote software updates of the video decoder and similar tasks.

11. Informative Appendix: Application Examples

This payload specification is very flexible in its use, to cover the extremely wide application space anticipated for the JVT codec. However, such great flexibility also makes it difficult for an implementer to decide on a reasonable packetization scheme. Information on how to apply this specification to real-world scenarios is likely to appear in the form of academic publications and a Test Model in the near future. Meanwhile, some preliminary usage scenarios are described here.

11.1. Video Telephony, no Data Partitioning, no packet aggregation

The RTP part of this scheme has been implemented and tested (though not the control-protocol part, see below).

In most real-world video telephony applications, picture parameters such as the picture size or optional modes never change during the lifetime of a connection. Hence, all necessary Parameter Sets (usually only one) are sent as a side effect of the capability exchange/announcement process, e.g., according to the SDP syntax specified in section 9.2 of this document. Since all necessary Parameter Set information is established before the RTP session starts, there is no need for sending any parameter set NALUs.
Data Partitioning is not used either. Hence, the RTP packet stream consists basically of NALUs that carry single slices of video information. The encoder chooses the size of those single-slice NALUs such that they offer the best performance. Often, this is done by adapting the coded slice size to the MTU size of the IP network. For small picture sizes, this may result in a one-picture-per-packet strategy. The loss of packets and the resulting drift-related artifacts are cleaned up by Intra refresh algorithms.

11.2. Video Telephony, Interleaved Packetization using Packet Aggregation

This scheme allows better error concealment and is widely used in H.263-based designs using RFC 2429 [6] packetization. It has also been implemented, and good results were reported [8].

The source picture is coded by the VCL such that all MBs of one MB line are assigned to one slice. All slices with even MB row addresses are combined into one STAP, and all slices with odd MB row addresses into another STAP. Those STAPs are transmitted as RTP packets. The establishment of the Parameter Sets is performed as discussed above.

Note that the use of STAPs is essential here, because the high number of individual slices (18 for a CIF picture) would lead to unacceptably high IP/UDP/RTP header overhead (unless the source coding tool FMO is used, which is not assumed in this scenario). Furthermore, some wireless video transmission systems, such as H.324M and the IP-based video telephony specified in 3GPP, are likely to use relatively small transport packet sizes. For example, a typical MTU size of an H.223 AL3 SDU is around 100 bytes [11]. Coding individual slices according to this packetization scheme provides a further advantage in communication between wired and wireless networks, as individual slices are likely to be smaller than the preferred maximum packet size of wireless systems.
Consequently, a gateway can convert the STAPs used in a wired network into several RTP packets, each carrying a single NALU, as preferred in a wireless network, and vice versa.

11.3. Video Telephony, with Data Partitioning

This scheme has been implemented and was shown to offer good performance, especially at higher packet loss rates [8].

Data Partitioning is known to be useful only when some form of unequal error protection is available. Normally, in single-session RTP environments, even error characteristics are assumed: statistically, the packet loss probability of all packets of the session is the same. However, there are means to reduce the packet loss probability of individual packets in an RTP session. RFC 2198 [12], for example, allows carrying a redundant copy of an essential packet in the next RTP packet. Packet-based Forward Error Correction [13] carried in RFC 2198 is also an appropriate means to protect high-priority information. In all cases, the incurred overhead is substantial, but of the same order of magnitude as the number of bits that would otherwise have to be spent on intra information. However, this mechanism does not add any delay to the system.

Again, the complete Parameter Set establishment is performed through control protocol means.

11.4. Low-Bit-Rate Streaming

This scheme has been implemented with H.263 and gave good results [14]. There is no technical reason why similarly good results could not be achieved using the JVT codec.

In today's Internet streaming, some of the offered bit-rates are relatively low in order to allow terminals with dial-up modems to access the content. In wired IP networks, relatively large packets, say 500 - 1500 bytes, are preferred to smaller and more frequently occurring packets in order to reduce network congestion. Moreover, the use of large packets decreases the amount of RTP/UDP/IP header overhead. For low-bit-rate video, the use of large packets means that sometimes up to a few pictures should be encapsulated in one packet.
However, loss of such a packet would have drastic consequences for visual quality, as there is practically no way to conceal the loss of an entire picture other than to repeat the previous one. One way to construct relatively large packets and maintain possibilities for successful loss concealment is to construct MTAPs that contain slices from several pictures in an interleaved manner. An MTAP should not contain spatially adjacent slices from the same picture or spatially overlapping slices from any picture. If a packet is lost, it is likely that a lost slice is surrounded by spatially adjacent slices of the same picture and spatially corresponding slices of the temporally previous and succeeding pictures. Consequently, concealment of the lost slice is likely to succeed relatively well.

11.5. Robust Packet Scheduling in Video Streaming

This scheme has been implemented with MPEG-4 Part 2 and simulated in a wireless streaming environment [15]. There is no technical reason why similar or better results could not be achieved using the JVT codec.

Streaming clients typically have a receiver buffer that is capable of storing a relatively large amount of data. Initially, when a streaming session is established, a client does not start playing the stream back immediately, but rather it typically buffers the incoming data for a few seconds. This buffering helps to maintain continuous playback, because, in case of occasional increased transmission delays or network throughput drops, the client can decode and play buffered data. Otherwise, without initial buffering, the client has to freeze the display, stop decoding, and wait for incoming data. The buffering is also necessary for either automatic or selective retransmission at any protocol level. If any part of a picture is lost, a retransmission mechanism may be used to resend the lost data.
If the retransmitted data is received before its scheduled decoding or playback time, the loss is perfectly recovered. Coded pictures can be ranked according to their importance to the subjective quality of the decoded sequence. For example, non-reference pictures, such as conventional B pictures, are subjectively least important, because their absence does not affect decoding of any other pictures. In addition to non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-10 standard includes a temporal scalability method called sub-sequences [16]. Subjective ranking can also be made on a data partition or slice group basis. Coded slices and data partitions that are subjectively the most important can be sent earlier than their decoding order indicates, whereas coded slices and data partitions that are subjectively the least important can be sent later than their decoding order indicates. Consequently, any retransmitted parts of the most important slices and data partitions are more likely to be received before their scheduled decoding or playback time than those of the least important slices and data partitions.

12. Open Issues

There may be an issue when using this I-D to transport interlaced content: the draft appears to have a problem when one picture has more than one timestamp. The authors will try to come to a conclusion during the Pattaya meeting of JVT (in the week before the San Francisco IETF), and will report in the AVT session whether a problem exists and, if time permits, present a possible solution.

13. Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

14. Intellectual Property Notice

The IETF has been notified of intellectual property rights claimed in regard to some or all of the specification contained in this document. For more information consult the online list of claimed rights at http://www.ietf.org/ipr.

15. References

15.1. Normative References

[1] "Study of Final Committee Draft of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)", available from ftp://ftp.imtc-files.org/jvt-experts/2002_12_Awaji/JVT-F100.zip, February 2003.

[2] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

[4] M. Handley and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

15.2. Informative References

[5] P. Borgwardt, "Handling Interlaced Video in H.26L", VCEG-N57r2, available from ftp://standard.pictel.com/video-site/0109_San/VCEG-N57r2.doc, September 2001.

[6] C. Bormann et al., "RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)", RFC 2429, October 1998.

[7] ISO/IEC IS 14496-2.

[8] S. Wenger, "H.26L over IP", IEEE Transactions on Circuits and Systems for Video Technology, to appear (April 2002).

[9] S. Wenger, "H.26L over IP: The IP Network Adaptation Layer", Proceedings Packet Video Workshop 02, April 2002, to appear.

[10] T. Stockhammer, M.M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", in Proc. ICIP 2002, Rochester, NY, September 2002.

[11] ITU-T Recommendation H.223 (1999).

[12] C. Perkins et al., "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

[13] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, December 1999.

[14] V. Varsa and M. Karczewicz, "Slice interleaving in compressed video packetization", Packet Video Workshop 2000.

[15] S.H. Kang and A. Zakhor, "Packet scheduling algorithm for wireless video streaming", International Packet Video Workshop 2002, available from http://www.pv2002.org.

[16] M.M. Hannuksela, "Enhanced concept of GOP", JVT-B042, available from ftp://standard.pictel.com/video-site/0201_Gen/JVT-B042.doc, January 2002.

Author's Addresses

Stephan Wenger                     Phone: +49-172-300-0813
TU Berlin / Teles AG               Email: stewe@cs.tu-berlin.de
Franklinstr. 28-29
D-10587 Berlin
Germany

Thomas Stockhammer                 Phone: +49-89-28923474
Institute for Communications Eng.  Email: stockhammer@ei.tum.de
Munich University of Technology
D-80290 Munich
Germany

Miska M. Hannuksela                Phone: +358 40 5212845
Nokia Corporation                  Email: miska.hannuksela@nokia.com
P.O. Box 68
33721 Tampere
Finland