Internet Engineering Task Force               Audio Visual Transport WG
INTERNET-DRAFT                          C.Guillemot, P.Christ, S.Wesner
draft-guillemot-genrtp-00.txt             INRIA / Univ. Stuttgart - RUS
                                                      November 13, 1998
                                                 Expires: June 12, 1999


     RTP Generic Payload with Scaleable & Flexible Error Resiliency


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as
a "working draft" or "work in progress."

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.

ABSTRACT

This document describes a payload format that is generic in the
sense that one and the same format can be used for audio, video and
scene description streams. Relying on a simple, yet sufficient,
fragmentation and grouping mechanism, the proposed payload allows
for protection against packet loss, with a mechanism intended to be
generic. This mechanism avoids the extra connection management
complexity of separate FEC channels in applications with a large
number of streams. It supports a hierarchy of error control
mechanisms ('no' protection, FEC, redundant data) that can be
adapted to data segment types and to network characteristics.

1. Introduction

This document is motivated by the large variety of compressed
streams and of error control mechanisms, and by the increasing
number of payload formats, each tailored to a specific compression
scheme, with dedicated yet conceptually close mechanisms for
increased error resiliency. Indeed, the approach so far has been to
specify a dedicated payload format for each media compression
scheme, supporting specific and dedicated error resiliency
mechanisms. One payload type is thus assigned to each compression
scheme, with support for an associated error control mechanism.

In addition to a single generic payload, the motivation here is
flexibility in associating error control mechanisms with the
compressed media streams, so that these mechanisms can evolve
without redefining the payload format and can be adapted to stream
segment types and network characteristics.

Inspired by some genericity concepts from [2-3] on the one hand, and
trying to federate the different error control approaches
[1, 4, 5, 6] under a single protocol support mechanism on the other
hand, the rationale for this payload proposal consists of:

- Genericity, with simple, yet sufficient, fragmentation and
  grouping mechanisms. The same payload could hence be used for
  audio, video, and possibly for scene description and object
  descriptor streams (in the VRML / MPEG-4 spirit), according to
  their QoS requirements.

- Protection against packet loss, with a generic mechanism, while
  avoiding the extra connection management complexity possibly
  brought by separate FEC channels in applications where the number
  of streams can potentially be high (e.g. MPEG-4 applications).

- Flexible support of a range of error control mechanisms, from no
  protection to FEC and redundant data, that could be adapted to
  data segment types and to network characteristics. Redundant data,
  in the sense of [1] or of [5-6] (e.g.
  in the form of repeated picture headers, or of the HEC field of
  the MPEG-4 video syntax [12]) could then be supported by a single
  mechanism.

Some unavoidable 'specificity' is moved from the payload type to an
extension field type, which can be specified by out-of-band
signalling at the beginning of the session, using either SDP [11] or
notions of 'decoder configuration descriptors', as introduced in
MPEG-4, and conveyed in a separate channel or via out-of-band
signalling.

As an informative annex, a possible usage of the payload for an
error resilient transport of different types of compressed video and
audio streams is described.

2. Design Considerations

Let the Access Units (AU) be the entities produced by the
compression layer with a given presentation time (e.g. a video frame
or an audio frame). The protocol support (payload field
specification) for fragmentation and grouping is inspired by [2-3],
with an attempt at simplification.

2.1. Fragmentation

Two levels of fragmentation are envisaged here:

- a 'media aware' level, allowing fragmentation into independently
  decodable entities (e.g. MPEG-4 video packets, H.263+ slices, ...),
  called PDUs;

- a 'media unaware' level, into possibly non independently decodable
  fragments (e.g. MPEG-4 video packets or H.263+ slices split across
  RTP packet boundaries).

The notion of 'independently decodable' entities supposes that all
spatially predictive information is confined within the given
entities or PDUs. [...] in compression and network adaptation
layers.

If a 'media aware' fragmentation is not performed by the compression
layer, then the PDU contains the whole AU (e.g. the whole audio
frame).

When the compression layer is not MTU aware, the second level of
fragmentation, called here the 'media unaware' fragmentation, may be
necessary. If one PDU is larger than the MTU size, then it will be
fragmented across several RTP packets.
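The 'media unaware' fragmentation step can be illustrated with a
minimal sketch. It is not part of the proposed format: the function
names and the max_fragment_size parameter (the MTU minus whatever
header overhead the sender budgets for) are assumptions made for
illustration only. Each fragment is paired with the byte offset of its
first byte from the beginning of the PDU, which is the quantity the
FOFFSET field of Section 3.2 is defined to carry.

```python
from typing import List, Tuple


def fragment_pdu(pdu: bytes, max_fragment_size: int) -> List[Tuple[int, bytes]]:
    """Split one PDU into (byte_offset, fragment) pairs.

    byte_offset is the position of the fragment's first byte counted
    from the beginning of the PDU. max_fragment_size is a hypothetical
    per-packet budget chosen by the caller.
    """
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    if len(pdu) <= max_fragment_size:
        return [(0, pdu)]  # the PDU fits in one packet: nothing to split
    return [
        (offset, pdu[offset:offset + max_fragment_size])
        for offset in range(0, len(pdu), max_fragment_size)
    ]


def reassemble(fragments: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the PDU from fragments that may arrive out of order."""
    return b"".join(chunk for _, chunk in sorted(fragments))
```

The sketch ignores loss: a receiver would additionally have to check
that the offsets are contiguous before handing the PDU to the decoder.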
RTP packets transporting fragments or PDUs belonging to the same AU
have their RTP timestamps set to the same value.

2.2. Grouping Mechanisms

The grouping mechanism first concerns the possibility of
concatenating several AUs and/or PDUs and/or 'fragments' in one
packet, in the scenario where the compression layer producing these
PDUs is not MTU aware and the PDU size is smaller than the MTU size.

The grouping mechanism also concerns the possibility of aggregating
extension data and a PDU in the same packet, as proposed in [2].

Grouping and fragmentation may be combined.

2.3. Data Characterization

Compressed streams are usually characterized by bit segments
containing information with different priority levels, in the sense
that the loss of these segments has different impacts, ranging from
failure of the decoder to start to a whole range of quality
impairments, including the loss of entire frames. Similarly, other
types of streams, such as scene description streams (in the
VRML/MPEG-4 sense), require a very high level of protection,
possibly reliable transport.

Therefore, it seems natural to envisage different levels of
protection for stream segments, for improved resiliency against
packet losses, as proposed in [9] for video. This motivates the
design of a payload with flexible support for a range of error
control mechanisms that can be adapted to the stream segment types.

The different priority levels considered here are defined according
to 3 main criteria:

- impact on decoder initialization,
- quality degradation due to packet loss,
- delay requirements,

leading to the consideration, in this document, of the following
priority levels or stream segment types:

- HPTD (High Priority with Tolerance to Delay): Vital information, e.g.
  decoder initialization and configuration, that can tolerate
  increased end-to-end latency (e.g. beginning of the video session,
  scene description streams such as MPEG-4 BIFS streams, VRML, ...).

- HPND (High Priority with No tolerance to Delay): Vital information
  (e.g. decoder configuration parameters such as picture types,
  quantization values, ...) that cannot tolerate increased delay.

- LP (Lower Priority information): e.g. video frames without
  configuration parameters, ...

2.4. Hierarchy of Error Control Solutions

Different solutions for increased error resiliency are usually
considered, either based on reliable transport protocols for highly
sensitive and high priority data [9], or relying on error control
mechanisms. Error control mechanisms such as ARQ (Automatic Repeat
Request) and FEC-based mechanisms aim at increasing the resiliency
of the stream to packet loss, but do not avoid packet loss.

More precisely, closed loop mechanisms such as ARQ techniques
consist in re-transmitting the lost data. With open loop mechanisms,
redundant data (which can be repeated data or data encoded with
different schemes [1]) or FEC (Forward Error Correction) data (e.g.
parity data or data encoded using block codes) is transmitted along
with the original data, so that some of the lost original data can
be recovered from the redundant information.

Reliable transport increases overall latency, which can be
incompatible with the delay requirements of real-time multimedia,
whereas error control mechanisms increase bandwidth as well as
overall delay. Also, the potential of the different error control
mechanisms depends on the characteristics of both the packet loss
process and the compressed media streams.

This motivates the design of a payload that would provide support,
with a single mechanism, for a range of error control solutions, i.e.
- redundant data of different types (e.g. lower rate secondary data,
  duplicated 'HPND' information, ...),
- FEC, based on parity data or block codes,
- no protection.

3. Payload Format Specification

The packet consists of an RTP header followed by possibly multiple
payloads.

3.1. RTP Header Usage

Each RTP packet starts with a fixed RTP header. The following fields
of the fixed RTP header are used:

- Marker bit (M bit): The marker bit of the RTP header is set to 1
  when the current packet carries the end of an access unit (AU), or
  the last fragment of an AU.

- Payload Type (PT): The payload type shall be set to the value
  assigned to this format, or a payload type in the dynamic range
  should be chosen.

- Timestamp: The RTP timestamp encodes the presentation time of the
  first AU contained in the packet. The RTP timestamp may be the
  same on successive packets if an AU (e.g. audio or video frame)
  occupies more than one packet. If the packet contains only
  'extension' data objects (see below), then the RTP timestamp is
  set to the presentation time of the AU to which the first
  extension data object (e.g. FEC or redundant data) applies.

3.2. Payload Header

The payload header is always present, has a variable length, and is
defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|      X T      |         LENGTH          |    FOFFSET    .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.FOFFSET(cnt'd) |           TSOFFSET            |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                OBJECT (PDU or Extension Data)                 .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|      X T      |         LENGTH          |    FOFFSET    .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.FOFFSET(cnt'd) |           TSOFFSET            |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                OBJECT (PDU or Extension Data)                 .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 1. Sample RTP payload, using the payload format.

G (Group) (1 bit): If this field is 1, it indicates that the object
associated with the current header is followed by another object.

E (Extension) (1 bit): If its value is 1, then the next object
contains Extension data (similarly to [2]). If its value is 0, then
the next object is a PDU or a fragment of a PDU.

F (Fragmentation) (1 bit): This field is only present when the E
field is 0 and the G field is 0. If its value is 1, then the next
'object field' is a fragment of a PDU. If this field is 0, then the
next 'object field' is a complete PDU.

XT (Extension type) (8 bits): This field is only present if E is set
to 1. It then specifies the type of extension data. Examples of
types are FEC data with the specification of the FEC coding scheme
(parity codes, block codes such as Reed Solomon codes, ...),
redundant data with the specification of the redundant data encoding
scheme, duplicated high priority data, etc.

LENGTH (13 bits): This field specifies the length in bytes of the
next 'object field'. If the object contains the last PDU or last PDU
fragment of the payload, then this field is not present.

FOFFSET (16 bits): This field is present only when the F field is
present and F=1. It contains the byte offset of the first byte of
the fragment from the beginning of the PDU.

TSOFFSET (Time Stamp OFFSET) (16 bits): The value of the field is an
unsigned 16 bit integer. The default value is 0. If the E field is
'1', then the next 'object' carries extension data, and the TSOFFSET
added to the value of the RTP timestamp yields the presentation time
of the PDU to which the extension data apply.
The TSOFFSET is, in this case, set to the difference between the
media TS and the TS of the media to which the extension data apply.
If the E field is '0', then the next 'object field' is a PDU. If
this PDU is not the first PDU in the payload (i.e. the previous
object is also a PDU), then the TSOFFSET added to the value of the
RTP timestamp yields the presentation time of the following PDU. If
this PDU or fragment of PDU is the first in the payload (even if it
has been preceded by extension data), then this field is not
present.

4. Examples of payload headers

4.1. The payload contains Extension data followed by a PDU

First payload header: G=1, E=1, so F is not present and FOFFSET is
not present.
Second payload header: G=0, E=0, F=0; XT is not present; FOFFSET is
not present (F=0); this is the last PDU (G=0) in the payload, so the
LENGTH field is not present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|      X T      |         LENGTH          |    TSOFFSET     .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.TSOFFSET(cnt'|P|                Extension Data                 .
+-+-+-+-+-+-+-+-+                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|           TSOFFSET            | padding |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                              PDU                              .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2. The payload contains Extension data followed by a fragment

First payload header: G=1, E=1, so F is not present.
Second payload header: G=0, E=0, F=1; XT is not present; this is the
last fragment (G=0) in the payload, so LENGTH is not present; it is
the first fragment in the payload, so TSOFFSET is not present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|      X T      |         LENGTH          |    TSOFFSET     .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.TSOFFSET(cnt'|P|                Extension Data                 .
+-+-+-+-+-+-+-+-+                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|            FOFFSET            | padding |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                        fragment of PDU                        .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.3. The payload contains Extension data followed by 2 PDUs

First payload header: G=1, E=1, so the F field is not present.
Second payload header: G=1, E=0, F=0; XT is not present; this is the
first PDU in the payload, so TSOFFSET is not present.
Third payload header: G=0, E=0, F=0; the XT field is not present;
this is the last PDU in the payload, so the LENGTH field is not
present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|      X T      |         LENGTH          |    TSOFFSET     .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.TSOFFSET(cnt'|P|                Extension Data                 .
+-+-+-+-+-+-+-+-+                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|         LENGTH          |                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               .
.                                                               .
.                              PDU                              .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|           TSOFFSET            | padding |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                              PDU                              .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.4. The payload contains Extension data followed by one fragment
     followed by one PDU

First payload header: G=1, E=1, so the F field is not present.
Second payload header: G=1, E=0, F=1; XT is not present; this is the
first PDU fragment in the payload, so TSOFFSET is not present.
Third payload header: G=0, E=0, F=0; the XT field is not present;
this is the last PDU in the payload, so the LENGTH field is not
present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|      X T      |         LENGTH          |    TSOFFSET     .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.TSOFFSET(cnt'|P|                Extension Data                 .
+-+-+-+-+-+-+-+-+                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|         LENGTH          |            FOFFSET            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                              PDU                              .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|           TSOFFSET            | padding |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               .
.                                                               .
.                              PDU                              .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Extension data field for parity and Reed Solomon codes

The Extension data field can be used for transporting FEC data in
the spirit of [4], but in the same channel as the media data.
Similarly to [4], it can support a variety of FEC mechanisms (parity
codes, block codes such as Reed Solomon codes).

In the approach proposed here, provided that the semantics of the XT
field are announced via non-RTP, out-of-band signalling, such as
SDP [11] with appropriate extensions, the FEC mechanisms can be
adapted during the session, depending on the segment type and on the
network characteristics, without further out-of-band signalling.

5.1. Parity codes

Inspired by [4], in the case of parity codes, the extension data
field can include the following header:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            SN Base            |        length recovery        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|R|                     Mask                      |           .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                    TS Recovery                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields SN Base, E, Mask and TS Recovery are defined as in [4].

R: The bit R is the Marker recovery bit. It is computed by applying
the protection operation to the marker bits M of the protected RTP
media packets.

Length recovery: This field determines the length of the recovered
packets. It is computed here by applying the protection operation to
the 16 bit natural binary representation of the lengths (in bytes)
of the media payload, CSRC list, extension and padding of the media
packets associated with this FEC data, PLUS THE MARKER BIT. The
length recovery field allows the procedure to be applied to packets
which are not of the same length, including here to some objects of
the given packets.

5.2. Reed Solomon Codes

Similarly, in the case of Reed Solomon codes, the extension data
field can include the following header:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            SN Base            |        length recovery        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|R|       N       |       k       |       i       |TS Recovery.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                TS Recovery (cnt'd)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note that, unlike [4], the PT recovery field is not used, since the
payload type of the packets transported in a given channel is
supposed to be known, namely to be the type corresponding to the
payload proposed here.

6. Usage of Extension data field for redundant data

6.1. Usage for redundant 'HPND' data

All AU-level or frame-level decoder configuration information can be
considered to be of HPND type. This information is of high priority
since, if it is lost, the whole frame is lost, and it does not
tolerate increased latency. The extension data field may hence hold
duplicated HPND data in 'n' consecutive packets.
The parameter 'n' may be chosen so that the probability that 'n'
consecutive packets are lost is below a given threshold. These
decision mechanisms are, however, outside the scope of this
document.

Note that this duplication of frame-level information is already
envisaged in the payload formats defined for compressed video
streams [5-7] and in the MPEG-4 video syntax itself, which uses a
'Header extension' (HEC) (see Annex A), with conceptually close, yet
dedicated, mechanisms.

6.2. Usage for redundant 'LP' data

As a special type of FEC, it has been proposed in [1] to use a lower
rate, secondary encoding of the media data to be protected. The
mechanism described above is directly usable for the transport of
secondary compressed streams along with primary compressed data.
Note that the secondary compressed stream can also be a lower layer
(with a lower rate) of a scaleable compression scheme, such as
specified in [12] and [13] for video and audio respectively.

7. Conclusion

This document describes a solution for a 'generic' payload, i.e. a
single payload for a large variety of compressed streams: audio and
video with different compression schemes, or scene description. It
provides generic support for flexible error protection, which can in
addition be adapted to stream segment characteristics as well as to
network characteristics.

Unavoidable 'specificity' is moved from the payload type to the
extension data type. The extension data field can be used for
supporting mechanisms for improved packet loss resiliency. Compared
with payload headers which are fixed and dedicated to given
compression schemes, this concept brings additional flexibility in
error resiliency, from no protection (the extension data field is
then not used for FEC, redundant data or duplicated headers) to
various degrees of protection depending on the types of the
following segments of data.
An out-of-band mechanism, such as SDP, could be used for announcing
at the beginning of the session the semantics of extension data
and/or the list of extension data types supported. During the
session, different extension data types (e.g. for supporting
different error protection mechanisms) can then be selected and
announced without out-of-band mechanisms.

8. Authors' Addresses

Christine Guillemot
INRIA
Campus Universitaire de Beaulieu
35042 RENNES Cedex, FRANCE
email: Christine.Guillemot@irisa.fr

Paul Christ
Computer Center - RUS
University of Stuttgart
Allmandring 30
D70550 Stuttgart, Germany
email: Paul.Christ@rus.uni-stuttgart.de

Stefan Wesner
Computer Center - RUS
University of Stuttgart
Allmandring 30
D70550 Stuttgart, Germany
email: wesner@rus.uni-stuttgart.de

9. Bibliography

[1] : C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley,
      J. Bolot, A. Vega-Garcia, S. Fosse-Parisis, 'RTP Payload for
      Redundant Audio Data',
      draft-ietf-avt-redundancy-revised-00.txt, 10-Aug-98.

[2] : A. Klemets, 'Common Generic RTP Payload Format',
      draft-klemets-generic-rtp-00, March 13, 1998.

[3] : A. Periyannan, D. Singer, M. Speer, 'Delivering Media
      Generically over RTP', draft-periyannan-generic-rtp-00,
      March 13, 1998.

[4] : J. Rosenberg, H. Schulzrinne, 'An RTP Payload Format for
      Generic Forward Error Correction', draft-ietf-avt-fec-03.txt,
      10-Aug-98.

[5] : T. Turletti, C. Huitema, 'RTP payload for H.261 video',
      RFC 2032.

[6] : C. Zhu, 'RTP payload format for H.263 Video Streams',
      RFC 2190.

[7] : C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco,
      D. Newell, J. Ott, S. Wenger, C. Zhu, 'RTP payload format for
      the 1998 version of ITU-T Rec. H.263 video (H.263+)',
      draft-ietf-avt-rtp-h263-video-02.txt, 7-May-98.

[8] : D. Hoffman, G. Fernando, V. Goyal, M.
      Civanlar, 'RTP Payload format for MPEG1/MPEG2 video',
      RFC 2250, January 1998.

[9] : M. Reha Civanlar, G.L. Cash, B.G. Haskell, 'AT&T Error
      Resilient Video Transmission Technique',
      draft-civanlar-hplp-00.txt, July 1998.

[10] : H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, 'RTP: A
       Transport Protocol for Real-Time Applications',
       draft-ietf-avt-rtp-new-00.ps, December 1997.

[11] : Mark Handley, Van Jacobson, 'SDP: Session Description
       Protocol', draft-ietf-mmusic-sdp-07.txt, 2nd Apr 1998.

[12] : Information Technology - Coding of Audiovisual Objects,
       Visual, ISO/IEC 14496-2, May 15, 1998.

[13] : Information Technology - Coding of Audiovisual Objects,
       Audio - CELP, ISO/IEC 14496-3, subpart 3, May 15, 1998.

ANNEX A: Usage for Transport of MPEG-4 Video Elementary Streams

A.1 Usage of RTP packet header for the MPEG-4 video ES

Preliminary remark: In the MPEG-4 framework, the indication of the
compression scheme is not necessary in the payload type, since this
information is provided by the decoderconfigdescriptor delivered
with the scene description stream or via out-of-band signalling.

The RTP timestamp is set to the value of the MPEG-4 video ES
Composition Time Stamp (CTS).

A.2 MPEG-4 Error-resiliency Modes

The notion of 'video packet' (corresponding to the PDUs created by
the media aware fragmentation described above) has been introduced
in MPEG-4 [12]. When the error resilience mode is 'on'
('error-resilience-disabled' flag set to '0'), a resync_marker is
inserted by the encoder before the first macroblock after the number
of bits output since the last resync_marker field exceeds a
predetermined value. The marker spacing value depends on the
anticipated error conditions of the transmission channel and on the
compressed data rate.
The compressed data included between two resync_markers is called a
video packet. Depending on the initial setting of the resync_markers
and on the variations of the transmission channel, the video packet
may be larger than the size of one RTP packet, and may then be
fragmented as explained above (by an MTU aware, but media unaware,
network adaptation layer).

---------------------------------------------
|Resync | MBA | Quant. | Header  | Temporal |
|Marker |     | Param  | Extens. | Ref.     |
---------------------------------------------
.                                   .
.                          .
.                 .
.       .
------------------------------------------------------
| | | | | shape | Motion | Texture | Error | Texture |
| | | | | data  | data   | data    | Burst | data    |
------------------------------------------------------

              Figure 2 : MPEG-4 Video Packet Syntax

In order to make each video packet independently decodable, all
predictively encoded information must be confined within a video
packet so as to prevent the propagation of errors.

As shown in Figure 2, header information is also provided at the
start of a video packet. This header contains the information
necessary to restart the decoding process: the macroblock address
(number) of the first macroblock contained in this packet and the
quantization parameter (quant-scale) necessary to decode that first
macroblock. The macroblock number provides the necessary spatial
resynchronization, while the quantization parameter allows the
differential decoding process to be resynchronized.

Following the quant-scale is the Header Extension Code (HEC). HEC is
a single bit used to indicate whether additional information will be
available in this header.
If the HEC is equal to one, then the following additional
information is available in this packet header: modulo time base,
VOP-time-increment, VOP-coding-type, intra-dc-vlc-thr,
VOP-fcode-forward, VOP-fcode-backward.

When the Header Extension Code is set to '1', each video packet (VP)
can be decoded independently. The necessary information to decode
the VP is then included in the header extension code field. If the
VOP header information is corrupted by a transmission error, it can
be corrected using the HEC information.

Note: This HEC information is actually FEC information in the form
of redundant data. It should not be in the compression layer but in
a network adaptation layer, which would respond to mechanisms of
adaptation to network conditions. It is therefore proposed to remove
it and to shift the redundant information into the 'Extension data'
object field of the RTP packet. This would provide a unifying
framework for any type of video encoding (H.261 [5], H.263 [6],
H.263+ [7], MPEG1/2 [8], MPEG-4 [12], ...) and for audio encoding
([1], [8]).

A.3 Prioritisation of MPEG-4 Video Syntax

The MPEG-4 video syntax presents a hierarchical structure. Header
decoder configuration information carried in the 'VisObjectSeq' can
be transmitted with the support of an ARQ (Automatic Repeat Request)
mechanism. Decoder configuration information at the GOV and VOP
levels is of type HPND, and a redundant data mechanism will be used
to provide an acceptable level of protection.

Texture, motion and shape information contained in the video packets
is considered as LP data and can be sent either with no protection
or with FEC, depending on both the media QoS and the network
characteristics. In the case of scaleable coding, the choice of the
level of protection for this data may be adapted to each layer of
the scaleable stream.
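The prioritisation described in A.3 can be summarised as a small
lookup, sketched below for illustration. The element names, the
use_fec_for_lp flag and the function name are hypothetical. Treating
the 'VisObjectSeq' information as HPTD is an assumption drawn from the
definitions of Section 2.3; A.3 itself only states that it can be sent
with ARQ support.

```python
from enum import Enum


class SegmentType(Enum):
    HPTD = "high priority, tolerant to delay"
    HPND = "high priority, no tolerance to delay"
    LP = "lower priority"


# Hypothetical element names; the classification follows A.3, except
# that labelling VisObjectSeq as HPTD is an assumption (see lead-in).
MPEG4_VIDEO_SEGMENTS = {
    "VisObjectSeq header": SegmentType.HPTD,
    "GOV header": SegmentType.HPND,
    "VOP header": SegmentType.HPND,
    "shape": SegmentType.LP,
    "motion": SegmentType.LP,
    "texture": SegmentType.LP,
}


def choose_protection(element: str, use_fec_for_lp: bool) -> str:
    """Return the error control mechanism suggested for one element.

    use_fec_for_lp stands in for the decision, left open by the text,
    that depends on media QoS and network characteristics.
    """
    segment = MPEG4_VIDEO_SEGMENTS[element]
    if segment is SegmentType.HPTD:
        return "ARQ"
    if segment is SegmentType.HPND:
        return "redundant data"
    return "FEC" if use_fec_for_lp else "no protection"
```

For scaleable coding, the same lookup could be kept per layer, so that
each layer gets its own value of use_fec_for_lp.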
ANNEX B: Usage for Transport of H.263+ Video Streams

The fields SBIT, EBIT, PLEN and PEBIT specified in the H.263+
payload header are no longer necessary. Only the TID, TRUN and RR
fields are useful in the VRC mode, and they can be part of the
extension data field, their usage being negotiated together with the
extension data. In [7], their usage also has to be negotiated by
external means.

All the functions covered by the mechanisms dedicated to better
packet loss resilience, such as the 'encapsulating packet that
begins with PSC' in [7] for an H.263 video stream, are covered by
the placement of the appropriate redundant data (e.g. duplication of
picture header information).

The Follow-on packet encapsulation of [7] is covered here by the
fragmentation and grouping mechanisms.

ANNEX C: Usage for Transport of MPEG1/MPEG2 Video Streams

The fragmentation restriction rules specified in section 3.1 of [8]
are no longer forced upon implementations. With the
fragmentation/grouping mechanism defined in this document,
applications are free to apply fragmentation restrictions or not, a
choice which can be made dependent on the network loss
characteristics.

It is proposed to place, when needed, the MPEG video-specific header
and the MPEG-2 video-specific header extension defined in [8] into
the extension data field.

ANNEX D: Usage for Transport of MPEG1/2 Audio

The frag_offset value of the MPEG Audio-specific header of [8] is
placed here into the FOFFSET field.

ANNEX E: Usage for Transport of MPEG-4 Audio Elementary Streams

Let us consider as an example the CELP audio encoder specified by
MPEG-4 [13].
The syntax of the CELP bitstream can be segmented as follows. At the
beginning of the audio session comes the decoder configuration
information, i.e. the CelpSpecificConfig() information, made of the
CELP header in the standalone mode or in the base layer of both
bitrate and bandwidth scalable modes, or of the CelpBWSenhHeader
information in the enhancement layer. The CELP decoder configuration
information is of HPTD type.

Each scaleable layer bitstream transported in the RTP payload will
then include the frame level data, i.e. the CelpBaseFrame(),
CelpBRSenhFrame() and CelpBWSenhFrame() information. Each of these
scaleable layer bitstreams can be segmented into frame-level decoder
configuration information of HPND type and raw data of LP type,
which, similarly to the video case, will be transmitted with the
support of redundant data and FEC mechanisms respectively.
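As a closing illustration of the payload header of Section 3.2, the
following minimal sketch packs and parses the header that precedes an
Extension data object, using the layout drawn in the example of
Section 4.1: G (1 bit), E (1 bit, set to 1), XT (8 bits), LENGTH
(13 bits), TSOFFSET (16 bits) and one trailing bit P, 40 bits in all.
Several points are assumptions rather than statements of the draft:
fields are taken in network bit order, P is treated as a zero padding
bit, and the function names are invented for this example.

```python
def pack_extension_header(group: bool, xt: int, length: int, tsoffset: int) -> bytes:
    """Build the 5-byte header that precedes an Extension data object."""
    if not (0 <= xt < 1 << 8):
        raise ValueError("XT must fit in 8 bits")
    if not (0 <= length < 1 << 13):
        raise ValueError("LENGTH must fit in 13 bits")
    if not (0 <= tsoffset < 1 << 16):
        raise ValueError("TSOFFSET must fit in 16 bits")
    value = int(group)                 # G: another object follows
    value = (value << 1) | 1           # E = 1: extension data comes next
    value = (value << 8) | xt          # XT: extension type
    value = (value << 13) | length     # LENGTH of the object in bytes
    value = (value << 16) | tsoffset   # TSOFFSET relative to the RTP timestamp
    value <<= 1                        # P: assumed to be a zero padding bit
    return value.to_bytes(5, "big")


def parse_extension_header(data: bytes) -> dict:
    """Decode the fields written by pack_extension_header."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    value = int.from_bytes(data[:5], "big")
    if not (value >> 38) & 1:
        raise ValueError("E bit is 0: not an extension data header")
    return {
        "G": (value >> 39) & 1,
        "E": 1,
        "XT": (value >> 30) & 0xFF,
        "LENGTH": (value >> 17) & 0x1FFF,
        "TSOFFSET": (value >> 1) & 0xFFFF,
    }
```

Headers that precede a PDU or a fragment are shorter and depend on
which of LENGTH, FOFFSET and TSOFFSET are present, so they are not
covered by this sketch.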