Internet Engineering Task Force
Audio Visual Transport WG
Internet-Draft
J. Van der Meer / D. Curet, E. Gouleau, S. Relier, C. Roux / P. Clement / G. Cherry
Philips / FT R&D / Thales BM / nCube
Document: draft-curet-avt-rtp-mpeg4-flexmux-02.txt
November 8, 2001
Expires: May 8, 2002

RTP Payload Format for MPEG-4 FlexMultiplexed Streams

draft-curet-avt-rtp-mpeg4-flexmux-02.txt

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Section 10 of this document is intended for registering SDP names with IANA as in RFC 2048.

Abstract

This document describes a payload format for transporting MPEG-4 synchronised and multiplexed data using RTP. MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic audio-visual data. Several services provided by RTP are beneficial for the transport of MPEG-4 encoded and multiplexed data over the Internet. Additionally, the use of RTP makes it possible to synchronize MPEG-4 data with other real-time data types.

This specification is a product of the Audio/Video Transport working group within the Internet Engineering Task Force and of the ISO/IEC MPEG-4 ad hoc group on MPEG-4 over Internet. Comments are solicited and should be addressed to the working group's mailing list at rem-conf@es.net and/or the authors.

curet et al.
expires May 2002 [Page 1]

Internet Draft RTP payload for MPEG-4 FlexMux streams November 01

1 Introduction

The MPEG-4 standard (ISO/IEC 14496) can be represented as a layered architecture in which three layers can be identified:

                  +---------------------------------------+
media aware,      | COMPRESSION LAYER:                    |
                  | Elementary Streams (ES) encoding      |
delivery unaware  | MPEG-4 part 2 Visual                  |
layer             | MPEG-4 part 3 Audio                   |
                  | MPEG-4 part 1 Bifs,OD,IPMP,OCI        |
                  +---------------------------------------+
================================================ ESI Interface
                  +---------------------------------------+
media and         | SYNC LAYER (SL)                       |
delivery unaware  | Elementary streams management         |
layer             | and synchronisation                   |
                  +---------------------------------------+
================================================ DAI Interface
                  +---------------------------------------+
delivery aware,   | DELIVERY LAYER (DMIF)                 |
media unaware     |provides Flexmultiplexing of SL streams|
layer             | and transparent access                |
                  | to the delivery technology            |
                  +---------------------------------------+

Although the Delivery Layer mostly focuses on the control plane, it also encompasses multiplexing tools, called the FlexMux tools, to multiplex MPEG-4 SL streams.

MPEG makes the assumption that each MPEG-4 packet transmitted over the network should have a nearly constant transmission delay. Under this assumption, the reconstruction of the correct timing of MPEG-4 bitstreams is supported both by the MPEG-4 SL stream syntax and by the MPEG-4 FlexMux stream syntax.

The reconstruction of the correct timing of an MPEG-4 FlexMux stream is possible under various QoS constraints related to the reduction of network jitter.

One payload format has been defined to carry MPEG-4 Audio and Video Elementary Streams [7], and others are under definition to carry MPEG-4 SL streams [14] and [15].

This document specifies an RTP payload format to enable the carriage of FlexMux streams. It is in line with the MPEG-4 delivery framework [9].
2 MPEG-4 overview

2.1 Compression layer:

The compressed content produced by this layer consists of Elementary Streams (ESs), which are organised in Access Units (AUs). An AU is the smallest element to which timestamps can be assigned.

AUs are passed to the Synchronisation Layer (SL), together with timestamps, RandomAccess, and other information, through the ESI interface.

The Compression Layer processes the traditional individual audio/visual elementary streams (ES) and some associated 'systems' elementary streams (ES) such as the Bifs, OD, IPMP and OCI elementary streams.

The MPEG-4 audio/visual ES syntaxes are defined in [3] and [2].

The 'systems' ES syntaxes are described in [1]: the Bifs ES syntax allows a dynamic scene description. The OD ES syntax allows the description of the hierarchical relations, location and properties of different ESs through a dynamic set of Object Descriptors.

The 'systems' ESs may need to be carried with better protection than the traditional audio/visual ESs.

The compression layer is unaware of a specific delivery technology, but it can react to the characteristics of a particular delivery layer, such as the path-MTU, packet loss, or bit error characteristics.

2.2 Synchronisation Layer:

The MPEG-4 SL stream syntax is defined in [1]. It provides a unique and homogeneous encapsulation of any ES that is organised in AUs. This is the case for all MPEG-4 ESs, but it can also be the case for non-MPEG-4 ESs.

This layer primarily provides the synchronisation between ESs. Integer or fractional AUs from the same ES are encapsulated in SL packets that build an SL stream.

SL packets are passed to the Delivery Layer (DMIF) through the DMIF Application Interface (DAI interface), which allows QoS requirements to be assigned to the delivery of SL streams.
The compression layer is unaware of a specific delivery technology, but it can react to the characteristics of a particular delivery layer, such as the path-MTU or packet loss.

2.3 Delivery Layer:

The MPEG-4 Delivery Layer consists of the Delivery Multimedia Integration Framework defined in [4].

This layer is media unaware but delivery technology aware. It provides transparent access to and delivery of content irrespective of the technologies used.

This interface supports content-location-independent protocols, first for establishing the MPEG-4 session and second for accessing transport channels.

DMIF monitors transport channels against the QoS requirements assigned to the SL streams, and supports the multiplexing of the SL streams by means of the MPEG-4 FlexMux tools. There are different possible FlexMux tools.

FlexMux stream delivery is defined in [4], while the FlexMux stream syntax is defined within [1].

MPEG assumes that MPEG-4 FlexMux streams are carried over the network with an 'ideal' constant transmission delay for each packet; the reconstruction of the correct timing of an MPEG-4 FlexMux stream is based on that assumption.

This draft specifies an RTP [5] payload format for transporting multiplexed MPEG-4 encoded data streams. It can be presented as an instance of the MPEG-4 Delivery layer.

3 Benefits of using RTP for transport:

i. Ability to synchronize MPEG-4 streams with other RTP payloads

ii. Monitoring MPEG-4 delivery performance through RTCP

iii. Combining MPEG-4 and other real-time data streams received from multiple end-systems into a set of consolidated streams through RTP mixers

iv. Converting data types, etc.
through the use of RTP translators

4 Conventions used in this document

4.1 General:

The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in RFC-2119 [6].

4.2 MPEG-4 glossary:

AU: Access Unit
Bifs: Binary format for scene
DMIF: Delivery Multimedia Integration Framework
DAI: DMIF Application Interface
ES: Elementary stream
ESI: Elementary stream Interface
FlexMux: Flexible Multiplex
FCR: FlexMux Clock reference
IPMP: Intellectual Property Management and Protection
OCI: Object Content Information
OD: Object descriptor
QoS: Quality of service
SL: Synchronization layer

5 The RTP packet

5.1 The RTP packet header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               |
|                      RTP Packet Payload                       |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1 - An RTP packet for MPEG-4 FlexMux stream

5.2 RTP header fields usage:

Payload Type (PT): The assignment of a particular RTP payload type to this new packet format is outside the scope of this document and is not specified here. If dynamic payload type assignment is used, the use of the MPEG-4 FlexMux payload format for the corresponding RTP packets can be signalled by out of band means (e.g. SDP, according to the syntax proposed in Section 10).
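As an illustration of the header layout in Figure 1, the following minimal Python sketch splits an RTP packet into its header fields and the bytes that follow. It is not part of this specification. The names `RtpHeader` and `parse_rtp_header` are hypothetical, and the sketch assumes the packet carries no header extension and no padding; if the X or P bit is set, the returned remainder still contains those bytes.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the RTP header shown in Figure 1 (hypothetical container)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Return the parsed header and the bytes following the CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("unsupported RTP version: %d" % version)
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    header = RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
    # Assumption: no header extension and no padding, so the remainder
    # is the RTP packet payload (an integer number of FlexMux packets).
    return header, packet[end:]
```

A receiver could call `parse_rtp_header(datagram)` on each received UDP datagram and compare `header.payload_type` with the dynamic payload type negotiated out of band before handing the remainder to the FlexMux demultiplexer.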
Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet payload includes the end of every Access Unit for which data is contained in this RTP packet. The M bit is set to 0 when the RTP packet contains one or more Access Unit fragments that are not Access Unit ends.

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: Incremented by one for each RTP data packet sent. It starts with a random initial value for security reasons.

Timestamp: Represents the target transmission time for the first byte of the RTP packet. Unless specified by out of band means (e.g. SDP), the resolution of the timestamp is set to its default (90 kHz).

SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

RTCP SHOULD be used as defined in RFC 1889 [5].

Timestamps in RTCP SR packets: The RTP timestamp value is the RTP timestamp that would be applied to an RTP packet for data that would be sent at the instant the SR packet is being generated and sent. The NTP timestamp value is the NTP time at which that SR packet is sent.

5.3 The RTP packet payload

The RTP packet payload is built from an integer number of complete FlexMux packets, defined in [1].

6 Fragmentation rules

The MPEG-4 FlexMux layer does not fragment SL packets. However, there are two FlexMux tools: the first limits the packet length to 257 bytes (255 bytes for the SL packet), and the second limits it to 268 Mbytes. Fragmentation rules are therefore needed at the SL layer. These fragmentation rules are detailed in a section of [14] and in a section of [15], where it is shown how the SL layer may have to be path-MTU aware.

7 FlexMultiplexing

7.1 Some of the advantages:

1. Since a typical MPEG-4 session may involve a large number of objects, possibly as many as a few hundred, transporting each ES as an individual RTP session may not always be practical.
Using one session per elementary stream is not cost effective in terms of performance, on either the server side or the client side, as the number of elementary streams within a scene increases.

2. The use of a single session for a multiplexed bitstream makes it possible to send a group of ESs that are tightly synchronized together. Some of these ESs can themselves be Bifs and OD ESs when a scene description is used with audio-visual ESs, and others can be OCI ESs, or even IPMP ESs when such systems are involved.

3. The FlexMultiplexing management supports embedding multiple SL packets into one FlexMux packet through the use of FlexMux codetable entries.

4. If the multiplexing policy interleaves the multiplexed SL streams as smoothly as possible, mutual synchronization between these SL streams can easily be preserved when packet losses occur.

5. The use of the FlexMux technology makes interconnection between Internet networks and digital television networks possible, as MPEG normatively defines the use of the MPEG-4 FlexMux syntax to carry MPEG-4 over MPEG-2 transport channels [1].

6. The reconstruction of the correct timing of the FlexMux stream is possible, using timing samples carried within the FlexMux in-band signalling mechanism, if some QoS requirements are supported.

7. The overall MPEG-4 receiver buffer size is reduced, since MPEG-4 compliant FlexMultiplexed streams respect the MPEG-4 system decoder model through the use of the MPEG-4 timestamps.

8. The FlexMux stream syntax supports an in-band signalling mechanism that can signal the bit-rate of the stream dynamically, at any time, within the stream itself (FlexMux streams have a piecewise constant bit-rate), which allows easy bandwidth management. FlexMux descriptors and Ad Hoc descriptors are carried within this in-band signalling mechanism.

9.
Protection can be enhanced by means of repetition of vital SL packets.

10. Content providers are able to bundle streams together into a single stream with the assurance that associated streams will be kept together and synchronized.

7.2 Disadvantages:

The major disadvantage of the packetization of MPEG-4 FlexMultiplexed streams is the added packet header overhead. Two FlexMux tools, with two different FlexMux packet length fields, are supported.

Depending on the size of the Access Units and of the SL packets, using shorter or longer FlexMux packets is one way to minimize the overhead. MPEG-4 does not provide a mechanism for reducing the packet headers of the carried MPEG-4 FlexMultiplexed streams. This issue will need to be resolved using a mechanism similar to the one proposed in [8].

8 Handling of scene description streams

To describe virtual scenes, in addition to the two traditional audio-visual streams, MPEG-4 introduces two new 'systems' Elementary Streams as described in Section 2, namely the OD (Object Descriptors) and Bifs (Binary format for Scene) streams.

ODs may be seen, among other things, as a link between the scene description itself and its audio and video contents. ODs allow complete bi-directional independence between a scene and its audiovisual contents.

This payload format offers a single solution for download and real-time applications, where the scene description can be either static or dynamic.

If required by the network involved and by the application, the 'systems' Elementary Streams may be multiplexed together within a first FlexMux stream, while the audio-visual Elementary Streams may be multiplexed within a second FlexMux stream, with each FlexMux stream being assigned a different QoS.
Because applications want to prevent receivers from losing access to content (for example, when a lost OD update command breaks the link between the scene and one of its contents), senders must use suitable schemes to transport sensitive configuration information. Such schemes may involve redundancy, which can be added by different tools such as packet-based FEC, packet duplication, or similar tools.

9 Transport of MPEG-4 FlexMux streams

An MPEG-4 FlexMux packet is mapped directly onto the RTP payload without any addition of extra header fields or removal of any FlexMux packet header syntactic elements.

Each RTP packet will contain a timestamp derived from the sender's clock reference, synchronized to the FlexMux Clock. That timestamp represents the target transmission time of the first byte of the RTP packet payload. On the receiving side, the RTP packet timestamp will not be passed to the MPEG-4 FlexMux demultiplexer. This use of the timestamp is slightly different from the normal use in RTP, in that it is not considered to be the media display timestamp. The first purpose of this RTP timestamp is therefore to estimate and then compensate for the network jitter.

When this is achieved, the FCR samples may be used on the receiver side to accurately reconstruct the original sender's FlexMux clock.

There are packetization restrictions due to the fact that no synchronization pattern is part of the FlexMux packet header:

An RTP packet will contain an integer number of FlexMux packets.

An RTP packet payload should start with the start of a FlexMux packet.

An RTP packet payload should end with the end of a FlexMux packet.

The FlexMux descriptors (declaration descriptor, timing descriptor, Channel Table descriptor, codetable entry descriptor, buffersize descriptor, etc.) describing the characteristics of the FlexMux stream may be provided by out of band means (e.g.
SDP) and/or by the in-band signalling mechanism supported by the FlexMux stream syntax [1]. Ad Hoc (non MPEG) descriptors are also supported.

When the IP packet marking facility is needed, it is based on the 'degradationPriority' field present in each SL packet. In that case, all the FlexMux packets grouped in the same RTP packet should contain SL packets whose 'degradationPriority' field carries the same value.

The size of the FlexMux packets should be adjusted such that the resulting RTP packet (embedding one or several FlexMux packets) is not larger than the path-MTU.

Protection mechanisms for FlexMux streams within RTP packets are outside the scope of this specification.

10 SDP syntax

It is assumed that one typical way to signal the FlexMux stream characteristics of this payload format is via an SDP message that may be transported to the client in reply to an RTSP [11] DESCRIBE command, or via SAP [12]. The SDP protocol is described in [10].

10.1 Types and Names

This section describes the MIME types and names associated with this payload format. This section is intended for registration with IANA [13].

10.1.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" SHOULD be used for MPEG-4 Visual streams (i.e. video as defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC 14496-1 [1]) or MPEG-4 Systems streams that convey information needed for an audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or MPEG-4 Systems streams that convey information needed for an audio only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC 14496-1), such as MPEG-4 FlexMux streams, that serve purposes other than only an audio/visual presentation.
The payload names used in an RTPMAP attribute within SDP, to specify the mapping of a payload number to its definition, also come from the MIME namespace. Each of the RTP payload mappings defined above has a distinct name. It is recommended that visual streams be identified under 'video', that audio streams be identified under 'audio', and that 'application' be used otherwise.

When a FlexMux stream is served (e.g. over HTTP) or otherwise must be identified by a MIME type, the type 'application/mpeg4-flexmux' SHALL be used. These files consist of concatenated FlexMux packets in transmission order.

MIME media type name: application
MIME subtype name: mpeg4-flexmux
Required parameters: mpeg4-flexmuxinfo
Optional parameters: none
Encoding considerations: base64 generally preferred; files are binary and should be transmitted without CR/LF conversion, 7-bit stripping, etc.

10.1.2 Attributes

This section updates the parameters of the rtpmap attribute and adds a new attribute.

Update of the rtpmap attribute:

A new encoding name is defined for the a = rtpmap attribute, the newly registered mpeg4-flexmux MIME subtype

a = rtpmap: /