Audio/Video Transport                                            B. Link
Internet-Draft                                                  T. Hager
Expires: June 2005                                    Dolby Laboratories
                                                                J. Flaks
                                                   Microsoft Corporation
                                                           December 2004


                  RTP Payload Format for AC-3 Streams

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of section 3 of RFC 3667. By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

Abstract

This document describes an RTP payload format for transporting AC-3 encoded audio data. AC-3 is a high-quality, multichannel audio coding system used in US HDTV, DVD, cable and satellite television, and other media. The RTP payload format presented in this document includes support for data fragmentation.

Link/Hager/Flaks             Expires June 2005                  [Page 1]

Internet Draft      RTP Payload Format for AC-3 Streams    December 2004

0. Change Log for

Primary author has changed to Brian Link since the last version, which was (and further back, .) This version contains many editorial changes.
The following technical changes have also been made:

   - The MIME subtype name is changed from ac3 to vnd.dolby.dd-rtp and restricted to use with RTP only.
   - A BSID MIME parameter is added to aid in compatibility negotiation.
   - Packetization is clarified and constrained to be either one or more complete AC-3 frames or one AC-3 frame fragment. The associated header fields are simplified and promoted from the Data Unit Header to the Payload Header.
   - The mechanism for sending redundant data and the related header parameters are removed.
   - A mechanism for carrying SMPTE time code is added.
   - The field indicating the number of frames in a packet is now also used to indicate the number of packets needed to complete a fragmented frame.

1. Introduction

AC-3 is a high-quality audio codec designed to encode multiple channels of audio into a low bit-rate format. AC-3 achieves its large compression ratios by encoding multiple channels as a single entity. Dolby Digital, a branded version of AC-3, encodes up to 5.1 channels of audio.

AC-3 has been adopted as an audio compression scheme for many consumer and professional applications. It is a mandatory audio codec for DVD-Video, Advanced Television Systems Committee (ATSC) digital terrestrial television, laser disc, and Digital Living Network Alliance (DLNA) home networking, and an optional multichannel audio format for DVD-Audio.

There is a need to stream AC-3 data over IP networks. Applications for streaming AC-3 include streaming movies from a home media server to a display, video on demand, and multichannel Internet radio.

RTP provides mechanisms for stream synchronization and is therefore well suited as a transport for AC-3, a codec used primarily in audio-for-video applications.

Section 2 gives a brief overview of the AC-3 algorithm. Section 3 describes the SMPTE time code that may optionally be included in the payload to synchronize audio with video devices that do not use RTP.
Section 4 specifies values for fields in the RTP header, and Section 5 specifies the AC-3 payload format itself. Section 6 discusses MIME types and SDP usage. Security considerations are covered in Section 7 and IANA considerations in Section 8. References are given in Sections 9 and 10.

2. Overview of AC-3

AC-3 can deliver up to 5.1 channels of audio at a data rate approximately half that of a single PCM channel [2], [7], [8]. The ".1" refers to an optional, band-limited, low-frequency enhancement (LFE) channel. AC-3 was designed for signals sampled at 32, 44.1, or 48 kHz. Data rates can vary between 32 kbps and 640 kbps, depending on the number of channels and the desired quality.

AC-3 exploits psychoacoustic phenomena that render a significant fraction of the information in a typical audio signal inaudible. Substantial data reduction is achieved by removing this inaudible information, and source coding techniques reduce the data rate further.

Like most perceptual coders, AC-3 operates in the frequency domain. A 512-point TDAC (time-domain aliasing cancellation) transform is taken with 50% overlap, providing 256 new frequency samples per transform. The frequency samples are then converted to exponents and mantissas. Exponents are differentially encoded. Each mantissa is allocated a varying number of bits depending on the audibility of its associated spectral component, as determined by a masking curve. Bits for mantissas are allocated from a global bit pool.

2.1 AC-3 Bit Stream

AC-3 bit streams are organized into synchronization frames. Each AC-3 frame contains a Synchronization Information (SI) field, a Bit Stream Information (BSI) field, and six audio blocks (AB), each representing 256 PCM samples for each channel, followed by an auxiliary data (AUX) field and a CRC field. The entire frame therefore represents 1536 PCM samples for each coded channel (e.g., 32 ms at a 48 kHz sample rate) [2].
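The frame arithmetic above can be checked with a short sketch. It relies only on what this section states (six audio blocks of 256 samples per frame, and the three AC-3 sampling rates); the helper names are hypothetical. The byte-size formula is exact only at 32 and 48 kHz. At 44.1 kHz the nominal frame size is not a whole number of 16-bit words, and Table 5.18 of [2] lists two sizes per data rate, so that table remains the authoritative source.

```python
# Sketch of AC-3 frame arithmetic from Section 2 (helper names are hypothetical).
SAMPLES_PER_BLOCK = 256
BLOCKS_PER_FRAME = 6
SAMPLES_PER_FRAME = SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME  # 1536 per channel

AC3_SAMPLE_RATES = (32000, 44100, 48000)


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Duration of one AC-3 frame in milliseconds."""
    if sample_rate_hz not in AC3_SAMPLE_RATES:
        raise ValueError(f"unsupported AC-3 sample rate: {sample_rate_hz}")
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bitrate_kbps: int, sample_rate_hz: int) -> int:
    """Nominal frame size in bytes: bit rate x frame duration / 8.

    Only 32 and 48 kHz are accepted, because only there is the result an
    integer (4 bytes per kbps at 48 kHz, 6 bytes per kbps at 32 kHz).
    For 44.1 kHz, look the size up in Table 5.18 of [2] instead.
    """
    if sample_rate_hz not in (32000, 48000):
        raise ValueError("nominal size is not integral at this sample rate")
    return bitrate_kbps * 1000 * SAMPLES_PER_FRAME // (8 * sample_rate_hz)


if __name__ == "__main__":
    for rate in AC3_SAMPLE_RATES:
        print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms per frame")
    print(frame_size_bytes(32, 48000), frame_size_bytes(640, 32000))
```

Running it prints 48.00, 34.83, and 32.00 ms per frame for 32, 44.1, and 48 kHz, and frame sizes of 128 and 3840 bytes, matching the 32 ms example above and the frame-size range given in Section 5.2. Because the RTP timestamp clock runs at the sampling rate (Section 6.1), consecutive frames are also 1536 timestamp units apart.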
Figure 1 shows the AC-3 frame format.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |SI |BSI|  AB0  |  AB1  |  AB2  |  AB3  |  AB4  |  AB5  |AUX|CRC|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 1. AC-3 Frame Format

The Synchronization Information field contains the information needed to acquire and maintain synchronization. The Bit Stream Information field contains parameters that describe the coded audio service [2]. Each audio block contains fields that indicate the use of various coding tools: block switching, dither, coupling, and exponent strategy. Audio blocks also carry metadata that may optionally be used to enhance playback, such as dynamic range control. Figure 2 shows the format of an AC-3 audio block.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Block   | Dither  | Dynamic     | Coupling  | Coupling    | Exponent  |
   | switch  | Flags   | Range Ctrl  | Strategy  | Coordinates | Strategy  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Exponents       |      Bit Allocation       |     Mantissas       |
   |                     |        Parameters         |                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 2. AC-3 Audio Block Format

3. Society of Motion Picture and Television Engineers (SMPTE) Time Code

Time code is useful in applications outside RTP where time information must be kept closely associated with encoded audio data. One example is a digital audio/video transmission system in which both the audio and video sources carry SMPTE time code. When such audio is sent over RTP, it is useful to carry the original SMPTE time code information along with it.

Time code is optional in this payload format. When a time code value is included, it applies to the Data Unit that immediately follows it. Details of how time code is used to synchronize devices are beyond the scope of this document.
The time code format used in this payload format is defined in Section 4.2 ("Time Stamp burst_payload" and Table 1) of [6]. Note that the time code has two modes: a 12-octet mode consisting of six required 16-bit words, and an 18-octet mode in which three optional 16-bit words are present in addition to the six required words.

4. RTP Header Fields

Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document; it is specified by the RTP profile under which this payload format is used, or signaled dynamically out-of-band (e.g., using SDP).

Marker (M) bit: The M bit is set to one to indicate that the RTP packet payload contains at least one complete AC-3 frame or contains the final fragment of an AC-3 frame.

Extension (X) bit: Defined by the RTP profile used.

Timestamp: A 32-bit word that corresponds to the sampling instant of the first AC-3 frame in the RTP packet. The timestamp clock rate equals the audio sampling rate, which for AC-3 is 32 kHz, 44.1 kHz, or 48 kHz. Packets containing fragments of the same frame MUST have the same timestamp. The starting timestamp SHOULD be selected at random.

5. RTP AC-3 Payload Format

According to [5], RTP payload formats should contain an integral number of application data units (ADUs). In this payload format, an ADU is one AC-3 frame. To simplify the implementation of RTP receivers, each RTP packet MUST contain either an integral number of complete AC-3 frames or one fragment of an AC-3 frame. If an RTP packet carrying an AC-3 frame would exceed the network MTU, the frame SHOULD be fragmented across multiple RTP packets. Section 5.2 provides guidelines for creating frame fragments.

5.1 Payload-Specific Header

There is a two-octet Payload Header at the beginning of each payload. There is also a Data Unit Header at the beginning of each frame or frame fragment.
The Data Unit Header is either empty or contains time code (12 or 18 octets), depending on the value of the Time Code Mode field in the Payload Header. It is always empty for a frame fragment other than the initial fragment.

5.1.1 Payload Header

Each AC-3 RTP payload MUST begin with the following Payload Header. Figure 3 shows the format of this header.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MBZ  |TCM| FT|      NF       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3. AC-3 RTP Payload Header

Frame Type (FT): This two-bit field indicates the type of frame(s) present in the payload. It takes the following values:

   0 - One or more complete frames.
   1 - Initial fragment of a frame, which includes the first 5/8ths of the frame (see Section 5.2).
   2 - Initial fragment of a frame, which does not include the first 5/8ths of the frame.
   3 - Fragment of a frame other than the initial fragment. (Note that the M bit in the RTP header is set for the final fragment.)

Time Code Mode (TCM): This two-bit field indicates the presence and mode of optional SMPTE time code for all frames in the packet. For non-initial frame fragments (FT of 3), TCM MUST be 0 (time code not present). TCM takes the following values:

   0 - Time code is not present.
   1 - Time code is present in its required mode.
   2 - Time code is present in its optional mode.
   3 - Reserved.

Number of frames/fragments (NF): An 8-bit field whose meaning depends on the Frame Type (FT) of the payload. For complete frames (FT of 0), it indicates the number of AC-3 frames in the RTP payload. For frame fragments (FT of 1, 2, or 3), it indicates the number of fragments (and therefore packets) that make up the current frame. NF MUST be identical for all packets containing fragments of the same frame.

Must Be Zero (MBZ): These four bits are reserved and MUST be set to zero.
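The bit layout of Figure 3 can be illustrated with a small receiver-side parser. This is a sketch rather than a normative implementation: the names `PayloadHeader` and `parse_payload_header` are hypothetical, and rejecting non-zero MBZ bits is only one possible policy (a receiver could instead ignore reserved bits). Bits are numbered from the most significant bit of the first octet, as in Figure 3.

```python
from typing import NamedTuple

# Frame Type (FT) values from Section 5.1.1.
FT_COMPLETE, FT_INITIAL_WITH_5_8, FT_INITIAL_WITHOUT_5_8, FT_CONTINUATION = range(4)


class PayloadHeader(NamedTuple):
    tcm: int  # Time Code Mode, 2 bits
    ft: int   # Frame Type, 2 bits
    nf: int   # Number of frames/fragments, 8 bits


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Decode the two-octet AC-3 RTP Payload Header (MBZ | TCM | FT | NF)."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet Payload Header")
    first, nf = payload[0], payload[1]
    mbz = first >> 4          # bits 0-3
    tcm = (first >> 2) & 0x3  # bits 4-5
    ft = first & 0x3          # bits 6-7
    if mbz != 0:
        raise ValueError("MBZ bits are not zero")
    if tcm == 3:
        raise ValueError("TCM value 3 is reserved")
    if ft == FT_CONTINUATION and tcm != 0:
        raise ValueError("non-initial fragments must have TCM = 0")
    return PayloadHeader(tcm, ft, nf)


if __name__ == "__main__":
    # TCM=1 (required-mode time code), FT=0 (complete frames), NF=2.
    print(parse_payload_header(bytes([0x04, 0x02])))
```

A receiver would then use TCM to find the size of each Data Unit Header, and FT and NF to decide whether the payload can be decoded directly or must wait for the remaining fragments of a frame.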
5.1.2 Data Unit Header

Each audio data unit (i.e., AC-3 frame or fragment) MUST begin with a Data Unit Header, which either contains SMPTE time code or is empty. The format of the time code (TC) is defined by SMPTE in [6] (see also Section 3). The size of the Data Unit Header is determined by the Time Code Mode (TCM) field in the Payload Header:

      Time Code Mode       Size of Data Unit Header (octets)
      -----------------    ---------------------------------
      0 (Not present)        0
      1 (Required mode)    12
      2 (Optional mode)    18

Figure 4 shows the full AC-3 RTP payload format.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ .. +-+-+-+-+-+-+-+-+
   | Payload |  TC   | Frame |  TC   | Frame | .. |  TC   | Frame |
   | Header  |  (1)  |  (1)  |  (2)  |  (2)  |    |  (n)  |  (n)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ .. +-+-+-+-+-+-+-+-+

    Figure 4. Full AC-3 RTP payload (multiple frames; with time code)

5.2 Fragmentation of AC-3 Frames

The size of an AC-3 frame depends on the sample rate of the audio and the data rate of the encoder, both of which are indicated in the Synchronization Information field of the AC-3 frame. The frame size for each combination of sample rate and data rate is specified in Table 5.18 ("Frame Size Code Table") of [2]. According to this table, AC-3 frames range in size from a minimum of 128 bytes to a maximum of 3840 bytes.

If the size of an AC-3 frame exceeds the MTU size, the frame SHOULD be fragmented. When an AC-3 frame is fragmented, it MAY be fragmented such that the first 5/8ths of the frame data is in the first fragment, to provide greater resilience to packet loss. This initial portion of a frame is guaranteed to contain the data necessary to decode the first two blocks of the frame. Other fragments can be decoded only once the complete frame has been received. The 5/8ths point of the frame is defined in Table 7.34 ("5/8_frame Size Table") of [2].
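The packetization rules of Sections 4, 5.1, and 5.2 can be combined into a sender-side sketch that turns one AC-3 frame into RTP payloads. Assumptions, none of which come from this document: `max_payload` is the largest RTP payload the path allows (MTU minus IP, UDP, and RTP headers); the caller supplies the 5/8-frame size in octets, looked up in Table 7.34 of [2] (not reproduced here); and the names `Packet` and `packetize_frame` are hypothetical. Fragments are cut greedily, so FT is 1 only when the first fragment happens to reach the 5/8 point. Timestamps are omitted because all fragments of a frame carry the same one, and aggregating several complete frames into one packet (FT of 0, NF greater than 1) is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

# Frame Type (FT) values from Section 5.1.1.
FT_COMPLETE, FT_INITIAL_WITH_5_8, FT_INITIAL_WITHOUT_5_8, FT_CONTINUATION = range(4)
# Data Unit Header size in octets -> Time Code Mode (Section 5.1.2).
TCM_BY_TC_LENGTH = {0: 0, 12: 1, 18: 2}


@dataclass
class Packet:
    marker: int     # RTP M bit
    payload: bytes  # Payload Header + Data Unit Header + frame data


def payload_header(tcm: int, ft: int, nf: int) -> bytes:
    # MBZ (4 bits, zero) | TCM (2 bits) | FT (2 bits) | NF (8 bits)
    return bytes([(tcm << 2) | ft, nf])


def packetize_frame(frame: bytes, max_payload: int, five_eighths: int,
                    time_code: Optional[bytes] = None) -> List[Packet]:
    """Split one AC-3 frame into RTP payloads of at most max_payload octets."""
    tc = time_code or b""
    if len(tc) not in TCM_BY_TC_LENGTH:
        raise ValueError("time code must be 12 or 18 octets")
    tcm = TCM_BY_TC_LENGTH[len(tc)]

    first_room = max_payload - 2 - len(tc)  # initial data unit carries the TC
    rest_room = max_payload - 2             # later fragments: empty DU header
    if first_room <= 0:
        raise ValueError("max_payload too small")

    if len(frame) <= first_room:
        # The whole frame fits: FT=0, NF=1, and M=1 (complete frame).
        return [Packet(1, payload_header(tcm, FT_COMPLETE, 1) + tc + frame)]

    chunks = [frame[:first_room]]
    chunks += [frame[i:i + rest_room]
               for i in range(first_room, len(frame), rest_room)]
    nf = len(chunks)
    if nf > 255:
        raise ValueError("too many fragments for the 8-bit NF field")

    ft = (FT_INITIAL_WITH_5_8 if len(chunks[0]) >= five_eighths
          else FT_INITIAL_WITHOUT_5_8)
    packets = [Packet(0, payload_header(tcm, ft, nf) + tc + chunks[0])]
    for i, chunk in enumerate(chunks[1:], start=1):
        # Non-initial fragments use FT=3 and TCM=0; M=1 only on the last one.
        header = payload_header(0, FT_CONTINUATION, nf)
        packets.append(Packet(int(i == nf - 1), header + chunk))
    return packets


if __name__ == "__main__":
    pkts = packetize_frame(bytes(1000), max_payload=400, five_eighths=625)
    print([(p.marker, p.payload[:2].hex(), len(p.payload)) for p in pkts])
```

For a 1000-octet frame and a 400-octet payload budget, this yields three packets with Payload Headers 0x0203, 0x0303, and 0x0303 and M bits 0, 0, and 1. The first fragment (398 octets) falls short of the assumed 625-octet 5/8 point, so it is marked FT of 2. A sender preferring FT of 1 would size the first fragment to cover the 5/8 point whenever the MTU permits.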
6. Types and Names

6.1 MIME Type Registration

MIME media type name: audio

MIME subtype name: vnd.dolby.dd-rtp

Required parameters:

   Rate: The RTP timestamp clock rate, which is equal to the audio sampling rate. Permitted rates are 32000, 44100, and 48000.

   BSID: A repetition of the Bit Stream Identification field in the AC-3 bit stream [2], which indicates the version number of the bit stream. An AC-3 decoder that supports a given version number can also decode bit streams with lower version numbers, but cannot decode bit streams with higher version numbers. In the AC-3 specification, BSID is a 5-bit unsigned integer, so the maximum allowed value is 31. A BSID of 8 corresponds to AC-3 as defined in [2].

Optional parameters:

   Channels: The number of channels present in the AC-3 stream. This MUST be a number between 1 and 6. The LFE (".1") channel MUST be counted as one channel.

   Ptime: The duration, in milliseconds, represented by the AC-3 frame(s) in the packet. For a fragmented frame, Ptime equals the frame's duration for the packet containing the first fragment, and 0 for all subsequent packets containing fragments of that frame.

   Maxptime: The maximum duration of media that can be encapsulated in each RTP packet, expressed as time in milliseconds.

Encoding considerations: This MIME subtype is defined for RTP transport only. The AC-3 bit stream MUST be generated according to the AC-3 specification [2]. The RTP packets MUST be packetized according to the RTP payload format defined in this document.

Security considerations: See Section 7 of this document.

Interoperability considerations: none

Published specification: This payload format specification and [2].
Applications: Multichannel compression of audio and audio for video.

Additional Information:

   Magic number(s): The first two octets of an AC-3 frame are always the synchronization word, which has the hex value 0x0B77.
   File extension(s): .ac3
   Macintosh File Type Code(s): none
   Object Identifier(s) or OID(s): none

Person & email address to contact for further information: Brian Link (bdl@dolby.com), IETF AVT working group.

Intended Usage: COMMON

Author/Change controller:
   Author: Brian Link
   Change Controller: IETF AVT WG

6.2 SDP Usage

The encoding name when using SDP [3] SHALL be "vnd.dolby.dd-rtp" (the MIME subtype). An example of the media representation in SDP is given below:

   m=audio 49111 RTP/AVP 100
   a=rtpmap:100 vnd.dolby.dd-rtp/48000/6
   a=fmtp:100 BSID=8

7. Security Considerations

In order to protect copyrighted material, certain security precautions may be necessary. The payload format described in this document is subject to the security considerations defined in [4]. The security considerations discussed in [4] imply the use of encryption to protect the confidentiality of content. Such encryption does not affect the encoded audio data, provided that the data is decrypted before it is passed to the decoder.

8. IANA Considerations

Registration of a new MIME subtype for AC-3 is requested (see Section 6).

9. Normative References

[1] Bradner, S., "Key Words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[2] U.S. Advanced Television Systems Committee (ATSC), "ATSC Standard: Digital Audio Compression (AC-3), Revision A", Doc. A/52A, August 2001.

[3] Handley, M.
and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[4] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[5] Handley, M. and C. Perkins, "Guidelines for Writers of RTP Payload Format Specifications", RFC 2736, December 1999.

[6] Society of Motion Picture and Television Engineers (SMPTE), "SMPTE Standard for Television - Format for Non-PCM Audio and Data in AES3 - Generic Data Types", SMPTE 339M-2000, April 2000.

10. Informative References

[7] Todd, C., et al., "AC-3: Flexible Perceptual Coding for Audio Transmission and Storage", Preprint 3796, presented at the 96th Convention of the Audio Engineering Society, May 1994.

[8] Fielder, L., et al., "AC-2 and AC-3: Low-Complexity Transform-Based Audio Coding", Collected Papers on Digital Audio Bit-Rate Reduction, pp. 54-72, Audio Engineering Society, September 1996.

Authors' Addresses

   Brian Link
   Dolby Laboratories
   100 Potrero Ave
   San Francisco, CA 94103
   Phone: +1 415 558 0200
   Email: bdl@dolby.com

   Todd Hager
   Dolby Laboratories
   100 Potrero Ave
   San Francisco, CA 94103
   Phone: +1 415 558 0136
   Email: thh@dolby.com

   Jason Flaks
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052
   Phone: +1 425 722 2543
   Email: jasonfl@microsoft.com

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.