Network Working Group                                         A. Sollaud
Internet-Draft                                            France Telecom
Expires: July 22, 2006                                  January 18, 2006


  RTP payload format for the future scalable and wideband extension of
                           G.729 audio codec
                 draft-ietf-avt-rtp-g729-scal-wb-ext-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 22, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format to be used for the future scalable and wideband extension of
   the International Telecommunication Union (ITU-T) G.729 audio codec.
   A media type registration is included for this payload format.

Sollaud                  Expires July 22, 2006                  [Page 1]

Internet-Draft       RTP payload format for G.729EV         January 2006

Table of Contents

   1.  Introduction
   2.  Background
   3.  RTP header usage
   4.  Payload format
     4.1.  Payload structure
     4.2.  Payload Header: MBS field
     4.3.  Payload Header: FT field
     4.4.  Audio data
   5.  Payload format parameters
     5.1.  Media type registration
     5.2.  Mapping to SDP parameters
     5.3.  Offer-answer model considerations
   6.  Security considerations
   7.  IANA considerations
   8.  References
     8.1.  Normative references
     8.2.  Informative references
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The International Telecommunication Union (ITU-T) is working on a
   scalable and wideband extension of its recommendation G.729 [6].
   This future audio codec is called G.729EV in the rest of this
   document.  This document specifies the payload format for
   packetization of G.729EV encoded audio signals into the real-time
   transport protocol (RTP).

   The payload format itself and the handling of variable bit rate are
   described in Section 4.  A media type registration and the details
   for the use of G.729EV with SDP are given in Section 5.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

2.  Background

   G.729EV is mainly designed to be used as a speech codec, but it can
   be used for music at the highest bit rates.  The sampling frequency
   is 16000 Hz and the frame size is 20 ms.
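   As a quick arithmetic check on these figures, the following Python
   sketch derives the number of samples in one frame from the sampling
   frequency and the frame duration, and the size in octets of one
   frame at a given codec bit rate.  The function and constant names are
   illustrative only and are not defined by this document.

```python
# Figures taken from the text: 16000 Hz sampling, 20 ms frames.
SAMPLING_RATE_HZ = 16000
FRAME_DURATION_MS = 20


def samples_per_frame(rate_hz: int = SAMPLING_RATE_HZ,
                      frame_ms: int = FRAME_DURATION_MS) -> int:
    """Number of audio samples covered by one frame."""
    return rate_hz * frame_ms // 1000


def frame_octets(bit_rate_kbps: int,
                 frame_ms: int = FRAME_DURATION_MS) -> int:
    """Size of one frame in octets: kbps * ms gives bits, then / 8."""
    return bit_rate_kbps * frame_ms // 8


if __name__ == "__main__":
    print(samples_per_frame())   # samples in one 20 ms frame
    print(frame_octets(8))       # octets per frame at the 8 kbps rate
```

   With the values above, one frame covers 320 samples, and a frame at
   8 kbps occupies 20 octets.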

   This G.729-based codec produces an embedded bitstream providing an
   improved narrowband quality [300, 3400 Hz] at 12 kbps, and an
   enhanced and gracefully improving wideband quality [50, 7000 Hz] from
   14 kbps to 32 kbps, in steps of 2 kbps.  At 8 kbps it generates a
   G.729 bitstream.

   It has been mainly designed for packetized wideband voice
   applications (Voice over IP or ATM, Telephony over IP, private
   networks...) and particularly for those requiring scalable bandwidth,
   enhanced quality above G.729, and easy integration into existing
   infrastructures.

   G.729EV is also designed to cope with other services like high
   quality audio/video conferencing, archival, messaging, etc.

   For all those applications, the scalability feature allows the bit
   rate versus quality trade-off to be tuned, possibly dynamically
   during a session, taking into account service requirements and
   network transport constraints.

   G.729EV produces frames that are said to be embedded because they are
   composed of embedded layers.  The first layer is called the core
   layer and is bitstream compatible with the ITU-T G.729 with annex B
   coder.  Upper layers are added as the bit rate increases, to improve
   quality and enlarge the audio bandwidth from narrowband to wideband.
   As a result, a received frame can be decoded at its original bit rate
   or at any lower bit rate corresponding to the lower layers it embeds.
   Only the core layer is mandatory to decode understandable speech;
   upper layers provide quality enhancement and wideband enlargement.

   Audio codecs often support voice activity detection (VAD) and comfort
   noise generation (CNG).  During silence periods, the coder may
   significantly decrease the transmitted bit rate by sending only
   comfort noise parameters in special small frames called silence
   insertion descriptors (SID).  The receiver's decoder will generate
   comfort noise according to the SID information.  This operation of
   sending low bit rate comfort noise parameters during silence periods
   is usually called discontinuous transmission (DTX).

   G.729EV will be first released without support for DTX.  However,
   this functionality is planned and will be defined later in a separate
   annex.  This specification therefore provides DTX signalling, even
   though the size of a SID frame is not yet standardized.

3.  RTP header usage

   The format of the RTP header is specified in RFC 3550 [2].  This
   payload format uses the fields of the header in a manner consistent
   with that specification.

   The RTP timestamp clock frequency is the same as the sampling
   frequency, that is 16 kHz.  The timestamp unit is therefore one
   sample.

   The duration of one frame is 20 ms, corresponding to 320 samples per
   frame.  Thus the timestamp is increased by 320 for each consecutive
   frame.

   The M bit should be set as specified in the applicable RTP profile,
   for example, RFC 3551 [3].

   The assignment of an RTP payload type for this packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile under which this payload format
   is being used will assign a payload type for this codec or specify
   that the payload type is to be bound dynamically (see Section 5.2).

4.  Payload format

4.1.  Payload structure

   The complete payload consists of a payload header of 1 octet,
   followed by audio data representing one or more consecutive frames at
   the same bit rate.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  MBS  |  FT   |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   :            one or more frames at the same bit rate            :
   :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2.  Payload Header: MBS field

   MBS (4 bits): maximum bit rate supported.
   Indicates a maximum bit rate to the encoder at the site of the
   receiver of this payload.  The value of the MBS field is set
   according to the following table:

                     +-------+--------------+
                     |  MBS  | max bit rate |
                     +-------+--------------+
                     |   0   |    8 kbps    |
                     |   1   |   12 kbps    |
                     |   2   |   14 kbps    |
                     |   3   |   16 kbps    |
                     |   4   |   18 kbps    |
                     |   5   |   20 kbps    |
                     |   6   |   22 kbps    |
                     |   7   |   24 kbps    |
                     |   8   |   26 kbps    |
                     |   9   |   28 kbps    |
                     |  10   |   30 kbps    |
                     |  11   |   32 kbps    |
                     | 12-14 |  (reserved)  |
                     |  15   |    NO_MBS    |
                     +-------+--------------+

   The MBS is used to tell the other party the maximum bit rate one can
   receive.  The encoder MUST follow the received MBS.  It MUST NOT send
   frames at a bit rate higher than the received MBS.  Note that, thanks
   to the embedded property of the coding scheme, it can send frames at
   the MBS rate or at any lower rate.  As long as it does not exceed the
   MBS, it can change its bit rate at any time without prior notice.

   The MBS received is valid until the next MBS is received, i.e. a
   newly received MBS value overrides the previous one.

   If a payload with an invalid MBS value is received, the MBS MUST be
   ignored.

   Note that the MBS is a codec bit rate; the actual network bit rate is
   higher and depends on the overhead of the underlying protocols.

   The MBS field MUST be set to 15 for packets sent to a multicast
   group.

   The MBS field MUST be set to 15 in all packets when the actual MBS
   value is sent through non-RTP means.  Such means are outside the
   scope of this specification.

4.3.  Payload Header: FT field

   FT (4 bits): Frame type of the frame(s) in this packet, as per the
   following table:

              +-------+---------------+------------+
              |  FT   | encoding rate | frame size |
              +-------+---------------+------------+
              |   0   |     8 kbps    | 20 octets  |
              |   1   |    12 kbps    | 30 octets  |
              |   2   |    14 kbps    | 35 octets  |
              |   3   |    16 kbps    | 40 octets  |
              |   4   |    18 kbps    | 45 octets  |
              |   5   |    20 kbps    | 50 octets  |
              |   6   |    22 kbps    | 55 octets  |
              |   7   |    24 kbps    | 60 octets  |
              |   8   |    26 kbps    | 65 octets  |
              |   9   |    28 kbps    | 70 octets  |
              |  10   |    30 kbps    | 75 octets  |
              |  11   |    32 kbps    | 80 octets  |
              | 12-14 |   (reserved)  |            |
              |  15   |    NO_DATA    |  0 octets  |
              +-------+---------------+------------+

   The FT value 15 (NO_DATA) indicates that there is no audio data in
   the payload.  This MAY be used to update the MBS value when there is
   no audio frame to transmit.  The payload is then reduced to the
   payload header.

   If a payload with an invalid FT value is received, the whole payload
   MUST be ignored.

4.4.  Audio data

   The audio data of a payload contains one or more consecutive audio
   frames at the same bit rate.  The audio frames are packed in order of
   time, that is, oldest first.

   The actual number of frames is easy to infer from the size of the
   audio data part:

      nb_frames = (size_of_audio_data) / (size_of_one_frame)

   This is compatible with DTX, with the restriction that the SID frame
   MUST be at the end of the payload (this is consistent with the
   payload format of G.729 described in section 4.5.6 of RFC 3551 [3]).
   Since the SID frame is much smaller than any other frame, it will not
   hinder the calculation of the number of frames at the receiver side
   and can be easily detected.  The presence of a SID frame is inferred
   from the result of the above division not being an integer.

   Note that if FT=15, there will be no audio frame in the payload.

5.  Payload format parameters

   This section defines the parameters that may be used to configure
   optional features in the G.729EV RTP transmission.

   The parameters are defined here as part of the media subtype
   registration for the G.729EV codec.  A mapping of the parameters into
   the Session Description Protocol (SDP) [4] is also provided for those
   applications that use SDP.  In control protocols that do not use MIME
   or SDP, the media type parameters must be mapped to the appropriate
   format used with that control protocol.

5.1.  Media type registration

   This registration is done using the template defined in RFC 4288 [7]
   and following RFC 3555 [8].

   Type name: audio

   Subtype name: G729EV

   Required parameters: none

   Optional parameters:

      dtx: indicates that discontinuous transmission (DTX) is used or
         preferred.  DTX means voice activity detection and non
         transmission of silent frames.  Permissible values are 0 and 1.
         0 means no DTX.  0 is implied if this parameter is omitted.
         The first version of G.729EV will not support DTX.

      maxbitrate: the absolute maximum codec bit rate for the session.
         Permissible values are between 0 and 11 (see table in
         Section 4.2 of RFC XXXX).  11 is implied if this parameter is
         omitted.  The maxbitrate restricts the range of bit rates which
         can be used.  The frame bit rate (FT) and the MBS MUST NOT
         exceed this value.

      mbs: the initial value of MBS, that is the current maximum codec
         bit rate supported as a receiver.  Permissible values are
         between 0 and maxbitrate (see table in Section 4.2 of
         RFC XXXX).  The maximum MBS value is implied if this parameter
         is omitted.  Note that this parameter will be dynamically
         updated by the MBS field of the RTP packets sent; it is not an
         absolute value for the session.  The goal is to announce this
         value before any packet is sent, so that the remote sender does
         not exceed the MBS at the beginning of the session.

      ptime: the recommended length of time in milliseconds represented
         by the media in a packet.  See RFC 2327 [4].

      maxptime: the maximum length of time in milliseconds which can be
         encapsulated in a packet.

   Encoding considerations: This media type is framed and contains
      binary data.

   Security considerations: See Section 6 of RFC XXXX

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications which use this media type: Audio and video conferencing
      tools.

   Additional information: none

   Person & email address to contact for further information: Aurelien
      Sollaud, aurelien.sollaud@francetelecom.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing, and
      hence is only defined for transfer via RTP [2].

   Author/Change controller: IETF Audio/Video Transport working group
      delegated from the IESG

5.2.  Mapping to SDP parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [4], which is commonly used to describe RTP sessions.  When SDP is
   used to specify sessions employing the G.729EV codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("G729EV") goes in SDP "a=rtpmap" as the
      encoding name.  The RTP clock rate in "a=rtpmap" MUST be 16000 for
      G.729EV.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the media type string as a semicolon
      separated list of parameter=value pairs.

   Some example SDP session descriptions utilizing G.729EV encodings
   follow.

   Example 1: default parameters

      m=audio 53146 RTP/AVP 98
      a=rtpmap:98 G729EV/16000

   Example 2: recommended packet duration of 40 ms (=2 frames), DTX off,
   and initial MBS set to 26 kbps

      m=audio 51258 RTP/AVP 99
      a=rtpmap:99 G729EV/16000
      a=fmtp:99 dtx=0; mbs=8
      a=ptime:40

5.3.  Offer-answer model considerations

   The following considerations apply when using SDP offer-answer
   procedures to negotiate the use of G.729EV payload in RTP:

   o  Since G.729EV is an extension of G.729, the offerer SHOULD
      announce G.729 support in its "m=audio" line, with G.729EV
      preferred.  This will allow interoperability with both G.729EV and
      G.729-only capable parties.

      Below is an example of such an offer:

         m=audio 55954 RTP/AVP 98 18
         a=rtpmap:98 G729EV/16000
         a=rtpmap:18 G729/8000

      If the answerer supports G.729EV, it will keep the payload type 98
      in its answer and the conversation will be done using G.729EV.
      Otherwise, if the answerer supports only G.729, it will leave only
      the payload type 18 in its answer and the conversation will be
      done using G.729 (the payload format for G.729 is defined in
      RFC 3551 [3]).

   o  The "dtx" parameter concerns both sending and receiving, so both
      sides of a bi-directional session MUST use the same "dtx" value.
      If one party indicates it does not support DTX, DTX must be
      deactivated both ways.

   o  The "maxbitrate" parameter is bi-directional.  If the offerer sets
      a maxbitrate value, the answerer MUST reply with a smaller or
      equal value.  The actual maximum bit rate for the session will be
      the minimum of the two values.

   o  The "mbs" parameter is not symmetric.  Values in the offer and the
      answer are independent and take into account local constraints.
      However, one party MUST NOT start sending frames at a bit rate
      higher than the "mbs" of the other party.

   o  The parameters "ptime" and "maxptime" will in most cases not
      affect interoperability.  The SDP offer-answer handling of the
      "ptime" parameter is described in RFC 3264 [5].  The "maxptime"
      parameter MUST be handled in the same way.

6.  Security considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in the
   RTP specification [2] and any appropriate profile (for example,
   RFC 3551 [3]).

   As this format transports encoded speech/audio, the main security
   issues include confidentiality and authentication of the speech/audio
   itself.  The payload format itself does not have any built-in
   security mechanisms.  Confidentiality of the media streams is
   achieved by encryption; external mechanisms, such as SRTP [9], MAY
   therefore be used for that purpose.

   This payload format and the G.729EV encoding do not exhibit any
   significant non-uniformity in the receiver-end computational load and
   thus are unlikely to pose a denial-of-service threat due to the
   receipt of pathological datagrams.

7.  IANA considerations

   It is requested that one new media subtype (audio/G729EV) be
   registered by IANA, see Section 5.1.

8.  References

8.1.  Normative references

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.

   [3]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [4]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [5]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        Session Description Protocol (SDP)", RFC 3264, June 2002.

8.2.  Informative references

   [6]  International Telecommunications Union, "Coding of speech at
        8 kbit/s using conjugate-structure algebraic-code-excited
        linear-prediction (CS-ACELP)", ITU-T Recommendation G.729,
        March 1996.

   [7]  Freed, N. and J. Klensin, "Media Type Specifications and
        Registration Procedures", BCP 13, RFC 4288, December 2005.

   [8]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
        Payload Formats", RFC 3555, July 2003.

   [9]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.

Author's Address

   Aurelien Sollaud
   France Telecom
   2 avenue Pierre Marzin
   Lannion Cedex  22307
   France

   Phone: +33 2 96 05 15 06
   Email: aurelien.sollaud@francetelecom.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
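
   To illustrate the payload format of Section 4, the following Python
   sketch splits a payload into its MBS and FT fields and counts the
   audio frames as described in Section 4.4.  It is not part of the
   specification.  The function and type names are hypothetical; bit 0
   of the diagram in Section 4.1 is assumed to be the most significant
   bit of the header octet, as is usual for such diagrams; and any
   trailing octets shorter than one frame are simply reported as a
   possible SID frame, since the SID frame size is not yet standardized.

```python
from typing import NamedTuple, Optional

# MBS / FT index -> codec bit rate in kbps (tables in Sections 4.2, 4.3).
RATE_KBPS = {0: 8, 1: 12, 2: 14, 3: 16, 4: 18, 5: 20,
             6: 22, 7: 24, 8: 26, 9: 28, 10: 30, 11: 32}
NO_DATA = 15    # FT value: no audio data in the payload
FRAME_MS = 20   # frame duration


class ParsedPayload(NamedTuple):
    mbs_kbps: Optional[int]  # None for NO_MBS or an invalid (ignored) MBS
    ft: int                  # frame type as received
    nb_frames: int           # number of complete audio frames
    sid_octets: int          # leftover octets, read as a trailing SID frame


def frame_size(ft: int) -> int:
    """Frame size in octets for a valid FT: kbps * 20 ms / 8 bits."""
    return RATE_KBPS[ft] * FRAME_MS // 8


def parse_payload(payload: bytes) -> Optional[ParsedPayload]:
    """Parse one RTP payload; return None if it must be ignored."""
    if not payload:
        return None
    header = payload[0]
    mbs = header >> 4      # assumed: MBS is the most significant nibble
    ft = header & 0x0F
    # Reserved MBS values (12-14) are ignored; NO_MBS (15) carries no rate.
    mbs_kbps = RATE_KBPS.get(mbs)
    if ft == NO_DATA:
        return ParsedPayload(mbs_kbps, ft, 0, 0)
    if ft not in RATE_KBPS:
        return None        # invalid FT: the whole payload is ignored
    audio = payload[1:]
    # A non-zero remainder means the division is not an integer,
    # which Section 4.4 interprets as a SID frame at the end.
    nb_frames, remainder = divmod(len(audio), frame_size(ft))
    return ParsedPayload(mbs_kbps, ft, nb_frames, remainder)
```

   For instance, a payload whose header octet is 0x83 followed by 80
   octets of audio data would be read as MBS = 8 (26 kbps) and FT = 3
   (16 kbps, 40 octets per frame), that is, two frames and no SID.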