Network Working Group                                         M. Ramalho
Internet-Draft                                       Cisco Systems, Inc.
Expires: December 30, 2003                                     July 2003


                    RTP Payload Format for RGL Codec
                   draft-ramalho-rgl-rtpformat-02.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 30, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This document describes the RTP payload format and the storage mode
   format for the RGL Codec (Version 1.0.0) described in
   draft-ramalho-rgl-desc-01.txt [4] and documented fully at
   www.vovida.org [12].  The necessary details for the use of the RGL
   codec with SDP are included in this document.

Ramalho                Expires December 30, 2003                [Page 1]

Internet-Draft      RTP Payload Format for RGL Codec           July 2003

Table of Contents

   1.   Introduction
   2.   Conventions
   3.   RGL Frame Specifics Necessary for the Understanding of the
        Proposed RTP Format
   4.   RTP Payload Format for RGL Codec
   4.1  Type One: One RGL frame in RTP payload
   4.2  Type Two: A Multiplicity of Fully Specified RGL Frames in RTP
        payload
   4.3  Decoding Type One and Type Two RGL Payload Formats
   5.   RGL Storage Mode
   6.   IANA Considerations
   6.1  MIME Registration for RGLU
   6.2  MIME Registration for RGLA
   7.   SDP Issues
   8.   Security Considerations
        Normative References
        Author's Address
        Intellectual Property and Copyright Statements

1.  Introduction

   The RGL (short for Ramalho G.711 Lossless) Codec obtains lossless
   compression of speech/audio packet payloads encoded with ITU-T
   Recommendation G.711 PCM (mu-law or A-law) with trivial complexity
   and virtually no delay.  The RGL Codec (Version 1.0.0) is described
   in draft-ramalho-rgl-desc-01.txt [4] and documented fully at
   www.vovida.org [13].  The RGL codec is freeware subject to the
   Vovida.org licensing terms found at www.vovida.org [14].

   The RGL codec performs lossless compression of G.711 encoded frames
   of arbitrary frame length.  However, as described in the RGL Codec
   whitepaper at www.vovida.org [15], the recommended frame size for
   optimal RGL compression gains during (8 kHz sampled) speech segments
   is less than 30 milliseconds.  For example, ten millisecond frames
   are near optimum and correspond to exactly 80 samples of 8 kHz
   sampled (G.711) speech/audio input.

   The RTP payload format described herein provides for two types of
   packetizations: one packetization consists of one RGL frame per RTP
   packet and the other accommodates multiple RGL frames per RTP packet.
   The single RGL frame packetization accommodates RGL frame sizes (in
   octets) up to the packet MTU.  The multiple RGL frame packetization
   accommodates RGL frames containing up to 250 compressed G.711
   samples (31.25 milliseconds of 8 kHz sampled speech/audio).  As the
   interfaces at the RGL end systems are often PSTN/GSTN networks, note
   that a range of up to 250 (G.711) samples includes convenient frame
   sizes for optimal transcoding into payloads of virtually all
   PSTN/GSTN transport systems (e.g., ATM VoAAL2 frame of 44 bytes/5.5
   milliseconds or ATM VoAAL5 frame of 48 bytes/6.0 milliseconds).

   A "storage mode" format used for storing RGL frames in a file format
   for "playback" applications such as IVR prompts or voicemail
   applications is also described.  This storage mode format has the
   capability to store very large RGL frames (up to approximately 8
   second frames) and "erasure" frames.

   Note that although RGL is referred to as a "codec", it is really a
   LOSSLESS data compression algorithm.  That is, no signal processing
   induced degradations occur with this codec; exactly the same G.711
   output frame is produced by the decoder that was provided to the
   encoder.  RGL has been tuned for extremely low computational
   complexity and optimized for typical auditory-based G.711 input
   distributions.  Although RGL was optimized for typical G.711 input
   distributions, it is lossless for ANY G.711 input frame.

   The words "byte" and "octet" are used interchangeably in this
   document to denote eight bit words.  In conformance with the
   Internet Protocol, all fields are carried in network byte order,
   that is, most significant byte (octet) first.  Within a byte, the
   most significant bit is transmitted first.  This byte order is
   commonly known as big endian.  In this specification, bytes and bits
   shown on the left are more significant.
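   The sample counts and durations quoted in this section all follow
   from one G.711 byte per sample at an 8 kHz sampling rate.  A minimal
   sketch of that arithmetic is given here; the function names are
   illustrative only and are not part of any RGL specification.

```python
SAMPLE_RATE_HZ = 8000  # 8 kHz G.711, the rate assumed in this section


def samples_to_ms(num_samples, rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of num_samples G.711 samples."""
    return num_samples * 1000 / rate_hz


def ms_to_samples(duration_ms, rate_hz=SAMPLE_RATE_HZ):
    """Number of G.711 samples (one byte each) in duration_ms."""
    return round(duration_ms * rate_hz / 1000)


# Figures quoted in the text above.
ten_ms_frame = ms_to_samples(10)       # ten millisecond frame
max_type_two = samples_to_ms(250)      # largest multi-frame RGL frame
voaal2 = samples_to_ms(44)             # 44-byte ATM VoAAL2 payload
voaal5 = samples_to_ms(48)             # 48-byte ATM VoAAL5 payload
```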
2.  Conventions

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
   they appear in this document, are to be interpreted as described in
   RFC2119 [2].

3.  RGL Frame Specifics Necessary for the Understanding of the Proposed
    RTP Format

   This section outlines the RGL codec framing necessary for
   understanding the proposed RGL RTP frame format.

   The RGL codec compresses a frame of Y G.711 bytes into a compressed
   RGL frame that can be of length 1 to (Y+1) bytes.  There are two
   implications of this compression relative to a native G.711 RTP
   payload: 1) one cannot a priori determine the length of a received
   RGL frame in bytes, and 2) on a relatively infrequent basis, the RGL
   frame is one byte *longer* than the corresponding native G.711
   frame.

   The first implication means that if more than one RGL frame is
   packetized in one RTP payload, information must be placed in the RTP
   payload (specifically a Table of Contents immediately after the RTP
   header) to demarcate the RGL frame boundaries.  Note that this is
   not necessary for native G.711 RTP encodings, as Y bytes of G.711
   RTP payload is equal to exactly Y G.711 samples.

   The second implication is that the RGL encoding occasionally expands
   the RGL frame relative to G.711.  When expansion occurs, it has been
   explicitly limited to one byte in the design of the RGL codec.

   A consideration for the design of a payload format for the RGL codec
   is that the payload format should easily accommodate single-RGL
   frame per RTP packet and multiple-RGL frame per RTP packet cases on
   the fly, packet by packet.  The reason for this desire is the nature
   of the RGL codec compression.  For example, assume an SDP specified
   environment where SDP parameter ptime is 20 milliseconds.
   Consider the case where silence is present for the first 11
   milliseconds of a 20 millisecond encoding interval.  A RGL encoder
   may choose, as an optimization, to create two 10 millisecond RGL
   frames rather than one 20 millisecond RGL frame, as a greater
   overall compression may result from the former.  Such RGL coding
   optimization is anticipated when RGL is used in recording or
   "storage mode" environments (silence and other low energy segments
   are compressed more effectively by such heuristics).  Storage Mode
   is discussed later in Section 5.

   One way to differentiate between single and multiple RGL frame
   payload formats in a typical SDP-specified VoIP environment would be
   to use different SDP RTP payload types for each mode (e.g., EVRC
   payload format in RFC 3558 [5]).  However, the RGL coding
   methodology is expected to be used in non-SDP specified and other
   non-IP packet transport environments.  Therefore a "self-describing"
   mechanism to differentiate between single and multiple RGL frame
   payload formats was desired and is explained herein.  This mechanism
   employs "RGL reserved codes" in the first byte of the RTP payload.
   For the multiple RGL frame payload format case, a Table of Contents
   (TOC) is placed at the beginning of the RTP payload.

   To accommodate a TOC for the multiple RGL frame per RTP payload
   case, the RGL codec was revised to Version 1 in early 2003 to create
   seven "reserved first RGL byte" codes.  This was done by deleting an
   "anchor codepoint" that was never used for real world signals (see
   draft-ramalho-rgl-desc-01.txt [4] or documents at www.vovida.org
   [16] for details).  The end result is that a (version 1.0.0 or
   higher) RGL encoder will never produce a first byte of the form
   {XXX11110} where {XXX} is not equal to {000}.  In other words, the
   first byte in an RGL encoded frame will never be 0x3E, 0x5E, 0x7E,
   0x9E, 0xBE, 0xDE or 0xFE.
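   The {XXX11110} bit pattern can be tested with two masks.  The sketch
   below is illustrative only: the function names are hypothetical, and
   the assignment of 0xFE to the multiple-frame (Type Two) format is
   taken from Section 4.2 of this document.  The three-way outcome
   mirrors the decoding steps of Section 4.3.

```python
RESERVED_CODES = frozenset({0x3E, 0x5E, 0x7E, 0x9E, 0xBE, 0xDE, 0xFE})


def is_reserved_code(first_byte):
    """True if first_byte has the form XXX11110 with XXX != 000."""
    low_five_match = (first_byte & 0x1F) == 0x1E   # low bits are 11110
    top_three_nonzero = (first_byte & 0xE0) != 0   # XXX is not 000
    return low_five_match and top_three_nonzero


def classify_payload(payload):
    """Classify an RTP payload by inspecting only its first byte."""
    if not payload:
        raise ValueError("empty RTP payload")
    first = payload[0]
    if not is_reserved_code(first):
        return "type_one"    # a single RGL frame, no TOC
    if first == 0xFE:
        return "type_two"    # TOC followed by one or more RGL frames
    return "reserved"        # undefined packetization; drop the payload
```

   Note that 0x1E (XXX = 000) is not reserved, so a payload starting
   with it is classified as a single RGL frame.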
   This modification was accomplished such that a Version 1.0.0 (or
   higher) RGL encoder is backwards compatible with previous version
   RGL decoders.

   Knowing that these "reserved codes" can never be produced as the
   first byte of an RGL frame, we can use these codes to create a TOC
   for the multiple RGL frame payload case to be described below.  If
   the first byte in the RTP payload is not one of these seven
   "reserved codes", then the RGL payload consists of only one RGL
   frame (Type One (Section 4.1) packetization below).  For the
   multiple RGL frame payload case, one of the "reserved codes" will be
   the first byte of the RTP payload (Type Two (Section 4.2)
   packetization below).

4.  RTP Payload Format for RGL Codec

   The RTP payload format for RGL codec conforms to the Real-Time
   Transport Protocol (RFC1889 [7]) in every aspect.  An RTP packet for
   the RGL codec looks like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   \            contributing source (CSRC) identifiers             \
   /                     (zero up to fifteen)                      /
   \                             .....                             \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \               Zero or One RTP Header Extension                \
   /                      (only if X bit =1)                       /
   \                             .....                             \
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   \                                                               \
   /  RTP PAYLOAD (TOC, if required, and one or more RGL frames)   /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Figure 1

   The first twelve octets are present in every RTP packet as per RFC
   1889, while the list of CSRC identifiers is present only when
   inserted by a mixer.
   This profile follows the RTP profile recommendations in RFC 1890
   [8].  There are two bits in the mandatory twelve octet header that
   are defined by this profile, the extension (X) bit and the marker
   (M) bit.

   The payload specification used by this profile does not make use of
   the RTP header extension field to specify the RGL frame(s) in the
   RTP payload.  Thus no payload extensions are defined herein and the
   usual setting of the X bit is zero.  However, note that as per RFC
   1890 [8], RTP applications SHOULD NOT assume that the RTP header X
   bit is always zero and should be prepared to ignore the header
   extension.

   The RGL codec obtains high compression during periods of silence and
   low-noise.  For this reason, the use of VAD/SS (voice activity
   detection/silence suppression) is NOT RECOMMENDED.  Therefore the
   value of the M bit MUST be set to zero if the RGL application does
   not use VAD/SS.  If VAD/SS is employed despite the recommendation
   against using it, then the value of the M bit MUST be set consistent
   with RFC 1890 [8] (i.e., set to one to mark the beginning of a
   talkspurt).

   The following two subsections describe the different types of RGL
   payloads: one RGL frame per RTP packet payload and a multiplicity of
   RGL frames per RTP packet payload.  Following those two subsections,
   the "storage mode" format is described, which may be used outside of
   RTP environments (i.e., sent reliably via TCP as a file).

4.1  Type One: One RGL frame in RTP payload

   Exactly one RGL frame is placed in the RTP payload.  For this common
   case, the RGL frame is simply placed in the RTP payload by the RTP
   payload format encoder.  Note from the text in the previous RGL
   detail section that, whatever the first byte of the RGL frame is, it
   will not be one of the "RGL reserved codes" (0x3E, 0x5E, 0x7E, 0x9E,
   0xBE, 0xDE and 0xFE).
   The multiple RGL frame per RTP packetization will require a TOC and
   the first byte of the TOC will always be one of the reserved codes.
   Thus, this packetization case is determined by the RTP decoder by
   noting that the first byte is not one of these reserved codes.

   The number of samples contained in this payload is specified by the
   SDP [9] "ptime" parameter and is exactly (ptime*8) samples.  If
   ptime isn't explicitly signaled in SDP, the default ptime for the
   RGL codec is used (which is defined later in the SDP section of this
   document to be 20 milliseconds).  As described in
   draft-ramalho-rgl-desc-01.txt [4], the RGL payload for a ptime=10
   millisecond payload (this would map to 10*8 = 80 G.711 samples) can
   be from one to 81 bytes long.  If one desires to explicitly note the
   number of samples in a single RGL frame payload without using the
   SDP "ptime" parameter, RGL packetization "Type Two" (Section 4.2)
   MUST be used.

   The entire RTP payload is passed to the RGL decoder at the far-end
   by the RTP payload format decoder.

   The RTP encoder MAY choose to align the RTP payload to 32-bit word
   boundaries, although it is more bandwidth efficient not to do so.
   RGL frames are padded with zero bits in the RGL encoder to be an
   integer number of bytes long, but are not 32-bit word aligned.  If
   the RTP encoder does align to 32-bit word boundaries, this is not a
   problem for the RGL decoder.  That is, the RGL decoder calculates
   the end of the RGL frame by knowing the number of bits per sample
   (this is encoded in the first byte of the RGL frame) and the number
   of samples represented in the RGL frame (specified here via
   "ptime").  Thus, if extra bytes after the end of any RGL frame are
   passed to the far-end RGL decoder they are inconsequential to
   decoding the RGL frame (they are not used).
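   The Type One sample count and the range of possible RGL frame
   lengths can be sketched as follows.  This is a minimal illustration
   assuming 8 kHz sampling (so that samples = ptime*8); the function
   names are hypothetical, and the 20 millisecond default is the value
   this document attributes to its SDP section.

```python
DEFAULT_PTIME_MS = 20  # default ptime for the RGL codec per this document


def type_one_num_samples(ptime_ms=None):
    """G.711 samples represented by a Type One payload (ptime * 8)."""
    if ptime_ms is None:
        ptime_ms = DEFAULT_PTIME_MS   # ptime not signaled in SDP
    return ptime_ms * 8


def type_one_frame_length_bounds(ptime_ms=None):
    """(min, max) RGL frame length in bytes, before any optional padding.

    A frame of Y G.711 bytes compresses to between 1 and Y+1 bytes.
    """
    y = type_one_num_samples(ptime_ms)
    return 1, y + 1
```

   For ptime=10 this yields 80 samples and a frame of one to 81 bytes,
   matching the example in the text.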
   It is sometimes useful or desired in some RTP relay applications (or
   other re-transmission contexts) to "relay" the fact that an expected
   RTP payload was not received (e.g., received in error and dropped or
   not received at all).  In these environments, the sending of a "null
   payload frame" to the far-end may be required (e.g., some TDM system
   interworking systems in which the "information payload" does not
   contain timestamp information).  These "null payload frames" are
   usually called "erasure frames".  Type Two packetization of the
   following section MUST be used for the sending of "erasure frames"
   in the environments that require them.

4.2  Type Two: A Multiplicity of Fully Specified RGL Frames in RTP
     payload

   The Table of Contents (TOC) for this case is the "reserved code"
   0xFE followed by a number of parameters which is dependent on the
   number of RGL frames in the RTP packet, followed by the RGL frames
   themselves.  The parameter "Num_Frames" is set to the number of RGL
   frames that are in the RTP payload.  The following illustration is
   an example assuming four RGL frames.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 1 1 1 1 1 1 0|  Num_Frames   |  RGL_Size_1   |  Num_Samps_1  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  RGL_Size_2   |  Num_Samps_2  |  RGL_Size_3   |  Num_Samps_3  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  RGL_Size_4   |  Num_Samps_4  |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   \      Remainder of RTP PAYLOAD: In this example, exactly       \
   /      four RGL frames.  The second RGL frame begins            /
   \      [RGL_Size_1+1] bytes after the last TOC byte.            \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Figure 2

   Field        Length      Meaning
   --------------------------------------------------------------------
   Num_Frames   8 bits      The number of RGL frames in the RTP
                (unsigned,  payload.  Although this parameter has a
                uchar8)     range from 0 to 255, the number of RGL
                            frames in the RTP payload is limited to the
                            integer number of RGL frames that can be
                            placed in the packet MTU.  In practice,
                            this should limit the number of RGL frames
                            to well below 255.  A minimum of one RGL
                            frame MUST be present; zero is not a valid
                            value.  The parameters which reference
                            frame index "j" range from one to
                            Num_Frames, inclusive.

   RGL_Size_1   8 bits      The first RGL frame begins immediately
                (unsigned,  after the last TOC parameter.  RGL_Size_1
                uchar8)     is set to the size of the first RGL frame
                            in bytes.  The valid range for this
                            parameter is from 0 to 251.  The zero value
                            is used for an "erasure frame"; values
                            above 251 are reserved.

   RGL_Size_2   8 bits      The beginning of the second RGL frame, if
                (unsigned,  present, is (RGL_Size_1+1) bytes after the
                uchar8)     last TOC parameter.  RGL_Size_2 is set to
                            the size of the second RGL frame in bytes.
                            The valid range for this parameter is from
                            0 to 251.  The zero value is used for an
                            "erasure frame"; values above 251 are
                            reserved.

   RGL_Size_j   8 bits      The beginning of the RGL frame j (for j>2),
                (unsigned,  if present, is
                uchar8)     (RGL_Size_1+ ... +RGL_Size_[j-1]+1) bytes
                            after the last TOC parameter.  RGL_Size_j
                            is set to the size of the jth RGL frame in
                            bytes.  The valid range for this parameter
                            is from 0 to 251.  The zero value is used
                            for an "erasure frame"; values above 251
                            are reserved.

   Num_Samps_j  8 bits      The number of samples in RGL encoded frame
                (unsigned,  j.  The range of this parameter is from 0
                uchar8)     to 250 samples.
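   The layout of Figure 2 and the field table can be walked with a
   short parser.  The following is a sketch under stated assumptions,
   not a reference implementation: the function name and the returned
   tuple shape are hypothetical, and the TOC length of
   2 + 2*Num_Frames bytes is read off the figure.  An erasure frame
   (RGL_Size_j of zero) is returned as an empty byte string.

```python
def parse_type_two_payload(payload):
    """Split a Type Two RTP payload into (num_samps, frame_bytes) tuples."""
    if len(payload) < 4 or payload[0] != 0xFE:
        raise ValueError("not a Type Two RGL payload")
    num_frames = payload[1]
    if num_frames == 0:
        raise ValueError("Num_Frames of zero is not a valid value")
    toc_len = 2 + 2 * num_frames      # 0xFE, Num_Frames, then 2 bytes/frame
    if len(payload) < toc_len:
        raise ValueError("payload shorter than its TOC")

    frames = []
    offset = toc_len                  # first frame follows the TOC directly
    for j in range(num_frames):
        rgl_size = payload[2 + 2 * j]
        num_samps = payload[3 + 2 * j]
        if rgl_size > 251:
            raise ValueError("RGL_Size values above 251 are reserved")
        if num_samps > 250:
            raise ValueError("Num_Samps is limited to 250 samples")
        if offset + rgl_size > len(payload):
            raise ValueError("RGL frame runs past the end of the payload")
        frames.append((num_samps, payload[offset:offset + rgl_size]))
        offset += rgl_size
    # Any bytes left after the last frame are padding and are ignored.
    return frames
```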
   This payload format has been designed to accommodate one or more RGL
   frames; as such it is expected that the TOC will contain a minimum
   of four bytes.  In all cases, the number of RGL frames in the RTP
   payload is specified in the uchar parameter "Num_Frames".

   If more than one RGL frame is to be placed in the payload,
   additional {RGL_Size_j, Num_Samps_j} two-byte tuples are added to
   the TOC after the {RGL_Size_1, Num_Samps_1} tuple (as illustrated
   above).  The number of such additional two-byte tuples added after
   the first 32-bit word in the RTP payload is therefore
   (Num_Frames-1).  The end of the TOC is defined as the "Num_Samps_j"
   byte that is associated with the last RGL frame in the payload.  The
   TOC is not necessarily 32-bit aligned (it is for an odd number of
   RGL frames in the payload, however).  The length of the TOC is
   therefore (2+2*Num_Frames) bytes.  The first RGL frame follows
   immediately after the end of the TOC.

   RGL_Size_j is set to the size of the jth RGL frame in bytes.  Since
   a RGL frame is at most one byte longer than the corresponding G.711
   frame, this packetization allows for RGL frames containing a maximum
   of 250 samples (31.25 milliseconds of 8 kHz sampled G.711).

   The oldest RGL frame MUST occur first, as per guidelines in RFC3551
   [11].

   The number of samples contained in any RGL frame in this payload is
   specified by the corresponding Num_Samps_j parameter and overrides
   any "ptime" parameter that may be specified via SDP [9].  Due to
   this, care SHOULD be exercised to set the sum of the individual
   Num_Samps_j parameters consistent with "ptime" (or the other way
   around) if this packetization is used in SDP environments (or other
   media negotiation environments).  In other words, if SDP is used,
   the number of samples contained in the entire RTP payload should be
   exactly ptime*8 samples; the sum of all the Num_Samps_j parameters
   should therefore equal ptime*8.
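   The construction rules above (TOC of 2+2*Num_Frames bytes, oldest
   frame first, Num_Samps_j summing to ptime*8) can be sketched as
   follows.  The function names are hypothetical and the 8 kHz rate is
   assumed; the ptime check is advisory, reflecting the SHOULD in the
   text.

```python
def build_type_two_payload(frames):
    """Build a Type Two payload from (num_samps, rgl_frame) tuples.

    frames must be ordered oldest first; an empty rgl_frame is an
    erasure frame.
    """
    if not 1 <= len(frames) <= 255:
        raise ValueError("Num_Frames must be between 1 and 255")
    toc = bytearray([0xFE, len(frames)])
    body = bytearray()
    for num_samps, rgl_frame in frames:
        if not 0 <= num_samps <= 250:
            raise ValueError("Num_Samps must be between 0 and 250")
        if len(rgl_frame) > num_samps + 1:
            # An RGL frame is at most one byte longer than its G.711 input.
            raise ValueError("RGL frame too long for its sample count")
        toc += bytes([len(rgl_frame), num_samps])
        body += rgl_frame
    assert len(toc) == 2 + 2 * len(frames)   # TOC length stated in the text
    return bytes(toc + body)


def consistent_with_ptime(frames, ptime_ms):
    """True if the Num_Samps values sum to ptime*8 (8 kHz sampling)."""
    return sum(num_samps for num_samps, _ in frames) == ptime_ms * 8
```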
   Although the RTP payload is illustrated above as 32-bit word
   aligned, this need not be the case.  RGL frames are padded with zero
   bits in the RGL encoder to be an integer number of bytes long, but
   are not 32-bit word aligned.  The RTP encoder MAY choose to align
   the RTP payload to 32-bit word boundaries by padding the last RGL
   frame; however, it is NOT RECOMMENDED to do so.  If the RTP encoder
   chooses to align to 32-bit word boundaries by padding the last RGL
   frame, this is not a problem for the RGL decoder on the last RGL
   frame.  That is, the RGL decoder calculates the end of the RGL frame
   by knowing the number of bits per sample (this is encoded in the
   first byte of the RGL frame) and the number of samples represented
   in the RGL frame (explicitly provided in this packetization via the
   appropriate Num_Samps_j parameter).  Thus, if extra bytes after the
   end of any RGL frame are passed to the far-end RGL decoder they are
   inconsequential to decoding the RGL frame (they are not used).

   Thus aligning the RTP payload to 32-bit boundaries may result in
   extra bytes being passed to the RGL decoder for the last RGL frame
   in the RTP payload.  These extra bytes are inconsequential for
   successful decoding.  However, if the RTP encoder aligns to 32-bit
   boundaries by padding the last RGL frame, it SHOULD set the RGL_Size
   parameter for the last RGL frame to the actual RGL frame size itself
   (i.e., not include the padding in the count).  This subtlety becomes
   important for the case where the RTP receiver uses the "storage
   mode" (Section 5) to record/save the RTP packet payload in a file.
   The result of using the actual RGL frame size (excluding the
   padding) is that the storage mode will not record/save the
   unnecessary padding.

   The number of RGL frames in the RTP payload MUST be a non-zero
   positive integer number (no fractional RGL frames) and this number
   MUST be chosen such that the resulting IP/UDP/RTP packet does not
   exceed the MTU (no packet fragmentation).
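   A minimal sketch of the optional padding and of the MTU constraint
   is given below.  The function names are hypothetical.  The 1500 byte
   MTU and the 40 byte overhead (IPv4 without options, UDP, and a
   twelve octet RTP header without CSRC entries or extension) are
   assumptions chosen for illustration only.

```python
def pad_to_32_bits(payload):
    """Append zero bytes so the payload length is a multiple of four.

    The TOC is left untouched, so RGL_Size for the last frame keeps
    counting only the actual RGL frame bytes, not the padding.
    """
    return payload + bytes(-len(payload) % 4)


def fits_in_mtu(payload_len, mtu=1500, header_overhead=20 + 8 + 12):
    """True if an IP/UDP/RTP packet carrying payload_len bytes fits."""
    return payload_len + header_overhead <= mtu


# One 3-byte RGL frame of 80 samples: 4-byte TOC + 3 bytes = 7 bytes.
unpadded = bytes([0xFE, 1, 3, 80]) + b"\x01\x02\x03"
padded = pad_to_32_bits(unpadded)
```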
   It is sometimes useful or desired in some RTP relay applications (or
   other re-transmission contexts) to "relay" the fact that an expected
   RTP payload was not received (e.g., received in error and dropped or
   not received at all).  In these environments, the sending of a "null
   payload frame" to the far-end is desirable (e.g., some TDM system
   interworking systems in which the "information payload" does not
   contain timestamp information).  To enable the relay of RGL frames
   that were not received, an "erasure frame" is defined by setting the
   corresponding "RGL_Size_j" parameter to zero and the "Num_Samps_j"
   parameter to the number of samples of the expected, but not
   received, RGL frame.  The RGL "erasure" frame specified for use in
   this packetization format is truly zero bytes long (i.e., no RGL
   frame).  In this way, gaps in a received RTP stream may be
   represented for either these "RTP relay" environments or for the
   storage mode file format described in the next section.

   If this erasure frame mechanism is employed to relay or convey
   "erasure intervals" and the number of corresponding G.711 samples
   contained in the erasure interval exceeds 250 samples, then multiple
   RGL erasure frames MUST be produced such that each erasure frame has
   250 or fewer corresponding G.711 samples.  This will usually occur
   naturally, as real-time applications will usually have RGL frames
   corresponding to 250 or fewer G.711 samples (31.25 milliseconds of
   8 kHz sampled G.711) and a single "erasure frame" for each of these
   RGL frames can be produced.

4.3  Decoding Type One and Type Two RGL Payload Formats

   The following algorithm MAY be used for the RTP decoding of a RGL
   RTP packet.

   Step One:

      If the first byte of the RTP payload is of the form {XXX11110}
      where {XXX} != {000} (a "reserved" RGL code is present), go to
      Step Two.  Otherwise use Type One packetization to decode the RGL
      frame.
   Step Two:

      If the first byte of the RTP payload is 0xFE, use Type Two
      packetization to decode the multiple RGL frames present in the
      RTP payload.  Otherwise continue to Step Three.

   Step Three:

      A "reserved code" for which a packetization has not been defined
      by this profile has been found.  Do not further process the RTP
      payload (i.e., do not present it to the RGL decoder).

   The reserved codes other than 0xFE (i.e., 0x3E, 0x5E, 0x7E, 0x9E,
   0xBE and 0xDE) are reserved for future use in defining additional
   RGL packetization formats or for other future purposes.  Step Three
   provides a backwardly compatible mechanism for the RGL RTP decoder
   (dropping the packets) when encountering yet-to-be-defined "reserved
   RGL codes".

5.  RGL Storage Mode

   Storage mode is defined here for the purposes of storing RGL
   compressed frames in a file format for later replay (e.g., voice
   mail applications).  The storage mode default sampling rate is 8 kHz
   sampled audio (SDP rate = 8000), as the overwhelming application of
   G.711 PCM is in 8 kHz sampled environments.  If a rate other than
   8 kHz is used, this rate MUST be conveyed to the receiver(s) of the
   storage mode file by means other than elements in the storage mode
   file itself.

   The storage mode format has been designed to take into account two
   "storage mode use cases" for RGL.  The first use case is simply to
   "store" RGL frames from received RTP packets.  Like most other AVT
   codecs, this use case simply stores the RGL frames (together with
   their RGL_Size and Num_Samps parameters) as the RTP packets are
   received.  The second use case considers a "non-real time" use of
   RGL that compresses arbitrarily long G.711 frames (i.e., longer than
   the 31.25 milliseconds assumed in the Type Two packetization above).
   As mentioned previously and in RGL documents at www.vovida.org [17],
   active speech is efficiently compressed using frame sizes on the
   order of 10 through 30 milliseconds.  However, RGL can efficiently
   compress long periods of silence or background noise using much
   longer intervals.  For non real-time environments, a RGL encoding
   strategy can be optimized over relatively large numbers of samples
   to take advantage of changes in the instantaneous signal power; the
   result is RGL frames of varying lengths that compress the overall
   signal more effectively.

   The storage format is similar to the "Type Two" RTP packetization
   format above in that it requires both RGL_Size and Num_Samps
   parameters for each RGL frame in the file.  The difference is that
   these parameters are prepended to each RGL frame to form a "RGL
   Frame Block".  The storage mode file consists of a series of
   concatenated "RGL Frame Blocks".  Consistent with the RTP payload
   format definitions, the oldest RGL Frame Block MUST occur first.  At
   the beginning of the file there is a "magic number" header which is
   prepended to the series of RGL Frame Blocks.

   |---------------------|
   | Magic Number Header |
   |---------------------|
   | RGL Frame Block #1  |
   |---------------------|
   | RGL Frame Block #2  |
   |---------------------|
   \                     /
   /                     \
   \                     /
   |---------------------|
   | RGL Frame Block #N  |
   |---------------------|

   RGL Storage Mode Format

   The file format begins with a header that incorporates a magic
   number to identify that this file contains either RGLA (RGL A-law
   encoded) or RGLU (RGL mu-law encoded) data.

   The magic number for a RGLA file MUST correspond to the ASCII
   character string "#!RGLA\n", or "0x23 0x21 0x52 0x47 0x4C 0x41 0x0A"
   in hexadecimal form.

   The magic number for a RGLU file MUST correspond to the ASCII
   character string "#!RGLU\n", or "0x23 0x21 0x52 0x47 0x4C 0x55 0x0A"
   in hexadecimal form.
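   Recognizing the two magic numbers amounts to comparing the first
   seven bytes of the file.  A minimal sketch follows; the function
   name and the returned strings are illustrative choices, not part of
   the format.

```python
MAGIC_NUMBERS = {
    b"#!RGLA\n": "RGLA",   # 0x23 0x21 0x52 0x47 0x4C 0x41 0x0A
    b"#!RGLU\n": "RGLU",   # 0x23 0x21 0x52 0x47 0x4C 0x55 0x0A
}
MAGIC_LEN = 7


def identify_storage_file(data):
    """Return "RGLA" or "RGLU" based on the magic number header."""
    header = bytes(data[:MAGIC_LEN])
    if header not in MAGIC_NUMBERS:
        raise ValueError("not an RGL storage mode file")
    return MAGIC_NUMBERS[header]
```

   The RGL Frame Blocks begin at byte offset 7, immediately after the
   header.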
   For applications storing received RTP packets, RGL_Size and
   Num_Samps parameters must be generated for each RGL frame.  Note
   that Type Two RTP packetizations have RGL_Size and Num_Samps
   parameters for each RGL frame; they are simply re-used in this
   format.  Type One RTP packetizations do not have RGL_Size or
   Num_Samps parameters sent in the RTP payload; therefore receivers
   receiving Type One RTP packetizations must generate these parameters
   when storing their RGL frames in this storage format.  For this
   case, a Num_Samps parameter for the RGL frame MUST be generated and
   is calculated from the SDP "ptime" parameter, the default "ptime"
   (if ptime was not sent in the SDP) or the equivalent information in
   non-SDP environments.  Additionally, a RGL_Size parameter MUST be
   generated and SHOULD be set to the size of the RGL payload (in
   bytes).  Note that Type One RTP may add padding at the end of the
   RGL frame; thus the RTP payload may be larger than the RGL frame
   itself.  If this optional padding was added, the padding of RGL
   frames in Type One RTP payloads will be stored in the storage
   format.  Setting the RGL_Size in this manner doesn't require
   inspection of the RGL frame itself to determine its "true length" if
   padding was employed.

   For applications directly storing RGL encoder output, the Num_Samps
   and RGL_Size parameters MUST be set to the number of G.711 samples
   represented in the RGL frame and the (exact) size of the RGL frame
   (in bytes), respectively.

   The RGL Frame Block for RGL frames which represent fewer than 251
   G.711 samples can use the block format defined immediately below.
   This RGL frame block is denoted as a "Type One RGL Frame Block".
   |--------------------|
   | RGL_Size (uchar8)  |
   |--------------------|
   | Num_Samps (uchar8) |
   |--------------------|
   \                    /
   /     RGL Frame      \
   \                    /
   |--------------------|

   Type One RGL Frame Block

   The RGL_Size parameter in this frame block format is defined
   consistent with "Type Two RTP" packetizations and values between
   252-255 (0xFC, 0xFD, 0xFE and 0xFF) are reserved.

   To accommodate "RGL Frame Blocks" representing more than 250 G.711
   samples, the RGL_Size and Num_Samps parameters are extended to be 16
   bit unsigned integers (uchar16) and a byte with value of 255 (0xFF)
   is placed before the RGL_Size parameter, as illustrated below.  This
   RGL frame block is denoted as a "Type Two RGL Frame Block".

   |----------|
   |   0xFF   |
   |---------------------|
   | RGL_Size (uchar16)  |
   |---------------------|
   | Num_Samps (uchar16) |
   |---------------------|
   \                     /
   /      RGL Frame      \
   \                     /
   |---------------------|

   Type Two RGL Frame Block

   To determine which type of RGL Frame Block is being used, a simple
   heuristic is employed.  If the first byte of a RGL Frame Block is
   not 0xFF, then the Type One RGL Frame Block is in use.  Note that
   RGL frames of fewer than 251 G.711 samples cannot have an RGL_Size
   value of 255 (0xFF).  If the first byte of a RGL Frame Block is
   0xFF, then the Type Two RGL Frame Block is in use.

   Type One RGL Frame Blocks SHOULD be used for RGL frames that
   represent fewer than 251 G.711 samples, as they are more efficient
   than Type Two RGL Frame Blocks for these RGL frames.

   The maximum number of samples for RGL frames that can be represented
   with Type Two Frame Blocks is 65534 (2^16-2).  Therefore, the
   maximum possible size of a RGL frame is 65535 (2^16-1) bytes.  65534
   G.711 samples corresponds to roughly 8.2 seconds of 8 kHz sampled
   G.711.
This is a reasonable upper bound on the size of RGL encodings, as RGL coding overheads are negligible over such a large number of input samples (see RGL codec specifications). Therefore, RGL frames intended for storage in storage format MUST represent 65534 samples of G.711 or less. It is RECOMMENDED that RGL encodings from optimized RGL encoders operate on at most 8 second long segments of 8 kHz sampled G.711 input (i.e., 64000 G.711 samples or less).

The last consideration to note is when this storage mode format is used to record received RGL RTP payloads. When an expected RTP payload doesn't arrive, a RGL frame (or frames) must be created in place of the missing information. By using the information from the RTP packet sequence number, timestamp and the M bit, the receiver can detect missing codec frames from packet loss and/or silence suppression and generate either "packet loss concealment" (PLC) frames or "erasure frames".

If PLC frames are generated, they must first be produced (usually in linear or G.711 format) and then encoded in RGL format before they are stored as RGL frames (in corresponding RGL Frame Blocks), and they MUST represent exactly the segment of time of the missing information. Note that if this missing information spans more than 65534 G.711 samples then multiple PLC frames MUST be produced such that each RGL frame produced is less than 65535 bytes.

Erasure frames are simply frames that imply that information is missing for the number of samples represented in the erasure frame. Erasure frames MUST represent exactly the segment of time of the missing information. Note that if this missing information spans more than 65535 samples then multiple erasure frames MUST be produced such that each RGL erasure frame produced has less than 65536 samples. Lastly, note that "erasure frames" are identified by a RGL_Size parameter of value zero for both RGL frame block definitions.
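
The frame block rules of this section can be illustrated with a short, non-normative Python sketch. It is an illustration under stated assumptions, not part of the format definition: the function names (num_samps_from_ptime, pack_frame_block, parse_frame_blocks, erasure_blocks) are hypothetical, the 16 bit fields are assumed to be in network byte order (this section does not state the byte order), and the parser expects only the frame blocks, i.e. the bytes that follow the magic number. Erasure frames are split at 65534 samples, the more conservative of the limits given above.

```python
import struct

MAX_TYPE_ONE_SAMPS = 250   # Type One blocks cover frames of fewer than 251 samples
MAX_TYPE_ONE_SIZE = 251    # RGL_Size values 252-255 are reserved in Type One blocks
MAX_SAMPS = 65534          # storage-format limit on samples per RGL frame (2^16-2)
MAX_SIZE = 65535           # largest RGL frame size in bytes (2^16-1)


def num_samps_from_ptime(ptime_ms, rate=8000):
    """Num_Samps for a Type One RTP payload, derived from ptime in milliseconds."""
    return round(ptime_ms * rate / 1000)


def pack_frame_block(frame, num_samps):
    """Wrap one RGL frame in a Type One block if it fits, else a Type Two block."""
    size = len(frame)
    if not 0 <= num_samps <= MAX_SAMPS or size > MAX_SIZE:
        raise ValueError("frame does not fit the storage format")
    if num_samps <= MAX_TYPE_ONE_SAMPS and size <= MAX_TYPE_ONE_SIZE:
        return struct.pack("BB", size, num_samps) + frame
    # ASSUMPTION: uchar16 fields are big-endian (network byte order).
    return struct.pack(">BHH", 0xFF, size, num_samps) + frame


def parse_frame_blocks(data):
    """Yield (num_samps, frame) pairs; an empty frame marks an erasure frame."""
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF:                      # Type One RGL Frame Block
            size, num_samps = struct.unpack_from("BB", data, pos)
            pos += 2
        else:                                      # Type Two RGL Frame Block
            _, size, num_samps = struct.unpack_from(">BHH", data, pos)
            pos += 5
        yield num_samps, data[pos:pos + size]
        pos += size


def erasure_blocks(missing_samps):
    """Erasure frames (RGL_Size of zero) covering exactly missing_samps samples."""
    out = b""
    while missing_samps > 0:
        n = min(missing_samps, MAX_SAMPS)
        out += pack_frame_block(b"", n)
        missing_samps -= n
    return out
```

For example, a receiver storing a Type One RTP payload negotiated with a ptime of 10 milliseconds at 8000 Hz would call pack_frame_block(payload, num_samps_from_ptime(10)), which stores the payload (including any padding) with a Num_Samps of 80.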

6. IANA Considerations

Two new MIME sub-types as described in this section are to be registered by the IANA. The RGL codec operating in mu-law input environments is named RGLU and operating in A-law environments is named RGLA. In this way it is consistent with existing G.711 PCMU and PCMA IANA registrations.

The MIME names for the RGLU and RGLA are to be allocated from the IETF tree since these two codecs are expected to be widely used for Voice-over-IP applications.

6.1 MIME Registration for RGLU

MIME media type name: audio

MIME media subtype name: RGLU

Required parameters:

   rate: The RTP timestamp clock rate, which is equal to the sampling rate. The typical rate is 8000, but other rates may be specified.

Optional parameters: The following parameter applies to RTP transfer only.

   ptime: Defined as usual for RTP audio (see RFC 2327).

Encoding considerations: This type is defined for transfer of RGLU data via RTP using the payload format specified in Section 4 of this document. It is also defined for other transfer methods using the storage format specified in Section 5 of this document.

Security considerations: See Section 8 "Security Considerations" of this document.

Public specification: RGL Codec is described and fully specified in draft-ramalho-rgl-desc-01.txt and documented fully at www.vovida.org.

Additional information: The following information applies to storage format only.

   Magic number: ASCII character string "#!RGLU\n", or "0x23 0x21 0x52 0x47 0x4C 0x55 0x0A" in hexadecimal form.

   File extensions: rlu (stands for "RGL codec, mu-Law")

   Macintosh file type code: none

   Object identifier or OID: none

Intended usage: COMMON.

Person & email address to contact for further information: Michael A.
Ramalho, mramalho@cisco.com or mar42@cornell.edu

Author/Change controller:

   Author: Michael A. Ramalho, mramalho@cisco.com or mar42@cornell.edu

   Change Controller: IETF Audio/Video Transport Working Group

6.2 MIME Registration for RGLA

MIME media type name: audio

MIME media subtype name: RGLA

Required parameters:

   rate: The RTP timestamp clock rate, which is equal to the sampling rate. The typical rate is 8000, but other rates may be specified.

Optional parameters: The following parameter applies to RTP transfer only.

   ptime: Defined as usual for RTP audio (see RFC 2327).

Encoding considerations: This type is defined for transfer of RGLA data via RTP using the payload format specified in Section 4 of this document. It is also defined for other transfer methods using the storage format specified in Section 5 of this document.

Security considerations: See Section 8 "Security Considerations" of this document.

Public specification: RGL Codec is described and fully specified in draft-ramalho-rgl-desc-01.txt and documented fully at www.vovida.org.

Additional information: The following information applies to storage format only.

   Magic number: ASCII character string "#!RGLA\n", or "0x23 0x21 0x52 0x47 0x4C 0x41 0x0A" in hexadecimal form.

   File extensions: rla (stands for "RGL codec, A-Law")

   Macintosh file type code: none

   Object identifier or OID: none

Intended usage: COMMON.

Person & email address to contact for further information: Michael A. Ramalho, mramalho@cisco.com or mar42@cornell.edu

Author/Change controller:

   Author: Michael A. Ramalho, mramalho@cisco.com or mar42@cornell.edu

   Change Controller: IETF Audio/Video Transport Working Group

7. SDP Issues

Parameters are mapped to SDP [9] in a standard way.
When conveying information by SDP, the encoding name SHALL be "RGLU" for mu-law encodings and "RGLA" for A-law encodings (the same as the MIME subtype and made similar to PCMU/PCMA). Additionally, as the RGL codec is not a defined SDP [9] static codec type, it must use a SDP dynamic payload type and be referenced via an SDP rtpmap attribute. If a ptime other than the default is desired, ptime MUST be specified in the SDP.

Putting this together, we have an example SDP of:

   m=audio 49232 RTP/AVP 94
   a=rtpmap:94 RGLU/8000
   a=ptime:10

for mu-law (PCMU) encodings and

   m=audio 49232 RTP/AVP 94
   a=rtpmap:94 RGLA/8000
   a=ptime:10

for A-law (PCMA) encodings.

In these examples the dynamic payload type of 94 is used (as an example), the sampling rate is conveyed via the rate parameter, and the optional ptime parameter is set to 10 milliseconds. Thus, the RGL frame(s) in a particular RTP payload would represent exactly 80 G.711 samples. As noted previously, the number of G.711 samples represented in each RGL frame in a Type Two RGL RTP payload is contained in the payload itself (see Section 4.2). However, as noted in Section 4.2, the setting of the "Num_Samps" parameters in Type Two packetizations MUST be consistent with ptime in SDP environments.

If the ptime line is not specified in the SDP, then the default ptime is used. The default ptime for the RGLU or RGLA codec is defined here to be 20 milliseconds, thereby making it consistent with the default "packetized audio" ptime parameter in RFC 1889 [7]. However, it is RECOMMENDED that ptime be set such that each RGL frame compresses approximately 10 milliseconds of speech (see RGL Codec Whitepaper at www.vovida.org [18] for rationale) or a more appropriate value for interworking with existing or legacy PSTN/GSTN endpoints (e.g., 5.5 milliseconds for ATMVoAAL2).

8. Security Considerations

RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in [7] and any appropriate profile (e.g., [8]).

As this format transports encoded speech, the main security issues include confidentiality and authentication of the speech itself. The payload format itself does not have any built-in security mechanisms. Confidentiality of the media streams is achieved by encryption; therefore external mechanisms, such as SRTP [10], MAY be used for that purpose. The data compression used with this payload format is applied end-to-end; hence encryption may be performed after compression with no conflict between the two operations. Note also that the RGL payload format is self-describing; if padding of the RGL payload is required by the encryption operation, the decoding of the RGL payload can occur at the far-end without knowledge of the amount of padding applied.

A potential Denial-of-Service (DoS) threat exists for data encoding using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode (e.g., inject hard or impossible inverse root-finding situations) and cause the receiver to become overloaded. The RGL codec, due to its trivial complexity, has bounded receiver-end load for any "bogus RGL" compressed frames and thus does not suffer this fate. The only known DoS attack is simply a stream of more frames than the RTP/DSP flow can accommodate.

Normative References

[1]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3]  Narten, T. and H.
     Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

[4]  Ramalho, M., "RGL Codec Description Document", draft-ramalho-rgl-desc-01.txt (work in progress), February 2003.

[5]  Li, A., "RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and Selectable Mode Vocoders (SMV)", RFC 3558, June 2003.

[6]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload Formats", RFC 3555, July 2003.

[7]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

[8]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 1890, January 1996.

[9]  Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[10] Baugher, M., Blom, R., Carrara, E., McGrew, D., Naslund, M., Noorman, K. and D. Oran, "The Secure Real-Time Transport Protocol", draft-ietf-avt-srtp-09.txt (work in progress), July 2003.

[11] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 3551, July 2003.

[12] [13] [14] [15] [16] [17] [18]

Author's Address

   Michael A. Ramalho
   Cisco Systems, Inc.
   1802 Rue de la Porte
   Wall Township, NJ 07719-3784
   USA

   Phone: +1.941.708.4650
   EMail: mramalho@cisco.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights.
Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.