Audio/Video Transport (avt)
Internet Draft                                            H. Schulzrinne
Document: draft-ietf-avt-rfc2833bis-04.txt                   Columbia U.
                                                              S. Petrack
                                                                   eDial
Expires: July 2004                                         February 2004

RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This memo describes how to carry dual-tone multifrequency (DTMF) signaling, other tone signals and telephony events in RTP packets. This memo preserves the content standardized by RFC 2833, but clarifies its use through reorganization of the text, addition of tutorial content, and addition of normative text describing the detailed application of the content.

Conventions used in this document

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 [N-1] and indicate requirement levels for compliant implementations.

Normative references appear as [N-n], while informative references appear as [I-n]. All references are at the end of this memo.

Schulzrinne, Petrack Expires - July 2004 [Page 1]
RTP Events and Tones Payloads February 2004

Table of Contents

1. Introduction
   1.1 Terminology
   1.2 Overview
   1.3 Potential Applications
   1.4 Events, States, Tone Patterns, and Voice Encoded Tones
2. RTP Payload Format for Named Telephone Events
   2.1 Introduction
   2.2 Use of RTP Header Fields
      2.2.1 Timestamp
      2.2.2 Marker Bit
   2.3 Payload Format
      2.3.1 Event Field
      2.3.2 E ("End") Bit
      2.3.3 R Bit
      2.3.4 Volume Field
      2.3.5 Duration Field
   2.4 Optional MIME Parameters
      2.4.1 Relationship to SDP
   2.5 Procedures
      2.5.1 Sending Procedures
         2.5.1.1 Negotiation of Payloads
         2.5.1.2 Transmission of Event Packets
         2.5.1.3 Long Duration Events
         2.5.1.4 Retransmission of Final Packet
         2.5.1.5 Packing Multiple Events Into One Packet
         2.5.1.6 RTP Sequence Number
      2.5.2 Receiving Procedures
         2.5.2.1 Indication of Receiver Capabilities using SDP
         2.5.2.2 Playout of Tone Events
         2.5.2.3 Long Duration Events
         2.5.2.3 Multiple Events In a Packet
         2.5.2.4 Soft States
   2.6 Reliability
      2.6.1 Intra-Event Updates
      2.6.2 Multi-Event Redundancy
3. Specification of Codepoints For Telephone Events
   3.1 DTMF Events
   3.2 Data Modem and Fax Events
      3.2.1 V.8bis Events
      3.2.2 V.21 Events
      3.2.3 V.8 Events
      3.2.4 V.25 Events
      3.2.5 T.30 Events
      3.2.6 V.18 Events
   3.3 Basic Subscriber Line Events
   3.4 Extended Subscriber Line Events
   3.5 Trunk Events
      3.5.1 Signalling System No. 5
      3.5.2 North American R1
      3.5.3 MFC R2 signaling
      3.5.4 ABCD Transitional Signaling For Digital Trunks
      3.5.5 Continuity Tones
      3.5.6 Trunk Unavailable Event
4. RTP Payload Format for Telephony Tones
   4.1 Introduction
   4.2 Examples of Common Telephone Tone Signals
   4.3 Use of RTP Header Fields
      4.3.1 Timestamp
      4.3.2 Marker Bit
      4.3.3 Payload Format
      4.3.4 Optional MIME Parameters
   4.4 Procedures
      4.4.1 Sending Procedures
      4.4.2 Receiving Procedures
5. Application Considerations
   5.1 Combining Tones and Named Events
   5.2 Simultaneous Generation of Audio and Events
   5.3 Strategies For Handling FAX and Modem Signals
   5.4 Examples
      5.4.1 Use of RFC 2198 Redundancy With Named Events
      5.4.2 Combined Tone and Telephone-event Payloads
6. MIME Registration
   6.1 audio/telephone-event
   6.2 audio/tone
7. Security Considerations
8. IANA Considerations
9. Changes Since RFC 2833
10. Acknowledgements
11. Authors
12. References
   12.1 Normative References
   12.2 Informative References

1. Introduction

1.1 Terminology

This document uses the following abbreviations:

   DTMF  Dual Tone Multifrequency
   IVR   Interactive Voice Response unit
   PSTN  Public Switched (circuit) Telephone Network

1.2 Overview

This memo defines two RTP [N-4] payload formats. The first carries dual-tone multifrequency (DTMF) digits and other line and trunk signals as events (section 2). The second describes general multi-frequency tones in terms only of their frequency and cadence (section 4).

Separate RTP payload formats for telephony tone signals are desirable because low-rate voice codecs cannot be guaranteed to reproduce these tone signals accurately enough for automatic recognition. In addition, tone properties such as the phase reversals in the ANSam tone will not survive speech coding. Defining separate payload formats also permits higher redundancy while maintaining a low bit rate. Finally, some telephony events such as "on-hook" occur out-of-band and cannot be transmitted as tones.

The remainder of this section provides the motivation for defining the payload types described in this document. Section 2 defines the payload format and associated procedures for use of named events. Section 3 describes the events for which codepoints are defined in this document. Section 4 describes the payload format and associated procedures for tone representations. Section 5 deals with achievement of reliable delivery through redundancy and the use of combined payloads. Section 6 provides the MIME media type registrations for the two payload formats, and also defines the IANA requirements for registration of codepoints for named telephone events. Section 7 deals with security considerations.

1.3 Potential Applications

The payload formats described here may be useful in a number of different scenarios.
On the sending side, there are two basic possibilities: either the sending side is an end system which originates the signals itself, or it is a gateway with the task of propagating incoming telephone signals into the Internet.

On the receiving side there are more possibilities. The first is that the receiver must propagate tone signalling accurately into the PSTN for machine consumption. One example of this is a gateway passing DTMF tones to an IVR. In this scenario, frequencies, amplitudes, tone durations, and the durations of pauses between tones are all significant, and individual tone signals must be delivered reliably and in order.

In the second scenario, the receiver must play out tones for human consumption. Typically, rather than a series of tone signals each with its own meaning, the content will consist of a single sequence of tones and possibly silence, played out continuously or repeated cyclically for some period of time. Often the end of the tone playout will be triggered by an event fed back in the other direction, using either in-band or out-of-band means. Examples of this are dial tone or busy tone.

The relationship between locality and the tones to be played out is a complicating factor in this scenario. In the phone network, tones are generated at different places, depending on the switching technology and the nature of the tone. This determines, for example, whether a person making a call to a foreign country hears the familiar local tones or the tones used in the country called.

For analog lines, dial tone is always generated by the local switch. ISDN terminals may generate dial tone locally and then send a Q.931 [I-7] SETUP message containing the dialed digits.
If the terminal sends a SETUP message without any Called Party digits, the switch performs digit collection (the terminal supplies the digits in KEYPAD messages) and provides dial tone over the B-channel. The terminal can either use the audio signal on the B-channel or use the Q.931 messages to trigger locally generated dial tone.

Ringing tone (also called ringback tone) is generated by the local switch at the callee, with a one-way voice path opened up as soon as the callee's phone rings. (This reduces the chance of clipping the called party's response just after answer. It also permits pre-answer announcements or in-band call-progress indications to reach the caller before or in lieu of a ringing tone.)

Congestion tone and special information tones can be generated by any of the switches along the way, and may be generated by the caller's switch based on ISUP messages received. For analog instruments, busy tone is generated by the caller's switch, triggered by the appropriate ISUP message; for ISDN, it is generated by the terminal.

In the third scenario, an end system is directly connected to the Internet and does not need to regenerate tone signals, so time alignment and power levels are not relevant. These systems rely on PSTN gateways or Internet end systems to generate DTMF events and do not perform their own audio waveform analysis. An example of such a system is an Internet interactive voice-response (IVR) system.

In circumstances where exact timing alignment between the audio stream and the DTMF digits or other events is not important and data is sent unicast, such as the IVR example mentioned earlier, it may be preferable to use a reliable control protocol rather than RTP packets. In those circumstances, this payload format would not be used.
Note that in a number of these cases it is possible that the gateway or end system will be both a sender and receiver of telephone signals. Sometimes the same class of signals will be sent as received -- in the case of "RTP trunking" or voiceband data, for instance. In other cases, such as that of an end system serving analog lines, the signals sent will be in a different class from those received.

1.4 Events, States, Tone Patterns, and Voice Encoded Tones

This document provides the means for in-band transport over the Internet of two broad classes of signalling information: in-band tones or tone sequences, and signals sent out-of-band in the PSTN. Three methods, two of which are defined by this document, are available for carrying tone signals; only one of the three can be used to carry out-of-band PSTN signals. Depending on the application, it may be desirable to carry the signalling information in more than one form at once. Section 5 discusses when and how this should be done.

1) The gateway or end system can upspeed to a higher-bandwidth codec such as G.711 [I-3] when tone signals are to be conveyed. Alternatively, for FAX or modem signals, a specialized transport such as T.38 [I-8], RFC 2793 [I-1], or V.150.1 modem relay [I-19] may be used.

2) The sending gateway can simply measure the frequency components of the voice band signals and transmit this information to the RTP receiver using the tone representation defined in this document (section 4). In this mode, the gateway makes no attempt to discern the meaning of the tones, but simply distinguishes tones from speech signals. An end system may use the same approach using configured rather than measured frequencies. All tone signals in use in the PSTN and meant for human consumption are sequences of simple combinations of sine waves, either added or modulated.
(There is at least one tone, however, that makes use of periodic phase reversals: the ANSam tone [N-20], used for indicating data transmission over voice lines.)

3) As a third option, a gateway can recognize the tones and translate them into a name, such as ringing or busy tone or DTMF digit '0' (section 2). The receiver then produces a tone signal or other indication appropriate to the signal. Generally, since the recognition of signals at the sender often depends on their on/off pattern or the sequence of several tones, this recognition can take several seconds. On the other hand, the gateway may have access to the actual signaling information that generates the tones and thus can generate the RTP packet immediately, without the detour through acoustic signals.

The use of named events is the only feasible method for transmitting out-of-band PSTN signals as content within RTP sessions.

2. RTP Payload Format for Named Telephone Events

2.1 Introduction

The RTP payload format for named telephone events is designated as "telephone-event", and the MIME type as "audio/telephone-event". In accordance with current practice, this payload format does not have a static payload type number, but uses an RTP payload type number established dynamically and out-of-band. The default clock frequency is 8000 Hz, but the clock frequency can be redefined when assigning the dynamic payload type.

Named telephone events are carried as part of the audio stream, and MUST use the same sequence number and timestamp base as the regular audio channel to simplify the generation of audio waveforms at a gateway. The named telephone events payload type can be considered to be a very highly-compressed audio codec, and is treated the same as other codecs.

2.2 Use of RTP Header Fields

2.2.1 Timestamp

The RTP timestamp reflects the measurement point for the current packet.
The event duration described in section 2.3.5 extends forwards from that time. For events that span multiple RTP packets, the RTP timestamp identifies the beginning of the event, i.e., several RTP packets may carry the same timestamp. For long-lasting events that have to be split into subevents (see below, section 2.5.1.3), the timestamp indicates the beginning of the subevent.

2.2.2 Marker Bit

The RTP marker bit indicates the beginning of a new event. For long-lasting events that have to be split into subevents (see below, section 2.5.1.3), only the first subevent will have the marker bit set.

2.3 Payload Format

The payload format for named telephone events is shown in Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1: Payload Format for Named Events

2.3.1 Event Field

The event field is a number between 0 and 255 identifying a specific telephony event. An IANA registry of codepoints for this field has been established (see IANA Considerations, section 8). The initial content of this registry consists of the events defined in section 3.

2.3.2 E ("End") Bit

If set to a value of one, the "end" bit indicates that this packet contains the end of the event. For long-lasting events that have to be split into subevents (see below, section 2.5.1.3), only the final packet for the final subevent will have the "E" bit set.

2.3.3 R Bit

This field is reserved for future use. The sender MUST set it to zero, and the receiver MUST ignore it.

2.3.4 Volume Field

For DTMF digits and other events representable as tones, this field describes the power level of the tone, expressed in dBm0 after dropping the sign. Power levels range from 0 to -63 dBm0. Thus, larger values denote lower volume.
This value is defined only for events for which the documentation indicates that volume is applicable. For other events, the sender MUST set volume to zero and the receiver MUST ignore the value.

2.3.5 Duration Field

The duration field indicates the duration of the event or subevent being reported, in timestamp units, expressed as an unsigned integer. For a non-zero value, the event or subevent began at the instant identified by the RTP timestamp and has so far lasted as long as indicated by this parameter. The event may or may not have ended. If the event duration exceeds the maximum representable by the duration field, the event is split into several contiguous subevents as described below (section 2.5.1.3).

The special duration value of zero is reserved to indicate that the event lasts "forever", i.e., is a state and is considered to be effective until updated. A sender MUST NOT transmit a zero duration for events other than those defined as states. The receiver SHOULD ignore an event report with zero duration if the event is not a state.

Events defined as states MAY contain a non-zero duration, indicating that the sender intends to refresh the state before the time duration has elapsed ("soft state").

For a sampling rate of 8000 Hz, the duration field is sufficient to express event durations of up to 0xFFFF (65535) timestamp units, or approximately 8.2 seconds.

2.4 Optional MIME Parameters

As indicated in the MIME registration for named events in section 6.1, the telephone-event MIME type supports two optional parameters: the "events" parameter and the "rate" parameter.

The "events" parameter lists the events supported by the implementation. Events are listed as one or more comma-separated elements. Each element can either be a single integer or two integers separated by a hyphen, representing a range of consecutive event codepoints. No white space is allowed in the argument.
The integers designate the event numbers supported by the implementation.

The "rate" parameter describes the sampling rate, in Hertz, and hence the units for the RTP timestamp and event duration fields. The number is written as a floating point number or as an integer. If omitted, the default value is 8000 Hz.

2.4.1 Relationship to SDP

The recommended mapping of MIME optional parameters to SDP is given in section 3 of RFC 3555 [N-6]. The "rate" MIME parameter for the named event payload type follows this convention: it is expressed as usual as the clock rate component of the a=rtpmap: attribute line.

The "events" MIME parameter deviates from the convention suggested in RFC 3555 because it omits the string "events=" before the list of supported events:

   a=fmtp:<format> <list of values>

The list of values has the format described above for the MIME parameter. The list does not have to be sorted.

For example, if the payload format uses the payload type number 100, and the implementation can handle the DTMF tones (events 0 through 15) and the dial and ringing tones, it would include the following description in its SDP message:

   m=audio 12345 RTP/AVP 100
   a=rtpmap:100 telephone-event/8000
   a=fmtp:100 0-15,66,70

The following sample media type definition corresponds to the SDP example above:

   audio/telephone-event;events="0-15,66,70";rate="8000"

2.5 Procedures

This section defines the procedures associated with the named event payload type. Additional procedures may be specified in the documentation associated with specific event codepoints.

2.5.1 Sending Procedures

2.5.1.1 Negotiation of Payloads

Negotiation of payloads between sender and receiver is achieved by out-of-band means, using SDP, for example. The sender SHOULD indicate what events it supports, using the optional "events" parameter associated with the telephone-event MIME type.
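The "events" list syntax of section 2.4 and its SDP mapping in section 2.4.1 are simple enough to show in a short sketch. The Python below is illustrative only and is not part of this specification. The names parse_events, format_events and sdp_lines are hypothetical helpers. The code assumes only what section 2.4 states: comma-separated integers or hyphenated ranges, no white space, and codes between 0 and 255.

```python
def parse_events(param: str) -> set[int]:
    """Parse an "events" list such as "0-15,66,70" into a set of event codes."""
    if not param or any(ch.isspace() for ch in param):
        raise ValueError("events list must be non-empty and contain no white space")
    events: set[int] = set()
    for element in param.split(","):
        if "-" in element:
            # Two integers separated by a hyphen denote a range of codepoints.
            low_text, high_text = element.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"empty range: {element}")
            events.update(range(low, high + 1))
        else:
            events.add(int(element))
    if any(not 0 <= code <= 255 for code in events):
        raise ValueError("event codes must lie between 0 and 255")
    return events


def format_events(events: set[int]) -> str:
    """Render event codes in the same syntax, collapsing consecutive runs."""
    codes = sorted(events)
    parts: list[str] = []
    i = 0
    while i < len(codes):
        j = i
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        parts.append(str(codes[i]) if i == j else f"{codes[i]}-{codes[j]}")
        i = j + 1
    return ",".join(parts)


def sdp_lines(payload_type: int, events: set[int], port: int, rate: int = 8000) -> list[str]:
    """SDP media description for telephone-event; the fmtp value omits "events="."""
    return [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} telephone-event/{rate}",
        f"a=fmtp:{payload_type} {format_events(events)}",
    ]


if __name__ == "__main__":
    local = parse_events("0-15,66,70")      # DTMF plus dial and ringing tones
    offered = parse_events("0-11,66")       # example list received from a peer
    print(format_events(local & offered))   # 0-11,66
    print("\n".join(sdp_lines(100, local, port=12345)))
```

The result is a set, so restricting a sender to the events in a peer's list is a single intersection. format_events then writes the result back in the same compact syntax.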
If the sender receives an "events" parameter from the receiver, it MUST restrict the set of events it sends to those listed in the received "events" parameter.

2.5.1.2 Transmission of Event Packets

DTMF digits and named telephone events are carried as part of the audio stream, and MUST use the same sequence number and timestamp base as the regular audio channel to simplify the generation of audio waveforms at a gateway.

An audio source SHOULD start transmitting event packets as soon as it recognizes an event, and continue to send updates until the event has ended. The update packet MUST have the same RTP timestamp value as the initial packet for the event, but the duration MUST be increased to reflect the total cumulative duration since the beginning of the event.

The first packet for an event MUST have the "M" bit set. The final packet for an event MUST have the "E" bit set, but setting of the "E" bit MAY be deferred until the final packet is retransmitted (see section 2.5.1.4). Intermediate packets for an event MUST NOT have either the "M" bit or the "E" bit set.

Sending of a packet with the "E" bit set is OPTIONAL if the packet reports two events which are defined as mutually exclusive states, or if the final packet for one state is immediately followed by a packet reporting a mutually exclusive state. (For events defined as states, the appearance of a mutually exclusive state implies the end of the previous state.) [Is this exception really worth the bother?]

A source has wide latitude as to how often it sends event updates. A natural interval is the spacing between non-event audio packets. (Recall that a single RTP packet can contain multiple audio frames for frame-based codecs and that the packet interval can vary during a session.)
Alternatively, a source MAY decide to use a different spacing for event updates, called an event period, with a value of 50 ms RECOMMENDED.

DTMF digits and events are sent incrementally to avoid having the receiver wait for the completion of the event. Since some tones are two seconds long, waiting would incur a substantial delay. The transmitter does not know whether event length is important and thus needs to transmit immediately and incrementally. If the receiver application does not care about event length, the incremental transmission mechanism avoids delay. Some applications, such as gateways into the PSTN, care about both delays and event duration.

For robustness, the sender SHOULD retransmit "state" events periodically.

Timing information is contained in the RTP timestamp, allowing precise recovery of inter-event times. Thus, the sender does not need to maintain precise or consistent time intervals between event packets.

2.5.1.3 Long Duration Events

If an event persists beyond the maximum duration expressible in the duration field (0xFFFF), the sender MUST send a packet reporting this maximum duration but MUST NOT set the "E" bit in this packet. The sender MUST then begin reporting a new "subevent" with the RTP timestamp set to the time at which the previous subevent ended and the duration set to the cumulative duration of the new subevent. The "M" bit of the first packet reporting the new subevent MUST NOT be set. The sender MUST repeat this procedure as required until the end of the complete event has been reached. The final packet for the complete event MUST have the "E" bit set (either on initial transmission or on retransmission as described below).

2.5.1.4 Retransmission of Final Packet

The final packet for each event and for each subevent SHOULD be sent a total of three times at the interval used by the source for updates.
(If a new event is recognized during the retransmissions and RFC 2198 [N-2] is in use, the old event will be part of the redundancy in the RFC 2198 payloads.) This ensures that the duration of the event or subevent can be recognized correctly even if an instance of the last packet is lost.

A sender MAY delay setting the "E" bit until retransmitting the last packet for a tone, rather than setting the bit on its first transmission. This avoids having to wait to detect whether the tone has indeed ended. Once the sender has set the "E" bit for a packet, it MUST continue to set the "E" bit for any further retransmissions of that packet.

2.5.1.5 Packing Multiple Events Into One Packet

Multiple named events can be packed into a single RTP packet if and only if the events are consecutive and contiguous, i.e., occur without overlap and without pause between them, and if the last event packed into a packet occurs quickly enough to avoid excessive delays at the receiver.

This approach is similar to having multiple frames of frame-based audio in one RTP packet.

The constraint that packed events not overlap implies that events designated as states can be followed in a packet only by other state events which are mutually exclusive to them. The constraint itself is needed so that the beginning time of each event can be calculated at the receiver.

In a packet containing events packed in this way, the RTP timestamp MUST identify the beginning of the first event or subevent in the packet. The "M" bit MUST be set (since the packet records the beginning of at least one event). The "E" bit and duration for each event in the packet MUST be set using the same rules as if that event were the only event contained in the packet.
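The sending rules of sections 2.5.1.2 through 2.5.1.4 can be summarized as a small generator. This is a sketch under stated assumptions, not a reference implementation. The names EventPacket, pack_event and event_packets are hypothetical. Updates are assumed to go out every `period` timestamp units, and the "E" bit is set on the first transmission of the final packet rather than deferred.

The generator shows the following rules:

- cumulative durations under a fixed timestamp;
- the "M" bit on the first packet only;
- the split into subevents at 0xFFFF;
- three transmissions of each final packet;
- a sequence number that also advances for retransmissions (section 2.5.1.6).

Each payload uses the layout of Figure 1.

```python
import struct
from collections.abc import Iterator
from dataclasses import dataclass

MAX_DURATION = 0xFFFF  # largest value the 16-bit duration field can carry


@dataclass
class EventPacket:
    seq: int        # RTP sequence number, incremented for every packet sent
    timestamp: int  # RTP timestamp: start of the current event or subevent
    marker: bool    # RTP M bit: set only on the very first packet of the event
    payload: bytes  # one 4-byte block in the layout of Figure 1


def pack_event(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Encode event (8 bits), E (1), R (1, always zero), volume (6), duration (16)."""
    if not (0 <= event <= 255 and 0 <= volume <= 63 and 0 <= duration <= MAX_DURATION):
        raise ValueError("field value out of range")
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags, duration)


def event_packets(event: int, volume: int, start_ts: int, total: int,
                  period: int, first_seq: int = 0,
                  copies: int = 3) -> Iterator[EventPacket]:
    """Yield the packets a sender might emit for one event lasting `total` units.

    Hypothetical parameters: `period` is the update spacing in timestamp units
    and `copies` is how many times each final packet is sent in total.
    """
    if total <= 0 or period <= 0:
        raise ValueError("total and period must be positive")
    seq, sub_start, remaining, first = first_seq, start_ts, total, True
    while True:
        sub_len = min(remaining, MAX_DURATION)
        last_subevent = remaining <= MAX_DURATION
        elapsed = 0
        while elapsed < sub_len:
            elapsed = min(elapsed + period, sub_len)  # cumulative, never a delta
            final = elapsed == sub_len
            for _ in range(copies if final else 1):
                # E bit only on the final packet of the final subevent.
                payload = pack_event(event, final and last_subevent, volume, elapsed)
                yield EventPacket(seq, sub_start, first, payload)
                seq, first = (seq + 1) & 0xFFFF, False
        if last_subevent:
            return
        # The next subevent starts where this one ended; its M bit stays clear.
        remaining -= MAX_DURATION
        sub_start = (sub_start + MAX_DURATION) & 0xFFFFFFFF


if __name__ == "__main__":
    # A 1000-unit digit "5" (125 ms at 8000 Hz) with 400-unit (50 ms) updates.
    for pkt in event_packets(event=5, volume=10, start_ts=1000, total=1000, period=400):
        print(pkt.seq, pkt.timestamp, pkt.marker, pkt.payload.hex())
```

With a 400-unit event period, the 1000-unit digit produces five packets. The first two are updates reporting 400 and 800 units. The last three are copies of the final 1000-unit report, each with the "E" bit set. All five carry the same timestamp.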
For events with a duration shorter than a typical packet interval, for example, V.21 bits (section 3.2.2), it is RECOMMENDED that multiple events be represented by a single RFC 2198 [N-2] packet, as described in section 5.

2.5.1.6 RTP Sequence Number

The RTP sequence number MUST be incremented by one in each successive RTP packet sent. Incrementing applies to retransmitted as well as initial instances of event reports, to permit the receiver to detect lost packets for RTCP receiver reports.

2.5.2 Receiving Procedures

2.5.2.1 Indication of Receiver Capabilities using SDP

Receivers can indicate which named events they can handle, for example, by using the Session Description Protocol (RFC 2327 [N-3]). SDP descriptions using the event payload MUST contain an fmtp format attribute that lists the event values that the receiver can process.

2.5.2.2 Playout of Tone Events

In the gateway scenario, an Internet telephony gateway connecting a packet voice network to the PSTN recreates the DTMF or other tones and injects them into the PSTN. Since, for example, DTMF digit recognition takes several tens of milliseconds, the first few milliseconds of a digit will arrive as regular audio packets. Thus, careful time and power (volume) alignment between the audio samples and the events is needed to avoid generating spurious digits at the receiver.

Note that regular audio packets may continue to arrive while named tone event reports are being received. This is likely to occur at the onset of a tone and is necessary to avoid possible errors in the interpretation of the reproduced tone at the remote end. Implementations supporting this payload format must be prepared to handle the overlap. It is RECOMMENDED that gateways only render the tone encoded in the named event reports since the audio may contain spurious tones introduced by the audio compression algorithm.
However, it is anticipated that these extra tones will probably not interfere with recognition at the far end.

Receiver implementations MAY use different algorithms to create tones, including the two described here. Note that not all implementations need to recreate a tone; some may only care about recognizing the events.

In the first algorithm, the receiver simply places a tone of the given duration in the audio playout buffer at the location indicated by the timestamp. As additional packets are received that extend the same tone, the waveform in the playout buffer is extended accordingly. (Care has to be taken if audio is mixed, i.e., summed, in the playout buffer rather than simply copied.) Thus, if a packet in a tone lasting longer than the packet interarrival time gets lost and the playout delay is short, a gap in the tone may occur.

Alternatively, the receiver can start a tone and play it until one of three things happens: it receives a packet with the "E" bit set, it receives the next tone (distinguished by a different timestamp value), or a given time period elapses. This is more robust against packet loss, but may extend the tone beyond its original duration if all retransmissions of the last packet in an event are lost. Limiting the time period of extending the tone is necessary to prevent a tone from "getting stuck". This algorithm is not a license for senders to set the duration field to zero. The duration MUST be set to the current duration as described, since this is needed to create accurate events if the first event packet is lost, among other reasons.

Regardless of the algorithm used, the tone SHOULD NOT be extended by more than three packet interarrival times. A slight extension of tone durations and shortening of pauses is generally harmless.

If a receiver has extended a tone by the maximum extension duration and started playing silence, it MUST NOT resume playing the tone when later packets for that event arrive, as this would cause spurious events to be detected downstream.
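The second playout algorithm and its limits can be expressed as a small state machine. The Python below is an illustrative sketch only. TonePlayout, on_report and is_on are hypothetical names, and the receiver's interarrival estimate is assumed to be supplied from elsewhere. The sketch does not generate waveforms, handle jitter, or treat the subevent case of section 2.5.2.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TonePlayout:
    """Play a tone until the E bit, the next tone, or the extension limit.

    All times are RTP timestamp units. `interarrival` is the receiver's own
    estimate of the packet interarrival time (an assumed input). The tone is
    never extended more than three interarrival times past the reported duration.
    """
    interarrival: int
    current_ts: Optional[int] = None   # timestamp of the tone being rendered
    reported_end: int = 0              # current_ts + longest duration reported
    ended: bool = False                # an E-bit report has been received
    silenced_ts: Optional[int] = None  # tone cut off at the extension limit

    def on_report(self, timestamp: int, duration: int, end: bool) -> None:
        if timestamp == self.silenced_ts:
            return  # late reports must not restart a tone that was cut off
        if timestamp != self.current_ts:
            # A different timestamp identifies the next tone, which replaces this one.
            self.current_ts, self.ended = timestamp, end
            self.reported_end = timestamp + duration
        else:
            self.reported_end = max(self.reported_end, timestamp + duration)
            self.ended = self.ended or end

    def is_on(self, now: int) -> bool:
        """Return whether the tone should be audible at playout time `now`."""
        if self.current_ts is None or self.current_ts == self.silenced_ts:
            return False
        if now < self.current_ts:
            return False
        if self.ended:
            return now < self.reported_end
        if now < self.reported_end + 3 * self.interarrival:
            return True  # keep playing while waiting for the next update
        self.silenced_ts = self.current_ts  # limit reached: fall silent for good
        return False


if __name__ == "__main__":
    playout = TonePlayout(interarrival=160)
    playout.on_report(timestamp=0, duration=160, end=False)
    print([playout.is_on(t) for t in (100, 639, 640)])
```

The sketch records the timestamp of a silenced tone. This enforces the rule that late packets for the same event never resume a tone that was cut off at the extension limit.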
A receiver SHOULD NOT rely on a particular event packet spacing, but instead MUST use the event timestamps and durations to determine timing and duration of playout.

The receiver MUST calculate jitter for RTCP receiver reports based on all packets with a given timestamp. Note: The jitter value should primarily be used as a means for comparing the reception quality between two users or two time periods, not as an absolute measure.

If a zero volume is indicated for an event for which the volume field is defined, then the receiver MAY reconstruct the volume from the volume of non-event audio or MAY use the nominal value specified by the ITU Recommendation or other document defining the tone. This ensures backwards compatibility with RFC 2833, where the volume field was defined only for DTMF events.

2.5.2.3 Long Duration Events

If an event report is received with duration equal to the maximum duration expressible in the duration field (0xFFFF) and the "E" bit for the report is not set, the event report may mark the end of a subevent generated according to the procedures of section 2.5.1.3. If another report for the same event type is received, the receiver MUST compare the RTP timestamp for the new event with the sum of the RTP timestamp of the previous report plus the duration (0xFFFF). The receiver uses the absence of a gap between the events to detect that it is receiving a single long-duration event.

The total duration of a long duration event is the sum of the durations of the subevents used to report it. This is equal to the duration of the final subevent (as indicated in the final packet for that subevent), plus 0xFFFF multiplied by the number of subevents preceding the final subevent.
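The contiguity test and the total-duration rule above can be sketched in Python as follows. The function name and input shape are hypothetical. The input is assumed to be the final reported duration of each subevent, already grouped by event code and kept in arrival order. Timestamps are compared modulo 2^32 because RTP timestamps wrap.

```python
MAX_DURATION = 0xFFFF      # largest value of the 16-bit duration field
RTP_TS_MOD = 1 << 32       # RTP timestamps are 32-bit and wrap around


def merge_subevents(subevents):
    """Merge consecutive subevent reports of one event code into events.

    `subevents` is a list of (rtp_timestamp, final_duration) pairs, in
    order, for reports carrying the same event code. Returns a list of
    (start_timestamp, total_duration) pairs.
    """
    merged = []  # entries: [start, total, last_ts, last_duration]
    for ts, duration in subevents:
        if merged:
            start, total, last_ts, last_duration = merged[-1]
            # The new report continues the previous subevent only if that
            # subevent reached 0xFFFF and the new timestamp leaves no gap.
            if (last_duration == MAX_DURATION
                    and (last_ts + MAX_DURATION) % RTP_TS_MOD == ts):
                merged[-1] = [start, total + duration, ts, duration]
                continue
        merged.append([ts, duration, ts, duration])
    return [(start, total) for start, total, _, _ in merged]
```

Take two full subevents followed by a final subevent of 500 units. They merge into one event of 500 + 2 * 0xFFFF = 131570 units, in agreement with the formula in the text.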
2.5.2.3 Multiple Events In a Packet

The procedures of section 2.5.1.5 require that if multiple events are reported in the same packet, they are contiguous and non-overlapping. As a result, it is not strictly necessary for the receiver to know the start times of the events following the first one in order to play them out -- it needs only to respect the duration reported for each event. Nevertheless, if knowledge of the start time for a given event after the first one is required, it is equal to the start time of the preceding event plus the duration of the preceding event.

2.5.2.4 Soft States

If the duration of a soft state event expires, the receiver SHOULD consider the value of the state to be "unknown" unless otherwise indicated in the event documentation (e.g., in section 3).

2.6 Reliability

The named event mechanism uses three complementary redundancy mechanisms to deal with lost packets:

Intra-event updates: Events that last longer than one event period (e.g., 50 ms) are updated periodically, so that the receiver can reconstruct the event and its duration if it receives any of the update packets, albeit with delay. This mechanism is described in section 2.6.1 and is most helpful for longer events.

Repeat last event packet: As described in section 2.5.1.4, the last event packet is transmitted a total of three times if there is no subsequent event. This mechanism is applicable for widely spaced events.

Multi-event redundancy: Section 2.6.2 describes how a summary of earlier events MAY be carried in RFC 2198 redundancy payloads. This is particularly useful for sequences of short events, e.g., digits dialed by a modem or autodialer or in-band tone signaling sequences (section 3.2 or 3.5).

2.6.1 Intra-Event Updates

During an event, the RTP event payload format provides incremental updates on the event.
The error resiliency afforded by this mechanism depends on whether the first or second algorithm in section 2.5.2.2 is used and on the playout delay at the receiver. For example, if the receiver uses the first algorithm and only places the current duration of tone signal in the playout buffer, for a playout delay of 120 ms and a packet gap of 50 ms, two packets in a row can get lost without causing a premature end of the tone generated.

2.6.2 Multi-Event Redundancy

The audio redundancy mechanism described in RFC 2198 [N-2] MAY be used to recover from packet loss across events. For the suggested packet gap of 50 ms, the effective data rate is r times 64 bits (32 bits for the redundancy header and 32 bits for the telephone-event payload) plus 8 bits for the primary encoding every 50 ms, or (r times 1280 + 160) bits/second, where r is the number of redundant events carried in each packet. The value of r is an implementation trade-off, with a value of 5 suggested.

The timestamp offset in this redundancy scheme has 14 bits, so that it allows a single packet to "cover" 2.048 seconds of telephone events at a sampling rate of 8000 Hz. Including the starting time of previous events allows precise reconstruction of the tone sequence at a gateway. The scheme is resilient to consecutive packet losses spanning this interval of 2.048 seconds or r digits, whichever is less. Note that for previous digits, only an average loudness can be represented.

An encoder MAY treat the event payload as a highly-compressed version of the current audio frame. In that mode, each RTP packet during an event would contain the current audio codec rendition (say, G.723.1 [I-4] or G.729 [I-5]) of this digit as well as the representation described in section 2, plus any previous events seen earlier. This approach allows dumb gateways that do not understand this format to function.
See also the discussion in section 1.

The payload format described here achieves a higher redundancy, even in the case of sustained packet loss, than the method proposed for the Voice over Frame Relay Implementation Agreement [I-20].

In short, senders generate updates at regular intervals, thus ensuring that each event is transmitted multiple times. RFC 2198 [N-2] is used to recover events where all packets sent during the event have been lost.

3. Specification of Codepoints For Telephone Events

This document defines five classes of named events:

1) DTMF tones (section 3.1);
2) data and fax-related tones (section 3.2);
3) standard subscriber line tones and events (section 3.3);
4) additional subscriber line tones and events (section 3.4); and
5) trunk signalling events (section 3.5).

The tables listing the event codepoints for each class indicate whether the respective events are states, tones, or other. For tone events, the tables indicate whether the volume field is applicable or must be set to 0. Notes to the tables indicate which states are mutually exclusive.

3.1 DTMF Events

DTMF signalling [N-13] is typically generated by a telephone set or possibly by a PBX. DTMF digits may be consumed by entities such as gateways or application servers in the IP network, or by entities such as telephone switches or IVRs in the circuit switched network.

The DTMF events support two possible applications at the sending end, and two at the receiving end.

In the first application at the sending end, the Internet telephony gateway detects DTMF on the incoming circuits and sends the RTP payload described here instead of regular audio packets. The gateway likely has the necessary digital signal processors and algorithms, as it often needs to detect DTMF, e.g., for two-stage dialing.
Having the gateway detect tones relieves the receiving Internet end system from having to do this work and also avoids having low bit-rate codecs like G.723.1 [I-4] render DTMF tones unintelligible.

In the second application, an Internet end system such as an "Internet phone" can emulate DTMF functionality without concerning itself with generating precise tone pairs and without imposing the burden of tone recognition on the receiver.

A similar distinction occurs at the receiving end. In the gateway scenario, an Internet telephony gateway connecting a packet voice network to the PSTN recreates the DTMF tones or other telephony events and injects them into the PSTN. Since, for example, DTMF digit recognition takes several tens of milliseconds, the first few milliseconds of a digit will arrive as regular audio packets. Thus, careful time and power (volume) alignment between the audio samples and the events is needed to avoid generating spurious digits at the receiver. In the end system scenario, the DTMF events are consumed by the receiving entity itself.

Table 1 shows the DTMF-related named event codepoints within the telephone-event payload format. The DTMF digits 0-9 and * and # are commonly supported. DTMF digits A through D are less frequently encountered, typically in special applications such as military networks.

ITU-T Recommendation Q.24 [N-14], Table A-1, indicates that the legacy switching equipment in the countries surveyed expects a minimum recognizable signal duration of 40 ms, a minimum pause between signals of 40 ms, and a maximum signalling rate of 8 to 10 digits per second depending on the country.

   Event  Encoding   Type  Volume?
          (decimal)
   0--9   0--9       tone  yes
   *      10         tone  yes
   #      11         tone  yes
   A--D   12--15     tone  yes

   Table 1: DTMF named events

3.2 Data Modem and Fax Events

This section summarizes the control events and tones that can appear on a subscriber line serving a fax machine or modem. Their purpose is to support negotiation, start-up and takedown of FAX and modem sessions and transitions between operating modes. The actual FAX and modem content is carried by other payload types (e.g., G.711 [I-3], T.38 [I-8], or, in specific circumstances, V.150.1 [I-19] modem relay, RFC 2793 [I-1], or CLEARMODE [I-2]).

The events are organized into several groups, corresponding to the ITU-T Recommendation in which they are defined.

NOTE: implementors SHOULD NOT rely on the descriptions of the various modem protocols described below without consulting the original references (generally ITU-T Recommendations). The descriptions are provided in this document to give a context for the use of the events defined here. They frequently omit important details needed for implementation.

The typical application of these events is to allow the Internet to serve as a bridge between terminals operating on the PSTN. This application is characterized as follows:

- each gateway will act both as sender and as receiver;
- time constraints apply to the exchange of signals, making the early identification and reporting of events desirable so that receiver playout can proceed in timely fashion;
- transfer of the events must be reliable.

In some cases, an implementation may simply ignore certain events, such as fax tones, that do not make sense in a particular environment. Section 2.4.1 specifies how an implementation can use the SDP "fmtp" parameter within an SDP description to indicate, by omission, its inability to understand a particular event or range of events.
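Below is a minimal Python sketch of how a gateway might read a peer's fmtp event list and find which events it cannot send to that peer. Section 2.4.1 is not reproduced here. The sketch assumes the comma-separated list of single codes and hyphenated ranges commonly seen in telephone-event fmtp lines (for example, the "0-15" in "a=fmtp:101 0-15"). The function names are hypothetical.

```python
def parse_event_list(value: str) -> set[int]:
    """Parse an fmtp event list such as "0-15,32-49" into a set of codes."""
    events: set[int] = set()
    for item in value.replace(" ", "").split(","):
        if not item:
            continue
        if "-" in item:
            low_text, high_text = item.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(item)
        # The event field is 8 bits wide, so codes run from 0 to 255.
        if not 0 <= low <= high <= 255:
            raise ValueError(f"bad event range: {item!r}")
        events.update(range(low, high + 1))
    return events


def unsupported_events(wanted: set[int], advertised: str) -> set[int]:
    """Events a sender would like to use that the peer did not list."""
    return wanted - parse_event_list(advertised)
```

Suppose a gateway wants to report ANSam (34) and a V.21 channel 1 "1" bit (38), but the peer advertised only "0-15". The gateway would find both codes missing and would need to convey those signals some other way, such as in the audio payload.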
Regardless of which events they support, implementations MUST be prepared to send and receive data signals using payload types other than telephone-event, simultaneously with the use of the latter. This is discussed further in section 5.3.

A further word on time constraints is in order. Time constraints governing the duration of tones do not pose a problem when using the telephone-events payload type: the payload specifies the duration and the receiving gateway can play out the tones accordingly. Problems arise when time constraints are specified for the duration of silence between tones. A silent period of "at least x ms" is not a problem -- event notifications can be received late, but they can still be played out at their specified durations. The problem comes with requirements of silence for "exactly" some period or for "at most" some period.

The most general constraint of the latter type has to do with the operation of echo suppressors (ITU-T Rec. G.164 [N-10]) and echo cancellers (ITU-T Rec. G.165 [N-11]). These devices may re-activate after as little as 100 ms of no signal on the line. As a result, in any situation where echo suppressors or cancellers must be disabled for signalling to work, tone events must be reported quickly enough to ensure that these devices do not become re-enabled. This principle is reflected in the succeeding sections.

3.2.1 V.8bis Events

Recommendation V.8bis [N-21] is a general procedure for two endpoints to establish each other's capabilities and to transition between different operating modes, both at call startup and after the call has been established. It supports many of the same terminals as V.8 [N-20] (see below), but allows more detailed parameter negotiation. It lacks support for some of the older V-series modems defined in V.8, but adds capabilities for simultaneous voice and data, H.324 [I-6] multilink, and T.120 [I-10] conferencing.
The ability to change operating modes in mid-call (e.g., to provide alternating voice and data) is unavailable in V.8.

V.8bis distinguishes between signals and messages. The V.8bis signals (ESi/ESr, MRe/MRd, and CRe/CRd) consist of tones, as described in the next paragraph. The V.8bis messages (MS, CL, CLR, ACK(1), ACK(2), NAK(1), NAK(2), NAK(3), and NAK(4)) consist of sequences of bits transported over V.21 [N-23] modulation.

Signals are intended to be comprehensible at the receiver even in the presence of voice content. They consist of two tone segments. The first segment consists of a dual frequency tone held for 400 ms, and has the function of preparing the receiver and any in-line echo suppressor or canceller for what follows. The specific frequencies depend only on whether the signal is from the initiator or the responder in a transaction. The second segment follows immediately after the first, and is a single tone held for 100 ms. Its frequency identifies which of the six defined signals is being sent. The complete V.8bis strategy for dealing with echo suppressors or cancellers is described in Rec. V.8bis Appendix III. The only silent period constraints imposed are of the "at least" type, posing no difficulties for the use of the telephone-events payload.

V.8bis messages can be transmitted only when voice content is absent. The V.8bis protocol uses signals to ensure that the connection is operating in non-voice mode before passing messages. At the physical level, V.8bis messages use V.21 [N-23] frequency-shift signalling to transfer message content. V.21 is described in the next section. V.8bis uses V.21 in half-duplex mode, assigning the lower channel to the initiator and the upper channel to the responder.
The V.21 signals are preceded by a 100 ms preamble of 1650 Hz tone (the V.21 upper channel mark tone), which must be omitted if the preceding signal was ESi or ESr. (The second segment of ESr is also 1650 Hz.) The sender MAY report this preamble tone either as a single extended V.21 upper channel "1" event, or as a series of "1" events of normal duration. It is not necessary to provide an event report before the preamble has completed, since the receiver will still be playing out the preceding V.8bis signal when this happens (see below).

The events associated with V.8bis signals are described further below. No events are defined for V.8bis messages, only for the individual bits transmitted using V.21, so a brief description follows:

- the V.8bis CL message describes the sending terminal's capabilities;
- the CLR message also describes capabilities, but indicates that the sender wants to receive a CL in return;
- the MS message establishes a particular operating mode;
- the ACK and NAK messages are used to terminate the message transactions.

The V.8bis messages are organized as a sequence of octets. The first two to five octets are HDLC flags. Then comes a message type identifier (four bits), a V.8bis version identifier (four bits), zero to two more octets of identifying information, followed by zero or more information field parameters in the form of bit maps. An individual bit map is one to five octets in length. Up to 64 octets of non-standard information may also be present. The information fields are followed by a checksum and one to three HDLC flags.

Applications supporting V.8bis signalling using the telephone-events payload MUST transfer V.8bis messages in the form of sequences of bits, using the V.21 bit events defined in the next section.
The transmitted information MUST include the complete contents of the message: the initial HDLC flags, the information field, the checksum, and the terminating HDLC flags. Transmission MUST also include the extra "0" bits added according to the procedures of Rec. V.8bis clause 7.2.8 to prevent false recognition of HDLC flags at the receiver. Implementors should note that, because of these extra "0" bits, V.8bis messages as transmitted on the wire will in general not consist of a whole number of octets. Sending implementations MAY choose to vary the packetization interval to include exactly one octet of information plus any extra "0" bits inserted into that octet; the resulting variation will be insignificant compared with the amount of buffering caused by the preceding V.8bis signal (see below).

The power levels of the V.8bis and V.21 signals are subject to national regulation. Thus it seems suitable to model V.8bis events as tones for which the volumes SHOULD be specified by the sender. If the receiver is rendering the V.8bis tones as audio content for onward transmission, the receiver MAY use the volumes contained in the event reports, or MAY modify the volumes to match downstream national requirements.

Table 2 summarizes the V.8bis signal codepoints defined in this document. The individual signal events are described following the table.

The sender SHALL set the RTP timestamp for these events to indicate the time at which the beginning of segment 1 was detected. The sender SHOULD send an interim report for the event as soon as it has been identified. The end of the event SHALL be indicated when the end of segment 2 has been detected.

Note: since the sender cannot identify the specific event until segment 2 has been detected, the receiver will receive the first report of the event more than 400 ms after it has begun.
This has the implication that the receiver MUST be able to buffer more than 400 ms of the V.21 events which follow (i.e., more than 120 events at the nominal V.21 rate of 300 bits/s).

   Event  Frequencies (Hz)          Encoding   Type  Volume?
          Segment 1    Segment 2    (decimal)
   CRdi   1375 + 2002  1900         41         tone  yes
   CRdr   1529 + 2225  1900         42         tone  yes
   CRe    1375 + 2002  400          43         tone  yes
   ESi    1375 + 2002  980          44         tone  yes
   ESr    1529 + 2225  1650         45         tone  yes
   MRdi   1375 + 2002  1150         46         tone  yes
   MRdr   1529 + 2225  1150         47         tone  yes
   MRe    1375 + 2002  650          48         tone  yes

   Table 2: Events for V.8bis signals

CRdi: V.8bis [N-21] Capabilities Request (CRd) signal when used to initiate a transaction (Rec. V.8bis Table 7, transactions 2, 3, 8, and 9). This signal requests that the remote station transition from telephony mode to an information transfer mode and requests the transmission of a capabilities list message by the remote station. CRdi is sent by the calling station at call startup (transactions 2 and 3), by the initiating station subsequent to call startup (also transactions 2 and 3), or by the answering station in response to MRd at call startup if the answering station originally issued an MRe and now wants to know the calling station's capabilities (transactions 8 and 9).

CRe: V.8bis [N-21] Capabilities Request (CRe) signal, used specifically by an automatic answering station to initiate V.8bis signalling (Rec. V.8bis Table 7, transactions 2, 3, 12, and 13). Like CRdi, this signal requests that the remote station transition from telephony mode to an information transfer mode and requests the transmission of a capabilities list message by the remote station.
CRdr: V.8bis [N-21] Capabilities Request (CRd) signal when used by the calling station as a response to MRe or CRe during call startup to allow the calling station to control the outcome of the message transaction (Rec. V.8bis Table 7, transactions 10-13). Like CRdi and CRe, this signal requests that the remote station transition from telephony mode to an information transfer mode and requests the transmission of a capabilities list message by the remote station.

ESi: V.8bis [N-21] Escape Signal (ESi). This signal requests that the remote station transition from telephony mode to an information transfer mode. ESi is used to precede a message which initiates a V.8bis transaction if the transaction is not initiated by MRx or CRx (Rec. V.8bis Table 7, transactions 4, 5, and 6). It is intended to allow the responding station to detect the arrival of an initiating signal in the presence of local voice or other audio. PSTN connections with network echo suppressors may be accommodated by inserting a 1.5 s silent interval between the ESi signal and the transmission of the MS, CL or CLR message.

ESr: V.8bis [N-21] Escape Signal (ESr) has the same meaning as ESi, but is used as a response to MRe or CRe to prepare the way for an MS, CL, or CLR message (Rec. V.8bis Table 7, transactions 1, 2, and 3). Used in this way, it turns off any announcement being generated by the automatic answering station during message transmission.

MRdi: V.8bis [N-21] Mode Request (MRd) signal when used to initiate a transaction (Rec. V.8bis Table 7, transaction 1). This signal requests that the remote station transition from telephony mode to an information transfer mode and requests the transmission of a mode select message by the remote station. In particular, signal MRdi is sent by the initiating station during the course of a call, or by the calling station at call establishment.
MRe: V.8bis [N-21] Mode Request (MRe) signal, sent by an automatic answering station during call setup. Like MRdi, this signal requests that the remote station transition from telephony mode to an information transfer mode and requests the transmission of a mode select message by the remote station.

MRdr: V.8bis [N-21] Mode Request (MRd) signal when used to respond to an MRe in order to give the calling station control over the outcome of the message transaction (Rec. V.8bis Table 7, transactions 7, 8, and 9). It has the same meaning as MRdi and MRe.

3.2.2 V.21 Events

V.21 [N-23] is a modem protocol offering data transmission at a maximum rate of 300 bits/s. Two channels are defined, supporting full duplex data transmission if required. One channel uses frequencies 980 Hz for "1" and 1180 Hz for "0"; the other channel uses frequencies 1650 Hz for "1" and 1850 Hz for "0". The modem can operate synchronously or asynchronously.

V.21 is used by other protocols (e.g., V.8bis, V.18, T.30) for transmission of control data, and is also used in its own right between text terminals. The telephone-events payload type SHOULD NOT be used to carry user data as opposed to control data -- other payload types such as G.711 [I-3], RFC 2793 [I-1], or V.150.1 [I-19] modem relay are more suitable for that purpose.

The V.21 events are summarized in Table 3. Sending implementations MUST report a completed event for every bit transmitted (rather than only at transitions between "0" and "1"). Implementations SHOULD pack multiple events into one packet, using the procedures of section 2.5.1.5. Eight to ten bits is a reasonable packetization interval.

Reliable transmission of V.21 events is important to prevent data corruption.
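The Python sketch below shows per-bit reporting and packing into contiguous events. It uses the event codes of Table 3 and assumes an 8000 Hz timestamp clock. At 300 bits/s, one bit lasts 26.67 timestamp units, so the sketch rounds each bit boundary instead of each duration. Each event therefore starts exactly where the previous one ended (sections 2.5.1.5 and 2.5.2.3), and no drift accumulates. The sketch models only the logical content of a packet: the first event's timestamp followed by each event's code and duration. It does not model the wire encoding, RFC 2198 redundancy, or retransmission. The function names are hypothetical.

```python
# Event codes from Table 3: (V.21 channel, bit value) -> event code.
V21_EVENT = {(1, 0): 37, (1, 1): 38, (2, 0): 39, (2, 1): 40}


def v21_packets(bits, channel, start_ts, per_packet=8,
                clock_rate=8000, bit_rate=300):
    """Group V.21 bit events into packets of contiguous events.

    Returns a list of (rtp_timestamp, [(event_code, duration), ...]).
    Bit i occupies [i*clock_rate//bit_rate, (i+1)*clock_rate//bit_rate)
    timestamp units after start_ts, so at 300 bit/s and 8000 Hz each
    duration is 26 or 27 units and the total never drifts.
    """
    packets = []
    for first in range(0, len(bits), per_packet):
        events = []
        for i in range(first, min(first + per_packet, len(bits))):
            begin = i * clock_rate // bit_rate
            end = (i + 1) * clock_rate // bit_rate
            events.append((V21_EVENT[(channel, bits[i])], end - begin))
        packets.append((start_ts + first * clock_rate // bit_rate, events))
    return packets


def event_start_times(packet):
    """Receiver side: each event starts where the previous one ended."""
    ts, events = packet
    starts = []
    for _, duration in events:
        starts.append(ts)
        ts += duration
    return starts
```

One second of bits (300 events) packed eight per packet gives 38 packets, and their durations sum to exactly 8000 units.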
Reporting an event per bit rather than per transition increases reporting redundancy and thus reporting reliability, since each event completion is retransmitted three times as described in section 2.5.1.4. To reduce the number of packets required for reporting, implementations SHOULD carry the retransmitted events using RFC 2198 [N-2] redundancy encoding.

   Event                     Frequency  Encoding   Type  Volume?
                             (Hz)       (decimal)
   V.21 channel 1, "0" bit   1180       37         tone  yes
   V.21 channel 1, "1" bit   980        38         tone  yes
   V.21 channel 2, "0" bit   1850       39         tone  yes
   V.21 channel 2, "1" bit   1650       40         tone  yes

   Table 3: Events for V.21 signals

3.2.3 V.8 Events

V.8 [N-20] is an older general negotiation and control protocol, supporting startup for the following terminals: H.324 [I-6] multimedia, V.18 [-22] text, T.101 [I-9] videotext, T.30 [N-19] send or receive FAX, and a long list of V-series modems including V.34 [I-15], V.90 [I-16], V.91 [I-17], and V.92 [I-18]. In contrast to V.8bis [N-21], in V.8 only the calling terminal can determine the operating mode.

V.8 does not use the same terminology as V.8bis. Rather, it defines four signals which consist of bits transferred by V.21 [N-23] at 300 bits/s: the call indicator signal (CI), the call menu signal (CM), the CM terminator (CJ), and the joint menu signal (JM). In addition, it uses tones defined in V.25 [N-25] and T.30 [N-19] (described below), and one tone (ANSam) defined in V.8 itself. The calling terminal sends using the V.21 low channel; the answering terminal uses the high channel.

The basic protocol sequence is subject to a number of variations to accommodate different terminal types. A pure V.8 sequence is as follows:

1) After an initial period of silence, the calling terminal transmits the V.8 CI signal. It repeats CI at least three times, continuing with occasional pauses until it detects ANSam tone.
The CI indicates whether the calling terminal wants to function as H.324, V.18, T.30 send, T.30 receive, or a V-series modem.

2) The answering terminal transmits ANSam after detecting CI. ANSam will disable any G.164 [N-10] echo suppressors on the circuit after 400 ms and any G.165 [N-11] echo cancellers after one second of ANSam playout.

3) On detecting ANSam, the calling terminal pauses at least half a second, then begins transmitting CM to indicate detailed capabilities within the chosen mode.

4) After detecting at least two identical sequences of CM, the answering terminal begins to transmit JM, indicating its own capabilities (or offering an alternative terminal type if it cannot support the one requested).

5) After detecting at least two identical sequences of JM, the calling terminal completes the current octet of CM, then transmits CJ to acknowledge the JM signal. It pauses exactly 75 ms, then starts operating in the selected mode.

6) The answering terminal transmits JM until it has detected CJ. At that point it stops transmitting JM immediately, pauses exactly 75 ms, then starts operating in the selected mode.

The CI, CM, and JM signals all consist of a fixed sequence of ten "1" bits followed by a signal-dependent pattern of ten synchronization bits, followed by one or more octets of variable information. Each octet is preceded by a "0" start bit and followed by a "1" stop bit. The combination of the synchronization pattern and V.21 channel uniquely identifies the message type. The CJ signal consists of three successive octets of all zeros with stop and start bits but without the preceding "1"s and synchronizing pattern of the other signals.

If both gateways support V.21 bit events (section 3.2.2), the sending gateway for a given message MUST report each instance of a CM, JM, or CJ signal as a series of V.21 bit events.
A packetization interval of 10 events per packet is suggested, since V.8 signals are organized in this way.

If either gateway does not support the CI event in Table 4, the complete CI message MUST also be signalled as a series of V.21 bit events. If both gateways support the CI event in Table 4, the sender MUST send a first report of this event no later than when the last bit of the synchronization pattern for CI (ten '1's followed by '00000 00001') has been recognized. The beginning of the event according to the RTP timestamp MUST be the time at which the first of the ten '1's was detected. The event completion MUST be indicated only after the end of the complete CI message has been reached. In addition to this indication of the CI event, the sender MUST report the content of the call function octet which follows the synchronization pattern, including stop and start bits, as a series of V.21 bit events.

The overlapping nature of V.8 signalling means that there is no risk of silence exceeding 100 ms once ANSam has disabled any echo control circuitry. However, the 75 ms pause before entering operation in the selected data mode will require both the calling and the answering gateways to recognize the completion of CJ, so they can change from playout of telephone-events to playout of the data-bearing payload after the 75 ms period.

   Event                 Frequency     Encoding   Type  Volume?
                         (Hz)          (decimal)
   ANSam                 2100 x 15     34         tone  yes
   /ANSam (phase rev.)   2100 x 15     35         tone  yes
   CI                    (V.21 bits)   53         tone  yes

   Table 4: Events for V.8 signals

Modified answer tone ANSam consists of a sinewave signal at 2100 Hz with phase reversals at an interval of 450 ms, amplitude-modulated by a sine wave at 15 Hz. The modulated envelope ranges in amplitude between 0.8 and 1.2 times its average amplitude. The average transmitted power is governed by national regulations.
Thus it makes sense to indicate the volume of the signal. The ANSam phase reversals are allowed only if echo canceller disabling is required.

The sender MUST report ANSam as soon as it is recognized, providing updates at reasonable intervals as it continues. However, an ANSam event packet SHOULD NOT be sent until it is possible to discriminate between an ANSam event and an ANS event (see V.25 events, below). If a phase reversal is detected, the sender MUST report completion of the ANSam event and beginning of the /ANSam event at the time that the reversal was detected. If another phase reversal is detected, the sender MUST report the end of the /ANSam event and the beginning of an ANSam event, continuing in this way until the tone is removed.

3.2.4 V.25 Events

V.25 [N-25] is a start-up protocol antedating V.8 [N-20] and V.8bis [N-21]. It specifies the exchange of two tone signals:

CT: "The calling tone consists of a series of interrupted bursts of 1300 Hz tone, on for a duration of not less than 0.5 s and not more than 0.7 s and off for a duration of not less than 1.5 s and not more than 2.0 s." [N-25]. Modems not starting with the V.8 call initiation signal often use this tone.

ANS: Answering tone. This 2100 Hz tone is used to disable echo suppression for data transmission [N-25], [N-19]. For fax machines, Recommendation T.30 [N-19] refers to this tone as called terminal identification (CED) answer tone. ANS differs from V.8 ANSam in that the latter varies in amplitude due to modulation by a 15 Hz signal.

V.25 specifically includes procedures for disabling echo suppressors as defined by ITU-T Rec. G.164 [N-10]. However, G.164 echo suppressors have now for the most part been replaced by G.165 [N-11] echo cancellers, which require phase reversals in the disabling tone (see ANSam above).
As a result, V.25 was modified in July 2001 to say that phase reversal in the ANS tone is required if echo cancellers are to be disabled.

One possible V.25 sequence is as follows:

1) The calling terminal starts generating CT as soon as the call is connected.

2) The called terminal waits in silence for 1.8 to 2.5 s after answer, then begins to transmit ANS continuously. If echo cancellers are on the line, the phase of the ANS signal is reversed every 450 ms. ANS will not reach the calling terminal until the echo control equipment has been disabled. Since this takes about a second, it can only happen in the gap between one burst of CT and the next.

3) Following detection of ANS, the calling terminal may stop generating CT immediately or wait until the end of the current burst to stop. In any event, it must wait at least 400 ms (at least 1 s if phase reversal of ANS is being used to disable echo cancellers) after stopping CT before it can generate the calling station response tone. This tone is modem-specific and is not specified in V.25.

4) The called terminal plays out ANS for 2.6 to 4.0 seconds or until it has detected calling station response for 100 ms. It waits 55-95 ms (nominal 75 ms) in silence. (Note that the upper limit of 95 ms is rather close to the point at which echo control may reestablish itself.) If the reason for ANS termination was timeout rather than detection of calling station response, the called terminal begins to play out ANS again to maintain disabling of echo control until the calling station responds.

The events defined for V.25 signalling are shown in Table 5. The gateway at the calling end SHOULD use a packetization interval smaller than the nominal duration of a CT burst, to ensure that CT playout at the called end precedes the sending of ANS from that end.
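As an illustration of the recognition step that has to precede CT reporting, the following Python sketch checks measured burst and gap durations against the V.25 cadence limits quoted above. It is a minimal sketch, assuming a separate (hypothetical) tone detector has already confirmed the 1300 Hz frequency and measured each on/off interval; the function names are illustrative only.

```python
from typing import List, Tuple

# V.25 calling tone (CT) limits quoted above, in milliseconds:
# 1300 Hz bursts, on for 0.5-0.7 s and off for 1.5-2.0 s.
CT_FREQ_HZ = 1300
CT_EVENT = 49            # encoding of CT in Table 5
CT_ON_MS = (500, 700)
CT_OFF_MS = (1500, 2000)


def looks_like_ct(on_ms: float, off_ms: float) -> bool:
    """Return True if one measured burst/gap pair fits the V.25 CT cadence."""
    on_ok = CT_ON_MS[0] <= on_ms <= CT_ON_MS[1]
    off_ok = CT_OFF_MS[0] <= off_ms <= CT_OFF_MS[1]
    return on_ok and off_ok


def cadence_is_ct(bursts: List[Tuple[float, float]]) -> bool:
    """Return True if every (on_ms, off_ms) pair measured so far fits CT.

    The 1300 Hz frequency check is assumed to happen elsewhere, in the
    hypothetical detector that produced these measurements.
    """
    return bool(bursts) and all(looks_like_ct(on, off) for on, off in bursts)
```

Given the packetization guidance above, the first CT report would in practice probably have to leave during the first burst, so a cadence check like this is better suited to confirming the tone than to gating the first report.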
The gateway at the called end MUST report ANS as soon as it is recognized, providing updates at reasonable intervals as it continues. However, an ANS event packet SHOULD NOT be sent until it is possible to discriminate between an ANS event and an ANSam event (see V.8 events, above). If a phase reversal is detected, the sender MUST report completion of the ANS event and beginning of the /ANS event at the time that the reversal was detected. If another phase reversal is detected, the sender MUST report the end of the /ANS event and the beginning of an ANS event, continuing in this way until the tone is removed.

   Event         Frequency    Encoding    Type   Volume?
                 (Hz)         (decimal)
   Answer tone   2100         32          tone   yes
   (ANS)
   /ANS          2100 rev     33          tone   yes
   CT            1300         49          tone   yes

                 Table 5: Events for V.25 signals

3.2.5 T.30 Events

ITU-T Recommendation T.30 [N-19] defines the procedures used by Group III FAX terminals. The pre-message procedures for which the events of this section are defined are used to identify terminal capabilities at each end and negotiate operating mode. Post-message procedures are also included, to handle cases such as multiple document transmission. FAX terminals support a wide variety of protocol stacks, so T.30 has a number of options for control protocols and sequences.

T.30 defines two tone signals used at the beginning of a call. The CNG signal is sent by the calling terminal. It is a pure 1100 Hz tone played in bursts: 0.5 s on, 3 s off. It continues until timeout or until the calling terminal detects a response. The called terminal waits in silence for at least 200 ms. It may then return CED tone, which is identical to V.25 ANS, or else V.8 ANSam if it has V.8 capability. If ANSam is returned and the calling terminal has V.8 capability, it transmits CI to begin a V.8 negotiation. Otherwise, the calling and called terminals enter the T.30 negotiation phase.
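The phase-reversal reporting rules given above for ANSam and ANS (and therefore for CED, which uses the same tone and codepoints as ANS) follow one pattern: each reversal closes the current event and opens its counterpart at the same instant. A minimal Python sketch of that pattern follows, assuming reversal times have already been detected and expressed in milliseconds relative to tone onset. The function name is hypothetical, and a real gateway would emit these reports incrementally while the tone is still playing rather than after it ends.

```python
from typing import List, Tuple

# (base event, phase-reversed event) encodings from Tables 4 and 5.
ANS_PAIR = (32, 33)     # ANS and /ANS; CED and /CED use the same codes
ANSAM_PAIR = (34, 35)   # ANSam and /ANSam


def split_at_reversals(pair: Tuple[int, int], start_ms: int,
                       reversals_ms: List[int], end_ms: int
                       ) -> List[Tuple[int, int, int]]:
    """Turn one answer tone into (event code, start_ms, end_ms) reports.

    Each detected phase reversal ends the current event and starts the
    other event of the pair at the same instant, so the reported events
    alternate base, reversed, base, ... until the tone is removed.
    """
    events = []
    which = 0                # 0 = base event, 1 = phase-reversed event
    seg_start = start_ms
    for t in sorted(reversals_ms):
        if not seg_start < t < end_ms:
            raise ValueError(f"reversal at {t} ms lies outside the tone")
        events.append((pair[which], seg_start, t))
        which ^= 1
        seg_start = t
    events.append((pair[which], seg_start, end_ms))
    return events
```

For example, an ANSam tone lasting 1200 ms with reversals at 450 ms and 900 ms yields ANSam (34), then /ANSam (35), then ANSam again.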
In the negotiation phase the terminals exchange binary messages using V.21 signals, high channel frequencies only. Each message is preceded by a one-second (nominal) preamble consisting entirely of HDLC flag octets (0x7E). This flag has the function of preparing echo control equipment for the message which follows. The pre-transfer messages exchanged using the V.21 coding are:

Digital Identification Signal (DIS): Characterizes the standard ITU-T capabilities of the called terminal.

Digital Transmit Command (DTC): The digital command response to the standard capabilities identified by the DIS signal.

Digital Command Signal (DCS): The digital set-up command responding to the standard capabilities identified by the DIS signal.

Confirmation To Receive (CFR): A digital response confirming that the entire pre-message procedure has been completed and the message transmissions may commence.

If the calling terminal wishes to transmit a document, the three messages exchanged are DIS (from the called terminal), DCS, and CFR. If it wishes to receive, the sequence changes to DIS, DTC, DCS, and CFR. Each message may consist of multiple frames, each bounded by HDLC flags. The messages are organized as a series of octets, but like V.8bis, T.30 calls for the insertion of extra "0" bits to prevent spurious recognition of HDLC flags.

T.30 also provides for the transmission of control messages after document transmission has completed (e.g., to support transmission of multiple documents). The transition back from the modem used for document transmission (V.17 [I-11], V.27ter [I-13], V.29 [I-14], V.34 [I-15]) to V.21 signalling is preceded by 75 ms (nominal) of silence. Control message transmission is preceded by the preamble described above.

Before CFR, the transmitting terminal sends a training signal consisting of a steady string of V.21 high channel zeros (1850 Hz tone) for 1.5 s.
The sender MAY report this training signal either as a single extended V.21 upper channel "0" event, or as a series of "0" events of normal duration. The event(s) MUST be reported as soon as the training signal is recognized, with updates at reasonable intervals thereafter.

Applications supporting T.30 signalling using the telephone-events payload MUST transfer T.30 messages in the form of sequences of bits, using the V.21 bit events defined in the next section. The transmitted information MUST include the complete contents of the message: the initial HDLC flags, the information field, the checksum, and the terminating HDLC flags. Transmission MUST also include the extra "0" bits added to prevent false recognition of HDLC flags at the receiver. Implementors should note that, because of these extra "0" bits, T.30 messages as transmitted on the wire will in general not come out to a whole number of octets. Sending implementations MAY choose to vary the packetization interval to include exactly one octet of information plus any extra "0" bits inserted into that octet.

The events defined for T.30 signalling are shown in Table 6. The CED and /CED events represent exactly the same tone signals as V.25 ANS and /ANS, and are given the same codepoints; they are reproduced here only for convenience.

For reporting of CNG, the gateway at the calling end SHOULD use a packetization interval smaller than the nominal duration of a CNG burst, to ensure that CED has time to disable echo control before it times out. The gateway at the called end MUST report CED as soon as it is recognized, providing updates at reasonable intervals as it continues. However, a CED event packet SHOULD NOT be sent until it is possible to discriminate between a CED event and an ANSam event (see V.8 events, above). If a phase reversal is detected, the sender MUST report completion of the CED event and beginning of the /CED event at the time that the reversal was detected.
If another phase reversal is detected, the sender MUST report the end of the /CED event and the beginning of a CED event, continuing in this way until the tone is removed.

The sending gateway SHOULD report the V.21 preamble flag event as soon as it is identified, with updates at intervals which are multiples of one octet of transmission time (nominally 26.67 ms, i.e., 8 bits at 300 bits/s) until it completes. The preamble is reported as a single event; reports of the individual bits making it up MUST NOT be sent. The end of the event is reported when a pattern of V.21 bits other than an HDLC flag is observed. This means that the V.21 preamble event absorbs the initial HDLC flags of the following message.

   Event           Frequency    Encoding    Type   Volume?
                   (Hz)         (decimal)
   Calling tone    1100         36          tone   yes
   (CNG)
   Called tone     2100         32          tone   yes
   (CED)
   /CED            2100         33          tone   yes
                   ph. rev.
   V.21 preamble   (V.21 bits)  54          tone   yes
   flag

                 Table 6: Events for T.30 signals

3.2.6 V.18 Events

ITU-T Recommendation V.18 [N-22] defines a terminal for text conversation, possibly in combination with voice. What follows is a description of the use of telephone events for V.18 startup. In all cases, once the startup procedures have been completed, the gateways SHOULD use another payload type to transfer the content of the text conversation.

V.18 is intended to interoperate with a variety of legacy text terminals, so its start-up sequence can consist of a series of stimuli designed to determine what is at the other end. Two V.18 terminals talking to each other will use V.8bis to negotiate startup, and continue at the physical level with V.21 at 300 bits/s carrying 7-bit characters bounded by start and stop bits.
The V.18 terminal is also designed to interoperate with:

- Baudot [I-21], a five-bit character encoding nominally operating at 45.45 or 50 bits/s with frequencies 1800 Hz = "0", 1400 Hz = "1";

- Q.23 [N-13] (DTMF), which uses combinations of "*" and "#" as escapes to achieve a full repertoire of characters; these combinations are documented in V.18 Annex B;

- EDT, which is V.21 [N-23] operating at 110 bits/s in half-duplex mode (lower channel only); characters are 7-bit IA5 plus an initial start bit, a trailing parity bit, and two stop bits;

- Bell 103 mode (documented in Recommendation V.18 Annex D), which is structurally similar to V.21, but uses different frequencies: lower channel, 1070 Hz = "0", 1270 Hz = "1"; upper channel, 2025 Hz = "0", 2225 Hz = "1"; characters are US ASCII framed by one start bit, one trailing parity bit, and one stop bit;

- V.23 [I-12] based videotex, in Minitel and Prestel versions. V.23 offers a forward channel operating at 1200 bits/s if possible (2100 Hz = "0", 1300 Hz = "1") or otherwise at 600 bits/s (1700 Hz = "0", 1300 Hz = "1"), and a 75 bits/s backward channel which transmits 390 Hz (continuous "1"s) except when "0" is to be transmitted (450 Hz);

- a non-V.18 text terminal using V.21 [N-23] at 300 bits/s. Characters are 7-bit national (e.g., US ASCII) with a start bit, parity, and one stop bit.

The startup sequences for all these different terminal types are naturally quite different. The V.18 initial startup sequence addresses itself to V.8-capable terminals and V.21 terminals and, by the combination of signals, to V.23 videotex terminals. During the initial startup sequence the V.18 terminal listens for frequency responses characterizing the other terminal types. If it does not make contact in the preliminary step, it probes for each type specifically.
By the nature of the application, V.18 has been designed to provide an extremely robust startup capability. The details of V.18 startup are described below. The point to make here is that gateways intending to serve V.18 MUST be prepared to transfer information using payload types other than telephone-events from the start of the session. Events have been defined as shown in Table 7 to allow the sending gateway to indicate the nature of the modulated content it is receiving. However, the alternative payload type used to transfer the content may (for example, in the case of RFC 2793) be independent of the type of modulation received at the sending gateway. A receiving gateway MUST NOT rely on the receipt of a V.18-related event to control playout at its end if content is available in another payload type. Note that ANS2225 was defined in RFC 2833, but the other events are new to this document.

   Event      Bit Rate   Frequency    Encoding    Type    Volume?
              (bits/s)   (Hz)         (decimal)
   ANS2225    N/A        2225         52          tone    yes
   V21L110    110        980/1180     55          other   no
   B103L300   300        1070/1270    56          other   no
   V23Main    600/1200   1700-2100    57          other   no
                         /1300
   V23Back    75         450/390      58          other   no
   Baud4545   45.45      1800/1400    59          other   no
   Baud50     50         1800/1400    60          other   no

               Table 7: Events for V.18 interworking

ANS2225: This 2225 Hz answer tone is described in ITU-T Recommendation V.18, Annex D [N-22] for Bell 103 class modems operating in the text telephone mode. It is also referred to in ITU-T Recommendation V.22 [N-24]. This is a pure tone with no amplitude modulation and no semantics attached to phase reversals, if there are any. It is necessary to accommodate it for completeness, and for compliance with various legal ordinances. A distinct codepoint must be allocated to this event since it must be differentiated from the normal 2100 Hz answer tone when reproduced at the far-end gateway.
V21L110: indicates that the sending device has detected V.21 modulation operating in the lower channel at 110 bits/s.

B103L300: indicates that the sending device has detected Bell 103 class modulation operating in the low channel at 300 bits/s.

V23Main: indicates that the sending device has detected V.23 modulation operating in the high-speed channel.

V23Back: indicates that the sending device has detected V.23 modulation operating in the 75 bits/s back-channel.

Baud4545: indicates that the sending device has detected Baudot modulation operating at 45.45 bits/s.

Baud50: indicates that the sending device has detected Baudot modulation operating at 50 bits/s.

The V.18 startup procedure for the calling terminal requires it to transmit a V.18 sequence in the following cycle:

1) Silence for one second.

2) Repeat the following steps three times:

   i) Four repetitions of V.8 CI on the V.21 low channel, without preamble. If using telephone-events, the sending gateway SHOULD report each CI as the combination of a CI event as defined in Table 4, overlapped by a series of V.21 low channel bit events expressing the final octet with its start and stop bits. The final octet for a V.18 text terminal is defined in V.8 to be '01000 00101'.

   ii) Silence for two seconds.

3) Play out the XCI signal, a three-second string of V.23 bit patterns defined in clause 3.13 of Recommendation V.18 and using the V.23 1200 bits/s upper channel. The sending gateway MUST provide the pattern using an alternate payload type, but MAY also send the V23Main event defined in Table 7 for the duration of XCI playout. The receiving gateway MUST be prepared to play out the pattern from that alternate payload type without relying on receipt of the V23Main event.

The second and third steps are repeated until a response is detected.
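The timing of this calling cycle can be sketched in Python as follows. The sketch assumes that each CI occupies 30 bits at 300 bits/s (the ten '1's, the ten-bit synchronization pattern, and the ten-bit call function octet quoted above), which gives 100 ms per CI; that figure is derived here for illustration and is not stated by this document. The function names are hypothetical.

```python
from typing import List, Tuple

V21_BITS_PER_S = 300

# One V.18 CI, assumed here to be: ten '1's, the sync pattern
# '00000 00001', and the call function octet for a V.18 text terminal
# ('01000 00101', start and stop bits included).
CI_V18_BITS = "1" * 10 + "0000000001" + "0100000101"


def ci_duration_ms() -> float:
    """Air time of one CI on the V.21 low channel."""
    return len(CI_V18_BITS) * 1000 / V21_BITS_PER_S


def v18_calling_cycle(first: bool) -> List[Tuple[str, float]]:
    """(segment, duration_ms) list for one pass of the calling sequence.

    Step 1 (one second of silence) is only played before the first pass;
    steps 2 and 3 repeat until a response is detected.
    """
    segments: List[Tuple[str, float]] = []
    if first:
        segments.append(("silence", 1000.0))                # step 1
    for _ in range(3):                                        # step 2, three times
        segments.append(("4 x CI", 4 * ci_duration_ms()))   # step 2 i)
        segments.append(("silence", 2000.0))                  # step 2 ii)
    segments.append(("XCI", 3000.0))                          # step 3
    return segments
```

Under these assumptions the first pass lasts 11.2 s and each later pass 10.2 s, which bounds how long a calling-end gateway may have to keep relaying CI and XCI before a response appears.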
The following responses are possible:

- 2100 Hz modulated (ANSam) as defined in ITU-T Recommendation V.8; this would indicate a V.8-capable terminal. The V.18 terminal completes a V.8 negotiation to start up. The gateways MUST use the events as defined for V.8 to sustain this negotiation.

- 2100 Hz (ANS) as defined in ITU-T V.25; this could indicate a V.18, V.21 (300 bits/s), or V.23 terminal. The calling V.18 terminal transmits a 40-bit pattern (TXP) using the V.21 low channel and monitors the frequencies returned. The calling end gateway SHOULD send the TXP pattern as a sequence of V.21 low channel bit events. An answering V.18 terminal will return TXP, so the calling end gateway MUST be prepared to play the corresponding V.21 sequence back to the calling terminal.

- 2225 Hz; this indicates a Bell 103 class terminal in answer mode. The gateway at the answering end MUST report this as the ANS2225 event defined in this section. The event begins when the 2225 Hz tone is detected. Event updates should be provided at reasonable intervals until the tone is taken away.

- 1300 Hz; provided this persists for at least 1.7 s, it indicates a V.23-based terminal operating at 600 or 1200 bits/s. The calling terminal will enter V.23 mode, transmitting on the 75 bits/s V.23 back-channel. The gateway at the answering end MAY also report the V23Main event when it detects the 1300 Hz tone. When the calling V.18 terminal responds, the gateway at the calling end MAY also report the V23Back event.

- 1650 Hz; if this persists for at least 500 ms, it indicates a V.21 (300 bits/s) terminal. The calling V.18 terminal will enter into that mode of operation.

- 1400 or 1800 Hz; this indicates a Baudot terminal. The calling terminal will determine the line rate and enter into Baudot mode. Either gateway MAY send the Baud4545 or Baud50 event as applicable if and when it identifies the nature of the signals being passed.
- DTMF tones; these indicate a DTMF terminal. The calling terminal will enter DTMF mode.

- 980 or 1180 Hz; these indicate a V.21-based terminal running at either 110 or 300 bits/s, and using the low channel. The calling terminal does timing to make the distinction. If it observes continuous 980 Hz for at least 1.5 s, the calling terminal enters V.21 (300 bits/s) mode using the high channel for transmission. The gateway at the answering end SHOULD NOT use V.21 events to report the initial signals from the answering terminal. The tones payload type defined in this document MAY be used instead. A gateway receiving V.21 signals at 110 bits/s MAY report the V21L110 event once it has made a definitive determination of the line speed.

- 1270 Hz; this indicates a Bell 103 terminal operating in calling mode (lower channel). The V.18 terminal enters Bell 103 mode using the higher channel. The gateways MUST transmit the Bell 103 modem content using an alternative payload type, and MAY report the B103L300 or B103H300 event as applicable to the modulation received from the terminal at their end.

- 390 Hz (only when sending XCI); this indicates a V.23 terminal using the 75 bits/s channel. The V.18 terminal enters V.23 mode using the high-speed (1200 bits/s) channel. The gateway at the answering end MAY report the V23Back event. The gateway at the calling end MAY report the V23Main event.

Similar logic governs the actions taken by a V.18 terminal operating in answer mode.

3.3 Basic Subscriber Line Events

Table 8 summarizes the basic events applicable to a subscriber line. All of them except for the two line states "On Hook" and "Off Hook" and the "Flash" event are propagated from the network toward the line.
There are two typical applications for these events:

1) A gateway to which the line is attached reports line states and "Flash" to a call controller (possibly indirectly through another device) and propagates tones and ringing in the other direction. In this application, the gateway is being controlled in-band through the use of telephone-events rather than through a separate media gateway control protocol such as Megaco/H.248.

2) Tones and media are being passed between two gateways in the middle of the media path, where both ends of the call are in the PSTN. In this application, only a limited subset of the events in Table 8 are applicable. These are indicated by a "yes" in the "Mid-path?" column of Table 8.

It is rather difficult to define the "On Hook" state, since it is still possible to transmit information (ringing, information for displays) when the line is on hook. Moreover, an ISDN set can still signal while it is on hook. A working definition is that "On Hook" is a state where the terminal will not originate media, will not present media other than display information to the user, and will accept only a limited set of signals. "Off Hook" is a state where these restrictions are lifted. The line states "On Hook" and "Off Hook" are mutually exclusive.

The "Flash" event indicates a brief transition from off hook to on hook and back to off hook. By definition, the transition is too brief to end the current call. Depending on what services are subscribed to on the line, the "Flash" event may be interpreted as a service invocation. The "Flash" event MUST NOT be sent when the state is "On Hook". Its duration is from the point at which "On Hook" was first observed until the line returns to "Off Hook". The gateway MUST NOT report both "Flash" and the "On Hook" - "Off Hook" pair, and MUST NOT report "Flash" until the event is complete.
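The "Flash" rules above mean that a gateway has to hold back any report while the line is on hook, until it knows which case applies. A minimal Python sketch of such a hypothetical helper follows. The flash threshold is a caller-supplied placeholder (its value is not fixed by this document), and reports are (event code, start_ms, end_ms) tuples with end_ms set to None for the state events.

```python
from typing import List, Optional, Tuple

OFF_HOOK, ON_HOOK, FLASH = 64, 65, 16   # encodings from Table 8

Report = Tuple[int, int, Optional[int]]  # (event code, start_ms, end_ms or None)


class HookReporter:
    """Hypothetical helper turning raw hook transitions into Table 8 reports.

    flash_max_ms is the longest on-hook interval still treated as a Flash;
    its value is a placeholder chosen by the caller.
    """

    def __init__(self, flash_max_ms: int) -> None:
        self.flash_max_ms = flash_max_ms
        self.on_hook_since: Optional[int] = None
        self.on_hook_reported = False

    def hook_down(self, t_ms: int) -> List[Report]:
        # Nothing is reported yet: this may still turn out to be a Flash.
        self.on_hook_since = t_ms
        self.on_hook_reported = False
        return []

    def tick(self, t_ms: int) -> List[Report]:
        # Once the threshold is exceeded, the interval is a real "On Hook".
        if (self.on_hook_since is not None and not self.on_hook_reported
                and t_ms - self.on_hook_since > self.flash_max_ms):
            self.on_hook_reported = True
            return [(ON_HOOK, self.on_hook_since, None)]
        return []

    def hook_up(self, t_ms: int) -> List[Report]:
        if self.on_hook_since is None:
            return []
        reports = self.tick(t_ms)   # catch a threshold crossing not yet seen
        start = self.on_hook_since
        self.on_hook_since = None
        if self.on_hook_reported:
            return reports + [(OFF_HOOK, t_ms, None)]
        # Short interval: report one completed Flash, never On/Off Hook.
        return [(FLASH, start, t_ms)]
```

Because a Flash report is produced only in hook_up, it is never sent while the line is still on hook, and the two outcomes are mutually exclusive, matching the two MUST NOT rules above.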
The time threshold for distinguishing true "On Hook" - "Off Hook" from "Flash" is a matter of national standards.

ITU-T Recommendation E.182 [N-9] defines when certain tones should be used. The specification of the actual tones varies from country to country. A useful source for this purpose is Supplement 2 to ITU-T Recommendation E.180 [N-8] and its successor documents.

E.182 [N-9] defines the following standard tones that are heard by the caller:

Dial tone: The exchange is ready to receive address information.

PABX internal dial tone: The PABX is ready to receive address information.

Special dial tone: Same as dial tone, but the caller's line is subject to a specific condition, such as call diversion or waiting voice mail (e.g., "stutter dial tone").

Second dial tone: The network has accepted the address information, but additional information is required.

Ring: This named signal event causes the recipient to generate an alerting signal ("ring"). The actual tone or other indication used to render this named event is left up to the receiver. (This differs from the ringing tone, below, heard by the caller.)

Ringing tone: The call has been placed to the callee and a calling signal (ringing) is being transmitted to the callee. This tone is also called "ringback" and is heard by the caller to confirm call progress.

Special ringing tone: A special service, such as call forwarding or call waiting, is active at the called number.

Busy tone: The called telephone number is busy.

Congestion tone: Facilities necessary for the call are temporarily unavailable.

Calling card service tone: The calling card service tone consists of 60 ms of the sum of 941 Hz and 1477 Hz tones (DTMF '#'), followed by 940 ms of 350 Hz and 440 Hz (U.S. dial tone), decaying exponentially with a time constant of 200 ms.
Special information tone: The callee cannot be reached, but the reason is neither "busy" nor "congestion". This tone should be used before all call failure announcements, for the benefit of automatic equipment.

Comfort tone: The call is being processed. This tone may be used during long post-dial delays, e.g., in international connections.

Hold tone: The caller has been placed on hold.

Record tone: The caller has been connected to an automatic answering device and is requested to begin speaking.

Caller waiting tone: The called station is busy, but has call waiting service.

Pay tone: The caller, at a payphone, is reminded to deposit additional coins.

Positive indication tone: The supplementary service has been activated.

Negative indication tone: The supplementary service could not be activated.

Off-hook warning tone: The caller has left the instrument off-hook for an extended period of time and is not engaged in a call.

The following tones can be heard by either calling or called party during a conversation:

Call waiting tone: Another party wants to reach the subscriber.

Warning tone: The call is being recorded. This tone is not required in all jurisdictions.

Intrusion tone: The call is being monitored, e.g., by an operator.

CPE alerting signal (CAS): A tone used to alert a device to an arriving in-band FSK data transmission. A CPE alerting signal is a combined 2130 and 2750 Hz tone, both with tolerances of 0.5% and a duration of 80 to 85 ms. The CPE alerting signal is used with ADSI services and Call Waiting ID services [N-26].

The following tone is heard by operators:

Payphone recognition tone: The person making the call or being called is using a payphone (and thus it is ill-advised to allow collect calls to such a person).

   Event                Mid-    Encoding    Type    Volume?
                        path?   (decimal)
   -------------------  -----   ---------   -----   -------
   Off-Hook             no      64          state   no
   On-Hook              no      65          state   no
   Flash                no      16          other   no
   Dial tone            no      66          tone    yes
   PABX internal        no      67          tone    yes
     dial tone
   Special dial tone    no      68          tone    yes
   Second dial tone     yes     69          tone    yes
   Ringing tone         yes     70          tone    yes
   Special ringing      yes     71          tone    yes
     tone
   Busy tone            yes     72          tone    yes
   Congestion tone      yes     73          tone    yes
   Special              yes     74          tone    yes
     information tone
   Comfort tone         yes     75          tone    yes
   Hold tone            yes     76          tone    yes
   Record tone          yes     77          tone    yes
   Caller waiting       yes     78          tone    yes
     tone
   Call waiting tone    no      79          tone    yes
   Pay tone             yes     80          tone    yes
   Positive             yes     81          tone    yes
     indication tone
   Negative             yes     82          tone    yes
     indication tone
   Warning tone         yes     83          tone    yes
   Intrusion tone       yes     84          tone    yes
   Calling card         yes     85          tone    yes
     service tone
   Payphone             yes     86          tone    yes
     recognition tone
   CPE alerting         no      87          tone    yes
     signal (CAS)
   Off-hook warning     no      88          tone    yes
     tone
   Ring                 no      89          other   no

               Table 8: Basic subscriber line events

3.4 Extended Subscriber Line Events

Table 9 summarizes a number of additional tones that can appear on a subscriber line. All of these are directed toward the line. As in the previous section, some of these may be initiated only by the call controller controlling the line concerned, while others may be initiated elsewhere along the call path.

Unfortunately, it has been difficult to locate documentation for the usage of some of these events, even though reasonable guesses can be made based on their names; most do not appear in ITU-T standards. Depending on the available user interfaces, an implementation MAY render all tones in Table 8 the same or, preferably, use the tones conveyed by a concurrent "tone" payload or other RTP audio payload. Alternatively, it MAY provide a textual representation.

Acceptance tone: No description available.
Confirmation tone: Used to indicate that an exchange has received information from an access line or has processed a request received from an access line, such as the activation/deactivation of line services. In North America, this is implemented as a dual-frequency tone, 350 and 440 Hz, played for 100 ms with a pause of 100 ms followed by the tone for another 300 ms.

Recall dial tone: Sometimes referred to as "stutter dial tone". Recall dial tone is used to indicate that an exchange is ready to accept address information or other information from an access line. In North America, this is implemented as a dual-frequency tone, 350 and 440 Hz, played as three 100 ms bursts separated by pauses of 100 ms.

End of three party service tone: No description available.

Facilities tone: No description available.

Line lockout tone: A tone or silence played out to a line after an extended period of off-hook state where the line is not involved in a call. Typically, line lockout follows a period of playout of off-hook warning tone (see previous section).

Number unobtainable tone: A tone played out to indicate that the dialled number is not in service. The tone may precede an announcement. In North America, the tone is implemented as a dual-frequency tone, 480 and 620 Hz, played as two 500 ms bursts separated by pauses of 500 ms.

Offering tone: No description available.

Permanent signal tone: A tone played out to a line if the phone has gone off-hook, has received dial tone, but no digits have been entered within a timeout period (of the order of ten to twenty seconds). If the phone remains off-hook, the permanent signal tone will typically give way after a further timeout to off-hook warning tone (see previous section).

Preemption tone: No description available.
Queue tone: No description available.

Refusal tone: No description available.

Route tone: No description available.

Valid tone: No description available.

Waiting tone: No description available.

Warning tone (end of period): No description available.

Warning Tone (PIP tone): No description available.

   Event                 Mid-    Encoding    Type   Volume?
                         path?   (decimal)
   --------------------  -----   ---------   ----   -------
   Acceptance tone       yes     96          tone   yes
   Confirmation tone     yes     97          tone   yes
   Dial tone, recall     ??      98          tone   yes
   End of three party    yes     99          tone   yes
     service tone
   Facilities tone       yes     100         tone   yes
   Line lockout tone     no      101         tone   yes
   Number                yes     102         tone   yes
     unobtainable tone
   Offering tone         yes     103         tone   yes
   Permanent signal      no      104         tone   yes
     tone
   Preemption tone       yes     105         tone   yes
   Queue tone            yes     106         tone   yes
   Refusal tone          yes     107         tone   yes
   Route tone            yes     108         tone   yes
   Valid tone            yes     109         tone   yes
   Waiting tone          yes     110         tone   yes
   Warning tone (end     yes     111         tone   yes
     of period)
   Warning Tone (PIP     yes     112         tone   yes
     tone)

              Table 9: Extended subscriber line events

3.5 Trunk Events

Trunks (or circuits) in the PSTN are the media paths between telephone switches. They may carry media corresponding to any of the events described in the previous sections except the non-mid-path line events. They may also carry signals corresponding to the events defined in this section.

These events support an application where PSTN signalling is carried between two gateways without being interworked to signalling in the IP network: the "RTP trunk" application. In the "RTP trunk" application, RTP is used to replace a normal circuit-switched trunk between two nodes. This is particularly of interest in a telephone network that is still mostly circuit-switched.
In this case, each end of the RTP trunk encodes audio channels into the appropriate encoding, such as G.723.1 [I-4] or G.729 [I-5]. However, this encoding process destroys in-band signaling information that is carried in the least-significant bit ("robbed bit signaling"), and it may also interfere with in-band signaling tones, such as the MF (multi-frequency) digit tones.

This section defines events related to four different signalling systems. Three of these are based on the exchange of multi-frequency tones. The fourth operates on digital trunks only and uses low-order bits stolen from the encoded media. In addition, this section defines tone events for continuity testing of the media path. The "trunk unavailable" event was added when this document was updated from the original RFC 2833.

Note: implementors are warned that the descriptions of signalling systems given below are incomplete. They are provided to give context to the related event definitions, but they omit many details that are important to implementation.

Table 10 lists all of the events defined for trunk signalling. Sending implementations conforming to this document MUST NOT send any of the event codes marked "Reserved" or "Unassigned" in the table. Receiving implementations MUST ignore event codes marked "Reserved" or "Unassigned". In a typical application, the gateways may exchange roles from one call to the next: they must be capable of either sending or receiving each signal in the table.

   Event                       Frequency   Encoding   Type   Volume?
                               (Hz)        (decimal)
   --------------------------  ----------  ---------  -----  -------
   MF 0...9                    (Table 11)  128...137  tone   yes
   MF Code 11 / KP3P / ST3P    700+1700    138        tone   yes
   MF KP/KP1                   1100+1700   139        tone   yes
   MF KP2/ST2P                 1300+1700   140        tone   yes
   MF ST                       1500+1700   141        tone   yes
   MF Code 12/STP              900+1700    142        tone   yes
   Reserved                                143
   ABCD signaling (see below)              144...159  state  no
   Reserved                                160...166
   Continuity check-tone       2000        167        tone   yes
   Continuity verify-tone      1780        168        tone   yes
   Reserved                                169...173
   Unassigned                              174
   Trunk unavailable                       175        other  no
   MFC Forward 1...15          (Table 13)  176...190  tone   yes
   MFC Backward 1...15         (Table 14)  191...205  tone   yes

                  Table 10: Trunk signalling events

3.5.1 Signalling System No. 5

Signalling System No. 5 (SS No. 5) is defined in ITU-T Recommendations Q.140 through Q.180 [N-15]. It has two systems of signals: "line signalling", to acquire and release the trunk, and "register signalling", to pass digits forward from one switch to the next.

No. 5 line signalling uses tones at two frequencies: 2400 and 2600 Hz. The tones are used singly for most signals, but together for the Clear-forward and Release-guard signals. (This reduces the chance of an accidental call release caused by carried media content resembling one of the frequencies.) The specific signal indicated by a tone depends on the stage of call set-up at which it is applied.

No events are defined in support of No. 5 line signalling. However, implementations MAY use the ABCD events described in section 3.5.4 and shown in Table 10 to propagate SS No. 5 line signals. If they do so, they MUST use the following mappings. These mappings are based on an underlying mapping equating A=1 to the presence of a 2400 Hz signal and B=1 to the presence of a 2600 Hz signal in the indicated direction.

- neither signal present: event code 144;
- 2400 Hz present: event code 145;
- 2600 Hz present: event code 146;
- both 2400 and 2600 Hz present: event code 147.
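The SS No. 5 mapping above is an instance of the general ABCD principles of section 3.5.4: the A bit is the low-order bit of the state number, and state number N is carried as event code 144 + N. The following Python sketch illustrates that relationship. The function names are hypothetical, and the code is an illustration rather than part of this specification.

```python
# Hypothetical helpers illustrating the ABCD mapping principles of
# section 3.5.4 and the SS No. 5 line-signal mapping built on them.
ABCD_BASE_EVENT = 144  # event code for state number 0 (Table 10)


def abcd_state_number(a, b=0, c=0, d=0):
    """Treat the used ABCD bits as one binary number, A being the low-order bit."""
    for bit in (a, b, c, d):
        if bit not in (0, 1):
            raise ValueError("signalling bits must be 0 or 1")
    return a | (b << 1) | (c << 2) | (d << 3)


def abcd_event_code(a, b=0, c=0, d=0):
    """State number N is carried as event code 144 + N."""
    return ABCD_BASE_EVENT + abcd_state_number(a, b, c, d)


def abcd_bits_from_event(code):
    """Inverse mapping: event code 144..159 back to (A, B, C, D)."""
    if not ABCD_BASE_EVENT <= code <= ABCD_BASE_EVENT + 15:
        raise ValueError(f"{code} is not an ABCD event code")
    n = code - ABCD_BASE_EVENT
    return (n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1)


def ss5_line_event(tone_2400, tone_2600):
    """SS No. 5 line signal: A=1 if 2400 Hz is present, B=1 if 2600 Hz is present."""
    return abcd_event_code(int(tone_2400), int(tone_2600))
```

For example, ss5_line_event(True, True), which corresponds to both 2400 and 2600 Hz being present, returns 147. Likewise, abcd_event_code(0, 1, 0, 1) returns 154 (state number 10). The same helpers would serve the digital R2 A/B mapping, which follows the same principles.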
The initial event report for each signal SHOULD be generated at the time of recognition as indicated in ITU-T Recommendation Q.141, Table 1 (i.e., 40 ms for "seizing" and "proceed-to-send", and 125 ms for all other signals). The packetization interval following the initial report SHOULD be chosen so that reliable transmission is given first priority.

Note that the receiver must supply its own volume values when converting these events back to tones. Moreover, the receiver MAY extend the playout of "seizing" until it has received the first report of a KP event (see below), so that it has better control of the interval between the end of the seizing signal and the start of KP playout. The KP has to be sent beginning 80 +/- 20 ms after the SS No. 5 "seizing" signal has stopped.

No. 5 register signalling uses pairs of tones to convey digits and the signals that frame them. The tone combinations and corresponding signals are shown in Table 11. All signals except KP1 and KP2 are sent for a duration of 55 ms. KP1 and KP2 are sent for a duration of 100 ms. Inter-signal pauses are always 55 ms.

                            Upper Frequency (Hz)
   Lower Frequency (Hz)  900      1100     1300     1500     1700
   700                   Digit 1  Digit 2  Digit 4  Digit 7  Code 11
   900                            Digit 3  Digit 5  Digit 8  Code 12
   1100                                    Digit 6  Digit 9  KP1
   1300                                             Digit 0  KP2
   1500                                                      ST

                 Table 11: SS No. 5 Register Signals

The KP signals are used to indicate the start of digit signalling. KP1 indicates a call expected to terminate in a national network served by the switch to which the signalling is being sent. KP2 indicates a call that is expected to transit through the switch to which the signalling is being sent, to another international exchange. The end of digit signalling is indicated by the ST signal. Code 11 or Code 12 following a country code (and possibly another digit) indicates a call to be directed to an operator position in the destination country.
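Table 10 ties the register event codes to frequency pairs. A receiving gateway that has already identified a tone pair can therefore derive the event code with a simple table lookup. The Python sketch below combines Table 11 with the SS No. 5 event-code assignments given in this section. The function and table names are hypothetical. The sketch assumes that tone detection itself, including frequency tolerances and the 55 ms and 100 ms signal durations, has already been done elsewhere.

```python
# Hypothetical lookup from an MF register tone pair (Hz) to a telephone-event
# code, built from Table 11 and the SS No. 5 assignments in this section.

# Digits: (lower Hz, upper Hz) -> digit value; Digit d is event code 128 + d.
DIGIT_PAIRS = {
    (1300, 1500): 0, (700, 900): 1, (700, 1100): 2, (900, 1100): 3,
    (700, 1300): 4, (900, 1300): 5, (1100, 1300): 6, (700, 1500): 7,
    (900, 1500): 8, (1100, 1500): 9,
}

# Pairs involving 1700 Hz carry fixed event codes (Table 10).
SPECIAL_PAIRS = {
    (700, 1700): 138,   # Code 11 (SS No. 5); KP3P or ST3P (R1)
    (1100, 1700): 139,  # KP1 (SS No. 5); KP (R1)
    (1300, 1700): 140,  # KP2 (SS No. 5); KP2P or ST2P (R1)
    (1500, 1700): 141,  # ST
    (900, 1700): 142,   # Code 12 (SS No. 5); KP' or STP (R1)
}


def mf_event_code(f1, f2):
    """Return the event code for a detected tone pair, given in either order."""
    pair = (min(f1, f2), max(f1, f2))
    if pair in DIGIT_PAIRS:
        return 128 + DIGIT_PAIRS[pair]
    if pair in SPECIAL_PAIRS:
        return SPECIAL_PAIRS[pair]
    raise ValueError(f"not an MF register tone pair: {pair}")
```

North American R1 uses the same frequency pairs (Table 12), so the same lookup also yields the R1 event codes. For example, the 1100+1700 Hz pair yields 139, which conveys KP1 in SS No. 5 and KP in R1.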
A Code 12 may be followed by other digits indicating a particular operator to whom the call is to be directed.

Implementations using the telephone-events payload to carry SS No. 5 register signalling MUST use the following events from Table 10 to convey the register signals shown in Table 11:

- event code 128 to convey Digit 0
- event codes 129-137 to convey Digits 1 through 9 respectively
- event code 139 to convey KP1
- event code 140 to convey KP2
- event code 141 to convey ST
- event code 138 to convey Code 11
- event code 142 to convey Code 12.

The sending implementation SHOULD send an initial event report for the KP signals as soon as they are recognized, and MUST send an event report for all of these signals as soon as they have completed.

These events support an application where the receiving gateway is intended to capture the received digits for processing. To meet timing requirements in the case where signalling is to be propagated from one PSTN segment to another, implementations SHOULD use another payload type, such as the tones payload type also defined in this document, to pass both line and register signals. The alternative is to use the ABCD events for line signals as described earlier.

3.5.2 North American R1

The MF signaling system R1 is mainly used in North America. R1 is defined in ITU-T Recommendations Q.310-Q.332 [N-16]. Like SS No. 5, R1 has both line and register signals. The line signals (not counting Busy and Reorder) are implemented on analogue trunks by applying a 2600 Hz tone, and on digital trunks by stealing the eighth bit of each channel every sixth frame. Interpretation of the line signals is state-dependent (as with SS No. 5).
Implementations MAY use the "off-hook" state, event code 64 from Table 8, to indicate that 2600 Hz tone is playing (binary "1" is indicated), and the "on-hook" state, event code 65, to indicate that no 2600 Hz tone is playing (binary "0" is indicated). If this system is used, the idle state MUST be indicated by a state of "on-hook" at both ends.

R1 has a signal capacity of 15 codes for forward inter-register signals but no backward inter-register signals. Each code or digit is transmitted by a tone pair from a set of 6 frequencies. The R1 register signals consist of KP, ST, and the digits "0" through "9". The frequencies allotted to the signals are shown in Table 12. Note that these frequencies are the same as those allotted to the similarly named SS No. 5 register signals, except that KP uses the frequency combination corresponding to KP1 in SS No. 5.

Table 12 also shows additional signals used in North American practice: KP', KP2P, KP3P, STP or ST', ST2P, and ST3P. [Reference to be added when verified -- probably Telcordia GR-506.]

                            Upper Frequency (Hz)
   Lower Frequency (Hz)  900      1100     1300     1500     1700
   700                   Digit 1  Digit 2  Digit 4  Digit 7  KP3P or ST3P
   900                            Digit 3  Digit 5  Digit 8  KP' or STP
   1100                                    Digit 6  Digit 9  KP
   1300                                             Digit 0  KP2P or ST2P
   1500                                                      ST

          Table 12: North American R1 and MF Register Signals

Implementations using the telephone-events payload to carry North American R1 register signalling MUST use the following events from Table 10 to convey the register signals shown in Table 12:

- event code 128 to convey Digit 0
- event codes 129-137 to convey Digits 1 through 9 respectively
- event code 139 to convey KP
- event code 141 to convey ST
- event code 142 to convey KP' or STP
- event code 140 to convey KP2P or ST2P
- event code 138 to convey KP3P or ST3P.

Unlike SS No. 5, R1 allows a large tolerance for the time of onset of register signalling following the recognition of the start-dialling line signal. This means that sending implementations MAY wait until the KP has completed before sending a KP event report.

3.5.3 MFC R2 signaling

International MFC R2 is described in ITU-T Recommendations Q.400-Q.490 [N-17], but there are many national variants. R2 line signals are continuous, out-of-band, link by link, and channel associated. R2 (inter)register signals are multifrequency, compelled, in-band, end to end, and also channel associated. R2 line signals may be analog, one-bit digital using the A bit in the 16th channel, or digital using both A and B bits.

No events are defined in support of R2 line signalling. However, implementations MAY use the ABCD events described in section 3.5.4 and shown in Table 10 to propagate these signals. If they do so, they MUST use the following mappings.

For the analog R2 line signals shown in Table 1 of ITU-T Recommendation Q.411, implementations MUST map the R2 signals as follows. This mapping is based on an underlying mapping of A bit = 1 when tone is present.

- event code 144 (Table 10) is used to indicate the Q.411 "tone-off" condition
- event code 145 (Table 10) is used to indicate the Q.411 "tone-on" condition.

The digital R2 line signals as described by ITU-T Recommendation Q.421 are carried in two bits, A and B. The mapping between A and B bit values and event codes SHALL be the same in both directions and SHALL follow the principles for A and B bit mapping specified in section 3.5.4.

In R2 signaling, the signaling sequence is initiated from the outgoing exchange by sending a line "seizing" signal. After the line "seizing" signal (and the "seizing acknowledgment" signal in R2D), the signaling sequence continues using MF register signals.
ITU-T Recommendation Q.441 classifies the forward MF register signals into Groups I and II, and the backward MF register signals into Groups A and B. These groups are significant both for the sort of information they convey and for where they can occur in the signalling sequence.

R2 is a compelled tone signaling protocol. This means that one tone is played until an "acknowledgment or directive for the next tone" is received, indicating that the original tone should cease. The register signaling sequence is initiated from the outgoing exchange by sending a forward Group I signal. The first forward signal is typically the first digit of the called number. The incoming exchange typically replies with a backward Group A-1 signal, telling the outgoing exchange to send the next digit of the called number.

The tones have meaning; however, the meaning varies depending on where the tone occurs in the signaling. The meaning may also depend on the country. Thus, to avoid an unmanageable number of events, this document simply provides a means to indicate the 15 forward and 15 backward MF R2 tones (i.e., using event codes 176-190 and 191-205 respectively, as shown in Table 10). The frequency pairs for these tones are shown in Tables 13 and 14.

                            Upper Frequency (Hz)
   Lower Frequency (Hz)  1500     1620     1740     1860     1980
   1380                  Fwd 1    Fwd 2    Fwd 4    Fwd 7    Fwd 11
   1500                           Fwd 3    Fwd 5    Fwd 8    Fwd 12
   1620                                    Fwd 6    Fwd 9    Fwd 13
   1740                                             Fwd 10   Fwd 14
   1860                                                      Fwd 15

                Table 13: R2 Forward Register Signals

                            Lower Frequency (Hz)
   Upper Frequency (Hz)  1020     900      780      660      540
   1140                  Bkwd 1   Bkwd 2   Bkwd 4   Bkwd 7   Bkwd 11
   1020                           Bkwd 3   Bkwd 5   Bkwd 8   Bkwd 12
   900                                     Bkwd 6   Bkwd 9   Bkwd 13
   780                                              Bkwd 10  Bkwd 14
   660                                                       Bkwd 15

                Table 14: R2 Backward Register Signals

3.5.4 ABCD Transitional Signaling For Digital Trunks

ABCD is a 4-bit signaling system used by digital trunks. For N-state (N<=16) signaling, the first N values are used.
ABCD signaling events are all mutually exclusive states. The most recent state transition determines the current state.

The T1 ESF (extended super frame format) allows 2-, 4-, and 16-state signalling bit options. These signalling bits are named A, B, C, and D. When ESF T1 framing is used, signalling information is sent as robbed bits in frames 6, 12, 18, and 24. A D4 superframe transmits only 4-state signalling, using the A and B bits. On the CEPT E1 frame, all signalling is carried in timeslot 16, and two channels of 16-state (ABCD) signalling are sent per frame.

Since this information is a state rather than a changing signal, implementations SHOULD use the following triple-redundancy mechanism, which is similar to the one specified in ITU-T Rec. I.366.2 [N-12], Annex L. At the time of a transition, the same ABCD information is sent 3 times at an interval of 5 ms. If another transition occurs during this time, then this continues. After a period of no change, the ABCD information is sent every 5 seconds.

As shown in Table 10, the 16 possible states are represented by event codes 144 to 159 respectively. Implementations using these event codes MUST map them to and from the ABCD information based on the following principles:

1) State numbers are derived from the used subset of ABCD bits by treating them as a single binary number, where the A bit is the low-order bit. As examples: if only the A and B bits are used and A=0, B=1, the state number is 2 (binary 10); if all four bits are used and A=0, B=1, C=0, D=1, the state number is 10 (binary 1010).

2) State numbers map to event codes by order of increasing value (i.e., state number 0 maps to event code 144, ..., state number 15 maps to event code 159).

3.5.5 Continuity Tones

Continuity tones are used for testing circuit continuity during call setup. Two basic procedures are used.
In international practice, clause 7 of ITU-T Recommendation Q.724 [N-18] describes a procedure applicable to four-wire trunk circuits, where a single 2000 +/- 20 Hz check-tone is transmitted from the initiating telephone switch. The remote switch sets up a loopback, and the continuity check passes if the sending switch can detect the tone on the return path. Q.724 clause 8 describes the procedure for two-wire trunk circuits. The two-wire procedure involves two tones: a 2000 Hz tone sent in the forward direction, and a 1780 +/- 20 Hz tone sent in response.

If implementations use the telephone-events payload type to propagate continuity check-tones, they MUST map these tones to event codes as follows:

- For four-wire continuity testing, the 2000 Hz check-tone is mapped to event code 167.
- For two-wire continuity testing, the initial 2000 Hz check-tone is mapped to event code 167. The 1780 Hz continuity verify-tone is mapped to event code 168.

3.5.6 Trunk Unavailable Event

This event indicates that the trunk is unavailable for service. The length of the downtime is indicated in the duration field. The duration field is set to a value that allows adequate granularity in describing downtime. A value of 1 second is RECOMMENDED. When the trunk becomes unavailable, this event is sent with the same timestamp three times at an interval of 20 ms. If the trunk is still unavailable at the end of the indicated duration, the event is retransmitted, preferably with the same redundancy scheme.

Unavailability of the trunk might result from a failure or an administrative action. This event is used in a stateless manner to synchronize trunk unavailability between equipment connected through provisioned RTP trunks.
It avoids the unnecessary bandwidth consumption of sending a continuous stream of RTP packets with a fixed payload for the duration of the downtime, as would be required in certain E1-based applications. In T1-based applications, trunk conditioning via the ABCD transitional events can be used instead.

4. RTP Payload Format for Telephony Tones

4.1 Introduction

As an alternative to describing tones and events by name, as described in section 2, it is sometimes preferable to describe them by their waveform properties. In particular, recognition is faster than for named signals, since it does not depend on recognizing durations or pauses.

There is no single international standard for telephone tones such as dial tone, ringing (ringback), busy, congestion ("fast-busy"), special announcement tones, or some of the other special tones, such as payphone recognition, call waiting, or record tone. However, across all countries, these tones share a number of characteristics (ITU-T Recommendation E.180 Supplement 2):

- Telephony tones consist of either a single tone, the addition of two or three tones, or the modulation of two tones. (Almost all tones use two frequencies; only the Hungarian "special dial tone" has three.) Tones that are mixed have the same amplitude and do not decay.

- In-band tones for telephony events range from 25 Hz (ringing tone in Angola) to 2600 Hz (the tone used for line signalling in SS No. 5 and R1). The in-band telephone frequency range is limited to 3400 Hz. R2 defines a 3825 Hz out-of-band tone for line signalling on analogue trunks. (The piano has a range from 27.5 to 4186 Hz.)

- Modulation frequencies range from 15 Hz (ANSam tone) to 480 Hz (Jamaica). Non-integer frequencies are used only for frequencies of 16 2/3 and 33 1/3 Hz. (These fractional frequencies appear to be derived from AC power grid frequencies.)
- Tones that are not continuous have durations of less than four seconds.

- ITU-T Recommendation E.180 [N-7] notes that different telephone companies require a tone accuracy of between 0.5 and 1.5%. The Recommendation suggests a frequency tolerance of 1%.

4.2 Examples of Common Telephone Tone Signals

As an aid to the implementor, Table 15 summarizes some common tones. The rows labeled "ITU ..." refer to ITU-T Recommendation E.180 [N-7]. Note that there are no specific guidelines for these tones. In the table, the symbol "+" indicates addition of the tones without modulation, while "*" indicates amplitude modulation. The meaning of these tones is described in section 3.3.

   Tone Name                 Frequency  On Period  Off Period
                             (Hz)       (s)        (s)
   ------------------------  ---------  ---------  ----------
   CNG                       1100       0.5        3.0
   V.25 CT                   1300       0.5        2.0
   CED                       2100       3.3        --
   ANS                       2100       3.3        --
   ANSam                     2100*15    3.3        --
   V.21 "0" bit, channel 1   1180       0.00333    --
   V.21 "1" bit, channel 1   980        0.00333    --
   V.21 "0" bit, channel 2   1850       0.00333    --
   V.21 "1" bit, channel 2   1650       0.00333    --
   ------------------------  ---------  ---------  ----------
   ITU dial tone             425        --         --
   U.S. dial tone            350+440    --         --
   ITU ringing tone          425        0.67-1.5   3-5
   U.S. ringing tone         440+480    2.0        4.0
   ITU busy tone             425
   U.S. busy tone            480+620    0.5        0.5
   ITU congestion tone       425
   U.S. congestion tone      480+620    0.25       0.25

                Table 15: Examples of telephony tones

4.3 Use of RTP Header Fields

4.3.1 Timestamp

The RTP timestamp reflects the measurement point for the current packet. The event duration described in section 4.3.3 extends forward from that time.

4.3.2 Marker Bit

The tones payload type uses the marker bit to distinguish the first RTP packet reporting a given instance of a tone from succeeding packets for that tone. The marker bit SHOULD be set to 1 for the first packet, and to 0 for all succeeding packets relating to the same tone.
4.3.3 Payload Format

Based on the characteristics described above, this document defines an RTP payload format called "tone" that can represent tones consisting of one or more frequencies. (The corresponding MIME type is "audio/tone".) The default timestamp rate is 8000 Hz, but other rates may be defined. Note that the timestamp rate does not affect the interpretation of the frequency, only the durations.

In accordance with current practice, this payload format does not have a static payload type number; instead, it uses an RTP payload type number established dynamically and out-of-band. The payload format is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    modulation   |T|  volume   |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R R|       frequency       |R R R R|       frequency       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R R|       frequency       |R R R R|       frequency       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     ......

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R R|       frequency       |R R R R|       frequency       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: Payload Format for Tones

The payload contains the following fields:

modulation: The modulation frequency, in Hz. The field is a 9-bit unsigned integer, allowing modulation frequencies up to 511 Hz. If there is no modulation, this field has a value of zero.

T: If the "T" bit is set (one), the modulation frequency is to be divided by three. Otherwise, the modulation frequency is taken as is. This bit allows frequencies accurate to 1/3 Hz, since modulation frequencies such as 16 2/3 Hz are in practical use.
volume: The power level of the tone, expressed in dBm0 after dropping the sign, with a range from 0 to -63 dBm0. (Note: A preferred level range for digital tone generators is -8 dBm0 to -3 dBm0.)

duration: The duration of the tone, measured in timestamp units. The tone begins at the instant identified by the RTP timestamp and lasts for the duration value. A value of zero is not permitted, and tones with such a duration SHOULD be ignored.

The definition of duration corresponds to that for sample-based codecs, where the timestamp represents the sampling point for the first sample.

frequency: The frequencies of the tones to be added, measured in Hz and represented as a 12-bit unsigned integer. The field size is sufficient to represent frequencies up to 4095 Hz, which exceeds the range of telephone systems. A value of zero indicates silence. A single tone can contain any number of frequencies. If the number of frequencies it contains is odd, padding SHALL be added to bring the packet to a 32-bit boundary. (RFC 3550 [N-4] requires that padding be set to all zeroes.)

R: This field is reserved for future use. The sender MUST set it to zero, and the receiver MUST ignore it.

4.3.4 Optional MIME Parameters

The "rate" parameter describes the sampling rate, in Hertz. The number is written as a floating point number or as an integer. If omitted, the default value is 8000 Hz.

4.4 Procedures

This section defines the procedures associated with the tones payload type.

4.4.1 Sending Procedures

As indicated by the examples in Table 15, the duration of an individual tone may range from a few milliseconds to a number of seconds. Timing considerations dictate some general guidelines for how the sender should handle these two extremes.

For tones directed to human listeners, timing is not critical, within a tolerance of 100 ms or so at either beginning or end.
For tones directed to remote equipment, the most critical aspect of timing is intra-stream time relationships -- that is, the individual tone durations and the intervals between the tones of a related sequence. Within limits, the timing of the start of playout of a related sequence is less critical.

For longer-duration tones, implementations SHOULD expect to generate multiple RTP packets for the same tone instance. The considerations just enumerated suggest that a packetization interval on the order of 50 ms may be acceptable in terms of the initial delay it imposes on remote playback. Implementations MAY adjust the packetization interval to suit the nature of the tones being played out. The packetization interval SHOULD remain constant until the tone ends, so that playout times are not distorted by buffer under-runs.

The RTP timestamp MUST be updated for each packet generated (in contrast, for instance, to the timestamp for packets carrying telephone-events). The first RTP packet for a tone SHOULD have the marker bit set to 1. Subsequent packets for the same tone SHOULD have the marker bit set to 0, and the RTP timestamp in each subsequent packet MUST equal the sum of the timestamp and the duration in the preceding packet. A final RTP packet SHOULD be generated as soon as the end of the tone is detected, without waiting for the current packetization interval to elapse.

If the tones are meant for machine consumption, the intervals between them are potentially critical. Implementations may be aware of this situation, or may infer it from a heuristic, such as that the tones are less than a second in duration. In this situation, it is RECOMMENDED that if a tone follows another tone within a period of 100 ms or less, the new tone be reported as soon as it has been identified. The suggested 50 ms packetization interval should be applied to subsequent reports for the same tone.
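The sending rules above can be sketched in Python. The sketch encodes the payload of Figure 2 and splits a tone whose total duration is already known into reports of a fixed packetization interval. The default interval of 400 timestamp units corresponds to 50 ms at the default 8000 Hz rate. Each report sets the marker bit only on the first packet and advances the timestamp by the previous packet's duration. The helper names and the interval default are illustrative assumptions. A real gateway learns the end of a tone only when it detects it, and at that point it sends a possibly shorter final report.

```python
import struct


def encode_tone_payload(frequencies, duration, volume, modulation_hz=0.0):
    """Pack one "tone" payload as laid out in Figure 2 (hypothetical helper).

    volume is the dBm0 level with the sign dropped (0..63); duration is in
    timestamp units; a frequency of 0 denotes silence.
    """
    if not frequencies:
        raise ValueError("at least one frequency is required")
    if not 1 <= duration <= 0xFFFF:
        raise ValueError("duration must be 1..65535 timestamp units")
    if not 0 <= volume <= 63:
        raise ValueError("volume must be 0..63")
    # Work in thirds of a hertz; set T only for non-integer rates like 16 2/3 Hz.
    thirds = round(modulation_hz * 3)
    mod, t_bit = (thirds // 3, 0) if thirds % 3 == 0 else (thirds, 1)
    if not 0 <= mod <= 511:
        raise ValueError("modulation frequency does not fit in 9 bits")
    freqs = list(frequencies)
    if any(not 0 <= f <= 4095 for f in freqs):
        raise ValueError("each frequency must fit in 12 bits")
    if len(freqs) % 2:
        freqs.append(0)  # zero padding to the next 32-bit boundary
    word = (mod << 23) | (t_bit << 22) | (volume << 16) | duration
    return struct.pack("!I", word) + struct.pack(f"!{len(freqs)}H", *freqs)


def packetize_tone(start_ts, total_duration, frequencies, volume,
                   interval=400, modulation_hz=0.0):
    """Split a tone of known length into (marker, timestamp, payload) reports."""
    if total_duration <= 0 or interval <= 0:
        raise ValueError("durations must be positive")
    reports, ts, remaining = [], start_ts, total_duration
    while remaining > 0:
        dur = min(interval, remaining)
        payload = encode_tone_payload(frequencies, dur, volume, modulation_hz)
        marker = 1 if not reports else 0  # marker set only on the first report
        reports.append((marker, ts, payload))
        ts = (ts + dur) & 0xFFFFFFFF  # next timestamp = this timestamp + duration
        remaining -= dur
    return reports
```

For instance, packetize_tone(0, 16000, [440, 480], 10) describes the 2.0-second "on" phase of U.S. ringing tone from Table 15 as 40 reports of 400 units each. The volume value 10 (-10 dBm0) is an arbitrary choice for the example.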
The above advice applies to tones lasting on the order of 25 ms or more. Shorter tones, which are likely to come from modems, SHOULD be reported in batches. The tones payload format requires that each tone be reported in a separate RTP packet, but it is RECOMMENDED that multiple RTP packets be carried in the same UDP packet. Individual tones should be given their actual durations (i.e., from transition point to transition point), rather than a new tone being reported at each bit boundary.

4.4.2 Receiving Procedures

Receiving implementations play out the tones as received. When playing out successive tone reports for the same tone (the marker bit is zero, the RTP timestamp is contiguous with that of the previous RTP packet, and the payload content is identical), the receiving implementation SHOULD continue the tone without change or a break.

5. Application Considerations

5.1 Combining Tones and Named Events

Gateways that send signalling events via RTP MAY send both named signals (section 2) and the tone representation (section 4) as a single RTP session, using the redundancy mechanism defined in RFC 2198 [N-2] to interleave the two representations. It is generally a good idea to send both, since this allows the receiver to choose the appropriate rendering.

If a gateway cannot present a tone representation, it SHOULD also send the audio tones as regular RTP audio packets, using either the codec used for regular speech signals or a codec that is known to carry such signals successfully (e.g., PCMU). Some low-rate codecs cannot accurately represent certain tones, such as DTMF.

5.2 Simultaneous Generation of Audio and Events

A source can choose among four approaches:

Events and audio: The source sends events and encoded audio packets (e.g., PCMU or the codec used for speech signals) for the same time instant.
In that mode, events are treated as redundant encodings for the encoded audio stream.

Events only: The source does not send encoded audio while event tones are active and only sends named events, without any redundancy beyond the periodic updates of longer-lasting events.

Events only, with redundancy: The source does not send encoded audio while event tones are active. It only sends named events, but uses RFC 2198 [N-2] redundancy, with named events as both primary and redundant encodings.

Events and audio, with redundancy: During an event, the source sends both named events and audio, using RFC 2198 [N-2] to interleave audio data, current and redundant named events.

The choices above do not affect the event redundancy mechanism described in section 2.6.

Note that a period covered by a named event may overlap in time with a period of audio encoded by other means. This is likely to occur at the onset of a tone and is necessary to avoid possible errors in the interpretation of the reproduced tone at the remote end. Implementations supporting this payload format MUST be prepared to handle the overlap. It is RECOMMENDED that gateways only render the encoded tone since the audio may contain spurious tones introduced by the audio compression algorithm. However, it is anticipated that these extra tones in general should not interfere with recognition at the far end.

5.3 Strategies For Handling FAX and Modem Signals

As described in section 3.2, the typical data application involves a pair of gateways interposed between two terminals, where the terminals are in the PSTN. The gateways are likely to be serving a mixture of voice and data traffic, and need to adopt payload types appropriate to the media flows as they occur.
If voice compression is in use for voice calls, this means that the gateways need the flexibility to switch to other payload types when data streams are recognized. Within the established IETF framework, this implies that the gateways must negotiate the potential payloads (voice, telephone-events, tones, voice-band data, T.38 FAX [I-8], and possibly RFC 2793 [I-1] text and CLEARMODE [I-2] octet streams) as separate payload types. From a timing point of view, this is most easily done at the beginning of a call, but it results in an over-allocation of resources at the gateways and in the intervening network.

One alternative is to use named events to buy time while out-of-band signals are exchanged to update the session to the new payload type. Thanks to the events defined in section 3.2, this is a viable approach for sessions beginning with V.8, V.8bis, T.30, or V.25 control sequences.

Named data-related events also allow gateways to optimize their operation when data signals are received in a relatively general form. One example is the use of V.8-related events to deduce that the voice-band data being sent in a G.711 payload comes from a higher-speed modem and therefore requires echo cancellers to be disabled.

All of the control procedures described in section 3.2 eventually give way to data content. As mentioned above, this content will be carried by other payload types. Receiving gateways MUST be prepared to switch to the other payload type within the time constraints associated with the respective applications. (For several of these procedures, the sender provides 75 ms of silence between the initial control signalling and the sending of data content.) In some cases (V.8bis [N-21], T.30 [N-19]), further control signalling may happen after the call has been established.
A possible strategy is to send both telephone-events and the data payload in an RFC 2198 redundancy arrangement. The receiving gateway then propagates the data payload whenever no event is in progress. For this to work, the data payload and events (when present) MUST cover exactly the same time period; otherwise, spurious events will be detected downstream.

Note that there are a number of cases where no control sequence will precede the data content. This is true, for example, for a number of legacy text terminal types. In such instances, the events defined in section sec:v18ev in particular MAY be sent to help the remote gateway optimize its handling of the alternative payload.

5.4 Examples

5.4.1 Use of RFC 2198 Redundancy With Named Events

Figure 3 shows a typical RTP packet, sent while the user is dialing the last digit of the DTMF sequence "911". The first digit was 200 ms long (1600 timestamp units) and started at time 0. The second digit lasted 250 ms (2000 timestamp units) and started at time 800 ms (6400 timestamp units). The third digit was pressed at time 1.4 s (11,200 timestamp units), and the packet shown was sent at 1.45 s (11,600 timestamp units). The frame duration is 50 ms. To make the parts recognizable, Figure 3 ignores byte alignment. Timestamp and sequence number are assumed to have been zero at the beginning of the first digit. In this example, the dynamic payload types 96 and 97 have been assigned for the redundancy mechanism and the telephone event payload, respectively.
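To make the byte layout concrete, the following Python sketch assembles the packet of Figure 3 from the values given in the example. It assumes the RTP fixed header layout of RFC 3550 and the RFC 2198 block header layout: an F bit, a 7-bit block payload type, a 14-bit timestamp offset, and a 10-bit block length. The helper names are hypothetical. Unlike the figure, the output is byte-aligned, as the fields are on the wire.

```python
import struct

RED_PT, EVENT_PT = 96, 97  # dynamic payload types assigned in the example


def rtp_header(pt, seq, ts, ssrc, marker=0):
    """12-byte RTP fixed header (RFC 3550): V=2, P=0, X=0, CC=0."""
    return struct.pack("!BBHII", 0x80, (marker << 7) | pt, seq, ts, ssrc)


def red_block_header(pt, ts_offset, block_len):
    """4-byte RFC 2198 header for a redundant block: F=1, 7-bit PT,
    14-bit timestamp offset, 10-bit block length."""
    if not (0 <= ts_offset < 1 << 14 and 0 <= block_len < 1 << 10):
        raise ValueError("offset or length out of range")
    return struct.pack("!I", (1 << 31) | (pt << 24) | (ts_offset << 10) | block_len)


def event_payload(event, end, volume, duration):
    """4-byte telephone-event payload: event, E bit, R=0, volume, duration."""
    return struct.pack("!BBH", event, (end << 7) | volume, duration)


def figure3_packet():
    # (event, E bit, volume, duration, start timestamp) for "9", "1", "1"
    events = [(9, 1, 7, 1600, 0), (1, 1, 10, 2000, 6400), (1, 0, 20, 400, 11200)]
    primary_ts = events[-1][4]  # RTP timestamp = start of the current digit
    pkt = rtp_header(RED_PT, seq=13, ts=primary_ts, ssrc=0x5234A8)
    for *_, start in events[:-1]:
        pkt += red_block_header(EVENT_PT, primary_ts - start, 4)
    pkt += struct.pack("!B", EVENT_PT)  # primary block header: F=0
    for event, end, volume, duration, _ in events:
        pkt += event_payload(event, end, volume, duration)
    return pkt
```

The result is 33 octets: the 12-octet RTP header, two 4-octet redundant-block headers and one 1-octet primary header, and three 4-octet event payloads.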
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   | 2 |0|0|   0   |0|     96      |              13               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |                             11200                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |                           0x5234a8                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |           11200           |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |    11200 - 6400 = 4800    |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   Block PT  |
   |0|     97      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R|  volume   |           duration            |
   |       9       |1 0|     7     |             1600              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R|  volume   |           duration            |
   |       1       |1 0|    10     |             2000              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R|  volume   |           duration            |
   |       1       |0 0|    20     |              400              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 3: Example RTP packet after dialing "911"

Table 16 shows all packets up to and including the packet shown in the figure. The last three columns describe the duration fields in the event payloads. The timestamp offset is not shown. We assume here that the digits happen to start on a 50 ms multiple, which is somewhat unlikely.
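The packet schedule summarized in Table 16 can be regenerated with a small Python sketch. It assumes, as the text does, an 8000 Hz clock, one packet every 50 ms (400 timestamp units), and digits that start and end on 50 ms boundaries; it also assumes that the final (E=1) packet of each digit is sent three times in total, which is the pattern visible in the table. The function name `schedule` and its argument layout are illustrative only.

```python
from typing import List, Optional, Tuple

RATE = 8000           # telephone-event clock rate (Hz)
FRAME = RATE // 20    # 50 ms packet interval = 400 timestamp units
FINAL_SENDS = 3       # final packet of each digit sent three times, as in Table 16

Row = Tuple[int, int, int, List[int]]  # (send time, seq, RTP timestamp, durations)

def schedule(digits: List[Tuple[str, int, Optional[int]]], now: int) -> List[Row]:
    """Replay the example; digits are (label, start, length or None if still held)."""
    rows: List[Row] = []
    seq = 0
    finished: List[int] = []  # final durations of earlier digits, carried redundantly
    for _label, start, length in digits:
        elapsed, final_sends = FRAME, 0
        while start + elapsed <= now:
            duration = elapsed if length is None else min(elapsed, length)
            # The RTP timestamp is the start of the current digit; the
            # duration keeps growing until the digit ends.
            rows.append((start + elapsed, seq, start, finished + [duration]))
            seq += 1
            if length is not None and elapsed >= length:
                final_sends += 1
                if final_sends == FINAL_SENDS:
                    break
            elapsed += FRAME
        if length is not None:
            finished.append(length)
    return rows

digits = [("9", 0, 1600), ("1", 6400, 2000), ("1", 11200, None)]
for send, seq, ts, durations in schedule(digits, now=11600):
    print(f"{send / RATE:.2f}  seq={seq:2d}  ts={ts:6d}  durations={durations}")
```

Each printed line corresponds to one data row of Table 16; the rows marking the start and end of a digit have no packet of their own.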
   Time   Event        Seq   RTP         Duration
   (s)                       Timestamp   "9"     "1"     "1"
   0.00   "9" starts   -     -           -       -       -
   0.05                0     0           400     -       -
   0.10                1     0           800     -       -
   0.15                2     0           1,200   -       -
   0.20   "9" ends     3     0           1,600   -       -
   0.25                4     0           1,600   -       -
   0.30                5     0           1,600   -       -
   0.80   "1" starts   -     -           -       -       -
   0.85                6     6,400       1,600   400     -
   0.90                7     6,400       1,600   800     -
   0.95                8     6,400       1,600   1,200   -
   1.00                9     6,400       1,600   1,600   -
   1.05   "1" ends     10    6,400       1,600   2,000   -
   1.10                11    6,400       1,600   2,000   -
   1.15                12    6,400       1,600   2,000   -
   1.40   "1" starts   -     -           -       -       -
   1.45                13    11,200      1,600   2,000   400

                  Table 16: RTP packets for example

5.4.2 Combined Tone and Telephone-event Payloads

The payload formats in sections 2 and 4 can be combined into a single payload using the method specified in RFC 2198 [N-2]. Figure 4 shows an example. In that example, the RTP packet combines two "tone" payloads and one "telephone-event" payload. The payload types are chosen arbitrarily as 97 and 98, respectively, with a sample rate of 8000 Hz. Here, the redundancy format has the dynamic payload type 96.

The packet represents a snapshot of U.S. ringing tone, 1.5 seconds (12,000 timestamp units) into the second "on" part of the 2.0/4.0 second cadence, i.e., a total of 7.5 seconds (60,000 timestamp units) into the ring cycle. The 440 + 480 Hz tone of this second cadence started at RTP timestamp 48,000. Four seconds of silence preceded it, but since the RFC 2198 timestamp offset is only fourteen bits wide, at most 16383 timestamp units (about 2.05 seconds) of it can be represented. Even though the tone sequence is not complete, the sender was able to determine that this is indeed ringback, and thus includes the corresponding named event.
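The durations that appear in Figure 4 all follow from the 14-bit offset limit. The Python arithmetic below reproduces them, assuming (as the text implies) that the ring cycle began at RTP timestamp 0 and that both redundant blocks are anchored at the largest representable offset; the variable names are illustrative.

```python
RATE = 8000
MAX_OFFSET = 2**14 - 1                 # largest RFC 2198 timestamp offset: 16383

cycle_start = 0                        # assumed start of the ring cycle
tone_start = cycle_start + 6 * RATE    # second "on" period: after 2 s on + 4 s off
silence_start = tone_start - 4 * RATE  # start of the preceding "off" period
now = tone_start + 12000               # packet sent 1.5 s into the tone

# The true offset back to the start of the silence (4 s) does not fit in
# 14 bits, so the redundant blocks start at the largest representable offset.
offset = min(tone_start - silence_start, MAX_OFFSET)
block_ts = tone_start - offset

silence_duration = tone_start - block_ts  # redundant "tone" block (silence)
ring_duration = now - block_ts            # redundant "ring" named event block
tone_duration = now - tone_start          # primary 440 + 480 Hz tone block

print(tone_start, offset, silence_duration, ring_duration, tone_duration)
print(f"{offset / RATE:.3f} s of history representable")
```

The results match the header timestamp (48000), the two offsets (16383) and the three duration fields (16383, 28383 and 12000) in the figure.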
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |       sequence number         |
   | 2 |0|0|   0   |0|     96      |              31               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |                             48000                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |                           0x5234a8                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     98      |           16383           |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |           16383           |         8         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   Block PT  |
   |0|     97      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   event=ring  |0|0|  volume=0 |        duration=28383         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  modulation=0   |0| volume=63 |        duration=16383         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0|      frequency=0      |0 0 0 0|      frequency=0      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  modulation=0   |0|  volume=5 |        duration=12000         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0|     frequency=440     |0 0 0 0|     frequency=480     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 4: Combining tones and events in a single RTP packet

6. MIME Registration

6.1 audio/telephone-event

MIME media type name: audio

MIME subtype name: telephone-event

Required parameters: none.
Optional parameters: The "events" parameter lists the events supported by the implementation. Events are listed as one or more comma-separated elements. Each element can be either a single integer or two integers separated by a hyphen. No white space is allowed in the argument. The integers designate the event numbers supported by the implementation. The "rate" parameter describes the sampling rate, in hertz. The number is written as a floating-point number or as an integer. If omitted, the default value is 8000 Hz.

Encoding considerations: This type is only defined for transfer via RTP [N-4].

Security considerations: See the "Security Considerations" section (section 7) in this document.

Interoperability considerations: none

Published specification: This document.

Applications which use this media: The telephone-event audio subtype supports the transport of events occurring in telephone systems over the Internet.

Additional information:
1. Magic number(s): N/A
2. File extension(s): N/A
3. Macintosh file type code: N/A

6.2 audio/tone

MIME media type name: audio

MIME subtype name: tone

Required parameters: none

Optional parameters: The "rate" parameter describes the sampling rate, in hertz. The number is written as a floating-point number or as an integer. If omitted, the default value is 8000 Hz.

Encoding considerations: This type is only defined for transfer via RTP [N-4]. audio/tone MIME body parts contain binary data. A content-transfer-encoding of "binary" is strongly encouraged for messaging environments which support binary transport. A content-transfer-encoding of base-64 (and the associated transformation) is strongly encouraged for messaging environments which do not support binary transfer.

Security considerations: See the "Security Considerations" section (section 7) in this document.
Interoperability considerations: none

Published specification: This document.

Applications which use this media: The tone audio subtype supports the transport of pure composite tones, for example those commonly used in the current telephone system to signal call progress.

Additional information:
1. Magic number(s): N/A
2. File extension(s): N/A
3. Macintosh file type code: N/A

7. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification (RFC 3550 [N-4]) and in any appropriate RTP profile (for example, RFC 3551 [N-5]). This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression, so there is no conflict between the two operations.

This payload type does not exhibit any significant non-uniformity in receiver-side computational complexity for packet processing that could create a potential denial-of-service threat.

In older networks employing in-band signaling and lacking appropriate tone filters, the tones in Section sec:trunk may be used to commit toll fraud.

Additional security considerations are described in RFC 2198 [N-2].

A security review of this payload format found no additional considerations beyond those in the RTP specification.

8. IANA Considerations

This document defines two new RTP payload formats, named telephone-event and tone, and associated Internet media (MIME) types, audio/telephone-event and audio/tone.

Within the audio/telephone-event type, additional events MUST be registered with IANA. Registrations are subject to approval by the current chair of the IETF audio/video transport working group, or by an expert designated by the transport area director if the AVT group has closed.
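Event codes registered through this process are what an implementation advertises in the "events" parameter of the audio/telephone-event registration in section 6.1: comma-separated integers or hyphenated ranges, with no white space. The Python sketch below parses that syntax. The function name `parse_events` and the example value "0-15,66,70" are illustrative, and rejecting codes above 255 is an assumption based on the 8-bit event field of section 2.3.1.

```python
def parse_events(value: str) -> set[int]:
    """Parse an "events" parameter such as "0-15,66,70" into a set of codes."""
    if not value or any(ch.isspace() for ch in value):
        raise ValueError("events parameter must be non-empty with no white space")
    codes: set[int] = set()
    for element in value.split(","):
        # Each element is a single integer or two integers joined by a hyphen.
        low_text, sep, high_text = element.partition("-")
        low = int(low_text)               # raises ValueError on empty or bad text
        high = int(high_text) if sep else low
        if not 0 <= low <= high <= 255:   # assumes the 8-bit event field
            raise ValueError(f"invalid element: {element!r}")
        codes.update(range(low, high + 1))
    return codes

print(sorted(parse_events("0-15,66,70")))
```

Returning a set makes it straightforward for a receiver to intersect the advertised codes with the events it can actually render.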
The meaning of new events MUST be documented either as an RFC or as an equivalent standards document produced by another standardization body, such as ITU-T.

9. Changes Since RFC 2833

- RFC 2833 had assigned only two code points to the three MF signals S1, S2 and S3. S3 has been moved to code point 174.
- The test tone descriptions were confusing; now there are just two test tone entries, for the 2010 Hz and 1780 Hz tones.
- MFC R2 forward and backward tones were added to the trunk event list.
- Added the "trunk unavailable" event (Rajesh Kumar).
- Clarified that the duration timestamp is unsigned and that events exceeding the maximum duration expressible in the duration field should be split into several events, i.e., with a new start time.
- Distinguished states from events. States are sent with an estimated duration, and can be superseded if the state changes before the duration has expired. A special duration value of 0 indicates an infinite duration.
- Clarified how very long events that exceed the maximum expressible duration value should be handled.
- Updated RTP and AVP RFC references.
- The -04 version includes a major reorganization of prior text and addition of extensive material, both tutorial and normative, on the use of the named events. Text added in -04 is marked by change bars.
- In the -04 version, removed the conformance statements present in section 3.3 in previous versions.
- Also in the -04 version, moved the flash event from the "DTMF" to the "Basic Line Event" category.

10. Acknowledgements

The suggestions of the Megaco working group are gratefully acknowledged.
Detailed advice and comments were provided by Hisham Abdelhamid, Flemming Andreasen, Fred Burg, Steve Casner, Dan Deliberato, Fatih Erdin, Bill Foster, Mike Fox, Mehryar Garakani, Gunnar Hellstrom, Rajesh Kumar, Terry Lyons, Steve Magnell, Zarko Markov, Kai Miao, Satish Mundra, Vern Paxson, Colin Perkins, Raghavendra Prabhu, Todd Sherer, Mira Stevanovic, Alex Urquizo and Herb Wildfeur. Tom Taylor provided the editorial reworking and additional text introduced in the -04 version of the bis document.

11. Authors

Henning Schulzrinne
Dept. of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027
USA
electronic mail: schulzrinne@cs.columbia.edu

Scott Petrack
eDial
USA
electronic mail: scott.petrack@edial.com

12. References

12.1 Normative References

[N-1] S. Bradner, "Key words for use in RFCs to indicate requirement levels", RFC 2119, Internet Engineering Task Force, Mar. 1997.

[N-2] C. E. Perkins, I. Kouvelas, O. Hodson, V. J. Hardman, M. Handley, J. C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP payload for redundant audio data", RFC 2198, Internet Engineering Task Force, Sept. 1997.

[N-3] M. Handley and V. Jacobson, "SDP: session description protocol", RFC 2327, Internet Engineering Task Force, Apr. 1998.

[N-4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a transport protocol for real-time applications", RFC 3550, Internet Engineering Task Force, Jul. 2003.

[N-5] H. Schulzrinne, "RTP profile for audio and video conferences with minimal control", RFC 3551, Internet Engineering Task Force, Jul. 2003.

[N-6] S. Casner and P. Hoschka, "MIME Type Registration of RTP Payload Formats", RFC 3555, Internet Engineering Task Force, Jul. 2003.
[N-7] International Telecommunication Union, "Technical characteristics of tones for the telephone service", Recommendation E.180/Q.35, ITU-T, Geneva, Switzerland, Mar. 1998.

[N-8] International Telecommunication Union, "Various tones used in national networks", Recommendation Supplement 2 to Recommendation E.180, ITU-T, Geneva, Switzerland, Jan. 1994. This publication has now been replaced by a list periodically updated and available through the "International Numbering Resources" link on the ITU-T home page. The latest version (dated Feb. 2003) appears as an annex to ITU-T Operational Bulletin 781. While a price is posted for the Operational Bulletin, the list itself is available at no charge through the "Lists annexed..." link on the Operational Bulletin page.

[N-9] International Telecommunication Union, "Application of tones and recorded announcements in telephone services", Recommendation E.182, ITU-T, Geneva, Switzerland, Mar. 1998.

[N-10] International Telecommunication Union, "Echo suppressors", Recommendation G.164, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-11] International Telecommunication Union, "Echo cancellers", Recommendation G.165, ITU-T, Geneva, Switzerland, Mar. 1993.

[N-12] International Telecommunication Union, "AAL type 2 service specific convergence sublayer for trunking", Recommendation I.366.2, ITU-T, Geneva, Switzerland, Feb. 1999.

[N-13] International Telecommunication Union, "Technical features of push-button telephone sets", Recommendation Q.23, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-14] International Telecommunication Union, "Multifrequency push-button signal reception", Recommendation Q.24, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-15] International Telecommunication Union, "Specifications for signaling system no. 5", Recommendation Q.140-Q.180, ITU-T, Geneva, Switzerland, Nov. 1988.
[N-16] International Telecommunication Union, "Specifications of Signalling System R1", Recommendation Q.310-Q.332, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-17] International Telecommunication Union, "Specifications of signalling system R2", Recommendation Q.400-Q.490, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-18] International Telecommunication Union, "Telephone user part signalling procedures", Recommendation Q.724, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-19] International Telecommunication Union, "Procedures for document facsimile transmission in the general switched telephone network", Recommendation T.30, ITU-T, Geneva, Switzerland, July 2003.

[N-20] International Telecommunication Union, "Procedures for starting sessions of data transmission over the public switched telephone network", Recommendation V.8, ITU-T, Geneva, Switzerland, Nov. 2000.

[N-21] International Telecommunication Union, "Procedures for the identification and selection of common modes of operation between data circuit-terminating equipments (DCEs) and between data terminal equipments (DTEs) over the public switched telephone network and on leased point-to-point telephone-type circuits", Recommendation V.8bis, ITU-T, Geneva, Switzerland, Nov. 2000.

[N-22] International Telecommunication Union, "Operational and interworking requirements for DCEs operating in the text telephone mode", Recommendation V.18, ITU-T, Geneva, Switzerland, Nov. 2000. See also Recommendation V.18 Amendment 1, Nov. 2002.

[N-23] International Telecommunication Union, "300 bits per second duplex modem standardized for use in the general switched telephone network", Recommendation V.21, ITU-T, Geneva, Switzerland, Nov. 1988.
[N-24] International Telecommunication Union, "1200 bits per second duplex modem standardized for use in the general switched telephone network and on point-to-point 2-wire leased telephone-type circuits", Recommendation V.22, ITU-T, Geneva, Switzerland, Nov. 1988.

[N-25] International Telecommunication Union, "Automatic answering equipment and general procedures for automatic calling equipment on the general switched telephone network including procedures for disabling of echo control devices for both manually and automatically established calls", Recommendation V.25, ITU-T, Geneva, Switzerland, Oct. 1996. See also Corrigendum 1 to Recommendation V.25, Jul. 2001.

[N-26] Bellcore, "Functional criteria for digital loop carrier systems", Technical Requirement TR-NWT-000057, Telcordia (formerly Bellcore), Morristown, New Jersey, Jan. 1993.

12.2 Informative References

[I-1] G. Hellstrom, "RTP Payload for Text Conversation", RFC 2793, Internet Engineering Task Force, May 2000.

[I-2] R. Kreuter, "RTP Payload for a 64 kbit/s transparent call", Work in progress, Internet Engineering Task Force, December 2003.

[I-3] International Telecommunication Union, "Pulse code modulation (PCM) of voice frequencies", Recommendation G.711, ITU-T, Geneva, Switzerland, Nov. 1988.

[I-4] International Telecommunication Union, "Speech coders: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s", Recommendation G.723.1, ITU-T, Geneva, Switzerland, Mar. 1996.

[I-5] International Telecommunication Union, "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP)", Recommendation G.729, ITU-T, Geneva, Switzerland, Mar. 1996.

[I-6] International Telecommunication Union, "Terminal for low bit-rate multimedia communication", Recommendation H.324, ITU-T, Geneva, Switzerland, Mar. 2002.
[I-7] International Telecommunication Union, "ISDN user-network interface layer 3 specification for basic call control", Recommendation Q.931, ITU-T, Geneva, Switzerland, May 1998.

[I-8] International Telecommunication Union, "Procedures for real-time Group 3 facsimile communication over IP networks", Recommendation T.38, ITU-T, Geneva, Switzerland, Jul. 2003.

[I-9] International Telecommunication Union, "International interworking for videotex services", Recommendation T.101, ITU-T, Geneva, Switzerland, Nov. 1994.

[I-10] International Telecommunication Union, "Data protocols for multimedia conferencing", Recommendation T.120, ITU-T, Geneva, Switzerland, Jul. 1996.

[I-11] International Telecommunication Union, "A 2-wire modem for facsimile applications with rates up to 14 400 bit/s", Recommendation V.17, ITU-T, Geneva, Switzerland, Feb. 1991.

[I-12] International Telecommunication Union, "600/1200-baud modem standardized for use in the general switched telephone network", Recommendation V.23, ITU-T, Geneva, Switzerland, Nov. 1988.

[I-13] International Telecommunication Union, "4800/2400 bits per second modem standardized for use in the general switched telephone network", Recommendation V.27ter, ITU-T, Geneva, Switzerland, Nov. 1988.

[I-14] International Telecommunication Union, "9600 bits per second modem standardized for use on point-to-point 4-wire leased telephone-type circuits", Recommendation V.29, ITU-T, Geneva, Switzerland, Nov. 1988.

[I-15] International Telecommunication Union, "A modem operating at data signalling rates of up to 33 600 bit/s for use on the general switched telephone network and on leased point-to-point 2-wire telephone-type circuits", Recommendation V.34, ITU-T, Geneva, Switzerland, Feb. 1998.
[I-16] International Telecommunication Union, "A digital modem and analogue modem pair for use on the Public Switched Telephone Network (PSTN) at data signalling rates of up to 56 000 bit/s downstream and up to 33 600 bit/s upstream", Recommendation V.90, ITU-T, Geneva, Switzerland, Sep. 1998.

[I-17] International Telecommunication Union, "A digital modem operating at data signalling rates of up to 64 000 bit/s for use on a 4-wire circuit switched connection and on leased point-to-point 4-wire digital circuits", Recommendation V.91, ITU-T, Geneva, Switzerland, May 1999.

[I-18] International Telecommunication Union, "Enhancements to Recommendation V.90", Recommendation V.92, ITU-T, Geneva, Switzerland, Nov. 2000.

[I-19] International Telecommunication Union, "Modem-over-IP networks: Procedures for the end-to-end connection of V-series DCEs", Recommendation V.150.1, ITU-T, Geneva, Switzerland, Jan. 2003.

[I-20] R. Kocen and T. Hatala, "Voice over frame relay implementation agreement", Implementation Agreement FRF.11, Frame Relay Forum, Foster City, California, Jan. 1997.

[I-21] ANSI/TIA, "A Frequency Shift Keyed Modem for Use on the Public Switched Telephone Network", ANSI TIA-825-A-2003, Telecommunications Industry Association, Washington, D.C. USA, Apr. 2003.

[I-22] J. G. van Bosse, "Signaling in Telecommunications Networks", Telecommunications and Signal Processing, New York, New York: Wiley, 1998.

[I-23] Siemens, "MFC signaling systems", Jan. 1983. Siemens topics.