Internet Engineering Task Force         SIP WG
Internet Draft
Category: Informational                 Sanjoy Sen
draft-sen-sip-earlymedia-01.txt         Jayshree Bharatia
November 21st, 2001                     Chris Hogg
Expires: May 2002                       Francois Audet


                 Early Media Issues and Scenarios


STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [9].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

For potential updates to the above required-text see: http://www.ietf.org/ietf/1id-guidelines.txt

Abstract

The ability to carry early media - defined as media that is delivered prior to call answer or session establishment - has been the subject of many discussions and internet drafts within the IETF [6][7]. Despite these discussions, it appears that no clear solution to the early media problem has emerged. Furthermore, recent changes resulting from the re-drafting/re-wording of RFC2543bis-05 [4], combined with the publication of the "SDP-offer-answer draft" [5] and the latest version of Manyfolks [3], have further muddied the waters regarding the behaviour associated with early media. This draft aims to highlight the sources of confusion surrounding how early media operates in the context of these drafts and presents a solution for aligning the concepts outlined, hence solving the early media problem.
Additionally, it is shown how this solution applies to a variety of early media scenarios, including the complexity introduced by forked INVITEs, interactions with NAT/FW traversal and PSTN/PBX interworking issues.

1 Introduction

Early media is the concept of delivering a media stream prior to call answer or session establishment. Normally, early media designates a media transmission sent before the actual completion of the call. In terms of SIP, early media refers to the transmission of media before a 200 OK response is sent to an INVITE. Early media is generally required to deliver inbound call progress messages when inter-working with the Public Switched Telephone Network (PSTN) or a private switched telephone network (i.e. a PBX). In the former, a one-way voice path is established to the caller by the Address Complete Message (ACM). In the PBX, a one-way voice path is established to the caller based on the presence of a Progress indicator indicating in-band information. The one-way voice path is used for transmitting early media, such as busy tone, reorder tone, announcements etc.

From the SIP perspective, early media can also be useful to avoid clipping of the backwards voice path when a call is answered. Clipping can happen because the audio media may unintentionally arrive at the originating user agent ahead of the 200 OK response to the INVITE.

According to [5], SIP terminals should be ready to receive media any time after sending an INVITE. Hence, a SIP terminal should be able to support early media. However, there are some issues with the way session descriptions (SDP) in provisional responses are handled, the optional use of resource reservation, the complexity introduced by forked INVITEs, interactions with NAT/Firewall traversal etc. This draft discusses some of these issues along with related IP-PSTN/PBX inter-working scenarios.
Although most of the early media scenarios discussed in this document refer to IP-PSTN/PBX inter-working, they are applicable to other IP-IP scenarios as well. Also note that, although many current SIP implementations do not support the origination of early media, it is assumed that this support will soon become an intrinsic feature of all SIP terminals. This draft discusses issues in supporting early media when dealing with SIP terminals.

1.1 Terminology

PSTN
   This is the Public Switched Telephone Network. The PSTN is also sometimes referred to as the GSTN (General Switched Telephone Network). Almost all modern parts of the PSTN employ Signaling System Number 7 (SS7). The most common application-layer signaling protocol is the ISDN User Part (ISUP). Hence, for the purpose of this document, it is assumed that ISUP signaling is used while inter-working with the PSTN.

PSTN Origination
   An originator's ingress MGC receives ISUP from the PSTN network. This request is forwarded to the SIP network using either the SIP-T mechanism or direct translation of parameters from the received ISUP message to a SIP method.

PSTN Termination
   A terminator's egress MGC receives an INVITE from an IP terminal. This request is forwarded to the PSTN network using either the SIP-T mechanism or direct translation of parameters from the received SIP method to the ISUP message.

IP Terminal
   The term used to represent all end-user devices that originate and terminate SIP calls.

ISUP Initial Address Message (IAM)
   This message is used to establish a connection on a specified circuit. It includes all the information required for handling an ISUP call.

ISUP Address Complete (ACM)
   This message is sent in response to an ISUP IAM. It indicates that the call is being processed, and the distant exchange is checking the availability of the called party. This could also mean that the called party is ringing/alerted.
   In the GSTN, a one-way voice path is established to the caller by the ACM message. This voice path is used to carry voice announcements and to transmit tones.

ISUP Answer Message (ANM)
   This is also sent in the same direction as the ACM to indicate that the called party has answered.

Callee
   Refers to the terminating host.

Caller
   Refers to the originating host.

Media Gateway Controller (MGC)
   The media gateway controller controls the parts of the call state that pertain to connection control for media channels in a Media Gateway.

Media Gateway (MG)
   The media gateway converts media provided in one type of network to the format required in another type of network. For example, an MG could terminate bearer channels from the PSTN (e.g., DS0s) and media streams from a packet network (e.g., RTP streams in an IP network).

Q.931 SETUP message
   This message is used to establish a connection on a specified circuit. It includes all the information required for handling ISDN calls.

Q.931 Progress indicator
   A progress indicator can be included in a Q.931 CALL PROCEEDING, PROGRESS or ALERTING message. If the value of the progress indicator is #1, "Call is not end-to-end ISDN", or #8, "In-band information or an appropriate pattern is now available", it means that in-band tones and announcements (such as ringing or busy) may be provided by the far end.

Q.931 ALERTING message
   An ALERTING message indicates that the user is being alerted, and that unless in-band tones and announcements are provided, local ringing shall be provided.

Q.931 CONNECT message
   A CONNECT message indicates that the called party has answered.

2 Early Media Support in current SIP drafts

Descriptions of early media support within SIP are not explicit and currently reside in a number of different drafts. According to [5], when a SIP entity offers a new SDP it should be prepared to receive on the ports it lists using any of the codecs listed within the SDP.
Bis-05 [4], however, provides no discussion of the early media issue, while [3] alludes to the presence of possible early media streams within the session by indicating that SDP negotiation can be performed in the 18x/PRACK messages which precede any resource reservation / COMET messages.

Amalgamating the behaviours described within both [4] and [5] (i.e. a simple SIP call), the call flow shown in figure 1 is valid.

       INVITE (SDP OFFER)               INVITE (no SDP)
    |---------------------->|      |---------------------->|
    |  18x                  |      |  18x                  |
    |<----------------------|      |<----------------------|
    |  200 OK (SDP ANSWER)  |      |  200 OK (SDP OFFER)   |
    |<----------------------|      |<----------------------|
    |  ACK                  |      |  ACK (SDP ANSWER)     |
    |---------------------->|      |---------------------->|

            Figure 1a                      Figure 1b

         Figure 1: OFFER/ANSWER in Simple SIP call.

In this case, according to [5], there are 2 scenarios. The first scenario (shown in Fig 1a) requires that the INVITE and the 200 OK contain SDP - the SDP in the INVITE being interpreted as the OFFER and the SDP in the 200 OK as the ANSWER. In order to enable early media to be received by the UAC in this scenario, the UAS needs to send a 18x response containing a session description that describes the early media session. However, [5] and [4] are not clear as to what this SDP in the 18x response constitutes - should it be a new OFFER for an early media stream (independent of the initial INVITE offer) or should it be a provisional answer to the INVITE?

Similarly, fig 1b shows the scenario in which there is no SDP within the INVITE request, resulting (as described within [5]) in the OFFER being within the 200 OK and the ANSWER within the ACK request. If early media is required in this scenario then the 18x response will contain a session description.
However, amalgamating the behaviour in bis-05 and SDP-offer-answer, it is presently unclear whether this SDP constitutes an initial ANSWER for the final negotiated media or a completely separate OFFER for early media.

In addition, in both fig 1a and fig 1b, if the SDP within the 18x response is considered as an early media OFFER, then unless the 100rel draft [8] is used there is no mechanism for the early media OFFER to be responded to with an appropriate ANSWER (as defined by the rules for ANSWERs within [5]).

This situation becomes even cloudier when the behaviour described within Manyfolks [3] is introduced. Consider for example Figure 2, which shows a typical session containing confirmation of mandatory pre-conditions.

    |  INVITE (SDP)                    |
    |--------------------------------->|
    |  18x(SDP)                        |
    |<---------------------------------|
    |  PRACK(Optional SDP)             |
    |--------------------------------->|
    |  200 OK (PRACK)                  |
    |<---------------------------------|
    |  COMET                           |
    |--------------------------------->|
    |  200 OK (COMET)                  |
    |<---------------------------------|
    |  COMET                           |
    |<---------------------------------|
    |  200 OK (COMET)                  |
    |--------------------------------->|
    |  180 Ringing (optional)          |
    |<- - - - - - - - - - - - - - - - -|
    |  PRACK(180) (optional)           |
    |- - - - - - - - - - - - - - - - ->|
    |  200 OK (PRACK 180)(optional)    |
    |<- - - - - - - - - - - - - - - - -|
    |  200 OK (INVITE)                 |
    |<---------------------------------|
    |  ACK                             |
    |--------------------------------->|
    |                                  |

         Figure 2: Manyfolks Call Flow.

In this example the initial INVITE contains SDP with pre-conditions for the establishment of the session. Manyfolks mandates that the "UAS MUST generate a 18x provisional response containing a subset of the pre-conditions supported by the UAS". Thus the SDP contained within this 18x response MUST (in this case) be a subset of that within the INVITE OFFER.
Furthermore, Manyfolks mandates the use of 100rel (PRACK) and states further that:

   "If the 18x provisional response with preconditions requested an acknowledgement (using the methods of [11]), the UAC MUST include an updated SDP in the PRACK if the UAC modified the original SDP based on the response from the UAS. Such a modification MAY be due to negotiation of compatible codecs, or MAY be due to negotiation of mandatory preconditions."

Thus, an optional SDP may be included within the PRACK. From reading this spec, the intention appears to be that the INVITE contains the OFFER for the final session and the 18x contains an ANSWER, with an additional 'confirmation' SDP included in the PRACK for the 18x.

Thus, as in the scenarios described in Figure 1, it appears that there is a mechanism available for the negotiation of early media sessions (the 18x/PRACK interaction). However, this is not being used for this purpose and is instead being used for negotiation of the final session media. Also, a 3-way negotiation is required rather than a 2-way offer-answer.

In the same vein as the simple SIP call case, if the INVITE in figure 2 did not contain any SDP, the 18x/PRACK exchange must correspond to a 2-way OFFER/ANSWER for the final session media.

OPEN ISSUE #1: Manyfolks [3] and SDP-Offer-Answer [5] are not aligned, since Manyfolks discusses 3-way SDP negotiation while the SDP-offer-answer draft is focused on 2-way OFFER/ANSWER SDP negotiation. Work is required to align these drafts.

3 Early Media Requirements

As can be seen from section 2, there is an obvious need within SIP to align the behaviour of early media across the SIP community. However, rather than blindly mandating the semantics associated with session descriptions in the 18x / PRACK messages, it is prudent to take a look at the requirements of early media within SIP.
Previous attempts have been made to determine these requirements [6]. This section borrows somewhat from this earlier work and additionally proposes some newly discovered requirements.

The key requirements for early media are:

1. Distinction of the early media session from the "final" negotiated session.

   This requirement stems from the fact that the early media stream played during the initial stages of a SIP call may be entirely different from the final negotiated session parameters. Such a scenario would occur, for example, if the early media consisted of an announcement from an announcement server which is in a physically separate location from the called party. In this case the announcement (early media) session may have entirely different codecs, and will have entirely different connection addresses, than the final negotiated session, yet it is still logically part of the same SIP call.

   This requirement could be met in one of two ways: (i) an independent "early media" OFFER/ANSWER exchange carried in the SIP messaging independently of the INVITE and 200 OK (which carry the "final media" OFFER/ANSWER), or (ii) the ability to completely modify the initially negotiated media.

2. Ability of provisional responses associated with early media to indicate whether local alerting by the SIP UA should be used or whether early media should be played instead.

   Requirement 2 arises from the need to enable the caller's SIP UA to determine the desired behaviour to apply to the call. Previous versions of the RFC2543bis draft alluded to this by specifying semantics associated with the presence or absence of a session description. However, these semantics (which enabled the SIP UA to clearly determine whether to play local ringing or early media) have disappeared from the current (bis-05) [4] draft. Nevertheless, a requirement still exists for this information to be conveyed.

3.
Early media mechanisms must be able to deal with forked call cases.

   There is a need to develop a clear view of what happens to an early media stream when a call is forked, and of how a SIP entity deals with and/or controls the reception of multiple forked early media streams. Whatever solution is developed needs to maximise the likelihood that the call completes successfully by minimising the potential for call defects caused by user confusion regarding the state of the call. Any solution should take account of both sequential and parallel forking scenarios.

4 Proposed Solutions

This section examines the solution proposed in the Rosenberg early media draft [6] against the early media requirements detailed in section 3.

4.1 Simple SIP call.

Consider the case of a simple SIP call (without any resource reservation/Manyfolks support) similar to that shown in Fig 1a and Fig 1b. In order to support requirement 1, there is a need for the 18x response to be acknowledged (so that the early media "answer" can be accommodated within the flow). Given this requirement, it is proposed that support of the 100rel (PRACK) draft [8] is mandatory in cases where early media is required within basic SIP calls.

However, mandating the PRACK is only part of the solution - the semantics associated with the SDP in these messages also need to be defined. The proposal of [6] is essentially that SDP contained within the 18x/PRACK/200OK(PRACK) exchange is based upon the same semantics described within the SDP-OFFER-ANSWER draft for INVITE/200 OK/ACK. In particular, if the 18x contains an early media OFFER then the PRACK MUST contain an early media ANSWER and the 200OK(PRACK) MUST NOT contain any SDP. In contrast, if the 18x contains no SDP then the PRACK MUST contain an early media OFFER and the 200 OK(PRACK) MUST contain the early media ANSWER.
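
As an illustration only, the two permitted SDP placements described above can be captured in a few lines of Python. The names used here (SdpRole, EarlyMediaExchange, early_media_roles, is_valid_exchange) are hypothetical and do not come from [6] or [5]; this is a minimal sketch of the rule as stated in this section, not a normative definition.

```python
from enum import Enum
from typing import NamedTuple, Optional


class SdpRole(Enum):
    """Role of the SDP body carried by a message (names are illustrative)."""
    OFFER = "early media OFFER"
    ANSWER = "early media ANSWER"


class EarlyMediaExchange(NamedTuple):
    """SDP role in each message of one 18x/PRACK/200 OK (PRACK) exchange.

    None means that the message carries no SDP.
    """
    provisional_18x: Optional[SdpRole]
    prack: Optional[SdpRole]
    prack_200_ok: Optional[SdpRole]


def early_media_roles(sdp_in_18x: bool) -> EarlyMediaExchange:
    """Return the SDP placement required by the rule in section 4.1."""
    if sdp_in_18x:
        # 18x offers, PRACK answers, 200 OK (PRACK) MUST NOT carry SDP.
        return EarlyMediaExchange(SdpRole.OFFER, SdpRole.ANSWER, None)
    # 18x carries no SDP: PRACK offers and 200 OK (PRACK) answers.
    return EarlyMediaExchange(None, SdpRole.OFFER, SdpRole.ANSWER)


def is_valid_exchange(exchange: EarlyMediaExchange) -> bool:
    """Check an observed exchange against the two permitted patterns."""
    return exchange in (early_media_roles(True), early_media_roles(False))
```

A UA implementation could, for example, call is_valid_exchange on the roles it observed in a completed 18x/PRACK/200 OK (PRACK) exchange and reject any other combination, such as an OFFER in the 18x that is only answered in the 200 OK (PRACK).
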
All OFFERs and ANSWERs MUST be formed according to the rules outlined in the SDP-offer-answer draft [5]. Thus the call flows for the simple SIP calls from fig 1a and fig 1b become as shown in Figures 3a and 3b respectively.

      INVITE (SDP OFFER)                  INVITE (no SDP)
  |------------------------->|   |---------------------------->|
  |18x(SDP em OFFER)         |   |18x(SDP em OFFER)            |
  |<-------------------------|   |<----------------------------|
  |PRACK(SDP em ANS/em OFFER)|   |PRACK(SDP em ANS/em OFFER)   |
  |------------------------->|   |---------------------------->|
  |200 OK (PRACK)(em ANS)    |   |200 OK (PRACK)(em ANS)       |
  |<-------------------------|   |<----------------------------|
  |                          |   |                             |
  |200 OK (SDP ANSWER)       |   |200 OK (SDP OFFER)           |
  |<-------------------------|   |<----------------------------|
  |ACK                       |   |ACK (SDP ANSWER)             |
  |------------------------->|   |---------------------------->|

          Figure 3a                        Figure 3b

     Figure 3: OFFER/ANSWER in Simple SIP call with Early Media.

In all cases, it is proposed that if the 18x response contains a session description then the UA receiving this response SHALL proceed to play out any early media stream described within the session description, unless the early media stream has a pre-condition attribute which is still to be fulfilled (see next section). In that case the same behaviour as described in Manyfolks holds, namely that the UA behaves as if it has received no SDP until after any pre-conditions are met and/or appropriate COMET messages have been delivered (if requested).

4.2 SIP call with pre-conditions

In the case of a SIP call with pre-conditions (similar to the call flow shown previously in Figure 2), the 183/PRACK/200OK(PRACK) messaging already exists (since Manyfolks mandates the use of 100rel). However, as discussed in section 2, the SDP conveyed within these messages is related to the final session, not early media.
In this context, the proposal of [6] is essentially as shown in Figure 4. This solution is generic to both the case where the INVITE contains the OFFER for the final session media and the case where the 183 contains this OFFER.

    |  INVITE (OFFER SDP or No SDP)    |  \
    |--------------------------------->|   |
    |  18x(ANSWER SDP or OFFER SDP)    |   |
    |<---------------------------------|   |
    |  PRACK(No SDP or ANSWER SDP)     |   |
    |--------------------------------->|   |
    |  200 OK (PRACK)                  |   |
    |<---------------------------------|   |
    |  COMET                           |   |  Final Media
    |--------------------------------->|   |  negotiation
    |  200 OK (COMET)                  |   |
    |<---------------------------------|   |
    |  COMET                           |   |
    |<---------------------------------|   |
    |  200 OK (COMET)                  |   |
    |--------------------------------->|  /
    |                                  |
    |  18x(OFFER SDP or no SDP)        |  \
    |<---------------------------------|   |
    |  PRACK(ANSWER SDP or OFFER SDP)  |   |
    |--------------------------------->|   |
    | 200 OK(PRACK)(No SDP or ANS SDP) |   |
    |<---------------------------------|   |
    |  COMET                           |   |
    |--------------------------------->|   |  Early Media
    |  200 OK (COMET)                  |   |  negotiation
    |<---------------------------------|   |
    |  COMET                           |   |
    |<---------------------------------|   |
    |  200 OK (COMET)                  |   |
    |--------------------------------->|  /
    |                                  |
    |  180 Ringing (optional)          |
    |<- - - - - - - - - - - - - - - - -|
    |  PRACK(180) (optional)           |
    |- - - - - - - - - - - - - - - - ->|
    |  200 OK (PRACK 180)(optional)    |
    |<- - - - - - - - - - - - - - - - -|
    |  200 OK (INVITE)                 |
    |<---------------------------------|
    |  ACK                             |
    |--------------------------------->|
    |                                  |

    FIGURE 4: Early Media and Manyfolks - Separate Final and early
              media reservation.

In this case it is proposed that the first INVITE/18x/PRACK transaction contains the SDP OFFER followed by ANSWER exchange for the FINAL session media.
Thus the following rules apply for FINAL session SDP negotiation:

- If SDP is within the INVITE then this constitutes an OFFER. The ANSWER to this SDP MUST be within the 18x response.

- If SDP is not in the initial INVITE, the OFFER MUST be within the 18x response with the ANSWER being placed in the PRACK.

- All OFFERs and ANSWERs must be formed according to the rules laid out in the SDP-offer-answer draft [5].

However, as per Manyfolks [3], if the final session OFFER/ANSWER SDPs contain pre-conditions then the session may not proceed until these resources have been reserved and any confirmations that were requested have been received. This means that the next stage of the session cannot proceed until the last of the COMETs associated with the final media resource reservation has been received.

Assuming that one is seeking to fulfil requirement 1, the next stage in the proposed call flow is to negotiate the set up of the early media streams. This is achieved through the use of another 18x/PRACK/200OK(PRACK) exchange with semantics exactly the same as those described for the simple SIP call (see section 4.1). Further COMET messages MAY also be required if confirmation of early media resource reservation was requested.

Thus, assuming the separation of early media and final session setup, there is a need for two separate resource reservations as indicated. In addition, this solution makes the exchange of SDP within the 200 OK (INVITE) and ACK messages redundant.

4.3 Evaluation of Solutions versus perceived requirements.

Both 4.1 (simple SIP call) and 4.2 (Manyfolks SIP call) are designed to ensure the separation of early and final session media negotiation in order to fulfil requirement 1.
The solutions outlined in sections 4.1 and 4.2 fulfil this requirement and are essentially a manifestation of the behaviour proposed within the Rosenberg early media draft [6], interpreted using the language of the SDP-offer-answer draft [5]. Implementing these changes in SIP would require minor alignment changes to both the Manyfolks and SDP-offer-answer drafts.

While these solutions provide the flexibility that was envisaged in the Rosenberg draft [6] with respect to achieving independence of early and final media negotiations, it is worth noting that in the Manyfolks session call flow (see section 4.2) additional reserved resources and much additional messaging are required in order to set up the call, due to the two separate resource reservations.

Given this situation, it is worth considering whether the present Manyfolks behaviour (whereby the OFFER in the INVITE MUST contain SDP which may also be used for the early media streams, and the first 18x represents the ANSWER to the INVITE SDP) is preferable. Indeed, it is unclear whether there is really a requirement that the early media be negotiated separately from the final media streams, since in many cases (especially in the case of PSTN/PBX inter-working) the early media stream is essentially the forerunner of the final media stream and often uses the same channels and codecs. (In the PSTN/PBX case the early media stream is simply a one-way backwards speech path that is available after the SS7 ACM and before the SS7 ANM is received. Upon answer, the same path is used for the speech itself.)

Further, if instead it was decided to use a single resource reservation for early media, and modify this, if required, for the final session, then it would be prudent to modify the non-Manyfolks case (4.1) as well.
If this approach were followed, then the INVITE/183/PRACK sequence would contain an OFFER (in the INVITE or 183) and an ANSWER (in the 183 or PRACK respectively) for a media stream, with or without pre-conditions. This media stream would be used for any early media (and in the Manyfolks case, a mechanism would be required to indicate this). Mechanisms would be required to allow the modification of this media by either party, which could be achieved by a new OFFER/ANSWER sequence carried by one of the following exchanges:

- 183/PRACK exchange
- the PRACK & 200OK of a 183/PRACK/200 OK (PRACK)
- Forwards Re-INVITE/200 OK
- the 200 OK and ACK of a Forwards Re-INVITE/200 OK/ACK
- 200 OK/ACK

OPEN ISSUE #2: Which of these early media models should be adopted: (i) the "separate OFFER/ANSWER" model of [6] or (ii) a modified version of the current "single OFFER/ANSWER" model as described in section 4.3, paragraph 3, above? Note that a third option would be to adopt (i) for the non-Manyfolks case and (ii) for the Manyfolks case. This is not considered here as it was thought beneficial to maintain a common solution to both cases.

5 Early Media - Application based issues.

Aside from the issue of achieving alignment between [5], [3] and [4], there are a number of application based issues which also need to be addressed if successful, standardised early media implementations are to be developed. This section looks at issues raised specifically by implementing an early media solution for inter-working via a gateway to the PSTN or a PBX. For simplicity, the call flows developed in this section assume that the "single OFFER/ANSWER" solution (ii) is being used; however, it should be noted that the issues raised in this section are independent of which of the proposed solutions identified in section 4 is applied.

In all cases, it is assumed that the call has not undergone forking. Forking issues associated with early media are investigated separately in section 6.

5.1 SIP originating calls (PSTN/PBX terminating)

In the case of SIP originating calls, early media is expected from the PSTN/PBX network only after an ACM/ALERTING message is received at the MGC because, in the PSTN/PBX network, a one-way call is established to the caller by the ACM/ALERTING message. Thus the MGC should reserve appropriate resources at the media gateway to allow this media through, even before sending out the IAM/SETUP message (the ACM/ALERTING message is sent in response to the IAM/SETUP message). Note that this assumes that the original INVITE from the caller contains an SDP with receiving RTP port information for early media reception - as described in section 4, this SDP represents the OFFER for the final media session and MUST be acceptable to the MGC in terms of enabling the MGC to send early media to the calling SIP UA. Upon receiving this OFFER the MGC replies with a 18x response which MUST contain a session description. This SDP represents the ANSWER SDP and MUST be formed according to the rules for constructing ANSWERs as defined in [5].

Since in most PSTN/PBX inter-working situations it is expected that resource reservation prior to call setup will be a feature, it is highly likely that these SDPs will contain mandatory pre-conditions to session completion according to Manyfolks. In this case, the originating SIP UA replies to the 18x response with a PRACK. Both ends then reserve resources and report back to each other using COMET messages. Upon confirmation of resource reservation (via the COMET requests), the MGC may send out the IAM/SETUP message to the PSTN/PBX endpoint (since it has now been confirmed that all early and final session media resources have been reserved). What happens next depends upon whether the SIP originated call is being inter-worked to the PSTN or a PBX.

1. PSTN case.
In the PSTN case, successful call setup usually results in the IAM being responded to with an ACM message. However, due to the presence of legacy PSTN networks, the ACM message does not necessarily mean that the callee is being alerted. Instead, the ACM is delivered either with a "subscriber-free" tag, indicating that the called party is being alerted, or with no indication at all. To enable the MGC to convey this information to the originating SIP UA, a 18x message is used as shown in table 1. Thus, when the ACM returns a "subscriber-free" indication, the MGC informs the SIP UA by sending a 180 Ringing response, and the SIP UA should play early media (which will contain the remotely generated ring-tone). In the case of the "status-unknown" response, an optional 183 SIP provisional response will be sent back from the MGC. This provisional response could be used to keep the SIP UA informed that it should play out early media from the PSTN endpoint.

   ------------------------------------------------------------------
   ACM indication    May be interpreted as     18x response
   ------------------------------------------------------------------
   Free              Early media possible      180 Ringing
   Busy              Early media possible      Optional 183 (?)
   Status Unknown    Early media possible      Optional 183
   ------------------------------------------------------------------

        Table 1 - PSTN case - Mapping of ACM to 18x response

2. PBX case.

If the mandatory pre-conditions are met then the call setup proceeds with the MGC sending out a Q.931 SETUP message. This SETUP message may be replied to with a PROGRESS, ALERTING or CONNECT message. If the ALERTING message is received with a call progress indicator set to #1 or #8 then early media MUST be played out to the SIP endpoint (Caller).
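
The mapping of Table 1 and the progress indicator rule stated above can be sketched as a simple lookup. The sketch below is illustrative only: the dictionary keys, the function names and the (status code, optional) tuple shape are assumptions of this example rather than anything defined by ISUP, Q.931 or the drafts cited here, and the "Busy" entry is itself marked "(?)" in Table 1.

```python
from typing import Dict, Optional, Tuple

# Table 1: ACM called-party indication -> (18x status code, response optional?).
# The key strings are hypothetical labels chosen for this sketch.
ACM_TO_18X: Dict[str, Tuple[int, bool]] = {
    "free": (180, False),           # subscriber free: 180 Ringing
    "busy": (183, True),            # marked "(?)" in Table 1
    "status-unknown": (183, True),  # no indication: optional 183
}

# Q.931 progress indicator values that signal in-band tones/announcements.
IN_BAND_PROGRESS_INDICATORS = frozenset({1, 8})


def acm_to_18x(indication: str) -> Tuple[int, bool]:
    """Map an ACM indication to the 18x response suggested by Table 1."""
    return ACM_TO_18X[indication.strip().lower()]


def alerting_requires_early_media(progress_indicator: Optional[int]) -> bool:
    """Return True if an ALERTING with this progress indicator carries
    in-band information, so early media is played to the caller.

    None stands for an ALERTING message without a progress indicator.
    """
    return progress_indicator in IN_BAND_PROGRESS_INDICATORS
```

For instance, acm_to_18x("Free") yields the 180 Ringing entry, while alerting_requires_early_media(None) is False, which corresponds to the case where local ringing is applied at the originating SIP UA.
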
The MGC will send back a 180 Ringing, which must contain an indication that early media should be played to the caller. In the non-Manyfolks case, this is easily indicated by the presence of SDP. In the Manyfolks case, an indication may be required to stop local ringback at the caller. If the ALERTING message did not contain a progress indicator of #1 or #8 then the MGC MUST send a 180 Ringing response (without SDP) to initiate local ringing at the originating SIP UA.

OPEN ISSUE #3: There is presently no way in the Manyfolks case for the SIP endpoint to be informed that it should start playing the early media. Should the use of a 183 provisional response to inform the originating SIP UA to start playing the media stream therefore be made mandatory?

Figure 5 illustrates the behaviour for SIP originating calls inter-working via a gateway to a PSTN or PBX network.

  Originating                        MGC                      PSTN/PBX
  SIP UA
  |  INVITE (SDP offer)              |                        |
  |--------------------------------->|                        |
  |  18x(SDP answer)                 |                        |
  |<---------------------------------|                        |
  |  PRACK                           |                        |
  |--------------------------------->|                        |
  |  200 OK (PRACK)                  |                        |
  |<---------------------------------|                        |
  |  COMET                           |                        |
  |--------------------------------->|                        |
  |  200 OK (COMET)                  |                        |
  |<---------------------------------|                        |
  |  COMET                           |                        |
  |<---------------------------------|                        |
  |  200 OK (COMET)                  |  IAM/SETUP             |
  |--------------------------------->|----------------------->|
  |  180 Ringing                     |  ACM/ALERTING          |
  |<- - - - - - - - - - - - - - - - -|<-----------------------|
  |  PRACK(180)                      |                        |
  |- - - - - - - - - - - - - - - - ->|                        |
  |  200 OK (PRACK 180)              |                        |
  |<- - - - - - - - - - - - - - - - -|                        |
  |  200 OK (INVITE)                 |  ANM/CONNECT           |
  |<---------------------------------|<-----------------------|
  |  ACK                             |                        |
  |--------------------------------->|                        |
  |                                  |                        |

                 Figure 5 - SIP Originating calls

5.2 SIP terminating calls (PSTN/PBX originating)

In this case, the terminating SIP end-point can be the originator of early media.
When the MGC receives an IAM/SETUP message from the PSTN/PBX, for the same reason as described in the previous section, the Media Gateway and the PSTN/PBX network should reserve resources to allow the receipt of early media from the callee. In most cases, it is expected that Manyfolks will be used to control this resource reservation. Thus, upon receipt of the IAM/SETUP message the MGC reserves resources on the PSTN side and sends out an INVITE to the terminating SIP UA. This INVITE MUST contain an SDP OFFER which MUST contain a media stream that has a high likelihood of being acceptable to the terminating SIP UA for sending early media.

As in the SIP originating case, this INVITE is replied to with an 18x/PRACK interaction. This 18x MUST contain a valid ANSWER (as per the rules in [5]). Given that Manyfolks is being used to control resource reservation, it is likely that the OFFER and ANSWER contain pre-conditions to the establishment of the session. Resource reservation therefore takes place and the status of the resource reservation is reported via a (series of) COMET message exchange(s).

If the mandatory pre-conditions are met then the call setup proceeds as follows, depending upon whether the call is being interworked from the PSTN or from a PBX.

1. PSTN originating case. The call proceeds with the MGC sending out the ACM with the Backward Call indicator bits set to "status-unknown". This sets up a one-way media path from the called SIP UA to the originating PSTN endpoint. A successful session may then continue with the terminating SIP UA playing early media (no 18x response required) OR sending a 180 Ringing response. If the MGC receives a 180 Ringing response, then the MGC should instruct the MG to insert a ringtone into the bearer path.

2. PBX originating case. The call proceeds with the MGC sending out the PROGRESS message and setting the progress indicators #1 or #8.
This sets up a one-way media path from the called SIP UA to the originating PBX endpoint. A successful session may then continue with the terminating SIP UA playing early media (no 18x response required) OR sending a 180 Ringing response. If the MGC receives a 180 Ringing response, then the MGC should instruct the MG to insert a ringtone into the bearer path.

Behaviour for the SIP terminating case is illustrated in Figure 6.

Terminating                       MGC                  Originating
SIP UA                                                 PSTN/PBX endpoint
|       INVITE (SDP offer)         |      IAM/SETUP         |
|<---------------------------------|<-----------------------|
|       18x (SDP answer)           |                        |
|--------------------------------->|                        |
|       PRACK                      |                        |
|<---------------------------------|                        |
|       200 OK (PRACK)             |                        |
|--------------------------------->|                        |
|       COMET                      |                        |
|--------------------------------->| Note: MGC instructs    |
|       200 OK (COMET)             | MG to insert ring      |
|<---------------------------------| tone if 180 received.  |
|       COMET                      | Otherwise, if 183      |
|<---------------------------------| received, MG plays     |
|       200 OK (COMET)             | early media to PSTN    |
|--------------------------------->| /PBX endpoint.         |
| 180 Ringing (no SDP if MGC to    |                        |
|   insert ringing)                |      ACM/ALERTING      |
|- - - - - - - - - - - - - - - - ->|----------------------->|
|       PRACK (180)                |                        |
|<- - - - - - - - - - - - - - - - -|                        |
|       200 OK (PRACK 180)         |                        |
|- - - - - - - - - - - - - - - - ->|                        |
|       200 OK (INVITE)            |      ANM/CONNECT       |
|--------------------------------->|----------------------->|
|       ACK                        |                        |
|<---------------------------------|                        |
|                                  |                        |

Figure 6 - SIP terminating calls

5.3 Other Early Media Issues

1. Interaction with Firewall/NAT traversal

When the SIP client is behind a firewall or NAT/NAPT, the firewall pinhole needs to be opened or NAT/NAPT bindings need to be established in one direction to allow early media before the session establishment is completed [1].
In this scenario, there are potential security loopholes if the firewall/NAT has to establish pinholes/bindings without complete knowledge of the media flow (i.e., IP address/port of the callee). This is currently being considered by the MIDCOM WG.

6 Forking Issues

The fact that proxies, en-route, can fork a SIP INVITE creates additional issues with the potential of the caller receiving multiple early media streams. The issues can be summarized as follows:

- Need to arbitrate between multiple early media streams
- Need to ensure consistent user behaviour that does not end in the user hanging up the call in between multiple early media sessions
- Partial knowledge about the early-media sources during call set-up
- Arbitration between multiple provisional 18x responses from early media sources
- Potential of race conditions between multiple media streams

The decoupling of SIP call control from the media allows us less control over the ensuing early media sessions, leading to inconsistency in call set-up and undesirable user behaviour.

We will discuss these issues in the context of the two kinds of forking scenarios supported by SIP - Parallel and Sequential. Again, we assume that the SIP terminals have the potential to generate early media.

6.1 Parallel Forking

This is definitely the most complex scenario. Multiple proxies can be involved in a call, some or all of which can fork an INVITE transaction. In one scenario, the forking end-point destinations can be multiple PSTN gateways. Depending on the call progress at the PSTN networks at the forked legs, the caller can expect one or more simultaneous early media sessions. The issue is with how to treat and, if required, arbitrate between the multiple early media sessions.

For example, consider the scenario depicted in Figure 7, where two forked INVITEs reach a media gateway (GW1) and a SIP end-point.
GW1 sends back a busy-tone, which reaches the caller before the called party answers (through the SIP end-point). In this case, the caller may hang up the call before the callee answers. Thus, such race conditions need to be prevented to avoid undesirable user behaviour during call set-up.

           ----------                          ---------
           |        |------ INVITE (1) ------->|       |
INVITE --->| Forking|                          |  GW1  |
           | Proxy  |                          ---------
           ----------                            PSTN
               |                               ----------
               |                               |        |
               |                               |SIP End-|
               +---------- INVITE (2) -------->| point  |
                                               ----------

Figure 7

The INVITE can be forked multiple times by proxies, en-route, compounding the problem of race conditions. This is shown in Figure 8.

           ----------                  ---------
           |        |-- INVITE (1) --->|       |
INVITE --->| Forking|                  |  GW1  |
           | Proxy  |                  |       |
           |        |                  ---------
           ----------                    PSTN
               |               ----------               ----------
               |               |        |               |        |
               |               | Forking|-- INVITE (3)->|  GW2   |
               +- INVITE (2) ->| Proxy  |               ----------
                               ----------                  PSTN
                                   |
                                   |                    ----------
                                   |                    |        |
                                   |                    |SIP End-|
                                   +---- INVITE (4) --->| point  |
                                                        ----------

Figure 8

Here early media from three potential sources can reach the caller at any time and, potentially, at the same time too. The multiple simultaneous early media sessions, which can result in these scenarios, need to be segregated on the bearer path such that coherent and consistent information is provided to the caller. This can be done either in any media gateway on the media path or at the calling end-point.

6.2 Sequential Forking

Sequential forking somewhat alleviates the problem caused by multiple parallel early media streams. The forking process may be controlled at the forking proxy, imposing a certain priority and order on the execution of the early media sessions. Sequential forking can be implemented under policy control, where the forking process is governed by a pre-established priority of the called end-points (assumed to be known at the forking proxy).
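As a rough illustration of policy-controlled sequential forking, the following minimal sketch tries the called end-points one at a time in priority order and advances to the next branch on the triggers noted for [4] in this section (a 4xx or 5xx response, or a timer). It is not taken from any specification: the function name, the convention that a lower number means a higher priority, and the send_invite callable (which stands in for sending an INVITE and waiting for the outcome) are all assumptions made for this example.

```python
def sequential_fork(targets, send_invite):
    """Try forked branches one at a time, highest priority first.

    targets     -- list of (priority, uri) pairs; a lower number is assumed
                   to mean a higher priority (assumption for this sketch)
    send_invite -- hypothetical callable taking a URI and returning the
                   final SIP status code, or None if the proxy timer expired
    Returns (uri, status) for the first branch that does not cause a
    route advance, or (None, None) if every branch was exhausted.
    """
    for _priority, uri in sorted(targets, key=lambda t: t[0]):
        status = send_invite(uri)
        if status is None:
            # Timer expired at the proxy: route advance to the next branch.
            continue
        if 400 <= status < 600:
            # Branch rejected with 4xx or 5xx: route advance.
            continue
        return uri, status
    return None, None
```

A caller would pass the priority list known at the forking proxy together with its own transaction logic in place of send_invite. The sketch does not model the end-of-early-media indication discussed in this section.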
There might be a need for the sequential play-out of all the early media sessions (assuming there is more than one). This implies that the forking proxy may need an indication of the end of an early media session and may use this to trigger the next INVITE to another branch. It may be required that this type of branch migration be controlled by either the caller or the called endpoints. Note that route advance is currently triggered [4] either when the party rejects the call with a 4xx or 5xx response, or when the proxy makes a route advance decision based on a timer.

6.3 Proposed strategies for solving "forked" early media problem

6.3.1 Background on Previous Proposals

The following options to deal with 18x provisional responses were proposed in previous working group meetings and are still under investigation as possible resolutions.

1. Use INFO to pass ACM-related parameters for interworking with ISUP.
2. Eliminate usage of 18x completely (if QoS negotiation is not required, 18x/PRACK can be eliminated). Instead use a one-way 200 OK to establish a one-way media path and subsequently use a re-INVITE to complete the two-way session establishment.
3. Use of 18x is made optional and negotiable between the clients.

6.3.2 Strategies to deal with forking

There are multiple ways to deal with the forking issue. In this section, we discuss some of the possible solution strategies.

Possible ways of handling multiple early media sessions due to forking are as follows:

1. Allow no early media
2. Allow only one early media session (e.g., the first one)
3. Allow multiple early media sessions in a particular order
4. Allow arbitration of which of multiple received early media sessions to play, if any at all.

For case (1), any SDP information received before the final 200 OK can be blocked at the Proxy.
A solution for (2) will, for example, allow the first 18x with SDP and block at the Proxy any other SDP information before the final 200 OK. Case (3) applies most readily to the sequential forking scenario and is discussed in section 6.3.2.1. Case (4) is most appropriate for parallel forking scenarios and is discussed in section 6.3.2.2.

6.3.2.1 Sequential forking

The two main issues here are (1) control of the forking process, and (2) triggering of branch migration at the end of an early media session, in case of multiple sequential early media sessions.

If the forking proxy is aware of the priorities of the end-points (potential early media sources), it would be possible for it to send them the INVITEs in a particular order. This priority may be set by the end-user and can be communicated to the Proxy prior to session establishment. When an end-point completes transmission of early media, it may send a response to trigger the proxy to route advance the next INVITE.

Note that, to avoid the caller hanging up the phone on receiving the first announcement and missing several important announcements following it, it may be necessary to notify the UAC via a SIP provisional response (18x) that multiple early media sessions are possible. This is possible by adding an indication (e.g., through a new header) in the 18x response at the proxies, if the original INVITE had been forked. Note that this is applicable to both types of forking scenarios.

6.3.2.2 Parallel forking

The main issue here is the potential of the UAC receiving multiple early media streams. The arbitration between the media streams can be done by intelligent handling either at the client terminal or at a gateway on the media path. In both cases, however, it is important that the caller's UAC is informed that the call has been forked. This may require enhancements to SIP signaling between the forking proxy and the caller's UAC.
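The arbitration rules proposed in this section can be expressed as a small decision function, applied to the endpoint indications collected by the time the arbitration timer expires. The following is only a minimal sketch under stated assumptions: the Status enumeration, the action strings and the function name are invented for illustration, and the handling of an empty list of responses is an assumption, since that case is not covered by the proposed logic.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    FREE = "free/ringing"
    BUSY = "busy"
    UNKNOWN = "status unknown"


def arbitrate(statuses):
    """Pick the treatment to play to the caller.

    statuses -- list of Status values, one per forked endpoint that
                responded before the arbitration timer expired
    Returns (action, index); index is the position of the endpoint whose
    early media should be played, or None for locally generated treatment.
    """
    if not statuses:
        # Not covered by the proposed logic; assumption for this sketch.
        return ("no-decision", None)
    counts = Counter(statuses)
    if counts[Status.FREE] >= 1:
        # At least one endpoint is free/ringing.
        return ("local-ringing", None)
    if counts[Status.UNKNOWN] == 0:
        # Nobody free and nobody unknown: all endpoints are busy.
        return ("busy-tone", None)
    if counts[Status.BUSY] > 0 and counts[Status.UNKNOWN] == 1:
        # Some busy and exactly one "status unknown" endpoint.
        return ("early-media", statuses.index(Status.UNKNOWN))
    # More than one "status unknown" with some busy, or all unknown.
    return ("local-announcement", None)
```

For example, arbitrate([Status.BUSY, Status.UNKNOWN]) selects the early media of the second endpoint, while two or more "status unknown" endpoints lead to the local announcement.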
The following arbitration logic is proposed:

- All endpoints indicate "busy" => Play busy tone.
- >= 1 endpoint indicates "free/ringing" => Play local ringing.
- No endpoints free + some busy + 1 endpoint "status unknown" => Play early media from the "status unknown" endpoint.
- No endpoints free + some busy + > 1 endpoint "status unknown" => Play local "announcement" indicating that the network is trying to contact the callee.
- All endpoints "status unknown" => Play local announcement that indicates to the caller that the network is trying to contact the callee.

Prior to applying this logic, it is assumed that the entity performing the arbitration (usually the caller's UAC) shall start a timer. This timer should be user-configurable and set to a period within which the arbitration entity could reasonably expect most of the responses to the initial forked request to be received. Upon timer expiration, the arbitration entity shall apply the logic described above to determine which of the received responses to accept, if any. In the case where no responses are accepted (i.e. when all endpoints indicate "status unknown"), the UAC will need to initiate appropriate signaling to connect to an announcement server to play a "network trying to contact the callee" announcement. In all cases, forks which are no longer required may be released as per normal SIP operation.

This mechanism is a guideline as to how we expect early media to be dealt with within parallel forked calls. The exact details of the call flows involved and any modifications which may be required to the SIP protocol are for further study.

7 Conclusion

This draft has illustrated the main issues behind the current confusion as to the operation of early media within SIP. Through evaluation of the current behaviour of early media in SIP [3][4][5] it has been demonstrated that there is a need for alignment on the issue of early media within the SIP community.
In addition, the draft has studied common issues associated with the implementation of early media within various applications. As a result of this work, a number of open issues have been identified which need to be resolved by the SIP community before a consistent early media solution can become widely adopted. These open issues are repeated below for ease of reading.

OPEN ISSUE #1: Manyfolks [3] and SDP-Offer-Answer [5] are not aligned, since Manyfolks discusses 3-way SDP negotiation while the SDP-offer-answer draft is focused on 2-way OFFER/ANSWER SDP negotiation. Work is required to align these drafts.

OPEN ISSUE #2: Which of these early media models should be adopted: (i) the "separate OFFER/ANSWER" model of [6] or (ii) a modified version of the current "single OFFER/ANSWER" as described in section 4.3, paragraph 3, above?

OPEN ISSUE #3: There is presently no way in the Manyfolks case for the SIP endpoint to be informed that it should start playing the early media. Should the use of a 183 provisional response to inform the originating SIP UA to start playing the media stream therefore be made mandatory?

8 Acknowledgements

Authors of this document would like to acknowledge Mary Barnes, Scott Orton and Mark Watson for their input and reflections on this work.

9 Document History

draft-00: Initial draft submitted at IETF-51, August 2001.

draft-01: Revised version of draft-00. Submitted at IETF-52, December 2001. The draft was updated because the drafts to which this document refers have undergone significant changes, leaving the earlier references out of date. In particular, as a result of these dependencies, the new version of the draft is re-focused on identifying areas which cause confusion within the SIP community with respect to early media issues and seeks to present a number of solutions to the alignment problems identified.
Changes have also been made to the Forking Issues section to further develop the issues identified in draft-00 and present potential solutions. In addition to this re-focusing exercise, some editorial changes have been made including:

- Amalgamation of PSTN/PBX scenarios into a single discussion of SIP originating / SIP terminating calls.
- Updating references to refer to latest versions of drafts.
- Re-wording of language used to discuss early media to reflect the style of language used in the OFFER/ANSWER model.

10 References

[1] C. Huitema, "MIDCOM Scenarios", draft-ietf-midcom-scenarios-02 (work in progress), November 2001

[2] Aparna Vemuri, Jon Peterson, "SIP for Telephones (SIP-T): Context and Architectures", draft-vemuri-sip-t-context-02 (work in progress), August 2001

[3] Marshall et al, "Integration of Resource Management and SIP extensions for Resource Management", draft-ietf-sip-manyfolks-resource-02, Expires Feb 2002 (work in progress)

[4] Handley, Schulzrinne, Schooler, Rosenberg, "Session Initiation Protocol", draft-ietf-sip-rfc2543bis-05, Expires April 2002 (work in progress)

[5] Rosenberg, J., "An offer/answer model with SDP", draft-rosenberg-mmusic-sdp-offer-answer-00.txt, Expires April 2002 (work in progress)

[6] Rosenberg, J., "SIP Early Media", draft-rosenberg-early-media-00.txt, Expires February 2002 (work in progress)

[7] Burger, E., "Why Early Media in SIP", draft-burger-sipping-em-rqt-00.txt, Expires April 2002 (work in progress)

[8] Rosenberg, J., Schulzrinne, H., "Reliability of Provisional Responses in SIP", draft-ietf-sip-100rel-04.txt, Expires March 2002 (work in progress)

[9] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

11 Full copyright statement

Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

12 Authors' Addresses

Sanjoy Sen
2375 N. Glenville Drive, Building B,
Richardson, TX-75082
Phone : 972-685-8275
E-mail: sanjoy@nortelnetworks.com

Jayshree Bharatia
2201, Lakeside Blvd,
Richardson, TX-75082
Phone : 972-684-5767
E-mail: jayshree@nortelnetworks.com

Chris Hogg
Maidenhead Office Park (MOP4)
Bray House
Westacott Way
Maidenhead
Berkshire
SL6 3QH
United Kingdom
Phone : + 44-162-843-1720
E-mail: chogg@nortelnetworks.com

Francois Audet
4301 Great American Parkway,
Santa Clara, CA-95054
Phone : 408-495-3756
E-mail: audet@nortelnetworks.com