SIPPING E. Burger Internet-Draft SnowShore Networks, Inc. Expires: December 29, 2003 June 30, 2003 Keypad Markup Language (KPML) draft-burger-sipping-kpml-02 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http:// www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on December 29, 2003. Copyright Notice Copyright (C) The Internet Society (2003). All Rights Reserved. Abstract Keypad Markup Language (KPML) is a markup language used in conjunction with SIP and HTTP to provide instructions to SIP User Agents for the reporting of user key presses. Conventions used in this document RFC2119 [1] provides the interpretations for the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" found in this document. In the narrative discussion, the "user device" is a User Agent that will report stimulus. An "application" is a User Agent requesting the user device to report stimulus. The "user" is an entity that Burger Expires December 29, 2003 [Page 1] Internet-Draft KPML June 2003 stimulates the user device. In English, the user device is a phone, the application is an application server or proxy server, and the user presses keys to generate stimulus. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. KPML . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Digit Suppression . . . . . . . . . . . . . . . . . . . . . 7 4. HTTP Reporting . . . . . . . . . . . . . . . . . . . . . . . 9 5. SIP Reporting . . . . . . . . . . . . . . . . . . . . . . . 10 6. Mixing HTTP and SIP . . . . . . . . . . . . . . . . . . . . 11 7. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.1 Monitoring for Octorhorpe . . . . . . . . . . . . . . . . . 12 7.2 Dial String Collection . . . . . . . . . . . . . . . . . . . 12 7.3 Interactive Digit Collection . . . . . . . . . . . . . . . . 13 7.4 SIP Request . . . . . . . . . . . . . . . . . . . . . . . . 14 8. Report Body . . . . . . . . . . . . . . . . . . . . . . . . 15 9. Formal Syntax . . . . . . . . . . . . . . . . . . . . . . . 16 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . 19 10.1 IANA Registration of MIME media type application/kpml+xml . 19 10.2 Schema Registration . . . . . . . . . . . . . . . . . . . . 19 11. Security Considerations . . . . . . . . . . . . . . . . . . 20 Normative References . . . . . . . . . . . . . . . . . . . . 21 Informative References . . . . . . . . . . . . . . . . . . . 22 Author's Address . . . . . . . . . . . . . . . . . . . . . . 22 A. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 23 B. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24 Intellectual Property and Copyright Statements . . . . . . . 25 Burger Expires December 29, 2003 [Page 2] Internet-Draft KPML June 2003 1. Introduction This document describes the Keypad Markup Language, KPML. KPML is a markup [7] that enables "dumb phones" to report on basic user key-press interactions. This document refers to a "dumb phone" as a user device that does not have a display. Otherwise, it is actually a rather smart device. Most KPML implementations require the user device to be an http [2] client and interpret KPML markup. We strongly discourage the use of non-validating XML parsers, as one can expect problems with future versions of KPML. That said, one could envision user devices that only accept SIP reporting and have a fixed parser, rather than a full XML parser. This means that KPML can fit in to an extremely small memory and processing footprint. Note KPML has a corresponding lack of functionality. For those applications that require more functionality, please refer to VoiceXML [8] and MSCML [9]. The name of the markup, KPML, reflects its legacy support role. The public switched telephony network (PSTN) accomplished end-to-end signaling by transporting Dual-Tone, Multi-Frequency (DTMF) tones in the bearer channel. This is in-band signaling. NOTE: The spelunking community already took the name KML for their cave data base interchange format. From the point of view of an application being signaled, what is important is the fact the stimulus occurred, not the tones used to transport the stimulus. For example, an application may ask the caller to press the "1" key. What the application cares about is the key press, not that there were two cosine waves at 697 Hz and 1209 Hz transmitted. A SIP-signaled [3] network transports end-to-end signaling with RFC2833 [10] packets. In RFC2833, the signaling application inserts RFC2833 named signal packets instead of generating tones. The receiving application gets the signal information, which is what it wanted in the first place. RFC2833 is the only method that can correlate the time the end user pressed a digit with the user's media. However, out-of-band signaling methods, as are appropriate for user device to application signaling, do not need millisecond accuracy. On the other hand, they do need reliability, which RFC2833 does not provide. An interested application could request notifications of every key Burger Expires December 29, 2003 [Page 3] Internet-Draft KPML June 2003 press. However, many of the use cases for such signaling has the application interested in only one or a few keystrokes. Thus we need a mechanism for specifying to the user device what stimulus the application would like notification of. Burger Expires December 29, 2003 [Page 4] Internet-Draft KPML June 2003 2. KPML KPML is a stateless, declarative markup. A KPML document contains a tag with a series of tags. The tag has a value attribute that is a H.248.1 [4] digit map. NOTE: We use RFC3525 digit maps instead of MGCP [11] digit maps because the former is an IETF standard and the latter is not. NOTE: We do not use SRGS [12] DTMF grammars because it is unlikely one would use KPML for independent digit collection in a browser context. Interface attributes, such as the interdigit timeout and what constitutes a long key press, are implementation matters beyond the scope of this document. For many applications, the user device needs to quarantine (buffer) digits. Some applications use modal interfaces where the first few key presses determine what the following digits mean. For a novice user, the application may play a prompt describing what mode the application is in. However, "power users" often barge through the prompt. KPML provides a barge attribute to the tag. The default is "barge=yes". Enabling barge means that the user device buffers digits and applies them immediately when the next KPML document arrives. Disabling barge by specifying "barge=no" means the user device flushes any collected digits before collecting more digits and comparing them against the tags. NOTE: Quarantine and barge are separate actions. However, the barge action directly determines the quarantine action. Thus KPML only specifies the barge action request. If the user presses a key not matched by the tags, the user device discards the key press from consideration against the current or future KPML documents. However, as described above, once there is a match, the user device quarantines any keys the user enters subsequent to the match. KPML documents are independent. Thus it is not possible for the current document to know if a following document will enable barging or want the digits flushed. Therefore, the user device MUST quarantine all digits detected between the time of the report (http POST or SIP NOTIFY) and the interpretation of the next script, if any. If the next script has "barge=no", then the interpreter MUST flush all collected digits. If the next script has "barge=yes", then Burger Expires December 29, 2003 [Page 5] Internet-Draft KPML June 2003 the interpreter MUST apply the collected digits against the digit maps presented by the script's tags. If there is a match, the interpreter MUST quarantine the remaining digits. If there is no match, the interpreter MUST flush all of the collected digits. Unless there is a suppress indicator in the digit map, it is not possible to know if the signaled digits are for local KPML processing or for other recipients of the media stream. Thus, in the absence of a digit suppression indicator, the user device transmits the digits to the far end in real time, using either RFC2833, generating the appropriate tones, or both. The following section details the operation of the suppress indicator. Burger Expires December 29, 2003 [Page 6] Internet-Draft KPML June 2003 3. Digit Suppression Under basic operation, a KPML endpoint will transmit in-band tones (RFC2833 [10] or actual tone) in parallel with digit reporting. NOTE: If KPML did not have this behavior, then a user device executing KPML could easily break called applications. For example, take a personal assistant that uses "*9" for attention. If the user presses the "*" key, KPML will hold the digit, looking for the "9". What if the user just enters a "*" key, possibly because they accessed an IVR system that looks for "*"? In this case, the "*" would get held by the user device, because it is looking for the "*9" pattern. The user would probably press the "*" key again, hoping that the called IVR system just did not hear the key press. At that point, the user device would send both "*" entries, as "**" does not match "*9". However, that would not have the effect the user intended when they pressed "*". On the other hand, there are situations where passing through tones in-band is not desirable. Such situations include call centers that use in-band tone spills to effect a transfer. For those situations, KPML adds a digit suppression token to H.248.1 [4] digit maps. The digit suppression token is a "Q". There MUST NOT be more than one Q in any given . If there is only a single and a single , the suppression processing is straightforward. The end-point passes digits until the stream matches the regular expression up to the suppression token, Q. At that point, the endpoint will continue collecting digits, but will suppress the generation or pass-through of any in-band digits. If the endpoint suppresses digits, it MUST indicate this by including the attribute "suppressed" with a value of "yes" in the digit report. A KPML endpoint MAY perform digit suppression. If it is not capable of digit suppression, it ignores the digit suppression token and will never send a suppressed indication in the digit report. At some point in time, the endpoint will collect enough digits to the point it hits a suppression marker. The interdigittimer attribute indicates how long to wait once the user enters digits before reporting a time-out error. If the interdigittimer expires, the endpoint MUST issue a time-out report and transmit the suppressed digits on the media stream. After digit suppression begins, it may become clear that a match will Burger Expires December 29, 2003 [Page 7] Internet-Draft KPML June 2003 not occur. For example, take the regular expression "*8Qxxx[2-9]xxxxxx". At the point the endpoint receives "*8", it will stop forwarding digits. Let us say that the next three digits are "408". If the next digit is a zero or one, the pattern will not match. NOTE: It is critically important for the endpoint to have a sensible inter-digit timer. This is because an errant dot (".") may suppress digit sending forever. Applications should be very careful to indicate suppression only when they are fairly sure the user will enter a digit string that will match the regular expression. In addition, applications should deal with situations such as no-match or time-out. This is because the endpoint will hold digits, which will have obvious user interface issues in the case of a failure. Burger Expires December 29, 2003 [Page 8] Internet-Draft KPML June 2003 4. HTTP Reporting For HTTP reporting, each tag in the markup has an href attribute. When the user enters keypress(es) that match a tag, the user device issues an http POST to the URI specified by the href. The body of the POST is a report of the actual digits entered. This is so the user device can indicate what digit string matched a pattern with wildcards. If the resulting document returned by the http POST is empty, the user device terminates the KPML session. NOTE: This is different than the behavior for VoiceXML as described in Basic Network Media Services with SIP [13], where an empty document results in the termination of the session. If the KPML document includes "sip:" or "im:" href targets, and the KPML interpreter does not support SIP Reporting, the KPML interpreter MUST reject the document in its entirety at interpretation time with the appropriate SIP error as described in ????. EDITOR'S NOTE: draft-jennings-sip-app-info-00.txt should cover document rejection. It does not right now. KPML does not address document rejection, other than the criteria for rejection. This draft focuses on KPML, not on the initiation mechanism. Burger Expires December 29, 2003 [Page 9] Internet-Draft KPML June 2003 5. SIP Reporting For SIP reporting, the href attribute of the tag MUST be a SIP compatible scheme, such as "sip:", "sips:", or "im:". When the user enters keypress(es) that match a tag, the user device will issue a SIP MESSAGE to the Request-URI specified by the href attribute. RFC3428 [14] describes the protocol machinery for the MESSAGE method. NOTE: What MESSAGE has over INFO is some sense of congestion control. What MESSAGE has over NOTIFY is there are no implicit subscription issues. A KPML interpreter cannot validate the href URI. See the Security Considerations (Section 11) section. The reason one must specify a sip: scheme, and not simply make href optional, is to catch a HTTP-based script error where one forgets to specify the href tag. If href was optional, then this error would result in the user device generating a SIP NOTIFY, which would not be the desired action. The specification of any scheme-specific part, that is, anything following the colon in "sip:", is an error. The interpreter MUST reject the request. After reporting a SIP , the interpreter terminates the KPML session. To collect more digits, the requestor must issue a re-INVITE on the dialog. NOTE: This highlights the "one shot" nature of KPML, reflecting the balance of features and ease of implementing an interpreter. If your goal is to build an IVR session, we strongly suggest you investigate more appropriate technologies such as VoiceXML [8] or MSCML [9]. If the KPML document includes "http:" href targets, and the KPML interpreter does not support HTTP Reporting, the KPML interpreter MUST reject the document in its entirety at interpretation time with the appropriate SIP error as described in ?????. NOTE: draft-jennings-sip-app-info-00.txt should cover document rejection. It does not right now. This draft should not address document rejection, other than the criteria for rejection. This draft focuses on KPML, not on the initiation mechanism. Burger Expires December 29, 2003 [Page 10] Internet-Draft KPML June 2003 6. Mixing HTTP and SIP NOTE: So, now that Pandora's Box is open... There is nothing to prevent mixing SIP and HTTP reporting requests in the same KPML document. While this may be a benefit, it has definite drawbacks. The major drawback is that one cannot negotiate SIP-ness or HTTP-ness. As far as the endpoints are concerned it is all KPML. One could use "application/KPML+SIP+XML" and "application/ KPML+HTTP+XML", but that is pretty ugly. Burger Expires December 29, 2003 [Page 11] Internet-Draft KPML June 2003 7. Examples 7.1 Monitoring for Octorhorpe A common need for pre-paid and personal assistant applications is to monitor a conversation for a signal indicating a change in user focus from the party they called through the application to the application itself. For example, if you call a party using a pre-paid calling card and the party you call redirects you to voice mail, digits you press are for the voice mail system. However, many applications have a special key sequence, such as the octothorpe (#, or pound sign) or *9 that terminate the called party leg and shift the user's focus to the application. The following figure shows the KPML for long octothorpe. Note that the href is really on one line, but divided for clarity. Figure 1 - Long Octothorpe Example In this example, the parameter "session=19fsjcalksd" associates the http POST with the SIP call session. One can use other methods to associate the POST with a SIP call. The following examples will show these various methods. The regex value Z indicates the following digit needs to be a long-duration key press. F, from the H.248.1 DTMF package, is the octothorpe key. In fact, KPML supports all digits, 1-9, *, #, A-D from the H.248 DTMF.1 package. 7.2 Dial String Collection In this example, the user device collects a dial string. The application uses KPML to quickly determine when the user enters a target number. In addition, KPML indicates what type of number the user entered. Burger Expires December 29, 2003 [Page 12] Internet-Draft KPML June 2003 Figure 5 - IVR KPML Example Code The target of the http post, "sess$9aej08asd7", identifies the SIP session. NOTE: It is unclear if this usage of KPML is better than using a device control protocol like H.248.1. From the application's point of view, it has to do the low-level prompt-collect logic. Granted, it is relatively easy to change the key mappings for a given menu. However, often more of the call flow than a given menu mapping gets changed. Thus there would be little value in such a mapping to KPML. We STRONGLY suggest using a real scripting language such as VoiceXML or MSCML for this purpose. 7.4 SIP Request For example, the following figure is the example from Figure 1, but with SIP NOTIFY reporting. Figure 6 - Long Octothorpe Example The response body is identical to the response that Figure 1 would generate. Burger Expires December 29, 2003 [Page 14] Internet-Draft KPML June 2003 8. Report Body For HTTP or SIP responses, the body of the response from the user device is a KPML response form. The tag has an attribute, "digits". The digits attribute is the digit string. The digit string uses the conventional characters '*' and '#' for star and octothorpe respectively. Figure 7 shows a sample response body to the example in the Dial String Collection (Section 7.2) section. Figure 7 - Response Body NOTE: KPML does not include a timestamp. There are a number of reasons for this. First, what timestamp would in include? Would it be the time of the first detected keypress? The time the interpreter collected the entire string? A range? Second, if the RTP timestamp is a datum of interest, why not simply get RTP in the first place? That all said, if it is really compelling to have the timestamp in the response, it could be an attribute to the tag. Burger Expires December 29, 2003 [Page 15] Internet-Draft KPML June 2003 9. Formal Syntax The following syntax specification uses the augmented Data Type Definition (DTD) as described in XML [7]. Figure 8 - KPML DTD And, if you prefer XML Schema [5], here it is. Burger Expires December 29, 2003 [Page 16] Internet-Draft KPML June 2003 Burger Expires December 29, 2003 [Page 17] Internet-Draft KPML June 2003 Figure 9 - XML Schema for KPML Burger Expires December 29, 2003 [Page 18] Internet-Draft KPML June 2003 10. IANA Considerations 10.1 IANA Registration of MIME media type application/kpml+xml MIME media type name: application MIME subtype name: kpml+xml Required parameters: none Optional parameters: charset charset This parameter has identical semantics to the charset parameter of the "application/xml" media type as specified in XML Media Types [6]. Encoding considerations: See RFC3023 [6]. Interoperability considerations: See RFC2023 [6] and this document. Published specification: This document. Applications which use this media type: Session-oriented applications that have primitive user interfaces. Intended usage: COMMON 10.2 Schema Registration We really need a place to register the XML Schema. Where would that be? Burger Expires December 29, 2003 [Page 19] Internet-Draft KPML June 2003 11. Security Considerations Since a KPML document directs a device to send results to an arbitrary URI, one could construct distributed denial of service attacks. For example, an errant application might send out KPML to numerous endpoints, all reporting to a single, unrelated href. Thus the policies for accepting KPML, such as access control lists ("only accept KPML from these hosts"), report limiting lists ("only send KPML responses to these hosts"), and other protections must be well thought out. At the very least, the session startup protocol SHOULD be non-repudiable and secure. KPML presents no further security issues beyond the startup issues addressed in the companion documents to this document. As an XML markup, all of the security considerations of RFC3023 [6] apply. Burger Expires December 29, 2003 [Page 20] Internet-Draft KPML June 2003 Normative References [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [2] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. [3] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [4] Groves, C., Pantaleo, M., Anderson, T. and T. Taylor, "Gateway Control Protocol Version 1", RFC 3525, June 2003. [5] Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML Schema Part 1: Structures", W3C REC REC-xmlschema-1-20010502, May 2001. [6] Murata, M., St. Laurent, S. and D. Kohn, "XML Media Types", RFC 3023, January 2001. Burger Expires December 29, 2003 [Page 21] Internet-Draft KPML June 2003 Informative References [7] Bray, T., Paoli, J., Sperberg-McQueen, C. and E. Maler, "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C REC REC-xml-20001006, October 2000. [8] World Wide Web Consortium, "Voice Extensible Markup Language (VoiceXML) Version 2.0", W3C Working Draft , April 2002, . [9] Burger, E., Van Dyke, J. and A. Spitzer, "SnowShore Media Server Control Markup Language and Protocol", draft-vandyke-mscml-00 (work in progress), November 2002. [10] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000. [11] Andreasen, F. and B. Foster, "Media Gateway Control Protocol (MGCP) Version 1.0", RFC 3435, January 2003. [12] Hunt, A. and S. McGlashan, "Speech Recognition Grammar Specification Version 1.0", W3C CR CR-speech-grammar-20020626, June 2002. [13] Van Dyke, J., Burger (Ed.), E. and A. Spitzer, "Basic Network Media Services with SIP", January 2003. [14] Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C. and D. Gurle, "Session Initiation Protocol (SIP) Extension for Instant Messaging", RFC 3428, December 2002. [15] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996. Author's Address Eric Burger SnowShore Networks, Inc. 285 Billerica Rd. Chelmsford, MA 01824-4120 USA EMail: e.burger@ieee.org Burger Expires December 29, 2003 [Page 22] Internet-Draft KPML June 2003 Appendix A. Contributors Robert Fairlie-Cuninghame, Cullen Jennings, Jonathan Rosenberg, and I were the members of the Application Stimulus Signaling Design Team. All members of the team contributed significantly to this work. In addition, Jonathan Rosenberg postulated DML in his "A Framework for Stimulus Signaling in SIP Using Markup" draft. This version of KPML has significant influence from MSCML, the SnowShore Media Server Control Markup Language. Jeff Van Dyke and Andy Spitzer were the primary contributors to that effort. That said, any errors, misinterpretation, or fouls in this document are my own. Burger Expires December 29, 2003 [Page 23] Internet-Draft KPML June 2003 Appendix B. Acknowledgements Hal Purdy and Eric Cheung of AT&T Laboratories helped immensely through many conversations and challenges. Steve Fisher of AT&T Laboratories helped with the digit suppression logic and syntax. Burger Expires December 29, 2003 [Page 24] Internet-Draft KPML June 2003 Intellectual Property Statement The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director. Full Copyright Statement Copyright (C) The Internet Society (2003). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION Burger Expires December 29, 2003 [Page 25] Internet-Draft KPML June 2003 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgment Funding for the RFC Editor function is currently provided by the Internet Society. Burger Expires December 29, 2003 [Page 26]