MMUSIC Working Group                                        S. Whitehead
Internet-Draft                                 Verizon Laboratories Inc.
Expires: December 23, 2006                                M.J. Montpetit
                                       Motorola Connected Home Solutions
                                                               X. Marjou
                                                          France Telecom
                                                           June 21, 2006


     An Evaluation of Session Initiation Protocol (SIP) for use in
                      Streaming Media Applications
           draft-whitehead-mmusic-sip-for-streaming-media-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 23, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This draft summarizes a set of use cases and their associated
   requirements that suggest a convergence between the Session
   Initiation Protocol (SIP) [2] and the Real Time Streaming Protocol
   (RTSP [3] and RTSP v2 [4]) that may be beneficial for streaming
   media applications. This benefit is especially apparent in the
   context of converged/blended media services.

S. Whitehead, et al.    Expires December 23, 2006               [Page 1]

Internet-Draft          SIP for Streaming Media                June 2006

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Use Case Scenarios
     3.1.  Characteristics
     3.2.  Use Cases Descriptions
       3.2.1.  Video Surveillance
       3.2.2.  Blended services/videoconferencing
       3.2.3.  Sharing a video with another person over a multi-media
               call
       3.2.4.  Allow access to personal/private video content
       3.2.5.  VOD services that requires resource or QOS-guarantees
       3.2.6.  Settlement across provider boundaries
       3.2.7.  Intelligent selection of media encoding
   4.  Required capabilities/Derived Requirements
     4.1.  Scalability
     4.2.  Signaling Latency
     4.3.  User identification, authentication and authorization
     4.4.  Accounting, charging, and settlements
     4.5.  Server/client Location Discovery
     4.6.  NAT and Firewall Traversal
     4.7.  Session-based transport policy control
     4.8.  Extensible with respect to application control signaling
     4.9.  Support media negotiation
     4.10. Allow proxies
     4.11. Support media negotiation
     4.12. Support auto-configuration/installation
     4.13. Keeping DRM rights during the mobility session
   5.  Rationale for using SIP and RTSP for streaming application
   6.  Recommendations
   7.  IANA Considerations
   8.  Security Considerations
   Appendix A.  Acknowledgements
   Appendix B.  Change History
   9.  References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1. Introduction

   IP-based networks are continually improving in terms of bandwidth
   capacity and transport quality of service. At the same time,
   broadband services are continually expanding globally, both in
   reach and in added value. These developments are leading to an
   increase in the number and variety of deployment scenarios for
   streaming media applications. Many of these scenarios impose
   challenging new requirements on the signaling protocols used for
   these applications in terms of flexibility, scalability and network
   independence.

   Historically, RTSP ([3], [4]) has been the protocol of choice for
   streaming media applications and has covered both session control
   and media control. An obvious approach to addressing these new
   requirements, then, is to extend RTSP. This strategy appears able to
   address some of the new requirements, but not others. In particular,
   extending RTSP to meet some of these new requirements would involve
   introducing protocol mechanisms that already exist elsewhere, namely
   in SIP and its associated extensions.

   An alternative approach is to consider the possibility of using SIP
   for some of the functions needed by streaming media applications.
   While historically SIP has been used for communication services, the
   protocol itself is flexible enough (by design) to signal a wide
   variety of media streams.
   Moreover, driven in large part by the requirements associated with
   IP-based communication services, SIP has been extended over the
   years to address many of the same requirements currently facing
   next-generation media streaming applications. Rather than reinvent
   or duplicate protocol mechanisms in RTSP that already exist in SIP,
   a reasonable strategy may be to find a way to use SIP instead of, or
   in conjunction with, RTSP.

   This document presents some of the use cases that suggest a
   convergence between SIP and RTSP. These use cases will eventually
   also be used to derive requirements on the service signaling
   protocol, after which high-level strategies will be defined. The
   goal of this draft is not to propose all-SIP or SIP/RTSP solutions,
   but to consider the ways in which RTSP is not sufficient for some
   streaming media applications and where SIP could fill the gap.

   The purpose of the document is to give a list of use cases
   (Section 3), a list of derived capabilities (Section 4), a rationale
   for considering SIP in conjunction with RTSP (Section 5), and
   recommendations for future work (Section 6). The technical solution
   options may be investigated in a future draft based on the current
   document.

2. Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
   RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
   described in BCP 14 [1] and indicate requirement levels for
   compliant implementations.

   See [3] and [2] for terminology.

   ...others to be added when necessary...

3. Use Case Scenarios

   The scope of applications for this document includes applications
   with the following characteristics: content-on-demand, streaming
   media, unicast media streams, live or recorded content, ubiquitous
   access (any-device, any-access).
   While of interest, non-streaming media applications, such as
   downloaded media services, are outside the scope of this document.

3.1. Characteristics

   For the purposes of this document, the term 'controlled streaming
   media application' represents the class of applications with the
   following characteristics:

   o  multiple servers can be a source of content, while appearing as a
      single multiplexed stream at the client

   o  one or more clients can receive the content

   o  the media stream(s) need to be delivered isochronously. In the
      most common case, the client intends to begin rendering the media
      before delivery is complete

   o  less common but equally valid, the server does not have the
      resources to buffer content until the client is ready to receive
      it, e.g., a live feed

   o  a session exists between source (e.g., server or peer) and
      destination (e.g., client or peer)

   o  the session is established, managed, and terminated through the
      use of a signaling protocol, in which control messages are
      exchanged (either directly or indirectly) between the source and
      the destination. This is referred to as 'session signaling'

   o  the application supports media stream control. The client (or a
      proxy element acting on behalf of the client(s)) has the ability
      to manipulate the media stream (or another aspect of the
      application) via signaling. This is referred to as 'application
      signaling' (or media control signaling).

3.2. Use Cases Descriptions

   As IP-based broadband data services have continued to develop and
   expand, opportunities for streaming media applications have also
   proliferated and expanded beyond the traditional framework. This
   section describes several streaming media application use case
   scenarios. These scenarios illustrate the variety of conditions and
   environments in which streaming media applications need to operate.
   The use cases serve to clarify what a 'streaming media application'
   is and to explore the application space. The objectives are to:

   o  frame and scope the discussion

   o  illustrate some of the usage scenarios

   o  identify some of the key attributes that characterize these use
      cases

   These use cases will show that, most of the time, the choice is not
   between SIP and RTSP; rather, closer integration of the two leads to
   more robust solutions. The use cases also show that the session
   setup negotiations are usually independent of the media controls.
   Hence, in many of the use cases, SIP could be used to replace the
   RTSP SETUP and DESCRIBE methods, leaving RTSP to be used for media
   control only.

3.2.1. Video Surveillance

   This is the first of a number of use cases that relate to
   "conversational video". In this use case a user wants either:

   o  to switch from the unidirectional audio-video monitoring session
      to a two-way (bidirectional) conversation in order to interact
      with an unknown visitor. In this case the session is "upgraded"
      from one-way to two-way; or

   o  to switch from unidirectional "live" content to unidirectional
      "recorded" content by means of a rewind "trick play" command.

   Instead of RTSP, SIP can be used to set up the communication, which
   provides the ability to switch from the one-way monitoring
   communication to the bidirectional communication. RTSP features
   related to trick plays can be used to interact remotely with the
   stream. For example, the remote viewer should be able to issue a
   rewind command on the previously recorded content. This is one use
   case for the convergence of both protocols.

3.2.2. Blended services/videoconferencing

   In this use case, the user wants to switch from a "bidirectional"
   live conversation to "unidirectional" recorded content.
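
   As a rough illustration of the one-way/two-way switch discussed in
   these use cases: if SIP carried the session description, the switch
   could plausibly be expressed as a new SDP offer (sent in a SIP
   re-INVITE) that changes the media direction attribute, for example
   from a=recvonly to a=sendrecv. The Python sketch below only rewrites
   that attribute. The function name set_direction, the session name,
   and the sample addresses and ports are hypothetical and are not
   taken from this draft. A real re-offer would also increment the
   version field of the o= line, which the sketch omits.

```python
# Hypothetical sketch: names, addresses, and ports are illustrative only.
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")

# One-way monitoring offer from the viewer's side (receive-only media).
MONITORING_OFFER = "\r\n".join([
    "v=0",
    "o=viewer 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Door camera",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=recvonly",
    "m=video 51372 RTP/AVP 31",
    "a=recvonly",
]) + "\r\n"


def set_direction(sdp: str, direction: str) -> str:
    """Return a copy of `sdp` in which every media section carries `direction`."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown SDP direction: {direction}")
    out = []
    seen_media = False
    for line in sdp.strip().splitlines():
        # Drop any existing direction attribute; it is re-added below.
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            continue
        if line.startswith("m="):
            if seen_media:
                # Close the previous media section with the new direction.
                out.append("a=" + direction)
            seen_media = True
        out.append(line)
    if seen_media:
        # Close the last media section.
        out.append("a=" + direction)
    return "\r\n".join(out) + "\r\n"


if __name__ == "__main__":
    # "Upgrade" the monitoring session to a two-way conversation.
    print(set_direction(MONITORING_OFFER, "sendrecv"))
```

   Calling set_direction(MONITORING_OFFER, "sendrecv") models the
   "upgrade" to a conversation. Calling it with "recvonly" on the
   result models the return to one-way viewing. Media control such as
   rewind is deliberately left out, since the use cases assign it to
   RTSP.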
   This use case also allows multiple services (streaming,
   communications, information) to share a common signaling
   infrastructure. In this case SIP is again used for authentication,
   billing and location. RTSP allows the remote viewer/user to issue a
   "trick mode" rewind command in a videoconferencing context (to see
   what happened before joining in).

3.2.3. Sharing a video with another person over a multi-media call

   This is another "conversational video" use case. A user already in
   an audio-video conversation (using a SIP-based protocol) wants to
   provide local audio-video content to the called party. If the remote
   user wants to watch the live content ("you should see what I am
   witnessing now") or request recorded content ("can I look at the
   game you recorded yesterday?"), much of the communication setup
   generally performed by RTSP can be avoided with SIP, because the
   relationship (including authentication and billing) between the two
   parties is already established.

   This use case does not mean that the bidirectional "conversational"
   streams are switched to unidirectional "recorded" streams, but that
   "recorded" streams are added to the "conversational" streams.

3.2.4. Allow access to personal/private video content

   In this scenario a user wants to remotely access personal content
   stored on a variety of media devices (e.g., watch a pre-recorded
   show from a mobile device at work). While the streaming of the
   content and the trick plays will use established RTSP functionality,
   the use of SIP locator services, strong authentication and
   authorization, and presence makes this solution more feasible. For
   example, if the content is stored on a device located on a private
   network behind a firewall/NAT, SIP, via its name/location
   registration mechanism, can be used to locate and connect to the
   device.

   In addition, for this use case, access to video content should make
   "session mobility" possible.
   In other words, when viewing video content, it should be possible
   for a user's session to switch seamlessly from one terminal (e.g., a
   mobile phone) to another (e.g., a television).

3.2.5. VOD services that requires resource or QOS-guarantees

   Consider a Video on Demand (stored video) service provided as a
   unicast session from a server to an end-user device. The user
   requests a VOD movie. The VOD server determines which video the user
   wants to watch, then contacts the appropriate network element (NE),
   asking it to reserve resources for the user and to confirm the
   reservation back to the server. The problem here is that, until this
   point, the NE cannot fully estimate and negotiate the media
   resources needed for the whole duration of the VOD session. SIP has
   preconditions that could be used; RTSP has no such functionality.

3.2.6. Settlement across provider boundaries

   If a commercial VOD service is offered by one party (e.g., a service
   provider) but receives carriage (transport) from one or more other
   parties (network providers), a mechanism is needed to allow service
   signaling to exchange transaction identifiers for the purpose of
   charge correlation and settlements. SIP (and its associated
   extensions) supports these capabilities. RTSP at present does not.

3.2.7. Intelligent selection of media encoding

   A user orders content to be delivered to his or her current device.
   The content could exist in different formats (e.g., standard
   definition or high definition) or encodings (e.g., MPEG2 or MPEG4).
   Media negotiations may need to be informed by network transport
   capabilities, based on knowledge of the access-network type.

   ... need more words ...

4. Required capabilities/Derived Requirements

   This section lists key requirements derived from the application use
   cases described in the previous section.
   The requirements are described in terms of the capabilities provided
   by a prospective solution.

4.1. Scalability

   Any solution must be able to accommodate:

   o  millions of clients and servers

   o  individual servers that may need to support thousands of parallel
      sessions

   o  individual clients that may need to support a number of
      simultaneous sessions

4.2. Signaling Latency

   Because the use cases refer to live content, the latency budget is
   important for the user experience. Thus, the solution must support
   the following requirements:

   o  The session negotiation should complete in a few seconds at most
      (TBC: need confirmation).

   o  The media control operations should complete in less than a
      second (TBC: need confirmation).

4.3. User identification, authentication and authorization

   In many targeted personal video streaming solutions (including
   peer-to-peer), there is still a need to identify the source and
   destination, authenticate users, and authorize access.

4.4. Accounting, charging, and settlements

   In the case of commercial applications, billing aspects need to be
   addressed. The billing solution must also provide mechanisms to
   support charging correlation and settlements among the parties that
   collaborate to deliver the service.

4.5. Server/client Location Discovery

   Location and discovery of the end point of a session is essential
   for personalization and targeted services.

4.6. NAT and Firewall Traversal

   The solution must provide a way to traverse NATs and firewalls. For
   example, when switching from a remote monitoring communication to a
   conversational communication, the NAT bindings should not be
   renegotiated.

4.7. Session-based transport policy control

   The signaling mechanism should provide a means to ensure that
   sufficient network resources are available to deliver the service at
   the desired quality of experience.
   In the event that sufficient resources are unavailable, the
   signaling mechanism should provide a means for denying the service
   request.

4.8. Extensible with respect to application control signaling

   (Support many different application types)

   ...words to be added...

4.9. Support media negotiation

   The solution must provide a way to negotiate all media sessions
   (e.g., conversational and streaming) as a whole, using the
   offer/answer model described in RFC 3264, so that both parties can
   estimate the media resources involved in the session.

4.10. Allow proxies

4.11. Support media negotiation

   The signaling protocol should support the following capabilities
   with respect to negotiating media flows:

   o  Support for negotiating per-flow/media QoS and bandwidth
      requirements.

   o  Ability to add, delete, and modify media flows in a session.

   o  Ability to support both unidirectional and bidirectional flows in
      a single session.

4.12. Support auto-configuration/installation

   ...words to be added...

4.13. Keeping DRM rights during the mobility session

   ...words to be added...

5. Rationale for using SIP and RTSP for streaming application

   The set of use cases outlined above presents a compelling argument
   for considering SIP for establishing "conversational video"
   communications, or for sessions that may become conversations. SIP
   also supports a rich set of capabilities that are useful in the
   context of commercial streaming media applications.
   In particular, SIP has the following properties that can be
   leveraged in streaming sessions:

   o  SIP acts as a rendezvous protocol (with many capabilities)

   o  SIP carries the Session Description Protocol (SDP)

   o  SIP supports invitation to unicast or multicast sessions

   o  SIP supports SDP with the Offer/Answer Model

   o  SIP works with NAT via ICE for SIP

   o  SIP supports unidirectional and bidirectional communications
      (e.g., a session can switch to two-way or be mixed with streaming
      services)

   o  SIP supports a set of P-headers useful in the context of
      commercial service settings, for example:

      *  charging-ids: useful for 3rd party content providers

      *  access-network headers: useful for inferring proper content
         encoding

   We support the use of RTSP for controlling the playback of media
   flows such as the delivery of webcam monitoring content, the
   delivery of Video on Demand and the delivery of IPTV. RTSP as
   defined in [3] has the following properties that are important to
   build on:

   o  RTSP acts as a lightweight rendezvous protocol

   o  RTSP supports trick plays and media control
      (pause/rewind/forward/...)

   o  RTSP carries and interprets the Session Description Protocol

   o  RTSP supports invitation to unicast or multicast sessions

   o  RTSP is a recognized standard for streaming applications

   There are obvious overlaps, and more work is needed to assess how
   the two protocols can be made to interwork better without major
   disruptions to existing applications.

6. Recommendations

   We propose that further work be initiated to explore how to signal
   streaming media sessions using SIP, based on the use cases defined
   in this document. We propose to reuse RTSP as a control stream
   negotiated by SIP/SDP.

7. IANA Considerations

   The RTSP 'encoding format' and the new media attributes may need to
   be registered.

8. Security Considerations

   No rogue 3rd party should be allowed to get the token and use it to
   set up an unauthorized RTSP session.

Appendix A. Acknowledgements

   Many thanks to those who provided valuable inputs for this document,
   namely Darren Loher, C. Steck, Osher Hmelnizky, Jonathan Rosenberg,
   David Ress, Ravishankar Shiroor, Martti Mela and Xupei Li. Many
   thanks to David Ress, JK Muthukumarasamy, Jim Baratz and Sam Ganesan
   for many emails and personal discussions.

Appendix B. Change History

   v01

   o  Removed sections on particular solutions

   o  Refined the use cases sections and the scenarios

   o  Added recommendations based on discussions at and after IETF 65.

9. Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
        Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
        Session Initiation Protocol", RFC 3261, June 2002.

   [3]  Schulzrinne, H., Rao, A., and R. Lanphier, "RTSP: Real Time
        Streaming Protocol", RFC 2326, April 1998.

   [4]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., and A.
        Narasimhan, "Real Time Streaming Protocol 2.0 (RTSP)",
        October 2005.

Authors' Addresses

   Steven Whitehead
   Verizon Laboratories Inc.
   40 Sylvan Road
   Waltham, MA 02451
   USA

   Email: steven.d.whitehead@verizon.com


   Marie-Jose Montpetit
   Motorola Connected Home Solutions
   55 Hayden Avenue, 1st floor
   Lexington, MA 02421
   USA

   Email: mmontpetit@motorola.com


   Xavier Marjou
   France Telecom
   Rue Pierre Marzin
   Lannion 22300
   France

   Email: xavier.marjou@orange-ft.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.