Internet Engineering Task Force                         MMUSIC WG
Internet Draft                                     H. Schulzrinne
ietf-mmusic-stream-00.txt                             Columbia U.
                                                November 26, 1996
                                                 Expires: 26/8/97

A real-time stream control protocol (RTSP')

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress''.

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

ABSTRACT

This strawman proposal presents a revised version of the RTSP proposal put forward to the MMUSIC group, borrowing liberally from the original.

The Real Time Streaming Protocol, or RTSP, is an application-level protocol for control over the delivery of data with real-time properties. RTSP provides an extensible framework to enable controlled, on-demand delivery of real-time data, such as audio and video. Sources of data can include both live data feeds and stored clips. This protocol is intended to control multiple data delivery sessions, provide a means for choosing delivery channels such as UDP, multicast UDP and TCP, and delivery mechanisms based upon RTP (RFC 1889).

1 Introduction

1.1 Terminology

conference: a multiparty, multimedia session, where "multi" implies greater than or equal to one.

client: The client requests media data from the media server.

entity: An entity is a participant in a conference. This participant may be non-human, e.g., a media record or playback server.

media server: The network entity providing playback or recording services for one or more media streams. Different media streams within a session may originate from different media servers. A media server may reside on the same or a different host as the web server the media session is invoked from.

(media) stream: A single media instance, e.g., an audio stream or a video stream as well as a whiteboard or shared application session. When using RTP, a stream consists of all RTP and RTCP packets created by a source within an RTP session. [TBD: terminology is confusing since there's an RTP session, which is used by a single RTSP stream.]

media session: A collection of media streams to be treated together. Typically, a client will synchronize all media streams within a media session.

session description: A session description contains information about one or more media within a session, such as the set of encodings, network addresses and information about the content. The session description may take several different formats, including SDP and SDF.

Both client and server can send commands. The protocol supports the following operations:

Retrieval of media from media server: The client can request a session description via HTTP or some other method. If the session is being multicast, the session description contains the multicast addresses and ports to be used. If the session is to be sent only to the client, the client provides the destination for security reasons.

Invitation of media server to conference: A media server can be "invited" to join an existing conference, either to play back media into the session or to record all or a subset of the media in a session. This mode is useful for distributed teaching applications.
Several parties in the conference may take turns "pushing the remote control buttons".

Adding media to an existing session: Particularly for live events, it is useful if the server can tell the client about additional media becoming available.

1.2 Requirements

The protocol satisfies the following requirements:

extendable: new commands and parameters can be easily added

easy to parse: standard HTTP or MIME parsers can be (but do not have to be) used

secure: re-uses web security mechanisms, either at the transport level (SSL) or within the requests (basic and digest authentication)

transport-independent: may use either an unreliable datagram protocol (UDP), a reliable datagram protocol (RDP, not widely used) or a reliable stream protocol (TCP) by implementing application-level reliability

multi-server capable: Each media stream within a session can reside on a different server. The client automatically establishes several concurrent control sessions with the different media servers. Media synchronization is performed at the transport level.

multi-client capable: Stream identifiers can be used by several control streams, so that "passing the remote" is possible. The protocol does not address how several clients negotiate access; this is left to either a "social protocol" or some other floor control mechanism.

control of recording devices: The protocol can control both recording and playback devices, as well as devices that can alternate between the two modes ("VCR").

separation of stream control and conference initiation: Stream control is divorced from inviting a media server to a conference. The only requirement is that the conference initiation protocol either provides or can be used to create a unique conference identifier. In particular, S*IP or H.323 may be used to invite a server to a conference.

suitable for professional applications: RTSP' supports frame-level accuracy through SMPTE time stamps to allow remote digital editing.

S*IP compatible: As much as possible, stream control should be aligned with the IETF conference initiation effort. However, for simple applications, a media server should not have to implement a conference initiation protocol.

session description neutral: The protocol does not impose a particular session description or metafile format and can convey the type of format to be used. However, the session description must contain an RTSP URI.

proxy and firewall friendly: The protocol should be readily handled by both application and transport-layer (SOCKS) firewalls. For proxies, re-use of existing proxies should be possible, but remains to be verified. [TBD: what exactly is needed to make a protocol firewall-friendly?] A firewall may need to understand the SET_PORT directive to open a "hole" for the UDP media stream.

HTTP friendly: Where sensible, RTSP re-uses HTTP concepts, so that the existing infrastructure can be re-used.

1.3 Extending the Protocol

The protocol described below can be extended in three ways, listed in order of the magnitude of changes supported:

o Existing commands can be extended with new parameters, as long as these parameters can be safely ignored by the recipient. (This is equivalent to adding new parameters to an HTML tag.)

o New methods can be added. If the recipient of the message does not understand the request, it responds with error code 501 (Not implemented) and the sender can then attempt an earlier, less functional version.

o A new version of the protocol can be defined, allowing almost all aspects (except the position of the protocol version number) to change.

1.4 Overall Operation

Each media stream and session is identified by an rtsp URL.
The overall session and the properties of the media the session is made up of are defined by a session description file, the format of which is outside the scope of this specification. The session description file is retrieved using HTTP, either from the web server or the media server, typically using a URL with scheme http.

The session description file contains a description of the media streams making up the media session, including their encodings, language, and other parameters that enable the client to choose the most appropriate combination of media. In this session description, each media stream is identified by an rtsp URL, which points to the media server handling that particular media stream. Several media streams can be located on different servers; for example, audio and video tracks can be split across servers for load sharing. The description also enumerates which transport methods the server is capable of. If desired, the session description can also contain only an RTSP URL, with the complete session description retrieved via RTSP.

Besides the media parameters, the network destination address and port need to be determined. Several modes of operation can be distinguished:

Unicast: The media is transmitted to the source of the RTSP request, with the port number picked by the client. Alternatively, the media is transmitted on the same reliable stream as RTSP.

Multicast, server chooses address: The media server picks the multicast address and port. This is the typical case for a live or near-media-on-demand transmission.

Multicast, client chooses address: If the server is to participate in an existing multicast conference, the multicast address, port and encryption key are given by the conference.

1.5 Relationship with Other Protocols

RTSP' has some overlap in functionality with HTTP. It also needs to interact with the web in that the initial contact with streaming content is often to be made through a web page.
The current protocol specification aims to allow different hand-off points between a web server and the media server implementing RTSP'. For example, the session description can be retrieved using HTTP or RTSP'. Having the session description be returned by the web server makes it possible to have the web server take care of authentication and billing, by handing out a session description whose media identifier includes an encrypted version of the requestor's IP address and a timestamp, with a shared secret between web and media server.

However, RTSP' differs fundamentally from HTTP in that data delivery takes place out-of-band, in a different protocol. HTTP is an asymmetric protocol, where the client issues requests and the server responds. In RTSP', both the media client and media server can issue requests. RTSP' requests are also not stateless, in that they may set parameters and continue to control a media stream long after the request has been acknowledged.

Re-using HTTP functionality has advantages in at least two areas, namely security and proxies. The requirements are very similar, so having the ability to adopt HTTP work on caches, proxies and authentication is valuable. The current RTSP already has first hints on caches and proxies, but is nowhere near as complete as HTTP in that regard.

It is possible to very quickly build a simple RTSP' server by adding a PLAY and, optionally, a SET_PARAMETER method to an existing HTTP/1.1 web server. All of RTSP' can be implemented as part of an HTTP server as long as only the client issues requests.

While most real-time media will use RTP as a transport protocol, RTSP' is not tied to RTP.

RTSP' assumes the existence of a session description format that can express both static and temporal properties of a media session containing several media streams.

2 Protocol Parameters

2.1 Message Format and Transmission

RTSP is a text-based protocol [TBD] and uses the ISO 10646 character set in UTF-8 encoding (RFC 2044) [TBD; this conflicts with ]. Lines are terminated by CRLF, but receivers should be prepared to also interpret CR and LF by themselves as line terminators.

Text-based protocols make it easier to add optional parameters in a self-describing manner. Since the number of parameters and the frequency of commands is low, processing efficiency is not a concern. Text-based protocols, if done carefully, also allow easy implementation in scripting languages such as Tcl, VisualBasic and Perl.

The 10646 character set avoids tricky character set switching, but is invisible to the application as long as US-ASCII is being used. This is also the encoding used for RTCP. ISO 8859-1 translates directly into Unicode, with a high-order octet of zero. ISO 8859-1 characters with the most-significant bit set are represented as 1100001x 10xxxxxx.

RTSP messages can be carried over any lower-layer transport protocol that is 8-bit clean.

Commands are acknowledged by the receiver unless they are sent to a multicast group. If there is no acknowledgement, the sender may resend the same message after a timeout of one round-trip time (RTT). The round-trip time is estimated as in TCP (RFC TBD), with an initial round-trip value of 500 ms. An implementation MAY cache the last RTT measurement as the initial value for future connections. If a reliable transport protocol is used to carry RTSP, the timeout value MAY be set to an arbitrarily large value.

This can greatly increase responsiveness for proxies operating in local-area networks with small RTTs. The mechanism is defined such that the client implementation does not have to be aware of whether a reliable or unreliable transport protocol is being used.
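
The timeout rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name RttEstimator is hypothetical, and the exponential smoothing with a gain of 1/8 is an assumption, since the draft only says the RTT is estimated "as in TCP". The 500 ms initial value, the optional cached starting value and the one-RTT timeout are taken from the text.

```python
class RttEstimator:
    """Smoothed round-trip time estimate used as the retransmission timeout.

    Hypothetical helper: the draft only says "estimated as in TCP"; the
    exponential smoothing with gain 1/8 used here is an assumption.
    """

    INITIAL_RTT = 0.5  # seconds; initial value given in the draft

    def __init__(self, cached_rtt=None, gain=0.125):
        # An implementation MAY reuse the last measurement of an
        # earlier connection as the starting point.
        self.srtt = cached_rtt if cached_rtt is not None else self.INITIAL_RTT
        self.gain = gain

    def update(self, sample):
        """Fold one measured round-trip time (seconds) into the estimate."""
        self.srtt += self.gain * (sample - self.srtt)
        return self.srtt

    def timeout(self, reliable_transport=False):
        """Time to wait for an acknowledgement before resending."""
        # Over a reliable transport the timeout MAY be arbitrarily large.
        return float("inf") if reliable_transport else self.srtt


est = RttEstimator()
print(est.timeout())       # 0.5
est.update(0.1)
print(round(est.srtt, 3))  # 0.45
```

Because the caller only asks for a timeout value, the same client code can run over UDP or TCP, which matches the intent that the client need not know which kind of transport is in use.
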

It is probably a bad idea to have two reliability mechanisms on top of each other, although the RTSP RTT estimate is likely to be larger than the TCP estimate.

Each request carries a sequence number, which is incremented by one for each request transmitted. If a request is repeated because of lack of acknowledgement, the sequence number is incremented.

This avoids ambiguities when computing round-trip time estimates. [TBD: An initial sequence number negotiation needs to be added for UDP; otherwise, a new stream connection may see a request be acknowledged by a delayed response from an earlier "connection". This handshake can be avoided with a sequence number containing a timestamp of sufficiently high resolution.]

The reliability mechanism described here does not protect against reordering. This may cause problems in some instances. For example, a STOP followed by a PLAY has quite a different effect than the reverse. Similarly, if a PLAY request arrives before all parameters are set due to reordering, the media server would have to issue an error indication. Since sequence numbers for retransmissions are incremented (to allow easy RTT estimation), the receiver cannot just ignore out-of-order packets. [TBD: This problem could be fixed by including both a sequence number that stays the same for retransmissions and a timestamp for RTT estimation.]

Systems implementing RTSP MUST support carrying RTSP over TCP and MAY support UDP. The default port for the RTSP server is [PORT] for both UDP and TCP.

A number of RTSP packets destined for the same control end point may be packed into a single lower-layer PDU or encapsulated into a TCP stream. RTSP data MAY be interleaved with RTP and RTCP packets. An RTSP packet is terminated with an empty line. (TBD: doesn't work well for including session descriptions. Maybe use Content-length for payloads - these are usually imported anyway? or new page?
Wrapping a packet in some kind of braces or parentheses is another possibility, but again puts restrictions on the SDF.)

Unless all but the RTP data is textual, there is not much point in keeping the payload as textual data, since visual debugging is more difficult and "telnet protocol emulation" is no longer possible. Length fields don't make much sense for textual data, particularly because of the line termination ambiguities, i.e., CR, LF and CRLF. There does not seem to be a need for an explicit, connection-oriented framing layer as in the original RTSP proposal. However, if we allow interleaving with RTP, a textual format gets very awkward.

Requests contain methods, the object the method is operating upon and parameters to further describe the method. Methods are idempotent, unless otherwise noted. Methods are also designed to require little or no state maintenance at the media server.

A message has the following format:

     Method Object Version Sequence-Number
     *(Parameter-Value)
     CRLF

A message with a message body has the following format:

     Method Object Version Sequence-Number
     Content-length:
     *(Parameter-Value)
     CRLF
     message-body

After receiving and interpreting an RTSP' request, the server responds with an RTSP' response message. [TBD: proper BNF]

A typical response to a request with sequence number 17 might be:

     RTSP/1.0 200 17 OK

This format is HTTP-friendly; the sequence number is simply ignored by HTTP servers. The likelihood that a textual protocol will share the same port and not have that format seems fairly remote. RTP packets have the most-significant bit set and can thus be easily distinguished.

If a connectionless transport protocol is used, the media server considers all packets originating from a single port number and network address to be part of the same session. [TBD: is this necessary?]
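
As an illustration of the message layout described in this section, the following Python sketch builds a request without a message body and splits a response line into its fields. The names build_request, parse_status_line and Response are hypothetical, and since the draft marks the grammar as TBD, the field order is inferred only from the format and the example response given above.

```python
from typing import NamedTuple


class Response(NamedTuple):
    version: str
    status: int
    seq: int
    reason: str


def build_request(method, obj, seq, params=None, version="RTSP/1.0"):
    """Format a request without a message body, following the layout above."""
    lines = [f"{method} {obj} {version} {seq}"]
    lines += [f"{name}: {value}" for name, value in (params or {}).items()]
    # A trailing empty line terminates the packet.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a line such as 'RTSP/1.0 200 17 OK' into its four fields."""
    # strip() tolerates CR, LF or CRLF line terminators.
    version, status, seq, reason = line.strip().split(" ", 3)
    return Response(version, int(status), int(seq), reason)


request = build_request("PLAY", "twister/*/1234", 17, {"Range": "smpte=0:10:20-"})
reply = parse_status_line("RTSP/1.0 200 17 OK")
assert reply.status == 200 and reply.seq == 17
```

Matching the sequence number of the reply against the outstanding request is what lets a sender pair acknowledgements with requests.
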

2.2 Session and Media URI

The RTSP URL scheme is used to locate and control stream resources via the RTSP protocol.

A media stream is identified by a textual session and media identifier, using the character set and escape conventions of URLs. The media identifier is separated from the session by a slash. Commands below can refer to either the whole session or an individual stream. Stream identifiers can be passed between clients ("passing the remote control"). A specific instance of a session, e.g., one of several concurrent transmissions of the same content, is appended where needed. The instance identifies the whole session, so that all media streams within that session have the same instance identifier.

For example,

     rtsp://media.content.com:5000/twister/audio.en/1234

identifies instance 1234 of the stream audio.en within the session "twister", which is located at port 5000 of host media.content.com.

The ordering and meaning of the path components of the rtsp URL are significant only to the media server.

This decoupling also allows session descriptions to be used with non-RTSP media control protocols, simply by replacing the scheme in the URL.

2.3 Encoding Identifiers

RTP profile and/or MIME types. [TBD: should probably register all the RTP data types as MIME types.]

2.4 Conference Identifiers

Conference identifiers are opaque to RTSP' and are encoded using standard URI encoding methods (i.e., escaping with %). They can contain any octet value. The conference identifier MUST be globally unique. For H.323, the conferenceID value is to be used.

If the conference participant inviting the media server would only supply a conference identifier which is unique for that inviting party, the media server could add an internal identifier for that party, e.g., its Internet address.
However, this would prevent the conference participant and the initiator of the RTSP commands from being two different entities.

2.5 Relative Timestamps

A relative time-stamp expresses time relative to the start of the clip. Relative timestamps are expressed as SMPTE time codes for frame-level access accuracy. The time code has the format hours:minutes:seconds.frames, with the origin at the start of the clip. For NTSC, the frame rate is 29.97 frames per second. This is handled by dropping the first two frame indices (values 00 and 01) of every minute, except every tenth minute. If the frame value is zero, it may be omitted.

Examples:

     10:12:33.40
     10:7:33

2.6 Absolute Time

Absolute time is expressed as ISO 8601 timestamps. It is always expressed as UTC (GMT).

Example for November 8, 1996 at 14h37 and 20 seconds GMT:

     19961108T143720Z

3 Header Field Definitions

3.1 Accept

The Accept request-header field can be used to specify certain session description types which are acceptable for the response.

The only parameter allowed is that of level, which indicates the highest level or version accepted by the requestor.

Example of use:

     Accept: application/sdf, application/sdp;level=2

3.2 Address

3.3 Allow

The Allow response header field lists the methods supported by the resource identified by the Request-URI. The purpose of this field is to strictly inform the recipient of valid methods associated with the resource. An Allow header field must be present in a 405 (Method not allowed) response.

Example of use:

     Allow: PLAY, RECORD, SET_PARAMETER

3.4 Authorization

3.5 Blocksize

3.6 Conference

This field establishes a logical connection between a conference, established using non-RTSP' means, and an RTSP stream. [TBD: This parameter is for further study. May not be needed with the Given parameter.]
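
The two time formats of Sections 2.5 and 2.6, which the Range header relies on, can be parsed along the following lines. This is a minimal Python sketch, not a normative parser: the function names parse_smpte and parse_absolute are hypothetical, and the sketch only splits the fields without validating frame counts against a frame rate.

```python
from datetime import datetime, timezone


def parse_smpte(text):
    """Parse 'hours:minutes:seconds[.frames]' into a tuple of four integers."""
    clock, _, frames = text.partition(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    # An omitted frame field means frame zero.
    return hours, minutes, seconds, int(frames) if frames else 0


def parse_absolute(text):
    """Parse an ISO 8601 basic-format UTC timestamp such as 19961108T143720Z."""
    parsed = datetime.strptime(text, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parse_smpte("10:7:33"))                          # (10, 7, 33, 0)
print(parse_absolute("19961108T143720Z").isoformat())  # 1996-11-08T14:37:20+00:00
```
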

3.7 Content-Length

3.8 Content-Type

3.9 Given

3.10 Location

3.11 Port

3.12 Range

3.13 Speed

3.14 Transport

3.15 TTL

4 Methods

The Method token indicates the method to be performed on the resource identified by the Request-URI. The method is case-sensitive. New methods may be defined in the future. Method names may not start with a $ character (24 hexadecimal) and must be a token.

4.1 GET

The GET method retrieves a session description from a server. It may use the Accept header to specify the session description formats that the client understands.

     GET twister RTSP/1.0 937
     Accept: application/sdp, application/sdf, application/mheg

If the media server has previously been invited to a conference, the GET method also contains a conference identifier or a Given parameter.

     GET twister RTSP/1.0 834
     Conference: 128.16.64.19/32492374
     Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FTZQ==

If the GET request contains a conference identifier, the media server MUST locate the conference description and use the multicast addresses and port numbers supplied in that description. The media server SHOULD only offer media types corresponding to the media types currently active within the conference. If the media server has no local reference to this conference, it returns status code 452.

The conference invitation should also contain an indication whether the media server is expected to receive or generate media, or both. (A VCR-like device would support both directions.) If the invitation does not contain an indication of the operations to be performed, the media server should accept and then reject inappropriate operations.

A typical response might be:

     200 18 OK
     Content-Type: application/sdf

     session description

4.2 SESSION

This method is used by a media server to send media information to the client.
If a new media type is added to a session (e.g., during a live event), the whole session description should be sent again, rather than just the additional components.

This allows the deletion of session components.

Example:

     SESSION twister/*/1234
     Content-Type: application/sdp

     Session Description

Response: 200, 302, 303, 500, can't do this operation, busy,

4.3 PLAY

The PLAY method tells the server to start sending data via the previously set transport mechanism. The Range header specifies the range. The range can be specified in a number of units. This specification defines the smpte (see Section 2.5) and clock (see Section 2.6) range units.

     PLAY media-name
     Range: smpte=range-value

The following example plays the whole session starting at SMPTE time code 0:10:20 until the end of the clip.

     PLAY twister/*/1234
     Range: smpte=0:10:20-

For playing back a recording of a live event, it may be desirable to use clock units:

     PLAY meeting/*/1234
     Range: clock=19961108T142300Z-19961108T143520Z

A media server only supporting playback MUST support the smpte format and MAY support the clock format.

[TBD: It may be desirable to allow several ranges, so that remote digital editing can be done easily.]

Response: 200, 500, 501, clock format not supported.

4.4 RECORD

This method initiates recording a range of media data according to the session description. The timestamp reflects start and end time (UTC). If no time range is given, use the start or end time provided in the session description. If the session has already started, commence recording immediately. The Conference header is mandatory.

A media server supporting recording of live events MUST support the clock range format; the smpte format does not make sense.

     RECORD meeting/audio.en/1234
     Conference: 128.16.64.19/32492374

4.5 REDIRECT

A redirect request informs the client that it must connect to another server location. It contains the mandatory header Location, which indicates that the client should issue a GET for that URL. It may contain the parameter Range, which indicates when the redirection takes effect.

4.6 SET_PARAMETER

Both client and media server can issue this request. The following parameters are defined:

Blocksize: This advisory parameter is sent from the client to the media server setting the transport packet size. The server truncates this packet size to the closest multiple of the minimum media-specific block size or overrides it with the media specific size if necessary. The block size is a strictly positive decimal number and measured in bytes. The server only returns an error (416) if the value is syntactically invalid, but not if the server adjusts it according to the mechanism described above or decides to simply ignore the advice.

Port: UDP or TCP port to be used for this media.

SSRC: RTP SSRC value to be used by the media server. This parameter is only valid for unicast transmission. It identifies the synchronization source to be associated with the media stream. This can be used for demultiplexing by the client of data received on the same port.

Address: Destination network address, consisting of the address class identifier and the address. Currently, the address classes IP4 and IP6 are defined.

Transport: Transport protocol stack to be used: UDP, TCP, or interleaved (in whatever protocol is being used by the control stream), followed by the next-layer transport protocol. Currently, the next-layer protocol RTP is defined. Parameters may be added to each protocol, separated by a semicolon. For RTP, the boolean parameter compressed is defined, indicating compressed RTP according to RFC XXXX. Example:

     UDP RTP;compressed

TTL: Multicast time-to-live value.
In some cases, it may make sense for a client to ask a media server sending on a given multicast address to expand its range.

Speed: This advisory parameter sets the speed at which the server delivers data to the client, contingent on the server's ability and desire to serve the media stream at the given speed. Implementation by the server is optional. The default is the bit rate of the stream. The parameter value is expressed as a decimal ratio, e.g., 2.0 indicates that data is to be delivered twice as fast as normal. A speed of zero is invalid. A negative value indicates that the stream is to be played back in reverse direction.

A request SHOULD only contain a single parameter to allow the client to determine why a particular request failed. A server MUST allow a parameter to be set repeatedly to the same value, but it MAY disallow changing parameter values.

The parameters are split in a fine-grained fashion so that, for example, the client can set just the unicast port, without having to modify the destination address. There is no substantial difference between the privileged parameters and the parameters identified by family and parameter id in the current RTSP spec. If desired, parameter names could easily take the form family/parameter, e.g., Audio/Annotations.

A SET_PARAMETER request without parameters can be used as a way to detect whether the other side is still responding.

Example:

     SET_PARAMETER twister/1234/audio.en RTSP/1.0 68
     Speed: 2.3

[TBD: Or should this be like SET_PARAMETER? Bit longer, but forces single parameter per request.]

4.7 GET_PARAMETER

Both client and media server can issue a GET_PARAMETER request to retrieve a specific parameter. All parameters described for the SET_PARAMETER request are valid. In the request, the message body contains the parameter name; in the response, it contains the parameter value. Only one parameter can be requested in each GET_PARAMETER request.

Example:

     C->S: GET_PARAMETER twister/1234/audio.en RTSP/1.0 6
           Content-length: 17

           Audio/Annotations

     S->C: RTSP/1.0 200 6 OK
           Content-type: text/ascii
           Content-length: 2

           64

4.8 STOP

Stops delivery of the stream immediately. Returns an indication of the current position, so that PLAY can be used instead of a resume operation. Thus, RESUME is not needed.

     C->M: STOP movie RTSP/1.0 76

     M->C: RTSP/1.0 200 76 OK

4.9 BYE

Sent by either client or server to terminate a connection and release resources.

4.10 Embedded Data Stream

The command DATA is used to indicate an embedded media data object, together with the content types. DATA requests are not acknowledged by RTSP'. The embedded object can have any type. For space-efficient encapsulation of binary data, the method in Section 4.11 should be used instead.

     DATA twisters/audio.en/1234 RTSP/1.1
     Content-Length: 500
     Content-Type: message/rtp

     (RTP data)

This is workable, but not very space-efficient. However, the interesting case is that of a single TCP stream carrying both commands and media data. There is no particular reason to have small chunks in that case.

4.11 Embedded Binary Data

Binary packets such as RTP data are encapsulated by an ASCII dollar sign (24 hexadecimal), followed by a one-byte session identifier, followed by the length of the encapsulated binary data as a binary, two-byte integer in network byte order. The binary data follows immediately afterwards, without a CRLF.

This makes the encapsulation overhead 4 bytes, less than the 8 bytes imposed by SCP.

5 Status Code Definitions

Where applicable, HTTP status codes are re-used. [TBD: add those relevant here]

5.1 Successful 2xx

5.1.1 200 OK

The request has succeeded. The information returned with the response depends on the method used in the request, for example:

GET: the session description;

GET_PARAMETER: the value of the parameter.
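
Returning briefly to the binary encapsulation of Section 4.11, the four-byte header can be sketched in Python as follows. The function names encapsulate and decapsulate are hypothetical and the payload bytes are made up for illustration; the layout (the dollar sign octet 0x24, a one-byte session identifier, a two-byte length in network byte order, then the data without a CRLF) follows the description in that section.

```python
import struct

DOLLAR = 0x24  # ASCII "$"


def encapsulate(session_id, payload):
    """Prefix binary data with '$', a one-byte session id and a two-byte length."""
    if not 0 <= session_id <= 0xFF:
        raise ValueError("session identifier must fit in one byte")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length field")
    # "!" selects network byte order: B = 1 byte, H = 2 bytes.
    return struct.pack("!BBH", DOLLAR, session_id, len(payload)) + payload


def decapsulate(data):
    """Return (session_id, payload, remaining bytes) for one framed packet."""
    marker, session_id, length = struct.unpack("!BBH", data[:4])
    if marker != DOLLAR:
        raise ValueError("not an embedded binary packet")
    return session_id, data[4:4 + length], data[4 + length:]


frame = encapsulate(1, b"\x80\x00rtp")
print(frame.hex())  # 240100058000727470
```

The explicit length field is what allows a receiver to skip over the binary data and resume reading textual requests on the same stream.
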

5.2 Redirection 3xx

5.2.1 301 Moved Permanently

5.2.2 303 Moved Temporarily

5.3 Client Error 4xx

5.3.1 400 Bad Request

The request could not be understood by the recipient due to malformed syntax. The request SHOULD NOT be repeated without modification.

5.3.2 401 Unauthorized

The request requires user authentication.

5.3.3 402 Payment Required

This code is reserved for future use.

5.3.4 405 Method Not Allowed

5.3.5 406 Not Acceptable

5.3.6 408 Request Timeout

5.3.7 411 Length Required

5.3.8 414 Request URI Too Long

5.3.9 415 Unsupported Mediatype

The recipient of the request is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.

5.3.10 450 Invalid Parameter

The parameter in the request is not valid, i.e., out of range or malformed.

5.3.11 451 Parameter Not Understood

The recipient of the request does not support one or more parameters contained in the request.

5.3.12 452 Conference Not Found

The conference indicated by a Conference: identifier is unknown to the media server.

5.3.13 453 Not Enough Bandwidth

The request was refused since there was insufficient bandwidth. This may, for example, be the result of a resource reservation failure.

5.4 Server Error 5xx

5.4.1 500 Internal Server Error

5.4.2 501 Not Implemented

5.4.3 502 Bad Gateway

5.4.4 503 Service Unavailable

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated.

5.4.5 504 Gateway Timeout

5.4.6 505 RTSP Version Not Supported

6 Examples

6.1 Media on demand (unicast)

Client C requests a movie from media servers A (audio.content.com) and V (video.content.com). The media description is stored on a web server W. This, however, is transparent to the client.
The client is only interested in the last part of the movie. The server requires authentication for this movie. The audio track can be dynamically switched between two sets of encodings. The URL with scheme rtspu indicates that the media servers want to use UDP for exchanging RTSP messages.

     C->W: GET twister HTTP/1.0
           Accept: application/sdf, application/sdp

     W->C: HTTP/1.0 200 OK
           Content-type: application/sdf

           (session
             (all
               (media (t audio)
                 (oneof
                   ((e PCMU/8000/1 89 DVI4/8000/1 90) (id lofi))
                   ((e DVI4/16000/2 90 DVI4/16000/2 91) (id hifi))
                 )
                 (language en)
                 (id rtspu://audio.content.com/twister/audio.en/1234)
               )
               (media (t video) (e JPEG)
                 (id rtspu://video.content.com/twister/video/1234)
               )
             )
           )

     C->A: SET_PARAMETER twister/audio.en/1234/lofi RTSP/1.0 1
           Port: 3056
           Transport: RTP;compression

     A->C: RTSP/1.0 200 1 OK

     C->V: SET_PARAMETER twister/video/1234/hifi RTSP/1.0 2
           Port: 3058
           Transport: RTP;compression

     V->C: RTSP/1.0 200 2 OK

     C->V: PLAY twister/video/1234 RTSP/1.0 3
           Range: smpte 0:10:00-

     V->C: RTSP/1.0 200 3 OK

     C->A: PLAY twister/audio.en/1234/lofi RTSP/1.0 4
           Range: smpte 0:10:00-

     A->C: RTSP/1.0 200 4 OK

Even though the audio and video tracks are on two different servers, may start at slightly different times, and may drift with respect to each other, the client can synchronize the two using standard RTP methods.

6.2 Live Media Event Using Multicast

The media server chooses the multicast address and port. Here, we assume that the web server only contains a pointer to the full description, while the media server M maintains the full description. During the session, a new subtitling stream is added.

     C->W: GET concert HTTP/1.0

     W->C: HTTP/1.0 200 OK
           Content-Type: application/sdf

           (session (id rtsp://live.content.com/concert) )

     C->M: GET concert RTSP/1.0 1

     M->C: RTSP/1.0 200 1 OK
           Content-Type: application/sdf

           (session
             (all
               (media (t audio) (id music) (a IP4 224.2.0.1) (p 3456))
             ))

     C->M: PLAY concert/music RTSP/1.0
           Range: smpte 1:12:0

     M->C: RTSP/1.0 405 No positioning possible

     M->C: SESSION concert RTSP/1.0
           Content-Type: application/sdf

           (session
             (all
               (media (t audio) (id music))
               (media (t text) (id lyrics))
             ))

     C->M: PLAY concert/lyrics RTSP/1.0

Since the session description already contains the necessary address information, the client does not set the transport address. The attempt to position the stream fails since this is a live event.

6.3 Playing media into an existing session

A conference participant C wants to have the media server M play back a demo tape into an existing conference. When retrieving the session description, C indicates to the media server that the network addresses and encryption keys are already given by the conference, so they should not be chosen by the server. The example omits the simple ACK responses.

     C->M: GET demo RTSP/1.0 1
           Accept: application/sdf, application/sdp
           Given: address, privacy

     M->C: RTSP/1.0 200 1 OK
           Content-type: application/sdf

           (session (id 548)
             (media (t audio) (id sound))
           )

     C->M: SET_PARAMETER demo/548/sound RTSP/1.0 2
           Address: IP4 224.2.0.1
           Port: 3456
           TTL: 127

6.4 Recording

Conference participant C asks the media server M to record a session. If the session description contains any alternatives, the server records them all.

     C->M: SESSION meeting RTSP/1.0 89
           Content-type: application/sdp

           v=0
           s=Mbone Audio
           i=Discussion of Mbone Engineering Issues

     M->C: RTSP/1.0 415 89 Unsupported Media Type
           Accept: application/sdf

     C->M: SESSION meeting RTSP/1.0 90
           Content-type: application/sdf

     M->C: RTSP/1.0 200 90 OK

     C->M: RECORD meeting
           Range: clock 19961110T1925-19961110T2015

7 Access Authentication

Besides limiting access, access authentication is also needed to avoid denial-of-service attacks.

8 Security Considerations

The protocol offers the opportunity for a remote-control denial-of-service attack. The attacker, using a forged source IP address, can ask for a stream to be played back to that forged IP address. This can be prevented by a challenge-response authentication. If the goal is simply to prevent this denial-of-service attack, a default, widely known key can be used. If the client retrieves a session description, the server hands out an encrypted version of the client's IP address to the client during the initial retrieval of the session description.

A Session Description

A session description must be able to identify sessions and individual media streams. The per-media identifier is created by the entity creating the session description and is opaque to anyone else. It may contain any 8-bit value except CR and LF.

B Notes on RTSP

   o The STREAM_HEADER functionality has been subsumed by the session description.

   o SEND_REPORT is not really needed. Should define an RTCP request with a random response interval.

   o Error reports are sent automatically. If the server wants to terminate the connection, it sends a BYE.

   o Resending (UDP_RESEND) should be handled by RTCP since it is always media-specific and RTCP can be readily flow-controlled to avoid congestion collapse.

   o Is STOP really needed? What's the difference between STOP and PAUSE? Resources (which?) cannot be released since there may be a PLAY command immediately. Bearing on resource reservation?

C Author Addresses

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   USA
   electronic mail: schulzrinne@cs.columbia.edu

D Acknowledgements

This draft is based on the functionality of the RTSP draft. It also borrows format and descriptions from HTTP/1.1.