Internet Engineering Task Force                                MMUSIC WG
Internet Draft                                            H. Schulzrinne
ietf-mmusic-stream-00.txt                                    Columbia U.
November 26, 1996
Expires: 26/8/97


              A real-time stream control protocol (RTSP')

STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

                                 ABSTRACT


         This strawman proposal presents a revised version of the
         RTSP proposal put forward to the MMUSIC group, borrowing
         liberally from the original.

         The Real Time Streaming Protocol, or RTSP, is an
         application-level protocol for control over the delivery
         of data with real-time properties. RTSP provides an
         extensible framework to enable controlled, on-demand
         delivery of real-time data, such as audio and video.
         Sources of data can include both live data feeds and
         stored clips. This protocol is intended to control
         multiple data delivery sessions, provide a means for
         choosing delivery channels such as UDP, multicast UDP and


H. Schulzrinne                                                [Page 1]

Internet Draft                   stream                November 26, 1996


         TCP, and delivery mechanisms based upon RTP (RFC 1889).

1 Introduction

1.1 Terminology

   conference: a multiparty, multimedia session, where "multi" implies
        greater than or equal to one.

   client: The client requests media data from the media server.

   entity: An entity is a participant in a conference. This participant
        may be non-human, e.g., a media record or playback server.

   media server: The network entity providing playback or recording
        services for one or more media streams. Different media streams
        within a session may originate from different media servers. A
        media server may reside on the same or a different host as the
        web server the media session is invoked from.

   (media) stream: A single media instance, e.g., an audio stream or a
        video stream as well as a whiteboard or shared application
        session. When using RTP, a stream consists of all RTP and RTCP
        packets created by a source within an RTP session.

   [TBD: terminology is confusing since there's an RTP session, which is
   used by a single RTSP stream.]

   media session: A collection of media streams to be treated together.
        Typically, a client will synchronize all media streams within a
        media session.

   session description: A session description contains information about
        one or more media within a session, such as the set of
        encodings, network addresses and information about the content.
        The session description may take several different formats,
        including SDP and SDF.

   Both client and server can send commands.

   The protocol supports the following operations:

   Retrieval of media from media server: The client can request a
        session description via HTTP or some other method. If the session
        is being multicast, the session description contains the
        multicast addresses and ports to be used. If the session is to
        be sent only to the client, the client provides the destination
        for security reasons.


   Invitation of media server to conference: A media server can be
        "invited" to join an existing conference, either to play back
        media into the session or to record all or a subset of the media
        in a session.  This mode is useful for distributed teaching
        applications. Several parties in the conference may take turns
        "pushing the remote control buttons".

   Adding media to an existing session: Particularly for live events, it
        is useful if the server can tell the client about additional
        media becoming available.

1.2 Requirements

   The protocol satisfies the following requirements:

   extendable: new commands and parameters can be easily added

   easy to parse: standard HTTP or MIME parsers can be (but do not have
        to be) used

   secure: re-uses web security mechanisms, either at the transport
        level (SSL) or within the requests (basic and digest
        authentication)

   transport-independent: may use either an unreliable datagram protocol
        (UDP), a reliable datagram protocol (RDP, not widely used) or a
        reliable stream protocol (TCP), since the protocol implements
        application-level reliability

   multi-server capable: Each media stream within a session can reside
        on a different server. The client automatically establishes
        several concurrent control sessions with the different media
        servers.  Media synchronization is performed at the transport
        level.

   multi-client capable: Stream identifiers can be used by several
        control streams, so that "passing the remote" is possible. The
        protocol does not address how several clients negotiate access;
        this is left to either a "social protocol" or some other floor
        control mechanism.

   control of recording devices: The protocol can control both recording
        and playback devices, as well as devices that can alternate
        between the two modes ("VCR").

   separation of stream control and conference initiation: Stream
        control is divorced from inviting a media server to a
        conference. The only requirement is that the conference
        initiation protocol either provides or can be used to create a
        unique conference identifier. In particular, S*IP or H.323 may
        be used to invite a server to a conference.

   suitable for professional applications: RTSP' supports frame-level
        accuracy through SMPTE time stamps to allow remote digital
        editing.

   S*IP compatible: As much as possible, stream control should be
        aligned with the IETF conference initiation effort. However, for
        simple applications, a media server should not have to implement
        a conference initiation protocol.

   session description neutral: The protocol does not impose a
        particular session description or metafile format and can convey
        the type of format to be used. However, the session description
        must contain an RTSP URI.

   proxy and firewall friendly: The protocol should be readily handled
        by both application and transport-layer (SOCKS) firewalls.  For
        proxies, re-use of existing proxies should be possible, but
        remains to be verified. [TBD: what exactly is needed to make a
        protocol firewall-friendly?] A firewall may need to understand
        the SET_PORT directive to open a "hole" for the UDP media
        stream.

   HTTP friendly: Where sensible, RTSP re-uses HTTP concepts, so that
        the existing infrastructure can be re-used.

1.3 Extending the Protocol

   The protocol described below can be extended in three ways, listed in
   order of the magnitude of changes supported:

        o Existing commands can be extended with new parameters, as long
         as these parameters can be safely ignored by the recipient.
         (This is equivalent to adding new parameters to an HTML tag.)

        o New methods can be added. If the recipient of the message does
         not understand the request, it responds with error code 501
         (Not implemented) and the sender can then attempt an earlier,
         less functional version.

        o A new version of the protocol can be defined, allowing almost
         all aspects (except the position of the protocol version
         number) to change.

1.4 Overall Operation

   Each media stream and session is identified by an rtsp URL. The
   overall session and the properties of the media the session is made
   up of are defined by a session description file, the format of which
   is outside the scope of this specification. The session description
   file is retrieved using HTTP, either from the web server or the media
   server, typically using a URL with scheme http.

   The session description file contains a description of the media
   streams making up the media session, including their encodings,
   language, and other parameters that enable the client to choose the
   most appropriate combination of media. In this session description,
   each media stream is identified by an rtsp URL, which points to the
   media server handling that particular media stream. Several media
   streams can be located on different servers; for example, audio and
   video tracks can be split across servers for load sharing. The
   description also enumerates which transport methods the server is
   capable of. If desired, the session description can also contain only
   an RTSP URL, with the complete session description retrieved via
   RTSP.

   Besides the media parameters, the network destination address and
   port need to be determined. Several modes of operation can be
   distinguished:

   Unicast: The media is transmitted to the source of the RTSP request,
        with the port number picked by the client. Alternatively, the
        media is transmitted on the same reliable stream as RTSP.

   Multicast, server chooses address: The media server picks the
        multicast address and port. This is the typical case for a live
        or near-media-on-demand transmission.

   Multicast, client chooses address: If the server is to participate in
        an existing multicast conference, the multicast address, port
        and encryption key are given by the conference.

1.5 Relationship with Other Protocols

   RTSP' has some overlap in functionality with HTTP. It also needs to
   interact with the web in that the initial contact with streaming
   content is often to be made through a web page. The current protocol
   specification aims to allow different hand-off points between a web
   server and the media server implementing RTSP'. For example, the
   session description can be retrieved using HTTP or RTSP'. Having the
   session description be returned by the web server makes it possible
   to have the web server take care of authentication and billing, by
   handing out a session description whose media identifier includes an
   encrypted version of the requestor's IP address and a timestamp, with
   a shared secret between web and media server.

   However, RTSP' differs fundamentally from HTTP in that data delivery
   takes place out-of-band, in a different protocol. HTTP is an
   asymmetric protocol, where the client issues requests and the server
   responds. In RTSP', both the media client and media server can issue
   requests. RTSP' requests are also not stateless, in that they may set
   parameters and continue to control a media stream long after the
   request has been acknowledged.


        Re-using HTTP functionality has advantages in at least two
        areas, namely security and proxies. The requirements are
        very similar, so having the ability to adopt HTTP work on
        caches, proxies and authentication is valuable. The current
        RTSP already has first hints on caches and proxies, but is
        nowhere near as complete as HTTP in that regard.

   It is possible to very quickly build a simple RTSP' server by adding
   a PLAY and, optionally, a SET_PARAMETER method to an existing
   HTTP/1.1 web server. All of RTSP' can be implemented as part of an
   HTTP server as long as only the client issues requests.

   While most real-time media will use RTP as a transport protocol,
   RTSP' is not tied to RTP.

   RTSP' assumes the existence of a session description format that can
   express both static and temporal properties of a media session
   containing several media streams.

2 Protocol Parameters

2.1 Message Format and Transmission

   RTSP is a text-based protocol [TBD] and uses the ISO 10646 character
   set in UTF-8 encoding (RFC 2044) [TBD; this conflicts with ]. Lines
   are terminated by CRLF, but receivers should be prepared to also
   interpret CR and LF by themselves as line terminators.

        Text-based protocols make it easier to add optional
        parameters in a self-describing manner. Since the number of
        parameters and the frequency of commands is low, processing
        efficiency is not a concern. Text-based protocols, if done
        carefully, also allow easy implementation in scripting
        languages such as Tcl, VisualBasic and Perl.

   The 10646 character set avoids tricky character set switching, but is
   invisible to the application as long as US-ASCII is being used. This
   is also the encoding used for RTCP. ISO 8859-1 translates directly
   into Unicode, with a high-order octet of zero. ISO 8859-1 characters
   with the most-significant bit set are represented as 1100001x
   10xxxxxx.
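
   The bit pattern given above can be checked with a short Python
   sketch. The helper name latin1_to_utf8_bits is illustrative only and
   is not part of the protocol.

```python
def latin1_to_utf8_bits(octet: int) -> str:
    """Return the UTF-8 encoding of one ISO 8859-1 octet as bit strings."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an ISO 8859-1 octet")
    # ISO 8859-1 maps directly onto the first 256 ISO 10646 code points.
    encoded = chr(octet).encode("utf-8")
    return " ".join(f"{byte:08b}" for byte in encoded)
```

   US-ASCII octets come back as a single unchanged octet, while octets
   with the most-significant bit set come back as two octets.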

   RTSP messages can be carried over any lower-layer transport protocol
   that is 8-bit clean.

   Commands are acknowledged by the receiver unless they are sent to a
   multicast group. If there is no acknowledgement, the sender may
   resend the same message after a timeout of one round-trip time (RTT).
   The round-trip time is estimated as in TCP (RFC TBD), with an initial
   round-trip value of 500 ms. An implementation MAY cache the last RTT
   measurement as the initial value for future connections. If a
   reliable transport protocol is used to carry RTSP, the timeout value
   MAY be set to an arbitrarily large value.

        This can greatly increase responsiveness for proxies
        operating in local-area networks with small RTTs. The
        mechanism is defined such that the client implementation
        does not have to be aware of whether a reliable or unreliable
        transport protocol is being used. It is probably a bad idea
        to have two reliability mechanisms on top of each other,
        although the RTSP RTT estimate is likely to be larger than
        the TCP estimate.
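
   A minimal sketch of the retransmission timeout, assuming a simple
   exponentially weighted moving average for the RTT. The draft leaves
   the TCP reference open ("RFC TBD"), so the smoothing gain of 1/8 and
   the class name RttEstimator are assumptions; only the 500 ms initial
   value and the "arbitrarily large" timeout for reliable transports
   come from the text.

```python
class RttEstimator:
    """Smoothed round-trip time used as the retransmission timeout."""

    def __init__(self, initial_rtt: float = 0.5, gain: float = 0.125) -> None:
        self.rtt = initial_rtt  # seconds; 500 ms initial value from the text
        self.gain = gain        # assumed smoothing gain, not from the draft

    def update(self, sample: float) -> float:
        """Fold one measured round-trip time (in seconds) into the estimate."""
        if sample < 0:
            raise ValueError("RTT sample must be non-negative")
        self.rtt += self.gain * (sample - self.rtt)
        return self.rtt

    def timeout(self, reliable_transport: bool = False) -> float:
        """Resend after one RTT; effectively never over a reliable transport."""
        return float("inf") if reliable_transport else self.rtt
```

   An implementation that caches the last measurement would pass it as
   initial_rtt when opening a later connection.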

   Each request carries a sequence number, which is incremented by one
   for each request transmitted. If a request is repeated because of
   lack of acknowledgement, the sequence number is incremented.

        This avoids ambiguities when computing round-trip time
        estimates.  [TBD: An initial sequence number negotiation
        needs to be added for UDP; otherwise, a new stream
        connection may see a request be acknowledged by a delayed
        response from an earlier "connection". This handshake can
        be avoided with a sequence number containing a timestamp of
        sufficiently high resolution.]
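
   The following sketch shows why a fresh sequence number for each
   retransmission keeps RTT samples unambiguous. The class and method
   names are hypothetical, and timestamps are passed in explicitly to
   keep the example deterministic.

```python
from typing import Dict, Optional


class RequestTracker:
    """Track send times per sequence number for RTT measurement."""

    def __init__(self, first_seq: int = 0) -> None:
        self._next_seq = first_seq
        self._sent_at: Dict[int, float] = {}

    def send(self, now: float) -> int:
        """Assign the next sequence number to a request or retransmission."""
        seq = self._next_seq
        self._next_seq += 1
        self._sent_at[seq] = now
        return seq

    def acknowledge(self, seq: int, now: float) -> Optional[float]:
        """Return the RTT sample for seq, or None if seq is unknown."""
        sent = self._sent_at.pop(seq, None)
        return None if sent is None else now - sent
```

   Because the acknowledgement names the exact transmission it answers,
   the sample is never measured against the wrong copy of a request.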

   The reliability mechanism described here does not protect against
   reordering. This may cause problems in some instances. For example, a
   STOP followed by a PLAY has quite a different effect than the
   reverse.  Similarly, if a PLAY request arrives before all parameters
   are set due to reordering, the media server would have to issue an
   error indication.  Since sequence numbers for retransmissions are
   incremented (to allow easy RTT estimation), the receiver cannot just
   ignore out-of-order packets. [TBD: This problem could be fixed by
   including both a sequence number that stays the same for
   retransmissions and a timestamp for RTT estimation.]


   Systems implementing RTSP MUST support carrying RTSP over TCP and MAY
   support UDP. The default port for the RTSP server is [PORT] for both
   UDP and TCP.

   A number of RTSP packets destined for the same control end point may
   be packed into a single lower-layer PDU or encapsulated into a TCP
   stream.  RTSP data MAY be interleaved with RTP and RTCP packets. An
   RTSP packet is terminated with an empty line. (TBD: doesn't work well
   for including session descriptions. Maybe use Content-length for
   payloads - these are usually imported anyway? or new page? Wrapping a
   packet in some kind of braces or parentheses is another possibility,
   but again puts restrictions on the SDF.)


        Unless all but the RTP data is textual, there is not much
        point in keeping the payload as textual data, since visual
        debugging is more difficult and "telnet protocol emulation"
        is no longer possible.  Length fields don't make much sense
        for textual data, particularly because of the line
        termination ambiguities, i.e., CR, LF and CRLF.  There does
        not seem to be a need for an explicit, connection-oriented
        framing layer as in the original RTSP proposal. However, if
        we allow interleaving with RTP, a textual format gets very
        awkward.

   Requests contain methods, the object the method is operating upon and
   parameters to further describe the method. Methods are idempotent,
   unless otherwise noted. Methods are also designed to require little
   or no state maintenance at the media server.

   A message has the following format:


   Method Object Version Sequence-Number
   *(Parameter-Value)
   CRLF


   A message with a message body has the following format:


   Method Object Version Sequence-Number
   Content-length:
   *(Parameter-Value)
   CRLF
   message-body
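
   As a rough illustration of the two layouts above, the sketch below
   serializes a request. It assumes header lines of the form
   "Name: value", as used in the examples elsewhere in this document;
   the function name build_request and its arguments are hypothetical,
   and the proper BNF is still to be defined.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def build_request(method: str, obj: str, seq: int,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"",
                  version: str = "RTSP/1.0") -> bytes:
    """Serialize 'Method Object Version Sequence-Number' plus parameters."""
    lines = [f"{method} {obj} {version} {seq}"]
    if body:
        # A message body is announced by its length in octets.
        lines.append(f"Content-length: {len(body)}")
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The empty line terminates the parameter list.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("utf-8") + body
```

   For instance, build_request("PLAY", "twister/*/1234", 5,
   {"Range": "smpte=0:10:20-"}) yields a request line, one parameter
   line and the terminating empty line.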


   After receiving and interpreting an RTSP' request, the server
   responds with an RTSP' response message.

   [TBD: proper BNF]

   A typical response to a request with sequence number 17 might be:


   RTSP/1.0 200 17 OK


        This format is HTTP-friendly; the sequence number is simply
        ignored by HTTP servers. The likelihood that a textual
        protocol will share the same port and not have that format
        seems fairly remote. RTP packets have the most-significant
        bit set and can thus be easily distinguished.
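
   A receiver could split such a status line as sketched below. The
   field order (version, status code, sequence number, reason phrase)
   is taken from the example above; the names StatusLine and
   parse_status_line are hypothetical.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str
    code: int
    seq: int
    reason: str


def parse_status_line(line: str) -> StatusLine:
    """Parse a line such as 'RTSP/1.0 200 17 OK'."""
    # Unpacking raises ValueError if fewer than four fields are present.
    version, code, seq, reason = line.rstrip("\r\n").split(" ", 3)
    if not version.startswith("RTSP/"):
        raise ValueError("not an RTSP response")
    return StatusLine(version, int(code), int(seq), reason)
```

   The sequence number lets the sender match the response to the
   request it acknowledges.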

   If a connectionless transport protocol is used, the media server
   considers all packets originating from a single port number and
   network address to be part of the same session. [TBD: is this
   necessary?]

2.2 Session and Media URI

   The RTSP URL scheme is used to locate and control stream resources
   via the RTSP protocol.

   A media stream is identified by a textual session and media
   identifier, using the character set and escape conventions of URLs.
   The media identifier is separated from the session by a slash.
   Commands below can refer to either the whole session or an individual
   stream. Stream identifiers can be passed between clients ("passing
   the remote control"). A specific instance of a session, e.g., one of
   several concurrent transmissions of the same content, is appended
   where needed.  The instance identifies the whole session, so that all
   media streams within that session have the same instance identifier.

   For example,


   rtsp://media.content.com:5000/twister/audio.en/1234


   identifies instance 1234 of the stream audio.en within the session
   "twister", which is located at port 5000 of host media.content.com.


   The ordering and meaning of the path components of the rtsp URL are
   significant only to the media server.


        This decoupling also allows session descriptions to be used
        with non-RTSP media control protocols, simply by replacing
        the scheme in the URL.
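
   The example URL above can be taken apart with the Python standard
   library as sketched here. The session/stream/instance ordering is
   only the layout of that example; since the path is meaningful only
   to the media server, a client would normally treat it as opaque. The
   function name split_rtsp_url is hypothetical.

```python
from typing import Dict, Optional, Union
from urllib.parse import unquote, urlsplit


def split_rtsp_url(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split an rtsp URL laid out as /session/stream/instance."""
    parts = urlsplit(url)
    if parts.scheme != "rtsp":
        raise ValueError("not an rtsp URL")
    # Undo URL escaping in each path component.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) != 3:
        raise ValueError("expected session/stream/instance")
    session, stream, instance = segments
    return {
        "host": parts.hostname,
        "port": parts.port,  # None if the URL carries no explicit port
        "session": session,
        "stream": stream,
        "instance": instance,
    }
```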

2.3 Encoding Identifiers

   RTP profile and/or MIME types. [TBD: should probably register all the
   RTP data types as MIME types.]

2.4 Conference Identifiers

   Conference identifiers are opaque to RTSP' and are encoded using
   standard URI encoding methods (i.e., escaping with %). They can
   contain any octet value. The conference identifier MUST be globally
   unique. For H.323, the conferenceID value is to be used.


        If the conference participant inviting the media server
        would only supply a conference identifier which is unique
        for that inviting party, the media server could add an
        internal identifier for that party, e.g., its Internet
        address. However, this would prevent the conference
        participant and the initiator of the RTSP commands from being
        two different entities.

2.5 Relative Timestamps

   A relative time-stamp expresses time relative to the start of the
   clip.  Relative timestamps are expressed as SMPTE time codes for
   frame-level access accuracy. The time code has the format
   hours:minutes:seconds.frames, with the origin at the start of the
   clip.  For NTSC, the frame rate is 29.97 frames per second. This is
   handled by dropping the first two frame indices (values 00 and 01) of
   every minute, except every tenth minute. If the frame value is zero,
   it may be omitted.

   Examples:


   10:12:33.40
   10:7:33


2.6 Absolute Time

   Absolute time is expressed as ISO 8601 timestamps. It is always
   expressed as UTC (GMT).

   Example for November 8, 1996 at 14h37 and 20 seconds GMT:


   19961108T143720Z
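
   The compact form shown above can be produced and read back with the
   Python standard library, as in this sketch. The function names are
   hypothetical; the format string simply mirrors the example.

```python
from datetime import datetime, timezone

_FORMAT = "%Y%m%dT%H%M%SZ"


def format_absolute(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime(_FORMAT)


def parse_absolute(text: str) -> datetime:
    """Parse a UTC timestamp such as 19961108T143720Z."""
    return datetime.strptime(text, _FORMAT).replace(tzinfo=timezone.utc)
```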


3 Header Field Definitions

3.1 Accept

   The Accept request-header field can be used to specify certain
   session description types which are acceptable for the response. The
   only parameter allowed is that of level, which indicates the highest
   level or version accepted by the requestor.

   Example of use:


   Accept: application/sdf, application/sdp;level=2
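
   A server might read the field value as sketched below, assuming
   comma-separated types with an optional ";level=n" parameter as in
   the example. The name parse_accept is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(value: str) -> List[Tuple[str, Optional[int]]]:
    """Return (media type, level) pairs; level is None when absent."""
    accepted = []
    for item in value.split(","):
        media_type, _, params = item.strip().partition(";")
        level = None
        for param in params.split(";"):
            name, _, number = param.strip().partition("=")
            if name == "level" and number:
                level = int(number)
        accepted.append((media_type.strip(), level))
    return accepted
```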


3.2 Address

3.3 Allow

   The Allow response header field lists the methods supported by the
   resource identified by the Request-URI. The purpose of this field is
   to strictly inform the recipient of valid methods associated with the
   resource. An Allow header field must be present in a 405 (Method not
   allowed) response.

   Example of use:


   Allow: PLAY, RECORD, SET_PARAMETER


3.4 Authorization

3.5 Blocksize

3.6 Conference

   This field establishes a logical connection between a conference,
   established using non-RTSP' means, and an RTSP stream.

   [TBD: This parameter is for further study. May not be needed with the
   Given parameter.]

3.7 Content-Length

3.8 Content-Type

3.9 Given

3.10 Location

3.11 Port

3.12 Range

3.13 Speed

3.14 Transport

3.15 TTL

4 Methods

   The Method token indicates the method to be performed on the resource
   identified by the Request-URI. The method is case-sensitive. New
   methods may be defined in the future. Method names may not start with
   a $ character (decimal 36) and must be a token.

4.1 GET

   The GET method retrieves a session description from a server. It may
   use the Accept header to specify the session description formats that
   the client understands.


   GET twister RTSP/1.0 937
   Accept: application/sdp, application/sdf, application/mheg


   If the media server has previously been invited to a conference, the
   GET method also contains a conference identifier or a Given
   parameter.


   GET twister RTSP/1.0 834
   Conference: 128.16.64.19/32492374
   Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==


   If the GET request contains a conference identifier, the media server
   MUST locate the conference description and use the multicast
   addresses and port numbers supplied in that description. The media
   server SHOULD only offer media types corresponding to the media types
   currently active within the conference. If the media server has no
   local reference to this conference, it returns status code 452.

   The conference invitation should also contain an indication whether
   the media server is expected to receive or generate media, or both.
   (A VCR-like device would support both directions.) If the invitation
   does not contain an indication of the operations to be performed, the
   media server should accept and then reject inappropriate operations.

   A typical response might be:

   200 18 OK
   Content-Type: application/sdf
   session description

4.2 SESSION

   This method is used by a media server to send media information to
   the client. If a new media type is added to a session (e.g., during a
   live event), the whole session description should be sent again,
   rather than just the additional components.

        This allows the deletion of session components.

   Example:

   SESSION twister/*/1234
   Content-Type: application/sdp

   Session Description

   Response: 200, 302, 303, 500, can't do this operation, busy,

4.3 PLAY

   The PLAY method tells the server to start sending data via the
   previously set transport mechanism. The Range header specifies the
   range. The range can be specified in a number of units. This
   specification defines the smpte (see Section 2.5) and clock (see
   Section 2.6) range units.

   PLAY media-name
   Range: smpte= range-value

   The following example plays the whole session starting at SMPTE time
   code 0:10:20 until the end of the clip.


   PLAY twister/*/1234
   Range: smpte=0:10:20-


   For playing back a recording of a live event, it may be desirable to
   use clock units:


   PLAY meeting/*/1234
   Range: clock=19961108T142300Z-19961108T143520Z


   A media server only supporting playback MUST support the smpte format
   and MAY support the clock format.
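
   The Range values in the examples above can be split into unit, start
   and end as sketched here. The sketch assumes a single "start-end"
   pair with an optional end, which is all the examples show; the name
   parse_range is hypothetical.

```python
from typing import Optional, Tuple


def parse_range(value: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'unit=start-end' into (unit, start, end)."""
    unit, equals, span = value.partition("=")
    unit = unit.strip()
    if not equals or unit not in ("smpte", "clock"):
        raise ValueError(f"unsupported range unit: {unit!r}")
    # Neither timestamp format contains '-', so the first one separates.
    start, dash, end = span.strip().partition("-")
    if not dash:
        raise ValueError("range needs a '-' separator")
    return unit, start or None, end or None
```

   An end of None means playback continues to the end of the clip.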

   [TBD: It may be desirable to allow several ranges, so that remote
   digital editing can be done easily.]

   Response: 200, 500, 501, clock format not supported.

4.4 RECORD

   This method initiates recording a range of media data according to
   the session description. The timestamp reflects start and end time
   (UTC).  If no time range is given, use the start or end time provided
   in the session description. If the session has already started,
   commence recording immediately. The Conference header is mandatory.

   A media server supporting recording of live events MUST support the
   clock range format; the smpte format does not make sense.


   RECORD meeting/audio.en/1234
   Conference: 128.16.64.19/32492374


4.5 REDIRECT

   A redirect request informs the client that it must connect to another
   server location. It contains the mandatory header Location, which
   indicates that the client should issue a GET for that URL. It may
   contain the parameter Range, which indicates when the redirection
   takes effect.

4.6 SET_PARAMETER

   Both client and media server can issue this request.

   The following parameters are defined:

   Blocksize: This advisory parameter is sent from the client to the
        media server setting the transport packet size. The server
        truncates this packet size to the closest multiple of the
        minimum media-specific block size or overrides it with the media
        specific size if necessary.  The block size is a strictly
        positive decimal number and measured in bytes. The server only
        returns an error (416) if the value is syntactically invalid,
        but not if the server adjusts it according to the mechanism
        described above or decides to simply ignore the advice.

   Port: UDP or TCP port to be used for this media.

   SSRC: RTP SSRC value to be used by the media server. This parameter
        is only valid for unicast transmission. It identifies the
        synchronization source to be associated with the media stream.
        This can be used for demultiplexing by the client of data
        received on the same port.

   Address: Destination network address, consisting of the address class
        identifier and the address. Currently, the address classes IP4
        and IP6 are defined.

   Transport: Transport protocol stack to be used: UDP, TCP, or
        interleaved in whatever protocol is being used by the control
        stream, followed by the next-layer transport protocol.
        Currently, the next-layer protocol RTP is defined. Parameters
        may be added to each protocol, separated by a semicolon. For
        RTP, the boolean parameter compressed is defined, indicating
        compressed RTP according to RFC XXXX. Example: UDP
        RTP;compressed

   TTL: Multicast time-to-live value. In some cases, it may make sense
        for a client to ask a media server sending on a given multicast
        address to expand its range.


   Speed: This advisory parameter sets the speed at which the server
        delivers data to the client, contingent on the server's ability
        and desire to serve the media stream at the given speed.
        Implementation by the server is optional. The default is the bit
        rate of the stream.

   The parameter value is expressed as a decimal ratio, e.g., 2.0
   indicates that data is to be delivered twice as fast as normal. A
   speed of zero is invalid. A negative value indicates that the stream
   is to be played back in reverse direction.
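
   The server-side handling of the Blocksize and Speed parameters
   described above could be sketched as follows. Both function names
   are hypothetical, and media_block stands for the minimum
   media-specific block size, which the server is assumed to know.

```python
def adjust_blocksize(requested: int, media_block: int) -> int:
    """Truncate the advised size to a multiple of the media block size."""
    if requested <= 0 or media_block <= 0:
        raise ValueError("block sizes must be strictly positive")
    truncated = (requested // media_block) * media_block
    # Override with the media-specific size if truncation yields zero.
    return truncated or media_block


def parse_speed(text: str) -> float:
    """Parse a Speed value; negative means reverse, zero is invalid."""
    speed = float(text)
    if speed == 0:
        raise ValueError("a speed of zero is invalid")
    return speed
```

   Adjusting the block size is not an error; only a syntactically
   invalid value would lead to an error response.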

   A request SHOULD only contain a single parameter to allow the client
   to determine why a particular request failed. A server MUST allow a
   parameter to be set repeatedly to the same value, but it MAY disallow
   changing parameter values.


        The parameters are split in a fine-grained fashion so that,
        for example, the client can set just the unicast port,
        without having to modify the destination address. There is
        no substantial difference between the privileged parameters
        and the parameters identified by family and parameter id in
        the current RTSP spec. If desired, parameter names could
        easily take the form family/parameter , e.g.,
        Audio/Annotations

   A SET_PARAMETER request without parameters can be used as a way to
   detect whether the other side is still responding.

   Example:


   SET_PARAMETER twister/1234/audio.en RTSP/1.0 68
   Speed: 2.3


   [TBD: Or should this be like GET_PARAMETER? A bit longer, but forces
   a single parameter per request.]

4.7 GET_PARAMETER

   Both client and media server can issue a GET_PARAMETER request to
   retrieve a specific parameter. All parameters described for the
   SET_PARAMETER request are valid. In the request, the message body
   contains the parameter name; in the response, it contains the
   parameter value. Only one parameter can be requested in each
   GET_PARAMETER request.


   Example:


   C->S: GET_PARAMETER twister/1234/audio.en RTSP/1.0 6
         Content-length: 17

         Audio/Annotations

   S->C: RTSP/1.0 200 6 OK
         Content-type: text/ascii
         Content-length: 2

         64
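
   A request of this shape could be assembled roughly as follows. The
   helper build_request is hypothetical, and the use of CRLF line
   endings and of a blank line before the body are assumptions borrowed
   from HTTP. The sketch computes Content-length from the body so that
   the two cannot disagree.

```python
def build_request(method, url, seq, headers=(), body=b""):
    """Serialize a request line, headers, and optional body."""
    lines = ["%s %s RTSP/1.0 %d" % (method, url, seq)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    if body:
        lines.append("Content-length: %d" % len(body))
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request(
    "GET_PARAMETER", "twister/1234/audio.en", 6, body=b"Audio/Annotations"
)
```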


4.8 STOP

   Stops delivery of the stream immediately. Returns an indication of
   the current position, allowing the client to continue with PLAY
   instead of a separate resume request.

        Thus, RESUME is not needed.


   C->M: STOP movie RTSP/1.0 76

   M->C: RTSP/1.0 200 76 OK


4.9 BYE

   Sent by either client or server to terminate a connection and release
   resources.

4.10 Embedded Data Stream

   The command DATA is used to indicate an embedded media data object,
   together with the content types. DATA requests are not acknowledged
   by RTSP'. The embedded object can have any type. For space-efficient
   encapsulation of binary data, the method in Section 4.11 should be
   used instead.


   DATA twister/audio.en/1234 RTSP/1.0
   Content-Length: 500
   Content-Type: message/rtp

   (RTP data)


        This is workable, but not very space-efficient. However,
        the interesting case is that of a single TCP stream
        carrying both commands and media data. There is no
        particular reason to have small chunks in that case.

4.11 Embedded Binary Data

   Binary packets such as RTP data are encapsulated by an ASCII dollar
   sign (24 hexadecimal), followed by a one-byte session identifier,
   followed by the length of the encapsulated binary data as a binary,
   two-byte integer in network byte order. The binary data follows
   immediately afterwards, without a CRLF.


        This makes the encapsulation overhead 4 bytes, less than
        the 8 bytes imposed by SCP.
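
   The framing described in this section can be sketched as a pair of
   helpers. The names encapsulate and decapsulate, and the convention
   of returning None for an incomplete buffer, are assumptions of this
   example rather than anything the draft prescribes.

```python
import struct


def encapsulate(session_id, payload):
    """Prefix payload with '$', a one-byte id, and a two-byte length."""
    if not 0 <= session_id <= 0xFF:
        raise ValueError("session identifier must fit in one byte")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length")
    # '!' selects network byte order; B is one byte, H is two bytes.
    return b"$" + struct.pack("!BH", session_id, len(payload)) + payload


def decapsulate(buffer):
    """Return (session_id, payload, remaining), or None if incomplete."""
    if len(buffer) < 4:
        return None
    if buffer[0:1] != b"$":
        raise ValueError("buffer does not start with a dollar sign")
    session_id, length = struct.unpack("!BH", buffer[1:4])
    if len(buffer) < 4 + length:
        return None
    return session_id, buffer[4:4 + length], buffer[4 + length:]
```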

5 Status Code Definitions

   Where applicable, HTTP status codes are re-used. [TBD: add those
   relevant here]

5.1 Successful 2xx

5.1.1 200 OK

   The request has succeeded. The information returned with the response
   depends on the method used in the request, for example:

   GET: the session description;

   GET_PARAMETER: the value of the parameter.

5.2 Redirection 3xx

5.2.1 301 Moved Permanently

5.2.2 302 Moved Temporarily

5.3 Client Error 4xx


5.3.1 400 Bad Request

   The request could not be understood by the recipient due to malformed
   syntax. The request SHOULD NOT be repeated without modification.

5.3.2 401 Unauthorized

   The request requires user authentication.

5.3.3 402 Payment Required

   This code is reserved for future use.

5.3.4 405 Method Not Allowed

5.3.5 406 Not Acceptable

5.3.6 408 Request Timeout

5.3.7 411 Length Required

5.3.8 414 Request URI Too Long

5.3.9 415 Unsupported Media Type

   The recipient of the request is refusing to service the request
   because the entity of the request is in a format not supported by the
   requested resource for the requested method.

5.3.10 450 Invalid Parameter

   The parameter in the request is not valid, i.e., out of range or
   malformed.

5.3.11 451 Parameter Not Understood

   The recipient of the request does not support one or more parameters
   contained in the request.

5.3.12 452 Conference Not Found

   The conference indicated by a Conference:  identifier is unknown to
   the media server.

5.3.13 453 Not Enough Bandwidth

   The request was refused since there was insufficient bandwidth. This
   may, for example, be the result of a resource reservation failure.


5.4 Server Error 5xx

5.4.1 500 Internal Server Error

5.4.2 501 Not Implemented

5.4.3 502 Bad Gateway

5.4.4 503 Service Unavailable

   The server is currently unable to handle the request due to a
   temporary overloading or maintenance of the server. The implication
   is that this is a temporary condition which will be alleviated.

5.4.5 504 Gateway Timeout

5.4.6 505 RTSP Version Not Supported
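
   Since the codes are grouped by their first digit, a recipient that
   does not recognize a specific code could still fall back on its
   class. The following is a minimal sketch of that idea; the function
   status_class is hypothetical, and the fallback behaviour itself is
   borrowed from HTTP practice rather than stated in this draft.

```python
STATUS_CLASSES = {
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a three-digit status code to the class named in Section 5."""
    try:
        return STATUS_CLASSES[code // 100]
    except KeyError:
        raise ValueError("status code outside 2xx-5xx: %r" % code) from None
```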

6 Examples

6.1 Media on demand (unicast)

   Client C requests a movie from media servers A (audio.content.com)
   and V (video.content.com). The media description is stored on a web
   server W.  This, however, is transparent to the client. The client
   is only interested in the last part of the movie. The server
   requires authentication for this movie. The audio track can be
   dynamically switched between two sets of encodings. The URL with
   scheme rtspu indicates that the media servers want to use UDP for
   exchanging RTSP messages.


   C->W: GET twister HTTP/1.0
         Accept: application/sdf, application/sdp

   W->C: HTTP/1.0 200 OK
         Content-type: application/sdf

         (session
           (all
             (media (t audio) (oneof
                 ((e PCMU/8000/1 89 DVI4/8000/1 90) (id lofi))
                 ((e DVI4/16000/2 90 DVI4/16000/2 91) (id hifi))
               )
               (language en)
               (id rtspu://audio.content.com/twister/audio.en/1234)
             )
             (media (t video) (e JPEG)
               (id rtspu://video.content.com/twister/video/1234)
             )
           )
         )

   C->A: SET_PARAMETER twister/audio.en/1234/lofi RTSP/1.0 1
         Port: 3056
         Transport: RTP;compressed

   A->C: RTSP/1.0 200 1 OK

   C->V: SET_PARAMETER twister/video/1234 RTSP/1.0 2
         Port: 3058
         Transport: RTP;compressed

   V->C: RTSP/1.0 200 2 OK

   C->V: PLAY twister/video/1234 RTSP/1.0 3
         Range: smpte 0:10:00-

   V->C: RTSP/1.0 200 3 OK

   C->A: PLAY twister/audio.en/1234/lofi RTSP/1.0 4
         Range: smpte 0:10:00-

   A->C: RTSP/1.0 200 4 OK


   Even though the audio and video tracks are on two different servers,
   may start at slightly different times and may drift with respect to
   each other, the client can synchronize the two using standard RTP
   methods.
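
   Because the client in this exchange talks to two servers at once, it
   needs to pair each response with the request that caused it. A
   minimal sketch of that pairing, using the sequence number carried in
   the status line, is shown here. The function parse_status_line and
   the pending table are hypothetical bookkeeping, not part of the
   protocol.

```python
def parse_status_line(line):
    """Split 'RTSP/1.0 200 3 OK' into (version, code, seq, reason)."""
    fields = line.strip().split(" ", 3)
    if len(fields) != 4 or not fields[0].startswith("RTSP/"):
        raise ValueError("malformed status line: %r" % line)
    version, code, seq, reason = fields
    return version, int(code), int(seq), reason


# Outstanding requests, keyed by the sequence number the client chose.
pending = {3: "PLAY twister/video/1234", 4: "PLAY twister/audio.en/1234/lofi"}
_, code, seq, _ = parse_status_line("RTSP/1.0 200 3 OK")
completed = pending.pop(seq)
```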

6.2 Live Media Event Using Multicast

   The media server chooses the multicast address and port. Here, we
   assume that the web server only contains a pointer to the full
   description, while the media server M maintains the full description.
   During the session, a new subtitling stream is added.


   C->W: GET concert HTTP/1.0

   W->C: HTTP/1.0 200 OK
         Content-Type: application/sdf

         (session
           (id rtsp://live.content.com/concert)
         )

   C->M: GET concert RTSP/1.0 1

   M->C: RTSP/1.0 200 1 OK
         Content-Type: application/sdf

         (session (all
           (media (t audio) (id music) (a IP4 224.2.0.1) (p 3456))
         ))

   C->M: PLAY concert/music RTSP/1.0
         Range: smpte 1:12:0

   M->C: RTSP/1.0 405 No positioning possible

   M->C: SESSION concert RTSP/1.0
         Content-Type: application/sdf

         (session (all
           (media (t audio) (id music))
           (media (t text) (id lyrics))
         ))

   C->M: PLAY concert/lyrics RTSP/1.0


   Since the session description already contains the necessary address
   information, the client does not set the transport address. The
   attempt to position the stream fails since this is a live event.

6.3 Playing media into an existing session

   A conference participant C wants to have the media server M play back
   a demo tape into an existing conference. When retrieving the session
   description, C indicates to the media server that the network
   addresses and encryption keys are already given by the conference, so
   they should not be chosen by the server. The example omits the simple
   ACK responses.


   C->M: GET demo RTSP/1.0 1
         Accept: application/sdf, application/sdp
         Given: address, privacy

   M->C: RTSP/1.0 200 1 OK
         Content-type: application/sdf

         (session
            (id 548)
            (media (t audio) (id sound))
         )

   C->M: SET_PARAMETER demo/548/sound RTSP/1.0 2
         Address: IP4 224.2.0.1
         Port:    3456
         TTL:     127


6.4 Recording

   Conference participant C asks the media server M to record a session.
   If the session description contains any alternatives, the server
   records them all.


   C->M: SESSION meeting RTSP/1.0 89
         Content-type: application/sdp

         v=0
         s=Mbone Audio
         i=Discussion of Mbone Engineering Issues

   M->C: RTSP/1.0 415 89 Unsupported Media Type
         Accept: application/sdf

   C->M: SESSION meeting RTSP/1.0 90
         Content-type: application/sdf

   M->C: RTSP/1.0 200 90 OK

   C->M: RECORD meeting RTSP/1.0 91
         Range: clock 19961110T1925-19961110T2015
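
   A server might turn the clock range in the RECORD request into a
   start and stop time along these lines. The function
   parse_clock_range is hypothetical, and the assumption that each
   timestamp has the form YYYYMMDDThhmm is read off this single
   example. The time zone is left unspecified, as it is in the example.

```python
from datetime import datetime


def parse_clock_range(value):
    """Parse 'clock <start>-<end>' into two naive datetime objects."""
    scheme, _, span = value.strip().partition(" ")
    if scheme != "clock":
        raise ValueError("not a clock range: %r" % value)
    start, _, end = span.partition("-")
    fmt = "%Y%m%dT%H%M"
    return datetime.strptime(start, fmt), datetime.strptime(end, fmt)


start, end = parse_clock_range("clock 19961110T1925-19961110T2015")
duration_seconds = (end - start).total_seconds()
```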


7 Access Authentication

   Besides limiting access, access authentication is also needed to
   avoid denial-of-service attacks.


8 Security Considerations

   The protocol offers the opportunity for a remote-control denial-of-
   service attack. The attacker, using a forged source IP address, can
   ask for a stream to be played back to that forged IP address. This
   can be prevented by a challenge-response authentication.  If the goal
   is simply to prevent this denial-of-service attack, a default, widely
   known key can be used.

   If the client retrieves a session description, the server hands out
   an encrypted version of the client's IP address to the client during
   that initial retrieval.

A Session Description

   A session description must be able to identify sessions and
   individual media streams. The per-media identifier is created by the
   entity creating the session description and is opaque to anyone else.
   It may contain any 8-bit value except CR and LF.
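
   The constraint on the per-media identifier is simple enough to check
   directly. In this sketch the function is_valid_media_id is
   hypothetical, and rejecting an empty identifier is an added
   assumption, since the draft does not say whether one is permitted.

```python
def is_valid_media_id(identifier):
    """Accept any non-empty byte string that contains no CR or LF."""
    return (
        len(identifier) > 0
        and b"\r" not in identifier
        and b"\n" not in identifier
    )
```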

B Notes on RTSP

        o The STREAM_HEADER functionality has been subsumed by the
         session description.

        o SEND_REPORT is not really needed. Should define an RTCP
         request with a random response interval.

        o Error reports are sent automatically. If server wants to
         terminate connection, it sends a BYE.

        o Resending (UDP_RESEND) should be handled by RTCP since it is
         always media-specific and RTCP can be readily flow-controlled
         to avoid congestion collapse.

        o Is STOP really needed? What's the difference between STOP and
         PAUSE? Resources (which?) cannot be released since there may be
         a PLAY command immediately. Bearing on resource reservation?

C Author Addresses

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   USA
   electronic mail: schulzrinne@cs.columbia.edu


D Acknowledgements

   This draft is based on the functionality of the RTSP draft. It also
   borrows format and descriptions from HTTP/1.1.

