Internet Engineering Task Force                     Audio-Video Transport WG
INTERNET-DRAFT                                      H. Schulzrinne/S. Casner
                                                                    AT&T/ISI
                                                               July 30, 1993
                                                          Expires:  10/01/93


                    RTP: A Real-Time Transport Protocol


Status of this Memo


This document is an Internet Draft.   Internet Drafts are working  documents
of the Internet Engineering  Task Force (IETF), its  Areas, and its  Working
Groups.   Note that other  groups may also  distribute working documents  as
Internet Drafts.

Internet Drafts  are draft  documents valid  for a  maximum of  six  months.
Internet Drafts may be  updated, replaced, or  obsoleted by other  documents
at any time.   It  is not appropriate  to use Internet  Drafts as  reference
material or to  cite them other  than as  a ``working draft''  or ``work  in
progress.''

Please check  the I-D  abstract  listing contained  in each  Internet  Draft
directory to learn the current status of this or any other Internet Draft.

Distribution of this document is unlimited.


                                  Abstract


     This   draft  describes  a  real-time   transport  protocol  (RTP)
    suitable  for the  network  transport of  real-time  data, such  as
    audio,  video or  simulation data  for both  multicast  and unicast
    transport  services.     The   data  transport  is  enhanced  by  a
    control protocol  (RTCP)  designed to  provide minimal  control and
    identification  functionality particularly  in multicast  networks.
    RTP  and RTCP  are designed  to  be independent  of  the underlying
    transport and  network layers.   The  protocol supports the  use of
    RTP-level translators and bridges.   Within multicast associations,
    sites can direct control messages to individual sites.


This specification is a product  of the Audio-Video Transport working  group
within the Internet  Engineering Task  Force.   Comments  are solicited  and
should be addressed to the  working group's mailing list at  rem-conf@es.net
and/or the authors.


INTERNET-DRAFT                       RTP                       July 30, 1993

Contents


1 Introduction                                                             2


2 Protocol Conventions                                                     3

3 Real-time Data Transfer Protocol -- RTP                                  4

  3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

  3.2 RTP Fixed Header Fields . . . . . . . . . . . . . . . . . . . . . . 6

  3.3 The RTP Options . . . . . . . . . . . . . . . . . . . . . . . . . . 8

  3.4 Reverse-Path Option . . . . . . . . . . . . . . . . . . . . . . . . 9

  3.5 Security Options  . . . . . . . . . . . . . . . . . . . . . . . . . 11

  3.6 The Use of the Security Options . . . . . . . . . . . . . . . . . . 15


4 Real Time Control Protocol --- RTCP                                     17

5 Security Considerations                                                 22


6 RTP over network and transport protocols                                23

  6.1 Defaults  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

  6.2 ST-II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


A Implementation Notes                                                    24

  A.1 Timestamp recovery  . . . . . . . . . . . . . . . . . . . . . . . . 24

  A.2 Detecting the Beginning of a Synchronization Unit . . . . . . . . . 25

  A.3 Demultiplexing and Locating the Synchronization Source  . . . . . . 26

B Addresses of Authors                                                    27


1 Introduction


This draft concisely specifies a real-time transport protocol.  A discussion
of the design decisions can be found in the current version of the companion

H. Schulzrinne/S. Casner              Expires 10/01/93              [Page 2]


Internet draft draft-ietf-avt-issues.txt.   The transport protocol  provides
end-to-end delivery services for one or more streams of data with  real-time
characteristics, for example,  interactive audio  and video.    It does  not
guarantee delivery or prevent out-of-order delivery, nor does it assume that
the underlying network is reliable and  delivers packets in sequence.   [The
sequence numbers included  in RTP allow  the end system  to reconstruct  the
sender's packet sequence, but sequence numbers may also be used to determine
the proper location  of a  packet, for  example in  video decoding,  without
necessarily decoding packets in sequence].   RTP is  designed to run on  top
of a variety  of network  and transport  protocols, for  example, IP,  ST-II
or UDP. [For most  applications, RTP  offers insufficient demultiplexing  to
run directly on IP.] RTP transfers  data in a single direction, possibly  to
multiple destinations if supported by the  underlying network.  A  mechanism
for indicating a return path for control data is provided.

While RTP is primarily  designed to satisfy  the needs of  multi-participant
multimedia conferences, it  is not limited  to that particular  application.
Storage of continuous data, interactive distributed simulation, active badge
and control  and  measurement applications  may  also find  RTP  applicable.
Profiles are  used to  instantiate  certain header  fields and  options  for
particular sets of applications.  A profile for audio and video data may  be
found in the companion Internet draft draft-ietf-avt-profile.txt.

This document defines two packet formats and protocols:


  o the  real-time  transport  protocol  (RTP)  for  exchanging  data   with
    real-time properties.

  o the real-time  control protocol (RTCP)  for conveying information  about
    the sites  in an  on-going association.    RTCP options  may be  ignored
    without affecting the ability to  correctly receive data.  RTCP is  used
    for loosely  controlled conferences,  i.e., where there  is no  explicit
    admission control  and set-up.   Its  functionality may  be subsumed  by
    a  conference control  protocol  (which  is  beyond the  scope  of  this
    document).


2 Protocol Conventions


Control fields  (options) for  RTP and  RTCP share  the same  structure  and
numbering space and are carried within the same packet.  Options may  appear
in any  order, unless  specifically restricted  by the  option  description.
[The position of some security options may have significance.]  Each  option
consists of the final bit, the  option type designation, a one-octet  length
field denoting the total number of  32-bit long words comprising the  option
(including final  bit, type  and length),  and  finally any  option-specific
data.  The last option before the packet data portion (``payload'') has  the
'F' (final) bit set to one; for all other options this field has a  value of
zero.

Fields within the fixed header and within options are aligned to the natural
length of  the field,  i.e., 16-bit  words are  aligned  on even  addresses,
32-bit long words are aligned at addresses  divisible by four, etc.   Octets
designated as padding  have the  value zero.    Options unknown  to the  RTP
implementation or the application  are to be ignored.   Options with  option
types having values  from 64 to  127 inclusive  are to be  used for  private
extensions.  Fields designated as 'reserved' or 'R' are set aside for future
use; they should be set to zero by senders and ignored by receivers.
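
The option structure described in this section can be walked with a short
loop.  The following Python sketch is illustrative only and is not part of
the specification.  The names parse_options, select_known and PRIVATE_TYPES
are hypothetical, and the sketch assumes that the 'F' bit is the most
significant bit of the first option octet, with the option type in the
remaining seven bits.

```python
import struct

PRIVATE_TYPES = range(64, 128)  # option types reserved for private extensions


def parse_options(data):
    """Walk the option list at the start of `data` (bytes after the fixed header).

    Returns (options, payload_offset); options is a list of
    (option_type, raw_option_bytes) tuples in packet order.
    """
    options = []
    offset = 0
    while True:
        if offset + 2 > len(data):
            raise ValueError("truncated option header")
        first, length_words = struct.unpack_from("!BB", data, offset)
        final = bool(first & 0x80)       # 'F' bit: most significant bit
        option_type = first & 0x7F       # remaining seven bits: option type
        if length_words == 0:
            raise ValueError("zero-length option")
        end = offset + 4 * length_words  # length is counted in 32-bit words
        if end > len(data):
            raise ValueError("option extends past end of packet")
        options.append((option_type, data[offset:end]))
        offset = end
        if final:                        # last option before the payload
            return options, offset


def select_known(options, supported_types):
    """Drop options the implementation does not understand (they are ignored)."""
    return [(t, raw) for t, raw in options if t in supported_types]
```

A caller would invoke parse_options only when the P bit of the fixed header
is set, and would treat the bytes from the returned offset onward as payload.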

All integer  fields  are  carried in  network  byte  order,  that  is,  most
significant byte (octet)  first.   The  transmission order  is described  in
detail in [1], Appendix A. Unless otherwise noted, numeric constants are  in
decimal (base 10).  Numeric constants prefixed by '0x' are in hexadecimal.

Textual information is  encoded according to the UTF-2 encoding  of the  ISO
standard 10646 (Annex F) [2,3].  US-ASCII  is a subset of this encoding and
requires no additional encoding.   The presence  of multi-byte encodings  is
indicated by setting the  most significant bit to  a value of one.   A  byte
with a binary value of zero may  be used as a string terminator for  padding
purposes.
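
As a rough illustration of the text encoding rules, the Python sketch below
encodes a string and pads it with zero octets.  It rests on two assumptions
that are not stated in this memo: that Python's "utf-8" codec is an
acceptable stand-in for UTF-2, and that text fields are padded to a multiple
of four octets to match the 32-bit option length unit.  The function names
are hypothetical.

```python
def encode_text(text):
    """Encode text and zero-pad it to a multiple of four octets (assumed)."""
    raw = text.encode("utf-8")      # stand-in for UTF-2 (assumption)
    padding = -len(raw) % 4         # 0..3 zero octets
    return raw + b"\x00" * padding


def decode_text(field):
    """Strip trailing zero padding octets, then decode."""
    return field.rstrip(b"\x00").decode("utf-8")


def is_multibyte(octet):
    """True if the octet belongs to a multi-byte sequence (top bit set)."""
    return bool(octet & 0x80)
```

US-ASCII text passes through unchanged, with every octet below 0x80, while
any non-ASCII character produces octets with the most significant bit set.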

[Text in square brackets is intended to motivate the design decisions made.]


3 Real-time Data Transfer Protocol -- RTP


3.1 Definitions


P_a_y_l_o_a_d_ is the data following the RTP fixed header and the RTP/RTCP options.
The payload format and interpretation are beyond the scope of this memo.   A
valid RTP packet may carry no payload.

An R_P_D_U_ stands for RTP protocol data unit.  It consists of the encapsulation
specific to a particular underlying protocol, the fixed RTP header, RTP  and
RTCP options (if any) and the payload, if any.

A s_y_n_c_h_r_o_n_i_z_a_t_i_o_n_ s_o_u_r_c_e_ is the combination  of one or more content  sources
with its  own  timing.    The  RPDUs  emitted by  a  synchronization  source
have non-decreasing sequence  numbers and  time stamps  (modulo their  field
lengths).  The  audio coming from  a microphone or the  video from a  source
are examples of synchronization sources.  Typically, a single source emits a
single medium (e.g., audio or video).  A synchronization source is a  member
of exactly one  channel, as  defined below.   A  synchronization source  may
change its data format  over time.   Synchronization sources are  identified
by their source network address, source transport address (e.g., UDP  source
port) and the value of the SSRC identifier carried in the SSRC option.  If the
SSRC option is not present, a value of zero for that identifier is assumed.
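
A receiver can represent this identification rule as a lookup key.  The
Python sketch below is one possible form, not a prescribed one; the names
SyncSourceKey and sync_source_key are hypothetical, and addresses are shown
as strings purely for illustration.

```python
from collections import namedtuple

# Key under which per-source receiver state could be stored.
SyncSourceKey = namedtuple("SyncSourceKey", "address port ssrc")


def sync_source_key(address, port, ssrc=None):
    """Build the key for a synchronization source.

    A packet without an SSRC option is treated as carrying identifier zero.
    """
    return SyncSourceKey(address, port, 0 if ssrc is None else ssrc)
```

Two packets from the same address and port but with different SSRC
identifiers yield different keys and therefore separate receiver state.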

A c_o_n_t_e_n_t_ s_o_u_r_c_e_ is the actual source of the data carried, for example,  the
user and host that originally generated the audio data.  One or more content
sources may contribute data for one synchronization source.  Content sources
are used for identifying the logical source of the data; they have no effect
on the delivery of the data itself.

A n_e_t_w_o_r_k_ s_o_u_r_c_e_ is  the network-level origin  of the RPDUs  as seen by  the
receiving end system.

All sources sending to the same destination network address and
transport-level address using the same RTP flow identifier belong to the same
channel.

An e_n_d_ s_y_s_t_e_m_ generates the content to  be used in RTP packets and  delivers
the content of received RTP packets to the user application.  An end  system
can act as  one or  more synchronization  sources.   (Most  end systems  are
expected to be a single synchronization source.)

An (RTP-level) bridge receives RTP packets from one or more sources,
combines them in some manner and then forwards a new RTP packet.  A bridge
may change the data format.  Since the timing among multiple input sources
will not generally be synchronized, the bridge will make timing  adjustments
among the  streams and  generate its  own timing  for the  combined  stream.
Therefore, bridges are  synchronization sources,  with each  of the  sources
whose packets  were combined  into an  outgoing RTP  packet as  the  content
sources for that outgoing  packet.  Audio  bridges and media converters  are
examples of bridges.  Example:   assume SMITH@FOO and JONES@BAR are using  a
bridge to translate their audio  from one encoding to  another.  The  bridge
mixes audio packets  from Smith and  Jones together and  forwards the  mixed
packets.   If,  say, Smith  was talking,  she  is indicated  as the  content
source of the outgoing packet, allowing the receiver to properly display the
current speaker rather  than just  the bridge  that mixed  the audio.    For
an end system  receiving RTP  packets from that  bridge, the  bridge is  the
synchronization source and Smith the content source.  The RTP-level  bridges
described in  this document  are unrelated  to the  data link-layer  bridges
found in local area networks.  If there is a possibility of confusion, the
term 'RTP-level bridge' should be used.   [The name 'bridge' follows  common
telecommunication usage.]

An (RTP-level) t_r_a_n_s_l_a_t_o_r_ does not alter the timing of packets.  Examples of
its use include encoding conversion  without mixing or retiming,  conversion
from multicast  to  unicast,  and application-level  filters  in  firewalls.
A translator  is  neither a  synchronization  nor a  content  source.    The
properties of bridges and translators are summarized in Table 1.  Checkmarks
in parentheses designate possible, but unlikely actions.

A s_y_n_c_h_r_o_n_i_z_a_t_i_o_n_ u_n_i_t_ consists  of one or  more packets that,  as a  group,
share a common fixed  delay between generation and  playout of each part  of
the group, or can only be scheduled as a whole.  The delay may change at the


H. Schulzrinne/S. Casner              Expires 10/01/93              [Page 5]


INTERNET-DRAFT                       RTP                       July 30, 1993


                                    end sys.  bridge  translator
           mix sources                 --        x        --
           change encoding             N/A       x        x
           encrypt                      x        x       (x)
           sign for authentication      x        x        --
           touch content                x        x       (x)
           insert CSRC                 --        x        --
           insert SSRC                  x        x        x
           insert SDST                  x        x        --
           insert SDES                  x        x        --


      Table 1:  The properties of end systems, bridges and translators

beginning of such a synchronization unit.   The most common  synchronization
units are talkspurts for voice and frames for video transmission.

N_o_n_-_R_T_P_ m_e_c_h_a_n_i_s_m_s_  refers to  other protocols  and mechanisms  that may  be
needed to  provide  a  useable  service.    In  particular,  for  multimedia
conferences, a  conference  control application  may  distribute  encryption
and authentication  keys, negotiate  the encryption  algorithm to  be  used,
determine the mapping from  the RTP format field  to the actual data  format
used.  For simple applications, electronic mail or a conference database may
also be used.   The  specification of  the mechanism itself  is outside  the
scope of this memorandum.


3.2 RTP Fixed Header Fields


The RTP header has the following format:


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|   FlowID  |P|S|  format   |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     timestamp (seconds)       |     timestamp (fraction)      |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| options ...                                                   |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
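
The eight-octet layout drawn above can be packed and unpacked as shown in
the following Python sketch.  It is meant only as an illustration of the bit
positions in the diagram; the function and dictionary key names are
hypothetical, and out-of-range values are simply masked to the field width.

```python
import struct


def pack_fixed_header(version, flow_id, p, s, fmt, seq, ts_seconds, ts_fraction):
    """Build the eight-octet fixed header in network byte order."""
    first = ((version & 0x03) << 6) | (flow_id & 0x3F)       # Ver (2) + FlowID (6)
    second = ((p & 1) << 7) | ((s & 1) << 6) | (fmt & 0x3F)  # P, S, format (6)
    return struct.pack("!BBHHH", first, second, seq & 0xFFFF,
                       ts_seconds & 0xFFFF, ts_fraction & 0xFFFF)


def unpack_fixed_header(data):
    """Split the first eight octets of a packet into header fields."""
    first, second, seq, ts_seconds, ts_fraction = struct.unpack_from("!BBHHH", data)
    return {
        "version": first >> 6,
        "flow_id": first & 0x3F,
        "p": (second >> 7) & 1,
        "s": (second >> 6) & 1,
        "format": second & 0x3F,
        "sequence": seq,
        "ts_seconds": ts_seconds,
        "ts_fraction": ts_fraction,
    }
```

Options, if any, begin at octet offset eight when the P field is one;
otherwise the payload starts there.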


The fields in the  first eight octets  are present in  every RTP packet  and
have the following meaning:


protocol version: 2 bits
    Defines  the protocol  version.    The version  number of  the  protocol
    defined in this memo is one.

FlowID: 6 bits
    The value  of the  field is  the flow  identifier, which  forms part  of
    the tuple identifying  a channel (see definition above).   [The flow  ID
    field is  convenient if several  different channels are  to receive  the
    same treatment  by the  underlying layers  or  if a  profile allows  for
    the concatenation of several  RPDUs on different channels into a  single
    protocol data unit of the underlying protocol layer.]

option present bit (P): 1 bit
    This flag has a value of one if the fixed RTP header is followed  by one
    or more options and a value of zero otherwise.

end-of-synchronization-unit (S): 1 bit
    This flag has  a value of  one in the last  packet of a  synchronization
    unit, a value of zero otherwise.  [As shown in Appendix A.2, the beginning
    of a  synchronization unit can  be readily established  from this  flag.
    If this flag were to signal the beginning of a synchronization unit,
    the end  of a  synchronization unit  could  not be  established in  real
    time.]

format: 6 bits
    The  'format'  field  forms  an  index  into  a  table  defined  through
    the  RTCP FMT  option or  non-RTP  mechanisms (see  Section  3.1).   The
    mapping establishes  the format of  the RTP payload  and determines  its
    interpretation by  higher layers.   If  no mapping has  been defined  in
    this manner,  a standard mapping is  specified by the companion  profile
    document, RFC TBD. Also,  default formats may be defined by the  current
    edition of the Assigned Numbers RFC.

sequence number: 16 bits
    The sequence  number counts RPDUs.   The  sequence number increments  by
    one for  each packet  sent.   [The sequence  number may be  used by  the
    receiver  to detect  packet  loss, to  restore  packet sequence  and  to
    identify packets to the application.]

timestamp: 32 bits
    The timestamp reflects the  wallclock time when the RPDU was  generated.
    The timestamp consists of the middle 32 bits of a 64-bit  NTP timestamp,
    as defined in RFC 1305 [4].  Several consecutive packets may  have equal
    timestamps.

    The  timestamp of  the first  packet(s)  within a  synchronization  unit
    is expected  to closely  reflect the actual  sampling instant,  measured
    by  the  local  system  clock.     The  local  system  clock  should  be
    controlled  by a  time  synchronization protocol  such  as NTP  if  such
    a service  is available.    It  is not  expected that  the local  system
    clock  be referenced  to  obtain  the  timestamp for  the  beginning  of
    every synchronization  unit, but  the local clock  should be  referenced
    frequently enough so that clock drift between synchronized  system clock
    and  sampling clock  can  be compensated  for  gradually.    Within  one
    synchronization unit, it may be appropriate to compute  timestamps based
    on the  logical timing  relationships between the  packets.   For  audio
    samples, for  example, the nominal  sampling interval may be  used.   If
    the clock quality field of the CDES option does not  indicate otherwise,
    it is assumed that  the timestamp at the beginning of a  synchronization
    unit is  derived  from a  synchronized system  clock.   However,  it  is
    allowable to  operate without synchronized time  on those systems  where
    it is  not  available, unless  a profile  or session  protocol  requires
    otherwise.
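
The "middle 32 bits" rule for the timestamp field can be made concrete with
a small computation.  The Python sketch below is an illustration under the
assumption that the local clock is available as seconds since the Unix epoch
(1 January 1970); the function name and the constant name are hypothetical.

```python
import math

# Seconds between the NTP epoch (1 January 1900) and the Unix epoch (1970).
NTP_UNIX_OFFSET = 2208988800


def rtp_timestamp(unix_time):
    """Return the middle 32 bits of the 64-bit NTP timestamp for unix_time."""
    whole = math.floor(unix_time)
    ntp_seconds = (whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    ntp_fraction = int((unix_time - whole) * (1 << 32)) & 0xFFFFFFFF
    ntp64 = (ntp_seconds << 32) | ntp_fraction
    # Keep the low 16 bits of the seconds and the high 16 bits of the fraction.
    return (ntp64 >> 16) & 0xFFFFFFFF
```

With this layout one timestamp unit is 1/65536 of a second, and the field
wraps around every 65536 seconds (roughly 18.2 hours).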


3.3 The RTP Options


The packet header may be followed by  options and the payload.  Options  are
summarized below.  Unless otherwise noted, each option may appear only  once
per packet.  Each  packet may contain any number  of options.  A  conforming
implementation of RTP  has to support  the RTP options  listed here,  unless
otherwise noted.


CSRC 0   Content source identifiers.  The content source option is  inserted
        only by bridges and identifies  all sources that contributed to  the
        packet.  For example, for audio packets, all sources that were mixed
        together to create this packet  are listed, allowing correct  talker
        indication at the  receiver.   Each CSRC option  may contain one  or
        more content source identifiers, each 16 bits long.  The  identifier
        values must be  unique for  all content sources  received through  a
        particular synchronization source (bridge) on a particular  channel;
        the value of binary zero  is reserved and may not  be used.  If  the
        number of content sources is even, the two octets needed to pad  the
        list to a multiple  of four octets  are set to zero.   There  should
        only be a single CSRC option within a packet.  If no CSRC  option is
        present, the content source identifier is assumed to have a value of
        zero.  CSRC options are not modified by RTP-level translators.

        A conformant RTP implementation does not have to be able to generate
        or interpret the CSRC option.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    CSRC     |    length     | content source identifier    ...
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
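
The padding rule for the CSRC option follows from its layout: two octets of
option header plus two octets per identifier.  The Python sketch below
illustrates one way to build and read the option.  The function names are
hypothetical, and the position of the 'F' bit is taken from the diagram
above.

```python
import struct

CSRC = 0  # option type


def pack_csrc(identifiers, final=False):
    """Build a CSRC option from a list of 16-bit content source identifiers."""
    if not identifiers:
        raise ValueError("at least one identifier is required")
    if any(not 1 <= i <= 0xFFFF for i in identifiers):
        raise ValueError("identifiers must be 1..65535; zero is reserved")
    body = b"".join(struct.pack("!H", i) for i in identifiers)
    if len(identifiers) % 2 == 0:
        body += b"\x00\x00"            # even count: pad to a 32-bit boundary
    length_words = (2 + len(body)) // 4
    if length_words > 255:
        raise ValueError("too many identifiers for a one-octet length field")
    first = (0x80 if final else 0x00) | CSRC
    return struct.pack("!BB", first, length_words) + body


def unpack_csrc(option):
    """Return the identifiers in a CSRC option, skipping zero padding."""
    length_words = option[1]
    count = 2 * length_words - 1       # 16-bit slots after the option header
    values = struct.unpack("!%dH" % count, option[2:4 * length_words])
    return [v for v in values if v != 0]
```

Because zero is reserved as an identifier value, the reader can tell the two
padding octets apart from a genuine identifier.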

SSRC 1   Synchronization  source  identifier.     The  SSRC  option  may  be
        inserted by RTP-level  translators,  end systems  and bridges.    It
        is typically used  only by  translators, but  it may be  used by  an
        end system application to distinguish several sources sent with  the
        same lower-layer source address.   Each synchronization source  with
        the same lower-layer  address (e.g.,  the  same IP  address and  UDP
        port) must have  a distinct SSRC.  Synchronization sources that  are
        distinguishable by their lower-layer address do not require the  use
        of SSRC options.   The SSRC value zero is  reserved and must not  be
        used.  If no SSRC option  is present, the network source is  assumed
        to indicate the synchronization source.  There must be no more  than
        one SSRC identifier per  packet; thus, a  translator must remap  the
        SSRC identifier of  an incoming packet  into a  new, locally  unique
        SSRC identifier.  The SSRC option may be considered in functionality
        as an extension  of the source  port number in  protocols like  UDP,
        ST-II or TCP.

        An RTP receiver must support the SSRC option.  RTP senders only need
        to support this option if they  intend to send more than one  source
        to the same channel using the same source port.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    SSRC     | length = 1    | identifier                    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

BOP 3   (beginning of playout unit)  16-bit sequence number designating  the
        first packet within the current playout unit.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|     BOP     | length = 1    | sequence number               |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
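
The SSRC and BOP options share the same one-word shape: the 'F' bit and
type, a length of one, and a 16-bit value.  The Python sketch below builds
and reads such options.  The helper names are hypothetical and the sketch
is illustrative rather than normative.

```python
import struct

SSRC = 1  # synchronization source identifier
BOP = 3   # beginning of playout unit


def pack_word_option(option_type, value, final=False):
    """Build a one-word option carrying a single 16-bit value."""
    if option_type == SSRC and value == 0:
        raise ValueError("SSRC value zero is reserved")
    first = (0x80 if final else 0x00) | (option_type & 0x7F)
    return struct.pack("!BBH", first, 1, value & 0xFFFF)   # length = 1 word


def unpack_word_option(option):
    """Return (final, option_type, value) for a one-word option."""
    first, length_words, value = struct.unpack("!BBH", option[:4])
    if length_words != 1:
        raise ValueError("not a one-word option")
    return bool(first & 0x80), first & 0x7F, value
```

For example, a BOP option that is the last option in the packet and refers
to sequence number 0x1234 occupies the four octets 0x83 0x01 0x12 0x34.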


3.4 Reverse-Path Option


With two-party (unicast) communications,  relaying back control  information
to the sender is  easy.  For  multicast communications, control  information
can be sent to all members of the  group.  It may, however, be desirable  to
send a message to  an individual member  of a multicast  group, for  example
to request retransmission of  a particular data frame  or to request/send  a
reception quality report.  For this particular use, we introduce a mechanism
for sending so-called reverse RPDUs.   The RPDU  format of reverse RPDUs  is
exactly the  same as  for regular  messages and  they can  make use  of  all
the options defined in  this memorandum.   Reverse RPDUs travel through  the
same translators as other RPDUs.   The receiver distinguishes reverse  RPDUs
by their arrival on  a different transport selector  (e.g., a different  UDP
port), namely the  same one  which is  used as a  source transport  selector
(e.g., UDP source  port) for  forward RPDUs.   A receiver  of reverse  RPDUs
cannot rely on any sequence  number ordering, as a  sender may use the  same
sequence number  space while  communicating through  this reverse  mechanism
with several receivers.  The sequence  number space of reverse RPDUs has  to
be completely separate from that used for RPDUs sent to the multicast group.
If the same sequence  number space were used,  the members of the  multicast
group not  receiving reverse  RPDUs would  detect a  gap in  their  received
sequence number space.


SDST 2   Synchronization destination identifier.   The  SDST option is  only
        inserted by  RTP  end systems  and  bridges  if they  want  to  send
        unicast information to a particular site within the multicast group.
        Packets containing an SDST  option must not  contain an SSRC  option
        and vice versa.   The identifier value  zero is allowed, unlike  for
        SSRC options (see example below).

        Denote the end system that wants to return a unicast message by
        S and the desired destination end system of that unicast message by
        D. If the multicast packets received by S from D contain no SSRC
        option, S and D must be directly connected, without an intervening
        translator.  No SDST option is needed in this case.

        If the multicast packets received by S from D contain an SSRC option,
        S inserts an SDST option using the identifier contained in the
        SSRC option received from D. S then sends the packet to the
        source network and transport address found in the multicast packets
        coming from D. The packet will thus reach the translator on the
        path between S  and D closest  to S. The  arrival on that  transport
        address tells the translator  that the packet  is a unicast  reverse
        control packet.    The translator  determines which  source it  maps
        into the identifier contained  in the SDST  option and replaces  the
        SDST identifier by that value.   In other words:   if a forward  RTP
        packet carries SSRC  identifier X  between two  systems (either  two
        translators or an end system and a translator), the unicast  reverse
        control packet will carry SDST  with identifier X between those  two
        systems.

        Example for  UDP: T1  and  T2 are  translators between  end  systems
        S and  D. In  the forward  direction, D  sends regular  RTP  packets
        with no SSRC to (among other multicast group members) translator  T2
        with destination port  3456 and source  port 5678;  T2 inserts  SSRC
        identifier 13  and forwards  to translator  T1 on  source port  4590
        and destination port  3456; T1 translates  SSRC 13  into SSRC 8  and
        forwards to S using destination port 3456 and source port 12789.  In
        the unicast reverse RPDU, site S sends the packet to translator  T1,
        destination port 12789 with SDST value 8.  T1 replaces SDST value  8
        with SDST value 13  and forwards to  translator T2 with  destination
        port 4590.  T2 finally sends the message with SDST value 0 to site D
        at destination port 5678.   By its  arrival port, site D  determines
        that the RPDU is a reverse RPDU and treats it accordingly.
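
The identifier remapping in the UDP example above can be traced with a small
table per translator.  The Python sketch below is a simplified model, not an
implementation: network addresses are omitted, only port numbers and
identifiers are tracked, and the class and method names are hypothetical.
A source that sends no SSRC option is recorded with identifier zero.

```python
class Translator:
    """Identifier table of one RTP-level translator (ports only)."""

    def __init__(self):
        self._forward = {}  # (upstream source port, upstream SSRC) -> local SSRC
        self._reverse = {}  # local SSRC -> (upstream source port, upstream SSRC)

    def add_source(self, upstream_port, upstream_ssrc, local_ssrc):
        """Record that an upstream source is forwarded under local_ssrc."""
        if local_ssrc == 0:
            raise ValueError("SSRC value zero is reserved")
        if local_ssrc in self._reverse:
            raise ValueError("local SSRC must be unique")
        self._forward[(upstream_port, upstream_ssrc)] = local_ssrc
        self._reverse[local_ssrc] = (upstream_port, upstream_ssrc)

    def forward(self, src_port, ssrc=0):
        """SSRC to insert when forwarding a packet from the given source."""
        return self._forward[(src_port, ssrc)]

    def reverse(self, sdst):
        """(destination port, SDST) to use when relaying a reverse RPDU."""
        return self._reverse[sdst]


# Tables matching the example: D -> T2 -> T1 -> S.
t2 = Translator()
t2.add_source(upstream_port=5678, upstream_ssrc=0, local_ssrc=13)
t1 = Translator()
t1.add_source(upstream_port=4590, upstream_ssrc=13, local_ssrc=8)
```

With these tables, a reverse RPDU arriving at T1 with SDST value 8 is relayed
to port 4590 with SDST value 13, and T2 relays it to port 5678 with SDST
value 0, as in the example.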

        [Reverse control  unicast packets  are already  identified by  their
        destination transport address,  so SSRC  could be  used for  reverse
        control packets.  A separate option is used to limit confusion.]

        Only applications that need to send or receive unicast control
        information need to implement the SDST option.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    SDST     | length = 1    |           identifier          |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


3.5 Security Options


The security options below offer message integrity, authentication and
privacy, individually or in combination.

Support for the security  options is not mandatory,  but see the  discussion
for the ENC option.  The four message integrity check options --- MIC, MICA,
MICK and MICS --- are mutually exclusive,  i.e., only one of them should  be
used for a single RPDU.

All message  integrity check  options are  computed over  the fixed  header,
the ENC option preceding  the message integrity  check option (if  present),
the first four  octets of the  message integrity check  option and the  data
(remaining header and payload) following the message integrity check option.

The message  integrity check  options and  the ENC  option shall  not  cover
the SSRC and  SDST options,  i.e., SSRC  and SDST must  be inserted  between
the fixed header and  the ENC or  message integrity check  options, as  SSRC
and SDST  are  subject to  change  by translators  that  are likely  not  in
possession of  the necessary  descriptor table  (see below)  and  encryption
keys.  Translators that have  the necessary keys and descriptor  translation
table may modify the contents  of the RPDU, unless  the MICA option is  used
(see MICA description).
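
The coverage rule above can be illustrated by assembling the octets that
enter the digest.  The Python sketch below assumes the packet has already
been split into its parts and uses MD5 only as an example digest.  The
function names and the argument layout are hypothetical, and the way the
digest value is carried inside the option is not modeled here.

```python
import hashlib


def mic_digest_input(fixed_header, mic_option, following, enc_option=b""):
    """Concatenate the octets covered by a message integrity check.

    SSRC and SDST options are deliberately not arguments: they sit between
    the fixed header and the ENC or MIC option and are not covered.
    """
    if len(fixed_header) != 8:
        raise ValueError("fixed header must be eight octets")
    if len(mic_option) < 4:
        raise ValueError("integrity check option is at least four octets")
    return fixed_header + enc_option + mic_option[:4] + following


def md5_check_value(fixed_header, mic_option, following, enc_option=b""):
    """Example digest over the covered octets (MD5 chosen for illustration)."""
    covered = mic_digest_input(fixed_header, mic_option, following, enc_option)
    return hashlib.md5(covered).digest()
```

Because the SSRC and SDST options are left out of the input, a translator
can rewrite them without invalidating the check.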

All security options carry  a one-octet descriptor field.   This  descriptor
is an index into  two tables, one for  the message integrity check  options,
one for the  ENC option,  established  by non-RTP means,  containing  digest
algorithms (MD2,  MD5,  etc.),  encryption  algorithms  (DES  variants)  and
encryption keys or shared secrets (for the MICK option).  All sources within
the same channel  share the same  table.   The descriptor  value may  change
during a session, for example, to use a different set of encryption keys.

The descriptor value zero describes a set of default algorithms to be  used:
MD5 for the message digest algorithm, DES CBC for the encryption algorithm.


The MIC, MICK and MICS message integrity checks offer group authentication,
that is, the receiver can ascertain  that the RPDU originated from a  member
of the group  of sites  sharing a  common secret,  but  the receiver  cannot
authenticate which of  the sources  among that  group sent  the data.    The
receiver can also be assured that nobody outside the group tampered with the
RPDU.


ENC 8   All packet  data  after  this option,  but  not  the  fixed  header,
        is encrypted,  using the  encryption  key and  symmetric  encryption
        algorithm specified by the descriptor  field.  The descriptor  value
        may change over time to accommodate varying security requirements or
        reduce the amount of ciphertext using  the same key.  [For  example,
        in a network interview, the  candidate and interviewers could  share
        one key, with a second key set aside for the interviewers only.  For
        symmetric keys, source-specific keys offer no advantage.]

        The descriptor  value zero  is  reserved for  a default  mode  using
        the Data Encryption  Standard (DES) algorithm  in CBC (cipher  block
        chaining) mode, as described in  Section 1.1 of RFC  1423 [5].   The
        padding specified  in that  section is  to  be used.    The  8-octet
        initialization vector (IV) may be carried unencrypted within the ENC
        option or generated anew for  each packet.   If the ENC option  does
        not contain an initialization vector (indicated by an option  length
        of 1), the  fixed RTP  header is used  as the IV.  [Using the  fixed
        RTP header as  the IV  avoids regenerating  the IV  for each  packet
        and incurs less  header overhead.]    For details  on the  tradeoffs
        for CBC IV use, see  [6].  Support  for encryption is not  required.
        Implementations that do not support encryption should recognize  the
        ENC option  so that  they can  avoid processing  encrypted  messages
        and provide a meaningful failure  indication.  Implementations  that
        support encryption should,  at the minimum,  always support the  DES
        CBC algorithm.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|     ENC     |   length = 3  |    reserved   |   descriptor  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        | DES (CBC) initialization vector, bytes 0 through 3            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        | DES (CBC) initialization vector, bytes 4 through 7            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|     ENC     |   length = 1  |    reserved   |   descriptor  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
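
        The two ENC layouts shown above can be built and parsed as in the
        following sketch.  It assumes that the F bit is the most significant
        bit of the first octet and that the fixed RTP header is 8 octets
        long (so that it can serve as a DES IV); neither is stated in this
        part of the document.  The function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

ENC = 8  # option type given in the text


def build_enc_option(descriptor: int, iv: Optional[bytes] = None,
                     final: bool = False) -> bytes:
    """Build an ENC option, with (length 3) or without (length 1) an IV."""
    if iv is not None and len(iv) != 8:
        raise ValueError("DES CBC IV must be 8 octets")
    length = 3 if iv is not None else 1          # length in 32-bit words
    first = (0x80 if final else 0x00) | ENC      # assumed position of F bit
    return struct.pack("!BBBB", first, length, 0, descriptor) + (iv or b"")


def parse_enc_option(option: bytes, fixed_header: bytes) -> Tuple[int, bytes]:
    """Return (descriptor, IV); fall back to the fixed header as the IV."""
    first, length, _reserved, descriptor = struct.unpack("!BBBB", option[:4])
    if first & 0x7F != ENC:
        raise ValueError("not an ENC option")
    if length == 3:
        iv = option[4:12]
        if len(iv) != 8:
            raise ValueError("truncated IV")
    elif length == 1:
        if len(fixed_header) != 8:               # assumption, see lead-in
            raise ValueError("expected an 8-octet fixed header")
        iv = fixed_header
    else:
        raise ValueError("unexpected ENC option length")
    return descriptor, iv
```

        The decryption itself is omitted, since it depends on the DES
        implementation available to the application.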


MIC 9   Message integrity check.  The MIC option is used only in
        combination with the ENC option immediately preceding it to  provide
        privacy and group membership authentication.  The message  integrity
        check uses the digest algorithm  specified by the descriptor  field.
        The value zero implies the use of the MD5 message digest.  Note that
        the MIC option is not separately encrypted.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|     MIC     |     length    |    reserved   |   descriptor  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                 message digest (unencrypted)                 ...
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


MICA 10   Message integrity check, asymmetric  encryption.  Currently,  only
        the use of the MD2 and MD5 message digest algorithms is defined,  as
        described in RFC 1319 [7] (as corrected in Section 2.1 of RFC  1423)
        and RFC 1321 [8], respectively.  The MD2 and MD5 message digests are
        16 octets long.

        ``To avoid any  potential ambiguity  regarding the  ordering of  the
        octets of  an MD2  message digest  that  is input  as a  data  value
        to another encryption process  (e.g., RSAEncryption), the  following
        holds true.   The first (or  left-most displayed,  if one thinks  in
        terms of  a digest's  "print" representation)  octet of  the  digest
        (i.e., digest[0]  as  specified in  RFC 1319),  when  considered  as
        an RSA  data value,  has  numerical weight  2**120.   The  last  (or
        right-most displayed) octet  (i.e., digest[15] as  specified in  RFC
        1319) has numerical weight 2**0.''  [RFC 1423, Section 2.1]

        ``To avoid any  potential ambiguity  regarding the  ordering of  the
        octets of a MD5 message digest that is input as an RSA data value to
        the RSA encryption process, the following holds true.  The first (or
        left-most displayed, if one  thinks in terms  of a digest's  "print"
        representation) octet of the  digest (i.e.,  the low-order octet  of
        A as specified in RFC 1321),  when considered as an RSA data  value,
        has numerical weight  2**120.   The last  (or right-most  displayed)
        octet (i.e., the high-order octet of D as specified in RFC 1321) has
        numerical weight 2**0.''  [RFC 1423, Section 2.2]

        The message digest  is encrypted,  using asymmetric  keys, with  the
        sender's private key using the algorithm described in Section  4.2.1
        of RFC 1423:   ``As described in  PKCS #1,  all quantities input  as
        data values to the RSAEncryption process shall be properly justified
        and padded to  the length  of the  modulus prior  to the  encryption
        process.   In general,  an RSAEncryption  input value  is formed  by
        concatenating a  leading NULL  octet,  a block  type BT,  a  padding
        string PS, a NULL octet, and the data quantity D, that is, RSA input
        value = 0x00, BT, PS, 0x00,  D. To prepare a MIC for  RSAEncryption,
        the PKCS  #1 ``block  type 01''  encryption-block formatting  scheme
        is employed.   The block type  BT is a  single octet containing  the
        value 0x01 and the padding string  PS is one or more octets  (enough
        octets to make  the length  of the  complete RSA  input value  equal
        to the length of  the modulus) each containing  the value 0xFF.  The
        data quantity  D is  comprised  of the  MIC  and the  MIC  algorithm
        identifier.''  The encoding is described in detail in RFC 1423.
        For encrypting MD2 and MD5, the data quantity D is comprised of  the
        16-byte checksum, preceded  by the  binary sequences  shown here  in
        hexadecimal:  0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86,  0x48,
        0x86, 0xF7, 0x0D,  0x02, 0x02, 0x05,  0x00, 0x04,  0x10 for MD2  and
        0x30, 0x20, 0x30, 0x0C,  0x06, 0x08, 0x2A,  0x86, 0x48, 0x86,  0xF7,
        0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10 for MD5.
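
        The block formatting quoted above can be sketched as follows.  The
        prefixes are the octet sequences listed in the text; the function
        names are hypothetical, and the RSA private-key operation itself is
        left out because it depends on the cryptographic library in use.

```python
import hashlib

# Octets that precede the 16-octet digest in the data quantity D.
MD2_PREFIX = bytes([0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
                    0x86, 0xF7, 0x0D, 0x02, 0x02, 0x05, 0x00, 0x04, 0x10])
MD5_PREFIX = bytes([0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
                    0x86, 0xF7, 0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10])


def rsa_input_block(digest: bytes, prefix: bytes, modulus_bits: int) -> bytes:
    """Form 0x00, BT=0x01, PS (0xFF...), 0x00, D as long as the modulus."""
    if len(digest) != 16:
        raise ValueError("MD2 and MD5 digests are 16 octets long")
    k = (modulus_bits + 7) // 8          # modulus length in octets
    d = prefix + digest                  # data quantity D
    ps_len = k - 3 - len(d)              # PS must be one or more octets
    if ps_len < 1:
        raise ValueError("modulus too short for this digest")
    return b"\x00\x01" + b"\xff" * ps_len + b"\x00" + d


def as_rsa_integer(block: bytes) -> int:
    """First octet carries the highest numerical weight (big-endian)."""
    return int.from_bytes(block, "big")
```

        For a 512-bit modulus and an MD5 digest, the block is 64 octets long
        and contains 27 padding octets.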

        The signature is padded as necessary.   The value of the padding  is
        left unspecified.  [Note:  The number of non-padding bits within the
        signature is known to the receiver as being equal to the key length.
        The MIC algorithm is identified  through the bytes prepended to  the
        actual 16-byte signature.]

        Contrary to what is specified in RFC 1423 for privacy enhanced mail,
        the asymmetrically signed MIC is carried in binary, NOT represented
        in the printable encoding of RFC 1421, Section 4.3.2.4.  The
        encrypted signature is as long as the modulus of the RSA key used,
        rounded up to the next integral byte count.  The modulus and public
        key are conveyed to the receivers by non-RTP means.  [Note:
        Asymmetric keys are used since symmetric keys would not allow
        authentication of the individual source in the multicast case.]

        A translator that  receives an  RPDU is  not allowed  to modify  the
        parts of the RPDU covered by  the MICA option as the receiver  would
        have no way of establishing the identity of the translator and  thus
        could not verify the integrity of the RPDU.

        Support for sending or interpreting MICA options is not required.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    MICA     |    length     |  encrypted message-digest    ...
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MICK 11   Message integrity check, keyed.  This message integrity check
        does not require encryption.  In addition to the RPDU parts to be
        included in the message digest according to the introduction to this
        section, the shared secret is placed in the MICK option and included
        in the message digest.  (The shared secret is equivalent to the
        key used for the MICS and ENC options, but is 16 octets long,
        padded with binary zeroes if necessary.)  The shared secret in the
        MICK option is then replaced by the computed 128-bit digest.

        The receiver saves the message digest contained in the MICK option,
        replaces it with the shared secret key and computes the message
        digest in the same manner as the sender.  If the RPDU has not been
        tampered with and originated with one of the holders of the secret
        key, the computed message digest will agree with the digest found on
        reception in the MICK option.

        [The message integrity check follows the practice of SNMP Version 2,
        as described in RFC 1446, Section 1.5.1.  The MICK option itself
        is covered by the digest in order to detect tampering with the
        descriptor field itself.  Using the secret key in the signature
        instead of encrypting the MD5 message digest avoids the use of an
        encryption algorithm when only authentication is desired.  However,
        the security of this approach has not been as well established as
        that based on encrypting message digests, as used in the MICS, MIC
        and MICA options.]


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    MICK     |    length     |   reserved    |   descriptor  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                        message digest                        ...
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
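
        The keyed check can be sketched as follows, assuming descriptor
        zero (MD5), an F bit in the most significant bit of the first
        octet, and a digest that covers the MICK option followed by all
        octets after it.  The helper names are hypothetical.

```python
import hashlib
import hmac
import struct

MICK = 11  # option type given in the text


def _pad_secret(secret: bytes) -> bytes:
    """Pad the shared secret with binary zeroes to 16 octets."""
    if len(secret) > 16:
        raise ValueError("shared secret is at most 16 octets")
    return secret.ljust(16, b"\x00")


def mick_protect(covered: bytes, secret: bytes, descriptor: int = 0,
                 final: bool = False) -> bytes:
    """Return the MICK option to place in front of the covered octets."""
    first = (0x80 if final else 0x00) | MICK
    header = struct.pack("!BBBB", first, 5, 0, descriptor)  # 5 words
    # Digest over the option with the secret in the digest position.
    digest = hashlib.md5(header + _pad_secret(secret) + covered).digest()
    return header + digest        # the digest replaces the secret


def mick_verify(option: bytes, covered: bytes, secret: bytes) -> bool:
    """Recompute the digest as the sender did and compare."""
    if len(option) != 20 or option[0] & 0x7F != MICK:
        return False
    header, received = option[:4], option[4:]
    expected = hashlib.md5(header + _pad_secret(secret) + covered).digest()
    return hmac.compare_digest(received, expected)
```

        Because the option header is part of the digest input, a modified
        descriptor field causes verification to fail as well.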

MICS 12   Message integrity check,  symmetric-key encrypted.   This  message
        integrity check encrypts the  message digest using  DES ECB mode  as
        described in RFC 1423, Section 3.1.


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|    MICS     |    length     |   reserved    |   descriptor  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                     encrypted message digest                 ...
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


3.6 The Use of the Security Options


Combinations of the message integrity check and ENC security options can  be
used to provide a variety of security services:


confidentiality: Confidentiality   here  means   that  only   the   intended
    receiver(s) can decode  the received RTP packets;  for others, the  data
    contains no  useful  information.   Confidentiality  of the  content  is
    achieved by  encryption using DES.  The presence of  encryption and  the
    initialization vector  is indicated  by the  ENC option.    [Note:   for
    efficiency reasons,  this  specification does  not insist  that  content
    encryption  only  be used  in  connection  with  message  integrity  and
    authentication mechanisms.  In almost all cases, it will be obvious to
    the person receiving  the data if he or  she does not possess the  right
    encryption key.]

authentication and message integrity: In combination with certificates,  the
    receiver  can  ascertain that  the  claimed  originator  is  indeed  the
    originator  of the  data  (authentication) and  that  the data  has  not
    been altered after  leaving the sender (message  integrity).  These  two
    security services are provided  by the message integrity check  options.
    Certificates for MICA must be distributed through means outside  of RTP.
    The services offered by MICA and MIC/MICK/MICS differ:  With
    MIC/MICK/MICS, the receiver can only verify that the message
    originated within the group holding the secret key, rather than
    authenticate the sender of the message, while the MICA option affords
    true authentication of the sender.

authentication, message integrity, and confidentiality: By   carrying   both
    the  message  integrity  check  and  ENC  option  in RTP  packets,   the
    authenticity, message  integrity and confidentiality  of the packet  can
    be  assured  (subject to  the  limitations  discussed  in  the  previous
    paragraph).

    The message integrity check is applied first to all parts of the
    outgoing packet  to be  authenticated, and the  message integrity  check
    option is  prepended to those  parts.   Then, the  packet including  the
    message integrity  check option  is  encrypted using  the shared  secret
    key.    The ENC  option  must be  followed  immediately by  the  message
    integrity check  option,  without any  other options  in between.    The
    receiver first  decrypts the octets  following the ENC  option and  then
    authenticates the  decrypted data using the  signature contained in  the
    message integrity check option.

    For this combination of security features and group authentication,  the
    combination ENC and MIC is  recommended (instead of MICS or MICK) as  it
    yields the lowest processing overhead.


A message integrity  check option followed  by an ENC  option should not  be
used.
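
The ordering rules of this section can be checked mechanically.  The sketch
below takes the option types of an RPDU in order of appearance and reports
violations; the function name and the wording of the messages are
hypothetical, and the placement rule for SSRC and SDST is not checked since
their type values are not given in this section.

```python
from typing import List

ENC, MIC, MICA, MICK, MICS = 8, 9, 10, 11, 12
INTEGRITY = {MIC, MICA, MICK, MICS}


def check_security_order(option_types: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the order is fine."""
    problems = []
    for i, opt in enumerate(option_types):
        prev = option_types[i - 1] if i > 0 else None
        earlier = option_types[:i]
        if opt in INTEGRITY:
            if ENC in earlier and prev != ENC:
                problems.append("options between ENC and integrity check")
            elif opt == MIC and prev != ENC:
                problems.append("MIC without immediately preceding ENC")
        if opt == ENC and any(t in INTEGRITY for t in earlier):
            problems.append("integrity check option followed by ENC")
    return problems
```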


4 Real Time Control Protocol --- RTCP


The real-time control protocol (RTCP)  conveys minimal control and  advisory
information during a conference.  It provides support for loosely controlled
conferences, i.e.,  where  participants enter  and leave  without  admission
control and parameter negotiation.  The services provided by RTCP
enhance RTP, but an end system does not have to implement RTCP features to
participate in conferences(1).  RTCP does not aim to provide the services
of a conference control protocol and  does not provide some of the  services
desirable for two-party conversations.  If a conference control protocol  is
in use, the  services of RTCP should  not be required.   (Note:   as of  the
writing of this document, a conference  or session control protocol has  not
been specified within the Internet.)

Unless otherwise  noted,  control  information is  carried  periodically  as
options within RPDUs, with  or without payload.   RTCP  packets are sent  to
all members of a conference.   These packets are  part of the same  sequence
number space as RTP packets not containing RTCP options.  The period should
be varied randomly  to avoid  synchronization of  all sources  and its  mean
should increase with the number of  participants in the conference to  limit
the growth of the overall  network and host interrupt load.   The length  of
the period determines, for example, how long a receiver joining a conference
has to wait in the worst case until it can identify the source.   A receiver
may remove from its list of active sites  a site that it has not heard  from
for a given time-out period; the time-out period may depend on the number of
sites or the observed average interarrival time of RTCP messages.  Note that
not every periodic message has to contain all RTCP options; for example, the
MAIL part within the SDES option might only be sent every few messages.
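
One way to obtain a randomly varied period whose mean grows with the number
of participants is sketched below.  This document does not prescribe a
formula; the linear scaling, the constants and the uniform distribution are
all assumptions chosen only for illustration.

```python
import random
from typing import Optional


def next_report_interval(participants: int,
                         per_participant: float = 0.5,
                         minimum: float = 5.0,
                         rng: Optional[random.Random] = None) -> float:
    """Return the delay in seconds until the next RTCP transmission."""
    source = rng if rng is not None else random
    # Mean period grows with the conference size (assumed linear).
    mean = max(minimum, per_participant * participants)
    # Vary the period randomly around the mean to avoid synchronization.
    return source.uniform(0.5 * mean, 1.5 * mean)
```

A receiver could derive its time-out for silent sites from the same mean,
for example as a small multiple of it.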

The item types are defined below:


FMT 32   Format description.


        format:  6 bits
            The  'format' field  corresponds to  the  index value  from  the
            'format' RTP fixed header  field, with values ranging from 0  to
            63.

        Clock quality:  8 bits
            Provides an  indication as  to the  sender-perceived quality  of
            the timestamps  in the  RTP header.   The  octet is  interpreted
            as a quantity indicating  the maximum dispersion to a root  time
            server measured  in fractions  of a  second and  expressed as  a
            power of two.

------------------------------
 1. There  is one  exception to  that rule:   if  an application  sends  FMT
options, the receiver has to decode these in order to properly interpret the
RTP payload.

            If a source  is known to be  synchronized to standard time,  but
            with an  unknown dispersion, or the  dispersion is greater  than
            TBD, the  value TBD  is used.    If the  clock is  based on  the
            nominal sample rate of the source, a value of TBD is used.

            The clock quality indication can be used to judge how  the delay
            measurements reported by the  QOS option can be interpreted  (as
            absolute delay or only as  delay variation).  It is also  useful
            for determining  to what extent  several sources with  different
            clocks can be synchronized.

        Format-dependent data:  variable
            Format-dependent data  may or may  not appear in  a FMT  option.
            It is passed to the next layer and not interpreted by RTP.


        A FMT mapping changes the  interpretation of a given 'format'  value
        (as carried  in  the  fixed  RTP  header)  starting  at  the  packet
        containing the  FMT option.    The new  interpretation applies  only
        to packets  from the  synchronization  source of  this  packet.    A
        sender should refrain  from changing  the mappings  between the  RTP
        format field and the other fields  in the FMT option that have  been
        established through a conference registry, a conference announcement
        protocol or otherwise.  Dynamic  changes to these values may  result
        in misinterpretation of RTP payload if the packet(s) containing  the
        FMT option are lost.


          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |F|     FMT     |    length     |R|R|  format   | clock quality |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  format-dependent data                                       ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
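
        The FMT layout shown above can be packed and unpacked as in this
        sketch.  It assumes that the length field counts 32-bit words
        including the first word, that the F bit is the most significant
        bit of the first octet, and that the caller supplies
        format-dependent data already padded to a multiple of four octets.
        The clock quality octet is carried as an opaque value because its
        numeric encoding is still to be determined.

```python
import struct
from typing import Tuple

FMT = 32  # option type given in the text


def build_fmt_option(fmt: int, clock_quality: int, data: bytes = b"",
                     final: bool = False) -> bytes:
    if not 0 <= fmt <= 63:
        raise ValueError("format is a 6-bit field")
    if not 0 <= clock_quality <= 255:
        raise ValueError("clock quality is one octet")
    if len(data) % 4:
        raise ValueError("format-dependent data must be a multiple of 4")
    length = 1 + len(data) // 4
    first = (0x80 if final else 0x00) | FMT
    # Third octet: two reserved bits (zero) followed by the 6-bit format.
    return struct.pack("!BBBB", first, length, fmt & 0x3F, clock_quality) + data


def parse_fmt_option(option: bytes) -> Tuple[int, int, bytes]:
    """Return (format, clock quality, format-dependent data)."""
    first, length, fmt_octet, quality = struct.unpack("!BBBB", option[:4])
    if first & 0x7F != FMT:
        raise ValueError("not an FMT option")
    if len(option) != 4 * length:
        raise ValueError("length field does not match option size")
    return fmt_octet & 0x3F, quality, option[4:]
```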

SDES 33   This option provides a mapping between a numeric source identifier
        and one or more  identifying attributes.   [Several attributes  were
        combined into one option to avoid multiple mappings from identifiers
        to the receiver site data structure.]  For those applications  where
        the size of  a multipart SDES  option would be  a concern,  multiple
        SDES options may be formed with subsets  of the parts to be sent  in
        separate packets.

        An end  system or  a bridge  uses  an identifier  value of  zero  to
        identify itself.  For each  contributor, a bridge forwards the  SDES
        information received from  that contributor,  but  changes the  SDES
        source identifier to correspond to the value used in the CSRC option
        when identifying this contributor.   A bridge that contributes  data
        to outgoing packets should  use a CSRC  and select another  non-zero
        source identifier for that  traffic and send  CSRC and SDES  options
        for it.

        Translators do not modify  or insert SDES options.   The end  system
        performs the same mapping  it uses to  identify the content  sources
        (that is, the combination of network source, synchronization  source
        and the  source  number  within  this SDES  option)  to  identify  a
        particular source.   SDES  information is specific  to a  particular
        flow identifier,  unless  a higher-layer  control  protocol  defines
        that all  packets  with  the same  source  identifier  (network  and
        transport-level source addresses and the optional SSRC value) from a
        set of channels defined by the control protocol are described by the
        same SDES.

        Currently, the following items  are defined.   Each has a  structure
        similar to that  of RTCP  and RTP  options,  that is,  a type  field
        followed by a length field  (measured in multiples of four  octets).
        No final bit  is needed  since the  overall length  is known.    All
        of the  SDES  items are  optional;  however,  if  quality-of-service
        monitoring is  to be  used,  the  ADDR and  TSEL  items need  to  be
        provided (see QOS option).


              type   value description
              ADDR   1     network address of source
              TSEL   2     transport address
              CNAME  4     canonical user and host identifier,
                           e.g., ``doe@sleepy.megacorp.com'' or
                           ``sleepy.megacorp.com''
              MAIL   5     user's electronic mail address
                           e.g., ``John.Doe@megacorp.com''
              LOC    8     geographic user location,
                           e.g., ``Rm.  2A244, Berkeley Heights, NJ''
              TXT    16    text describing the source,
                           e.g., ``John Doe, Bit Recycler, Megacorp''


        Items are padded with the binary value zero to the next multiple  of
        four octets.  Each item may appear only once unless otherwise noted.

        A more detailed description of some of these items follows:


        ADDR:  A source may  send several  network addresses,  but only  one
            for each address  type value.   Address types are identified  by
            the Domain Name Service Resource Record (RR) type,  as specified
            in the  current edition of  the Assigned Numbers  RFC. For  NSAP
            addresses, the NSEL byte is not included.

        TSEL:  The protocol identifier uses the IP Protocol Numbers  defined
            in  the  current  edition  of  the  Assigned  Numbers  RFC.  The
            figure shows  the  use of  the TSEL  item for  the TCP  and  UDP
            protocols.  There must be no more than one TSEL item in  an SDES
            option.  The  TSEL item should precede any address  information.
            [Multiple  concurrent transport  addresses are  not  meaningful.
            The ordering simplifies processing at the receiver.]

        CNAME:  The  CNAME  item  must  have  the  format  ``user@host''  or
            ``host'', where ``host''  is the fully qualified domain name  of
            the host from which the real-time data originates, formatted
            according to  the  rules specified  in RFC  1034, RFC  1035  and
            Section 2.1 of  RFC 1123.   The ``host'' form  may be used if  a
            user name is not available, for example on  single-user systems.
            The  user name  should be  in  a form  that  a program  such  as
            ``finger'' or  ``talk'' could  use, i.e.,  it  typically is  the
            login name rather  than the ``real life'' name.   Note that  the
            host name  is not necessarily identical  to the electronic  mail
            address of the participant.  The latter is provided  through the
            MAIL item.

        LOC:  Depending on the application, different degrees of detail  are
            appropriate  for this  item.    For conference  applications,  a
            string like  ``Tampere, Finland'' may  be sufficient, while  for
            an active badge system, strings like ``Room 2A244, AT&T  BL MH''
            might be appropriate.


          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |F|     SDES    |    length     |       source identifier       |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = ADDR  |    length     |    reserved   | address type  |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                     network-layer address                    ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = ADDR  |   length = 2  |    reserved   | addr. type = 1|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                          IPv4 address                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = TSEL  |    length     |    reserved   | transport pro.|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                  transport-address (port number)              ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = TSEL  |    length     |    reserved   |  protocol = 6 |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |            reserved           |       TCP port number         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = TSEL  |    length     |    reserved   | protocol = 17 |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |            reserved           |       UDP port number         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = CNAME |    length     | user and domain name         ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = MAIL  |    length     | electronic mail address      ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |   type = LOC  |    length     | geographic location of site  ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |   type = TXT  |    length     | text describing source       ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
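
        The item layouts shown above can be assembled into an SDES option
        as in the following sketch.  It assumes that item lengths count
        32-bit words including the type and length octets (consistent with
        the IPv4 ADDR example, whose length is 2), that text is ASCII, and
        that the F bit is the most significant bit of the first octet.  The
        function names are hypothetical.

```python
import socket
import struct
from typing import List

SDES = 33
ADDR, TSEL, CNAME, MAIL, LOC, TXT = 1, 2, 4, 5, 8, 16


def text_item(item_type: int, text: str) -> bytes:
    """CNAME, MAIL, LOC or TXT item, zero-padded to a multiple of 4."""
    body = text.encode("ascii")
    padding = -(2 + len(body)) % 4
    length = (2 + len(body) + padding) // 4
    return bytes([item_type, length]) + body + b"\x00" * padding


def addr_item_ipv4(dotted: str) -> bytes:
    """ADDR item with address type 1 and a 4-octet IPv4 address."""
    return struct.pack("!BBBB", ADDR, 2, 0, 1) + socket.inet_aton(dotted)


def tsel_item(protocol: int, port: int) -> bytes:
    """TSEL item for TCP (6) or UDP (17) with a 16-bit port number."""
    return struct.pack("!BBBBHH", TSEL, 2, 0, protocol, 0, port)


def build_sdes(source_id: int, items: List[bytes],
               final: bool = False) -> bytes:
    body = b"".join(items)
    length = 1 + len(body) // 4
    if length > 255:
        raise ValueError("SDES option too long; split it across packets")
    first = (0x80 if final else 0x00) | SDES
    return struct.pack("!BBH", first, length, source_id) + body
```

        An end system describing itself would pass a source identifier of
        zero and place the TSEL item ahead of the ADDR item.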


BYE 35   The BYE  option  indicates that  a  particular site  is  no  longer
        active.  A bridge sends BYE options with a (non-zero) content source
        value.   An  identifier  value of  zero  indicates that  the  source
        indicated by the  synchronization source (SSRC)  option and  network
        address is no  longer active.   If  a bridge shuts  down, it  should
        first send BYE options for all content sources it handles,  followed
        by a BYE option with an identifier value of zero.  Each RTCP message
        can contain one or more BYE options.  [Multiple identifiers in a
        single BYE option are not  allowed to avoid ambiguities between  the
        special value of zero and any necessary padding.]


         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |F|     BYE     | length = 1    | content source identifier     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
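
        The shutdown sequence for a bridge can be sketched as follows.  The
        function names are hypothetical; setting the F bit (assumed to be
        the most significant bit of the first octet) on the last option is
        also an assumption.

```python
import struct
from typing import Iterable, List

BYE = 35  # option type given in the text


def bye_option(source_id: int, final: bool = False) -> bytes:
    """One BYE option carrying a single 16-bit content source identifier."""
    if not 0 <= source_id <= 0xFFFF:
        raise ValueError("identifier is a 16-bit field")
    first = (0x80 if final else 0x00) | BYE
    return struct.pack("!BBH", first, 1, source_id)


def bridge_shutdown(content_sources: Iterable[int]) -> List[bytes]:
    """BYE for every content source, then BYE with identifier zero."""
    options = [bye_option(s) for s in content_sources if s != 0]
    options.append(bye_option(0, final=True))
    return options
```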

QOS 36   Quality of service measurement.  The QOS option describes
        statistics of a single synchronization source.  The synchronization
        source is identified by one of the ADDR items from the SDES option
        together with the TSEL item from the SDES option.  The SDES items
        are appended directly to the fixed-length part of the QOS option,
        with TSEL following ADDR.  For a description of these items, see the
        SDES option.

        The other fields of the option contain the number of packets
        expected (32 bits), the number of packets received (32 bits), the
        minimum delay, the maximum delay and the average delay.  The delay
        measures are encoded as 16/16 NTP timestamps, that is, 16 bits
        encode the number of seconds and 16 bits the fraction of a second.

        A single RTCP packet may  contain several QOS options.   It is  left
        to the  implementor to  decide  how often  to transmit  QOS  options
        and which  sources  are to  be  included.    [The  timestamp  format
        is identical  to  the  one used  in  the  fixed RTP  header.     The
        quality-of-service information is identical  to that carried in  the
        reverse control option.]


          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |F|     QOS     |    length     |     synchronization source    |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                       packets expected                        |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                       packets received                        |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         | minimum delay (seconds)       | minimum delay (fraction)      |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         | maximum delay (seconds)       | maximum delay (fraction)      |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         | average delay (seconds)       | average delay (fraction)      |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = ADDR  |    length     |    reserved   | address type  |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                     network-layer address                    ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  type = TSEL  |    length     |    reserved   | transport pro.|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                  transport-address (port number)              ...
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
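
        The 16/16 delay encoding used in the QOS option can be sketched
        as follows.  This is a minimal illustration, not part of the
        protocol definition:  the function names are hypothetical, and
        clamping of out-of-range delays is an assumption, since the
        text above does not say how overflow is handled.

```c
#include <stdint.h>

/* Convert a delay in seconds to the 16/16 format: the upper 16 bits
 * hold whole seconds, the lower 16 bits the fraction of a second in
 * units of 1/65536 s.  Negative delays map to 0 and delays of 65536 s
 * or more are clamped to the largest value (both are assumptions). */
uint32_t delay_to_16_16(double seconds)
{
  if (seconds <= 0.0) return 0;
  if (seconds >= 65536.0) return 0xffffffffu;
  return (uint32_t)(seconds * 65536.0);   /* truncates toward zero */
}

/* Convert a 16/16 value back to seconds. */
double delay_from_16_16(uint32_t d)
{
  return (double)d / 65536.0;
}
```

        For example, a delay of 1.5 seconds is carried as 1 in the
        seconds half and 0x8000 in the fraction half.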


5 Security Considerations


IP multicast provides no direct means for a sender to know all the receivers
of the data  sent.   RTP  options make  it easy  for all  participants in  a
conference to  identify themselves;  if deemed  important  for a  particular
application, it  is the  responsibility of  the application  writer to  make
listening without identification difficult.   It  should be noted,  however,
that within  an internet,  privacy  of the  payload  can generally  only  be
assured by encryption.

The periodic transmission of session messages may make it possible to detect
denial-of-service attacks.  For many types of payload expected to be carried
in RTP packets, such as compressed audio  and video, the data is very  close
to white noise, making  statistics-based ciphertext-only attacks  difficult.
Without MICS/MICA options, it may even be difficult to detect  automatically
when the  code  has been  broken.    However,  the  session  information  is
more or less  constant and  predictable,  allowing known-plaintext  attacks.
Chosen-plaintext attacks appear to be difficult.

Since the timestamp in the RTP header is protected by the message  integrity
check options, some replay attacks can be detected if the receiver can bound
the maximum packet delay and clock offset of the sender.
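
A receiver-side plausibility test along these lines might look like the
sketch below.  It is only meaningful when the timestamp is covered by a
message integrity check.  The function name and the two bounds are
hypothetical parameters chosen by the receiver; all values are assumed to
be in seconds on an already extended time scale.

```c
/* Return 1 if a packet timestamp is plausible and 0 if the packet
 * should be treated as a possible replay.
 *   ts         - sender timestamp carried in the packet
 *   now        - receiver's current time
 *   max_delay  - assumed upper bound on network delay
 *   max_offset - assumed upper bound on sender/receiver clock offset
 * A genuine packet satisfies -max_offset <= now - ts and
 * now - ts <= max_delay + max_offset. */
int timestamp_is_fresh(double ts, double now,
                       double max_delay, double max_offset)
{
  double diff = now - ts;

  return diff >= -max_offset && diff <= max_delay + max_offset;
}
```

Packets replayed within the accepted window are not caught by this test,
which is why only some replay attacks can be detected this way.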

Without authentication, the SDES fields  may be used to impersonate  another
site.    Impersonation  and  denial-of-service  attacks  can  be  made  more
difficult by providing  digital signatures for  all or parts  of a  message.
The MICA  or  MICS  and ENC  RTP  options  described in  Section  3  support
privacy within group  communications.   The issues of  key distribution  and
a certification  hierarchy  are outside  the  scope of  this  document.    A
direct mapping  of  all PEM  header  fields  into RTCP  option  types  would
be straightforward and  would allow reuse  of existing PEM  implementations.
However,  it  is  questionable  whether  loose  conference  control  is  the
appropriate mechanism for distributing key and certificate information.


6 RTP over network and transport protocols


This section describes  issues specific  to carrying  RPDUs over  particular
network and transport protocols.


6.1 Defaults


The following rules apply unless superseded by protocol-specific subsections
in this section.

If RTP protocol data units (RPDU) are carried over underlying protocols that
provide the abstraction  of a  continuous bit stream  rather than  messages,
each RPDU is  prefixed by a  32-bit framing field  containing the length  of
the RPDU measured in octets, not including the framing field itself.  If  an
RPDU traverses a path  over a mixture  of octet-stream and  message-oriented
protocols, each RTP-level bridge between these protocols is responsible  for
adding and removing the framing field.  A profile may determine that framing
is to  be used  for protocols  that do  provide framing  in order  to  allow
carrying several RPDUs  in one  underlying protocol  data unit.    [Carrying
several RPDUs in one network or transport packet reduces header overhead and
may ease synchronization between different streams.]
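
The framing rule can be illustrated with the following sketch.  The
function names are hypothetical, and carrying the length field in network
byte order (most significant octet first) is an assumption, as this
section does not state the byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit length field followed by the RPDU into out.  The
 * length counts the RPDU octets only, not the framing field itself.
 * Returns the number of octets written, or 0 if out is too small. */
size_t frame_rpdu(const uint8_t *rpdu, uint32_t len,
                  uint8_t *out, size_t out_size)
{
  if (out_size < 4 || out_size - 4 < len) return 0;
  out[0] = (uint8_t)(len >> 24);   /* network byte order (assumed) */
  out[1] = (uint8_t)(len >> 16);
  out[2] = (uint8_t)(len >> 8);
  out[3] = (uint8_t)len;
  memcpy(out + 4, rpdu, len);
  return (size_t)len + 4;
}

/* Inspect avail octets received from the stream.  If they contain a
 * complete framed RPDU, store its start and length and return the
 * number of octets consumed; otherwise return 0 (more data needed). */
size_t unframe_rpdu(const uint8_t *buf, size_t avail,
                    const uint8_t **rpdu, uint32_t *len)
{
  uint32_t n;

  if (avail < 4) return 0;
  n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
      ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
  if (avail - 4 < n) return 0;
  *rpdu = buf + 4;
  *len  = n;
  return (size_t)n + 4;
}
```

An RTP-level bridge from a message-oriented to an octet-stream protocol
would call something like frame_rpdu() on each RPDU, and a bridge in the
opposite direction would apply unframe_rpdu() repeatedly to the received
stream.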


6.2 ST-II


The next protocol field (``NextPCol'', Section 4.2.2.10 in RFC 1190) is
used to distinguish two encapsulations of RTP over ST-II.  The first uses
NextPCol value TBD and places the RPDU directly into the ST-II data area.
The second uses a separate NextPCol value, also TBD; in this encapsulation,
the RTP header is preceded by the 32-bit header shown below.  The byte
count determines the number of bytes in the RTP header and payload to be
checksummed.  The 16-bit checksum uses the TCP and UDP checksum algorithm.


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | count of bytes to be checked  |           check sum           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
... RTP header ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
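
The checksum algorithm used by TCP and UDP is the 16-bit one's complement
of the one's complement sum of the covered data.  A minimal sketch follows;
the function name is hypothetical.  It covers only the bytes passed in,
and whether the checksum field itself or any pseudo-header is included in
the computation is not specified here, so the sketch assumes neither.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the 16-bit one's complement checksum over count bytes.
 * Bytes are combined into 16-bit words in network byte order; an odd
 * trailing byte is padded with a zero octet.  The count is 16 bits
 * wide, matching the byte count field, so the 32-bit accumulator
 * cannot overflow. */
uint16_t inet_checksum(const uint8_t *data, uint16_t count)
{
  uint32_t sum = 0;
  size_t i;

  for (i = 0; i + 1 < count; i += 2)
    sum += ((uint32_t)data[i] << 8) | data[i + 1];
  if (count & 1)
    sum += (uint32_t)data[count - 1] << 8;

  /* fold the carries back into the low 16 bits */
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  return (uint16_t)~sum;
}
```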


A Implementation Notes


We describe aspects of the receiver  implementation in this section.   There
may be other implementation methods that are faster in particular  operating
environments or have other advantages.   These implementation notes are  for
informational purposes only.


A.1 Timestamp recovery


A fully specified NTP timestamp with 32 bits of full seconds and 16 bits  of
resolution for the fractional seconds can  be easily recovered from the  RTP
timestamp.  The following code stores timestamps as the 48-bit whole part of
a double precision floating point number:


#include <math.h>

typedef double CLOCK_t;
typedef unsigned long u_long;

#define MAX_32bit 4294967296.   /* 2^32 */
#define MAX31 0x7fffffff        /* 2^31 - 1, half the 32-bit range */

CLOCK_t extend_timestamp(u_long t,    /* in: timestamp, low-order 32 bits */
                         double now)  /* in: current local time */
{
  u_long high, low;   /* high and low order bits of 48-bit clock */

  low  = fmod(now, MAX_32bit);
  high = now / MAX_32bit;

  /* If t and low are more than half the range apart, t belongs to  */
  /* the neighboring 2^32 interval rather than the current one.     */
  if      ((low > t) && (low - t > MAX31)) high++;
  else if ((low < t) && (t - low > MAX31)) high--;
  return high * MAX_32bit + t;
} /* extend_timestamp */


Using the full timestamp internally has the advantage that the remainder  of
the receiver code does not have to be concerned with modulo arithmetic.  The
current local time  does not  have to be  derived directly  from the  system
clock for every packet; a clock  based on samples, e.g., incremented by  the
nominal audio frame duration, is sufficient.
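
The same recovery can also be written with integer arithmetic only.  The
sketch below is a variant of the code above, not a replacement for it:
the name extend_timestamp64 is hypothetical, the local clock is assumed
to be available as a 64-bit integer in the same units as the timestamp,
and the guard against decrementing a zero high-order part is an addition.

```c
#include <stdint.h>

#define HALF_RANGE 0x7fffffffu   /* 2^31 - 1 */

/* Extend a 32-bit timestamp t to the full clock width, using the
 * current local time now to supply the high-order bits. */
uint64_t extend_timestamp64(uint32_t t, uint64_t now)
{
  uint32_t low  = (uint32_t)(now & 0xffffffffu);
  uint64_t high = now >> 32;

  /* t wrapped ahead of the local clock: next 2^32 interval */
  if (low > t && low - t > HALF_RANGE) high++;
  /* t lags from before the last wrap: previous 2^32 interval */
  else if (low < t && t - low > HALF_RANGE && high > 0) high--;

  return (high << 32) | t;
}
```

For instance, with a local clock of 0x00000000FFFFFFF0 and a received
timestamp of 0x00000010, the timestamp is taken to lie just after the
wrap and is extended to 0x0000000100000010.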


A.2 Detecting the Beginning of a Synchronization Unit


RTP packets contain a bit flag indicating the end of a synchronization unit.
The following code  fragment determines if  a packet is  the beginning of  a
synchronization unit:


typedef unsigned short u_short;

CLOCK_t eos_t, t, now;
int flag = 0;              /* set after an end-of-unit packet was seen */

struct {
  unsigned int ver:2;      /* version number */
  unsigned int flow:6;     /* flow */
  unsigned int o:1;        /* option present */
  unsigned int s:1;        /* sync bit */
  unsigned int format:6;   /* content type */
  u_short seq;             /* sequence number */
  u_long  ts;              /* time stamp */
} *h;

t = extend_timestamp(h->ts, now);

if (h->s) {
  /* end of a synchronization unit: remember its timestamp */
  flag  = 1;
  eos_t = t;
}
else if (flag && t > eos_t) {
  /* first packet with a later timestamp after the end of a unit */
  flag = 0;
  /* handle beginning of synchronization unit */
}


(The structure definition has to be changed for little-endian systems.)
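
One way to avoid a byte-order dependent structure definition is to extract
the fields with shifts and masks.  The sketch below is illustrative only.
It assumes that the fields appear on the wire in the order of the
structure above, most significant bit first, with multi-octet fields in
network byte order; the names rtp_fixed_header and parse_rtp_header are
hypothetical.

```c
#include <stdint.h>

struct rtp_fixed_header {
  unsigned ver;      /* version number, 2 bits */
  unsigned flow;     /* flow, 6 bits */
  unsigned o;        /* option present, 1 bit */
  unsigned s;        /* sync bit, 1 bit */
  unsigned format;   /* content type, 6 bits */
  uint16_t seq;      /* sequence number */
  uint32_t ts;       /* time stamp */
};

/* Decode the first eight octets of a packet into h.  Works the same
 * on big-endian and little-endian hosts. */
void parse_rtp_header(const uint8_t b[8], struct rtp_fixed_header *h)
{
  h->ver    = b[0] >> 6;
  h->flow   = b[0] & 0x3f;
  h->o      = (b[1] >> 7) & 1;
  h->s      = (b[1] >> 6) & 1;
  h->format = b[1] & 0x3f;
  h->seq    = (uint16_t)((b[2] << 8) | b[3]);
  h->ts     = ((uint32_t)b[4] << 24) | ((uint32_t)b[5] << 16) |
              ((uint32_t)b[6] << 8)  |  (uint32_t)b[7];
}
```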


A.3 Demultiplexing and Locating the Synchronization Source


For a given combination of destination address (multicast or unicast) and
destination port, the flow ID determines the channel.  For each channel, the
receiver maintains a list of all sources, content and synchronization
sources alike, in a table or other suitable data structure.  Synchronization
sources are stored with a content source value of zero.  When an RTP packet
arrives, the receiver determines its network source address and port (from
information returned by the operating system), synchronization source (SSRC
option) and content source(s) (CSRC option).  To locate the table entry
containing timing information, the mapping from content descriptor to actual
encoding, etc., the receiver sets the content source to zero and locates a
table entry based on the triple (network address and port, synchronization
source identifier, 0).

The receiver identifies  the contributors to  the packet  (for example,  the
speaker who is  heard in  the packet) through  the list  of content  sources
carried in the CSRC option.   To locate the  table entry, it matches on  the
triple (network address and port, synchronization source identifier, content
source).

Note that  since  network  addresses  are  only  generated  locally  at  the
receiver, the receiver can choose whatever format seems most appropriate for
matching.  For example, a Berkeley Unix-based system may use struct sockaddr
data types if it expects network sources with non-IP addresses.
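
The two lookups described in this section might be sketched as follows.
All names are hypothetical, the linear search stands in for whatever data
structure an implementation prefers, and the 16-bit identifier widths are
an assumption taken from the option diagrams.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct source_key {
  uint8_t  addr[16];  /* network address, zero-padded; local format */
  uint16_t port;      /* source port */
  uint16_t ssrc;      /* synchronization source identifier */
  uint16_t csrc;      /* content source; 0 for the sync source entry */
};

struct source_entry {
  struct source_key key;
  int in_use;
  /* timing information, encoding map, etc. would follow here */
};

static int key_equal(const struct source_key *a, const struct source_key *b)
{
  return a->port == b->port && a->ssrc == b->ssrc && a->csrc == b->csrc &&
         memcmp(a->addr, b->addr, sizeof a->addr) == 0;
}

/* Locate the entry matching the full triple, or NULL if none. */
struct source_entry *find_source(struct source_entry *table, size_t n,
                                 const struct source_key *key)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (table[i].in_use && key_equal(&table[i].key, key))
      return &table[i];
  return NULL;
}

/* Locate the synchronization source entry: same address, port and
 * synchronization source, with the content source forced to zero. */
struct source_entry *find_sync_source(struct source_entry *table, size_t n,
                                      const struct source_key *key)
{
  struct source_key k = *key;

  k.csrc = 0;
  return find_source(table, n, &k);
}
```

A receiver would call find_sync_source() once per packet to obtain the
timing state, and find_source() once per entry in the CSRC option to
identify the contributors.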


Acknowledgments


This draft  is based  on discussion  within the  IETF audio-video  transport
working group  chaired by  Stephen Casner.    The current  protocol has  its
origins in the Network Voice Protocol  and the Packet Video Protocol  (Danny
Cohen and Randy Cole) and the protocol implemented by the 'vat'  application
(Van Jacobson and  Steve McCanne).    Stuart Stubblebine  (ISI) helped  with
the security aspects of RTP.  Ron Frederick (Xerox PARC) provided  extensive
editorial assistance.


B Addresses of Authors


Stephen Casner
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
telephone:  +1 310 822 1511 (extension 153)
electronic mail:  casner@isi.edu


Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974
telephone:  +1 908 582 2262
electronic mail:  hgs@research.att.com


References


 [1] J.  Postel, ``Internet protocol,''  Network Working  Group Request  for
     Comments RFC 791, Information Sciences Institute, Sept. 1981.

 [2] International  Standards   Organization,  ``ISO/IEC  DIS   10646-1:1993
     information technology -- universal multiple-octet coded  character set
     (UCS) -- part I: Architecture and basic multilingual plane,'' 1993.

 [3] The Unicode Consortium, The Unicode Standard.  New York, New York:
     Addison-Wesley, 1991.

 [4] D.  L. Mills,  ``Network time  protocol (version  3) --  specification,
     implementation  and  analysis,''  Network  Working  Group  Request  for
     Comments RFC 1305, University of Delaware, Mar. 1992.

 [5] D. Balenson, ``Privacy enhancement for internet electronic mail:   Part
     III:  Algorithms,  modes,  and  identifiers,''  Network  Working  Group
     Request for Comments RFC 1423, IETF, Feb. 1993.

 [6] V.  L. Voydock  and S.  T. Kent,  ``Security  mechanisms in  high-level
     network protocols,'' ACM Computing Surveys, vol. 15, pp. 135--171,
     June 1983.

 [7] B. S. Kaliski, Jr., ``The MD2 message-digest algorithm,'' Network
     Working Group  Request for  Comments RFC 1319,  RSA Laboratories,  Apr.
     1992.

 [8] R. Rivest, ``The MD5 message-digest algorithm,'' Network  Working Group
     Request for Comments RFC 1321, IETF, Apr. 1992.

 [9] P. Mockapetris,  ``Domain names --  concepts and facilities,''  Network
     Working Group Request for Comments RFC 1034, ISI, Nov. 1987.

[10] P. Mockapetris,  ``Domain names -- implementation and  specification,''
     Network Working Group Request for Comments RFC 1035, ISI, Nov. 1987.

