Network Working Group                                        S. Wenger
Internet Draft                                               Y.-K. Wang
Document: draft-wenger-avt-rtp-svc-01.txt                    T. Schierl
Expires: September 2006
                                                             March 2006


                   RTP Payload Format for SVC Video


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).


Abstract

   This memo describes an RTP payload format for the scalable extension
   of the ITU-T Recommendation H.264 video codec, which is technically
   identical to the ISO/IEC International Standard 14496-10 video
   codec.  The RTP payload format allows for packetization of one
   or more Network Abstraction Layer Units (NALUs), produced by the
   video encoder, in each RTP payload.  The payload format has wide
   applicability, as it supports applications from simple low bit-rate
   conversational usage, to Internet video streaming with interleaved
   transmission, to high bit-rate video-on-demand.

INTERNET-DRAFT  Scalable Video Codec RTP Payload Format  February 2006

Table of Contents

   RTP Payload Format for SVC Video...............................1
   1.   Introduction............................................4
   1.1.   SVC - the scalable extensions of H.264/AVC................4
   2.   Conventions.............................................4
   3.   The SVC Codec ...........................................4
   3.1.   Overview..............................................4
   3.2.   Parameter Set Concept...................................5
   3.3.   Network Abstraction Layer Unit Header ....................5
   4.   Scope...................................................8
   5.   Definitions and Abbreviations.............................8
   5.1.   Definitions............................................8
   5.2.   Abbreviations..........................................9
   6.   RTP Payload Format.......................................9
   6.1.   Design Principles......................................9
   6.2.   RTP Header Usage......................................10
   6.3.   Common Structure of the RTP Payload Format...............10
   6.4.   NAL Unit Header Usage..................................10
   6.5.   Packetization Modes....................................11
   6.6.   Decoding Order Number (DON)............................11
   6.7.   Single NAL Unit Packet.................................11
   6.8.   Aggregation Packets....................................11
   6.9.   Fragmentation Units (FUs)..............................11
   7.   Packetization Rules.....................................11
   8.   De-Packetization Process (Informative)....................11
   9.   Payload Format Parameters................................12
   9.1.   MIME Registration.....................................12
   9.2.   SDP Parameters........................................13
   9.2.1.  Mapping of MIME Parameters to SDP......................13
   9.2.2.  Usage with the SDP Offer/Answer Model..................14
   9.2.3.  Usage in Declarative Session Descriptions..............14
   9.3.   Examples.............................................14
   9.4.   Parameter Set Considerations ...........................14
   10.  Security Considerations .................................14
   11.  Congestion Control......................................14
   12.  IANA Consideration......................................15
   13.  Informative Appendix: Application Examples ................15
   13.1.  Introduction..........................................15
   13.2.  Layered Multicast.....................................15
   13.3.  Streaming of an SVC scalable stream.....................16
   13.4.  Multicast to MANE, SVC scalable stream to endpoint........17
   13.5.  Scenarios currently not considered for complexity reasons..18
   13.6.  Scenarios currently not considered for being unaligned with
   IP philosophy...............................................18
   14.  Informative Appendix: NAL Unit Re-ordering for Layered
   Multicast...................................................19
   14.1.  Examples.............................................19
   14.2.  Discussion: Using enhanced DON over different RTP sessions.24
   15.  Acknowledgements........................................24
   16.  References.............................................24
   16.1.  Normative References...................................24
   16.2.  Informative References.................................25
   17.  Author's Addresses......................................25
   18.  Intellectual Property Statement..........................25

Wenger, Wang, Schierl      Standards Track                    [page 2]

   19.  Disclaimer of Validity..................................26
   20.  Copyright Statement.....................................26
   21.  RFC Editor Considerations................................26
   22.  Open Issues............................................26
   23.  Changes Log............................................26



1. Introduction

1.1. SVC - the scalable extensions of H.264/AVC

   This memo specifies an RTP [RFC3550] payload format for a
   forthcoming new mode of the H.264/AVC video codec, known as Scalable
   Video Coding (SVC). Formally, SVC will take the form of an Amendment
   to ISO/IEC 14496 Part 10 [MPEG4-10], and likely as one or more new
   Annexes of ITU-T Rec. H.264 [H.264].  It is planned to keep the
   technical alignment between the two mentioned specifications, as
   well as backward compatibility with previous versions of H.264/AVC.

   The current working draft of SVC is available for public review
   [SVC]. Technical maturity is expected to be reached around mid-2006.
   In this memo, SVC is used as an acronym for the mentioned scalable
   extensions of H.264/AVC.

   SVC covers all of H.264/AVC's application areas, spanning all forms
   of digital compressed video, from low bit-rate Internet streaming
   applications to HDTV broadcast and Digital Cinema applications with
   nearly lossless coding.

   This memo tries to follow a backward compatible enhancement
   philosophy similar to what the video coding standardization
   committees implement, by keeping as close an alignment to the
   H.264/AVC payload RFC [RFC3984] as possible.  It basically documents
   the enhancements relevant from an RTP transport viewpoint, defines
   signaling support for SVC, and deprecates the single NAL unit mode
   of RFC 3984.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit
   when bit fields are handled.  Setting a bit is the same as assigning
   that bit the value of 1 (On).  Clearing a bit is the same as
   assigning that bit the value of 0 (Off).

3. The SVC Codec

3.1. Overview

   SVC provides scalable video bitstreams.  A scalable video bitstream
   contains a base layer and one or more enhancement layers.  An
   enhancement layer may enhance the temporal resolution (i.e. the
   frame rate), the spatial resolution, or the quality of the video
   content represented by the lower layer or part thereof.  The
   scalable layers can be aggregated to a single RTP stream, or
   transported independently.


   The concept of video coding layer (VCL) and network abstraction
   layer (NAL) is inherited from AVC. The VCL contains the signal
   processing functionality of the codec; mechanisms such as transform,
   quantization, motion-compensated prediction, loop filtering and
   inter-layer prediction.  A coded picture of a base or enhancement
   layer consists of one or more slices.  The Network Abstraction Layer
   (NAL) encapsulates each slice generated by the VCL into one or more
   Network Abstraction Layer Units (NAL units). Please consult RFC 3984
   for a more in-depth discussion of the NAL unit concept.  SVC
   specifies the decoding order of these NAL units.

   The term "Layer" in Video Coding Layer and Network Abstraction Layer
   refers to a conceptual distinction, and is closely related to syntax
   layers (block, macroblock, slice, ... layers).  It should not be
   confused with base and enhancement layers.

   The concept of scaling the visual content quality by omitting the
   transport and decoding of entire enhancement layers is denoted as
   Coarse Granularity Scalability (CGS).

   In some cases, the bit rate of a given enhancement layer can be
   reduced by truncating bits from individual NAL units.  Truncation
   leads to a graceful degradation of the video quality of the
   reproduced enhancement layer.  This concept is known as Fine
   Granularity Scalability (FGS).


3.2. Parameter Set Concept

   The parameter set concept is inherited from AVC. In SVC, pictures
   from different layers may use the same sequence or picture parameter
   set and may also use different sequence or picture parameter sets.
   If different sequence parameter sets are used, then at any time
   instant during the decoding process, there may be more than one
   active sequence parameter set. Any specific active sequence
   parameter set remains unchanged throughout a coded video sequence in
   the layer in which the active sequence parameter set is referred to.
   The active picture parameter set remains unchanged within a coded
   picture.

3.3. Network Abstraction Layer Unit Header

   An SVC NAL unit consists of a header of one, two or three bytes and
   the payload byte string.  The header indicates the type of the NAL
   unit, the (potential) presence of bit errors or syntax violations in
   the NAL unit payload, information regarding the relative importance
   of the NAL unit for the decoding process, and (optionally, when the
   header is of three bytes) the scalable layer decoding dependency
   information. This RTP payload specification is designed to be
   unaware of the bit string in the NAL unit payload.

   The NAL unit header co-serves as the payload header of this RTP
   payload format.  The payload of a NAL unit follows immediately.


   The syntax and semantics of the NAL unit header are specified in
   [SVC], but the essential properties of the NAL unit header are
   summarized below.

   The first byte of the NAL unit header has the following format (the
   bit fields are the same as in H.264/AVC and RFC 3984, while the
   semantics are slightly different, in a backward compatible way):

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         |F|NRI|  Type   |
         +---------------+

   F: 1 bit
   forbidden_zero_bit.  The H.264 specification declares a value of 1
   as a syntax violation.

   NRI: 2 bits
   nal_ref_idc.  A value of 00 indicates that the content of the NAL
   unit is not used to reconstruct reference pictures for inter picture
   prediction.  Such NAL units can be discarded without risking the
   integrity of the reference pictures in the same layer.  Values
   greater than 00 indicate that the decoding of the NAL unit is
   required to maintain the integrity of the reference pictures. For a
   slice or slice data partitioning NAL unit, a NRI value of 11
   indicates that the NAL unit contains data of a key picture, as
   specified in [SVC].

   Informative Note: The concept of a key picture has been introduced
   in SVC, and no assumption should be made that any pictures in bit
   streams compliant with the 2003 and 2005 versions of H.264 follow
   this rule.

   Type: 5 bits
   nal_unit_type.  This component specifies the NAL unit payload type
   as defined in table 7-1 of [SVC], and later within this memo.  For a
   reference of all currently defined NAL unit types and their
   semantics, please refer to section 7.4.1 in [SVC].
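
   As an illustration of the first header byte described above, the
   following minimal sketch splits one octet into F, NRI and Type.  It
   assumes that bit 0 in the diagram is the most significant bit; the
   names NalFirstByte and parse_first_byte are hypothetical and are
   not defined by [SVC] or RFC 3984.

```python
from typing import NamedTuple


class NalFirstByte(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nri: int       # nal_ref_idc (2 bits)
    nal_type: int  # nal_unit_type (5 bits)


def parse_first_byte(octet: int) -> NalFirstByte:
    """Split the first NAL unit header byte into F, NRI and Type.

    Assumes bit 0 of the diagram is the most significant bit.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return NalFirstByte(
        f=(octet >> 7) & 0x01,
        nri=(octet >> 5) & 0x03,
        nal_type=octet & 0x1F,
    )


if __name__ == "__main__":
    # 0x74 = 0 11 10100: F = 0, NRI = 3, Type = 20
    print(parse_first_byte(0x74))
```

   A receiver or MANE would typically check F first and only then
   branch on the Type value.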

   Previously, NAL unit types 20 and 21 (among others) have been
   reserved for future extensions.  SVC is using these two NAL unit
   types.  They indicate the presence of one or two additional header
   bytes that are helpful from a transport viewpoint.  The additional
   byte(s), described below, are called the transport priority
   indicator.

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         |   PRID    |D|E|
         +---------------+

   PRID: 6 bits
   simple_priority_id.  This component specifies a priority identifier
   for the NAL unit. When extension_flag (E) is equal to 0,
   simple_priority_id is used for inferring the values of
   temporal_level (TL), dependency_id (DID), and quality_level (QL).
   When simple_priority_id is not present, it shall be inferred to be
   equal to 0.

   D: 1 bit
   discardable_flag.  A value of 1 indicates that the content of the
   NAL unit with dependency_id equal to currDependencyId is not used in
   the decoding process of NAL units with dependency_id larger than
   currDependencyId.  Such NAL units can be discarded without risking
   the integrity of higher scalable layers with larger values of
   dependency_id.  discardable_flag equal to 0 indicates that the
   decoding of the NAL unit is required to maintain the integrity of
   higher scalable layers with larger values of dependency_id.

   E: 1 bit
   extension_flag.  A value of 1 indicates that the third byte of the
   NAL unit header is present.

   When the E-bit of the second byte is 1, then the NAL unit header
   extends to a third byte:

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         | TL  | DID | QL|
         +---------------+

   TL: 3 bits
   temporal_level indicates the temporal layer (or frame rate)
   hierarchy. A layer consisting of pictures with a smaller
   temporal_level value has a lower frame rate.

   DID: 3 bits
   dependency_id denotes the inter-layer coding dependency hierarchy.
   At any temporal location, a picture of a smaller dependency_id value
   may be used for inter-layer prediction for coding of a picture of a
   larger dependency_id value, while a picture of a larger
   dependency_id value is not allowed to be used for inter-layer
   prediction for coding of a picture of a smaller dependency_id value.

   QL: 2 bits
   quality_level designates the quality level hierarchy of a
   progressive refinement slice. At any temporal location and with
   identical dependency_id value, a quality enhancement of a picture
   with quality_level value equal to ql uses the quality enhancement or
   base quality information (the non-quality enhancement information of
   the slice when ql = 1) of the slice with quality_level value equal
   to ql-1 for inter-layer prediction. When quality_level is larger
   than 0, the NAL unit contains a progressive refinement slice or part
   thereof.
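
   The one-, two- and three-byte header variants described in this
   section can be sketched as a single parser.  This is an
   illustrative sketch only: it assumes that bit 0 of each diagram is
   the most significant bit and that only NAL unit types 20 and 21
   carry the additional byte(s).  The names SvcNalHeader and
   parse_svc_nal_header are hypothetical.

```python
from typing import NamedTuple, Optional

SVC_NAL_TYPES = (20, 21)  # types that carry the additional header byte(s)


class SvcNalHeader(NamedTuple):
    f: int
    nri: int
    nal_type: int
    prid: Optional[int]  # simple_priority_id, None if byte 2 is absent
    d: Optional[int]     # discardable_flag
    e: Optional[int]     # extension_flag
    tl: Optional[int]    # temporal_level, None if byte 3 is absent
    did: Optional[int]   # dependency_id
    ql: Optional[int]    # quality_level
    length: int          # header length in bytes (1, 2 or 3)


def parse_svc_nal_header(nal: bytes) -> SvcNalHeader:
    """Parse the 1-, 2- or 3-byte NAL unit header (bit 0 = MSB)."""
    if len(nal) < 1:
        raise ValueError("empty NAL unit")
    b0 = nal[0]
    f, nri, nal_type = b0 >> 7, (b0 >> 5) & 0x03, b0 & 0x1F
    if nal_type not in SVC_NAL_TYPES:
        return SvcNalHeader(f, nri, nal_type,
                            None, None, None, None, None, None, 1)
    if len(nal) < 2:
        raise ValueError("truncated header: second byte missing")
    b1 = nal[1]
    prid, d, e = b1 >> 2, (b1 >> 1) & 0x01, b1 & 0x01
    if not e:
        return SvcNalHeader(f, nri, nal_type, prid, d, e,
                            None, None, None, 2)
    if len(nal) < 3:
        raise ValueError("truncated header: third byte missing")
    b2 = nal[2]
    tl, did, ql = b2 >> 5, (b2 >> 2) & 0x07, b2 & 0x03
    return SvcNalHeader(f, nri, nal_type, prid, d, e, tl, did, ql, 3)
```

   The returned length field tells the caller where the NAL unit
   payload begins.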

   This memo introduces new NAL unit types, which are presented in
   section 6.3.  The NAL unit types defined in this memo are marked as
   unspecified in [SVC].  Moreover, this specification extends the
   semantics of F, NRI, PRID, D, TL, DID and QL as described in section
   6.4.

4. Scope

   This payload specification can only be used to carry the "naked" SVC
   NAL unit stream over RTP, and not the bitstream format specified in
   Annex B of [SVC].  Likely, the applications of this specification
   will be in the IP based multimedia communications fields including
   conversational multimedia, video telephony or video conferencing,
   Internet streaming and TV over IP.

   This specification allows, in a given RTP session, the encapsulation
   of NAL units belonging to
     o the base layer, or
     o one or more enhancement layers, or
     o the base layer and one or more enhancement layers.


5. Definitions and Abbreviations

5.1. Definitions

   This document uses the definitions of [SVC] and [H.264].  The
   following terms, defined in [SVC], are summed up for convenience:

   scalable bitstream: an SVC compliant bit stream containing a base
   layer and at least one enhancement layer.

   base layer:  The base layer typically represents the minimal
   temporal and/or spatial resolution and/or the minimal quality of an
   SVC bitstream.  The base layer may fully comply with [H.264].
   The base layer is independently decodable without the requirement of
   using any other layer of the SVC bitstream.  If the base layer
   contains NAL units fully conforming to [H.264] only, the layer is
   called H.264/AVC base layer.  For such a layer, the ability to
   signal transport priority (simple_priority_id or temporal_level,
   dependency_id and quality_level) per NAL unit may not be available.

   operation point: An operation point of an SVC bitstream represents a
   certain level of temporal, spatial and quality scalability.  An
   operation point contains all NAL units required for successfully
   decoding a certain SVC enhancement layer, which represents the
   highest temporal, spatial and/or quality level of the operation
   point.

   scalable enhancement layer:  an SVC enhancement layer is identified
   by a certain NAL unit header value (transport priority) of
   simple_priority_id or, if present, by a combination of
   temporal_level, dependency_id, quality_level as defined in [SVC] and
   summarized in section 3.3.

   access unit: A set of NAL units pertaining to a certain temporal
   location. An access unit includes the slice data of the pictures of
   all scalable layers at that temporal location and possibly other
   associated data e.g. SEI messages and parameter sets.


   coded video sequence: A sequence of access units that consists, in
   decoding order, of an instantaneous decoding refresh (IDR) access
   unit followed by zero or more non-IDR access units including all
   subsequent access units up to but not including any subsequent IDR
   access unit.

   IDR access unit: An access unit in which all the primary coded
   pictures are IDR pictures.
   [Edt. note: This needs to be updated according to the new adoption
   of the enhancement-layer IDR (EIDR) concept in January 2006. At the
   time of writing, the SVC spec update for the January JVT meeting has
   not yet been available.]

   IDR picture: A coded picture with the property that the decoding of
   this coded picture and all the following coded pictures in decoding
   order, in the same layer (i.e. with the same values of dependency_id
   and quality_level, respectively), can be performed without inter
   prediction from any picture prior to the coded picture in decoding
   order in the same layer. An IDR picture causes a "reset" in the
   decoding process of the scalable layer containing the IDR picture.
   [Edt. note: This needs to be updated according to the new adoption
   of the enhancement-layer IDR (EIDR) concept in January 2006. At the
   time of writing, the SVC spec update for the January JVT meeting has
   not yet been available.]

   progressive refinement slice: A progressive refinement slice [SVC]
   is contained in an SVC NAL unit and may be signaled, if
   extension_flag is equal to one, by a quality_level not equal to zero.
   Such slices can be truncated byte-wise from the end in NAL unit
   payload byte-string order for bit-rate and quality reduction.  This
   ability is also known as Fine Granularity Scalability (FGS).

5.2. Abbreviations

   In addition to the abbreviations defined in [RFC3984], the following
   ones are defined.

   CGS:       Coarse Granularity Scalability
   FGS:       Fine Granularity Scalability

6. RTP Payload Format

6.1. Design Principles

   The authors tried to follow design principles as follows:

   o Backward compatibility with RFC 3984 wherever possible.

   o As we expect the SVC base layer to be H.264/AVC compatible, we
     assume the base layer (when transmitted in its own session) to be
     encapsulated using RFC 3984.  Requiring this has the desirable
     side effect that it can be used by RFC3984 legacy devices.


   o MANEs are signaling aware and rely on signaling information.
     In other words, MANEs have state.

   o MANEs terminate RTP sessions, and create different RTP sessions
     with perhaps modified content.
     Edt. Note: need to clarify this wrt. Translators and Mixers in the
     spirit of PV06 paper.

   o MANEs are within the security context of the RTP session.

   o Packet integrity needs to be preserved end-to-end (whereby
     end-to-end can mean endpoint to endpoint but also endpoint to
     MANE).

   o others?

6.2. RTP Header Usage

   Please see section 5.1 of RFC3984 [RFC3984].

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of RFC3984 [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced
   in section 3.3.  This section specifies the semantics of F, NRI,
   PRID, D, TL, DID and QL according to this specification.

   The semantics of F specified in section 5.3 of [RFC3984] also
   apply herein.

   For NRI, for a bitstream that is compliant with AVC, the semantics
   specified in section 5.3 of [H.264] are applicable; otherwise, only
   the semantics specified in [SVC] are applicable.

   For PRID, in addition to the semantics specified in [SVC], according
   to this RTP payload specification, values of PRID indicate the
   relative transport priority, as determined by the sender, which is
   typically increasing from a layer of lower to a layer of higher
   importance.  MANEs implementing unequal error protection can use
   this information to protect more important NAL units better than
   less important ones, for example by including only the more
   important NAL units in a FEC protection mechanism.  The transport
   priority increases as the PRID value increases.

   For D, MANEs can use this information to protect NAL units with D
   equal to 0 better than NAL units with D equal to 1. Furthermore a
   MANE can determine whether the transmission of a NAL unit is
   required for successfully decoding a certain operation point of the
   SVC bitstream.

   For TL, DID and QL, in addition to the semantics specified in [SVC],
   according to this RTP payload specification, values of TL, DID or QL
   indicate the relative transport priority.  MANEs can use this
   information to protect more important NAL units better than less
   important NAL units.  A higher value of TL, DID or QL indicates a
   higher priority if the other two components are identical.

      Informative note: Using PRID, D, TL, DID and QL in combination
      may better indicate the relative transport priority. [Edt. note:
      such examples may be provided in Informative Appendix 13 in
      future versions.]
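
   The rule for TL, DID and QL given in this section only orders two
   NAL units when exactly one of the three components differs.  A
   minimal sketch of that partial order follows.  The function name
   compare_priority and its return convention (1, -1, 0, or None when
   the rule does not apply) are assumptions made for illustration.

```python
from typing import Optional, Tuple

Layer = Tuple[int, int, int]  # (TL, DID, QL)


def compare_priority(a: Layer, b: Layer) -> Optional[int]:
    """Compare two (TL, DID, QL) triples by relative transport priority.

    Returns 1 if a has the higher priority, -1 if b has, 0 if both are
    identical, and None if more than one component differs, in which
    case the rule above does not order the two NAL units.
    """
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if not diffs:
        return 0
    if len(diffs) == 1:
        x, y = diffs[0]
        return 1 if x > y else -1
    return None
```

   A sender that needs a total order could fall back to PRID or to an
   application-defined rule whenever None is returned.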

6.5. Packetization Modes

   Please see section 5.4 of RFC3984 [RFC3984].  The single NAL unit
   mode SHALL NOT be used.

6.6. Decoding Order Number (DON)

   Please see section 5.5 of RFC3984 [RFC3984].

6.7. Single NAL Unit Packet

   Please see section 5.6 of RFC3984 [RFC3984].

6.8. Aggregation Packets

   Please see section 5.7 of RFC3984 [RFC3984].

6.9. Fragmentation Units (FUs)

   Please see section 5.8 of RFC3984 [RFC3984].

7. Packetization Rules

   Please see section 6 of RFC3984 [RFC3984].  The following rules
   apply in addition.

   The single NAL unit mode SHALL NOT be used.

   In an RTP session, the first NAL unit of an aggregation packet SHALL
   have a two- or three-byte NAL unit header containing the transport
   priority indicator, as described in section 3.3.  Non-VCL NAL units
   SHALL be transmitted out-of-band or in a separate session for the
   current state of this specification.  When NAL units of different
   layers are aggregated within one aggregation packet, the first NAL
   unit of the packet MUST have the highest transport priority of all
   NAL units contained in the packet. The order of NAL units within a
   packet is the same as the decoding order.
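
   The constraint on the first NAL unit of an aggregation packet can
   be checked as sketched below.  The sketch assumes that transport
   priority is compared by PRID alone (section 6.4 states that the
   priority increases with PRID) and that PRID is inferred to be 0
   when the second header byte is absent (section 3.3).  The function
   names are hypothetical.

```python
from typing import Optional, Sequence

SVC_NAL_TYPES = (20, 21)


def explicit_prid(nal: bytes) -> Optional[int]:
    """Return PRID if the NAL unit carries a second header byte."""
    if len(nal) >= 2 and (nal[0] & 0x1F) in SVC_NAL_TYPES:
        return nal[1] >> 2  # PRID occupies the six most significant bits
    return None


def first_unit_has_highest_priority(nal_units: Sequence[bytes]) -> bool:
    """Check the first-NAL-unit rule for one aggregation packet.

    Assumption: priority is compared by PRID only; a missing PRID in
    the remaining NAL units is inferred to be 0.
    """
    if not nal_units:
        return False
    first = explicit_prid(nal_units[0])
    if first is None:
        # The first NAL unit lacks the two- or three-byte header.
        return False
    for nal in nal_units[1:]:
        prid = explicit_prid(nal)
        if (prid if prid is not None else 0) > first:
            return False
    return True
```

   A packetizer could run this check before emitting each aggregation
   packet.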

8. De-Packetization Process (Informative)

   Please see section 7 of RFC3984 [RFC3984].  The following rules
   apply in addition.

   The single NAL unit mode SHALL NOT be used.

   Layered multicast is supported by this specification.  An
   informative appendix on recovering NAL unit decoding order in
   layered multicast can be found in section 14.

9. Payload Format Parameters

   [Edt. note: this section 9 and its subsections will be updated
   according to the changes listed below, a little later in the
   process.  For now, we just list the adjustments necessary, so as not
   to bury any new information in the RFC 3984 text.]

   Section 8 of [RFC3984] applies with the following modification.

   The sentence

   "The parameters are specified here as part of the MIME subtype
   registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."

   is replaced with

   "The parameters are specified here as part of the MIME subtype
   registration for the SVC codec."

9.1. MIME Registration

   The MIME subtype for the SVC codec is allocated from the IETF tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H.264-SVC

   Required parameters: none

   OPTIONAL parameters:

   The optional MIME parameters specified in [RFC3984] apply, in
   addition to the following.

   sprop-scalability-info:
   This parameter MAY be used to convey the NAL unit containing the
   scalability information SEI message that MUST precede any other NAL
   units in decoding order. The parameter MUST NOT be used to indicate
   codec capability in any capability exchange procedure.  The value of
   the parameter is the base64 representation of the NAL unit
   containing the scalability information SEI message as specified in
   [SVC].

   sprop-transport-priority:
   This parameter MAY be used to signal the transport priority
   indicator value(s) in terms of the one or two byte SVC NAL unit
   header extension of one or more SVC layer(s) of one RTP session.  A
   transport priority indicator is base64 coded.  If more than one
   layer is transmitted within one RTP session, the transport priority
   indicator values of the layers MUST be listed in order of decreasing
   importance for decoding and MUST be comma-separated.
   If an H.264/AVC base layer is part of the RTP session, this parameter
   SHALL NOT be used.
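
   As a sketch of how a sender might form the value of
   sprop-transport-priority, the following code base64-encodes each
   one- or two-byte transport priority indicator and joins the results
   with commas.  The caller is assumed to supply the indicators
   already ordered by decreasing importance for decoding; the function
   name is hypothetical.

```python
import base64
from typing import Iterable


def build_sprop_transport_priority(indicators: Iterable[bytes]) -> str:
    """Return a 'sprop-transport-priority=...' parameter string.

    Each indicator is the one- or two-byte NAL unit header extension
    of one layer.  The input order is preserved, so the caller must
    list the layers by decreasing importance for decoding.
    """
    items = []
    for indicator in indicators:
        if len(indicator) not in (1, 2):
            raise ValueError("indicator must be one or two bytes long")
        items.append(base64.b64encode(indicator).decode("ascii"))
    if not items:
        raise ValueError("at least one indicator is required")
    return "sprop-transport-priority=" + ",".join(items)


if __name__ == "__main__":
    # One layer with a two-byte extension, one with a one-byte extension.
    print(build_sprop_transport_priority([bytes([0x17, 0x4A]),
                                          bytes([0x14])]))
```

   The same base64 step would apply to the NAL unit carried in
   sprop-scalability-info.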

      Encoding considerations:
                           This type is only defined for transfer via
                           RTP (RFC 3550).

      Security considerations:
                           See section 10 of this specification.

      Public specification:
                           Please refer to section 16 of this
                           specification.

      Additional information:
                           None

      File extensions:     none
      Macintosh file type code: none
      Object identifier or OID: none
      Person & email address to contact for further information:
      Intended usage:      COMMON
      Author:
      Change controller:
                           IETF Audio/Video Transport working group
                           delegated from the IESG.

9.2. SDP Parameters

9.2.1.    Mapping of MIME Parameters to SDP

   The MIME media type video/SVC string is mapped to fields in the
   Session Description Protocol (SDP) as follows:

   *  The media name in the "m=" line of SDP MUST be video.

   *  The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
      MIME subtype).

   *  The clock rate in the "a=rtpmap" line MUST be 90000.

   *  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
      parameter-sets", "parameter-add", "packetization-mode", "sprop-
      interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
      "sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-
      size", "sprop-transport-priority", and "sprop-scalability-info",
      when present, MUST be included in the "a=fmtp" line of SDP. These
      parameters are expressed as a MIME media type string, in the form
      of a semicolon separated list of parameter=value pairs.
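
   The mapping rules above can be illustrated by generating the
   corresponding SDP lines.  This is a hedged sketch, not a normative
   example: the port 49170, the dynamic payload type 96 and the
   parameter values are assumptions chosen for illustration, the
   encoding name "SVC" follows the rule above, and the function name
   is hypothetical.

```python
from typing import Dict, List


def sdp_media_lines(port: int, payload_type: int,
                    params: Dict[str, str]) -> List[str]:
    """Build the m=, a=rtpmap and a=fmtp lines for one SVC session."""
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",
        # Encoding name SVC, clock rate 90000, as required above.
        f"a=rtpmap:{payload_type} SVC/90000",
    ]
    if params:
        # Semicolon separated list of parameter=value pairs.
        fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    return lines


if __name__ == "__main__":
    # packetization-mode=1 avoids the single NAL unit mode (mode 0).
    for line in sdp_media_lines(49170, 96, {
            "packetization-mode": "1",
            "sprop-transport-priority": "FA==",
    }):
        print(line)
```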


9.2.2.    Usage with the SDP Offer/Answer Model

   TBD.

9.2.3.    Usage in Declarative Session Descriptions

   TBD.

9.3. Examples

   TBD.

9.4. Parameter Set Considerations

   Please see section 10 of RFC3984 [RFC3984].

10.  Security Considerations

   Please see section 11 of RFC3984 [RFC3984].

11.  Congestion Control

   Within any given RTP session carrying payload according to this
   specification, the provisions of section 12 of RFC3984 [RFC3984]
   apply.

   One key motivation for the recent attention to scalable codecs has
   been the increasing awareness among media codec designers of network
   congestion.  While CGS scalability cannot reduce congestion for the
   transport path of a given RTP session, MANEs and layered multicast
   technologies can be used to alleviate congestion on a larger scale.
   FGS scalability can be helpful to reduce session bandwidth both end-
   to-end (with pre-coded content) and in network segments, again
   assuming the use of MANEs.

   MANEs MAY alleviate congestion on their outgoing network path by
   a) removing the NAL units belonging to the hierarchically "highest"
      enhancement layer (or set of enhancement layers) from an RTP
      stream carrying base and enhancement layers, or
   b) removing some or all bits of a given FGS NAL unit as long as the
      remaining bits still form a conforming SVC NAL unit.
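
   Option a) can be sketched as follows.  The NalUnit tuple is a
   hypothetical in-memory view of an already parsed NAL unit, and the
   sketch assumes that the hierarchically "highest" layer is the one
   with the largest DID value present in the stream.

```python
from collections import namedtuple

# Hypothetical view of a parsed NAL unit: temporal level, dependency
# (layer) identifier, quality level and the opaque NAL unit bytes.
NalUnit = namedtuple("NalUnit", ["tl", "did", "ql", "payload"])


def drop_highest_layer(nal_units):
    """Remove all NAL units of the highest layer present.

    The base layer is never removed: if only one layer is present,
    the input is returned unchanged.
    """
    dids = {n.did for n in nal_units}
    if len(dids) <= 1:
        return list(nal_units)
    highest = max(dids)
    return [n for n in nal_units if n.did != highest]


stream = [NalUnit(0, 0, 0, b"base"), NalUnit(0, 1, 0, b"enh")]
thinned = drop_highest_layer(stream)
```

   Applying the function repeatedly removes one layer at a time until
   only the base layer remains.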

   Edt. note: In the following paragraph, "translator" and "mixer" are
   not used consistently with RFC 3550.  What we think we would need is
   a "mixer" that mixes only a single input in a single output (as a
   mixer terminates sessions).  A "Translator" (that does not terminate
   the RTP session) carries certain unnecessary baggage which appears
   to make it undesirable for MANEs.  The following paragraph can
   either be fixed into RFC 3550 style and logic (thereby removing an
   operation point we consider desirable), or we would need to explain
   in detail what we want to do (not really congestion control related
   and long).  Perhaps we refer to the detailed discussions in the CCM
   draft...  Added to open issues.


   In both cases, the incoming RTP session is terminated in the MANE,
   and a second RTP session originates at the MANE.  The MANE acts as
   an RTP translator.  The concept of scalability keeps the
   implementation and computational effort within the MANE low, and
   avoids expensive and delay-intensive full transcoding (in the sense
   of reconstruction and re-encoding).

   When scalable layers are transported in their own RTP sessions, an
   RTP receiver SHOULD unsubscribe from one or more enhancement layers
   when it senses congestion, similar to what has been described in
   [McCanne/Vetterli].  This behavior may be sufficient to reduce the
   network load to an acceptable level.  Nevertheless, the receiver
   MUST follow the mechanisms described in section 12 of [RFC3984].
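
   A receiver-side decision of this kind could look like the following
   sketch.  The loss threshold of 5% is an arbitrary assumption made
   for illustration; neither this specification nor [McCanne/Vetterli]
   is claimed to prescribe it, and the sketch does not replace the
   mechanisms of section 12 of [RFC3984].

```python
def sessions_to_keep(subscribed, loss_fraction, loss_threshold=0.05):
    """Return how many layer sessions a receiver stays subscribed to.

    subscribed counts the base layer session as 1.  When the observed
    loss fraction exceeds the (hypothetical) threshold, the highest
    enhancement layer session is dropped; the base layer is kept.
    """
    if loss_fraction > loss_threshold and subscribed > 1:
        return subscribed - 1
    return subscribed
```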


12.  IANA Considerations

   [Edt. note: A new MIME type should be registered with IANA.]

13.  Informative Appendix: Application Examples

13.1.     Introduction

   Scalable video coding is a concept that has been around at least
   since MPEG-2 [MPEG2], which goes back as early as 1993.
   Nevertheless, it has never gained wide acceptance; perhaps partly
   because applications didn't materialize in the form envisioned
   during standardization.

   MPEG and JVT, respectively, performed a requirement analysis before
   the SVC project was launched.  Dozens of scenarios have been
   studied.  While some of the scenarios appear not to follow the most
   basic design principles of the Internet -- and are therefore not
   appropriate for IETF standardization -- others are clearly in the
   scope of IETF work.  Of these, this draft chooses the following
   subset for immediate consideration.  Note that we do not reference
   the MPEG and JVT documents directly; partly, because at least the
   MPEG documents have a limited lifespan and are not publicly
   available, and partly because the language used in these documents
   is inappropriately video centric and imprecise, when it comes to
   protocol matters.

   With these remarks, we now introduce three main application
   scenarios that we consider relevant, and that are implementable
   with this specification.

13.2.     Layered Multicast

   This well-understood form of the use of layered coding
   [McCanne/Vetterli] implies that all layers are individually conveyed
   in their own RTP session using their own IP multicast address.
   Receivers "tune" into the layers by subscribing to the IP multicast,
   normally by using IGMP [IGMP].  Optimization forms could be
   envisioned in which a number of layers are sent combined in a single
   RTP session; but these optimizations are currently not considered in
   this document.

   Layered Multicast has the great advantage of simplicity and easy
   implementation.  However, it also has the great disadvantage of
   utilizing many different ports.  While we consider this not to be a
   major problem for a professionally maintained content server,
   receiving client endpoints need to open many ports to IP multicast
   addresses in their firewalls.  This is a practical problem from a
   firewall/NAT viewpoint.  Furthermore, even today IP multicast is not
   as widely deployed as many wish.

   We consider layered multicast an important application scenario for
   three reasons.  First, it is well understood and the implementation
   constraints are well known.  Second, there may well be large scale
   IP networks outside the immediate Internet context that wish to
   employ layered multicast in the future.  One possible example could
   be a combination of content creation and core-network distribution
   for the various mobile TV services, e.g. those being developed by
   3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H].  Finally, when one base
   layer and one enhancement layer are in use and are conveyed
   separately, that represents one operation point of layered
   multicast.

13.3.     Streaming of an SVC scalable stream

   In this scenario, a streaming server has a repository of stored SVC
   coded layers for a given content.  At the time of streaming, and
   according to the capabilities and connectivity of the client(s), the
   streaming server generates a scalable stream.  This scalable stream
   is served to the client(s).  Both unicast and multicast serving are
   possible.  At the same time, the streaming server may use the same
   repository of stored layers to compose different streams (with a
   different set of layers) intended for different audiences.

   As every endpoint receives only a single SVC RTP session, the number
   of firewall pinholes can be optimized.  In fact, only a single
   firewall pinhole is required.

   The main difference between this scenario and straightforward
   simulcasting lies in the architecture and the requirements of the
   streaming server, and is therefore out of the scope of IETF
   standardization.  However, compelling arguments can be made why such
   a streaming server design makes sense.  One possible argument is
   related to storage space and channel bandwidth.  Another is
   bandwidth adaptivity without transcoding -- a considerable advantage
   in a congestion controlled network.  When the streaming server
   learns about congestion, it can reduce sending bitrate by choosing
   fewer layers when composing the layered stream.  SVC is designed to
   gracefully support both bandwidth rampdown and bandwidth rampup with
   a considerable dynamic range.  This payload format is designed to
   allow for bandwidth flexibility in the mentioned sense, both for CGS
   and FGS layers.  While, in theory, a transcoding step could achieve
   a similar dynamic range, the computational demands are impractically
   high and video quality is typically lowered -- therefore, few (if
   any) streaming servers implement full transcoding.
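
   The layer selection performed by such a streaming server can be
   sketched as follows.  The per-layer bit rates are hypothetical, and
   the sketch assumes that each layer depends on all layers before it
   and that the base layer is always sent.

```python
def select_layers(layer_bitrates_kbps, budget_kbps):
    """Return how many layers (base layer first) fit into the budget.

    Each layer is assumed to depend on all preceding layers, so layers
    are added in order until the next one would exceed the budget.  The
    base layer is always sent, even if it alone exceeds the budget.
    """
    total = 0
    chosen = 0
    for rate in layer_bitrates_kbps:
        if total + rate > budget_kbps:
            break
        total += rate
        chosen += 1
    return max(chosen, 1)


# Hypothetical bit rates of a base layer and two enhancement layers.
LAYERS_KBPS = [128, 256, 512]
```

   When the server learns about congestion, it lowers the budget and
   composes the stream from fewer layers; when bandwidth becomes
   available again, raising the budget adds layers back.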

13.4.     Multicast to MANE, SVC scalable stream to endpoint

   This final scenario is a bit more complex, and designed to optimize
   the network traffic in a core network, while still requiring only a
   single pinhole in the endpoint's firewall.  One of its key
   applications is the mobile TV market.

   Consider a large IP network, e.g. the core network of 3GPP.
   Streaming servers within this core network can be assumed to be
   professionally maintained.  We assume that these servers can have
   many ports open to the network and that layered multicast is a real
   option.  Therefore, we assume that the streaming server multicasts
   SVC scalable layers, instead of simulcasting different
   representations of the same content at different bit rates.

   Also consider many endpoints of different classes.  Some of these
   endpoints may not have the processing power or the display size to
   meaningfully decode all layers; others may have these capabilities.
   Users of some endpoints may not wish to pay for high quality and are
   happy with a base service, which may be cheaper or even free.  Other
   users are willing to pay for high quality.  Finally, some connected
   users may have a bandwidth problem in that they can't receive the
   bandwidth they would want to receive -- be it through congestion,
   change of service quality, or for whatever other reasons.  However,
   all these users have in common that they don't want to be exposed
   too much, and therefore the number of firewall pinholes needs to be
   small.

   This situation can be handled best by introducing middleboxes close
   to the edge of the core network, which receive the layered multicast
   streams and compose the single SVC scalable bit stream according to
   the needs of the endpoint connected.  These middleboxes are called
   MANEs throughout this specification.  In practice, we envision the
   MANE to be part of (or at least physically and topologically close
   to) the base station of a mobile network, where all the signaling
   and media traffic necessarily are multiplexed on the same physical
   link.  This is why we do not worry too much about decomposition
   aspects of the MANE as such.

   Edt. note: In the following paragraph, Mixers and Translators need
   to be clarified.

   MANEs necessarily need to be fairly complex devices.  They certainly
   need to understand the signaling in order to, for example, associate
   the PT octet in the RTP header with the SVC payload type.
   Furthermore, they terminate the multicasted layered RTP sessions
   coming in from the core network side, and create new RTP sessions
   (perhaps even multicast sessions) to the endpoints connected to
   them.  In RTP terminology, it appears that MANEs necessarily are
   mixers AND translators; a MANE first mixes the content of one or
   more incoming RTP streams, and then "translates" it into the
   outgoing stream (which may involve pruning FGS coded NAL units and
   similar tasks).
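
   The association of the PT octet with the SVC payload type mentioned
   above can be sketched as follows.  The SDP fragment, including the
   port and the dynamic payload type 96, is hypothetical; the position
   of the 7-bit payload type in the second octet of the RTP header
   follows RFC 3550.

```python
def svc_payload_types(sdp_text):
    """Return the payload types that an SDP binds to SVC/90000."""
    result = set()
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
        name, _, clock = encoding.partition("/")
        if name.upper() == "SVC" and clock.split("/")[0] == "90000":
            result.add(int(pt))
    return result


def is_svc_packet(rtp_packet, payload_types):
    """Check the 7-bit PT field in the second octet of the RTP header."""
    return (rtp_packet[1] & 0x7F) in payload_types


# Hypothetical session description and RTP packet (12-octet header only).
SDP = "m=video 49170 RTP/AVP 96\r\na=rtpmap:96 SVC/90000\r\n"
PACKET = bytes([0x80, 0x80 | 96, 0x00, 0x01]) + bytes(8)
```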

   While the implementation complexity of a MANE, as discussed above,
   is fairly high, the computational demands are comparatively low.  In
   particular, SVC and/or this specification contain means to easily
   generate the correct inter-layer decoding order of NAL units.  It is
   also simple to identify the fine granularity scalable bits in a
   given NAL unit.  No serious bit-oriented processing is required and
   no significant state information (beyond that of the signaling and
   perhaps the SVC sequence parameter sets) needs to be kept.

   Finally, another scenario with very similar properties could be
   implemented in which the streaming server would send a single SVC
   scalable stream (containing basically all available scalable layers)
   to the MANE, and the MANE de-layers this scalable bit stream into
   its individual layers, before further processing.

13.5.     Scenarios currently not considered for complexity reasons


   -- vacat --

13.6.     Scenarios currently not considered for being unaligned with
          IP philosophy

   Remarks have been made that the current draft does not take into
   consideration at least one application scenario which some JVT folks
   consider important.  In particular, their idea is to make the RTP
   payload format (or the media stream itself) self-contained enough
   that a stateless, non signaling aware device can "thin" an RTP
   session to meet the bandwidth demands of the endpoint.  They call
   this device a "Router" or "Gateway", and sometimes a MANE.
   Obviously, it's not a Router or Gateway in the IETF sense.  To
   distinguish it from a MANE as defined in RFC3984 and in this
   specification, let's call it a MDfH (Magic Device from Heaven).

   To simplify discussions, let's assume point-to-point traffic only.
   The endpoint has a signaling relationship with the streaming server,
   but it is known that the MDfH is somewhere in the media path (e.g.
   because the physical network topology ensures this).  It has been
   requested, at least implicitly through MPEG's and JVT's requirements
   document, that the MDfH should be capable of intercepting the SVC
   scalable bit stream, modifying it by dropping packets or parts
   thereof, and forwarding the resulting packet stream to the receiving
   endpoint.  It has been requested that this payload specification
   contain protocol elements facilitating such an operation, and the
   argument has been made that the NRI field of RFC 3984 serves exactly
   the same purpose.

   The authors of this I-D do not consider the scenario above to be
   aligned with the most basic design philosophies the IETF follows,
   and therefore have not addressed the comments made (except through
   this section).  In particular, we see the following problems with
   the MDfH approach:


   - At the very minimum, the MDfH would need to know which RTP streams
     are carrying SVC.  We don't see how this could be accomplished but
     by using a static payload type.  None of the IETF defined RTP
     profiles envision static payload types for SVC, and even the de-
     facto profiles developed by some application standard
     organizations (3GPP for example) do not use this outdated concept.
     Therefore, the MDfH necessarily needs to be at least "listening"
     to the signaling.
   - If the RTP packet payload were encrypted, it would be impossible
     to interpret the payload header and/or the first bytes of the
     media stream.  We understand that there are crypto schemes under
     discussion that encrypt only the last n bytes of an RTP payload,
     but we are more than unsure that this is fully in line with the
     IETF's security vision.

   Even if the above two problems were overcome through
   standardization outside of the IETF, we would still foresee serious
   design flaws:

   - An MDfH can't simply dump RTP packets it doesn't want to forward.
     It either needs to act as a full RTP Translator (implying that it
     patches RTCP RRs and such), or it needs to patch the RTP sequence
     numbers to fulfill the RTP specification.  If it did neither, the
     gaps in the sequence numbers would look to the receiver like
     unintentional erasures, which has interesting effects on
     congestion control (if implemented), breaks pretty much every
     meta-payload ever developed, and so on.  (Many more points could
     be made here.)
   - An MDfH also can't "prune" FGS packets.  Again, doing so would not
     be compatible with meta payloads, and would mess up RTCP RRs and
     congestion control (if the congestion control is based on octet
     count and not on packet count; there are discussions related to
     the former at least in the context of TFRC).
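
   To illustrate what "patching the RTP sequence numbers" in the first
   item above involves, the following sketch renumbers forwarded
   packets so that the outgoing sequence is contiguous modulo 2^16.
   It assumes in-order arrival without upstream loss and covers only
   the sequence number; RTCP reports and meta-payloads are not
   addressed.  The packet representation is hypothetical.

```python
def forward_with_renumbering(packets, keep):
    """Drop unwanted packets and close the resulting sequence gaps.

    packets is a list of (sequence number, payload) in arrival order;
    keep is a predicate on the payload.  Sequence numbers are 16-bit
    values, so arithmetic is done modulo 65536.
    """
    dropped = 0
    forwarded = []
    for seq, payload in packets:
        if keep(payload):
            forwarded.append(((seq - dropped) % 65536, payload))
        else:
            dropped += 1
    return forwarded


incoming = [(65534, "base"), (65535, "enh"), (0, "base"), (1, "enh")]
outgoing = forward_with_renumbering(incoming, lambda p: p == "base")
```

   Even this small amount of per-stream state (the count of dropped
   packets) means that the device is no longer stateless.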

   In summary, based on our current knowledge we are not willing to
   specify protocol mechanisms that support an operation point that has
   so little in common with classic RTP use.


14.  Informative Appendix: NAL Unit Re-ordering for Layered Multicast

14.1.     Examples

   In layered multicast, the base layer, one or more enhancement
   layers, or the base layer and one or more enhancement layers may be
   transmitted within a separate RTP session, i.e. the NAL units
   required for decoding an access unit of a certain operation point of
   the scalable bitstream may be distributed in different RTP sessions.
   After receiving NAL units from different RTP sessions, restoring of
   the decoding order of NAL units is required.  Since SVC typically
   exploits temporal frame re-ordered structures for increased coding
   efficiency, the decoding order of access units may not match their
   presentation order.  If the interleaved packetization mode is used
   in any RTP session, then de-interleaving within that RTP session
   must be performed first.

   1) Example for 3 RTP sessions, each carrying a different set of
      layers of the SVC bitstream without NAL unit interleaving

   An example for temporal re-ordering of SVC access units and
   transmission of 11 different possible operation points within 3 RTP
   sessions is given below.  A, B and C represent the RTP sessions
   carrying SVC layers for different operation points.  'A' contains
   the base layer (DID = 0) with the second lowest temporal resolution
   and its FGS quality enhancement (TL-DID-QL values: 0,0,0; 0,0,1;
   1,0,0; 1,0,1), 'B' contains the second layer (DID = 1) with a higher
   temporal level than 'A' (TL-DID-QL values: 0,1,0; 1,1,0; 2,1,0), 'C'
   contains a temporal enhancement to the layer contained in 'B' and a
   FGS quality enhancement to this layer 'C' (TL-DID-QL values: 1,1,1;
   2,1,1; 3,1,0; 3,1,1).

   Tree of the SVC stream showing dependencies of operation points
   identified by the TL-DID-QL values per RTP session:

   A:^                   000
     |                 /  |  \
     |                /  100  001
     |               /   / \  /
     v              /   /   101
   B:^             /   /
     |           010  /
     |            \  /
     |             110
     |            /   \
     |           /    210
     v          /     /  \
   C:^        111    /    \
     |           \  /     310
     |            211    /
     |              \   /
     v               311

   Figure 1. SVC bitstream dependency tree


   Decoding order and dependency of NAL units per RTP session:

   A: -(1,2)-(3,4)---------------------------------------(5,6)--(7,8)-
        |     |                                           |      |
   B: -(1)---(2)--(3)---(4)------------------------------(5)----(6)---
        |     |    |     |                                |      |
   C: -(1)---(2)--(3)---(4)--(5,6)-(7,8)-(9,10)-(11,12)--(13)---(14)--
   ------------------------------------------------------------------->
   TL: <0>   <1>  <2>   <2>  <3>   <3>   <3>    <3>      <0>    <1>
   TS: [8]   [4]  [2]   [6]  [1]   [3]   [5]    [7]      [16]   [12]

   Key:
   A, B, C                - RTP sessions
   Integer values in '()' - NAL unit decoding order per RTP session
   '( )'                  - groups the NAL units of an access unit in
                            an RTP session
   '|'                    - indicates layer dependency
   Integer values in '[]' - (Presentation) Timestamp (TS)
   Integer values in '<>' - Temporal Level (TL)

   Figure 2. Distribution of SVC NAL units among different RTP sessions
             in Layered Multicast transmission


   The re-ordered decoding order for all operation points of RTP
   session C is the following:

   A(1,2)B(1)C(1), A(3,4)B(2)C(2), B(3)C(3), B(4)C(4), C(5,6), C(7,8),
   C(9,10), C(11,12), A(5,6)B(5)C(13), A(7,8)B(6)C(14).

   The decoding order of NAL units received from RTP sessions A, B and
   C has to be restored after reception.  Therefore an initial
   buffering of NAL units received per RTP session is required.  NAL
   units belonging to the same access unit are identified by having
   identical timestamps.  The timestamps of different sessions are
   aligned beforehand, so that NAL units of the same time instance can
   be re-assembled into access units.  While keeping the decoding
   order of NAL units per RTP session, the NAL units with the same
   timestamp are re-ordered into access units.  The dependency
   information of the sprop-scalability-info and
   sprop-transport-priority parameters may be required for this
   operation.  Note: The decoding order, presentation order and
   transmission order of NAL units may differ from each other, i.e.
   timestamps are not necessarily monotonically increasing with the
   transmission (and decoding) order of the NAL units.

   When the non-interleaved mode is used, the decoding order of NAL
   units within an RTP session is given by the transmission order,
   which is indicated by the RTP sequence number.  Once a sufficient
   number of NAL units has been received and initially buffered for
   each RTP session, re-ordering of NAL units can be applied.
   Alternatively, the receiver waits for an initial buffering time
   before NAL unit re-ordering is applied for all the RTP sessions of
   the layered multicast transmission.  In any case, the initially
   buffered number of NAL units or the initial buffering time shall
   guarantee correct NAL unit re-ordering with all valid combinations
   of operation points of the scalable stream.  The initial buffering
   time for each RTP session is defined as the maximum value of
   (transmission time of a NAL unit - decoding time of that NAL unit)
   in terms of the RTP timestamp time scale, assuming reliable and
   instantaneous transmission and the same timeline for transmission
   and decoding.

   The re-combining of layers transported in different RTP sessions to
   operation points of the scalable bitstream may be applied by using
   the information provided by the sprop-scalability-info and the
   sprop-transport-priority parameters in order to maintain integrity
   of the resulting SVC bitstream.  See also the note at the end of
   this example 1).
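
   The definition of the initial buffering time given above can be
   written down as the following sketch.  The time values are
   hypothetical and expressed in ticks of the 90 kHz RTP timestamp
   clock.

```python
def initial_buffering_time(nal_units):
    """Return max(transmission time - decoding time) over all NAL units.

    nal_units is an iterable of (transmission_time, decoding_time)
    pairs on a common timeline, in RTP timestamp ticks.
    """
    return max(tx - dec for tx, dec in nal_units)


# Hypothetical (transmission time, decoding time) pairs in 90 kHz ticks.
SESSION = [(0, 0), (3000, 0), (6000, 3000)]
BUFFER_TICKS = initial_buffering_time(SESSION)
BUFFER_SECONDS = BUFFER_TICKS / 90000
```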


   Summarized re-ordering process for layered multicast:


   o Timestamp values are aligned for all the RTP sessions.
   o Decoding order within an RTP session is derived from the
     transmission order if using the non-interleaved mode.  If using
     the interleaved mode, the Decoding Order Number (DON) must be used
     to recover decoding order from transmission order in an RTP
     session.
   o After reception of a safe amount of RTP packets or after a certain
     initial buffering time per RTP session, the re-ordering of NAL
     units to decoding order can be started.
   o NAL units belonging to one access unit are identified by an
     identical timestamp value.
   o The dependency of layers contained in various RTP sessions may be
     derived from the sprop-scalability-info and the
     sprop-transport-priority parameters.
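
   The summarized process can be sketched for the data of Figure 2 as
   follows.  The sketch assumes the non-interleaved mode, already
   aligned timestamps, a known dependency order A, B, C of the
   sessions, and that the highest session (C) carries NAL units of
   every access unit, so that its order defines the access unit
   decoding order.  These assumptions hold for example 1 but are not
   general.

```python
# Per session: (decoding order number, timestamp) in transmission order,
# transcribed from Figure 2.
SESSIONS = {
    "A": [(1, 8), (2, 8), (3, 4), (4, 4), (5, 16), (6, 16), (7, 12), (8, 12)],
    "B": [(1, 8), (2, 4), (3, 2), (4, 6), (5, 16), (6, 12)],
    "C": [(1, 8), (2, 4), (3, 2), (4, 6), (5, 1), (6, 1), (7, 3), (8, 3),
          (9, 5), (10, 5), (11, 7), (12, 7), (13, 16), (14, 12)],
}
DEPENDENCY_ORDER = ["A", "B", "C"]


def reorder(sessions, dependency_order):
    """Group NAL units into access units and restore decoding order."""
    # Access unit order: first appearance of each timestamp in the
    # highest session, which is assumed to contain every access unit.
    au_order = []
    for _, ts in sessions[dependency_order[-1]]:
        if ts not in au_order:
            au_order.append(ts)
    result = []
    for ts in au_order:
        access_unit = []
        for name in dependency_order:
            numbers = [n for n, t in sessions[name] if t == ts]
            if numbers:
                access_unit.append((name, numbers))
        result.append(access_unit)
    return result


def fmt(result):
    """Render the result in the notation used in this section."""
    return ", ".join(
        "".join(f"{s}({','.join(map(str, nums))})" for s, nums in au)
        for au in result
    )


print(fmt(reorder(SESSIONS, DEPENDENCY_ORDER)))
```

   The printed sequence matches the re-ordered decoding order listed
   for example 1.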

   Note:
   If layers of different operation points that do not directly or
   indirectly reference a layer contained in the same session are
   combined into one RTP session, NAL unit re-ordering may be applied
   by using the transport priority indicator of each NAL unit, which
   could be very painful.  This may be the case if different
   hierarchical dependencies of the operation points are possible, as
   shown in the following example with RTP sessions U, V and W.  All
   layers indicated by their TL-DID-QL values are part of the same time
   instance:

   U:^                   000
     v                   /  \
   V:^                 010   \
     v                  \     \
   W:^                   \    001
     |                    \   /
     v                     011

   RTP session U contains the base layer (DID=0), session V the spatial
   enhancement (DID=1), and session W the quality enhancement for both
   layers.  Re-ordering sessions U and V is simple, as described
   before.  For inserting session W into sessions U and V, each NAL
   unit header may have to be parsed to identify the correct decoding
   order.
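
   For the U, V, W example, the per-NAL-unit parsing can be sketched as
   a sort on the header fields.  The sketch assumes that ordering the
   NAL units of one time instance by (DID, QL) yields a valid decoding
   order; this holds for the dependency tree drawn above but is an
   assumption of this illustration, not a rule stated in this
   specification.

```python
# (session, TL, DID, QL) of the NAL units of one time instance, in an
# arbitrary arrival order.
RECEIVED = [("W", 0, 0, 1), ("V", 0, 1, 0), ("W", 0, 1, 1), ("U", 0, 0, 0)]

# Dependencies read off the tree above: layer -> layers it references.
DEPENDS_ON = {
    (0, 0): [],
    (0, 1): [(0, 0)],
    (1, 0): [(0, 0)],
    (1, 1): [(1, 0), (0, 1)],
}


def order_by_header(nal_units):
    """Sort NAL units of one time instance by (DID, QL)."""
    return sorted(nal_units, key=lambda n: (n[2], n[3]))


ORDERED = order_by_header(RECEIVED)
```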

   A starting point of a discussion on a possible solution for this
   issue can be found in 14.2.


   2) Example for 3 RTP sessions, each carrying a different set of
      layers of the SVC bitstream without NAL unit interleaving but
      with packet losses

   If packet loss is present, NAL unit re-ordering may become
   complicated.  Let us assume NAL units B(3), B(4) and B(5) are lost
   (indicated by XXX) in the following scheme.


   A: -(1,2)-(3,4)---------------------------------------(5,6)--(7,8)-
        |     |                                           |      |
   B: -(1)---(2)---XXX------------------------------------------(6)---
        |     |    |     |                                |      |
   C: -(1)---(2)--(3)---(4)--(5,6)-(7,8)-(9,10)-(11,12)--(13)---(14)--
   ------------------------------------------------------------------->
   TL: <0>   <1>  <2>   <2>  <3>   <3>   <3>    <3>      <0>    <1>
   TS: [8]   [4]  [2]   [6]  [1]   [3]   [5]    [7]      [16]   [12]

   Figure 3. Packet loss in Layered Multicast


   The re-ordered decoding order for all operation points of RTP
   session C is the following:

   A(1,2)B(1)C(1), A(3,4)B(2)C(2), XXX C(3), XXX C(4), C(5,6), C(7,8),
   C(9,10), C(11,12), A(5,6) XXX C(13), A(7,8)B(6)C(14).

   In this case the receiver would not be able to correctly decode the
   access units of timestamps [2] and [6].  Additionally, the access
   units (following in decoding order) of timestamps [1], [3], [5] and
   [7] would not be correctly decodable, although these access units
   are not directly affected by the loss.  Further, the complete
   operation point contained in sessions B and C must be discarded
   following NAL unit B(5), since B(5) is also missing.  Therefore, a
   re-ordering algorithm must determine the transport priority of each
   received NAL unit following the packet loss.  It cannot be
   determined how many NAL units are missing; thus the integrity of
   each re-constructed access unit must be verified with the decoding
   dependency information of the sprop-scalability-info.
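
   The integrity check within one access unit can be sketched as
   follows.  The table of sessions expected per temporal level is read
   off Figure 3 and stands in for the dependency information of
   sprop-scalability-info, whose format is not modeled here.  The
   sketch covers only the inter-layer check inside a single access
   unit; it does not model temporal prediction, which is why the
   access units of timestamps [1], [3], [5] and [7] are affected as
   well in the example.

```python
# Sessions expected to carry NAL units at each temporal level, in
# dependency order (an assumption read off Figure 3).
EXPECTED = {0: ["A", "B", "C"], 1: ["A", "B", "C"], 2: ["B", "C"], 3: ["C"]}


def usable_sessions(temporal_level, received_sessions):
    """Return the sessions of one access unit that remain decodable.

    Sessions are walked in dependency order; everything above the
    first missing session is discarded.
    """
    usable = []
    for session in EXPECTED[temporal_level]:
        if session not in received_sessions:
            break
        usable.append(session)
    return usable
```

   With B(3) lost, usable_sessions(2, {"C"}) is empty, so the access
   unit of timestamp [2] cannot be decoded.  With B(5) lost,
   usable_sessions(0, {"A", "C"}) returns only session A, so for
   timestamp [16] only the layers carried in session A remain usable.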

   Note: The issue described above is especially important, if the
   receiving node is a MANE, which intends to combine different streams
   to new RTP sessions containing valid operation points.


   3) Example for 3 RTP sessions, each carrying a different set of
      layers of the SVC bitstream with NAL unit interleaving

   The example is similar to example 1, but the transmission order of
   the NAL units of RTP sessions A and B has changed, e.g. for
   increasing error robustness.

   A: -(1,2)-(5,6)-(3,4)-(7,8)-----------------------------------------
        |     |     |     |
   B: -(1)---(5)---(2)---(6)---(3)---(4)-------------------------------
        |     |     |     |     |     |
   C: -(1)---(13)--(2)---(14)--(3)---(4)--(5,6)-(7,8)-(9,10)-(11,12)--
   ------------------------------------------------------------------->
   TL: <0>   <0>   <1>   <1>   <2>   <2>  <3>   <3>   <3>    <3>
   TS: [8]   [16]  [4]   [12]  [2]   [6]  [1]   [3]   [5]    [7]

   Figure 4. NAL unit interleaving in Layered Multicast


   The re-ordered decoding order for all operation points of RTP
   session C is the following and the same as in example 1:

   A(1,2)B(1)C(1), A(3,4)B(2)C(2), B(3)C(3), B(4)C(4), C(5,6), C(7,8),
   C(9,10), C(11,12), A(5,6)B(5)C(13), A(7,8)B(6)C(14).

   In this case, the decoding order of RTP sessions A and B must first
   be restored by using the Decoding Order Number (DON) of the
   interleaved packetization mode.  After the de-interleaving process,
   the same process as in example 1 can be applied in order to restore
   the decoding order of NAL units received from the different RTP
   sessions.  Using the interleaved mode in some or all RTP sessions is
   unproblematic in layered multicast.
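
   The de-interleaving step for session A of Figure 4 can be sketched
   as a sort on the DON.  The sketch assumes that the DON of each NAL
   unit equals the decoding order number shown in the figure and that
   the 16-bit DON does not wrap around within the buffered window; a
   real receiver would have to handle wraparound as described in
   [RFC3984].

```python
def deinterleave(received):
    """Return NAL units in decoding order.

    received is a list of (DON, NAL unit) in transmission order.
    """
    return [nal for _, nal in sorted(received, key=lambda item: item[0])]


# Session A of Figure 4, transmission order (1,2) (5,6) (3,4) (7,8).
SESSION_A = [(1, "A1"), (2, "A2"), (5, "A5"), (6, "A6"),
             (3, "A3"), (4, "A4"), (7, "A7"), (8, "A8")]
DECODING_ORDER_A = deinterleave(SESSION_A)
```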


14.2.     Discussion: Using enhanced DON over different RTP sessions

   NAL unit re-ordering over different RTP sessions can lead to
   complicated search operations in each receiver-buffer of these RTP
   sessions for recovering decoding order, i.e. analyzing timestamps
   and decoding dependency (transport priority) of the NAL units may be
   required, as mentioned in section 14.1.  This problem could be
   solved by using an extended Decoding Order Number (DON) [RFC3984]
   value, which increases with NAL unit decoding order across the
   different RTP sessions.  Such an extended DON may save much of the
   complexity of the re-ordering process in the receiving node at the
   cost of additional signaling overhead.
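
   With such an extended DON, the re-ordering across sessions reduces
   to a merge of the per-session buffers, as the following sketch
   shows.  The cross-session DON values are hypothetical; they are
   assigned here to the first two access units of example 1.  Each
   per-session list is assumed to be in increasing DON order already,
   and DON wraparound is ignored.

```python
import heapq

# Hypothetical (extended DON, NAL unit) pairs per RTP session for the
# first two access units of example 1.
SESSION_A = [(0, "A1"), (1, "A2"), (4, "A3"), (5, "A4")]
SESSION_B = [(2, "B1"), (6, "B2")]
SESSION_C = [(3, "C1"), (7, "C2")]


def merge_by_extended_don(*sessions):
    """Merge per-session buffers into one decoding order."""
    return [nal for _, nal in heapq.merge(*sessions)]


MERGED = merge_by_extended_don(SESSION_A, SESSION_B, SESSION_C)
```

   No timestamp comparison or dependency lookup is needed in this
   case, which is the reduction in complexity referred to above.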


15.  Acknowledgements

   Funding for the RFC Editor function is currently provided by the
   Internet Society.


16.  References

16.1.     Normative References

[RFC3550]   Schulzrinne, H., Casner, S., Frederick, R., and V.
            Jacobson, "RTP: A Transport Protocol for Real-Time
            Applications", STD 64, RFC 3550, July 2003.
[MPEG4-10]  ISO/IEC International Standard 14496-10:2003.
[H.264]     ITU-T Recommendation H.264, "Advanced video coding for
            generic audiovisual services", May 2003.
[SVC]       Joint Video Team, "Joint Scalable Video Model JSVM-4 Annex
            G", available from http://ftp3.itu.ch/av-arch/jvt-site/
            2005_10_Nice/JVT-Q202.zip, October 2005.
[RFC3984]   Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
            M., and D. Singer, "RTP Payload Format for H.264 Video",
            RFC 3984, February 2005.
[RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
            Requirement Levels", BCP 14, RFC 2119, March 1997.


16.2.     Informative References

[DVB-H]     DVB - Digital Video Broadcasting (DVB); DVB-H
            Implementation Guidelines, ETSI TR 102 377, 2005.
[IGMP]      Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
            Thyagarajan, "Internet Group Management Protocol, Version
            3", RFC 3376, October 2002.
[McCanne/Vetterli]  McCanne, S., Jacobson, V., and M. Vetterli,
            "Receiver-driven layered multicast", in Proc. of ACM
            SIGCOMM'96, pages 117-130, Stanford, CA, August 1996.
[MBMS]      3GPP - Technical Specification Group Services and System
            Aspects; Multimedia Broadcast/Multicast Service (MBMS);
            Protocols and codecs (Release 6), December 2005.
[MPEG2]     ISO/IEC International Standard 13818-2:1993.


17.  Authors' Addresses

   Stephan Wenger                 Phone: +358-50-486-0637
   Nokia Research Center          Email: stewe@stewe.org
   P.O. Box 100
   FIN-33721 Tampere
   Finland

   Ye-Kui Wang                    Phone: +358-50-486-7004
   Nokia Research Center          Email: ye-kui.wang@nokia.com
   P.O. Box 100
   FIN-33721 Tampere
   Finland

   Thomas Schierl                 Phone: +49-30-31002-227
   Fraunhofer HHI                 Email: schierl@hhi.fhg.de
   Einsteinufer 37
   D-10587 Berlin
   Germany

18.  Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


19.  Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


20.  Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

21.  RFC Editor Considerations

   None.

22.  Open Issues

   1. Signaling: Guidance from the AVT mailing list is to try to come
      up with media-independent signaling for layered codecs.  This
      apparently needs to go into a new draft in MMUSIC.
   2. Cross-layer DON, see Section 14.2.  Is that acceptable?  It would
      solve many problems, but at the expense of cross-session fields
      in a payload header.  Also, DON has known IPR.
   3. Need to clarify MANE, Mixers, and Translators throughout the
      document (consistently with RFC 3550).
   4. Packetization rules need work once issue 3 is addressed.
   5. Alignment with the JVT spec (ongoing).


23.  Change Log

04.02.2006, StW: Added details to scope
04.02.2006, StW: Added short subsection 6.1 "Design Principles"
04.02.2006, StW: Added section 15, "Application Examples"
06.02.2006 - 03.03.2006, YkW: Various modifications throughout the
   document
13.02.2006 - 03.03.2006, ThS: Added definitions and additional
   information to sections 3.3, 5.1, 7 and 8, parameters in section
   9.1, and added section 14 for NAL unit re-ordering for layered
   multicast.  Further modifications throughout the document
06.03.2006, StW: Editorial improvements


Wenger, Wang, Schierl      Standards Track                    [page 26]