INTERNET-DRAFT                                           19 October 2003
Internet Engineering Task Force                   Expires: 19 April 2004
Audio/Video Transport Working Group                           Alan Tseng
                                                            Gamze Seckin
                                             Vidiator Technology US Inc.


                    RTCP Streaming Extended Reports
              draft-tseng-avt-rtcp-streaming-extens-00.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document defines a new streaming report block for the Extended
   Report (XR) packet type for the RTP Control Protocol (RTCP) [1].  XR
   packets are an extension to the reception report blocks of RTCP
   Sender Report (SR) or Receiver Report (RR) packets.  Seven report
   blocks are defined in [1] to convey information that helps the
   sender monitor the network and assess the quality of VoIP delivery.
   This document defines a new report block for the XR packet, namely
   the streaming report block, to convey statistics about the quality
   of an audio and/or video streaming session.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Tseng, Seckin            Expires April 19, 2004                 [Page 1]

Internet-Draft              RTCP Streaming XR               October 2003

Table of Contents

   1.  Introduction
   2.  Generic Definitions
   3.  Streaming Quality Metric Definitions
       3.1  Reporting Period
       3.2  Good Frame
       3.3  Corruption
            3.3.1  Video Corruptions
            3.3.2  Audio Corruptions
       3.4  Rebuffering
       3.5  Per media reporting
   4.  Streaming Report Block for XR Packets
   5.  Streaming Report Block Interval
   6.  The SDP attribute
   7.  IANA Considerations
   8.  Security Considerations
   9.  Intellectual Property
   10. Acknowledgments
   11. Authors' Addresses
   12. References
   13. Full Copyright Statement

1. Introduction

   This document defines a new streaming report block for the Extended
   Report (XR) packet type for the RTP Control Protocol (RTCP) [1].  XR
   packets are an extension to the reception report blocks of RTCP
   Sender Report (SR) or Receiver Report (RR) packets.  Seven report
   blocks are defined in [1] to convey information that helps to
   monitor the network and assess the quality of VoIP delivery.  This
   document defines a new report block for the XR packet, namely the
   streaming report block, to convey statistics about the quality of an
   audio and/or video streaming session.

   Like all the other report blocks defined in [1], the streaming
   report block is applicable to one-to-one or one-to-many applications
   that generate RTCP SR or RR reports.

2. Generic Definitions

   AVP (Audio and Video Profile) module:
      An abstract module responsible for presenting data to an
      individual decoder for decoding (Figure 1).  For purposes of the
      XR report, the AVP module is responsible for parsing AVP syntax
      and all lower layers including RTP, UDP, etc.

   Decoder module:
      An abstract module responsible for decoding an input raw
      bitstream, and outputting timestamped multimedia data such as
      video frames or PCM audio.

   Player application:
      An abstract entity that encompasses one or more AVP modules,
      decoder modules, and XR report modules, and is responsible for
      composing, rendering, and synchronizing a multimedia
      presentation.  For the purposes of the XR report, the player
      application is responsible for configuring the XR module with
      timing information, and reporting.

   XR Report module:
      An abstract module responsible for collecting data from the AVP
      module, the decoder module, and the player application, and
      compiling them into an XR Report.

   XR Report:
      An RTCP extension block defined in this document.

   +---------------------------------------------------+
   |                Player application                 |
   +---------------------------------------------------+
        ^             |             ^             |
        |             |             |             |
        |             v             |             v
   +---------+   +---------+   +---------+   +---------+
   | Decoder |-->|         |   | Decoder |-->|         |
   +---------+   |         |   +---------+   |         |
        ^        |   XR    |        ^        |   XR    |
        |        | Report  |        |        | Report  |
        |        |  Block  |        |        |  Block  |
   +---------+   |         |   +---------+   |         |
   |   AVP   |-->|         |   |   AVP   |-->|         |
   +---------+   +---------+   +---------+   +---------+
        ^             |             ^             |
        |             |             |             |
        |             v             |             v
   +---------+   +---------+   +---------+   +---------+
   |   RTP   |   |  RTCP   |   |   RTP   |   |  RTCP   |
   +---------+   +---------+   +---------+   +---------+

   Figure 1: Illustration of data flows in a sample streaming
             multimedia playback application.

3. Streaming Quality Metric Definitions

   As defined in [2], there are several metrics that can help determine
   the quality of experience in a streaming session.
   Examples of poor quality of experience could include:

   o  macroblock corruption for video
   o  bursts of garbled, static, or silent audio
   o  rebuffering periods in which a video presentation 'freezes'.

   Based on [2], the basic streaming quality metric definitions are as
   follows:

3.1 Reporting Period

   The reporting period is the period over which a set of metrics is
   calculated.  All timestamps reported by the XR packet SHALL be RTP
   timestamps.  All durations SHALL be in units of the RTP timebase.

   It is possible that the decoder and network layers use different
   timelines (e.g., the decoder reports MPEG-4 timestamps and the AVP
   layer reports RTP timestamps).  If the decoder does not report
   corruption periods in RTP timestamps, the application is responsible
   for configuring the XR report block to use correct timing and
   synchronization information.  A list of possible methods follows;
   this list is by no means exhaustive.

   -  If the decoder specifies corruption intervals as NPT times, the
      application may provide the NPT to RTP mapping, obtained from the
      RTSP RTP-Info header.

   -  If the decoder specifies corruption intervals as RTP times, the
      application does not need to provide any additional information.

3.2 Good Frame

   Ideally, a good frame is a decoded video or audio frame received by
   the client that matches the frame sent by the server.  However,
   since the client cannot compare these directly, a good frame is
   defined as a decoded video or audio frame for which no corruption is
   detected.

3.3 Corruption

   A media (audio/speech/video) corruption period is defined as a
   period in which the media (audio/speech/video) has freezes/gaps or
   quality degradations.
   The corruption period is measured from the start of the first
   corrupted decoded media (audio/speech/video) frame to the start of
   the first subsequent good decoded frame or the end of the reporting
   period, whichever is sooner.  It does not include buffering
   freezes/gaps or pause freezes/gaps.

   In the example in Figure 2, a block of contiguous X's and Y's
   represents a single corruption period.  The X's represent corruption
   due to lost or corrupted packets.  The Y's represent corruption due
   to dependencies on corrupted X's (e.g., video P-frames).

             +-------------------------------------------------------
   Video     |           |XXYYYYYYYY|
             +-------------------------------------------------------

             +-------------------------------------------------------
   Audio     |    |X|         |X|              |XXX|
             +-------------------------------------------------------
             |
   Playback  |---------|---------|---------|---------|---------|-----> t
   Time (s)  0         1         2         3         4         5

   Figure 2. Example AV presentation with corrupted playback

   As mentioned in [2], errors that cause corruption can be detected in
   two places:

   -  at the AVP module: Packet drops and packet errors can sometimes
      be detected by an incorrect or missing sequence number or
      checksum.  The AVP module records the RTP timestamp of the RTP
      packet containing an error, and marks the duration of the RTP
      packet as a corruption.  The duration of an RTP packet varies
      according to codec and AVP profile.

   -  at the decoder module: The decoder can sometimes detect bitstream
      errors that result in illegal syntax for run length encoding,
      Huffman tables, RVLC, etc.  The decoder may or may not be aware
      of RTP timestamps, their relationship with the normal play time
      (NPT), or synchronization with other media decoders.

   In some cases, errors detected at the network level will not be
   detected at the decoder level; in other cases, errors detected at
   the decoder level will not be detected at the network level.
   In either case, the level that detects the error will mark the
   timeline for that media stream as corrupted.  In other cases, an
   error may be detected by both the network and decoder levels.  In
   such a case, to be conservative, the union of the corruption periods
   will be marked on the timeline.

   The following errors can be detected by the AVP module and reported
   as a corruption:

   -  packet corrupted (e.g., checksum failed)
   -  packet dropped (e.g., sequence number missing)
   -  frame incomplete (e.g., sequence number missing, RTP marker bit
      unset)

3.3.1 Video Corruptions

   The atomic unit over which a corruption may be detected at the
   decoder level is a video frame.

   The beginning of corruption is defined as the timestamp of the video
   frame in which the error is detected.

   In general, the end of corruption is defined as the timestamp of the
   first subsequent video frame that is decoded completely and
   successfully.  The details are specific to the codec.

   For MPEG-4, H.263, and all other macroblock-based codecs, unless
   otherwise defined: a completely and successfully decoded I-frame
   always marks the end of a corruption.  In addition, the end of a
   corruption is also marked by a video frame for which all corrupted
   areas of the video frame are known to have been replaced by
   successfully and completely decoded I-macroblocks.

   Whenever a corruption does not end before the end of an XR reporting
   period, the XR report block SHALL report the end of corruption as
   the end time of the XR reporting period.  The beginning of the next
   XR report block SHALL mark the beginning of a new corruption
   duration.

   Whenever a corruption does not end before the end of playback, the
   XR report block SHALL report the end of corruption as the end time
   of playback.

   Potential reporting errors: It is possible that the video decoder
   completely decodes a frame with a bit error.
   This results in a false negative, in which the user actually
   perceives corrupted video, but the application reports the end of a
   corruption period, or no corruption at all.  This problem may be
   prevented if the bit error is detected at the AVP module.

   It is also possible that a bit error will result in a deferred
   error, in which an actual error causes decoding to fail in the
   current frame, but the decoder does not detect it until the
   subsequent frame.  In this case, the decoder reports the beginning
   of the corruption period as later than the user actually perceives
   it.  Once again, detecting the error at the AVP module reduces the
   chances of a false negative report.

3.3.2 Audio Corruptions

   The atomic unit over which an audio corruption can be detected at
   the decoder level is an audio frame.  For the case of PCM audio, an
   audio frame is defined as a single sample.

   Optionally, any audio codec may instead report audio corruption
   periods with a resolution of one sample.  Note, however, that the XR
   block reports RTP timestamps in the RTP timebase; therefore, the
   actual XR reports may be limited in time resolution.

   The beginning of corruption is defined as the timestamp of the audio
   frame in which the error is detected.

   In general, the end of corruption is defined as the timestamp of the
   first subsequent audio frame that is decoded completely and
   successfully.  The details are specific to the codec.

   For AAC, MP3, EVRC, AMR, AMR-WB and other frame-based subband audio
   compression or voice codecs, unless otherwise defined: any audio
   frame that cannot be successfully decoded due to dependencies on
   other audio frames that have been corrupted (e.g., frames using the
   AAC LTP tool, or MP3 frames with back pointers) is considered in
   error.
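
   As a non-normative illustration of the rules in Section 3.3, the
   Python sketch below combines the corruption intervals marked by the
   AVP module and by the decoder for one media timeline into their
   union, and ends any corruption that runs past the reporting period
   at the end of that period.  The function name, the (start, end)
   tuple representation with an exclusive end, and the absence of
   32-bit RTP timestamp wraparound handling are assumptions of this
   sketch and are not defined by this document.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) in RTP timestamp units,
# with the end exclusive. Timestamp wraparound is not handled.
Interval = Tuple[int, int]


def corruption_periods(avp: List[Interval],
                       decoder: List[Interval],
                       period_start: int,
                       period_end: int) -> List[Interval]:
    """Union AVP- and decoder-detected corruptions within one period."""
    clipped: List[Interval] = []
    for start, end in avp + decoder:
        # A corruption that outlives the reporting period is reported
        # as ending at the end of the period.
        start = max(start, period_start)
        end = min(end, period_end)
        if start < end:
            clipped.append((start, end))

    clipped.sort()
    merged: List[Interval] = []
    for start, end in clipped:
        if merged and start <= merged[-1][1]:
            # Overlapping or contiguous: one single corruption period.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


periods = corruption_periods(
    avp=[(1000, 1300), (5000, 5200)],
    decoder=[(1200, 1900), (8800, 9500)],
    period_start=0,
    period_end=9000,
)
print(periods)  # [(1000, 1900), (5000, 5200), (8800, 9000)]
```

   In this example the first AVP and decoder intervals overlap and are
   reported as one corruption period, and the last decoder interval is
   cut off at the end of the reporting period.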

   Whenever a corruption does not end before the end of an XR reporting
   period, the XR report block SHALL report the end of corruption as
   the end time of the XR reporting period.  The beginning of the next
   XR report block SHALL mark the beginning of a new corruption
   duration.

   Whenever a corruption does not end before the end of playback, the
   XR report block SHALL report the end of corruption as the end time
   of playback.

   Potential reporting errors: It is possible that the audio decoder
   completely decodes a frame with a bit error.  This results in a
   false negative, in which the user actually perceives corrupted
   audio, but the player reports the end of a corruption period, or no
   corruption at all.  This problem may be alleviated if the bit error
   is detected at the AVP module.

   It is also possible that a bit error will result in a deferred
   error, in which an actual error causes decoding to fail in the
   current frame, but the decoder does not detect it until the
   subsequent frame.  In this case, the decoder reports the beginning
   of the corruption period as later than the user actually perceives
   it.  Once again, detecting the error at the AVP module reduces the
   chances of a false negative report.

3.4 Rebuffering

   Rebuffering is defined as any stall in play time due to any
   involuntary event at the client side.

   A streaming multimedia player application typically begins playback
   only after accumulating a buffer of data from which to begin
   playback.  If this buffer underflows (e.g., due to network
   congestion), the player application typically pauses playback until
   the buffer can be sufficiently refilled to resume playback.  Such an
   interruption in playback negatively affects user experience and
   SHALL be reported in the XR packet.

   -  Buffer underflow: the decoder consumes data at a faster rate than
      the buffer is filled.

   Certain interruptions in playback are specifically excluded from
   reporting:
   -  The user intentionally pauses playback.

   In addition, whenever a rebuffering period does not end before the
   end of an XR reporting period, the XR report block SHALL report the
   end of the rebuffering period as the end time of the XR reporting
   period.  The beginning of the next XR report block SHALL mark the
   beginning of a new rebuffering period.

   Whenever a rebuffering period does not end before the end of
   playback, the XR report block SHALL report the end of rebuffering as
   the end time of playback.

3.5 Per media reporting

   Each of these metrics is recorded on independent timelines for each
   individual media.

4. Streaming Report Block for XR Packets

   An RTCP XR packet header is defined as in [1], as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|reserved |   PT=XR=207   |             length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         report blocks                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3. RTCP XR header [1]

   version (V): 2 bits
      Identifies the version of RTP.  This specification applies to
      RTP version two.

   padding (P): 1 bit
      If the padding bit is set, this XR packet contains some
      additional padding octets at the end.

   reserved: 5 bits
      This field is reserved for future definition.  In the absence of
      such definition, the bits in this field MUST be set to zero and
      MUST be ignored by the receiver.

   packet type (PT): 8 bits
      Contains the constant 207 to identify this as an RTCP XR packet.

   length: 16 bits
      As defined in Section 3 of [1].

   SSRC: 32 bits
      The synchronization source identifier for the originator of this
      XR packet.

   report blocks: variable length
      Zero or more extended report blocks.
      In keeping with the extended report block framework defined in
      [1], each block MUST consist of one or more 32-bit words.

   The streaming report block is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     BT=8      |C|D|N|R| rsvd. |         block length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Starting PTS (pts_start)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Ending PTS (pts_stop)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            num_cor            |           num_rebuf           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          min_dur_cor                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          max_dur_cor                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          avg_dur_cor                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          std_dur_cor                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         max_dur_rebuf                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         min_dur_rebuf                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         avg_dur_rebuf                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         std_dur_rebuf                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4. Streaming report block

   The block is encoded as twelve 32-bit words:

   block type (BT): 8 bits
      A streaming metrics report block is identified by the constant 8.

   Number of Corruptions Flag (C): 1 bit

   Duration of Corruptions Flag (D): 1 bit

   Number of times Player Re-buffered Flag (N): 1 bit

   Duration of Player Re-buffering Flag (R): 1 bit

   reserved: 4 bits
      This field is reserved for future definition.  In the absence of
      such definition, the bits in this field MUST be set to zero and
      MUST be ignored by the receiver.

   block length: 16 bits
      As defined in Section 3 of [1].

   pts_start: 32 bits
      The RTP timestamp that marks the beginning of the duration being
      reported.

   pts_stop: 32 bits
      The RTP timestamp that marks the end of the duration being
      reported.

   num_cor: 16 bits
      The number of corruption periods in this reporting period.

   num_rebuf: 16 bits
      The number of rebuffering periods in this reporting period.

   min_dur_cor: 32 bits
      The minimum duration of a corruption period during this
      reporting period.  Units are in timebase of the RTP timestamp.

   max_dur_cor: 32 bits
      The maximum duration of a corruption period during this
      reporting period.  Units are in timebase of the RTP timestamp.

   avg_dur_cor: 32 bits
      The average duration of a corruption period during this
      reporting period.  Units are in timebase of the RTP timestamp.

   std_dur_cor: 32 bits
      The standard deviation of the duration of a corruption period
      during this reporting period.  Units are in timebase of the RTP
      timestamp (a standard deviation has the same units as the
      durations it summarizes).  This value is a single precision
      floating point number, according to the IEEE 754 standard
      (Figure 5).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   exponent    |                  mantissa                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5. Floating-point format for standard deviation.

   S: 1 bit
      Sign bit (0=positive, 1=negative).  Note: a standard deviation
      is never negative, so this bit is always 0.

   Exponent: 8 bits
      Exponent, with bias of 127.

   Mantissa: 23 bits
      Mantissa.

   min_dur_rebuf: 32 bits
      The minimum duration of a rebuffering period during this
      reporting period.  Units are in timebase of the RTP timestamp.

   max_dur_rebuf: 32 bits
      The maximum duration of a rebuffering period during this
      reporting period.
      Units are in timebase of the RTP timestamp.

   avg_dur_rebuf: 32 bits
      The average duration of a rebuffering period during this
      reporting period.  Units are in timebase of the RTP timestamp.

   std_dur_rebuf: 32 bits
      The standard deviation of the duration of a rebuffering period
      during this reporting period.  Units are in timebase of the RTP
      timestamp (a standard deviation has the same units as the
      durations it summarizes).  This value is a single precision
      floating point number, according to the IEEE 754 standard
      (Figure 5).

5. Streaming Report Block Interval

   XR report blocks need not be sent in each RTCP packet.  Therefore,
   the interval between sequential XR report blocks may exceed the
   interval between sequential RTCP packets.  It is the receiver's
   responsibility to use the statistics for the specified interval in
   the report blocks.

6. The SDP attribute

   The usage of SDP is as specified in Section 5 of [1].  An
   rtcp-xr-attrib is defined for streaming metrics as follows:

      rtcp-xr-attrib = "a=" "rtcp-xr" ":" [xr-format *(SP xr-format)] CRLF

      xr-format = streaming-metrics

      streaming-metrics = "streaming-metrics"

7. IANA Considerations

   A request will be made to IANA to register the following block type:

      BT  name
      --  ----
       8  Streaming Report Block

8. Security Considerations

   All security considerations addressed in [1] are valid for this
   document and no new solutions are proposed to address those
   considerations.

9. Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.
   Copies of claims of rights made available for publication and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

10. Acknowledgments

   The authors thank Jayank Bhalod, Yanda Ma, Raaghu Nagaraj, and Lalit
   Sarna from Vidiator Technology US Inc. and Roland Banks from 3 UK
   for their valuable input.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

11. Authors' Addresses

   Alan Tseng, alan@vidiator.com
   888 Villa St. Suite 500
   Mountain View, CA 94041
   USA

   Gamze Seckin, gamze@vidiator.com
   411 108th Ave NE, Suite 688
   Redmond, WA 98004
   USA

12. References

   [1] Friedman, Caceres, Clark, "RTP Control Protocol Extended Reports
       (RTCP XR)", Proposed Standard,
       draft-ietf-avt-rtcp-report-extns-06.txt, May 2003.

   [2] 3GPP SA4, "Draft Rel-6 Quality Metrics Permanent Document
       v0.06", TDOC S4-030682.

13. Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.