Internet Engineering Task Force                G. Liebl, T. Stockhammer
Internet Draft                          LNT, Munich Univ. of Technology
Document: draft-ietf-avt-uxp-01.txt
November 2001                            M. Wagner, J. Pandel, W. Weng,
                                        G. Baese, M. Nguyen, F. Burkert
Expires: May 2002                                    Siemens AG, Munich


      An RTP Payload Format for Erasure-Resilient Transmission of
                    Progressive Multimedia Streams


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document specifies an efficient way to ensure erasure-resilient
   transmission of progressively encoded multimedia sources via RTP
   using Reed-Solomon codes. The level of erasure protection can be
   explicitly adapted to the importance of the respective parts in the
   source stream, thus allowing a graceful degradation of application
   quality with increasing packet loss rate on the network. Hence, this
   type of unequal erasure protection (UXP) scheme is intended to cope
   with the rapidly varying channel conditions on wireless access links
   to the Internet backbone. Nevertheless, backward compatibility with
   currently standardized non-progressive multimedia codecs is ensured,
   since equal erasure protection (EXP) represents a subset of generic
   UXP.
   By defining a comparatively simple payload format, the proposed
   scheme can be easily integrated into the existing framework for RTP.


Liebl,Stockhammer,Wagner,Pandel,Weng,Baese,Nguyen,Burkert      [Page1]

Internet Draft        Unequal Erasure Protection         November 2001


2. Conventions used in this document

   The following terms are used throughout this document:

   1.) Message block: a higher layer transport unit (e.g. an IP
       packet) that enters/leaves the segmentation/reassembly stage at
       the interface to wireless data link layers.

   2.) Segment: denotes a link layer transport unit.

   3.) CRC: Cyclic Redundancy Check, usually added to transport units
       at the sender to detect the existence of erroneous bits in a
       transport unit at the receiver.

   4.) Segmentation/Reassembly Process: If the size of the transport
       units at the link layer is smaller than that at the upper
       layers, message blocks have to be split up into several parts,
       i.e. segments, which are then transmitted successively over the
       link. If nothing is lost, the original message block can be
       restored at the receiving entity (reassembly).

   5.) Quality-of-service: application-dependent criterion to define a
       certain desired operation point.

   6.) Codec: denotes a functional pair consisting of a source encoding
       unit at the sender and a corresponding source decoding unit at
       the receiver; usually standardized for different multimedia
       applications like audio or video.

   7.) Progressive source coding: results in successive blocks of
       (source-)encoded data (e.g. a single video or audio frame), each
       of which can be viewed as a bitstream of certain length, whose
       distinct elements are of different importance to the
       reconstruction process at the decoder. Elements are commonly
       ordered from highest to least importance, where later elements
       depend on the preceding ones.

   8.) Reed-Solomon (RS) code: belongs to the class of linear nonbinary
       block codes, and is uniquely specified by the block length n,
       the number of parity symbols t, and the symbol alphabet.

   9.) n: a variable that denotes both the block length of an RS
       codeword and the number of columns in a TB (see 16).

   10.) k: a variable that denotes the number of information symbols in
        an RS codeword.

   11.) t: a variable that denotes the number of parity symbols in an
        RS codeword.

   12.) Erasure: When a packet is lost during transmission, an erasure
        is said to have happened. Since the position of the erased
        packet in a sequence is usually known, a corresponding erasure
        marker can be set at the receiving entity.

   13.) Base layer: comprises the first and most important elements in
        a progressively encoded bitstream, without which all subsequent
        information is useless.

   14.) Enhancement layer: comprises one or more sets of the less
        important subsequent elements in a progressively encoded
        bitstream. A specific enhancement layer can be decoded if and
        only if the base layer and all previous enhancement layer data
        (of higher importance) are available.

   15.) Info stream: denotes the final bitstream which has to be
        protected by the proposed UXP scheme. It usually consists of
        the (source-encoded) bitstream (progressive or not), which is
        already arranged according to a desired syntax (e.g. as
        specified in the respective RTP profile for the media codec in
        use). In any case, it is assumed that every info stream is
        already octet-aligned according to the standard procedures
        defined in the context of the used syntax specifications.

   16.) Transmission block (TB): denotes a memory array of L rows and n
        columns. Each row of a TB represents an RS codeword, whereas
        each column, together with the respective UXP header (see 33)
        in front, forms the payload of a single RTP packet.
        Each TB consists of at least two distinct transmission sub
        blocks (TSB, see 17): The first L_s rows belong to the
        signaling TSB, whereas the last L_d=(L-L_s) rows belong to one
        or more data TSBs.

   17.) Transmission sub block (TSB): denotes a memory array of 0

                     +-+-+-+-+-+-+-+-+-+
                     |&|&|&|&|&|&|&|*|*|
                     +-+-+-+-+-+-+-+-+-+
                     <------------><--->
                          k=n-t      t

                     (&:info) (*:parity)

             Fig. 1: Structure of a systematic RS codeword

5. Progressive Source Coding

   If the output of a multimedia codec, be it audio or video, is said
   to be progressive, the encoded bitstream must consist of several
   distinct elements, often organized in separate layers. The layers
   shall be defined via their relative importance with respect to the
   quality of the reconstruction process at the receiver. Hence, there
   exists at least one layer, often called base layer, without which
   reconstruction fails altogether, whereas all the other layers, often
   called enhancement layers, just help to continually improve the
   quality. Consequently, the different layers are usually contained in
   the (source-)encoded bitstream in decreasing order of importance,
   i.e. the base layer data is followed by the various enhancement
   layers. An example can be found in the fine granular scalability
   modes which have been proposed to various standardization bodies
   like MPEG-4 [4] or ITU (H.26L) [5], where the resolution of the
   scaling process in the progressive source encoder is as fine as one
   symbol in the enhancement layer.

   It follows from the above definition that the most important base
   layer data must be protected as strongly as possible against packet
   loss during transmission. However, the protection of the enhancement
   layers could be continually lowered, since a loss at this stage has
   only minor consequences for the reconstruction process. Thus, by
   using a suitable unequal erasure protection strategy across a
   progressive source stream, the overhead due to redundancy spent per
   (channel-)encoded block is reduced.
   Furthermore, if channel conditions get worse during transmission,
   only an increasing number of enhancement layers is lost, i.e. a
   graceful degradation in application quality at the receiver is
   achieved [6].

   Nevertheless, it should be mentioned that the specific structure of
   a (source-)encoded bitstream strongly depends on the actual media
   codec in use, and on the desired syntax which is used for adapting
   the output of the codec to a suitable transport level format (see
   also 7.3). In order to keep the description of the unequal erasure
   protection strategy in section 6 as general as possible, the final
   bitstream which has to be protected by the proposed UXP scheme will
   be called "info stream" in the following. Furthermore, it is assumed
   that every info stream is already octet-aligned according to the
   standard procedures defined in the context of the used syntax
   specifications.

6. General Structure of UXP schemes

   In this section, the principal features of the proposed UXP scheme
   are described with a special focus on the protection and
   reconstruction procedure which is applied to the info stream. In
   addition, the behavior of the sender and receiver is specified as
   far as it concerns the reconstruction of the info stream. However,
   the complete UXP payload structure, including the additional UXP
   header, is described in section 7.

   Fig. 1 already illustrated the structure of a systematic codeword,
   which shall be represented by a single row and n successive columns
   that contain the information and the parity bytes. This structure
   shall now be extended by forming a transmission block (TB)
   consisting of L codewords of length n bytes each, which amounts to a
   total of L rows and n columns [7]: Each column, together with the
   respective UXP header in front, shall represent the payload of an
   RTP packet, i.e.
   the whole data of a TB is transmitted via a sequence of n RTP
   packets all carrying a payload of length (L+2) bytes (UXP header
   included). The value of L should be chosen in such a way that the
   whole length of the resulting IP packet (i.e. RTP payload plus sum
   of RTP, UDP, and IP header) equals a multiple of the segment size on
   the wireless link to avoid stuffing at the data link layer.

   Each TB usually consists of two or more horizontal slices, the
   so-called transmission sub blocks (TSB), as can be seen in Fig. 2:
   The first L_s rows always belong to the signaling TSB, which is used
   to convey the actual redundancy profile in the data part to the
   receiver (see 7.3). The following L_d=(L-L_s) rows belong to one or
   more data TSBs, which contain the interleaved and RS encoded info
   stream, as will be described below.

                      Transmission Block (TB)

          /\  +-+-+-+-+-+-+-+-+-+  /\
           |  |  signaling TSB  |   |  L_s bytes
           |  +-+-+-+-+-+-+-+-+-+  \/
           |  |                 |  /\                /\
           |  +   data TSB #1   +   |  L_d(1) bytes   |
           |  |                 |   |                 |
           |  +-+-+-+-+-+-+-+-+-+  \/                 |
  L bytes  |  |                 |  /\                 |
  payload  |  +   data TSB #2   +   |  L_d(2) bytes   |
  per      |  |                 |   |                 |  L_d bytes
  packet   |  +-+-+-+-+-+-+-+-+-+  \/                 |
           |  |        .        |   .                 |
           |  +        .        +   .                 |
           |  |        .        |   .                 |
           |  +-+-+-+-+-+-+-+-+-+  /\                 |
           |  |   data TSB #z   |   |  L_d(z) bytes   |
          \/  +-+-+-+-+-+-+-+-+-+  \/                \/

              <----------------->
                   n packets

                 Fig. 2: General structure of a TB

   Since the UXP procedure is mainly applied to the data TSBs, it will
   be described next, whereas the content and syntax of the signaling
   TSB will be defined in section 7.3. For simplicity, only one single
   data TSB will be assumed throughout the following explanation of the
   encoding and decoding procedure. However, an extension to more than
   one data TSB per TB is straightforward, and will be shown in section
   7.4.

   As depicted in Fig.
   3, the rows of a transmission sub block shall be partitioned into
   T+1 different classes CA_i, where i=0...T, such that each class
   contains exactly A_i=|CA_i| consecutive rows of the matrix, where
   the A_i have to satisfy the following relationship:

      A_0+A_1+...+A_T=L_d

               Data Transmission Sub Block (data TSB)

                                  T
                              <------->
              /\  +-+-+-+-+-+-+-+-+-+  /\
               |  |&|&|&|&|&|*|*|*|*|   |
               |  +-+-+-+-+-+-+-+-+-+   |
               |  |&|&|&|&|&|*|*|*|*|   |  A_T=3
               |  +-+-+-+-+-+-+-+-+-+   |
   L_d bytes   |  |&|&|&|&|&|*|*|*|*|  \/
   per packet  |  +-+-+-+-+-+-+-+-+-+  /\
               |  |%|%|%|%|%|%|*|*|*|   |  A_(T-1)=1
               |  +-+-+-+-+-+-+-+-+-+  \/
               |  |$|$|$|$|$|$|$|*|*|   .
               |  +-+-+-+-+-+-+-+-+-+   .
               |  |¦|¦|¦|¦|¦|¦|¦|¦|*|   .
               |  +-+-+-+-+-+-+-+-+-+  /\
               |  |#|#|#|#|#|#|#|#|#|   |  A_0=1
              \/  +-+-+-+-+-+-+-+-+-+  \/

                  <----------------->
                       n packets

      &,%,$,¦,# : info bytes belonging to a certain info stream in
                  decreasing order of importance
      *         : parity bytes gained from Reed-Solomon coding

   Fig. 3: General structure for coding with unequal erasure protection

   Furthermore, all rows in a particular class CA_i shall contain
   exactly the same number of parity bytes, which is equal to the index
   i of the class. For each row in a certain class CA_i, the same
   (n,n-i) RS code shall be applied. As can be observed from Fig. 3,
   class CA_T contains the largest number of parity bytes per row, i.e.
   offers the highest erasure protection capability in the block.
   Consequently, the most important element in the info stream must be
   assigned to class CA_T, where the value of T should be chosen
   according to the desired outage threshold of the application given a
   certain packet erasure rate on the link. All other classes
   CA_(T-1)...CA_0 shall be sequentially filled with the remaining
   elements of the info stream in decreasing order of importance, where
   the optimal choice for the size of each class (0 or more rows), i.e.
   the structure of the redundancy profile, should depend on the
   quality-of-service requirements for the various
   (progressively-encoded) layers.

   The following set of rules contains a compact description of all the
   operations that must be performed for each transmission block:

   1.) The total number of columns n of the TB shall be chosen
       according to the actual delay constraints of the application.

   2.) Next, the expected number of rows reserved for the signaling TSB
       has to be selected, which limits the data TSB to L_d=(L-L_s)
       rows.

   3.) The maximum erasure correction capability T in the data TSB
       should be chosen according to the desired outage threshold of
       the application given the actual packet erasure rate on the
       link.

   4.) The redundancy profile for the rest of the data TSB should
       depend on the size and number of the various layers in the info
       stream, as well as the desired probability of successful
       decoding for each of them (quality-of-service requirement).

   5.) Any suitable optimization algorithm may be used for deriving an
       adequate redundancy profile. However, the result has to satisfy
       the following constraints:

       a) All available info byte positions in the data TSB have to be
          completely filled. If the info stream is too short for a
          desired profile, media stuffing may be applied to the empty
          info byte positions at the end of the data TSB by appending a
          sufficient number of bytes (with arbitrary value, e.g. 0x00).
          The actual number of stuffing symbols per data TSB is then
          signaled via the respective stuffing indicator (see 7.3).
          However, before resorting to any stuffing, it should be
          checked whether it is possible to strengthen the protection
          of certain rows instead, thus improving the overall
          robustness of the decoding process.
       b) The info stream should be fully contained within the data TSB
          (unless cutting it off at a specific point is explicitly
          allowed by the properties of the used media codec).

       c) The number of required descriptors and stuffing indicators
          (see section 7.3) to signal the profile shall not exceed the
          space initially reserved for them in the signaling TSB.

       Constraints a) and b) should already be incorporated in the
       optimization algorithm. However, if constraint c) is not met,
       the data TSB has to be reduced by one row in favor of the
       signaling TSB to make room for the descriptors and stuffing
       indicators, i.e. steps 2-5 have to be repeated until a valid
       redundancy profile has been obtained.

   6.) For each nonempty class CA_i, i=T...0, in the data TSB, the
       following steps have to be performed:

       a) All rows of this specific class shall be filled from left to
          right and top to bottom with data bytes of the info stream in
          decreasing order of importance (i.e. starting with the most
          important element).

       b) For each row in the class, the required i parity-check bytes
          are computed using the same (n,n-i) RS code, and filled into
          the empty positions at the end of each row. Thus, every row
          in the class constitutes a valid codeword of the chosen RS
          code.

   7.) After having filled the whole data TSB with information and
       parity bytes, the redundancy profile is mapped to the signaling
       TSB as described in section 7.3.

   8.) Each column of the resulting TB is now read out byte-wise from
       top to bottom and, together with the respective UXP header (see
       section 7.2) in front, is mapped onto the payload section of one
       and only one RTP packet.

   9.) The n resulting RTP packets shall be transmitted successively to
       the remote host, starting with the leftmost one.

   10.) At the corresponding protocol entity at the remote host, the
        payload (without the UXP header) of all successfully received
        RTP packets belonging to the same sending TB shall be filled
        into a similar receiving TB column-wise from top to bottom and
        left to right.

   11.) For every erased packet of a received TB, the respective column
        in the TB shall be filled with a suitable erasure marker.

   12.) Before any other operations can be performed, the redundancy
        profile has to be restored from the signaling TSB according to
        the procedure defined in section 7.3. If the attempt fails
        because of too many lost packets, the whole TB shall be
        discarded and the receiving entity should wait for the next
        incoming TB (the source decoder may be informed about the
        missing info stream, if required).

   13.) If the attempt to recover the redundancy profile has been
        successful, a decoding operation shall be performed for each
        row of the data TSB by applying any suitable algorithm for
        erasure decoding.

   14.) For all rows of the data TSB for which the decoding operation
        has been successful, the reconstructed data bytes are read out
        from left to right and top to bottom, and appended to the
        reconstructed version of the info stream.

   15.) For all rows of the data TSB for which the decoding operation
        has failed, a sufficient number of suitable dummy symbols may
        be added to the reconstructed info stream to inform the source
        decoder about the missing symbols.

   The above rules describe an interleaver, i.e. at the sender a single
   codeword of a TB is spread out over n successive packets. Thus, each
   codeword of a transmitted TB experiences the same number of erasures
   at exactly the same positions. Two important conclusions can be
   drawn from this:

   a) Since the same RS code is applied to all rows contained in a
      specific class, either all of them can be correctly decoded or
      none of them can. Hence, there exist no partly decodable classes
      at the receiver.
   b) If decoding is successful for a certain class CA_i, all the
      classes CA_(i+1)...CA_T can also be decoded, since they are
      protected by at least one more parity byte per row. Together with
      rule 6, it is therefore always ensured that, in case a decodable
      enhancement layer exists, all other layers it depends on can also
      be reconstructed.

   Given the maximum erasure protection value T, the redundancy profile
   for a data TSB of size (L_d x n) shall be denoted by a so-called
   erasure protection vector AV of length (T+1), where

      AV:=(A_0,A_1,...,A_(T-1),A_T)

   From the above definition, it follows that the trivial cases of no
   erasure protection and EXP are a subset of UXP:

   a) no erasure protection at all: all application data is mapped onto
      class CA_0, i.e. AV=(L_d,0,0,...,0).

   b) EXP: all application data is mapped onto class CA_T, i.e.
      AV=(0,0,...,0,A_T=L_d).

   Hence, backward compatibility with currently standardized
   non-progressive multimedia codecs is achieved.

7. RTP payload structure

   For every packet whose payload is formed by reading out a column of
   the TB, the RTP header must be followed by a UXP header.

7.1. Specific settings in the RTP header

   The timestamp of each RTP packet resulting from reading out a TB is
   set to the time instant when the first byte of the progressive
   source data stream has been written into the TB. This results in the
   TS value being the same for all RTP packets belonging to a specific
   TB.

   The payload type is of dynamic type, and obtained through
   out-of-band signaling similar to [1]. The signaling protocol must
   establish a payload length to be associated with the payload type
   value. End systems that do not recognize a payload type must discard
   the respective packets.

   The marker bit is set to 1 for the last packet of every TB.
   Otherwise, its value is 0.
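
   The settings above can be summarized in a short Python sketch. It is
   illustrative only: the function name and the dictionary keys are
   hypothetical, the increment of the 16-bit sequence number by one per
   packet is ordinary RTP behavior rather than something this section
   states, and the payload type 96 in the usage note is merely an
   arbitrary value from the dynamic range.

```python
def rtp_fields_for_tb(n, first_seq, timestamp, payload_type):
    """Per-packet RTP header fields for the n packets of one TB.

    Hypothetical helper: the columns of a TB are sent left to right, so
    packet number col carries column col of the TB.
    """
    packets = []
    for col in range(n):
        packets.append({
            # Assumption: usual RTP numbering, 16 bits with wrap-around.
            "seq": (first_seq + col) % 65536,
            # All packets of one TB share the same timestamp.
            "timestamp": timestamp,
            # The marker bit is set only on the last packet of the TB.
            "marker": 1 if col == n - 1 else 0,
            # Dynamic payload type negotiated out of band.
            "payload_type": payload_type,
        })
    return packets
```

   Calling rtp_fields_for_tb(4, 65534, 1000, 96) yields four entries
   that share the timestamp 1000, whose sequence numbers wrap around
   from 65535 to 0, and of which only the last has the marker bit set.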
   All other fields in the RTP header are set to those values proposed
   for regular multimedia transmission using the same source codecs,
   but no erasure protection scheme enabled.

   The RTP payload shall consist of the UXP header followed by one
   column of the TB.

7.2. Structure of the UXP header

   The UXP header shall consist of 2 octets, and is shown in Fig. 4:

       0                   1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |X|  block PT   | block length n|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Fig. 4: Proposed UXP header

   The fields in the header shall be defined as follows:

   - X (bit 0): extension bit, reserved for future enhancements,
     currently not in use -> default value: 0

   - block PT (bits 1-7): regular RTP payload type to indicate the
     media type contained in the info stream

   - block length n (bits 8-15): indicates total number of RTP packets
     resulting from one TB (which equals the number of columns of the
     TB)

   The syntax of the info stream which is protected by UXP is specified
   by the RTP payload type field contained in the UXP header. For
   example, payload type H.263 means that the info stream conforms to
   the specifications of the RTP profile for H.263, but does not
   represent the "raw" H.263 stream produced by an H.263 encoder.
   However, UXP can also be applied to the raw output of the media
   codec (in case it is already octet-aligned), if this can be signaled
   to the receiver via other means, e.g. by use of H.245 or SDP.

   Based on the RTP sequence number, the marker bit, and the repetition
   of the block length n in each UXP header, the receiving entity is
   able to recognize both TB boundaries and the actual position of lost
   packets in the TB. Furthermore, the specific choice of equal TS
   values for all RTP packets belonging to a TB allows for overcoming
   possible sequence number overflow.

7.3. In-band signaling of the structure of the redundancy profile

   To enable a dynamic adaptation to varying link conditions, the
   actual redundancy profile used in the data TSB must be signaled to
   the receiving entity. Since out-of-band signaling either results in
   excessive additional control traffic, or prevents quick changes of
   the profile between successive TBs, an in-band signaling procedure
   is desired.

   Since the decoding process cannot be applied to any of the erasure
   protection classes without knowledge of the correct redundancy
   profile, the profile has to be protected against packet loss at
   least as strongly as the most important element in the info stream.
   Therefore, an additional class CA_P is used in the signaling TSB,
   where the number of parity symbols is by default set to the
   following value:

      P=ceil(n/2)

   Hence, the redundancy profile can still be recovered when up to 50%
   of the RTP packets of a TB are lost. This seems to be a reasonable
   value for the lowest point of operation over a lossy link.
   Alternatively, P may be explicitly signaled during session setup by
   means of SDP or the H.245 protocol.

   Consequently, since all other classes must have equal or less
   erasure protection capability, the maximum allowable value for class
   CA_T in the data TSB is now limited to T<=P.

   The signaling of the erasure protection vector is accomplished by
   means of descriptors. For each class CA_i with A_i>0, there is a
   descriptor DP_i providing information about the size of class CA_i
   (i.e. the value of A_i) and establishing a relationship between the
   erasure protection of class CA_i and that of the first preceding
   class CA_(i+j) with A_(i+j)>0, where j>0.

   A descriptor DP_i is mapped onto one byte, which is sub-divided into
   two half-bytes (i.e. the higher and the lower four bits).
   The first half-byte is of type unsigned and contains the 4-bit
   representation of the decimal value A_i. The second half-byte is of
   type signed and contains the difference in erasure protection
   between class CA_i and class CA_(i+j), i.e. the signed 4-bit
   representation of the decimal value (-j) (where the MSB denotes the
   sign, and the lower three bits the absolute value). For the first
   descriptor of the sequence, the examples in this section take the
   signaling class CA_P as the preceding class.

   Note that the erasure protection P of class CA_P is fixed, whereas
   the size A_P may vary. Thus, the data to be filled into class CA_P
   shall consist of a sequence of descriptors separated by stuffing
   indicators (see below), where the number of descriptors is primarily
   given by the number of protection classes CA_i, 0<=i<=T, in the data
   TSB with A_i>0.

   Without a-priori knowledge, the initial value for the size of the
   signaling TSB should be set to one (row). When the number of
   necessary descriptors and stuffing indicators exceeds the (n-P)
   information positions, one or more additional rows have to be
   reserved. This is usually done by increasing the value for L_s to
   A_P>1, i.e. the data TSB is reduced to (L-A_P) rows. Hence, in order
   to indicate the actual size of the signaling TSB, an additional
   descriptor is inserted at the very beginning, which takes on the
   value 0xq0, where q denotes the hexadecimal digit (i.e. the four-bit
   representation) of the decimal value A_P.

   Furthermore, the end of each data TSB is signaled by the otherwise
   unused descriptor value 0x00, followed by exactly one stuffing
   indicator (SI). The latter is mapped onto a byte, which is of type
   unsigned and contains the 8-bit representation of the decimal value
   of the number of media stuffing symbols used at the end of the
   respective data TSB.

   The (extended) sequence of descriptors and stuffing indicators is
   then mapped to the info byte positions in the A_P rows of the
   signaling TSB from left to right and top to bottom.
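
   The descriptor rules above can be sketched in Python as follows.
   This is a minimal illustration, not a normative encoder. The
   function names are hypothetical. Two points are assumptions inferred
   from the worked examples of this section and of section 7.4 rather
   than stated rules: the first descriptor is taken relative to the
   signaling protection P, and the first descriptor of a concatenated
   data TSB is taken relative to the last class of the preceding data
   TSB. Unused info byte positions are filled with 0x00.

```python
def signed_nibble(delta):
    """Sign-magnitude half-byte: MSB is the sign, low three bits the size."""
    if abs(delta) > 7:
        raise ValueError("protection step needs more than three bits")
    return (0x8 | -delta) if delta < 0 else delta


def descriptors(av, reference):
    """Descriptors of one data TSB, from class CA_T down to CA_0.

    av[i] is A_i (rows carrying i parity bytes); reference is the
    protection of the class signaled immediately before (assumption).
    Returns the descriptor bytes and the protection of the last class.
    """
    out = []
    for i in range(len(av) - 1, -1, -1):
        if av[i] == 0:
            continue                       # empty classes get no descriptor
        if av[i] > 15:
            raise ValueError("A_i does not fit into one half-byte")
        out.append((av[i] << 4) | signed_nibble(i - reference))
        reference = i
    return out, reference


def signaling_info_bytes(profiles, stuffing, n, p):
    """Info bytes of the signaling TSB for one or more data TSBs."""
    body, reference = [], p                # assumption: first step relative to P
    for av, si in zip(profiles, stuffing):
        descs, reference = descriptors(av, reference)
        body += descs + [0x00, si]         # end marker, then stuffing indicator
    per_row = n - p                        # info positions per signaling row
    rows = -(-(1 + len(body)) // per_row)  # ceiling, incl. the size descriptor
    if rows > 15:
        raise ValueError("signaling TSB size does not fit into one half-byte")
    seq = [rows << 4] + body               # size descriptor 0xq0 comes first
    return seq + [0x00] * (rows * per_row - len(seq))
```

   With AV=(7,0,2,2,0,3,10), three stuffing symbols, n=20 and P=10, the
   call signaling_info_bytes([(7,0,2,2,0,3,10)], [3], 20, 10) returns
   the ten bytes 0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00,
   which is the sequence given in the example of this section.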
   Each row is then encoded with the same (n,n-P) RS code. If the
   number of descriptors and stuffing indicators is less than the
   available info byte positions, however, empty positions in class
   CA_P may be filled up with the otherwise unused descriptor 0x00.

   At the receiving entity, the sequence of descriptors shall be
   recovered by performing erasure decoding on the first row of the TB
   (which definitely belongs to the signaling TSB) using the same
   algorithm as later for the data TSB. If successful, the very first
   descriptor now indicates the number of rows of the signaling TSB,
   and the next (A_P-1) rows are decoded to reconstruct the redundancy
   profile for the data TSB(s), together with the number of media
   stuffing symbols denoted by the respective SI(s).

   The complete structure of the TB is now depicted in Fig. 5.

                      Transmission Block (TB)

                                P
                            <--------->
              /\  +-+-+-+-+-+-+-+-+-+  /\
               |  |?|?|?|?|*|*|*|*|*|   |  A_P=1
               |  +-+-+-+-+-+-+-+-+-+  \/
               |  |&|&|&|&|&|*|*|*|*|  /\
               |  +-+-+-+-+-+-+-+-+-+   |
               |  |&|&|&|&|&|*|*|*|*|   |  A_T=3
   L bytes     |  +-+-+-+-+-+-+-+-+-+   |
   payload     |  |&|&|&|&|&|*|*|*|*|  \/
   per packet  |  +-+-+-+-+-+-+-+-+-+  /\
               |  |%|%|%|%|%|%|*|*|*|   |  A_(T-1)=1
               |  +-+-+-+-+-+-+-+-+-+  \/
               |  |$|$|$|$|$|$|$|*|*|   .
               |  +-+-+-+-+-+-+-+-+-+   .
               |  |¦|¦|¦|¦|¦|¦|¦|¦|*|   .
               |  +-+-+-+-+-+-+-+-+-+  /\
               |  |#|#|#|#|#|#|#|#|#|   |  A_0=1
              \/  +-+-+-+-+-+-+-+-+-+  \/

                  <----------------->
                       n packets

      ?         : descriptors and stuffing indicators for in-band
                  signaling of the redundancy profile
      &,%,$,¦,# : info bytes belonging to a certain element of the info
                  stream in decreasing order of importance
      *         : parity bytes gained from Reed-Solomon coding

      Fig. 5: General structure for UXP with in-band signaling of the
              redundancy profile

   The following simple example is meant to illustrate the idea behind
   using descriptors:

   Let an erasure protection vector of length T+1=7 be given as
   follows:

      AV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10)

   Hence, the length L of the TB (including one row for the signaling
   TSB) is equal to 7+2+2+3+10+1=25 (rows/bytes). If the width is
   assumed to be equal to 20 (columns/packets), then the erasure
   protection of the descriptors is P=10. The corresponding sequence of
   descriptors can be written as

      DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A),

   where the values of the descriptors are given in hexadecimal
   notation. Next, the descriptor indicating the length of the
   signaling TSB has to be inserted, the end of the data TSB has to be
   marked by 0x00, and the SI has to be appended. If the number of
   media stuffing symbols is assumed to be 3, the 10 info bytes in the
   signaling TSB take on the following values (descriptor stuffing
   included):

      (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00)

7.4 Optional Concatenation of Transmission Sub Blocks:

   The following procedure may be applied if a single info stream would
   be too short to achieve an efficient mapping to a transmission block
   with respect to the fixed payload length L and the desired number of
   packets n. For example, intra-coded video frames (I-frames) are
   usually much larger than the following predicted ones (P-frames).

   In this case, a certain number z of successive small info streams
   should each be mapped to a transmission sub block with length L_d(y)
   and width n, such that L_d(1)+L_d(2)+...+L_d(z)=L_d.
   The resulting transmission sub blocks can then be easily
   concatenated to form a TB of size L x n having one common signaling
   TSB: Since the second half-byte of the descriptors is of type
   signed, both decreasing and increasing erasure protection profiles
   can be incorporated within one single signaling TSB.

   Note that once the lengths L_d(y) of the individual blocks have been
   fixed, the respective redundancy profiles can be determined
   independently of each other. However, the space initially reserved
   for the signaling TSB should already be large enough to avoid
   profile recalculation for each of the data TSBs in case the sequence
   of descriptors gets too long.

   Again, a simple example illustrates this idea:

   Let the erasure protection vectors for two concatenated data TSBs be
   given as follows:

      AV1=(A1_0,A1_1,...,A1_5,A1_6)=(0,0,2,2,0,3,10),
      AV2=(A2_0,A2_1,...,A2_5,A2_6)=(0,0,2,2,0,3,10).

   Hence, two identical data TSBs will be concatenated to form a TB of
   length L=2*(2+2+3+10)+2=36 (rows/bytes). If the width is again
   assumed to be equal to 20 (columns/packets), then the erasure
   protection of the descriptors is P=10, and therefore a total of two
   rows for the signaling TSB have been reserved this time. The
   corresponding sequence of descriptors can now be written as

      DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29),

   where the values of the descriptors are given in hexadecimal
   notation. If the number of media stuffing symbols is assumed to be 3
   for each data TSB, the 20 info byte positions in the signaling TSB
   are filled with the following values (descriptor stuffing included):

      (0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03,
       0x00,0x00,0x00,0x00,0x00,0x00,0x00)

8. Security Considerations

   The payload of the RTP packets consists of an interleaved multimedia
   and parity stream.
Therefore, it is reasonable to encrypt the resulting stream with one key rather than using different keys for multimedia and parity data. It should also be noted that encryption of the multimedia data without encryption of the parity data could enable known-plaintext attacks.

The overall proportion between parity bytes and info bytes should be chosen carefully if the packet loss is due to network congestion. If the proportion of parity bytes per TB is increased in this case, it could lead to increasing network congestion. Therefore, the proportion between parity bytes and info bytes per TB MUST NOT be increased as packet loss increases due to network congestion.

The overall ratio between parity and info bytes MUST NOT be higher than 1:1, i.e. the absolute bitrate spent for redundancy must not be larger than the bitrate required for transmission of the multimedia data itself.

9. Application Statement

There are currently two different schemes proposed for unequal error protection in the IETF-AVT: Unequal Level Protection (ULP) and Unequal Erasure Protection (UXP). Although both methods seem to address the same problem, the proposed solutions differ in many respects. This section tries to describe possible application scenarios and to show the strengths and weaknesses of both approaches.

The main difference between both approaches is that while ULP preserves the structure of the packets which have to be protected and provides the redundancy in extra packets, UXP interleaves the info stream which has to be protected, inserts the redundancy information, and thus creates a totally new packet structure.

Another difference concerns multicast compatibility: It cannot be assumed that all future terminals will be able to apply UXP/ULP. Therefore, backward compatibility could be an issue in some cases. Since ULP does not change the original packet structure, but only adds some extra packets, it is possible for terminals which do not support ULP to discard the extra packets.
In case of UXP, however, two separate streams with and without erasure protection have to be sent, which increases the required bandwidth.

Next, both approaches offer different mechanisms to adjust packet sizes, if necessary: UXP allows the packet sizes to be adjusted arbitrarily. This is an advantage in case the loss probability depends on the packet length, which happens, for example, if the end-to-end connection contains wireless links. In this case proper adjustment of the packet size is one essential network adaptation technique. In addition, if a preencoded stream is sent over the network, the packet size can be adjusted independently of slice structures. Since ULP does not change the existing packetization scheme, this flexibility does not exist.

The ability of UXP to adjust the packet size arbitrarily can be especially exploited in a streaming scenario, if a delay of several hundred milliseconds is acceptable. It is then possible to fill several video frames into a single TB of desired size, e.g. a group of pictures consisting of I-frame, P-frames and B-frames. The redundancy scheme can thus be selected in such a way as to guarantee the following property: In case of packet loss, the streams for P-frames are only recoverable if the I-frame, on which the decoding of P-frames depends, is recoverable. The same is true for B-frames, which can only be decoded if the respective P-frames are recoverable. This prevents situations in which, for example, the B-frames have been received correctly, but the P-frames have been lost, i.e. it assures a gradual decrease in application quality also on the frame level. Of course, a similar encoding is possible with ULP. But in this case one might have to send several frames within one packet, which leads to large packet sizes.

Finally, decoding delay is also a crucial issue in communications.
Again, both approaches have different delay properties: UXP introduces a decoding delay because a sufficient number of correctly received packets is necessary to start decoding of a TB. The delay in general depends on the dimensions of the interleaver. This should be considered for any system design which includes UXP. With ULP, every correctly received media packet can be decoded right away. However, a significant delay is introduced if packets are corrupted, because in this case one has to wait for several redundancy packets. Thus, the delay is in general dependent on the actual ULP-FEC-packet scheme and cannot be considered in advance during the system design phase.

10. Intellectual Property Considerations

Siemens AG has filed patent applications that might possibly have technical relations to this contribution. On IPR related issues, Siemens AG refers to the Siemens Statement on Patent Licensing, see http://www.ietf.org/ietf/IPR/SIEMENS-General.

11. References

[1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", Request for Comments 2733, Internet Engineering Task Force, Dec. 1999.

[2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan, "Priority encoding transmission", IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1737-1744, Nov. 1996.

[3] Shu Lin and Daniel J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1983.

[4] W. Li, "Fine Granularity Scalability Using Bit-Plane Coding of DCT Coefficients", ISO/IEC JTC1/SC29/WG11, Doc. MPEG98/M4204, Dec. 1998.

[5] G. Blaettermann, G. Heising, and D. Marpe, "A Quality Scalable Mode for H.26L", ITU-T SG16, Q.15, Q15-J24, Osaka, May 2000.

[6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V coding for lossy packet networks - a principle approach", Tech.
Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999.

[7] Guenther Liebl, "Modeling, theoretical analysis, and coding for wireless packet erasure channels", Diploma Thesis, Inst. for Communications Engineering, Munich University of Technology, 1999.

12. Acknowledgments

Many thanks to Thomas Stockhammer, who initially came up with the idea of unequal erasure protection to improve progressive video transmission over lossy networks.

13. Author's Addresses

Guenther Liebl, Thomas Stockhammer
Institute for Communications Engineering (LNT)
Munich University of Technology
D-80290 Munich
Germany
Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de

Minh-Ha Nguyen, Frank Burkert
Siemens AG - ICM D MP RD MCH 83/81
D-81675 Munich
Germany
Email: {minhha.nguyen,frank.burkert}@mch.siemens.de

Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese
Siemens AG - Corporate Technology CT IC 2
D-81730 Munich
Germany
Email: {marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR IMPLIED; INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.