Internet Engineering Task Force                       Mitsuyuki Hatanaka
Internet Draft                                          Sony Corporation
Document: draft-hatanaka-avt-rtp-atracx-02.txt                 June 2003
Expires: December 30 2003


                     RTP payload format for ATRAC-X


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

Abstract

   This document describes an RTP payload format for the efficient and
   flexible transport of ATRAC-X encoded audio data. ATRAC-X is a
   high-quality audio coding technology that supports multiple
   channels. The payload format presented in this document supports
   metadata, data fragmentation, and continued decoding even when
   packets are lost.

1. Introduction

   ATRAC-X is a state-of-the-art perceptual audio coding technology and
   the successor of ATRAC and ATRAC3. ATRAC technology has been used in
   MiniDisc (MD), NetMD, and Memory Stick Audio products.
   Improvements over previous versions of ATRAC include:

   - Higher sound quality at lower bit rates
   - A wide range of bit rates, from 8 kbps to 1.4 Mbps
   - Support for multichannel coding
   - A flexible format for future extensions
   - Suitability for streaming, including scalability and fixed frame
     lengths

Hatanaka                                                        [Page 1]

INTERNET-DRAFT        draft-hatanaka-avt-rtp-atracx-02.txt      June 2003

   Because ATRAC-X is modular and portable, it can be used in a wide
   range of applications and platforms.

1.1 Overview of ATRAC-X

   ATRAC-X can deliver from one (monaural) to 7.1 channels of audio at
   bit rates from 8 kbps to 1.4 Mbps. Sampling rates of 32 kHz,
   44.1 kHz and 48 kHz are currently supported, and rates of up to
   96 kHz are planned. Because ATRAC-X uses a flexible format, future
   extensions may include better-than-CD quality and wider audio
   bandwidth.

   Like other perceptual audio coding algorithms, ATRAC-X is based on
   time/frequency mapping. However, it incorporates new techniques that
   allow more precise signal scaling for QoS.

1.2 Overview of ATRAC-X streaming on RTP

   The basic building block for ATRAC-X streaming on RTP is the ATRAC-X
   "segment". Each segment contains the current ATRAC-X encoded audio
   data and metadata, as well as any necessary redundant data. ATRAC-X
   segments also include a fragmentation mechanism so that packets do
   not exceed the MTU.

   Multiple ATRAC-X streams can be transmitted over a single RTP
   session by sending multiple segments within each ATRAC-X "slot". A
   slot is our term for the interval of time to which the audio data
   carried in its segments belongs. Figure 1 visualizes this concept.

   +------0--------1--------2--------3----> ATRAC-X Segment
   |   +-----+  +-----+  +-----+  +-----+
   0   |  N  |  |  N  |  |  N  |  |  N  |  ..
   |   +-----+  +-----+  +-----+  +-----+
   |   +-----+  +-----+  +-----+  +-----+
   1   | N+1 |  | N+1 |  | N+1 |  | N+1 |  ..
   |   +-----+  +-----+  +-----+  +-----+
   |   +-----+  +-----+  +-----+  +-----+       +-----+
   2   | N+2 |  | N+2 |  | N+2 |  | N+2 |  ..   |  n  | = ATRAC-X Segment
   |   +-----+  +-----+  +-----+  +-----+       +-----+   with sequence n
   |      :        :        :        :
   V
   time ("slot")

     Figure 1: ATRAC-X RTP Multiplexed Packetization Streaming Concept

   Figures 4 and 5 give more specific examples of this general scheme.
   The scheme supports various content distribution methods, including
   streams with a large number of audio channels.

2. Payload Format

2.1 ATRAC-X Full Payload Visualization

   The complete structure of the ATRAC-X RTP payload format is shown
   below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|   FRSEQNO   |ElementID|C|FragNo |NmSeg| TCC |   BCP   |
   |Priority |NF(=2) |RNF(=2)|RNMD(=1) |     Time Stamp Offset     |
   |NMD(=1)  | RSV |             MDID              |     MDLEN     |
   |   |    RSV    |                                               |
   |                         META-DATA(1)                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                  ATRAC-X Main Frame Data(1)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                  ATRAC-X Main Frame Data(2)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Rd MDID            |    Rd MDID_LEN    |    RSV    |
   |                      Redundant META-DATA                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                ATRAC-X Redundant Frame Data(1)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                ATRAC-X Redundant Frame Data(2)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2: ATRAC-X RTP Payload Format

2.2 ATRAC-X Specific Data

   The part of the payload specific to ATRAC-X is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|   FRSEQNO   |ElementID|C|FragNo |NmSeg| TCC |   BCP   |
   |Priority |NF(=N) |RNF(=0)|RNMD(=0) |     Time Stamp Offset     |
   |NMD(=0)  | RSV |             LENGTH              |     RSV     |
   |                                                               |
   |                  ATRAC-X Main Frame Data(1)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |                  ATRAC-X Main Frame Data(2)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |                         ............                          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |                  ATRAC-X Main Frame Data(N)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: ATRAC-X Main Data format

2.3 Payload Header Description

   - Version: Version number (4 bits)

     If the receiver supports the version given in the payload header,
     the transmitted packets are parsed and reconstructed; otherwise
     the receiver may discard them. Receivers may support more than one
     version of this protocol if desired.

   - FRSEQNO: Frame Sequence Number (7 bits)

     FRSEQNO is the frame sequence number. It runs from 0 to 127 and
     then wraps around to 0.

   - ElementID: ATRAC-X bit stream Element ID (5 bits)

     ElementID identifies each individual ATRAC-X bit stream. One
     ATRAC-X RTP session can carry up to 32 ATRAC-X streams
     simultaneously. ElementIDs allow finer control of content
     distribution, such as flexibility in the number of channels and
     QoS management.

   - NmSeg: Number of Segments in one ATRAC-X slot (3 bits)

     NmSeg indicates the number of ATRAC-X segments in one ATRAC-X
     slot. NmSeg must be identical for all segments within one ATRAC-X
     slot. The maximum value of NmSeg is determined at the application
     level or negotiated between receiver and sender before content
     transmission, using the Session Description Protocol (SDP).

   - Priority: Priority identifier (5 bits)

     Priority denotes the relative priority of segments that are in the
     same slot and have the same ElementID. Lower values denote higher
     priority. Priority values are not absolute; they are relative to
     each other within one ATRAC-X slot. Priority values need not be
     unique, and it is up to the receiver to decide how to process
     segment priorities.
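   The fields above, together with the remaining per-segment fields,
   occupy the first two 32-bit words of every ATRAC-X segment. The
   Python sketch below packs and unpacks them. It assumes, following
   the usual RTP diagram convention, that the fields are laid out most
   significant bit first, in network byte order, exactly in the order
   drawn in Figures 2 and 3, starting at the first byte of the RTP
   payload. The class and function names are illustrative only. The
   helper redundant_timestamp applies the Section 5 rule (RTP
   timestamp minus TimeStampOffset) with the usual modulo 2^32
   wrap-around of RTP timestamps.

```python
from dataclasses import dataclass

# Field names and widths in bits, in the order drawn in Figures 2 and 3.
# Assumption: MSB-first packing in network byte order.
LAYOUT = [
    ("version", 4), ("frseqno", 7), ("element_id", 5), ("c", 1),
    ("frag_no", 4), ("nm_seg", 3), ("tcc", 3), ("bcp", 5),
    ("priority", 5), ("nf", 4), ("rnf", 4), ("rnmd", 5),
    ("ts_offset", 14),
]


@dataclass
class SegmentHeader:
    version: int
    frseqno: int
    element_id: int
    c: int
    frag_no: int
    nm_seg: int
    tcc: int
    bcp: int
    priority: int
    nf: int
    rnf: int
    rnmd: int
    ts_offset: int


def pack_header(h: SegmentHeader) -> bytes:
    """Pack the 64-bit header, rejecting values that overflow a field."""
    word = 0
    for name, width in LAYOUT:
        value = getattr(h, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(8, "big")


def parse_header(payload: bytes) -> SegmentHeader:
    """Unpack the first 8 bytes of an ATRAC-X payload."""
    if len(payload) < 8:
        raise ValueError("payload shorter than the 8-byte segment header")
    word = int.from_bytes(payload[:8], "big")
    shift = 64
    values = {}
    for name, width in LAYOUT:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return SegmentHeader(**values)


def frseqno_distance(older: int, newer: int) -> int:
    """Forward distance between two FRSEQNO values (wraps after 127)."""
    return (newer - older) % 128


def redundant_timestamp(rtp_timestamp: int, ts_offset: int) -> int:
    """Section 5: subtract TimeStampOffset, modulo the 32-bit RTP clock."""
    return (rtp_timestamp - ts_offset) % (1 << 32)
```

   Packing then parsing returns the original header, and
   frseqno_distance(127, 0) is 1 because FRSEQNO wraps after 127.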
    ____________   ____________   ____________   ____________
   |  ATRAC-X   | |  ATRAC-X   | |  ATRAC-X   | |  ATRAC-X   |
   |7.1 (12.2)ch| |5.1 (12.2)ch| |7.1 (12.2)ch| |5.1 (12.2)ch|
   |  384kbps   | |  256kbps   | |  384kbps   | |  256kbps   |
   |FRSEQNO:N   | |FRSEQNO:N   | |FRSEQNO:N+1 | |FRSEQNO:N+1 |
   |ElementID:0 | |ElementID:1 | |ElementID:0 | |ElementID:1 |
   |<---------->| |<---------->| |<---------->| |<---------->|
      ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
     Segment(1)     Segment(2)     Segment(1)     Segment(2)
   |<---------------------------->|<----------------------------->|
         ATRAC-X Slot -Nth-               ATRAC-X Slot -N+1th-

     Figure 4: Transmission of more than 7.1ch (12.2ch) ATRAC-X bit
       streams using two individual streams in one ATRAC-X RTP Payload

   Figure 4 shows an example packetization of a 12-channel ATRAC-X bit
   stream using two individual streams. We define the "n-th ATRAC-X
   slot" as the set of ATRAC-X segments that have the identical frame
   sequence number n. In this case, each ATRAC-X slot consists of two
   ATRAC-X segments. One segment contains a 384 kbps bit stream for the
   first 7.1 channels, and the other contains a 256 kbps bit stream for
   the remaining 5.1 channels.

   - NF: Number of ATRAC-X audio Frames (4 bits)

     NF denotes the number of ATRAC-X audio frames in one ATRAC-X
     segment, with a maximum of 15. When only metadata is transmitted,
     NF must be set to 0.

   - TCC: Total Channel Configuration (3 bits)

     TCC indicates the ATRAC-X channel configuration as defined in
     Table 1. A single ATRAC-X stream supports multichannel coding of
     up to 8 channels through a combination of stereo and monaural
     channel blocks. When the channel blocks are split across
     segments, receivers can select only the packets they need for
     partial decoding. Another benefit is that lost channel data can be
     concealed by using another channel block's data for decoding.
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |TCC Index| Number of | Audio Channel Block  | Default block for    |
   |         | Speakers  | Groupings            | speaker mapping      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    1    |     1     | mono_channel_block   | front: center        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    2    |     2     | stereo_channel_block | front: left, right   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    3    |     3     | stereo_channel_block | front: left, right   |
   |         |           | mono_channel_block   | front: center        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    4    |     4     | stereo_channel_block | front: left, right   |
   |         |           | mono_channel_block   | front: center        |
   |         |           | mono_channel_block   | rear: surround       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    5    |    5+1    | stereo_channel_block | front: left, right   |
   |         |           | mono_channel_block   | front: center        |
   |         |           | stereo_channel_block | rear: left, right    |
   |         |           | mono_channel_block   | low frequency effects|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    6    |    6+1    | stereo_channel_block | front: left, right   |
   |         |           | mono_channel_block   | front: center        |
   |         |           | stereo_channel_block | rear: left, right    |
   |         |           | mono_channel_block   | rear: center         |
   |         |           | mono_channel_block   | low frequency effects|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    7    |    7+1    | stereo_channel_block | front: left, right   |
   |         |           | mono_channel_block   | front: center        |
   |         |           | stereo_channel_block | rear: left, right    |
   |         |           | stereo_channel_block | side: left, right    |
   |         |           | mono_channel_block   | low frequency effects|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Table 1: Total Channel Configuration Index values

   - BCP: Block Coupling Pattern (5 bits)

     BCP indicates which ATRAC-X channel blocks within a group are
     contained in an ATRAC-X segment. Given the TCC Index value, the
     Nth bit from the left indicates the Nth channel block, counting
     down the list of channel blocks in the third column of Table 1. If
     the TCC value is 1 or 2, BCP must be set to "00000". The
     combination of ATRAC-X channel blocks must be chosen from those
     listed in the third column of Table 1. The following examples
     clarify this terminology.

    ___________   ____________   ___________   ___________
   |  ATRAC-X  | |  ATRAC-X   | |  ATRAC-X  | |  ATRAC-X  |
   | Front L,R | |Front Center| | Rear L,R  | |    LFE    |
   |FRSEQNO:N  | |FRSEQNO:N   | |FRSEQNO:N  | |FRSEQNO:N  |
   |ElementID:0| |ElementID:0 | |ElementID:0| |ElementID:0|
   |Priority:0 | |Priority:1  | |Priority:0 | |Priority:1 |
   |NF:1       | |NF:1        | |NF:1       | |NF:1       |
   |TCC:5      | |TCC:5       | |TCC:5      | |TCC:5      |
   |BCP:10000  | |BCP:01000   | |BCP:00100  | |BCP:00010  |
   |<--------->| |<---------->| |<--------->| |<--------->|
     ATRAC-X        ATRAC-X        ATRAC-X       ATRAC-X
    Segment(1)     Segment(2)     Segment(3)    Segment(4)
   |<------------------------------------------------------->|
                        ATRAC-X Slot -Nth-

     Figure 5: Dividing 5.1 ATRAC-X data into four ATRAC-X Segments

   Figure 5 illustrates an example sequence of ATRAC-X segments that
   uses the BCP field. All segments belong to the same ATRAC-X stream
   and therefore have the same ElementID value of 0. As listed in
   Table 1, a TCC value of 5 means that the audio data comes from a 5.1
   multichannel source. In this example, the data is split into four
   ATRAC-X segments, one for each ATRAC-X channel block (Front L,R;
   Front Center; Rear L,R; and LFE), with BCP values of 10000, 01000,
   00100 and 00010, respectively. Front L,R and Rear L,R are assigned a
   higher priority.

   In some cases the LFE channel block is small, so it can be combined
   with another channel block in one segment for greater transmission
   efficiency.
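   The BCP rule can be sketched in Python as follows. TCC_BLOCKS
   transcribes Table 1, listing each configuration's channel blocks in
   the order the BCP bits refer to them; the function name is
   hypothetical. The text does not say how a BCP of "00000" should be
   read for TCC values 3 to 7, so this sketch simply returns no blocks
   in that case.

```python
# Table 1, transcribed: channel blocks for each TCC index, in the order
# that the BCP bits (counted from the left) refer to them.
TCC_BLOCKS = {
    1: [("mono", "front: center")],
    2: [("stereo", "front: left, right")],
    3: [("stereo", "front: left, right"), ("mono", "front: center")],
    4: [("stereo", "front: left, right"), ("mono", "front: center"),
        ("mono", "rear: surround")],
    5: [("stereo", "front: left, right"), ("mono", "front: center"),
        ("stereo", "rear: left, right"), ("mono", "low frequency effects")],
    6: [("stereo", "front: left, right"), ("mono", "front: center"),
        ("stereo", "rear: left, right"), ("mono", "rear: center"),
        ("mono", "low frequency effects")],
    7: [("stereo", "front: left, right"), ("mono", "front: center"),
        ("stereo", "rear: left, right"), ("stereo", "side: left, right"),
        ("mono", "low frequency effects")],
}
BCP_BITS = 5


def blocks_in_segment(tcc: int, bcp: int) -> list:
    """Return the channel blocks a segment carries, given TCC and BCP."""
    if tcc not in TCC_BLOCKS:
        raise ValueError(f"TCC {tcc} is not defined in Table 1")
    blocks = TCC_BLOCKS[tcc]
    if tcc in (1, 2):
        # Single-block configurations: BCP must be 00000 and the
        # segment carries the whole stream.
        if bcp != 0:
            raise ValueError("BCP must be 00000 when TCC is 1 or 2")
        return list(blocks)
    # Low-order BCP bits that correspond to no block in this TCC.
    unused = (1 << (BCP_BITS - len(blocks))) - 1
    if not 0 <= bcp < (1 << BCP_BITS) or bcp & unused:
        raise ValueError(f"BCP {bcp:05b} selects a block not in TCC {tcc}")
    # Bit n from the left (n = 0 is the MSB) selects the n-th block.
    return [block for n, block in enumerate(blocks)
            if bcp & (1 << (BCP_BITS - 1 - n))]
```

   For example, blocks_in_segment(5, 0b00110) returns the rear stereo
   block and the LFE block, and a combined BCP is simply the bitwise OR
   of the single-block patterns (00100 | 00010 = 00110).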
   Figure 6 illustrates an example of combining channel blocks into a
   segment. In this case, the BCP value of each ATRAC-X segment is set
   as follows:

      Front LR block:        BCP = 10000
      Front Center block:    BCP = 01000
      Rear LR + LFE block:   BCP = 00110

    ___________   ____________   _____________
   |  ATRAC-X  | |  ATRAC-X   | |   ATRAC-X   |
   | Front L,R | |Front Center| |Rear L,R+LFE |
   |FRSEQNO:N  | |FRSEQNO:N   | |FRSEQNO:N    |
   |ElementID:0| |ElementID:0 | |ElementID:0  |
   |Priority:0 | |Priority:1  | |Priority:0   |
   |NF:1       | |NF:1        | |NF:1         |
   |TCC:5      | |TCC:5       | |TCC:5        |
   |BCP:10000  | |BCP:01000   | |BCP:00110    |
   |<--------->| |<---------->| |<----------->|
     ATRAC-X        ATRAC-X         ATRAC-X
    Segment(1)     Segment(2)      Segment(3)
   |<------------------------------------------->|
                  ATRAC-X Slot -Nth-

      Figure 6: Combining rear LR and LFE channel blocks into an
                ATRAC-X Segment

   The ATRAC-X RTP payload format can also carry a mixture of divided
   and non-divided ATRAC-X streams. Figure 7 illustrates an example of
   sending divided and non-divided streams.

    ___________   ____________   ____________   ___________
   | ATRAC-X(1)| | ATRAC-X(1) | | ATRAC-X(1) | | ATRAC-X(2)|
   | Front L,R | |Front Center| |Rear L,R+LFE| | Front L,R |
   |FRSEQNO:N  | |FRSEQNO:N   | |FRSEQNO:N   | |FRSEQNO:N  |
   |ElementID:0| |ElementID:0 | |ElementID:0 | |ElementID:1|
   |Priority:0 | |Priority:1  | |Priority:0  | |Priority:0 |
   |NF:1       | |NF:1        | |NF:1        | |NF:1       |
   |TCC:5      | |TCC:5       | |TCC:5       | |TCC:2      |
   |BCP:10000  | |BCP:01000   | |BCP:00110   | |BCP:00000  |
   |<--------->| |<---------->| |<---------->| |<--------->|
     ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
    Segment(1)     Segment(2)     Segment(3)     Segment(4)
   |<-------------------------------------------------------->|
                        ATRAC-X Slot -Nth-

     Figure 7: Sending a mixture of divided and non-divided ATRAC-X
               streams

   ATRAC-X segments (1) through (3) carry a divided 5.1 channel stream,
   and segment (4) carries a non-divided stereo stream. (Note that the
   BCP for segment (4) must be "00000".)
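   A receiver can combine FRSEQNO, NmSeg, ElementID, TCC and BCP to
   decide whether a slot is complete and which channel blocks of a
   stream are missing, for example before applying concealment. The
   Python sketch below assumes that fragmented segments have already
   been reassembled, so each entry stands for one whole segment. The
   Segment tuple, the function names, and the NUM_BLOCKS table (the
   number of channel blocks per TCC index, counted from Table 1) are
   illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    frseqno: int
    element_id: int
    nm_seg: int
    tcc: int
    bcp: int


# Number of channel blocks per TCC index, counted from Table 1.
NUM_BLOCKS = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5}


def group_slots(segments: List[Segment]) -> Dict[int, List[Segment]]:
    """The n-th slot is the set of segments whose FRSEQNO is n."""
    slots: Dict[int, List[Segment]] = defaultdict(list)
    for seg in segments:
        slots[seg.frseqno].append(seg)
    return dict(slots)


def slot_is_complete(slot: List[Segment]) -> bool:
    """True once as many segments arrived as NmSeg announces."""
    nm_segs = {seg.nm_seg for seg in slot}
    if len(nm_segs) != 1:
        raise ValueError("NmSeg must be identical within one slot")
    return len(slot) == nm_segs.pop()


def missing_blocks(slot: List[Segment], element_id: int) -> int:
    """BCP-style mask of the channel blocks of one stream not received."""
    segs = [seg for seg in slot if seg.element_id == element_id]
    if not segs:
        raise ValueError("no segment of this ElementID in the slot")
    n = NUM_BLOCKS[segs[0].tcc]
    full = ((1 << n) - 1) << (5 - n)  # one bit per block, from the left
    received = 0
    for seg in segs:
        # A TCC 1 or 2 segment (BCP 00000) carries the whole stream.
        mask = full if seg.tcc in (1, 2) else seg.bcp
        if received & mask:
            raise ValueError("channel block received twice in one slot")
        received |= mask
    return full & ~received
```

   With the four segments of Figure 7, slot_is_complete returns True
   and missing_blocks returns 0 for both ElementIDs; dropping the Front
   Center segment makes the slot incomplete, and missing_blocks(slot, 0)
   then returns 0b01000.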
   - LENGTH: Length of ATRAC-X data (17 bits)

     LENGTH gives the size, in bits, of each ATRAC-X frame in an
     ATRAC-X segment. The frame data itself is padded with zero bits up
     to the next byte boundary; these padding bits are ignored when the
     byte-aligned frame data is used.

3. QoS Consideration

   Real-time bit-rate control is a natural step in implementing Quality
   of Service (QoS). The ATRAC-X payload format allows this control
   through the NmSeg parameter.

   Figure 8 illustrates the NmSeg value changing while a 12-channel
   ATRAC-X bit stream is transmitted. At first, two ATRAC-X segments,
   containing the first 7.1ch and the remaining 5.1ch respectively, are
   transmitted in the Nth ATRAC-X slot. The transmission is then
   reduced to only one segment in the (N+1)th and following ATRAC-X
   slots by omitting the latter 5.1ch stream.

    ____________   ____________   ____________   ____________
   |  ATRAC-X   | |  ATRAC-X   | |  ATRAC-X   | |  ATRAC-X   |
   |7.1 (12.2)ch| |5.1 (12.2)ch| |7.1 (12.2)ch| |7.1 (12.2)ch|
   |  384kbps   | |  256kbps   | |  384kbps   | |  384kbps   |
   |FRSEQNO:N   | |FRSEQNO:N   | |FRSEQNO:N+1 | |FRSEQNO:N+2 |
   |ElementID:0 | |ElementID:1 | |ElementID:0 | |ElementID:0 |
   |NmSeg:2     | |NmSeg:2     | |NmSeg:1     | |NmSeg:1     |
   |<---------->| |<---------->| |<---------->| |<---------->|
      ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
     Segment(1)     Segment(2)     Segment(1)     Segment(1)
   |<---------------------------->|<-------------->|<------------>|
           ATRAC-X Slot            ATRAC-X Slot    ATRAC-X Slot
              -Nth-                   -N+1th-         -N+2th-

       Figure 8: Bit-rate control by omitting secondary 5.1ch data

   As another example, Figure 9 illustrates the NmSeg field changing
   while a 5.1ch ATRAC-X bit stream is transmitted. Initially, two
   ATRAC-X segments comprising all channel blocks of a 5.1ch stream are
   transmitted in the Nth ATRAC-X slot.
   Then the transmission is reduced to only the Front L,R and Center
   channel blocks in the (N+1)th and following ATRAC-X slots by
   omitting the Rear L,R and LFE channel blocks.

    ___________   ____________   ____________   ____________
   |  ATRAC-X  | |  ATRAC-X   | |  ATRAC-X   | |  ATRAC-X   |
   | Front L,R | |Rear L,R+LFE| | Front L,R  | | Front L,R  |
   |  +Center  | |            | |  +Center   | |  +Center   |
   |FRSEQNO:N  | |FRSEQNO:N   | |FRSEQNO:N+1 | |FRSEQNO:N+2 |
   |ElementID:0| |ElementID:0 | |ElementID:0 | |ElementID:0 |
   |NmSeg:2    | |NmSeg:2     | |NmSeg:1     | |NmSeg:1     |
   |TCC:5      | |TCC:5       | |TCC:5       | |TCC:5       |
   |BCP:11000  | |BCP:00110   | |BCP:11000   | |BCP:11000   |
   |<--------->| |<---------->| |<---------->| |<---------->|
     ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
    Segment(1)     Segment(2)     Segment(1)     Segment(1)
   |<--------------------------->|<-------------->|<------------>|
           ATRAC-X Slot           ATRAC-X Slot    ATRAC-X Slot
              -Nth-                  -N+1th-         -N+2th-

       Figure 9: Bit-rate control by omitting some channel blocks

4. Metadata

   The ATRAC-X RTP payload format supports the inclusion of metadata.
   Metadata can be used to control the playback of ATRAC-X data as it
   is streamed in real time, or simply to carry supplemental
   information. Example uses include downmix parameters, speaker
   configuration settings, and effects such as panning and fading. The
   receiver may handle all or part of the metadata, each item of which
   is classified by a unique ID.

   The following fields must be defined in the ATRAC-X RTP payload
   header when metadata is carried:

   - NMD: Number of Metadata frames (5 bits)

     The number of metadata frames included in the RTP packet.

   - MDID: MetaData ID (16 bits)

     A unique ID that indicates the metadata type associated with this
     frame. There are two types of ID.
     The first type of identifier is globally pre-defined for specific
     metadata types, while the second type is for session-specific use
     and is generated and negotiated dynamically between transmitter
     and receiver before the streaming session. The two types are
     distinguished by the MSB of the MDID [...]; otherwise the ID is a
     session-specific one. Thus, 32767 kinds of metadata are available
     for each type of identifier. Currently, all globally pre-defined
     identifiers are reserved and their use is prohibited. The
     negotiation method between transmitter and receiver is outside the
     scope of this document.

   - MDLEN: MetaData LENgth (10 bits)

     The size, in bytes, of the metadata identified by the above
     metadata ID.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           MDID(=N)            |       MDLEN       |    RSV    |
   |                                                               |
   |                         META-DATA (N)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MDID(=N+1)           |       MDLEN       |    RSV    |
   |                                                               |
   |                        META-DATA (N+1)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 9: Metadata segment

5. Redundant data for robustness

   Redundant data can be included in the ATRAC-X RTP payload so that
   receivers can recover from packet loss. ATRAC-X audio frames from
   previous ATRAC-X slots are re-sent as redundant audio data. Metadata
   can also be re-sent as redundant data. Including redundant data in
   the payload is optional.

   When redundant data is transmitted, the following fields must be
   defined in the ATRAC-X RTP payload header:

   - TimeStampOffset: Time Stamp Offset for redundant data (14 bits)

     An unsigned timestamp offset for this ATRAC-X segment relative to
     the timestamp given in the RTP header. Because the offset is
     unsigned, redundant data is always sent after the original data.
     Thus, TimeStampOffset is subtracted from the current RTP timestamp
     to obtain the timestamp of the redundant data.

   - RNF: Number of redundant ATRAC-X audio frames (4 bits)

   - RNMD: Number of redundant ATRAC-X metadata frames (5 bits)

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      TimeStampOffset      |  RNF  |  RNMD   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 10: Control bit field for redundant data

   The following two figures show hypothetical ATRAC-X packets that
   carry both current (main) frame data and redundant data from
   previous frames.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|   FRSEQNO   |ElementID|C|FragNo |NmSeg| TCC |   BCP   |
   |Priority |NF(=3) |RNF(=3)|RNMD(=0) |     Time Stamp Offset     |
   |NMD(=0)  | RSV |             LENGTH              |     RSV     |
   |                                                               |
   |             ATRAC-X Main Frame Data (N th Frame)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |            ATRAC-X Main Frame Data (N+1 th Frame)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |            ATRAC-X Main Frame Data (N+2 th Frame)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (M th Frame)           |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |          ATRAC-X Redundant Frame Data (M+1 th Frame)          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |          ATRAC-X Redundant Frame Data (M+2 th Frame)          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   M is the ATRAC-X frame number corresponding to the time calculated
   as (RTP TimeStamp - TimeStampOffset).
                Figure 11: An example with redundant data

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|   FRSEQNO   |ElementID|C|FragNo |NmSeg| TCC |   BCP   |
   |Priority |NF(=3) |RNF(=3)|RNMD(=1) |     Time Stamp Offset     |
   |NMD(=1)  | RSV |             MDID              |     MDLEN     |
   |   |    RSV    |                                               |
   |                           META-DATA                           |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |             ATRAC-X Main Frame Data (N th Frame)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |            ATRAC-X Main Frame Data (N+1 th Frame)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |            ATRAC-X Main Frame Data (N+2 th Frame)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             MDID              |       MDLEN       |    RSV    |
   |                                                               |
   |                      Redundant META DATA                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (M th Frame)           |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |          ATRAC-X Redundant Frame Data (M+1 th Frame)          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH              |     RSV     |               |
   |                                                               |
   |          ATRAC-X Redundant Frame Data (M+2 th Frame)          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   M is the ATRAC-X frame number corresponding to the time calculated
   as (RTP TimeStamp - TimeStampOffset).

      Figure 12: An example with redundant data and additional metadata

6. Fragmentation

   If the ATRAC-X frame data, metadata and/or redundant data are too
   large to fit into one RTP packet, one ATRAC-X segment can be
   fragmented into sub-segments that are sent in separate packets.
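   Splitting one segment into sub-segments and reassembling it on the
   receiver side can be sketched in Python as follows, using the
   continuous flag C and the fragment number FragNo defined in this
   section. The sketch treats the segment body as opaque bytes and does
   not show how the segment header is repeated in each packet or how
   the rule that metadata appears only in the first fragment is
   enforced. The chunk size, the function names, and the reading of "up
   to 15 fragmentations" as FragNo values 0 to 15 are assumptions.

```python
from typing import List, Optional, Tuple

# Assumption: the 4-bit FragNo numbers fragments 0..15.
MAX_FRAGMENTS = 16

Fragment = Tuple[int, int, bytes]  # (C flag, FragNo, chunk)


def fragment(body: bytes, max_chunk: int) -> List[Fragment]:
    """Split one segment body into fragments of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [body[i:i + max_chunk] for i in range(0, len(body), max_chunk)]
    chunks = chunks or [b""]
    if len(chunks) > MAX_FRAGMENTS:
        raise ValueError("segment needs more fragments than FragNo allows")
    last = len(chunks) - 1
    # C = 1: the data continues in a following packet; C = 0: last piece.
    return [(1 if i < last else 0, i, chunk) for i, chunk in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> Optional[bytes]:
    """Rebuild a segment, or return None while pieces are still missing."""
    ordered = sorted(fragments, key=lambda f: f[1])
    if [frag_no for _, frag_no, _ in ordered] != list(range(len(ordered))):
        return None  # gap or duplicate in FragNo
    flags = [c for c, _, _ in ordered]
    if not ordered or flags[-1] != 0 or not all(flags[:-1]):
        return None  # final piece (C = 0) not seen yet, or flags inconsistent
    return b"".join(chunk for _, _, chunk in ordered)
```

   Segment(3) in Figure 14 corresponds to a two-element result of
   fragment: (C=1, FragNo=0) followed by (C=0, FragNo=1). Sorting by
   FragNo lets the receiver accept fragments in any arrival order.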
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |C|FragNo | RSV |
   +-+-+-+-+-+-+-+-+

           Figure 13: Control bit field for fragmentation

   - C: Continuous flag (1 bit)

     A value of 1 indicates that the data in the current packet
     continues in the following packet(s); a value of 0 indicates that
     the data is complete in the current packet.

   - FragNo: Fragmentation Number (4 bits)

     The sequence number of each packet within a fragmented segment. Up
     to 15 fragmentations are supported. To avoid conflicts during
     fragmentation, metadata may appear only in the first fragment
     (FragNo = 0).

    ___________   ____________   ____________   ____________
   | Front L,R | |Front Center| |Rear L,R+LFE| |Rear L,R+LFE|
   |           | |            | |            | |Fragmented  |
   |FRSEQNO:N  | |FRSEQNO:N   | |FRSEQNO:N   | |FRSEQNO:N   |
   |ElementID:0| |ElementID:0 | |ElementID:0 | |ElementID:0 |
   |Priority:0 | |Priority:1  | |Priority:0  | |Priority:0  |
   |NF:1       | |NF:1        | |NF:1        | |NF:1        |
   |TCC:5      | |TCC:5       | |TCC:5       | |TCC:5       |
   |BCP:10000  | |BCP:01000   | |BCP:00110   | |BCP:00110   |
   |C:0        | |C:0         | |C:1         | |C:0         |
   |FragNo:0   | |FragNo:0    | |FragNo:0    | |FragNo:1    |
   |<--------->| |<---------->| |<-------------------------->|
     ATRAC-X        ATRAC-X                ATRAC-X
    Segment(1)     Segment(2)             Segment(3)
   |<--------------------------------------------------------->|
                        ATRAC-X Slot -Nth-

    Figure 14: An example of fragmentation in 5.1ch ATRAC-X Segment(3)

7. RTP Standard Header

   The RTP header timestamp is the presentation time of the first
   ATRAC-X frame in a segment, expressed as a PCM sample number counted
   from the beginning of the content. The initial timestamp value is
   arbitrary, but a random value is preferable.

   The marker bit is set to 1 for the last packet in each ATRAC-X slot
   and to 0 otherwise.

   Remarks: All ATRAC-X bit streams carried in one ATRAC-X RTP payload
   format session must use an identical sampling frequency to avoid
   timestamp conflicts.

8. Multicasting Consideration

   This payload format can be used for both unicast and multicast
   sessions. However, for multicast sessions the QoS functions
   described in Section 3, "QoS Consideration", should currently be
   disabled to avoid conflicting packet handling in multicast
   transmission.

9. Glossary

   (1) ATRAC-X Audio Frame: The smallest unit of ATRAC-X data,
       equivalent to 2048 PCM samples (as defined in the ATRAC-X
       specification).

   (2) ATRAC-X Channel Block: A unit representing how audio signals are
       contained. Two types of channel block exist: the
       "mono_channel_block", which represents one monaural channel, and
       the "stereo_channel_block", which represents one pair of stereo
       channels. A complete bit stream containing more than two
       channels is constructed from a combination of these two types of
       channel block. The possible combinations are defined in Table 1.

   (3) ATRAC-X Segment: A unit of ATRAC-X data that is sent inside an
       RTP packet. A segment consists of any combination of audio
       frames, metadata frames, redundant metadata frames, and
       redundant audio frames.

   (4) ATRAC-X Slot: A unit of time to which all audio frames of an
       ATRAC-X segment belong. For example, in Figure 4, two segments
       make up the Nth ATRAC-X slot. However, because these two
       segments are from [...] would play in the same amount of time.
       As another example, in Figure 5, four segments make up the Nth
       ATRAC-X slot; because the decoded audio samples from each
       segment would all play at the same time, they are in the same
       slot.

10. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1]. This implies that confidentiality of the media
   streams is achieved by encryption.
   Because the data compression used with this payload format is
   applied end-to-end, encryption may be performed on the compressed
   data, so there is no conflict between the two operations.

11. References

   [1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP:
       A Transport Protocol for Real-Time Applications", RFC 1889,
       January 1996.

12. Authors' Addresses

   Mitsuyuki Hatanaka
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: hatanaka@av.crl.sony.co.jp

   Jun Matsumoto
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: jun@av.crl.sony.co.jp

   Matthew Romaine
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: Matthew.Romaine@jp.sony.com