INTERNET-DRAFT                                            J. Lazzaro
July 7, 2005                                            J. Wawrzynek
Expires: January 7, 2006                                 UC Berkeley

Requirements for a Stage and Studio Multimedia Framework

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 6, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005). All Rights Reserved.

Lazzaro/Wawrzynek                                             [Page 1]

INTERNET-DRAFT                                             7 July 2005

Abstract

Is the IETF multimedia stack appropriate for use in the digital audio equipment found in recording studios and concert halls? To help answer this question, this memo lists the requirements for a session management framework for stage and studio devices.

Table of Contents

   1. Introduction
   2. Discovery
   3. Bidirectional Heterogeneous Media Flows
   4. Fine-Grained Media Selection and Control
   5. Presentation and Capture Timing
   6. Sample Accurate Signal Processing
   7. Session Chaining
   8. Multicast Support
   9. Discussion
   10. Acknowledgements
   11. Security Considerations
   12. IANA Considerations
   13. References
      13.1 Normative References
      13.2 Informative References
   14. Authors' Addresses
   15. Intellectual Property Rights Statement
   16. Full Copyright Statement
   17. Change Log

1. Introduction

Digital technology has made a deep impact on how contemporary music is performed and how all audio content is produced. Microprocessors are ubiquitous on stage and in the recording studio: a few in personal computers, but most in embedded systems.

However, Internet technologies have not yet made real inroads into the stage and studio world. Digital media flows between computers largely occur via USB, Firewire, and specialized digital transports (S/PDIF and AES synchronous protocols, and customized versions of Ethernet). Why hasn't the IETF content-streaming protocol suite (RTP [1] and its payload formats, SDP [2], and RTSP [4]) found a home in this world?

This memo is an attempt to start a discussion on this topic. We list the requirements for a framework for stage and studio applications. To highlight the challenges these requirements pose, an ASSESSMENT heading in each section discusses how a strawman IETF architecture would handle the requirement. Reference [3] describes the strawman architecture in detail.
In the strawman architecture, stage and studio devices are Real Time Streaming Protocol (RTSP) servers, and operating systems access devices via RTSP clients. RTSP is in widespread use as a session manager for audio and video content-streaming on the Internet [4].

2. Discovery

Today, adding a new digital device to a recording studio is (usually) easy. Most devices use USB, and are USB class-compliant for audio and MIDI. A user connects the new device to a computer using a USB cable, and the operating system detects its presence. Upon user request, applications display the audio and MIDI sources and sinks for all attached devices.

This level of automatic discovery is a requirement for a stage and studio Internet framework. Users should be able to add a device to a wired or wireless LAN, and shortly thereafter the audio and MIDI inputs and outputs of the device should be accessible to applications. Users should not have to do anything (install drivers, manually set up network sessions, etc.) to make this happen.

ASSESSMENT

Automatic discovery at the application level should not be a challenge for our strawman architecture. DNS-based Service Discovery [5] and Multicast DNS [6] may be used to advertise services over link-local multicast. The advertisement will point to the network address and port of the device's RTSP server. The framework should specify a set of normative RTSP URLs that clients may access with the DESCRIBE method to discover what the device does and how to access it. Depending on the design, the body returned by DESCRIBE may use the Session Description Protocol or some other new or existing format.

3. Bidirectional Heterogeneous Media Flows

Many stage and studio devices support input and output of several types of media.
For example, a breakout box might send 8 channels of audio input onto the network (originating from the 8 analog audio input jacks on the box), receive 8 channels of audio output from the network (which it would send to the 8 analog audio output jacks on the box), and also support several pairs of MIDI input and output jacks.

Support for bidirectional heterogeneous media flows is a requirement for a stage and studio Internet framework. Thus, the framework concerns sessions (bundles of "wires"), not individual "wires" (as would a framework that provided a "virtual patchbay" service).

ASSESSMENT

Heterogeneous flows are simple in our strawman architecture, as session descriptions support the synchronized transport of audio, MIDI, and other media types. Bidirectional flows are a challenge for our strawman architecture, because RTSP's SDP session descriptions are recvonly by convention, and the semantics of control URLs deeply reflect this assumption.

4. Fine-Grained Media Selection and Control

Stage and studio devices that use USB or Firewire permit fine-grained control and selection of media. For example, an 8-input/8-output USB breakout box sends a descriptive name for each channel, for presentation to the user. Users may dynamically select which I/O channels send or receive media flows (and at which sample rates and bit depths) via applications or operating-system utilities.

Fine-grained media selection and control is a requirement for a stage and studio Internet framework.

ASSESSMENT

Fine-grained media selection and control are challenges for our strawman architecture. RTSP's SDP conventions do provide some tools: clients can choose to use a subset of offered control URLs, and clients sending an audio stream can signal the use of one of a small set of sample rate and bit-depth combinations via the payload type.
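As a rough illustration of the payload-type mechanism, the Python sketch below parses the a=rtpmap lines of a hypothetical SDP media description for an 8-channel breakout box and maps each payload type to a sample rate, bit depth, and channel count. The offer in SDP_EXAMPLE and the helper names parse_rtpmap and payload_type_for are assumptions made for illustration only; they are not part of the strawman architecture. The bit depths follow the linear PCM encoding names L8 and L16 (RFC 3551) and L24 (RFC 3190).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Bit depth implied by linear PCM encoding names:
# L8 and L16 are defined in RFC 3551, L24 in RFC 3190.
PCM_BITS = {"L8": 8, "L16": 16, "L24": 24}

# Hypothetical SDP media description offered by an 8-channel breakout box.
SDP_EXAMPLE = """\
m=audio 5004 RTP/AVP 96 97 98
a=rtpmap:96 L16/44100/8
a=rtpmap:97 L24/48000/8
a=rtpmap:98 L24/96000/8
"""


@dataclass(frozen=True)
class AudioFormat:
    encoding: str
    sample_rate: int
    channels: int
    bits: int


def parse_rtpmap(sdp_media: str) -> Dict[int, AudioFormat]:
    """Map each payload type declared by an a=rtpmap line to its PCM format."""
    formats: Dict[int, AudioFormat] = {}
    for line in sdp_media.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
        pt_text, _, spec = line[len("a=rtpmap:"):].partition(" ")
        parts = spec.strip().split("/")
        encoding = parts[0].upper()
        if encoding not in PCM_BITS or len(parts) < 2:
            continue  # skip non-PCM encodings, whose bit depth is not implied
        # SDP convention: the channel count may be omitted when it is 1.
        channels = int(parts[2]) if len(parts) > 2 else 1
        formats[int(pt_text)] = AudioFormat(
            encoding, int(parts[1]), channels, PCM_BITS[encoding]
        )
    return formats


def payload_type_for(
    formats: Dict[int, AudioFormat], rate: int, bits: int, channels: int
) -> Optional[int]:
    """Return the offered payload type for a configuration, or None if absent."""
    for pt, fmt in sorted(formats.items()):
        if (fmt.sample_rate, fmt.bits, fmt.channels) == (rate, bits, channels):
            return pt
    return None


if __name__ == "__main__":
    offered = parse_rtpmap(SDP_EXAMPLE)
    print(payload_type_for(offered, 48000, 24, 8))  # 97
    print(payload_type_for(offered, 88200, 24, 8))  # None: not offered
```

In this sketch, a sender picks the payload type that matches its current configuration; a combination absent from the offer, such as 88.2 kHz in the example, has no payload type and cannot be signalled this way.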
However, these tools do not cover all situations, and using them for devices with a large number of inputs and outputs is unwieldy.

5. Presentation and Capture Timing

Stage and studio users expect to control the presentation time of audio outputs and the capture time of audio inputs. More specifically, users expect all audio inputs and outputs of a breakout box to use the same sample clock. To support the use of multiple breakout boxes, users expect a breakout box to accept a remote sample clock input, and to generate an output sample clock for other devices to use. Users expect the input and output latencies of breakout boxes to be deterministic (within the limits of clock jitter) and knowable (via signalling, a user manual, or measurement).

Presentation and capture timing control is a requirement for a stage and studio Internet framework.

ASSESSMENT

These issues are a challenge for our strawman architecture. Traditionally, RTP considers sender and receiver behaviors in these respects to be outside its domain. RTSP and SDP do not offer tools for signalling this sort of timing information during session setup.

6. Sample Accurate Signal Processing

In a breakout box, the input and output flows are independent. However, not all stage and studio devices fit the breakout box model. In some stage and studio devices, the output flows are produced in response to input flows. We refer to these devices as "signal processors".

For example, a reverberation unit is a signal processor: it accepts a "dry" audio input sample stream and generates a "wet" audio output sample stream from its input. As a second example, a music synthesizer is a signal processor: it accepts a MIDI input stream and generates an audio output sample stream from its MIDI input.

With few exceptions, current implementations of signal processor hardware devices are not "sample-accurate".
In other words, for a reverb unit, there is no way to know at the transport level that a particular output audio sample corresponds to a particular input audio sample. In current practice, audio engineers work around the lack of sample-accurate transport by estimating a nominal latency for the device. However, "change agents" in the stage and studio community want transport-level sample-accurate signal processing, as it would bring exact repeatability to the studio workflow.

Sample-accurate signal processing is a requirement for a stage and studio Internet framework.

ASSESSMENT

Signal processors in general, and sample-accurate signal processing in particular, are challenges for our strawman architecture. SDP does not have the semantics for expressing that an output flow depends on an input flow. Whether RTP is presently capable of sample-accurate signal processing is controversial: some would say that the NTP timestamps in RTCP packets are sufficient, while others would argue for the need to label particular RTP timestamps with a timecode value (via RTP or RTCP).

7. Session Chaining

Stage and studio devices are often connected in serial and parallel configurations. For audio devices that support S/PDIF and similar protocols, devices may be interconnected manually using cables, or electronically using digital patchbays. For devices that support USB or Firewire, a program on a personal computer usually simulates a patchbay.

In an Internet framework, serial and parallel interconnections could be expressed within the description of a single session. This functionality (which we call "session chaining") is a requirement for a stage and studio Internet framework.

ASSESSMENT

Simple forms of session chaining would not be difficult to add to the strawman architecture. Simple session chaining is within the expressive power of SDP, as the connection information is specified on a per-RTP-session basis.

8. Multicast Support

Stage and studio devices that network using shared wires sometimes use the shared physical fabric to do multicasting. For example, powered speakers may have a feed-through port for their audio input, so that a set of speakers may be driven by a single daisy-chained wire.

The potential use of multicast for media flows is a requirement for a stage and studio Internet framework.

ASSESSMENT

Multicast is supported by our strawman architecture, as multicast is within the expressive power of SDP.

9. Discussion

As this memo is an early draft of an individual submission, the requirements listed above reflect the views of the authors. One purpose of making this I-D a working-group item is to elicit feedback from the stage and studio community, so that the document evolves to represent a community consensus. At that point, serious evaluation can begin on the appropriateness of using the IETF protocol stack (in its original form or an augmented form) to meet the requirements, and on the interest of the working group in supporting that work.

10. Acknowledgements

This work is a spin-off of the RTP MIDI work in AVT. We thank the RTP MIDI community for insights into the problem domain.

11. Security Considerations

Security requirements for stage and studio protocols will be added to later versions of this I-D.

12. IANA Considerations

None.

13. References

13.1 Normative References

None.

13.2 Informative References

[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.

[2] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", draft-ietf-mmusic-sdp-new-22.txt.

[3] Lazzaro, J. and J. Wawrzynek, "An RTP Payload Format for MIDI", expired version draft-ietf-avt-rtp-midi-format-08.txt, linked at <http://ietfreport.isoc.org/all-ids/draft-ietf-avt-rtp-midi-format-08.txt>. The strawman architecture appears in Appendix C.6.2.
    Note that this section (and the strawman architecture) has been removed from the current version of the RTP MIDI draft.

[4] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.

[5] Cheshire, S. and M. Krochmal, "DNS-based Service Discovery", draft-cheshire-dnsext-dns-sd-03.txt, June 2005.

[6] Cheshire, S. and M. Krochmal, "Multicast DNS", draft-cheshire-dnsext-multicastdns-05.txt, June 2005.

14. Authors' Addresses

John Lazzaro (corresponding author)
UC Berkeley
CS Division
315 Soda Hall
Berkeley CA 94720-1776
Email: lazzaro@cs.berkeley.edu

John Wawrzynek
UC Berkeley
CS Division
631 Soda Hall
Berkeley CA 94720-1776
Email: johnw@cs.berkeley.edu

15. Intellectual Property Rights Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

16. Full Copyright Statement

Copyright (C) The Internet Society (2005).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.

17. Change Log

[Note to RFC Editors: this Appendix, and its Table of Contents listing, should be removed from the final version of the memo]

Initial release.