Conference Recording

  1. Which software are we extending? sipconf
  2. Features to be supported and priorities? Recording using any combination of AU, WAV, and rtpdump formats. Option to do local file-based recording or to use an external rtspd media server. Playing back a recorded stream into an existing conference. All configuration should be via web interface and SQL database. Logging of participants joining or leaving. Lower priority: TCP-based media in librtsp and rtspd.
  3. Existing libraries or modules? sipconf and related libraries like RTP.
  4. Programming language? C (and maybe C++)
  5. Operating system? Solaris/Linux
  6. GUI tool kit? web cgi for last part of the project.
  7. Component testing? Using Quicktime, sipc and RAT.
  8. Error logging? not applicable.
  9. Installation mechanism? integrated into cinema (sipconf and rtspd)
  10. Milestones? described in this document.
A centralized conference has a conference server to which all the participants connect. The conference server mixes and redistributes the multimedia data from the participants.

Henning's comments (Sept 26)

Note that there should be no commandline options. Recording (on/off) and format should be configured through the SQL interface. It must be possible to record in multiple formats simultaneously. Since I was in mysql, I added a column 'recordingformat' as a SET.
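
As a rough sketch of how sipconf might interpret the 'recordingformat' column: MySQL returns a SET value as a comma-separated string, which can be folded into a bitmask so that several formats can be active at once. The member names "au", "wav" and "rtpdump", the flag constants, and the function name are assumptions for illustration; the actual SET members are whatever the column definition lists.

```c
#include <string.h>

/* Hypothetical flags, one bit per recording format. */
enum { REC_AU = 1, REC_WAV = 2, REC_RTPDUMP = 4 };

/* Convert a comma-separated SET value such as "au,rtpdump" into a bitmask.
 * Unknown members are ignored; an empty string yields 0 (recording off). */
unsigned parse_recording_set(const char *value)
{
    unsigned mask = 0;

    while (*value != '\0') {
        size_t n = strcspn(value, ",");   /* length of the current member */

        if (n == 2 && strncmp(value, "au", 2) == 0)
            mask |= REC_AU;
        else if (n == 3 && strncmp(value, "wav", 3) == 0)
            mask |= REC_WAV;
        else if (n == 7 && strncmp(value, "rtpdump", 7) == 0)
            mask |= REC_RTPDUMP;

        value += n;
        if (*value == ',')
            value++;                      /* skip the separator */
    }
    return mask;
}
```

With this representation, recording is simply off when the mask is 0, and each recorder can test its own bit.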

As part of conference recording, you should record when people join and leave the conference. For rtpdump, one could presumably generate "artificial" RTCP packets, but I think a text or SQL log would be better. Given experience, an SQL log is probably better.

Each conference should have a unique identifier which is never re-used. I just created an 'id' column for that purpose.

Also, make sure RTCP is recorded for rtpdump.

Not sure how MESSAGEs, once they are supported, are to be recorded. Again, probably SQL table.

Session Initiation Protocol (SIP) is used to establish the multimedia session between the server and the participants. Once the session is established the real time multimedia data (audio and video) are carried over Real-time Transport Protocol (RTP). The actual multimedia data is encoded using some audio/video codec, e.g., G.711, GSM, and ADPCM for audio, and H.261, H.263 and MPEG for video.

We have implemented a centralized conferencing server, sipconf, that allows multiple SIP users to take part in an audio/video conference. One important feature of a conference server is the ability to record the conference. sipconf can be extended to allow recording of an ongoing conference. It should later be able to play back the recorded session.

In our implementation, there is a thread for every participant in a conference. Let's call this the receive thread. This thread keeps listening for RTP or RTCP packets from the participant. When a packet is received, the thread decodes it and puts it into a centralized buffer. Packets from all the participants in one conference are accumulated in this centralized buffer and thus mixed. There is another thread, the send thread, for each conference, which periodically (say, every 40 ms) gets the mixed stream and transmits it back to the participants. The mixing is needed only for audio. Video streams can be replicated without modification. Also, the server has to remove each participant's own audio samples from the mixed stream before sending it to that participant, so that participants do not hear themselves.
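
The mix-and-subtract idea can be sketched as follows. This is only an illustration under stated assumptions (16-bit linear samples, a 32-bit accumulation buffer, hypothetical function names); it is not the algorithm in libmixer/sendrecv.c, which you should read for the real buffering and timing logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit sum back into the 16-bit linear range. */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Receive side: add one participant's decoded frame into the shared sum. */
void mix_add(int32_t *sum, const int16_t *frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sum[i] += frame[i];
}

/* Send side: the stream for one participant is the total mix minus that
 * participant's own samples. Pass own == NULL to get the full mix, which
 * is what a WAV/AU recorder would store. */
void mix_for_participant(const int32_t *sum, const int16_t *own,
                         int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = clamp16(sum[i] - (own != NULL ? own[i] : 0));
}
```

Keeping the sum in 32 bits and clamping only on output avoids overflow while several participants talk at once.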

For recording a conference, the server can write a copy of all audio and video into a file, using either WAV, AU or rtpdump format. If a WAV or AU format is used then the server can write the mixed audio stream into the file. This can be done in the send thread when the mixed stream is created for the 40 ms interval. If rtpdump format is used then the packets can be dumped into the file as and when they arrive. This can be done in the receive thread as soon as the RTP packet is received.

Part-1: Initial Reading

  1. Read the sipconf architecture from this paper. Just get an idea of how it works.
  2. Get access to CVS and checkout the cinema module. Go through the mixing algorithm described in file libmixer/sendrecv.c. Study the implementation of send and receive threads. Look into libmixer/video.c to see how video replication works.
  3. Locate the places in these files where you need to insert the recording part, for both WAV/AU and rtpdump format.
We will start with WAV/AU file recording. To record a conference in a file you need to implement the following three functions and insert them into the existing sipconf code.
CreateSoundFile
This function creates the sound file for writing using the specified format.
CloseSoundFile
This function closes the sound file.
WriteSoundFile
This function writes the received packet into the sound file.
When the conference is started the server will invoke CreateSoundFile to start recording in a file. Every 40 ms (or whatever the send thread interval is) the send thread will call the WriteSoundFile to write the mixed stream into the file. Once the conference is terminated CloseSoundFile will be called.

You can get the description of the AU and WAV formats from the web. Alternatively, you can read the rtspd/snd.c and rtspd/wav.c files in CVS to get an idea of these formats. You may also reuse part of the code from these files for handling AU/WAV formats. Both types of file have a header (the initial few bytes in the file) followed by audio samples. You MUST open these files in binary mode (so that the code also works on Windows). The AU file format is shown below.

          +------------------------------------+
          |        24 bytes header             |
          +------------------------------------+
          |   Optional Info (null terminated)  |
          +------------------------------------+
          |         audio samples              |
          |                                    |
                       . . . .
The 24-byte header can be described by the following C structure.
typedef struct {
  u_int32 magic;          /* magic number */
  u_int32 hdr_size;       /* size of this header, with info (in bytes) */
  u_int32 data_size;      /* length of data (optional) */
  u_int32 encoding;       /* data encoding format */
  u_int32 sample_rate;    /* samples per second */
  u_int32 channels;       /* number of interleaved channels */
} audio_filehdr_t;
The magic code is 0x2e736e64 (which represents ".snd" in ASCII). Several encodings are defined, but in this project we will use only 8-bit Mu Law encoding (encoding=1). The sample rate is 8000. Our recording will have only one channel. Info may contain information about the file, e.g., the session or conference name. The header also carries the data size, but you will not know the total size until the recording ends, whereas the header has to be written at the beginning. The trick we use is to write the header initially with data_size=0. When the file is about to be closed, calculate the size and overwrite this field in the file. You can use fseek and associated functions. You may want to look at the SndClose function in snd.c.
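
A minimal sketch of the write-then-patch trick follows. The names au_file_t, au_create, au_write and au_close are hypothetical (they are not the functions in snd.c), the optional info field is omitted so hdr_size is 24, and header fields are written big-endian, as the AU format stores them.

```c
#include <stdint.h>
#include <stdio.h>

#define AU_MAGIC    0x2e736e64u   /* ".snd" */
#define AU_HDR_SIZE 24u           /* no info field in this sketch */
#define AU_ENC_ULAW 1u            /* 8-bit Mu Law */

typedef struct {
    FILE    *fp;
    uint32_t data_bytes;          /* audio bytes written so far */
} au_file_t;

/* Write a 32-bit value in big-endian byte order. */
static int put_be32(FILE *fp, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, 4, fp) == 4 ? 0 : -1;
}

/* Open in binary mode and write the header with data_size = 0. */
int au_create(au_file_t *au, const char *path)
{
    au->data_bytes = 0;
    au->fp = fopen(path, "wb");
    if (au->fp == NULL)
        return -1;
    if (put_be32(au->fp, AU_MAGIC) || put_be32(au->fp, AU_HDR_SIZE) ||
        put_be32(au->fp, 0) || put_be32(au->fp, AU_ENC_ULAW) ||
        put_be32(au->fp, 8000) || put_be32(au->fp, 1)) {
        fclose(au->fp);
        au->fp = NULL;
        return -1;
    }
    return 0;
}

/* Append already-encoded Mu Law samples and track the running size. */
int au_write(au_file_t *au, const unsigned char *ulaw, size_t n)
{
    if (fwrite(ulaw, 1, n, au->fp) != n)
        return -1;
    au->data_bytes += (uint32_t)n;
    return 0;
}

/* Seek back to data_size (byte offset 8), patch it, then close. */
int au_close(au_file_t *au)
{
    int rc = 0;
    if (fseek(au->fp, 8L, SEEK_SET) != 0 ||
        put_be32(au->fp, au->data_bytes) != 0)
        rc = -1;
    if (fclose(au->fp) != 0)
        rc = -1;
    au->fp = NULL;
    return rc;
}
```

Writing the fields byte by byte, rather than dumping the struct with fwrite, keeps the file layout independent of the host's byte order and struct padding.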

For the WAV file format, please see the wav.c file. WAV (RIFF) files consist of chunks. WAV files can be written with multiple data chunks, but for now we write WAV files with a single data chunk. Here also we will use the Mu Law format with 8 bits per sample at an 8000 Hz sampling rate.

struct {
  char magic[4];            /* "RIFF": magic constant */
  u_int32 length;           /* total length of file - 8 */
  char type[4];             /* "WAVE": designates as WAVE file */
  union {
    struct {
      char type[4];               /* "fmt ": type of chunk */
      u_int32 length;             /* length in bytes */
      u_int16 wFormatTag;         /* data format */
      u_int16 wChannels;          /* number of channels */
      u_int32 wSamplesPerSec;     /* samples per second per channel */
      u_int32 wAvgBytesPerSec;    /* estimate of bytes per second */
      u_int16 wBlockAlign;        /* byte alignment of a basic sample */
      u_int16 wBitsPerSample;     /* bits per sample */
      u_int16 wExtSize;           /* header extension size */
      u_int16 wSamplesPerBlock;   /* samples per block */
    } fmt_chunk;
    struct {
      char type[4];               /* "fact": type of chunk */
      u_int32 length;             /* length in bytes */
      u_int32 dwFileSize;         /* number of samples */
    } fact_chunk;
  } chunk[];
};

Since the mixed audio stream will be 16-bit linear, you will need to encode it before writing into the file. libmixer/g711.c implements the G.711 Mu Law codec, which we use in the project. See libmixer/transcode.c on how to invoke the codec.

You will also need to take care of silence periods. During silence the mixed stream will be empty, but you need to record the silence period in the file. This can be done by assuming the linear 16-bit samples were all zeros during silence.
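
For orientation, here is a standalone sketch of 16-bit linear to G.711 Mu Law encoding, with silence handled by encoding zeros. It follows the widely used bias-and-segment formulation and is not the code in libmixer/g711.c; the function names and the convention that a NULL frame means silence are assumptions for this example. In the project, use the existing codec through libmixer/transcode.c.

```c
#include <stddef.h>
#include <stdint.h>

#define ULAW_BIAS 0x84    /* bias added before finding the segment */
#define ULAW_CLIP 32635   /* largest magnitude that fits after the bias */

/* Encode one 16-bit linear sample as an 8-bit Mu Law byte. */
unsigned char linear_to_ulaw(int16_t pcm)
{
    int sample = pcm;
    int sign = (sample < 0) ? 0x80 : 0;
    int exponent = 7;
    int mantissa;

    if (sample < 0)
        sample = -sample;
    if (sample > ULAW_CLIP)
        sample = ULAW_CLIP;
    sample += ULAW_BIAS;

    /* Find the segment: position of the highest set bit below bit 15. */
    for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Encode one frame. A NULL pcm pointer stands for a silent interval and
 * is encoded as if every linear sample were zero. Returns bytes written. */
size_t encode_frame(const int16_t *pcm, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = linear_to_ulaw(pcm != NULL ? pcm[i] : 0);
    return n;
}
```

A zero sample encodes to 0xFF, so a silent 40 ms interval at 8000 Hz becomes 320 bytes of 0xFF in the file.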

Part-2: AU and WAV format recording due Oct 5.
  1. Compile the sipconf code from CVS. Test it using sipc. sipc can be found on CS machines at /proj/irt-gc2/irt/sipc.bin.
  2. Implement the three functions mentioned above for both AU and WAV formats. The functions are the same for both formats; an argument identifies whether an AU or a WAV file is needed. (You can reuse code from snd.c, wav.c, g711.c)
  3. Integrate these functions into the sipconf code and record a three-party conference. Play back the recorded audio using standard tools like Media Player to check that it works.
rtpdump and related tools allow you to dump RTP/RTCP traffic into a file which can later be played back. One disadvantage of the AU/WAV formats for conference recording is that they cannot record multimedia (i.e., video as well as audio). Secondly, they do not optimize the recording during silence. rtpdump, on the other hand, just dumps the RTP packets into the file, so you can have any number of media channels. Also, during silence, when nobody sends any packets, nothing gets written to the file. Since the file also contains the timing information from the RTP header, the recording can be played back later with the same timing. Note that you do not have to interpret the RTP header or the payload when using this format.
Part-3: rtpdump format due Oct 15
  1. Study the format of rtpdump.
  2. Implement the binary recording using rtpdump. You need to modify the three functions implemented earlier. Insert this code into sipconf.
  3. Provide a command line option to decide which mode has to be used for recording. At this point your program should be able to record in a file using any of the AU, WAV or rtpdump format.
  4. You should also record RTCP packets. Consider creating dummy RTCP for new join or leave.
  5. Test your recorded rtpdump file using rtpplay and RAT.
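
As a starting point for the rtpdump steps above, the sketch below writes the binary "dump" layout as documented for rtptools: a text line "#!rtpplay1.0 address/port", a 16-byte file header (start seconds, start microseconds, source address, port, padding), then one 8-byte record header per packet (record length including the header, RTP packet length or 0 for RTCP, offset in milliseconds since the start). All fields are in network byte order. The function names and parameters are hypothetical; verify the layout against the rtpdump source before relying on it.

```c
#include <stdint.h>
#include <stdio.h>

static void store_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
}

static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Write the text line and the 16-byte binary file header. */
int rtpdump_write_header(FILE *fp, const char *addr, uint16_t port,
                         uint32_t start_sec, uint32_t start_usec,
                         uint32_t source)
{
    unsigned char h[16] = {0};    /* last two bytes stay zero (padding) */

    if (fprintf(fp, "#!rtpplay1.0 %s/%u\n", addr, (unsigned)port) < 0)
        return -1;
    store_be32(h, start_sec);
    store_be32(h + 4, start_usec);
    store_be32(h + 8, source);
    store_be16(h + 12, port);
    return fwrite(h, 1, sizeof h, fp) == sizeof h ? 0 : -1;
}

/* Append one received packet exactly as it came off the wire.
 * For RTCP packets the plen field is written as 0. */
int rtpdump_write_packet(FILE *fp, const unsigned char *pkt, uint16_t len,
                         uint32_t offset_ms, int is_rtcp)
{
    unsigned char h[8];

    if (len > 0xFFFF - 8)
        return -1;                /* record length would overflow 16 bits */
    store_be16(h, (uint16_t)(len + 8));
    store_be16(h + 2, is_rtcp ? 0 : len);
    store_be32(h + 4, offset_ms);
    if (fwrite(h, 1, sizeof h, fp) != sizeof h)
        return -1;
    return fwrite(pkt, 1, len, fp) == len ? 0 : -1;
}
```

Since the receive thread only copies bytes and records an arrival offset, it never needs to parse the RTP header or payload.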
Real-time streaming of multimedia content involves a media server that has the content and a media client that needs it. Real-Time Streaming Protocol (RTSP) is used between the client and the server to negotiate the initiation and termination of a streaming session. Once the session is established, the media server sends the media packets using RTP to the client. RealPlayer and QuickTime are some of the RTSP clients. In short, RTSP defines a VCR protocol for Internet multimedia streaming where the content is at the server. RTSP allows recording of real-time content as well, although most of the available clients do not feel the need to support that.

We have implemented an RTSP media server, rtspd. It supports playback and recording of G.711 Mu Law audio. It can record using AU format. Our sipum voice mail system uses rtspd for recording of voice mails. See sipum/vmail.cpp for details on how to interact with remote RTSP server.

Part-4: rtspd enhancements by Oct 30
  1. Read the RTSP specification. Get a general idea of how the protocol works. The examples at the end are very useful.
  2. Study rtspd code to understand the program flow.
  3. Implement recording and playback of rtpdump format in rtspd.
  4. Test whether the recording and playback of AU and WAV formats work properly in rtspd.
  5. Implement RTP over TCP in rtspd. This is useful when the client is behind a firewall.
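
For the RTP-over-TCP step, the RTSP specification (RFC 2326, section 10.12) defines interleaved binary data: each RTP or RTCP packet sent on the RTSP connection is preceded by an ASCII '$', a one-byte channel identifier, and a two-byte length in network byte order. The channel numbers are negotiated with the interleaved parameter of the Transport header. The following sketch builds one such frame; the function name and buffer-based interface are assumptions, not part of rtspd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build one interleaved frame in out: '$', channel, 16-bit length, data.
 * Returns the number of bytes written, or 0 if the packet is too long
 * for a 16-bit length or does not fit in the output buffer. */
size_t rtsp_interleave(uint8_t channel, const unsigned char *pkt, size_t len,
                       unsigned char *out, size_t out_cap)
{
    if (len > 0xFFFF || out_cap < len + 4)
        return 0;

    out[0] = '$';                          /* 0x24 marks binary data */
    out[1] = channel;
    out[2] = (unsigned char)(len >> 8);    /* length, network byte order */
    out[3] = (unsigned char)(len & 0xFF);
    memcpy(out + 4, pkt, len);
    return len + 4;
}
```

The receiver does the reverse: on reading '$' where an RTSP message could start, it reads the channel and length, then consumes exactly that many bytes as one packet.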
Once we have ascertained that the RTSP server is able to handle playback and recording of the AU, WAV and rtpdump formats, we will use it to record the conference. For recording using rtspd, you will need to send the
Part-5: Conference recording using rtspd by Nov 10
  1. Provide another command line option in sipconf to record using rtspd. The rtsp URL for recording should be passed from the command line.
  2. Study sipum/vmail.cpp code to find out how RTSP stream recording is initiated and terminated.
  3. Reuse the code from sipum/vmail.cpp to allow RTSP recording in the three functions mentioned above. You should establish the session during CreateSoundFile and terminate it during CloseSoundFile. Those three functions allow us to separate the recording part from the rest of the code. You will need to use the RTP library to send the RTP packets to the RTSP server for recording.

At this point our project is capable of recording a conference into a file (AU/WAV or rtpdump) or to an external media server. The next step is to enhance the usability of the system so that it can be controlled from the web interface and can use the existing database for configuration and control. This means getting rid of any command line options and using the web scripts and SQL database instead.

A tutorial will be arranged describing the low level details of web scripts and database used in our environment.

Part-6: Enhanced control by Nov 30
  1. Study the conferencing database format and web scripts. MySQL Tables: 'conferences', and web/ files ConfEdit.cgi, ConfList.cgi, ConfUtil.cgi.
  2. Install the web interface in your account and get it working. You may use the installation program for cinema (Contact Kundan)
  3. Change the database to include a list of recording URLs. URLs are of the form
    file:///conference-id.au - for AU files
    file:///conference-id.wav - for WAV files
    file:///conference-id.rtp - for rtpdump format
    rtsp://server:port/conferences/conference-id.{au,wav,rtp} - for RTSP recording.
    There can be multiple such URLs in this field of the database table 'conferences'. Also modify the associated files, install.tcl, scripts/createdb, createsip.sql and altersip.sql. You should use another database instead of messing with our installation on marta.cs.columbia.edu.
  4. Modify the web interface to include options to record in various formats (au/wav/rtpdump) and various modes (local/rtsp). Your script should generate a new unique conference id when creating a conference.
  5. sipd/test/siptc can be used to send an independent SIP message to any address. Use this to send a NOTIFY message to the SIP conferencing server from the web script to indicate change in the status (start/stop recording). We should discuss the NOTIFY message headers and message body before implementing. Modify sipconf, to receive this notification and act on it accordingly. The host name and port number used for running sipconf should be stored in the database as a separate table (sipconf). TBD. Recording should not be enabled if the database table does not have any recording URLs.
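
The recording URL forms listed in step 3 could be classified inside sipconf roughly as follows. This is a sketch: the enum and function names are hypothetical, and it only looks at the URL scheme and the file extension after the last dot.

```c
#include <string.h>

typedef enum { REC_MODE_UNKNOWN, REC_MODE_LOCAL, REC_MODE_RTSP } rec_mode_t;
typedef enum { REC_FMT_UNKNOWN, REC_FMT_AU, REC_FMT_WAV, REC_FMT_RTPDUMP } rec_format_t;

/* Local file or external RTSP server, decided by the URL scheme. */
rec_mode_t rec_url_mode(const char *url)
{
    if (strncmp(url, "file:///", 8) == 0)
        return REC_MODE_LOCAL;
    if (strncmp(url, "rtsp://", 7) == 0)
        return REC_MODE_RTSP;
    return REC_MODE_UNKNOWN;
}

/* Recording format, decided by the extension after the last dot. */
rec_format_t rec_url_format(const char *url)
{
    const char *dot = strrchr(url, '.');

    if (dot == NULL)
        return REC_FMT_UNKNOWN;
    if (strcmp(dot, ".au") == 0)
        return REC_FMT_AU;
    if (strcmp(dot, ".wav") == 0)
        return REC_FMT_WAV;
    if (strcmp(dot, ".rtp") == 0)
        return REC_FMT_RTPDUMP;
    return REC_FMT_UNKNOWN;
}
```

A conference with several recording URLs would run both functions on each URL and start one recorder per recognized entry; with no URLs, recording stays disabled.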
Now we have a fully functional SIP conference recording mechanism where a conference administrator can go to the web interface and control the recording attributes.

Conference recording is needed so that a conference can later be played back, possibly in a future conference. We have a web-based interface to control and manage the system, including the conferencing server. However, most of the web features for controlling the server are currently not implemented. It should be possible for a user or administrator to play back a conference recording into an existing conference from the web interface. We will allow any RTSP server to play content into the conference. The user or administrator specifies the current conference URL and the RTSP URL. The web interface should then instruct the conference server to start playing the multimedia stream from the RTSP URL (along with that of the participants). On getting such an instruction, the conference server will open an RTSP stream with the RTSP server at the given URL and set up a session for playback. The incoming RTP packets from the RTSP server will be treated as packets from another participant, except that this participant is in speak-only mode.

The challenges are two-fold: how to instruct the server from the web interface, and the actual playback of media. For the first part we will use a SIP NOTIFY message from the web interface to the conference server. The second part involves more programming.

Part-7: Playback of recorded stream by Dec 10

Implement RTSP server playing contents into the conference. You need to support rtpdump, AU and WAV formats.

Consider writing a technical report describing your project. This is optional.