Conference Recording

Chin-Hong Lin
Columbia University
New York, NY 10025, USA

Agung Suyono
Columbia University
New York, NY 10025, USA


Session Initiation Protocol (SIP) is a signaling protocol that handles the setup, modification, and
teardown of multimedia sessions.  After the session is established, the real-time multimedia data
(audio and video) are carried over the Real-time Transport Protocol (RTP), with the data itself
encoded using an audio or video codec such as G.711, GSM, or ADPCM for audio and H.261, H.263,
or MPEG for video.  In this project, we take an existing centralized conferencing server, sipconf,
which allows multiple SIP users to participate in an audio/video conference, and extend the server's
functionality to allow recording of an ongoing conference.  Our first goal is to record the conference
in one of three formats: AU, WAV, or rtpdump.

Real Time Streaming Protocol (RTSP) is becoming a standard for network remote control of
multimedia servers that host multimedia content (audio/video).  RTSP is used between the
client and server to negotiate the initiation and termination of a streaming session.  After the session
is established, the media server sends the multimedia packets to the requesting client using RTP.
Our next step is to enhance the RTSP media server rtspd, which already supports AU or WAV format
recording, so that it also supports recording in rtpdump format.

Finally, we improve the usability of the system so that it can be controlled from a web interface and
can use the existing database for configuration and control.  This allows the system to retrieve
the conference recording settings without cumbersome command-line options specified
at server startup.

1. Introduction

In the implementation of the SIP conferencing server, sipconf creates a thread for each participant
to receive the RTP/RTCP packets from that participant; this receive thread is denoted
RTPReceiveThread.  After an RTP packet is received, sipconf decodes the packet's media payload into
linear samples and puts the result into a centralized buffer.  Packets from all the participants in one
conference are accumulated in this centralized buffer and thus mixed.

Meanwhile, the conferencing server creates a send thread (RTPSendThread in the
implementation) for each conference, which fetches the mixed stream periodically (e.g., every 40 ms) and
transmits the mixed result back to the participants.  Mixing is needed only for audio; video
streams can be replicated without modification.  In addition, before sending the mixed result to a
participant, the server has to remove that participant's own audio data from the mixed stream.
We do this to ensure that participants do not hear their own audio.
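
The mix-then-subtract step described above can be sketched as follows.  This is only an
illustration under stated assumptions, not the sipconf code: the names saturate16, mix_add, and
mix_minus_own are hypothetical, and we assume 16-bit linear samples accumulated in 32-bit
integers and clamped on output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit sum into the 16-bit linear range. */
static int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Add one participant's decoded frame into the shared accumulator. */
static void mix_add(int32_t *mix, const int16_t *frame, size_t n)
{
    assert(mix != NULL && frame != NULL);
    for (size_t i = 0; i < n; i++)
        mix[i] += frame[i];
}

/* Build the frame sent to one participant: the full mix minus
 * that participant's own samples, clamped to 16 bits. */
static void mix_minus_own(const int32_t *mix, const int16_t *own,
                          int16_t *out, size_t n)
{
    assert(mix != NULL && own != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = saturate16(mix[i] - own[i]);
}
```

Keeping the accumulator wider than the samples means the subtraction recovers the other
participants' sum exactly, even when the full mix would have overflowed 16 bits.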

For recording purposes, we must record the media contents from every participant to a file in
the desired file format.  If AU or WAV is chosen, the server writes the mixed audio stream
into the target file.  If rtpdump format is chosen, the packets are dumped to the file as soon as they
arrive, before mixing.

The organization of the report is as follows:
1. Introduction
2. AU and WAV format recording
3. rtpdump format recording
4. rtspd enhancements
5. Enhanced control
6. Program documentation
7. Task List
8. References

2. AU and WAV Format Recording

Before we go into the details of recording, we briefly introduce these two sound file formats.

AU file format
This format was developed by Sun and serves as a standard for UNIX computers.
The 24-byte header can be described by the following C structure.

typedef struct {
  u_int32 magic;          /* magic number */
  u_int32 hdr_size;       /* size of this header, with info (in bytes) */
  u_int32 data_size;      /* length of data (optional) */
  u_int32 encoding;       /* data encoding format */
  u_int32 sample_rate;    /* samples per second */
  u_int32 channels;       /* number of interleaved channels */
} audio_filehdr_t;
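
As an illustration of how such a header might be filled in, the sketch below serializes the six
fields explicitly in big-endian order, which is the byte order used by AU files.  The function names
are hypothetical and are not taken from sndfile.c; the magic number 0x2e736e64 (".snd") and the
encoding code 1 (8-bit mu-law) are the standard AU values, and a single channel is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define AU_MAGIC       0x2e736e64u  /* the four characters ".snd" */
#define AU_HDR_SIZE    24u          /* header without an info field */
#define AU_ENC_MULAW_8 1u           /* 8-bit G.711 mu-law */

/* Store a 32-bit value big-endian, independent of host byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Fill a 24-byte buffer with an AU header for mono 8-bit mu-law audio.
 * A data_size of 0xffffffff marks the length as unknown. */
static void au_build_header(unsigned char *buf, uint32_t data_size,
                            uint32_t sample_rate)
{
    assert(buf != NULL);
    put_be32(buf +  0, AU_MAGIC);
    put_be32(buf +  4, AU_HDR_SIZE);
    put_be32(buf +  8, data_size);
    put_be32(buf + 12, AU_ENC_MULAW_8);
    put_be32(buf + 16, sample_rate);
    put_be32(buf + 20, 1u);          /* one channel (assumed) */
}

/* Write the header to an open file; returns 0 on success, -1 on error. */
static int au_write_header(FILE *out, uint32_t data_size,
                           uint32_t sample_rate)
{
    unsigned char buf[AU_HDR_SIZE];
    assert(out != NULL);
    au_build_header(buf, data_size, sample_rate);
    return fwrite(buf, 1, sizeof buf, out) == sizeof buf ? 0 : -1;
}
```

Writing the fields byte by byte avoids depending on the host's byte order or on structure padding.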

WAV file format
This file format follows the RIFF (Resource Interchange File Format) specification.  It was developed by
IBM and Microsoft as a counterpart of AIFF on Macs.  It is the native sound format of Windows
machines for recording and playback of recorded sound.

WAV files can be written with multiple data chunks. For this project, we write WAV files with a single
data chunk.

typedef struct {
    char magic[4];                  /* = "RIFF"; magic constant */
    u_int32 length;                 /* total length of file - 8 */
    char type[4];                   /* = "WAVE"; designates as WAVE file */
    struct {
        char type[4];               /* = "fmt "; type of chunk */
        u_int32 length;             /* length in bytes */
        u_int16 wFormatTag;         /* data format */
        u_int16 wChannels;          /* number of channels */
        u_int32 wSamplesPerSec;     /* samples per second per channel */
        u_int32 wAvgBytesPerSec;    /* estimate of bytes per second */
        u_int16 wBlockAlign;        /* byte alignment of a basic sample */
        u_int16 wBitsPerSample;     /* bits per sample */
    } fmt_chunk;
    struct {
        char type[4];               /* = "data"; type of chunk */
        u_int32 length;             /* length of the data (chunk size minus 8 bytes) */
    } data_chunk;
} wave_filehdr_t;
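
A sketch of how the 44-byte header above might be filled in for mono 8-bit mu-law audio follows.
The function names are hypothetical.  RIFF fields are little-endian, and 0x0007 is the standard
format tag for mu-law; we assume a 16-byte fmt chunk, matching the structure above, although
stricter writers of non-PCM WAV files add further fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAVE_FORMAT_MULAW 0x0007u

static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Fill a 44-byte buffer with a single-data-chunk WAV header. */
static void wav_build_header(unsigned char *buf, uint32_t data_size,
                             uint32_t sample_rate)
{
    const uint16_t channels = 1, bits = 8;   /* assumed: mono, 8-bit */
    const uint16_t block_align = (uint16_t)(channels * bits / 8);

    assert(buf != NULL);
    memcpy(buf + 0, "RIFF", 4);
    put_le32(buf + 4, 36u + data_size);      /* file length - 8 */
    memcpy(buf + 8, "WAVE", 4);
    memcpy(buf + 12, "fmt ", 4);
    put_le32(buf + 16, 16u);                 /* fmt chunk body length */
    put_le16(buf + 20, WAVE_FORMAT_MULAW);
    put_le16(buf + 22, channels);
    put_le32(buf + 24, sample_rate);
    put_le32(buf + 28, sample_rate * block_align);
    put_le16(buf + 32, block_align);
    put_le16(buf + 34, bits);
    memcpy(buf + 36, "data", 4);
    put_le32(buf + 40, data_size);
}
```

Note that two fields depend on the amount of audio written: the RIFF length at offset 4 and the
data chunk length at offset 40.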

Sound Coding Technique

For both file formats, our implementation uses the G.711 Mu law codec with 8 bits per
sample at an 8000 Hz sampling rate.  Mu law is a variant of G.711 that is used primarily in North
America.  The other variant is A-law; the difference between the two is the manner in which
non-uniform quantization is performed.  G.711 is a waveform codec and is often called
Pulse-Code Modulation (PCM).
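
For readers unfamiliar with the codec, the following sketch shows the widely used
bias-and-segment method for converting one 16-bit linear sample to an 8-bit mu-law byte.  It is
not taken from sipconf, and the function name linear_to_ulaw is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added before finding the segment */
#define ULAW_CLIP 32635  /* largest magnitude that survives the bias */

/* Encode one 16-bit linear sample as an 8-bit G.711 mu-law byte. */
static uint8_t linear_to_ulaw(int16_t pcm)
{
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int magnitude = (pcm < 0) ? -(int)pcm : (int)pcm;

    if (magnitude > ULAW_CLIP)
        magnitude = ULAW_CLIP;
    magnitude += ULAW_BIAS;
    assert(magnitude <= 0x7fff);

    /* The segment (exponent) is the position of the highest set bit,
     * counted from bit 7 (exponent 0) to bit 14 (exponent 7). */
    int exponent = 7;
    for (int mask = 0x4000; exponent > 0 && (magnitude & mask) == 0; mask >>= 1)
        exponent--;

    /* The four bits below the leading bit form the mantissa. */
    int mantissa = (magnitude >> (exponent + 3)) & 0x0f;

    /* Mu-law bytes are transmitted bit-inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Silence (a sample of 0) encodes to 0xFF, and the segments grow in width with amplitude, which is
the non-uniform quantization mentioned above.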

Procedures to do conference recording

Where do we put the functionality in Conferencing Server sipconf?
The recording for AU and WAV is done in the send thread, which is the thread the conferencing server
creates for each conference.  The thread is denoted as RTPSendThread in cinema/libmixer/sendrecv.c.

Before recording can proceed, we must initialize the file header of the sound file.
The function we call is FILE *CreateSoundFile(char *filename, encoding_used sndformat); the
parameter sndformat specifies the information needed for the desired file format.

The server mixes the samples from all participants into a mixed stream and periodically sends the mixed
result (excluding the participant's own samples) back to each participant.  Thus, the appropriate time
to write the mixed result to the file (AU or WAV) is at the end of each interval, after mixing is
completed and before the participants' own samples are subtracted.  Since the mixed audio stream is
16-bit linear, we must transcode it to the encoding format used in the sound file before the actual
write takes place.  See "Sound Coding Technique" above for the encoding used in our implementation.

Since the file header contains a size field, we need to fill in this value before we close the sound
file.  For this purpose, while recording we keep a running count of the data bytes written
to the file, accumulated over the lifetime of the conference.  Once the conference
terminates, we call the function void CloseSoundFile(FILE *FN, u_int32 data_size) with the accumulated
size to close the file.  At this point, the conference recording for AU or WAV is done.
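
The size fix-up at close time can be sketched as below for the AU case.  This is a guess at the
general technique rather than the body of CloseSoundFile: the helper name au_patch_data_size is
hypothetical, and we assume the file was opened in a mode that allows seeking back to the header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define AU_DATA_SIZE_OFFSET 8L  /* data_size is the third 32-bit field */

/* Overwrite the AU data_size field with the final byte count.
 * Returns 0 on success and -1 on an I/O error.  The caller still
 * has to fclose() the file afterwards. */
static int au_patch_data_size(FILE *fp, uint32_t data_size)
{
    unsigned char be[4];

    assert(fp != NULL);
    be[0] = (unsigned char)(data_size >> 24);   /* big-endian */
    be[1] = (unsigned char)(data_size >> 16);
    be[2] = (unsigned char)(data_size >> 8);
    be[3] = (unsigned char)data_size;

    if (fseek(fp, AU_DATA_SIZE_OFFSET, SEEK_SET) != 0)
        return -1;
    if (fwrite(be, 1, sizeof be, fp) != sizeof be)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

For WAV the same idea applies, except that two little-endian fields must be patched: the RIFF
length at offset 4 and the data chunk length at offset 40.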

3. Recording Using rtpdump Format

Where do we put the functionality in Conferencing Server sipconf?
The recording for rtpdump is done in the receive thread, which is the thread the server sipconf creates for
each participant in the conference.  The thread is denoted as RTPReceiveThread in

Before we can record in rtpdump format, we also need to initialize the file header.  The
function we call is void rtpdump_header(FILE *out, struct sockaddr_in *sin, struct timeval *start); in
the parameter list, sin is the socket address of the mixer (in this case, sipconf), and start is the time
stamp for the start of the conference.

The rtpdump header, described as a C structure, is as follows:

typedef struct {
  struct timeval start;   /* start of recording (GMT) */
  u_int32 source;         /* network source (multicast address); in our case, sipconf */
  u_int16 port;           /* UDP port; in our case, sipconf */
} RD_hdr_t;
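
To make the on-disk layout concrete, the sketch below builds an rtpdump file header in memory.
It follows the layout as we understand it from the rtptools documentation: a text line of the form
"#!rtpplay1.0 address/port" followed by the binary header in network byte order.  The function
name is hypothetical, and the 32-bit time fields and the two padding bytes are assumptions of this
sketch rather than something stated in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RD_BIN_HDR 16  /* sec(4) + usec(4) + source(4) + port(2) + pad(2) */

static void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
}

static void put_be32(unsigned char *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

/* Build the text line and binary header into buf.
 * Returns the number of bytes used, or -1 if buf is too small. */
static int rd_build_file_header(unsigned char *buf, size_t cap,
                                const char *addr, uint16_t port,
                                uint32_t source_ip,
                                uint32_t start_sec, uint32_t start_usec)
{
    assert(buf != NULL && addr != NULL);
    int n = snprintf((char *)buf, cap, "#!rtpplay1.0 %s/%u\n",
                     addr, (unsigned)port);
    if (n < 0 || (size_t)n + RD_BIN_HDR > cap)
        return -1;

    unsigned char *p = buf + n;   /* overwrites the terminating NUL */
    put_be32(p + 0, start_sec);
    put_be32(p + 4, start_usec);
    put_be32(p + 8, source_ip);
    put_be16(p + 12, port);
    put_be16(p + 14, 0);          /* padding (assumed) */
    return n + RD_BIN_HDR;
}
```

The caller would then write the returned number of bytes to the recording file with fwrite.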

Although a conference with multiple participants has multiple receive threads, we keep only
one rtpdump file for each conference.  When a packet is received by the server, it is dumped to
the file by calling the function void packet_handler(FILE *out, int trunc, double dstart, struct timeval
now, int ctrl, struct sockaddr_in sin, int len, RD_buffer_t packet).  In the parameter list, trunc specifies
the maximum size recorded for an RTP/RTCP packet in case the packet is too large; dstart is the time stamp
when the conference begins; now is the current time stamp, used to accommodate delay jitter; ctrl selects
the kind of packet being recorded, 0 for RTP and 1 for RTCP; sin is the socket address of the participant
sending this packet; len is the length of the received packet; and packet is the received RTP/RTCP packet.
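
The per-packet record that such a handler writes can be sketched as follows.  The layout (record
length, original packet length, and millisecond offset, all in network byte order, followed by the
packet bytes) is the one we understand rtpdump to use, with the original length set to 0 for
RTCP.  The function name and the decision to truncate both RTP and RTCP packets are assumptions
of this sketch, not a description of packet_handler itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RD_PACKET_HDR 8u  /* length(2) + plen(2) + offset(4) */

/* Build one rtpdump record in out.  Returns the record size in bytes,
 * or 0 if it does not fit in cap or in the 16-bit length field. */
static size_t rd_build_record(unsigned char *out, size_t cap,
                              const unsigned char *pkt, uint16_t len,
                              uint16_t trunc, int is_rtcp,
                              double start_sec, double now_sec)
{
    assert(out != NULL && pkt != NULL && now_sec >= start_sec);

    uint16_t kept = (trunc < len) ? trunc : len;   /* bytes recorded */
    size_t total = RD_PACKET_HDR + (size_t)kept;
    if (total > cap || total > UINT16_MAX)
        return 0;

    /* Offset since the start of the recording, in milliseconds. */
    uint32_t offset_ms = (uint32_t)((now_sec - start_sec) * 1000.0 + 0.5);
    uint16_t plen = is_rtcp ? 0 : len;             /* 0 marks RTCP */

    out[0] = (unsigned char)(total >> 8);
    out[1] = (unsigned char)total;
    out[2] = (unsigned char)(plen >> 8);
    out[3] = (unsigned char)plen;
    out[4] = (unsigned char)(offset_ms >> 24);
    out[5] = (unsigned char)(offset_ms >> 16);
    out[6] = (unsigned char)(offset_ms >> 8);
    out[7] = (unsigned char)offset_ms;
    memcpy(out + RD_PACKET_HDR, pkt, kept);
    return total;
}
```

Because several receive threads share one file, a real implementation would also need to
serialize the writes, for example with a per-conference lock.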

The packets are written to the file chronologically, so the time stamps in the file increase
accordingly.  Since the file also contains timing information from the RTP header, the recording can be
played back later with the same timing.  For this format, we do not manipulate the RTP packet's

4. rtspd Enhancement

In this part, we extend the functionality of an RTSP media server, rtspd, which already supports
playback and recording of G.711 Mu law audio.  It can record using AU format.

We modify rtspd so that it also supports rtpdump.  We reuse existing rtpdump functionality
to read and write rtpdump files.  The file
we modify is rtpfile.c, and we add the rtpdump utility functions in rf.c and rf.h.

Since the rtpdump format differs between RTP and RTCP packets, we differentiate between them
in the incoming buffer by examining the payload type in the packet header: if the payload type is
between 200 and 204 inclusive, the packet is RTCP; otherwise it is RTP.
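
The test above can be sketched as a small classifier.  The names are hypothetical and this is not
the code in rtpfile.c.  The sketch reads the whole second octet of the packet, where RTCP carries
its packet type, and adds a check of the 2-bit version field, which is an extra sanity check we
assume here rather than something described in the text.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PKT_INVALID, PKT_RTP, PKT_RTCP } pkt_kind_t;

/* Classify a received buffer as RTP or RTCP by its second octet. */
static pkt_kind_t classify_packet(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 2)
        return PKT_INVALID;          /* too short to hold the type */
    if ((buf[0] >> 6) != 2)
        return PKT_INVALID;          /* both protocols use version 2 */

    unsigned type = buf[1];          /* RTCP packet type octet */
    return (type >= 200 && type <= 204) ? PKT_RTCP : PKT_RTP;
}
```

An RTP packet with the marker bit set has a second octet of at least 128, so an RTP payload type
in the range 72 to 76 combined with the marker bit would be misread as RTCP; this is why those
payload type values are normally avoided.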

5. Enhanced Control

The last part of our project is to control the recording mechanism without command-line options.  Our goal
is to use the web interface and the existing database for configuration and control.  The database
'sip' contains the table 'conferences', which holds all information about conferences.  The column
'recordingformat' can be set to the desired recording file format, e.g., AU, WAV, or rtpdump.
The conference owner can control this feature through the web interface.

When the user logs in to the CINEMA web interface and presses the 'conference' button, the
conference list is shown.  Pressing 'Edit' beside a conference URL in the list invokes the web script
ConfEdit.cgi.  The web page then shows all the editable fields in the table 'conferences', where the
conference owner can specify the recording format.  After the edit button on this page is pressed, all
the information is written to the SQL database to reflect the changes.

The conferencing server can then retrieve the recording format for each
conference from the database.  If no format is specified, no recording takes place.  We achieve
this by inserting embedded SQL commands in cinema/libmixer/sendrecv.c with help from
cinema/libdb++/dbapi.h.  The file naming convention is conf_ID.recordingformat, where ID is the
primary key for each conference and recordingformat is AU, WAV, or rtpdump.
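
The naming rule and the "no format, no recording" behavior can be sketched as below.  The
function name is hypothetical, and the format string is used verbatim as the file extension,
following the convention stated above; how sipconf maps the database value to the actual
extension is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build the recording file name for a conference.
 * Returns 1 and fills name when a format is configured.
 * Returns 0 when format is NULL or empty (no recording),
 * or when the name does not fit in the buffer. */
static int recording_file_name(char *name, size_t cap,
                               unsigned conf_id, const char *format)
{
    assert(name != NULL && cap > 0);
    if (format == NULL || format[0] == '\0')
        return 0;

    int n = snprintf(name, cap, "conf_%u.%s", conf_id, format);
    return n > 0 && (size_t)n < cap;
}
```

A caller would skip opening any recording file when the function returns 0.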

6. Program Documentation

Here we explain step by step how to run and test the modified sipconf.  We will introduce two kinds of
environment: Linux and SunOS.  First we need to unpack the file confrec.tar.gz.  It will create two
directories: libmixer and rtspd, under a directory confrec.

 $ gunzip confrec.tar.gz
 $ tar xvf confrec.tar
 $ ls confrec/libmixer
 sendrecv.c sendrecv.h sndfile.c sndfile.h
 $ ls confrec/rtspd
 rtpfile.c rtpfile.h rf.c rf.h

Copy the files under confrec/libmixer to cinema/libmixer, and the files under confrec/rtspd to cinema/rtspd.

In the cinema directory, create a directory for the sipconf installation.  For example, in this test we name
it 'linux-sipconf'.  Configure it for the Linux platform, and compile sipconf.

 $ pwd
 $ mkdir linux-sipconf
 $ cd linux-sipconf
 $ ../configure  --with-rtp=/proj/irt-gc4/rtp/Linux --with-mysql=/proj/irt-gc4/mysql/Linux
 $ make sipconf

After compilation finishes, run the sipconf with the following:

 $ cd sipconf
 $ ./sipconf -d -X -D SQL://

Make sure that the path to the necessary library is set before running sipconf, i.e., put
the following line in the file $HOME/.profile

export LD_LIBRARY_PATH=/proj/irt-gc4/mysql/Linux/lib/mysql:$LD_LIBRARY_PATH

Assume we run the programs on a SunOS machine.  Under the cinema directory, create
a working directory to install sipconf, e.g., sun-sipconf.

 $ pwd
 $ mkdir sun-sipconf
 $ cd sun-sipconf
 $ ../configure  --with-rtp=/proj/irt-gc4/rtp/SunOS --with-mysql=/proj/irt-gc4/mysql/SunOS
 $ make sipconf
 $ cd sipconf
 $ ./sipconf -d -X -D SQL://

Also make sure that the path to the necessary library is set:

export LD_LIBRARY_PATH=/proj/irt-gc4/mysql/SunOS/lib/mysql:$LD_LIBRARY_PATH

Testing Tools

SIP Client

This program represents a participant in a conference, i.e., the client of the SIP conferencing server.
On a Linux machine, we may use sipc from the /proj/irt-gc2/irt/sipc.linux directory.  Go to the directory
and run sipc.

 $ cd /proj/irt-gc2/irt/sipc.linux
 $ ./sipc &

sipc must be run on the local machine since it needs to use the local audio device as input.
Use sipc to connect to an available conference on the sipconf server by making a SIP conference call
to the machine on which sipconf is running.  For example, you
may specify the conference URL in the sipc dialog window:

Tools to Test Our Recording Results

For AU and WAV Format
The .wav and .au files produced by the recording can be found in cinema/linux-sipconf/sipconf/.
The files are named using the convention conf_ID.au and conf_ID.wav, where ID is the conference id.

You can use Windows Media Player, RealPlayer, etc. to play back the .wav and .au files.

For rtpdump Format
In cinema/sun-sipconf/sipconf/, we will find the rtpdump file.  The file naming convention is conf_ID.rtp.

Assume that the rat application is already running on the listening machine (rat starts
automatically when we start sipc), and that rtpplay has already been installed.  From
another machine, run rtpplay:

$ rtpplay -v -T -f /$WORKDIR/cinema/linux-sipconf/sipconf/conf_ID.rtp vienna/10000

You should then hear the recording on the machine running rat.
If the rtpplay has not yet been installed in the machine, download and install the file from

rtspd Installation
From the unpacked confrec.tar.gz file above, copy confrec/rtspd to cinema/rtspd.  Make a working
directory under cinema for rtspd installation, and compile the rtspd.

 $ pwd
 $ mkdir linux-rtspd
 $ cd linux-rtspd
 $ ../configure
 $ make rtspd

7. Task List

Student name: Chin-Hong Lin
Tasks: AU and WAV recording for sipconf
            rtpdump file creation and rtpdump format recording for sipconf
            Web cgi scripts and database modification for enhanced control

Student name: Agung Suyono
Tasks: AU and WAV file creation and closing for sipconf
             rtspd enhancement for rtpdump recording
             Embedded SQL query in sipconf for enhanced control

8. References

1    Jon Crowcroft, Internetworking Multimedia, Morgan Kaufmann Publishers, CA, 1999.
2    RFC for RTP/RTCP
3    RFC for SIP
4    RFC for RTSP
5    Kundan Singh, Gautam Nair and Henning Schulzrinne, "Centralized Conferencing
     using SIP", Proceedings of the 2nd IP-Telephony Workshop (IPTel'2001), April 2001.