CoVit is a robust video network tool that can send and receive video data over the Internet in real time. The software captures video from a webcam, encodes the video frames and sends the data over the network to a receiver using RTP. The receiver receives the video data, decodes it and displays it. The tool's operations can be controlled remotely from another application through XML-RPC methods.
There are many video communication tools available; however, very few of them offer the flexibility to choose among different coding formats. This software has been built to be reusable as a component, allowing other applications to plug in this tool seamlessly and demonstrate video communication within those applications.
The software broadly consists of six components:
- video capture from a camera input.
- encoding into compressed video formats (currently supported formats: MPEG-1,
MPEG-2, MPEG-4 and H.263).
- sending and receiving video frames using an existing RTP library.
- decoding from compressed video formats.
- rendering in a video window.
- remote control, so that CoVit can be used as a component.
This project deals primarily with video, but its design and interfaces can
easily be extended to audio as well.
Protocols such as SIP, RTSP and H.323 are outside the scope of this project.
The software does not include any call setup, session initiation or control
protocols. CoVit is not concerned with signaling and assumes that it is
handled by the user (either a human or an application) of this software tool.
Communication using RTP provides a common media transport layer without any dependency on the signaling protocol or the application. The sender is responsible for capturing the video stream from a camera device (a webcam in our case); it can optionally compress the stream into one of several supported compressed video formats, form RTP packets and send them to the receiver. The receiver listens for video streams on a designated port. It receives RTP packets from the network at that port, processes them, and extracts both the timing information associated with the video data and the video data itself. It then decompresses the media stream (video data in our case) and displays the video in a rendering window. The receiver also decides whether to render a given frame depending on the timing information of the associated packet. Figures 1 and 2 broadly lay out the flow of media from the sender to the receiver over RTP.
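To make "form RTP packets" and "extract the timing information" concrete, here is a minimal sketch of writing and parsing the 12-byte fixed RTP header defined in RFC 3550. CoVit itself delegates this to an existing RTP library whose API is not reproduced here, so `rtp_header`, `rtp_write_header` and `rtp_read_header` are hypothetical names. The sketch ignores CSRC lists, padding and header extensions. Payload type 32 (MPEG-1/MPEG-2 video in the RFC 3551 static table) and the 90 kHz video clock are used only as illustrative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTP_VERSION    2
#define RTP_HEADER_LEN 12 /* fixed header, no CSRC entries */

/* Hypothetical in-memory view of the RFC 3550 fixed header fields. */
typedef struct {
    uint8_t  marker;       /* M bit; video formats commonly set it on a frame's last packet */
    uint8_t  payload_type; /* 7-bit PT, e.g. 32 = MPEG-1/MPEG-2 video (RFC 3551) */
    uint16_t seq;          /* incremented by one for every packet sent */
    uint32_t timestamp;    /* sampling instant; video typically uses a 90 kHz clock */
    uint32_t ssrc;         /* identifies the sending source */
} rtp_header;

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Sender side: serialize h into out, which must hold RTP_HEADER_LEN bytes.
 * All multi-byte fields are written in network (big-endian) byte order. */
size_t rtp_write_header(const rtp_header *h, uint8_t *out)
{
    assert(h != NULL && out != NULL);
    out[0] = (uint8_t)(RTP_VERSION << 6); /* V=2, P=0, X=0, CC=0 */
    out[1] = (uint8_t)((h->marker ? 0x80 : 0x00) | (h->payload_type & 0x7F));
    out[2] = (uint8_t)(h->seq >> 8);
    out[3] = (uint8_t)h->seq;
    put_be32(out + 4, h->timestamp);
    put_be32(out + 8, h->ssrc);
    return RTP_HEADER_LEN;
}

/* Receiver side: parse the fixed header. Returns 0 on success, or -1 if the
 * buffer is too short or the version field is not 2. */
int rtp_read_header(const uint8_t *in, size_t len, rtp_header *h)
{
    if (len < RTP_HEADER_LEN || (in[0] >> 6) != RTP_VERSION)
        return -1;
    h->marker = (uint8_t)(in[1] >> 7);
    h->payload_type = (uint8_t)(in[1] & 0x7F);
    h->seq = (uint16_t)((in[2] << 8) | in[3]);
    h->timestamp = get_be32(in + 4);
    h->ssrc = get_be32(in + 8);
    return 0;
}
```

The timestamp read on the receiver side is the timing information the receiver can use when deciding whether a frame is still worth rendering.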
The software is broadly divided into three layers:
1. The XML-RPC module:
This is the topmost layer; it is primarily concerned with remote control of
the tool by another user. It also acts as an interface between the remote
user and the underlying communication and capture layers of the software. This
layer exposes well-formed APIs both for the remote user and for interacting
with the underlying RTP and capture modules (described below).
2. The RTP module:
This middle layer is responsible for generating RTP packets (sender) or
receiving RTP packets from the network (receiver). Apart from handling the RTP
protocol details, this layer is also responsible for encoding or decoding the
video data. It also acts as a channel through which the XML-RPC layer can
communicate with the lower capture layer when required, for instance to
configure the camera. A remote user can thus configure the camera capture
formats through the RTP layer APIs.
3. The capture module:
This lower layer deals with capturing video streams from a camera device. It
provides a generic interface to initialize and interact with video device
drivers. The capture module continuously captures video data from the device
and provides the data to the middle layer.
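The three layers can be pictured as a chain of calls. The sketch below follows one request, a remote change of the camera format, from the XML-RPC layer through the RTP layer (acting as the channel to the capture layer) down to a generic, driver-independent capture interface. Everything here is hypothetical: the method string `covit.setCaptureFormat`, the function-pointer interface and the fake driver are illustrations, not CoVit's actual API. In the real tool the request would arrive through the XML-RPC library, and the capture calls would reach a real camera driver, presumably via the Unicap library listed in the references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Capture layer (bottom): generic driver interface ---- */
typedef struct { int width, height, fps; } capture_format;

typedef struct {
    /* Each driver backend fills in its own implementation. */
    int (*set_format)(void *drv, const capture_format *fmt);
    void *drv; /* driver-private state */
} capture_module;

/* ---- RTP layer (middle): owns a capture module and forwards requests ---- */
typedef struct {
    capture_module *capture;
    /* encoder and RTP session state would also live here */
} rtp_module;

int rtp_configure_capture(rtp_module *rtp, const capture_format *fmt)
{
    assert(rtp != NULL && rtp->capture != NULL);
    return rtp->capture->set_format(rtp->capture->drv, fmt);
}

/* ---- XML-RPC layer (top): maps a method name to an RTP-layer call ---- */
int xmlrpc_dispatch(rtp_module *rtp, const char *method,
                    const capture_format *fmt)
{
    if (strcmp(method, "covit.setCaptureFormat") == 0) /* hypothetical name */
        return rtp_configure_capture(rtp, fmt);
    return -1; /* unknown method: would be reported as an XML-RPC fault */
}

/* ---- A fake driver standing in for a real camera backend ---- */
typedef struct {
    capture_format current;
    int calls; /* number of accepted format changes */
} fake_driver;

static int fake_set_format(void *drv, const capture_format *fmt)
{
    fake_driver *d = drv;
    if (fmt->width <= 0 || fmt->height <= 0 || fmt->fps <= 0)
        return -1; /* reject formats the "hardware" cannot produce */
    d->current = *fmt;
    d->calls++;
    return 0;
}
```

The point of the function-pointer table is that the XML-RPC and RTP layers compile against `capture_module` only, so swapping the fake driver for a real one does not touch the upper layers.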
Figure 3 shows the interdependencies between the different layers of the
software. The figure shows the model for a single video stream (capture,
sender and receiver). CoVit supports multiple video sending streams and
multiple video receiving streams. There is always exactly one XML-RPC module,
whereas there are multiple RTP, capture and rendering modules when multiple
streams are active.
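Because there is a single XML-RPC module but one set of RTP, capture and rendering modules per stream, the control layer needs to look up per-stream state by its identifier. A minimal sketch using a fixed-size table follows. It assumes that send and receive identifiers live in separate namespaces (the text speaks of unique send-stream and unique receive-stream identifiers). `MAX_STREAMS`, `stream_entry`, `stream_create` and `stream_stop` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STREAMS 16 /* assumed limit for this sketch */

typedef enum { STREAM_SEND, STREAM_RECEIVE } stream_kind;

typedef struct {
    bool        in_use;
    int         id;   /* unique identifier chosen by the remote user */
    stream_kind kind;
    /* per-stream RTP, capture and rendering state would live here */
} stream_entry;

static stream_entry streams[MAX_STREAMS];

static stream_entry *find_stream(int id, stream_kind kind)
{
    for (size_t i = 0; i < MAX_STREAMS; i++)
        if (streams[i].in_use && streams[i].id == id && streams[i].kind == kind)
            return &streams[i];
    return NULL;
}

/* Returns 0 on success, -1 if the identifier is already in use for this
 * kind of stream or the table is full. */
int stream_create(int id, stream_kind kind)
{
    assert(kind == STREAM_SEND || kind == STREAM_RECEIVE);
    if (find_stream(id, kind) != NULL)
        return -1;
    for (size_t i = 0; i < MAX_STREAMS; i++) {
        if (!streams[i].in_use) {
            streams[i] = (stream_entry){ .in_use = true, .id = id, .kind = kind };
            return 0;
        }
    }
    return -1;
}

/* Frees the stream's slot so its identifier can be reused; returns -1 if
 * no such stream exists. */
int stream_stop(int id, stream_kind kind)
{
    stream_entry *s = find_stream(id, kind);
    if (s == NULL)
        return -1;
    /* a real implementation would stop threads and free buffers first */
    s->in_use = false;
    return 0;
}
```

In this sketch only the XML-RPC server thread touches the table, so it needs no lock. If commands could be handled concurrently, the table would need its own mutex.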
The software is built on a threading model.
The threads can be categorized as follows:
1. The XML-RPC Server
2. The RTP Sender
3. The RTP Receiver
4. The Capture
When the software tool is started, only the XML-RPC server thread is active
and running. The other three threads are created and run on demand, as
requested by the user. The XML-RPC server thread continuously listens on a
port whose number is given as a command-line argument when the tool is run.
E.g.: ./covit 8082 (the XML-RPC server listens on port 8082)
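The port comes from the first command-line argument. One defensive way to convert it is sketched below. `parse_port` is a hypothetical helper, and it enforces only two rules: the argument must be a whole decimal number, and that number must lie in the range 1 to 65535.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a port number from a command-line string such as "8082".
 * Returns the port (1..65535) or -1 if the string is not a valid port. */
int parse_port(const char *arg)
{
    char *end = NULL;
    long value;

    assert(arg != NULL);
    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1; /* overflow, empty string, or trailing characters */
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```

In `main`, a negative result would typically lead to a short usage message and a non-zero exit status before any server thread is started.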
Depending on the commands from the remote user, the remaining three threads are
created and start running.
E.g.: When the remote user wants the tool to send video over RTP, both the RTP
sender thread and the capture thread are created and run.
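The start/stop lifecycle of a send stream can be sketched with POSIX threads. This is a minimal sketch, assuming a shared atomic stop flag that the XML-RPC server thread sets when the user stops the stream. `send_stream`, `send_stream_start`, `send_stream_stop` and the placeholder worker bodies are hypothetical. The real capture and sender threads would exchange frames through the shared media buffers rather than just counting loop iterations.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream state for one video send stream. */
typedef struct {
    pthread_t   capture_tid;
    pthread_t   sender_tid;
    atomic_bool stop;     /* set by the XML-RPC thread to end the stream */
    atomic_long captured; /* placeholder counters standing in for real work */
    atomic_long sent;
} send_stream;

static void *capture_main(void *arg)
{
    send_stream *s = arg;
    while (!atomic_load(&s->stop)) {
        atomic_fetch_add(&s->captured, 1); /* a real thread grabs a frame here */
        sched_yield();
    }
    return NULL;
}

static void *sender_main(void *arg)
{
    send_stream *s = arg;
    while (!atomic_load(&s->stop)) {
        atomic_fetch_add(&s->sent, 1); /* a real thread packetizes and sends here */
        sched_yield();
    }
    return NULL;
}

/* Called from the XML-RPC server thread when a "send" command arrives.
 * Returns 0 on success, -1 if either thread could not be created. */
int send_stream_start(send_stream *s)
{
    assert(s != NULL);
    atomic_init(&s->stop, false);
    atomic_init(&s->captured, 0);
    atomic_init(&s->sent, 0);
    if (pthread_create(&s->capture_tid, NULL, capture_main, s) != 0)
        return -1;
    if (pthread_create(&s->sender_tid, NULL, sender_main, s) != 0) {
        atomic_store(&s->stop, true); /* undo: stop the capture thread */
        pthread_join(s->capture_tid, NULL);
        return -1;
    }
    return 0;
}

/* Called when a "stop" command arrives: signal both threads, then wait. */
void send_stream_stop(send_stream *s)
{
    atomic_store(&s->stop, true);
    pthread_join(s->capture_tid, NULL);
    pthread_join(s->sender_tid, NULL);
}
```

Joining both threads before returning means the caller can safely free the stream's resources afterwards. That fits the rule that stopping a stream releases everything tied to its identifier.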
The XML-RPC server thread is always on and listening for incoming commands
from the remote user. The RTP sender thread is created and starts running when
the remote user commands the XML-RPC server to send data to a receiver. Such a
command also starts the capture thread.

The software maintains a circular list of media buffers, which it uses to
process the continuous video stream. The circular list holds 30 buffers, each
of which holds one encoded video frame. The capture thread continuously grabs
frames from the video input device and pushes each grabbed frame into one of
the available media buffers. The RTP sender thread processes the video frames
by pulling them out of the media buffers filled by the capture thread. The
media buffers are a resource shared between the capture thread and the RTP
sender thread, so accesses to them are protected by mutexes. The capture
thread, which operates on the input device, may also keep an internal list of
buffers if the video driver is implemented that way. The design of the RTP
sender is, however, completely independent of the design and model of the
underlying video device driver.

The operation of the RTP receiver thread is simpler. It listens on a given
port for RTP packets, receives video frames from the network, decodes them and
displays them in a rendering window. The receiver is also responsible for
deciding whether to display a video frame depending on the timing information
associated with it.

CoVit supports multiple camera capture devices, multiple RTP senders and
multiple RTP receivers. Each capture thread and its RTP sender share a unique
video send stream identifier. Similarly, each RTP receiver is identified by a
unique video receive stream identifier. The user is responsible for creating
video streams (send streams and receive streams) by specifying unique video
stream identifiers, and for subsequently stopping those streams. When the
XML-RPC server receives a command to stop a video stream, CoVit frees all
resources associated with that stream identifier and removes the identifier.
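The 30-slot circular list shared by the capture and RTP sender threads is a classic bounded producer/consumer buffer. The sketch below uses one mutex plus two condition variables. The names and the fixed `FRAME_MAX` slot size are assumptions. So is the choice to make the capture thread wait, rather than drop a frame, when all slots are full, because the text does not say what happens when the sender falls behind.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUFFERS 30    /* the 30 media buffers in the circular list */
#define FRAME_MAX   65536 /* assumed maximum bytes per frame */

typedef struct {
    uint8_t data[FRAME_MAX];
    size_t  len;
} media_buffer;

typedef struct {
    media_buffer    slots[NUM_BUFFERS];
    size_t          head;  /* next slot the sender will read */
    size_t          count; /* number of filled slots */
    pthread_mutex_t lock;  /* protects every field above */
    pthread_cond_t  not_empty, not_full;
} media_ring;

void ring_init(media_ring *r)
{
    r->head = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_empty, NULL);
    pthread_cond_init(&r->not_full, NULL);
}

/* Capture thread: copy one frame in, waiting while all slots are full. */
void ring_push(media_ring *r, const uint8_t *frame, size_t len)
{
    assert(len <= FRAME_MAX);
    pthread_mutex_lock(&r->lock);
    while (r->count == NUM_BUFFERS)
        pthread_cond_wait(&r->not_full, &r->lock);
    media_buffer *b = &r->slots[(r->head + r->count) % NUM_BUFFERS];
    memcpy(b->data, frame, len);
    b->len = len;
    r->count++;
    pthread_cond_signal(&r->not_empty); /* wake the sender if it is waiting */
    pthread_mutex_unlock(&r->lock);
}

/* RTP sender thread: copy the oldest frame out, waiting while the list is
 * empty. out must hold FRAME_MAX bytes; returns the frame length. */
size_t ring_pop(media_ring *r, uint8_t *out)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    media_buffer *b = &r->slots[r->head];
    size_t len = b->len;
    memcpy(out, b->data, len);
    r->head = (r->head + 1) % NUM_BUFFERS;
    r->count--;
    pthread_cond_signal(&r->not_full); /* wake the capture thread if it is waiting */
    pthread_mutex_unlock(&r->lock);
    return len;
}
```

With these values the ring occupies roughly 2 MB, so it is better allocated statically or with `malloc` than on a thread's stack. Copying frames while holding the mutex keeps the sketch simple. A variant that hands out slot pointers instead of copying would shorten the critical section.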
Figure 4 shows the Threading Model.
References:
- Unicap Library: http://www.unicap-imaging.org
- XML-RPC: http://xmlrpc-c.svn.sourceforge.net
- RFC 3550, RTP: A Transport Protocol for Real-Time Applications
- Colin Perkins, RTP: Audio and Video for the Internet
- RTP Library: RTP Library API Specification
- FFMPEG Tutorial: Online FFMPEG Tutorial
Last updated: 01-22-2009 by Subhrendu Sarkar