INTERNET DRAFT                                           9 January 1995

         A Profile for the Transmission of Video Data over RTP

                         Document Revision 0.3
                      Revision Date: 02 Nov 1994

                           Frank Kastenholz
                          FTP Software, Inc
                            2 High Street
                  North Andover, Mass 01845-2620 USA
                            kasten@ftp.com

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   ``working draft'' or ``work in progress.''

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net,
   nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the
   current status of any Internet Draft.

   This is a working document only; it should neither be cited nor
   quoted in any formal document.

   This document will expire before 14 July 1995.  Distribution of
   this document is unlimited.  Please send comments to the author(s).

IETF Exp.                  14 July 1995                         [Page 1]

Internet Draft        A Video Transmission Profile          January 1995

1.  INTRODUCTION

   This document presents the specification for Loki, a profile to
   carry video traffic over RTP[1].  Loki is an experimental protocol
   developed at FTP Software to conduct research into the issues
   surrounding the development, performance, and use of network video
   applications in the PC/Microsoft Windows environment.

   Several factors in that environment affected the decision to
   develop our own video profile, as opposed to doing a straight port
   of NV, or at least a re-implementation of NV's protocols in a
   Windows application, and then influenced the design of that
   protocol:
   1. A main element of the PC/Windows environment is Microsoft's
      "Video for Windows."  Video for Windows is a set of libraries
      and APIs that provide a common environment for writing video
      applications.  Of most concern to network video is a "standard"
      API for controlling video capture devices and obtaining captured
      images.  The use of Video for Windows imposes certain
      constraints on the use of the video-capture hardware, including
      the formats of the data received from that hardware.

   2. PC/Windows machines are rather limited in their performance when
      compared to typical Unix workstations.  Low speed (20, 25, 33
      MHz) 386 processors with 4 or 8 Meg of memory are still very
      common.  There are other performance considerations as well,
      such as Windows' use of 16-bit mode, the use of DOS, and so on.

   3. Network interface adaptors, and their drivers, exhibit widely
      varying levels of performance.  Furthermore, the PC world has
      the notion of "server" cards and "client" cards, with "server"
      cards tending to exhibit higher performance than "client" cards.
      Of course, "server" cards are also more expensive.  In many
      instances, it turns out that the key performance issue is the
      card and driver used by the PC.

   4. The byte ordering native to PCs is reversed when compared to the
      standard Network Byte Order.  Whilst Loki headers are all in
      Network Byte Order, one particular pixel format type is
      transferred as 16-bit integers in PC order rather than Network
      Byte Order.

   5. The programming environment in Windows is vastly different from
      the typical X/Unix environment for which NV was written.
      Therefore, a straight port of NV was eliminated from
      consideration rather early on.

   6. A system should be able to receive and display Loki video
      transmissions without any special display processing hardware
      and with a minimum of processing.
1.1.  Note On Terminology

   While network video is symmetric - a node can be both the sender of
   video data to others and the receiver/displayer of data received
   from others - this symmetry is composed of many asymmetric
   relationships.  That is, a single, two-way video conference between
   two people is actually composed of two, one-way conferences.

   The following terms are used in this document to describe the two
   nodes of a single, one-way, video conference:

   SOURCE
      The SOURCE is the node that is actually transmitting video image
      data.

   SINK
      The SINK is the node(s) that actually receives the video data
      from the SOURCE and displays it (or otherwise processes it).

   Note that a single physical node can be both a SOURCE and a SINK.
   Furthermore, both SOURCEs and SINKs will transmit and receive
   packets.  E.g., both will send and receive SDES packets; a SOURCE
   will send SR packets and receive RR packets from the SINKs, while a
   SINK receives the SR packets and sends RR packets.

1.2.  Change Log

   The following changes have been made to the Loki specification
   since the previous document.

   (1)  The Loki header is no longer considered an RTP Header
        Extension.

   (2)  The Loki header has been rearranged since it is no longer an
        RTP header extension (the length field is no longer needed).
        In addition, some of the unused fields have been removed to
        save space.  Fields have been gratuitously rearranged to make
        full use of the 4 bytes before the version number.  This way
        the version number is kept 'in position', allowing easy
        version differentiation.

   (3)  The Loki Header version number has been changed to version 2.

   (4)  The Can I See request packet has been restructured.  The
        packet now includes a field specifying the network address (IP
        Address/UDP Port) to send the Can I See video stream to.  This
        is needed because of the curious behavior of some PC TCP/IP
        stacks w.r.t. multicasts.
   (5)  The version number for the Loki RTCP Application header has
        been changed to version number 2.

   (6)  A description of the Can I See function has been added.

2.  PROTOCOL SPECIFICATION

   This chapter specifies the Loki video protocol.

2.1.  Byte Order

   Unless otherwise mentioned, all data are transferred in Network
   Byte Order, as specified in [2].

2.2.  Video Data

   This section specifies the protocol used to carry the video data.
   Loki video data packets are carried in RTP Data Packets[1].  Loki
   adds an additional header to the packet, behind the RTP header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         RTP Header...                         |
   :                                                               :
   |                        ... RTP Header                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Loki Header...                        |
   :                                                               :
   |                        ... Loki Header                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Loki Data...                          |
   :                                                               :

                            RTP Data Packet

              Each tick mark represents one bit position.

2.2.1.  Packet Size

   Loki implementations MUST be able to send and receive packets
   containing at least 1,000 bytes of Loki Data.

2.2.2.  RTP Header Profile

   This section specifies the use of several fields within the RTP
   header.

2.2.2.1.  Marker Bit

   The Marker (M) bit of the RTP header is not used by Loki.  Loki
   implementations should not generate packets with the M bit set, and
   they should ignore the bit in received packets.

2.2.2.2.  Payload Type

   The payload type field for Loki packets is TBD.  (Current
   experiments use the value 0x11, but this is subject to change.)

2.2.2.3.  Extension Bit

   The RTP Header Extension (E) bit is not used by Loki.

2.2.3.  Loki Header

   This section specifies the Loki header (which follows the RTP
   header).
   All Loki video packets contain the following header between the RTP
   header and the actual video data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Width             |            Height             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |    Unused 1   |            Format             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Loki Video Header

              Each tick mark represents one bit position.

   Where:

   Width
      This field contains the width of the image, in pixels.  By
      carrying the width and height information in every data packet,
      the image size can be changed dynamically during the session.
      All Loki SINKs MUST support this.

   Height
      This field contains the height of the image, in pixels.  By
      carrying the width and height information in every data packet,
      the image size can be changed dynamically during the session.
      All Loki SINKs MUST support this.

   Version
      This field contains the version number of the Loki protocol.
      The version of the protocol specified in this document is 2.
      The value 0 is explicitly reserved for research use.  Any
      received packet with an invalid version number, or a number
      identifying a version that is not supported, MUST be discarded.

   Unused 1
      This field is currently unused.  It is present to preserve the
      alignment of the following fields.

   Format
      This field contains a value that describes the video data
      format.  Loki allows for several different video data formats,
      covering both how each pixel is encoded and how the video frame
      is chopped up and transmitted.  All of these formats are
      described in the "Video Encoding Specifications" chapter, below.

2.3.  RTCP Extensions

   With the exception of application-defined packets (see next
   section), Loki does not extend any of the RTCP packets in any way.
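   The 8-byte Loki video header of section 2.2.3 is fixed-format, so
   it can be illustrated with a short sketch (Python is used here
   purely for illustration; the field names follow the diagram, while
   the helper names are my own):

```python
import struct

LOKI_VERSION = 2

def pack_loki_header(width, height, fmt):
    """Pack the 8-byte Loki video header in Network Byte Order."""
    # !HHBBH = 16-bit Width, 16-bit Height, 8-bit Version,
    #          8-bit Unused 1 (sent as 0), 16-bit Format.
    return struct.pack("!HHBBH", width, height, LOKI_VERSION, 0, fmt)

def parse_loki_header(data):
    """Return (width, height, format), or None if the packet must be
    discarded because the version is invalid or unsupported."""
    width, height, version, _unused, fmt = struct.unpack("!HHBBH", data[:8])
    if version != LOKI_VERSION:
        return None
    return width, height, fmt
```

   The `!` prefix selects big-endian packing, matching the 2.1 rule
   that all header fields are in Network Byte Order.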
2.4.  Loki Control Packets

   Loki defines several control packets.  These are all carried in
   RTCP APP packets.  Loki control packets are identified by the four-
   character string "loki" (in 7-bit ASCII) in the "name" field of the
   RTCP APP packet[1].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RTCP Flags  | RTCP Payload  |          RTCP Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Application Name -- "loki"                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Version    |            Unused             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Type Specific...                        |

                       Loki Control Packet Header

              Each tick mark represents one bit position.

   The RTCP fields, the SSRC field, and the Application Name field are
   as described for the RTCP APP packet in the RTP specification[1].

   Type
      This field identifies the type of Loki Control packet this is.
      The specific types are described in the following sections.

   Version
      This field identifies the version of the Loki Control protocol.
      The version number described in this document is version 2.
      This field must contain a 2.

   Unused
      This field is unused.

2.4.1.  Discard

   This is the DISCARD Loki Control packet.  If this packet is
   received it must be discarded.  No error conditions or
   notifications may be generated as a result of receiving this
   packet.  This packet must be included in any packet accounting done
   by the receiver (e.g., bytes or packets received on the control
   port).

   There are no extensions to the Loki Control Packet Header.  If an
   application appends more data, that data MUST be ignored by the
   receiver.

   This packet is packet type 0.
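   The Loki-specific portion of a control packet (the "loki" name plus
   the Type/Version/Unused word of section 2.4) can be sketched as
   follows.  The RTCP APP framing that precedes it is left to the RTCP
   layer, and the helper name is illustrative:

```python
import struct

LOKI_CTRL_VERSION = 2
TYPE_DISCARD = 0        # packet type codes from sections 2.4.1 - 2.4.4
TYPE_SUSPEND = 1
TYPE_CNIC_REQUEST = 2
TYPE_CNIC_STATUS = 3

def loki_control_body(pkt_type, type_specific=b""):
    """Build the name + Type/Version/Unused portion of a Loki control
    packet, followed by any type-specific fields."""
    body = b"loki"                                              # 4-byte ASCII name
    body += struct.pack("!BBH", pkt_type, LOKI_CTRL_VERSION, 0)  # Type, Version, Unused
    return body + type_specific
```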
2.4.2.  Suspend

   This packet tells the SINKs that the SOURCE has temporarily stopped
   transmission but will probably resume shortly.  A SINK may then
   notify the human users of this condition.

   This packet is packet type 1.

   A SOURCE should retransmit this packet periodically while
   suspended.  The exact retransmission strategy is not important.
   One strategy which implementation experience has shown to be
   effective is for a SOURCE to send an initial burst of 3 or 4
   Suspend packets at short intervals (such as 100ms) and then go to a
   longer period, such as once a minute.

   There are no extensions to the Loki Control Packet Header.  If an
   application appends more data, that data MUST be ignored by the
   receiver.

2.4.3.  Can I See Request

   This packet is used to submit a "Can I See" request to a known
   video source.  The operation of the Can I See function is described
   in a following section.

   The Can I See (CNIC) request packet is packet type 2.

   The CNIC Request packet adds the following fields to the basic Loki
   Control Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Loki Control Packet Header ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Request Handle                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Requested Frame Count     |        Desired Format         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Requested Epsilon       |  Addr Family  |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Server Csrc                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RTCP Request Address...                    |
   |                   ...RTCP Request Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RTP Request Address...                     |
   |                    ... RTP Request Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          CNIC Request Packet

              Each tick mark represents one bit position.

   Where

   Request Handle
      This is a 32-bit value assigned by the SINK to the request.  The
      SOURCE will include this value in any Can I See status packets
      to assist the SINK in matching status indications to requests.

   Requested Frame Count
      This is the desired number of video frames which the SINK wishes
      to see.  The SOURCE must make every effort to send this number
      of frames to the SINK.

      A frame count of 0 indicates that the SINK wishes the SOURCE to
      immediately stop video transmission.

      A frame count of 65535 (0xffff) indicates that the SINK wishes
      the SOURCE to send video data indefinitely.

   Desired Format
      The video transmit format (see "Video Encoding Specifications")
      which the SINK wishes to receive.  A SOURCE is not required to
      use this value in satisfying the request; it is merely an
      indication of the SINK's desire.

   Requested Epsilon
      This field carries a desired value to use for epsilon in any
      pixel comparisons done in satisfying the request.  See the
      section on "Pixel Comparisons", below, for a description of the
      use of the epsilon value.  A SOURCE is not required to use this
      value in satisfying the request; it is merely an indication of
      the SINK's desire.

   Address Family
      This field identifies the address family for the Request
      Address.  The following values are supported:

      0    Reserved.
      1    UDP/IPv4
      2    Reserved for UDP/IPv6

   Flags
      This field contains some flag bits:

      0x01 Indicates that the Server's CSRC field actually contains a
           valid CSRC.

   Server's CSRC
      This field contains the Content Source that identifies the
      server to which the request is directed.  This field may assist
      Mixers/Bridges and Translators in passing requests.
      No particular CSRC value is reserved, so a separate flag bit has
      been defined to indicate that the field actually contains a
      CSRC.

   RTP Request Address
   RTCP Request Address
      These two fields indicate where the responses should be sent.
      The actual format of the fields depends on the address family.
      Currently, only the UDP/IPv4 family is specified:

      +--------+--------+--------+--------+
      |             IP Address            |
      +--------+--------+--------+--------+
      |     UDP Port    |      unused     |
      +--------+--------+--------+--------+

      There are two such fields.  The first one contains the
      addressing information specifying where RTCP packets are to be
      sent.  The second one contains the information specifying where
      the RTP data should be sent.

      If any element of the Request Address fields contains 0, then
      the value should be derived from the source address and port
      information of the packet.  For example, if the IP Address
      fields are 0, then the IP address to which the data are sent
      should be the source address of the packet containing the
      request.

2.4.4.  Can I See Status

   This packet is used to indicate status information to a "Can I See"
   requestor.

   The Can I See (CNIC) Status packet is packet type 3.

   The CNIC Status packet adds the following fields to the basic Loki
   Control Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Loki Control Packet Header ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Request Handle                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Status Code          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          CNIC Status Packet

              Each tick mark represents one bit position.

   Where

   Request Handle
      Is the Request Handle field of the CNIC Request packet to which
      this status indication applies.
      This information may assist the SINK in matching requests with
      status indications.

   Status Code
      This is the actual status code.  The following codes are
      defined:

      0 - NO-OP
         This is a no-op.  It may be ignored.

      1 - No Resources
         This indicates that the CNIC request is being rejected
         because the SOURCE does not have the resources to honor the
         request.

      2 - Disabled
         This indicates that the CNIC request is being rejected
         because the CNIC service at the SOURCE is either
         administratively disabled or not supported at all.

      3 - Too Many Frames
         This value indicates that the request is being rejected
         because the requested number of frames exceeds an
         administrative limit set at the SOURCE on the number of
         frames allowed for any single request.

      4 - Prohibited to You
         This value indicates that the request is being rejected
         because the SOURCE is administratively prohibited from
         honoring requests from the SINK.

      5 - Terminated
         This code is sent to the SINK when the SOURCE has completed
         honoring the request.

      6 - Too Many Requests
         This value indicates that the request is being rejected
         because the request would exceed the SOURCE's administrative
         limit on the number of requests that it can handle.

3.  Video Encoding Specifications

   This section describes the data formats used by Loki.  There are
   two separate parts to the data format specifications: the
   specifications of the formats for the data for the individual
   pixels (called Pixel Formats), and the specifications for how the
   pixel data are assembled, formed into packets, and transmitted
   (called Transfer Formats).

3.1.  Pixel Formats

   This section contains the specifications of how individual pixels
   are represented.

3.1.1.  24 Bit RGB

   In this format the pixel is transferred as a 24-bit RGB triple.
   Each of the Red, Green, and Blue values is transferred as a single
   8-bit quantity, each one in a single byte.  The elements are
   transferred in the order Blue, Green, Red:

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
   |      Blue     | |     Green     | |      Red      |
   +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+

                       24-bit RGB Pixel Format

                  Each tick-mark represents one bit

   Note that this format is in the color ordering that is native to
   Microsoft Windows.

3.1.2.  16 Bit RGB

   This format transfers each pixel as a 16-bit RGB value.  Each of
   the Red, Green, and Blue values is transferred in 5 bits.  The
   three 5-bit values are packed into a single 16-bit element for
   transmission.  The elements are packed as follows:

    0                               1
    0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |G2 G1 G0 B4 B3 B2 B1 B0 xx R4 R3 R2 R1 R0 G4 G3|
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

                       16-bit RGB Pixel Format

                  Each tick-mark represents one bit

   Where:

   xx
      Is an unused bit.

   B0...B4
      Are the 5 bits representing the BLUE value, B0 being the least
      significant bit and B4 the most significant bit.

   G0...G4
      Are the 5 bits representing the GREEN value, G0 being the least
      significant bit and G4 the most significant bit.

   R0...R4
      Are the 5 bits representing the RED value, R0 being the least
      significant bit and R4 the most significant bit.

   This bit and byte ordering is an artifact of the Intel '86
   processor family's byte ordering.  The ordering is "correct" for
   this processor family in that, when treating this pixel as a single
   16-bit integer value, the BLUE bits are the least significant, the
   RED bits the most significant, and all of the GREEN bits are
   adjacent.

3.1.3.  8 Bit Monochrome

   This format transfers each pixel as a single 8-bit monochrome
   value.  Each value is transferred in a single byte.
   One possible method for conversion from a 24-bit RGB value would
   be:

      mono_value = (pixel.red + pixel.green + pixel.blue) / 3;

3.1.4.  4 Bit Monochrome

   This format transfers each pixel as a single 4-bit monochrome
   value.  Each transferred byte contains two pixels' worth of data:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |Pixel 1|Pixel 2|
   +-+-+-+-+-+-+-+-+

                    4-bit Monochrome Pixel Format

   One possible method for conversion from a 24-bit RGB value would
   be:

      mono_byte = (((pixel[0].red + pixel[0].green + pixel[0].blue)
                       / 3) & 0xf0)
                + (((pixel[1].red + pixel[1].green + pixel[1].blue)
                       / 3) >> 4);

3.1.5.  8 Bit Palette

   This pixel format type is reserved for future use.  The intent is
   to use it to transfer image data using Microsoft Windows' 8-bit
   palettized data format.

3.1.6.  4 Bit Palette

   This pixel format type is reserved for future use.  The intent is
   to use it to transfer image data using Microsoft Windows' 4-bit
   palettized data format.

3.1.7.  411 YUV

   This format is available from some video capture hardware (notably
   Intel's Indeo card).  In this format there are 8 bits of combined
   Luminance/Y data per pixel.  In addition, for every 2 pixels there
   is one U and one V bit.

   When transferring 411 YUV data in the "Simple Transfer Format" (see
   the following section), the data are transferred exactly as
   received from the video capture hardware.  Specifically, for an N
   pixel image:

   -  N bytes of Luminance/Y data, followed by

   -  N/16 bytes of V data (1 bit per 2 pixels), followed by

   -  N/16 bytes of U data (also 1 bit per 2 pixels).

   When transferring 411 YUV data in the "Block Transfer Format" (see
   the following section), all data for a single 8x8 pixel block are
   transferred together.
   Specifically, each 64-pixel block is transferred as:

   -  64 bytes of Luminance/Y data, followed by

   -  4 bytes of V Chrominance Data (1 bit for every 2 pixels),
      followed by

   -  4 bytes of U Chrominance Data (1 bit for every 2 pixels).

   The Luminance/Y bytes are transferred in the following order (given
   in X/Y coordinates within the 8x8 block):

      (0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (0,6), (0,7),
      (1,0), (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (1,7),
      (2,0), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (2,7),
      (3,0), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (3,7),
      (4,0), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (4,7),
      (5,0), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (5,7),
      (6,0), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6), (6,7),
      (7,0), (7,1), (7,2), (7,3), (7,4), (7,5), (7,6), (7,7)

   The U and V data are then transferred as follows:

      Bit Number   Pixel Coordinates
          0        (0,0), (0,1)
          1        (0,2), (0,3)
          2        (0,4), (0,5)
          3        (0,6), (0,7)
          4        (1,0), (1,1)
          5        (1,2), (1,3)
          6        (1,4), (1,5)
          7        (1,6), (1,7)
          8        (2,0), (2,1)
          9        (2,2), (2,3)
         10        (2,4), (2,5)
         11        (2,6), (2,7)
         12        (3,0), (3,1)
         13        (3,2), (3,3)
         14        (3,4), (3,5)
         15        (3,6), (3,7)
         16        (4,0), (4,1)
         17        (4,2), (4,3)
         18        (4,4), (4,5)
         19        (4,6), (4,7)
         20        (5,0), (5,1)
         21        (5,2), (5,3)
         22        (5,4), (5,5)
         23        (5,6), (5,7)
         24        (6,0), (6,1)
         25        (6,2), (6,3)
         26        (6,4), (6,5)
         27        (6,6), (6,7)
         28        (7,0), (7,1)
         29        (7,2), (7,3)
         30        (7,4), (7,5)
         31        (7,6), (7,7)

3.2.  Transfer Formats

   This section describes how the data for the individual pixels
   (encoded per the previous section) are assembled together into
   packets for transmission.  There are two transfer formats, simple
   and block.  In simple mode, the data as received from the video
   capture hardware are gathered up into chunks and transmitted "as
   is".
   In block mode, the image data are broken up into 8x8 pixel cells,
   with some "diff" analysis and some backgrounding done.

   There are, in fact, two "simple" modes of transfer.  One is for all
   video formats EXCEPT the Indeo format.  The other is JUST for the
   Indeo YVU format.  The reason that there is a separate Simple mode
   for Indeo is that Indeo represents the pixels in YVU format and it
   breaks up the three values (Y, V, and U) into separate locations in
   the video capture buffers.  The "non-Indeo" Simple Mode requires
   that all the data for a specific pixel be transferred together, so
   sending Indeo YVU data in this mode would require that three
   separate chunks of data be brought together, transmitted, and then
   broken up again.  The role of the Simple Mode is to be simple, and
   this all seemed rather complex.  In addition, since the Indeo YVU
   data format uses 9 bits for each pixel, it would be very likely
   that a non-integral number of bytes would have to be transferred,
   further complicating matters.

3.2.1.  Simple Mode

   This mode applies to all data formats except the Indeo data format.

   Simple Mode is a simple transfer of all the data received from the
   video capture hardware.  No additional compressions or
   transformations are performed on the data.

   In Simple Mode, data from the video capture hardware are gathered
   up into blocks of up to 255 pixels.  Each block is preceded by a
   4-byte "Video Element Header" which gives the length of the element
   and that element's position within the image.

   Each packet MUST contain an integral number of pixels.  That is to
   say, for encodings that use more than one byte to represent a
   pixel, all bytes for a pixel must be in the same packet.  E.g., if
   24-Bit RGB data is being transmitted, it is NOT allowed to send the
   bytes containing the Blue and Green information in packet N and the
   byte containing the Red information in packet N+1.
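   The Simple Mode chunking rules above (blocks of at most 255 pixels,
   each block a whole number of pixels, 4 bytes of per-block header)
   can be sketched as follows; the function name is illustrative, and
   with a 1,000-byte budget and 3-byte RGB-24 pixels the sketch
   reproduces the 998-byte worked example given below:

```python
ELEMENT_HEADER_SIZE = 4      # each block is preceded by a 4-byte header
MAX_PIXELS_PER_BLOCK = 255   # the Pixel Count field is 8 bits; 0 means "no pixels"

def chunk_pixels(total_pixels, bytes_per_pixel, max_payload):
    """Split an image into (pixel_count, bytes_on_wire) blocks fitting one
    packet budget, keeping every block to a whole number of pixels."""
    blocks = []
    remaining = total_pixels
    budget = max_payload
    while remaining and budget >= ELEMENT_HEADER_SIZE + bytes_per_pixel:
        fit = (budget - ELEMENT_HEADER_SIZE) // bytes_per_pixel
        count = min(remaining, fit, MAX_PIXELS_PER_BLOCK)
        size = ELEMENT_HEADER_SIZE + count * bytes_per_pixel
        blocks.append((count, size))
        remaining -= count
        budget -= size
    return blocks
```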
   The format of the Video Element Header is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Pixel Count  |      X Coordinate     |      Y Coordinate     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Video Element Header

              Each tick mark represents one bit position.

   Where

   Pixel Count
      This is the number of pixels being transferred under this
      header.  The maximum number of pixels is 255.  A count of 0
      means "no pixels".

   X Coordinate
      This is the X coordinate (0-based) of the first pixel in the
      block being transferred.  Coordinate (0,0) is the upper-left
      corner of the screen.

   Y Coordinate
      This is the Y coordinate (0-based) of the first pixel in the
      block being transferred.  Coordinate (0,0) is the upper-left
      corner of the screen.

   A given packet may have more than one block of data in it.  For
   example, if an image is being transmitted in RGB-24 format, and a
   packet may have up to 1,000 data bytes, the packet could contain:

   1. A 4-byte Video Element Header,

   2. 765 bytes of RGB-24 data (255 pixels at 3 bytes per pixel),

   3. A second 4-byte Video Element Header, and

   4. 225 more bytes of RGB-24 data (75 pixels at 3 bytes per pixel).

   This gives a total of 998 bytes of data.  The remaining 2 bytes are
   not used, since the packet must hold an integral number of pixels
   and the pixel format for this example uses 3 bytes per pixel.

3.2.2.  Indeo Simple Mode

   When sending Indeo-formatted pixels in the Simple Mode, a single
   additional header is added to the packet.  This header is added
   after the Loki Video Header and before the image data.
   The header's format is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Buffer Size                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Offset                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where

   Buffer Size
      Is the total size, in bytes, of the buffer needed to hold all of
      the Indeo data.

   Offset
      Is the offset (0-based) in the Indeo data buffer of the first
      byte in this packet.

   The rest of the packet contains the image data.

3.2.3.  Block Mode

   Each block is preceded by a 4-byte header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |      X Coordinate     |      Y Coordinate     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Block Mode Header

              Each tick mark represents one bit position.

   Where

   Flags
      Contains flag information:

      0x80 Insert the background image information (stored in the
           SINK's background buffer) into the display at the location
           specified in the X/Y coordinates.  This is the
           DISPLAY_BACKGROUND_DATA flag in the algorithm presented
           below.

      0x40 Copy the data for this block from the packet into the
           SINK's background buffer.  This is the COPY_DATA_TO_BG flag
           in the algorithm presented below.

   X Coordinate
      This is the X coordinate (0-based) of the block being
      transferred.  This coordinate is in units of 8x8 blocks; thus,
      the 8x8 block at coordinates (1,1) would be at pixel location
      (8,8) in the image.  Coordinate (0,0) is the upper-left corner
      of the screen.

   Y Coordinate
      This is the Y coordinate (0-based) of the block being
      transferred.  This coordinate is in units of 8x8 blocks; thus,
      the 8x8 block at coordinates (1,1) would be at pixel location
      (8,8) in the image.  Coordinate (0,0) is the upper-left corner
      of the screen.
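   Building the 4-byte Block Mode header can be sketched as follows.
   Note that the 8/12/12 bit split for Flags and the two coordinates
   is read off the diagram rather than stated explicitly in the text,
   so it is an assumption here, as are the helper names:

```python
# Flag bits from the Flags field of the Block Mode Header
DISPLAY_BACKGROUND_DATA = 0x80
COPY_DATA_TO_BG = 0x40

def pack_block_header(flags, block_x, block_y):
    """Pack Flags (8 bits) and the X/Y block coordinates (assumed 12 bits
    each) into a 4-byte big-endian Block Mode header."""
    word = (flags << 24) | ((block_x & 0xFFF) << 12) | (block_y & 0xFFF)
    return word.to_bytes(4, "big")
```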
   An advantage of block mode is that the quantum of transfer is a
   single block of 64 pixels.  Thus it allows for improved efficiency
   when sending only image changes.  A SOURCE might examine only 2 or
   3 pixels of each block and, if any of those pixels had changed,
   send the entire block.

3.3.  Format Values

   The Loki Header Format field may contain the following values,
   describing the format of the data carried in the packet:

      +-------+----------+----------------+
      | Value | Transfer | Pixel Encoding |
      |       | Format   | Format         |
      +-------+----------+----------------+
      |   0   |         Reserved          |
      +-------+----------+----------------+
      |   1   | Simple   | 24 Bit RGB     |
      +-------+----------+----------------+
      |   2   | Block    | 24 Bit RGB     |
      +-------+----------+----------------+
      |   3   | Simple   | 16 Bit RGB     |
      +-------+----------+----------------+
      |   4   | Block    | 16 Bit RGB     |
      +-------+----------+----------------+
      |   5   | Simple   | 8 Bit Palette  |
      +-------+----------+----------------+
      |   6   | Block    | 8 Bit Palette  |
      +-------+----------+----------------+
      |   7   | Simple   | 4 Bit Palette  |
      +-------+----------+----------------+
      |   8   | Block    | 4 Bit Palette  |
      +-------+----------+----------------+
      |   9   | Simple   | 8 Bit Mono     |
      +-------+----------+----------------+
      |  10   | Block    | 8 Bit Mono     |
      +-------+----------+----------------+
      |  11   | Simple   | 4 Bit Mono     |
      +-------+----------+----------------+
      |  12   | Block    | 4 Bit Mono     |
      +-------+----------+----------------+
      |  13   |          Unused           |
      +-------+----------+----------------+
      |  14   | Simple   | Raw Indeo      |
      +-------+----------+----------------+
      |  15   | Block    | Raw Indeo      |
      +-------+----------+----------------+

   In keeping with the robustness principle, if some value other than
   the ones specified above is received in a packet, a receiver will
   ignore the packet.
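   The table above, together with the 16-bit RGB packing of section
   3.1.2 (the one field sent in PC byte order rather than Network Byte
   Order), can be illustrated with a sketch; the names are
   illustrative:

```python
import struct

# Format field values from the table above: value -> (transfer, encoding)
FORMATS = {1: ("Simple", "24 Bit RGB"),   2: ("Block", "24 Bit RGB"),
           3: ("Simple", "16 Bit RGB"),   4: ("Block", "16 Bit RGB"),
           5: ("Simple", "8 Bit Palette"), 6: ("Block", "8 Bit Palette"),
           7: ("Simple", "4 Bit Palette"), 8: ("Block", "4 Bit Palette"),
           9: ("Simple", "8 Bit Mono"),   10: ("Block", "8 Bit Mono"),
           11: ("Simple", "4 Bit Mono"),  12: ("Block", "4 Bit Mono"),
           14: ("Simple", "Raw Indeo"),   15: ("Block", "Raw Indeo")}

def pack_rgb16(r5, g5, b5):
    """Pack 5-bit R, G, B into the 16-bit pixel of section 3.1.2 and emit
    it low byte first (PC order), giving the G2..G0 B4..B0 / xx R4..R0
    G4 G3 byte layout shown in the diagram."""
    value = (r5 & 0x1F) << 10 | (g5 & 0x1F) << 5 | (b5 & 0x1F)
    return struct.pack("<H", value)
```

   A receiver would drop any packet whose Format value is absent from
   such a table, per the robustness rule above.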
4. Algorithms

   This chapter describes several key algorithms in the FTP
   Software implementation of Loki.  In general, these algorithms
   have no effect on interoperability; however, they can improve
   performance and are offered on an informational basis.
   Implementation experience has shown them to be effective.

4.1. Pixel Comparisons

   Some video digitization hardware introduces a fairly large
   amount of "noise" into the system.  Even if a scene is truly
   unchanging, the numeric values for the pixels of the scene can
   vary over a fairly large range.  So, when comparing two pixel
   values (such as the "current" and "previous" values for
   block-mode transmissions), a false "unequal" may occur because
   of this noise.

   Therefore, in order to suppress this noise, pixel values are
   compared not for "strict equality" but for "close equality".
   Specifically, comparisons are done as:

      if ((pixel_a <= (pixel_b + epsilon)) &&
          (pixel_a >= (pixel_b - epsilon))) {
              The pixels are considered "equal"
      }

   One particular piece of video-capture hardware has a "noise
   level" of +/- 32 when capturing RGB-24 images.  That is, when
   capturing an unchanging scene, the values of a pixel's
   components in two successive images are deemed to be "unequal"
   for epsilon values less than 32.

   Other than tending to reduce the amount of data that needs to be
   transmitted, this comparison algorithm has no effect on the
   protocol or the network.  However, in order to support the
   algorithm, the "Can I See" request includes a field that permits
   the requestor to specify a desired "epsilon" value.

4.2. Adaptive Transmit

   In order to make the best of changing network conditions and
   differing receiver capabilities, an adaptive transmission
   algorithm is used.
   This algorithm operates at the SOURCE, which evaluates the
   performance perceived by the SINKs.  The SOURCE inserts an
   artificial delay between transmitted packets and varies this
   delay in an attempt to maximise the throughput and frame rate
   while minimising the loss rate.

   The algorithm uses a high and a low "receiver quality
   threshold".  If the receiver quality falls below the low
   threshold, the algorithm increases the added inter-packet gap
   until the quality rises above the high threshold (or until some
   administrative limit on the size of the gap is reached).  If the
   receiver quality exceeds the high threshold, the gap is reduced
   until the quality falls below the high threshold (or the added
   gap is reduced to 0).  The intent is to keep the receiver
   quality within some range.

   The algorithm operates as follows:

   (1)  The SINKs all periodically report their reception
        characteristics to the SOURCE via the RR packet.  This
        packet includes, among other things, a count of the number
        of packets actually received by the SINK as well as a count
        of the number of packets expected by the SINK (based on the
        RTP sequence number field).  This is a standard function of
        RTP.

   (2)  For each known SINK, the SOURCE maintains a "percent
        received" value:

           Number of Packets Received by the Sink * 100
           --------------------------------------------
             Number of Packets Expected by the Sink

   (3)  Periodically, the SOURCE takes the average "percent
        received" value over all known SINKs.  This value, Q, is
        then fed into the following algorithm:

           static BOOL correcting = FALSE;

           if (Q > HI_THRESHOLD) {
               decrease interpacket gap;
               correcting = FALSE;
           } else if (Q < LOW_THRESHOLD) {
               correcting = TRUE;
               increase interpacket gap;
           } else if (correcting == TRUE) {
               increase interpacket gap;
           }

   The two values HI_THRESHOLD and LOW_THRESHOLD define the desired
   performance at the SINKs.
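The threshold logic above can be rendered as a small state machine. This is an illustrative sketch: the threshold values, step size, and administrative bound on the gap are implementation choices assumed here, not part of the protocol.

```python
HI_THRESHOLD = 95   # percent received; illustrative value
LOW_THRESHOLD = 85  # illustrative value
GAP_STEP_MS = 1     # adjust the gap in small increments
MAX_GAP_MS = 50     # administrative bound on the added gap


class AdaptiveGap:
    """Track the added inter-packet gap at the SOURCE."""

    def __init__(self):
        self.gap_ms = 0
        self.correcting = False

    def update(self, q):
        """q is the average percent-received over all known SINKs."""
        if q > HI_THRESHOLD:
            # Quality is good: back the gap off, stop correcting.
            self.gap_ms = max(0, self.gap_ms - GAP_STEP_MS)
            self.correcting = False
        elif q < LOW_THRESHOLD:
            # Quality is poor: widen the gap until Q recovers.
            self.correcting = True
            self.gap_ms = min(MAX_GAP_MS, self.gap_ms + GAP_STEP_MS)
        elif self.correcting:
            # Between thresholds but still recovering: keep widening.
            self.gap_ms = min(MAX_GAP_MS, self.gap_ms + GAP_STEP_MS)


g = AdaptiveGap()
g.update(80)  # below LOW_THRESHOLD: start correcting
g.update(90)  # between thresholds while correcting: keep increasing
g.update(98)  # above HI_THRESHOLD: back off
```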
   If the overall SINK performance, Q, falls below LOW_THRESHOLD,
   the inter-packet gap is steadily increased until Q rises above
   HI_THRESHOLD.  If Q is above HI_THRESHOLD, the inter-packet gap
   is decreased until Q falls below that value.

   Increasing and decreasing the inter-packet gap is done in fairly
   small increments.  Significant increases in image quality at the
   SINKs have been observed after adding only a few milliseconds of
   gap between packets.  Ideally, the gap would be adjusted by the
   smallest effective time quantum available on the SOURCE system.
   Obviously, an implementation would wish to bound the maximum
   amount of time added as an inter-packet gap.

   Decreasing the gap has the effect of increasing the load on the
   network.  To be a "good network citizen", one should decrease
   the gap more slowly than one increases it - perhaps only every
   n'th pass through the algorithm (with n > 1).

   The basis for this algorithm is similar to that of the Van
   Jacobson congestion control algorithm for TCP: networks are
   generally very reliable and do not lose packets because of error
   conditions; lost packets are the result of congestion someplace
   in the network.  Therefore, to reduce packet loss rates (the
   goal for video transmission, since reduced loss means better
   image quality), we must reduce congestion.

   Network video transmission by PCs has a bi-modal network load
   characteristic.  There is a period of time when a frame is
   captured and digitized and no network traffic is generated.
   Then there is a period of time during which the digitized frame
   is packetized and transmitted.  The congestion that must be
   reduced is the congestion that occurs during this second period,
   when the frame is actually transmitted.  Simply reducing the
   frame capture rate, or increasing the compression, will not
   solve this problem.
   Decreasing the frame capture rate will only increase the length
   of the quiet time.  Increasing the compression on the frames
   will only reduce the length of time during which data are
   transmitted (reducing the number of packets and bytes sent).
   However, experience has indicated that packet loss occurs
   because packets arrive at interfaces (receiving hosts or
   intermediate routers) too fast -- the interface cannot be
   'turned around' in time to receive the next packet.  The chosen
   solution was to add a delay between packet transmissions.

   An improvement to the algorithm would be to add the gap between
   every N packets, on the theory that most interfaces can receive
   several packets in rapid succession but then need some time to
   "reset" for the next batch of packets.  This behavior has, in
   fact, been seen on some combinations of PC and network
   interface.  Due to time constraints, this work was not done.

4.3. Jitter

   The RTP Receiver Reports carry an interarrival jitter field;
   however, no algorithm is specified.  For Loki, the jitter is the
   standard deviation of the inter-packet arrival time, expressed
   in milliseconds.

   A filter is applied to the incoming packets.  The inter-arrival
   time between two packets, X and Y, is calculated and used for
   jitter calculations if and only if those packets meet the
   following two criteria:

   (1)  Both packets must be from the same video frame.  That is,
        the RTP Timestamp must be the same for both packets.  This
        criterion filters out the gap while the transmitter grabs,
        digitizes, and starts packetizing the "next" frame.

   (2)  The RTP Sequence Number for packet Y must be X's sequence
        number plus one.  This filters out any gaps that occur
        because of missing packets.
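The two filtering criteria, together with the running sums used by the standard-deviation calculation, might be implemented as follows. This is a sketch: the RTP timestamp, sequence number, and arrival time are passed in explicitly, and sequence-number wrap-around is ignored for brevity.

```python
import math


class JitterEstimator:
    """Accumulate n, Sigma(x), and Sigma(x**2) over accepted gaps."""

    def __init__(self):
        self.prev = None  # (timestamp, seq, arrival_ms) of last packet
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def packet(self, timestamp, seq, arrival_ms):
        p = self.prev
        self.prev = (timestamp, seq, arrival_ms)
        if p is None:
            return
        # Criterion 1: same video frame (same RTP Timestamp).
        # Criterion 2: consecutive RTP Sequence Numbers.
        if p[0] != timestamp or seq != p[1] + 1:
            return
        x = arrival_ms - p[2]  # accepted inter-arrival time, ms
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def stddev_ms(self):
        """Computed only when a Receiver Report is generated."""
        if self.n < 2:
            return 0.0
        return math.sqrt((self.n * self.sum_x2 - self.sum_x ** 2)
                         / (self.n * (self.n - 1)))
```

Only the three running sums are updated per packet; the square root is deferred to Receiver Report generation, as the text below notes.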
   The standard deviation, s, is calculated according to the
   following formula:

               _________________________________
              /  n * Sigma(x**2) - (Sigma(x))**2
        s =  /   --------------------------------
           \/              n * (n - 1)

   where x is an inter-packet arrival time in milliseconds, n is
   the number of packets over which the calculation is performed,
   and Sigma is the summation operator.

   As an implementation matter, only n, Sigma(x**2), and Sigma(x)
   are updated when a packet is received.  The remaining
   calculations are performed only when the Receiver Report packet
   is generated.

4.4. Block Mode Algorithm

   In the block transmission mode, the image is divided into
   8-pixel by 8-pixel blocks of data, and each 8x8 block is treated
   as a single quantum.  This makes it reasonable to retain a large
   set of state information about the actual image data, allowing
   for more sophisticated transmit features:

   IMAGE DIFFING
        In block mode it is reasonably efficient to examine a few
        pixels of each block in order to determine whether to
        transmit that block's data or not.  The pixels chosen for
        the old vs. new comparison should be randomly selected from
        within the block; otherwise it is possible that a corner of
        a block would change but that change would never be
        detected.

   BACKGROUNDING
        If a block doesn't change for several frames then that
        block is presumed to be 'background'.  The SOURCE then
        informs the SINKs that a particular block's data is now
        background data.  As the image changes, the background data
        can be re-displayed by a short header (which just gives the
        coordinates of the block and a "display background here"
        command) rather than by sending the entire block's worth of
        data.

   TIMEOUT
        It is conceivable that portions of a scene will not change
        for an appreciable period of time (if ever).  Late-arriving
        SINKs would then never get the actual data for those
        blocks.
        The algorithm accounts for this: a SOURCE will time out
        portions of the image and then send those blocks.  This
        ensures that all SINKs eventually get all parts of the
        image.

   PROBABLE DELIVERY
        In order to increase the probability that all the SINKs
        receive a particular update, a SOURCE will send the update
        for several frames beyond the frame in which it would
        otherwise stop sending the data.  For example, if a
        particular block of the image changes from "A" to "B" in
        frame N (and doesn't change thereafter), the SOURCE will
        transmit the "B" data for several additional frames (e.g.,
        frames N, N+1, and N+2).  This increases the likelihood
        that all SINKs receive the new data.

   When deciding whether to send image data and flags, the
   following algorithm is used:

   (1)  if ((background_valid == TRUE) &&
            (background_timeout <= NOW)) {
            background_valid = FALSE;
            send_background = TRUE;
            retrans_count = RETRANS;
            background_timeout = NOW + LIFE;
        }

   (2)  if ((background_valid == TRUE) &&
            (current_data == background_data)) {
            previous_data = current_data;
            TRANSMIT(NULL, DISPLAY_BACKGROUND_DATA);
            return;
        }

   (3)  if (current_data != previous_data) {
            send_background = FALSE;
            retrans_count = RETRANS;
            previous_data = current_data;
            TRANSMIT(current_data, 0);
            return;
        }

   (4)  if (retrans_count != 0) {
            retrans_count = retrans_count - 1;
            if (send_background == TRUE) {
                TRANSMIT(current_data, COPY_DATA_TO_BG);
                return;
            }
            TRANSMIT(current_data, 0);
            return;
        }

   (5)  if (send_background == FALSE) {
            send_background = TRUE;
            retrans_count = RETRANS;
            background_data = current_data;
            TRANSMIT(current_data, COPY_DATA_TO_BG);
            return;
        }

   (6)  background_timeout = NOW + LIFE;
        background_valid = TRUE;
        send_background = FALSE;
        TRANSMIT(NULL, DISPLAY_BACKGROUND_DATA);
        return;

   TRUE and FALSE are constants with the obvious meaning.
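For illustration, the six-step decision procedure can be transcribed into runnable form. This is a sketch, not a normative restatement: the per-block state is gathered into a small class (an implementation choice), TRANSMIT is modeled by recording (data, flags) pairs, and the RETRANS and LIFE values are arbitrary.

```python
RETRANS = 2    # illustrative value
LIFE = 100     # illustrative value, in captured frames
DISPLAY_BACKGROUND_DATA = 0x80
COPY_DATA_TO_BG = 0x40


class Block:
    """Per-block transmit state for the block-mode decision procedure."""

    def __init__(self):
        self.background_valid = False
        self.background_timeout = 0
        self.retrans_count = 0
        self.send_background = False
        self.current_data = None
        self.background_data = None
        self.previous_data = None
        self.sent = []  # (data, flags) pairs, standing in for TRANSMIT()

    def transmit(self, now):
        d = self.current_data
        # (1) Background data timed out: refresh it at the SINKs.
        if self.background_valid and self.background_timeout <= now:
            self.background_valid = False
            self.send_background = True
            self.retrans_count = RETRANS
            self.background_timeout = now + LIFE
        # (2) Block matches the background: display-background only.
        if self.background_valid and d == self.background_data:
            self.previous_data = d
            self.sent.append((None, DISPLAY_BACKGROUND_DATA))
            return
        # (3) Block is changing: send current data, arm retransmission.
        if d != self.previous_data:
            self.send_background = False
            self.retrans_count = RETRANS
            self.previous_data = d
            self.sent.append((d, 0))
            return
        # (4) Unchanged, but still retransmitting (probable delivery).
        if self.retrans_count != 0:
            self.retrans_count -= 1
            flags = COPY_DATA_TO_BG if self.send_background else 0
            self.sent.append((d, flags))
            return
        # (5) Stable: start pushing it to the SINKs' background buffers.
        if not self.send_background:
            self.send_background = True
            self.retrans_count = RETRANS
            self.background_data = d
            self.sent.append((d, COPY_DATA_TO_BG))
            return
        # (6) Background established: display-background, reset timer.
        self.background_timeout = now + LIFE
        self.background_valid = True
        self.send_background = False
        self.sent.append((None, DISPLAY_BACKGROUND_DATA))


b = Block()
b.current_data = "A"
for now in range(7):
    b.transmit(now)
# After roughly 2 x RETRANS frames of no change, the block has become
# background data and only the display-background command is sent.
assert b.sent[-1] == (None, DISPLAY_BACKGROUND_DATA)
```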
   In the algorithm, RETRANS, LIFE, and NOW are global variables:

   LIFE
        Is the lifetime of background data, in units of captured
        frames.  The SOURCE will retransmit the background data to
        the SINKs if the data has not changed in LIFE frames.  The
        SOURCE will send the data RETRANS times (see next item)
        before it stops.

   RETRANS
        Is the number of times data will be retransmitted after the
        data stops changing.  In other words, it defines how
        "probable" the Probable Delivery is.  It is also the basis
        for defining background data: if a block does not change
        for 2xRETRANS frames then the block's data is declared
        background data.

   NOW
        The current frame number being transmitted.

   All other variables are state variables that are kept on a
   per-block basis:

   background_valid
        A boolean which indicates whether the data in
        "background_data" is valid or not.

   background_timeout
        The timeout for the background data.  When the current
        frame number, NOW, exceeds this value, the SOURCE will
        retransmit the background data.

   retrans_count
        The number of times that data will be transmitted.  This
        counts down; when it reaches 0, retransmission stops.

   send_background
        A boolean which indicates whether the SOURCE is telling the
        SINKs to place the transmitted data into their background
        buffers.

   current_data
        The current image data for the block.

   background_data
        The background data for the block.

   previous_data
        The data for the previous frame of the image.

   The algorithm works as follows:

   (1)  This step checks to see if the background data has timed
        out.  The intent is to periodically refresh the background
        information at the SINKs.  The background data is
        retransmitted by the expedient of simply declaring that the
        background is "invalid" but that we are in the process of
        transmitting it.
   (2)  This step checks to see if the block in the current image
        is the same as the background data.  If it is, no image
        data is transmitted; only the "display the background data
        for this block" command is sent.  The algorithm then exits.

   (3)  This step checks whether the data for the current image is
        the same as in the previous image.  If not, the image is
        changing and we have to transmit the "current" data.  The
        retransmission logic is also set up so that the data will
        be retransmitted several times (RETRANS) once the image
        'stops' changing.  The algorithm then exits.

   (4)  At this point, we know that the current frame is A) not the
        same as the background and B) the same as the previous
        frame.  This step then checks to see if the block is being
        retransmitted.  If it is, the retransmit count is
        decremented (we only retransmit a small, finite number of
        times) and the current image data are transmitted.  If the
        data are being sent as "background" data to the SINKs, the
        COPY_DATA_TO_BG flag is set; otherwise it isn't.  The
        algorithm then exits.

   (5)  Once the retransmit count goes to 0 we fall through into
        here.  At this point the decision is made as to whether the
        data should be considered background data or not.  In this
        step, we check to see if we were transmitting background
        information.  If we were not, we start sending the current
        data as background data (i.e., send the COPY_DATA_TO_BG
        flag with the data).  Note that we only get to this point
        if the data have not changed for RETRANS frames, implying
        that this portion of the image is fairly stable.  Also, we
        remember that we are now sending background information
        (send_background is set to TRUE) and we set up the
        retransmission counter to send the background data a few
        times.
   (6)  At this point there is no new data to send to the SINKs,
        all retransmissions are done, and we know that what the
        SINKs should display is what is in the background.  We
        transmit no data, just the command telling the SINKs to
        display what is in their background buffers.  We also
        initialize the retransmission logic so that the background
        data will be re-sent LIFE frames in the future (assuming no
        other changes occur).

   Note that this algorithm declares a particular block to be
   "background" data if that data does not change for 2xRETRANS
   frames.

5. Future Work

5.1. RTP -06

   The Loki work was done on RTP version 05 [1].  Since that time,
   a new version of RTP has been produced, the 06 version [3].  The
   Loki protocol and implementation should be revised to use RTP
   06.  The most significant modification is that the 06 version of
   RTP includes implementation notes and algorithm specifications
   which will supplant the algorithms used in Loki.  In particular,
   the jitter algorithm specified in [3] will replace the one
   specified in this document.

5.2. Additional Transmit Adaptivity

   The current adaptive transmit algorithm:

   (1)  Uses only the packet loss rate to determine received image
        quality,

   (2)  Uses only the inter-packet transmit delay as the tuning
        "knob", and

   (3)  Adjusts the delay for every transmitted packet.

   There are additional strategies that could be investigated, both
   in determining received image quality and in adjusting transmit
   behavior to fix any perceived quality problems.  For example:

   (1)  The inter-packet transmit delay could be inserted only
        every N'th packet.  Investigations indicate that some
        interfaces are able to receive a few packets back-to-back
        but then start to fail (the typical failure mode here is
        that on-card buffers fill up).  By inserting the gap only
        every N'th packet, the effective transmit rate can be
        increased without sacrificing quality.
   (2)  The current algorithm will increase the transmit delay
        until either the high threshold or the maximum allowed
        delay is reached.  Instead, work could be done to make the
        algorithm cognizant of how the receiver quality changes as
        the delay is increased; the algorithm could then stop
        increasing the delay when the reported quality stops
        getting better.  Any such modification must be aware of
        possible non-linear relationships between the delay and the
        quality and be able to deal with them.

   (3)  The packet size could be increased.  This would add a small
        amount of time between transmitting packets (more time is
        needed to packetize more data).  It would also reduce the
        number of packets sent per frame, reducing the number of
        inter-packet gaps and increasing the overall data
        transmission rate.

   (4)  The pixel formatting could be changed.  This would have two
        effects: the time required to reformat the pixels (say,
        from RGB-24 to MONO-4) would add to the inter-packet gap,
        and the number of packets required to transmit an image
        would be decreased.

5.3. Selective Retransmissions

   Early development of Loki included a facility whereby a SINK
   could request that the SOURCE retransmit certain portions of the
   image.  This facility was not kept when Loki was ported to run
   over RTP (mostly as a matter of time constraints on the
   implementation work).  Experiments with that early version had
   shown that, in certain types of environment (such as one-to-one
   video conferencing), this capability provides noticeable
   improvements to the video image.

   This capability worked as follows.

   1.   It was only available in block mode.

   2.   For each block, the SINK monitored how long it had been
        since any real data had been received for the block, either
        the foreground or the background.
        If a period of time had elapsed during which no data had
        been received, a request was sent to the SOURCE
        specifically asking that the data for the block be
        transmitted.

   3.   The SOURCE then had the option of honoring the request,
        ignoring it, or queueing it up for a short time in order to
        combine several requests into a single response.  The
        SOURCE also had the capability of sending a response to the
        SINK(s) indicating that the SOURCE would not do selective
        retransmissions and therefore the SINK(s) should not bother
        to ask.

   4.   A SINK had the responsibility of A) not asking for
        retransmissions too often (i.e., throttling its own
        behavior), B) noticing if there were a "large number" of
        SINKs and, if so, refraining from asking (to avoid flooding
        the SOURCE with too many requests), and C) noticing whether
        it was losing too many parts of the image and, if so,
        refraining from asking (on the theory that if the SINK was
        losing very badly, there was probably a real network
        problem and lots of requests and retransmissions would not
        solve it).

   This capability proved most useful in dealing with situations
   where the SINK simply never received a certain portion of the
   image.  This could occur quite often as a result of
   peculiarities in network interface adaptors (e.g., every N'th
   packet might be lost because the card only had N-1 buffers and
   it spent the time when it should have received the N'th packet
   emptying out its buffers), or of various inadvertent
   "self-synchronizations" that occurred between the SOURCE and
   SINK.  One might have thought that the various random events
   that occur in networks would have been enough to prevent this
   from occurring, but these things did occur.

6. Can I See

6.1.
Overview

   The Can I See (aka CNIC) feature is for unicast, one-to-one,
   video traffic.  The model of operation is that a Requestor
   submits a CNIC request to a known Server.  The Server then
   unicasts back to the Requestor some number of video frames, or a
   rejection notification giving the reason.

   The user must know the particulars of the specific Server that
   is desired (IP address and UDP ports).  Propagation of this
   information is not a function of Loki; it could easily be
   obtained via some directory or session control protocol.

   Some preliminary experimentation has been done on PCs that
   combine a Loki receiver and a Web browser.  A configuration file
   that contains the details for a particular CNIC request is
   obtained and downloaded via the Web.  The browser then invokes
   the Loki receiver with that configuration file and the receiver
   automatically submits the CNIC request.  This worked reasonably
   well.

   CNIC requests should never be multicast or broadcast; implosion
   problems would occur.  Using "N" unicast Can I See sessions is
   not a substitute for a single one-to-N multicast session.

6.2. Bridges/Mixers and Translators

   The use of Can I See may be problematic when intermediate
   devices such as bridges/mixers or translators are present.  In
   order to properly handle CNIC requests, an intermediate system
   would have to:

   1.   Receive the request,

   2.   Be able to map the request to some other, known, Server(s)
        (the Server CSRC field of the CNIC request is provided to
        aid in this mapping),

   3.   Generate one or more new requests, based on the mapping in
        the previous step (which would then cause one or more
        streams to be unicast to the intermediate system), and

   4.   Take the received unicast streams, properly process them,
        and then unicast them on to the original Requestor.
   This sort of higher-level control protocol 'remapping' is beyond
   the original scope of behavior planned for intermediate devices.
   Current work has not made use of bridges/mixers or translators.

6.3. Historical Perspective

   The Can I See (aka CNIC) feature was added to the protocol early
   on as a debugging and demonstration tool.  As the first
   implementations were developed and tested, we'd test them at
   various places within FTP Software, as well as at selected other
   locations.  We could have left the software "transmitting" to
   the intended receivers, either via unicasts or directed
   broadcasts or multicasts; this would have been, at the very
   least, impolite.  Instead, we developed CNIC, inspired by the
   general idea of Cornell's CU See Me tool.  This allowed
   demonstrations and tests from remote sites without generating a
   lot of unnecessary network traffic.  Since then it has proven to
   be a popular and continually useful facility, so it was retained
   in Loki.

7. Design Comments

   A goal of the protocol, and its associated algorithms, is that
   it be usable in an MS Windows environment by applications using
   the standard, documented, MS Windows video application
   programming interface (specifically, Microsoft's Video for
   Windows).  This environment supports a rich suite of video
   encoding formats, image sizes, and so on.  Furthermore, there is
   not necessarily a "common" video encoding format for all Video
   for Windows compliant capture cards.  Since the transmission
   functions are somewhat tied to the capabilities of the hardware,
   this has led to a design decision to require that receivers be
   flexible: they must be able to receive and properly process
   anything that a transmitter may send, and they must not consider
   it an error to receive something in a format that they did not
   expect.

8.
References

   [1]  Schulzrinne, Casner, Frederick, and Jacobson, "RTP: A
        Transport Protocol for Real-Time Applications", version 05,
        18 July 1994.  draft-ietf-avt-rtp-05.  This document may no
        longer be available; contact the author for copies.

   [2]  Reynolds, J. K., and J. Postel, "Assigned Numbers", RFC
        1700, October 1994.

   [3]  Schulzrinne, Casner, Frederick, and Jacobson, "RTP: A
        Transport Protocol for Real-Time Applications", version 06,
        28 November 1994.  draft-ietf-avt-rtp-06.

9. Security Considerations

   Loki does not contain any provisions for security.  Loki assumes
   that the underlying protocols provide any desired security
   (i.e., that the IP security work and/or the IP6 security work
   actually produce usable security features).

   Network video applications can generate a large amount of
   traffic.  This traffic could severely load various elements of
   the network (64 kbit lines can easily be saturated with video
   traffic).  In addition, the high packet rates can overload some
   interface cards.  These two conditions, while not security
   issues per se, can end up degrading, or even denying, service to
   other users of the networks, media, routers, and hosts.  This is
   not dissimilar to a denial-of-service attack.

10. Author's Address

   Frank J. Kastenholz
   FTP Software
   2 High Street
   North Andover, MA 01845-2620
   USA

   Phone: +1 508-685-4000
   EMail: kasten@ftp.com

Table of Contents

   Status of this Memo ....................................    i
   1 INTRODUCTION ..........................................    1
   1.1 Note On Terminology .................................    2
   1.2 Change Log ..........................................    2
   2 PROTOCOL SPECIFICATION ................................    4
   2.1 Byte Order ..........................................    4
   2.2 Video Data ..........................................    4
   2.2.1 Packet Size .......................................    4
   2.2.2 RTP Header Profile ................................    5
   2.2.2.1 Marker Bit ......................................    5
   2.2.2.2 Payload Type ....................................    5
   2.2.2.3 Extension Bit ...................................    5
   2.2.3 Loki Header .......................................    5
   2.3 RTCP Extensions .....................................    6
   2.4 Loki Control Packets ................................    7
   2.4.1 Discard ...........................................    8
   2.4.2 Suspend ...........................................    8
   2.4.3 Can I See Request .................................    8
   2.4.4 Can I See Status ..................................   11
   3 Video Encoding Specifications .........................   14
   3.1 Pixel Formats .......................................   14
   3.1.1 24 Bit RGB ........................................   14
   3.1.2 16 Bit RGB ........................................   14
   3.1.3 8 Bit Monochrome ..................................   15
   3.1.4 4 Bit Monochrome ..................................   16
   3.1.5 8 Bit Palette .....................................   16
   3.1.6 4 Bit Palette .....................................   16
   3.1.7 411 YUV ...........................................   16
   3.2 Transfer Formats ....................................   19
   3.2.1 Simple Mode .......................................   19
   3.2.2 Indeo Simple Mode .................................   21
   3.2.3 Block Mode ........................................   22
   3.3 Format Values .......................................   23
   4 Algorithms ............................................   25
   4.1 Pixel Comparisons ...................................   25
   4.2 Adaptive Transmit ...................................   26
   4.3 Jitter ..............................................   28
   4.4 Block Mode Algorithm ................................   30
   5 Future Work ...........................................   35
   5.1 RTP -06 .............................................   35
   5.2 Additional Transmit Adaptivity ......................   35
   5.3 Selective Retransmissions ...........................   36
   6 Can I See .............................................   38
   6.1 Overview ............................................   38
   6.2 Bridges/Mixers and Translators ......................   38
   6.3 Historical Perspective ..............................   39
   7 Design Comments .......................................   40
   8 References ............................................   41
   9 Security Considerations ...............................   42
   10 Author's Address .....................................   43