* Want to be able to share both "desktops" and "applications" [need explicit definitions] between multiple (>=2) GUI systems.
* Desktops/applications may or may not have an actual monitor/keyboard/mouse attached.
* Two use cases: application sharing (I have something I want you to look at, or we want to edit something collaboratively) and remote desktop access (I want to manipulate a desktop from somewhere other than where the applications are running).
* Needs to be private, authenticated, integrity-protected, and access-controlled.
* Want to integrate this into the existing IETF session model. This is strongly motivated for the application-sharing case (voice conference plus slides); we believe it's also useful for the remote desktop case (especially for audio integration).
* Want to support N viewers watching the same application/desktop. Use existing access-control mechanisms to determine who gets to send input to it.
* Probably: at most one remote user at a time can send input; however, remote and local input can potentially occur simultaneously.
* We believe that filtering what a remote user is allowed to do with their control of an app/desktop's keyboard and mouse input is out of scope; they are the "same" as the local user. However, it may be necessary to sandbox applications in some cases, with sandbox control outside the view of the remoting protocol.
* Want to be able to send the app/desktop's audio streams, time-synchronized with the GUI view of the application. Probably also want to be able to send undecoded video, both time- and spatially-synchronized with the GUI.
* The expectation is that applications won't need to be modified to work with this protocol -- application-sharing tools intercept windowing system APIs at some level. However, applications would need modifications to allow sending undecoded video.
* Any protocol negotiation occurs at the IETF session level and is capability-based (not version-based).
* "Applications" are more than just windows -- they're a stack of related windows that serve the same task and are usually associated with the same process on the server. Modal dialogs and floating (always-on-top) windows must work.
* Protocol latency is low; bandwidth use is low; the protocol scales up to full-screen full-motion video over gigabit Ethernet yet remains useful over low-bandwidth (wireless, modem) links; it works in both low-latency (LAN) and high-latency (WAN, wireless) environments.
* Internationalization works; in particular, the two end systems don't have to have the same kind of keyboard.
* Minimal operating-system dependencies on the end system. The system works across widely different GUI "qualities".
* Question: do we want to allow copy and paste across this system?
* Generalizable to several transport protocols, e.g. reliable multicast.
* Modular architecture: the various pieces shouldn't have excessive interdependencies. That is, the framebuffer protocol, transport protocols, sound synchronization, etc. should be developable separately once we have the basic framework worked out.
* Timing information to allow reconstruction of the same user experience (e.g., for games and video).
* Floor control is out of scope.

http://metavnc.sourceforge.net/
http://www1.cs.columbia.edu/~ricardo/thinc/png/

The 32x32/1024x1024 in the graph is the maximum size of the update the server sends, although this is not enforced by tiling but instead by sending consecutive lines up until 32² (1024) pixels. What was coded was (for the most part) images in web pages. (A sketch of this line-batching approach follows below.)
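A minimal sketch of the line-batching idea above, assuming a hypothetical dirty-line representation and send callback (this is not THINC's actual code): the server walks consecutive dirty scan lines and flushes an update whenever the accumulated area would exceed the pixel budget.

    #include <stddef.h>

    /* Hypothetical record for one dirty scan-line segment. */
    struct dirty_line {
        int y;      /* scan line number */
        int x, w;   /* horizontal extent of the damaged pixels */
    };

    /* Assumed callback that encodes and sends one batch of lines. */
    typedef void (*send_fn)(const struct dirty_line *lines, size_t n);

    /* Group consecutive dirty lines into updates of at most `budget`
     * pixels (e.g. 32*32 = 1024), flushing each batch via `send`.
     * No tiling is involved; just a running pixel count. */
    static void batch_updates(const struct dirty_line *lines, size_t n,
                              int budget, send_fn send)
    {
        size_t start = 0;
        int pixels = 0;

        for (size_t i = 0; i < n; i++) {
            if (pixels + lines[i].w > budget && i > start) {
                send(&lines[start], i - start);   /* flush current batch */
                start = i;
                pixels = 0;
            }
            pixels += lines[i].w;
        }
        if (start < n)
            send(&lines[start], n - start);       /* flush the remainder */
    }

With budget = 1024 this reproduces the 32x32 behavior described above; a single line wider than the budget is still sent on its own rather than split.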
The text and background regions are sent through other means. The driver and app interface, as it stands right now, uses bitmap commands (bilevels and transparent glyphs) to draw text on the screen, while using solid fills or pattern fills for the background parts. Of course, this all depends on what kind of commands the application uses. If, for example, the app does its own text rendering and then sends just an image of the text to the display, then at the video-driver level we only see an image and must pass it through normal compression (i.e. zlib). (See the command-dispatch sketch at the end of these notes.)

I believe ffmpeg's libavcodec is the best open-source MPEG codec. (LGPL'd.) You'll want to use the CVS version, not the most recent release; their release manager seems to have gone AWOL, but the developers are still making improvements.

-------
Keith Lantz

I belatedly thought of that. The other two papers are there too:

http://portal.acm.org/citation.cfm?id=637106
http://portal.acm.org/citation.cfm?id=97301
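Returning to the driver/app interface described earlier: a rough sketch of how a virtual display driver might dispatch the command types mentioned (the enum, struct, and handlers are illustrative assumptions, not the actual interface). Semantic commands (glyphs, fills) are forwarded as parameters; pre-rendered images fall back to zlib compression.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    /* Illustrative command types a virtual driver might intercept. */
    enum cmd_type {
        CMD_GLYPH,         /* bilevel or transparent glyph (text)  */
        CMD_SOLID_FILL,    /* single-color background region       */
        CMD_PATTERN_FILL,  /* tiled-pattern background region      */
        CMD_RAW_IMAGE      /* app rendered the text itself; opaque */
    };

    struct draw_cmd {
        enum cmd_type type;
        const unsigned char *data;   /* glyph bits, pattern, or pixels */
        unsigned long len;
    };

    static void handle_cmd(const struct draw_cmd *c)
    {
        switch (c->type) {
        case CMD_GLYPH:
        case CMD_SOLID_FILL:
        case CMD_PATTERN_FILL:
            /* Cheap, semantic commands: ship the parameters as-is. */
            printf("send command type=%d, %lu bytes\n", c->type, c->len);
            break;
        case CMD_RAW_IMAGE: {
            /* No semantic information left: fall back to zlib. */
            uLongf zlen = compressBound(c->len);
            Bytef *zbuf = malloc(zlen);
            if (zbuf && compress(zbuf, &zlen, c->data, c->len) == Z_OK)
                printf("send compressed image, %lu -> %lu bytes\n",
                       c->len, (unsigned long)zlen);
            free(zbuf);
            break;
        }
        }
    }

The split matters because glyph and fill commands cost a few bytes regardless of the screen area they cover, while the image fallback scales with pixel count.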