Figure 8.3 shows the system synthesizing interactive behaviour with a single user. User A is given the illusion of interacting with user B through the synthesis of the ARL system. The vision system on A still takes measurements, and these are integrated and fed to the learning system. However, the output of the learning system is also fed back into the short-term memory, where it fills in the missing component (user B). Thus, not only does user A see synthesized output, but continuity is also maintained by feeding back synthetic measurements. This gives the system the ability to see its own actions and maintain self-consistent behaviour. The half of the time series that used to be generated by B is now synthesized by the ARL system. The short-term memory is continuously updated, which allows good estimates of the next action. In fact, the ARL prediction only computes small steps into the future, and these deltas do not amount to anything on their own unless they are integrated and accumulated. Since the attentional window that integrates the measurements is longer than a few seconds, the system has enough short-term memory to maintain consistency over a wide range of gestures and to avoid instability.
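The feedback loop described above can be sketched in a few lines. This is only an illustrative sketch, not the ARL implementation: the names `run_synthesis`, `measure_a`, `predict_delta` and `toward_a` are hypothetical, and the toy predictor stands in for the learned model, which is not specified here.

```python
from collections import deque

def toward_a(window):
    # Toy stand-in for the learned predictor (an assumption): propose a small
    # step that moves B a fraction of the way toward A's latest position.
    a_last, b_last = window[-1]
    return [0.1 * (a - b) for a, b in zip(a_last, b_last)]

def run_synthesis(measure_a, predict_delta, b_start, window_len, steps):
    """Single-user loop: A is measured, B is synthesized and fed back."""
    memory = deque(maxlen=window_len)  # short-term memory (attentional window)
    b_state = list(b_start)            # accumulated synthetic state for user B
    outputs = []
    for t in range(steps):
        a_now = measure_a(t)           # real measurement of user A
        # Both halves of the time series enter the memory: measured A and
        # the system's own previous output in place of the absent user B.
        memory.append((list(a_now), list(b_state)))
        # The predictor only proposes a small step into the future ...
        delta = predict_delta(list(memory))
        # ... which matters only once it is integrated and accumulated.
        b_state = [b + d for b, d in zip(b_state, delta)]
        outputs.append(list(b_state))
    return outputs, memory
```

For instance, `run_synthesis(lambda t: [1.0], toward_a, [0.0], window_len=20, steps=50)` lets the synthetic B drift smoothly toward a stationary A, because each small delta is accumulated rather than used on its own.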
Of course, for the initial few seconds of interaction, the system has not yet synthesized any actions and user A has yet to gesture. Thus, there are no vectors to integrate and no accumulated short-term memory. The unobserved first few seconds of the time series are therefore set to reasonable default values. The system eventually bootstraps and stabilizes once a user begins interacting with it and output is fed back.
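One way to picture this cold start is to pre-fill the memory window with defaults and let real and synthesized samples push them out. The sketch below assumes a fixed-length window; the function name `make_default_memory` and the choice of default values are hypothetical, since the text does not state what the defaults are.

```python
from collections import deque

def make_default_memory(window_len, default_a, default_b):
    """Return a full window whose entries are all placeholder defaults."""
    memory = deque(maxlen=window_len)
    for _ in range(window_len):
        # Each entry pairs user A's half with the (synthesized) user B half.
        memory.append((list(default_a), list(default_b)))
    return memory

# Usage: defaults occupy the window until enough samples have arrived.
memory = make_default_memory(5, [0.0, 0.0], [0.0, 0.0])
memory.append(([0.3, 0.1], [0.0, 0.0]))  # first real measurement of user A
```

Because the deque has a fixed maximum length, every appended sample evicts the oldest default, so after `window_len` steps the window holds only observed and fed-back values.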
Simultaneously, the real-time graphical blob representation is used to un-map the predicted perceptual vector (the action) for the visual display. It is through this display that the human user receives real-time feedback from the system's reactions. This feedback is necessary to maintain the interaction, which requires the human user to pay attention and respond appropriately. For the head and hand tracking case, the graphics system is kept simple and merely renders blobs in a one-to-one mapping. It displays to the user only what it perceives (three blobs). The ARL system's primary concern is head and hand position, and this is made clear by the coarse display. In addition, with such a simple output the user is less likely to be misled into expecting too much intelligence from the system.
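A one-to-one un-mapping of this kind might look like the following sketch. The vector layout is an assumption made purely for illustration: six numbers giving an (x, y) position for each of the three blobs, in the hypothetical order head, left hand, right hand. The actual perceptual encoding is not given in this passage.

```python
BLOB_NAMES = ("head", "left_hand", "right_hand")  # hypothetical ordering

def unmap_to_blobs(action):
    """Split a flat predicted vector into one (x, y) position per blob."""
    if len(action) != 2 * len(BLOB_NAMES):
        raise ValueError("expected one (x, y) pair per blob")
    # One-to-one mapping: the display shows exactly what is in the vector.
    return {
        name: (action[2 * i], action[2 * i + 1])
        for i, name in enumerate(BLOB_NAMES)
    }
```

A renderer would then draw one blob at each returned position, with no additional model of the body in between.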