
Action-Reaction Learning: An Overview of the Paradigm

This section introduces the Action-Reaction Learning (ARL) architecture. The system passively observes interaction among multiple humans, learns from it, and then uses this knowledge to interact with a single human.

**Figure 2.1:** Offline: Learning from Human Interaction

The system is depicted in Figure 2.1. It consists of three types of processes: perceptual, synthesis and learning engines, interlinked in real time by asynchronous RPC data paths. Figure 2.1 shows the system being presented with a series of interactions between two individuals in a constrained context (e.g., a simple children's game)^2.1. The system collects live perceptual measurements of each human using a vision subsystem. A machine learning subsystem then analyzes the resulting temporal sequences to determine predictive mappings and associations between pieces of the sequences and their consequences.
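As a rough illustration of "predictive mappings between pieces of the sequences and their consequences", the sketch below pairs each short window of the two users' joint history with what the second user does next. It is a minimal sketch under assumptions the text does not state: both streams are sampled on a common clock, each measurement is a plain list of floats, and the window length and the choice of target are illustrative only. The helper name `make_training_pairs` is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def make_training_pairs(user_a: Sequence[Vector], user_b: Sequence[Vector],
                        window: int) -> List[Tuple[Vector, Vector]]:
    """Pair each window of joint history with user B's next measurement (hypothetical helper)."""
    if len(user_a) != len(user_b):
        raise ValueError("both streams must share one clock")
    pairs: List[Tuple[Vector, Vector]] = []
    for t in range(window, len(user_a)):
        past: Vector = []
        for k in range(t - window, t):
            # Interleave both users' features at each past step: A first, then B.
            past.extend(user_a[k])
            past.extend(user_b[k])
        # The "consequence" of this piece of the sequence is B's next measurement.
        pairs.append((past, list(user_b[t])))
    return pairs

# Toy streams with one feature per user per frame.
a = [[0.0], [1.0], [2.0], [3.0]]
b = [[10.0], [11.0], [12.0], [13.0]]
pairs = make_training_pairs(a, b, window=2)
```

With `window=2`, the first pair maps the flattened frames of steps 0 and 1 to user B's measurement at step 2; four frames therefore yield two pairs.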

On the left of the figure, a human user (represented as a black figure) is monitored by a perceptual system. The perceptual system feeds its measurements to a learning system, which stores them internally as a time series. Simultaneously, these measurements drive, in a one-to-one mapping, a virtual character (the gray figure) that mirrors the left human's actions as graphical output for the human user on the right. A similar input and output are generated in parallel from the activity of the human on the right. Thus, the users interact with each other through the vision-to-graphics interface and use this virtual channel to visualize and constrain their interaction. Meanwhile, the learning system is 'spying' on the interaction and accumulating a time series of the measurements. This time series serves as training data for the system, which attempts to learn about the ongoing interaction in the hope of modeling and synthesizing similar behaviour itself.
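The data flow of this offline phase can be sketched with Python's `asyncio`, using in-process queues as a stand-in for the asynchronous RPC paths (an assumption; the text does not specify the transport). Each hypothetical `perceive` coroutine plays one user's vision subsystem: it sends every measurement both to the learning process, which accumulates one time series per user, and to a hypothetical `synthesize` coroutine standing in for the mirrored character shown to the other user. Measurements are simplified to one float per frame.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Msg = Tuple[str, int, float]  # (user id, frame index, measurement)
END = -1                      # frame index used as an end-of-stream marker

async def perceive(user: str, frames: List[float],
                   learn_q: asyncio.Queue, display_q: asyncio.Queue) -> None:
    """Hypothetical stand-in for one user's vision subsystem."""
    for t, x in enumerate(frames):
        await learn_q.put((user, t, x))    # the learning system observes every measurement
        await display_q.put((user, t, x))  # the same measurement drives this user's avatar
        await asyncio.sleep(0)             # yield so the two users' streams interleave
    await learn_q.put((user, END, 0.0))
    await display_q.put((user, END, 0.0))

async def synthesize(display_q: asyncio.Queue, shown: List[Msg]) -> None:
    """Hypothetical stand-in for the graphics output seen by the other user."""
    while True:
        msg = await display_q.get()
        if msg[1] == END:
            return
        shown.append(msg)

async def learn(learn_q: asyncio.Queue, n_users: int,
                series: DefaultDict[str, List[Tuple[int, float]]]) -> None:
    """Accumulate one time series per user until every stream has ended."""
    finished = 0
    while finished < n_users:
        user, t, x = await learn_q.get()
        if t == END:
            finished += 1
        else:
            series[user].append((t, x))

async def offline_session(frames_a: List[float], frames_b: List[float]
                          ) -> Tuple[Dict[str, List[Tuple[int, float]]], List[Msg], List[Msg]]:
    learn_q: asyncio.Queue = asyncio.Queue()
    to_b: asyncio.Queue = asyncio.Queue()  # A's avatar, displayed to user B
    to_a: asyncio.Queue = asyncio.Queue()  # B's avatar, displayed to user A
    shown_to_a: List[Msg] = []
    shown_to_b: List[Msg] = []
    series: DefaultDict[str, List[Tuple[int, float]]] = defaultdict(list)
    await asyncio.gather(
        perceive("A", frames_a, learn_q, to_b),
        perceive("B", frames_b, learn_q, to_a),
        synthesize(to_b, shown_to_b),
        synthesize(to_a, shown_to_a),
        learn(learn_q, 2, series),
    )
    return dict(series), shown_to_a, shown_to_b

series, shown_to_a, shown_to_b = asyncio.run(
    offline_session([0.1, 0.2, 0.3], [0.5, 0.4, 0.3]))
```

Because each user's measurements pass through a FIFO queue, the per-user time series keep their frame order even though the two users' messages interleave in the shared learning queue.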

**Figure 2.2:** Online: Interaction with Single User

In Figure 2.2, the system has collected and assimilated the data and can now computationally infer appropriate responses to the single remaining human user. Here, the perceptual system only needs to track the activity of one human (the black figure on the left) to stimulate the learning or estimation system for real-time interaction (as opposed to interaction learning, as before). The learning system performs an estimation and generates the most likely response to the user's behaviour, which the synthesis subsystem manifests by animating a computer graphics character (the gray figure). This animation is the main output of the ARL engine. It is also fed back recursively into the learning subsystem so that the system remembers its own actions and generates self-consistent behaviour; the arrow from the reaction synthesis stage to the learning + estimation stage depicts this flow. The learning system thus receives continuous feedback from self-observation and can recall its own actions. In addition, the system determines the most likely action of the remaining user and transmits it to the vision subsystem as a prior to assist tracking. This flow from the learning system to the perception system (the eye) carries behavioural and dynamic predictions of the single user being observed and should help improve perception.
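The online loop can be sketched as follows. The estimator here is a nearest-neighbour lookup over windows of a recorded session, a deliberately simple stand-in rather than the learning method used by ARL; the class and function names, the one-float-per-person frames, and the window length are all hypothetical. At each step the predicted joint frame supplies both the agent's reaction (sent to synthesis) and the user's expected action (sent to vision as a tracking prior), and the agent's own reaction is appended to its memory, mirroring the self-observation loop described above. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

# One time step of joint state: (user's measurement, agent's own action).
Frame = Tuple[float, float]
Example = Tuple[List[float], Frame]

def flatten(frames: Sequence[Frame]) -> List[float]:
    """Concatenate a window of joint frames into one feature vector."""
    return [v for frame in frames for v in frame]

def build_examples(session: Sequence[Frame], window: int) -> List[Example]:
    """Map each window of a recorded two-person session to the frame that followed it."""
    return [(flatten(session[t - window:t]), session[t]) for t in range(window, len(session))]

class NearestNeighbourPredictor:
    """Hypothetical stand-in estimator: returns the frame that followed the
    stored window closest (Euclidean distance) to the current one."""

    def __init__(self, examples: List[Example]) -> None:
        self.examples = examples

    def predict(self, window: Sequence[Frame]) -> Frame:
        query = flatten(window)
        _, next_frame = min(self.examples, key=lambda ex: math.dist(ex[0], query))
        return next_frame

def interact(predictor: NearestNeighbourPredictor, user_stream: Sequence[float],
             seed: Sequence[Frame], window: int) -> Tuple[List[float], List[float]]:
    """Run the online loop against a stream of user measurements."""
    history: List[Frame] = list(seed)  # short-term memory, seeded with `window` frames
    reactions: List[float] = []
    priors: List[float] = []
    for a_measured in user_stream:
        a_expected, b_reaction = predictor.predict(history[-window:])
        priors.append(a_expected)     # would be sent to the vision system as a tracking prior
        reactions.append(b_reaction)  # would animate the synthetic character
        # Self-observation: the agent's own output re-enters memory beside the measured user frame.
        history.append((a_measured, b_reaction))
    return reactions, priors

# Toy recorded session: the partner (second value) answers one step after the user (first value) moves.
session: List[Frame] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
predictor = NearestNeighbourPredictor(build_examples(session, window=1))
reactions, priors = interact(predictor, user_stream=[1.0, 0.0, 0.0], seed=[(0.0, 0.0)], window=1)
```

In this toy session the recorded partner responds one step after the user moves, so the agent answers the user's 1.0 with a 1.0 on the following step, and its first prior already anticipates that the user will move.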


Tony Jebara
1999-09-15