
Introduction

Behavior is what a man does, not what he thinks, feels, or believes.

Source Unknown

The Action-Reaction Learning framework is an automatic, perceptually based machine learning system. It autonomously studies the natural interactions of two humans to learn their behaviours and later engage a single human in a real-time interaction. The model is fundamentally empirical and is derived from what the humans do externally, not from their underlying behavioural architectures or hard-wired cognitive knowledge and models.

Earlier models of human behaviour proposed by cognitive scientists analyzed humans as an input-output or stimulus-response system [67] [61]. The models were based on observation and empirical studies. These behaviourists came under criticism as cognitive science evolved beyond their over-simplified model, which struggled with higher-order issues (e.g. language, creativity, and attention) [34]. Nevertheless, much of the lower-order reactive behaviour was still well modeled by the stimulus-response paradigm. To a casual observer, these simple models seem adequate, and it is only after closer examination that one realizes that far more complex underlying processes must be taking place.

The tendency to find simplistic interpretations for complex underlying phenomena has been almost omnipresent in history. Mankind understood the changing of the seasons and could predict and adapt to them long before realizing that the world is a sphere which revolves around the sun while rotating on a tilted axis. The laws of physics are precise and elegant; however, the mechanism underlying seasonal change is complex, even though the resulting observed output is simple and regular. In this sense, the old heuristic models still serve their purpose. Similarly, the expression of certain behaviours can be understood at an external observational level without a transcendental understanding of the true generative models that cause them.

We propose Action-Reaction Learning (ARL) for the recovery of human behaviour by making an appeal to the behaviourists' stimulus-response or input-output model. We also emphasize the recovery of simple, externally observable behaviour as opposed to internal underlying cognitive machinery. For constrained applications, interaction and synthesis, it matters more what a human is doing on the outside than on the inside. The ARL system can be seen as a way to explore how far a perceptual model of behavioural phenomena can be taken. When can it explain correlations between gestures and imitate the simple behaviours humans engage in? The emphasis on real-time perception and on the synthesis of real-time, interactive and expressive output behaviour will therefore be central to the ARL paradigm.

The system begins by observing some interaction between humans to perform a type of imitation learning. The objective of the learning is specifically to later recreate the patterns of behaviour in a synthetic interaction with a single human participant. During the initial training session, the interaction (actions and reactions) produced by two individuals is first analyzed using perceptual measurements and machine learning to uncover an observational model of the process. The perceptual system consists of a vision algorithm that tracks head and hand motion. Such computer vision analysis and automatic learning of dynamics and simple behaviour from perceptual measurements have recently developed very rapidly [24] [74] [45] [46] [70] [8] [7] [57]. An important transition is beginning to take place as vision and other perceptual modalities become intimately coupled with machine learning for human observation. These models begin to acquire the ability to make reliable predictions about regularities in human behaviour beyond simple dynamics and direct measurements [46].

Without explicitly discovering the particular underlying generative models that produced the interaction, the system learns a probabilistic mapping between past gestural actions and the future reactions that should follow. The learning is specifically an input-output type of learning which uncovers a predictive model as opposed to a full generative model. These two types of models are cast into statistical terms as conditional (predictive) models and joint (generative) models. To estimate the desired conditional model, we present a novel algorithm, Conditional Expectation Maximization (CEM), specifically for computing predictive or conditional probabilities from training data. This algorithm employs some novel mathematical machinery, General Bound Maximization (GBM), to optimize a model of the observed human interactions. Of particular relevance is the close similarity of the stimulus-response behaviourist model to the input-output machine learning algorithms that have become workhorses in data-driven modeling. The Action-Reaction Learning system's probabilistic learning algorithm uncovers such an input-output mapping. Here, the input is a stimulus and the output is the response, both drawn from the interaction data. The goal of the model is not to classify behaviour into categories or arrange it into some prespecified structure. Its objective is to learn how to predict and simulate appropriate behaviour in an interactive setting.
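The distinction between joint and conditional estimation can be written out in one common form. The notation here is an illustrative assumption rather than the thesis's own: $x_i$ is a window of past actions, $y_i$ is the reaction that followed, $\Theta$ are the model parameters, and there are $N$ training pairs. A joint (generative) fit maximizes the likelihood of inputs and outputs together, while a conditional (predictive) fit maximizes only the likelihood of the outputs given the inputs:

$$
\Theta^{*}_{\text{joint}} = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i, y_i \mid \Theta)
$$

$$
\Theta^{*}_{\text{cond}} = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(y_i \mid x_i, \Theta)
= \arg\max_{\Theta} \sum_{i=1}^{N} \Big[ \log p(x_i, y_i \mid \Theta) - \log p(x_i \mid \Theta) \Big],
\qquad p(x_i \mid \Theta) = \int p(x_i, y \mid \Theta)\, dy
$$

The second form follows from $p(y \mid x) = p(x, y) / p(x)$. The subtracted marginal term means the conditional criterion spends no modeling effort on describing the distribution of the inputs themselves, only on predicting the response.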

Having formed a predictive model by examining multiple humans interacting, the system is then used to synthesize one of the humans and simulate an interaction with graphical animation output. With advances in computation, the simulation and the analysis of behaviour have become feasible. In simulation domains, dynamics, kinematics, ethological models, rule-based systems and reinforcement learning have been proposed to synthesize compelling interaction with artificial characters [6] [58] [52] [56] [3].
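The train-then-synthesize loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: the learned mapping is replaced by an ordinary least-squares predictor (a stand-in, not CEM), each frame is assumed to be the concatenation of person A's and person B's measurements, and all names (`make_pairs`, `fit_linear_predictor`, `simulate`, `d_a`, `window`) are hypothetical.

```python
import numpy as np

def make_pairs(series, window, d_a):
    """Pair each window of past frames (input) with B's next frame (output).

    series: array of shape (T, d_a + d_b); each row is concat(a_t, b_t).
    """
    xs, ys = [], []
    for t in range(window, len(series)):
        xs.append(series[t - window:t].ravel())  # past actions of both people
        ys.append(series[t, d_a:])               # B's reaction at time t
    return np.array(xs), np.array(ys)

def fit_linear_predictor(xs, ys):
    """Least-squares stand-in for the learned mapping (assumption: NOT CEM)."""
    xs1 = np.hstack([xs, np.ones((len(xs), 1))])  # append a bias column
    weights, *_ = np.linalg.lstsq(xs1, ys, rcond=None)
    return weights

def predict(weights, x):
    """Predict B's next frame from one flattened window of past frames."""
    return np.append(x, 1.0) @ weights

def simulate(weights, history, live_a, window):
    """Synthesize B: A's frames come from perception, B's from prediction."""
    history = [np.asarray(frame) for frame in history]
    generated = []
    for a_t in live_a:
        x = np.concatenate(history[-window:])
        b_t = predict(weights, x)
        generated.append(b_t)
        # The synthesized reaction is fed back as part of the shared past.
        history.append(np.concatenate([a_t, b_t]))
    return np.array(generated)
```

During training both halves of every frame are observed. During synthesis only A is observed, so each predicted frame for B is appended to the history and becomes part of the input for the next prediction.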

Thus, we propose combining the properties of both behaviour simulation and behaviour analysis in a common automatic framework. The Action-Reaction Learning approach acquires models of human behaviour from video and uses them to control synthetic characters. Driven by these models and perceptual measurements, these characters are capable of interacting with humans in real-time. Ultimately, the user need not specify behaviour directly (and tediously) but teaches the system merely by interacting with another individual. The model results from an unsupervised analysis of human interaction and its ultimate goal is the synthesis of such human behaviour. This is attempted with minimal predetermined structures, hand-wired knowledge and user intervention. The behaviours addressed in this thesis are limited to physical activities that can be externally measured by the system. ^1.1

In this thesis, we will be referring to a more generic definition of behaviour, as in the Merriam-Webster dictionary: `anything that an organism does involving action and response to stimulation'. Alternatively, a hierarchy of behavioural levels can be described as in Figure 1.1, starting with gestures and actions and ending with higher-order behaviour. In this more formal sense, behaviour specifically involves higher-order concepts such as complex planning, goals, context, and understanding. The Action-Reaction Learning framework will only target the lower-order reactive gestures and actions in this hierarchy and will not address these higher-order cognitive issues.

**Figure 1.1:** Behaviour Hierarchy
[Figure image: hierarchy.ps]


Tony Jebara
1999-09-15