
Conclusions

We have demonstrated a real-time system which learns two-person interactive behaviour automatically by modeling the probabilistic relationship between a past action and its consequent reaction. The system is then able to engage in real-time interaction with a single user, impersonating the missing person by estimating and simulating the most likely action to take. The system is data-driven, autonomous, and perceptually grounded. In addition, the imitation-based behaviour learning is capable of synthesizing compelling interaction and synthetic behaviour with minimal structural specifications. The subsequent interaction is real-time, non-intrusive and computationally efficient. These were the objectives outlined in the introduction, and they were met in the Action-Reaction Learning (ARL) design process.

We have demonstrated a synthetic character which was able to engage in waving, clapping and other interesting gestures in response to the user. The synthetic character learned passively without any specific behavioural programming. In addition, the probabilistic model of the synthetic behaviour was evaluated, with RMS errors used to measure its usefulness as a prediction system. It was shown to be a better predictor than the nearest-neighbour and constant-velocity assumptions. Finally, a novel approach to function optimization, GBM (Generalized Bound Maximization), was presented. It was used to derive CEM (Conditional Expectation Maximization), an algorithm for estimating maximum conditional likelihood probability densities. The algorithm was demonstrated to be better suited to computing conditional densities (Gaussian mixture models, in particular) than the EM algorithm. The CEM algorithm also performed successfully on standard test databases. Implementation details of the CEM algorithm were discussed, and further derivations were cast using the GBM machinery. The resulting learning algorithm was shown to be a monotonically convergent conditional density estimator and was successfully utilized in the Action-Reaction Learning framework.
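As a rough illustration of how an RMS-error comparison against the two baselines named above can be set up, the following sketch scores a constant-velocity predictor and a nearest-neighbour predictor on a one-dimensional time series. It is a minimal example under stated assumptions and not the evaluation code used in the thesis: the function names, the scalar series and the window length of 2 are all hypothetical choices made for illustration.

```python
import numpy as np

def rms_error(predicted, actual):
    """Root-mean-square error between two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def constant_velocity_predict(series):
    """Predict series[t+1] as series[t] + (series[t] - series[t-1]).

    Returns predictions aligned with series[2:].
    """
    series = np.asarray(series, dtype=float)
    return 2.0 * series[1:-1] - series[:-2]

def nearest_neighbour_predict(train, test, window=2):
    """Predict the value after each test window by copying the value that
    followed the closest training window (squared Euclidean distance).

    Returns predictions aligned with test[window:].
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    # All training windows and the sample that followed each of them.
    train_windows = np.stack(
        [train[i:i + window] for i in range(len(train) - window)]
    )
    successors = train[window:]
    predictions = []
    for i in range(len(test) - window):
        query = test[i:i + window]
        distances = np.sum((train_windows - query) ** 2, axis=1)
        predictions.append(successors[np.argmin(distances)])
    return np.array(predictions)

# Toy data (an assumption for illustration): a sampled sinusoid.
t = np.linspace(0.0, 4.0 * np.pi, 200)
train_series = np.sin(t[:150])
test_series = np.sin(t[150:])

cv_rms = rms_error(constant_velocity_predict(test_series), test_series[2:])
nn_rms = rms_error(nearest_neighbour_predict(train_series, test_series),
                   test_series[2:])
print("constant velocity RMS:", cv_rms)
print("nearest neighbour RMS:", nn_rms)
```

A learned predictor would be scored the same way, by passing its predictions and the held-out targets to `rms_error`, so that all three numbers are directly comparable.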

The main contributions of the thesis are:

The Action-Reaction Learning Paradigm
The Action-Reaction Learning framework was proposed and implemented to automatically acquire and synthesize human behaviour from perceptual data.
Real-Time Head and Hand Tracking
An expectation-maximization-based head and hand tracker was developed for robust real-time recovery of gestures even under occlusion and contact of head and hands.
Temporal Representation
A compact time series representation of past interaction data (i.e. a short term memory) was developed to represent the input to the learning system. The representation was shown to be useful for various sorts of temporal data.
Conditional versus Conditioned Joint Estimation
An analysis of the differences between conditional density estimation and conditioned joint density estimates was presented and a Bayesian formalism for both was developed.
Generalized Bound Maximization
A bounding technique was introduced which extends some of the variational bounding principles and other algorithms with an emphasis on quadratic bounds. Local and annealed optimization results were shown on a wide class of functions.
Conditional Expectation Maximization
An algorithm was derived for computing the maximum conditional likelihood estimate for probability densities. In addition, the implementation for a conditioned mixture of Gaussians was presented in detail.
Integration and Modes of Operation
The ARL system was formulated as a modular framework which integrates various interchangeable components and the data flow between them. Different modes of operation were developed, including human-human, human-computer and computer-computer interaction. The ARL architecture was shown to encompass interaction, learning, simulation, filtering and prediction of behaviour.
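To make the "conditioned joint" side of the conditional versus conditioned joint distinction concrete, the sketch below takes a Gaussian mixture over the joint vector (x, y) and conditions it on an observed x, which yields a mixture over y. This is the standard Gaussian conditioning identity, shown here as an illustrative assumption of how such a step might look. It is not the CEM algorithm, which instead fits the parameters by maximizing conditional likelihood directly. The function names and the toy parameters are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at a single point x."""
    x = np.atleast_1d(x).astype(float)
    mean = np.atleast_1d(mean).astype(float)
    cov = np.atleast_2d(cov).astype(float)
    diff = x - mean
    dim = diff.size
    norm = np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def condition_joint_mixture(weights, means, covs, x, dx):
    """Condition a joint Gaussian mixture over (x, y) on an observed x.

    weights: mixing proportions of the joint mixture
    means, covs: per-component joint means and covariances
    dx: number of leading dimensions that belong to x
    Returns the weights, means and covariances of the mixture p(y | x).
    """
    x = np.atleast_1d(x).astype(float)
    new_weights, new_means, new_covs = [], [], []
    for w, mu, sigma in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        sigma = np.asarray(sigma, dtype=float)
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx, s_xy = sigma[:dx, :dx], sigma[:dx, dx:]
        s_yx, s_yy = sigma[dx:, :dx], sigma[dx:, dx:]
        gain = s_yx @ np.linalg.inv(s_xx)
        # Each component is reweighted by how well it explains x.
        new_weights.append(w * gaussian_pdf(x, mu_x, s_xx))
        new_means.append(mu_y + gain @ (x - mu_x))
        new_covs.append(s_yy - gain @ s_xy)
    new_weights = np.array(new_weights)
    return new_weights / new_weights.sum(), np.array(new_means), np.array(new_covs)

def conditional_mean(weights, means):
    """Expected value of y under the conditioned mixture."""
    return weights @ means

# Toy joint mixture over a scalar x and a scalar y (illustrative values).
joint_weights = [0.5, 0.5]
joint_means = [[-2.0, -1.0], [2.0, 1.0]]
joint_covs = [np.eye(2), np.eye(2)]

w, m, c = condition_joint_mixture(joint_weights, joint_means, joint_covs,
                                  x=2.0, dx=1)
print("conditional weights:", w)
print("expected y given x:", conditional_mean(w, m))
```

In an action-reaction setting, x would play the role of the past-interaction input and y the reaction to be predicted. The expected value computed by `conditional_mean` is one simple choice of point estimate. For a multimodal conditional density, the most likely mode may be a better one.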

The Action-Reaction Learning framework analyzed and synthesized human behaviour from perceptual data. A probabilistic conditional model of behaviour was uncovered by learning input and output interactions discriminatively. There were no underlying generative models of behaviour, and the user did not specify explicit rules or behavioural mechanisms. The system simply learned behaviour by looking at humans from the outside.


Tony Jebara
1999-09-15