
Automatic Machine Learning

The availability of large data sets and computational resources has encouraged the development of machine learning and data-driven models, which pose an interesting alternative to explicit and fully structured models of behaviour. A battery of tools is now available which can automatically learn interesting mappings and simulate complex phenomena. These include techniques such as Hidden Markov Models, Neural Networks, Support Vector Machines, Mixture Models, Decision Trees and Bayesian Network Inference [5] [29]. In general, these tools fall into two classes: discriminative and generative models. The former optimizes learning for a particular task, while the latter models a phenomenon in its entirety. This difference between the two approaches will be addressed in particular detail in this thesis, and the above probabilistic formalisms will be employed in deriving a machine learning system for our purposes.
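To make the distinction concrete, consider a two-class labeling problem. A generative model fits the class-conditional densities p(x|y) and the prior p(y) and classifies via Bayes' rule, whereas a discriminative model fits p(y|x) directly. The following is a minimal sketch of both on synthetic data (the data and all names here are illustrative assumptions, not drawn from the thesis):

    import numpy as np

    rng = np.random.default_rng(0)

    # Two Gaussian classes in 2D.
    X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(200, 2))
    X1 = rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(200, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(200), np.ones(200)])

    # Generative: fit p(x | y) as a Gaussian per class, plus p(y).
    means = [X[y == c].mean(axis=0) for c in (0, 1)]
    covs = [np.cov(X[y == c].T) for c in (0, 1)]
    priors = [np.mean(y == c) for c in (0, 1)]

    def generative_predict(x):
        # Unnormalized log p(y = c | x) = log p(x | c) + log p(c).
        scores = []
        for c in (0, 1):
            diff = x - means[c]
            inv = np.linalg.inv(covs[c])
            logdet = np.log(np.linalg.det(covs[c]))
            scores.append(-0.5 * (diff @ inv @ diff + logdet) + np.log(priors[c]))
        return int(np.argmax(scores))

    # Discriminative: fit p(y | x) directly with logistic regression.
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(3)
    for _ in range(500):  # plain gradient ascent on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += 0.01 * Xb.T @ (y - p) / len(X)

    x_test = np.array([0.5, 0.0])
    print("generative:", generative_predict(x_test))
    print("discriminative:", int((np.array([1, *x_test]) @ w) > 0))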

Learning applications more closely related to this work include the analysis of temporal phenomena (after all, behaviour is a time-varying process). The Santa Fe competition [20] [18] [42] was a landmark project in this subarea of learning. The methods investigated therein employed a variety of learning techniques to model and simulate the behaviours and dynamics of time series data. The phenomena considered ranged from physiological data to J.S. Bach's last unfinished fugue. The emphasis in many of these techniques was on fully automatic models with flexibility and applicability to a variety of domains. The results give a powerful message about modeling complex phenomena: a complex and unobservable hidden process can be predicted from simple observational cues. For example, the rhythm of the changing seasons or the oscillations of a pendulum can be predicted into the future using only past observations, without a true understanding of the underlying mechanisms [20]. We will call upon the spirit of these temporal modeling techniques and of these machine learning principles to acquire behaviours in a data-driven approach.
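As a hedged illustration of this point (not one of the actual Santa Fe entries), the sketch below fits a plain linear autoregressive model to noisy pendulum-like observations and rolls it forward in time; the oscillation is extrapolated from past samples alone, with no physical model of the pendulum:

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.arange(400)
    series = np.sin(0.1 * t) + 0.05 * rng.normal(size=t.size)  # observations

    p = 10  # AR order: predict x[t] from the previous p samples
    # Build the lagged design matrix: each row is a window of p past values.
    X = np.stack([series[i:i + p] for i in range(len(series) - p)])
    targets = series[p:]
    coeffs, *_ = np.linalg.lstsq(X, targets, rcond=None)

    # Roll the learned model forward 100 steps beyond the data.
    window = list(series[-p:])
    forecast = []
    for _ in range(100):
        nxt = np.dot(coeffs, window)
        forecast.append(nxt)
        window = window[1:] + [nxt]

    print("first few predicted values:", np.round(forecast[:5], 3))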

Of particular relevance to this thesis is the ability to train behavioural models without any direct user intervention. This concept is essential to the thesis and imposes strong constraints and requirements on behavioural design. It is similar in spirit to the work of Sims [56], who relied on genetic algorithms to explore a large space of behavioural models for animating simple life-forms. There is no explicit intervention by the user or the programmer. The designer merely engineers a virtual world with laws of physics and then provides a cost function (e.g. efficient creature mobility) to favor the evolution of certain artificial life-forms. This concept is also explored in physical domains (as opposed to virtual ones) by Uchibe [64], where robots acquire behaviour using reinforcement learning. The objective is to play a soccer game, and behaviours which accomplish good offensive and defensive plays are favored. Unlike the virtual world of Sims, the robots here operate in a real-world environment, which reduces the need to manually specify an artificial reality that the programmer could potentially design to induce quite different phenomena. These types of learning are reminiscent of learning by doing or trial-and-error.
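A minimal sketch of the evolutionary loop underlying such systems appears below. The fitness function here is a hypothetical stand-in for a cost such as creature mobility; Sims' actual system evolved far richer structures (morphologies and control graphs), so this is only the skeleton of the idea:

    import numpy as np

    rng = np.random.default_rng(2)

    def fitness(genome):
        # Hypothetical cost function favouring genomes near a target point.
        return -np.sum((genome - 3.0) ** 2)

    pop = rng.normal(size=(50, 8))  # 50 random behaviour parameter vectors
    for generation in range(200):
        scores = np.array([fitness(g) for g in pop])
        # Select the fittest half, then refill with mutated copies.
        survivors = pop[np.argsort(scores)[-25:]]
        children = survivors + 0.1 * rng.normal(size=survivors.shape)
        pop = np.vstack([survivors, children])

    best = pop[np.argmax([fitness(g) for g in pop])]
    print("best fitness:", fitness(best))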

An important insight is gained from these approaches: complex behaviour can emerge from simple feedback and reinforcement-type learning. This contrasts with more involved work which requires explicit identification of behaviours, states, decisions and actions, all of which come about through the wisdom and artistic efforts of a programmer. Although that process can result in very compelling synthetic behaviours, the task is by no means automatic or easy; it requires significant skill, is often impractical, and puts such techniques beyond the reach of many users.

Unfortunately, trial-and-error and learning by doing can require an exhaustive search. Without a few clues along the way, a learner can get lost exploring a huge space of poor solutions dominated by negative reinforcement [32]. Unlike reinforcement learning or genetic algorithms, which search a space of behaviour solutions, imitation learning discovers behaviour from a set of real examples. Behaviour can be learned by observing other agents (i.e. teachers) behaving or interacting with each other. These techniques reduce the search space and provide a set of templates from which further learning (such as reinforcement) can proceed. This type of learning will be the dominant one in this thesis. It involves minimal overhead since behaviour is taught to the system by humans interacting naturally instead of by a programmer identifying rules or cost functions.
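A minimal behavioural-cloning sketch of this idea follows (an illustration under assumed data, not the system developed in this thesis): the teacher's observed state-action pairs are treated as a supervised regression problem, so imitation reduces the behaviour search to function fitting:

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical demonstration data: states observed while a teacher acts.
    states = rng.uniform(-1, 1, size=(500, 4))
    teacher_actions = states @ np.array([0.5, -1.0, 0.2, 0.8]) \
        + 0.01 * rng.normal(size=500)

    # Fit a linear state-to-action map by least squares.
    policy, *_ = np.linalg.lstsq(states, teacher_actions, rcond=None)

    # The learned policy now imitates the teacher on new states.
    new_state = np.array([0.1, -0.3, 0.5, 0.0])
    print("imitated action:", new_state @ policy)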

Of course, once some initial learning has been performed, meta-learning and higher-order introspective learning can begin and might yield a better understanding of behaviour. This aspect of learning is still in its infancy in the machine learning field and has much potential [62]. Put simply, once a few learning sessions have been completed, it is possible to reflect on these individual instances and perform deduction and induction: in other words, to learn how to learn, discover generalities from the aggregate, and recover higher-order structures and properties of each of the individual lessons. This is an interesting area of future research for behaviour acquisition.

