Anticipatory Mechanisms of Human Sensory-Motor Coordination Inspire Control of Adaptive Robots: A Brief Review

Robot Learning is intended for one-term advanced Machine Learning courses taken by students from different computer science research disciplines. This text has all the features of a renowned best-selling text. It gives a focused introduction to the primary themes in a Robot Learning course and demonstrates the relevance and practicality of various Machine Learning algorithms to a wide variety of real-world applications, from evolutionary techniques to reinforcement learning, classification, control, uncertainty and many other important fields. Salient features:
- Comprehensive coverage of Evolutionary Techniques, Reinforcement Learning and Uncertainty.
- Precise mathematical language used without excessive formalism and abstraction.
- Included applications demonstrate the utility of the subject in terms of real-world problems.
- A separate chapter on anticipatory mechanisms of human sensory-motor coordination and biped locomotion.
- Collection of most recent research on Robot Learning.

ii. forward sensory models, predicting the sensory signals that result from a given current state, and iii. forward models of the physical properties of the environment, anticipating the behaviour of the external world. Hence, by cascading accurate forward dynamic and forward sensory models, motor commands can be transformed into predictions of their sensory consequences, producing a lifetime of calibrated movements. The accuracy of forward models is maintained through adaptive processes driven by sensory prediction errors. Numerous neuroscientific studies in humans provide evidence of anticipatory mechanisms based on the concept of internal models, and several robotic implementations of predictive behaviours have been inspired by those biological mechanisms in order to achieve adaptive agents. This chapter provides an overview of such neuroscientific evidence, as well as of the state of the art in corresponding implementations in robots. The chapter starts by reviewing several behavioural studies that have demonstrated anticipatory and adaptive mechanisms in human sensory-motor control based on internal models underlying tasks such as eye-hand coordination, object manipulation, eye movements, balance control, and locomotion. Then, after describing the neuroscientific findings that point to the cerebellum as a site where internal models are learnt, allocated and maintained, the chapter summarizes different computational systems that may be used to build predictive robot architectures, and presents specific implementations of adaptive behaviours in robots, including anticipatory mechanisms in vision, object manipulation, and locomotion. The chapter also discusses the implications of endowing a robot with the capability of exhibiting integral predictive behaviour while performing tasks in real-world scenarios, in terms of the several anticipatory mechanisms that should be implemented to control the robot. 
Finally, the chapter concludes by suggesting an open challenge in the biorobotics field: to design a computational model of the cerebellum as a unitary module able to learn and operate the diverse internal models necessary to support advanced perception-action coordination in robots, showing human-like, robust reactive behaviour improved by integral anticipatory and adaptive mechanisms while dynamically interacting with the real world during typical real-life tasks.
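To make the cascade of forward dynamic and forward sensory models concrete, the following Python sketch chains a forward dynamic model (motor command to next state) with a forward sensory model (state to expected sensor reading) and adapts the dynamic model from the sensory prediction error. Everything here is an illustrative assumption rather than a model from the reviewed studies: the limb is a one-dimensional point mass with an unknown inverse mass `b`, the sensor is linear, and adaptation uses a normalised LMS rule.

```python
import math

DT = 0.01  # integration step in seconds (assumed value)


def forward_dynamic(x, v, u, b):
    """Forward dynamic model: next (position, velocity) of a 1-D point mass.

    b is the inverse mass, i.e. how strongly the command u accelerates the limb.
    """
    v_next = v + DT * b * u
    x_next = x + DT * v_next
    return x_next, v_next


def forward_sensory(x, gain=1.0, offset=0.0):
    """Forward sensory model: sensor reading expected at position x (assumed linear)."""
    return gain * x + offset


def adapt(b_true=2.0, b_hat=0.5, steps=400, mu=0.5):
    """Run the cascade and adapt b_hat from sensory prediction errors (normalised LMS)."""
    x = v = 0.0
    errors = []
    for t in range(steps):
        u = math.sin(2 * math.pi * t * DT)       # motor command; its copy feeds the models
        x_pred, _ = forward_dynamic(x, v, u, b_hat)
        s_pred = forward_sensory(x_pred)         # cascade: command -> state -> sensation
        x, v = forward_dynamic(x, v, u, b_true)  # the simulated real limb moves ...
        s_actual = forward_sensory(x)            # ... and the real sensor reports
        e = s_actual - s_pred                    # sensory prediction error
        g = DT * DT * u                          # d s_pred / d b_hat for this model (gain = 1)
        b_hat += mu * e * g / (g * g + 1e-12)    # normalised LMS step
        errors.append(abs(e))
    return b_hat, errors


b_hat, errors = adapt()
print(round(b_hat, 3))  # 2.0
```

Because the prediction error vanishes once `b_hat` matches the simulated limb, the update stops by itself when the model is calibrated; a change in the body (a new `b_true`) would make the error, and hence adaptation, reappear.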

Neuroscientific bases of anticipatory and adaptive mechanisms
This section reviews diverse neuroscientific evidence of human anticipatory and adaptive mechanisms in sensory-motor control, including evidence that the cerebellum is a prime candidate module involved in sensory prediction.

Eye-hand coordination
Evidence that the saccadic eye movement system has access to a forward dynamic model of the arm is shown in (Ariff et al., 2002). In this study, subjects performed reaching movements with their arms hidden while tracking the position of their unseen hand with their eyes. Ariff et al. (2002) found that in unperturbed reaching movements, a saccade occurring at any time t consistently provided an estimate of hand position at t+196 ms. However, the ability of the brain to guide saccades to the future position of the hand failed immediately after a force pulse unexpectedly changed the arm dynamics: saccades were suppressed for 100 ms, after which accurate predictive saccades re-emerged. The saccade inhibition period that followed the hand perturbation was suggested to reflect the time it takes to recompute the estimate of the future hand position. In a further study, the arm dynamics were altered by applying various external force fields (Nanayakkara & Shadmehr, 2003). The eyes made accurate predictive saccades after the force pulse only when the externally imposed arm dynamics were predictable, indicating that the saccadic system is able to use new information on arm dynamics to improve its performance. In the context of reaching adaptation, Kluzik et al. (2008) studied subjects performing goal-directed reaching movements while holding the handle of a robotic arm that produced forces perturbing their trajectories. The authors compared subjects' adaptation across three trial conditions: robot forces turned off without announcement, robot forces turned off with announcement, and free-space trials in which subjects held the handle detached from the robot. When forces increased abruptly, in a single step, subjects made large reaching errors. In contrast, when forces increased gradually, with small changes from one trial to the next, subjects showed smaller performance errors. 
These results led the authors to conclude that, although practice with a novel tool caused the formation of an internal model of the tool, it also appeared to produce a transient change in the internal model of the subject's arm.
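The abrupt-versus-gradual contrast can be illustrated with a single-rate state-space model of trial-by-trial adaptation, a common modelling choice in the motor-adaptation literature but not the analysis used by Kluzik et al. (2008). The retention and learning-rate values are assumptions chosen only to show the qualitative effect on error size; the sketch does not capture the authors' conclusion about tool versus arm models.

```python
def simulate(perturbation, retention=0.99, learning_rate=0.2):
    """Single-rate state-space model of trial-by-trial adaptation (assumed parameters).

    perturbation: force-field strength on each trial (arbitrary units).
    Returns the reaching error experienced on each trial.
    """
    estimate = 0.0                  # internal model of the perturbing tool
    errors = []
    for force in perturbation:
        error = force - estimate    # the part of the force the model failed to compensate
        errors.append(error)
        estimate = retention * estimate + learning_rate * error
    return errors


TRIALS = 200
abrupt = [1.0] * TRIALS                               # full force from the first trial
gradual = [min(1.0, n / 100) for n in range(TRIALS)]  # force ramps up over 100 trials

print(max(abs(e) for e in simulate(abrupt)))   # 1.0: the first perturbed trial is fully unexpected
print(max(abs(e) for e in simulate(gradual)))  # much smaller: each step change is small
```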

Object manipulation
In (Johansson, 1998), a control scheme for object grasping and manipulation is proposed. In this scheme, both visual and somatosensory inputs are used in conjunction with internal models for parametric adjustment of fingertip forces to object properties in anticipation of the upcoming force requirements. At the heart of this control is the comparison of somatosensory inflow with the predicted afferent input. Detection of a mismatch between predicted and actual sensory input triggers corrective responses along with an update of the relevant internal model, and thus a change in parameter specification. Witney et al. (2004) confirmed that feedback from cutaneous afferents is critical for successful feedforward control of grip force. Feedback is essential not only for the acquisition of the internal model: constant, uninterrupted feedback is also necessary to maintain previously acquired forward models. To analyze whether the human brain anticipates in real time the consequences of movement corrections, Danion and Sarlegna (2007) monitored grip force while subjects transported a hand-held object to a visual target that could move unexpectedly. They found that subjects triggered fast arm movement corrections to bring the object to the new target location, and initiated grip force adjustments before or in synchrony with the arm movement corrections. Throughout the movement, grip force anticipated the mechanical consequences resulting from arm motion, even when the movement was substantially corrected. Moreover, the predictive control of grip force did not interfere with the on-line control of arm trajectory. These results led the authors to confirm that motor prediction is an automatic, real-time process operating during movement execution and correction.
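A minimal sketch of the predict-compare-update loop in Johansson's scheme follows. It assumes a vertical arm movement in which load force is m(g + a), a grip force proportional to the predicted load with a fixed safety margin (friction is ignored), and a fixed mismatch tolerance; these formulas, the parameter names and their values are illustrative assumptions, not quantities from the cited studies.

```python
GRAVITY = 9.81  # m/s^2


def grip_trials(arm_accels, true_mass, mass_estimate, safety=1.5, tolerance=0.2):
    """Anticipatory grip force with mismatch-triggered model update (illustrative).

    arm_accels: planned vertical hand accelerations (m/s^2), one per movement.
    Returns (grip forces applied, final mass estimate, number of corrections).
    """
    grips, corrections = [], 0
    for accel in arm_accels:
        load_pred = mass_estimate * (GRAVITY + accel)  # internal model: expected load force
        grip = safety * load_pred                      # feedforward grip, set before the load arises
        load = true_mass * (GRAVITY + accel)           # load actually signalled by cutaneous afferents
        if abs(load - load_pred) > tolerance:          # predicted vs actual afferent input mismatch
            grip = safety * load                       # corrective response
            mass_estimate = load / (GRAVITY + accel)   # re-specify the model parameter
            corrections += 1
        grips.append(grip)
    return grips, mass_estimate, corrections


# The object is heavier than expected; only the first lift needs a correction.
print(grip_trials([0.0, 2.0, -2.0, 1.0] * 5, true_mass=0.5, mass_estimate=0.3)[2])  # 1
```

After the single update, grip force rises and falls with the planned acceleration on every later movement without further corrections, which is the anticipatory coupling between grip and load that Danion and Sarlegna (2007) observed.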

Eye movements
The purpose of smooth pursuit eye movements is to minimize retinal slip, i.e., target velocity projected onto the retina, stabilizing the image of the moving object on the fovea. It has been demonstrated that the brain uses predictions to execute this task and that a typical value is about 200 ms (Barnes & Asselman, 1991). In (Barnes & Asselman, 1991), experiments were conducted on human subjects who were required to actively pursue a small target or to stare passively at a larger display as it moved in the horizontal plane. Results indicated that prediction is carried out by storing information about both the magnitude and the timing of eye velocity, and that repeated exposure to the moving target updates that information. Initially, the response occurred with a latency of approximately 100 ms after the onset of target exposure, but after three or four exposures the peak velocity of the smooth eye movement had increased by a factor of 1.5 to 2. The authors stressed the important role of visual feedback in checking the validity of the velocity estimate in the predictive process. When a conflict between the estimate and the current visual input occurs, the estimation system is shut down, and the pursuit system falls back on conventional visual feedback in order to build up a new estimate of velocity. In this case, the reaction time to peak response increases to 300 ms for the initial response, but is reduced to 200 ms after two or three presentations. Hence, in the normal mode of operation of the pursuit reflex, continuous visual feedback is enhanced by predictive estimates of eye velocity initiated under the control of a periodicity estimator, and these estimates are only corrected if a conflicting retinal error indicates an inappropriate predictive estimate.
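The interplay described for smooth pursuit, a stored velocity estimate that is refined with each exposure, added to delayed visual feedback, and discarded when it conflicts with what is seen, can be sketched as follows. Sampling interval, delay, feedback gain, learning rate and conflict threshold are all assumed values, and for brevity the conflict check is made once per presentation instead of continuously from retinal error; the sketch does not attempt to reproduce the latencies or gains reported by Barnes and Asselman (1991).

```python
STEPS = 60   # samples per presentation, 10 ms each (assumed)
DELAY = 10   # visual feedback delay in samples, i.e. 100 ms (assumed)


def pursue(target_vel, predicted_vel, k_fb=0.5):
    """Eye velocity = stored predictive estimate + delayed visual (retinal-slip) feedback."""
    eye = []
    for t in range(STEPS):
        feedback = 0.0
        if t >= DELAY:
            feedback = k_fb * (target_vel[t - DELAY] - eye[t - DELAY])  # delayed retinal slip
        eye.append(predicted_vel[t] + feedback)
    return eye


def repeated_presentations(presentations, alpha=0.5, conflict=15.0):
    """Return the mean eye velocity over the first 200 ms of each presentation."""
    predicted = [0.0] * STEPS        # stored magnitude and timing of eye velocity
    early_means = []
    for target in presentations:
        in_use = any(abs(p) > 1e-9 for p in predicted)
        mismatch = max(abs(tv - p) for tv, p in zip(target, predicted))
        if in_use and mismatch > conflict:
            predicted = [0.0] * STEPS  # estimator shut down: rely on visual feedback only
        eye = pursue(target, predicted)
        early_means.append(sum(eye[:20]) / 20)
        # store what was learned about this target for the next presentation
        predicted = [p + alpha * (tv - p) for p, tv in zip(predicted, target)]
    return early_means


# Four presentations of a constant 20 deg/s target, then one moving the opposite way.
print(repeated_presentations([[20.0] * STEPS] * 4 + [[-20.0] * STEPS]))
# [5.0, 12.5, 16.25, 18.125, -5.0]
```

The early response grows with each repetition because more of it comes from the stored estimate and less from the delayed feedback; the reversed target triggers the conflict branch, so the response falls back to feedback alone.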

Balance control
Anticipation in balance control in the presence of external perturbations is discussed in (Huxham et al., 2001). Balance control during walking is achieved via two strategies: proactive strategies, which reduce or counteract stresses acting on the body, and reactive strategies, which respond to failures of proactive components or to unexpected external perturbations. One form of proactive balance control is vision-based. Information about environmental conditions and changes is constantly received through the eyes and interpreted in the light of experience about its impact on stability. Thus, we step around or over perceived obstacles, reduce our walking speed if the surface appears to be slippery, and maintain a higher degree of alertness in potentially hazardous situations such as rough terrain or cluttered areas. A second form of proactive strategy, termed predictive balance control, takes into account the forces acting on and within the body in order to maintain stability within the body and between the body and the support surface. It depends upon an accurate internal representation of the body and a learned awareness of how any movement or muscle action will alter those relationships. Predictive control of the forces acting on the body is largely achieved by anticipatory postural adjustments, which are initially based not on sensory input but on what experience has taught about the amount and direction of destabilization the movement will produce.

www.intechopen.com
In summary, the balance system proactively monitors the external environment and predicts the effects of forces generated by voluntary movement on the body, making the adjustments necessary to maintain posture and equilibrium in anticipation of need. It is only when these adjustments fail or an unexpected destabilization occurs that the emergency back-up system of reactive balance responses is called in for crisis management.
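The division of labour between proactive and reactive balance control can be sketched under deliberately simplified assumptions: each voluntary movement of amplitude `amp` produces a destabilization proportional to `amp`, the anticipatory postural adjustment (APA) cancels the learned prediction of it before the movement starts, and the reactive back-up is flagged whenever the residual destabilization exceeds a threshold. The gains, the threshold and the `pushes` argument (unexpected external perturbations) are hypothetical.

```python
def balance_trials(amplitudes, pushes=None, c_true=0.8, lr=0.5, threshold=0.05):
    """Simulate voluntary movements preceded by anticipatory postural adjustments.

    amplitudes: size of each voluntary movement (arbitrary units).
    pushes: optional {trial index: external perturbation} (hypothetical).
    Returns a list of booleans: True where the reactive back-up had to step in.
    """
    pushes = pushes or {}
    c_hat = 0.0  # learned destabilization per unit of movement ("experience")
    reactive = []
    for i, amp in enumerate(amplitudes):
        apa = -c_hat * amp                               # issued before moving, no sensory input
        disturbance = c_true * amp + pushes.get(i, 0.0)  # self-generated + unexpected external
        residual = disturbance + apa                     # destabilization the APA did not cancel
        reactive.append(abs(residual) > threshold)       # proactive control failed: react
        if amp:
            c_hat += lr * residual / amp                 # refine the prediction from experience
    return reactive


flags = balance_trials([1.0] * 10, pushes={9: 0.5})
```

With ten identical movements and a push on the last one, the reactive system is needed on the first few movements (while the APA is still being learned) and again on the pushed movement, but not in between.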

Locomotion
Anticipatory head movements toward the direction to be walked were studied by Grasso et al. (1998). They measured head and eye movements in subjects walking along the same 90° corner trajectory, both in the light and with eyes closed, and in forward and backward locomotion. This study showed that coherent head and eye movements sustain a gaze orientation synergy during forward navigation tasks, with orientation anticipating the direction one is about to take. In the absence of visual stimuli, the orienting movements show similar behaviour. However, when the locomotion direction is inverted, these anticipatory movements are not maintained: they disappear or are reversed according to the direction of steering. These results support the hypothesis of a feed-forward navigation control system governing synergic head and eye movements aimed at anticipating future motor events.
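The feed-forward hypothesis can be illustrated with a toy kinematic sketch in which body heading follows the path at the current position while head orientation follows the path heading a fixed distance ahead. The 90° corner matches the trajectory described above, but the look-ahead distance, the step size and the idea of a fixed spatial look-ahead are assumptions made for illustration only.

```python
def path_heading(s, corner=2.0):
    """Heading (deg) along a path: straight for `corner` metres, then a 90-degree turn."""
    return 0.0 if s < corner else 90.0


def walk(step=0.1, length=4.0, lookahead=0.8):
    """Body follows the current path heading; the head orients to the heading ahead."""
    samples = []
    for k in range(int(round(length / step)) + 1):
        s = k * step
        body = path_heading(s)              # direction currently walked
        head = path_heading(s + lookahead)  # feed-forward: direction about to be walked
        samples.append((s, body, head))
    return samples


samples = walk()
head_turn = next(s for s, _, head in samples if head == 90.0)
body_turn = next(s for s, body, _ in samples if body == 90.0)
print(head_turn < body_turn)  # True: the head turns before the body reaches the corner
```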

The role of the cerebellum
Imaging and electrophysiological studies have pointed to the cerebellum as a site where internal models are learnt, allocated, and maintained, allowing predictive behaviour. While studying grip force modulation, Kawato et al. (2003) suggested that forward models of object and arm dynamics are stored in the cerebellum, predicting load force variations caused by arm/object dynamics. Functional imaging showed activation of the right, anterior and superior cerebellum, and of the biventer in the left cerebellum. While human subjects learned to use a new tool, Imamizu et al. (2000) measured cerebellar activity by functional imaging, showing that specific voxels in the cerebellar cortex have BOLD signals that remain modified after a subject has learned a motor task involving the creation and storage of an internal model of the previously unknown tool. Cerminara et al. (2009) provided direct electrophysiological evidence for the operation of an internal model that simulates an external object's motion, expressed in the simple-spike activity of Purkinje cells within the lateral cerebellum. The firing of these cells follows the velocity of the moving target even when the target has briefly disappeared. Ghasia et al. (2008) found that putative cerebellar target neurons discharge in relation to changes in ocular torsion, suggesting that the cerebellum stores a model of ocular mechanics. Using recordings from the floccular complex of the cerebellar cortex during normal smooth pursuit eye movements and during the vestibulo-ocular reflex, Lisberger (2009) found that the simple-spike firing rates of a single group of floccular Purkinje cells may reflect the output of different internal models, such as a model of the inertia of real-world objects and a model of the physics of the orbit, where head and eye motion sum to produce gaze motion. 
Ebner and Pasalar (2008) studied monkeys performing manual pursuit tracking, and associated the simple-spike discharge of Purkinje cells in the intermediate and lateral cerebellum with a forward internal model of the arm predicting the consequences of arm movements, specifically the position, direction of movement, and speed of the limb.
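Cerminara et al. (2009) describe Purkinje-cell firing that keeps following target velocity while the target is briefly hidden. An engineering analogue of such an internal model is an alpha-beta tracker that corrects its position and velocity estimates while the target is visible and runs its constant-velocity model open-loop during occlusion. This is a standard tracking filter swapped in for illustration, not a model of cerebellar circuitry, and the gains, sampling interval and target motion are assumed.

```python
def track(observations, dt=0.01, alpha=0.5, beta=0.3):
    """Alpha-beta tracker; None marks samples in which the target is hidden."""
    x, v = observations[0], 0.0
    estimates = []
    for z in observations[1:]:
        x_pred = x + v * dt            # internal model: constant-velocity motion
        if z is None:                  # target occluded: run the model open-loop
            x = x_pred
        else:
            residual = z - x_pred      # prediction error drives the correction
            x = x_pred + alpha * residual
            v = v + (beta / dt) * residual
        estimates.append((x, v))
    return estimates


# Hypothetical target moving at 10 units/s, hidden for 20 samples (200 ms).
obs = [0.1 * k for k in range(200)] + [None] * 20 + [0.1 * k for k in range(220, 260)]
est = track(obs)
occluded = est[199:219]  # estimates made while the target was hidden
```

During the occlusion the velocity estimate is held constant and the position estimate keeps advancing, so the tracker "follows" the unseen target much as the recorded Purkinje cells appear to.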

Internal models are useful in sensory-motor coordination only if their predictions are generally accurate. When an accurate representation has been learnt, e.g., a forward model of how motor commands affect the motion of the arm or the eyes, motor commands can be applied to this internal model to predict the motion that will result. However, the relationship between a motor command and the movement it produces is variable, since the body and the environment can both change (e.g., bones grow and muscle mass increases during development; disease can affect the strength of the muscles that act on the eyes; physical perturbations can alter the visual and proprioceptive consequences of motor commands). Hence, in order to maintain a desired level of performance, the brain needs to be "robust" to those changes by updating or adapting its internal models (Shadmehr et al., 2010). According to Lisberger (2009), the theory of cerebellar learning could be an important facet of the operation of internal models in the cerebellum. In this theory, errors in movement are signaled by consistently timed spikes on the climbing fiber input to the cerebellum. In turn, climbing fibers cause long-term depression of the synapses from parallel fibers onto Purkinje cells, specifically of those from parallel fibers that were active at or just before the time the climbing fiber input arrived. The extension of the cerebellar learning theory to cerebellar internal models proposes that depression of the parallel fiber to Purkinje cell synapses corrects the internal model in the cerebellum, so that the next instance of a given movement is closer to perfection.
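The timing rule of the cerebellar learning theory, depression only of those parallel fiber synapses that were active at or just before the climbing fiber error signal, can be written in a few lines of Python. The fixed eligibility window and the multiplicative depression factor are simplifying assumptions, not measured quantities.

```python
def climbing_fiber_ltd(weights, pf_spikes, cf_spikes, window=0.1, depression=0.2):
    """Depress parallel fiber -> Purkinje cell synapses active at or just before a CF spike.

    weights:   one synaptic weight per parallel fiber
    pf_spikes: one list of spike times (s) per parallel fiber
    cf_spikes: climbing fiber spike times (s), each signalling a movement error
    window:    how far back (s) a parallel fiber spike still counts as "just before" (assumed)
    """
    new_weights = list(weights)
    for t_cf in cf_spikes:
        for i, spikes in enumerate(pf_spikes):
            if any(t_cf - window <= t <= t_cf for t in spikes):
                new_weights[i] *= 1.0 - depression  # long-term depression
    return new_weights


w = climbing_fiber_ltd([1.0, 1.0, 1.0],
                       pf_spikes=[[0.45], [0.20], [0.60]],  # just before, long before, after
                       cf_spikes=[0.50])
print([round(x, 3) for x in w])  # [0.8, 1.0, 1.0]
```

Only the fiber whose activity coincided with the error signal is weakened, so the next time the same input pattern occurs the Purkinje cell's output, and hence the internal model's prediction, is changed specifically for that context.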

Robotic implementations of predictive behaviours
Anticipatory animats involve agent architectures based on predictive models. Underlying these predictive architectures, different computational systems may be implemented (Butz et al., 2002):

• Model-based reinforcement learning, where a model of the environment is learnt in addition to reinforcement values, and several anticipatory mechanisms can be employed, such as biasing the decision maker toward the exploration of unknown/unseen regions or applying internal reinforcement updates.

• Schema mechanism, where the model is represented by rules and learnt bottom-up by generating more specialized rules where necessary, although no generalization mechanism applies and the decision maker is biased toward exploiting the model to achieve desired items in the environment.

• The expectancy model SRS/E, which is not generalized but represented by a set of rules, and includes an additional sign list storing all states encountered so far. Reinforcement is only propagated once a desired state is generated by a behavioural module, and the propagation is accomplished using dynamic programming techniques applied to the learnt predictive model and the sign list.

• Anticipatory learning classifier systems, which, like the schema mechanism and SRS/E, contain an explicit prediction component; the predictive model consists of a set of rules (classifiers) endowed with an "effect" part that predicts the next situation the agent will encounter if the action specified by the rule is executed. These systems are able to generalize over sensory input.

• Artificial neural networks (ANN), where the agent controller sends outputs to the actuators based on sensory inputs. Learning to control the agent consists of learning to associate the appropriate set of outputs with any set of inputs that the agent may experience. The most common way to perform such learning is the back-propagation algorithm, which computes, for each set of inputs, the errors on the outputs of the controller. With respect to the computed error, the weights of the connections in the network are modified so that the error will be smaller the next time the same inputs are encountered. Back-propagation is a supervised learning method, in which a supervisor indicates at each time step what the agent should have done. However, it is difficult to build a supervisor in most control problems, where the correct behaviour is not known in advance. Anticipation provides a solution to this problem (Tani, 1996; Tani, 1999): if the role of an ANN is to predict what the next input will be rather than to provide an output, then an error signal is available, namely the difference between what the ANN predicted and what actually happened.

Specific implementations of predictive behaviours in robots include anticipatory mechanisms in vision (Hoffmann, 2007; Datteri et al., 2003), object manipulation (Nishimoto et al., 2008; Laschi et al., 2008), and locomotion (Azevedo et al., 2004; Gross et al., 1998), as described in the following subsections.
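The key point of the ANN approach listed above, that predicting the next input makes an error signal available without a supervisor, can be sketched with a single linear unit trained by gradient descent (the one-layer case of back-propagation). This is not Tani's recurrent architecture, and the environment `s_next = 0.9 * s + 0.5 * a` is hypothetical and unknown to the learner; the point is only that the target for each update is simply the next observation.

```python
import random


def learn_forward_model(steps=5000, lr=0.02, seed=0):
    """Train a one-layer linear predictor of the next sensory input (illustrative)."""
    rng = random.Random(seed)
    w_s, w_a = 0.0, 0.0             # weights for the current sensation and the current action
    s = 0.0
    for _ in range(steps):
        a = rng.uniform(-1.0, 1.0)  # exploratory motor action
        s_pred = w_s * s + w_a * a  # prediction of the next sensory input
        s_next = 0.9 * s + 0.5 * a  # hypothetical environment, unknown to the learner
        err = s_next - s_pred       # no supervisor needed: the next input is the target
        w_s += lr * err * s         # gradient-descent (delta-rule) update
        w_a += lr * err * a
        s = s_next
    return w_s, w_a


print([round(w, 3) for w in learn_forward_model()])  # [0.9, 0.5]
```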

Vision
In (Hoffmann, 2007), results are presented from experiments with a visually guided four-wheeled mobile robot that carries out perceptual judgments based on visuo-motor anticipation, demonstrating the ability to understand the behavioural meaning of a spatial arrangement of obstacles. The robot learns a forward model by moving randomly within arrangements of obstacles and observing the changing visual input. For perceptual judgment, the robot stands still, observes a single image, and internally simulates the changing images given a sequence of movement commands (wheel speeds) specified by a certain movement plan. With this simulation, the robot judges the distance to an obstacle in front and recognizes either a dead end or a passage in an arrangement of obstacles. Images from the robot's omnidirectional camera are processed to emphasize the obstacles and reduce the number of pixels. The forward model predicts an image given the current processed image and the wheel velocities. Images are predicted using a set of multi-layer perceptrons, in which each pixel is computed by one three-layer perceptron. Datteri et al. (2003) proposed a perception-action scheme for visually guided manipulation that includes mechanisms for visual prediction and for detecting unexpected events by comparing anticipated feedback with incoming feedback. Anticipated visual perceptions are based on the motor commands and the associated proprioception of the robotic manipulator. If the system's prediction is correct, full processing of the sensory input is not needed at this stage; full perceptual processing is activated only when expected perceptions do not match incoming sensory data. 
Experimental results from a feeding task, in which the robotic arm positions a spoon in its Cartesian space, showed the robot's capability to monitor the spoon trajectory by vision without full visual processing at each step in "regular" situations, and to detect unexpected events that required the activation of full perceptual processing.
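The gating idea in Datteri et al. (2003), running full perceptual processing only when anticipated and incoming feedback disagree, can be sketched as below. The one-joint `forward_kinematics` function, the link length, the tolerance and the use of `camera_heights` as a coarse visual measurement are hypothetical stand-ins for the manipulator's real kinematic model and visual pipeline.

```python
import math


def forward_kinematics(q, link=0.3):
    """Hypothetical one-joint arm: spoon height (m) predicted from joint angle q (rad)."""
    return link * math.sin(q)


def monitor(joint_angles, camera_heights, tolerance=0.01):
    """Return the time steps at which full perceptual processing had to be activated."""
    full_processing_at = []
    for t, (q, z) in enumerate(zip(joint_angles, camera_heights)):
        expected = forward_kinematics(q)  # anticipated visual feedback from proprioception
        if abs(z - expected) <= tolerance:
            continue                      # prediction confirmed: skip full image analysis
        full_processing_at.append(t)      # unexpected event: run full perceptual processing
    return full_processing_at


angles = [0.1 * k for k in range(10)]
heights = [forward_kinematics(q) for q in angles]
heights[5] += 0.05                        # e.g. the spoon is knocked off its expected path
print(monitor(angles, heights))           # [5]
```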

Object manipulation
In the context of anticipation mechanisms while manipulating objects, Nishimoto et al. (2008) proposed a dynamic neural network model of interactions between the inferior parietal lobe (IPL), representing human behavioural skills related to object manipulation and tool usage, and cells in the ventral premotor area (PMv), allowing learning, generation and recognition of goal-directed behaviours.

The authors suggest that the IPL might function as a forward sensory model by anticipating upcoming sensory inputs in achieving a specific goal, which is set by the PMv and sent as input to the IPL. The forward sensory model is built using a continuous-time recurrent neural network trained with multiple sensory (visuo-proprioceptive) sequences acquired during the off-line teaching phase of a small-scale humanoid robot, in which the robot's arm movements are guided through grasping the object to generate the desired trajectories. During the experiments, the robot was tested on autonomously performing three types of grasping actions on objects with both hands: lift up, move to the right, and move to the left. Experimental conditions included placing the object at arbitrary left or right locations inside or outside the training region, and changing the object location abruptly from the center to the left or right at an arbitrary time step after the robot movement had been initiated. Results showed that the robot could perform each behaviour successfully and generalize it across variations in object location, and could adapt in real time to sudden environmental changes occurring up to 20 time steps before reaching the object, a process that takes the robot 30 time steps in the normal condition. Laschi et al. (2008) implemented a model of human sensory-motor coordination in grasping and manipulation on a humanoid robotic system with an arm, a sensorized hand, and a head with a binocular vision system. They demonstrated that the robot was able to reach and grasp an object detected by vision, and to predict the tactile feedback by means of internal models built from experience using neuro-fuzzy networks. Sensory prediction is employed during the grasping phase, which is controlled by a scheme based on the approach previously proposed by Datteri et al. (2003). 
The scheme consists of three main modules: vision, which provides information about the geometric features of the object of interest based on binocular images of the scene acquired by the robot cameras; preshaping, which generates a proper hand/arm configuration to grasp the object based on the object's geometric features provided by the vision module; and tactile prediction, which produces the tactile image expected when the object is contacted, based on the object's geometric features from the vision module and the hand/arm configuration from the preshaping module. During training (creation of the internal models), the robot grasps different kinds of objects in different positions in the workspace to collect correct data used to learn the correlations between visual information, hand and arm configurations, and tactile images. During the testing phase, several trials were executed in which an object was located at a position in the workspace and the robot had to grasp it, lift it and hold it with a stable grasp. Results showed good system performance in terms of success rate, as well as a good capability to predict the tactile feedback, as indicated by the small difference between the predicted and the actual tactile images. In experimental conditions different from those of the training phase, the system was able to generalize with respect to variations in object position, orientation, size and shape.

Locomotion
Azevedo et al. (2004) proposed a locomotion control scheme for two-legged robots based on the human walking principle of anticipating the consequences of motor actions by using internal models. The approach is based on the optimization technique Trajectory-Free Nonlinear Model Predictive Control (TF-NMPC), which consists of optimizing the anticipated future behaviour of the system from inputs relative to contact forces, employing an internal model over a finite sliding time horizon. A biped robot was successfully tested during static walking, dynamic walking, and postural control in the presence of unexpected external thrusts.
Gross et al. (1998) provided a neural control architecture implemented on a mobile miniature robot performing a local navigation task, in which the robot anticipates the sensory consequences of all possible motor actions in order to navigate successfully in critical environmental regions, such as in front of obstacles or at intersections. The robot's sensory system determines the basic 3D structure of the visual scene using optical flow. The neural architecture learns to predict and evaluate the sensory consequences of hypothetically executed actions by simulating alternative sensory-motor sequences, selecting the best one, and executing it in reality. Because the subsequent flow field depends on the previous one and on the executed action, the optical flow prediction subsystem can learn to anticipate the sensory consequences of selected actions. After a real action is executed, learning results from comparing the real and the predicted sensory situations, taking into account reinforcement signals received from the environment. By means of internal simulation, the system can look ahead and select the action sequence that yields the highest total reward in the future. A comparison of the proposed anticipatory system with a reactive one showed that the anticipatory robot avoided obstacles earlier.
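The look-ahead strategy of Gross et al. (1998), simulating alternative action sequences internally, picking the one with the highest total reward and executing it, can be illustrated on a hypothetical three-lane grid. Here the forward model is given rather than learned, and only the first action of the best sequence is executed before replanning; this receding-horizon pattern is also the core idea behind the model predictive control used by Azevedo et al. (2004), although the sketch is neither TF-NMPC nor the authors' neural architecture. The layout, rewards and horizons are assumptions.

```python
from itertools import product

ACTIONS = {"F": 0, "L": 1, "R": -1}  # every action advances one cell; L/R also change lane
LANES = {-1, 0, 1}
OBSTACLES = {(3, 0), (3, 1)}         # hypothetical layout: only lane -1 is open at x = 3


def forward_model(state, action):
    """Predict the next cell (given here; learned from experience in Gross et al.)."""
    x, y = state
    return (x + 1, y + ACTIONS[action])


def blocked(state):
    return state in OBSTACLES or state[1] not in LANES


def reward(state, action):
    """Progress reward, small cost for changing lane, large penalty for a collision."""
    return (1.0 if action == "F" else 0.8) - (10.0 if blocked(state) else 0.0)


def plan(state, horizon):
    """Internally simulate every action sequence; return the first action of the best one."""
    best_value, best_first = float("-inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        s, value = state, 0.0
        for a in seq:
            s = forward_model(s, a)
            value += reward(s, a)
        if value > best_value:
            best_value, best_first = value, seq[0]
    return best_first


def drive(horizon, start=(0, 1), steps=5):
    """Execute only the first planned action at each step, then replan."""
    state, path = start, [start]
    for _ in range(steps):
        state = forward_model(state, plan(state, horizon))
        path.append(state)
    return path, any(blocked(p) for p in path)


print(drive(horizon=1)[1], drive(horizon=3)[1])  # True False
```

With `horizon=1` the robot behaves reactively and only notices the blocked lanes when it is next to them, which is too late to reach the open lane; with `horizon=3` it starts changing lane on its second move and passes the obstacle.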

Summary and conclusions
The sensory-motor coordination system in humans is able to adjust for noise and delay in sensory feedback, and for changes in the body and the environment that alter the relationship between motor commands and their sensory consequences. This adjustment is achieved by anticipatory mechanisms based on the concept of internal models. Specifically, forward models receive a copy of the outgoing motor commands and generate a prediction of the expected sensory consequences. This output may be used to: i. adjust fingertip forces to object properties in anticipation of the upcoming force requirements, ii. increase the velocity of smooth eye movements while pursuing a moving target, iii. make the adjustments necessary to maintain body posture and equilibrium in anticipation of need, and iv. trigger corrective responses when a mismatch between predicted and actual sensory input is detected, together with the corresponding update of the relevant internal model. Several behavioural studies have shown that the sensory-motor system acquires and maintains forward models of different systems (i.e., arm dynamics, grip force, eye velocity, the dynamics of external objects and tools, and postural stability within the body and between the body and the support surface), and it has been widely hypothesized that the cerebellum is the location of those internal models and that the theory of cerebellar learning might come into play to allow the models to be adjusted. Although most of the evidence for the role of the cerebellum comes from imaging studies, recent electrophysiological research has analyzed recordings from cerebellar neurons in an attempt to identify patterns of neural discharge that might represent the output of diverse internal models. 
As reviewed within this chapter, although not exhaustively, several independent efforts in the robotics field have been inspired by human anticipatory mechanisms based on internal models to provide efficient and adaptive robot control. Each of those efforts addresses predictive behaviour within the context of one specific motor system, e.g., visuo-motor coordination to determine the implications of a spatial arrangement of obstacles or to position a spoon during a feeding task, object manipulation while performing grasping actions, postural control in the presence of unexpected external thrusts, and navigation within environments with obstacles and intersections. Nevertheless, to endow a robot with the capability of exhibiting integral predictive behaviour while performing tasks in real-world scenarios, several anticipatory mechanisms would have to be implemented to control the robot. Even simply to follow a visual target by coordinating eye, head, and leg movements while walking smoothly and efficiently in an unstructured environment, the robot's performance should be based on diverse internal models allowing anticipation in vision (saccadic and smooth pursuit systems), head orientation according to the direction to be walked, balance control adapting posture to different terrains and environment configurations, and interpretation of the significance and permanence of obstacles within the current scene. 
Assuming that the cerebellum is a site involved in a wide variety of anticipatory processes, learning, allocating, and adapting different internal models in sensory-motor control, we conclude this brief review by suggesting an open challenge in the biorobotics field: to design a computational model of the cerebellum as a unitary module able to operate the diverse internal models necessary to support advanced perception-action coordination in robots, showing human-like, robust reactive behaviour improved by integral anticipatory and adaptive mechanisms while dynamically interacting with the real world during typical real-life tasks. Anticipating the predictable part of the environment makes it easier to identify unpredictable changes, which in turn improves the robot's ability to move in the world by allowing fast reactions to those changes.