Robotics Arm Visual Servo: Estimation of Arm-Space Kinematics Relations with Epipolar Geometry

1.1 Visual servoing for robotics applications
Numerous advances in robotics have been inspired by reliable concepts from biological systems. The need for improvement has been recognized because robotic systems lack the sensory capabilities to cope with challenges such as unknown and changing workspaces, undefined object locations, calibration errors, and other varying conditions. Visual servoing aims to control a robotic system through artificial vision, manipulating the environment in a manner similar to human actions. It has long been recognized that combining "visual information" with an "arm dynamics" controller is not a straightforward task. This is due to the different natures of the descriptions that define the physical parameters within an arm control loop. Studies have also revealed the option of using a trainable system to learn the complicated kinematics relating object features to the robot arm joint space. To achieve visual tracking, servoing, and control for accurate manipulation without losing the object, it is essential to relate a number of the object's geometrical features (object space) to the robot joint space (arm joint space). The object's visual data plays an important role in this sense. Most robotic visual servo systems rely on the object "feature Jacobian" and, in addition, on its inverse. The inverse of the feature Jacobian is not easily assembled and computed for use in a visual loop. A neural system can be used to approximate this relation, thereby avoiding the computation of the object's feature inverse Jacobian, even at singular Jacobian postures. Within this chapter, we discuss and present an integration approach that combines visual feedback sensory data with a 6-DOF robot arm controller.
Visual servoing is a methodology for controlling the movements of a robotic system using visual information to achieve a task. Visual data is acquired from a camera that is mounted directly on a robot manipulator or on a mobile robot, in which case the motion of the robot induces camera motion. Alternatively, the camera can be fixed so that it observes the robot's motion. In this sense, visual servo control relies on techniques from image processing, computer vision, control theory, kinematics, dynamics, and real-time computing.

In (Chen et al., 2008), an adaptive visual servo regulation control for the camera-in-hand configuration, with a fixed-camera extension, was presented. An image-based regulation control of a robot manipulator with an uncalibrated vision system is discussed. To compensate for the unknown camera calibration parameters, a novel prediction error formulation is presented, and a Lyapunov-based adaptive control strategy is employed to achieve the control objectives. The control development for the camera-in-hand problem is presented in detail, and the fixed-camera problem is included as an extension. The Epipolar Geometry Toolbox (Gian et al., 2004) was also created to provide MATLAB users with a broad framework for the creation and visualization of multi-camera scenarios, and for the manipulation of visual information and visual geometry. The functions provided, for both classes of vision sensor (pinhole and panoramic), include camera placement and visualization, and the computation and estimation of epipolar geometry entities. Visual servoing has been classified as the use of visual data within a control loop, enabling visual-motor (hand-eye) coordination. Image-based visual servoing using a Takagi-Sugeno fuzzy neural network controller has been proposed by (Miao et al., 2007). In their study, a T-S fuzzy neural controller based IBVS method was proposed: an eigenspace-based image compression method is first explored and chosen as the global feature transformation method; the inner structure, performance, and training method of the T-S neural network controller are discussed; and the whole architecture of the TS-FNNC is investigated. For robot arm visual servoing, the problem has been formulated as a function of the object feature Jacobian, a matrix that is complicated to compute for real-time applications. With more feature points in space, computing the inverse of such a matrix becomes even harder.

Chapter contribution
For robot arm visual servoing, the problem has been formulated as a function of the object feature Jacobian. The entries of the feature Jacobian matrix are complicated differential relations to compute for real-time applications, and with more feature points in space, computing the inverse of such a matrix is even harder. In this respect, this chapter concentrates on approximating the differential visual relations (object feature points) that relate an object's movement in world space to its motion in camera space (usually a complicated, time-varying relation expressed by the Jacobian matrix), and hence to the robot joint space. The chapter also discusses how a trained learning system can be used to achieve the needed approximation. The proposed methodology is based on utilizing and merging three MATLAB toolboxes: the Robotics Toolbox developed by Peter Corke (Corke, 2002), the Epipolar Geometry Toolbox (EGT) developed by Eleonora Alunno (Eleonora et al., 2004), and the MATLAB ANN Toolbox. The chapter presents a research framework oriented toward developing a robotic visual servo system that approximates complicated nonlinear visual servo kinematics, and shows how a trained neural network can learn the needed approximation and the inter-related mappings. The whole concept rests on three fundamentals: the robot arm-object visual kinematics, the epipolar geometry relating the different object scenes during motion, and a learning artificial neural system. To validate the concept, the visual control loop algorithm developed by RIVES has been adopted to include a learning neural system. Results have indicated that the proposed visual servoing methodology produces considerably accurate results.

Chapter organization
The chapter is divided into six main sections. Section (1) introduces the concept of robotic visual servoing and the related background. Section (2) presents background and related literature on the single-scene case with a single camera system. The double-scene case, known as epipolar geometry, is presented in depth in Section (3). Artificial neural net based IBVS is presented in Section (4), whereas learning and training of the artificial neural net are simulated in Section (5), together with a case study and simulation results of the proposed method as compared to the RIVES algorithm. Finally, Section (6) presents the chapter conclusions.

Fig. (2) shows a typical camera geometrical representation in space. To assemble a closed-loop visual servo system, a loop is closed around the robot arm; in this study, this is a PUMA-560 robot arm with a pinhole camera system. The camera image plane and the associated geometry are shown in Fig. (3). For analyzing the closed-loop visual kinematics, we employ a pinhole camera model for capturing object features. Details of the pinhole camera model, expressing the image-plane location (ξ_a, ψ_a) in terms of (ξ, ψ, ζ), are given in Equ. (1). In reference to Fig.
(2), the image-plane location (ξ_a, ψ_a) is expressed in terms of (ξ, ψ, ζ) and is thus calculated by Equ. (1). For thin lenses (as in the pinhole camera model), the camera geometry is represented by (Gian et al., 2004):

    ξ_a = φ ξ / ζ ,    ψ_a = φ ψ / ζ                                        (1)

where φ is the focal length. In reference to (Gian et al., 2004), using Craig's notation, ᴮP denotes the coordinates of a point P in frame B. For the pure translation case, ᴮO_A is the coordinate of the origin O_A of frame A in the new coordinate system B, so that:

    ᴮP = ᴬP + ᴮO_A

Rotations are expressed through the rotation matrix ᴮR_A, whose columns ᴮi_A in Equ. (4) are the frame axes of A expressed in the coordinates of B. In this respect, for a rigid transformation we have:

    ᴮP = ᴮR_A ᴬP + ᴮO_A

For more than a single consecutive rigid transformation (for example, on to a frame C), the coordinates of point P in frame C can hence be expressed by:

    ᶜP = ᶜR_B ( ᴮR_A ᴬP + ᴮO_A ) + ᶜO_B

In homogeneous coordinates, it is very concise to express ᴮP as:

    ᴮP̃ = Γ ᴬP̃

where the elements of the transformation (Γ) combine the rotation and the translation in a single 4 × 4 matrix. A Euclidean transformation preserves parallel lines and angles; an affine transformation, on the contrary, preserves parallel lines but not angles. Introducing a normalized image plane located at focal length φ = 1, the pinhole (C) is mapped to the origin of this image plane, and a point P is mapped to:

    û = ξ / ζ ,    v̂ = ψ / ζ

The parameters (κ, β) are intrinsic camera parameters; they represent the inner camera imaging parameters, and in matrix notation they can be expressed through an intrinsic matrix K. Both (R) and (Ω), the extrinsic camera parameters, represent the coordinate transformation between the camera coordinate system and the world coordinate system. Hence, any point (u, v) in the camera image plane is evaluated via the following relation:

    λ [u, v, 1]ᵀ = K [R | Ω] [ξ, ψ, ζ, 1]ᵀ = M [ξ, ψ, ζ, 1]ᵀ                (17)

Here (M) in Equ. (17) is referred to as the Camera Projection Matrix. We are given (1) a calibration rig, i.e., a reference object, to provide the world coordinate system, and (2) an image of the reference object. The problem is to solve (a) the projection matrix, and (b) the intrinsic and extrinsic parameters.
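The projection of Equ. (17) can be sketched numerically. The chapter's implementation uses MATLAB toolboxes; the following is an illustrative Python sketch, where the focal length, principal point, and test point are assumed example values, not the chapter's calibration:

```python
import numpy as np

def project_point(P_world, K, R, t):
    """Project a 3-D world point onto the image plane with a pinhole model.

    K : 3x3 intrinsic matrix; R, t : extrinsic rotation/translation
    (world -> camera frame). Returns pixel coordinates (u, v).
    """
    P_cam = R @ P_world + t          # rigid transformation into camera frame
    uvw = K @ P_cam                  # perspective projection (homogeneous)
    return uvw[:2] / uvw[2]          # normalize by depth

# Assumed example camera: focal length 500 px, principal point (320, 240),
# camera frame aligned with the world frame.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

u, v = project_point(np.array([0.1, 0.2, 2.0]), K, R, t)
# a point 0.1 m right, 0.2 m up, 2 m ahead projects to (345, 290)
```

Note how the division by the third homogeneous coordinate implements the ξ/ζ, ψ/ζ normalization of the pinhole model above.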

Computing a projection matrix
In a mathematical sense, we are given the world points (ξ_i, ψ_i, ζ_i)ᵀ and the image points (u_i, v_i) for i = 1 … n, and we want to solve for the entries of M. Since M is defined only up to scale, we can let σ_34 = 1; this results in the projection matrix being scaled by σ_34. Writing out the projection equations for all n points then yields a linear system K m = U, where m collects the remaining unknown entries of the projection matrix. The solution m = K⁺ U, where K⁺ is the pseudo-inverse of the matrix K, together with m_34 = 1, constitutes the projection matrix M. We now turn to double-scene analysis.
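The linear estimation step described above can be sketched in Python. This is a minimal illustration of solving K m = U by least squares with m_34 fixed to 1; the synthetic camera matrix and points are assumed values for the demonstration, not the chapter's data:

```python
import numpy as np

def estimate_projection_matrix(X, uv):
    """Estimate the 3x4 projection matrix M from n >= 6 world points
    X (n x 3) and their image points uv (n x 2), fixing m34 = 1 and
    solving the linear system K m = U via least squares."""
    n = X.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((x, y, z), (u, v)) in enumerate(zip(X, uv)):
        # u = (m11 x + m12 y + m13 z + m14) / (m31 x + m32 y + m33 z + 1)
        A[2 * i]     = [x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z]
        A[2 * i + 1] = [0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z]
        b[2 * i], b[2 * i + 1] = u, v
    m, *_ = np.linalg.lstsq(A, b, rcond=None)   # pseudo-inverse solution
    return np.append(m, 1.0).reshape(3, 4)      # reattach m34 = 1

# Synthetic check: project points through a known M, then recover it.
M_true = np.array([[400.0,   0.0, 320.0, 10.0],
                   [  0.0, 400.0, 240.0, 20.0],
                   [  0.0,   0.0,   1.0,  1.0]])
X = np.random.default_rng(0).uniform(-1, 1, (8, 3)) + [0, 0, 4]
h = np.c_[X, np.ones(8)] @ M_true.T
uv = h[:, :2] / h[:, 2:]
M_est = estimate_projection_matrix(X, uv)
# M_est matches M_true to numerical precision
```

With noiseless correspondences the recovery is exact; with real image measurements the same pseudo-inverse gives the least-squares fit.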

Double camera scene {epipolar geometry analysis}
In this section, we shall consider the images resulting from two camera views: two perspective views of the same scene taken from two separate viewpoints O₁ and O₂, as illustrated in the accompanying figure. One of the main entities of epipolar geometry is the fundamental matrix Η (Η ∈ ℝ³ˣ³). The Η matrix conveys most of the information about the relative position and orientation (t, R) between the two views. Moreover, the fundamental matrix algebraically relates corresponding points in the two images through the epipolar constraint:

    m₂ᵀ Η m₁ = 0

For instance, consider two views of the same 3-D point X_w, characterized by their relative position and orientation (t, R) and by the internal camera parameters; Η is then evaluated in terms of K₁ and K₂, the matrices of internal camera parameters (Gian et al., 2004):

    Η = K₂⁻ᵀ [t]ₓ R K₁⁻¹

In such a case, the 3-D point (X_w) is projected onto the two image planes at points (m₁) and (m₂), which constitute a conjugate pair. Given a point (m₁) in the left image plane, its conjugate point in the right image is constrained to lie on the epipolar line of (m₁). This line is the projection through C₂ of the optical ray of m₁. All epipolar lines in one image plane pass through an epipole point, which is the projection of the conjugate optical centre onto that image plane.
When (C₁) lies in the focal plane of the right camera, the right epipole is at infinity, and the epipolar lines form a bundle of parallel lines in the right image. The direction of each epipolar line is evaluated by taking the derivative of the parametric equations above with respect to (λ). Once the epipole is projected to infinity, the direction of the epipolar lines in the right image no longer depends on the point; all epipolar lines become parallel to a common vector. A very special occurrence is when both epipoles are at infinity. This happens when the line containing (C₁) and (C₂), the baseline, is contained in both focal planes, or when the retinal planes are parallel and horizontal in each image, as in Fig. (4). The right pictures plot the epipolar lines corresponding to the points marked in the left pictures. This procedure is called rectification: if the cameras share the same focal plane, the common retinal plane is constrained to be parallel to the baseline, and the epipolar lines are parallel.
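The construction of Η from (t, R) and the internal parameters, and the epipolar constraint itself, can be verified numerically. The chapter uses the EGT in MATLAB; the following is an illustrative Python sketch with assumed camera parameters and an assumed relative pose:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix [t]_x such that [t]_x @ v == cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """H = K2^{-T} [t]_x R K1^{-1} for two views related by (R, t)."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Two identical cameras; the second is translated 0.2 m along x
# (assumed example values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.2, 0.0, 0.0])
H = fundamental_matrix(K, K, R, t)

# Project one 3-D point into both views (homogeneous pixel coordinates)
# and check the epipolar constraint m2' H m1 = 0.
Xw = np.array([0.3, -0.1, 3.0])
m1 = K @ Xw;            m1 = m1 / m1[2]       # first view
m2 = K @ (R @ Xw + t);  m2 = m2 / m2[2]       # second view
constraint = m2 @ H @ m1                       # ~0 for a conjugate pair
```

The fundamental matrix built this way is rank 2, which is what forces all epipolar lines to meet in the epipole.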

Neural net based Image -Based Visual Servo control (ANN-IBVS)
In the last section we focused on single and double camera scenes, i.e., on the robot arm's visual sensory input. In this section, we shall focus on "Image-Based Visual Servoing" (IBVS), which uses the locations of object features on the image planes for direct visual feedback. For instance, reconsidering Fig.
(1), it is desired to move the robot arm in such a way that the camera's view changes from an initial view to a final view, and the feature vector from (φ₀) to (φ_d). Here (φ₀) may comprise coordinates of vertices, or areas, of the object to be tracked. Implicit in (φ_d) is that the robot is normal to, and centred over, the features of the object at a desired distance. Elements of the task are thus specified in image space. For a robotic system with an end-effector-mounted camera, the viewpoint and the features are functions of the relative pose of the camera to the target, (ᶜx_t). Such a function is usually nonlinear and cross-coupled: the motion of one end-effector DOF results in complex motion of many features. For instance, a camera rotation can cause features to translate horizontally and vertically on the image plane, as related via the following relationship:

    φ = f(ᶜx_t)                                                             (30)

Equ. (30) is linearized around an operating point:

    δφ = J(x) δx                                                            (32)

In Equ. (32), J(x) is the Jacobian matrix relating the rate of change of the robot arm pose to the rate of change in feature space. Variously, this Jacobian is referred to as the feature Jacobian, image Jacobian, feature sensitivity matrix, or interaction matrix. Assuming the Jacobian is square and non-singular,

    δx = J⁻¹(x) δφ

from which a control law can be expressed as:

    ẋ = K_f J⁻¹(x) ( φ_d − φ(t) )                                           (34)

which will tend to move the robot arm toward the desired feature vector. In Equ. (34), K_f is a diagonal gain matrix, and (t) indicates a time-varying quantity. The object pose rates ᶜẋ_t are converted to robot end-effector rates using a Jacobian derived from the relative pose between the end-effector and the camera. In this respect, a technique to determine the transformation between a robot's end-effector and the camera frame is given by Lenz and Tsai, as in (Lenz & Tsai,
1988). In a similar approach, end-effector rates may be converted to manipulator joint rates using the manipulator's Jacobian (Croke, 1994), where θ̇ represents the robot joint-space rates; a complete closed-loop equation can then be written. For achieving this task, an analytical expression of the error function is given in terms of the task-function Jacobian, J = ∂φ/∂X. Despite modeling errors, such a closed-loop system is relatively robust in the presence of image distortions and kinematic parameter variations of the PUMA-560. A number of researchers have demonstrated good results using this image-based approach for visual servoing. It is frequently reported that the significant problem is computing or estimating the feature Jacobian, for which a variety of approaches have been used (Croke, 1994). The IBVS structure proposed by Weiss (Weiss et al., 1987; Craig, 2004) controls the robot joint angles directly using measured image features. The nonlinearities include the manipulator kinematics and dynamics as well as the perspective imaging model. Adaptive control has also been proposed, since the composite Jacobian is pose dependent (Craig, 2004). In this study, the changing relationship between robot posture and image feature change is learned during motion via a learning neural system, which accepts a weighted set of inputs (stimuli) and responds.
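The control law of Equ. (34) can be sketched for point features. This is a minimal illustration of the classical IBVS step using the standard point-feature interaction matrix and a pseudo-inverse (so it also covers the non-square case); the feature positions, depths, and gain are assumed example values:

```python
import numpy as np

def interaction_matrix(u, v, Z, f=1.0):
    """Image (feature) Jacobian L for one point feature (u, v) at depth Z
    with normalized focal length f: s_dot = L @ [vx, vy, vz, wx, wy, wz]."""
    return np.array([
        [-f / Z, 0.0, u / Z, u * v / f, -(f + u * u / f), v],
        [0.0, -f / Z, v / Z, f + v * v / f, -u * v / f, -u]])

def ibvs_velocity(features, desired, depths, gain=0.5):
    """Camera twist v = -gain * L^+ (s - s*): stack one 2x6 block per
    feature point and use the pseudo-inverse of the stacked Jacobian."""
    L = np.vstack([interaction_matrix(u, v, Z)
                   for (u, v), Z in zip(features, depths)])
    error = (np.asarray(features) - np.asarray(desired)).ravel()
    return -gain * np.linalg.pinv(L) @ error

# Four point features at depth 2 m, all displaced from their goals
# (assumed values for the demonstration):
s  = [(0.15, 0.1), (-0.05, 0.1), (0.15, -0.1), (-0.05, -0.1)]
sd = [(0.10, 0.1), (-0.10, 0.1), (0.10, -0.1), (-0.10, -0.1)]
v_cam = ibvs_velocity(s, sd, depths=[2.0] * 4)
# v_cam is the 6-DOF camera twist that drives the feature error toward zero
```

In the chapter's scheme the computation of L⁺ is exactly the step that the trained neural network replaces.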

Visual mapping: Nonlinear function approximation ANN mapping
A layered feed-forward network consists of a number of layers, each containing a number of units: an input layer, an output layer, and one or more hidden layers between them. Each unit receives its inputs directly from the previous layer (except for the input units) and sends its output directly to units in the next layer. Unlike a recurrent network, which contains feedback connections, there are no connections from any of the units to the inputs of previous layers, to other units in the same layer, or to units more than one layer ahead; every unit acts only as an input to the immediately following layer. This class of networks is easier to analyze theoretically than other general topologies, because their outputs can be represented as explicit functions of the inputs and the weights. In this research we focused on the Back-Propagation algorithm as the learning method, with all associated mathematical formulae given in reference to Fig. (5). The figure depicts a multi-layer (four-layer) artificial neural net connected to form the entire network, which learns using the back-propagation learning algorithm. To train the network and measure how well it performs, an objective function must be defined to provide an unambiguous numerical rating of system performance. Selection of the objective function is very important, because the function represents the design goals and decides which training algorithm can be used. For this research framework, a few basic cost functions were investigated, and the sum-of-squares error function was used, as defined by Equ. (38):

    E = ½ Σ_p Σ_i ( t_pi − y_pi )²                                          (38)

where p indexes the patterns in the training set, i indexes the output nodes, and t_pi and y_pi are, respectively, the target hand joint-space position and the actual network output for the i-th output unit on the p-th pattern. An illustration of the layered network with an input layer, two hidden layers, and an output layer is shown in Fig. (5). In this network there are (i) inputs, (m) hidden units, and (n) output units. The output of the j-th hidden unit is obtained by first forming a weighted linear combination of the (i) input values and then adding a bias:

    a_j = Σ_i w_ji⁽¹⁾ x_i + w_j0⁽¹⁾                                         (39)

where w_ji⁽¹⁾ is the weight from input (i) to hidden unit (j) in the first layer, and w_j0⁽¹⁾ is the bias for hidden unit (j). If we treat the bias term as a weight from an extra input x₀ = 1, Equ. (39) can be rewritten in the form:

    a_j = Σ_{i=0} w_ji⁽¹⁾ x_i

The activation of hidden unit j can then be obtained by transforming the linear sum using a nonlinear activation function g(x):

    z_j = g(a_j)

Fig. (5) shows a synthesized ANN network with two hidden layers; it can easily be extended with extra hidden layers by applying the above transformation further. Input units merely pass the network signals on to the next processing nodes: they are hypothetical units that produce outputs equal to their inputs, so no processing is done by them. Through this approach, the error of the network is propagated backward recursively through the entire network, and all of the weights are adjusted so as to minimize the overall network error. The block diagram of the learning neural network used is illustrated in Fig.
(6). The network learns the relationship between the previous changes in the joint angles ΔΘ_{k−1}, the changes in the object posture Δu, and the changes in the joint angles ΔΘ_k. This is done by executing some random displacements from the desired object position and orientation. The hand fingers are set up in the desired position and orientation relative to the object. Different Cartesian-based trajectories are then defined, and the inverse Jacobian is used to compute the associated joint displacements. Different object postures, together with the joint positions and differential changes in joint positions, form the input-output patterns for training the employed neural network. During the learning epochs, the weights and biases of the neurons are updated so that the error decreases to a value close to zero, resulting in the learning curve that minimizes the defined objective function, as will be discussed further later. It should be mentioned at this stage that the training process consumed nearly three hours, due to the large amount of training patterns presented to the neural network.
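The back-propagation update on the sum-of-squares error of Equ. (38) can be sketched compactly. This is a minimal numpy illustration of a feed-forward net with two hidden layers trained by gradient descent; the layer sizes, learning rate, and training pattern are assumed toy values, not the chapter's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy two-hidden-layer network: 3 inputs -> 8 -> 8 -> 2 outputs.
sizes = [3, 8, 8, 2]
W = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    """Return the activations of every layer, input included."""
    a = [x]
    for Wl, bl in zip(W, b):
        a.append(sigmoid(Wl @ a[-1] + bl))
    return a

def backprop_step(x, t, lr=0.5):
    """One gradient step on E = 0.5 * sum((y - t)^2), propagating the
    deltas backward through the layers. Returns E before the update."""
    a = forward(x)
    delta = (a[-1] - t) * a[-1] * (1 - a[-1])       # output-layer delta
    for l in reversed(range(len(W))):
        grad_W = np.outer(delta, a[l])
        if l > 0:                                    # delta for layer below,
            delta_next = (W[l].T @ delta) * a[l] * (1 - a[l])  # old weights
        W[l] -= lr * grad_W
        b[l] -= lr * delta
        if l > 0:
            delta = delta_next
    return 0.5 * np.sum((a[-1] - t) ** 2)

x, t = np.array([0.2, -0.4, 0.7]), np.array([0.3, 0.8])
errors = [backprop_step(x, t) for _ in range(200)]
# the sum-of-squares error shrinks as training proceeds
```

Note that the delta for the layer below is computed with the weights as they were before the update, which is the correct ordering for plain back-propagation.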

Artificial neural networks mapping: A biological inspiration
Animals are able to respond adaptively to changes in their external and internal environment, and they use their nervous system to perform these behaviours. An appropriate model or simulation of a nervous system should be able to produce similar responses and behaviours in artificial systems. A nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality should be the solution (Pellionisz, 1989). The human brain is part of the central nervous system; it contains of the order of 10¹⁰ neurons. Each can activate in approximately 5 ms and connects to of the order of 10⁴ other neurons, giving about 10¹⁴ connections (Shields & Casey, 2008). In a typical neural connection, spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse. The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron, and the integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron. The contribution of the signals depends on the strength of the synaptic connection (Pellionisz, 1989). An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system: a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANN systems, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons; this is true of ANN systems as well (Aleksander & Morton, 1995). The four-layer feed-forward neural
network with (n) input units, (m) output units, and N units per hidden layer has already been shown in Fig. (5) and will be discussed further later. Fig. (5) exposes only one possible neural network architecture that will serve the purpose. In reference to Fig. (5), every node is designed to mimic its biological counterpart, the neuron; the interconnection of the neurons forms the entire grid of the ANN, which has the ability to learn and approximate the nonlinear visual kinematic relations. The learning neural system used here is composed of four layers: the input and output layers, and two hidden layers. If we denote (ᶜν_w) and (ᶜω_w) as the camera's linear and angular velocities with respect to the robot frame, respectively, the motion of an image feature point as a function of the camera velocity is obtained through the interaction-matrix relation introduced earlier. Visual servoing using a pinhole camera for a 6-DOF robot arm is simulated here. The system under study is a PUMA arm integrated with a camera and the ANN; the simulation block diagram is shown in Fig. (7). In the simulation, the task is performed using the 6-DOF PUMA manipulator with six revolute joints and a camera that provides position information about the robot gripper tip and a target (object) in the robot workspace. The robot dynamics and direct kinematics are expressed by the equations of the PUMA-560 system as documented by Craig (Craig, 2004); these kinematic and dynamic equations are already well known in the literature. For the purpose of comparison, the example used is based on the visual servoing system developed by RIVES, as in (Eleonora, 2004). The robot arm servos to follow an object moving in a 3-D workspace. The object is characterized by eight feature marks, resulting in a feature Jacobian matrix of size 24 × 6 (ℝ²⁴ˣ⁶). This is visually shown in Fig.
(7). The object's eight features are mapped to the movement of the object in the camera image plane through the defined geometries. The changes in the feature points and the differential changes in the robot arm constitute the data used for training the ANN. The employed ANN architecture has already been discussed and presented in Fig. (5).

Training phase: visual training patterns generation
The foremost ambition of this visual servoing is to drive a 6-DOF robot arm, as simulated with the Robotics Toolbox (Corke, 2002) and equipped with a pinhole camera, as simulated with the Epipolar Geometry Toolbox, EGT (Gian et al., 2004), from a starting configuration toward a desired one using only the image data provided during the robot motion. For the purpose of setting up the proposed method, the RIVES algorithm was run a number of times beforehand, in each case servoing the arm toward a different object posture and desired location in the workspace. The EGT function estimating the fundamental matrix Η was used, given U₁ and U₂, the eight feature points defined for the two scenes. Large sets of training patterns were therefore gathered and classified. The patterns gathered at various loop locations gave an indication of a feasible size for the learning neural system: a four-layer artificial neural system was found to be a feasible architecture for this purpose. The net maps 24 inputs (3 × 8 feature points), characterizing the object's Cartesian feature positions and the arm joint positions, into the six differential changes in the arm joint positions. The network is presented with arm motions in various directions. Once the neural system has learned the presented patterns and the required mapping, it is ready to be employed in the visual servo controller. The trained neural net was able to map the nonlinear relations relating object movement to differential changes in the arm joint space. The object's path of motion was defined and simulated via the RIVES algorithm, as given in (Gian et al., 2004); after a large number of runs and patterns, it was apparent that the learning neural system was able to capture these nonlinear relations.
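The pattern-generation step above can be sketched in code. In the chapter the feature and joint data come from the Robotics Toolbox and the EGT; here a fixed random nonlinear map is an assumed stand-in for the arm-camera kinematics, so only the shape of the dataset (24 feature-change inputs, 6 joint-change targets) reflects the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed stand-in for the arm/camera kinematics (in the chapter this
# role is played by the PUMA-560 model and the EGT camera projection).
A = rng.normal(0, 1, (24, 6))

def features_of(q):
    """Map 6 joint angles to 24 image-feature coordinates (8 points x 3)."""
    return np.tanh(A @ q)

def make_training_set(n_patterns=500, step=0.05):
    """Generate (feature-change, joint-change) pairs by executing small
    random joint displacements around random postures, as the chapter
    does around the desired object positions."""
    X, Y = [], []
    for _ in range(n_patterns):
        q = rng.uniform(-1, 1, 6)            # random arm posture
        dq = rng.uniform(-step, step, 6)     # small joint displacement
        X.append(features_of(q + dq) - features_of(q))  # feature change
        Y.append(dq)                         # joint change to be learned
    return np.array(X), np.array(Y)

X, Y = make_training_set()
# X: (500, 24) feature-change inputs; Y: (500, 6) joint-change targets
```

A network trained on such pairs learns the inverse differential mapping directly, which is what lets the controller avoid computing the feature Jacobian's inverse online.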

The execution phase
Execution starts by employing the learned neural system within the robot dynamic controller (which otherwise depends mainly on the visual feature Jacobian). In reference to Fig. (7), visual servoing dictates the visual feature extraction block; this was achieved using the Epipolar Geometry Toolbox. For assessing the proposed visual servo algorithm, the full arm dynamics were simulated using the kinematic and dynamic models of the PUMA-560 arm, for which the Robotics Toolbox was used. In this respect, Fig. (8) shows an aerial view of the actual object's initial posture and the desired posture, prior to visual servoing taking place; the figure also indicates some scene features. Over the simulation, Fig. (9) shows an aerial view of the robot arm-camera servoing.

Conclusions
Servoing a robot arm toward a moving object using visual information is a research topic that has been presented and discussed by a number of researchers over the last twenty years. In this sense, the chapter has discussed a mechanism for learning the kinematic and feature-based Jacobian relations used in a robot arm visual servo system. The concept introduced in this chapter is based on the employment of an artificial neural network system, trained in such a way as to learn the "complicated kinematics" relating changes in the visual loop to the arm joint space. The visual-loop Jacobian depends heavily on the robot arm's 3-D posture and, in addition, on the features associated with the object under visual servo (to be tracked). Results have shown that the trained neural network can learn these complicated visual relations relating object movement to arm joint-space movement, and the proposed methodology achieved a great deal of accuracy. The methodology was applied to the well-known image-based visual servoing approach presented by RIVES, as documented in (Gian et al., 2004). Results indicate a close degree of accuracy between the already published "RIVES Algorithm" results and the newly proposed "ANN Visual Servo Algorithm". This indicates that ANN visual servoing, being based on a learned space mapping, can reduce the computation time.

Fig. 2. Camera geometrical representation in a 3-D space.

Fig. 5. Employed four-layer artificial neural system.
Fig. 6. Block diagram of the learning neural network.

The outputs of the neural net are obtained by transforming the activations of the hidden units using a second layer of processing units: for each output unit k, a linear combination of the hidden-unit outputs is first formed, as in Equ. (42). As shown in Fig. (5), each node resembles an actual biological neuron, being made of:
• Synapses: gaps between adjacent neurons across which chemical signals are transmitted (the input)
• Dendrites: receive synaptic contacts from other neurons
• Cell body / soma: the metabolic centre of the neuron (processing)
• Axon: the long narrow process that extends from the body (the output)
By emulation, ANN information transmission happens at the synapses, as shown in Fig. (5).
Since the object features described in the camera coordinate frame are a priori unknown, it is usual to replace them by the coordinates (u) and (v) of the projection of each feature point onto the image frame, as shown in Fig. (7).

Fig. 8. Top view: actual object position and desired position before the servoing.

Fig. 10. Resulting errors using the proposed ANN-based visual servo.

Fig. 13. ANN visual servo controller error approaching zero for different training visual servo target postures.