A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track desired reference trajectories. For the vehicle states, the positions of the vehicles can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, radial basis function (RBF) neural networks (NNs) are used to estimate the unknown dynamics of the vehicle online, and the NN weight convergence and estimation accuracy are guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without full knowledge of the vehicle state and vehicle model. In addition, every vehicle in the system is able to learn the unmodeled dynamics along the union of trajectories experienced by all vehicle agents, such that the learned knowledge can be re-used to follow any reference trajectory covered in the learning phase. The learning-based tracking convergence and consensus learning results, as well as the use of learned knowledge for tracking experienced trajectories, are shown using the Lyapunov method. Simulation results are given to show the effectiveness of this algorithm.
- cooperative control
- deterministic learning
- neural network
- multi-agent systems
- distributed adaptive learning and control
- unmanned ground vehicles
The two-wheel-driven, unicycle-type vehicle is one of the most common mobile robot platforms, and many research results have been published regarding this system [1, 2, 3, 4]. There are two major challenges in controlling this system: obtaining knowledge of all state variables, and accurately modeling the system. For the unicycle-type vehicle considered in this chapter, both the vehicle position and velocity are required for trajectory tracking control. The position of the vehicle can be obtained using cameras or GPS signals, while direct measurement of the vehicle velocity is difficult. State observers have been proposed to estimate the full state of the system using the measured signals [5, 6]; however, traditional observers require knowledge of the system model for accurate state estimation. High-gain observers have been proposed to estimate the unmeasured state variables in cases where the system model is not fully known to the observer, and the estimated states can be used for control purposes [7, 8, 9, 10]. In this chapter, we follow the standard high-gain observer design method to obtain an estimate of the vehicle velocity using the measured vehicle position.
For the second challenge, adaptive control has been introduced to deal with system uncertainties [11, 12], in which neural network (NN) based control is further able to handle nonlinear system uncertainties [13, 11]. Although tracking control can be achieved by NN-based adaptive control, traditional NN-based control methods fail to achieve parameter (NN weight) convergence. This shortcoming forces the controller to update the system parameters (NN weights) the entire time the controller is operating, which is time-consuming and computationally demanding. To overcome this deficiency, the deterministic learning (DL) method has been proposed to model the system uncertainties under a partial persistency of excitation (PE) condition. More specifically, it has been shown that the system uncertainties can be accurately modeled with a sufficiently large number of radial basis function (RBF) NNs, and the local NN weights updated online by DL will converge to their optimal values, provided that the input signal of the RBFNNs is recurrent.
Since the RBFNN estimate is accurate only locally, around the recurrent trajectory, this becomes a disadvantage when there are multiple tracking tasks. The learned knowledge of the system uncertainties, represented by the RBFNNs, cannot be directly applied to a different control task, and a large number of different tasks would require a significant amount of storage space. In recent years, distributed control has become a rising topic in the control of multiple coordinated agents [15, 16, 17, 18, 19, 20]. In this chapter, we take the idea of communication within the multi-agent system (MAS) and apply it to DL, such that in the learning phase, any vehicle in the MAS is able to learn the unmodeled dynamics not only along its own trajectory, but along the trajectories of all other vehicle agents in the MAS as well. In other words, the NN weights of every vehicle in the MAS will converge to a common constant, which represents the unmodeled dynamics along the union trajectory of all vehicles, and any vehicle in the MAS is able to use this knowledge to achieve trajectory tracking for any control task covered in the learning phase.
The main contributions of this chapter are summarized as follows.
A high-gain observer is introduced to estimate the vehicle velocities using the measurement of vehicle position.
An observer and RBFNN-based adaptive learning control algorithm is developed for a multi-vehicle system, such that each vehicle agent will be able to follow the desired reference trajectory.
An online cooperative adaptive NN learning law is proposed, such that the RBFNN weights of all vehicle agents will converge to one common value, which represents the unmodeled dynamics of the vehicle along the union of trajectories experienced by all vehicle agents.
An observer and experience-based controller is developed using the common NN model obtained from the learning phase, such that vehicles are able to follow the reference trajectory experienced by any vehicle before with improved control performance.
In the following sections, we briefly describe some preliminaries on graph theory and the RBFNN-based DL method, then present the vehicle dynamics and the problem statement, all in Section 2. The main results of this chapter, including the high-gain observer design, CDL-based trajectory tracking control, accurate cooperative learning using RBF NNs, and experience-based trajectory tracking control, are provided in Section 3. Simulation results of an example with four vehicles running three different tasks are provided in Section 4. The conclusions are drawn in Section 5.
Notations. , and denote, respectively, the set of real numbers, the set of positive real numbers, and the set of positive integers; denotes the set of real matrices; denotes the set of real column vectors; denotes the identity matrix; denotes the zero matrix with dimension ; subscript denotes the column vector of a matrix; is the absolute value of a real number, and is the 2-norm of a vector or a matrix, i.e., ; denotes the total derivative of with respect to time; denotes the Jacobian matrix as .
2. Preliminaries and problem statement
2.1 Graph theory
In a graph defined as , the elements of are called vertices, the elements of are pairs with called edges, and the matrix is called the adjacency matrix. If , then agent is able to receive information from agent , and agents and are called adjacent. The adjacency matrix is thus defined as , in which if and only if , and otherwise. If there exists a path between any two nodes , then the graph is called connected. Furthermore, the graph is called fixed if and do not change over time, and called undirected if, for every pair , the pair is also in . According to , the Laplacian matrix associated with the undirected graph is defined by , in which . If the graph is connected, then is a positive semi-definite symmetric matrix, with one zero eigenvalue and all other eigenvalues positive; hence, .
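These Laplacian properties are easy to check numerically. The sketch below builds the Laplacian of a small undirected graph and verifies that it has exactly one zero eigenvalue when the graph is connected; the four-vertex path topology is an assumption chosen purely for illustration.

```python
import numpy as np

# Hypothetical 4-vehicle undirected communication graph (a path 1-2-3-4);
# the specific topology is an illustrative assumption.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian L = D - A

eigvals = np.sort(np.linalg.eigvalsh(L))
# For a connected undirected graph: exactly one zero eigenvalue,
# all other eigenvalues strictly positive.
assert abs(eigvals[0]) < 1e-10 and eigvals[1] > 0
```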
2.2 Localized RBF neural networks and deterministic learning
The RBF networks can be described by , where is the input vector, is the weight vector, is the NN node number, and , with being a radial basis function, and being distinct points in state space. The Gaussian function is one of the most commonly used radial basis functions, where is the center of the receptive field and is the width of the receptive field. The Gaussian function belongs to the class of localized RBFs in the sense that as . It is easily seen that is bounded and there exists a real constant such that .
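As a minimal illustration of the localization property, the sketch below evaluates a one-dimensional Gaussian RBF regressor; the lattice of centers and the width are illustrative assumptions, not the values used later in this chapter.

```python
import numpy as np

# Minimal Gaussian RBF network sketch; the 1-D input, lattice centers,
# and width eta = 0.5 are illustrative assumptions.
centers = np.linspace(-2.0, 2.0, 9)      # lattice of RBF centers
eta = 0.5                                 # receptive-field width

def s(x):
    """Regressor vector S(x) of Gaussian RBFs."""
    return np.exp(-(x - centers) ** 2 / eta ** 2)

def rbf_net(w, x):
    """Network output W^T S(x)."""
    return w @ s(x)

# Localization: an RBF whose center is far from the input contributes
# almost nothing, while the RBF centered at the input peaks at 1.
assert s(0.0)[0] < 1e-6               # center at -2.0, input at 0.0
assert abs(s(0.0)[4] - 1.0) < 1e-12   # center at 0.0
```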
It has been shown in [22, 23] that for any continuous function where is a compact set, and for the NN approximator, where the node number is sufficiently large, there exists an ideal constant weight vector , such that for any , , where is the ideal approximation error. The ideal weight vector is an “artificial” quantity required for analysis, and is defined as the value of that minimizes for all , i.e., . Moreover, based on the localization property of RBF NNs , for any bounded trajectory within the compact set , can be approximated by using a limited number of neurons located in a local region along the trajectory: , where is the approximation error, with , , , , and the integers are defined by (is a small positive constant) for some .
It is shown in  that for a localized RBF network whose centers are placed on a regular lattice, almost any recurrent trajectory (see  for detailed definition of “recurrent” trajectories) can lead to the satisfaction of the PE condition of the regressor subvector . This result is recalled in the following Lemma.
Lemma 1 [14, 24]. Consider any recurrent trajectory that remains in a bounded compact set ; then for an RBF network with centers placed on a regular lattice (large enough to cover the compact set ), the regressor subvector consisting of RBFs with centers located in a small neighborhood of the trajectory is persistently exciting.
2.3 Vehicle model and problem statement
As shown in Figure 1, this unicycle-type vehicle is a nonholonomic system, with the constraint force preventing the vehicle from sliding along the axis of the actuated wheels. The nonholonomic constraint can be expressed as follows
in which , and is the vector of generalized coordinates of the vehicle (, with being the number of vehicles in the MAS). () and denote the position and orientation of the vehicle with respect to the ground coordinate frame, respectively.
With this constraint, the degrees of freedom of the system are reduced to two. Independently driven by the two actuated wheels on each side of the vehicle, the no-slip kinematics of the vehicle is
where and are the linear and angular velocities measured at the center between the driving wheels, respectively. The dynamics of the vehicle can be described by .
in which is a positive definite matrix that denotes the inertia, is the centripetal and Coriolis matrix, is the friction vector, and is the gravity vector. is the vector of system inputs, i.e., the torques applied to the driving wheels, and is the input transformation matrix, projecting the system input onto the space spanned by , in which is the distance between the two actuated wheels, and is the radius of the wheel. is a Lagrange multiplier, and denotes the constraint force.
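As a quick numerical sanity check on the kinematic model in Eq. (2), the sketch below integrates the standard no-slip unicycle kinematics with a forward Euler step; the particular inputs and step size are illustrative assumptions.

```python
import math

def unicycle_step(x, y, theta, v, omega, dt):
    """One Euler step of the no-slip unicycle kinematics:
       x_dot = v*cos(theta), y_dot = v*sin(theta), theta_dot = omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

# Drive straight along the x-axis: y and theta stay at zero,
# and x advances by v * t = 1.0 m after 1 s.
x, y, th = 0.0, 0.0, 0.0
for _ in range(100):
    x, y, th = unicycle_step(x, y, th, v=1.0, omega=0.0, dt=0.01)
assert abs(x - 1.0) < 1e-9 and y == 0.0 and th == 0.0
```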
Matrices and in Eq. (3) can be derived using the Lagrangian equation with the following steps. First, we calculate the kinetic energy of the vehicle agent
where is the mass of the vehicle, is the moment of inertia measured at the center of mass, , , and are the position and orientation of the vehicle at the center of mass, respectively. The following relation can be obtained from Figure 1 :
Then Eq. (4) can be rewritten into
in which is the Lagrangian of the vehicle, is the potential energy of the vehicle agent, is the Lagrangian multiplier, and is the constraint force. denotes the external force, where is the force generated by the actuator, and is the friction on the actuator. Then Eq. (7) can be rewritten into
By setting , , and , Eq. (8) can thereby be transformed into Eq. (3). Notice that the form of is not unique; however, with a proper definition of the matrix , will be skew-symmetric. The entry of is defined as follows
where is the entry of , and is defined using the Christoffel symbols of the first kind. Then the centripetal and Coriolis matrix is calculated as . Since the vehicle is operating on the ground, the gravity vector is equal to zero. The friction vector is assumed to be a nonlinear function of the generalized velocity , and is unknown to the controller.
To eliminate the nonholonomic constraint force from Eq. (3), we left-multiply the equation by , which yields:
The degrees of freedom of the vehicle dynamics are now reduced to two. Since is of full rank, for any transformed torque input there exists a unique corresponding actual torque input applied to each wheel.
The main challenge for controlling the system includes (i) the direct measurement of the linear and angular velocities is not feasible, and (ii) system parameter matrices and are unknown to the controller.
Based on the above system setup, we are ready to formulate the objective of this chapter. Consider a group of homogeneous unicycle-type vehicles, where the kinematics and dynamics of each vehicle agent are described by Eqs. (2) and (11), respectively. The communication graph of these vehicles is denoted as . Regarding this MAS, we have the following assumption.
Assumption 1. The graph is undirected and connected.
The objective of this chapter is to design an output-feedback adaptive learning control law for each vehicle agent in the MAS, such that
State estimation: The immeasurable general velocities can be estimated by a high-gain observer using the measurement of the general coordinates .
Trajectory tracking: Each vehicle in the MAS will track its desired reference trajectory, which will be quantified by ; i.e., , , .
Cooperative Learning: The unknown homogeneous dynamics of all the vehicles can be locally accurately identified along the union of the trajectories experienced by all vehicle agents in the MAS.
Experience based control: The identified/learned knowledge from the cooperative learning phase can be re-utilized by each local vehicle to perform stable trajectory tracking with improved control performance.
In order to apply the deterministic learning theory, we have the following assumption on the reference trajectories.
Assumption 2. The reference trajectories , , for all are recurrent.
3. Main results
3.1 High-gain observer design
In mobile robotics control, the position of the vehicle can be easily obtained in real time using GPS signals or camera positioning, while direct measurement of the velocities is much more difficult. For control and system estimation purposes, the velocities of the vehicle are required by the controller. To this end, we follow the high-gain observer design method in [8, 9] and introduce a high-gain observer to estimate the velocities using the robot positions. First, we define two new variables as follows
Notice that the operation above can be considered as projecting the vehicle position onto a frame whose origin is fixed to the origin of the ground coordinates and whose axes are parallel to the body-fixed frame of the vehicle. The coordinates of the vehicle in this rotating frame are , and hence and can be calculated from the measurements of the position and the orientation. The rotation rate of this frame equals the angular velocity of the vehicle . Based on this, we design the high-gain observer for as
in which is a small positive scalar to be designed, and and are parameters to be chosen such that is Hurwitz stable. The time derivative of the coordinates defined in Eq. (12) is given by and , so we design the high-gain observer for as
To prevent peaking while using this high-gain observer, and in turn to improve the transient response, the parameter cannot be too small. Due to the use of a globally bounded control, decreasing does not induce the peaking phenomenon in the state variables of the system, while the ability to decrease is limited by practical factors such as measurement noise and sampling rates [7, 27]. According to , it is easy to show that the estimation error between the actual and estimated velocities of the vehicle converges to zero; the detailed proof is omitted here due to space limitations.
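To make the structure concrete, the sketch below simulates a standard two-state high-gain observer that reconstructs a velocity from position measurements alone; the measured signal, the gains, and the value of the small parameter are illustrative assumptions and not this chapter's design values.

```python
import math

# Sketch of a standard two-state high-gain observer estimating velocity
# from position measurements; gains a1, a2 and epsilon are illustrative.
a1, a2 = 2.0, 1.0      # chosen so s^2 + a1*s + a2 is Hurwitz
eps = 0.01             # small high-gain parameter
dt = 1e-4

def observer_step(z1, z2, y):
    """One Euler step: z1 estimates the position y, z2 the velocity."""
    e = y - z1
    z1 += dt * (z2 + (a1 / eps) * e)
    z2 += dt * (a2 / eps ** 2) * e
    return z1, z2

# True measured signal: y = sin(t), so the true velocity is cos(t).
z1, z2, t = 0.0, 0.0, 0.0
for _ in range(20000):          # simulate 2 s
    t += dt
    z1, z2 = observer_step(z1, z2, math.sin(t))
assert abs(z2 - math.cos(t)) < 0.05   # velocity estimate has converged
```

Decreasing `eps` tightens the estimation error bound, but (as noted above) measurement noise and the sampling rate limit how small it can be made in practice.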
3.2 Controller design and tracking convergence analysis
After obtaining the linear and angular velocities from the high-gain observer, we now proceed to the trajectory tracking. First, we define the tracking error by projecting onto the body coordinate of the vehicle, with the axis set to be the front and to be the left of the vehicle, as shown in Figure 2 .
where and are the linear and angular velocities of the vehicle, respectively.
in which , , and are all positive constants. It can be shown that this virtual velocity controller is able to stabilize the closed-loop system Eq. (16) kinematically by replacing and with and , respectively. To this end, we define the following Lyapunov function for the vehicle
and the derivative of is
Since is negative semi-definite, we can conclude that this closed-loop system is stable, i.e., the tracking error of the vehicle will be bounded.
Remark 1. In addition to the stability conclusion above, we can also conclude asymptotic stability by finding the invariant set of . By setting , we have and . Applying this result to Eqs. (16) and (17), we have the invariant set equal to . Under Assumption 2, the velocity of the reference cannot be constant over time, so the only invariant subset of is the origin . Therefore, we can conclude that the closed-loop system Eqs. (16) and (17) is asymptotically stable [29].
With the idea of backstepping control, we then derive the transformed torque input for the vehicle with the following steps. By defining the error between the virtual controller and the actual velocity as , we can rewrite Eq. (16) in terms of and as
Then we define a new Lyapunov function for the closed-loop system Eq. (20), whose derivative can be written as
To make the system stable, the term needs to be negative definite. From the definition of and Eq. (11), we have
Motivated by the results of , it is easy to show that this term is negative definite if is designed as
where is a positive constant. Since the actual linear and angular velocities of the vehicle are unknown, we use and generated by the high-gain observer Eqs. (13) and (14) to replace and in Eq. (23). From the discussion in the previous subsection, the convergence of the velocity estimates is guaranteed.
In Eq. (23), and are unknown to the controller. To overcome this issue, RBFNN will be used to approximate this nonlinear uncertain term, i.e.,
in which is the vector of RBFs, with the variable (RBFNN input) , is the common ideal estimation weight of this RBFNN, and is the ideal estimation error, which can be made arbitrarily small given a sufficiently large number of neurons. Consequently, we propose the implementable controller for the vehicle as follows
For the NN weights used in Eq. (25), we propose an online NN weight updating law as follows
where , , and are positive constants.
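The consensus coupling in the updating law (26) is what drives all agents' weights toward a common value. The sketch below isolates that mechanism: each agent's scalar weight is updated only by the Laplacian coupling term, and the weights agree asymptotically. The three-agent path graph and gain are illustrative assumptions, and the tracking-error and leakage terms of the full law are deliberately omitted to expose the consensus part alone.

```python
import numpy as np

# Three-agent undirected path graph and its Laplacian (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
beta, dt = 1.0, 0.01          # consensus gain and Euler step (assumed)

W = np.array([[1.0], [0.0], [-1.0]])   # initial NN weights, one per agent
for _ in range(2000):
    W += dt * (-beta * (L @ W))        # consensus part of the update law

# All agents' weights agree (here, on the average of the initial values,
# since the graph is undirected and the coupling is symmetric).
assert np.all(np.abs(W - W.mean()) < 1e-6)
```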
Theorem 1. Consider the closed-loop system consisting of the vehicles in the MAS described by Eqs. (2) and (11), the reference trajectories , the high-gain observer Eqs. (13) and (14), the adaptive NN controller Eq. (25) with the virtual velocity Eq. (17), and the online weight updating law (26). Under Assumptions 1 and 2, for any bounded initial conditions of all the vehicles and , the tracking error converges asymptotically to a small neighborhood around zero for all vehicle agents in the MAS.
where and . Notice that the convergence of to is guaranteed by the high-gain observer. Then we derive the error dynamics of the NN weights as follows
whose derivative is equal to
where is the Laplacian matrix of , and . Since and are all positive, and is positive semi-definite, we have . Notice that the estimation error can be made arbitrarily small with a sufficiently large number of neurons, and is the leakage term chosen as a small positive constant. Therefore, we can conclude that the closed-loop system Eqs. (20), (27), and (28) is stable, i.e., , if the following condition holds
Hence, the closed-loop system is stable, and all tracking errors are bounded. Since all variables in Eq. (31) are continuous (i.e., is bounded), applying Barbalat's Lemma gives , which implies that the tracking errors of all agents converge to a small neighborhood of zero, whose size depends on the norm of . Q.E.D.
3.3 Consensus convergence of NN weights
In addition to the tracking convergence shown in the previous subsection, we show in this subsection that every vehicle in the system is able to learn the unknown vehicle dynamics along the union trajectory (denoted as ) experienced by all vehicles.
where , and
As shown in Theorem 1, the tracking error converges to a small neighborhood of zero for all vehicle agents in the MAS. Furthermore, the ideal estimation errors and can be made arbitrarily small given a sufficient number of RBF neurons, and is chosen to be a small positive constant; therefore, we can conclude that the norm of in Eq. (33) is small. In the following theorem, we show that converges to a small neighborhood of the common ideal weight for all under Assumptions 1 and 2.
Before proceeding further, we denote the system trajectory of the vehicle as for all . Using the same notation from , and represent the parts of related to the region close to and away from the trajectory , respectively.
Theorem 2. Consider the error dynamics Eq. (33) under Assumptions 1 and 2. For any bounded initial conditions of all the vehicles and , along the union of the system trajectories , all local estimated neural weights used in Eqs. (25) and (26) converge to a small neighborhood of their common ideal value , and locally accurate identification of the nonlinear uncertain dynamics can be obtained by as well as for all , where
with () being a time segment after the transient period of tracking control.
Proof: According to , if the nominal part of the closed-loop system shown in Eq. (33) is uniformly locally exponentially stable (ULES), then , , , and will converge to a small neighborhood of the origin, whose size depends on the value of .
Now the problem boils down to proving ULES of the nominal part of system Eq. (33). To this end, we resort to the results of Lemma 4 in . It states that if Assumptions 1 and 2 therein are satisfied, and the associated vector is PE for all , then the nominal part of Eq. (33) is ULES. Assumption 1 therein is automatically verified since is bounded, and Assumption 2 therein also holds if we set the counterparts and . Furthermore, the PE condition of is also met if of the learning task is recurrent, which is guaranteed by Assumption 2 and the results of Theorem 1. Therefore, we conclude that , , , and converge to a small neighborhood of the origin, whose size depends on the small value of .
Similar to , the convergence of to a small neighborhood of implies that for all , we have
where is a subvector of and is the error using as the system approximation. After the transient process, is small for all .
On the other hand, due to the localization property of Gaussian RBFs, both and are very small. Hence, along the union trajectory , the entire constant RBF network can be used to approximate the nonlinear uncertain dynamics, demonstrated by the following equivalent equations
where and are all small for all . Therefore, the conclusion of Theorem 2 can be drawn. Q.E.D.
3.4 Experience-based trajectory tracking control
In this section, based on the learning results from the previous subsections, we further propose an experience-based trajectory tracking control method, such that the experience-based controller is able to drive each vehicle to follow any reference trajectory experienced by any vehicle in the learning stage.
To this end, we replace the NN weight in Eq. (25) by the converged constant NN weight for the vehicle. Therefore, the experience-based controller for the vehicle is constructed as follows
in which is the derivative of the virtual velocity controller from Eq. (17), and is obtained from Eq. (34) for the vehicle. The system model Eqs. (2) and (11), and the high-gain observer design Eqs. (14) and (13) remain unchanged.
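The key point of Eq. (38) is that the NN weights are frozen at their converged value, so no online adaptation is needed in the experience phase. The sketch below illustrates this reuse on a one-dimensional stand-in problem: a least-squares fit plays the role of the CDL-converged weights, and the frozen network then approximates the "unknown" dynamics accurately inside the learned region. The function, the basis layout, and the fitting procedure are all illustrative assumptions, not the chapter's actual learning law.

```python
import numpy as np

# Frozen-weight reuse sketch; the 1-D "unknown dynamics" f, the lattice
# of centers, and the width eta are illustrative assumptions.
centers = np.linspace(-2.0, 2.0, 21)
eta = 0.4

def S(x):
    """Gaussian RBF regressor vector."""
    return np.exp(-(x - centers) ** 2 / eta ** 2)

f = np.sin                                 # stand-in for unknown dynamics

# A least-squares fit stands in for the weights learned during CDL.
X = np.linspace(-1.5, 1.5, 200)
Phi = np.stack([S(x) for x in X])
W_bar, *_ = np.linalg.lstsq(Phi, f(X), rcond=None)

# In the experience phase W_bar is frozen: the network is evaluated,
# never updated, and remains accurate inside the learned region.
errs = [abs(W_bar @ S(x) - f(x)) for x in np.linspace(-1.0, 1.0, 50)]
assert max(errs) < 1e-3
```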
Theorem 3. Consider the closed-loop system consisting of Eqs. (2) and (11) , reference trajectory , high-gain observer Eqs. (14) and (13) , and the experience-based controller Eq. (38) with virtual velocity Eq. (17) . For any bounded initial condition, the tracking error converges asymptotically to a small neighborhood around zero.
Proof: Similar to the proof of Theorem 1, by defining and to be the error between the position and velocity of the vehicle and its associated reference trajectory, we have the error dynamics of the vehicle as
With the same high-gain observer design used in the learning-based tracking, the convergence of to is also guaranteed. For the closed-loop system shown above, we can build a positive definite function as
and the derivative of is
where . Following arguments similar to those in the proof of Theorem 1, given positive , , , and , we can conclude that the Lyapunov function is positive definite and is negative semi-definite in the region . As in the proof of Theorem 1, it can be shown with Barbalat's Lemma that , and the tracking errors converge to a small neighborhood of zero. Q.E.D.
4. Simulation studies
Consider four identical vehicles, whose unknown friction vector is assumed to be a nonlinear function of and as follows , and since we assume the vehicles are operating on the horizontal plane, the gravitational vector is equal to zero. The physical parameters of the vehicles are given as , ; , . The reference trajectories of the three vehicles are given by
and for all vehicles, the orientations of reference trajectories and vehicle velocities satisfy the following equations
The parameters of the observer Eqs. (13) and (14) are given as , and . The parameters of the controller Eq. (25) with Eq. (17) are given as , and . The parameters of Eq. (26) are given as , , and . For each , since , we construct the Gaussian RBFNN using neuron nodes with the centers evenly placed over the state space and the standard deviation of the Gaussian function equal to . The initial positions of the vehicles are set at the origin, with the velocities set to zero, and the initial weights of the RBFNNs are also set to zero. The connections between the three vehicles are shown in Figure 3, and the Laplacian matrix associated with the graph is
Simulation results are shown as follows. Figure 4a shows that the observer error converges to a close neighborhood of zero in a very short time, Figure 4b shows that all tracking errors and converge to zero, and Figures 5a–f show that each vehicle (blue triangles) tracks its own reference trajectory (red solid circles) in the 2-D frame. Figure 6b shows that the NN weights of all vehicle agents converge to the same constant, and Figure 6a shows that the RBFNNs of all three vehicles are able to accurately estimate the unknown dynamics, with the estimation errors converging to a small neighborhood of zero.
We now demonstrate the result of Theorem 3, which states that after the learning process, each vehicle is able to use the learned knowledge to follow any reference trajectory experienced by any vehicle in the learning stage. In this part of the simulation, the experience-based controller Eq. (38) is implemented with the same parameters as those of the previous subsection, such that vehicle 1 follows the reference trajectory of vehicle 3, vehicle 2 follows the reference trajectory of vehicle 1, and vehicle 3 follows the reference trajectory of vehicle 2. The initial positions of the vehicles are set at the origin, with all velocities equal to zero.
Simulation results are shown as follows. Figure 7a shows that the observer error converges to a close neighborhood of zero in a very short time. Figures 8a–c show that each vehicle (blue triangles) tracks its assigned reference trajectory (red solid circles), and Figure 7b shows that all tracking errors and converge to zero.
In this chapter, a high-gain observer-based CDL control algorithm has been proposed to estimate the unmodeled nonlinear dynamics of a group of homogeneous unicycle-type vehicles while tracking their reference trajectories. It has been shown that state estimation, trajectory tracking, and consensus learning are all achieved using the proposed algorithm. More specifically, any vehicle in the system is able to learn the unmodeled dynamics along the union of trajectories experienced by all vehicles, with the state variables provided by measurements and observer estimates. In addition, we have shown that with the converged NN weights, this knowledge can be applied to a vehicle to track any experienced trajectory with reduced computational complexity. Simulation results have been provided to demonstrate the effectiveness of the proposed algorithm.