Open access peer-reviewed chapter

# Cooperative Adaptive Learning Control for a Group of Nonholonomic UGVs by Output Feedback

By Xiaonan Dong, Paolo Stegagno, Chengzhi Yuan and Wei Zeng

Submitted: February 12th 2019Reviewed: May 24th 2019Published: April 22nd 2020

DOI: 10.5772/intechopen.87038

## Abstract

A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track over desired reference trajectories. For the vehicle states, the positions of the vehicles can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, the radial basis function (RBF) neural network (NN) is used to online estimate the unknown dynamics of the vehicle, and the NN weight convergence and estimation accuracy is guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without the full knowledge of the vehicle state and vehicle model. In addition, any vehicle in the system is able to learn the knowledge of unmodeled dynamics along the union of trajectories experienced by all vehicle agents, such that the learned knowledge can be re-used to follow any reference trajectory defined in the learning phase. The learning-based tracking convergence and consensus learning results, as well as using learned knowledge for tracking experienced trajectories, are shown using the Lyapunov method. Simulation is given to show the effectiveness of this algorithm.

### Keywords

• cooperative control
• deterministic learning
• neural network
• multi-agent systems
• distributed adaptive learning and control
• unmanned ground vehicles

## 1. Introduction

The two-wheel-driven, unicycle-type vehicle is one of the most common mobile robot platforms, and many research results have been published regarding this system [1, 2, 3, 4]. There are two major challenges for controlling this system: the knowledge of all state variables, and the actuate modeling of the system. For the unicycle-type vehicle that we use in this chapter, the vehicle position and velocity are both required for the trajectory tracking control. The position of the vehicle can be obtained using cameras or GPS signals, while direct measurement of the vehicle velocity is difficult. State observer has been proposed to estimate the full state of the system using the measured signals [5, 6], however, traditional observers require the knowledge of the system model for accurate state estimations. High-gain observer has been proposed to estimate the unmeasured state variables in case that the system model is not fully known to the observer, and the estimated states can be used for control purposes [7, 8, 9, 10]. In this chapter, we follow the standard high-gain observer design method [8] to obtain the estimation of vehicle velocity using the measured vehicle position.

For the second challenge, adaptive control has been introduced to deal with system uncertainties [11, 12], in which neural network (NN) based control is able to further deal with nonlinear system uncertainties [13, 11]. Though tracking control can be achieved by NN-based adaptive control, however, traditional NN-based control methods failed to achieve parameter (NN weight) convergence. This shortage requires the controller to update the system parameter (NN weight) all the time when the controller is operating, which is time consuming and computational demanding. To overcome this deficiency, a deterministic learning (DL) method has been proposed to model the system uncertainties under the partial persistency of excitation (PE) condition [14]. To be more specific, it has been shown that the system uncertainties can be accurately modeled with a sufficient large number of radial basis function (RBF) NNs, and local NN weights online updated by DL will converge to their optimal values, provided that the input signal of the RBFNNs is recurrent.

Since the RBFNN estimation is locally accurate around the recurrent trajectory, this becomes a disadvantage when there exists multiple tracking tasks. The learned knowledge of the system uncertainties, presented by the RBFNNs, cannot be directly applied on a different control task, and it will need a significant amount of storage space for a large number of different tasks. In recent years, distributed control is a rising topic regarding the control of multiple coordinated agents [15, 16, 17, 18, 19, 20]. In this chapter, we took the idea of communicating inside the multi-agent system (MAS) and apply it on DL, such that in the learning phase, any vehicle in the MAS is able to learn the unmodeled dynamics not only along its own trajectory, but along the trajectories of all other vehicle agents in this MAS as well. In other words, the NN weight of any vehicle in this MAS will converge to a common constant, which presents the unmodeled dynamics along the union trajectory of all vehicles, and any vehicle in the MAS is able to use this knowledge to achieve trajectory tracking for any control task learned in the learning phase.

The main contributions of this chapter are summarized as follows.

1. A high-gain observer is introduced to estimate the vehicle velocities using the measurement of vehicle position.

2. An observer and RBFNN-based adaptive learning control algorithm is developed for a multi-vehicle system, such that each vehicle agent will be able to follow the desired reference trajectory.

3. An online cooperative adaptive NN learning law is proposed, such that the RBFNN weight of all vehicle agents will converge to one common value, which represents the unmodeled dynamics of the vehicle along the union trajectories experienced by all vehicle agents.

4. An observer and experience-based controller is developed using the common NN model obtained from the learning phase, such that vehicles are able to follow the reference trajectory experienced by any vehicle before with improved control performance.

In the following sections, we briefly describe some preliminaries on graph theory and RBFNNs based DL method, then present the vehicle dynamics and the problem statement, all in Section 2. The main results of this chapter, including the high-gain observer design, CDL-based trajectory tracking control, accurate cooperative learning using RBF NNs, and experience-based trajectory tracking control, are provided in Section 3, respectively. Simulation results of an example with four vehicles running three different tasks are provided in Section 4. The conclusions are drawn in Section 5.

Notations. R, R+and +denote, respectively, the set of real numbers, the set of positive real numbers and the set of positive integers; Rm×ndenotes the set of m×nreal matrices; Rndenotes the set of n×1real column vectors; Indenotes the n×nidentity matrix; Om×ndenotes the zero matrix with dimension of m×n; Subscript kdenotes the kthcolumn vector of a matrix; is the absolute value of a real number, and is the 2-norm of a vector or a matrix, i.e., x=xTx12; żdenotes the total derivative of zwith respect to the time; /zdenotes the Jacobian matrix as z=z1zn.

## 2. Preliminaries and problem statement

### 2.1 Graph theory

In a graph defined as G=VεA, the elements of V=12nare called vertices, the elements of εare pairs ijwith i,jV,ijcalled edges, and the matrix Ais called the adjacency matrix. If ijε, then agent iis able to receive information from agent j, and agent iand jare called adjacent. The adjacency matrix is thus defined as A=aijn×n, in which aij>0if and only if ijε, and aij=0otherwise. For any two nodes vi,vjV, if there exists a path between them, then the graph Gis called connected. Furthermore, the graph Gis called fixed if εand Ado not change over time, and called undirected if ijε, pair jiis also in ε. According to [21], for the Laplacian matrix L=lijn×nassociated with the undirected graph G, in which lij=j=1,jinaiji=jaijij.If the graph is connected, then Lis a positive semi-definite symmetric matrix, with one zero eigenvalue and all other eigenvalues being positive and hence, rankLn1.

### 2.2 Localized RBF neural networks and deterministic learning

The RBF networks can be described by fnnZ=i=1NnwisiZ=WTSZ[22], where ZΩZRqis the input vector, W=w1wNnTRNnis the weight vector, Nnis the NN node number, and SZ=s1Zμ1sNnZμNnT, with sibeing a radial basis function, and μii=12Nnbeing distinct points in state space. The Gaussian function siZμi=expZμiTZμiσ2is one of the most commonly used radial basis functions, where μi=μi1μi2μiqTis the center of the receptive field and σiis the width of the receptive field. The Gaussian function belongs to the class of localized RBFs in the sense that siZμi0as Z. It is easily seen that SZis bounded and there exists a real constant SMR+such that SZSM[14].

It has been shown in [22, 23] that for any continuous function fZ:ΩZRwhere ΩZRqis a compact set, and for the NN approximator, where the node number Nnis sufficiently large, there exists an ideal constant weight vector W, such that for any ϵ>0, fZ=WTSZ+ϵ,ZΩZ, where ϵ<ϵis the ideal approximation error. The ideal weight vector Wis an “artificial” quantity required for analysis, and is defined as the value of Wthat minimizes ϵfor all ZΩZRq, i.e., WargminWRNnsupZΩZfZWTSZ. Moreover, based on the localization property of RBF NNs [14], for any bounded trajectory Ztwithin the compact set ΩZ, fZcan be approximated by using a limited number of neurons located in a local region along the trajectory: fZ=WζTSζZ+ϵζ, where ϵζis the approximation error, with ϵζ=Oϵ=Oϵ, SζZ=sj1ZsZTRNζ, Wζ=wj1wTRNζ, Nζ<Nn, and the integers ji=j1,,jζare defined by sjiZp>θ(θ>0is a small positive constant) for some ZpZk.

It is shown in [14] that for a localized RBF network WTSZwhose centers are placed on a regular lattice, almost any recurrent trajectory Zk(see [14] for detailed definition of “recurrent” trajectories) can lead to the satisfaction of the PE condition of the regressor subvector SζZ. This result is recalled in the following Lemma.

Lemma 1 [14, 24]. Consider any recurrent trajectory Zk:+Rq.Zkremains in a bounded compact set ΩZRq, then for RBF network WTSZwith centers placed on a regular lattice (large enough to cover compact set ΩZ), the regressor subvector SζZconsisting of RBFs with centers located in a small neighborhood of Zkis persistently exciting.

### 2.3 Vehicle model and problem statement

As shown in Figure 1 , this unicycle-type vehicle is a nonholonomic system, with the constraint force preventing the vehicle from sliding along the axis of the actuated wheels. The nonholonomic constraint can be presented as follows

ATqiq̇i=0E1

in which Aqi=sinθicosθi0T, and qi=xiyiθiTis the general coordinates of the ithvehicle (i=1,2,,n, with nbeing the number of vehicles in the MAS). (xi,yi) and θidenote the position and orientation of the vehicle with respect to the ground coordinate, respectively.

With this constraint, the degree of freedom of the system is reduced to two. Independently driven by the two actuated wheels on each side of the vehicle, the non-slippery kinematics of the ithvehicle is

q̇i=ẋiẏiθ̇i=cosθi0sinθi001viωi=defJqiuiE2

where viand ωiare the linear and angular velocities measured at the center between the driving wheels, respectively. The dynamics of the ithvehicle can be described by [25].

Mqiq¨i+Cqiq̇iq̇i+Fqiq̇i+Gqi=Bqiτi+Aqiλi,E3

in which MR3×3is a positive definite matrix that denotes the inertia, CR3×3is the centripetal and Coriolis matrix, FR3×1is the friction vector, GR3×1is the gravity vector. τiR2×1is a vector of system input, i.e., the torque applied on each driving wheel, B=1rcosθicosθisinθisinθiRRR3×2is the input transformation matrix, projecting the system input τonto the space spanned by xyθ, in which D=2Ris the distance between two actuation wheels, and ris the radius of the wheel. λiis a Lagrange multiplier, and AλiR3×1denotes the constraint force.

Matrices Mand Cin Eq. (3) can be derived using the Lagrangian equation with the follow steps. First we calculate the kinetic energy for the ithvehicle agent

Ti=mẋic2+ẏic22+Iθ̇ic22E4

where mis the mass of the vehicle, Iis the moment of inertia measured at the center of mass, xic, yic, and θicare the position and orientation of the vehicle at the center of mass, respectively. The following relation can be obtained from Figure 1 :

xic=xi+dcosθiyic=yi+dsinθiθic=θi,ẋic=ẋidθ̇sinθiẏic=ẏi+dθ̇cosθiθ̇ic==θ̇iE5

Then Eq. (4) can be rewritten into

Tqiq̇i=mẋidθ̇sinθi2+ẏi+dθ̇cosθi22+Iθ̇i22=12mẋi2+mẏi2+md2+Iθ̇22mdsinθẋiθ̇i+2mdcosθẏiθ̇i=q̇iTMqiq̇i2E6

in which M=m0mdsinθi0mmdcosθimdsinθimdcosθimd2+I. It will be shown later that the inertia matrix Mshown above is identical to that in Eq. (3). Then the dynamics equation of the system is given by the following Lagrangian equation [26],

ddtLq̇iTLqiT=Aqiλi+QiE7

in which Lqiq̇i=Tqiq̇iUqiis the Lagrangian of the ithvehicle, Uqiis the potential energy of the vehicle agent, λRk×1is the Lagrangian multiplier, and ATλis the constraint force. Qi=Bqiτifuidenotes the external force, where τiis the force generated by the actuator, and fuiis the friction on the actuator. Then Eq. (7) can be rewritten into

Mqiq¨i+Ṁq̇iTiqiT+UiqiT+Bqifq̇i=Aqiλi+BqiτiE8

By setting Cqiq̇iq̇i=Ṁq̇iTiqiT, Fqiq̇i=Bqifq̇i, and Gqi=UiqiT, Eq. (8) can be thereby transferred into Eq. (3). Notice that the form of Cn×nis not unique, however, with a proper definition of the matrix C, we will have Ṁ2Cto be skew-symmetric. The ijthentry of Cis defined as follows [26].

cij=k=1ncijkq̇kE9

where q̇kis the kthentry of q̇, and cijk=12mijqk+mikqjmjkqiis defined using the Christoffel symbols of the first kind. Then we have the centripetal and Coriolis matrix calculated as C=00mdθ̇icosθi00mdθ̇isinθi000. Since the vehicle is operating on the ground, the gravity vector Gis equal to zero. The friction vector Fis assumed to be a nonlinear function of the general velocity ui, and is unknown to the controller.

To eliminate the nonholonomic constraint force Aqiλifrom Eq. (3), we left multiplying JTqito the equation, it yields:

JTMJu̇i+JTMJ̇+CJui+JTF+JTG=JTBτi+JTAλiE10

From Eqs. (1) and (2), we have JTA=02×1, then the dynamic equation of uiis simplified as

M¯qiu̇i+C¯uiui+F¯ui+G¯qi=τ¯i,E11

where

M¯=JTMJ=m00md2+I,C¯=JTMJ̇+CJ=0mdθ̇imdθ̇i0,F¯=JTF,G¯=JTG=02×1,τ¯i=τ¯viτ¯ωi=JTBτi=1/r1/rR/rR/rτi.

The degree of freedom of the vehicle dynamics is now reduced to two. Since JTBis of full rank, then for any transformed torque input τ¯i, there exists a unique corresponding actual torque input τiR2that applied on each wheel.

The main challenge for controlling the system includes (i) the direct measurement of the linear and angular velocities is not feasible, and (ii) system parameter matrices C¯and F¯are unknown to the controller.

Based on the above system setup, we are ready to formulate our objective of this chapter. Consider a group of nhomogeneous unicycle-type vehicles, the kinematics and dynamics of each vehicle agent are described by Eqs. (2) and (11), respectively. The communication graph of such nvehicles is denoted as G. Regarding this MAS, we have the following assumption.

Assumption 1. The graph Gis undirected and connected.

The objective of this chapter is to design an output-feedback adaptive learning control law for each vehicle agent in the MAS, such that

1. State estimation: The immeasurable general velocities ui=viωiTcan be estimated by a high-gain observer using the measurement of the general coordinates qi=xiyiθiT.

2. Trajectory tracking: Each vehicle in the MAS will track its desired reference trajectory, which will be quantified by xrityritθrit; i.e., limtxitxrit=0, limtyityrit=0, limtθitθrit=0.

3. Cooperative Learning: The unknown homogeneous dynamics of all the vehicles can be locally accurately identified along the union of the trajectories experienced by all vehicle agents in the MAS.

4. Experience based control: The identified/learned knowledge from the cooperative learning phase can be re-utilized by each local vehicle to perform stable trajectory tracking with improved control performance.

In order to apply the deterministic learning theory, we have the following assumption on the reference trajectories.

Assumption 2. The reference trajectories xrit,yrit,θritfor all i=1,,nare recurrent.

## 3. Main results

### 3.1 High-gain observer design

In mobile robotics control, the position of the vehicle can be easily obtained in real time using GPS signals or camera positioning, while the direct measurement of the velocities is much more difficult. For the control and system estimation purposes, the velocities of the vehicle are required for the controller. To this end, we follow the high-gain observer design method in [8, 9], and introduce a high-gain observer to estimate the velocities using robot positions. First, we define two new variables as follows

pxi=xicosθi+yisinθipyi=yicosθixisinθiE12

Notice that the operation above can be considered as a projecting the vehicle position onto the a frame whose origin is fixed to the origin of ground coordinates, and the axes are parallel to the body-fixed frame of the vehicle. The coordinates of the vehicle in this rotational frame is pxipyiand hence, pxiand pyican be calculated based on the measurement of the position and the orientation. The rotation rate of this frame equals to the angular velocity of the vehicle θ̇i=ωi. Based on this, we design the high-gain observer for ωas

θ̂̇i=ω̂i+l1δθiθ̂iω̂̇i=l2δ2θiθ̂iE13

in which δis a small positive scalar to be designed, and l1and l2are parameters to be chosen, such that l11l20is Hurwitz stable. The time derivative of this coordinates defined in Eq. (12) is given by ṗxi=vi+pyiωi, and ṗyi=pxiωi, then we design the high-gain observer for vas

p̂̇xi=v̂i+pyiω̂i+l1δpxip̂xiv̂̇i=l2δ2pxip̂xiE14

To prevent peaking while using this high-gain observer and in turn improving the transient response, parameter δcannot be too small [9]. Due to the use of a globally bounded control, decreasing δdoes not induce peaking phenomenon of the state variables of the system, while the ability to decrease δwill be limited by practical factors such as measurement noise and sampling rates [7, 27]. According to [8], it is easy to show that the estimation error between the actual and estimated velocities of the ithvehicle zi=uiûiwill converge to zero, detailed proof is omitted here due to space limitation.

### 3.2 Controller design and tracking convergence analysis

After obtaining the linear and angular velocities from the high-gain observer, we now proceed to the trajectory tracking. First, we define the tracking error q˜iby projecting qriqionto the body coordinate of the ithvehicle, with the xaxis set to be the front and yto be the left of the vehicle, as shown in Figure 2 .

q˜i=x˜iy˜iθ˜i=cosθisinθi0sinθicosθi0001xrixiyriyiθriθi,E15

using the constraint Eq. (1) and kinematics Eq. (2), we have the derivative of the tracking error as follows

x˜̇i=vricosθ˜i+ωiy˜iviy˜̇i=vrisinθ˜iωix˜iθ˜̇i=ωriωiE16

where viand ωiare the linear and angular velocities of the ithvehicle, respectively.

In order to utilize the backstepping control theory, we treat viand ωiin Eq. (16) as virtual inputs, then following the methodology from [28], we can design a stabilizing virtual controller as

uci=vciωci=vricosθ˜i+Kxx˜iωri+vriKyy˜i+Kθsinθ˜i,E17

in which Kx, Ky, and Kθare all positive constants. It can be shown that this virtual velocity controller is able to stabilize the closed-loop system Eq. (16) kinematically by replacing viand ωiwith vciand ωci, respectively. To this end, we define the following Lyapunov function for the ithvehicle

V1i=x˜i22+y˜i22+1cosθ˜iKyE18

and the derivative of V1iis

V˙1i=x˜ix˜̇i+y˜iy˜̇i+sinθ˜iKyθ˜̇i=x˜i(vricosθ˜i+ωiy˜ivci)+y˜i(vrisinθ˜iωix˜i)+sinθ˜iKy(ωriωci)=x˜i(ωiy˜iKxx˜i)+y˜i(vrisinθ˜iωix˜i)+sinθ˜iKy(vriKyy˜iKθsinθ˜i)=Kxx˜i2KθKysin2θ˜i0E19

Since V̇1iis negative semi-definite, then we can conclude that this closed-loop system is stable, i.e., the tracking error q˜ifor the ithvehicle will be bounded.

Remark 1. In addition to the stable conclusion above, we could also conclude the asymptotic stability by finding the invariant set of V̇1i=0. By setting V̇1i=0, we have x˜i=0and sinθ˜=0. Applying this result into Eqs. (16) and (17) , we have the invariant set equals to x˜i=0y˜i=0sinθ˜=0x˜i=0sinθ˜=0y˜i0vri=0ωri=0. With the assumption 2, the velocity of the reference cannot be constant over time, then we can conclude that the only invariant subset of V̇1i=0is the origin q˜i=0. Therefore, we can conclude that the closed-loop system Eqs. (16) and (17) is asymptotically stable [ 29 ].

With the idea of backstepping control, we then derive the transformed torque input τ¯ifor the ithvehicle with the following steps. By defining the error between the virtual controller uciand the actual velocity uias u˜i=v˜i ω˜iT=uciui, we can rewrite Eq. (16) in terms of v˜iand ω˜ias

x˜̇i=vricosθ˜i+ωiy˜ivci+v˜i=Kxx˜i+ωiy˜i+v˜iy˜̇i=ωix˜i+vrisinθ˜iθ˜̇i=ωriωci+ω˜i=vriKyy˜iKθsinθ˜i+ω˜iE20

Then we define a new Lyapunov function V2i=V1i+u˜iTM¯u˜i2for the closed-loop system Eq. (20), whose derivative can be written as

V̇2i=x˜ix˜̇i+y˜iy˜̇i+sinθ˜iKyθ˜̇i+u˜iTM¯u˜̇i=x˜iKxx˜i+ωiy˜i+v˜i+y˜iωix˜i+vrisinθ˜i+sinθ˜iKyvriKyy˜iKθsinθ˜i+ω˜i+u˜iTM¯u˜̇i=Kxx˜i2KθKysin2θ˜i+u˜iTx˜isinθ˜iKy+M¯u˜̇iE21

To make the system stable, the term u˜iTx˜isinθ˜iKy+M¯u˜̇ineeds to be negative definite. From the definition of u˜iand Eq. (11), we have

M¯u˜̇i=M¯u˜̇ciM¯u̇i=M¯u˜̇ci+C¯ui+F¯τ¯iE22

Motivated from the results of [9], it is easy to show that this term is negative definite if τ¯iis designed to be

τ¯i=M¯u̇ci+C¯ui+F¯+Kuu˜i+x˜isinθ˜iKy,E23

where Kuis a positive constant. Since the actual linear and angular velocity of the vehicle is unknown, we use v̂iand ω̂igenerated by the high-gain observer Eqs. (13) and (14) to replace viand ωiin Eq. (23). From the discussion in previous subsection, the convergence of velocities estimation is guaranteed.

In Eq. (23), C¯uiand F¯uiare unknown to the controller. To overcome this issue, RBFNN will be used to approximate this nonlinear uncertain term, i.e.,

HXi=C¯uiui+F¯ui=WTSXi+ϵi,E24

in which SXiis the vector of RBF, with the variable (RBFNN input) Xi=ui, Wis the common ideal estimation weight of this RBFNN, and ϵiis the ideal estimation error, which can be made arbitrarily small given sufficiently large number of neurons. Consequently, we proposed the implementable controller for the ithvehicle as follows

τ¯i=M¯u̇ci+ŴiTSXi+Kuvciv̂iωciω̂i+x˜isinθ˜iKy,E25

For the NN weights used in Eq. (25), we propose an online NN weight updating law as follows

Ŵ̇i=ΓSXiu˜iTγŴiβj=1naijŴiŴj,E26

where Γ, γ, and βare positive constants.

Theorem 1. Consider the closed-loop system consisting of the nvehicles in the MAS described by Eqs. (2) and (11) , reference trajectory qrit, high-gain observer Eqs. (13) and (14) , adaptive NN controller Eq. (25) with the virtual velocity Eq. (17) , and the online weight updating law (26) , under the Assumptions 1 and 2, then for any bounded initial condition of all the vehicles and Ŵi=0, the tracking error q˜iconverges asymptotically to a small neighborhood around zero for all vehicle agents in the MAS.

Proof: We first derive the error dynamics of velocity between uciand uiusing Eqs. (22) and (25)

u˜̇i=M¯1W˜iTSXi+εiKuvciv̂iωciω̂ix˜isinθ˜iKyE27

where ϵi=ϵviϵωiTand W˜i=WŴi. Notice that the convergence of ûito uiis guaranteed by the high-gain observer. Then we derive the error dynamics of NN weight as follows

W˜̇i=Ŵ̇i=ΓSXiu˜iT+γŴi+βj=1naijŴiŴj)E28

For the closed-loop system given by Eqs. (20), (27), and (28), we can build a positive definite function Vas

V=i=1nx˜i22+y˜i22+1cosθ˜iKy+u˜iTM¯u˜i2+traceW˜iTW˜i2ΓE29

whose derivative is equal to

V̇=i=1nx˜ix˜̇i+y˜iy˜̇i+sinθ˜iKyθ˜̇i+u˜iTM¯u˜̇i+traceW˜iTW˜̇iΓE30

By using Eqs. (27) and (28), the equation above is equivalent to

V˙=i=1n{x˜i(v˜i+ωiy˜iKxx˜i)+y˜i(vrisinθ˜iωix˜i)+sinθ˜iKy(ω˜ivriKyy˜iKθsinθ˜i)+ u˜iT[W˜iTS(Xi)+εiKuu˜i[x˜isinθ˜iKy]]+ trace(W˜iT[S(Xi)u˜iT+γŴiΓ+βΓj=1naij(ŴiŴj)])}=i=1n{Kxx˜i2KθKysin2θ˜iKuu˜iTu˜i+u˜iTεi+u˜iT[W˜iTS(Xi)] trace([W˜iTS(Xi)]u˜iT)+trace(γW˜iTŴiΓ)}trace(i=1nβΓW˜iTj=1naij(ŴiŴj))=i=1n{Kxx˜i2KθKysin2θ˜iKuu˜iTu˜i+u˜iTεi+γΓtrace(W˜iTŴi)}βΓtrace(W˜T(LI)W˜)E31

where Lis the Laplacian matrix of G, and W˜=W˜1TW˜nTT. Since βand Γare all positive, and Lis positive semi-definite, then we have βΓtraceW˜TLIW˜0. Notice that the estimation error can be made arbitrary small with a sufficient large number of neurons, and γis the leakage term chosen as a small positive constant. Therefore, we can conclude that the closed-loop system Eqs. (20), (27), and (28) is stable, i.e., V̇0, if the following condition stands

Kxx˜i2+KθKysin2θ˜i+Kuu˜iTu˜iu˜iTϵi+γΓtraceW˜iTŴiE32

Hence, the closed-loop system is stable, and all tracking error are bounded. Since all variables in Eq. (31) are continuous (i.e., V¨is bounded), then with the application of Barbalat’s Lemma [30], we have limtV̇=0, which implies that the tracking error q˜ifor all agents will converge to a small neighborhood of zero, whose size depends on the norm of u˜iTϵi+γΓtraceW˜iTŴi. Q.E.D.

### 3.3 Consensus convergence of NN weights

In addition to the tracking convergence shown in the previous subsection, we will show that all vehicles in the system is able to learn the unknown vehicle dynamics along the union trajectory (denoted as i=1nζiXit) experienced by all vehicles in this subsection.

By defining v˜=v˜1v˜nT, ω˜=ω˜1ω˜nT, W˜v=W˜1,1W˜n,1T, and W˜ω=W˜1,2W˜n,2T, we combine the error dynamics in Eqs. (27) and (28) for all vehicles into the following form:

v˜̇ω˜̇W˜̇vW˜̇ω=ABCDv˜ω˜W˜vW˜ω+EE33

in which

A2n×2n=KumIn00KuIIn,B2nN×2n=STm00STI,C2n×2nN=ΓS00ΓS,D2nN×2nN=βLIN00βLIN,

where S=diagSX1SX2SXn, and

E2nN+2nN×1=E1E2E3E4,E1=1mϵv1x˜1ϵvnx˜n,E2=1Iϵω1sinθ˜1Kyϵωnsinθ˜nKy,E3=γmŴ1,1Ŵn,1,E4=γmŴ1,2Ŵn,2.

As is shown in Theorem 1, the tracking error q˜iwill converge to a small neighborhood of zero for all vehicle agents in the MAS. Furthermore, the ideal estimation errors ϵviand ϵωican be made arbitrarily small given sufficient number of RBF neurons, and γis chosen to be a small positive constant, therefore, we can conclude that the norm of Ein Eq. (33) is a small value. In the following theorem, we will show that Wi=Wi,1Wi,2converges to a small neighborhood of the common ideal weight Wfor all i=1,,nunder Assumptions 1 and 2.

Before proceeding further, we denote the system trajectory of the ithvehicle as ζifor all i=1,,n. Using the same notation from [14], ζand ζ¯represent the parts of related to the region close to and away from the trajectory ζ, respectively.

Theorem 2. Consider the error dynamics Eq. (33) , under the Assumptions 1 and 2, then for any bounded initial condition of all the vehicles and Ŵi=0, along the union of the system trajectories i=1nζiXit, all local estimated neural weights Ŵζiused in Eqs. (25) and (26) converge to a small neighborhood of their common ideal value Wζ, and locally accurate identification of nonlinear uncertain dynamics HXtcan be obtained by ŴiTSXas well as W¯iTSXfor all Xi=1nζiXit, where

W¯i=meantaittbiŴitE34

with taitbi(tbi>tai>Ti) being a time segment after the transient period of tracking control.

Proof: According to [14], if the nominal part of closed loop system shown in Eq. (33) is uniformly locally exponentially stable (ULES), then v˜, ω˜, W˜v, and W˜ωwill converge to a small neighborhood of the origin, whose size depends on the value of E.

Now the problem boils down to proving ULES of the nominal part of system Eq. (33). To this end, we need to resort to the results of Lemma 4 in [31]. It is stated that if the Assumptions 1 and 2 therein are satisfied, and the associated vector SζXiis PE for all i=1,,n, then the nominal part of Eq. (33) is ULES. The assumption 1 therein is automatically verified since Sis bounded, and Assumption 2 therein also holds, if we set the counterparts P=Γm00Iand Q=2ΓKuIn00KuIn. Furthermore, the PE condition of SζXiwill also be met, if Xiof the learning task is recurrent [14], which is guaranteed by Assumption 2 and results from Theorem 1. Therefore, we can obtain the conclusion that v˜, ω˜, W˜v, and W˜ωwill converge to a small neighborhood of the origin, whose size depends on the small value of E.

Similar to [24], the convergence of Ŵζito a small neighborhood of Wζimplies that for all Xi=1nζiXit, we have

HX=WζT+ϵζ=ŴζiTSζX+W˜ζiTSζX+ϵζi=ŴζiTSζX+ϵ1ζiE35

where ϵ1ζi=W˜ζiTSζX+ϵζiis close to ϵζidue to the convergence of W˜ζi. With the W¯idefined in Eq. (34), then Eq. (35) can be rewritten into

HX=ŴζiTSζX+ϵ1ζi=W¯ζiTSζX+ϵ2ζiE36

where W¯ζiT=w1ζwkζTis a subvector of W¯iand ϵ2ζiis the error using W¯ζiTSζXas the system approximation. After the transient process, ϵ1ζiϵ2ζiis small for all i=1,,n.

On the other hand, due to the localization property of Gaussian RBFs, both Sζ¯and W¯ζ¯Sζ¯Xare very small. Hence, along the union trajectory i=1nζiXit, the entire constant RBF network W¯TSXcan be used to approximate the nonlinear uncertain dynamics, demonstrated by the following equivalent equations

HX=WζTSζX+ϵζHX=ŴζiTSζX+Ŵζ¯iTSζ¯X+ϵ1i=ŴiTSX+ϵ1iHX=W¯ζiTSζX+W¯ζ¯iTSζ¯X+ϵ2i=W¯iTSX+ϵ2iE37

where ϵ1iϵ1ζiand ϵ2iϵ2ζiare all small for all i=1,,n. Therefore, the conclusion of Theorem 2 can be drawn. Q.E.D.

### 3.4 Experience-based trajectory tracking control

In this section, based on the learning results from the previous subsections, we further propose an experience-based trajectory tracking control method using the knowledge learned in the previous subsection, such that the experience-based controller is able to drive each vehicle to follow any reference trajectory experienced by any vehicle on the learning stage.

To this end, we replace the NN weight Ŵiin Eq. (25) by the converged constant NN weight W¯ifor the ithvehicle. Therefore, the experience-based controller for the ithvehicle is constructed as follows

τ¯i=M¯u̇ci+W¯iTSXi+Kuvciv̂iωciω̂i+x˜isinθ˜iKy,E38

in which u̇ciis the derivative of the virtual velocity controller from Eq. (17), and W¯iis obtained from Eq. (34) for the ithvehicle. The system model Eqs. (2) and (11), and the high-gain observer design Eqs. (14) and (13) remain unchanged.

Theorem 3. Consider the closed-loop system consisting of Eqs. (2) and (11) , reference trajectory qrij=1nqjt, high-gain observer Eqs. (14) and (13) , and the experience-based controller Eq. (38) with virtual velocity Eq. (17) . For any bounded initial condition, the tracking error q˜iconverges asymptotically to a small neighborhood around zero.

Proof: Similar to the proof of Theorem 1, by defining q˜iand u˜ito be the error between the position and velocity of the ithvehicle and its associated reference trajectory, we have the error dynamics of the ithvehicle as

x˜̇i=vricosθ˜i+ωiy˜ivi=v˜i+ωiy˜iKxx˜iy˜̇i=vrisinθ˜iωix˜iθ˜̇i=ωriωi=ω˜ivriKyy˜iKθsinθ˜iu˜̇i=M¯1HXiW¯iTSXiKuvciv̂iωciω̂ix˜isinθ˜iKyE39

With the same high-gain observer design used in the learning-based tracking, the convergence of ûito uiis also guaranteed. For the closed-loop system shown above, we can build a positive definite function as

Vi=x˜i22+y˜i22+1cosθ˜iKy+u˜iTM¯u˜i2E40

and the derivative of Viis

V̇i=x˜ix˜̇i+y˜iy˜̇i+sinθ˜iKyθ˜̇i+u˜iTM¯u˜̇i=x˜iv˜i+ωiy˜iKxx˜i+y˜ivrisinθ˜iωix˜i+sinθ˜iKyω˜ivriKyy˜iKθsinθ˜i+ u˜iTϵ2iKuu˜ix˜isinθ˜iKy=Kxx˜i2KθKysin2θ˜iKuu˜iTu˜i+u˜iTϵ2iE41

where ϵ2i=HXiW¯iTSXi. Then following the similar arguments in the proof of Theorem 1, given positive Kx, Ky, Kθ, and Ku, then we can conclude that the Lyapunov function Viis positive definite and V̇iis negative semi-definite in the region Kxx˜i2+KθKysin2θ˜i+Kuu˜iTu˜iu˜iTϵ¯i. Similar to the proof of Theorem 1, it can be shown that limtV̇i=0with Barbalat’s Lemma, and the tracking errors will converge to a small neighborhood of zero. Q.E.D.

## 4. Simulation studies

Consider four identical vehicles, whose unknown friction vector is assumed to be a nonlinear function of vand ωas follows F¯=0.1mvi+0.05mvi20.2Iωi+0.1Iωi2, and since we assume the vehicles are operating on the horizontal plane, the gravitational vector G¯is equal to zero. The physical parameters of the vehicles are given as m=2kg, I=0.2kgm2; R=0.15m, r=0.05m and d=0.1m. The reference trajectories of the three vehicles are given by

xr1=sintyr1=2costxr2=2costyr2=sintxr3=2sintyr3=3costxr4=3costyr4=2sintE42

and for all vehicles, the orientations of reference trajectories and vehicle velocities satisfy the following equations

tanθri=ẏriẋri,vri=ẋri2+ẏri2,ωri=ẋriy¨rix¨riẏriẋri2+ẏri2.E43

The parameters of the observer Eqs. (13) and (14) are given as δ=0.01, and l1=l2=1. The parameters of the controller Eq. (25) with Eq. (17) are given as Kx=Ky=Kθ=1, and Ku=2. The parameters of Eq. (26) are given as Γ=10, γ=0.001, and β=10. For each i=1,2,3,4, since Xi=viωiT, we construct the Gaussian RBFNN ŴiSXiusing N=5×5=25neuron nodes with the centers evenly placed over the state space 04×04and the standard deviation of the Gaussian function equal to 0.7. The initial position of the vehicles are set at the origin, with the velocities set to be zero, and the initial weights of RBFNNs are also set to be zero. The connection between three vehicles is shown in Figure 3 , and the Laplacian matrix Lassociated with the graph Gis

L=2101121001211012.E44

Simulation results are shown as following. Figure 4a shows that the observer error will converge to a close neighborhood around zero in a very short time period, and Figure 4b shows that all tracking errors x˜iand y˜iwill converge to zero, and Figures 5a–f show that all vehicles (blue triangles) will track its own reference trajectory (red solid circles) on the 2-D frame. Figure 6b shows that the NN weights of all vehicle agents converge to the same constant, and Figure 6a shows that all RBFNNs of three vehicles are able to accurately estimate the unknown dynamics, as the estimation errors converging to a small neighborhood around zero.

To demonstrate the results of Theorem 3, which states that after the learning process, each vehicle is able to use the learned knowledge to follow any reference trajectory experienced by any vehicle on the learning stage. In this part of our simulation, the experience-based controller Eq. (38) will be implemented with the same parameters as those of the previous subsection, such that vehicle 1 will follow the reference trajectory of vehicle 3, vehicle 2 will follow the reference trajectory of vehicle 1, and vehicle 3 will follow the reference trajectory of vehicle 2. The initial position of the vehicles are set at the origin, with all velocities equal to zero.

Simulation results are shown as following. Figure 7a shows that the observer error will converge to a close neighborhood around zero in a very short time period. Figures 8a–c show that all vehicles (blue triangles) will track its own reference trajectory (red solid circles), and Figure 7b shows that all tracking errors x˜iand y˜iwill converge to zero.

## 5. Conclusion

In this chapter, a high-gain observer-based CDL control algorithm has been proposed to estimate the unmodeled nonlinear dynamics of a group of homogeneous unicycle-type vehicles while tracking their reference trajectories. It has been shown in this chapter that the state estimation, trajectory tracking, and consensus learning are all achieved using the proposed algorithm. To be more specific, any vehicle in the system is able to learn the unmodeled dynamics along the union of trajectories experienced by all vehicles with the state variables provided by measurements and observer estimations. In addition, we have also shown that with the converged NN weight, this knowledge can be applied on the vehicle to track any experienced trajectory with reduced computational complexity. Simulation results have been provided to demonstrate the effectiveness of this proposed algorithm.

## How to cite and reference

### Cite this chapter Copy to clipboard

Xiaonan Dong, Paolo Stegagno, Chengzhi Yuan and Wei Zeng (April 22nd 2020). Cooperative Adaptive Learning Control for a Group of Nonholonomic UGVs by Output Feedback, Multi Agent Systems - Strategies and Applications, Ricardo López - Ruiz, IntechOpen, DOI: 10.5772/intechopen.87038. Available from:

### Related Content

Next chapter

#### Multiagent Systems for 3D Reconstruction Applications

By Metehan Aydın, Erkan Bostancı, Mehmet Serdar Güzel and Nadia Kanwal

First chapter

#### BOLD fMRI Simulation

By Zikuan Chen and Vince Calhoun

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

View all Books