Self-Learning Low-Level Controllers

Humanoid robots are complicated systems in both hardware and software design. Furthermore, these robots normally work in unstructured environments in which unpredictable disturbances can degrade the control performance of the whole system. As a result, simple yet effective controllers are favored in the low-level layers. Gain-learning algorithms applied to conventional control frameworks, such as proportional-integral-derivative, sliding-mode, and backstepping controllers, offer reasonable solutions. The integrated adaptation ability automatically tunes proper control gains subject to the optimal control criterion in both the transient and steady-state phases. The learning rules can be realized using analytical nonlinear functions. Their effectiveness and feasibility are carefully discussed through theoretical proofs and experimental results.


Introduction
Precise motion control of low-level systems is one of the most important tasks in industrial and humanoid robotic systems [1][2][3]. Unlike industrial robots, which commonly operate in compact regions with simple and almost repetitive missions, humanoid robots perform complicated work and face unknown disturbances in daily activities. Hence, designing a high-performance controller that is easy to use in real-time implementation for such maneuverable systems is a big challenge [4,5].
To accomplish motion control in real-time applications, conventional proportional-integral-derivative (PID) controllers are the first choice of engineers and researchers thanks to their simplicity of design and acceptable control outcomes for uncertain systems [6][7][8][9][10][11]. Stability of the servo-controlled systems is proven by theoretical analyses, and their flexibility can be enhanced using machine-learning methods such as ordinary or neuro-fuzzy-logic-based self-tuning [7,10,11], pole-placement adaptation [8], or convolutional learning [9]. However, using linear control signals to suppress the nonlinear behaviors of robotic dynamics may lead to unexpected transient performance. To overcome this drawback, nonlinear controllers such as sliding mode control (SMC), backstepping control (BSC), or inverse dynamics control have received attention from developers [12][13][14][15][16][17]. Indeed, a robust-integral-sign-error (RISE) controller was studied to consolidate lumped disturbances inside the system dynamics for achieving asymptotic control results [13]. In another direction, a model-based nonlinear disturbance-observer controller was proposed based on the backstepping technique to yield excellent control accuracy [15]. Nevertheless, extended studies noted that such outstanding control performance is difficult to preserve with hard-coded control gains across diverse real-time operations [18,19].
As a result, gain-learning SMC algorithms have been developed for robotic systems [18][19][20][21]. The control objective can be minimized by learning processes for the robust gains, driving gains, or mass gains [22,23]. In practice, some control gains must still be tuned manually over possibly wide ranges due to the nature of each control plant, which may cause inconvenience during operation.
Intelligent methods for automatically tuning all the control gains have also been proposed, based on modified backtracking search algorithms (MBSA) combined with a Type-2 fuzzy-logic design [24] or model predictive approaches [25]. The desired gains could be estimated for the best performance by dealing with closed-loop optimal constraints. Although promising control results were presented, smooth variation of the gain dynamics needs further consideration.
Gain-learning control approaches under the backstepping design provide another interesting direction as well. PID control with a gain-varying technique encoded by the backstepping scheme was formerly studied [26]. The success of this creative control method was confirmed by a thorough theoretical proof and experimental validation results. However, since the learning process of all the control gains is generated by only one damping function, the versatility of the control design may be limited for diverse working conditions. Improving the flexibility of gain selection is thus still an open issue.
In this chapter, an extensive gain-adaptive nonlinear control approach is presented for high-performance motion control of a low-level servo system. The controller comprises an inner robust nonlinear loop and an outer gain-learning loop. The inner loop is developed based on a RISE-modified backstepping framework to ensure asymptotic tracking control in the presence of nonlinear uncertainties and disturbances. The outer loop contains a new gain-adaptive engine that adjusts the variation gains of the inner loop in real-time applications. The theoretical effectiveness of the proposed controller is concretely proven by Lyapunov-based analyses, and its feasibility was confirmed by intensive real-time experiments on a legged robot. These features are presented in detail in the sections below.

Problem statements
The general dynamics of a robotic system can be expressed in the following form:

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) + \tau_{fr}(\dot{q}) = \tau + J^T f_{ext} \quad (1)$$

where $q, \dot{q}, \ddot{q} \in \mathbb{R}^n$ are respectively the joint position, velocity, and acceleration vectors, $M(q) \in \mathbb{R}^{n\times n}$ is the inertia matrix, $C(q,\dot{q}) \in \mathbb{R}^{n\times n}$ is the centrifugal/Coriolis matrix, $g(q) \in \mathbb{R}^n$ denotes the gravitational torque, $\tau_{fr}(\dot{q}) \in \mathbb{R}^n$ is the frictional torque, $J^T$ is the respective Jacobian matrix, $f_{ext}$ is the external disturbance, and $\tau$ is the control torque at the robot joints.
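To make the structure of Eq. (1) concrete, the following sketch evaluates the inverse dynamics of a hypothetical single-joint pendulum. All numerical parameters, the viscous friction model, and the tip-force Jacobian are illustrative assumptions, not values from this chapter:

```python
import math

# Hypothetical 1-DOF pendulum parameters (illustrative assumptions).
m, l, g0, b = 1.0, 0.3, 9.81, 0.05   # mass, length, gravity, viscous friction

def inverse_dynamics(q, qd, qdd, f_ext=0.0):
    """Scalar instance of Eq. (1):
    tau = M(q)*qdd + C(q, qd)*qd + g(q) + tau_fr(qd) - J^T * f_ext."""
    M = m * l * l             # inertia "matrix" reduces to a scalar
    C = 0.0                   # no centrifugal/Coriolis coupling for one joint
    grav = m * g0 * l * math.sin(q)   # gravitational torque g(q)
    tau_fr = b * qd           # assumed viscous friction torque
    J_T = l * math.cos(q)     # maps a horizontal tip force to joint torque
    return M * qdd + C * qd + grav + tau_fr - J_T * f_ext

tau = inverse_dynamics(q=0.5, qd=1.0, qdd=2.0)
```

For a multi-joint robot, M, C, and J become matrices and the same equation is evaluated with vector quantities.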
The main control objective here is to find a proper control signal $\tau$ that drives the control error between the system output and a desired profile to the origin under various complicated environments.
To realize the control objective, conventional linear or nonlinear controllers such as proportional-integral-derivative (PID) and sliding mode control (SMC) methods are priority selections in industry thanks to their simplicity and robustness. However, the same mission in humanoid robots is a different story, in which the systems frequently operate in unknown environments with harsh, unpredictable disturbances [27,28]. Obviously, the required controller must provide strong robustness, fast adaptation, and easy implementation.

Low-level intelligent nonlinear controller
In this subsection, a position controller is developed based on the general model using the backstepping technique and new adaptation laws. The dynamics Eq. (1) can be split into low-level subsystems under the following state-space form:

$$\dot{x}_1 = x_2 - \upsilon, \qquad \dot{x}_2 = -a_1 x_2 + a_2 u + d \quad (2)$$

where $x_1 = q_i|_{i=1..n}$ denotes a specific joint angle, $x_2$ is the measured joint velocity, $u = \tau_i|_{i=1..n}$ is the control torque at that joint, $\upsilon$ is the measurement noise, $a_1$ is a positive constant representing the nominal dynamics, $a_2$ is another positive constant standing for the inverse nominal mass of the low-level dynamics, and $d$ is the lumped disturbance denoting the deviation of the internal dynamics. Note that $x_1$ and $x_2$ satisfy the following assumptions:

Assumption 1:
a. The system output $x_1$ is measurable.
b. The angular velocity $x_2$ is bounded and is indirectly measured from the angular data with a bounded tolerance $\upsilon$.
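A minimal numerical sketch can clarify the low-level model Eq. (2). The sign conventions below and the use of the hip-joint nominal values reported later in the setup ($a_1 = 2.5$, $a_2 = 12.25$) are assumptions for illustration only:

```python
# Assumed numerical form of the joint subsystem in Eq. (2):
#   x1_dot = x2 - v,   x2_dot = -a1*x2 + a2*u + d
a1, a2 = 2.5, 12.25   # nominal hip-joint constants reported in the setup
dt = 0.002            # 2 ms sampling time used in the experiments

def step(x1, x2, u, d=0.0, v=0.0):
    """One forward-Euler step of the low-level joint dynamics."""
    x1_next = x1 + dt * (x2 - v)
    x2_next = x2 + dt * (-a1 * x2 + a2 * u + d)
    return x1_next, x2_next

# open-loop response to a constant torque input
x1, x2 = 0.0, 0.0
for _ in range(1000):                 # simulate 2 s
    x1, x2 = step(x1, x2, u=0.1)
# x2 settles near a2*u/a1 = 0.49, the expected first-order steady state
```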

Robust backstepping control scheme
Let us formulate the main control error as:

$$e_1 = x_1 - x_{1d} \quad (3)$$

where $x_{1d}$ is the desired trajectory of the controlled joint. Before designing the final control signal, additional assumptions are given.

Assumption 2:
a. The measurement noise $\upsilon$ is bounded and differentiable up to the second order.
b. The disturbance d and its time derivative are bounded.
c. The desired signal x 1d is bounded and differentiable up to the third order.
The time derivative of the control objective $e_1$, considering the first equation of the dynamics Eq. (2), is then taken. To drive the error $e_1$ to zero, or as small as possible, a virtual control signal is employed to remove the time derivative of the desired signal and to compensate for the disturbance $\upsilon$, in which $k_1$ is a positive constant.
A new state control error $e_2$ is then defined between $x_2$ and the virtual control signal. Differentiating this new error with respect to time and using the second equation of the dynamics Eq. (2) leads to the error dynamics. To drive the new control error $e_2$ into an expected range, the final control signal Eq. (8) is proposed with two sub-control terms (a model-based term and a robust term), where $k_i|_{i=2,\dots,5}$ are positive control gains. Stability of the closed-loop system under the controller Eq. (8) is confirmed by the following statement.
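The two-step design above can be sketched numerically. The snippet below implements a plain backstepping loop (virtual control for $x_2$, then a model-based term plus stabilizing feedback) on the assumed model $\dot{x}_1 = x_2$, $\dot{x}_2 = -a_1 x_2 + a_2 u + d$; the gains, the constant disturbance, and the omission of the noise $\upsilon$ and of the robust terms carried by $k_3..k_5$ in Eq. (8) are all simplifying assumptions:

```python
import math

a1, a2, d = 2.5, 12.25, 0.5   # assumed plant constants and disturbance
k1, k2 = 20.0, 20.0           # illustrative backstepping gains
dt = 0.001

x1, x2 = 0.0, 0.0
for i in range(4000):                          # simulate 4 s
    t = i * dt
    x1d = 0.2 * math.sin(2 * math.pi * t)      # desired trajectory
    x1d_dot = 0.4 * math.pi * math.cos(2 * math.pi * t)
    x1d_ddot = -0.8 * math.pi ** 2 * math.sin(2 * math.pi * t)

    e1 = x1 - x1d                              # main control error
    alpha = x1d_dot - k1 * e1                  # virtual control for x2
    e2 = x2 - alpha                            # second control error
    alpha_dot = x1d_ddot - k1 * (x2 - x1d_dot)
    u = (alpha_dot - e1 - k2 * e2 + a1 * x2) / a2   # model-based + feedback

    x1 += dt * x2                              # plant integration (Euler)
    x2 += dt * (-a1 * x2 + a2 * u + d)

tracking_error = abs(x1 - 0.2 * math.sin(2 * math.pi * 4.0))
```

Without the robust terms, the constant disturbance leaves a small residual error of roughly d/(1 + k1*k2); the robust part of Eq. (8) is what pushes this residual toward zero.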
Proof of Lemma 1 is given in Appendix A.
Remark 1: Lemma 1 reveals that the closed-loop system is stabilized in a vicinity around zero under the constraint Eq. (9). Obviously, acceptable control performance can be achieved with properly selected control gains.
The effectiveness of the nonlinear control structure is established by the following statement.
Theorem 1: Given a closed-loop system satisfying Lemma 1, it converges asymptotically if the control gains are further properly chosen.
Proof of Theorem 1 is discussed in Appendix B.
Remark 2: In real-time situations [15,29,30], the position data $x_1$ are employed to approximate the velocity $x_2$ through a low-pass filter. Thus, the perturbation term $\upsilon$ obviously exists in the studied model Eq. (2), and its variation depends on the filter used.
Remark 3: With the designed robust backstepping control scheme, excellent control performance can be achieved with properly selected control gains regardless of the presence of the disturbances. However, perfectly selecting the gains for good transient performance, and maintaining high-precision control under divergent working conditions in real time, is not a trivial task.

Auto gain-tuning rules
To effectively support gain selection for users, a simple gain-tuning strategy is employed: the control gains $k_i|_{i=1..5}$ are separated into two terms, nominal elements $\bar{k}_i|_{i=1..5}$ and variation elements $\tilde{k}_i|_{i=1..5}$. The nominal ones play a key role in ensuring stability of the closed-loop system. The variation gains are self-adjusted to suppress unpredictable disturbances for the expected transient performance. Furthermore, to maintain high control quality by avoiding sudden changes of the gain variation, which could activate a chattering problem [25], the following constraints are noted.

Assumption 3:
The variation terms $\tilde{k}_i|_{i=1..5}$ and their first-order time derivatives are bounded. Under operation of the flexible gains, the nonlinear control signal Eq. (8) is modified into Eq. (11), where $\mathrm{sat}(\cdot)$ denotes a saturation function keeping each variation gain inside its feasible range.

Lemma 2:
If a closed-loop system satisfies Lemma 1, it is stable for time-varying gains complying with Assumption 3 and the condition Eq. (12) for $i = 1..5$. Proof of Lemma 2 is given in Appendix D.
To comply with Assumption 3, the learning laws for the dynamic gains are structured from activation functions of the state control errors and leakage functions, which ensure boundedness of the learning gains.
The learning rules for the variation gains are proposed as in Eq. (13), where $\eta_i|_{i=2..5}$ and $\sigma_i|_{i=1..5}$ are positive learning rates.
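Although the exact form of Eq. (13) is not reproduced here, its "activation plus leakage" structure can be illustrated as below; the quadratic activation, the rates, and the saturation limit are assumptions chosen only to demonstrate boundedness:

```python
eta, sigma = 50.0, 5.0   # illustrative learning and leakage rates
k_max = 10.0             # assumed feasible range of the variation gain
dt = 0.002

def update_gain(k_var, e):
    """One Euler step of k_var_dot = eta*e^2 - sigma*k_var, then saturation."""
    k_var += dt * (eta * e * e - sigma * k_var)
    return min(max(k_var, 0.0), k_max)   # keeps Assumption 3 satisfied

k = 0.0
for _ in range(1000):
    k = update_gain(k, e=1.0)   # persistent error inflates the gain
k_high = k
for _ in range(5000):
    k = update_gain(k, e=0.0)   # leakage pulls the gain back once e vanishes
k_low = k
```

The leakage term bounds the gain by eta*sup(e^2)/sigma even without saturation, which is the role the relaxation functions play in the proposed laws.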
To investigate the control performance of the learning control system, a new theorem is given.

Theorem 2:
If the control gains updated by Eq. (13) are applied to a closed-loop system satisfying Lemma 2, asymptotic convergence of the state control error and the variation gains is obtained.
Proof of Theorem 2 can be found in Appendix E.
Remark 4: An overview of the proposed controller is sketched in Figure 1. As stated in Theorem 1, the stability of the closed-loop system is ensured in a robust control framework; as proven in Theorem 2, the adaptation of the control structure is highlighted by all the control gains learning to minimize the tracking control error. The form of Eq. (E.4) reveals that the learning rates ($\sigma_i|_{i=1..5}$ and $\eta_i|_{i=2..5}$) can be employed with predefined values for specific control hardware.
Remark 5: In real-time applications, the proposed algorithm is deployed in a discrete-time environment, so the control errors converge to arbitrary vicinities around zero. The resulting control range can, however, be minimized by the proposed learning mechanism.

Setup
In this section, the control performance of the intelligent controller is discussed based on verification results obtained on a real-time 2-DOF legged robot. The experimental leg included one hip joint and one knee joint, which were actuated by two BLDC motors. The mechanical design and a photograph of the actual leg are presented in Figure 2.
Incremental encoders were used to measure the joint angles, while a force sensor was placed at the shank of the robot to evaluate the ground contact force. The velocity signal was calculated by filtered backward differentiation of the position data. The robot was set up to move freely in both the x and y directions. The total weight of the robot was about 15.74 kg. The proposed control algorithm was deployed on an NI electrical controller through LabVIEW software with a sampling time of 2 ms. The time-derivative and integral terms in the real-time implementation were approximated by backward Euler methods.
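The velocity estimation and the backward Euler approximations mentioned above can be sketched as follows; the 30 Hz cut-off frequency is an assumption, not a value from the chapter:

```python
import math

dt = 0.002                  # 2 ms sampling time
fc = 30.0                   # assumed low-pass cut-off frequency [Hz]
alpha = (2 * math.pi * fc * dt) / (1 + 2 * math.pi * fc * dt)

q_prev, v_filt, integ = 0.0, 0.0, 0.0
for i in range(1, 2001):                   # 4 s of simulated encoder data
    t = i * dt
    q = math.sin(2 * math.pi * t)          # simulated 1 Hz joint motion
    v_raw = (q - q_prev) / dt              # backward difference
    v_filt += alpha * (v_raw - v_filt)     # first-order low-pass filter
    integ += dt * q                        # backward Euler integral (I-term)
    q_prev = q

v_true = 2 * math.pi * math.cos(2 * math.pi * 4.0)   # analytic derivative
```

The filter trades noise suppression for a small phase lag, which is precisely the bounded tolerance $\upsilon$ assumed in the model Eq. (2).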
The two systematic parameters $(a_{1j}, a_{2j})|_{j=h,k}$ of the low-level systems could be estimated offline or online using a model-based identification method derived in previous works [27,31,32]. Nominal values of the parameters were approximately determined as $a_{1h} = 2.5$, $a_{2h} = 12.25$, $a_{1k} = 0.5$, $a_{2k} = 15$.

Comparative control results
Both the hip and knee joints were controlled at the same time using the same proposed control algorithm. The controller was also compared with an adaptive robust extended-state-observer-based (ARCESO) controller, a robust-integral-sign-error (RISE) controller, and a version of itself with fixed (nominal) gains in Eq. (8), denoted as the robust backstepping (RB) controller.
The ARCESO controller was designed based on a previous work [30], with its control gains chosen accordingly. The RISE controller was implemented based on a robust integral theory [13] to control the studied system Eq. (2) without considering the measurement noise $\upsilon$; its control signal is given by Eq. (14), with the RISE control gains and the nominal gains of the proposed controller set appropriately. The excitation signals ($\varepsilon$ and $\varphi$) of the learning laws Eq. (13) were directly synthesized from the control error ($e_1$) and its high-order time derivatives based on Eq. (D.1):

Figure 2.
Design and setup of the experimental testing system.
From the selected nominal control gains, the feasible ranges of the variation gains were then chosen to satisfy the constraint Eq. (9). The learning rates ($\sigma_i|_{i=1..5}$ and $\eta_i|_{i=2..5}$) were then set to comply with the condition Eq. (12) and to allow the variation gains to vary freely inside their predetermined ranges. For simplicity, the relaxation rates ($\eta_i|_{i=2..5}$) were chosen to be 1 or 2, and the remaining rates were tuned accordingly.

Simple verification
In this validation series, the proposed controller was applied only for position-tracking control of the hip joint. A sinusoidal signal $x_{1dh} = 14\sin(4\pi t)$ (deg) was chosen as the desired trajectory of the test. The leg was put to move freely in the air to eliminate the external disturbance. Figure 3(a) presents the experimental data obtained by the comparative controllers. The ARCESO controller produced a very small control error of ±0.14 deg (≈1.0%) in the high-speed tracking control thanks to its effective adaptive disturbance-learning mechanism. The ARCESO control performance is, however, still limited under fast-varying disturbances [30]. By adopting the integral-robust control signal Eq. (14) to compensate for the lumped disturbance ($d$) in the low-level system Eq. (2), the RISE controller also exhibited high control accuracy (control error: [−0.16, 0.14] deg, ≈1.14%). In real-time applications, however, improperly selected control gains or large measurement noise ($\upsilon$) could degrade the RISE control performance. Operating under the highly robust design Eq. (8) against all the disturbances, the RB technique provided better control precision (control error: ±0.138 deg, ≈0.98%). Theoretically, the control performance could be further improved if the best control gains were found, but that may be a time-consuming task. As a solution, the gain-tuning process can be supported by the proposed learning mechanism Eqs. (11) and (13). Indeed, the control quality was intuitively enhanced by applying the gain-adaptive robust backstepping (GARB) control method, which yielded the smallest control error of ±0.085 deg (≈0.6%).
The gain-learning behaviors are illustrated in Figure 3(b). As seen in the figure, the variation gains were automatically changed in various ways under the adaptation laws to minimize the control error. The maximum-absolute (MA) and root-mean-square (RMS) values of the control errors after the system became stable (from 2 s to 5 s) are summarized in Table 1. The proposed controller outperforms the previous methods.

Complex verification
To further challenge the special properties of the proposed controller, the robot was controlled to perform a squatting exercise in three different working cases: in the air, on the ground, and with ground contact. The frequency and amplitude of the squatting motion were selected to be 2 Hz and 80 mm, respectively. These tests represent normal working cases of the leg in real-time missions. The desired trajectories ($x_{1dh}$ and $x_{1dk}$) of the two robot joints (hip and knee) are plotted in Figure 4. The trajectories were derived from the desired foot motion of the robot.

Table 1.

Figure 4.
Desired profiles of the robot joints in the multiple-joint tests.

Verification with minor external disturbances
Although the robot worked in the air, the disturbances affecting the controlled joints were large due to the high-speed control and the interaction forces between the joints during the system movement. The dynamical and statistical control results obtained by the validated controllers are shown in Figure 5 and Table 2, respectively. Despite operating with faster motions (192.2 deg/s and 324.8 deg/s for the hip and knee joints) and under harder internal disturbance conditions, the ARCESO controllers

Figure 5.
Experimental results of the testing controllers in the multiple-joint test with small external disturbances. (a) Control errors of the comparative controllers. (b) Control inputs generated by the comparative controllers. (c) Forces measured at the shank with respect to the GARB controller. (d) Gain learning of the GARB controller.
maintained a high control outcome thanks to their strong adaptation ability: ±0.8 deg (≈4.8%) and ±1.5 deg (≈4.1%) for the hip and knee joints. As seen in Figure 5(a), the robust backstepping designs coped with the reaction forces as well: the RB and RISE controllers stabilized the control errors of both joints inside acceptable ranges. A comparison of the control power required to conduct the high-speed motions is shown in Figure 5(b). The control efforts of the controllers were almost the same for this mission; only minor differences in the nonlinearities of the control signals led to the divergence in control performance. The figure also reveals that the GARB controller generated applicable control inputs even though the learning gains were moderated against the risk of high-order measurement noise. This benefit comes from the low-pass-filter-like nature of the proposed gain-learning algorithm. The external force affecting the leg, measured at the shank while using the GARB controller, is presented in Figure 5(c); the coordinate of the measured force is sketched in Figure 2. This experiment confirms the higher control accuracy and demonstrates the advantages of the proposed controller over the other controllers.

Verification with large external disturbances
In this experiment, the robust adaptive ability of the proposed controller was rigorously investigated under heavy external load. The robot was put on the ground and supported by sliders in both the x and y directions. To avoid damaging the robot, only the proposed controller was used in this verification. The control results obtained are plotted in Figure 6. In this test, the external forces reacting from the environment increased significantly, from 10 N to 390 N. The data presented in Figure 6(a) nevertheless imply that the controller still provided acceptable control accuracy for the hip and knee joints. As demonstrated in Figure 6(b), in this case the system used more energy than in the previous test to execute the fast-tracking control under critical conditions. As presented in Figure 6(d), the control gains were automatically raised to deal with the large disturbances for the smallest possible control error. Hence, the strong robustness and fast adaptability of the proposed method are confirmed by this investigation.

Verification with fast-variation external disturbances
In this case study, the transient behaviors of the designed controller were carefully validated using fast-varying external disturbances. The robot was still controlled to conduct the same squatting work. Harder testing conditions were constituted by two consecutive distinguished phases in one working cycle: a ground-contact phase and a ground-release phase. Figure 7(c) shows the ground-reaction forces measured during the test. The nature of the external disturbance in this case differs from those of the previous cases: fast variation of the reaction forces may make the system unstable. The designed control system nevertheless showed concrete robustness and impressive adaptation in real-time control once again.
As presented in Figure 7(a), the closed-loop system provided good performance, and the control parameters were varied to properly adapt to the change of working conditions. Figure 7(d) shows that new ranges of the control gains were found by the proposed algorithm, and Figure 7(b) presents the energy required for the new test.

Additional statistical notes
The RMS values of the control errors, control signals ($u$), and ground-reaction forces for the hip and knee joints over the complex validation process are noted in Table 3. The data imply that the GARB controller achieved good control performance with the preset learning rates in the high-speed task under different working conditions. The learning mechanism and the robust control technique generated the proper power for each test case to effectively realize the control objective. Some snapshots of the robot movement in the last experiment are shown in Figure 8.

Discussion
So far, in many humanoid robots, one has mainly focused on building complicated high-level control structures, while in the low-level framework simple controllers, such as PID or SMC, are normally employed to realize the given commands [28,33]. Obviously, to ensure the whole system operates as expected, auto-adjusting terms must be implemented at the upper-level framework to compensate for the imperfection of the simple low-level actions [34,35]. With such cross-over interference between the control layers, it is hard to provide high accuracy and fast responses for the overall system [28,36]. Indeed, in our real-time experiments with the legged robot, well-tuned PID controllers could be adopted for squatting tests in a certain case; when the working condition changed, the control system could be damaged due to the degraded performance of the PID controller. Of course, precision controllers could be employed in the low-level layer, but their simplicity of implementation and low computational burden should be preserved. The gain-adaptive robust backstepping control algorithm has been developed in compliance with these strict requirements.
As noted in the control signal Eq. (8), if one chooses $k_5 = 0$ and $a_1 = 0$, the nonlinear control method becomes an ordinary PID controller. In another sense, if the control gains $k_4$ and $k_5$ are removed, the control signal Eq. (8) reduces to the conventional form of the SMC scheme, in which $e_2$ is the sliding surface. Hence, users have various options in adopting the designed controller, which can easily be switched to basic control options [6,8,28,35]. Note also that the input gain constant ($a_2$) can be selected as an arbitrary positive constant, while the nominal dynamical constant ($a_1$) can be zero or any bounded value; their deviations are counted into the lumped disturbance ($d$) or the extended disturbance ($h$). One possible way to determine such terms is the model-based identification method presented in previous works [27,30,31].

Table 3.
Performance comparison of the GARB controllers in the multiple-joint tests.
Compared with other intelligent gain-learning algorithms, such as neural-network or fuzzy-logic engines, the low computational burden and fast response are noteworthy advantages [9-11, 37, 38]. Moreover, in some cases one does not need the nominal dynamics ($a_1 = 0$), and the proposed control method then becomes an overall model-free controller.
The experimental results have confirmed the superior performance of the gain-learning controllers over other robust adaptive nonlinear controllers, such as ARCESO and RISE [13,30], thanks to the high-degree learning mechanism. Furthermore, the designed controller improves on the former controller [27] by removing third-order time-derivative terms from the control signal, which increases its real-time applicability.
From the above analyses, the flexibility of the designed controller in terms of working efficiency and user implementation is intuitively observed. Its feasibility on mobile robots has also been confirmed by intensive experiments.

Summary
This chapter presents a gain-adaptive robust position-tracking controller for low-level subsystems of large robotic systems. The mathematical model of the system dynamics was reviewed to provide the necessary information for the controller design. To realize the tracking control objective, a robust control signal based on the backstepping scheme was adopted; in fact, this design is a nonlinear extension of the ordinary PID controller or the conventional sliding mode controller. New adaptation laws were developed to automatically tune the control gains for different working conditions. The learning mechanism is activated by various forms of the control error and deactivated by the relaxation functions.
Stability of the overall system was concretely maintained by proper Lyapunov-based constraints. Extensive real-time experiments were conducted to verify the performance of the proposed controller. The achieved results confirmed its advantages in robustness, adaptation, high accuracy, and fast response. Depending on the usage purpose, the controller can be simplified into a gain-learning PID controller or an adaptive robust sliding mode controller.

Appendix A. Proof of Lemma 1
Let us define a new disturbance term, and also synthesize a new state variable and a lumped term. By noting Eqs. (3), (4), (8), and (A.2), the closed-loop error dynamics are obtained, and a positive function is then studied in which $V_{100}$ is a properly selected positive constant.

B. Proof of Theorem 1
A new Lyapunov function is investigated.
where $P_1(t)$ is a positive function defined as follows; the proof of the function $P_1(t)$ can be found in Appendix C. The time derivative of the Lyapunov function is then derived in adoption of Eqs. (A.6) and (B.2) and the foregoing equations.

C. Proof of the positive function P1(t)
The function $P_1(t)$ expressed in Eq. (B.2) can be expanded using the error dynamics Eq. (A.3) and integral inequalities. The desired bound is then obtained by applying the integration procedures of previous works [8] and the comparison inequality.
The proof is completed by noting Lemma 1 and the foregoing conditions.

D. Proof of Lemma 2
By applying the control input Eq. (11) to the dynamics Eq. (2), the closed-loop system is obtained, and a new positive function is studied.

By employing the same discussion as in Lemma 1 under Assumption 2, Lemma 2 is proven.■

E. Proof of Theorem 2
Let us consider a Lyapunov function in which $P_2(t)$ is a properly chosen positive function. Positiveness of the function $P_2(t)$ can be established using arguments similar to those presented in Appendix C under suitable conditions. Substituting Eqs. (D.1) and (13) into the time derivative of the new Lyapunov function then leads to the desired result.

F. Inverse Kinematics of the robot leg
The desired angles of the leg joints (hip $x_{1dh}$ and knee $x_{1dk}$) can be calculated from the position of the foot (the end-effector) using inverse kinematics, where $l_1 = 0.21$ m and $l_2 = 0.295$ m are the link lengths of the robot (thigh and shank), respectively. $P_x$ and $P_y$ are the end-effector positions of the robot foot with respect to the robot coordinate frame set at the hip joint, as sketched in Figure 2(b). The feasible working range of the hip joint was selected to be $[0, +80]$ deg.
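The inverse kinematics above can be sketched as follows, using the link lengths given in the text. The elbow-down branch and the coordinate convention are assumptions; a round-trip check against the forward kinematics validates the solution:

```python
import math

l1, l2 = 0.21, 0.295   # thigh and shank lengths [m] from the text

def inverse_kinematics(px, py):
    """Return (hip, knee) angles [rad] placing the foot at (px, py)."""
    r2 = px * px + py * py
    c = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)   # law of cosines: cos(knee)
    c = max(-1.0, min(1.0, c))                     # clamp numerical noise
    knee = math.atan2(math.sqrt(1.0 - c * c), c)   # elbow-down branch assumed
    hip = math.atan2(py, px) - math.atan2(l2 * math.sin(knee),
                                          l1 + l2 * math.cos(knee))
    return hip, knee

def forward_kinematics(hip, knee):
    """Foot position, used only for a consistency check."""
    px = l1 * math.cos(hip) + l2 * math.cos(hip + knee)
    py = l1 * math.sin(hip) + l2 * math.sin(hip + knee)
    return px, py

hip, knee = inverse_kinematics(0.10, -0.40)
px, py = forward_kinematics(hip, knee)
```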