n = 1000.
Here, we consider two important classes of unconstrained optimization methods: conjugate gradient methods and trust region methods. These two classes of methods are very interesting; it seems that they are never out of date. First, we consider conjugate gradient methods. We also illustrate the practical behavior of some conjugate gradient methods. Then, we study trust region methods. Considering these two classes of methods, we analyze some recent results.
- conjugate gradient method
- hybrid conjugate gradient method
- three-term conjugate gradient method
- modified conjugate gradient method
- trust region methods
Remind to the unconstrained optimization problem which we can present as
where is a smooth function.
Here, we consider two classes of unconstrained optimization methods: conjugate gradient methods and trust region methods. Both of them are made with the aim to solve the unconstrained optimization problem (1).
In this chapter, at first, we consider the conjugate gradient methods. Then, we study trust region methods. Also, we try to give some of the most recent results in these areas.
2. Conjugate gradient method (shortly CG)
The conjugate gradient method is the method between the steepest descent method and the Newton method.
The conjugate gradient method in fact deflects the direction of the steepest descent method by adding to it a positive multiple of the direction used in the last step.
The restarting and the preconditioning are very important to improve the conjugate gradient method .
Consider positive definite quadratic function
where G is an symmetric positive definite matrix, , and is a real number.
Theorem 1.2.1.  (Property theorem of conjugate gradient method) For positive definite quadratic function (2), FR conjugate gradient method with the exact line search terminates after steps, and the following properties hold for all , :
where is the number of distinct eigenvalues of .
Now, we give the algorithm of conjugate gradient method.
Algorithm 1.2.1. (CG method).
Assumptions: and . Let , , , , , and .
Step 1. If , then STOP.
Step 2. Calculate the step-size by a line search.
Step 3. Calculate by any of the conjugate gradient method.
Step 4. Calculate .
Step 5. Set .
Step 6. Set and go to Step 1.
2.1 Convergence of conjugate gradient methods
Theorem 1.2.2.  (Global convergence of FR conjugate gradient method) Suppose that is continuously differentiable on a bounded level set
and let FR method be implemented by the exact line search. Then, the produced sequence has at least one accumulation point, which is a stationary point, i.e.:
When is a finite sequence, then the final point is a stationary point of .
When is an infinite sequence, then it has a limit point, and it is a stationary point.
In , a comparison of two methods, the steepest descent method and the conjugate gradient method which are used for solving systems of linear equations, is illustrated. The aim of the research is to analyze, which method is faster in solving these equations and how many iterations are needed by each method for solving.
The system of linear equations in the general form is considered:
where matrix is symmetric and positive definite.
The conclusion is that the method is a faster method than the , because it solves equations in less amount of time.
By the other side, the authors find that the method is slower but more productive than the , because it converges after less iterations.
So, we can see that one method can be used when we want to find solution very fast and another can converge to maximum in less number of iterations.
Again, we consider the problem (1), where is a smooth function and its gradient is available.
A hybrid conjugate gradient method is a certain combination of different conjugate gradient methods; it is made to improve the behavior of these methods and to avoid the jamming phenomenon.
An excellent survey of hybrid conjugate gradient methods is given in .
Three-term conjugate gradient methods were studied in the past (e.g., see [8, 32, 34], etc.); but, from recent papers about methods, we can conclude that maybe the mainstream is made by three-term and even four-term conjugate gradient methods. An interesting paper about a five-term hybrid conjugate gradient method is . Also, from recent papers we can conclude that different modifications of the existing methods are made, as well as different hybridizations of and methods.
Consider unconstrained optimization problem (1), where is a continuously differentiable function, bounded from below. Starting from an initial point , the three-term conjugate gradient method with line search generates a sequence , given by the next iterative scheme:
where is a step-size which is obtained from the line search, and
In the last relation, and are the conjugate gradient parameters, , , and . We can see that the search direction is computed as a linear combination of , , and .
In , the author suggests another way to get three-term conjugate gradient algorithms by minimization of the one-parameter quadratic model of the function . The idea is to consider the quadratic approximation of the function in the current point and to determine the search direction by minimization of this quadratic model. It is assumed that the symmetrical approximation of the Hessian matrix satisfies the general quasi-Newton equation which depends on a positive parameter:
In this paper the quadratic approximation of the function is considered:
The direction is computed as
where the scalar is determined as the solution of the following minimizing problem:
Using the idea of Perry , the author obtains
In fact, in this approach the author gets a family of three-term conjugate gradient algorithms depending of a positive parameter .
Next, in , the conjugate gradient () formula, with , is further studied. A three-term algorithm is presented, which has the sufficiently descent property without any conditions. The global convergence and the linear convergence are proven; moreover, the -step quadratic convergence with a restart strategy is established if the initial step length is appropriately chosen.
The first three-term Hestenes-Stiefel () method (method) can be found in .
Baluch et al.  describe a modified three-term Hestenes-Stiefel () method. Although the earliest conjugate gradient method achieves global convergence using an exact line search, this is not guaranteed in the case of an inexact line search. In addition, the method does not usually satisfy the descent property. The modified three-term conjugate gradient method from  possesses a sufficient descent property regardless of the type of line search and guarantees global convergence using the inexact Wolfe-Powell line search [50, 51]. The authors also prove the global convergence of this method. The search direction, which is considered in , has the next form:
In , an accelerated three-term conjugate gradient method is proposed, in which the search direction satisfies the sufficient descent condition as well as extended Dai-Liao conjugacy condition:
This method seems different from the existent methods.
Next, Li-Fushikuma quasi-Newton equation is
In , some new conjugate gradient methods are extended, and then some three-term conjugate gradient methods are constructed. Namely, the authors remind to [41, 42], with its conjugate gradient parameters, respectively:
The three-term and methods are introduced in .
The search direction can be expressed as
An important property of the proposed methods is that the search direction always satisfies the sufficient descent condition without any line search, that is, the next relation always holds
Under the standard Wolfe line search and the classical assumptions, the global convergence properties of the proposed methods are proven.
Motivated by , as well as by , in , a new hybrid nonlinear CG method is proposed; it combines the features of five different CG methods, with the aim of combining the positive features of different non-hybrid methods. The proposed method generates descent directions independently of the line search. Under some assumptions on the objective function, the global convergence is proven under the standard Wolfe line search. Conjugate gradient parameter, proposed in , is
Let’s note that the proposed method is hybrid of , , , , and .
The behaviors of the methods , , , , , and are illustrated by the next tables.
The test criterion is CPU time.
The tests are performed on the computer Workstation Intel Celeron CPU 1,9 GHz.
The experiments are made on the test functions from .
Each problem is tested for a number of variables and .
In , based on the numerical efficiency of Hestenes-Stiefel () method, a new modified algorithm is proposed for unconstrained optimization. The new direction independent of the line search satisfies the sufficient descent condition. Motivated by theoretical and numerical features of three-term conjugate gradient () methods proposed by , similar to the approach in , the new direction is computed by minimizing the distance between the direction and the direction of the three-term methods proposed by . Under some mild conditions, the global convergence of the new method for general functions is established when the standard Wolfe line search is used. In this paper the conjugate gradient parameter is given by
where is a parameter. Also, the global convergence is proven under standard conditions.
3. Trust region methods
We remind that the basic idea of Newton method is to approximate the objective function around by using a quadratic model:
where , , and also use the minimizer of to set .
Also, remind that Newton method can only guarantee the local convergence, i.e., when is small enough and the method is convergent locally.
Further, Newton method cannot be used when Hessian is not positive definite.
There exists another class of methods, known as trust region methods. It does not use the line search to get the global convergence, as well as it avoids the difficulty which is the consequence of the nonpositive definite Hessian in the line search.
Furthermore, it produces greater reduction of the function than line search approaches.
Here, we define the region around the current iterate:
where is the radius of , inside which the model is trusted to be adequate to the objective function.
Our further intention is to choose a step which should be the approximate minimizer of the quadratic model in the trust region. In fact, should be the approximately best point on the sphere:
with the center and the radius .
In the case that this step is not acceptable, we reduce the size of the step, and then we find a new minimizer.
This method has the rapid local convergence rate, and that’s the property of Newton method and quasi-Newton method, too, but the important characteristic of trust region method is also the global convergence.
Since the step is restricted by the trust region, this method is also called the restricted step method.
The model subproblem of the trust region method is
where is the trust region radius and is a symmetric approximation of the Hessian .
In the case that we use the standard norm , is the minimizer of in the ball of radius . Generally, different norms define the different shapes of the trust region.
The problem by itself is the choice of at each single iteration.
If the agreement between the model and the objective function is satisfactory enough, the value should be chosen as large as it is possible. The expression is called the actual reduction, and the expression is called the predicted reduction; here, we emphasize that
measures the agreement between the model function and the objective function .
If is close to 0 or it is negative, the trust region is going to shrink; otherwise, we do not change the trust region.
The conclusion is that is important in making the choice of new iterate as well as in updating the trust region radius . Now, we give the trust region algorithm.
Algorithm 1.3.1. (Trust region method).
Assumptions: , , , , , and .
Step 1. If , then STOP.
Step 3. Compute and . Set
Step 4. If , then .
If , then .
If and , then .
Step 5. Generate , update , set , and go to Step 1.
In Algorithm 1.3.1, is a bound for all . Those iterations with the property (and so those for which ) are called very successful iterations; the iterations with the property (and so those for which ) are called successful iterations; and the iterations with the property (and so those for which ) are called unsuccessful iterations. Generally, the iterations from the two first cases are called successful iterations.
Some choices of parameters are , , , , , and . The algorithm is insensitive to change of these parameters.
Next, if , then can be chosen from on the basis of a polynomial interpolation.
In the case of quadratic interpolation, we set
3.1 Convergence of trust region methods
Assumption 1.3.1 (Assumption ).
We assume that the approximations of Hessian are uniformly bounded in norm and the level set is bounded, as well as is continuously differentiable on . We allow the length of the approximate solution of the subproblem (17)–(18) to exceed the bound of the trust region, but we also assume that
where is a positive constant.
In this kind of trust region way of thinking, generally we do not seek an accurate solution of the subproblem (17)–(18); we are satisfied by finding a nearly optimal solution of the subproblem (17)–(18).
Strong theoretical as well as numerical results can be obtained if the step , produced by Algorithm 1.3.1, satisfies
Theorem 1.3.1  Under Assumption , if Algorithm 3.1 has finitely many successful iterations, then it converges to the first-order stationary point.
Theorem 1.3.2  Under Assumption , if Algorithm 3.1 has infinitely many successful iterations, then
In , it is emphasized that trust region methods are very effective for optimization problems and a new adaptive trust region method is presented. This method combines a modified secant equation with the update formula and an adaptive trust region radius, where the new trust region radius makes use of not only the function information but also the gradient information. Let be a positively definite matrix based on modified Cholesky factorization . Under suitable conditions, in  the global convergence is proven; also, the local superlinear convergence of the proposed method is demonstrated. Motivated by the adaptive technique, the proposed method possesses the following nice properties:
The trust region radius uses not only the gradient value but also the function value.
Computing the matrix of the inverse and the value of , at each iterative point , is not required.
The computational time is reduced.
A modified secant equation is introduced:
where , , and .
When is twice continuously differentiable and is generated by the formula, where , this modified secant Eq. (19) possesses the following nice property:
and this property holds for all .
Under classical assumptions, the global convergence of the method presented in  is also proven in this paper.
In , the hybridization of monotone and non-monotone approaches is made; a modified trust region ratio is used, in which more information is provided about the agreement between the exact and the approximate models. An adaptive trust region radius is used, as well as two accelerated Armijo-type line search strategies to avoid resolving the trust region subproblem whenever a trial step is rejected. It is shown that the proposed algorithm is globally and locally superlinearly convergent. In this paper trust region methods are denoted shortly by ; it is emphasized that in method, having in view that the iterative scheme is
and it often happens that is an approximate solution of the following quadratic subproblem:
Performance of the methods is much influenced by the strategy of choosing the radius at each iteration. To determine the radius , in the standard method, the agreement between and is evaluated by the so-called ratio :
When is negative or a small positive number near to zero, the quadratic model is a poor approximation of the objective function. In such situation, should be decreased and, consequently, the subproblem (20) should be solved again. However, when is close to , it is reasonable to use the quadratic model as an approximation of the objective function. So, the step should be accepted and can be increased. Here, the authors use the modified version of :
where , , , and . Also,
where which is originally used by Toint .
The conjugate gradient methods and trust region methods are very popular now.
Many scientists consider these methods.
Namely, different modifications of these methods are made, with the aim to improve them.
Next, the scientists try to make not only new methods but also whole new classes of methods. For the specific values of the parameters, individual methods are distinguished from these classes. It is always more desirable to make a class of methods instead of individual methods.
Hybrid conjugate gradient methods are made in many different ways; this class of conjugate gradient methods is always actual.
Further, one of the contemporary trends is to use BFGS update in constructions of new conjugate gradient methods (e.g., see ).
Finally, let us emphasize that contemporary papers often use the Picard-Mann-Ishikawa iterative processes and they make the connection of these kinds of processes with the unconstrained optimization (see [29, 37, 38]).