Numerical comparison of NM, DUSM, and SM.
Newton-type methods with diagonal update to the Jacobian matrix are regarded as one most efficient and low memory scheme for system of nonlinear equations. One of the main advantages of these methods is solving nonlinear system of equations having singular Fréchet derivative at the root. In this chapter, we present a Jacobian approximation to the Shamanskii method, to obtain a convergent and accelerated scheme for systems of nonlinear equations. Precisely, we will focus on the efficiency of our proposed method and compare the performance with other existing methods. Numerical examples illustrate the efficiency and the theoretical analysis of the proposed methods.
- Newton method
- Shamanskii method
- diagonal updating scheme
- nonlinear equations
- Jacobian matrix
A large aspect of scientific and management problems is often formulated by obtaining the values of of which the function evaluation of that variable is equal to zero . The above description can be represented mathematically by the following system of nonlinear equations:
where are vectors and is nonlinear functions for . The above system of equations (1) can be written as
where is continuously differentiable in an open neighborhood of the solution . These systems are seen as natural description of observed phenomenon of numerous real-life problems whose solutions are seen as an important goal in mathematical study. Recently, this area has been studied extensively [2, 3]. The most powerful techniques for handling nonlinear systems of equations are to linearize the equations and proceed to iterate on the linearized set of equations until an accurate solution is obtained . This can be achieved by obtaining the derivative or gradient of the equations. Various scholars stress that the derivatives should be obtained analytically rather than using numerical approach. However, this is usually not always convenient and, in most cases, not even possible as equations may be generated simply by a computer algorithm . For one variable problem, system of nonlinear equation defined in (2) represents a function where is continuous in the interval .
Definition 1: Consider a system of equations ; the solution of this system in one variable, two variables, and variable is referred to as a point such that .
In general, the problem to be considered is that for some function , one wishes to evaluate the derivative at some points , i.e.,
This often used to represent an instantaneous change of the function at some given points .
Definition 2: For a function that is smooth, then there exists, at any point , a vector of first-order partial derivative or gradient vector:
The Taylor’s series expansion of the function about point is an ideal starting point for this discussion .
Definition 3: Let be a differentiable function; the Taylor’saround a point is the infinite sum:
However, continuous differentiable vector valued function does not satisfy the mean value theorem (MVT), an essential tool in calculus . Hence, academics suggested the use of the following theorem to replace the mean valued theorem.
Theorem 1: Let be continuously differentiable in an open convex set . For any
Definition 4: Suppose is continuously differentiable at the point and each component function is also continuously differentiable at ; then the derivative of is defined as
Also, if is differentiable at point , then the affine function is a good approximation to near in such a way that
If all the given component functions of are continuous, then we say the function is differentiable.
The most famous method for solving nonlinear systems of equations is the Newton method which generates a sequence from any given initial point via the following:
where is the Jacobian for . The above sequence Eq. (5) is said to converge quadratically to the solution if is sufficiently near the solution point and the Jacobian is nonsingular [7, 8]. This convergent rate makes the method outstanding among other numerical methods. However, Jacobian evaluation and solving the linear system for the step are expensive and time-consuming . This led to the study of different variants of Newton methods for systems of nonlinear equations. One of the simplest and low-cost variants of the Newton method that almost entirely evades derivate evaluation at every iteration is the chord method. This scheme computes the Jacobian matrix once throughout the iteration process for finite dimensional problem as presented in Eq. (6),
The rate of convergence is linear and improves as the initial point gets better. Suppose is sufficiently chosen near solution point and is nonsingular; then, for some , we have
The convergence theorems and proof of Eq. (7) can be referred to [9, 10]. Motivated by the excellent convergence of Newton method and low cost of Jacobian evaluation of chord method, a method due originally to Shamanskii [11, 12] that lies between Newton method and chord method was proposed and has been analyzed in Kelly [9, 13, 14, 15]. Other variants of Newton methods with different Jacobian approximation schemes include [9, 14, 16, 17, 18]. However, most of these methods require the computation and storage of the full or approximate Jacobian, which become very difficult and time-consuming as the dimension of systems increases [10, 19].
It would be worthwhile to construct a derivative-free approach and analyze with existing techniques [20, 21, 22]. The aim of this work is to derive a diagonal matrix for the approximate Jacobian of Shamanskii method by means of variational techniques. The expectation would be to reduce computational cost, storage, and CPU time of evaluating any problem. The proposed method works efficiently by combining the good convergence rate of Shamanskii method and the derivate free approach employed, and the results are very encouraging. The next section presents the Shamanskii method for nonlinear systems of equations.
2. Shamanskii method
It is known that the Newton method defined in Eq. (2) converges quadratically to when the initial guess is sufficiently close to the root [7, 10, 19]. The major concern about this method is the evaluation and storage of the Jacobian matrix at every iteration . A scheme that almost completely overcomes this is the chord method. This method factored the Jacobian matrix only once in the case of a finite dimensional problem, thereby reducing the evaluation cost of each iteration as in Eq. (3) and thereby degrading the convergence rate to linear .
Motivated by this, a method due originally to Shamanskii  was developed and analyzed by [7, 13, 14, 16, 24]. Starting with an initial approximation , this method uses the multiple pseudo-Newton approach as described below:
after little simplification, we have
This method converges superlinearly with -order of at least when the initial approximation is sufficiently chosen near the solution point and is nonsingular. This implies that there exists , such that
Combining Eq. (7) and the quadratic convergence of Newton method produces the convergence rate of the Shamanskii method as in Eq. (8). Thus, the balance is between the reduced evaluation cost of Fréchet derivative and Jacobian computation for Shamanskii method and Newton method rapid convergence. This low-cost derivative evaluation as well as the rapid convergence rate of several methods including the Shamanskii method has been studied and analyzed in [13, 15]. From the analysis, the researchers conclude that that Shamanskii method has shown superior performance compared to Newton method in terms of efficiency whenever work is measured in terms of function evaluations . Also, if the value of is sufficiently chosen, then, as the dimension increases, the performance of the Shamanskii method improves and thus reduces the limit of complexity of factoring the approximate Jacobian for two pseudo-Newton iterations . Please refer to  for the proof of the convergence theorem described below.
Theorem 2 : Let conform hypotheses H, H, and H. Then the solution point is a point of attraction of the Shamanskii iteration, i.e., Eq. (10), and this method possesses at least cubic order of convergence.
3. Diagonal updating scheme for solving nonlinear systems
Evaluation or inversion of the Jacobian matrix at every iteration or after few iterations does not seem relevant even though the computational cost has generally been reduced as in Shamanskii method [14, 25, 26, 27, 28]. As a matter of fact, it can be easily shown that by adding a diagonal updating scheme to a method, we would have a new low memory iterative approach which would approximate the Jacobian into nonsingular diagonal matrix that can be updated in every iteration [29, 30, 31]. Indeed, using the Shamanskii procedure, the proposed method avoids the main complexity of the Newton-type methods by reusing the evaluated Jacobian during the iteration process. This is the basic idea of the Shamanskii-like method which is described as follows.
Given an initial approximation , we compute Eq. (2) to obtain the Jacobian and present a diagonal approximation to the Jacobian say as follows:
Suppose and ; by mean value theorem (MVT), we have
Since is the update of diagonal matrix , let us assume satisfy the weak secant equation:
which would be used to minimize the deviation between and under some norms. The updated formula for follows after the theorem below:
Theorem 3: Suppose is the update of the diagonal matrix and , . Consider the problem
where , , and is the trace operation.
Proof: It is known that the objective function and the constraint of Eq. (16) are convex; thus, we intend to use its Lagrangian function to obtain the unique solution as follows:
where is the corresponding Lagrangian multiplier. Simplifying Eq. (18), we have
Also, for diagonal matrix , the element of the diagonal component is given as, and the component of the vector is . Then . To complete the proof, we rewrite Eq. (20) as follows:
This completes the proof.
Now, from the above description of the theorem, we deduce that the best possible diagonal update is as follows:
However, for possibly small and , we need to define a condition that would be applied for such cases. To this end, we require that for some chosen small . Otherwise, we set the updated diagonal where is defined as
Thus, the proposed accelerated method is described as follows:
The performance of this proposed method would be tested on well-known benchmark problems employed by researchers on existing methods. This would be carried out using a designed computer code for its algorithm. The problems could be artificial or real-life problems. The artificial problems check the performance of any algorithm in situation such as point of singularity, function with many solutions, and null space effect as presented in Figures 1–3 [7, 32].
While the real-life problems emerge from fields such as chemistry, engineering, management, etc., the real-life problems often contain large data or complex algebraic expression which makes it difficult to solve.
4. Numerical results
This section demonstrates the proposed method and illustrates its advantages on some benchmark problems with dimensions ranging from 25 to 1,000 variables. These include problems with restrictions such as singular Jacobian or problems with only one point of singularity. To evaluate the performance of the proposed diagonal updating Shamanskii method (DUSM), we employ some tools by Dolan and Moré  and compare the performance with two classical Newton-type methods based on the number of iterations and CPU time in seconds. The methods include:
The Newton method (NM)
The Shamanskii method (SM)
These tools are used to represent the efficiency, robustness, and numerical comparisons of different algorithms. Suppose there exist solvers and problems; for each problem and solver they define:
Requiring a baseline for comparisons, they compared the performance on problem by solver with the best performance by any solver for this problem using the performance ratio:
We suppose that parameter for all is chosen and if and only if solver does not solve problem . The performance of solvers on any given problem might be of interest, but because we would prefer obtaining the overall assessment of the performance of the solver, then it was defined as
Thus, was the probability for solver that a performance ratio was within a factor of the best possible ratio. Then, function was the cumulative distribution function for the performance ratio. The performance profile for a solver was nondecreasing, piecewise, and continuous from right. The value of is the probability that the solver will win over the rest of the solvers. In general, a solver with high value of or at the top right of the figure is preferable or represents the best solver.
All problems considered in this research are solved using MATLAB (R 2015a) subroutine programming . This was run on an Intel® Core™ i5-2410M CPU @ 2.30 GHz processor, 4GB for RAM memory and Windows 7 Professional operating system. The termination condition is set as
and the program has been designed to terminate whenever:
The number of iterations exceeds 500, and no point of satisfies the termination condition.
The CPU time in seconds reaches 500.
Insufficient memory to initiate the run.
At the point of failure due to any of the above conditions as in the tabulated results, it is assumed the number of iteration and CPU time is zero and thus that point has been denoted by “.” The following are the details of the standard test problems, the initial points used, and the exact solutions for systems of nonlinear equations.
Problem 1 : System of nonlinear equations
Problem 2 : Systems of nonlinear equations
Problem 3 : Structured exponential function
Problem 4 : Extended trigonometric of Byeong-Chun
Problem 5 : Extended spare system of Byeong
Table 1 shows the number of iterations (NI) and CPU time for Newton method (NM), Shamanskii method (SM), and the proposed diagonal updating method (DUSM), respectively. The performance of these methods was analyzed via storage locations and execution time. It can be observed that the proposed DUSM was able to solve the test problems perfectly, while NM and SM fail at some points due to the matrix being singular to working precision. This shows that the diagonal scheme employed has provided an option in the case of singularity, thereby reducing the computational cost of the classical Newton-type methods.
This chapter proposes a diagonal updating formula for systems of nonlinear equations which attributes to reduction in Jacobian evaluation cost. By computational experiments, we reach the conclusion that the proposed scheme is reliable and efficient and reduces Jacobian computational cost during the iteration process. Meanwhile, the proposed scheme is superior compared to the result of the classical and existing numerical methods for solving systems of equations.