A Reduced-Order Gauss-Newton Method for Nonlinear Problems Based on Compressed Sensing for PDE Applications

A global regularized Gauss-Newton (GN) method is proposed to obtain a zero residual for square nonlinear problems on an affine subspace built from wavelets, which allows reducing systems that arise from the discretization of nonlinear elliptic partial differential equations (PDEs) without performing a priori simulations. This chapter introduces a Petrov-Galerkin (PG) GN approach together with the standard assumptions that ensure retaining the q-quadratic rate of convergence. It also proposes a regularization strategy, which maintains the fast rate of convergence, to avoid singularities and high nonlinearities, and includes a line-search method for achieving global convergence. The numerical results demonstrate the capability of the algorithm to reproduce the essential features of the full-order model (FOM) while decreasing the runtime by a significant margin. This chapter refers to the wavelet-based reduced-order model (ROM) as WROM, while PROM denotes its proper orthogonal decomposition (POD)-based counterpart. The authors also implemented the combination of WROM and PROM as a hybrid method, referred to herein as HROM. Preliminary results with Bratu's problem show that if the WROM correctly reproduces the FOM behavior, then HROM also reproduces the FOM accurately.


Introduction
The Newton method for solving square nonlinear problems is one of the most popular techniques in engineering applications due to its simplicity and fast convergence rate [1][2][3]. However, the quality of the final numerical results is affected by possible singularity of the Jacobian and by high nonlinearity. Another drawback is that the method depends on the initial point. Therefore, it is necessary to implement a globalization strategy to obtain the solution independently of the initial guess. One such approach is the line-search method, which relies on a suitable merit function that ensures the iterates progress toward a solution of the problem.
From the numerical point of view, the technique requires solving square linear systems several times; it is necessary to carry out function evaluations of the order of the problem size, as well as to compute first-order information of the function to form the Jacobians. For high-dimensional nonlinear problems, the method can exceed the capacity of the computer memory or become slow when solving these linear systems, even if only a few iterations are needed. One current research activity focuses on solving large-scale square nonlinear problems in real time. The purpose of this chapter is to provide an algorithm for solving large-scale square nonlinear problems in real time while retaining the fast convergence rate. One strategy for addressing such challenges is to characterize an affine subspace, of much lower dimension than the original one, that contains the initial solution and thus reproduces the problem's principal features.
One procedure to characterize the affine subspace consists of solving the full-order model (FOM) at several input points, whose solutions are called snapshots, and then using a principal component analysis method such as the singular value decomposition (SVD) to build an orthonormal basis that spans the majority of the snapshots' energy. One then seeks a solution in this low-dimensional subspace and projects it back onto the original one. Good numerical results have already been reported in the literature [1,3-7], but there are still open questions about this procedure, such as how and when to choose the snapshots and their number [8,9]. It is important to emphasize that every snapshot requires solving the FOM, regardless of its cost. This chapter thus promotes a new strategy that is snapshot free. The approach originated in signal processing and consists of using wavelets to compress data into a subspace of smaller dimension that retains the majority of the original energy [10,11]. The discrete wavelet's low-pass matrix characterizes the affine subspace; the optimization is then performed in this compressed subspace to obtain a cheaper solution that can be decompressed to its original size.
2. Reduced-order models using wavelet transformations

Wavelets and data compression
The rationale for using wavelet transformations for model reduction originates from the fields of image processing, image compression, and transform coding. In these fields, massive amounts of information, for example images, are broadcast over limited-bandwidth communication lines and networks such as the internet. One needs to compress these signals for quick transmission and to diminish storage requirements [10]. In summary, data compression consists of the following: given a signal $x \in \mathbb{R}^n$, find a lower-dimensional signal $\hat{x} \in \mathbb{R}^r$ with $r \ll n$, and broadcast or store only $\hat{x}$. The most widely used techniques for data compression are based on the wavelet transform. The key to using wavelets is a property known as energy compaction: wavelets tend to accumulate energy in the low-frequency sub-band of the wavelet decomposition [3,12-14]. The energy relates to the $L_2$-norm and is defined as $e = \|x\|^2 = \sum_{i=1}^{n} x_i^2$. To demonstrate energy compaction, consider Figure 1, which shows an original image $x \in \mathbb{R}^{512\times512}$. Notice that the upper left quadrant of the wavelet decomposition is a low-dimensional approximation $\hat{x} \in \mathbb{R}^{256\times256}$, one-fourth the size of the original signal, corresponding to the low-frequency sub-band wavelet coefficients. With the energy measure above, the energy enclosed in $\hat{x}$ is 95.75% using just one-fourth of the coefficients. Since $\hat{x}$ comprises most of the energy, a simple data compression scheme would set all of the other wavelet sub-bands to zero and store only the low-frequency information, as in Figure 1 (bottom center). From $\hat{x}$ alone, one can reproduce an approximation of $x \in \mathbb{R}^n$ by the generalized inverse of the sub-band compression [10]. The next section provides details.
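Energy compaction is easy to observe numerically. The sketch below (an illustrative 1D example, not the chapter's 512×512 image) applies one level of the Haar wavelet to a smooth signal and measures the fraction of the $L_2$ energy captured by the low-frequency sub-band:

```python
import numpy as np

# One-level Haar analysis of a smooth signal: pairwise scaled averages form
# the low-pass sub-band and pairwise scaled differences the high-pass one.
n = 512
t = np.linspace(0.0, 1.0, n)
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)

low = (x[0::2] + x[1::2]) / np.sqrt(2.0)    # compressed signal, R^{n/2}
high = (x[0::2] - x[1::2]) / np.sqrt(2.0)

# The transform is orthonormal, so the sub-band energies add up to ||x||^2.
e_total = np.sum(x**2)
e_low = np.sum(low**2)
print(f"low-band energy fraction: {e_low / e_total:.6f}")
```

For a smooth signal the printed fraction is very close to 1, mirroring the 95.75% figure quoted for the image example.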

Reduced-order models
Let $W \in \mathbb{R}^{n\times n}$ denote an orthonormal wavelet transform. The transformation comprises a low-pass submatrix $L \in \mathbb{R}^{r\times n}$ and a high-pass submatrix $H \in \mathbb{R}^{s\times n}$, with $r + s = n$:

$$W = \begin{bmatrix} L \\ H \end{bmatrix}.$$

By orthogonality, $\operatorname{rank}(W) = n$, $\operatorname{rank}(L) = r$, and $\operatorname{rank}(H) = s$. Moreover, by the orthogonality of $W$,

$$\|x\|^2 = \|Wx\|^2 = \|Lx\|^2 + \|Hx\|^2.$$

Choosing $L$ so that it contains the majority of the energy, that is, $\|Hx\| \approx 0$, one has $\|x\| \approx \|Lx\|$. This means that the energy of the original data $x \in \mathbb{R}^n$ is approximately equal to the energy of the compressed data $\hat{x} = Lx \in \mathbb{R}^r$; that is, $\|x\| \approx \|\hat{x}\|$. The decompressed data are $\tilde{x} = L^T\hat{x} \in \mathbb{R}^n$. Furthermore, the compressed and decompressed energies are equal, $\|\hat{x}\| = \|\tilde{x}\|$, since $LL^T = I_r$. Therefore, the original, compressed, and decompressed data are related by $\hat{x} = Lx$ and $\tilde{x} = L^T\hat{x}$, where $x \in \mathbb{R}^n$ is the original data, $\hat{x} \in \mathbb{R}^r$ is the compressed data, and $\tilde{x} \in \mathbb{R}^n$ is the decompressed data. Thus, once an appropriate low-pass submatrix $L$ is determined, one proposes solving the corresponding optimization problem in the reduced affine subspace determined by $L^T$ and later returning to the original size via its generalized inverse.
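A minimal sketch of the compress/decompress pipeline, using an explicit one-level Haar low-pass matrix as $L$ (the function name `haar_lowpass` is ours, for illustration):

```python
import numpy as np

def haar_lowpass(n):
    """One-level Haar low-pass matrix L in R^{r x n}, r = n // 2 (n even).

    The rows are orthonormal, so L @ L.T equals I_r, and the decompressed
    data x_tilde = L.T @ (L @ x) approximates x for smooth signals.
    """
    assert n % 2 == 0
    r = n // 2
    L = np.zeros((r, n))
    for i in range(r):
        L[i, 2 * i] = L[i, 2 * i + 1] = 1.0 / np.sqrt(2.0)
    return L

n = 8
L = haar_lowpass(n)
x = np.linspace(1.0, 2.0, n)           # smooth "data" vector
x_hat = L @ x                          # compressed data, in R^r
x_tilde = L.T @ x_hat                  # decompressed data, in R^n

print(np.allclose(L @ L.T, np.eye(n // 2)))      # -> True  (L L^T = I_r)
print(np.linalg.norm(x_hat), np.linalg.norm(x))  # compressed vs original energy
```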

Statement of the problem
Given a nonlinear function $R$ from $\mathbb{R}^n$ to $\mathbb{R}^n$, find a solution in the affine subspace determined by an initial displacement point $x_o \in \mathbb{R}^n$ and an orthonormal basis $L^T \in \mathbb{R}^{n\times r}$, $r < n$. That is, find $x^* \in x_o + \eta(L^T)$ such that $R(x^*) = 0$, where $\eta(L^T)$ is the subspace generated by the linear combinations of the columns of the operator $L^T$.

Overdetermined problem
This section reformulates the problem using the overdetermined functions $\phi$ and $H$ from $\mathbb{R}^r$ to $\mathbb{R}^n$,

$$\phi(p) = x_o + L^T p, \qquad H(p) = R(\phi(p)).$$

The key observation is that finding a solution $p^* \in \mathbb{R}^r$ of the overdetermined problem $H(p^*) = 0$ is equivalent to finding the solution one is initially seeking: $x^* = \phi(p^*)$ satisfies $R(x^*) = 0$ and lies on the affine subspace. Therefore, one studies the problem of finding a zero residual of the nonlinear least-squares problem associated with $H$. Problem (4) is called an overdetermined zero-residual problem.
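In code, $\phi$ and $H$ are just closures over $R$, $L$, and $x_o$. The sketch below uses a toy linear residual; all names are illustrative, not the chapter's implementation:

```python
import numpy as np

def make_composite(R, L, x0):
    """Build phi(p) = x0 + L.T @ p and H(p) = R(phi(p)).

    R : callable R^n -> R^n (residual), L : (r, n) low-pass matrix,
    x0 : displacement vector.  Names are ours; the chapter only fixes
    the mathematical definitions of phi and H.
    """
    phi = lambda p: x0 + L.T @ p
    H = lambda p: R(phi(p))
    return phi, H

# Toy residual: R(x) = x - 1, whose root is the all-ones vector.
R = lambda x: x - 1.0
L = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # trivial orthonormal basis
phi, H = make_composite(R, L, x0=np.array([0.0, 0.0, 1.0]))
print(H(np.array([1.0, 1.0])))   # -> [0. 0. 0.]
```

Here the displacement supplies the third component of the root, so the reduced problem $H(p) = 0$ has the exact solution $p^* = (1, 1)$.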

Nonlinear least-squares problem
The residual problem (4) is immediately seen to be equivalent to solving the nonlinear zero-residual least-squares problem

$$\min_{p \in \mathbb{R}^r} f(p) = \frac{1}{2}\|H(p)\|^2 = \frac{1}{2}\sum_{i=1}^{n} r_i(p)^2,$$

where $r_i(p)$ denotes the $i$-th component of $H(p)$.

First derivatives for the residual functions R and H
The Jacobian of $R$ at $x \in \mathbb{R}^n$ is denoted $J_R(x)$. Since $J_\phi(p) = L^T$, a direct application of the chain rule yields

$$J_H(p) = J_R(\phi(p))\,L^T.$$

First and second derivatives for problem (5)

The gradient of problem (5) is

$$\nabla f(p) = J_H(p)^T H(p) = \sum_{i=1}^{n} r_i(p)\,\nabla r_i(p),$$

and the second-order information is

$$\nabla^2 f(p) = J_H(p)^T J_H(p) + \sum_{i=1}^{n} r_i(p)\,\nabla^2 r_i(p).$$

Gauss-Newton method
This section presents a Gauss-Newton method to solve the nonlinear problem (5), which is equivalent to solving the overdetermined nonlinear composite problem (4). It describes the standard Newton assumptions for this composite function problem that yield a q-quadratic rate of convergence. The inconvenience of using Newton's method is that the second-order information associated with the Hessian is not easily accessible or is too costly in computational time, which makes Newton's method impractical for very large-scale problems.

Model order-reduction-based Gauss-Newton algorithm
This subsection presents a reduced-order Gauss-Newton algorithm for solving problem (5), which is the interest herein.

Algorithm 1. Reduced-order Gauss-Newton (ROGN)
Inputs: Given the compressed basis $L^T \in \mathbb{R}^{n\times r}$ and an initial displacement $x_o \in \mathbb{R}^n$.
Output: Approximate solution in the affine subspace $x \in \mathbb{R}^n$.
1: Initial point of the problem. Given $p_o \in \mathbb{R}^r$.
2: Initial point in the affine subspace. $x_0 = \phi(p_0) = x_o + L^T p_0$.
Remarks: 1. The algorithm uses two initial points: the first, $x_o$, is the displacement that characterizes the affine subspace, and the second, $p_o$, is the initial point for the iteration.

2. The update $\phi(p_{k+1})$ is the approximation to the solution one is looking for, which one denotes by $x_{k+1}$.
3. Finding the Gauss-Newton direction $\Delta p_{k+1}$ is equivalent to solving the following linear least-squares problem:

$$\min_{\Delta p \in \mathbb{R}^r} \left\| J_H(p_k)\,\Delta p + H(p_k) \right\|.$$

4. The Gauss-Newton direction is the Petrov-Galerkin direction obtained by approximating Newton's direction of the square nonlinear problem, with trial space $\operatorname{range}(L^T)$ and weight (test) space $\operatorname{range}(J_R(x_k) L^T)$:

$$\left(J_R(x_k)\,L^T\right)^T \left[\,J_R(x_k)\,L^T \Delta p + R(x_k)\,\right] = 0.$$
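Remark 3 can be checked directly: the Gauss-Newton direction is the solution of a linear least-squares problem in the reduced variable. A sketch, with a toy residual whose zero lies in the subspace (all names illustrative):

```python
import numpy as np

def gauss_newton_direction(JR_x, L, Hp):
    """Gauss-Newton direction for the composite H(p) = R(phi(p)).

    By the chain rule J_H(p) = J_R(phi(p)) @ L.T, so the direction solves
    the linear least-squares problem  min_dp || J_H dp + H(p) ||.
    """
    JH = JR_x @ L.T
    dp, *_ = np.linalg.lstsq(JH, -Hp, rcond=None)
    return dp

# Toy check: R(x) = x - 1 is linear, so one step lands on the reduced root.
L = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
x0 = np.zeros(3)
p = np.zeros(2)
x = x0 + L.T @ p
dp = gauss_newton_direction(np.eye(3), L, x - 1.0)
print(p + dp)   # -> [1. 1.]
```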

Local convergence of reduced Gauss-Newton algorithm
It is known that the Gauss-Newton method retains a q-quadratic rate of convergence under standard assumptions for zero-residual single-function problems [15]. The natural question is: what are the standard assumptions on the composite function at hand that guarantee the Gauss-Newton conditions and preserve the q-quadratic rate of convergence? The next theorem establishes these assumptions.
Theorem. Let $R: \mathbb{R}^n \to \mathbb{R}^n$ have a solution $x^* \in D \subset \mathbb{R}^n$, with $D$ convex and open, let $J_R$ be Lipschitz continuous and bounded on $D$, and let $J_R(x^*)$ be nonsingular. Let $L$ and $L^T$ be orthonormal operators with $r < n$, and assume there exists a solution $p^* \in \hat{D} \subset \mathbb{R}^r$, with $\hat{D}$ convex and open, such that $\phi(p^*) = x^*$. Then the sequence $\{p_{k+1}\}$ generated by the ROGN algorithm 1 is well defined, converges to $p^*$, and has a q-quadratic rate of convergence.

Proof: The residual problem $R$ has a solution $x^*$ on $D$. First, one proves that the Jacobian of $H$ is Lipschitz on $\hat{D}$. Since $J_H(p) = J_R(\phi(p)) L^T$, $\|L^T\| = 1$, and the Jacobian of $R$ is Lipschitz on $D$ with constant $\gamma$, one concludes

$$\|J_H(p) - J_H(q)\| \le \gamma\,\|\phi(p) - \phi(q)\| = \gamma\,\|L^T(p - q)\| \le \gamma\,\|p - q\|.$$

Second, one proves that the Jacobian of $H$ is bounded on $\hat{D}$. Since the Jacobian of $R$ is bounded on $D$ by some $\beta > 0$, one concludes $\|J_H(p)\| \le \|J_R(\phi(p))\|\,\|L^T\| \le \beta$. Finally, one proves that the smallest eigenvalue of $J_H(p^*)^T J_H(p^*)$ is greater than zero.
Let $p \neq 0 \in \mathbb{R}^r$ and $\sigma \in \mathbb{R}$ be an eigenvector and eigenvalue of this last symmetric matrix. Then

$$\sigma\,\|p\|^2 = p^T J_H(p^*)^T J_H(p^*)\,p = \|J_R(x^*)\,L^T p\|^2 > 0,$$

since $J_R(x^*)$ is nonsingular, $L^T$ has full rank, and $p \neq 0$; therefore $\sigma > 0$. The convergence and its q-quadratic rate follow from Theorem 10.2.1 in the Dennis and Schnabel book [15].

Regularization
Despite the advantages of the Gauss-Newton method, the algorithm will not perform well if the problem is ill conditioned or if some of its components are highly nonlinear. The purpose of this section is to introduce two regularizations that overcome these difficulties while retaining the fast rate of convergence of the Gauss-Newton method.

Levenberg-Marquardt method
To prevent the Gauss-Newton algorithm from breaking down when some eigenvalues are near zero or the linear systems to be solved are rank deficient, the least-squares directions are regularized by

$$\left(J_H(p_k)^T J_H(p_k) + \mu_k I\right)\Delta p_k = -J_H(p_k)^T H(p_k),$$

where $\mu_k > 0$. Under the standard Gauss-Newton assumptions stated before, and choosing the regularization parameter $\mu_k = O(\|H(p_k)\|)$, the regularized Gauss-Newton algorithm converges and the q-quadratic rate of convergence is retained; see Theorem 10.2.6 in [15].
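A sketch of the regularized direction, with the residual-proportional choice $\mu_k = \sigma_k \|H(p_k)\|$ used later in the experiments (function name and defaults are ours, for illustration):

```python
import numpy as np

def lm_direction(JH, Hp, sigma=0.5):
    """Levenberg-Marquardt-regularized Gauss-Newton direction (sketch).

    Solves (J_H^T J_H + mu I) dp = -J_H^T H(p) with mu = sigma * ||H(p)||,
    so the regularization vanishes as the residual goes to zero and the
    q-quadratic rate of the pure Gauss-Newton step is retained.
    """
    mu = sigma * np.linalg.norm(Hp)
    A = JH.T @ JH + mu * np.eye(JH.shape[1])
    return np.linalg.solve(A, -JH.T @ Hp)

# Rank-deficient J_H: the unregularized normal equations are singular,
# but the regularized system remains solvable.
JH = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
Hp = np.array([-1.0, 0.0, 0.0])
dp = lm_direction(JH, Hp, sigma=0.1)
print(dp)
```

As the residual shrinks, $\mu_k \to 0$ and the step approaches the pure Gauss-Newton step.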

Scaling regularization
To limit the influence of components whose magnitudes are much larger than the rest, one introduces a diagonal scaling matrix $D_k$ and solves

$$\left(J_H(p_k)^T J_H(p_k) + \mu_k D_k^T D_k\right)\Delta p_k = -J_H(p_k)^T H(p_k).$$

This regularization prevents large components of the problem from dominating the behavior of the algorithm; it is also important to observe that the effective Lipschitz constant of the problem is improved. Combining the preceding two regularizations, the direction solves the scaled system above. This prevents the smallest eigenvalue from affecting the behavior of the Gauss-Newton algorithm and, at the same time, through rescaling, keeps components with small values from being swamped by the influence of large ones, while retaining the fast rate of convergence.

Globalization strategy
The good performance of the Gauss-Newton algorithm depends on a suitable initial point inside its region of convergence. Rather than incurring the computational cost of choosing such a point, this chapter proposes a line-search method that provides convergence for initial points outside the region of convergence. The goal of this approach is to obtain a sufficient decrease of a merit function; if the full step fails, backtracking is used until a sufficient reduction is obtained. A suitable merit function should allow moving toward a solution of the problem.

Merit function
It is natural to take the objective of the unconstrained minimization problem (5) itself as the merit function; that is, $M(p) = f(p)$.

Descent direction
One proves that the Gauss-Newton direction is a descent direction for the merit function $M(p) = f(p)$.
Property: The regularized Gauss-Newton direction $\Delta p$ given by (26) is a descent direction for the merit function $M(p) = f(p)$.
Proof: One proves that the directional derivative of $f$ in the direction $\Delta p$ is negative. The gradient of $f(p)$ is $\nabla f(p) = J_H(p)^T H(p)$, so

$$\nabla f(p)^T \Delta p = -\Delta p^T \left(J_H(p)^T J_H(p) + \mu D^T D\right) \Delta p < 0,$$

since the regularized matrix is positive definite. Consequently, it is possible to progress toward a solution of the problem along the direction $\Delta p$. The purpose is then to find a step length $\alpha \in (0,1]$ that yields a sufficient decrease. To that effect, one follows the Armijo-Goldstein conditions

$$f(p + \alpha\Delta p) \le f(p) + \lambda\,\alpha\,\nabla f(p)^T \Delta p$$

and

$$f(p + \alpha\Delta p) \ge f(p) + \beta\,\alpha\,\nabla f(p)^T \Delta p,$$

for fixed values $\lambda, \beta \in (0,1)$. The first inequality enforces a sufficient decrease of the merit function, and the second one avoids step lengths that are too small. It is important to observe that if $\beta$ is chosen with $\beta \in (\lambda, 1)$, then the two inequalities can be satisfied simultaneously. Wolfe proved that if $f$ is continuously differentiable on $\mathbb{R}^r$, $\Delta p$ is a descent direction, and the set $\{f(p + \alpha\Delta p) : \alpha \in (0,1]\}$ is bounded below, then there exists an $\alpha^* \in (0,1]$ such that the two inequalities are satisfied simultaneously [16].
It is important to realize that these two inequalities can be satisfied using a backtracking procedure. Therefore, this work uses a line-search strategy to satisfy them. The next section proposes a line-search regularized Gauss-Newton algorithm for solving the zero-residual composite function problem.
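A backtracking sketch for the sufficient-decrease (Armijo) inequality; the function name, parameters, and defaults below are illustrative:

```python
import numpy as np

def backtracking(f, grad_f, p, dp, lam=1e-4, shrink=0.5, max_iter=30):
    """Backtracking line search on the merit function M(p) = f(p) (sketch).

    Halves alpha from 1 until the sufficient-decrease inequality
    f(p + alpha*dp) <= f(p) + lam * alpha * grad_f(p).dp holds; starting
    from alpha = 1 also guards against excessively small steps in practice.
    """
    alpha, f0, slope = 1.0, f(p), grad_f(p) @ dp
    for _ in range(max_iter):
        if f(p + alpha * dp) <= f0 + lam * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha

# Toy quadratic: f(p) = 0.5 * ||p||^2 with the steepest-descent direction.
f = lambda p: 0.5 * float(p @ p)
g = lambda p: p
p = np.array([2.0, -2.0])
alpha = backtracking(f, g, p, -g(p))
print(alpha)   # the full step already gives sufficient decrease -> 1.0
```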

A line-search regularized Gauss-Newton method
This section proposes the following regularized Gauss-Newton method with line search to find a solution on the affine subspace $x_o + \eta(L^T)$ for problem (4).

Algorithm 2: A reduced-order regularized Gauss-Newton (RORGN)
Input: Given the compressed basis $L^T \in \mathbb{R}^{n\times r}$ and a displacement $x_o \in \mathbb{R}^n$.
Output: The approximate solution in the affine subspace $x \in \mathbb{R}^n$.
1: Initial point of the problem. Given $p_o \in \mathbb{R}^r$.
2: Initial point in the affine subspace.

5: Regularized Gauss-Newton direction. Solve for $\Delta p_k$:

$$\left(J_H(p_k)^T J_H(p_k) + \mu_k D_k^T D_k\right)\Delta p_k = -J_H(p_k)^T H(p_k).$$
6: Line search (sufficient decrease). Find $\alpha_k \in (0,1]$ satisfying the Armijo-Goldstein conditions.
Remarks: The algorithm is amenable to any suitable basis, not necessarily a wavelet basis. The algorithm can be tested with different initial displacement points. Likewise, the choice of the initial point of the algorithm is not limited to the origin.
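Putting the pieces together, a compact sketch of the RORGN loop under the definitions above. All names, defaults, and the toy problem are illustrative, not the chapter's implementation:

```python
import numpy as np

def rorgn(R, JR, L, x0, p0, e_tol=1e-10, k_max=50, sigma=0.5, lam=1e-4):
    """Reduced-order regularized Gauss-Newton with backtracking (sketch).

    Seeks R(x) = 0 on the affine subspace x0 + range(L.T), where R and JR
    are the residual and its Jacobian.  mu_k = sigma * ||H(p_k)|| is the
    residual-proportional Levenberg-Marquardt regularization.
    """
    p = p0.copy()
    for _ in range(k_max):
        x = x0 + L.T @ p
        Hp = R(x)
        if np.linalg.norm(Hp) < e_tol:
            break
        JH = JR(x) @ L.T                       # chain rule: J_H = J_R L^T
        mu = sigma * np.linalg.norm(Hp)
        dp = np.linalg.solve(JH.T @ JH + mu * np.eye(len(p)), -JH.T @ Hp)
        # Backtracking on the merit function M(p) = 0.5 * ||H(p)||^2.
        f0, slope = 0.5 * Hp @ Hp, (JH.T @ Hp) @ dp
        alpha = 1.0
        while 0.5 * np.sum(R(x0 + L.T @ (p + alpha * dp))**2) > f0 + lam * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-12:
                break
        p = p + alpha * dp
    return x0 + L.T @ p

# Toy diagonal system R(x) = x^3 - 8 (root x = 2) with a trivial basis.
R = lambda x: x**3 - 8.0
JR = lambda x: np.diag(3.0 * x**2)
x_star = rorgn(R, JR, np.eye(2), np.zeros(2), np.ones(2))
print(np.round(x_star, 6))
```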

Numerical examples
The authors ran the experiments on a MacBook Pro laptop equipped with an Intel(R) Quad-Core(TM) i7-2720QM CPU @ 2.20 GHz and 8 GB of RAM. Section 8.2 presents Bratu's 3D problem. This problem is more challenging since the nonlinear systems become ill conditioned as one approaches the bifurcation point; therefore, the RORGN algorithm was tested on them. All of these Bratu problems utilized the wavelet-based ROM. One takes the regularization $\mu_k = \sigma_k \|R(x_k)\|$, with $\sigma_k \in (0,1)$. The stopping criteria for the algorithm are either that the norm of the residual, $e_k = \|R(x_k)\|$, is less than a given small positive value $e$, or that a maximum number of iterations $k_{\max}$ is reached.

Bratu's 2D problem
Bratu's 1D equation can be generalized by replacing the second derivative with a Laplacian [1,17]. This section numerically studies the nonlinear diffusion equation with an exponential source term in two and three dimensions. Let $\Omega = [0,1]^n$, $n = 2, 3$, be the unit square or cube, where $x_i \in [0,1]$, $i = 1, \dots, n$, are the spatial variables and $n$ is the space dimension.
The problem reads

$$\Delta u + \zeta e^{u} = 0 \quad \text{in } \Omega,$$

where $\zeta \in \mathbb{R}$ is a coefficient and the Laplacian is defined by $\Delta u = \sum_{i=1}^{n} \partial^2 u/\partial x_i^2$. One can discretize (32) by means of central finite differences on regular tensor-product meshes. Homogeneous Dirichlet boundary conditions are enforced on all square or cube faces; see [17] for details. Figure 2 shows the parameter continuation problem, while Figure 3 compares FOM and WROM results in the whole domain. For visualization purposes, the front half of the cube is cut away to see inside it. Figure 2 shows the FOM as a continuous line, while the dashed blue and magenta lines correspond to WROM at 85 and 90%, respectively. The mesh size is 10×10×10, and ζ = 1.5 is fixed. In the figure, one has from left to right the FOM, the WROM, and the absolute error. Neither of these WROM models could reproduce the FOM behavior beyond ζ > 9; they could get into neither the second branch nor close to the bifurcation point, where the system becomes highly nonlinear. On the other hand, Figure 3 compares the FOM with the WROM at 90% to show that this WROM could properly reproduce the FOM behavior in the whole cube. Table 1 summarizes the performance of the family of Gauss-Newton algorithms applied to the FOM Bratu's 3D problem, using $e_{tol} = 10^{-3}$ and $k_{\max} = 32$. Once again, one gets close to the bifurcation point by choosing ζ = 9.9 to pose a challenging nonlinear system, while σ was tuned to achieve performance for a given rank.
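For reference, a sketch of the central-difference residual and Jacobian for the 1D Bratu problem $u'' + \zeta e^u = 0$ with homogeneous Dirichlet conditions; the 2D/3D versions replace the tridiagonal stencil with the tensor-product Laplacian. Function names are ours, for illustration:

```python
import numpy as np

def bratu_residual(u, zeta, h):
    """Central-difference residual of u'' + zeta * exp(u) = 0 on (0, 1),
    u(0) = u(1) = 0, for the vector u of interior unknowns (spacing h)."""
    up = np.concatenate(([0.0], u, [0.0]))   # pad with the Dirichlet values
    return (up[:-2] - 2.0 * up[1:-1] + up[2:]) / h**2 + zeta * np.exp(u)

def bratu_jacobian(u, zeta, h):
    """Tridiagonal Jacobian of the residual above: the constant Laplacian
    stencil plus the diagonal from differentiating the exponential term."""
    m = len(u)
    lap = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / h**2
    return lap + np.diag(zeta * np.exp(u))

print(bratu_residual(np.zeros(4), 1.0, 0.2))   # -> [1. 1. 1. 1.]
```

At $u = 0$ the Laplacian term vanishes and the residual reduces to $\zeta e^0 = \zeta$ in every component, which the print confirms.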

Bratu's 3D problem
For all ranks reported herein, the regularized method attains similar convergence tolerances but usually spends two fewer iterations than standard Newton, and hence the CPU time is reduced as well. One also notices that, for this particular problem, line search performs equivalently to standard Newton, and the combined method matches the performance of the combined-plus-line-search method.

Nonlinear benchmark problems
One also considers a benchmark nonlinear problem from the literature to challenge the proposed algorithms: the Yamamura problem [18], a nonlinear system of equations of size n. The algorithms were implemented with $e_{tol} = 10^{-6}$, $k_{\max} = 128$, and σ = 0.025.
The objective is to challenge the algorithms presented in this research, and the results are reported in Table 2. One can infer from this table that the standard Newton method could not converge for any of these realizations except n = 32. Conversely, the regularized method converged for all realizations and outperformed all the others. The regularized and line-search method converged consistently for all realizations but n = 2048. For the larger sizes, that is, 512 and 1024, the latter method is the most efficient. The bottom line is that the regularized method performed well in this example.

Hybrid method: HROM
The idea is simple: since the wavelet subspace is not a function of a priori known snapshots, it can be determined without executing the so-called off-line stage, the computationally expensive phase in which one thoroughly studies the FOM and samples the input space to record a representative set of snapshots. HROM takes as input snapshots those outputted by a WROM procedure. As shown below, if this WROM reproduces the FOM behavior correctly, then one should expect the resulting PROM, that is, HROM, to replicate the original FOM behavior accurately. At first glance, depending upon the WROM compression ratio, this procedure reduces the runtime of the off-line stage.
This section conducts a series of preliminary numerical experiments on the well-known Bratu nonlinear benchmark problem, in particular in one and two dimensions [5], to make this case. Figures 4 and 5 depict results for the 1D continuation problem. They utilize WROM compression ratios of 10 and 5%, where the compression ratio is held constant during the continuation problem. All plots display two distinct HROM compression ratios. The WROM at 20% (not depicted) completely misses the bifurcation zone and the second branch, and thus the corresponding HROM is also far off. However, as the WROM starts to catch up with the FOM, so does the HROM. Indeed, HROM yields results comparable to the version that takes the original snapshots, that is, PROM.
One can repeat a similar experiment with Bratu's 2D continuation problem, which produces the same trend as before: if the input WROM is far off target, then HROM is off as well. For instance, at 27% compression all HROM models miss the second branch, but the 21% model could slightly reproduce the proper FOM trend. Things significantly improve for the models at 21 and 20%, as shown in Figures 6 and 7; however, these models still miss the bifurcation zone, rendering only a flat profile there. These insights suggest that if the WROM can correctly reproduce the FOM behavior, then HROM can do so as well. One can probably improve the accuracy of the WROM by changing the compression ratio during the online stage. For Bratu's problem, one can assume a graded energy distribution that provides more energy while approaching the bifurcation point. This approach is referred to as "adaptive WROM," or AWROM for short. This section presents an alternative strategy that relies on insights from the FOM, such as Newton tolerances and numbers of iterations, which are in turn indirect error estimates. Figure 8 plots the energy distribution that was utilized as a function of the continuation parameter ζ for Bratu's 1D problem. When one approaches the bifurcation point, ζ = 3.5, one should gradually increase the energy as shown; notice that the distribution tends to concentrate more points toward the bifurcation point. With this energy distribution, one obtains the AWROM results in Figure 9. This ROM accurately reproduces the FOM behavior. Let us now conduct the following experiment: run the FOM and, at every snapshot, store the number of Newton iterations and the resulting error tolerance. One then computes the energy distribution that Figure 10 depicts.
The following formula was employed to do so, where $nIters_i$ is the number of iterations at the current location, $nMaxIters$ is the maximum number of iterations reported by the FOM, and $\theta \in [0,1]$. Figure 11 depicts excellent agreement between the FOM and AWROM ($\theta = 0.2$). One should expect that if this last AWROM model is used as the input to an HROM simulation, HROM should reproduce the original FOM correctly. Figures 12 and 13 depict preliminary results of HROM applied to the two AWROM models whose energy distributions were described in Figures 8 and 10, respectively ($\theta = 0.2$). These ROMs reproduce the FOM behavior accordingly, which shows that there is potential to study the performance of HROM further. Another important question for further research is how to improve the compression ratio of the AWROM scheme.

Concluding remarks
This chapter introduced a global regularized Gauss-Newton method for solving square nonlinear algebraic systems in an affine subspace, which enables real-time solutions without a priori simulations. Solving a nonlinear least-squares composite function poses the problem that the answer is derived from the inner argument. The authors thus presented the standard Newton assumptions that guarantee a q-quadratic rate of convergence. The findings include that the Petrov-Galerkin projection directions for the Newton method are none other than the Gauss-Newton ones for a composite function. The technique uses two initial points: one that determines the affine subspace and another that is the starting guess for solving the composition mentioned earlier. The notion of compressed sensing with wavelets yields the characterization of an affine subspace that comprises the majority of the energy of the problem. The chapter showed numerical experiments that back up the proposed globalization methodology for solving highly nonlinear systems in real time; these reproduce the principal features of the FOM. The results underline the fact that one does not need to employ information at any particular point. The Bratu 3D FOM results show that the proposed RORGN algorithm outperforms the standard GN method while retaining its q-quadratic rate of convergence. This chapter concludes that the regularized and line-search-enabled scheme is the most robust and efficient algorithm for the problems presented herein.
The numerical results indicate that this approach performs well and does not significantly increase the CPU time.
From the numerical results presented in Section 9 for Bratu's 1D and 2D problems, the data-fusion procedure (HROM) can be used as an alternative when the simulation time for a problem is limited. This method admits two different scenarios: in the first, the data come from the model itself, while in the second, the data are obtained independently of it. That is, in the first case the model governs the data, and in the other the input information rules the model.