On Non-Linearity and Convergence in Non-Linear Least Squares

To interpret and explain the mechanism of an engineering problem, the redundant observations are carried out by scientists and engineers. The functional relationships between the observations and parameters defining the model are generally nonlinear. Those relationships are constituted by a nonlinear equation system. The equations of the system are not solved without using linearization of them on the computer. If the linearized equations are consistent, the solution of the system is ensured for a probably global minimum quickly by any approximated values of the parameters in the least squares (LS). Otherwise, namely an inconsistent case, the convergence of the solution needs to be well-determined approximate values for the global minimum solution even if in LS. A numerical example for 3D space fixes coordinates of an artificial global navigation satellite system (GNSS) satellite modeled by a simple combination of firstdegree polynomial and first-order trigonometric functions will be given. It will be shown by the real example that the convergence of the solution depends on the approximated values of the model parameters.


Introduction
There are two main computing classes, these are hard and soft computing. Scientists and engineers generally prefer the first-class computing because they can easily establish an explicit mathematical relationship between the model parameters and their data (observations), not in the second class. The relationships between the parameters and data can be linear or nonlinear.
To overcome complicated real-life problems whose mathematical models are not known, the soft computing techniques have been developed in the last decades. We can count well-known techniques, some as artificial neural network (ANN), artificial intelligence (AI), machine learning (ML), deep learning (DP), fuzzy logic (FL) and genetic algorithms (GA) [16][17][18]. The techniques inspired by the human intelligence and learning processes can be very timeconsuming according to the data given in run due to their processing based on the trial-anderror method. If these techniques are roughly defined, data (experimental outcomes and observations) are separated into two parts in them, learning (or training) data and test data. Mathematical (functional and/or stochastic) relations between data and model parameters are learned from the learning data. The handled model is tested by means of the test data. After that, the trained and developed model, if meets expectations, is used to estimate for producing unobserved data for the scientific (or engineering) problems [16][17][18].
In the soft computing techniques, the linear algebra is also a very effective tool to solve the problem as in the hard computing ones. For this reason, we should take a short overview on linear algebra used in science and engineering [16][17][18].

Linear algebra and objective functions
Linear algebra has two basic problems. A solution of linear equations system is one of them; the other is the eigendecomposition. In this chapter, we will use both of them upon a linear equation system as a combined form (Eqs. (8)- (11)) in which we will solve the linear equations system by means of the singular value decomposition related with the eigendecomposition (or the matrix diagonalization) [8,9,13,14].

Suppose an estimated unknown vector
δ (in interested model) and an experimental data (or observations which are stochastic variables) vector y n ¼ b y À b ε [in which an estimated data and error (residual) vectors are in order of b y and b ε] by an objective function and their covariance matrices Σx ¼ Σ x ¼ b σ 2 0 Q x (for the unknowns) and Σ y ¼ σ 2 0 P À1 (for the data), respectively, with a priori variance σ 2 0 and a posteriori variance b σ 2 0 . Note that b x is a nonstochastic vector before estimation, where an approximated values vector is x for b x (hat-sign "^" shows an estimated value for interested parameter according to an objective function). In addition, n, m and u are the observation number, the equation number and the unknown number, respectively. Start with a linear or nonlinear functions vector f m b y; b x ð Þ ¼0, we can have a linear mathematical model with a weight matrix (P ¼ σ 2 0 Σ y À1 ) of the observations for m ¼ n: the error in variable solution as in total least squares (TLS) method can be preferred. Therefore, f m b y; b x ð Þ ¼ 0 (for m 6 ¼ n) should be differenced as following: where : Most of science and engineering problems can be modeled as Therefore, the functional model named as indirect adjustment method in the adjustment literature [3][4][5][6][7] in geomatics engineering has been preferred in the chapter. The weight matrix (P n, n ) of observations (stochastic variables) would be accepted as a unit matrix P n, n ¼ I n, n in here for simplicity.
The second-degree objective function is L 2 -norm estimation which is known as least squares (LS) method and widely used in hard and soft computations.
The last-degree objective function is L ∞ -norm estimation which is known as minmax method. In fact, the soft computing techniques use this objective while it applies the trial-and-error method in their learning stages. Eq. (1) under L 1 -norm and L ∞ -norm is also solved by means of linear programming methods, for this reason; the methods may give several solutions (as being in trial-and-error method) to any interested problem [10,11].

Rank deficiencies in linear models
While a rank is a number that indicates a linear independent column, the number of the coefficient matrix of unknowns in a linear equation system, a rank deficiency represents a linear dependent column number (if it is smaller than the row number) of the coefficient matrix. Inconsistency in the solution stage of a linear equation system results from the (rank) deficiencies.
Defining the rank of A n, u by rank A ð Þ ¼ r, a condition r ≤ u ≤ min m; n ð Þ is always satisfied. In general, n ¼ m in well-known (or the indirect) LS used in many scientific problems.
Denoting the rank defect d letter, we can define two type defects [12].
Objective functions are used to remove the surjectivity defect d s occurred by the redundant observations. The injectivity defect d i can consist of three reasons in the estimation problem [12].
Datum defects (d-defects) are closely related to the origin of the spatial system. The defect arises if the data do not carry any information to cover the absolute spatial position of the problem given.
Configuration (Design) defects (c-defects) occur from weak geometric relation among data and unknowns. To avoid the defect, we can be careful and planned when picking data (whose interval or/and place) and choosing the consistent mathematical model (can use auxiliary variables instead of original ones).
Ill-conditioning defects (i-defects) arise from the large intervals among the elements of the coefficient matrix of unknowns. Norming the matrix can reduce ill-conditioning defects but cannot remove it fully. I-defects and c-defects cannot be separated from each other easily [12].
The defects lead to the failure of any given problem to be solved properly. Since the unknown coefficient matrix cannot be inverted by regular (ordinary) inverse methods, we should use pseudo inverse to overcome the effects of the defects [8,9,[13][14][15]. Eigenvalue and singular value decompositions can be used effectively for the pseudoinverse. Denoting a positive definite symmetric matrix N (that is always satisfied for N ¼ A T A or N ¼ A A T ), its pseudoinverse is: Therefore, we can use pseudoinverse safely in any given problem [8,9,[13][14][15]].

Hard computing
Linearizing from nonlinear functions to their linear form by means of Taylor expansion, a linear equation system is to be handled as Eq. (1). To avoid complicated proofs in the solution of an equation system, the simplified mathematical model can be written in the following (statically rotation invariant [1]) numerical computation form.
A n, u δ u ¼ l n , Meet two states to solve Eq. (8), n ≤ u and n > u. The solution for the former state n ≤ u is achieved by means of auxiliary variables vector λ n which can be defined as δ u ¼ A T u, n λ n . In fact, the auxiliary vector λ n is named as a Lagrange multipliers vector or an eigenvalues vector in a homogenous equations system in which l ¼ 0 for Eq. (8) [9].
And then δ and its variance-covariance matrix if we know the statistical uncertainty of observations (Σ l ¼ Σ y Þ are calculated by Eq. (10) and the low error propagation, respectively. We can only calculate the variance-covariance matrix of estimations as in Eq. (10) due to b x will be the smallest at end of the solution.
Solution to the second state n > u is a situation encountered in many scientific and engineering problems. Multiplying both sides of Eq. (8) by A T u, n the normal equation system is established and solved with Eq. (11): End the solution if the condition ensured is max j b δj <¼ thres ¼ 5e À 12; otherwise, continue On Non-Linearity and Convergence in Non-Linear Least Squares http://dx.doi.org/10.5772/intechopen.76313 Relationships between nonlinearity and LS in a multidimensional surface have been shown by Teunissen et al. [1,2]. The authors argued the relation on some simple examples and gave some analytical solutions for them. But, they highlighted that those types of analytical solutions have not been given for every problem and emphasized that suitable Taylor expansions have been useful to the solution not being transformed into the analytical ones.

Geometry of a combination of polynomial and trigonometric functions
These type functions can be used in defining the orbits of artificial satellites (and celestial bodies). Also, the numerical example part of this chapter, to estimate those type functions, will be inspected and applied on a real example. To foresee a model for any problem we should interpret the model parameter and comprehend the geometry of the model (Figure 1).
With respect to independent variable time t, a combination function of p ¼ 1 degree polynomial and order q ¼ 1 trigonometric function(s) [a combination of polynomial degree and trigonometric order (CPT)] to be estimated in the chapter is: where t j ; ϕ j are data given. In Eq. (12), translation a ϕ and slope b ϕ are elements of a line equation which is a first-order polynomial of CPT function. The other model parameters in the trigonometric part of Eq. (12) are defined as an amplitude c ϕ , and an initial phase d ϕ and a frequency (or angular velocity) e ϕ ¼ 2π=T ϕ (a period T ϕ ) of a wave ( Figure 1). In this chapter, the functions ϕ j are the coordinate components X j ; Y j ; Z j È É incoming from a precise orbit file and the geometric distances as a function of the components. However, nonperiodic earth-fixed coordinates (GR) in the SP3 file should be transformed to the periodic space-fixed coordinates (Υ); why is Eq. (12) is suitable for the space-fixed coordinates, not earth-fixed ones (as seen from Figure 3 in the numerical example part) ( Figure 2)?
For this propose, an easy transformation into any epoch (e.g., it can be taken as the first epoch t 0 of the data) is carried out by: where w E and R 3 are in order of the angular velocity of earth and well-known orthogonal rotation matrix around the third axis ( Figure 2).  On Non-Linearity and Convergence in Non-Linear Least Squares http://dx.doi.org/10.5772/intechopen.76313 We can use a recursive solution for Eq. (14) instead of the batch solution as Eq. (11) because of its solution velocity.
Continuation of the solution of Eq. (15) can be performed according to Eq. (11). The model given by Eq. (12) is a simple model to determine the satellite orbit motions. For more complicated models, the readers can utilize [19][20][21][22][23][24][25] resources.  Table 1 in which they are ordered from the nearest satellite to the farthest one.

Numerical example
The motions of the CPT functions estimated satellites (in Table 1) with respect to earth-(left column of Figure 3) and space-fixed (right column of Figure 3) coordinate systems are demonstrated in Figure 3. Moreover, coherence between the estimated CPT function (black solid line) of the C06 satellite and its data points given (colorful circles) is represented in detail in Figure 4.
We know that accuracy of precise SP3 file coordinates is about σ 0 ¼ AE5 cm. If we compare the value with its estimations given in Table 1, we can say that our predicted model is not meet our demands. We should expand the model by raising the degree of polynomial part or/and order of trigonometric part of CPT functions. In fact, we can readily see that the projected model with Eq. (12) will never cover the data. The model is only chosen for this chapter. The more suitable model established on Keplerian orbital elements can be found in the orbit determination literature and in [18][19][20].
Comparing the solution velocities (from the iteration numbers with respect to 5e À 12 threshold in Table 1) in different platforms, we can say that the solution velocities in 64Bit Python are generally better then 32Bit ones.
If we chose the threshold as 51e-13, we can see the distinctions of solution convergences between 64Bit and 32Bit running on the estimations of C06 satellite from 1000 (X), 8 (Y), 7 (Z), 1000 (S) in 32Bit C++ and 10 (X), 8 (Y), 6 (Z), 28 (S) in 64Bit Python in Windows. In here, 1000 is the maximum iteration number. If the mathematical model would be more complicated and its data number would be bigger than the number used in the example part, we would see the state more prominently.      Table 1.
Maximum iteration number and threshold loop elements are iter MAX ¼ 1000 and thres ¼ 5e À 12 to break the iteration loop.  Table 1) of the CPT functions for the coordinates of the C06 satellite are not statically equal to their expected values (σ 0 ¼ AE5 cm), the CPT model should be expanded. As an example, three more unknowns are added to the model given in Eq. (12) The added terms represent an amplitude f ϕ , an initial phase g ϕ and a frequency h ϕ of a new wave carried by first (sine) wave. After the first estimation with respect to Eq. (12), we can choose the approximate values of the new parameters as The approximate value vector of the expanded model by a new wave is:  Condition numbers computed from a rate of maximum and minimum eigenvalues or a rate of singular values under LS are very effective tools for determining the consistency as well. Therefore, a larger condition number can cause larger iteration number (related to convergence rate), We can see those states from the CPT estimations of the C06 satellite with the iteration (iter) and condition numbers (cond). These are given for X, Y, Z, S as iter = {7, 6, 6, 28} and cond = {3.4e + 13, 2.0e + 12, 2.8e + 12, 1.3e + 09} (Table 1), and as iter = {27, 158, 53, 19} and cond = {9.5e + 14, 2.3e + 13, 3.0e + 13, 2.6e + 10} ( Table 2).

Conclusions
In this chapter, the least squares (LS) estimations of the artificial satellite orbital movements by a combination of polynomial and trigonometric (CPT) functions have been given after a general overview has been made on the hard and soft computations. In practice, the orbital motions are modeled on Keplerian orbital elements. In contrary to this, the coordinate components have been selected for this chapter due to the nonlinear relations of the components and the unknowns which are the elements of CPT functions. The relations cause inconsistencies in the LS solutions. The inconsistencies result from the two injectivity defects, c-defects and i-defects. We can readily see the defects from the differences of the convergence rates (in other words the iteration numbers) in different computer platforms and architectures as shown in the chapter. The defects are not fully removed as long as not change the mathematic models. However, we can surpass the effects of those defects in part by means of the pseudoinverse based on the eigendecomposition or the singular value decomposition (SVD) as in here. The For the sake of simplicity for readers, a simple CPT function has been chosen at first. After the initial estimation of the function, the estimated errors vector has been found. We have seen that the errors have had a periodic characteristic in time. So, a new wave defining the error characteristic and been able to carry by the first wave has been planned for expanding the CPTs. It is shown that we can expand a CPT function until ensuring statically equivalency between a priory and a posteriori variances. For instance, one may secure the equivalency of the variances if one would expand more by a new wave in the last estimated model in the same manner.
The convergence rates (upon the iteration numbers) of the LS estimation have been inspected according to the threshold (thres = 5e-12) which is a good value for the estimation of the nonlinear CPT function. An algorithm compiled by different compilers and run in different architectures (with 32 Bit or 64 Bit) changes the convergence rate of the estimations in such as the inconsistent scientific problems. It is also observed that the iteration numbers change when the 64-bit Python software is run on Linux platform which has a different framework than Windows. But, the numbers have not been given in the example part of the chapter. Contrary to inconsistency model, namely in a consistent one, the iteration numbers can take equivalence values in all circumstances. Another way to determine the inconsistency of a model is to obtain its condition number which is computed from a rate of maximum and minimum eigenvalues or of singular values under LS. If the condition number is close to one, the projected model is accepted as a consistent model.
We can use the Soft Computing Methods (SCM) if not an exact mathematical relationship between the data and unknowns. The mathematical model is established by the trial-and-error method in training part of SCM by means of arbitrary weights and activation functions depending on SCM expert forecasts. For the solution of the SCM model during the training, we can use least absolute residuals (LAR) and minmax absolute residuals (MAR) objective functions by the linear programming or the LS estimation as in hard computing method (HCM). In the state, the inconsistency problem can erase whatever the solution method (LAR, MAR or LS) is. The inconstancy can be removed by means of experiences gained from HCMs.
Prior information is very important to select a suitable mathematical model for a scientific problem. For example, comparing a priori variance with a posteriori variance at the end of the estimation is a useful warning to the user to determine the correct mathematical model as seen from the expanded model in the example section of the chapter.
In numerical computation, there are two main phenomena which are the mathematical model (as a combination of functional and stochastic models) and objective function. The solution strategy is of no importance if the same mathematical model and objective function are preferred in the same problem of hard computing. All solution strategies always give same results, only their solution time spans can be distinct from each other ( Table 3).