Effective Algorithms for Detection Outliers and Cycle Slip Repair in GNSS Data Measurements

The chapter describes effective algorithms that are often used in processing data measurements in Global Navigation Satellite Systems (GNSSs). Existing effective algorithm was developed for detection and elimination of outliers from GNSS data measurements. It is based on searching for a so-called optimal solution for which standard deviation and maximum absolute deviation of the measured data from mean values do not exceed specified threshold values, and the number of the detected outliers is minimal. A modification of this algorithm with complexity of Nlog2N is discussed. Generalization of the existing algorithm to the case when data series included some unknown trend will be presented. The processing trend is assumed to be described by an unknown function of time. The generalized algorithm includes the outlier detection algorithm and trend searching algorithm that has been tested using simulated data. A new algorithm will be presented for cycle slip repair using Melbourne-Wübbena linear combination formed from GNSS data measurements on two carrier frequencies. Test results for repair data in the case of multiple (cascade) cycle slips in actual observation data will also be presented in this chapter.


Introduction
At present, five constellations of GNSS satellites are involved in the formation of observational data, which serve as a source for many applications related to navigation, geodesy, geodynamics, and in the performance of solving of many fundamental problems. These are American Global Positioning System (GPS), Russian Global Navigation Satellite System (GLONASS), European Galileo, Chinese BeiDou, and Japanese Quasi-Zenith Satellite System (QZSS). The satellites of each of the operating systems transmit signals, as a rule, on two L-band carriers, which are received by GNSS receivers. A large number of stations equipped with GNSS receivers and located around the World are part of the International GNSS Service (IGS) network. These stations generate observation data files and transmit them to international databases in real time [1,2], after which these data become available for use by many institutions and laboratories over the World. When solving applications, the measurement data go through various processing steps. Significant element of the data processing is the detection of rough measurements and removal them from the further processing. Despite the fact that many of the laboratories use a high-end application of the software regarding accuracy, reliability, and robustness, the presence of rough measurements in the observational data excludes the possibility of obtaining an accurate final result. In order to obtain results of unprecedented accuracy, the measurement data must be cleared of coarse measurements or outliers. It should be noted that the concept of outliers is key in the measurement processing theory [3], and there is no general definition for it. In order to distinguish outliers from the rest of the measured data, in some cases, the deviation of the data series values from some average value of the data is considered. If the deviation from the average is exceeded by a predetermined threshold value, the measured value is considered as an outlier. Such an approach has a significant disadvantage that the exact mean is generally unknown, and the estimate obtained by averaging a series may be very inaccurate due to outliers. Existing iterative procedures are also based on the idea of calculating deviation from the average and often result in the unjustified rejection of many observations. Reducing the data involved in processing may, in turn, result in a loss of accuracy of the final result.
This chapter describes the outliers cleaning algorithm for GNSS data. The proposed algorithms are based on the search for the so-called optimal solution with the minimum amount of invalidly rejected data. The algorithm for accelerated detection of outliers in a large amount of measurements has been developed, as well as an algorithm for detecting outliers in data containing an unknown trend. In conclusion, the algorithm of jump detection in the Melbourne-Wübbena combination [3][4][5], including the developed procedure of cleaning data from outliers, is considered.
In Section 2, the problem of searching for the so-called optimal solution is formulated. Section 3 provides a search algorithm, with a common number of arithmetic operations not exceeding $N 2 . Section 4 presents the test results for actual measurements in global navigational satellite systems at two carrier frequencies. The searching of outliers was performed in the Melbourne-Wübbena combination. In Section 5, the assertions that are the mathematical prerequisites for justifying a fast outlier search algorithm are proved. In Section 6, the fast outlier detection algorithm with the number of arithmetic operations of Nlog 2 N is proposed. Section 7 describes the case of data with unknown trend. Iterative procedure of outlier search is proposed based on the finding of suitable trend approximation in polynomials class. The idea of excluding coarse measurements is based on finding a so-called minimizing set of measurement data of a given length. This distinguishes the proposed algorithm from known similar procedures in which outliers are detected by exceeding a preset threshold. The test results with simulated data are given. Sections 8 and 9 discuss the problem of detecting jumps in the Melbourne-Wübbena combination. An algorithm is proposed that includes the outlier cleaning procedure based on the search for the optimal solution. Section 10 shows the numerical calculations with real data for algorithms presented in Sections 3-9. Section 11 concludes the chapter.

Statement of the problem: a Melbourne-Wübbena combination
Often, measurements y j taken at moments of time j can be presented in the form: where f j is a trend function, depending on physical process and as a rule is unknown, the sum z þ ξ j is unknown random variable imposed on the trend with unknown constant z and a centered random variable ξ j .
Detection of outliers in data series expressed in Eq. (1) with unknown trend is uncertain since the concept of measurement or outliers itself is uncertain. In many cases, however, the trend function is known a priori.
For example, many data processing programs often use different linear combinations formed of code and phase measurement data to eliminate unknown parameters. One such combination is the Melbourne-Wübbena combination composed of both, carrier phase and code observables as described by Melbourne [4] and Wübbena [5]. This combination eliminates the effect of the ionosphere, the geometry, the clocks, and the troposphere [3], and it is often used to detect loss of carrier phase capture in the preprocessing stages. The Melbourne-Wübbena combination generated for a specific satellite-receiver pair can be presented in the form of the sum of three terms [6]. One of the terms includes the integer wide-lane ambiguity for the two carrier frequencies [3]; the second component accounts for the satellite and receiver instrumental delays; and the third component is the measurement noise, including carrier phase and code multipath. Thus, during a time interval where the integer wide-lane ambiguity does not change, the Melbourne-Wübbena combination can be written as formula (1) with f j ¼ const.
Another example is satellite clock correction values derived from navigation message data, which can also be represented as in Eq. (1) with f j ¼ dj þ a, where d and a are the drift and offset parameters known from navigation data, respectively. In the case where the trend is known, the measurement data after the trend subtraction can be represented as, assuming N observations: with an unknown constant z, which we cannot be determined in advance, because the random value ξ j is not known and may contain outliers.
In Sections 2-6, we consider the case where the trend is known a priori, that is, the data can be presented as Eq. (2). A problem with an unknown trend will be discussed in Section 7. In principle, the outlier detection procedure described below is not affected by the measurement format expressed in Eq. (1) or (2); it can be applied to any set of data measurements y j .
The preliminary processing task includes rejection of rough measurements or outliers from data series (2). In other words, it is necessary to find a set Y L ¼ where L is the length for which the following conditions are satisfied: where σ Y L and σ max are the standard deviation (SD) and their associated specified threshold values; MINOBS is a parameter limiting from below the length of the required set of measured values (e.g., 10), and we will assume hereafter that MINOBS < N.
The values y j that are not included in the set Y L are treated as coarse measurements and removed from further processing. Typically, expressions in Eqs. (3)- (5) are the only conditions considered in processing programs when screening out rough measurements. Below we will formulate the problem of searching for a solution, complementing the conditions expressed in Eqs. (3)-(5) with two extreme conditions [7].
1. First, we will require that the length of the set sought be the maximum, that is, the number of measurements deemed to be coarse is the minimum: Note that for the predetermined values y j , the problem solution satisfying conditions in Eqs. (3)-(5) may not exist (e.g., when y j includes a trend, in particular when y j is an arithmetic progression with a step greater than σ max ). In the case when the solution does exist, we will denote the value L at which the maximum of Eq. (6) is reached as L max . Note that the condition expressed in Eq. (6) does not ensure the uniqueness of the set because several sets of length L max can be found that satisfy the conditions in Eqs. (3)-(5).
2. From all possible sets that satisfy conditions expressed in Eqs. (3)- (5) and (6), we will select the one for which the variable σ Y L receives the smallest value: Let us define Y opt as follow: Definition 1. For a given sequence of values y j , j = 1, 2, … N, the set of values: satisfying conditions in Eqs. (3)- (7), we refer to as the optimal solution of the problem expressed in Eqs. (3)- (7). The corresponding SD value is denoted by σ opt .
Thus, the problem consists in the creation of a search algorithm for the optimal solution of the problem shown in Eqs. (3)- (7).
In a practical situation, the precise value z, given conditions in Eqs. (3) and (4), is not known. We will estimate the values using the following formula: Note that the value z depends on the required solution, which will complicate its search.
Usually, iterative methods are used to find a solution to problem expressed in Eqs. (3)- (5). For example, the algorithm implemented in the observation data smoothing program (see [3]) is designed to find a set Y satisfying the conditions given by Eqs. (3)- (5). The proposed step-by-step algorithm is based on iterations (the index number of iteration is designated by the upper index in parentheses): Step 1: Initialization: Y 0 ð Þ ¼ y 1 , … , y N È É ; Level 0 ð Þ ¼ 10 20 ; k = 0.
Step 2: Checking the length of the set Y k ð Þ ; if it is less than MINOBS, then the process comes to an end and a solution is not found.
Step 3: Calculation of the values z k ð Þ and σ k ð Þ on the available set Y k ð Þ using formulas (3) and (9).
Step 4: Checking the fulfillment of the inequality σ k ð Þ ≤ σ max . If it is satisfied, the set Y k ð Þ is accepted as the solution, and the search process comes to an end. Otherwise, there is a transition to Step 5.
Step 5: Definition of Level kþ1 ð Þ for outlier detection: In order to prevent an infinite loop of iterations, a required verification is carried out: If this inequality is not satisfied, then the following is assumed: Step 6: Definition of a new set Y kþ1 ð Þ to include those and only those y j for which Step 7: Increasing k by 1: k++. Transition to Step 2. Note that the optimal solution cannot be found in such a manner, as confirmed by numerical calculations (see Section 4).

Algorithm for solving the problem
Let us formulate a statement that is the key to the creation of an effective search algorithm for the optimal solution (Eqs. (3)-(7)).
be optimal for a specified sequence of values y j n o and then the interval y min , y max À Á does not contain values y j that are not in the set Y opt . Proof. In fact, let us assume the opposite: Let y j ∉ Y opt , y min < y j < y max and y k and y l be values from the set Y opt for which y k = y min and y l = y max . One of these cases is possible: a. z < y j and, therefore 0 < y j À z < y l À z In the first case, Case (a), we replace the value y l in the set Y opt with y j . In the second case, Case (b), we replace y k with y j . In any of the cases, we will have another set of the same length L max for which conditions expressed in Eqs. (3) and (4) are satisfied, but the SD, as follows from Eq. (3), is less than σ opt . Consequently, the set Y opt is not optimal since requirement expressed in Eq. (7) is not satisfied. The contradiction that is reached proves Assertion 1.
is the optimal solution of the problem given by Eqs.
(3)- (7), then an arbitrary permutation of the numbers y j 1 , … , y j L max will also be the optimal solution of the problem described by Eqs.
(3)- (7). Thus, the optimal solution does not depend on the arrangement of given numbers y j . This allows us to arrange the numbers y j in the order most suitable for searching the optimal solution. Taking advantage of this important circumstance, we arrange the values y j n o in the ascending order and we will look for the optimal solution in the ordered sequence. For brevity, the ordered sequence will also be denoted by y j n o . Thus, y 1 ≤ y 2 ≤ … ≤ y N . Moreover, for simplicity of logic, we will assume that all y j are different, that is, Note that due to the formulated Assertion 1, if Y opt is the optimal set and y j 1 and y j L max are its smallest and greatest values, respectively, then all values of y j from the interval y j 1 , y j L max belong to Y opt . Consequently, considering Eq. (10), we have y j L max ¼ y j 1 þL max À1 and Y opt ¼ y j 1 , y j 1 þ1 , … , y j 1 þL max À1 n o Thus, the optimal solution should be sought in the ascending sequence y j among all possible sets y k , … , y kþLÀ1 È É of length L with k and L satisfying the following conditions: Hence, instead of searching for all possible sets of various length, numbering 2 N , for the solution of the problem described by Eqs. (3)-(7), it is sufficient to vary just the two parameters, k and L, associated with the conditions expressed in Eqs. (11) and (12). The number of pairs of integer numbers k and L subject to condition in Eqs. (11) and (12) is equal to: Let L ¼ l À k þ 1 be the length of an arbitrary set y k , … , y l È É . We introduce the designations: We rewrite conditions given by Eqs. (3) and (4) in the new designations: Note that the two last inequalities directly follow from Eq. (4), monotony of y j , and obvious inequalities: y k ≤ z k; L ð Þ ≤ y kþLÀ1 . Remark. In the conditions expressed in Eqs. (15) and (16), L means the length of the set under checking, and k is the index of the smallest number included in the set. Although "k" and "L" are also encountered as indexes in the sets we use below for monotonically increasing sequences, we hope nevertheless that this will not lead to confusion.
The following recursive relationships are available, making it possible to find z k; L ð Þand σ 2 k; L ð Þthrough the calculated values z k; L þ 1 ð Þи σ 2 k; L þ 1 ð Þfor seven arithmetic operations: The values of the fractions may be computed in advance as elements of a onedimensional array. Analogously, the following formulas can make it possible to express z k þ 1; L ð Þand σ 2 k þ 1; L ð Þthrough z k; L ð Þand σ 2 k; L ð Þ: The algorithm described below is based on the search for all possible pairs (k, L), where L denotes the length of the set to be checked and k is the index of the smallest of the values included in the set. At that, k and L must satisfy conditions Eqs. (11) and (12). The set y k , … , y kþLÀ1 È É corresponding to each such pair must be checked for fulfillment conditions (15) and (16).
We organize this search according to the algorithm described below, at each step of which we check the validation of Eqs. (15) and (16) for all possible sets of a certain length. We start the examine process with Step 1, where we check the set of maximum length N. Further, with each next step, we will reduce the length of the sets to be checked by 1.
Step 1: We consider the set of length N. There is only one such set: y 1 , … , y N È É . We check it for fulfillment conditions expressed in Eqs. (15) and (16). If they are satisfied, this set is selected as a solution, and further search stops. Otherwise, transit to Step 2.
Step 2: We consider the sets of length N À 1. There are two sets of length N À 1.
We test each of these sets for compliance with the conditions specified by Eqs. (15) and (16). If none of them satisfies conditions (15) and (16), then we transit to the next step. Otherwise, the following options are available: • Option 1: if only one set from them is found that satisfies conditions (15) and (16), then it will also be the solution of the stated problem; the search process stops here, where L max = N À 1.
Step N À L + 1: We consider the sets of length L. If L < MINOBS, then the search process stops, and a solution is not found. For L ≥ MINOBS, we examine N À L + 1 sets of length L: We check each of these sets for fulfillment of conditions (15) and (16). If any of them does not satisfy these conditions, then we transit to the next step where we consider the sets of length (L À 1). Otherwise, two options are possible: • Option 1: if only one set from (23) is found that satisfies the conditions of (15) and (16), then it will also be the solution of the stated problem with L max = L, and the search process stops here.
• Option 2: if several sets simultaneously satisfy conditions (15) and (16), we chose the set for which the value σ 2 k, L ð Þappears smallest as the solution, and the search process stops here, where L max = L.
In order to calculate the values z k, L ð Þ and σ 2 k, L ð Þ, we use the recursive formulas (17)-(22) in accordance with a search scheme shown in Figure 1 where only the z k, L ð Þ is involved. In accordance with the proposed arrangement, we calculate the values z 1; N ð Þ and σ 2 1; N ð Þin the first step of the algorithm using formulas expressed in Eqs. (13) and (14), and 4 N arithmetic operations are required for this arrangement. In order to find z k, L ð Þ and σ 2 k, L ð Þon all subsequent steps, we apply the recursive formulas (17)-(22), making it possible to calculate the values of the specified pairs of variables, each for the seven arithmetic operations, based on the results of the calculations of the preceding step. So, for example, on the second step, proceeding from the known values z 1; N ð Þand σ 2 1; N ð Þ, we find the values z 1; N À 1 ð Þand σ 2 1; N À 1 ð Þ(vertical arrow on the diagram) by using formulas in Eqs. (17)-(19) and the values σ 2 2; N ð Þand σ 2 2; N ð Þ(horizontal arrow on the diagram) by using Eqs. (20)-(22). In the general case, the transition to the following step in the direction of the vertical arrows (see diagram) is carried out according to formulas (17)-(19), and in the direction of horizontal arrows, according to formulas (20)-(22). The number of arithmetic operations required to find the solution should not exceed: In the above number of computations, the computational costs of verifying the satisfaction of inequalities (15) and (16) are also considered, which comprise from 0 to 2 arithmetic operations.

Results of calculations using algorithm presented in Section 3
We test the algorithms discussed above on the real data obtained by the ONSA station that is a part of the IGS network [2]. These data are included in the distribution kit of the installation software package [3] and available for usage. We consider measurement data received from global positioning system (GPS) satellite with system number PRN = 12 for 2010, day 207 to check the efficiency of the proposed algorithm described in Section 3. Figure 2 plots the values of the Melbourne-Wübbena combination over a time interval of 89.5 min (N = 180). The index numbers j of time epochs counting from the beginning of a 24-h period with a 30-second interval are plotted on the horizontal axis. The values y j of the combination are plotted on the vertical axis and are expressed in cycles with wavelength λ 5 ≈ 0:86 [3]. Figure 3 shows the values of deviations from the mean of the data cleared of outliers using the algorithms described in Sections 2 and 3. In both cases, σ max = 0.6 and MINOBS = 10. The values y j À z are plotted on the vertical axes in cycles with wavelength λ 5 and the index numbers j of epochs on the horizontal axis. Epochs in which the data were rejected are designated by white circles. In the first case (see Figure 3a), 47 data of the measurements were rejected, which are 26.1% of the total amount of data. In the second case (see Figure 3b), 14 of these measurements were rejected (7.8%), which are almost 18% less than in the previous calculation.
We also provide similar results for data obtained by TLSE station, which is also included in the IGS network. We consider measurement data from GLONASS, Russia satellite with system number PRN = 1 for 2010, day 207. Figure 4 shows the values of the Melbourne-Wübbena combination over a time interval of 65.5 min (N = 132). Figure 5 plots the values of deviations from the mean value of the data cleared of outliers using the algorithms described in Sections 2 and 3, respectively. Parameters σ max and MINOBS are the same as in the previous calculation example. In the first case (see Figure 5a), 41 data of the measurements were discarded, which are 31% of the total amount data. In the second case (see Figure 5b), 8 of these measurements were rejected (6%), which are 25% less than in the previous calculation.

Mathematical prerequisites for modifying of existing algorithm
Note that the number of arithmetic operations required to find the optimal solution according to the algorithm described in Section 3 depends on the L max , which is the length of the solution. As can be seen from Eq. (24), the smaller the length of the found solution (i.e., the larger the number of detected outliers), the more arithmetic operations are required to find it. This number of arithmetic operations is estimated to be of order N þ N À L max ð Þ 2 . Note that the expression in the parentheses herein is equal to the number of outliers detected. Thus, if the number of outliers in the original data series is comparable to N, it will take $N 2 arithmetic operations to find the optimal solution. In particular, to make certain that there is no solution (e.g., in the case where the data contain a trend), $N 2 arithmetic operations will also be required. In Section 6, we will modify the existing algorithm and describe fast outlier search algorithm that requires $Nlog 2 N arithmetic operations.
The necessary preparations are given in this section. Note that in this and the next sections we are dealing with the sequence y j n o N j¼1 arranged in the ascending order.
be monotonically increasing sequence. The following inequality is true: Proof. From the monotonicity of the sequence y j and the definition z k, L þ 1 ð Þ [see Eq. (13)], One of two cases is possible: Suppose, for example, the case (a) holds. Let us show that in this case At first, we will show that: Truly, inequalities: and the above inequality derived in Case (a) implies: These inequalities, in turn, imply Eq. (28). Next let us prove (27). This inequality is expanded as follows: Substituting here of the expression (17) in place of z k; L ð Þand writing for brevity, z instead of z k; L þ 1 ð Þ , we get inequality: Transform the right-hand side of this inequality: Here we take into account the equality: P kþL j¼k y j À z ¼ 0. Next, we have: Substituting this expression in Eq. (31), we get inequality that is true due to Eq. (28). Thus, Eq. (27) is proved for case (a). Analogously, case (b) is considered.
We introduce the notation: Assertion 3. The following inequalities hold: That is, the sequence σ 2 min L ð Þ monotonically decreases when L decreases from N to MINOBS.
Proof. Assertion 2 and definition of σ 2 min L ð Þ expressed in Eq. (32) imply: Since k is chosen arbitrarily, then for all L = MINOBS, … , N À 1 the following inequalities hold: which proves Assertion 3. Assertion 3 implies the following corollary. Corollary 1. If the inequality holds for some L 0 , then for existence of the solution Y L ¼ y k , y kþ1 , … , y kþLÀ1 È É for the problem expressed in Eqs. (3)-(7), it is necessary that the length L of the set Y L satisfies the condition L < L 0 .
Proof. Let us assume that L ≥ L 0 . Assertion 3 on account of monotony of σ 2 min Á ð Þ, expressed in Eq. (33) implies the following inequalities σ 2 min L ð Þ ≥ σ 2 min L 0 ð Þ> σ 2 max for all L≥L 0 . From this, it follows that for any set Thus, any of sets Y L of length L ≥L 0 does not satisfy the condition in Eq. (15) and, therefore, cannot be a solution of the problem, expressed in Eqs. (3)-(7).
In particular, we have come to the next important result. If, for example, the inequalities σ 2 min MINOBS ð Þ> σ 2 max are fulfilled, then the solution for the problem described in Eqs. (3)-(7) does not exist.
In the above-described procedure for solving problem (3)- (7), it takes $N 2 arithmetic operations to make certain that the solution not exists. Taking into account Assertion 3 and Corollary 1, the search procedure may begin by checking the conditions. This will require approximately $N arithmetic operations. If none of these conditions are fulfilled, the solution search must stop because the solution does not exist. As a result, only $N arithmetic operations are required to ensure that there is no solution.

Fast outlier search algorithm
The above proposed search procedure consists in the calculating values of z k, L ð Þ and σ 2 k, L ð Þusing recurrent formulas (17)-(22) and checking at every k and L the fulfillment of the inequalities (11) and (12). The algorithm complexity is estimated by value of $ N þ N 2 Outlier À Á , where N Outlier is the number of outliers found. If it is known a priori that there are few outliers in the measurement data, then the search algorithm for the optimal solution that described in Section 3 can be applied. If the measurement data contain an N-comparable number of outliers, the complexity of such an algorithm will be estimated by about N 2 . It is namely for such type of data we below describe a modified outlier search algorithm with complexity of about Nlog 2 N.
First of all, note one property that is the key to the construction of a fast outlier search algorithm. Note that if the inequality (15) holds for some set of length L + 1, then there exists a set of length L for which the inequality (15) is valid too. Truly, let assume for some k the inequality σ 2 k, L þ 1 ð Þ≤ σ 2 max holds. This inequality and Eq. (25) imply From this, it follows that This means that at least one of these sets y k , … , y kþLÀ1 È É and y kþ1 , … , y kþL È É with length of L satisfies conditions expressed in Eq. (15). However, this property is not true when checking the conditions expressed in Eq. (16). In other words, if these conditions are fulfilled for any set of length L + 1, it might happen that none of the sets of length L may satisfy them. This fact is a significant obstacle to increasing the rate of outlier detection that is necessary when processing a large amount of data with a large number of rough measurements. To overcome this obstacle, we will make the condition expressed in Eq. (16) weaker.
First of all, note that if for some set y k , … , y kþLÀ1 È É , the both conditions expressed in Eq. (16) are fulfilled, then the following inequality will hold: Consider a problem with condition expressed in Eq. (36) instead of conditions expressed in Eq. (16).
Remark. Recall that in this condition L means the length of the set under checking, and k is the index of the smallest number included in the set. Although "k" and "L" are also encountered as indexes in the sets we use hereinafter, we hope nevertheless that this will not lead to confusion.
It is easily seen that condition expressed in Eq. (36) for an arbitrary set y k , … , y kþL È É of length L + 1 implies the fulfillment this condition for each of the sets y k , … , y kþLÀ1 È É and y kþ1 , … , y kþL È É of length L. In fact, the fulfillment condition (36) for the set y k , … , y kþL È É means the fulfillment of inequality y kþL À y k ≤ 6σ max from which due to monotony of y j , the inequalities imply y kþL À y kþ1 ≤ 6σ max and y kþLÀ1 À y k ≤ 6σ max .
Thus, we have established the validity of the following assertion. Assertion 4. If the set y k , … , y kþL È É of length L + 1 satisfies conditions (15) and (36), then at least one of the two sets y k , … , y kþLÀ1 È É or y kþ1 , … , y kþL È É of length L also satisfies conditions (15) and (36).
Based on this statement, we can formulate the following: Assertion 5. Solution for the problem (15) + (36) can be found for $N log 2 N arithmetic operations.
Proof. Let us consider the sequence of steps.
Step 0: Consider the segment N  15) and (36). If none of them satisfy these conditions, then we stop the search process and conclude that the solution does not exist. Otherwise, once we find the set of length MINOBS satisfying conditions (15) and (36), we transit to Step 1.
Step 1: Step 1 is the same as Step k described below for k = 1 … Step k: On the kth step, where (k ≥ 1), we consider a segment N   (15) and (36). The following three cases are possible, schematically shown in Figure 6. The sign "À" above the point means that for none of the sets of the corresponding length the conditions (15) + (36) are satisfied, the sign "+" vice versa, that is, there is at least one set of the corresponding length satisfying (15) + (36). In the case of (a), we set N The search process will continue until either case (b) or until the length of the i is less than or equal to 1. In either case, the total number of steps will not exceed the number of log 2 N À MINOBS ð Þ . Since $N operations are performed in each step, the search process is guaranteed to be terminated in $N log 2 N arithmetic operations.

Search an unknown trend in the power polynomial class
The need to process GNSS measurements including a trend on which noise and outliers are superimposed arises at different processing stages of the application process. As already stated above, satellite clock corrections contain a linear trend. In some cases, it may not be known, and then, one has to search for it, for example, by the least square method. The presence of outliers in the measurement data is a significant obstacle to accurate determination of drift and offset parameters of satellite clocks. Other examples are linear combinations of code and phase data on two carriers [3]. To obtain high accuracy results, it is necessary to detect outliers against an unknown trend and remove them from further processing. This is the subject of this section.

Statement of the problem
Consider the problem of outlier detecting in data presented in the form of Eq. (1), recall that: The procedure described above for finding the optimal solution in an ordered series of numbers may not produce an adequate result if applied to data containing an unknown trend. For example, there may be no solution, and all data will be defined as outliers. In order to detect outliers in a series of numbers with a trend using the algorithm described above, it is necessary to find a suitable approximation of an unknown function f j and subtract it from the data set. Searching for this approximation is usually done by selecting functions from some functional class. The choice of the functional class depends on the task. In some cases, these may be power polynomial, in other cases, trigonometric polynomial, etc. The presence of outliers in the data measurements makes it much more difficult to find such an approximation. In this section, we will describe the general approach to solving the problem and look for suitable approximations of trend f j in the class of power polynomial with real coefficients: where n is the polynomial degree, and a ! ¼ a 0 , … , a n f gis vector of coefficients. Thus, the problem consists in the creation of an algorithm for searching the trend in the class of power polynomial and detecting outliers in specified data series y j represented in Eq. (1).

Minimizing set of specified length L
Before we turn to the trend search algorithm construction, we will define the socalled minimizing set of given length L, which plays an essential role in the trend search. In addition, in Section 7.3, we will describe a search algorithm for such set based on the recurrent formulas (17)  According to this definition, we have Note that the numbers y j are not supposed to be in the ascending order. Next, we will formulate and prove a statement similar to Assertion 1, which will allow us, when searching for a minimizing set, to proceed from the original series to its ordered permutation. then the interval y min , y max À Á does not contain values y j that are not in the set Y L, min . Proof. Let us assume the opposite: Let y j ∉ Y L, min , y min < y j < y max . Let y k , … , y kþLÀ1 is a permutation of the numbers y j 1 , … , y j L in the ascending order; then y k ¼ y min and y kþLÀ1 ¼ y max . One of these cases is possible: a. y j < z Y L, min and therefore subject to inequality y k < y j , we have z Y L, min À y k À Á > z Y L, min À y j ) z Y L, min À y k À Á 2 > z Y L, min À y j 2 ) z Y L, min À y j 2 À z Y L, min À y k À Á 2 < 0 (42) b. y j ≥ z Y L, min and therefore subject to inequality y kþLÀ1 > y j , we have In the first case, Case (a), we replace the value y k in the set y k , … , y kþLÀ1 È É with y j . In the second case, Case (b), we replace the value y kþLÀ1 with y j . We want to show that doing such replacement, the value σ 2 Y L, min expressed in Eqs. (40) and (41) will decrease. This will mean that Y L, min is not a minimizing set.
Suppose Case (a). For brevity, we will write below z instead of z Y L, min and σ 2 instead of σ 2 Y L, min . Denotez andσ 2 , the similar values obtained after replacement y k with y j . We have: andz We want to show thatσ Eqs. (44) and (46) imply:z Modifyσ 2 expressed in Eq. (47) taking account of Eq. (49): þ y kþLÀ1 À z À L À1 y j À y k 2 þ y j À z À L À1 y j À y k 2 : After simplification with taking into account Eq. (45), we get from here: From (42), it follows (recall that we write z instead of z Y L, min ) that the expression in the square brackets is strictly less than zero. Thus, inequality (48) is proven. From this follows that the set Y L, min is not minimizing one because the condition (41) is not met. Thus, we have arrived at a contradiction that proves the validity of the formulated Assertion 6. Case (b) is considered similarly.

Y L, min search algorithm
Assertion 6 is much like Assertion 1, which made it possible to go from an arbitrary numerical series to an ordered one for the optimal solution search (see Section 3). Similarly, Assertion 6 makes it possible to go to an ordered number series to find the minimizing set of numbers of a given length. In our case, considerations similar to those presented in Section 3 may be made when searching for a minimizing set. Namely, if Y L, min ¼ y j 1 , … , y j L n o is the minimizing set of length L, then an arbitrary permutation of the numbers y j 1 , … , y j L will also be the minimizing set. Thus, the process for searching of Y L, min does not depend on the arrangement of given numbers y j . This allows us to arrange them in the order most suitable for searching the minimizing set. We arrange the values y j n o in the ascending order, and we will look for the minimizing set in the ordered sequence. As in Section 3, for brevity, the ordered sequence will also be denoted by y j n o . For simplicity of logic, we will assume that all y j are different, that is, Eq. (10) holds. Rewrite it for convenience: y 1 < y 2 < … < y N Note that due to Assertion 6, if Y L, min is the minimizing set and y j 1 and y j L are its smallest and greatest values, respectively, then all values of y j from the interval y j 1 , y j L belong to Y L, min . Consequently, considering Eq. (10), we have y j L ¼ y j 1 þLÀ1 and therefore Y L, min ¼ y j 1 , y j 1 þ1 , … , y j 1 þLÀ1 n o : Thus, in order to find a minimizing set of length L, it is sufficient for us to check N À L + 1 sets and choose from them the one that has minimal SD value. In the minimizing set searching procedure, we use the appropriate designations z k, L ð Þand σ 2 k, L ð Þ expressed by Eqs. (13) and (14) to denote mean and square SD in the case of sets composed of the ascending ordered values.
The calculation of z k, L ð Þand σ 2 k, L ð Þwhen searching Y L, min can be carried out in the manner similar to that of the optimal solution according to the schematic, in which only the σ 2 k, L ð Þis shown: At first, we carry out the transitions in the direction of the vertical arrows (see diagram in Figure 7) and calculate sequential values σ 2 1, N ð Þ, σ 2 1, N À 1 ð Þ , … , σ 2 1, L ð Þ using formula (17)-(19). Finally, we go in the direction of the horizontal arrows and calculate values of Þusing formula (20)-(22) and choose from them the minimal one.

Trend search algorithm
In order to correctly detect outliers in measurement data that include an unknown trend, it is necessary to find and remove trend from the original data. The problem in determining of unknown trend is to find a suitable profile for the measurement data by adjusting the fitting parameters. This implies, in turn, the needs to select from the specified series y j the "right" reference values y j 1 , … , y j L n o used for fitting. If the data do not contain coarse measurements, all N values of the original data can be treated as reference ones and used for fitting. If the data contain rough measurements, the participation all or some of those values in the fit will result in a fatal distortion of the trend function and, as a result, an incorrect determination of the outliers. Thus, when determining the trend, it is necessary to exclude rough measurements from the number of reference values used for fitting and by which the trend approximations are built. We have the vicious circle. To determine outliers, it is necessary to find a proper trend approximation, and to find an exact trend approximation, we need to find all outliers and remove them from the values used for fitting. Another vicious circle is obtained when trying to choose the number of values for fitting. If we take it too small to minimize the possibility for outliers to be included into the subset of values for fitting, the data fit may not be accurate enough for the rest values of the data outside of the subset. If, on the contrary, we take rather large number of points for fitting, the outliers may be among them, and we will also get a bad approximation of the trend. We describe herein a strategy for finding an unknown trend and detecting outliers. This strategy assumes that the number of outliers in the series presented by Eq. (1) does not exceed a certain value N maxout known a priori. Thus, we can suppose that the number of "right" values suitable for fitting in the series y j is not less than N À N maxout .
Below L is supposed to be fixed and associated with the number of the reference values of the series (1) used for fitting, and L ≤ N À N maxout .
Let us consider the following algorithm. It contains internal iterations, which we will denote with the upper index "s" in parentheses.
Step 0: n = 0. We set some L, satisfying the condition L ≤ N À N maxout .
Step 2: s++. We fit polynomial to the data set and find fitting coefficients a Step 3: Consider the valuesŷ s ð Þ j obtained after subtraction from y j the polynomial values with coefficients found in Step 2: Using the algorithm described in Section 7.3, we find from the numbersŷ We transit to Step 2 and do so until the convergence of the σ Y s ð Þ L, min , s = 1,2, … , is achieved. Note that we will consider the issue of convergence further in Section 7.5. After reaching convergence of values σ Y s ð Þ L, min , we transit to Step 4 for outlier detection.
We find the optimal solution for valuesŷ s ð Þ j using the algorithm of Section 3 and determine the number N out of outliers detected. If it turns out that N out ≤N maxout , then solution is considered found; searching process stops: f j ¼ P n,j a ! s ð Þ . Otherwise, if N out >N maxout , then we transit to Step 1.
We do this until we find a solution or until n reaches some preset value N max (e.g., 10). In this case, probably, we may need to select a different functional class to search for a trend.
Example. Let us consider the data simulated in accordance with formula: evenly distributed in the segment [0,1]. Let us set N = 150 and model 10 outliers at points j ¼ 6:::10 and j ¼ 140:::144 (see Figure 8). Further, let us set σ max = 0.6, MINOBS = 10 and search for outliers as described above with L = 130. If n = 1, we get N out = 88; if n = 2, we get N out = 33; if n = 3, we get N out = 17; if n = 4, we get N out = 10. Thus, a suitable trend approximation turned out to be a fourth degree polynomial. Figure 9 shows values for fourth degree polynomial fitted to the data set on  the eighth iteration after the convergence discussed above is reached. In Figure 10, Positions of detected outliers are marked with white circles. Note that the trend was modeled by a function that does not belong to the class of power polynomials.

Convergence of iterations in trend search algorithm
This section explains the convergence of the iterations described in the trend search algorithm (see previous section).
and therefore converged. Proof. We start our consideration with Step 3 and sth iteration, s = 1, 2, … In Step 3, for the sequenceŷ Transform the expression on the right side of Eq. (56). Substitution here instead ofŷ s ð Þ j expression in Eq. (52) gives: Þ L, min 2 : Figure 10.
Using the flag s ð Þ j defined in Eq. (53), rewrite the last equality as follows: Here we introduce the designation b We transit to Step 2; s is incremented by 1. We find a vector a ! sþ1 ð Þ of polynomial coefficients, which minimizes the functional Φ s ð Þ a ! : Thus, From the definition of the extremum of functional, it follows: Taking into account Eq. (58), we have: The extremum condition (one of n + 1) of functional Φ s ð Þ a ! is as follows: From here, we derive: Taking into account the designation forŷ sþ1 ð Þ j given by Eq. (52), we can write this equality in the form. Thus, We transit to Step 3. In this step, for sequenceŷ , we find a minimizing for this set does not exceed the same magnitude for any other set, particularly for Y Finally, from Eqs. (61) and (63) and the last inequality (64), we get , that proves Assertion 7.

Detection and repair cycle slips in the Melbourne-Wübbena combination algorithm
In this and the following section, we describe cycle slip repair algorithm for observers represented in the form of Melbourne-Wübbena combination, which is often used in modern GNSS measurement data processing programs. Loss by the receiver of the carrier phase capture results in jumps in the code and phase measurement data. In the absence of jumps, as we already discussed in Section 2, the values of the combination consist of measurement noise superimposed on an unknown constant value dependent on a specified satellite-receiver pair.
In case of temporary loss of carrier phase capture by the receiver, jumps occur in a series of values representing the Melbourne-Wübbena combination. The procedure of detecting jumps and eliminating them from the values of the combination, called cycle slip repair, is one of the most important steps of preprocessing GNSS data. The main difficulty in detecting jumps is that neither the exact size of jumps nor their epochs are known. A number of algorithms, descriptions of which can be found, for example, in Refs. [3,6,8] are proposed for the detection of jumps. Although differing in detail, they are based on a common idea, that is, comparison of the SDs of the time series of measurement data obtained from one of the bounds of the time interval to an arbitrary moment, an epoch. If the differences of the SDs corresponding to two adjacent epochs exceed a predetermined threshold value, then a jump is declared in one of these two epochs. A drawback of similar algorithms is the frequent false detection of jumps during epochs containing rough measurements (outliers) since the values of outliers can exceed the sizes of a jump itself. On the other hand, an attempt to increase the threshold value leads to the opposite effect, an inability to recognize jumps that are small in magnitude.
Below, we propose a robust cycle slip repair algorithm that allows, more reliably than similar known algorithms, to detect jumps and determine their sizes. The proposed algorithm is based on search for so-called clusters consisting of epochs, in which the values of the combination are grouped about corresponding predefined values. Besides, this algorithm implements the above-described method of cleaning data from outliers based on the search for the optimal solution. This method, combined with Springer's algorithm used in Ref. [3], allows for the reliable determination of multiple (cascade) jumps in the Melbourne-Wübbena combination.
The Melbourne-Wübbena combination L MÀW can be presented in the form of the total of three terms [6]: where λ 5 is the formal wavelength (for all GPS satellites λ 5 = 86.16 cm and for GLONASS satellites λ 5 = 84.0 ÷ 84.36 cm); n 5 is an unknown integer, the so-called wide lane ambiguity [3]; B is the residual systematic error caused by instrumental delays relative to the specific receiver-satellite pair and, as assumed, not timedependent; and ν is a random component. Both parts of the equality in (65) are expressed in meters. At the same time, the L MÀW combination is often expressed in cycles of wavelength λ 5 . Let us divide both parts of Eq. (65) by λ 5 . We introduce the designations y ¼ L MÀW =λ 5 , β ¼ B=λ 5 , and ξ ¼ ν=λ 5 and derive for each of the epochs associated with indexes j ¼ 1, … , N: where n 5j is an unknown piecewise-constant function of a discrete argument, regarding which it is only known that it takes integer values and can undergo integer jumps; β is an unknown constant value; and ξ j is a random variable. The problem consists in the development of an algorithm for automatic processing of the informational data y j of form of Eq. (66), making it possible to effectively determine integer-valued jumps against the noise component and outliers.

Description of the proposed algorithm
Let us present the proposed algorithm as the following sequence of steps.
Step 0: We introduce parameter Δ by specifying its value as Δ ¼ 2 Á σ max . We mark the indexes of epochs with function flag j as nulls: flag j ¼ 0; j = 1, … , N.
Arrange the array y j in the ascending order and renumber it by placing the indexes in the ascending order. Denote the resulting array as Y j . For simplicity of logic, we suppose that all Y j are different. Therefore, we have: Denote the corresponding permutation of the values of function flag j as FLAG j .
Step 1: On this step, we are looking for the maximum density of array values y j .
We consider the segments Y j , Y j þ Δ Â Ã of length Δ and look for the one that contains the largest number of Y i values. Here, only those indexes i and j from the range of 1 ≤ i, j ≤ N are considered for which FLAG i ¼ FLAG j = 0. The purpose of the ordering of the original array consists in improving the effectiveness of the maximum density search procedure. It can be shown that the number of comparisons required when searching for the above segment does not exceed 2 N in the case of the ordered array. For an unordered array, the number of comparisons is evaluated as N 2 .
Step 2: Let Y J max , Y J max þ Δ Â Ã be the segment found in the previous step, containing N max values of Y j (see Figure 11). We calculate the mean m of these numbers Y j .
Step 3: Cluster searching. The values y j that are clustered about the mean m found in the previous step can pertain to the epochs scattered along the time axis in chaotic order. The task of this step is to define an accumulation of such points, a socalled cluster, which we define as follows:  This definition illustrates in Figure 12.
It is understood that for specified values of m and Δ, there might be one, several, or no (m, Δ) clusters. Searching for clusters is performed by sequential checking of the satisfaction of inequality (69) for all j = 1, … , N for which flag j = 0. Note that inequality (69) is satisfied for at least N max values of j. In fact, from expressions (67) and (68), we derive the following equations: and since the arrays y j and Y j differ only by permutation, it follows from Eq. (68) that the inequalities are fulfilled for N max index j. Inequality (69) is satisfied also for all these indexes, as follows from Eqs. (70) and (71).
If cluster was found, then we mark all points of it as 1, and then, we repeat the cluster search procedure. If even just one cluster has been found at this step, we transfer to Step 1. If not even one cluster has been found, then the search for clusters is complete and we transfer to Step 4.
Step 4: If even just one cluster has been found, we transfer to Step 5, and otherwise: (a) all points of the segment 1; N ½ are marked as outliers and removed from further processing and (b) the operation of the algorithm terminates.
Step 5: Search for individual jumps in clusters. Let us assume that n ≥ 1 clusters have been found: n: k n , l n ½ is the m n , Δ ð Þcluster: In each of the clusters that are found, outliers and 1-size jumps are possible. This follows immediately from the inequalities in (69) and the preestablished value Δ ¼ 1.2. Substep 5.1: In detecting a 1-size jump in a cluster, we use modified Springer algorithm (see Refs. [9,10]) combined with the proposed in Section 3 algorithm that executes a search for the optimal solution with a minimum quantity of defective data.
Substep 5.2: We find all epochs J p and values Δn 5,J p ≜ n 5,J p À n 5,J p À1 (p = 1, … , n) of the jumps inside the clusters. The possible values for Δn 5,J p are 0, AE 1. Substep 5.3: We repair the data by the value of each found jump, using the formula ; p ¼ 1, ::, n, where y 0 ð Þ j ¼ y j . Substep 5.4: We rename: y j ¼ y n ð Þ j .
Step 6: Marking of points outside clusters. All points outside of the found clusters (if any) are marked as outliers.
Step 7: Ordering of clusters. We renumber the clusters so that they are placed left to right on the time axis. For the ordered set of clusters [k p , l p ], p = 1, … , n, the conditions 1 ≤ k 1 < l 1 < k 2 < l 2 < … < k n < l n ≤ N are satisfied.
Step 8: Data screening within clusters and improving the mean values of y j . Substep 8.1: In accordance with the algorithm proposed in Section 3, we perform screening from outliers in each of the n clusters.
Substep 8.2: For each of the clusters cleaned of outliers, we determine the modified mean values m * p .
Step 9: Jumps between clusters. It follows from the description presented above that the remaining jumps in the data y j are on boundaries between clusters. If only one cluster is found, the algorithm is terminated: no additional jumps are detected, and the data are cleaned of outliers. If more than one cluster is found (n > 1), then the epochs j 1 , … , j n-1 of the jumps will be the coordinates of the left-hand boundaries of clusters, beginning from the second: j l = k 2 , … , j n-1 = k n . The values of jumps Δn 5, j p are found in this case as rounded to the nearest integer of the difference of mean values of adjacent clusters: Step 10: Repair data. We delete the jumps between clusters using formula analogous to (72): where y 0 ð Þ j ¼ y j . We rename: y j ¼ y nÀ1 ð Þ j . The algorithm is terminated.

Numerical calculations using algorithms presented in Sections 8 and 9
We present here the results of testing the proposed algorithm using real data obtained by the JOZ2 station, which is part of the IGS network [2]. These data are included in the distribution set of the installation software package [3]. Testing was carried out for data obtained from GPS satellite with number PRN = 13 for 2010, day 207. Figure 13 shows the Melbourne-Wübbena combination values in the  107 min time interval (N = 215). The number j of time epochs counted from the beginning of 24-h day with interval of 30 seconds is plotted along the horizontal axis. The combination values y j expressed in cycles with wavelength λ 5 are plotted along the vertical axis. Figure 14a and b presents the values of the deviations from the mean value z of data after cycle slip repair procedure, by using the algorithm applied in Ref. [3] (see Figure 14a) and the proposed algorithm (Figure 14b). The values y j -z in cycles of λ 5 are plotted along the vertical axes, and the number j of epochs is plotted along the horizontal axis. Epochs in which the measurement data were rejected are marked by light circles. In the first case (see Figure 14a), 111 of the conducted measurements or 51% of the total number of data was discarded. In the second case (see Figure 14b), 29 of these measurements were rejected (13%), which are almost 38% less than in the previous computation. The epochs of detected jumps are marked by daggers.

Conclusion
This chapter presents several effective and stable algorithms for processing data received from GNSS receivers. These data form the basis of almost all engineering applications in the field of computational geo-dynamics and navigation and cadastral survey and in numerous fundamental research works as well. The accuracy of the results obtained is significantly influenced by the quality of the data used in the calculations. In particular, the presence of rough measurements (outliers) in the observation data can significantly reduce the accuracy of the calculations carried out. One of the tasks at the preliminary stage of data processing is reliable detection and removal of rough measurements from the series of measured data with minimum amount of rejected data. The so-called optimal solution, introduced in the chapter, made it possible to detect and eliminate outliers from observed data minimizing the number of rejected measurements. In addition, it is assumed that the data may contain a trend as an unknown function of time. The strategy for determining of the trend is depending on the physical process in question under an assumption that the trend is a continuous function of time. The efficiency of the search is definitely influenced by the choice of the function class from which the trend is searched. In this chapter, we considered the class of power polynomials, but in some cases, this choice may not lead to the expected result. It may require, for example, a class of trigonometric functions to find a suitable trend. The automatic search for the best functional class, together with the strategy of effectively finding an unknown trend, against the background of random noise and outliers, is a complex task for future research.