Open access peer-reviewed chapter - ONLINE FIRST

Effective Algorithms for Detection Outliers and Cycle Slip Repair in GNSS Data Measurements

By Igor V. Bezmenov

Submitted: December 6th 2019Reviewed: April 27th 2020Published: June 9th 2020

DOI: 10.5772/intechopen.92658

Downloaded: 19

Abstract

The chapter describes effective algorithms that are often used in processing data measurements in Global Navigation Satellite Systems (GNSSs). Existing effective algorithm was developed for detection and elimination of outliers from GNSS data measurements. It is based on searching for a so-called optimal solution for which standard deviation and maximum absolute deviation of the measured data from mean values do not exceed specified threshold values, and the number of the detected outliers is minimal. A modification of this algorithm with complexity of Nlog2N is discussed. Generalization of the existing algorithm to the case when data series included some unknown trend will be presented. The processing trend is assumed to be described by an unknown function of time. The generalized algorithm includes the outlier detection algorithm and trend searching algorithm that has been tested using simulated data. A new algorithm will be presented for cycle slip repair using Melbourne-Wübbena linear combination formed from GNSS data measurements on two carrier frequencies. Test results for repair data in the case of multiple (cascade) cycle slips in actual observation data will also be presented in this chapter.

Keywords

  • global navigational satellite systems (GNSSs)
  • GNSS measurements
  • outliers
  • data screening
  • optimal solution
  • trend function
  • Melbourne-Wübbena combination
  • cycle slips

1. Introduction

At present, five constellations of GNSS satellites are involved in the formation of observational data, which serve as a source for many applications related to navigation, geodesy, geodynamics, and in the performance of solving of many fundamental problems. These are American Global Positioning System (GPS), Russian Global Navigation Satellite System (GLONASS), European Galileo, Chinese BeiDou, and Japanese Quasi-Zenith Satellite System (QZSS). The satellites of each of the operating systems transmit signals, as a rule, on two L-band carriers, which are received by GNSS receivers. A large number of stations equipped with GNSS receivers and located around the World are part of the International GNSS Service (IGS) network. These stations generate observation data files and transmit them to international databases in real time [1, 2], after which these data become available for use by many institutions and laboratories over the World. When solving applications, the measurement data go through various processing steps. Significant element of the data processing is the detection of rough measurements and removal them from the further processing. Despite the fact that many of the laboratories use a high-end application of the software regarding accuracy, reliability, and robustness, the presence of rough measurements in the observational data excludes the possibility of obtaining an accurate final result. In order to obtain results of unprecedented accuracy, the measurement data must be cleared of coarse measurements or outliers. It should be noted that the concept of outliers is key in the measurement processing theory [3], and there is no general definition for it. In order to distinguish outliers from the rest of the measured data, in some cases, the deviation of the data series values from some average value of the data is considered. If the deviation from the average is exceeded by a predetermined threshold value, the measured value is considered as an outlier. Such an approach has a significant disadvantage that the exact mean is generally unknown, and the estimate obtained by averaging a series may be very inaccurate due to outliers. Existing iterative procedures are also based on the idea of calculating deviation from the average and often result in the unjustified rejection of many observations. Reducing the data involved in processing may, in turn, result in a loss of accuracy of the final result.

This chapter describes the outliers cleaning algorithm for GNSS data. The proposed algorithms are based on the search for the so-called optimal solution with the minimum amount of invalidly rejected data. The algorithm for accelerated detection of outliers in a large amount of measurements has been developed, as well as an algorithm for detecting outliers in data containing an unknown trend. In conclusion, the algorithm of jump detection in the Melbourne-Wübbena combination [3, 4, 5], including the developed procedure of cleaning data from outliers, is considered.

In Section 2, the problem of searching for the so-called optimal solution is formulated. Section 3 provides a search algorithm, with a common number of arithmetic operations not exceeding ∼N2. Section 4 presents the test results for actual measurements in global navigational satellite systems at two carrier frequencies. The searching of outliers was performed in the Melbourne-Wübbena combination. In Section 5, the assertions that are the mathematical prerequisites for justifying a fast outlier search algorithm are proved. In Section 6, the fast outlier detection algorithm with the number of arithmetic operations of Nlog2N is proposed. Section 7 describes the case of data with unknown trend. Iterative procedure of outlier search is proposed based on the finding of suitable trend approximation in polynomials class. The idea of excluding coarse measurements is based on finding a so-called minimizing set of measurement data of a given length. This distinguishes the proposed algorithm from known similar procedures in which outliers are detected by exceeding a preset threshold. The test results with simulated data are given. Sections 8 and 9 discuss the problem of detecting jumps in the Melbourne-Wübbena combination. An algorithm is proposed that includes the outlier cleaning procedure based on the search for the optimal solution. Section 10 shows the numerical calculations with real data for algorithms presented in Sections 3–9. Section 11 concludes the chapter.

2. Statement of the problem: a Melbourne-Wübbena combination

Often, measurements yjtaken at moments of time j can be presented in the form:

yj=fj+z+ξj;j=1N,E1

where fjis a trend function, depending on physical process and as a rule is unknown, the sum z+ξjis unknown random variable imposed on the trend with unknown constant z and a centered random variable ξj.

Detection of outliers in data series expressed in Eq. (1) with unknown trend is uncertain since the concept of measurement or outliers itself is uncertain. In many cases, however, the trend function is known a priori.

For example, many data processing programs often use different linear combinations formed of code and phase measurement data to eliminate unknown parameters. One such combination is the Melbourne-Wübbena combination composed of both, carrier phase and code observables as described by Melbourne [4] and Wübbena [5]. This combination eliminates the effect of the ionosphere, the geometry, the clocks, and the troposphere [3], and it is often used to detect loss of carrier phase capture in the preprocessing stages. The Melbourne-Wübbena combination generated for a specific satellite-receiver pair can be presented in the form of the sum of three terms [6]. One of the terms includes the integer wide-lane ambiguity for the two carrier frequencies [3]; the second component accounts for the satellite and receiver instrumental delays; and the third component is the measurement noise, including carrier phase and code multipath. Thus, during a time interval where the integer wide-lane ambiguity does not change, the Melbourne-Wübbena combination can be written as formula (1) with fj=const.

Another example is satellite clock correction values derived from navigation message data, which can also be represented as in Eq. (1) with fj=dj+a, where d and a are the drift and offset parameters known from navigation data, respectively. In the case where the trend is known, the measurement data after the trend subtraction can be represented as, assuming N observations:

yj=z+ξj;j=1N,E2

with an unknown constant z, which we cannot be determined in advance, because the random value ξjis not known and may contain outliers.

In Sections 2–6, we consider the case where the trend is known a priori, that is, the data can be presented as Eq. (2). A problem with an unknown trend will be discussed in Section 7. In principle, the outlier detection procedure described below is not affected by the measurement format expressed in Eq. (1) or (2); it can be applied to any set of data measurements yj.

The preliminary processing task includes rejection of rough measurements or outliers from data series (2). In other words, it is necessary to find a set YL=yj1yjLof L elements, where L is the length for which the following conditions are satisfied:

σYL=L11jj1..jLyjz2σmax,E3
yjz3σmax;yjyj1yjL,E4
LMINOBS,E5

where σYLand σmaxare the standard deviation (SD) and their associated specified threshold values; MINOBS is a parameter limiting from below the length of the required set of measured values (e.g., 10), and we will assume hereafter that MINOBS < N.

The values yjthat are not included in the set YLare treated as coarse measurements and removed from further processing. Typically, expressions in Eqs. (3)(5) are the only conditions considered in processing programs when screening out rough measurements. Below we will formulate the problem of searching for a solution, complementing the conditions expressed in Eqs. (3)(5) with two extreme conditions [7].

1. First, we will require that the length of the set sought be the maximum, that is, the number of measurements deemed to be coarse is the minimum:

Lmax.E6

Note that for the predetermined values yj, the problem solution satisfying conditions in Eqs. (3)(5) may not exist (e.g., when yjincludes a trend, in particular when yjis an arithmetic progression with a step greater than σmax). In the case when the solution does exist, we will denote the value L at which the maximum of Eq. (6) is reached as Lmax. Note that the condition expressed in Eq. (6) does not ensure the uniqueness of the set because several sets of length Lmaxcan be found that satisfy the conditions in Eqs. (3)(5).

2. From all possible sets that satisfy conditions expressed in Eqs. (3)(5) and (6), we will select the one for which the variable σYLreceives the smallest value:

σYLYL=y,j1yjL;L=Lmaxmin.E7

Let us define Yoptas follow:

Definition 1. For a given sequence of values yj, j = 1, 2, … N, the set of values:

Yopt=yj1yjLmax,E8

satisfying conditions in Eqs. (3)(7), we refer to as the optimal solution of the problem expressed in Eqs. (3)(7). The corresponding SD value is denoted by σopt.

Thus, the problem consists in the creation of a search algorithm for the optimal solution of the problem shown in Eqs. (3)(7).

In a practical situation, the precise value z, given conditions in Eqs. (3) and (4), is not known. We will estimate the values using the following formula:

z=L1jj1jLyj.E9

Note that the value z depends on the required solution, which will complicate its search.

Usually, iterative methods are used to find a solution to problem expressed in Eqs. (3)(5). For example, the algorithm implemented in the observation data smoothing program (see [3]) is designed to find a set Y satisfying the conditions given by Eqs. (3)(5). The proposed step-by-step algorithm is based on iterations (the index number of iteration is designated by the upper index in parentheses):

Step 1: Initialization: Y0=y1yN; Level0=1020; k = 0.

Step 2: Checking the length of the set Yk; if it is less than MINOBS, then the process comes to an end and a solution is not found.

Step 3: Calculation of the values zkand σkon the available set Ykusing formulas (3) and (9).

Step 4: Checking the fulfillment of the inequality σkσmax. If it is satisfied, the set Ykis accepted as the solution, and the search process comes to an end. Otherwise, there is a transition to Step 5.

Step 5: Definition of Levelk+1for outlier detection:

Levelk+1=3σk;

In order to prevent an infinite loop of iterations, a required verification is carried out:

Levelk+1<Levelk.

If this inequality is not satisfied, then the following is assumed:

Levelk+1=Levelk/2;

Step 6: Definition of a new set Yk+1to include those and only those yjfor which

yjzkLevelk+1.

Step 7: Increasing k by 1: k++. Transition to Step 2.

Note that the optimal solution cannot be found in such a manner, as confirmed by numerical calculations (see Section 4).

3. Algorithm for solving the problem

Let us formulate a statement that is the key to the creation of an effective search algorithm for the optimal solution (Eqs. (3)(7)).

Assertion 1. Let the set Yopt=yj1yjLmaxbe optimal for a specified sequence of values yjand

ymin=minyj1yjLmax,ymax=maxyj1yjLmax,

then the interval yminymaxdoes not contain values yjthat are not in the set Yopt.

Proof. In fact, let us assume the opposite: Let yjYopt, ymin<yj<ymaxand ykand ylbe values from the set Yoptfor which yk= yminand yl= ymax. One of these cases is possible:

  1. z<yjand, therefore 0<yjz<ylz=> yjz2<ylz2,

  2. zyjand, therefore 0zyj<zyk=> yjz2<ykz2.

In the first case, Case (a), we replace the value ylin the set Yoptwith yj. In the second case, Case (b), we replace ykwith yj. In any of the cases, we will have another set of the same length Lmaxfor which conditions expressed in Eqs. (3) and (4) are satisfied, but the SD, as follows from Eq. (3), is less than σopt. Consequently, the set Yoptis not optimal since requirement expressed in Eq. (7) is not satisfied. The contradiction that is reached proves Assertion 1.

Further, note that if Yopt=yj1yjLmaxis the optimal solution of the problem given by Eqs. (3)(7), then an arbitrary permutation of the numbers yj1,,yjLmaxwill also be the optimal solution of the problem described by Eqs. (3)(7). Thus, the optimal solution does not depend on the arrangement of given numbers yj. This allows us to arrange the numbers yjin the order most suitable for searching the optimal solution. Taking advantage of this important circumstance, we arrange the values yjin the ascending order and we will look for the optimal solution in the ordered sequence. For brevity, the ordered sequence will also be denoted by yj. Thus, y1y2yN. Moreover, for simplicity of logic, we will assume that all yjare different, that is,

y1<y2<<yNE10

Note that due to the formulated Assertion 1, if Yoptis the optimal set and yj1and yjLmaxare its smallest and greatest values, respectively, then all values of yjfrom the interval yj1yjLmaxbelong to Yopt. Consequently, considering Eq. (10), we have yjLmax=yj1+Lmax1and

Yopt=yj1yj1+1yj1+Lmax1

Thus, the optimal solution should be sought in the ascending sequence yjamong all possible sets ykyk+L1of length L with k and L satisfying the following conditions:

MINOBSLN,E11
1kNL+1.E12

Hence, instead of searching for all possible sets of various length, numbering 2N, for the solution of the problem described by Eqs. (3)(7), it is sufficient to vary just the two parameters, k and L, associated with the conditions expressed in Eqs. (11) and (12). The number of pairs of integer numbers k and L subject to condition in Eqs. (11) and (12) is equal to:

NMINOBSNMINOBS+1/2.

Let L=lk+1be the length of an arbitrary set ykyl. We introduce the designations:

zkL=1Lj=kk+L1yj,E13
σ2kL=1L1j=kk+L1yjzkL2.E14

We rewrite conditions given by Eqs. (3) and (4) in the new designations:

σ2kLσmax2,E15
yk+L1zkL3σmaxzkLyk3σmax.E16

Note that the two last inequalities directly follow from Eq. (4), monotony of yj, and obvious inequalities: ykzkLyk+L1.

Remark. In the conditions expressed in Eqs. (15) and (16), L means the length of the set under checking, and k is the index of the smallest number included in the set. Although “k” and “L” are also encountered as indexes in the sets we use below for monotonically increasing sequences, we hope nevertheless that this will not lead to confusion.

The following recursive relationships are available, making it possible to find zkLand σ2kLthrough the calculated values zkL+1и σ2kL+1for seven arithmetic operations:

zkL=zkL+1+AL+1zkL+1yk+L,E17
σ2kL=BL+1σ2kL+1CL+1yk+LzkL+12,E18
AL=1L1;BL=L1L2;CL=LL1L2.E19

The values of the fractions may be computed in advance as elements of a one-dimensional array. Analogously, the following formulas can make it possible to express zk+1Land σ2k+1Lthrough zkLand σ2kL:

zk+1L=zkL+AL+1yk+Lyk,E20
σ2k+1L=σ2kL+AL+1yk+LzkL+DL+1ykzkL,E21
DL=LL2.E22

The algorithm described below is based on the search for all possible pairs (k, L), where L denotes the length of the set to be checked and k is the index of the smallest of the values included in the set. At that, k and L must satisfy conditions Eqs. (11) and (12). The set ykyk+L1corresponding to each such pair must be checked for fulfillment conditions (15) and (16).

We organize this search according to the algorithm described below, at each step of which we check the validation of Eqs. (15) and (16) for all possible sets of a certain length. We start the examine process with Step 1, where we check the set of maximum length N. Further, with each next step, we will reduce the length of the sets to be checked by 1.

Step 1: We consider the set of length N. There is only one such set: y1yN. We check it for fulfillment conditions expressed in Eqs. (15) and (16). If they are satisfied, this set is selected as a solution, and further search stops. Otherwise, transit to Step 2.

Step 2: We consider the sets of length N − 1. There are two sets of length N − 1.

y1yN1andy2yN.

We test each of these sets for compliance with the conditions specified by Eqs. (15) and (16). If none of them satisfies conditions (15) and (16), then we transit to the next step. Otherwise, the following options are available:

  • Option 1: if only one set from them is found that satisfies conditions (15) and (16), then it will also be the solution of the stated problem; the search process stops here, where Lmax= N − 1.

  • Option 2: if both sets simultaneously satisfy conditions (15) and (16), we will select the set corresponding to the smallest of two values σ21N1or σ22N1, and the search process stops here, where Lmax= N − 1.

Step N − L + 1: We consider the sets of length L. If L < MINOBS, then the search process stops, and a solution is not found. For L ≥ MINOBS, we examine N − L + 1 sets of length L:

y1yL,y2yL+1,,yNL+1yN.E23

We check each of these sets for fulfillment of conditions (15) and (16). If any of them does not satisfy these conditions, then we transit to the next step where we consider the sets of length (L − 1). Otherwise, two options are possible:

  • Option 1: if only one set from (23) is found that satisfies the conditions of (15) and (16), then it will also be the solution of the stated problem with Lmax= L, and the search process stops here.

  • Option 2: if several sets simultaneously satisfy conditions (15) and (16), we chose the set for which the value σ2kLappears smallest as the solution, and the search process stops here, where Lmax= L.

In order to calculate the values zkLand σ2kL, we use the recursive formulas (17)(22) in accordance with a search scheme shown in Figure 1 where only the zkLis involved.

Figure 1.

Scheme of calculations when finding the optimal solution.

In accordance with the proposed arrangement, we calculate the values z1Nand σ21Nin the first step of the algorithm using formulas expressed in Eqs. (13) and (14), and 4 N arithmetic operations are required for this arrangement. In order to find zkLand σ2kLon all subsequent steps, we apply the recursive formulas (17)(22), making it possible to calculate the values of the specified pairs of variables, each for the seven arithmetic operations, based on the results of the calculations of the preceding step. So, for example, on the second step, proceeding from the known values z1Nand σ21N, we find the values z1N1and σ21N1(vertical arrow on the diagram) by using formulas in Eqs. (17)(19) and the values σ22Nand σ22N(horizontal arrow on the diagram) by using Eqs. (20)(22). In the general case, the transition to the following step in the direction of the vertical arrows (see diagram) is carried out according to formulas (17)(19), and in the direction of horizontal arrows, according to formulas (20)(22). The number of arithmetic operations required to find the solution should not exceed:

4N+9NLmax+2NLmax+1/21E24

In the above number of computations, the computational costs of verifying the satisfaction of inequalities (15) and (16) are also considered, which comprise from 0 to 2 arithmetic operations.

4. Results of calculations using algorithm presented in Section 3

We test the algorithms discussed above on the real data obtained by the ONSA station that is a part of the IGS network [2]. These data are included in the distribution kit of the installation software package [3] and available for usage. We consider measurement data received from global positioning system (GPS) satellite with system number PRN = 12 for 2010, day 207 to check the efficiency of the proposed algorithm described in Section 3. Figure 2 plots the values of the Melbourne-Wübbena combination over a time interval of 89.5 min (N = 180). The index numbers j of time epochs counting from the beginning of a 24-h period with a 30-second interval are plotted on the horizontal axis. The values yjof the combination are plotted on the vertical axis and are expressed in cycles with wavelength λ50.86[3]. Figure 3 shows the values of deviations from the mean of the data cleared of outliers using the algorithms described in Sections 2 and 3. In both cases, σmax= 0.6 and MINOBS = 10. The values yjzare plotted on the vertical axes in cycles with wavelength λ5and the index numbers j of epochs on the horizontal axis. Epochs in which the data were rejected are designated by white circles. In the first case (see Figure 3a), 47 data of the measurements were rejected, which are 26.1% of the total amount of data. In the second case (see Figure 3b), 14 of these measurements were rejected (7.8%), which are almost 18% less than in the previous calculation.

Figure 2.

Melbourne-Wübbena combination for ONSA station (GPS satellite, PRN = 12 for 2010, day 207).

Figure 3.

(a) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the algorithm described in Section 2. (b) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the developed algorithm (see Section 3).

We also provide similar results for data obtained by TLSE station, which is also included in the IGS network. We consider measurement data from GLONASS, Russia satellite with system number PRN = 1 for 2010, day 207. Figure 4 shows the values of the Melbourne-Wübbena combination over a time interval of 65.5 min (N = 132). Figure 5 plots the values of deviations from the mean value of the data cleared of outliers using the algorithms described in Sections 2 and 3, respectively. Parameters σmaxand MINOBS are the same as in the previous calculation example. In the first case (see Figure 5a), 41 data of the measurements were discarded, which are 31% of the total amount data. In the second case (see Figure 5b), 8 of these measurements were rejected (6%), which are 25% less than in the previous calculation.

Figure 4.

Melbourne-Wübbena combination for TLSE station (GLONASS satellite, PRN = 1 for 2010, day 207).

Figure 5.

(a) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the algorithm described in Section 2. (b) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the developed algorithm (see Section 3).

5. Mathematical prerequisites for modifying of existing algorithm

Note that the number of arithmetic operations required to find the optimal solution according to the algorithm described in Section 3 depends on the Lmax, which is the length of the solution. As can be seen from Eq. (24), the smaller the length of the found solution (i.e., the larger the number of detected outliers), the more arithmetic operations are required to find it. This number of arithmetic operations is estimated to be of order N+NLmax2. Note that the expression in the parentheses herein is equal to the number of outliers detected. Thus, if the number of outliers in the original data series is comparable to N, it will take ∼N2arithmetic operations to find the optimal solution. In particular, to make certain that there is no solution (e.g., in the case where the data contain a trend), ∼N2arithmetic operations will also be required. In Section 6, we will modify the existing algorithm and describe fast outlier search algorithm that requires ∼Nlog2Narithmetic operations.

The necessary preparations are given in this section. Note that in this and the next sections we are dealing with the sequence yjj=1Narranged in the ascending order.

Assertion 2. Let yjj=1Nbe monotonically increasing sequence. The following inequality is true:

σ2kL+1minσ2kLσ2k+1L.E25

Proof. From the monotonicity of the sequence yjand the definition zkL+1[see Eq. (13)],

ykzkL+1yk+L.E26

One of two cases is possible:

  1. 2zkL+1yk+L+yk, zkL+1ykyk+LzkL+1,

  2. 2zkL+1>yk+L+yk, zkL+1yk>yk+LzkL+1.

Suppose, for example, the case (a) holds. Let us show that in this case

σ2kL+1σ2kL.E27

At first, we will show that:

yjzkL+1yk+LzkL+1;j=k,,k+L.E28

Truly, inequalities:

ykyjyk+L;j=,k+L,

and the above inequality derived in Case (a) implies:

zkL+1yjzkL+1ykyk+LzkL+1,E29
yjzkL+1yk+LzkL+1.E30

These inequalities, in turn, imply Eq. (28). Next let us prove (27). This inequality is expanded as follows:

1Lj=kk+LyjzkL+121L1j=kk+L1yjzkL2.

Substituting here of the expression (17) in place of zkLand writing for brevity, z instead of zkL+1, we get inequality:

L1j=kk+Lyjz2Lj=kk+L1yjzzyk+LL2.E31

Transform the right-hand side of this inequality:

RHS31=Lj=kk+Lyjz+yk+LzL2L+12Lyk+Lz2=Lj=kk+Lyjz2+L+1Lyk+Lz2L+12Lyk+Lz2.

Here we take into account the equality: j=kk+Lyjz=0. Next, we have:

RHS31=Lj=kk+Lyjz2L+1yk+Lz2.

Substituting this expression in Eq. (31), we get inequality

j=kk+Lyjz2L+1yk+Lz2,

that is true due to Eq. (28). Thus, Eq. (27) is proved for case (a). Analogously, case (b) is considered.

We introduce the notation:

σmin2L=min1kNL+1σ2kL.E32

Assertion 3. The following inequalities hold:

σmin2Nσmin2N1σmin2MINOBS.E33

That is, the sequence σmin2Lmonotonically decreases when L decreases from N to MINOBS.

Proof. Assertion 2 and definition of σmin2Lexpressed in Eq. (32) imply:

σ2kL+1minσ2kLσ2k+1Lσmin2L.

Since k is chosen arbitrarily, then for all L = MINOBS, …, N − 1 the following inequalities hold:

σmin2L+1σmin2L,

which proves Assertion 3.

Assertion 3 implies the following corollary.

Corollary 1. If the inequality

σmin2L0>σmax2.E34

holds for some L0, then for existence of the solution YL=ykyk+1yk+L1for the problem expressed in Eqs. (3)(7), it is necessary that the length L of the set YLsatisfies the condition L < L0.

Proof. Let us assume that L L0. Assertion 3 on account of monotony of σmin2, expressed in Eq. (33) implies the following inequalities σmin2Lσmin2L0>σmax2for all L≥L0. From this, it follows that for any set YL=ykyk+1yk+L1we will have σ2kLσmin2L>σmax2. Thus, any of sets YLof length L ≥L0does not satisfy the condition in Eq. (15) and, therefore, cannot be a solution of the problem, expressed in Eqs. (3)(7).

In particular, we have come to the next important result. If, for example, the inequalities σmin2MINOBS>σmax2are fulfilled, then the solution for the problem described in Eqs. (3)(7) does not exist.

In the above-described procedure for solving problem (3)(7), it takes ∼N2arithmetic operations to make certain that the solution not exists. Taking into account Assertion 3 and Corollary 1, the search procedure may begin by checking the conditions.

σmin2N=σ21Nσmax2andσmin2MINOBSσmax2.

This will require approximately ∼N arithmetic operations. If none of these conditions are fulfilled, the solution search must stop because the solution does not exist. As a result, only ∼N arithmetic operations are required to ensure that there is no solution.

6. Fast outlier search algorithm

The above proposed search procedure consists in the calculating values of zkLand σ2kLusing recurrent formulas (17)(22) and checking at every k and L the fulfillment of the inequalities (11) and (12). The algorithm complexity is estimated by value of ∼N+NOutlier2, where NOutlieris the number of outliers found. If it is known a priori that there are few outliers in the measurement data, then the search algorithm for the optimal solution that described in Section 3 can be applied. If the measurement data contain an N-comparable number of outliers, the complexity of such an algorithm will be estimated by about N2. It is namely for such type of data we below describe a modified outlier search algorithm with complexity of about Nlog2N.

First of all, note one property that is the key to the construction of a fast outlier search algorithm. Note that if the inequality (15) holds for some set of length L + 1, then there exists a set of length L for which the inequality (15) is valid too. Truly, let assume for some k the inequality σ2kL+1σmax2holds. This inequality and Eq. (25) imply

minσ2kLσ2k+1Lσmax2.

From this, it follows that

σ2kLσmax2and/orσ2k+1Lσmax2.E35

This means that at least one of these sets ykyk+L1and yk+1yk+Lwith length of L satisfies conditions expressed in Eq. (15).

However, this property is not true when checking the conditions expressed in Eq. (16). In other words, if these conditions are fulfilled for any set of length L + 1, it might happen that none of the sets of length L may satisfy them. This fact is a significant obstacle to increasing the rate of outlier detection that is necessary when processing a large amount of data with a large number of rough measurements. To overcome this obstacle, we will make the condition expressed in Eq. (16) weaker.

First of all, note that if for some set ykyk+L1, the both conditions expressed in Eq. (16) are fulfilled, then the following inequality will hold:

yk+L1yk6σmax.E36

Consider a problem with condition expressed in Eq. (36) instead of conditions expressed in Eq. (16).

Remark. Recall that in this condition L means the length of the set under checking, and k is the index of the smallest number included in the set. Although “k” and “L” are also encountered as indexes in the sets we use hereinafter, we hope nevertheless that this will not lead to confusion.

It is easily seen that condition expressed in Eq. (36) for an arbitrary set ykyk+Lof length L + 1 implies the fulfillment this condition for each of the sets ykyk+L1and yk+1yk+Lof length L. In fact, the fulfillment condition (36) for the set ykyk+Lmeans the fulfillment of inequality yk+Lyk6σmaxfrom which due to monotony of yj, the inequalities imply yk+Lyk+16σmaxand yk+L1yk6σmax.

Thus, we have established the validity of the following assertion.

Assertion 4. If the set yk…yk+Lof length L + 1 satisfies conditions (15) and (36), then at least one of the two sets ykyk+L1or yk+1yk+Lof length L also satisfies conditions (15) and (36).

Based on this statement, we can formulate the following:

Assertion 5. Solution for the problem (15) + (36) can be found for ∼Nlog2Narithmetic operations.

Proof. Let us consider the sequence of steps.

Step 0: Consider the segment NLeft0NRight0of numerical axis, where NLeft0= MINOBS, NRight0= N. In this step, there is one set y1yNof length N. We check it for fulfillment of the conditions expressed in Eqs. (15) and (36) with k = 1, L = N. If they are satisfied, this set is the solution, and our search stops. Otherwise, we pass to considering of NMINOBS1sets of length MINOBS. We check each of these sets for fulfillment of conditions expressed in Eqs. (15) and (36). If none of them satisfy these conditions, then we stop the search process and conclude that the solution does not exist. Otherwise, once we find the set of length MINOBS satisfying conditions (15) and (36), we transit to Step 1.

Step 1: Step 1 is the same as Step k described below for k = 1

Step k: On the kth step, where (k ≥ 1), we consider a segment NLeftk1NRightk1, which we halve, as a result we receive two segments NLeftk1NMidk1and NMidk1+1NRightk1, where NMidk1= NLeftk1+ NRightk1NLeftk1/2, [·] designate integral part of a number. Next, we check the sets of length NMidk1and NMidk1+ 1 for fulfillment of conditions (15) and (36). The following three cases are possible, schematically shown in Figure 6. The sign “−” above the point means that for none of the sets of the corresponding length the conditions (15) + (36) are satisfied, the sign “+” vice versa, that is, there is at least one set of the corresponding length satisfying (15) + (36). In the case of (a), we set NLeftk= NMidk1+ 1, NRightk= NRightk1and transit to the (k + 1)-th step; in the case of (c), we set NLeftk= NLeftk1, NRightk= NMidk1and transit to the (k + 1)-th step; in the case of (b), we search the solution (15) + (36) with minimal value of σ2kLwith L = NMidk1; the algorithm is terminated.

Figure 6.

Three possible cases for the proposed search. In case (a) we go to the right-hand side range (range for length of sets) to find a solution, in case (c) we go to the left-hand side range to find a solution, in case (b) we look for a solution with length L = NMidk−1 and the search ends.

The search process will continue until either case (b) or until the length of the segment NLeftkNRightkis less than or equal to 1. In either case, the total number of steps will not exceed the number of log2NMINOBS. Since ∼N operations are performed in each step, the search process is guaranteed to be terminated in Nlog2Narithmetic operations.

7. Search an unknown trend in the power polynomial class

The need to process GNSS measurements including a trend on which noise and outliers are superimposed arises at different processing stages of the application process. As already stated above, satellite clock corrections contain a linear trend. In some cases, it may not be known, and then, one has to search for it, for example, by the least square method. The presence of outliers in the measurement data is a significant obstacle to accurate determination of drift and offset parameters of satellite clocks. Other examples are linear combinations of code and phase data on two carriers [3]. To obtain high accuracy results, it is necessary to detect outliers against an unknown trend and remove them from further processing. This is the subject of this section.

7.1 Statement of the problem

Consider the problem of outlier detecting in data presented in the form of Eq. (1), recall that:

yj=fj+z+ξj;j=1N.

The procedure described above for finding the optimal solution in an ordered series of numbers may not produce an adequate result if applied to data containing an unknown trend. For example, there may be no solution, and all data will be defined as outliers. In order to detect outliers in a series of numbers with a trend using the algorithm described above, it is necessary to find a suitable approximation of an unknown function fjand subtract it from the data set. Searching for this approximation is usually done by selecting functions from some functional class. The choice of the functional class depends on the task. In some cases, these may be power polynomial, in other cases, trigonometric polynomial, etc. The presence of outliers in the data measurements makes it much more difficult to find such an approximation. In this section, we will describe the general approach to solving the problem and look for suitable approximations of trend fjin the class of power polynomial with real coefficients:

Pn,ja=anj/Nn+an1j/Nn1++a0,E37

where n is the polynomial degree, and a=a0anis vector of coefficients.

Thus, the problem consists in the creation of an algorithm for searching the trend in the class of power polynomial and detecting outliers in specified data series yjrepresented in Eq. (1).

7.2 Minimizing set of specified length L

Before we turn to the trend search algorithm construction, we will define the so-called minimizing set of given length L, which plays an essential role in the trend search. In addition, in Section 7.3, we will describe a search algorithm for such set based on the recurrent formulas (17)(19) and (20)(22).

Let YL=yj1yjLbe an arbitrary set of length L, composed of the values of a numerical series yjj=1N, the monotony is not supposed. The mean and the SD values for it are denoted by zYLand σYL. These values are calculated by standard formulas:

zYL=L1jj1jLyj,E38
σYL2=L11jj1..jLyjzYL2.E39

Definition 2. Given L for a specified sequence of values yjj=1Nthe set of values:

YL,min=yj1yjL,

at which the minimum value of σYL2defined in Eq. (39) is reached will be called the minimizing set of length L. The corresponding mean and SD values are denoted by zYL,minand σYL,min.

According to this definition, we have

zYL,min=L1jj1jLyj,E40
σYL,min2=L11jj1jLyjzYL,min2=minYLσYL2.E41

Minimum in Eq. (41) is searched by all kinds of sets of length L composed of numbers of series yjj=1N.

Note that the numbers yjare not supposed to be in the ascending order.

Next, we will formulate and prove a statement similar to Assertion 1, which will allow us, when searching for a minimizing set, to proceed from the original series to its ordered permutation.

Assertion 6. Let YL,min=yj1yjLbe a minimizing set of length L for a given sequence of values yjj=1Nand

ymin=minyj1yjL,ymax=maxyj1yjL,

then the interval yminymaxdoes not contain values yjthat are not in the set YL,min.

Proof. Let us assume the opposite: Let yjYL,min, ymin<yj<ymax. Let yk,,yk+L1is a permutation of the numbers yj1,,yjLin the ascending order; then yk=yminand yk+L1=ymax. One of these cases is possible:

a. yj<zYL,minand therefore subject to inequality yk<yj, we have

zYL,minyk>zYL,minyjzYL,minyk2>zYL,minyj2
zYL,minyj2zYL,minyk2<0E42

b. yjzYL,minand therefore subject to inequality yk+L1>yj, we have

yk+L1zYL,min>yjzYL,minyjzYL,min2yk+L1zYL,min2<0E43

In the first case, Case (a), we replace the value ykin the set ykyk+L1with yj. In the second case, Case (b), we replace the value yk+L1with yj. We want to show that doing such replacement, the value σYL,min2expressed in Eqs. (40) and (41) will decrease. This will mean that YL,minis not a minimizing set.

Suppose Case (a). For brevity, we will write below z instead of zYL,minand σ2instead of σYL,min2. Denote z˜and σ˜2, the similar values obtained after replacement ykwith yj. We have:

z=L1yk++yk+L1,E44
σ2=L11ykz2++yk+L1z2,E45

and

z˜=L1yk+1++yk+L1+yj,E46
σ˜2=L11yk+1z˜2++yk+L1z˜2+yjz˜2.E47

We want to show that

σ˜2<σ2.E48

Eqs. (44) and (46) imply:

z˜=z+L1yjyk.E49

Modify σ˜2expressed in Eq. (47) taking account of Eq. (49):

σ˜2=L11yk+1zL1yjyk2+
+yk+L1zL1yjyk2+yjzL1yjyk2.

After simplification with taking into account Eq. (45), we get from here:

σ˜2=σ2+L11yjz2ykz2L1yjyk2.

From (42), it follows (recall that we write z instead of zYL,min) that the expression in the square brackets is strictly less than zero. Thus, inequality (48) is proven. From this follows that the set YL,minis not minimizing one because the condition (41) is not met. Thus, we have arrived at a contradiction that proves the validity of the formulated Assertion 6. Case (b) is considered similarly.

7.3 YL,minsearch algorithm

Assertion 6 is much like Assertion 1, which made it possible to go from an arbitrary numerical series to an ordered one for the optimal solution search (see Section 3). Similarly, Assertion 6 makes it possible to go to an ordered number series to find the minimizing set of numbers of a given length. In our case, considerations similar to those presented in Section 3 may be made when searching for a minimizing set. Namely, if YL,min=yj1yjLis the minimizing set of length L, then an arbitrary permutation of the numbers yj1,,yjLwill also be the minimizing set. Thus, the process for searching of YL,mindoes not depend on the arrangement of given numbers yj. This allows us to arrange them in the order most suitable for searching the minimizing set. We arrange the values yjin the ascending order, and we will look for the minimizing set in the ordered sequence. As in Section 3, for brevity, the ordered sequence will also be denoted by yj. For simplicity of logic, we will assume that all yjare different, that is, Eq. (10) holds. Rewrite it for convenience:

y1<y2<<yN

Note that due to Assertion 6, if YL,minis the minimizing set and yj1and yjLare its smallest and greatest values, respectively, then all values of yjfrom the interval yj1yjLbelong to YL,min. Consequently, considering Eq. (10), we have yjL=yj1+L1and therefore

YL,min=yj1yj1+1yj1+L1.

Thus, in order to find a minimizing set of length L, it is sufficient for us to check N − L + 1 sets

y1yL,y2yL+1,,yNL+1yN.

and choose from them the one that has minimal SD value. In the minimizing set searching procedure, we use the appropriate designations zkLand σ2kLexpressed by Eqs. (13) and (14) to denote mean and square SD in the case of sets composed of the ascending ordered values.

The calculation of zkLand σ2kLwhen searching YL,mincan be carried out in the manner similar to that of the optimal solution according to the schematic, in which only the σ2kLis shown:

At first, we carry out the transitions in the direction of the vertical arrows (see diagram in Figure 7) and calculate sequential values σ21N,σ21N1,,σ21Lusing formula (17)(19). Finally, we go in the direction of the horizontal arrows and calculate values of σ21L,σ22L,,σ2NL+1Lusing formula (20)(22) and choose from them the minimal one.

Figure 7.

Scheme of calculations when finding the minimizing set of length L.

7.4 Trend search algorithm

In order to correctly detect outliers in measurement data that include an unknown trend, it is necessary to find and remove trend from the original data. The problem in determining of unknown trend is to find a suitable profile for the measurement data by adjusting the fitting parameters. This implies, in turn, the needs to select from the specified series yjthe “right” reference values yj1yjLused for fitting. If the data do not contain coarse measurements, all N values of the original data can be treated as reference ones and used for fitting. If the data contain rough measurements, the participation all or some of those values in the fit will result in a fatal distortion of the trend function and, as a result, an incorrect determination of the outliers. Thus, when determining the trend, it is necessary to exclude rough measurements from the number of reference values used for fitting and by which the trend approximations are built. We have the vicious circle. To determine outliers, it is necessary to find a proper trend approximation, and to find an exact trend approximation, we need to find all outliers and remove them from the values used for fitting. Another vicious circle is obtained when trying to choose the number of values for fitting. If we take it too small to minimize the possibility for outliers to be included into the subset of values for fitting, the data fit may not be accurate enough for the rest values of the data outside of the subset. If, on the contrary, we take rather large number of points for fitting, the outliers may be among them, and we will also get a bad approximation of the trend.

We describe herein a strategy for finding an unknown trend and detecting outliers. This strategy assumes that the number of outliers in the series presented by Eq. (1) does not exceed a certain value Nmaxoutknown a priori. Thus, we can suppose that the number of “right” values suitable for fitting in the series yjis not less than N − Nmaxout.

Below L is supposed to be fixed and associated with the number of the reference values of the series (1) used for fitting, and L ≤ N − Nmaxout.

Let us consider the following algorithm. It contains internal iterations, which we will denote with the upper index “s” in parentheses.

Step 0: n = 0. We set some L, satisfying the condition L ≤ N − Nmaxout.

Step 1: n++; s = 0; flagj0=1.

Step 2: s++. We fit polynomial to the data set and find fitting coefficients as=a0sans:

as=argmina0,,an1,anΦs1a,E50
Φs1aj=1NyjPn,ja2flagjs1.E51

Step 3: Consider the values ŷjsobtained after subtraction from yjthe polynomial values with coefficients found in Step 2:

ŷjs=yjPn,jas.E52

Using the algorithm described in Section 7.3, we find from the numbers ŷjsj=1Ndefined in Eq. (52) a minimizing set YL,mins=ŷj1ssŷjLssof length L. For this set, we calculate the corresponding values of the mean zYL,minsand SD σYL,minsaccording to the formulas (40) and (41) in which j1jLis replaced by j1sjLs. We redefine the reference points for the next fit process (Step 2) by marking them with flagjs:

flagjs=1,ifjj1sjLs0,otherwise.E53

We transit to Step 2 and do so until the convergence of the σYL,mins, s = 1,2, …, is achieved. Note that we will consider the issue of convergence further in Section 7.5. After reaching convergence of values σYL,mins, we transit to Step 4 for outlier detection.

Step 4: Searching the optimal solution for data set ŷjs=yjPn,jas, j = 1, …, N.

We find the optimal solution for values ŷjsusing the algorithm of Section 3 and determine the number Noutof outliers detected. If it turns out that NoutNmaxout, then solution is considered found; searching process stops: fj=Pn,jas. Otherwise, if Nout>Nmaxout, then we transit to Step 1.

We do this until we find a solution or until n reaches some preset value Nmax (e.g., 10). In this case, probably, we may need to select a different functional class to search for a trend.

Example. Let us consider the data simulated in accordance with formula: yj= 10j+10+2randomj, where j=1N, randomj—pseudorandom numbers, evenly distributed in the segment [0,1]. Let us set N = 150 and model 10 outliers at points j=6...10and j=140...144(see Figure 8). Further, let us set σmax= 0.6, MINOBS = 10 and search for outliers as described above with L = 130. If n = 1, we get Nout= 88; if n = 2, we get Nout= 33; if n = 3, we get Nout= 17; if n = 4, we get Nout= 10. Thus, a suitable trend approximation turned out to be a fourth degree polynomial. Figure 9 shows values for fourth degree polynomial fitted to the data set on the eighth iteration after the convergence discussed above is reached. In Figure 10, the differences ŷj=yjP4,jaare plotted. Positions of detected outliers are marked with white circles. Note that the trend was modeled by a function that does not belong to the class of power polynomials.

Figure 8.

Data simulated using: yj = 10j+10+2randomj and fourth degree polynomial after the first iteration.

Figure 9.

Simulated data and fourth degree polynomial after the eighth iteration.

Figure 10.

The differences ŷj=yj−P4,ja→, approximation of unknown trend with fourth degree polynomial.

7.5 Convergence of iterations in trend search algorithm

This section explains the convergence of the iterations described in the trend search algorithm (see previous section).

Assertion 7. The SD sequence σYL,minsof values calculated in the trend search algorithm monotonically decreases at s = 1, 2, …:

σYL,minsσYL,mins+1E54

and therefore converged.

Proof. We start our consideration with Step 3 and sth iteration, s = 1, 2, In Step 3, for the sequence ŷjsj=1N, we find a minimizing set of length L: YL,mins=ŷj1ssŷjLss. Write the expressions for the mean and the square of SD corresponding to this set. We have due to Eqs. (40) and (41):

zYL,mins=L1jj1sjLsŷjs,E55
σYL,mins2=L11jj1sjLsŷjszYL,mins2.E56

Transform the expression on the right side of Eq. (56). Substitution here instead of ŷjsexpression in Eq. (52) gives:

σYL,mins2=L11jj1sjLsyjPn,jaszYL,mins2.

Using the flagjsdefined in Eq. (53), rewrite the last equality as follows:

σYL,mins2=L11j=1NyjPn,jaszYL,mins2flagjs==L11j=1NyjPn,jb2flagjs=L11Φsb.E57

Here we introduce the designation b=a0s+zYL,minsa1sansand take into account the definition of Pn,jaexpressed by Eq. (37) and the definition of functional Φsagiven in Eq. (51). From Eq. (57), we get:

σYL,mins2=L11ΦsbE58

We transit to Step 2; s is incremented by 1. We find a vector as+1of polynomial coefficients, which minimizes the functional Φsa:

as+1=argminaΦsa.

Thus,

Φsas+1=minaΦsaE59

From the definition of the extremum of functional, it follows:

Φsas+1Φsb.E60

Taking into account Eq. (58), we have:

σYL,mins2L11Φsas+1.E61

The extremum condition (one of n + 1) of functional Φsais as follows:

a0Φsaa=as+1=0.

From here, we derive:

j=1NyjPn,jas+1flagjs=0.

Taking into account the designation for ŷjs+1given by Eq. (52), we can write this equality in the form.

j=1Nŷjs+1flagjs=0orjj1sjLsŷjs+1=0.E62

Consider the set YLs+1=ŷj1ss+1ŷjLss+1. The mean and SD values for it are calculated using formulas (38) and (39). Taking into account Eq. (62), we get:

zYLs+1=L1jj1sjLsŷjs+1=0,

and

σYLs+12=L11jj1sjLsŷjs+1zYLs+12=L11jj1sjLsŷjs+12=
=L11j=1NyjPn,jas+12flagjs=L11Φsas+1.

Thus,

σYLs+12=L11Φsas+1.E63

We transit to Step 3. In this step, for sequence ŷjs+1j=1N, we find a minimizing set of length L: YL,mins+1=ŷj1s+1s+1ŷjLs+1s+1. The SD square σYL,mins+12for this set does not exceed the same magnitude for any other set, particularly for YLs+1=ŷj1ss+1ŷjLss+1:

σYLs+12σYL,mins+12.E64

Finally, from Eqs. (61) and (63) and the last inequality (64), we get

σYL,mins2L11Φsas+1=σYLs+12σYL,mins+12

that is, σYL,mins2σYL,mins+12, that proves Assertion 7.

8. Detection and repair cycle slips in the Melbourne-Wübbena combination algorithm

In this and the following section, we describe cycle slip repair algorithm for observers represented in the form of Melbourne-Wübbena combination, which is often used in modern GNSS measurement data processing programs. Loss by the receiver of the carrier phase capture results in jumps in the code and phase measurement data. In the absence of jumps, as we already discussed in Section 2, the values of the combination consist of measurement noise superimposed on an unknown constant value dependent on a specified satellite-receiver pair.

In case of temporary loss of carrier phase capture by the receiver, jumps occur in a series of values representing the Melbourne-Wübbena combination. The procedure of detecting jumps and eliminating them from the values of the combination, called cycle slip repair, is one of the most important steps of preprocessing GNSS data. The main difficulty in detecting jumps is that neither the exact size of jumps nor their epochs are known. A number of algorithms, descriptions of which can be found, for example, in Refs. [3, 6, 8] are proposed for the detection of jumps. Although differing in detail, they are based on a common idea, that is, comparison of the SDs of the time series of measurement data obtained from one of the bounds of the time interval to an arbitrary moment, an epoch. If the differences of the SDs corresponding to two adjacent epochs exceed a predetermined threshold value, then a jump is declared in one of these two epochs. A drawback of similar algorithms is the frequent false detection of jumps during epochs containing rough measurements (outliers) since the values of outliers can exceed the sizes of a jump itself. On the other hand, an attempt to increase the threshold value leads to the opposite effect, an inability to recognize jumps that are small in magnitude.

Below, we propose a robust cycle slip repair algorithm that allows, more reliably than similar known algorithms, to detect jumps and determine their sizes. The proposed algorithm is based on search for so-called clusters consisting of epochs, in which the values of the combination are grouped about corresponding predefined values. Besides, this algorithm implements the above-described method of cleaning data from outliers based on the search for the optimal solution. This method, combined with Springer’s algorithm used in Ref. [3], allows for the reliable determination of multiple (cascade) jumps in the Melbourne-Wübbena combination.

The Melbourne-Wübbena combination LMWcan be presented in the form of the total of three terms [6]:

LMW=λ5n5+B+ν,E65

where λ5is the formal wavelength (for all GPS satellites λ5= 86.16 cm and for GLONASS satellites λ5= 84.0 ÷84.36 cm); n5is an unknown integer, the so-called wide lane ambiguity [3]; B is the residual systematic error caused by instrumental delays relative to the specific receiver-satellite pair and, as assumed, not time-dependent; and ν is a random component. Both parts of the equality in (65) are expressed in meters. At the same time, the LMWcombination is often expressed in cycles of wavelength λ5. Let us divide both parts of Eq. (65) by λ5. We introduce the designations y=LMW/λ5, β=B/λ5, and ξ=ν/λ5and derive for each of the epochs associated with indexes j=1,,N:

yj=n5j+β+ξj.E66

where n5jis an unknown piecewise-constant function of a discrete argument, regarding which it is only known that it takes integer values and can undergo integer jumps; β is an unknown constant value; and ξjis a random variable. The problem consists in the development of an algorithm for automatic processing of the informational data yjof form of Eq. (66), making it possible to effectively determine integer-valued jumps against the noise component and outliers.

9. Description of the proposed algorithm

Let us present the proposed algorithm as the following sequence of steps.

Step 0: We introduce parameter Δby specifying its value as Δ=2σmax. We mark the indexes of epochs with function flagjas nulls: flagj=0;j = 1, …, N.

Arrange the array yjin the ascending order and renumber it by placing the indexes in the ascending order. Denote the resulting array as Yj. For simplicity of logic, we suppose that all Yjare different. Therefore, we have: Y1<Y2<<YN. Denote the corresponding permutation of the values of function flagjas FLAGj.

Step 1: On this step, we are looking for the maximum density of array values yj. We consider the segments YjYj+Δof length Δ and look for the one that contains the largest number of Yivalues. Here, only those indexes i and j from the range of 1 ≤ i, j ≤ N are considered for which FLAGi=FLAGj= 0. The purpose of the ordering of the original array consists in improving the effectiveness of the maximum density search procedure. It can be shown that the number of comparisons required when searching for the above segment does not exceed 2 N in the case of the ordered array. For an unordered array, the number of comparisons is evaluated as N2.

Step 2: Let YJmaxYJmax+Δbe the segment found in the previous step, containing Nmaxvalues of Yj(see Figure 11). We calculate the mean m of these numbers Yj.

Figure 11.

Searching for the maximum of the density of values of the array Yj.

m=Nmax1j=JmaxJmax+Nmax1Yj,E67
YjYJmaxYJmax+Δ.E68

Step 3: Cluster searching. The values yjthat are clustered about the mean m found in the previous step can pertain to the epochs scattered along the time axis in chaotic order. The task of this step is to define an accumulation of such points, a so-called cluster, which we define as follows:

Definition 3. We designate as an mΔcluster the set of points (epochs) of the time axis, the indexes j of which belong to the segment [k,l] (1k<lN) and for which the following requirements are satisfied:

  1. all points of the segment [k, l] are marked by the same value: flagk= … = flagl.

  2. at the points j= k, l of the left and right boundaries of the segment, this inequality is satisfied:

    yjmΔ.E69

  • the amount of points at which (69) is satisfied is no less than the present value of MINOBS;

  • the number of consecutive points in which (69) is not satisfied does not exceed the predefined value MAXGAP (e.g., 5);

  • the value of k cannot be reduced while maintaining the requirements of a–d; and

  • the value of l cannot be incremented while maintaining the requirements of a–d.

  • This definition illustrates in Figure 12.

    Figure 12.

    Epochs with indexes k≤j≤l form mΔ cluster.

    It is understood that for specified values of m and Δ, there might be one, several, or no (m, Δ) clusters. Searching for clusters is performed by sequential checking of the satisfaction of inequality (69) for all j = 1, , N for which flagj = 0. Note that inequality (69) is satisfied for at least Nmaxvalues of j. In fact, from expressions (67) and (68), we derive the following equations:

    YJmaxmYJmax+Δ,E70

    and since the arrays yjand Yjdiffer only by permutation, it follows from Eq. (68) that the inequalities

    YJmaxyjYJmax+Δ.E71

    are fulfilled for Nmaxindex j. Inequality (69) is satisfied also for all these indexes, as follows from Eqs. (70) and (71).

    If cluster was found, then we mark all points of it as 1, and then, we repeat the cluster search procedure. If even just one cluster has been found at this step, we transfer to Step 1. If not even one cluster has been found, then the search for clusters is complete and we transfer to Step 4.

    Step 4: If even just one cluster has been found, we transfer to Step 5, and otherwise: (a) all points of the segment 1Nare marked as outliers and removed from further processing and (b) the operation of the algorithm terminates.

    Step 5: Search for individual jumps in clusters. Let us assume that n ≥ 1 clusters have been found:

    1.k1l1is them1Δcluster
    n.knlnis themnΔcluster.

    In each of the clusters that are found, outliers and 1-size jumps are possible. This follows immediately from the inequalities in (69) and the preestablished value Δ=1.2.

    Substep 5.1: In detecting a 1-size jump in a cluster, we use modified Springer algorithm (see Refs. [9, 10]) combined with the proposed in Section 3 algorithm that executes a search for the optimal solution with a minimum quantity of defective data.

    Substep 5.2: We find all epochs Jpand values Δn5,Jpn5,Jpn5,Jp1(p = 1, …, n) of the jumps inside the clusters. The possible values for Δn5,Jpare 0, ± 1.

    Substep 5.3: We repair the data by the value of each found jump, using the formula

    yjp=yjp1;j<Jpyjp1Δn5,Jp;Jpj<N;p=1,..,n,E72

    where yj0=yj.

    Substep 5.4: We rename: yj=yjn.

    Step 6: Marking of points outside clusters. All points outside of the found clusters (if any) are marked as outliers.

    Step 7: Ordering of clusters. We renumber the clusters so that they are placed left to right on the time axis. For the ordered set of clusters [kp, lp], p = 1, …, n, the conditions 1 ≤ k1 < l1 < k2 < l2 <  < kn < ln ≤ N are satisfied.

    Step 8: Data screening within clusters and improving the mean values of yj.

    Substep 8.1: In accordance with the algorithm proposed in Section 3, we perform screening from outliers in each of the n clusters.

    Substep 8.2: For each of the clusters cleaned of outliers, we determine the modified mean values mp.

    Step 9: Jumps between clusters. It follows from the description presented above that the remaining jumps in the data yjare on boundaries between clusters. If only one cluster is found, the algorithm is terminated: no additional jumps are detected, and the data are cleaned of outliers. If more than one cluster is found (n > 1), then the epochs j1, , jn–1 of the jumps will be the coordinates of the left-hand boundaries of clusters, beginning from the second: jl = k2, , jn–1 = kn. The values of jumps Δn5,jpare found in this case as rounded to the nearest integer of the difference of mean values of adjacent clusters:

    Δn5,jp=NINTmp+1mp,p=1,,n1.E73

    Step 10: Repair data. We delete the jumps between clusters using formula analogous to (72):

    yjp=yjp1;j<jpyjp1Δn5,Jp;jpj<N;p=1,,n1,E74

    where yj0=yj.

    We rename: yj=yjn1. The algorithm is terminated.

    10. Numerical calculations using algorithms presented in Sections 8 and 9

    We present here the results of testing the proposed algorithm using real data obtained by the JOZ2 station, which is part of the IGS network [2]. These data are included in the distribution set of the installation software package [3]. Testing was carried out for data obtained from GPS satellite with number PRN = 13 for 2010, day 207. Figure 13 shows the Melbourne-Wübbena combination values in the 107 min time interval (N = 215). The number j of time epochs counted from the beginning of 24-h day with interval of 30 seconds is plotted along the horizontal axis. The combination values yjexpressed in cycles with wavelength λ5are plotted along the vertical axis.

    Figure 13.

    Values of the Melbourne-Wübbena combination for the JOZ2 station (PRN = 13 for 2010, day of year = 207).

    Figure 14a and b presents the values of the deviations from the mean value z of data after cycle slip repair procedure, by using the algorithm applied in Ref. [3] (see Figure 14a) and the proposed algorithm (Figure 14b). The values yj–z in cycles of λ5are plotted along the vertical axes, and the number j of epochs is plotted along the horizontal axis. Epochs in which the measurement data were rejected are marked by light circles. In the first case (see Figure 14a), 111 of the conducted measurements or 51% of the total number of data was discarded. In the second case (see Figure 14b), 29 of these measurements were rejected (13%), which are almost 38% less than in the previous computation. The epochs of detected jumps are marked by daggers.

    Figure 14.

    (a) Deviations of the values of the Melbourne-Wübbena combination from the mean after detection and elimination of jumps from the data using the algorithm from Ref. [3]. (b) Deviations of the values of the Melbourne-Wübbena combination from the mean after detection and elimination of jumps from the data using the proposed algorithm in Section 9.

    11. Conclusion

    This chapter presents several effective and stable algorithms for processing data received from GNSS receivers. These data form the basis of almost all engineering applications in the field of computational geo-dynamics and navigation and cadastral survey and in numerous fundamental research works as well. The accuracy of the results obtained is significantly influenced by the quality of the data used in the calculations. In particular, the presence of rough measurements (outliers) in the observation data can significantly reduce the accuracy of the calculations carried out. One of the tasks at the preliminary stage of data processing is reliable detection and removal of rough measurements from the series of measured data with minimum amount of rejected data. The so-called optimal solution, introduced in the chapter, made it possible to detect and eliminate outliers from observed data minimizing the number of rejected measurements. In addition, it is assumed that the data may contain a trend as an unknown function of time. The strategy for determining of the trend is depending on the physical process in question under an assumption that the trend is a continuous function of time. The efficiency of the search is definitely influenced by the choice of the function class from which the trend is searched. In this chapter, we considered the class of power polynomials, but in some cases, this choice may not lead to the expected result. It may require, for example, a class of trigonometric functions to find a suitable trend. The automatic search for the best functional class, together with the strategy of effectively finding an unknown trend, against the background of random noise and outliers, is a complex task for future research.

    How to cite and reference

    Link to this chapter Copy to clipboard

    Cite this chapter Copy to clipboard

    Igor V. Bezmenov (June 9th 2020). Effective Algorithms for Detection Outliers and Cycle Slip Repair in GNSS Data Measurements [Online First], IntechOpen, DOI: 10.5772/intechopen.92658. Available from:

    chapter statistics

    19total chapter downloads

    More statistics for editors and authors

    Login to your personal dashboard for more detailed statistics on your publications.

    Access personal reporting

    We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

    More About Us