Table of results of traceability by all methods.
The traceability technology for sudden water pollution accidents can be used to identify a pollution source in a river quickly and accurately. A correlation optimization model with the pollution source position and release time as its parameters is established from hydrodynamic calculation and from the coupling relationship between the forward concentration probability density and the backward position probability density. The model is solved with a differential evolution algorithm (DEA). The coupled probability density method converts the traceability of a sudden water pollution accident into two minimum value optimization problems. The method is simple in principle and easy to solve, and it decouples the pollution source parameters. The concept of gradient is introduced into the differential evolution algorithm to improve the efficiency of the search process. The proposed traceability method was applied to an emergency demonstration project on the South-to-North Water Diversion Middle Route Project (SNWDMRP). The results indicate that the model traces sources efficiently with high simulation precision, and that the traceability results can guide the emergency regulation and control of sudden water pollution events in rivers.
- coupled probability density function
- correlation optimization model
- parameter decoupling
The frequent occurrence of water pollution events of various kinds has drawn increasing attention to research on pollutant traceability, especially for long-distance water diversion projects such as the SNWDMRP. The uncertainty of sudden water pollution events makes emergency response more difficult. Identifying the source of water pollution is a prerequisite for water quality prediction and water pollution control. Therefore, the first task in responding to sudden water pollution is to determine the pollution source as soon as possible after the event, to draw up a reasonable emergency plan based on the determined source intensity and the location and time of occurrence, and meanwhile to provide the preconditions for prediction and early warning of sudden water pollution [1, 2]. Traceability technology infers the position, time of occurrence, and intensity of the pollution source from the transport and transformation laws of pollutants in a river and from the monitored pollutant concentration process. It thereby reconstructs the pollution event and plays an important role in the emergency regulation and control of sudden water pollution events.
2. Basic traceability principle of sudden water pollution
Essentially, both pollutant concentration prediction and pollutant traceability are processes of tracking pollutants. More precisely, predicting the pollutant concentration distribution is a forward tracking process, whereas pollutant traceability is a backward tracking process, as shown in Figure 1.
Given information on a pollution event, determining the pollutant concentration during the event with the simulation technology of Chapter 2 is forward tracking of pollutant transport and fate, whereas inferring the occurrence process and source of the event from observed data, such as the pollutant concentration distribution, is the reverse of forward tracking. Compared with predicting the pollutant concentration distribution, pollutant traceability is clearly more complex: it not only involves tracking and backward reconstruction of the event but, as the inverse of the prediction problem, is also nonlinear and ill-posed.
According to its connotation, sudden water pollution traceability can be divided into five categories. The first category is the reconstruction of unknown coefficients in a pollutant transport model, such as the longitudinal dispersion, lateral dispersion, and degradation coefficients, from (observed) information on the temporal and spatial distribution of a known pollutant; this is called parameter identification. The second category is the inference of the pollution source (sink) term on the right-hand side of the model from observed values, including the pollution source position, release intensity, and release time; this is called identification of the pollution source (sink) term, or the traceability problem for short. The third category is the reverse inference of initial conditions from known information, also called the backward-time problem. The fourth category is the inference of the type or parameters of boundary conditions from known information, namely, reverse inference of boundary conditions. The fifth category is a combination of the four categories above. The pollutant traceability technology studied in this chapter mainly addresses the second category, that is, the identification of source (sink) terms.
Research on the traceability of sudden water pollution has received increasing attention in recent years. Common research methods fall into two kinds: deterministic and probabilistic [4, 5]. Deterministic methods mainly include the regularization method [6, 7], the trial and error method, the least squares method [9, 10], and other optimization methods. These methods have clear physical meaning but yield a single solution, whose error is often large when the information is inaccurate. The probabilistic method, in contrast, is a stochastic approach based on Bayes inference and Markov chain Monte Carlo (MCMC) sampling; it provides multiple reliable alternative results but depends on the acquisition of distribution information on random variables [4, 11, 12]. Building on these common methods, this chapter introduces a new fast traceability method for sudden pollution in a channel that combines the deterministic and probabilistic methods and provides multiple reliable alternative results for emergency response without depending on distribution information on random variables.
3. Common traceability methods
3.1 The deterministic method
The deterministic method was developed mainly from traceability research on groundwater. It simulates the pollutant concentration distribution of an event with a pollutant transport and diffusion model, sets up an optimization model whose objective function is the sum of squared errors between the simulated and measured values, solves this model with a deterministic algorithm, and iteratively searches for the result that best matches the observed values. Typical deterministic methods include the regularization method, the trial and error method, correlation regression analysis, and the like, as well as heuristic algorithms such as the genetic algorithm and the simulated annealing algorithm. To overcome the tedium of the trial and error method, Han et al. performed the inversion by using a “local and basic expansion of a forward problem” and obtained the optimal control solution by converting traceability into a minimum value problem. Jin and Chen obtained a response relation for pollutant concentration in a channel through the variation of an objective function established with the Laplace transform; on this basis, they inferred the pollution source intensity at an upstream cross section under environmental capacity control. Chen et al. transformed the expression for the concentration due to an instantaneous discharge from a 1D single-point source into a linear regression model and, through regression analysis, determined the release position, time, and intensity of the pollution source as well as the longitudinal dispersion coefficient of the channel. Min et al. used the genetic algorithm to study, respectively, the identification of multiple parameters of a 1D river, including the flow velocity, diffusion coefficient, and attenuation coefficient, and the identification of the source terms on the right-hand side of a 1D convection-diffusion equation. This part briefly introduces the deterministic method, taking the correlation regression analysis method as an example.
When pollutants are released instantaneously, the analytical solution of the 1D convection-diffusion equation can be obtained via the Fourier transform or dimensional analysis. Assuming that pollution occurs at position x0 and time t0, we obtain
where m0 is the average intensity of the initial surface source of the released pollutant over the cross section, m0 = M/A, with M the released mass and A the cross-sectional area.
After taking the logarithm of both sides of Equation (1), we obtain
where , , and .
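Equations (1) and (2) were lost in extraction. A plausible reconstruction, assuming an instantaneous release of mass M over a cross section of area A, a constant velocity u, a constant longitudinal dispersion coefficient E (symbol chosen here), and no degradation, is the standard solution

$$C(x,t)=\frac{m_0}{\sqrt{4\pi E\,(t-t_0)}}\exp\!\left[-\frac{\bigl(x-x_0-u\,(t-t_0)\bigr)^2}{4E\,(t-t_0)}\right],\qquad m_0=\frac{M}{A},\quad t>t_0 .$$

Writing τ = t − t0 and taking logarithms gives one possible linear form, with X read as the squared distance from the plume centre:

$$\ln C = a\,X + b,\qquad X=(x-x_0-u\tau)^2,\qquad a=-\frac{1}{4E\tau},\qquad b=\ln\frac{m_0}{\sqrt{4\pi E\tau}} .$$

Under this reading the slope a yields τ and hence t0, and the intercept b yields m0, which matches how the slope and intercept are used in the solution steps.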
For a given pollution event, the occurrence position and time are constants. Therefore, if concentrations at multiple cross sections are available at the same time, so that the time parameter is constant, the log-transformed concentration and X (which depends on the observation position x) satisfy a linear relation, and a correlation-based optimization model can be built by regression analysis:
Taking the position of the monitored cross section as the variable, once trial values of x0 and t0 are selected, the Xi corresponding to the concentrations Ci measured at the monitored cross sections xi at the same time can be calculated, and the correlation coefficient R follows from Formula (3). Theoretically, when x0 and t0 equal the true position and time of occurrence, the calculated R should be 1.0.
The pollutant traceability problem is thus converted into finding the x0 and t0 that maximize R in Formula (3), ideally to R = 1.
The solution can be completed in five steps:
(1) Determine a suitable value of the trial parameter such that R reaches 1.0; this can be done by differentiation.
(2) Using this value, calculate Xi for the cross sections observed at the same time, fit the log-transformed concentrations against Xi according to Formula (2), and obtain the slope a and the intercept.
(3) Calculate the time parameter from the slope a, then the release time t0 in combination with the known observation time, and finally the initial surface source intensity m0 from the intercept.
(4) Calculate the released pollutant mass as M = m0A, where A is the flow cross-sectional area at the estimated pollution position at the release time.
(5) Because of fitting and observation errors, recalculate R by substituting the obtained x0 and t0 back into Formula (3) and check whether R equals 1. If it does (or the error is very small), the traceability calculation is complete; otherwise, return to Step (2), adjust the slope a and the intercept, and recalculate x0 and t0.
The solution process above shows that the model needs observed data from multiple cross sections at the same time to trace sudden pollution from a point source.
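The regression procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the published method: Eq. (1) is taken as the no-degradation analytical solution with constant velocity u and dispersion coefficient E, X is read as the squared distance from a trial plume centre (the original definitions of X and the intercept were lost), and a grid search over the plume centre replaces the derivative-based first step. All function names are hypothetical.

```python
import math


def forward_conc(x, t, x0, t0, m0, u, E):
    """Assumed form of Eq. (1): instantaneous 1D release, no degradation."""
    tau = t - t0
    if tau <= 0:
        return 0.0  # nothing has been released yet
    return m0 / math.sqrt(4 * math.pi * E * tau) * math.exp(
        -(x - x0 - u * tau) ** 2 / (4 * E * tau))


def linear_fit(X, Y):
    """Least-squares slope, intercept and Pearson correlation R of Y on X."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


def regression_trace(xs, cs, t, u, E, centres):
    """Snapshot at time t: find the plume centre that makes ln C linear in X."""
    Y = [math.log(c) for c in cs]
    best = None
    for c in centres:
        X = [(x - c) ** 2 for x in xs]  # hypothetical choice of X
        slope, intercept, r = linear_fit(X, Y)
        if best is None or abs(r) > abs(best[3]):
            best = (c, slope, intercept, r)
    c, slope, intercept, r = best
    tau = -1.0 / (4 * E * slope)        # slope a = -1 / (4 E tau)
    t0 = t - tau                        # release time
    x0 = c - u * tau                    # plume centre = x0 + u * tau
    m0 = math.exp(intercept) * math.sqrt(4 * math.pi * E * tau)
    return x0, t0, m0, abs(r)


# Synthetic snapshot: x0 = 0 m, t0 = 0 s, m0 = 100 kg/m^2, u = 0.5 m/s, E = 5 m^2/s, t = 3600 s.
xs = [1500.0, 1650.0, 1800.0, 1950.0, 2100.0]
cs = [forward_conc(x, 3600.0, 0.0, 0.0, 100.0, 0.5, 5.0) for x in xs]
x0, t0, m0, R = regression_trace(xs, cs, 3600.0, 0.5, 5.0, range(1500, 2101, 10))
print(f"x0 = {x0:.2f} m, t0 = {t0:.2f} s, m0 = {m0:.3f}, R = {R:.6f}")
```

Because only the plume centre x0 + u(t − t0) controls the linearity, the search is one-dimensional; the slope then separates t0 from x0, mirroring how the text uses the slope and intercept.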
3.2 The probabilistic method
Because the traceability problem is ill-posed, when the deterministic method is used, errors in observation or model calculation may cause large deviations and distort the traceability result. Stochastic approaches have therefore been introduced into traceability research, the most widely used being the probabilistic method based on Bayes inference and MCMC sampling. Bayes inference is grounded in probability theory and can reflect the uncertainty of sudden water pollution events in a channel. It derives the posterior probability distribution of the parameters to be determined by making full use of a likelihood function and the prior information on those parameters, and then obtains estimates of all pollution source parameters by sampling. This method provides a random distribution function of the traceability result of a water pollution event; methods based on Bayes inference are therefore mainly used to estimate the probability of occurrence of a sudden water pollution event. With them, we obtain the posterior probability distribution of the traceability result instead of a single solution, quantify the uncertainty of the result, and gain more traceability information about the event. To obtain estimates of the result, Bayes inference must be combined with a sampling method, such as Markov chain Monte Carlo (MCMC), Monte Carlo (MC), or other sampling methods [11, 12]. Among them, the MC method easily converges to a suboptimal solution regardless of whether the initial value deviates from the true value, so the accuracy of its traceability results is not high.
This defect of the MC method can be compensated by iteratively combining Bayes inference with the MC or MCMC method to obtain the distribution function of the traceability result. The MCMC method constructs a sufficiently long Markov chain by random walk; only such a chain ensures that the samples approximate the posterior distribution of the traceability result, because the limiting distribution of the Markov chain is used to represent the posterior probability density function. For this reason, the MCMC method has broadened the application of Bayes inference in traceability research on environmental pollution events.
Taking the Bayesian MCMC method as an example, this part briefly introduces the probabilistic traceability method. The method treats all variables in the traceability model as random variables and regards the solution of the traceability problem for a sudden water pollution event as a probability distribution. It first converts the prior information on the solution into a prior probability distribution by the Bayes method, then combines it with observed data, and finally obtains the posterior probability distribution of the unknown solution through Markov chain random sampling, using the likelihood function between the observed values and the simulated values. The probabilistic method includes the following main processes:
Model the basic traceability problem, and quantify the prior information on the traceability solutions (the intensity, occurrence position, and time of the pollution source) with a reasonable probability distribution function.
Select and establish a reasonable likelihood function based on a coupled channel water quality simulation model, combined with information on the site of the sudden water pollution event and hydrologic data of the water body concerned.
Obtain the posterior probability distribution of the traceability solutions from the prior probability distribution and the likelihood function.
Sample the posterior probability distribution and obtain estimates of the traceability solutions for the event.
First, before observed data are collected, express the prior information on the unknown parameter vector as a prior probability distribution. After observed data are obtained, the posterior distribution of the parameters given by Bayes inference satisfies
where the parameter vector consists of the three pollution source parameters (intensity, position, and release time) and d denotes the measured pollutant concentrations.
The denominator is the probability distribution of the measured values, which obviously does not depend on the parameters. Once the measured concentrations are known, it is a constant and can be taken as 1. From this, we obtain
The prior distribution in Formula (5) can be taken as a uniform distribution of each parameter over its prior range.
Therefore, to determine the posterior distribution of the pollution source parameters, we first need the conditional distribution of the data given the parameters, which can be defined as the likelihood function of the measured and predicted values. For the i-th measuring point, define the measurement error as the difference between the measured and predicted values. Assuming that this error follows a normal distribution with zero mean and a common standard deviation, and that all measuring points are independent of each other, then
Thus, we obtain the posterior distribution of the pollution source parameters:
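The Bayes formula, likelihood, and posterior expressions in this part did not survive extraction. Under the assumptions stated above (a uniform prior over a box Θ of prior ranges, independent zero-mean Gaussian errors with a common standard deviation σ at N measuring points), the standard expressions are as follows; θ = (m0, x0, t0), d_i, y_i(θ), σ, N, and Θ are notation chosen here and may differ from the original.

$$p(\theta\mid d)=\frac{p(d\mid\theta)\,p(\theta)}{p(d)}\propto p(d\mid\theta)\,p(\theta),\qquad p(\theta)=\text{const.}\ \text{for}\ \theta\in\Theta,$$

$$L_i=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{\bigl(d_i-y_i(\theta)\bigr)^2}{2\sigma^2}\right],\qquad p(d\mid\theta)=\prod_{i=1}^{N}L_i,$$

$$p(\theta\mid d)\propto\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(d_i-y_i(\theta)\bigr)^2\right],\qquad \theta\in\Theta .$$

For candidates θ* inside Θ, drawn either from a symmetric proposal or from the uniform prior itself, the Metropolis acceptance probability then reduces to a likelihood ratio:

$$\alpha=\min\left\{1,\ \frac{p(d\mid\theta^{*})}{p(d\mid\theta_i)}\right\}.$$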
However, a complex, high-dimensional parameter space makes the posterior distribution very abstract and difficult to express explicitly. The Bayes formula alone can therefore hardly solve practical problems directly, but such problems can be solved by means of the MCMC method. Depending on the transition probability matrices that make up the Markov chain, the main MCMC sampling algorithms are the Gibbs sampling algorithm, the Metropolis-Hastings algorithm, and the self-adaptive Metropolis algorithm. The self-adaptive Metropolis algorithm can converge to the target distribution for any prior distribution of the parameters, so it is selected for sampling here.
According to the derivation above, the main steps of traceability for a sudden pollution event based on Bayes inference and the MCMC method are as follows:
Set i = 0 and initialize the variables.
Generate and accept random variables to build the Markov chain:
(1) Generate candidate pollution source parameters from the uniform prior distribution.
(2) From the generated pollution source parameters, calculate the concentration at the observation points using the methods in Chapter 3.
(3) Calculate the likelihood function using Formula (8).
(4) Calculate the acceptance probability of the Markov chain using the following formula:
(5) Generate a random number uniformly distributed in [0, 1]. If it does not exceed the acceptance probability, accept the candidate parameters as the next state of the chain; otherwise, keep the original parameters as the next state.
Repeat steps (1)–(5) until the predefined number of iterations or the preset number of posterior samples of the pollution source parameters is reached; then compute statistics of the posterior distribution of each parameter to complete the traceability calculation.
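The sampling loop can be sketched in Python. This is a minimal sketch with two loudly labeled assumptions: candidates come from a Gaussian random walk and are rejected if they leave the uniform prior box (plain random-walk Metropolis, not the self-adaptive variant chosen in the text, and not the uniform-draw proposal of step (1)), and the forward model is the assumed no-degradation form of Eq. (1). `forward_conc`, `make_log_likelihood`, and `metropolis` are hypothetical names.

```python
import math
import random


def forward_conc(x, t, x0, t0, m0, u, E):
    """Assumed form of Eq. (1): instantaneous 1D release, no degradation."""
    tau = t - t0
    if tau <= 0:
        return 0.0  # no pollutant before the release time
    return m0 / math.sqrt(4 * math.pi * E * tau) * math.exp(
        -(x - x0 - u * tau) ** 2 / (4 * E * tau))


def make_log_likelihood(x_obs, times, observed, u, E, sigma):
    """Gaussian log-likelihood of theta = (m0, x0, t0), up to an additive constant."""
    def log_like(theta):
        m0, x0, t0 = theta
        sse = sum((d - forward_conc(x_obs, t, x0, t0, m0, u, E)) ** 2
                  for t, d in zip(times, observed))
        return -sse / (2 * sigma ** 2)
    return log_like


def metropolis(log_like, lower, upper, start, step, n_iter, seed=0):
    """Random-walk Metropolis sampler with a uniform prior on the box [lower, upper]."""
    rng = random.Random(seed)
    theta, ll = list(start), log_like(start)
    samples = []
    for _ in range(n_iter):
        # Candidate from a Gaussian random walk (assumption, not the self-adaptive proposal).
        prop = [th + rng.gauss(0.0, s) for th, s in zip(theta, step)]
        # Candidates outside the prior box have zero prior density and are rejected.
        if all(lo <= p <= hi for p, lo, hi in zip(prop, lower, upper)):
            ll_prop = log_like(prop)
            # Acceptance probability alpha = min(1, L(candidate) / L(current)).
            alpha = math.exp(min(0.0, ll_prop - ll))
            if rng.random() <= alpha:  # uniform number in [0, 1) against alpha
                theta, ll = prop, ll_prop
        samples.append(list(theta))
    return samples
```

For the source problem, `metropolis(make_log_likelihood(x_obs, times, observed, u, E, sigma), lower, upper, start, step, n_iter)` would be called with prior bounds on (m0, x0, t0), and posterior statistics would be computed after discarding an initial burn-in segment of the chain.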
The process above shows that the probabilistic method depends on the prior ranges of the parameters and on information about the distribution of calculation errors, and that generating parameter values randomly from a uniform distribution converges very slowly when the prior range is large. Moreover, the probabilistic method reaches a solution by Markov chain Monte Carlo random sampling, which has no search direction because it makes no judgment about whether existing results are optimal; this also lowers the efficiency of traceability.
4. Coupled probability density method
4.1 Basic principles and method
According to the introduction and analysis of the common methods above, the deterministic method has clear physical meaning but yields a single solution, which can have a large error when information is inaccurate. Although the probabilistic method can offer multiple reliable alternative results, it depends on the acquisition of distribution information on random variables. Judging from the research trend, a new generation of traceability methods that integrates the deterministic and probabilistic methods will be the future direction of pollutant traceability research. The coupled probability density method proposed in this chapter combines the deterministic method with the probabilistic method and sets up a sudden water pollution traceability model based on a coupled probability density function (C-PDF). Based on hydrodynamic calculation and considering the observation error of the system, an optimization model with the pollution source position and release time as parameters is built through correlation analysis of the forward concentration probability density and the backward position probability density of pollutants, and the model is solved by DEA. On this basis, a minimum value optimization model based on the forward concentration probability density function of pollutants is established to determine the intensity of the pollution source.
In a sudden event, most pollutants enter a channel as a point source and then move and transform with the flow. Based on traceability research in China and abroad and on the transport and transformation laws of pollutants in a channel, this chapter sets up a 1D traceability model for sudden river water pollution based on the C-PDF, which infers the pollution source from observed pollutant concentrations and thus reconstructs the pollution event. The hydrodynamic calculation on which the model is built is likewise 1D.
The magnitude of pollutant concentration at different cross sections of a channel can be expressed statistically as the frequency with which tiny particles of the substance appear at those cross sections at a given time. Statistically, how likely substance particles are to appear at a given position can be expressed by a probability function, so the concentration distribution of pollutants in a channel can also be described by a probability density function. Conversely, when the pollution source is unknown, the pollutants observed at a cross section in a river may come from any upstream location, and the probability of each location can also be described by a probability density function. Because pollutant traceability is the inverse of the pollutant concentration distribution problem, the probability density function describing the concentration distribution is called the forward concentration probability density function, whereas the probability density function describing the probability that the pollution source is at different positions is called the backward position probability density function. By the definition of a probability density function, the forward probability density function can be obtained by normalizing the concentration.
The concentration can be normalized into a probability density according to its definition, as shown below:
where the result is the forward concentration probability density corresponding to the concentration at (x, t); it has dimension m−1 and indicates the probability per unit length that pollutants appear at cross section x at time t.
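The normalization formula is missing from the extracted text. For a conservative tracer, one natural choice (assumed here, with f denoting the forward concentration probability density) divides the concentration by its integral along the channel, which for the instantaneous-release solution equals the surface source intensity m0:

$$f(x,t)=\frac{C(x,t)}{\displaystyle\int_{-\infty}^{\infty}C(x,t)\,dx}=\frac{C(x,t)}{m_0},\qquad \int_{-\infty}^{\infty}f(x,t)\,dx=1 .$$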
According to the derivations of Neupauer and Wilson, forward concentration transport and backward position traceability are adjoint processes [16, 17]. For a 1D channel, let the backward position probability density express the probability, judged from an observed cross section xd, that the pollution source was at xs a given backward time earlier (i.e., the probability that pollutants were transported from cross section xs to cross section xd within that time). This density satisfies the adjoint state equation of the convection-diffusion equation, Eq. (10), and a normalization condition (it also has dimension m−1), as shown in the following formula:
where the two time variables are, respectively, the time point for the backward calculation and the time point at which the pollutant concentration is observed. Formula (11) indicates that, before any transport has taken place, the pollution source can only be at the observed cross section where the pollutants appear.
Like the concentration transport process, Formula (11) can be regarded as a position probability transport process. Similarly, when u is constant, an analytical solution can be obtained, as shown below:
The backward position probability density thus has the same form as the forward concentration probability density. In fact, the correspondence between Formulas (1) and (12) means that, regardless of the flow field, the relation between the two densities can be determined as shown in Figure 2, where the arrows indicate the direction of transport (the advection term).
Pollutant concentration distribution and traceability are inverse problems of each other, but both follow essentially the same physical law. Figure 2 shows that, for equal elapsed times, the probability that pollutants move from the source xs to cross section xd equals the probability, judged by an observer at cross section xd, that the pollutants observed there moved from cross section xs to xd over the same time. That is, under this condition, Formula (13) holds:
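Formulas (10) to (13) did not survive extraction. For constant u and E (E is notation chosen here), a standard form consistent with the description, with τ the backward time measured from the observation time, is the adjoint equation with reversed advection and a point initial condition,

$$\frac{\partial p}{\partial \tau}=E\frac{\partial^{2}p}{\partial x_s^{2}}+u\frac{\partial p}{\partial x_s},\qquad p(x_s,0\mid x_d)=\delta(x_s-x_d),$$

whose solution is

$$p(x_s,\tau\mid x_d)=\frac{1}{\sqrt{4\pi E\tau}}\exp\!\left[-\frac{(x_d-x_s-u\tau)^2}{4E\tau}\right].$$

Comparing this with the instantaneous-release solution for a source at x0 released at t0 gives, with τ = t − t0,

$$\frac{C(x_d,t)}{m_0}=p\bigl(x_0,\;t-t_0\mid x_d\bigr),$$

so the observed series is proportional to the backward density at the true source, with the unknown intensity m0 as the only factor.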
Figure 2 and Formula (13) show that the forward concentration transport process is tightly coupled with the backward position probability transport process: the two are identical except that the latter runs backward in time. Because the backward position probability density does not depend on the intensity of the pollution source, it can be calculated directly from Formula (10) or (12) once the candidate source position and time are given. Therefore, based on the coupling relation in Formula (13), a linear correlation model can be established by substituting the backward position density for the forward concentration density, which enables the traceability calculation.
Assuming that the observed concentration series is Ci and that the corresponding calculated position probability density series is Pi (i = 1, 2, …, N), the derivation above gives the correlation coefficient r of the two series as follows:
where C̄ and P̄ are the arithmetic means of Ci and Pi, respectively.
Through solution of the aforesaid optimization model, we can obtain the release position x0 and time t0 of the pollution source.
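Because the observed series is proportional to the backward position density at the true source, r does not depend on the source intensity, which is what decouples m0 from (x0, t0). The sketch below computes a fitness for a candidate (x0, t0) at one observed cross section xd, assuming constant u and E and the analytic backward density. The exact objective and constraints of the optimization model were lost in extraction, so using 1 − r as the fitness is an assumption; `backward_pdf`, `pearson`, and `fitness` are hypothetical names.

```python
import math


def backward_pdf(xs, tau, xd, u, E):
    """Assumed analytic backward position density p(xs, tau | xd) for constant u and E."""
    if tau <= 0:
        return 0.0
    return math.exp(-(xd - xs - u * tau) ** 2 / (4 * E * tau)) / math.sqrt(4 * math.pi * E * tau)


def pearson(a, b):
    """Correlation coefficient r of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # degenerate series: treat as uncorrelated
    return sab / math.sqrt(saa * sbb)


def fitness(x0, t0, times, observed, xd, u, E):
    """1 - r between observed C_i and P_i = p(x0, t_i - t0 | xd); zero at a perfect match."""
    P = [backward_pdf(x0, t - t0, xd, u, E) for t in times]
    return 1.0 - pearson(observed, P)


# Synthetic check: source at x0 = 0 m, t0 = 0 s, observed 1508 m downstream.
u, E, xd = 0.07, 3.43, 1508.0
times = [k * 600.0 for k in range(20, 60)]
obs = [7.0 * backward_pdf(0.0, t, xd, u, E) for t in times]  # m0 = 7
obs_scaled = [3.0 * c for c in obs]  # different intensity, same shape
```

Scaling the observations by any positive factor leaves the fitness unchanged, which is the decoupling the text relies on.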
The optimization model decouples the pollution source intensity from the position and time parameters, so the position and time are determined first and then the intensity of the surface source m0. Since the occurrence position and time of the pollution source have been determined and the range of m0 can be roughly estimated with the previous model, the forward probability density function can be used to build an optimization model for the pollution source intensity, given in Formulas (18) and (19).
where the coefficient is used to prevent errors at small concentrations from being lost because of large differences in concentration magnitude.
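Formulas (18) and (19) are not recoverable, so the following is only an assumed stand-in: with x0 and t0 fixed, a weighted least-squares fit of C_i ≈ m0 P_i has a closed-form solution. The optional weights play a role loosely similar to the coefficient described above, but they are not the authors' coefficient; `estimate_intensity` and the numbers below are hypothetical.

```python
def estimate_intensity(observed, P, weights=None):
    """Weighted least-squares m0 for C_i ~ m0 * P_i (assumed stand-in for Formulas (18)-(19))."""
    if weights is None:
        weights = [1.0] * len(observed)
    # Minimizing sum w_i (C_i - m0 P_i)^2 gives m0 = sum(w C P) / sum(w P^2).
    num = sum(w * c * p for w, c, p in zip(weights, observed, P))
    den = sum(w * p * p for w, p in zip(weights, P))
    return num / den


# Hypothetical backward densities P_i (1/m) at the identified x0, t0; noise-free data with m0 = 7.
P = [0.0002, 0.0008, 0.0010, 0.0006]
C = [7.0 * p for p in P]
m0 = estimate_intensity(C, P)
# Relative weighting (w_i = 1 / C_i^2) is one way to keep small concentrations from being swamped.
m0_rel = estimate_intensity(C, P, [1.0 / c ** 2 for c in C])
A = 114.0       # hypothetical flow cross-sectional area (m^2)
M = m0 * A      # released mass, M = m0 * A (kg)
```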
4.2 Model solution
The model is solved with the differential evolution algorithm (DEA), an intelligent optimization method similar to the genetic algorithm [1, 18]. Solving a traceability optimization model with DEA involves population initialization, mutation, recombination, and selection. Taking the first optimization model, composed of Formulas (14) and (16), as an example, DEA takes x0 and t0 as the attribute elements of each basic evolutionary individual, with the objective function of the model as the fitness function. The detailed solution steps are as follows:
Population initialization: let the population size be NP, and generate the individuals of the first generation from Formula (20), in which the subscripts denote the individual index and the generation number, respectively:
where the random number is drawn from [0, 1], and the bounds are the minimum and maximum values of each attribute, respectively. After initialization, calculate the fitness value of each individual.
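Mutation: select three individuals from the population by uniform random sampling, and produce a mutant individual from Formula (21):
where the scaling factor amplifies the difference between two of the selected individuals, and the indices of the three selected individuals are mutually distinct and different from i. If the mutant violates the parameter bounds, a new mutant is regenerated from Formula (20).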
Recombination: the attribute elements of the target individual and of its mutant are recombined to produce a new (trial) individual according to the following rule:
where CR is the recombination constant, generally CR ∈ [0.8, 1]; each attribute value of the trial individual is taken either from the mutant or from the target individual; and the random attribute index, equal to 1 or 2, ensures that at least one attribute comes from the mutant.
Selection: the better of the trial and original individuals enters the next generation, according to Formula (23):
Following these steps, evolution and iterative computation are repeated until the fitness meets the requirement (for example, the value of the objective function falls below a preset tolerance) or the maximum number of generations is reached. The individual with the minimum fitness value is then selected; its attribute values are the sought x0 and t0, and the calculation is complete.
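The four steps follow the classic DE/rand/1/bin scheme, which can be sketched in Python. The defaults F = 0.5 and CR = 0.9 are common choices assumed here (CR lies in the range given above); the fitness function is passed in, so for the correlation model it would be a function of (x0, t0) returning the model's objective. `differential_evolution` here is a hypothetical helper, not SciPy's function of the same name.

```python
import random


def differential_evolution(fitness, lower, upper, pop_size=20, F=0.5, CR=0.9,
                           max_gen=200, eps=1e-12, seed=0):
    """Minimize fitness over the box [lower, upper] with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_individual():
        # Initialization: lower + rand * (upper - lower) for each attribute.
        return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]

    pop = [random_individual() for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(max_gen):
        if min(fit) < eps:  # stop once the target value is met
            break
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
            if any(m < lo or m > hi for m, lo, hi in zip(mutant, lower, upper)):
                mutant = random_individual()  # regenerate when out of bounds
            # Recombination: binomial crossover; j_rand keeps at least one mutant attribute.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # Selection: the trial replaces the original if it is no worse.
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Usage on a simple test function with its minimum at (1, -2).
best, f_best = differential_evolution(
    lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [-5.0, -5.0], [5.0, 5.0])
```

This variant updates the population in place within a generation, a common simplification; building a separate next-generation list would follow the textbook description more literally.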
4.3 Improvement to coupled probability density method
Because DEA is stochastic, its results vary considerably from run to run, and some optimized results may recur. Additional rules are therefore needed to control the optimized fitness value so that it changes monotonically. For this purpose, this chapter introduces the concept of gradient into the differential evolution process to provide an optimization direction and improve optimization efficiency.
First, two new gradient attributes are added to each individual: a gradient feature factor and a gradient direction factor. Each population individual is thus redefined to include them, with the gradient feature factor and the gradient direction factor determined from Formulas (24) and (25).
where sgn is the sign function.
With these gradient attributes, trend information that favors population evolution can be retained during differential evolution selection. For example, suppose that an individual is better than its counterpart in the previous generation and that, relative to that counterpart, its position parameter x0 has increased while its time parameter t0 has decreased; then the transient favorable evolution trend of this individual is “increasing the position parameter x0 and reducing the time parameter t0.” This trend information can be used to guide the evolution of the next generation of individuals. When improving DEA optimization in practical applications, the gradient information can be used as an additional rule to control the evolution direction and guide the evolution of each generation of individuals.
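Formulas (24) and (25) did not survive extraction, so the following is only a hypothetical illustration of the idea in the example above. The direction factor is taken as the sign of each parameter change of an improved individual; the feature factor is assumed here to be the fitness improvement per unit step; and `guided_mutant` is an invented rule showing one way such a trend could bias the next mutation. None of these names or definitions come from the chapter.

```python
import math


def sgn(v):
    """Sign function: 1, 0 or -1."""
    return (v > 0) - (v < 0)


def gradient_factors(prev, curr, f_prev, f_curr):
    """Hypothetical gradient feature and direction factors for one individual.

    prev, curr: (x0, t0) of the individual in the previous and current generation.
    f_prev, f_curr: their fitness values (lower is better).
    """
    delta = [c - p for p, c in zip(prev, curr)]
    if f_curr >= f_prev:
        return 0.0, [0] * len(delta)  # no improvement: keep no trend
    step = math.sqrt(sum(d * d for d in delta))
    # Assumed feature factor: fitness improvement per unit parameter change.
    feature = (f_prev - f_curr) / step if step > 0 else 0.0
    # Direction factor: sgn of each parameter change.
    return feature, [sgn(d) for d in delta]


def guided_mutant(mutant, direction, step_sizes):
    """Hypothetical rule: shift a DE mutant along the remembered trend."""
    return [m + d * s for m, d, s in zip(mutant, direction, step_sizes)]


# An improved individual whose x0 rose and t0 fell, as in the example in the text.
feature, direction = gradient_factors((100.0, 3600.0), (150.0, 3000.0), 0.2, 0.1)
nudged = guided_mutant([120.0, 3300.0], direction, [10.0, 60.0])
```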
5. Application examples
As shown in Figure 3, an emergency treatment demonstration project for sudden water pollution events was carried out on March 22, 2014, on the SNWDMRP between the check gate on the Fangshui River at the Jingshi section (Pile No. 100+294.750) and the check gate on the Puyang River (Pile No. 113+492.750). Cane sugar was used as the tracer, and the concentration of the cane sugar solution was used as the water quality index. The experimental channel has a trapezoidal cross section with a bottom width of 18.5 m, a side slope of 2.5, an average water depth of 4.0 m, a bottom slope of 0.00005, a Manning roughness of 0.015 (design value), an experimentally estimated dispersion coefficient of 3.43 m²/s, and a constant discharge of about 8.0 m³/s. About 800 kg of cane sugar was released instantaneously into the channel at 9:00 AM on March 22, and four monitoring cross sections were set downstream. To ensure an instantaneous release of the 800 kg of cane sugar, the sugar was dissolved in advance in 1.0 m³ of heated clear water (about 80.0°C). When the experiment started, the dissolved sugar solution was poured directly into the middle of the channel from a pre-built floating bridge. The concentration process monitored 1508 m downstream of the pouring cross section is shown in Figure 4; monitoring started at 9:30.
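As a rough consistency check, the stated channel data give the following back-of-envelope quantities. The sketch assumes uniform flow over the trapezoidal section at the average depth, reads the side slope 2.5 as 2.5 horizontal to 1 vertical, and uses the no-degradation instantaneous-release solution for the concentration when the plume centre reaches the monitored section (close to the peak). These are illustrative figures, not results reported for the experiment.

```python
import math

b, m, h = 18.5, 2.5, 4.0   # bottom width (m), side slope (H:V), mean depth (m)
Q, E = 8.0, 3.43           # discharge (m^3/s), dispersion coefficient (m^2/s)
M, L = 800.0, 1508.0       # sugar mass (kg), distance to monitored section (m)

A = (b + m * h) * h        # trapezoidal flow area (m^2)
u = Q / A                  # mean velocity (m/s)
m0 = M / A                 # surface source intensity m0 = M / A (kg/m^2)
tau = L / u                # advective travel time to the monitored section (s)
# Concentration when the plume centre arrives, converted from kg/m^3 to mg/L.
c_peak_mg_l = 1000.0 * m0 / math.sqrt(4 * math.pi * E * tau)

print(f"A = {A:.1f} m2, u = {u:.4f} m/s, travel time = {tau / 3600:.2f} h")
print(f"m0 = {m0:.3f} kg/m2, concentration at arrival = {c_peak_mg_l:.2f} mg/L")
```

Under these assumptions the mean velocity is only about 0.07 m/s, so the plume centre needs roughly six hours to travel the 1508 m to the monitored section.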
Taking the pile number of the pouring cross section as 0 m, the pile number of the monitored cross section is 1508 m. The width and depth of the experimental channel section are very small relative to its length, and checking calculations showed that the pollutants were roughly uniformly mixed in the lateral and vertical directions at the monitored cross section 1508 m downstream. Because the time available for responding to sudden pollution is short, degradation is neglected, and pollutant transport can be described by the 1D convection-diffusion equation under steady flow. Traceability calculations were performed with the probabilistic method, representing the conventional traceability methods, and with the proposed coupled probability density method. The results are shown in Tables 1 and 2.
| Method | M (kg) | x0 (m) | t0 |
|---|---|---|---|
| Probabilistic method | 781 | −570 | 9:14 AM |
| Coupled probability density method | 731.13 | −28.42 | 8:48 AM |

| Method | Error in M | Error in x0 (m) | Error in t0 |
|---|---|---|---|
| Probabilistic method | 2.38% | 570 | 14 min |
| Coupled probability density method | 8.59% | 28.42 | 12 min |
Tables 1 and 2 show that, with the coupled probability density method, the error in the pollution source position is less than 30 m, the time error is about 12 min, and the error in pollutant intensity is below 10%. With the probabilistic method, the pollutant intensity is more precise, but the position and time are less accurate; in particular, the position error exceeds 500 m. Furthermore, the probabilistic method takes much more computation time than the coupled probability density method for the same problem. Overall, the traceability technique based on the coupled probability density method is recommended when a sudden water pollution accident occurs during water diversion.
Because of the uncertainty of sudden water pollution events, it is very difficult to determine the location and release time of the source from experience after an event occurs. The first task of emergency treatment is therefore to identify the pollution source, and the main content of fast traceability of sudden pollution is to determine the location, release time, and intensity of the pollution source. Aiming at fast identification of the pollution source in sudden water pollution events on the SNWDMRP, this chapter introduces and analyzes fast channel traceability techniques, including conventional methods, demonstrates their practical effect through an example application, and, based on the traceability results, recommends the coupled probability density method for traceability calculation, in order to support fast traceability and emergency regulation and control of sudden water pollution in long-distance water transfer projects.
The traceability model gives good results for sudden water pollution accidents caused by a single point source in a river; however, traceability models for accidents with multiple point sources need further research, for which the ongoing research by Wang can be consulted.