Data Assimilation Application to the Subsurface Flow and Solute Transport

part, we will assimilate solute transport data to calibrate conductivity fields via the constrained EnKF method, where the conservative solute concentration is constrained to be non-negative. The constrained EnKF method is used to identify a two-dimensional (2D) heterogeneous hydraulic conductivity field with mixed Neumann/ Dirichlet boundary conditions through assimilating conservative solute concentration observations. The study focuses on investigating the capability of constrained EnKF to update a hydraulic conductivity field, and explores the influences of different factors on assimilation results.


Introduction
Natural media are generally heterogeneous. The medium properties (hydraulic parameters such as hydraulic conductivity and porosity) are often treated as random space functions. Without an adequate description of the spatial distributions of the hydraulic parameters, such as hydraulic conductivity and porosity, the ability of the numerical model to predict groundwater flow and solute transport is limited and tends to deteriorate over time. However, characterization of field heterogeneity is always difficult due to economical and technical limitations, which make the predictions, deteriorate over time.
Many inverse algorithms have been developed in the field of hydrogeology (Constantinescu et al, 2007;Hoeksema and Kitanidis, 1984;Poeter and Hill, 1997;Sun, 1994;Sun and Yeh, 1992;Vrugt et al., 2005a;Vermeulen et al., 2006;Yeh and Liu, 2000;Zhang 1996, Zhu andYeh, 2005). With newly developed techniques, more observations become available. But the observation data from these techniques are usually indirectly related to the model parameters, which are of the most interest in practice. Moreover, there may be different kinds of measurements. Therefore, it is necessary to develop approaches to dynamically reconcile the observation information of different kinds. The data assimilation method is established in this situation (Thomsen and Zlatev, 2008).
In geophysics, data assimilation methods have been used to assimilate geophysical data to characterize medium heterogeneity (Christakos, 2005). Data assimilation methods have been used in hydrology for uncertainty and sensitivity analysis (Liu and Gupta, 2007;Vrugt et al., 2005b) and in the petroleum industry (Oliver et al., 2008).
The traditional Kalman filter (KF) is an efficient sequential data assimilation method for linear dynamics and measurement processes with Gaussian error statistics (Kalman, 1960;Kalman and Bucy, 1961;Drecourt, 2003;Drecourt et al, 2006;Gelb, 1974;Maybeck, 1979;Tipireddy et al., 2008;Zou et al., 1991;Zhang et al., 2007). To assimilate data for nonlinear dynamics and measurement processes, the extended Kalman filter (EKF) was developed (Jazwinski, 1970). However, the EKF is very unstable if the nonlinearities are strong. Furthermore, this method is not computationally feasible for large-scale environmental systems.
To overcome these limitations, the ensemble Kalman filter (EnKF) is proposed (Evensen, 1994). The Ensemble Kalman filter (EnKF) method is a Monte-Carlo variant of the KF, consisting of a set (or ensemble) of parallel short-term forecasts and data assimilation cycles. Statistics derived from the short-term forecasts are used to estimate background error covariances during the subsequent data assimilation step. Further, ensemble-based techniques have two potential advantages over the traditional EKF (Cohn, 1997): (1) the computational cost of ensemble techniques are significantly less than the EKF because covariances are estimated using a limited-size random sample, while the error covariances of EKF for each model component are propagated using linear tangent and adjoints of the fully nonlinear model, and exorbitant computational expense for a high-dimensional model [however, it may be possible to reduce the computations by computing in a reduced-dimensional subspace, e.g., Fisher (1998)]; (2) in terms of accuracy, ensemble filters may be more accurate than the EKF since covariances are estimated by propagating model states with a fully nonlinear model rather than under assumptions of linearity. Moreover, it is also able to account for the possible model noise/error. Recent development of the EnKF is not limited to updating system state variables (such as hydraulic head in transient groundwater flow modeling and solute concentration in solute transport modeling) as in conventional data assimilation, but also allow updating state variables and model parameters (such as hydraulic conductivity in transient groundwater flow modeling or in transient solute transport on the basis of the steady state groundwater flow) to yield more accurate model predictions. The application of the EnKF in hydraulic modeling (including groundwater flow and solute transport) has also received more and more attention. Chen and Zhang (2006) have successfully applied data assimilation for transient groundwater flow in geologic formations via EnKF, but in their study, they did not consider the influence of different boundaries and the locations of observation wells on the data assimilation results. Huang et al. (2009) used the EnKF data assimilation method to calibrate a heterogeneous conductivity field using steady state groundwater flow data and calibrate solute plume initial condition using tracer test data.
All of the above works are done by applying the EnKF to calibrate the hydraulic conductivity by assimilating the hydraulic head. For the solute transport modeling, though Liu et al. (2008) applied the EnKF to calibrate the conductivity fields by assimilating tritium concentra-tion at the MADE site, they first assimilated hydraulic head observations to update the hydraulic conductivity and obtained the flow field for the solute transport process. They also incorporated the updated variable longitudinal dispersivity during the solute transport modeling, which will influence the solute prediction and the corresponding inversed results including the updated hydraulic conductivity field.
All uncertainty sources, including parameters and output observations, need to be considered for improvement of forecast capability (Wang et al., 2009). In most physical systems, states and/or parameters are bounded, which introduces constraints on their estimates. While in sequential data assimilation techniques like the EnKF, the state variables are updated by minimizing the mean square error rather than following physical principles, which may result in the violation of physical constraints and many methods have been proposed to solve these problems (Prakash et al., 2008;Wang et al., 2009). In the conservative solute transport modeling, solute concentrations have their own physical meaning. So it is better to pose some constraints to them in case they violate the physical meaning after the application of the EnKF method (Hemant andOlive, 2009, Sridhar et al., 2007;Thacker, 2007). We call this method as constrained EnKF method.
According to the above introduction about the EnKF, the EnKF application in groundwater is classified into two types: using groundwater flow data or conservative solute transport data to calibrate the hydraulic parameters, e.g. hydraulic conductivity and corresponding state variables, e.g., hydraulic head and conservative solute concentrations. The rest is arranged as follows: In section 2, the EnKF data assimilation method is introduced based on the numerical forward groundwater flow and conservative solute transport models.
In section 3, the research design for the groundwater flow and conservative solute transport is described. The synthetic or illustrative examples or cases for the groundwater flow and conservative solute transport will be depicted in detail. In our research, we will apply EnKF method to update a hydraulic conductivity field through assimilating transient head data in the groundwater flow modeling part. The study focuses on investigating the capability of EnKF to update a hydraulic conductivity field through assimilating transient head data, and explores the influences of different factors on assimilation results. In the conservative solute part, we will assimilate solute transport data to calibrate conductivity fields via the constrained EnKF method, where the conservative solute concentration is constrained to be non-negative. The constrained EnKF method is used to identify a two-dimensional (2D) heterogeneous hydraulic conductivity field with mixed Neumann/ Dirichlet boundary conditions through assimilating conservative solute concentration observations. The study focuses on investigating the capability of constrained EnKF to update a hydraulic conductivity field, and explores the influences of different factors on assimilation results.
In section 4, the updated results through assimilating groundwater flow and conservative solute transport data will be provided, and then discussion and analysis of the influences of various factors on the results will be provided in detail. Conclusions for the inverse modeling of the groundwater flow and conservative solute transport will be made.

Data assimilation method
Since data assimilation is a process in which estimates of model parameters and variables are updated by requiring consistency with observations and governing equations, a data assimilation system is composed of a model operator, an observation operator and a data assimilation algorithm.

Transient stochastic flow model
The governing equation for transient groundwater flow in unsaturated-saturated medium is (Bear, 1972;Arlen, 2005): is the spatial parameters, i.e., in the three-dimensional domain, x means x, y, z; g(x, t) [T -1 ] is the external sink or source term; K(x) [LT -1 ] is the hydraulic conductivity; ∇ stands for (∂ / ∂ x,∂ / ∂ y ,∂ / ∂ z); S s is the specific storage. In this study, the hydraulic conductivity K(x) is considered as a log-normally distributed random space function while specific storage S s is treated as a deterministic constant. Since K(x) is a random function, the flow equations become stochastic nonlinear partial differential equations.

Transient conservative solute transport model
When the right side in Equation (1) is 0, then it becomes a governing equation for the steady state flow. On the basis of the mass balance and Fick's law, the governing equation for the transport of conservative solute in groundwater flow can be written as (Zheng, 1990): is the solute concentration of water entering from sources or sinks.

Ensemble Kalman filter
EnKF is a Monte Carlo method. The EnKF method is briefly introduced with perturbed observation proposed by Burgers et al. (1998). In this algorithm, predictions of model variables S, including both the variables (hydraulic head in the groundwater flow modeling or solute concentration in the solute transport modeling) and parameters (hydraulic conductivity), are given by their ensembles. Assuming normal distribution of model predictions, the ensemble mean is supposed to be the best estimate of the true state, and prediction error around the mean is measured by covariance of the ensemble (Evensen, 2003).
The covariance, P, of forecast and analysis error of a random variable S are defined as ( ) In the forecast step, forecasted model variables of each ensemble member are updated according to f is the forecasted model variable of the ith ensemble member at time t+1;S i,t a the analyzed model variable of the ith ensemble member at time t; M (.) is model operator, which is the flow model in groundwater modeling and transport model in solute modeling; e 1i is the model error vector, which is assumed to satisfy a Gaussian distribution with zero mean and covariance matrix Q s .
In the analysis step, the observation data are perturbed by adding random observation errors.
The observation vector at the time step t+1 for each ensemble member is given by: Where e 2i is random error vector of observation with zero mean and covariance matrix R e .
The forecast of each ensemble member is updated as follows (Burgers et al., 1998): The analysis state estimate at time t+1 is given by the mean of the ensemble members. The analyzed ensemble is then integrated forward until the next observation (hydraulic head or solute concentration) is available and the process is repeated. In comparison with commonly used inverse methods (e.g., general least squares and maximum likelihood methods), the EnKF can dynamically adjust the system estimate, without reprocessing existing data when new observations become available. The data assimilation procedure using the EnKF in this paper will be introduced later combined with the description of the study cases below.

Illustrative examples
Where M is the matrix volume fraction of the grain size distribution in the medium; n is the porosity of the medium; d 10 is the 10th percentile of grain size distribution. The distributions of M, n and d 10 were obtained from the field experiment. In this study, a special case is considered in which d 10 is a spatial random variable, its distribution is represented by a fractal function according to the field measurement, while the other two parameters, M and n are assumed to be constant and their mean values are used in Eq. (9). Using Hu et al.'s (2009) method, multiple, equally-probable realizations of the hydraulic conductivity field is generated based on Eq. (9). One of the generated realizations is shown in Figure 1(a), where it is apparent that the field is in the log form. One can see that the realized field is heterogeneous, but statistically stationary. An ensemble of a sufficient number of realizations should reach a constant field. To generate a "real" or reference conductivity field, one realization is randomly picked and modified the field by inserting a high conductivity layer in the middle of the domain by setting the conductivity value at every point in the layer 100 times the original value. The constructed figure with log form is shown in Figure 1(b) and the red band part is the modified part. This field will be used to generate hydraulic head data at selected measurement points, while the other realizations, without modification, will be used as the initial "guess" conductivity field by averaging them. For example, In Figure 1(c), the 500 realizations mean logK field is taken as the representative initial "guess" conductivity field.
conductivity field and the initial guess conductivity field in this study: Where M is the matrix volume fraction of the grain size distribution in the medium; n is the porosity of the medium; d10 is the percentile of grain size distribution. The distributions of M, n and d10 were obtained from the field experiment. In this stu special case is considered in which d10 is a spatial random variable, its distribution is represented by a fractal function accordi the field measurement, while the other two parameters, M and n are assumed to be constant and their mean values are used i (9). Using Hu et al.'s (2009) method, multiple, equally-probable realizations of the hydraulic conductivity field is generated b on Eq. (9). One of the generated realizations is shown in Figure 1(a), where it is apparent that the field is in the log form. On see that the realized field is heterogeneous, but statistically stationary. An ensemble of a sufficient number of realizations sh reach a constant field. To generate a "real" or reference conductivity field, one realization is randomly picked and modifie field by inserting a high conductivity layer in the middle of the domain by setting the conductivity value at every point in the 100 times the original value. The constructed figure with log form is shown in Figure 1(b) and the red band part is the mod part. This field will be used to generate hydraulic head data at selected measurement points, while the other realizations, wi modification, will be used as the initial "guess" conductivity field by averaging them. For example, In Figure 1(c), th realizations mean logK field is taken as the representative initial "guess" conductivity field.  Based on the hydraulic boundary conditions and the conductivity fields described above, the flow equations are solved with the Galerkin finite-element method (FEM).
The simulated results based on the reference conductivity field at the chosen "measurement" locations will be used as the real values of the hydraulic heads at the locations. Considering possible measurement error in reality, the real head values are perturbed by white noises and the perturbed head results serve as the observed head values. In this study, the noise has a mean equal to 0 (indicating unbiased observations) and the standard deviation is 1% of the hydraulic head measurements. It should be pointed out that the EnKF method is not limited to this simplification of measurement error. In fact, the errors, assumed or inferred from the measurements, may vary with measurement types, times, and locations. In this study, the simulation time is evenly subdivided into many time steps of size 1 [T].
In case 1, in the vertical (z) direction, the bottom level of the study domain is -100 [L], and the upper level is 2.5 [L]. The initial hydraulic head throughout the domain is assumed to be 3.0 [L], so the saturated groundwater is confined at first, and the specific storage for this confined study medium is assumed to be 0.0001. After some time of the pumping the groundwater table goes below the upper level of the medium in some areas, as the groundwater becomes unconfined. The specific storage for the unconfined study medium is set to 0.1. In the case study, the observation wells are distributed evenly in the study domain (e.g. 25 wells shown in Figure 1(d)), and continuous data can be obtained from the observation wells.
The procedure of the data assimilation using the EnKF is shown in a flowchart in Figure 2. First, ensembles of initial hydraulic conductivity fields are generated, and the generated hydraulic conductivity fields are used in transient flow model to calculate hydraulic head in the study area. These hydraulic conductivity fields and heads serve as the first-step forecast results. Second, if there are the hydraulic head measurements at the observation wells at the current time step, the measurements will be used to update the hydraulic conductivity and hydraulic head distributions through the EnKF algorithm. Finally, the updated hydraulic conductivity field and hydraulic heads are used to reinitialize the transient flow model at the next time step.

Transient state conservative solute transport
The conservative solute transport modeling is based on the steady state groundwater flow. In our study, the same study case is chosen as Case 2 (L x = L y = 50.0 [L], 50×50 square grid cells, the top and bottom boundaries are the second Neumann condition with no-flow, but constant head boundary conditions on the left and right sides.) in part 3.1 except that there is no pumping well at all. In this way, the steady state flow can be obtained. With the same idea as in part 3.1, the ensemble realizations of the initial guessed hydraulic conductivity fields are produced by Hu et al.'s (2009) method. One of the generated realizations of log form is directly picked out and shown in Figure 3(a), which is also used as the true or reference hydraulic conductivity field in the solute transport modeling. The ensemble number is selected as 300, and these realizations mean logK field is taken as the representative initial "guess" conductivity field, which is shown in Figure 3(b). Uniform distributions of 25 and 81 observation wells are studied here, and the former distribution have been presented in Figure 1(c), while the later distribution is presented in Figure 3(c). Continuous solute concentration data can be obtained from the observation wells where the solute reaches.
Based on the hydraulic boundary conditions and the conductivity fields described above, the forward groundwater flow equation is solved with the Galerkin finite-element method (FEM). Then the Darcy's velocity can be calculated and the transport equation is solved with the Galerkin FEM based on the initial instant injection of solute concentration somewhere.
In this way, the steady state flow can be obtained. With the same idea as in part 3.1, the ensemble realizations of th hydraulic conductivity fields are produced by Hu et al.'s (2009) method. One of the generated realizations of log picked out and shown in Figure 3(a), which is also used as the true or reference hydraulic conductivity field in the modeling. The ensemble number is selected as 300, and these realizations mean logK field is taken as the repr "guess" conductivity field, which is shown in Figure 3(b). Uniform distributions of 25 and 81 observation wells a and the former distribution have been presented in Figure 1(c), while the later distribution is presented in Figure  solute concentration data can be obtained from the observation wells where the solute reaches.
Based on the hydraulic boundary conditions and the conductivity fields described above, the forward groundwat is solved with the Galerkin finite-element method (FEM). Then the Darcy's velocity can be calculated and the tran solved with the Galerkin FEM based on the initial instant injection of solute concentration somewhere. The observed concentration values are obtained by adding white noises to the real solute concentration valu simulated results based on the reference conductivity field. The noise is unbiased and the standard deviation is concentration measurements. The simulation time for the solute transport modeling is evenly subdivided into ma size 0.1 [T] after the steady water velocity field is obtained by solving the governing flow equation.
To purely verify the ability of the EnKF method to update the hydraulic conductivity field through assimil concentration, all the other related parameters are assumed to be known except the hydraulic conductivity in th The solute longitudinal and transverse dispersivity is 0.1 [L] and 0.01 [L], respectively. The corresponding random also one realization produced in Hu et al. (2009). In order to explore the EnKF ability to update the hydraulic c through assimilating the solute concentration, the random porosity filed is the same for every realization.
Models usually have relevant physical laws or settings, but the EnKF formulation appears to be a promising appr with a wide class of nonlinear state estimation problems, it cannot handle bounds on the state variables (the solute and/or the parameters (hydraulic conductivity here) that are being estimated (Hemant and Olive, 2009;Prak Thacker, 2007;Wang et al., 2009), which will results in the violation of physical meaning. So it is necessary to accou state and parameters being estimated. The procedure of the data assimilation method using the constrained EnK flowchart as Figure 4. First, ensembles of initial parameters fields are generated, which are used in steady-stat calculate solute concentration. If there are the solute concentration measurements at the observation wells at the c the measurements will be used to update the hydraulic conductivity field and solute concentration distributions th algorithm. If it is desired to apply the ensemble Kalman filter for state estimation when states are bounded, necessary to modify the EnKF. Therefore, constraints are treated in a post processing after applying the EnKF (W and bounds have to be deal with in EnKF framework (Prakash et al., 2008). For example, the solute concentration less than 0, if the updated concentrations become less than 0, then we set them as 1% of the concentration value step. In this way, the negative value of the solute concentration can be avoided. Finally, the updated reactive rel and solute concentration are used to reinitialize the conservative solute transport model at the next time step. The observed concentration values are obtained by adding white noises to the real solute concentration values, which is the simulated results based on the reference conductivity field. The noise is unbiased and the standard deviation is 1% of the solute concentration measurements. The simulation time for the solute transport modeling is evenly subdivided into many time steps of size 0.1 [T] after the steady water velocity field is obtained by solving the governing flow equation.
To purely verify the ability of the EnKF method to update the hydraulic conductivity field through assimilating the solute concentration, all the other related parameters are assumed to be known except the hydraulic conductivity in the study domain. The solute longitudinal and transverse dispersivity is 0.1 [L] and 0.01 [L], respectively. The corresponding random porosity filed is also one realization produced in Hu et al. (2009). In order to explore the EnKF ability to update the hydraulic conductivity field through assimilating the solute concentration, the random porosity filed is the same for every realization.
Models usually have relevant physical laws or settings, but the EnKF formulation appears to be a promising approach for dealing with a wide class of nonlinear state estimation problems, it cannot handle bounds on the state variables (the solute concentrations) and/or the parameters (hydraulic conductivity here) that are being estimated ( meaning. So it is necessary to account for bounds on state and parameters being estimated. The procedure of the data assimilation method using the constrained EnKF is shown in a flowchart as Figure 4. First, ensembles of initial parameters fields are generated, which are used in steady-state flow model to calculate solute concentration. If there are the solute concentration measurements at the observation wells at the current time step, the measurements will be used to update the hydraulic conductivity field and solute concentration distributions through the EnKF algorithm. If it is desired to apply the ensemble Kalman filter for state estimation when states are bounded, then it become necessary to modify the EnKF. Therefore, constraints are treated in a post processing after applying the EnKF (Wang et al., 2009), and bounds have to be deal with in EnKF framework (Prakash et al., 2008). For example, the solute concentrations should not be less than 0, if the updated concentrations become less than 0, then we set them as 1% of the concentration value for the last time step. In this way, the negative value of the solute concentration can be avoided. Finally, the updated reactive related parameters and solute concentration are used to reinitialize the conservative solute transport model at the next time step.

Application of data assimilation method to the groundwater flow and conservative solute transport
To determine the accuracy of the updated state vectors, the reference (also called true) simulation is compared to the EnKF simulation using the root mean squared error (RMSE) as the following: where RMSE is the criteria for the degree of the data assimilation capability, M is the total number of numerical grid elements in the study domain, which is 2601 in this hypothetic study case of section 3 or 4; S i f and S i t stands for the estimated and reference hydraulic conductivities at grid i, respectively.

Transient state groundwater flow
In this part, the observations are assumed to be available for us for every simulation time, so the observed data can be assimilated for every current simulated time via the EnKF method.

Influence of ensemble size
To determine the ensemble size that should be used in our study, it is needed to study the influence of ensemble size on the data assimilation results. Here, it is assumed that there are 25 (5×5) observation wells, and they are distributed uniformly within four no-flow boundaries (Case 1).
The mean and variance values of the hydraulic conductivity field with and without data assimilation for different ensemble size is shown in Figures 5 and 6. The simulated hydraulic conductivity is compared with the real hydraulic conductivity fields for different ensemble size. The mean and variance values of the updated hydraulic conductivity fields for the ensemble size of 70, 90, 100, 200 are almost the same with the RMSE K more than 10 [L/T] at some assimilation steps, I only plot the ensemble size of 90 as a representative for simplicity in Figures 5 and 6. Similarly, I only plotted the ensemble size of 300 and 500 as representatives.
The initial ensemble mean hydraulic conductivity for different ensemble size (the value is an approximate constant as 13.10 [L/T]) is apparent at the first time step in Figure 5. It is shown that the first 2 or 3 time steps of data assimilation are the most efficient for every ensemble size simulation, with decrease of RMSE K value during these time steps. Also, I found that after the first 5 time step, the effect of assimilating observation data became worse ( Figure 6) for different ensemble sizes 300 and 500. So, it is better for us to assimilate only a few time step observation data. However, in Huang et al. (2009), the more data assimilation time steps, the more improved the updating hydraulic conductivity fields become. The criterion to end the process of data assimilation is introduced as: Where DifK is the mean difference of the updated value and the corresponding ensemble mean value over the domain; K i j is the model result of the jth ensemble for the ith grid; K i is the ensemble mean value at the ith grid. After the end of the data assimilation process, I consider the latest update as providing the most acceptable results. It is very clear that the mean and variance values of hydraulic conductivity fields with data assimilation for the ensemble size of 300 and 500 are much closer to the real hydraulic conductivity field. This demonstrates that the data assimilation method for the transient flow via EnKF in this study can update the model parameters (hydraulic conductivities here) well. Moreover, the larger the ensemble size, the better the updated hydraulic conductivity field, which agrees with the results from Chen and Zhang (2006), Evensen (2003Evensen ( , 2004 and Huang et al. (2009). It is well known that with the increase of ensemble size, more effort will be needed for computation. Since the ensembles consisting of 300 and 500 members (realizations) yield similar results (the mean and variance values are very close to the real hydraulic conductivity field at a sufficient number of time steps), an ensemble consisting of 300 members is considered to be an appropriate number of realizations for both the results and computational efficiency.
To compare the spatial distribution of the updated hydraulic conductivity fields under different assimilation steps with the real field, the spatial distributions of the log hydraulic conductivity field under the first and fourth time steps are plotted in Figures 7(a) and 7(b), respectively, with ensemble members of 300 and 25 observation wells. According to the plot, it can be seen that after four time step data assimilation, the updated mean hydraulic conductivity field is much better than the initial one shown in the Figure 1(c). Furthermore, it is very like the real field ( Figure 1(b)), which suggest that the 2D saturated transient flow simulation by assimilating easily observed hydraulic head data can not only update the hydraulic head in time, but is also an efficient method to update other hydraulic parameters such as the hydraulic conductivity.

Influence of boundary condition
In this section, the ensemble size of 300 is chosen as discussed above, 25 observation wells is selected to update the hydraulic conductivity by assimilating the observed hydraulic head data for different boundary conditions. The first condition is the one used in Case 1, four noflow boundaries, the assimilation results have obtained and shown in Figure 7 Figure 1(b), one can conclude that the updated conductivity field is similar to the real field only after four data assimilation steps when the boundaries are Neumann boundary condition, whether it is no flow or non-zero flow condition.
The third boundary condition is the one used in the Case 2, two no-flow boundaries and two constant hydraulic head boundaries. The updated hydraulic conductivity fields after 3 different time steps of data assimilation are shown in the Figure 8. It is shown from the figure that the updated hydraulic conductivity field is not similar to the real field even after the 50 th assimilation step. The results shown in Figures 7 and 8 indicate that this data assimilation method for transient flow works better for the study domain with Neumann boundaries than that with mixed no-flow and constant head boundaries. In other words, data assimilation Data Assimilation Application to the Subsurface Flow and Solute Transport http://dx.doi.org/10.5772/56705 method for transient flow does not work well for the study domain with mixed no-flow and constant head boundaries. These results can be explained by the boundary hydraulic characteristics. For the Neumann boundary condition, hydraulic head variation is directly related to the hydraulic conductivity distribution, without being constrained by constant head boundaries. While, for transient flow with a constant-head boundary, the hydraulic head variation will not be fully determined by the hydraulic conductivity field due to the head constraints imposed at the boundaries. Using extreme cases as examples, when the hydraulic conductivity is extremely large or small in the study domain with a constant-head boundary, the hydraulic head in the boundary is still constant. Therefore, the hydraulic conductivity field is not the only factor that determines the hydraulic head distribution in the transient flow domain, and the constant-head boundary is also a strong influence on the hydraulic head change in the study domain. Therefore, the constant head boundary condition will decrease the sensitivity of the transient hydraulic head change with hydraulic conductivity.

Impact of number of observation wells
To investigate the effect of the number of observation wells on the data assimilation, in addition to the conductivity field in Figure 7 calibrated through 25 observation wells, the data assimilation method is applied to Case 1 with 9, 16, 36, 49, 64 and 81 observation wells, where the observation wells are uniformly distributed in the study domain. The study results are shown in Figure 9.
9 Figure 1(b), one can conclude that the updated conductivity field is similar to the real field only after four data assimilation steps when the boundaries are Neumann boundary condition, whether it is no flow or non-zero flow condition. The third boundary condition is the one used in the Case 2, two no-flow boundaries and two constant hydraulic head boundaries The updated hydraulic conductivity fields after 3 different time steps of data assimilation are shown in the Figure 8. It is shown from the figure that the updated hydraulic conductivity field is not similar to the real field even after the 50 th assimilation step. The results shown in Figures 7 and 8 indicate that this data assimilation method for transient flow works better for the study domain with Neumann boundaries than that with mixed no-flow and constant head boundaries. In other words, data assimilation method for transient flow does not work well for the study domain with mixed no-flow and constant head boundaries. These results can be explained by the boundary hydraulic characteristics. For the Neumann boundary condition, hydraulic head variation is directly related to the hydraulic conductivity distribution, without being constrained by constant head boundaries. While, for transien flow with a constant-head boundary, the hydraulic head variation will not be fully determined by the hydraulic conductivity field due to the head constraints imposed at the boundaries. Using extreme cases as examples, when the hydraulic conductivity is extremely large or small in the study domain with a constant-head boundary, the hydraulic head in the boundary is still constant Therefore, the hydraulic conductivity field is not the only factor that determines the hydraulic head distribution in the transien flow domain, and the constant-head boundary is also a strong influence on the hydraulic head change in the study domain Therefore, the constant head boundary condition will decrease the sensitivity of the transient hydraulic head change with hydraulic conductivity.

Impact of number of observation wells
To investigate the effect of the number of observation wells on the data assimilation, in addition to the conductivity field in Figure  7 calibrated through 25 observation wells, the data assimilation method is applied to Case 1 with 9, 16, 36, 49, 64 and 81 observation wells, where the observation wells are uniformly distributed in the study domain. The study results are shown in Figure 9.    Whil flow with a constant-head boundary, the hydraulic head variation will not be fully determined by the hydraulic con due to the head constraints imposed at the boundaries. Using extreme cases as examples, when the hydraulic extremely large or small in the study domain with a constant-head boundary, the hydraulic head in the boundary i Therefore, the hydraulic conductivity field is not the only factor that determines the hydraulic head distribution i flow domain, and the constant-head boundary is also a strong influence on the hydraulic head change in the Therefore, the constant head boundary condition will decrease the sensitivity of the transient hydraulic head hydraulic conductivity.

Impact of number of observation wells
To investigate the effect of the number of observation wells on the data assimilation, in addition to the conductivity 7 calibrated through 25 observation wells, the data assimilation method is applied to Case 1 with 9, 16, 36, 49, 64 and wells, where the observation wells are uniformly distributed in the study domain. The study results are shown in Fig   Figure 9.

Impact of number of observation wells
To investigate the effect of the number of observation wells on the data assimilation, in addition to the conductivity field in Figure  7 calibrated through 25 observation wells, the data assimilation method is applied to Case 1 with 9, 16, 36, 49, 64 and 81 observation wells, where the observation wells are uniformly distributed in the study domain. The study results are shown in Figure 9. Comparing the real hydraulic conductivity field, the initial mean hydraulic conductivity field with the calibrated hydraulic conductivity fields with various number of observation wells shown in Figure 7 and 9, one can tell that with the increase of the observation wells, the updated hydraulic conductivity distribution will become closer to the real hydraulic conductivity field until the number of observation wells reaches 64. As shown in Figure 9 that when the observation wells are limited, such as 9 and 16 wells, the updated hydraulic conductivity is not close to the real hydraulic conductivity field, and the high conductivity channel is not well captured. However, when the observation wells become too many, such as 81 wells, the true information is too much, which will disturb the assimilation results and lead to "bad" calibration results. This phenomenon can be explained as one observed data from one observation well will influence a certain space around the well via EnKF by data assimilation. If the observation wells are too close to each other, some area will be calibrated by data from two wells at the same time. However, the observation wells and their influencing regions are considered to be independent in the data assimilation method, so, too many observation wells lead to over calibration of the hydraulic conductivity field. Moreover, the suitable uniform distance of two observation wells is still a problem and how to define the appropriate data assimilation frequency and space is an issue that requires further study.
From the assimilation results it is told that 25, 36, 49 and 64 observation wells are suitable for the calibration of the conductivity field with an ensemble size of 300, which is composed of 2601 numerical grids cells in the study area. In general, these results demonstrate that the data assimilation method using an appropriate number of observation wells for transient flow to calibrate a heterogeneous conductivity field for an ensemble size of 300 via EnKF is effective and convenient. Furthermore, it displays similarities with the real field.

Influence of boundary condition
To further investigate the influence of the observation wells on identification of a conductivity field from transient flow data via EnKF, three different observation well locations are set in case 1 with the ensemble size of 300. As shown in Figure 10  In the first pattern shown in Figure 10(a), the 16 hydraulic head observation are uniformly distributed in the whole but there is not a single observation well located within the high hydraulic conductivity zone. As discussed abov hydraulic conductivity field shown in Figure 9(b) does not represent the existence of the high conductivity zone well data will influence the updated data around the observation locations (Ott et al., 2004), so the updated hydrauli distribution cannot capture the zone of high hydraulic conductivity.
For the observation wells shown in Figure 10(b), they are still uniformly distributed. Although the observation we still not in the high hydraulic conductivity zone, they are around that zone. Therefore, the updated hydraulic con shown Figure 11

Influence of boundary condition
To further investigate the influence of the observation wells on identification of a conduct EnKF, three different observation well locations are set in case 1 with the ensemble size of 3 are the same number of 16 hydraulic head observation wells in the study field, but the different patterns, denoted 16, 016, 0016, and their corresponding updated hydraulic co assimilating the observed hydraulic head data are displayed in the Figures 9(b), 11(a) and 11 For the observation wells shown in Figure 10(b), they are still uniformly distributed. Altho still not in the high hydraulic conductivity zone, they are around that zone. Therefore, th shown Figure 11 In the first pattern shown in Figure 10(a), the 16 hydraulic head observation are uniformly distributed in the whole study domain, but there is not a single observation well located within the high hydraulic conductivity zone. As discussed above, the updated hydraulic conductivity field shown in Figure 9(b) does not represent the existence of the high conductivity zone well.
The observed data will influence the updated data around the observation locations (Ott et al., 2004), so the updated hydraulic conductivity distribution cannot capture the zone of high hydraulic conductivity.
For the observation wells shown in Figure 10(b), they are still uniformly distributed. Although the observation well locations are still not in the high hydraulic conductivity zone, they are around that zone. Therefore, the updated hydraulic conductivity field shown Figure 11(a) is similar to the real field. The third distribution pattern of the observation wells is shown in Figure 10(c), and the wells are not distributed uniformly, and are located closer to each other on the lower right side while some are in the high hydraulic conductivity zone. Similar to the Figure 11(a), the updated hydraulic conductivity field shown in Figure 11(b), is very similar to the real conductivity field.
The study results indicate that the observation well locations will influence the assimilation results of the hydraulic conductivity field. Better results are obtained if the observation wells sample zones of distinct conductivity. Optimal design of observation well locations to maximize data efficiency is a research issue requiring further research.

Conclusions
In this study, the ensemble Kalman Filter (EnKF) is used to estimate heterogeneous hydraulic conductivity distributions by assimilating hydraulic head measurements in a transient flow pumping field. The measurement error is considered, but the model error is not. The forecast model is assumed to be known. Two synthetic cases with different boundaries are designed to investigate the capacity and effectiveness of the proposed data assimilation method to identify a conductivity distribution. The developed method is suitable for 2 or 3- . The head measurements are used to identify a heterogeneous hydraulic conductivity field. Based on the study results, the following conclusions are obtained.

1.
The EnKF can be used to effectively calibrate a heterogeneous conductivity distribution by assimilation of hydraulic head measurements in a transient flow field with no-flow boundaries. Only after a few (no more than 10) assimilation steps, the spatial distributions of hydraulic head and hydraulic conductivity become significantly closer to the real field distributions than those without data assimilation.

2.
The model predictability depends on ensemble size. For our study cases, 300 realizations were found to be sufficient for transient flow calculation.

3.
Using the EnKF to calibrate a heterogeneous conductivity field through assimilating transient head data, the calibration is very efficient for the study domain with all Neumann boundaries, while it is not efficient with any constant boundaries in study domain. The boundary condition will significantly affect the assimilation results.

4.
The appropriate number of observation wells for our case study is 25, 36, 49 and 64 respectively. Compared with the total 2601 grids in the study area, the observation well number is still small. When well number is too small, the data cannot capture the variation, but when the number is too big, the data will be redundant and cause over calibration.

5.
The location of the observation wells will also significantly influence the calibration results. Better results will be obtained if the observation wells adequately sample the heterogeneities. For the study case, if the observation wells are located in both high and low conductivity regions, the calibration results will be better than if they are only located in one region.

6.
A criterion is introduced to determine the end of the data assimilation process, but how to develop a more suitable criterion needs further study. That criterion will allow the identification of the point where continued data assimilation will deteriorate the quality of the results.

7.
The simulation time of the assimilation steps is very short, and this is a problem requires our deeper thinking. The model error is not considered here, which will pose too much confidence to the model estimate. Maybe the latter one is the cause of the former problem. However, the model error is unknown here, so how to choose a model and determine the model error is another issue needs further thinking.

Transient state conservative solute transport
Similar to part 4.1, it is assumed that the solute concentration samples can be obtained in the observation wells with every simulation time step, so in this part the observed data will be assimilated through the constrained EnKF method (described in section 3.2) at every simulated time. Since the ensemble size of 300 is enough in part 4.1, 300 realizations will be used in this part too.

Small initial distribution of solute concentration
The solute concentration can be predicted after the water velocity field is obtained, and the initial instant injection is given. The solute concentration distribution of the initial instant injection is shown in Figure 12. The initial injection area is very small in Figure 12(a), and the larger on is in Figure 12(b). There are 81 observation wells as shown in Figure 3(c). I plot the mean and variance values of the hydraulic conductivity field with and without data assimilation under 300 ensemble sizes with small area of initial solute concentration in Figures 13  and 14, respectively. I compare the real conductivity field with simulated fields with and without the constrained EnKF method. The initial ensemble mean hydraulic conductivity under 300 ensemble size is 13.35 [L/T], as shown in Figure 13, while the real field mean is 7.92 [L/T]. It is shown in Figure 13 that in comparison with the initial conductivity field, the updated mean conductivity through the data assimilation method is much closer to the real mean value. However, some updated results could change from "good" at the first several steps to "bad" after many assimilation steps. The results at the first five or seven assimilated time steps are the most efficient. In Figure 14, the variance value of the updated result approaches the real value after several assimilation steps, however, the variance will continue increasing and deviates from that in the real field as time goes. The results are similar to the results in Figure  13. The RMSE K value also decreases in the first several time steps and then increases. The results suggest that it is only meaningful to use the data assimilation method to update parameters for the first several assimilation time steps. If we just take the results from the first several assimilation steps, the updated hydraulic conductivity field is significantly improved in comparison with the initial guessed field.
The solute concentration can be predicted after the water velocity field is obtained, and the initial insta solute concentration distribution of the initial instant injection is shown in Figure 12. The initial inject Figure 12(a), and the larger on is in Figure 12(b). There are 81 observation wells as shown in Figure  variance values of the hydraulic conductivity field with and without data assimilation under 300 ensemb initial solute concentration in Figures 13 and 14, respectively. I compare the real conductivity field with without the constrained EnKF method. The initial ensemble mean hydraulic conductivity under 300 ensem shown in Figure 13, while the real field mean is 7.92 [L/T]. It is shown in Figure 13 that in comparison wi field, the updated mean conductivity through the data assimilation method is much closer to the real me updated results could change from "good" at the first several steps to "bad" after many assimilation ste five or seven assimilated time steps are the most efficient. In Figure 14, the variance value of the updated value after several assimilation steps, however, the variance will continue increasing and deviates from th goes. The results are similar to the results in Figure 13. The RMSE K value also decreases in the first se increases. The results suggest that it is only meaningful to use the data assimilation method to updat several assimilation time steps. If we just take the results from the first several assimilation steps conductivity field is significantly improved in comparison with the initial guessed field. We plot the spatial distributions of the updated hydraulic conductivity field at various time steps in Figure 15, and the corresponding absolute hydraulic conductivity difference between the updated result and the real one is presented in Figure 16. The initial ensemble mean guess of hydraulic conductivity field (Figure 3(b)) is uniform, while the real field is heterogeneous and shown in Figure 3(a). The spatial distribution of updated hydraulic conductivity field in Figure 15 have captured the feature of the real field, two low hydraulic conductivity zones in the middle and two high hydraulic conductivity zones at the right-top and left-middle areas. From Figure 16, it is can be seen that the updated hydraulic conductivity fields are better than the initial guess, and the hy conductivity has been significantly improved in the middle area. This is explained as that the observed data are mainly ob from the middle area during the simulation time. However, the constrained EnKF method overestimates the high hy conductivity zone in the left-bottom and the right-top areas (Figures 15 and 16). The reason for the over estimation explained later, which leads to the deterioration of the results of the mean and variance values shown in Figure 13 and 14. It noticed that observed data of only 7 to 9 can be obtained from the observation wells because the solute does not spread domains at any time step, and solute duration time at any location is limited. Therefore, for any given well, if too many tim observations are used to update the conductivity field, the concentration values in some steps will be zero, such as in th period before the solute reaches the well point or the period after the solute passes that point. If these void concentration d used to update the conductivity field, the data will deteriorate the assimilation results. For the study case, the solute passes th some observation wells in a few time steps, so only the few time step data should be assimilated. In general, the EnK significantly improve the hydraulic conductivity field by assimilating 7 to 9 observed solute concentration data, demonstrates that the constrained EnKF is an efficient method to update the hydraulic conductivity field via assimilat observed solute concentration available. Moreover, compared with 2601 grid elements in the study domain, the number of the observation data (7 to 9) is very small, is very convenient in the reality. In part 4.1, From the study on updating conductivity field through assimilating hydraul data, it is found that 25 observed data (observation wells) are needed to capture major feature of the same conductivity field one used in this study. Therefore, the concentration data are more efficient than hydraulic head data. This is because the concentration in the middle part will be influenced by the groundwater velocity in the upstream and downstream area hydraulic conductivity field will affect the hydraulic head field through the flow model, and the velocity is controlled by b hydraulic head and conductivity fields through the Darcy's law (see Equation (2)). Thus, the groundwater velocity is d influenced by the hydraulic conductivity field. In this study, the solute transport is dominated by the advection, so the concentration will be very sensitive to the hydraulic conductivity in the upstream and downstream areas, and the observed concentration data will contain the information of the hydraulic conductivity distribution. Furthermore, the dispe parameters are also related to the groundwater velocity closely in the solute transport modeling, which suggests that the disp part of the solute transport is also sensitive to the hydraulic conductivity indirectly. So only 7 to 9 observed solute concen data in the middle part can be used to improve the hydraulic conductivity distribution not only in this middle part but can From Figure 16, it is can be seen that the updated hydraulic conductivity fields are better than the initial guess, and the hydraulic conductivity has been significantly improved in the middle area. This is explained as that the observed data are mainly obtained from the middle area during the simulation time. However, the constrained EnKF method overestimates the high hydraulic conductivity zone in the left-bottom and the right-top areas (Figures 15 and 16). The reason for the over estimation will be explained later, which leads to the deterioration of the results of the mean and variance values shown in Figure 13 and 14. It is also noticed that observed data of only 7 to 9 can be obtained from the observation wells because the solute does not spread to all domains at any time step, and solute duration time at any location is limited. Therefore, for any given well, if too many time step observations are used to update the conductivity field, the concentration values in some steps will be zero, such as in the time period before the solute reaches the well point or the period after the solute passes that point. If these void concentration data are used to update the conductivity field, the data will deteriorate the assimilation results. For the study case, the solute passes through some observation wells in a few time steps, so only the few time step data should be assimilated. In general, the EnKF can significantly improve the hydraulic conductivity field by assimilating 7 to 9 observed solute concentration data, which demonstrates that the constrained EnKF is an efficient method to update the hydraulic conductivity field via assimilating the observed solute concentration available. From Figure 16, it is can be seen that the updated hydraulic conductivity fields are better than the initial guess, and the hydraulic conductivity has been significantly improved in the middle area. This is explained as that the observed data are mainly obtained from the middle area during the simulation time. However, the constrained EnKF method overestimates the high hydraulic conductivity zone in the left-bottom and the right-top areas (Figures 15 and 16). The reason for the over estimation will be explained later, which leads to the deterioration of the results of the mean and variance values shown in Figure 13 and 14. It is also noticed that observed data of only 7 to 9 can be obtained from the observation wells because the solute does not spread to all domains at any time step, and solute duration time at any location is limited. Therefore, for any given well, if too many time step observations are used to update the conductivity field, the concentration values in some steps will be zero, such as in the time period before the solute reaches the well point or the period after the solute passes that point. If these void concentration data are used to update the conductivity field, the data will deteriorate the assimilation results. For the study case, the solute passes through some observation wells in a few time steps, so only the few time step data should be assimilated. In general, the EnKF can significantly improve the hydraulic conductivity field by assimilating 7 to 9 observed solute concentration data, which demonstrates that the constrained EnKF is an efficient method to update the hydraulic conductivity field via assimilating the observed solute concentration available. Moreover, compared with 2601 grid elements in the study domain, the number of the observation data (7 to 9) is very small, which is very convenient in the reality. In part 4.1, From the study on updating conductivity field through assimilating hydraulic head data, it is found that 25 observed data (observation wells) are needed to capture major feature of the same conductivity field as the one used in this study. Therefore, the concentration data are more efficient than hydraulic head data. This is because the solute concentration in the middle part will be influenced by the groundwater velocity in the upstream and downstream areas. The hydraulic conductivity field will affect the hydraulic head field through the flow model, and the velocity is controlled by both the hydraulic head and conductivity fields through the Darcy's law (see Equation (2)). Thus, the groundwater velocity is directly Moreover, compared with 2601 grid elements in the study domain, the number of the observation data (7 to 9) is very small, which is very convenient in the reality. In part 4.1, From the study on updating conductivity field through assimilating hydraulic head data, it is found that 25 observed data (observation wells) are needed to capture major feature of the same conductivity field as the one used in this study. Therefore, the concentration data are more efficient than hydraulic head data. This is because the solute concentration in the middle part will be influenced by the groundwater velocity in the upstream and downstream areas. The hydraulic conductivity field will affect the hydraulic head field through the flow model, and the velocity is controlled by both the hydraulic head and conductivity fields through the Darcy's law (see Equation (2)). Thus, the groundwater velocity is directly influenced by the hydraulic conductivity field. In this study, the solute transport is dominated by the advection, so the solute concentration will be very sensitive to the hydraulic conductivity in the upstream and downstream areas, and the observed solute concentration data will contain the information of the hydraulic conductivity distribution. Furthermore, the dispersivity parameters are also related to the groundwater velocity closely in the solute transport modeling, which suggests that the dispersion part of the solute transport is also sensitive to the hydraulic conductivity indirectly. So only 7 to 9 observed solute concentration data in the middle part can be used to improve the hydraulic conductivity distribution not only in this middle part but can also in the left and right areas.
In part 4.1, it was already shown that the EnKF method by assimilating the hydraulic head data is not very efficient for the study domain with the mixed boundaries, no-flow boundary at the top and bottom borders and constant head boundary at the left and right borders. However, the updating results are very significant with the same hydraulic boundary conditions in this study. Therefore, it is can be concluded that updating a conductivity field through the constrained EnKF method via assimilating the solute concentration data is much more efficient than via assimilating the hydraulic head data.

Large initial distribution of solute concentration
To further explore the capability of the constrained EnKF to update the hydraulic conductivity field via assimilating solute concentrations, it is necessary to add more observation data. So the initial distribution of the solute area is extended from the one in Figure 12(a) to the one in Figure 12(b). In this way, the solute plume will cover a larger area and more observation wells can sample the solute concentration. The mean and variance values of the hydraulic conductivity field with and without data assimilation are plotted in Figures 13 and 14 at the first few and different assimilation steps, and compare them with the values in the real hydraulic conductivity field.
In Figure 13 and 14, it is shown that both the mean and variance values of the updated hydraulic conductivity field by the constrained EnKF become closer to the real ones than those of the initial guess field. Moreover, the RMSE K value at different assimilation time is less than that of the initial guess, which indicates that the updated hydraulic conductivity field is better than the initial guess hydraulic field. The number of the wells from which the plume can be observed during the application of the constrained EnKF by assimilating the solute concentration varies from 13 to 16.
The spatial distributions of the updated hydraulic conductivity field and the corresponding absolute hydraulic conductivity difference between the updated result and the real field at some time steps are shown in Figures 17 and 18. From the figures, it can be seen that the updated hydraulic conductivity field captures well the main feature of the real field, and the updated results are much better than the initial guess. The low hydraulic conductivity zones of the real field are together in the middle area, while it is also indicated in the updated fields as shown in Figure 17. Like the real field, the high hydraulic conductivity zones can also be seen in the right-upper and the left-middle areas of the updated ones. In comparison with the 7 th step updated results with a smaller initial plume size, shown in Figures 15(d) and 16(e), and with a larger initial plume size, Figures 17(c) and 18(c), the larger the initial plume size, the slightly better the updated results.

Large initial distribution of solute concentration
To further explore the capability of the constrained EnKF to update the hydraulic conductivity field via assim concentrations, it is necessary to add more observation data. So the initial distribution of the solute area is extended fr Figure 12(a) to the one in Figure 12(b). In this way, the solute plume will cover a larger area and more observation we the solute concentration. The mean and variance values of the hydraulic conductivity field with and without data as plotted in Figures 13 and 14 at the first few and different assimilation steps, and compare them with the values in the conductivity field.
In Figure 13 and 14, it is shown that both the mean and variance values of the updated hydraulic conductivity constrained EnKF become closer to the real ones than those of the initial guess field. Moreover, the RMSE K valu assimilation time is less than that of the initial guess, which indicates that the updated hydraulic conductivity field the initial guess hydraulic field. The number of the wells from which the plume can be observed during the appl constrained EnKF by assimilating the solute concentration varies from 13 to 16.
The spatial distributions of the updated hydraulic conductivity field and the corresponding absolute hydraulic difference between the updated result and the real field at some time steps are shown in Figures 17 and 18. From the be seen that the updated hydraulic conductivity field captures well the main feature of the real field, and the updat much better than the initial guess. The low hydraulic conductivity zones of the real field are together in the middle ar also indicated in the updated fields as shown in Figure 17. Like the real field, the high hydraulic conductivity zone seen in the right-upper and the left-middle areas of the updated ones. In comparison with the 7 th step updated r smaller initial plume size, shown in Figures   What we want to mention is that the observation wells are uniformly distributed in space, which is very easy practice, and the hydraulic conductivity field can be improved by obtaining solute concentration from these concentration data are available in all the observation wells, it is not necessary to get the data at every observation w be gotten selectively, that is to say that some observed data at these wells can be gotten at this time, and some obs some different wells (not the same observation wells) cab also be gotten at the next time. But how to determine the time is better to obtain observed data from which observation wells is an issue requiring further research.

Large initial distribution of solute concentration
To further explore the capability of the constrained EnKF to update the hydraulic conductivity field via assim concentrations, it is necessary to add more observation data. So the initial distribution of the solute area is extended fr Figure 12(a) to the one in Figure 12(b). In this way, the solute plume will cover a larger area and more observation wel the solute concentration. The mean and variance values of the hydraulic conductivity field with and without data as plotted in Figures 13 and 14 at the first few and different assimilation steps, and compare them with the values in the conductivity field.
In Figure 13 and 14, it is shown that both the mean and variance values of the updated hydraulic conductivity constrained EnKF become closer to the real ones than those of the initial guess field. Moreover, the RMSE K valu assimilation time is less than that of the initial guess, which indicates that the updated hydraulic conductivity field the initial guess hydraulic field. The number of the wells from which the plume can be observed during the appl constrained EnKF by assimilating the solute concentration varies from 13 to 16.
The spatial distributions of the updated hydraulic conductivity field and the corresponding absolute hydraulic difference between the updated result and the real field at some time steps are shown in Figures 17 and 18. From the be seen that the updated hydraulic conductivity field captures well the main feature of the real field, and the updat much better than the initial guess. The low hydraulic conductivity zones of the real field are together in the middle ar also indicated in the updated fields as shown in Figure 17. Like the real field, the high hydraulic conductivity zone seen in the right-upper and the left-middle areas of the updated ones. In comparison with the 7 th step updated r smaller initial plume size, shown in Figures   What we want to mention is that the observation wells are uniformly distributed in space, which is very easy practice, and the hydraulic conductivity field can be improved by obtaining solute concentration from these concentration data are available in all the observation wells, it is not necessary to get the data at every observation w be gotten selectively, that is to say that some observed data at these wells can be gotten at this time, and some obs some different wells (not the same observation wells) cab also be gotten at the next time. But how to determine the time is better to obtain observed data from which observation wells is an issue requiring further research. What we want to mention is that the observation wells are uniformly distributed in space, which is very easy to conduct in practice, and the hydraulic conductivity field can be improved by obtaining solute concentration from these wells. If the concentration data are available in all the observation wells, it is not necessary to get the data at every observation well. They can be gotten selectively, that is to say that some observed data at these wells can be gotten at this time, and some observed data at some different wells (not the same observation wells) cab also be gotten at the next time. But how to determine the criteria which time is better to obtain observed data from which observation wells is an issue requiring further research.  However, maybe the standard deviation of the observation error as 1% of the solute concentration measurements is too small, the standard deviation of the observation error is increased to 5% of the solute concentration measurements. The calculated results of the mean and variance of the hydraulic conductivity are shown in Figures 19 and 20. From figures, it is found that the mean and variance of the updated hydraulic conductivity field via the data assimilation method is closer to that of the real hydraulic conductivity field than the initial guess conductivity field. the spatial distribution of the updated hydraulic conductivity field at different assimilated time steps are plotted in Figure 21. Comparing Figure 21 with Figure  17, it is found that these results are very similar. So the standard deviation of the observation error does not affect the simulation results very much here from the first view. Therefore, the absolute hydraulic conductivity difference between the updated result and the real field are not plotted for simplicity.
To have a further look at the influence of the standard deviation of the observation error on the updated results, the mean, variance and spatial distribution of the updated hydraulic conductivity field at different assimilated time steps with the standard deviation of the observation error be 10% of the solute concentration measurements are also plotted in Figures  22-24. From the mean and variance values of the updated hydraulic conductivity field in Figures 22 and 23, it is seen that the simulated time become longer and the results by the data assimilation method via the constrained EnKF is much closer to the real conductivity field than the initial guess even the standard deviation of the observation error is 10% of the observation measurements. Compared with Figures 13 and 14, it can be even seen that the results with the standard deviation of the observation error be 10% of the solute concentration measurements are much better than that of 1%, and the simulated time is also longer here. This phenomenon can be explained that maybe the constraints posed to the EnKF method have introduced too much potential error to the model, which lead to the avoidance of the filter divergence to a certain degree and longer simulation time, and also the increase of the observation error will give a tradeoff between the model error and the observation error. Furthermore, it is seen that the spatial distribution of the updated hydraulic conductivity fields still similar to that in Figure 17. Therefore, it is can proved that the standard deviation of the observation error does not affect the simulation results very much in a deeper step. However, maybe the standard deviation of the observation error as 1% of the solute concentration measurements is t standard deviation of the observation error is increased to 5% of the solute concentration measurements. The calcula the mean and variance of the hydraulic conductivity are shown in Figures 19 and 20. From figures, it is found that t variance of the updated hydraulic conductivity field via the data assimilation method is closer to that of the re conductivity field than the initial guess conductivity field. the spatial distribution of the updated hydraulic conduc different assimilated time steps are plotted in Figure 21. Comparing Figure 21 with Figure 17, it is found that these res similar. So the standard deviation of the observation error does not affect the simulation results very much here from t Therefore, the absolute hydraulic conductivity difference between the updated result and the real field are no simplicity.
To have a further look at the influence of the standard deviation of the observation error on the updated results, the m and spatial distribution of the updated hydraulic conductivity field at different assimilated time steps with the standa of the observation error be 10% of the solute concentration measurements are also plotted in Figures 22-24. From t variance values of the updated hydraulic conductivity field in Figures 22 and 23, it is seen that the simulated time be and the results by the data assimilation method via the constrained EnKF is much closer to the real conductivity f initial guess even the standard deviation of the observation error is 10% of the observation measurements. Compared 13 and 14, it can be even seen that the results with the standard deviation of the observation error be 10% o concentration measurements are much better than that of 1%, and the simulated time is also longer here. This phenom explained that maybe the constraints posed to the EnKF method have introduced too much potential error to the mode to the avoidance of the filter divergence to a certain degree and longer simulation time, and also the increase of the error will give a tradeoff between the model error and the observation error. Furthermore, it is seen that the spatial d the updated hydraulic conductivity fields still similar to that in Figure 17. Therefore, it is can proved that the standard the observation error does not affect the simulation results very much in a deeper step.  The RMSE values for the updated hydraulic conductivity field under different percentage (3%, 5%, 9%, 10%) standard deviation of the observation error of the solute concentration measurements are also plotted in Figure 25. From the figure, it can be seen that the RMSE values will increase as the assimilated time increase for different perturbations to the observation error. The larger perturbations added to the observation errors, the greater the RMSE values for the updated hydraulic conductivity fields. It is seen that if the standard deviation of the observation errors is greater than 5% of the measurements, the RMSE values for the updated hydraulic conductivity have the trend to increase. So it is better to use the standard deviation  The RMSE values for the updated hydraulic conductivity field under different percentage the observation error of the solute concentration measurements are also plotted in Figure  RMSE values will increase as the assimilated time increase for different perturbatio perturbations added to the observation errors, the greater the RMSE values for the updat that if the standard deviation of the observation errors is greater than 5% of the measure hydraulic conductivity have the trend to increase. So it is better to use the standard dev than 5% of the measurements if I want to have less RMSE. Furthermore, it can also be f and RMSE values for the updated hydraulic conductivity field with the standard deviati measurements in Figures

Conclusions
In this study, the constrained EnKF is used to estimate heterogeneous hydraulic conductivity distributions by assimilating solute concentration measurements for transient solute transport in a steady state flow field. The measurement error is considered, but the model error is not. The forecast model is assumed to be known. A synthetic case of a rectangular field with the no-flow boundary at the upper and bottom borders and constant head boundary at the left and right borders is designed to investigate the capacity and effectiveness of the data assimilation method to identify a hydraulic conductivity distribution. The developed method is only demonstrated on 2-D transient solute transport with two different areas of initial instant solute source in the middle area of the field. The solute concentration measurements are used to identify the heterogeneous hydraulic conductivity field. Based on the study results, the following conclusions are obtained.