Formulas for the three spatial covariance functions used in this analysis.
Climate change is increasing variation in freshwater input and the intensity of this variation in estuarine systems throughout the world. Estuarine salinity responds to dynamic meteorological and hydrological processes with important consequences to physical features, such as vertical stratification, as well as living resources, such as the distribution, abundance and diversity of species. We developed and evaluated two space-time statistical models to predict bottom salinity in Pamlico Sound, NC: (i) process and (ii) time models. Both models used 20-years of observed salinity and contained a deterministic component designed to represent four key processes that affect salinity: (1) recent and long-term fresh water influx (FWI) from four rivers, (2) mixing with the ocean through inlets, (3) hurricane incidence, and (4) interactions among these variables. Freshwater discharge and distance from an inlet to the Atlantic Ocean explained the most variance in dynamic salinity. The final process model explained 89% of spatiotemporal variability in salinity in a withheld dataset, whereas the final time model explained 87% of the variability within the same withheld data set. This study provides a methodological template for modeling salinity and other normally-distributed abiotic variables in this lagoonal estuary.
- space-time model
- spatial covariance
- freshwater inflow
- process-based model
Estuarine salinity responds to dynamic meteorological and hydrological processes  with important consequences to physical features, such as vertical stratification, as well as living resources, such as the distribution, abundance and diversity of species [2, 3, 4, 5]. For example, relatively low mixing and subsequent salinity stratification can lead to hypoxia in areas where organically-rich sediments are not adequately re-oxygenated, causing emigration of mobile fauna and degradation of ecosystem functions [5, 6, 7, 8, 9]. Rapid salinity changes, such as those associated with large rainfall events or tropical cyclones, can cause death of postlarval stages that are sensitive to unusually low salinities , and mass seaward migration and subsequent hyper-aggregation of mobile, commercially important species that can result in (1) shifts of juveniles from primary nursery areas protected from trawling to secondary non-nursery areas vulnerable to fishing pressure , (2) overharvest of adults due to increases in fishery catchability , or (3) bias fishery-independent surveys that leads to over-inflated population abundance estimates . Thus, the need to accurately predict the spatiotemporal dynamics of salinity is unprecedented. The specific goals of this study were to: (1) evaluate several statistical models to hindcast and forecast salinity in the second largest estuary and largest lagoonal estuary in the United States—Pamlico Sound, North Carolina, USA, and (2) assess salinity observations, predictions, and standard errors under five hydrologic scenarios characteristic of historic and future climate changes.
Pamlico Sound (PS) is a relatively shallow estuary with a mean depth of 4 m and a maximum depth of 7 m. PS circulation is dominated by wind-driven currents and freshwater input [13, 14]. Seasonal cyclonic storms are also an important climatological component of the PS system. Since 1996, over three tropical storms or hurricanes have passed within 300 km of the North Carolina coast per year . Given the important role that salinity plays in the abiotic and biotic system components of estuaries, and the likelihood that global climate change will increase the frequency of extreme weather events (e.g., floods, droughts, hurricanes—[9, 15, 16]), there is a critical need for models that can accurately forecast spatiotemporal variation in salinity (e.g., ). A recent review by Iglesias et al.  highlights the strengths of applying numerical modeling tools to characterize morpho-hydrodynamic processes in estuarine and coastal systems.
1.1 Statistical models to predict dynamic salinity
Producing retrospective salinity maps based on observational data does not require a statistical model based on hydrological mechanisms that affect salinity; it is possible to perform individual spatial interpolations for each time period of interest using an ordinary kriging model or a universal kriging model with a simple spatial trend. Predicting salinity under a hypothetical set of conditions, however, does require a model that can ‘learn’ about hydrological mechanisms based on retrospective data (e.g., [20, 21]). Thus, the more comprehensive goal of this study was to produce retrospective maps of salinity by developing a space-time statistical model in which the mean function represents the hydrological mechanisms that affect salinity, and a spatial covariance function makes up the difference between the observed salinity data and the mean function’s salinity prediction.
To create such a model, we constructed explanatory variables that accounted for the effect of riverine freshwater inflow (FWI), distance to inlet sources of oceanic saltwater, and hurricane incidence on salinities at different locations in PS. We used a forward-selection process to choose which of these variables to keep in the model. Standard errors based on the covariance function allowed for assessment of strengths and weaknesses of the representation of the hydrology in the mean function. Since an additional goal of this study was to provide a template for researchers to build process-based models of normally-distributed estuarine variables, we considered only models that could be fit using procedures in the SAS® software package, yet can be adopted to R-statistical software.
Other process-based models of PS salinity in the literature—all of which are differential-equation-based deterministic models—provided important insights into how different variables influenced spatiotemporal salinity variation in PS ([22, 23], and others). However, these models ultimately lacked the spatial resolution and/or coverage of the entire area of interest of this study, and none quantified uncertainty at every space-time prediction location. For example, Xu et al.  predicted surface and bottom salinity, and temperature at 30-second intervals over a spatial grid with varying cell size (200–800 m2) in the Pamlico River Estuary (PRE), a PS tributary, using a customized extension of the Environmental Fluid Dynamics Code  to incorporate FWI from major tributary rivers, as well as tide and wind effects on circulation. Although this model incorporated environmental variation and produced salinity predictions suitable to assess long-term space-time trends, the PRE makes up only 18% of the area of PS. Predicting salinity across the entire PS using this model would require spatial domain expansion and re-parameterization, and such extensions are not planned (J. Lin, NC State University, pers. comm. on behalf of Xu et al. ).
Though we are unaware of researchers that have constructed space-time statistical models of salinity in PS, there are examples of applying statistical models for spatial prediction of salinity in other estuaries. For example, Rathbun  used independent multiple linear regression models with spatially-correlated errors to predict salinity and dissolved oxygen (DO) in Charleston Harbor, SC over a two-week time period in 1988 as a function of spatial coordinates and distance to the estuary mouth. Chehata et al.  performed three-dimensional spatial interpolation of salinity and DO measurements in Chesapeake Bay. Qiu and Wan  developed a salinity model based on time series analyses of salinity data for the Caloosahatchee River Estuary, Florida, USA. The structure of their model consisted of an autoregressive term representing the system persistence and an exogenous term accounting for physical drivers including freshwater inflow, rainfall, and tidal water surface elevation that cause salinity to vary. The model was calibrated and validated using up to 20 years of measured data collected they found that the time series model offers comparable or superior performance compared with its 3-D, numerical counterpart. This model has been used as a tool for water resources management projects relating to ecosystem restoration and water control in south Florida . Similarly, Ross et al.  examined the response of salinity in the Delaware Estuary, USA to climatic variations using statistical models and long-term (1950-present) records of salinity from the U.S. Geological Survey and the Haskin Shellfish Research Laboratory. The statistical models included non-parametric terms and were robust against auto-correlated and heteroscedastic errors. After using the models to adjust for the influence of streamflow and seasonal effects on salinity, several locations in the estuary showed significant upward trends in salinity. Insignificant trends are found at locations that are normally upstream of the salt front. The models indicate a positive correlation between rising sea levels and increasing residual salinity, with salinity rising from 2.5 to 4.4 psu per meter of sea-level rise. The results suggest that continued sea-level rise in the future will cause salinity to increase regardless of any variation in fresh water influx . Urquhart et al.  present the results of multiple statistical models that predicted daily, gridded surface salinity at 1 km resolution across Chesapeake Bay, USA as a function of surface reflectance estimates of salinity from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS), onboard the Aqua platform satellite. Eight statistical methods were tested, and sea surface salinity was accurately predicted via remote sensed products with an accuracy that was more than sufficient for many physical and ecological applications .
None of these previous studies, however, attempted to explicitly represent the hydrological processes by which fresh and saltwater mixing affects estuarine salinity. In this paper, we describe the development of candidate explanatory variables to represent mechanisms affecting PS salinity and how that development led to consideration of two fundamentally different mean functions. We then describe the forward selection process by which candidate variables were chosen to be retained in the models, and how candidate covariance functions were selected to pair with each mean function. Next, we examined maps of salinity observations, predictions, and standard errors under five hydrologic scenarios, analyzed these results, and provided overall implications of the findings.
2. Methods and results
2.1 Data and notation
We used bottom salinity values measured by the North Carolina Division of Marine Fisheries (NC DMF) Pamlico Sound Trawl Survey Program 195 (
The temporal domain contains
The fresh water influx (FWI) data represented watersheds of the Neuse, Pamlico, Roanoke, and Chowan rivers, which comprise 80% of the land draining into PS . FWI observations were average daily river discharge rates collected by one US Geological Survey (USGS) gauge station per tributary (Figure 1): Neuse River (NR) station 02089500 in Kinston; Tar-Pamlico River (TPR) station 02083500 in Tarboro; Roanoke River (RR) station 02080500 in Roanoke Rapids; and Ahoskie Creek (AC) station 02053500 in Ahoskie, which gauges Chowan River inflow. Discharge rates in ft3/s for every day during the time domain (7305 days) were downloaded from the USGS Water Resources website for the state of North Carolina (USGS 2009) and were converted to m3/s. For each river, the gauge chosen was the furthest downstream gauge that recorded data over the entire temporal domain.
2.2 Candidate explanatory variables
The creation of explanatory variables reflects the modeling context—the objectives, the geographical features of the spatial domain, and the space-time coverage and resolution of the data—but the general thought process can be modified by other researchers in a different context. We index the term
2.3 Freshwater influx indices
Sixty-one days is the average freshwater residence time of the four major rivers flowing into PS [30, 31, 32], accounting for the temporal lag between the upriver gauging of freshwater and the delivery of that water to
Since freshwater from river
The coordinates of each gauge station were used to calculate distance because the gauge was the location of the and observations. Like all distances in this study, represents distance “as the crow flies” as opposed to water-path distance. Though the superiority of using water-path distance when modeling water-quality variables in stream and estuarine systems seems intuitive, results from studies that compare these two distance metrics are inconclusive. For example, Gardner et al.  found more accurate predictions of stream temperatures when models incorporated water-path distance, but only when this distance was modified and weighted by stream order. Peterson and Urquhart  predicted various nutrient concentrations in 17 Maryland rivers and concluded that using water-path distance works well when modeling certain nutrients, but not others, and that the crow-flies distance appeared to be the most suitable distance measure overall. Comparing the accuracy of predictions of water quality parameters generated from two different multiple linear regression models containing the explanatory variable “distance to inlet mouth”, Little et al.  found that predictions from models using water-path distance were no more accurate than those from models using crow-flies distance. None of these studies demonstrated marked predictive improvement using water-path distance, therefore we used crow-flies distance from each of the four river gauges to each of 2100 sample stations and over 6000 prediction locations.
The plot in Figure 2 of against Roanoke River typifies the relationships between salinity and each of the eight and variables. Larger values of the metric are associated with smaller values of salinity, but groups of observations have different slopes. Closer examination revealed that the different groups corresponded to different time periods. We attempted to account for the different slopes in two ways, first by considering the 28 pair-wise interactions among the andand second by considering 39 time-period indicator variables defined as
(A fortieth indicator variable was not used because it would create a non-full-rank design matrix, and the effect for the fortieth time period can be derived using the intercept.) This latter consideration led to the creation of two distinct mean function models: the
2.4 Saltwater mixing and tidal signal
Although salinity on the inner-continental shelf of the U.S. Southeast Atlantic coast exhibits some spatial variability near PS , we follow Xie et al.  and assume constant open ocean salinity. This assumption allows for modeling the effect of ocean water mixing as a function of only the distance to inlet, as opposed to distance interacting with the salinity of the ocean water, from each spatial location in the sound to each of the major PS inlets: Oregon, Hatteras, and Ocracoke. Exploratory analyses reveal that models using a single variable (distance to the nearest inlet) rather than three variables (distances to each of the three inlets), explains the same amount of variability in salinity when other explanatory variables are also included. Therefore, we consider for inclusion in subsequent models the variable
2.5 Wind speed and direction
A prevailing wind field that is north/northeast from March to August and south/southwest from September to February is the primary driver of currents in PS . Thus, wind speed and direction were incorporated into the modeling process using the categorical variable
is used to examine the effects of seasonal wind patterns on the spatial distribution of salinity.
2.6 Evaporation and direct precipitation
Holding other factors constant, sound-wide salinity in time periods that experience more evaporation of water from the surface of PS would likely be higher than those in time periods that experienced less evaporation, but no evaporation data were available for the space-time domain of interest. Salinity in time periods for which there was more direct precipitation into should be lower than those in lower-precipitation time periods, however precipitation data were only available at two weather stations on the edges of PS from which information about individual spatial locations within PS would be difficult to infer. Giese et al.  found that direct precipitation constitutes only 8% of mean PS freshwater input, thus the signal from riverine FWI should dominate in explaining salinity variability. Therefore, we did not include evaporation or direct precipitation variables in our models.
2.7 Spatial coordinates
Estuarine salinity varies over space such that functions of spatial coordinates might explain variability in salinity not accounted for by the other variables. Scatterplots of salinity versus easting and northing suggested that salinity is quadratic in the former and cubic in the latter. The quadratic function of easting can be explained by examining a west-to-east path through PS along the 35° 16′ N parallel (A in Figure 1): salinity should initially increase, reach a maximum at the saltwater plume near Ocracoke and Hatteras Inlets, and decrease again on the other side of the plume in the waters on the western shore of Hatteras Island near Buxton, NC. The cubic function of northing is best described by examining a north-to-south path along longitude of 75° 42′ W (B in Figure 1), where salinity should increase traveling south from Albemarle Sound, reach a local maximum near Oregon Inlet, decrease continuing past the saltwater inlet plume, and increase again as the Hatteras Inlet saltwater plume is reached. Thus, , , , , and the interactions , , , and are considered as explanatory variables. All coordinates are centered before they are squared or cubed by subtracting the mean over all observations.
Hurricanes can rapidly introduce large volumes of freshwater to estuaries via riverine influx, push large volumes of saltwater in through inlets via storm surge, and alter circulation patterns through abrupt changes in wind speed and direction [7, 10]. Hurricanes can also open new inlets to PS, which can alter current flow and increase saltwater intrusion . The variable should capture variability in salinity due to hurricane-produced FWI. Three additional variables may account for non-FWI-related variability in salinity due to hurricane passage. These variables are unique to a given time period
2.9 Variable selection
Section 3 identifies 46 candidate explanatory variables for the process model mean function: and (8), plus selected pair-wise interactions (explained below) (24); spatial coordinates, their powers, and specified interactions (9); ; ; and hurricane variables , , and
2.10 Process model
The results of eight separate ordinary least squares linear regression models of salinity make up the rows Table 1. The first five consist of an intercept and a single explanatory variable: , , ,
Adjusted R2 is a modification of R2 that penalizes the number of explanatory variables. While R2 increases as more variables are added to a model, adjusted R2 increases only if the added variable decreases the error sum of squares enough to offset the loss in error degrees of freedom.
The model with the long-term freshwater influx indices had the largest adjusted R2 at 0.38, followed by the model with the distance from the nearest inlet (0.34), and the model with the short-term FWI indices (0.27). None of the other four models explained more than 5% of the variability in salinity. We chose the model with the long-term freshwater influx indices as the base upon which to build the mean function.
To this base model we added the variable since the model containing this variable had the second-best performance, thus beginning a forward-selection process. Each time we added a variable or set of variables to the model, we kept it in the model if the new adjusted R2 exceeded the old. Variables from the seven initial models were then added in order of decreasing adjusted R2. Following this procedure, the mean trend model grew to contain 10 variables—, , , and —with adjusted R2 0.57.
Because the effect of FWI from one river on a given location in PS could change based on the FWI from another river during the same time period, we evaluated the addition of the 6 pair-wise interactions among the four , the 6 pair-wise interactions among the four , and the twelve interactions between the and the , excluding interactions of one river’s with its own . Despite a decrease in error degrees of freedom by 24, adjusted R2 was 0.66, so the set was retained.
Spatial coordinate variables were evaluated last in groups according to their polynomial order, with squared and cubic terms added before interactions. We considered these variables last because we wanted to include them only if they explained additional variability in the response after more interpretable variables were included. We determined that including all variables except increased the adjusted R2. The final process model mean function thus had an adjusted R2 of 0.73 and included the following: ; ; ; ; ; ; ; , , , , and interactions , , and .
2.11 Time model
To build the time model, we followed the same procedure described above, selecting for the base of the mean function a set of time period indicator variables because a linear regression of on these variables had an adjusted R2 of 0.41 (Table 1). (Note that such a model is equivalent to fitting an ANOVA model using the time periods as groups.) Again, we added other sets of explanatory variables in order of decreasing adjusted R2. Before evaluating interactions, the mean trend time model had an adjusted R2 of 0.78 and contained 48 variables: , , , and . When interactions among the and thewere added, the model was not full rank (not all columns in the design matrix were linearly independent). Because we created this second model to evaluate these interactions, we removed the , the most recent variable addition, to include them. This new model, including the interactions, became the base since its adjusted R2 (0.89) was larger than that of the previous mean trend time model (0.78). After investigating spatial coordinate variables, the final mean trend time model (below) had an adjusted R2 of 0.91 and included 204 variables: , , , , , ,, and . To avoid confusion later, note that the adjusted R2 of 0.73 for the process model and 0.91 for the time model were based on fitting each model to the full dataset. In the next section, we report R2 (not adjusted R2) based on a cross-validation dataset.
2.12 Modeling spatially correlated error
The variable selection analyses above used ordinary least squares (OLS) regression to model salinity as a function of explanatory variables. That model can be written as
where represents the value of the explanatory variable at space–time location , for , where is the total number of explanatory variables. represent the intercept and regression coefficients, and deviations from the mean trend are assumed to be independent and identically distributed with mean 0 and variance . The model can be equivalently written as , where is the vector containing the values of the explanatory variables at space-time location , and represents the vector of regression coefficients. The same model written in matrix form is
where bold print indicates vectors so that ,
Rarely, however, does the assumption of independent and identically distributed errors hold for observations of natural phenomena associated with locations in space and time. While it is intuitive that values of salinity located close together in space should be similar, it is also generally the case that the deviations from the mean function of observations located close together are similar. That similarity is referred to as spatial covariance, and the spatial covariance between deviations from the mean trend at two locations within the same time period can be modeled as a function of the distance separating them. Including in the overall model both a deterministic mean function and a spatial covariance function allowed predictions of salinity at locations where there were no observations.
Valid covariance functions ensure that the covariance matrix will be positive definite, which, in turn, ensures that variances will be non-negative. Each covariance function has a shape defined by a range parameter, a partial sill, and sometimes a nugget effect. Appendix Table A1 gives formulas for determining spatial covariance according to the exponential, Gaussian, and spherical covariance functions, each with and without a nugget effect. Figure 3 shows an example of the spherical covariance function—the solid red line—fit to a sample covariogram—the blue dots—of deviations from the process model for June 1994. The range parameter—for the exponential and Gaussian covariance functions, for spherical—is related to the distance that must separate two sites before their deviations are independent, where independence corresponds to a covariance of zero or virtually zero. In Figure 3, the range is approximately 10 km. In the absence of a nugget effect, the partial sill is the value of the covariance at distance zero—that is, it is the variance of deviations from the mean—and in Figure 3 this value is approximately 2.5 squared units of salinity. In the presence of the nugget , there is a discontinuity in the covariance function at distance zero, so that the intercept is slightly greater than the limit of the smooth part of the function as distance approaches zero. In this case, the variance of the deviations is equal to the sum of the partial sill and nugget: . It may be the case that variance is higher when values of deviations are higher. Since covariance parameters represent physical quantities that may change over time, we used the capabilities of SAS® Proc Mixed to allow a different partial sill and range parameter for each time period.
Model (3), modified to include spatial correlation, becomes
where represents the block-diagonal covariance matrix
where zero matrices for off-diagonal elements indicate that deviations in one time period are not correlated with those in another. We make this assumption partially due to the long time span separating June and September, but also because no SAS® procedure has the capacity to model such space-time correlation while at the same time allowing every time period to have different spatial covariance parameters and allowing a mean function to be fit. Diagonal elements are individual spatial covariance matrices for each time period with dimensions , and elements representing the spatial covariance between sites
Understanding how predictions of salinity and prediction standard errors are generated from this model will make the results and analysis in Sections 6 and 7 easier to understand. To predict salinity at space–time locations where it is not observed, the following results are needed. Superscripts differentiate between locations where salinity is observed and unobserved. Model (4), represents observations of salinity (by virtue of the dimensions of the vectors and matrices), but we model salinity observations and unobserved values of salinity at other space-time locations using a similar model, the joint distribution of unobserved and observed salinity, given by
Here, represents the vector of salinity observations, and, letting represent the number of space-time locations at which we want to predict salinity, represents the vector of unknown values of salinity at these locations. All the symbols in (5) have the same meaning as in (4), except for the distinction between observed and unobserved locations. The matrix contains the cross-covariance between observed and unobserved locations. Thus,
and . The elements also come from the spatial covariance function.
Let represent the vector that contains all spatial covariance parameters for every time period—either 80 or 81 parameters depending on whether a nugget effect is used. Standard normal distribution theory gives the distribution of unobserved salinity conditioned on knowing the values of observations and all of the parameter values:
The pipe symbol (|) means “given” or “conditioned on knowing the values of” the terms following the pipe symbol. The terms before the comma represent the mean of the multivariate normal distribution, which is used for the salinity prediction, and the terms after the comma represent the variance-covariance matrix, which is used for prediction standard errors. Salinity predictions are the sum of the mean trend, , and the spatial interpolation of observation deviations from the mean trend, . If the deviations are large for a given time period, then the partial sill will be large for that time period, so that diagonal elements of and will be large. For a given location, the prediction standard error is the diagonal element of the matrix . If the diagonal elements of and are large, then the diagonal elements of are small, and the prediction standard error is a large number minus a small number. That is, the prediction standard error will be high for time periods in which observation deviations from the mean function are large. When observation deviations from the mean trend are small, the reverse is true, and prediction standard errors tend to be low for that time period.
The salinity predictor
Salinity predictions are better when a spatial covariance function is combined with either mean function. For example, of the time models, the exponential covariance function with a nugget produced predictions with the lowest RMSE (2.1), slope closest to one (0.92), and intercept closest to zero (1.55). Comparing process models, the exponential and spherical, each with and without a nugget, performed equally well, and better than the time models. To select the best model from this group of four, we examined statistics based on how well the model fit the base dataset. The model with an exponential covariance function with a nugget had the lowest AIC (7580.0) and BIC (7711.7) and was thus chosen as the final model. It explained 89% of variability in the test dataset and generated predictions with RMSE 2.0.
Next, we fit this model using the full dataset, and produced retrospective maps of salinity predictions and standard errors at evenly spaced 1 nmi (1.85 km) increments for each time period. Forty-two salinity predictions—less than 0.1% of the total number of predictions—were negative and set to zero.
2.13 Examining freshwater influx scenarios
To examine variations in the spatial distribution of salinity under drought, average, and flood conditions, we classified freshwater influx from each river within each time period (and ) as LOW if it fell below the 25th percentile of observed FWI across all time periods, MODERATE if it fell between the 25th and 75th percentiles, HIGH if it fell between the 75th and 95th percentiles, and FLOOD if it fell above the 95th percentile. Next, we classified one-week and two-month FWI for the entire time period as LOW (or HIGH) if at least two rivers exhibited low (or high) inflow, MODERATE if at least three rivers exhibited moderate inflow, and FLOOD if at least one river exhibited extremely high (>95th percentile) inflow. These classifications are mutually exclusive, though some of the 40 time periods did not fall into any of them. The first two columns of Table 3 list the 16 combinations of classifications, and the third column shows the classification and salinity rank for each time period. Time periods were ranked 1–40 by mean predicted salinity (1 = highest mean salinity; 40 = lowest).
The lower right pane of Figure 4 (as well as Figure 5A and B
This spatial trend in SEs is further illustrated by comparing locations of high SE in the same time period, which are also consistent over time. High SEs occur between the mouths of the Neuse and Pamlico Rivers and along a margin of varying width following the outline of the Outer Banks, areas within which sampling does not occur (Figure 1). We note here that because SEs increase as distance from sample site increases, we chose to generate only interpolated (and not extrapolated) salinity predictions. In June 2005, as in all other time periods, predictions were generated only for locations within
Though June 2002 salinity observations have a larger mean and greater variability, the majority of prediction standard errors are less than 1.01. In June 1999, however, SEs fell between 1.01 and 1.81 at all prediction locations except those that were very close to observations. This result shows that the conditions affecting salinity in PS were better represented by the mean function in June 2002 than they were in June 1999.
Because water exchange between lagoonal estuaries and the open ocean can be relatively restricted, there is a relatively high potential in systems like PS for changes in precipitation patterns and storm frequencies associated with global climate change to result in changes in salinity patterns and subsequent ecosystem alterations. Changes in precipitation will affect the amount and timing of river flow, which will impact nutrient cycling, estuarine flushing rates, and salinity. Increased storm activity may open new inlets, which would alter current flow, increase tidal action, and allow a greater influx of seawater that carries with it both different chemical signals and mobile species. Salinity is therefore a practical estuarine characteristic to use to study the impacts of these changes, as both effects mentioned above include enhanced water exchange that impacts overall estuarine salinity content [43, 44].
We developed and evaluated two statistical models, using the best model to hindcast salinity in PS. The process mean function combined with the exponential covariance with a nugget explained 89% of the variability in a test dataset with a RMSE of 2.0 and produced relatively accurate retrospective salinity maps under a wide range of freshwater influx and system-state scenarios. Much of this accuracy was due to allowing the range and partial sill parameters of the spatial covariance to be time-period specific. We then examined variations in the spatial distribution of salinity under varying freshwater influx (FWI) conditions such as drought, average FWI, and flood conditions, and identified the following patterns. In years with moderate FWI, the salinity gradient increased from west to east in PS as expected, and was highest adjacent to the major inlets, with highest salinities near Oregon Inlet. In years with low FWI indicative of drought conditions, the overall mean and variance in salinity increased in PS. In years with floods, salinities displayed a high degree of spatial variation, with salinities being lower near the tributaries as expected, yet also displaying occasional sharp increases in salinity near inlets due to influx of ocean water into PS via the major inlets.
3.1 Improvements to model predictions
For retrospective prediction purposes, model improvements could focus on improvements to the mean trend, the covariance, or both, and such improvements could be evaluated using the test dataset. A reasonable goal might be to increase
Though alone explained a third of the variability in salinity over all time periods, variability in inlet-plume size across Figures 3
Differences in both salinity values and SE estimates between early-stage drought during June 1999 and late-stage drought during June 2002 suggest accounting for effects of FWI over a longer duration than 61 days. Doing so might explain differences in salinity patterns seen in time periods with similar one-week and two-month FWI conditions. Molina  calculated an 11 month mean residence time for freshwater in PS. We could incorporate this effect by adding a third freshwater influx index to the mean function or by adding an autoregressive component to the model so that salinity in a given time period was a function of mean salinity in the previous time period. The first option would be tedious from a data-manipulation standpoint, but much easier from a mathematical model-fitting standpoint, because SAS® Proc Mixed could still be used. The second option necessitates a change in the covariance function, as we can no longer assume that salinity deviations from the mean function at a given space-time point were independent in time. This second option would also require specialized hand-written code, as no current SAS® Proc allows such a dynamic space-time model to be fit.
Differences in salinity patterns between June 1999 and June 2002, our two low-to-low FWI time periods, could be attributed to differences in FWI from the Roanoke River, one of the two northern rivers whose connection to PS is indirect. This observation warrants further investigation into the calculation of the FWII indices; namely, an investigation of water-path distance as a possible substitute for crow-flies distance between river gauges and sites in PS. Although we did not find a study that demonstrated marked predictive improvement using water-path distance under all circumstances ([36, 46], and others), it would be interesting in future work to compare differences in PS salinity predictions using both distance methods. Recall that Gardner et al.  noted more accurate predictions of stream temperatures when models incorporated water-path distance, but only when this distance was further modified and weighted by stream order. It might be the case that water-path distance out-performs crow-flies distance in predicting estuarine salinity when care is taken to make all explanatory variables as meaningful as possible. Development of an automated procedure for calculating water-path distances similar to the one used in  would make such an investigation more practically feasible.
Another way to achieve flexibility while still specifying a single covariance function for every time period, would be to allow an anisotropic covariance function. Geometric anisotropy allows for different range parameters in different directions. For example, if the water current in PS were flowing directly north-to-south, two points separated by a north-to-south vector might have more similar values of salinity than would two points separated by a west-to-east vector of the same length. Fortunately, the parameterization of a geometric anisotropic covariance function is such that if anisotropy were unnecessary, the parameters would take values that effectively result in an isotropic covariance function. The cost of this added flexibility is the need to estimate two additional covariance parameters per time period, for a total of 80 additional parameters. Computation time might be less here than for Matern, since anisotropic covariance functional forms are less complex.
We created a statistical model combining a process mean function with an exponential spatial covariance function with a nugget to predict salinity in a lagoonal estuary. This model can generate predictions of bottom salinity for Pamlico Sound, NC that are more spatially-resolute than any previous bottom salinity predictions encountered in the literature for this system. The salinity maps produced using the model are useful for researchers to build an intuitive understanding of salinity dynamics under PS conditions covered by these 40 time periods. Salinity predictions can also be used to inform future analyses including, but not limited to, the examination of historical distribution patterns of estuarine species relative to salinity variability and the prediction of salinity changes under various global climate change scenarios.
We thank the North Carolina Division of Marine Fisheries and the United States Geological Survey for providing datasets used in this study. We also thank editor A. Manning for helpful comments that improved the manuscript. Funding for this project was provided by the Environmental Defense Fund (Program Manager Pam Baker), North Carolina Coastal Recreational Fishing License Program (Grant No. 2010-H-004), North Carolina Sea Grant (R12-HCE-2) and the National Science Foundation (OCE-1155609) to D. Eggleston. A. Nail was supported as a VIGRE Postdoctoral Fellow by NSF grant DMS 0354189.
See Table A1.
|Name of covariance function|
|With nugget effect||Without nugget effect|
|Explanatory variable or set of explanatory variables||Adj R2|
|Model type||−2 log likelihood||AIC||BIC||RMSE (psu)||Slope/β1||Intercept/β0||R2|
|Exponential + ||7424.0||7580.0||7711.7||2.0||0.96||0.96||0.89|
|Gaussian + ||7532.0||7686.0||7816.0||2.1||0.94||1.15||0.87|
|Spherical + ||7571.6||7727.6||7859.3||2.0||0.96||0.93||0.89|
|Exponential + ||6217.1||6367.1||6493.7||2.1||0.92*||1.55*||0.87|
|Gaussian + ||6214.0||6366.0||6494.4||2.2||0.91*||1.90*||0.86|
|Spherical + ||6201.3||6357.3||6489.1||2.2||0.91*||1.86*||0.86|
|Time periods and mean predicted salinity rank (mmyy, r)|
|Moderate||(0687, 28), (0689, 27)|
|High||Flood||(0903*, 39), (0690, 29)|
|Moderate||(0698, 38), (0693, 36), (0697, 35)|
|High||(0696, 26), (0900, 24)|
|Low||(0997, 15), |