Formulas for the three spatial covariance functions used in this analysis.

## Abstract

Climate change is increasing both the variability of freshwater input to estuarine systems throughout the world and the intensity of that variability. Estuarine salinity responds to dynamic meteorological and hydrological processes, with important consequences for physical features, such as vertical stratification, and for living resources, such as the distribution, abundance, and diversity of species. We developed and evaluated two space-time statistical models to predict bottom salinity in Pamlico Sound, NC: (i) process and (ii) time models. Both models used 20 years of observed salinity and contained a deterministic component designed to represent four key processes that affect salinity: (1) recent and long-term freshwater influx (FWI) from four rivers, (2) mixing with the ocean through inlets, (3) hurricane incidence, and (4) interactions among these variables. Freshwater discharge and distance from an inlet to the Atlantic Ocean explained the most variance in dynamic salinity. The final process model explained 89% of spatiotemporal variability in salinity in a withheld dataset, whereas the final time model explained 87% of the variability within the same withheld dataset. This study provides a methodological template for modeling salinity and other normally distributed abiotic variables in this lagoonal estuary.

### Keywords

- estuaries
- space-time model
- spatial covariance
- freshwater inflow
- process-based model
- salinity

## 1. Introduction

Estuarine salinity responds to dynamic meteorological and hydrological processes [1], with important consequences for physical features, such as vertical stratification, and for living resources, such as the distribution, abundance, and diversity of species [2, 3, 4, 5]. For example, relatively low mixing and the resulting salinity stratification can lead to hypoxia in areas where organically rich sediments are not adequately re-oxygenated, causing emigration of mobile fauna and degradation of ecosystem functions [5, 6, 7, 8, 9]. Rapid salinity changes, such as those associated with large rainfall events or tropical cyclones, can kill postlarvae that are sensitive to unusually low salinities [10]. Such changes can also trigger mass seaward migration and subsequent hyper-aggregation of mobile, commercially important species, which can result in (1) shifts of juveniles from primary nursery areas protected from trawling to secondary non-nursery areas vulnerable to fishing pressure [11], (2) overharvest of adults due to increases in fishery catchability [12], or (3) biased fishery-independent surveys that inflate population abundance estimates [12]. Accurate prediction of the spatiotemporal dynamics of salinity is therefore critical. The specific goals of this study were to: (1) evaluate several statistical models to hindcast and forecast salinity in Pamlico Sound, North Carolina, USA, the second largest estuary and largest lagoonal estuary in the United States, and (2) assess salinity observations, predictions, and standard errors under five hydrologic scenarios characteristic of historic and future climate changes.

Pamlico Sound (PS) is a relatively shallow estuary with a mean depth of 4 m and a maximum depth of 7 m. PS circulation is dominated by wind-driven currents and freshwater input [13, 14]. Seasonal cyclonic storms are also an important climatological component of the PS system: since 1996, more than three tropical storms or hurricanes per year have passed within 300 km of the North Carolina coast [10]. Given the important role that salinity plays in the abiotic and biotic components of estuaries, and the likelihood that global climate change will increase the frequency of extreme weather events (e.g., floods, droughts, and hurricanes [9, 15, 16]), there is a critical need for models that can accurately forecast spatiotemporal variation in salinity (e.g., [17]). A recent review by Iglesias et al. [17] highlights the strengths of numerical modeling tools for characterizing morpho-hydrodynamic processes in estuarine and coastal systems. *Numerical methods* encompass a wide variety of models and techniques, such as finite element, finite difference, finite volume, and Eulerian-Lagrangian models (e.g., [17, 18, 19]). Complex, three-dimensional numerical models used to simulate and forecast dynamic estuarine salinity can require effort and computation time beyond the capabilities of many local management agencies. These agencies sometimes need a quick turnaround for long-term simulations or short-term forecasts of estuarine salinity conditions, which location-specific statistical models could provide. Therefore, we set out to (1) develop and evaluate two types of *statistical models* of bottom salinity in PS, and (2) apply the best models to produce sound-wide retrospective maps of bottom salinity based on observational data.
Bottom (as opposed to surface) salinity was chosen as the variable of interest because it characterizes habitats of mobile demersal species that are important members of benthic food webs, and that are the targets of valuable commercial and recreational fisheries. Hereafter, the term ‘salinity’ will always refer to bottom salinity unless otherwise noted.

### 1.1 Statistical models to predict dynamic salinity

Producing retrospective salinity maps based on observational data does not require a statistical model based on hydrological mechanisms that affect salinity; it is possible to perform individual spatial interpolations for each time period of interest using an ordinary kriging model or a universal kriging model with a simple spatial trend. Predicting salinity under a hypothetical set of conditions, however, does require a model that can ‘learn’ about hydrological mechanisms based on retrospective data (e.g., [20, 21]). Thus, the more comprehensive goal of this study was to produce retrospective maps of salinity by developing a space-time statistical model in which the mean function represents the hydrological mechanisms that affect salinity, and a spatial covariance function makes up the difference between the observed salinity data and the mean function’s salinity prediction.

To create such a model, we constructed explanatory variables that accounted for the effects of riverine freshwater inflow (FWI), distance to inlet sources of oceanic saltwater, and hurricane incidence on salinity at different locations in PS. We used a forward-selection process to choose which of these variables to keep in the model. Standard errors based on the covariance function allowed us to assess the strengths and weaknesses of the mean function's representation of the hydrology. Because an additional goal of this study was to provide a template for researchers building process-based models of normally distributed estuarine variables, we considered only models that could be fit using procedures in the SAS® software package and that can also be adapted to the R statistical software.

Other process-based models of PS salinity in the literature, all of which are differential-equation-based deterministic models, provided important insights into how different variables influenced spatiotemporal salinity variation in PS ([22, 23], and others). However, these models lacked the spatial resolution and/or coverage of this study's entire area of interest, and none quantified uncertainty at every space-time prediction location. For example, Xu et al. [24] predicted surface and bottom salinity and temperature at 30-second intervals over a spatial grid with varying cell size (200–800 m²) in the Pamlico River Estuary (PRE), a PS tributary, using a customized extension of the Environmental Fluid Dynamics Code [25] that incorporated FWI from major tributary rivers, as well as tide and wind effects on circulation. Although this model incorporated environmental variation and produced salinity predictions suitable for assessing long-term space-time trends, the PRE makes up only 18% of the area of PS. Predicting salinity across the entire PS using this model would require spatial domain expansion and re-parameterization, and such extensions are not planned (J. Lin, NC State University, pers. comm. on behalf of Xu et al. [24]).

Although we are unaware of any space-time statistical models of salinity in PS, statistical models have been applied for spatial prediction of salinity in other estuaries. For example, Rathbun [26] used independent multiple linear regression models with spatially correlated errors to predict salinity and dissolved oxygen (DO) in Charleston Harbor, SC, over a two-week period in 1988 as a function of spatial coordinates and distance to the estuary mouth. Chehata et al. [27] performed three-dimensional spatial interpolation of salinity and DO measurements in Chesapeake Bay. Qiu and Wan [20] developed a salinity model based on time series analyses of salinity data for the Caloosahatchee River Estuary, Florida, USA. Their model consisted of an autoregressive term representing system persistence and an exogenous term accounting for the physical drivers that cause salinity to vary, including freshwater inflow, rainfall, and tidal water surface elevation. The model was calibrated and validated using up to 20 years of measured data, and the authors found that it performed comparably to, or better than, its 3-D numerical counterpart. This model has been used as a tool for water resources management projects relating to ecosystem restoration and water control in south Florida [20]. Similarly, Ross et al. [21] examined the response of salinity in the Delaware Estuary, USA, to climatic variations using statistical models and long-term (1950 to present) salinity records from the U.S. Geological Survey and the Haskin Shellfish Research Laboratory. The statistical models included non-parametric terms and were robust to autocorrelated and heteroscedastic errors. After the models were used to adjust for the influence of streamflow and seasonal effects, several locations in the estuary showed significant upward trends in salinity, whereas trends were not significant at locations that are normally upstream of the salt front. The models indicated a positive correlation between rising sea level and increasing residual salinity, with salinity rising by 2.5 to 4.4 psu per meter of sea-level rise, suggesting that continued sea-level rise will increase salinity regardless of variation in freshwater influx [21]. Urquhart et al. [28] presented the results of multiple statistical models that predicted daily, gridded surface salinity at 1 km resolution across Chesapeake Bay, USA, as a function of surface reflectance data from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the Aqua satellite. Eight statistical methods were tested, and remotely sensed products predicted sea surface salinity with an accuracy more than sufficient for many physical and ecological applications [28].

None of these previous studies, however, attempted to explicitly represent the hydrological processes by which the mixing of fresh water and saltwater affects estuarine salinity. In this paper, we describe the development of candidate explanatory variables to represent mechanisms affecting PS salinity and how that development led to the consideration of two fundamentally different mean functions. We then describe the forward selection process by which candidate variables were retained in the models, and how candidate covariance functions were selected to pair with each mean function. Finally, we examine maps of salinity observations, predictions, and standard errors under five hydrologic scenarios, analyze these results, and discuss their overall implications.

## 2. Methods and results

### 2.1 Data and notation

We used bottom salinity values measured by the North Carolina Division of Marine Fisheries (NC DMF) Pamlico Sound Trawl Survey Program 195 (*the survey*), which is conducted only in June and September, for every June and September from 1987 to 2006. Designed to assess species abundance at depths greater than 2 m, the survey uses a weighted stratified random sampling design. For each time period, station coordinates are randomly generated within each of seven water body strata, with more stations allocated to larger strata, for a total of 54 stations per time period. Hereafter, we denote the spatial domain sampled by the survey with *S*. Salinity was measured using a YSI-85 multi-function meter at the beginning of each trawl and recorded along with depth and spatial reference coordinates. All spatial coordinates used in this analysis were converted from decimal degrees to northings and eastings in nautical miles (nmi) from a reference point (the origin in Figure 1) located southwest of

The temporal domain contains *T* = 40 time periods, or month/year combinations, indexed by the subscript *t*. *Site* refers to a specific spatial location nested within a particular time period and is indexed using the subscript *i*.

The freshwater influx (FWI) data represented the watersheds of the Neuse, Pamlico, Roanoke, and Chowan rivers, which make up 80% of the land draining into PS [29]. FWI observations were average daily river discharge rates collected at one US Geological Survey (USGS) gauge station per tributary (Figure 1): Neuse River (NR) station 02089500 in Kinston; Tar-Pamlico River (TPR) station 02083500 in Tarboro; Roanoke River (RR) station 02080500 in Roanoke Rapids; and Ahoskie Creek (AC) station 02053500 in Ahoskie, which gauges Chowan River inflow. Discharge rates in ft³/s for every day in the temporal domain (7305 days) were downloaded from the USGS Water Resources website for the state of North Carolina (USGS 2009) and converted to m³/s. For each river, we chose the furthest downstream gauge that recorded data over the entire temporal domain.
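The unit conversion and record length above can be checked with a few lines of arithmetic. The sketch below assumes the temporal domain runs from 1 January 1987 through 31 December 2006 (consistent with 20 years and 7305 days) and uses the exact definition 1 ft = 0.3048 m; the function name `cfs_to_cms` is hypothetical.

```python
from datetime import date

# 1 ft = 0.3048 m exactly, so 1 ft^3/s = 0.3048**3 m^3/s (about 0.0283168).
CFS_TO_CMS = 0.3048 ** 3


def cfs_to_cms(discharge_cfs):
    """Convert a sequence of daily discharge values from ft^3/s to m^3/s."""
    return [q * CFS_TO_CMS for q in discharge_cfs]


# Number of daily records, assuming the domain is 1 Jan 1987 to 31 Dec 2006.
N_DAYS = (date(2007, 1, 1) - date(1987, 1, 1)).days

print(N_DAYS)                              # 7305
print(round(cfs_to_cms([1000.0])[0], 3))   # 28.317
```

The 7305 days are 20 × 365 days plus five leap days (1988, 1992, 1996, 2000, 2004).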

### 2.2 Candidate explanatory variables

The creation of explanatory variables reflects the modeling context (the objectives, the geographical features of the spatial domain, and the space-time coverage and resolution of the data), but other researchers can adapt the general thought process to a different context. We use the subscript *it* for any variable that varies in both space and time, and *t* for any variable that varies over time but is constant over all sites.

### 2.3 Freshwater influx indices

We used 61 days, the average freshwater residence time of the four major rivers flowing into PS [30, 31, 32], to account for the temporal lag between the upriver gauging of freshwater and the delivery of that water to *S*. Therefore, we defined the long-term metric over the 61 days preceding *mt*, the first day of the survey in time period *t*. Because Ramus et al. [33] calculated a seven-day residence time for the Neuse and Pamlico Rivers after Hurricanes Dennis and Floyd deposited 1 m of rainfall in eastern NC less than 2 weeks before the September 1999 survey, we defined the short-term metric over the 7 days preceding *mt*.
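The long- and short-term windows can be illustrated with a sketch that summarizes one river's daily discharge record over the 61 or 7 days before the first survey day. The exact form of the original metrics is not recoverable from the text, so two choices here are assumptions: the window ends the day before the survey starts, and discharge is summarized by its mean. The function name `fwi_window_mean` is hypothetical.

```python
from datetime import date

import numpy as np

LONG_TERM_DAYS = 61   # average freshwater residence time [30, 31, 32]
SHORT_TERM_DAYS = 7   # post-hurricane residence time reported by Ramus et al. [33]


def fwi_window_mean(daily_q, record_start, survey_start, window_days):
    """Mean daily discharge over the `window_days` days before `survey_start`.

    daily_q[0] is the discharge on `record_start`. Assumptions (not from the
    source): the window ends the day before the survey and the summary is a mean.
    """
    end = (survey_start - record_start).days   # index of the survey's first day
    begin = end - window_days
    if begin < 0 or end > len(daily_q):
        raise ValueError("window falls outside the discharge record")
    return float(np.mean(daily_q[begin:end]))


q = np.arange(100, dtype=float)   # toy discharge record, one value per day
start = date(2005, 1, 1)
survey = date(2005, 3, 12)        # day index 70 of the toy record
print(fwi_window_mean(q, start, survey, LONG_TERM_DAYS))   # 39.0
print(fwi_window_mean(q, start, survey, SHORT_TERM_DAYS))  # 66.0
```

A total rather than a mean would rank time periods identically for a fixed window length, so the choice mainly affects the scale of the fitted coefficients.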

Because freshwater from river *r* in time period *t* should have more of an effect on site *i* the closer site *i* is to the river, a unique measure of the influence of

The coordinates of each gauge station were used to calculate distance because the gauge was the location of the

The plot in Figure 2 of

(A fortieth indicator variable was not used because it would create a non-full-rank design matrix, and the effect for the fortieth time period can be derived using the intercept.) This latter consideration led to the creation of two distinct mean function models: the *process* and *time* models. The first has process variables only, and the second has process variables in addition to the time-period indicator variables to address the possibility that salinity is affected by some aspect of physical phenomena that is not accounted for by any other variable in the model.
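The rank argument for using 39 rather than 40 time-period indicators can be checked numerically. The sketch below builds an intercept plus indicators for periods 0 to 38, treating the last period as the reference level, and shows that a fortieth indicator adds a column without adding rank. The function name and the toy layout of three sites per period are hypothetical.

```python
import numpy as np


def time_period_design(period, n_periods):
    """Intercept plus indicator columns for periods 0..n_periods-2.

    The last period (n_periods - 1) is the reference level: its effect is the
    intercept, and each indicator coefficient is a difference from it.
    """
    period = np.asarray(period)
    X = np.zeros((period.size, n_periods))
    X[:, 0] = 1.0
    for k in range(n_periods - 1):
        X[:, k + 1] = (period == k)
    return X


# 40 periods with 3 toy sites each.
period = np.repeat(np.arange(40), 3)
X = time_period_design(period, 40)
# A 40th indicator makes the columns linearly dependent: all 40 indicators sum
# to the intercept column, so the rank does not increase.
X_all = np.column_stack([X, (period == 39).astype(float)])
print(X.shape, np.linalg.matrix_rank(X))          # (120, 40) 40
print(X_all.shape, np.linalg.matrix_rank(X_all))  # (120, 41) 40
```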

### 2.4 Saltwater mixing and tidal signal

Although salinity on the inner continental shelf of the U.S. Southeast Atlantic coast exhibits some spatial variability near PS [37], we follow Xie et al. [38] and assume constant open-ocean salinity. This assumption allows the effect of ocean water mixing to be modeled as a function of distance alone (rather than distance interacting with the salinity of the ocean water), measured from each spatial location in the sound to each of the major PS inlets: Oregon, Hatteras, and Ocracoke. Exploratory analyses revealed that models using a single variable (distance to the nearest inlet) explained the same amount of variability in salinity as models using three variables (distances to each of the three inlets) when other explanatory variables were also included. Therefore, we considered for inclusion in subsequent models the distance separating site *i*, sampled in time period *t*, from the center of the most proximate inlet.
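A minimal sketch of the nearest-inlet distance follows. The inlet coordinates are hypothetical placeholders in the study's easting/northing system (nmi from the origin), not the values used in the analysis, and the sketch assumes straight-line (Euclidean) distance in the projected plane, which ignores land barriers between a site and an inlet.

```python
import numpy as np

# Hypothetical inlet-center coordinates (easting, northing) in nmi from the
# study origin; placeholders for illustration only.
INLETS = {
    "Oregon": (60.0, 70.0),
    "Hatteras": (45.0, 20.0),
    "Ocracoke": (30.0, 10.0),
}


def distance_to_nearest_inlet(easting, northing, inlets=INLETS):
    """Straight-line distance (nmi) from each site to its closest inlet center."""
    xy = np.column_stack([np.asarray(easting, float), np.asarray(northing, float)])
    centers = np.array(list(inlets.values()))
    # Pairwise site-to-inlet distances, shape (n_sites, n_inlets).
    d = np.linalg.norm(xy[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1)


print(distance_to_nearest_inlet([60.0, 30.0], [73.0, 14.0]))  # [3. 4.]
```

Collapsing three distances into their minimum is what makes the single-variable model possible; the three-variable alternative would simply keep all columns of `d`.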

### 2.5 Wind speed and direction

A prevailing wind field that is north/northeast from March to August and south/southwest from September to February is the primary driver of currents in PS [39]. Thus, wind speed and direction were incorporated into the modeling process using the categorical variable *montht*, where

is used to examine the effects of seasonal wind patterns on the spatial distribution of salinity.

### 2.6 Evaporation and direct precipitation

Holding other factors constant, sound-wide salinity in time periods that experience more evaporation of water from the surface of PS would likely be higher than those in time periods that experienced less evaporation, but no evaporation data were available for the space-time domain of interest. Salinity in time periods for which there was more direct precipitation into

### 2.7 Spatial coordinates

Estuarine salinity varies over space such that functions of spatial coordinates might explain variability in salinity not accounted for by the other variables. Scatterplots of salinity versus easting and northing suggested that salinity is quadratic in the former and cubic in the latter. The quadratic function of easting can be explained by examining a west-to-east path through PS along the 35° 16′ N parallel (A in Figure 1): salinity should initially increase, reach a maximum at the saltwater plume near Ocracoke and Hatteras Inlets, and decrease again on the other side of the plume in the waters on the western shore of Hatteras Island near Buxton, NC. The cubic function of northing is best described by examining a north-to-south path along the 75° 42′ W meridian (B in Figure 1), where salinity should increase traveling south from Albemarle Sound, reach a local maximum near Oregon Inlet, decrease continuing past the saltwater inlet plume, and increase again as the Hatteras Inlet saltwater plume is reached. Thus, we considered polynomial terms in easting (up to second order) and northing (up to third order) as candidate explanatory variables.
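The polynomial coordinate terms can be generated as below. This is a sketch: the single easting-by-northing interaction is illustrative only (the text does not list which interactions were evaluated), and the optional centering is a common device to reduce collinearity among powers of the same coordinate, not something the source states was used.

```python
import numpy as np


def coordinate_terms(easting, northing, center=False):
    """Polynomial terms in easting (to 2nd order) and northing (to 3rd order).

    If `center` is True, coordinates are mean-centered before powering, which
    reduces correlation between e.g. northing and northing**3. The single
    easting-by-northing interaction is an illustrative choice.
    """
    e = np.asarray(easting, dtype=float)
    n = np.asarray(northing, dtype=float)
    if center:
        e = e - e.mean()
        n = n - n.mean()
    return {
        "east": e,
        "east2": e ** 2,
        "north": n,
        "north2": n ** 2,
        "north3": n ** 3,
        "east_x_north": e * n,
    }


terms = coordinate_terms([2.0], [3.0])
print(terms["east2"], terms["north3"])  # [4.] [27.]
```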

### 2.8 Hurricanes

Hurricanes can rapidly introduce large volumes of freshwater to estuaries via riverine influx, push large volumes of saltwater in through inlets via storm surge, and alter circulation patterns through abrupt changes in wind speed and direction [7, 10]. Hurricanes can also open new inlets to PS, which can alter current flow and increase saltwater intrusion [41]. The variable *t* but are constant over all sites *t.* The continuous variable *mt*, it takes the value zero. Finally, the discrete variable *num_stormst* equals the number of hurricanes making landfall in NC in the 61 days prior to

### 2.9 Variable selection

Sections 2.2–2.8 identify 46 candidate explanatory variables for the process model mean function: *num_stormst*. For the time model, there were an additional 39 time period indicator variables. Some variables in either model may be redundant: the hurricane variables overlap, and spatial coordinates may be unnecessary if other variables explain more variability in salinity. The set of variables included in the final model(s) should balance goodness-of-fit with parsimony. We first describe the variable-selection process for the process model, then for the time model.

### 2.10 Process model

The results of eight separate ordinary least squares linear regression models of salinity make up the rows of Table 1. The first five consist of an intercept and a single explanatory variable: *num_stormst*, and

Adjusted R^{2} is a modification of R^{2} that penalizes the number of explanatory variables. Whereas R^{2} never decreases as more variables are added to a model, adjusted R^{2} increases only if the added variable decreases the error sum of squares enough to offset the loss in error degrees of freedom.
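The penalty can be made concrete with the standard formula, adjusted R^{2} = 1 − (1 − R^{2})(n − 1)/(n − p − 1), where *n* is the number of observations and *p* the number of explanatory variables excluding the intercept. The sketch below computes both statistics from an OLS fit; it is a generic illustration, not the SAS procedure used in the study, and the function name is hypothetical.

```python
import numpy as np


def r2_and_adjusted(y, X):
    """OLS fit of y on X (X must include an intercept column).

    Returns (R^2, adjusted R^2). With k = p + 1 columns,
    adjusted R^2 = 1 - [SSE / (n - k)] / [SST / (n - 1)]
                 = 1 - (1 - R^2) (n - 1) / (n - p - 1).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid                     # error sum of squares
    sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
    r2 = 1.0 - sse / sst
    adj = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return r2, adj
```

Because (n − 1)/(n − p − 1) exceeds one whenever p ≥ 1, adjusted R^{2} is always below R^{2} for an imperfect fit, and an added variable must reduce the error sum of squares proportionally more than it costs in degrees of freedom.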

The model with the long-term freshwater influx indices had the largest adjusted R^{2} at 0.38, followed by the model with the distance from the nearest inlet (0.34), and the model with the short-term FWI indices (0.27). None of the other four models explained more than 5% of the variability in salinity. We chose the model with the long-term freshwater influx indices as the base upon which to build the mean function.

To this base model we added the variable ^{2} exceeded the old. Variables from the seven initial models were then added in order of decreasing adjusted R^{2}. Following this procedure, the mean trend model grew to contain 10 variables—^{2} 0.57.
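The procedure described above (start from a base model, try candidate variables or variable sets in order of decreasing single-model adjusted R^{2}, and keep each one only if the model's adjusted R^{2} rises) can be sketched as follows. This is an illustrative re-implementation in NumPy, not the authors' SAS code; the candidate names and toy data are hypothetical.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit; X includes the intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid / (n - k)) / (sst / (n - 1))


def ordered_forward_selection(y, base, candidates):
    """Add candidate variables (or variable sets) in the given order.

    base: (n, k0) array including an intercept column.
    candidates: list of (name, array) pairs, each array (n,) or (n, m), already
    sorted by decreasing single-model adjusted R^2. A candidate is retained only
    if it raises the current model's adjusted R^2.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(base, dtype=float)
    best = adjusted_r2(y, X)
    kept = []
    for name, cols in candidates:
        trial = np.column_stack([X, cols])
        score = adjusted_r2(y, trial)
        if score > best:
            X, best = trial, score
            kept.append(name)
    return kept, best
```

Grouped candidates (for example, all six pairwise FWI interactions) can be passed as a single multi-column array so that the set is accepted or rejected as a unit.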

Because the effect of FWI from one river on a given location in PS could change based on the FWI from another river during the same time period, we evaluated the addition of the 6 pair-wise interactions among the four ^{2} was 0.66, so the set was retained.

Spatial coordinate variables were evaluated last in groups according to their polynomial order, with squared and cubic terms added before interactions. We considered these variables last because we wanted to include them only if they explained additional variability in the response after more interpretable variables were included. We determined that including all variables except ^{2}. The final process model mean function thus had an adjusted R^{2} of 0.73 and included the following:

### 2.11 Time model

To build the time model, we followed the same procedure described above, selecting for the base of the mean function a set of time period indicator variables because a linear regression of ^{2} of 0.41 (Table 1). (Note that such a model is equivalent to fitting an ANOVA model using the time periods as groups.) Again, we added other sets of explanatory variables in order of decreasing adjusted R^{2}. Before evaluating interactions, the mean trend time model had an adjusted R^{2} of 0.78 and contained 48 variables: ^{2} (0.89) was larger than that of the previous mean trend time model (0.78). After investigating spatial coordinate variables, the final mean trend time model (below) had an adjusted R^{2} of 0.91 and included 204 variables: ^{2} of 0.73 for the process model and 0.91 for the time model were based on fitting each model to the full dataset. In the next section, we report R^{2} (not adjusted R^{2}) based on a cross-validation dataset.

### 2.12 Modeling spatially correlated error

The variable selection analyses above used ordinary least squares (OLS) regression to model salinity as a function of explanatory variables. That model can be written as

where

where bold print indicates vectors so that **ε**, and **0** are

Rarely, however, does the assumption of independent and identically distributed errors hold for observations of natural phenomena associated with locations in space and time. While it is intuitive that values of salinity located close together in space should be similar, it is also generally the case that the deviations from the mean function of observations located close together are similar. That similarity is referred to as spatial covariance, and the spatial covariance between deviations from the mean trend at two locations within the same time period can be modeled as a function of the distance separating them. Including in the overall model both a deterministic mean function and a spatial covariance function allowed predictions of salinity at locations where there were no observations.

Valid covariance functions ensure that the covariance matrix will be positive definite, which, in turn, ensures that variances will be non-negative. Each covariance function has a shape defined by a range parameter, a partial sill, and sometimes a nugget effect. Appendix Table A1 gives formulas for determining spatial covariance according to the exponential, Gaussian, and spherical covariance functions, each with and without a nugget effect. Figure 3 shows an example of the spherical covariance function—the solid red line—fit to a sample covariogram—the blue dots—of deviations from the process model for June 1994. The range parameter—^{®} Proc Mixed to allow a different partial sill and range parameter for each time period.
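The three covariance families can be written as functions of separation distance *d*. The forms below are the commonly used parameterizations (as we understand them, the ones documented for the SP(EXP), SP(GAU), and SP(SPH) structures in SAS Proc Mixed); Appendix Table A1 may scale the range parameter differently. The nugget, when present, is added only at zero distance, that is, on the diagonal. A Cholesky factorization offers a quick numerical check of positive definiteness. All function names are hypothetical.

```python
import numpy as np


def exponential(d, partial_sill, range_):
    return partial_sill * np.exp(-d / range_)


def gaussian(d, partial_sill, range_):
    return partial_sill * np.exp(-(d / range_) ** 2)


def spherical(d, partial_sill, range_):
    h = d / range_
    # Covariance falls to zero at and beyond the range.
    return np.where(h < 1.0, partial_sill * (1.0 - 1.5 * h + 0.5 * h ** 3), 0.0)


MODELS = {"exponential": exponential, "gaussian": gaussian, "spherical": spherical}


def covariance_matrix(coords, model, partial_sill, range_, nugget=0.0):
    """Covariance of deviations at `coords` (n x 2) within one time period."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return MODELS[model](d, partial_sill, range_) + nugget * np.eye(len(coords))


def is_positive_definite(cov):
    """True if a Cholesky factorization exists (all eigenvalues > 0)."""
    try:
        np.linalg.cholesky(cov)
        return True
    except np.linalg.LinAlgError:
        return False


grid = [(x, y) for x in range(5) for y in range(5)]
C = covariance_matrix(grid, "spherical", partial_sill=4.0, range_=3.0, nugget=0.5)
print(C[0, 0], is_positive_definite(C))  # 4.5 True
```

At zero distance each function returns the partial sill, so a diagonal entry equals partial sill plus nugget, which is the sill.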

Model (3), modified to include spatial correlation, becomes

where

where zero matrices for off-diagonal elements indicate that deviations in one time period are not correlated with those in another. We make this assumption partly because of the long time span separating June and September, but also because no SAS® procedure can model such space-time correlation while simultaneously allowing every time period to have different spatial covariance parameters and allowing a mean function to be fit. Diagonal elements *i* and *j* in time period *t*.
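The block-diagonal structure can be sketched directly: each time period contributes its own covariance block, with its own partial sill and range, and every cross-period entry is zero. For brevity this sketch uses only the exponential family and, as a simplifying assumption not stated in the text, a single nugget shared across periods. The function names are hypothetical.

```python
import numpy as np


def exp_cov(coords, partial_sill, range_, nugget):
    """Exponential covariance with nugget for the sites of one time period."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return partial_sill * np.exp(-d / range_) + nugget * np.eye(len(coords))


def space_time_covariance(coords_by_period, params_by_period, nugget):
    """Block-diagonal covariance with one block per time period.

    params_by_period[t] = (partial_sill_t, range_t). Off-diagonal blocks are
    zero, i.e. deviations in different time periods are uncorrelated.
    """
    blocks = [exp_cov(c, ps, r, nugget)
              for c, (ps, r) in zip(coords_by_period, params_by_period)]
    n = sum(b.shape[0] for b in blocks)
    sigma = np.zeros((n, n))
    start = 0
    for b in blocks:
        k = b.shape[0]
        sigma[start:start + k, start:start + k] = b
        start += k
    return sigma


june = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
september = [(2.0, 2.0), (3.0, 3.0)]
S = space_time_covariance([june, september], [(4.0, 2.0), (9.0, 5.0)], nugget=0.5)
print(S.shape)  # (5, 5)
```

With 40 periods of about 54 sites each, the full matrix is roughly 2160 × 2160, but because it is block diagonal it can be factorized one 54 × 54 block at a time.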

Understanding how predictions of salinity and prediction standard errors are generated from this model will make the scenario results in Section 2.13 and the Discussion easier to follow. To predict salinity at space–time locations where it is not observed, the following results are needed. Superscripts differentiate between locations where salinity is observed and unobserved. Model (4) represents only observations of salinity (by virtue of the dimensions of its vectors and matrices), but a similar model describes both observed and unobserved salinity values, yielding the joint distribution of unobserved and observed salinity, given by

Here,

and

Let

The pipe symbol (|) means “given” or “conditioned on knowing the values of” the terms following it. The terms before the comma represent the mean of the multivariate normal distribution, which is used for the salinity prediction, and the terms after the comma represent the variance-covariance matrix, which is used for prediction standard errors. Salinity predictions are the sum of the mean trend and the predicted deviation from that trend obtained through the spatial covariance function.

The salinity predictor in (6) is an *exact predictor*: the prediction of salinity at a site where there is an observation will exactly equal the observation. For this reason, to determine which spatial covariance function to use, we randomly selected 10% of the observations to withhold as a cross-validation dataset, the *test dataset*; we term the remaining 90% the *base dataset*. For every combination of the two mean functions (process and time) and the six spatial covariance functions in Appendix Table A1, we fit model (4) to the base dataset and predicted salinity values at the space–time locations of the test dataset using the results given in (5) and (6). When the model predicted salinity to be less than zero, we set the prediction equal to zero before calculating the following statistics. Negative predictions could be avoided using a truncated normal distribution, but SAS® Proc Mixed does not permit specification of this distribution. Table 2 gives the root mean squared error (RMSE) of predictions, in the same units as salinity, along with the slope, intercept, and coefficient of determination (R^{2}) from a regression of actual salinity values in the test dataset on their predictions. If predictions were perfect, this regression would have a slope of one, an intercept of zero, and an R^{2} of one.
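The conditional multivariate normal result behind the predictor, and its exactness, can be illustrated with a small example. This is a plug-in sketch that treats the mean trend as known; universal kriging adds a term for uncertainty in the estimated regression coefficients, which is omitted here. The covariance has no nugget, because with a nugget treated as measurement error the prediction at an observed site generally no longer reproduces the observation. The coordinates, mean-trend values, and function names are hypothetical.

```python
import numpy as np


def exp_cov_between(a, b, partial_sill, range_):
    """Exponential covariance between point sets a (m x 2) and b (n x 2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return partial_sill * np.exp(-d / range_)


def conditional_mvn(mu_u, mu_o, cov_uu, cov_uo, cov_oo, y_o):
    """Mean and covariance of unobserved values given the observed ones.

    mean = mu_u + cov_uo cov_oo^{-1} (y_o - mu_o)
    cov  = cov_uu - cov_uo cov_oo^{-1} cov_ou
    """
    resid = np.asarray(y_o, dtype=float) - np.asarray(mu_o, dtype=float)
    # Linear solves instead of an explicit inverse, for numerical stability.
    mean = np.asarray(mu_u, dtype=float) + cov_uo @ np.linalg.solve(cov_oo, resid)
    cov = cov_uu - cov_uo @ np.linalg.solve(cov_oo, cov_uo.T)
    return mean, cov


obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
new_xy = np.array([[1.0, 0.0], [5.0, 5.0]])   # an observed site and a distant one
y_obs = np.array([10.0, 12.0, 9.0])
trend_obs = np.full(3, 11.0)                  # toy mean-trend values
trend_new = np.full(2, 11.0)

mean, cov = conditional_mvn(
    trend_new, trend_obs,
    exp_cov_between(new_xy, new_xy, 2.0, 1.5),
    exp_cov_between(new_xy, obs_xy, 2.0, 1.5),
    exp_cov_between(obs_xy, obs_xy, 2.0, 1.5),
    y_obs,
)
se = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # prediction standard errors
```

At the observed site the prediction equals the observation (12.0) with a standard error of essentially zero, while at the distant site the prediction reverts toward the mean trend and the standard error approaches the square root of the sill. This is the same pattern of low SEs near sample sites and high SEs far from them that appears in the prediction maps.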

Combining a spatial covariance function with either mean function improved salinity predictions. For example, among the time models, the exponential covariance function with a nugget produced predictions with the lowest RMSE (2.1), the slope closest to one (0.92), and the intercept closest to zero (1.55). Among the process models, the exponential and spherical functions, each with and without a nugget, performed equally well, and better than the time models. To select the best model from this group of four, we examined statistics describing how well each model fit the base dataset. The model with an exponential covariance function with a nugget had the lowest AIC (7580.0) and BIC (7711.7) and was thus chosen as the final model. It explained 89% of the variability in the test dataset and generated predictions with an RMSE of 2.0.
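The hold-out split and the statistics reported in Table 2 can be sketched as follows: a random 10% test set, negative predictions set to zero, RMSE, and the slope, intercept, and R^{2} of a regression of observed on predicted salinity. The seed and function names are hypothetical, and the total of 2160 observations assumes 54 stations in each of the 40 time periods.

```python
import numpy as np


def holdout_split(n, frac=0.10, seed=0):
    """Randomly split indices 0..n-1 into base and test sets (test = frac)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(frac * n))
    return idx[n_test:], idx[:n_test]


def validation_stats(observed, predicted):
    """RMSE plus slope, intercept and R^2 from regressing observed on predicted."""
    obs = np.asarray(observed, dtype=float)
    pred = np.clip(np.asarray(predicted, dtype=float), 0.0, None)  # negatives -> 0
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    slope, intercept = np.polyfit(pred, obs, 1)
    # For a simple linear regression, R^2 equals the squared correlation.
    r2 = float(np.corrcoef(pred, obs)[0, 1] ** 2)
    return {"rmse": rmse, "slope": float(slope), "intercept": float(intercept), "r2": r2}


base_idx, test_idx = holdout_split(40 * 54)
print(len(base_idx), len(test_idx))  # 1944 216
```

Regressing observed on predicted values (rather than the reverse) means a slope below one indicates predictions that are too spread out, and a positive intercept indicates under-prediction at low salinities.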

Next, we fit this model using the full dataset, and produced retrospective maps of salinity predictions and standard errors at evenly spaced 1 nmi (1.85 km) increments for each time period. Forty-two salinity predictions—less than 0.1% of the total number of predictions—were negative and set to zero.

### 2.13 Examining freshwater influx scenarios

To examine variations in the spatial distribution of salinity under drought, average, and flood conditions, we classified freshwater influx from each river within each time period (

*Moderate-to-moderate FWI*. June 2005 (Figure 4) experienced moderate FWI in both the 2 months and the 1 week prior to the survey in PS, and its predicted salinity ranked 37th, the lowest of the moderate-to-moderate time periods. Legend colors for model predictions in the left pane and observations in the upper right pane of Figure 4 (as well as Figures 5A and B and 6A and B) are based on percentiles of the distribution of observed salinity across all time periods: minimum to 5%; 5–10%; 10–25%; 25–50%; 50–75%; 75–90%; 90–95%; and 95% to maximum. In the left pane of Figure 4, predicted salinity in June 2005 increased moving east across PS, reaching a maximum just south of Oregon Inlet. The June 2005 map of observed salinities (upper right pane) shows the same east-west gradient, indicating that the prediction map mirrors the trend in the observations. The area of highest predicted salinity corresponds to a lone purple observation of 26.5 just south of Oregon Inlet (Figure 4). Plumes of relatively higher salinity are evident in the vicinity of all three ocean inlets (Figure 4).
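The eight legend classes can be reproduced by taking percentiles of a reference distribution (all observed salinities, or all prediction standard errors for the SE maps) and binning each mapped value against them. The sketch below is illustrative; the function name is hypothetical, and NumPy's default linear interpolation between order statistics is an assumption about how the percentiles were computed.

```python
import numpy as np

# Upper bounds (percentiles) of the first seven legend classes; the eighth
# class runs from the 95th percentile to the maximum.
LEGEND_PERCENTILES = [5, 10, 25, 50, 75, 90, 95]


def legend_classes(values, reference):
    """Map each value to a class 0..7 using percentiles of `reference`.

    Class 0 spans the minimum to the 5th percentile and class 7 spans the 95th
    percentile to the maximum. A value equal to a cut point goes to the higher class.
    """
    cuts = np.percentile(np.asarray(reference, dtype=float), LEGEND_PERCENTILES)
    return np.digitize(np.asarray(values, dtype=float), cuts)


reference = np.arange(101)   # toy reference distribution 0..100
print(legend_classes([0, 3, 7, 12, 60, 99], reference))  # [0 0 1 2 4 7]
```

Because the cut points come from the pooled distribution across all 40 time periods, the same color denotes the same salinity range on every map, which is what allows the June 2005, June 1999, June 2002, September 1999, and June 2003 panels to be compared directly.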

The lower right pane of Figure 4 (as well as Figures 5A and B and 6A and B) displays prediction standard errors (SE) in the same units as salinity. The same eight percentile groups define the colors of the SE legend, here based on the distribution of prediction standard errors across all time periods. The transition from low SE at sample sites to higher SE away from them reflects the fact that the exact predictor (6) reproduces observations, so confidence intervals near sample sites are narrower than those farther away.

Locations of high SE within a time period further illustrate this spatial pattern, and these locations are consistent over time. High SEs occur between the mouths of the Neuse and Pamlico Rivers and along a margin of varying width following the outline of the Outer Banks, areas in which sampling does not occur (Figure 1). Because SEs increase with distance from sample sites, we chose to generate only interpolated (and not extrapolated) salinity predictions. In June 2005, as in all other time periods, predictions were generated only for locations within *S*, which does not extend to Albemarle Sound or to the heads of the Neuse and Pamlico Rivers (Figure 4).

*Low to low FWI in early and late-stage drought.* June 1999 (Figure 5A) and June 2002 (Figure 5B)—which mark early and late stages of North Carolina’s 1998–2002 drought [42]—experienced low long- and short-term FWI with predicted salinity ranking 12th and 4th, respectively. At every point in PS, predicted salinity in these two time periods was higher than in June 2005, and predicted salinity was much higher in June 2002 than June 1999, though both have similar values for

Although June 2002 salinity observations had a larger mean and greater variability, the majority of prediction standard errors were less than 1.01. In June 1999, however, SEs fell between 1.01 and 1.81 at all prediction locations except those very close to observations. This result suggests that the mean function represented the conditions affecting salinity in PS better in June 2002 than in June 1999.

*Flood to flood FWI, with and without hurricanes*. FWI was extremely high in September 1999 (Figure 5A) as a result of the 500-year floods produced by Hurricanes Dennis and Floyd, which passed 24 and 12 days before the survey, respectively. In June 2003 (Figure 5B), extremely high FWI resulted from an eight-month period of above-average precipitation before the survey. Although these are the only two time periods categorized as flood-to-flood, predicted salinity in September 1999 ranks a surprisingly high 30th, whereas in June 2003 it ranks 40th. Observed and predicted salinities for these two time periods are lower than those in the low-FWI time periods of June 1999 and 2002, but in September 1999 salinity was higher at most prediction locations, and more variable, than in moderate-FWI June 2005. Water near the two southerly inlets to PS was more saline in September 1999 than at the same locations during the moderate FWI of June 2005, likely due to storm surge-generated inlet plumes. Salinity near the Neuse and Tar-Pamlico Rivers was similar to that in June 2005. Standard errors were lower sound-wide in June 2003 than in September 1999, and SEs in September 1999 were higher sound-wide than in any of the other four time periods examined (Figures 5 and 6).

## 3. Discussion

Because water exchange between lagoonal estuaries and the open ocean can be restricted, changes in precipitation patterns and storm frequencies associated with global climate change have a relatively high potential to alter salinity patterns in systems like PS, with subsequent ecosystem alterations. Changes in precipitation will affect the amount and timing of river flow, which will in turn affect nutrient cycling, estuarine flushing rates, and salinity. Increased storm activity may open new inlets, which would alter currents, increase tidal action, and allow a greater influx of seawater carrying both different chemical signals and mobile species. Salinity is therefore a practical estuarine characteristic for studying the impacts of these changes, because both effects involve enhanced water exchange that alters the overall salinity content of the estuary [43, 44].

We developed and evaluated two statistical models and used the best model to hindcast salinity in PS. The process mean function combined with the exponential covariance with a nugget explained 89% of the variability in a test dataset, with an RMSE of 2.0 psu, and produced relatively accurate retrospective salinity maps under a wide range of freshwater influx and system-state scenarios. Much of this accuracy was due to allowing the range and partial sill parameters of the spatial covariance to be specific to each time period. We then examined variations in the spatial distribution of salinity under drought, average, and flood FWI conditions, and identified the following patterns. In years with moderate FWI, salinity increased from west to east across PS, as expected, and was highest adjacent to the major inlets, particularly near Oregon Inlet. In years with low FWI indicative of drought, both the mean and the variance of salinity in PS increased. In years with floods, salinity displayed a high degree of spatial variation: it was lower near the tributaries, as expected, but occasionally increased sharply near the major inlets as ocean water entered PS through them.

### 3.1 Improvements to model predictions

For retrospective prediction, model improvements could target the mean trend, the covariance, or both, and could be evaluated using the test dataset. A reasonable goal might be to increase *R*^{2} to 0.93 or to reduce RMSE to 1.5 psu. For prospective prediction of salinity under hypothetical, unobserved conditions, spatial covariance among observation deviations cannot be used, so improvements would have to come exclusively from the mean function. Locations and time periods with high SEs highlight conditions not well represented by the current mean function. A reasonable goal here would be a model for which all SE values fall below the current median (1.32).
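Candidate models could be scored against these targets on the withheld data with a few summary statistics. This is a minimal sketch that assumes the slope, intercept, and *R*^{2} reported in the model comparison table come from a least-squares regression of observed on predicted test values; the source does not state this explicitly, and `validation_metrics` is a hypothetical helper.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """RMSE plus an observed-on-predicted regression summary for a test set."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    # Root-mean-square prediction error, in the units of salinity (psu).
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    # Least-squares line obs = b0 + b1 * pred (polyfit returns highest degree first).
    b1, b0 = np.polyfit(pred, obs, 1)
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r2 = np.corrcoef(pred, obs)[0, 1] ** 2
    return {"rmse": rmse, "slope": b1, "intercept": b0, "r2": r2}
```

An improved model would be one whose `rmse` drops toward 1.5 psu or whose `r2` rises toward 0.93; a slope near 1 and an intercept near 0 indicate predictions without systematic bias.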

*Mean function*. The mean function alone explained over two-thirds of the variability in salinity in both the process and time models. While this is a noteworthy accomplishment, there remains room for improvement. High SE values in Figure 5A show that the mean function does not capture the interaction between high FWI in September 1999 and hurricane storm surges. One hurricane explanatory variable, *inverse_days_survey*_{t}, remained in the final process model. Its parameter estimate was positive, consistent with strong hurricane winds pushing more saltwater into PS through the inlets than would enter under typical seasonal wind conditions, but on its own it explained only 4% of salinity variability in the full dataset. The *inverse_days_survey*_{t} variable did not differentiate between a year in which a single hurricane passed within 12 days of the survey and a year in which such a hurricane followed another that passed 12 days earlier. A future effort might attempt to account for the cumulative build-up of storm surge on observed PS salinities.
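The limitation can be made concrete by contrasting a most-recent-hurricane index with a cumulative one. This sketch assumes that *inverse_days_survey*_{t} equals 1 divided by the number of days between the most recent hurricane and the survey (its exact definition is given elsewhere in the paper); the cumulative index and its 60-day window are hypothetical illustrations, not variables used in the study.

```python
def inverse_days_most_recent(days_before_survey):
    """Assumed form: 1 / (days since the most recent hurricane); 0 if none."""
    days = [d for d in days_before_survey if d > 0]
    return 1.0 / min(days) if days else 0.0

def inverse_days_cumulative(days_before_survey, window=60):
    """Hypothetical cumulative index: sum of 1/d over hurricanes within `window` days."""
    return sum(1.0 / d for d in days_before_survey if 0 < d <= window)
```

For September 1999 (Dennis 24 days and Floyd 12 days before the survey), the assumed most-recent index gives 1/12, the same value as a year with only a single hurricane 12 days out, whereas the cumulative index gives 1/12 + 1/24 = 0.125 and therefore separates the two cases.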

Though **–**5 suggests that this distance metric should be modified based on wind speed and direction, using more finely resolved wind information than the *u* and *v* components of wind to interact with

Differences in both salinity values and SE estimates between early-stage drought (June 1999) and late-stage drought (June 2002) suggest that FWI effects should be accounted for over a longer duration than 61 days. Doing so might explain differences in salinity patterns between time periods with similar one-week and two-month FWI conditions. Molina [45] calculated an 11-month mean residence time for freshwater in PS. We could incorporate this effect either by adding a third freshwater influx index to the mean function or by adding an autoregressive component to the model, so that salinity in a given time period is a function of mean salinity in the previous time period. The first option would be tedious from a data-manipulation standpoint but much easier from a model-fitting standpoint, because SAS^{®} Proc Mixed could still be used. The second option requires a change in the covariance function, because salinity deviations from the mean function at a given space-time point could no longer be assumed independent in time. It would also require specialized hand-written code, as no current SAS^{®} Proc allows such a dynamic space-time model to be fit.
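The first option amounts to one more trailing-window average of river discharge. This minimal sketch works on a single hypothetical daily discharge series: it omits the dependence on distance between river gauges and sample sites that the paper's FWI indices include, excludes the survey day from each window (an assumption), and uses 335 days as a rough stand-in for 11 months. The names `trailing_mean`, `fwi_windows`, and `WINDOWS` are hypothetical.

```python
import numpy as np

# Hypothetical windows: the two existing durations (1 week; about 2 months,
# i.e. 61 days) plus a third, roughly 11-month window motivated by the
# freshwater residence-time estimate.
WINDOWS = {"1wk": 7, "2mo": 61, "11mo": 335}

def trailing_mean(daily_flow, survey_index, window_days):
    """Mean daily discharge over the `window_days` days before the survey day.

    daily_flow: 1-D sequence of daily discharge values.
    survey_index: position of the survey day (excluded from the window).
    """
    flow = np.asarray(daily_flow, dtype=float)
    start = survey_index - window_days
    if start < 0:
        raise ValueError("not enough flow history for this window")
    return flow[start:survey_index].mean()

def fwi_windows(daily_flow, survey_index):
    """Trailing-window means for every window in WINDOWS."""
    return {name: trailing_mean(daily_flow, survey_index, w)
            for name, w in WINDOWS.items()}
```

A late-stage drought would show a low `11mo` value even when the `1wk` and `2mo` values match those of an early-stage drought, which is the distinction the two existing indices cannot make.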

Differences in salinity patterns between June 1999 and June 2002, our two low-to-low FWI time periods, could be attributed to differences in FWI from the Roanoke River, one of the two northern rivers whose connection to PS is indirect. This observation warrants further investigation into the calculation of the FWII indices; namely, an investigation of water-path distance as a possible substitute for crow-flies distance between river gauges and sites in PS. Although we did not find a study that demonstrated marked predictive improvement using water-path distance under all circumstances ([36, 46], and others), it would be interesting in future work to compare differences in PS salinity predictions using both distance methods. Recall that Gardner et al. [34] noted more accurate predictions of stream temperatures when models incorporated water-path distance, but only when this distance was further modified and weighted by stream order. It might be the case that water-path distance out-performs crow-flies distance in predicting estuarine salinity when care is taken to make all explanatory variables as meaningful as possible. Development of an automated procedure for calculating water-path distances similar to the one used in [47] would make such an investigation more practically feasible.
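The two distance definitions can be compared on a raster of the sound. This is a minimal sketch, not the procedure of [47]: it assumes a grid in which 0 marks water and 1 marks land, computes crow-flies distance between cell centers, and finds water-path distance with Dijkstra's algorithm over 8-connected water cells (diagonal steps cost √2 cell widths, and diagonal moves past land corners are allowed). All names are hypothetical.

```python
import heapq
import math

def crow_flies(a, b, cell_size=1.0):
    """Straight-line distance between two grid cells given as (row, col)."""
    return cell_size * math.hypot(a[0] - b[0], a[1] - b[1])

def water_path(grid, start, goal, cell_size=1.0):
    """Shortest 8-connected path through water cells (value 0), via Dijkstra.

    Returns math.inf if the goal cannot be reached by water.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return dist
        if dist > best.get((r, c), math.inf):
            continue  # stale entry; a shorter route to this cell was found
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and grid[nr][nc] == 0:
                nd = dist + cell_size * math.hypot(dr, dc)
                if nd < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return math.inf
```

When a peninsula separates a river gauge from a site, the water-path distance can be much longer than the crow-flies distance, so the two definitions would weight that river's discharge differently in an FWI index.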

*Covariance function*. Two mutually exclusive improvements to the covariance function, as implemented in SAS^{®} Proc Mixed, could be investigated: using either the Matern covariance function or an anisotropic covariance function to achieve greater flexibility in each time period. The Matern covariance function has a smoothness parameter in addition to partial sill and range parameters. When the smoothness parameter equals 0.5, the Matern covariance function is identical to the exponential covariance function, and as the smoothness parameter approaches infinity, it approaches the Gaussian covariance function. Using the Matern covariance function is thus equivalent to letting a third parameter determine which two-parameter covariance function is appropriate in each time period, rather than imposing the same two-parameter covariance function on every time period. The computational cost of this flexibility is high: in a similar model with only four separate groups of covariance parameters, compared with the 40 groups in this paper, co-author Amy Nail experienced a computation time of 2 h (versus a 2 min run time using the two-parameter exponential covariance function here). The added computational burden is due both to the complex form of the Matern covariance function and to the need to estimate one additional covariance parameter per time period (40 additional parameters in total).
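The special case at a smoothness of 0.5 can be checked numerically. This sketch uses one common Matern parameterization, σ² · 2^{1−ν}/Γ(ν) · (d/ρ)^{ν} · K_{ν}(d/ρ), with K_{ν} the modified Bessel function of the second kind from SciPy; SAS^{®} Proc Mixed may parameterize the function differently, and the Gaussian limit as ν → ∞ holds only under a variant that rescales distance by √(2ν), which is not shown here. The function names are hypothetical.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(d, sill=1.0, range_=1.0, nu=0.5):
    """Matern covariance sill * 2^(1-nu) / Gamma(nu) * x^nu * K_nu(x), x = d / range_.

    K_nu(0) is infinite, so the d = 0 limit (the sill) is filled in explicitly.
    """
    x = np.atleast_1d(np.asarray(d, dtype=float)) / range_
    out = np.full_like(x, sill)
    pos = x > 0
    xp = x[pos]
    out[pos] = sill * (2.0 ** (1.0 - nu) / gamma(nu)) * xp ** nu * kv(nu, xp)
    return out

def exponential_cov(d, sill=1.0, range_=1.0):
    """Exponential covariance sill * exp(-d / range_)."""
    return sill * np.exp(-np.asarray(d, dtype=float) / range_)
```

Because K_{0.5}(x) = √(π/(2x)) e^{−x}, the Matern expression with ν = 0.5 reduces exactly to σ² e^{−d/ρ}, so `matern_cov(d, s, r, 0.5)` and `exponential_cov(d, s, r)` agree to floating-point precision.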

Another way to achieve flexibility while still specifying a single covariance function for every time period would be to allow an anisotropic covariance function. Geometric anisotropy allows for different range parameters in different directions. For example, if the water current in PS flowed directly from north to south, two points separated by a north-south vector might have more similar salinity values than two points separated by a west-east vector of the same length. Fortunately, a geometric anisotropic covariance function is parameterized such that, if anisotropy were unnecessary, the parameters would take values that effectively yield an isotropic covariance function. The cost of this added flexibility is the need to estimate two additional covariance parameters per time period, for a total of 80 additional parameters. Computation time might be shorter here than for the Matern function, since anisotropic covariance functional forms are less complex.
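Geometric anisotropy can be sketched as rotating and stretching the separation vector before applying an isotropic covariance. This sketch assumes one common choice for the two extra parameters, an orientation angle of the major axis and a ratio of major to minor range, combined with the exponential form; the parameterization in SAS^{®} Proc Mixed may differ, and the function names are hypothetical.

```python
import numpy as np

def anisotropic_distance(h, angle, ratio):
    """Effective distance under geometric anisotropy.

    h: separation vector(s), shape (..., 2), as (east, north) components.
    angle: direction of the major (longest-range) axis, radians from east.
    ratio: major range / minor range (>= 1); ratio = 1 gives isotropy.
    """
    h = np.asarray(h, dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    # Rotate coordinates so that u lies along the major axis.
    u = c * h[..., 0] + s * h[..., 1]
    v = -s * h[..., 0] + c * h[..., 1]
    # Stretch the minor-axis component so covariance contours become circles.
    return np.sqrt(u ** 2 + (ratio * v) ** 2)

def anisotropic_exp_cov(h, sill, major_range, angle, ratio):
    """Exponential covariance of the effective distance; minor range = major_range / ratio."""
    return sill * np.exp(-anisotropic_distance(h, angle, ratio) / major_range)
```

With the major axis pointing north (angle = π/2) and ratio = 3, a 10-unit north-south separation keeps an effective distance of 10 while a 10-unit west-east separation becomes 30, so the north-south pair is more strongly correlated. Setting ratio = 1 recovers the ordinary Euclidean distance for any angle, which is the isotropic special case described above.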

## 4. Conclusions

We created a statistical model combining a process mean function with an exponential spatial covariance function with a nugget to predict salinity in a lagoonal estuary. This model can generate predictions of bottom salinity for Pamlico Sound, NC, at a finer spatial resolution than any previous bottom salinity predictions for this system that we found in the literature. The salinity maps produced with the model can help researchers build an intuitive understanding of salinity dynamics under the PS conditions covered by these 40 time periods. Salinity predictions can also inform future analyses including, but not limited to, examination of historical distribution patterns of estuarine species relative to salinity variability and prediction of salinity changes under various global climate change scenarios.

## Acknowledgments

We thank the North Carolina Division of Marine Fisheries and the United States Geological Survey for providing datasets used in this study. We also thank editor A. Manning for helpful comments that improved the manuscript. Funding for this project was provided by the Environmental Defense Fund (Program Manager Pam Baker), North Carolina Coastal Recreational Fishing License Program (Grant No. 2010-H-004), North Carolina Sea Grant (R12-HCE-2) and the National Science Foundation (OCE-1155609) to D. Eggleston. A. Nail was supported as a VIGRE Postdoctoral Fellow by NSF grant DMS 0354189.

See Table A1.

| Name of covariance function | With nugget effect | Without nugget effect |
|---|---|---|
| Exponential | | |
| Gaussian | | |
| Spherical | | |

Note: For all models, … and … is the distance separating sites *i* and *j*. I(statement) = 1 if statement is true and 0 otherwise.

| Explanatory variable or set of explanatory variables | Adj R^{2} |
|---|---|
| | 0.34 |
| | 0.049 |
| | 0.035 |
| *num_storms*_{t} | 0.029 |
| | 0.015 |
| | 0.27 |
| | 0.38 |
| | 0.41 |

| Model type | Covariance function | −2 log likelihood | AIC | BIC | RMSE (psu) | Slope/β_{1} | Intercept/β_{0} | R^{2} |
|---|---|---|---|---|---|---|---|---|
| Process | IID | 9935.9 | 9937.9 | 9943.5 | 2.9 | 0.98 | 0.84 | 0.74 |
| | Exponential | 7430.7 | 7584.7 | 7714.7 | 2.0 | 0.95 | 1.03 | 0.89 |
| | Exponential + σ^{2}_{n}* | 7424.0 | 7580.0 | 7711.7 | 2.0 | 0.96 | 0.96 | 0.89 |
| | Gaussian | 8198.0 | 8356.0 | 8489.5 | 2.3 | 0.94 | 1.37 | 0.84 |
| | Gaussian + σ^{2}_{n}* | 7532.0 | 7686.0 | 7816.0 | 2.1 | 0.94 | 1.15 | 0.87 |
| | Spherical | 7570.0 | 7722.0 | 7850.4 | 2.0 | 0.95 | 1.07 | 0.88 |
| | Spherical + σ^{2}_{n}* | 7571.6 | 7727.6 | 7859.3 | 2.0 | 0.96 | 0.93 | 0.89 |
| Time | IID | 7077.5 | 7079.5 | 7084.9 | 2.6 | 0.83* | 3.47* | 0.83 |
| | Exponential | Infinite | | | | | | |
| | Exponential + σ^{2}_{n} | 6217.1 | 6367.1 | 6493.7 | 2.1 | 0.92* | 1.55* | 0.87 |
| | Gaussian | 6281.0 | 6433.0 | 6561.3 | 2.2 | 0.90* | 1.98* | 0.86 |
| | Gaussian + σ^{2}_{n}* | 6214.0 | 6366.0 | 6494.4 | 2.2 | 0.91* | 1.90* | 0.86 |
| | Spherical | 6199.6 | 6315.6 | 6479.9 | 2.2 | 0.91* | 1.86* | 0.86 |
| | Spherical + σ^{2}_{n} | 6201.3 | 6357.3 | 6489.1 | 2.2 | 0.91* | 1.86* | 0.86 |

| 2mo_FWI_{rt} | 1wk_FWI_{rt} | Time periods and mean predicted salinity rank (mmyy, r) |
|---|---|---|
| Flood | Flood | (0603, 40), (0999*, 30) |
| | High | None |
| | Moderate | (0687, 28), (0689, 27) |
| | Low | None |
| High | Flood | (0903*, 39), (0690, 29) |
| | High | (0904*, 32) |
| | Moderate | (0698, 38), (0693, 36), (0697, 35) |
| | Low | None |
| Moderate | Flood | (0996*, 33) |
| | High | (0696, 26), (0900, 24) |
| | Moderate | (0605, 37), (0989, 31), (0601, 26), (0600, 22), (0604, 21), (0688, 16), (0990, 13), (0692, 10) |
| | Low | (0694, 17) |
| Low | Flood | (0987, 18) |
| | High | (0695, 6) |
| | Moderate | (0905, 20) |
| | Low | (0997, 15), (0699, 12), (0901, 8), (0902, 7), (0993, 5), (0602, 4), (0988, 3), (0994, 1) |