
## Abstract

It is widely known that small and semi-enclosed basins can be prone to storm surge events, mainly because of their meteorological exposure, the presence of a continental shelf or their shape. These storm surges can induce coastal flooding and consequent problems in terms of infrastructure stability and damage to tourism or, in some cases, can threaten human life. Therefore, in order to manage the risk, coastal managers and policymakers need forecast and hindcast tools. Such tools must support both preventive actions, taken before the occurrence of the natural phenomenon, and simultaneous actions, useful during the event itself. This work aims at answering these needs by presenting a review of two methods for storm surge forecasting and hindcasting in semi-enclosed basins.

### Keywords

- storm surge forecasting
- storm surge hindcasting
- sea level variations
- artificial neural networks
- extreme events
- generalized Pareto distribution
- coastal flood risk
- risk assessment
- sea level rise
- climate change

## 1. Introduction

The last few years have seen an increase of human settlement in coastal areas, driven by social, touristic and economic factors, and a consequent growth in the human occupation of those areas (e.g. [1]). All these factors, combined with the attention paid to sea level rise scenarios (e.g. [2]) and to the problem of coastal erosion (e.g. [3, 4, 5]), justify the increasing attention paid to storm surge events and related coastal flooding.

As is well known, the tide level is the superposition of a harmonic component, related to the mutual influence of the earth, sun and moon, and a meteorological one. The first is purely deterministic and has been studied since the last years of the nineteenth century. Indeed, although observations of the phenomenon date back to antiquity, only after Newton’s theory [6] did researchers such as George Darwin [7] provide the first mathematical descriptions of the tides. The first complete model of the tides, however, is the “harmonic theory” developed in 1927 by Doodson [8]. It considers the tides as a superposition of sinusoidal constituents, whose frequencies correspond to those evaluated on the basis of the astronomical forcing. Their amplitudes, instead, are influenced by shallow water conditions, coastal effects and morphological phenomena due to the interaction between waves and bottom [9]. For a more comprehensive description of the “harmonic theory”, the reader may refer to Godin, McCully, and Pugh [9, 10, 11].
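The superposition of sinusoidal constituents described by the harmonic theory can be sketched as below. The periods are those of the well-known M2, S2 and K1 constituents, while the amplitudes and phases are made-up placeholders (in practice they result from a harmonic analysis of measured level time series):

```python
import numpy as np

# Placeholder amplitudes [m] and phases [rad]; periods [h] of M2, S2 and K1.
AMP = [0.23, 0.11, 0.05]
PERIOD = [12.42, 12.00, 23.93]
PHASE = [0.0, 0.8, 1.6]

def astronomical_tide(t_hours):
    """Harmonic theory: tide level as a sum of sinusoidal constituents."""
    t = np.asarray(t_hours, dtype=float)
    return sum(a * np.cos(2.0 * np.pi * t / p - g)
               for a, p, g in zip(AMP, PERIOD, PHASE))
```

By construction, the level can never exceed the sum of the constituent amplitudes, which is a quick sanity check on any such reconstruction.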

The difference between the total measured level oscillation and the deterministic component can be related to meteorological phenomena and may be defined as *storm surge*. It is also referred to as *meteorological component* or *residual*. In European literature, the term storm surge is commonly used [12].

Generally speaking, the generation of storm surges is mainly due to pressure gradients and wind set-up. The wind pushes the water along its principal direction, causing an increase in sea level. The other factor that induces sea level variations is the barotropic field (i.e. pressure anomalies), through the physical phenomenon known as the “inverse barometer effect”: the mean sea level rises as the pressure decreases (roughly 1 cm for each hPa).
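The rule of thumb of about 1 cm per hPa follows directly from the hydrostatic balance Δη = Δp/(ρg). A minimal sketch (the reference pressure of 1013.25 hPa and the seawater density are nominal assumptions, not values from this chapter):

```python
RHO_SEA = 1025.0   # nominal seawater density [kg/m^3]
G = 9.81           # gravity [m/s^2]

def inverse_barometer(p_hpa, p_ref_hpa=1013.25):
    """Static sea level change [m] induced by a pressure anomaly:
    eta = (p_ref - p) / (rho * g), i.e. roughly +1 cm per hPa of drop."""
    return (p_ref_hpa - p_hpa) * 100.0 / (RHO_SEA * G)  # 1 hPa = 100 Pa
```

A drop of 1 hPa below the reference yields about 0.0099 m, i.e. the quoted 1 cm.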

In small and semi-enclosed basins (e.g. Adriatic Sea, Black Sea, Caspian Sea, Great Lakes, etc.), there is another effect contributing to the level increase. It arises when meteorological perturbations persist for a long time over the basin (e.g. up to several days). In those situations, there are two main consequences: the first is that it can be difficult to forecast the storm surge (e.g. [13]); the second is related to the dynamics of the basin. Indeed, if the basin is semi-enclosed, its natural modes, i.e. seiches, may be excited, and level oscillations may persist for several days over the whole basin area (e.g. [14, 15, 16]).

While the astronomical component can be estimated and reconstructed by means of standard techniques based on the theory of harmonic analysis (e.g. [11, 17, 18]) applied to measured level time series, the reconstruction of storm surge events, for both forecast and hindcast purposes, is not deterministic and requires more effort.

The topic of storm surges and their forecast has been investigated in the past by many researchers (e.g. [19, 20, 21, 22]), and its importance is also highlighted by the fact that several countries have equipped themselves with early warning systems (e.g. [23]).

From a practical point of view, there are three ways to study storm surges: pure numerical approaches (i.e. circulation models), statistical approaches (e.g. artificial neural networks) and mixed approaches.

In the case of a pure numerical approach, the aim is to focus on the ability of the model to reproduce the physical processes (e.g. [24]). These methods are physics-based and are often referred to as “dynamical methods” (e.g. [25]). They numerically solve the classical set of equations composed of the continuity equation and the equation of motion, where the initial and boundary conditions are given by a meteorological model (e.g. [25]). As pointed out by Vilibić et al. [25], the first examples of dynamical methods, at least for the Adriatic Sea, are [26, 27], for the Adriatic area only and for the entire Mediterranean Sea, respectively. An improvement in terms of forecast reliability (also in terms of forecast window) is the use of ensemble-based prediction systems (e.g. [28]), which allow a more consistent forecast than that obtained with a single deterministic run.

Statistical approaches, instead, are based on regression models that estimate a series of predictors and weights to forecast the desired variable (i.e. the water level). The database is usually composed of observed data and/or forecast or hindcast data given by a meteorological model (e.g. [29, 30, 31, 32, 33]).

Numerical models are accurate and highly reliable, as they are physics-based; however, they involve high computational costs compared to statistical ones. The latter are fast and show acceptable reliability, but embed no physics.

A possible way to overcome these problems is the use of the last category of approaches, the “mixed” ones. They statistically correct the results coming from numerical models with the aim of reducing the computational costs. The idea is to run numerical models on meshes with lower resolution (i.e. lower computational costs), accepting a lower reliability, and then to correct the obtained results by means of statistical tools (e.g. [34, 35, 36]).

In all three cases (dynamical, statistical and mixed approaches), meteorological data provided by global general circulation models (e.g. the European Centre for Medium-Range Weather Forecasts (ECMWF) model, the Meteorological Research Institute model (MRI-AGCM3.2) [37, 38]) have to be used, so the final reliability is often related to that of the GCMs.

This chapter proposes a review of two simplified methods [36, 39] aimed at forecasting and hindcasting storm surge levels. Both approaches are mixed: they share a first physics-based step, in which the water level due to the wind is evaluated (once and for all), and a separate step in which the obtained results are corrected (i.e. the barotropic effects are considered) by means of statistical techniques. The chapter is structured as follows. Section 2 describes the general outline of the two methods: Section 2.2 illustrates the forecast method, while Section 2.3 details the hindcast one. The applications of the two approaches are described in Section 3. Concluding remarks close the chapter.

## 2. Methods

### 2.1 Outline

The whole idea behind the presented forecast and hindcast methods lies in the theory of linear dynamic systems (e.g. [40]). The concept is to consider, at least for technical purposes, the dynamics of semi-enclosed basins (e.g. [41]) as linear. The linearity makes it possible to compute the unit response function of the basin and to use it to determine, by means of the convolution integral, the water level response to any wind time series.

More in detail, at this step the basin is discretized into small areas. Within each area, the wind is considered homogeneous and constant. The total number of areas must be chosen in order to capture the variation of the wind field across the basin (i.e. its gradients). Considering a generic point of interest in the domain (hereinafter referred to as POI), the elevation $\eta_i(t)$ induced at the POI by a single wind stress impulse acting on the generic area *i* of the domain will be

$$\eta_i(t) = f_{U,i}(t)\,U_i + f_{V,i}(t)\,V_i \tag{1}$$

where $f_{U,i}(t)$ and $f_{V,i}(t)$ are the unit response functions, evaluated at time *t*, for a unit wind stress impulse acting on area *i*, while $U_i$ and $V_i$ are the components of the wind stress impulse along the two Cartesian directions (zonal and meridional, respectively) acting on the same area.

Considering *M* successive wind stress impulses, each of duration $\Delta\tau$, Eq. (1) becomes

$$\eta_i(t) = \sum_{m=1}^{M}\left[f_{U,i}(t - m\,\Delta\tau)\,U_{i,m} + f_{V,i}(t - m\,\Delta\tau)\,V_{i,m}\right] \tag{2}$$

The value obtained with Eq. (2) is the water elevation at the POI due to a series of wind stress impulses acting on a single area. The contribution of the *N* areas can be easily calculated by using the superposition of the role of each area; therefore, Eq. (2) must be modified as

$$\eta(t) = \sum_{i=1}^{N}\eta_i(t) \tag{3}$$

with $\eta_i(t)$ given by Eq. (2).

The values of *U* and *V* can be forecast or hindcast data provided by global general circulation models and must be taken at a point within each area.

The unit response functions $f_{U,i}$ and $f_{V,i}$ have to be computed numerically, once and for all, by means of a circulation model forced with a unit wind stress impulse acting on each area.
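In discrete form, Eqs. (2) and (3) amount to a convolution of each area’s response functions with its wind stress series, followed by a superposition over the areas. A minimal sketch (function names and the inputs in the usage note are illustrative, not part of the original method):

```python
import numpy as np

def area_level(f_u, f_v, U, V):
    """Eq. (2): level at the POI from one area, as the discrete convolution
    of the unit response functions with the wind stress impulse series."""
    n = len(U)
    return np.convolve(f_u, U)[:n] + np.convolve(f_v, V)[:n]

def poi_level(responses, stresses):
    """Eq. (3): superposition of the contributions of the N areas.
    responses: list of (f_u, f_v); stresses: list of (U, V), one per area."""
    return sum(area_level(fu, fv, U, V)
               for (fu, fv), (U, V) in zip(responses, stresses))
```

For instance, a decaying response `[1.0, 0.5, 0.25]` convolved with a single stress impulse of magnitude 2 yields a level that decays as `[2.0, 1.0, 0.5]`, mirroring the impulse-response picture of the method.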

However, the computed level is due to the wind field effect and does not consider the role of the barotropic field in the storm surge generation. For this reason, it can be viewed as a “raw level”.

The pressure field is considered using statistical techniques. The forecast and the hindcast models have the physics-based module as a common part but correct the raw level in a different manner.

### 2.2 Forecast method statistical correction

As previously underlined, the statistical corrections are often carried out using regression models or, alternatively, artificial neural networks (ANNs).

In the proposed method, the use of a series of ANNs, each one correcting the forecast for a given lead time, is suggested. Without claiming to be exhaustive, ANNs can be defined as statistical tools that reproduce the human ability of learning. They are made up of a layer of input neurons that, through connections characterized by their own weights (organized in hidden layers with given activation functions), produce one or more output neurons.

The learning phase is basically the way in which the system acquires information in order to predict future events on the basis of past experience. Mathematically, this phase consists of a training aimed at reducing the mean square error (MSE) between the output neuron(s) and the desired output by changing the weights of the links between neurons. The procedure is iterative and is stopped when the MSE falls below a given threshold (or after a fixed number of training epochs). After the learning phase, the network has to be tested in order to check its performance. If the results are not consistent with the desired accuracy, the training phase can be repeated. It has to be noticed that this phase is a “black box”, so results from one training run to another may differ considerably. At the end of this process, the network can be used in operational situations.

In the presented case, the choice was to use a series of ANNs instead of only one, because each ANN operates on a single lead time (e.g. to correct a forecast of 48 h, 48 ANNs are needed).

A crucial point in ANNs is the choice of input neurons. In the case at hand, the use of (a) the raw level time series, (b) recent level measurements at the POI and (c) the pressures at the centre of each area used within the frame of the physics-based module is proposed. Of course, a sensitivity analysis could be useful in order to guide this choice in other cases; i.e. the procedure is site-dependent.
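As an illustration of the per-lead-time correction, the sketch below trains one small network per lead time on inputs of the kind listed above. It is only a minimal stand-in: the original work uses two hidden layers trained with a Levenberg–Marquardt scheme, whereas this sketch uses a single hidden layer trained by plain gradient descent, and all names and array shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden=8, epochs=3000, lr=0.05):
    """Fit a one-hidden-layer tanh network by gradient descent on the MSE
    (stand-in for the two-hidden-layer Levenberg-Marquardt training)."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden); b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # hidden activations
        err = h @ W2 + b2 - y                    # prediction error
        gW2 = h.T @ err / n; gb2 = err.mean()
        gh = np.outer(err, W2) * (1.0 - h**2)    # back-propagated error
        gW1 = X.T @ gh / n; gb1 = gh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# One network per lead time, e.g. for a 48 h forecast:
# nets = [train_mlp(X_lead[k], surge_obs[k]) for k in range(48)]
# where X_lead[k] stacks the raw level, recent measurements and pressures.
```

The point of the sketch is the structure (one independent network per lead time), not the training algorithm, which would be replaced in practice.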

### 2.3 Hindcast method statistical correction

The statistical correction in the hindcast method is different from that of the forecast one. The reason lies in the different purposes of the method. Indeed, when the measured data time series are not long enough to be considered statistically representative for high return periods, the use of hindcast methods is the only way to obtain a sufficiently long set of reliable data.

In these cases, the interest is not focused on the timing of the reconstructed time series, but on the reliability in reproducing the extreme events.

For these reasons, the correction of the raw level (obtained using reanalysis data) is carried out by adding the pressure field contribution through a coefficient $C_p$ that can be estimated by comparing synchronous observed residual levels and pressure values at a generic point of interest. This means that Eq. (3) is modified as follows:

$$\eta(t) = \sum_{i=1}^{N}\eta_i(t) + C_p\,\Delta p(t) \tag{4}$$

where $\Delta p(t)$ is the pressure anomaly at the POI, i.e. the difference between the actual atmospheric pressure and a reference value.

Due to the aim of the method (i.e. achieving reliable estimates of extreme events), a correction based on the maximum hindcast error within a given temporal window has to be defined. Having measured data available, it is possible to define, for each temporal window, the ratio between the observed and the hindcast maximum levels:

$$C_{cal} = \frac{\eta_{obs}^{max}}{\eta_{hind}^{max}} \tag{5}$$

Following the technique proposed by [44], it is possible to evaluate a quantile of the ECDF of the random variable $C_{cal}$. This quantile should be viewed as a comparison between the extreme values of the observations and of the corrected hindcast series. It is clear that this method requires the availability (even for a few years) of measured tide levels and pressure values.
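The window-by-window ratio of observed to hindcast maxima and the retained quantile of its ECDF can be sketched as follows (the 0.9 quantile is an arbitrary placeholder; [44] should be consulted for the actual choice):

```python
import numpy as np

def calibration_coefficient(obs_max, hind_max, quantile=0.9):
    """Ratio of observed to hindcast maxima, window by window;
    a high quantile of its ECDF is retained as the correction C_cal."""
    c = np.asarray(obs_max, dtype=float) / np.asarray(hind_max, dtype=float)
    return np.quantile(c, quantile)
```

`obs_max` and `hind_max` hold the maximum residual level per temporal window for the observed and hindcast series, respectively.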

## 3. Application of the methods

This section aims to illustrate the application of the described methods showing their general performances by means of two applications to real cases. The forecast method is applied to the northern part of the Adriatic Sea, while the hindcast one to the southern Adriatic Italian coast.

### 3.1 Forecast method: the case of Venice

This section shows an example of the application of the forecast method to a POI located in the northern part of the Adriatic Sea. For this basin, storm surges are mainly due to Atlantic perturbations (i.e. cyclones). Due to the presence of the continental shelf and to thermal effects, the perturbations are amplified [45]. A famous example of these effects is Venice and the phenomenon of “acqua alta”, which causes the partial or total flooding of the city, with damage to historical monuments, the economy and private buildings. The recent flooding event of November 2019 is estimated to have caused damage of about 4 billion euros to the city.

The physical characteristics of the weather conditions can induce resonance phenomena (e.g. [14, 15, 16]) with level oscillations persisting for several days in the whole basin [46].

The worst situation for the Adriatic Sea in terms of storm surge is when the perturbations come from south-east (i.e. “sirocco winds”) and propagate along the main axis of the basin (e.g. [45]).

The Adriatic Sea has been discretized into 19 areas (see Figure 1).

The unit wind response functions have been estimated for each of the 19 areas using the “regional ocean modelling system” (ROMS, e.g. [42, 47]). Due to the particular geographical conditions and to the prevalent wind directions (i.e. Sirocco), only the wind stress component acting along the main axis of the Adriatic Sea has been considered, so that Eq. (3) reduces to

$$\eta(t) = \sum_{i=1}^{N}\sum_{m=1}^{M} f_i(t - m\,\Delta\tau)\,\tau_{i,m} \tag{6}$$

where $\tau_{i,m}$ is the wind stress component along the main axis of the basin acting on the *i*th area.

About the simulations, the grid resolution is 3′ (i.e. about 5500 m), while the duration of the unit wind stress impulse is $\Delta\tau$ = 900 s. Figure 2 shows an example of the computed response functions for two areas located in two different locations in the basin.

The wind stress values have been computed on the basis of the wind speed by using the relationship proposed by Drago and Iovenitti [49]

$$\tau = \rho_a\,C_D\,W\,\lvert W\rvert \tag{7}$$

where *W* is the actual wind speed, $\rho_a$ is the air density and $C_D$ is the drag coefficient, which in turn depends on the wind speed itself.

Figure 3 shows the performances of the forecast method, illustrating some typical results: the forecast storm surge level (dashed grey line), the measured storm surge level (triangle symbols), the forecast total tide level (dashed black line) and the measured total tide level (black circles). The measurements have been taken from the records of the mareographic station (45.41°N, 12.44°E) located at the Lido mouth (Italian National Mareographic Network). The figure also shows the total measured tide level (grey cross symbols) and the forecast one (solid grey lines). The harmonic component has been evaluated by means of [17] considering seven constituents (M2, S2, N2, K2, K1, O1, P1), as suggested in the literature (e.g. [51]).

The statistical correction has been made by training 48 ANNs, one for each lead time (Δ*t<sub>n</sub>* = *n* hours, *n* = 1, 2, ..., 48). The used ANNs are multilayer networks with an input vector, two hidden layers and one output value (the storm surge). The training has been carried out using a back-propagation scheme; in this case, a Levenberg–Marquardt algorithm has been used. The learning phase of the ANNs covered 3 years (2009–2011), the testing period is the year 2012, while the validation period is the year 2013.

As can be seen by inspecting Figure 3, the training phase confirms the good performance of the algorithm and, in the validation step, at least from a qualitative point of view, it is possible to appreciate the accuracy of the model.

Regarding quantitative performance, a series of statistical parameters has been evaluated. The mean (*μ*) and standard deviation (*σ*) of the differences between predicted and observed total tide have been computed for the years 2009–2011 (training period), the year 2012 (testing period) and the year 2013 (validation period, e.g. [34, 47]). Table 1 summarizes the results of the statistical analysis. It is important to observe that the absolute value of the mean is always lower than about 0.04 m, while the standard deviation (which increases as the lead time increases) ranges between 0.05 m and 0.10 m.

| Lead time [hours] | *μ* [m] | *σ* [m] | Lead time [hours] | *μ* [m] | *σ* [m] |
|---|---|---|---|---|---|
| 1 | −0.008 | 0.045 | 13 | 0.008 | 0.077 |
| 2 | −0.004 | 0.054 | 14 | 0.013 | 0.083 |
| 3 | 0.017 | 0.072 | 15 | 0.025 | 0.091 |
| 4 | 0.012 | 0.077 | 16 | 0.028 | 0.095 |
| 5 | 0.010 | 0.077 | 17 | 0.043 | 0.097 |
| 6 | 0.020 | 0.069 | 18 | 0.012 | 0.079 |
| 7 | 0.014 | 0.062 | 19 | −0.002 | 0.070 |
| 8 | 0.019 | 0.063 | 20 | 0.006 | 0.064 |
| 9 | 0.025 | 0.062 | 21 | 0.019 | 0.073 |
| 10 | 0.011 | 0.062 | 22 | 0.026 | 0.082 |
| 11 | 0.000 | 0.062 | 23 | 0.001 | 0.077 |
| 12 | 0.000 | 0.066 | 24 | −0.006 | 0.078 |

**Table 1.** Mean (*μ*) and standard deviation (*σ*) of the differences between forecast and measured total tide level as a function of lead time.

The comparison between the obtained results and those available in the literature (e.g. [34]) reveals that the achieved reliability is satisfactory, considering the simplicity and low computational costs of the method.

### 3.2 Hindcast method: the case of Manfredonia

This section aims at showing an example of the application of the hindcast method to a POI located in the south of Italy, in the Gulf of Manfredonia (41.38°N, 15.55°E). When the wind comes from the south or south-east, river flow is influenced by the downstream boundary condition, and several areas are flooded. These events cause damage to economic activities, private houses, etc. Moreover, the coastal area of the Gulf suffers from erosion. Although standard protection structures (e.g. [52, 53]) have been built, erosion problems are still unresolved.

In this scenario, the utility of having a tool to perform hazard analysis is clear [3].

As observed, this method relies on the same mathematical hypotheses as the forecast one, using a different statistical approach to correct the raw data coming from the dynamical step.

The response functions of the basin have been evaluated in the same way as described in Section 3.1 and extracted at a different point of interest (see Figure 1).

Following the description in Section 2.3, a set of data has to be available to apply the hindcast method. In the present case, the wind and atmospheric pressure data are taken from the ERA-Interim database of the European Centre for Medium-Range Weather Forecasts (e.g. [54]). The spatial resolution of the ERA-Interim data is 0.75°, while the response functions of the basin, as described above, have been computed on a finer grid (3′). To overcome this mismatch, the values of wind and pressure (acting at the centre of each area) have been evaluated by performing a linear interpolation.
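The interpolation of the coarse ERA-Interim fields (0.75°) to the centre of each area can be sketched as a standard bilinear scheme (the grid layout and names are assumptions; the chapter only states that a linear interpolation was performed):

```python
import numpy as np

def bilinear(lon_g, lat_g, field, lon_p, lat_p):
    """Bilinear interpolation of a coarse field (rows: lat, cols: lon)
    to a point assumed to lie strictly inside the grid."""
    i = np.searchsorted(lon_g, lon_p) - 1
    j = np.searchsorted(lat_g, lat_p) - 1
    x = (lon_p - lon_g[i]) / (lon_g[i + 1] - lon_g[i])
    y = (lat_p - lat_g[j]) / (lat_g[j + 1] - lat_g[j])
    return ((1 - x) * (1 - y) * field[j, i] + x * (1 - y) * field[j, i + 1]
            + (1 - x) * y * field[j + 1, i] + x * y * field[j + 1, i + 1])
```

On a field that is linear in longitude and latitude, the scheme reproduces the field exactly, which is a convenient correctness check.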

The tidal data are those collected by means of the tidal gauge station owned by the Apulia Region Meteomarine Network (also referred to as SIMOP, e.g. [55]), which collects wave, wind and tidal data along the Apulian coasts [56]. This station does not gather atmospheric pressure measurements. Therefore, in order to compute the term related to the pressure gradients, these data have been taken at two locations near Manfredonia, Vieste and Bari (see Figure 4), where two tidal gauges of the National Mareographic Network are installed.

As for the case of Venice, the raw level can be evaluated using a simplified version of Eq. (4), considering only the projection of the wind stress along the principal orientation of the basin:

$$\eta(t) = \sum_{i=1}^{N}\sum_{m=1}^{M} f_i(t - m\,\Delta\tau)\,\tau_{i,m} + C_p\,\Delta p(t) \tag{9}$$

where $\tau_{i,m}$ is the wind stress component along the main axis acting on the *i*th area and $\Delta p(t)$ is the pressure anomaly at the POI.

The coefficient $C_p$ has been estimated by comparing synchronous observed residual levels and pressure values, as described in Section 2.3.

Due to the proximity of the stations to the POI, as might be expected, the pressure measurements are in agreement. For a more detailed description, the reader may refer to [39]. It is possible to reach the same conclusion by considering the quantiles of the measurements in Vieste and Bari (see Figure 5) and by comparing the quantiles of the pressure extracted from the ERA-Interim database at the centre of area 12 (that is, the area the POI belongs to, see Figure 1) against the quantiles of the pressure observed at Bari.

Based on these outcomes, and arguing that the pressure field is almost the same in the area between Bari and Vieste, a value of $C_p$ estimated on the basis of the Bari measurements has been adopted for Manfredonia.

A total of 39 years (from 1979 to 2017) of residual tide levels has been reconstructed by means of Eq. (9). Also in this case, although Eq. (9) already takes into account the pressure anomalies (i.e. through the term $C_p\,\Delta p$), the reconstructed levels have been corrected using the term $C_{cal}$ given by Eq. (5). The selection has been made by matching the quantiles of the probability density functions of the hindcast and observed extreme values.

The extreme value extraction has been performed by means of a peak over threshold (POT) analysis, and the obtained data have been used to fit the generalized Pareto distribution (GPD) (e.g. [58]). The threshold selection has been made following the standard technique proposed by [58].
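A POT extraction with a GPD fit on the excesses, and the corresponding return level formula, can be sketched as follows (this uses `scipy.stats.genpareto` as a stand-in for the fitting procedure of [58]; all names are illustrative):

```python
import numpy as np
from scipy.stats import genpareto

def pot_gpd(levels, threshold):
    """Peaks over threshold: fit a GPD to the excesses above the threshold."""
    levels = np.asarray(levels)
    exc = levels[levels > threshold] - threshold
    shape, _, scale = genpareto.fit(exc, floc=0.0)  # location fixed at 0
    return shape, scale, len(exc)

def return_level(threshold, shape, scale, exc_per_year, T):
    """GPD return level for a return period of T years."""
    m = T * exc_per_year  # expected number of exceedances in T years
    if abs(shape) < 1e-9:  # exponential limit of the GPD
        return threshold + scale * np.log(m)
    return threshold + scale / shape * (m**shape - 1.0)
```

For the case at hand (0.10 m threshold, 162 exceedances over 39 years), the exceedance rate would be about 162/39 ≈ 4.2 events per year.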

Varying the calibration coefficient $C_{cal}$ from 1.0 up to 2.0, the return levels of the hindcast time series ($X_r^H$) and of the observed values ($X_r^M$) have been computed. Measured data show a threshold equal to 0.10 m with 162 values exceeding the threshold, and the GPD parameters have been estimated on this sample. Taking into account a value of $C_{cal}$ equal to 1.28, the ratio $X_r^H/X_r^M$ approaches 1. This means that the hindcast time series corrected by means of $C_{cal}$ shows return levels equal to those evaluated on the basis of the observed time series.

In order to gain insight into the ECDFs of the observed and hindcast extreme values, the Q-Q plots for the uncorrected series (i.e. $C_{cal}$ = 1) and for the series corrected with $C_{cal}$ = 1.28 have been evaluated. Figure 7 shows the results and exhibits the usefulness of the calibration coefficient in improving the reliability of the hindcast.

In addition, the root-mean-square error (*RMSE*), the *Bias*, the correlation coefficient (*R*), the index of agreement (*d*) and the Nash-Sutcliffe efficiency coefficient (*NSE*) have been calculated on the sample of the quantiles of the ECDFs of the hindcast and observed extreme values. The *RMSE*, the *Bias* and *R* are commonly used in the literature (e.g. [59, 60, 61]), while *d* and *NSE* provide dimensionless measures of the agreement between the two samples.

These statistical indicators have been calculated considering the corrected (*Ccal * = 1.28) and uncorrected (*Ccal * = 1) hindcast data.

Results show a moderate increase in the reliability of the corrected data (by means of the calibration coefficient). In more detail, *R* varied from 0.987 to 0.991, *d* changed from 0.952 to 0.990, and *NSE* varied from 0.853 to 0.960. Remembering the aim of the method, the same indexes have also been evaluated for the quantiles greater than 0.15 m. In this case, the sample contains the real extreme values, and the importance of the correction is emphasized.
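The indicators above can be computed as follows (a minimal sketch assuming the standard formulas for *R*, Willmott's *d* and *NSE*; the function name is illustrative):

```python
import numpy as np

def skill_indicators(obs, mod):
    """RMSE, Bias, correlation R, index of agreement d and NSE
    between observed (obs) and modelled (mod) samples."""
    obs, mod = np.asarray(obs, dtype=float), np.asarray(mod, dtype=float)
    rmse = np.sqrt(np.mean((mod - obs) ** 2))
    bias = np.mean(mod - obs)
    r = np.corrcoef(obs, mod)[0, 1]
    d = 1.0 - np.sum((mod - obs) ** 2) / np.sum(
        (np.abs(mod - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    nse = 1.0 - np.sum((mod - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return dict(RMSE=rmse, Bias=bias, R=r, d=d, NSE=nse)
```

Perfect agreement yields *RMSE* = *Bias* = 0 and *R* = *d* = *NSE* = 1, while a constant offset leaves *R* at 1 but degrades *d* and *NSE*, which is why the latter two are informative here.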

## 4. Conclusions

A simplified real-time forecast method and a simplified method for the estimation of return levels of storm surges in semi-enclosed basins have been proposed. Both approaches are mixed: results coming from a physics-based step are corrected by means of statistical techniques.

In both cases, the strategy is to estimate the dynamic response functions of the basin to a unit wind stress. These functions may be used, following the theory of linear dynamic systems, to compute the response of the considered semi-enclosed basin to any wind time series.

In this way, only the role of the wind field is considered, while the pressure field is not. In order to take into account all the meteorological parameters inducing the storm surge, a statistical correction for both models is proposed.

In the case of the forecast model, the statistical correction has been made using a series of artificial neural networks trained with (a) the raw residual level time series, (b) recent residual levels estimated at the POI on the basis of the measurements collected during the 24 h before the forecast time and (c) the forecast pressures along the basin as input neurons.

For the hindcast method, instead, the pressure field has been considered using the pressure anomaly and operating a statistical correction using a calibration coefficient.

Both approaches allow the computational costs to be reduced, since the numerical simulations have to be run once and for all for each considered basin.

The two methods have been applied to two different points of interest in the Adriatic Sea, revealing in both cases a good reliability of the obtained results, especially considering their simplicity.

It has to be noticed that these approaches are devoted to studying storm surges in semi-enclosed basins and are not able to correctly reproduce storm surges due to very rapid meteorological events (e.g. hurricanes).

## Acknowledgments

The author wants to thank Prof. Marcello Di Risio for his help in suggesting ideas and methods that have made possible the realization of these approaches. He is also grateful to all the co-authors in the papers regarding storm surges that have been cited in the references of this chapter.

## Conflict of interest

The author declares no conflict of interest.