The atmospheric movements can be described by non-linear differential equations that unfortunately have no analytical solution. The numerical methods to solve the atmospheric non-linear differential equations have been developed in different stages. During the 50s, Charney, Fjørtoft and von Neumann (1950) made a 24-hour forecast of 500 hPa geopotential height using a bidimensional quasi-geostrophic equation. After that, in 1956, Philips showed the close relation between cyclone dynamics and the global circulation using a 2-layer model.
At the beginning of the 70s, the global circulation models emerged (Lynch, 2006). These models are based on a set of non-linear differential equations, which are used to approximate the global atmospheric flow, called primitive equations. During this stage the full primitive equations were implemented without any quasi-geostrophic approximation (Williamson, 2007).
During the 80s, the regional and mesoscale numerical models appeared (Athens &Warner, 1978; Mesinger et al., 1988). The evolution of the models is a direct consequence of the increase of computer resources, and the improvement in observational networks and assimilation methods. This evolution has extended the knowledge on the dynamics and atmospheric microphysical processes.
The last period of the numerical weather prediction was initiated in the 90s. The atmosphere-ocean and atmosphere-ocean-soil coupled models, and the spatio-temporal high resolution models allowed the development of analysis and diagnostic techniques for the weather forecasting (Mechoso & Arakawa, 2003).
Until then, the numerical prediction models’ philosophy was based on the deterministic atmospheric behavior. That means, given an atmospheric initial state its evolution can be numerically predicted to give a unique final state. Consequently the efforts of the scientific community were focused on producing the most accurate prediction (Tracton & Kalnay, 1993).
Nevertheless, the formulation of models requires approximations due to unknown variables or known process that cannot be explicitly resolved using the spatio-temporal resolutions a model works with. These processes must be parameterized and this fact generates errors associated with the parameterization used in the model. Although the model could perfectly simulate all the atmospheric processes, it would be impossible to determine a realistic initial state description for all resolutions and in all places using the available observational data (Daley, 1991). Lorenz (1963) showed that small variations on the model initial conditions do not produce a single final solution, but a set of different possible solutions. That is why the predictability of the future atmospheric states is limited in time: the initial condition errors are amplified as the forecast period grows (Lorenz, 1963, 1969).
The traditional deterministic approach gave way to a new paradigm, with richer information than a single solution of the future atmospheric state. The new paradigm includes quantitative information about the uncertainty of the predictive process. The atmospheric non-linear behavior, consequently chaotic, must be treated now in a probabilistic way (Lorenz, 1963).
The improvement of numerical models will permit a better characterization of the atmospheric processes but the models will always have some limitations related to the scales of the simulated processes and the approximations made to solve numerically the equations. Another limitation of the numerical forecasting methods is the lack of observational data with high enough resolution to properly describe the initial state.
Nowadays the observational methods, the assimilation strategies and the own characteristics of the numerical models have inherent limitations that generate uncertainty in the estimation of the possible future atmospheric states. The uncertainty is amplified when the forecast period grows and when the resolution increases. Thus, the probabilistic approach seems an ideal strategy to characterize forecast uncertainty.
The atmospheric state cannot be exactly known. The analysis data always contain an error that only can be estimated. The inaccurate determination of the real atmospheric state drives to the existence of a great number of initial conditions compatible with it. A single model only provides a single solution of the future atmospheric state. The generation of multiple forecasts starting from slightly different but equally-probable initial conditions can characterize the uncertainty of the prediction (Leith, 1974).
The generation of equally probable forecasts starting from multiple realistic initial conditions introduces the probabilistic forecasting concept. A practical approximation to probabilistic forecasting based on meteorological models is the so called ensemble forecasting. The Ensemble Prediction Systems (EPS) are used operationally in several weather and climate prediction centres worldwide. The European Centre for Medium-Range Weather Forecasts (ECMWF; Molteni et al., 1996) or the Meteorological Service of Canada (Pellerin et al., 2003), among others, produce routinely ensemble predictions. These predictions have been demonstrated to be extremely useful on decision making process.
The EPS is a tool for estimating the time evolution of the probability density function viewed as an ensemble of individual selected atmospheric states. Each of these different states is physically plausible. The spread of the states is representative of the prediction error (Toth & Kalnay, 1997).
Several techniques for constructing the ensemble have been developed and applied. One of the first methods proposed for generating an ensemble of initial states is the random Monte Carlo statistical methodology. It was proposed by Leith (1974), Hollingsworth (1980) and Mullen and Baumhefner (1989), among others.
Perturbative methods that depend on the atmospheric flow are also used. These strategies are based on the generation of perturbations in the subspaces where the initial condition errors grow faster. The breeding vectors (Toth & Kalnay, 1993, 1997) or the singular vectors (Buizza & Palmer, 1995; Buizza, 1997; Hamill et al., 2000) are remarkable examples.
There are other perturbative methods that consider the model sub-grid scale errors by means of varying model physical parameterizations (Stensrud et al., 1998; Houtekamer & Mitchell, 1998; Andersson et al., 1998) or using stochastic physics (Buizza et al., 1999).
The combination of multiple model integrations initialized by multiple initial conditions determined by different analysis cycles is another strategy to generate ensembles. Using different assimilation techniques allows characterizing the uncertainties associated to the initial condition and the uncertainty associated to each model (Hou et al., 2001; Palmer et al., 2004). Finally, taking different global models as different initial conditions has been found to provide better performance than any single model system (Kalnay & Ham, 1989; Wobus & Kalnay, 1995; Krishnamurti et al., 1999; Evans et al., 2000).
The technique based on the use of multiple limited area models (LAM) and multiple initial conditions coming from several global models combined with advanced statistical post-processing techniques (Gneiting & Raftery, 2005a) has been tested in the National Centres for Environmental Prediction (NCEP; Hamill & Colucci, 1997, 1998; Stensrud et al., 1999; Du and Tracton, 2001, Wandishin et al., 2001) during the Storm and Mesoscale Ensemble Experiment (SAMEX; Hou et al., 2001). Such probabilistic predictions have also been generated over the Pacific Northwest coast (Grimit & Mass, 2002) and over the Northeast coast (Jones et al., 2007) of the United States.
The combination of multiple models and multiple analyses is part of the operational suite of NCEP (Du & Tracton, 2001) and the basic idea of the short-range EPS of Washington University (Grimit & Mass, 2002) and the Agencia Estatal de Meteorología (AEMET; García-Moya et al., 2011).
2. Atmosphere as a chaotic system
2.1. Lorenz and non-linearity
Two basic properties can dynamically characterize a chaotic system: the sensitivity to initial conditions and the topologically mixing. Sensitivity to initial conditions implies that infinitesimal changes in the system initial trajectory can lead to big changes in its final trajectory. The Lyapunov exponent (Lyapunov, 1992) gives a measure to this sensitivity to initial conditions as it quantifies the rate of separation of infinitesimally close trajectories. Generally it cannot be calculated analytically and one must use numerical techniques. In Krishnamurthy (1993) it is described how to calculate the Lyapunov exponents of a simple system. The meaning of topological mixing is that the temporal evolution of meteorological quantities in any given region of its phase space will eventually overlap with those of any other given region. This second property is necessary to distinguish between simple unstable systems and chaotic systems.
The classical example provided by Lorenz (1963) is instructive. For this reason we use it in this section, to show briefly some concepts of Chaos Theory. It comes from a simplified model of fluid convection. It consists of a dynamical system with only three degrees of freedom, but it exhibits most of the properties of other more complex chaotic systems. It is forced and dissipative (in contrast to Hamiltonian systems which conserve total energy), non-linear (as its equations contain products of dependent variables) and autonomous (all the coefficients are time independent). The Lorenz (1963) equations are:
where, in this simplified model,
Which means that an original volume in the phase space
As is shown with more detail in next sections the difficulty of weather forecasting is due either to the sensitivity of the atmosphere evolution to small changes in the initial conditions related to the analysis error and to the sensitivity of the atmospheric differential equations to small differences in the numerical schemes used to find a numerical solution or model error. Figure 2 shows the evolution of the Lorenz system for two different but similar initial conditions. The solutions are very similar up to t = 25 approximately in this case and after that the differences become larger. After t = 30 the value of the variables x and y cannot be predicted although z remains more predictable. In general, the time range within which the system remains predictable, depends on the initial condition, and this characteristic is called the flow dependency of the predictability of the system.
The effect of model errors can be shown by changing slightly the constant parameters
In light of these results (Lorenz, 1963), the question about the predictability of the atmosphere was raised for the first time, which has involved the efforts of the meteorological community to quantify it over several decades until today.
2.2. The predictability problem
A rigorous analysis of the chaotic properties of such a complex system as the atmosphere can be only achieved in simplified contexts. Atmosphere dynamics has been stated as chaotic and it is well established that there is an effective time barrier beyond which the detailed prediction of the weather may remain impossible (Lorenz, 1969). Predictability, or the degree to which a correct forecast can be made, depends on the spatial and temporal scales (from few hours at the mesoscale to few weeks at the planetary scale) and also on the variable (for instance, surface wind and temperature, precipitation or cloudiness).
Atmospheric chaos, uncertainty, predictability and instability are related concepts. Due to the approximate simulation of atmospheric processes, small errors in the initial conditions and model errors are the two main sources of uncertainties that limit the skill of a single deterministic forecast (Lorenz, 1963). Uncertainty limits the predictability, especially under unstable atmospheric conditions. The atmospheric instabilities related to low predictability conditions are the baroclinic instability at synoptic scales (Buizza & Palmer, 1995) and inertial and potential instabilities (e.g. deep convection) on the mesoscale, among others (Hohenegger & Schär, 2007; Zhang, 2005; Roebber & Reuter, 2002; Emanuel, 1979). This inherent limitation in predictability has led to the concept and development of ensemble prediction systems, which provide probabilistic forecasts to complement the traditional deterministic forecasts (Palmer et al., 1992).
3. Ensemble prediction systems
3.1. Uncertainty sources in numerical weather prediction
As indicated before, due to the chaotic nature of weather, there are several uncertainty or error sources in the Numerical Weather Prediction (NWP) framework that can grow and limit the predictability (Lorenz, 1963, 1969) of atmospheric flow. Forecast errors can arise due to inaccuracies in the initial condition atmospheric state estimates or due to imperfect data assimilation systems (Initial Conditions forecast error source), inadequacies of the NWP models themselves (Model Formulation forecast error source). Processes that take place at spatial scales that are shorter than the truncation scale of NWP models must be parameterized with sometimes inexact approximations thus giving us another source of forecast error (Parameterization forecast error source). One approach of NWP is to use Limited Area Models (LAMs) where the lateral conditions come from a global NWP models. This procedure is another source of forecast error (Lateral Boundary Conditions forecast error source). So far, the main error sources are: Initial Conditions (IC), Model Formulation, Parameterization and Lateral Boundary Conditions (LBC) error sources.
To the extent that these error sources project onto dynamical instabilities of the chaotic atmospheric system, such error will grow with time and evolve into spatial structures favoured by the atmospheric flow of the day. The inherent atmospheric predictability is thus state-dependent.
To estimate these uncertainties or errors, i.e. the predictability, many operational and scientific centres produce ensemble forecasts (e.g. NCEP, ECMWF, etc.). The idea of using ensemble forecasts has been know for many years (Leith, 1974). Since the early 1990s, many centres generate ensemble forecasts. The methodology that is behind is to run multiple (ensemble) forecast integrations from slightly perturbed IC (IC forecast error source), using multiple models or perturbing model formulation (Model Formulation forecast error source). Adding stochastic physics parameterizations (Ehrendorfer, 1997; Palmer, 2001) or using multiple boundary conditions (Lateral Boundary Conditions forecast error source) among others techniques is described below. The discrete distribution of ensemble forecasts can be interpreted as a forecast Probability Density Function (PDF). If an idealized forecast ensemble can be constructed that properly characterizes all sources of forecast errors, then the forecast PDFs would be reliable (see section 5) and skilful (sharper than the climatological PDF). No further information would be needed to make trustworthy forecast-error predictions, since a perfect PDF is a complete statement of the actual forecast uncertainty.
In practice, estimates of all the forecast-error sources mentioned above are inexact, leading to PDFs from real ensemble forecasts with substantial errors in both of the first two moments (mean and variance). These limitations are particularly pronounced for mesoscale prediction of near-surface weather variables, where large underdispersion results from insufficient ensemble size, inadequate parameterization of sub-grid scale processes, and incomplete or inaccurate knowledge of land surface boundary conditions (Eckel & Mass, 2005). Real ensemble forecast distributions, although generated using incomplete representations of weather forecast error sources, often represent a substantial portion of the true forecast uncertainty.
3.1.1. Initial conditions forecast error source
It is clear that the atmospheric state at a given time is not perfectly known; not only the inherent observational errors alone guarantee this, but also the sparse network of observations worldwide that sample the atmosphere only at limited intervals with inexact results. In addition network density and design can yield errors in regional averages (PaiMazumder & Mölders, 2009). Another contribution to IC forecast error source is the Data Assimilation (DA) system used. Every DA system is affected by the characteristic errors of both the observations incorporated in the analysis and of the short-range model forecast, which is typically used as a background or
3.1.2. Model formulation forecast error source
NWP model inadequacy is inevitable given to our inability to represent numerically the governing atmospheric physical laws in full. Contributions to this forecast-error source can be found in the model used which is, of course, a simplified scheme of what really happens in the atmosphere, dynamical formulation, different discretization methods, the numerical method employed to integrate the model and the different horizontal and vertical discretization resolutions used.
The model formulation forecast error in conjunction with another forecast error sources such as parameterizations has been recognized traditionally by operational forecasters in NWP centres. They usually select
3.1.3. Parameterization forecast error source
There are several parameterized processes in NWP models: those which are taking place at smaller spatial scales than the truncation scale of the NWP model and are not resolved explicitly by the model as convection. Another one is introduced in a simplified way due to computer time limitations like radiation, and finally processes which are not taking into account in the NWP model dynamic part as microphysics in clouds. All theses processes are called sub-grid processes. It is assumed that sub-grid processes are in equilibrium with grid resolved states and so they can be represented statistically from them. A parameterization is the statistical method used when representing the sub-grid processes. Parameterizations are always imperfect representation of atmospheric processes so they always include inherent errors (Tribbia & Baumhefner, 1988; Palmer, 1997). NWP parameterizations have a time and space scale dependency. At small scales, forecast verification is primarily concerned with the locations and amounts of precipitation and other sensible weather parameters, which are often directly affected by the assumptions used to develop the model parameterization schemes for convection and other processes. Moreover, especially for the higher model resolutions, the implicit equilibrium assumption of sub-grid processes with model state could break down being another source of parameterization uncertainty.
3.1.4. Lateral boundary condition forecast error source
The LBC forecast error is only present in LAMs or regional models, which have as inputs lateral boundary values spatially and temporally interpolated from a coarser resolution grid-point or spectral model. So the coarser model errors are translated into LAMs as LBC error source. For instance, a possible configuration for a LAM EPS could include lateral boundary conditions from an ensemble of global forecasts.
3.2. Techniques used by global models
For many years operational forecasters, particularly medium range forecasters in meteorological services, have had access to some forecast products coming from global NWP centres other than their own. They routinely compare forecasts from different centres to assess the confidence in the forecasts of their own models, and to determine possible alternative forecasts. This set of available forecasts is often called the
Hoffman and Kalnay (1983) proposed a
A better solution is to sample the different error sources that were indicated before. Depending on the sampling technique we obtain different methods: a Monte Carlo approach as proposed by Leith (1974), Hollingsworth (1980) and Mullen and Baumhefner (1989) among others. In general, the technique consists of sampling all sources of forecast error, by adding or perturbing any input variable (analysis, initial conditions, boundary conditions etc.) and whatsoever meteorological parameter that is not perfectly known. These perturbations can be generated in different ways. The main limitation of the Monte Carlo approach is the need to perform a high number of perturbations in order to have a proper description of the initial uncertainty, which is usually far from the available computational resources. This limitation leads to reduced sampling by just sampling the leading sources of forecast error due to the complexity and high dimensionality of the system. Reduced sampling identifies active components that will dominate forecast error growth.
IC forecast error source have a dominant effect. To sample it several techniques have been available. One of them is the initial perturbations method, which consists of adding small perturbations to the initial analysis, such as NCEP’s
An alternative to the breed mode is the ECMWF’s
In addition to the breeding and singular vector methods there are
Model forecast error source, i.e. model formulation and parameterization error sources together, is another component to take into account. To represent model uncertainty several approaches have been used: the m
So, in order to sample model forecast error an ensemble forecasts are produced by using different numerical models (multi-model approach). The multi-model approach implies equiprobable members which is not always the case. An alternative method for sampling model forecast errors is using different physical packages (multi-physics approach). Another approach is the
3.3. Techniques used by limited area models
Not only global models can be used in building EPS, but also Limited Area Models (LAM) can be used to create LAM EPS, normally used for the short range. Error sources in LAM EPS are the same as in global EPS, but LAM models require lateral boundary conditions that update the weather situation regularly throughout the integration. These lateral boundary conditions introduce a main source of uncertainty in LAM ensembles. Both LBCs and ICs give their contribution to the spread and skill of the system (Clark et al., 2009). All the techniques discussed so far can be applied to generate LAM EPS. A very popular generating technique is the downscaling of global ensemble forecasts. This technique consists of using the selected global ensemble members (chosen by clustering) as initial and boundary conditions for a limited area ensemble system. The difficulty is that the perturbations generated from the global EPS are usually effective only on the medium range and large scales. Therefore they are not likely optimal for short range ensemble forecasts. Another technique for sampling lateral boundaries forecast error source is
4. Uncertainty representation and weather forecasting products
4.1. Uncertainty representation
In statistics, uncertainty is represented by means of the Probability Distribution Function (PDF). Let us consider a random variable
The PDF gives us information about the behaviour of the random variable
In the case of a Numerical Weather Prediction (NWP) system there are about 108 different random variables
In ensemble prediction, a simplified way of representing the uncertainty of the forecast is the
4.2. Raw products
Ensembles are composed of members that are deterministic predictions, and allow providing individual deterministic information (García-Moya et al, 2011). This information can either help the traditional staff to understand ensembles, and can provide support in the probabilistic interpretation. Far beyond this, ensembles provide their most useful information when we look at them as intrinsically probabilistic prediction systems. In this context, most of the ensemble outputs reflect this probabilistic nature. Probability is a rich and reasonable model to describe and understand many aspects of the physical world, but the interpretation of ensemble outputs must be learned and used carefully beyond the straightforward interpretation, because (especially for
The deterministic outputs for all the ensemble members can be plotted as usual meteorological charts (see for instance Figure 8 with MSLP and T850 for the ECMWF EPS). These usual
In a given location, we can provide N forecast values for that place (either by bi-linear interpolation or some other fine process which could account for height variability). Moreover, we can plot the evolution with forecast step for all the N members, i.e. we would plot N curves. Like on the stamp charts, the control forecast can be highlighted and the high-resolution deterministic forecast can also be plotted (e.g., Figure 9). The difficulty is similar to that one of the stamps namely the necessity to deal with such an amount of information. For a specific location and parameter, plumes can help the forecast guidance and, in fact, often provide information about the uncertainty and general trends.
As a third example, we show the spaghetti charts. For a given (often dynamical) field, it is rather impossible to overlay N member charts. But picking a selected isoline of interest, we can plot one line per member and, thus, the whole plot would contain N lines. Typically, the control is highlighted, and a higher resolution deterministic forecast can be included (e.g. Figure 10). As plumes, this kind of plot can help the forecaster to provide information about the uncertainty.
All of these raw outputs share the same shortcoming: the inherent difficulty in the forecast guidance for handling the huge amount of information they provide. This issue is addressed using derived probabilistic outputs that compact this information naturally.
4.3. Derived probabilistic products
Probabilistic outputs are derived computationally from the PDF representation, which is assumed to be provided by the ensemble members. These products reflect the probabilistic nature of the ensemble, visually and conceptually. They provide explicit, quantitative and detailed information about uncertainty, and this fact is a real breaking point with respect to deterministic model products. They address the issue of providing compact information in this natural way. Ensembles nowadays can provide ideal complementary information to a higher resolution deterministic model.
For a given grid point, there are
The corresponding inverse is the computation of percentiles, i.e., for a given probability
4.3.1. Ensemble mean and spread charts
The ensemble mean (the arithmetic mean of all the ensemble members) is not always a feasible meteorological situation because it is obtained as a result of a statistical operation, not from a numerical model (Buizza and Palmer, 1997). So, it is strongly discouraging in forecast guidance to use the ensemble mean without special care, if at all (García-Moya et al, 2011). However, the ensemble mean is often plotted in charts together with the standard deviation (the latter as a measure of spread), to help with the understanding of the atmospheric flow (e.g. Figure 11). The standard deviation is sometimes normalized.
4.3.2. Probability maps and percentile maps
Given a binary event (e.g. precipitation forecast > 5 mm/6h) we can plot the spatial distribution of the forecast probabilities that the EPS provides (see Figure 12). Similar plots can be made for the percentiles. These maps provide the forecasters with useful guidance by showing them where it is more probable for an event of interest to occur (e.g. representative precipitation that exceeds 5 mm/6h).
Box-plots (Wilks, 2006) and similar graphs give a quick, visual and simple representation of a distribution of numbers, a discrete PDF. Extending this idea by including the evolution with forecast time of the main weather parameters at a given location, we obtain plots that are
A natural way of examining the number of atmospheric scenarios that EPS provide (e.g. stamps) is by similarity: one can gather scenarios in groups that could lead to similar weather conditions (Ferranti, 2010). This process can be done by
4.3.5. Extreme forecast index
Extreme events are not always severe, but severe events are often extremes. An index of
4.4. Interpretation for weather forecasting
As an example for the application of probabilistic forecasting we present a real case of extreme winds that is fully described in Escribà et al. (2010). This section is not intended to be a detailed manual of the utilization of probabilistic products in operational forecasting. More extend and concise information can be found for example at www.ecmwf.int.
At 00 UTC on 24 January 2009 an explosive cyclogenesis in the Gulf of Vizcaya reached its maximum intensity with an observed surface pressures less than 970 hPa on its center. During the cyclone’s path through the south of France strong westerly and north-westerly winds occurred over the Iberian Peninsula (> 150 km/h). These winds led to eight casualties in Catalunya, the north-east region of Spain.
In Figure 16 are represented three probabilistic forecasting products, the ensemble mean, the ensemble spread and the probability of having wind speeds greater than 15 m/s (54 km/h). The weather parameter analyzed is 10m wind speed. These fields correspond to the H+60 prediction of the AEMET-SREPS initialized at 00 UTC on 22 January 2009. The ECMWF reanalysis is also shown as verification.
Even though wind speed values plotted in the maps are not extreme, they do not exceed 20 m/s or 72 km/h, this fact has to be taken carefully because we are representing mean values of wind at a forecast time, i.e. a mean over a time interval equal to the last forecast time step of the forecasting model (which in this case is around 5 minutes). As a first approximation we can estimate the wind gust (maximum wind) as twice the value of the mean wind (this factor can be roughly obtained comparing temporal series of mean wind and wind gusts from observation ground stations). In this case, such an approximation would give us extreme winds of about 150 km/h, similar to those observed.
The ensemble mean (Figure 16) can be thought as a skilful deterministic forecast that comes from the ensemble. When we compare it with the verification we can highlight there is a good agreement in the overall patterns. Looking in more detail we can select three zones where there is more discrepancy: a.) south of Catalunya (yellow ellipse), b.) Aragon and Valencia (white ellipse) and c.) south-east of France (green ellipse). It is especially interesting to analyze zone a.), where the casualties occurred. The question is whether the ensemble can estimate in some way the error in the prediction; the spread is expected to give information on this.
The ensemble spread gives us the areas of more uncertainty in the prediction and is a measure of it. For zones a.) and c.) it has values around 5 m/s, which are considerable. In the case of a.), the spread estimates properly the prediction error, giving valuable information. When making the forecast, we would say that it is possible to have wind speeds greater than 16 m/s. This is not the case in zone c.), where the spread is not enough to explain the discrepancy between the ensemble mean prediction and the verification. In this case, the probabilistic forecast is badly displaced to the east. In zone b.), a lower spread (around 3 m/s) has also the ability to describe the prediction error.
Finally, the probability forecast enforces the general forecast by determining the areas of maximum confidence of occurrence and quantifying this confidence in a number. In this sense, we can say that the probability of having mean wind speed greater than 15 m/s (54 km/h) or wind gusts of more than 100 km/h in zone a.) is between 30% and 70%, which is more than the term
5. Ensemble forecast verification
NWP models must be compared with a good representation of the observed atmosphere. This process is often called
Different frameworks are available for verification. Observations (ob) and forecasts (fc) can be compared, either
An ordinary example of the difficult issue in comparing apples and oranges is the performance assessment of quantitative precipitation forecasts (QPF) from a deterministic model. European meteorological offices provide to the ECMWF 24-hour accumulated precipitation reports from their high-density rain gauges networks. Forecast values are regularly spaced, while observations are not. One comparison method is to interpolate forecast values to observation points (Rodwell, 2010). Special care should be taken with the impact of spatial density of observations and the potential lack of statistical consistency due to spatial dependence between close ones. To address these issues, one can compute an observed quantitative precipitation estimate (QPE) using a simple up-scaling technique (Ghelli & Lalaurette, 2000; Cherubini et al., 2002) whereby stations are assigned to model grid-boxes and then
Moreover, in the comparison of the performance of QPF from two different forecasting models, further issues arise. The grid spacing of the two models might be different. If observations are gridded to the finer resolution, then the coarser model might be penalized. On the other hand, if observations are gridded to the coarser resolution, the comparison can be fair but the higher resolution model is not given a chance. How to compare the way in which both models represent structures at their own scale is a non-trivial issue. PaiMazumder & Mölders (2009) assessed the impact of network density and design on regional averages using real sites and model simulations over Russia. They find that generally, the real networks underestimate regional averages of sea level pressure, wind speed, and precipitation while overestimate 2m temperature, downward shortwave radiation and soil temperature.
5.1. A first requirement: Deterministic performance of ensemble members
The assessment of the deterministic quality of the ensemble members is a first requirement in the development of an EPS. When the quality of the ensemble members is similar, then any member can be weighted equally in the computation of a probabilistic forecasts, i.e. they are assumed to be equiprobable. Once provided this individual quality, then some other properties can be considered (see below). In addition, the ensemble mean should show a better deterministic performance than any individual member in terms of Root Mean Square Error (
As a visual representation, either time-series or evolutions with forecast step for bias and
5.2. Large scale flow: Statistical consistency with the observations/analysis
As a probabilistic forecast, an EPS must be statistically consistent with the observations in the large scale flow given the EPS domain is large enough. At this scale, the model analyses of upper-air dynamical fields (e.g. 500 hPa geopotential height, Z500) can be used for comparison, by providing a larger sample and covering the whole integration domain, and so by giving no priority to land areas where the density of observations is higher. Verification against SYNOP/TEMP observations is expected to give worse but qualitatively similar results. This statistical consistency can be assessed in several ways; two methods are shown here: the rank histogram and the spread-error diagram.
On each grid-point, either the analysis or each of the ensemble member values are assumed to be independent realizations of the same atmospheric process and hence equally probable. Here, the rank of the analysis is an integer number according to the position of the analysis value in the sorted list of forecast values. The rank histogram (Anderson, 1996; Hamill & Colucci, 1997, 1998; Hamill, 2000; Candille & Talagrand, 2005) can be used to check if the verifying observation is statistically indistinguishable from the set of forecast values (
Furthermore, the ensemble spread (often measured by the standard deviation with respect to the ensemble mean or the control forecast if possible) and the error of the ensemble (measured by the
5.3. Weather parameters: Binary events
For the performance assessment of weather parameters (e.g. precipitation, 2m temperature, 10m wind), with larger variability in space and time, the use of observations is encouraged, as they are not as smooth as upper-air field analyses. In the distributions-oriented framework (for a detailed description see Joliffe & Stephenson, 2003; Wilks, 2006), the performance of an EPS can be done measuring its response to a set of binary events (occurrence / non-occurrence, e.g. to exceed a threshold). The EPS behavior in this context is described by different properties: reliability, resolution, sharpness and discrimination. Brier (Skill) Score together with
5.3.1. Formal framework
By considering a binary event
The base rate
the components are meaningful:
To give a summary of performance measures comprising the response to several thresholds, the Ranked Probability Score (and the corresponding skill score) can be used, either the discrete or the continuous version (Hersbach, 2000).
A complementary measure of ensemble performance is the
As a measure of discrimination, the area
Another interesting complementary property, beyond performance, is the Economic
5.3.2. Visual presentation
The properties described above can be visualized in several ways.
5.4. Verification issues and prospective
Computational resources, object-oriented programming languages and data-base improvements give a boost to forecast verification. In the last decades, the forecast verification community has started to address some important issues that have an impact in the interpretation of verification scores and introduction of new conceptions. Either EPS specific or not, some of these issues are of large interest and hence are introduced here.
5.4.1. Pooling versus stratification
To compute statistically significant scores, samples must be large (many fc-ob pairs) and the corresponding significance tests should be applied (e.g. t-Student). On the other hand, mixing non-homogeneous sub-samples (e.g. different seasons) can lead to misleading performance information. In this context, the dimensionality problem (Murphy, 1988) can be a critical stumble in practice (Candille & Talagrand, 2008): computing completely consistent (from the strictly mathematical point of view) scores can turn out to be an infeasible task, and this issue often leads to a practical compromise: large samples are created without mixing different ones, according to possibilities. E.g.: the splitting of seasonal behavior that could be hidden in the overall average.
5.4.2. Flow-dependent verification
Flow-dependent sample stratification can improve other traditional ways of stratifying (e.g. seasonal), and nowadays can be tackled with clustering techniques (Ferranti & Corti, 2010).
5.4.3. Sampling uncertainty
Given the dimensionality problem and pooling-stratification compromise, the available fc-ob pairs are, in practice, relatively small samples from the all possible realizations of the model and observing systems. Thus, the scores computed are only sample measures of the population quantities, and there is a sampling uncertainty related to this process (PaiMazumder & Mölders, 2009). Any verification report should include this uncertainty, with error bars, confidence intervals, etc. (see e.g. Efron & Tibshirani, 1997; Bradley et al., 2008).
5.4.4. Spatial scales of forecasts and observations
Spatial scales are a key point (see examples above). For example, the double penalty (Ebert & Gallus, 2009) is a well-know related issue. The relatively recent development of new methods that account for spatial patterns, e.g. CRA (Ebert & Gallus, 2009), MODE (Davis et al., 2009), SAL (Wernli et al., 2008) are still under research, but show promising results and can lead to a framework of diagnostic verification (closer to subjective verification in the sense that provides information that can better help model developers and weather forecasters). For a comprehensive overview, see (Gilleland et al., 2009).
5.4.5. Extreme and severe weather
Extreme and severe weather are often introduced together, but they are not the same. Extreme events are rare events, with low base rates and belong to the tail of the corresponding climatological distributions. Severe events are those that have an impact (human and material) on society. Severe weather verification must include extra information from outside the meteorological context, whereas verification of extreme events is still in an early stage (Ferro, 2007a; Casati et al., 2008), and some alternative scores are under research, e.g. the Extreme Dependency Score (EDS; Stephenson et al., 2008; Ghelli & Primo, 2009).
5.4.6. Observational uncertainty
In forecast verification it has been traditionally assumed that the observation error is negligible when compared with the forecast error. This assumption can be consistent for longer forecast ranges, when the forecast error is much larger than the observation uncertainty. Several studies have extended the verification problem to a more general framework, in which observations are described together with their uncertainty. They show sometimes a surprising result: traditional scores generally underestimate EPS performance (e.g., Saetra et al., 2004; Candille & Talagrand, 2008; Santos & Ghelli, 2011).
5.4.7. Ensemble size
Differences in ensemble size can have an impact on performance assessment (e.g. compare a 16 members EPS with a 51 members EPS). The difference in size would, in principle, give better performance to the larger EPS a fact that should be at least taken into account. This issue is addressed by various authors (e.g., Buizza & Palmer, 1998; Ferro, 2007b; Ferro et al., 2008).
6. Statistical post-processing
EPS evidence systematic errors like do the deterministic NWP models. Calibration is the process of correction of the ensemble PDF to adjust it to the actual (and unknown) forecast uncertainty. The main point of calibration techniques is to use the information of the former prediction's skill to correct the current probabilistic forecast.
Different methodologies have been proposed recently to build calibrated probabilistic forecasts from ensembles, including Bayesian Model Averaging (Raftery et al., 2005), Logistic Regression (Hamill et al., 2008) and Extended Logistic Regression (Wilks, 2009), Non-homogeneous Gaussian Regression (Gneiting et al., 2005b) and Ensemble Dressing (Roulston & Smith, 2003; Wang & Bishop, 2005).
6.1. Bayesian Model Averaging (BMA)
Bayesian Model Averaging (BMA) is a statistical post-processing method that generates calibrated and sharp predictive PDFs from EPS (Raftery et al., 2005). The BMA predictive PDF of a weather variable is a weighted average of PDFs centred on the individual bias-corrected forecasts. The weights reproduce the predictive skill of the ensemble member over a training period.
If the forecast errors are approximately Gaussian distributed such as surface temperature or sea level pressure, BMA can be applied (e.g. Raftery et al., 2005; Wilson et al., 2007). For non-Gaussian error distributions using a mixture of skewed PDFs allows to extend the BMA methodology to this kind of weather parameters; A combination of point mass at zero and a power-transformed gamma distribution, for instance, can be applied to quantitative precipitation (Sloughter et al., 2007) and a mixture of gamma distributions with different shapes and scale parameters can be used to improve wind speed probabilistic forecasts (Sloughter et al., 2010).
Figure 24 illustrates the potentiality of BMA as a method for ensemble calibration. Continuous ranked probability score (CRPS) performance score (Hersbach, 2000) is represented for various ensemble predictions of surface temperature. CRPS index is understood to be for a probabilistic forecast the equivalent of the mean absolute error for a deterministic forecast. The different curves correspond to the raw ensemble and to the six ensembles calibrated using different statistical techniques. It is straightforward to see that the ensembles corrected with BMA perform better than any of the others.
6.2. Logistic Regression (LR) and Extended LR
The logistic regression and extended logistic regression techniques are described in detail in Wilks (2009). Logistic Regression (LR) approximates the Cumulative Distribution Function (CDF) of the predicted parameter
Typical predictors for LR when calibrating ensemble predictions are ensemble mean, ensemble spread or a function of them (Hamill et al., 2008). As thresholds
According to Wilks (2009)
By construction Equation (13) is fitted separately for every threshold
Thus, a unique regression estimation for any value of
It is important to point out as an advantage that LR (and ELR) has no statistical restriction to be used with non-Gaussian parameters such as precipitation or wind. At the same time this technique can be applied to ensembles whose members are non-distinguishable.
As an example of ELR, Schmeits and Kok (2010) calibrated ECMWF ensemble predictions of precipitation over an area that covers Netherlands (see the article for details). After studying the performances of different shapes for
Figure 25 represents a reliability diagram which compares the forecast probabilities of having precipitation lower or equal than 5 mm with the observed frequencies of this event. For a perfect reliable forecast all points would be in the diagonal, so in this case ELR calibration clearly improves the performance of the raw forecast.
6.3. Non-homogeneous gaussian regression
The non-homogeneous Gaussian Regression (NGR) technique was proposed by Gneiting et al. (2005b). In its general form, the predictive PDF estimated by NGR is assumed to be a perfect Gaussian with the mean value being a bias-corrected weighted average of the ensemble members forecasts and the variance a linear function of the ensemble variance. That is (Gneiting et al., 2005b):
This way of representing the predictive PDF allows a natural understanding of the regression coefficients. Coefficient
Compared to other techniques (e.g. BMA; Raftery, 2005), NGR has the advantage that can be applied to ensembles whose members are non-distinguishable, such as the ECMWF ensemble prediction system (Molteni et al., 1996). In this case, NGR is simplified by constraining the
Due to the Gaussian shape of the analytical expression (Eq. 17), this technique is expected to be especially useful for weather parameters that have Gaussian distributions such as temperature or pressure. Figure 26 represents a real experiment by Hagedorn et al. (2008) where GFS (Toth & Kalnay, 1997), ECMWF and a multi-model (combining GFS and ECMWF) ensemble predictions of surface temperature are calibrated using NGR all else being equal. Continuous Ranked Probability Skill Score (CRPSS) is used as a performance measure for probabilistic forecasts (Jolliffe & Stephenson, 2003). The higher its values (closer to 1), the better the probabilistic forecast will be. When CRPSS reaches 0 it means that the probabilistic forecast has the same skill than the climatology (Hagedorn et al., 2008; see for details). In this case, it is clear the benefit of calibrating the ensembles with NGR.
A classical and widely extended technique for estimating the
The adequate length of the training period for an operational approach is not unique. Raftery et al. (2005) showed that by using the previous 25 days the prediction intervals are the narrowest maintaining the right coverage of the verification (see the article for details). However, Wilks and Hamill (2007) used 45 days. An experimental study of the optimal length for one or more specific locations is desirable.
6.4. Ensemble dressing
The dressing technique is a statistical post-processing technique based on combining each member of a dynamical ensemble with its own statistical error ensemble.
Roulston & Smith (2003) proposed the use of a simple resampling scheme called
Wang and Bishop (2005) showed by stochastic simulations that the best-member method can lead both to underdispersive or overdispersive ensembles. In addition to this, Wilks (2006) demonstrated that the dressed ensemble cannot be reliable. In order to alleviate these problems a new multivariate dressing method based on the second moment constraint is proposed. Ensemble bias is removed before building training statistics for the dressing kernel assuming that each ensemble is drawn for stochastic process. To dress the ensemble, statistical perturbations ε are added to each ensemble member. The covariance matrix
Where the columns of
The comparison of the original best-member dressing method with the second moment constraint dressing method confirms that the spread of the best-member dressed ensemble is indeed underdispersive, or even becomes overdispersive, depending on factors such as the undressed ensemble size, how underdispersive the undressed ensemble is and the nature of the subspace from which the best member is identified. On the other hand, for underdispersive ensembles, the second moment constraint dressing kernel correction always returns about the right amount of dispersion.
Although underdispersion is a common characteristic of an EPS, some variables have an overdispersive behavior (Feddersen & Andersen, 2005). Fortín et al. (2006) proposed to dress and weight each member differently to improve the reliability of the forecast and to correct the variable under or overdispersed. This method is very similar to BMA (Raftery et al., 2005) and today has only been applied to one-dimensional variables.
7. A brief description of some current state-of-the-art ensemble prediction systems
The Australian Bureau of Meteorology (BoM), Brazilian Centro de Previsao do Tempo e Estudos Climatico (CPTEC), China Meteorological Administration (CMA), the European Centre for Medium-Range Weather Forecasting (ECMWF), Japan Meteorological Agency (JMA), Korea Meteorological Administration (KMA), Meteorological Service of Canada (MSC), Météo-France (MF), UK Met Office (UKMO) and US National Centres for Environmental Prediction (NCEP), among others, run ensemble prediction systems.
The ECMWF-EPS is a global ensemble that is optimized for the medium range. It uses the singular vector technique (Ensemble Data Assimilation is under research and is starting to be used operationally together with SV) for providing the set of initial perturbations, as well as stochastic parameterizations to account for model errors. The ECMWF-EPS comprises 51 members with 62 vertical levels and a spectral horizontal resolution of T639.
In the NCEP ensemble the initial perturbations are obtained by the Ensemble Transform with Rescaling (ETR) technique. It also uses stochastic perturbations to account for model errors. It runs 20 members with 28 levels and a spectral horizontal resolution of T126.
The MetOffice ensemble, called MOGREPS, works with 24 members. It uses the ETKF (Ensemble Transform Kalman Filter) technique with scaling of perturbations using radiosonde and ATOVS observations (Bowler et al., 2008). The horizontal resolution is 0.83 degrees in longitude and 0.56 in latitude, and the number of vertical levels is 70.
The Japan Meteorological Agency (JMA) ensemble has 51 members. It uses singular vectors for the calculation of the initial perturbations and stochastic representation of physical parameterizations for accounting model error. The number of vertical levels is 60 levels and the spectral horizontal resolution is T319.
A source of information for studying global ensembles is the TIGGE project, which is a key component of the THORPEX Interactive Grand Global Ensemble, a World Weather Research
Programme for improving the accuracy of high-impact weather forecasts. In the TIGGE project, the forecasts of 10 global ensembles are archived, permitting the comparison of methods and results.
Limited area ensembles are developed for higher resolutions (nowadays from 2 km up to 25 km) and shorter time ranges (from 18 to 72 hours) than those of global ensembles. When a model can explicitly resolve convection (due to its characteristics and high resolution configuration) it can represent more realistically typical precipitation patterns in the forecast field. However, convection forecasts (as well as other small scale processes) are very limited by their deterministic predictability (which is small due to its chaotic behaviour). Therefore, even in the short forecast range of only 24 hours, the prediction of details in the convection (such as location and timing of a thunderstorm) are usually very uncertain. Limited area ensembles can add information to deterministic high resolution forecasts and for this reason many operational weather centres are developing limited area ensembles.
Limited area ensembles running in operational centres are based on high resolution non-hydrostatic models, such as ALADIN (developed and maintained by a consortium of 16 National Meteorological Services, led by Météo-France), COSMO (developed by a consortium of seven NMS, led by Deutscher Wetterdienst), WRF (developed in the United States of America by NCAR, NOAA and others), HARMONIE (a model which shares code with ALADIN, developed by a consortium of 10 NMS) and the Unified Model (MetOffice). These ensembles can have an assimilation cycle which uses a wide class of meteorological observations, in some cases including radar data trying to represent as much as possible actual precipitation processes. They need lateral boundary conditions typically coming from global ensembles or coarser limited area ensembles.
Just to mention two examples of limited area ensembles, we resume below the characteristics of the COSMO-DE ensemble (Deutscher Wetterdienst) and the AEMET-SREPS. Other operational limited area ensembles are: the Norwegian targeted EPS LAMEPS (Frogner & Iversen, 2002), the Hungarian LAMEPS based on ALADIN (Hágel & Horányi, 2007), the multi-model GLAMEPS (ALADIN and HIRLAM consortium) and the limited area version of MOGREPS (MetOffice).
The COSMO-DE Ensemble is based on the convection resolving model COSMO-DE. It produces 2.8 km grid forecasts up to 18 hours, runs every 3 hours and assimilates estimated precipitation rates from RADAR.
The AEMET-SREPS uses the multi-model and multi-boundaries techniques for sampling initial, lateral conditions and model errors. It uses five global models for initial and lateral conditions and five limited area models running with every lateral and initial conditions (MM5, UM, HIRLAM, COSMO and HRM) thus producing 25 members. The horizontal resolution is 25 km and produces forecasts up to 72 hours twice every day. It covers a wide area (includes North Atlantic, Europe and North of Africa).
8. Conclusions and future directions
The most reliable and skilful theoretical forecasts from the current observed state of the atmosphere can be obtained through a Probability Distribution Function (PDF) which describes a comprehensive set of possible future states. The only practically feasible methodology to assess a forecasted PDF is using an Ensemble Prediction System (EPS), that is, a PDF sample of different but equally plausible Numerical Weather Prediction (NWP) forecasts (EPS members). Furthermore from the practical point of view EPS forecasts have been showed to be more reliable and skilful than a forecast from one single NWP model, even when the latter has a higher resolution.
This better performance is due to the fact that the set of non-linear equations which describe the future evolution of the atmosphere have a chaotic behaviour. This means that any uncertainty in the prediction process like two slightly different initial states could grow and lead to quite significantly distinct forecasted states. As a consequence, the predictability associated to any forecasted atmospheric state is always spatially and temporally limited but depending in each forecast on the uncertainty magnitude and the particular atmospheric situation.
The sources of errors and uncertainties which limit the predictability are mainly due to: a.) inaccuracies in the initial atmospheric state, estimated from available observations with their associated observational error and limited representativeness and imperfect assimilation systems, b.) inadequacies of the NWP models, related to dynamical NWP model formulation and physical parameterizations and c.) for Limited Area Model (LAM) EPSs, approximations and errors from Lateral Boundary Conditions. So any reliable and skilful EPS has to take into consideration all of these uncertainty sources by using different methodologies. Some of these methodologies are, for instance (and respectively): a.) singular vectors, bred vectors, Ensemble Transform Kalman Filter (ETKF) and Ensemble Data Assimilation (EDA), b.) multi-model, multi-physics or multi-parameterizations, multi-parameters and stochastic parameterizations, and c.) multi-boundaries.
In addition, as any forecasting system, EPS quality and value, that is the overall performance, has to be evaluated through objective verification, assessing the necessary and complementary set of properties (with the corresponding tools): consistency (rank histogram), reliability (spread-error and attributes diagrams), resolution (resolution component of Brier Score), discrimination (Relative Operating Characteristic curves), sharpness (Sharpness Histogram), skill (Brier Skill Score) and relative value (Relative Value Diagram). The main goal of verification, apart from assessing EPS forecast performance, is that EPS developers can be leaded to what and how to improve the EPS.
A number of EPS products, which take into account the forecast probabilities and the predictability concept, can serve the forecast guidance. The EPS products can be raw ones (e.g. stamps, plumes or spaghettis) or derived (e.g. ensemble mean and spread charts, probability and percentile maps, EPS-grams, clusters and extreme forecast indexes). Before the products production it would be advisable to calibrate the EPS in order to remove its systematic errors. Some statistical post-processing techniques for ensemble calibration are Bayesian Model Averaging, Logistic and Extended Logistic Regression, Non-homogeneous Gaussian Regression and Ensemble Dressing.
Regarding to future scenarios, the weather forecast process is expected to improve at all temporal scales, from the first hours to the climatic scales, and to be done at finer spatial scales, due to the fact of having more and better observations, better NWP models, increased supercomputer resources, etc. Even though it is expected a reduction of errors and uncertainties, they will be always present and limit the predictability. This means that the majority of ideas and methodologies explained in this chapter are going to be useful for the next generation of weather forecasting systems, although new ones are expected to be developed. Then some of the future directions of work are outlined.
From the point of view of the current experience in EPS development, the multi-model and multi-analysis (from independent Global NWP models) approach, has showed to have better performance than any theoretical methodology based on a single model. This fact means that there is not enough knowledge about the different model uncertainties and that there can even be other unknown sources of error. Anyway, the latter approach is expected to be used intensively in the next EPS generations and even to overcome the former approach.
On the other hand, the better the EPS performance seems to be, the more number of error sources are considered and even the more methodologies are used together. Thus, future EPS developments are going to follow this line, partly because it increases the spread counteracting the common EPS shortcoming, the underdispersion. Anyway particular attention has to be paid not to increase spuriously the EPS spread, that is, without increasing the skill.
As it has been mentioned before, other EPS developments will come from having better assimilation techniques, better generation of initial conditions and better methodologies to tackle model errors and uncertainties. This fact means that, in a foreseeable future, EPSs will become more complex.
Finally in the current and the next decade there will be an important increase in the horizontal and vertical resolutions of the EPSs linked to the NWP model developments for smaller grid spacing. Thus, spatial resolutions of the next Global and LAM EPSs generations are going to be, respectively, close to the non-hydrostatic scale (e.g. 8-16 km), and about the meso-gamma or convection-resolving scale (e.g. 1-4 km). One consequence of this will be that verification will have to evolve to an objected-oriented way (e.g. SAL or MODE techniques). Because of uncertainties grow faster as the resolved scales are smaller, due to a more intrinsic chaotic nature, another consequence will be that the only feasible methodology to forecast the weather at these scales will be ensemble forecasting.