## 1. Introduction

Seasonal climate refers to average conditions in the atmosphere and ocean over time scales of the order of three months. When considering risks associated with seasonal climate we are concerned with deviations from normal conditions, or ‘climate anomalies’. Summers that are hotter than usual, extended drought conditions and exceptionally active tropical cyclone seasons are examples of seasonal climate anomalies.

The countries of the Pacific Ocean are exposed to climate risk across a range of sectors, most notably water resources, agriculture and disaster preparedness. In Fiji, the forestry industry is affected by an increased likelihood of fires in dry conditions and by access roads becoming too muddy to work on in wet conditions. In Samoa and Fiji the supply of hydroelectric power is vulnerable to rainfall deficiencies, as dams tend to be relatively small in comparison to average inflows. Extreme weather conditions threaten tourism revenue for islands such as Rarotonga in the Cook Islands. Seasonal variations of ocean temperatures, which can drive the migration of species such as tuna and cause the bleaching of the coral reefs in which fish spawn, affect the productivity of fisheries, an important economic resource for countries such as Kiribati. Seasonal variations in surface water and temperature can create more favourable conditions for host vectors of diseases such as malaria, increasing their prevalence. [1]

While many climate anomalies are essentially chaotic and not predictable, there exists large-scale coupling (feedback) between the atmosphere and the ocean, which imparts a degree of predictability to variations of seasonal climate in the atmosphere-ocean-land surface system. The most significant manifestation of this coupling, and the physical source of much of this predictability, is the El Niño Southern Oscillation (ENSO), a quasi-periodic mode of variability of the equatorial Pacific Ocean [2]. The primary manifestation of ENSO is in the patterns of sea surface and sub-surface temperature in the Pacific Ocean, with cooler than normal central equatorial Pacific sea surface temperatures termed ‘La Niña’ and warmer than normal temperatures termed ‘El Niño’. During La Niña and El Niño events, feedbacks between the ocean and atmosphere lead to changes in the dominant atmospheric patterns, which influence climatic conditions throughout the world. The ocean processes are slower and more predictable than the atmospheric processes responsible for weather, and their influence on the likelihood of atmospheric states can be used to make predictions, either by characterising this relationship empirically using historical data, or by using a physically motivated model of the coupled ocean-atmosphere system.

The Tuvalu drought of 2011 provides an example of vulnerability to seasonal climate risk.

Populations on low coral atolls such as Funafuti (located at 8°S, 179°E) rely heavily on rainwater harvesting for water resources, as there are no natural streams or lakes. Rainfall from December 2010 to January 2011 was up to 600 mm below normal levels for the western central Pacific region in which Funafuti is located (Figure 1) [3]. Long-range rainfall outlooks for the March to May season forecast a continuation of the pattern of suppressed rainfall[1]. These outlooks turned out to be substantially correct, with analysed rainfall deficits of up to 400 mm in the region for the period March to May[2]. On the 28th of September 2011, critically low water supplies caused the government of Tuvalu to declare a state of emergency. In early October the governments of Australia, New Zealand, Korea and Japan began delivering fresh water supplies and portable desalination units.

The physical cause of the lack of rainfall in Funafuti in 2011 was cooler than normal waters in the equatorial Pacific, associated with the strongest La Niña[3] episode in recent recorded history, which peaked in the Southern Hemisphere summer of 2010-2011. La Niña events typically decay in the Southern Hemisphere autumn, but in this case the event weakened and then re-established itself in the second half of 2011. The cooler than normal waters in the region of Tuvalu suppressed the convection of moist air that generates rainfall, which led to rainfall deficiencies over a sustained period. Figure 1 illustrates that seasonal outlooks based on dynamical models provided guidance anticipating the persistence of these rainfall deficiencies throughout 2011. The tendency towards suppressed rainfall at Funafuti during La Niña events is evident from the composite time series shown in Figure 2. This event illustrates the real nature of climate risk, and shows that for some phenomena we can now predict well in advance the features of the earth system that are responsible.

Many of the examples in this chapter revolve around the island countries of the Pacific, which are directly affected by ENSO and are able to benefit directly from advances in the ability to predict it. Routine seasonal outlooks are issued by national meteorological agencies including the Australian Bureau of Meteorology and the United States National Oceanic and Atmospheric Administration (NOAA), as well as by organisations such as the International Research Institute for Climate and Society (IRI). Seasonal outlooks for the coming seasons give governments and aid agencies important information for planning their assistance.

Seasonal outlooks of the likelihood of extreme, synoptic timescale events such as tropical cyclones are also of use for planning disaster preparedness. Tropical cyclones are the most destructive weather systems that impact on coastal areas in the Pacific. While individual tropical cyclones are not predictable beyond timescales of the order of one day, the distribution of tropical cyclone activity is influenced by large-scale climatic features such as ENSO [4].

Climate risk may be assessed in a historically averaged sense, by using the past distribution of extreme events such as droughts or tropical cyclones to give predictive probabilities of the events in the future. Climate change complicates this approach because, while observed changes in the mean state of the climate system have so far been small, a small change in the mean state can lead to large changes in the frequency and magnitude of extreme events [5]. We refer to this as the influence of climate change on climate variability. The effect of climate change on weather patterns is likely to be considerably more complex than a simple shift of the existing probability distribution. As an example, a recently completed global analysis found a near 50-fold increase in the frequency of extremely hot temperatures during the northern summer, meaning that the historical occurrence now greatly underestimates the risk of such extremes [6]. It has been proposed that a change in climate forcing projects onto the existing modes of variability of the climate system, altering the frequencies and intensities of existing weather regimes [7] [8]. An example of such a mechanism is the prospect that global warming has intensified the hydrological cycle, causing more extreme floods and droughts [9]. The current set of coarse-resolution GCMs used to evaluate anthropogenic climate change may not be sufficiently detailed to capture such nuanced responses, and as such considerable uncertainties remain about the impact of climate change on weather events. In the face of these uncertainties, an effective and low-cost option to reduce vulnerability to climate change is to improve the accuracy, availability and use of forecasts [10].

The aim of seasonal forecasting is to predict the average or aggregate weather over a long period, usually three months. By exploiting the relationship of weather systems with large-scale, long-timescale coupled ocean-atmosphere processes, probabilistic forecasts can be made of the likely tendency of conditions in the coming season. Seasonal predictions are not deterministic; in other words, they do not predict that a single outcome will or will not happen. Rather, they give a statement of risk, typically about the likelihood of wetter than normal or warmer than normal conditions.

A range of potential applications for seasonal outlooks has been identified. As noted, in countries dependent on rainwater harvesting for water supplies, advance knowledge of drought conditions can allow pre-emptive water saving or water supply bolstering. Knowledge of the relative likelihood of fires or inaccessibility due to rainfall could be used to plan forestry activities. Rainfall outlooks can be used to estimate the availability of water for hydroelectric power generation, and to pre-emptively purchase fuel for backup generators, avoiding the payment of expensive spot rates for fuel. Tourism operators can develop forward plans that take into account changes in the likelihood of climatic disturbances. Reefs likely to suffer from elevated temperatures can be declared off-limits for fishing and tourism to reduce other sources of stress on corals [11]. Outlooks of surface water and temperature could help anticipate increases in the prevalence of diseases such as malaria, which occur when seasonal conditions become more favourable for host vectors [1]. The beef industry in Vanuatu can benefit from forward estimates of how many head of cattle a pasture will be able to support. Seasonal forecasts have been shown to be of economic utility in the management of wheat farming in Australia by guiding changes in practice such as crop row spacing and fertiliser application [12].

## 2. The limitations of empirical models and the imperative for a dynamical model basis for seasonal forecasting

Empirical models (or ‘statistical models’) are currently used by many meteorological services for seasonal climate outlooks. These models are based on empirical relationships, usually between ENSO-based indices (‘the predictors’) and variables such as local rainfall and temperature (‘the predictands’). Using current observed values of the ENSO indices, these past relationships can be used to create forecasts [13].
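A minimal sketch of such an empirical scheme is given below, assuming a single ENSO index as the only predictor, a seasonal rainfall total as the predictand and an ordinary least-squares fit; operational schemes differ in their predictors and fitting methods. The function names and the training data are hypothetical. The range check flags predictor values that lie outside the training record, where the fitted relationship is being extrapolated rather than interpolated.

```python
def fit_empirical_model(index_history, rainfall_history):
    """Ordinary least-squares fit of rainfall = intercept + slope * index."""
    n = len(index_history)
    mean_x = sum(index_history) / n
    mean_y = sum(rainfall_history) / n
    sxx = sum((x - mean_x) ** 2 for x in index_history)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_history, rainfall_history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def empirical_outlook(slope, intercept, index_history, current_index):
    """Predict rainfall from the current index value and report whether that
    value lies outside the range of the historical (training) record."""
    prediction = intercept + slope * current_index
    outside_record = not (min(index_history) <= current_index <= max(index_history))
    return prediction, outside_record


# Hypothetical training data: index values and seasonal rainfall (mm)
index_hist = [-1.5, -0.5, 0.0, 0.5, 1.5]
rain_hist = [450.0, 350.0, 300.0, 250.0, 150.0]
slope, intercept = fit_empirical_model(index_hist, rain_hist)
prediction, extrapolated = empirical_outlook(slope, intercept, index_hist, -2.2)
print(f"forecast {prediction:.0f} mm, extrapolated: {extrapolated}")
# forecast 520 mm, extrapolated: True
```

The fit itself happily returns a number for the out-of-range index value; only the explicit range check reveals that no historical analogue supports it.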

A warming of the climate system due to greenhouse gas forcing is predicted by theory, demonstrated by numerical simulations and has been observed over the course of the past century [14]. While the empirical relationships between climate predictors and predictands such as rainfall may be robust, in a warming climate the environmental indicators used as predictors now frequently lie outside the range of historical records, meaning that relationships are being assumed for events that have no historical analogue. In general, empirical models cannot reliably account for aspects of climate variability and change that are not represented in the historical record. Empirical forecasting usually depends on the assumption of stationary relationships between predictors and predictands, which also renders such schemes susceptible to periodic changes in these relationships due to decadal-timescale variability.

For example, outlooks for tropical cyclone (TC) activity in the Australian region are based on a regression model using indices representing major modes of variability in the ocean. The 2010-11 TC season featured a very strong La Niña event together with an unusually warm Indian Ocean, a combination without historical precedent. In this case the statistical models significantly over-predicted the number of TCs that occurred in the Australian region. Analysis shows that the environmental indicators used for tropical cyclone seasonal outlooks for the Australian region in 2010-11 and 2011-12 are outliers in the predictor phase space; in other words, they lie outside the range of variability for which the model was built and tested.
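One way to make "outlier in the predictor phase space" concrete is the Mahalanobis distance, which measures how far a new predictor vector lies from the cloud of historical predictor values while accounting for the correlation between predictors. The sketch below assumes two predictors and flags a season as an outlier if it lies further from the historical mean than any training season; both this criterion and the data are hypothetical illustrations, not the analysis used for the Australian TC outlooks.

```python
import math


def mahalanobis_2d(history, point):
    """Mahalanobis distance of a 2-predictor point from a sample of past points."""
    n = len(history)
    m1 = sum(h[0] for h in history) / n
    m2 = sum(h[1] for h in history) / n
    # Sample covariance matrix [[c11, c12], [c12, c22]]
    c11 = sum((h[0] - m1) ** 2 for h in history) / (n - 1)
    c22 = sum((h[1] - m2) ** 2 for h in history) / (n - 1)
    c12 = sum((h[0] - m1) * (h[1] - m2) for h in history) / (n - 1)
    det = c11 * c22 - c12 ** 2
    d1, d2 = point[0] - m1, point[1] - m2
    # Quadratic form d^T C^-1 d, using the explicit inverse of a 2x2 matrix
    q = (c22 * d1 * d1 - 2.0 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return math.sqrt(q)


def is_phase_space_outlier(history, point):
    """Hypothetical criterion: further from the mean than every training point."""
    max_train = max(mahalanobis_2d(history, h) for h in history)
    return mahalanobis_2d(history, point) > max_train


# Hypothetical history of two strongly correlated predictor indices (A, B)
history = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (-1.0, -0.8),
           (1.0, 0.8), (0.5, 0.6), (-0.5, -0.4)]
# Each component of (-1.0, 1.0) lies inside its own historical range,
# but the combination is unlike anything in the record.
print(is_phase_space_outlier(history, (-1.0, 1.0)))
```

Checking each predictor's range separately would pass this point; it is the joint, correlated structure of the predictors that exposes it as unprecedented.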

Inter-annual variability in the intensity and distribution of tropical cyclones is large, and presently greater than any trends that are ascribable to climate change. However, climate change impairs our ability to make skilful predictions of tropical cyclone activity using empirical models, because in the warming environment predictors such as sea surface temperatures (SSTs) now frequently lie outside the range of past variability. Improved empirical methods can be developed to adjust for this, by incorporating trends and by treating predictors that lie outside the observed range of variability more cautiously [15]. However, it is widely considered that dynamical models provide the best prospects for improved seasonal forecasting in the future, either by providing long-range forecasts of environments favourable to cyclogenesis, or through high-resolution models that can estimate the number of cyclones expected to form.

### 2.1. Seasonal forecasting with dynamical models

An alternative paradigm for seasonal prediction is the use of coupled ocean-atmosphere General Circulation Models (‘coupled models’ or GCMs). State-of-the-art coupled models consist of a physically based model of the ocean, usually solved using a grid-based scheme, coupled to a physically based atmospheric model, often solved using a spectral spatial discretisation [16]. GCMs solve a set of dynamical equations (‘the primitive equations’) to project the current analysed state of the ocean-atmosphere system into the future. The term ‘analysed’ is used deliberately here: it refers to methods that produce a global estimate of the state of the ocean and atmosphere by combining the available observations with numerical methods, based on a mix of physical and statistical relationships, that infer the state of regions not subject to direct observation. The objective of this data assimilation process is to constrain the observations so that together they form a physically plausible set of initial conditions for the ocean-atmosphere simulation.

A number of dynamical models are run at operational meteorological centres around the world. We briefly describe the main components here with reference to the model used operationally at the Australian Bureau of Meteorology for ocean temperature forecasts. The first component is an ocean data assimilation system, which provides an estimate of the state of the upper ocean based on an analysis of ocean observations. These observations come from a variety of sources, including satellite observations of sea surface temperature and sea surface height; fixed, drifting and profiling buoys (such as the TOGA-TAO array, which provides real-time observations of the region of the Pacific Ocean central to ENSO); and observations taken from ships.

This ocean assimilation system initialises an ocean model through a complex process that attempts to bring the model into a state consistent with the oceanic observations, while keeping it internally balanced to minimise so-called ‘initialisation shock’. The ocean model of the Bureau of Meteorology’s POAMA (Predictive Ocean Atmosphere Model for Australia) [17] has a resolution of 2 degrees in longitude, with a latitudinal resolution telescoping from 0.5 degrees near the equator to 1.5 degrees near the poles, and resolves 25 vertical levels. Specialised coupling software transmits surface fluxes of heat and momentum between the ocean model and an atmospheric model. The POAMA atmospheric model has a spherical harmonic horizontal structure with triangular truncation at wave number 47 (grid cells of roughly 250 km by 250 km when transformed) and 17 pressure levels. The atmospheric model typically has its own assimilation system to ingest available observations of meteorological parameters including wind, pressure and temperature. Coupled assimilation, in which the ocean and atmosphere are initialised together to reduce initialisation shock, is an area of current research.

Processes with a spatial scale smaller than the model grid scale are ‘parameterised’, which means a statistical or process-based model is used to represent the average effect of these sub-grid-scale processes on the resolved scales. The design and configuration of sub-grid-scale parameterisations is a specialised and active area of research, with current activity focussed on the use of stochastic models to better capture the uncertainty of the sub-grid processes.

Seasonal climate prediction is inherently probabilistic because the evolution of the climate system is highly sensitive to initial conditions. Small differences or ‘errors’ in the description of the initial climate state grow with time, leading to very different forecast outcomes. To estimate the range of physically plausible outcomes, GCMs are typically run as an ensemble, in which a number of simulations are performed with slightly different initial conditions. The initial conditions are perturbed to realistically sample the plausible range of initial climate states.
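The growth of small initial-condition differences can be illustrated with the Lorenz (1963) three-variable system, a standard toy model of chaotic dynamics used here purely as a stand-in for a GCM (it is not a climate model). The sketch perturbs one initial state with small Gaussian noise to form an ensemble, integrates every member with a fourth-order Runge-Kutta scheme, and records the ensemble spread of the x variable. The parameter values are the classic ones; the perturbation size, ensemble size and integration length are arbitrary choices.

```python
import random
import statistics


def lorenz63_tendency(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """Advance the Lorenz-63 system one step with fourth-order Runge-Kutta."""
    k1 = lorenz63_tendency(state)
    k2 = lorenz63_tendency(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz63_tendency(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz63_tendency(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def run_ensemble(initial, n_members=20, perturbation=1e-3, n_steps=2500, seed=0):
    """Integrate an ensemble from slightly perturbed initial conditions and
    return the ensemble standard deviation of x after each step."""
    rng = random.Random(seed)
    members = [tuple(v + rng.gauss(0.0, perturbation) for v in initial)
               for _ in range(n_members)]
    spreads = []
    for _ in range(n_steps):
        members = [rk4_step(m) for m in members]
        spreads.append(statistics.stdev([m[0] for m in members]))
    return spreads


spreads = run_ensemble((1.0, 1.0, 20.0))
print(f"spread after 1 step: {spreads[0]:.1e}, after 2500 steps: {spreads[-1]:.1f}")
```

Initial differences of order 0.001 grow until the ensemble spread is comparable to the size of the attractor itself, after which the ensemble no longer carries information about the specific initial state.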

Ensemble strategies in principle allow better estimation of the probability of extreme or less likely events. The nonlinear nature of the coupled ocean-atmosphere system means that these probabilities may not be well estimated from a single ‘best guess’ deterministic forecast. Using simple decision models, which are discussed in more detail below, Palmer [18] demonstrated that the economic value of ensemble forecasts is greater than that of individual deterministic forecasts or simple ensemble means.

Because basic physics does not change under global warming, dynamical models are less compromised by climate change than statistical models. GCMs explicitly take into account climate processes that are important for seasonal climate prediction, such as equatorial oceanic waves and atmospheric convection driven by ocean temperatures, and are not constrained by what has occurred in the past. GCMs implicitly include the effects of a changing climate, whatever its character or cause, and can predict outcomes not seen previously.

## 3. Model based forecast products

Seasonal climate forecasts are inherently probabilistic due to imperfect model initialisation, instabilities in the modelled system, and model error. One approach to transforming a GCM ensemble forecast into a probabilistic forecast is to define one or more event thresholds, and then take the fraction of ensemble members above a threshold as the probability forecast. This approach effectively takes the model ensemble distribution as a best guess of the probabilities of future states of the system. These can be referred to as ‘ensemble relative frequency’ or ‘perfect model’ probabilities, as they assume that the model ensemble is a perfect sample from the possible futures consistent with the model initial conditions. This procedure does provide an adjustment for model biases, for example if the model tends to be biased towards warmer temperatures, because the ensemble distribution for a particular realisation is measured against the model’s own climatological state.
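A minimal sketch of the ensemble relative frequency calculation follows, assuming the event is "above the model's own climatological median" and that the model climatology is available as a list of hindcast values for the same month and lead time. The function name and all numbers are hypothetical. The second call shows why measuring members against the model's own climatology removes a constant bias.

```python
from statistics import median


def ensemble_relative_frequency(ensemble, model_climatology):
    """Fraction of ensemble members above the model's own climatological median.

    Measuring members against the model's own hindcast climatology (rather
    than the observed climatology) removes a constant model bias.
    """
    threshold = median(model_climatology)
    above = sum(1 for member in ensemble if member > threshold)
    return above / len(ensemble)


# Hypothetical monthly rainfall (mm): model hindcast climatology and one forecast
climatology = [10.0, 20.0, 30.0, 40.0, 50.0]
forecast = [25.0, 35.0, 45.0, 55.0]
print(ensemble_relative_frequency(forecast, climatology))  # 0.75

# A model with a constant wet bias of +20 mm gives the same probability
print(ensemble_relative_frequency([f + 20 for f in forecast],
                                  [c + 20 for c in climatology]))  # 0.75
```

Note that this bias correction only handles shifts of the whole distribution; it does not correct an ensemble that is too narrow, which is the overconfidence problem examined in the verification below.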

One event for which probabilities may be desired is the occurrence of above-median monthly rainfall over a region of interest. Figure 4 shows the POAMA hindcast ensemble for the year 1997 and its conversion to a probabilistic forecast of monthly rainfall being above the long-term median in the Murray Darling Basin (MDB), a region of high agricultural importance [19]. Probability forecasts of this kind were generated from retrospective seasonal forecasts (hindcasts) made with the POAMA 1.5 model for the period 1980 to 2006. The individual ensemble members show that for each month a range of outcomes is possible, including both above- and below-median rainfall. These retrospective forecasts are taken from the first season of model output, meaning that no time elapses between the model initialisation and the period being forecast.

## 4. Accounting for model error

Uncertainty in probability forecasts can be divided into three distinct categories. The first category of prediction uncertainty is linked to the non-linearity of climate dynamics that causes a sensitivity to initial conditions. This is the so-called butterfly effect, which imposes hard limits on our ability to make deterministic predictions of nonlinear systems. The simple fact that we do not have infinite precision means that instabilities on scales smaller than the smallest resolved model scale inevitably grow and affect the larger scale until no predictive skill remains [20]. The ‘saturation time’ after which the system is effectively unpredictable is longer for the ocean than the atmosphere.

The second major category of prediction uncertainty is the sparseness and imprecision of earth system observations. As discussed above, the analysed state of the atmosphere and ocean necessarily differs from its actual state, so model projections carry an imperfect estimate of the initial state forward in time. Even with a perfect physical model, predictions would therefore be imperfect. This source of error interacts with the first, because instabilities growing from initial-condition features that are not present in nature may produce future states that are inconsistent with the actual potential future states. Ensemble forecasting allows this initial-condition uncertainty to be estimated and quantified by sampling the space of plausible initial conditions and projecting this sample forward in time. These two kinds of uncertainty can be described as ‘flow dependent’ [21] because their rates of growth and magnitudes are sensitive to the stability of the point in phase space characterising the flow.

The final category of uncertainty is model error, the fact that our mathematical idealisations of the climate system are not perfect. This includes errors due to imperfect physical parameterisations, errors due to unresolved processes at the sub-grid scale and differences between the mean state of the model and the true system. This class of error is widely studied and motivates research into better models with improved representation of physics, and model calibration techniques that can account for or correct the errors.

Single-model ensemble forecasts capture only the components of prediction uncertainty associated with uncertain initial conditions and model-captured instability, and these are fully captured only in the ideal case of an infinite ensemble that uniformly samples initial-condition uncertainty. An ensemble of a single model provides no information about the model-error component of prediction uncertainty (Stephenson, 2005), and models that are structurally similar tend to share biases.

### 4.1. Assessing forecast error

Forecast validation is the process of measuring the correctness of a set of issued forecasts. It can be thought of as being distinct from model validation which is about determining whether a model correctly resolves physical processes [20].

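Here we give an example of forecast validation based on the definition of discrete events, for example the event of rainfall over a three-month period exceeding a given threshold, and of categorical forecasts, for example low, medium and high probability of the event. For three forecast categories, the contingency table summarising the forecast-verification set has the form shown in Table 1, with forecast categories $F_i$ ($i = 1, 2, 3$) and observed outcomes $E$ (the event occurs) and $\bar{E}$ (the event does not occur). If $n(F_i, E)$ denotes the number of cases in which category $F_i$ was forecast and the event occurred, and $N$ is the total number of forecast-verification pairs, the joint distribution of the forecasts and observations in one bin is estimated by

$$p(F_i, E) = \frac{n(F_i, E)}{N}.$$

The calibration-refinement factorisation of the joint distribution for a particular forecast bin,

$$p(F_i, E) = p(E \mid F_i)\, p(F_i),$$

is composed of two factors: the true positive ratio $p(E \mid F_i)$ and the refinement distribution $p(F_i)$, the relative frequency with which category $F_i$ is forecast.

The true positive ratio is the conditional probability that the event occurs given that category $F_i$ was forecast, estimated from the contingency table as

$$p(E \mid F_i) = \frac{n(F_i, E)}{n(F_i, E) + n(F_i, \bar{E})}.$$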
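The counting behind the true positive ratio and the refinement distribution can be sketched as follows. The sketch assumes three equal-width probability bins with edges at 1/3 and 2/3 (how the original analysis treated values falling exactly on an edge is not stated, so that choice is an assumption), and the forecast-outcome pairs are hypothetical.

```python
from collections import Counter

CATEGORIES = ("low", "medium", "high")


def forecast_category(prob):
    """Bin a forecast probability into three equal-width categories
    (assumed edges: [0, 1/3), [1/3, 2/3), [2/3, 1])."""
    if prob < 1.0 / 3.0:
        return "low"
    if prob < 2.0 / 3.0:
        return "medium"
    return "high"


def contingency_table(probs, outcomes):
    """Count forecast-verification pairs n(F_i, E) and n(F_i, not E)."""
    table = Counter()
    for prob, occurred in zip(probs, outcomes):
        table[(forecast_category(prob), bool(occurred))] += 1
    return table


def calibration_refinement(table):
    """Return {category: (true positive ratio p(E|F_i), refinement p(F_i))}."""
    n_total = sum(table.values())
    result = {}
    for cat in CATEGORIES:
        n_cat = table[(cat, True)] + table[(cat, False)]
        if n_cat:  # skip categories that were never forecast
            result[cat] = (table[(cat, True)] / n_cat, n_cat / n_total)
    return result


# Hypothetical ensemble probabilities and observed outcomes (1 = event occurred)
probs = [0.1, 0.2, 0.5, 0.6, 0.9, 0.8]
outcomes = [0, 1, 1, 0, 1, 1]
print(calibration_refinement(contingency_table(probs, outcomes)))
```

Multiplying the two factors for a category recovers the joint relative frequency $n(F_i, E)/N$, which is a convenient consistency check on the counts.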

We apply this simple forecast validation scheme to the POAMA MDB rainfall forecasts discussed above. In order to compute meaningful statistics on these probability outlooks, three bins for the probability of rainfall exceeding the climatological median were used. A small number of forecast verification pairs in any particular bin reduces the statistical significance of results markedly. Larger probability bins can be used to mitigate this, but at the expense of forecast resolution and sharpness. The three bins translate into categorical forecasts of a low, medium and high probability of an above median rainfall event. The binned forecasts were verified against Australian rainfall data from the Australian Bureau of Meteorology National Climate Centre’s gridded atmospheric data set [23].

Table 2 shows these counts for the MDB rainfall forecasts described above for all months in the hindcast period.

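Table 3. Mean ensemble frequency, true positive ratio p(E|F) and 90% probability interval of p(E|F) for each forecast category (POAMA MDB monthly rainfall hindcasts).

| Forecast | Mean ensemble frequency (model probability) | p(E\|F) | 90% probability interval of p(E\|F) |
|---|---|---|---|
| Low (0-33%) | 0.21 | 0.39 | 0.31 - 0.47 |
| Medium (33-66%) | 0.50 | 0.49 | 0.42 - 0.56 |
| High (66-100%) | 0.80 | 0.65 | 0.56 - 0.73 |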

If the calibration distribution in each bin is assumed to be a Bernoulli distribution, probability intervals for its parameter can be generated by a permutation counting method. For larger datasets, for which permutation counting is prohibitive, an alternative is to use percentiles of a normal posterior distribution. Table 3 gives the true positive ratio with a 90% probability interval for the data in Table 2. It can immediately be seen that the probability distribution implied by the model ensemble is not consistent with the probability distribution implied by the verification of the forecasts. For example, the mean probability for ‘low probability’ forecasts is 21%, but the event occurs 39% of the time for this forecast category.
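The normal alternative mentioned above can be sketched in a few lines: the Bernoulli parameter is estimated as the hit fraction in the bin, and its uncertainty is approximated by a normal distribution with standard error $\sqrt{\hat{p}(1-\hat{p})/n}$. This is a large-sample approximation, not the permutation counting method, and the counts in the example are hypothetical (the per-bin counts of Table 2 are not reproduced here).

```python
from statistics import NormalDist


def bernoulli_interval(hits, n, level=0.90):
    """Point estimate and normal-approximation interval for p(E|F).

    Uses percentiles of a normal distribution centred on hits/n with
    standard error sqrt(p(1-p)/n), clipped to [0, 1].
    """
    p_hat = hits / n
    std_err = (p_hat * (1.0 - p_hat) / n) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90%
    return p_hat, max(0.0, p_hat - z * std_err), min(1.0, p_hat + z * std_err)


# Hypothetical bin: the event occurred in 30 of 60 forecasts in this category
p_hat, lower, upper = bernoulli_interval(30, 60)
print(f"p(E|F) = {p_hat:.2f}, 90% interval {lower:.2f} - {upper:.2f}")
```

The interval width shrinks only as $1/\sqrt{n}$, which is why sparsely populated bins give such wide intervals and why coarse binning is used.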

The earth system is very high dimensional, and the procedure used here reduces the dimensionality of the problem. Such dimension reduction may result in a loss of information about the performance of the system: we face a trade-off between retaining the information contained in the model-based forecasts and being able to extract statistically robust information from the limited model reforecast dataset. Simple binning is used here; more sophisticated methods such as principal component analysis could also be employed. We will return to this point later in the chapter when calibration is discussed.

### 4.2. Assessment of probability forecasts

In assessment of probability forecasts the two main aspects of performance are resolution and reliability. Reliability is defined as the degree to which the observed frequency of an event coincides with its forecast probability. Reliability does not guarantee useful skill, but forecasts that are not reliable cannot be taken at face value and must be adjusted, either implicitly as occurs when a verification plot demonstrating overconfidence is published next to a forecast or explicitly by downgrading probabilities that are not justified by model performance. The term 'well calibrated' is used to describe probability forecasts that are reliable. Resolution is defined as the frequency with which different observed outcomes follow different forecast categories, in other words the degree to which the forecast system can 'resolve' different outcomes.
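Reliability and resolution can be quantified with the standard Murphy decomposition of the Brier score, BS = REL - RES + UNC, where a small reliability term indicates well-calibrated forecasts and a large resolution term indicates forecasts that separate different outcomes. The sketch below is one common way to compute these terms, not necessarily the method used for the figures in this chapter; it groups forecasts by their distinct probability values, for which the decomposition is exact. The data are hypothetical.

```python
from collections import defaultdict


def brier_decomposition(probs, outcomes):
    """Return (Brier score, reliability, resolution, uncertainty).

    Forecasts are grouped by their distinct probability values, for which
    BS = reliability - resolution + uncertainty holds exactly.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    return brier, reliability, resolution, uncertainty


# Hypothetical forecast probabilities and observed outcomes (1 = event occurred)
p_fc = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5]
o_obs = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
bs, rel, res, unc = brier_decomposition(p_fc, o_obs)
print(f"BS={bs:.3f} REL={rel:.3f} RES={res:.3f} UNC={unc:.3f}")
```

When forecast probabilities are binned rather than taking a few discrete values, the same formulas applied to bin means give only an approximate decomposition.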

Figure 5 shows the reliability diagram for POAMA 1.5 Murray Darling Basin average monthly mean rainfall, for all months. The green bar marks the 90% probability interval of the forecasts, and the purple bar marks the 90% probability interval for perfectly reliable forecasts with the same sample size. Reliability diagrams are plots of the true positive ratio (also known as the calibration function, observed relative frequency, likelihood or hit rate) against the mean probability of the forecasts in each bin. They are used to assess the degree to which the model forecast probabilities agree with the observed frequencies, shown in Figure 5 with the probability intervals described above. The figure shows that, even when the small sample size is taken into account, the forecasts are overconfident. Resolution is represented by the vertical spread of points on the reliability diagram; the model has some ability to resolve between the two outcomes.

## 5. Understanding prediction utility: Simple decision models

In order to begin to understand potential uses of seasonal forecasts it is instructive to study simple cost-loss decision models. Such simple models provide a framework to begin to quantify the potential value of forecasts. Before proceeding, we note that real-world decisions are typically made with far more parameters and subject to greater uncertainty regarding potential costs and payoffs than the simple models studied here.

We first consider a simple binary event, binary decision model in which there are two possible outcomes – the occurrence or non-occurrence of an event - and the user makes a decision to protect, or not protect, against the event. Protection has a cost; failure to protect incurs a loss. The classic example is the decision to carry an umbrella to protect against the possibility of rain. A seasonal timescale example is the decision to apply fertiliser to a crop based on the likelihood of future rainfall over a season. An early study of these issues was made by Anders Angstrom as documented by [24].

Protective action incurs a cost C, while failing to protect when the event occurs results in a loss L. In this framework it only makes sense to take action, given the probability P of the event, if P > C/L; otherwise the expected loss, PL, is less than the cost of taking protective action. The combination of the joint distribution of forecasts and observations and the decision-maker’s cost function determines the potential economic value of the forecasts (Table 5).
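The cost-loss decision rule, and one common way of turning it into a measure of forecast value, can be sketched as follows. The value measure used here is the relative economic value, which compares the mean expense of acting on the forecasts with that of acting on climatology alone (always or never protecting, whichever is cheaper) and with that of a perfect forecast (protecting only when the event occurs). The function names, costs and data are hypothetical.

```python
def should_protect(p_event, cost, loss):
    """Protect when the expected loss p * L exceeds the cost C, i.e. p > C / L."""
    return p_event * loss > cost


def mean_expense(probs, outcomes, cost, loss):
    """Average expense when acting on each forecast probability."""
    total = 0.0
    for p, occurred in zip(probs, outcomes):
        if should_protect(p, cost, loss):
            total += cost
        elif occurred:
            total += loss
    return total / len(probs)


def economic_value(probs, outcomes, cost, loss):
    """Relative value: 1 for a perfect forecast, 0 for climatology alone."""
    if not 0.0 < cost < loss:
        raise ValueError("requires 0 < C < L")
    base_rate = sum(outcomes) / len(outcomes)
    e_climate = min(cost, base_rate * loss)  # always or never protect
    e_perfect = base_rate * cost             # protect only when the event occurs
    e_forecast = mean_expense(probs, outcomes, cost, loss)
    return (e_climate - e_forecast) / (e_climate - e_perfect)


observed = [1, 0, 1, 0]
print(economic_value([1.0, 0.0, 1.0, 0.0], observed, 1.0, 4.0))  # 1.0
print(economic_value([0.5, 0.5, 0.5, 0.5], observed, 1.0, 4.0))  # 0.0
```

Because the value depends on C/L, the same set of forecasts can be valuable to one user and worthless to another.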

### 5.1. Adjusting model output: Introducing calibration

Decisions about the use of GCMs for seasonal climate forecasting are usually based upon measures of model performance over a hindcast (retrospective forecast) period. A natural and popular extension of this idea is that GCM-based forecasts should be adjusted in the light of this skill assessment. This idea motivates ‘Model Output Statistics’ methods [25] and ‘model calibration’, which have been widely adopted in medium-range weather forecasting [26] and seasonal forecasting [27] [28] [29].

In order to make rational decisions based on quantifiable costs, losses and probabilities, the end user needs the calibrated forecast probabilities and needs to know their costs and losses for each contingency. Given the calibrated forecast probabilities, with reliable confidence intervals, users are in a position to determine the optimum course of action for their unique cost function. Given information about climatology, a model and its verification, the calibrated model probability p(E|F) is the best estimate of the event probability, subject to the assumptions made in determining it.

As a simple example of calibration, consider the true positive ratio calculated above. While crude and subject to sampling error, it represents the conditional probability of the event given the model forecast category. The true positive ratio, proposed as the best estimate of the event probability from the POAMA MDB seasonal outlooks discussed above, conditions the probabilistic forecasts derived from the GCM ensemble upon probabilities obtained by comparing the hindcast set with observations. These conditional probabilities are needed for users to make optimal decisions [30]. Skill for coupled models is commonly presented as correlation plots, mean error plots and sometimes more esoteric scores for probabilistic forecasts. While these scores are useful for model diagnostics and can quantify potential forecast value, it is not obvious how users who need to make decisions based on forecasts should convert these measures into new estimates of probability. We note that some effort has been devoted to developing verification measures that do have a direct relationship to economic value, such as the ROC (Receiver Operating Characteristic) score and the logarithmic score based on the information content of a forecast.
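The binned calibration described above amounts to a lookup: a raw ensemble probability is assigned to its forecast category and replaced by the verified frequency p(E|F) for that category. The sketch uses the p(E|F) values from the POAMA MDB table; the treatment of probabilities falling exactly on a bin edge, and the user's cost-loss ratio of 0.3, are hypothetical.

```python
# True positive ratios p(E|F) per forecast category, from the POAMA MDB table
CALIBRATED_PROBABILITY = {"low": 0.39, "medium": 0.49, "high": 0.65}


def forecast_category(raw_prob):
    """Assign a raw ensemble probability to a bin (edge handling assumed)."""
    if raw_prob < 1.0 / 3.0:
        return "low"
    if raw_prob < 2.0 / 3.0:
        return "medium"
    return "high"


def calibrate(raw_prob):
    """Replace the raw model probability with the verified frequency for its bin."""
    return CALIBRATED_PROBABILITY[forecast_category(raw_prob)]


raw = 0.15
cost_loss_ratio = 0.3  # hypothetical user who should protect when p > C/L = 0.3
print(raw > cost_loss_ratio, calibrate(raw) > cost_loss_ratio)  # False True
```

For this hypothetical user, the raw 15% probability would suggest not protecting, whereas the calibrated probability of 39% suggests protecting: the overconfidence of the raw ensemble changes the decision.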

Calibration can degrade resolution, and the application of calibration techniques is expected to involve some trade-off in which resolution is exchanged for reliability. In addition, the cross-validation methods used when applying calibration, in order to avoid ‘artificial skill’, can themselves artificially reduce skill scores; in assessing such methods it can therefore be difficult to disentangle cross-validation artefacts from a true reduction of model skill due to calibration.
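Leave-one-out cross-validation is one common way of avoiding artificial skill: each case is calibrated using only the other cases in its forecast category. The sketch below illustrates one mechanism by which this can depress skill scores. Because the held-out outcome is always excluded, the estimate for each case is pulled away from that case's own outcome, which penalises the calibrated forecasts when sample sizes are small. The function name and data are hypothetical.

```python
def loo_calibrated_probabilities(categories, outcomes):
    """Leave-one-out estimate of p(E|F) for each case.

    For each case, the event frequency for its forecast category is computed
    from all *other* cases, so no case contributes to its own calibration.
    Returns None where a category has no other cases.
    """
    counts, hits = {}, {}
    for cat, occurred in zip(categories, outcomes):
        counts[cat] = counts.get(cat, 0) + 1
        hits[cat] = hits.get(cat, 0) + int(occurred)
    estimates = []
    for cat, occurred in zip(categories, outcomes):
        n_other = counts[cat] - 1
        hits_other = hits[cat] - int(occurred)
        estimates.append(hits_other / n_other if n_other > 0 else None)
    return estimates


cats = ["low", "low", "low", "high", "high"]
obs = [1, 0, 0, 1, 1]
print(loo_calibrated_probabilities(cats, obs))  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

In-sample, every "low" case would receive p(E|F) = 1/3; under leave-one-out, the case in which the event occurred receives 0.0 and the cases without the event receive 0.5, so each estimate moves against its own outcome.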

This simple calibration framework can be extended: similar methods can be applied to parametric probability density functions [28]. Below we discuss different calibration methods, but first we turn to more sophisticated decision models.

### 5.2. Extending the binary cost-loss model

In the simple cost-loss model the cost of taking protective action is the same whether the event occurs or does not occur. While this may be true for many economic decisions, when social and political dimensions are considered there is a clear penalty, in terms of confidence in the forecasting system and reduced possibility of action in the future, for false alarms. The binary cost-loss model can be developed further to include such a false alarm or ‘cry wolf’ effect. Such an extension is effectively an adjustment for the deviation from perfect rationality of forecast users.
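The following is a minimal sketch of the cost-loss rule with one possible way of adding the 'cry wolf' effect. A false-alarm penalty *F* is added to the cost *C* of protecting whenever the event does not occur. This penalty is an assumption made for illustration and is not a formulation taken from the source. Protecting is then worthwhile when *C* + (1 − *p*)*F* < *pL*, that is when *p* > (*C* + *F*)/(*L* + *F*). This reduces to the familiar *p* > *C*/*L* when *F* = 0. The function names are hypothetical.

```python
def protect_threshold(cost: float, loss: float, false_alarm_penalty: float = 0.0) -> float:
    """Event probability above which protecting minimises expected expense.

    Expected expense of protecting:     cost + (1 - p) * false_alarm_penalty
    Expected expense of not protecting: p * loss
    Equating the two gives p* = (cost + F) / (loss + F); with F = 0 this is C / L.
    """
    if not 0 < cost < loss:
        raise ValueError("expected 0 < cost < loss")
    if false_alarm_penalty < 0:
        raise ValueError("false-alarm penalty must be non-negative")
    return (cost + false_alarm_penalty) / (loss + false_alarm_penalty)

def should_protect(p_event: float, cost: float, loss: float,
                   false_alarm_penalty: float = 0.0) -> bool:
    """True when the (calibrated) event probability exceeds the decision threshold."""
    return p_event > protect_threshold(cost, loss, false_alarm_penalty)

print(protect_threshold(20, 100))      # classic cost-loss ratio
print(protect_threshold(20, 100, 10))  # a higher bar once false alarms are penalised
```

Penalising false alarms always raises the threshold, so a forecast that justifies action in the purely economic model may no longer do so.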

The above model can also be extended to more sophisticated decisions based on event probability thresholds, with different actions to be taken at different probability thresholds depending on the user's attitude to risk. We present a hypothetical example of an agriculturalist deciding whether to apply additional fertiliser, at a cost, with a potential payoff that depends on the probability of rainfall being above median. In this example a 20% probability of above-median rainfall is the threshold at which the cost of applying fertiliser is less than the expected payoff (Table 6). The decision thresholds in Table 6 provide a way of mapping from a given forecast to an action, again in relation to a binary yes/no event. Such tables depend on the details of individual enterprises and must be determined with regard to their operating costs and potential losses. The premise for Table 6 is the decision by wheat farmers to apply top-dressed fertiliser in order to benefit from expected rainfall [12]; however, the numbers selected are arbitrary and shown for illustration. Another management decision that could be studied using this methodology is the choice of cultivar, for example whether to plant a drought-tolerant strain of wheat or one with a higher potential yield in the event of good rains.

Using the true positive ratio we calculated for our sample rainfall forecasts in Table 3, the farmer would find that the calibrated 'low probability' forecasts from POAMA are not sufficient to justify the 'no fertiliser' action, because the observed frequency of above-median rainfall following such forecasts is above the 20% threshold. In other words, while the forecasts have skill, they have no value for this particular decision.
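Once the probability bands are fixed, decision tables like Table 6 can be applied mechanically. The sketch below assumes a hypothetical three-band table. Only the 20% threshold comes from the example above; the action names and the 60% band are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical decision table: (lower bound of probability band, action).
# Only the 20% threshold is taken from the worked example; the rest is illustrative.
DECISION_TABLE = [
    (0.0, "no fertiliser"),
    (0.2, "standard top-dressing"),
    (0.6, "heavy top-dressing"),
]

def choose_action(p_above_median: float) -> str:
    """Return the action for the band containing the calibrated probability."""
    if not 0.0 <= p_above_median <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    bounds = [bound for bound, _ in DECISION_TABLE]
    index = bisect_right(bounds, p_above_median) - 1  # last band whose lower bound <= p
    return DECISION_TABLE[index][1]

# A calibrated 'low probability' forecast whose observed event frequency is 0.33
print(choose_action(0.33))
```

Because the calibrated probability for a 'low' forecast (0.33 here) lies above 0.2, the table selects fertilising rather than 'no fertiliser', mirroring the argument above.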

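Another simple decision model is the theory of Kelly betting, which deals with cost-loss scenarios in the context of gambling. In this theory a gambler bets a fraction of their capital on each wager, chosen to maximise the expected logarithm of their capital and hence its long-run growth rate.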
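Consider a binary bet with win probability *p* that pays *b* units per unit staked. The textbook Kelly fraction is `f* = p - (1 - p) / b`, and nothing is staked when this value is not positive. The sketch below implements that standard formula and is not a method taken from the source. In a forecasting context, *p* would be a calibrated forecast probability. The function name is hypothetical.

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly-optimal fraction of capital to stake on a binary bet.

    p_win    -- probability that the bet wins (e.g. a calibrated forecast probability)
    net_odds -- net payout per unit staked if the bet wins (b in f* = p - (1 - p) / b)
    Returns 0 when the bet offers no positive edge, i.e. p * b - (1 - p) <= 0.
    """
    if not 0.0 <= p_win <= 1.0 or net_odds <= 0:
        raise ValueError("need 0 <= p_win <= 1 and net_odds > 0")
    return max(0.0, p_win - (1.0 - p_win) / net_odds)

print(kelly_fraction(0.6, 1.0))  # even-money bet with a 60% chance of winning
print(kelly_fraction(0.5, 1.0))  # no edge, so stake nothing
```

Like the cost-loss model, the rule only makes sense when *p* is calibrated: an overconfident probability leads directly to over-betting.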

### 5.3. Presenting probability outlooks

We now turn our attention to ways of presenting information about forecasts and their skill-based calibration. The actual contingency table (Table 2) has the advantage of containing almost all the usable information (assuming stationarity of the marginal distributions), but the disadvantage of requiring knowledge of verification methods to translate it into usable probabilities. A plot of the actual ensemble of past forecasts (Figure 4) allows users to eyeball the agreement and spread between forecasts and observations; however, it provides no quantitative information about how much credibility to assign to a particular forecast. The reliability diagram (Figure 5) provides this information, but is not intuitive for most users to interpret. A simple pie chart can also be used to present relative probabilities. Figure 6 shows how the model forecast adjusts the estimated probability of the event, together with the credible intervals implied by the size of the sample. It shows the prior climatological probability of the event and the updated probabilities, with 90% credible intervals for each forecast category. This plot is designed to communicate to end users how much the forecast ought to change their estimate of the event's probability, based on the rate of event occurrence following previous forecasts.
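Intervals of the kind shown in Figure 6 can be obtained from a Beta-binomial model. The sketch below makes two assumptions, which are not necessarily the choices behind Figure 6: a Beta prior centred on the climatological probability, and a small pseudo-count for that prior. It updates the prior with *k* events in *n* past forecasts and estimates the 90% equal-tailed interval by Monte Carlo sampling with the standard-library `random.betavariate`. The function name and the counts are hypothetical.

```python
import random

def credible_interval(events, n, clim_prob=0.5, prior_weight=2.0, level=0.90,
                      draws=50_000, seed=1):
    """Posterior mean and equal-tailed credible interval for p(E | F).

    Assumes a Beta prior centred on the climatological probability, worth
    `prior_weight` pseudo-observations, updated with `events` out of `n` forecasts.
    Interval end points are Monte Carlo quantiles of the Beta posterior.
    """
    if not 0 <= events <= n:
        raise ValueError("need 0 <= events <= n")
    a = clim_prob * prior_weight + events
    b = (1.0 - clim_prob) * prior_weight + (n - events)
    rng = random.Random(seed)  # fixed seed so results are reproducible
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return a / (a + b), lower, upper

mean, lower, upper = credible_interval(events=9, n=11)
print(f"posterior mean {mean:.2f}, 90% interval [{lower:.2f}, {upper:.2f}]")
```

With only about ten forecasts per bin the interval is wide. It narrows roughly in proportion to 1/√n as more forecast-verification pairs become available.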

Coupled model skill varies strongly by month, but this variation is difficult to resolve using the simple binning calibration method. Table 4 shows the contingency table and true positive ratio for June-July-August seasonal forecasts. The true positive ratio suggests that the forecasts have reasonable skill. It also suggests that a forecast of a high probability of above-median rainfall should shift the odds from even (50:50) climatological odds to 9:2 in favour of the event, a probability of about 0.82. Unfortunately, the small sample size in each probability bin results in very wide probability intervals, as shown in Figure 6 (left). These wide intervals around our month-by-month skill estimates are troubling. We know that skill varies strongly by month, but we are unable to quantify this adequately for these forecasts. One way to increase the sample size is to pool forecast-verification pairs, aggregating forecasts across different locations or across different times. Both procedures reduce the width of the credible intervals. However, both risk introducing autocorrelation among the pooled pairs, so that the effective number of independent samples is smaller than the nominal count. A similar sample size problem affects the statistical significance of attempts to calibrate forecasts for individual grid points.
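A common approximation makes the cost of autocorrelation concrete. For serially correlated pairs with lag-1 autocorrelation `r1` (an AR(1) assumption), the effective sample size is roughly `n * (1 - r1) / (1 + r1)`. The sketch below uses hypothetical numbers to compare the standard error of an observed event frequency in three cases: a single bin, a pooled set treated as independent, and the same pooled set with `r1 = 0.5`, as might arise when pooling overlapping seasons.

```python
import math

def effective_sample_size(n, lag1_autocorr):
    """Approximate effective sample size for n serially correlated values (AR(1) assumption)."""
    if not -1.0 < lag1_autocorr < 1.0:
        raise ValueError("lag-1 autocorrelation must lie in (-1, 1)")
    return n * (1.0 - lag1_autocorr) / (1.0 + lag1_autocorr)

def proportion_std_error(p, n):
    """Binomial standard error of an observed event frequency p based on n samples."""
    return math.sqrt(p * (1.0 - p) / n)

p_obs = 0.75  # hypothetical observed event frequency in one probability bin
cases = [
    ("single bin", 12),
    ("pooled, independent", 60),
    ("pooled, r1 = 0.5", effective_sample_size(60, 0.5)),
]
for label, n_eff in cases:
    print(f"{label:>20}: n_eff = {n_eff:5.1f}, std error = {proportion_std_error(p_obs, n_eff):.3f}")
```

In this example pooling still helps, but the 60 pooled pairs are worth only about 20 independent ones.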

A question for forecast users is how the probability range should affect the decision. The wider the interval, the less evidence exists that the forecast probability corresponds to a repeatable relationship between model and reality. Decision makers may prefer to assume climatological probabilities until this information can be sharpened. Theoretical work or modelling could determine optimum forecasts for selected decision-making cost functions.

### 5.4. Adjusting for Model Error in Continuous Forecasts

Calibration methods can be considered to adjust the probability distribution produced by the model by using information about its past performance, with the aim of providing unbiased and reliable forecasts. A straightforward approach to the generation of probability outlooks is to build a linear regression model for predictand

where

Such regression-based approaches can be made more robust to small sample sizes using Bayesian methods. In these methods the posterior distribution of the model parameters *θ* is given by Bayes' theorem as

$$p(\theta \mid H, O) = \frac{p(O \mid H, \theta)\, p(\theta)}{p(O \mid H)},$$

with hindcast data *H* and observations *O*. Here *p*(*O* | *H*, *θ*) is the likelihood function and *p*(*θ*) is the prior probability for the model parameters. Probability density functions for the model parameters *θ* can be determined using Markov chain Monte Carlo sampling [32].
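The following is a minimal sketch of the non-Bayesian version of this approach. It assumes the predictand is an observed seasonal rainfall anomaly, regressed on the ensemble-mean anomaly by ordinary least squares, with Gaussian residuals supplying the forecast spread. The probability of exceeding a threshold then follows from the predictive normal distribution. Several elements are hypothetical: the data, the use of a zero anomaly as the median, and the function names. Parameter uncertainty, which the Bayesian treatment addresses, is ignored.

```python
from statistics import NormalDist, fmean

def fit_regression(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b, residual std dev)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired values")
    xm, ym = fmean(x), fmean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ym - b * xm
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma = (sse / (n - 2)) ** 0.5  # unbiased estimate of the residual standard deviation
    return a, b, sigma

def prob_exceed(x_new, threshold, a, b, sigma):
    """P(y > threshold | x_new) under Gaussian residuals, ignoring parameter uncertainty."""
    return 1.0 - NormalDist(mu=a + b * x_new, sigma=sigma).cdf(threshold)

# Hypothetical hindcast: ensemble-mean rainfall anomaly (x) and observed anomaly (y), in mm
x = [-40, -25, -10, -5, 0, 5, 12, 20, 30, 45]
y = [-55, -10, -20, 10, -5, 15, 5, 35, 20, 60]
a, b, sigma = fit_regression(x, y)
print(f"P(above median) for a +25 mm forecast: {prob_exceed(25, 0.0, a, b, sigma):.2f}")
```

A weak regression slope or a large residual spread automatically pulls the probability towards 0.5. This is how the approach guards against overconfident ensemble spreads.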

### 5.5. Variance inflation

Johnson and Bowler (2009) outline a variance inflation technique which adjusts the ensemble forecast to meet two conditions: a) that ensemble members have the same variance as observations, and b) that the root-mean-square error of the ensemble mean be equal to the spread of the ensemble. A major difference between this and the previous method of linear regression with residual errors is that the ensemble spread remains a major determinant of forecast uncertainty. The first condition is designed to achieve statistical indistinguishability of the first two moments between ensemble members and observations. The second condition is designed to ensure that the ensemble spread accounts for the expected model error. These conditions are achieved by increasing (or decreasing) the perturbations of the ensemble members from the mean, while keeping the correlation between model and truth unchanged. The exception is a negative correlation between model and truth, in which case the sign of the correlation is reversed.

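Consider the ensemble mean $\bar{x}_t$ and member perturbations $z_{i,t} = x_{i,t} - \bar{x}_t$ for ensemble member $i$ at time $t$, with all quantities expressed as anomalies from the hindcast climatology. The adjusted ensemble members are

$$x'_{i,t} = \alpha\,\bar{x}_t + \beta\,z_{i,t}.$$

Coefficients *α* and *β* are computed as

$$\alpha = \rho\,\frac{\sigma_o}{\sigma_{\bar{x}}}, \qquad \beta = \sqrt{1-\rho^{2}}\;\frac{\sigma_o}{\sigma_e},$$

with observed variance $\sigma_o^2$, variance of the ensemble mean $\sigma_{\bar{x}}^2$, correlation *ρ* between the ensemble mean and the observations, and time average of ensemble variance $\sigma_e^2$. These values follow from the two conditions. First, the adjusted member variance is $\alpha^2\sigma_{\bar{x}}^2 + \beta^2\sigma_e^2 = \rho^2\sigma_o^2 + (1-\rho^2)\,\sigma_o^2 = \sigma_o^2$. Second, the mean squared error of the adjusted ensemble mean, $\sigma_o^2 - 2\alpha\rho\,\sigma_o\sigma_{\bar{x}} + \alpha^2\sigma_{\bar{x}}^2 = (1-\rho^2)\,\sigma_o^2$, equals the adjusted ensemble variance $\beta^2\sigma_e^2$. A negative *ρ* gives a negative *α*, which is how the sign of the correlation is reversed.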
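The sketch below computes the variance inflation coefficients from a hindcast set, using α = ρσ_o/σ_x̄ and β = √(1 − ρ²)σ_o/σ_e. These are the values implied by conditions (a) and (b) when the correlation is preserved. The sketch assumes the inputs are anomalies with equal ensemble sizes, so mean bias is not corrected. The function name and example data are hypothetical.

```python
import math
from statistics import fmean

def variance_inflation(hindcasts, observations):
    """Return (alpha, beta, adjusted) for a hindcast set.

    hindcasts    -- one list of ensemble-member anomalies per time (equal ensemble sizes)
    observations -- the observed anomaly for each time
    adjusted[t][i] = alpha * mean_t + beta * (x_ti - mean_t)
    Population (divide-by-n) variances are used throughout.
    """
    if len(hindcasts) != len(observations) or len(observations) < 3:
        raise ValueError("need at least three matching hindcast/observation pairs")
    means = [fmean(members) for members in hindcasts]
    mo, mm = fmean(observations), fmean(means)
    var_o = fmean((o - mo) ** 2 for o in observations)  # observed variance
    var_m = fmean((m - mm) ** 2 for m in means)         # variance of the ensemble mean
    cov = fmean((o - mo) * (m - mm) for o, m in zip(observations, means))
    rho = cov / math.sqrt(var_o * var_m)                # ensemble-mean/observation correlation
    # Time average of the ensemble variance (spread of members about their own mean)
    var_e = fmean(fmean((x - m) ** 2 for x in members) for members, m in zip(hindcasts, means))
    alpha = rho * math.sqrt(var_o / var_m)              # negative when rho < 0, flipping the sign
    beta = math.sqrt(1.0 - rho ** 2) * math.sqrt(var_o / var_e)
    adjusted = [[alpha * m + beta * (x - m) for x in members]
                for members, m in zip(hindcasts, means)]
    return alpha, beta, adjusted

# Hypothetical hindcast anomalies: 5 times, 3 ensemble members each
H = [[1.0, 2.0, 0.0], [-1.0, 0.5, -2.0], [3.0, 2.0, 4.0], [0.0, -1.0, 1.0], [-2.0, -3.0, -1.5]]
O = [2.0, -1.5, 4.0, 0.5, -3.0]
alpha, beta, adjusted = variance_inflation(H, O)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Unlike the regression approach, each member's deviation from the ensemble mean is retained and only rescaled. As a result, forecast-to-forecast changes in ensemble spread still influence the outlook.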

### 5.6. Regression estimate of event probability

Another method of adjusting probability forecasts is to regress the forecast probabilities directly against observed outcomes, coded as 1 for an event and 0 for a non-event. This method has the drawback that it is computed directly on ensemble-derived probabilities. Its advantages are that it makes no distributional assumptions and estimates only one parameter.
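The source does not give the form of this regression. One single-parameter version, assumed here for illustration, anchors the fit at the climatological probability *c*. It then estimates only a slope *b* in *p*_cal = *c* + *b*(*p* − *c*) by least squares against 0/1 outcomes. A fitted slope below one indicates overconfidence and pulls probabilities towards climatology. The hindcast probabilities, outcomes and function names are hypothetical.

```python
def fit_slope(forecast_probs, outcomes, clim_prob):
    """Least-squares slope b for p_cal = clim + b * (p_fcst - clim) against 0/1 outcomes."""
    if len(forecast_probs) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    num = sum((p - clim_prob) * (o - clim_prob) for p, o in zip(forecast_probs, outcomes))
    den = sum((p - clim_prob) ** 2 for p in forecast_probs)
    if den == 0:
        raise ValueError("forecast probabilities never depart from climatology")
    return num / den

def calibrate(p, b, clim_prob):
    """Apply the fitted slope and clip the result to a valid probability."""
    return min(1.0, max(0.0, clim_prob + b * (p - clim_prob)))

# Hypothetical hindcast: ensemble-derived probabilities and observed events (1) / non-events (0)
probs = [0.9, 0.8, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1]
events = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
b = fit_slope(probs, events, clim_prob=0.5)
print(f"slope = {b:.2f}; a raw 0.9 becomes {calibrate(0.9, b, 0.5):.2f}")
```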

### 5.7. General remarks on calibration

In the case of overconfident forecasts, calibration procedures reduce the departure of the forecast probabilities from climatology. This adjusts for the overconfidence at the cost of reduced resolution. Conceptually, this calibration step can be considered the application of a statistical model to the direct model output, in which the forecasts are corrected for mean state bias and for over-confidence in the ensemble distribution. Figure 7 presents the application of two calibration methods to model time series that exhibit high and low correlation, respectively, with the verifying observations. The central panel demonstrates the effect of the variance inflation adjustment described above, while the lower panel shows a regression adjustment with Bayesian parameter estimates.

It could be argued that such procedures degrade, or corrupt, model outputs. They use only the limited information contained in the model reforecast set and the available observations, and ignore any other information bearing on forecast reliability. This may be the case, but if such additional information can be specified, it can be included in the calculation of calibration factors. If it cannot be specified and measured, then we are hardly in a position to use it to inform our estimates of future probabilities!

In seasonal forecasting, calibration is complicated by the short length of the hindcast verification data set, typically 15 to 30 years, which imposes hard limits on how much information we can reliably say we have about the model. This paucity of data makes model skill assessments and model adjustment difficult because parameters calculated from the verification dataset will necessarily have large sampling error. For this reason it is desirable that calibration models have a small number of parameters.
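A correlation skill score illustrates how a short hindcast inflates sampling error. The sketch below uses Fisher's z-transform, a standard approximation that assumes roughly bivariate-normal data. It computes a 90% confidence interval for a hypothetical sample correlation of 0.5 from hindcast sets of different lengths.

```python
import math
from statistics import NormalDist

def correlation_interval(r, n, level=0.90):
    """Approximate confidence interval for a correlation via Fisher's z-transform."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

for years in (15, 30, 100):
    lo, hi = correlation_interval(0.5, years)
    print(f"{years:3d} years: r = 0.5, 90% interval [{lo:.2f}, {hi:.2f}]")
```

With 15 years of hindcasts, the interval for the same skill estimate extends almost down to zero. Any calibration parameter derived from such a record inherits this uncertainty.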

The problem is even thornier, because some circulation regimes, such as strong El Niño events, are thought to be more predictable than others. Other variables may therefore be strongly related to the accuracy of a given forecast. Indeed, the practice of ensemble forecasting is designed to reflect such changes in potential predictability. For example, strong El Niño and La Niña events lead to greater predictability of climate anomalies in affected regions while they are under way. Given this knowledge, information about the state of ENSO should in theory be used to estimate the certainty of seasonal outlooks. Empirical outlooks do just this, but schemes using this information for dynamical model-based outlooks are not yet common.

### 5.8. The multi-model approach

Another approach to the quantification of model error is to combine forecasts from a number of different yet plausible dynamical models. Multi-model combination aims to benefit from a better representation of uncertainty in model physics, model configuration and initialisation strategy. The multi-model approach is widely used in operational weather prediction (out to 7 to 10 days ahead). Model combination is complicated by differences between models in grid resolution, ensemble size, skill and mean bias, as well as by unresolved questions about model weighting. The multi-model approach has been criticized because combining a forecast from a poor model with one from a good model may yield a forecast less skillful than the good model alone, unless the models are weighted to reflect their skill.
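The simplest combination scheme is a weighted average, or linear pool, of the event probabilities from each model. The sketch below assumes the models' probabilities are already calibrated and that the user supplies the weights, for instance based on hindcast skill. The model names and weights are hypothetical, and the sketch does not address differences in resolution or bias.

```python
def combine_probabilities(model_probs, weights=None):
    """Weighted mean (linear pool) of event probabilities from several models.

    model_probs -- mapping of model name to forecast probability of the event
    weights     -- optional mapping of model name to non-negative weight;
                   equal weighting is used when omitted
    """
    if weights is None:
        weights = {name: 1.0 for name in model_probs}
    if any(weights[name] < 0 for name in model_probs):
        raise ValueError("weights must be non-negative")
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * p for name, p in model_probs.items()) / total

# Hypothetical models and skill-based weights
probs = {"model_a": 0.70, "model_b": 0.55, "model_c": 0.30}
print(combine_probabilities(probs))
print(combine_probabilities(probs, {"model_a": 0.6, "model_b": 0.3, "model_c": 0.1}))
```

Equal weighting lets the weakest model pull the result towards its own forecast. Skill-based weights reduce this effect, but they must themselves be estimated from the short hindcast record.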

### 5.9. Downscaling

Another family of model adjustments is motivated by the mismatch between the resolved scale of GCMs and the scale at which most decisions are made. GCMs can provide useful forecasts of atmospheric fields at seasonal timescales but are typically run at coarse spatial resolution such that the direct model output represents spatial averages over thousands of square kilometres (typically grid cells some 100 km in size). This coarse resolution poses a problem for applications that require forecasts at a finer spatial scale, especially in regions where the real topography causes local rainfall to diverge significantly from model grid averages. Where the errors to be corrected are primarily a result of the spatial scale of the GCM, the correction is called ‘downscaling’. Downscaling is desired for those Pacific islands where the interaction between the prevailing winds and local topography is a significant driver of variability, but the GCM does not resolve local topography.

The primary goal of downscaling is to replace the large-scale grid box value of a climate variable, in this case rainfall, with a value that better represents the local situation. One method of downscaling is that of meteorological analogues. In this approach, large-scale synoptic meteorological fields are used as predictors for small-scale variables. The seasonal GCM forecasts the large-scale fields, and historical periods with similar large-scale fields (analogues) supply the local outcome. The analogue method has been shown to produce good results for twentieth-century south-eastern Australian rainfall in the context of downscaling for climate change projections [33]. As with most statistical downscaling techniques, analogue downscaling is computationally cheap, in contrast to resource-intensive dynamical downscaling using nested atmospheric models.
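The following is a minimal sketch of the analogue idea. It finds the historical large-scale fields closest to the GCM forecast field and averages the local rainfall observed on those occasions. It assumes the fields are already flattened to vectors on a common grid and standardised, and it uses plain Euclidean distance and hypothetical data. Operational analogue schemes involve more careful predictor selection and weighting.

```python
import math

def analogue_downscale(forecast_field, archive_fields, archive_local_rain, k=3):
    """Estimate local rainfall as the mean over the k closest historical analogues.

    forecast_field     -- large-scale predictor values from the GCM (flattened grid)
    archive_fields     -- historical large-scale fields on the same grid
    archive_local_rain -- observed local (station) rainfall for each historical field
    """
    if len(archive_fields) != len(archive_local_rain) or not 0 < k <= len(archive_fields):
        raise ValueError("inconsistent archive or invalid k")
    ranked = sorted(
        ((math.dist(forecast_field, field), rain)
         for field, rain in zip(archive_fields, archive_local_rain)),
        key=lambda pair: pair[0],  # closest analogues first
    )
    return sum(rain for _, rain in ranked[:k]) / k

# Hypothetical archive of standardised large-scale fields and local seasonal rainfall (mm)
archive = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1], [-1.0, -0.8, 0.3], [0.2, 0.1, 0.0], [-0.5, -0.4, 0.2]]
rain = [310, 290, 120, 200, 150]
print(analogue_downscale([0.95, 0.45, -0.15], archive, rain, k=2))
```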

Figure 8 shows the topography resolved by a high-resolution numerical weather prediction model, and the topography resolved by a coarse resolution seasonal prediction model.

## 6. Software architecture: From models to systems

The design of systems for the generation and distribution of GCM based outlooks is architecturally complex. It is here that the interdisciplinary nature of the seasonal forecasting activity becomes clear. In addition to more traditional earth system science involved in understanding coupled ocean-atmosphere processes, the tasks of data processing, data modelling and information system architecture require advanced computing skills. We now outline a general pattern for the design of systems for the delivery of seasonal forecasts to end users which is a generalisation of the implementation described above and in [34].

Four distinct layers can be defined as components of the overall process of turning the outputs of GCMs into seasonal outlooks suitable for use by decision-makers.

The model layer comprises the GCM simulating the evolution of the coupled ocean-atmosphere system. This component is a complex software system in itself, integrating the ingestion of data analyses (themselves based on multiple networks of observations), the assimilation of these observations into the model integration cycle, and the output of variables of interest. This layer is the domain of earth system scientists and experts in numerical computation. GCMs are typically the result of the combined efforts of a large number of such scientists and engineers working over a long period of time.

In the forecast generation layer, forecast products are generated from dynamical model output. This process involves statistical corrections for model biases and may involve the integration of outputs from a number of different models. Decisions about which model outputs to use will be based on the analysis of model performance over an historical period. The resulting derived forecast products are typically stored in self-describing files with additional metadata to support the clients that deliver the outlooks. By storing the generated forecasts in an accessible, metadata-rich format, they are easily ingestible by downstream clients, whether these are simple viewers or more complex models that use the calibrated model data as one input among many. GCM outputs may be used to drive other models (for example hydrological models). In general, metadata is preserved as data is processed, and new metadata is added to describe transformations. This enables downstream users of the data to understand its provenance. Metadata curation, while tedious, should be considered a best practice if data is to be made public and its use promoted. Practitioners in this domain will typically be data scientists, statisticians, and climate scientists who work closely with forecast users.
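The following is a minimal sketch of carrying metadata through a processing step. The input attributes are copied rather than modified, and a timestamped line describing the transformation is appended to a `history` attribute. This loosely follows the NetCDF convention for recording provenance. The dictionary layout, function name and values are hypothetical.

```python
import json
from datetime import datetime, timezone

def apply_step(dataset, description, func):
    """Apply func to the data values and record the step in a copied 'history' attribute."""
    attrs = dict(dataset["attrs"])  # copy, so the input metadata is left unchanged
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp}: {description}"
    # One line per processing step, oldest first
    attrs["history"] = f"{attrs['history']}\n{entry}" if attrs.get("history") else entry
    return {"values": func(dataset["values"]), "attrs": attrs}

raw = {
    "values": [12.0, 30.5, 7.2],
    "attrs": {"title": "Hypothetical seasonal rainfall forecast", "units": "mm",
              "source": "GCM ensemble mean"},
}
calibrated = apply_step(raw, "bias corrected: subtracted 2.0 mm hindcast mean bias",
                        lambda vals: [v - 2.0 for v in vals])
print(json.dumps(calibrated["attrs"], indent=2))
```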

At the data service layer, forecast data is exposed via a data server, which makes it available using standard interfaces such as OPeNDAP [35]. At this stage, the generated outlooks are data products, not graphical products. The format of the output does not depend on the particular dynamical model, or even on whether the model is dynamical: the forecast is simply a time series of gridded data with descriptive metadata. This layer is the domain of information architects and software engineers with expertise in moving data across networks efficiently.
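OPeNDAP lets clients request a subset of a remote dataset by appending a constraint expression to its URL. The sketch below builds a DAP2-style request that selects a hyperslab of one variable. Each dimension is indexed as `[start:stride:stop]`, with zero-based, inclusive indices. The server address, dataset path and variable name are hypothetical. Some HTTP stacks also require the square brackets to be percent-encoded.

```python
def dap2_subset_url(dataset_url, variable, hyperslabs, response="ascii"):
    """Build a DAP2 (OPeNDAP) request URL selecting a hyperslab of one variable.

    hyperslabs -- one (start, stride, stop) tuple per dimension; indices are
                  zero-based and the stop index is inclusive
    response   -- DAP2 response suffix, e.g. "ascii", "dods", "das" or "dds"
    """
    constraint = variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in hyperslabs
    )
    return f"{dataset_url}.{response}?{constraint}"

url = dap2_subset_url("http://example.org/opendap/poama/rain_outlook.nc", "rain_prob",
                      [(0, 1, 2), (10, 1, 20), (30, 1, 40)])
print(url)
```

Because the request is only a URL, any client that speaks the protocol can retrieve the same subset, whether a browser, a script or a downstream model. This is the decoupling the layered design aims for.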

The product service layer provides the means for the majority of forecast users to access the products they require, typically in the form of maps and graphs presented as images, data tables and expert commentary. Such products are developed by climate scientists and associated professionals with expertise in data visualisation, usually in close consultation with forecast users. This layer may take the form of pre-generated images and tables, or of complex applications that obtain and process data directly from the data service layer using web services.

The use of open standards, interoperable systems and simple, clean interfaces simplifies the challenge of integrating data from multiple streams into usable seasonal forecasts. Systems need to be interoperable to reduce the cost of exporting and ingesting data. This procedure is required at all stages of the process, from the modelling (where analyses must be ingested for the initialisation of models) to the product services (where potentially large volumes of image and web page requests must be serviced). The use of open standards supports this interoperability. Open standards arise in communities of practice over periods of time, and generally become enshrined in documentation and formally supported by inter-institutional bodies. They are to be preferred over the creation of new *ad hoc* formats and interfaces. Clean interfaces mean that coupling between system modules should be kept to a minimum, and that system modules communicate with each other as far as possible using the standards described above. The integration of model outputs into arbitrary decision-support systems and downstream models is supported in two ways. The model output and post-processed model-based forecasts are provided in standard formats, and web services expose the data and metadata via clearly defined protocols.

### 6.1. Designing seasonal outlook products and tools

Agile software development methodologies allow technical development to proceed simultaneously with the gathering of user feedback and refinement of designs. They are characterised by short development cycles with clearly defined goals and sub-goals. Beginning development early allows technical issues to be solved while requirements are still being refined. This avoids delays when system requirements cannot be completely specified in advance, or when the scientific results that underpin the forecast products cannot be anticipated. More traditional software development lifecycles, such as the so-called 'waterfall' model, depend on system requirements and features being specified early and held static throughout the development period.

In the agile model of software development, regular user testing takes place at each development increment with user feedback incorporated into the next iteration. A test system may be made available for the use of developers and other project team members. An agile approach suits small and specialised project teams, as the flexibility of the agile approach may hold management risks for a larger development group. This approach enables responsiveness to the requirements of the end users of the system.

### 6.2. A case study: The Pacific seasonal forecast portal

The development of web based tools integrating model-based outlooks with climatological information and other contextual information is one means of communicating information about climate risk to end users. [36]

A system was developed to ingest output data from the Predictive Ocean Atmosphere Model for Australia (POAMA) GCM [17]. A user-facing component was developed, based on a rich web-based interface that provides a one-stop shop for access to dynamical model-based outlooks. The purpose of the tool is to provide a specialised point of access to coupled GCM (CGCM) based seasonal outlooks for the national meteorological services of Pacific Island countries.

This project was supported by the Pacific Adaptation Strategy Assistance Program (PASAP). PASAP is a component of the International Climate Change Adaptation Initiative, an Australian Government initiative of $328 million over five years (2008-2013) to assist with high-priority climate adaptation needs in vulnerable countries in the Asia-Pacific region. As part of this program, the Australian Bureau of Meteorology led a project to strengthen climate prediction capacities in the national meteorological and hydrological services of Pacific Island countries both north and south of the equator: Papua New Guinea, Tuvalu, Kiribati, Fiji, Marshall Islands, Federated States of Micronesia, Palau, Nauru, Cook Islands, Samoa, Tonga, Niue, Solomon Islands, Vanuatu and East Timor. A key element of this work was the development of a web-based application providing access to dynamical model-based seasonal outlooks. As previously described, one means of reducing vulnerability to climate change is to improve preparedness for anomalous climatic events.

Graphical displays of broad-scale, point and climate driver forecasts are generated; an example is shown in Figure 9. The web application displays the contextual information provided as metadata by the data service layer, and consumes the outputs of web services that produce figures and tables. It displays model-based outlooks as overlays on dynamic maps using geospatial web services. Access is given not just to application graphics but also to outlook data. User-friendly options for data extraction from the web portal are provided to support users of a range of tools, from Excel to R.

An agile, iterative approach to the development of the web portal user interface (UI) included testing of early development versions of the portal with users at a project workshop, and in a series of country visits. These sessions validated the overall UI design and provided valuable feedback for improvements.

While much work went into the web front end, an equal amount of work was spent ensuring that the forecast generation layer is decoupled from this specific client. Access to the data through the data service layer allows for the future design of web clients that add computational value using processing services. One example is the ingestion and subsequent combination and calibration of multiple selected models.

The integration of data into a dynamical mapping tool provides opportunities for data mash-up in which data from different sources is displayed in composite. The provision of geospatial information in such a way that data from multiple sources can be integrated opens the way for new and interesting applications. For example, one potential future application for seasonal forecasting might be the display of agricultural or fishery yield data overlaid with outlook reliability data.

Over the course of the project several workshops were held, bringing representatives of partner countries together with scientists and service developers. These workshops provided training in the use and interpretation of dynamical seasonal predictions and introduced the software tools developed to provide the seasonal outlooks. Communication and training are essential elements in the development of outlook products: producing well-calibrated outlooks is not effective unless the end users are equipped to use them correctly. A particular challenge is that of communicating seasonal forecasts that are couched in terms of probabilities, and which may differ from model to model. Both formal and informal user assessment of outlooks must be carried out to ensure that what the forecast provider thinks is being communicated is what is actually being communicated. An iterative design process in which users are consulted early and often reduces the risk of miscommunication, and allows for learning to proceed over a period of time.

In the workshops, scientists presented lectures on three topics: the physical basis of seasonal predictability, the historical skill of the POAMA GCM in predicting ocean and atmospheric conditions across the Pacific, and the software tools developed to provide access to the latest seasonal forecasts based on the coupled models. In one workshop, participants also worked through a series of exercises using the portal to generate seasonal outlooks for their local region, which they then described to the group in a series of successful presentations. Such hands-on exercises are highly effective for developing skills in using seasonal forecasts and associated tools. They also help to assess the knowledge and level of engagement of participants, and to test whether the tool works properly under realistic conditions.

In discussions, participants highlighted the importance of climate studies focused on improving the understanding of climate variability in Pacific Island Countries. They noted that climate variability interacts with climate change to produce many of the first-felt impacts of climate change. Improved knowledge of extreme climatic events, with the assistance of tailored forecast tools, will help enhance the resilience and adaptive capacity of communities affected by climate variability and change.

## 7. Conclusion

From the physical basis to the complexities of applications to specific industries and decisions it is clear that seasonal prediction is a large-scale enterprise requiring coordinated work across a range of scientific and technological disciplines. Steady improvements in GCM resolution and physics, coupled with ever increasing understanding of the physical mechanisms of predictability, will ensure that seasonal predictions become an important component of adaptation to a changing and more variable climate.