Details about GCMs used in this study with their resolutions and other brief information.

## Abstract

In climatology, there are mainly two types of models used, that is, global circulation/climate models (GCMs) and regional climate models (RCMs). GCMs can be run for the whole globe, while RCMs can be run only for a part of the globe. In this chapter, we provided a general statistical methodology for evaluating process-based (GCM or RCM) climate models. To bridge observed and simulated data sets, statistical bias correction was implemented. A meta-analysis technique is used for selecting a model or scenarios, which have good performance compared to others. For model selection and ensemble projection, Bayesian model averaging (BMA) is used. Posterior inclusion probability (PIP) is used as model selection criterion. Our analysis concluded with a list of best models for maximum, minimum temperature, and precipitation where the rank of the selected models is not the same for the listed three variables. The outputs of BMA closely followed the pattern of observed data; however, it underestimated the variability. To overcome this issue, 90% prediction interval was calculated, and it showed that almost all the observed data are within these intervals. The results of Taylor diagram show that the BMA projected data are better than the individual GCMs’ outputs.

### Keywords

- bias correction
- climate change
- meta analysis
- model selection
- posterior inclusion probability

## 1. Introduction

This chapter is basically about statistical evaluation of climate models; however, prior to model’s evaluation, it is important to highlight briefly about climate models and their types. In the literature, climate models are also known as process-based models as these models work closely to the physical process of the climate of our planet. Broadly speaking, climate models can be divided into two types, global circulation/climate model (GCM) and regional climate model (RCM). As the name of these models suggests, GCM can be run for the whole globe, while RCM can be run for a particular location of interest. Due to different uncertainties, we need to evaluate these models before using their outputs for further analysis. Evaluation of a climate model means assessing a model’s performance so as to articulate the grounds on which a model can be declared good enough for its anticipated use. Model’s evaluation is an important step in climate change assessment and impact assessment studies. It provides guide lines to choose the best models or scenarios for further analysis. For evaluation of climate models, we need observational data and historical simulated data by using climate models for the same variables and for the same time period. This chapter presents a combination of classical and Bayesian statistical approaches for the evaluation of climate models. This chapter is structured as: right after this brief introduction, climate models in general and GCM and RCM in particular and their evaluation are briefly explained, and methodology is discussed in Section 2, Section 3 is reserved for results and discussion, and Section 4 comprised of summary and conclusion.

### 1.1. Climate model

A climate model is a complex system of mathematical equations which represents the physical process among various components which contribute to the climate of our globe. To run a GCM, the globe is divided into a number of grid boxes with horizontal resolution (latitude and longitude) and vertical resolution (height or pressure). The climatology is solved in each grid after providing the initial conditions to the main deriving climate variables like temperature, wind speed, humidity, pressure, etc. The brief details about GCMs and RCMs are presented in the subsequent sections, for more details about GCMs, we refer to [1].

### 1.2. Global climate model

GCMs are the most modern and sophisticated tools available to provide basic information about climate globally. These models comprise complex mathematical equations which represent the physical process of atmosphere, ocean, cryosphere, and land surface. GCMs are run for the whole globe, and due to the complex system, it takes long time in simulations; however, super computers can be used to make their performances more efficient. According to [1], a GCM is “Numerical models, representing physical processes in the atmosphere, ocean, cryosphere and land surface, are the most advanced tools currently available for simulating the response of the global climate system to increasing greenhouse gas concentrations.” The process of simulating climate systems by using climate models is called dynamical modeling or dynamical downscaling. The nesting of dynamical modeling and simulation of climate systems is presented in Figure 1 using GCM and RCM, while a list of some popular GCMs is presented in Table 1 along with some basic information about each model.

S. No. | Name | Resolution (Degree*) | Institute/Center |
---|---|---|---|

1 | CanESM2 | 2.7906 × 2.8125 | Canadian Centre for Climate Modeling and Analysis |

2 | CCSM4 | 1.250 × 0.942 | National Center for Atmospheric Research |

3 | CESM1-CAM5 | 1.250 × 0.942 | National Center for Atmospheric Research |

4 | CMCC-CMS | 1.875 × 1.865 | Centro Euro-Mediterraneo per I Cambiamenti Climatici |

5 | CNRM-CM5 | 1.406 × 1.401 | Centre National de Recherches Meteorologiques |

6 | EC-EARTH | 1.1215 × 1.125 | Irish Centre for High-End Computing (ICHEC), European Consortium |

7 | GFDL-ESM2G | 2.500 × 2.023 | Geophysical Fluid Dynamics Laboratory |

8 | GFDL-ESM2M | 2.500 × 2.023 | Geophysical Fluid Dynamics Laboratory |

9 | INM-CM4 | 2.000 × 1.500 | Institute for Numerical Mathematics |

10 | MIROC-ESM-CHEM | 2.7906 × 2.8125 | National Institute for Environmental Studies, The University of Tokyo |

11 | MPI-ESM-LR | 1.875 × 1.865 | Max Planck Institute for Meteorology (MPI-M) |

12 | MPI-ESM-MR | 1.875 × 1.865 | Max Planck Institute for Meteorology (MPI-M) |

13 | NorESM1-M | 2.500 × 1.895 | Norwegian Climate Centre |

### 1.3. Regional climate model

RCMs are also process-based climate models and comprise complex mathematical equations like GCMs; however, these models can be run for a particular location of interest. As GCMs can be run for the whole globe, therefore, the grid size is coarser, and the information in the form of output of GCMs is at lower resolution. For impact assessment studies like impact of climate change on water resources, agriculture, urban planning, etc., we need information at higher resolution. Toward this end, we need to do regional climate modeling which provides climatic information at higher resolutions. Due to the rapid development in computational technology, the modern RCMs can be run with a resolution of 10 km or even higher resolution.

### 1.4. Evaluation of climate models

Evaluation of the process-based climate models is an important step in climate change assessment and impact assessment studies. Different methods can be used for this purpose; however, we presented a combination of advanced statistical methods for model evaluation including classical and Bayesian approaches. Evaluation of process-based climate model will give guide lines about model or scenario selection to researchers in climate changes assessment and studies related to the impact assessment of climate change in different areas. This will help researchers to use specified and representative models rather than randomly selected model and to get realistic results. Statistical bias correction is important to reduce the gap between observed and model’s simulated data and considers initial step toward climate change assessment. Meta analysis is used for scenario analysis, which assigns higher weights on the basis of precision of a particular scenario. For model (GCM in our case), we used Bayesian model averaging (BMA) technique to choose a set GCMs performing better than others. On the basis of chosen GCMs, ensemble projections were calculated using BMA technique and further evaluated using Taylor diagram along with individual GCMs’ outputs. Further details are presented in methodology section about each method and examples with discussion in results and discussion section.

### 1.5. Objectives of this study

This study aims to present statistical methods including frequentist as well as Bayesian which can be used for the evaluation of process-based models with detailed examples and discussion using the real world data.

## 2. Data and methodology

### 2.1. Data

Two types of data sets have been used in this research, observed and climate model’s simulated data. Observational data were acquired from Pakistan Meteorological Department (PMD) on daily frequency for three climate variables including maximum, minimum temperature, and precipitation for different locations of Pakistan. RCM’s simulated data were collected from COordinated Regional climate Downscaling Experiment (CORDEX), Met. Office of the United Kingdom (UK), and Global Change Impact Studies Centre (GCISC), Pakistan. For evaluation purpose, we used data for the baseline time period, which is 1960–1990 for both observed and simulated data, but for calculating ensemble projections and their evaluation, the baseline is considered as 1975–2005.

### 2.2. Statistical bias correction

According to [2], all models are wrong but some of them are useful. Climate models provide useful information; however, there are various sources of uncertainties which have influence on the outputs of these models [3, 4]. To bridge the difference between observed and model’s simulated data, we need to utilize statistical methods. In order to carry out statistical bias correction, different statistical methods were developed starting from simple to most sophisticated ones. For detailed literature about statistical bias correction methods, we refer to [5, 6]. We present the methodology of latest developed methods by [7], which preserve trend and climate extremes in future climate model’s simulations called quantile delta mapping (QDM).

A four step methodology is required to implement the QDM method, starting from the cumulative distribution function (CDF) of model projected series

To proceed to the second step, we need to find the relative change using the ratio of the inverse CDF of model predicted data applied to the CDF of model predicted data and the inverse CDF of historical observed data applied to model predicted data. Mathematically, this can be written by Eq. (2).

The quantiles of model’s predicted data

Finally, the bias corrected future projections can be obtained by applying the relative changes to the historical bias corrected data presented in Eq. (4).

### 2.3. Meta analysis

Scenario’s or model’s assessment is an essential part of climate change analysis as it provides valuable information about a particular scenario or model. Meta analysis is a statistical method which can be used for this purpose; however, it is also a useful technique to produce a combined estimate of projections from individual model outputs or different scenarios. It gives weight to each study on the basis of its precision and, consequently, provides confidence in future projections which have higher precision. Usually, researchers prefer models, scenarios, studies, laboratories’ outputs, etc., which have higher weights than those with lower weights. In order to accomplish the evaluation of models or scenarios using meta analysis, the three step methodology is explained briefly in the following subsections.

#### 2.3.1. Selection of the model

There are two basic models to perform a meta analysis: the fixed effect model (FEM) and the random effect model (REM) [9]. The FEM assumes that all the studies included in the meta analysis come from a single identical population or share a common effect (mean or average), while a REM assumes that the effects of the studies included in the meta analysis form a random sample from a population following a specified distribution. The observed effects in the FEM and REM are mathematically presented in Eqs. (5) and (6), respectively. Suppose we have k studies, and let ^{th} study effect, and

Here, ^{th} study, and the random effects

then the random effect model reduces to the fixed effect model. Model selection is mainly based on the nature and objectives of the study [9, 10, 11]. This is an important step because the remaining two steps depend on model selected. Therefore, model selection must be made carefully. This chapter presented both models for explanation, and in results and discussion section, we present an example to make clear the differences between these two models.

#### 2.3.2. Weighting schemes for parameter estimation

Different weighting schemes are available for the estimation of the effect size in meta analysis; however, it depends on the nature of the study to choose one of them [9]. We proceed with the so-called inverse-variance weighting technique for quantifying the effect size in our analysis. For details about different weighting schemes, we refer to [10]. According to [9], all the available schemes are efficient because they assign higher weights to more precise studies. In case of a fixed-effect model, the weights are calculated by Eq. (8).

where ^{th} study. In a random effect model, the weights are calculated by Eq. (9).

where ^{th} study, and

#### 2.3.3. Estimation of parameters

The next step is to estimate the unknown parameters of the specified model by incorporating the weighted least squares method given by Eq. (10).

The

where

### 2.4. Model (GCM) selection, ensemble projections, and their evaluation

In this part, Bayesian model averaging (BMA) is used for two purposes, that is, model selection and producing ensemble projections by using the outputs of selected GCMs and finally the evaluation of models’ outputs. Table 1 lists some popular GCMs with brief details about their resolution, and the institutes where each model was developed. The outputs of these models are used in the subsequent sections of this chapter. However, before embarking on this journey, it is important to discuss briefly the concept of posterior probability which is at the core of the Bayesian approach. Bayesian model averaging is discussed afterwards.

#### 2.4.1. Posterior probability

Bayes’ theorem states that the posterior probability of

In Eq. (11),

The prior distribution of a model shows the probability allocated to a statistical model. In this study, we have

Eq. (14) shows that in this case the posterior probability of a model is only determined by the likelihood of observational data. Likelihood of a model reflects the ability to reproduce a given system of observed data. Different likelihood functions have been proposed to calculate the likelihood,

#### 2.4.2. GCM selection

The rapid developments in the computational technology make it practicable to run complex process-based climate models for simulating complex climate systems of our planet; however, the output from a single model still may have uncertainties [18]. There has been a number of GCMs developed to project the future global climate change and use their output for impact assessment studies in different areas [1]. Due to different parameterization schemes of GCMs, internal atmospheric variability [19], and uncertainties in input data, different GCMs may produce quite different results. Therefore, it is important to consider more GCMs instead of relying on a single GCM. Regression models can be used to estimate the observed climate by using outputs from different GCMs as covariates. In a regression model context, the problem of uncertainty modeling has been raised by Raftery et al. [20]. In such models, covariate (GCM here) selection is a basic part to build a valid regression model, and the objective is then to find the “best” model for the response variable and a given set of predictors. The first problem to solve is which covariates should be included in the model and how important are they? Suppose we have a response variable

The same procedures are used here for GCM selection, where GCM now stands for a model *j* = 1,…, *m* = 13; and a uniform prior was used as a prior probability distribution for GCMs (

It has a strong and solid background in Bayesian statistics, and there is a rich body of literature on BMA and PIP. GCM selection was made along the following four steps:

Run BMA as presented in Eq. (15).

Calculate the posterior probability for each GCM included in all possible regression models in step 1.

Sum the posterior probabilities for each GCM from all possible models called PIP in step 2.

Decide about the models having higher PIPs.

As the criterion is probability; therefore, we prefer models with higher PIPs than those with lower PIPs. Similarly, this procedure can be used for model selection in other areas like hydrology, ecology, forestry, etc.

#### 2.4.3. Ensemble projections

Normally, it is assumed in standard regression modeling that a single model be the true model to examine the response variable given a set covariates, but other probable models could give different outcomes for the same problem at hand. The typical approach, which means conditioning on a single model supposed to be true, nevertheless, it does not take account of model uncertainties. One way is to compute an arithmetic ensemble mean (AEM) as a prediction as this could provide better results than any of the single model’s output; however, this approach gives no information about the uncertainty that the predictions have [21]. BMA overcomes this issue by estimating the regression models using all possible combinations of covariates given in Eq. (15) and then builds a weighted average model from all possible models. Thus, it provides probabilistic projections where the weights are the posterior probabilities during the training period, and these are directly tied to the performance of the models [20, 22, 23, 24]. The predictive probability density function (PPDF) of BMA of a variable of interest is the weighted average of PDFs of individual forecasts where the weights are the posterior model probabilities [20]. The performance of BMA is considered better in different areas such as ground water modeling, weather forecast, hydrological predictions, and model uncertainty analysis [25, 26, 27, 28, 29, 30, 31]. Suppose we have a set of k covariates (different GCMs’ outputs in this study), then there are

where

In Eq. (18), the predictive variance has two parts, one is the between models variance, and the other one is the within model variance [20]. The variance

Here,

#### 2.4.4. Taylor diagram

Taylor diagrams are rather sophisticated diagrams for graphical evaluation of a system, process or phenomenon. It was invented by [31] in 1996; however, it was published later in 2001 to aid researchers in comparative assessment of the performance of different models. The diagram is used to quantify the degree of correspondence between observational and modeled data sets in term of three statistics, standard deviation (SD), root mean square error (RMSE), and correlation coefficient (CC). We used the R software system to create the Taylor diagrams (a) for maximum temperature, (b) for minimum temperature, and (c) for precipitation presented in Figure 6. The data on all these variables are taken from northern Pakistan.

The interpretation of Taylor diagrams is straight forward; however, sometimes it feels tricky and needs basic understanding of statistics. The model’s performance is considered better if the modeled and observed data have strong correlation, and the modeled data have low RMSE and have closer standard deviation to that of the observational data.

## 3. Results and discussion

This section provides analysis and results about the methodology presented in Section 2. Each section presented in methodology section is explained with examples and detailed discussion.

### 3.1. Statistical bias correction

Figure 2 presents the results about evaluation of statistical bias correction methods applied to temperature and rainfall data taken from Northern Pakistan. In Figure 2, observed represents observed data, Sim-baseline is model’s simulated data for the base line period (1960–1990), and Sim-BC stands for model’s simulated data after the application of statistical bias correction techniques. For comparison, we keep the time duration of both data sets (simulated and observed) same (1960–1990). It can be seen from Figure 2 that there are marked differences between observed and regional climate model’s simulated data for temperature and precipitation in Northern Pakistan. The left panel displays maximum temperature, while the right panel is about precipitation. In both parts, original simulated results deviated from observational data; however, after the application of statistical bias correction techniques, the differences are reduced, and the pattern is followed in a better way, particularly for precipitation, where we have quite a large difference between observed and simulated precipitation data. This example is about 30 years averaged data for each month; however, these techniques can be applied to data on other frequencies like daily, hourly, etc.

### 3.2. Meta analysis

In Figure 3, climate change scenarios were analyzed, and the same procedures can be performed for model selection depending on the researcher’s objective. In Figure 3, on the left side under the heading, studies 1, 2, 3, and 4 represent the A2, B2, RCP4.5, and RCP8.5 scenarios, respectively. The former two are chosen from the fourth assessment report (AR4), while the latter are the two scenarios stem from the fifth assessment report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). In this study, scenario analysis was performed for the mean difference between baseline and future time period. The subheadings total, mean, and SD under experimental and control stand for total number of observation included in a particular scenario, mean value of each scenario, and standard deviation of each scenario, respectively. Experimental and control stand for the baseline period and future time period, respectively. The thick black vertical line shows no difference between the mean values of experimental and control periods. The dotted vertical line shows the combined mean difference. Against study 1, there is outcome effect mean difference for scenario 1, similarly for other scenarios. The length of the line on each box shows the width of the confidence interval, and the size of the box (square shape) shows the weight assigned to a particular study. As the weights are assigned on the basis of precision of a scenario (in our case), therefore, the scenarios receiving higher weights have less variance and consequently exhibit shorter confidence intervals. The bigger the square box the higher the weights assigned to a particular scenario. The two diamond shapes represent combined mean differences where the upper one is for FEM, while the lower one is for REM. The width of the diamond shows the confidence interval under the selected model, and it is obvious that the width of the diamond for REM is larger than that of FEM. The reason for this difference is that REM also considered the variation between studies (Eq. (9)), while a FEM does not consider this variation (Eq. (8)). Bear in mind that two R packages “metafor” and “meta” have been used for this analysis. For details about these packages, we refer to [11, 12], respectively.

### 3.3. GCM selection and ensemble projection

#### 3.3.1. GCM selection

Figure 4 presents the output of model selection results for 13 different GCMs where the criterion of model selection is PIP. To our knowledge, this technique is used for the first time in atmospheric sciences for the purpose of GCMs selection. The model selection procedure was run multiple times (six times here) with small changes in sample size; however, no significant changes have been noted in the end results. In Figure 4, the results are presented only for maximum temperature; nevertheless, the same procedures can be used for other variables. For maximum temperature, top five selected GCMs are canESM2, CESM-CAM5, EC-EARTH, GFDL-ESM2G, and INM-CM4 in ascending order. It was investigated that different variables (minimum temperature and precipitation in our case) have different lists of top GCM; however, they shared some models in the top list. The list of top five models has maximum PIPs and can be used for further analysis rather than to use all GCMs.

#### 3.3.2. Ensemble projections

Figure 5 elaborates the results of BMAs’ outputs calculated from selected GCMs compared with observational data sets for three key climate variables, maximum temperature, minimum temperature, and precipitation for the duration of 30 years (1975–2005) considered as baseline period here. As BMA is a regression-based approach, it estimates the mean value and underestimates variation. To cope with this issue, 90% prediction intervals were calculated for each variable and plotted. In Figure 5, red, blue, and deep gray colors represent observational, BMAs’ outputs, and 90% prediction intervals for each variable. From the upper panel, it is difficult to get a clear conclusion as it is for 30-year data; however, from the bottom panel, clear information can be inferred. We can see from parts d (maximum temperature) and e (minimum temperature) that BMAs’ outputs follow the pattern nicely; however, it does not capture well the variability in both cases. The 90% prediction intervals cover almost all the variation for both variables. For precipitation, we calculated upper 90% prediction intervals, and it can be seen that it covers the observational values; however, quite a few values are still outside because precipitation is a more complex phenomenon than the temperature. Similarly, we can calculate ensemble projections from various models or climate change scenarios in hydrology, agriculture, ecology, etc. to address the uncertainty while relying on the single model’s output.

### 3.4. Taylor diagram

The green lines inside the graph (Figure 6) show RMSE between observed and modeled data sets, while the blue lines show standard deviation of each data set. The straight lines inside the graph show the correlation coefficient between modeled and observational data sets. The dots on the horizontal lines represent observed data in each part of Figure 6. Look at part (a) of Figure 6 which is about maximum temperature and the purpose is to assess which model’s simulated data is best as compared to other models. We calculated ensemble projections by using the outputs from all selected GCMs by using BMA and plotted them in each part of Figure 6. We can see that the BMA performance is superior to the performance of individual model’s output. The output of BMA has higher correlation with observed data than that of individual GCM’s outputs. The BMA’s result also has less standard deviation than individual model outputs and smaller RMSE than the individual GCMs’ outputs. The other parts of Figure 6 can be interpreted similarly. In the same way, other models can be evaluated with other variables mentioned in the above example and can help in model’s selection in different areas.

## 4. Summary and conclusion

This study aims to present statistical methodology for evaluating process-based climate models. Different techniques have been presented for this purpose including statistical bias correction, meta analysis, model selection, ensemble projections, and Taylor diagram. The application of statistical bias correction bridged regional climate model’s simulated and observational data. The performance of bias correction technique is better for temperature than precipitation; however, bias-corrected precipitation follows the observed precipitation’s pattern nicely. Meta analysis can be used for different purposes like model selection, scenario analysis, etc. In this study, meta analysis is used for scenario analysis by considering four different scenarios two each from fourth assessment report (AR4) and fifth assessment report (AR5) of the IPCC. Meta analysis shows higher confidence in RCP projections and assigned higher weights on the basis of their precision. GCM’s selection is of course important part in climate change assessment as there are many GCMs available. BMA is used for this purpose, and the results show that different variables have different ranks for different GCMs; however, they shared some GCMs in the list of best models. On the basis of selected GCM, ensemble projections were calculated using BMA technique. The results of GCMs and BMA’s outputs were then evaluated by using Taylor diagram. Evaluation statistics used in Taylor diagram are root mean square error, correlation coefficient, and standard deviation of each data set. The evaluation confirms that ensemble projections are better than individual GCMs’ outputs; nevertheless, we need to conduct this type of studies at different locations and then can make recommendations on the basis of their results.

## Glossary

AEM | arithmetic ensemble mean |

BMA | Bayesian model averaging |

CC | coefficient of correlation |

CORDEX | COordinated Regional climate Downscaling Experiment |

FEM | fixed effect model |

GCISC | Global Change Impact Studies Centre |

GCM | global circulation/climate model |

IPCC | Intergovernmental Panel on Climate Change |

probability density function | |

PIP | posterior inclusion probability |

PMD | Pakistan Meteorological Department |

PPDF | predictive probability density function |

RCM | regional climate model |

REM | random effect model |

RMSE | root mean square error |

SD | standard deviation |

UK | the United Kingdom |