Pinball loss function and maximum deviation from perfect reliability of next-day forecast in May 2013.
The high penetration of photovoltaic (PV) systems led to their growing impact on the planning and operation of actual distribution systems. However, the uncertainties due to the intermittent nature of solar energy complicate these tasks. Therefore, high-quality methods for forecasting the PV power are now essential, and many tools have been developed in order to provide useful and consistent forecasts. This chapter deals with probabilistic forecasting methods of PV system power, since they have recently drawn the attention of researchers as appropriate tools to cope with the unavoidable uncertainties of solar source. A new multi-model probabilistic ensemble is proposed; it properly combines a Bayesian-based and a quantile regression-based probabilistic method as individual predictors. Numerical applications based on actual irradiance data give evidence of the probabilistic performances of the proposed method in terms of both sharpness and calibration.
- smart grids
- distributed power generation
- forecasting methods
- forecast uncertainty
- photovoltaic systems
Several kinds of distributed energy resources currently are involved in modern electrical distribution system development, allowing to enhance the overall system efficiency and to reduce overall greenhouse gas emissions. However, the integration of distributed energy resources into power networks is a challenging task in the view of their planning, management, and operation; thus, new research contributions are strongly encouraged in this area [1–3].
Photovoltaic (PV) and wind power plants are acknowledged to bring technical, environmental, and economic benefits to power systems, and their diffusion has straightforwardly grown during past years. Unfortunately, the intermittent and random nature of both solar and wind energy negatively affects the efficient, reliable and secure operation of electrical power systems. Then, accurate methods for forecasting wind and PV power generation, as well as appropriate measures to quantify the goodness of the previsions in both technical and economic terms, are mandatory. In particular, forecasting methods should be fitted to operate on different time horizons, as they are involved in several real-time, scheduling, and planning power system tasks; also, since electrical power has become a necessity with a specific and strongly variable value, the economic impact of forecasts cannot be neglected.
More in detail, with the deregulation of energy market in the 1990s and the widespread dissemination of renewable generation at the beginning of the twenty-first century, the use of performing tools for load and generation power forecasting is becoming more and more important from System Operators to Electric Utilities and Energy Traders, Independent Power Producers and Consumers [4–8]. Accurate forecasting of renewable generation also helps industrial customers/prosumers to better control their operational processes, thanks to demand response activities and electrical storage system use, and schedule and plan appropriate energy market bidding and maintenance strategies.
Forecasts are also needed when the solar/wind power producers do not participate directly in the markets, as is the case in some countries. For example, Italian wind/solar producers are not yet allowed to participate directly in the shorter time-period markets for deviations and adjustments (short-term balancing markets) and deliver instead their powers to the “Gestore dei Servizi Energetici” (GSE), the Italian state-owned company that promotes and supports renewable energy sources. GSE, in turn, sells the wind/solar power at the day-ahead market and, then, needs accurate forecasting tools for optimizing offers . However, Italian Authority started a test program in June 2016 in order to allow renewable producers to participate also to dispatching markets  within the end of 2018. Thus, the requirements in terms of forecast performances will surely increase during this period.
Many deterministic and probabilistic methods for forecasting wind and PV power have been proposed in the relevant literature [11–15]. The outputs of deterministic methods are single values of power, and no further information on the uncertainty of the prediction is provided. Nowadays, the development of probabilistic tools is strongly encouraged, since they completely address the unavoidable uncertainties related to wind and solar source; this facilitate the operators’ decisions, especially in risk-related tasks such as electrical market bidding [16, 17]. Two subcategories of probabilistic methods can be distinguished: the first is based on an underlying deterministic model and provides the uncertainty of the error usually expressed in terms of prediction intervals or quantiles, while the second is based on a direct approach that directly provides the predictive probabilistic representation of the wind or PV power (i.e. through a predictive probability density function or cumulative distribution function).
Probabilistic forecasts can be provided by either a single predictor or through a convenient combination of multiple deterministic or probabilistic predictors. The latters are known in the relevant literature as “ensemble forecasts” and are expected to perform better than each of the single predictors .
This chapter deals with the problem of direct probabilistic forecasting of PV power. A new multi-model ensemble forecasting method (MEM) is proposed and is used to properly combine two probabilistic base predictors. The base predictors are a Bayesian-based method (BM) and a quantile regression-based method (QM); both were successfully used for the forecasting of PV power in the relevant literature [18–21]. Numerical applications were performed to validate the method on the basis of actual solar measurements; the performances of the proposed method are quantified numerically in terms of a proper score [i.e. the pinball loss function (PLF)] and graphically through diagrams (PIT histograms and reliability diagrams), accounting for probabilistic reliability and probabilistic sharpness of forecasts.
The key result of this chapter is, then, the proposal of a new ensemble probabilistic method for PV power forecasting, based on the aggregation of two probabilistic methods. Outputs of the single base predictors were processed and combined through the application of a linear pool technique [22–24] based on the minimization of the PLF. The proposed method seems particularly useful for both forecasters and forecast users being the method characterized by good reliability and sharpness, for a wide range of short-term forecasting intervals.
The remainder of the chapter is organized as follows. Base predictors are briefly recalled in Section 2, and also the proposed MEM for PV power forecasting is shown in Section 2. Numerical applications are shown in Section 3, Section 4 provides our conclusions, and some definitions about the forecast properties are explained in Appendix.
2. Probabilistic forecasting methods
Various methods were proposed for the forecasting of PV power, and the relevant literature shows a wide range of papers dealing with this subject. The majority of the proposed methods are deterministic in nature even though recently a great attention was paid to the probabilistic forecast that is object of interest in this chapter.
Probabilistic forecast usually takes the form of a predictive probability density function and has the general goal of maximizing the sharpness of the predictive distribution, subject to calibration (also addressed as reliability) [25, 26]. Sharpness is related to the concentration of the predictive probabilities and is an intrinsic property of the forecast alone, while calibration corresponds to the probabilistic correctness of the forecasts, i.e. refers to the statistical consistency between probabilistic forecasts and observations. Reliability can be assessed via reliability diagrams or probability integral transform (PIT) histograms. Sharpness, in the case of density forecasts for a real-value variable, can be assessed in terms of the associated prediction intervals (see Appendix). Proper scoring rules, - such as the continuous ranked probability score (CRPS) and the PLF (Appendix), can be used for assessing both sharpness and calibration simultaneously.
In this section, we first briefly recall the BM and QM, as we use them as probabilistic base predictors in the new MEM; then, we present the MEM with extensive details.
In the following, we suppose that the forecaster performs the PV power forecast at time for the time horizon , with being the lead time (1, 2, …, 24, … h) (Figure 1).
2.1. The Bayesian forecasting method
This method is based on a two-step procedure:
Step 1—An analytic PDF is selected to model the randomness of PV power.
Step 2—The parameters of the PDF selected at Step 1 are predicted.
In the relevant literature , the normalized Beta distribution was suggested as an adequate PDF to characterize the solar irradiance random variable. When the PV generation system is equipped with a maximum power point tracker (MPPT), the output active power of the PV system at hour, is a linear function of the total irradiance (in kW/m2) at hour on a surface with an inclination to the horizontal plane [28, 29]:
where is the surface area of the panel array (in m2), and is the total efficiency of the PV system. As the relationship (Eq. 1) is linear, also the PV power can be modelled through a Beta distribution; consequently, the normalized Beta distribution for the PV power at the time horizon is:
where are the shape parameters, is the maximum value of power produced by the PV installation (is therefore defined over the range [0, ]), and is the Beta function (see Appendix for its mathematical explanation). The maximum value of power produced by the PV installation is assumed known. The shape parameter in Eq. (2) can be expressed as a function of the mean value and of the shape parameter of the Beta PDF, i.e.:
Then, a re-parameterization of the PDF (Eq. 2) in terms of leads to the knowledge of the predictive PDF if the mean value and the shape parameter are known for each time horizon of interest.
The mean value can be estimated through a time series model, which links the mean value of PV power at hour to the measurements of, respectively, the last known values of PV power and of other exogenous inputs; for example, assuming the cloud cover and air temperature , which are collected until the hour , as exogenous inputs, it is:
where are the coefficients of model. These coefficients can be estimated by solving a least square minimization problem in the forecasting training period, assuming the realizations of the random variable to be known.
Then, the remaining unknown shape parameter in the PDF of PV power given by Eq. (2) is estimated in the Bayesian inference framework.
be the vector the elements of which are the measurements of PV power observed until the time, , i.e. ;
be the assigned prior distribution of the unknown shape parameter with , i.e. the vector of its hyper-parameters -; and
The posterior predictive distribution of PV power, i.e. the desired forecasted probability density function, can be calculated through the total probability theorem as:
The Bayesian inference of PV power measurements on the prior distribution allows the computation of the posterior distribution of the shape parameter in Eq. (5), as follows:
where is the likelihood function of the samples in , given the shape parameter , as follows:
Unfortunately, since is a Beta distribution, no conjugate prior distribution for can be found analytically; then, Eq. (6) and therefore Eq. (5) cannot be calculated in closed form. However, the un-normalized posterior distribution of the scale parameter can always be provided; it is given by:
Once known the un-normalized posterior distribution given by Eq. (8), two sampling methods (i.e. the Metropolis-Hastings (MH) algorithm and the Gibbs algorithm [30, 31]) are commonly used in the literature to obtain samples of the posterior distribution ). The MH algorithm was used in the numerical applications of this chapter. Eventually, once samples from the posterior distribution are obtained, straightforwardly the samples of the searched posterior PDF of PV power can be drawn.
2.2. The quantile regression forecasting method
The BM shown in the preceding subsection requires that an analytic PDF of PV power is selected in order to estimate the parameters of the selected PDF for the desired time horizon. To avoid any assumptions on the density functions, one might restrict attention to estimating only a finite number of quantiles of the distribution [20, 21, 32]. These quantiles can be estimated using historical data in the QM framework. Indeed, the input of the model is the column vector of explanatory variables linked to the measurements of PV power and other meteorological quantities. - In the most general form, the -quantile of PV power at the time horizon of the forecast can be estimated through a linear regression as:
where is a row vector of coefficients to be estimated, and is a residual white noise at time . Starting from Eq. (9), the expected value of is given by:
and, then, the problem of quantile estimation reduces to find an estimation of the row vector .
If the data set of past hourly measurements of PV power and the corresponding vectors of explanatory variables are available for the given forecasting training period, can be obtained by solving the following minimization problem:
where each value is calculated as:
The solution of Eq. (11) can be found in the least square framework; also, an effective solution to problem (Eq. 11) was proposed in Ref. . Once the selected quantiles of PV power are estimated, the predictive CDF can be obtained through linear interpolation.
2.3. The multi-model ensemble method
Pierre-Simon Laplace came up in 1818 with the idea of merging different forecasts into a new, combined forecast; obviously, the combined forecast should provide a mean error that is lower than the error of each constituent individual forecast. This is possible since individual forecasts usually contain some independent information that can be exploited in a convenient combination. Bates and Granger then followed up in 1969 with a paper that set standards for the combination of forecasts. After that, hundreds of relevant studies have been carried out and applied to a variety of fields of research, including economics, management, systematics, biomedicine, meteorology, and climatology [13, 33, 34].
Among the large number of possible ensemble methods, in this subsection, we propose a new multi-model competitive ensemble forecast method that consists in training the two probabilistic predictors shown in the previous subsections, and in processing their outputs in such a way to guarantee adequate sharpness and calibration of the final forecast.
The proposed method consists in three steps (Figure 2):
Step 1—Collection of the input data: available measurements must be collected and, if necessary, processed in order to form the data that are used as inputs.
Step 2—Selection and running of base predictors: the base probabilistic models are chosen on the basis of the forecaster’s experience and the available input data sets; then, the models are utilized for a training period.
Step 3—Processing of the outputs of the single base predictors: the outputs of all of the base predictors are aggregated by applying adequate and robust criteria to obtain an ensemble forecast that operates better than any of the base predictors guaranteeing adequate levels of reliability and sharpness of forecasts.
The selection of the base predictors (Step 2) and the processing of the outputs of the single base predictors to obtain a good performing ensemble (Step 3) are obviously particularly critical.
In the proposed MEM, the chosen base predictors are the BM and QM of Sections 2.1 and 2.2. Indeed, since diversity is a key feature in competitive ensemble forecasting, it was experimented that these two predictors return usually different decisions, and then they could provide a good improvement in terms of their ensemble.
With reference to Step 3, as shown in Ref. [22, 24], the linear pooling of base probabilistic predictors can be suitable for the aggregation. Other approaches (e.g. in the Bayesian framework or through the logarithmic pooling [13, 22, 35]) were also considered in the relevant literature.
The linear pooling is performed on the cumulative distribution functions (CDFs) obtained through the base predictors, since they are easier to manage when different kinds of predictors have to be aggregated. The output of the MEM is the predictive CDF , i.e. the linear combination of two predictive CDFs , of the PV power for the horizon time . It results in the following weighted sum:
where the weights are conveniently selected in order to guarantee that the output function is indeed a CDF defined in the interval . To achieve that, each weight must be non-negative () and their sum must be unitary (). Note that the assumption of two base predictors was made with no loss of generality. Indeed, the same procedure could be applied for a larger number of predictors, i.e. the predictive CDF would be a weighted sum of base predictive CDFs with weights and .
The estimation of the weight(s) in the linear pooling MEM should be aimed to produce and ensemble forecast that performs better than any base predictors, maximizing the sharpness subject to calibration. In the relevant literature, weights were estimated by minimizing the CRPS in the forecasting training period [36, 37]. Also, as LPE predictions may be over-dispersed, - if single predictors are neutrally dispersed, some techniques shown in Ref. [22, 39] (e.g. Beta-transformation or multi-objective procedures) may be applied to overcome this problem.
In this chapter, we use a further proper score, i.e. the PLF, which penalizes for observations lying far from a given quantile.
Thus, the estimation of weight in Eq. (13) is the solution of the following minimization problem:
where is the PLF at the th hour for the -quantile, as defined in (A.6), and is the total number of considered quantiles. The estimation of is then trivially obtained as , for the assigned weight properties.
3. Numerical applications
Values of solar irradiance were measured from 1 January 2012 to 31 December 2013 at the National Renewable Energy Laboratory in USA (39.74° N lat., 105.18° W long.) . Measurements were collected with a 1-min resolution but were then averaged and pre-processed in order to obtain 17544 hourly values with no outliers or bad data. Also measurements of cloud cover and air temperature were selected in order to be used as exogenous variables in the base predictors forecasting procedures. The rated power of the considered PV installation was set to = 110 kW.
Probabilistic forecasts were performed through the multi-model ensemble method presented in Section 2.3 in order to validate the usefulness of the procedure. Forecasts were performed for several lead times , and results for hours (next day forecast) are initially shown with extensive details in this section; then, results for hour (next hour forecast) are also provided, but with less details, for the sake of conciseness. In particular, the PV power output was forecasted from February to December 2013 (11 months of forecasting), and results for May 2013 are shown below. In this case, the interval used to train the base predictor methods was made of eleven months (from May 2012 to March 2013), while the calibration of base predictors and the choice of weights of the multi-model ensemble method were performed in the following month (April 2013).
The proposed multi-model ensemble method is compared to both probabilistic base predictors and also to a benchmark based on the probabilistic persistence method (PPM) , in order to verify its usefulness. The comparison is performed numerically in terms of PLF and graphically through the inspection of PIT histograms  and reliability diagrams . Also, the maximum deviation from perfect reliability (MDPR; see Appendix) is considered in order to compare different forecasts. In all numerical applications, night-time hours were not considered for forecast, as the total PV power output was set equal to zero, and therefore, they were not considered in indices and diagrams evaluation.
Table 1 shows the results in terms of PLF and MDPR for BM, QM, MEM, and PPM, and Table 2 shows the corresponding estimated weights obtained in the MEM procedure for the next-day forecast. From the analysis of Table 1, the proposed method appears to improve the performances of base predictors. Indeed, the PLF decreases by about 2% and also the MDPR is reduced by about 2 and 1.4% with respect to the BM and QM, respectively, with a relative reduction of 50 and 40%, respectively. Base predictors and the proposed MEM outperformed the PPM benchmark in terms of PLF by about 22–23%. Also, even if the performances in terms of PLF are quite similar for BM and QM, they are weighted differently in the MEM, with a prevalence of QM as shown in Table 2.
|Estimated weight (–)||0.219||0.781|
Selected base predictors are acknowledged in the relevant literature as very competitive forecasting tools. Then, also a not impressive reduction in terms of PLF is a valuable contribution toward well-performing methods for PV power forecasting. Further improvements could be obtained by merging more base probabilistic predictors in the MEM.
We outline that MDPR gives a rough evaluation on the performances of probabilistic methods in terms of reliability, but it is important to evaluate also how the probabilistic method performs in each individual quantile. Thus, reliability diagrams for base predictors, MEM and PPM are shown in Figure 3, while PIT histograms show the relative frequencies of these methods in Figure 4. From the graphical inspection of Figure 3, base predictors appear to perform well in terms of reliability, especially for higher quantiles and with only a little deviation in lower quantiles. The MEM also shows good performances, as estimated coverages are very close to the ideal ones on overall. PPM instead appears to provide under-dispersed forecasts.
This trend is also confirmed by the inspection of Figure 4, as base predictors and MEM appear to be normally dispersed, i.e. a necessary condition for the overall reliability. The under-dispersion trend seen in the PPM reliability diagram is confirmed in the graphical inspection of PPM PIT histogram.
As a further comment on numerical simulations performed from February to December 2013, we note that the MEM was able to outperform both base predictors in terms of PLF in 9 months on 11; in the other 2 months, the results are only slightly worse. A slight prevalence of QM weight was observed with respect to the BM weight. In 4 months, the resulting ensemble forecast was affected by a slight over-dispersion, due to the normal-dispersion of base predictors; this will need some techniques to be developed in order to overcome this problem.
Finally, also simulations for different lead times were performed. In particular, the results for hour are particularly significant for comparison. In this case, only in 5 months the weight of BM was different from zero. This was due the different behaviour of BM and QM for next-hour forecasting, as the performances of QM were particularly better than BM in terms of PLF. In three of these five months the proposed MEM led to a better forecast in terms of PLF than both base predictors, and only in one month the problem of over-dispersion of ensemble forecasts was detected.
As an example, Table 3 shows the results in terms of PLF and MDPR for BM, QM, MEM, and PPM, and Table 4 shows the corresponding estimated weights obtained in the MEM procedure during March 2013 for next-hour forecast. The PLF performance differences are very clear between BM and QM; however, the BM weight is still not negligible and the MEM has better performances than both predictors in terms of PLF, thus leading to a greater value of MDPR.
|Estimated weight (–)||0.186||0.814|
In this chapter, we dealt with the problem of short-term forecasting of PV power in electrical power systems. In the frame of the smart grid paradigm, the needing of accurate, reliable, and sharp probabilistic forecasting is particularly enhanced for industrial operators that are interested in optimally manage their grids and actively participating to liberalized electricity markets. As well-known, probabilistic forecasts can be obtained from single probabilistic predictors or from an ensemble of multiple probabilistic or deterministic predictors. The variability of information available in different predictors would likely contribute to produce forecasts that are better, or at least as good as one of the base predictors.
In this chapter, a proposal of a MEM based on BM and QM predictors was presented, and its effectiveness was evaluated through a large number of numerical applications based on actual irradiance data. The ensemble forecasts were constructed by the application of the linear pooling technique, through a minimization procedure that aims to minimize the PLF, that is a proper score. Results of the numerical applications proved the usefulness of the procedure on actual data, thus sometimes leading to a slight over-dispersion of resulting forecasts. Significantly, better forecasts were obtained for the next day while less significant performances were obtained in case of next hour forecast. The above problems will surely be addressed in new researches.
A.1. Probabilistic forecast indices and properties
Two major requirements must be met simultaneously by all probabilistic forecasts, i.e. the forecasts must be sharp and also calibrated (or equivalently reliable) [25, 26], as defined at beginning of Section 2. Note that sharpness and reliability are not distinct one each other, as one property significantly influences the other, and vice versa.
Sharpness is a property of the forecast alone, as the realization of the random variable is not involved in its definition. Sharpness, in the case of forecasts for a real-value variable, can be easily assessed in terms of the associated prediction intervals. The narrower the intervals, the better is the forecast (if the corresponding coverage is, however, coherent, as shown in the following subsection ). Usually, 50, 90, 95 and 99% prediction intervals are considered for probabilistic forecasting. Prediction intervals can be easily extracted from a forecasted predictive distribution.
Reliability is a property of the probabilistic forecast and of the realization. It involves the correspondence between estimated coverages and actual coverages.
Indeed, let us suppose that a 50% prediction interval is provided for a random variable; the forecast is therefore considered reliable if the observation of the random variable lies in that interval with probability 0.5 for the given time horizon.
The same property can be defined also for predictive quantiles; e.g. if the 0.5-quantile (median) is predicted for a given horizon time, the realizations should be equal or lower than the 0.5-quantile in 50% of cases [20, 21, 32].
Reliability diagrams are very effective tools to evaluate the reliability of a probabilistic method [21, 42, 44]; they show the estimated coverage versus the nominal one, for various nominal coverage values (usually from 0.05 to 0.95, with a 0.05 step, or from 0.1 to 0.9 with a 0.1 step).
The estimated coverages can be found from a predictive distribution in a very intuitive manner. Let be the forecasted -quantile extracted from the forecasted distribution of the random variable at the desired time horizon . The indicator is defined from the comparison between the actual value and the forecasted quantile , as follows:
and, consequently, the estimation of the actual coverage based on a set of forecasts is:
Obviously, the probabilistic forecasting method is considered reliable if the estimated coverages do not significantly differ from the nominal ones. A necessary condition for the probabilistic calibration is the normal dispersion of forecasts, and this results in a reliability curve that is close to the 45° diagonal line (representing the ideal reliability). Instead, over-dispersed forecasts (usually due to lack of sharpness) result in an inverse S-shaped reliability curve, while under-dispersed forecasts (usually due to too much sharpness) result in a S-shaped reliability curve. Biased forecasts are easily recognized, as the corresponding reliability diagrams strongly differ from perfect curve. Figure A1 shows examples of reliability diagrams for reliable, over-dispersed, under-dispersed and biased forecasts.
The MDPR is straightforwardly defined as the maximum error between estimated coverages and nominal coverages; i.e.:
Also PIT histograms [25, 42] can be used to empirically check the calibration of forecasts. In these histograms, the PIT values - are plotted: for a probabilistically calibrated forecast, the PIT histogram is statistically uniform. Even if the uniformity of PIT histograms is a necessary, but not sufficient condition for the forecast to be perfect , from the behaviour of PIT histograms, can be derived useful information; in particular, U-shaped histograms indicate under-dispersed predictive distributions as well as inverse U-shaped histograms suggest that the predictive distributions are over-dispersed. Biased predictive distributions have a very irregular PIT histograms. Figure A2 shows examples of PIT histograms for reliable, over-dispersed, under-dispersed, and biased forecasts.
Anyway, formal tests of the hypothesis that a given forecasting method is probabilistically calibrated are also available, provided that these tests account for complex dependence structures. The reader can refer to the specialized literature to deepen this subject [21, 25].
A.1.3. Proper scores
Probabilistic forecasts can be assessed numerically through the evaluation of proper scores . Two of the most common and versatile proper scores are the PLF and the CRPS, simultaneously addressing both calibration and sharpness [27, 45].
In practice, the CRPS compares the predictive distribution with the observation, both in terms of cumulative distribution functions. In particular, the CDF of the observation is a Heaviside function centred in the observation , and the CRPS probabilistically accounts for the error area between predictive and actual CDFs (Figure A3).
Indeed, let centred in be the cumulative distribution function of the actual value of PV power and be the predictive CDF at the time horizon ; hourly CRPS can be evaluated as follows:
From the analysis of Eq. (A.4), it clearly appears that the CRPS is linked to the total area between the predictive CDF and the Heaviside function. It can be seen that the area (and, consequently, the ) decreases as the predictive distribution approximates the step function. The calculation of the will result in a value that has the units of the forecast variable. For a total number of forecasts, the average CRPS is:
and it can be interpreted as a probabilistic version of the mean absolute error .
The PLF is another widely used proper score [14, 27, 46]. As we defined to be the -quantile extracted from the predictive distribution of PV power at the desired time horizon and to be the corresponding actual value of PV power, the PLF is defined as follows:
Summing up the PLFs across all considered quantiles and averaging them throughout the forecast horizon, the PLF of the corresponding probabilistic forecasts is obtained.
A.2. Gamma and beta functions
The Gamma function of the complex number , with , is defined as follows:
The Beta function of two complex numbers , with , is defined as follows:
and its equivalent form in terms of Gamma function is:
- Given an observation of a random variable extracted from a distribution F, a score for this observation is defined “proper” if its maximum (or minimum depending on the nature of the score) value is obtained when the probabilistic forecast is the distribution F. The score is defined “strictly proper” if all of its values are lower (or higher) than its maximum (or minimum) when the probabilistic forecast is a distribution G ≠ F .
- Prior parameter knowledge is expressed through corresponding prior distributions, whose parameters are usually called hyper-parameters, and they are denoted by an upper hyphen. The choice of prior distributions depends on the degree of prior confidence the forecaster puts into each parameter. Uninformative prior distributions are selected when no or few prior knowledge is available on the specific parameters; Jeffreys distribution was specifically proposed for such kind of Bayesian applications, but also uniform distributions or Normal distributions with large variance have been used. Instead, when a significant prior knowledge is available, informative prior distributions can be selected (e.g., Normal distribution with small variance) [30, 31].
- In a regression model, the inputs are usually called “independent variables” and the output is usually called “dependent variable”. However, the term “explanatory variables” is here preferred to the term “independent variables” in order to avoid confusion, as it is possible to have two (or more) input variables that are not independent on each other (e.g., some regression models use measurements of both temperature and squared temperature).
- Neutral dispersion is a necessary but not sufficient condition for high-quality forecasts. Definitions of over-dispersion, neutral dispersion, and under-dispersion are provided in Definition 2.6 of Ref. .
- In a nutshell, the PIT is the value that the predictive CDF attains at the observation, with suitable adaptions at any points of discontinuity .