A New Ensemble Probabilistic Method for Short-Term Photovoltaic Power Forecasting A New Ensemble Probabilistic Method for Short-Term Photovoltaic Power Forecasting

The high penetration of photovoltaic (PV) systems led to their growing impact on the planning and operation of actual distribution systems. However, the uncertainties due to the intermittent nature of solar energy complicate these tasks. Therefore, high-quality methods for forecasting the PV power are now essential, and many tools have been developed in order to provide useful and consistent forecasts. This chapter deals with probabilistic forecasting methods of PV system power, since they have recently drawn the attention of researchers as appropriate tools to cope with the unavoidable uncer- tainties of solar source. A new multi-model probabilistic ensemble is proposed; it properly combines a Bayesian-based and a quantile regression-based probabilistic method as individual predictors. Numerical applications based on actual irradiance data give evidence of the probabilistic performances of the proposed method in terms of both sharpness and calibration.


Introduction
Several kinds of distributed energy resources currently are involved in modern electrical distribution system development, allowing to enhance the overall system efficiency and to reduce overall greenhouse gas emissions. However, the integration of distributed energy resources into power networks is a challenging task in the view of their planning, management, and operation; thus, new research contributions are strongly encouraged in this area [1][2][3].
Photovoltaic (PV) and wind power plants are acknowledged to bring technical, environmental, and economic benefits to power systems, and their diffusion has straightforwardly grown during past years. Unfortunately, the intermittent and random nature of both solar and wind energy negatively affects the efficient, reliable and secure operation of electrical power systems. Then, accurate methods for forecasting wind and PV power generation, as well as appropriate measures to quantify the goodness of the previsions in both technical and economic terms, are mandatory. In particular, forecasting methods should be fitted to operate on different time horizons, as they are involved in several real-time, scheduling, and planning power system tasks; also, since electrical power has become a necessity with a specific and strongly variable value, the economic impact of forecasts cannot be neglected.
More in detail, with the deregulation of energy market in the 1990s and the widespread dissemination of renewable generation at the beginning of the twenty-first century, the use of performing tools for load and generation power forecasting is becoming more and more important from System Operators to Electric Utilities and Energy Traders, Independent Power Producers and Consumers [4][5][6][7][8]. Accurate forecasting of renewable generation also helps industrial customers/prosumers to better control their operational processes, thanks to demand response activities and electrical storage system use, and schedule and plan appropriate energy market bidding and maintenance strategies.
Forecasts are also needed when the solar/wind power producers do not participate directly in the markets, as is the case in some countries. For example, Italian wind/solar producers are not yet allowed to participate directly in the shorter time-period markets for deviations and adjustments (short-term balancing markets) and deliver instead their powers to the "Gestore dei Servizi Energetici" (GSE), the Italian state-owned company that promotes and supports renewable energy sources. GSE, in turn, sells the wind/solar power at the day-ahead market and, then, needs accurate forecasting tools for optimizing offers [9]. However, Italian Authority started a test program in June 2016 in order to allow renewable producers to participate also to dispatching markets [10] within the end of 2018. Thus, the requirements in terms of forecast performances will surely increase during this period. Many deterministic and probabilistic methods for forecasting wind and PV power have been proposed in the relevant literature [11][12][13][14][15]. The outputs of deterministic methods are single values of power, and no further information on the uncertainty of the prediction is provided. Nowadays, the development of probabilistic tools is strongly encouraged, since they completely address the unavoidable uncertainties related to wind and solar source; this facilitate the operators' decisions, especially in risk-related tasks such as electrical market bidding [16,17]. Two subcategories of probabilistic methods can be distinguished: the first is based on an underlying deterministic model and provides the uncertainty of the error usually expressed in terms of prediction intervals or quantiles, while the second is based on a direct approach that directly provides the predictive probabilistic representation of the wind or PV power (i.e. through a predictive probability density function or cumulative distribution function).
Probabilistic forecasts can be provided by either a single predictor or through a convenient combination of multiple deterministic or probabilistic predictors. The latters are known in the relevant literature as "ensemble forecasts" and are expected to perform better than each of the single predictors [13]. This chapter deals with the problem of direct probabilistic forecasting of PV power. A new multi-model ensemble forecasting method (MEM) is proposed and is used to properly combine two probabilistic base predictors. The base predictors are a Bayesian-based method (BM) and a quantile regression-based method (QM); both were successfully used for the forecasting of PV power in the relevant literature [18][19][20][21]. Numerical applications were performed to validate the method on the basis of actual solar measurements; the performances of the proposed method are quantified numerically in terms of a proper score [i.e. the pinball loss function (PLF)] and graphically through diagrams (PIT histograms and reliability diagrams), accounting for probabilistic reliability and probabilistic sharpness of forecasts.
The key result of this chapter is, then, the proposal of a new ensemble probabilistic method for PV power forecasting, based on the aggregation of two probabilistic methods. Outputs of the single base predictors were processed and combined through the application of a linear pool technique [22][23][24] based on the minimization of the PLF. The proposed method seems particularly useful for both forecasters and forecast users being the method characterized by good reliability and sharpness, for a wide range of short-term forecasting intervals.
The remainder of the chapter is organized as follows. Base predictors are briefly recalled in Section 2, and also the proposed MEM for PV power forecasting is shown in Section 2. Numerical applications are shown in Section 3, Section 4 provides our conclusions, and some definitions about the forecast properties are explained in Appendix.

Probabilistic forecasting methods
Various methods were proposed for the forecasting of PV power, and the relevant literature shows a wide range of papers dealing with this subject. The majority of the proposed methods are deterministic in nature even though recently a great attention was paid to the probabilistic forecast that is object of interest in this chapter.
Probabilistic forecast usually takes the form of a predictive probability density function and has the general goal of maximizing the sharpness of the predictive distribution, subject to calibration (also addressed as reliability) [25,26]. Sharpness is related to the concentration of the predictive probabilities and is an intrinsic property of the forecast alone, while calibration corresponds to the probabilistic correctness of the forecasts, i.e. refers to the statistical consistency between probabilistic forecasts and observations. Reliability can be assessed via reliability diagrams or probability integral transform (PIT) histograms. Sharpness, in the case of density forecasts for a real-value variable, can be assessed in terms of the associated prediction intervals (see Appendix). Proper scoring rules, 1 such as the continuous ranked probability 1 Given an observation of a random variable extracted from a distribution F, a score for this observation is defined "proper" if its maximum (or minimum depending on the nature of the score) value is obtained when the probabilistic forecast is the distribution F. The score is defined "strictly proper" if all of its values are lower (or higher) than its maximum (or minimum) when the probabilistic forecast is a distribution G≠F [27]. score (CRPS) and the PLF (Appendix), can be used for assessing both sharpness and calibration simultaneously.
In this section, we first briefly recall the BM and QM, as we use them as probabilistic base predictors in the new MEM; then, we present the MEM with extensive details.
In the following, we suppose that the forecaster performs the PV power forecast at time t ¼ h−k for the time horizon t ¼ h, with k being the lead time (1, 2, …, 24, … h) ( Figure 1).

The Bayesian forecasting method
The BM provides a predictive PDF of the PV power applying the Bayesian inference of a data set of past observations [18,19].
This method is based on a two-step procedure: Step 1-An analytic PDF is selected to model the randomness of PV power.
Step 2-The parameters of the PDF selected at Step 1 are predicted.
In the relevant literature [28], the normalized Beta distribution was suggested as an adequate PDF to characterize the solar irradiance random variable. When the PV generation system is equipped with a maximum power point tracker (MPPT), the output active power of the PV system at hour, P PVt , is a linear function of the total irradiance I β t (in kW/m 2 ) at hour t on a surface with an inclination β to the horizontal plane [28,29]: where S C is the surface area of the panel array (in m 2 ), and η is the total efficiency of the PV system. As the relationship (Eq. 1) is linear, also the PV power can be modelled through a Beta distribution; consequently, the normalized Beta distribution for the PV power at the time horizon t ¼ h is: where σ h , ϕ h are the shape parameters, P r is the maximum value of power produced by the PV installation (P h is therefore defined over the range [0, P r ]), and BðÁ, ÁÞ is the Beta function (see Appendix for its mathematical explanation). The maximum value of power P r produced by the PV installation is assumed known. The shape parameter ϕ h in Eq. (2) can be expressed as a function of the mean value m BMh and of the shape parameter σ h of the Beta PDF, i.e.: Then, a re-parameterization of the PDF (Eq. 2) in terms of m BM h , σ h leads to the knowledge of the predictive PDF if the mean value m BM h and the shape parameter σ h are known for each time horizon of interest.
The mean value m BM h can be estimated through a time series model, which links the mean value m BM h of PV power at hour h to the measurements of, respectively, the last U known values of PV power and of other exogenous inputs; for example, assuming the cloud cover cc and air temperature at, which are collected until the hour t ¼ h−k, as exogenous inputs, it is: where γ 0 , …, γ U , δ 1 , …, δ 2U are the 3U þ 1 coefficients of model. These coefficients can be estimated by solving a least square minimization problem in the forecasting training period, assuming the realizations of the random variable to be known.
Then, the remaining unknown shape parameter σ h in the PDF of PV power given by Eq. (2) is estimated in the Bayesian inference framework. Let: i. P M h, k be the vector the elements of which are the G measurements of PV power observed until the time, ii. pðσ h jz σ Þ be the assigned prior distribution of the unknown shape parameter σ h with z σ ¼ {z 1σ , …, z HPσ }, i.e. the vector of its hyper-parameters 2 ; and iii. pðP h jσ h Þ be the PDF (Eq. 2) in which the mean value m BMh is assigned and given by Eq. (4).
The posterior predictive distribution pðP h jP M h, k , z σ Þ of PV power, i.e. the desired forecasted probability density function, can be calculated through the total probability theorem as: 2 Prior parameter knowledge is expressed through corresponding prior distributions, whose parameters are usually called hyper-parameters, and they are denoted by an upper hyphen. The choice of prior distributions depends on the degree of prior confidence the forecaster puts into each parameter. Uninformative prior distributions are selected when no or few prior knowledge is available on the specific parameters; Jeffreys distribution was specifically proposed for such kind of Bayesian applications, but also uniform distributions or Normal distributions with large variance have been used. Instead, when a significant prior knowledge is available, informative prior distributions can be selected (e.g., Normal distribution with small variance) [30,31].
The Bayesian inference of PV power measurements P M h, k on the prior distribution pðσ h jz σ Þ allows the computation of the posterior distribution pðσ h jP M h, k , z σ Þ of the shape parameter σ h in Eq. (5), as follows: where pðP M h, k jσ h Þ is the likelihood function of the samples in P M h, k , given the shape parameter σ h , as follows: Unfortunately, since pðP h jσ h Þ is a Beta distribution, no conjugate prior distribution for σ h can be found analytically; then, Eq. (6) and therefore Eq. (5) cannot be calculated in closed form. However, the un-normalized posterior distribution qðσ h jP M h, k , z σ Þ of the scale parameter σ h can always be provided; it is given by: Once known the un-normalized posterior distribution qðσ h jP M h, k , z σ Þ given by Eq. (8), two sampling methods (i.e. the Metropolis-Hastings (MH) algorithm and the Gibbs algorithm [30,31]) are commonly used in the literature to obtain samples of the posterior distribution pðσ h jP M h, k , z σ ). The MH algorithm was used in the numerical applications of this chapter. Eventually, once samples from the posterior distribution are obtained, straightforwardly the samples of the searched posterior PDF pðP h jP M h, k , z σ Þ of PV power can be drawn.

The quantile regression forecasting method
The BM shown in the preceding subsection requires that an analytic PDF of PV power is selected in order to estimate the parameters of the selected PDF for the desired time horizon.
To avoid any assumptions on the density functions, one might restrict attention to estimating only a finite number of quantiles of the distribution [20,21,32]. These quantiles can be estimated using historical data in the QM framework. Indeed, the input of the model is the column vector of V explanatory variables y h ¼ {y h1 , :::, y hV } linked to the measurements of PV power and other meteorological quantities. 3 In the most general form, the α-quantile P ðαÞ h of PV power at the time horizon of the forecast t ¼ h can be estimated through a linear regression as: 3 In a regression model, the inputs are usually called "independent variables" and the output is usually called "dependent variable". However, the term "explanatory variables" is here preferred to the term "independent variables" in order to avoid confusion, as it is possible to have two (or more) input variables that are not independent on each other (e.g., some regression models use measurements of both temperature and squared temperature).
where β ðαÞ is a row vector of V coefficients to be estimated, and r ðαÞ h is a residual white noise at time t ¼ h. Starting from Eq. (9), the expected valueP ðαÞ h of P ðαÞ h is given by: and, then, the problem of quantile estimation reduces to find an estimationβ ðαÞ of the row vector β ðαÞ .
If the data set {P 1 , …, P D } of D past hourly measurements of PV power and the corresponding D vectors of explanatory variables y 1 , …, y D are available for the given forecasting training period,β ðαÞ can be obtained by solving the following minimization problem: where each value I d is calculated as: The solution of Eq. (11) can be found in the least square framework; also, an effective solution to problem (Eq. 11) was proposed in Ref. [20]. Once the selected Q quantiles of PV power are estimated, the predictive CDF can be obtained through linear interpolation.

The multi-model ensemble method
Pierre-Simon Laplace came up in 1818 with the idea of merging different forecasts into a new, combined forecast; obviously, the combined forecast should provide a mean error that is lower than the error of each constituent individual forecast. This is possible since individual forecasts usually contain some independent information that can be exploited in a convenient combination. Bates and Granger then followed up in 1969 with a paper that set standards for the combination of forecasts. After that, hundreds of relevant studies have been carried out and applied to a variety of fields of research, including economics, management, systematics, biomedicine, meteorology, and climatology [13,33,34].
Among the large number of possible ensemble methods, in this subsection, we propose a new multi-model competitive ensemble forecast method that consists in training the two probabilistic predictors shown in the previous subsections, and in processing their outputs in such a way to guarantee adequate sharpness and calibration of the final forecast.
The proposed method consists in three steps ( Figure 2): Step 1-Collection of the input data: available measurements must be collected and, if necessary, processed in order to form the data that are used as inputs.
Step 2-Selection and running of base predictors: the base probabilistic models are chosen on the basis of the forecaster's experience and the available input data sets; then, the models are utilized for a training period.
Step 3-Processing of the outputs of the single base predictors: the outputs of all of the base predictors are aggregated by applying adequate and robust criteria to obtain an ensemble forecast that operates better than any of the base predictors guaranteeing adequate levels of reliability and sharpness of forecasts.
The selection of the base predictors (Step 2) and the processing of the outputs of the single base predictors to obtain a good performing ensemble (Step 3) are obviously particularly critical.
In the proposed MEM, the chosen base predictors are the BM and QM of Sections 2.1 and 2.2. Indeed, since diversity is a key feature in competitive ensemble forecasting, it was experimented that these two predictors return usually different decisions, and then they could provide a good improvement in terms of their ensemble. With reference to Step 3, as shown in Ref. [22,24], the linear pooling of base probabilistic predictors can be suitable for the aggregation. Other approaches (e.g. in the Bayesian framework or through the logarithmic pooling [13,22,35]) were also considered in the relevant literature.
The linear pooling is performed on the cumulative distribution functions (CDFs) obtained through the base predictors, since they are easier to manage when different kinds of predictors have to be aggregated. The output of the MEM is the predictive CDF F MEM h, k ðP h Þ, i.e. the linear combination of two predictive CDFs F BM h, k ðP h Þ, F QM h, k ðP h Þ of the PV power for the horizon time h. It results in the following weighted sum: where the weights are conveniently selected in order to guarantee that the output function is indeed a CDF defined in the interval ½0, P r . To achieve that, each weight must be non-negative (w 1 , w 2 ≥0) and their sum must be unitary (w 1 þ w 2 ¼ 1). Note that the assumption of two base predictors was made with no loss of generality. Indeed, the same procedure could be applied for a larger number of predictors, i.e. the predictive CDF would be a weighted sum of N p base predictive CDFs with weights w ¼ {w 1 , …, w Np } and ∑ Np i¼1 w i ¼ 1.
The estimation of the weight(s) in the linear pooling MEM should be aimed to produce and ensemble forecast that performs better than any base predictors, maximizing the sharpness subject to calibration. In the relevant literature, weights were estimated by minimizing the CRPS in the forecasting training period [36,37]. Also, as LPE predictions may be overdispersed, 4 if single predictors are neutrally dispersed, some techniques shown in Ref. [22,39] (e.g. Beta-transformation or multi-objective procedures) may be applied to overcome this problem.
In this chapter, we use a further proper score, i.e. the PLF, which penalizes for observations lying far from a given quantile.
Thus, the estimation of weight w 1 in Eq. (13) is the solution of the following minimization problem:ŵ where PLFðP ðλjÞ d , P Ã d Þ is the PLF at the d th hour for the λ j -quantile, as defined in (A.6), and J is the total number of considered quantiles. The estimationŵ 2 of w 2 is then trivially obtained aŝ w 2 ¼ 1−ŵ 2 , for the assigned weight properties.

Numerical applications
Values of solar irradiance were measured from 1 January 2012 to 31 December 2013 at the National Renewable Energy Laboratory in USA (39.74°N lat., 105.18°W long.) [40]. Measurements were collected with a 1-min resolution but were then averaged and pre-processed in order to obtain 17544 hourly values with no outliers or bad data. Also measurements of cloud cover and air temperature were selected in order to be used as exogenous variables in the base predictors forecasting procedures. The rated power of the considered PV installation was set to P r = 110 kW.
Probabilistic forecasts were performed through the multi-model ensemble method presented in Section 2.3 in order to validate the usefulness of the procedure. Forecasts were performed for several lead times k, and results for k ¼ 24 hours (next day forecast) are initially shown with extensive details in this section; then, results for k ¼ 1 hour (next hour forecast) are also provided, but with less details, for the sake of conciseness. In particular, the PV power output was forecasted from February to December 2013 (11 months of forecasting), and results for May 2013 are shown below. In this case, the interval used to train the base predictor methods was made of eleven months (from May 2012 to March 2013), while the calibration of base predictors and the choice of weights of the multi-model ensemble method were performed in the following month (April 2013).
The proposed multi-model ensemble method is compared to both probabilistic base predictors and also to a benchmark based on the probabilistic persistence method (PPM) [41], in order to verify its usefulness. The comparison is performed numerically in terms of PLF and graphically through the inspection of PIT histograms [42] and reliability diagrams [43]. Also, the maximum deviation from perfect reliability (MDPR; see Appendix) is considered in order to compare different forecasts. In all numerical applications, night-time hours were not considered for forecast, as the total PV power output was set equal to zero, and therefore, they were not considered in indices and diagrams evaluation. Table 1 shows the results in terms of PLF and MDPR for BM, QM, MEM, and PPM, and Table 2 shows the corresponding estimated weights obtained in the MEM procedure for the next-day forecast. From the analysis of Table 1, the proposed method appears to improve the performances of base predictors. Indeed, the PLF decreases by about 2% and also the MDPR is reduced by about 2 and 1.4% with respect to the BM and QM, respectively, with a relative reduction of 50 and 40%, respectively. Base predictors and the proposed MEM outperformed the PPM benchmark in terms of PLF by about 22-23%. Also, even if the performances in terms of PLF are quite similar for BM and QM, they are weighted differently in the MEM, with a prevalence of QM as shown in Table 2.
Selected base predictors are acknowledged in the relevant literature as very competitive forecasting tools. Then, also a not impressive reduction in terms of PLF is a valuable contribution toward well-performing methods for PV power forecasting. Further improvements could be obtained by merging more base probabilistic predictors in the MEM.
We outline that MDPR gives a rough evaluation on the performances of probabilistic methods in terms of reliability, but it is important to evaluate also how the probabilistic method performs in each individual quantile. Thus, reliability diagrams for base predictors, MEM and PPM are shown in Figure 3, while PIT histograms show the relative frequencies of these methods in Figure 4. From the graphical inspection of Figure 3, base predictors appear to perform well in terms of reliability, especially for higher quantiles and with only a little deviation in lower quantiles. The MEM also shows good performances, as estimated coverages are very close to the ideal ones on overall. PPM instead appears to provide under-dispersed forecasts.
This trend is also confirmed by the inspection of Figure 4, as base predictors and MEM appear to be normally dispersed, i.e. a necessary condition for the overall reliability. The underdispersion trend seen in the PPM reliability diagram is confirmed in the graphical inspection of PPM PIT histogram. As a further comment on numerical simulations performed from February to December 2013, we note that the MEM was able to outperform both base predictors in terms of PLF in 9 months on 11; in the other 2 months, the results are only slightly worse. A slight prevalence of QM weight was observed with respect to the BM weight. In 4 months, the resulting ensemble forecast was affected by a slight over-dispersion, due to the normal-dispersion of base predictors; this will need some techniques to be developed in order to overcome this problem.
Finally, also simulations for different lead times were performed. In particular, the results for k ¼ 1 hour are particularly significant for comparison. In this case, only in 5 months the weight of BM was different from zero. This was due the different behaviour of BM and QM for nexthour forecasting, as the performances of QM were particularly better than BM in terms of PLF.
In three of these five months the proposed MEM led to a better forecast in terms of PLF than both base predictors, and only in one month the problem of over-dispersion of ensemble forecasts was detected.  As an example, Table 3 shows the results in terms of PLF and MDPR for BM, QM, MEM, and PPM, and Table 4 shows the corresponding estimated weights obtained in the MEM procedure during March 2013 for next-hour forecast. The PLF performance differences are very clear between BM and QM; however, the BM weight is still not negligible and the MEM has better performances than both predictors in terms of PLF, thus leading to a greater value of MDPR.

Conclusions
In this chapter, we dealt with the problem of short-term forecasting of PV power in electrical power systems. In the frame of the smart grid paradigm, the needing of accurate, reliable, and sharp probabilistic forecasting is particularly enhanced for industrial operators that are interested in optimally manage their grids and actively participating to liberalized electricity markets. As well-known, probabilistic forecasts can be obtained from single probabilistic predictors or from an ensemble of multiple probabilistic or deterministic predictors. The variability of information available in different predictors would likely contribute to produce forecasts that are better, or at least as good as one of the base predictors.
In this chapter, a proposal of a MEM based on BM and QM predictors was presented, and its effectiveness was evaluated through a large number of numerical applications based on actual irradiance data. The ensemble forecasts were constructed by the application of the linear pooling technique, through a minimization procedure that aims to minimize the PLF, that is a proper score. Results of the numerical applications proved the usefulness of the procedure on actual data, thus sometimes leading to a slight over-dispersion of resulting forecasts. Significantly, better forecasts were obtained for the next day while less significant performances were obtained in case of next hour forecast. The above problems will surely be addressed in new researches.

A.1.2. Reliability
Reliability is a property of the probabilistic forecast and of the realization. It involves the correspondence between estimated coverages and actual coverages.
Indeed, let us suppose that a 50% prediction interval is provided for a random variable; the forecast is therefore considered reliable if the observation of the random variable lies in that interval with probability 0.5 for the given time horizon.
The same property can be defined also for predictive quantiles; e.g. if the 0.5-quantile (median) is predicted for a given horizon time, the realizations should be equal or lower than the 0.5quantile in 50% of cases [20,21,32].
Reliability diagrams are very effective tools to evaluate the reliability of a probabilistic method [21,42,44]; they show the estimated coverage versus the nominal one, for various nominal coverage values (usually from 0.05 to 0.95, with a 0.05 step, or from 0.1 to 0.9 with a 0.1 step).
The estimated coverages can be found from a predictive distribution in a very intuitive manner. Let P ðλÞ h be the forecasted λ-quantile extracted from the forecasted distribution of the random variable at the desired time horizon h. The indicator I ðλÞ h is defined from the comparison between the actual value P Ã h and the forecasted quantile P ðλÞ h , as follows: and, consequently, the estimationλ of the actual coverage λ based on a set of N tot forecasts is: Obviously, the probabilistic forecasting method is considered reliable if the estimated coverages do not significantly differ from the nominal ones. A necessary condition for the probabilistic calibration is the normal dispersion of forecasts, and this results in a reliability curve that is close to the 45°diagonal line (representing the ideal reliability). Instead, over-dispersed forecasts (usually due to lack of sharpness) result in an inverse S-shaped reliability curve, while under-dispersed forecasts (usually due to too much sharpness) result in a S-shaped reliability curve. Biased forecasts are easily recognized, as the corresponding reliability diagrams strongly differ from perfect curve. Figure A1 shows examples of reliability diagrams for reliable, over-dispersed, under-dispersed and biased forecasts.
The MDPR is straightforwardly defined as the maximum error between estimated coverages and nominal coverages; i.e.: Also PIT histograms [25,42] can be used to empirically check the calibration of forecasts. In these histograms, the PIT values 5 are plotted: for a probabilistically calibrated forecast, the PIT histogram is statistically uniform. Even if the uniformity of PIT histograms is a necessary, but not sufficient condition for the forecast to be perfect [42], from the behaviour of PIT Figure A2. Examples of PIT histograms for calibrated, over-dispersed, under-dispersed and biased forecasts.   Figure A2 shows examples of PIT histograms for reliable, overdispersed, under-dispersed, and biased forecasts.
Anyway, formal tests of the hypothesis that a given forecasting method is probabilistically calibrated are also available, provided that these tests account for complex dependence structures. The reader can refer to the specialized literature to deepen this subject [21,25].

A.1.3. Proper scores
Probabilistic forecasts can be assessed numerically through the evaluation of proper scores [27]. Two of the most common and versatile proper scores are the PLF and the CRPS, simultaneously addressing both calibration and sharpness [27,45].
In practice, the CRPS compares the predictive distribution with the observation, both in terms of cumulative distribution functions. In particular, the CDF of the observation is a Heaviside function HðÁÞ centred in the observation P Ã h , and the CRPS probabilistically accounts for the error area between predictive and actual CDFs ( Figure A3).
Indeed, let HðP h −P Ã h Þ centred in P Ã h be the cumulative distribution function of the actual value of PV power andF h ðP h Þ be the predictive CDF at the time horizon h; hourly CRPS can be evaluated as follows: Figure A3. Graphical interpretation of continuous ranked probability score.
From the analysis of Eq. (A.4), it clearly appears that the CRPS is linked to the total area between the predictive CDF and the Heaviside function. It can be seen that the area (and, consequently, the CRPS h ) decreases as the predictive distribution approximates the step function. The calculation of the CRPS h will result in a value that has the units of the forecast variable. For a total number D of forecasts, the average CRPS is: and it can be interpreted as a probabilistic version of the mean absolute error [25].
The PLF is another widely used proper score [14,27,46]. As we defined P ðλÞ h to be the λ-quantile extracted from the predictive distribution of PV power at the desired time horizon h and P Ã h to be the corresponding actual value of PV power, the PLF is defined as follows: Summing up the PLFs across all considered quantiles and averaging them throughout the forecast horizon, the PLF of the corresponding probabilistic forecasts is obtained.