The parameters of the two-variable t-Student copula distribution for wind speed of two farms at different hours of the day in the fall season
1. Introduction
Nowadays, governments are setting ambitious goals for green and sustainable sources of energy. In the U.S., the penetration level of wind energy is expected to reach 20% by the year 2030 [1]. Several European countries already exhibit adoption levels in the range of 5%–20% of the entire annual demand. With further developments in solar cell technology and lower manufacturing costs, photovoltaic (PV) power is also expected to hold a larger share of electric power generation in the near future. Grid-connected PV is ranked as the fastest-growing power generation technology [2]. PV generates pollution-free and very cost-effective power which relies on a free and abundant source of energy.
Due to the increasing wind and solar penetration in power systems, the impact of system variability has been receiving increasing research focus from market participants, regulators, system operators and planners, with the aim of improving the controllability and predictability of the power available from these uncertain resources. The power produced by these resources is often treated as non-dispatchable and takes the highest priority in meeting demand, leaving conventional units to meet the remaining, or net, demand. This makes the optimum scheduling of power plants cumbersome, as it embeds stochastic parameters into the problem. The unpredictability, along with potential sudden changes in the net demand, may confront operators with technical challenges such as ramp-up and ramp-down adaptation and reserve requirement problems [3-4].
Several investigations aimed at handling the uncertain nature of wind and solar energy resources have been reported. Basically, the methods found in the literature can be classified into three groups: methods that deal with the prediction of uncertain variables as input data preprocessing, methods that use a stochastic scenario-based approach within the optimization procedure to cover the outcomes over the probable range of the uncertain variables, and methods based on a combination of these two approaches. The studies presented in [5-7] are among the most recent efforts in the first group. In [5-6], an Artificial Neural Network (ANN) forecast technique is employed, followed by risk analysis based on the error in the forecast data. The so-called preprocessed data is then taken directly as the input to the optimization process. Because they rely on forecast tools, such methods suffer from high inaccuracy or ex-ante underestimation of the available power, which increases the scheduled generation and reserve costs. Nevertheless, this approach is useful because it accounts, through time-series models, for the temporal correlation between the random variables representing each time step of the scheduling period. On the other hand, in [8-9], which belong to the second group, the focus is on stochastic scenario analysis rather than on forecasting methods. This approach has its own advantages, as it tries to model the likely range of values of the random variables. However, its efficiency largely depends on the accuracy and reliability of the probabilistic analysis on which the potential scenarios are built.
The most effective approach is associated with the third group, which combines the advantages of forecast techniques and the scenario-based optimization approach. Reference [10] presents a computational framework for integrating a numerical weather prediction (NWP) model into stochastic unit commitment/economic dispatch formulations that describe the wind power uncertainty. In [11], the importance of stochastic optimization tools is investigated from the viewpoint of the profit maximization of power generation companies, and the financial losses incurred because of wind speed forecast errors are discussed. A stochastic model is also presented in [12], which uses a heuristic optimization method for the reduction of random wind power scenarios; the wind speed data is assumed to follow the normal PDF. A similar approach is introduced in [13], where the wind speed error distribution is considered as a constant percentage of the forecasted data. In [14], the Auto-Regressive Moving Average (ARMA) time series model was chosen to estimate the wind speed volatility. Based on the model, the temporal correlation of wind speed at a time step with respect to the prior time steps is analyzed.
In this chapter, the authors present a framework for the stochastic modeling of random processes, including wind speed and solar irradiation, which are involved in power generation scheduling optimization problems. Based on a thorough statistical analysis of the accessible historical observations of the random variables, a set of scenarios representing the available level of wind and solar power for each time step of scheduling is estimated. To this aim, the Kernel Density Estimation (KDE) method is proposed to improve the accuracy in modeling the Probability Distribution Function (PDF) of the wind and solar random variables. In addition, the concept of aggregation of multi-area wind/solar farms is analyzed using the Copula method. Taking advantage of this method, we can reflect the interdependency and spatial correlation of the power generated by several wind farms or PV farms that are spread over different locations in the power system. A final framework is developed to perform the stochastic analysis of the random variables to be input into the stochastic optimization process, as discussed in the following sections.
2. Methodology of data processing
2.1. Probability distribution function and data sampling
In order to generate sample data for random variables, the random behavior should be simulated such that the model follows the pattern of the historical data as closely as possible. To specify the pattern of a random variable, its PDF should be obtained. There are two classes of methods to determine the PDF of a random variable: parametric and nonparametric methods [15]. In parametric methods, the data samples are fitted to one of the well-known standard PDFs (such as Normal, Beta, Weibull, etc.) so that the closest possible agreement between the PDF and the existing data is achieved. The values associated with the PDF parameters are evaluated using Goodness of Fit (GoF) methods such as the Kolmogorov-Smirnov test [16]. On the other hand, the nonparametric methods do not employ specific well-known PDF models.
The use of parametric methods has been reported in some studies that include the simulation of probabilistic models for wind and solar data, as in [6, 12]. Similarly, the authors in [9] employ a fixed experimental equation to represent the PDF of wind data. However, this approach to PDF estimation has the following drawbacks:
The parametric methods may deviate significantly from the actual distribution of the data, mainly because the actual distribution does not exhibit the symmetry underlying the standard PDFs. As an example, Figure 1 shows the distribution function for yearly solar irradiation sample data at 11 AM in a region. As seen in the figure, the parametric distribution fittings are not capable of modeling the right-side skewness of the actual distribution, which introduces considerable error into the resulting samples.
Some random variables in general, and solar irradiation and wind speed in particular, are very time-dependent in behavior. In other words, their patterns change with different time periods, months and seasons. Hence, the nonparametric approach is advantageous in terms of adaptation to the time period, because it does not assume a specific type of distribution. The parametric approach, in contrast, assigns a fixed type of PDF to each random phenomenon in all circumstances. For instance, it is common to associate a Weibull pattern with wind speed data, which may not be the most appropriate option to be generalized to all time periods.
Based on the aforementioned facts, in this study, it is desired to obtain the most accurate distribution model taking the advantage of Kernel Density Estimation (KDE), categorized as a nonparametric method.
2.2. Kernel Density Estimation (KDE)
The simplest and most frequently used nonparametric method is to use a histogram of historical samples. As a brief description of the method, the interval covering the range of samples is divided into equal sections called bins. For each bin, a sample value is considered as the kernel of that bin. A number of rectangular blocks equal to the number of samples in each bin, each with unity area, are located on each bin. In this way, a discrete curve is obtained that somewhat describes the probability distribution of samples. However, the overall curve is largely dependent on the size of the bins and their marginal points, because a change in the bin size changes the number of samples in each bin [17]. Besides, the obtained curve is highly ragged. Hence, the KDE method was introduced to overcome these drawbacks. In this method, considering the samples as the kernels of each bin, the blocks have a unity width and a height equal to the inverse of the number of samples for each sample value ($1/n$). Replacing the rectangular blocks with a smooth kernel function gives the kernel density estimate

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where $x_1, \dots, x_n$ are the samples, $K(\cdot)$ is the kernel function and $h > 0$ is the bandwidth (smoothing parameter).
In the present study, the KDE method is used to obtain the PDF of the seasonal wind speed and solar irradiation data for each hour of the day. The method is implemented using the
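As an illustration of the kernel estimator described in this section, the following minimal sketch evaluates a Gaussian-kernel density estimate with NumPy. The Gaussian kernel, the rule-of-thumb bandwidth and the synthetic Weibull "wind speed" samples are assumptions made for the example only; they are not the tool or the data used in the study.

```python
import numpy as np

def gaussian_kde_pdf(samples, grid, bandwidth=None):
    """Evaluate f_hat(x) = 1/(n*h) * sum_i K((x - x_i)/h) with a Gaussian kernel K."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumed default, not the chapter's choice)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One row per grid point, one column per sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
# Synthetic stand-in for the samples of one hour over a season (hypothetical data)
wind_speed = 8.0 * rng.weibull(2.0, size=270)
grid = np.linspace(-10.0, 40.0, 2001)
pdf = gaussian_kde_pdf(wind_speed, grid)

# A valid density is non-negative and integrates to about one
area = float(np.sum(pdf) * (grid[1] - grid[0]))
```

Because no distribution family is imposed, the same function can be applied unchanged to each hour and season, which is the adaptability argued for above.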
2.3. Correlation of random variables
Samples of a single random variable can be generated simply by Monte Carlo simulation of its PDF. However, this is more cumbersome for a group of random variables which may have underlying dependence or correlation. Neglecting the correlation results in an inaccurate multivariate PDF and, in turn, in unrepresentative samples.
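There are several coefficients that quantify the correlation among random variables, the best known of which is the Pearson coefficient:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$

where $\operatorname{cov}(X, Y)$ is the covariance of the random variables $X$ and $Y$, $\mu_X$ and $\mu_Y$ are their means, and $\sigma_X$ and $\sigma_Y$ are their standard deviations.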
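To make the Pearson coefficient concrete, the short sketch below computes it directly from samples and compares a linear relation with a monotone but nonlinear one. The data are arbitrary illustrative values, not measurements from the study.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson coefficient: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 2.0 * x + 1.0      # perfectly linear dependence
y_nonlinear = np.exp(x)       # perfectly monotone, but nonlinear, dependence

r_linear = pearson(x, y_linear)
r_nonlinear = pearson(x, y_nonlinear)
```

Although `y_nonlinear` is completely determined by `x`, its Pearson coefficient is below one, which illustrates why a linear coefficient alone may understate nonlinear dependence.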
In the problem under study, i.e. power system scheduling in the presence of uncertain renewables, we consider the presence of multiple wind farms and solar farms throughout the power system. The solar power and wind power as well as the load demand are three distinct stochastic processes. Each of them can be decomposed into 24 random variables representing the 24 hours of the day.
The random variables within each random process have their own temporal relation, which can be modeled by time series prediction methods [14]. However, there may also be spatial correlation among the random variables from different processes, although it might seem unlikely. Here, we deal with the tangible nonlinear correlation between the hourly random variables of a wind farm and those of another farm located in a nearby region, as well as between those of a PV farm and another one located in a nearby region. The interested reader may examine other possible dependence structures between random variables or processes. Obviously, taking these relations into account improves the accuracy of the models and solutions. Figure 4 presents how neglecting the correlations and directly using single-variable PDFs to generate samples for a multivariate process may lead to model malfunction. For two random variables with similar Lognormal distribution and Pearson correlation of 0.7 (with diagonal covariance matrix), 1000 samples have been generated considering total independence (Figure 4 (a)) and linear dependence (Figure 4 (b)). It is observed in Figure 4 (b) that X1 values tend to be closer to X2 values, especially in the upper range, in comparison with Figure 4 (a).
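The effect of ignoring dependence when sampling can be reproduced with a small experiment in the spirit of the two cases described above. This is only a sketch: the log-scale standard deviation is an assumed value, and the correlation of 0.7 is imposed on the underlying normal variables, so the Pearson coefficient of the resulting Lognormal samples comes out somewhat lower.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 1000
rho_normal = 0.7   # correlation of the underlying normal variables (assumed)
sigma = 0.5        # log-scale standard deviation (assumed)

# Case (a): the two Lognormal variables are sampled independently
indep = np.exp(sigma * rng.standard_normal((n_samples, 2)))

# Case (b): the variables are sampled jointly from correlated normals
cov = np.array([[1.0, rho_normal], [rho_normal, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_samples)
dep = np.exp(sigma * z)

r_indep = float(np.corrcoef(indep[:, 0], indep[:, 1])[0, 1])
r_dep = float(np.corrcoef(dep[:, 0], dep[:, 1])[0, 1])
```

Both sample sets have the same marginal distributions; only the joint behavior differs, which is what a scatter plot of `indep` against `dep` would show.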
In order to describe the correlations between random variables including nonlinear correlations, a method named Copula can be employed which is described in the following section.
2.3.1. Copula method
The correlation between random variables or samples is measured by the Copula concept. Embrechts & McNeil introduced Copula functions for application in financial risk and portfolio assessment problems [20]. Recently, much attention is being paid to this method in statistical modeling and simulation problems.
Copulas provide a way to generate distribution functions that model the correlated multivariate processes and describe the dependence structure between the components. The cumulative distribution function of a vector of random variables can be expressed in terms of marginal distribution functions of each component and a copula function.
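The basic idea behind the copula method is described by Sklar's theorem [21]. It shows that a multivariate cumulative distribution function (CDF) $F$ can be expressed in terms of a copula $C$, i.e. a multivariate distribution function with uniform marginals, evaluated at the marginal distribution functions $F_1(x_1), \dots, F_n(x_n)$:

$$F(x_1, \dots, x_n) = C\big(F_1(x_1), \dots, F_n(x_n)\big)$$

This equation can be rewritten to extract the Copula of the joint distribution function of the random variables, as follows:

$$C(u_1, \dots, u_n) = F\big(F_1^{-1}(u_1), \dots, F_n^{-1}(u_n)\big)$$

where $u_i = F_i(x_i) \in [0, 1]$ and $F_i^{-1}$ is the inverse CDF of the $i$-th random variable.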
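In terms of densities, the relation can be restated as:

$$f(x_1, \dots, x_n) = c\big(F_1(x_1), \dots, F_n(x_n)\big) \prod_{i=1}^{n} f_i(x_i)$$

where $f$ is the joint PDF, $f_i$ and $F_i$ are the marginal PDF and CDF of the $i$-th random variable, and $c$ is the copula density, i.e. the mixed partial derivative of the copula $C$ with respect to all of its arguments.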
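The separation of marginals and dependence stated by Sklar's theorem can be checked numerically. The sketch below maps each variable to the unit interval through its empirical CDF (rank-based pseudo-observations, a common practical choice assumed here) and shows that the result does not change when the marginals are changed by increasing transformations. The two "farm" series are synthetic.

```python
import numpy as np

def pseudo_observations(data):
    """Map each column to (0, 1) through its empirical CDF: u = rank / (n + 1)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n within each column
    return ranks / (n + 1.0)

rng = np.random.default_rng(2)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=500)

# Two hypothetical farms with different marginals but the same dependence structure
speeds = np.column_stack([7.0 * np.exp(0.4 * z[:, 0]), 6.0 + 2.0 * z[:, 1]])

u_from_speeds = pseudo_observations(speeds)
u_from_latent = pseudo_observations(z)
```

The arrays `u_from_speeds` and `u_from_latent` coincide: the copula sample carries only the dependence structure, while the marginal shapes are handled separately.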
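Various copula functions have been introduced to date. They are generally classified into explicit and implicit types. The implicit copulas are derived from standard distribution functions and have complicated expressions, whereas the explicit ones are simpler and do not follow specific distribution functions. Among the most widely used implicit copulas are the Gaussian copula and the t-Student copula; among the explicit ones, the Clayton copula and the Gumbel copula can be mentioned. The selection of the most appropriate copula is a complicated problem in itself. Here, the t-Student copula is employed because of its simplicity and flexibility. The t-Student copula is formulated as [21]:

$$C_{\nu, P}(u_1, \dots, u_n) = t_{\nu, P}\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_n)\big)$$

where $t_{\nu, P}$ is the CDF of the multivariate t-Student distribution with $\nu$ degrees of freedom and correlation matrix $P$, and $t_\nu^{-1}$ is the inverse CDF of the univariate t-Student distribution with $\nu$ degrees of freedom.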
In the current study, the authors employed the two-dimensional Copula method to represent the correlation of the wind speed patterns between two wind farms and the correlation of the solar irradiation patterns between two PV farms, for every hour of the day. The available data for three years are initially normalized:
where




0.14  3.7  0.21  7 
0.12  15.58  0.19  10 
0.08  7.39  0.13  11 
0.12  24.31  0.18  16 
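A possible way to draw correlated wind speed samples for two farms from a two-dimensional t-Student copula is sketched below. It relies on the standard construction of a bivariate t vector (correlated normals divided by the square root of a scaled chi-square variable). The values of `rho` and `nu`, the synthetic historical data and the use of empirical quantiles in place of the inverse KDE-based CDF are all assumptions of this example.

```python
import numpy as np
from scipy import stats

def sample_t_copula(rho, nu, n_samples, rng):
    """Draw uniform pairs (u1, u2) from a bivariate t-Student copula."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    w = rng.chisquare(nu, size=n_samples) / nu
    t_samples = z / np.sqrt(w)[:, None]        # bivariate t with nu degrees of freedom
    return stats.t.cdf(t_samples, df=nu)       # map each component to [0, 1]

def to_marginal(u, historical):
    """Map uniforms to physical values through the empirical quantiles of the data."""
    return np.quantile(historical, u)

rng = np.random.default_rng(3)
u = sample_t_copula(rho=0.2, nu=7.0, n_samples=2000, rng=rng)   # placeholder parameters

# Synthetic stand-ins for the historical samples of one hour at each farm
hist_farm1 = 8.0 * rng.weibull(2.0, size=270)
hist_farm2 = 7.0 * rng.weibull(2.2, size=270)

scenarios = np.column_stack([to_marginal(u[:, 0], hist_farm1),
                             to_marginal(u[:, 1], hist_farm2)])
u_corr = float(np.corrcoef(u[:, 0], u[:, 1])[0, 1])
```

Each row of `scenarios` is one joint realization for the two farms: the marginal behavior follows the historical data of each farm, while the dependence between the farms comes from the copula.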
2.4. Time-series prediction of wind speed and solar irradiation
As mentioned earlier, besides the spatial correlation among different farms, the wind speed and solar irradiation random variables assigned to the scheduling time steps exhibit temporal correlation, i.e., they depend on the values of the random variables at previous time steps (hours). In order to take the temporal correlation into account, time-series prediction models can be used. Here, for the purpose of day-ahead scheduling of the power system, an initial prediction of the random variables is performed using an ANN. Other forecast tools such as the ARMA model have been reported [6, 24], but the ANN is preferred due to its capability of reflecting nonlinear relations among the time-series samples and its better performance in long-term applications. Afterwards, the distribution of the forecast errors is analyzed to determine the confidence interval around the forecasted values for the potential wind speed and solar irradiation data on the scheduling day.
The forecast process is performed using two Multi-Layer Perceptron (MLP) neural networks [25], one for wind speed and one for solar irradiation. Each network is configured with three layers, including one hidden layer. The input is a 24-hour structure in which each element is a vector of 90 data samples (representing one hour of the day over the three months of a season). The available data is divided into three groups in the proportions 70%, 15% and 15% for the training, validation and test steps, respectively. The hourly data of wind speed and solar irradiation for the first farm are presented in Figure 7 and Figure 8, respectively. The plots of the forecast results along with the actual data for one week follow in Figure 9 and Figure 10.
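The data layout and the 70%/15%/15% division described above can be sketched as follows. Whether the split is random or chronological is not stated, so the random permutation used here is an assumption, and the seasonal data are synthetic placeholders.

```python
import numpy as np

def split_indices(n_items, percentages=(70, 15, 15), seed=0):
    """Randomly divide item indices into training, validation and test groups."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = (percentages[0] * n_items) // 100
    n_val = (percentages[1] * n_items) // 100
    # Whatever remains after training and validation goes to the test group
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

n_days, n_hours = 90, 24
rng = np.random.default_rng(4)
# Synthetic normalized data: one row per day of the season, one column per hour
season = rng.uniform(0.0, 1.0, size=(n_days, n_hours))

train_idx, val_idx, test_idx = split_indices(n_days)
x_train, x_val, x_test = season[train_idx], season[val_idx], season[test_idx]
```

With 90 days, integer arithmetic gives 63 training days and 13 validation days, and the remaining 14 days form the test group.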
2.5. Estimation of the confidence interval and risk analysis for wind speed and solar irradiation scenarios
From the power system planning viewpoint, the important aim is to reduce, as far as possible, the uncertainty and risk associated with generation and power supply. The risk is more crucial when the ex-ante planned generation is less than the ex-post actual generation. The error in the forecast data can be estimated with a level of confidence (LC) in order to determine a reliable level of generation to be considered in the planning stage. Here, the confidence interval method [26], known from risk assessment problems, is proposed as a constraint to specify a lower and an upper bound for the wind speed and solar irradiation scenarios. For example,
(
In order to calculate the
where
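One simple way to turn past forecast errors into a lower and an upper bound around a forecast, for a chosen level of confidence, is sketched below. It uses empirical quantiles of the errors, which is an assumption of this example and not necessarily the formula used in the study; the error samples and forecast values are synthetic.

```python
import numpy as np

def confidence_band(forecast, past_errors, level=0.95):
    """Bounds around a forecast from empirical quantiles of past errors (actual - forecast)."""
    alpha = 1.0 - level
    low_q, high_q = np.quantile(past_errors, [alpha / 2.0, 1.0 - alpha / 2.0])
    forecast = np.asarray(forecast, dtype=float)
    lower = np.clip(forecast + low_q, 0.0, None)   # wind speed and irradiation are non-negative
    upper = forecast + high_q
    return lower, upper

rng = np.random.default_rng(5)
past_errors = rng.normal(0.0, 1.2, size=500)   # hypothetical forecast errors
forecast = np.array([6.5, 7.1, 8.0, 7.4])      # hypothetical hourly forecasts

lower95, upper95 = confidence_band(forecast, past_errors, level=0.95)
lower90, upper90 = confidence_band(forecast, past_errors, level=0.90)
```

A higher level of confidence widens the band, so fewer realizations fall outside it at the price of a more conservative planning range.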
2.6. Scenario generation and reduction
The final step of data processing before performing the stochastic optimization procedure is scenario generation and reduction. This is the main distinction between the deterministic programming and the stochastic programming approaches. Deterministic programming deals with determined inputs and predefined parameters of the system model, whereas stochastic programming combines the assignment of optimum values to the control variables with the stochastic models of the existing random variables. The variables with stochastic behavior are represented by a set of scenarios reflecting the most probable situations that are likely to occur for them. Here, in the scenario generation step, a large set of random vector samples is generated from the obtained multivariate distribution function using Monte Carlo simulation. Then, since not all of the generated scenarios are useful and some of them may be similar to or correlated with other scenarios, scenario reduction techniques are applied to eliminate low-probability scenarios and merge similar ones, so as to extract a limited number of scenarios that still cover the whole probable region of the variables. Furthermore, scenario reduction increases the computational efficiency of the optimization process. The best-known methods for scenario reduction are the fast backward, fast forward/backward and fast forward methods [27-28]. The first step in scenario reduction is clustering, by which the scenarios that are close to each other are put in one cluster. In the following, we present a description of the fast backward method.
The backward method is initialized by selecting the scenarios that are candidates for elimination. The selection criterion is the minimum value of the product of each sample's probability and the probabilistic distance of that sample to the others. The probabilistic distance of each sample to the others is taken as the minimum distance of that sample to any of the other samples in the same set. In the next step, the same analysis is performed on the remaining scenarios; in this step, however, the products of the probabilities of the scenarios eliminated in the previous step and the distance of the current sample to the other samples are also included. This process continues until the probabilistic distance obtained in a step (iteration) is less than a predefined value, which serves as the convergence criterion. Mathematically speaking, the algorithm can be summarized as follows:
where
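The idea of backward reduction can be illustrated with the simplified sketch below. It removes one scenario at a time, choosing the one with the smallest product of probability and distance to its nearest remaining neighbor, and transfers its probability to that neighbor. This is a simplification assumed for illustration: it stops at a target number of scenarios instead of a distance threshold and omits the accumulated terms of the fast backward method; the initial scenarios are synthetic.

```python
import numpy as np

def backward_reduction(scenarios, probs, n_keep):
    """Simplified one-at-a-time backward scenario reduction."""
    scenarios = np.asarray(scenarios, dtype=float)
    probs = np.asarray(probs, dtype=float).copy()
    # Pairwise Euclidean distances; the diagonal is excluded from nearest-neighbor search
    dist = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    keep = list(range(len(scenarios)))
    while len(keep) > n_keep:
        sub = dist[np.ix_(keep, keep)]
        nearest = sub.argmin(axis=1)
        cost = probs[keep] * sub[np.arange(len(keep)), nearest]
        k = int(cost.argmin())                     # cheapest scenario to eliminate
        removed, receiver = keep[k], keep[int(nearest[k])]
        probs[receiver] += probs[removed]          # total probability is preserved
        probs[removed] = 0.0
        keep.pop(k)
    return scenarios[keep], probs[keep]

rng = np.random.default_rng(6)
initial = rng.normal(8.0, 2.0, size=(200, 24))     # 200 hypothetical 24-hour scenarios
p0 = np.full(200, 1.0 / 200)
reduced, p_reduced = backward_reduction(initial, p0, n_keep=10)
```

The reduced set keeps ten representative 24-hour profiles whose probabilities still sum to one.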
The power generation scheduling problem including units with uncertain and volatile characteristics is commonly treated as a stochastic optimization problem. In this approach, the objective function is evaluated for all final scenarios in each iteration of the optimization process, and the summation of these objective values commonly defines the overall objective function value to be optimized. As a comparison, the mean square error (MSE) of the calculated scenarios with respect to the real data is calculated in two cases. In the first case, the scenarios are obtained from the Copula distribution and within the calculated confidence interval, based on the proposed framework. In the second case, the scenarios are generated from the single-variable distributions without modeling the underlying temporal and spatial correlations. Table 2 shows that the error in the first case is less than that in the second case for the three days under study.



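Day    MSE, first case (Copula-based scenarios)    MSE, second case (single-variable distributions)
Day 1    1.4e-3    1.97e-3
Day 2    4.05e-3    5.99e-3
Day 3    3.31e-3    4.11e-3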
3. Conclusion
The natural characteristics of wind and solar energy impose uncertainty on their design and operation. Hence, considering various possible scenarios in the model of these resources can lead to more realistic decisions. The uncertain parameters are expressed by probability distributions, which show the range of values that a random variable could take and also account for the probability of occurrence of each value in the considered range. Therefore, the way the random processes are modeled in terms of their PDF is a significant problem. The possible spatial correlations have been addressed, and shown to be effective, using the Copula method. Similarly, the possible temporal correlations have been taken into account using a time-series analysis. In summary, the overall framework can be listed as follows:
1. Take the historical data of the random variables;
2. Normalize the data;
3. Calculate the PDF of each random variable using KDE;
4. Calculate the multivariate Copula PDF;
5. Forecast the data time series over the scheduling horizon;
6. Calculate the confidence interval of the potential scenarios for the vector of random variables, based on the error analysis of the forecast data;
7. Generate the initial set of scenarios within the confidence interval obtained from the previous step, based on the PDF of step 4;
8. Perform scenario reduction;
9. Perform stochastic optimization based on the final set of scenarios for each random process.
References
[1] "20% Wind Energy by 2030: Increasing Wind Energy's Contribution to U.S. Electricity Supply," Office of Energy Efficiency and Renewable Energy, DOE/GO-102008-2567, 2009.
[2] REN21 Steering Committee, Renewables Global Status Report 2009 Update [Online]. Available: http://sitedev.cci63.net/europe/actu/2009/0609/doc/energie_renouv2009_Update.pdf
[3] S. Abedi, et al., "Risk-Constrained Unit Commitment of Power System Incorporating PV and Wind Farms," ISRN Renewable Energy, vol. 2011, 2011.
[4] S. Abedi, et al., "A comprehensive method for optimal power management and design of hybrid RES-based autonomous energy systems," Renewable and Sustainable Energy Reviews, vol. 16, pp. 1577-1587, 2012.
[5] S. Abedi, et al., "Risk-Constrained Unit Commitment of Power System Incorporating PV and Wind Farms," ISRN Renewable Energy, vol. 2011, p. 8, 2011.
[6] K. Methaprayoon, et al., "An integration of ANN wind power estimation into unit commitment considering the forecasting uncertainty," Industry Applications, IEEE Transactions on, vol. 43, pp. 1441-1448, 2007.
[7] B. C. Ummels, et al., "Impacts of wind power on thermal generation unit commitment and dispatch," Energy Conversion, IEEE Transactions on, vol. 22, pp. 44-51, 2007.
[8] C. L. Chen, "Optimal wind–thermal generating unit commitment," Energy Conversion, IEEE Transactions on, vol. 23, pp. 273-280, 2008.
[9] M. A. Ortega-Vazquez and D. S. Kirschen, "Estimating the spinning reserve requirements in systems with significant wind power generation penetration," Power Systems, IEEE Transactions on, vol. 24, pp. 114-124, 2009.
[10] E. M. Constantinescu, et al., "A computational framework for uncertainty quantification and stochastic optimization in unit commitment with wind power generation," Power Systems, IEEE Transactions on, pp. 1-1, 2011.
[11] L. V. L. Abreu, et al., "Risk-Constrained Coordination of Cascaded Hydro Units With Variable Wind Power Generation," Sustainable Energy, IEEE Transactions on, vol. 3, pp. 359-368, 2012.
[12] V. S. Pappala, et al., "A stochastic model for the optimal operation of a wind-thermal power system," Power Systems, IEEE Transactions on, vol. 24, pp. 940-950, 2009.
[13] H. Siahkali and M. Vakilian, "Stochastic unit commitment of wind farms integrated in power system," Electric Power Systems Research, vol. 80, pp. 1006-1017, 2010.
[14] R. Billinton, et al., "Unit commitment risk analysis of wind integrated power systems," Power Systems, IEEE Transactions on, vol. 24, pp. 930-939, 2009.
[15] J. W. GOETHE, "Non-Parametric Statistics," 2009.
[16] G. Marsaglia, et al., "Evaluating Kolmogorov's distribution," Journal of Statistical Software, vol. 8, pp. 1-4, 2003.
[17] A. R. Mugdadi and E. Munthali, "Relative efficiency in kernel estimation of the distribution function," J. Statist. Res, vol. 15, pp. 579–605, 2003.
[18] R. D. Deveaux, "Applied Smoothing Techniques for Data Analysis," Technometrics, vol. 41, pp. 263-263, 1999.
[19] K. Sutiene and H. Pranevicius, "Copula Effect on Scenario Tree," International journal of applied mathematics, vol. 37, 2007.
[20] P. Embrechts, et al., "Correlation and dependence in risk management: properties and pitfalls," Risk management: value at risk and beyond, pp. 176-223, 2002.
[21] U. Cherubini, et al., Copula methods in finance, vol. 269: Wiley, 2004.
[22] D. Berg, "Copula goodness-of-fit testing: an overview and power comparison," The European Journal of Finance, vol. 15, pp. 675-701, 2009.
[23] M. Dorey and P. Joubert, "Modelling copulas: an overview," The Staple Inn Actuarial Society, 2005.
[24] G. Riahy and M. Abedi, "Short term wind speed forecasting for wind turbine applications using linear prediction method," Renewable Energy, vol. 33, pp. 35-41, 2008.
[25] M. T. Hagan, et al., Neural network design: PWS Boston, MA, 1996.
[26] D. Y. Hsu, et al., Spatial Error Analysis: A Unified, Application-oriented Treatment: IEEE Press, 1999.
[27] GAMS/SCENRED Documentation [Online]. Available: http://www.gams.com/docs/document.htm
[28] J. Dupačová, N. Gröwe-Kuska, and W. Römisch, "Scenario Reduction in Stochastic Programming: An Approach Using Probability Metrics," Math. Program, pp. 493–511, 2000.