Risk-Constrained Forward Trading Optimization by Stochastic Approximate Dynamic Programming

Since the mid-twentieth century, Dynamic Programming (DP) has proved to be a flexible and powerful approach to address optimal decisions problems. Nevertheless, a decisive drawback of the conventional DP is the need for exploring the whole state space in order to find the optimal solution. The immense amount of mathematical operations involved to solve real-scale problems, constrained the application of DP to small or highly simplified cases. Indeed, state space grows exponentially with the number of variables when considering multivariate optimization. The curse of dimensionality is a well-known limitation of conventional DP algorithms for tackling large-scale problems ubiquitous in real science and engineering applications.


Introduction
Since the mid-twentieth century, Dynamic Programming (DP) has proved to be a flexible and powerful approach to address optimal decisions problems. Nevertheless, a decisive drawback of the conventional DP is the need for exploring the whole state space in order to find the optimal solution. The immense amount of mathematical operations involved to solve real-scale problems, constrained the application of DP to small or highly simplified cases. Indeed, state space grows exponentially with the number of variables when considering multivariate optimization. The curse of dimensionality is a well-known limitation of conventional DP algorithms for tackling large-scale problems ubiquitous in real science and engineering applications.
In the last decades, many new algorithms emerged in different branches of science to overcome the inherent limitations of conventional DP. Unlike conventional DP, these algorithms avoid enumerating and calculating every possible state of a system during the optimization process. Instead, they estimate relevant features of the state space. This approach circumvents the dimensionality limitations of the conventional DP while retaining many of its advantages.
In this chapter, the application of advanced stochastic dynamic programming techniques to the optimization of the forward sell strategy of a power generator subjected to delivery risk is considered. The proposed approach allows rebalancing the portfolio during the period of analysis. In electricity markets, a power generator can sell in advance part or all its future energy production at a fixed price, hedging against the high price volatility of the spot market. The strategy of eliminating the price risk by selling in advance the entire production in the forward market to a fixed price is often thought as the minimum-risk trading policy. Nonetheless, it can be proven that this is not the case for most generators. The outages of the generation units and transmission lines, as well as unforeseen limitations in the primary energy supply expose generators to delivery risk [1]. Delivery risk considerably modifies the probability distribution of profits, shifting the optimal trading strategy toward a portfolio mixing forward contracts and power sold in the spot market. Because of the size of the probability state space and the limited computing capabilities, the problem of the optimal trading strategy has not a closed form solution and thus, its determination is matter of current study. The increase in computing power and recent developments in Operational Research has brought new insights into the solution of such problems.
In the past decade and by virtue of the ever increasing computational power, many methods emerged in different scientific fields with several different names: Reinforced Learning, Q-Learning, Neuro-Dynamic Programming, etc. All these methods were later brought together in what is currently known as Approximated Dynamic Programming (ADP) [2], [3]. These algorithms resign the exhaustive enumeration and calculation of the space-state typically performed by conventional DP. Instead, they iteratively approximate a function of the space state through stochastic simulation and statistical regression techniques, circumventing the dimensionality problem of DP.
Although ADP algorithms are being used in several other fields of science, the application to design optimal trading strategies in power markets has not been proposed so far. In this chapter, ADP techniques are exploited to optimize the selling strategy of a power generator trading in a frictional market with transaction costs. Three available products are considered: selling in the spot market, and/or get involved in quarterly and one-year forward contracts. The objective of the generator is to maximize the expected profit while limiting financial risk. Decisions can be made only at the beginning of each month. At each decision stage, the current trading position can be changed at a cost in order to rebalance the portfolio.

Approximate dynamic programming
In the field of decision making, it is often useful to assess what could be expected from each possible decision given the information available. After evaluating the outcome of each alternative decision, a simple comparison is enough to take the optimal course of action. This approach is straightforward but also naive. Real problems often present simply too many possible options to evaluate. Moreover, if the problem involves sequential decision stages, the number of possible solution paths scales up exponentially. Finally, outcomes are frequently subjected to uncertainty. So, several outcomes could present themselves for each decision, augmenting further the size of the problem. Therefore, in order to keep the size of the problem within reach, shortcuts and simplifications areoften necessary. Dynamic Programming (DP) is a clever way to reduce the number of options based on Bellman´s Principle of Optimality: "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision" [4]. This leads to the fact that from any given state there is an optimal decision trajectory that once solved can be used for every other path containing that state. Therefore for each state is only necessary to hold the information of the best solution until the end, ruling out suboptimal options. This rule prevents the exponential growth in the decision sequence path, scaling the problem only linearly.
Yet, other reasons of explosive dimensionality growth remain, namely the number of states, as well as decision and outcome spaces. For small financial decision problems, conventional DP algorithms are able to find the optimal policy. For real-scale problems, gross simplifications are often necessary to keep tractability. Sometimes, these simplifications render the model unrealistic turning results meaningless.
Finding an appropriate combination of financial instruments in a portfolio can be portrayed as a Markov Decision Problem (MDP). A MDP is defined as a succession of states that are reached through decisions (actions) followed by an outcome (reaction) of the system. After each decision, the system evolves into another state, according probabilities defined by the previous state and the decision taken. Each transition between states has a cost, usually dependent on the action taken, and a reward produced by the reaction of the system, as depicted in Figure 1. In these processes, a decision maker should choose a sequence of actions so that the expected sum of rewards minus the sum of incurred costs is maximized over a certain period of time. According to the Bellman´s Principle of Optimality, for every MDP can be determined a series of Value Functions, which represents the continuation value of each state. The continuation value associated to a given state is the expected sum of rewards that the optimal strategy would yield from that state until the end of the process (or the expected average reward if the MDP is infinite).
It is easy to see how the value functions of a MDP can be found using a classic backwards DP algorithm. Starting from the final states, a DP algorithm exhaustively calculates the continuation value for a discrete number of states. All these continuation values, collected in a lookup table, constitute later the Value Functions which are accurate but not very compact or easy to calculate. After acquiring the Value Functions, it is simple to find an optimal decision for each state as the one that maximizes the sum of the expected reward and the expected continuation value of the next state or states. However, the problems that a DP algorithm can address by this procedure are restricted by the size of their state spaces.
Other forms to represent and/or approximate the Value Functions can then be proposed. These approaches should not require exhaustive calculation of every state. The Value Functions can be interpolated between computed states. Approximate Dynamic Programming algorithms are built on this cornerstone: approximation of the Value Functions in the state space domain. The estimation methods can be linear regressions, artificial neural networks, etc. Several authors make detailed analysis of MDPs and the use of ADP and DP algorithms to solve them [2], [3].
For approximating and updating the Value Functions, the proposed algorithm uses linear regression on Gaussian radial basis functions jointly with Monte Carlo simulations to consider randomness. An interior-point optimization algorithm is implemented to make decisions.
The ADP algorithm starts with a series of approximations of the value functions, usually constant. Then taking a Monte Carlo sample, a simulation of the system is conducted. At each decision stage, the algorithm makes a decision that is optimal regarding to the current state and the available approximations. Finally, after each Monte Carlo simulation, decisions and outcomes are used to refine the estimation of the Value Functions and a complementary Risk Functions, denoted by and respectively. The process continues iteratively, until a certain termination criterion is fulfilled. A simple diagram of this approach is illustrated inFigure 2.

Figure 1. Markov Decision Process depiction
According to the Bellman´s Principle of Optimality, for every MDP can be determined a series of Value Functions, which represents the continuation value of each state. The continuation value associated to a given state is the expected sum of rewards that the optimal strategy would yield from that state until the end of the process (or the expected average reward if the MDP is infinite).
It is easy to see how the value functions of a MDP can be found using a classic backwards DP algorithm. Starting from the final states, a DP algorithm exhaustively calculates the continuation value for a discrete number of states. All these continuation values, collected in a lookup table, constitute later the Value Functions which are accurate but not very compact or easy to calculate. After acquiring the Value Functions, it is simple to find an optimal decision for each state as the one that maximizes the sum of the expected reward and the expected continuation value of the next state or states. However, the problems that a DP algorithm can address by this procedure are restricted by the size of their state spaces.  [2], [3].
For approximating and updating the Value Functions, the proposed algorithm uses linear regression on Gaussian radial basis functions jointly with Monte Carlo simulations to consider randomness. An interior-point optimization algorithm is implemented to make decisions.
The ADP algorithm starts with a series of approximations of the value functions, usually constant. Then taking a Monte Carlo sample, a simulation of the system is conducted. At each decision stage, the algorithm makes a decision that is optimal regarding to the current state and the available approximations. Finally, after each Monte Carlo simulation, decisions and outcomes are used to refine the estimation of the Value Functions and complementary Risk Functions, denoted by V t and R t respectively. The process continues iteratively, until a certain termination criterion is fulfilled. A simple diagram of this approach is illustrated in Figure 2.

Regression technique
An approximation of a system involves finding a function that can predict the output values for a given set of inputs. In this case, the inputs are the state variables and the decision and the outputs are the continuation value and risk.
To approximate the value functions one could use several methods such as linear regressions, artificial neural networks, splines, etc. In the context of electricity trading, it will be discussed how to use linear regression with radial basis functions. The same principles and techniques apply to any regression based on linear parameters.
The main feature of the algorithm is that the approximation is used to make decisions while collecting new data to further improve the currently available approximation. Thus, it is necessary that the regressed function fitting the data can be readily updated as new data is simulated, making the improved approximations immediately available for the next simulation.

Regression technique
An approximation of a system involves finding a function that can predict the output values for a given set of inputs. In this case, the inputs are the state variables and the decision and the outputs are the continuation value and risk.
To approximate the value and risk functions one could use several methods such as linear regressions, artificial neural networks, splines, etc. In the context of electricity trading, it will be discussed how to use linear regression with radial basis functions. The same principles and techniques apply to any regression based on linear parameters.
The main feature of the algorithm is that the approximation is used to make decisions while collecting new data to further improve the currently available approximation. Thus, it is necessary that the regressed function fitting the data can be readily updated as new data is simulated, making the improved approximations immediately available for the next simulation. To fulfill this requirement, a recursive regression technique can be used.
Let be a set of inputs X and a set of collected data from outputs Y , The matrix of inputs X has n data points and g dimensions and the matrix of outputs Y has n data points and h dimensions. Then, the approximation function is: Where the kernel functions φ(X ) transform the input variables X into a k-dimensional space and are in general non-linear. Here, it is important to notice that despite the fact the φ functions are nonlinear, the approximation is still linear with respect the parameters θ. Therefore, the regression parameters can easily be found by solving a linear system of equations.
The parameter vector θ that minimizes the mean quadratic error MQE of the estimated outputs is: The dimension of matrices X , Y and Z increase as new points are simulated, and thus the estimated parameters change as new data is collected. On the other hand, the algorithm needs the approximation to get new data. One approach could be calculating the new parameters after a number of simulations with equation (4) but it soon reveals impractical as an increasing amount of data need to be stored. Instead, by storing only the matrix A and the parameter vector θ, the matrix itself can recursively be updated jointly with the parameters, following the process described in [2]: In the i-th iteration, a new data point x i , y i is simulated and the matrices X and Y become: Also the matrix Z becomes: Replacing these values in (4): Using the Sherman-Morrison formula: Where L is a k × k invertible matrix and u is a k-dimensional row vector: Dynamic Programming and Bayesian Inference, Concepts and Applications 96 ( ) 1 1 T  T  T  1  1  1  1  1  T  T  T  1  1  1  1  1  T  T  1  1 1 Replacing Naming Now, simplifying Taking common factor − ρ i A i−1 z i T of the last 3 terms: Finally, canceling out we get a formula to update the parameter vector: Note that in order to update the approximations, it is needed to store and update only the values of the matrix A and the parameter vector θ. The complete set of update formulas is then: By using the equations (20), (21) and (22), it is possible to update a regression while using it to take decisions within the ADP algorithm without explicitly storing the entire dataset and performing the matrix inversion after each Monte Carlo simulation. However, an important issue arises at the start of the recursive process, namely the starting values for both the parameter vector θ and the matrix A. One possibility is to carry first a number of simulations with random decisions, and then use the data collected to calculate the starting values for θ and A. Another option is to use a diagonal matrix: This option is usually simpler despite the fact that setting the absolute values of the elements requires care and it is not always clear in the literature. To set suitable values, a closer look at the regression formulas for matrix A is needed: The matrix A i is the inverse of a matrix sum whose diagonal elements are positive and increasing with i. That means that the absolute value of the diagonal elements of A would in general decrease as i increases. The relation factor between the starting value of all the elements a j, j and the corresponding expected value of (1 / z p, j ) 2 controls how new values modify the parameter vector θ.
Setting starting diagonal elements with a small δ value implies implies a large number of iterations i. This would mean that the starting parameter vector represents already a large amount of "fictional" data. In that case, any new real data sample will have little impact on the parameter vector, and the convergence rate of the approximation might be significantly slowed. On the other hand, setting initial diagonal values calculated with large δ enables the algorithm to large modifications on the parameter vector as new data is collected, which could cause instability of the parameter vector and in the decision computed, again reducing the overall convergence rate. In practice, the factor δ depends on the problem and it must be assessed accordingly. From extensive experimentation, values of δ between 1 and 1000 tend to work most of the time.
For initializing the parameter vector θ 0 , it is usual to set a constant vector. Values obtained from the formula Ŷ s = φ ( E(X )) ⋅ θ 0 produce coherent results. It is important to notice that the starting values of the parameter vector are important in cases where δ is small, while for a large δ the starting vector value is lost in the first iterations.

Kernel functions
The approximations are highly influenced by the type and parameterization of the kernel functions. These functions transform the original inputs x m , 1 ≤ m ≤ g into a k-dimensional space where a linear correlation between outputs and transformed inputs is more accurate.
There are several kernel functions, such as polynomial, trigonometric, logarithmic, radial basis, etc. The algorithm proposed in this work uses Gaussian radial basis functions, whose general formula is: where c j is 1xg vector called a center or centroid, with g being the number of input variables.
The functions measure the distance of each variable in the input state space to k centers and then transform each distance using a Gaussian function. The Euclidean norm is typically used to measure the distance. Nevertheless, any other norm can be considered.
The number and relative position of centers c j are important. The number of centers determines the dimension of the matrices in the linear regression, and also the smoothing and overfitting properties of the approximation. There must be enough centers and they must be placed to "cover" the entire state space, avoiding blanks in regions of interest while keeping the number of centers to a minimum. A random setting of the centers in the state space can serve in some cases. However, it is usually more appropriate to place the centers using techniques such as Latin Hyper Cube Sampling, as it is done in the implemented algorithm.

ADP and linear regressions in the context of distributed computing
In the context of ADP, a large amount of the calculation is attributed to the Monte Carlo simulations. Consequently, distributed computing techniques can be exploited to dramatically improve and speed up the optimization. The optimal point between thread opening and result gathering basically depends on the hardware system, network latency and simulation times. With time-consuming simulations, it is often useful to parallelize the program after each approximation update, leaving several threads gather more data in parallel before any update. But if the simulation time is short, the overhead times to distribute the calculation are usually larger than the speed up achieved through parallel simulation. In such cases, another approach should be used. A practical approach is to run parallel ADP algorithms for the same problem in several threads in order to combine and synchronize all approximations at regular time intervals. The combination of the results of two or more independent threads is based on the same mathematical basis as the recursive update, provided that the non-linear parametric functions are identical for all threads.
As shown in equation (4), the optimal parameter vector is: where each z i and y i represent the transformed inputs and the outputs respectively of the i-th simulation of a total of n data points. As i increases, the matrices B and C summarize all the Dynamic Programming and Bayesian Inference, Concepts and Applications regression data collected. The data simulated by several threads of ADP can be easily combined using these matrices. As parallel ADP threads a and b gather different sets of data, the parameter vectors that they approximate and update are different However, saving the matrices B and C allows to recombine all the data later as This result can be generalized to any number of parallel threads.
In general, data gathered from several parallel algorithms can be recombined by updating and saving the matrices B and C after each simulation i of each thread j. The update formulas are: After a fixed simulation time, they can be synchronized and recombined, so all the threads share the same data gathered: Finally, each thread can restart the simulation process using as starting values for θ and A 1 - The final approximation is not as good as the one carried out by a single thread for the same amount of simulations. This is due to the fact that the decisions taken by the single thread algorithm have always all the information gathered up to the decision point while the multithread algorithm lacks the information gathered by other threads since the last synchronization. Nevertheless, with the correct choice of synchronization and simulation cycles, the overall optimization process can run much faster.

Electricity markets
Since about two decades, the power industry had undergone a major restructuring in many countries. The former vertically-owned and centrally-planned electricity utilities have been unbundled in separate and independent business segments: the generation, transmission and distribution sectors. Unlike the latter two segments, which have remained as natural monopolies under regulation, power generation is now a business subject to competition in the open marketplace [5].
Electricity is a commodity with some very distinctive features. First, modern societies are exceedingly dependent on a continuous delivery of electrical power, placing a very high value to supply reliability. Because electrical energy cannot be economically stored in considerable amounts, production and consumption must be continuously in perfect balance. In addition, power demand is nearly price irresponsive (inelastic) in the short-run. Therefore, power prices often escalate to very high quotes (price spikes) when supply/demand conditions are tight. Most of these circumstances are short-lived, e.g. equipment outages, transmission congestion climatic events, etc., and price rapidly reverse to normal levels [6].
The exceptionally high volatility of electricity prices imposes high financial risks when trading electricity and forces generation companies to make decisions and commitments under high uncertainty. Thus, stochastic modeling and optimal decision making under uncertainty are key tasks in modern power trading and power risk management [7].

Trading power in spot markets
Currently, electricity is traded forward on bilateral negotiations and on centrally-run electronic platforms like power exchanges. Electricity can be traded in anticipation from one to three years (mostly OTC) to several months ahead. Shorter term forward markets negotiating electricity with delivery horizon of weeks, or even one day in advance in the so-called dayahead markets are quite common. Market liquidity is typically higher as the contracting horizon shortens.
Because of technical requirements of real time system balance, spot power markets are always centralized and run by entities in charge of the physical operation of the system to keep reliability and security. Equilibrium spot prices are computed each 5 min, 30 min or in hourly basis according to the realized power demand and the last bid accepted for keeping the system balance in real time. Rigorously, locational marginal prices are set in order to account for transmission constraints. Therefore, spot prices reflect the actual conditions of the system at the time of delivery.
Though spot prices are subject to high uncertainty, liquidity of this market is warranted as the generator can always sell its production at real time prices. For this reason, the spot market is regarded a last-resort market. One advantage of participating only in this market is that unavailability of the generating unit does not have financial consequences for the generator other than the opportunity cost of the lost production.
Let p S (t) the prevailing price at the t-th time interval in the spot market, P S (t) the power delivered by the generating unit and P max its generating capacity. Under the hypothesis the generator is price-taker and its marginal costs of generation, denoted by MC(t), are constant with the rate of production, the optimal operating policy is: The operating profit B S (t) the generator obtains in the spot market by implementing this optimal production policy can be written as: This equation shows the operating flexibility of generator to alter its output in response to the spot price in order to avoid operating losses if prevailing spot prices drop below marginal costs.   the optimal operating policy is described by the following equation: It is noteworthy to observe that the first term in equation is the probability of obtaining a zero profit in the spot market. Note also that E p S = MC only if MC = 0. By selling the production in the spot market, the generator never incurs in operating losses, i.e. Pr(b S < 0) = 0 as it can immediately stop production if p S < MC.
We consider now the more general case where generating units are unavailable, either for planned or unplanned reasons, during a fraction of the time. Let p the failure probability and q = 1 − p the probability of the unit being available, provided the failure and operating states are the only two mutually exclusive states in which the generator resides. We further assume that the price level and the state of the generator are statistically independent. Under these considerations, the generator cannot always capture de spread p S − MC and thus the probability of obtaining a positive profit will decrease accordingly. The expected operating profit under these conditions is then: The probability of having zero profit β 0 = Pr(b S = 0) is given by:  p f p dp q f p dp p q f p dp The resulting probability density function of the hourly operating profit is illustrated in Figure  4. Despite the high variance of the profits, the function clearly shows that the generator cannot lose money when participating in the spot market, even if the unit is technically unavailable. Figure 4. Probability density function of the operating profit in the spot market of electricity

Trading electricity in forward markets
Given the dramatic volatility of real time electricity prices, a major activity of power trading is structuring hedging strategies by means of tradable derivative instruments like future and option contracts [8]. A power company owning a set of generating units may decide either to sell electricity in advance at a fixed price in a forward market, or wait to the time of delivery and receive the spot price. Deciding on committing production forward or being exposed to volatility of real-time power prices has however a drastic impact on risk.
By selling forward its production, the generator may hedge against a sudden decline of electricity spot prices during the delivery horizon, thereby securing an operating margin. This hedging strategy isolates the generator from the price risk. However, the generator in exchange resigns the opportunity of selling electricity in the spot market if high prices happen.
Electricity markets are typically arranged under a two-settlement system. This approach preserves the economy and efficiency of the physical operation of the power system from any financial commitment the market players have entered into in the past. Under the twosettlement scheme, only deviations from contractual obligations are negotiated in the spot market.
The revenue from the forward contracting is given by the volume sold P F times the price p F agreed in the forward contract, i.e. R F = p F P F . On the other hand, the revenue captured by selling in the spot market is given by R S = p S ΔP = p S (P S − P F ). So, the total revenue R F from forward contracting and delivering power in the spot market is given as: We can observe the utility of forward contracts by inspecting this equation. If generator delivers in the spot market an amount equal to its contractual obligation, i.e. P S = P F , the total revenue is set equal to p F P F irrespective of the fluctuations of the spot price p S .
At the time of delivery, and assuming the generator is price-taker, the term P F (p F − p S ) is fixed and represents the profit of the forward contract against the spot market. Therefore, the profitmaximizing production policy is the same and given by the spot price, irrespective of the contractual obligations. The profit the generator can make by selling electricity in forward markets is given by the expression: In Figure 5A, a probability density function of the spot price is depicted. In the following, it is assumed that the forward market price is an unbiased estimator of the spot price at the time of delivery. Therefore, the condition p F = E(p S ) holds. In the forward contract, the generator makes a profit for unit of capacity b F = p F − MC, assuming p F > MC. Otherwise, the generator is better by avoiding entering into a forward obligation with negative profit. For realizing this profit, the generator must be able to deliver in the spot market the contracted volume in the exact amount. This profit level is achieved as long as the spot price exceeds the marginal cost, i.e. the probability of making this profit is Pr Graphically, this probability is represented by the dark grey area under the pdf of the spot price (cf. Figure 5A). The generator can make additional profits in the forward contract, b F = p F − p S > p F − MC, each time the spot price drops below the marginal cost, i.e. p S < MC. In fact, the generator is better buying replacement power in the spot market than incurring in fuel costs generating with its own facilities. Figure 5B illustrates the pdf of the profit of a forward contract. When compared with the profit distribution in the spot market (cf. Figure  5), it is easily noticeable the drastic reduction of the profit variance under forward contracting. The forward obligation sets a floor for profits, reducing dramatically dispersion of results and thereby the price risk. In exchange, the generator also foregoes the chance of profiting at times of high power prices in the spot market. The expected profit of a forward contract in terms of the pdf of the spot price f (p S ) can be expressed as: p MC f p dp p p f p dp p MC f p dp p f p dp It can be mathematically demonstrated that under rational expectations and efficiency of forward markets, i.e. the forward price is an unbiased estimator of the spot price p F = E( p S ), the condition E(b F ) = E(b S ) holds [1]. This means that for a risk-neutral generator both policies, either selling in the spot market or hedging in forward markets, are entirely equivalent. Nevertheless, for risk-averse players (which is the rule in real market settings), the hedged strategy is clearly preferred as profit expectation remain unaltered while price risk is eliminated.

Delivery risk in forward contracting
If we consider again unplanned outages of generating units, hedging price risk in the forward markets exposes generating companies to other class of risk, i.e. delivery risk, which also referred as quantity or volume risk. We further examine this important issue. When a generator under a contractual obligation is unable to deliver in the spot market the contracted amount, i.e. P S ≠ P F , the generator is forced to buy replacement power in the spot market at the prevailing price at that time. This may configure a very significant loss if while the generator is down the spot price is considerable higher than its own marginal costs, i.e. p S ≫ MC. Under this situation, the generator may be compelled to buy very expensive replacement power to honor the obligation, incurring in a potentially high financial loss. It is interesting to note that if when the unit is unavailable spot prices are lower or equal than the marginal cost, the generator can even make an extra profit b F = p F − p S > p F − MC. The probability density function of the forward position under consideration of positive failure probability and the associated delivery risk is illustrated in Figure 5C.
The expected profit can be computed as the expected value of the contract under the hypothesis of fully reliable unit times the probability of being available: q p MC f p dp p p f p dp E (51) As for modern units q ≅ 1, the change in the expected profit due to unit unavailability is typically negligible. However, downside risk increases substantially.
Assuming statistical independence between the unit´s failure and the level of spot prices, the probability of incurring in losses is given by: and the conditional expectation on the value of losses can be written as: The expected profit can be computed as the expected value of the contract under the hypothesis of fully reliable unit times the probability of being available: As for modern units 1 q  , the change in the expected profit due to unit unavailability is typically negligible. However, downside risk increases substantially.
Assuming statistical independence between the unit´s failure and the level of spot prices, the probability of incurring in losses is given by: and the conditional expectation on the value of losses can be written as: From the profit pdf of the forward contract f (b F ), the downside risk metrics, namely the value at risk (VaR) and the conditional value at risk (CVaR) for a δ confidence level can respectively be computed as: Dynamic Programming and Bayesian Inference, Concepts and Applications

Problem formulation
Let consider a small generation portfolio running in an electricity market. The power company owning the generation portfolio wants to determine the best-selling strategy of the energy production, which would maximize the expected profit while financial risk is constrained. Any trading strategy x is defined by the amount of energy to be sold in each different available selling instrument i in the electricity market, for example, the spot market, day-ahead obligations, annual forward contracts, etc.
It is important to notice that the trading strategy sets the amounts of energy committed in every forward instrument, but only estimates the amount of energy to be actually sold in the spot market. Indeed, actual energy production is stochastic and depends on technical availability of generating units and the spread between spot and fuel prices. Suppose that whatever trading strategy is decided now, it could be changed in future decision stages in order to rebalance the portfolio. Composition of the portfolio can be rebalanced only at a cost however, i.e. the transaction costs. The process is then a sequence of balancing decisions determined by the trading strategy, each followed by a stochastic reward according to the position taken in the market. The process is over after a number of periods n.
The optimization of the trading strategy can mathematically be formulated as a stochastic nonlinear problem involving the maximization of the expected profit accrued by the generation portfolio across all instruments i and time intervals t: , ,   The constraint (57) represents the hourly energy balance for all Monte Carlo samples and forces the generator to settle in the spot market differences between the energy sold in forward markets and actual production. Constraints (58) and (59) are introduced to avoid financial positions without physical counterpart, i.e. avoid the generator to take speculative positions by selling in forward markets more energy that the generation portfolio can produce. These constraints may be replaced by capital restrictions as regulations often allow financial trading without physical position. Finally, constraint (60) represents the limit to the financial risk associated to the selling strategy within each period t. In order to limit risk over the horizon time, the selected risk metric must be coherent [9] ensuring subadditivity. There are several downside risk measures that fulfill this requirement among which the Conditional Value at Risk (CVaR) is the most widely used (cf. equation 55).
The equation (61) represents the revenue generated by each forward contract and the revenue (or cost) due to selling (or buying) differences between energy sold in futures and the real generation in the spot market forced by restriction (57). The equation (62) represents the costs of the operating policy, which is independent of the forward obligation for a price taker as demonstrated in Section 3. Therefore, the three equations (57), (61) and (62) calculated for the whole Monte Carlo set represent then the profit calculated by equation (49) of Section 3. Finally the transaction costs in equation (63) are assumed to be the 3% of the total transacted value only for forward contracts, and not existent for the spot market.

Modeling stochastic spot and forward electricity prices
The problem formulation relies on Monte Carlo simulations to represent uncertainty on the future development of key variables. In addition, stochastic simulations are used to confront the algorithm with mapping scenarios for approximating both, the Value and the Risk functions.
A synthetic ensemble of 2000 annual realizations of hourly power prices in the spot market were generated by means of spectral representation techniques [10]. Forward prices and spot prices are not statistically independent. The forward prices corresponding to each spot price sample are calculated considering both, the expected value of the spot price and the mean value of each spot price time series. To simulate the changes in the forward prices accounting for the correlation to each sample of the spot prices a simple model is introduced.
Under perfect competition and rationality on the expectation formation, the price of a forward product should converge to the mean expected spot prices for the delivery period. This is relatively easy to calculate for the first hour of the simulated time. Assuming that the whole Monte Carlo set of spot prices was simulated taking the same price forecast as the market, the price of any forward should be the mean spot prices for the delivery period: Finally, a contango situation is considered in the forward market. A risk premium of 8% in excess of expected spot prices is considered for the forward prices in the future market to compensate for the volatility risk. This premium reduces linearly during the delivery period of the future and becomes zero at the last hour of delivery.
The model for future prices for each instrument i, hour h and sample w is thus as follows: The operation-failure cycles of the generating unit are obtained from a chronological Markovian stochastic simulation. For each spot price sample, a time series of power output is synthesized for every generation unit. The hourly power output is simulated o following three steps: 1. Based on failure and repair rates defined by the state the unit resided in the previous hour, a random failure is simulated [12]. If a failure is in place, the output power is set to zero for this particular hour.  This unit model accounts for the fact that peaking units exhibit higher availability rates. This result is explained by the fact the failure probability is typically very small when the unit is in the stand-by state. A generator is economically called online if its marginal cost of production is below the prevailing spot prices, following the decision model of equation (42). Variable costs of generation are assumed linear with power output, i.e. marginal costs are constant.
The operation-failure cycles of the generating unit are obtained from a chronological Markovian stochastic simulation. For each spot price sample, a time series of power output is synthesized for every generation unit. The hourly power output is simulated o following three steps: 1. Based on failure and repair rates defined by the state the unit resided in the previous hour, a random failure is simulated [12]. If a failure is in place, the output power is set to zero for this particular hour.

2.
The dispatch of the unit is simulated, taking into account the marginal cost of generation and the prevailing sample spot price at that time interval. Here perfect foresight of the spot price is assumed in order to decide the dispatch and fulfill the minimal generation times.

3.
If dispatched, other unit's technical restrictions are fulfilled, e.g. ramping capabilities. This chronological stochastic model reproduces with accuracy the dynamics involved in failure and repair cycles of generators, giving the possibility to select different failure rates depending on whether the unit is generating or is in stand-by.

Risk constraint formulation
As already mentioned before, the financial decision process can be modeled by means of a MDP. Naming profit B w for each sample the sum of income, cost and transaction cost over all instruments, the objective function (56) becomes: Exchanging the order of the expectation operator and the summation, and expanding the summation: where n is the total number of time periods considered.
At this point and considering the first term depends only on the initial market position x 0 and the first strategy decision x 1 , the maximization can be decomposed using the Bellman´s Principle of optimality as follows: If x is independent of the Monte Carlo sample, the terms inside the summation over all future periods (t = 2 … n) are simply the expected profit for the period t after a decision x t −1 → x t . However, the model should take into account that future strategy decisions may be different for each Monte Carlo sample, accounting for adjustments the decision-maker almost certainly would execute to face specific scenarios. Then, the expected profit B t and the decision itself for future decision stages will depend on a set of variables s t , which represent the variables considered by the decision maker in order to adapt the strategy to a particular situation and the equation (72) becomes: Defining the continuation value functions V t as: The maximization can be solved by a set of recursive maximizations, each one solving only one decision stage: With this model, the optimization can be decomposed in steps and the dynamic nature of a strategy can be accurately replicated. Despite the fact future trading decisions x t =2…n are considered and optimized, the practical product of this procedure is the new optimal rebalanced state x t =1 starting from the previous trading position x t =0 . The further trading positions are only optimal given the current information available and should be reconsidered later. Therefore, each new trading position (x t =2 , x t =3, …, x t =n ) should be the product of a similar optimization incorporating the additional market information available immediately before.
The value functions provide the expected continuation value within a state space defined by the state variables, x t , x t −1 and s t 3 . However, the continuation functions, which are essential to solve the optimization problem, are unknown beforehand. It is here that the ADP approach is introduced to approximate the value and risk functions for the state space.
The set of constraints remains the same as they were already defined for each period t. Nevertheless, special considerations should be made to calculate de risk and to fulfill the riskconstrained optimization. As V t only account for the expected profit, risk functions R t should also similarly be calculated for the same state space in order to enforce the risk constraint.
By definition, the linear regression will deliver an approximation that minimizes the mean square error on the entire dataset. In a stochastic setting where the same inputs leads to several different outputs, the regression will accurately estimate the expected value for a given set of inputs, provided the sample size and the approximation order are appropriate. This fits perfect for the case of the value function but for the approximation of the risk function some problems arise.
Let suppose that the CVaR is chosen as risk metric. As the algorithm progresses, new data points are collected, i.e. a set of state variables and its corresponding simulated profits for the period. For approximating the CVaR associated to a particular point in the state space, one approach is to select a subset from the dataset whose input variables are "close" to the point. Then the CVaR is calculated first by sorting the profit values and then taking the mean of those below the specified α-quantile. Now, let suppose that a new data point is simulated and an update of the CVaR approximation is needed. To do so, the process described must be repeated, but now including the new data point. This simple approach has large disadvantages: all the data points must be stored and the mean is not easily updated as old data may be excluded or included of the zone below the α-quantile. These drawbacks are caused by the fact that the CVaR is quantile-based. To solve these difficulties, another solution is envisioned. Instead of approximating directly the CVaR, another risk measure is used to approximate the CVaR within the space state. The risk measure used is called Relative Lower Semideviation (RLS) and it is moment based instead. Hence, it can be updated more easily and without needing to store the entire dataset. These types of risk measures are described in detail in [13][14][15][16] and for a stochastic profit P t have the form: where the equation (77) is the negative semideviation of degree p of the stochastic profit P t .
It can be proven that these moment-based risk measures are coherent if 0 ≤ a ≤ 1 and p ≥ 1. To approximate a CVaR with a 5%-quantile, in this work the parameters used are a = 1 and p = 9.5. To compute the approximation, two linear regressions were used to calculate the expectations on the profit and on the negative deviations, which can be updated using the same method proposed for the value functions. In a Monte Carlo scheme, a large amount of data is needed to compute reliable risk estimations. Therefore, the risk constraint should not be enforced at the early iterations of the ADP algorithm, simply because the sample size of the dataset is small to get a consistent and statistically converged risk value.

Algorithm validation
With the objective of validating the results of the proposed ADP algorithm, a first simple exemplary case is considered in which a thermal generator sells energy in the spot market and in a future contract. The results of the ADP algorithm were compared with the results of a conventional DP algorithm for which the space-state was discretized appropriately.
The fractions of energy sold in the spot market and in a quarter future contract were optimized considering that the future can be traded during the delivery period. The optimization determines three decision stages during this period, one at the beginning of each month, consisting on sell or buy energy in the future market based on the previous state. The spacestate previous to each decision is defined only by the level of future already sold, in order to keep tractable the DP problem.

Validation results
The results obtained by solving the problem by means of the ADP and DP algorithms are presented in Figure 7. The plots represent the expected profit and the downside risk measured by the CVaR of the optimal strategy as a function of the initial state, i.e. the energy already committed in the future contract at the initial stage. An excellent agreement between the optimal strategies obtained by ADP and DP is evidenced, validating the proposed approach.
It can be noticed that the expected profit rises as the amount of energy sold forward increases. This is caused by the risk premium paid to the generator in the future market, i.e. the mean future prices are higher than the mean spot prices. Additionally, the transaction costs are not compensated by the risk premium; therefore the best trading strategy is to maintain similar involvement in the future market to the initial level without rebalancing the portfolio. This is illustrated in Figure 8, where the optimal decisions for the first month are practically the same when solving with a conventional DP and an ADP approach.
It is noteworthy to observe in Figure 7 that financial risk lessens when forward contracting in the future market increases. This means that for the conventional generator considered, which present a high availability, the delivery risk is lower than the risk of not being dispatched in the spot market. The behavior of the risk curve is closely related to the unit's failure and reparation rates and to the marginal production costs. Generators with low marginal costs are in the first places of the dispatch merit order, and hence the risk of not being dispatched is low. Moreover, high failure rates imply also a higher delivery risk. Out of these relations arise a broad number of risk curves that differ from one generator to another and suggest that considerable risk mitigation by aggregating different generators in a portfolio is possible. Dynamic Programming and Bayesian Inference, Concepts and Applications 118

Optimal policy for portfolio rebalancing
For the base study case, a slightly more complex system was examined. In this case, the generation portfolio comprises five generation units of 2 MW each with a constant marginal generation cost of 50$/MWh, failure rate of 1/950h -1 , reserve failure rate of 1/9950 h -1 and repair rate of 1/50 h -1 . These rates give a failure probability of 5% while the unit is in operation and of 0.5% while in standby.
An annual future, four quarterly futures and the spot market were considered together with 12 monthly trading decision stages for rebalancing the portfolio. The space state for this arrangement is defined by the amount sold forward as well as spot and future prices before each decision point and for each realization. Therefore, the decisions are chosen taking into account additional information about the state of the market for each realization. Even though the simplicity of the example, the additional variables cause a drastic increase in the dimension of the problem and would force an unacceptable coarse discretization of the space-state in order to keep the problem tractable with conventional DP. In this case, the maximum admissible expected loss for the 5%-confidence level is set to CVaR 5% =$20000 for each decision stage. Since the risk measure is coherent, i.e. sub-additive, the annual risk is less or equal to $20000•12 periods=$240000. Transaction costs are set to 3% of the dollar amount contracted in the forward market. The optimization problem was solved on a Beowulf cluster comprising 20 multicore Intel i7 2600K 3.4 GHz processors connected by a Gbit LAN. The 160 available computation nodes were fully exploited and the total computation time was 5 h.
In Figure 9, the results of the ADP algorithm for the optimal strategy on the first rebalancing period with a previous trade equal to zero are illustrated. The amount of energy to be committed in each market is expressed in terms of a fraction of the maximal energy output the generation portfolio would generate without failures in the period, i.e. 10MWh per hour of operation. The prices for the traded futures are also presented in Figure 9 except for the 2 nd quarter future which is 45.63$/MWh and it is not shown as the optimal trade does not include this contract, presumably because the price is too low and it is better to wait for a better price in the spot market and sell in subsequent decisions. Likewise, the expected spot prices for each quarter are displayed, except for the last quarter which is 52.45$/MWh. Finally, the expected annual profits without considering fixed costs and the risk estimated by the RLS for the first rebalancing period are shown. Note that the expected profit is calculated considering that the following trading decisions are made taking into account the particular sample price realization, capturing the adaptation to the market developments. Thus, the rebalancing decisions for the second up to the last period are not unique.

Sensitivity to unit availability
In order to investigate the sensitivity of the optimal trading strategy to the unit availability, a second case was considered. Under these conditions, all the parameters are identical to the described base case, except for the failure and repair rates. Operation failure rate was set to 1/850h -1 , a reserve failure rate to 1/9850 h -1 and repair rate of 1/150h -1 . With these rates, the failure probability is 15% in operation and 1.5% in standby. 25 would generate without failures in the period, i.e. 10MWh per hour of operation. The prices for the traded futures are also presented in Figure 9 except for the 2 nd quarter future which is 45.63$/MWh and it is not shown as the optimal trade does not include this contract, presumably because the price is too low and it is better to wait for a better price in the spot market and sell in subsequent decisions. Likewise, the expected spot prices for each quarter are displayed, except for the last quarter which is 52.45$/MWh. Finally, the expected annual profits without considering fixed costs and the risk estimated by the RLS for the first rebalancing period are shown. Note that the expected profit is calculated considering that the following trading decisions are made taking into account the particular sample price realization, capturing the adaptation to the market developments. Thus, the rebalancing decisions for the second up to the last period are not unique. Figure 9. Optimal trading strategy for a 5x2MW generation portfolio

Sensitivity to unit availability
In order to investigate the sensitivity of the optimal trading strategy to the unit availability, a second case was considered. Under these conditions, all the parameters are identical to the described base case, except for the failure and repair rates. Operation failure rate was set to 1/850h -1 , a reserve failure rate to 1/9850 h -1 and repair rate of 1/150h -1 . With these rates, the failure probability is 15% in operation and 1.5% in standby.
In Figure 10, the optimal trading strategy delivered by the ADP algorithm for the case of decreased unit availability is depicted. The differences on the sell strategy are evident for the annual future and the total amount of energy left to be sold in the spot market. Because of units have more frequent and longer random failures, a long-term commitment is avoided. However, the high prices for the fourth quarter push the sell in future markets up to 100%. In comparison to the base case results, the expected annual profit is slightly lower. The risk for the first month is irrelevant, because the forward commitment is around 50% and the probability of having more than two units (> 40%) unavailable is very low, leading to almost all unit failures can be covered by the remaining available units.  In Figure 10, the optimal trading strategy delivered by the ADP algorithm for the case of decreased unit availability is depicted. The differences on the sell strategy are evident for the annual future and the total amount of energy left to be sold in the spot market. Because of units have more frequent and longer random failures, a long-term commitment is avoided. However, the high prices for the fourth quarter push the sell in future markets up to 100%. In comparison to the base case results, the expected annual profit is slightly lower. The risk for the first month is irrelevant, because the forward commitment is around 50% and the probability of having more than two units (> 40%) unavailable is very low, leading to almost all unit failures can be covered by the remaining available units.

Sensitivity to transaction costs
The third case of study is identical to the base case but considering a transaction cost of 7% of the sold amount instead of 3%. The optimal trading policy for the case of increased transaction costs are shown in Figure 11. In this case, the generator sells only 1.32% of its capacity in an annual future contract due to the irreversibility introduced by the higher transactions costs and the larger contracting volume. Under these circumstances, the risk premium offered in the annual future price render insufficient for attracting the generator to enter in such a long-lasting commitment. On the other hand, the sell volume in quarterly futures is higher than in the base case, staying between 62% and 76% in a rather static trading policy. Under higher transaction fees, it is desirable to be able to rebalance the portfolio in future stages with smaller changes and hence smaller transaction costs. Expectedly, the

Sensitivity to transaction costs
The third case of study is identical to the base case but considering a transaction cost of 7% of the sold amount instead of 3%. The optimal trading policy for the case of increased transaction costs are shown in Figure 11. In this case, the generator sells only 1.32% of its capacity in an annual future contract due to the irreversibility introduced by the higher transactions costs and the larger contracting volume. Under these circumstances, the risk premium offered in the annual future price render insufficient for attracting the generator to enter in such a longlasting commitment. On the other hand, the sell volume in quarterly futures is higher than in the base case, staying between 62% and 76% in a rather static trading policy. Under higher transaction fees, it is desirable to be able to rebalance the portfolio in future stages with smaller changes and hence smaller transaction costs. Expectedly, the expected profit is lower due to higher costs. The delivery risk for the first month is negligible, due to the fact that the most likely failures can still be covered by the remaining operative units without buying replacement power in the spot market.
26 Figure 10. Optimal trading strategy for a 5x2MW generation portfolio with reduced availability

Sensitivity to transaction costs
The third case of study is identical to the base case but considering a transaction cost of 7% of the sold amount instead of 3%. The optimal trading policy for the case of increased transaction costs are shown in Figure 11. In this case, the generator sells only 1.32% of its capacity in an annual future contract due to the irreversibility introduced by the higher transactions costs and the larger contracting volume. Under these circumstances, the risk premium offered in the annual future price render insufficient for attracting the generator to enter in such a long-lasting commitment. On the other hand, the sell volume in quarterly futures is higher than in the base case, staying between 62% and 76% in a rather static trading policy. Under higher transaction fees, it is desirable to be able to rebalance the portfolio in future stages with smaller changes and hence smaller transaction costs. Expectedly, the expected profit is lower due to higher costs. The delivery risk for the first month is negligible, due to the fact that the most likely failures can still be covered by the remaining operative units without buying replacement power in the spot market.   Figure 11. Optimal trading strategy for a 5x2MW generation portfolio with higher transaction costs

Conclusions
Optimal decision-making under uncertainty is a field of active research and uppermost relevance in science, engineering and computational finance. Conventional optimization approaches have difficulties and serious limitations for tackling high-dimensional problems often encountered in real world settings. Recent advances in operation research and computation technology opened new possibilities for approaching optimization problems that were considered intractable until recent times. This chapter presents an efficient Approximate Dynamic Programming algorithm for solving complex stochastic optimization problems and amenable for running in a distributed computing environment. The implemented ADP algorithm has been validated against conventional Dynamic Programming for a simple problem.
The proposed algorithm uses Monte Carlo simulation techniques combined with linear regression for successively approximating and refining the continuation and risk functions. A novel and efficient procedure for updating these functions, combining calculations of independent computing threads and without storing the entire datasets, is proposed. This feature enables exploiting the currently widespread multicore processor architectures and deploying the algorithm in large computation clusters.
In order to demonstrate the practicability of the envisioned approach, the proposed algorithm has been applied to find the optimal trading strategy of a power generation portfolio in forward and spot electricity markets. Power trading and risk management is currently a central activity of power companies running in liberalized electricity markets. The probability density functions of the profits a generator would make by participating in either the spot or the forward markets are extremely different. The forms and boundaries of these probability functions have drastic implications for risk when generators get involved in the spot or the forward markets. Generators can hedge price risk of spot markets by contracting forward, but by exposing themselves to delivery risk. Hence, the optimization problem is formulated as the maximization of the expected profit of the trading policy while the downside risk is constrained. For doing so, the generator selects and combines a portfolio of annual and quarterly forward contracts as well as involvement in the spot market. A frictional market with nonnegligible transaction costs is considered.
A detailed chronological 4-state reliability model of generating units has been adopted for replicating stochastic behavior of random outages. Large stochastic ensembles of spot prices and forward prices time series have been synthetized for this application. In order to retain subadditivity, downside risk is measured by CVaR. The approximation of CVaR by a momentbased risk metric drastically improves computational efficiency while providing accurate and consistent risk estimations.
Applying ADP-based optimization techniques to electricity markets is a novel undertaking and opens a prospectively fertile avenue for research. In future works, further algorithmic enhancement are foreseen. Application of these methods for designing trading strategies that considers a larger set of available financial contracts as well as generation portfolios comprising renewable resources would provide results and findings of high practical significance.