A Distributed Computing Architecture for the Large-Scale Integration of Renewable Energy and Distributed Resources in Smart Grids

We present a distributed computing architecture for smart grid management, composed of two applications at two different levels of the grid. At the high voltage level, we optimize operations using a stochastic unit commitment (SUC) model with hybrid time resolution. The SUC problem is solved with an asynchronous distributed subgradient method, for which we propose stepsize scaling and fast initialization techniques. The asynchronous algorithm is implemented in a high-performance computing cluster and benchmarked against a deterministic unit commitment model with exogenous reserve targets on an industrial-scale test case of the Central Western European system (679 buses, 1037 lines, and 656 generators). At the distribution network level, we manage demand response from small clients through distributed stochastic control, which enables harnessing residential demand response while respecting the desire of consumers for control, privacy, and simplicity. The distributed stochastic control scheme is successfully tested on a test case with 10,000 controllable devices. Both applications demonstrate the potential for efficiently managing flexible resources in smart grids and for systematically coping with the uncertainty and variability introduced by renewable energy.


Introduction
The progressive integration of renewable energy resources, demand response, energy storage, electric vehicles, and other distributed resources in electric power grids that has been taking place worldwide in recent years is transforming power systems and resulting in numerous operational challenges, including uncertainty of supply availability, distributed storage management, real-time coordination of distributed energy resources, and changing directions of flow in distribution networks. These challenges demand a shift of the traditional centralized power system operations paradigm toward the smart grid paradigm [1], where distributed computing and control stand out as a promising technology with the potential of achieving operations with optimal performance. The academic literature includes various applications of distributed computing in power system operations, including long- and mid-term planning, short-term scheduling, state estimation and monitoring, real-time control, and simulation [2][3][4][5]. Early studies pointed out several challenges related to communications and the heterogeneous characteristics of distributed computing systems, which needed to be addressed first in order to implement distributed computing applications. Nowadays, standard communication protocols are a mature technology and most current distributed computing resources can perform a broad range of operations. Such advances in distributed computing technology have paved the way for developing and implementing scalable distributed algorithms for power systems.
The prevailing industry view, as we move forward into the future smart grid, is that it will entail: (i) broadcasting of dynamic prices or other information and (ii) telemetry backhaul to market participants. In this model, distributed energy resource aggregators are often regarded as transaction brokers between end customers and various upstream market participants. A "failure-free market" design for a pure market-driven solution under this paradigm has remained elusive, despite decades of research and development. In this chapter, we analyze the deployment of distributed computing as an enabling tool for managing the short-term operations of smart grids at two levels:
• At the level of the high-voltage grid, we centrally optimize operations using a stochastic unit commitment (SUC) model, which endogenously allocates reserve capacity by explicitly modeling uncertainty. Specifically, we present an asynchronous distributed algorithm for solving SUC, which extends the asynchronous algorithm proposed in Ref. [6] in three aspects: (i) we propose a hybrid approach for modeling quarterly dispatch decisions alongside hourly commitment decisions; (ii) we introduce a stepsize scaling on the iterative method to diminish the error due to asynchronous execution; and (iii) we propose two methods for a faster initialization of the algorithm. The asynchronous algorithm is implemented in a high-performance computing (HPC) cluster and benchmarked against a deterministic unit commitment model with exogenous reserve targets (DUCR). We find that distributed computing allows solving SUC within the same time frame required for solving DUCR.
• At the level of the distribution grid, we rely on stochastic distributed control to manage consumer devices using the ColorPower architecture [7][8][9], which enables harnessing flexible residential demand response while respecting the desire of consumers for control, privacy, and simplicity. The ColorPower control approach is inspired by the automatic cooperative protocols that govern Internet communications. These protocols represent a distributed and federated control paradigm, in which information and decision-making authority remain local, yet global system stability is ensured.
Centralized clearing at the high-voltage grid level and distributed clearing at the distribution grid level can be integrated in a co-optimization framework, as recently proposed by Caramanis et al. [10]. These two applications of distributed computing in power system operations demonstrate the potential to fully harness the flexibility of the grid and to smoothly integrate large shares of renewable and other distributed energy resources in power systems without deteriorating the quality of service delivered to consumers.
The rest of the chapter is organized as follows: Section 2 introduces the deterministic and stochastic unit commitment problems. Section 3 proposes an asynchronous algorithm for solving SUC and presents numerical experiments on a network of realistic scale. Section 4 presents the ColorPower architecture for managing demand response in the distribution grid and demonstrates its capability through a numerical experiment. Finally, Section 5 concludes the chapter.
2. High-voltage power grid optimization models

Overview
Operations of the high-voltage power grid are typically scheduled in two stages: (i) day-ahead scheduling, where operations are planned based on forecast conditions for the system and the on/off status of slow generators is fixed and (ii) real-time scheduling, where system operators balance the system for the actual conditions using the available flexibility in the system. Models for short-term scheduling are solved on a daily basis, and they occupy a central role in clearing power markets and operating power systems.
Until recently, power system operators have relied on deterministic short-term scheduling models with reserve margins to secure the system against load forecast errors and outages [11][12][13][14]. The integration of renewable energy sources has placed these practices under question because they ignore the inherent uncertainty of renewable energy supply, thereby motivating system operators and researchers to look for systematic methods to address uncertainty in real-time operations. A consistent methodology for mitigating the impacts of renewable energy uncertainty (and operational uncertainty in general) is stochastic programming. Stochastic models for short-term scheduling (i.e., SUC models) were originally considered in the seminal work of Takriti et al. [15] and Carpentier et al. [16], as an approach for mitigating demand uncertainty and generator outages. Subsequently, numerous variants of the SUC model have been proposed, which differ in the number of stages, the source of uncertainty, the representation of uncertainty, and the solution methods that are used. See Ref. [17] and references therein for a recent survey.
In the present work, we use the deterministic and stochastic unit commitment models for day-ahead scheduling presented in Sections 3.1 and 3.2. The proposed models differ from models previously proposed in the literature in that they use hybrid time resolution: hourly commitment decisions (u, v, w, and z) and 15-min dispatch decisions (p, r, and f). This formulation allows modeling subhourly phenomena, which have been shown to be important for the operation of systems with significant levels of renewable energy integration [18].

Deterministic unit commitment with exogenous reserve targets
Using the notation provided in the beginning of the section, we model deterministic unit commitment with reserves (DUCR) as the minimization problem of Eqs. (1)-(9), whose generator limits include:

P⁻_g u_{g,τ(t)} ≤ p_{g,t},  p_{g,t} + r²_{g,t} + r³_{g,t} ≤ P⁺_g u_{g,τ(t)}  ∀g ∈ G^SLOW, t ∈ T15  (5)

P⁻_g u_{g,τ(t)} ≤ p_{g,t},  p_{g,t} + r²_{g,t} ≤ P⁺_g u_{g,τ(t)},  p_{g,t} + r²_{g,t} + r³_{g,t} ≤ P⁺_g  ∀g ∈ G\G^SLOW, t ∈ T15  (6)

−TL_g + (TL_g − R⁻_g) u_{g,τ(t)} ≤ p_{g,t} − p_{g,t−1}  ∀g ∈ G, t ∈ T15  (7)

The objective function Eq. (1) corresponds to the total operating cost, composed of the no-load cost, the startup cost, and the production cost. Constraints Eq. (2) enforce nodal power balance while allowing for production shedding; demand shedding can be included in the present formulation by connecting a very expensive generator to each bus. Eq. (3) enforces the reserve margins on each area of the system, allowing for reserve cascading (secondary reserve capacity can be used to provide tertiary reserve). Eq. (4) models DC power flow constraints in terms of bus angles and thermal limits of transmission lines.
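As a concrete illustration of the cost structure in Eq. (1), the following sketch evaluates the no-load, startup, and production cost terms for a toy single-generator instance (variable and parameter names are illustrative, not the chapter's notation):

```python
def operating_cost(u, v, p, *, no_load, startup, marginal):
    """Total operating cost in the spirit of Eq. (1): a no-load cost for every
    hour a unit is committed (u), a startup cost whenever it turns on (v), and
    a linear production cost on the 15-min dispatch p (0.25 h per period).
    Toy sketch; names are illustrative."""
    cost = 0.0
    for g in u:
        cost += sum(no_load[g] * u_t + startup[g] * v_t
                    for u_t, v_t in zip(u[g], v[g]))           # hourly terms
        cost += 0.25 * sum(marginal[g] * p_t for p_t in p[g])  # energy cost
    return cost

# One generator, two hours (committed in both, started up in hour 1),
# eight 15-min periods at 100 MW:
cost = operating_cost({'g1': [1, 1]}, {'g1': [1, 0]}, {'g1': [100.0] * 8},
                      no_load={'g1': 10.0}, startup={'g1': 50.0},
                      marginal={'g1': 2.0})
# 2*10 (no-load) + 50 (startup) + 0.25*2*100*8 (production) = 470
```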
The feasible production set of thermal generators is described by Eqs. (5)- (9). Production and reserve provision limits are expressed as Eq. (5) for slow generators, that can provide reserves only when they are online, and as Eq. (6) for the remaining set of generators, which can provide secondary reserves when they are online and tertiary reserves both when they are online and offline. Ramp rate constraints Eqs. (7)-(8) are based on the formulation provided by Frangioni et al. [19]. Ramp-up rate constraints Eq. (8) enforce, in addition to the ramp-up rate limit on production, that there is enough ramping capability between periods t À 1 and t to ramp-up r 2 g, t MW within ΔT 2 minutes (which can be used to provide secondary reserve), and to ramp-up r 2 g, t þ r 3 g, t MW within ΔT 3 minutes (which can be used to provide tertiary reserve). Constraints Eq. (9) enforce minimum up and down times, as proposed by Rajan and Takriti [20].
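The down-ramp condition of Eq. (7) can be checked directly; the sketch below encodes that single inequality for one generator and period (a minimal illustration with parameter names of our own choosing):

```python
def down_ramp_ok(p_prev, p_now, u_now, *, TL, R_dn):
    """Constraint Eq. (7): -TL + (TL - R_dn) * u <= p_now - p_prev.
    While the unit stays committed (u = 1) the bound reduces to -R_dn; when it
    shuts down (u = 0) the full shutdown trajectory -TL applies. Sketch only."""
    return p_now - p_prev >= -TL + (TL - R_dn) * u_now

# Committed unit: may ramp down by at most R_dn = 50 MW per period.
ok_small_drop = down_ramp_ok(100.0, 60.0, 1, TL=200.0, R_dn=50.0)  # -40 >= -50
bad_big_drop = down_ramp_ok(100.0, 40.0, 1, TL=200.0, R_dn=50.0)   # -60 <  -50
# Shutting down: the looser -TL bound applies instead.
ok_shutdown = down_ramp_ok(150.0, 0.0, 0, TL=200.0, R_dn=50.0)     # -150 >= -200
```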
Boundary conditions of the problem are modeled by allowing the time indices to cycle within the horizon; in other words, for any commitment variable x_{·,τ} with τ < 1, we define x_{·,τ} := x_{·,((τ−1) mod |T60|)+1}. Similarly, for any dispatch variable x_{·,t} with t < 1 or t > |T15|, we define x_{·,t} := x_{·,((t−1) mod |T15|)+1}. In this fashion, we model initial conditions (τ < 1, t < 1) and restrain end effects of the model (τ = |T60|, t = |T15|) simultaneously. In practical cases, initial conditions are given by the current operating conditions, and end effects are dealt with by using an extended look-ahead horizon.
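The cyclic index convention can be captured by a one-line helper, shown here as a small sketch:

```python
def wrap(t, horizon):
    """Cyclic time index: map any integer t onto 1..horizon, mirroring the
    convention x_{.,t} := x_{.,((t-1) mod horizon) + 1} used for the boundary
    conditions of the unit commitment model."""
    return (t - 1) % horizon + 1

# For a 96-period (15-min) horizon: period 0 (an initial condition) wraps to
# period 96, and period 97 (an end effect) wraps back to period 1.
last, first = wrap(0, 96), wrap(97, 96)   # -> 96, 1
```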

Scenario decomposition of the SUC problem
The SUC problem in Eqs. (10)-(17) grows linearly in size with the number of scenarios. Hence, SUC problems are in general of large scale, even for small system models. This motivated Takriti et al. [15] and Carpentier et al. [16] to rely on Lagrangian decomposition methods for solving the problem.
Recent SUC studies have focused on designing decomposition algorithms capable of solving the problem in operationally acceptable time frames. Papavasiliou et al. [21] proposed a dual scenario decomposition scheme where the dual is solved using the subgradient method, and where the dual function is evaluated in parallel. Kim and Zavala [22] also used a dual scenario decomposition scheme, but solved the dual problem using a bundle method. Cheung et al. [23] presented a parallel implementation of the progressive hedging algorithm of Rockafellar and Wets [24].
All previously mentioned parallel algorithms for SUC are synchronous algorithms, i.e., scenario subproblems are solved in parallel at each iteration of the decomposition method; however, it is necessary to solve all scenario subproblems before advancing to the next iteration. In cases where the solution times of subproblems differ significantly, synchronous algorithms lead to an underutilization of the parallel computing infrastructure and a loss of parallel efficiency. We have found instances where the time required to evaluate subproblems for difficult scenarios is 75 times longer than the solution time for easy scenarios.
Aiming at overcoming the difficulties faced by synchronous algorithms, we propose an asynchronous distributed algorithm for solving SUC. The algorithm is based on the scenario decomposition scheme for SUC proposed in Ref. [21], where the authors relax the nonanticipativity constraints Eq. (16) and form the Lagrangian dual problem of Eq. (18), where h_0(μ, ν) and h_s(μ_s, ν_s) are defined according to Eqs. (19) and (20), respectively. We use boldface to denote vectors and partial indexation of dual variables with respect to scenarios, so that μ_s := [μ_{g_1,s,1} … μ_{g_|G|,s,1}]^T. The constraints within the infimum in Eq. (20) refer to constraints Eqs. (11)-(15) for scenario s (dropping the scenario indexation of variables).
Both h_0(μ, ν) and h_s(μ_s, ν_s) for all s ∈ S are nondifferentiable convex functions. Evaluating h_0(μ, ν) amounts to solving a small integer programming problem, for the constraints of which we have a linear-size convex hull description [20]. Evaluating h_s(μ_s, ν_s) amounts to solving a deterministic unit commitment (DUC) problem without reserve requirements, which is a mixed-integer linear program of potentially large scale for realistic system models. In practice, the run time for evaluating h_s(μ_s, ν_s) for any s and any dual multipliers is at least two orders of magnitude greater than the run time for evaluating h_0(μ, ν).
The proposed distributed algorithm exploits the characteristics of h_0(μ, ν) and h_s(μ_s, ν_s) in order to maximize Eq. (18) and compute lower bounds on the optimal SUC solution, while recovering feasible nonanticipative commitment schedules with associated expected costs (upper bounds on the optimal SUC solution). The dual maximization algorithm is inspired by the work of Nedić et al. on asynchronous incremental subgradient methods [25].
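To make the scenario decomposition concrete, the following toy sketch relaxes the nonanticipativity constraint of a two-scenario, single-generator commitment problem and maximizes the resulting dual by a subgradient method. All data and names are invented for illustration and are not the chapter's model:

```python
# One generator, one period, two equally likely demand scenarios. First-stage
# decision u in {0, 1}: commit the unit or buy from the market. The relaxed
# nonanticipativity constraint u_0 = u_1 carries a multiplier mu.
PI = [0.5, 0.5]                  # scenario probabilities
DEMAND = [20.0, 120.0]           # MW per scenario
FUEL, MARKET, COMMIT = 1.0, 2.0, 60.0

def scenario_cost(u, d):
    return COMMIT + FUEL * d if u else MARKET * d

def h_s(s, mu):
    """Scenario subproblem of the Lagrangian: the relaxed term mu*(u_0 - u_1)
    splits into +mu*u_0 for scenario 0 and -mu*u_1 for scenario 1."""
    sign = 1.0 if s == 0 else -1.0
    val = lambda u: PI[s] * scenario_cost(u, DEMAND[s]) + sign * mu * u
    u_best = min((0, 1), key=val)
    return val(u_best), u_best

def dual_ascent(iters=200):
    mu, best_lb = 0.0, float("-inf")
    for k in range(1, iters + 1):
        (v0, u0), (v1, u1) = h_s(0, mu), h_s(1, mu)
        best_lb = max(best_lb, v0 + v1)   # any multiplier yields a lower bound
        mu += (5.0 / k) * (u0 - u1)       # subgradient of the dual in mu
    return best_lb

lb = dual_ascent()
# Optimal nonanticipative cost, by enumerating the single first-stage decision:
opt = min(sum(PI[s] * scenario_cost(u, DEMAND[s]) for s in (0, 1))
          for u in (0, 1))
```

In this instance the dual closes the gap completely (the scenarios initially disagree on u, and the multiplier drives them to agreement), which is not guaranteed in general for mixed-integer problems.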

Dual maximization and primal recovery
For simplicity, assume that we have 1 + DP + PP available parallel processors, which can all access a shared memory space. We allocate one processor to coordinate the parallel execution and manage the shared memory space, DP ≤ |S| processors to solve the dual problem in Eq. (18), and PP processors to recover complete solutions to the SUC problem in Eqs. (10)-(17). Interactions between the different processors are presented in Figure 1.

Figure 1: Asynchronous algorithm layout. Information within square brackets is read or written at a single step of the algorithm.
We maximize the dual function in Eq. (18) using a block coordinate descent (BCD) method, in which each update is performed over a block of dual variables associated with a scenario, (μ_s, ν_s) for certain s ∈ S, following the direction of the subgradient of the dual function in the block of variables (μ_s, ν_s). The BCD method is implemented in parallel and asynchronously by having each dual processor perform updates on the dual variables associated with a certain scenario, which are not being updated by any other dual processor at the same time. Scenarios whose dual variables are not currently being updated by any processor are held in the dual queue Q_D, to be updated later.
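The effect of asynchronous execution on update frequencies can be illustrated with a small event-driven simulation of the dual queue (a sketch only; the chapter's implementation runs on real HPC processes):

```python
import heapq

def simulate_async_updates(solve_time, horizon, workers):
    """Event-driven sketch of the asynchronous scheme: each dual worker takes
    the scenario at the head of the dual queue Q_D, spends solve_time[s]
    'seconds' updating its multipliers, then returns it to the tail of Q_D.
    Returns how many updates each scenario received within the horizon."""
    queue = list(range(len(solve_time)))               # Q_D
    counts = [0] * len(solve_time)
    events = [(0.0, w, None) for w in range(workers)]  # (free-at, worker, done)
    heapq.heapify(events)
    while events:
        now, w, done = heapq.heappop(events)
        if done is not None:
            counts[done] += 1        # update on (mu_s, nu_s) just finished
            queue.append(done)       # scenario returns to the tail of Q_D
        if now < horizon and queue:
            s = queue.pop(0)         # take the head of Q_D
            heapq.heappush(events, (now + solve_time[s], w, s))
    return counts

# A scenario that takes 10x longer to solve receives far fewer updates,
# but the easy scenarios keep iterating instead of waiting for it:
counts = simulate_async_updates([1.0, 1.0, 10.0], horizon=100.0, workers=2)
```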
We maintain shared memory registers of Q_D, of the current multipliers (μ_s^{k(s)}, ν_s^{k(s)}) and iteration counts k(s) for each scenario, and of the best lower bound LB. Additionally, a shared memory register of the primal queue Q_P is required for recovering primal solutions. Each dual processor then performs the following operations:

1. Read and remove the first scenario s from Q_D.
2. Read the current multipliers (μ_s^{k(s)}, ν_s^{k(s)}) and the iteration count k(s) from shared memory.
3. Evaluate the dual function at the current multipliers.
4. Construct a compound of the previous iterates, to be used for computing lower bounds.
5. Read the current global iteration count k and perform a BCD update on the dual multipliers (μ_s, ν_s) using Eqs. (19) and (20).
6. Compute a lower bound LB from previous evaluations of h_s(μ_s, ν_s) recorded in memory.
7. Let k(s) := k(s) + 1 and update the shared memory registers.
8. Add s at the end of Q_D and return to step 1.
Steps 1-3 of the dual processor algorithm are self-explanatory.
Step 4 constructs a compound of the previous iterates which is useful for computing lower bounds.
During the execution of the algorithm, step 5 performs updates to the blocks of dual variables associated with all scenarios. As h_s(μ_s, ν_s) is easier to evaluate for certain scenarios than for others, the blocks of dual variables associated with easier scenarios are updated more frequently than those of harder scenarios. We model this process, in a simplified fashion, as if every update were performed on a randomly selected scenario from a nonuniform distribution, where the probability of selecting scenario s is proportional to 1/T_s^between, where T_s^between is the average time between two updates on scenario s (estimated during execution). The asynchronous BCD method can then be understood as a stochastic approximate subgradient method [26,27]. The method is approximate for three reasons: (i) as the objective function contains a nonseparable nondifferentiable function h_0(μ, ν), there is no guarantee that the expected update direction coincides with a subgradient of the objective of Eq. (18) at the current iterate; (ii) h_0(μ, ν) is evaluated for a delayed version of the multipliers (μ, ν); and (iii) h_0(μ, ν) and h_s(μ_s, ν_s) are evaluated only approximately, up to a certain MILP gap. Provided that we use a diminishing, nonsummable, and square-summable stepsize α_k of the type 1/k^q, and that the error in the subgradient is bounded, the method converges to an approximate solution of the dual problem in Eq. (18) [26,27].
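The role of the diminishing stepsize can be illustrated on a one-dimensional toy problem: ascent on a concave function with a bounded error added to each subgradient still settles near the maximizer when α_k = 1/k^q with q ∈ (1/2, 1]. A minimal sketch, unrelated to the chapter's dual function:

```python
import random

def noisy_subgradient_ascent(q=0.75, iters=20000, seed=1):
    """Ascent on the concave toy function f(mu) = -|mu - 3| using subgradients
    corrupted by a bounded error, with diminishing stepsize alpha_k = 1/k^q.
    For q in (1/2, 1] the stepsize sequence is nonsummable (the iterates can
    travel arbitrarily far) and square-summable (the accumulated noise stays
    bounded), so the iterates settle near the maximizer mu* = 3."""
    random.seed(seed)
    mu = 0.0
    for k in range(1, iters + 1):
        g = (1.0 if mu < 3.0 else -1.0) + random.uniform(-0.5, 0.5)
        mu += g / k ** q
    return mu

mu_final = noisy_subgradient_ascent()   # close to the maximizer mu* = 3
```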
In step 6, we compute a lower bound on the primal problem Eqs. (10)-(17) using previous evaluations of h_s(μ_s, ν_s) recorded in memory, as proposed in Ref. [6].
Step 7 updates the shared memory registers for future iterations and step 8 closes the internal loop of the dual processor.
We recover primal solutions by taking advantage of the fact that (u*_SLOW, v*_SLOW) is a feasible solution for (w, z) in Eqs. (10)-(17). Therefore, in order to compute complete primal solutions and obtain upper bounds for the problem in Eqs. (10)-(17), we can fix w := u*_SLOW and z := v*_SLOW and solve the remaining problem, as proposed in Ref. [28]. After fixing (w, z), the remaining problem becomes separable by scenario; hence, in order to solve it, we need to solve a restricted DUC for each scenario in S. These primal evaluation jobs, i.e., solving the restricted DUC for {u*_SLOW} × S, are appended at the end of the primal queue Q_P by dual processors after each update (step 7.e). Note that we do not require storing v*_SLOW because its value is implied by u*_SLOW.
The primal queue is managed by the coordinator process, which assigns primal jobs to primal processors as they become available. The computation of primal solutions is therefore also asynchronous, in the sense that it runs independently of dual iterations and that the evaluation of a candidate solution u*_SLOW does not require that previous candidates have already been evaluated for all scenarios. Once a certain candidate u^l has been evaluated for all scenarios, the coordinator can compute a new upper bound to Eqs. (10)-(17) as UB^l = Σ_{s∈S} π_s UB_s^l, where UB_s^l is the upper bound associated with u^l on the restricted DUC problem of scenario s and π_s is the probability of scenario s. The coordinator process keeps track of the candidate associated with the smallest upper bound throughout the execution.
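The coordinator's bookkeeping for upper bounds, together with the relative-gap stopping test, can be sketched in a few lines (names are illustrative):

```python
def expected_cost_upper_bound(pi, ub_by_scenario):
    """Once a candidate commitment u^l has been evaluated for every scenario,
    its upper bound is the probability-weighted sum of the per-scenario
    bounds UB_s^l (sketch of the coordinator's bookkeeping)."""
    return sum(p * ub for p, ub in zip(pi, ub_by_scenario))

def should_terminate(lb, ub, eps):
    """Stopping rule used by the coordinator: relative gap 1 - LB/UB <= eps."""
    return 1.0 - lb / ub <= eps

# Three scenarios with probabilities 0.25, 0.25, 0.5:
ub = expected_cost_upper_bound([0.25, 0.25, 0.5], [102.0, 100.0, 99.0])
# 0.25*102 + 0.25*100 + 0.5*99 = 100.0
```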
Finally, the coordinator process terminates the algorithm when 1 − LB/UB ≤ ε, where ε is a prescribed tolerance, or when reaching a prescribed maximum solution time. At this point, the algorithm retrieves the best-found solution and the bound on the distance of this solution from the optimal objective function value.

Dual algorithm initialization
The lower bounds computed by the algorithm presented in the previous section depend on previous evaluations of h_s(μ_s, ν_s) for other scenarios. As the evaluation of h_s(μ_s, ν_s) can require a substantial amount of time for certain scenarios, the computation of the first lower bound considering nontrivial values of h_s(μ_s, ν_s) for all scenarios can be delayed significantly with respect to the advance of dual iterations and primal recovery. In other words, it might be the case that the algorithm finds a very good primal solution but is unable to terminate because it is missing the value of h_s(μ_s, ν_s) for a single scenario.
In order to prevent these situations, and in order to obtain nontrivial bounds faster, during the first pass of the dual processors over all scenarios we can replace h_s(μ_s, ν_s) with a surrogate η_s(μ_s, ν_s) that is easier to compute, such that η_s(μ_s, ν_s) ≤ h_s(μ_s, ν_s) for any (μ_s, ν_s). We propose two alternatives for η_s(μ_s, ν_s):
The LP approach requires solving a linear program of the same size as the original problem in Eq. (20), but it has the advantage that it can be obtained as an intermediate result while evaluating h_s(μ_s, ν_s) (the LP approach does not add extra computations to the algorithm). The OPF approach, on the other hand, requires solving many small MILP problems, which can be solved faster than the linear relaxation of Eq. (20). The OPF approach ignores several constraints and cost components, such as the startup cost of nonslow generators, and it adds extra computations to the algorithm.
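The principle behind the LP surrogate, that a relaxation always yields a valid lower bound η_s ≤ h_s, can be seen on a toy covering problem, comparing a brute-force binary optimum with its linear relaxation (data invented for illustration):

```python
from itertools import product

def milp_min(c, a, b):
    """Exact minimum of c.x over binary x subject to a.x >= b, by brute force
    (a stand-in for evaluating h_s exactly)."""
    best = float("inf")
    for x in product((0, 1), repeat=len(c)):
        if sum(ai * xi for ai, xi in zip(a, x)) >= b:
            best = min(best, sum(ci * xi for ci, xi in zip(c, x)))
    return best

def lp_relaxation_min(c, a, b):
    """Same problem with x in [0, 1]. For a single covering constraint the LP
    optimum is greedy by cost per unit, with one fractional item; it plays the
    role of the cheaper surrogate eta_s <= h_s used during the first pass."""
    need, cost = b, 0.0
    for ci, ai in sorted(zip(c, a), key=lambda t: t[0] / t[1]):
        take = min(1.0, need / ai)
        cost += ci * take
        need -= ai * take
        if need <= 1e-12:
            break
    return cost

c, a, b = [4.0, 3.0, 5.0], [2.0, 1.0, 3.0], 4.0
eta, h = lp_relaxation_min(c, a, b), milp_min(c, a, b)   # eta = 7.0 <= h = 8.0
```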

Implementation and numerical experiments
We implement the DUCR model in Mosel and solve it directly using Xpress. We also implement the proposed asynchronous algorithm for SUC (described in the previous subsections) in Mosel, using the module mmjobs for handling parallel processes and communications, while solving the subproblems with Xpress [29]. We configure Xpress to solve the root node using the barrier algorithm, set the termination gap to 1% for both the DUCR and SUC subproblems, and set the maximum solution wall time to 10 hours. Numerical experiments were run on the Sierra cluster hosted at the Lawrence Livermore National Laboratory. Each node of the Sierra cluster is equipped with two Intel Xeon EP X5660 processors (12 cores per node) and 24 GB of RAM. We use 10 nodes for the proposed distributed algorithm, assigning 5 nodes to dual processors, with 6 dual processors per node (DP = 30), and 5 nodes to primal recovery, with 12 primal processors per node. The coordinator is implemented on a primal node and occupies one primal processor (PP = 59).
We test the proposed algorithm on a detailed model of the Central Western European system, consisting of 656 thermal generators, 679 nodes, and 1037 lines. The model was constructed by using the network model of Hutcheon and Bialek [30], technical generator information provided to the authors by ENGIE, and multiarea demand and renewable energy information collected from national system operators (see [31] for details). We consider eight representative day types, one weekday and one weekend day per season, as being representative of the different conditions faced by the system throughout the year.
We consider 4 day-ahead scheduling models: the DUCR model and the SUC model with 30 (SUC30), 60 (SUC60), and 120 (SUC120) scenarios. The sizes of the different day-ahead scheduling models are presented in Table 1, where the size of the stochastic models refers to the size of the extensive form. While the DUCR model is of the scale of problems that fit in the memory of a single machine and can be solved by a commercial solver, the SUC models in extensive form are beyond current capabilities of commercial solvers. Table 2 presents the solution time statistics for all day-ahead scheduling policies. In the case of SUC, we report these results for the two dual initialization alternatives proposed in Section 3.2.
The results of Table 2 indicate that the OPF initialization significantly outperforms the LP approach in terms of termination time. This is mainly due to the fact that the OPF approach provides nontrivial lower bounds including information for all scenarios much faster than the LP approach. On the other hand, the solution times of SUC60 and DUCR indicate that, using distributed computing, we can solve SUC at a comparable run time to that required by commercial solvers for DUCR on large-scale systems. Moreover, as shown in Table 3, for a given hard constraint on solution wall time such as 2 h (which is common for day-ahead power system operations), the proposed distributed algorithm provides solutions to SUC with up to 60 scenarios within 2% of optimality, which is acceptable for operational purposes.

Overview
Residential demand response has gained significant attention in recent years as an underutilized source of flexibility in power systems, and is expected to become highly valuable as a balancing resource as increasing amounts of renewable energy are being integrated into the grid. However, the mobilization of demand response by means of real-time pricing, which represents the economists' gold standard and can be traced back to the seminal work of Schweppe et al. [32], has so far fallen short of expectations due to several obstacles, including regulation issues, market structure, incentives to consumers, and technological limitations.
The ColorPower architecture [7][8][9] aims at unlocking the potential of demand response by approaching electricity as a service of differentiated quality, rather than a commodity that residential consumers are willing to trade in real time [33]. In this architecture, the coordination problem of determining which devices should consume power at what times is solved through distributed aggregation and stochastic control. The consumer designates devices or device modes using priority tiers (colors). These tiers correspond to "service level" plans, which are easy to design and implement: we can simply map the "color" designations of electrical devices into plans. A "more flexible" color means less certainty about when a device will run (e.g., the time when a pool pump runs) or lower quality of service delivered by a device (e.g., wider temperature ranges, slower electric vehicle charging). This type of economic decision-making is eminently compatible with consumer desires and economic design, as evidenced by the wide range of quality-of-service contracts offered in other industries.
Furthermore, the self-identified priority tiers of the ColorPower approach enable retail power participation in wholesale energy markets, lifting the economic obstacles for demand response: since the demand for power can be differentiated into tiers with a priority order, the demand in each tier can be separately bid into the current wholesale or local (DSO level) energy markets. The price for each tier can be set according to the cost of supplying demand response from that tier, which in turn is linked to the incentives necessary for securing customer participation in the demand response program. This allows aggregated demand to send price signals in the form of a decreasing buy bid curve. Market information thus flows bidirectionally. A small amount of flexible demand can then buffer the volatility of the overall power demand by yielding power to the inflexible devices as necessary (based upon the priority chosen by the customer), while fairly distributing power to all customer devices within a demand tier.
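Constructing a decreasing buy bid curve from priority tiers can be sketched as follows (tier prices and quantities are invented for illustration; this is not a market-format specification):

```python
def tiered_bid_curve(tiers):
    """Turn priority tiers into a decreasing buy-bid curve. Each tier is a
    (price, MW) pair, the price reflecting the cost of supplying demand
    response from that tier. Returns cumulative (quantity, price) steps with
    the highest-priced (least flexible) demand first."""
    steps, q = [], 0.0
    for price, mw in sorted(tiers, key=lambda t: -t[0]):
        q += mw
        steps.append((q, price))
    return steps

# Inflexible "red" demand bids high, flexible "green" demand bids low:
curve = tiered_bid_curve([(20.0, 50.0), (60.0, 30.0), (100.0, 120.0)])
# -> [(120.0, 100.0), (150.0, 60.0), (200.0, 20.0)]
```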
Technological limitations to the massive deployment of demand response are dealt with by deploying field-proven stochastic control techniques across the distribution network, with the objective of subtly shifting the schedules of millions of devices in real time, based upon the conditions of the grid. These control techniques include the CSMA/CD algorithms that permit cellular phones to share narrow radio frequency bands, telephone switch control algorithms, and operating system thread scheduling, as well as examples from nature such as social insect hive behaviors and bacterial quorum sensing. Moreover, the ubiquity of Internet communications allows us to consider using the Internet platform itself for end-to-end communications between machines.
At a high level, the ColorPower algorithm operates by aggregating the demand flexibility state information of each agent into a global estimate of total consumer flexibility. This aggregate and the current demand target are then broadcast via IP multicast throughout the system, and every local controller (typically one per consumer or one per device) combines the overall model and its local state to make a stochastic control decision. With each iteration of aggregation, broadcast, and control, the overall system moves toward the target demand set by the utility or the ISO, TSO, or DSO, allowing the system as a whole to rapidly achieve any given demand target and to closely track target ramps. Note that aggregation has the beneficial side effect of preserving the privacy of individual consumers: their demand information simply becomes part of an overall statistic.
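A stripped-down version of the aggregate-broadcast-control iteration, with a single tier and no refractory states, can be sketched as follows; the flip probabilities are chosen so that the expected total moves onto the target. This is a simplification of the idea, not the ColorPower protocol itself:

```python
import random

def control_round(enabled, demand, goal, rng):
    """One aggregate-broadcast-control iteration (one tier, no refractory
    states): the aggregate enabled/disabled demand is 'broadcast', and every
    client flips independently with a probability chosen so that the expected
    new total equals the goal."""
    total_on = sum(d for d, e in zip(demand, enabled) if e)
    total_off = sum(d for d, e in zip(demand, enabled) if not e)
    if total_on > goal and total_on > 0:
        p_off = min(1.0, (total_on - goal) / total_on)
        return [e and rng.random() >= p_off for e in enabled]
    if total_on < goal and total_off > 0:
        p_on = min(1.0, (goal - total_on) / total_off)
        return [e or rng.random() < p_on for e in enabled]
    return enabled

rng = random.Random(7)
demand = [1.0] * 10000           # 10,000 identical 1 kW devices
enabled = [True] * 10000
for _ in range(10):              # iterate toward a 4 MW target
    enabled = control_round(enabled, demand, 4000.0, rng)
total = sum(d for d, e in zip(demand, enabled) if e)
```

Because every client acts on the same broadcast aggregate, no central entity needs to know any individual device's state, mirroring the privacy property discussed above.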
The proposed architectural approach supplements the inadequacy of pure market-based control approaches by introducing an automated, distributed, and cooperative communications feedback loop between the system and large populations of cooperative devices at the edge of the network. TSO markets and the evolving DSO local energy markets of the future will have both deep markets and distributed control architecture pushed out to the edge of the network. This smart grid architecture for demand response in the mass market is expected to be a key asset in addressing the challenges of renewable energy integration and the transition to a low-carbon economy.

The ColorPower control problem
A ColorPower system consists of a set of n agents, each owning a set of electrical devices organized into k colors, where lower-numbered colors are intended to be shut off first (e.g., 1 for "green" pool pumps, 2 for "green" HVAC, 3 for "yellow" pool pumps, etc.), and where each color has its own time constants.
Within each color, every device is either Enabled, meaning that it can draw power freely, or Disabled, meaning that it has been shut off or placed in a lower power mode. In order to prevent damage to appliances and/or customer annoyance, devices must wait through a Refractory period after switching between Disabled and Enabled before they return to being Flexible and can switch again. These combinations give four device states (e.g., Enabled and Flexible, EF), through which each device in the ColorPower system moves according to the modified Markov model of Figure 2: randomly from EF to DR and from DF to ER (becoming disabled with probability p_off and enabled with probability p_on), and by randomized timeout from ER to EF and from DR to DF (a fixed length of T_F plus a uniform random addition of up to T_V, per-color constants).
The ColorPower control problem can then be stated as dynamically adjusting p_on and p_off for each agent and color tier, in a distributed manner, so that the aggregate consumption of the system follows a demand goal given by the operator of the high-voltage network.

The ColorPower architecture
The block diagram of the ColorPower control architecture is presented in Figure 3. Each ColorPower client (i.e., the controller inside a device) regulates the state transitions of the devices under its control. Each client state s(t, a) is aggregated to produce a global state estimate ŝ(t), which is broadcast along with a goal g(t) (the demand target set by the utility or the ISO, TSO, or DSO), allowing clients to shape demand by independently computing the control state c(t, a).
The state sðt, aÞ of a client a at time t sums the power demands of the device(s) under its control, and these values are aggregated using a distributed algorithm (e.g., a spanning tree in Ref. [7]) and fed to a state estimator to get an overall estimate of the true stateŝðtÞ of total demand in each state for each color. This estimate is then broadcast to all clients (e.g., by gossip-like diffusion in Ref. [7]), along with the demand shaping goal gðtÞ for the next total Enabled demand over all colors. The controller at each client a sets its control state cðt, aÞ, defined as the set of transition probabilities p on, i, a and p off, i, a for each color i. Finally, demands move through their states according to those transition probabilities, subject to exogenous disturbances such as changes in demand due to customer override, changing environmental conditions, imprecision in measurement, among others. Figure 2 Markov model-based device state switching [8,9].
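The aggregation step of this loop can be illustrated centrally as below; in a real deployment the summation would run over a spanning tree and the result would be diffused back by gossip, as in Ref. [7]. The client data here are hypothetical.

```python
from collections import Counter

def aggregate(client_states):
    """Sum per-(color, state) demand reports across clients to form the
    global state estimate s-hat(t). Centralized stand-in for the
    spanning-tree aggregation described in the text."""
    total = Counter()
    for report in client_states:
        total.update(report)
    return total

# Hypothetical reports: each client maps (color, state) -> demand in kW.
clients = [
    {("green", "EF"): 2.0, ("green", "DF"): 1.0},
    {("green", "EF"): 1.0, ("yellow", "ER"): 3.0},
    {("yellow", "EF"): 2.0},
]

s_hat = aggregate(clients)  # global state estimate, broadcast to all clients
# Total Enabled demand = everything in the EF and ER states.
enabled = sum(v for (color, st), v in s_hat.items() if st in ("EF", "ER"))
```

Each client then compares the broadcast estimate and goal against its own holdings to compute its local control state.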
Note that the aggregation and broadcast algorithms must be chosen carefully to ensure that the communication requirements are lightweight enough to allow control rounds lasting only a few seconds on low-cost hardware. The choice of algorithm depends on the network structure: for mesh networks, for example, spanning tree aggregation and gossip-based broadcast are fast and efficient (for details, see Ref. [7]).

ColorPower control algorithm
The ColorPower control algorithm determines the control vector c(t, a) by a stochastic controller formulated to satisfy four constraints:

Goal tracking: The total Enabled demand in s(t) should track g(t) as closely as possible; i.e., the sum of Enabled demand over all colors i should equal the goal. This is formalized as:

Σ_{i=1..k} (|EF_i(t)| + |ER_i(t)|) = g(t)

Color priority: Devices with lower-numbered colors should be shut off before devices with higher-numbered colors, so that devices are Enabled from the highest color downward, where D_i is the demand for the ith color and above:

D_i = Σ_{j=i..k} (|EF_j| + |ER_j| + |DF_j| + |DR_j|)

Figure 3 Block diagram of the control architecture [8,9].

Fairness: When the goal leads to some devices of a particular color being Enabled and other devices of that color being Disabled, each device has the same expected likelihood of being Disabled. This means that the control state is identical for every client.

Cycling: Devices within a color trade off which devices are Enabled and which are Disabled, so that no device is unfairly burdened by initial bad luck. This is ensured by asserting that any color with a mixture of Enabled and Disabled Flexible devices will always be switching the state of some devices. For this last constraint, there is a tradeoff between how quickly devices cycle and how much flexibility is held in reserve for future goal tracking; we balance these with a target ratio f of the minimum ratio between pairs of corresponding Flexible and Refractory states.
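The cumulative demand quantity D_i used by the color-priority constraint (all demand, in every device state, at color i and above) can be computed directly from the aggregated state; the per-color figures below are hypothetical.

```python
def cumulative_demand(per_color, i):
    """D_i: total demand (|EF_j| + |ER_j| + |DF_j| + |DR_j|) summed over
    colors j >= i, as used by the color-priority constraint.
    per_color maps a color index to a dict of state -> demand (kW)."""
    return sum(
        sum(states.values())
        for color, states in per_color.items()
        if color >= i
    )

# Hypothetical two-color system (1 = green, 2 = yellow), demand in kW.
per_color = {
    1: {"EF": 2.0, "ER": 0.0, "DF": 1.0, "DR": 1.0},
    2: {"EF": 3.0, "ER": 1.0, "DF": 0.0, "DR": 1.0},
}
```

Here D_1 covers the whole population (9 kW) while D_2 counts only the yellow tier and above (5 kW), matching the "ith color and above" definition.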
Since the controller acts indirectly, by manipulating the p_on and p_off transition probabilities of devices, the only resource available for meeting these constraints is the demand in the Flexible states EF and DF for each tier. When it is not possible to satisfy all four constraints simultaneously, the ColorPower controller prioritizes the constraints in order of importance. Fairness and qualitative color guarantees are given highest priority, since these are part of the contract with customers: fairness is ensured by making the expected enablement fraction of each device equivalent (though particular clients may achieve this in different ways, depending on their type and customer settings), while qualitative priority is handled by rules that prohibit flexibility from being considered by the controller outside of contractually allowable circumstances. The remaining constraints are enforced sequentially. First comes goal tracking: the actual shaping of demand to meet power schedules. Second is the soft color priority, which ensures that transient situations in which goal tracking puts some devices in the wrong state are eventually corrected. Cycling is last, because it is defined only over long periods of time and is thus the least time-critical to satisfy. A controller respecting the aforementioned constraints is described in Ref. [8].
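The goal-tracking step can be sketched as a minimal proportional rule: pick p_off or p_on so that the *expected* demand switched by the Flexible pools closes the gap to the goal. This is a simplification for illustration only, not the prioritized controller of Ref. [8] (which also enforces color priority, fairness, and cycling); all names here are our own.

```python
def control_probs(enabled_kw, goal_kw, ef_kw, df_kw):
    """Minimal goal-tracking sketch (hypothetical): return (p_off, p_on)
    such that the expected Flexible demand switched this round equals the
    gap between current Enabled demand and the goal g(t)."""
    gap = enabled_kw - goal_kw
    if gap > 0 and ef_kw > 0:
        # Over the goal: disable, in expectation, `gap` kW of EF demand.
        return min(1.0, gap / ef_kw), 0.0
    if gap < 0 and df_kw > 0:
        # Under the goal: enable, in expectation, `-gap` kW of DF demand.
        return 0.0, min(1.0, -gap / df_kw)
    return 0.0, 0.0
```

For example, with 10 MW Enabled, an 8 MW goal, and 4 MW of EF demand, the rule asks half of the Flexible Enabled devices to shut off in expectation.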

Numerical experiment
We have implemented and tested the proposed demand response approach in the ColorPower software platform [8]. Simulations are executed with the following parameters: 10 trials per condition; 10,000 controllable devices, each consuming 1 kW of power (for a total of 10 MW of demand); devices are 20% green (low priority), 50% yellow (medium priority), and 30% red (high priority); the measurement error is ε = 0.1% (0.001); rounds are 10 seconds long; and all Refractory time variables are 40 rounds. Error is measured as the ratio of a state's deviation from its optimal value to the total power.
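The error metric of the experiment, as we read it, normalizes a state's deviation from its optimal value by the total controllable power (10 MW here); a one-line sketch under that assumption:

```python
def tracking_error(state_kw, optimal_kw, total_kw):
    """Error metric from the experiment: deviation of a state's power from
    its optimal value, as a fraction of total controllable power."""
    return abs(state_kw - optimal_kw) / total_kw

# E.g., 5.1 MW enabled against a 5.0 MW optimum, out of 10 MW total -> 1%.
err = tracking_error(5_100, 5_000, 10_000)
```

Under this definition, the ε = 0.1% measurement error of the setup corresponds to a 10 kW deviation on the 10 MW population.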
The results of the simulation test are shown in Figure 4. When peak control is desired, the aggregate demand remains below the quota, while individual loads are subjected stochastically to brief curtailments. Post-event rush-in, a potentially severe problem for both traditional demand response and price signal-based control systems, is also managed gracefully due to the specific design of the modified Markov model of Figure 2.
Taken together, these results indicate that the ColorPower approach, when coupled with an appropriate controller, should have the technological capability to flexibly and resiliently shape demand in most practical deployment scenarios.

Conclusions
We present two applications of distributed computing in power systems. On the one hand, we optimize high-voltage power system operations using a distributed asynchronous algorithm capable of solving stochastic unit commitment in run times comparable to those of a deterministic unit commitment model with reserve requirements, and within operationally acceptable time frames. On the other hand, we control demand response at the distribution level using stochastic distributed control, thereby enabling large-scale demand shaping during real-time operations of power systems. Together, both applications of distributed computing demonstrate the potential for efficiently managing flexible resources in smart grids and for systematically coping with the uncertainty and variability introduced by renewable energy.

Figure 4 Simulation results with 10,000 independently fluctuating power loads. Demand is shown as a stacked graph, with Enabled demand at the bottom in dark tones, Disabled demand at the top in light tones, and Refractory demand cross-hatched. The goal is the dashed line, which coincides with the total Enabled demand for the experiment. The plot illustrates a peak shaving case where a power quota, the demand response target that may be provided from an externally generated demand forecast, is used as a guide for the demand to follow.

Nomenclature
Deterministic and stochastic unit commitment

Sets
T60  Hourly periods, T60 := {1, …, |T60|}

Parameters
Maximum 15-min ramp down/up, generator g
TL_g  Maximum state transition level, generator g
UT_g, DT_g  Minimum up/down times, generator g
K_g  Hourly no-load cost, generator g
S_g  Startup cost, generator g
C_g(p)  Quarterly production cost function, generator g (convex, piecewise linear)

Variables
p_{g,t}, p_{g,s,t}  Production, generator g, scenario s, quarter t
f_{l,t}, f_{l,s,t}  Flow through line l, scenario s, quarter t
θ_{n,t}, θ_{n,s,t}  Voltage angle, bus n, scenario s, quarter t
r2_g, r3_g  Capacity and ramp-up rate reservation for secondary and tertiary reserve provision, generator g, quarter t