Open access peer-reviewed chapter

Distributionally Robust Optimization

By Jian Gao, Yida Xu, Julian Barreiro-Gomez, Massa Ndong, Michalis Smyrnakis and Hamidou Tembine

Submitted: November 18th 2017Reviewed: March 22nd 2018Published: September 5th 2018

DOI: 10.5772/intechopen.76686

Downloaded: 1311


This chapter presents a class of distributionally robust optimization problems in which a decision-maker has to choose an action in an uncertain environment. The decision-maker has a continuous action space and aims to learn her optimal strategy. The true distribution of the uncertainty is unknown to the decision-maker. This chapter provides alternative ways to select a distribution based on empirical observations of the decision-maker. This leads to a distributionally robust optimization problem. Simple algorithms, whose dynamics are inspired from the gradient flows, are proposed to find local optima. The method is extended to a class of optimization problems with orthogonal constraints and coupled constraints over the simplex set and polytopes. The designed dynamics do not use the projection operator and are able to satisfy both upper- and lower-bound constraints. The convergence rate of the algorithm to generalized evolutionarily stable strategy is derived using a mean regret estimate. Illustrative examples are provided.


  • distribution robustness
  • gradient flow
  • Bregman divergence
  • Wasserstein metric
  • f-divergence

1. Introduction

Robust optimization can be defined as the process of determining the best or most effective result, utilizing a quantitative measurement system under worst case uncertain functions or parameters. The optimization may occur in terms of best robust design, net cash flows, profits, costs, benefit/cost ratio, quality-of-experience, satisfaction, end-to-end delay, completion time, etc. Other measurement units may be used, such as units of production or production time, and optimization may occur in terms of maximizing production units, minimizing processing time, production time, maximizing profits, or minimizing costs under uncertain parameters. There are numerous techniques of robust optimization methods such as robust linear programming, robust dynamic programming, robust geometric programming, queuing theory, risk analysis, etc. One of the main drawbacks of the robust optimization is that the worst scenario may be too conservative. The bounds provided by the worst case scenarios may not be useful in many interesting problems (see the wireless communication example provided below). However, distributionally robust optimization is not based on the worst case parameters. The distributional robustness method is based the probability distribution instead of worst parameters. The worse case distribution within a certain carefully designed distributional uncertainty set may provide interesting features. Distributionally robust programming can be used not only to provide a distributionally robust solution to a problem when the true distribution is unknown, but it also can, in many instances, give a general solution taking into account some risk. The presented methodology is simple and reduces significantly the dimensionality of the distributionally robust optimization. We hope that the designs of distributionally robust programming presented here can help designers, engineers, cost–benefit analyst, managers to solve concrete problems under unknown distribution.

The rest of the chapter is organized as follows. Section 2 presents some preliminary concepts of distributionally robust optimization. A class of constrained distributionally robust optimization problems are presented in Section 3. Section 4 focuses on distributed distributionally robust optimization. Afterwards, illustrative examples in distributed power networks and in wireless communication networks are provided to evaluate the performance of the method. Finally, prior works and concluding remarks are drawn in Section 5.

Notation:Let R, R+, denote the set of real and non-negative real numbers, respectively, Ωdbe a separable completely metrizable topological space with d:Ω×ΩR+a metric (distance). Let PΩbe the set of all probability measures over Ω.


2. Distributionally robust optimization

This section introduces distributionally robust optimization models. We will first present a generic formulation of the problem. Then, individual components of the optimization and their solvability issues via equivalent formulations will be discussed.

2.1. Model

Consider a decision-maker who wants to select an action aARnin order to optimize her objective raω,where ωis an uncertain parameter. The information structure is the following:

  • The true distribution of ωis not known to the decision-maker.

  • The upper/lower bound (if any) of ωare unknown to the decision-maker.

  • The decision-maker can measure/observe realization of the random variable ω.

The decision-maker chooses to experiment several trials and obtains statistical realizations of ωfrom measurements. The measurement data can be noisy, imperfect and erroneous. Then, an empirical distribution (or histogram) mis built from the realizations of ω.However, mis not the true distribution of the random variable ω,and mmay not be a reliable measure due to statistical, bias, measurement, observation or computational errors. Therefore, the decision-maker is facing a risk. The risk-sensitive decision-maker should decide action that improves the performance of Em˜raωamong alternative distributions m˜within a certain level of deviation ρ>0from the distribution m.The distributionally robust optimization problem is therefore formulated as


where Bρmis the uncertainty set of alternative admissible distributions from mwithin a certain radius ρ>0.Different distributional uncertainty sets are presented: the f-divergence and the Wasserstein metric, defined below.

2.1.1. f-divergence

We introduce the notion of fdivergence which will be used to compute the discrepancy between probability distributions.

Definition 1.Letmandm˜be two probability measures overΩsuch thatmis absolutely continuous with respect tom˜.Letfbe a convex function. Then, thef-divergence betweenmandm˜is defined as follows:


wheredmdm˜is the Radon-Nikodym derivative of the measuremwith the respect the measurem˜.

By Jensen’s inequality:


Thus, Dfmm˜0for any convex function f.Note however that, the fdivergence Dfmm˜is not a distance (for example, it does not satisfy the symmetry property). Here the distributional uncertainty set imposed to the alternative distribution m˜is given by


Example 1.From the notion offdivergence one can derive the following important concept:

  • α-divergence for

  • In particular, Kullback–Leibler divergence (or relative entropy) is retrieved asαgoes to1.

2.1.2. Wasserstein metric

The Wasserstein metric between two probability distributions m˜and mis defined as follows:

Definition 2.Form,m˜PΩ,letΠm˜mbe the set of all couplings betweenmandm˜.That is,


BΩdenotes the measurable sets ofΩ.Letθ1.The Wasserstein metric betweenm˜andmis defined as


It is well-known that for every θ1,Wθm˜mis a true distance in the sense that it satisfies the following three axioms:

  • positive-definiteness,

  • the symmetry property,

  • the triangle inequality.

Note that m˜is not necessarily absolutely continuous with respect to m.Now the distributional uncertainty/constraint set is the set of all possible probability distributions within a Lθ-Wasserstein distance below ρ.


Note that, if mis a random measure (obtained from a sampled realization), we use the expected value of the Wasserstein metric.

Example 2.TheLθ-Wasserstein distance between two Dirac measuresδω0andδω˜0isWθδω0δω˜0=dω0ω˜o.More generally, forK2,theL2-Wasserstein distance between empirical measuresμK=1Kk=1KδωkandνK=1Kk=1Kδω˜kisW22μKνK1Ki=1Kωkω˜k2.

We have defined Bρmand B˜ρm.The goal now is to solve (1) under both fdivergence and Wasserstein metric. One of the difficulties of the problem is the curse of dimensionality. The distributionally robust optimization problem (1) of the decision-maker is an infinite-dimensional robust optimization problem because Bρis of infinite dimensions. Below we will show that (1) can be transformed into an optimization in the form of supinfsup.The latter problem has three alternating terms. Solving this problem requires a triality theory.

2.2. Triality theory

We first present the duality gap and develop a triality theory to solve equivalent formulations of (1). Consider uncoupled domains Ai,i1,2,3.For a general function r2,one has


and the difference


is called duality gap. As it is widely known in duality theory from Sion’s Theorem [1] (which is an extension of von Neumann minimax Theorem) the duality gap vanishes, for example for convex-concave function, and the value is achieved by a saddle point in the case of non-empty convex compact domain.

Triality theory focuses on optimization problems of the forms: supinfsupor infsupinf.The term triality is used here because there are three key alternating terms in these optimizations.

Proposition 1.Leta1a2a3r3a1a2a3Rbe a function defined on the product spacei=13Ai.Then, the following inequalities hold:


and similarly




Thus, for all a2,a3,one has ĝa2a3r3a1a2a3.It follows that, for any a1,a3,


Using the definition of ĝ,one obtains


Taking the infimum in a1yields:


Now, we use two operations for the variable a3:

  • Taking the infimum in the inequality (5) in a3yields


which proves the second part of the inequalities (3). The first part of the inequalities (3) follows immediately from (5).

  • Taking the supremum in inequality (5) in a3yields


which proves the first part of the inequalities (4). The second part of the inequalities (4) follows immediately from (5).

This completes the proof.

2.3. Equivalent formulations

Below we explain how the dimensionality of problem (1) can be significantly reduced using a representation by means of the triality theory inequalities of Proposition 1.

2.3.1. f-divergence

Interestingly, the distributionally robust optimization problem (1) under f-divergence is equivalent to the finite dimensional stochastic optimization problem (when Aare of finite dimensions). To see this, the original problem need to be transformed. Let us introduce the likelihood functional Lω˜=dm˜dmω˜,and set


Then, the Lagrangian of the problem is


where λ0and μR.The problem becomes


A full understanding of problem 6requires a triality theory (not a duality theory). The use of triality theory leads to the following equation:


where his the integrand function λρ+f1μλfr+μλ,where fis Legendre-Fenchel transform of fdefined by


Note that the righthand side of (7) is of dimension n+2,which reduces considerably the dimensionality of the original problem (1).

2.3.2. Wasserstein metric

Similarly, the distributionally robust optimization problem under Wasserstein metric is equivalent to the finite dimensional stochastic optimization problem (when Ais a set of finite dimension). If the function ωraωis upper semi-continuous and Ωdis a Polish space then the Wasserstein distributionally robust optimization problem is equivalent to


The next subsection presents algorithms for computing a distributionally robust solution from the equivalent formulations above.

2.4. Learning algorithms

Learning algorithms are crucial for finding approximate solutions to optimization and control problems. They are widely used for seeking roots/kernel of a function and for finding feasible solutions to variational inequalities. Practically, a learning algorithm generates a certain trajectory (or a set of trajectories) toward a potential approximate solution. Selecting a learning algorithm that has specific properties such as better accuracy, more stability, less-oscillatory and quick convergence is a challenging task [2, 3, 4, 5]. From the calculus of variations point of view, however, a learning algorithm generates curves. Therefore, selecting an algorithm among the others leads to an optimal control problem on the spaces of curves. Hence, it is natural to use optimal control theory to derive faster algorithms for a family of curves. Bergman-based algorithms and risk-aware version of it are introduced below to meet specific properties. We start by introducing the Bregman divergence.

Definition 3.The Bregman divergencedg:A×ARis defined on a differentiable strictly convex functiong:AR.For two pointsabA2, it measures the gap betweengaand the first-order Taylor expansion of g aroundaevaluated atb


Example 3.From the Bregman divergence one gets other features by choosing specific functionsg:

  • Ifga=i=1nai2then the Bregman divergencedgab=i=1naibi2is the squared standard Euclidean distance.

  • If ga=i=1nailogaiis defined on the relative interior of the simplex, i.e., abb01ni=1nbi=1then the Bregman divergence dgab=i=1nailogaibi,is the Kullback–Leibler divergence.

We are now ready to define algorithms for solving the righthand side of (7) and (9). One of the key approaches for error quantification of the algorithm with respect to the distributionally robust optimum is the so-called average regret. When the regret vanishes one gets close to a distributionally robust optimum.

Definition 4.The average regret of an algorithm which generates the trajectoryat=a˜tλtμtwithint0T,t0>0is


2.4.1. Armijo gradient flow

Algorithm 1.The Armijo’s gradient pseudocode is as follows:

1:ProcedureArmijo gradient a0ϵTgmhThe Armijo’s gradient starting froma0within0T

2:       aa0

3:       whileregret>ϵandtTdoWe have the answer if regret is 0

4:              Computeatsolution of (10)

5:              Computeregrett

6:       end while

7:       returnat,regrettget a(t) and the regret

8:end procedure

Proposition 2.LetaEmhaω:Rn+2Rbe a concave function that has a unique global maximizera.Assume thatabe a feasible action profile, i.e.,aA.Consider the continuous time analogue of the Armijo gradient flow[6], which is given by


where a0=a0is the initial point of the algorithm and gis a strictly convex function on a.Let atbe the solution to (10).

Then the average regret withint0T,t0>0is bounded above by




where ais solution to (10). The function Wis positive and ddtW=EmhaωhatωtEmahaωgaa1Emahatω+ddtdgaat.By concavity of Emhaωone has


On the other hand,




where the last inequality is by convexity of g.It follows that ddtWat0along the path of the gradient flow. This decreasing property implies 0WatWa0=dgaa0.In particular, 0tEmhaωhaωWa0<+.Thus, the error to the value Emhaωis bounded by


The announced result on the regret follows by integration over t0Tand by averaging. This completes the proof.

Note that the above regret-bound is established without assuming strong convexity of aEmhaω.Also no Lipschitz continuity bound of the gradient is assumed.

2.4.2. Bregman learning algorithms

Algorithm 2.The Bregman learning pseudocode is as follows:

1:procedureBregmana0ϵTgαβmhThe Bregman learning starting froma0within0T

2:       aa0

3:       whileregret>ϵandtTdoWe have the answer if regret is 0

4:              Computeatsolution of (13)

5:              Computeregrett

6:       end while

7:       returnat,regrettgetatand the regret

8:end procedure

Proposition 3.LetaEmhaω:Rn+2Rbe a concave function that has a unique global maximizera.Assume thatabe a feasible action profile, i.e.,aA.Letαandβbe two functions such thatβ̇teαt.Consider the following Bregman learning algorithm


where a0is the initial point of the algorithm and gis a strictly convex function on a.Let atbe the solution to (13). Then the average regret within t0T,t0>0is bounded above by



Proof.Let Waȧta=dgaat+eαtȧt+eβtEmhaωhatω.It is clear that Wis positive. Moreover, ddtWatȧtta0for β̇eα.Thus WatȧttaWa0ȧ00a=c0.By integration between t0Tit follows


This completes the proof.

In particular, for βs=s+es,one obtains an error bound to the minimum value as


and for βs=s,the regret bound becomes


Figure 1 illustrates the advantage of algorithm (13) compared with the gradient flow (10). It plots the regret bound c0Tt0t0Teβsdsfor β=sand dgaa0logTt0Tt0with an initial gap of c0=25.

Figure 1.

Global regret bound under Bregman vs. gradient. The initial gap isc0=25.

The advantage of algorithms (10) and (13) is that it is not required to compute the Hessian of Emhaωas it is the case in the Newton scheme. As a corollary of Proposition 2 the regret vanishes as Tgrows. Thus, it is a no-regret algorithm. However, Algorithm (10) may not be sufficiently fast. Algorithm (13) provides a higher order convergence rate by carefully designing αβ.The average regret decays very quickly to zero [7]. However, it may generate an oscillatory trajectory with a big magnitude. The next subsection presents risk-aware algorithms that reduce the oscillatory phase of the trajectory.

2.4.3. Risk-aware Bregman learning algorithm

In order to reduce the oscillatory phase, we introduce a risk-aware Bregman learning algorithm [7] which is a speed-up-and-average version of (13) called mean dynamicsm¯of agiven by


with starting vector m¯0=a0,m¯̇0,m¯¨0.

Algorithm 3.The risk-aware Bregman learning pseudocode is as follows:

1:procedureRisk-aware Bregman m¯0ϵTgαβmhThe risk-aware Bregman learning starting fromm¯0within0T

2:       m¯m¯0=a0,m¯̇0,m¯¨0

3:       whileregret>ϵandtTdoWe have the answer if regret is 0

4:             Computem¯tsolution of (15)

5:             Compute regret

6:       end while

7:       returnm¯t,regrettgetm¯tand the regret

8:end procedure

Proposition 4.The time-average trajectory of the learning algorithm (13) generates the mean dynamics (15).

Proof.We use the average relation m¯t=1t0tasdswhere asolves Eq. (13). From the definition of m¯,and by Hopital’s rule, m¯0=a0.Moreover, m¯tand atshare the following equations:


Substituting these values in Eq. (13) yields the mean dynamics (15). This completes the proof.

The risk-aware Bregman dynamics (15) generates a less oscillatory trajectory due to its averaging nature. The next result provides an accuracy bound for (15).

Proposition 5.The risk-aware Bregman dynamics (15) satisfies


Proof.Let m¯t=1t0tasds.Then, m¯t=Ras1t1l0tsds.Thus, m¯t=Eμtawhere μtis the measure with density ts=1t1l0tds.By convexity of Emhaωwe apply the Jensen’s inequality:


In view of (14) one has


This completes the proof.

Definition 5.(Convergence time). Letδ>0andatbe the trajectory generated by Bregman algorithm starting froma0at timet0.The convergence time to be within a ballBEmhaωδof radiusδ>0from the centerrais given by


Proposition 6.Under the assumptions above, the error generated by the algorithm is at most (14) which means that it takes at mostTδ=β1logc0δtime units to the algorithm to be within a ballBraδof radiusδ>0from the centerEmhaω.

Proof.The proof is immediate. For δ>0the average regret bound of Proposition 5,


provides the announced convergence time bound. This completes the proof.

See Table 1 for detailed parametric functions on the bound Tδ.

Example 4.Letfy=ylogydefined onR+.Then,f1=0,and derivatives offarefy=1+logy,fy=1y>0.The Legendre-Fenchel transform offisfξ=y=eξ1.Leta1a2ga=a22,anda1a2ωra1a2ω=1+k=12ωik2ak2.The coefficientωdistribution is unknown but a sampled empirical measuremis considered to be similar to uniform distribution in01with104samples. We illustrate the quick convergence rate of the algorithm in a basic example and plot inFigure 2the trajectories under standard gradient, Bregman dynamics and risk-aware Bregman dynamics (15). In particular, we observe that risk-aware Bregman dynamics (15) provides very quickly a satisfactory value. In this particular setup, we observe that the accuracy of the risk-aware Bregman algorithm (15) att=0.5will need four times (t=2) less than the standard Bregman algorithm to reach a similar level of error. It takes40times moret=20than the gradient ascent to reach that level. Also, we observe that the risk-aware Bregman algorithm is less oscillatory and the amplitude decays very fast compared to the risk-neutral algorithm.

ConvergenceError boundTime-to-reach Tδ
Triple exponentialeeetc0logloglogc0δ
Double exponential rateeetc0loglogc0δ
Exponential rateetc0logc0δ
Polynomial order kc0tkc01/kδ1/k

Table 1.

Convergence rate under different set of functions.

Figure 2.

Gradient ascent vs. risk-aware Bregman dynamics forr=1+k=12ωk2ak2.


3. Constrained distributionally robust optimization

In the constrained case i.e., when Ais a strict subset of Rn+2,algorithms (10) and (13) present some drawbacks: The trajectory atmay not be feasible, i.e., atA×R+×Reven when it starts in A.In order to design feasible trajectories, projected gradient has been widely studied in the literature. However, a projection into Aat each time tinvolves additional optimization problems and the computation of the projected gradient adds extra complexity to the algorithm. We restrict our attention to the following constraints:


We impose the following feasibility condition: a¯l<a¯l,l1n,cl>0,l=1ncla¯l<b.Under this setting, the constraint set Ais non-empty, convex and compact.

We propose a method to compute a constrained solution that has a full support (whenever it exists). We do not use the projection operator. Indeed we transform the domain a¯la¯l=ξ01where ξxl=a¯lxl+a¯l1xl=al.ξis a one-to-one mapping and


The algorithm (18)


generates a trajectory atthat satisfies the constraint.

Algorithm 4.The constrained learning pseudocode is as follows:

1:procedureConstrained gradient a0ϵTgmhThe constrained learning algorithm starting froma0within0T

2:       aa0

3:       whileregret>ϵandtTdoWe have the answer if regret is 0

4:             Computeatsolution of (18)

5:             Computeregret

6:       end while

7:       returnat,regrettget a(t) and the regret

8:end procedure

Proposition 7.Ifb̂minlcla¯la¯lthen Algorithm(18) reduces to


Proof.It suffices to check that for b̂minlcla¯la¯l,the vector zdefined by zl=eylk=1neyksolves the replicator equation,


Thus, xl=eylk=1neykb̂cla¯la¯lsolves ẋl=xlelf̂a1b̂lelf̂axlcla¯la¯l.This completes the proof. 

Note that the dynamics of xin Eq. (19) is a constrained replicator dynamics [8] which is widely used in evolutionary game dynamics. This observation establishes a relationship between optimization and game dynamics and explains that the replicator dynamics is the gradient flow of the (expected payoff) under simplex constraint.

The next example illustrates a constrained distributionally robust optimization in wireless communication networks.

Example 5(Wireless communication). Consider a power allocation problem overnmedium access channels. The signal-to-interference-plus-noise ratio (SINR) is



  • N0>0is the background noise.

  • The interference on channel lis denoted Il0.One typical model for Ilis


  • ϵ>0is the height of the transmitter antenna.

  • ωllis the channel state at l.The channel state is unknown. Its true distribution is also unknown.

  • srlis the location of the receiver of l

  • stlis the location of the transmitter of l

  • o2,3,4is the pathloss exponent.

  • alis the power allocated to channel l.It is assumed to be between a¯l0and a¯lwith 0a¯l<a¯l<+.Moreover, a total power budget constraint is imposed l=1nala¯wherea¯>l=1na¯l0.

It is worth mentioning that the action constraint of the power allocation problem are similar to the ones analyzed in Section 3. The admissible action space is


Clearly,Ais a non-empty convex compact set. The payoff function is the sum-rateraω=l=1nWllog1+SINRlwhereWl>0.The mappingaωraωis continuously differentiable.

  • Robust optimization is too conservative: Part of the robust optimization problem[9, 7] consists of choosing the channel gainωll20ω¯llwere the boundω¯need to be carefully designed. However the worst case is achieved when the channel gain is zero:infωl0ω¯llraω=0.Hence the robust performance is zero. This is too conservative as several realizations of the channel may give better performance than zero. Another way is to re-design the boundsω¯llandω¯ll.But ifω¯ll>0it means that very low channel gains are not allowed, which may be too optimistic. Below we use the distributional robust optimization approach which eliminates this design issue.

  • Distributional robust optimization: By means of the training sequence or channel estimation method, a certain (statistical) distributionmis derived. Howevermcannot be considered as the true distribution of the channel state due to estimation error. The true distribution ofωis unknown. Based on this observation, an uncertainty setBρmwith radiusρ0is constructed for alternative distribution candidates. Note thatρ=0means thatB0m=m.The distributional robust optimization problem issupainfm˜BρmEm˜raω.In presence of interference, the functionraωis not necessarily concave ina.In absence of interference, the problem becomes concave.


4. Distributed optimization

This section presents distributed distributionally robust optimization problems over a direct graph. A large number of virtual agents can potentially choose a node (vertex) subject to constraint. The vector arepresents the population state. Since ahas ncomponents, the graph has nvertices. The interactions between virtual agents are interpreted as possible connections of the graph. Let us suppose that the current interactions are represented by a directed graph G=LE, where EL2is the set of links representing the possible interaction among the proportion of agents, i.e., if lkE, then the component lof acan interact with the kth component of a. In other words, lkEmeans that virtual agents selecting the strategy lLcould migrate to strategy kL.Moreover, Λ01n×nis the adjacency matrix of the graph G, and whose entries are λlk=1, if lkE; and λlk=0, otherwise.

Definition 6.The distributionally robust fitness function is the marginal distributionally robust payoff function. IfaEmhaωis continuously differentiable, the distributionally robust fitness function isEmahaω.

Definition 7.The virtual population stateais an equilibrium ifaAand it solves the variational inequality


Proposition 8.Let the set of virtual population stateAbe non-empty convex compact andbEmhbωbe continuous. Then the following conditions are equivalent:

  • ab,Emhaω0,bA.

  • the actionasatisfiesa=projAa+ηEmhaω

Proof.Let abe a feasible action that solves the variational inequality:


Let η>0.By multiplying both sides by η,we obtain


We add the term abato both sides to obtain the following relationships:


Recall that the projection operator on a convex and closed set Ais uniquely determined by




This completes the proof.

As a consequence we can derive the following existence result.

Proposition 9.Let the set of virtual population statesAbe a non-empty convex compact and the mappingbEmhbωbe continuous. Then, there exists at least one equilibrium inA.

Proof.A direct application of the Brouwer-Schauder’s fixed-point theorem which states that if ϕ:AAis continuous and Anon-empty convex compact then ϕhas at least one fixed-point in A.Here we choose ϕa=projAa+ηEmhaω.Clearly ϕAAand ϕis continuous on Aas the mapping bEmhbωand the projection operator bprojAbare both continuous. Then the announced result follows. This completes the proof.

Note that we do not need sophisticated set-valued fixed-point theory to obtain this result.

Definition 8.The virtual population stateais evolutionarily stable ifaAand for any alternative deviant statebathere is an invasion barrierϵb>0such that


The function ϱ:A×Rn×R+n×nRn×nis the revision protocol, which describes how virtual agents are making decisions. The revision protocol ϱtakes a population state a,the corresponding fitness Emh, the adjacency matrix Λand returns a matrix. Therefore, let ϱlkahΛbe the switching rate from the lthto kthcomponent. Then, the virtual agents selecting the strategy lLhave incentives to migrate to the strategy lLonly if ϱlkahΛ>0, and it is also possible to design switch rates depending on the topology describing the migration constraints, i.e., λlk=0ϱlkahΛ=0.The distributed distributionally robust optimization consists to perform the optimization problem above over the distributed network that is subject to communication restriction. We construct a distributed distributionally robust game dynamics to perform such a task. The distributed distributionally robust evolutionary game dynamics emerge from the combination of the (robust) fitness hand the constrained switching rates ϱ.The evolution of the portion alis given by the distributed distributional robust mean dynamics


Since the distributionally robust function his obtained after the transformation from payoff function rby means of triality theory, the dynamics (22) is seeking for distributed distributionally robust solution.

Algorithm 5.The distributed distributional robust mean dynamics pseudocode is as follows:

1:procedurePopulation-inspired algorithm a0ϵTϱgmhΛThe population-inspired learning starting froma0within0T

2:       aa0

3:       whileregret>ϵandtTdoWe have the answer if regret is 0

4:             Computeatsolution of (22)

5:             Computeregrett

6:       end while

7:       returnat,regrettgetatand the regret

8:end procedure

The next example establishes evolutionarily stable state, equilibria and rest-point of the dynamics (22) by designing ϱ.

Example 6.Let us consider a power system that is composed of10generators, i.e., letL=110. LetalR+be the power generated by the generatorlL. Each power generation should satisfy the physical and/or operation constraintsala¯la¯l, for alllL. It is desired to satisfy the power demand given bydR, i.e., it is necessary to guarantee thatlLal=d,i.e., the supply meets the demand. The objective is to minimize the generation quadratic costs for all the generators, i.e.,


where r:RnRis concave, and the parameters are possibly uncertain and selected as c0l=25+6l, c1l=15+4l+ω1l, c2l=5+l+ω2l, and d=20+ω3l. Therefore, the fitness functions for the corresponding full potential game are given by fla=2alc2lc1l, for all lL, and action space is given by


The distributed revision protocol is set to


for al0.We evaluate four different scenarios, i.e.,

  1. a¯=0nanda¯=d1ln,

  2. a¯l=0, for alllL\910, a¯9=1.1, anda¯10=1; anda¯l=d, for alllL\12, a¯1=3, anda¯2=2.5,

  3. Case 1 constraints and with interaction restricted to the cycle graphG=LEwith set of linksE=lL\nll+1n1,

  4. Case 2 constraints and with interaction restricted as in Case 3.

Figure 3 presents the evolution of the generated power, the fitness functions corresponding to the marginal costs and the total cost. For the first scenario, the evolutionary game dynamics converge to a standard evolutionarily stable state in which f̂a=c1n. In contrast, for the second scenario, the dynamics converge to a constrained evolutionarily stable state.

Figure 3.

Economic power dispatch. Evolution of the population states (generated power), fitness functionsf̂a=Ehaω, and the costsEraω. Figures (a)-(c) for case 1, (d)-(f) for case 2, (g)-(i) for case 3, and (j)-(l) for case 4.

4.1. Extension to multiple decision-makers

Consider a constrained game Gin strategic-form given by

  • P=1Pis the set of players. The cardinality of Pis P2.

  • Player phas a decision space ApRnp,np1.Players are coupled through their actions and their payoffs. The set of all feasible action profiles is ARn,with n=pPnp.Player pcan choose an action apin the set Apap=apAp:apapA.

  • Player phas a payoff function rp:AR.

We restrict our attention to the following constraints:


The coupled constraint is


Feasibility condition: If a¯pl<a¯pl,l1np,cpl>0,l=1npcpla¯pl<bp,c¯pR>0npand pPc¯pa¯p<b¯,the constraint set Ais non-empty, convex and compact.

We propose a method to compute a constrained equilibrium that has a full support (whenever it exists). We do not use the projection operator. Indeed we transform the domain a¯pla¯pl=ξ01where ξxpl=a¯plxpl+a¯pl1xpl=apl.ξis a one-to-one mapping and


The learning algorithm (23) is


generates a trajectory apt=apltlthat satisfies the constraint of player pat any time t.


5. Notes

The work in [10] provides a nice intuitive introduction to robust optimization emphasizing the parallel with static optimization. Another nice treatment [11], focusing on robust empirical risk minimization problem, is designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance [12, 13]. f-divergence based performance evaluations are conducted in [11, 14, 15]. The connection between risk-sensitivity measures such as the exponentiated payoff and distributionally robustness can be found in [16]. Distributionally robust optimization and learning are extended to multiple strategic decision-making problems i.e., distributionally robust games in [17, 18].



We gratefully acknowledge support from U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Jian Gao, Yida Xu, Julian Barreiro-Gomez, Massa Ndong, Michalis Smyrnakis and Hamidou Tembine (September 5th 2018). Distributionally Robust Optimization, Optimization Algorithms - Examples, Jan Valdman, IntechOpen, DOI: 10.5772/intechopen.76686. Available from:

chapter statistics

1311total chapter downloads

2Crossref citations

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

Related Content

This Book

Next chapter

Polyhedral Complementarity Approach to Equilibrium Problem in Linear Exchange Models

By Vadim I. Shmyrev

Related Book

First chapter

Digital Image Processing with MATLAB

By Mahmut Sinecen

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us