Open access peer-reviewed chapter

Approximate Dynamic Programming: An Efficient Machine Learning Algorithm

Written By

Zhou Shaorui, Cai Ming and Zhuo Xiaopo

Submitted: 24 May 2022 Reviewed: 21 July 2022 Published: 05 September 2022

DOI: 10.5772/intechopen.106691

From the Edited Volume

Multi-Agent Technologies and Machine Learning

Edited by Igor A. Sheremet


Abstract

We propose an efficient machine learning algorithm for two-stage stochastic programs. This algorithm, termed the projected stochastic hybrid learning algorithm, combines stochastic sub-gradient and piecewise linear approximation methods. We use stochastic sub-gradients and sample information to update a piecewise linear approximation of the objective function. We then introduce a projection step, which applies the sub-gradient method, to escape local optima so that a global optimum can be reached. By means of this projection step, we establish the convergence of the algorithm for general two-stage stochastic programs. Furthermore, for the network recourse problem, the algorithm can drop the projection steps and still maintain the convergence property. Thus, if the initial piecewise linear functions are properly constructed, the pure piecewise linear approximation method is convergent for general two-stage stochastic programs. The proposed approximate dynamic programming algorithm handles high-dimensional state variables using methods from machine learning, and its logic captures the critical ability of the network structure to anticipate the impact of current decisions on the future. The optimization framework, carefully calibrated against historical performance, makes it possible to introduce changes in the decisions and to capture the collective intelligence of experienced decision makers. Computational results indicate that the algorithm exhibits rapid convergence.

Keywords

  • stochastic programming
  • piecewise linear approximation
  • machine learning
  • network
  • approximate dynamic programming

1. Introduction

Optimal learning addresses the challenge of collecting information, as efficiently as possible, to make a decision in the present that minimizes the expected future cost under uncertainty. Collecting information is usually time consuming and expensive. For example, several large shippers, such as Amazon, Walmart, and IKEA, need to decide the quantity of products to ship from plants to warehouses to satisfy retailer demand. The retailer usually makes these decisions before knowing the real demand; then, once the retail demand is known, the shipping plans between retailers and warehouses are optimized. Problems of this kind can generally be treated as two-stage stochastic programs. The decisions made now (Stage 1) determine the state of the problem to be solved in the future (Stage 2). Therefore, an optimal decision can be made now if we can compute the expected cost function (the recourse function) of Stage 2. In this chapter, we propose an efficient machine learning algorithm that collects information efficiently based on the knowledge gradient and solves the problem optimally. The main difference between MAT and the proposed algorithm is that our algorithm collects information based on knowledge gradients and overcomes the “curse of dimensionality”. Besides, it transforms the problem into a polynomially solvable problem and has been proven convergent theoretically.

1.1 Motivation

Optimal learning is a rich field that includes contributions from different communities. This chapter focuses on optimal learning in two-stage stochastic programs, which are practically important problems. Problems of this type arise in several areas of dynamic programming, in which the decision maker needs to make temporal and spatial decisions before realizing the events that will influence those decisions. For example, in empty container repositioning problems [1], shipping companies need to reposition empty containers before the demand is realized. In locomotive planning problems [2], railroads have to decide the schedule of trains to which locomotives are assigned before disruptions occur across the railway network. In relief distribution problems [3], humanitarian decision makers need to distribute emergency aid to disaster locations when relief materials are very scarce amidst great uncertainty. In job scheduling problems [4], managers need to decide initial staffing levels and working plans before the demand is realized. Most of the aforementioned applications are fully sequential problems, and they can be modeled as two-stage stochastic programming problems. Hence, the study of two-stage stochastic optimization in this chapter is very important. However, the main obstacle in most practical problems is that the expected cost function in Stage 2 is quite complex due to uncertainty. In this chapter, we propose a hybrid learning algorithm, called the projected stochastic hybrid learning algorithm (ProSHLA), to approximate the expected recourse function for two-stage stochastic programs. To demonstrate the efficiency of the algorithm, we also prove its convergence mathematically.

In essence, ProSHLA is a hybrid of stochastic sub-gradient and piecewise linear approximation methods. The core of ProSHLA consists of a series of learning steps that provide information for updating the recourse function through a sequence of piecewise linear separable approximations, and a series of projection steps that guarantee convergence by implementing the stochastic sub-gradient method. Both the mathematical analysis and the computational results demonstrate that, when the initial piecewise linear approximation function is properly constructed for two-stage stochastic programs with network recourse, the learning algorithm can drop the projection steps without sacrificing convergence. Moreover, without the projection step, the learning algorithm consists only of a series of learning steps through a sequence of piecewise linear separable approximations and can solve complex practical problems very efficiently. This finding helps practitioners and scholars understand an open problem that has puzzled them for decades: why the piecewise linear approximation method is efficient and convergent in practice for stochastic programs with network recourse. In this chapter, our analytical results provide the first theoretical support for the use of the piecewise linear approximation method in solving practical problems.

1.2 Literature review

In this chapter, we consider the two-stage stochastic programming problem as follows:

$$\min_x \; c_0^T x + \mathbb{E}_\omega\left[Q(x,\omega)\right] \qquad (1)$$

s.t.

$$Ax = b, \qquad x \ge 0,$$

where $X \subseteq \mathbb{R}^n$ denotes a convex compact set and the recourse function $Q(x,\omega)$ denotes the optimal value of the second-stage problem:

$$Q(x,\omega) = \min_y \; c_1^T y(\omega) \qquad (2)$$

s.t.

$$W(\omega)\, y = h(\omega) - T(\omega)\, x, \qquad y(\omega) \ge 0 \;\; \forall \omega.$$

In the above model, $x$ and $y$ denote the decision variables of the stage 1 and stage 2 problems, respectively; $A$ and $W(\omega)$ are constraint matrices; and $c_0$ and $c_1$ denote the first- and second-stage vectors of cost coefficients, respectively.
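To make the two-stage structure concrete, the following sketch evaluates the objective in Eq. (1) by solving the recourse LP in Eq. (2) for sampled scenarios. It is only an illustration: the toy data, the `second_stage_cost` and `expected_total_cost` helpers, and the use of `scipy.optimize.linprog` are our own assumptions, not part of the chapter's model.

```python
# Minimal sketch of the two-stage structure in Eqs. (1)-(2), using a toy
# instance with scipy.optimize.linprog. All data below are illustrative
# assumptions, not values from the chapter.
import numpy as np
from scipy.optimize import linprog

c0 = np.array([1.0, 2.0])                       # first-stage costs
A_eq, b_eq = np.array([[1.0, 1.0]]), np.array([10.0])

def second_stage_cost(x, omega):
    """Q(x, omega): recourse LP  min c1'y  s.t.  W y = h(omega) - T x,  y >= 0."""
    c1 = np.array([3.0, 1.0])
    W = np.array([[1.0, 1.0]])
    T = np.array([[0.5, 0.5]])
    h = np.array([omega])
    res = linprog(c1, A_eq=W, b_eq=h - T @ x, bounds=[(0, None)] * 2, method="highs")
    return res.fun

def expected_total_cost(x, scenarios):
    """Sample estimate of c0'x + E[Q(x, omega)] over a list of demand scenarios."""
    return c0 @ x + np.mean([second_stage_cost(x, w) for w in scenarios])

x_trial = np.array([4.0, 6.0])                  # any first-stage decision with A x = b
print(expected_total_cost(x_trial, scenarios=[8.0, 12.0, 15.0]))
```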

Stochastic programming models and solution methods have been examined by many researchers; comprehensive reviews and discussions are provided by Wallace and Ziemba [5]. The expected recourse function is extremely complex to evaluate except in a few special cases. The various approximation methods can be categorized into four groups. Let $\hat{Q}(x)$ denote the approximate function. The first group includes scenario methods, which use the sample average of $Q(x,\omega_i)$ over several samples $\omega_1, \omega_2, \ldots, \omega_N$ to approximate the expected recourse function [6]. The approximation function is usually updated as follows:

$$\hat{Q}(x) = \frac{1}{N}\sum_{i=1}^{N} Q(x,\omega_i)$$

Generally, the scenario method is very efficient, but it cannot guarantee convergence to the optimal solution.
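The following minimal sketch illustrates the sample-average idea behind the scenario method. The newsvendor-style recourse function and all parameter values are illustrative assumptions, not the chapter's application.

```python
# Hedged sketch of the scenario (sample-average) approximation: Q_hat(x) is the
# average of Q(x, omega_i) over N sampled scenarios. A newsvendor-style recourse
# is used purely for illustration.
import numpy as np

def recourse(x, demand, backorder_cost=9.0, holding_cost=1.0):
    """Q(x, omega): cost of under- or over-shipping once demand omega is known."""
    return backorder_cost * max(demand - x, 0.0) + holding_cost * max(x - demand, 0.0)

def saa_estimate(x, rng, n_samples=1000, mean_demand=30.0):
    samples = rng.poisson(mean_demand, size=n_samples)   # omega_1, ..., omega_N
    return np.mean([recourse(x, d) for d in samples])

rng = np.random.default_rng(0)
for x in (20, 30, 40):
    print(x, saa_estimate(float(x), rng))
```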

The second group consists of stochastic gradient techniques [7, 8], which update the solution by using stochastic sub-gradients as search directions. Usually, the approximate function is successively updated as follows:

$$\hat{Q}(x) = \bar{g}_k^T x \qquad (3)$$

where $\bar{g}_k$ denotes a smoothed estimate of the gradient of the expected recourse function at $x$ for iteration $k$. This method can be proven convergent by projection [9] or recursive linearization [10], although its drawback is that it is time-consuming.
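A minimal sketch of a projected stochastic sub-gradient scheme of this type is given below, reusing the newsvendor-style recourse from the previous sketch. The smoothing rule, the box feasible set, and all constants are illustrative assumptions.

```python
# Sketch of a stochastic sub-gradient scheme in the spirit of Eq. (3): a sampled
# sub-gradient g_k is smoothed into g_bar and the iterate is projected back onto
# X (here the box [0, 60], an illustrative feasible set).
import numpy as np

def stochastic_subgradient(x, demand, backorder_cost=9.0, holding_cost=1.0):
    """A sub-gradient of the newsvendor-style recourse used above."""
    return holding_cost if x >= demand else -backorder_cost

rng = np.random.default_rng(1)
x, g_bar = 10.0, 0.0
for k in range(1, 2001):
    alpha = 1.0 / k                              # step size alpha_k
    g = stochastic_subgradient(x, rng.poisson(30.0))
    g_bar = (1 - alpha) * g_bar + alpha * g      # smoothed gradient estimate
    x = min(max(x - alpha * g_bar, 0.0), 60.0)   # projection onto X = [0, 60]
print(x)
```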

The third group mainly consists of primal and dual decomposition methods. The use of these methods dates back to Benders decomposition [11]. Van Slyke and Wets [12] first adopted the L-shaped algorithm as an application of Benders decomposition to two-stage stochastic programs. Pereira and Pinto [13] proposed the stochastic dual dynamic programming (SDDP) method, which has been widely applied in many areas. SDDP uses Benders cuts to compute an outer approximation of a (convex) recourse function and constructs feasible dynamic programming policies. SDDP has led to numerous related approximation methods that follow the same logic but seek to improve the approximation procedures by exploiting the structure of particular applications, including the use of inexact cuts [14], risk-averse variants [15], and embedding SDDP in the scenario tree framework [16]. The convergence of SDDP and related methods has been proven for linear programs by Philpott and Guan [17] and for convex programs by Girardeau et al. [18].

The fourth group includes separable approximation methods [19, 20]. Methods of this type replace the expected recourse function in Eq. (1) with separable approximation functions of the form:

$$\hat{Q}(x) = \sum_{i=1}^{I} \hat{Q}_i(x_i) \qquad (4)$$

If the separable functions $\hat{Q}_i(x_i)$ are piecewise linear or linear, we can replace the expected recourse function in Eq. (1) with $\hat{Q}(x)$ and, for network recourse problems, solve the resulting problem as a pure network flow problem, which is polynomially solvable and therefore very efficient. For example, Godfrey and Powell [21] proposed an adaptive piecewise concave approximation (CAVE) algorithm whose experimental performance is exceptionally good; however, no provable convergence results were given in their study. In order to provide convergent solutions, Cheung and Powell [19] proposed an approximation algorithm (SHAPE) that uses sequences of strongly convex approximation functions. However, strong convexity requires a nonlinear term, which might destroy the pure network structure and demand additional computational effort. This chapter introduces an accurate and efficient approximation with the convergence property.
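The sketch below shows one possible data structure for a separable piecewise linear approximation in the sense of Eq. (4): each coordinate keeps a grid of segment slopes and the total approximation is the sum of one-dimensional pieces. The class name, the granularity δ, and the slope values are illustrative assumptions.

```python
# Sketch of a separable piecewise linear approximation as in Eq. (4): each
# coordinate i keeps a grid of breakpoints with per-segment slopes, and
# Q_hat(x) is the sum of the one-dimensional pieces.
import numpy as np

class SeparablePWL:
    def __init__(self, slopes, delta=1.0):
        # slopes[i][s] = slope of coordinate i on segment [s*delta, (s+1)*delta)
        self.slopes = [np.asarray(s, dtype=float) for s in slopes]
        self.delta = delta

    def value(self, x):
        total = 0.0
        for xi, sl in zip(x, self.slopes):
            full, frac = divmod(xi, self.delta)
            full = int(full)
            total += sl[:full].sum() * self.delta + sl[min(full, len(sl) - 1)] * frac
        return total

q_hat = SeparablePWL(slopes=[[-4.0, -1.0, 2.0], [-3.0, 0.5, 1.5]], delta=1.0)
print(q_hat.value([1.5, 2.0]))
```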

1.3 Contributions of the algorithms

In this chapter, we aim to develop a convergent method that can efficiently approximate the expected recourse function for two-stage stochastic programs. The main contributions are as follows:

  1. We propose a new convergent hybrid learning algorithm to approximate the expected recourse function for two-stage stochastic programs.

  2. Through rigorous mathematical analysis, we prove the convergence of the proposed algorithm for general two-stage stochastic programs. The computational results and the mathematical analysis both reveal that the algorithm can drop the projection step without sacrificing convergence for two-stage stochastic programs with network recourse if the initial piecewise linear approximation functions are properly constructed. This means that a pure piecewise linear approximation can indeed be convergent, which is highly consistent with industry practice. This finding answers the open question that has puzzled scholars for more than a decade: why does the piecewise linear approximation work well for two-stage stochastic programs in industry? Our mathematical analysis provides the first theoretical support.

  3. A series of performance analyses has been conducted. The computational results reveal the efficiency of the proposed algorithms and show that they are distribution-free. Furthermore, the convergence rate is affected by the granularity of the initial function (δ): a small granularity usually leads to a high convergence rate. Finally, the computational results also show that the proposed algorithm is very competitive for high-dimensional problems.

  4. Compared with MAT, the proposed algorithm collects information based on knowledge gradients and uses it to update the recourse function through learning steps. It overcomes the “curse of dimensionality”. Moreover, it transforms the problem into a polynomially solvable problem.

The remainder of this chapter is organized as follows. Section 2 presents the description and convergence analysis of the algorithm for general two-stage stochastic programs. The algorithm without projection steps for two-stage stochastic programs with network recourse is presented in Section 3. Section 4 reports computational experiments based on an application to the empty container repositioning problem. Section 5 presents the conclusions and outlines directions for future research.


2. Description and convergence analysis of ProSHLA for general two-stage stochastic programs

In this section, ProSHLA is first introduced. Subsequently, we analyze the convergence of ProSHLA for general two-stage stochastic programs.

2.1 Description of ProSHLA

To present ProSHLA mathematically, we let, at each iteration k,

$\alpha_k$ = (possibly random) positive step size;

$\bar{Q}(x)$ = expected recourse function, that is, $\mathbb{E}_\omega[Q(x,\omega)]$;

$\hat{Q}^k(x)$ = a convex differentiable approximation of $\bar{Q}(x)$ at iteration $k$;

$\hat{q}^k(x)$ = a subgradient of $\hat{Q}^k(x)$ at $x$, that is, $\hat{q}^k(x) \in \partial\hat{Q}^k(x)$;

$\bar{g}_k$ = a smoothed estimate of the gradient of $\bar{Q}(x)$ at iteration $k$;

$g_k$ = a stochastic subgradient of $\bar{Q}(x)$ at $x_k$, that is, $g_k \in \partial Q(x_k, \omega_{k+1})$;

$H_k = \{\omega_1, \omega_2, \ldots, \omega_k\}$ = the history up to and including iteration $k$.

For a general non-smooth convex function Q̂x, its sub-differential can be defined as follows:

$$\partial\hat{Q}(x) = \left\{\hat{q}(x) \in \mathbb{R}^n : \hat{Q}(y) - \hat{Q}(x) \ge \hat{q}(x)^T (y - x), \;\; \forall y \right\}.$$

We combine Eqs. (1), (3), and (4) to form an approximation at iteration k as follows:

$$\min_x \; c_0^T x + \hat{Q}^0(x) + \bar{g}_k^T x \qquad (5)$$

In this study, we approximate the expected recourse function at iteration $k$ via a convex, differentiable approximation $\hat{Q}^0(x)$ with a linear correction term $\bar{g}_k^T x$. At each iteration, the linear correction term is introduced to improve the initial approximation $\hat{Q}^0(x)$. Note that here we use a convex initial approximation function $\hat{Q}^0(x)$, whereas SHAPE uses strongly convex approximation functions. SHAPE introduces a nonlinear term into the approximation function to maintain strong convexity, which might destroy the pure network flow problem structure and demand additional computational effort. Moreover, we do not calculate $\bar{g}_k$ in the usual manner of stochastic sub-gradient methods. Instead, we use the following form in our model:

$$\min_x \; c_0^T x + \hat{Q}^k(x) + \alpha_k\left(g_k - \hat{q}^k(x_k)\right)^T x, \qquad (6)$$

where $\hat{Q}^k(x)$ is updated as follows:

$$\hat{Q}^{k+1}(x) = \hat{Q}^k(x) + \alpha_k\left(g_k - \hat{q}^k(x_k)\right)^T x \qquad (7)$$

The greatest merit of updating $\hat{Q}^{k+1}(x)$ in this way is that it retains the stochastic sub-gradients $(\hat{q}^k(x_k), \hat{q}^{k-1}(x_{k-1}), \ldots, \hat{q}^0(x_0))$ used in the previous iterations. Thus, at iteration $k$, the objective function involves a weighted average of the stochastic sub-gradients from the past $k-1$ iterations. As shown later in Lemma 2, $\bar{g}_k$ in Eq. (5) is a linear combination of $g_1, g_2, \ldots, g_{k-1}$.
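Because every update in Eq. (7) adds a purely linear term, the approximation can be stored compactly as the initial function plus an accumulated linear correction, which is essentially the content of Lemma 2 below. The following sketch shows one way to implement this bookkeeping; the class and callables are hypothetical, not the authors' implementation.

```python
# Sketch of the update rule in Eq. (7): since every update adds a linear term,
# the approximation can be stored as Q_hat^k(x) = Q_hat^0(x) + g_bar_k' x, where
# g_bar_k accumulates alpha_k * (g_k - q_hat^k(x_k)).
import numpy as np

class LinearlyCorrectedApprox:
    def __init__(self, q0_value, q0_subgrad, dim):
        self.q0_value, self.q0_subgrad = q0_value, q0_subgrad
        self.g_bar = np.zeros(dim)            # accumulated linear correction

    def value(self, x):
        return self.q0_value(x) + self.g_bar @ x

    def subgrad(self, x):
        return self.q0_subgrad(x) + self.g_bar

    def update(self, x_k, g_k, alpha_k):
        # Q_hat^{k+1}(x) = Q_hat^k(x) + alpha_k * (g_k - q_hat^k(x_k))' x
        self.g_bar += alpha_k * (g_k - self.subgrad(x_k))

approx = LinearlyCorrectedApprox(lambda x: x @ x, lambda x: 2 * x, dim=2)
approx.update(np.array([1.0, 2.0]), g_k=np.array([0.5, -1.0]), alpha_k=0.5)
print(approx.value(np.array([1.0, 1.0])), approx.subgrad(np.array([1.0, 1.0])))
```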

Let $\Pi_X : \mathbb{R}^n \to X$ be the orthogonal projection onto $X$ [9]. Then, we can obtain a sequence of solutions $\{x_k\}$ using the following procedure (Figure 1).

Figure 1.

The procedure of ProSHLA.

Generally, ProSHLA consists of two levels of loops. The first-level loop contains a series of passes, and the second-level loop contains a series of projection steps (steps 5 and 6). We first construct an initial bounded, piecewise linear, convex approximation function $\hat{Q}^0(x)$ at the beginning of the first pass, and the initial solution $x_0$ is obtained by solving problem (1) with this approximation. A realization of the random quantity $\omega \in \Omega$ is drawn, and a stochastic sub-gradient of $\bar{Q}(x)$ is obtained by solving the resulting deterministic problem. The difference between the slope of $\hat{Q}^0(x)$ and the stochastic sub-gradient at $x = x_0$ is then used as a linear term to update $\hat{Q}^0(x)$. Subsequently, a new solution $x_{k+1}$ is obtained from the updated approximation function. If the sub-gradient vector $\hat{q}^{\bar{m}}(x_{k+1})$ at the newly obtained solution $x_{k+1}$ equals the sub-gradient at solution $x_{\bar{m}}$, that is $\hat{q}^{\bar{m}}(x_{\bar{m}})$, the piecewise linear approximation might have fallen into a local optimum. ProSHLA then needs to escape from the local optimum by implementing projection steps in the second-level loop. If a new solution $x_{k+1}$ obtained in the second-level loop has a sub-gradient $\hat{q}^{\bar{m}}(x_{k+1})$ different from $\hat{q}^{\bar{m}}(x_{\bar{m}})$, ProSHLA exits the second-level loop, which ends the current pass, and the entire process is repeated. Finally, ProSHLA terminates when the total absolute change in $\hat{Q}^l(x)$ over a certain number of iterations is small, for example $\sum_{l=k-M+1}^{k}\left\|\hat{Q}^l(x) - \hat{Q}^{l-1}(x)\right\| < \delta$.
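The following sketch outlines the two-level loop just described. The problem-specific pieces (solving the approximate problem (6), sampling ω, computing a stochastic sub-gradient, and projecting onto X) are assumed to be supplied as callables, and an approximation object with `subgrad`/`update` methods (such as the one sketched earlier) is assumed; a fixed iteration budget replaces the change-based stopping rule of the text.

```python
# High-level sketch of the two-level loop of ProSHLA described above. Names and
# signatures are illustrative assumptions, not the authors' implementation.
import numpy as np

def proshla(approx, solve_approx_problem, sample_omega, stochastic_subgrad,
            project_onto_X, n_iterations=200):
    x = solve_approx_problem(approx)                 # initial solution from Q_hat^0
    for k in range(1, n_iterations + 1):
        alpha = 1.0 / k
        omega = sample_omega()
        g = stochastic_subgrad(x, omega)             # g_k in dQ(x_k, omega_{k+1})
        q_prev = approx.subgrad(x)                   # slope of current approximation
        approx.update(x, g, alpha)                   # learning step, Eq. (7)
        x_new = solve_approx_problem(approx)
        # Projection steps: if the new point sees the same slope, the piecewise
        # linear model may be stuck at a local corner; escape with sub-gradient steps.
        for _ in range(50):                          # bounded number of projection steps
            if not np.allclose(approx.subgrad(x_new), q_prev):
                break
            x_new = project_onto_X(x_new - alpha * stochastic_subgrad(x_new, omega))
        x = x_new
    return x
```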

Here we point out the main differences between SHAPE and ProSHLA. The most remarkable difference is that ProSHLA uses convex approximation functions, while SHAPE uses strongly convex approximation functions. Strong convexity requires a nonlinear term in the approximation function, and this term might destroy the pure network flow structure and cause additional computational effort. To overcome this drawback of SHAPE, we introduce the projection step in the second-level loop when constructing the approximation functions. In particular, the approximation functions in ProSHLA are not required to be strictly convex, whereas they must be in SHAPE. Without the projection step in the second-level loop, ProSHLA might get stuck at a corner local optimum for stochastic linear programs. With it, ProSHLA works well for most practical stochastic linear programs, because most practical stochastic programs are piecewise convex problems.

2.2 Convergence analysis of ProSHLA

In this subsection, we first state the convergence theorem of ProSHLA. Then, several properties of the approximation are presented. Finally, we use these properties to prove the convergence of ProSHLA.

We make the following assumptions.

(A.1) $X \subseteq \mathbb{R}^n$ is compact and convex.

(A.2) $\mathbb{E}_\omega[Q(x_1,\omega)]$ is convex, finite, and continuous on $X$.

(A.3) $g_k$ is bounded such that $\|g_k\| \le c_1$ for each $\omega \in \Omega$; $\hat{q}^k$ is bounded such that $\|\hat{q}^k\| \le c_2$ for each $\omega \in \Omega$.

(A.4) The piecewise linear functions $\hat{Q}^k(x)$ are convex, implying that

$$\hat{Q}^k(x_1) - \hat{Q}^k(y_1) \le \hat{q}^k(x_1)^T (x_1 - y_1).$$

(A.5) The stepsizes $\alpha_k$ are $H_k$-measurable and satisfy

$$0 < \alpha_k < 1, \qquad \sum_{k=0}^{\infty}\alpha_k = \infty \;\; a.s., \qquad \sum_{k=0}^{\infty} \mathbb{E}\left[\alpha_k^2\right] < \infty.$$

In addition to assumptions (A.1)–(A.5), we introduce the following assumption to characterize the piecewise linear convex approximation functions.

(A.6) There exist a positive $b$ and a constant $\delta$ such that for any two points $x_1, y_1 \in X$, if $\|x_1 - y_1\| > \delta$, then $\|\hat{q}(x_1) - \hat{q}(y_1)\| \ge b\|x_1 - y_1\|$. If there exist $\hat{q}(x_1)$ and $\hat{q}(y_1)$ such that $\|\hat{q}^k(x_1) - \hat{q}^k(y_1)\| = 0$, then $\|x_1 - y_1\| \le \delta$. If $\delta \to 0$, the function corresponds to a strongly convex function; if $\delta \to \infty$, the function becomes purely linear.

Given assumptions (A.1)–(A.6), we obtain the following convergence theorem for ProSHLA.

Theorem 1. If assumptions (A.1)–(A.6) are satisfied, then the sequence $\{x_1^k\}$ generated by ProSHLA converges almost surely to the optimal solution $x_1^* \in X$ of problem (1).

To prove Theorem 1, we need the following martingale convergence theorem and three lemmas.

Martingale Convergence Theorem. A sequence of random variables $W_k$, which are $H_k$-measurable, is said to be a supermartingale if the conditional expectations $\mathbb{E}[W_{k+1} \mid H_k]$ exist and satisfy $\mathbb{E}[W_{k+1} \mid H_k] \le W_k$.

Theorem 2 (from Ref. [22]). Let $W_k$ be a positive supermartingale. Then $W_k$ converges almost surely to a finite random variable.

From the above theorem, we can see that $W_k$ is essentially a stochastic analogue of a decreasing sequence.

Based on convexity, the optimal solution of the approximate problem at iteration $\bar{m}$ can be characterized by the following inequality:

$$\hat{q}^{\bar{m}}\left(x_1^{\bar{m}}\right)^T\left(x_1 - x_1^{\bar{m}}\right) \ge 0, \qquad \forall x_1 \in X \qquad (8)$$

To obtain Theorem 1, the following three lemmas are required. The first lemma shows that the difference between the solutions of two consecutive update processes is bounded in terms of the step size and the stochastic gradient. The second lemma indicates that the approximation $\hat{Q}^k(x_1)$ is finite. The third lemma shows that $T_k$ (defined before Lemma 3) is bounded.

Lemma 1. For any two iterations $j \in [\overline{m+1}, \overline{m+2})$ and $i \in [\bar{m}, \overline{m+1})$, the solutions $x_1^j$ and $x_1^i$ obtained by ProSHLA satisfy the following inequality:

$$\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^j\right) \le \alpha_{\bar{m}}\, c_1^2 / b \qquad (9)$$

Proof. Consider first the special case where $i$ and $j$ correspond to two consecutive iterations. Let $i = \overline{m+1} - 1$ and $j = \overline{m+1}$. Based on (8), we obtain

$$\hat{q}^{\overline{m+1}}\left(x_1^{\overline{m+1}}\right)^T\left(x_1 - x_1^{\overline{m+1}}\right) \ge 0, \qquad \forall x_1 \in X \qquad (10)$$

According to the updating rule of the approximation function, we can conclude that

$$\left[\hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}}\right) + \alpha_{\overline{m+1}-1}\left(g_{\overline{m+1}-1} - \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)\right)\right]^T\left(x_1 - x_1^{\overline{m+1}}\right) \ge 0, \qquad \forall x_1 \in X \qquad (11)$$

Substituting $x_1 = x_1^{\overline{m+1}-1}$ in Eq. (11), we obtain

$$\alpha_{\overline{m+1}-1}\left(g_{\overline{m+1}-1} - \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)\right)^T\left(x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right) \ge \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}}\right)^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right) \qquad (12)$$

Rearranging the above terms yields the inequality

$$\alpha_{\overline{m+1}-1}\, g_{\overline{m+1}-1}^T\left(x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right) \ge \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}}\right)^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right) - \alpha_{\overline{m+1}-1}\,\hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right)$$
$$= \left[\hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}}\right) - \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)\right]^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right) + \left(1 - \alpha_{\overline{m+1}-1}\right)\hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right) \qquad (13)$$

When iterations $\overline{m+1}-1$ and $\overline{m+1}$ are not in the same update process, $\hat{q}^{\overline{m+1}-1}(x_1^{\overline{m+1}-1}) \ne \hat{q}^{\overline{m+1}-1}(x_1^{\overline{m+1}})$. According to assumption (A.6), $\left\|\hat{q}^{\overline{m+1}-1}(x_1^{\overline{m+1}-1}) - \hat{q}^{\overline{m+1}-1}(x_1^{\overline{m+1}})\right\| \ge b\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\|$.

According to Eqs. (10) and (13) and $0 < \alpha_{\overline{m+1}-1} < 1$, we obtain

$$\alpha_{\overline{m+1}-1}\, g_{\overline{m+1}-1}^T\left(x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right) \ge b\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\|^2 + \left(1 - \alpha_{\overline{m+1}-1}\right)\hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)^T\left(x_1^{\overline{m+1}} - x_1^{\overline{m+1}-1}\right) \ge b\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\|^2.$$

Applying the Schwartz inequality, we get

$$\alpha_{\overline{m+1}-1}\left\|g_{\overline{m+1}-1}\right\|\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\| \ge \alpha_{\overline{m+1}-1}\, g_{\overline{m+1}-1}^T\left(x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right) \ge b\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\|^2$$

Therefore, $\left\|x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right\| \le \alpha_{\overline{m+1}-1}\, c_1 / b$, and we obtain

$$\alpha_{\overline{m+1}-1}\, g_{\overline{m+1}-1}^T\left(x_1^{\overline{m+1}-1} - x_1^{\overline{m+1}}\right) \le \alpha_{\overline{m+1}-1}\, c_1^2 / b \qquad (14)$$

For any $i \in [\bar{m}, \overline{m+1}-1]$, $g_i = g_{\bar{m}} = g_{\overline{m+1}-1}$ and $\hat{q}^i(x_1) = \hat{q}^{\bar{m}}(x_1) = \hat{q}^{\overline{m+1}-1}(x_1)$. Thus,

$$\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^{\overline{m+1}}\right) \le \alpha_{\bar{m}}\, c_1^2 / b \qquad (15)$$

For any $i \in [\bar{m}, \overline{m+1}-1]$ and $j \in [\overline{m+1}, \overline{m+2}-1]$, $g_j = g_{\overline{m+1}} = g_{\overline{m+2}-1}$ for any $x_1 \in X$. Thus,

$$\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^j\right) \le \alpha_{\bar{m}}\, c_1^2 / b \qquad (16)$$

Lemma 2. At iteration $k$, the approximation function can be written as $\hat{Q}^k(x_1) = \hat{Q}^0(x_1) + \bar{g}_k^T x_1$, where $\bar{g}_k$ is a finite vector.

Proof. According to the updating rule in Eq. (7), $\bar{g}_{k+1}$ is a linear combination of $g_1, g_2, \ldots, g_k$. Since $g_k$ and $\hat{q}^0(x_1)$ are finite, there exists a finite and positive vector $\hat{d}$ such that

$$\hat{d} \ge \max_k\left(g_k - \hat{q}^0\left(x_1^k\right)\right) \qquad (17)$$

According to Lemma 2 in [19], we can conclude that $\bar{g}_{k+1} \le \hat{d}$. □

Let $T_k = \hat{Q}^k(x_1^*) - \hat{Q}^k(x_1^k)$, where $x_1^*$ represents the optimal solution. The following lemma characterizes the difference between $T_{k+1}$ and $T_k$.

Lemma 3. For any two iterations $i \in [\overline{m-1}, \bar{m}-1]$ and $j \in [\bar{m}, \overline{m+1}-1]$, $T_i$ and $T_j$ satisfy

$$T_j - T_i \le \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^j\right) + \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^* - x_1^i\right). \qquad (18)$$

Proof. We first consider the special case $i = \bar{m}$ and $j = \overline{m+1}$. By rewriting $x_1^* - x_1^{\overline{m+1}}$ as $(x_1^* - x_1^{\bar{m}}) + (x_1^{\bar{m}} - x_1^{\overline{m+1}})$, we can obtain the following equation:

$$\hat{Q}^{\overline{m+1}}(x_1) = \hat{Q}^{\overline{m+1}-1}(x_1) + \alpha_{\overline{m+1}-1}\left(g_{\overline{m+1}-1} - \hat{q}^{\overline{m+1}-1}\left(x_1^{\overline{m+1}-1}\right)\right)^T x_1 = \hat{Q}^{\bar{m}}(x_1) + \alpha_{\bar{m}}\left(g_{\bar{m}} - \hat{q}^{\bar{m}}\left(x_1^{\bar{m}}\right)\right)^T x_1$$

Then,

$$T_{\overline{m+1}} - T_{\bar{m}} = \left[\hat{Q}^{\bar{m}}\left(x_1^*\right) + \alpha_{\bar{m}}\left(g_{\bar{m}} - \hat{q}^{\bar{m}}\left(x_1^{\bar{m}}\right)\right)^T x_1^*\right] - \left[\hat{Q}^{\bar{m}}\left(x_1^{\overline{m+1}}\right) + \alpha_{\bar{m}}\left(g_{\bar{m}} - \hat{q}^{\bar{m}}\left(x_1^{\bar{m}}\right)\right)^T x_1^{\overline{m+1}}\right] - \left[\hat{Q}^{\bar{m}}\left(x_1^*\right) - \hat{Q}^{\bar{m}}\left(x_1^{\bar{m}}\right)\right]$$
$$= \alpha_{\bar{m}}\left(g_{\bar{m}} - \hat{q}_{\bar{m}}\right)^T\left(x_1^* - x_1^{\bar{m}} + x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) + \hat{Q}^{\bar{m}}\left(x_1^{\bar{m}}\right) - \hat{Q}^{\bar{m}}\left(x_1^{\overline{m+1}}\right)$$
$$= \underbrace{\hat{Q}^{\bar{m}}\left(x_1^{\bar{m}}\right) - \hat{Q}^{\bar{m}}\left(x_1^{\overline{m+1}}\right) - \alpha_{\bar{m}}\,\hat{q}_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right)}_{\mathrm{I}} \;\underbrace{-\; \alpha_{\bar{m}}\,\hat{q}_{\bar{m}}^T\left(x_1^* - x_1^{\bar{m}}\right)}_{\mathrm{II}} \;+\; \underbrace{\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right)}_{\mathrm{III}} \;+\; \underbrace{\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^* - x_1^{\bar{m}}\right)}_{\mathrm{IV}}$$

where $\hat{q}_{\bar{m}}$ abbreviates $\hat{q}^{\bar{m}}(x_1^{\bar{m}})$. Considering each part individually: given that $\hat{q}_{\bar{m}} \in \partial\hat{Q}^{\bar{m}}(x_1^{\bar{m}})$, the convexity of $\hat{Q}^{\bar{m}}$ gives

$$\hat{Q}^{\bar{m}}\left(x_1^{\bar{m}}\right) - \hat{Q}^{\bar{m}}\left(x_1^{\overline{m+1}}\right) \le \hat{q}_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) \qquad (19)$$

Thus, the following expression is applicable:

$$\hat{Q}^{\bar{m}}\left(x_1^{\bar{m}}\right) - \hat{Q}^{\bar{m}}\left(x_1^{\overline{m+1}}\right) \le \hat{q}_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) = \left(1 - \alpha_{\bar{m}}\right)\hat{q}_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) + \alpha_{\bar{m}}\,\hat{q}_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) \qquad (20)$$

Given the optimality condition (8) and $0 < \alpha_{\bar{m}} < 1$, we know that $\mathrm{I} \le 0$. Similarly, from (8) and $0 < \alpha_{\bar{m}} < 1$, we know that $\mathrm{II} \le 0$. Thus, $T_{\overline{m+1}} - T_{\bar{m}} \le \alpha_{\bar{m}}\, g_{\bar{m}}^T(x_1^{\bar{m}} - x_1^{\overline{m+1}}) + \alpha_{\bar{m}}\, g_{\bar{m}}^T(x_1^* - x_1^{\bar{m}})$.

For any $i \in [\bar{m}, \overline{m+1}-1]$, $g_i = g_{\bar{m}} = g_{\overline{m+1}-1}$ and $\hat{q}^i(x_1) = \hat{q}^{\bar{m}}(x_1) = \hat{q}^{\overline{m+1}-1}(x_1)$ for any $x_1 \in X$. Therefore,

$$T_{\overline{m+1}} - T_i \le \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^{\overline{m+1}}\right) + \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^* - x_1^i\right) \qquad (21)$$

For any $i \in [\bar{m}, \overline{m+1}-1]$ and $j \in [\overline{m+1}, \overline{m+2}-1]$, $\hat{Q}^j = \hat{Q}^{\overline{m+1}} = \hat{Q}^{\overline{m+2}-1}$ for any $x_1 \in X$. Therefore,

$$T_j - T_i \le \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^j\right) + \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^* - x_1^i\right) \qquad (22)$$

To prove Theorem 1, we consider two scenarios. In the first scenario, ProSHLA does not stop within a given update process; thus, any update process consists of a finite number of iterations before the algorithm stops, which means $\overline{m+1} - \bar{m} < M$ for any $m$ (where $M$ is a large number). In the second scenario, ProSHLA might terminate within a given update process. In the following, Theorem 1 is proven for each scenario.

Scenario 1: ProSHLA does not stop in a given update process.

In the first scenario, we consider a subsequence $\{x_1^{\bar{m}}\}$ of $\{x_1^k\}$ and prove that it converges to the true optimum $x_1^*$. According to the definition $g_k \in \partial Q(x_1^k, \omega_{k+1})$, we obtain the following inequality:

$$g_k^T\left(x_1^* - x_1^k\right) \le Q\left(x_1^*, \omega_{k+1}\right) - Q\left(x_1^k, \omega_{k+1}\right) \qquad (23)$$

where $Q(x_1, \omega_{k+1})$ represents the operational cost function given the outcome $\omega_{k+1}$.

According to Lemma 1, we obtain the following inequality:

$$\alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^i - x_1^j\right) \le \alpha_{\bar{m}}\, c_1^2 / b \qquad (24)$$

On the basis of Lemma 3, the difference $T_{\overline{m+1}} - T_{\bar{m}}$ can be bounded as follows:

$$T_{\overline{m+1}} - T_{\bar{m}} \le \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) + \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^* - x_1^{\bar{m}}\right) \le \alpha_{\bar{m}}\, g_{\bar{m}}^T\left(x_1^{\bar{m}} - x_1^{\overline{m+1}}\right) - \alpha_{\bar{m}}\left[Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right] \le -\alpha_{\bar{m}}\left[Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right] + \alpha_{\bar{m}}\, c_1^2 / b \qquad (25)$$

Taking the conditional expectation of Eq. (25) with respect to $H_{\bar{m}}$ on both sides, we obtain

$$\mathbb{E}\left[T_{\overline{m+1}} \mid H_{\bar{m}}\right] \le T_{\bar{m}} - \alpha_{\bar{m}}\left[\bar{Q}\left(x_1^{\bar{m}}\right) - \bar{Q}\left(x_1^*\right)\right] + \alpha_{\bar{m}}\, c_1^2 / b$$

where $\bar{Q}(x_1)$ represents the expected recourse function, that is, $\mathbb{E}_\omega[Q(x_1,\omega)]$. Given the conditioning on $H_{\bar{m}}$, the quantities $T_{\bar{m}}$, $\alpha_{\bar{m}}$ and $x_1^{\bar{m}}$ on the right-hand side are deterministic, and the conditioning provides no information on $\omega_{\overline{m+1}}$. Hence, we replace $Q(x_1, \omega_{\overline{m+1}})$ (for $x_1 = x_1^{\bar{m}}$ and $x_1 = x_1^*$) with its expectation $\bar{Q}(x_1)$. Given that $\alpha_{\bar{m}}\left[\bar{Q}(x_1^{\bar{m}}) - \bar{Q}(x_1^*)\right] \ge 0$, the sequence

$$W_{\bar{m}} = T_{\bar{m}} + \alpha_{\bar{m}}\, c_1^2 / b \qquad (26)$$

is a positive supermartingale. Theorem 2 implies the almost sure convergence of $W_{\bar{m}}$. Hence,

$$T_{\bar{m}} \to T^* \quad a.s. \qquad (27)$$

Summing Eq. (25) from 0 to $\bar{M}$, we obtain the following inequality:

$$T_{\bar{M}} - T_0 \le -\sum_{\bar{m}=0}^{\bar{M}} \alpha_{\bar{m}}\left[Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right] + \sum_{\bar{m}=0}^{\bar{M}} \alpha_{\bar{m}}\, c_1^2 / b \qquad (28)$$

We take the expectation of both sides; for the first term on the right-hand side, we take the conditional expectation with respect to $H_{\bar{m}}$ and then the expectation over all $H_{\bar{m}}$:

$$\mathbb{E}\left[T_{\overline{M+1}}\right] - T_0 \le -\sum_{\bar{m}=0}^{\bar{M}} \mathbb{E}\left[\mathbb{E}\left[\alpha_{\bar{m}}\left(Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right) \,\middle|\, H_{\bar{m}}\right]\right] + \mathbb{E}\left[\sum_{\bar{m}=0}^{\bar{M}} \alpha_{\bar{m}} \frac{c_1^2}{b}\right] \le -\sum_{\bar{m}=0}^{\bar{M}} \mathbb{E}\left[\alpha_{\bar{m}}\left(Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right) \,\middle|\, H_{\bar{m}}\right] + \frac{c_1^2}{b}\sum_{\bar{m}=0}^{\bar{M}} \mathbb{E}\left[\alpha_{\bar{m}}^2\right]$$

Taking the limit as $\bar{M} \to \infty$ and using the finiteness of $T_{\bar{M}}$ and of $\sum_{\bar{m}=0}^{\infty} \mathbb{E}[\alpha_{\bar{m}}^2]$, we obtain

$$\sum_{\bar{m}=0}^{\infty} \mathbb{E}\left[\alpha_{\bar{m}}\left(Q\left(x_1^{\bar{m}}, \omega_{\overline{m+1}}\right) - Q\left(x_1^*, \omega_{\overline{m+1}}\right)\right) \,\middle|\, H_{\bar{m}}\right] < \infty \qquad (29)$$

Given that $Q(x_1^{\bar{m}}, \omega_{\overline{m+1}}) - Q(x_1^*, \omega_{\overline{m+1}}) \ge 0$ and $\sum_{\bar{m}=0}^{\infty} \alpha_{\bar{m}} = \infty$ a.s., there exists a subsequence $\bar{m}$ such that

$$\bar{Q}\left(x_1^{\bar{m}}\right) \to \bar{Q}\left(x_1^*\right) \quad a.s.$$

By the continuity of $\bar{Q}$, the sequence of solutions converges. Hence,

$$x_1^{\bar{m}} \to x_1^* \quad a.s.$$

Subsequently, we construct another subsequence, $\{x_1^{\bar{m}-1}\}$. Based on Eq. (25),

$$\mathbb{E}\left[T_{\overline{M+2}-1}\right] - T_{\overline{M+1}-1} \le -\sum_{\bar{m}=0}^{\bar{M}} \alpha_{\bar{m}}\left[Q\left(x_1^{\overline{m+1}-1}, \omega_{\overline{m+2}-1}\right) - Q\left(x_1^*, \omega_{\overline{m+2}-1}\right)\right] + \sum_{\bar{m}=0}^{\bar{M}} \alpha_{\bar{m}}\, c_1^2 / b$$

Likewise, the following convergence can be proved:

$$x_1^{\bar{m}-1} \to x_1^* \quad a.s.$$

By an analogous argument, a general subsequence $\{x_1^i\}$, $i \in [\bar{m}, \overline{m+1}-1]$, almost surely converges to $x_1^*$. We term this type of subsequence $X_s$.

In the procedure of ProSHLA, the number of iterations within each update process is finite. Thus, from any subsequence of $\{x_1^k\}$ we can extract a further subsequence that belongs to $X_s$. The following conclusion is then obtained:

$$x_1^k \to x_1^* \quad a.s.$$

Scenario 2: ProSHLA halts in a given update process.

For the second scenario, ProSHLA halts within a projection procedure, which itself generates a convergent sequence.

Hence, the conclusion of Theorem 1 is obtained. □

The above analysis demonstrates the convergence property of ProSHLA and requires the function $\hat{Q}(x)$ to be piecewise linear and convex. In practice, however, practitioners are often interested in a particular scenario in which separable functions are used to approximate the expected recourse function for stochastic programs with network recourse. Based on Eq. (4), if the separable functions are piecewise linear or purely linear, practitioners can easily solve this network recourse problem, because a pure network flow problem is polynomially solvable. In the following section, we discuss this practically relevant scenario.


3. Application for two-stage stochastic programs with network recourse using separable piecewise linear functions

In this section, we discuss the scenario where $\hat{Q}(x)$ is separable for two-stage stochastic programs with network recourse. For this scenario, we can implement a simplified version of ProSHLA without the projection step. We denote this simplified version as the stochastic hybrid learning algorithm (SHLA), which is described as follows (Figure 2).

Figure 2.

Description of SHLA.

Essentially, SHLA is not guaranteed to converge in general. However, when it is applied to two-stage stochastic programs with network recourse, SHLA enjoys several merits: (1) the solution of Q(x,ω) is naturally integer; and (2) at each iteration, the problem Q(x,ω) is a simple network flow problem that can be solved by a polynomial-time algorithm.
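To illustrate merit (2), the sketch below solves a small transportation-type network recourse problem as a linear program. The supplies, demands, and costs are illustrative assumptions; in practice a specialized network flow solver would be used.

```python
# Sketch of the second-stage network recourse: for a realized demand, Q(x, omega)
# is a pure (transportation-type) network flow problem, solved here as a small LP.
import numpy as np
from scipy.optimize import linprog

def network_recourse(supply, demand, cost):
    """min sum_ij cost[i,j]*y[i,j]  s.t.  row sums = supply, col sums = demand, y >= 0."""
    m, n = cost.shape
    A_eq, b_eq = [], []
    for i in range(m):                       # each origin ships out its supply
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(supply[i])
    for j in range(n):                       # each destination receives its demand
        col = np.zeros(m * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(demand[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * (m * n), method="highs")
    # An integer-valued optimum exists for integer data (total unimodularity).
    return res.fun, res.x.reshape(m, n)

cost = np.array([[4.0, 6.0, 9.0], [5.0, 4.0, 7.0]])
value, flow = network_recourse(supply=[30, 25], demand=[20, 20, 15], cost=cost)
print(value, flow, sep="\n")
```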

Here, if we use separable functions, assumption (A.6) can be satisfied by artificially imposing the following condition:

$$\hat{q}^0(x_i) < \hat{q}^0(x_i + \delta)$$

Note that both ProSHLA and SHLA allow the initial approximation function to be chosen flexibly with different values of δ. Thus, if δ is set to 1 for every $i$, then we can guarantee the following expression:

$$\hat{q}_i^0(x_i) < \hat{q}_i^0(x_i + \delta). \qquad (A.7)$$

Then, we can reach Theorem 3 below.

Theorem 3. If (A.7) is satisfied, SHLA is convergent for two-stage stochastic programs with network recourse.

Proof. For any $x, y \in X$ with $x \ne y$, we obtain the following expression according to (A.7):

$$\hat{q}^k(x) \ne \hat{q}^k(y)$$

Thus,

$$\left\|\hat{q}^k(x) - \hat{q}^k(y)\right\| > 0$$

Hence, if we set δ = 1 and apply ProSHLA to two-stage stochastic programs with network recourse, ProSHLA can drop the projection step because $\hat{q}^k(x)$ and $\hat{q}^k(y)$ are always unequal. In this situation, ProSHLA is equivalent to SHLA, so SHLA is convergent.

According to the above analysis, we have provided the first theoretical convergence support for SHLA-type algorithms, which are widely used in the applications mentioned in the introduction. Compared with SHAPE, SHLA does not contain any nonlinear terms, so it can be very efficient. Besides, SHLA automatically maintains the convexity of the approximation function if the initial piecewise linear functions are properly constructed; one such construction is sketched below.
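A minimal sketch of such a construction is given here: sampling a strictly convex quadratic on a grid of granularity δ yields a convex piecewise linear function whose slopes strictly increase from one segment to the next, so condition (A.7) holds when δ = 1. The quadratic form and all parameter values are illustrative assumptions (a similar construction is used in Section 4.1).

```python
# Sketch of an initial separable piecewise linear approximation with strictly
# increasing slopes: sample c*(x - s_bar)^2 on a grid of granularity delta and
# keep the per-segment slopes. All parameter values are illustrative assumptions.
import numpy as np

def initial_pwl_slopes(s_bar, c=1.0, delta=1.0, x_max=100.0):
    """Slopes of the piecewise linear interpolation of c*(x - s_bar)^2 on [0, x_max]."""
    grid = np.arange(0.0, x_max + delta, delta)
    values = c * (grid - s_bar) ** 2
    return np.diff(values) / delta           # strictly increasing, so (A.7) holds

slopes = initial_pwl_slopes(s_bar=12.0, c=0.5, delta=1.0, x_max=20.0)
print(slopes)                                # negative below s_bar, positive above
```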


4. Experimental results for performance analysis

In this section, we use two experimental designs to evaluate the performance of the algorithms: (1) an empty container repositioning problem, which arises in the context of two-stage stochastic programs with network recourse; and (2) a high-dimensional resource allocation problem as an extension experiment. The empty container repositioning problem is first introduced, and then we present the efficiency of ProSHLA and SHLA. Subsequently, we examine the convergence of ProSHLA and SHLA, study how δ affects the convergence performance, and compare the performance under different distributions of random demand. Finally, an extension experiment on a high-dimensional resource allocation problem is conducted to evaluate the efficiency of our algorithms.

4.1 Problem generator for the empty container repositioning problem

In this subsection, we test our algorithms on an empty container repositioning problem faced by a major Chinese freight forwarder, who needs to manage numerous empty containers in a port network located in the Pearl River Delta, with a fixed route structure adapted from [23]. The port network contains several hubs (large ports) and spokes (small ports), and the demand for empty containers is usually uncertain. When the forwarder decides the quantity of empty containers to ship from one port to another, the exact future demand for containers is not yet known [24, 25]. Thus, the problem can be formulated as a two-stage stochastic program with network recourse. Before we formally introduce the problem, we present the following notation.

$L$ = set of ports;
$d_{ij}$ = demand from port $i$ to port $j$ in stage 1;
$D_{ij}$ = demand from port $i$ to port $j$ in stage 2;
$s_i$ = initial number of empty containers at port $i$;
$S_i$ = number of empty containers at port $i$ in stage 2;
$c_{ij}$ = cost of moving an empty container from port $i$ to port $j$;
$r_{ij}$ = profit of moving a laden container from port $i$ to port $j$;
$x_{ij}$ = number of laden containers shipped from port $i$ to port $j$ in stage 1;
$y_{ij}$ = number of empty containers shipped from port $i$ to port $j$ in stage 1;

Then, the problem can be formulated as follows:

$$\min \sum_{i \in L}\sum_{j \in L}\left(-r_{ij}\, x_{ij} + c_{ij}\, y_{ij}\right) + \mathbb{E}_\omega\left[Q(x,\omega)\right], \qquad (30)$$

s.t.,

$$\sum_{j \in L}\left(x_{ij} + y_{ij}\right) = s_i, \qquad \forall i \in L \qquad (31)$$
$$\sum_{i \in L}\left(x_{ij} + y_{ij}\right) = s_j, \qquad \forall j \in L \qquad (32)$$
$$y_{ij} \ge 0, \qquad \forall i, j \in L \qquad (33)$$

where the recourse function $Q(x,\omega)$ is given as follows:

$$Q(x,\omega) = \min \sum_{i \in L}\sum_{j \in L}\left(-r_{ij}\, x_{ij}(\omega) + c_{ij}\, y_{ij}(\omega)\right) \qquad (34)$$

s.t.,

$$\sum_{j \in L}\left(x_{ij}(\omega) + y_{ij}(\omega)\right) = s_i, \qquad \forall i \in L \qquad (35)$$
$$\sum_{i \in L}\left(x_{ij}(\omega) + y_{ij}(\omega)\right) = s_j, \qquad \forall j \in L \qquad (36)$$
$$y_{ij}(\omega) \ge 0, \qquad \forall i, j \in L \qquad (37)$$

In order to evaluate the algorithm, a set of problem instances is created. The problem generator places the ports in $L$ in a 100-mile by 100-mile rectangle, and we use the Euclidean distance between each pair of ports as the corresponding travel distance. The holding cost for a demand is set to 15 cents per time instance, the net profit for a demand to 500 cents per mile, and the empty repositioning cost to 40 cents. The demand $D_{ij}$ between locations $i$ and $j$ is set as follows:

$$D_{ij} = out_j \cdot in_i \cdot v,$$

where

$out_j$ = outbound potential for port $j$;

$in_i$ = inbound potential for port $i$;

$v$ = random variable.

The outbound and inbound potentials of each port represent the capability of the location to generate outbound demand or attract inbound containers. In the generator, we draw the inbound potential $in_i$ for port $i$ uniformly between 0.2 and 1.8, and the corresponding outbound potential is set as $out_i = 2 - in_i$. The reason for this setting is that, in real-world regions, a port with large inbound flows usually exhibits small outbound flows. We also include a random factor $v$ with mean 30, representing the typical daily demand between each pair of locations, to capture the randomness in demand. In order to test the performance of the algorithms under different distributions, we evaluate the performance under exponential, normal and uniform distributions. We set the stepsize $\alpha_k$ to $1/k$.
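The sketch below reproduces the instance generator just described. The exponential choice for the mean-30 random factor v and the rounding of demands are our own assumptions; any distribution with mean 30 fits the description above.

```python
# Sketch of the instance generator: ports in a 100 x 100 square, Euclidean
# distances, Uniform(0.2, 1.8) inbound potentials with outbound = 2 - inbound,
# and demands D_ij = out_j * in_i * v with a mean-30 random factor v.
import numpy as np

def generate_instance(n_ports, rng, mean_v=30.0):
    coords = rng.uniform(0.0, 100.0, size=(n_ports, 2))
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    inbound = rng.uniform(0.2, 1.8, size=n_ports)
    outbound = 2.0 - inbound
    v = rng.exponential(mean_v, size=(n_ports, n_ports))   # any mean-30 law works here
    demand = outbound[None, :] * inbound[:, None] * v       # D_ij = out_j * in_i * v
    np.fill_diagonal(demand, 0.0)
    return dist, np.rint(demand)

rng = np.random.default_rng(42)
distances, demands = generate_instance(10, rng)
print(demands.astype(int))
```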

We solve a deterministic network flow problem, in which the random demands are replaced by their mean values, to construct the initial piecewise linear functions as described in [1]. From its solution we obtain $\bar{S} = (\bar{S}_1, \bar{S}_2, \ldots, \bar{S}_n)$. For each $i \in L$, we generate the initial approximation function $\hat{Q}_i^0(x) = c\,(x - \bar{S}_i)^2$ evaluated at the grid points $x = 0, \delta, 2\delta, \ldots$ (interpolated linearly in between), where $c$ is a positive parameter. In the projection step, the following least-squares problem is solved:

$$x^{k+1} = \arg\min_{x \in X}\left\|x - \left(x^k - \alpha_k\, g_k\right)\right\|^2.$$
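The following sketch shows one way to carry out this projection step, treating X as the polyhedron {x ≥ 0, Ax = b} of problem (1) and solving the least-squares problem with a general-purpose solver. The data and the use of SciPy's SLSQP method are illustrative assumptions.

```python
# Sketch of the projection step: the next iterate is the point of X closest (in
# Euclidean norm) to the sub-gradient step x_k - alpha_k * g_k.
import numpy as np
from scipy.optimize import minimize

def project(target, A, b):
    """argmin_x ||x - target||^2  s.t.  A x = b,  x >= 0."""
    x0 = np.full(A.shape[1], b[0] / A.shape[1])          # feasible start for this A, b
    res = minimize(lambda x: np.sum((x - target) ** 2), x0,
                   jac=lambda x: 2 * (x - target), method="SLSQP",
                   constraints=[{"type": "eq", "fun": lambda x: A @ x - b}],
                   bounds=[(0, None)] * A.shape[1])
    return res.x

A, b = np.ones((1, 3)), np.array([10.0])                 # X = {x >= 0, sum(x) = 10}
x_k, alpha_k, g_k = np.array([4.0, 3.0, 3.0]), 0.1, np.array([5.0, -2.0, 1.0])
print(project(x_k - alpha_k * g_k, A, b))
```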

4.2 Effectiveness and efficiency performance

To test the efficiency of the algorithm, we use a myopic algorithm, a posterior bound (PB), the L-shaped algorithm [12] and the inexact cut algorithm [14] as benchmarks. The myopic algorithm simply solves a static deterministic assignment problem at the current stage while ignoring uncertainty in the second stage. To obtain PB, a deterministic network flow problem with all realized demands is solved; such a posterior optimization involves no uncertainty, since decisions are allowed to anticipate future demand. Thus, the cost of PB is the lowest and normally unreachable. For the L-shaped algorithm and the inexact cut algorithm, a group of linear programming problems with valid cuts must be solved.

We use 8 instances, in which the number of empty containers ranges from 400 to 3200 and the corresponding number of ports ranges from 5 to 40. For each instance, 2000 samples are used, and we report the solution of the myopic algorithm and the sample means of PB, the inexact cut algorithm, the L-shaped algorithm, SHLA and ProSHLA. For SHLA, two classes of initial functions with δ = 1 and δ = 2 are selected, whereas we select δ = 2 for ProSHLA.

Table 1 reports the experimental results on total cost. In Table 1, column 1 presents the number of ports and column 2 the number of empty containers. The PB bounds are contained in column 3. Columns 4–9 contain the solutions achieved by the myopic algorithm, the L-shaped algorithm, the inexact cut algorithm, SHLA-1, SHLA-2 and ProSHLA, respectively. The table clearly shows that the inexact cut algorithm, the L-shaped algorithm, SHLA, and ProSHLA achieve optimal or near-optimal solutions, which are closer to the PB (lowest) bounds than those of the myopic method. Moreover, the solutions of the L-shaped algorithm are the best known solutions; they are slightly better than those of the inexact cut algorithm because the latter produces valid cuts that are inexact in the sense that they are not as constraining as the optimality cuts in the L-shaped algorithm. In addition, SHLA (δ = 1) outperforms SHLA (δ = 2) and ProSHLA (δ = 2), because a small δ leads to good performance; a detailed discussion of the impact of δ is given later. The performance of ProSHLA (δ = 2) is slightly better than that of SHLA (δ = 2), because the projection steps in ProSHLA help improve the solution. Since the speed of convergence is quite important in practical problems, we also report the computational time of the different algorithms in Table 2 below.

| NL | NR | PB | Myopic | L-shaped | Inexact cut | SHLA-1 | SHLA-2 | ProSHLA |
|----|----|----|--------|----------|-------------|--------|--------|---------|
| 5 | 400 | −28,551,302 | −27,761,331 | −28,551,274 | −28,551,227 | −28,464,248 | −28,463,941 | −28,464,002 |
| 10 | 800 | −59,397,423 | −58,432,159 | −59,396,702 | −59,396,319 | −59,338,653 | −59,338,190 | −59,338,341 |
| 15 | 1200 | −98,451,193 | −93,188,576 | −98,449,868 | −98,449,427 | −98,257,484 | −98,257,244 | −98,257,395 |
| 20 | 1600 | −147,390,005 | −141,062,187 | −147,388,360 | −147,387,532 | −147,269,239 | −147,269,212 | −147,269,213 |
| 25 | 2000 | −185,223,883 | −180,875,614 | −185,220,423 | −185,219,663 | −185,092,289 | −185,092,113 | −185,092,143 |
| 30 | 2400 | −234,005,740 | −226,978,090 | −234,004,427 | −234,003,513 | −233,401,849 | −233,401,516 | −233,401,658 |
| 35 | 2800 | −266,375,728 | −260,966,356 | −266,375,468 | −266,374,497 | −266,337,862 | −266,337,649 | −266,337,660 |
| 40 | 3200 | −304,910,355 | −293,882,881 | −304,908,597 | −304,906,829 | −304,907,153 | −304,906,829 | −304,906,962 |

(Total cost in dollars.)

Table 1.

Total cost for SHLA and ProSHLA.

| NR | NL | Inexact cut | L-shaped | ProSHLA | SHLA-1 | SHLA-2 |
|----|----|-------------|----------|---------|--------|--------|
| 400 | 5 | 76 | 153 | 43 | 28 | 28 |
| 800 | 10 | 382 | 535 | 148 | 90 | 95 |
| 1200 | 15 | 598 | 1140 | 332 | 224 | 217 |
| 1600 | 20 | 1084 | 2277 | 628 | 417 | 420 |
| 2000 | 25 | 1843 | 4190 | 1107 | 676 | 790 |
| 2400 | 30 | 2338 | 5491 | 1559 | 1112 | 1066 |
| 2800 | 35 | 4249 | 8075 | 2341 | 1539 | 1531 |
| 3200 | 40 | 8154 | 18,636 | 5307 | 3010 | 3428 |

(Computation time in seconds.)

Table 2.

Computational time of ProSHLA and SHLA.

As shown in Table 2, ProSHLA and SHLA are more efficient than the inexact cut algorithm and the L-shaped algorithm because ProSHLA and SHLA can exploit the network structure while using stochastic sub-gradients to approximate the recourse function. The inexact cut algorithm and the L-shaped algorithm are time-consuming because, with 2000 samples, a very large number of cuts must be generated. It can also be observed that the computational time of the inexact cut algorithm is smaller than that of the L-shaped algorithm, because generating the optimality cuts in the L-shaped algorithm requires more effort than generating the valid cuts in the inexact cut algorithm. Moreover, the computational time of SHLA (δ = 1) is almost equal to that of SHLA (δ = 2), which reveals that the computational time is barely affected by the choice of δ. In contrast, ProSHLA (δ = 2) requires more computational time than SHLA (δ = 2), because the projection steps in ProSHLA are time-consuming. In the remainder of this section we focus on the convergence performance of SHLA and ProSHLA; thus, only the results of the myopic algorithm, PB, ProSHLA and SHLA are shown, and we use the solutions of the myopic algorithm and PB as upper and lower bounds, respectively.

4.3 Analysis of convergence performance

In this subsection, a set of experiments is conducted to evaluate the convergence performance of SHLA and ProSHLA; we choose the second instance (NR = 800 and NL = 10) as the illustration. The number of samples ranges from 20 to 640, and we record the result for each combination of NR and NL at each iteration. We can see from Figure 3 that the convergence rate of SHLA-1 and ProSHLA is remarkably high.

Figure 3.

Convergence rate of ProSHLA and SHLA.

To further evaluate how δ affects the algorithm's convergence performance, a set of computational experiments is conducted. We increase δ from 1 to 16 and the number of samples from 20 to 640, which gives many combinations of δ and sample size. We record the sample means of the solutions of SHLA, ProSHLA, PB and the myopic method for each combination, and show 3D plots of the solutions in Figures 4 and 5. As shown in Figure 4, the layers of ProSHLA and SHLA are extremely close to the PB layer, which implies that ProSHLA and SHLA converge rapidly for various δ. Furthermore, it can be seen that the performance of ProSHLA slightly exceeds that of SHLA. In order to further investigate the difference between SHLA and ProSHLA, we show the performance of ProSHLA and SHLA separately in Figure 5 (without the myopic algorithm and PB). As described in Figure 5, the choice of δ affects the performance of ProSHLA and SHLA, and a small δ usually leads to a good solution.

Figure 4.

Gaps to PB for various δ.

Figure 5.

Comparison of ProSHLA and SHLA for various δ.

Table 3 below provides more details on the convergence performance of ProSHLA and SHLA for various δ; it clearly shows that, with a small δ, the performance of SHLA and ProSHLA is close to that of PB.

| δ | PB | Myopic | SHLA | ProSHLA |
|---|----|--------|------|---------|
| 1 | −59,397,423 | 1.6251% | 0.0989% (909.5) | 0.0989% (1441.5) |
| 2 | −59,397,423 | 1.6251% | 0.0997% (907.9) | 0.0995% (1506.5) |
| 4 | −59,397,423 | 1.6251% | 0.1001% (901.8) | 0.0997% (1503.6) |
| 6 | −59,397,423 | 1.6251% | 0.1005% (911.5) | 0.1004% (1553.5) |
| 8 | −59,397,423 | 1.6251% | 0.1028% (901.7) | 0.1024% (1535.6) |
| 16 | −59,397,423 | 1.6251% | 0.1189% (920.3) | 0.1150% (1565.4) |

(Total cost: % gap to PB, with computational time in seconds in parentheses.)

Table 3.

Performance under various δ (no. of samples is 2000).

4.4 An extension experiment on a high dimensional resource allocation problem

Owing to the limitations of the container setting, an extension experiment on a higher-dimensional problem is considered in this subsection. In this problem, there is a set of retailers $R$ and a set of production facilities (with warehouses) $L$. In stage 1, an amount $x_{ij}$ is moved from production facility $i$ to a warehouse or retailer location $j$ before the retail demand is realized. Once the consumers' demand is known, $y_{ij}$ products are moved from location $i$ to retailer location $j$. Moreover, the types of consumer demand at each location differ; we denote the type by $t \in T$, the consumer demand of type $t$ at location $j$ by $D_{jt}$, and the units of type $t$ provided at location $i$ by $p_{it}$. We denote the production capacity of location $i$ by $cap_i$. This problem is a non-separable problem.

Subsequently, we formulate the problem as follows:

$$\min \sum_{i \in L}\sum_{j \in L \cup R} c_{ij}^1\, x_{ij} + \mathbb{E}_\omega\left[Q(x,\omega)\right] \qquad (38)$$

subject to

$$\sum_{j \in L \cup R} x_{ij} \le cap_i, \qquad \forall i \in L \qquad (39)$$
$$\sum_{i \in L} x_{ij} = s_j, \qquad \forall j \in L \cup R \qquad (40)$$
$$x_{ij},\; s_j \ge 0, \qquad \forall i \in L,\; j \in L \cup R \qquad (41)$$

where the recourse function $Q(x,\omega)$ is given as follows:

$$Q(x,\omega) = \min \sum_{i \in L \cup R}\sum_{j \in R} c_{ij}^2\, y_{ij} - \sum_{i \in R}\sum_{t \in T} r_{it}\, p_{it} \qquad (42)$$

subject to

$$\sum_{j \in R} y_{ij} = s_i, \qquad \forall i \in L \cup R \qquad (43)$$
$$\sum_{i \in L \cup R} y_{ij} = \sum_{t \in T} p_{jt}, \qquad \forall j \in R \qquad (44)$$
$$p_{jt} \le D_{jt}(\omega), \qquad \forall j \in R,\; t \in T \qquad (45)$$

In the first stage, we set $c_{ij}^1 = c_0^1 + c_1^1\, d_{ij}$, where $d_{ij}$ is the Euclidean distance between locations $i$ and $j$, $c_0^1$ is the production cost for each product, and $c_1^1$ is the transportation cost per mile. For the second-stage costs, we set

$$c_{ij}^2 = \begin{cases} c_1^2\, d_{ij} & \text{if } i \in L \text{ or } i = j \\ c_0^2 + c_1^2\, d_{ij} & \text{if } i \in R \text{ and } i \ne j \end{cases}$$

where $c_1^2$ is the transportation cost per mile in the second stage and $c_0^2$ represents the fixed charge for moving a product from one retailer location to another. For each unit of demand of type $t$ satisfied at retailer location $i$, a revenue $r_{it}$ is obtained. Our problem instances differ in the number of products and in $|L \cup R|$, which determines the dimensionality of the recourse function.
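The cost structure above can be generated as in the following sketch; all parameter values are illustrative assumptions.

```python
# Sketch of the two-stage cost structure just described: first-stage unit costs
# combine a production cost and a per-mile transportation cost, while second-stage
# costs add a fixed transfer charge only for retailer-to-retailer moves.
import numpy as np

def stage_costs(dist, n_facilities, c0_1=5.0, c1_1=0.4, c0_2=3.0, c1_2=0.6):
    n = dist.shape[0]                      # locations 0..n_facilities-1 are in L, rest in R
    c1 = c0_1 + c1_1 * dist                # c^1_ij = c^1_0 + c^1_1 * d_ij
    c2 = c1_2 * dist                       # base second-stage cost c^2_1 * d_ij
    for i in range(n_facilities, n):       # i in R and i != j: add the fixed charge c^2_0
        for j in range(n):
            if i != j:
                c2[i, j] += c0_2
    return c1, c2

rng = np.random.default_rng(3)
pts = rng.uniform(0, 100, size=(6, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
c1, c2 = stage_costs(dist, n_facilities=2)
print(c1.round(1), c2.round(1), sep="\n\n")
```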

Similarly, we use the inexact cut algorithm [14] and the L-shaped algorithm [12] as benchmarks; both are Benders decomposition-based methods. Since the convergence rate is quite important in practice, our main focus in this part is the speed of convergence. To evaluate the speed of convergence of the different methods, each algorithm is run for 40, 160, 640, 1200, and 4000 iterations, and a side-by-side comparison is made as the number of iterations increases. For the L-shaped and inexact cut algorithms, the number of iterations refers to the number of cuts used to approximate the expected recourse function; for ProSHLA (δ = 2), it refers to the number of demand samples used.

Table 4 below shows the experimental results. The L-shaped algorithm is used to find the optimal solution, and the numbers in the table denote the percent deviation of the objective value from the optimal value.

| \|L ∪ R\| | NP | Algorithm | 40 | 160 | 640 | 1200 | 4000 | Sec./iter. |
|---|---|---|---|---|---|---|---|---|
| 6 | 10 | ProSHLA | 13.26 | 8.61 | 2.93 | 2.92 | 2.73 | 0.01 |
| 6 | 10 | L-shaped | 1.17 | 0 | 0 | 0 | 0 | 0.07 |
| 6 | 10 | Inexact cut | 0.96 | 0 | 0 | 0 | 0 | 0.05 |
| 10 | 200 | ProSHLA | 10.58 | 3.01 | 0.61 | 0.29 | 0.12 | 0.07 |
| 10 | 200 | L-shaped | 1.85 | 0 | 0 | 0 | 0 | 0.26 |
| 10 | 200 | Inexact cut | 1.31 | 0 | 0 | 0 | 0 | 0.21 |
| 20 | 400 | ProSHLA | 7.22 | 1.22 | 0.42 | 0.23 | 0.05 | 0.30 |
| 20 | 400 | L-shaped | 10.46 | 1.16 | 0 | 0 | 0 | 1.13 |
| 20 | 400 | Inexact cut | 6.63 | 0.98 | 0 | 0 | 0 | 1.04 |
| 40 | 800 | ProSHLA | 6.03 | 0.82 | 0.34 | 0.17 | 0.02 | 0.83 |
| 40 | 800 | L-shaped | 23.57 | 3.23 | 0.31 | 0.02 | 0 | 9.53 |
| 40 | 800 | Inexact cut | 17.16 | 2.24 | 0.13 | 0.01 | 0 | 8.72 |
| 100 | 2000 | ProSHLA | 5.68 | 0.78 | 0.15 | 0.04 | 0 | 2.68 |
| 100 | 2000 | L-shaped | 44.84 | 14.51 | 1.38 | 0.5 | 0.03 | 36.53 |
| 100 | 2000 | Inexact cut | 29.56 | 8.14 | 0.91 | 0.25 | 0.03 | 30.98 |

(Columns 40–4000 give the percent deviation from the optimal value after the corresponding number of iterations; the last column gives the computational time per iteration in seconds.)

Table 4.

Percent error over the optimal solution for the different algorithms.

Note. Figures represent the deviation from the best objective value known.

*Optimal solution not found.

For all problem instances, the L-shaped algorithm is used to find the optimal solution, and the numbers in the table represent the percent deviation of the objective value from the optimal value after a given number of iterations. The computational time per iteration is also listed in Table 4. The results cover five problem scales of increasing dimensionality.

In Table 4 above, column 1 presents the number of locations and column 2 the number of products. Column 3 presents the method used in the experiment. The percent deviations from the optimal value are contained in columns 4–8, and column 9 lists the computational time per iteration. According to the results in Table 4, ProSHLA is able to obtain high-quality solutions very efficiently for the different problem instances, and it maintains consistent performance across problems of different sizes, especially large ones. This characteristic makes ProSHLA promising for large-scale applications. In comparison with the two Benders decomposition-based methods, ProSHLA is competitive for high-dimensional problems, because separable approximations usually scale much more easily to very high-dimensional problems. Note that in the first problem instance, where the number of locations is 6 and the number of resources is 10 (the inventory at a location might be 0, 1 or 2), the result of ProSHLA seems to break down, because the problem in this subsection is non-separable, which may introduce errors when separable approximations are used to approximate the expected recourse function. However, this does not happen for large problems, for which the separable approximations are nearly continuous rather than merely piecewise continuous.

According to the above computational results, ProSHLA is a promising method for two-stage stochastic programs, although more comprehensive numerical work is needed before applying it to a particular problem. Owing to its efficiency and simplicity, ProSHLA is a very promising candidate for high-dimensional problems. Moreover, it can be used as an initialization routine for high-dimensional stochastic programming problems to provide a high-quality initial feasible solution.


5. Conclusion

In this study, we propose an efficient machine learning algorithm for two-stage stochastic programs. This algorithm, termed the projected stochastic hybrid learning algorithm, combines stochastic sub-gradient and piecewise linear approximation methods. We use stochastic sub-gradients and sample information to update a piecewise linear approximation of the objective function, and we introduce a projection step, which applies the sub-gradient method, to escape local optima so that a global optimum can be reached. By means of this projection step, we establish the convergence of the algorithm for general two-stage stochastic programs. Furthermore, for the network recourse problem, the algorithm can drop the projection steps and still maintain the convergence property. The computational results reveal the efficiency of the proposed algorithms and show that they are distribution-free. Furthermore, the convergence rate is affected by the granularity of the initial function (δ): a small granularity usually leads to a high convergence rate. Finally, the computational results also show that the proposed algorithm is very competitive for high-dimensional problems. Compared with MAT, the proposed algorithm collects information based on knowledge gradients and uses it to update the recourse function through learning steps. It overcomes the “curse of dimensionality” and transforms the problem into a polynomially solvable problem.


Acknowledgments

This research is partially supported by the National Natural Science Foundation of China (Project No. 71701221) and the Natural Science Foundation of Guangdong Province, China (Grant No. 2019A1515011127).

References

  1. 1. Cheung RK, Chen CY. A two-stage stochastic network model and solution methods for the dynamic empty container allocation problem. Transportation Science. 1998;32(2):142-162
  2. 2. Bouzaiene-Ayari B, Cheng C, Das S, Fiorillo R, Powell WB. From single commodity to multiattribute models for locomotive optimization: A comparison of optimal integer programming and approximate dynamic programming. Transportation Science. 2016;50:366-389
  3. 3. Moreno A, Alem D, Ferreira D, Clark A. An effective two-stage stochastic multi-trip location-transportation model with social concerns in relief supply chains. European Journal of Operational Research. 2018;269(3):1050-1071
  4. 4. Kim K, Mehrotra S. A two-stage stochastic integer programming approach to integrated staffing and scheduling with application to nurse management. Operations Research. 2015;63:1431-1451
  5. 5. Wallace SW, Ziemba WT. Applications of stochastic programming. In: MOS-SIAM Series on Optimization. Vol. 5. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM); Mathematical Programming Society (MPS); 2005. ISBN:0-8971-555-5
  6. 6. Kleywegt AJ, Shapiro A, Homem-de-Mello T. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization. 2001;12(2):479-502
  7. 7. Ermoliev Y. Stochastic quasigradient methods. In: Numerical Techniques for Stochastic Optimization. New York: Springer-Verlag; 1988
  8. 8. Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics. 1951;22(3):400-407
  9. 9. Rockafellar RT, Wets JB. A note about projections in the implementation of stochastic quasigradient methods. In: Numerical Techniques for Stochastic Optimization, Springer Ser. Comput. Math. Vol. 10. Berlin: Springer; 1988. pp. 385-392
  10. 10. Ruszczyñski A. A linearization method for nonsmooth stochastic optimization problems. Mathematics of Operations Research. 1987;12:32-49
  11. 11. Benders JF. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik. 1962;4(1):238-252
  12. 12. Van Slyke RM, Wets RJ-B. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics. 1969;17(4):638-663
  13. 13. Pereira MVF, Pinto LMVG. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming. 1991;52:359-375
  14. 14. Zakeri G, Philpott AB, Ryan DM. Inexact cuts in benders decomposition. SIAM Journal on Optimization. 2000;10(4):643-657
  15. 15. Shapiro A. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research. 2011;209(1):63-72
  16. 16. Rebennack S. Combining sampling-based and scenario-based nested benders decomposition methods: Application to stochastic dual dynamic programming. Mathematical Programming. 2016;156(1):343-389
  17. 17. Philpott AB, Guan Z. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters. 2008;36:450-455
  18. 18. Girardeau P, Leclere V, Philpott AB. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research. 2015;40(1):130-145
  19. 19. Cheung RK, Powell WB. SHAPE—A stochastic hybrid approximation procedure for two-stage stochastic programs. Operations Research. 2000;48(1):73-79
  20. 20. Powell WB, Ruszczyñski A, Togaloglu H. Learning algorithms for separable approximation of discrete stochastic optimization problems. Mathematics of Operations Research. 2004;29(4):814-836
  21. 21. Godfrey GA, Powell WB. An adaptive dynamic programming algorithm for dynamic fleet management I: Single period travel times. Transportation Science. 2002;36(1):21-39
  22. 22. Neveu J. Discrete Parameter Martingales. Amsterdam: North Holland; 1975
  23. 23. Song DP, Dong JX. Empty container management in cyclic shipping routes. Maritime Economics & Logistics. 2008;10(4):335-361
  24. 24. Zhou S, Zhang H, Shi N, Xu Z, Wang F. A new convergent hybrid learning algorithm for two-stage stochastic programs. European Journal of Operational Research. 2020;283(1):33-46
  25. 25. Xu L, Zou Z, Zhou S. The influence of COVID-19 epidemic on BDI volatility: An evidence from GARCH-MIDAS model. Ocean Coastal Management. 2022. DOI: 10.1016/j.ocecoaman.2022.106330
