Open access peer-reviewed chapter

# Density Estimation in Inventory Control Systems under a Discounted Optimality Criterion


Submitted: October 1st, 2018 Reviewed: July 4th, 2019 Published: August 7th, 2019

DOI: 10.5772/intechopen.88392

From the Edited Volume

## Statistical Methodologies

Edited by Jan Peter Hessling


## Abstract

This chapter deals with a class of discrete-time inventory control systems where the demand process $\{D_t\}$ is formed by independent and identically distributed random variables with unknown density. Our objective is to introduce a suitable density estimation method which, combined with optimal control schemes, defines a procedure to construct optimal policies under a discounted optimality criterion.

### Keywords

• discounted optimality
• density estimation
• inventory systems
• optimal policies
• Markov decision processes
• AMS 2010 subject classifications: 93E20, 62G07, 90B05

## 1. Introduction

Inventory systems are among the most studied sequential decision problems in the fields of operations research and operations management. Their origin lies in the problem of determining how much inventory of a certain product should be kept in stock to meet the demand of buyers at a cost as low as possible. Specifically, the question is: how much should be ordered, or produced, to satisfy the demand that will arise during a certain period? Clearly, the behavior of the inventory over time depends on the ordered quantities and the demand for the product in successive periods. Indeed, let $I_t$ and $q_t$ be the inventory level and the order quantity at the beginning of period $t$, respectively, and let $D_t$ be the random demand during period $t$. Then $\{I_t\}_{t \ge 0}$ is a stochastic process whose evolution in time is given by

$$I_{t+1} = \max\{0,\, I_t + q_t - D_t\} =: (I_t + q_t - D_t)^+, \qquad t = 0, 1, \ldots$$

Schematically, this process is illustrated in the following figure.

(Standard inventory system)
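The dynamics above are straightforward to simulate. The following sketch is a minimal illustration, assuming a deterministic demand of 3 units per period and a hypothetical order-up-to rule; neither choice is prescribed by the chapter.

```python
def simulate_inventory(order_policy, demand_sampler, I0=0.0, horizon=5):
    """Simulate I_{t+1} = max(0, I_t + q_t - D_t), returning the level path."""
    I = I0
    levels = [I]
    for _ in range(horizon):
        q = order_policy(I)      # order quantity chosen from the current level
        D = demand_sampler()     # demand realized during the period
        I = max(0.0, I + q - D)  # the inventory dynamics displayed above
        levels.append(I)
    return levels

# Illustrative run: demand is always 3 units; the IM orders up to level 5.
levels = simulate_inventory(order_policy=lambda I: max(0.0, 5.0 - I),
                            demand_sampler=lambda: 3.0, I0=0.0, horizon=4)
```

Replacing `demand_sampler` by a random generator reproduces the stochastic process $\{I_t\}$ discussed below.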

In this case, the inventory manager (IM) observes the inventory level $I_t$ and then selects the order quantity $q_t$ as a function of $I_t$. The order quantities cause costs in the operation of the inventory system. For instance, if the quantity ordered is relatively small, then the items are very likely to be sold out, but there will be unmet demand: the holding cost is reduced, but there is a significant cost due to shortage. Conversely, if the size of the order is large, there is a risk of having surpluses with a high holding cost. These facts give rise to a stochastic optimization problem, which can be modeled as a Markov decision process (MDP). That is, the inventory system can be analyzed as a stochastic optimal control problem whose objective is to find the optimal ordering policy that minimizes a total expected cost.

The analysis of the control problem associated with inventory systems has been carried out under several scenarios: discrete-time and continuous-time systems with finite or infinite capacity, inventory systems with bounded or unbounded one-stage costs, as well as partially observable models, among others (see, e.g., [1, 2, 3, 4, 5, 7]). Moreover, such scenarios have their own methods and techniques to solve the corresponding control problem. However, in most cases it has been assumed that all the components defining the behavior of the inventory system are known to the IM, an assumption that, in certain situations, can be too strong and unrealistic. Hence it is necessary to implement schemes that allow learning or collecting information about the unknown components during the evolution of the system, so as to choose each decision with as much information as possible.

In this chapter we study a class of inventory control systems where the density of the demand is unknown to the IM. In this sense, our objective is to propose a procedure that combines density estimation methods and control schemes to construct optimal policies under a total expected discounted cost criterion. The estimation and control procedure is illustrated in the following figure:

(Estimation and control procedure)

In this case, unlike the standard inventory system, before choosing the order quantity $q_t$, the IM implements a density estimation method to get an estimate $\rho_t$ and, possibly, combines it with the history of the system $h_t = (I_0, q_0, D_0, \ldots, I_{t-1}, q_{t-1}, D_{t-1}, I_t)$ to select $q_t = q_t(h_t, \rho_t)$. Specifically, the density of the demand is estimated by the projection of an arbitrary estimator on an appropriate set, and its convergence is stated with respect to a norm which depends on the components of the inventory control model.

In general terms, our approach consists in showing that the inventory system can be studied under the weighted-norm approach, widely used by several authors in the field of Markov decision processes (see, e.g., [11] and references therein) and in adaptive control (see, e.g., [9, 12, 13, 14]). That is, we prove the existence of a weight function $W$ which imposes a growth condition on the cost functions. Then, applying the dynamic programming algorithm, the density estimation method is adapted to such a condition to define an estimation and control procedure for the construction of optimal policies.

The chapter is organized as follows. In Section 2 we describe the inventory model and define the corresponding optimal control problem. In Section 3 we introduce the dynamic programming approach under the true density. Next, in Section 4 we present the density estimation method which will be used to state, in Section 5, an estimation and control procedure for the construction of optimal policies. The proofs of the main results are given in Section 6. Finally, in Section 7, we present some concluding remarks.

## 2. The inventory model

We consider an inventory system evolving according to the difference equation

$$I_{t+1} = (I_t + q_t - D_t)^+, \qquad t = 0, 1, \ldots, \tag{1}$$

where $I_t$ and $q_t$ are the inventory level and the order quantity at the beginning of period $t$, taking values in $\mathbf{I} := [0, \infty)$ and $\mathbf{Q}_0 := [0, \infty)$, respectively, and $D_t$ represents the random demand during period $t$. We assume that $\{D_t\}$ is an observable sequence of nonnegative independent and identically distributed (i.i.d.) random variables with a common density $\rho \in L^1([0, \infty))$ which is unknown to the inventory manager. In addition, we assume finite expectation

$$\bar{D} := E[D_t] < \infty. \tag{2}$$

Moreover, there exists a measurable function $\bar\rho \in L^1([0, \infty))$ such that

$$\rho(s) \le \bar\rho(s) \tag{3}$$

almost everywhere with respect to the Lebesgue measure. In addition,

$$\int_0^{\infty} s^2\, \bar\rho(s)\,ds < \infty. \tag{4}$$

For example, if $\bar\rho(s) = K \min\{1, 1/s^{3+r}\}$, $s \in [0, \infty)$, for some positive constants $K$ and $r$, then there are plenty of densities that satisfy (3) and (4).

The one-stage cost function is defined as

$$\tilde{c}(I, q, D) := cq + h\,(I + q - D)^+ + b\,(D - I - q)^+, \qquad (I, q) \in \mathbf{I} \times \mathbf{Q}, \tag{5}$$

where $h$, $c$, and $b$ are, respectively, the holding cost per unit, the ordering cost per unit, and the shortage cost per unit, satisfying $b > c$.

The order quantities applied by the IM are selected according to rules known as ordering control policies, defined as follows. Let $H_t$ be the space of histories of the inventory system up to time $t$. That is, a typical element of $H_t$ is written as

$$h_t = (I_0, q_0, D_0, \ldots, I_{t-1}, q_{t-1}, D_{t-1}, I_t).$$

An ordering policy (or simply a policy) $\gamma = \{\gamma_t\}$ is a sequence of measurable functions $\gamma_t : H_t \to \mathbf{Q}$ such that $\gamma_t(h_t) = q_t$, $t \ge 0$. We denote by $\Gamma$ the set of all policies. A feedback policy, or Markov policy, is a sequence $\gamma = \{g_t\}$ of functions $g_t : \mathbf{I} \to \mathbf{Q}$ such that $g_t(I_t) = q_t$. A feedback policy $\gamma = \{g_t\}$ is stationary if there exists a function $g : \mathbf{I} \to \mathbf{Q}$ such that $g_t = g$ for all $t \ge 0$.

When using a policy $\gamma \in \Gamma$, given the initial inventory level $I_0 = I$, we define the total expected discounted cost as

$$V^\gamma(I) := E\left[\sum_{t=0}^{\infty} \alpha^t\, \tilde{c}(I_t, q_t, D_t)\right], \tag{6}$$

where $\alpha \in (0, 1)$ is the so-called discount factor. The inventory control problem is then to find an optimal feedback policy $\gamma^*$ such that $V^{\gamma^*}(I) = V^*(I)$ for all $I \in \mathbf{I}$, where

$$V^*(I) := \inf_{\gamma \in \Gamma} V^\gamma(I), \qquad I \in \mathbf{I}, \tag{7}$$

is the optimal discounted cost, which we call the value function.

We define the mean one-stage cost as

$$c(I, q) := cq + h\,E(I + q - D)^+ + b\,E(D - I - q)^+ = cq + h\int_0^{I+q} (I + q - s)\,\rho(s)\,ds + b\int_{I+q}^{\infty} (s - I - q)\,\rho(s)\,ds, \qquad (I, q) \in \mathbf{I} \times \mathbf{Q}. \tag{8}$$
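For a concrete density, the two integrals in (8) can be evaluated numerically. The sketch below assumes an exponential demand density (an illustrative choice, not part of the model) and checks a trapezoidal-rule evaluation of $c(I, q)$ against the closed-form expectations $E(y - D)^+ = y - (1 - e^{-\lambda y})/\lambda$ and $E(D - y)^+ = e^{-\lambda y}/\lambda$, with $y = I + q$.

```python
import math

def mean_one_stage_cost(I, q, c, h, b, rho, smax=50.0, n=100000):
    """Evaluate c(I,q) of (8) by the trapezoidal rule on [0, smax]."""
    y = I + q
    ds = smax / n
    total = 0.0
    for k in range(n + 1):
        s = k * ds
        w = 0.5 if k in (0, n) else 1.0   # trapezoid end-point weights
        total += w * (h * max(y - s, 0.0) + b * max(s - y, 0.0)) * rho(s) * ds
    return c * q + total

# Illustrative parameters and an Exp(1) demand density.
lam, I, q, c, h, b = 1.0, 1.0, 2.0, 1.0, 2.0, 5.0
rho = lambda s: lam * math.exp(-lam * s)
y = I + q
closed = c * q + h * (y - (1 - math.exp(-lam * y)) / lam) \
         + b * math.exp(-lam * y) / lam
approx = mean_one_stage_cost(I, q, c, h, b, rho)
```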

Then, by using properties of conditional expectation, we can rewrite the total expected discounted cost (6) as

$$V^\gamma(I) = E_I^\gamma\left[\sum_{t=0}^{\infty} \alpha^t\, c(I_t, q_t)\right], \tag{9}$$

where $E_I^\gamma$ denotes the expectation operator with respect to the probability measure $P_I^\gamma$ induced by the policy $\gamma$, given the initial inventory level $I_0 = I$ (see, e.g., [8, 10]).

The sequence of events in our model is as follows. Since the density $\rho$ is unknown, the one-stage cost (8) is also unknown to the IM. Then, if at stage $t$ the inventory level is $I_t = I \in \mathbf{I}$, the IM implements a suitable density estimation method to get an estimate $\rho_t$ of $\rho$. Next, he/she combines this with the history of the system to select an order quantity $q_t = q = \gamma_t(\rho_t, h_t) \in \mathbf{Q}$. Then a cost $c(I, q)$ is incurred, and the system moves to a new inventory level $I_{t+1} = I' \in \mathbf{I}$ according to the transition law

$$Q(B \mid I, q) := \operatorname{Prob}\left(I_{t+1} \in B \mid I_t = I,\, q_t = q\right) = \int_0^{\infty} 1_B\!\left[(I + q - s)^+\right] \rho(s)\,ds, \tag{10}$$

where $1_B(\cdot)$ denotes the indicator function of the set $B \in \mathcal{B}(\mathbf{I})$, and $\mathcal{B}(\mathbf{I})$ is the Borel $\sigma$-algebra on $\mathbf{I}$. Once the transition to the inventory level $I'$ occurs, the process is repeated. Furthermore, the costs are accumulated according to the discounted cost criterion (9).

## 3. Dynamic programming equation under the true density ρ

The study of the inventory control problem will be done by means of the well-known dynamic programming (DP) approach, which we now introduce in terms of the unknown density $\rho$. To fix ideas, we first present some preliminary and useful facts.

The set of order quantities in which we can find the optimal ordering policy is

$$\mathbf{Q} := [0, Q^*] \subset \mathbf{Q}_0,$$

where

$$Q^* := \frac{b\bar{D}}{c(1 - \alpha)}.$$

Thus, we can restrict the range of $q$ so that $q \in \mathbf{Q}$. Specifically, we have the following result.

Lemma 3.1. Let $\gamma^0 \in \Gamma$ be the policy defined as $\gamma^0 = (0, 0, \ldots)$, and let $\bar\gamma = \{\bar\gamma_t\}$ be a policy such that $\bar\gamma_k(h_k) = \bar{q}_k > Q^*$ for at least one $k = 0, 1, \ldots$. Then

$$V^{\gamma^0}(I) \le V^{\bar\gamma}(I), \qquad I \in \mathbf{I}. \tag{11}$$

That is, $\gamma^0$ is a better policy than $\bar\gamma$.

Proof. Let $\{I_t^0\}$, $t = 0, 1, \ldots$, be the inventory levels generated by the application of $\gamma^0$, and let $\{(\bar{I}_t, \bar{q}_t)\}$ be the sequence of inventory levels and order quantities generated by $\bar\gamma$, where $I_0^0 = \bar{I}_0 = I$, $I_{t+1}^0 = (I_t^0 - D_t)^+$, and $\bar{I}_{t+1} = (\bar{I}_t + \bar{q}_t - D_t)^+$, $t \ge 0$. Without loss of generality, we suppose that for some $\bar{q} > Q^*$ we have $\bar{q}_0 = \bar{q}$. Note that $I_t^0 \le \bar{I}_t$ for all $t \ge 0$. Then, observing that $c\bar{q} > b\bar{D}/(1 - \alpha)$,

$$\begin{aligned}
V^{\gamma^0}(I) &= E\sum_{t=0}^{\infty} \alpha^t\, \tilde{c}(I_t^0, 0, D_t) = E\sum_{t=0}^{\infty} \alpha^t \left[h(I_t^0 - D_t)^+ + b(D_t - I_t^0)^+\right] \\
&\le E\sum_{t=0}^{\infty} \alpha^t\, h(\bar{I}_t - D_t)^+ + b\sum_{t=0}^{\infty} \alpha^t E[D_t] \\
&\le E\sum_{t=0}^{\infty} \alpha^t \left[h(\bar{I}_t + \bar{q}_t - D_t)^+ + b(D_t - \bar{I}_t - \bar{q}_t)^+\right] + \frac{b\bar{D}}{1 - \alpha} \\
&\le E\sum_{t=0}^{\infty} \alpha^t \left[h(\bar{I}_t + \bar{q}_t - D_t)^+ + b(D_t - \bar{I}_t - \bar{q}_t)^+\right] + c\bar{q} \\
&\le E\sum_{t=0}^{\infty} \alpha^t \left[c\bar{q}_t + h(\bar{I}_t + \bar{q}_t - D_t)^+ + b(D_t - \bar{I}_t - \bar{q}_t)^+\right] = V^{\bar\gamma}(I), \qquad I \in \mathbf{I}. \qquad\blacksquare
\end{aligned}$$

Remark 3.2. Observe that for $(I, q) \in \mathbf{I} \times \mathbf{Q}$ we have

$$c(I, q) = cq + L(I + q),$$

where, writing $y = I + q$,

$$L(y) := h\,E(y - D)^+ + b\,E(D - y)^+.$$

In addition, observe that for any fixed $s \in [0, \infty)$, the functions $y \mapsto (y - s)^+$ and $y \mapsto (s - y)^+$ are convex, which implies that $L(y)$ is convex. Moreover,

$$\lim_{y \to \infty} L(y) = \infty.$$

The following lemma provides a growth property of the one-stage cost function (8).

Lemma 3.3. There exist a number $\beta$ and a function $W : \mathbf{I} \to [1, \infty)$ such that $0 < \alpha\beta < 1$,

$$\sup_{(I, q, s) \in \mathbf{I} \times \mathbf{Q} \times [0, \infty)} \frac{W[(I + q - s)^+]}{W(I)} =: \varphi < \infty, \tag{12}$$

and for all $(I, q) \in \mathbf{I} \times \mathbf{Q}$,

$$c(I, q) \le W(I). \tag{13}$$

In addition, for any density $\mu$ on $[0, \infty)$ such that $\int_0^{\infty} s\,\mu(s)\,ds < \infty$,

$$\int_0^{\infty} W[(I + q - s)^+]\,\mu(s)\,ds \le \beta\, W(I), \qquad (I, q) \in \mathbf{I} \times \mathbf{Q}. \tag{14}$$

The proof of Lemma 3.3 is given in Section 6.

We denote by $B_W$ the normed linear space of all measurable functions $u : \mathbf{I} \to \mathbb{R}$ with finite weighted norm ($W$-norm) $\|\cdot\|_W$ defined as

$$\|u\|_W := \sup_{I \in \mathbf{I}} \frac{|u(I)|}{W(I)}. \tag{15}$$

Essentially, Lemma 3.3 shows that the inventory system (1) falls within the weighted-norm approach used to study general Markov decision processes (see, e.g., [11]). Hence, in the context of the inventory system (1), we can formulate on the space $B_W$ important results such as the existence of solutions of the DP equation, the convergence of the value iteration algorithm, and the existence of optimal policies. Indeed, let

$$V_n^\gamma(I) = E_I^\gamma\left[\sum_{t=0}^{n-1} \alpha^t\, c(I_t, q_t)\right]$$

be the $n$-stage discounted cost under the policy $\gamma \in \Gamma$ and the initial inventory level $I \in \mathbf{I}$, and

$$V_n(I) = \inf_{\gamma \in \Gamma} V_n^\gamma(I); \qquad V_0(I) = 0, \quad I \in \mathbf{I},$$

the corresponding value function. Then, for all $n \ge 0$ and $I \in \mathbf{I}$ (see, e.g., [6, 10, 11]),

$$V_n(I) = \min_{q \in \mathbf{Q}}\left[c(I, q) + \alpha \int_0^{\infty} V_{n-1}[(I + q - s)^+]\,\rho(s)\,ds\right]. \tag{16}$$
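When $\rho$ is known, the recursion (16) can be sketched numerically on a truncated, discretized state space. The integer grid, the two-point demand distribution, and all parameter values below are illustrative assumptions.

```python
def value_iteration(c, h, b, alpha, demands, probs, Imax, qmax, n_iter=200):
    """Discrete analogue of V_n(I) = min_q [c(I,q) + alpha E V_{n-1}((I+q-D)^+)]
    on the grid {0, ..., Imax}, with a finite demand distribution."""
    V = [0.0] * (Imax + 1)
    for _ in range(n_iter):
        Vnew = []
        for I in range(Imax + 1):
            best = float("inf")
            for q in range(qmax + 1):
                cost = c * q
                nxt = 0.0
                for D, p in zip(demands, probs):
                    cost += p * (h * max(I + q - D, 0) + b * max(D - I - q, 0))
                    nxt += p * V[min(max(I + q - D, 0), Imax)]  # truncated state
                best = min(best, cost + alpha * nxt)
            Vnew.append(best)
        V = Vnew
    return V

# Illustrative parameters: demand equals 1 or 2 with equal probability.
V = value_iteration(c=1.0, h=0.5, b=3.0, alpha=0.9,
                    demands=[1, 2], probs=[0.5, 0.5], Imax=10, qmax=10)
```

After enough iterations the list `V` approximates the value function on the truncated grid.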

Moreover, from [11, Theorem 8.3.6], by making the appropriate changes, we have the following result.

Theorem 3.4 (Dynamic programming). (a) The functions $V_n$ and $V^*$ belong to $B_W$. Moreover,

$$V_n(I) \le \frac{W(I)}{1 - \alpha\beta}, \qquad V^*(I) \le \frac{W(I)}{1 - \alpha\beta}, \qquad I \in \mathbf{I}. \tag{17}$$

(b) As $n \to \infty$, $\|V_n - V^*\|_W \to 0$.

(c) $V^*$ is convex.

(d) $V^*$ satisfies the dynamic programming equation:

$$V^*(I) = \min_{q \in \mathbf{Q}}\left[c(I, q) + \alpha \int_0^{\infty} V^*[(I + q - s)^+]\,\rho(s)\,ds\right] = \min_{I \le y \le Q^* + I}\left[cy + L(y) + \alpha \int_0^{\infty} V^*[(y - s)^+]\,\rho(s)\,ds\right] - cI, \qquad I \in \mathbf{I}. \tag{18}$$

(e) There exists a function $g^* : \mathbf{I} \to \mathbf{Q}$ such that, for each $I \in \mathbf{I}$,

$$V^*(I) = c(I, g^*(I)) + \alpha \int_0^{\infty} V^*[(I + g^*(I) - s)^+]\,\rho(s)\,ds.$$

Moreover, $\gamma^* = \{g^*\}$ is an optimal control policy.

## 4. Density estimation

As the density $\rho$ is unknown, the results in Theorem 3.4 are not applicable, and therefore they are not accessible to the IM. In this section we introduce a suitable density estimation method with which we can obtain an estimated DP equation. This will allow us to define a scheme for the construction of optimal policies. To this end, let $D_0, D_1, \ldots, D_t, \ldots$ be independent realizations of the demand, whose density is $\rho$.

Theorem 4.1. There exists an estimator $\rho_t(s) := \rho_t(s; D_0, D_1, \ldots, D_{t-1})$, $s \in [0, \infty)$, of $\rho$ such that (see (2) and (3)):

D.1. $\rho_t \in L^1([0, \infty))$ is a density.

D.2. $\rho_t \le \bar\rho$ a.e. with respect to the Lebesgue measure.

D.3. $\int_0^{\infty} s\,\rho_t(s)\,ds \le \bar{D}$.

D.4. $E\int_0^{\infty} |\rho(s) - \rho_t(s)|\,ds \to 0$ as $t \to \infty$.

D.5. $E\|\rho_t - \rho\| \to 0$ as $t \to \infty$, where

$$\|\mu\| := \sup_{(I, q) \in \mathbf{I} \times \mathbf{Q}} \frac{1}{W(I)}\left|\int_0^{\infty} W[(I + q - s)^+]\,\mu(s)\,ds\right| \tag{19}$$

for measurable functions $\mu$ on $[0, \infty)$.

It is worth noting that for any density $\mu$ on $[0, \infty)$ satisfying (14), the norm $\|\mu\|$ is finite. The remainder of the section is devoted to proving Theorem 4.1.

We define the set $\mathcal{D} \subset L^1([0, \infty))$ as

$$\mathcal{D} := \left\{\mu \in L^1([0, \infty)) : \mu \text{ is a density},\ \mu \le \bar\rho \text{ a.e.},\ \int_0^{\infty} s\,\mu(s)\,ds \le \bar{D}\right\}.$$

Observe that $\rho \in \mathcal{D}$.

Lemma 4.2. The set $\mathcal{D}$ is closed and convex in $L^1([0, \infty))$.

Proof. The convexity of $\mathcal{D}$ follows directly. To prove that $\mathcal{D}$ is closed, let $\{\mu_t\} \subset \mathcal{D}$ be a sequence such that $\mu_t \to \mu$ in $L^1$, with $\mu \in L^1([0, \infty))$. First, we prove that

$$\mu(s) \le \bar\rho(s) \quad \text{a.e.} \tag{20}$$

Suppose, on the contrary, that there is $A \subset [0, \infty)$ with $m(A) > 0$ such that $\mu(s) > \bar\rho(s)$, $s \in A$, $m$ being the Lebesgue measure on $\mathbb{R}$. Then, for some $\varepsilon > 0$ and $A' \subset A$ with $m(A') > 0$,

$$\mu(s) > \bar\rho(s) + \varepsilon, \qquad s \in A'. \tag{21}$$

Now, since $\mu_t \in \mathcal{D}$, $t \ge 0$, there exists $B_t \subset [0, \infty)$ with $m(B_t) = 0$ such that

$$\mu_t(s) \le \bar\rho(s), \qquad s \in [0, \infty) \setminus B_t, \quad t \ge 0. \tag{22}$$

Combining (21) and (22), we have

$$|\mu_t(s) - \mu(s)| \ge \varepsilon, \qquad s \in A' \cap \left([0, \infty) \setminus B_t\right), \quad t \ge 0.$$

Using the fact that $m\big(A' \cap ([0, \infty) \setminus B_t)\big) = m(A') > 0$, we obtain that $\mu_t$ does not converge to $\mu$ in measure, which contradicts the convergence in $L^1$. Therefore $\mu(s) \le \bar\rho(s)$ a.e.

On the other hand, applying Hölder's inequality and using the fact that $\bar\rho \in L^1([0, \infty))$, from (20),

$$\left|1 - \int_0^{\infty} \mu(s)\,ds\right| = \left|\int_0^{\infty} \mu_t(s)\,ds - \int_0^{\infty} \mu(s)\,ds\right| \le \int_0^{\infty} |\mu_t(s) - \mu(s)|^{1/2}\,|\mu_t(s) - \mu(s)|^{1/2}\,ds \le \left(\int_0^{\infty} 2\bar\rho(s)\,ds\right)^{1/2} \left(\int_0^{\infty} |\mu_t(s) - \mu(s)|\,ds\right)^{1/2} \to 0 \quad \text{as } t \to \infty, \tag{23}$$

which implies $\int_0^{\infty} \mu(s)\,ds = 1$. Now, as $\mu \ge 0$ a.e., we have that $\mu$ is a density. Similarly, from (4),

$$\int_0^{\infty} s\,|\mu_t(s) - \mu(s)|\,ds = \int_0^{\infty} s\,|\mu_t(s) - \mu(s)|^{1/2}\,|\mu_t(s) - \mu(s)|^{1/2}\,ds \le \left(\int_0^{\infty} s^2\, 2\bar\rho(s)\,ds\right)^{1/2} \left(\int_0^{\infty} |\mu_t(s) - \mu(s)|\,ds\right)^{1/2} \le 2^{1/2} M \left(\int_0^{\infty} |\mu_t(s) - \mu(s)|\,ds\right)^{1/2}, \tag{24}$$

for some constant $M < \infty$. Letting $t \to \infty$, we obtain

$$\int_0^{\infty} s\,\mu_t(s)\,ds \to \int_0^{\infty} s\,\mu(s)\,ds,$$

which, since $\mu_t \in \mathcal{D}$, implies that

$$\int_0^{\infty} s\,\mu(s)\,ds \le \bar{D}.$$

This proves that $\mathcal{D}$ is closed. $\blacksquare$

Let $\hat\rho_t(s) := \hat\rho_t(s; D_0, D_1, \ldots, D_t)$, $s \in [0, \infty)$, be an arbitrary estimator of $\rho$ such that

$$E\|\rho - \hat\rho_t\|_{L^1} = E\int_0^{\infty} |\rho(s) - \hat\rho_t(s)|\,ds \to 0 \quad \text{as } t \to \infty. \tag{25}$$

Lemma 4.2 ensures the existence of the estimator $\rho_t$ defined by the projection of $\hat\rho_t$ on the set of densities $\mathcal{D}$. That is, the density $\rho_t \in \mathcal{D}$, expressed as

$$\rho_t := \arg\min_{\sigma \in \mathcal{D}} \|\sigma - \hat\rho_t\|_{L^1},$$

is the "best approximation" of the estimator $\hat\rho_t$ on the set $\mathcal{D}$; that is,

$$\|\rho_t - \hat\rho_t\|_{L^1} = \inf_{\mu \in \mathcal{D}} \|\mu - \hat\rho_t\|_{L^1}. \tag{26}$$
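The exact $L^1$ projection in (26) is typically not available in closed form. The following heuristic sketch builds a histogram estimate $\hat\rho_t$ from the observed demands and then enforces the domination constraint by clipping at an assumed envelope $\bar\rho$ and renormalizing. This only approximates the projection (renormalizing may push some bins above the envelope again) and is not the chapter's construction; the envelope and the sample are illustrative assumptions.

```python
import math

def clipped_histogram_density(samples, rho_bar, smax, nbins):
    """Histogram estimate on [0, smax], clipped at the envelope rho_bar and
    renormalized -- a heuristic stand-in for the L1 projection onto the
    constraint set of dominated densities."""
    width = smax / nbins
    counts = [0] * nbins
    for s in samples:
        counts[min(int(s / width), nbins - 1)] += 1
    n = len(samples)
    est = []
    for k in range(nbins):
        mid = (k + 0.5) * width            # evaluate the envelope at midpoints
        est.append(min(counts[k] / (n * width), rho_bar(mid)))
    mass = sum(v * width for v in est)     # renormalize to total mass one
    return [v / mass for v in est] if mass > 0 else est

# Assumed envelope dominating an Exp(1) density, and a small sample.
rho_bar = lambda s: 2.0 * math.exp(-s)
samples = [0.1, 0.3, 0.3, 0.7, 1.2, 1.4, 2.1, 2.5, 3.0, 4.2]
est = clipped_histogram_density(samples, rho_bar, smax=5.0, nbins=10)
total = sum(v * (5.0 / 10) for v in est)
```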

Now observe that $\rho_t$ satisfies properties D.1, D.2, and D.3. Hence, Theorem 4.1 will be proved if we show that $\rho_t$ satisfies D.4 and D.5. To this end, since $\rho \in \mathcal{D}$, from (26) observe that

$$\|\rho_t - \rho\|_{L^1} \le \|\rho_t - \hat\rho_t\|_{L^1} + \|\hat\rho_t - \rho\|_{L^1} \le 2\|\hat\rho_t - \rho\|_{L^1}, \qquad t \ge 0,$$

which implies that, from (25),

$$E\int_0^{\infty} |\rho(s) - \rho_t(s)|\,ds \le 2E\|\hat\rho_t - \rho\|_{L^1} \to 0 \quad \text{as } t \to \infty. \tag{27}$$

That is, $\rho_t$ satisfies property D.4. In fact, since $\int_0^{\infty} |\rho(s) - \rho_t(s)|\,ds \le 2$ a.s., from (27) it is easy to see that

$$E\left[\int_0^{\infty} |\rho(s) - \rho_t(s)|\,ds\right]^{q} \to 0 \quad \text{as } t \to \infty, \text{ for any } q > 0. \tag{28}$$

Now, to obtain property D.5, observe that from (12),

$$\|\rho_t - \rho\| = \sup_{(I, q) \in \mathbf{I} \times \mathbf{Q}} \frac{1}{W(I)}\left|\int_0^{\infty} W[(I + q - s)^+]\,\left(\rho(s) - \rho_t(s)\right)\,ds\right| \le \varphi \int_0^{\infty} |\rho(s) - \rho_t(s)|\,ds. \tag{29}$$

Therefore, property D.4 yields

$$E\|\rho_t - \rho\| \to 0 \quad \text{as } t \to \infty, \tag{30}$$

which proves property D.5. $\blacksquare$

## 5. Estimation and control

Having defined the estimator $\rho_t$, we now introduce an estimated dynamic programming procedure with which we can construct optimal policies for the inventory system.

Observe that, for each $t \ge 0$, from (14),

$$\int_0^{\infty} W[(I + q - s)^+]\,\rho_t(s)\,ds \le \beta\, W(I), \qquad (I, q) \in \mathbf{I} \times \mathbf{Q}. \tag{31}$$

Now, we define the estimated one-stage cost function

$$c_t(I, q) := cq + h\int_0^{I+q} (I + q - s)\,\rho_t(s)\,ds + b\int_{I+q}^{\infty} (s - I - q)\,\rho_t(s)\,ds = cq + L_t(I + q), \qquad (I, q) \in \mathbf{I} \times \mathbf{Q}, \tag{32}$$

where (see Remark 3.2), for $y = I + q$,

$$L_t(y) := h\int_0^{y} (y - s)\,\rho_t(s)\,ds + b\int_{y}^{\infty} (s - y)\,\rho_t(s)\,ds.$$

In addition, observe that for each $t \ge 0$, $L_t(y)$ is convex and

$$\lim_{y \to \infty} L_t(y) = \infty. \tag{33}$$

We define the sequence of functions $\{V_t\}$ as $V_0 \equiv 0$ and, for $t \ge 1$,

$$V_t(I) = \min_{q \in \mathbf{Q}}\left[c_t(I, q) + \alpha \int_0^{\infty} V_{t-1}[(I + q - s)^+]\,\rho_t(s)\,ds\right] = \min_{I \le y \le Q^* + I}\left[cy + L_t(y) + \alpha \int_0^{\infty} V_{t-1}[(y - s)^+]\,\rho_t(s)\,ds\right] - cI, \qquad I \in \mathbf{I}. \tag{34}$$
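A plug-in version of the recursion (34) can be sketched by replacing the unknown demand law with empirical frequencies computed from the observed demands. The discrete grid, the sample, and the parameter values below are illustrative assumptions, not the chapter's exact scheme.

```python
from collections import Counter

def estimated_value_iteration(samples, c, h, b, alpha, Imax, qmax, n_iter=100):
    """Bellman iteration in which the demand probabilities are the empirical
    frequencies of the observed samples (a plug-in analogue of (34))."""
    n = len(samples)
    dist = [(D, cnt / n) for D, cnt in sorted(Counter(samples).items())]
    V = [0.0] * (Imax + 1)
    for _ in range(n_iter):
        V = [min(sum(p * (c * q
                          + h * max(I + q - D, 0)
                          + b * max(D - I - q, 0)
                          + alpha * V[min(max(I + q - D, 0), Imax)])
                     for D, p in dist)
                 for q in range(qmax + 1))
             for I in range(Imax + 1)]
    return V

# Illustrative integer-valued demand observations.
samples = [1, 2, 2, 1, 2, 1, 1, 2, 1, 1]
V = estimated_value_iteration(samples, c=1.0, h=0.5, b=3.0, alpha=0.9,
                              Imax=8, qmax=8)
```

As more demand observations accumulate, the empirical frequencies approach the true law, mimicking the estimation-and-control loop of this section.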

We can state our main results as follows:

Theorem 5.1. (a) For $t \ge 0$ and $I \in \mathbf{I}$,

$$V_t(I) \le \frac{W(I)}{1 - \alpha\beta}. \tag{35}$$

Therefore, $V_t \in B_W$.

(b) As $t \to \infty$, $E\sup_{(I, q) \in \mathbf{I} \times \mathbf{Q}} \dfrac{|c_t(I, q) - c(I, q)|}{W(I)} \to 0$.

(c) As $t \to \infty$, $E\|V_t - V^*\|_W \to 0$.

(d) For each $t \ge 0$, there exists $K_t \ge 0$ such that the selector $g_t : \mathbf{I} \to \mathbf{Q}$ defined as

$$q_t = g_t(I) := \begin{cases} K_t - I & \text{if } 0 \le I \le K_t, \\ 0 & \text{if } I > K_t, \end{cases}$$

attains the minimum in (34).

Remark 5.2. From [10, Proposition D.7], for each $I \in \mathbf{I}$, there is an accumulation point $g(I) \in \mathbf{Q}$ of the sequence $\{g_t(I)\}$. Hence, there exists a constant $K$ such that

$$g(I) = \begin{cases} K - I & \text{if } 0 \le I \le K, \\ 0 & \text{if } I > K. \end{cases} \tag{36}$$

Theorem 5.3. Let $g$ be the selector defined in (36). Then the stationary policy $\gamma = \{g\}$ is an optimal base-stock policy for the inventory problem.
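As a quick illustration of the base-stock structure, the sketch below minimizes the one-period function $cy + L(y)$ (the myopic case $\alpha = 0$) for exponential demand, and compares the numerical minimizer with the classical critical-fractile solution $F(K) = (b - c)/(b + h)$. The demand law and all parameter values are assumptions made for the example.

```python
import math

def H(y, c, h, b, lam):
    """One-period cost c*y + L(y) for Exp(lam) demand, via the closed forms
    E(y-D)^+ = y - (1 - exp(-lam*y))/lam and E(D-y)^+ = exp(-lam*y)/lam."""
    return (c * y
            + h * (y - (1.0 - math.exp(-lam * y)) / lam)
            + b * math.exp(-lam * y) / lam)

def base_stock_level(c, h, b, lam, ymax=20.0, n=200000):
    """Grid search for the minimizer K of the convex function H."""
    best_y, best_v = 0.0, H(0.0, c, h, b, lam)
    for k in range(1, n + 1):
        y = k * ymax / n
        v = H(y, c, h, b, lam)
        if v < best_v:
            best_y, best_v = y, v
    return best_y

c, h, b, lam = 1.0, 2.0, 5.0, 1.0
K = base_stock_level(c, h, b, lam)
K_closed = math.log((h + b) / (h + c)) / lam  # critical-fractile solution
```

The induced policy orders $g(I) = \max\{K - I, 0\}$, which has exactly the form in (36).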

## 6. Proofs

### 6.1 Proof of Lemma 3.3

Note that, for each $(I, q) \in \mathbf{I} \times \mathbf{Q}$,

$$c(I, q) \le cQ^* + h(I + Q^*) + b\bar{D} = (c + h)Q^* + hI + b\bar{D} \le M\, G(I), \tag{37}$$

where $M := \max\{(c + h)Q^* + b\bar{D},\, h\}$ and $G(I) := I + 1$. Moreover, for every density function $\mu$ on $[0, \infty)$ and $(I, q) \in \mathbf{I} \times \mathbf{Q}$,

$$\int_0^{\infty} G[(I + q - s)^+]\,\mu(s)\,ds \le G(I) + Q^*. \tag{38}$$

On the other hand, we define the sequence of functions $\{w_t\}$, $w_t : \mathbf{I} \to \mathbb{R}$, as

$$w_0(I) := 1 + M\, G(I) \tag{39}$$

and, for $t \ge 1$ and any density function $\mu$ on $[0, \infty)$,

$$w_t(I) := \sup_{q \in \mathbf{Q}} \int_0^{\infty} w_{t-1}[(I + q - s)^+]\,\mu(s)\,ds.$$

Observe that, for each $I \in \mathbf{I}$,

$$w_1(I) = \sup_{q \in \mathbf{Q}} \int_0^{\infty} \left[1 + M\, G((I + q - s)^+)\right]\mu(s)\,ds \le 1 + M\, G(I) + M Q^*.$$

Thus,

$$w_2(I) \le \sup_{q \in \mathbf{Q}} \int_0^{\infty} \left[1 + M\, G((I + q - s)^+) + M Q^*\right]\mu(s)\,ds \le 1 + M\, G(I) + M Q^* + M Q^*, \qquad I \in \mathbf{I}.$$

In general, it is easy to see that, for each $I \in \mathbf{I}$,

$$w_t(I) \le M\, G(I) + 1 + \sum_{j=0}^{t-1} M Q^* = M\, G(I) + 1 + M Q^* t. \tag{40}$$

Let $\alpha_0 \in (\alpha, 1)$ be arbitrary, and define

$$W(I) := \sum_{t=0}^{\infty} \alpha_0^t\, w_t(I). \tag{41}$$

Then, from (40),

$$W(I) \le \sum_{t=0}^{\infty} \alpha_0^t \left[M\, G(I) + 1 + M Q^* t\right] = \left[M\, G(I) + 1\right]\sum_{t=0}^{\infty} \alpha_0^t + M Q^* \sum_{t=0}^{\infty} t\,\alpha_0^t = \frac{M\, G(I) + 1}{1 - \alpha_0} + \frac{M Q^* \alpha_0}{(1 - \alpha_0)^2}. \tag{42}$$

Therefore, $W(I) < \infty$ for each $I \in \mathbf{I}$, and since $w_0 > 1$, from (41),

$$W(I) > 1. \tag{43}$$

Furthermore, using (42) and the fact that $W \ge w_0$, a straightforward calculation shows that

$$\varphi := \sup_{(I, q, s) \in \mathbf{I} \times \mathbf{Q} \times [0, \infty)} \frac{W[(I + q - s)^+]}{W(I)} < \infty. \tag{44}$$

Now, from (37) and (39), $c(I, q) \le w_0(I)$, which yields, for all $(I, q) \in \mathbf{I} \times \mathbf{Q}$,

$$c(I, q) \le W(I). \tag{45}$$

In addition, for every density function $\mu$ on $[0, \infty)$ and $(I, q) \in \mathbf{I} \times \mathbf{Q}$,

$$\begin{aligned}
\int_0^{\infty} W[(I + q - s)^+]\,\mu(s)\,ds &= \int_0^{\infty} \sum_{t=0}^{\infty} \alpha_0^t\, w_t[(I + q - s)^+]\,\mu(s)\,ds = \sum_{t=0}^{\infty} \alpha_0^t \int_0^{\infty} w_t[(I + q - s)^+]\,\mu(s)\,ds \\
&\le \sum_{t=0}^{\infty} \alpha_0^t\, w_{t+1}(I) = \alpha_0^{-1}\left[\sum_{t=0}^{\infty} \alpha_0^t\, w_t(I) - w_0(I)\right] = \alpha_0^{-1}\left[W(I) - w_0(I)\right] \le \alpha_0^{-1}\, W(I).
\end{aligned}$$

Therefore, defining $\beta := \alpha_0^{-1}$, we have $0 < \alpha\beta < 1$ and

$$\int_0^{\infty} W[(I + q - s)^+]\,\mu(s)\,ds \le \beta\, W(I), \qquad (I, q) \in \mathbf{I} \times \mathbf{Q},$$

which, together with (43), (44), and (45), proves Lemma 3.3. $\blacksquare$

### 6.2 Proof of Theorem 5.1

(a) Since $\int_0^{\infty} s\,\rho_t(s)\,ds \le \bar{D}$, from (32) (see (37)) we get $c_t(I, q) \le M\, G(I)$ for each $t \ge 0$, $(I, q) \in \mathbf{I} \times \mathbf{Q}$. Hence, it is easy to see that $c_t(I, q) \le W(I)$ for each $(I, q) \in \mathbf{I} \times \mathbf{Q}$ (see (45)). Then $V_1(I) \le W(I)$, and from (31), applying induction arguments, we get

$$V_t(I) \le \frac{W(I)}{1 - \alpha\beta}, \qquad t \ge 0, \quad I \in \mathbf{I}. \tag{46}$$

(b) Observe that from (39), for each $I \in \mathbf{I}$,

$$W(I) \ge w_0(I) = 1 + M\, G(I),$$

which implies that (see (43))

$$\frac{M\, G(I)}{W(I)} \le 1 - \frac{1}{W(I)} < 1. \tag{47}$$

In addition, from (37),

$$h(I + Q^*) \le M\, G(I). \tag{48}$$

On the other hand, similarly to (24), from (4), it is easy to see that

$$\int_0^{\infty} s\,|\rho_t(s) - \rho(s)|\,ds \le 2^{1/2} M' \left(\int_0^{\infty} |\rho_t(s) - \rho(s)|\,ds\right)^{1/2}, \tag{49}$$

for some constant $M' < \infty$. Hence, combining (47)–(49), from the definitions of $c_t(I, q)$ and $c(I, q)$, we have

$$\begin{aligned}
\frac{|c_t(I, q) - c(I, q)|}{W(I)} &\le \frac{h}{W(I)} \int_0^{\infty} (I + Q^*)\,|\rho_t(s) - \rho(s)|\,ds + \frac{b}{W(I)} \int_0^{\infty} s\,|\rho_t(s) - \rho(s)|\,ds \\
&\le \frac{M\, G(I)}{W(I)} \int_0^{\infty} |\rho_t(s) - \rho(s)|\,ds + b\, 2^{1/2} M' \left(\int_0^{\infty} |\rho_t(s) - \rho(s)|\,ds\right)^{1/2}.
\end{aligned}$$

Finally, taking expectation, (28) and property D.4 prove the result.

(c) For each $I \in \mathbf{I}$ and $t \ge 0$, by adding and subtracting the term $\alpha \int_0^{\infty} V_{t-1}[(I + q - s)^+]\,\rho(s)\,ds$, we have

$$\begin{aligned}
|V_t(I) - V^*(I)| &\le \sup_{q \in \mathbf{Q}} |c_t(I, q) - c(I, q)| + \sup_{q \in \mathbf{Q}}\left[\alpha\left|\int_0^{\infty} V_{t-1}[(I + q - s)^+]\,(\rho_t(s) - \rho(s))\,ds\right|\right. \\
&\qquad \left. +\, \alpha \int_0^{\infty} \left|V_{t-1}[(I + q - s)^+] - V^*[(I + q - s)^+]\right|\rho(s)\,ds\right] \\
&\le \sup_{q \in \mathbf{Q}} |c_t(I, q) - c(I, q)| + \frac{\alpha}{1 - \alpha\beta} \sup_{q \in \mathbf{Q}}\left|\int_0^{\infty} W[(I + q - s)^+]\,(\rho_t(s) - \rho(s))\,ds\right| + \alpha\beta\, \|V_{t-1} - V^*\|_W\, W(I),
\end{aligned}$$

where the last inequality is due to (35), (17), (14), and (15). Therefore, from (15) and (19), taking expectation,

$$E\|V_t - V^*\|_W \le E\sup_{(I, q) \in \mathbf{I} \times \mathbf{Q}} \frac{|c_t(I, q) - c(I, q)|}{W(I)} + \frac{\alpha}{1 - \alpha\beta}\, E\|\rho_t - \rho\| + \alpha\beta\, E\|V_{t-1} - V^*\|_W. \tag{50}$$

Finally, from (17) and (35), $\eta := \limsup_{t \to \infty} E\|V^* - V_t\|_W < \infty$. Hence, taking $\limsup$ on both sides of (50), from part (b) and property D.5 in Theorem 4.1, we get $\eta \le \alpha\beta\,\eta$, which yields $\eta = 0$ (since $0 < \alpha\beta < 1$). This proves (c).

(d) For each $t \ge 0$, let $H_t : \mathbf{I} \to \mathbb{R}$ be the function defined as

$$H_t(y) := cy + L_t(y) + \alpha \int_0^{\infty} V_{t-1}[(y - s)^+]\,\rho_t(s)\,ds.$$

Hence, (34) is equivalent to

$$V_t(I) = \min_{q \in \mathbf{Q}} H_t(I + q) - cI, \qquad I \in \mathbf{I}. \tag{51}$$

Moreover (see (33)), observe that $H_t$ is convex and $\lim_{y \to \infty} H_t(y) = \infty$. Thus, there exists a constant $K_t \ge 0$ such that

$$H_t(K_t) = \min_{y \in \mathbf{I}} H_t(y),$$

and

$$g_t(I) = \begin{cases} K_t - I & \text{if } 0 \le I \le K_t, \\ 0 & \text{if } I > K_t, \end{cases}$$

attains the minimum in (51). $\blacksquare$

### 6.3 Proof of Theorem 5.3

We fix an arbitrary $I \in \mathbf{I}$. Since $g(I)$ is an accumulation point of $\{g_t(I)\}$ (see Remark 5.2), there exists a subsequence $\{t_m\}$ of $\{t\}$ ($t_m = t_m(I)$) such that

$$g_{t_m}(I) \to g(I) \quad \text{as } m \to \infty.$$

Moreover, from (34) and Theorem 5.1(d), writing $m$ instead of $t_m$ to ease the notation, we have

$$V_m(I) = c_m(I, g_m(I)) + \alpha \int_0^{\infty} V_{m-1}[(I + g_m(I) - s)^+]\,\rho_m(s)\,ds. \tag{52}$$

On the other hand, following arguments similar to those in the proof of Theorem 5.1(c), for each $m \ge 0$ and $(I, q) \in \mathbf{I} \times \mathbf{Q}$, we have

$$\begin{aligned}
&\left|\alpha \int_0^{\infty} V_{m-1}[(I + q - s)^+]\,\rho_m(s)\,ds - \alpha \int_0^{\infty} V^*[(I + q - s)^+]\,\rho(s)\,ds\right| \\
&\qquad \le \alpha \int_0^{\infty} \left|V_{m-1}[(I + q - s)^+] - V^*[(I + q - s)^+]\right|\rho_m(s)\,ds + \alpha\left|\int_0^{\infty} V^*[(I + q - s)^+]\,(\rho_m(s) - \rho(s))\,ds\right| \\
&\qquad \le \alpha\beta\, \|V_{m-1} - V^*\|_W\, W(I) + \frac{\alpha}{1 - \alpha\beta}\, \|\rho_m - \rho\|\, W(I).
\end{aligned}$$

Then, for each $I \in \mathbf{I}$,

$$E\sup_{q \in \mathbf{Q}}\left|\alpha \int_0^{\infty} V_{m-1}[(I + q - s)^+]\,\rho_m(s)\,ds - \alpha \int_0^{\infty} V^*[(I + q - s)^+]\,\rho(s)\,ds\right| \to 0 \quad \text{as } m \to \infty. \tag{53}$$

Now, writing $g_m := g_m(I)$,

$$\alpha \int_0^{\infty} V_{m-1}[(I + g_m - s)^+]\,\rho_m(s)\,ds = \alpha \int_0^{\infty} V_{m-1}[(I + g_m - s)^+]\,\rho_m(s)\,ds - \alpha \int_0^{\infty} V^*[(I + g_m - s)^+]\,\rho(s)\,ds + \alpha \int_0^{\infty} V^*[(I + g_m - s)^+]\,\rho(s)\,ds. \tag{54}$$

Taking expectation and $\liminf$ as $m \to \infty$ on both sides of (54), from (53) we obtain

$$\liminf_{m \to \infty} \alpha \int_0^{\infty} V_{m-1}[(I + g_m - s)^+]\,\rho_m(s)\,ds = \liminf_{m \to \infty} \alpha \int_0^{\infty} V^*[(I + g_m - s)^+]\,\rho(s)\,ds \ge \alpha \int_0^{\infty} V^*[(I + g(I) - s)^+]\,\rho(s)\,ds,$$

where the last inequality follows by applying Fatou's lemma and because the function $q \mapsto (I + q - s)^+$ is continuous. Hence, taking expectation and $\liminf$ in (52), we obtain

$$c(I, g(I)) + \alpha \int_0^{\infty} V^*[(I + g(I) - s)^+]\,\rho(s)\,ds \le V^*(I), \qquad I \in \mathbf{I}. \tag{55}$$

Since $I$ was arbitrary, by (18), equality holds in (55) for all $I \in \mathbf{I}$. To conclude, standard arguments from the stochastic control literature (see, e.g., [10]) show that the policy $\gamma = \{g\}$ is optimal. $\blacksquare$

## 7. Concluding remarks

In this chapter we have introduced an estimation and control procedure for inventory systems in which the density of the demand is unknown to the inventory manager. Specifically, we have proposed a density estimation method defined by projection onto a suitable set of densities, which, combined with control schemes for inventory systems, yields a procedure to construct optimal ordering policies.

A point to highlight is that our results cover very general scenarios of an inventory system: state and control spaces either countable or uncountable, possibly unbounded costs, and finite or infinite inventory capacity. This generality entailed the need to develop new estimation and control techniques, accompanied by a suitable mathematical analysis. For example, the simple fact of considering possibly unbounded costs led us to formulate a density estimation method related to the weight function $W$, which, in turn, defines the normed linear space $B_W$ (see (15)), all of this through the projection estimator. Observe that if the cost function $c$ is bounded, we can take $W \equiv 1$, in which case the norm $\|\cdot\|$ in (19) is dominated by the $L^1$-norm (see (19) and (25)). Thus, any $L^1$-consistent density estimator $\rho_t$ can be used for the construction of optimal ordering policies.

Finally, the theory presented in this chapter lays the foundations to develop estimation and control algorithms in inventory systems considering other optimality criteria, for instance, the average cost or discounted criteria with random state-action-dependent discount factors (see [14, 15] and references therein).

## References

1. Arrow KJ, Karlin S, Scarf H. Studies in the Mathematical Theory of Inventory and Production. Stanford, CA: Stanford University Press; 1958
2. Bensoussan A, Çakanyıldırım M, Sethi SP. Partially observed inventory systems: The case of zero balance walk. SIAM Journal on Control and Optimization. 2007;46:176-209
3. Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Royal A, Sethi SP. Inventory problems with partially observed demands and lost sales. Journal of Optimization Theory and Applications. 2008;136:321-340
4. Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Sethi SP, Shi R. Partially observed inventory systems: The case of rain checks. SIAM Journal on Control and Optimization. 2008;47(5):2490-2519
5. Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Sethi SP, Shi R. An incomplete information inventory model with presence of inventories or backorders as only observations. Journal of Optimization Theory and Applications. 2010;146(3):544-580
6. Bertsekas DP. Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall; 1987
7. Beyer D, Cheng F, Sethi SP, Taksar MI. Markovian Demand Inventory Models. New York: Springer; 2008
8. Dynkin EB, Yushkevich AA. Controlled Markov Processes. New York: Springer-Verlag; 1979
9. Gordienko EI, Minjárez-Sosa JA. Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion. Kybernetika. 1998;34:217-234
10. Hernández-Lerma O, Lasserre JB. Discrete-Time Markov Control Processes: Basic Optimality Criteria. New York: Springer-Verlag; 1996
11. Hernández-Lerma O, Lasserre JB. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag; 1999
12. Hilgert N, Minjárez-Sosa JA. Adaptive policies for time-varying stochastic systems under discounted criterion. Mathematical Methods of Operations Research. 2001;54(3):491-505
13. Minjárez-Sosa JA. Approximation and estimation in Markov control processes under discounted criterion. Kybernetika. 2004;6(40):681-690
14. Minjárez-Sosa JA. Empirical estimation in average Markov control processes. Applied Mathematics Letters. 2008;21:459-464
15. Minjárez-Sosa JA. Markov control models with unknown random state-action-dependent discount factors. TOP. 2015;23:743-772
