Density Estimation in Inventory Control Systems under a Discounted Optimality Criterion

This chapter deals with a class of discrete-time inventory control systems where the demand process $\{D_t\}$ is formed by independent and identically distributed random variables with unknown density. Our objective is to introduce a suitable density estimation method which, combined with optimal control schemes, defines a procedure to construct optimal policies under a discounted optimality criterion.


Introduction
Inventory systems are among the most studied sequential decision problems in the fields of operations research and operations management. Their origin lies in the problem of determining how much inventory of a certain product should be kept on hand to meet the demand of buyers at a cost as low as possible. Specifically, the question is: How much should be ordered, or produced, to satisfy the demand that will arise during a certain period? Clearly, the behavior of the inventory over time depends on the ordered quantities and the demand for the product in successive periods. Indeed, let $I_t$ and $q_t$ be the inventory level and the order quantity at the beginning of period $t$, respectively, and let $D_t$ be the random demand during period $t$. Then $\{I_t\}_{t \ge 0}$ is a stochastic process whose evolution in time is determined by the order quantities and the demands. Schematically, this process is illustrated in the following figure.
[Figure: Standard inventory system]
In this case, the inventory manager (IM) observes the inventory level $I_t$ and then selects the order quantity $q_t$ as a function of $I_t$. The order quantity process generates costs in the operation of the inventory system. For instance, if the quantity ordered is relatively small, then the items are very likely to be sold out, and there will be unmet demand. In this case the holding cost is reduced, but there is a significant cost due to shortage. Otherwise, if the size of the order is large, there is a risk of having surpluses with a high holding cost. These facts give rise to a stochastic optimization problem, which can be modeled as a Markov decision process (MDP). That is, the inventory system can be analyzed as a stochastic optimal control problem whose objective is to find the optimal ordering policy that minimizes a total expected cost.
The analysis of the control problem associated with inventory systems has been done under several scenarios: discrete-time and continuous-time systems with finite or infinite capacity, inventory systems with bounded and unbounded one-stage cost, as well as partially observable models, among others (see, e.g., [1-5, 7]). Moreover, such scenarios have their own methods and techniques to solve the corresponding control problem. However, in most cases, it has been assumed that all the components that define the behavior of the inventory system are known to the IM, an assumption which, in certain situations, can be too strong and unrealistic. Hence it is necessary to implement schemes that allow learning or collecting information about the unknown components during the evolution of the system, so that each decision is made with as much information as possible.
In this chapter we study a class of inventory control systems where the density of the demand is unknown to the IM. In this sense, our objective is to propose a procedure that combines density estimation methods and control schemes to construct optimal policies under a total expected discounted cost criterion. The estimation and control procedure is illustrated in the following figure.
[Figure: Estimation and control procedure]
In this case, unlike the standard inventory system, before choosing the order quantity $q_t$, the IM implements a density estimation method to get an estimate $\rho_t$ and, possibly, combines this with the history of the system $h_t = (I_0, q_0, D_0, \ldots, I_{t-1}, q_{t-1}, D_{t-1}, I_t)$ to select $q_t = q_t(h_t, \rho_t)$. Specifically, the density of the demand is estimated by the projection of an arbitrary estimator on an appropriate set, and its convergence is stated with respect to a norm which depends on the components of the inventory control model.
In general terms, our approach consists in showing that the inventory system can be studied under the weighted-norm approach, widely used by several authors in the field of Markov decision processes (see, e.g., [11] and references therein) and in adaptive control (see, e.g., [9, 12-14]). That is, we prove the existence of a weight function $W$ which imposes a growth condition on the cost functions. Then, applying the dynamic programming algorithm, the density estimation method is adapted to such a condition to define an estimation and control procedure for the construction of optimal policies.
The chapter is organized as follows. In Section 2 we describe the inventory model and define the corresponding optimal control problem. In Section 3 we introduce the dynamic programming approach under the true density. Next, in Section 4 we present the density estimation method which will be used to state, in Section 5, an estimation and control procedure for the construction of optimal policies. The proofs of the main results are given in Section 6. Finally, in Section 7, we present some concluding remarks.

The inventory model
We consider an inventory system evolving according to the difference equation
$$I_{t+1} = (I_t + q_t - D_t)^+, \quad t = 0, 1, \ldots, \qquad (1)$$
where $(x)^+ := \max\{x, 0\}$, $I_t$ and $q_t$ are the inventory level and the order quantity at the beginning of period $t$, taking values in $\mathcal{I} := [0, \infty)$ and $Q := [0, \infty)$, respectively, and $D_t$ represents the random demand during period $t$. We assume that $\{D_t\}$ is an observable sequence of nonnegative independent and identically distributed (i.i.d.) random variables with a common density $\rho \in L_1([0, \infty))$ which is unknown to the inventory manager. In addition, we assume finite expectation,
$$E[D_t] < \infty. \qquad (2)$$
Moreover, there exists a measurable function $\bar{\rho} \in L_1([0, \infty))$ such that
$$\rho(s) \le \bar{\rho}(s) \qquad (3)$$
almost everywhere with respect to the Lebesgue measure. In addition, a further condition, referred to below as (4), is imposed. For example, if $\bar{\rho}(s) := K \min\{1, 1/s^{1+r}\}$, $s \in [0, \infty)$, for some positive constants $K$ and $r$, then there are plenty of densities that satisfy (3)-(4).
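To make the dynamics concrete, the following minimal sketch simulates a sample path of the inventory level. It assumes the lost-sales form $I_{t+1} = \max(I_t + q_t - D_t, 0)$ of the difference equation, which is consistent with the state space $[0, \infty)$. The order-up-to rule, the exponential demand, and all function names are hypothetical choices made only for illustration; they are not part of the chapter.

```python
import random


def simulate_inventory(initial_level, order_rule, demand_sampler, periods):
    """Return the inventory levels I_0, ..., I_T visited under the given rule.

    Assumes the lost-sales recursion I_{t+1} = max(I_t + q_t - D_t, 0).
    """
    levels = [initial_level]
    level = initial_level
    for _ in range(periods):
        order = order_rule(level)      # q_t chosen from the current level
        demand = demand_sampler()      # D_t drawn independently each period
        level = max(level + order - demand, 0.0)
        levels.append(level)
    return levels


def order_up_to(target):
    """Hypothetical feedback rule: raise the inventory to `target` if below it."""
    return lambda level: max(target - level, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Exponential demand with mean 8 is an illustrative assumption.
    path = simulate_inventory(5.0, order_up_to(10.0),
                              lambda: rng.expovariate(1.0 / 8.0), 20)
    print(path)
```

Any feedback rule $g : [0, \infty) \to [0, \infty)$ can be passed as `order_rule`, so the same function can be reused to compare different ordering policies on the same demand stream.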
The one-stage cost function is defined as
$$\tilde{c}(I, q, D) := cq + h(I + q - D)^+ + b(D - I - q)^+,$$
where $h$, $c$, and $b$ are, respectively, the holding cost per unit, the ordering cost per unit, and the shortage cost per unit, satisfying $b > c$.
The order quantities applied by the IM are selected according to rules known as ordering control policies, defined as follows. Let $H_t$ be the space of histories of the inventory system up to time $t$. That is, a typical element of $H_t$ is written as $h_t = (I_0, q_0, D_0, \ldots, I_{t-1}, q_{t-1}, D_{t-1}, I_t)$. An ordering policy (or simply a policy) $\gamma = \{\gamma_t\}$ is a sequence of measurable functions $\gamma_t : H_t \to Q$ such that $\gamma_t(h_t) = q_t$, $t \ge 0$. We denote by $\Gamma$ the set of all policies. A feedback policy or Markov policy is a sequence $\gamma = \{g_t\}$ of functions $g_t : \mathcal{I} \to Q$ such that $g_t(I_t) = q_t$. A feedback policy $\gamma = \{g_t\}$ is stationary if there exists a function $g : \mathcal{I} \to Q$ such that $g_t = g$ for all $t \ge 0$.
When using a policy $\gamma \in \Gamma$, given the initial inventory level $I_0 = I$, we define the total expected discounted cost as
$$V(\gamma, I) := E^{\gamma}_{I}\left[\sum_{t=0}^{\infty} \alpha^t \tilde{c}(I_t, q_t, D_t)\right], \qquad (6)$$
where $\alpha \in (0, 1)$ is the so-called discount factor. The inventory control problem is then to find an optimal feedback policy $\gamma^*$ such that $V(\gamma^*, I) = V^*(I)$ for all $I \in \mathcal{I}$, where
$$V^*(I) := \inf_{\gamma \in \Gamma} V(\gamma, I)$$
is the optimal discounted cost, which we call the value function.
We define the mean one-stage cost as
$$c(I, q) := cq + h\int_0^{\infty} (I + q - s)^+ \rho(s)\,ds + b\int_0^{\infty} (s - I - q)^+ \rho(s)\,ds. \qquad (8)$$
Then, by using properties of conditional expectation, we can rewrite the total expected discounted cost (6) as
$$V(\gamma, I) = E^{\gamma}_{I}\left[\sum_{t=0}^{\infty} \alpha^t c(I_t, q_t)\right], \qquad (9)$$
where $E^{\gamma}_{I}$ denotes the expectation operator with respect to the probability $P^{\gamma}_{I}$ induced by the policy $\gamma$, given the initial inventory level $I_0 = I$ (see, e.g., [8, 10]).
The sequence of events in our model is as follows. Since the density $\rho$ is unknown, the one-stage cost (8) is also unknown to the IM. Then, if at stage $t$ the inventory level is $I_t = I \in \mathcal{I}$, the IM implements a suitable density estimation method to get an estimate $\rho_t$ of $\rho$. Next, he/she combines this with the history of the system to select an order quantity $q_t = q = \gamma_t^{\rho_t}(h_t) \in Q$. Then a cost $c(I, q)$ is incurred, and the system moves to a new inventory level $I_{t+1} = I' \in \mathcal{I}$ according to the transition law
$$P[I_{t+1} \in B \mid I_t = I, q_t = q] = \int_0^{\infty} 1_B\big((I + q - s)^+\big) \rho(s)\,ds,$$
where $1_B(\cdot)$ denotes the indicator function of the set $B \in \mathcal{B}(\mathcal{I})$, and $\mathcal{B}(\mathcal{I})$ is the Borel $\sigma$-algebra on $\mathcal{I}$. Once the transition to the inventory level $I'$ occurs, the process is repeated. Furthermore, the costs are accumulated according to the discounted cost criterion (9).
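The discounted criterion can be approximated by simulation when a demand sampler is available. The sketch below is only an illustration under stated assumptions: the realized one-stage cost is taken as $cq + h(I + q - D)^+ + b(D - I - q)^+$, the dynamics are taken in lost-sales form, the infinite sum is truncated at a finite `horizon`, and the function names and default values are hypothetical.

```python
def one_stage_cost(level, order, demand, h, c, b):
    """Assumed realized cost: ordering + holding of leftovers + shortage."""
    post_order = level + order
    return (c * order
            + h * max(post_order - demand, 0.0)
            + b * max(demand - post_order, 0.0))


def discounted_cost(initial_level, order_rule, demand_sampler,
                    alpha, h, c, b, horizon=200, runs=2000):
    """Monte Carlo estimate of the total expected discounted cost.

    The sum over t is truncated at `horizon`; for discount factor alpha
    the neglected tail is of order alpha**horizon.
    """
    total = 0.0
    for _ in range(runs):
        level, discount, run_cost = initial_level, 1.0, 0.0
        for _ in range(horizon):
            order = order_rule(level)
            demand = demand_sampler()
            run_cost += discount * one_stage_cost(level, order, demand, h, c, b)
            level = max(level + order - demand, 0.0)  # lost-sales transition
            discount *= alpha
        total += run_cost
    return total / runs
```

For instance, `discounted_cost(0.0, rule, sampler, alpha=0.95, h=1.0, c=2.0, b=5.0)` returns an estimate for a stationary feedback rule `rule`; the estimate carries both truncation and sampling error.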

Dynamic programming equation under the true density ρ
The study of the inventory control problem will be done by means of the well-known dynamic programming (DP) approach, which we now introduce in terms of the unknown density $\rho$. In order to state the ideas precisely, we first present some useful preliminary facts.
The optimal ordering policy can be sought within a restricted set of order quantities, denoted $Q^*$. Thus, we can restrict the range of $q$ so that $q \in Q^*$. Specifically, we have the following result.
[...] That is, $\gamma'$ is a better solution than $\gamma$.
Proof. Let $I'_t$, $t = 0, 1, \ldots$, be the inventory levels generated by the application of $\gamma'$, and let $\{(I_t, q_t)\}$ be the sequence of inventory levels and order quantities generated by $\gamma$. Without loss of generality, we suppose that for a $q > Q^*$ we have $q_0 = q$. Note that $I'_t \le I_t$ for all $t \ge 0$. Then, observing that $cq > bD/(1 - \alpha)$, [...]
Remark 3.2 The mean one-stage cost (8) can be written as $c(I, q) = cq + L(I + q)$, where, by writing $y = I + q$,
$$L(y) := h\int_0^{\infty} (y - s)^+ \rho(s)\,ds + b\int_0^{\infty} (s - y)^+ \rho(s)\,ds.$$
In addition, observe that for any fixed $s \in [0, \infty)$, the functions $y \mapsto (y - s)^+$ and $y \mapsto (s - y)^+$ are convex, which implies that $L(y)$ is convex. Moreover, $\lim_{y \to \infty} L(y) = \infty$.
The following lemma provides a growth property of the one-stage cost function (8).
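Lemma 3.3 There exist a number $\beta$ and a function $W : \mathcal{I} \to [1, \infty)$ such that $0 < \alpha\beta < 1$, and for all $(I, q)$ [...] In addition, for any density $\mu$ on $[0, \infty)$ [...]
The proof of Lemma 3.3 is given in Section 6. We denote by $B_W$ the normed linear space of all measurable functions $u : \mathcal{I} \to \Re$ with finite weighted norm ($W$-norm) $\|\cdot\|_W$ defined as
$$\|u\|_W := \sup_{I \in \mathcal{I}} \frac{|u(I)|}{W(I)}. \qquad (15)$$
Essentially, Lemma 3.3 proves that the inventory system (1) falls within the weighted-norm approach used to study general Markov decision processes (see, e.g., [11]). Hence, we can formulate, on the space $B_W$, important results such as the existence of solutions of the DP equation, the convergence of the value iteration algorithm, and the existence of optimal policies, in the context of the inventory system (1). Indeed, let
$$E^{\gamma}_{I}\left[\sum_{t=0}^{n-1} \alpha^t c(I_t, q_t)\right]$$
be the $n$-stage discounted cost under the policy $\gamma \in \Gamma$ and the initial inventory level $I \in \mathcal{I}$, and consider the corresponding $n$-stage value function, obtained by taking the infimum over $\gamma \in \Gamma$. Then, for all $n \ge 0$ and $I \in \mathcal{I}$, [...] (see, e.g., [6, 10, 11]).
(d) $V^*$ satisfies the dynamic programming equation:
$$V^*(I) = \min_{q \in Q^*}\left[c(I, q) + \alpha\int_0^{\infty} V^*\big((I + q - s)^+\big) \rho(s)\,ds\right], \quad I \in \mathcal{I}.$$
(e) There exists a function $g^* : \mathcal{I} \to Q$ such that $g^*(I) \in Q^*$ and, for each $I \in \mathcal{I}$,
$$V^*(I) = c(I, g^*(I)) + \alpha\int_0^{\infty} V^*\big((I + g^*(I) - s)^+\big) \rho(s)\,ds.$$
Moreover, $\gamma^* = \{g^*\}$ is an optimal control policy.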
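The value iteration algorithm mentioned above can be illustrated on a discretized analogue of the model. The following sketch is not the chapter's general setting: it assumes integer inventory levels capped at a hypothetical `max_level`, an integer-valued demand with a known probability mass function `demand_pmf`, lost-sales dynamics, and the cost structure $cq + L(I + q)$. All names are illustrative.

```python
def value_iteration(demand_pmf, h, c, b, alpha, max_level,
                    tol=1e-8, max_iter=10_000):
    """Value iteration for a discretized lost-sales inventory model.

    demand_pmf[d] is P(D = d) for d = 0, 1, ..., len(demand_pmf) - 1.
    Returns (values, policy), where policy[i] is the order placed at level i.
    """
    n = max_level + 1
    # Expected holding/shortage cost for each post-order level y = I + q.
    expected = [sum(p * (h * max(y - d, 0) + b * max(d - y, 0))
                    for d, p in enumerate(demand_pmf))
                for y in range(n)]
    values = [0.0] * n
    policy = [0] * n
    for _ in range(max_iter):
        # Expected continuation value after demand hits post-order level y.
        cont = [sum(p * values[max(y - d, 0)] for d, p in enumerate(demand_pmf))
                for y in range(n)]
        new_values = []
        for level in range(n):
            best_y = min(range(level, n),
                         key=lambda y: c * (y - level) + expected[y] + alpha * cont[y])
            policy[level] = best_y - level
            new_values.append(c * (best_y - level) + expected[best_y]
                              + alpha * cont[best_y])
        gap = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    return values, policy
```

With deterministic unit demand (`demand_pmf = [0.0, 1.0]`), $h = 1$, $c = 1$, $b = 5$, and $\alpha = 0.9$, ordering one unit per period from an empty inventory costs $1$ per period, so the value at level $0$ approaches $1/(1 - 0.9) = 10$.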

Density estimation
As the density $\rho$ is unknown, the results in Theorem 3.4 are not applicable, and therefore they are not accessible to the IM. In this section we introduce a suitable density estimation method with which we can obtain an estimated DP equation. This will allow us to define a scheme for the construction of optimal policies. To this end, let $D_0, D_1, \ldots, D_t, \ldots$ be independent realizations of the demand whose density is $\rho$.
Theorem 4.1 There exists an estimator $\rho_t(s) := \rho_t(s; D_0, D_1, \ldots, D_{t-1})$, $s \in [0, \infty)$, of $\rho$, such that (see (2) and (3)):
D.1. $\rho_t \in L_1([0, \infty))$ is a density.
D.2. $\rho_t(\cdot) \le \bar{\rho}(\cdot)$ a.e. with respect to the Lebesgue measure.
[...] for measurable functions $\mu$ on $[0, \infty)$. It is worth noting that for any density $\mu$ on $[0, \infty)$ satisfying (14), the norm $\|\mu\|$ is finite. The remainder of the section is devoted to proving Theorem 4.1.
We define the set $\mathcal{D} \subset L_1([0, \infty))$ as: [...] Observe that $\rho \in \mathcal{D}$.
Lemma 4.2 The set $\mathcal{D}$ is closed and convex in $L_1([0, \infty))$.
Proof. The convexity of $\mathcal{D}$ follows directly. To prove that $\mathcal{D}$ is closed, let $\{\mu_t\}$ be a sequence in $\mathcal{D}$ such that $\mu_t \to \mu$ in $L_1$.
Suppose that there is $A \subset [0, \infty)$ with $m(A) > 0$ such that $\mu(s) > \bar{\rho}(s)$, $s \in A$, $m$ being the Lebesgue measure on $\Re$. Then, for some $\varepsilon > 0$ and [...] Combining (21) and (22) we have [...], $t \ge 0$. [...] we obtain that $\mu_t$ does not converge to $\mu$ in measure, which is a contradiction to the convergence in $L_1$. Therefore $\mu(s) \le \bar{\rho}(s)$ a.e.
On the other hand, applying Hölder's inequality and using the fact that $\bar{\rho} \in L_1([0, \infty))$, from (20), [...] which implies $\int_0^{\infty} \mu(s)\,ds = 1$. Now, as $\mu \ge 0$ a.e., we have that $\mu$ is a density. Similarly, from (4), [...] This proves that $\mathcal{D}$ is closed. ∎
Let $\hat{\rho}_t(s) := \hat{\rho}_t(s; D_0, D_1, \ldots, D_t)$, $s \in [0, \infty)$, be an arbitrary estimator of $\rho$ such that [...] Lemma 4.2 ensures the existence of the estimator $\rho_t$, which is defined by the projection of $\hat{\rho}_t$ on the set of densities $\mathcal{D}$. That is, the density $\rho_t \in \mathcal{D}$, expressed as [...] is the "best approximation" of the estimator $\hat{\rho}_t$ on the set $\mathcal{D}$; that is, $\rho_t$ minimizes the distance to $\hat{\rho}_t$ over $\mathcal{D}$ [...]
Now observe that $\rho_t$ satisfies the properties D.1, D.2, and D.3. Hence, Theorem 4.1 will be proved if we show that $\rho_t$ satisfies D.4 and D.5. To this end, since $\rho \in \mathcal{D}$, from (26) observe that [...] which implies that, from (25), [...] That is, $\rho_t$ satisfies property D.4. In fact, since [...] Now, to obtain property D.5, observe that from (12) $\|\rho_t - \rho\| = \sup$ [...] Therefore, property D.4 yields [...] which proves property D.5.
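The projection step can be illustrated numerically on a grid. The sketch below swaps in a simpler technique than the one used in the chapter: it computes the Euclidean ($L_2$) projection of a grid-valued preliminary estimate onto the set of grid functions $\mu$ with $0 \le \mu \le \bar{\rho}$ and unit total mass. It enforces only the domination and normalization constraints; any further conditions defining the set of densities, and the norm used in the chapter, are not reproduced. The function name and arguments are hypothetical.

```python
def project_onto_dominated_densities(estimate, bound, step, iterations=100):
    """Euclidean projection of a grid estimate onto {0 <= mu <= bound, sum(mu)*step = 1}.

    The projection has the form mu_i = clip(estimate_i - shift, 0, bound_i);
    the scalar `shift` is found by bisection, since the total mass is
    nonincreasing in `shift`.
    """
    if sum(bound) * step < 1.0:
        raise ValueError("dominating function has total mass below 1 on this grid")

    def clipped(shift):
        return [min(max(e - shift, 0.0), u) for e, u in zip(estimate, bound)]

    low = min(e - u for e, u in zip(estimate, bound))  # mass equals sum(bound)*step >= 1
    high = max(estimate)                               # mass equals 0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if sum(clipped(mid)) * step > 1.0:
            low = mid
        else:
            high = mid
    return clipped(0.5 * (low + high))
```

In this sketch `estimate` would typically hold the values of a histogram or kernel estimator on a uniform grid with cell width `step`, and `bound` the values of the dominating function on the same grid. The output is nonnegative, dominated by `bound`, and integrates to one up to the bisection tolerance.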

Estimation and control
Having defined the estimator $\rho_t$, we now introduce an estimated dynamic programming procedure with which we can construct optimal policies for the inventory system.
Observe that for each $t \ge 0$, from (14), [...] Now, we define the estimated one-stage cost function, obtained from the mean one-stage cost (8) by replacing $\rho$ with $\rho_t$, where (see Remark 3.2) for $y = I + q$,
$$L_t(y) := h\int_0^{\infty} (y - s)^+ \rho_t(s)\,ds + b\int_0^{\infty} (s - y)^+ \rho_t(s)\,ds.$$
In addition, observe that for each $t \ge 0$, $L_t(y)$ is convex and $\lim_{y \to \infty} L_t(y) = \infty$. We define the sequence of functions $\{V_t\}$ as $V_0 \equiv 0$, and for $t \ge 1$ [...] (34)
We can state our main results as follows:
Theorem 5.1 (a) For $t \ge 0$ and $I \in \mathcal{I}$, [...] Therefore, $V_t \in B_W$. (b) For each $t \ge 0$, there exists $K_t \ge 0$ such that the selector $g_t : \mathcal{I} \to Q$ defined as [...] attains the minimum in (34).
Remark 5.2 From [10, Proposition D.7], for each $I \in \mathcal{I}$, there is an accumulation point $g_{\infty}(I) \in Q^*$ of the sequence $\{g_t(I)\}$. Hence, there exists a constant $K^*$ such that [...] (36)
Theorem 5.3 Let $g_{\infty}$ be the selector defined in (36). Then the stationary policy $\gamma^* := \{g_{\infty}\}$ is an optimal base stock policy for the inventory problem.
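As an illustration of how the estimated expected holding and shortage cost can be evaluated once an estimate $\rho_t$ is available on a grid, the following sketch approximates $h\,E(y - D)^+ + b\,E(D - y)^+$ by a midpoint rule. It assumes this form for $L_t$ and a density given by its values on cells of equal width; the function name and arguments are hypothetical.

```python
def estimated_expected_cost(post_order_level, density, step, h, b):
    """Midpoint-rule approximation of h*E(y - D)^+ + b*E(D - y)^+.

    density[i] is the value of the estimated density on the i-th grid cell
    [i*step, (i+1)*step); post_order_level is y = I + q.
    """
    total = 0.0
    for index, value in enumerate(density):
        demand = (index + 0.5) * step  # midpoint of the grid cell
        holding = max(post_order_level - demand, 0.0)
        shortage = max(demand - post_order_level, 0.0)
        total += (h * holding + b * shortage) * value * step
    return total
```

For a uniform density on $[0, 1]$ with $h = 1$ and $b = 3$, the exact value at $y = 0.5$ is $0.125 + 3 \cdot 0.125 = 0.5$, which the midpoint rule reproduces when $y$ falls on a cell boundary. Evaluating the function over a range of $y$ gives a numerical check of the convexity of $L_t$.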

Concluding remarks
In this chapter we have introduced an estimation and control procedure for inventory systems when the density of the demand is unknown to the inventory manager. Specifically, we have proposed a density estimation method defined by the projection onto a suitable set of densities, which, combined with control schemes for the inventory system, defines a procedure to construct optimal ordering policies.
A point to highlight is that our results include the most general scenarios of an inventory system, e.g., state and control spaces either countable or uncountable, possibly unbounded costs, finite or infinite inventory capacity. This generality entailed the need to develop new estimation and control techniques, accompanied by a suitable mathematical analysis. For example, the simple fact of considering possibly unbounded costs led us to formulate a density estimation method related to the weight function $W$, which, in turn, defines the normed linear space $B_W$ (see (15)), all this through the projection estimator. Observe that if the cost function $c$ is bounded, we can take $W \equiv 1$ and we have $\|\cdot\| = \|\cdot\|_{L_1}$ (see (19) and (25)). Thus, any $L_1$-consistent density estimator $\rho_t$ can be used for the construction of optimal ordering policies.
Finally, the theory presented in this chapter lays the foundations to develop estimation and control algorithms in inventory systems considering other optimality criteria, for instance, the average cost or discounted criteria with random state-action-dependent discount factors (see [14, 15] and references therein).