Open access peer-reviewed chapter

Using Dynamic Programming Based on Bayesian Inference in Selection Problems

By Mohammad Saber Fallah Nezhad

Submitted: October 30th 2013Reviewed: November 20th 2013Published: April 29th 2014

DOI: 10.5772/57423

Downloaded: 1200

1. Introduction

An important subject in mathematical science that causes new improvements in data analysis is sequential analysis. In this type of analysis, the number of required observations is not fixed in advance, but is a variable and depends upon the values of the gathered observation. In sequential analysis, at any stage of data gathering process, to determine the number of required observations at the next stage, we analyze the data at hand and with respect to the obtained results, we determine how many more observations are necessary. In this way, the process of data gathering is cheaper and the information is used more effectively. In other words, the data gathering process in sequential analysis, in contrast to frequency analysis, is on-line. This idea caused some researches to conduct researches in various statistical aspects (Basseville and Nikiforov[1]).

In this chapter, using the concept of the sequential analysis approach, we develop an innovative Bayesian method designed specifically for the best solution in selection problem. The proposed method adopts the optimization concept of Bayesian inference and the uncertainty of the decision-making method in dynamic programming environment. The proposed algorithm is capable of taking into consideration the quality attributes of uncertain values in determining the optimal solution. Some authors have applied sequential analysis inference in combination with optimal stopping problem to maximize the probability of making correct decision. One of these researches is a new approach in probability distribution fitting of a given statistical data that Eshragh and Modarres [2] named it Decision on Belief (DOB). In this decision-making method, a sequential analysis approach is employed to find the best underlying probability distribution of the observed data. Moreover, Monfared and Ranaeifar [3] and Eshragh and Niaki [4] applied the DOB concept as a decision-making tool in some problems.

Since the idea behind the sequential analysis modeling is completely similar to the decision-making process of a human being in his life, it may perform better than available methods in decision-making problems. In these problems, when we want to make a decision, first we divide all of the probable solution space into smaller subspaces (the solution is one of the subspaces). Then based on our experiences, we assign a probability measure (belief) to each subspace, and finally we update the beliefs and make the decision.

2. An application to determine the best binomial distribution

In the best population selection problem, a similar decision-making process exits. First, the decision space can be divided into several subspaces (one for each population); second, the solution of the problem is one of the subspaces (the best population). Finally, we can assign a belief to each subspace where the belief denotes the performance of the population in term of its parameter. Based upon the updated beliefs in iterations of the data gathering process, we may decide which population possesses the best parameter value.

Consider n independent populations P1,P2,...,Pn, where for each index i=1,2,...,n, population Piis characterized by the value of its parameter of interest pi. Let p[1]...p[n]denote the ordered value of the parameters p1,...,pn. If we assume that the exact pairing between the ordered and the unordered parameter is unknown, then, a population Piwith pi=p[n]is called the best population.

There are many applications for the best population selection problem. As one application in supply chain environments, one needs to select the supplier among candidates that performs the best in terms of the quality of its products. As another example, in statistical analysis, we need to select a distribution among candidates that fits the collected observations the most. Selecting a production process that is in out-of-control state, selecting the stochastically optimum point of a multi-response problem, etc. are just a few of these applications.

The problem of selecting the best population was studied in papers by Bechhofer and Kulkarni [5] using the indifference zone approach and by Gupta and Panchapakesan [6] employing the best subset selection approach.

2.1. Belief and the approach of its improvement

Assume that there are n available Binomial populations and we intend to select the one with the highest probability of success. Furthermore, in each stage of the data gathering process and for each population, we take an independent sample of size m. Let us define αi,t'and βi,t'to be the observed number of successes and failures of the ith Binomial population in the tth stage (sample) and αi,kand βi,kto be the cumulative observed number of successes and failures of the ith Binomial population up to the kth stage (sample) respectively. In other words, αi,k=t=1kαi,t' and  βi,k=t=1kβi,t'. Then, in the kth stage defining pi,k¯to be the estimated probability of success of the ith population obtained by αi,kkm, referring to Jeffrey’s prior (Nair et al.[7]), for pi,k¯, we take a Beta prior distribution with parameters αi,0=0.5 and βi,0=0.5. Then, using Bayesian inference, we can easily show that the posterior probability density function of pi,k¯is


At stage k of the data gathering process, after taking a sample and observing the numbers of failures and successes, we update the probability distribution function of pi,k¯for each population. To do this, define B(αi,k,βi,k)as a probability measure (called belief) of the ith population to be the best one given αi,kand βi,kas

B(αi,k,βi,k)=Pr{ithpopulation is the best|αi,k,βi,k}E2

We then update the beliefs based on the values of (αi,k,βi,k)for each population in iteration k. If we define B(αi,k1,βi,k1)as the prior belief for each population, in order to update the posterior belief B(αi,k,βi,k), since we may assume that the data are taken independently in each stage, we will have

B(αi,k,βi,k)=Pr{ith Population is the best|(αi,k1,βi,k1)}Pr{(αi,k,βi,k)|ith Population is the best}j=1n[Pr{jth Population is the best|(αj,k1,βj,k1)}Pr{(αj,k,βj,k)|jth Population is the best}]=B(αi,k1,βi,k1)Pr{(αi,k,βi,k)|ith Population is the best}j=1n[B(αj,k1,βj,k1)Pr{(αj,k,βj,k)|jth Population is the best}]E3

From equation (3) we see that to update the beliefs, we need to evaluate Pr{(αi,k,βi,k)|ith Population is the best}   ;   i=1,2,...,nin each decision-making stage. One way to do this is to use

Pr{(αi,k,βi,k)|ith Population is the best}=pi,k¯j=1npj,k¯E4

Then, the probability given in equation (3) will increase when a better population is selected. In the next theorem, we will prove that when the number of decision-making stages goes to infinity this probability converges to one for the best population.

Theorem 1

If the ith population is the best, then LimkB(αi,k,βi,k)=Bi=1.

In order to prove the theorem first we prove the following two lemmas.

Lemma 1:

Define a recursive sequence {Rk,j;j=1,2,...,l}as

Rk,j={cjRk1,ji=1lciRk1,ifork=1,2,3,...Pjfor             k=0E5

where c1,c2,...,and clare different positive constants, j=1lPj=1 , and Pj>0Then, if lj=Limk(Rk,j), there exist at most one non-zero lj.


Suppose there are two nonzero ls>0 and lt>0. Taking the limit on Rk,jas k goes to infinity we have

Limk (Rk,j)=lj=Limk (cjRk1,ji=1lciRk1,i)=cjlji=1lciliE6

Now since ls>0 and lt>0, then by equation (6) we have

ls=cslsi=1lcili      cs=i=1lcili and  lt=ctlti=1lcili      ct=i=1lcili E7

In other words, we conclude cs=ct, which is a contradiction.

Lemma 2:

Sequence Rk,jconverges to one for j=gand converges to zero for jg, where g is an index for the maximum value of cj.


From equation (6), we know that j=1llj=1. Then by lemma 1, we have li=1for only one i. Now suppose that cg=maxj{1...m}{cj}and gi. We will show that this is a contradiction. Consider Hk,i=Rk,gRk,i. By equation (5), we have Hk,i=cgciHk1,i. Since Ho,i>0we will have

Hk,i=cgciHk1,i=(cgci)kH0,i      Limk(Hk,i)=E8

That is a contradiction because Limk(Hk,i)=Limk(Rk,g)Limk(Rk,i)=lgli=0. So lg=1

Now we are ready to prove the convergence property of the proposed method. Taking limit on both sides of equation (3), we will have

LimkB(αi,k,βi,k)=Bi=Limk[B(αi,k1,βi,k1)Pr{(αi,k,βi,k)|ith Population is the best}j=1n[B(αj,k1,βj,k1)Pr{(αj,k,βj,k)|jth Population is the best}] ]  E9

From the law of large numbers, we know that Limkpj,k¯=pj, where pjis the probability of success of the jth population. Hence, using equation (7) we have  Bi=Bipij=1nBjpj. Then assuming population i is the best, i.e., it possesses the largest value of pj’s, by lemma 1 and 2 we conclude that Bi=1and Bjji=0. This concludes the convergence property of the proposed method.

In real-world applications, since there is a cost associated with the data gathering process we need to select the best population in a finite number of decision-making stages. In the next section, we present the proposed decision-making method in the form of a stochastic dynamic programming model in which there is a limited number of decision-making stages available to select the best population.

2.2. A dynamic programming approach

The proposed dynamic programming approach to model the decision-making problem of selecting the best Binomial population is similar to an optimal stopping problem.

Let us assume that to find the best population there is a limited number of stages (s) available. Then, the general framework of the decision-making process in each stage is proposed as:

  1. Take an independent sample of size m from each population.

  2. Calculate the posterior beliefs in terms of the prior beliefs using Bayesian approach.

  3. Select the two biggest beliefs.

  4. Based upon the values of the two biggest beliefs calculate the minimum acceptable belief.

  5. If the maximum belief is more than the minimum acceptable belief, then we can conclude that the corresponding subspace is the optimal one. Otherwise, go to step 1.

In step 3 of the above framework, let populations i and j be the two candidates of being the best populations (it means that the beliefs of populations i and j are the two biggest beliefs) and we have s decision-making stages. If the biggest belief is more than a threshold (minimum acceptable belief)di,j(s), (0di,j(s)1), we select the corresponding subspace of that belief as the solution. Otherwise, the decision-making process continues by taking more observations. We determine the value of di,j(s)such that the belief of making the correct decision is maximized. To do this suppose that for each population a new observation, (αj,k,βj,k), is available at a given stage k. At this stage, we define V(s,di,j(s))to be the expected belief of making the correct decision in s stages when two populations i and j are the candidates for the optimal population. In other words, if we let CS denote the event of making the correct decision, we define Vi,j(s,di,j(s))=E[Bi,j{CS}], where Bi,j{CS}is the belief of making the correct decision. Furthermore, assume that the maximum of Vi,j(s,di,j(s))occurs at di,j*(s). Then, we will have


We denote this optimal point by Vi,j*(s). In other words, Vi,j*(s)=Vi,j(s,di,j*(s)). Moreover, let us define Siand Sjto be the state of selecting population i and j as the candidates for the optimal population, respectively, and NSi,jas the state of choosing neither of these population. Then, by conditioning on the above states, we have


In order to evaluateVi,j*(s), in what follows we will find the belief terms of equation (11).

  1. Bi,j{CS|Si}and Bi,j{CS|Sj}

These are the beliefs of making the correct decision if population i or j is selected as the optimal population, respectively. To make the evaluation easier, we denote these beliefs by Bi,j(i)and Bi,j(j). Then, using equation (2) we have

Bi,j{CS|Si}=Bi,j(i)=B(αi,k1,βi,k1)pk,i¯B(αi,k1,βi,k1)pk,i¯+B(αj,k1,βj,k1)pk,j¯  E12


Bi,j{CS|Sj}=Bi,j(j)=B(αj,k1,βj,k1)pk,j¯B(αj,k1,βj,k1)pk,j¯+B(αi,k1,βi,k1)pk,i¯  E13
  1. Bi,j{Si}and Bi,j{Sj}

These are the beliefs of selecting population i or j as the optimal population, respectively. Regarding the decision-making strategy, we have:

Bi,j(i)=max(Bi,j(i),Bi,j(j)) and Bi,j(i)di,j*(s)E14

Hence, we define event Sias


Since Bi,j(i)+Bi,j(j)=1and that the beliefs are not negative we conclude max{Bi,j(i),Bi,j(j)}0.5. Furthermore, since the decision making is performed based upon the maximum value of the beliefs, without interruption of assumptions, we can change the variation interval of di,j*(s)from [0,1] to [0.5,1]. Now by considering di,j*(s)0.5implicitly, we have Si{Bi,j(i)di,j*(s)}. By similar reasoning Sj{Bi,j(j)di,j*(s)}. Hence


In which, h(di,j*(s))=di,j*(s)B(αi,k1,βi,k1)(1di,j*(s))B(αj,k1,βj,k1).

To evaluate Pr{pj,k¯pi,k¯h(di,j*(s))}in equation (16), let f1(pj,k¯)and f2(pi,k¯)to be the probability distributions of pj,k¯and pi,k¯, respectively. Then,

f2(pi,k¯)=Γ(αi,k+βi,k+1)Γ(αi,k+0.5)Γ(βi,k+0.5)pi,k¯αi,k0.5(1pi,k¯)βi,k0.5 f1(pj,k¯)=Γ(αj,k+βj,k+1)Γ(αj,k+0.5)Γ(βj,k+0.5)pj,k¯αj,k0.5(1pj,k¯)βj,k0.5E17





By change of variables technique, we have:

U=pj,k¯pi,k¯ and V=pi,k¯f(U)=AiAjUαi,k0.501Vαi,k+αj,k(1V)βi,k1(1UV)βj,k0.5dVPr{pj,k¯pi,k¯h(di,j*(s))}=10h(di,j*(s))f(U)dU=1F(h(di,j*(s)))E20

For Bi,j{Si}we have

  1. Bi,j{CS|NSi,j}

Bi,j{CS|NSi,j}is the belief of making the correct decision when none of the subspaces i and j has been chosen as the optimal one. In other words, the maximum beliefs has been less than di,j*(s)and the process of decision-making continues to the next stage. In terms of stochastic dynamic programming approach, the belief of this event is equal to the maximum belief of making the correct decision in (s-1) stages. Since the value of this belief is discounted in the current stage, using discount factor α,

Having all the belief terms of equation (11) evaluated in equations (12), (13), (14), (15), and (16), and knowing that by partitioning the state space we have Bi,j{NSi,j}=1(Bi,j{Si}+Bi,j{Sj}), equation (11) can now be evaluated by substituting.


2.2.1. Making the decision

Assuming that for the two biggest beliefs we have Bi,j(i)Bi,j(j), equation (23) can be written as


For the decision-making problem at hand, three cases may happen

  1. Bi,j(i)<αVi,j*(s1):

In this case, both (Bi,j(i)αVi,j*(s1))and (Bi,j(j)αVi,j*(s1))are negative. Since we are maximizing Vi,j(s,di,j(s)), then the two probability terms in equation (24) must be minimized. This only happens when we let di,j*(s)=1, making the probability terms equal to zero. Now since Bi,j(i)<di,j*(s)=1, we continue to the next stage.

  1. Bi,j(j)>αVi,j*(s1):

In this case, (Bi,j(i)αVi,j*(s1))and (Bi,j(j)αVi,j*(s1))are both positive and to maximize Vi,j(s,di,j(s))we need the two probability terms in equation (24) to be maximized. This only happens when we let di,j*(s)=0.5. Since Bi,j(i)>di,j*(s)=0.5, we select population i as the optimal subspace.

  1. Bi,j(j)αVi,j*(s1)Bi,j(i):

In this case, one of the probability terms in equation (24) has positive coefficient and the other has negative coefficient. In this case, in order to maximize Vi,j(s,di,j(s))we take the derivative as follows.

Substituting equations (20) and (21) in equation (24) we have


Thus following is obtained,


For determining Pr{pj,k¯pi,k¯h(1di,j*(s))}, first using an approximation, we assume that pi,k¯is a constant number equal to its mean, then we have:


Also Pr{pj,k¯pi,k¯h(di,j*(s))}is obtained as follows,


Now following can be resulted,

Vi,j(s,di,j(s))di,j(s)=0(Bi,j(i)αVi,j*(s1))B(αi,k1,βi,k1)(1di,j*(s))2B(αj,k1,βj,k1)1f1(pi,k¯h(1di,j*(s))) =(Bi,j(j)αVi,j*(s1))B(αi,k1,βi,k1)(di,j*(s))2B(αj,k1,βj,k1)f1(pi,k¯h(di,j*(s)))(Bi,j(i)αVi,j*(s1))(Bi,j(j)αVi,j*(s1))=(1di,j*(s)di,j*(s))2(αi,k1+βi,k1)di,j1(s)=1((Bi,j(i)αVi,j*(s1))(Bi,j(j)αVi,j*(s1)))12(αi,k1+βi,k1)+1E29

Now the approximate value of di,j(s)say d1i,j(s)is determined.

Second using another approximation, we assume that pj,k¯is a constant number equal to its mean thus with similar reasoning, following is obtained:


Therefore the approximate optimal value of di,j*(s)can be determined from following equation,


3. An application for fault detection and diagnosis in multivariate statistical quality control environments

3.1. Introduction

In this section, a heuristic threshold policy is applied in phase II of a control charting procedure to not only detect the states of a multivariate quality control system, but also to diagnose the quality characteristic(s) responsible for an out-of-control signal. It is assumed that the in-control mean vector and in-control covariance matrix of the process have been obtained in phase I.

3.2. Background

In a multivariate quality control environment, suppose there are mcorrelated quality characteristics whose means are being monitored simultaneously. Further, assume there is only one observation on the quality characteristics at each iteration of the data gathering process, where the goal is to detect the variable with the maximum mean shift. Let xkibe the observation of the ith quality characteristic, i=1,2,...,m, at iteration k, k=1,2,..., and define the observation vector xk=[xk1,xk2....,xkm]Tand observation matrix Ok=(x1,x2,...,xk). After taking a new observation, xk, define Bi(xk,Ok-1), the probability of variable ito be in an out-of-control state, as


where OOCstands for out-of-control. This probability has been called the belief of variable ito be in out-of-control condition given the observation matrix up to iteration k1and the observation vector obtained at iteration k.

Assuming the observations are taken independently at each iteration, to improve the belief of the process being in an out-of-control state at the kth iteration, based on the observation matrix Ok-1and the new observation vector xk, we have


Then, using the Bayesian rule the posterior belief is:


Since the goal is to detect the variable with the maximum mean shift, only one quality characteristic can be considered out-of-control at each iteration. In this way, there are m1remaining candidates for which m1quality characteristics are in-control. Hence, one can say that the candidates are mutually exclusive and collectively exhaustive. Therefore, using the Bayes' theorem, one can write equation (34) as


When the system is in-control, we assume the mcharacteristics follow a multinormal distribution with mean vector μ=[μ1,μ2,...,μm]Tand covariance matrix


In out-of-control situations, only the mean vector changes and the probability distribution along with the covariance matrix remain unchanged. In latter case, equation (35) is used to calculate the probability of shifts in the process mean μat different iterations. Moreover, in order to update the beliefs at iteration kone needs to evaluate Pr{xk|OOCi}.

The term Pr{xk|OOCi}is the probability of observing xkif only the ith quality characteristic is out-of-control. The exact value of this probability can be determined using the multivariate normal density, Aexp(12(xkμ1i)TΣ-1(xkμ1i)), where μ1idenotes the mean vector in which only the ith characteristic has shifted to an out-of-control condition and Ais a known constant.

Since the exact value of the out-of-control mean vector μ1iis not known a priori, two approximations are used in this research to determine Pr{xk|OOCi}. Note that we do not want to determine the exact probability. Instead, the aim is to have an approximate probability (a belief) on each characteristic being out-of-control. In the first approximation method, define ICito be the event that all characteristics are in-control, and let Pr{xk|ICi}be the conditional probability of observing xkgiven all characteristics are in-control. Further, let xk'=[μ01,...,xki,μ0i+1,...,μ0m]Tin the aforementioned multivariate normal density, so that Pr{xk|ICi}can be approximately evaluated using Pr{xk|ICi}=Pr{x'k|ICi}, where Pr{x'k|ICi}=Aexp(12(xk'μ0)TΣ-1(xk'μ0)). Note that this evaluation is proportional to exp(12(xkiμ0iσi)2), and since it is assumed that characteristic i is under control, no matter the condition of the other characteristics, this approximation is justifiable.

In the second approximation method, we assume Pr{xk|OOCi}1Pr{xk|ICi}. Although it is obvious that Pr{xk|OOCi}is not equal to 1Pr{xk|ICi}, since we only need a belief function to evaluate Pr{xk|OOCi}and also we do not know the exact value of out-of-control mean vector, this approximation is just used to determinePr{xk|OOCi}. Moreover, it can be easily seen that the closer the value of the ith characteristic is to its in-control mean the smaller is Pr{xk|OOCi}as expected. We thus let


where Ris a sufficiently big constant number to ensure the above definition is less than one.

The approximation to Pr{xk|OOCi}in equation (37) has the following two properties:

  • It does not require the value of out-of-control means to be known.

  • The determination of a threshold for the decision-making process (derived later) will be easier.

Niaki and Fallahnezhad [8] defined another equation for the above conditional probability and showed that if a shift occurs in the mean of variable i, thenLimkBi(xk,Ok1)=Bi=1. They proposed a novel method of detection and classification and used simulation to compare its performances with that of existing methods in terms of the average run length for different mean shifts. The results of the simulation study were in favor of their proposed method in almost all shift scenarios. Besides using a different equation, the main difference between the current research and Niaki and Fallahnezhad [8] is that the current work develops a novel heuristic threshold policy, in which to save sampling cost and time or when these factors are constrained, the number of the data gathering stages is limited.

3.3. The proposed procedure

Assuming a limited number of the data gathering stages, N, to detect and diagnose characteristic(s), a heuristic threshold policy-based model is developed in this Section. The framework of the proposed decision-making process follows.

Step I

Define i=1,2,...,mas the set of indices for the characteristics, all of which having the potential of being out-of-control.

Step II

Using the maximum entropy principle, initialize Bi(O0)=1/mas the prior belief of the ith variable to be out-of-control. In other words, at the start of the decision-making process all variables have an equal chance of being out-of-control. Set the discount rateα, the maximum probability of correct selection when Ndecision making stages remainsV(N), and the maximum number of decision making stagesN.

Step III

Set k=0

Step IV

Obtain an observation of the process.

Step V

Estimate the posterior beliefs, Bi(Ok)(fori=1,2,...,m), using equation (35).

Step VI

Obtain the order statistics on the posterior beliefs Bi(Ok)such that


Furthermore, let Bgr(Ok)=B(m)(Ok)and Bsm(Ok)=B(m1)(Ok).

Step VII

Assume the variables with the indices i=grand j=smare the candidates of being out-of-control, where Ndecision-making steps are available. Define V(N,di,j(k))the probability of correct choice between the variables iand j, where di,j(k)is the acceptable belief. Also, define CSthe event of correct selection and event Ei,jthe existence of two out-of-control candidate variables iand j. Then, we have:


where " " means "defined as."

Assuming di,j*(k)the maximum point of V(N,di,j(k)), called the minimum acceptable belief, we have


Let Siand Sjbe the event of selecting iand jas the out-of-control variables, respectively, and NSi,jbe the event of not selecting any. Then, by conditioning on the probability, we have:


At the kth iteration, the conditional bi-variate distribution of the sample means for variables grand sm, i.e, Xk,j=gr,sm|Xk,jgr,sm, is determined using the conditional property of multivariate normal distribution given in appendix 1. Moreover, knowing E(xk,j)=μjand evaluating the conditional mean and standard deviation (see appendix 1) results in




Based on the decomposition method of Mason et al. [9], define statistics Tk,j and Tk,i|jas


Thus, when the process is in-control, the statistics Tk,j and Tk,i|jfollow a standard normal distribution [9].

Now, let Bi,j(i;xk,Ok1)denote the probability of correct selection conditioned on selecting ias the out-of-control variable. Hence,


Then, the probability measure Pri,j{CS|Si}is calculated using the following equation,


The probability measure Pri,j{Si}is defined as the probability of selecting variable ito be out-of-control. Regarding to the explained strategy, we have:


Since Bi,j(i;xk,Ok1)+Bi,j(j;xk,Ok1)=1and the value of beliefs are not negative, we conclude


Without interruption of assumptions, we can change the variation interval of di,j(k)from [0,1] to [0.5,1]. Hence,


By similar reasoning, we have:


The term Pri,j{CS|NSi,j}denotes the probability of correct selection conditioned on excluding the candidates iand jas the solution. In other words, the maximum belief has been less than the threshold (minimum acceptable belief) di,j*(k)and the decision making process continues to the next stage. In terms of stochastic dynamic programming approach, the probability of this event is equal to the maximum probability of correct selection when there are N1stages remaining. The discounted value of this probability in the current stage using the discount factor αequals to α Vi,j(N1). Further, since we partitioned the decision space into events {NSi,j;Si;Sj}, we have:


Now we evaluate Vi,j*(N)as follows,


In other words,


The method of evaluating the minimum acceptable belief dgr,sm*(k)is given in Appendix 2.

Step VIII: The Decision Step

If the belief Bgr,sm(gr;xk,Ok1)in the candidate set (sm,gr)is equal to or greater than dgr,sm*(k)then choose the variable with index grto be out-of-control. In this case, the decision-making process ends. Otherwise, without having any selection at this stage, obtain another observation, lower the number of remaining decision-stages to N1, set k=k+1, and return to step V above. The process will continue until either the stopping condition is reached or the number of stages is finished. The optimal strategy with Ndecision-making stages that maximizes the probability of correct selection would be resulted from this process.

In what follows, the procedure to evaluate Vi,j*(N)of equation (53) is given in detail.

3.4. Method of evaluating Vi,j*(N)

Using di,j*(k)as the minimum acceptable belief, from equation (53) we have


Then, for the decision-making problem at hand, three cases may occur

  1. Bi,j(i;Ok)<αVi,j*(N1)

In this case, both (Bi,j(i;Ok)αVi,j*(N1))and (Bi,j(i;Ok)αVi,j*(N1))are negative. Since we are maximizing Vi,j(N,di,j(k)), the two probability terms in equation (54) must be minimized. This only happens when di,j*(k)=1, making the probability terms equal to zero. In other words, since Bi,j(i;Ok)<di,j*(k)=1, we continue to the next stage.

  1. Bi,j(j;Ok)>αVi,j*(s1)

In this case, (Bi,j(i;Ok)αVi,j*(N1))and (Bi,j(i;Ok)αVi,j*(N1))are both positive and to maximize Vi,j(N,di,j(k))we need the two probability terms in equation (54) to be maximized. This only happens when di,j*(k)=0.5. In other words, since Bi,j(i;Ok)>di,j*(k)=0.5, the variable with the index iis selected.

  1. Bi,j(j;Ok)<αVi,j*(N1)<Bi,j(i;Ok)

In this case, one of the probability terms in equation (54) has a positive and the other a negative coefficient. Then, in order to maximize Vi,j(N,di,j(k)), the first derivative on di,j(k)must be equated to zero. To do this, define h(dgr,sm(k))and r(dgr,sm(k))as follows:


We first present the method of evaluating Pr{Bgr,sm(sm;Ok)dgr,sm(k)}as follows.


Then, the method of evaluating probability terms in equation (57) is given in appendix 2.

With similar reasoning, we have,


The method of determining the minimum acceptable belief is given in appendix 2.

4. An application for fault detection in uni-variate statistical quality control environments

In a uni-variate quality control environment, if we limit ourselves to apply a control charting method, most of the information obtained from data behavior will be ignored. The main aim of a control charting method is to detect quickly undesired faults in the process. However, we may calculate the belief for the process being out-of-control applying Bayesian rule at any iteration in which some observations on the quality characteristic are gathered. Regarding these beliefs and a stopping rule, we may find and specify a control threshold for these beliefs and when the updated belief in any iteration is more than this threshold, an out-of-control signal is observed.

In Decision on Beliefs, first, all probable solution spaces will be divided into several candidates (the solution is one of the candidates), then a belief will be assigned to each candidate considering our experiences and finally, the beliefs are updated and the optimal decision is selected based on the current situation. In a SPC problem, a similar decision-making process exits. First, the decision space can be divided into two candidates; an in-control or out-of-control production process. Second, the problem solution is one of the candidates (in-control or out-of-control process). Finally, a belief is assigned to each candidate so that the belief shows the probability of being a fault in the process. Based upon the updated belief, we may decide about states of the process (in-control or out-of-control process).

4.1. Learning — The beliefs and approach for its improvement

For simplicity, individual observation on the quality characteristic of interest in any iteration of data gathering process was gathered. At iteration k of data gathering process, Ok=(x1,x2,......,xk)was defined as the observation vector where resemble observations for previous iterations 1, 2, …, k. After taking a new observation, Ok-1 the belief of being in an out-of-control state is defined as B(xk,Ok1)=Pr{Outofcontrol|xk,Ok1}. At this iteration, we want to update the belief of being in out-of-control state based on observation vector Ok1and new observation xk. If we define B(Ok1)=B(xk1,Ok2)as the prior belief of an out-of-control state, in order to update the posterior belief B(xk,Ok1), since we may assume that the observations are taken independently in any iteration, then we will have


With this feature, the updated belief is obtained using Bayesian rule:


Since in-control or out-of-control state partition the decision space, we can write equation (60) as


Assuming the quality characteristic of interest follows a normal distribution with mean μ and variance σ2, we use equation (61) to calculate both beliefs for occurring positive or negative shifts in the process mean μ.

  • Positive shifts in the process mean

The values of B+(Ok), showing the probability of occurring a positive shift in the process mean, will be calculated applying equation (61) recursively. Pr{xk|Incontrol}is defined by the following equation,


For positive shift, the probability of being a positive shift in the process at iteration k, Pr{xk|Outofcontrol}, is calculated using equation (63).


where φ(xk)is the cumulative probability distribution function for the normal distribution with mean μ and variance σ2. Above probabilities are not exact probabilities and they are a kind of belief function to ascertain good properties for B+(Ok)

Therefore B+(Ok)is determined by the following equation,

  • Negative shifts in the process mean

The values of B(Ok)denotes the probability of being a negative shift in the process mean that is calculated using equation (61) recursively. In this case, Pr{xk|Incontrol}is defined by the following equation,


Also is Pr{xk|Outofcontrol}calculated using equation (66).


Thus B(Ok)is determined by the following equation,


4.2. A decision on beliefs approach

We present a decision making approach in terms of Stochastic Dynamic Programming approach. Presented approach is like an optimal stopping problem.

Suppose n stages for decision making is remained and two decisions are available.

  • A positive shift is occurred in the process mean

  • No positive shift is occurred in the process mean

Decision making framework is as follows:

  • Gather a new observation.

  • Calculate the posterior Beliefs in terms of prior Beliefs.

  • Order the current Beliefs as an ascending form and choose the maximum.

  • Determine the value of the minimum acceptable belief (d+(n)is the minimum acceptable belief for detecting the positive shift and d(n)is the least acceptable belief for detecting the negative shift)

  • If the maximum Belief was more than the minimum acceptable belief, d+(n), select the belief candidate with maximum value as a solution else go to step 1.

  • In terms of above algorithm, the belief with maximum value is chosen and if this belief was more than a control threshold like d+(n), the candidate of that Belief will be selected as optimal candidate else the sampling process is continued. The objective of this model is to determine the optimal values of d+(n). The result of this process is the optimal strategy with n decision making stages that maximize the probability of correct selection.

Suppose new observation xkis gathered. (k is the number of gathered observations so far). V(n,d+(n))is defined as the probability of correct selection when n decision making stages are remained and we follow d+(n)strategy explained above also V(n)denotes the maximum value of V(n,d+(n))thus,


CS is defined as the event of correct selection. S1 is defined as selecting the out-of-control condition (positive shift) as an optimal solution and S2 is defined as selecting the in-control condition as an optimal decision and NS is defined as not selecting any candidate in this stage.

Hence, using the total probability law, it is concluded that:


Pr{CS|S1}denotes the probability of correct selection when candidate S1 is selected as the optimal candidate and this probability equals to its belief, B+(Ok), and with the same discussion, it is concluded that Pr{CS|S2}=1B+(Ok)

Pr{S1}is the probability of selecting out of control candidate (positive shift) as the solution thus following the decision making strategy, we should have B+(Ok)=max(B+(Ok),1B+(Ok))and B+(Ok)>d+(n)that is equivalent to following,


With the same reasoning, it is concluded that,

  1. Pr{CS|NS}denotes the probability of correct selection when none of candidates has been selected and it means that the maximum value of the beliefs is less than d+(n)and the process of decision making continues to latter stage. As a result, in terms of Dynamic Programming Approach, the probability of this event equals to maximum of probability of correct selection in latter stage (n-1), V(n1), but since taking observations has cost, then the value of this probability in current time is less than its actual value and by using the discounting factor α, it equals αV(n1)

  1. Since the entire solution space is partitioned, it is concluded that Pr{CS|NS}=1(Pr{S1}+Pr{S2})

By the above preliminaries, the function V(n)is determined as follows:


In terms of above equation, V(n,d+(n))is obtained as follows:


Calculation method for V(n,d+(n)):

B+(gr,Ok)and B+(sm,Ok)are defined as follows:


Now equation (73) is rewritten as follows:


There are three conditions:

  1. B+(gr,Ok)<αV(n1)

In this condition, both B+(gr,Ok)αV(n1)and B+(sm,Ok)αV(n1)are negative, thus we should have d+(n)=1in order to maximize V(n,d+(n)). Since B+(gr,Ok)<d+(n)=1, we don’t select any candidate in this condition and sampling process continues.

  1. B+(sm,Ok)>αV(n1)

In this condition, both B+(gr,Ok)αV(n1)and B+(sm,Ok)αV(n1)are positive, thus we should have d+(n)=0.5in order to maximize V(n,d+(n)). since B+(gr,Ok)>d+(n)=0.5, we select the candidate of belief B+(gr,Ok)as the solution.

  1. B+(sm,Ok)<αV(n1)<B+(gr,Ok)

In this condition, one of the probabilities in equation (10) has positive coefficient and one has negative coefficient, to maximize V(n,d+(n)), optimality methods should be applied.

  • Definition: h(d+(n))is defined as follows:


First the value of Pr{B+(Ok)>d+(n)}is determined as follows:


Since φ(xk)is a cumulative distribution function thus it follows a uniform distribution function in interval [0, 1], thus the above equality is concluded.

With the same reasoning, it is concluded that:


Now equation (73) can be written as follows:


And equation (79) can be written as follows:


Since V*(n)=Max0.5<d+(n)<1[V(n,d+(n))]thus it is sufficient to maximize the real value function V(n,d+(n)), therefore; we should find the function value in points where first derivative is equated to zero as follows,


The optimal threshold d+(n)is determined by the above equation. Since the optimal value of d+(n)should be in the interval [0.5, 1] thus it is concluded that the optimal value of d+(n)will be determined as follows:


The above method is presented for detecting the positive shifts in the process mean and can be adapted for detecting the negative shifts with the same reasoning.

The general decision making algorithm is summarized as follows:

  1. Set k=0 and the initial beliefs B+(O0)=0.5,B(O0)=0.5.

  2. Gather an observation and set k=k+1,n=n1.

  3. If n<0, then no shift is occurred in the process mean and decision making stops.

  4. Update the values for the beliefs B(Ok),B+(Ok)by equation (61).

  5. If Min(B+(Ok),1B+(Ok))>αV(n1), then if Max(B+(Ok),1B+(Ok))=B+(Ok), it is concluded that a positive shift is occurred in the process mean and decision making stops, also if Max(B+(Ok),1B+(Ok))=1B+(Ok), then no positive shift is occurred in the process mean and decision making stops.

  6. If Max(B+(Ok),1B+(Ok))<αV(n1), then data is not sufficient for detecting the positive shift and go to stage 2 after checking the occurrence of negative shift in the rest of the algorithm.

  7. If Min(B(Ok),1B(Ok))>αV(n1)then if Max(B(Ok),1B(Ok))=B(Ok)it is concluded that a negative shift is occurred the process mean and decision making stops and if Max(B(Ok),1B(Ok))=1B(Ok), then no negative shift is occurred in the process mean and decision making stops.

  8. If Max(B(Ok),1B(Ok))<αV(n1), then data is not sufficient for detecting the negative shift and go to stage 2.

  9. If Max(B+(Ok),1B+(Ok))>αV(n1)>Min(B+(Ok),1B+(Ok)), then determine the value of d+(n)(minimum acceptable belief for detecting the positive shift) by the following equation:

  1. If Max(B(Ok),1B(Ok))>αV(n1)>Min(B(Ok),1B(Ok)), then determine the value of d(n)(minimum acceptable belief for detecting the negative shift) by the following equation:

  1. If B+(Ok)>d+(n), then a positive shift is occurred and decision making stops, and if (1B+(Ok))>d+(n), then no positive shift is occurred and decision making stops, else go to stage 2 after checking the occurrence of negative shift in rest of the algorithm.

  2. If B(Ok)>d(n), then a negative shift is occurred and decision making stops, and If (1B(Ok))>d(n), then no negative shift is occurred and decision making stops, else go to stage 2.

  3. The approximate value of αV(n1)based on the discount factor αin the stochastic dynamic programming approach is αnV(0).

5. Conclusion

In this chapter, we introduced a new approach to determine the best solution out of m candidates. To do this, first, we defined the belief of selecting the best solution and explained how to model the problem by the Bayesian analysis approach. Second, we clarified the approach by which we improved the beliefs, and proved that it converges to detect the best solution. Next, we proposed a decision-making strategy using dynamic programming approach in which there were a limited number of decision-making stages.

Appendix 1

Conditional Mean and Variance of the Variables

Conditional mean of variables grand smcan be evaluated using the following equation.


where, b2'=ΣxXΣXX-1



Σ: The covariance matrix of the process

Σxx: Submatrix of the covariance matrix Σfor variables j=gr,sm

ΣxX: Submatrix of the covariance matrix Σbetween variables j=gr,smand jgr,sm

ΣXX: Submatrix of the covariance matrix Σfor variables jgr,sm

Further, the conditional covariance matrix of variables j=gr,smon variables jgr,sm, is obtained as Σxx-ΣxXTΣXX-1ΣxX.

Appendix 2

Evaluating the Optimal Value of dgr,sm(k)

Assume (μj)j{1,2,...,m}=0 and (σj)j{1,2,...,m}=1. Then,


Now since (Tk,sm,Tk,gr|sm)follow a standard normal distribution (μj)j{gr,sm}=0 and (σj)j{gr,sm}=1, hence (Tk,gr|sm)2and (Tk,sm)2follow a χ2distribution with one degree of freedom. Then using an approximation, if we assume that (Tk,sm)2is approximately equal to its mean, we have




Now, since (Tk,gr|sm)2χ2(1), we have






Replacing the above equations in equation (53) results in


Now by solving the equation δVi,j(N)δdgr,sm(k)=0, the following equation is obtained.


Finally, the approximate value of dgr,sm(k)say d1gr,sm(k)is determined by solving this equation numerically or by a search algorithm.

Now using another approximation, if we assume that (Tk,gr)2is approximately equal to its mean, the approximate value of dgr,sm(k)say d2gr,sm(k)is determined by solving following equation,


The approximate optimal value of dgr,sm(k)is obtained as follows,


How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Mohammad Saber Fallah Nezhad (April 29th 2014). Using Dynamic Programming Based on Bayesian Inference in Selection Problems, Dynamic Programming and Bayesian Inference, Concepts and Applications, Mohammad Saber Fallah Nezhad, IntechOpen, DOI: 10.5772/57423. Available from:

chapter statistics

1200total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

Related Content

This Book

Next chapter

Bayesian Networks for Supporting Model Based Predictive Control of Smart Buildings

By Alessandro Carbonari, Massimo Vaccarini and Alberto Giretti

Related Book

First chapter

Toward a Better Quality Control of Weather Data

By Kenneth Hubbard, Jinsheng You and Martha Shulski

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More about us