Using Dynamic Programming Based on Bayesian Inference in Selection Problems

Sequential analysis is an important subject in mathematical science that has led to new improvements in data analysis. In this type of analysis, the number of required observations is not fixed in advance; it is a variable that depends on the values of the observations gathered so far. At any stage of the data gathering process, we analyze the data at hand and, based on the results, determine how many more observations are necessary. In this way, the data gathering process is cheaper and the information is used more effectively. In other words, the data gathering process in sequential analysis, in contrast to frequency analysis, is on-line. This idea has motivated research on various statistical aspects (Basseville and Nikiforov [1]).


Introduction
In this chapter, using the concept of sequential analysis, we develop a Bayesian method designed specifically for finding the best solution in selection problems. The proposed method combines the optimization concept of Bayesian inference with the uncertainty of the decision-making method in a dynamic programming environment. The proposed algorithm is capable of taking into consideration quality attributes with uncertain values when determining the optimal solution. Some authors have applied sequential analysis inference in combination with the optimal stopping problem to maximize the probability of making the correct decision. One such study is an approach to fitting a probability distribution to given statistical data, which Eshragh and Modarres [2] named Decision on Belief (DOB). In this decision-making method, a sequential analysis approach is employed to find the best underlying probability distribution of the observed data. Moreover, Monfared and Ranaeifar [3] and Eshragh and Niaki [4] applied the DOB concept as a decision-making tool in some problems.
Since the idea behind sequential analysis modeling closely resembles the way a human being makes decisions, it may perform better than available methods in decision-making problems. In these problems, when we want to make a decision, we first divide the entire probable solution space into smaller subspaces (the solution is one of the subspaces). Then, based on our experience, we assign a probability measure (belief) to each subspace, and finally we update the beliefs and make the decision.

An application to determine the best binomial distribution
In the best population selection problem, a similar decision-making process exists. First, the decision space can be divided into several subspaces (one for each population); second, the solution of the problem is one of the subspaces (the best population). Finally, we can assign a belief to each subspace, where the belief denotes the performance of the population in terms of its parameter. Based upon the updated beliefs in the iterations of the data gathering process, we may decide which population possesses the best parameter value.
Consider n independent populations $P_1, P_2, \ldots, P_n$, where for each index i = 1, 2, ..., n, population $P_i$ is characterized by the value of its parameter of interest $p_i$. Let $p_{[1]} \le \ldots \le p_{[n]}$ denote the ordered values of the parameters $p_1, \ldots, p_n$. If we assume that the exact pairing between the ordered and the unordered parameters is unknown, then a population $P_i$ with $p_i = p_{[n]}$ is called the best population.
There are many applications of the best population selection problem. In supply chain environments, for example, one needs to select, among the candidate suppliers, the one that performs best in terms of the quality of its products. As another example, in statistical analysis, we need to select, among the candidate distributions, the one that best fits the collected observations. Selecting a production process that is in an out-of-control state and selecting the stochastically optimum point of a multi-response problem are further examples.
The problem of selecting the best population was studied in papers by Bechhofer and Kulkarni [5] using the indifference zone approach and by Gupta and Panchapakesan [6] employing the best subset selection approach.

Belief and the approach of its improvement
Assume that there are n available Binomial populations and we intend to select the one with the highest probability of success. Furthermore, in each stage of the data gathering process and for each population, we take an independent sample of size m. Let $\alpha'_{i,t}$ and $\beta'_{i,t}$ be the observed numbers of successes and failures of the i-th Binomial population in the t-th stage (sample), and let $\alpha_{i,k}$ and $\beta_{i,k}$ be the cumulative observed numbers of successes and failures of the i-th Binomial population up to the k-th stage (sample), respectively. In other words, $\alpha_{i,k} = \sum_{t=1}^{k} \alpha'_{i,t}$ and $\beta_{i,k} = \sum_{t=1}^{k} \beta'_{i,t}$, so that $\alpha_{i,k} + \beta_{i,k} = km$. Referring to Jeffreys' prior (Nair et al. [7]), for $p_{i,k}$ we take a Beta prior distribution with parameters $\alpha_{i,0} = 0.5$ and $\beta_{i,0} = 0.5$. Then, using Bayesian inference, we can easily show that the posterior probability density function of $p_{i,k}$ is

$$f(p_{i,k}) = \frac{\Gamma(\alpha_{i,k} + \beta_{i,k} + 1)}{\Gamma(\alpha_{i,k} + 0.5)\,\Gamma(\beta_{i,k} + 0.5)}\; p_{i,k}^{\alpha_{i,k} - 0.5} (1 - p_{i,k})^{\beta_{i,k} - 0.5}. \quad (1)$$

At stage k of the data gathering process, after taking a sample and observing the numbers of failures and successes, we update the probability distribution function of $p_{i,k}$ for each population. To do this, define $B(\alpha_{i,k}, \beta_{i,k})$ as a probability measure (called belief) of the i-th population being the best one given $\alpha_{i,k}$ and $\beta_{i,k}$:

$$B(\alpha_{i,k}, \beta_{i,k}) = \Pr\{\text{population } i \text{ is the best} \mid \alpha_{i,k}, \beta_{i,k}\}. \quad (2)$$

We then update the beliefs based on the values of $(\alpha_{i,k}, \beta_{i,k})$ for each population in iteration k.
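If we define $B(\alpha_{i,k-1}, \beta_{i,k-1})$ as the prior belief for each population, then, since we may assume that the data are taken independently in each stage, the posterior belief $B(\alpha_{i,k}, \beta_{i,k})$ follows from Bayes' rule:

$$B(\alpha_{i,k}, \beta_{i,k}) = \frac{B(\alpha_{i,k-1}, \beta_{i,k-1}) \Pr\{(\alpha_{i,k}, \beta_{i,k}) \mid \text{population } i \text{ is the best}\}}{\sum_{j=1}^{n} B(\alpha_{j,k-1}, \beta_{j,k-1}) \Pr\{(\alpha_{j,k}, \beta_{j,k}) \mid \text{population } j \text{ is the best}\}}. \quad (3)$$

From equation (3) we see that to update the beliefs, we need to evaluate the likelihood terms $\Pr\{(\alpha_{i,k}, \beta_{i,k}) \mid \text{population } i \text{ is the best}\}$, i = 1, 2, ..., n (equation (4)). Then, the probability given in equation (3) will increase when a better population is selected. In the next theorem, we prove that when the number of decision-making stages goes to infinity, this probability converges to one for the best population.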
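As a rough illustration of the belief defined above, the following Python sketch accumulates the success and failure counts stage by stage and then estimates the probability that each population has the largest success probability. It is only a sketch under stated assumptions: instead of the recursive update of equation (3), it uses a plain Monte Carlo estimate drawn from the Jeffreys posteriors Beta(α + 0.5, β + 0.5), and the function names, the sample data, and the number of draws are all hypothetical.

```python
import random

def update_counts(alpha, beta, successes, failures):
    """Add one stage's observed successes/failures to the cumulative counts."""
    return alpha + successes, beta + failures

def estimate_beliefs(counts, draws=20000, seed=0):
    """Monte Carlo estimate of Pr{population i has the largest p | data}.

    counts: list of (alpha_ik, beta_ik) cumulative successes and failures.
    Each p_i is drawn from its Jeffreys posterior Beta(alpha + 0.5, beta + 0.5).
    This replaces the recursive update of equation (3) and is an assumption.
    """
    rng = random.Random(seed)
    wins = [0] * len(counts)
    for _ in range(draws):
        sample = [rng.betavariate(a + 0.5, b + 0.5) for a, b in counts]
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]

# Three populations, two stages of m = 10 trials each (made-up data).
stages = [[(6, 4), (5, 5), (8, 2)],
          [(5, 5), (4, 6), (9, 1)]]
counts = [(0, 0)] * 3
for stage in stages:
    counts = [update_counts(a, b, s, f) for (a, b), (s, f) in zip(counts, stage)]

beliefs = estimate_beliefs(counts)
print(counts)
print(beliefs)
```

With these made-up counts the third population collects most of the belief, which is the behaviour the belief measure is meant to capture.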

Theorem 1
If the i-th population is the best, then $\lim_{k \to \infty} B(\alpha_{i,k}, \beta_{i,k}) = 1$. In order to prove the theorem, we first prove the following two lemmas.

Proof:
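Suppose there are two nonzero limits $l_s > 0$ and $l_t > 0$, where $l_j$ denotes the limit of $R_{k,j}$. Taking the limit of $R_{k,j}$ as k goes to infinity, we have

$$l_j = \frac{c_j\, l_j}{\sum_{r=1}^{m} c_r\, l_r}, \quad j = 1, \ldots, m. \quad (6)$$

Now since $l_s > 0$ and $l_t > 0$, equation (6) gives $c_s = \sum_{r=1}^{m} c_r l_r = c_t$. In other words, we conclude $c_s = c_t$, which is a contradiction.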

Lemma 2:
The sequence $R_{k,j}$ converges to one for j = g and converges to zero for j ≠ g, where g is the index of the maximum value of $c_j$.

Proof
From equation (6), we know that $\sum_{j=1}^{m} l_j = 1$. Then, by Lemma 1, we have $l_i = 1$ for only one i. Now suppose that $c_g = \max_{j \in \{1, \ldots, m\}} \{c_j\}$ and g ≠ i. This assumption leads to a contradiction (equation (7)), so that i = g.

Now we are ready to prove the convergence property of the proposed method. Taking the limit on both sides of equation (3) as k goes to infinity gives a limiting relation among the beliefs $B_j = \lim_{k \to \infty} B(\alpha_{j,k}, \beta_{j,k})$. From the law of large numbers, we know that $\lim_{k \to \infty} p_{j,k} = p_j$, where $p_j$ is the probability of success of the j-th population. Hence, using equation (7) and assuming population i is the best, i.e., it possesses the largest value of the $p_j$'s, by Lemmas 1 and 2 we conclude that $B_i = 1$ and $B_j = 0$ for j ≠ i. This concludes the convergence property of the proposed method.
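The convergence claimed in Lemma 2 can be checked numerically. The sketch below assumes that $R_{k,j}$ follows the normalized multiplicative recursion $R_{k,j} = c_j R_{k-1,j} / \sum_r c_r R_{k-1,r}$ started from equal weights; this form of the recursion is an assumption consistent with the limits used in the proofs, and the constants in `c` are hypothetical.

```python
def normalized_recursion(c, steps):
    """Iterate R_{k,j} = c_j R_{k-1,j} / sum_r c_r R_{k-1,r} from a uniform start."""
    r = [1.0 / len(c)] * len(c)
    for _ in range(steps):
        weighted = [cj * rj for cj, rj in zip(c, r)]
        total = sum(weighted)
        r = [w / total for w in weighted]
    return r

c = [0.55, 0.70, 0.62]  # hypothetical distinct positive constants
limit = normalized_recursion(c, steps=200)
print([round(v, 6) for v in limit])
```

After enough iterations the weight of the index with the largest constant approaches one and the others approach zero, in line with the statement of Lemma 2.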
In real-world applications, since there is a cost associated with the data gathering process we need to select the best population in a finite number of decision-making stages. In the next section, we present the proposed decision-making method in the form of a stochastic dynamic programming model in which there is a limited number of decision-making stages available to select the best population.

A dynamic programming approach
The proposed dynamic programming approach to model the decision-making problem of selecting the best Binomial population is similar to an optimal stopping problem.
Let us assume that to find the best population there is a limited number of stages (s) available. Then, the general framework of the decision-making process in each stage is proposed as:

1. Take an independent sample of size m from each population.
2. Calculate the posterior beliefs in terms of the prior beliefs using the Bayesian approach.
3. Select the two biggest beliefs.
4. Based upon the values of the two biggest beliefs, calculate the minimum acceptable belief.
5. If the maximum belief is more than the minimum acceptable belief, then conclude that the corresponding subspace is the optimal one. Otherwise, go to step 1.
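The five steps above can be sketched as a simple loop. This is a minimal illustration, not the chapter's algorithm: the likelihood used in step 2 is a stand-in (the Jeffreys posterior mean of each population), the threshold is supplied by a caller-provided function `threshold_fn`, and all names, the seed, and the example parameters are hypothetical.

```python
import random

def posterior_mean(successes, failures):
    """Mean of the Jeffreys posterior Beta(successes + 0.5, failures + 0.5)."""
    return (successes + 0.5) / (successes + failures + 1.0)

def run_selection(true_p, m, max_stages, threshold_fn, seed=1):
    """Sample sequentially until the largest belief exceeds the threshold.

    Returns (selected index or None, stages used, final beliefs).
    """
    rng = random.Random(seed)
    n = len(true_p)
    counts = [[0, 0] for _ in range(n)]
    beliefs = [1.0 / n] * n
    for stage in range(1, max_stages + 1):
        # Step 1: independent sample of size m from each population.
        for i, p in enumerate(true_p):
            s = sum(rng.random() < p for _ in range(m))
            counts[i][0] += s
            counts[i][1] += m - s
        # Step 2: posterior beliefs from prior beliefs (stand-in likelihood).
        weights = [b * posterior_mean(a, f) for b, (a, f) in zip(beliefs, counts)]
        total = sum(weights)
        beliefs = [w / total for w in weights]
        # Step 3: the two biggest beliefs.
        top, second = sorted(beliefs, reverse=True)[:2]
        # Step 4: minimum acceptable belief (supplied by the caller).
        d = threshold_fn(top, second, max_stages - stage)
        # Step 5: stop if the maximum belief is more than the threshold.
        if top > d:
            return beliefs.index(top), stage, beliefs
    return None, max_stages, beliefs

choice, used, final = run_selection(
    true_p=[0.3, 0.5, 0.8], m=20, max_stages=50,
    threshold_fn=lambda top, second, left: 0.9)
print(choice, used)
```

A constant threshold of 0.9 is used here only to make the loop runnable; in the chapter the threshold depends on the two candidate beliefs and on the number of remaining stages.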
In step 3 of the above framework, let populations i and j be the two candidates for the best population (that is, the beliefs of populations i and j are the two biggest beliefs), and suppose s decision-making stages are available. If the biggest belief is more than a threshold (minimum acceptable belief) $d_{i,j}(s)$, with $0 \le d_{i,j}(s) \le 1$, we select the corresponding subspace of that belief as the solution. Otherwise, the decision-making process continues by taking more observations. We determine the value of $d_{i,j}(s)$ such that the belief of making the correct decision is maximized. To do this, suppose that for each population a new observation, $(\alpha_{j,k}, \beta_{j,k})$, is available at a given stage k. At this stage, we define $V_{i,j}(s, d_{i,j}(s))$ to be the expected belief of making the correct decision in s stages when the two populations i and j are the candidates for the optimal population. In other words, if we let CS denote the event of making the correct decision, we look for the threshold $d^*_{i,j}(s)$ that maximizes the expected belief of CS. We denote this optimal point by $V^*_{i,j}(s)$. In other words, $V^*_{i,j}(s) = V_{i,j}(s, d^*_{i,j}(s))$. Moreover, let us define $S_i$ and $S_j$ to be the states of selecting population i and j as the optimal population, respectively, and $NS_{i,j}$ as the state of choosing neither of these populations. Then, by conditioning on the above states, we have

$$V_{i,j}(s, d_{i,j}(s)) = B_{i,j}\{CS \mid S_i\}\, B_{i,j}\{S_i\} + B_{i,j}\{CS \mid S_j\}\, B_{i,j}\{S_j\} + B_{i,j}\{CS \mid NS_{i,j}\}\, B_{i,j}\{NS_{i,j}\}. \quad (11)$$

In order to evaluate $V^*_{i,j}(s)$, in what follows we find the belief terms of equation (11).

a. $B_{i,j}\{CS \mid S_i\}$ and $B_{i,j}\{CS \mid S_j\}$
These are the beliefs of making the correct decision if population i or j is selected as the optimal population, respectively. To make the evaluation easier, we denote these beliefs by $B_{i,j}(i)$ and $B_{i,j}(j)$. Then, using equation (2), we have

$$B_{i,j}(i) = \frac{B(\alpha_{i,k}, \beta_{i,k})}{B(\alpha_{i,k}, \beta_{i,k}) + B(\alpha_{j,k}, \beta_{j,k})}.$$

Similarly,

$$B_{i,j}(j) = \frac{B(\alpha_{j,k}, \beta_{j,k})}{B(\alpha_{i,k}, \beta_{i,k}) + B(\alpha_{j,k}, \beta_{j,k})}.$$

b. $B_{i,j}\{S_i\}$ and $B_{i,j}\{S_j\}$
These are the beliefs of selecting population i or j as the optimal population, respectively. Regarding the decision-making strategy, we have

$$B_{i,j}\{S_i\} = \Pr\{B_{i,j}(i) \ge d_{i,j}(s)\}, \qquad B_{i,j}\{S_j\} = \Pr\{B_{i,j}(j) \ge d_{i,j}(s)\}.$$

To evaluate the probability in equation (16), let $f_1(p_{j,k})$ and $f_2(p_{i,k})$ be the probability distributions of $p_{j,k}$ and $p_{i,k}$, respectively. The probability is then obtained by integrating the joint density $f_1(p_{j,k}) f_2(p_{i,k})$ over the region in which the inequality holds, and the resulting double integral is simplified by the change of variables technique.

c. $B_{i,j}\{CS \mid NS_{i,j}\}$
This is the belief of making the correct decision when none of the subspaces i and j has been chosen as the optimal one. In other words, the maximum belief has been less than $d^*_{i,j}(s)$ and the process of decision-making continues to the next stage. In terms of the stochastic dynamic programming approach, the belief of this event is equal to the maximum belief of making the correct decision in (s − 1) stages. Since the value of this belief is discounted in the current stage, using the discount factor α we have

$$B_{i,j}\{CS \mid NS_{i,j}\} = \alpha V^*_{i,j}(s - 1).$$

Having all the belief terms of equation (11), equation (11) can now be evaluated by substitution.

http://dx.doi.org/10.5772/57423
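To make the substitution concrete, the sketch below evaluates the expected belief from the three terms discussed in items a to c, assuming that equation (11) is the total-probability decomposition over the states $S_i$, $S_j$ and $NS_{i,j}$ and that the belief of continuing is one minus the two selection probabilities. The function names and the numerical inputs are hypothetical.

```python
def expected_belief(b_i, b_j, pr_si, pr_sj, alpha, v_next):
    """V = B(i) Pr{S_i} + B(j) Pr{S_j} + alpha * V*(s-1) * Pr{NS_ij}."""
    pr_ns = 1.0 - pr_si - pr_sj  # neither candidate is selected
    if pr_ns < -1e-12:
        raise ValueError("selection probabilities must sum to at most one")
    return b_i * pr_si + b_j * pr_sj + alpha * v_next * pr_ns

def coefficient_form(b_i, b_j, pr_si, pr_sj, alpha, v_next):
    """Same quantity, grouped by the coefficients (B - alpha * V*(s-1))."""
    cont = alpha * v_next
    return (b_i - cont) * pr_si + (b_j - cont) * pr_sj + cont

v1 = expected_belief(0.7, 0.3, 0.4, 0.1, alpha=0.95, v_next=0.8)
v2 = coefficient_form(0.7, 0.3, 0.4, 0.1, alpha=0.95, v_next=0.8)
print(v1, v2)
```

The second function groups the same quantity by the signs of the coefficients, which is the form that matters when choosing the threshold.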

Making the decision
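Assuming that for the two biggest beliefs we have $B_{i,j}(i) \ge B_{i,j}(j)$, equation (11) can be written in terms of the two selection probabilities as

$$V_{i,j}(s, d_{i,j}(s)) = \left(B_{i,j}(i) - \alpha V^*_{i,j}(s-1)\right) \Pr\{B_{i,j}(i) \ge d_{i,j}(s)\} + \left(B_{i,j}(j) - \alpha V^*_{i,j}(s-1)\right) \Pr\{B_{i,j}(j) \ge d_{i,j}(s)\} + \alpha V^*_{i,j}(s-1). \quad (24)$$

For the decision-making problem at hand, three cases may happen.

Case 1: $\alpha V^*_{i,j}(s-1) < B_{i,j}(j) \le B_{i,j}(i)$. In this case, $(B_{i,j}(i) - \alpha V^*_{i,j}(s-1))$ and $(B_{i,j}(j) - \alpha V^*_{i,j}(s-1))$ are both positive, and to maximize $V_{i,j}(s, d_{i,j}(s))$ we need the two probability terms in equation (24) to be maximized. This only happens when $d_{i,j}(s)$ takes its smallest admissible value, so that the candidate with the larger belief is selected.

Case 2: $B_{i,j}(j) \le B_{i,j}(i) < \alpha V^*_{i,j}(s-1)$. In this case, both coefficients are negative, and to maximize $V_{i,j}(s, d_{i,j}(s))$ the two probability terms in equation (24) must be minimized. This happens when $d_{i,j}(s) = 1$, and the decision-making process continues to the next stage.

Case 3: $B_{i,j}(j) \le \alpha V^*_{i,j}(s-1) \le B_{i,j}(i)$. In this case, one of the probability terms in equation (24) has a positive coefficient and the other has a negative coefficient. In order to maximize $V_{i,j}(s, d_{i,j}(s))$, we take the derivative as follows.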
Substituting equations (20) and (21) into equation (24) and setting the derivative of $V_{i,j}(s, d_{i,j}(s))$ with respect to $d_{i,j}(s)$ equal to zero gives the condition that determines $d^*_{i,j}(s)$. To determine the probability terms involving $d^*_{i,j}(s)$, two approximations are used. First, we assume that $p_{i,k}$ is a constant equal to its mean, so that the probability becomes a statement about $p_{j,k}$ alone. Second, we assume that $p_{j,k}$ is a constant equal to its mean, and with similar reasoning the probability becomes a statement about $p_{i,k}$ alone.

An application for fault detection and diagnosis in multivariate statistical quality control environments

Introduction
In this section, a heuristic threshold policy is applied in phase II of a control charting procedure, not only to detect the states of a multivariate quality control system, but also to diagnose the quality characteristic(s) responsible for an out-of-control signal. It is assumed that the in-control mean vector and the in-control covariance matrix of the process have been obtained in phase I.

Background
In a multivariate quality control environment, suppose there are m correlated quality characteristics whose means are being monitored simultaneously. Further, assume there is only one observation on the quality characteristics at each iteration of the data gathering process, where the goal is to detect the variable with the maximum mean shift. Let $x_{ki}$ be the observation of the i-th quality characteristic at iteration k, let $x_k = (x_{k1}, \ldots, x_{km})^T$ be the observation vector at iteration k, and let $O_k = (x_1, \ldots, x_k)$ be the observation matrix up to iteration k. Define

$$B_i(x_k, O_{k-1}) = \Pr\{OOC_i \mid x_k, O_{k-1}\},$$

where OOC stands for out-of-control. This probability is called the belief of variable i being in an out-of-control condition given the observation matrix up to iteration k − 1 and the observation vector obtained at iteration k.
Assuming the observations are taken independently at each iteration, to improve the belief of the process being in an out-of-control state at the k-th iteration, based on the observation matrix $O_{k-1}$ and the new observation vector $x_k$, we have $B_i(O_k) = B_i(x_k, O_{k-1}) = \Pr\{OOC_i \mid x_k, O_{k-1}\}$. Then, using the Bayesian rule, the posterior belief is

$$B_i(x_k, O_{k-1}) = \frac{\Pr\{OOC_i, x_k \mid O_{k-1}\}}{\Pr\{x_k \mid O_{k-1}\}}. \quad (34)$$

Since the goal is to detect the variable with the maximum mean shift, only one quality characteristic can be considered out-of-control at each iteration. In this way, for each candidate the remaining m − 1 quality characteristics are in-control. Hence, one can say that the candidates are mutually exclusive and collectively exhaustive. Therefore, using Bayes' theorem, one can write equation (34) as

$$B_i(x_k, O_{k-1}) = \frac{B_i(O_{k-1}) \Pr\{x_k \mid OOC_i\}}{\sum_{j=1}^{m} B_j(O_{k-1}) \Pr\{x_k \mid OOC_j\}}. \quad (35)$$

When the system is in-control, we assume the m characteristics follow a multinormal distribution with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_m)^T$ and covariance matrix $\Sigma$. In out-of-control situations, only the mean vector changes, and the probability distribution along with the covariance matrix remain unchanged. In the latter case, equation (35) is used to calculate the probability of shifts in the process mean μ at different iterations. Moreover, in order to update the beliefs at iteration k, one needs to evaluate $\Pr\{x_k \mid OOC_i\}$.
The term $\Pr\{x_k \mid OOC_i\}$ is the probability of observing $x_k$ if only the i-th quality characteristic is out-of-control. The exact value of this probability can be determined using the multivariate normal density $A \exp\left(-\tfrac{1}{2} (x_k - \mu_{1i})^T \Sigma^{-1} (x_k - \mu_{1i})\right)$, where $\mu_{1i}$ denotes the mean vector in which only the i-th characteristic has shifted to an out-of-control condition and A is a known constant. Since the exact value of the out-of-control mean vector $\mu_{1i}$ is not known a priori, two approximations are used in this research to determine $\Pr\{x_k \mid OOC_i\}$. Note that we do not want to determine the exact probability. Instead, the aim is to have an approximate probability (a belief) on each characteristic being out-of-control. In the first approximation method, define $IC_i$ to be the event that all characteristics are in-control, and let $\Pr\{x_k \mid IC_i\}$ be the conditional probability of observing $x_k$ given all characteristics are in-control. Further, let $\mu_0 = (\mu_{01}, \ldots, \mu_{0m})^T$ be the in-control mean vector and substitute $x'_k = (\mu_{01}, \ldots, \mu_{0,i-1}, x_{ki}, \mu_{0,i+1}, \ldots, \mu_{0m})^T$ in the aforementioned multivariate normal density, so that $\Pr\{x_k \mid IC_i\} = A \exp\left(-\tfrac{1}{2} (x'_k - \mu_0)^T \Sigma^{-1} (x'_k - \mu_0)\right)$. Note that this evaluation is proportional to $\exp\left(-\tfrac{1}{2} (x_{ki} - \mu_{0i})^2 [\Sigma^{-1}]_{ii}\right)$, and since it is assumed that characteristic i is under control, no matter the condition of the other characteristics, this approximation is justifiable.
In the second approximation method, we assume that $\Pr\{x_k \mid OOC_i\}$ is inversely proportional to $\Pr\{x_k \mid IC_i\}$, scaled by a constant R that is chosen sufficiently large to ensure that the resulting value is less than one (equation (37)).
The approximation to $\Pr\{x_k \mid OOC_i\}$ in equation (37) has the following two properties:
• It does not require the value of out-of-control means to be known.
• The determination of a threshold for the decision-making process (derived later) will be easier.
Niaki and Fallahnezhad [8] defined another equation for the above conditional probability and showed that if a shift occurs in the mean of variable i, then $\lim_{k \to \infty} B_i(O_k) = 1$. They proposed a novel method of detection and classification and used simulation to compare its performance with that of existing methods in terms of the average run length for different mean shifts. The results of the simulation study were in favor of their proposed method in almost all shift scenarios. Besides using a different equation, the main difference between the current research and Niaki and Fallahnezhad [8] is that the current work develops a novel heuristic threshold policy in which the number of data gathering stages is limited, either to save sampling cost and time or because these factors are constrained.
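The belief update of equation (35) can be illustrated for m = 2 characteristics with a short Python sketch. It assumes, purely for illustration, that the out-of-control mean is the in-control mean shifted by a known amount `shift` in one coordinate; the chapter instead uses approximations because this shift is not known a priori. The function names, the covariance matrix, and the observations are hypothetical.

```python
import math

def bivariate_normal_pdf(x, mean, cov):
    """Density of a 2-dimensional normal distribution with covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) for a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def update_beliefs(prior, x, mean0, cov, shift):
    """One Bayes update of the beliefs that characteristic i is out-of-control."""
    likelihoods = []
    for i in range(2):
        shifted = list(mean0)
        shifted[i] += shift  # assumed out-of-control mean for characteristic i
        likelihoods.append(bivariate_normal_pdf(x, shifted, cov))
    weights = [b * l for b, l in zip(prior, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]

cov = ((1.0, 0.5), (0.5, 1.0))
beliefs = [0.5, 0.5]  # equal prior beliefs 1/m
for x in [(1.3, 0.2), (0.9, -0.1), (1.4, 0.3)]:
    beliefs = update_beliefs(beliefs, x, mean0=(0.0, 0.0), cov=cov, shift=1.0)
print(beliefs)
```

With observations that sit near a unit shift in the first coordinate, the belief of the first characteristic grows towards one after a few updates.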

The proposed procedure
Assuming a limited number N of data gathering stages to detect and diagnose the out-of-control characteristic(s), a heuristic threshold policy-based model is developed in this section. The framework of the proposed decision-making process follows.
Step I Define i = 1, 2, ..., m as the set of indices for the characteristics, all of which have the potential of being out-of-control.
Step II Using the maximum entropy principle, initialize $B_i(O_0) = 1/m$ as the prior belief of the i-th variable being out-of-control. In other words, at the start of the decision-making process all variables have an equal chance of being out-of-control. Set the discount rate α, the maximum probability of correct selection V(N) when N decision-making stages remain, and the maximum number of decision-making stages N.
Step III Set k = 0.
Step IV Obtain an observation of the process.
Step V Calculate the posterior beliefs $B_i(O_k)$ using equation (35).
Step VI Obtain the order statistics on the posterior beliefs $B_i(O_k)$ such that $B_{(1)}(O_k) \le B_{(2)}(O_k) \le \ldots \le B_{(m)}(O_k)$.
Step VII Let gr and sm be the indices of the largest and the second largest posterior beliefs, and define the beliefs within the candidate set (sm, gr) as

$$B_{gr,sm}(gr; O_k) \triangleq \frac{B_{gr}(O_k)}{B_{gr}(O_k) + B_{sm}(O_k)}, \qquad B_{gr,sm}(sm; O_k) \triangleq \frac{B_{sm}(O_k)}{B_{gr}(O_k) + B_{sm}(O_k)},$$

where "≜" means "defined as." Let $S_i$ and $S_j$ be the events of selecting i and j as the out-of-control variables, respectively, and $NS_{i,j}$ be the event of not selecting any. Then, by conditioning on these events, we have

$$V_{i,j}(N, d_{i,j}(k)) = \Pr_{i,j}\{CS \mid S_i\} \Pr_{i,j}\{S_i\} + \Pr_{i,j}\{CS \mid S_j\} \Pr_{i,j}\{S_j\} + \Pr_{i,j}\{CS \mid NS_{i,j}\} \Pr_{i,j}\{NS_{i,j}\}.$$

At the k-th iteration, the conditional bivariate distribution of the sample means for variables gr and sm, i.e., $X_{k, j = gr, sm} \mid X_{k, j \ne gr, sm}$, is determined using the conditional property of the multivariate normal distribution given in Appendix 1. Moreover, knowing $E(x_{k,j}) = \mu_j$, the conditional mean and standard deviation are evaluated (see Appendix 1). Based on the decomposition method of Mason et al. [9], define the statistics $T_{k,j}$ and $T_{k,i|j}$ as

$$T_{k,j} = \frac{x_{k,j} - \mu_j}{\sigma_j}, \qquad T_{k,i|j} = \frac{x_{k,i} - \mu_{i|j}}{\sigma_{i|j}}.$$

Thus, when the process is in-control, the statistics $T_{k,j}$ and $T_{k,i|j}$ follow a standard normal distribution [9].
The probability measure $\Pr_{i,j}\{S_i\}$ is defined as the probability of selecting variable i to be out-of-control. Following the strategy explained above, we have $\Pr_{i,j}\{S_i\} = \Pr\{B_{i,j}(i; O_k) \ge d_{i,j}(k)\}$. By similar reasoning, we have $\Pr_{i,j}\{S_j\} = \Pr\{B_{i,j}(j; O_k) \ge d_{i,j}(k)\}$.
The term $\Pr_{i,j}\{CS \mid NS_{i,j}\}$ denotes the probability of correct selection conditioned on excluding the candidates i and j as the solution. In other words, the maximum belief has been less than the threshold (minimum acceptable belief) $d^*_{i,j}(k)$ and the decision-making process continues to the next stage. In terms of the stochastic dynamic programming approach, the probability of this event is equal to the maximum probability of correct selection when there are N − 1 stages remaining. The discounted value of this probability in the current stage, using the discount factor α, equals $\alpha V_{i,j}(N - 1)$. Further, since we partitioned the decision space into the events $S_i$, $S_j$ and $NS_{i,j}$, we have $\Pr_{i,j}\{NS_{i,j}\} = 1 - \Pr_{i,j}\{S_i\} - \Pr_{i,j}\{S_j\}$. In other words,

$$V^*_{i,j}(N) = \max_{d_{i,j}(k)} \left\{ B_{i,j}(i; O_k) \Pr_{i,j}\{S_i\} + B_{i,j}(j; O_k) \Pr_{i,j}\{S_j\} + \alpha V^*_{i,j}(N-1) \left(1 - \Pr_{i,j}\{S_i\} - \Pr_{i,j}\{S_j\}\right) \right\}. \quad (53)$$

The method of evaluating the minimum acceptable belief $d^*_{gr,sm}(k)$ is given in Appendix 2.
Step VIII: The Decision Step If the belief $B_{gr,sm}(gr; x_k, O_{k-1})$ in the candidate set (sm, gr) is equal to or greater than $d^*_{gr,sm}(k)$, then choose the variable with index gr to be out-of-control. In this case, the decision-making process ends. Otherwise, without making any selection at this stage, obtain another observation, lower the number of remaining decision stages to N − 1, set k = k + 1, and return to Step V above. The process continues until either the stopping condition is reached or the number of stages is exhausted. This process yields the optimal strategy with N decision-making stages that maximizes the probability of correct selection.
In what follows, the procedure to evaluate V i, j * (N ) of equation (53) is given in detail.

Method of evaluating V i, j * (N )
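Using $d^*_{i,j}(k)$ as the minimum acceptable belief, from equation (53) we have

$$V_{i,j}(N, d_{i,j}(k)) = \left(B_{i,j}(i; O_k) - \alpha V^*_{i,j}(N-1)\right) \Pr\{B_{i,j}(i; O_k) \ge d_{i,j}(k)\} + \left(B_{i,j}(j; O_k) - \alpha V^*_{i,j}(N-1)\right) \Pr\{B_{i,j}(j; O_k) \ge d_{i,j}(k)\} + \alpha V^*_{i,j}(N-1). \quad (54)$$

Then, for the decision-making problem at hand, three cases may occur.
In the first case, $(B_{i,j}(i; O_k) - \alpha V^*_{i,j}(N-1))$ and $(B_{i,j}(j; O_k) - \alpha V^*_{i,j}(N-1))$ are both negative, and to maximize $V_{i,j}(N, d_{i,j}(k))$ the two probability terms in equation (54) must be minimized. This only happens when $d^*_{i,j}(k) = 1$, making the probability terms equal to zero. In other words, since the maximum belief is less than $d^*_{i,j}(k) = 1$, we continue to the next stage.
In the second case, $(B_{i,j}(i; O_k) - \alpha V^*_{i,j}(N-1))$ and $(B_{i,j}(j; O_k) - \alpha V^*_{i,j}(N-1))$ are both positive, and to maximize $V_{i,j}(N, d_{i,j}(k))$ we need the two probability terms in equation (54) to be maximized.
In the third case, one of the probability terms in equation (54) has a positive coefficient and the other has a negative coefficient. We first present the method of evaluating $\Pr\{B_{gr,sm}(sm; O_k) \ge d_{gr,sm}(k)\}$; the method of evaluating the probability terms in equation (57) is given in Appendix 2. With similar reasoning, the corresponding probability for the candidate gr is obtained. The method of determining the minimum acceptable belief is given in Appendix 2.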

An application for fault detection in uni-variate statistical quality control environments
In a uni-variate quality control environment, if we limit ourselves to applying a control charting method, most of the information obtained from the behavior of the data will be ignored. The main aim of a control charting method is to detect undesired faults in the process quickly. However, we may calculate the belief of the process being out-of-control by applying the Bayesian rule at any iteration in which some observations on the quality characteristic are gathered. Using these beliefs and a stopping rule, we may specify a control threshold for the beliefs, and when the updated belief in any iteration is more than this threshold, an out-of-control signal is raised.
In Decision on Beliefs, first, the entire probable solution space is divided into several candidates (the solution is one of the candidates); then a belief is assigned to each candidate considering our experience; and finally, the beliefs are updated and the optimal decision is selected based on the current situation. In an SPC problem, a similar decision-making process exists. First, the decision space can be divided into two candidates: an in-control or an out-of-control production process. Second, the problem solution is one of the candidates (in-control or out-of-control process). Finally, a belief is assigned to each candidate so that the belief shows the probability of a fault being present in the process. Based upon the updated belief, we may decide about the state of the process (in-control or out-of-control).

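Learning: the beliefs and the approach for their improvement
Let $O_k = (x_1, \ldots, x_k)$ denote the observations gathered up to iteration k, and let $B(O_k)$ denote the belief that the process is out-of-control given $O_k$. With this definition, the updated belief is obtained using the Bayesian rule:

$$B(O_k) = B(x_k, O_{k-1}) = \Pr\{\text{Out-of-control} \mid x_k, O_{k-1}\} = \frac{\Pr\{\text{Out-of-control}, x_k \mid O_{k-1}\}}{\Pr\{x_k \mid O_{k-1}\}}. \quad (60)$$

Since the in-control and out-of-control states partition the decision space, we can write equation (60) as

$$B(x_k, O_{k-1}) = \frac{B(O_{k-1}) \Pr\{x_k \mid \text{Out-of-control}\}}{B(O_{k-1}) \Pr\{x_k \mid \text{Out-of-control}\} + \left(1 - B(O_{k-1})\right) \Pr\{x_k \mid \text{In-control}\}}. \quad (61)$$

Assuming the quality characteristic of interest follows a normal distribution with mean μ and variance $\sigma^2$, we use equation (61) to calculate the beliefs for both positive and negative shifts in the process mean μ.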
• Negative shifts in the process mean
The value of $B^-(O_k)$ denotes the probability of a negative shift in the process mean. It is calculated using equation (61), where $\Pr\{x_k \mid \text{Out-of-control}\}$ is calculated using equation (66). Thus $B^-(O_k)$ is determined by substituting equation (66) into equation (61).
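The two-state update can be sketched in a few lines of Python. The sketch assumes a normal quality characteristic with known standard deviation and, for illustration only, a known out-of-control mean `shifted_mean`; the chapter's own expressions for the out-of-control likelihood are not reproduced here. The initial belief of 0.5, the function names, and the observations are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def update_shift_belief(prior, x, mean0, sd, shifted_mean):
    """Two-hypothesis Bayes update of the belief that the process is out-of-control."""
    num = prior * normal_pdf(x, shifted_mean, sd)
    den = num + (1.0 - prior) * normal_pdf(x, mean0, sd)
    return num / den

belief = 0.5  # assumed initial belief
for x in [0.8, 1.1, 1.4]:
    belief = update_shift_belief(belief, x, mean0=0.0, sd=1.0, shifted_mean=1.0)
print(belief)
```

An observation lying exactly halfway between the two means leaves the belief unchanged, while observations closer to the shifted mean increase it.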

A decision on beliefs approach
We present a decision-making approach in terms of stochastic dynamic programming. The presented approach is similar to an optimal stopping problem.
Suppose n decision-making stages remain and two decisions are available:
• A positive shift has occurred in the process mean.
• No positive shift has occurred in the process mean.
The decision-making framework is as follows:
• Gather a new observation.
• Calculate the posterior beliefs in terms of the prior beliefs.
• Order the current beliefs in ascending order and choose the maximum.
• Determine the value of the minimum acceptable belief ($d^+(n)$ is the minimum acceptable belief for detecting a positive shift and $d^-(n)$ is the minimum acceptable belief for detecting a negative shift).
• If the maximum belief is more than the minimum acceptable belief $d^+(n)$, select the candidate with the maximum belief as the solution; otherwise, return to the first step.
In terms of the above algorithm, the belief with the maximum value is chosen, and if this belief is more than a control threshold $d^+(n)$, the candidate of that belief is selected as the optimal candidate; otherwise, the sampling process continues. The objective of this model is to determine the optimal values of $d^+(n)$. The result of this process is the optimal strategy with n decision-making stages that maximizes the probability of correct selection.
Suppose a new observation $x_k$ is gathered (k is the number of observations gathered so far).
$V(n, d^+(n))$ is defined as the probability of correct selection when n decision-making stages remain and we follow the $d^+(n)$ strategy explained above. $V(n)$ denotes the maximum value of $V(n, d^+(n))$; thus, $V(n) = \max_{d^+(n)} V(n, d^+(n))$. CS is defined as the event of correct selection. $S_1$ is defined as selecting the out-of-control condition (positive shift) as the optimal solution, $S_2$ is defined as selecting the in-control condition as the optimal decision, and NS is defined as not selecting any candidate in this stage.
Hence, using the total probability law, it is concluded that

$$V(n, d^+(n)) = \Pr\{CS \mid S_1\} \Pr\{S_1\} + \Pr\{CS \mid S_2\} \Pr\{S_2\} + \Pr\{CS \mid NS\} \Pr\{NS\}.$$

$\Pr\{CS \mid S_1\}$ denotes the probability of correct selection when candidate $S_1$ is selected as the optimal candidate, and this probability equals its belief, $B^+(O_k)$. By the same argument, $\Pr\{CS \mid S_2\} = 1 - B^+(O_k)$. $\Pr\{S_1\}$ is the probability of selecting the out-of-control candidate (positive shift) as the solution; thus, following the decision-making strategy, we should have $\Pr\{S_1\} = \Pr\{B^+(O_k) \ge d^+(n)\}$ and, similarly, $\Pr\{S_2\} = \Pr\{1 - B^+(O_k) \ge d^+(n)\}$.
1. $\Pr\{CS \mid NS\}$ denotes the probability of correct selection when none of the candidates has been selected, which means that the maximum value of the beliefs is less than $d^+(n)$ and the decision-making process continues to the next stage. As a result, in terms of the dynamic programming approach, the probability of this event equals the maximum probability of correct selection in the next stage (n − 1), V(n − 1). However, since taking observations has a cost, the value of this probability at the current time is less than its actual value, and by using the discount factor α it equals αV(n − 1).
2. Since the entire solution space is partitioned, it is concluded that $\Pr\{NS\} = 1 - (\Pr\{S_1\} + \Pr\{S_2\})$.
By the above preliminaries, $V(n, d^+(n))$ is obtained as follows:

$$V(n, d^+(n)) = B^+(O_k) \Pr\{S_1\} + \left(1 - B^+(O_k)\right) \Pr\{S_2\} + \alpha V(n-1) \left(1 - \Pr\{S_1\} - \Pr\{S_2\}\right). \quad (73)$$

Writing gr and sm for the candidates with the greater and the smaller belief, equation (73) is rewritten as follows:

$$V(n, d^+(n)) = \left(B^+(gr, O_k) - \alpha V(n-1)\right) \Pr\{B^+(gr, O_k) \ge d^+(n)\} + \left(B^+(sm, O_k) - \alpha V(n-1)\right) \Pr\{B^+(sm, O_k) \ge d^+(n)\} + \alpha V(n-1).$$

There are three conditions:
• Both $B^+(gr, O_k) - \alpha V(n-1)$ and $B^+(sm, O_k) - \alpha V(n-1)$ are negative. In this condition, we should have $d^+(n) = 1$ in order to maximize $V(n, d^+(n))$. Since $B^+(gr, O_k) < d^+(n) = 1$, we do not select any candidate in this condition and the sampling process continues.
• Both $B^+(gr, O_k) - \alpha V(n-1)$ and $B^+(sm, O_k) - \alpha V(n-1)$ are positive. In this condition, $V(n, d^+(n))$ is maximized by making the two probability terms as large as possible.
• In this condition, one of the probabilities in equation (10) has a positive coefficient and the other has a negative coefficient; to maximize $V(n, d^+(n))$, optimization methods should be applied.
Definition: a function $h(d^+(n))$ is defined so that equations (73) and (79) can be written in terms of it. Since $V^*(n) = \max_{0.5 < d^+(n) < 1} V(n, d^+(n))$, it is sufficient to maximize the real-valued function $V(n, d^+(n))$; therefore, we evaluate the function at the points where its first derivative equals zero. The optimal threshold $d^+(n)$ is determined by the resulting equation. Since the optimal value of $d^+(n)$ should lie in the interval [0.5, 1], the solution is restricted to this interval. The above method is presented for detecting positive shifts in the process mean and can be adapted for detecting negative shifts with the same reasoning.

3. If n < 0, then no shift has occurred in the process mean and decision making stops. The approximate value of αV(n − 1), based on the discount factor α in the stochastic dynamic programming approach, is $\alpha^n V(0)$.

Conclusion
In this chapter, we introduced a new approach to determine the best solution out of m candidates. To do this, first, we defined the belief of selecting the best solution and explained how to model the problem by the Bayesian analysis approach. Second, we clarified the approach by which we improved the beliefs, and proved that it converges to detect the best solution. Next, we proposed a decision-making strategy using dynamic programming approach in which there were a limited number of decision-making stages.

Appendix 1
Conditional Mean and Variance of the Variables
The conditional mean of variables gr and sm can be evaluated using the conditional property of the multivariate normal distribution.
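For reference, the standard conditional property of the multivariate normal distribution is stated below. The partition notation is an assumption introduced here for illustration: $x_1$ collects the variables gr and sm, $x_2$ collects the remaining variables, and the mean vector and covariance matrix are partitioned accordingly.

```latex
% Partition x = (x_1, x_2), with mean (\mu_1, \mu_2) and covariance blocks \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}.
x_1 \mid x_2 \sim N\!\left(\mu_{1|2},\, \Sigma_{1|2}\right), \qquad
\mu_{1|2} = \mu_1 + \Sigma_{12}\, \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad
\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\, \Sigma_{22}^{-1}\, \Sigma_{21}
```

The conditional standard deviations used in the statistics $T_{k,i|j}$ are the square roots of the diagonal entries of $\Sigma_{1|2}$.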