Open access peer-reviewed chapter

Classifying by Bayesian Method and Some Applications

Written By

Tai Vovan

Submitted: 06 December 2016 Reviewed: 08 June 2017 Published: 02 November 2017

DOI: 10.5772/intechopen.70052

From the Edited Volume

Bayesian Inference

Edited by Javier Prieto Tejedor


Abstract

This chapter summarizes and proposes some results related to the classification problem by the Bayesian method. We present the classification principle and the Bayes error, and establish its relationships with other measures. The practical determination of the Bayes error in one and several dimensions is also considered. Based on the training set and the object that we need to classify, an algorithm to determine the prior probability that can reduce the Bayes error is proposed. This algorithm has been implemented as a MATLAB procedure that can be applied to real data. The proposed algorithm is applied in three domains, biology, medicine, and economics, through specific problems. With data sets of different characteristics, the proposed algorithm always gives the best results in comparison with the existing ones. Furthermore, the examples show the feasibility and the potential for real applications of the researched problem.

Keywords

  • Bayesian method
  • classification
  • error
  • prior
  • application

1. Introduction

The classification problem is one of the main subdomains of discriminant analysis and is closely related to many fields in statistics. Classification assigns an element to the appropriate population in a set of known populations based on certain observed variables. It is an important development direction of multivariate statistics and has applications in many different fields [25, 27]. Recently, this problem has attracted the interest of many statisticians in both theoretical and applied areas [14-18, 22-25]. According to Tai [22], there are four main methods to solve the classification problem: the Fisher method [6, 12], the logistic regression method [8], the support vector machine (SVM) method [3], and the Bayesian method [17]. Because the Bayesian method does not require the normality condition for the data and can classify two or more populations, it has many advantages [22-25]. Therefore, it has been used by many scientists in their research.

Given $k$ populations $\{w_i\}$ with probability density functions (pdfs) $\{f_i\}$ and prior probabilities $\{q_i\}$, $i = 1, 2, \ldots, k$, where $q_i \in (0;1)$ and $\sum_{i=1}^{k} q_i = 1$, Pham–Gia et al. [17] used the maximum function of the pdfs as a tool to study the Bayesian method and obtained important results. The classification principle and the Bayes error were established based on $g_{\max}(x) = \max\{q_1 f_1(x), q_2 f_2(x), \ldots, q_k f_k(x)\}$. Relationships between the upper and lower bounds of the Bayes error and the L1-distance of the pdfs, as well as the overlap coefficient of the pdfs, were established. The function $g_{\max}(x)$ plays a very important role in the classification problem by the Bayesian method, and Pham–Gia et al. [17] continued to study it. Using the MATLAB software, Pham–Gia et al. [18] succeeded in identifying $g_{\max}(x)$ for some cases of the bivariate normal distribution. With a similar development, Tai [22] proposed the L1-distance of $\{q_i f_i(x)\}$ and established its relationship with the Bayes error. This distance is also used to calculate the Bayes error as well as to classify a new element. This research has been applied to classifying the debt-repayment ability of bank customers. However, we think that the survey of research related to these two Bayesian approaches is not yet complete: there remain relations between the Bayes error and other statistical measures to be established.

The Bayesian method has many advantages. However, to our knowledge, the range of its practical applications is narrower than that of the other methods. We can find many applications in banking and medicine that use the Fisher method, the SVM method, and the logistic method [1, 3, 8, 12]. Recently, all statistical software can effectively and quickly classify large, multivariate data sets using any of the three methods mentioned above, whereas the Bayesian method does not have this advantage. The cause of this problem is the ambiguity in determining the prior probability, the difficulty in estimating the pdfs, and the complexity in calculating the Bayes error. Although all these issues have been discussed by many authors, optimal methods have yet to be found [22, 25]. In this chapter, we consider the estimation of the pdfs and the calculation of the Bayes error for real applications, and we present the problem of how to determine the prior probability. In the noninformative case, we normally choose the prior probabilities from the uniform distribution. If we have some type of past data or a training set, the prior probabilities are estimated either by the Laplace method, $q_i = (n_i + n/k)/(N + n)$, or by the sample frequencies, $q_i = n_i/N$, where $n_i$ and $N$ are the numbers of elements in the ith population and in the training set, respectively, $n$ is the number of dimensions, and $k$ is the number of groups. These approaches have been studied and applied by many authors [14, 15, 22, 25]. We will also propose an algorithm to determine the prior probability based on the training set, the object to be classified, and fuzzy cluster analysis. The proposed algorithm is applied in some specific problems of biology, medicine, and economics and has advantages over the existing approaches. All calculations are performed by MATLAB procedures.

The rest of this chapter is structured as follows. Section 2 presents the classification principle and the Bayes error; some results on the Bayes error are also established in this section. Section 3 resolves the related problems in the real application of the Bayesian method: the estimation of pdfs and the determination of the Bayes error in the one-dimensional and multidimensional cases. This section also proposes an algorithm to determine the prior probability. Section 4 applies the proposed algorithm to real problems and compares its results with those obtained using existing approaches. Section 5 concludes this chapter.


2. Classifying by Bayesian method

The classification problem by the Bayesian method has been presented in many documents [15, 16, 27], where the classification principle and the Bayes error are established based on Bayes' theorem. In this section, we present them via the maximum function of $q_i f_i(x)$, $i = 1, 2, \ldots, k$, an approach that has advantages over existing ones in real applications [17, 18, 21-25]. This section also establishes the upper and lower bounds of the Bayes error and the relationships of the Bayes error with other measures in statistical pattern recognition.

2.1. Classification principle and Bayes error

Given $k$ populations $w_1, w_2, \ldots, w_k$, where $q_i \in (0;1)$ and $f_i(x)$ are the prior probability and the pdf of the ith population, respectively, $i = 1, 2, \ldots, k$. According to Pham–Gia et al. [17], an element $x_0$ will be assigned to $w_i$ if

$g_i(x_0) = g_{\max}(x_0), \quad i \in \{1, 2, \ldots, k\},$   (1)

where $g_i(x) = q_i f_i(x)$ and $g_{\max}(x) = \max\{q_1 f_1(x), q_2 f_2(x), \ldots, q_k f_k(x)\}$.

Bayes error is given by the formula:

$Pe_{1,2,\ldots,k}^{(q)} = \sum_{i=1}^{k} \int_{\mathbb{R}^n \setminus R_i^n} q_i f_i(x)\,dx = 1 - \sum_{i=1}^{k} \int_{R_i^n} q_i f_i(x)\,dx,$   (2)

where $R_i^n = \{x \mid q_i f_i(x) > q_j f_j(x),\ i \ne j,\ i, j = 1, 2, \ldots, k\}$ and $(q) = (q_1, q_2, \ldots, q_k)$.

From Eq. (2), we can prove the following result:

$Pe_{1,2,\ldots,k}^{(q)} = \sum_{j=1}^{k} \int_{\mathbb{R}^n \setminus R_j^n} q_j f_j(x)\,dx = \sum_{j=1}^{k} \left[ \int_{\mathbb{R}^n} q_j f_j(x)\,dx - \int_{R_j^n} \max_{1 \le l \le k}\{q_l f_l(x)\}\,dx \right] = \int_{\mathbb{R}^n} \sum_{j=1}^{k} q_j f_j(x)\,dx - \sum_{j=1}^{k} \int_{R_j^n} \max_{1 \le l \le k}\{q_l f_l(x)\}\,dx = 1 - \int_{\mathbb{R}^n} \max_{1 \le l \le k}\{q_l f_l(x)\}\,dx$   (20)

or

$Pe_{1,2,\ldots,k}^{(q)} = 1 - \int_{\mathbb{R}^n} g_{\max}(x)\,dx.$   (3)

The correct classification probability is determined by $Ce_{1,2,\ldots,k}^{(q)} = 1 - Pe_{1,2,\ldots,k}^{(q)}$.

For k = 2, we have

$Pe_{1,2}^{(q,1-q)} = \int_{\mathbb{R}^n} \min\{q f_1(x), (1-q) f_2(x)\}\,dx = \lambda_{1,2}^{(q,1-q)} = \frac{1}{2}\left[1 - \|q f_1 - (1-q) f_2\|_1\right],$   (4)

where

$\lambda_{1,2}^{(q,1-q)}$ is the overlap area measure of $q f_1(x)$ and $(1-q) f_2(x)$, and $\|q f_1 - (1-q) f_2\|_1 = \int_{\mathbb{R}^n} |q f_1(x) - (1-q) f_2(x)|\,dx$.
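As a quick numerical illustration of the equivalence in Eq. (4), the following MATLAB sketch computes the two-population Bayes error in three ways, via Eq. (3), via the overlap integral, and via the L1-distance, for one illustrative pair of univariate normal pdfs (the parameters and the prior q below are arbitrary choices, not taken from the chapter):

% Numerical check of Eq. (4) with two illustrative univariate normal pdfs.
q  = 0.4;
f1 = @(x) exp(-(x - 0).^2 / 2) / sqrt(2 * pi);                     % N(0, 1)
f2 = @(x) exp(-(x - 2).^2 / (2 * 1.5^2)) / (1.5 * sqrt(2 * pi));   % N(2, 1.5^2)
g1 = @(x) q * f1(x);
g2 = @(x) (1 - q) * f2(x);
Pe_gmax    = 1 - integral(@(x) max(g1(x), g2(x)), -Inf, Inf);          % Eq. (3)
Pe_overlap = integral(@(x) min(g1(x), g2(x)), -Inf, Inf);              % overlap measure
Pe_L1      = 0.5 * (1 - integral(@(x) abs(g1(x) - g2(x)), -Inf, Inf)); % Eq. (4)
fprintf('Pe = %.6f  %.6f  %.6f\n', Pe_gmax, Pe_overlap, Pe_L1);

All three printed values coincide, illustrating that the Bayes error, the overlap area of $q f_1$ and $(1-q) f_2$, and the L1-distance expression in Eq. (4) are the same quantity.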

2.2. Some results about Bayes error

Theorem 1. Let $f_i(x)$, $i = 1, 2, \ldots, k$, $k \ge 3$, be $k$ pdfs defined on $\mathbb{R}^n$, $n \ge 1$, and let $q_i \in (0;1)$. We have the following relationships between the Bayes error and other measures:

  1. $Pe_{1,2,\ldots,k}^{(q)} \le 1 - \dfrac{1}{k-1}\left(1 - \prod_{j=1}^{k} q_j^{\alpha_j}\, DT(f_1, f_2, \ldots, f_k)^{\alpha}\right),$   (5)

  2. $Pe_{1,2,\ldots,k}^{(q)} \le \sum_{i<j} q_i^{\beta} q_j^{1-\beta}\, DT(f_i, f_j)^{(\beta, 1-\beta)},$   (6)

  3. $\dfrac{(k-1) - \sum_{i<j} \|g_i - g_j\|_1}{k} \le Pe_{1,2,\ldots,k}^{(q)} \le 1 - \dfrac{1}{2}\max_{i<j}\{\|g_i - g_j\|_1\} - \min_i\{q_i\},$   (7)

  4. $0 \le Pe_{1,2,\ldots,k}^{(q)} \le 1 - \max_i\{q_i\},$   (8)

where

$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$; $\alpha_j, \beta \in (0,1)$, $\sum_{j=1}^{k} \alpha_j = 1$, $i, j = 1, 2, \ldots, k$, and

$DT(f_1, f_2, \ldots, f_k)^{\alpha} = \int_{\mathbb{R}^n} \prod_{j=1}^{k} [f_j(x)]^{\alpha_j}\,dx$ is the affinity of Toussaint [26].

Proof:

  1. For each i = 1, 2, …, k, we have

    $\left(\sum_{j=1}^{k} q_j f_j\right)^{\alpha_i} \ge (q_i f_i)^{\alpha_i}, \quad i = 1, 2, \ldots, k.$   (21)

    Therefore,

    $\left(\sum_{j=1}^{k} q_j f_j\right)^{\alpha_1 + \alpha_2 + \cdots + \alpha_k} \ge \prod_{j=1}^{k} (q_j f_j)^{\alpha_j}, \quad \text{i.e.,} \quad \sum_{j=1}^{k} q_j f_j \ge \prod_{j=1}^{k} (q_j f_j)^{\alpha_j}.$   (9)

    On the other hand,

    $\left(\min_{1 \le j \le k}\{q_j f_j\}\right)^{\alpha_1} \le (q_1 f_1)^{\alpha_1}, \ \ldots, \ \left(\min_{1 \le j \le k}\{q_j f_j\}\right)^{\alpha_k} \le (q_k f_k)^{\alpha_k},$   (22)

    so

    $\left(\min_{1 \le j \le k}\{q_j f_j\}\right)^{\alpha_1 + \cdots + \alpha_k} \le \prod_{j=1}^{k} (q_j f_j)^{\alpha_j},$   (23)

    or

    $\min_{1 \le j \le k}\{q_j f_j\} \le \prod_{j=1}^{k} (q_j f_j)^{\alpha_j}.$   (10)

    Combining Eqs. (9) and (10), we obtain

    $0 \le \sum_{j=1}^{k} q_j f_j - \prod_{j=1}^{k} (q_j f_j)^{\alpha_j} \le \sum_{j=1}^{k} q_j f_j - \min_{1 \le j \le k}\{q_j f_j\}.$   (24)

    Because $\sum_{j=1}^{k} q_j f_j - \min_{1 \le j \le k}\{q_j f_j\}$ is a sum of $(k-1)$ terms, each bounded by $\max_{1 \le j \le k}\{q_j f_j\}$, we have

    $\sum_{j=1}^{k} q_j f_j - \min_{1 \le j \le k}\{q_j f_j\} \le (k-1) \max_{1 \le j \le k}\{q_j f_j\}.$   (25)

    Thus,

    $0 \le \sum_{j=1}^{k} q_j f_j - \prod_{j=1}^{k} (q_j f_j)^{\alpha_j} \le (k-1) \max_{1 \le j \le k}\{q_j f_j\}.$   (26)

    Integrating the above relation, we obtain

    $1 - \prod_{j=1}^{k} q_j^{\alpha_j}\, DT(f_1, f_2, \ldots, f_k)^{\alpha} \le (k-1) \int_{\mathbb{R}^n} g_{\max}(x)\,dx.$   (11)

    Using $\int_{\mathbb{R}^n} g_{\max}(x)\,dx = 1 - Pe_{1,2,\ldots,k}^{(q)}$ in Eq. (11), we obtain Eq. (5).

  2. From Eq. (2), we have

    $Pe_{1,2,\ldots,k}^{(q)} = \sum_{j=1}^{k} \int_{\mathbb{R}^n \setminus R_j^n} q_j f_j(x)\,dx = \sum_{j=1}^{k} \sum_{i \ne j} \int_{R_i^n} \min\{q_i f_i(x), q_j f_j(x)\}\,dx \le \sum_{i<j} \int_{\mathbb{R}^n} \min\{q_i f_i(x), q_j f_j(x)\}\,dx.$   (27)

    Since

    $\left[\min\{q_i f_i(x), q_j f_j(x)\}\right]^{\beta} \le (q_i f_i)^{\beta}$ and $\left[\min\{q_i f_i(x), q_j f_j(x)\}\right]^{1-\beta} \le (q_j f_j)^{1-\beta},$   (28)

    then

    $\min\{q_i f_i(x), q_j f_j(x)\} \le (q_i f_i)^{\beta} (q_j f_j)^{1-\beta}.$   (29)

    Integrating the above inequality over $\mathbb{R}^n$ and using Eq. (27), we obtain

    $Pe_{1,2,\ldots,k}^{(q)} \le \sum_{i<j} \int_{\mathbb{R}^n} (q_i f_i(x))^{\beta} (q_j f_j(x))^{1-\beta}\,dx = \sum_{i<j} q_i^{\beta} q_j^{1-\beta}\, DT(f_i, f_j)^{(\beta, 1-\beta)},$   (30)

    which is Eq. (6).

  3. We have

    $\int_{\mathbb{R}^n} \max\{g_1(x), g_2(x), \ldots, g_k(x)\}\,dx \ge \max_{i<j} \int_{\mathbb{R}^n} \max\{g_i(x), g_j(x)\}\,dx.$   (31)

    On the other hand,

    $\max_{i<j}\left\{\int_{\mathbb{R}^n} \max\{g_i(x), g_j(x)\}\,dx\right\} = \max_{i<j}\left\{\frac{1}{2}\|g_i - g_j\|_1 + \frac{1}{2}(q_i + q_j)\right\} \ge \max_{i<j}\left\{\frac{1}{2}\|g_i - g_j\|_1\right\} + \min_{i<j}\left\{\frac{1}{2}(q_i + q_j)\right\} \ge \max_{i<j}\left\{\frac{1}{2}\|g_i - g_j\|_1\right\} + \min\{q_1, q_2, \ldots, q_k\}.$   (32)

    Hence,

    $\int_{\mathbb{R}^n} g_{\max}(x)\,dx \ge \frac{1}{2}\max_{i<j}\{\|g_i - g_j\|_1\} + \min\{q_1, q_2, \ldots, q_k\}.$   (12)

    We also have $\sum_{i<j} |g_i - g_j| \ge \sum_{j=1}^{k}\left[\max\{g_1, g_2, \ldots, g_k\} - g_j\right] = k \max\{g_1, g_2, \ldots, g_k\} - \sum_{j=1}^{k} g_j.$

    Therefore,

    $\max\{g_1, g_2, \ldots, g_k\} \le \frac{1}{k}\sum_{i<j} |g_i - g_j| + \frac{1}{k}\sum_{j=1}^{k} g_j.$   (13)

    Since $\int_{\mathbb{R}^n} g_i(x)\,dx = q_i$ and $\sum_{i=1}^{k} q_i = 1$, integrating the inequality Eq. (13) gives

    $\int_{\mathbb{R}^n} g_{\max}(x)\,dx \le \frac{1}{k}\sum_{i<j} \|g_i - g_j\|_1 + \frac{1}{k}.$   (14)

    Replacing $\int_{\mathbb{R}^n} g_{\max}(x)\,dx = 1 - Pe_{1,2,\ldots,k}^{(q)}$ in Eqs. (12) and (14), we obtain Eq. (7).

  4. We have

    $q_i f_i(x) \le \max\{q_1 f_1(x), q_2 f_2(x), \ldots, q_k f_k(x)\} \le \sum_{i=1}^{k} q_i f_i(x)$ for all $i = 1, \ldots, k$.

    Integrating the above relation, we obtain

    $q_i \le \int_{\mathbb{R}^n} g_{\max}(x)\,dx \le 1.$   (33)

    The above inequality holds for all $i = 1, \ldots, k$, so

    $\max_i\{q_i\} \le \int_{\mathbb{R}^n} g_{\max}(x)\,dx \le 1.$   (34)

Replacing $\int_{\mathbb{R}^n} g_{\max}(x)\,dx = 1 - Pe_{1,2,\ldots,k}^{(q)}$ in the above relation, we obtain Eq. (8).

From the results of Eqs. (5) and (6), with $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 1/k$, we obtain the relationship between the Bayes error and the affinity of Matusita [11]. In particular, when $k = 2$, we obtain the relationship between $Pe_{1,2}^{(q,1-q)}$ and Hellinger's distance.

In addition, we also have relations between the Bayes error and overlap coefficients, as well as the L1-distance of $\{g_1(x), g_2(x), \ldots, g_k(x)\}$ (see Ref. [22]). For the special case $q_1 = q_2 = \cdots = q_k = 1/k$, we have established expressions relating the Bayes error to the L1-distance of $\{f_1(x), f_2(x), \ldots, f_k(x)\}$, and relating $Pe_{1,2,\ldots,k}^{(1/k)}$ to $Pe_{1,2,\ldots,k+1}^{(1/(k+1))}$ (see Ref. [17]).
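The bounds of Theorem 1 are easy to check numerically. The following MATLAB sketch evaluates the Bayes error and the two sides of Eqs. (7) and (8) for three illustrative univariate normal components; the parameters are arbitrary choices, not taken from the chapter:

% Numerical sanity check of the bounds (7) and (8) for three illustrative
% univariate normal components (arbitrary parameters, not from the chapter).
mu = [0 2 5];  sigma = [1 1.5 2];  q = [0.2 0.3 0.5];  k = 3;
g = @(i, x) q(i) * exp(-(x - mu(i)).^2 / (2 * sigma(i)^2)) / (sigma(i) * sqrt(2 * pi));

gmax = @(x) max([g(1, x); g(2, x); g(3, x)], [], 1);
Pe = 1 - integral(gmax, -Inf, Inf);                      % Bayes error, Eq. (3)

L1 = zeros(k);                                           % pairwise L1-distances of the g_i
for i = 1:k
    for j = i + 1:k
        L1(i, j) = integral(@(x) abs(g(i, x) - g(j, x)), -Inf, Inf);
    end
end
lower7 = ((k - 1) - sum(L1(:))) / k;                     % left-hand side of Eq. (7)
upper7 = 1 - 0.5 * max(L1(:)) - min(q);                  % right-hand side of Eq. (7)
upper8 = 1 - max(q);                                     % right-hand side of Eq. (8)
fprintf('lower7 = %.4f, Pe = %.4f, upper7 = %.4f, upper8 = %.4f\n', lower7, Pe, upper7, upper8);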


3. Related problems in applying of Bayesian method

To apply the Bayesian method in reality, we have to resolve three main problems: (i) determine the prior probability, (ii) compute the Bayes error, and (iii) estimate the pdfs. In this section, we propose an algorithm for (i) based on fuzzy cluster analysis and the object to be classified, which can reduce the Bayes error in comparison with the traditional approaches. For (ii), the Bayes error is given by a closed expression in the general case; in the one-dimensional case, it is determined by an algorithm that finds the maximum of the functions $g_i(x)$, $i = 1, 2, \ldots, k$, and in the multidimensional case the quasi-Monte Carlo method is proposed to compute it. For (iii), we review the estimation of pdfs by the kernel function method, where the bandwidth parameter and the kernel function are specified.

3.1. Prior probability

In the $n$-dimensional space, consider $N$ objects $\{W_1^{(0)}, W_2^{(0)}, \ldots, W_N^{(0)}\}$ with data matrix $Z = [z_{ij}]_{n \times N}$, drawn from $c$ populations $w_1, w_2, \ldots, w_c$. Let $U = [\mu_{ik}]_{c \times N}$, where $\mu_{ik}$ is the probability that the kth object belongs to $w_i$. We have $\mu_{ik} \in [0,1]$ satisfying the following conditions:

$\sum_{i=1}^{c} \mu_{ik} = 1, \quad 0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c, \ 1 \le k \le N.$   (35)

We call

$M_{zc} = \left\{ U = [\mu_{ik}]_{c \times N} \ \middle| \ \mu_{ik} \in [0,1], \ \forall i, k; \ \sum_{i=1}^{c} \mu_{ik} = 1, \ \forall k; \ 0 < \sum_{k=1}^{N} \mu_{ik}, \ \forall i \right\}$   (15)

the fuzzy partitioning space of the $c$ populations, and let

$D_{ikA}^2 = \|z_k - v_i\|_A^2 = (z_k - v_i)^T A (z_k - v_i)$

be the squared distance from the object $z_k$ to the representative $v_i$ of the ith population. This representative is computed by the following formula:

$v_i = \dfrac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m}, \quad 1 \le i \le c,$   (16)

where m ∈ [1,∞) is the fuzziness parameter.

Given the data set Z of the c known populations w1, w2, …, wc, assume x0 is an object that we need to classify. To determine the prior probabilities for classifying x0, we propose the following prior probability by fuzzy clustering (PPC) algorithm:

Algorithm 1. Determining prior probability by fuzzy clustering (PPC)
Input: The data set $Z = [z_{ij}]_{n \times N}$ of the c populations {w1, w2, …, wc}, the object x0, the tolerance ε, the fuzziness parameter m, and the initial partition matrix $U = U^{(0)} = [\mu_{ij}]_{c \times (N+1)}$, where $\mu_{ij} = 1$ if the jth object belongs to $w_i$ and $\mu_{ij} = 0$ otherwise, $i = \overline{1,c}$, $j = \overline{1,N}$, and $\mu_{ij} = 1/c$ for $j = N+1$.
Output: The prior probabilities $\mu_{i(N+1)}$, $i = 1, 2, \ldots, c$.
Repeat:
    Find the representative object of $w_i$: $v_i = \dfrac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m}$, $1 \le i \le c$.
    Compute the matrix $[D_{ik}]_{c \times (N+1)}$ of distances between the objects and the representative objects.
    Update the new partition matrix $U^{(\text{new})}$ by the following principle:
      If $D_{ik} > 0$ for all $i = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, N+1$, then

            $\mu_{ik}^{(\text{new})} = \dfrac{1}{\sum_{j=1}^{c} (D_{ik}/D_{jk})^{2/(m-1)}}$, $i = 1, 2, \ldots, c$;

      else, for each k with some $D_{ik} = 0$, set $\mu_{ik}^{(\text{new})} = 1$ for that index i and $\mu_{jk}^{(\text{new})} = 0$ for $j \ne i$.
      End;
    Compute $S = \|U^{(\text{new})} - U\| = \max_{i,k}\left(|\mu_{ik}^{(\text{new})} - \mu_{ik}|\right)$
     $U = U^{(\text{new})}$
Until S < ε;
Return the prior probabilities $\mu_{i(N+1)}$, $i = 1, 2, \ldots, c$ (the final column of the matrix U);

In the above algorithm, we have:

  1. ε is a small positive number chosen arbitrarily. The smaller ε is, the more iterations are required. In the examples of this chapter, we choose ε = 0.001.

  2. The distance matrix $D_{ik}$ depends on the norm-inducing matrix A. When A = I, $D_{ik}$ is the matrix of Euclidean distances. Besides, there are several other choices of A, such as a diagonal matrix or the inverse of the covariance matrix. In this chapter, we choose the Euclidean distance in the numerical examples and applications.

  3. m is the fuzziness parameter; when m = 1, fuzzy clustering becomes nonfuzzy (hard) clustering, and when m → ∞, the partition becomes completely fuzzy, μik = 1/c. Determining this parameter, which affects the analysis result, is difficult. Although Yu et al. [28] proposed two rules to determine the supremum of m for clustering problems, the search for a specific m is done by the meshing method (see [2, 4, 5, 9] for more details). By this process, the best m among several given values is chosen. In this chapter, m is also identified by the meshing method for the classification problem; the best integer m between 2 and 10 is used.

At the end of the PPC algorithm, we obtain the prior probabilities of x0 from the last column of the partition matrix U ($\mu_{i(N+1)}$, $i = 1, 2, \ldots, c$). The PPC algorithm helps us determine the prior probabilities via the degree of closeness between the classified object and the populations, so each object receives its own suitable prior probabilities.
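The following MATLAB function is a minimal sketch of Algorithm 1; the function name ppc_prior and its interface are illustrative assumptions, not the author's original procedure, and it uses the Euclidean distance (A = I):

% Minimal MATLAB sketch of Algorithm 1 (PPC); illustrative, not the author's code.
function q = ppc_prior(Z, labels, x0, m, epsilon)
% Z       : n x N training data matrix (objects in columns)
% labels  : 1 x N vector of population indices in {1, ..., c}
% x0      : n x 1 object to be classified
% m       : fuzziness parameter (m > 1); epsilon : stopping tolerance
c = max(labels);  N = size(Z, 2);
X = [Z, x0];                                        % append the object to classify
U = zeros(c, N + 1);
U(sub2ind(size(U), labels, 1:N)) = 1;               % crisp memberships for the training objects
U(:, N + 1) = 1 / c;                                % uniform initial membership for x0
while true
    W = U(:, 1:N).^m;                               % memberships of the training objects only
    V = (Z * W.') ./ sum(W, 2).';                   % prototypes v_i as columns (Eq. (16))
    D = zeros(c, N + 1);
    for i = 1:c
        D(i, :) = sqrt(sum((X - V(:, i)).^2, 1));   % Euclidean distances (A = I)
    end
    Unew = zeros(c, N + 1);
    for k = 1:N + 1
        d = D(:, k);
        if all(d > 0)
            Unew(:, k) = 1 ./ sum((d ./ d.').^(2 / (m - 1)), 2);
        else
            Unew(:, k) = (d == 0) / sum(d == 0);    % object coincides with a prototype
        end
    end
    if max(abs(Unew(:) - U(:))) < epsilon, U = Unew; break; end
    U = Unew;
end
q = U(:, N + 1);                                    % prior probabilities for x0
end

A call such as q = ppc_prior(Z, labels, x0, m, 0.001) returns the final column of the partition matrix, which BayesC uses as the prior probabilities of x0.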

In this chapter, the Bayesian methods with prior probabilities calculated by the uniform distribution approach, the sample-ratio approach, the Laplace approach, and the proposed PPC algorithm are called BayesU, BayesR, BayesL, and BayesC, respectively.

Example 1. Consider the studied marks (10-point grading scale) of 20 students. Among them, nine students have marks lower than 5 (w1: fail the exam) and 11 students have marks higher than 5 (w2: pass the exam). The data are given in Table 1.

Objects Marks Groups Objects Marks Groups
1 0.6 w1 11 5.6 w2
2 1.0 w1 12 6.1 w2
3 1.2 w1 13 6.4 w2
4 1.6 w1 14 6.4 w2
5 2.2 w1 15 7.3 w2
6 2.4 w1 16 8.4 w2
7 2.4 w1 17 9.2 w2
8 3.9 w1 18 9.4 w2
9 4.3 w1 19 9.6 w2
10 5.5 w2 20 9.8 w2

Table 1.

The studied marks of 20 students and the actual classifications.

Assume that we need to classify the ninth object, x0 = 4.3, into one of the two populations. Using the PPC algorithm, we obtain the following final partition matrix:

$\begin{pmatrix} 0.957 & 0.973 & 0.981 & 0.993 & 1 & 0.997 & 0.997 & 0.830 & 0.321 & 0.290 & 0.158 & 0.1 & 0.1 & 0.01 & 0.009 & 0.037 & 0.045 & 0.054 & 0.062 & 0.724 \\ 0.043 & 0.027 & 0.019 & 0.007 & 0 & 0.003 & 0.003 & 0.170 & 0.679 & 0.710 & 0.842 & 0.9 & 0.9 & 0.99 & 0.991 & 0.963 & 0.955 & 0.946 & 0.938 & 0.276 \end{pmatrix}$   (36)

This matrix shows the prior probabilities when assigning the ninth object to w1 and w2 are 0.724 and 0.276, respectively. Meanwhile, the prior probabilities determined by BayesU, BayesR, and BayesL are (0.5; 0.5), (0.421; 0.579), and (0.429; 0.571), respectively.

From the data in Table 1, we estimate the pdfs f1(x) and f2(x) and compute the values q1f1(x0) and q2f2(x0), where q1 and q2 are the calculated prior probabilities. The results of classifying x0 by the four approaches BayesU, BayesR, BayesL, and BayesC are given in Table 2.

Methods Priors gmax(x0) Populations Bayes errors
BayesU (0.5; 0.5) 0.0353 2 0.0538
BayesR (0.421; 0.579) 0.0409 2 0.0558
BayesL (0.429; 0.571) 0.0403 2 0.0557
BayesC (0.724; 0.276) 0.0485 1 0.0241

Table 2.

The results when classifying the ninth object.

Because the actual population of x0 is w1, only BayesC gives the true result. The Bayes error of BayesC is also the smallest. Thus, in this example, the proposed method improves the drawback of the traditional method in determining the prior probabilities.

3.2. Determining Bayes error

Theorem 2. Let $f_i(x)$, $i = 1, 2, \ldots, k$, $k \ge 3$, be $k$ pdfs defined on $\mathbb{R}^n$, $n \ge 1$, and let $q_i \in (0;1)$. Define the regions

$R_1^n = \{x \in \mathbb{R}^n : q_1 f_1(x) > q_j f_j(x), \ 2 \le j \le k\},$
$R_k^n = \{x \in \mathbb{R}^n : q_k f_k(x) \ge q_j f_j(x), \ 1 \le j \le k-1\},$
$R_l^n = \{x \in \mathbb{R}^n : q_i f_i(x) \le q_l f_l(x), \ 1 \le i \le k, \ i \ne l\}, \quad 2 \le l \le k-1.$   (17)

The Bayes error is determined by

$Pe_{1,2,\ldots,k}^{(q)} = 1 - \int_{R_1^n} q_1 f_1(x)\,dx - \sum_{l=2}^{k-1} \int_{R_l^n} q_l f_l(x)\,dx - \int_{R_k^n} q_k f_k(x)\,dx.$   (18)

Proof:

To obtain Eq. (18), we need to prove the two following results:

$R_i^n \cap R_j^n = \varnothing, \quad (1 \le i \ne j \le k)$   (37)

and $\bigcup_{i=1}^{k} R_i^n = R_1^n \cup \left(\bigcup_{i=2}^{k-1} R_i^n\right) \cup R_k^n = \mathbb{R}^n$, with $g_{\max}(x) = g_i(x)$ for $x \in R_i^n$.

Let $\bar{A} = \mathbb{R}^n \setminus A$, and write

$\bar{R}_{ij} = \{x \in \mathbb{R}^n : q_i f_i(x) \le q_j f_j(x)\}, \quad R_{ij} = \{x \in \mathbb{R}^n : q_i f_i(x) > q_j f_j(x)\}, \quad (1 \le i, j \le k).$   (38)

From Eq. (17), we obtain

$R_1^n = \bigcap_{j=2}^{k} R_{1j}, \quad R_l^n = \bigcap_{i \ne l} \bar{R}_{il}, \quad (2 \le l < k).$   (39)

Therefore,

$R_1^n \cap R_l^n = \left(\bigcap_{j=2}^{k} R_{1j}\right) \cap \left(\bigcap_{i \ne l} \bar{R}_{il}\right) \subseteq R_{1l} \cap \bar{R}_{1l} = \varnothing \ \Rightarrow \ R_1^n \cap R_l^n = \varnothing, \quad (2 \le l < k).$   (40)

On the other hand, by De Morgan's laws, we have

$\overline{R_1^n \cup R_l^n} = \left(\bigcup_{j=2}^{k} \bar{R}_{1j}\right) \cap \left(\bigcup_{i \ne l} R_{il}\right) \subseteq \bar{R}_{1l} \cap R_{1l} = \varnothing \ \Rightarrow \ R_1^n \cup R_l^n = \mathbb{R}^n, \quad (2 \le l < k).$   (41)

Similarly,

$R_k^n \cap R_l^n = \varnothing, \quad (2 \le l < k), \qquad R_1^n \cap R_k^n = \varnothing,$   (42)

so

$\bigcup_{i=1}^{k} R_i^n = R_1^n \cup \left(\bigcup_{l=2}^{k-1} R_l^n\right) \cup R_k^n = \left(\bigcup_{l=2}^{k-1} (R_1^n \cup R_l^n)\right) \cup \left(\bigcup_{l=2}^{k-1} (R_k^n \cup R_l^n)\right) = \mathbb{R}^n \cup \mathbb{R}^n = \mathbb{R}^n.$   (43)

In addition, from Eq. (17), we can directly verify that

$g_{\max}(x) = g_i(x), \quad x \in R_i^n, \quad (1 \le i \le k).$   (44)

For $k = 2$ and $q_1 = q_2 = 1/2$, we consider the two following special cases (a numerical check is sketched after this list):

  1. If $f_1(x)$ and $f_2(x)$ are two one-dimensional normal pdfs ($N(\mu_i, \sigma_i^2)$, $i = 1, 2$), we may suppose without loss of generality that $\mu_1 < \mu_2$ (when $\mu_1 \ne \mu_2$) and $\sigma_1 < \sigma_2$ (when $\sigma_1 \ne \sigma_2$); then

    $Pe_{1,2}^{(1/2,1/2)} = \begin{cases} \dfrac{1}{2}\left[\int_{-\infty}^{x_1} f_2(x)\,dx + \int_{x_1}^{+\infty} f_1(x)\,dx\right], & \text{if } \sigma_1 = \sigma_2,\\[6pt] \dfrac{1}{2}\left[\int_{-\infty}^{x_2} f_1(x)\,dx + \int_{x_2}^{x_3} f_2(x)\,dx + \int_{x_3}^{+\infty} f_1(x)\,dx\right], & \text{if } \sigma_1 < \sigma_2, \end{cases}$   (45)

    where

    $x_1 = \dfrac{\mu_1 + \mu_2}{2}, \quad x_{2,3} = \dfrac{(\mu_1\sigma_2^2 - \mu_2\sigma_1^2) \mp \sigma_1\sigma_2\sqrt{(\mu_1 - \mu_2)^2 + K}}{\sigma_2^2 - \sigma_1^2}, \quad K = 2(\sigma_2^2 - \sigma_1^2)\ln\left(\dfrac{\sigma_2}{\sigma_1}\right) \ge 0.$   (46)

    For $\mu_1 = \mu_2 = \mu$, the above result becomes:

    $Pe_{1,2}^{(1/2,1/2)} = \begin{cases} \dfrac{1}{2}, & \text{if } \sigma_1 = \sigma_2,\\[6pt] \dfrac{1}{2}\left[\int_{-\infty}^{x_4} f_1(x)\,dx + \int_{x_4}^{x_5} f_2(x)\,dx + \int_{x_5}^{+\infty} f_1(x)\,dx\right], & \text{if } \sigma_1 < \sigma_2, \end{cases}$   (47)

    where $x_4 = \mu - \sigma_1\sigma_2 E$ and $x_5 = \mu + \sigma_1\sigma_2 E$, with $E = \sqrt{\dfrac{2}{\sigma_2^2 - \sigma_1^2}\ln\left(\dfrac{\sigma_2}{\sigma_1}\right)} \ge 0$.

  2. If $f_1(x)$ and $f_2(x)$ are two $n$-dimensional normal pdfs ($N(\mu_i, \Sigma_i)$, $n \ge 2$, $i = 1, 2$), then

    $Pe_{1,2}^{(1/2,1/2)} = \dfrac{1}{2}\left[\int_{R_1^n} f_2(x)\,dx + \int_{R_2^n} f_1(x)\,dx\right],$   (48)

    where

    $R_1^n = \{x : d(x) > 0\}, \quad R_2^n = \{x : d(x) \le 0\},$
    $d(x) = \left[\mu_1^T \Sigma_1^{-1} - \mu_2^T \Sigma_2^{-1}\right]x - \dfrac{1}{2} x^T \left[\Sigma_1^{-1} - \Sigma_2^{-1}\right]x - m, \quad m = \dfrac{1}{2}\left[\ln\dfrac{|\Sigma_1|}{|\Sigma_2|} + \mu_1^T \Sigma_1^{-1}\mu_1 - \mu_2^T \Sigma_2^{-1}\mu_2\right].$   (49)

In the case $n = 2$, the boundary $d(x) = 0$ can be a straight line, a parabola, an ellipse, or a hyperbola.
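As mentioned in case 1 above, the following MATLAB sketch checks the closed form of Eqs. (45) and (46) against the general formula (3) for one illustrative pair of univariate normal pdfs with q1 = q2 = 1/2 and σ1 < σ2 (the parameter values are arbitrary, not taken from the chapter):

% Check of Eqs. (45)-(46) for two illustrative univariate normal pdfs, q1 = q2 = 1/2.
mu1 = 0; sigma1 = 1; mu2 = 2; sigma2 = 2;
f1 = @(x) exp(-(x - mu1).^2 / (2 * sigma1^2)) / (sigma1 * sqrt(2 * pi));
f2 = @(x) exp(-(x - mu2).^2 / (2 * sigma2^2)) / (sigma2 * sqrt(2 * pi));

% Intersection points x2 < x3 from Eq. (46)
K  = 2 * (sigma2^2 - sigma1^2) * log(sigma2 / sigma1);
s  = sigma1 * sigma2 * sqrt((mu1 - mu2)^2 + K);
x2 = ((mu1 * sigma2^2 - mu2 * sigma1^2) - s) / (sigma2^2 - sigma1^2);
x3 = ((mu1 * sigma2^2 - mu2 * sigma1^2) + s) / (sigma2^2 - sigma1^2);

% Bayes error from the closed form (45): f1 outside (x2, x3), f2 inside
Pe_closed = 0.5 * (integral(f1, -Inf, x2) + integral(f2, x2, x3) + integral(f1, x3, Inf));

% Bayes error from the general formula (3): 1 - integral of g_max
gmax = @(x) max(0.5 * f1(x), 0.5 * f2(x));
Pe_direct = 1 - integral(gmax, -Inf, Inf);

fprintf('closed form: %.6f   direct: %.6f\n', Pe_closed, Pe_direct);

The two printed values agree, confirming that the intersection points x2 and x3 split the real line exactly as in Eq. (45).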

3.3. Maximum function in the classification problem

To classify a new element by principle (1) and to determine the Bayes error by formula (3), we must find $g_{\max}(x)$. Some authors, such as Pham–Gia et al. [15, 17] and Tai [21, 22], have surveyed relationships between $g_{\max}(x)$ and related quantities of the classification problem. A specific expression for $g_{\max}(x)$ has been found in some special cases [18]; however, a general expression covering all cases is a complex problem that has still not been solved.

Given $k$ pdfs $f_i(x)$ and prior probabilities $q_i$, $i = 1, 2, \ldots, k$, with $q_1 + q_2 + \cdots + q_k = 1$, let $g_i(x) = q_i f_i(x)$ and $g_{\max}(x) = \max\{g_i(x)\}$. We are now interested in determining $g_{\max}(x)$.

(a) For one dimension

In this case, we can find gmax(x) by the following algorithm:

Algorithm 2. Find the gmax(x) function
Input: $g_i(x) = q_i f_i(x)$, where $f_i(x)$ and $q_i$ are the probability density function and the prior probability of $w_i$, $i = 1, 2, \ldots, k$, respectively.
Output: The $g_{\max}(x)$ function.
Find all roots of the equations $g_i(x) - g_j(x) = 0$, $i = \overline{1, k-1}$, $j = \overline{i+1, k}$.
Let B be the set of all roots.
For $x_{lm} \in B$ (the root of the equation $g_l(x) - g_m(x) = 0$) do
     For $p \in \{1, 2, \ldots, k\} \setminus \{l, m\}$ do
      If $g_l(x_{lm}) < g_p(x_{lm})$ then $B = B \setminus \{x_{lm}\}$
      End
     End
End
Arrange the elements of B in increasing order:
        $B = \{x_1, x_2, \ldots, x_h\}$, $x_1 < x_2 < \cdots < x_h$
(Determine the function $g_{\max}(x)$ on the interval $(-\infty, x_1]$)
     For i = 1 to k do
         If $g_i(x_1 - \varepsilon_1) = \max\{g_1(x_1 - \varepsilon_1), g_2(x_1 - \varepsilon_1), \ldots, g_k(x_1 - \varepsilon_1)\}$ then
               $g_{\max}(x) = g_i(x)$ for all $x \in (-\infty, x_1]$
         End
     End
(Determine the function $g_{\max}(x)$ on the intervals $(x_j, x_{j+1}]$, $j = \overline{1, h-1}$)
     For i = 1 to k do
      For j = 1 to h-1 do
       If $g_i(x_j + \varepsilon_2) = \max\{g_1(x_j + \varepsilon_2), g_2(x_j + \varepsilon_2), \ldots, g_k(x_j + \varepsilon_2)\}$ then
           $g_{\max}(x) = g_i(x)$ for all $x \in (x_j, x_{j+1}]$
       End
      End
     End
(Determine the function $g_{\max}(x)$ on the interval $(x_h, +\infty)$)
     For i = 1 to k do
      If $g_i(x_h + \varepsilon_3) = \max\{g_1(x_h + \varepsilon_3), g_2(x_h + \varepsilon_3), \ldots, g_k(x_h + \varepsilon_3)\}$ then
               $g_{\max}(x) = g_i(x)$ for all $x \in (x_h, +\infty)$
      End
     End

In the above algorithm, $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$ are positive constants chosen small enough that each evaluation point lies inside the corresponding interval; in particular,

$\varepsilon_1 > 0, \quad x_j + \varepsilon_2 < x_{j+1} \ \text{for } j = 1, \ldots, h-1, \quad \varepsilon_3 > 0.$   (50)

From this algorithm, we have written a MATLAB code to find $g_{\max}(x)$. Once $g_{\max}(x)$ is determined, we can easily calculate the Bayes error by formula (3), as well as classify a new element by principle (1).

Example 2. Given seven populations having univariate normal pdfs {f1, f2,…, f7} with specific parameters as follows (Figure 1):

Figure 1.

The graph of seven one-dimension normal pdfs, fmax(x) and gmax(x).

$\mu_1 = 0.3, \ \mu_2 = 4.0, \ \mu_3 = 9.1, \ \mu_4 = 1.9, \ \mu_5 = 5.3, \ \mu_6 = 8, \ \mu_7 = 4.8;$
$\sigma_1 = 1.0, \ \sigma_2 = 1.3, \ \sigma_3 = 1.4, \ \sigma_4 = 1.6, \ \sigma_5 = 2, \ \sigma_6 = 1.9, \ \sigma_7 = 2.3.$   (51)

Using the written codes with $q_i = 1/7$ and $g_i(x) = q_i f_i(x)$, $i = 1, 2, \ldots, 7$, we obtain the results:

$g_{\max}(x) = \begin{cases} g_1 & \text{if } -1.28 < x \le 0.99,\\ g_2 & \text{if } 2.58 < x \le 4.89,\\ g_3 & \text{if } 8.30 < x \le 12.52,\\ g_4 & \text{if } \{-7.86 < x \le -1.28\} \cup \{0.99 < x \le 2.58\},\\ g_5 & \text{if } 4.89 < x \le 6.65,\\ g_6 & \text{if } \{6.65 < x \le 8.30\} \cup \{12.52 < x \le 23.33\},\\ g_7 & \text{if } \{x \le -7.86\} \cup \{x > 23.33\}. \end{cases}$   (52)
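The following MATLAB sketch approximates the result of Example 2 by evaluating g_max(x) on a fine grid; it is a numerical shortcut for checking Eq. (52) and the Bayes error of Eq. (3), not the root-finding procedure of Algorithm 2, and the grid limits are an assumption chosen to cover the effective support:

% Grid-based check of Example 2: g_max(x) and the Bayes error of Eq. (3).
mu    = [0.3 4.0 9.1 1.9 5.3 8.0 4.8];
sigma = [1.0 1.3 1.4 1.6 2.0 1.9 2.3];
q     = ones(1, 7) / 7;

x = linspace(-15, 35, 200001);                 % grid covering the effective support
G = zeros(7, numel(x));
for i = 1:7
    G(i, :) = q(i) * exp(-(x - mu(i)).^2 / (2 * sigma(i)^2)) / (sigma(i) * sqrt(2 * pi));
end
[gmax, idx] = max(G, [], 1);                   % g_max(x) and the index attaining it

Pe = 1 - trapz(x, gmax);                       % Bayes error, Eq. (3)
fprintf('Approximate Bayes error: %.4f\n', Pe);

% Boundaries where the maximizing component changes (compare with Eq. (52))
change = x(find(diff(idx) ~= 0) + 1);
disp(change);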

(b) For multidimension

In multidimensional cases, it is usually very complicated to obtain a closed expression for $g_{\max}(x)$. The difficulty comes from the various forms of the intersection curves between the pdf surfaces. This problem has interested many authors [17, 18, 21-25]. Pham–Gia et al. [18] attempted to find the function $g_{\max}(x)$; however, it has only been established for some cases of the bivariate normal distribution.

Example 3. Given the four bivariate normal pdfs N(μi, Σi) with the following specific parameters [16]:

$\mu_1 = \begin{bmatrix} 40 \\ 20 \end{bmatrix}, \ \mu_2 = \begin{bmatrix} 48 \\ 24 \end{bmatrix}, \ \mu_3 = \begin{bmatrix} 43 \\ 32 \end{bmatrix}, \ \mu_4 = \begin{bmatrix} 38 \\ 28 \end{bmatrix},$
$\Sigma_1 = \begin{pmatrix} 35 & 18 \\ 18 & 20 \end{pmatrix}, \ \Sigma_2 = \begin{pmatrix} 28 & 20 \\ 20 & 25 \end{pmatrix}, \ \Sigma_3 = \begin{pmatrix} 15 & 25 \\ 25 & 65 \end{pmatrix}, \ \Sigma_4 = \begin{pmatrix} 5 & 10 \\ 10 & 7 \end{pmatrix}.$   (53)

With q1 = 0.25, q2 = 0.2, q3 = 0.4, and q4 = 0.15, we have the graphs of gi(x) = qifi(x) and their intersection curves as shown in Figure 2.

Figure 2.

The graph of three bivariate normal pdfs and their gmax(x).

Here, we do not derive the expression of $g_{\max}(x)$; instead, we compute the Bayes error by integrating $g_{\max}(x)$ with the quasi-Monte Carlo method [17]. An algorithm for this calculation has been constructed, and the corresponding MATLAB procedure is used in Section 4.
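The quasi-Monte Carlo computation can be sketched in MATLAB as follows. The sketch assumes the Statistics and Machine Learning Toolbox (haltonset, mvnpdf); the bivariate normal parameters, priors, and bounding box below are illustrative choices, not the parameters of Example 3:

% Quasi-Monte Carlo estimate of the Bayes error, Eq. (3), in two dimensions.
mu    = {[0 0], [3 1], [1 4]};
Sigma = {[2 0.5; 0.5 1], [1 -0.3; -0.3 2], [1.5 0; 0 1.5]};
q     = [0.3 0.3 0.4];

lo = [-8 -8]; hi = [12 12];                    % bounding box for the integration
M  = 1e5;                                      % number of quasi-random points
P  = net(haltonset(2, 'Skip', 1000), M);       % Halton points in [0,1]^2
X  = lo + P .* (hi - lo);                      % map the points to the bounding box

G = zeros(M, numel(q));
for i = 1:numel(q)
    G(:, i) = q(i) * mvnpdf(X, mu{i}, Sigma{i});
end
gmax = max(G, [], 2);                          % g_max at each quasi-random point

vol = prod(hi - lo);
Pe  = 1 - vol * mean(gmax);                    % quasi-Monte Carlo estimate of Eq. (3)
fprintf('Estimated Bayes error: %.4f\n', Pe);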

3.4. Estimate the probability density function

There are many parametric and nonparametric methods to estimate pdfs. In the examples and applications of Section 4, we use the kernel function method, a popular one in practice nowadays. It has the following formula:

$f(x) = \dfrac{1}{N h_1 h_2 \cdots h_n} \sum_{i=1}^{N} \prod_{j=1}^{n} f_j\!\left(\dfrac{x_j - x_{ij}}{h_j}\right),$   (19)

where $x_j$, $j = 1, 2, \ldots, n$, are the variables, $x_{ij}$, $i = 1, 2, \ldots, N$, is the ith observation of the jth variable, $h_j$ is the bandwidth parameter for the jth variable, and $f_j(\cdot)$ is the kernel function of the jth variable, usually the normal, Epanechnikov, biweight, or triweight kernel. In this method, the choice of the smoothing parameter and of the type of kernel function plays an important role and affects the result. Although Silverman [20], Martinez and Martinez [10], and some other authors [7, 13, 27] have discussed this problem, an optimal choice has still not been found. In this chapter, the smoothing parameter follows the idea of Scott [19] and the kernel function is the Gaussian one. We have also written MATLAB code to estimate the pdfs in the n-dimensional space using this method.
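A minimal MATLAB sketch of the product Gaussian kernel estimator (19) with bandwidths in the spirit of Scott's rule is given below; the function name kde_gauss, its interface, and the exact bandwidth constant are illustrative assumptions, not the author's original procedure:

% Minimal sketch of the product Gaussian kernel estimator (19).
function fx = kde_gauss(X, x)
% X : N x n data matrix (one population), x : 1 x n evaluation point
[N, n] = size(X);
h = std(X, 0, 1) * N^(-1 / (n + 4));          % Scott-type rule, one bandwidth per variable
u = (x - X) ./ h;                             % standardized differences, N x n
k = exp(-u.^2 / 2) / sqrt(2 * pi);            % Gaussian kernel, applied per variable
fx = sum(prod(k, 2)) / (N * prod(h));         % Eq. (19)
end

Estimating f1 and f2 of a two-group problem then amounts to calling this function once per group and comparing q1 f1(x0) with q2 f2(x0), as in principle (1).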

We have written the complete code for the proposed algorithm by MATLAB software. It is applied effectively for the examples of Section 4.


4. Some applications

In this section, we consider three applications in three domains, biology, medicine, and economics, to illustrate the presented theory and to test the established algorithms. They also show that the proposed algorithm has more advantages than the existing ones.

Application 1. We consider the classification of the well-known Iris flower data, which have been presented in many documents, e.g., Ref. [17]. These data are often used to compare new classification methods with existing ones. The three varieties of Iris, namely Setosa (Se), Versicolor (Ve), and Virginica (Vi), have data on four attributes: X1 = sepal length, X2 = sepal width, X3 = petal length, and X4 = petal width.

In this application, the cases of one, two, three, and four variables are considered in turn to classify the three groups (Se), (Ve), and (Vi) by the Bayesian method with different prior probabilities. The purpose is to compare the results of BayesC with BayesU, BayesR, and BayesL. Because the sizes of the three groups are equal, the results of BayesU, BayesR, and BayesL are the same. The correct classification probabilities of the methods are summarized in Table 3.

Variables BayesU = BayesL = BayesR BayesC
X1 0.667 0.679
X2 0.668 0.579
X3 0.903 0.916
X4 0.815 0.827
X1, X2 0.715 0.807
X1, X3 0.893 0.895
X1, X4 0.807 0.850
X2, X3 0.891 0.898
X2, X4 0.809 0.815
X3, X4 0.843 0.866
X1, X2, X3 0.892 0.919
X1, X2, X4 0.764 0.810
X1, X3, X4 0.762 0.814
X2, X3, X4 0.736 0.822
X1, X2, X3, X4 0.725 0.745

Table 3.

The correct probability in classifying the Iris flower data.

Table 3 shows that in almost all cases the results of the proposed algorithm are better than those of the other approaches, and the case using the three variables X1, X2, and X3 gives the best result.

Application 2. This application considers thyroid gland disease (TGD). The thyroid is an important gland, the largest in our body, responsible for the metabolism and working processes of all cells. Some common diseases of the thyroid gland are hypothyroidism, hyperthyroidism, thyroid nodules, and thyroid cancer; they are dangerous diseases. Recently, the rate of thyroid gland disease has been increasing in some poor countries. The data include 3772 persons, with 3541 in the ill group (I) and 231 in the non-ill group (NI). Details of these data are given at http://www.cs.sfu.ca/wangk/ucidata/dataset/thyroid–disease, in which the surveyed variables are Age (X1), Query on thyroxin (X2), Anti-thyroid medication (X3), Sick (X4), Pregnant (X5), Thyroid surgery (X6), Thyroid Stimulating Hormone (X7), Triiodothyronine (X8), Total thyroxin (X9), T4U measured (X10), and Referral source (X11). In this application, a random 70% of the data (2479 elements of group I and 162 elements of group NI) is used as the training set to determine the significant variables, to estimate the pdfs, and to find a suitable model. The remaining 30% is used as the test set (1062 elements of group I and 69 elements of group NI). The result of the Bayesian method is also compared with the others.

To assess the effect of the independent variables on TGD, we build the logistic regression model for log(p/(1−p)) with the variables Xi, i = 1, 2, …, 11, where p is the probability of TGD. The analytical results are summarized in Table 4.

Variable Sig. Variable Sig.
X1 0.000 X7 0.304
X2 0.279 X8 0.000
X3 0.998 X9 0.995
X4 0.057 X10 0.999
X5 0.997 X11 0.000
X6 0.997 Const 0.992

Table 4.

Value Sigs of logistic regression model.

In Table 4, the three variables X1, X8, and X11 (Sig. = 0.000) are statistically significant at the 5% level for classifying the two groups (I) and (NI), so we use them to classify TGD.
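A minimal sketch of this variable-screening step in MATLAB is shown below; it assumes a predictor matrix X (observations in rows, the 11 variables in columns) and a 0/1 response vector y indicating TGD, and it requires the Statistics and Machine Learning Toolbox (glmfit):

% Sketch of the logistic-regression screening of the 11 candidate variables.
[b, ~, stats] = glmfit(X, y, 'binomial');      % logistic regression, log(p/(1-p)) = Xb
pvals = stats.p(2:end);                        % drop the intercept's p-value
selected = find(pvals < 0.05);                 % variables significant at the 5% level
disp(selected);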

Applying the PPC algorithm for the cases of one, two, and three variables with all the prior probability approaches, we obtain the results given in Table 5.

Table 5 shows that the correct classification probability is high, and BayesC always gives the best result in all three cases. With three variables, BayesC gives an almost exact result. We also compare BayesC with the existing methods (Fisher, SVM, and logistic) for all three cases; in every case, BayesC is more advantageous than the others in reducing the Bayes error.

Cases Variables BayesU BayesR BayesL BayesC
One variable X1 91.13 97.47 97.46 97.97
X8 90.72 98.51 98.50 98.65
X11 90.53 97.48 97.47 98.19
Two variables X1, X8 98.73 98.77 98.77 99.78
X1, X11 98.11 98.65 97.65 99.44
X8, X11 98.71 98.77 98.77 99.82
Three variables X1, X8, X11 98.35 98.89 98.89 99.96

Table 5.

The correct probability (%) in classifying TGD by Bayesian method from training set.

Using the best model of each method from Table 6 to classify the test set (1131 elements), we obtain the results given in Table 7.

Methods One variable Two variables Three variables
Logistic 93.90 93.90 93.90
Fisher 72.30 73.60 71.70
SVM 93.87 93.87 93.87
BayesC 98.65 99.82 99.96

Table 6.

The correct probability (%) for optimal models of methods in classifying TGD.

From Table 7, we see that with the test set, BayesC also gives the best result.

Methods Correct numbers False numbers Correct probability
Logistic 835 296 73.8
Fisher 835 296 73.8
SVM 1062 69 90.9
BayesC 1062 69 93.9

Table 7.

Compare the correct probability (%) in classifying TGD from test set.

Application 3. This application considers the problem of repaying bank debt (RBD) by customers. In bank credit operations, determining the repayment ability of customers is very important: if lending is too easy, the bank may face bad-debt problems; otherwise, the bank may miss good business. Therefore, in recent years, the classification of credit applications by the ability to repay bank debt has been studied intensively and remains a difficult problem in Vietnam. In this section, we appraise this ability for companies in Can Tho city (CTC), Vietnam, using the proposed approach. We collected data on 214 enterprises operating in the key sectors of agriculture, industry, and commerce, including 143 cases of good debt (G) and 71 cases of bad debt (B). The data are provided by the responsible organizations of CTC. Each company is evaluated by 13 independent variables based on expert opinion. The specific variables are given in Table 8.

Xi Independent variables Detail
X1 Financial leverage Total debt/total equity
X2 Reinvestment Total debt/total equity
X3 Roe Net profit/equity
X4 Interest (Net income + depreciation)/total assets
X5 Floating capital (Current assets − current liabilities)/total assets
X6 Liquidity (Cash + Short-term investments)/current liabilities
X7 Profits Net profit/total assets
X8 Ability Net sales/Total assets
X9 Size Logarithm of total assets
X10 Experience Years in business activity
X11 Agriculture Agricultural and forestry sector
X12 Industry Industry and construction
X13 Commerce Trade and services

Table 8.

The surveyed independent variables.

Because of the sensitivity of this problem, the author has to conceal the real data and work with a training data set. The steps performed in this application are similar to those of Application 2. The training set has 100 elements of group G and 50 elements of group B, and the test set has 43 elements of group G and 21 elements of group B. On the training set, the logistic regression model shows that only the three variables X1, X4, and X7 are statistically significant at the 5% level, so we use these three variables to perform BayesU, BayesR, BayesL, and BayesC. Their results are given in Table 9.

Cases variables BayesU BayesR BayesL BayesC
One variable X1 86.21 86.14 84.13 87.13
X4 81.12 82.91 86.16 88.19
X7 83.21 84.63 83.14 84.52
Two variables X1, X4 87.25 88.72 87.19 89.06
X1, X7 88.16 88.34 83.26 89.56
X4, X7 89.25 89.04 89.02 91.34
Three variables X1, X4, X7 91.15 91.53 90.17 93.18

Table 9.

The correct probability (%) in classifying RBD by Bayesian method from training set.

From Table 9, we see that BayesC gives the highest correct probability in all cases. We also apply the logistic, Fisher, and SVM methods to the training set to find their best results; the correct probabilities are given in Table 10.

Using the best model of each method from Table 10 to classify the test set (64 elements), we obtain the results given in Table 11.

Methods One variable Two variables Three variables
Logistic 84.04 88.29 88.69
Fisher 84.73 80.73 79.32
SVM 82.34 82.03 83.07
BayesC 88.19 91.34 93.18

Table 10.

The correct probability (%) for optimal models of methods in classifying RBD.

Once again from Table 11, we see that with test data, BayesC also gives the best result.

Methods Correct numbers False numbers Correct probability
Logistic 53 11 82.81
Fisher 52 12 81.25
SVM 53 11 82.81
BayesC 57 7 89.06

Table 11.

Compare the correct probability (%) in classifying RBD from test set.


5. Conclusion

This chapter has presented the classification problem by the Bayesian method in both theoretical and applied aspects. We establish relations of the Bayes error with other measures and consider the problem of computing it in real applications in one and several dimensions. An algorithm to determine the prior probabilities, which may decrease the Bayes error, is proposed. The researched problems are applied in three different domains, biology, medicine, and economics, and show that the proposed approach has more advantages than the existing ones. In addition, a complete MATLAB procedure has been written and used effectively in the real applications. These examples show the potential of our work for research on real problems.

References

  1. 1. Altman DG. Statistics in medical journals: Development in 1980s. Statistical Medicine. 1991;10:546-551. DOI: 10.1002/sim.4780101206
  2. 2. Bora DJ, Gupta AK. Impact of exponent parameter value for the partition matrix on the performance of fuzzy C means Algorithm. International Journal of Scientific Research in Computer Science Applications and Management Studies. 2014;3:1-6. DOI: arXiv:1406.4007
  3. 3. Cristiani S, Shawe TJ. An introduction to support vector machines and other kernel-based learning method. 2nd ed. London: Cambridge University; 2000. p. 204. DOI: 10.1108/k.2001.30.1.103.6
  4. 4. Cannon RL, Dave JV, Bezdek JC. Efficient implementation of the fuzzy c-means clustering algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986;2:248-255. DOI: 10.1109/TPAMI.1986.4767778
  5. 5. Fadili MJ, et al. On the number of clusters and the fuzziness index for unsupervised FCA application to BOLD fMRI time series. Medical Image Analysis. 2001;5(1):55-67. DOI: 10.1016/S1361-8415(00)00035-9
  6. 6. Fisher RA. The statistical utilization of multiple measurements. Annals of Eugenic. 1936;7:376-386. DOI: 10.1111/j.1469-1809.1938.tb02189
  7. 7. Ghosh AK. Classification using kernel density estimates. Technometrics. 2006;48:120-132. DOI: 10.1198/004017005000000391
  8. 8. Jan YK, Cheng CW, Shih YH. Application of logistic regression analysis of home mortgage loan prepayment and default. ICIC Express Letters. 2010;2:325-331. DOI: 10.12783/ijss.2015.03.014
  9. 9. Hall LO, et al. A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Transactions on Neural Networks. 1992;3(5):672-682. DOI: 10.1109/72.159057
  10. 10. Martinez WL, Martinez AR. Computational statistics handbook with Matlab. 1st ed. Boca Raton: CRC Press; 2007. DOI: 1198/tech.2002.s89
  11. 11. Matusita K. On the notion of affinity of several distributions and some of its applications. Annals of the institute of Statistical Mathematics. 1967;19:181-192. DOI: 10.1007/BF02481018
  12. 12. Marta E. Application of Fisher's method to materials that only release water at high temperatures. Portugaliae Electrochimica Acta. 2001;15:301-311. DOI: 10.1016/S0167-7152(02)00310-3
  13. 13. McLachlan GJ, Basford KE. Mixture Models: Inference and Applications to Clustering. 1st ed. New York: Marcel Dekker; 1988. DOI: 10.2307/2348072
  14. 14. Miller G, Inkret WC, Little TT. Bayesian prior probability distributions for internal dosimetry. Radiation Protection Dosimetry. 2001;94:347-352. DOI: 10.1093/oxfordjournals.rpd.a006509
  15. 15. Pham-Gia T, Turkkan T. Bounds for the Bayes error in classification: A Bayesian approach using discriminant analysis. Statistical Methods and Applications. 2006;16:7-26. DOI: 10.1007/s10260-006-0012-x
  16. 16. Pham–Gia T, Turkkan N, Bekker A. Bayesian analysis in the L1–norm of the mixing proportion using discriminant analysis. Metrika. 2008;64:1-22. DOI: 10.1007/s00184-006-0027-1
  17. 17. Pham–Gia T, Turkkan N, Tai VV. Statistical discrimination analysis using the maximum function. Communications in Statistics-Simulation and Computation. 2008;37:320-336. DOI: 10.1080/03610910701790475
  18. 18. Pham–Gia T, Nhat ND, Phong, NV. Statistical classification using the maximum function. Open Journal of Statistics. 2015;15:665-679. DOI: 10.4236/ojs.2015.57068
  19. 19. Scott DW. Multivariate density estimation: Theory, practice, and visualization. 1st ed. New York: Wiley; 1992. DOI: 10.1002/9780470316849
  20. 20. Silverman BW. Density Estimation for Statistics and Data Analysis. 1st ed. Boca Raton: CRC Press; 1986. DOI: 10.1007/978-1-4899-3324-9
  21. 21. Tai VV, Pham–Gia T. Clustering probability distributions. Journal of Applied Statistics. 2010;37:1891-1910. DOI: 10.1080/02664760903186049
  22. 22. Tai VV. L1–distance and classification problem by Bayesian method. Journal of Applied Statistics. 2017; 44(3):385-401. DOI: 10.1080/02664763.2016.1174194
  23. 23. Tai VV, Thao NT, Ha CN. The prior probability in classifying two populations by Bayesian method. Applied Mathematics Engineering and Reliability. 2016;6:35-40. DOI: 10.1201/b21348-7
  24. 24. Thao NT, Tai VV. Fuzzy clustering of probability density functions. Journal of Applied Statistics. 2017;44(4):583-601. DOI: 10.1080/02664763.2016.117750
  25. 25. Thao NT, Tai VV. A new approach for determining the prior probabilities in the classification problem by Bayesian method. Advances in data analysis and classification. Forthcoming. DOI: 10.1007/s11634-016-0253
  26. 26. Toussaint GT. Some inequalities between distance: Measures for feature evaluation. IEEE Transactions on Computers. 1972;21:405-410. DOI: 10.1109/TC1972.5008990
  27. 27. Webb AR. Statistical Pattern Recognition. 1st ed. New York: Wiley; 2003. DOI: 10.1109/34.824819
  28. 28. Yu J, Cheng Q, Huang H. Analysis of the weighting exponent in the FCM. IEEE Transactions on Systems, Man, and Cybernetics, Part B. 2004;34(1):634-639
