Quantitative Structure-Activity Relationship Modeling and Bayesian Networks: Optimality of Naive Bayes Model

Previously, computational drug design was usually based on simplified laws of molecular physics, used to calculate a ligand's interaction with the active site of a protein (enzyme). Currently, however, this interaction is widely estimated using statistical properties of known ligand-protein complexes. Such statistical properties are described by quantitative structure-activity relationships (QSAR). Bayesian networks can help us evaluate the stability of a ligand-protein complex using the statistics found. Moreover, it is possible to prove the optimality of the Naive Bayes model, which makes these evaluations simple and easy to implement in practice. Here we prove the optimality of the Naive Bayes model, using ligand-protein interaction as an illustration.


Introduction
The derivation in this chapter is based on paper [1]. Bayes classifiers are now widely used for recognition, identification, and knowledge discovery. The fields of application include, for example, image processing, personalized medicine [2], and chemistry (QSAR (quantitative structure-activity relationship) [3,4]; see Figure 1). Bayes classifiers are especially important in medical diagnostics and bioinformatics. Convincing illustrations of this can be found in the work of Raymer and colleagues [5]. Let us give an example of the use of QSAR from papers [3,4]: "Molecular recognition and binding performed by proteins are the background of all biochemical processes in a living cell. In particular, the usual mechanism of drug function is effective binding and inhibition of activity of a target protein. Direct modeling of molecular interactions in protein-inhibitor complexes is the basis of modern computational drug design but is an extremely complicated problem. In the current paradigm, site similarity is recognized by the existence of chemically and spatially analogous regions from binding sites. We present a novel notion of binding site local similarity based on the analysis of complete protein environments of ligand fragments. Comparison of a query protein binding site (target) against the 3D structure of another protein (analog) in complex with a ligand enables ligand fragments from the analog complex to be transferred to positions in the target site, so that the complete protein environments of the fragment and its image are similar. The revealed environments are similarity regions and the fragments transferred to the target site are considered as binding patterns. The set of such binding patterns derived from a database of analog complexes forms a cloudlike structure (fragment cloud), which is a powerful tool for computational drug design."
However, Bayes classifiers have a remarkable property: strangely, the Naive Bayes classifier usually gives a good description of recognition. More complex Bayes classifier models cannot improve on it significantly [1]. In paper [6] the authors explain this remarkable property. However, they use some assumptions (zero-one loss) that reduce the universality and generality of their proof. In this chapter we give a general proof of the optimality of the Naive Bayes classifier. The derivation in the current chapter is similar to [1]. A subsequent interesting consideration of the Naive Bayes classifier optimality problem was made in [7,8]. Unfortunately, these papers do not include any analysis of the earlier paper [1].
We would like to prove the optimality of the Naive Bayes classifier using QSAR terminology. We use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier. Let us define the main problem that we try to solve in this chapter. Assume that we have a set of states for a complex of a ligand and the active site of a protein, and a set of factors that characterize these states. For each state, we know the probability distribution of each factor. However, we have no information about the correlations between the factors. Now, assume that we know the factor values for some sample of the state. What is the probability that this sample corresponds to a given state? This is a typical problem of recognition under conditions of incomplete information.
In the simplest case, we can define two states for the "ligand-active site of protein" complex: 0 (the ligand is not bound to the active site of the protein) or 1 (the ligand is bound to the active site of the protein). The next step is the definition of factors (called reliabilities below) that characterize the strength of the bond in the "ligand-active site of protein" complex. Let us give an illustration of such factors from the QSAR practice described in papers [3,4]: "First, consider the protein 5 Å-environment $A = \{a_1, a_2, \ldots, a_N\}$ of one ligand atom $X$ in the analog protein, that is, all atoms from the binding site that are in the 5 Å-neighborhood of $X$. Suppose that the complete target binding site $T$ consists of $N'$ atoms: $T = \{t_1, t_2, \ldots, t_{N'}\}$ and there exists a subset $T' \subseteq T$ of size $n$ ($N' \geq n \geq 4$) such that $n$ atoms from $T'$ are similar to $n$ atoms $A' = \{a_{i_1}, a_{i_2}, \ldots, a_{i_n}\} \subseteq A$ in their chemical types and spatial arrangement. The search for $A'$ and $T'$ is performed using a standard clique detection technique in the graph whose nodes represent pairs $(a_i, t_i)$ of chemically equivalent atoms and edges reflect similarity of corresponding pairwise distances. If the search is successful, the optimal rigid motion superimposing matched protein atoms is applied both to the initial ligand atom $X$ and its complete environment $A$ (Figure 2(a) in [3]). The atoms are thus transferred to the target binding site. Then we extend the matching between $A'$ and $T'$ by such atom pairs $(a_i, t_i)$ that $a_i$ and $t_i$ have the same chemical atom type in the coarser 10-type typification mentioned above, and the distance between $t_i$ and the image $a'_i$ of atom $a_i$ is below a threshold. Next, a reliability value $R$, with $0 \leq R \leq 1$, is assigned to the image $X'$ of $X$ in the target site and reflects the similarity between the environments of $X$ and its image $X'$.
If the environments are highly similar ($R \approx 1$) we expect that the position of $X'$ is the place where an atom with chemical type identical to $X$ can be bound by the target, since the environment of $X'$ contains only atoms required for binding with no "alien" atoms. However, as illustrated in Figure 2(a) in [3], the analog site may contain extra binding atoms (shown on the lower side) that decrease the reliability value. In a simple form, the reliability $R$ can be defined as the sum of the number of matched atoms divided by the total number of analog and target atoms in the 5 Å-environments of $X$ and $X'$, respectively (Figure 2(b) in [3]): $R = 2n/(N + N')$, using the notation presented above. In fact, we use a somewhat more complicated definition that accounts for the quality of spatial superposition of matched atoms and their distance from $X'$." We do not discuss the definitions of these factors and states here. Our purpose is not to demonstrate the effectiveness of these definitions or of QSAR. The interested reader can learn about them from papers [3,4] and the references therein. As we said above, we use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier.
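The simple form of the reliability quoted above, $R = 2n/(N + N')$, can be sketched in a few lines of Python. This is only an illustration of the quoted formula: the function name `reliability` and its argument names are hypothetical, and the more complicated definition actually used in [3,4] (which accounts for superposition quality and distance from the image atom) is not reproduced here.

```python
def reliability(n_matched: int, n_analog: int, n_target: int) -> float:
    """Simple reliability R = 2n / (N + N') from the quoted definition.

    n_matched: number n of matched atom pairs
    n_analog:  number N of atoms in the analog environment
    n_target:  number N' of atoms in the target environment
    """
    if min(n_matched, n_analog, n_target) < 0:
        raise ValueError("atom counts must be non-negative")
    if n_matched > min(n_analog, n_target):
        raise ValueError("matched atoms cannot exceed either environment size")
    total = n_analog + n_target
    if total == 0:
        raise ValueError("both environments are empty")
    return 2.0 * n_matched / total


if __name__ == "__main__":
    # Identical environments: every atom is matched, so R reaches its maximum.
    print(reliability(8, 8, 8))
    # Extra unmatched atoms in the analog site lower the reliability.
    print(reliability(6, 10, 8))
```

With this definition, $R$ equals 1 only when every atom of both environments is matched, and any unmatched atom on either side lowers the value.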
Let us consider the case when no correlations exist between the reliabilities. In this case, the Naive Bayes model is the exact solution of the problem. We demonstrate in this chapter that when we do not know the correlations between the reliabilities even approximately, the Naive Bayes model is not exact, but it is the optimal solution in a certain sense. In more detail, we demonstrate that the Naive Bayes model gives the minimal mean error over all possible models of correlation. We assume in this proof that all correlation models have the same probability. We think that this result can explain the mysterious optimality of the Naive Bayes model described above.
The chapter is organized as follows. We give an exact mathematical formulation of the problem for two states and two reliabilities in Section 2. We define our notation in Section 3. We define the general form of the conditional probability for all possible correlations of our reliabilities in Section 4. We describe the constraints on the functions describing the correlations in Section 5. We find the formula for the distance between two probability (correlation) models in Section 6. We find constraints for our basic functions in Section 7. We solve our main problem in Section 8, where we prove the optimality of the Naive Bayes model for a uniform distribution of all possible correlations. We find the mean error between the Naive Bayes model and the true model for a uniform distribution of all possible correlations in Section 9. We consider the case of more than two states and reliabilities in Section 10. We draw conclusions in Section 11.

Definition of the task
Suppose that $A$ is a state of the "ligand-active site of protein" complex: 0 (the ligand is not bound to the active site of the protein) or 1 (the ligand is bound to the active site of the protein). Assume that the a priori probability $P(A) = P(A = 1)$ is known, and denote it by $\theta$. Let $X_1$, $X_2$ be two reliability values (defined above), with values in the set $[0, 1]$. For generality, however, we define $X_1$, $X_2$ on the set $[-\infty, +\infty]$, with the probability density of finding $X_1$, $X_2$ in $[-\infty, 0]$ or $[1, +\infty]$ equal to zero. We have the following information: $X_1 = x_1$ and $X_2 = x_2$ (obtained through measurement). Moreover, we have two functions, "classifiers," which for given $x_1$ and $x_2$ give us $\alpha = P(A/x_1)$ and $\beta = P(A/x_2)$. We want to find the probability $P(A/x_1, x_2)$ in terms of $\alpha$, $\beta$, and $\theta$. More specifically, we wish to find a function $\Gamma_{opt}(\alpha, \beta, \theta)$ which on average is the best approximation of $P(A/x_1, x_2)$, in a sense to be defined explicitly in the following consideration (see Figure 2).
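To make the task concrete, the following minimal Python sketch combines two single-factor estimates with the prior under the standard Naive Bayes assumption that $X_1$ and $X_2$ are conditionally independent given the state. It assumes that the two classifier outputs are $P(A/x_1)$ and $P(A/x_2)$; the function name `naive_bayes_combine` is hypothetical, and the formula is the textbook conditional-independence result, not a transcription of an equation from this chapter.

```python
def naive_bayes_combine(alpha: float, beta: float, theta: float) -> float:
    """Combine two single-factor posteriors under conditional independence.

    alpha: assumed P(A / x1), output of the first classifier
    beta:  assumed P(A / x2), output of the second classifier
    theta: a priori probability P(A), strictly between 0 and 1
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    # Unnormalized weight of the state A = 1 (ligand bound).
    bound = alpha * beta / theta
    # Unnormalized weight of the state A = 0 (ligand not bound).
    unbound = (1.0 - alpha) * (1.0 - beta) / (1.0 - theta)
    total = bound + unbound
    if total == 0.0:
        raise ValueError("the two classifiers contradict each other with certainty")
    return bound / total


if __name__ == "__main__":
    # Two factors that each raise the probability from 0.5 to 0.8.
    print(naive_bayes_combine(0.8, 0.8, 0.5))
```

When both classifiers simply return the prior ($\alpha = \beta = \theta$), the combined estimate is again $\theta$, as one would expect from factors that carry no information.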

Notation and preliminaries
$\rho(x_1, x_2)$ is the joint PDF (probability density function) for $X_1$ and $X_2$.
We can find
Let us say that if $X_1$ and $X_2$ are conditionally independent, i.e.,
Let us define the following monotonically nondecreasing probability distribution functions:
( are monotonically increasing. However, such a limitation will turn out to be unnecessary, as we will see in the following derivation.), the inverse functions $H^{-1}$
To be brief, let us use the following concise notation:
We now obtain
As a result, from Eqs. (2) and (3)
Note that for $J = \bar{J} = 1$ (conditional independence of $x_1$ and $x_2$), Eq. (8) becomes the exact solution for the optimal model:
As a result
Thus, we obtain the following condition:
and similarly
Similarly, we can get
All the solutions of Eqs. (11)-(15), together with Eq. (8), define the set of all possible realizations of $P(A/x_1, x_2)$.
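For orientation, here is a short worked derivation of the conditionally independent case. It rests on assumptions about the notation, which are not confirmed by the text above: $\rho$ denotes probability densities, $\bar{A}$ the complementary state, $\alpha = P(A/x_1)$, $\beta = P(A/x_2)$, $\theta = P(A)$, and $J$, $\bar{J}$ are taken to be the factors by which the joint conditional densities differ from the product of their marginals. The equation numbering of the chapter is deliberately not used.

```latex
% Assumed definition of the correlation factors J and \bar{J}:
\begin{align*}
\rho(x_1, x_2 / A) &= J \, \rho(x_1 / A) \, \rho(x_2 / A), &
\rho(x_1, x_2 / \bar{A}) &= \bar{J} \, \rho(x_1 / \bar{A}) \, \rho(x_2 / \bar{A}).
\end{align*}
% Bayes' rule for each single factor:
\begin{align*}
\rho(x_1 / A) &= \frac{\alpha \, \rho(x_1)}{\theta}, &
\rho(x_1 / \bar{A}) &= \frac{(1 - \alpha) \, \rho(x_1)}{1 - \theta}, \\
\rho(x_2 / A) &= \frac{\beta \, \rho(x_2)}{\theta}, &
\rho(x_2 / \bar{A}) &= \frac{(1 - \beta) \, \rho(x_2)}{1 - \theta}.
\end{align*}
% Bayes' rule for the pair (x_1, x_2); the common factor \rho(x_1)\rho(x_2) cancels:
\begin{align*}
P(A / x_1, x_2)
 &= \frac{\theta \, \rho(x_1, x_2 / A)}
         {\theta \, \rho(x_1, x_2 / A) + (1 - \theta) \, \rho(x_1, x_2 / \bar{A})} \\
 &= \frac{J \, \dfrac{\alpha \beta}{\theta}}
         {J \, \dfrac{\alpha \beta}{\theta}
          + \bar{J} \, \dfrac{(1 - \alpha)(1 - \beta)}{1 - \theta}}.
\end{align*}
% Conditional independence, J = \bar{J} = 1, gives the Naive Bayes form:
\begin{equation*}
P(A / x_1, x_2) =
 \frac{\dfrac{\alpha \beta}{\theta}}
      {\dfrac{\alpha \beta}{\theta} + \dfrac{(1 - \alpha)(1 - \beta)}{1 - \theta}}.
\end{equation*}
```

Under these assumptions, all dependence on the unknown correlations enters only through the ratio of $J$ to $\bar{J}$, which is why the case $J = \bar{J} = 1$ needs no information beyond $\alpha$, $\beta$, and $\theta$.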

Constraints for basic functions
In what follows, we consider all functions as functions of the arguments $1 \geq F_1, F_2 \geq 0$ rather than of $x_1$, $x_2$. We have six functions of $F_1$, $F_2$ that define Eq. (16): $J$, $\bar{J}$, $H_1$, $H_2$, $\alpha$, $\beta$. Let us write these functions in terms of $F_1$, $F_2$ and find the restrictions on them:
In the same way
We know that the functions $H_1$, $F_1$ and $H_2$, $F_2$ are cumulative distribution functions of $x_1$ and $x_2$, respectively. By the definition of a cumulative distribution function, these functions are monotonically nondecreasing and change from 0 to 1. Therefore, we can conclude that the following constraints hold for the functions $H_1$, $H_2$ as functions of $F_1$, $F_2$:
In the same way

Optimization
We shall find the best approximation $\Gamma(\alpha, \beta, \theta)$ as follows:
where the expected value (also called the expectation, the mean, or the first moment) $E[\ldots]$ is taken with respect to the joint PDF of possible realizations of $J$, $\bar{J}$, $\alpha$, $\beta$, $H_1$, $H_2$ for given $F_1$ and $F_2$.
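The idea of choosing the estimate that is best on average can be illustrated numerically. The sketch below assumes, purely for illustration, that the distance between the estimate and a realization is the squared difference; the chapter's own distance is defined in its Section 6 and may differ. The uniformly sampled values are a hypothetical stand-in for the unknown realizations of $P(A/x_1, x_2)$, and all function names are invented for this example.

```python
import random


def mean_squared_distance(gamma, samples):
    """Average squared deviation between a fixed estimate and sampled values."""
    return sum((p - gamma) ** 2 for p in samples) / len(samples)


def best_estimate(samples):
    """For squared distance, the minimizing constant is the sample mean."""
    return sum(samples) / len(samples)


def demo(seed=0, n=1000):
    rng = random.Random(seed)
    # Hypothetical stand-in for equally probable realizations of P(A/x1, x2).
    samples = [rng.random() for _ in range(n)]
    gamma_opt = best_estimate(samples)
    # Brute-force search over a grid of candidate estimates for comparison.
    grid = [k / 100 for k in range(101)]
    best_on_grid = min(grid, key=lambda g: mean_squared_distance(g, samples))
    return gamma_opt, best_on_grid, samples


if __name__ == "__main__":
    gamma_opt, best_on_grid, _ = demo()
    print(gamma_opt, best_on_grid)
```

The brute-force grid search lands on the grid point nearest to the sample mean, which is the behavior one expects when the averaged squared distance is minimized separately for each fixed set of inputs.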
For the sake of brevity, we denote
Thus
It remains to calculate the expected value in Eq. (19).
We have, by obvious assumptions, $\rho_{J, \bar{J}, \alpha, \beta, H_1, H_2/F_1, F_2}(J, \bar{J}, \alpha, \beta, \ldots)$
Lemma 1
Proof: Consider the function $\rho_{J(a,b)/a,b}$. The domain of the function $J(a, b)$ is the square $0 \leq a, b \leq 1$. By dividing this square into small squares $(i, j)$, we obtain a sampling of the function $J$. Then, on every square $(i, j)$, we can define the value $J_{ij}$ of the function. We can write the following constraints for the function $J$:
All matrices $J_{ij}$ that satisfy the above conditions have the same probability. So we can define the probability density function $\rho(J_{11}, \ldots, J_{ij}, \ldots, J_{NN})$. This density function should be symmetric under transpositions of columns and rows of the matrix $J_{ij}$, because it assigns the same probability to all matrices $J_{ij}$ that satisfy the above conditions. Indeed, these conditions are also symmetric under transpositions of columns and rows of the matrix $J_{ij}$. From the symmetry of the conditions that define the function $\rho(\ldots)$, it is possible to conclude that $\rho(\ldots)$ is also invariant under these transpositions.
We can consider the function $\rho_{u/ij}(u/ij)$, which is a discretization of the function $\rho_{J(a,b)/a,b}(J(a, b)/a, b)$:
We can transpose the columns and rows of $J_{ij}$ in such a way that the element $J_{ij}$ is replaced by another element $J_{nk}$, and the function $\rho(J_{11}, \ldots)$ is not changed by this. So from the above equation, we can get
From this equation we can conclude that $\rho_{u/ij}(u/ij)$ does not depend on $i$, $j$, so $\rho_{J/a,b}(J/a, b)$ does not depend on $a$, $b$, and
From
we can conclude that
So we obtain that $Const = 1$ in Eq. (21).
http://dx.doi.org/10.5772/intechopen.85976
Lemma 2: Probability distribution functions $\alpha$ and $\beta$ do not depend on $F_1$ and $F_2$:
Proof: Let us sample the function $\alpha(F_1)$ by dividing the domain of this function
All columns $(\alpha_k)$ that satisfy these conditions have equal probability. We can consider the respective function $\rho(\alpha_1, \ldots, \alpha_k, \ldots, \alpha_l, \ldots, \alpha_N)$. From the symmetry of the conditions that define this function under the transpositions $\alpha_k \to \alpha_l$, the function $\rho(\alpha_1, \ldots, \alpha_k, \ldots, \alpha_l, \ldots, \alpha_N)$ is also invariant under such transpositions. As a result, it is possible to write
From this equation, we can conclude that the function $\rho_{\alpha/F_1}(\alpha/F_1)$ does not depend on $F_1$:
It remains to find
Since if the expression in square brackets is minimized at each point, then the whole integral in Eq. (22) is minimized. Thus, we may proceed as follows:
Hence the optimum $\Gamma(\alpha, \beta, \theta)$ is given by

Mean distance between the proposed approximation of $P(A/x_1, x_2)$, $\Gamma(\alpha, \beta, \theta)$, and the actual function $P(A/x_1, x_2)$
The mean distance from (18) is
where $Const$ in this equation is defined by
From this equation we can find the bounds of $Const$. From $0 \leq P(A/x_1, x_2) \leq 1$ we can conclude
The second condition is
So from these two equations, we can conclude
In the next step, we would like to find the function $\rho_\alpha(\alpha)$ ($\rho_\beta(\beta)$) in the equation for DIS.
Restrictions on the function $\alpha(F_1)$, $0 \leq F_1 \leq 1$, are the following:
Let us define a function $U(\alpha_{set})$ in the following way:
Then the function that satisfies the equal probability distribution, taking into account restrictions (i) and (ii), is the following:
where $\delta$ is the Dirac delta function.
We can define the constant $C$ by
$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \rho_{\alpha_{set}}(\alpha_{set}) \, d\alpha_1 \cdots d\alpha_N = 1.$$
It can be proved for $N \to \infty$ that distribution (23) is equal to the following distribution (from "statistical mechanics" [9]; the transition from the microcanonical to the canonical distribution):
Here we can find $Z$ and $K$ from the following equations:
The sought function $\rho_\alpha(\alpha)$ can be found by
where $D^N = Z$. From Eqs. (24) and (25), we can find
where $\Lambda(K)$ is the decreasing function
If $K$ is the root of Eq. (29), we can write from Eqs. (26)-(29) for the function $\rho_\alpha(\alpha)$
$\rho_\alpha(\alpha) =$
For
where

The case of more than two states A and reliabilities X
Let $A$ be a state with values in the set $\{0, 1, \ldots, L\}$. This number can characterize the strength of a bond. Assume that the a priori probability $P(A = i)$ is known, and denote it by $\theta_i$; here $i = 1, \ldots, L$. Let $X_1, \ldots, X_K$ be random variables with values in some set, say $]-\infty; +\infty[$. We have the following information: $X_1 = x_1, \ldots, X_K = x_K$ (obtained through measurement). Furthermore, we have systems, "classifiers," which for given $x_1, \ldots, x_K$ produce the values $\alpha_{ij}$. We want to find the probability $P(A = M/x_1, \ldots, x_K)$ in terms of $\alpha_{ij}$ and $\theta_i$. In more detail, we want to find a function $\Gamma_{opt,M}(\alpha_{ij}, \theta_i)$ which is, on average, the best approximation of $P(A = M/x_1, \ldots, x_K)$. In the same way as in the case of two variables, it is possible to find that $\Gamma_{opt,M}(\alpha_{ij}, \theta_i)$ can be defined by the following equation:
We have evident constraints on $\alpha_{ij}$, $\theta_i$:
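The multi-state combination discussed in this section can be sketched as follows. The sketch assumes the index convention $\alpha_{ij} = P(A = i/x_j)$, which is not confirmed by the text above, and uses the textbook Naive Bayes formula for conditionally independent factors, $\Gamma_M \propto \theta_M \prod_j (\alpha_{Mj}/\theta_M)$, normalized over all states. The function name `naive_bayes_combine_multi` is hypothetical, and the priors are passed for every state, including state 0.

```python
from math import prod
from typing import List, Sequence


def naive_bayes_combine_multi(
    alpha: Sequence[Sequence[float]], theta: Sequence[float]
) -> List[float]:
    """Combine K single-factor posteriors over several states.

    alpha[i][j]: assumed P(A = i / x_j) for state i and factor j
    theta[i]:    a priori probability P(A = i), one entry per state
    Returns the combined probability of every state.
    """
    if len(alpha) != len(theta):
        raise ValueError("alpha needs one row per state")
    if any(t <= 0.0 for t in theta):
        raise ValueError("all priors must be positive")
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    n_factors = len(alpha[0])
    weights = []
    for row, t in zip(alpha, theta):
        if len(row) != n_factors:
            raise ValueError("every state needs the same number of factors")
        # Unnormalized weight: theta_i * prod_j (alpha_ij / theta_i).
        weights.append(t * prod(a / t for a in row))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("the classifiers exclude every state")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three states, two factors; both factors favor the last state.
    print(naive_bayes_combine_multi(
        [[0.1, 0.2], [0.2, 0.2], [0.7, 0.6]], [0.2, 0.3, 0.5]))
```

For two states this reduces to the two-state Naive Bayes combination, which gives a quick consistency check on the index convention assumed here.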

Conclusions
Using QSAR as an illustration, we demonstrated that the Naive Bayes model gives the minimal mean error over a uniform distribution of all possible correlations between the characteristic reliabilities. This result can explain the mysterious optimality of the Naive Bayes model described above. We also found the mean error that the Naive Bayes model gives for a uniform distribution of all possible correlations of the reliabilities.
Prediction in medicinal chemistry (quantitative structure-activity relationships, QSAR) increasingly relies on Bayesian network-based methods. The importance of these methods derives partly from the difficulty and inaccuracy of present quantum chemical models (e.g., in SYBYL and other software) and from the impracticality of sufficiently characterizing the structure of drug molecules and receptor active sites, including vicinal waters in and around hydrophobic pockets in active sites. This is particularly so for biologicals (protein and nucleic acid active pharmaceutical ingredients, APIs) and target applications that exhibit extensive interreceptor trafficking, genomic polymorphisms, and other systems biology phenomena. The effectiveness and accuracy of Bayesian methods for drug development likewise depend on certain prerequisites, such as an adequate distance metric by which to measure the similarity or difference between combinatorial library molecules and known successful ligand molecules targeting a particular receptor and addressing a particular clinical indication. In this connection, the distance metric proposed in Section 6 of this chapter and the associated lemmas and proofs are of substantial value for the future of high-throughput screening (HTS) and medicinal chemistry.
However, our purpose here was not to demonstrate the effectiveness of these definitions or of QSAR. The interested reader can learn about them from papers [3,4] and the references therein. As we said above, we use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier.