Open access peer-reviewed chapter

Quantitative Structure-Activity Relationship Modeling and Bayesian Networks: Optimality of Naive Bayes Model

By Oleg Kupervasser

Submitted: June 4th 2018. Reviewed: March 20th 2019. Published: May 29th 2019.

DOI: 10.5772/intechopen.85976



Previously, computational drug design was usually based on simplified laws of molecular physics, used to calculate a ligand's interaction with the active site of a protein (enzyme). Currently, however, this interaction is widely estimated from statistical properties of known ligand-protein complexes. Such statistical properties are described by quantitative structure-activity relationships (QSAR). Bayesian networks can help us evaluate the stability of a ligand-protein complex using the found statistics. Moreover, we can prove the optimality of the Naive Bayes model, which makes these evaluations simple and easy to realize in practice. We prove here the optimality of the Naive Bayes model, using ligand-protein interaction as an illustration.


  • quantitative structure-activity relationship
  • Naive Bayes model
  • optimality
  • Bayes classifier
  • Bayesian networks
  • protein-ligand complex
  • computational drug design
  • molecular recognition and binding
  • ligand-active site of protein
  • likelihood
  • probability

1. Introduction

The derivation in this chapter is based on the paper [1]. Bayes classifiers are widely used today for recognition, identification, and knowledge discovery. Fields of application include image processing, personalized medicine [2], and chemistry (QSAR, quantitative structure-activity relationship [3, 4]; see Figure 1). Bayes classifiers are especially important in medical diagnostics and bioinformatics; cogent illustrations of this can be found in the work of Raymer and colleagues [5].

Figure 1.

Quantitative structure-activity relationship.

Let us give an example of the use of QSAR from papers [3, 4]:

“Molecular recognition and binding performed by proteins are the background of all biochemical processes in a living cell. In particular, the usual mechanism of drug function is effective binding and inhibition of activity of a target protein. Direct modeling of molecular interactions in protein-inhibitor complexes is the basis of modern computational drug design but is an extremely complicated problem. In the current paradigm, site similarity is recognized by the existence of chemically and spatially analogous regions from binding sites. We present a novel notion of binding site local similarity based on the analysis of complete protein environments of ligand fragments. Comparison of a query protein binding site (target) against the 3D structure of another protein (analog) in complex with a ligand enables ligand fragments from the analog complex to be transferred to positions in the target site, so that the complete protein environments of the fragment and its image are similar. The revealed environments are similarity regions and the fragments transferred to the target site are considered as binding patterns. The set of such binding patterns derived from a database of analog complexes forms a cloudlike structure (fragment cloud), which is a powerful tool for computational drug design.”

However, these Bayes classifiers have a remarkable property: surprisingly, the Naive Bayes classifier usually gives a good description of the recognition, and more complex Bayes classifier models cannot improve it significantly [1]. In paper [6] the authors explain this exceptional property. However, they use some assumptions (zero-one loss) that reduce the universality and generality of their proof. In this chapter we give a general proof of Naive Bayes classifier optimality. The derivation in the current chapter is similar to [1]. Subsequent attention was given to the Naive Bayes optimality problem in [7, 8]; surprisingly, however, those papers do not include any analysis of the earlier one [1].

We would like to prove Naive Bayes classifier optimality using QSAR terminology. Indeed, we use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier.

Let us define the basic problem that we attempt to solve in this chapter. Suppose that we have a set of states for a complex of a ligand and the active site of a protein, and a set of factors that characterize these states. For each state, we know the probability distribution of each factor. However, we have no information, even approximate, about the correlations between the factors. Now suppose that we measure the factor values for some sample of the state. What is the probability that this sample corresponds to a given state? This is a typical recognition problem under incomplete information.

In the simplest case, we can define two states for the “ligand-active site of protein” complex: 0 (the ligand is not bound to the active site of the protein) or 1 (the ligand is bound to the active site of the protein).

The next step is the definition of factors (called reliabilities below) that characterize the strength of the bond in the “ligand-active site of protein” complex. Let us give an illustration of such factors from the QSAR experience of papers [3, 4]:

“First, consider the protein 5 Å-environment A = {a1, a2,…aN} of one ligand atom X in the analog protein, that is, all atoms from the binding site that are in the 5 Å-neighborhood of X. Suppose that the complete target binding site T consists of N′ atoms: T = {t1, t2,…tN′} and there exists a subset T0 ⊂ T of size n (N′ ≥ n ≥ 4) such that n atoms from T0 are similar to n atoms A0 = {ai1, ai2,…ain} ⊂ A in their chemical types and spatial arrangement. The search for A0 and T0 is performed using a standard clique detection technique in the graph whose nodes represent pairs (ai, ti) of chemically equivalent atoms and edges reflect similarity of corresponding pairwise distances. If the search is successful, the optimal rigid motion superimposing matched protein atoms is applied both to the initial ligand atom X and its complete environment A (Figure 2(a) in [3]). The atoms are thus transferred to the target binding site. Then we extend the matching between A0 and T0 by such atom pairs (ai, ti) that ai and ti have the same chemical atom type in the coarser 10-type typification mentioned above, and the distance between ti and the image a′i of atom ai is below a threshold. Next, a reliability value R, with 0 ≤ R ≤ 1, is assigned to the image X′ of X in the target site and reflects the similarity between the environments of X and its image X′. If the environments are highly similar (R ≈ 1) we expect that the position of X′ is the place where an atom with chemical type identical to X can be bound by the target, since the environment of X′ contains only atoms required for binding with no “alien” atoms. However, as illustrated in Figure 2(a) in [3], the analog site may contain extra binding atoms (shown on the lower side) that decrease the reliability value. In a simple form, the reliability R can be defined as the sum of the number of matched atoms divided by the total number of analog and target atoms in the 5 Å-environments of X and X′, respectively (Figure 2(b) in [3]):

Figure 2.


R = 2n/(N + N′), using the notation presented above. In fact, we use a somewhat more complicated definition that accounts for the quality of spatial superposition of matched atoms and their distance from X′.”
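The simple form of R above lends itself to a direct sketch in code. This is an illustrative sketch only (the coordinates, the 5 Å cutoff, and the function names are our assumptions), not the implementation of [3, 4]:

```python
import numpy as np

def environment(atoms, center, cutoff=5.0):
    """Indices of atoms within `cutoff` angstroms of `center`
    (the 5 A-environment of a ligand atom)."""
    d = np.linalg.norm(atoms - center, axis=1)
    return np.nonzero(d <= cutoff)[0]

def reliability(n_matched, n_analog_env, n_target_env):
    """Simple reliability R = 2n / (N + N'), with 0 <= R <= 1."""
    return 2.0 * n_matched / (n_analog_env + n_target_env)
```

For example, n = 6 matched atoms out of N = 8 analog-environment atoms and N′ = 8 target-environment atoms give R = 0.75.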

We do not want to discuss these definitions of the factors and states here. Our purpose is not to demonstrate the effectiveness of these definitions or the effectiveness of QSAR; the interested reader can learn about it from papers [3, 4] and the references therein. As we said above, we use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier.

Let us consider the case when no correlations exist between the reliabilities. In this case, the Naive Bayes model is the correct solution of the problem. We show in this chapter that when the correlations between the reliabilities are unknown, even approximately, the Naive Bayes model is not the correct but the optimal solution in a certain sense. More precisely, we prove that the Naive Bayes model gives the minimal mean error over all possible correlation models, assuming that all correlation models are equally probable. We think that this result can explain the mysterious optimality of the Naive Bayes model described above.

The chapter is organized as follows. We give an exact mathematical formulation of the problem for two states and two reliabilities in Section 2. We introduce our notation in Section 3. We derive the general form of the conditional probability for all possible correlations of our reliabilities in Section 4. We characterize the limitations on the functions describing the correlations in Section 5. We define the distance between two probability (correlation) models in Section 6. We find constraints on our basic functions in Section 7. We solve our main problem, proving the Naive Bayes model's optimality for a uniform distribution over all possible correlations, in Section 8. We find the mean error between the Naive Bayes model and the true model for a uniform distribution over all possible correlations in Section 9. We consider the case of more than two states and reliabilities in Section 10. We draw conclusions in Section 11.


2. Definition of the task

Suppose that A is the state of the “ligand-active site of protein” complex: 0 (the ligand is not bound to the active site of the protein) or 1 (the ligand is bound to the active site of the protein). Assume that the a priori probability P(A) = P(A = 1) is known, and denote it by θ. Let X1, X2 be two reliability values (defined above), with values in the set [0, 1]. For generality, however, we will define X1, X2 on (−∞, +∞), with the probability density of finding X1, X2 in (−∞, 0] or [1, +∞) equal to zero. We have the following data: X1 = x1 and X2 = x2 (obtained through measurement). Moreover, we have two functions, “classifiers,” which for given x1 and x2 give us


We want to find the probability

P(A = 1 | X1 = x1, X2 = x2)

in terms of α, β, and θ. More particularly, we wish to find a function Γopt(α, β, θ) which on average is the best estimate of P(A | x1, x2), in a sense to be defined explicitly in the following consideration (see Figure 2).


3. Notation and preliminaries

ρ_{X1,X2}(x1, x2)—the joint PDF (probability density function) of X1 and X2.

ρ_{X1,X2|A}(x1, x2) = h(x1, x2)—the joint PDF of X1 and X2 given A = 1. In terms of h(x1, x2) and θ, we can find P(A | x1, x2) as follows:

P(A | x1, x2) = θ h(x1, x2) / (θ h(x1, x2) + (1 − θ) h̄(x1, x2)).

h̄(x1, x2) ≡ ρ_{X1,X2|Ā}(x1, x2)—the joint PDF of X1 and X2 given A = 0.

We can find


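The posterior follows from Bayes' rule: P(A | x1, x2) = θ h(x1, x2) / (θ h(x1, x2) + (1 − θ) h̄(x1, x2)). A minimal numeric sketch; the two densities below are illustrative assumptions (normalized on [0, 1]²), not taken from the chapter:

```python
def posterior(theta, h, h_bar, x1, x2):
    """P(A=1 | x1, x2) by Bayes' rule from the prior theta and the
    class-conditional joint densities h (A=1) and h_bar (A=0)."""
    num = theta * h(x1, x2)
    return num / (num + (1.0 - theta) * h_bar(x1, x2))

# Assumed example densities: bound complexes (A=1) favor high reliabilities,
# unbound ones (A=0) favor low reliabilities.
h = lambda x1, x2: 4.0 * x1 * x2                      # density given A=1
h_bar = lambda x1, x2: 4.0 * (1.0 - x1) * (1.0 - x2)  # density given A=0
```

With θ = 0.5 and measurements x1 = 0.9, x2 = 0.8, the posterior is about 0.97: two high reliabilities make binding very likely.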
4. Generic form of P(A | x1, x2)

Let us define the functions g(x1, x2) and ḡ(x1, x2):


Note that if X1 and X2 are conditionally independent, i.e.,




Let us define the following monotonously nondecreasing probability distribution functions:


Note that since H1(x1), H2(x2), H̄1(x1), and H̄2(x2) are monotonous (at this point we can assume that h1(x1), h2(x2), h̄1(x1), h̄2(x2) > 0, so that H1(x1), H2(x2), H̄1(x1), and H̄2(x2) are monotonously increasing; however, this limitation will prove unnecessary, as we will see in the following derivation), the inverse functions H1^(−1), H2^(−1), H̄1^(−1), and H̄2^(−1) must exist. As a result, we can define


For brevity, let us use the following concise notation:


By the definition


We now obtain


As a result from Eqs. (2) and (3)


Now from Eq. (1)


Note that for J = J̄ = 1 (conditional independence of x1 and x2), Eq. (8) becomes the exact solution for the optimal model:


5. Limitations for the functions J(a, b) and J̄(a, b)

We can write


As a result


Thus, we obtain the following condition:


and similarly


Similarly, we can get



J(a, b), J̄(a, b) ≥ 0. (14)

All solutions of Eqs. (11)–(15), together with Eq. (8), define the set of all possible realizations of P(A | x1, x2).

Let us give an example of a solution of Eqs. (11), (12), (14), and (15):

Let ρ(x) be a function such that ρ(x) ≥ 0 and ∫_0^1 ρ(x) dx = 1. Then


satisfy Eqs. (11), (12), (14), and (15).


6. Definition of distance

We define the distance between the proposed approximation Γ(α, β, θ) of P(A | x1, x2) and the actual function P(A | x1, x2) as follows:


Now, from Eqs. (2), (3), and (4)–(7), we have



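Such a distance can be checked numerically. The sketch below assumes that the distance is the expected squared deviation of Γ from the true posterior under the joint density of (X1, X2); the sampler and the test functions are illustrative assumptions:

```python
import random

def distance(gamma, p_true, sampler, n=100_000, seed=0):
    """Monte Carlo estimate of E[(gamma(x1,x2) - p_true(x1,x2))^2],
    with (x1, x2) drawn from their joint density by `sampler`."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x1, x2 = sampler(rng)
        acc += (gamma(x1, x2) - p_true(x1, x2)) ** 2
    return acc / n

# Example sampler: a uniform joint density on [0, 1]^2 (assumed).
uniform = lambda rng: (rng.random(), rng.random())
```

For two constant models differing by 0.1 the estimate is 0.01, independent of the sampler.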

7. Constraints for basic functions

We will further consider all functions as functions of the arguments F1, F2, with 1 ≥ F1, F2 ≥ 0, rather than of x1, x2. We have six functions of F1, F2 that define Eq. (16): J, J̄, H1, H2, α, β. Let us write the functions in terms of F1, F2 and find restrictions on these functions:


In the same way,


We know that the functions H1, F1 and H2, F2 are cumulative distribution functions of x1 and x2, correspondingly. These functions are monotonously nondecreasing and change from 0 to 1, by the definition of a cumulative distribution function. Therefore, we can conclude that the following restraints on H1, H2 as functions of F1, F2 exist:


In the same way,


8. Optimization

We shall find the best approximation Γ(α, β, θ) as follows:


where the expected value (also called the expectation, the mean, or the first moment) E is taken with respect to the joint PDF of possible realizations of J, J̄, α, β, H1, H2 for given F1 and F2.

For the sake of brevity, we denote


Then from Eqs. (17) and (16)




It remains to calculate the expected value in Eq. (19).

We have by obvious assumptions


Lemma 1: E[J(a, b)] = E[J̄(a, b)] = 1.


Proof: Consider the function ρ(J(a, b) | a, b). The domain of the function J(a, b) is the square 0 ≤ a, b ≤ 1. By dividing this square into small squares (i, j), we obtain a sampling of the function J. Then, on every square (i, j), we can define the value Jij of the function. We can write the following restraints for the function J:


Here i = 1, …, N; j = 1, …, N.

All matrices Jij that satisfy the above conditions have the same probability. So we can define the probability density function


This density function must be symmetric under transpositions of columns and rows of the matrix Jij, because it assigns the same probability to all matrices Jij that satisfy the above conditions, and these conditions are themselves symmetric under such transpositions. Hence the function ρ is invariant under transpositions of columns and rows of Jij.

We can consider the function ρ(u | i, j), a discretization of the function ρ(J(a, b) | a, b):


We can transpose columns and rows of Jij in such a way that the element Jij is replaced by another element Jnk, while the function ρ remains unchanged. So from the above equation, we get


From this equation we can conclude that ρ(u | i, j) does not depend on (i, j), so ρ(J | a, b) does not depend on (a, b), and






we can conclude that


So we obtain Const = 1 in Eq. (21).

Lemma 2: The probability distribution functions of α and β do not depend on F1 and F2:


Proof: Let us sample the function α(F1) by dividing its domain F1 ∈ [0, 1] into intervals of length 1/N, N ≫ 1. Then the restriction conditions for αk, k = 1, …, N, are


All columns αk that satisfy these conditions have equal probability. We can consider the corresponding function ρ(α1, …, αk, …, αl, …, αN). From the symmetry, under transpositions αk ↔ αl, of the conditions defining this function, the function ρ(α1, …, αk, …, αl, …, αN) is also invariant under such transpositions. As a result, we can write


From this equation, we can conclude that the function ρ(α | F1) does not depend on F1:


From (20) we obtain


Let us define


By Lemma 1, E[J] = E[J̄] = 1. Hence


It remains to find




if the expression in square brackets is minimized at each point, then the whole integral in Eq. (22) is minimized. Thus, we may proceed as follows:


Hence the optimal Γ(α, β, θ) is given by

Γopt(α, β, θ) = αβ(1 − θ) / (αβ(1 − θ) + (1 − α)(1 − β)θ).
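In code, the optimal model is the standard Naive Bayes combination of the two classifier outputs α = P(A=1 | x1) and β = P(A=1 | x2) with the prior θ; a minimal sketch:

```python
def gamma_opt(alpha, beta, theta):
    """Naive Bayes combination of two classifier outputs with prior theta:
    alpha*beta*(1-theta) / (alpha*beta*(1-theta) + (1-alpha)*(1-beta)*theta)."""
    num = alpha * beta * (1.0 - theta)
    return num / (num + (1.0 - alpha) * (1.0 - beta) * theta)
```

A useful sanity check: an uninformative classifier (α = θ) leaves the other output unchanged, gamma_opt(θ, β, θ) = β, and two uninformative classifiers return the prior.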

9. Mean distance between the proposed approximation Γ(α, β, θ) of P(A | x1, x2) and the actual function P(A | x1, x2)

The mean distance, from Eq. (18), is


where Const in this equation is defined by


From this equation we can find the boundaries of Const. From 0 ≤ P(A | x1, x2) ≤ 1 we can conclude


The second condition is


So from these two equations, we can conclude


In the next step, we would like to find the function ρ(α) (and similarly ρ(β)) in the equation for DIS.

The restrictions on the function α(F1), 0 ≤ F1 ≤ 1, are the following:


In discrete form (for N → ∞), we can rewrite α as the set α_set = (α1, α2, …, αN)


Let us define a function U(α_set) in the following way:

U(α_set) = Σ_{i=1}^{N} αi for 0 ≤ αi ≤ 1, i = 1, 2, …, N; +∞ otherwise,
Ui(αi) = αi for 0 ≤ αi ≤ 1; +∞ otherwise.

Then the function that realizes the equal-probability distribution subject to restrictions (i) and (ii) is the following:


Here δ is the Dirac delta function.

We can define the constant Cby


It can be proved (for N → ∞) that distribution (23) is equal to the following distribution (from statistical mechanics [9]; the transformation from the microcanonical to the canonical distribution):


Here we can find Z and K from the following equations:


The sought function ρ(α) can be found by




From Eqs. (24) and (25), we can find


where Λ(K) is a decreasing function


If K is the root of Eq. (29), then from Eqs. (26)–(29) we can write, for the function ρ(α),

ρ(α) =
  1 for 0 ≤ α ≤ 1 (0 otherwise), for K = 0;
  2δ(α) for 0 ≤ α ≤ 1 (0 otherwise), for K = +∞;
  2δ(α − 1) for 0 ≤ α ≤ 1 (0 otherwise), for K = −∞;
  (1/D) e^(−Kα) for 0 ≤ α ≤ 1 (0 otherwise), for all other K,

where 2∫_0^1 δ(α − 1) dα = 2∫_0^1 δ(α) dα = 1 and 1/D = K/(1 − e^(−K)).
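For finite nonzero K, the canonical density ρ(α) = e^(−Kα)/D on [0, 1], with 1/D = K/(1 − e^(−K)), can be sampled by inversion of its CDF; a sketch (the function name is ours):

```python
import math, random

def sample_rho_alpha(K, n, seed=0):
    """Draw n samples from rho(alpha) = exp(-K*alpha)/D on [0, 1]
    by inverse-CDF sampling; K = 0 reduces to the uniform density."""
    rng = random.Random(seed)
    if K == 0:
        return [rng.random() for _ in range(n)]
    c = 1.0 - math.exp(-K)  # the CDF is (1 - exp(-K*a)) / c
    return [-math.log(1.0 - rng.random() * c) / K for _ in range(n)]
```

The sample mean converges to the mean of this truncated exponential, 1/K − e^(−K)/(1 − e^(−K)).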


10. The case of more than two states A and reliabilities X

Let A be a state with values in the set {0, 1, …, L}. This number can characterize the strength of a bond. Assume that the a priori probability P(A = i) is known, and denote it by θi; here i = 1, …, L. Let X1, …, XK be random variables with values in some set, say (−∞, +∞). We have the following information: X1 = x1, …, XK = xK (obtained through measurement). Furthermore, we have systems, “classifiers,” which for given x1, …, xK produce

αij = P(A = i | Xj = xj), i = 1, …, L; j = 1, …, K.


We want to find the probability P(A = i | X1 = x1, …, XK = xK) in terms of αij and θi. In more detail, we want to find a function Γopt,M(αij, θi) which is the best approximation of P(A = M | x1, …, xK) on average. In the same way as in the case of two variables, it can be shown that Γopt,M(αij, θi) is defined by the following equation:


We have the evident restraints for αij, θi:


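By analogy with the two-variable case, the combined posterior is proportional to θ_M^(1−K) ∏_j α_Mj. A sketch under that assumption (the generalization is ours; alpha[i][j] stands for P(A = i | Xj = xj)):

```python
import math

def gamma_opt_multi(alpha, theta):
    """Naive Bayes combination for several states and K classifiers.
    alpha[i][j] = P(A=i | X_j = x_j); theta[i] = prior P(A=i).
    Returns the combined posterior for every state."""
    K = len(alpha[0])
    weights = [theta[i] ** (1 - K) * math.prod(alpha[i])
               for i in range(len(theta))]
    total = sum(weights)
    return [w / total for w in weights]
```

For two states and two classifiers this reduces to the two-variable result proved above.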
11. Conclusions

Using QSAR as an illustration, we demonstrated that the Naive Bayes model gives minimal mean error over a uniform distribution of all possible correlations between the characteristic reliabilities. This result can explain the mysterious optimality of the Naive Bayes model described above. We also found the mean error that the Naive Bayes model gives over a uniform distribution of all possible correlations of the reliabilities.

Medicinal chemistry (quantitative structure-activity relationship, QSAR) prediction increasingly relies on Bayesian network-based methods. Its importance derives partly from the difficulty and inaccuracy of present quantum chemical models (e.g., in SYBYL and other software) and from the impracticality of sufficiently characterizing the structure of drug molecules and receptor active sites, including vicinal waters in and around hydrophobic pockets in active sites. This is particularly so for biologicals (protein and nucleic acid active pharmaceutical ingredients (APIs)) and target applications that exhibit extensive inter-receptor trafficking, genomic polymorphisms, and other systems biology phenomena. The effectiveness and accuracy of Bayesian methods for drug development likewise depend on certain prerequisites, such as an adequate distance metric by which to measure the similarity or difference between combinatorial library molecules and known successful ligand molecules targeting a particular receptor and addressing a particular clinical indication. In this connection, the distance metric proposed in Section 6 of this chapter and the associated lemmas and proofs are of substantial value for the future of high-throughput screening (HTS) and medicinal chemistry.

However, our purpose here was not to demonstrate the effectiveness of these definitions or the effectiveness of QSAR. The interested reader can learn about it from papers [3, 4] and the references therein. As we said above, we use QSAR only for clarity; the proof is valid for any field of application of the Naive Bayes classifier.

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Oleg Kupervasser (May 29th 2019). Quantitative Structure-Activity Relationship Modeling and Bayesian Networks: Optimality of Naive Bayes Model. In: Douglas McNair (Ed.), Bayesian Networks - Advances and Novel Applications. IntechOpen. DOI: 10.5772/intechopen.85976.
