Summary of QSAR Models for Predicting LogPS (Lanevskij et al., 2008)
In this chapter, we briefly summarize the concept of analytical methodologies used for detecting, measuring, and/or monitoring in public health research. Additionally, this chapter describes in silico ADME, ADMET and ADME/Tox approaches or models relevant to analytical methodologies for public health researchers or practitioners.
Technologies recently adopted to meet this type of scientific demand in drug development and testing include the in silico techniques used by pharmaceutical companies in the drug discovery process (Ndibewu et al., 2012; Vallero, 2012; Lipinski et al., 1997; Leeson & Springthorpe, 2007; Lipinski, 2004). A fast-growing example is cheminformatics (also known as chemoinformatics or chemical informatics), which is the use of computer and informational techniques applied to a range of problems in the field of chemistry (Langdon et al., 2011; Brown, 2011). These methods can also be used in the chemical and allied industries in various other forms (Pradeep, 2009). Combined with accurate data obtained with validated analytical methods, cheminformatics brings together the information resources needed to transform data into information and information into knowledge, for the intended purpose of making better decisions faster in drug lead identification and optimization. The outcome is greater efficiency in public health research, and the beneficiary is mankind. In the section that follows, we briefly summarize in silico pharmacokinetics, widely designated as ADME approaches or models, as relevant to analytical methodologies for public health researchers or practitioners (Yamashita & Hashida, 2004). Pharmacokinetics is the study of the time course of a drug within the body and incorporates the processes of absorption, distribution, metabolism and excretion (ADME) (Van de Waterbeemd & Gifford, 2003). The extent of distribution depends on the structural and physicochemical properties of the compound (Fig. 1).
Figure 2 shows a simple ADME decision-making flow sheet that incorporates predictors for volume of distribution, oral bioavailability and half-life (t1/2), together with a distribution/protein binding module that includes percentage plasma protein binding values (%PPB) and the drug affinity constant to human serum albumin, represented by log KaHSA. Building on larger and higher-quality data sets, which are crucial to the success of in silico approaches, has allowed existing models of this kind to be refined continuously. Van de Waterbeemd and Gifford (2003) grouped the problem areas for which predictive models could be helpful (Fig. 2). In the past few years, the range of models has expanded further to include, for example, models for various transporters, metabolism by non-P450 enzymes, and plasma protein binding (Lu et al., 2003; Ekins et al., 2001).
Figures 2a and 2b outline the parameters in the prediction of a safe drug given in acceptable dose, which it is ultimately hoped will be reliably obtainable from molecular structure and appropriate descriptors using a suite of predictive models. This expression had earlier been made clear by Japertas and coworkers (2011).
In figure 2c (Waterbeemd & Gifford, 2003), panel a depicts the classical project-collaboration approach between chemistry, biology and drug metabolism (ADME) groups in the 1990s, whereas panel b shows the much more automated world at the start of this millennium, in which combinatorial chemistry (CombiChem), high-throughput screening and ADME studies are linked together in a streamlined fashion.
It should be noted that all three activities shown in figure 3 can be carried out by separate companies or research units, or even by researchers in health departments or with an interest in health. A good example is the Medical Research Council (MRC) of South Africa, which supports unit health research projects directed towards nationally planned and prioritized health sectors. Furthermore, the wide introduction of in silico and high-speed in vivo methods could redefine the traditional meaning of ADME as Automated Decision-Making Engine. Considerable effort has also been devoted to the development of in silico models for the prediction of oral absorption (Veber et al., 2002; Agoram et al., 2001; Parrott & Lavé, 2002; Yu et al., 2000). These range from the simplest models based on a single descriptor, such as log P, log D, or polar surface area (a descriptor of hydrogen-bonding potential), to combined, meta- or QSAR models.
2. Analytical methods in health research
As analytical methods become increasingly sophisticated and capable of detecting each component in a sample (Barnes & Dourson, 1988), including in biological systems (Kote-Jarai et al., 2011), it is critical to separate and quantify these components. Methods such as mass spectrometry (MS) and high-performance liquid chromatography (HPLC) routinely achieve this in many laboratories around the world (Thorp et al., 2011). HPLC instrumentation provides crucial analytical data (Beitler, 1995) used to calculate or predict a drug's affinity constant (log KaHSA) (Packard et al., 1996; Endo et al., 1982). For nearly half a century, analytical and testing methods, as opposed to empirical approaches, have played a key role in the identification of key disease-causing toxins (Barnes & Dourson, 1988; Bathija, 2003) and of drug-like compounds to treat the resulting diseases (Kote-Jarai et al., 2011; Moore & Carpenter, 1999).
In this process, large volumes of data are produced and analytical chemists struggle to make sense of them, while health researchers and practitioners face the constant task of making better and faster decisions on disease treatment and prevention based on laboratory results. At the same time, efficient drugs must not only be produced in sufficient quantities to cure diseases, but must also be developed at the pace at which diseases such as cancer and HIV/AIDS are spreading around the globe.
Just as the identification of data needs is crucial in clinical laboratories, the quest for methods to determine biomarkers of exposure to, and effects of, diseases in public health is also growing fast. Analysis of drug metabolites in humans or animals can provide a biomarker of exposure that is sensitive to low levels of exposure and correlates well with exposure concentrations. Methods for determining biomarkers of exposure in humans are needed to determine background levels in the population and the levels at which biological effects occur. For example, Abdel-Rahman et al. (1980a, 1980b) developed a method to measure, quantitatively and qualitatively, the metabolites of chlorine dioxide (e.g., ClO2- and ClO-) in biological fluids. These biomarkers may be used to measure chlorine dioxide exposure indirectly.
In the absence of sensitive and reliable methods for determining metabolites and biomarkers of exposure associated with vector-borne diseases, mechanistic models of the tissue distribution of drug compounds have been used (Rowley et al., 1997) to assess the levels at which biological effects occur in the population and to mitigate disease occurrence. Poulin and co-workers (Poulin & Theil, 2002; Poulin & Krishna, 1995) developed tissue composition-based equations for calculating tissue-plasma partition coefficients (Pt:p).
The following expressions are used (equations 1 and 2) (Yamashita & Hashida, 2004):
where Po:w is the n-octanol:buffer partition coefficient of the non-ionized species at pH 7.4; D*vo:w is the olive oil:buffer partition coefficient of both the non-ionized and ionized species at pH 7.4; V is the fractional tissue volume content of neutral lipids (nl), phospholipids (ph), and water (w); t denotes the tissue; p denotes the plasma; and fu is the unbound fraction.
These equations are based on the assumption that each tissue and plasma is a mixture of lipids, water and plasma proteins in which the drug can be homogeneously distributed.
The first term of these equations reflects the lipophilicity-hydrophilicity balance of the drug in tissues and plasma, which arises from their lipid and water contents, while the second term accounts for binding to common proteins present in plasma and in the tissue interstitial space.
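The structure of such a tissue composition-based calculation can be sketched in a few lines of Python. The sketch below assumes the commonly cited form of the Poulin and Theil equations, in which phospholipids are treated as 30% neutral lipid and 70% water; this form, the function name and the composition values are assumptions introduced here for illustration only and should be checked against the original publications before use.

```python
def tissue_plasma_partition(p_lipid, v_nl_t, v_ph_t, v_w_t,
                            v_nl_p, v_ph_p, v_w_p, fu_p, fu_t=1.0):
    """Tissue:plasma partition coefficient (Pt:p), assumed Poulin-Theil form.

    p_lipid is Po:w (non-adipose tissue) or D*vo:w (adipose tissue).
    v_* are fractional volumes of neutral lipids (nl), phospholipids (ph)
    and water (w) in tissue (t) and plasma (p).
    fu_p and fu_t are the unbound fractions in plasma and tissue.
    """
    # First term: lipophilicity-hydrophilicity balance of tissue vs plasma.
    tissue = p_lipid * (v_nl_t + 0.3 * v_ph_t) + (v_w_t + 0.7 * v_ph_t)
    plasma = p_lipid * (v_nl_p + 0.3 * v_ph_p) + (v_w_p + 0.7 * v_ph_p)
    # Second term: correction for protein binding.
    return (tissue / plasma) * (fu_p / fu_t)


# Hypothetical composition values, not physiological reference data.
pt_p = tissue_plasma_partition(
    p_lipid=10 ** 2.0,                       # compound with log Po:w = 2
    v_nl_t=0.02, v_ph_t=0.01, v_w_t=0.75,    # tissue (illustrative)
    v_nl_p=0.003, v_ph_p=0.002, v_w_p=0.95,  # plasma (illustrative)
    fu_p=0.2, fu_t=0.5,
)
print(f"Pt:p = {pt_p:.2f}")
```

Keeping the composition term and the binding term as separate factors mirrors the two-term description in the text and makes it easy to see which of the two dominates for a given compound.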
3. Use of In-silico techniques and chemical informatics in health research
Even though disease mapping has been done for over a hundred years, the historical focus in health research has been on the person and on conventional medicinal chemistry targeting the treatment of specific diseases, with little regard for the cost implications of de novo molecular design. The need for high-throughput screening in drug discovery research of public health interest has also surfaced as a top priority in the last decade (Norris et al., 2000; Green et al., 1974). However, many drug-like lead candidates fail in testing because of unsatisfactory ADME properties (Zmuidinavicius et al., 2003). In response to this challenge, ADME studies that employ in silico techniques to improve the rate of success in the more costly downstream stages of drug development, before clinical trials, have gained tremendous interest in public health research methodologies. In silico ADME studies use various models developed for predicting the ADME properties of compounds from their chemical structures and integrate them to simulate the kinetics at the organ or body level (Carlson & Segall, 2002; Podlogar et al., 2001; Ekins et al., 2001). Data-based approaches such as quantitative structure-activity relationships (QSAR) (Zhaou et al., 2001; Yoshida & Topliss, 2000), similarity searches and 3D-QSAR (Jones et al., 1996), and structure-based methods such as ligand-protein docking and pharmacophore modeling, are among the approaches currently used. In addition, several methods of integrating ADME properties to predict pharmacokinetics at the organ or body level have been studied (Agatonovic-Kustrin, 2001; Van de Waterbeemd, 2001; Ho et al., 2000; Jacoby et al., 2009; Lave et al., 1999). All these efforts aim to reduce the risk of late-stage attrition in drug development and to optimize screening and testing by considering only the promising compounds.
Recently, researchers have also developed a keen interest in de novo molecular design (Good et al., 1995; Clark et al., 1995), predictive modeling, graph theory, molecular similarity and diversity, virtual ligand docking, scaffold hopping (Langdon et al., 2010), multi-objective optimization (Nicolaou et al., 2007), molecular descriptors, bioisosteric replacement, machine learning and evolutionary algorithms (Gillet, 2008).
In the last decade, a wide variety of descriptors for use in QSAR studies have been developed (Khan et al., 2009; Miners et al., 2006). A subset of these descriptors is potentially useful for predicting ADME properties. Many QSAR studies on blood-brain barrier (BBB) permeation of drugs have been published recently. In the bulk of these works (Wichmann et al., 2007; Zhao et al., 2007; Cuadrado et al., 2007; Katritzky et al., 2006; Garg & Verma, 2006; Hemmateenejad et al., 2006; Narayanan & Gunturi, 2005), experimental data are represented as logBB constants, which describe blood/brain partitioning at equilibrium conditions (Goodwin & Clark, 2005; Abbott, 2004), or as a qualitative (binary) index subdividing all compounds into 'CNS positive' and 'CNS negative' classes according to the presence or lack of central nervous system (CNS) activity. Table 1 summarizes the most notable QSAR models for logPS, showing that typical data sets involved only 20-30 compounds.
LogPS is based on in vivo kinetic permeability measurements using intravenous administration (Oldendorf, 1971), the brain uptake index (Bickel, 2005; Oldendorf, 1971) and in situ perfusion in rat or mouse (Bickel, 2005; Dagenais et al., 2000; Takasato et al., 1984). Here, P (cm s-1) is the observed permeability across the BBB, and S (cm2/g) is the surface area of the brain capillary endothelium, which is approximately 100-130 cm2/g in rats (Abraham, 2004; Bodor & Buchwald, 1999). The PS product can be calculated from the Kety-Renkin-Crone equation of capillary transport, $K_{in} = F\,(1 - e^{-PS/F})$; by its physical meaning, PS is the unidirectional influx rate constant (Kin) corrected for cerebral blood flow (F). Earlier attempts at logPS prediction were largely restricted by the lack of high-quality data. A subset of statistical techniques aimed at finding relationships or patterns in data sets can deal with larger sets of molecular descriptors, for example multiple linear regression (MLR) and partial least squares (PLS) (Norinder & Österberg, 2001).
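As a minimal sketch of the Kety-Renkin-Crone relation quoted above, the following Python functions convert between Kin and PS by inverting Kin = F(1 - e^(-PS/F)) to PS = -F ln(1 - Kin/F). The function names and the numerical values are illustrative assumptions; Kin, F and PS must simply share the same units.

```python
import math


def kin_from_ps(ps, flow):
    """Unidirectional influx constant Kin from PS and blood flow F."""
    return flow * (1.0 - math.exp(-ps / flow))


def ps_from_kin(kin, flow):
    """Invert the Kety-Renkin-Crone equation: PS = -F * ln(1 - Kin/F)."""
    if not 0.0 <= kin < flow:
        # Kin can approach F (flow-limited uptake) but never reach it.
        raise ValueError("Kin must satisfy 0 <= Kin < F")
    return -flow * math.log(1.0 - kin / flow)


# Hypothetical values for illustration only (same units for Kin, F, PS).
flow = 0.05
kin = 0.01
ps = ps_from_kin(kin, flow)
print(f"PS = {ps:.5f}, logPS = {math.log10(ps):.2f}")
```

When Kin is much smaller than F, PS is approximately equal to Kin; the correction matters only for compounds whose uptake approaches the flow limit.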
Table 1. Summary of QSAR models for predicting logPS (Lanevskij et al., 2008)
|Authors|Descriptors|N|R2|RMSE|
|---|---|---|---|---|
|Abraham and coworkers|Solvation parameters (A, B, E, S, Vx)|18|0.95|0.48|
|Bodor and Buchwald|LogP (logD)|58|0.90|0.62|
|Liu et al.|LogD, PSA, vsa base|23|0.74|0.50|
|Luco and Marchevsky (review of earlier studies)|LogP, different MW functions|7-37|0.80-0.96|-|
|Lanevskij et al. (2008)|LogP, HD, HA, Vx, ion fractions (pKa function)|125|0.84|0.48|
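To make the kind of statistics reported for such regression models concrete, the sketch below fits the simplest possible QSAR, a least-squares line relating one descriptor to a measured response, and reports the coefficient of determination and the root-mean-square error. The data are invented for illustration and do not come from any of the cited studies; the function name is likewise an assumption.

```python
import math


def fit_single_descriptor(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared, rmse).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give R^2; RMSE uses the residuals.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return slope, intercept, r_squared, rmse


# Hypothetical descriptor (e.g. a lipophilicity value) and response values.
descriptor = [0.5, 1.0, 1.8, 2.4, 3.1, 3.9]
response = [-3.2, -2.9, -2.3, -2.0, -1.4, -1.1]
slope, intercept, r2, rmse = fit_single_descriptor(descriptor, response)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f} RMSE={rmse:.3f}")
```

With only a handful of compounds, a high R2 is easy to obtain and says little about predictive power, which is why the small data sets noted in the text limit the earlier models.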
These illustrations show how in silico chemistry (cheminformatics) and high-throughput screening have increased the possibility of finding new lead compounds in much shorter times than conventional medicinal chemistry. With judicious selection of lead compounds and constant monitoring of physical properties (especially lipophilicity (equation 1) or another major physicochemical parameter) during optimization, medicinal chemists have an opportunity to help alleviate the appalling attrition rates, estimated at 93–96% (Norris et al., 2000), in clinical drug development (Bhal et al.). Physicochemical properties in small-molecule drug discovery are under the control of medicinal chemists and can easily be calculated before chemical synthesis. It is, however, important to emphasize that, when interpreting results from prediction models, the predictions are only as good as the data set used to create the model. If the training set does not contain chemical structures similar to the compound in question, the predicted result may not be reliable, whatever its value.
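The caveat about training set coverage can be illustrated with a simple similarity check. The sketch below represents each compound as a set of structural fragment labels and uses the Tanimoto coefficient to find how close a query compound is to its nearest neighbour in the training set. The fragment labels, function names and data are hypothetical, and this is not the method used by any particular commercial predictor; it only shows the general idea of asking whether a query is represented in the training data.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto coefficient between two sets of fragment labels."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def max_similarity_to_training_set(query, training_set):
    """Similarity of the query to its nearest training compound."""
    return max((tanimoto(query, t) for t in training_set), default=0.0)


# Hypothetical fragment sets, for illustration only.
training_set = [
    {"pyridine", "methoxy", "alkyl"},
    {"piperidine", "alkyl", "amide"},
    {"phenyl", "carboxylic_acid"},
]
query = {"pyridine", "methoxy", "piperidine", "alkyl"}
print(f"nearest-neighbour similarity: "
      f"{max_similarity_to_training_set(query, training_set):.2f}")
```

A low nearest-neighbour similarity signals that the prediction extrapolates beyond the training data and should be treated with caution.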
Various literature sources (Bhal, 2007; Sazonovas et al., 2010) report that ADME and toxicity prediction models can be a valuable part of many different research workflows, including virtual screening, metabolite identification, impurity analysis and chemical safety. In addition to the predicted probability result, such models can provide a reliability index (RI) value (0–1) (Japertas et al., 2010), which is an indicator of how well the chemical space around a particular compound is represented within the training set of the model (Fig. 3).
The development of in silico technologies, or chemical informatics, over the last 20 years has thus provided a more powerful and rapid means of examining clinical decision-making and of answering critical public health research questions, based on scientifically validated models and on a minimally biased appraisal of all the relevant empirical studies. This, in turn, has fostered the discussion of drug discovery policy relevant to health issues, as well as to health services and planning, in conjunction with the use of clinical investigations and disease surveillance. Such reviews aim to improve ethically relevant decisions in public healthcare research or policy. With this in mind, the identification of a lead compound, or biochemical starting point for a drug discovery program, has recently been highlighted as a critically important activity, as reflected by the lead generation strategies widely implemented in the pharmaceutical industry (Jupertas, 2007).
For a more robust process, calculated quantitative parameters provide further information, although they differ slightly from the core predictive pharmacokinetic data. These parameters are closely interrelated. They include the drug's affinity constant (log KaHSA) to human serum albumin, the major carrier protein in plasma. Experimental data come from direct chromatographic determination of the binding strength to that particular protein. These parameters are usually calculated as follows (Bhal et al., 2007):
$$\log K_a^{HSA} = \log \frac{[LA]}{[L]\,[A]}$$
where [LA] is the concentration of ligand bound to albumin, [L] is the concentration of free ligand, and [A] is the concentration of free albumin, which is estimated at ~0.6 mM in human plasma.
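A short numerical sketch may help relate the affinity constant to the extent of binding. It assumes the usual 1:1 mass-action definition, Ka = [LA]/([L][A]), and further assumes that the free albumin concentration stays close to the ~0.6 mM quoted in the text (that is, the drug concentration is much lower than the albumin concentration) and that albumin is the only binding protein. The function names and input values are illustrative.

```python
import math


def log_ka_hsa(bound, free_ligand, free_albumin):
    """log Ka from molar concentrations [LA], [L] and [A]."""
    return math.log10(bound / (free_ligand * free_albumin))


def fraction_bound_to_albumin(log_ka, free_albumin=0.6e-3):
    """Fraction of ligand bound, [LA] / ([LA] + [L]) = Ka[A] / (1 + Ka[A]).

    free_albumin is in mol/L; 0.6 mM is the plasma estimate from the text.
    """
    ka_a = (10.0 ** log_ka) * free_albumin
    return ka_a / (1.0 + ka_a)


# Hypothetical example: a ligand with log KaHSA = 4.
print(f"fraction bound: {fraction_bound_to_albumin(4.0):.3f}")
```

Because the fraction bound depends on the product Ka[A], a tenfold change in Ka has little effect once binding is already nearly complete, which is one reason affinity constants are a more convenient modeling scale than percentages.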
%PPB values represent the overall fraction of drug bound in human plasma, i.e. they account for interactions with different proteins: albumin, α1-acid glycoprotein, lipoproteins, SHBG, transcortin, etc. In vitro measurements of the extent of plasma protein binding usually involve equilibrium dialysis, ultrafiltration or ultracentrifugation methods. %PPB is given by %PPB = (1 - fu) × 100%, where fu is the fraction of free (unbound) drug in plasma (0–1). The supplementary distribution/Vd module calculates the apparent volume of distribution of drugs in the human body, expressed in litres per kg of body weight (L/kg). Note that the predictive models for %PPB and log KaHSA are derived using the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology, which consists of a global (baseline) statistical model (based on PLS with multiple bootstrapping, using a predefined set of fragmental descriptors) and a local correction to the baseline prediction (based on analysis of the model's performance for similar compounds from the training set, the self-training library). Because overall protein binding in plasma is expressed on a percentage (%PPB) scale, values are linearized prior to modeling by conversion to apparent serum affinity constants, log Kapp, the main parameter used in modeling. The final prediction can then be converted back to %PPB (Bhal et al., 2007).
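The conversions described above can be sketched as follows. The relation %PPB = (1 - fu) × 100 is taken from the text. The linearization is an assumption: the sketch uses a logit-type transform, log Kapp = log10(%PPB / (100 - %PPB)), as one plausible way to map a bounded percentage onto an unbounded scale, and it may differ from the exact expression used in the cited work.

```python
import math


def ppb_from_fu(fu):
    """%PPB from the unbound fraction fu (0-1), as given in the text."""
    return (1.0 - fu) * 100.0


def log_kapp_from_ppb(ppb):
    """Assumed linearization: log10 of the bound/unbound ratio."""
    if not 0.0 < ppb < 100.0:
        raise ValueError("%PPB must lie strictly between 0 and 100")
    return math.log10(ppb / (100.0 - ppb))


def ppb_from_log_kapp(log_kapp):
    """Inverse of the assumed linearization."""
    k = 10.0 ** log_kapp
    return 100.0 * k / (1.0 + k)


# Hypothetical compound with 5% free drug in plasma.
ppb = ppb_from_fu(0.05)
log_kapp = log_kapp_from_ppb(ppb)
print(f"%PPB = {ppb:.1f}, log Kapp = {log_kapp:.2f}, "
      f"back-converted %PPB = {ppb_from_log_kapp(log_kapp):.1f}")
```

On the percentage scale, the difference between 98% and 99% bound looks small, yet it corresponds to a twofold change in free drug; the logarithmic scale makes that difference visible to a regression model.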
To illustrate the performance of the %PPB and log KaHSA models, results for validation set compounds within the Model Applicability Domain (MAD) (RI ≥ 0.3), taken from Bhal et al. (2012), are shown in figure 4. The local part of the model provides the basis for estimating the reliability of a prediction by means of calculated Reliability Index (RI) values that range from 0 to 1, where 0 means an unreliable prediction and 1 a fully reliable or ideal prediction. The RI values can also be used when interpreting prediction results, for example by setting a cut-off point (RI = 0.3): if a compound falls outside the MAD, the respective prediction should be discarded from further analysis irrespective of the predicted %PPB and log KaHSA values.
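Applying such a cut-off is straightforward in practice. The sketch below filters a list of predictions, keeping only those with RI at or above the 0.3 threshold mentioned in the text. The record structure, compound names and values are hypothetical and do not reproduce the output format of any specific software.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    compound: str
    ppb: float  # predicted %PPB
    ri: float   # reliability index, 0 (unreliable) to 1 (ideal)


def within_applicability_domain(predictions, cutoff=0.3):
    """Keep predictions whose RI meets the cut-off; discard the rest."""
    return [p for p in predictions if p.ri >= cutoff]


# Hypothetical predictions for illustration only.
predictions = [
    Prediction("compound_1", 92.0, 0.71),
    Prediction("compound_2", 45.0, 0.18),  # outside the MAD, discarded
    Prediction("compound_3", 99.1, 0.30),  # exactly at the cut-off, kept
]
for p in within_applicability_domain(predictions):
    print(f"{p.compound}: %PPB = {p.ppb:.1f} (RI = {p.ri:.2f})")
```

Note that the filter ignores the predicted value itself: a prediction with low RI is dropped even if its value looks plausible.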
To understand the behavior of drug compounds in the real world, Bhal et al. (2012) used ACD/Predictors for properties such as logP, logD, solubility, and pKa to evaluate and predict the likely behavior of 5-Methoxy-2-(1-piperidin-4-ylpropyl)pyridine (Fig. 7) prior to its synthesis. Lipophilicity is represented by the descriptors logP (also known as Kow or Pow) and logD, and is used, for example, to help predict the in vivo permeability of active compounds in drug discovery and the behavior of compounds in many other areas of health-related research. The partition coefficient, P, is a measure of the differential solubility of a compound in two immiscible solvents; the most commonly used solvent system is octan-1-ol/water.
P is a lipophilicity descriptor for neutral compounds, or for cases where the compound exists in a single form, and logP = log10(partition coefficient).
A logP of 1 therefore corresponds to a 10:1 ratio of concentrations in the organic and aqueous phases.
For ionizable solutes, the compound may exist as a variety of different species in each phase at a given pH. The distribution coefficient, D, typically used in its logarithmic form (logD), is the appropriate descriptor for ionizable compounds, since it is a measure of the pH-dependent differential solubility of all species in the octanol/water system (Levin, 1980).
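Bhal (2012) used methylamine to illustrate the difference between these two descriptors: MeNH3+ ⇌ MeNH2 + H+, meaning that the compound may be present as the protonated and/or the neutral species. Keeping the definition of D above in mind, D for methylamine is the ratio of the total concentration of all species (MeNH2 and MeNH3+) in octanol to the total concentration of all species in water.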
To predict a compound's lipophilicity accurately from predicted molecular physical properties, the author had to apply the correct descriptor in an appropriate manner. In this context, logD is used in lieu of logP, because the use of the latter to address lipophilicity concerns by drug research and manufacturing companies in the past (around the 1980s and 1990s) had resulted in incorrect conclusions for ionizable compounds. A good example is to reproduce the data generated by that author with ACD/LogD to show the significance of applying logD instead of logP, using drug discovery as an example in which lipophilicity is correlated with in vivo permeability. The data presented in the table were calculated for the sample molecule in figure 6 (5-Methoxy-2-(1-piperidin-4-ylpropyl)pyridine).
The pH dependence of logD for this sample molecule (5-Methoxy-2-(1-piperidin-4-ylpropyl)pyridine) is shown in figure 7 showing a plot of logD versus pH, while figure 8 displays the changing ionic forms of molecule A.
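The general shape of such a logD versus pH profile can be sketched with a textbook approximation for a base with a single ionizable centre, logD = logP - log10(1 + 10^(pKa - pH)), which assumes that only the neutral species partitions into octanol. This is a simplification: the sample molecule has more than one basic centre, the pKa used below is a hypothetical placeholder, and the sketch does not reproduce the ACD/LogD algorithm. Only the logP of 2.7 is taken from the text.

```python
import math


def log_d_monoprotic_base(log_p, pka, ph):
    """logD of a monoprotic base, neutral species partitioning only."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


LOG_P = 2.7             # predicted logP quoted in the text
HYPOTHETICAL_PKA = 10.5  # placeholder value, not a predicted pKa

for ph in (1.0, 3.0, 5.0, 7.4, 9.0, 11.0, 13.0):
    log_d = log_d_monoprotic_base(LOG_P, HYPOTHETICAL_PKA, ph)
    print(f"pH {ph:4.1f}: logD = {log_d:6.2f}")
```

Well below the pKa, logD falls by about one unit for each unit decrease in pH, and it approaches logP only when the pH is well above the pKa; this is why a single logP value can be misleading for an ionizable compound.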
Looking at the plot in figure 8, and according to Bhal et al. (2012), we can confirm that ionization of the compound greatly affects octanol-water partitioning and that lipophilicity cannot be simplified to a constant. The lipophilicity of the compound is low below pH 12, where the majority of the compound exists in an ionized form. A different conclusion would be reached if logP were examined alone (the logP predicted by the author, 2.7 ± 0.3, is given for comparison).
The author concludes from the negative values of logD (-1.44 to 0) in the physiologically relevant pH range (pH 1–8) that this compound would have higher aqueous solubility and lower lipophilicity in the body. As a result, we would expect membrane permeability to be poor. The graph in figure 8 also shows that the neutral form of molecule A is almost non-existent at physiologically relevant pH (1–8); it possibly dominates only at ~pH 13. By contrast, judged on logP alone, compound A would appear to be highly associated with the lipid phase (>30-fold affinity for octanol over water) and would therefore be expected to permeate biological membranes readily.
Figure 9 shows a schematic representation of the changing pH environments that an orally administered compound is likely to encounter in the gastrointestinal (GI) tract. From figure 9, we can observe that there is, thus, no constant pH in the body and it is therefore essential that we consider an appropriate pH when predicting the in vivo behavior of a drug candidate.
4. Implications of analytical methods and in-silico techniques in public health
The outcomes of a global network for the development of standardized analytical methods for public and environmental health directly affect the quality of health of mankind worldwide. High-quality data necessitate the development of harmonized study approaches and adequate reporting of data (Bouwmeester et al., 2011).
Priority setting on a public health scale can only be based on well-characterized dose-response relations derived from a systematic study of the bio-kinetics and bio-interactions of drugs or drug-like molecules at both organism and (sub)cellular levels, using validated analytical methods and pharmacokinetic studies. Knowledge of the ADME (absorption, distribution, metabolism and excretion) and toxicity effects is crucial for declaring a particular molecule safe for the treatment of a particular disease, and the clinical trials needed to arrive at the conclusive release of a new drug are often very costly, sometimes in the range of millions of dollars, covering the cost from fundamental research through clinical trials or testing to manufacturing. Multiple content databases, data mining and predictive modeling algorithms, visualization tools, and high-throughput data-analysis solutions are being integrated to form systems-ADME/Tox (Ekins et al., 2005). Moreover, Ekins and co-authors (2005) reported that the functional interpretation and relevance of complex multidimensional data to the phenotype observed in humans is the focus of current research in toxicology.
In fact, increased effort is needed to develop and validate analytical methods for determining ADMET effects in complex matrices such as the human body. This implies the use of validated analytical methods and in silico chemistry to reduce time and cost at the development stage. In addition, this would allow the systematic study of the reactivities of sets of drug-like molecules, with the specific aim of generating a data set from which dose-response relations can be established. This approach is commonly referred to as quantitative structure-activity relationship (QSAR) analysis and has practical implications for the optimization of in silico algorithms. A more general but direct implication of analytical methods and in silico techniques in public health is that international cooperation and worldwide standardization of terminology, reference materials, models and protocols are needed to make progress in establishing lists of essential, globally acceptable methods for drug formulation, clinical trials, and quality control and assurance. In this way, traditional pharmacopoeias and those exploiting natural biodiversity can benefit from proven methods at the grassroots level, especially in developing countries.
Metabolomics, metabonomics, proteomics, pharmacogenomics and toxicogenomics, are groups of latest experimental approaches that are combined with high-throughput molecular screening of targets to provide a view of the complete biological system that is modulated by a compound with direct or indirect implications of analytical methods and in-silico techniques development for application in the public health sector (Ekins et al., 2005). In conclusion, it is widely recognized in industry-orientated research and development of APIs (active pharmaceutical ingredients) that predicting or determining the ADME/Tox properties of molecules would help to prevent failure of many of the compounds targeted before they reach the clinic. Many authors (Bhal et al., 2012; O’Donnell et al., 2012; Ndibewu et al., 2012; Thorp, 2011; Ekins et al., 2005) agree that this has, undoubtedly, been as a result of considerable research into developing better in-silico, in vitro and in vivo methods and models.
5. General conclusion
Driven by changes in the working paradigm in the pharmaceutical and biotechnology industries, and now in environmental health-related research, in silico approaches will inevitably find their place. Some proponents of in silico methods have even claimed that this approach will save the world. This is probably owing to the gains recorded in cost reduction and efficiency, from early-stage drug discovery research to the manufacturing of some important drugs or vaccines on which the future of mankind relies, for example against terminal illnesses such as cancer or HIV/AIDS. These diseases are unquestionably major public health burdens that require rapid, systematic, robust and holistic research approaches to halt their effects on humankind.
In conclusion, ADMET data are tackled in three ways. First, a variety of in vitro assays are further automated through the use of a robust laboratory integrated system (LIM). Second, in silico models are used to assist in the selection of appropriate assays, as well as in the selection of subsets of compounds to go through these screens. Third, predictive models are then developed that might ultimately become sophisticated enough to replace in vitro assays and/or in vivo experiments. The need for ADMET information should therefore start with the design of new compounds, and this information should influence the decision to proceed with synthesis via either traditional medicinal chemistry or combinatorial chemistry strategies. Although it is obvious that, at this point, computational approaches are the most appropriate option for obtaining this information, we should never forget that predictions can be imperfect if they are flawed by errors. Optimization should therefore be the way forward: once a lead compound is obtained, it should be further optimized using more robust mechanistic models on the way to clinical trials, before final manufacturing and authorization for use as a public health drug.
Since most models are rule-based and may use descriptors that are not easily understood by the chemist or not easily translated into better molecular structures, it is important to train models continually on new data sets. A combined approach that brings together first-generation (basic predictive descriptors) and second-generation (meta-models) computational ADMET technologies would be the best way forward.
To obtain value for money, ADME predictive tools are nowadays imperative in health research programs in order to cut costs and propose reliable drug-like lead compounds. It is, though, highly desirable and recommended to add in-house data to the prediction models whenever available. Sensitive and reliable high-throughput instrumentation is a prerequisite for generating the in-house analytical data necessary for efficient and useful predictive processes. Training models on such data sets would be an added advantage for a wide range of investigations in health-related research.
The author would like to acknowledge the application scientists at the Advanced Chemistry Development Inc., (Toronto, Canada), in particular, Dr Sanji Bhal for providing some of the illustrative predictive examples. Many thanks go to the reviewers for their helpful suggestions and revision of this chapter. Finally, the authors are grateful to Dr GPP Kamatou of the Department of Pharmaceutical Sciences of the Tshwane University of Technology, Pretoria, South Africa, for reading through the proof of the chapter and hereby acknowledge his positive criticism.