Xenobiotic-metabolizing enzyme genes regulated via the AhR pathway.
Aryl hydrocarbon receptor (AhR) is a biological sensor that integrates environmental, metabolic, and endogenous signals to control complex cellular responses in physiological and pathophysiological processes. Full-length AhR comprises several domains, including a bHLH domain, a PAS A domain, a PAS B domain, and a transactivation domain. Although no experimental structures are available for the PAS B and transactivation domains, the available 3D structures of AhR have revealed details of the interactions among its subdomains as well as its interactions with other protein partners. To screen for novel AhR modulators, homology modeling has been employed to build models of the AhR PAS B domain. These models were validated using molecular dynamics simulations and binding site identification methods. Furthermore, docking of well-known AhR ligands helped confirm these binding pockets and identify the critical residues that accommodate these ligands. In this context, virtual screening using both ligand-based and structure-based methods has been applied to large databases of small molecules to identify novel AhR agonists or antagonists and to prioritize hits for experimental validation in biological assays. More recently, machine-learning algorithms have been explored as a tool to enhance the screening of AhR modulators and to reduce the errors associated with structure-based methods. This chapter reviews the in silico screening studies focused on identifying AhR modulators and discusses future perspectives towards this goal.
- human AhR
- in silico
- in vitro
- AhR modulator
- crystal structure
- AhR modeling
Six decades ago, researchers conducted extensive studies to answer a puzzling question: how did administering exogenous substances such as polycyclic aromatic hydrocarbons (PAHs) potently induce xenobiotic-metabolizing enzymes in rat liver [1, 2]? It was Alan Poland and his colleagues who finally answered this question in the 1970s. Poland discovered a novel hepatic protein that formed a complex with the halogenated aromatic hydrocarbon 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). The new protein bound TCDD with high affinity and was isolated from hepatic cytosolic fractions of C57BL/6 mice, a strain used as a model for studying aromatic hydrocarbon responsiveness. This protein was later termed the aryl hydrocarbon receptor (AhR) and was identified as a ligand-activated transcription factor.
Later studies showed that AhR is expressed in many tissues, including the liver, lung, placenta, and heart, and in different cell types throughout organ development. Knockout studies in mice further revealed essential functions for AhR in multiple physiological and pathophysiological pathways [4, 5, 6]. The knowledge accumulated over the last decades has defined AhR both as an environmental sensor for air pollutants and as a ligand-activated transcription factor that regulates the expression of various genes, including those encoding enzymes responsible for xenobiotic metabolism.
AhR mediates the toxicity of numerous xenobiotics, and this toxicity is accompanied by overexpression and overactivation of AhR in cells. This enhances the pathophysiological functions of AhR and can contribute to cancer in different organs, such as the breast and liver, as well as to cardiovascular and other diseases [8, 9]. Targeting AhR with small-molecule agonists or antagonists could therefore inhibit several important hallmarks of various cancers.
Computational modeling and computer simulations continue to be important tools for studying various biological mechanisms and for analyzing the interactions between biomolecular entities (
2. AhR structure and functions
2.1 AhR domain structure
AhR is a member of the basic helix–loop–helix (bHLH)-PER-ARNT-SIM (PAS) family of transcription factors. The term “PAS” is an acronym derived from three proteins, namely the Drosophila circadian rhythm protein period (Per), the mammalian AhR nuclear translocator (ARNT), and the Drosophila neurogenic protein single-minded (Sim) [7, 14, 15]. Human AhR is an 848-amino-acid protein with a molecular weight of ~96 kDa. It contains two PAS domains, PAS A and PAS B, and interacts with ARNT. The PAS B domain contains two interaction sites: a ligand-binding site, where a bound ligand can modulate AhR activity, and a direct binding interface for the HSP90 chaperone protein. Additionally, AhR contains a basic helix–loop–helix motif near its N-terminus, which is responsible for DNA binding and also contributes to other protein–protein interactions. Finally, the transactivation (
2.2 The AhR ligands and their modes of action
The AhR PAS B domain can interact with both exogenous and endogenous chemicals of various origins. These interactions can have different effects on AhR activity, leading to a wide range of physiological and toxicological downstream consequences. For example, several studies have associated environmental pollutants with the development of cardiovascular diseases, cancer, and other diseases through AhR modulation [7, 17, 18]. Exogenous AhR ligands include various aromatic hydrocarbons, such as dioxins. Exposure to such ligands can occur through contaminated food or environmental pollution. Following exposure, their interaction with AhR can lead to several toxic effects, including organ dysfunction, immunotoxicity, and carcinogenicity. Endogenous AhR ligands, on the other hand, are usually metabolites generated by cellular processes, such as 6-formylindolo[3,2-b]carbazole (FICZ). The interaction of these ligands with AhR is part of normal AhR-mediated physiological signaling [7, 19, 20].
2.3 AhR physiological and pathophysiological roles
AhR contributes to numerous biological pathways, and its physiological roles include immune system development and the regulation of xenobiotic-metabolizing enzymes [7, 15, 21]. AhR knockout mice showed abnormal female reproductive function and impaired blood pressure regulation. Overactivation and constitutive activation of AhR have been associated with the initiation, promotion, progression, and invasion of cancer cells. For example, activation of AhR by exogenous ligands can have several effects in cancer cell lines, including promoting cell proliferation through the G1–S transition, silencing tumor suppressor genes, and activating proto-oncogenes.
Earlier findings showed that the exogenous AhR ligand TCDD disrupted cell–cell adhesion and enhanced cancer cell motility by releasing Src kinase from the AhR protein complex. Furthermore, activation of AhR by environmental pollutants can strongly induce xenobiotic-metabolizing enzymes, including CYP1A, which produce reactive intermediate metabolites and reactive oxygen species that promote tumor growth [14, 22]. In short, AhR controls a battery of genes encoding phase I and phase II xenobiotic-metabolizing enzymes, as shown in Table 1. In addition, known AhR agonists such as TCDD and β-naphthoflavone have been shown to induce cellular hypertrophy in H9c2 cardiomyoblasts. This effect was correlated with increased levels of numerous cytochrome P450 genes and could be counteracted by an AhR antagonist.
|Phase|Gene|Enzyme|
|---|---|---|
|Phase II|NQO1|NAD(P)H:quinone oxidoreductase 1|
|Phase II|GSTA1|Glutathione transferase A1/2|
|Phase II|UGT1A6|Uridine diphosphate glucuronosyltransferase 1A6|
|Phase II|ALDH3A1|Aldehyde dehydrogenase 3A1|
On the positive side, experiments in a mouse model of induced colitis showed that the endogenous AhR agonist FICZ, which binds AhR with high affinity, could block IL-6 and claudin-2 expression and prevent induced disruption of intestinal barrier function through AhR activation. Further knockout studies showed that AhR ligands play a fundamental role in autoimmune diseases by regulating the differentiation of Treg and TH17 cells in the immune system. For example, FICZ inhibited Treg cell development and promoted TH17 cell differentiation, worsening experimental autoimmune encephalomyelitis in mouse models [21, 25].
2.4 AhR signaling pathways
AhR is generally present in its inactive form in the cytoplasm as part of a protein complex comprising a heat shock protein 90 (HSP90) dimer, the co-chaperone p23, an AhR-interacting protein called AIP, and the protein kinase SRC (see Figure 2). The PAS B domain of AhR binds to one monomer of the HSP90 dimer, and the second HSP90 monomer interacts with the AhR bHLH domain as well as with the PAS A domain. As shown in Figure 2, the bHLH domain of AhR is also crucial for DNA binding, a process initiated by ligand binding within the PAS B domain and by the interaction of AhR with the co-chaperone p23. Binding to p23 stabilizes AhR in the cytoplasm, protecting it from proteasomal degradation, and also maintains the PAS B domain in a conformation suitable for high-affinity ligand binding [35, 36].
Once a ligand binds to the PAS B domain, it forms an AhR–ligand complex that still includes p23, SRC, and AIP (see Figure 2). This complex is transformed into an active state and translocated into the nucleus. In the nucleus, all other components dissociate, leaving the agonist-bound AhR protein. AhR then forms an active heterodimer with ARNT, the AhR–ARNT complex. This complex is recruited to DNA at dioxin response elements (DREs), which share the core consensus motif 5′-TNGCGTG-3′. This canonical AhR pathway increases the expression of various genes, including the principal genes of xenobiotic metabolism, the AhR repressor (AHRR), and other genes.
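The DRE consensus can be made concrete with a small motif scan. The minimal Python sketch below, assuming a plain A/C/G/T string (upper or lower case), searches both strands for the core motif 5′-TNGCGTG-3′, with N matching any base. The helper names `find_dre_sites` and `reverse_complement` and the example sequences are hypothetical; genuine DRE analyses usually also weigh flanking sequence and experimental binding evidence.

```python
import re

# Core DRE consensus 5'-TNGCGTG-3'; "N" is any nucleotide.
# The lookahead (?=...) lets overlapping sites be reported.
DRE_PATTERN = re.compile(r"(?=(T[ACGT]GCGTG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_dre_sites(seq: str) -> list[tuple[int, str, str]]:
    """Find DRE core motifs on both strands of `seq`.

    Returns sorted (start, strand, site) tuples: `start` is a 0-based index
    on the forward strand and `site` is the motif read 5'->3' on its strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in DRE_PATTERN.finditer(seq)]
    rc = reverse_complement(seq)
    n = len(seq)
    for m in DRE_PATTERN.finditer(rc):
        # Map the reverse-strand hit back to forward-strand coordinates.
        start_fwd = n - (m.start() + 7)
        hits.append((start_fwd, "-", m.group(1)))
    return sorted(hits)
```

For example, `find_dre_sites("GGTTGCGTGAGAAGC")` reports a single forward-strand site, TTGCGTG, starting at index 2.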
3. AhR three-dimensional structures
Resolving the full-length three-dimensional structure of AhR has been a challenging task for the last two decades. Despite many efforts towards this goal, there is still no complete structure of the whole AhR protein. However, as discussed below, a few structures describe several of the AhR domains. Although these structures do not reveal the exact overall AhR architecture, they still provide useful information on the function of these separate domains, and they give computational modeling a favorable starting point for constructing reliable hypotheses about the full-length AhR organization for rational drug design and drug screening campaigns.
The first AhR 3D structure was reported in 2013 for the mouse PAS A domain (residues 110 to 267) at a resolution of 2.55 Å (PDB ID: 4M4X) (see Figure 3). This X-ray diffraction-based PAS A homodimer structure was obtained from recombinant
Two additional AhR structures were reported in 2017 (see Figure 4). Both structures comprise multiple AhR domains and show a clear interaction between AhR and its dimerization partner, ARNT, as well as their interaction with double-stranded DNA. The two structures (PDB IDs: 5V0L and 5NJ8) [39, 40] were resolved at 4.0 and 3.35 Å, respectively, and revealed the complex formed by the bHLH and PAS A domains from human AhR and their interactions with ARNT and DNA. However, because of the high flexibility of the AhR PAS B domain and the C-terminal transactivation domain, neither of these domains was included in these structures. Nevertheless, both structures clearly explain the protein–protein interactions (PPIs) and show well-defined interface regions between the individual domains within AhR, as well as their interactions with ARNT and DNA.
As shown in Figure 4, the first interface, a protein–DNA interface, lies between the AhR–ARNT heterodimer and the DNA. This interaction is mediated by Ser36, His39, and Arg40 from the AhR bHLH domain and His79, Asp83, Arg86, and Arg87 from ARNT, together with thymine and guanine bases of the DRE. The second PPI interface lies between AhR and ARNT and involves different regions of the two proteins. These regions form many hydrophobic interactions and comprise residues Leu47, Leu50, Leu53, Val74, and Leu70 from the AhR bHLH domain and residues Ile109, Leu112, Val136, and Met139 from ARNT. The third PPI interface involves interactions between the PAS A domains of AhR and ARNT, mediated by AhR residues Phe117, Leu118, Ala121, Leu122, Tyr137, Val126, Phe266, and Ile268. The fourth and final PPI interface comprises the interdomain interactions between the AhR bHLH and PAS A domains, through residues Phe136, Ser151, Ile154, and Leu246 from the PAS A domain and Phe56, Val60, Leu72, Ala79, and Phe82 from the bHLH domain.
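Interface residues such as those listed above are commonly identified with a distance criterion: a residue counts as part of an interface if any of its atoms lies within a cutoff (often around 4 to 5 Å) of an atom in the partner chain. The sketch below illustrates that criterion in plain Python. The 4.5 Å cutoff, the `Atom` record, and the toy coordinates used with it are assumptions for illustration, not the procedure or data of the cited structural studies.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    chain: str                       # chain id, e.g. "A" for AhR, "B" for ARNT
    resname: str                     # three-letter residue name
    resseq: int                      # residue number
    xyz: tuple[float, float, float]  # Cartesian coordinates in Å


def interface_residues(atoms, chain_a, chain_b, cutoff=4.5):
    """Residues of chain_a with any atom within `cutoff` Å of chain_b."""
    a_atoms = [a for a in atoms if a.chain == chain_a]
    b_atoms = [b for b in atoms if b.chain == chain_b]
    hits = set()
    for a in a_atoms:
        # Brute-force all-pairs check: fine for a sketch, slow for a full complex.
        if any(dist(a.xyz, b.xyz) <= cutoff for b in b_atoms):
            hits.add((a.resname, a.resseq))
    return sorted(hits, key=lambda res: res[1])
```

In practice the atoms would be read from a deposited structure such as 5NJ8 with a structure-parsing library, and a spatial index would replace the all-pairs loop.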
4. Applications of computational methods in AhR modeling
The wealth of structural information on AhR described above provides an excellent opportunity to apply various computer-based simulations to study the dynamics and structural organization of the various AhR domains. Such computational tools can not only yield much-needed insights into how these domains interact within the AhR machinery, but can also offer detailed answers about their interactions with other AhR partners (
4.1 Modeling the PAS B domain
Most of the
In many AhR studies, crystal structures of human hypoxia-inducible factor 2α (HIF-2α) served as templates for the AhR PAS B domain because the HIF-2α PAS B domain has the highest sequence similarity to that of AhR. Table 2 provides a list of the reported
|PDB ID|Structure method|Year of study|Reference|
|---|---|---|---|
|3F1O, 3H7W, 3H82|X-ray diffraction|2018, 2018|[52, 75]|
|3F1N, 3F1O, 3F1P, 3H7W, 3H82, 4GHI, 4GS9, 4XT2, 4ZP4, 4ZQD|X-ray diffraction|2019||
|3H82, 3H7W, 4ZQD|X-ray diffraction|2020||
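Template choice in homology modeling usually starts from the sequence identity between the target and each candidate template over an alignment. The minimal sketch below computes percent identity for two already-aligned sequences (gaps written as "-") and ranks candidate templates by it. It does not perform the alignment itself, identity is normalized by the number of gap-free columns (only one of several conventions in use), and the short sequences in the test are made-up fragments, not real AhR or HIF-2α sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rank_templates(target: str, templates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank templates (name -> sequence aligned to `target`) by identity."""
    scores = [(name, percent_identity(target, seq)) for name, seq in templates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Identity alone is a coarse criterion; coverage of the binding region and structural resolution of the template also matter when several templates score similarly.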
For example, Bisson and his group established an agonist-optimized model of the human AhR PAS B domain and then docked around five thousand chemical structures, including AhR agonists and antagonists, into the PAS B domain. Docking results were filtered, and the top five complexes were subjected to long MD simulations (~60 ns) to study their conformational and dynamical changes. Bisson’s work revealed the importance of residues 307–329 in the PAS B domain, which were highly flexible and acted as an access gate to the ligand-binding pocket. These residues can also adopt different conformations upon ligand binding and play a primary role in controlling the structural changes of the pocket and the access of ligands to it.
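Per-residue flexibility of the kind reported for residues 307–329 is commonly quantified from an MD trajectory as the root-mean-square fluctuation (RMSF) of each residue's representative atom (for example Cα) about its time-averaged position, RMSF_i = sqrt((1/T) Σ_t |x_i(t) − ⟨x_i⟩|²). The pure-Python sketch below assumes the frames have already been superimposed on a reference structure; the data layout, the `flag_flexible` helper, and its threshold are illustrative assumptions, not a reconstruction of the analysis performed in the cited study.

```python
from math import sqrt

Coord = tuple[float, float, float]


def rmsf(frames: list[list[Coord]]) -> list[float]:
    """RMSF per atom from pre-aligned frames; frames[t][i] is atom i in frame t."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average over all frames.
        msq = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(sqrt(msq))
    return result


def flag_flexible(residue_ids, values, threshold):
    """Residue ids whose RMSF (Å) exceeds a user-chosen threshold."""
    return [rid for rid, v in zip(residue_ids, values) if v > threshold]
```

Plotting RMSF against residue number is the usual way a flexible stretch such as a gating loop shows up as a distinct peak.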
4.2 Interaction of the PAS B domain with different ligands
With 3D models of the PAS B domain in hand, many groups focused on studying its binding to different ligands (
Alanine substitutions at outer residues (e.g., Arg282, Thr311, Glu339, and Lys350) did not affect TCDD binding to AhR. In the human AhR-LBD, mutating Ala375 to Val or Leu decreases the binding affinity of TCDD and makes indirubin a less potent endogenous AhR ligand [52, 53]. Additional site-directed mutagenesis of AhR-LBD residues has been used to identify key residues that govern ligand selectivity in AhR. These models provided a clear basis for understanding the mechanism of ligand-dependent activation of AhR via its PAS B domain. In particular, the molecular docking and mutagenesis analyses mentioned above helped identify and confirm the binding pocket of TCDD and other AhR modulators [46, 51, 54, 55].
Examples of these models include those developed by Kim and her team, who constructed 3D models of AhR from several avian species, including chicken, albatross, and cormorant, and studied the sensitivity of these AhR isoforms to dioxin derivatives. All models were subjected to docking simulations with TCDD followed by MD simulations. Kim’s study used the mean square displacement (MSD) calculated from the MD trajectories as a stability indicator for the bound ligands. The results showed that Ile324 and Ser380 of chicken AhR1 exhibited the lowest MSD values among the AhR-LBD residues of all the avian species studied. The size of the binding pocket also varied among species. Moreover, stabilization of TCDD in the binding pocket of chicken AhR relied on Ile324 and Ser380, which explained why chicken AhR is more sensitive to TCDD than the other AhR isoforms [48, 56, 57, 58].
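For a group of N atoms (a bound ligand or a binding-site residue), the MSD used as a stability indicator can be written as MSD(t) = (1/N) Σ_i |r_i(t) − r_i(0)|², the average squared displacement of each atom from its position in the first frame. The minimal sketch below assumes the frames are already aligned on the protein and uses frame 0 as the reference; other reference choices and time-origin averaging are also common, so this is an illustration rather than the exact protocol of the cited work. Lower values indicate a group that stays closer to its starting pose.

```python
Coord = tuple[float, float, float]


def msd_series(frames: list[list[Coord]]) -> list[float]:
    """MSD of a group of atoms in each frame, relative to frame 0 (units Å^2)."""
    ref = frames[0]
    n_atoms = len(ref)
    series = []
    for frame in frames:
        total = 0.0
        for (x, y, z), (x0, y0, z0) in zip(frame, ref):
            # Squared displacement of one atom from its reference position.
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        series.append(total / n_atoms)
    return series
```

Comparing the average of such a series across species models is one simple way to rank how tightly a ligand or residue is held in place.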
Further mutational and functional studies were expanded to include AhR modulators other than TCDD. For example, Faber and her team studied indirubin binding to mouse and human AhR. This study revealed that the His326Tyr and Ala349Thr mutations in mouse AhR, corresponding to Tyr332 and Thr355 in human AhR, can increase the potency of indole compounds, particularly indirubin. In addition, although indirubin and vemurafenib fit within the same binding pocket of AhR, the two compounds showed different binding modes [52, 59]. As another example, flutamide binds to residues inside the AhR-LBD with high affinity in both mouse and human AhR to activate the AhR pathway. Importantly, the biological response of AhR depends on the type of bound ligand and has been shown to change with the interactions that a given ligand makes with the residues forming the LBD of the PAS B domain [61, 62].
4.3 Virtual screening and machine learning models applied to AhR
Over the last few decades, virtual screening has been used as a major tool in hit identification campaigns against numerous biological targets. In this regard, AhR is no exception and various
Pharmacophore modeling maps ligand–target interactions onto a set of steric and electronic features arranged in a specific 3D geometry. These pharmacophore models can then be used to screen chemical libraries containing millions of structures for compounds that satisfy the pharmacophore features, and they can also support scaffold hopping and fragment-based drug design. Structure-based methods, on the other hand, require a 3D structure of the target protein, either an experimental structure or a homology model. Ligands from a given database can be fitted into the active site of the target protein and ranked by their predicted binding affinities. In this context, molecular docking and molecular dynamics simulations are among the many valuable tools for predicting the most probable binding mode of a given ligand within the target. Furthermore, structure-based pharmacophore models can provide more detailed insights into the interactions of a ligand with the binding site [66, 68, 69].
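As a toy illustration of pharmacophore-based screening, the sketch below treats a pharmacophore as a set of typed feature points and checks whether a conformer's annotated features can be assigned to the query features so that every feature type agrees and every pairwise distance matches within a tolerance. The feature labels, the 1.0 Å tolerance, and the brute-force assignment are assumptions made for clarity. Production tools also handle feature perception, conformer generation, excluded volumes, and alignment, none of which is shown, and pure distance matching cannot distinguish mirror-image arrangements.

```python
from itertools import permutations
from math import dist

# A feature is (type, (x, y, z)); labels such as "aromatic" or
# "hbond_acceptor" are hypothetical names used only in this sketch.
Feature = tuple[str, tuple[float, float, float]]


def matches_pharmacophore(query: list[Feature], candidate: list[Feature],
                          tol: float = 1.0) -> bool:
    """True if some one-to-one assignment of candidate features to query
    features matches every feature type and every pairwise distance (Å)."""
    n = len(query)
    for combo in permutations(candidate, n):
        if any(q[0] != c[0] for q, c in zip(query, combo)):
            continue  # feature types must agree position by position
        if all(
            abs(dist(query[i][1], query[j][1]) - dist(combo[i][1], combo[j][1])) <= tol
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False
```

Because only internal distances are compared, a conformer that is translated or rotated relative to the query still matches, which is the behavior a screening filter needs.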
As discussed below, several AhR screening studies combined both methods to enhance the search for possible AhR candidates [64, 70]. The wealth of physicochemical, chemical, and structural data accumulated on AhR modulators has also provided the material needed to build reliable machine learning models, which require large datasets of chemical structures along with their measured interactions with AhR.
An example of AhR
In a similar approach, Rath and his team built two models of the human PAS B domain: a wild-type model and a mutant (Val381Ala, Val381Asn) model. Around 60 natural compounds from
|Compound name|Induction of AhR transcription|Binding free energy (kcal/mol)|Reference|
|---|---|---|---|
|Pinocembrin (5,7-Dihydroxyflavanone, R-form)|+|−2.9||
|IMA-06201 (N-ethyl-|+|Not reported||
|IMA-06504 (N-(4-trifluoromethylphenyl)-1,2-dihydro-4-hydroxy-5-methoxy-1-methyl-2-oxo-quinoline-3-carboxamide)|+|Not reported||
In another screening study, Mahiout et al. identified IMA-06201 and IMA-06504 as two novel AhR agonists with binding modes similar to that of TCDD. Both compounds were highly stable in the central region of the AhR ligand-binding pocket. Furthermore, these agonists were shown to be more efficient and more potent selective AhR modulators than TCDD. To confirm this, Mahiout used CYP1A1 enzyme activity as a biomarker of AhR activation and compared the efficacy and potency of IMA-06201 and IMA-06504 (see Figure 5 and Table 3) with those of TCDD, in the presence and absence of different concentrations of the AhR antagonist CH-223191, in rat hepatoma cell lines. The new compounds induced CYP1A1 activity with an efficacy similar to that of TCDD, whereas CH-223191 blocked this induction. In addition, in an Ames test assessing genotoxicity, the newly identified compounds IMA-06201 and IMA-06504 did not show mutagenic effects at low concentrations.
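Efficacy and potency in such comparisons are usually read off a concentration–response curve: efficacy corresponds to the maximal response (Emax) and potency to the concentration producing half of that maximum (EC50). A common model is the Hill equation, E(C) = E0 + (Emax − E0)·C^n / (EC50^n + C^n). The sketch below evaluates it and, assuming simple competitive antagonism, applies the Schild relationship, under which an antagonist at concentration [B] with dissociation constant K_B shifts the apparent EC50 by the factor 1 + [B]/K_B. Whether CH-223191 behaves this way is not stated in the text, and the parameter values in the test are invented, not Mahiout et al.'s data.

```python
def hill_response(conc: float, emax: float, ec50: float,
                  n: float = 1.0, e0: float = 0.0) -> float:
    """Response at agonist concentration `conc` from the Hill equation."""
    if conc <= 0:
        return e0
    return e0 + (emax - e0) * conc**n / (ec50**n + conc**n)


def shifted_ec50(ec50: float, antagonist_conc: float, kb: float) -> float:
    """Apparent EC50 under simple competitive antagonism (Schild):
    EC50' = EC50 * (1 + [B] / K_B)."""
    return ec50 * (1.0 + antagonist_conc / kb)
```

Under this model a competitive antagonist lowers the response at a fixed agonist concentration without changing Emax, so comparable maximal CYP1A1 induction together with a right-shifted curve would be the expected signature.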
Machine-learning algorithms combined with QSAR have recently been used to screen for new AhR ligands. For instance, Matsuzaka used deep learning (DL) to construct models that predict AhR activators. These models benefited from input data enriched with information derived from the compounds’ 3D chemical structures, and they performed better than traditional machine learning models. To enhance the screening of AhR ligands, Zhu established a virtual screening protocol that combined ligand-based and structure-based screening with supervised machine learning to screen around eight thousand compounds from pesticide databases for agonistic effects on AhR activity. Zhu’s results revealed sixteen compounds as AhR activators, and these findings were validated in a zebrafish
To improve prediction accuracy, Yang et al. used machine learning algorithms to construct two-dimensional quantitative structure–activity relationship (2D-QSAR) models based on multiple linear regression (MLR) and artificial neural network (ANN) algorithms. They used the pEC50 values of 60 dioxin derivatives acting as AhR activators to build these models. The models were then used to predict the toxicity of 162 new dioxin derivatives and showed a good correlation between the compounds’ chemical structures and their IC50 and EC50 values.
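For readers less familiar with the quantities involved: pEC50 is the negative base-10 logarithm of the molar EC50, so a compound with EC50 = 1 nM has pEC50 = 9. An MLR-based 2D-QSAR model fits pEC50 as a linear function of molecular descriptors, pEC50 ≈ b0 + Σ_j b_j·x_j, typically by ordinary least squares. The sketch below fits such a model by solving the normal equations in plain Python. The descriptor values and activities in the test are hypothetical, and this is only a toy version of the MLR approach, not Yang et al.'s descriptors, data, or coefficients.

```python
from math import log10


def pec50(ec50_molar: float) -> float:
    """pEC50 = -log10(EC50 in mol/L)."""
    return -log10(ec50_molar)


def fit_mlr(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [b0, b1, ..., bp], where b0 is the intercept.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coefs: list[float], x: list[float]) -> float:
    """Predicted activity for descriptor vector `x`."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))
```

A real QSAR workflow would also hold out a test set or use cross-validation before applying the model to new compounds, since a least-squares fit on the training set alone says little about predictive power.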
Recently, Goya-Jorge employed various machine learning algorithms to build a set of QSAR models. These models used AdaBoost (AdB), random forest (RF), gradient boosting (GB), support vector machine (SVM), and multilayer perceptron (MLP) classifiers to evaluate around 1900 compounds from synthetic and natural sources for AhR agonism. Around 40 compounds bearing the benzothiazole scaffold were classified as AhR agonists. In vitro validation of these hits showed that indole derivatives, including endogenous substances, can serve as AhR ligands [78, 79]. Table 3 reports some of the top hits emerging from different
5. Current challenges in modeling AhR
Identifying novel AhR modulators using in silico approaches requires more comprehensive computational models of this target. These models should describe the detailed organization of the different AhR domains as well as their interactions with other protein and DNA partners. While the available crystal structures provide a glimpse of this missing information, much remains to be done. For example, all AhR crystal structures currently deposited in the Protein Data Bank lack two important AhR domains, namely the PAS B domain and the transactivation domain. The transactivation domain is essential for AhR intracellular trafficking.
The PAS B domain, on the other hand, binds AhR ligands, which can modulate AhR activity. While homology modeling has helped construct acceptable models of this domain, the sequence similarity between the available templates and the AhR PAS B domain is very low, which leaves considerable doubt about the accuracy of these models. A crystal structure of the PAS B domain would be a great leap forward towards understanding the mode of action of AhR modulators and identifying better agonists and antagonists for this important target. Furthermore, there is a knowledge gap regarding how AhR interacts with other protein partners in its inactive state, including the co-chaperone p23, AIP, and the protein kinase SRC. This poses an additional challenge for identifying druggable pockets at their protein–protein interfaces [7, 82]. With recent advances in experimental protein structure determination (e.g., cryo-electron microscopy (cryo-EM)), several of these structural challenges can be expected to be solved in the near future, opening new avenues for computational approaches to identify new AhR modulators and to clarify the functional, structural, and biological characteristics of AhR.
6. Executive summary
AhR is a ligand-activated transcription factor. It regulates the expression of various genes and plays a pathophysiological role in numerous diseases. Crystallography has been used to resolve three structures, of human and mouse origin, containing the PAS A domain alone or together with the bHLH domain, and to identify four interaction interfaces. However, all these structures lack the PAS B domain, which contains the AhR ligand-binding site. Computational and mutational studies revealed important residues that form the binding pocket within the PAS B domain. Towards identifying novel AhR modulators, several virtual screening workflows and machine learning models were developed based on the available structural and pharmacological properties of known AhR ligands. Computational methods are fast and substantially reduce the cost and time needed to screen millions of compounds for those that could interact with AhR. Recent studies applying these methods to AhR have been reviewed and discussed in this chapter. We hope the literature presented here can help advance the development of novel, selective, and potent AhR modulators.