Table 1. Overview of selected reader domains for post-translational modifications.
The function of chromatin ultimately depends on the many chromatin-associated proteins and protein complexes that regulate all DNA-templated processes, such as transcription, repair and replication. As the molecular docking platform for these proteins, the nucleosome is the essential gatekeeper of the genome. The nucleosome-binding activity of a myriad of proteins is therefore essential for a healthy cell. Here, we review the molecular basis of nucleosome-protein interactions and classify the different binding modes available. The structural data needed for such studies come not only from traditional techniques such as X-ray crystallography but increasingly also from other sources. In particular, we highlight how partial interaction data, derived for example from NMR or mutagenesis, are used in data-driven docking to model the complex as an atomistic structure. This approach has yielded detailed insights into several nucleosome-protein complexes that were intractable or recalcitrant to traditional methods. These structures guide the formation of new hypotheses and advance our understanding of chromatin function at the molecular level.
- protein interactions
- chromatin binding
- acidic patch
- histone tails
- post-translational modifications
- data-driven docking
- NMR spectroscopy
- structural models
The packaging of DNA into chromatin represents one of the most fundamental layers of the biology of the cell. It provides the structural compaction required for DNA to fit in the nucleus and plays crucial roles in controlling cell fate and protecting genome integrity. The fundamental unit of chromatin is the nucleosome, in which 147 base pairs (bp) of DNA are wrapped around an octameric protein complex composed of two copies each of histones H2A, H2B, H3 and H4 [1, 2, 3]. Nucleosomes are arranged as beads on a string, forming a 10 nanometer (nm) wide fiber that subsequently condenses into higher-order structures. As the basic unit of chromatin, nucleosomes are central to its dynamics: chromatin state and DNA accessibility are determined at the nucleosome level. Changes in chromatin state are mediated by interactions of both histone proteins and nucleosomal DNA with a wide range of protein complexes that control chromatin structure. These complexes read, write and erase post-translational modifications or act as ATP-dependent nucleosome remodelers, thereby changing the functional state of chromatin and regulating DNA-templated processes. Although they exert a wide variety of effects on chromatin structure, nucleosome-interacting proteins share a common requirement: they must recognize and bind the nucleosome. Understanding chromatin dynamics therefore demands understanding the molecular basis of nucleosome-protein interactions.
In particular, insights into the molecular mechanisms by which histone-modifying enzymes install or remove post-translational modifications (writers and erasers, respectively), and by which effector proteins (readers) recognize these modifications, are of immense interest, especially for drug development. Deregulation of these proteins is strongly connected to pathological outcomes, including cardiovascular diseases, neurological disorders, metabolic disorders and cancer. So-called epigenetic drugs that target the nucleosome interactions of these chromatin factors offer new therapeutic potential [6, 7, 8, 9]. A selection of epigenetic drugs, including those currently in clinical trials, is described in detail elsewhere. Further development of such drugs requires insight into the underlying molecular mechanism of nucleosome recognition, which would enable control over the subsequent modification of chromatin state.
In the following, we will review the molecular basis of nucleosome-protein interactions, focusing on the different binding epitopes presented by the nucleosome. After an overview of the nucleosome-protein structures determined by crystallography or cryo-electron microscopy (cryo-EM), we highlight several studies in which experimental data from nuclear magnetic resonance spectroscopy (NMR), cross-link-based mass spectrometry (XL-MS) or mutational analysis were used to build atomistic structural models of nucleosome complexes. Throughout, we emphasize the role of these data-driven models in deepening our understanding of nucleosome recognition.
2. Nucleosome-binding epitopes
Consisting of DNA and histone proteins, the nucleosome offers a selection of distinct interaction surfaces for binding of effector proteins with high levels of specificity (Figure 1).
Histone proteins possess a globular tertiary structure with exposed, disordered N-terminal tails. Histone tails carry a wide range of covalent post-translational side-chain modifications (PTMs), such as mono-, di- and trimethylation (Lys, Arg), acetylation (Lys), phosphorylation (Ser, Thr) and ubiquitination (Lys) [11, 12]. Because these covalent modifications are reversible, this diverse set of marks remains dynamic. Modified histones are recognized by so-called reader domains specific for the respective modification (Figure 1A). Interestingly, nucleosome-interacting proteins can possess more than one reader domain, which allows cross talk between different post-translational modifications. Examples of PTM reader domains are Chromo, Tudor, PHD and MBT domains for methylated lysine residues, bromodomains for acetylated lysine residues and 14-3-3 proteins for phosphorylated serine [11, 13] (Table 1). The most recent additions to the list are YEATS domains, which recognize crotonylated lysine [14, 15, 16]. Reader domains often contain structurally conserved motifs that accommodate a specific modification. The “Royal Family” of reader domains is a particularly instructive example. This superfamily includes the Tudor, Chromo, MBT, PWWP and plant Agenet domains, which bind methylated lysine (Tudor, Chromo, MBT, PWWP, plant Agenet) or arginine (Tudor) residues. Most domains of this family contain a barrel-shaped structure formed by 3–5 antiparallel β-strands that holds a cluster of aromatic residues, the so-called aromatic cage. The aromatic cage presents an electron-rich yet hydrophobic surface that is ideally suited to bind methylated lysines through cation-π interactions. Their structural features and similarities, as well as their substrate specificity, have been reviewed elsewhere [19, 20, 21].
| Reader domain | Modification | Example protein | Function |
|---|---|---|---|
| Tudor | Kme1, Kme2, Kme3, Rme2 | 53BP1 | DNA damage response |
| Tudor | | TDRD3 | Transcription activation |
| MBT | Kme1, Kme2 | L3MBTL1 | Transcriptional repression [26, 27] |
| PWWP | Kme3 | PSIP1 | Transcriptional co-activation, DNA repair [28, 29] |
| Chromo | Kme1, Kme2, Kme3 | CHD1 | Chromatin remodeling [30, 31] |
| Plant Agenet | Kme1, Kme2, Kme3 | FMRP | DNA damage response |
| Bromodomain | KAc | BRD2/3 | Transcriptional regulation |
| 14-3-3 | Sph | 14-3-3ζ | Transcriptional activation |
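The residue specificity summarized above can be made concrete with a small validator for shorthand mark labels such as H3K36me3 or H2A K119ub. This is a minimal sketch: the label grammar (histone, one-letter residue, position, mark suffix), the `HistoneMark` class and the `parse_mark` function are hypothetical, and restricting me3 to lysine (arginine is only mono- and dimethylated) is an assumption added here rather than taken from the text.

```python
import re
from dataclasses import dataclass

# Residues expected to carry each mark (hypothetical table for illustration):
# methylation on Lys/Arg (me3 on Lys only), acetylation, crotonylation and
# ubiquitination on Lys, phosphorylation on Ser/Thr.
ALLOWED_RESIDUES = {
    "me1": {"K", "R"},
    "me2": {"K", "R"},
    "me3": {"K"},
    "ac": {"K"},
    "cr": {"K"},
    "ph": {"S", "T"},
    "ub": {"K"},
}

# Assumed label grammar: core histone, residue letter, position, mark suffix.
PTM_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me[123]|ac|cr|ph|ub)$")


@dataclass(frozen=True)
class HistoneMark:
    histone: str
    residue: str
    position: int
    mark: str


def parse_mark(label: str) -> HistoneMark:
    """Parse a label such as 'H3K36me3' and check that the mark fits the residue."""
    match = PTM_PATTERN.match(label.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized mark label: {label!r}")
    histone, residue, position, mark = match.groups()
    if residue not in ALLOWED_RESIDUES[mark]:
        raise ValueError(f"{mark} is not expected on residue {residue} in {label!r}")
    return HistoneMark(histone, residue, int(position), mark)
```

For example, `parse_mark("H3K36me3")` returns `HistoneMark(histone='H3', residue='K', position=36, mark='me3')`, while `parse_mark("H3S10ac")` raises `ValueError` because acetylation is only expected on lysine.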
In addition to the post-translational modification itself, reader domains can show specificity for a defined amino acid sequence motif around the epigenetic mark that supports complex formation. For example, the WD40 domain of the EED (embryonic ectoderm development) protein selectively reads out trimethylated lysine in an A-R-K-S sequence motif (as for H3K27me3) but not in an R-T-K-Q motif (as for H3K4me3).
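The sequence contexts that EED discriminates can be located with a simple motif scan over the H3 tail. The sketch below assumes the human H3.1 N-terminal sequence (residues 1–40, initiator methionine excluded), which should be checked against a curated sequence database before reuse; the function name `lysines_in_motif` is hypothetical. The scan only reports where a motif occurs and says nothing about binding itself.

```python
# Assumed human histone H3.1 tail, residues 1-40 (initiator Met excluded).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"


def lysines_in_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based positions of the lysine in each occurrence of a X-X-K-X motif.

    The motif is a 4-residue pattern with K at the third position, as in the
    A-R-K-S and R-T-K-Q contexts discussed for EED.
    """
    if len(motif) != 4 or motif[2] != "K":
        raise ValueError("expected a 4-residue motif with K at the third position")
    positions = []
    for start in range(len(sequence) - len(motif) + 1):
        if sequence[start:start + len(motif)] == motif:
            # Lysine is at 0-based index start + 2, i.e. 1-based position start + 3.
            positions.append(start + 3)
    return positions
```

Scanning with `"ARKS"` returns lysines 9 and 27, and scanning with `"RTKQ"` returns lysine 4, showing that H3K9 sits in the same A-R-K-S context as H3K27.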
Besides histone tails, the nucleosome also possesses intrinsic docking platforms on its histone surface. The most prominent of these is formed by histones H2A and H2B. While the histone octamer is overall highly positively charged, a patch of acidic residues on the H2A-H2B dimer surface carries a negative surface charge. This structural feature, named the acidic patch, engages in a wide range of interactions with specific binding domains (Figure 1), including the H4 tail of adjacent nucleosomes, an interaction that promotes chromatin compaction. A common feature of acidic patch-interacting proteins is a positively charged arginine residue that interacts with a triad of acidic residues on H2A (Glu61, Asp90, Glu92). This is referred to as the arginine anchor. It is often supported by surrounding positively charged residues that interact with acidic residues at the H2A/H2B interface.
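Whether an arginine acts as an anchor in a given structure or model can be checked with a simple salt-bridge distance test between the arginine side-chain nitrogens and the carboxylate oxygens of the acidic triad. The sketch assumes a 4.0 Å nitrogen-oxygen cutoff, a common but not universal salt-bridge criterion, and its usage example relies on hypothetical toy coordinates, not coordinates from any deposited structure. In practice the atoms would be read from a structure file with a parser such as Biopython; `salt_bridge_partners` is a hypothetical name.

```python
import math
from itertools import product

Coord = tuple[float, float, float]

# Side-chain atoms commonly used to define Arg-Asp/Glu salt bridges (PDB atom names).
ARG_NITROGENS = ("NE", "NH1", "NH2")
ACIDIC_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def salt_bridge_partners(arg_atoms: dict[str, Coord],
                         acidic_residues: dict[str, tuple[str, dict[str, Coord]]],
                         cutoff: float = 4.0) -> set[str]:
    """Return labels of acidic residues with any N-O pair to the arginine within cutoff (Å)."""
    partners = set()
    for label, (resname, atoms) in acidic_residues.items():
        for n_name, o_name in product(ARG_NITROGENS, ACIDIC_OXYGENS[resname]):
            if n_name in arg_atoms and o_name in atoms:
                if math.dist(arg_atoms[n_name], atoms[o_name]) <= cutoff:
                    partners.add(label)
                    break  # one contact is enough for this residue
    return partners
```

With toy coordinates that place the carboxylates of two triad residues about 2.7 Å from the guanidinium nitrogens and the third about 8 Å away, the function returns only the two nearby residues. An arginine engaging all three would be flagged as a full anchor.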
Other parts of the histone core surface may also mediate protein-nucleosome interactions (Figure 1C). First, a solvent-exposed cleft between H4 and H2B was shown to be involved in binding Sir3 and 53BP1 [39, 40]. Interestingly, these proteins bind simultaneously to the H4-H2B cleft and the acidic patch, using one nucleosome-binding domain for each epitope. Second, incorporation of non-canonical histones into nucleosomes introduces specific interaction surfaces that allow histone variant-specific nucleosome binding (Figure 1D). Examples are CENP-N and CENP-C, which recognize nucleosomes containing the histone H3 variant CENP-A [41, 42].
Finally, the nucleosomal DNA is a major protein interaction site. First, it forms the binding site of linker histone H1 [43, 44, 45] (see also Section 4.9). Second, it often provides additional, synergistic contacts for nucleosome-binding domains (Figure 1E). Third, recent studies have identified transcription factors that primarily bind nucleosomal DNA. These so-called pioneer factors bind their DNA target sites while these are embedded in the nucleosome [46, 47, 48]. Structural details of these interactions, however, are still lacking.
As studies of nucleosome binding have advanced, it has become clear that effector protein binding in many cases involves contacts between nucleosome-binding domains and multiple nucleosome epitopes (Figure 1G, H). However, because of its size and complexity, and the limited stability and dynamic nature of its complexes, the nucleosome is a challenging system for structural biology.
3. Crystal clear: lessons from crystallography and single particles
High-resolution three-dimensional structures of complexes, typically obtained by crystallography and, increasingly, cryo-electron microscopy, play a key role in the study of protein interactions. These structures enable the identification of binding sites and intermolecular interactions, offering a guided approach to designing binding-deficient mutants or competitive binders. A landmark in nucleosome structural biology was the publication of the high-resolution crystal structure of the nucleosome in 1997. Luger et al. achieved crystallization of the nucleosome with a palindromic version of human α-satellite DNA. This milestone study provided the foundation for studying the structures of nucleosomes in complex with chromatin factors. Table 2 lists the structures of nucleosome-protein complexes solved to date by crystallography and cryo-electron microscopy [39, 50, 51, 52, 53, 54, 55, 56, 57, 58]. The most recent additions to this ever-growing list are the spectacular structures of the INO80 chromatin remodeler bound to the nucleosome [59, 60]. Below, we discuss a few key studies to highlight the different nucleosome-binding modes of effector proteins.
| Protein | PDB ID | Function | Year | Method | Ref. | Resolution (Å) |
|---|---|---|---|---|---|---|
| Sir3 BAH | 4JJN, 3TU4, 4LD9, 4KUD | Chromatin compaction | 2011, 2011, 2013, 2013 | X-ray | [39, 51, 52, 53] | 3.0–3.3 |
| CENP-C | 4X23 | H3 variant binder | 2013 | X-ray | | 3.5 |
| CENP-N | 6BUZ, 6C0W | H3 variant binder | 2017, 2018 | Cryo-EM | [72, 73] | 3.9/4.0 |
| H1 | 4QLC, 5NL0 | Linker histone | 2015, 2017 | X-ray | [45, 75] | 3.5 |
| INO80 | 6FML, 6ETX | Remodeling complex | 2018, 2018 | Cryo-EM | [59, 60] | 4.4/4.8 |
| LANA | 1ZLA, 5GTC | Viral protein | 2006, 2017 | X-ray | [61, 76] | 2.9/2.7 |
| GAG | 5MLU | Synthetic acetylation system | 2017 | X-ray | | 2.8 |
3.1 The first crystal structure of a nucleosome complex (LANA)
The first high-resolution structure of a nucleosome-protein complex was the crystal structure of a peptide derived from the N-terminal region of the Kaposi’s sarcoma-associated herpesvirus protein LANA bound to the nucleosome. The binding site identified in this study was the acidic patch. The atomic resolution made it possible to identify intermolecular side-chain interactions, including the arginine anchor bound to the acidic triad. Ever since, the LANA-nucleosome complex has served as the gold standard for comparisons with other acidic patch interactions [50, 55]. Importantly, LANA is used as a competitor to probe the acidic patch-binding ability of other proteins [62, 63, 64]. Interestingly, the same epitope also turned out to be the binding interface for the first complete protein domain crystallized in its nucleosome-bound state.
3.2 The first crystal structure of a nucleosome-bound protein domain (RCC1)
The first structure of a protein domain bound to the nucleosome was the RCC1-nucleosome complex published by the Tan lab in 2010. RCC1 (regulator of chromosome condensation 1) is essential during mitosis, where it recruits Ran GTPase, which plays a role in nuclear reorganization, to the nucleosome [65, 66]. A comparison with LANA highlighted the crucial and conserved interaction of arginine residues with the acidic patch triad. Strikingly, RCC1 binds to the acidic patch using the canonical arginine anchor, here contained in a loop, and also binds nucleosomal DNA through its N-terminal tail. Such synergistic interactions were later observed in many other nucleosome-binding proteins [50, 55, 67, 68, 69, 70]. This study was the first to reveal this complexity of the nucleosome as an interaction platform. It also highlights the importance of properly defining the boundaries of binding domains, so that all binding epitopes are captured, possible synergistic interactions are revealed, and complex formation and its effects on chromatin structure can be fully understood.
3.3 Specificity of effector protein orientation in nucleosome complex formation (PRC1)
Besides determining the binding mode, synergistic interactions can also provide the structural basis for the specificity of effector protein activity. This was shown by the crystal structure, also from the Tan lab, of polycomb repressive complex 1 (PRC1), which ubiquitinates H2A K119 in a highly specific manner. The nucleosome displays various lysine residues on its surface that can be ubiquitinated by the respective writer proteins. However, the downstream response differs widely depending on the position of the ubiquitinated lysine, so target specificity is of high importance for ubiquitin writer proteins. In the case of PRC1, specificity is based on two distinct binding events. First, the arginine anchor of the Ring1B/Bmi1 subunit binds the acidic patch. Second, the E2 subunit UbcH5c engages the nucleosomal DNA. Together, these contacts position the catalytic center of the ubiquitin-carrying E2 precisely at the target H2A K119 (Figure 2B).
Besides LANA, RCC1 and PRC1, other crystal structures of nucleosome complexes have offered further insights into nucleosome recognition. In particular, the structure of the nucleosome complex of the SAGA deubiquitinating (DUB) module revealed a non-canonical mode of acidic patch binding. Morgan et al. found that the SAGA DUB module uses three equally crucial arginine residues distributed over an α-helix (Figure 2A). This suggests that yet other acidic patch-binding modes may exist.
Recently, cryo-EM structures of nucleosome-protein complexes have also been published. The first, solved in 2016, was the complex with 53BP1, a reader of post-translational histone modifications. Subsequently, the structures of the complexes with Snf2 and CENP-N were published [71, 72, 73].
Since the first crystal structure two decades ago, the list of nucleosome complexes deposited in the RCSB Protein Data Bank (PDB) has grown continuously. Still, the 12 high-resolution structures solved to date cover only a fraction of all nucleosome-protein interactions. This discrepancy highlights the need for alternative techniques in chromatin structural biology.
4. Data-driven modeling
An attractive alternative to traditional structure determination methods is to model the structure of a complex based on experimental information about the interaction [79, 80]. In such data-driven modeling, the two interaction partners are docked together, guided by the experimental data and respecting their biophysical properties. The exact binding interface and relative orientation of the binding partners are typically refined over several steps. A prerequisite for this approach is the availability of 3D structures of the interacting partners. Several molecular docking programs allow experimental data to be incorporated, which increases the accuracy of the resulting structures. To this end, data from diverse biophysical techniques are translated into restraints that guide the docking process [82, 83, 84]. This information can describe the interaction interface, intermolecular distances, or the shape of the complex and its subunits. Techniques that can provide this information are listed in Table 3.
| Interface | Distances | Shape |
|---|---|---|
| H/D exchange | Förster resonance energy transfer (FRET) | Small-angle X-ray or neutron scattering (SAXS/SANS) |
| | Electron paramagnetic resonance (EPR) | Ion-mobility mass spectrometry (IM-MS) |
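How interface data become docking restraints can be illustrated with ambiguous distance restraints, in the spirit of those used by data-driven docking programs such as HADDOCK. Because it is not known which atoms make contact, all candidate atom pairs are combined into one effective distance by summing r⁻⁶ terms, so the closest pairs dominate; a flat-bottom penalty then leaves satisfied restraints unpenalized. This is a minimal sketch: the 2.0 Å upper bound, the force constant and the function names are assumptions for illustration, not the parameters of any particular program.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def effective_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Combine all pairwise distances into one r^-6-summed effective distance.

    The closest pairs dominate the sum, so the restraint is satisfied as soon as
    any candidate contact is made. Assumes no two atoms coincide (zero distance).
    """
    total = sum(math.dist(a, b) ** -6 for a in atoms_a for b in atoms_b)
    return total ** (-1.0 / 6.0)


def flat_bottom_penalty(distance: float, upper: float = 2.0, k: float = 1.0) -> float:
    """Zero when distance <= upper, harmonic in the excess above the bound."""
    excess = distance - upper
    return k * excess * excess if excess > 0 else 0.0
```

For one atom that lies 3 Å from each of two candidate partner atoms, the effective distance is 3/2^(1/6), about 2.67 Å, shorter than either individual distance. This is what makes ambiguous restraints tolerant of not knowing which specific contact is formed.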
Interestingly, NMR spectroscopy can provide all three classes of information. Intermolecular distances and shape can be obtained from paramagnetic relaxation enhancement (PRE) and the nuclear Overhauser effect (NOE), and information on binding interfaces and binding affinity from chemical shift perturbations (CSPs). The use of these NMR methods in docking studies is reviewed in detail elsewhere. Publications that used data-driven docking to investigate nucleosome-protein complexes are listed in Table 4.
| Protein | Function | Data source | Ref. |
|---|---|---|---|
| PSIP1-PWWP | Trimethyl-lysine reader (H3K36) | NMR | [67, 68, 85] |
| RNF169 | Ubiquitin reader | NMR, SAXS | [69, 89] |
| H1 | Linker histone | NMR | [43, 90] |
| Rad18 | DNA repair factor | NMR | |
| PHF1 Tudor | Trimethyl-lysine reader (H3K36) | Crystallography/NMR | |
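Chemical shift perturbations are commonly reduced to one number per NMR signal by combining the proton and heteronucleus shift changes with a weighting factor, and signals above a statistical cutoff are taken as the putative binding interface. The sketch assumes a 13C weighting of 0.25 for methyl signals and a mean-plus-one-standard-deviation cutoff. Both are common choices rather than fixed rules, and the function names and any signal labels are hypothetical.

```python
import math
from statistics import mean, pstdev


def combined_csp(d_h: float, d_x: float, alpha: float) -> float:
    """Combined shift change (ppm) from 1H and heteronucleus changes, weighted by alpha."""
    return math.sqrt(d_h ** 2 + (alpha * d_x) ** 2)


def perturbed_signals(free: dict[str, tuple[float, float]],
                      bound: dict[str, tuple[float, float]],
                      alpha: float = 0.25, n_sd: float = 1.0) -> list[str]:
    """Return signals whose combined CSP exceeds mean + n_sd * standard deviation.

    Each dict maps a signal label to its (1H, 13C) shifts in ppm; only signals
    observed in both the free and the bound state are compared.
    """
    csps = {label: combined_csp(bound[label][0] - free[label][0],
                                bound[label][1] - free[label][1], alpha)
            for label in free if label in bound}
    values = list(csps.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return sorted(label for label, value in csps.items() if value > cutoff)
```

Mapping the returned signals onto the structure of the free nucleosome then yields a candidate binding surface, which can serve as the interface input for restraints of the kind described above.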
4.1 Bringing data-driven modeling to nucleosome complexes (LSD1-CoREST)
A pioneering study applied data-driven modeling to the complex of lysine-specific demethylase 1 (LSD1) and CoREST. Both proteins cooperate in the demethylation of mono- and dimethylated H3K4. While the crystal structure of LSD1-CoREST could be solved, its nucleosome-bound state remains elusive. Yang et al. gained insight into the molecular basis of substrate recognition by LSD1-CoREST by identifying point mutations that interfere with its ability to demethylate a methylated peptide model of the histone H3 tail. Since LSD1 was previously shown to recognize a specific stretch of the H3 tail, modeling could be used to identify intermolecular interactions between the peptide and both the LSD1 active site and the LSD1-CoREST interface (Figure 3B). Finally, NMR titrations of the CoREST SANT2 domain with DNA revealed a DNA-binding interface on SANT2. Together, these interaction data guided a docking approach that produced a complete structural model of the LSD1-CoREST-nucleosome complex (Figure 3A). In the absence of direct experimental data on the nucleosome interaction, this is a prime example of combining crystal structures, mutagenesis and NMR data to overcome the limitations of the individual techniques.
4.2 NMR-based structural biology of nucleosome-protein complexes
In recent years, several studies have demonstrated that state-of-the-art solution NMR can offer high-resolution, site-specific characterization of the structures and dynamics of nucleosome-protein complexes. NMR has the particular advantages of sensitivity to dynamics and the ease with which interactions can be studied, allowing detailed insights into molecular recognition processes. NMR remains applicable when systems are dynamic or (partially) disordered, properties that typically hamper high-resolution structure determination by crystallography and cryo-EM.
The molecular size of nucleosomes, and even more so of their complexes with effector proteins, poses a challenge to traditional NMR methods. This challenge can, however, be overcome by methodologies designed for high-molecular-weight systems. One such method, methyl group-based transverse-relaxation-optimized spectroscopy (methyl-TROSY), relies on the highly sensitive observation of NMR signals from protein methyl groups. It uses a specific isotope-labeling scheme, which typically results in the observation of isoleucine, leucine and valine (ILV) methyl groups. Methyl-TROSY NMR spectra can then be used to delineate the binding sites of effector proteins on the nucleosome surface and vice versa [68, 69, 93, 96]. More detailed structural information can be extracted using so-called spin labels, which generate long-range distance restraints between the interaction partners [97, 98]. Either way, NMR-based interaction data are of unique value in modeling nucleosome-protein complexes.
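Because only methyl groups are observed, the number of expected signals follows directly from the sequence and the labeling scheme. The sketch assumes a common scheme in which each Ile contributes its δ1 methyl and each Leu and Val contributes both prochiral methyls (non-stereospecific labeling). Other labeling schemes change these counts, and the function name and example sequence are hypothetical.

```python
# Assumed labeling outcome: Ile delta1 methyl only; both prochiral methyls of
# Leu and Val (non-stereospecific labeling). Adjust for other schemes.
METHYLS_PER_RESIDUE = {"I": 1, "L": 2, "V": 2}


def expected_ilv_methyl_signals(sequence: str) -> dict[str, int]:
    """Count the methyl signals expected per residue type, plus the total."""
    counts = {aa: sequence.upper().count(aa) * n for aa, n in METHYLS_PER_RESIDUE.items()}
    counts["total"] = sum(counts.values())
    return counts
```

For the hypothetical sequence `"MKVLIAELVK"`, the function predicts one Ile, four Leu and four Val methyl signals, nine in total. Comparing such a count with the peaks actually resolved in a spectrum indicates how much of a protein can be probed.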
4.3 Expanding data sources for nucleosome complex models to NMR (HMGN2)
Kato et al. were the first to use the methyl-TROSY approach to study nucleosome-protein interactions. Importantly, they reported the NMR signal assignments of the ILV methyl groups of all histones in the nucleosome. These assignments are essential for determining protein-binding sites on the nucleosome surface. The approach was demonstrated using high mobility group nucleosomal protein 2 (HMGN2), which regulates a variety of chromatin functions. HMGN2 was found to bind both the acidic patch and nucleosomal DNA. Based on these NMR data, supported by mutagenesis, a structural model of the complex could be determined (Figure 4A). HMGN2 binds to the nucleosome like a staple, using two main interaction sites: it is anchored to the acidic patch by a canonical arginine anchor in the N-terminal region of its binding domain, while a lysine-rich motif in its C-terminal region binds nucleosomal DNA (Figure 4B). This binding mode provided a structural basis for the antagonism between HMGN2 and linker histone H1 in nucleosome binding.
4.4 Latest applications of NMR to investigate structures of nucleosome complexes (RNF169 & Rad18)
Two recent studies relied on methyl-TROSY NMR-derived binding data to elucidate the recognition of ubiquitinated nucleosomes. Both focused on the interaction between H2A ubiquitinated at K13/15 and the DNA repair factor RNF169. The work of Kitevski-LeBlanc et al. established the molecular basis of this interaction: the α-helical MIU2 (motif interacting with ubiquitin) domain binds a hydrophobic patch on the K13/15-conjugated ubiquitin, while a disordered region anchors RNF169 on the nucleosome by binding to the acidic patch. They then built a structural model that shows both interactions in the nucleosome-bound state (Figure 5A). The work of Hu et al. combined traditional NOESY-based structure determination at the level of histone dimers with interaction studies at the nucleosome level and complemented these with SAXS data to obtain a final model. The authors also extended their findings to an NMR-based structural model of the complex with the DNA repair factor Rad18. Both RNF169 and Rad18 are known to interfere with the binding of 53BP1 to nucleosomes ubiquitinated at H2A K13/15. These NMR-based structural models have made it possible to propose a molecular mechanism for this interference.
4.5 Importance of the nucleosomal context in epigenetic read-out (PSIP1-PWWP & PHF1-Tudor)
The complexity of nucleosome recognition by reader proteins is well illustrated by NMR-based studies of the recognition of H3K36me3-modified nucleosomes by the PWWP domain of PSIP1 (LEDGF). NMR studies of this interaction found that the PWWP domain binds an H3K36me3 peptide with an affinity orders of magnitude lower than H3K36me3 in its nucleosomal context. Interestingly, a similar observation was made for the Tudor domain of the H3K36me3 reader PHF1: here too, an isolated peptide model of the H3 tail showed decreased affinity. Because H3K36 lies close to the nucleosomal DNA, a role for DNA binding was hypothesized for both proteins. NMR studies indeed revealed a nucleosomal DNA-binding site on both PSIP1 and PHF1, indicating a mechanism in which trimethyl lysine and nucleosomal DNA are bound simultaneously.
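Affinity differences such as those between peptide and nucleosome binding are typically quantified by fitting NMR titration curves. The sketch below assumes 1:1 binding in fast exchange and a constant protein concentration (dilution ignored). It uses a simple grid search over Kd in place of the nonlinear least-squares fitting normally used, and the function names are hypothetical. The quadratic expression accounts for ligand depletion, which matters when the protein concentration is comparable to or above Kd.

```python
import math
from typing import Sequence


def observed_shift(p_total: float, l_total: float, kd: float, delta_max: float) -> float:
    """Observed CSP for 1:1 binding in fast exchange, including ligand depletion.

    p_total, l_total and kd must share one concentration unit (e.g. uM).
    """
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return delta_max * complex_conc / p_total


def fit_kd(p_total: float, l_totals: Sequence[float], observed: Sequence[float],
           kd_grid: Sequence[float]) -> tuple[float, float]:
    """Grid search over Kd; at each Kd the best delta_max follows from linear least squares."""
    best = None
    for kd in kd_grid:
        shape = [observed_shift(p_total, l, kd, 1.0) for l in l_totals]
        denom = sum(f * f for f in shape)  # non-zero as long as some l_total > 0
        delta_max = sum(f * y for f, y in zip(shape, observed)) / denom
        sse = sum((y - delta_max * f) ** 2 for f, y in zip(shape, observed))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_max)
    return best[1], best[2]
```

Fitting noiseless synthetic data generated with Kd = 5 and a maximal shift of 0.2 ppm recovers both values. With real data, a reader whose Kd rises by orders of magnitude when the nucleosomal context is removed shows a correspondingly shallower titration curve.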
For PHF1-Tudor, a crystal structure bound to a trimethylated H3 tail peptide was already available. The importance of the nucleosomal context and of the synergistic binding mechanism becomes clear from the corresponding nucleosome-bound model (Figure 6A). In the case of PSIP1-PWWP, the domain structure was solved by NMR and, together with NMR titration data, used to determine a structural model of the nucleosome-bound protein (Figure 6B) [67, 68, 85]. Both structural models highlight the importance of the nucleosomal context in H3K36me3 recognition and show that complex formation critically depends on two synergistic binding events. First, the aromatic residues that form the aromatic cage bind trimethylated H3K36. This recognition of the PTM is crucial for binding, but the readers reach their full affinity only when, second, their positively charged surface residues interact with the nucleosomal DNA. This makes both studies outstanding examples of the synergistic interplay of epitopes in nucleosome-binding proteins (Figure 6C, D).
The insights derived from these structural models were used to design experiments to validate them and may provide tools for further research. In the case of PSIP1-PWWP, the structural model has sparked ongoing efforts to design nucleosome-mimicking peptides that modulate the PSIP1-chromatin interaction.
4.6 LANA goes solid state
The studies above illustrate the potential of data-driven modeling of nucleosome-protein complexes based on state-of-the-art solution NMR. Recent advances in solid-state NMR (ssNMR) have enabled detailed investigation of large, soluble biomolecular complexes. Very recently, our lab capitalized on these advances and tailored them for application to nucleosome-protein complexes. Unlike methyl-TROSY methods, this approach allows observation of all residues, in principle enabling a more complete mapping of binding interfaces. In this approach, NMR spectra are recorded on sediments of nucleosomes or their complexes generated by ultracentrifugation. After the NMR signals of histone H2A in the unbound nucleosome were assigned, spectra were recorded on the nucleosome complex with the LANA peptide, analogous to the LANA crystal structure (Figure 7A) [61, 87]. Based on the chemical shift changes, the LANA binding site could be mapped to the acidic patch and a structural model generated. The close agreement between the crystal structure and the ssNMR-derived structural model (Figure 7B) illustrates the power of this approach. In our view, ssNMR, like the solution NMR approach, is an attractive alternative for structure determination of nucleosome-protein complexes. While its application has yet to be extended to larger nucleosome-binding domains, we anticipate that it will be a valuable addition to the toolkit of chromatin structural biology.
4.7 Modeling nucleosome-bound Rad6-Bre1 based on cross-linking MS
Next to NMR, cross-linking mass spectrometry has found increasing application as a source of data on nucleosome-protein interactions. Cross-linking captures intermolecular contacts between the proteins of interest and converts them into covalent connections. These connections are introduced by small-molecule linkers that react either specifically with well-defined side chains or, in the case of radical-forming photo-cross-linkers, less specifically. In addition, cross-linkers contain a spacer between their terminal reactive groups that determines the distance they can bridge [99, 100]. Both characteristics can be tuned for the study of a specific system, and many cross-linkers have been reported. After cross-linking, the protein complex is digested with trypsin, yielding peptide fragments in which covalently cross-linked fragments stay connected. Analysis of these fragments by liquid chromatography-mass spectrometry (LC-MS) identifies the cross-linked sequence positions. The cross-links can thus be converted into distance restraints between two residues, with the maximum distance depending on the length of the cross-linker. These restraints can be used to guide structural modeling of the complex. In one of the earliest examples for nucleosome complexes, XL-MS was used to map the binding sites of the various nucleosome-binding domains of the chromatin remodeling complex ISW2 onto the nucleosome surface. These data were subsequently used to build a structural model of the ISW2-nucleosome complex.
A recent case of cross-linking-based modeling in nucleosome research is the E2/E3 ubiquitin ligase complex Rad6-Bre1 (Figure 8A). Bre1 is known to act as a homodimer in complex with Rad6 to specifically ubiquitinate H2B K123 [101, 102]. In the absence of any nucleosome-bound structure, however, the molecular mechanism of this specific ubiquitination remained unknown. Gallego et al. addressed exactly this problem by using XL-MS data to identify the binding interface between the Bre1 RING domain and the nucleosome. In addition to binding nucleosomal DNA, the homodimer was observed to bind the acidic patch (Figure 8B), which was verified by the finding that LANA inhibits nucleosome binding of the Bre1 RING domain. As a first step in the modeling, the authors built the Rad6-Bre1 complex structure based on homology with known E2/E3 RING ligases; importantly, the resulting model was supported by the observed cross-links. The Rad6-Bre1 model could then be docked onto the nucleosome guided by the cross-links. This provided the structural basis for the specificity of Bre1 towards H2B K123 ubiquitination.
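Checking a docked model against cross-linking data amounts to measuring the distance between each cross-linked residue pair and comparing it with the maximum span of the cross-linker. The sketch assumes a 30 Å Cα-Cα upper bound, a value often used for lysine-reactive cross-linkers such as DSS or BS3. The function names, residue labels and coordinates in any usage are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def check_crosslinks(ca_coords: dict[str, Coord],
                     crosslinks: list[tuple[str, str]],
                     max_distance: float = 30.0) -> dict[tuple[str, str], tuple[float, bool]]:
    """Map each cross-linked residue pair to (Calpha-Calpha distance, within bound?)."""
    report = {}
    for res_a, res_b in crosslinks:
        distance = math.dist(ca_coords[res_a], ca_coords[res_b])
        report[(res_a, res_b)] = (distance, distance <= max_distance)
    return report


def satisfied_fraction(report: dict[tuple[str, str], tuple[float, bool]]) -> float:
    """Fraction of cross-links the model satisfies, a simple score for ranking models."""
    return sum(ok for _, ok in report.values()) / len(report)
```

Applied to a set of candidate docking poses, the satisfied fraction gives a first, coarse ranking. Violated cross-links then point to regions where the model, or the assumed cross-linker span, needs a closer look.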
4.8 Adding new perspectives on binding modes
Data-driven structural models complement high-resolution structures in many ways. An interesting example is the RCC1-nucleosome interaction, in which nucleosome-bound RCC1 serves as a binding platform for Ran, a protein relevant during mitosis (see Section 3.2). Biochemical data have shown that Ran activity is increased in the nucleosome-bound complex. However, when Ran is modeled onto the Ran-binding interface of RCC1 in the crystal structure, no nucleosome-Ran interactions are observed. Before the crystal structure of nucleosome-bound RCC1 was solved, a data-driven model had been reported that does feature Ran-nucleosome interactions. The authors suggest that, upon Ran binding, the contacts between the RCC1 N-terminal tail and nucleosomal DNA observed in the crystal are broken in favor of the Ran-nucleosome interactions seen in the model. Even though additional studies are needed to elucidate the exact mechanism of RCC1-Ran nucleosome binding, the combined use of the crystal structure and the data-driven model outlines a possible mechanism for further investigation.
4.9 Debating H1
Another key topic is the nucleosome-bound state of linker histone H1. The structure of the chromatosome, consisting of the four canonical histones and 166 bp of DNA in complex with a linker histone, is still strongly debated. Here too, structural models contradict a nucleosome-bound crystal structure of the chromatosome. The crystal structure reported by Zhou et al. shows the globular domain of linker histone H5 (chicken H5), with truncated tails, in an on-dyad binding mode that contacts both the entering and exiting linker DNA. For linker histone H1 (X. laevis H1.0b, human H1.5), a similar on-dyad binding mode was reported by cryo-EM and crystallography, regardless of whether the H1 tails were present. Although not vital for linker histone positioning, the H1 C-terminal domain preferentially binds one of the two linker DNAs, introducing asymmetry into the nucleosome-bound complex.
In contrast to the proposed on-dyad complex, computational studies of linker histone binding suggest an alternative, off-dyad binding geometry in which the linker histone interacts with only one of the two linker DNAs. This binding mode was shown experimentally for the globular domain of linker histone H1 (D. melanogaster). Here, NMR-based distance information obtained through paramagnetic relaxation enhancement (PRE) was used to derive the nucleosome-binding mode of H1, revealing an asymmetric, off-dyad binding. Interestingly, PRE measurements also showed that mutating a set of five crucial amino acids in H5 to their H1 equivalents is sufficient to change the binding mode of H5 from on-dyad (as in the crystal) to off-dyad. This highlights the importance of the linker histone subtype sequence, and of the interacting residues in particular, in determining the binding mode on the nucleosome.
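The distance information from PRE rests on the r⁻⁶ dependence of the transverse relaxation enhancement Γ2 on the electron-nucleus distance, as described by the Solomon-Bloembergen equation, Γ2 = (1/15)(μ0/4π)² γH² g² μB² S(S+1) r⁻⁶ [4τc + 3τc/(1 + ωH²τc²)]. The sketch implements this relation for a proton near a nitroxide spin label (S = 1/2, free-electron g-factor). It assumes a single, fixed paramagnetic center and uses the correlation time τc as the effective correlation time; real analyses typically also account for spin-label flexibility. The function names are hypothetical.

```python
import math

# Physical constants (SI units).
MU0_OVER_4PI = 1e-7     # T m A^-1
GAMMA_H = 2.6752e8      # proton gyromagnetic ratio, rad s^-1 T^-1
MU_B = 9.2740e-24       # Bohr magneton, J T^-1
G_E = 2.0023            # free-electron g-factor (assumed for a nitroxide spin label)


def pre_constant(spin: float = 0.5) -> float:
    """Prefactor of the Solomon-Bloembergen Gamma2 expression, in m^6 s^-2."""
    return (MU0_OVER_4PI ** 2 * GAMMA_H ** 2 * G_E ** 2 * MU_B ** 2
            * spin * (spin + 1.0) / 15.0)


def spectral_term(tau_c: float, omega_h: float) -> float:
    """4*tau_c + 3*tau_c / (1 + (omega_h*tau_c)^2), in seconds."""
    return 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)


def gamma2_from_distance(r: float, tau_c: float, omega_h: float, spin: float = 0.5) -> float:
    """Transverse PRE rate (s^-1) for an electron-proton distance r (m)."""
    return pre_constant(spin) * spectral_term(tau_c, omega_h) / r ** 6


def distance_from_gamma2(gamma2: float, tau_c: float, omega_h: float, spin: float = 0.5) -> float:
    """Invert the r^-6 relation to obtain the distance (m) from a measured Gamma2 (s^-1)."""
    return (pre_constant(spin) * spectral_term(tau_c, omega_h) / gamma2) ** (1.0 / 6.0)
```

For τc = 10 ns at a 600 MHz proton frequency (ωH = 2π × 600 MHz), a distance of 15 Å corresponds to a Γ2 of roughly 43 s⁻¹, and inverting that rate returns the original distance. The steep r⁻⁶ dependence is what makes PRE sensitive enough to distinguish, for example, on-dyad from off-dyad placement of a spin-labeled linker histone.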
Chromatin structural biology is a field as important as it is demanding. This is clear not only from the tremendous effort needed to determine the first nucleosome structure but also from the limited number of structures of nucleosome-protein complexes. While crystallography and cryo-EM have produced various high-resolution structures, not every interaction is accessible in this way because of experimental limitations such as the need for crystallization, the fleeting nature of some complexes or the pervasive role of highly dynamic protein regions. In such cases, an increasing number of studies turn to a combined approach that uses various sources of interaction data to direct sophisticated data-driven docking. In this way, all knowledge of a nucleosome-interacting system can be integrated into a structural model that would otherwise be inaccessible. These models strongly depend on the quality and quantity of the data and contain inherent ambiguity. However, as in the case of linker histone H1, structural models can point to alternative binding modes and thus yield new, testable hypotheses. In addition, residues crucial for nucleosome binding can be identified, allowing the design of, for example, loss-of-function or loss-of-binding mutants to silence specific pathways. Models can also drive the design of competing small molecules or peptides as potential epigenetic drugs that interfere with specific effector binding. Notably, without such models these developments might be out of reach for lack of a structure.
As of now, however, a database for such structural models, akin to the RCSB Protein Data Bank, remains to be established. Such a database might be essential to advance the study of chromatin effector proteins. Publicly available models, including their data-based restraints, could be further refined as new datasets from an array of techniques become available. A database would also allow negative results, which are otherwise rarely reported, to help drive or score the quality of previously reported models. Data-driven modeling of nucleosome-protein complexes has the potential to yield unique fundamental insights into nucleosome-binding dynamics and to enable advances in the modulation of chromatin effector proteins that would otherwise be inaccessible.
We thank all authors of the studies included in this work who kindly provided us with files of their structural models for review. This work is supported by the Netherlands Organization for Scientific Research (NWO) through a VIDI grant (723.013.010) to Hugo van Ingen.
Conflict of interest
The authors of this work declare no conflict of interest.