Table 1. Domains found in a single polypeptide chain together with the C5-DNA MTase domain (PF00145), according to the Pfam database.
(Cytosine-5)-DNA methyltransferases (C5-DNA MTases) are enzymes that catalyze methyl group transfer from S-adenosyl-L-methionine (AdoMet) to the C5 atom of a cytosine residue in DNA. As a result, AdoMet is converted into S-adenosyl-L-homocysteine (AdoHcy). The recognition sites of C5-DNA MTases are usually short palindromic sequences (2–6 bp) in double-stranded DNA. One or both DNA strands can be methylated. The introduced methyl group is located in the major groove of the DNA double helix and thus does not disrupt Watson–Crick interactions.
In prokaryotes, DNA methylation underlies several important processes, e.g. distinguishing host DNA from foreign DNA and discriminating the parental strand from the daughter strand, which is vital for correcting replication errors in the newly synthesized DNA strand. DNA methylation is also responsible for the control of DNA replication and its coordination with the cell cycle. The majority of the known DNA MTases are components of restriction–modification (R–M) systems, which protect bacterial cells from bacteriophage infection. A typical R–M system consists of a MTase, which modifies certain DNA sequences, and a restriction endonuclease (RE), which hydrolyzes DNA if these sequences remain unmethylated.
In eukaryotes, DNA methylation has diverse functions such as control of gene expression, regulation of genomic imprinting, X-chromosome inactivation, genome defense against endogenous retroviruses and transposons, and participation in immune system development and brain function. Anomalous methylation patterns in humans are associated with psychoses, immune system diseases and various cancers.
Different C5-DNA MTases share high similarity in both primary and tertiary structure, which makes them easy to identify with bioinformatic tools. At the time of writing (July 2012), the Pfam database (http://pfam.sanger.ac.uk/) contains 5065 protein sequences that possess the characteristic domain of C5-DNA MTases (PF00145). Of these, 3072 sequences (61%) contain only the PF00145 domain, while the others have a duplication of this domain and/or other additional domains. The diversity of the domains fused in a single polypeptide with the C5-DNA MTase domain is rather wide (Table 1): they include MTase domains, RE domains, transcription factor domains, chromatin-associated domains, etc.
The structural characteristics of the catalytic domain of different MTases, the details of the catalytic mechanism and the biological functions of C5-DNA MTases from different organisms have been summarized in a variety of reviews (for example, see [1, 4-8]). Therefore, these aspects are discussed here only briefly. The present review focuses on the additional activities of C5-DNA MTases and on the structure and functions of the domains that accompany the catalytic one. The data on C5-DNA MTases have not yet been summarized from this point of view.
2. The methyltransferase domain in prokaryotic and eukaryotic (cytosine-5)-DNA methyltransferases
The most studied prokaryotic C5-DNA MTase is MTase HhaI (M.HhaI) from Haemophilus haemolyticus. It methylates the inner cytosine residue in the sequence 5′-GCGC-3′/3′-CGCG-5′. M.HhaI consists of only the MTase domain (Figure 1). The structural organization and catalytic mechanism of C5-DNA MTases have been studied extensively using this enzyme as a model.
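To illustrate what "palindromic" means for a recognition site such as 5′-GCGC-3′, the following minimal Python sketch checks whether a site equals its own reverse complement and lists its occurrences in a sequence. The function names and the example sequence are hypothetical and are not taken from any published tool.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


if __name__ == "__main__":
    print(is_palindromic("GCGC"))            # M.HhaI site
    print(is_palindromic("GTCTC"))           # asymmetric site
    print(find_sites("AAGCGCGCTT", "GCGC"))  # made-up example sequence
```

For a palindromic site, every match on the top strand is also a match on the bottom strand, so a single scan is sufficient; an asymmetric site would have to be searched on both strands.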
The catalytic domain of C5-DNA MTases consists of two subdomains, a large one and a small one, separated by a DNA-binding cleft. The tertiary structure of the large (catalytic) subdomain has a common structural core: a β-sheet that consists of 7 β-strands and is flanked by 3 α-helices on each side. Six of the seven β-strands have a parallel orientation, while the 7th β-strand is located between the 5th and the 6th β-strands in an antiparallel orientation (Figure 1, b). The large subdomain consists of 2 parts: the first one (β1–β3) forms the AdoMet binding site, while the second one (β4–β7) forms the binding site for the target cytosine. The small subdomain contains a TRD region (target recognition domain) that has a unique sequence in each MTase and is responsible for substrate specificity. The small subdomains of C5-DNA MTases vary substantially in size and spatial structure.
| Domain name and number | Domain description |
|---|---|
| DNA_methylase (PF00145) | C5-cytosine-specific DNA methyltransferase |
| Eco57I (PF07669) | Eco57I restriction–modification methyltransferase |
| Methyltransf_26 (PF13659) | Methyltransferase domain |
| MethyltransfD12 (PF02086) | D12 class N6-adenine-specific DNA methyltransferase |
| N6_Mtase (PF02384) | N6-DNA methyltransferase |
| N6_N4_Mtase (PF01555) | DNA methyltransferase |
| Cons_hypoth95 (PF03602) | Conserved hypothetical protein 95 |
| EcoRI_methylase (PF13651) | Adenine-specific methyltransferase EcoRI |
| Dam (PF05869) | DNA N6-adenine-methyltransferase (Dam) |
| RE_Eco47II (PF09553) | Eco47II restriction endonuclease |
| RE_HaeII (PF09554) | HaeII restriction endonuclease |
| RE_HaeIII (PF09556) | HaeIII restriction endonuclease |
| RE_HpaII (PF09561) | HpaII restriction endonuclease |
| Vsr (PF03852) | DNA mismatch endonuclease Vsr |
| DUF559 (PF04480) | Domain of unknown function |
| BsuBI_PstI_RE (PF06616) | BsuBI/PstI restriction endonuclease C-terminus |
| TaqI_C (PF12950) | TaqI-like C-terminal specificity domain |
| HTH_3 (PF01381) | Helix–turn–helix (HTH) |
| HTH_23 (PF13384) | Homeodomain-like domain |
| HTH_26 (PF13443) | Cro/C1-type HTH DNA-binding domain |
| MerR (PF00376) | MerR family regulatory protein |
| MerR_1 (PF13411) | MerR HTH family regulatory protein |
| DUF1870 (PF08965) | Domain of unknown function |
| PHD (PF00628) | PHD-finger (plant homeo domain) |
| Domains of other DNA-operating enzymes |
| SNF2_N (PF00176) | SNF2 family N-terminal domain |
| Helicase_C (PF00271) | Helicase conserved C-terminal domain |
| Terminase_6 (PF03237) | Terminase-like family |
| RVT_1 (PF00078) | Reverse transcriptase (RNA-dependent DNA polymerase) |
| MutH (PF02976) | DNA mismatch repair enzyme MutH |
| DYW_deaminase (PF14432) | DYW family of nucleic acid deaminases |
| DNA_pol3_beta_2 (PF02767) | DNA polymerase III beta subunit, central domain |
| DNA_pol3_beta_3 (PF02768) | DNA polymerase III beta subunit, C-terminal domain |
| HhH-GPD (PF00730) | HhH-GPD superfamily base excision DNA repair protein |
| Transposase_20 (PF02371) | Transposase IS116/IS110/IS902 family |
| Chromo (PF00385) | CHRromatin Organisation MOdifier |
| PWWP (PF00855) | PWWP domain (conserved Pro-Trp-Trp-Pro motif) |
| BAH (PF01426) | BAH domain (bromo-adjacent homology) |
| DMAP_binding (PF06464) | DMAP1-binding domain |
| DNMT1-RFD (PF12047) | Cytosine specific DNA methyltransferase replication foci domain |
| zf-CXXC (PF02008) | CXXC zinc finger domain |
| RCC1_2 (PF13540) | Regulator of chromosome condensation (RCC1) repeat |
| Pkinase_Tyr (PF07714) | Protein tyrosine kinase |
| YTH (PF04146) | YT521-B-like domain |
| PPR (PF01535) | PPR repeat (pentatricopeptide repeat) |
| AOX (PF01786) | Alternative oxidase |
| Cullin (PF00888) | Cullin family |
| Cyt-b5 (PF00173) | Cytochrome b5-like heme/steroid binding domain |
| PALP (PF00291) | Pyridoxal-phosphate dependent enzyme |
| CH (PF00307) | Calponin homology (CH) domain |
| Dabb (PF07876) | Stress responsive A/B barrel domain |
| Hint_2 (PF13403) | Hint domain |
| DUF1152 (PF06626) | Domain of unknown function |
| DUF3444 (PF11926) | Domain of unknown function |
Mammalian DNA MTase Dnmt1 is responsible for maintenance methylation. Its main recognition site is a monomethylated 5′-CG-3′/3′-GC-5′ DNA fragment. The structures of M.HhaI and Dnmt1 in their complexes with DNA have been compared. Their catalytic subdomains are rather similar (the root mean square deviation of Cα atoms is 2.0 Å over 218 aligned residues). However, the TRD primary and tertiary structures differ significantly between Dnmt1 and M.HhaI. The larger part of the Dnmt1 TRD is structurally isolated and stabilized by a Zn2+ ion coordinated by three Cys residues and one His residue. A β-hairpin in the C-terminal part of the Dnmt1 TRD forms hydrophobic contacts with the catalytic subdomain and the BAH1 domain (see section 4.1.5). The side chains of a few residues (presumably arginine) in the catalytic subdomain make contacts with the phosphate groups that flank unmethylated 5′-CG-3′/3′-GC-5′ sites. In the M.HhaI structure, the DNA is located in the cleft between the catalytic subdomain and the TRD, whereas the DNA in the Dnmt1 complex is distant from the Dnmt1 catalytic center. This is likely to be connected with the fact that the activity of the Dnmt1 catalytic domain is regulated by the N-terminal part of the protein. An isolated Dnmt1 catalytic domain proved to be inactive [10-13].
3. The mechanism of DNA methylation
To catalyze the methylation reaction, a MTase binds DNA containing its recognition site and the cofactor AdoMet. Specific DNA–protein contacts are formed between the MTase and the heterocyclic bases of the recognition site, except for the cytosine to be methylated. The target cytosine is methylated via an SN2 mechanism. The whole methylation process can be divided into 3 steps: flipping of the cytosine out of the DNA double helix, formation of a covalent enzyme–substrate intermediate and transfer of the methyl group to the cytosine residue [14-23] (Figure 2).
The flipping of the target nucleotide can occur spontaneously or with the help of an enzyme. The catalytic loop (residues 81–100 in M.HhaI) shifts substantially towards the DNA almost simultaneously with the flipping of the target cytosine. As a result, the flipped-out base comes into close proximity to the cofactor molecule inside a closed catalytic cavity [23-25].
A nucleophilic attack by the thiol group of M.HhaI Cys81 on the cytosine C6 atom results in the formation of a covalent M.HhaI–DNA conjugate. The Cys81 residue is part of a conserved ProCys dipeptide of the protein (motif IV, Figure 2). The Glu119 residue from the GluAsnVal tripeptide (motif VI) protonates the cytosine N3 atom, thus facilitating the nucleophilic attack by the thiol group. In the conjugate, the negative charge on the cytosine C5 atom promotes its alkylation by the AdoMet methyl group. Proton elimination from the C5 atom of the methylated cytosine and β-elimination of the Cys residue result in the breakdown of the covalent DNA–protein complex and restore the aromaticity of the modified cytosine base (Figure 2, a). The rate constant for methyl group transfer catalyzed by M.HhaI is about 0.14–0.26 s⁻¹ [17, 18]. The subsequent release of the reaction products is the rate-limiting step of the M.HhaI catalytic cycle [17, 18].
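To give a feeling for the time scale implied by these rate constants, the sketch below converts them into half-times. It assumes, as a simplification that is not stated in the cited work, that methyl transfer behaves as a single first-order step, so that t½ = ln 2 / k. The function name is illustrative.

```python
import math

# Rate constants for methyl group transfer by M.HhaI quoted in the text (s^-1).
K_LOW = 0.14
K_HIGH = 0.26


def half_time(rate_constant: float) -> float:
    """Half-time (s) of a first-order process with the given rate constant (s^-1)."""
    if rate_constant <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / rate_constant


if __name__ == "__main__":
    for k in (K_LOW, K_HIGH):
        print(f"k = {k:.2f} s^-1 -> t1/2 = {half_time(k):.1f} s")
```

Under this assumption the chemical step would be half complete within roughly 3–5 s; since product release is rate-limiting, a full catalytic cycle would take longer.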
C5-DNA MTases HhaI and HpaII catalyze the methylation reaction in a distributive manner, i.e. they dissociate from the DNA substrate after every catalytic act [1, 18]. Each of these MTases is a component of a R–M system in which the cognate RE searches for its recognition sites by scanning DNA via a linear diffusion mechanism. The distributive behavior of the MTase gives the corresponding RE an opportunity to bind an unmodified site before the MTase does and to cleave it. On the other hand, C5-DNA MTase SssI has no cognate RE and is able to methylate several recognition sites located in one DNA substrate in a processive manner, i.e. without dissociating from the DNA after each catalytic act.
The affinity of prokaryotic C5-DNA MTases for their DNA ligands in the presence of nonreactive cofactor analogs increases in the following order: dimethylated DNA << unmethylated DNA < monomethylated DNA [1, 2, 27, 28]. A similar trend is observed for the eukaryotic enzyme Dnmt1. The affinity of Dnmt1 for monomethylated 5′-CG-3′/3′-GC-5′ sites is 2–200 times higher than for unmethylated ones, depending on the experimental conditions [13, 29-35]. Dnmt1 modifies monomethylated sites processively (more than 50 sites per binding event on average). Unmethylated sites (mainly 5′-CCGG-3′) are methylated by Dnmt1 in a distributive manner.
Mammalian C5-DNA MTases Dnmt3a and Dnmt3b methylate predominantly unmodified sites in DNA and are responsible for de novo DNA methylation during embryonic development. Dinucleotides 5′-CG-3′/3′-GC-5′ are the main recognition sites of Dnmt3a and Dnmt3b. These enzymes are also able to modify 5′-CA-3′ dinucleotides, but 10–100 times less efficiently. The efficiency of methylation by Dnmt3a and Dnmt3b also depends on the sequences flanking the recognition site: 5′-RCGY-3′ is the most frequently methylated site, whereas 5′-YCGR-3′ is methylated with a lower efficiency (R and Y are purine and pyrimidine nucleosides, respectively). The difference in the methylation rates of these sites can exceed 500-fold. The primary structures of the Dnmt3a and Dnmt3b catalytic domains are 84% identical. In contrast to Dnmt1, isolated catalytic domains of Dnmt3a and Dnmt3b retain their activity [12, 39]. Dnmt3a methylates DNA in a distributive manner, while Dnmt3b modifies DNA processively, since the DNA-binding center of Dnmt3b is more positively charged than that of Dnmt3a [37, 39]. Interestingly, substitution of the Cys residue in the catalytic ProCys motif of an isolated Dnmt3a catalytic domain does not totally abolish the enzyme activity but merely decreases it 2–6 times. This Cys residue has been shown to take part in DNA–Dnmt3a conjugate formation. However, the Dnmt3a catalytic domain loses its activity completely after substitution of the Glu residue in the GluAsnVal motif. The active center of Dnmt3a seems to have an unusual conformation, and the Cys residue perhaps does not have its optimal orientation. Therefore, the nucleophilic attack on the cytosine C6 atom could be performed by some other residue or by a hydroxide ion. Post-translational modifications or interaction with other proteins are likely to be needed for Dnmt3a activation.
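The flanking-sequence preference described above (5′-RCGY-3′ versus 5′-YCGR-3′) can be made concrete with a short Python sketch that labels each CG dinucleotide in a sequence by its immediate neighbours. The function name, the label "other" and the example sequences are hypothetical; the sketch only classifies sequence context and makes no prediction about actual methylation rates.

```python
PURINES = set("AG")       # R
PYRIMIDINES = set("CT")   # Y


def classify_cg_flanks(dna: str) -> list[tuple[int, str]]:
    """Label every CG that has one neighbour on each side.

    Returns (position of C, label) pairs, where label is "RCGY", "YCGR"
    or "other". Positions are 0-based.
    """
    dna = dna.upper()
    results = []
    # i is the position of C; i - 1 and i + 2 must be valid indices.
    for i in range(1, len(dna) - 2):
        if dna[i:i + 2] != "CG":
            continue
        left, right = dna[i - 1], dna[i + 2]
        if left in PURINES and right in PYRIMIDINES:
            label = "RCGY"
        elif left in PYRIMIDINES and right in PURINES:
            label = "YCGR"
        else:
            label = "other"
        results.append((i, label))
    return results


if __name__ == "__main__":
    print(classify_cg_flanks("ACGTTCGA"))  # made-up example sequence
```

CG dinucleotides at the very ends of the input are skipped because one of their flanking positions is unknown.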
Murine Dnmt3a is able to perform automethylation, i.e. methyl group transfer to the Cys residue of its own catalytic center in the presence of AdoMet. This reaction is irreversible and rather slow, but it can be activated by Dnmt3L. In the presence of a duplex containing 5′-CG-3′/3′-GC-5′ sites, Dnmt3a methylates the substrate DNA but not its own Cys residue. Automethylation seems to have a regulatory function, allowing excess enzyme molecules in a cell to be inactivated. On the other hand, it may be just a side reaction that takes place in the absence of DNA.
A protein called Dnmt3L is also a member of the Dnmt3 family. It is catalytically inactive but plays an important role as a stimulator of Dnmt3a and Dnmt3b activity. A structure of a complex consisting of Dnmt3a and Dnmt3L C-terminal domain has been determined by X-ray crystallography (PDB code 2qrv). The complex is a heterotetramer in which the subunits are arranged in the following order: Dnmt3L–Dnmt3a–Dnmt3a–Dnmt3L. The two active centers of Dnmt3a are located close to each other and can probably methylate two 5′-CG-3′/3′-GC-5′ sites simultaneously. These recognition sites should be separated by 8–10 bp (about one turn of the DNA double helix). Twelve murine genes that undergo maternal imprinting contain 5′-CG-3′/3′-GC-5′ sites arranged in this manner. Moreover, highly methylated regions of human chromosome 21 possess 5′-CG-3′/3′-GC-5′ sites separated by 9, 18 and 27 bp more frequently than unmethylated regions do. The effective methylation of these regions could be determined by the proper distribution of 5′-CG-3′/3′-GC-5′ sites. Substitution of Dnmt3a and Dnmt3L residues that do not take part in catalysis but are important for interface formation (Dnmt3a–Dnmt3L or Dnmt3a–Dnmt3a interfaces) suppresses Dnmt3a activity. This fact confirms the importance of proper complex formation between Dnmt3a and Dnmt3L. The orientation of the Dnmt3a–Dnmt3L complex relative to the substrate DNA is still unclear.
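The spacing argument above can be explored with a minimal Python sketch that lists pairs of CG dinucleotides lying 8–10 bp apart. How "separated by" should be counted is an assumption here: the distance is taken as the difference between the positions of the two cytosines. The function names and example sequences are hypothetical.

```python
def cg_positions(dna: str) -> list[int]:
    """Return 0-based positions of the C in every CG dinucleotide."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]


def cg_pairs_with_spacing(dna: str, min_bp: int = 8, max_bp: int = 10) -> list[tuple[int, int]]:
    """Return pairs of CG positions whose C-to-C distance lies in [min_bp, max_bp].

    Assumption: distance is measured between the two cytosines on one strand.
    """
    positions = cg_positions(dna)
    return [
        (a, b)
        for idx, a in enumerate(positions)
        for b in positions[idx + 1:]
        if min_bp <= b - a <= max_bp
    ]


if __name__ == "__main__":
    example = "CG" + "A" * 7 + "CG"  # made-up sequence, cytosines 9 bp apart
    print(cg_pairs_with_spacing(example))
```

Changing `min_bp` and `max_bp` (for example to 18 or 27) allows the same scan to be repeated for the longer periodicities mentioned for human chromosome 21.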
4. Additional functions of prokaryotic (cytosine-5)-DNA methyltransferases
4.1. DNA methyltransferases with multiple methyltransferase domains
Some C5-DNA MTases contain more than one MTase domain. The “additional” domain can belong to (cytosine-5)-DNA or (adenine-N6)-DNA MTases. Of the 5065 protein sequences in the Pfam database that possess the characteristic domain of C5-DNA MTases (PF00145), 676 sequences (13%) contain two PF00145 domains and 42 sequences (1%) contain three. Such structures might have arisen as fusions of two adjacent genes encoding different MTases. However, none of these proteins has been studied experimentally.
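The percentages quoted for the Pfam counts can be reproduced with a few lines of Python. The counts are those given in this review (Pfam, July 2012); the dictionary labels and the function name are illustrative only.

```python
# Counts quoted in the text for sequences containing the PF00145 domain.
TOTAL_PF00145 = 5065
COUNTS = {
    "PF00145 as the only domain": 3072,
    "two PF00145 domains": 676,
    "three PF00145 domains": 42,
}


def percentage(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in COUNTS.items():
        share = percentage(count, TOTAL_PF00145)
        print(f"{label}: {count} of {TOTAL_PF00145} ({share:.1f}%)")
```

Rounded to whole numbers, the three shares are 61%, 13% and 1%, in agreement with the values given in the text.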
The ability of a single molecule to methylate both the cytosine C5 atom and the adenine N6 atom is typical of MTases that recognize asymmetric DNA sequences. This phenomenon was demonstrated for M.Alw26I from Acinetobacter lwoffi RFL26 (recognition site 5′-GTCTC-3′/3′-CAGAG-5′) and for M.Esp3I from Hafnia alvei RFL3 (recognition site 5′-CGTCTC-3′/3′-GCAGAG-5′). Each of these proteins contains the N6-DNA MTase domain in its N-terminal part and the C5-DNA MTase domain in its C-terminal part. The ability of each domain to methylate its recognition site in the absence of the second domain has not been investigated.
A gene coding for another enzyme consisting of two MTase domains has been constructed from two genes of Helicobacter pylori 26695. These genes are arranged in tandem and code for DNA MTases M.HpyAVIB and M.HpyAVIA. Insertion of one nucleotide before the stop codon results in the formation of a fused gene. A similar mutation has occurred naturally in the H. pylori D27 strain. M.HpyAVIB methylates the cytosine C5 atom and M.HpyAVIA methylates the adenine N6 atom in the sequence 5′-CCTC-3′/3′-GGAG-5′. The resulting bifunctional protein contains the C5-DNA MTase domain and the N6-DNA MTase domain in its N-terminal and C-terminal parts, respectively. Both domains recognize the same DNA sites as the original proteins. The methylation kinetics and the properties of point mutants demonstrate that these domains function independently of each other. Each of them contains its own catalytic and AdoMet binding motifs.
4.2. Deamination of cytosine and 5-methylcytosine
Prokaryotic C5-DNA MTases M.HpaII, M.HhaI, M.SssI, Dcm (from E. coli), M.EcoRII, and M.SsoII have been shown to increase the rate of C → dU → T and m5C (5-methyl-2′-deoxycytidine) → T mutagenesis in vitro [48-55]. Some of them (M.HpaII, M.EcoRII, and Dcm) are also mutagenic in vivo, increasing the mutagenesis rate up to 50 times [50, 54, 56]. Interestingly, prokaryotic M.MspI does not stimulate cytosine deamination in vitro, but its mutagenic effect is comparable with that of other C5-DNA MTases when M.MspI is expressed in E. coli cells. Moreover, M.EcoRII has been shown to catalyze the conversion of 5-methylcytosine into thymine.
The enzymatic catalysis of cytosine deamination is a side reaction of the methylation process. According to the standard mechanism (Figure 2), the cysteine thiol group performs a nucleophilic attack on the cytosine C6 atom while the cytosine N3 atom is protonated, which together lead to the formation of the DNA–enzyme conjugate. As a result, the aromaticity of the cytosine base is disrupted and the C5 atom becomes negatively charged. However, the following step (methyl group transfer to the C5 atom) is impossible in the absence of AdoMet. If water penetrates into the enzyme active center, hydroxylation of the cytosine C4 atom is likely to occur. These processes initiate the deamination reaction (Figure 2, b). Afterwards, the cytosine amino group is replaced by a carbonyl group and the base is converted into uracil [58, 59]. The presence of AdoMet or AdoHcy prevents water penetration into the active center and therefore inhibits deamination. A point mutant of M.HpaII incapable of AdoMet binding is a very effective catalyst of C → dU conversion.
AdoMet analogs such as sinefungin and 5′-amino-5′-deoxyadenosine can increase the rate of enzymatic deamination even in the presence of AdoMet and AdoHcy. They seem to trigger other reaction mechanisms, which have not yet been fully clarified. The proposed mechanisms include a direct attack of a water molecule on the cytosine C4 atom. This becomes possible after the N3 atom is protonated by the MTase, which is a step of the methylation reaction. Two alternative mechanisms have been suggested (Figure 3). According to the first one (mechanism 1), the hydroxyl group of 5′-amino-5′-deoxyadenosine activates a water molecule, producing a hydroxide ion that attacks the cytosine C4 atom (Figure 3). According to mechanism 2, 5′-amino-5′-deoxyadenosine acts as an acid and protonates the cytosine N4 atom (Figure 3), thus facilitating elimination of the amino group as an ammonium ion. Mechanisms 1 and 2 are based on a direct attack of a water molecule on the C4 atom and differ considerably from the others, as they do not require interaction of the MTase with the cytosine C6 atom. Therefore, a mutant form of a MTase that does not catalyze the methylation reaction should still be able to catalyze these side reactions. Indeed, this has been shown experimentally for an M.EcoRII point mutant in which the catalytic Cys was substituted with Ala.
Both the methylation and the deamination reactions require flipping of the cytosine residue out of the helix. Thus, the longer the cytosine base remains flipped out, the faster these reactions proceed. Both reactions share a common intermediate, the enzyme–substrate conjugate (Figure 2). The use of tritiated cytosine allows estimation of the rate of tritium-to-hydrogen exchange at the cytosine C5 atom, which can serve as a measure of the conjugate formation rate. In the absence of the cofactor, such exchange catalyzed by murine Dnmt1 is slower than that catalyzed by M.HhaI. Thus, Dnmt1 forms the conjugate in the absence of AdoMet with a lower probability and is therefore less mutagenic than M.HhaI. In contrast, prokaryotic MTases effectively catalyze the deamination reaction in the absence of the cofactor [48, 53, 57].
The different rates of covalent enzyme–substrate complex formation in the absence of the cofactor could reflect different physiological functions of prokaryotic and eukaryotic C5-DNA MTases. Limited nutrition decreases AdoMet levels in prokaryotic cells. The mutations arising from cytosine deamination might not be lethal for a bacterial cell and could help to prevent hydrolysis of cellular DNA by phage endonucleases. Thus, the ability of a C5-DNA MTase to catalyze deamination can turn out to be a physiological advantage for a bacterial cell. In contrast, mammalian cells are unlikely to benefit from this kind of mutagenesis and have therefore developed mechanisms that ensure a low mutagenesis rate.
4.3. Topoisomerase activity
Two prokaryotic C5-DNA MTases have been shown to have topoisomerase activity, namely M.SssI from Spiroplasma MQI and M.MspI from Moraxella species. M.SssI is not part of a R–M system (there is no corresponding RE in the Spiroplasma genome). Like eukaryotic MTases, M.SssI modifies the cytosine C5 atom in 5′-CG-3′/3′-GC-5′ sequences. In the presence of 10 mM Mg2+, M.SssI relaxes negatively supercoiled plasmids, which leads to the accumulation of plasmids with different degrees of supercoiling. The resulting set of plasmids is similar to the products of topoisomerase I from calf thymus. Addition of ATP does not influence the topoisomerase activity of M.SssI. Since type II topoisomerases need ATP for enzymatic activity, M.SssI can be regarded as a type I topoisomerase. The MTase and topoisomerase activities of M.SssI are functionally independent. The methylation process requires AdoMet, while the topoisomerase reaction requires Mg2+ ions. The conserved motifs IV and VIII of M.SssI share a certain similarity with topoisomerase sequences. In particular, motif IV contains Tyr, which is an important catalytic residue in topoisomerases. A more detailed analysis of the M.SssI regions responsible for the topoisomerase activity has not been conducted.
Several hypotheses have been proposed to explain why these two activities are combined in one M.SssI molecule. Firstly, the topoisomerase activity alters the degree of supercoiling of plasmid DNA and thus could facilitate or hinder the flipping of cytosine out of the DNA helix. Secondly, the methylation of 5′-CG-3′/3′-GC-5′ sites by M.SssI could change the DNA structure. For example, a negatively supercoiled DNA region with a large number of methylated 5′-CG-3′/3′-GC-5′ sites is likely to be converted into the Z-form. Topoisomerase activity is necessary to restore the B-form. Thirdly, the change in DNA topology perhaps influences the level of gene expression in Spiroplasma. Finally, the two different activities may be combined in one protein for the sake of genome economy. Spiroplasma belongs to the mycoplasmas, cellular organisms that have the smallest genomes (from 600 to 1800 kbp). For comparison, the T4 bacteriophage genome consists of 165 kbp and the E. coli genome of 4600 kbp.
M.MspI is part of the MspI R–M system and methylates the first cytosine residue in the sequence 5′-CCGG-3′/3′-GGCC-5′. A unique property of this MTase is its ability to bend DNA by 142 ± 4° upon binding to the methylation site. This was demonstrated using a 127 bp DNA duplex and has not been shown for any other MTase. Unlike M.SssI, M.MspI has an N-terminal part responsible for the topoisomerase activity. This part consists of 107 residues and is located before the conserved motif I. Two regions of M.MspI share similarity with topoisomerase sequences: residues 32–98 and the conserved motif VIII. In contrast to M.SssI, there is no similarity between the M.MspI conserved motif IV and the topoisomerase sequences. A mutant form of M.MspI lacking the first 34 residues retains the ability to methylate DNA but loses its topoisomerase activity. Mutant proteins M.MspI(W34A) and M.MspI(Y74A) also lack topoisomerase activity but are still able to methylate the DNA substrate. Additionally, the M.MspI C-terminal part contains a region (residues 245–287) that shares similarity with the active center of DNA ligase I. This is consistent with the fact that topoisomerase I activity includes ligation of the DNA strands.
4.4. (Cytosine-5)-DNA methyltransferases as transcription factors
According to Table 1, some C5-DNA MTases contain domains that can function as transcription factors. These domains are located in the N-terminal parts of the proteins and are followed by the MTase domains. The main structural element of these domains is a characteristic helix–turn–helix (HTH) motif that is also present in many transcription factors. To date, the Pfam database contains 68 sequences that include a domain with an HTH motif followed by the C5-DNA MTase domain. Among them, 25 sequences belong to the HTH_3 family (PF01381). The ability to downregulate expression of its own gene has been shown experimentally for only 7 DNA MTases (M.MspI, M.EcoRII, M.ScrFIA, M1.LlaJI, M.Eco47II, M.SsoII, and M.Ecl18kI) [66-68]. Among them, M.SsoII and M.Ecl18kI are the most remarkable, as they not only suppress the transcription of their own genes but also stimulate the transcription of their cognate RE genes.
M.EcoRII from the E. coli R245 strain methylates the C5 atom of the second cytosine in the sequence 5′-CCWGG-3′/3′-GGWCC-5′ (W = A or T). In vitro experiments demonstrate that M.EcoRII can bind both its methylation site and the promoter region of its own gene. The binding site of the enzyme has been determined by DNase I footprinting: M.EcoRII protects 47 nucleotides in the “top” strand and 49 nucleotides in the “bottom” strand from DNase I hydrolysis. The binding site of M.EcoRII is located upstream of the MTase gene coding region and overlaps with its –10 and –35 promoter elements. This localization of M.EcoRII prevents RNA polymerase from binding to the promoter and results in suppression of MTase gene transcription. The M.EcoRII binding site in the promoter region contains an imperfect inverted repeat (with 2 nucleotide substitutions). The repeat consists of two 11 bp sequences separated by 12 bp. This kind of symmetry suggests that the MTase binds the promoter region as a dimer, although this protein is a monomer in solution. Investigation of catalytically inactive forms of M.EcoRII shows that the efficiency of MTase binding to the promoter region does not depend on its ability to methylate the substrate. These facts demonstrate that M.EcoRII consists of two domains: a catalytic domain and a domain responsible for the interaction with the promoter region.
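An imperfect inverted repeat of the kind described for the M.EcoRII promoter (two 11 bp arms separated by 12 bp, with 2 substitutions) can be searched for with a simple sliding-window scan. The sketch below is a generic illustration, not the procedure used in the cited work; the function names, default parameters and test sequence are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(dna: str, arm: int = 11, spacer: int = 12,
                          max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Scan for inverted repeats with fixed arm and spacer lengths.

    Returns (start, mismatches) for every window whose left arm matches the
    reverse complement of its right arm with at most max_mismatches.
    """
    dna = dna.upper()
    span = 2 * arm + spacer
    hits = []
    for start in range(len(dna) - span + 1):
        left = dna[start:start + arm]
        right = dna[start + arm + spacer:start + span]
        mismatches = sum(a != b for a, b in zip(left, reverse_complement(right)))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    left_arm = "ACGTTGCAAGC"  # made-up 11 bp arm
    perfect = left_arm + "T" * 12 + reverse_complement(left_arm)
    print(find_inverted_repeats(perfect))
```

Fixing the arm and spacer lengths keeps the scan linear in sequence length; a more general search would loop over a range of both values.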
The Eco47II R–M system from the E. coli RFL47 strain contains a MTase that is also able to downregulate the expression of its own gene. M.Eco47II methylates the cytosine C5 atom in the sequence 5′-GGNCC-3′/3′-CCNGG-5′ (N = A, G, C, T). It remains unclear which cytosine is modified. The N-terminal part of M.Eco47II is predicted to contain an HTH motif and has been demonstrated to be responsible for transcription regulation but not for the MTase activity. Mutations introduced into the catalytic center of the enzyme suppress the methylation activity but do not disrupt the regulatory function.
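Recognition sites written with degenerate symbols, such as 5′-CCWGG-3′ or 5′-GGNCC-3′, can be located in a sequence by translating the standard IUPAC nucleotide codes into a regular expression. The following Python sketch shows one way to do this; the function names and example sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def site_to_regex(site: str) -> re.Pattern:
    """Compile a degenerate site into a regex that also finds overlapping matches."""
    body = "".join(IUPAC[base] for base in site.upper())
    # The lookahead consumes no characters, so overlapping sites are reported.
    return re.compile(f"(?=({body}))")


def find_degenerate_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every match of the degenerate site."""
    return [m.start() for m in site_to_regex(site).finditer(dna.upper())]


if __name__ == "__main__":
    print(find_degenerate_sites("AACCAGGTTCCTGGA", "CCWGG"))  # made-up sequence
    print(find_degenerate_sites("GGACCGGGCC", "GGNCC"))       # made-up sequence
```

Both example sites are palindromic, so scanning one strand is enough; for an asymmetric site the reverse complement of the sequence would have to be scanned as well.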
In the MspI R–M system, the mspIM and mspIR genes are transcribed divergently from the complementary DNA strands and their promoter regions (–35 elements) are separated by 6 bp. The regulatory site of M.MspI is located in the promoter region of the mspIM gene and contains a 12 bp inverted repeat. M.MspI protects from DNase I hydrolysis the region from position –34 to +17 in the “top” DNA strand and the region from position –33 to +17 in the “bottom” DNA strand (the numbers indicate the position relative to the transcription start point of the mspIM gene). Thus, the interaction of M.MspI with the regulatory region prevents RNA polymerase from binding to the promoter region and blocks transcription initiation of the mspIM gene. At the same time, M.MspI does not interact with the promoter region of the mspIR gene and does not interfere with the expression of RE MspI.
The ScrFI R–M system from Lactococcus lactis subsp. cremoris also contains a C5-DNA MTase that regulates gene expression in its R–M system. The RE gene (scrFIR) is flanked by two genes that code for MTases: scrFIBM and scrFIAM. The scrFIAM gene has its own promoter, while the scrFIBM and scrFIR genes are transcribed together with a gene of unknown function, orfX (Figure 4). Both MTases from the ScrFI R–M system recognize and methylate the cytosine base in the sequence 5′-CCNGG-3′/3′-GGNCC-5′ (it remains unknown which cytosine is methylated). The biological significance of the presence of two MTase genes is not clear. The N-terminal part of M.ScrFIA is predicted to contain an HTH motif. M.ScrFIA has also been shown to bind to the regulatory region, a 15 bp inverted repeat located upstream of the transcription start point of the scrFIAM gene. M.ScrFIA binds to this region and inhibits the expression of its own gene [72].
The LlaJI R–M system from Lactococcus lactis contains two C5-DNA MTases, M1.LlaJI and M2.LlaJI. These MTases have the same recognition site 5′-GACGC-3′/3′-CTGCG-5′ but methylate different cytosine bases, in the “top” and the “bottom” DNA strands, respectively [73, 74]. The genes coding for these MTases and two REs form one operon and are transcribed from the same promoter (Figure 5, a). The promoter region contains a 24 bp inverted repeat that is a regulatory site for the MTases. This palindromic sequence contains two methylation sites of the LlaJI MTases, one of which overlaps with the –35 promoter element (Figure 5, b). At first, M2.LlaJI modifies both sites, which enables binding of M1.LlaJI and methylation of cytosine bases in the opposite strand. Interaction of M1.LlaJI with the unmethylated substrate has not been demonstrated. M2.LlaJI acts only as a modifying enzyme, while M1.LlaJI has the additional capability of binding to the inverted repeat that contains the –35 promoter element. The N-terminal part of M1.LlaJI is predicted to contain an HTH motif. Binding of M1.LlaJI to the inverted repeat results in suppression of transcription of the whole LlaJI operon. This mechanism seems to be unique, since methylation usually decreases the binding efficiency of a repressor to its operator. This complicated regulatory mechanism is likely to enable fine tuning of the transcription level in the LlaJI R–M system.
Transcription regulation has been studied most thoroughly in the R–M systems SsoII from Shigella sonnei 47 and Ecl18kI from Enterobacter cloacae 18k. The nucleotide sequences of these systems share 99% identity. The sequences of the intergenic regions are completely identical, while the proteins differ in 1 amino acid residue. We will refer to such R–M systems as SsoII-like. The genes of the SsoII R–M system are oriented divergently and separated by an intergenic region of 109 bp (Figure 6). To investigate the regulation mechanism of the system, two plasmids were constructed that carry the two possible combinations of the intergenic region and a lacZ gene, which encodes β-galactosidase. Thus, the expression of the lacZ gene is under control of the promoter of either the ssoIIM gene (pACYC-SsoIIM) or the ssoIIR gene (pACYC-SsoIIR). The β-galactosidase expression level was found to be 540 times higher in the cells containing the pACYC-SsoIIM plasmid than in the cells containing the other plasmid. Therefore, expression from the ssoIIM promoter is much higher than from the ssoIIR promoter. When transformed with an additional plasmid in which the ssoIIM gene is under its own promoter, the cells containing pACYC-SsoIIM demonstrate a 20-fold decrease in β-galactosidase expression, while the cells containing pACYC-SsoIIR display an 8-fold increase. Thus, M.SsoII downregulates the expression of its own gene and stimulates the expression of the cognate RE gene. Transformation of the cells with a plasmid encoding a mutant M.SsoII lacking its first 72 residues does not influence the β-galactosidase expression level. The same result is observed after transformation with a plasmid encoding M.NlaX, a protein homologous to the M.SsoII domain responsible for methylation. In contrast, a plasmid encoding a fusion of the first 72 residues of M.SsoII with the full-length M.NlaX gives the same effect as the plasmid encoding the full-length M.SsoII.
These experiments demonstrate that the M.SsoII ability to regulate transcription in the SsoII R–M system is determined by its N-terminal part (72 residues). This part is predicted to contain an HTH motif.
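As a rough illustration of the reporter data described above, the reported fold changes can be combined to estimate how the ratio of ssoIIM to ssoIIR promoter activity shifts when M.SsoII is supplied from an additional plasmid. This is only a sketch: it assumes that the 540-fold ratio, the 20-fold decrease, and the 8-fold increase refer to comparable measurements and can simply be multiplied, which the cited experiments do not necessarily guarantee. The variable and function names are illustrative.

```python
# Values as reported in the text; combining them multiplicatively is an assumption.
RATIO_M_TO_R = 540.0     # ssoIIM promoter vs ssoIIR promoter, reporter plasmid alone
M_FOLD_DECREASE = 20.0   # pACYC-SsoIIM reporter in the presence of M.SsoII
R_FOLD_INCREASE = 8.0    # pACYC-SsoIIR reporter in the presence of M.SsoII


def ratio_with_mtase(ratio: float, m_decrease: float, r_increase: float) -> float:
    """Estimate the ssoIIM/ssoIIR promoter activity ratio when the MTase is present."""
    return (ratio / m_decrease) / r_increase


print(ratio_with_mtase(RATIO_M_TO_R, M_FOLD_DECREASE, R_FOLD_INCREASE))  # 3.375
```

Under these assumptions, the presence of M.SsoII would narrow the difference between the two promoters from about 540-fold to roughly 3-fold.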
M.SsoII is a typical C5-DNA MTase which modifies the second cytosine in the sequence 5′-CCNGG-3′/3′-GGNCC-5′. Moreover, M.SsoII interacts with the intergenic region of the SsoII R–M system, protecting 48 nucleotides in the top strand and 52 nucleotides in the bottom strand from DNase I hydrolysis. The intergenic region contains a 15 bp inverted repeat (regulatory site). Seven guanine bases interacting with M.SsoII have been identified using protection footprinting; six of them are located inside the regulatory site symmetrically relative to the central A•T pair (Figure 7). Interference footprinting experiments with formic acid, hydrazine, dimethyl sulfate, and N-ethyl-N-nitrosourea reveal 6 guanine, 2 adenine, and 4 thymine residues as well as 6 phosphate groups interacting with M.SsoII. These nucleotides form two symmetrically located clusters, 5′-GGA-3′ and 5′-TGT-3′, in each DNA strand of the regulatory site (Figure 7). Such a symmetrical interaction with both halves of a palindromic site is typical of many regulatory proteins which bind to the operator sequence as dimers and contain an α-helix that interacts with the DNA major groove, recognizing the specific sequence. M.SsoII as a transcription factor is thought to function by the same mechanism.
Substitution of Arg35 or Arg38 with Ala in M.Ecl18kI significantly impairs binding of the protein to the regulatory site. According to the computer simulation performed for M.SsoII, these residues belong to the second (recognition) helix of the HTH motif. Arg38 is thought to form contacts with the guanine bases of the 5′-GGA-3′ trinucleotide in one DNA strand, while Arg35 can interact with the thymine bases and the DNA backbone of the 5′-TGT-3′ trinucleotide in the other strand (Figure 7). Amino acid substitutions in the N-terminal part of M.Ecl18kI influence the ability of this protein to regulate transcription in the R–M system. However, there is no correlation between the M.Ecl18kI affinity for the regulatory site and the amounts of the RNA transcripts. In most cases, amino acid substitutions in the N-terminal part of M.Ecl18kI increase the methylation activity of this enzyme. The inverse relationship also holds: an M.SsoII point mutant in which Cys142 is substituted with Ala is catalytically inactive but demonstrates an increased affinity for the regulatory site and effectively regulates transcription in the SsoII R–M system. Thus, the interconnection between the two DNA-binding sites has been demonstrated experimentally for the SsoII-like DNA MTases. Moreover, it has been shown recently that M.SsoII binding to the regulatory site prevents its interaction with the methylation site. Thus, the two functions of the protein are mutually exclusive [67, 81].
In the SsoII and Ecl18kI R–M systems, the transcription start point of the RE gene is located at the beginning of the MTase gene (Figure 6). The transcription start point of the MTase gene is located inside the region protected by M.SsoII from DNase I hydrolysis, 5 bp away from the regulatory site. Suppression of MTase gene transcription is based on competition between the MTase and RNA polymerase for binding to the intergenic region of the SsoII (or Ecl18kI) R–M system. The MTase interaction with the regulatory site does not interfere with RNA polymerase binding to the RE gene promoter. Thus, the RE gene is activated indirectly, by preventing RNA polymerase from binding to the MTase gene promoter.
5. Domains of eukaryotic DNA methyltransferases
All known eukaryotic DNA MTases methylate the C5 atom of cytosine. Eukaryotic MTases are usually multidomain proteins (Figure 8).
5.1. Functional domains of mammalian Dnmt1
Murine and human Dnmt1 consist of 1620 and 1616 amino acid residues respectively. The primary structures of these proteins share 85% identity. The Dnmt1 molecule contains the following domains and functionally important regions (listed starting from the N-terminus, Figure 8):
charge-rich domain or DMAP1-binding domain (DMAP1 is DNA methyltransferase associated protein, a transcription repressor);
PCNA-binding domain, PBD;
at least three functionally independent nuclear localization signals, NLS;
RFTS domain (replication foci targeting sequence), also called TS (targeting sequence), RFD (replication foci domain) or TRF (targeting to replication foci) domain;
cysteine-rich Zn2+-binding domain, also called CXXC domain;
BAH1 and BAH2 (bromo-adjacent homology or bromo-associated homology) domains which are parts of the so-called PBHD domain (polybromo homology domain);
KG linker (consists of Lys and Gly residues) which connects N- and C-terminal parts of Dnmt1;
C-terminal catalytic domain.
The N-terminal part of Dnmt1 contains several domains that regulate the activity of the catalytic domain. Such a structural organization suggests that the Dnmt1 gene arose by fusion of an MTase gene with nonhomologous genes of other DNA-binding proteins [11, 83].
Structures of two Dnmt1 complexes with DNA have been determined recently by X-ray crystallography: a fragment of murine Dnmt1 (residues 650–1602) with AdoHcy and a 19 bp DNA duplex containing two unmethylated 5′-CG-3′/3′-GC-5′ sites (PDB code: 3pt6) and a fragment of human Dnmt1 (residues 646–1600) with AdoHcy and the same duplex (PDB code: 3pta) . The catalytic domain of Dnmt1 forms a core of the complex and makes contacts with the DNA on one side and with both BAH domains on the other side (Figure 9). AdoHcy molecule is located in the active center of the catalytic domain. The CXXC and BAH1 domains are located on different sides of the catalytic domain and are connected by a long CXXC–BAH1 linker. The BAH1 and BAH2 domains are located distantly from the bound DNA and are separated from each other by an α-helical linker. The KG linker is disordered in the crystal. Different Dnmt1 domains are discussed further in this chapter.
It is worth noting that Dnmt1 targeting to replication foci is provided by three types of domains located in its N-terminal part: the PCNA-binding domain, the RFTS domain, and the BAH domains [84-86]. Studies of Dnmt1 deletion mutants revealed 3 different DNA-binding regions: residues 1–343, the CXXC domain (residues 613–748), and the catalytic domain (residues 1124–1620). The catalytic domain preferentially binds monomethylated 5′-CG-3′/3′-GC-5′ sites, while the fragment comprising residues 1–343 binds these sites independently of their methylation status. The CXXC domain has been shown to bind primarily dimethylated sites. However, in the crystal structure the CXXC domain interacts with an unmethylated substrate.
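The three DNA-binding regions listed above can be written down as a small lookup table, which makes it easy to check where a given residue falls. This is a minimal sketch: the boundaries are copied from the text, the region labels and the function name are illustrative, and the numbering is assumed to refer to the 1620-residue murine protein.

```python
# DNA-binding regions of Dnmt1 as quoted in the text (1-based, inclusive boundaries).
# Labels and function name are hypothetical; numbering is assumed to be murine.
DNA_BINDING_REGIONS = [
    ("N-terminal region", 1, 343),
    ("CXXC domain region", 613, 748),
    ("catalytic domain", 1124, 1620),
]


def dna_binding_region(residue: int):
    """Return the label of the DNA-binding region containing the residue, or None."""
    for label, start, end in DNA_BINDING_REGIONS:
        if start <= residue <= end:
            return label
    return None


for position in (100, 500, 686, 1300):
    print(position, dna_binding_region(position))
```

For example, position 686 falls into the CXXC domain region, whereas position 500 lies outside all three regions.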
5.1.1. DMAP1-binding domain
The first 120 residues of Dnmt1 can bind the transcription repressor DMAP1. This interaction ensures the presence of DMAP1 in replication foci throughout the S phase of the cell cycle. Moreover, Dnmt1 interacts directly with histone deacetylase 2 (HDAC2) during late S phase. Dnmt1, HDAC2, and DMAP1 form a complex in vivo. Since a direct interaction between HDAC2 and DMAP1 has not been demonstrated, Dnmt1 is likely to serve as the scaffold for the formation of this complex.
A special form of Dnmt1, Dnmt1o, is synthesized in oocytes. It lacks the first 118 residues, which make up the DMAP1-binding domain. Dnmt1o accumulates in the oocyte cytoplasm and is transferred into the nuclei of the 8-cell stage embryo, where it is likely to be responsible for maintaining the methylation patterns of imprinted genes. Dnmt1o is replaced by the regular Dnmt1 after implantation of the blastocyst. Homozygous mutant mice containing Dnmt1o in all somatic cells show a normal phenotype and have a normal level of genome methylation. This fact confirms the ability of Dnmt1o to perform all Dnmt1 functions. However, the amount of Dnmt1o and the corresponding enzymatic activity are much higher than those of Dnmt1. In heterozygous embryonic stem cells, the expression levels of Dnmt1o and Dnmt1 are the same. In adult mice, however, the amount of Dnmt1o is 5 times higher than that of Dnmt1. Dnmt1o therefore seems to be more stable than Dnmt1. The DMAP1-binding domain is likely to decrease Dnmt1 stability in vivo and thus could be involved in Dnmt1 degradation.
5.1.2. PCNA-binding domain
The Dnmt1 residues 163–174 are responsible for binding of PCNA (proliferating cell nuclear antigen, also known as processivity factor for DNA polymerase δ). Dnmt1 relocates to DNA replication foci when the cell enters S phase [84, 90]. Its binding with PCNA is observed in the regions of newly synthesized DNA in intact cells. This binding does not influence the MTase activity of Dnmt1 .
In mammalian cells, newly replicated DNA is rapidly packaged into nucleosomes, to which histone H1 is subsequently added [91, 92]. Histone H1 has a high affinity for methylated DNA regardless of its nucleotide sequence and can suppress the Dnmt1 enzymatic activity. Therefore, maintenance methylation should be performed before DNA is packaged into nucleosomes. Dnmt1 binding to PCNA probably underlies a special mechanism required for coordination of these processes in a cell.
5.1.3. RFTS domain
Besides the presence in DNA replication foci during S phase (provided by the PCNA-binding domain) [84, 90], Dnmt1 is also associated with chromatin (mainly heterochromatin) from late S phase until early G1 phase . This association is provided by the RFTS domain (replication foci targeting sequence) and does not depend on the methylation patterns specific for heterochromatin or on the histone binding proteins. Moreover, the association with chromatin does not depend on DNA replication, since it takes place in G2 phase and M phase de novo .
The RFTS domain inhibits Dnmt1 binding to both free DNA and nucleosomal DNA. It functions as an intrinsic competitive inhibitor of Dnmt1 and can decrease its enzymatic activity by up to 600-fold. The RFTS domain also inhibits binding of the CXXC domain to nucleosomal DNA. The inhibition is observed for the isolated RFTS and CXXC domains as well as for the two domains in a single polypeptide chain. However, a deletion mutant containing both the CXXC domain and the catalytic domain is able to bind polynucleosomes in the presence of the isolated RFTS domain. Thus, the simultaneous presence of the two DNA-binding domains seems to make the complex relatively resistant to displacement of DNA by the RFTS domain.
The RFTS domain contains a Zn2+-binding motif followed by a β-barrel and an α-helical bundle (Figure 10). Hydrophobic interactions of the RFTS domains provide Dnmt1 dimerization, although its functional importance is still unknown . In murine Dnmt1(291–1620), the negatively charged RFTS domain penetrates deeply into the positively charged DNA-binding center of the catalytic domain and forms several hydrogen bonds inside it. Such a structural organization could explain the mechanism of DNA displacement from the catalytic center by means of competition with the RFTS domain .
5.1.4. CXXC domain
The cysteine-rich Zn2+-binding domain contains several cysteine residues organized in CXXC motifs, which give the domain its name. The CXXC domain of Dnmt1 is similar to the cysteine-rich domains of other chromatin-associated proteins such as the MeCP2 protein, the CG binding protein (CGBP), the histone MTase MLL, and the histone demethylases JHDM1A and JHDM1B. This domain has been shown to bind unmethylated 5′-CG-3′/3′-GC-5′ sites in vitro in the case of the MBD1, MLL, CGBP, JHDM1B, and Dnmt1 proteins. The CXXC domain of Dnmt1 is crescent-shaped and contains 8 conserved, catalytically important Cys residues. These residues are clustered in two groups and bind two Zn2+ ions. In the structure of murine Dnmt1(650–1602) complexed with AdoHcy and 19 bp DNA (PDB code: 3pt6), all the specific contacts with DNA are formed by the CXXC domain, which interacts with both the major and the minor grooves. A loop region of the CXXC domain (Arg684-Ser685-Lys686-Gln687) penetrates into the major groove and forms contacts with heterocyclic bases and phosphate groups. The guanine bases of the 5′-CG-3′ dinucleotide are recognized by the side chains of Lys686 and Gln687, whereas the cytosine bases are recognized through backbone interactions of Ser685 and Lys686. Salt bridges are formed between the Arg side chains and the DNA backbone.
The CXXC domain is known to bind unmethylated 5′-CG-3′/3′-GC-5′ sites specifically [98, 99]. The structural data confirm this type of specificity: the presence of a methyl group at the cytosine C5 atom would result in steric clashes between the DNA and the protein atoms. The CXXC domain seems to bind newly synthesized unmethylated sites after DNA replication, which would protect them from de novo methylation.
As mentioned above, Dnmt1 is a maintenance MTase that modifies mainly monomethylated sites. A deletion of the CXXC domain and part of the CXXC–BAH1 linker results in a 7-fold decrease in the Dnmt1 affinity for monomethylated DNA relative to unmethylated DNA. The same effect is observed after a substitution of two residues (K686A/Q687A) in the CXXC domain that form contacts with the guanine bases in the recognition site.
Addition of dimethylated DNA stimulates murine Dnmt1 to methylate unmodified sites. Such an allosteric activation of Dnmt1 results in lowering its specificity [13, 29, 34, 100]. This effect depends on the presence of Zn2+ ions and seems to be provided by the binding of the Dnmt1 residues 613–748 with dimethylated DNA . This Dnmt1 region includes the CXXC domain. However, it remains unclear how the CXXC domain could bind dimethylated DNA.
5.1.5. BAH domains
The BAH1 and BAH2 domains (bromo-adjacent homology) are parts of the so-called PBHD domain (polybromo homology domain). This domain is present in some transcription regulators and is thought to participate in protein–protein interactions that lead to gene repression. The BAH1 and BAH2 domains in the Dnmt1 molecule are connected by an α-helix and are arranged in a dumbbell shape (Figure 11). Three Cys residues and one His residue coordinate a Zn2+ ion that keeps the BAH1 domain near the linker α-helix. Despite the low similarity of their primary structures, both BAH domains have the same fold (Figure 11): the N-terminal subdomain consists of three antiparallel β-strands, and the following subdomain consists of five antiparallel β-strands. Some smaller β-strands and loops located further along the chain are not homologous in different BAH domains.
Both BAH domains are physically associated with the MTase domain. Seven β-strands of the catalytic subdomain and two β-strands of the BAH1 domain form a common β-sheet. The BAH2 domain has a long loop (BAH2–TRD loop) which interacts with the TRD region of the MTase domain (Figure 11). Perhaps, this interaction prevents the TRD binding to DNA in the complex of murine Dnmt1(650–1602) with AdoHcy and 19 bp DNA duplex (PDB code: 3pt6) (Figure 9). The BAH1 and BAH2 domains have large solvent-accessible surfaces and thus could serve as platforms for interaction with other proteins.
The structure of the Dnmt1(650–1602) complex with the 19 bp DNA duplex and AdoHcy (PDB code: 3pt6) suggests an autoinhibition mechanism of Dnmt1: binding of the CXXC domain to DNA results in DNA removal from the active center. The negatively charged CXXC–BAH1 linker is located between the DNA and the active center and thus prevents DNA from entering the catalytic pocket (Figure 12). In addition, the BAH2–TRD loop holds the TRD away from the DNA, preventing its interaction with the major groove.
5.1.6. KG linker
Peptides containing (LysX)nLys sequences (where X = Gly, Ala or Lys) can effectively bind to Z-DNA and stabilize it even at low NaCl concentrations (10–150 mM) and physiological pH. Such peptides can also induce the transition of B-DNA into Z-DNA. The efficacy of this process grows as the number of repeats (n) increases [101, 102]. (LysX)nLys sequences are found in different proteins of plants, animals and unicellular eukaryotes. For example, they are present in the linkers which connect the N- and C-terminal parts of eukaryotic MTases [102-104]. The number of repeats (n) varies from 5 to 7 in this case. The sequences of the linker and the preceding protein region are highly conserved. Thus, the linker is thought to have an important but still unknown function.
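The (LysX)nLys pattern described above lends itself to a simple sequence search. The following sketch scans a protein sequence in one-letter code for runs with at least a given number of Lys-X repeats (X = Gly, Ala or Lys) followed by a terminal Lys. The function name is illustrative, and the example sequence is a made-up toy string, not a fragment of any real MTase.

```python
import re


def find_kx_repeats(protein: str, min_n: int = 5):
    """Return (start, n, match) for each (LysX)nLys run with n >= min_n, X in {G, A, K}.

    `start` is the 0-based index of the first Lys of the run.
    """
    pattern = re.compile(r"(?:K[GAK]){%d,}K" % min_n)
    hits = []
    for match in pattern.finditer(protein.upper()):
        # A run of n Lys-X pairs plus the terminal Lys has length 2n + 1.
        n = (len(match.group()) - 1) // 2
        hits.append((match.start(), n, match.group()))
    return hits


# Toy sequence (hypothetical): six Lys-Gly repeats followed by Lys.
toy_sequence = "MSTA" + "KG" * 6 + "K" + "LDE"
print(find_kx_repeats(toy_sequence))
```

The default threshold of 5 repeats follows the lower bound quoted above for the linkers of eukaryotic MTases; it can be lowered to look for shorter runs.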
The most effective DNA transition into Z-form is observed for the sequences with alternating purine and pyrimidine nucleosides, including 5′-(CG)n-3′ and especially when the cytosine residues are methylated at the C5 atoms. Additionally, the transition into Z-form is stimulated by high ionic strength in solution and by DNA negative supercoiling . The KG linker probably promotes Dnmt1 binding to 5′-CG-3′ islands in Z-form. However, Dnmt1 is not able to methylate Z-DNA . Perhaps, the KG linker participates in Dnmt1 targeting to the regions located behind a replication fork (due to their negative supercoiling when DNA polymerase has just passed) .
5.2. Functional domains of mammalian Dnmt3 family
Dnmt3 family includes C5-DNA MTases Dnmt3a and Dnmt3b, which are considered de novo MTases, and Dnmt3L – a catalytically inactive protein. There are also different isoforms of the proteins in the Dnmt3 family. Dnmt3a and Dnmt3b are expressed in embryonic cells and during gametogenesis . Knockout of the corresponding genes suppresses de novo methylation. Mouse embryos where the Dnmt3b gene is knocked out die in utero while mouse embryos lacking Dnmt3a gene die soon after birth . Dnmt3b methylates microsatellite repeats. In humans, Dnmt3b point mutations decreasing its enzymatic activity lead to a severe disease – ICF syndrome (immunodeficiency, centromere instability, facial abnormalities syndrome) [108-110].
Like Dnmt1, the Dnmt3 family enzymes consist of an N-terminal regulatory part and a C-terminal part containing the conserved motifs typical of C5-DNA MTases (Figure 8). The catalytic domains of Dnmt1 and Dnmt3 enzymes are homologous, while the N-terminal parts share no similarity. Thus, these families seem to have evolved from different prokaryotic predecessors [111, 112]. In contrast to Dnmt1, Dnmt3a and Dnmt3b lack intramolecular interactions between the N-terminal and the C-terminal parts.
The N-terminal parts of Dnmt3a and Dnmt3b contain two domains:
PWWP domain (named after a conserved Pro-Trp-Trp-Pro motif);
cysteine-rich domain ADD (ATRX–Dnmt3–Dnmt3L), also called PHD domain (plant homeodomain).
There is no PWWP domain in Dnmt3L. Moreover, Dnmt3L lacks some catalytic residues and the MTase motifs IX and X in the C-terminal domain. Dnmt3L functions as a stimulator of Dnmt3a and Dnmt3b enzymatic activity. Dnmt3L knockout mice are viable but the males are sterile while the females do not produce viable offspring [113-116].
5.2.1. PWWP domain
Dnmt3a and Dnmt3b contain a PWWP domain as well as some other chromatin-associated proteins. The PWWP domain is named after a conserved ProTrpTrpPro motif, though the first Pro is substituted with Ser in Dnmt3a and Dnmt3b. The second part of the motif is always the same, TrpPro . PWWP domain along with chromo domain, Tudor and MBT domains belongs to the Tudor domain “Royal family” [118, 119]. The members of this family are shown to bind modified lysine residues of histones . The PWWP domain consists of 100–130 amino acid residues. Its structure includes 3 motifs: a β-barrel core, an insertion between the second and the third β-strands, and a C-terminal α-helical bundle (Figure 13). Three aromatic residues form a cleft in the center of the β-barrel that is a distinctive feature of the “Royal family”. The insertion motif varies in length and secondary structure among the different PWWP domains. The C-terminal α-helical bundle can contain from 1 to 5 α-helices .
The PWWP domain of Dnmt3b has a positively charged surface with an approximate area of 45 × 32 Å2 and can bind DNA nonspecifically [117, 121]. The PWWP domain binds 30 bp duplexes with unmethylated, mono-, and dimethylated 5′-CG-3′/3′-GC-5′ sites. However, it can also bind a nonspecific duplex of the same length with the same efficiency . Deletion of the PWWP domain does not influence the Dnmt3b methylation efficiency in vitro . On the contrary, the PWWP domain of Dnmt3a is almost unable to bind DNA .
The PWWP domains of Dnmt3a and Dnmt3b are necessary for targeting these MTases to pericentromeric heterochromatin [121-123]. Deletions in the PWWP domain change the distribution of the protein in the nucleus and therefore make it unable to methylate satellite repeats. However, such mutants are catalytically active, since they are able to methylate DNA in other regions. In humans, the S282P point mutation in the PWWP domain of Dnmt3b causes the ICF syndrome. In this case, the disease is likely to be caused by improper distribution of the enzyme in nuclei rather than by insufficient catalytic activity.
The PWWP domain of Dnmt3a specifically binds trimethylated Lys36 of histone H3 (H3K36me3), which increases the ability of Dnmt3a to methylate nucleosomal DNA. The distribution of methylated sequences in a genome correlates with the presence of H3K36me3 [126, 127]. DNA methylation and H3K36me3 serve as marks for histone deacetylation and the subsequent gene suppression. There is no crystal structure of the Dnmt3a or Dnmt3b PWWP domain in complex with a histone protein or a peptide. The resolved spatial structures have a cleft that is thought to bind methylated lysine residues. In the crystal, this cleft contains a molecule of Bis-tris buffer (bis(2-hydroxyethyl)amino-tris(hydroxymethyl)methane) positioned similarly to the lysine in complexes of other PWWP domains with histone peptides containing di- or trimethylated lysine residues.
5.2.2. ADD domain
ADD domain is found only in the following proteins: ATRX (alpha thalassemia/mental retardation syndrome X-linked), Dnmt3a, Dnmt3b, and Dnmt3L. Thus, the domain is called ADD (ATRX–Dnmt3–Dnmt3L) . To date, the crystal structures of ADD domain from Dnmt3a, Dnmt3L, and ATRX proteins are resolved. The structural organization of the domain remains the same in all the cases: it contains two C4-type zinc fingers [129-131]. One of them is similar to GATA binding protein 1 (GATA1) and the other one – to plant homeodomain (PHD).
The ADD domain contains many cysteine residues that bind Zn2+ and mediates the interaction of Dnmt3a with many other proteins, such as the transcription factors PU.1, Myc, and RP58, histone deacetylase HDAC1, heterochromatin protein HP1, the histone MTases SUV39H1, SETDB1 and EZH2, the methyl-CG-binding protein Mbd3, and the chromatin remodeling factor Brg1. The functions of most of these interactions remain unclear. Additionally, the ADD domains of Dnmt3a, Dnmt3b, and Dnmt3L have been shown to interact specifically with the N-terminal part of histone H3 when its Lys4 is not modified. This interaction stimulates Dnmt3a to methylate DNA [130-132]. Thus, the ADD domain of C5-DNA MTases can induce DNA methylation in response to specific histone modifications.
5.3. Domains of plant DNA methyltransferases
Plant DNA MTases are very diverse but have so far been studied much less than the mammalian DNA MTases. In particular, none of the plant MTases has been crystallized.
DNA MTase Met1 from Arabidopsis thaliana is quite similar to mammalian Dnmt1 (Figure 15). The C-terminal domains of these enzymes which are responsible for methylation share 50% identity. Both proteins possess an extended N-terminal part that is connected with the C-terminal domain via KG linker. The N-terminal parts of Met1 and Dnmt1 share 24% identity . There are four similar genes encoding Met1 in A. thaliana whereas only one gene encodes the mammalian Dnmt1. Genomes of Daucus carota and Zea mays contain two Met1 homologs. The N-terminal part of Met1 has two BAH domains. These domains seem to serve as a platform for protein–protein interactions which result in inhibition of gene expression thus providing an interconnection of DNA methylation, replication, and transcription regulation .
Homologs of the Dnmt3 family are also found in plants. They form the DRM family (domains rearranged methyltransferase). A peculiarity of these enzymes is a circular permutation of their conserved motifs: motifs VI–X are followed by motifs I–V. A circular permutation of the conserved motifs is also found in some prokaryotic enzymes such as the C5-DNA MTase BssHII. Another characteristic of the DRM proteins is their ability to methylate 5′-CHG-3′ and 5′-CHH-3′ sites (H = A, C or T) de novo in an RNA-dependent manner. Such DNA methylation is thought to take place in the presence of short RNAs that guide methylation of homologous DNA. The DRM1 MTase from Nicotiana tabacum has been shown to avoid cytosine methylation in 5′-CG-3′/3′-GC-5′ sites rather than to recognize any sequence specifically. The structural basis of this unusual behavior has not yet been clarified.
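The sequence contexts mentioned above (CG, CHG and CHH, where H = A, C or T) can be made concrete with a short sketch that classifies a cytosine by the two bases following it on the same strand. The function name and the example sequence are illustrative; only the strand supplied by the caller is considered, and positions too close to the 3′ end are left unclassified.

```python
H_BASES = set("ACT")  # H = A, C or T


def cytosine_context(seq: str, i: int):
    """Classify the cytosine at 0-based index i (5'->3') as 'CG', 'CHG' or 'CHH'.

    Returns None if the base is not C or if the context cannot be determined.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]   # base immediately 3' of the cytosine ('' at the end)
    second = seq[i + 2:i + 3]  # next base ('' if missing)
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


example = "ACGTCAGCTTC"  # hypothetical sequence
for index, base in enumerate(example):
    if base == "C":
        print(index, cytosine_context(example, index))
```

In this example the cytosines at indices 1, 4 and 7 fall into the CG, CHG and CHH contexts respectively, while the last cytosine has no downstream bases and is left unclassified.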
Proteins of the DRM family are found only in flowering plants. The C-terminal domains of DRM proteins share 28% identity with Dnmt3a or Dnmt3b and contain the same conserved catalytic motifs. The cysteine-rich regions of mammalian MTases (the RFTS, the CXXC, and the ADD domains) are found neither in proteins of the DRM family nor in other plant homologs of C5-DNA MTases. The N-terminal part of the DRM MTases contains several UBA (ubiquitin-associated) domains. The presence of UBA domains seems to provide a link between DNA methylation and ubiquitin-mediated protein degradation. These domains could promote degradation of DRM molecules at specific points of the cell cycle. UBA domains are not found in any other family of DNA MTases. The UBA domains of the DRM2 MTase in A. thaliana were experimentally shown to be required for normal RNA-directed DNA methylation. Perhaps these domains are essential for proper localization of MTases in the cell.
C5-DNA MTases of the chromomethylase (or chromomethyltransferase, CMT) family modify 5′-CNG-3′/3′-GNC-5′ sites (N = A, C, G or T) in plant genomes. These MTases contain chromo domains that were identified as conserved sequences between MTase motifs II and IV (Figure 15). The chromo domain (chromatin organization modifier) consists of 50 amino acid residues and contains three β-strands and a perpendicularly packed α-helix. This type of fold belongs to the OB class (oligonucleotide/oligosaccharide binding fold) and is considered to be evolutionarily very old. The chromo domain is responsible for DNA binding. This domain was originally found in polycomb-group proteins, where it is important for the association of the protein with heterochromatin. Therefore, CMTs are thought to modify heterochromatin. The chromo domain is not found in other C5-DNA MTases. The A. thaliana genome contains 3 members of the CMT family: CMT1, CMT2, and CMT3. Members of this family are also found in the genomes of Oryza sativa and Brassica oleracea. The functions of CMT1 and CMT2 are still unknown, while CMT3 seems to methylate 5′-CNG-3′/3′-GNC-5′ sites. CMT3 deficiency in A. thaliana results in loss of DNA methylation in centromeric regions and also leads to retrotransposon activation [141, 142].
The enzymes of the CMT family contain one BAH domain in their N-terminal region, in contrast to the MTases of the Met1 family, which possess two BAH domains.
Precise time and space coordination of different molecular events underlies development of all living organisms, unicellular as well as multicellular. Synchronization of molecular processes can take place at a transcriptional level (when the same DNA-binding protein regulates expression of several genes) or at a post-translational level (when a multifunctional protein participates in different processes). Multifunctionality of a protein can be based on the presence of several domains in a single polypeptide chain. For example, mammalian Dnmt1, besides the catalytic domain which provides DNA methylation, contains several other domains responsible for Dnmt1 cellular localization, its interaction with other proteins, and regulation of its catalytic domain activity. Among prokaryotes, multidomain proteins are less common. Nevertheless, some bacterial DNA methyltransferases contain additional domains which are responsible for transcription regulation, topoisomerase activity etc.
As shown in this review, the structural and functional features of the additional domains in C5-DNA MTases have not yet been studied sufficiently. On the basis of the existing data, it is impossible to draw a definitive conclusion about the effect of the additional domains on the methylating activity of C5-DNA MTases. Evidently, comprehensive research on multifunctional DNA MTases with multidomain organization would be most promising.
The authors would like to thank Mrs. Anna Nazarenko for her technical assistance. The work was supported by the Russian Foundation for Basic Research (grants no. 10-04-01578 and 12-04-32103).