Bifunctional Prokaryotic DNA-Methyltransferases



Introduction
Restriction-modification systems (RMS) are prokaryotic tools for defence against the invasion of cells by foreign DNA [1]. They reduce horizontal gene transfer, thus stimulating microbial biodiversity. Usually, they consist of a restriction endonuclease (REase) and a modification DNA methyltransferase (MTase), both recognising the same short sequence of 4-8 nucleotides. The MTase transfers a methyl group to adenine or cytosine nucleotides within the target sequence, thus preventing its hydrolysis by the cognate REase. To date, more than 20 000 different RMS have been collected in REBASE, the database holding all known, and many putative, RMS [2]. Many of these RMS have a head-to-tail gene orientation, which, we hypothesise, makes gene fusion possible through point mutations or genome rearrangements such as deletions, insertions, inversions or translocations. Such events could be responsible for the origin of bifunctional restriction enzymes of type IIC [3], such as AloI, BcgI, BseMII, BseRI, BsgI, BspLU11III, CjeI, Eco57I, HaeIV, MmeI, PpiI, TstI and TspGWI; of bifunctional MTases such as FokI and LlaI; and of regulatory SsoII-related MTases [4].
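Whether two RMS genes lie in head-to-tail orientation, the arrangement hypothesised above to favour fusion, can be read off their annotated coordinates. The following is a minimal sketch under stated assumptions: the `Gene` record, the 1-based inclusive coordinates and the `max_gap`/`max_overlap` cut-offs are hypothetical illustrations, not part of the authors' analysis. A negative gap denotes overlapping genes, as when a stop codon and the next start codon overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation with 1-based, inclusive coordinates."""
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def intergenic_gap(upstream: Gene, downstream: Gene) -> int:
    """Nucleotides between the 3' end of `upstream` and the 5' end of
    `downstream`, read along their strand; negative values mean overlap."""
    if upstream.strand == "+":
        return downstream.start - upstream.end - 1
    # On the minus strand the genes are read towards lower coordinates.
    return upstream.start - downstream.end - 1


def head_to_tail(upstream: Gene, downstream: Gene,
                 max_gap: int = 100, max_overlap: int = 20) -> bool:
    """True if both genes are on the same strand and `downstream` begins just
    after (or slightly overlapping) the end of `upstream`.
    The default cut-offs are illustrative assumptions only."""
    if upstream.strand != downstream.strand:
        return False
    gap = intergenic_gap(upstream, downstream)
    return -max_overlap <= gap <= max_gap


# Hypothetical example: a REase gene followed by an MTase gene sharing 4 nt ("ATGA").
rease = Gene("REase", 100, 999, "+")
mtase = Gene("MTase", 996, 2000, "+")
print(intergenic_gap(rease, mtase), head_to_tail(rease, mtase))  # -4 True
```

Swapping the arguments, or placing the genes on opposite strands, makes `head_to_tail` return `False`, which is the distinction between tandem and other RMS gene arrangements.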
In our previous work we demonstrated that a fully functional hybrid polypeptide can originate through gene fusion, taking the Eco29kI RMS as an example. In this RMS, the REase gene precedes the MTase gene and their stop and start codons overlap. By site-directed mutagenesis we joined these two ORFs into one and characterised the resulting protein, which carries both REase and MTase activities [5]. Its REase activity was decreased threefold and the optima of the catalytic reaction changed, whereas its MTase activity remained intact [5][6][7]. Such a bifunctional enzyme could change further in the course of evolution, leading to further divergence of its properties and functions in the cell. We hypothesise that this example illustrates a molecular mechanism for the origin of new bifunctional RMS. In the current work, based on genomic data and their bioinformatic analysis, we aimed to show that gene fusion could play an important role in the evolution of methyltransferases and in the origin of multidomain eukaryotic methyltransferases.
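The end result of such a fusion, one continuous reading frame encoding both polypeptides, can be illustrated in silico. This is a minimal sketch on hypothetical toy sequences, not the mutagenesis actually performed on Eco29kI: it simply drops the upstream stop codon so that translation continues into the downstream ORF, whose start methionine becomes an internal residue. Because overlapping stop and start codons necessarily place the two ORFs in different reading frames, a real fusion also requires a frame adjustment, which this sketch does not model.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial genetic code


def split_codons(seq: str) -> List[str]:
    """Split a coding sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_orf(seq: str) -> bool:
    """ATG start, a single stop codon at the very end, length divisible by 3."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    cods = split_codons(seq)
    return (cods[0] == "ATG"
            and cods[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in cods[:-1]))


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two complete ORFs into one by removing the upstream stop codon."""
    if not (is_orf(upstream) and is_orf(downstream)):
        raise ValueError("both inputs must be complete ORFs")
    return upstream.upper()[:-3] + downstream.upper()


# Hypothetical toy ORFs: "REase" encodes M-K-stop, "MTase" encodes M-P-stop.
fused = fuse_in_frame("ATGAAATGA", "ATGCCCTAA")
print(fused, is_orf(fused))  # ATGAAAATGCCCTAA True
```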
For the current work we searched protein databases as described below and found 76 new potential bifunctional MTases. Their structural organisation is presented and discussed in this report. In addition, we analysed the structural organisation of the 627 non-putative prokaryotic DNA methyltransferases available out of the 980 non-putative methyltransferases collected to date in REBASE. Apart from "canonical" MTases, the most frequently observed structural type is represented by SsoII-related methyltransferases, which are capable of serving as transcriptional autoregulatory proteins. These data provide additional evidence that gene fusion might play an important role in the evolution of methyltransferases, restriction-modification systems and other DNA-modifying proteins. We also discuss the general consequences of a hypothetical protein fusion event with methyltransferases and RMS enzymes.

Method used
For this study, 10619 methyltransferase and 3250 restriction enzyme sequences were downloaded from the REBASE database [2] and searched against the non-redundant protein database downloaded from the NCBI FTP site. The similarity search was carried out with BLAST version 2.2.23+ [8] on a Linux server using the default BLAST parameters. The local pairwise alignment hits were then filtered using the following criteria. A match to a methyltransferase was kept if its E-value was less than 1e-140, the sequence identity of the aligned region was greater than 80% and the subject sequence was at least twice as long as the query; this yielded 272 candidate hits. A match to a restriction enzyme was kept if its E-value was less than 1e-140, the sequence identity within the aligned region was greater than 80% and the subject sequence was at least 1.5 times as long as the query; this yielded 28 candidate matches. The candidate matches were then analysed manually.
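The filtering step can be sketched as follows. The output format is our assumption, since it is not stated above: we suppose BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident evalue qlen slen"`, where `pident` is the identity over the aligned region. The thresholds are the ones given in the text; the `Hit` record, the function names and the example rows are hypothetical, and collecting unique subject IDs is only one plausible way of counting candidates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set

MAX_EVALUE = 1e-140    # E-value must be below this
MIN_IDENTITY = 80.0    # % identity over the aligned region must exceed this
MTASE_LEN_RATIO = 2.0  # subject >= 2 x query length for MTase queries
REASE_LEN_RATIO = 1.5  # subject >= 1.5 x query length for REase queries


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    evalue: float
    qlen: int
    slen: int


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse rows of '-outfmt "6 qseqid sseqid pident evalue qlen slen"'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        q, s, pident, evalue, qlen, slen = line.rstrip("\n").split("\t")
        yield Hit(q, s, float(pident), float(evalue), int(qlen), int(slen))


def keep(hit: Hit, len_ratio: float) -> bool:
    """Apply the E-value, identity and subject-length criteria."""
    return (hit.evalue < MAX_EVALUE
            and hit.pident > MIN_IDENTITY
            and hit.slen >= len_ratio * hit.qlen)


def candidate_subjects(lines: Iterable[str], len_ratio: float) -> Set[str]:
    """Unique database proteins passing the filter, for manual inspection."""
    return {h.sseqid for h in parse_tabular(lines) if keep(h, len_ratio)}


# Hypothetical rows: query, subject, %identity, E-value, query len, subject len.
rows = [
    "M.X\tWP_1\t95.2\t0.0\t300\t650",     # kept: long subject, high identity
    "M.X\tWP_2\t99.0\t0.0\t300\t320",     # rejected: subject not twice as long
    "M.X\tWP_3\t75.0\t1e-150\t300\t700",  # rejected: identity <= 80%
]
print(candidate_subjects(rows, MTASE_LEN_RATIO))  # {'WP_1'}
```

The length-ratio test is what selects fusion candidates: a subject much longer than a highly similar query must carry extra sequence, such as a second enzymatic domain.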

Methyltransferase fusions with a restriction endonuclease
Newly found potential bifunctional restriction and modification enzymes are presented in Table 1. With our BLAST search we extracted from the NCBI database 22 fusions of a DNA methyltransferase with a restriction endonuclease, carrying both endonuclease and methyltransferase domains in one polypeptide. As can be seen from Table 1, the enzymes were grouped according to their domain organisation as presented in the Conserved Domain Database [9]. Judging from their domain organisation, 20 of the new bifunctional REases are thought to represent fusions of a REase with the MTase and target recognition subunits of type I restriction-modification systems (R-M-S structure), similar in organisation to the known type IIC bifunctional enzymes AloI [10], CjeI [11], MmeI [12], PpiI [13], TstI [13] and TspGWI [14]. Type I RMS enzymes are multisubunit proteins that function as a single complex consisting of R, M and S subunits [3]. The S subunit is the specificity subunit that determines which DNA sequence is recognised, the R subunit is essential for cleavage (restriction) and the M subunit catalyses the methylation reaction. These subunits are designated HsdR, HsdM and HsdS, respectively. Covalent linking of these subunits in one polypeptide is not expected to interfere with their catalytic activities, which makes a successful fusion possible. Currently, REBASE contains more than 8000 entries corresponding to type I RMS. Hypothetically, any of these RMS could be joined, naturally or artificially, to give a new bifunctional RMS.
One RMS, from Bacteroides sp. D22, probably originated from the fusion of type III enzymes. It has conserved motifs similar to those of the Eco57I protein, which consists of the Mod and Res subunits of type III enzymes [15]. Type III systems are composed of two genes (mod and res) encoding protein subunits that function in one protein complex, either in DNA recognition and modification (Mod) or in restriction (Res) [3]. As in the case of type I enzymes, in-frame fusion might not affect their normal functioning and could therefore also favour the origin of new bifunctional proteins.
One of the RMS found, from Arthrospira maxima CS-328, has a different domain organisation, belonging to the R-M type of structure (a type II REase and MTase fusion). It could have originated from the fusion of head-to-tail oriented type II REase and MTase genes. That a new bifunctional RMS can in principle arise by this mechanism was proven by in-frame joining of the type II Eco29kI REase and MTase genes [5]. The resulting RMS was capable of defending host cells from phage invasion, although 100 times less effectively than the wild type. In a similar way, a new bifunctional RMS could appear from other head-to-tail oriented RMS of type II such as AccI, BanI, Bsp6I, BsuBI, Cfr9I, DdeI, EagI, EcoPI, EcoP15, EcoRI, FnuDI, HaeIII, HgiBI, HgiCI, HgiCII, HgiDI, HgiEI, HgiGI, HhaII, HincII, HindIII, HinfI, HpaI, MboII, MwoI, NcoI, NdeI, NgoMI, NgoPII, NlaIII, PaeR7I, RsrI, SalI, Sau3A, Sau96I, TaqI, TthHB8I, XbaI and XmaI [16]. Type II restriction and modification enzymes work separately, and their fusion could create steric difficulties for their functionality. Indeed, in the case of the RM.Eco29kI enzyme, its REase activity decreased threefold in comparison with the initial R.Eco29kI nuclease. This is perhaps why the Arthrospira maxima enzyme is the only natural bifunctional RMS found that originates from a type II RMS.
Table 1. Methyltransferase fusions with a restriction endonuclease. Domains are predicted and presented as in the Conserved Domain Database [9]. Different domains are shown by different fillings, and their classification is shown under the table. The short description of each protein found includes the GenBank accession number, length in amino acids, current name in the database and host strain information.

Fusions between two DNA methyltransferases
As shown in Table 2, in contrast to the situation with bifunctional MTase-REase fusions, of the 54 newly found fusions between two methyltransferases, 49 apparently result from the joining of two different methyltransferase ORFs (M-M structure) and 5 from the joining of a methyltransferase (HsdM) and a target recognition subunit (HsdS; M-S structure). Eleven M-M type proteins represent interesting examples of dcm and dam methyltransferase fusion. Here, dam denotes not one particular MTase but a conserved domain common to DNA adenine methyltransferases, as adopted from the Conserved Domain Database website [9]. Similarly, dcm denotes a conserved domain common to DNA cytosine methyltransferases. These MTases catalyse methyl group transfer to different bases: adenine in the case of dam and cytosine in the case of dcm. It could be suggested that they originally belonged to two different genes that were joined in-frame by chance. The post-segregation killing effect of restriction-modification enzymes prevents RMS from being lost [22], thus promoting the maintenance of a fused ORF and its spread in bacterial populations. This effect is analysed in more detail in the next section. Another 23 bifunctional MTases of the M-M type probably originated from a similar joining of two dam methyltransferase genes (Table 2). Further evolution of bifunctional MTases depends on their involvement in RMS functioning. If both activities are critical for the RMS to work, for example when modifying two different bases of an asymmetric recognition sequence [23], they will be maintained as elements of this RMS. If the activity of at least one MTase domain of the bifunctional enzyme is redundant, it could accumulate mutations and, over many generations, lose its activity or acquire a new substrate specificity and function in the cell.
Table 2. Fusions between two DNA methyltransferases. Domains are predicted and presented as in the Conserved Domain Database [9]. Different domains are shown by different fillings, and their classification is shown under the table. The short description of each protein found includes the GenBank accession number, length in amino acids, current name in the database and host strain information.
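The grouping of Tables 1 and 2 by domain organisation can be illustrated with a small classifier that maps the N-to-C order of predicted domains onto the letter codes used in the text (R, M, S). This is a hypothetical sketch, not the procedure used to build the tables: the simplified domain names in `DOMAIN_CLASS` stand in for Conserved Domain Database hits, and the example proteins are invented.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical mapping from simplified CDD domain names to the codes used in
# the text: R = endonuclease, M = methyltransferase, S = specificity (TRD).
DOMAIN_CLASS: Dict[str, str] = {
    "HsdR": "R", "TypeII_REase": "R", "Res": "R",
    "dam": "M", "dcm": "M", "HsdM": "M", "N6_N4_Mtase": "M", "Mod": "M",
    "HsdS": "S",
}


def structure_type(domains: Sequence[str]) -> str:
    """Return an 'R-M-S'-style label for domains listed N- to C-terminus.
    Domains not in DOMAIN_CLASS (e.g. HTH motifs) are ignored."""
    return "-".join(DOMAIN_CLASS[d] for d in domains if d in DOMAIN_CLASS)


def mm_subtype(domains: Sequence[str]) -> str:
    """Name the dam/dcm domains of an M-M fusion, e.g. 'dam-dcm'.
    Domain order is not preserved; names are sorted alphabetically."""
    return "-".join(sorted(d for d in domains if d in ("dam", "dcm")))


# Invented example proteins, domains listed N- to C-terminus.
proteins = {
    "prot_A": ["HsdR", "HsdM", "HsdS"],
    "prot_B": ["dcm", "dam"],
    "prot_C": ["dam", "dam"],
    "prot_D": ["HsdM", "HsdS"],
}
counts = Counter(structure_type(d) for d in proteins.values())
print(sorted(counts.items()))  # [('M-M', 2), ('M-S', 1), ('R-M-S', 1)]
print(mm_subtype(proteins["prot_B"]), mm_subtype(proteins["prot_C"]))  # dam-dcm dam-dam
```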
For example, DNMT1, DNA methyltransferase 1 from H. sapiens [24], contains a conserved m5C MTase domain in its C-terminal part. The N-terminal and central parts of the enzyme include several other domains, such as a DMAP-binding domain, a replication foci domain, a zinc finger domain and two bromo-adjacent homology domains (Figure 1a). It could be suggested that several gene fusion events were involved in its evolution. This hypothesis is supported by the existence of simpler homologues of DNMT1, for example M.AimAII (Figure 1b) and M.AimAI (Figure 1d) from Ascobolus immersus, and M.NcrNI from Neurospora crassa (Figure 1c), which look like incompletely assembled versions of DNMT1 with one or several domains missing. Fifteen other M-M type bifunctional MTases from Table 2 represent the fusion of a conserved 60 aa pfam12564 domain of type III RMS with a pfam01555 N4-N6 MTase domain, characteristic of both N4 cytosine-specific and N6 adenine-specific DNA methylases. The pfam01555 conserved domain can be found both in type II MTases, such as M.KpnI, and in type III Mod proteins, such as EcoP1I and EcoP15I. In contrast, the 60 aa pfam12564 domain is found only in several type III enzymes, and its addition could influence the biochemical properties of the corresponding proteins. To establish the nature of this influence experimentally, it would be necessary to compare the biochemical properties of MTases containing the 60 aa pfam12564 sequence with those of MTases lacking it.

Fusion of RMS enzymes with a hypothetical protein
Figure 2 shows the consequences of an RMS protein fusion with a hypothetical protein, X. The upper part of the figure represents normal RMS functioning, when host DNA is protected by an MTase and foreign DNA is degraded by the cognate REase. In the case of RMS loss, the MTase and REase enzymes will be diluted with each cell division, host DNA will become unprotected and will finally be degraded by the residual REase activity.
This effect is known as post-segregation killing [22] and is possibly due to REase activity lasting longer than MTase activity after the RMS loss. The lower part of the figure illustrates the fusion of some protein X with one of the RMS enzymes. If this fusion does not seriously affect the enzyme activities, the RMS will continue to protect cells from foreign DNA invasion, and in the case of RMS loss the same mechanism will lead to host DNA degradation. In this way, a protein fused with an RMS enzyme will be maintained and spread in bacterial populations. Indeed, during the BLAST search we observed close homologues (>98%) of bifunctional enzymes such as CjeI in numerous strains of the Campylobacter group. These observations are consistent with our hypothesis. Another example of the propagation of an RMS-fused protein is provided by the M.SsoII-related enzymes. These MTases represent the fusion of a regulatory protein with a C5-cytosine methyltransferase [4] and can be found in 63 different microbial taxa. If, on the other hand, fusion with a hypothetical protein is detrimental to the activity of one of the RMS enzymes, the outcome will depend on which enzyme is affected. If MTase activity is reduced, the corresponding RMS will be eliminated through host DNA degradation by the cognate REase. If REase activity is disturbed, the corresponding RMS will become non-functional, will not be maintained by the post-segregation killing mechanism and, after many generations, could disappear or take on different functions in the cell. A third, less probable, scenario arises if fusion with a hypothetical protein improves the properties of the RMS enzymes. In this situation, the corresponding RMS would protect host cells more effectively, increasing their selective advantage over competing microbial populations, which, in turn, could lead to a wider distribution of the RMS carrying the fused protein.

Domain organisation of non-putative DNA methyltransferases from REBASE
We analysed the structural organisation of prokaryotic DNA methyltransferases collected in REBASE, the major database of restriction-modification enzymes [2]. We were able to download the sequences of 627 of the 980 non-putative MTases listed in REBASE on 01.12.2011 (see Supplementary materials for their detailed description). According to the Conserved Domain Database [9], 190 of these 627 sequences belong to the dcm type of DNA methyltransferases, 172 to the N6-N4 type, 99 to the HsdM type and 78 to dam-related enzymes. Apart from "canonical" methyltransferases, which carry the conserved motifs responsible for AdoMet binding and methyl group transfer, the most frequently observed structure is a fusion of a C5-methyltransferase core domain with a regulatory DNA-binding protein; to date it includes 18 potential enzymes (Table 3). These SsoII-related methyltransferases carry additional DNA-binding HTH domains and are capable of serving as transcriptional autorepressors [4]. Among these 18 SsoII-like MTases, the HTH motif is located in the N-terminal part in most cases, and in the middle of the polypeptide chain in three proteins (M.Esp1396I, M.PflMI and M.SfiI; Table 3). The capacity for autoregulation has not been confirmed for most of these SsoII-like MTases and requires experimental verification, which is a promising direction for future research. The propagation of SsoII-like enzymes among different bacterial taxa illustrates well our analysis of RMS protein fusion with a hypothetical protein, described in the previous section.

Conclusion
Here we report 76 new potential bifunctional methyltransferases. Most of the proteins joined with a nuclease are thought to be fusions of a restriction nuclease with the methylase and target recognition subunits of type I restriction-modification systems (R-M-S structure). Most of the proteins joining two methylases appear to be dam-dcm and dam-dam fusions (M-M structure). Similar proteins could serve as structural intermediates in the evolution of multidomain eukaryotic methyltransferases. We suggest that fusion of a hypothetical protein with a restriction-modification enzyme can promote its propagation in bacterial populations. Altogether, our data illustrate the role of gene fusion in the evolution of restriction-modification enzymes.

Figure 2. Schematic representation of the post-segregation killing effect responsible for maintaining an RMS and proteins joined to one of its enzymes. The figure summarises the different outcomes of a hypothetical protein X joining one of the RMS enzymes. If this joining is neutral or positive for RMS functioning, it will be maintained and spread; if detrimental, it will be eliminated. Filled black circles indicate methylated nucleotides; broken lines indicate degraded DNA.
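The post-segregation killing logic summarised in Figure 2 can be explored with a deliberately simple toy model. Everything in it is an assumption for illustration, not a measured property of any RMS: after the RMS genes are lost, enzyme levels per cell are halved at every division and additionally decay with enzyme-specific half-lives (in generations). A cell is counted as killed in the first generation in which MTase falls below a hypothetical protection threshold while REase is still above a hypothetical cleavage threshold.

```python
from typing import Optional


def generations_until_killing(
    mtase_half_life: float,
    rease_half_life: float,
    protect_threshold: float = 0.1,   # hypothetical MTase level needed for protection
    cleave_threshold: float = 0.01,   # hypothetical REase level able to cleave host DNA
    max_generations: int = 50,
) -> Optional[int]:
    """Toy model of post-segregation killing after loss of the RMS genes.

    Enzyme levels start at 1.0 (relative units). Each generation they are
    halved by cell division and further reduced by first-order turnover with
    the given half-life in generations (float('inf') means no turnover).
    Returns the generation at which unprotected host DNA meets active REase,
    or None if REase falls below the cleavage threshold first (cell survives).
    """
    mtase = rease = 1.0
    for generation in range(1, max_generations + 1):
        mtase *= 0.5 * 0.5 ** (1.0 / mtase_half_life)
        rease *= 0.5 * 0.5 ** (1.0 / rease_half_life)
        if rease < cleave_threshold:
            return None  # residual REase too low to cleave host DNA
        if mtase < protect_threshold:
            return generation  # host DNA unprotected while REase persists
    return None


stable = float("inf")
# Unstable MTase, stable REase: host DNA is attacked after a few divisions.
print(generations_until_killing(mtase_half_life=1.0, rease_half_life=stable))  # 2
# Stable MTase, unstable REase: REase disappears first, so no killing.
print(generations_until_killing(mtase_half_life=stable, rease_half_life=0.5))  # None
```

The point is qualitative: killing requires residual REase activity to outlast MTase protection, which is the condition proposed in the text for the maintenance of an RMS, and of any protein fused to one of its enzymes.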