Duplication of Coagulation Factor Genes and Evolution of Snake Venom Prothrombin Activators

Snake venom is a complex mixture of pharmacologically active molecules that are responsible for the immobilization, paralysis, death and digestion of prey organisms. This armory of toxins has evolved to target two key systems, the neuromuscular and circulatory systems, in order to induce rapid immobilization and death. So far, several hundred protein toxins from snake venoms have been purified and characterized. Most of these toxins are structurally, and at times functionally, similar to proteins expressed in different tissues of the body. For example, elapid phospholipase A2 toxins are structurally and catalytically similar to mammalian pancreatic phospholipase A2 enzymes (Doley et al. 2009). Similarly, sarafotoxins are structurally and functionally similar to endothelins, which are produced primarily in the endothelium (Landan et al. 1991a). Based on such structural and functional similarities, it is hypothesized that toxin proteins are “recruited” from body proteins by gene duplication (Fry 2005): the genes of body proteins are duplicated and then modified so that they are differentially and specifically expressed in venom glands. This “recruitment” of body proteins has been observed not only in snakes but also in other venomous animals, such as cone snails, spiders, scorpions and sea anemones, as well as in hematophagous animals (Fry et al. 2009). Although this overarching concept has existed in the field of snake venom toxins for decades, there is little direct molecular evidence for the process. Our laboratory has extensively characterized prothrombin activators from Australian elapid snake venoms and documented their structural and functional similarity to mammalian plasma coagulation factors.
Through systematic, detailed studies, we provided the molecular details of the “recruitment” of venom prothrombin activators from plasma coagulation factors after gene duplication. We also identified several key structural changes that make these prothrombin activators better toxins. In this chapter, we will describe the first molecular evidence for the “recruitment” process and the evolution of prothrombin activators in venoms of Australian elapid snakes.

function of glycosylation is hypothesized to protect trocarin D from inactivation by proteolysis and confer it with thermal stability (Rao et al. 2003a; Wang et al. 1996). Although human FX contains two O-glycosylation and two N-glycosylation moieties (Inoue and Morita 1993), all the carbohydrate moieties are exclusively found on the activation peptide, which is removed when FX is activated. Therefore, activated mammalian FXa is not glycosylated. In contrast, the short activation peptide of trocarin D does not have any glycosylation sites. The last difference in post-translational modification is that the Asp63 of FX light chain EGF-I domain is β-hydroxylated (McMullen et al. 1983; Stenflo et al. 1987), but the corresponding residue in trocarin D is not (Joseph et al. 1999) (Figure 1).

Group C prothrombin activators
The group C snake venom prothrombin activators are also found exclusively in the venoms of Australian elapid snakes (Rosing and Tans 1991). Oscutarin from Oxyuranus scutellatus venom was the first member of this group to be isolated and characterized (Owen and Jackson 1973; Speijer et al. 1986; Walker et al. 1980; Welton and Burnell 2005). These proteins are generally ~300 kDa in size and comprise two subunits (~60 and ~220 kDa) (Table 1). The smaller, enzymatic subunit is a serine proteinase with the characteristics of FXa, whereas the nonenzymatic subunit resembles activated mammalian plasma coagulation factor V (FVa). Overall, group C prothrombin activators strikingly resemble the mammalian plasma coagulant "FXa-FVa" complex and have similar co-factor requirements (Filippovich et al. 2005; Masci et al. 1988; Rao and Kini 2002; Speijer et al. 1986; Walker et al. 1980) (Table 1).
Pseutarin C was purified from P. textilis venom, and it activates prothrombin to thrombin. For its optimal activity, pseutarin C requires only Ca2+ ions and phospholipids (Rao and Kini 2002). These functional characteristics are similar to those of the mammalian "FXa-FVa" complex. As with other group C prothrombin activators, pseutarin C comprises two subunits of ~60 kDa and ~220 kDa (Rao and Kini 2002) (Table 1). The smaller subunit, which has serine proteinase activity, was termed the pseutarin C catalytic subunit (PCCS), and the larger subunit, which has no enzymatic activity, was termed the pseutarin C nonenzymatic subunit (PCNS) (Rao and Kini 2002). A comparison of protein quantities in venom and plasma revealed that the level of pseutarin C in the venom is ~4,200 times higher than the level of FV and FX in the plasma (Rao and Kini 2002). We purified pseutarin C and its subunits and characterized them both functionally and structurally as representatives of the group C prothrombin activators.
1. Pseutarin C catalytic subunit (PCCS) - Functionally, PCCS is similar to mammalian FXa and group D prothrombin activators (Rao and Kini 2002). They have the same co-factor requirements, including Ca2+ ions, phospholipids and FVa, for their optimal activity, and they activate prothrombin by cleaving the same two peptide bonds (Arg274-Thr275 and Arg323-Ile324). The enzymatic activity (Vmax) of PCCS is enhanced by the presence of FVa (Rao and Kini 2002). The amino acid sequence of PCCS and its precursor was determined using both Edman degradation and cDNA sequencing (Rao et al. 2004; Rao and Kini 2002). Structurally, PCCS is also similar to mammalian FXa and group D prothrombin activators (Figure 1). Its sequence shows ~42% identity to mammalian FXa and 74-83% identity to group D prothrombin activators (Rao et al. 2004). Like mammalian FXa and group D prothrombin activators, PCCS consists of a light and a heavy chain that are linked by a single disulfide bond (Rao et al. 2004). The light chain has a Gla domain followed by two EGF-like domains, and the heavy chain contains a serine proteinase domain (Figure 1). Despite these functional and structural similarities, PCCS differs from mammalian FXa in the size of its activation peptide and in its post-translational modifications (Rao et al. 2004). As in the trocarin D precursor, the activation peptide of the PCCS precursor is 27 residues long and is significantly shorter than those of mammalian FXs (Figure 1). Like trocarin D, PCCS has an insertion in its heavy chain. Interestingly, the PCCS insert is 13 residues long and is distinctly different from the 12-residue insert in trocarin D (Rao et al. 2004; Reza et al. 2005b). This strongly indicates that groups C and D prothrombin activators evolved independently. As in trocarin D, Ser52 of the light chain and Asn45 of the heavy chain of PCCS are O- and N-glycosylated, respectively (Figure 1). However, as mentioned previously, these two residues have no post-translational modifications in mammalian FX/FXa. These functional and structural characteristics suggest that PCCS is a homologue of mammalian FXa and group D prothrombin activators.
2. Pseutarin C nonenzymatic subunit (PCNS) - Structurally, PCNS is similar to mammalian FV. The amino acid sequence of PCNS and its precursor was determined using Edman degradation and cDNA sequencing (Rao et al. 2003b; Rao and Kini 2002). PCNS shares ~50% identity with mammalian FV and has an identical domain architecture (Rao et al. 2003b). Both PCNS and mammalian FV have six domains: A1, A2, B, A3, C1 and C2 (Figure 2). Domains A and C are functionally important, and these domains are highly conserved in PCNS and the FVs of other species (Rao et al. 2003b).
These structural similarities suggest that PCNS is a homologue of mammalian FV (Rao et al. 2003b). Despite being an FV homologue, PCNS differs from the FV of other species in several ways. Firstly, domain B of PCNS is significantly smaller (127 residues) than that of fishes (fugu: 530 residues and zebrafish: 756 residues) and mammals (murine: 843 residues, bovine: 869 residues, and human: 882 residues) (Rao et al. 2003b) (Figure 2). During FV activation, domain B is removed by thrombin or FXa through cleavage at three activation sites: Arg709, Arg1018 and Arg1545 (bovine FV numbering) (Foster et al. 1983; Nesheim et al. 1979; Suzuki et al. 1982) (Figure 3B). Although only two of these sites (Arg709 and Arg1545) are conserved in PCNS (Figure 3B), the complete domain B can still be cleaved off during the activation of PCNS (Rao et al. 2003b). Thus, the difference in domain B size should not affect the function of PCNS. Secondly, PCNS and the FV of other species have different post-translational modifications. While bovine FV has 29 glycosylation sites, PCNS has only 11 potential N-glycosylation sites (Rao et al. 2003b) (Figure 2). Mammalian FV is phosphorylated at the Ser692 residue, but PCNS is not (Rao et al. 2003b). In addition, there are six sulfation sites in human FV that are absent in PCNS (Rao et al. 2003b). This difference in post-translational modifications is interesting, as sulfation and phosphorylation are important for regulating the activation of human FV by thrombin (Kalafatis et al. 1994; Pittman et al. 1994).
In addition to these differences, PCNS possesses certain modifications and associations that allow it to function efficiently as a toxin (Bos et al. 2009). Firstly, PCNS has evolved a way to evade inactivation by protein C.
In the human coagulation system, protein C is activated by thrombin in the presence of thrombomodulin, and activated protein C (APC) subsequently inactivates FVa in a negative feedback loop (Esmon 2001). This inactivation occurs by proteolytic cleavage at three sites on the FVa heavy chain: Arg306, Arg506 and Arg662 in bovine FVa (Kalafatis et al. 1994; Mann et al. 1997) (Figure 3B). PCNS is able to evade APC inactivation, as it does not have any of the three APC cleavage sites (Rao et al. 2003b). Even an alternate, less efficient cleavage site at Arg316 (van der Neut et al. 2004b) is not conserved in PCNS (Rao et al. 2003b). Secondly, FVa is inactivated by phosphorylation at Ser692 (Kalafatis et al. 1994). This phosphorylation site is not present in PCNS (Rao et al. 2003b). Lastly, PCNS is shielded from APC inactivation (Nesheim et al. 1982; Rao et al. 2003b) and is kept activated through its constant and stable association with the FXa-like PCCS (Rao et al. 2003a; Rao et al. 2004; Thorelli et al. 1998) in the venom. We have shown experimentally that pseutarin C is unaffected by APC, whereas the bovine FXa-FVa complex is completely inactivated (Bos et al. 2009; Rao et al. 2003b) (Figure 4). Overall, PCNS is a good example of how a toxin gene is duplicated from an ancestral gene and undergoes modifications to gain unique characteristics that allow it to function efficiently as a toxin.
Fig. 4 (Rao et al. 2003b). Varying concentrations of APC were added either to pseutarin C (E; 8 nM) or bovine FXa-FVa (F; FXa 42 nM, FVa 2 nM) complex, which was diluted in 50 mM Tris-HCl buffer (pH 7.5) containing 100 mM NaCl, 5 mM CaCl2, and 0.5 mg/mL BSA. The reaction mixture was incubated for 30 minutes at room temperature. Prothrombin was added to a final concentration of 2.8 µM, and the thrombin formed was assayed using the thrombin-specific chromogenic substrate S-2238. Each point represents an average of 2 independent experiments, each carried out in triplicate.

Parallel prothrombin activator system in Australian elapid snakes
As described above, groups C and D snake venom prothrombin activators are functional and structural homologues of mammalian blood coagulation factors. As snakes are vertebrates, their hemostatic system should contain plasma coagulation factors. Thus, Australian elapid snakes should possess parallel prothrombin activating systems: one in their venom, which is used as an offensive weapon to attack the hemostatic system of the prey, and the other in their plasma, which is used for their own hemostatic purpose. We examined the presence of plasma coagulation factors in the snake's hemostatic system and determined the relationship between the snake venom and plasma coagulation factors.

Trocarin D and FX from Tropidechis carinatus (TrFX)
Since plasma coagulation factors are expressed mainly in the liver, the cDNA encoding T. carinatus FX (TrFX) was sequenced from liver tissue (Reza et al. 2005a). The deduced amino acid sequence of TrFX is similar to mammalian FX (~50%) and trocarin D (~80%) (Figure 5). Structurally, TrFX is similar to trocarin D: both have conserved cysteine residues and identical domain architecture. However, there are some differences between TrFX and trocarin D. The activation peptide of TrFX is similar to those of mammalian FXs and not to those of the venom prothrombin activators (Reza et al. 2005b). It is 57 residues long, compared to 27 residues in trocarin D (Figure 5). In addition, TrFX lacks the 12-residue insert in the heavy chain that is present in the trocarin D precursor (Reza et al. 2005b). These differences in amino acid sequence, in the length of the activation peptide and in the heavy-chain insertion suggest that TrFX and trocarin D are encoded by two independent genes. Hence, this confirms the presence of a parallel prothrombin activator system. TrFX is more similar to trocarin D than to mammalian FX in terms of post-translational modifications (Reza et al. 2005a): TrFX and trocarin D both have N- and O-glycosylation modifications that are not found in mammalian FXs (as described previously).
[Figure 5: amino acid sequence alignment]
Trocarin D and TrFX differ in their physiological roles. Trocarin D plays an offensive role as a toxin in the venom that is used for killing prey. Upon envenomation, like other prothrombin activators (Masci et al. 1988; Rao et al. 2003a), it induces cyanosis and death in experimental animals (Joseph et al. 1999) through disseminated intravascular coagulopathy. On the other hand, TrFX plays a crucial role in the coagulation cascade and prevents excessive blood loss by promoting blood coagulation when there is a vascular injury. Trocarin D is an active enzyme and is found in large quantities in the venom. In contrast, TrFX is found as a zymogen, which is activated only when required, and is present at much lower concentrations in the plasma. Real-time polymerase chain reaction (RT-PCR) was used to determine the expression levels of these two closely related proteins in the liver and venom gland. The results indicate that trocarin D is expressed in the venom gland but not in the liver, while TrFX is expressed in the liver but not in the venom gland. Further, the expression of trocarin D in the venom gland is ~30 times higher than that of TrFX in the liver (Reza et al. 2007). Such differential expression patterns of trocarin D and TrFX strongly support the distinct physiological roles of these two proteins.
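How a fold difference such as "~30 times higher" can be derived from real-time PCR data is not detailed here. One widely used approach is the comparative Ct (2^-ΔΔCt) calculation, sketched below as an illustration only. The function name `relative_expression`, the use of a reference gene for normalisation, the assumption of near-100% amplification efficiency and all Ct values are hypothetical; they are not taken from the cited studies, which may have used a different quantification method.

```python
def relative_expression(ct_gene_1: float, ct_ref_1: float,
                        ct_gene_2: float, ct_ref_2: float) -> float:
    """Fold difference of gene 1 (in sample 1) over gene 2 (in sample 2).

    Comparative Ct (2^-ddCt) calculation. Assumes every primer pair
    amplifies with ~100% efficiency (product doubles each cycle).
    """
    delta_1 = ct_gene_1 - ct_ref_1  # gene 1 normalised to its reference gene
    delta_2 = ct_gene_2 - ct_ref_2  # gene 2 normalised to its reference gene
    return 2.0 ** -(delta_1 - delta_2)


# Hypothetical Ct values, for illustration only (not measured data).
fold = relative_expression(ct_gene_1=18.0, ct_ref_1=20.0,
                           ct_gene_2=23.0, ct_ref_2=20.0)
print(f"gene 1 is expressed {fold:.0f}-fold higher than gene 2")
```

With these made-up inputs, a 5-cycle difference after normalisation corresponds to a 2^5 = 32-fold difference. Comparing two different genes in two different tissues in this way is only meaningful if the primer efficiencies and the reference gene behave comparably in both samples.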

PCCS and FX from Pseudonaja textilis (PFX)
To understand the evolution of group C prothrombin activators, we also determined the cDNA sequence of the P. textilis FX (PFX) from the liver. Interestingly, two PFX isoforms (PFX1 and PFX2) were detected in the liver, and their cDNA sequences are ~85% similar (Reza et al. 2006). The domain architecture and cysteine residues of these two isoforms are also conserved relative to group D prothrombin activators. Amino acid sequence comparison shows that PFX1 is more similar to TrFX (~94%), while PFX2 is more similar to PCCS and trocarin D (~90%) (Figure 5). Further, PFX1 has a longer activation peptide, similar to plasma FXs, whereas PFX2 has a shorter activation peptide, similar to PCCS and trocarin D. Also, PFX2 has a 9-residue insert, which is not present in PFX1. These structural differences suggest that PFX1, PFX2 and PCCS are encoded by three independent genes and that PFX2 is an evolutionary intermediate between PFX1 and PCCS (Reza et al. 2006) (Figure 5). This similarly confirms the presence of a parallel prothrombin activator system. The expression profiles of PFX1, PFX2 and PCCS were determined in liver and venom gland tissues by RT-PCR (Reza et al. 2006). The results show that PFX1 and PFX2 are expressed only in the liver, while PCCS is expressed only in the venom gland. In the liver, PFX1 expression is ~55,000 times higher than PFX2 expression, and PCCS expression in the venom gland is ~80 times higher than PFX1 expression in the liver (Reza et al. 2006). In summary, the sequence comparisons and expression profiles indicate that PCCS has evolved from PFX1 by gene duplication and that PFX2 is an intermediary product of this "recruitment" process (Figure 6).
Fig. 6 (Reza et al. 2006).
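The similarity figures quoted in these comparisons are pairwise percentages computed over aligned sequences. As a minimal sketch of such a calculation, assuming the two sequences have already been aligned, the function below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity`, the gap convention and the toy fragments are illustrative assumptions; they are not the method or the data of the cited studies.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, hypothetical aligned fragments (not real FX or toxin sequences).
seq_x = "ACDEFGHIKL"
seq_y = "ACDQFGH-KL"
print(f"{percent_identity(seq_x, seq_y):.1f}% identity")
```

The denominator matters: excluding gapped columns, as here, gives a higher value than dividing by the full alignment length, so percentages are comparable only when computed under the same convention.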

PCNS and FV from Pseudonaja textilis (PFV)
The cDNA sequence of P. textilis FV (PFV) was determined from its liver (Minh et al. 2005). The deduced amino acid sequence of PFV shows similarities to other mammalian and non-mammalian FVs (~50%) and PCNS (~96%) and shares identical domain architecture (Minh et al. 2005). Like the FVs of other species, PFV and PCNS comprise A1, A2, B, A3, C1 and C2 domains (Figure 3A). The functionally important domains A and C are highly conserved in both PFV and PCNS, whereas domain B is the most variable (Minh et al. 2005). Domain B of PFV (126 residues) is one residue shorter than that of PCNS (127 residues) and is much shorter than that of mammalian and non-mammalian FVs. A more detailed comparison shows that all the FXa and thrombin proteolytic cleavage sites (which are important for activation of these nonenzymatic proteins) are conserved in PFV and PCNS (Figure 3A). However, PFV has an additional FXa proteolytic cleavage site at Arg1765 (Minh et al. 2005; Rao et al. 2003b). This cleavage site also exists in mammalian FV but not in the FVs of teleosts (Minh et al. 2005). This is evolutionarily interesting, as this additional cleavage site may be a characteristic found only in tetrapod FVs. However, the functional implication of this cleavage for the procoagulant activity of FV is not yet known. As mentioned previously, PCNS has evolved to be resistant to inactivation by activated protein C (APC), which is crucial to its function as a toxin. PFV, on the other hand, resembles other FVs in that it can still be inactivated by APC, through cleavage at Arg316, a primitive inactivation site (van der Neut et al. 2004a), and at Arg506 (Minh et al. 2005) (Figure 3B). The expression profiles of PFV and PCNS in the liver and the venom gland were determined using RT-PCR. As with other venom prothrombin activator genes, PCNS is expressed only in the venom gland, while PFV is expressed only in the liver.
It was found that PCNS expression in the venom gland is ~280 times higher than PFV expression in the liver (Minh et al. 2005). Thus, PCNS and PFV are differentially expressed (Minh et al. 2005).
Based on sequence comparisons, we confirmed the presence of parallel prothrombin activator systems in Australian elapid snakes and showed for the first time that groups C and D prothrombin activators in snake venom and their plasma coagulation factor counterparts are closely related. We also proposed that these venom prothrombin activators evolved from their plasma coagulation factor counterparts by gene duplication and were subsequently modified to function efficiently as toxins.

Phylogenetic relationship between snake venom and plasma prothrombin activators
A phylogenetic tree of the snake venom and plasma prothrombin activators, together with other known FX sequences, was constructed to understand their evolutionary relationships, using zebrafish FX as the outgroup (Reza et al. 2006; St Pierre et al. 2005). All the reptilian sequences form a monophyletic group (Reza et al. 2006) (Figure 7). Within the reptilian clade, group C and D prothrombin activators appear as two separate clades on the tree. This indicates that, despite their similarities, group C and D prothrombin activators originated independently. Interestingly, the PFX2 sequence is found nested within the group C prothrombin activators. This supports the hypothesis that PFX2 is an evolutionary intermediate between PFX1 and PCCS. Based on the topology of the phylogenetic tree, it is suggested that these snake venom prothrombin activators have been "recruited" through independent evolutionary events (Reza et al. 2006) (Figure 7).
Fig. 7. Phylogenetic relationships of snake venom and plasma prothrombin activators with other known FX sequences (Reza et al. 2006). "vPA" is an abbreviation for venom prothrombin activators. Arrows indicate the three independent "recruitment" events of snake venom prothrombin activators.
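Statements such as "form a monophyletic group" or "nested within" are claims about tree topology: a set of sequences is monophyletic if some subtree contains exactly those sequences and no others. The sketch below illustrates that test on a deliberately simplified tree. The nested-tuple representation, the function names and, above all, the toy topology are assumptions made for illustration; the toy tree is not the tree of Figure 7 and includes only a handful of the sequences named in the text.

```python
from typing import FrozenSet, Iterable, Iterator, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names under a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves: FrozenSet[str] = frozenset()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def subtrees(tree: Tree) -> Iterator[Tree]:
    """Yield the tree itself and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if some subtree contains exactly the given taxa."""
    target = frozenset(taxa)
    return any(leaf_set(sub) == target for sub in subtrees(tree))


# Hypothetical toy topology, for illustration only (NOT the published tree).
TOY_TREE = (
    "zebrafish_FX",                      # outgroup
    (
        "human_FX",
        (
            ("TrFX", "PFX1"),            # plasma FXs
            "trocarin_D",                # group D venom activator
            ("PCCS", "PFX2"),            # group C clade with PFX2 nested in it
        ),
    ),
)

reptilian = {"TrFX", "PFX1", "trocarin_D", "PCCS", "PFX2"}
print(is_monophyletic(TOY_TREE, reptilian))         # all reptilian sequences
print(is_monophyletic(TOY_TREE, {"PCCS", "PFX2"}))  # PFX2 grouped with PCCS
print(is_monophyletic(TOY_TREE, {"PFX1", "PFX2"}))  # the two liver isoforms
```

On this toy tree the first two checks succeed and the third fails, which mirrors the kind of observation described above: the two liver isoforms do not group together, because PFX2 sits with the venom sequence.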

Comparison of trocarin D and TrFX genes
In the previous sections, we have described how the venom prothrombin activators have been modified to gain certain characteristics, such as resistance to inactivation, which enable them to function better as toxins. However, differential and tissue-specific expression of venom prothrombin activators and their plasma coagulation factor counterparts is also important for their respective physiological roles. The expression of toxins should be venom gland-specific and inducible to higher levels, so that the snake can protect itself against its own venom toxins and replenish its venom supply quickly. Conversely, plasma coagulation factors are mainly expressed in the liver at constitutively low levels so that they can be activated to induce blood coagulation during vascular injuries.
To understand how the venom prothrombin activators are regulated for tissue specificity and level of expression, we determined the gene structure of trocarin D and TrFX. Based on the cDNA sequences of trocarin D and TrFX, and on the mammalian FX gene, primers were designed, and the gene sequences were determined using genomic DNA PCR and genome walking strategies (Reza et al. 2006). The gene organizations of trocarin D and TrFX are identical (eight exons and seven introns). The sequences are highly similar to each other, with differences only in the promoter and intron 1 regions (Figure 8). Such similarities strongly support our findings that there are two closely related, parallel prothrombin activator systems, and that the venom prothrombin activators are "recruited" from plasma coagulation factors through gene duplication. The duplicated FX gene was subsequently modified and "recruited" for expression in the venom gland as a venom prothrombin activator.
Fig. 8 (Reza et al., 2007).

Cis-elements in trocarin D promoter region
The overlapping promoter regions of trocarin D and TrFX were characterized by comparing them against previously characterized human (Hung et al. 2001; Hung and High 1996) and murine (Wilberding and Castellino 2000) FX promoter regions (Reza et al. 2007) (Figure 9). Based on these comparisons, four conserved cis-regulatory elements in the trocarin D and TrFX promoter regions were identified (Figure 9): (i) a CCAAT box (Hung et al. 2001; Hung and High 1996; Wilberding and Castellino 2000), (ii) a binding site for the gut-specific transcription factor GATA-4 (Hung et al. 2001), (iii) a binding site for the liver-specific transcription factor HNF-4 (Hung and High 1996), and (iv) multiple Sp1/Sp3 binding sites (Hung et al. 2001).
Comparison of the trocarin D and TrFX promoter regions reveals that trocarin D has a 264 bp insertion (Figures 8 and 9). This 264 bp segment is located from -33 to -297 bp upstream of the trocarin D start codon (ATG) (Figure 9). The insertion is postulated to play a major role in the recruitment of the duplicated TrFX gene by causing it to be exclusively expressed in the venom gland as the procoagulant toxin, trocarin D. Hence, it was termed the Venom Recruitment/Switch Element (VERSE). This segment was characterized for its cis-elements and gene-regulatory role using luciferase assays in primary venom gland cells and mammalian cell lines (Kwong et al. 2009). The VERSE promoter was found to be responsible for the elevation of expression levels, but not for the tissue-specific expression, of trocarin D. In terms of cis-element characterization, besides confirming the presence of two TATA boxes, one GATA box and one Y-box, three novel cis-elements were also identified (Figure 9). Both TATA boxes (TLB2 and TLB3) were found to be functional. However, TLB2 is the primary TATA box, which initiates transcription and directs the transcription start site (Kwong et al. 2009) (Figure 9).
Fig. 9. Comparison of promoter regions in mammalian and T. carinatus prothrombin activator genes (Reza et al., 2007).

Comparison of trocarin D and TrFX first introns
Intron 1 of trocarin D is 7911 bp long, while that of TrFX is 5293 bp. The difference in size is explained by three insertions and two deletions in the trocarin D intron 1 region (Reza et al. 2007) (Figure 10). Bioinformatics analysis of these insertion/deletion segments reveals that they are novel. The three insertion segments within intron 1 of trocarin D are 214 bp, 1975 bp and 2174 bp in size, located at positions 128 bp, 914 bp and 3300 bp, respectively, on the trocarin D gene (Figure 10). Closer analysis of the insertion sequences shows that the first insert is an almost exact repeat (96.33% identity) of the intron 1 segment spanning 3082 bp to 3299 bp. The other two inserts appear to be inverted repeats of each other (~71% identity). The third insert has the potential to be a Scaffold/Matrix Attachment Region (S/MAR) because it has: (i) a high AT content (Cockerill and Garrard 1986; Liebich et al. 2002; Zhou and Liu 2001), (ii) a topoisomerase II site (Boulikas 1993), (iii) an S/MAR consensus motif (van Drunen et al. 1997), (iv) a significant over-representation of characteristic hexanucleotides (Liebich et al. 2002), and (v) an ATTA motif and an AT-rich region with an H-box. The two deletion segments within intron 1 of trocarin D are 255 bp and 1406 bp in size, located at positions 2610 bp and 3770 bp, respectively, on the TrFX gene (Reza et al. 2007) (Figure 10). As the VERSE promoter of trocarin D does not regulate tissue-specific expression (Kwong et al. 2009), intron 1 of trocarin D could contain cis-elements that are responsible for the venom gland-specific expression of trocarin D. In summary, the characterization of the VERSE promoter and intron 1 regions of trocarin D has increased our understanding of the gene regulation of venom prothrombin activators.
Fig. 10. Comparison of intron 1 regions in TrFX and trocarin D genes (Reza et al., 2007).
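The segment sizes quoted above can be tallied as a simple bookkeeping check: starting from the TrFX intron 1, add the three insertions and subtract the two deletions, then compare the result with the quoted trocarin D intron 1 length. The sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
# Sizes in bp, as quoted in the text.
TRFX_INTRON1_BP = 5293
TROCARIN_D_INTRON1_BP = 7911
INSERTIONS_BP = (214, 1975, 2174)   # present in trocarin D intron 1 only
DELETIONS_BP = (255, 1406)          # present in TrFX intron 1 only


def expected_length(base_bp: int, insertions, deletions) -> int:
    """Length predicted from a base sequence plus insertions minus deletions."""
    return base_bp + sum(insertions) - sum(deletions)


predicted = expected_length(TRFX_INTRON1_BP, INSERTIONS_BP, DELETIONS_BP)
residual = predicted - TROCARIN_D_INTRON1_BP
print(predicted, residual)  # 7995 84
```

With the figures as quoted, the five segments predict 7995 bp, which is 84 bp more than the stated 7911 bp. The listed insertions and deletions therefore account for the size difference to within 84 bp; the source of the remainder is not stated here and could reflect coordinate conventions or smaller differences that are not itemized.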

Gene duplication in snake venom toxin diversification
Besides "recruitment", gene duplication also plays an important role in the diversification of venom toxins. This diversification is essential for the development of novel toxins and is evident from the many toxin isoforms present in snake venom. Interestingly, each isoform varies in its function and gene regulation. Gene duplication has also led to neofunctionalization of venom toxins, giving rise to new families of snake venom toxins (St Pierre et al. 2008) and to new members within these families (Fry et al. 2003; Landan et al. 1991b; Lynch 2007; Moura-da-Silva et al. 1996). The three-finger toxin (3FTx) multigene family is a good example of neofunctionalization by gene duplication (Fry et al. 2003). Structurally, all the members of this family have very well-conserved cysteine residues and share a common structure of three beta-stranded loops extending from a central core. However, they exhibit a wide variety of pharmacological effects, for example, acetylcholinesterase inhibition (fasciculin from Dendroaspis angusticeps venom), neurotoxicity (α-bungarotoxin from Bungarus multicinctus venom), cardiotoxicity (β-cardiotoxin from Ophiophagus hannah venom), and many others (for details, see Kini and Doley 2010). Neofunctionalization occurs when a toxin gene undergoes gene duplication and the duplicated gene is mutated within the functional sites, which often results in new ligand-binding specificities (Kini 2002). Besides neofunctionalization, changes in gene regulation are also an outcome of gene duplication. This can be seen in two isoforms present in the venom of Naja sputatrix: cardiotoxin and α-neurotoxin (Ma et al. 2001). Besides varying in function, these two isoforms have different expression levels in the venom gland. Cardiotoxin constitutes 60% of the venom, while the α-neurotoxin makes up only 3% of the venom.
Gene duplication is evident from the gene comparison, in which the structures and amino acid sequences of these two toxins are very well conserved (Ma et al. 2002). The main difference lies in the promoter segment, where it was found that the α-neurotoxin promoter contains a stronger silencer element, which is responsible for significantly reducing its expression level in the venom (Ma et al. 2001; Ma et al. 2002).
In the case of venom prothrombin activators, we have shown that they have been "recruited" from the gene of an ancestral plasma prothrombin activator protein through gene duplication. The duplicated gene underwent modifications in its regulatory and coding regions to gain toxin characteristics. VERSE segments were inserted in the promoter regions of trocarin D and PCCS and are responsible for their elevated level of expression. Insertion/deletion segments in their intron 1 regions are postulated to be responsible for venom gland-specific expression. Modifications in the coding regions enable prothrombin activators to function better as toxins by conferring characteristics such as resistance to inactivation.

Conclusion
Gene duplication has played a major role in the development of snake venom toxins. Our findings on venom prothrombin activators and blood coagulation factors have captured the first molecular evidence of gene duplication. The characterization of the differences in their genes, i.e. the VERSE segment and intron 1 of trocarin D, has increased our understanding of the gene regulation of snake venom toxins. It is shown that the VERSE segment is responsible for the elevation of gene expression and that intron 1 is probably responsible for venom gland-specific expression (unpublished observations). We identified three novel cis-elements in the VERSE segment, and these play important roles in gene regulation. It would be interesting to further characterize them and their trans-factor partners to determine how various trans-factors interact with each other to regulate gene expression. The answers to some of these questions will increase our overall understanding of gene regulation.