Proteins expressed during the different latency programs.
The Epstein-Barr virus (EBV) is a DNA virus with a relatively stable genome: its genomic variability is reported to be around 0.002%. However, some regions are more variable, such as those carrying latency genes, especially EBNA1, -2, -LP, and LMP1. Tegument genes, particularly BNRF1, BPLF1, and BKRF3, are also frequently mutated. Because this ubiquitous virus infects a very large part of the population, it has long been suspected that particular strains could cause certain diseases. However, some of the mutations found are geographically restricted rather than associated with proliferation, whereas others appear to be involved in oncogenesis. The objective of this chapter is to provide an update on changes in viral genome sequences in EBV-associated malignancies. We focus on the structure and function of the proteins encoded by the genes mentioned above in order to understand how certain mutations could increase the tumorigenic character of this virus. For each protein, the mutations described in the literature are reported together with the viral and/or cellular functional changes attributed to them.
- Epstein-Barr virus
- next generation sequencing
1. Introduction
Epstein-Barr virus (EBV), a ubiquitous gamma-herpesvirus, infects the vast majority of the worldwide human population. The virus was first discovered in 1964 in cultured lymphoma cells from patients with Burkitt’s lymphoma (BL). During primary infection, EBV infects epithelial cells of the oropharynx, where it actively replicates, and B cells, where it establishes lifelong latency as an episome in the host cell nucleus. During latency, EBV may express nine viral latency proteins: six “Epstein-Barr Nuclear Antigens” (EBNA1, -2, -3A, -3B, -3C, and -LP), involved in transcriptional regulation, and three “Latent Membrane Proteins” (LMP1, -2A, and -2B), which mimic signals needed for B cell maturation. It also expresses two small noncoding RNAs (EBER-1 and EBER-2), BamHI-A rightward transcripts (BARTs), and miRNAs. Four different latency programs can be distinguished on the basis of the proteins expressed (Table 1). EBV primary infection, which most often occurs in childhood, is usually asymptomatic in children, whereas in western countries it may cause infectious mononucleosis (IM) in teenagers and young adults. In addition to this nonmalignant disease, EBV is associated with diverse malignancies. In particular, EBV is involved in the development of several malignancies of lymphoid origin, including endemic Burkitt’s lymphoma, nasal NK/T lymphoma, some Hodgkin’s lymphomas, and B- or T-cell lymphoproliferations in immunocompromised patients. It is also implicated in epithelial malignancies such as undifferentiated nasopharyngeal carcinoma (NPC) and about 10% of gastric carcinomas. Although populations from all geographic areas are infected by the virus, the incidence of the associated pathologies varies significantly by region.
For example, BL occurs mainly in children living in sub-Saharan Africa, and the prevalence of NPC is particularly high in adults living in Southern China, Southeast Asia, and Northern Africa. These differences in geographic distribution suggest that there could be several genetic variants of EBV, with different global distributions and different levels of transforming capacity. The existence of disease-specific variants has been raised by many authors and is still debated. In this chapter, we review the current knowledge on the variability of the most mutated EBV genes and its possible implications in human pathology.
|Latency|Program|EBV expressed proteins|Active promoters|B cell type|
|---|---|---|---|---|
|Latency III|Growth|EBNA-1, -2, -3A, -3B, -3C, -LP; LMP-1, -2A, -2B|Initially Wp, then Cp; EBER-1 and -2p|Naive B cells|
|Latency II|Default|EBNA-1; LMP-1, -2A, -2B|Qp, EBER-1 and -2p; LMP promoters| |
|Latency I|Latency|EBNA-1|Qp, EBER-1 and -2p|Resting B cells|
|Latency 0| |No protein or LMP-2A|LMP-2Ap|Memory B cells|
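Table 1 can be read as a simple decision rule on which latency proteins are detected in a sample. The sketch below encodes that rule in Python; the function name and the protein-name normalization are illustrative assumptions, and it deliberately ignores the noncoding RNAs and promoter usage, which a real classification would also take into account.

```python
# Hypothetical helper: infer the EBV latency program (Table 1) from the set of
# viral latency proteins detected in a sample. EBERs and promoters are ignored.
LATENCY_III_ONLY = {"EBNA2", "EBNA3A", "EBNA3B", "EBNA3C", "EBNALP"}
LMP_PROTEINS = {"LMP1", "LMP2A", "LMP2B"}


def infer_latency_program(detected: set[str]) -> str:
    """Return 'III', 'II', 'I' or '0' for a set of detected protein names."""
    # Normalize names such as "EBNA-LP" or "LMP-2A" to "EBNALP" / "LMP2A".
    proteins = {p.upper().replace("-", "") for p in detected}
    if proteins & LATENCY_III_ONLY:
        # EBNA2, the EBNA3 proteins and EBNA-LP appear only in latency III.
        return "III"
    if "EBNA1" in proteins and proteins & LMP_PROTEINS:
        return "II"
    if proteins == {"EBNA1"}:
        return "I"
    if proteins <= {"LMP2A"}:
        # Latency 0: no viral protein, or LMP2A alone.
        return "0"
    raise ValueError(f"pattern not described in Table 1: {sorted(proteins)}")
```

For example, a sample in which only EBNA1 and LMP1 are detected is assigned to latency II.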
2. Evolving knowledge of the EBV genome
Several features long limited sequencing of the viral genome: its relatively large size (175 kb), its repetitive regions, and the fact that, as a DNA genome, it is less variable than an RNA genome would be. The first published sequences were small fragments of the B95-8 genome; the entire B95-8 genome was then sequenced in 1984. B95-8 was the first EBV-producing cell line able to secrete large amounts of viral particles into the culture medium. It was obtained by using the virus of the 883L cell line, a spontaneous human lymphoblastoid cell line (LCL) established from a North American case of infectious mononucleosis, to transform lymphocytes from a cotton-top marmoset. As the first strain with a fully published genome, B95-8 has been extensively studied and mapped for transcripts, promoters, and open reading frames.
This first whole-genome sequence was followed by others, and the complete viral genome sequences of the AG876 cell line, derived from a Ghanaian case of African BL, and of GD1, obtained from cord blood B cells infected with EBV from the saliva of an NPC patient in Guangzhou, China, were published. Sequences of some genes, mainly latency genes, were also studied, especially in cell lines established from patients [14, 15]. B95-8, GD1, and AG876 were sequenced by conventional shotgun (Sanger) sequencing. Comparison of the sequences obtained for the various cell lines revealed two types of EBV: type 1 (or A), of which B95-8 can be considered the prototype, and type 2 (or B), exemplified by AG876. The main difference between the two types concerns the EBNA2 gene, with only 70% identity at the nucleotide level and 54% identity in the protein sequence. Additional variations were also observed in the EBNA3 genes, but to a lesser extent: 10, 12, and 19% base pair differences for EBNA3A, -3B, and -3C, respectively. The comparison of viral sequences also showed that B95-8 carries a significant 11.8 kb deletion (positions 139,724–151,554) covering some of the BART miRNA genes, one of the origins of lytic replication, the LF2 and LF3 genes, and part of the LF1 gene. A more complete sequence, in which the B95-8 sequence is supplemented with a Raji fragment spanning the deletion, was subsequently constructed. It was annotated in 2010 as the RefSeq HHV4 (EBV) sequence NC_007605 and is now used as the wild-type reference strain.
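As a quick consistency check on the figures above, the size of the B95-8 deletion follows directly from its end coordinates. The minimal Python sketch below assumes the interval is inclusive at both ends and that positions use the same reference numbering as the text; `in_b95_8_deletion` is a hypothetical helper, for example to flag genomic features expected to be absent from B95-8-derived sequences.

```python
# End coordinates of the B95-8 deletion as quoted in the text.
# Assumption: both ends are counted as deleted bases (inclusive interval).
B95_8_DELETION = (139_724, 151_554)


def deletion_length(start: int, end: int) -> int:
    """Length in bp of an inclusive genomic interval."""
    return end - start + 1


def in_b95_8_deletion(position: int) -> bool:
    """True if a reference position falls inside the B95-8 deletion."""
    start, end = B95_8_DELETION
    return start <= position <= end


print(deletion_length(*B95_8_DELETION))  # 11831 bp, i.e. about 11.8 kb
```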
Because the virus can adapt to in vitro culture, which may bias the results, some authors have preferred to sequence the viral genome directly from patient samples. The sequences GD2, from a Guangzhou NPC biopsy, and HKNPC1, from a Hong Kong NPC biopsy, were thus published [19, 20], both obtained with a more recent technique, next generation sequencing (NGS). This technology can be applied directly to samples or after enrichment of viral DNA, which limits artifacts due to contaminating cellular DNA. Enrichment can be achieved by PCR or by cloning into F-factor plasmids, but it is most frequently carried out by hybridization-based capture of the target DNA. NGS delivers a wealth of information but requires extensive bioinformatic analysis. It has made it possible to rapidly increase the number of fully sequenced viral genomes from healthy subjects and patients, and thus to obtain more information.
3. The most variable regions of the genome
Authors who sequenced the entire viral genome and analyzed its variations concluded that the latent genes harbor the highest numbers of nonsynonymous mutations [20, 21, 22, 23, 24]. For example, Liu et al. compared the sequences of nine EBV strains to GD1, to which they were most closely related, and showed that latency genes were the most mutated: latent and tegument genes together harbored 58.4 to 84.3% of all nonsynonymous mutations detected in each genome. Santpere et al. found that latent genes were twice as mutated as lytic genes. The greater nucleotide diversity of latent genes compared with lytic genes was observed regardless of the pathology: nasopharyngeal carcinoma [20, 21], NK/T lymphoma, endemic Burkitt’s lymphoma, Hodgkin’s lymphoma, posttransplant lymphoproliferative disease, gastric carcinoma, lung carcinoma, and also in strains from infectious mononucleosis or healthy subjects. Why latent genes are the most variable remains unclear. Analyzing their data with the Yang model, Santpere et al. showed that lytic genes are under an evolutionary constraint close to that of the host: strong purifying selection was detected for 11 lytic genes. However, signatures of accelerated protein evolution were found in coding regions related to virus attachment and entry into host cells. Latency genes, by contrast, show positive selection, possibly in relation to the MHC, which may explain their high diversity. Amino acid (aa) changes often occur in immune epitopes: changes in CD8+ epitopes have been described in all latent proteins, whereas changes in CD4+ epitopes have been shown only for EBNA1 and -2 and LMP1 and -2. However, most codons of the EBNA3 genes that are under positive selection do not lie in cytotoxic T-lymphocyte epitopes: either they correspond to epitopes not yet described, or the selection relates to other functions.
The selection of mutants may therefore depend on geographic differences in host immunity and/or on the capacity of a strain to infect and persist.
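Codon-based models such as the Yang model cited above summarize selection pressure by the ratio ω of nonsynonymous to synonymous substitution rates. The standard reading of this ratio, which underlies the distinction between purifying and positive selection drawn in the preceding paragraph, is:

$$
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection (e.g., the 11 constrained lytic genes)}\\
\omega = 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive (diversifying) selection (e.g., latency genes)}
\end{cases}
$$

where $d_N$ is the number of nonsynonymous substitutions per nonsynonymous site and $d_S$ the number of synonymous substitutions per synonymous site. In practice, ω is estimated per gene or per codon by maximum likelihood; the exact model variants and statistical tests used in the cited study are not detailed here.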
4. Variability of main latency proteins
After the virus enters a host cell, the genome circularizes through recombination of the terminal repeats (TRs) located at each end, forming an episome that is chromatinized and methylated in the same way as the human genome. Latent transcription programs in B cells result from the differential activity of epigenetically regulated promoters and occur in three successive waves. EBNA2 and EBNA-LP, as well as BHRF1, a Bcl-2 homolog, are the first viral proteins to be expressed, under the control of the Wp promoter. These two EBNAs, together with the cellular factor recombination signal-binding protein for immunoglobulin kappa J region (RBP-Jk), then activate the Cp promoter, which drives the expression of all the EBNA proteins; as Wp becomes progressively hypermethylated, transcription gradually shifts to Cp control. Subsequently, the LMP1, LMP2A, and LMP2B proteins are expressed following activation of their respective promoters. During latency I or II, the Qp promoter controls EBNA1 expression, and Cp methylation silences the five other EBNAs. The Qp promoter is not controlled by methylation; instead, it is switched off by the binding of a repressor protein.
As discussed above, latency proteins show the most sequence variation, and among them, EBNA1, EBNA2, EBNA-LP, and LMP1 are the most mutated. The main properties of these proteins are reported in Table 2.
4.1 EBNA1 (Epstein-Barr nuclear antigen 1)
EBNA1, which is expressed in both latent and lytic EBV infection, was the first EBV protein detected. Its structure (Figure 1) and functions have been extensively studied [29, 30]. EBNA1 is a 641 aa protein, although its size frequently varies because of differing numbers of Gly-Ala repeats (aa 89–325). EBNA1 is the only viral protein expressed in all forms of latency in proliferating cells, and in all EBV-associated malignancies. Acting as a homodimer, EBNA1 is essential for initiating EBV episome replication once per cell cycle, before mitosis, and for the mitotic segregation of EBV episomes, and thus for the maintenance of the EBV episome in latently infected cells. The EBNA1 DNA-binding domain is essential but not sufficient for replication; the N-terminal half of EBNA1 is also required. Two EBNA1 regions (aa 8–67 and aa 325–376) are particularly important for this activity, and point mutations at G81 or G425 enhance EBNA1-dependent DNA replication. Conversely, the aa 395–450 region mediates an interaction with the human ubiquitin-specific protease USP7, which may negatively regulate replication. Partitioning of EBV episomes between the two daughter cells requires two viral components: the family of repeats (FR) element of oriP and EBNA1, mainly its central Gly-Arg region (aa 325–376) and secondarily the aa 8–67 sequence. EBNA1 also activates the expression of other latency genes involved in immortalization, through the central Gly-Arg sequence and the aa 61–89 region. EBNA1 interacts with its recognition sites in the FR and DS (dyad symmetry) elements of oriP and in the BamHI-Q region through its C-terminal region (aa 459–607), which also mediates EBNA1 dimerization (aa 504–604). Through its interactions with the human casein kinase CK2 (aa 383–395) and the cellular ubiquitin-specific protease USP7 (aa 442–448), EBNA1 is also able to disrupt promyelocytic leukemia protein (PML) nuclear bodies and induce PML degradation.
In addition to its role in latent infection, EBNA1 can therefore participate in lytic infection by overcoming suppression by PML proteins, since PML proteins and nuclear bodies have been found to suppress EBV lytic infection. Recently, the EBNA1 DNA-binding domain was shown to organize into an oligomeric hexameric ring, with the oligomeric interface pivoting around residue T585. Mutations at this residue had both positive and negative effects on EBNA1-dependent DNA replication and episome maintenance.
Based on polymorphisms observed at 15 codons, Bhatia et al. described two lineages, named P (prototype) and V (variant), each with two subtypes defined by the aa at position 487 (P-ala, P-thr, V-pro, and V-leu). They mostly detected the P-thr variant in African BL tumors and the V-leu variant in American BL tumors, but these findings were not confirmed by another group, which reported different spectra of EBNA1 subtypes in different geographical areas, in both healthy subjects and BL tumors. A fifth subtype, V-val, was later identified in Southeast Asia and found to be prevalent in NPC samples by numerous authors [20, 35, 36, 37]. These findings suggest that the V-val variant might be particularly well adapted to the nasopharyngeal epithelium or that this strain has an increased oncogenic potential. Most of the variant codons are located in the DNA-binding domain and may affect the EBV phenotype, possibly resulting in an impaired ability to transform B lymphocytes. However, other reports found that this subtype had no tumor-specific expression, and it most likely represents a dominant EBNA1 subtype in Asian regions that is not found in other areas of the world [8, 23, 25]. The P-thr subtype is the most commonly observed in the peripheral blood of American and African subjects as well as in African tumors. In our experience, P-thr is also the most prevalent subtype in France, particularly in lymphoproliferative diseases.
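The nomenclature above combines two pieces of information: the lineage (P or V), defined by the 15-codon polymorphism pattern, and the residue at position 487. A minimal Python sketch of this naming rule follows; it assumes the lineage has already been determined (the 15 codons are not listed here) and that residue 487 is supplied as a one-letter amino acid code. `ebna1_subtype` is a hypothetical helper.

```python
# One-letter codes of the residues at position 487 that define the five
# EBNA1 subtypes named in the text.
THREE_LETTER = {"A": "ala", "T": "thr", "P": "pro", "L": "leu", "V": "val"}
KNOWN_SUBTYPES = {"P-ala", "P-thr", "V-pro", "V-leu", "V-val"}


def ebna1_subtype(lineage: str, residue_487: str) -> str:
    """Name an EBNA1 subtype from its lineage ('P' or 'V') and residue 487."""
    lineage = lineage.strip().upper()
    aa = THREE_LETTER.get(residue_487.strip().upper())
    name = f"{lineage}-{aa}"
    if aa is None or name not in KNOWN_SUBTYPES:
        # Combinations outside the five subtypes described are not named.
        raise ValueError(f"combination not described: {lineage}/{residue_487}")
    return name
```

For instance, a V-lineage isolate with valine at position 487 is reported as V-val, the subtype prevalent in Southeast Asia.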
Other mutations have also been reported. For example, Borozan et al. studied gastric carcinomas (GC) and mainly found two mutations already described in NPC, H418L and A439T, which are located outside the DNA-binding domain and are common in both NPC and GC but uncommon in other EBV isolates from lymphomas or healthy subjects. They also described a new mutation, T85A, located in the region required for transcriptional activation of other latency genes and thus able to modify this function. Wang et al. described the substitution T585I. T585 is a frequent site of substitution, and T585 polymorphisms are often found in NPC tumors and Burkitt’s lymphoma. T585I had been reported previously: the corresponding strain was defective in viral episome replication and maintenance, as well as deficient in suppressing lytic cycle gene transcription and lytic DNA replication.
In summary, the EBNA1 V-val variant seems to be a geographic variant almost exclusively present in Southeast Asia. Conversely, mutations at T85 and T585, which lie in functional regions of the protein, could have biological consequences, especially the T585I substitution, which promotes lytic replication and is found in NPC.
4.2 EBNA2 (Epstein-Barr nuclear antigen 2)
EBNA2, a 487 aa protein, is expressed during latency III: in vivo, shortly after infection of B cells and in lymphomas occurring in immunocompromised patients, and in LCLs. As mentioned above, EBNA2 variation makes it possible to classify EBV into types 1 and 2 (or A and B), which share only 70% identity at the nucleotide level and 54% identity in the protein sequence. The overall structure of EBNA2 (Figure 2) is characterized by poly-P and poly-RG areas, the latter being a protein-protein and protein-nucleic acid interaction domain important for efficient cell growth transformation, and by nine regions conserved throughout the gene. EBNA2 acts mainly as a transcription factor and contains three categories of domains critical for its transcriptional regulation function: transactivation domains (TADs), self-association domains (SADs), and nuclear localization signals (NLSs). EBNA2 does not bind DNA directly; it uses cellular proteins as adapters to access viral or cellular enhancer and promoter sites. The C-terminal TAD (aa 448–471) can recruit components of the basal transcriptional machinery as well as chromatin modifiers and can bind the viral coactivator EBNA-LP, whereas the N-terminal TAD (aa 1–58) cannot bind EBNA-LP, although its activity can be enhanced by this protein. Two SADs (aa 1–58 and 97–121), separated by the poly-proline stretch, were identified in the N-terminal region. A third SAD has also been reported, located in a nonconserved region flanked by the second SAD and the adapter region. EBNA2 contributes to B-cell immortalization, and type 1 EBV, which predominates in EBV-associated diseases, has been shown to immortalize B cells in vitro much more efficiently than type 2, a difference mainly determined by sequence variation in the EBNA2 C-terminus.
During the early events of EBV infection of resting B cells, EBNA2 initiates the transcription of a cascade of primary and secondary viral and cellular target genes and is therefore responsible for initiating immortalization by reprogramming the resting state into a proliferative state. To do so, EBNA2 interacts with chromatin remodelers and acts as a cofactor of transcription factors. Mühe et al. demonstrated that the first 150 N-terminal aa of EBNA2 are important for the initiation of immortalization. EBNA2 is also involved in maintaining immortalization; the region implicated (aa 295–378) includes the conserved regions CR5 (aa 295–307) and CR6 (aa 320–326), which are particularly important for this function. CR5 mediates contact between EBNA2 and SKIP (Ski-interacting protein), and CR6 is the domain targeting CBF1 (C promoter-binding factor 1), also called RBP-Jk. The mechanisms that initiate and maintain B cell immortalization are not yet completely understood.
Wang et al., working on 25 EBV-associated GCs, 56 NPCs, and 32 throat washings from healthy donors in Northern China, described four EBNA2 subtypes according to the presence of deletions: E2-A (no aa deletion), E2-B (deletion of Q294), E2-C (deletion of K357 and G358), and E2-D (deletion of K357, G358, and Q294). The E2-A subtype exhibited six nonsilent mutations: P291T, R413G, I438L, E476G, P484H, and I486T. The P291T substitution was present in six E2-D and six E2-C NPC samples, and R413G was detected in an E2-C isolate from one patient. E2-A and E2-C were the dominant subtypes in the samples analyzed, and the E2-D pattern was detected only in NPC specimens. The R163M mutation was detected in all samples; it had previously been described worldwide and in different diseases.
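The four deletion-based subtypes reported by Wang et al. can be expressed as a small decision rule. The Python sketch below assumes that deleted residues are given as positions in the reference (type 1) EBNA2 numbering used in the text, and it ignores substitutions; `ebna2_subtype` is a hypothetical helper, not a published tool.

```python
# Hypothetical helper reproducing the deletion-based EBNA2 subtypes E2-A to
# E2-D: deletion of Q294 and/or of the K357-G358 pair.
def ebna2_subtype(deleted_positions: set[int]) -> str:
    del_294 = 294 in deleted_positions
    del_357_358 = {357, 358} <= deleted_positions
    if {357, 358} & deleted_positions and not del_357_358:
        # Only one residue of the 357-358 pair missing: not a described subtype.
        raise ValueError("partial 357/358 deletion does not match E2-A to E2-D")
    if del_294 and del_357_358:
        return "E2-D"
    if del_357_358:
        return "E2-C"
    if del_294:
        return "E2-B"
    return "E2-A"
```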
The deletions at positions 357 and 358 lie in the RG domain (aa 335–362), a negative regulator of EBNA2-mediated activation of the LMP1 promoter. Moreover, aa 357–363 (KGKSRDK) constitute the PKC phosphorylation site, whose phosphorylation can reduce the amount of EBNA2/CBF1 complex formed. EBNA2 is thus suspected to contribute to the development of malignancies through sequence variations that most often affect its regulatory function.
Interestingly, deletion of the entire EBNA2 gene has been shown in some endemic BL cell lines, such as P3HR1, Daudi, Sav, Oku, and Ava; whether this deletion commonly occurs in vivo in African BL remains to be determined.
In short, geographic variants have not been formally demonstrated for EBNA2. Among the mutations described, the most interesting are those affecting the PKC phosphorylation site, because they may enhance activation of Cp and/or the LMP1 promoter and thus increase the production of latency proteins.
4.3 EBNA-LP (EBNA-leader protein)
Like EBNA2, and concomitantly with it, EBNA-LP is expressed shortly after infection of B cells in healthy individuals, as well as in EBV-related malignancies in immunodeficient patients and in LCLs. EBNA-LP acts mainly as a coactivator of the transcriptional activator EBNA2, thereby inducing the expression of cellular genes, including cyclin D2, and of viral genes, namely LMP1, LMP2B, and Cp; it therefore plays an important role in B cell immortalization. EBNA-LP can also interact directly with several cellular proteins, such as tumor suppressors and proteins involved in apoptosis or cell cycle regulation.
EBNA-LP consists of a variable number of 66 aa repetitive units, corresponding to the variable number of W1 and W2 exons in the EBV internal repeat IR1, followed by a unique 45 aa domain encoded by the two unique 3′ exons Y1 and Y2 (Figure 3). EBNA-LP size therefore varies with the number of W1–W2 repeats in each EBV isolate. By convention, the protein annotation is based on an isoform with a single W repeat (Figure 4); in this configuration, the protein has 110 aa. Conserved regions were identified at the N-terminus (CR1 to CR3: aa 11–33, 45–52, and 55–62, respectively, implicated in EBNA2 binding) and in the C-terminal region (CR4 and CR5: aa 76–82 and 101–110, respectively). CR3 and a serine within W2 (S35) were shown to be important for EBNA2 coactivation. EBV-mediated B cell immortalization maps to the repeated W1W2 domains: at least two IR1 repeats are required, and five or more are optimal. Some interactions with cellular proteins are mediated by the repeated W1W2 N-terminus. During the early stages of infection, EBNA-LP transcription initiates from the W promoter (Wp) present in each IR1 repeat, and multiple EBNA-LP isoforms are produced. During later stages of infection and in LCLs, transcription initiates from the C promoter (Cp). The relative level of Cp- versus Wp-initiated transcription varies with circumstances.
About 15% of BL tumors carry a virus that uses the W promoter exclusively and expresses an atypical latency program comprising EBNA1, EBNA3A, -3B, -3C, and a truncated form of EBNA-LP. In these cases, the EBV genome lacks the EBNA2 gene and the unique Y1Y2 exons of EBNA-LP. This was first described in the P3HR1 and Daudi BL cell lines. These cells were subsequently shown to be more resistant to apoptosis than cells infected by wild-type virus, which may be related to the truncated form of EBNA-LP.
Given the difficulty of sequencing repetitive regions, only a few authors have sequenced the IR1 region, which includes the EBNA-LP coding region. Previous studies identified two distinct EBNA-LP isoforms, type 1 and type 2 variants, based on the presence of G8/T12 or V8/A12 in exon W1. The Q54R substitution was also described in exon W2 of an African type 2 spontaneous LCL. Nevertheless, a high degree of conservation was reported for the Wp promoter and the W1–W2 intron, whereas the greatest diversity was observed in the BWRF1 ORF, which shares only 80% homology between strains, and in the Y exons. Sequence variation in the Y exons, especially the Y2 exon, made it possible to define four main subgroups, called A, B, C, and Z: the Akata strain belongs to subgroup A and B95-8 to subgroup B, subgroup Z is found in type 2 EBVs, and subgroup C is characterized by V95E and V102I. Finally, tumor-derived strains have been reported to be more prone to interstrain genetic exchange in IR1.
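The EBNA-LP features described in the two paragraphs above (the IR1 repeat-number requirement for immortalization, and the residues defining the W1 isoforms and Y2 subgroup C) can be sketched as simple checks. This Python sketch assumes protein positions in the conventional single-W-repeat numbering (110 aa) and one-letter sequences; all function names are hypothetical.

```python
def immortalization_repeat_class(ir1_repeats: int) -> str:
    """Classify an IR1 repeat count using the thresholds quoted in the text."""
    if ir1_repeats < 2:
        return "insufficient"  # fewer than two IR1 repeats
    if ir1_repeats < 5:
        return "effective"     # two to four repeats
    return "optimal"           # five or more repeats


def w1_isoform(protein: str) -> str:
    """Call the W1 isoform from residues 8 and 12 (1-based numbering)."""
    if len(protein) < 12:
        raise ValueError("sequence too short to contain residue 12")
    pair = (protein[7], protein[11])  # 0-based indices of residues 8 and 12
    if pair == ("G", "T"):
        return "type 1"
    if pair == ("V", "A"):
        return "type 2"
    return "unclassified"


def has_subgroup_c_markers(protein: str) -> bool:
    """True if residues 95 and 102 carry E and I (the V95E/V102I pattern)."""
    if len(protein) < 102:
        raise ValueError("sequence too short to contain residue 102")
    return protein[94] == "E" and protein[101] == "I"
```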
4.4 LMP1 (latent membrane protein 1)
LMP1 is considered to be the main oncogenic protein of EBV. It is a multifunctional, self-aggregating protein essential for the transformation of human B cells and rodent fibroblasts. It is a 386 aa protein comprising a 24 aa cytosolic N-terminal (NT) segment, a 162 aa portion consisting of six transmembrane (TM) domains, and a 200 aa cytosolic C-terminal (CT) domain (Figure 5). The NT domain plays an important role in the orientation and anchoring of LMP1 in the membrane and in its constitutive aggregation, thus contributing to the transforming function of LMP1. The TM region localizes LMP1 to lipid rafts in the membrane, inducing its clustering and thereby activating signaling from the CT tail. Notably, the F38LWY41 motif in the first transmembrane segment (TM1) and residue W98 in TM3 are essential for the association of TM domains 1–2 with TM domains 3–6, as well as for the oligomerization and signaling of LMP1. The CT part activates LMP1-induced signaling pathways through two important regions, CTAR1/TES1 and CTAR2/TES2 (C-terminal activating region/transformation effector site), which are critical for EBV-mediated B-cell growth transformation. Together, these regions mimic CD40, a member of the tumor necrosis factor (TNF) receptor family and a key B-cell costimulatory receptor, and thus recruit the cellular adapters of the TNF receptor family, the TNF receptor-associated factors (TRAFs). The CTAR1 region includes the P204-X-Q206-X-T208 consensus motif required for binding of the TRAF adapters TRAF1, TRAF2, TRAF3, and TRAF5. Within the CTAR2 region, the Y384-Y385-D386 motif is essential for binding the TNF receptor-associated death domain (TRADD) adapter. A third region, CTAR3 (aa 232–350), is not essential for in vitro B cell immortalization and is less well characterized.
Within this region, between CTAR1 and CTAR2, lies a stretch (aa 253–302) containing a variable number of 11 aa repeat elements (four in B95-8).
LMP1 acts mainly as a viral pseudoreceptor that regulates host cell signal transduction by constitutively activating cellular pathways such as the mitogen-activated protein kinase (MAPK) pathways, principally the extracellular signal-regulated kinases 1 and 2 (ERK1/2), c-Jun amino-terminal kinases 1–3 (JNK1–3), and p38 isoform pathways. LMP1 also induces the phosphatidylinositol 3-kinase (PI3K) pathway, which contributes to survival signals, and the activator protein 1 (AP1) transcription factor; the PI3K and AP1 pathways play a major role in proliferation and cell cycle control. LMP1 is also responsible for activation of the JAK/STAT and interferon regulatory factor 7 (IRF7) pathways and for aberrant constitutive NF-kB activation. The CTAR1 PXQXT motif engages TRAFs, ultimately activating the noncanonical NF-kB pathway, which controls processing of the NF-kB2/p100 precursor. The CTAR2 YYD motif, in turn, is involved in activation of the canonical NF-kB pathway after binding of TRADD (TNFRSF1A-associated via death domain) and receptor-interacting protein 1 (RIP1). A wider region of LMP1 appears to be responsible for binding RIP1 (aa 351–386) than TRADD (aa 375–386). NF-kB is considered to be the principal factor through which LMP1 regulates gene expression and modifies cell behavior. NF-kB activation is associated with upregulation of anti-apoptotic genes [32, 73] and downregulation of pro-apoptotic factors, as well as induction of tumorigenesis-associated B-cell activation markers [74, 75]. CTAR3, which is less well defined, seems to activate SUMOylation pathways and to participate in the maintenance of EBV latency and in the control of cell migration, a hallmark of oncogenesis [76, 77].
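The LMP1 coordinates given above lend themselves to a position-to-feature lookup, for example to decide which described motif a newly observed substitution falls in. The Python sketch below uses only coordinates quoted in the text; the NT/TM/CT boundaries are derived from the stated lengths (24, 162, and 200 aa) and therefore assume the 386 aa B95-8 reference numbering. Because the number of 11 aa repeats varies between isolates, positions from other strains would first need to be mapped onto this reference. `annotate_lmp1_position` is a hypothetical helper.

```python
# (feature, first residue, last residue), inclusive, B95-8 numbering assumed.
LMP1_FEATURES = [
    ("NT cytosolic domain", 1, 24),
    ("TM1-TM6 transmembrane region", 25, 186),
    ("CT cytosolic domain", 187, 386),
    ("CTAR1 PXQXT TRAF-binding motif", 204, 208),
    ("CTAR3", 232, 350),
    ("11 aa repeat region", 253, 302),
    ("RIP1-binding region", 351, 386),
    ("TRADD-binding region", 375, 386),
    ("CTAR2 YYD motif", 384, 386),
]


def annotate_lmp1_position(position: int) -> list[str]:
    """List the described LMP1 features that contain a residue position."""
    if not 1 <= position <= 386:
        raise ValueError("LMP1 reference positions run from 1 to 386")
    return [name for name, start, end in LMP1_FEATURES if start <= position <= end]
```

For example, a substitution at position 206 falls in the CT domain and in the CTAR1 TRAF-binding motif.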
Besides its transforming role during latency, LMP1 also seems to facilitate the release of virions from B cells during lytic replication.
Variations in the LMP1 sequence have been widely studied, particularly for their impact on disease occurrence or course. A 30 bp deletion (del30), resulting in the loss of 10 aa in the C-terminal domain (aa 343–352), was first described in the Cao isolate from a Chinese NPC; this isolate also harbored numerous substitutions. As reviewed by Chang et al., a high prevalence of the same deletion was found in Asian NPC biopsy tissues [79, 80], in lymphomas and EBV-related gastric cancers from East Asia, and in Asian nasal NK/T-cell lymphomas [82, 83]. Del30 was often associated with the G335D mutation in NPC, and such strains were reported to have greater in vitro transforming activity than the reference LMP1 [84, 85]. Although the 30 bp deletion partly overlaps CTAR2, it does not alter NF-kB activation and ultimately does not modify signaling properties. However, strains bearing del30 are clearly selected over wt-LMP1 variants in NK/T-cell lymphomas and NPC tumors. Because del30 strains are commonly detected in healthy carriers and in various EBV-associated diseases, and because of the low prevalence of del30 strains in samples from Africa, North America, and Europe [8, 91], it is generally accepted that LMP1 del30 may represent a geographic rather than a disease-associated polymorphism. In a study we carried out in France in patients with NK/T lymphoma, we found a del30 EBV in 4/4 biopsies studied and in 46.1% of the blood samples analyzed, whereas in a control population the deletion was present in 4.8% of cases. Other deletions have also been described, such as the rare C-terminal 69 bp deletion, reported to weakly activate the AP1 transcription factor, and the 15 bp deletion (aa 275–279), frequently encountered in Western Europe.
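Detecting del30 in a sequenced isolate amounts to finding which reference residues are missing in a pairwise alignment. The Python sketch below assumes an existing protein alignment of the isolate against the B95-8 LMP1 reference, with "-" as the gap character; the function names are hypothetical. Because aligners may place a deletion at slightly different positions within repetitive stretches, a real analysis would test for overlap with aa 343–352 rather than the exact match used here.

```python
def deleted_reference_ranges(ref_aln: str, query_aln: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive reference ranges deleted in the query."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    ranges: list[tuple[int, int]] = []
    ref_pos = 0       # 1-based position of the current reference residue
    run_start = None  # first position of the ongoing deletion, if any
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == "-":
            continue  # insertion in the isolate: no reference coordinate
        ref_pos += 1
        if query_char == "-":
            if run_start is None:
                run_start = ref_pos
        elif run_start is not None:
            ranges.append((run_start, ref_pos - 1))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, ref_pos))
    return ranges


DEL30_RANGE = (343, 352)  # 10 aa lost through the 30 bp deletion


def has_del30(ref_aln: str, query_aln: str) -> bool:
    return DEL30_RANGE in deleted_reference_ranges(ref_aln, query_aln)
```

An output containing (343, 352) flags del30, and (275, 279) the 15 bp deletion mentioned above.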
In addition, numerous substitutions have been described in LMP1 (Table 3), particularly in its N-terminal part. Some authors have attempted to classify viral strains on the basis of these substitutions in order to highlight a viral contribution to certain pathologies. Mainou and Raab-Traub thus classified EBV into seven variants (Alaskan, China 1, China 2, Med+, Med-, NC, and B95-8), all with the same in vitro transforming potential and signaling properties. Zuercher et al. reported two polymorphism combinations, I124V/I152L and F144I/D150A/L151I, which appear to be markers of increased NF-kB activation in vitro. Lei et al. distinguished four patterns based on substitutions in both the LMP1 gene and its promoter: all the NPC patients they studied carried a pattern B strain, whereas BL cases were distributed among the four patterns. Many authors recognize two evolutionarily distinct clusters, Asian-derived EBV strains (including GD2, HKNPC1, and Akata) and non-Asian African/American strains (including AG876, B95-8, and Mutu), suggesting that the LMP1 gene could be used as a geographic marker [25, 98].
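The marker combinations reported by Zuercher et al. can be checked against the set of substitutions called for an isolate. In this minimal Python sketch, substitutions are assumed to be written as reference residue, position, and variant residue (for example I124V), matching the notation used in the text; `nfkb_marker_combinations` is a hypothetical helper, and a match only flags the association reported in vitro, not a demonstrated functional effect in a given patient.

```python
# Combinations associated with increased NF-kB activation in vitro (see text).
NFKB_MARKER_COMBINATIONS = {
    "I124V/I152L": frozenset({"I124V", "I152L"}),
    "F144I/D150A/L151I": frozenset({"F144I", "D150A", "L151I"}),
}


def nfkb_marker_combinations(substitutions: set[str]) -> list[str]:
    """Return the reported marker combinations fully present in an isolate."""
    called = {s.strip().upper() for s in substitutions}
    return [name for name, combo in NFKB_MARKER_COMBINATIONS.items() if combo <= called]
```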
|Protein|Type|Main functions|
|---|---|---|
|EBNA-1|Latency|Initiation of viral episome replication before mitosis|
| | |Mitotic segregation of EBV episomes|
| | |Transcription of other latency genes (Cp and LMPp enhancer)|
| | |Degradation of promyelocytic leukemia protein (PML) bodies|
| | |Regulation of cellular transcription|
|EBNA-2|Latency|Viral and cellular transcription factor|
| | |Initiation and maintenance of B cell immortalization|
| | |Blocking of methylation sites for BZLF-1 binding|
|EBNA-LP|Latency|Coactivator of the transcriptional activator EBNA2|
|LMP-1|Latency|Similarity to constitutively activated CD40|
| | |Constitutive activation of cell pathways|
| | |Maintenance of EBV latency and control of cell migration|
|BNRF1|Tegument|Establishment of latency and cell immortalization|
| | |Increase in the number of cellular centrioles|
|BPLF1|Tegument|Downregulation of viral ribonucleotide reductase (RR)|
| | |Disruption of DNA damage repair|
| | |Reduction of innate immunity|
|BKRF3|Tegument|DNA replication and repair; prevention of viral DNA mutagenesis|
| Region concerned | Mutation | Consequence of the mutation | References |
| NT domain (24 aa); mutation rate: 0.33 | E2D | | [85, 97] |
| | H3R/L | | [85, 93, 97] |
| TM1 (20 aa); mutation rate: 0.15 | L25I/M | | [13, 85, 93, 97, 98] |
| | V43I | | [85, 93, 97] |
| | D46N | | [85, 93, 97, 98] |
| TM2 (21 aa); mutation rate: 0.19 | S57A | | [85, 93, 97] |
| | I63V/L | | [85, 93, 97, 98] |
| TM3 (21 aa); mutation rate: 0.24 | A82G | Homo-oligomerization and/or interaction with other molecules in lipid rafts | [85, 98] |
| | I85L | | [85, 93, 97, 98] |
| TM4 (21 aa); mutation rate: 0.43 | F106V/Y | Homo-oligomerization and/or interaction with other molecules in lipid rafts | [85, 97, 98] |
| | I122L | | [85, 93, 98] |
| | I124G/V | I124V + I152L: increased NF-kB activation in vitro | [93, 96, 97] |
| | L126F | | [85, 93, 97, 98] |
| | M129I | M129I: increased LMP1 half-life in epithelial cells | [85, 93, 97, 98] |
| TM5 (22 aa); mutation rate: 0.18 | F144I/D | F144I/D + D150A/L + L151I: increased NF-kB activation in vitro | [85, 96, 98] |
| | D150A | D150A/L + F144I/D + L151I: increased NF-kB activation in vitro | [85, 96, 98] |
| | L151I | L151I + F144I/D + D150A/L: increased NF-kB activation in vitro | [85, 93, 96, 97, 98] |
| | I152L | I152L + I124V: increased NF-kB activation in vitro | [93, 96, 97] |
| TM6 (21 aa); mutation rate: 0.05 | L178M | | [85, 98] |
| CTAR1 (45 aa); mutation rate: 0.15 | Q189P | | [85, 98] |
| | G212S/T | G212S: Erk activation, thus c-Fos induction and binding to AP1 site; SNQ pattern (212–214) + del30 in NK/T biopsies | [85, 93, 97, 98] |
| | H213N | SNQ pattern (212–214) + del30 in NK/T biopsies | [85, 93, 97] |
| | E214Q | SNQ pattern (212–214) + del30 in NK/T biopsies | [85, 93, 97] |
| CTAR3 (56 aa); mutation rate: 0.20 | del275–279 | | [85, 93] |
| | S309N | S309N + del30 + del15 in NK/T biopsies | [85, 93, 97, 98] |
| | Q322E/N/K | | [85, 93, 97, 98] |
| | Q334R | | [85, 93, 97, 98] |
| | G335D/S | | [85, 93, 98] |
| | del343–352 | | [23, 85, 93, 98] |
| CTAR2 (35 aa); mutation rate: 0.23 | L338S/P | | [85, 93, 97, 98] |
| | S366A/Q/T | S366T: Erk activation, thus c-Fos induction and binding to AP1 site | [85, 93, 97, 98] |
Finally, LMP1 carries a molecular signature of an accelerated rate of evolution, probably due to positive selection, as deduced from a significant proportion of nonsynonymous variations .
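Positive selection of this kind is conventionally inferred from the ratio ω of the nonsynonymous to the synonymous substitution rate. The formulation below is the standard counting approach with a Jukes-Cantor correction (as in the Nei-Gojobori method), given only as a reminder; the text does not state which estimator the cited study used.

$$
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}, \qquad
d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,p\right), \qquad
\omega = \frac{d_N}{d_S}
$$

Here N_d and S_d are the observed numbers of nonsynonymous and synonymous differences, N and S are the numbers of nonsynonymous and synonymous sites, and d_N and d_S are obtained by applying the correction to p_N and p_S, respectively. Values of ω well below 1 indicate purifying selection, ω ≈ 1 indicates neutral evolution, and ω > 1 indicates positive selection, that is, an excess of nonsynonymous changes such as the one reported for LMP1.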
Thus, for LMP1, the most oncogenic latency protein, two geographic clusters appear to exist, corresponding to an Asian variant and a non-Asian variant. The 30 bp deletion is found mainly in Asian strains, and del30 strains show an obvious tropism for the nasopharynx. Although many substitutions have been described, few studies have analyzed how they change LMP1 properties. NPC could be associated with a particular strain, but this remains to be confirmed.
5. Variability of tegument proteins
After the latency proteins, tegument proteins carry the most changes. Among them, the most mutated are BNRF1, BPLF1, and BKRF3, which are detailed below, and BBRF2. This latter protein appears to play an important role in viral infectivity , but its structure and function remain poorly understood; for this reason, BBRF2 is not discussed further here.
The major EBV tegument protein BNRF1 contains 1318 aa; its structure is shown schematically in Figure 6. BNRF1 belongs to a protein family with homology to the cellular purine biosynthesis enzyme FGARAT (formylglycinamide ribonucleotide amidotransferase), but it seems to have lost conventional purine biosynthesis activity. BNRF1 is involved in the establishment of latency and cell immortalization by hijacking the antiviral histone chaperone DAXX (death domain-associated protein 6) . It forms a stable quaternary complex with the DAXX histone-binding domain (HBD), H3.3, and H4 ; this complex is responsible for BNRF1 localization to PML nuclear bodies, which are involved in intrinsic antiviral resistance and in transcriptional repression in host cells. In the presence of BNRF1, DAXX can no longer collaborate with ATRX to assemble the histone variant H3.3 into repressive chromatin at GC-rich repetitive DNA. Binding to DAXX occurs via the BNRF1 DAXX interaction domain (DID; aa 360–600), and binding to histones H3.3 and H4 involves BNRF1 residues 40–52 and 99–102. Huang et al.  demonstrated that formation of the quaternary complex is abrogated by the dual mutations V546D/L548D and D568A/D569A in the BNRF1 DID and is partially diminished in vitro by the dual mutations Y390A/K461A and V546S/L548S. The K461A, Y390A/K461A, and V546S/L548S mutations moderately reduced BNRF1 colocalization at PML nuclear bodies, whereas the Y390A, V546A/L548A, and D568A/D569A mutations reduced it severely. Two further domains have been defined: a PurM-like domain (aa 610–976) and a GATase domain (aa 1037–1318). It has also recently been shown that BNRF1 can cause an abnormal increase in the number of cellular centrioles . This phenomenon can lead to aneuploidy or structural chromosome abnormalities and, possibly, to carcinogenesis. The gene regions responsible have not been described.
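The BNRF1 coordinates given above can be collected into a simple interval map to check where the engineered mutations fall. This Python sketch uses only the coordinates quoted in the text; the region labels, the helper names, and the one-letter substitution parser are illustrative assumptions, and the dual mutants are split into single substitutions for lookup.

```python
# Interval map of the BNRF1 regions quoted in the text (inclusive aa coordinates).
# Illustrative sketch: labels, helper names and the parser are assumptions.
import re
from typing import Optional, Tuple

BNRF1_LENGTH = 1318
BNRF1_REGIONS = [
    ("histone-binding residues", 40, 52),
    ("histone-binding residues", 99, 102),
    ("DAXX interaction domain (DID)", 360, 600),
    ("PurM-like domain", 610, 976),
    ("GATase domain", 1037, 1318),
]

# Simple one-letter substitution such as "V546D" (reference aa, position, new aa).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_substitution(text: str) -> Tuple[str, int, str]:
    match = SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"unrecognized substitution: {text!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def region_of(position: int) -> Optional[str]:
    """Return the annotated region containing `position`, or None."""
    if not 1 <= position <= BNRF1_LENGTH:
        raise ValueError(f"position {position} is outside BNRF1 (1-{BNRF1_LENGTH})")
    for name, start, end in BNRF1_REGIONS:
        if start <= position <= end:
            return name
    return None


# Engineered mutations discussed in the text, with dual mutants split into singles.
MUTATIONS = ["Y390A", "K461A", "V546D", "L548D", "V546S", "L548S",
             "V546A", "L548A", "D568A", "D569A"]

for mutation in MUTATIONS:
    _, position, _ = parse_substitution(mutation)
    print(f"{mutation}: {region_of(position) or 'outside annotated regions'}")
```

Every mutation in the list maps to the DID, consistent with the text's statement that these changes affect complex formation through that domain; positions in the gaps between annotated regions (for example aa 605) return None.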
BNRF1 is reported to be one of the most frequently mutated tegument proteins. Interestingly, a nonsense mutation in BNRF1 was described in C666-1, an EBV-positive NPC cell line, without any other major structural alteration of the resulting BNRF1-deleted virus .
Thus, the mutations described for BNRF1 do not appear to follow a particular geographic distribution. However, some of them seem able to modify DNA chromatinization and thus transcription, and could therefore have important consequences for cell function.
BPLF1, the largest EBV protein (3149 aa), is a late lytic tegument protein with deubiquitinating (DUB) activity. BPLF1 is able to downregulate viral ribonucleotide reductase (RR) activity by deubiquitinating the large subunit RR1 , and to specifically deubiquitinate proliferating cell nuclear antigen (PCNA), a DNA polymerase processivity factor, thus disrupting the repair of damaged DNA . By triggering activation of repair pathways and co-opting DNA repair and replication factors, the virus could create genomic instability. The DUB activity resides in the first 246 aa of the N-terminal region, and residue C61 of the catalytic triad (Cys-His-Asp) is essential for this activity . BPLF1 relocalizes Pol η to nuclear sites of viral DNA production, thereby allowing DNA damage to be bypassed . This mechanism contributes to efficient production of infectious virus.
BPLF1 is also able to deubiquitinate cellular factors such as TRAF6, NEMO, and IkBα, leading to inhibition of TLR signaling through both MyD88- and TRIF-dependent pathways and thus decreasing innate immune responses through reduced NF-kB activation and proinflammatory cytokine production . Notably, the same catalytic site also carries a deneddylating activity shown to target cullin-RING ligases, potentially affecting viral replication and infectivity . A role for BPLF1 in helping drive human B-cell immortalization and lymphoma formation has also been discussed .
Sequencing of various viral strains has shown that BPLF1 is one of the proteins with the greatest number of changes [20, 24, 109]. Most of these mutations have not been analyzed in detail, but Kwok et al. , working on the sequences of eight NPC biopsy specimens, reported two nonsynonymous mutations in the N-terminal region of the protein, which carries the deubiquitinating activity. The same finding was reported by Simbiri et al. , who also described three C-terminal mutations (L2935P, P2987L, and R3005Q). Zeng et al.  reported, in an NPC strain, a single-nucleotide deletion coupled with a single-nucleotide insertion three nucleotides away; as a result, two aa substitutions (GA/EG) were predicted to occur. Tu et al.  undertook a phylogenetic analysis based on several reported EBV genome sequences and on some major genes such as BPLF1. They observed that Asian EBV subtypes clustered as a separate branch from non-Asian ones.
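Why a coupled deletion and insertion changes only two residues can be shown with codon arithmetic: the first event shifts the reading frame, and the second event a few nucleotides later restores it, so only the codons spanning that short window are altered. The Python sketch below uses a hypothetical 12-nt sequence (not the actual NPC strain sequence, which is not given here), chosen so that the predicted change mirrors the reported GA/EG substitution; here the insertion precedes the deletion, the helper names are illustrative, and only the standard genetic code is assumed.

```python
# Why a 1-nt insertion and a 1-nt deletion a few nucleotides apart change only
# the codons between them. Hypothetical sequence; helper names are illustrative.
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate complete codons of `seq` in frame 1 ('*' marks a stop)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def insert_then_delete(seq: str, ins_index: int, base: str, del_index: int) -> str:
    """Insert `base` before 0-based ins_index and delete the nucleotide at 0-based
    del_index; both indices refer to the original sequence (del_index >= ins_index)."""
    if del_index < ins_index:
        raise ValueError("this sketch expects the deletion downstream of the insertion")
    without = seq[:del_index] + seq[del_index + 1:]  # upstream indices are unchanged
    return without[:ins_index] + base + without[ins_index:]


def substitutions(ref_protein: str, alt_protein: str) -> list:
    """List residue changes in the usual 'G2E' notation (1-based positions)."""
    return [f"{r}{i}{a}" for i, (r, a) in
            enumerate(zip(ref_protein, alt_protein), start=1) if r != a]


# Hypothetical reference: ATG GGG GCA AAA -> M G A K
REFERENCE = "ATGGGGGCAAAA"
# Insert an A after nucleotide 4 and delete the C at nucleotide 8 (1-based),
# so the three nucleotides in between (5-7) are read in a shifted frame.
VARIANT = insert_then_delete(REFERENCE, ins_index=4, base="A", del_index=7)

print(REFERENCE, translate(REFERENCE))
print(VARIANT, translate(VARIANT))
print(substitutions(translate(REFERENCE), translate(VARIANT)))
```

The variant ATGGAGGGAAAA translates to MEGK instead of MGAK, giving the substitutions G2E and A3G, while the downstream lysine is preserved because the reading frame is restored. The GA-to-EG outcome here follows only from how the hypothetical sequence was chosen; with a real sequence, the same logic predicts whichever codons fall inside the shifted window.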
Thus, as for other proteins, Asian strains seem to carry a BPLF1 protein that differs from that of other strains. Substitutions occurring in the region carrying the deubiquitinase activity could have biological consequences.
BKRF3 is an early lytic gene encoding a small (255 aa) uracil-DNA glycosylase (UDG), which removes inappropriate uracil residues from DNA. BKRF3 excises uracil bases present in double-stranded DNA as a result of uracil misincorporation or, more often, cytosine deamination [110, 111]. It thus participates in DNA replication and repair and prevents viral DNA mutagenesis. BKRF3 shares substantial overall structural similarity with one of the UDG families. Four of its five catalytic motifs are completely conserved (aa 90–94, 110–114, 146–149, and 191–192), whereas the fifth (aa 213–229) carries a seven-residue insertion in the leucine loop . In addition, the first 29 N-terminal aa carry a nuclear localization signal (sequence KRKQ). Only BKRF3 changes that do not severely affect viral replication can be retained, but such mutations may nonetheless alter virus-cell interactions.
The aim of this chapter was to review the most frequently observed variations in the EBV genome and, more particularly, to determine whether some of these variations are considered to be involved in tumor pathology. The candidate viral genes are numerous; those discussed here are the most affected, and the mutations reported for them in the literature have been compiled. Some mutations have been well studied, in particular with regard to their impact on the structure or function of the protein or to the cellular consequences of these modifications. However, most mutations have only been described. Although a tumorigenic impact of viral mutations is not yet established, many authors agree that geographic variants exist, and it seems clear that Asian strains have different characteristics from non-Asian strains. Further work is needed to complete this body of information, with analyses performed not at the level of one or a few genes but at the level of the entire genome.