EBV genomes reviewed in this chapter
Epstein–Barr virus (EBV) is the cause of certain cancers, such as Burkitt lymphoma, Hodgkin lymphoma, NK/T cell lymphoma, nasopharyngeal carcinoma, and a subset of gastric carcinomas. The genome-wide characteristics of EBV are essential to understand the diversity of strains isolated from EBV-related malignancies, provide the first opportunity to test the general validity of the EBV genetic map and explore recombination, geographic variation, and the major features of variation in this virus. Moreover, understanding more about EBV sequence variations isolated from EBV-related malignancies might give important implications for the development of effective prophylactic and therapeutic vaccine approaches targeting the personalized or geographic-specific EBV antigens in these aggressive diseases. In this chapter, we will mainly focus on the EBV genome-wide profiling in three common EBV-related cancers in Asia, including nasopharyngeal carcinoma, EBV-associated gastric carcinoma, and NK/T-cell lymphoma.
- Epstein–Barr virus (EBV)
- next-generation sequencing (NGS)
- nasopharyngeal carcinoma (NPC)
- EBV-associated gastric carcinoma (EBVaGC)
- NK/T-cell lymphoma (NKTCL)
Epstein–Barr virus (EBV), a ubiquitous human herpesvirus discovered in 1964, is classified as a group I carcinogen by the International Agency for Research on Cancer (IARC), since latent infection by EBV has been estimated to be responsible for 200,000 cancer cases worldwide, including Burkitt lymphoma, Hodgkin lymphoma, NK/T cell lymphoma (NKTCL), nasopharyngeal carcinoma (NPC), and a subset of gastric carcinomas. It has been shown that viruses can contribute to the biology of multistep oncogenesis and are implicated in many of the hallmarks of cancer. Notably, the discovery of links between viral infection and cancer types has provided actionable opportunities, such as the use of human papilloma virus (HPV) vaccines as a preventive measure, to reduce the global impact of cancer. However, no approved vaccine for EBV is available to date.
EBV has a double-stranded DNA genome of approximately 172 kilobases. Its expression products include at least 86 proteins and 46 functional small untranslated RNAs [3, 4, 5]. EBV has two distinct life cycles: latency and lytic replication. During latency, viral genomes express only a limited number of latent proteins (EBV-determined nuclear antigen 1 (EBNA1), 2, 3A, 3B, and 3C and EBNA leader protein (EBNA-LP); latent membrane protein 1 (LMP1) and LMP2 (which encodes two isoforms, LMP2A and LMP2B)), noncoding EBV-encoded RNAs (EBER1 and EBER2), and viral miRNAs (BHRF1-miRNA and BART-miRNA). EBV latency is categorized into three types (latency I–III). EBV genomes in type-I latency are known to express EBNA1 and EBER. EBV genomes in type-II latency are known to express more genes, such as EBNA-LP, LMP1, LMP2A, and LMP2B. EBV genomes in type-III latency are known to express the broadest set of latent genes, additionally including EBNA2, EBNA3A, EBNA3B, and EBNA3C. Lytic genes encode viral transcription factors (e.g., BZLF1), a viral DNA polymerase (BALF5) and associated factors, viral glycoproteins (e.g., gp350/220 and gp110), and structural proteins (capsid and tegument proteins).
Southern blot of restriction fragment length polymorphisms was first used to detect EBV strain variation, and Sanger sequencing of certain specific viral genes (e.g., EBNA1 and LMP1) was later developed to detect sequence diversity. Now, on the basis of high-throughput sequencing, genome-wide analysis is becoming possible.
Prior to 2013, EBV whole genome sequences available from GenBank were limited to fewer than 10 strains (B95-8, EBV-WT, GD1, AG876, GD2, HKNPC1, Akata, and Mutu). The prototypic type 1 EBV strain B95-8, from an individual with infectious mononucleosis, was the first complete genome sequenced, using a conventional strategy (i.e., subcloning followed by Sanger sequencing). Subsequently, a more representative type 1 EBV reference genome, the human herpesvirus 4 complete wild type genome (named EBV-WT), was constructed by using B95-8 as the backbone, with the 11-kb deleted segment supplied by the Raji sequences. AG876 was the only complete type 2 EBV sequence, from a Ghanaian case of Burkitt lymphoma. Akata and Mutu were sequenced from Burkitt lymphoma cell lines from a Japanese patient and a Kenyan patient, respectively. GD1, GD2, and HKNPC1 were isolated from NPC patients.
Since 2014, a new technology named Hybrid Capture (Figure 1) has marked a new era of EBV genome sequencing. Target enrichment of EBV DNA by hybridization, followed by next-generation sequencing, de novo assembly, and joining of contigs, can yield complete EBV genomes. The development of high-throughput sequencing technologies has enabled sequencing of EBV genomes derived from a wide variety of clinical samples, such as tumor biopsy samples. The number of available EBV sequences is increasing exponentially: to date, more than 500 EBV genomes have been sequenced from a variety of human malignancies, including NPC, lymphoma, gastric cancer, and lung cancer, as well as from healthy carriers [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. This progress has made it possible to conduct population-based case–control studies of EBV strain variation in patients with EBV-related cancer compared with the healthy population, as well as comprehensive surveys of EBV integration in a variety of human malignancies [20, 25, 26, 27]. These developments have revealed that various EBV strains are differentially distributed throughout the world, and that the behavior of cancer-derived EBV strains differs from that of the prototype EBV strain of noncancerous origin.
Hence, the genome-wide characteristics of EBV are essential to assess the diversity of strains isolated from EBV-related malignancies. Understanding the pattern of EBV sequence variation is also important for determining whether EBV strains vary by disease or by geographic region, and might have important implications for the development of effective prophylactic and therapeutic vaccine approaches targeting personalized or geographic-specific EBV antigens in these aggressive diseases.
In this chapter, the EBV genomes reviewed are from three common EBV-related cancers in Asia: NPC, EBV-associated gastric carcinoma (EBVaGC), and NKTCL. The EBV strains include GD1, GD2, HKNPC1, HKNPC2-HKNPC9, EBVaGC1-EBVaGC9, GDGC1-GDGC2, NKTCL-EBV1-NKTCL-EBV8, NKTCL-SC01-NKTCL-SC15, and NKTCL-SG01-NKTCL-SG12 (Table 1).
| Strain | Origin | Disease | Year |
|---|---|---|---|
| HKNPC1 | Hong Kong, China | NPC | 2012 |
| HKNPC2-HKNPC9 | Hong Kong, China | NPC | 2014 |
| NKTCL-SC01-NKTCL-SC15 and NKTCL-SG01-NKTCL-SG12 | 15 from Southern China, 12 from Singapore | NKTCL | 2019 |
2. Genomic diversity of EBV-related malignancies
NPC, an EBV-associated epithelial carcinoma, has a unique geographical distribution. A recent World Health Organization (WHO) report estimated that there were around 130,000 new NPC cases worldwide in 2018. Rare in most of the world, NPC is particularly prevalent in South China and Southeast Asia. In Hong Kong and Guangdong in South China, NPC incidence is as high as 12.8–25.0/100,000 per year [28, 29]. The cause of NPC endemicity remains unknown.
Many studies have shown that the EBV genome is present in almost all endemic NPC tumors with a unique pattern of viral latent gene expression, suggesting that EBV plays an important role in the tumorigenesis of NPC. Whole genome sequencing helps to characterize these genomes and their divergence. Here, we mainly focus on 11 widely available full-length EBV genomes from NPC.
GD1 (Guangdong strain 1) was the first NPC-derived EBV strain whose full-length sequence was determined, using PCR amplification and subcloning followed by conventional Sanger sequencing. It was analyzed in 2005 from a lymphoblastoid cell line (LCL) established by transforming umbilical cord blood mononuclear cells with virus from the saliva of a Cantonese NPC patient. The entire GD1 sequence is 171,656 bp in length, and GD1 is a type 1 strain. Many sequence variations in GD1 compared to the prototypical strain B95-8 were detected, including 43 deletion sites, 44 insertion sites, and 1413 point mutations. Furthermore, the frequency of some GD1 mutations in Cantonese NPC patients was evaluated, such as a 30-bp deletion in the C terminus of LMP1 and the V-val subtype of EBNA1. The results suggested that GD1 is highly representative of the EBV strains isolated from NPC patients in Guangdong, China, an area with the highest incidence of NPC in the world.
With the invention of next-generation sequencing (NGS) systems, it became possible to determine genome-wide sequences and the viral clonality of EBV strains by direct sequencing of EBV genomes in clinical tumors in a time- and cost-effective manner. GD2, 164,701 bp long, was directly sequenced using the Illumina (Solexa) platform and successfully assembled from an NPC tumor of a patient in Guangdong province, China, by the same group who determined GD1. GD2 was closely related to GD1 by sequence and phylogenetic analyses. The sequence similarity between GD2 and GD1 was 98.76%. GD2 and GD1 shared 505 single-nucleotide variations (SNVs), most of them in the coding regions (348 SNVs [68.91%]), and seven insertions and deletions (indels). From a comparison with the EBV-WT reference genome, a total of 927 SNVs and 160 indels with genome-wide distribution were found in the GD2 genome. The results revealed that NGS allows the characterization of genome-wide variations of EBV in clinical tumors and provides evidence of monoclonal expansion of EBV in vivo.
Because of the relatively small quantity of viral DNA present in a tumor sample, next-generation sequencing of the total cellular and viral DNA is costly and inefficient, and may limit the read depth needed to make high-confidence base calls of the viral genome. Target enrichment technology can increase the relative amount of viral DNA. Kwok et al. reported an approach of PCR enrichment (amplicon sequencing) followed by sequencing of the amplified products on the Illumina Genome Analyzer IIx platform to determine the genome sequence of an EBV isolate from the NPC tumor of a Chinese patient in Hong Kong, designated HKNPC1. HKNPC1 is approximately 171,549 bp long and contains 1589 SNVs and 132 indels in comparison to the reference EBV-WT sequence. Non-synonymous SNVs were mainly found in the latent, tegument, and glycoprotein genes. The same point mutations were found in the glycoprotein genes (BLLF1 and BALF4) of the GD1, GD2, and HKNPC1 strains and might affect cell type-specific binding. The results showed that whole genome sequencing of EBV in NPC may facilitate the discovery of previously unknown variations of pathogenic significance.
Kwok and colleagues then established a complete sequencing workflow comprising target enrichment of EBV DNA by hybridization, followed by next-generation sequencing, de novo assembly, and joining of contigs by Sanger sequencing to yield whole EBV genomes. The sequences of eight NPC biopsy specimen-derived EBV (NPC-EBV) genomes from the same geographic location, designated HKNPC2 to HKNPC9, were determined in order to reveal their sequence diversity. The eight NPC-EBV genome sizes, estimated based on the reference EBV-WT sequence, ranged from 170,062 bp (HKNPC2) to 171,556 bp (HKNPC3 and -6). A total of 1736 variations were found, including 1601 substitutions, 64 insertions, and 71 deletions, compared to the reference EBV-WT genome. Furthermore, genes encoding latent, early lytic, and tegument proteins and glycoproteins were found to contain nonsynonymous mutations of potential biological significance. Thus, the complete sequencing workflow demonstrated much greater sequence diversity among EBV isolates derived from NPC biopsy specimens at the whole-genome level.
Obtaining whole-genome sequence information for more clinical EBV isolates, with good representation of the EBV repertoire in tumors, could help to test whether particular EBV subtypes are pathogenic in NPC tumorigenesis. A case–control study of NPC in Hong Kong (62 NPC patients and 142 population carriers) identified high-risk EBV subtypes with polymorphisms in the EBV-encoded small RNA (EBER) locus. A recent study published in Nature Genetics by Xu et al., entitled ‘Genome sequencing analysis identifies high-risk Epstein–Barr virus subtypes for nasopharyngeal carcinoma’, used large-scale EBV whole-genome sequencing to examine EBV subtypes in an attempt to explain the unique NPC endemicity in South China. Using EBV genomes from 156 NPC cases and 47 controls in a two-stage association study, they identified two non-synonymous EBV variants within the BALF2 gene (BamHI A leftward reading frame 2, encoding a single-strand DNA-binding protein associated with EBV replication) that were strongly associated with the risk of NPC (odds ratio [OR] = 8.69 for SNP162476_C and OR = 6.14 for SNP163364_T). The cumulative effects of these variants contribute to 83% of the overall risk of NPC in southern China. These studies confirmed the critical role of EBV infection in the pathogenesis of NPC and provided an explanation for the striking epidemiological distribution of this tumor in South China.
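Comparisons of this kind reduce to counting differences between aligned sequences. The following is a minimal sketch, assuming the genomes have already been aligned to equal length with `-` for gaps; the function names and the toy sequences are illustrative and are not taken from the GD1 or GD2 studies.

```python
def call_snvs(reference, sample):
    """Return {(position, ref_base, alt_base)} for two aligned sequences.

    Positions are 1-based on the ungapped reference. Columns containing
    a gap or an N in either sequence are skipped (indels are not SNVs).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snvs = set()
    pos = 0
    for r, s in zip(reference.upper(), sample.upper()):
        if r != "-":
            pos += 1
        if r in "-N" or s in "-N":
            continue
        if r != s:
            snvs.add((pos, r, s))
    return snvs


def percent_identity(a, b):
    """Percentage of identical bases over columns without gaps."""
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not real EBV data)
ref = "ATGCCGTA-ACGT"
s1 = "ATGTCGTATACGA"
s2 = "ATGTCGTA-ACGT"

shared = call_snvs(ref, s1) & call_snvs(ref, s2)
print(sorted(shared))
print(round(percent_identity(s1, s2), 2))

# Share of the 505 common SNVs that lie in coding regions (figures from the text)
print(round(100 * 348 / 505, 2))
```

Set intersection gives the SNVs common to two isolates, which is how a "shared SNV" count can be derived once each isolate has been compared with the same reference.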
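For readers less familiar with association statistics, the sketch below shows how an odds ratio and an approximate 95% confidence interval (Woolf's logit method) are computed from a 2×2 table of variant carriers. The counts are hypothetical and chosen only for illustration; they are not the data of Xu et al., whose analysis also involved two stages and adjustments not reproduced here.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval.

    All four cell counts must be positive.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: 90 of 100 cases and 30 of 60 controls carry the variant
estimate, lower, upper = odds_ratio(90, 10, 30, 30)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio well above 1 with an interval that excludes 1 indicates that the variant is more frequent among cases than among controls.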
EBVaGC has been recognized as a distinct subset of gastric carcinoma, accounting for about 10% of total gastric carcinomas [32, 33, 34, 35]. The monoclonal presence of the virus was uniformly distributed in malignant cells of EBV-positive tumors but not observed in the surrounding normal epithelial cells, providing strong evidence to support the role of EBV as an etiologic agent [32, 33]. However, the exact role of EBV in the development and progression of this specific type of gastric carcinoma is not yet clear.
Since NGS technology was developed, progress has been made in understanding the full spectrum of diversity within the EBV genome from EBVaGC clinical tumor samples. Here, 11 EBV strains from primary EBVaGC biopsy samples are included.
Our group reported the first genome-wide view of sequence variation of EBV isolated from primary EBVaGC biopsy specimens in 2016. We used target enrichment of EBV DNA by hybridization, followed by next-generation sequencing. EBV probes were designed according to the full-length genomes of six available EBV strains: EBV-WT, B95-8, AG876, GD1, GD2, and HKNPC1. Judged by coverage of the target region, the DNA sequences generated from the GC-EBV strains most resembled GD1. Thus, GD1 was used as the reference EBV genome in our study. De novo assembly was performed for the nine sequenced GC-EBV strains, and nine EBVaGC genomes were successfully obtained, designated EBVaGC1 to EBVaGC9. The genome sizes, estimated based on the reference GD1 sequence, ranged from 171,612 bp (EBVaGC6) to 171,957 bp (EBVaGC1).
Whole-genome sequencing of EBV enabled the comparison and thus the determination of EBV variations at the genome level. In our study, 961 variations were observed in the EBVaGC1 to 9 genomes in comparison to the reference GD1, including 919 substitutions, 23 insertions, and 19 deletions. Both latent genes and genes encoding tegument proteins in nine GC-EBV genomes were found to harbor the majority of nonsynonymous mutations, accounting for 58.4% (EBVaGC8) to 84.3% (EBVaGC3) of all nonsynonymous mutations detected for each genome.
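The distinction between synonymous and nonsynonymous substitutions can be illustrated with a short sketch. It assumes a forward-strand, in-frame coding sequence and the standard genetic code; the function name and the toy sequence are illustrative only and do not represent an EBV gene.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds, index, alt_base):
    """Classify a substitution at a 0-based index of an in-frame CDS."""
    cds = cds.upper()
    start = (index // 3) * 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("substitution falls in an incomplete codon")
    offset = index - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


toy_cds = "ATGGCTGAA"  # Met-Ala-Glu (hypothetical)
print(classify_snv(toy_cds, 5, "C"))  # GCT -> GCC, both alanine
print(classify_snv(toy_cds, 4, "T"))  # GCT -> GTT, alanine to valine
```

Tallying the nonsynonymous calls per gene class (latent, tegument, glycoprotein, and so on) gives per-genome proportions of the kind reported above.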
EBNA1 is essential for maintenance of the EBV episome in latently infected cells and is the only EBV antigen that is consistently expressed in all EBV-associated malignancies. Based on the amino acid at position 487 in the COOH-terminal region of EBNA1 relative to B95-8 (P-ala), V-val was the most common subtype, accounting for 77.7% of the nine GC-EBV strains, followed by P-thrV, accounting for 22.3%. Multiple studies have shown that V-val is the dominant subtype in Asian regions, not only in EBVaGC but also in NPC and healthy donors, while the V-val subtype is rarely found in Africa, Europe, and America irrespective of source (lymphoma, NPC, EBVaGC, or healthy donors) [37, 38, 39], indicating that the polymorphism of EBNA1 subtypes differs geographically but is not tumor-specific. Apart from changes in the C-terminus, EBNA1 has variations in the N-terminus. Interestingly, we identified two interstrain recombinants at the EBNA1 locus, which provides a further mechanism for the generation of diversity. EBNA1 N-terminus changes have revealed additional variants that cannot be classified simply by the signature amino acid residue 487 in the C-terminus, as was widely done previously. The N-terminus changes reinforce the need to evaluate the EBV genome more comprehensively in order to characterize the full extent of EBV genetic diversity. A comprehensive investigation into the functional and immunological impact of naturally occurring EBNA1 sequence variations and interstrain recombinants is required to evaluate their possible significance, and may also help to clarify the association between EBNA1 subtypes and EBVaGC.
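The residue-487 signature can be turned into a simple lookup. The sketch below is a deliberate simplification: it assumes the EBNA1 protein sequence is already in B95-8 coordinates (real isolates need alignment first because repeat lengths vary), it uses only the commonly cited five-subtype nomenclature (P-ala, P-thr, V-val, V-leu, V-pro), and it cannot distinguish sub-variants such as P-thrV or the N-terminal variants and recombinants discussed above. The sequences are toy placeholders.

```python
from collections import Counter

# Signature residue at position 487 (B95-8 numbering) -> subtype label
SUBTYPE_BY_RESIDUE_487 = {
    "A": "P-ala",
    "T": "P-thr",
    "V": "V-val",
    "L": "V-leu",
    "P": "V-pro",
}


def ebna1_subtype(protein_seq, position=487):
    """Return the EBNA1 subtype implied by the residue at `position` (1-based)."""
    if len(protein_seq) < position:
        raise ValueError("sequence is shorter than the signature position")
    residue = protein_seq[position - 1].upper()
    return SUBTYPE_BY_RESIDUE_487.get(residue, "unclassified")


def toy_ebna1(residue_487):
    """Build a placeholder sequence with a chosen residue at position 487."""
    return "G" * 486 + residue_487 + "G" * 154


isolates = {"iso1": toy_ebna1("V"), "iso2": toy_ebna1("V"), "iso3": toy_ebna1("T")}
counts = Counter(ebna1_subtype(seq) for seq in isolates.values())
print(counts)
```

Subtype frequencies across a cohort then follow directly from the counter.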
2.2.2 GDGC1 and GDGC2
In 2018, NGS was employed to determine the EBV genomes from two EBVaGC specimens, designated as GDGC1 and GDGC2, from Guangdong, China, an endemic area of NPC . Due to the presence of the much more abundant cellular genomic DNA in the DNA preparations, the number of reads belonging to EBV was low, accounting for only 0.02–0.23% of the total reads. However, since the original data were sufficient, the average sequencing depth for genomes GDGC1 and GDGC2 was ~73x and ~24x, respectively, which was sufficient for further analysis. The genome sizes, estimated based on the reference EBV-WT genome sequence, were as follows: GDGC1 (169,611 bp) and GDGC2 (171,299 bp).
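The relationship between the viral read fraction and the resulting depth is simple arithmetic: depth is roughly the number of viral bases sequenced divided by the genome length. A minimal sketch follows; the read count and read length in the usage line are hypothetical, since the text does not state them, and 172,000 bp is the approximate genome size given earlier in this chapter.

```python
def mean_depth(total_reads, viral_fraction, read_length, genome_length):
    """Approximate mean depth from the fraction of reads mapping to the virus.

    Assumes all viral reads are full length and map uniquely and uniformly.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    viral_bases = total_reads * viral_fraction * read_length
    return viral_bases / genome_length


# Hypothetical run: 50 million reads of 100 bp, 0.02% of them from EBV
depth = mean_depth(50_000_000, 0.0002, 100, 172_000)
print(f"~{depth:.1f}x")
```

The calculation shows why a low viral fraction can still give usable depth when the total number of reads is large.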
The authors reported that a total of 1815 SNPs (146 indels) and 1519 SNPs (106 indels) were found in GDGC1 and GDGC2, respectively, compared with the reference EBV-WT genome. Among these, 1229 SNPs (66 indels) and 1076 SNPs (54 indels) were located in the coding regions of GDGC1 and GDGC2, respectively, while the remaining variations were found in the non-coding regions. Consistent with previous reports, there is clear evidence for a higher frequency of SNPs in latent genes, followed by the genes encoding tegument proteins and membrane glycoproteins. In contrast to the frequent mutations in latent genes, the sequences of promoters and ncRNAs were found to be highly conserved: only a few point mutations were found in the sequences of Cp, Qp, Fp, and LMP2Ap, and only scattered mutations could be identified in certain ncRNA sequences. Promoters and EBV-generated ncRNAs play important roles in regulating viral processes and in mediating host-virus interactions. This detailed genome-wide analysis of EBV in EBVaGC from Guangdong should help to further clarify the relationship between EBV genomic variation and EBVaGC carcinogenesis.
The features of disease-associated and geographically associated EBV genetic variation, as well as the roles that this variation plays in carcinogenesis and evolution, remain unclear. A recent study sequenced 95 geographically distinct EBV isolates from EBVaGC biopsies (n = 41) and saliva of healthy donors (n = 54) to detect variants and genes associated with gastric carcinoma across the whole genome. The BRLF1, BBRF3, and BBLF2/BBLF3 genes had significant associations with gastric carcinoma. LMP1 and BNLF2a were strongly geographically associated genes in EBV. The results provided insights into the genetic basis of oncogenic EBV in gastric carcinoma, and the genetic variants associated with gastric carcinoma could serve as biomarkers for oncogenic EBV.
Extranodal NKTCL, a rare type of non-Hodgkin lymphoma, is characterized by the presence of EBV in virtually all cases, irrespective of ethnicity or geographical origin. NKTCL is an aggressive malignancy that occurs predominantly in the nasal, paranasal, and oropharyngeal sites, and it is much more prevalent in East Asia and Latin America than in Western countries.
Although the association of this B-lymphotropic virus with malignancies of T and NK cell origin was quite unexpected, both the presence of viral sequences in tumor cells and the oncogenic potency of the virus have led to the hypothesis that particular EBV strains are preferentially selected in NKTCL. Pathogenesis and genotype analyses of NKTCL have previously focused on genetic variations in a small fraction of EBV genes, which limits their ability to define the spectrum of diversity within the whole EBV genome. The genome-wide characteristics of EBV are essential to understand the diversity of strains isolated from NKTCL. In 2019, for the first time, 35 NKTCL-derived EBV genomes were systematically characterized at the genome-wide level by two groups working in parallel.
Our group directly sequenced EBV-captured DNA from eight primary NKTCL biopsy samples from China using the Illumina HiSeq 2500 platform and presented the eight EBV sequences, designated NKTCL-EBV1-NKTCL-EBV8. To determine the subtype in detail, the obtained DNA sequences were compared with six commonly used reference sequences: AG876, B95-8, EBV-WT, GD1, GD2, and HKNPC1. Coverage was higher for GD1 than for the other references. The genome sizes, estimated based on the reference GD1 sequence, ranged from 171,590 bp (NKTCL-EBV8) to 172,059 bp (NKTCL-EBV1).
Whole-genome sequence alignments revealed extensive nucleotide variation in the eight NKTCL-EBV genomes. In comparison with the most similar GD1 strain, the NKTCL-EBV1 to NKTCL-EBV8 harbored 2072 variations in total, including 1938 substitutions, 58 insertions, and 76 deletions. Among them, 1218 substitutions, 15 insertions, and 26 deletions were located in the coding regions. The number of the nonsynonymous mutations is highest in the gene regions encoding latent proteins in each of the NKTCL-EBV genomes, followed by genes encoding the tegument protein and membrane glycoproteins.
EBNA1 and LMP1 are the most frequently studied regions to date. Based on the amino acid changes at certain residues of LMP1 and EBNA1, the eight NKTCL-EBVs were assigned to the China 1 and V-val subtypes, respectively. Of interest, the EBNA1 sequence of NKTCL-EBV3 clustered away from those of the other seven NKTCL-EBV strains. Analysis of the EBNA1 amino acid sequences suggested that the EBNA1 of NKTCL-EBV3 may have arisen from recombination between GD1 and B95-8. Two other commonly used classification systems for LMP1 gene polymorphisms are based on a 30-bp deletion in the C terminus and the loss of the Xho I restriction site in the N terminus of the gene. LMP1 is a key latent protein that promotes cell proliferation and inhibits apoptosis in NKTCL. In our study, LMP1 in NKTCL-EBV1-NKTCL-EBV7, but not NKTCL-EBV8, harbored the 30-bp deletion. The 30-bp deletion variant of LMP1 has been shown to be associated with poor prognosis in patients with NKTCL and might serve as a potential marker to monitor treatment. In addition, all eight NKTCL-EBV strains had lost the Xho I restriction site in exon 1 of the LMP1 gene.
2.3.2 NKTCL-SC01-NKTCL-SC15 and NKTCL-SG01-NKTCL-SG12
The other group assembled 27 NKTCL-derived EBV genome sequences retrieved from whole-genome sequencing (WGS) data generated on the HiSeq sequencer (Illumina), including 15 EBV-positive NKTCL tumor samples from Southern China and 12 samples from Singapore. The average percentage of EBV sequences in the WGS data was 0.45% (0.03–1.06%), and the coverage depth was 222.2X on average (26.7X–612.8X). As ~34 kb of the 172-kb EBV genome consists of repeat regions, which cannot be properly assembled with short-read sequencing technology, the group assigned “N” to these regions and subsequently joined the scaffolds, resulting in EBV genomes ~172 kb in length.
The authors reported that, among the 27 NKTCL samples, an average of 1152 EBV SNVs per sample was determined by aligning the viral reads against the reference EBV-WT genome. The most frequent tumor-specific non-synonymous mutations in NKTCL-derived EBV were located in the BPLF1 gene (position 49,790–59,239 bp). An average of 44.8 small indels (<50 bp) of EBV were found in each NKTCL sample, and the 30-bp deletion of LMP1 was commonly found in the samples (21/27), a frequency consistent with a previous study that used Sanger sequencing. Large deletions of EBV (>1 kb) were found in 10 of 27 NKTCL samples, without any sequencing coverage in the deleted regions. The findings provided insights into the role of EBV in the etiology of NKTCL.
A genome-wide association study of 189 patients with extranodal NKTCL, nasal type, and 957 controls from Guangdong province, Southern China, was performed to identify common genetic variants affecting individual risk of NKTCL. All cases were genotyped with the Illumina Human OmniExpress ZhongHua-8 BeadChip, and population controls were scanned with the Illumina OmniHumanExpress-24 V1.0 (both Illumina, San Diego, CA, USA). The findings were validated in four independent case–control series. The SNP with the strongest association was rs9277378 (OR 2.65 [95% CI 2.08–3.37]), located in HLA-DPB1, indicating the importance of HLA-DP antigen presentation in the pathogenesis of NKTCL. The pathogenic subtypes of EBV in NKTCL tumorigenesis should be further explored.
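Masking unassemblable repeats with “N” is a simple coordinate operation. The sketch below assumes repeat regions are given as 1-based inclusive intervals; the function names, the toy sequence, and the intervals are hypothetical and are not the coordinates used by the authors.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Replace each 1-based inclusive (start, end) interval with mask_char."""
    seq = list(sequence)
    for start, end in regions:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"interval out of range: {start}-{end}")
        seq[start - 1:end] = mask_char * (end - start + 1)
    return "".join(seq)


def masked_fraction(sequence, mask_char="N"):
    """Fraction of the sequence occupied by the mask character."""
    return sequence.count(mask_char) / len(sequence)


toy_genome = "ACGTACGTAC"            # hypothetical sequence
toy_repeats = [(3, 5), (9, 10)]      # hypothetical repeat intervals
masked = mask_regions(toy_genome, toy_repeats)
print(masked, masked_fraction(masked))

# Share of the genome in repeats, from the approximate figures in the text
print(round(100 * 34 / 172, 1))
```

Keeping the masked sequence at full length preserves reference coordinates, so variant positions remain comparable across genomes.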
3. Phylogenetic analysis of the EBV genomes
Phylogenetic analysis of EBV genomes can reveal overall genomic differences within or across the subtypes of EBV-associated diseases. Whole-genome similarity is therefore likely to give a better inference of the phylogenetic relatedness among EBV genomes than single genes.
Traditionally, EBV has two distinct subtypes, type 1 and type 2. Type 1 EBV (e.g., B95-8, GD1, and Akata) is the main EBV strain prevalent worldwide, while type 2 EBV (e.g., AG876) is abundant only in parts of Africa and New Guinea. Type 1 and type 2 EBV encode different EBNA2 genes, with only 54% amino acid sequence identity. A recent whole genome sequencing study confirmed that EBNA2 and EBNA3 are the only genes that can distinguish type 1 and type 2 EBV strains. Genome sequencing technologies have developed alongside tools for genome analysis. High-throughput sequencing technology such as Illumina dye sequencing was introduced to successfully sequence viral genomes. As an example of an analysis tool, Molecular Evolutionary Genetics Analysis (MEGA) is used both for conducting statistical analysis of molecular evolution and for constructing phylogenetic trees.
The NPC genomes from Asian EBV strains, including GD1, GD2, and HKNPC1-HKNPC9, are type 1 viruses and clustered in a branch distant from the non-Asian strains AG876 and B95-8. Analysis of LMP1 and -2 showed a phylogenetic relationship corresponding to the geographical origin of the viral genomes instead of the type 1 and 2 dichotomy, indicating that the LMP1 and -2 genes can serve as geographical markers. GD1 seemed to harbor many mutations that were not present in the other Chinese strains. The HKNPC6 and -7 genomes, which were isolated from tumor biopsy specimens of advanced metastatic NPC cases, were distinct from the other NPC-EBV genomes. Future work should investigate the relationship between distinct EBV lineages and the clinical stages of NPC.
The GC-EBV strains considered here, EBVaGC1-EBVaGC9 and GDGC1-GDGC2, were closely related to all Asian-derived EBV strains and distant from the non-Asian strains, and the EBV sequences generally clustered in a manner consistent with geographical location [17, 21]. Neighbor-joining trees derived from the sequences of the EBNA2 gene showed that all the GC-EBV genomes are type 1 viruses, clustered in a branch with other type 1 EBV strains and distant from the only type 2 EBV strain, AG876. Phylogenetic trees based on the LMP1 gene and whole EBV genomes indicated that the nine EBVaGC strains were closely related to all Asian-derived EBV strains and distant from the non-Asian strains, suggesting that the LMP1 gene can serve as a geographical marker. This is in line with the previous results from the NPC-EBV genomes. In addition, phylogenetic analyses of specific EBV-encoded genes in GDGC1 and GDGC2 suggested the presence of at least two parental lineages of EBV, as GDGC1 clustered closely with GD2, while GDGC2 clustered closely with GD1.
In our recent study, phylogenetic trees were constructed based on an alignment of the eight full-length NKTCL-EBVs and 28 previously published strains. Of note, the eight NKTCL-EBV genomes clearly sort into type 1, based on differences across the whole genome and especially in EBNA2. The eight NKTCL-EBVs were related to other Asian EBV strains, including EBVaGC1–9, HKNPC1–9, GD1, and GD2 obtained from China, and Akata from Japan, whereas none of the specimens clustered in the branch of non-Asian strains AG876, B95-8, and Mutu. The other group compared the sequences of the 27 NKTCL-derived EBV genomes with 164 EBV genome sequences from public databases to determine the sequence diversity of EBV. Phylogenetic analysis revealed clear clustering of EBV isolates, first according to their respective geographic origin; moreover, EBV isolates derived from NKTCL samples tended to cluster closely, apart from clusters of other diseases, supporting the hypothesis that disease-specific EBV exists. However, whether this distinct EBV has been driving the development of NKTCL or has simply adapted to the niche of NKTCL as a bystander awaits further investigation.
In this chapter, phylogenetic analysis was conducted on full-length EBV genomes, including 11 NPC-EBV strains (GD1, GD2, HKNPC1-HKNPC9), 11 GC-EBV strains (EBVaGC1-EBVaGC9, GDGC1-GDGC2), 35 NKTCL-EBV strains (NKTCL-EBV1-NKTCL-EBV8, NKTCL-SC01-NKTCL-SC15, NKTCL-SG01-NKTCL-SG12), B95-8, EBV-WT, Mutu, Akata, and AG876 (Figure 2). The resulting phylogenetic tree supports the conclusion that EBV strains are shaped more by geographic region than by the particular EBV-associated malignancy.
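Distance-based tree methods such as neighbor-joining start from a matrix of pairwise distances. The sketch below covers only that first step, using the uncorrected p-distance (proportion of differing sites) on toy aligned sequences; it does not build a tree, and the isolate names and sequences are hypothetical rather than real EBV data. Tools such as MEGA offer corrected distance models and the tree construction itself.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with gaps or Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(aligned):
    """Map every ordered pair of isolate names to their p-distance."""
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in aligned for n in aligned}


def nearest(name, aligned):
    """Name of the isolate with the smallest distance to `name`."""
    others = [n for n in aligned if n != name]
    return min(others, key=lambda n: p_distance(aligned[name], aligned[n]))


toy_alignment = {                     # hypothetical isolates
    "asian_A": "ATGCTTGACCGTAAGT",
    "asian_B": "ATGCTTGACCGTAAGA",
    "non_asian": "ATGGTAGACAGTCAGT",
}
print(nearest("asian_A", toy_alignment))
```

In this toy example the two "Asian" sequences differ at one site and are each other's nearest neighbors, mirroring the geographic clustering described above.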
4. Amino acid changes in CD4+ and CD8+ T-cell epitopes
Sequence variations of EBV genes also result in amino acid epitope exchanges, which should have a significant impact on EBV-specific T-cell immunity.
Among the shared non-synonymous SNVs of the Chinese derived GD1, GD2 and HKNPC1 isolates, 34 are associated with known EBV-specific epitopes; 19 and 15 are found in CD8+ and CD4+ epitopes, respectively . HKNPC2-9 genomes harbored nonsynonymous mutations in epitopes specific for both CD4+ and CD8+ T cells . Amino acid changes were found in seven CD8+ epitopes of LMP2, five epitopes of EBNA3A, and three or fewer in other proteins. Thirteen CD4+ epitopes of EBNA1, six in LMP1, six in LMP2, five in EBNA2, and three or fewer in other proteins contained amino acid changes. Some of the nonsynonymous mutations were affecting multiple epitopes.
EBVaGC shows EBV type I latency neoplasm, in which EBNA1 is expressed in 100% and LMP2A in about half of EBVaGC cases, respectively . Recent studies show that EBNA1, as well as LMP2A, can be presented to both CD4+ and CD8+ T cells, highlighting its potential importance in the development of therapeutic strategies against EBV-associated malignancies [46, 47]. There is some clear evidence for sequence variation affecting immune recognition of EBNA1 and potential epitope selection for vaccine development . So far, most research on the EBNA1 protein has been focused exclusively on the B95-8 strain alone [46, 47]. Sequence analysis of the gene encoding EBNA1 in EBV isolates from nine EBVaGC specimens has revealed considerable EBNA1 sequence divergence from the B95-8 strain . Importantly, T cell recognition of EBNA1 epitope might be greatly influenced by this sequence polymorphism as adoptive transfer of EBNA1-targeted T cells has a potential use in immunotherapy of EBV associated carcinomas.
NKTCL is associated with type II EBV latency, in which only a restricted set of EBV antigens, namely EBNA1, LMP1, and LMP2, is expressed. These EBV-encoded proteins might be targets of immune recognition during persistent infection, and nonsynonymous variations in their CD4+ and CD8+ T-cell epitopes may affect the efficacy of cytotoxic T lymphocyte (CTL)-based therapy. In previous studies, many epitopes in EBV antigens have been defined, mapped, and correlated with major histocompatibility complex type. In our study, we mainly investigated the amino acid changes in CD4+ and CD8+ T-cell epitopes of EBNA1, LMP1, and LMP2A. Compared with B95-8, amino acid changes were found in 3 CD8+ epitopes of EBNA1, 8 of LMP1, and 12 of LMP2A. Eleven CD4+ epitopes of EBNA1, 13 of LMP1, and 9 of LMP2A contained amino acid changes. Some of the nonsynonymous mutations affected multiple epitopes. In another study, alterations of known T-cell epitopes were examined, and detected, in EBV sequences derived from NKTCL samples. Notably, 21 of these epitope alterations, which were significantly enriched in NKTCL samples, were restricted to six EBV genes: EBNA3A (G373D, F325L, I333K, L406P, S412R, H464R, M466R, T585I, and A588P), EBNA3B (A399S, V400L, V417L, K424T, Y662D, and K663E), EBNA3C (P916S), BARF1 (V29A), BCRF1 (V6M), and BNRF1 (G456R, S497G, and A1289T).
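The observation that a single nonsynonymous mutation can affect several epitopes follows from overlapping epitope windows on the same protein. A minimal Python sketch of that check is given here. The substitution labels (F325L, I333K, L406P) are taken from the EBNA3A list above, but the epitope coordinates, the `Epitope` and `Substitution` types, and the helper names are hypothetical placeholders chosen for illustration; they are not taken from any epitope database or from the cited studies.

```python
import re
from typing import List, NamedTuple


class Epitope(NamedTuple):
    protein: str
    start: int        # 1-based, inclusive residue position
    end: int          # 1-based, inclusive residue position
    t_cell: str       # "CD4" or "CD8"


class Substitution(NamedTuple):
    protein: str
    ref: str          # reference (B95-8) residue
    pos: int          # 1-based residue position
    alt: str          # residue observed in the tumor-derived strain


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(protein: str, label: str) -> Substitution:
    """Parse a label such as 'G373D' into reference residue, position, and variant."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return Substitution(protein, ref, int(pos), alt)


def affected_epitopes(sub: Substitution, epitopes: List[Epitope]) -> List[Epitope]:
    """Return every epitope on the same protein whose window contains the position."""
    return [
        e for e in epitopes
        if e.protein == sub.protein and e.start <= sub.pos <= e.end
    ]


# HYPOTHETICAL epitope windows, for illustration only.
EPITOPES = [
    Epitope("EBNA3A", 320, 334, "CD4"),
    Epitope("EBNA3A", 329, 337, "CD8"),
    Epitope("EBNA3B", 399, 407, "CD8"),
]

subs = [parse_substitution("EBNA3A", s) for s in ("F325L", "I333K", "L406P")]
hits = {f"{s.ref}{s.pos}{s.alt}": affected_epitopes(s, EPITOPES) for s in subs}

for label, found in hits.items():
    print(label, [f"{e.t_cell}:{e.start}-{e.end}" for e in found])
```

With these placeholder windows, I333K falls inside two overlapping EBNA3A windows, F325L inside one, and L406P inside none, because the window that spans position 406 belongs to a different protein. A real analysis would replace `EPITOPES` with curated epitope coordinates and their HLA restrictions.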
Therefore, these data have implications for the development of effective prophylactic and therapeutic vaccine approaches targeting personalized EBV antigens in these aggressive diseases. Adoptive transfer of cytotoxic T cells (CTLs) specific for EBV antigens has proved safe and effective as prophylaxis and treatment for EBV-associated lymphoproliferative disease. Some patients with advanced-stage or relapsed EBV-associated malignancies achieved complete remission after treatment with autologous LMP1/2- and EBNA1-specific CTLs, or with CTLs activated by peptides derived from LMP1/2 [49, 50]. Nonetheless, some cases still did not respond to LMP-CTL therapy, and this failure was usually attributed to immune escape through antigen loss. It is worth noting that all these previous studies used the prototype EBV sequence, B95-8, to design full-length LMP epitopes. The epitope variation described in recent work therefore offers an alternative explanation for the lack of tumor response: whether changes in such epitopes confer immune evasion on the tumor cells is a hypothesis for future testing.
5. Genomic integration of viral sequences
Viral integration into the host genome has been shown to be a causal mechanism that can lead to the development of cancer. Not surprisingly, known tumor-associated viruses, such as EBV, HBV, HPV16, and HPV18, were among the most frequently detected targets. Notably, whole-genome sequencing (WGS) is a sensitive approach for detecting viruses, particularly for the common integration events verified for HBV, HPV16, and HPV18 in a variety of studies [53, 54, 55]. The known causal role of HPV16 and HPV18 in several tumor entities, which triggered one of the largest measures in cancer prevention, has been the motivation for extensive elucidation of the pathogenetic processes involved. High-confidence integration events were demonstrated for HBV (liver cancer) and for HPV16 and HPV18 (in both cervical and head and neck carcinoma), whereas only low-confidence integration events were detected for EBV (gastric cancer and malignant lymphoma).
Comprehensive analyses of WGS datasets may reveal novel findings on EBV integration. Recently, a comprehensive survey of EBV integration in a variety of human malignancies, including NPC, EBVaGC, and NKTCL, was conducted using EBV genome capture combined with ultra-deep sequencing, which could efficiently distinguish integrated EBV sequences from the background “noise” introduced by nuclear EBV episomes. The EBV integration rates were 25.6% (10/39), 16.0% (4/25), and 9.6% (17/177) in the EBVaGC, NKTCL, and NPC tumors, respectively, which were lower than the rates of HPV integration in cervical cancer (76.3%) and head and neck squamous cell carcinoma (60.7%), and of HBV integration in hepatocellular carcinoma (92.6%) [54, 57, 58, 59]. The authors found that EBV integrations into introns could decrease the expression of the inflammation-related genes TNFAIP3, PARK2, and CDK15 in NPC tumors. The EBV integration breakpoints were frequently located at oriP or the terminal repeats and were surrounded by microhomology sequences, consistent with an integration mechanism involving viral genome replication and microhomology-mediated recombination, which has an important role in the integration of the other tumorigenic viruses HBV and HPV [54, 59]. Meanwhile, other researchers observed integrations of short EBV fragments into human chromosomes, coincident with episomal EBV genomes, in NKTCL. They detected 31 EBV-host integration sites in eight NKTCL samples, enriched in repeat regions of the human genome such as SINE, LINE, and satellite sequences.
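The integration rates quoted above can be reproduced from the reported counts. The short Python sketch below does this and, as an added illustration that is not part of the cited study, attaches a 95% Wilson score interval to each rate to show how wide the uncertainty is for the smaller cohorts. The function names and the choice of interval are assumptions made here for illustration.

```python
from math import sqrt

# Integration-positive tumors and total tumors, as reported in the text.
COUNTS = {"EBVaGC": (10, 39), "NKTCL": (4, 25), "NPC": (17, 177)}


def integration_rate(positive: int, total: int) -> float:
    """Percentage of tumors with at least one detected integration event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


def wilson_interval(positive: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    p = positive / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


rates = {tumor: round(integration_rate(k, n), 1) for tumor, (k, n) in COUNTS.items()}

for tumor, (k, n) in COUNTS.items():
    lo, hi = wilson_interval(k, n)
    print(f"{tumor}: {rates[tumor]}% ({k}/{n}), 95% interval {100 * lo:.1f}-{100 * hi:.1f}%")
```

The rounded rates match the percentages given in the text (25.6%, 16.0%, and 9.6%). The intervals are only a rough guide to sampling uncertainty and do not account for differences in detection sensitivity between tumor types.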
However, there are still few studies on EBV integration based on WGS technology. In addition, the authors of these studies selected only some of the potential breakpoints for validation by PCR and Sanger sequencing. For example, Xu et al. randomly selected 12 integrations from the 197 breakpoints identified in NPC and other EBV-associated malignancies, and only 10 of them were successfully validated. Because integration of EBV sequences into the host genome and the consequent disruption of important host genes might represent a novel mechanism of tumorigenesis in EBV-associated malignancies, all potential EBV integration breakpoints should be validated, and the biological function of the host genes involved should be investigated further.
In conclusion, full-length EBV genomes isolated from primary NPC, EBVaGC, and NKTCL biopsy specimens have been successfully sequenced, and their sequence diversity has been analyzed at the whole-genome level, although their role in pathogenesis remains to be clarified. Phylogenetic analysis has shown that all the aforementioned NPC-, GC-, and NKTCL-EBV strains are type 1 EBV and close to other Asian subtypes, leading to the conclusion that EBV strain variation is shaped more by geographic region than by the particular EBV-associated malignancy. In addition, sequence variations in EBV genes produce amino acid changes within T-cell epitopes, which are expected to have a significant impact on EBV-specific T-cell immunity. Recent data suggest that EBV-targeted treatments could be optimized by basing them on the EBV genome from the individual patient, or at least on the predominant strains of the relevant geographic region, instead of the commonly used B95-8 genome. Further characterization of these molecular events should provide more information on the exact mechanisms underlying the pathogenic potential and clinical significance of these strains.
This work was supported by grants from the National Natural Science Foundation of China (81903155 to Y.L.), the Beijing Municipal Natural Science Foundation (7202023 to Y.L.), and the Beijing Hospitals Authority Youth Program (QML20181106 to Y.L.).
Conflict of interest
The authors declare no conflict of interest.