The Study of Hepatitis B Virus Using Bioinformatics

Hepatitis refers to the inflammation of the liver. A major cause of hepatitis is the hepatotropic virus, hepatitis B virus (HBV). Annually, more than 786,000 people die as a result of the clinical manifestations of HBV infection, which include cirrhosis and hepatocellular carcinoma. Sequence heterogeneity is a feature of HBV, because the viral-encoded polymerase lacks proofreading ability. HBV has been classified into nine genotypes, A to I, with a putative 10th genotype, "J," isolated from a single individual. Comparative analysis of HBV strains from various geographic regions of the world and from different eras can shed light on the origin, evolution, transmission, and response to anti-HBV preventative and treatment measures. Bioinformatics tools and databases have been used to better understand HBV mutations and how they develop, especially in response to antiviral therapy and vaccination. Despite its small genome size of ~3.2 kb, HBV presents several bioinformatic challenges, which include the circular genome, the overlapping open reading frames, and the different genome lengths of the genotypes. Thus, bioinformatics tools and databases have been developed to facilitate the study of HBV.


Introduction
Bioinformatics is primarily the use of computational science to study biological and clinical data using statistics, mathematics, and information theory. The field is still developing and evolving, so no precise definition is possible. It is also broad, ranging from the study of DNA and proteins to structural biology, drug design, comparative genomics, transcriptomics, proteomics, and metagenomics. Optimizing computational technology is paramount in order to handle, store, manage, and analyze the large volumes of data generated in the last decade. These data include molecular sequencing data of host and pathogen genomes and their associations with demographic and clinical records, laboratory test results, and information on treatment. Moreover, bioinformatics can aid in the investigation of virus-host genome and environmental interactions and in the identification of both host and viral biomarkers. Such analysis can lead to a better understanding of the clinical manifestations of disease and to the effective design of preventative and treatment measures [1].
In the first section, we describe the unique genomics and molecular biology of hepatitis B virus (HBV). Using illustrative examples, we show how bioinformatics analyses can facilitate the understanding of the origin, evolution, transmission, and response to antiviral agents of HBV. Next, we describe the bioinformatics challenges posed by HBV and present the public databases and tools currently available for the study of HBV.

Hepatitis
Hepatitis refers to the inflammation of the liver. A major cause of hepatitis is the hepatotropic virus, HBV. HBV infection is a public health problem of worldwide importance. Globally, 2 billion people have been exposed to this virus at some stage of their lives, and 240 million are chronic carriers of the virus [2]. This infection can lead to a spectrum of clinical consequences. In the majority of cases, the infection is subclinical and transient, whereas in 25% of cases it causes self-limited acute hepatitis, which progresses to acute liver failure in 1% of these cases. The virus can persist in 90% of neonates and 5-10% of adults, leading to chronic infection that can progress to either chronic hepatitis or an asymptomatic carrier state. Both of these states can ultimately progress to liver cancer, or hepatocellular carcinoma (HCC), with or without the intermediate cirrhotic stage. Annually, more than 786,000 people die as a result of these clinical manifestations of HBV infection [3].

Prevalence
The prevalence of HBV in a community can be estimated from the proportion of the population who are hepatitis B surface antigen (HBsAg)-positive carriers. HBV prevalence varies widely across the world [3]. The prevalence is low (<1%) in northern Europe, Australia, New Zealand, Canada, and the United States of America. Northern Asia, the Indian subcontinent, parts of Africa, eastern and south-eastern Europe, and parts of Latin America are areas of intermediate prevalence (1-5%). The high prevalence areas (5-20%) include East and Southeast Asia, the Pacific Islands, and sub-Saharan Africa.

Classification and structure
HBV, the prototype member of the family Hepadnaviridae, belongs to the genus Orthohepadnavirus. With a diameter of 42 nm and a DNA genome of ~3.2 kilobases (kb), it is the smallest DNA virus infecting humans. The genome is circular and partially double stranded. One DNA strand is complete, except for a small nick (the minus strand), and the other is short and incomplete (the plus strand). The minus strand contains four overlapping open reading frames (ORFs; Figure 1) [4] that represent: (1) the preS/S gene that codes for the envelope proteins, large, middle, and small HBsAgs; (2) the P gene for DNA polymerase/reverse transcriptase (POL); (3) the X gene for the X protein, a key regulator during the natural infection process, which has transcriptional trans-activation activity and is required to initiate and maintain HBV replication [5]; and (4) the precore/core gene that codes for the HBcAg or core protein that forms the capsid and for an additional protein known as HBeAg, which is not incorporated into the virus itself but is expressed on the liver cells and secreted into the serum. Figure 2 illustrates the structure of the hepatitis B virion.

Regulatory elements of HBV
Every nucleotide of the HBV genome lies within at least one protein-coding region and may also be part of one of the regulatory elements of HBV, which overlap with the protein-coding regions. The regulatory elements include the S1 and S2 promoters, which overlap both the preS region and polymerase ORFs; the preC/pregenomic promoter, which includes the basic core promoter (BCP) and overlaps the X and preC ORFs; and the X promoter. There are two enhancers (enhancer I and enhancer II) as well as cis-acting negative regulatory elements (URR: upper regulatory region, CURS: core upstream regulatory sequence, NRE: negative regulatory element). These regulatory elements control transcription (reviewed in [6,7]).

Replication of HBV
HBV and other members of the family Hepadnaviridae have an unusual replication cycle. These DNA viruses replicate by reverse transcription of an RNA intermediate known as the pregenomic RNA (pgRNA) [8]. Entry into the cell is via the sodium taurocholate cotransporting polypeptide (NTCP), a multiple transmembrane transporter predominantly expressed in the liver [9]. After entry, the virion is uncoated and the core particle is actively transported to the nucleus [10], where the partially double-stranded relaxed circular DNA molecule is released. The single-stranded gap is closed by the viral polymerase to yield a covalently closed circular molecule of DNA (cccDNA) [11], which is the template for transcription by the host RNA polymerase II [12]. The mRNAs are transported into the cytoplasm, where they are translated into the seven viral proteins. In addition to being translated into the polymerase and the core protein, the pgRNA is packaged into immature core particles by the process known as encapsidation. In order to be encapsidated, the 5′ end of the pgRNA has to be folded into a particular secondary structure known as the encapsidation signal (ε) [13].
The encapsidation signal (ε) is a bipartite stem-loop structure, consisting of an upper and lower stem, the bulge, and an apical loop. Besides encapsidation, ε has a number of other functions (reviewed in [13] and references therein). It acts in template restriction, so that not just any piece of RNA is encapsidated, and it also plays a role in the activation of the viral polymerase, so that there is no indiscriminate reverse transcription. It is also involved in the initiation of reverse transcription. The polymerase, or reverse transcriptase, acts as a primer of RNA-directed DNA synthesis by binding to the bulge of ε. The first three nucleotides of the negative strand of DNA are synthesized at the bulge and are transferred to an acceptor site on the 3′ end of the pgRNA, where DNA synthesis proceeds toward the 5′ end of the pgRNA [14], giving rise to the immature virion. The virus matures by acquiring its glycoprotein envelope, containing HBsAg, in the endoplasmic reticulum and is exported by vesicular transport from the cell [15].

Genotypes and subgenotypes of HBV
Sequence heterogeneity is a feature of HBV, because the viral-encoded polymerase lacks proofreading ability as mentioned above [16]. Using phylogenetic analysis of the complete genome of HBV and an intergroup divergence of greater than 7.5%, HBV has been classified into nine genotypes, A to I [17,18,19], with a putative 10th genotype, "J," isolated from a single individual [20]. With between ~4 and ~8% intergroup nucleotide difference across the complete genome and good bootstrap support, genotypes A-D, F, H, and I are classified further into at least 35 subgenotypes [21]. The genotypes differ in genome length, the size of ORFs and the proteins translated [17], as well as the development of various mutations [22]. Generally, the genotypes, and in some cases the subgenotypes, have a distinct geographic distribution (Table 1).

Table 1. Comparison of the virological and clinical characteristics of the genotypes and subgenotypes of HBV ¶.
# And in regions outside Africa where there was historical forced migration as a result of the slave trade [23].
¥ Vietnamese residing in Canada [24].

Genotyping and subgenotyping methods
HBV genotypes, and in some cases subgenotypes and various mutations, can influence the clinical course of disease [22] as well as response to antiviral therapy [25] and can be used to show transmission [26] and to trace human migrations [23]. Thus, HBV genotyping is becoming increasingly relevant in the clinical setting, may contribute to future personalized treatment [27], and may be important in epidemiological and transmission studies. Bioinformatics has played a major role in the development of various tools that can be used for identifying genotypes/subgenotypes and detecting various mutations. Therefore, a number of methods have been developed [28,29].
Although analysis of the HBV S gene sequence is sufficient to classify HBV into genotypes [30], the complete genome sequence provides additional information with respect to phylogenetic relatedness [31,32], including the identification of recombinants. Furthermore, even though complete genome analysis is the gold standard for genotyping, it does not allow for rapid and direct analysis on a large scale [17], and it requires expertise, and thus capacity development, in computer processing coupled with phylogenetic analyses. In order to expedite and facilitate genotyping, a number of methods have been developed [17,28,29]. Each one has its advantages and disadvantages [17,28,29], which should be taken into account when selecting the genotyping method appropriate for a particular study or application.

Phylogenetic analyses of HBV
Although, as already mentioned, the error-prone polymerase of HBV leads to sequence heterogeneity [16], the degree to which this can occur is constrained by the partially overlapping ORFs and the presence of secondary RNA structures, such as ε, coded by nonoverlapping regions [33,34]. The HBV genome has been estimated to evolve with an error rate of ~10⁻³ to 10⁻⁶ nucleotide substitutions/site/year [35-41], although this rate is not constant within the different regions of the HBV genome [41]. The progress of computers and information technology has played an important role in the development of phylogenetic analysis as a powerful tool in the analysis of the molecular evolution of viruses.
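To give a feel for the scale of these estimates, the short sketch below converts a per-site rate into an expected number of substitutions per genome per year. It assumes a genome length of 3,200 nucleotides (a round figure for the ~3.2 kb genome) and a uniform rate across all sites, which, as noted above, is a simplification. The function name is illustrative only.

```python
GENOME_LENGTH = 3200  # assumed round figure for the ~3.2 kb HBV genome


def substitutions_per_genome_per_year(rate_per_site, genome_length=GENOME_LENGTH):
    """Expected substitutions per genome per year, assuming a uniform rate."""
    return rate_per_site * genome_length


# The two ends of the estimated range quoted in the text.
for rate in (1e-3, 1e-6):
    print(rate, substitutions_per_genome_per_year(rate))
```

Under these assumptions, the upper estimate corresponds to roughly three substitutions per genome per year, whereas the lower estimate corresponds to roughly one substitution per genome every 300 years.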
As exemplified in the next sections, comparative analysis of HBV strains from various geographic regions of the world and from different eras can shed light on the origin, evolution, transmission, and response to anti-HBV preventative and treatment measures.

Origin
The origin and age of the family Hepadnaviridae remain controversial. Until the issues with the estimation of the substitution rate of HBV [41] are overcome, the debate on the origin of HBV will continue ([17,41] and references cited therein). Nonetheless, bioinformatics, coupled with a growing number of hepadnaviral sequences in the databases, with accurate sampling times, and advances in phylogenetic and coalescent methodology [42], is beginning to shed light on this issue. For example, according to Suh and colleagues [43], analysis of the endogenous sequences in the zebra finch provides direct evidence that the compact genomic organization of hepadnaviruses has not changed during the last 482 million years of hepadnaviral evolution. Furthermore, phylogenetic analyses and the distribution of HBV relics suggest that birds are potentially the ancestral hosts of the family Hepadnaviridae and that mammalian hepatitis B viruses probably emerged after a bird-mammal host switch [43].

Evolution
Genetic variation is important in viral evolution. The sequence heterogeneity displayed by HBV because of the lack of proofreading ability of the polymerase is limited by functional constraints [33], leading to non-random variation [44]. Moreover, mutations can be affected by host-virus interaction and selective pressure, imposed endogenously by the immune system and exogenously by vaccination and antiviral treatment [17]. Phenotypic resistance to antiviral drugs occurs because of mutations in the reverse transcriptase of POL, whereas mutations in the BCP/preC and preS regions have been implicated as risk factors for the development of HCC. Mutations in the S region coding for HBsAg can lead to both vaccine and detection escape of HBV. At any time, the virus population can be composed of a number of different mutants referred to as "quasispecies" [45]. Direct sequencing and, more recently, next generation sequencing (NGS), in parallel with bioinformatics, provide us with powerful tools to study the evolution of the various HBV mutations. NGS, or ultra-deep sequencing, generates large volumes of data, which can only be analyzed using bioinformatics tools, and provides deep coverage that can detect minor quasispecies populations of HBV [46-51] that may be important in understanding HBV pathogenicity and response to treatment. In order to minimize the number of artifactual calls of single-nucleotide variations in NGS, it is important that the correct reference sequences are used [51,52].
By designing a circular construct, Homs and co-workers [53] were able to use NGS to study the evolution of both the precore and polymerase regions. They demonstrated the presence of precore mutants in the HBeAg-positive phase, wild-type precore in the HBeAg-negative phase, as well as lamivudine-resistant strains in treatment-naïve patients. This demonstrates that viral strains occurring at low frequencies can act as reservoirs or memory genomes, which are selected and evolve in response to both intrinsic (host immune response) and extrinsic (drug administration) factors.
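The detection of low-frequency variants described above can be illustrated with a minimal sketch that tallies base frequencies per alignment column. It is not the method used in the cited studies: it assumes reads that are already aligned to the same region and of equal length, it ignores base quality and sequencing error, and the function name and the 1% default threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def minor_variants(reads, min_freq=0.01):
    """Report non-majority bases per column with frequency >= min_freq.

    `reads` are equal-length strings aligned to the same region.
    Gap ('-') and 'N' characters are excluded from the depth.
    Returns {1-based column: {base: frequency}}.
    """
    if not reads:
        return {}
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to equal length")

    result = {}
    for col in range(length):
        counts = Counter(read[col] for read in reads if read[col] not in "-N")
        depth = sum(counts.values())
        if depth == 0:
            continue
        majority_base, _ = counts.most_common(1)[0]
        minors = {
            base: n / depth
            for base, n in counts.items()
            if base != majority_base and n / depth >= min_freq
        }
        if minors:
            result[col + 1] = minors
    return result
```

For instance, in a toy set of 98 reads of "ACGT" and 2 reads of "ACAT", the sketch would report an "A" at column 3 at a frequency of 2%, provided the threshold is set at or below that value.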

Transmission and tracing human migrations
Sequencing and bioinformatics have played an important role in demonstrating transmission routes, for which previous evidence could only be anecdotal. For example, molecular characterization of HBV together with phylogenetic analysis was used to demonstrate inter-spousal transmission of HBV, even after long marriages, in two Japanese patients who developed acute liver failure [54]. Similarly, the first known case of transfusion-transmitted HBV infection by blood screened using individual donor nucleic acid testing was confirmed by the 99.7% sequence identity between the complete genome sequences of the donor and the recipient HBV strains [26]. When migration events were estimated by ancestral state reconstruction using the criterion of parsimony, it was shown that Africa was the most probable source of dispersal of subgenotype A1 of HBV globally and that its dispersal to Asia and Latin America occurred as a result of the slave and trade routes [23,55].

Treatment response and resistance to treatment
According to international chronic hepatitis B treatment guidelines, the most desirable endpoint of treatment is HBsAg loss. Following HBsAg loss, patients have better clinical outcomes, including a decreased risk of developing cirrhosis and HCC, and of death [56]. However, the currently available treatments, which include either nucleos(t)ide analogues (NAs) for direct inhibition of the viral polymerase or pegylated interferon (PegIFN) for immune-mediated HBV control, generally achieve HBV DNA suppression and HBeAg loss only, which are not enduring. In an attempt to identify viral factors associated with HBsAg loss, Charuworn et al. [57] demonstrated that viral diversity could differentiate those patients who would lose HBsAg when treated with tenofovir disoproxil fumarate. Lower diversity was seen in the protein-encoding regions of HBV from patients who lost HBsAg compared to those who did not. On the other hand, higher diversity in regulatory elements of HBV was found to be a predictor of HBsAg loss [57]. These findings need to be confirmed by studies incorporating larger numbers of patients, as well as genotypes other than A and D.
The high mutation rate of HBV means that it can evolve to develop resistance against NAs that target the viral DNA polymerase. Drug-resistant mutants are selected under drug pressure, allowing HBV to survive in the presence of the NA. The development of drug resistance mutations can be affected by HBV DNA levels at baseline, rate of viral suppression, length of NA treatment, and prior exposure to NA treatment [58]. Sequential treatment with different NAs, following drug failure, can lead to the development of multidrug resistance, which cannot be treated using currently available drugs [59]. The most frequent lamivudine drug resistance mutants are rtM204V/I, which are also selected by the L-pyrimidine analogues emtricitabine, clevudine, and telbivudine but are susceptible to the purine analogues adefovir and tenofovir [59]. rtA181V develops following lamivudine treatment but is sensitive to other NAs, whereas rtN236T is resistant to adefovir only. In deciding on treatment options, the detection of genotypic resistance, which is defined as the detection of viral mutations conferring drug resistance, is a priority in clinics. Direct sequencing and NGS of the polymerase region of the HBV genome can detect both well-defined and novel mutations.
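As an illustration of how genotypic resistance might be called from sequence data, the sketch below checks a reverse transcriptase (rt) amino acid sequence for the three mutations named above. It is a simplified example and not a clinical tool: it assumes the input starts at rt position 1 and contains no insertions or deletions, and the table and function names are hypothetical. The notes in the table only paraphrase the preceding paragraph.

```python
# Mutations named in the text, keyed by position in the rt domain.
RESISTANCE_MUTATIONS = {
    181: {"wild_type": "A", "mutants": {"V"},
          "note": "develops following lamivudine treatment"},
    204: {"wild_type": "M", "mutants": {"V", "I"},
          "note": "lamivudine resistance"},
    236: {"wild_type": "N", "mutants": {"T"},
          "note": "adefovir resistance"},
}


def call_resistance(rt_protein):
    """Return [(mutation name, note)] for known mutations in an rt sequence.

    Assumes `rt_protein` starts at rt position 1 with no indels,
    so that string index = rt position - 1.
    """
    calls = []
    for position in sorted(RESISTANCE_MUTATIONS):
        if position > len(rt_protein):
            continue  # sequence too short to cover this position
        info = RESISTANCE_MUTATIONS[position]
        residue = rt_protein[position - 1]
        if residue in info["mutants"]:
            name = f"rt{info['wild_type']}{position}{residue}"
            calls.append((name, info["note"]))
    return calls
```

A real pipeline would first align the query to a genotype-matched reference, since positions can shift when the sequence is incomplete or contains indels.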
Bioinformatics tools and databases have been used to better understand HBV mutations and how they develop, especially in response to antiviral therapy and vaccination. Although laboratory methods have been used to study mutations, they are labor intensive, expensive, and limited in the degree of complexity they can investigate. As a more economical alternative, bioinformatics and computer simulation can use available biological data, such as protein sequence and structural information, to investigate interactions between virus, host, and the environment [60]. Thus, Shen et al. [60] showed that most mutations develop in the hydrophobic regions of HBsAg and POL and that the amino acids that are more likely to be mutated are serine and threonine. Understanding how amino acid mutations develop in HBV proteins can facilitate the rational design of both vaccines and drugs [60], for the prevention and treatment of HBV infection, respectively. Using bioinformatics to compare viral and host genomic patterns, together with clinical information, with data from databases can lead to enhanced and individualized antiviral therapy.

Bioinformatics challenges of HBV
Despite its small genome size of ~3.2 kb, HBV presents several bioinformatic challenges:

1. The genome is circular, with position 1 conventionally taken to be the first "T" nucleotide in the EcoRI restriction site ("GAATTC"). Historically, position 1 was the start of the "Core" region, which is position 1901 in the current numbering system. Therefore, a number of sequences deposited earlier in the public databases are numbered using this outdated system and thus require processing before they can be used in alignments together with more recently submitted sequences.

2. Four overlapping reading frames are encoded in the circular genome, whereas nucleotides or amino acids are sequenced and processed linearly. Extracting nucleotide or amino acid sequences for the S and POL ORFs, which span the EcoRI site, from full-length or subgenomic fragments requires additional processing.

3. The differences in genome lengths between the nine HBV genotypes (ranging from 3182 to 3248 base pairs in length) mean that direct comparison of loci between genotypes is not always possible using the current numbering system. These differences in genome lengths result in genotype alignments containing several regions of gaps, ranging from 3 to 33 nucleotides in length. A possible solution is the implementation of a standardized "universal numbering system" for all HBV genotypes, which we are currently developing.

4. Sequence variability is a feature of HBV. It is, therefore, essential to check all sequences carefully, to distinguish between artifacts and true variation (mutations). Variation within a population at a locus may result in two overlapping peaks on a chromatogram. Superinfections or co-infections with different strains may result in mixed populations, which appear as multiple or misaligned peaks on sequencing chromatograms. Disambiguating these is essential for robust downstream analyses.
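The first two challenges can be sketched in a few lines of code. The example below is a minimal illustration and not the authors' implementation: it assumes a full-length genome containing exactly one EcoRI site, and the function names are hypothetical. Sequences without a unique site, and subgenomic fragments, would have to be positioned by alignment against a reference instead.

```python
ECORI_SITE = "GAATTC"


def rotate_to_ecori(seq):
    """Rotate a circular genome so that position 1 is the first T of GAATTC.

    Returns None when the sequence does not contain exactly one site.
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first five bases so a site spanning the linear ends is found.
    doubled = seq + seq[:len(ECORI_SITE) - 1]
    hits = [i for i in range(n) if doubled.startswith(ECORI_SITE, i)]
    if len(hits) != 1:
        return None
    start = (hits[0] + 3) % n  # the first T is the fourth base of the site
    return seq[start:] + seq[:start]


def extract_circular(seq, start, end):
    """Extract 1-based inclusive positions start..end from a circular genome.

    When start > end, the region is taken to wrap around position 1.
    """
    if start <= end:
        return seq[start - 1:end]
    return seq[start - 1:] + seq[:end]


# Toy example (not a real HBV sequence).
genome = "CCCGAATTCAAAGGG"
print(rotate_to_ecori(genome))
print(extract_circular("ABCDEFGHIJ", 9, 2))
```

Treating the region as wrapping whenever the start coordinate exceeds the end coordinate is one simple convention for ORFs that span the origin; other conventions, such as working on a duplicated genome, are equally possible.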

Public sequence databases
The first public sequence database, "GenBank," was established in 1982, having arisen from the earlier Los Alamos database, established in 1979 [61,62]. Since then, the number of nucleotides in GenBank has doubled approximately every 18 months [63]. The International Nucleotide Sequence Database Collaboration (INSDC) is a collection of three publicly available nucleotide (DNA or RNA) sequence databases, which synchronize data daily [64]. The collection consists of the DNA DataBank of Japan (DDBJ, located in Japan), the European Molecular Biology Laboratory (EMBL, located in the United Kingdom), and GenBank (located in the United States of America). The latest release of GenBank (release 211.0, from 15 December 2015; [65]) contains 189,232,925 loci and 203,939,111,071 bases, from 189,232,925 sequences, totaling approximately 742 gigabytes. In addition to the INSDC, many other databases exist, including genome databases, protein sequence, structure and interaction databases, microarray databases, and meta-databases. A list of biological databases on Wikipedia includes over 200 entries [66].
A search of the GenBank database [63] for "hepatitis b virus" across all fields, performed on 27 January 2016, returned 105,745 sequences. Restricting the search to the "organism" field returned 84,119 sequences, with the oldest sequence submitted in the early 1980s. Refining this search to include only sequences of 200 nucleotides or longer, and excluding words such as "recombinant," "clone," and "patent," resulted in 68,762 sequences. When this same query was previously executed on 29 November 2015, 67,893 sequences were returned. Therefore, in the 59 days between the two queries, 869 new sequences (of at least 200 nucleotides in length, and not containing the words mentioned previously) were uploaded to GenBank. On average, this equates to almost 15 new HBV sequences added to GenBank per day.
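The arithmetic behind this estimate can be reproduced directly from the two query dates and sequence counts given above. The variable names are illustrative only.

```python
from datetime import date

first_query = date(2015, 11, 29)
second_query = date(2016, 1, 27)
count_first, count_second = 67_893, 68_762

days = (second_query - first_query).days     # interval between the queries
new_sequences = count_second - count_first   # sequences added in that interval
per_day = new_sequences / days               # average daily submission rate

print(days, new_sequences, round(per_day, 1))  # prints: 59 869 14.7
```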
Making use of these sequences in downstream applications, such as multiple sequence alignments or phylogenetic analyses, is often challenging, as it is difficult to retrieve sufficient sequences that are of the correct genotype or subgenotype and that cover the required genomic region.
In order to overcome this limitation, we have developed a bioinformatics solution whereby all sequences matching a query are downloaded, curated, and aligned. The algorithm allows for the generation of a multiple sequence alignment for each genotype, which contains all the available sequences matching the query in their correct position and orientation [67].

Table 2. List of the online tools developed and the workflow process at which each would be used ¶.
GenBank submission
• PadSeq: places two HBV sequence fragments on a backbone template
¶ Table modified from Bell and Kramvis [68].
* Described for the first time here.

Bioinformatics -Updated Features and Applications
A standard molecular biology laboratory workflow includes DNA extraction, polymerase chain reaction (PCR) amplification, direct DNA sequencing, viewing and checking of chromatograms, preparation of curated sequences, multiple sequence alignment, sequence analysis, serotyping, genotyping, phylogenetic analysis, and preparation of sequences for submission to the GenBank public sequence database [68]. Each of these steps presents data processing challenges, many of which have been addressed by the development of a suite of online tools (Table 2) [68]. These stand-alone, web-based tools can be accessed from any operating system platform and from any location with an internet connection. There is no requirement to install and learn new bioinformatics software, as the tools can be used as and when required. A system for processing ultradeep pyrosequencing (amplicon resequencing) data has also been developed [51]. In addition, a number of HBV-specific websites and databases are currently available, a selection of which is presented in Table 3.

New bioinformatics tools for HBV
Here, we present two newly developed tools for the bioinformatic analysis of HBV.

Divergence calculator [http://hvdr.bioinf.wits.ac.za/divergence/]
One method of classifying HBV sequences into genotypes or subgenotypes is to examine the nucleotide sequence divergence between sequences. This divergence calculation is performed by totaling the number of nucleotides that differ between two aligned sequences and computing the percentage difference. The divergence calculator (Figure 3) performs various divergence calculations on groups of sequences from nucleotide or amino acid multiple sequence alignments in FASTA format. A minimum of one group containing two sequences, or two groups containing one sequence each, must be specified. As an example, consider an alignment of 10 genotype A sequences (group 1) and 10 genotype D sequences (group 2). Intra-group divergence, for each group, is calculated by comparing each sequence in group 1 with each other sequence in group 1 and then calculating the median, mean, and standard deviation of the divergences. This is then repeated for group 2. The intergroup divergence compares each sequence in group 1 with each sequence in group 2 and then calculates the median, mean, and standard deviation. If more than two groups are specified, the calculations iterate over all groups in turn.
If the optional "query" group is specified, the tool compares each sequence in the query group with each sequence in the other group or groups but outputs statistics for each sequence in the query group individually. This method would typically be used with a set of unknown query sequences and one or more groups of reference sequences. A comprehensive list of descriptive statistics is included on the output page for each analysis.
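The intra- and intergroup calculations described above can be sketched as follows. This is a minimal illustration of the idea rather than the code behind the online tool: all function names are hypothetical, every alignment column (including gaps and ambiguous bases) is counted, and the sample standard deviation is used, whereas the tool itself may treat these cases differently.

```python
from itertools import combinations, product
from statistics import mean, median, stdev


def divergence(a, b):
    """Percentage of positions that differ between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    differences = sum(x != y for x, y in zip(a, b))
    return 100.0 * differences / len(a)


def summarize(values):
    """Median, mean, and standard deviation of a list of divergences."""
    return {
        "n": len(values),
        "median": median(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
    }


def intra_group(group):
    """Compare each sequence with each other sequence in the same group."""
    return summarize([divergence(a, b) for a, b in combinations(group, 2)])


def inter_group(group1, group2):
    """Compare each sequence in group 1 with each sequence in group 2."""
    return summarize([divergence(a, b) for a, b in product(group1, group2)])


# Toy alignment of 10 columns (not real HBV sequences).
group1 = ["AAAAAAAAAA", "AAAAAAAAAT"]
group2 = ["AAAAAAAATT", "AAAAAAATTT"]
print(intra_group(group1))
print(inter_group(group1, group2))
```

The per-sequence output of the "query" mode could be obtained in the same way by calling `inter_group([query_sequence], reference_group)` once for each query sequence.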

Random FASTA extraction and allocation (RAFAEL) [http://hvdr.bioinf.wits.ac.za/rafael/]
In some analyses, particularly when constructing phylogenetic trees, it may be desirable to extract one or more random subsets of sequences from a master or reference alignment. The "RAFAEL" tool was designed to perform this task. This tool takes an input file in FASTA format, which does not have to be aligned, and generates one or more subsets of the file, each containing a random selection of the specified number of sequences. The number of sequences may be specified as a count or as a percentage of the number of sequences in the input file.
Each subset is guaranteed to contain no duplicate sequences. However, the same sequence may appear in more than one subset, as the subsets are drawn independently of one another.
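The behavior described above, sampling without replacement within a subset but independently between subsets, can be sketched with the Python standard library. This is an illustration of the approach and not the RAFAEL source code; the function names and the optional `seed` parameter are assumptions introduced for the example.

```python
import random


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def random_subsets(records, size, n_subsets=1, as_percentage=False, seed=None):
    """Draw n_subsets random subsets, each without internal duplicates.

    `size` is a count, or a percentage of the input when as_percentage=True.
    Subsets are drawn independently, so records may recur across subsets.
    """
    rng = random.Random(seed)
    if as_percentage:
        size = round(len(records) * size / 100)
    if not 0 < size <= len(records):
        raise ValueError("subset size must be between 1 and the input size")
    return [rng.sample(records, size) for _ in range(n_subsets)]


fasta = ">s1\nACGT\n>s2\nACGA\n>s3\nACGC\n>s4\nACGG\n"
records = parse_fasta(fasta)
for subset in random_subsets(records, 50, n_subsets=3, as_percentage=True, seed=1):
    print([header for header, _ in subset])
```

Fixing the seed makes a particular selection reproducible, which can be useful when the same subsets need to be regenerated for a later analysis.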

Open-source software
In addition to biological databases, a large variety of biological analysis software, which is generally genome agnostic, is available. As with software in any field, the licensing terms and commercial costs of these packages vary widely. For example, packages that are free of cost are not necessarily open-source.
The Free Software Foundation (FSF) [76,77] defines free software as software which "respects the users' freedom" in the sense that "users have the freedom to run, copy, distribute, study, change, and improve the software". As such, "free" is "a matter of liberty, not price". Free software, therefore, does not necessarily have to be made available at no cost or be a noncommercial project. Furthermore, software that is provided at no cost may not be "free" in the sense described above.
The term "open-source" is often used when referring to "free" software. However, the two terms are not synonymous, although there is some overlap. Open-source software may, or may not, be free software, depending on the restrictions placed on users by the software license. If the user is not free to distribute, change, and improve the software, even if it is open source, then it cannot be considered to be free software. Most software for which a license is purchased is neither free nor open source. The user does not have the freedom to distribute the software or to use it on any computer chosen.

Recommended software
A list of recommended software, freely available for download, is presented in Table 4.

Table 4. Bioinformatics software available free of charge for various computer operating system platforms.
* "GUI" = graphical user interface, "CL" = command line interface, "OSS" = open-source software, "Lin" = GNU/Linux, "Mac" = Apple Macintosh, "Win" = Microsoft Windows, "Emu" = emulator or virtual machine recommended by authors, "Com" = compilation from source code required.

Conclusion
The unique genome structure and molecular biology of HBV pose a number of challenges. The development of bioinformatic tools to address these challenges has facilitated a more comprehensive and detailed analysis and understanding of the origin, evolution, transmission, and response to antiviral agents of HBV and of its interaction with the host. There is a wide range of free and commercially available tools, which have been developed for different applications. The availability and applications of high-throughput sequencing techniques and the advancement of "-omics" will continue to provide additional challenges, which will need to be addressed by further computational solutions.

Figure 1. The genome of hepatitis B virus (HBV). The partially double-stranded DNA (dsDNA) with the complete minus (−) strand and the incomplete plus (+) strand. The four open reading frames (ORFs) are shown: precore/core (preC/C) that encodes the e antigen (HBeAg) and core protein (HBcAg); P for polymerase (reverse transcriptase); PreS1/PreS2/S for surface proteins [three forms of HBsAg, small (S), middle (M), and large (L)]; and X for a transcriptional trans-activator protein.

• Plots chromatogram quality scores
• Automatic contig generator tool: generates a contig from a forward and reverse chromatogram
Alignment
• Automatic alignment clean-up tool: eliminates "gap-columns" and disambiguates ambiguous bases
• Mind the gap: splits FASTA file based on gap threshold per column
Analysis
• Babylon: extracts HBV protein sequences (ORFs)
• Wild-type 2 × 2: calculates 2 × 2 wild-type/mutant contingency tables
• Divergence calculator*: intra- and inter-group divergence with custom groups
• Rafael*: generates random subsets from an input FASTA file
• Serotyping
• Generates a phylogenetic tree

Figure 3. The input screen of the divergence calculator, in which sequences are extracted and allocated to groups and other parameters are specified.

Table 3. Currently available HBV websites and databases ¶.