Multiple Functions and Disordered Nature of Nucleocapsid Proteins of Retroviruses and Hepadnaviruses

This chapter presents small viral proteins that orchestrate replication of the human immunodeficiency virus type 1 (HIV-1) and the human hepatitis B virus (HBV), two canonical examples of small human pathogens. The HIV-1 nucleocapsid protein (NC) and the C-terminal domain (CTD) of the HBV core protein (HBc) are essential structural components of the virus capsid that protect the viral genome; they also chaperone replication of the HIV-1 genomic RNA and of the HBV DNA by reverse transcription, and later they kick-start virus morphogenesis. HIV-1 NC and the HBV CTD belong to the family of intrinsically disordered proteins (IDPs), a characteristic that makes a large number of molecular interactions possible. Although these viral proteins share little sequence homology, both are rich in basic amino acids and endowed with RNA-binding and chaperoning activities. Similar viral RNA-binding proteins (vRBPs) are also encoded by other virus families, notably flaviviruses, hantaviruses, and coronaviruses. We discuss how these vRBPs function in light of the abundant RBP family, which plays key physiological roles via multiple interactions with non-coding RNAs that regulate immune defenses and cell stress. Moreover, these RBPs are flexible molecules, allowing dynamic interactions with many RNA and protein partners in a semi-solid milieu that favors biochemical reactions.


Forewords on viruses and RNA chaperones
Viruses that replicate their genome by reverse transcription (RTion) are common in animals, plants, algae, and fungi [1]. These so-called reverse-transcribing viruses were classified into five families, namely, Caulimoviridae, Hepadnaviridae, Metaviridae, Pseudoviridae, and Retroviridae, to which Belpaoviridae was recently added [2]. Among these widespread viruses, two are major human pathogens: the human immunodeficiency virus type 1 (HIV-1) and the human hepatitis B virus (HBV).
Retroviruses exist as infectious exogenous RNA viruses as well as endogenous retroelements (ERV) present at high copy numbers in the genome of vertebrates.
Hepadnaviruses can also integrate their genome into the host genome, but at a much lower rate [3].
Replication of the genome of these two classes of viruses requires a reverse transcription step. For HIV-1, the genomic RNA is about 9600 nt long, has a structure similar to that of cellular mRNAs with a 5′ cap and a 3′ poly(A) tail, and contains 9 genes leading to the expression of 15 proteins. Retroviruses replicate their genome by a copy-and-paste mechanism, whereby the single-stranded positive-sense genomic RNA is converted into a double-stranded DNA by the virion reverse transcriptase (RT enzyme) [4], and this DNA is subsequently integrated into the host genome [5]. The integrated viral DNA, called the provirus, is expressed by the host transcription machinery to synthesize the full-length viral RNA (FL RNA), which, after nuclear export to the cytoplasm, is translated by the ribosomes to synthesize the precursors of the major structural proteins and enzymes, Gag and Gag-Pol. Specific interactions of the genomic RNA with the Gag polyprotein precursor drive Gag polymerization and viral core assembly at the plasma membrane (PM) [6].
For hepadnaviruses, the small double-stranded DNA genome in a relaxed circular form (rcDNA) is targeted to the nucleus after infection, where it is converted into a covalently closed circular form (cccDNA) and expressed by the transcription machinery of the infected cell to synthesize the full-length RNA called pre-genomic RNA (pgRNA) [7]. Upon translation of the pgRNA, the newly made core protein and RT enzyme interact with the pgRNA to synthesize the ds DNA genome. The HBV genome has unique features, namely a pseudo-circular structure and extensive gene overlap: its 3200 nt carry four coding sequences leading to the expression of seven proteins [8]. In addition, several HIV and HBV proteins were found to be multifunctional, notably the NC, TAT, and VIF proteins for HIV and the core protein (HBc) for HBV [9,10].
These two classes of viruses probably emerged during the early Paleozoic Era, some 450-520 million years ago, with a marine origin [11]. The HBVs seem to originate from non-enveloped progenitors called nackednaviruses present in fishes, some 400 million years ago [12].
In addition to an RNA/DNA-dependent DNA polymerase called reverse transcriptase, with an associated RNase H activity, these two classes of small viruses encode a core protein endowed with RNA-binding, unwinding, annealing, and matchmaker activities. This protein can also drive the formation of nucleoprotein complexes with a gel-like milieu that favors molecular crowding and biochemical reactions such as reverse transcription.
This chapter will briefly review the multiple roles of the core proteins drawing a parallel between the HIV-1 Gag and the HBV core. In fact these viral core proteins turn out to be much more than a structural component forming a cage enveloping the genome since they provide assistance to the RT-RNase H enzyme at all steps of viral DNA synthesis and then ensure stability of the newly made viral DNA.
Despite common functions in HIV and HBV morphogenesis and replication, the core protein appears much different from Gag on an amino acid sequence basis, but taking a closer look at their activities and functions reveals that these viral proteins are similar.
Figure 1. Standard assays for monitoring the nucleic acid chaperone activity. These in vitro chaperoning assays summarize several properties of nucleic acid chaperone proteins, notably their ability to rapidly anneal complementary nucleic acid sequences (top panel) and favor formation of the most stable duplex, in physiological-like conditions. Bottom panel: R+ and R− sequences represent the 5′ end repeats of the HIV-1 genome of 96 nt in length. R− (mut) contains three mutated residues at its 3′ end in order to generate 3 nt mismatch upon annealing to R+; this was achieved by incubating R+ and R− mut at 66°C for 1 h. Next R− WT is added together with NC protein for 5 min at 30°C. The duplex and ss R− mut were resolved by native gel electrophoresis. Adapted from Darlix et al. [15].
In addition, there are two small peptides, p1 and p2, flanking NC in Pr55 Gag (Figure 2) [16]. The N-terminus is myristoylated, which, together with a row of basic residues within MA, targets Gag to the plasma membrane where assembly takes place [17].
In infected cells, the full-length viral RNA is translated by the ribosome machinery to produce the Gag and Gag-Pol polyprotein precursors. The present model of assembly stipulates that newly made Gag molecules accumulate in the cytoplasm, probably in the vicinity of the translating polysomes [18], where they kick-start virus assembly (Figure 3). This is achieved through two types of interactions: (i) Gag-NC with the 5′ untranslated region (5′ UTR) of FL RNA [19] and (ii) the myristoylated matrix domain with phospholipids of the T-cell membrane [20]. These interactions target the Gag-RNA nucleoprotein complexes to the plasma membrane, causing Gag-oligomer formation; the nucleocapsid domain binds and selects the genomic RNA, causing its dimerization, and at the same time, together with the capsid domain, boosts Gag multimerization (Figure 3). These interactions of Gag with phospholipids and with RNA lead to virus assembly at the plasma membrane. Subsequently, virus maturation occurs during the budding process, together with the recruitment of the envelope glycoproteins by the matrix domain [21] (Figure 3).
Maturation is a complex process whereby the core of HIV-1 becomes conical and, at the same time, the genomic RNA dimer is condensed, leading to the formation of infectious particles [23]. However, most HIV-1 virions, and more generally most retroviral particles, are noninfectious: the ratio of infectious to noninfectious particles ranges from 1:10 to 1:10⁴ [24,25]. This is most probably due to the loss of envelope proteins or degradation of the genomic RNA, or because the particles correspond to defective-interfering particles (DIPs), which can lead to an underestimation of virus infectivity [26].
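To make the quoted range concrete, the short Python sketch below converts a 1:N ratio of infectious to noninfectious particles into the fraction of all particles that are infectious. It is only an illustration of the arithmetic; the function name `infectious_fraction` is introduced here and does not come from the cited studies.

```python
def infectious_fraction(noninfectious_per_infectious: float) -> float:
    """Fraction of all particles that are infectious, given a 1:N ratio
    of infectious to noninfectious particles (N = the argument)."""
    if noninfectious_per_infectious < 0:
        raise ValueError("the ratio denominator must be non-negative")
    # One infectious particle out of (1 + N) particles in total.
    return 1.0 / (1.0 + noninfectious_per_infectious)


# The two bounds quoted in the text: 1:10 and 1:10^4.
for n in (10, 10_000):
    print(f"1:{n} -> {infectious_fraction(n):.4%} of particles are infectious")
```

Under these assumptions, the infectious fraction spans roughly three orders of magnitude between the two bounds.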

Characteristics of retroviral nucleocapsid proteins
Retroviral nucleocapsid proteins are small basic proteins with either one CCHC zinc finger (MuLV, a gammaretrovirus) or two (HIV and FIV, lentiviruses; RSV, an alpharetrovirus) (Table 1). The zinc fingers are structured upon Zn²⁺ binding (in red), while the flanking domains are disordered and basic. These viral proteins are therefore members of the large family of intrinsically disordered proteins (IDPs) and intrinsically disordered protein domains (IDPDs) [27-29]. Of note, all these NC proteins are endowed with RNA-binding and chaperoning activities, as shown using in vitro reconstituted systems [15,30-32]. Another important characteristic is the ability of these NC proteins to drive the formation of nucleoprotein complexes capable of recruiting enzymes such as reverse transcriptase and integrase (IN) [33]. In this gel-like milieu, molecular crowding can take place, thus facilitating enzymatic reactions such as cDNA synthesis by RT and integration by IN. Along this line, NC interacts with RT and improves the fidelity of cDNA synthesis in several ways: (i) inhibiting self-primed initiation of cDNA synthesis, (ii) chaperoning the obligatory minus- and plus-strand transfers required to synthesize the LTRs flanking the viral DNA (Figure 1), and (iii) improving the processivity of RT as well as its excision repair activity, resulting in a much higher fidelity of viral DNA synthesis (Figure 1 on chaperoning assays).
How is this achieved? According to Uversky, protein-RNA interfaces are most probably very large, involving basic, hydrophobic, and aromatic residues engaged, respectively, in ionic, hydrophobic, and intercalating interactions [34-36]. The interactions between NC and RT are poorly understood, but they appear to require the RNA template as a scaffold [37,38].
In that respect, the RTp66 subunit with its active site appears to be extremely flexible, notably with a large template-binding pocket. These observations favor the notion that the viral proteins NC and RT, together with the template RNA, make up a replication machine that is flexible within an active nucleoprotein complex, in agreement with the proposal of Uversky [29]. These in vitro and ex vivo studies on retroviral Gag polyproteins and NC proteins have essentially been carried out using HIV-1 (Table 1). Additional experiments were performed with NCp12 of the alpharetrovirus RSV, which has two zinc fingers flanked by basic residues; NCp10 of the gammaretrovirus MuLV, which has a single zinc finger flanked by basic residues; and NCp9 of the yeast retrotransposon Ty3, which has a single zinc finger and basic residues. They gave very similar results with respect to RNA binding, chaperoning, and ribonucleoprotein complex formation in vitro (Table 1).

The core protein of HBV and its roles in virus assembly and viral DNA synthesis
HBV is an enveloped virus with a 3.2 kb partially double-stranded DNA genome, referred to as rcDNA [7], that is synthesized by reverse transcription of the pgRNA [39]. The core protein contains 183-185 residues organized in two domains (Figure 4): the N-terminal domain (NTD, residues 1-140), which oligomerizes into the capsid structure, and the basic C-terminal domain (CTD, residues 150-183), connected to the NTD by a flexible linker [40,42,43]. The core CTD interacts with nucleic acids and is endowed with nucleic acid annealing, matchmaker, and aggregating activities [8,44].
Figure 4. HBV core primary sequence. The core protein is divided into two parts: the N-terminus (1-140) and the C-terminus (150-183). The N-terminus is sufficient in vitro for the process of self-assembly [8]. The NTD monomer contains a series of five α helices; the third and fourth helices associate in a four-helix bundle, giving the characteristic spikes at the surface of assembled capsids [40]. The C-terminus contains arginine residues essential for the interaction of core with NA, an activity that is regulated by the phosphorylation of the seven serine residues [41]. The effect of mutations in the flexible linker between NTD and CTD (141-149) suggests that the orientation of NTD relative to CTD is essential in the multistage process of HBV replication [42].
Figure 5. (1) Virus attachment to the sodium-taurocholate cotransporting polypeptide (NTCP) and entry [45]. (2) Nucleocapsid release in the cytoplasm upon fusion of the cellular and viral membranes and trafficking to the nucleus [43]. This traffic is probably mediated by the CTD containing NLSs [46]. (3) Nuclear pore attachment of the nucleocapsid and release of the rcDNA (relaxed circular) into the nucleus with reorganization of the capsid [47]. (4) Conversion of the rcDNA into cccDNA (covalently closed circular DNA) and formation of a nucleosome-bound minichromosome, possibly associated with HBx, core protein, histone, and nonhistone cellular proteins [7]. (5) Transcription generating the pre-genomic RNA, precore RNA, preS1/preS2/S mRNAs, and HBx mRNA [48]. (6) Synthesis of the viral proteins by the cell machinery. (7) Production of HBeAg and assembly of S alone with some L protein giving rise to subviral particles. (8) Formation of empty capsids or pgRNA-Pol containing capsids with immature nucleocapsids. Reverse transcription of pgRNA to generate rcDNA is concomitant with the maturation of the nucleocapsid. (9) Both empty and mature particles are embedded by L and S proteins and produced in the supernatant [49]. The egress of particles necessitates the endosomal sorting complexes required for transport (ESCRT) [50] even though naked capsids were shown to be released through the ALIX pathway [51]. (10) Alternatively, the rcDNA-containing particles recycle to the nucleus, amplifying the cccDNA copy numbers [52]. Adapted from Revill et al. [53].
The HBV core protein orchestrates virus assembly to form an icosahedral capsid [54] (Figure 5). During assembly, HBc specifically recognizes the Pol-pgRNA complex [55], promotes its packaging into nascent particles, and assists rcDNA synthesis by the viral RT and cccDNA maintenance [56-58] (for review see Seeger and Mason [59]). The processes of RTion and capsid maturation are regulated by CTD phosphorylation/dephosphorylation [41,60-62] together with structural rearrangements of the capsid [63-65]; this influences capsid trafficking in virus-producing cells and, by an unknown mechanism, directs the viral ribonucleoprotein (vRNP) complex to the nucleus and thus the formation of cccDNA. Otherwise, the vRNP is targeted to cellular compartments where it interacts with the envelope proteins [68]. HBV secretion remains a challenging issue since different types of viral particles are found in the circulating blood of patients. Despite a heterogeneous distribution from patient to patient, it is estimated that most of the particles consist solely of envelope proteins (HBsAg), in the form of spheres and filaments also referred to as Australia antigen (10¹⁴/ml), followed by empty particles (without genome, 10¹¹/ml), RNA-containing virions (10⁶/ml), and complete infectious particles (Dane particles, 10⁹/ml) [49]. The reason for the low amount of Dane particles remains unclear, and it has been proposed that reverse transcription of the pgRNA triggers a structural change of the capsid (maturation signal), which in turn causes the envelopment and secretion of complete infectious virions [65,69]. More recently, Hu and colleagues proposed a two-signal model: the first signal is exposed in empty particles at the level of the NTD-CTD linker, resulting in an interaction with the S protein, and the second is exposed in mature particles at the level of the MBD (matrix-binding domain), causing an interaction with the L envelope protein [70,71] (for review see Liu and Hu [72]). Even though the molecular bases of these two domains remain to be clarified, they both lie on the capsid structure, in agreement with the large effect on capsid envelopment of single-point mutations around the hydrophobic pocket in the center of the spikes [73-76]. Thus the envelopment of the vRNP is closely linked to the HBc protein, which represents a critical factor in virus replication and, as a matter of fact, a target in the search for antiviral molecules [77,78].
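The relative scarcity of Dane particles can be illustrated with a few lines of Python. The sketch below takes the order-of-magnitude concentrations quoted above at face value and computes the share of each particle type; the dictionary keys and the function name `particle_fractions` are labels chosen for this example only.

```python
# Order-of-magnitude concentrations (particles per ml) as quoted in the text.
particles_per_ml = {
    "HBsAg spheres and filaments": 1e14,
    "empty particles": 1e11,
    "Dane particles": 1e9,
    "RNA-containing virions": 1e6,
}


def particle_fractions(counts: dict[str, float]) -> dict[str, float]:
    """Return the share of each particle type in the total population."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


fractions = particle_fractions(particles_per_ml)
for name, fraction in fractions.items():
    print(f"{name:30s} {fraction:.2e}")
```

With these figures, complete infectious virions represent on the order of one particle in 10⁵, and the population is dominated by subviral HBsAg particles.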

The CTD of HBV core protein has nucleic acid chaperone activity
The nucleic acid chaperone activity of the CTD was first suggested by the groups of Loeb [66] and Zlotnick [79]. In the first case, they followed the strand transfer of the initial (−) DNA from the 5′ ε bulge to the 3′ DR1 sequence and (−) DNA elongation. This suggests that the core protein has a nucleic acid chaperone activity similar to that of retroviral NC [30,32,34,80,81]. In the second case, using core constructs mimicking the unphosphorylated or phosphorylated core, they found a correlation between the number of positive charges in the core protein and the RNA density, suggesting that the core protein induces RNA structural modifications.
The RNA/DNA chaperoning activity of HBc was confirmed using DNA-DNA hybridization and hammerhead ribozyme cleavage in vitro [56]. In the first assay, the authors followed the annealing of the DNA version of HIV TAR, a sequence located at the 5′ end of the HIV genome. In addition to its role in transcription together with the TAT protein [82,83], this sequence is essential for the (−) single-stranded DNA transfer during RT-dependent synthesis of HIV DNA [30,32,34,81]. Interestingly, they compared assembled and disassembled HBc particles and found that disassembled HBc was more efficient in DNA duplex formation. Using a series of peptides, they found that this chaperoning activity maps to the CTD and requires the four stretches of basic residues (Table 1). When a peptide contained phosphorylated serine residues at positions 155, 162, and 170, considered the three major serine phosphorylation sites, the DNA annealing activity was progressively reduced as a function of the number of phosphorylated sites. Similar results were obtained with the hammerhead ribozyme cleavage assays [27] (Figure 6).
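One simple way to picture why serine phosphorylation could weaken nucleic acid annealing is to count charges. The Python sketch below estimates the net charge of a basic peptide as phosphoserines are added. It is a crude approximation, not a model from the cited work: it assumes +1 per Arg/Lys, −1 per Asp/Glu, and about −2 per phosphoserine near neutral pH, and the peptide `toy_peptide` is a hypothetical arginine-rich sequence, not the actual HBc CTD.

```python
def net_charge(peptide: str, phosphoserines: int = 0) -> int:
    """Crude net charge near neutral pH: +1 per R/K, -1 per D/E,
    and -2 per phosphorylated serine (an approximation)."""
    if not 0 <= phosphoserines <= peptide.count("S"):
        raise ValueError("phosphoserine count must be between 0 and the number of serines")
    positive = sum(peptide.count(residue) for residue in "RK")
    negative = sum(peptide.count(residue) for residue in "DE")
    return positive - negative - 2 * phosphoserines


# Hypothetical arginine-rich peptide with three serines (NOT the HBc CTD sequence).
toy_peptide = "RRRGSPRRRSPRRRSQ"
for n_phospho in range(4):
    print(f"{n_phospho} phosphoserine(s): net charge {net_charge(toy_peptide, n_phospho):+d}")
```

In this toy case each added phosphate removes two positive charges from the balance, in line with the progressive loss of annealing activity reported for the phosphorylated peptides, although charge alone is unlikely to be the whole explanation.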
The role of HBc nucleic acid annealing and matchmaker activities was then assessed using viral particles. To this end, a plasmid expressing the HBV genome with a stop codon in the HBc ORF was cotransfected with a plasmid expressing core, since replication-competent HBV can be obtained by trans-complementation [84,85]. Mutations shown to decrease HBc activity in vitro caused a strong decrease of HBV DNA synthesis and a loss of viral replication. These results support the notion that HBc has a nucleic acid chaperone activity essential for minus-strand and plus-strand DNA synthesis during the replication cycle. Nevertheless, the defect observed in viral DNA synthesis could originate from a defect in HBc assembly [86,87], RNA encapsidation [88,89], HBc trafficking [46,68,90], or HBc maturation (single-strand blocking model) [65,69-71], since all these steps require the arginine residues of the C-terminus.

Concluding remarks and questions
Retroviruses are widespread in living organisms and can rapidly and efficiently circulate and spread in animals, even crossing species barriers. Their small basic protein, NC, has multiple functions in virus structure, replication, and dissemination. NC is a helper factor for the RT enzyme and its associated RNase H activity, and also for the integrase enzyme. It is a chaperoning factor required from the start to the end of viral DNA synthesis as well as for the recruitment of cofactors required for transport (Figure 7). The nucleoprotein complexes it forms are also considered as membrane-less organelles, structures that play key roles in cells such as fine-tuning of gene expression, translation, and immune controls via noncoding RNAs. Furthermore, NC is a key factor driving the recombination reactions that fuel the genetic diversity of newly made viral particles.
The HBV C-terminal domain (C-ter) appears to play multiple roles in virus replication in a manner similar to the retroviral NC protein, by chaperoning genome replication and ensuring both the fidelity of viral DNA synthesis and the stability of the viral DNA once it is complete.
Many questions remain on how such viral nucleoprotein complexes function.
One concerns the process of reverse transcription, i.e., how does the RT enzyme copy an RNA molecule coated by hundreds of such highly basic protein molecules? Another deals with the permeability of these viral nucleoprotein ensembles [91], i.e., the accessibility of cofactors that can be helper or restriction factors, such as the cytidine and adenosine deaminases apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like (APOBEC) [92] and adenosine deaminase acting on RNA-1 (ADAR1), which was recently shown to inhibit HBV replication by enhancing microRNA-122 processing [93]. Furthermore, the recruitment of restriction factors is not an on-off process, since a limited accessibility appears to take place for both HIV and HBV, with an impact on the genetic diversity of the virus, which is a major issue in antiviral treatments [94,95]. On a more general basis, vRBPs with chaperoning activities are widespread in the virus world since they are, for example, encoded by other virus families such as flaviviruses, notably the core proteins of HCV and dengue viruses, the N protein of coronaviruses and hantaviruses, and the delta antigen of the HDV viroid [96-103]. In addition, unpublished data show that the N protein of influenza virus also has chaperone activity.
Similar RNA chaperone proteins are found in bacteriophages of the Leviviridae family, where they regulate vRNA translation and assembly [104]. A closer look at the replication of the Q-beta genomic RNA by the viral replicase also reveals the chaperoning contribution of the host factors EF-Tu and EF-Ts. These data favor the notion that RNA chaperones may also influence protein conformation and enzymatic activity, raising the possibility that such proteins are Janus chaperones [28,105,106].