Sequences of retroviral nucleocapsid proteins and of the C-terminus of the HBV core protein. The one-letter amino acid code is used; the basic domains are in black, and the CX2CX4HX4C zinc fingers are in red. Note the low-complexity basic domains flanking the NC zinc fingers. For the yeast retrotransposon Ty3, note the zinc finger essential for Ty3 retrotransposition and the low-complexity flanking sequences. The C-terminal domain of the HBV core protein has four R-rich sequences and is of low complexity.
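The two features highlighted in this caption, CCHC zinc fingers and basic flanking residues, can be located in a sequence with a simple pattern search. The following is a minimal sketch, not the method used to prepare the figure. It assumes the canonical retroviral CCHC spacing C-X2-C-X4-H-X4-C, and the 55-residue NCp7 sequence is a commonly cited HIV-1 sequence quoted here as an assumption (it is not given in the text and should be checked against a sequence database before any real use). The names `find_zinc_fingers` and `basic_fraction` are hypothetical helpers.

```python
import re

# Assumed canonical retroviral CCHC zinc-finger spacing: C-X2-C-X4-H-X4-C.
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")

# Assumption: commonly cited HIV-1 NCp7 sequence (55 residues), not from the chapter.
NCP7 = "MQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEGHQMKDCTERQAN"


def find_zinc_fingers(seq):
    """Return (first_residue, last_residue, motif) for each non-overlapping match.

    Positions are 1-based and inclusive, as is usual for protein numbering.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in CCHC.finditer(seq)]


def basic_fraction(seq):
    """Fraction of residues that are lysine (K) or arginine (R)."""
    return sum(seq.count(aa) for aa in "KR") / len(seq)


if __name__ == "__main__":
    for first, last, motif in find_zinc_fingers(NCP7):
        print(f"zinc finger {first}-{last}: {motif}")
    print(f"basic residues (K+R): {basic_fraction(NCP7):.0%}")
```

Because `re.finditer` reports non-overlapping matches only, a sequence with tightly packed candidate motifs would need a lookahead-based pattern instead; that is not an issue for two well-separated fingers.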
This chapter presents small viral proteins that orchestrate replication of the human immunodeficiency virus type-1 (HIV-1) and the human hepatitis B virus (HBV), two canonical examples of small human pathogens. The HIV-1 nucleocapsid protein (NC) and the C-terminal domain (CTD) of the HBV core protein (HBc) are essential structural components of the virus capsid that protect the viral genome; they also chaperone replication of the HIV-1 genomic RNA and the HBV DNA by reverse transcription, and later these proteins kick-start virus morphogenesis. HIV-1 NC and the HBV CTD belong to the family of intrinsically disordered proteins (IDP), a characteristic that makes a large number of molecular interactions possible. Although these viral proteins share little sequence homology, both are rich in basic amino acids and endowed with RNA-binding and chaperoning activities. Similar viral RNA-binding proteins (vRBP) are also encoded by other virus families, notably flaviviruses, hantaviruses, and coronaviruses. We discuss how these vRBPs function in light of the abundant RBP family, which plays key physiological roles via multiple interactions with non-coding RNAs regulating immune defenses and cell stress. Moreover, these RBPs are flexible molecules, allowing dynamic interactions with many RNA and protein partners in a semi-solid milieu that favors biochemical reactions.
- RNA chaperoning
- molecular crowding
1. Foreword on viruses and RNA chaperones
Viruses that replicate their genome by the process of reverse transcription (RTion) are common in animals, plants, algae, and fungi. These so-called reverse-transcribing viruses have been classified into five different families, namely,
Retroviruses exist as infectious exogenous RNA viruses as well as endogenous retroelements (ERV) present at high copy numbers in the genome of vertebrates. Hepadnaviruses can also integrate their genome into the host genome but at a much lower rate.
Replication of the genome of these two classes of virus requires a reverse transcription step. For HIV-1, the genomic RNA of 9600 nt in length has a structure similar to cellular mRNAs, with a 5′ cap and a 3′ poly(A) tail, and contains 9 genes leading to the expression of 15 proteins. Retroviruses replicate their genome by a copy and paste mechanism, whereby the single-stranded positive-sense retroviral genomic RNA is converted into a double-stranded DNA by the virion reverse transcriptase (RT enzyme); this DNA is subsequently integrated into the host genome. The integrated viral DNA, called the provirus, is expressed by the host transcription machinery to synthesize the full-length viral RNA (FL RNA), which, after nuclear export to the cytoplasm, is translated by the ribosomes to synthesize the major structural proteins and enzymes as the Gag and Gag-Pol precursors. Specific interactions of the genomic RNA with the Gag polyprotein precursor drive Gag polymerization and viral core assembly at the plasma membrane (PM).
For hepadnaviruses, the small double-stranded DNA genome in a relaxed circular form (rcDNA) is targeted to the nucleus after virus infection, where it is converted into a covalently closed circular form (cccDNA) and expressed by the transcription machinery of the infected cell to synthesize the full-length RNA called pre-genomic RNA (pgRNA). Upon translation of the pgRNA, the newly made core protein and RT enzyme interact with the pgRNA to synthesize the ds DNA genome. The genome of this virus has unique features, such as a pseudo-circular structure and extensive overlapping of the genes: for HBV, 3200 nt carry four coding sequences leading to the expression of seven proteins. In addition, several of the HIV and HBV proteins were found to be multifunctional, notably the NC, TAT, and VIF proteins for HIV and the core protein (HBc) for HBV [9, 10].
These two classes of viruses probably emerged during the early Paleozoic Era, some 450–520 million years ago, with a marine origin. The HBVs seem to originate from non-enveloped progenitors called nackednaviruses, present in fish some 400 million years ago.
In addition to an RNA/DNA-dependent DNA polymerase called reverse transcriptase, with an associated RNase H activity, these two classes of small viruses encode a core protein endowed with RNA-binding, unwinding, annealing, and matchmaker activities and with the ability to form nucleoprotein complexes whose gel-like milieu favors molecular crowding and biochemical reactions such as reverse transcription.
This chapter will briefly review the multiple roles of the core proteins, drawing a parallel between the HIV-1 Gag and the HBV core. In fact, these viral core proteins turn out to be much more than a structural component forming a cage around the genome, since they assist the RT-RNase H enzyme at all steps of viral DNA synthesis and then ensure the stability of the newly made viral DNA.
Despite common functions in HIV and HBV morphogenesis and replication, the HBV core protein differs markedly from Gag at the amino acid sequence level; a closer look at their activities and functions nevertheless reveals that these viral proteins are similar.
2. The RNA folding problem and RNA chaperones
The need for RNA chaperones comes from the RNA folding problem, whereby RNA molecules have to find their native functional structure in an extremely wide landscape of structures. In fact, RNA chaperones are as diverse and abundant as RNA molecules, coding and noncoding, from prokaryotes to eukaryotes. Recent findings highlight the fact that RNA chaperones are disordered in nature, function in a disordered state, and do not require ATP as a source of energy to direct RNA folding. Instead, RNA chaperones seem to exploit an energy-transfer mechanism during rapid on-off RNA-binding kinetics. A number of standard assays are used to monitor RNA chaperoning activity, notably binding, fraying, and annealing of complementary sequences; activation of hammerhead ribozyme-directed cleavage of an RNA substrate; and formation of dense nucleoprotein complexes. Figure 1 illustrates assays aimed at describing the influence of NC on the DNA strand transfers that occur during reverse transcription and result in the synthesis of cDNA.
3. The retroviral Gag polyprotein and its multiple roles
The major structural proteins of retroviruses are encoded by gag as the Gag polyprotein, which is formed of several modular domains, namely, MAp17, CAp24, NCp7, and p6; in addition, there are two small peptides, p1 and p2, flanking NC in Pr55Gag (Figure 2). The N-terminus is myristoylated, which, together with a row of basic residues within MA, targets Gag to the plasma membrane where assembly takes place.
In infected cells, the full-length viral RNA is translated by the ribosome machinery to produce the Gag and Gag-Pol polyprotein precursors. The present model of assembly stipulates that newly made Gag molecules accumulate in the cytoplasm, probably in the vicinity of the translating polysomes, where they kick-start virus assembly (Figure 3); this is achieved through two types of interactions: (i) Gag-NC with the 5′ untranslated region (5′ UTR) of the FL RNA and (ii) the myristoylated matrix domain with phospholipids of the T-cell membrane. These interactions target the Gag-RNA nucleoprotein complexes to the plasma membrane, causing Gag-oligomer formation; the nucleocapsid domain binds and selects the genomic RNA, causing its dimerization, and at the same time, together with the capsid domain, boosts Gag multimerization (Figure 3). These interactions of Gag with phospholipids and with RNA lead to virus assembly at the plasma membrane. Subsequently, virus maturation occurs during the budding process, together with the recruitment of the envelope glycoproteins by the matrix domain (Figure 3).
Maturation is a complex process whereby the core of HIV-1 becomes conical and, at the same time, the genomic RNA dimer is condensed, thus leading to the formation of infectious particles. However, most HIV-1 virions, and more generally most retroviral particles, are noninfectious. As a matter of fact, the ratio of infectious virus to noninfectious particles ranges from 1:10 to 1:10^4 [24, 25]. Thus, a majority of particles are noninfectious, most probably because of the loss of envelope proteins or degradation of the genomic RNA, or else because they correspond to defective-interfering particles (DIP), which can lead to an underestimation of virus infectivity.
4. Characteristics of retroviral nucleocapsid proteins
Retroviral nucleocapsid proteins are small basic proteins with either one (MuLV, gammaretrovirus) or two CCHC zinc fingers (HIV and FIV, lentiviruses; RSV an
How is this achieved? According to Uversky, protein-RNA interfaces are most probably very large, involving basic, hydrophobic, and aromatic residues engaged, respectively, in ionic, hydrophobic, and intercalating interactions [34, 35, 36]. The interactions between NC and RT are poorly understood, but they appear to require the RNA template as the scaffolding agent [37, 38]. In that respect, the RTp66 subunit with its active site appears to be extremely flexible, notably with a large template-binding pocket. These observations favor the notion that the viral proteins NC and RT and the template RNA, which together make up the replication machine, are flexible in an active nucleoprotein complex, in agreement with the proposal of Uversky. These in vitro and ex vivo studies on retroviral Gag polyproteins and NC proteins have essentially been carried out using HIV-1 (Table 1); additional experiments performed with
5. The core protein of HBV and its roles in virus assembly and viral DNA synthesis
HBV is an enveloped virus with a 3.2 kb partially double-stranded DNA genome, referred to as rcDNA, that is synthesized by reverse transcription of the pgRNA. The core protein contains 183–185 residues corresponding to two domains (Figure 4): the N-terminal domain (NTD) (residues 1–140), which oligomerizes into a capsid structure, linked by a flexible sequence to the basic C-terminal domain (CTD) (residues 150–183) [40, 42, 43]. The core CTD interacts with nucleic acids and is endowed with nucleic acid annealing, matchmaker, and aggregating activities [8, 44].
The HBV core protein orchestrates virus assembly to form an icosahedral capsid (Figure 5). During assembly, HBc specifically recognizes the Pol-pgRNA complex, promotes its packaging into nascent particles, and assists rcDNA synthesis by the viral RT and cccDNA maintenance [56, 57, 58] (for review see Seeger and Mason). The processes of RTion and capsid maturation are regulated by CTD phosphorylation/dephosphorylation [41, 60, 61, 62] together with structural rearrangements of the capsid [63, 64, 65]; this influences capsid trafficking in virus-producing cells and, by an unknown mechanism, drives the viral ribonucleoprotein (vRNP) complex to the nucleus and thus the formation of cccDNA. Otherwise, the vRNP is targeted to cellular compartments where it interacts with the envelope proteins, promoting virus egress [43, 66, 67] (for review see Blondot et al.). HBV secretion remains a challenging issue since different types of viral particles are found in the circulating blood of patients. Despite variation from patient to patient, it is estimated that most particles consist solely of envelope proteins (HBsAg) in the form of spheres and filaments, also referred to as Australia antigen (10^14/ml); the remainder are empty particles (without genome, 10^11/ml), RNA-containing virions (10^6/ml), and complete infectious particles (Dane particles, 10^9/ml). Why Dane particles are so scarce remains unclear; it has been proposed that reverse transcription of the pgRNA triggers a structural change of the capsid (maturation signal), which in turn causes the envelopment and secretion of complete infectious virions [65, 69]. More recently, the group of Hu et al. proposed a two-signal model, the first signal being exposed in empty particles at the level of the NTD-CTD linker and resulting in an interaction with the S protein, and the second being exposed in mature particles at the level of the MBD (matrix-binding domain) and causing an interaction with the L envelope protein [70, 71] (for review see Liu and Hu). Even though the molecular bases of these two domains remain to be clarified, both lie on the capsid structure, in agreement with the large effect on capsid envelopment of single-point mutations around the hydrophobic pocket in the center of the spikes [73, 74, 75, 76]. Thus, the envelopment of the vRNP is closely linked to the HBc protein, which represents a critical factor in virus replication and, as a matter of fact, a target in the search for antiviral molecules [77, 78].
6. The CTD of HBV core protein has nucleic acid chaperone activity
The nucleic acid chaperone activity of the CTD was first suggested by the groups of Loeb and of Zlotnick. In the first case, they followed the strand transfer of the initial (−) DNA from the 5′ ε bulge to the 3′ DR1 sequence and (−) DNA elongation. This suggests that the core protein has a nucleic acid chaperone activity similar to that of retroviral NC [30, 32, 34, 80, 81]. In the second case, using core constructs mimicking the unphosphorylated or phosphorylated core, they found a correlation between the number of positive charges in the core protein and the RNA density, suggesting that the core protein induces RNA structural modification.
The RNA/DNA chaperoning activity of HBc was confirmed using DNA-DNA hybridization and hammerhead ribozyme cleavage in vitro. In the first assay, the authors followed the annealing of the DNA version of HIV TAR, a sequence located at the 5′ end of the HIV genome. In addition to its role in RNA transcription with the TAT protein [82, 83], this sequence is essential for the (−) strand transfer during RT-dependent synthesis of HIV DNA [30, 32, 34, 81]. Interestingly, they compared assembled and disassembled HBc particles and found that disassembled HBc was more efficient in DNA duplex formation. Using a series of peptides, they found that this chaperoning activity maps to the CTD and requires the four stretches of basic residues (Table 1). When the peptide contained phosphorylated serine residues at positions 155, 162, and 170, considered the three major serine phosphorylation sites, the DNA annealing activity was progressively reduced as a function of the number of phosphorylated sites. Similar results were obtained with the hammerhead ribozyme cleavage assays (Figure 6).
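The inverse relationship between phosphorylation and annealing activity is consistent with a simple charge argument, which can be sketched as follows. This is only an illustrative back-of-the-envelope model, not an analysis from the cited work. The CTD sequence (residues 150–183) is a commonly cited HBc sequence quoted here as an assumption, since it is not given in the text; the charge values (+1 for R/K, −1 for D/E, −2 per phosphoserine, termini and histidine ignored) are likewise simplifying assumptions, and `net_charge` is a hypothetical helper.

```python
# Assumption: commonly cited HBc CTD sequence, residues 150-183 (not from the chapter).
CTD_START = 150
CTD_SEQ = "RRRGRSPRRRTPSPRRRRSQSPRRRRSQSRESQC"

# Simplified side-chain charges near neutral pH (assumed values).
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}
PHOSPHOSERINE_CHARGE = -2  # rough value for a fully ionized phosphate group


def net_charge(seq, start, phospho_sites=()):
    """Approximate net charge of a peptide with optional phosphoserines.

    `phospho_sites` holds residue numbers in the full-length protein;
    `start` is the number of the first residue of `seq`.
    """
    charge = sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)
    for pos in phospho_sites:
        residue = seq[pos - start]
        if residue != "S":
            raise ValueError(f"residue {pos} is {residue}, not a serine")
        charge += PHOSPHOSERINE_CHARGE
    return charge


if __name__ == "__main__":
    sites = (155, 162, 170)  # major phosphorylation sites named in the text
    for n in range(len(sites) + 1):
        q = net_charge(CTD_SEQ, CTD_START, sites[:n])
        print(f"{n} phosphoserine(s): net charge {q:+d}")
```

Under these assumptions, each added phosphate removes part of the positive charge available for binding nucleic acids, in line with the progressive loss of annealing activity described above; the model says nothing about conformational effects of phosphorylation.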
The role of the HBc nucleic acid annealing and matchmaker activities was assessed using viral particles. To this end, a plasmid expressing the HBV genome with a stop codon in the HBc ORF was cotransfected with a plasmid expressing core, in agreement with the fact that replication-competent HBV can be obtained by trans-complementation assays [84, 85]. Mutations shown to decrease HBc activity in vitro gave rise to a marked decrease of HBV DNA synthesis and a loss of viral replication. These results support the notion that HBc has a nucleic acid chaperone activity essential for minus-strand and plus-strand DNA synthesis during the replication cycle. Nevertheless, the defect observed in viral DNA synthesis could originate from a defect in HBc assembly [86, 87], RNA encapsidation [88, 89], HBc trafficking [46, 68, 90], or HBc maturation/single-stranded blocking model [65, 69, 70, 71], since all these steps require the arginine residues of the C-terminus.
7. Concluding remarks and questions
In retroviruses, which are widespread in living organisms and can rapidly and efficiently circulate and spread in animals, even crossing species barriers, the small basic protein called NC has multiple functions in virus structure, replication, and dissemination. Indeed, NC protein is a helper factor for the RT enzyme and its associated RNase H activity and also for the integrase enzyme. NC is a chaperoning factor required from the start to the end of viral DNA synthesis as well as for the recruitment of cofactors required for transport (Figure 7). The nucleoprotein complexes formed by such proteins are also considered akin to membrane-less organelles, which play key roles in cells such as fine tuning of gene expression, translation, and immune controls via noncoding RNAs. Furthermore, NC is a key factor driving the recombination reactions that fuel the genetic diversity of the newly made viral particles.
The HBV C-terminal domain (CTD) appears to play multiple roles in virus replication in a manner similar to the retroviral NC protein, by chaperoning genome replication and ensuring both the fidelity of viral DNA synthesis and the stability of the DNA once it is complete.
Many questions remain on how such viral nucleoprotein complexes function.
One concerns the process of reverse transcription, i.e., how does the RT enzyme copy an RNA molecule coated by hundreds of such highly basic protein molecules? Another one deals with the permeability of these viral nucleoprotein ensembles, i.e., their accessibility to cofactors that can be helper or restriction factors, such as the cytidine and adenosine deaminases apolipoprotein B-editing catalytic subunit (APOBEC) and adenosine deaminase acting on RNA-1 (ADAR1), the latter recently shown to inhibit HBV replication by enhancing microRNA-122 processing. Furthermore, the recruitment of restriction factors is not an on-off process, since limited accessibility appears to take place for both HIV and HBV, with an impact on the genetic diversity of the virus, which is a major issue in antiviral treatments [94, 95].
On a more general basis, vRBPs with chaperoning activities are widespread in the virus world, since they are, for example, encoded by other virus families such as flaviviruses, notably the core proteins of HCV and dengue viruses, the N protein of coronaviruses and hantaviruses, and the delta antigen of the HDV viroid [96, 97, 98, 99, 100, 101, 102, 103]. In addition, unpublished data show that the N protein of influenza virus also has chaperone activity.
Similar RNA chaperone proteins are found in bacteriophages of the
Personal (JLD) thanks are due to my spouse Anne Napoly and to Yves Mély (Faculty of Pharmacy, Strasbourg) for their continuous support during the past 12 months and to Lada Bozic for her kind understanding. Supports from INSERM, ANRS, and Philippe Roingeard (Faculty of Medicine, Tours) are acknowledged (HdR).