Retroviral vectors have become increasingly valuable in gene therapy because they stably deliver therapeutic genes into the host cell genome. These therapeutic genes are intended to correct the consequences of inherited or acquired mutations in the host cell genome, or to alter host cell function in order to cure disease. In the following section we discuss the biology and life cycle of retroviruses, which begins with viral entry into the host cell and proceeds through reverse transcription of viral RNA, nuclear import of the viral DNA, and finally integration of viral DNA into the host cell genome (Flint, Racaniello et al. 2004). Integration involves both viral and host cell proteins; their roles are discussed in the third and fourth sections of this chapter. In recent years, the process of integration site selection (that is, where the viral DNA integrates into the host cell DNA) has become much better understood through many in vitro and in vivo studies. The human genome project has made it possible to identify integration site preferences of retroviral vectors in human trials. The results of these human trials are reviewed in the fifth section of the chapter. Finally, the last section describes recent attempts to control integration sites by manipulating retroviral genes and proteins.
1.1. Retrovirus structure and life cycle
Viruses are obligate parasites that depend on living cells to multiply. Their ability to stably deliver RNA and DNA into cells underlies their use in gene therapy. In 1983 Mann et al. developed one of the first retroviral gene therapy vectors for delivery in vitro (Mann, Mulligan et al. 1983). This development was followed by many successful gene therapy trials with retroviral vectors (Anderson, Blaese et al. 1990; Levine and Friedmann 1991; Blaese, Culver et al. 1993). Retroviral vectors are now used in nearly 22.2% of clinical trials (
Retroviruses belong to the Retroviridae family. The retroviral particle contains two copies of positive-sense single-stranded (+ss) RNA and the viral enzymes (reverse transcriptase, integrase, and protease), all of which are contained within the nucleocapsid. The nucleocapsid complex is surrounded by a protein shell called the capsid, and together they form the viral core. A layer of matrix protein outside the capsid interacts with the envelope (env), which consists of a lipid bilayer derived from the host cell and viral envelope glycoproteins. The viral glycoproteins are made of two subunits: a transmembrane portion, which anchors the protein in the lipid bilayer, and a surface portion, which binds to the cellular receptor.
The life cycle of the retrovirus consists of several steps. It begins with the binding of the viral envelope to cellular receptors, which enables fusion of the viral envelope with the cellular membrane. The viral particle is then uncoated, releasing the viral core into the cytoplasm. The viral RNA is reverse transcribed to DNA, and the viral DNA is transported to the nucleus, where it is integrated into the host cell’s genome. From there, viral DNA is transcribed to RNA, some of which is translated into proteins. The viral RNA is packaged into viral particles along with viral proteins, and virions are produced when these particles bud from the host cell (Escors and Breckpot).
The retroviral enzyme integrase (IN) plays a vital role in integration. It exists as a tetramer (dimer-of-dimers) inside the virion or the preintegration complex. IN facilitates viral DNA integration in vitro, even in the absence of other viral or cellular proteins (Coffin, Hughes et al. 1997; Flint, Racaniello et al. 2004). Integration proceeds in two distinct steps. In the first step, called processing, IN removes two nucleotides from the 3’ ends of the viral DNA that was synthesized by the viral enzyme reverse transcriptase (Coffin, Hughes et al. 1997; Flint, Racaniello et al. 2004). In the second step, called joining, when the viral preintegration complex is in the vicinity of the target host DNA, IN catalyzes a coupled cleavage-joining reaction in which the 3’ ends of viral DNA are joined to host cell DNA (Coffin, Hughes et al. 1997; Flint, Racaniello et al. 2004). In the resulting integration intermediate, the viral DNA is flanked by short single-stranded gaps in host cell DNA. After the integration reaction, postintegration repair takes place: the 5' ends of viral DNA are trimmed, the gaps are filled, and the viral DNA ends are ligated to host cell DNA. Lastly, the appropriate chromatin structure is reconstituted at the integration site. Postintegration repair does not require viral proteins, but instead depends on host cell DNA repair proteins (Daniel, Katz et al. 1999; Lau, Swinbank et al. 2005).
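The net effect of processing, joining, and postintegration repair on the DNA sequence can be illustrated with a toy string model. This is a minimal sketch under stated assumptions, not a biochemical simulation: only one strand is tracked, and the function name `integrate`, the example sequences, and the 5-bp stagger between the two joining sites (a value typical of HIV-1) are illustrative choices that do not come from the studies cited above.

```python
def integrate(host: str, viral: str, site: int,
              stagger: int = 5, trim: int = 2) -> str:
    """Return the top strand of the host DNA after a toy integration event.

    Assumptions (illustrative only):
      * `trim` nucleotides are lost from each viral DNA end (3' processing
        plus trimming of the 5' overhang during repair).
      * The two viral ends join the host strands `stagger` bp apart, so gap
        filling duplicates that stretch of host sequence on both flanks.
    """
    if trim < 0 or stagger < 0:
        raise ValueError("trim and stagger must be non-negative")
    if len(viral) <= 2 * trim:
        raise ValueError("viral DNA is too short to be processed")
    if not 0 <= site <= len(host) - stagger:
        raise ValueError("integration site lies outside the host sequence")

    provirus = viral[trim:len(viral) - trim]   # viral DNA without its end nucleotides
    duplicated = host[site:site + stagger]     # host bases between the two joining sites
    # Gap repair leaves one copy of the duplicated bases on each side of the provirus.
    return host[:site + stagger] + provirus + duplicated + host[site + stagger:]


if __name__ == "__main__":
    host = "TTTTGATCGAAAA"        # hypothetical target DNA
    viral = "ACTGGAAGCTAGCAGT"    # hypothetical linear viral DNA
    print(integrate(host, viral, site=4))
```

In this model the provirus is two nucleotides shorter at each end than the unintegrated viral DNA and is flanked by a short direct repeat of host sequence, which is the footprint left once the single-stranded gaps have been filled.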
In vitro experiments show that incubating IN with oligonucleotides as the DNA substrate, together with target DNA, is sufficient to achieve integration of one end of the DNA substrate (Flint, Racaniello et al. 2004). In vivo, however, stable integration requires cellular proteins, which have attracted interest as potential cofactors of integration. Using a yeast two-hybrid screen, a human immunodeficiency virus (HIV)-1 IN-binding protein termed integrase interactor 1 (INI1) was identified (Kalpana, Marmon et al. 1994). Initially, INI1 was found to boost integration efficiency when added to the integration reaction in vitro (Kalpana, Marmon et al. 1994). In addition, knocking down INI1 with small interfering RNA (siRNA) was sufficient to significantly reduce HIV-1 replication (Ariumi, Serhan et al. 2006). However, another study showed that the absence of INI1 did not affect the integration reaction (Boese, Sommer et al. 2004). It is now accepted that INI1 does not affect integration but appears to be involved in other processes of the retroviral life cycle (Ariumi, Serhan et al. 2006; Mahmoudi, Parra et al. 2006; Treand, du Chene et al. 2006). Another cellular protein, the non-histone chromatin protein high-mobility group protein-1 (HMG-1), was found to enhance integration in vitro (Aiyar, Hindmarsh et al. 1996). This enhancement was attributed to its DNA-bending ability (Aiyar, Hindmarsh et al. 1996; Flint, Racaniello et al. 2004). HMG-I(Y), another protein in the HMG family, was found in HIV-1 preintegration complexes (Farnet and Bushman 1997). Like HMG-1, HMG-I(Y) and HMG-2 boost integration in vitro (Aiyar, Hindmarsh et al. 1996; Farnet and Bushman 1997; Hindmarsh, Ridky et al. 1999). However, studies using HMG-I(Y)-deficient cells did not elucidate the role of this protein in the integration reaction (Beitzel and Bushman 2003). 
Thus the role of HMG proteins in integration remains unclear. Autointegration is the integration of viral DNA into itself, which aborts the retroviral life cycle. An 89-amino-acid protein identified in murine leukemia virus (MLV) preintegration complexes prevents autointegration of viral DNA and was hence named the barrier-to-autointegration factor (BAF) (Lee and Craigie 1998). BAF was also detected in HIV-1 preintegration complexes, where it blocks autointegration (Lin and Engelman 2003). Finally, in 2003, a yeast two-hybrid system resulted in the isolation of a new HIV-1 IN-binding protein, a previously identified cellular protein termed LEDGF/p75 (lens epithelium-derived growth factor) (Cherepanov, Devroe et al. 2004). Knockout mouse experiments indicated that LEDGF/p75 is not a lens growth factor: mice lacking the mouse LEDGF/p75 homolog, PSIP1 (PC4 and SFRS1-interacting protein-1), had skeletal abnormalities, indicating that this protein is involved in bone development (Sutherland, Newton et al. 2006). Furthermore, studies using siRNA against LEDGF/p75, or LEDGF/p75-null cells derived from LEDGF/p75-null transgenic animals, showed that integration of HIV-1-based vectors is reduced by 89–96% in the absence of LEDGF/p75 (Llano, Saenz et al. 2006; Shun, Raghavendra et al. 2007). Therefore, LEDGF/p75 appears to be essential for efficient integration of HIV-1. In contrast, numerous studies showed that LEDGF/p75 does not bind MLV IN and is not essential for MLV integration (Llano, Vanegas et al. 2004; Busschots, Vercammen et al. 2005; Shun, Raghavendra et al. 2007). In addition to enhancing integration in vitro, LEDGF/p75 has the ability to target HIV-1 and HIV-1-based vector integration sites (Ciuffi, Llano et al. 2005; Llano, Vanegas et al. 2006; Shun, Raghavendra et al. 2007).
In summary, retroviral DNA integration is catalyzed by the viral protein integrase, but host cell proteins play a significant role in enhancing the efficiency of the reaction, and preventing autointegration.
2. Integration site preferences of retroviruses and retroviral vectors
Although integration of viral DNA can take place anywhere in the host cell genome and there is no strict host sequence requirement for site selection, many studies have shown that site selection is not a random process (Schroder, Shinn et al. 2002; Wu, Li et al. 2003; Mitchell, Beitzel et al. 2004). In vitro studies demonstrated that some DNA-binding proteins can prevent IN from contacting target DNA and thereby block the integration reaction at their binding sites (Pryciak and Varmus 1992; Bushman 1994). In contrast, bending or distortion of DNA seems to enhance integration (Pryciak, Muller et al. 1992; Pryciak and Varmus 1992; Katz and Skalka 1994; Pruss, Bushman et al. 1994; Pruss, Reeves et al. 1994). Furthermore, studies showed that wrapping of DNA around nucleosomes distorts the DNA and thus promotes integration into nucleosome-bound DNA (Pryciak, Sil et al. 1992; Pryciak and Varmus 1992; Pruss, Bushman et al. 1994). Together, these studies show that there are certain integration site preferences in DNA substrates in in vitro models. However, host DNA exists in a higher-order chromatin structure, so the results of these in vitro studies may not reflect what actually happens in the infected cell. To mimic the in vivo situation, Taganov et al. used a 13-nucleosome extended array that includes binding sites for specific transcription factors and can be compacted into a higher-order structure using histone H1 (Taganov, Cuesta et al. 2004). They observed that chromatin structure affects integration site selection by the HIV-1 and avian sarcoma virus (ASV) IN proteins differently. In particular, HIV-1 IN-mediated integration was reduced after compaction of the target DNA/chromatin structure, whereas ASV IN-mediated integration was more efficient after compaction (Taganov, Cuesta et al. 2004). 
These results reveal that higher-order chromatin structure is involved in integration site selection and that different retroviruses may differ in their integration selectivity. According to the International Human Genome Sequencing Consortium (IHGSC), in 2004, 25,000 genes had been identified in the human genome. In 1990, two studies indicated that retroviruses preferentially integrate in the vicinity of transcriptionally active regions (Mooslehner, Karls et al. 1990; Scherdin, Rhodes et al. 1990). These studies were limited by the relatively low number of identified transcription sites (Bushman, Lewinski et al. 2005). Also, because the human genome sequence was incomplete, the percentage of the genome containing these “favored” integration sites was not clear. After the IHGSC announcement, researchers were able to perform accurate statistical analyses of integration sites. Large-scale studies of HIV-1 integration in human T cell lines revealed that roughly 70% of integration events occurred in genes (Schroder, Shinn et al. 2002; Bushman, Lewinski et al. 2005). Furthermore, the 11q13 chromosomal region was found to be a “hotspot” of integration. Schroder et al. also obtained similar results with pseudotyped HIV-1-based vectors (Schroder, Shinn et al. 2002). Many studies have revealed that the integration preferences of other retroviruses and retroviral vectors, such as simian immunodeficiency virus (SIV), an SIV-based vector, HIV-2, and feline immunodeficiency virus (FIV), resemble those of HIV-1 (Hematti, Hong et al. 2004; Crise, Li et al. 2005; Kang, Moressi et al. 2006; MacNeil, Sankale et al. 2006). In contrast, MLV and MLV-based vectors showed integration preferences different from those of HIV-1 (Wu, Li et al. 2003; Mitchell, Beitzel et al. 2004; Lewinski, Yamashita et al. 2006). Specifically, 20% of MLV integration events occur in the vicinity of the 5’ ends of transcription units (Wu, Li et al. 2003), approximately 17% take place in the vicinity of CpG islands (Mitchell, Beitzel et al. 2004), 11% were detected in the vicinity of DNase I-hypersensitive sites (Lewinski, Yamashita et al. 2006), and the remaining integration sites are scattered in a random manner (Wu, Li et al. 2003).
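Percentages such as those above come from mapping each integration site onto an annotated genome and counting how many sites fall into each category. The following is a minimal sketch of that tallying step, assuming a single chromosome with integer coordinates. The `Gene` class, the function `summarize_sites`, the 1000-bp window, and all coordinates are hypothetical values chosen for illustration, not data or methods from the cited studies, which also compared the observed sites with matched random controls.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Gene:
    """A transcription unit on one chromosome (toy annotation)."""
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The 5' end of the transcription unit depends on the strand.
        return self.start if self.strand == "+" else self.end


def summarize_sites(sites: Sequence[int], genes: Sequence[Gene],
                    cpg_islands: Sequence[int],
                    window: int = 1000) -> Dict[str, float]:
    """Return the fraction of sites in genes, near a 5' end, and near a CpG island.

    The categories are not mutually exclusive: one site can count in several.
    """
    if not sites:
        raise ValueError("at least one integration site is required")
    n = len(sites)
    in_gene = sum(any(g.start <= s <= g.end for g in genes) for s in sites)
    near_tss = sum(any(abs(s - g.tss) <= window for g in genes) for s in sites)
    near_cpg = sum(any(abs(s - c) <= window for c in cpg_islands) for s in sites)
    return {"in_gene": in_gene / n,
            "near_tss": near_tss / n,
            "near_cpg": near_cpg / n}


if __name__ == "__main__":
    genes = [Gene(10_000, 20_000, "+"), Gene(50_000, 60_000, "-")]
    cpg_islands = [9_800, 60_400]                     # island midpoints (hypothetical)
    sites = [10_500, 15_000, 30_000, 59_500, 61_200]  # mapped integration sites (hypothetical)
    print(summarize_sites(sites, genes, cpg_islands))
```

Because the categories overlap, the fractions need not sum to one; a site just inside a gene's 5' end, for example, counts as both in a gene and near a transcription start.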
Avian retroviruses and vectors show only a weak preference for integration in genes (about 40%) and no MLV-like preference for 5’ ends of transcription units (Mitchell, Beitzel et al. 2004; Narezkina, Taganov et al. 2004). Interestingly, high levels of transcription may even inhibit ASV integration in genes (Weidhaas, Angelichio et al. 2000; Maxfield, Fraize et al. 2005). These preferences are consistent with the above-described data from the in vitro system, which used nucleosomal arrays (Taganov, Cuesta et al. 2004). Similarly, the human T-cell leukemia virus type 1 (HTLV-1) and mouse mammary tumor virus (MMTV), like avian retroviruses, do not specifically target genes and transcription start sites (Derse, Crise et al. 2007; Faschinger, Rouault et al. 2008).
Lastly, there appear to be symmetric base preferences surrounding the integration sites of HIV-1, SIV, MLV, and avian sarcoma-leukosis viruses (Crise, Li et al. 2005; Holman and Coffin 2005). These weak consensus sequences are virus specific and possibly reflect the influence of IN on integration site selection (Holman and Coffin 2005). This proposal is supported by the symmetry of the target site sequence: because IN likely functions as a tetramer, it would be expected to contact the target DNA symmetrically (Coffin et al. 1997; Flint et al. 2004; Wu et al. 2005; and see above).
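A symmetric base preference means that the consensus read on one strand matches the consensus read on the opposite strand, that is, the consensus equals its own reverse complement. The sketch below shows one simple way to check this for a set of aligned target sequences. The function names and the example sequences are made up for illustration and are not the consensus sequences reported in the cited studies, which are weak statistical preferences rather than exact palindromes.

```python
from collections import Counter
from typing import Sequence

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(aligned_sites: Sequence[str]) -> str:
    """Return the most frequent base at each position of equal-length sequences."""
    if not aligned_sites:
        raise ValueError("at least one sequence is required")
    if len({len(s) for s in aligned_sites}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned_sites))


def is_symmetric(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    # Hypothetical target sequences aligned on the integration site.
    sites = ["GTAATTAC", "GTAATTAC", "GTTATTAC", "CTAATTAC"]
    cons = consensus(sites)
    print(cons, is_symmetric(cons))
```

With real data the comparison would be made on base frequencies at mirrored positions rather than on an exact string match, since the preferences are weak.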
In summary, the integration preferences described in this section differ among groups of retroviruses. The first group, which includes HIV-1, HIV-2, SIV, and FIV, integrates preferentially into genes (Daniel and Smith 2008). The second group, consisting of MLV and foamy virus (FV), integrates near the 5' ends of transcription units and CpG islands. The last group consists of avian sarcoma-leukosis viruses (ASLV), HTLV-1, and MMTV (Daniel and Smith 2008), and shows weak or no preference for genes or transcription start sites. DNA sequence also appears to have a role in integration site selection. However, other factors (cellular cofactors and cellular structures) are likely to be the principal determinants of integration site selection.
3. Mechanism of integration site selection
As mentioned before, IN has low specificity for binding to host cell DNA, so host cell proteins are likely to participate in integration site selection. Using the yeast two-hybrid system, Debyser and coworkers identified a new HIV-1 IN-binding protein, termed LEDGF/p75 (Cherepanov, Maertens et al. 2003). LEDGF/p75 is required for efficient integration of HIV-1 DNA. LEDGF/p75 is also a transcription factor, with a C-terminal IN-binding domain and an N-terminal chromatin-binding domain (Cherepanov, Maertens et al. 2003; Cherepanov, Devroe et al. 2004; Vanegas, Llano et al. 2005; Llano, Vanegas et al. 2006; Turlure, Maertens et al. 2006). Chromatin binding is mediated by PWWP and AT-hook motifs in the N-terminal domain of LEDGF/p75 (Llano, Vanegas et al. 2006; Turlure, Maertens et al. 2006). In addition, LEDGF/p75 was detected in association with preintegration complexes of HIV-1 and FIV in cultured cells (Llano, Vanegas et al. 2006). Moreover, LEDGF/p75 protects ectopically expressed HIV-1 IN from proteasomal degradation and might therefore contribute to the stability of preintegration complexes during infection (Maertens, Cherepanov et al. 2003; Llano, Vanegas et al. 2006). Furthermore, in LEDGF/p75-null cells, the residual integration events no longer favored active genes (Shun, Raghavendra et al. 2007). Instead, integration occurred preferentially near promoters and CpG islands (Shun, Raghavendra et al. 2007). The symmetric base preferences surrounding the integration site were preserved (Holman and Coffin 2005). As a result, in the absence of LEDGF/p75, HIV-1 integration site preferences resemble those of MLV (Shun, Raghavendra et al. 2007). All these results strongly support the hypothesis that LEDGF/p75 targets HIV-1 (and other lentiviral) integration into active genes by tethering the IN protein to chromatin.
Although LEDGF/p75 appears to be a major HIV-1 IN-binding cellular protein, other factors are likely involved in integration site selection by HIV-1 and HIV-1-based vectors. Analysis of a large number of integration sites demonstrated that preferred integration sites are found in the vicinity of certain computer-predicted epigenetic marks, such as histone H3 K4 methylation, H4 acetylation, or H3 acetylation (Kalpana, Marmon et al. 1994). These results suggest that chromatin structure, including the histone code, may also affect integration site selection. However, decisive evidence that these marks play a role in integration site selection has yet to be obtained. Other factors that affect integration site selection have also been identified. Knockdown of the T-cell lineage-specific chromatin organizer SATB1 (special AT-rich sequence-binding protein-1) reduces HIV-1 integration in the vicinity of SATB1-binding sites (Kumar, Mehta et al. 2007). SATB1 therefore seems to be implicated in integration site selection, by an unknown mechanism. Lastly, it has been shown that the cellular protein Ku80, which is present in the preintegration complex, directs integration to chromatin domains prone to silencing (Li, Olvera et al. 2001; Masson, Bury-Mone et al. 2007). In contrast to HIV-1, integration of MLV-based and ASV-based vectors does not seem to be determined by LEDGF/p75 (Mitchell, Beitzel et al. 2004; Narezkina, Taganov et al. 2004). What controls ASV integration site selection is still unknown. In the case of MLV, a study using HIV chimeras carrying MLV genes demonstrated that MLV IN appears to be the major determinant of integration site selection (Lewinski, Yamashita et al. 2006). Furthermore, Gag-derived proteins play an auxiliary role in integration site selection, as an HIV-1 chimera with MLV Gag showed site preferences different from those of both HIV and MLV (Lewinski, Yamashita et al. 2006). 
All the previous data support a different mechanism of integration site selection for MLV versus HIV.
In conclusion, current data have advanced our understanding of the retroviral integration site selection process and demonstrate a major role for host cell proteins in the process. Yet the process is not entirely understood, and additional determinants of retroviral integration site selection are likely to be identified in the near future.
4. Integration site selection and gene therapy
MLV and HIV-1 vectors are the two most widely used vectors in gene therapy. It was hypothesized that even if a retroviral vector integrates in the "wrong spot", it may not necessarily lead to the development of a tumor (Hahn and Weinberg 2002; Baum, Kustikova et al. 2006). However, this hypothesis was challenged when serious adverse effects emerged in gene therapy trials treating children with X-linked severe combined immunodeficiency (SCID-X1) (Hacein-Bey-Abina, Von Kalle et al. 2003; Alexander, Ali et al. 2007; Bushman 2007; Deichmann, Hacein-Bey-Abina et al. 2007; Faschinger, Rouault et al. 2008). In one of these trials, which used an MLV-based vector, 4 out of 11 patients developed T cell leukemia. Moreover, in another SCID-X1 gene therapy trial, one of the 10 patients enrolled was reported to have developed leukemia (Alexander, Ali et al. 2007; Schwarzwaelder, Howe et al. 2007; Thrasher and Gaspar 2008). Sequencing analysis of T cells from two of the patients who developed leukemia in the first trial showed insertion of the vector near the Lin-11, Isl-1, Mec-3 (LIM) domain only-2 (LMO2) protooncogene, which was subsequently activated by the long terminal repeat (LTR) enhancer of the vector (Hacein-Bey-Abina, Von Kalle et al. 2003). In the second trial, the vector insertion was also in the vicinity of the LMO2 protooncogene (Thrasher and Gaspar 2008). These striking data demonstrate that vector integration at a dangerous spot in the human genome can lead to cancer development. Other, unknown factors may also have contributed to the development of leukemia; proposed factors include expression of the transgene and chromosomal rearrangement (Hacein-Bey-Abina, Von Kalle et al. 2003; Pike-Overzet, de Ridder et al. 2006; Thrasher, Gaspar et al. 2006; Woods, Bottero et al. 2006). 
A follow-up analysis of the patients in these gene therapy trials revealed a nonrandom distribution of integration sites in vivo (Deichmann, Hacein-Bey-Abina et al. 2007; Schwarzwaelder, Howe et al. 2007). Integration of vectors occurred preferentially near the 5' ends of genes and associated CpG islands, which is consistent with the data obtained with MLV in in vitro studies (Bushman 2007; Deichmann, Hacein-Bey-Abina et al. 2007; Schwarzwaelder, Howe et al. 2007). Comparison of integration sites in transduced cells before and after infusion into patients showed that vector integration influences cell growth, survival, and proliferation in vivo (Deichmann, Hacein-Bey-Abina et al. 2007; Schwarzwaelder, Howe et al. 2007). Similarly, clonal evolution was observed in a gene therapy trial for ADA-SCID; however, in this trial no adverse effects were associated with the vector integration site (Aiuti, Cassani et al. 2007). In similar trials, vector insertion caused deregulation of gene expression without any development of cancer (Ott, Schmidt et al. 2006; Recchia, Bonini et al. 2006). Likewise, results from animal gene therapy models were similar to those obtained in human gene therapy trials (Li, Dullmann et al. 2002; Hematti, Hong et al. 2004; Modlich, Kustikova et al. 2005; Baum, Kustikova et al. 2006; Montini, Cesana et al. 2006). Moreover, Kaiser described the first successful gene therapy for β-thalassemia, which used an HIV vector to correct the β-globin gene (Kaiser 2009). The infused cells carrying the corrected gene proliferated strongly because of overexpression of mutated HMGA2. Follow-up of the patient did not show any serious adverse effects, but the elevation of HMGA2 remains a caveat.
In conclusion, integration of a retroviral vector into the human genome has contributed to the development of leukemia in both animal models and human patients. Nevertheless, these insertions may not be directly involved in cancer development, and only a few patients in gene therapy trials developed malignancies (Hacein-Bey-Abina, Von Kalle et al. 2003; Dave, Jenkins et al. 2004). These cases emphasize the need for further improvements in retroviral vector design, to obtain vectors with a low preference for “wrong spots” and thereby increase the safety margin in gene therapy applications.
5. Retargeting integration
The potential need for integration targeting was recognized even before the adverse events described above, and attempts to target integration were made in the last decade of the 20th century. These attempts involved attaching a specific DNA-binding domain (one that binds to a known DNA sequence) to the retroviral integrase protein. These fusion proteins were shown to target integration in vitro (in the “test tube”); however, when they were introduced into a vector particle, they either failed to perform integration or did not target it efficiently to the predicted sites (Goulaouic and Chow 1996). Following the discovery of LEDGF/p75, it was hypothesized that integration could be retargeted using a modified LEDGF/p75 protein. The Daniel laboratory therefore created a fusion protein in which the LEDGF/p75 chromatin-binding domain was replaced by the chromatin-binding domain of the heterochromatin protein 1a (HP-1a; Silvers, Smith et al.). HP-1a binds to trimethylated lysine 9 of histone H3, which is a hallmark of heterochromatin. It should be noted that cellular chromatin consists of euchromatin, which contains most genes, and heterochromatin, which contains mainly repetitive sequences and relatively few genes. Thus, integration into heterochromatin should be “safer” than integration into euchromatin and genes. This fusion protein, when transfected into cells prior to infection with an HIV-1 vector, indeed reduced the number of integration events occurring in genes. Other labs, following a similar strategy, demonstrated that integration in genes can be reduced further by knocking down the endogenous LEDGF/p75 (Ferris, Wu et al. 2010; Gijsbers, Ronen et al. 2010). It should be noted that the knockdown did not reduce integration efficiency, because the novel fusion proteins efficiently replaced LEDGF/p75 function. These results thus pave the way to retargeting integration and reducing the safety risk in gene therapy trials. However, caveats remain. 
One disadvantage of these methods is that targeting requires two vectors, one to deliver the fusion LEDGF/p75-based protein, and one to deliver the therapeutic genes. In addition, a significant percentage of integrations still occurred in genes. One possible approach to address the first weakness is to introduce the targeting protein directly into a vector particle. It is possible that the second disadvantage can be removed by using chromatin binding domains that show more specificity for heterochromatin than that of HP-1a. These approaches are currently being explored. We hope they ultimately result in self-targeting HIV-1 vectors that can carry negligible risk of adverse events in gene therapy trials.