Retroviral Genotoxicity

Introduction
Gene therapies have enormous potential to cure human disease. In recent years, hematopoietic stem cell (HSC) gene therapy has advanced tremendously, due in part to years of intense research to develop effective vectors and efficient ex vivo transduction protocols. In early clinical trials, inefficient gene transfer resulted in either no therapeutic benefit or only short-lived benefit [1-3]. Advances in preclinical animal models led to improved gene transfer in human clinical trials, where long-term efficacy has now been achieved. HSC gene therapy has been used to correct several monoallelic genetic diseases [4], such as X-linked severe combined immunodeficiency (SCID-X1) [5], chronic granulomatous disease (CGD) [6-8], adenosine deaminase deficiency (ADA-SCID) [9-12], Wiskott-Aldrich syndrome [13,14], and X-linked adrenoleukodystrophy [15,16]. Recently, HSC gene therapy has also been used to treat glioblastoma [17], X-linked hyper-immunoglobulin M syndrome (HIGM), and familial haemophagocytic lymphohistiocytosis syndrome (HLH) [18]. These successes are in large part due to advances in ex vivo transduction protocols and improvements in recombinant vector technologies. The French SCID-X1 HSC gene therapy trial marked a major turning point in the field when nine of the ten patients treated exhibited therapeutic benefit. However, following this exciting achievement, the field was dealt a major setback when it was initially reported that two patients from the study had developed vector-mediated leukemia resulting from the treatment [19]. This was the first vector-mediated malignancy reported in an HSC gene therapy clinical trial. Four boys ultimately developed leukemia as a side effect of the gene therapy procedure [5]. Three of the four boys were successfully treated with chemotherapy, but one patient died of vector-mediated T-cell leukemia. In these patients, vector-mediated dysregulation of host genes led to leukemia, and this adverse effect is currently a major challenge for HSC gene therapy. The effect of the integrated viral vector on host gene expression resulting in an altered phenotype is known as genotoxicity.
Genotoxicity results from retrovirus-mediated delivery of the vector genome into the host genome, where its integrated form is known as the vector provirus. Integration of the vector provirus into a host chromosome, by definition, alters the host DNA. In cases where a retrovirus or retroviral vector provirus has dysregulated host gene expression, insertional mutagenesis is said to have occurred. However, it is important to remember that provirus integration always results in mutation of the host genome, regardless of whether the vector provirus exerts an effect on host gene expression. The oncogenic properties of replication-competent retroviruses were well known prior to the development of retroviral vectors for gene therapy. However, vectors that are used in gene therapy have been engineered so that they cannot replicate and can only insert their genome into a target cell. These vectors are thus referred to as replication-incompetent. In numerous preclinical and clinical studies conducted prior to the SCID-X1 trial, malignancies were not observed when using replication-incompetent vector systems [20]. It was therefore assumed that the potential for malignant transformation from a replication-incompetent vector was very low. Unfortunately, it has now been clearly shown in the French SCID-X1 trial and in subsequent HSC gene therapy trials that genotoxicity is indeed a problem for replication-incompetent vectors.
Here we review the mechanisms of vector-mediated genotoxicity in HSC gene therapy and describe efforts to reduce genotoxicity, which remains a major challenge for the field [21,22].

Why integrating vectors are used for HSC gene therapy
Why use integrating vectors for HSC gene therapy if we know that retroviral vectors mutagenize the genome and therefore carry a risk of inducing genotoxicity? The answer is that provirus integration into HSCs is currently the only way to efficiently and stably deliver transgenes to the billions of mature blood cells produced every day in the body. Our mature blood cells are generated from a relatively small pool of self-renewing long-term repopulating HSCs in the bone marrow through a process known as hematopoiesis (Figure 1). During hematopoiesis, long-term repopulating stem cells provide lifelong supplies of the mature cells of each blood cell lineage via massive expansion of transit-amplifying cells that include multipotent and lineage-restricted progenitors. By permanently modifying a long-term repopulating HSC via proviral integration into the HSC genome, we can ensure that all progeny produced from these gene-modified cells will inherit the transgene during mitosis. Thus, the mature blood cells that arise during hematopoiesis from gene-modified HSCs and their daughter transit-amplifying cells all inherit the transgene. Using retroviral vectors to efficiently deliver a therapeutic transgene via integration of the vector provirus into an HSC is currently the only effective approach for HSC gene therapy. While there have been reports of some success with adenovirus and other non-integrating approaches in small animal models, to date only integrating vectors have been used successfully for HSC gene therapy in large animal models and in clinical trials.
Figure 1. Long-term hematopoietic stem cells (LT-HSCs) are a self-renewing population of stem cells that reconstitute the blood system throughout the entire life span. Short-term HSCs reconstitute the blood system for only limited periods. The short-term HSCs differentiate into multipotent progenitors (MPPs), which have the ability to differentiate into several transit-amplifying cell lineages. Common lymphoid progenitors (CLPs) differentiate into lymphoid progenitor cells (Pro-Dendritic Cell, Pro-T, Pro-NK, and Pro-B). Finally, these progenitors give rise to the mature lymphoid class cells of the blood system (T-lymphocytes, B-lymphocytes, and natural killer (NK) cells). Common myeloid progenitors (CMPs) give rise to granulocyte-macrophage progenitors (GMPs) and megakaryocyte-erythroid progenitors (MEPs), which differentiate into myeloid class progenitors (macrophage, granulocyte, megakaryocyte, and erythroid). Finally, these progenitors give rise to the mature myeloid class cells of the blood system.

Retroviruses as insertional mutagens
We now know that the use of retroviral vectors for HSC gene therapy, though highly efficient, can dysregulate host genes near the vector provirus and ultimately lead to malignant transformation. The ability of replicating retroviruses to cause tumorigenesis is well established. In 1911, Peyton Rous showed that a sarcoma growing on a domestic chicken could be transferred to another chicken by exposing the healthy bird to a cell-free filtrate [23]. This filterable agent is now known to be the Rous sarcoma retrovirus. Since this report, many retroviruses have been discovered that cause diverse malignancies. There are several mechanisms whereby retroviruses can cause malignancy. Varmus et al. showed that acutely transforming onco-retroviruses capture and deliver cellular oncogenes, which allow these viruses to efficiently convert target cells to a malignant phenotype [24]. It is important to note that oncogene capture does not occur at a detectable frequency with current replication-incompetent vectors used in gene therapy. Yet several mechanisms remain for cellular transformation by replication-incompetent retroviral vector proviruses (Figure 2). Despite the risk of malignant transformation from retroviral vectors via insertional mutagenesis, for several severe hematopoietic diseases the therapeutic benefit of HSC gene therapy outweighs the risks. Currently, major efforts are underway to further our understanding of genotoxic events and to improve vector safety by reducing genotoxic potential [18,25,26].

Overview of ex vivo HSC gene therapy
When studying genotoxicity, it is important to consider how the target cells are manipulated during the gene transfer process. HSC gene therapy is conceptually straightforward but requires culture of stem cells under appropriate ex vivo conditions. A patient's cells are collected and enriched for repopulating stem cells using the CD34 marker. The CD34 protein is a member of the sialomucin family and is expressed in early HSCs [27]. The CD34 marker allows for rapid enrichment of HSCs, typically via column enrichment using CD34 antibody-conjugated magnetic beads. CD34-enriched cells that include repopulating HSCs are then exposed to the vector containing the therapeutic gene in an ex vivo transduction process. Following ex vivo transduction, gene-modified cells must be infused into the patient. Correction of the disease phenotype will occur if enough gene-modified repopulating cells engraft. Engraftment requires that gene-modified cells survive, home to the bone marrow, and proliferate sufficiently to repopulate the blood system.
Early preclinical HSC gene therapy studies in mice demonstrated high gene transfer rates. However, early clinical trials using similar approaches and culture conditions had inefficient gene transfer [28,29]. Large animal models such as the dog and non-human primates more accurately model human HSC gene therapy and have since been used to establish the conditions for efficient ex vivo transduction [30]. More effective ex vivo gene transfer protocols were developed that resulted in higher gene transfer efficiencies while maintaining efficient engraftment. These improvements included defining cytokine support and the use of the extracellular matrix CH-296 fibronectin fragment [5,31,32]. Together these advances contributed to the success of HSC gene therapy clinical trials such as the French SCID-X1 trial, and efficient transduction of human CD34+ cells can now be routinely achieved using retroviral vectors. These improved gene transfer efficiencies also factored into the observed genotoxicity. For example, two patients in the French SCID-X1 trials were estimated to have received a total of 4.3 × 10^6 and 11.3 × 10^6 gene-modified CD34+γC+ cells/kg body weight, respectively [5]. Thus in each of these patients many proviral integrants exist, with the potential to dysregulate many nearby genes, including proto-oncogenes.
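To make the scale of "many proviral integrants" concrete, the following is a minimal arithmetic sketch. The two doses are the values quoted above; the body weight and the assumption of at least one vector copy per gene-modified cell are hypothetical and are not taken from the trial reports.

```python
def estimate_min_integrants(cells_per_kg, body_weight_kg, copies_per_cell=1.0):
    """Lower-bound estimate of proviral integrants in an infused cell product.

    copies_per_cell defaults to 1.0 because every gene-modified cell carries
    at least one provirus; the true average copy number is an unknown here.
    """
    if cells_per_kg < 0 or body_weight_kg <= 0 or copies_per_cell < 0:
        raise ValueError("inputs must be non-negative, with a positive body weight")
    total_cells = cells_per_kg * body_weight_kg
    return total_cells * copies_per_cell


# Doses quoted in the text (gene-modified CD34+ gamma-c+ cells per kg).
REPORTED_DOSES_PER_KG = (4.3e6, 11.3e6)
# Hypothetical body weight, chosen only to illustrate the arithmetic.
HYPOTHETICAL_WEIGHT_KG = 10.0

for dose in REPORTED_DOSES_PER_KG:
    integrants = estimate_min_integrants(dose, HYPOTHETICAL_WEIGHT_KG)
    print(f"{dose:.1e} cells/kg -> at least {integrants:.1e} integrants")
```

Only a fraction of the infused cells are long-term repopulating HSCs, so the number of integrants that persist in the patient would be far smaller than this product-level estimate.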

The SCID-X1 trials as an example of genotoxicity in HSC gene therapy
SCID-X1 is a fatal X-linked inherited disorder caused by mutation of the IL2RG locus, which encodes the common gamma chain (γC) cytokine receptor subunit [5]. Inactivating mutations in this gene prevent proper cellular communication and maturation of lymphoid progenitor cells. The loss of cellular signaling in lymphoid progenitors prevents the development of mature T, NK, and B cells. Many SCID-X1 patients fail to thrive or suffer morbidity and mortality in early life because of impaired immune function, which leaves them susceptible to life-threatening infection. Allogeneic HSC transplantation has been used to treat SCID-X1, but many patients do not have suitable donors. In addition, graft versus host disease is a major source of mortality for patients treated by this approach. Graft versus host disease occurs when transplanted (allogeneic) immune cells from a donor recognize the recipient's tissue as foreign and attack it. SCID-X1 HSC gene therapy using the γC transgene has a lower mortality rate and a higher treatment efficacy compared to conventional allogeneic bone marrow transplants [25], and is currently the only therapeutic choice for patients without a suitable donor. Prior to the French study, preclinical studies conducted in both murine and canine models corrected SCID-X1 deficiency with no reported adverse events [20,31,33,34].
In the French SCID-X1 trial, four of the ten patients developed T-cell leukemia caused by insertional mutagenesis from the Moloney murine leukemia virus (MLV)-based vector [5,25]. Careful molecular analysis of leukemic cells showed that the MLV provirus integrated near the proto-oncogene LMO2, and suggested that viral enhancer elements in the provirus contributed to leukemia (Figure 2B) [25]. The SCID-X1 trials were the first HSC gene therapy clinical trials in which vector-mediated insertional mutagenesis led to cancer. In this trial, MLV vector LTR enhancers activated LMO2 expression, resulting in LMO2-dependent T-cell proliferation (Figure 2B). LMO2 is normally silenced in mature T-cells, and when viral enhancers turn on expression, LMO2 drives T-cell proliferation by dysregulating transcription networks that affect the cell cycle. This promotes cell cycle escape and can result in higher proliferation rates compared to normal cells [35]. Leukemic transformation was a result of provirus integration near LMO2, with additional proviral integrants near other proto-oncogenes, which resulted in expansion of cells with these mutations [5,19].

Clonal expansion in the SCID-X1 trials
As evidenced in the French SCID-X1 trial, retroviral vector integration can dysregulate nearby host genes, thus affecting cell growth and survival. Once integrated, elements in the provirus, particularly enhancers in the LTR, can dysregulate nearby gene expression through several mechanisms (Figure 2). Aberrant gene expression can dysregulate host cell genes and regulatory networks involved with cell growth, including proto-oncogenes. A survival advantage can occur through the activation of proto-oncogenes, giving cells a proliferative advantage or "go signal". Alternatively, tumor suppressors can be inactivated, causing a proliferative advantage with uncontrolled cellular division from the lack of a "stop signal". To date, the genotoxicity described in HSC gene therapy trials has been through activation of growth-promoting genes and proto-oncogenes rather than inactivation of tumor suppressors. This is because activation requires a single integration into only one allele, whereas inactivation of a tumor suppressor requires a second event, either a second integration or a loss of heterozygosity at that allele. Tumor suppressors have, however, been identified in preclinical mouse studies [36,37]. These vector-mediated mutations, along with additional accumulating mutagenic events in expanding gene-modified repopulating cells, can ultimately result in tumorigenesis. The proviral promoter and enhancer elements have been shown to act up to a distance of 500 kb upstream and downstream of the site of proviral integration [38]. In the SCID-X1 trials where patients developed T-cell leukemia, the integration sites of dominant repopulating clones near the genes LMO2, BMI1, HMGA2, SEPT9, RUNX2, and RUNX3 gave rise to cells with a proliferative or survival advantage over competitor repopulating cells. These advantages eventually led to an over-representation of these clones (Figure 3) [5].
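The 500 kb reach of proviral enhancers suggests a simple screening step: for each integration site, list the annotated genes that fall within that window. The sketch below illustrates the idea under stated assumptions; the `Gene` record, the function name, and all coordinates are hypothetical placeholders, not real annotation data.

```python
from dataclasses import dataclass

WINDOW_BP = 500_000  # enhancer reach quoted in the text, applied on both sides


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # inclusive coordinate of the first base of the gene
    end: int    # inclusive coordinate of the last base of the gene


def genes_within_window(chrom, position, genes, window_bp=WINDOW_BP):
    """Return (gene name, distance in bp) pairs for genes near an integration site.

    Distance is 0 when the site lies inside the gene, otherwise the distance
    to the nearest gene boundary. Results are sorted from nearest to farthest.
    """
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.start <= position <= gene.end:
            distance = 0
        else:
            distance = min(abs(position - gene.start), abs(position - gene.end))
        if distance <= window_bp:
            hits.append((gene.name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical annotation and integration site, for illustration only.
EXAMPLE_GENES = [
    Gene("GENE_A", "chr1", 1_000_000, 1_050_000),
    Gene("GENE_B", "chr1", 1_700_000, 1_750_000),
    Gene("GENE_C", "chr2", 1_000_000, 1_050_000),
]
print(genes_within_window("chr1", 1_100_000, EXAMPLE_GENES))
```

Proximity alone does not show that a gene is dysregulated; a list like this only nominates candidates for expression analysis.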
We now know how vector-mediated dysregulation of these different genes may have contributed to clonal expansion and frank leukemia. LMO2, or LIM domain only 2, is a proto-oncogene that regulates early progenitor expansion during hematopoiesis [39]. LMO2 oncogenic properties were first observed in mature gene-modified T-cells, where it is not normally expressed [39]. However, after proviral integration events that activate LMO2, these gene-modified T-cells begin to expand because of dysregulated transcriptional regulatory pathways. In 2010, Oram et al. demonstrated that LMO2 expression in T cells activates FLI1 and ERG enhancers, which are known to be active in blood stem/progenitor cells. The gene products of FLI1 and ERG in turn activate the enhancer of the HHEX/PRH gene locus, which has been shown to act in early progenitor cell expansion and formation of T-lineage acute lymphoblastic leukemia (T-ALL) [35]. LMO2 overexpression has also been demonstrated to reduce or eliminate cyclin-dependent kinase (CDK) inhibitors, promoting escape from the G1 cell cycle checkpoints during cellular division [40]. Once aberrantly expressed within a cell, LMO2 promotes cell cycle progression via multiple mechanisms, giving the cell a proliferative advantage.
Additional proto-oncogenes were activated in the French SCID-X1 patients. The BMI1 proto-oncogene normally functions in self-renewal and maintenance of primitive hematopoietic stem cells [40,41]. Like LMO2, dysregulation of BMI1 via enhancer activation results in clonal expansion. Additional mechanisms mediate clonal expansion. For example, the high mobility group AT-hook 2 gene (HMGA2) can provide a proliferative advantage when the endogenous gene is truncated via insertional mutagenesis. Truncation results in the loss of regulatory target sequences within the mRNA, preventing degradation directed by the endogenous miRNA let-7. These miRNA elements normally function to regulate HMGA2 via RNA interference using the RNA-induced silencing complex (RISC) machinery [42]. Truncated HMGA2 mRNA is not degraded and thus continues to activate gene networks involved in cell proliferation and cell cycle progression [42]. Another mechanism that can contribute to clonal expansion is the aberrant expression of septin proteins. SEPT9 functions as a microtubule regulator and plays an important role in cytokinesis and chromosome segregation, thus affecting genomic stability [43,44]. When aberrantly expressed, SEPT9 causes dysregulated cytokinesis, resulting in missegregation of chromosomes [43,44]. Genomic instability then results from SEPT9 dysregulation, leading to the accumulation of chromosomal deletions or amplifications. These events, in conjunction with additional mutations, can enhance cell proliferation and survival.
Two additional genes identified near vector proviruses in the SCID-X1 trial were the RUNX family members RUNX2 and RUNX3. RUNX proteins (RUNX1, RUNX2, and RUNX3) are a family of RUNT homology domain-containing α-subunits that form heterodimeric transcription factors, which mediate hematopoietic differentiation and expansion in conjunction with the β subunit, core binding factor (CBF). Aberrant expression of the RUNX proteins in mouse models hinders the differentiation capacity of myeloid class progenitors and represses expression of several target genes including Csf1R, Mpo, Cebpd, and the cell cycle inhibitor Cdkn1a [45]. Repression of these genes blocks hematopoietic stem cell differentiation, leading to an accumulation of undifferentiated cells. These cells cannot pass the differentiation block to repopulate the depleted blood cell niche. The lack of differentiated mature cells continues to generate proliferative signals that further stimulate mutant HSC expansion. The expanded undifferentiated blast cells accumulate in the bone marrow, disrupting normal blood cell production, and can eventually give rise to various cytopenias and leukemic blast crisis.
Figure 3. [...] This is an ideal proviral distribution, which HSC gene therapy would ideally maintain after infusion into patients. However, due to genotoxicity and selection pressures in vivo, C) oligoclonal expansion can be observed, where some clones expand. In some cases, D) individual clones may harbor a proviral integration near genes promoting a proliferative or survival advantage, which may eventually contribute to malignancy.
In summary, our current understanding of genotoxicity is that vector proviruses dysregulate the expression of key regulators of cell cycle, cell survival, genomic stability, proliferative gene networks, and cellular differentiation. This leads to an over-representation of gene-modified clones with these mutations in the peripheral blood (PB) and bone marrow (BM), which is referred to as clonal expansion (Figure 3). Clonal expansion can lead to additional mutations that can eventually cause frank leukemia. The leukemias observed in the French SCID-X1 trial refocused the gene therapy field on better understanding the mechanisms by which vectors dysregulate host genes, with the ultimate goal of reducing the risk of genotoxicity.

Risk factors for clonal expansion
It is important to remember that the SCID-X1 trials are one example of clonal expansion in a specific disease setting using a specific vector type with a specific transgene. Several factors can influence clonal expansion, including the type of vector, the vector design, the therapeutic transgene, and the disease setting. For example, the transgene in and of itself may provide a selective advantage to cells through its expression. This was true in the SCID-X1 trials, where expression of the corrective γC transgene gave cells a proliferative advantage, allowing reconstitution of the lymphoid cell population from a modest number of gene-modified HSCs [5]. To determine the mechanisms behind these expansion events, investigators must first characterize the integration sites of the vectors being used to deliver the therapeutic transgene. This allows identification of nearby genes that may have been dysregulated, leading to clonal expansion.

Vector integration sites in expanded repopulating clones allow clonal tracking
To understand the risk of genotoxicity, we need to identify where different vectors tend to integrate. Conveniently, integrated vector proviruses serve as molecular tags that can be used to identify integration sites and to track specific clones in order to study clonal expansion. Sequencing the unique vector-chromosome junctions can identify where in the genome the virus has integrated. Analysis of vector insertion sites has allowed researchers to compile comprehensive integration profiles for specific virus types and to assess the safety of viral vectors based on the regions of preferred or recurrent integration.
Long-term tracking of gene-modified cells is necessary to monitor potential adverse events that may occur over time and result in clonal expansion. By identifying the spectrum of vector integration sites in repopulating cells, the clonality of repopulating cells can be estimated and the expansion of specific clones can be monitored. Long-term tracking may also provide insight into specific mechanisms of clonal expansion, such as the emergent LMO2 expansion in the SCID-X1 trials, and will direct novel approaches to reduce genotoxic effects [46]. Understanding where retroviruses integrate in the human genome, and how this affects their safety for use in HSC gene therapy, has become an important area of study.
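The clonal tracking described here can be sketched as a small calculation: treat each unique integration site as a clone identifier, compute the relative contribution of each clone at a time point, and flag clones above a chosen cutoff. This is a minimal illustration only. The function names, the site tuples, and the 20% cutoff are hypothetical, and raw read counts are used as a stand-in for clone abundance even though they can be skewed by amplification bias.

```python
from collections import Counter


def clone_frequencies(site_observations):
    """Map each integration site to its fraction of all observations."""
    counts = Counter(site_observations)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


def dominant_clones(site_observations, threshold=0.2):
    """Return sites whose relative contribution is at or above the threshold.

    The default threshold is an arbitrary illustration, not a clinical cutoff.
    """
    freqs = clone_frequencies(site_observations)
    return sorted(site for site, freq in freqs.items() if freq >= threshold)


# Hypothetical observations: (chromosome, position, strand) per sequenced junction.
EXAMPLE_OBSERVATIONS = (
    [("chr1", 100, "+")] * 6
    + [("chr2", 200, "-")] * 3
    + [("chr3", 300, "+")] * 1
)

for site, freq in sorted(clone_frequencies(EXAMPLE_OBSERVATIONS).items()):
    print(site, f"{freq:.0%}")
print("dominant:", dominant_clones(EXAMPLE_OBSERVATIONS, threshold=0.5))
```

Repeating the calculation across time points would show whether a clone's share is stable or rising, which is the signal of interest for clonal expansion.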

The integration profile of retroviruses and its relation to genotoxicity
Identifying the integration profiles of different vector types has provided important data on the relative genotoxic risk associated with different vectors. Following the leukemias observed in the SCID-X1 trial, integration site distributions were described for different retroviral vectors being developed for gene therapy. Viral integration sites for HIV-1, MLV, and foamy virus (FV) vectors were reported, and each exhibits a specific and unique integration profile. HIV-1-based vectors showed a preference for integration within actively transcribed genes [47], whereas MLV vectors tend to integrate near transcription start sites and CpG islands [48-50]. FV vectors also preferentially integrate near transcription start sites and CpG islands, but less frequently than MLV vectors, and they integrate less frequently in genes than HIV vectors. The propensity of MLV-based vectors to integrate very close to promoter regions was of significant concern since this may increase the risk of dysregulating proto-oncogenes. The integration profiles were found to be largely independent of the route of entry [47,51,52] and target cell type [49,53,54], although characteristics such as the cell cycle status of the target cell can play a minor role in the profile [49,54].
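An integration profile of this kind can be summarized as the fraction of sites that fall inside genes and the fraction that fall near transcription start sites. The sketch below shows one way to compute such a summary. The `GeneModel` record, the 2.5 kb window around the start site, and the coordinates are assumptions chosen for illustration and do not come from the cited studies.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    chrom: str
    start: int   # leftmost coordinate of the transcribed region
    end: int     # rightmost coordinate of the transcribed region
    strand: str  # "+" or "-"


def classify_site(chrom, pos, genes, tss_window=2500):
    """Report whether a site is inside any gene and/or near any start site."""
    in_gene = False
    near_tss = False
    for gene in genes:
        if gene.chrom != chrom:
            continue
        # The transcription start site is the left end on "+", right end on "-".
        tss = gene.start if gene.strand == "+" else gene.end
        if abs(pos - tss) <= tss_window:
            near_tss = True
        if gene.start <= pos <= gene.end:
            in_gene = True
    return {"in_gene": in_gene, "near_tss": near_tss}


def integration_profile(sites, genes, tss_window=2500):
    """Fraction of sites inside genes and fraction near a start site."""
    if not sites:
        return {"in_gene": 0.0, "near_tss": 0.0}
    labels = [classify_site(c, p, genes, tss_window) for c, p in sites]
    n = len(labels)
    return {
        "in_gene": sum(label["in_gene"] for label in labels) / n,
        "near_tss": sum(label["near_tss"] for label in labels) / n,
    }


# Hypothetical gene models and integration sites.
EXAMPLE_MODELS = [
    GeneModel("chr1", 10_000, 20_000, "+"),
    GeneModel("chr1", 50_000, 60_000, "-"),
]
EXAMPLE_SITES = [("chr1", 10_500), ("chr1", 15_000), ("chr1", 61_000), ("chr1", 30_000)]
print(integration_profile(EXAMPLE_SITES, EXAMPLE_MODELS))
```

Comparing these fractions against the same summary for randomly placed sites would indicate whether a vector is enriched for a given feature.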
The integration profile of viruses and viral vectors is greatly influenced by a complement of host proteins that interact with the poorly defined retroviral pre-integration complex (PIC). The PIC is a complex of proteins associated with the viral genome, and during infection the PIC must migrate to the nucleus to mediate integration of the reverse-transcribed viral DNA to generate the vector provirus. This process and the associated proteins that affect it have been studied using various methods [55-57]. Viral Gag and integrase proteins have been shown to interact with chromatin, effectively tethering the PIC to specific chromosomal regions and thus directing integration [57,58]. Studies have compared the contributions of the viral integrase and Gag proteins using MLV-HIV chimeras and shown that both play important roles in integration site specificity [57]. Lens epithelium-derived growth factor (LEDGF) is a host-derived tethering protein that has been demonstrated to associate with the HIV PIC and chromatin, affecting HIV-1 integration patterns. This host protein has a strong binding affinity for HIV integrase proteins, which are associated with the lentiviral PIC [59]. The tethering of the PIC to LEDGF protects the PIC from host enzymatic defenses [56], promotes chromatin binding [57,60], and directs integration site distribution [61]. Unique to foamy virus biology, the C-terminal end of the Gag protein contains glycine-arginine motifs known as GR boxes [62]. These boxes direct viral packaging [63,64] and nuclear localization [62,64]. In addition to these features, a 13 amino acid motif called the chromatin-binding site (CBS) has been characterized [58]. This CBS contains a functional binding domain for core histones H2A/H2B that is thought to tether the PIC to chromatin after translocation into the nucleus [58]. Host chromatin tethering proteins often associate with the PIC and affect integration site distributions. Better characterization of cell-virus interactions should enhance our understanding of viral integration patterns. This could potentially lead to novel approaches to direct vector integration to "safe harbor" chromosomal regions that do not contain genes that can lead to clonal expansion when dysregulated.

Methods for integration site analysis
Many methods exist for generating retroviral insertion site data. PCR-based techniques include ligation-mediated PCR (LM-PCR, Figure 4A), linear amplification-mediated PCR (LAM-PCR, Figure 4B), and non-restrictive LAM-PCR (nrLAM-PCR, Figure 4C). LM-PCR relies on frequently cutting restriction enzymes to generate fragments that contain the provirus:chromosome junction. These fragments are then ligated to linkers, and after several rounds of PCR, the resulting products are sequenced. LAM-PCR uses an LTR-specific primer in several rounds of 'linear' amplification where the LTR:chromosome junction is amplified. Nested PCR is then used to produce products that can be directly sequenced or transformed into bacteria and sequenced. nrLAM-PCR is similar to LAM-PCR but uses random shearing rather than digestion of DNA with restriction enzymes prior to linker ligation and sequencing, thus avoiding restriction site bias, and is currently the gold standard in the field.
Figure 4. A) Ligation-mediated PCR (LM-PCR), where genomic DNA is cut by restriction enzyme digestion, ligated to a linker, and amplified before sequencing of oligos with an LTR-specific primer. B) Left panel: Linear amplification-mediated PCR (LAM-PCR) amplifies regions of genomic DNA containing integrated vector proviruses using an LTR-specific primer. The resulting oligos are captured on magnetic beads and double-strand synthesis is performed, followed by restriction enzyme digestion and ligation of a double-stranded linker. Nested PCR is then used and the resulting products are sequenced. Right panel: Non-restrictive linear amplification-mediated PCR (nrLAM-PCR) amplifies genomic DNA containing integrated vector proviruses with an LTR-specific primer. The resulting products are enriched on magnetic beads, followed by single-strand linker ligation. Nested PCR is then employed and the products are sequenced.
One limitation of the above methods is that PCR bias can affect the frequency of detected integration sites [65,66]. Another method that has been used is shuttle vector rescue technology, which eliminates PCR-based bias [54,67,68]. In shuttle vector rescue, vector plasmids encode a bacterial origin of replication and a selection gene. DNA fragments that contain the shuttle vector LTR:chromosome junction are ligated and then transformed into bacteria. These bacteria can then be grown as colonies to amplify plasmid clones of each potential insertion site in the absence of PCR-based skewing (Figure 5). Plasmid DNA is then extracted from bacterial colonies and sequenced with an LTR-specific primer. In all of the above methods, aligning the genomic sequence immediately next to the proviral LTR to a published genome database allows identification of the proviral integration site. It will be interesting to compare the shuttle vector approach to nrLAM-PCR in animal models to provide information on any potential bias from either technique.
Figure 5. [...] LTR, the genetic element encoding a bacterial origin of replication (Ori), and the vector provirus 3' LTR. Vector-exposed cells are lysed and genomic DNA harboring integrated vector proviruses is collected. The genomic DNA harboring proviral integrants is fragmented by restriction enzyme digestion and then self-ligated to form plasmids, which may contain portions of the provirus encoding a bacterial origin of replication and an antibiotic selection gene. These plasmids are then transformed into E. coli. E. coli transformed with plasmids containing the bacterial origin and antibiotic resistance gene will form colonies. Sequencing the colony plasmids identifies proviral LTR:chromosome junctions.
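All of the methods above end with the same computational step: isolating the genomic sequence that sits immediately next to the proviral LTR so that it can be aligned to a reference genome. The following is a minimal sketch of that trimming step, assuming exact matches to a known LTR end and linker. The sequences, the function name, and the 20-base minimum length are hypothetical; real pipelines also tolerate sequencing errors and perform the alignment itself, which is not shown.

```python
def extract_genomic_flank(read, ltr_end, linker, min_length=20):
    """Return the genomic sequence between the LTR end and the linker.

    Returns None when the LTR end is not found or when the remaining
    flank is too short to be aligned with confidence.
    """
    read = read.upper()
    ltr_end = ltr_end.upper()
    linker = linker.upper()

    ltr_at = read.find(ltr_end)
    if ltr_at == -1:
        return None  # not a junction read

    flank_start = ltr_at + len(ltr_end)
    linker_at = read.find(linker, flank_start)
    # If no linker is present, keep everything after the LTR end.
    flank_end = linker_at if linker_at != -1 else len(read)

    flank = read[flank_start:flank_end]
    return flank if len(flank) >= min_length else None


# Hypothetical sequences, for illustration only.
EXAMPLE_LTR_END = "GGTCTTTCA"
EXAMPLE_LINKER = "CCTAACTGCTGTGCC"
EXAMPLE_FLANK = "ACGTACGTACGTACGTACGTAAGG"
EXAMPLE_READ = "AAAC" + EXAMPLE_LTR_END + EXAMPLE_FLANK + EXAMPLE_LINKER

print(extract_genomic_flank(EXAMPLE_READ, EXAMPLE_LTR_END, EXAMPLE_LINKER))
```

Requiring the LTR end to be present guards against reads that arise from non-specific amplification rather than a true vector:chromosome junction.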

Animal models to study genotoxicity: Tumor prone mouse models
Animal models allow the in vivo study of the genotoxicity of HSC gene therapy approaches within specific disease contexts. These studies are critical because, while in vitro genotoxicity assays can provide important information on the relative genotoxicity of different vectors [50], only animal models can assess genotoxic effects on in vivo hematopoiesis. Tumor prone mouse models have provided important data on the relative genotoxicity of different vector systems and have identified genes and gene networks involved in vector-mediated malignant transformation [69]. The advantage of tumor prone mice is that the frequency of clonal expansion and tumorigenesis resulting from vector-mediated genotoxic events is increased, thereby allowing readout of vector-mediated malignancy within the life span of a mouse.
Several studies have focused on gammaretroviral and lentiviral vectors, testing vectors with hybrid LTRs from both viral systems to identify the elements responsible for different genotoxicities. These studies, in conjunction with tumor prone mouse models, have informed vector design modifications. Enhancer deletion, use of internal housekeeping promoters, and deletion of vector cryptic splice sites can be used to reduce genotoxic events and improve safety [69]. The tumor prone Cdkn2a-/- mouse model has been used to compare the insertional oncogenic potential of MLV- and HIV-based vectors in in vivo genotoxicity assays [70]. These assays demonstrated that HIV-based lentiviral vectors exhibited an improved safety profile compared to MLV-based vectors. The Cdkn2a locus controls cell senescence and has been shown to prevent cell transformation. Inactivation of this gene promotes malignancy and has been implicated in almost all types of human cancer [71,72]. These studies have compared the genotoxic contribution of vector components such as strong LTR promoters. They have also shown the ability of self-inactivating LTR designs to reduce genotoxicity.
Although these studies primarily identify activated proto-oncogenes, it is also possible to identify dysregulated tumor suppressors using retroviral mutagenesis screens [73-75]. Proviruses can downregulate nearby host gene transcription when host cell methylation of the proviral LTR also leads to methylation of nearby host genes. Identification of the vector provirus location and nearby host genes can be used to identify haploinsufficiencies related to malignancy. Recent observations of viral LTR methylation causing proviral transgene silencing can now be used to identify downregulation of host genes near the methylated proviral integration sites [75]. The vector LTR methylation events and subsequent silencing of host genes can identify potential tumor suppressors related to vector-mediated genotoxicity. A recent study used methylation-specific PCR and methylated DNA immunoprecipitation assays to analyze methylated proviral integrations in mutagenized mouse tumors [75]. In this study, the identification of the methylated and downregulated gene PTP4A3 in MLV vector-mutagenized murine leukemia samples suggests that haploinsufficiency may be involved in retroviral genotoxicity [75]. This study also suggests that future studies may identify vector-mediated haploinsufficiency genes that contribute to genotoxicity.

Large animal models of genotoxicity
Large animal models allow long-term monitoring of HSC genotoxicity because large animals such as dogs and nonhuman primates have longer life spans than mice. These models are important to assess the long-term risks associated with malignant transformation following insertional mutagenesis from clonal expansion and long-term selection pressures. In two nonhuman primate studies, the distributions of MLV-based gammaretroviral and SIV- and HIV-1-based lentiviral integration sites were evaluated over long periods [76,77]. Clonal expansion or malignant transformation was not observed. However, integrants were observed at higher than expected frequencies near growth-promoting genes and proto-oncogenes. This suggests that integrations near these genes can influence the survival of the repopulating clones that carry them. These studies can shed light on potential mechanisms of clonal expansion and can allow comparison of different vector types. However, these studies in normal animals, using vectors with a reporter gene that is not expected to provide a selective growth advantage, did not predict the clonal expansion observed in the SCID-X1 trial. This suggests that large animal models of specific disease settings, such as the SCID-X1 dog model [34], will be important to test improved vectors designed to reduce genotoxicity in a specific disease setting.
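The observation that integrants occur near proto-oncogenes at higher than expected frequencies rests on a simple proximity count, which can be sketched as follows. This is an illustrative sketch only: the function name `fraction_near_genes`, the 50 kb window, and the coordinates are assumptions made for the example and are not taken from the cited studies. In practice the fraction obtained for observed integration sites would be compared with the fraction obtained for a matched set of random genomic positions.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def fraction_near_genes(sites, gene_starts, window=50_000):
    """Fraction of integration sites lying within `window` bp of any gene start.

    sites and gene_starts are iterables of (chromosome, position) tuples.
    The window size is an assumption; studies differ in the distance they use.
    """
    sites = list(sites)
    # Group gene start positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in gene_starts:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    near = 0
    for chrom, pos in sites:
        starts = by_chrom.get(chrom, [])
        lo = bisect_left(starts, pos - window)   # first gene start >= pos - window
        hi = bisect_right(starts, pos + window)  # one past last gene start <= pos + window
        if hi > lo:                              # at least one gene start in the window
            near += 1
    return near / len(sites) if sites else 0.0


# Made-up coordinates for illustration.
observed = [("chr1", 100_000), ("chr1", 900_000), ("chr2", 5_000)]
oncogene_starts = [("chr1", 120_000), ("chr2", 500_000)]
print(fraction_near_genes(observed, oncogene_starts))
```

Running the same function on randomly drawn positions gives the expected background fraction against which the observed value can be judged.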
Another important contribution of large animal models has been to improve our understanding of the effects of ex vivo culture on clonal expansion. It has been shown in a nonhuman primate model that genotoxicity can be significantly influenced by the ex vivo culture conditions of gene-modified cells. In this study, only six days of culture increased the incidence of specific clones with gammaretroviral vector integrations near the MDS1/EVI1 locus, which is associated with a leukemic phenotype [78]. To monitor the potential effects of ex vivo culture conditions and in vivo selection on gene-modified repopulating cells, clonal tracking methods must be employed.

Tracking of genetically modified clones
The above studies have identified mechanisms of genotoxicity and clonal expansion. During clinical trials, it is important to monitor potential clonal expansion in order to understand genotoxicity and to anticipate potential adverse events [79]. As an example, dysregulation of HMGA2 in a clinical trial for β-thalassemia resulted in clonal expansion of gene-modified cells that has provided a therapeutic benefit without malignancy to date [80]. β-thalassemia is a genetic deficiency that hinders β-globin production, and patients with this mutation rely on continued blood transfusions to restore normal blood globin levels. Before HSC gene therapy, the only therapy available was allogeneic transplant; however, the procedure is high risk and is limited by a lack of matched donors. Thus, patients risk transplant rejection or development of graft versus host disease. To achieve therapeutic benefit using HSC gene therapy, lineage-specific transgene expression in erythrocytes is required, promoting appropriate β-globin expression. Therapeutic benefit was achieved in two gene therapy patients, resulting from a partially dominant clone harboring proviral insertions near HMGA2 [80]. The authors of this study conclude that the clone with the HMGA2 insertion may remain homeostatic or eventually progress through multistep leukemogenesis, indicating a strong need for continued gene-marking studies and clonal tracking of these gene-modified cells in vivo [80].

MDS1/EVI1, PRDM16, SETBP1 in trial for CGD
Chronic granulomatous disease (CGD) is an X-linked inherited immunodeficiency resulting from a mutation in one of the NADPH oxidase genes [87]. Mutations affecting the gp91phox protein account for 70% of cases [81]. The gp91phox transgene has been used in corrective HSC gene therapy clinical trials [81]. Unlike SCID-X1 gene-modified cells, CGD gene-modified cells do not exhibit a proliferative advantage from transgene expression. The lack of conditioning or selection of gene-modified cells contributed to a loss of therapeutic benefit and of detectable gene-modified cells. Patients in the initial trials had low marking, with expression of the corrective transgene and therapeutic benefit lasting only for short periods. Adverse genotoxic events developed 2.5 years after the initial therapy, with clonal expansion and leukemic transformation [7,82]. Clonal analysis found activation of the MDS1/EVI1, PRDM16, and SETBP1 proto-oncogenes [81]. One of the patients died during treatment from complications arising from the leukemia; the other patient survived after receiving an additional allogeneic transplant. In this study there was inefficient engraftment and short-term transgene expression, with vector silencing via vector LTR methylation (Figure 2D) [83]. Further improvements to enhance engraftment and selection of gene-modified cells after infusion are needed for HSC gene therapy to treat CGD.

CCND2 and MDS1/EVI1 in trial for Wiskott-Aldrich syndrome
HSC gene therapy has also been used in the treatment of Wiskott-Aldrich syndrome (WAS), an X-linked recessive immune disorder. In this study, patients underwent conditioning with busulfan to enhance the engraftment of gene-modified cells [84]. Patients exhibited therapeutic benefit, with resolution of disease symptoms, although clonal skewing was detected for clones that harbored vector integration sites near CCND2 and MDS1/EVI1 [84]. Despite the high success of WAS HSC gene therapy in nine of the ten patients treated, patient 2 was reported to have experienced vector-derived genotoxic events after more than 3 years of therapeutic benefit [84], ultimately resulting in T-cell leukemia. The leukemia was a result of proviral integration near the LMO2 gene, and this patient has since been treated with chemotherapy, resulting in remission [85-87].
Clonal tracking in vivo has recently been employed in a study where patients with glioblastoma were given gene-modified hematopoietic repopulating stem cells carrying a methylguanine methyltransferase mutant (MGMT-P140K) [17]. In this approach, gene-modified hematopoietic repopulating cells expressing this mutant enzyme are resistant to O6-benzylguanine (O6BG). This allows treatment of the glioblastoma solid tumor with O6BG and an alkylating agent. By protecting the hematopoietic system from chemotherapy-mediated hematopoietic toxicity, higher doses of chemotherapy can be used to treat the glioblastoma. In patients undergoing chemotherapy, gene-modified cells were monitored to track potential clonal expansion and to assess patient safety. Repopulating cells were tracked and their retroviral integration sites monitored at several time points, pre- and post-chemotherapeutic treatment. Throughout the course of chemotherapy treatment, over 12,000 unique retroviral insertion sites (RISs) were present in the three treated patients. The heterogeneity of RISs suggests a highly polyclonal engraftment of gene-modified repopulating cells. During tracking, two patients exhibited clonal expansion, with prominent clones appearing with vector proviruses in the PR domain-containing 16 (PRDM16), SET binding protein 1 (SETBP1), and high-mobility group A2 (HMGA2) genes.
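Clonal tracking of this kind reduces, at each time point, to estimating the relative contribution of each retroviral insertion site and flagging clones that exceed some abundance. A minimal Python sketch is given below. The function names, the 20% threshold, and the site labels are assumptions made for illustration and are not drawn from the cited trial; in addition, sequence read counts are only a proxy for clone size, since PCR bias can distort detected frequencies [65,66].

```python
from collections import Counter


def clone_fractions(ris_reads):
    """Map each insertion site to its share of all reads at one time point."""
    counts = Counter(ris_reads)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


def dominant_clones(ris_reads, threshold=0.2):
    """Insertion sites whose read share is at least `threshold`, largest first.

    The default threshold of 0.2 is an arbitrary illustrative cutoff.
    """
    fractions = clone_fractions(ris_reads)
    flagged = [site for site, frac in fractions.items() if frac >= threshold]
    return sorted(flagged, key=lambda site: fractions[site], reverse=True)


# Hypothetical reads, each labeled by the insertion site it maps to.
reads = ["chr1:100"] * 6 + ["chr3:500"] * 3 + ["chr7:42"]
print(clone_fractions(reads))
print(dominant_clones(reads))
```

Applying the same calculation to samples taken before and after each chemotherapy cycle shows whether a flagged clone is stable, expanding, or receding over time.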
In summary, it is clear that HSC gene therapy is an efficacious therapeutic approach, able to treat debilitating and often fatal genetic deficiencies. However, the observed clonal expansion in these early clinical trials presents a major concern in the field. There is a need for vectors with an improved safety profile that are less likely to dysregulate genes and lead to clonal expansion.

Next-generation vectors: Reducing genotoxicity
Extensive efforts are underway to develop vector systems with safer integration profiles and reduced genotoxic effects. One approach is to retarget vector integration using tethering proteins that redirect the PIC. Other efforts focus on reducing genotoxicity by producing vectors less likely to dysregulate nearby genes. Such vectors include those with self-inactivating LTRs, in which enhancer elements or U3 regions have been deleted, preventing enhancer-mediated expression of nearby genes. Newer vectors are also able to regulate context-dependent transgene expression using insulators and repressor elements to prevent viral promoters from activating genes near the site of insertion [88]. Recently, investigators have also identified insertional effects mediating alternative splicing, producing aberrant splice variants and protein fusion products that cause oncogenesis [89,90]. Modifying the cryptic splice sites in vector backbones can create safer vectors by reducing aberrant splice variants and the resulting dysregulation of gene expression (Figure 2C) [89,91-93]. In addition, vector and host miRNAs have recently been explored. An example of miRNA control was demonstrated using let7 miRNA control elements to regulate expression of transgenes in stem cells versus somatic cells. Silencing of the transgenes occurs in mature somatic cells through miRNA cleavage sites. When let7 is matured and expressed, the transgene containing the let7 target sequence is cleaved [94]. In pluripotent cells, let7 is not expressed; thus the target sequences are not cleaved and the full-length transgene is expressed [94]. This technology could potentially help HSC gene therapy over a major hurdle by reducing vector-borne genotoxicity through transgene expression in a highly controlled, cell-specific context. These miRNA technologies can restrict transgene expression to a specific cell type, and even to a specific differentiation stage of that cell type, allowing more specific control of transgene delivery, dosage, and expression [95]. Incorporation of miRNA technologies can improve vector efficacy and safety, ultimately reducing or limiting vector-borne genotoxic events.

Chromatin insulators
Chromatin insulators are being developed to reduce the propensity of integrated vector proviruses to dysregulate host gene expression. Insulators are DNA elements that repress the activity of enhancers on promoters. The chicken hypersensitive site-4 (cHS4) insulator contains five DNA binding elements within a 250 bp fragment known as the dominant DNase hypersensitive site [96,97]. A 650 bp cHS4 element has been characterized in conjunction with a 400 bp element from cHS4 that can sufficiently block enhancer activation [98]. Additional insulators that have been described include the sea urchin sns5 insulator and an adeno-associated viral (AAVS1) insulator, DHS-S1 [99,100]. The cHS4 insulator has been used in several retroviral vector systems [80,99,101-106]. In initial studies, cHS4 lentiviral vectors were shown to be effective in reducing genotoxicity [107]. Their use in erythrocytes gave encouraging results, albeit with low titers. In addition, the β-thalassemia trial demonstrated the effects of insulator failure after a reduction of cHS4 element repeats, which was reported to have contributed to insertional mutagenesis and expansion of clones harboring HMGA2 mutations [80]. Sea urchin sns5 is a 462 bp insulator region that was demonstrated to function in gammaretroviral vectors by maintaining chromatin position effects [100]. This element also contains a previously identified insulator region of 265 bp found to block enhancer-activated directional transcription in human cells [108,109]. The DHS-S1 viral insulator has been demonstrated to increase transgene expression 1000-fold from an elongation factor 1-alpha (EF1α) promoter in muscle cells, but was not studied for its ability to block transactivation of host genes [99]. Insulators can potentially serve several major functions: protecting against vector silencing, moderating vector variegation or uniformity of expression, and protecting nearby host genes from enhancer activation. Additional studies should help better characterize the efficacy of insulated vectors.

Incorporation of cell-type specific control elements
Cell-type specific control elements, such as an erythrocyte-specific enhancer-promoter, have been incorporated into vectors to control transgene expression [110]. The use of a lineage-specific promoter ensures that transgene expression only occurs within the lineage in which the promoter is active, and avoids expression of the transgene in other cell types in which the promoter is not active. The premise of lineage-restricted promoters for HSC gene therapy is that they may eliminate or reduce genotoxicity resulting from dysregulation of genes in stem/progenitor cells. This is accomplished by activating transgene expression only in a cell lineage in which transgene expression is required for therapeutic benefit. This approach might protect primitive cells from dysregulation, as the promoter is not active until differentiation into the target cell type. This is an attractive area of research for diseases that characteristically manifest in one lineage of the blood system, such as hemoglobinopathies. This approach was used in a thalassemia trial, where a β-locus-control-region-derived promoter was used [80]. The transgene is delivered to long-term repopulating HSCs, but the promoter is not active. Only after erythroid differentiation would the enhancer become active, resulting in transgene expression in erythrocytes. This may reduce the occurrence of proto-oncogene activation in stem/progenitor cells. Other lineage-specific promoters are being studied, including B cell lineage-specific promoters [111]. When a lineage-specific promoter is not a viable option, vectors may need to be targeted to specific regions of the chromatin, where vector insertion is at a much lower risk of causing malignancy.

Re-targeting of retroviral vectors
Efforts have been made to target retroviral proviruses to specific chromosomal locations. LEDGF, a host cell protein that interacts with HIV integrase, has been used by Gijsbers et al. to tether and target viral integration [112]. In this study, cells were modified to express LEDGF protein containing a chromatin-interacting domain of chromobox homolog 1 (CBX1), which binds di- and tri-methylated regions of histone 3 (H3) in heterochromatic regions of the genome [112]. These modified H3 histones are found in pericentric heterochromatin, which is safer in terms of insertional mutagenesis as genes in these regions are normally silent. The reported significant retargeting of integration sites to heterochromatin is encouraging.
In addition, the authors reported that transgene expression was not affected by targeting to these transcriptionally unfavorable heterochromatic sites [112]. These exciting experiments have demonstrated that vectors containing Gag and Pol C termini with adapted or unique binding domains could direct insertional distribution [113]. However, LEDGF cannot be modified in HSC gene therapy, and alternative tethering approaches must be devised. Additional tethering proteins have been studied and need to be fully characterized to expand targeted integration locations in in vivo approaches [113]. In future studies, use of appropriate tethers for modified integration site preference may reduce genotoxicity and may provide a better understanding of the virus and host interactions affecting viral integration. Even with vector systems that have incorporated these safety mechanisms, genotoxic events may arise, and methods to ablate the gene-modified cells will be useful to avoid malignancy.

Approaches to ablate expanded cell clones
Several approaches exist to ablate or control expanded clones after insertionally activated oncogenesis has occurred. Conditional selection systems have been employed to control the longevity and survival of HSC gene-modified clones after infusion. Several conditional promoter systems, such as TET on/off and pro-drug inducible expression cassettes, have been used to target cancer cells harboring dangerous integrations through vector silencing and suicide gene activation [114].
The tetracycline (Tet) on/off gene expression system utilizes a pro-drug to regulate transgene expression by modifying a Tet repressor protein (TetR). TetR is constitutively expressed and, depending on its conformation, will either be bound to the tetracycline operator (TetO) or unbound. In a Tet on system, TetR does not bind TetO until administration of the pro-drug, typically doxycycline. Once the pro-drug is administered, TetR actively binds TetO and silences transcription of nearby transgenes. In the absence of the pro-drug, TetR cannot bind TetO, this region of the genome is no longer blocked from transcription, and gene expression resumes. Alternatively, modifications have been made to TetR that allow it to bind to repressor sequences until it is deactivated by a pro-drug; this system is called Tet off. The Tet on/off system may be used in conjunction with suicide genes to ablate undesired clones.
In gene suicide approaches, HSC gene therapy delivers an active transgene in conjunction with a pro-drug induced suicide gene such as thymidine kinase [115]. In the event that transformations result from insertional mutagenesis and clonal expansion, clones harboring proviral integrations can be eliminated or reduced by activating expression of the suicide gene, which induces apoptosis [26]. A recent clinical trial of an inducible caspase 9 (iCasp9), which remains inactive until it is dimerized following treatment with the small molecule AP1903, was reported in four patients with graft versus host disease (GVHD) after transfusion of gene-modified hematopoietic cells [116]. In these patients, a single infusion of AP1903 was reported to have eliminated 90% of gene-modified T cells within 30 minutes of administration. GVHD and other associated illnesses typically observed after allogeneic bone marrow transplants were not detected up to a year after AP1903 treatment [116]. Thymidine kinase and iCasp9 present effective safety switches to control an array of genotoxic effects arising from HSC gene therapy [26,117]. Utilizing these safety mechanisms in new vector designs will further improve safety, reduce genotoxic events, and allow selective ablation of expanded gene-modified clones in vivo.

Concluding summary
The use of HSC gene therapy in clinical trials is expanding, and the therapeutic potential is enormous. Following the initial successes with ADA-SCID and SCID-X1, additional efficacious therapies were reported for WAS, β-thalassemia, and CGD. Seymour et al. reported that the majority of over 90 patients receiving HSC gene therapy exhibit prolonged clinical benefit, with a greater than 90% survival rate despite the occurrence of genotoxic events [46]. Current studies aim to characterize and reduce genotoxicity. Several approaches have reduced genotoxic events in preclinical studies. With ongoing technological refinement, newer and safer HSC gene therapy vectors are entering, or will soon enter, the clinical arena. These advances are crucial for HSC gene therapy to enter mainstream medicine as an effective and safe therapeutic approach.

Figure 1. Human Hematopoiesis. Long-term hematopoietic stem cells (LT-HSCs) are a self-renewing population of stem cells that reconstitute the blood system throughout the entirety of our life span. Short-term HSCs reconstitute our blood system for only limited periods. The short-term HSCs differentiate into multipotent progenitors (MPPs), which have the ability to differentiate into several transit amplifying cell lineages. Common lymphoid progenitors (CLPs) differentiate into lymphoid progenitor cells (Pro-Dendritic Cell, Pro-T, Pro-NK, and Pro-B). Finally, these progenitors give rise to the mature lymphoid class cells of the blood system (T-lymphocytes, B-lymphocytes, and natural killer (NK) cells). Common myeloid progenitors (CMPs) give rise to granulocyte-macrophage progenitors (GMPs) and megakaryocyte-erythroid progenitors (MEPs), which differentiate into myeloid class progenitors (macrophage, granulocyte, megakaryocyte, and erythroid). Finally, these progenitors give rise to the mature myeloid class cells of the blood system.

Figure 2. Mechanisms of Retroviral Mutagenesis. The black boxes represent promoters, and grey squares represent exons. A) The proviral 3' LTR can drive transcription of a nearby cellular gene at an increased rate. B) Proviral LTR enhancers can activate a nearby promoter, increasing transcription of cellular genes. C) Transcription from the 5' LTR in conjunction with proviral cryptic splice sites creates novel isoforms and fusion transcripts of both cellular and viral genes. D) Proviral LTR methylation induces epigenetic changes, silencing proviral genes and nearby cellular genes. E) Proviral integration can disrupt cellular gene expression by causing premature polyadenylation (pA) signaling.

Figure 3. Clonal Expansion. A) Patient-derived CD34-enriched hematopoietic stem cell (HSC) population prior to ex vivo vector exposure. Untransduced cells are blue. B) Polyclonal proviral integration distribution in vector-treated HSCs. This is the ideal proviral distribution, which HSC gene therapy would maintain after infusion into patients. C) However, due to genotoxicity and selection pressures in vivo, oligoclonal expansion can be observed, where some clones expand. D) In some cases, individual clones may harbor a proviral integration near genes promoting a proliferative or survival advantage, which may eventually contribute to malignancy.

Figure 4. PCR-Based LTR:Chromosome Junction Sequencing. A) Demonstration of ligation-mediated PCR, where genomic DNA is cut by restriction enzyme digestion, ligated to a linker, and amplified before sequencing of oligos with an LTR-specific primer. B) Left Panel: Linear amplification-mediated PCR (LAM-PCR) amplifies regions of genomic DNA containing integrated vector proviruses using an LTR-specific primer. The resulting oligos are captured on magnetic beads and double strand synthesis is performed, followed by restriction enzyme digestion and ligation of a double-stranded linker. Nested PCR is then used and the resulting products sequenced. Right Panel: Non-restrictive linear amplification-mediated PCR (nrLAM-PCR) amplifies genomic DNA with integrated vector proviruses with an LTR-specific primer. The resulting products are enriched on magnetic beads, followed by single strand linker ligation. Nested PCR is then employed and the products are sequenced.

Figure 5. Plasmid Shuttle Vector Rescue. Genomic DNA is presented with vector integrations showing the proviral 5' LTR, the genetic element encoding a bacterial origin of replication (Ori), and the vector provirus 3' LTR. Vector-exposed cells are lysed and genomic DNA harboring integrated vector proviruses is collected. The genomic DNA harboring proviral integrants is fragmented by restriction enzyme digestion and then self-ligated to form plasmids, which may contain portions of the provirus encoding a bacterial origin of replication and an antibiotic selection gene. These plasmids are then transformed into E. coli. E. coli transformed with plasmids containing the bacterial origin and antibiotic resistance gene will form colonies. Sequencing the colony plasmids identifies proviral LTR:chromosome junctions.