IS6110, the Double-Edged Passenger

Mycobacterium tuberculosis, as recent investigations demonstrate, has a complex signaling system, which allows close interaction with its environment and underlies one of its most renowned properties: the ability to persist for long periods of time in a non-replicative state. Although this ability is well characterized in other bacteria, the intrinsically very slow growth rate of Mycobacterium tuberculosis, together with its very thick and complex cell wall, makes this pathogen especially well adapted to the stress that the host can generate against it. In this book, different aspects of these properties are presented by specialists in the field.


Insertion sequences in Mycobacterium tuberculosis complex
An insertion sequence (IS) is a short mobile DNA element that codes for the proteins involved in its transposition, which allows it to spread within the genome. ISs are widely distributed in prokaryotes and can be grouped into different families, established by Mahillon & Chandler (1998) on the basis of structural characteristics and transposase similarities.
More than 46 ISs from different species have been identified in the genus Mycobacterium, mostly on the basis of sequence similarities. In the genomes of the members of the Mycobacterium tuberculosis complex (MTBC), dispersed IS elements have been found that, according to their characteristics, can be assigned to several of the following families: IS3, IS5, IS21, IS30, IS110, IS256, IS1535, ISL3 and other IS-like elements (Table 1).
ISs can induce duplications, deletions and rearrangements in the bacterial genome, all of which are essential changes for the genome plasticity of the members of the MTBC (Mahillon & Chandler, 1998). Not all of the ISs described in M. tuberculosis are active and able to transpose from one site to another in the genome; some of the elements are defective copies. Furthermore, some of them have a limited host range.
Table 1 shows the ISs described in M. tuberculosis, which are briefly presented below.

IS families
The IS3 family represents an extensive set of insertion elements in bacteria. The members of this family are characterized by a length of between 1200 and 1600 bp, inverted repeats (IRs) of between 20 and 40 bp, and the presence of two overlapping open reading frames (ORFs: orfA and orfB) (Mahillon & Chandler, 1998; McAdam et al., 2000). After insertion, a duplication of 3 or 4 bp occurs at the insertion point (Mendiola et al., 1992).
In the MTBC, IS1540, IS1604, IS1556/990 and IS6110 belong to this family. Its most representative member is IS6110, one of the most abundant and best characterized insertion elements in the MTBC. Copies of this IS are found at 16 positions in the genome of M. tuberculosis H37Rv, which makes it an important epidemiological tool (Small & van Embden, 1994).
Other elements of this family, namely IS1540, IS1604 and IS1556/990, lack the IRs and direct repeats (DRs) or contain mutations in orfB, which presumably makes them inactive and non-functional (Dziadek et al., 1998; McAdam et al., 2000).
Members of the IS21 family are among the largest bacterial IS elements, with lengths of between 2 and 2.5 kb. Their IRs are variable. These elements encode two proteins for transposition (IstA and IstB). A duplication of 4 or 5 bp occurs at the insertion point after transposition. The transposases encoded by IS1532, IS1533 and IS1534 show homology to those of the IS21 family. These elements possess terminal IRs of 48, 54 and 49 bp, respectively, and internal DRs (Mahillon & Chandler, 1998). All of them are absent from 40% of M. tuberculosis clinical isolates, as well as from M. bovis and M. bovis BCG Pasteur.
The IS110 family exhibits unusual features for bacterial ISs: its members have neither IRs nor DRs (McAdam et al., 2000). They can be divided into two groups.
The first group, in M. tuberculosis, includes IS1608' and IS1547. Unlike other elements, this group has a target sequence: CATN(6-9)(T,C)CCTT. IS1547 was detected only in members of the MTBC and seems to be a preferential site for IS6110 insertion (Fang et al., 1999a). The second group includes IS1558 and IS1607, which have imperfect IRs or lack them altogether (McAdam et al., 2000). Some copies of these IS elements in the M. tuberculosis genome are defective, as is the case for IS1558' and IS1608'.
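As an illustration only, a consensus of this kind can be searched for with a regular expression. The sketch below assumes one particular reading of the target sequence (CAT, then 6 to 9 bases of any kind, then T or C, then CCTT); the function name and the test sequence are hypothetical and are not taken from the cited works.

```python
import re

# Assumed reading of the consensus "CATN(6-9)(T,C)CCTT":
# CAT, then 6 to 9 arbitrary bases, then T or C, then CCTT.
# The lookahead lets overlapping candidate sites be reported as well.
TARGET_PATTERN = re.compile(r"(?=(CAT[ACGT]{6,9}[TC]CCTT))")


def find_target_sites(sequence):
    """Return (0-based start, matched text) for every candidate target site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in TARGET_PATTERN.finditer(sequence)]


# Hypothetical sequence built only to exercise the pattern.
example = "GG" + "CAT" + "A" * 6 + "T" + "CCTT" + "GG"
sites = find_target_sites(example)
print(sites)
```

A search of this kind only flags sequence matches; whether a matching site is actually used for insertion is a biological question that the pattern alone cannot answer.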
The IS256 family is probably the largest family of ISs in mycobacteria, with more than 25% of the known ISs. Its members have been divided into two groups according to their structural organization (Guilhot et al., 1999). One of the groups comprises the members found in the MTBC, such as IS1081, IS1552', IS1553 and IS1554 (Cole et al., 1998; Mariani et al., 1993). One of the two copies of IS1560 appears to be defective and is probably non-functional in M. tuberculosis H37Rv.
The members of the IS30 family have a single open reading frame, IRs of 20-30 bp, and DRs of 2-3 bp created after insertion (Mahillon & Chandler, 1998). IS603, an insertion sequence 1327 bp in length and present as a single copy in the M. tuberculosis genome, belongs to this family (Table 1); it has IRs 63 bp long, and no DRs have been detected (McAdam et al., 2000).
Some defective copies of ISs present in M. tuberculosis belong to the ISL3 family: IS1555', IS1561' and IS1606', as well as IS1557. The IS1561' element is absent from some clinical strains of M. tuberculosis and from M. microti (Gordon & Supply, 2005).

Structural organization and function of the IS6110
IS6110 was initially named IS986. It is a genomic insertion element 1361 bp long, with 28 bp imperfect IRs, that generates duplications of 3 or 4 bp next to the insertion site. It has two overlapping ORFs (orfA and orfB) coding for a transposase and shows similarities to elements of the IS3 family of prokaryotes (Accession No.: X17348, M29899; Fig. 1).
IS6110 was found to be specific to mycobacteria belonging to the MTBC (Thierry et al., 1990a). Because of the high degree of polymorphism observed among strains of the MTBC, it became the main target of the first reference genotyping tool (see part 3.2; Otal et al., 1991), and it is an important factor in the evolution of the M. tuberculosis genome. The sequences of IS6110 and IS986/IS987 identified in the MTBC were practically identical and are considered to be the same IS (Thierry et al., 1990b; McAdam et al., 1990).
Recently, Sankar and cols (2011b)   , only a few number of strains have no copies of this IS (see part 3.1.2).
Many functions have been attributed to IS6110: (i) activation of genes during infection (Safi et al., 2004); (ii) participation in evolution and use as an epidemiological marker (van Embden et al., 1993); and (iii) activation of downstream genes through an orientation-dependent promoter activity (Soto et al., 2004). Finally, it has been suggested that the presence of IS6110 in M. bovis could contribute to the adaptation of this bacterium to a particular host, animal or human (Otal et al., 2008). Several of these features are reviewed herein.

The heads: Usefulness of IS6110
Soon after the discovery of IS6110 as an element specific to the MTBC, its usefulness as a diagnostic tool was explored. Subsequently, at the beginning of the nineties, it was demonstrated that two strains isolated from different episodes in the same patient had the same IS6110-RFLP pattern, whereas a high degree of polymorphism was observed between strains isolated from different patients (Otal et al., 1991). The fact that IS6110 varies in copy number and location in bacterial genomes, along with its stability over time, showed its usefulness for genotyping the MTBC. This IS has been used successfully throughout the world to identify and characterize members of this complex.

IS6110 in the detection of members of the MTB complex
TB is a major public health problem that affects many countries and large numbers of people. Many reasons explain the global relevance of this disease, including poverty, limited vaccine efficacy and the persistence of the pathogen itself. One crucial factor is the difficulty of diagnosing TB. Currently, the main impediment is the lack of adequately sensitive, specific and rapid tests. Culture and smear microscopy are probably the most common tools used worldwide to confirm TB in clinical samples. However, culture is time-consuming and smear microscopy is not specific enough. This has led to their gradual replacement in the developed world by more sensitive, specific and rapid methods, such as PCR.
Because of the increased accessibility and convenience of PCR-based detection techniques, they are suitable replacements for conventional culture methods. Since bacterial growth is not required, PCR can give results in as little as 1 day. Further PCR modifications, such as nested PCR or multiplex PCR, can be used to improve results. Over the years, PCR technologies have improved significantly with the development of real-time PCR for the detection of M. tuberculosis target genes in clinical specimens. The main advantages of real-time PCR are a shortened turnaround time, automation of amplification and product detection, and a decreased risk of cross-contamination (Espy et al., 2006).

Advantages of IS6110 as target of the MTBC
To obtain species-specific pathogen identification and detection in clinical samples, specific primers have been designed and tested using PCR-based methods, targeting different genomic sequences of M. tuberculosis. These have included IS6110, hsp65, TRC4 and mpt40 (Bannalikar et al., 2006; Narayanan et al., 2001; Savekoul et al., 2006; Tumwasorn et al., 1996; Wei et al., 1999). Among these, the most widely investigated is IS6110, which has been reported as a sequence specific to the MTBC (Brisson-Noel et al., 1991; Eisenach, 1994; Sankar et al., 2011a). IS6110 is an ideal target for PCR because it is usually a multi-copy element distributed randomly throughout the genome, and the presence of multiple copies improves the sensitivity of PCR amplification (Mathema et al., 2006; Sankar et al., 2011a).
Different oligonucleotides derived from this sequence have been used successfully to detect M. tuberculosis in all types of clinical specimens. Table 2 lists the primers most frequently used in the literature. One problem is that different authors give different names to the same primers. The primers IS1 and IS2 are the most frequently used; these oligonucleotides amplify a 123 bp product spanning nucleotide positions 759 to 881 of IS6110 (Table 2).
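The product size quoted for IS1 and IS2 follows directly from the two coordinates, provided both positions are counted as part of the product. A minimal sketch of that arithmetic, assuming 1-based inclusive coordinates (the function name is hypothetical):

```python
def amplicon_length(first_position, last_position):
    """Length in bp of a product spanning two 1-based, inclusive positions."""
    if last_position < first_position:
        raise ValueError("last_position must not precede first_position")
    # Both end positions belong to the product, hence the +1.
    return last_position - first_position + 1


# Positions 759 to 881 of IS6110, as given in the text.
length = amplicon_length(759, 881)
print(length)  # 123
```

The same check is a quick way to confirm that primer coordinates reported by different authors under different primer names refer to the same product.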
A search of the PubMed database from 1991 onwards, using "IS6110" and "diagnostic" as keywords, identified 138 papers showing that IS6110 could be a useful tool in the diagnosis of TB. In 105 of these papers the diagnosis is based on PCR. As many as 5 of the 11 works published during the first seven months of 2011 applied real-time PCR to tuberculosis diagnosis using IS6110 as the target sequence.
In most cases the authors applied in-house PCR methods and compared the results with those of other methods. Some authors concluded that IS6110-based PCR could be used routinely in clinical laboratories for the rapid detection of M. tuberculosis in sputum samples, allowing early diagnosis and treatment (Ereqat et al., 2011). However, evaluation of in-house PCR showed high variability in sensitivity and specificity (Cho et al., 2007).
The usefulness of IS6110 in the detection and identification of MTBC in clinical samples has been demonstrated in many studies, either detecting IS6110 as a single target (Sankar et al., 2010a; Gupta et al., 2010; Inoue et al., 2011) or together with other specific targets (Sankar et al., 2010b; Leung et al., 2011). Multiplex PCR assays can also be used for the simultaneous detection of co-infections in clinical samples (Boondireke et al., 2010).
Additionally, in some cases, the location of IS6110 specific to one strain can be used. PCR with primers targeting IS6110 and the flanking region allowed identify and differentiate that

Disadvantages of IS6110 as target of the MTBC
Despite its wide applicability, targeting IS6110 may not by itself be sensitive enough to diagnose 100% of cases. Studies in India documented that 41% of M. tuberculosis isolates harboured a single copy of IS6110 and 1% had no copy (Narayanan et al., 2002). In these situations, the use of other PCR targets in addition to IS6110 can help in the detection of TB (Narayanan et al., 2001; Das et al., 1995; Chauhan et al., 2007; Kusum et al., 2011). A good approach could be a multiplex real-time PCR targeting IS6110 together with another target, for example the hsp65, protein B or MPB64 genes.
As with any DNA detection method, another problem with IS6110-PCR is that it can detect non-viable mycobacteria in patients whose earlier culture-positive specimens had become culture-negative following anti-tuberculosis drug therapy (Causse et al., 2011).
M. bovis strains usually contain one to five copies of IS6110 (Otal et al., 2008), which makes this IS less advantageous for the detection of this bacterium. Nevertheless, immunomagnetic separation capture followed by IS6110-based PCR showed a detection threshold ranging from 10 CFU in PBS to 1000 CFU for M. bovis in infected fresh bovine tissues, providing a sensitive, rapid and specific technique for the diagnosis of bovine tuberculosis (Garbaccio et al., 2010).
On the other hand, Sankar et al. analysed the sequence diversity of IS6110 using an in silico approach. They found that IS6110 copies harbour sequence variations and that the copies within a single strain diverge from one another. They compiled a list of primers that had been used successfully in conventional PCR for the diagnosis of TB, but the reported data showed variation in sensitivity and specificity for different regions of IS6110. All these data suggest that care must be taken when designing specific primers for IS6110 detection. The authors recommended developing multiplex PCR assays targeting more than one region of the M. tuberculosis genome (Sankar et al., 2011a).
Indeed, IS6110 is still a favourite target sequence in the diagnosis of TB. Recently, high sensitivity and specificity have been reported for the GeneXpert system, a real-time PCR assay that simultaneously detects both MTBC and rifampin resistance. However, the accuracy of the Xpert MTB/RIF test for the detection of M. tuberculosis complex in paucibacillary samples was found to be lower than that of an in-house IS6110 real-time PCR routinely used since 2004 (Armand et al., 2011).

Typing of members of the MTBC
DNA fingerprinting of M. tuberculosis, based on the variability in both the number and the genomic position of IS6110 copies, was standardised in 1993 so that the fingerprints obtained by different laboratories could be compared (van Embden et al., 1993). This standardization has facilitated investigations into the international transmission of tuberculosis and has made it possible to identify specific strains with unique properties such as high infectivity, virulence or drug resistance. Although other techniques based on this insertion sequence and on other repetitive elements have been described, IS6110-RFLP demonstrated the best discriminatory power and reproducibility and was accepted as the gold standard method for M. tuberculosis genotyping (Kremer et al., 1999). It remains the best-validated genotyping method; however, the requirement for culture growth and the poor discrimination among strains with a low copy number of IS6110 (LCS) have prompted the search for a better PCR-based method that is discriminative enough to be used in epidemiology.
The application of IS6110 as a molecular tool has provided a different global view of TB. IS6110-RFLP has proved to be of great value for, among other things, distinguishing recent transmission from reactivation, identifying reinfection and mixed infections, studying outbreaks, and confirming or ruling out laboratory errors. It has also been useful for identifying strains that may differ in transmission, suggesting that more virulent strains could show different pathogenesis and epidemiological characteristics. The establishment of databases of RFLP patterns has made it possible to analyse the risk factors for tuberculosis and to detect the prevalent and/or most transmitted strains in the populations studied.

Recent transmission & population studies
The relatively higher rate of IS transposition in genomes, compared with that of mutations in structural genes and other loci, has elicited strong interest in the use of ISs as genetic markers to study bacterial population genetics and phylogeny, especially for species with conserved genomes, as is the case of IS6110 for M. tuberculosis (Fang et al., 2001). The utility of IS6110 in epidemiology was demonstrated at the beginning of the nineties (Otal et al., 1991). On the basis of IS6110-RFLP, recent transmission of TB has been associated with patients whose isolates present the same RFLP pattern, that is, those included in a "cluster". The use of IS6110-RFLP analysis in population studies has considerably advanced our knowledge of the epidemiology of M. tuberculosis. Above all, large population studies have led to a better understanding of how transmission occurs in the population. One study carried out in The Netherlands concluded that a short time span between the first two patients in a cluster was the strongest predictor of large cluster episodes (Kik et al., 2008). Similarly, two population studies carried out in Zaragoza, Spain, over three years each, detected a change in the transmission patterns of TB (López-Calleja et al., 2007). One susceptible strain, designated "MTZ", caused an outbreak involving more than one hundred inhabitants (18% of the TB cases). Studies of this kind have made possible the detection and characterization of specific M. tuberculosis epidemic strains (Lopez-Calleja et al., 2009).
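The notion of a cluster can be sketched as a simple grouping of isolates by identical pattern. The example below is only an illustration: it assumes that each pattern has already been reduced to a list of band sizes in bp and that identity means an exact match, whereas real RFLP comparison software applies a band-position tolerance. The isolate names and band sizes are hypothetical.

```python
from collections import defaultdict


def find_clusters(patterns_by_isolate):
    """Group isolates that share an identical band pattern.

    Returns the groups containing two or more isolates (the "clusters").
    """
    groups = defaultdict(list)
    for isolate, bands in patterns_by_isolate.items():
        # Sort the bands so that their order of entry does not matter.
        groups[tuple(sorted(bands))].append(isolate)
    return [sorted(members) for members in groups.values() if len(members) >= 2]


# Hypothetical isolates; band sizes in bp are made up for the example.
isolates = {
    "P1": [1200, 2500, 4000],
    "P2": [1200, 2500, 4000],
    "P3": [900, 3100],
    "P4": [4000, 2500, 1200],
}
clusters = find_clusters(isolates)
clustered_fraction = sum(len(c) for c in clusters) / len(isolates)
print(clusters, clustered_fraction)
```

The fraction of isolates that fall into clusters is the kind of quantity that population studies then interpret as an indicator of recent transmission.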
Recent studies indicate that multidrug-resistant M. tuberculosis has emerged in many countries over the past few years, without the concomitant development of health systems able to provide adequate treatment. MDR and XDR strains can be transmitted within the population (Bifani et al., 1996; Samper et al., 1997; Samper et al., 2005). The IS6110-RFLP pattern of a strain does not usually change after it acquires resistance; nevertheless, the complementary characterization of the genes conferring resistance helps in contact tracing (Gavin et al., 2009). More recently, drug resistance and the molecular epidemiology of TB in the Murmansk region were investigated in a 2-year population-based surveillance of the civilian population. The study showed that MDR-TB strains were actively transmitted in northern Russia (Mäkinen et al., 2011). In Ukraine, where the number of TB cases continues to increase, the number of drug-resistant isolates was reported to be growing steadily, and transmission of drug-resistant isolates seems to contribute to the spread of resistant TB (Dymova et al., 2011). MDR-TB genotyping databases allow the comparison of M. tuberculosis strains to improve the application of appropriate public health actions at a national level, and ideally they should be extended across country borders (Bifani et al., 2001; Gavin et al., 2011; Ritacco et al., 2011).
Population studies have been essential not only for gaining a better understanding of how to implement effective TB control measures but also for analysing the importance of immigration. In Germany, a study of the dynamics of TB transmission between the high-prevalence immigrant population and the low-prevalence local population confirmed that there is no significant TB transmission from the high- to the low-prevalence population. This is probably due to the good performance of TB screening programmes, to a low degree of mixing between the two populations, or to a combination of both (Barniol et al., 2009). Another study evaluated the origins of resistant isolates in Finland, a country with a low incidence of TB. The authors raised concerns about the risk of disease in near-frontier contacts and concluded that MDR cases in Finland were very probably mostly acquired abroad (Vasankari et al., 2011).
Several studies illustrate the situation in areas with the highest TB incidence, such as two areas of India (Shanmugam et al., 2011; Purwar et al., 2011) or Uganda (Asiimwe et al., 2009). Another study, which gives an overview of the distribution of M. tuberculosis genotypes in Korea, found that drug resistance phenotypes were more strongly associated with the Beijing family (see part 4.1.3). Beijing genotype strains are also a major cause of TB (75% of MDR-TB) in the Aral Sea region; they are strongly associated with drug resistance, independently of previous TB treatment, and may be contributing strongly to the transmission of MDR-TB (Cox et al., 2005).
In a population-based study carried out in rural China, analysis of the Beijing family showed that a specific IS6110-RFLP pattern and the MIRU genotype 223325173533 were associated with MDR and with increased transmissibility (Hu et al., 2011).

Recurrent tuberculosis: Relapse or reinfection?
The frequency and determinants of exogenous reinfection and of endogenous reactivation of TB in previously treated patients are poorly understood. The importance of reinfection as a cause of TB recurrence is unclear and has potential public health implications. Different studies have used IS6110 genotyping to address this question. Genotyping the isolates from the initial and recurrent disease episodes makes it possible to differentiate an episode of reinfection from a relapse of TB.
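The comparison behind this distinction can be sketched as follows. This is a simplified illustration, not the procedure of any of the cited studies: patterns are assumed to be lists of band sizes in bp, and the optional `max_band_changes` parameter is a hypothetical allowance for the small pattern changes that can arise within one strain over time.

```python
def classify_recurrence(initial_bands, recurrent_bands, max_band_changes=0):
    """Label a recurrent episode from the two isolates' band patterns.

    Patterns that differ by no more than `max_band_changes` bands are
    treated as the same strain (relapse); otherwise as a new strain
    (reinfection).
    """
    # Bands present in one pattern but not in the other.
    differing = set(initial_bands) ^ set(recurrent_bands)
    return "relapse" if len(differing) <= max_band_changes else "reinfection"


# Hypothetical band sizes in bp.
print(classify_recurrence([1200, 2500, 4000], [4000, 2500, 1200]))
print(classify_recurrence([1200, 2500, 4000], [900, 3100]))
```

A rule of this kind cannot by itself recognise a mixed infection, in which more than one strain is present in the same episode.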
In this respect, differences are seen depending on the incidence of TB and the HIV status of the patients. In Spain, a country with a low incidence of TB, two studies on this issue were conducted. On the island of Gran Canaria, 2.4% of the cases had recurrent TB over a 5-year period. Up to 44% of them corresponded to exogenous reinfection, as shown by IS6110 genotypes (Caminero et al., 2001). In a second study, conducted in Madrid over twelve years, up to 3.1% of the patients had a second episode of TB. Only one recurrent case showed different genotypes, suggesting exogenous reinfection. Reinfection is possible among people in low-risk areas, but the rates are lower than those in high-risk areas (Cacho et al., 2007). On the other hand, in high-incidence countries such as India, most recurrences after successful treatment of TB are due to exogenous reinfection in HIV-infected persons, in contrast to endogenous reactivation in HIV-uninfected persons. Strategies for the prevention and treatment of TB infection must take these findings into consideration (Narayanan et al., 2010). Similarly, one study carried out in Karonga, Malawi, concluded that HIV increases the rate of recurrent TB by increasing the rate of reinfection disease (Crampin et al., 2010). Other authors reviewed different studies on recurrence and argued that, apart from extreme situations, the problem of recurrence due to reinfection has few implications for TB-control programmes (Lambert et al., 2003).

Limits of IS6110 as epidemiological tool
A common dilemma with the different markers used for typing tuberculosis, including IS6110, is how to interpret the variability of the patterns. If M. tuberculosis isolates from two different patients present the same genotype, transmission may have occurred between them. However, once transmission has occurred, the genotypes may change, resulting in divergent fingerprints. An advantage of IS6110 as a marker is that the rate of change of IS6110 patterns has been determined in serial isolates; the half-life was extrapolated to be 3.2 years. Changes were predicted to be more common in persons with extrapulmonary disease and in those who had both pulmonary and extrapulmonary isolates. This finding supported the use of IS6110 typing in epidemiologic studies of recent transmission of TB (de Boer et al., 1999). A study carried out to estimate recent transmission from IS6110-RFLP data suggested that the interpretation of the recent transmission index, and of the resulting public health interventions, will vary according to how researchers account for spontaneous mutation when estimating transmission from genotyping data (Benedetti et al., 2010).
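To give a feel for what a half-life of 3.2 years implies, the sketch below computes the expected fraction of patterns still unchanged after a given time. It assumes simple exponential decay at a constant rate, which is a modelling assumption made here for illustration and not a claim about how the cited estimate was derived; the function name is hypothetical.

```python
def fraction_unchanged(years, half_life_years=3.2):
    """Expected fraction of patterns unchanged after `years`.

    Assumes exponential decay with a constant rate, so that half of the
    patterns have changed after one half-life.
    """
    if years < 0 or half_life_years <= 0:
        raise ValueError("years must be >= 0 and half_life_years > 0")
    return 0.5 ** (years / half_life_years)


for t in (1, 3.2, 6.4):
    print(t, round(fraction_unchanged(t), 3))
```

Under this assumption, most patterns remain identical over the one to two years typically considered for recent transmission, which is consistent with the use of identical patterns as evidence of a recent link.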
In spite of all the studies carried out with this genomic element, some limitations have been found. Besides the technical difficulties that IS6110 typing presents for some laboratories (the long time that mycobacteria require to grow, and the equipment and software required for the analysis), this method has difficulty differentiating LCS, including M. bovis strains, and is unable to identify strains with zero copies. Some studies have solved this problem by applying a second technique in these cases (Thong-On et al., 2010). Other studies, in settings with a high prevalence of LCS, do not recommend this technique (Asgharzadeh et al., 2011). Mixed infections represent another limitation: they could be underestimated using IS6110-RFLP and could be confused with exogenous reinfection (Shamputa et al., 2006). A mixed tuberculosis infection suspected from the IS6110-RFLP result can be clearly identified by MIRU-VNTR typing, which is more sensitive for the detection of multiple M. tuberculosis strains (Allix et al., 2004).

The tails: Risks of IS6110
Understanding the changes that occur in the genomes of M. tuberculosis isolates would give insights into the differences among them in causing disease.
Many mechanisms can give rise to changes in bacterial genomes; those mediated by ISs are among the most relevant and best studied (Galas & Chandler, 1989). According to general data, between 5 and 15% of spontaneous mutations in bacterial genomes were considered to be due to changes in IS locations.
The most common mechanism used by ISs to move along genomes is transposition, driven by the enzymatic activity of their encoded transposases. Transposition can generate 3-4 bp direct repeats (DRs) immediately flanking the IS sequence, as occurs with IS6110 (Thierry et al., 1990b). Recombination is another mechanism that changes the location of ISs along genomes. All these mechanisms lead to IS-mediated gene rearrangements, inversions, deletions, etc. in bacterial genomes.
ISs can also have polar effects on flanking genes, particularly on downstream genes. Gene activation has been demonstrated to result from outward-directed promoters within the elements as well as from the formation of new promoters upon insertion (Galas & Chandler, 1989).
All these changes could put the integrity of the bacterial genome at risk, so that carrying a mobile IS may be either a potential enemy with a deadly influence on bacterial fitness or a helpful ally that improves that fitness. Our current knowledge of how IS6110-mediated mutations influence the plasticity of the M. tuberculosis genome is reviewed below.

Moving along the genome
The numerous studies published on IS6110-RFLP for epidemiological purposes showed a high level of variability in the locations of this IS along the M. tuberculosis genome (see part 3.2). On the basis of those results, the rate of transposition of IS6110 was estimated to be about 18% over a period of 5-6 years. However, it seems evident that transposition events are related to changes in the environment of the bacteria. It has been suggested that transpositional events occur in mutational bursts rather than at a constant mutation rate; this can explain the observation that changes in RFLP patterns occur more frequently during transmission and before diagnosis (soon after the bacilli enter the host), or after relapses or any other major event during the course of the infection (Schürch et al., 2010). In agreement with this, two rather different half-lives were calculated for the stability of IS6110-RFLP patterns in serial patient isolates: 0.6 and 10.7 years; this is most probably due to differences in patient management or in the course of the infection in the settings compared (Schürch et al., 2010).
Independently of why, how or when its transpositions occurred, IS6110 mediates genome plasticity of members of the MTBC, and that plasticity is ongoing both under controlled environment in vitro and during infection in vivo (Fang et al., 1999b).
In support of this assertion, some papers have described changes in the RFLP pattern during infection. This shows that IS-mediated microevolution of the bacilli can occur not only during transmission between patients but also during the course of the disease in a single patient (Al-Hajoj et al., 2010). Moreover, the comparison of the whole genomes of six different H37Rv strains, collected from several laboratories, showed that multiple IS6110 transposition events have occurred in the genome even under "controlled" in vitro environments (Ioerger et al., 2010).

How to identify the IS6110 insertion sites
Since the late nineties, several methods have been applied to identify and sequence the loci in which the IS is integrated in the genome. The methods applied to identify and sequence the flanking regions include cloning of agarose-excised hybridizing bands (Beggs et al., 2000), reverse dot blot assay (Steinlein & Crawford, 2001), whole-genome microarrays (Kivi et al., 2002), ligation-mediated PCR (Otal et al., 2008) and construction of BAC libraries (Alonso et al., 2011), among others. All these procedures are usually cumbersome and have difficulty detecting all the insertions present, particularly in strains carrying a high IS6110 copy number.
Whole-genome sequences of tens of MTBC strains are currently finished or at various stages of completion; however, that number cannot compete with the thousands of IS6110-RFLP patterns already registered in the available databases.
New technologies are currently under development that aim to determine the IS6110 insertion sites of a large number of M. tuberculosis isolates using high-throughput methodologies, such as Massive Insertion Site sequencing (IS-seq) (Sandoval et al., ESM-2010). Procedures of this kind will surely help to unravel the IS flanking sequences in a more feasible manner.

Where IS6110 can be inserted
The identification of the insertion sites, and of their relationship with the phenotype of the corresponding strain, could provide insights into the biological meaning of the targeted genes. Identification of those sites showed that the element can interrupt coding regions as well as be located in non-coding sequences (Fang et al., 1999a). The interruption of a coding region can be seen as a sort of natural knock-out mutation of the target gene. Insertion of the element in a non-coding region, on the other hand, would have secondary consequences, such as increasing or decreasing the expression of the neighbouring genes (McEvoy et al., 2007).
The high variation detected in the RFLP patterns of multiple M. tuberculosis isolates suggested an apparent lack of preferential location of the IS in the bacterial genome. However, one of the first conclusions to become evident was that insertion is not fully random.
Hermans and co-workers described the first hot-spot integration region in the genome for this element, known as the Direct Repeat (DR) locus (Hermans et al., 1991). With minor exceptions, all members of the MTBC carry a copy of IS6110 integrated in that locus, and this characteristic has been exploited for the development of a widely applied typing procedure called spoligotyping (see part 3.2). Later on, another hot-spot site of integration was described, namely the insertional preferential locus (ipl) (Fang et al., 1997). It was shown to correspond to ORF Rv0797 of the virulent reference strain, which encodes another insertion sequence, IS1547 (see part 1.2). These preferential integration sites are characterized by the occurrence of insertions at more than one site close to each other (Sampson et al., 2001). The list of preferential insertion sites identified so far for this element has risen to about half a dozen and will most probably grow (McEvoy et al., 2009).
Apart from the identification of preferential loci for IS6110 insertion, the insertions were found not to be evenly distributed along the genome. Once the complete genome sequence of the virulent reference strain H37Rv became available, IS6110 was found to be inserted more often in some genome regions, whereas other regions lacked this IS (Cole et al., 1998). In strain H37Rv, nearly the first 800 kbp from the origin of replication carry no copies of IS6110, whereas the element is located more or less randomly along the rest of the genome. The conclusion was that this IS-free part of the genome could be richer in essential genes. The same result was seen when other strains were studied.
Comparison of the IS6110-RFLP pattern with the corresponding list of insertion loci showed that RFLP has a limited level of discriminative power. Thus, finding more insertion loci than RFLP bands is not a rare event (Beggs et al., 2000; Warren et al., 2000; Alonso et al., 2011). This result is more evident in strains carrying a high copy number of the IS.
It was considered that the influence of the insertions on the content of active and inactive genes could give insights into the number of genes required for infection, and thus be a source of information for detecting which genes are essential for virulence (McEvoy et al., 2007).
Some studies were devoted to comparing the insertion loci of virulent strains with those of avirulent strains. The attenuated vaccine strain M. bovis BCG differs markedly in IS6110 content from the virulent strain M. tuberculosis H37Rv: one and 16 copies, respectively. However, the IS6110 copy number per genome does not appear to be related to the attenuation of the bacilli (see part 5). In fact, the avirulent strain H37Ra has one additional copy compared to its parental strain, the virulent H37Rv. Comparison of the H37Rv and H37Ra genomes showed two main differences between them mediated by the insertion of IS6110. However, these changes have no clear role in the attenuation of the avirulent strain (Brosch et al., 1999).
Comparison of several BCG strains showed differences among them in relation to IS6110. The "ancestral" BCG (for example, BCG Tokyo) carries two copies of the IS, one located in the DR region and one upstream of the two-component system phoP-phoR (see part 4.2). The latter copy was lost in the "modern" BCG (for example, BCG Pasteur), which has a single copy inserted in the preferential locus already mentioned, namely the DR region (Brosch et al., 2007).
Identification of essential genes could also be possible through the detection of genes that never carry inserted ISs, on the assumption that such mutations would be deleterious for the bacteria. An in silico study, based on previous experimental data, estimated that 35% of the genes in the M. tuberculosis genome are essential (Lamichhane et al., 2003). The data on insertion loci identify transposition/recombination events in both coding and non-coding regions and, generally speaking, a higher number of insertion loci has been detected inside coding regions. However, non-coding sequences represent only 10% of the genome available to host the IS. Therefore, relative to the sequence available, the proportion of insertions inside non-coding regions is actually higher than the proportion of insertions inside coding regions (Table 3). This could represent a sort of "ORF-preserving" behaviour of the genome variability mediated by IS6110 transposition. It is consistent with the suggestion of a greater selection against intragenic insertion in M. tuberculosis during infection in vivo than during growth in vitro (Yesilkaya et al., 2005).
In a study of 161 clinical isolates of M. tuberculosis, the insertion sites of IS6110 were determined (Yesilkaya et al., 2005). Only 100 ORFs were affected by insertion, which the authors considered to represent a low overall number of non-essential genes. In conclusion, most of the genes in M. tuberculosis might play an important role in infection and transmission.
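The normalization behind this comparison can be sketched in a few lines of Python. The insertion counts used below are hypothetical, chosen only for illustration (they are not the values of Table 3), and the function name is an assumption of this sketch; the 90%/10% split between coding and non-coding sequence is the approximation used in the text.

```python
def relative_insertion_density(n_coding, n_noncoding, coding_fraction=0.90):
    """Return (coding, non-coding) insertion densities relative to genome share.

    A value of 1.0 means that insertions fall in that compartment exactly in
    proportion to the amount of sequence it offers; values above 1.0 indicate
    enrichment. coding_fraction=0.90 is the approximation used in the text.
    """
    if not 0.0 < coding_fraction < 1.0:
        raise ValueError("coding_fraction must be strictly between 0 and 1")
    total = n_coding + n_noncoding
    if total == 0:
        raise ValueError("at least one insertion is required")
    noncoding_fraction = 1.0 - coding_fraction
    coding_density = (n_coding / total) / coding_fraction
    noncoding_density = (n_noncoding / total) / noncoding_fraction
    return coding_density, noncoding_density


# Hypothetical example: 70 insertions in coding and 30 in non-coding regions.
coding, noncoding = relative_insertion_density(70, 30)
print(f"coding: {coding:.2f}, non-coding: {noncoding:.2f}")
```

With these illustrative numbers, coding regions hold more insertions in absolute terms, yet the non-coding compartment is hit several times more often than its 10% share of the genome would predict.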
From the data obtained thus far, a high proportion of the coding-region genes targeted by IS6110 correspond to the functional category containing the PE-PPE group of genes (see references in Table 3).
Table 3. Number of IS6110 insertion sites recorded from the literature. Percentages are approximate, considering that 90% and 10% of the genome correspond to coding and non-coding sequences, respectively. (a) With the exception of the direct repeat locus, all low copy number strains analyzed in this study have IS6110 inserted exclusively inside coding regions.
As previously mentioned, the hallmark that identifies the transposition of IS6110 is the presence of 3-4 bp direct repeats immediately flanking the IS sequence. The current availability of annotated whole genome sequences of members of the MTBC makes it possible to determine, for each IS copy, whether the insertion was due to a transposition or a recombination mechanism. According to the data derived from 81 insertions in 10 of those members, transposition is the mechanism most frequently used by this IS to insert into the MTBC genome, regardless of the number of copies carried by the genome or of the target sequence (insertion into coding or non-coding regions) (Table 4). In all cases, the insertion in the direct repeat locus was the consequence of a transposition event. For each genome, Table 4 indicates the number of IS6110 copies and how many of them carry or lack the 3-4 bp direct repeat sequence.
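The criterion described above can be illustrated with a minimal Python sketch. It assumes that the flanking sequences have already been extracted from an annotated genome; the function names and the example sequences are hypothetical, and the labels returned are only a first-pass reading (a 3 bp match can also arise by chance, so a real analysis would need further checks).

```python
def has_direct_repeat(left_flank, right_flank, lengths=(3, 4)):
    """Check whether an IS copy is flanked by a 3-4 bp direct repeat.

    left_flank:  genome sequence ending immediately before the IS.
    right_flank: genome sequence starting immediately after the IS.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    for n in lengths:
        # A direct repeat means the last n bases before the IS are
        # identical to the first n bases after it.
        if len(left) >= n and len(right) >= n and left[-n:] == right[:n]:
            return True
    return False


def classify_insertion(left_flank, right_flank):
    """Label an IS copy by the presence or absence of flanking direct repeats."""
    if has_direct_repeat(left_flank, right_flank):
        return "transposition"
    return "no direct repeat (possible recombination)"


# Hypothetical flanking sequences, for illustration only.
print(classify_insertion("GGCATGA", "TGACCGT"))
print(classify_insertion("GGCATGA", "CCGTTAA"))
```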

IS6110 in the genome of the Beijing family
Efforts have been directed at the study of clinical isolates that are particularly relevant from a microbiological, clinical or epidemiological point of view. This is the case of the members of the Beijing family.
Since its first description in 1995, the M. tuberculosis Beijing strain has become a major health problem worldwide. It was responsible for one of the most important outbreaks caused by a multidrug-resistant strain in the USA during the early nineties (Moss et al., 1997). M. tuberculosis Beijing designates a family that includes highly transmissible drug-resistant and drug-susceptible strains, and it is currently responsible for about one third of global TB cases (Alonso et al., 2011).
The members of the Beijing family are usually high copy number strains (HCS) of IS6110 (between 15 and 25 copies per genome), suggesting the relevance of this element in the variability of their genomes. Supporting this possibility, sublineages of this family were found to carry a large genome duplication that involves up to 8% of the genome (corresponding to more than 300 genes). Copies of IS6110 were identified flanking that duplication, suggesting the occurrence of a homologous recombination event mediated by this IS (Domenech et al., 2010).
The insertion sites of IS6110 in two drug-resistant Beijing strains (W and 210) were determined (Beggs et al., 2000). These strains shared up to 17 insertion sites. Several features related to IS6110 characterize this family, such as the presence of one copy in the oriC region, the deletion of the right-side DR spacers (spacers 1 to 34) and a similar multiband RFLP pattern (Hanekom et al., 2011).
Recently, in a study undertaken in the laboratory of one of the authors (Alonso et al., 2011), the insertion sites of another Beijing strain were determined and compared to those of strains W and 210. A higher proportion of insertions in non-coding regions was found, including one locus with putative promoter-influence activity (see part 4.2). Nine loci common to all three Beijing strains, including oriC, were also identified.
The presence of the IS in oriC, the region that controls the replication of the genome, is expected to have some influence on the synchronization of bacterial cell division (Casart et al., 2008). This site is currently considered a preferential locus, and multiple transposition events were described in several clinical isolates from patients in Caracas (Venezuela) (Turcios et al., 2009). Both infection in the animal model and the in vitro growth rate were further analyzed for those clinical strains and compared to strains lacking the IS in the oriC region (Casart et al., 2008). The presence of IS6110 in the origin of replication enlarges the bacilli and causes a slow growth rate in vitro; in addition, the IS apparently causes attenuation in the animal model.
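Finding the loci shared by several strains, as in the comparison above, amounts to a set intersection once each strain's insertion sites are expressed as comparable locus labels. The sketch below assumes such labels are available; the function name and the labels other than oriC are hypothetical placeholders, not the loci reported in the studies cited.

```python
def shared_insertion_loci(*strain_loci):
    """Return the insertion loci present in every strain supplied."""
    sets = [set(loci) for loci in strain_loci]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical locus labels for three strains, for illustration only.
strain_1 = ["oriC", "locus_A", "locus_B", "locus_C"]
strain_2 = ["oriC", "locus_B", "locus_D"]
strain_3 = ["oriC", "locus_B", "locus_C", "locus_E"]

common = shared_insertion_loci(strain_1, strain_2, strain_3)
print(sorted(common))
```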

Switching on and off genes
To date, the data collected on M. tuberculosis confirm that its genome is highly conserved. This raises the possibility that differences among isolates are more likely to be found through the study of regulatory and/or metabolic pathways. That said, we should not forget that a large proportion of the ORFs identified in the tubercle bacilli are of unknown function. Nevertheless, the insertion of IS6110 outside ORFs, even though it spares the bacilli the direct knockout of one or several genes, could have important consequences for gene expression and thereby influence the metabolic activity of the bacilli.
An IS insertion could interfere with both the initiation and the termination of gene expression, provided it is inserted upstream or downstream of the gene coding sequence. The influence of the IS on downstream genes is considered to be related to the distance between the gene and the 3'-end of the IS. Thus, a promoter influence is possible when that distance is within the range of 31 to 300 bp.
This could be due to a polar effect of the IS and also due to the presence of an outward promoter that was identified close to the 3'-end of IS6110 (Safi et al., 2004;Soto et al., 2004).
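The distance criterion can be written down as a short Python sketch. It assumes a simplified coordinate convention (the IS and the gene on the same strand, with the distance measured as gene start minus the coordinate of the IS 3'-end); the function names, gene names and coordinates are hypothetical, and only the 31 to 300 bp window comes from the text.

```python
def within_promoter_range(distance_bp, min_bp=31, max_bp=300):
    """True if a gene lies at a distance from the IS 3'-end at which a
    promoter influence is considered possible (31-300 bp in the text)."""
    return min_bp <= distance_bp <= max_bp


def genes_in_promoter_range(is_3prime_end, gene_starts):
    """Return the names of the genes whose start falls inside the window.

    is_3prime_end: coordinate of the 3'-end of the IS (assumed convention).
    gene_starts:   mapping of gene name -> start coordinate, same strand.
    """
    return [
        name
        for name, start in gene_starts.items()
        if within_promoter_range(start - is_3prime_end)
    ]


# Hypothetical coordinates, for illustration only.
genes = {"geneA": 1020, "geneB": 1150, "geneC": 1400, "geneD": 900}
print(genes_in_promoter_range(1000, genes))
```

Being inside the window only marks a gene as a candidate; whether its expression actually changes has to be shown experimentally.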
The promoter carried by IS6110 is notable for being activated inside monocytes (Safi et al., 2004), and its activity was demonstrated on several genes not only in strain H37Rv but also in other clinical strains, including Beijing strains (Safi et al., 2004). Remarkably, this promoter activity has been demonstrated through the upregulation of the main two-component system of this bacterium, namely phoP/phoR (Soto et al., 2004).
In this context, the presence of the IS inserted between dnaA and dnaN, the genes whose products control genome replication, is noteworthy. This insertion was identified in many strains, including several belonging to the Beijing family (see part 4.1.3). Moreover, the IS can be inserted in either orientation in this region (Turcios et al., 2009; Casart et al., 2008), and may thus have a variable influence on bacterial cell division. Much effort should be devoted to completing the record of the loci in which IS6110 is inserted. That knowledge will greatly help our understanding of the mechanisms used by the tubercle bacilli to cause tuberculosis so successfully.

The IS6110 content: What is the best number?
A high variability in the number of IS6110 copies is observed among the different strains of the MTBC. While M. bovis usually has a single copy, the number in M. tuberculosis varies from zero to twenty-five. In any case, it is difficult to answer the question: what is the best number for the bacteria?

M. tuberculosis low copy number strains (LCS) & high copy number strains (HCS)
M. tuberculosis strains with fewer than six copies of IS6110 are usually referred to as low copy number strains (LCS) in the literature. A few clinical investigations have reported the presence of LCS in regions such as India, Vietnam or Tanzania (Barlow et al., 2001; Sankar et al., 2011a). In Tiruvallur, South India, 66% of the M. tuberculosis strains isolated were single-copy or other low copy number strains (Shanmugam et al., 2011). In Kanpur district, North India, 17% of the M. tuberculosis isolates were LCS (Purwar et al., 2011). High copy number M. tuberculosis strains (HCS), with six or more copies of IS6110, have been reported in a greater number of papers. One study from Brazil reported that 93.6% of M. tuberculosis strains had at least six copies, with copy numbers ranging overall from 1 to 18 (Suffys et al., 2000). In San Francisco, of 1,326 isolates investigated, 90% had six or more copies and only two isolates had no copies of IS6110 (Yang et al., 1998).
A majority (96.2%) of the 183 strains fingerprinted from Kampala were HCS. These strains were isolated from patients with known HIV serostatus. The number of IS6110 copies ranged from 1 to 20, and the frequency of occurrence of IS6110 bands was similar in the two serogroups. The most prevalent pattern observed had 14 copies of IS6110, with the same distribution in HIV-seropositive and HIV-seronegative patients (Asiimwe et al., 2009).
Chauhan et al. analyzed 308 isolates of M. tuberculosis from different parts of India, and 56% of the isolates were HCS of IS6110. At the regional level, there was not much difference in the IS6110 copy numbers of isolates from different parts of that country (Chauhan et al., 2007).
A long-term population-based study analysing 1,759 clinical strains from the state of Alabama showed that 65% were HCS. The results of this study demonstrated that case clustering is clearly associated with different social factors and risk behaviours, but not with a high or low number of copies of IS6110 (Kempf et al., 2005).
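The LCS/HCS split used throughout these surveys is a simple threshold on the IS6110 copy number (fewer than six copies versus six or more). A minimal Python sketch follows; the function names and the list of copy numbers are hypothetical, and counting zero-copy strains as LCS is an assumption of this sketch that follows from the "fewer than six" definition.

```python
def classify_copy_number(copies, threshold=6):
    """Classify a strain as LCS (< threshold copies) or HCS (>= threshold)."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    return "HCS" if copies >= threshold else "LCS"


def percent_hcs(copy_numbers, threshold=6):
    """Percentage of strains in a collection classified as HCS."""
    if not copy_numbers:
        raise ValueError("at least one strain is required")
    n_hcs = sum(
        1 for c in copy_numbers if classify_copy_number(c, threshold) == "HCS"
    )
    return 100.0 * n_hcs / len(copy_numbers)


# Hypothetical IS6110 copy numbers for eight isolates, for illustration only.
isolates = [0, 1, 4, 5, 6, 14, 18, 20]
print([classify_copy_number(c) for c in isolates])
print(percent_hcs(isolates))
```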

Are there any clinical properties associated with LCS or HCS?
A review of the literature on the origin of outbreaks, including MDR cases, made it evident that both LCS and HCS were involved in outbreaks in similar proportions. Some examples of large outbreaks in population studies involving strains with different copy numbers are listed in Table 6.
The Beijing family is one of the lineages with the highest number of copies of IS6110 (see part 4.1.3). The behaviour of the Beijing lineage is controversial. On the one hand, a Beijing strain named GC1237 has been responsible for epidemic outbreaks since it appeared in the community in 1993 (Caminero et al., 2001). On the other hand, one study conducted in Cape Town (South Africa) found no significant association between the M. tuberculosis genotype and transmissibility within the household (Marais et al., 2009). In addition, outbreaks caused by LCS have been reported, such as the extensive transmission of M. tuberculosis in a rural population with minimal risk factors for TB. That strain was designated CDC 1551, and its fingerprint showed only 4 copies of IS6110 (Valway et al., 1998).
Because in most population-based studies the proportion of cases with isolates that have five or fewer copies of IS6110 is low, the impact of these cases in the study of the overall transmission of tuberculosis in a community will be low.

LCS versus HCS and IS6110 location
Several reports have strongly suggested that the severity and clinical manifestations of tuberculosis depend on the immunogenicity and pathogenicity of the infecting M. tuberculosis strain. In this regard, the IS6110 sequence varies in number and position within the genome, generating a high level of DNA polymorphism among strains.
The location of IS6110 in M. bovis isolates from endogenous reactivation cases in elderly people was studied in comparison with bovine M. bovis strains, leading to the conclusion that the presence of more copies in the human strains could be related to adaptation from the animal to the human host (Otal et al., 2008).
In addition to the DR locus, Fomukong et al. detected a highly preferred site of insertion of IS6110, namely "DK1", in M. tuberculosis strains with a low copy number. However, the prevalence of this site decreases in HCS, suggesting separate lineages for the HCS and the LCS (Fomukong et al., 1997). In contrast, the M. bovis strains analysed carried no copy at the same genomic position as the M. tuberculosis strains (Otal et al., 2008). This agrees with the idea that LCS of M. tuberculosis and M. bovis evolved separately after the progenitor acquired IS6110 at the DR region. In line with Fomukong et al. (1997), no IS6110 has been detected in the DK1 locus in any of the Beijing strains analysed so far (data not shown).
Molecular epidemiological data support the observation that the copy-number of IS6110 in members of the MTBC may change over time. Factors affecting this rate may include the nature and duration of disease in a host and the opportunity to go through different host environments during the transmission cycle.
IS6110 has also been evaluated as a tool to analyze the evolution of members of the MTBC. Transposition may influence the evolution of the strains; thus, parental strains should carry a low copy number and their more evolved descendants a high copy number. One example that theoretically supports this consideration is the Beijing family, whose members have a high IS6110 copy number and have shown high prevalence and high transmissibility (McEvoy et al., 2007). These characteristics could be seen as selective advantages for the main purpose of the bacteria: to infect humans (Hanekom et al., 2011). Although the previous statement is theoretically possible, the presence of preferential sites, together with the presence of forbidden sites, makes the study of IS6110 variation in the genomes useless as an evolutionary tool (Kivi et al., 2002).