Links to databases and tools
Type 1 diabetes (T1D) affects about 0.25% of the Caucasian population and is a major health problem due to early disease onset (before the age of 20 in half of the patients), the increasing disease incidence in most populations, the absence of an effective curative treatment, and the high vascular burden associated with residual hyperglycaemia in insulin-treated T1D patients. The disease results from the autoimmune destruction of the insulin-secreting pancreatic β-cells. Indirect evidence implicating autoimmunity in human diabetes relies on the detection of insulitis, islet cell antibodies, T cell responses to β-cell antigens and association with a restricted set of class II Major Histocompatibility Complex (MHC) haplotypes.
Type 1 diabetes, or insulin-dependent diabetes mellitus (IDDM), is under both multifactorial and polygenic control, with the MHC class II locus and the insulin locus being two of the best studied genetic loci (Serreze & Leiter, 2001, Todd & Wicker, 2001, Maier & Wicker, 2005). Genetic studies of T1D aetiology in man have often proved difficult, reflecting the complexity of the genetic control, genetic heterogeneity, and in many cases the lack of intrinsic power of the studies. However, recent advances in human genetics, for example, the availability of the human genome sequence, the establishment of the HapMap (Zeggini et al., 2005, Taylor et al., 2006), the development of high throughput genotyping, the availability of larger sample collections (2007), and the increasing statistical power of the studies, have provided more promising results. The underlying genetic complexity of T1D and the difficulty of undertaking functional studies in humans provide a strong argument for undertaking complementary genetic studies in an animal model system in which genetic heterogeneity and environmental factors can be more easily controlled. Moreover, only animal model studies allow the functional genomic studies that provide definitive proof of genetic causality.
The nonobese diabetic (NOD) mouse (Makino et al., 1980, Hattori et al., 1986) is a well-characterized animal model of IDDM. The NOD mouse spontaneously develops T1D, which shares most of the characteristics of the human disease. One distinctive feature, however, is that female mice have a much higher prevalence than male mice: approximately 80% in females versus 40% in males at 8 months of age. More than forty murine insulin dependent diabetes susceptibility loci (Idd) have been genetically localised, although little information has been obtained about the nature of many of these non-MHC Idd genes. Identification and characterisation of particular Idd genes in the mouse will allow testing of the involvement of the human homologue in T1D and, more particularly, since there is unlikely to be a simple one to one correspondence between human and mouse, will allow the identification of the underlying metabolic network for systematic candidate gene testing in humans. An improved knowledge of the underlying genetics of T1D in both mouse and man should provide information about potential drug targets and lead to improved possibilities for earlier diagnostics.
2. Generation and analysis of NOD congenic mice
The mouse is the organism of choice as a model for human disease. Not only have many thousands of mutations already been isolated, with projects to inactivate all mouse genes underway (Grimm, 2006), but some 450 inbred strains have also been described (Beck et al., 2000), offering a wealth of genetic and phenotypic diversity. This collection of inbred strains provides a basis for studying phenotypes under complex genetic control. The Mouse Phenome Project has been organized to establish a collection of baseline phenotypic data on commonly used and genetically diverse inbred mouse strains and to make this information publicly available through a web-accessible database (see database links).
2.1. Definition of recombinant inbred, consomic and congenic strains
Different breeding systems have been established in the mouse (Figure 1). Recombinant inbred (RI) strains contain a unique admixture of genetic contributions in approximately equal proportions from their two original progenitor inbred strains. RI strains are established by crossing animals of two inbred strains, followed by 20 or more generations of brother/sister matings (Bailey, 1971). The Complex Trait Consortium (Churchill et al., 2004) represents the largest community effort to date to generate some 1000 RIs from eight different parental strains to identify genes involved in complex disorders. Recombinant Congenic (RC) strains are also established by an initial crossing between two inbred strains, but this is followed by a few, usually two, backcrosses of the resulting F1 hybrids to one of the parental strains, called the 'recipient' strain, with subsequent brother/sister intercrossing (Demant & Hart, 1986). In both RI and RC strains the result is a mosaic genetic structure with blocks of genetic material from one parent interspersed with blocks of genetic material from the other parent. RI and RC strains differ, however, in the relative contribution of the two parents.
Consomics and congenics are inbred strains in which part of the genome of one mouse strain is transferred to another by backcrossing the donor strain to the recipient strain, followed by intercrossing in later generations to ensure homozygosity. In the case of the NOD congenic strains the recipient is the NOD mouse and the donor in most cases a C57BL/6 mouse. Genetic selection is systematically practised to ensure retention of the desired genetic material from the donor strain in each backcross. Most genetic markers and their alleles can be conveniently looked up in the Mouse Genome Informatics (MGI) database. The breeding method was first described by Snell who produced histo-incompatible congenic strains, that were originally called 'congenic resistant' strains (Snell, 1978). In the case of a consomic strain an entire chromosome is transferred (Nadeau et al., 2000, Santos et al., 2002). In the case of a congenic strain a chromosomal segment, also termed the differential segment, is transferred (Boyse & Bentley, 1977, Wakeland et al., 1997). Congenic strains will normally carry differential regions of 10-20 Mb in size (Peirce, 2001) unless specific efforts are made to reduce the size of the differential segment (see below). Many of the existing congenic strains, including many of the NOD congenics, can already be retrieved directly via the Jackson Laboratories. Congenic strains need to be distinguished from co-isogenic strains that differ at only a single locus from their parental strain (Roths et al., 1984). Co-isogenics can be derived by gene targeting, for example through homologous recombination, or by mutagenesis approaches. All above described strains have in common that they allow repeated phenotyping of large numbers of genetically homogenous animals under very defined environmental conditions. This would never be possible by studying human subjects.
Figure 1. Recombinant inbred strains are generated by intercrosses of F1 mice and subsequent brother-sister interbreeding. Consomic strains are generated by repeated backcrossing to the parental recipient strain (NOD mouse), but only one chromosome is derived from the diabetes resistant donor strain. In congenic strains only the differential chromosome segment is derived from the donor strain.
2.2. Establishment of congenic strains
The starting point for the identification of a genetic locus controlling a complex trait, or quantitative trait locus (QTL), or in the case of T1D an Idd locus, is most often the generation of F1 animals from two parental inbred strains differing in phenotype for the trait under study and the subsequent intercrossing of F1 progeny to produce an F2 generation that can be subject to genetic analysis. Alternatives may involve the backcrossing of the F1 progeny to one or the other parental strain, to generate first stage backcross generation animals (BC1) and/or eventually backcross 2 animals (BC2). The number of animals needed for the F2, BC1 or BC2 analysis depends on the strength of the phenotypic effect conferred by each QTL under study and also on the size of the genetic interval(s) to be identified. Generally, experimental cohort sizes are in the range of several hundred animals. Most of the easily identified QTLs present rather extreme phenotypes that confer above average contributions to the overall phenotype. Consulting the Mouse Genome Informatics (MGI) database (see below) reveals for example that whilst an initial localisation has been described for some 2000 quantitative trait loci, in less than 1% of the cases has the gene (or genes) been identified. The contribution of the average single QTL to the overall phenotype has been estimated to be 5% or less (Flint et al., 2005).
Alternatives to performing new crosses involve the use of existing sets of recombinant inbred lines (RI) and/or recombinant congenic (RC) strains. The analytical power of an RI set depends on the number of generated lines and the degree of genetic/phenotypic variation in the parental strains. Whilst RI sets can deliver higher mapping resolution than F2 mice (Flint et al., 2005), current RI sets will often have insufficient power to identify genes with a small effect on the QTL. In contrast to RI sets, sets of RC strains have the property of limiting the amount of the genome that has to be searched for multiple genes involved in QTLs as long as they have been selected for the phenotype of interest. The genome of a standard RC strain comprises, on average, only 12.5% from the donor strain (Stassen et al., 1996).
Consomic and especially congenic strains are key resources for the dissection of T1D QTLs. Understanding the role of an individual QTL is often hampered by the complexity of its genetic and phenotypic interactions with other participating QTLs. Considering susceptibility to type 1 diabetes and the well-known disease model for type 1 diabetes, the nonobese diabetic (NOD) mouse (Makino et al., 1980, Hattori et al., 1986), it turns out that diabetes sensitive strains such as NOD carry not only diabetes sensitivity loci but also QTLs conferring diabetes resistance. Conversely, diabetes resistant strains such as C57BL/6 and C3H/HeJ (Rogner et al., 2001) carry loci conferring diabetes susceptibility as well as resistance genes. For example, on mouse chromosome 6 we have identified three loci involved in T1D, termed Idd6, Idd19 and Idd20. Idd6 (Carnaud et al., 2001) and Idd20 (Morin et al., 2006) are both NOD susceptibility loci and Idd19 (Melanitou et al., 1998) is a NOD resistance locus. Furthermore, Idd19 can mask Idd6 phenotypes and Idd20 can mask Idd19 phenotypes. It is therefore the overall balance of loci and their interactions in a given congenic or inbred strain that determines the final phenotype (Yang & Santamaria, 2006). Some idea of the overall complexity is given by the forty or more murine insulin dependent diabetes loci (Idd) that have been genetically identified. In this complex situation, sets of congenic strains have been of critical importance for breaking down the overall genetic complexity. The second reason to generate congenic strains is to study the effects of a gene mutation, a knockout, a knockin, or a transgene on one or several different genetic backgrounds. This approach is of particular interest for studies on genetic modifiers (Montagutelli, 2000, Nadeau, 2001).
The use of congenics, although often onerous, remains less time consuming than introducing the same modification in parallel onto different genetic backgrounds by gene targeting or transgenesis. Indeed, in many cases the latter approach may be impossible as the appropriate ES cell lines may not be available. When congenic construction is used, it should however be borne in mind that it is not a single gene that is being transferred but the gene and its surrounding genetic region. There have been examples showing that such genomic fragments can contribute to the phenotype or even confer a new phenotype. Different breeding strategies to minimise such problems have been discussed (Wolfer et al., 2002).
Congenic strains are derived by repeated backcrossing of the donor strain to the recipient strain with selection for the differential segment. This breeding is then followed by sister/brother interbreeding of the backcrossed progeny. In practice, female F1 animals are mated with recipient strain males to establish the BC1 generation. If F1 males are used at this stage, the final congenic strain may still carry the donor derived Y-chromosome. Males heterozygous for the selected chromosome region are then repeatedly backcrossed to recipient females (e.g. NOD females) during congenic strain derivation. Congenic strains are then rendered homozygous for the genetic intervals under study by intercrossing heterozygous males and females of the same genotype and are subsequently maintained by brother and sister mating. When repeated backcrossing is used to establish a congenic strain, a minimum of nine generations of backcrossing is normally recommended to remove 99.9% of the unwanted donor material (Silver, 1995), though the exact number of required backcross generations appears somewhat arbitrary (Festing, 1979). A genome scan should be carried out before fixing the congenic interval so that, if further backcrossing is necessary to remove a contaminating genomic fragment, this can be carried out before the congenic strain is rendered homozygous. When choosing recombinants to fix genetic intervals we recommend ensuring the highest possible density of markers within the differential fragment (at least one marker every 1-2 cM) to avoid partial heterozygosity. It is also advisable to check the interval of interest for residual heterozygosity once the strains have been fixed, because recombinations may have occurred in the parental heterozygous animals.
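The figures quoted for residual donor material follow from simple halving arithmetic. The short Python sketch below illustrates this under stated assumptions: it considers only loci unlinked to the selected differential segment, applies no marker-assisted selection against donor material, and counts the F1 as zero backcrosses. The function names are illustrative and are not taken from any cited work.

```python
def donor_allele_fraction(n_backcrosses: int) -> float:
    """Expected fraction of unlinked alleles that are still donor-derived.

    Assumption: the F1 carries 50% donor alleles, and each backcross to the
    recipient strain halves the expected donor contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be zero or positive")
    return 0.5 ** (n_backcrosses + 1)


def heterozygous_locus_fraction(n_backcrosses: int) -> float:
    """Expected fraction of unlinked loci that are still heterozygous."""
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be zero or positive")
    return 0.5 ** n_backcrosses


if __name__ == "__main__":
    for n in (2, 5, 9):
        print(
            f"{n} backcrosses: "
            f"{100 * donor_allele_fraction(n):.2f}% donor alleles, "
            f"{100 * heterozygous_locus_fraction(n):.2f}% heterozygous loci"
        )
```

Under these assumptions two backcrosses leave an expected 12.5% donor alleles, and nine backcrosses leave about 0.1%, that is, roughly 99.9% of the unwanted donor material has been removed. These are expectations only; individual animals vary around them, which is why a genome scan remains necessary.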
Alternative congenic breeding schemes have been established that involve both positive selection for the desired differential segment and negative selection against the rest of the donor genome during early backcross generations. In such breeding schemes, which are called 'speed congenics', the genetically 'best' animals, i.e. those carrying the differential segment and minimal detectable donor strain material elsewhere in the genome, are selected. Theoretically the process can lead to the creation of a congenic strain with less than 0.5% contaminating donor genome unlinked to the differential segment within a total of five generations or four backcrosses (Markel et al., 1997). Simulations suggest that screening between 16 and 20 male progeny per generation with markers spaced every 25 cM most efficiently reduces unlinked contaminating donor genome. Use of larger progeny cohorts and higher marker density seems of little advantage in reducing contaminating donor genome until later backcross generations. High-density genotyping of the differential segment in later generations is however necessary to reduce the size of the target region below 20-30 cM (Wakeland et al., 1997). Experience suggests that both 'best' and 'second best' males should routinely be kept for breeding, in particular when poor breeding performance may occur. Simulation studies have suggested that marker-assisted breeding strategies can lead to increased background heterogeneity, or 'gaps', in the recipient genetic background as compared to standard breeding procedures. This suggests that additional backcrossing may still be required in order to reduce the number and length of such gaps (Armstrong et al., 2006). On the assumption of putative remaining gaps, it may be of interest to derive a given set of congenic strains from a single breeding pair and to generate at least one congenic strain carrying no differential fragment as an internal control for phenotyping.
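The effect of selecting the 'best' male in each generation can be illustrated with a small Monte Carlo sketch in Python. This is not the simulation used in the cited studies. It assumes independent (unlinked) background markers, each lost with probability 0.5 per backcross, and it ignores the differential segment itself. The figure of 60 markers is an assumption, chosen as a rough stand-in for a 25 cM spacing, and all function names are illustrative.

```python
import random


def simulate_backcrosses(n_backcrosses, progeny_per_gen, n_markers=60, seed=0):
    """Return the heterozygous marker fraction of the chosen breeder
    after each backcross generation.

    Simplifying assumptions: markers segregate independently, and the
    breeder kept in each generation is the genotyped animal with the
    fewest heterozygous (donor-carrying) background markers.
    """
    rng = random.Random(seed)
    breeder = [True] * n_markers  # F1: heterozygous at every marker
    history = []
    for _ in range(n_backcrosses):
        progeny = []
        for _ in range(progeny_per_gen):
            # A heterozygous marker is transmitted with probability 0.5;
            # a marker already lost cannot reappear.
            child = [het and rng.random() < 0.5 for het in breeder]
            progeny.append(child)
        breeder = min(progeny, key=sum)  # the 'best' animal
        history.append(sum(breeder) / n_markers)
    return history


def mean_final_fraction(n_backcrosses, progeny_per_gen, reps=200):
    """Average the final heterozygous fraction over independent replicates."""
    total = 0.0
    for rep in range(reps):
        total += simulate_backcrosses(n_backcrosses, progeny_per_gen, seed=rep)[-1]
    return total / reps


if __name__ == "__main__":
    for k in (1, 16):
        print(k, "genotyped progeny per generation:", mean_final_fraction(4, k))
```

Setting `progeny_per_gen` to 1 corresponds to unselected backcrossing, so comparing it with 16 gives a feel for the gain from selection. Because linkage between markers is ignored, the sketch says nothing about the 'gaps' between markers discussed above.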
The benefit that can be obtained from a panel of congenic strains is critically dependent on the quality of the phenotyping available, which in turn, obviously depends on the disease under study. The availability of sub-phenotypes for characterisation is often critical to the fine dissection of the trait. Analysis often starts with the most robust and basic phenotype before proceeding to more subtle analysis of sub-phenotypes. Phenotyping employed in autoimmune disorders ranges from histology, evaluation of physiological parameters, to metabolomics and transcriptional profiling.
2.3. Methods for the analysis of type 1 diabetes in the NOD mouse model
The baseline for a systematic analysis of a given NOD congenic strain, compared to the original NOD mouse or, better, also to a NOD control congenic strain, is the follow-up of spontaneous diabetes development during a period of about 30 weeks. Overt diabetes starts around 10 to 12 weeks of age. For the monitoring of diabetes, measurement of glucose levels in the urine is usually sufficient. This simple test is in itself not very invasive for the animals, but importantly, the number of animals in the test series should be high enough to allow the detection of small changes in diabetes incidence. At this stage, generally 30 to 50 female mice, and sometimes more, are required. Using such a high number of animals avoids overestimating changes that are due to 'environmental' effects, e.g. the cage-specific effects that are sometimes observed. Some investigators prefer to keep their congenics together with the NOD controls in the same cage during diabetes testing. Young female animals from different NOD congenic strains are not usually aggressive towards each other when kept in the same cage, but close attention should still be paid to the behaviour of the animals. Overcrowded cages lead to additional stress amongst the animals and this can influence the diabetes incidence. Also, pregnant females should not be included in the testing because hormonal changes influence diabetes development.
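As an illustration of why cohort size matters, the following Python sketch compares cumulative diabetes incidence in two groups with a two-sided Fisher exact test, written from first principles using only the standard library. The choice of test is an assumption made for illustration; the text above does not prescribe a particular statistical method, and time-to-onset analyses are an alternative. The animal counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table
        group 1: a diabetic, b non-diabetic
        group 2: c diabetic, d non-diabetic
    computed by summing all hypergeometric table probabilities that are
    no larger than the probability of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that k of the col1 diabetic animals fall in group 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    tol = 1 + 1e-9  # guard against floating point ties
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * tol)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 50 NOD females diabetic at 30 weeks,
    # versus 25 of 50 females of an invented congenic strain.
    print(fisher_exact_two_sided(40, 10, 25, 25))
    # The same proportions with only 10 animals per group.
    print(fisher_exact_two_sided(8, 2, 5, 5))
```

Running the two calls side by side shows how the same difference in incidence gives much weaker evidence in a small cohort than in a cohort of the size recommended above.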
Additional monitoring of diabetes incidence may involve an accelerated form of diabetes after injection of drugs such as cyclophosphamide (CY). CY is an alkylating agent that leads to the depletion of regulatory T cells, whereas IFN-gamma producing lymphocytes are CY resistant (Ablamunits et al., 1999). This method allows diabetes incidence to be evaluated within a much shorter period of only 12 weeks, but it is not necessarily appropriate for analysing all T1D associated loci. For example, loci that confer diabetes protection through the activity of regulatory T cells may not be discovered because CY eliminates these cells (Rogner et al., 2001). Both methods of monitoring T1D incidence can be combined with the evaluation of insulitis by rating the infiltration of the pancreatic islets with immune cells. This histological analysis is mostly performed at 12 weeks of age and usually requires about 10 animals per strain. It can be performed at earlier ages, but much less insulitis and peri-insulitis should be expected, and usually no signs of insulitis are found before the age of three weeks.
The systematic evaluation of the number of immune cells (T cells, B cells and other relevant cell types) in different organs (thymus, spleen, lymph nodes, islets, etc.), and of cytokine, insulin and antigen levels, can be very helpful in gaining further insight into the protective mechanism underlying the T1D locus under study. The number of animals used for these tests can often be limited to six, and different tests can be combined at this stage. It is also useful to perform the tests at different ages (4, 8, 12, and 16 weeks) to relate phenotypes to different stages of diabetes development. Finally, extended testing of cell activation or proliferation and adoptive transfer assays are good methods to complete the analysis. Since several examples have shown that T1D loci overlap with other autoimmune loci, NOD congenic strains may exhibit phenotypes in organs and tissues other than those directly involved in diabetes. One typical and well studied example of this is sialitis (Hjelmervik et al., 2007). In any case, the defined subphenotypes may be very helpful in following up QTLs during genetic dissection as they may vary less than the diabetes incidence.
2.4. Refining the candidate region
2.4.1. Generation of subcongenic strains
The genetic interval conferring a particular phenotype in a given congenic strain can often be reduced and refined by identification of new recombinants during further backcrossing. It becomes increasingly difficult to obtain the necessary recombinants as the genetic distance under study is reduced, and much larger breeding populations are then needed. Several studies have shown the existence of sex-specific differences in recombination frequency (Shiroishi et al., 1991, Lynn et al., 2005, Morelli & Cohen, 2005) and it can therefore, on occasion, be useful to change the direction of the cross and use heterozygous females instead of males or vice versa. It should be noted that recombination does not occur with uniform frequency throughout the genome and is often higher at so called 'hotspots'. Some hypotheses predict that recombination will occur more often in regions where gene density is higher and less often in what have been termed 'gene deserts'. This was observed in the human genome, where recombination rates are found to be higher in regions with higher gene density (Fullerton et al., 2001). The informative polymorphic markers necessary to characterize the mouse recombinants are listed in the MGI database, and only rarely will additional comparative sequencing efforts be required to identify additional polymorphisms.
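The growth in breeding effort as the interval shrinks can be made concrete with a back-of-the-envelope calculation, sketched here in Python. It assumes that 1 cM corresponds to a 1% chance of recombination per meiosis, that the interval is small enough for map distance and recombination fraction to coincide, and that recombination is uniform across the interval, which, as noted above, is not strictly true. The function name and the 95% target are illustrative choices.

```python
from math import ceil, log


def progeny_needed(interval_cm: float, confidence: float = 0.95) -> int:
    """Smallest number of backcross progeny giving at least `confidence`
    probability of recovering one or more recombinants in the interval.

    Assumption: recombination fraction r = interval_cm / 100, so the chance
    of seeing no recombinant among n progeny is (1 - r) ** n.
    """
    r = interval_cm / 100.0
    if not 0.0 < r < 0.5:
        raise ValueError("interval must lie between 0 and 50 cM")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(1.0 - r))


if __name__ == "__main__":
    for cm in (10.0, 1.0, 0.1):
        print(f"{cm} cM: {progeny_needed(cm)} progeny")
```

Under these assumptions about 29 progeny suffice for a 10 cM interval, whereas about 299 are needed for 1 cM. A single recombinant is rarely enough in practice, since recombinants on both sides of the candidate gene are usually wanted, so these numbers are lower bounds.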
When analysing congenic strains, it has been observed that the phenotypic effects often get smaller as the genetic interval is reduced and subcongenics are generated (Hung et al., 2006). This most often occurs when the original effect was due to the combination of several genes, and this may reflect the relatively frequent occurrence of QTLs as haplotype blocks. In other cases, genetic interactions may lead to the suppression of phenotypes when intervals are combined (Rogner et al., 2001). In these cases, complexity within the genetic interval can normally still be successfully addressed, although it may require larger numbers of animals and more strains to be phenotyped and studied.
2.4.2. Haplotype analysis
In silico mapping has been suggested to be a powerful computational based method for predicting chromosomal regions regulating phenotypic traits (Grupe et al., 2001). Single nucleotide polymorphisms (SNPs) in different inbred mouse strains are organised in an alternating mosaic pattern of relatively large, typically 1-2 Mb, genomic regions (blocks) of low or high polymorphic variation (Lindblad-Toh et al., 2000, Wade et al., 2002). Regions that are poor in polymorphic markers have been assigned as regions of common ancestry. Haplotype blocks are defined as genomic segments harbouring sets of coupled polymorphisms that reflect a common ancestral origin (Frazer et al., 2004, Yalcin et al., 2004). Genome-wide association studies involving the correlation of a phenotype, for instance disease prone and disease resistance, over a wide selection of different inbred mouse strains, to patterns of genetic variation in these same strains have been suggested to provide powerful indications of potential candidate regions (Grupe et al., 2001). QTL genes are likely to be found in regions of different ancestry among any pair of differentially affected strains, whilst a given phenotype observed in common in different strains will likely be controlled by a region that is held in common between these strains (Guo et al., 2006). The approach requires that well standardised parental phenotype data is available for many inbred strains. The PHENOME project was established in part with this type of application in mind.
Applied to congenic strain analysis, knowledge of the haplotype block structure within a congenic interval may focus interest on a particular subregion within the congenic candidate region (Figure 2). There are several examples in the literature that have demonstrated the value of this type of combined approach (Ikegami et al., 1995, Lyons et al., 2000, Hillebrandt et al., 2005). Haplotype mapping in the NOD mouse appears however less powerful as only one sensitive strain can be compared to the panel of diabetes resistant strains.
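The logic of comparing one sensitive strain against a panel of resistant strains can be sketched in a few lines of Python. The marker names and all genotypes below are invented toy data, not real SNP calls for these strains, and the function name is illustrative. The sketch simply reports markers at which the sensitive strain carries an allele found in none of the resistant strains, which is one simple way of highlighting subregions of different ancestry within a candidate interval.

```python
def private_allele_markers(genotypes, sensitive, resistant):
    """Return markers where all sensitive strains share one allele that is
    absent from every resistant strain.

    genotypes: dict mapping marker name -> dict of strain name -> allele.
    """
    hits = []
    for marker, alleles in genotypes.items():
        sensitive_alleles = {alleles[s] for s in sensitive}
        resistant_alleles = {alleles[r] for r in resistant}
        if len(sensitive_alleles) == 1 and sensitive_alleles.isdisjoint(resistant_alleles):
            hits.append(marker)
    return hits


# Invented toy genotypes, ordered along a hypothetical candidate interval.
toy_genotypes = {
    "snp_01": {"NOD": "A", "C57BL/6": "A", "C3H/HeJ": "A"},
    "snp_02": {"NOD": "G", "C57BL/6": "T", "C3H/HeJ": "T"},
    "snp_03": {"NOD": "C", "C57BL/6": "C", "C3H/HeJ": "T"},
    "snp_04": {"NOD": "T", "C57BL/6": "C", "C3H/HeJ": "C"},
}

if __name__ == "__main__":
    print(private_allele_markers(toy_genotypes, ["NOD"], ["C57BL/6", "C3H/HeJ"]))
```

With a single sensitive strain, every NOD-private polymorphism passes this filter, which illustrates why the approach is less discriminating in the NOD mouse than in panels containing several affected strains.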
In whatever way in silico mapping is applied, the success of the strategy depends on the size of the underlying haplotype blocks and the panel of relevant mouse strains. The approach also depends on rates of mutation within regions of shared haplotype being sufficiently low as to not obscure the underlying patterns of haplotype variation. Other work indicates that the definition of haplotype blocks is not that robust and that methods for QTL mapping may fail if they assume a simple block-like structure (Yalcin et al., 2004).
Figure 2. Comparison of the SNP distribution within the candidate interval between disease sensitive (NOD, chromosome represented by a white bar) and resistant (black segments) strains and generation of subcongenic strains may allow the initial candidate region to be reduced. The reduced interval is indicated by two black lines.
2.5. Transcriptional profiling
The use of expression profiling for candidate gene identification is based on the idea that in many cases the QTL will also reflect quantitative changes in the expression of the underlying gene(s). This approach has proven to be particularly successful in cases where the phenotype of the disease under study has provided clues as to the class(es) of genes or to the tissues in which the candidate gene is likely to be expressed. Annotated lists of genes for the region under study, obtained from e.g. the Ensembl, MGI or NCBI databases, together with in silico expression profiling approaches based on exploiting data from sources such as Serial Analysis of Gene Expression (SAGE) libraries, microarray analysis based datasets, and cumulative data on Expressed Sequence Tags (ESTs), allow the tissue expression profiles of the genes to be established. Where necessary, these data are then validated for the most promising candidates by comparative expression profiling of the discriminatory congenic strains using techniques such as quantitative real-time PCR. A complementary approach uses genome wide microarray based expression profiling to identify possible genetic pathways (Eaves et al., 2002). The efficiency of both strategies depends on the completeness of the gene annotations and the exhaustiveness of the gene representation that is being exploited, and is influenced by both the cellular complexity of the target tissue and relative transcript expression levels. In the case of complex tissues, sensitivity may be increased by analysing cellular subpopulations of the tissue in question (Lock et al., 2002, Scearce et al., 2002).
A recent Genome-Wide Association Study (GWAS) has defined over 150 genomic regions containing variation predisposing to immune-mediated disease. The results provide evidence that for many of the complex diseases common genetic associations implicate regions encoding proteins that physically interact (Rossin et al., 2011). This type of analysis demonstrates how bioinformatics tools can be applied to the identification of gene networks in complex diseases.
2.6. Resequencing and search for polymorphisms
Once a candidate region has been defined and characterized, the crucial difficulty of identifying and validating the causative gene(s) arises. It is important to realise that there is rarely one single approach but rather a spectrum of complementary approaches that can be used to identify without ambiguity the gene(s) underlying a QTL.
The identification of changes in primary nucleotide sequence, which is powerfully diagnostic in the case of mutations in monogenic disorders, is of much less certain value in the case of QTL characterisation. Nonsense or stop codons that completely abolish gene function are much less likely to underlie QTL variation than in mutations affecting monogenic traits, and it is often unknown whether to expect changes in coding sequences or in non-coding regulatory sequences. Extensive nucleotide variation may also occur outside of the exon sequences of some genes, which renders the identification of causal polymorphisms in such regions more difficult or impossible. This indeterminacy may moreover be compounded if the causal gene lies within a region, which shows a high degree of polymorphism between the parental strains. In such cases the polymorphism within the genomic sequence of the causal gene will likely be no more marked than that of the surrounding genes. Such regions of elevated polymorphism are to be found throughout the mouse genome and reflect the breeding history of the mouse inbred strains (see above). Such caveats suggest that resequencing of entire regions in the donor and recipient strains is an approach that can help to exclude a certain proportion of candidate genes rather than lead to the unambiguous identification of the responsible gene. The researcher can already benefit from the ongoing resequencing projects that include the NOD mouse.
2.7. Further validation: overlap with other autoimmune loci and interspecies comparison
Genome-scale analysis in type 1 diabetes has identified a number of non-major histocompatibility complex loci of varying levels of statistical significance. Comparative analysis of the position of loci for type 1 diabetes with candidate loci from other autoimmune diseases has shown considerable overlap (Becker et al., 1998). This supports the hypothesis that the underlying genetic susceptibility to type 1 diabetes may be shared with other clinically distinct autoimmune diseases such as systemic lupus erythematosus, multiple sclerosis, and Crohn's disease. This is less surprising when looking at the cellular and molecular levels, which indicate similar mechanisms and pathways underpinning the diseases. An interesting way to further analyse T1D regions of interest is therefore to study their link to other autoimmune diseases (Jiang et al., 2009). For example, several loci involved in peripheral neuropathy overlap with T1D loci in the NOD mouse. The MGI database lists phenotypes that have already been linked to genomic regions. The T1D database directly provides all known locations and genes in human and in mouse. Another approach is to analyse the syntenic regions in human and in rat. The Rat Genome database provides a tool to visualize syntenic regions for all three species. The Davis Human/Mouse Homology Map shows syntenic regions between mouse and human.
2.8. Gene manipulation in the NOD mouse
Neither the identification of sequence variation nor of altered expression profiles is in itself sufficient to establish causality. For this, techniques of gene inactivation, gene overexpression and replacement of the allele of one strain by that of the other strain need to be undertaken. Ongoing programmes for inactivating or mutating all mouse genes will increasingly provide embryonic stem (ES) cells carrying a knockout for the candidate genes under study. Interestingly, some of the constructs used in the global knockout programmes may facilitate application of a knockin strategy, which would enable the integration of alternative functional allelic forms of the gene to be undertaken (Garcia-Otin & Guillou, 2006). Database links that provide information about existing knockouts and mutants are listed below. Exploiting such resources for QTL validation may in many cases be complicated by the need to cross the knockout onto the relevant genetic background. Direct gene targeting and the derivation of mouse models on the T1D background are however possible, since a germline competent embryonic stem cell line from the nonobese diabetic mouse has been established (Nichols et al., 2009).
An alternative strategy to gene disabling is the use of small interfering RNA (siRNA) to inhibit (knock down) gene expression. RNA interference (RNAi) is an evolutionarily highly conserved process of post-transcriptional gene silencing (PTGS) in which double-stranded RNA (dsRNA), when introduced into a cell, causes sequence-specific degradation of homologous mRNA sequences (Fire et al., 1998). Double-stranded RNAs of around 21 nucleotides in length inhibit the expression of specific genes (Hasuwa et al., 2002, Xia et al., 2002, Qin et al., 2003). Whilst it has turned out that RNA silencing can be causative in disease development, it also provides a useful research tool in type 1 diabetes (Kissler et al., 2006). The RNAi WEB database provides a good summary of such studies and excellent practical advice. The long-term and stable inhibition of gene function normally necessary for assessing the effect of a QTL requires delivery systems capable of supporting stable and persistent expression in vivo. The cloning of stable interfering short hairpin RNA (shRNA) molecules and the use of several viral vectors and transposon-based non-viral vectors that fulfil this requirement have been reported (Naldini, 1998, Yant et al., 2002). Interestingly, such RNAi approaches can be used with embryos, circumventing the strain restriction bottleneck associated with ES cell use. Although there are problems associated with off-target effects of RNAi that require stringent controls, RNAi can be engineered to be specific for particular allelic forms of a given gene. Provided that technical problems can be overcome, experiments aimed at targeting RNAi to one or other allele of a candidate gene in F1 animals from two discriminatory congenic strains should prove particularly informative.
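To make the idea of allele-specific RNAi more concrete, the sketch below enumerates every 21-nucleotide target window that contains a given polymorphic position and derives the complementary guide strand for each. It is an illustration under simplifying assumptions, not a published design procedure. The function names and the example sequence are hypothetical, and real siRNA design applies further criteria (for instance base composition and off-target screening) that are not modelled here.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(dna: str) -> str:
    """Convert a DNA sense-strand sequence to its mRNA equivalent."""
    return dna.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def allele_specific_candidates(mrna_dna: str, snp_index: int, length: int = 21):
    """Enumerate target windows of `length` nt that contain the SNP.

    Returns (target_site, guide_strand, snp_offset) tuples, where
    snp_offset is the 0-based position of the SNP inside the target site.
    Which offset discriminates best between alleles is not predicted here
    and has to be determined experimentally.
    """
    target = to_rna(mrna_dna)
    if not 0 <= snp_index < len(target):
        raise ValueError("snp_index lies outside the sequence")
    candidates = []
    first = max(0, snp_index - length + 1)
    last = min(snp_index, len(target) - length)
    for start in range(first, last + 1):
        site = target[start:start + length]
        candidates.append((site, reverse_complement(site), snp_index - start))
    return candidates


# Invented 30-nt sequence; the polymorphic base (G) sits at index 12.
example_allele = "ATGGCTAGCTAGGATCCGATTACGGATCCA"
candidates = allele_specific_candidates(example_allele, snp_index=12)
for site, guide, offset in candidates:
    print(offset, site, guide)
```

Running the same function on the sequence of the second allele gives the matching set of guides for that allele, so that each candidate pair differs at a single, known position.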
Overexpression studies provide an alternative, if less stringent, way either to further reduce the size of the candidate region or to formally validate candidate genes when the trait under consideration is dominantly or co-dominantly expressed (Symula et al., 1999, Giraldo & Montoliu, 2001). Such approaches are often carried out using BAC clones, which may be one to several hundred kilobases in size and therefore allow the gene to be tested along with many of the cis-acting sequences necessary for its regulated expression. Such studies are being facilitated by the construction of BAC libraries for many mouse strains other than C57BL/6 and 129/Sv (Osoegawa et al., 2000). The fingerprinting of these libraries, the ready availability of both draft and finished mouse genomic sequences (Waterston et al., 2002) and strain resequencing programmes allow the DNA hybridisation probes necessary for BAC isolation to be easily designed. Although BAC transgenesis is an efficient process, it should be noted that both copy number variation and variation in the site of integration in the genome, which leads to position effects, may modify gene expression profiles and thereby hamper or obscure the identification of the gene where subtle phenotypes are concerned.
It should be noted that less laborious but also less complete approaches include the transfection of particular cell types prior to adoptive transfer experiments and ex vivo studies.
2.9. Identification of new candidate genes using NOD congenic strains
The use of congenic strains has proven to be a successful approach to the identification of several T1D candidate genes. Amongst the non-major histocompatibility complex (MHC) Idd genes that have been cloned are Idd3, for which Il2 has been implicated (Lyons et al., 2000), Idd5.1, where Ctla4 and Icos were suggested (Greve et al., 2004), and Idd5.2, where Nramp1 (Slc11a1) is a likely candidate (Kissler et al., 2006).
Our own research has focused on understanding the Idd6 locus on mouse chromosome 6 (Rogner et al., 2001, Rogner et al., 2006). We have shown that the congenic strain NOD.C3H 6.VIII (6.VIII), carrying C3H alleles at the 5.8 Mb Idd6 genetic locus, is resistant to the spontaneous development of diabetes and that splenocytes and CD4+ T cell populations from this strain suppress the development of diabetes in NOD.SCID mice more efficiently than those from NOD mice in diabetes transfer experiments. Congenic fine-mapping has further localized genetic control of the increased splenocyte suppressive activity to the 700 kb candidate Idd6.3 interval. Transcriptional profiling studies of genes in the Idd6.3 interval revealed Arntl2 as a promising candidate gene for diabetes protection (Hung et al., 2006). Arntl2 belongs to the basic helix-loop-helix/Per-Arnt-Sim (bHLH/PAS) family of transcription factors that are involved in the control of circadian rhythm (Jones, 2004). Interestingly, two other members of the ARNT family, ARNT (Gunton et al., 2005) and ARNTL1 (Woon et al., 2007, Ando et al., 2008), have already been implicated in type 2 diabetes. Taken together, these studies suggest a potentially important role for these circadian rhythm-related genes in insulin and sugar metabolism. Other members of the bHLH/PAS family, such as the aryl hydrocarbon receptor (AhR), have been shown to control key regulatory immune functions and to be involved in autoimmunity (Quintana et al., 2008, Veldhoen et al., 2008). We recently reported functional studies aimed at correlating Arntl2 expression with T cell activation and diabetes transfer. Our results show that upregulation of Arntl2 inhibits the proliferation rate of CD4+ T cells ex vivo and suppresses the disease-promoting activity of diabetogenic splenocytes in vivo, whilst suppression of Arntl2 by RNAi leads to expansion of CD4+ T cells in vivo, to decreased levels of regulatory T cells and to increased diabetes incidence (He et al., 2010a, 2010b).
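The reasoning behind congenic fine-mapping can be sketched as interval arithmetic: under the simplifying assumption that a single locus controls the phenotype, the candidate region is the part of the donor segment shared by all protected strains that is not carried by any unprotected strain. The code below illustrates this logic only. The function names are hypothetical and the coordinates are arbitrary numbers that do not describe Idd6 or any real congenic strain.

```python
def subtract(intervals, cut):
    """Remove the interval `cut` from every interval in `intervals`."""
    cut_start, cut_end = cut
    remaining = []
    for start, end in intervals:
        if cut_end <= start or cut_start >= end:
            # No overlap: keep the interval unchanged.
            remaining.append((start, end))
            continue
        if start < cut_start:
            remaining.append((start, cut_start))
        if cut_end < end:
            remaining.append((cut_end, end))
    return remaining


def candidate_intervals(strains):
    """Narrow the candidate region from congenic strain phenotypes.

    `strains` is a list of (start, end, protected) tuples describing the
    donor-derived segment carried by each strain and whether that strain
    shows the protective phenotype. Assumes a single causal locus.
    """
    protected = [(s, e) for s, e, is_protected in strains if is_protected]
    if not protected:
        return []
    low = max(s for s, _ in protected)
    high = min(e for _, e in protected)
    if low >= high:
        return []
    remaining = [(low, high)]
    for s, e, is_protected in strains:
        if not is_protected:
            remaining = subtract(remaining, (s, e))
    return remaining


# Arbitrary illustrative coordinates (not real strain data).
strains = [
    (0.0, 10.0, True),   # full donor interval, protected
    (4.0, 10.0, True),   # distal sub-interval, protected
    (0.0, 6.0, False),   # proximal sub-interval, not protected
    (8.0, 10.0, False),  # most distal sub-interval, not protected
]
result = candidate_intervals(strains)
print(result)
```

If the phenotype is in fact controlled by several interacting loci within the interval, this single-locus assumption breaks down and the strains have to be interpreted individually.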
|Available mouse models||http://jaxmice.jax.org/index.html|
|Center of Rodent Genetics||http://www.niehs.nih.gov/research/resources/collab/crg|
|Complex Trait Consortium||http://www.complextrait.org/|
|Gene expression data||http://www.informatics.jax.org/menus/expression_menu.shtml|
|Knockout mouse project||http://www.nih.gov/science/models/mouse/knockout/index.html|
|Mouse Genome Informatics database (MGI)||http://www.informatics.jax.org|
|Mouse Phenome Database||http://www.jax.org/phenome|
|Mouse sequence databases||http://www.ncbi.nlm.nih.gov|
|Online books on mouse genetics and human molecular genetics||http://www.informatics.jax.org/silver/index.shtml|
|Online Mendelian Inheritance in Man, OMIM||http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM|
|Rat Genome Database||http://rgd.mcw.edu|
The table lists useful links to existing databases and to other tools.
Despite 75 million years of separate evolution, only about 300 genes, corresponding to about 1 percent of the 25,000-30,000 genes in the mouse genome, were found to be without a counterpart in the human genome (Marshall, 2002, Okazaki et al., 2002). This leads to the idea that in the majority of cases the physiology of processes in man and mouse will be similar or identical. Most studies of mouse models of human disease are undertaken on this basis. Diseases under monogenic control have often provided support for this assumption (Villasenor et al., 2005). When mutations in a given gene failed to produce the same phenotype in human and mouse, the differences were mainly attributable to differences in physiology, to subtle differences in gene regulation, to epigenetic factors, or to differences in the specific mutation itself rather than to the complete absence of the gene. The predictive value of QTLs identified in one species for the other is generally considerably weaker. This is almost certainly due to the genetic heterogeneity underlying most complex traits, to differences in penetrance, and to differences in the range of naturally occurring variation at a given locus in man and mouse. This does not, however, necessarily imply that the underlying genetic networks are highly divergent, and in type 1 diabetes this is clearly not the case. Indeed, the identification of causative genes for type 1 diabetes (T1D) in humans and in the NOD mouse has shown that the genes and pathways contributing to susceptibility or resistance to the disease are often held in common by both species. For example, gene variants for the interacting molecules IL-2 and IL-2Rα (CD25), which are members of the same pathway that is essential for immune homeostasis, are present in both mice and humans (Wicker et al., 2005).
In this context identifying the mouse genes involved in T1D and consolidating our knowledge of the pathway underlying the pathogenesis will identify novel genes, which can be studied for their implication in the human disease.
I thank Philip Avner and Christian Boitard for scientific discussion, and the ANR, the ARD, the EFSD/JDRF/Novo Nordisk Programme, the CNRS, INSERM and the Pasteur Institute for funding of our research.