Links to databases and tools
Type 1 diabetes (T1D) affects about 0.25% of the Caucasian population and is a major health problem due to early disease onset (before the age of 20 in half of the patients), the increasing disease incidence in most populations, the absence of an effective curative treatment, and the high vascular burden associated with residual hyperglycaemia in insulin-treated T1D patients. The disease results from the autoimmune destruction of the insulin-secreting pancreatic beta-cells. Indirect evidence implicating autoimmunity in human diabetes relies on the detection of insulitis, islet cell antibodies, T cell responses to β-cell antigens and association with a restricted set of class II Major Histocompatibility Complex (MHC) haplotypes.
Type 1 or insulin-dependent diabetes (IDDM) is under both multifactorial and polygenic control, with the MHC class II locus and the insulin locus being two of the best studied genetic loci (Serreze & Leiter, 2001, Todd & Wicker, 2001, Maier & Wicker, 2005). Genetic studies of T1D aetiology in man have often proved difficult, reflecting the complexity of the genetic control, genetic heterogeneity, and in many cases the lack of intrinsic power of the studies. However, recent advances in human genetics, for example the availability of the human genome sequence, the establishment of the HapMap (Zeggini et al., 2005, Taylor et al., 2006), the development of high-throughput genotyping, the availability of larger sample collections (2007), and the increasing statistical power of the studies, have provided more promising results. The underlying genetic complexity of T1D and the difficulty of undertaking functional studies in humans provide a strong argument for undertaking complementary genetic studies in a model animal system in which genetic heterogeneity and environmental factors can be more easily controlled. Moreover, only animal model studies allow the functional genomic studies that provide definitive proof of genetic causality.
The nonobese diabetic (NOD) mouse (Makino et al., 1980, Hattori et al., 1986) is a well-characterized animal model of IDDM. The NOD mouse spontaneously develops T1D, which shares most of the characteristics of the human disease. However, one distinctive feature is that female mice have a much higher disease prevalence than male mice: approximately 80% in females versus 40% in males at 8 months of age. More than forty murine insulin dependent diabetes susceptibility loci (
2. Generation and analysis of NOD congenic mice
The mouse is the organism of choice as a model for human disease. Not only have many thousands of mutations already been isolated, with projects to inactivate all mouse genes underway (Grimm, 2006), but some 450 inbred strains with a wealth of genetic and phenotypic diversity have also been described (Beck et al., 2000). This collection of inbred strains provides a basis for studying phenotypes under complex genetic control. The Mouse Phenome Project has been organized to establish a collection of baseline phenotypic data on commonly used and genetically diverse inbred mouse strains and to make this information publicly available through a web-accessible database (see database links).
2.1. Definition of recombinant inbred, consomic and congenic strains
Different breeding systems have been established in the mouse (Figure 1). Each recombinant inbred (RI) strain contains a unique admixture of genetic contributions in approximately equal proportions from its two original progenitor inbred strains. RI strains are established by crossing animals of two inbred strains, followed by 20 or more generations of brother/sister matings (Bailey, 1971). The Complex Trait Consortium (Churchill et al., 2004) represents the largest community effort to date to generate some 1000 RIs from eight different parental strains to identify genes involved in complex disorders. Recombinant Congenic (RC) strains are also established by an initial crossing between two inbred strains, but this is followed by a few, usually two, backcrosses of the resulting F1 hybrids to one of the parental strains, called the 'recipient' strain, with subsequent brother/sister intercrossing (Demant & Hart, 1986). In both RI and RC strains the result is a mosaic genetic structure with blocks of genetic material from one parent interspersed with blocks of genetic material from the other parent. RI and RC strains differ, however, in the relative contribution of the two parents.
Consomics and congenics are inbred strains in which part of the genome of one mouse strain is transferred to another by backcrossing the donor strain to the recipient strain, followed by intercrossing in later generations to ensure homozygosity. In the case of the NOD congenic strains the recipient is the NOD mouse and the donor in most cases a C57BL/6 mouse. Genetic selection is systematically practised to ensure retention of the desired genetic material from the donor strain in each backcross. Most genetic markers and their alleles can be conveniently looked up in the Mouse Genome Informatics (MGI) database. The breeding method was first described by Snell, who produced histo-incompatible congenic strains that were originally called 'congenic resistant' strains (Snell, 1978). In the case of a consomic strain an entire chromosome is transferred (Nadeau et al., 2000, Santos et al., 2002). In the case of a congenic strain a chromosomal segment, also termed the differential segment, is transferred (Boyse & Bentley, 1977, Wakeland et al., 1997). Congenic strains will normally carry differential regions of 10-20 Mb in size (Peirce, 2001) unless specific efforts are made to reduce the size of the differential segment (see below). Many of the existing congenic strains, including many of the NOD congenics, can already be obtained directly from The Jackson Laboratory. Congenic strains need to be distinguished from co-isogenic strains, which differ from their parental strain at only a single locus (Roths et al., 1984). Co-isogenics can be derived by gene targeting, for example through homologous recombination, or by mutagenesis approaches. All the strains described above have in common that they allow repeated phenotyping of large numbers of genetically homogeneous animals under well-defined environmental conditions. This would never be possible in studies of human subjects.
Recombinant inbred strains are generated by intercrosses of F1 mice and subsequent brother-sister interbreeding. Consomic strains are generated by repeated backcrossing to the parental receiver strain (NOD mouse), but only one chromosome is derived from the diabetes resistant donor strain. In congenic strains only the differential chromosome segment is derived from the donor strain.
2.2. Establishment of congenic strains
The starting point for the identification of a genetic locus controlling a complex trait, or quantitative trait locus (QTL), or in the case of T1D an
Alternatives to performing new crosses involve the use of existing sets of recombinant inbred lines (RI) and/or recombinant congenic (RC) strains. The analytical power of an RI set depends on the number of generated lines and the degree of genetic/phenotypic variation in the parental strains. Whilst RI sets can deliver higher mapping resolution than F2 mice (Flint et al., 2005), current RI sets will often have insufficient power to identify genes with a small effect on the QTL. In contrast to RI sets, sets of RC strains have the property of limiting the amount of the genome that has to be searched for multiple genes involved in QTLs as long as they have been selected for the phenotype of interest. The genome of a standard RC strain comprises, on average, only 12.5% from the donor strain (Stassen et al., 1996).
Consomic and especially congenic strains are key resources for the dissection of T1D QTLs. Understanding the role of an individual QTL is often hampered by the complexity of its genetic and phenotypic interactions with other participating QTLs. Considering susceptibility to type 1 diabetes and the well-known disease model for type 1 diabetes, the nonobese diabetic (NOD) mouse (Makino et al., 1980, Hattori et al., 1986), it turns out that diabetes-sensitive strains such as NOD carry not only diabetes sensitivity loci but also loci conferring diabetes resistance. Conversely, diabetes-resistant strains such as C57BL/6 and C3H/HeJ (Rogner et al., 2001) carry loci conferring diabetes susceptibility as well as resistance genes. For example, on mouse chromosome 6 we have identified three loci involved in T1D, termed
Congenic strains are derived by repeated backcrossing of the donor strain to the recipient strain with selection for the differential segment. This breeding is then followed by brother/sister interbreeding of the backcrossed progeny. In practice, female F1 animals are mated with recipient strain males to establish the BC1 generation. If F1 males are used at this stage, the final congenic strain may still carry the donor-derived Y chromosome. Males heterozygous for the selected chromosome region are then repeatedly backcrossed to recipient females (e.g. NOD females) during congenic strain derivation. Congenic strains are then rendered homozygous for the genetic intervals under study by intercrossing heterozygous males and females of the same genotype and are subsequently maintained by brother/sister mating. When repeated backcrossing is used to establish a congenic strain, a minimum of nine generations of backcrossing is normally recommended to remove 99.9% of the unwanted donor material (Silver, 1995), though the exact number of required backcross generations appears somewhat arbitrary (Festing, 1979). A genome scan should be carried out before fixing the congenic interval so that, if further backcrossing is necessary to remove a contaminating genomic fragment, this can be done before the congenic strain is rendered homozygous. When choosing recombinants to fix genetic intervals, we recommend ensuring the highest possible density of markers within the differential fragment (at least one marker every 1-2 cM) to avoid partial heterozygosity. It is also advisable to check the interval of interest for possible residual heterozygosity once the strains have been fixed, because recombination may have occurred in the parental heterozygous animals.
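The 99.9% figure quoted above, and the 12.5% donor contribution of a standard RC strain, both follow from the expectation that each backcross halves the donor contribution unlinked to the selected segment. A minimal Python sketch of that arithmetic, assuming no marker-assisted selection against the background and ignoring the differential segment itself (the function names are illustrative and not taken from the cited works):

```python
def donor_genome_fraction(backcrosses: int) -> float:
    """Expected fraction of the unlinked genome that is of donor origin.

    The F1 (zero backcrosses) carries 50% donor genome; each backcross
    to the recipient strain halves this value on average.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 0.5 ** (backcrosses + 1)


def heterozygous_fraction(backcrosses: int) -> float:
    """Expected fraction of unlinked loci still heterozygous (donor/recipient)."""
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 0.5 ** backcrosses


if __name__ == "__main__":
    for n in (2, 4, 9):
        pct = 100 * donor_genome_fraction(n)
        print(f"{n} backcrosses: {pct:.2f}% donor genome expected")
```

Under these assumptions two backcrosses leave 12.5% donor genome, and nine backcrosses leave about 0.1%, i.e. roughly 99.9% recipient genome. These are averages; individual animals vary around them, which is why a genome scan before fixing the interval remains advisable.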
Alternative congenic breeding schemes have been established that involve both positive selection for the desired differential segment and negative selection against the rest of the donor genome during early backcross generations. In such breeding schemes, which are called 'speed congenics', the genetically 'best' animals, i.e. those carrying the differential segment and minimal detectable donor strain material elsewhere in the genome, are selected. Theoretically the process can lead to the creation of a congenic strain with less than 0.5% contaminating donor genome unlinked to the differential segment within a total of five generations or four backcrosses (Markel et al., 1997). Simulations suggest that screening between 16 and 20 male progeny per generation with markers spaced every 25 cM most efficiently reduces unlinked contaminating donor genome. Use of larger progeny cohorts and higher marker density seems of little advantage in reducing contaminating donor genome until later backcross generations. High-density genotyping of the differential segment in later generations is however necessary to reduce the size of the target region below 20-30 cM (Wakeland et al., 1997). Experience suggests that both 'best' and 'second best' males should routinely be kept for breeding, in particular when poor breeding performance may occur. Simulation studies have suggested that marker-assisted breeding strategies can lead to increased background heterogeneity, or 'gaps', in the recipient genetic background as compared to standard breeding procedures. This suggests that additional backcrossing may still be required in order to reduce the number and length of such gaps (Armstrong et al., 2006). On the assumption of putative remaining gaps, it may be of interest to derive a given set of congenic strains from a single breeding pair and to generate at least one congenic strain carrying no differential fragment as an internal control for phenotyping.
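The selection step of a 'speed congenic' scheme can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every backcross animal is scored at each marker as "H" (heterozygous, donor allele present) or "R" (homozygous recipient), all markers are weighted equally, and the marker names and animal identifiers are hypothetical.

```python
from typing import Dict, Iterable, List

# marker name -> "H" (heterozygous, donor allele present) or "R" (homozygous recipient)
Genotype = Dict[str, str]


def rank_speed_congenic_candidates(
    progeny: Dict[str, Genotype],
    target_markers: Iterable[str],
    n_keep: int = 2,
) -> List[str]:
    """Return the 'best' and 'second best' animals of one backcross generation.

    Positive selection: the animal must be heterozygous at every target marker.
    Negative selection: among those, prefer the fewest donor alleles elsewhere.
    """
    target = set(target_markers)
    candidates = []
    for animal_id, genotype in progeny.items():
        if any(genotype.get(marker) != "H" for marker in target):
            continue  # differential segment lost or incomplete
        background_donor = sum(
            1 for marker, call in genotype.items()
            if marker not in target and call == "H"
        )
        candidates.append((background_donor, animal_id))
    candidates.sort()  # fewest background donor markers first, ties by id
    return [animal_id for _, animal_id in candidates[:n_keep]]


if __name__ == "__main__":
    # Hypothetical genotypes: T1/T2 flank the differential segment, B1-B3 are background markers.
    males = {
        "m1": {"T1": "H", "T2": "H", "B1": "H", "B2": "R", "B3": "R"},
        "m2": {"T1": "H", "T2": "R", "B1": "R", "B2": "R", "B3": "R"},
        "m3": {"T1": "H", "T2": "H", "B1": "R", "B2": "R", "B3": "R"},
        "m4": {"T1": "H", "T2": "H", "B1": "H", "B2": "H", "B3": "R"},
    }
    print(rank_speed_congenic_candidates(males, ["T1", "T2"]))
```

Keeping two animals by default mirrors the advice above to retain both the 'best' and 'second best' males. A real scheme would also weight markers by the length of genome they tag rather than simply counting them.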
The benefit that can be obtained from a panel of congenic strains is critically dependent on the quality of the phenotyping available, which in turn, obviously depends on the disease under study. The availability of sub-phenotypes for characterisation is often critical to the fine dissection of the trait. Analysis often starts with the most robust and basic phenotype before proceeding to more subtle analysis of sub-phenotypes. Phenotyping employed in autoimmune disorders ranges from histology, evaluation of physiological parameters, to metabolomics and transcriptional profiling.
2.3. Methods for the analysis of type 1 diabetes in the NOD mouse model
The baseline for a systematic analysis of a given NOD congenic strain, compared to the original NOD mouse or, better still, to a NOD control congenic strain, is the follow-up of spontaneous diabetes development over a period of about 30 weeks. Overt diabetes starts at around 10 to 12 weeks of age. For the monitoring of diabetes, measurement of glucose levels in the urine is usually sufficient. This simple test is in itself minimally invasive, but importantly, the number of animals in the test series should be high enough to allow the detection of small changes in diabetes incidence. At this stage, generally 30 to 50 female mice, and sometimes more, are required. Using such a high number of animals avoids overestimating changes that are due to 'environmental' effects, e.g. the cage-specific effects that are sometimes observed. Some investigators prefer to keep their congenics together with the NOD controls in the same cage during diabetes testing. Young female animals from different NOD congenic strains usually do not attack each other when kept in the same cage, but close attention should still be paid to the behaviour of the animals. Overcrowded cages lead to additional stress amongst the animals and this can influence diabetes incidence. Also, pregnant females should not be included in the testing because hormonal changes influence diabetes development.
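Follow-up data of this kind are commonly summarised as a cumulative incidence curve. The sketch below uses a Kaplan-Meier estimate, which is our choice for illustration rather than a method prescribed here; it assumes each animal is recorded as (age in weeks at diabetes onset or at last observation, whether it became diabetic), so that animals removed early or still healthy at 30 weeks are treated as censored. The example records are invented.

```python
from typing import List, Tuple

Record = Tuple[float, bool]  # (age in weeks at onset or last observation, became diabetic?)


def cumulative_incidence(records: List[Record], horizon_weeks: float = 30) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of cumulative diabetes incidence up to the horizon.

    Returns (age, cumulative incidence) pairs at each age where an onset occurred.
    """
    onset_ages = sorted({age for age, diabetic in records if diabetic and age <= horizon_weeks})
    diabetes_free = 1.0
    curve = []
    for t in onset_ages:
        at_risk = sum(1 for age, _ in records if age >= t)
        onsets = sum(1 for age, diabetic in records if diabetic and age == t)
        diabetes_free *= 1 - onsets / at_risk
        curve.append((t, 1 - diabetes_free))
    return curve


if __name__ == "__main__":
    # Invented example: eight females, four onsets, one animal lost at 20 weeks.
    cohort = [(12, True), (14, True), (14, True), (18, True),
              (20, False), (30, False), (30, False), (30, False)]
    for age, incidence in cumulative_incidence(cohort):
        print(f"week {age}: {100 * incidence:.1f}% diabetic")
```

Two strains can then be compared with a log-rank test from a statistics package. With cohorts of 30 to 50 animals only fairly large differences in incidence will reach significance, which is one reason the sub-phenotypes described below are useful.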
Additional monitoring of diabetes incidence may involve an accelerated form of diabetes after injection of drugs such as cyclophosphamide (CY). CY is an alkylating agent that leads to the depletion of regulatory T cells, whereas IFN-gamma producing lymphocytes are CY resistant (Ablamunits et al., 1999). This method allows diabetes incidence to be evaluated within a much shorter period of only 12 weeks, but is not necessarily appropriate for analysing all T1D-associated loci. For example, loci that confer diabetes protection through the activity of regulatory T cells may not be discovered because CY eliminates these cells (Rogner et al., 2001). Both methods of monitoring T1D incidence can be combined with the evaluation of insulitis by rating the infiltration of the pancreatic islets with immune cells. This histological analysis is mostly performed at 12 weeks of age and usually requires about 10 animals per strain. It can be performed at earlier ages, but much less insulitis and peri-insulitis should be expected, and usually no signs of insulitis are found before the age of three weeks.
The systematic evaluation of the number of immune cells (T cells, B cells and other relevant cell types) in different organs (thymus, spleen, lymph nodes, islets, etc.), and of cytokine, insulin and antigen levels, can be very helpful in gaining further insight into the protective mechanism underlying the T1D locus under study. The number of animals used for these tests can often be limited to six, and different tests can be combined at this stage. It is also useful to perform the tests at different ages (4, 8, 12, and 16 weeks) to relate phenotypes to different stages of diabetes development. Finally, extended testing of cell activation or proliferation and adoptive transfer assays are good methods to complete the analysis. Since several examples have shown that T1D loci overlap with other autoimmune loci, NOD congenic strains may exhibit phenotypes in organs and tissues other than those directly involved in diabetes. One typical and well-studied example of this is sialitis (Hjelmervik et al., 2007). In any case, the defined subphenotypes may be very helpful in following up QTLs during genetic dissection, as they may vary less than the diabetes incidence.
2.4. Refining the candidate region
2.4.1. Generation of subcongenic strains
The genetic interval conferring a particular phenotype in a given congenic strain can often be reduced and refined by identification of new recombinants during further backcrossing. It becomes increasingly difficult to obtain the necessary recombinants as the genetic distance under study is reduced and then much larger breeding populations are needed. Several studies have shown the existence of sex-specific differences in recombination frequency (Shiroishi et al., 1991, Lynn et al., 2005, Morelli & Cohen, 2005) and it can therefore, on occasions, be useful to change the direction of the cross and use heterozygous females instead of males or
When analysing congenic strains, it has been observed that the phenotypic effects often get smaller as the genetic interval is reduced and subcongenics are generated (Hung et al., 2006). This most often occurs when the original effect was due to the combination of several genes, and this may reflect the relatively frequent occurrence of QTLs as haplotype blocks. In other cases, genetic interactions may lead to the suppression of phenotypes when intervals are combined (Rogner et al., 2001). In these cases, complexity within the genetic interval can normally still be successfully addressed, although it may require larger numbers of animals and more strains to be phenotyped and studied.
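The growing difficulty of recovering recombinants as the interval shrinks can be made concrete with a rough calculation. The sketch below assumes that 1 cM corresponds to a 1% chance of a recombinant per backcross animal, which is reasonable only for small intervals, and it ignores interference and the sex-specific differences in recombination frequency mentioned above. The function name and the 95% default are illustrative choices.

```python
import math


def progeny_to_screen(interval_cm: float, confidence: float = 0.95) -> int:
    """Number of backcross progeny needed to see at least one recombinant
    within the interval with the given probability."""
    if not 0 < interval_cm < 50:
        raise ValueError("interval_cm must be between 0 and 50 cM")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    p_recombinant = interval_cm / 100.0
    # P(no recombinant among n animals) = (1 - p)^n; solve for the smallest n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_recombinant))


if __name__ == "__main__":
    for size in (10, 5, 1):
        print(f"{size} cM interval: screen about {progeny_to_screen(size)} progeny")
```

Under these assumptions a 10 cM interval calls for about 29 progeny, whereas a 1 cM interval calls for about 299, a tenfold increase for a tenfold smaller interval.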
2.4.2. Haplotype analysis
Applied to congenic strain analysis, knowledge of the haplotype block structure within a congenic interval may focus interest on a particular subregion within the congenic candidate region (Figure 2). There are several examples in the literature that have demonstrated the value of this type of combined approach (Ikegami et al., 1995, Lyons et al., 2000, Hillebrandt et al., 2005). Haplotype mapping in the NOD mouse appears however less powerful as only one sensitive strain can be compared to the panel of diabetes resistant strains.
In whatever way
Comparison of the SNP distribution within the candidate interval between disease sensitive (NOD, chromosome represented by a white bar) and resistant (black segments) strains and generation of subcongenic strains may allow the initial candidate region to be reduced. The reduced interval is indicated by two black lines.
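The comparison of SNP distributions between the sensitive and resistant strains can be illustrated with a small filter. In this sketch a SNP is retained only if the NOD allele differs from that of every resistant strain considered; positions where NOD shares an allele with a resistant strain are set aside as less likely to explain the difference. The positions and alleles are invented, and real analyses would work on haplotype blocks rather than single SNPs.

```python
from typing import Dict, List, Sequence

# position (bp) -> {strain name: allele}
SnpTable = Dict[int, Dict[str, str]]


def nod_specific_snps(
    snps: SnpTable,
    sensitive: str = "NOD",
    resistant: Sequence[str] = ("C57BL/6", "C3H/HeJ"),
) -> List[int]:
    """Return sorted positions where the sensitive strain differs from all resistant strains."""
    return [
        position
        for position, alleles in sorted(snps.items())
        if all(alleles[sensitive] != alleles[strain] for strain in resistant)
    ]


if __name__ == "__main__":
    # Invented SNP calls within a hypothetical candidate interval.
    interval = {
        1000: {"NOD": "A", "C57BL/6": "G", "C3H/HeJ": "G"},
        2500: {"NOD": "C", "C57BL/6": "C", "C3H/HeJ": "T"},
        4000: {"NOD": "T", "C57BL/6": "C", "C3H/HeJ": "C"},
        5200: {"NOD": "G", "C57BL/6": "G", "C3H/HeJ": "G"},
    }
    print(nod_specific_snps(interval))
```

Because only one sensitive strain is available, such a filter narrows the interval less than it would in a panel with several sensitive strains, which is the limitation noted above.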
2.5. Transcriptional profiling
The use of expression profiling for candidate gene identification is based around the idea that in many cases the QTL will reflect also quantitative changes in the expression of the underlying gene(s). This approach has proven to be particularly successful in cases where the phenotype of the disease under study has provided clues as to the class(es) of genes or to the tissues in which the candidate gene is likely to be expressed. Annotated lists of genes for the region under study obtained from e.g. Ensembl, MGI or NCBI databases, including
A recent Genome-Wide Association Study (GWAS) has defined over 150 genomic regions containing variation predisposing to immune-mediated disease. The results provide evidence that for many of the complex diseases common genetic associations implicate regions encoding proteins that physically interact (Rossin et al., 2011). This type of analysis demonstrates how bioinformatics tools can be applied to the identification of gene networks in complex diseases.
2.6. Resequencing and search for polymorphisms
Once a candidate region has been defined and characterized, the crucial difficulty of identifying and validating the causative gene(s) arises. It is important to realise that there is rarely one single approach but rather a spectrum of complementary approaches that can be used to identify without ambiguity the gene(s) underlying a QTL.
The identification of changes in primary nucleotide sequence, which is powerfully diagnostic in the case of mutations in monogenic disorders, is of much less certain value in the case of QTL characterisation. Nonsense mutations (premature stop codons) that completely abolish gene function are much less likely to underlie QTL variation than monogenic traits, and it is often unknown whether to expect changes in coding sequences or in non-coding regulatory sequences. Extensive nucleotide variation may also occur outside the exon sequences of some genes, which renders the identification of causal polymorphisms in such regions more difficult or impossible. This indeterminacy may moreover be compounded if the causal gene lies within a region that shows a high degree of polymorphism between the parental strains. In such cases the polymorphism within the genomic sequence of the causal gene will likely be no more marked than that of the surrounding genes. Such regions of elevated polymorphism are found throughout the mouse genome and reflect the breeding history of the mouse inbred strains (see above). Such caveats suggest that resequencing of entire regions in the donor and recipient strains is an approach that can help to exclude a certain proportion of candidate genes rather than lead to the unambiguous identification of the responsible gene. The researcher can already benefit from the ongoing resequencing projects that include the NOD mouse.
2.7. Further validation: overlap with other autoimmune loci and interspecies comparison
Genome-scale analysis in type 1 diabetes has identified a number of non-major histocompatibility complex loci of varying levels of statistical significance. Comparative analysis of the position of loci for type 1 diabetes with candidate loci from other autoimmune diseases has shown considerable overlap (Becker et al., 1998). This supports the hypothesis that the underlying genetic susceptibility to type 1 diabetes may be shared with other clinically distinct autoimmune diseases such as systemic lupus erythematosus, multiple sclerosis, and Crohn's disease. This is less surprising when looking at the cellular and molecular levels, which indicate similar mechanisms and pathways underpinning the diseases. An interesting way to further analyse T1D regions of interest is therefore to study their link to other autoimmune diseases (Jiang et al., 2009). For example, several loci involved in peripheral neuropathy overlap with T1D loci in the NOD mouse. The MGI database lists phenotypes that have already been linked to genomic regions. The T1D database directly provides all known locations and genes in human and in mouse. Another approach is to analyse the syntenic regions in human and in rat. The Rat Genome Database provides a tool to visualize syntenic regions for all three species. The Davis Human/Mouse Homology Map shows syntenic regions between mouse and human.
2.8. Gene manipulation in the NOD mouse
Neither the identification of sequence variation nor that of altered expression profiles is in itself sufficient to establish causality. For this, gene inactivation, gene overexpression and replacement of the allele of one strain by that of the other strain need to be undertaken. Ongoing programmes for inactivating or mutating all mouse genes will increasingly provide embryonic stem (ES) cells carrying a knockout for the candidate genes under study. Interestingly, some of the constructs used in the global knockout programmes may facilitate a knockin strategy, which would enable the integration of alternative functional allelic forms of the gene (Garcia-Otin & Guillou, 2006). Database links that provide information about existing knockouts and mutants are listed below. Exploiting such resources for QTL validation may in many cases be complicated by the need to cross the knockout onto the relevant genetic background. Direct gene targeting and the derivation of mouse models on the T1D background are, however, possible since a germline-competent embryonic stem cell line from the nonobese diabetic mouse has been established (Nichols et al., 2009).
An alternative strategy to gene disabling is the use of small interfering RNA (siRNA) to inhibit or knock down gene expression. RNA interference (RNAi) is an evolutionarily highly conserved process of post-transcriptional gene silencing (PTGS) in which double-stranded RNA (dsRNA), when introduced into a cell, causes sequence-specific degradation of homologous mRNA sequences (Fire et al., 1998). Double-stranded RNAs of around 21 nucleotides in length inhibit the expression of specific genes (Hasuwa et al., 2002, Xia et al., 2002, Qin et al., 2003). Whilst it has turned out that RNA silencing can be causative in disease development, it also provides a useful research tool in type 1 diabetes (Kissler et al., 2006). The RNAi WEB database provides a good summary of such studies and excellent practical advice. The long-term and stable inhibition of gene function normally necessary for assessing the effect of a QTL implies that delivery systems capable of supporting stable and persistent expression
Overexpression studies provide an alternative, if less stringent, way of either further reducing the size of the candidate region or formally validating candidate genes when the trait under consideration is dominantly or co-dominantly expressed (Symula et al., 1999, Giraldo & Montoliu, 2001). Such approaches are often carried out using BAC clones, which may be one to several hundred kilobases in size and therefore allow the gene to be tested along with many of the cis-acting sequences necessary for its regulated expression. Such studies are being facilitated by the construction of BAC libraries for many mouse strains other than C57BL/6 and 129/Sv (Osoegawa et al., 2000). The fingerprinting of these libraries, the ready availability of both draft and finished mouse genomic sequences (Waterston et al., 2002) and strain resequencing programmes allow the DNA hybridisation probes necessary for BAC isolation to be designed easily. Although BAC transgenesis is an efficient process, it should be noted that both copy number variation and variation in the site of integration in the genome, leading to position effects, may modify gene expression profiles, which might hamper or obscure the identification of the gene where subtle phenotypes are concerned.
It should be noted that less laborious but also less complete approaches include the transfection of particular cell types prior to adoptive transfer experiments and
2.9. Identification of new candidate genes using NOD congenic strains
The use of congenic strains has proven to be a successful approach to the identification of several T1D candidate genes. Amongst non-major Histocompatibility Complex (MHC)
Our own research has focused on understanding the
| Resource | URL |
|---|---|
| Available mouse models | http://jaxmice.jax.org/index.html |
| Center of Rodent Genetics | http://www.niehs.nih.gov/research/resources/collab/crg |
| Complex Trait Consortium | http://www.complextrait.org/ |
| Gene expression data | http://www.informatics.jax.org/menus/expression_menu.shtml |
| Knockout mouse project | http://www.nih.gov/science/models/mouse/knockout/index.html |
| Mouse Genome Informatics database (MGI) | http://www.informatics.jax.org |
| Mouse Phenome Database | http://www.jax.org/phenome |
| Mouse sequence databases | http://www.ncbi.nlm.nih.gov |
| Online books on mouse genetics and human molecular genetics | http://www.informatics.jax.org/silver/index.shtml |
| Online Mendelian Inheritance in Man, OMIM | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM |
| Rat Genome Database | http://rgd.mcw.edu |
The table lists useful links to existing databases and to other tools.
Despite the past 75 million years of separate evolution, only about 300 genes, corresponding to 1 percent of the 25,000-30,000 genes in the mouse genome, were found to be without a counterpart in the human genome (Marshall, 2002, Okazaki et al., 2002). This leads to the idea that in the majority of cases the physiology of processes in man and mouse will be similar or identical. Most studies of mouse models of human disease are undertaken on this basis. Diseases under monogenic control have often provided support for this assumption (Villasenor et al., 2005). When mutations in a given gene failed to produce the same phenotype in human and mouse, the differences were mainly attributable to differences in physiology, to subtle differences in gene regulation, to epigenetic factors, or to differences in the specific mutation itself rather than to the complete absence of the gene. The predictive value of QTLs identified in one species for the other is generally considerably weaker. This is almost certainly due to the genetic heterogeneity underlying most complex traits, to differences in penetrance, and to differences in the range of naturally occurring variation at a given locus in man and mouse. This, however, does not necessarily imply that the underlying genetic networks are highly divergent, and in type 1 diabetes this is clearly not the case. Indeed, the identification of causative genes for T1D in humans and in the NOD mouse has shown that the genes and pathways contributing to susceptibility or resistance to the disease are often held in common by both species. For example, gene variants for the interacting molecules IL2 and IL-2R-α (CD25), which are members of the same pathway that is essential for immune homeostasis, are present in both mice and humans (Wicker et al., 2005).
In this context identifying the mouse genes involved in T1D and consolidating our knowledge of the pathway underlying the pathogenesis will identify novel genes, which can be studied for their implication in the human disease.
I thank Philip Avner and Christian Boitard for scientific discussion, and the ANR, the ARD, the EFSD/JDRF/Novo Nordisk Programme, the CNRS, INSERM and the Pasteur Institute for funding of our research.