Type 1 diabetes (T1D) is an autoimmune disease characterized by immune destruction of insulin-producing pancreatic β cells, which leads to dysfunctional regulation of blood glucose levels in T1D patients. The β cells of the islets of Langerhans are destroyed by infiltrating dendritic cells, macrophages and T lymphocytes. The process starts as an autoimmune reaction and is followed later by massive destruction of β cells. Autoantibodies against T1D-specific antigens are present in serum and can be detected in the early stage of the disease (Ounissi-Benkalha & Polychronakos, 2008). There are several main types of T1D autoantibodies: islet antibodies and antibodies to insulin (IAA), glutamic acid decarboxylase (GADA) and tyrosine phosphatase IA-2. In the last few years antibodies to zinc transporter (ZnT8) have been added to this group (Mehers & Gillespie, 2008). It is generally accepted that T1D results from both genetic and environmental factors: the presence of many alleles, combined with the effects of numerous environmental factors, leads to disease development (Pociot et al., 2010). Research into the genetic basis and environmental factors of T1D has increased dramatically in the last two decades. Today it is considered that beside
2. Genetic studies
There are two main approaches to dissecting the genetic background of T1D: linkage analysis and association analysis (Figure 1). Linkage analysis is based on simple Mendelian inheritance and uses affected relatives (typically siblings) to identify chromosomal regions that are shared more frequently than expected by chance. Since affected siblings are relatively rare in T1D, linkage studies have been performed in a somewhat unique subgroup of families with T1D (Concannon et al., 2009). In general, samples are genotyped for a modestly dense panel of markers, typically microsatellites, to search for linked alleles, i.e. alleles that are inherited together. Regions of the genome with accumulating evidence of linkage are then fine mapped, which means that additional markers are typed in the same chromosomal region in order to narrow down the regions associated with disease. Linkage analyses are most effective in identifying rare alleles with large effect sizes (Figure 2) (Concannon et al., 2009). Common alleles with modest and small effect sizes, on the other hand, can be identified through association analysis. Association studies test for association of a genotyped marker (typically a single nucleotide polymorphism, SNP) with the disease of interest in a case-control or family-based sample. These studies rely on the assumption that an investigated allele is associated with disease if its frequency differs between the two investigated groups of individuals. The tested polymorphism is usually not the causative one, but it will show an association if it is in linkage disequilibrium (LD) with an unknown causative variant, whether risk or protective. The human genome is divided into regions of high and low LD. If an allele resides in a region of high LD, many SNPs from the same region are inherited together and therefore reflect one another. This means that genotyping a limited but carefully selected set of SNPs can capture the majority of the information within the tested gene region (Lander, 2011).
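The idea that one SNP can "reflect" another is usually quantified with the LD measures D, D′ and r². The following is a minimal sketch of how these are computed for two biallelic SNPs from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from any T1D dataset.

```python
from __future__ import annotations


def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    # D: deviation of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two SNPs
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Hypothetical example 1: allele A always travels with allele B (perfect proxy)
d_tag, dprime_tag, r2_tag = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical example 2: the two SNPs are statistically independent
d_ind, dprime_ind, r2_ind = ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5)
```

In the first example r² is 1, so genotyping either SNP captures the other completely; in the second r² is 0, so neither SNP carries information about the other.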
For the last decade the most common design of association analysis was the candidate gene approach, which searches for differences in allele frequencies in specifically selected genes between affected and healthy groups of individuals, or between affected subjects and their parents. Candidate gene studies have several general limitations: modest sample sizes, a limited number of investigated variants, selection of genes and variants that is often based on an inadequate understanding of biological pathways and, most importantly, associations that are usually difficult to replicate (Manolio et al., 2009). In the last few years, however, the design of association analyses has become completely dominated by the genome-wide association study (GWAS) approach. These are hypothesis-free studies that usually test between 300,000 and 1 million directly genotyped SNPs, which capture a substantial proportion of the common genetic variation of the genome (McCarthy et al. 2008). The methodology behind GWAS is the same as in any association study: susceptibility variants are mapped by identifying associations between allele (genotype) frequency and disease status (McCarthy et al. 2008). The development of both high-throughput genotyping platforms and catalogues of human variation by the International HapMap Project (http://hapmap.ncbi.nlm.nih.gov/) and the 1000 Genomes Project (http://www.1000genomes.org/) has made the wide use of GWAS possible. In addition, the development of imputation methods that infer missing genetic variants has enabled different GWAS to be included and compared in a large-scale meta-analysis framework (Marchini & Howie, 2010).
Also, huge collaborative international projects such as the Type 1 Diabetes Genetics Consortium (T1DGC) (https://www.t1dgc.org/home.cfm) have made efforts to collect and systematise data from several thousand T1D-affected and healthy individuals worldwide in order to identify genes contributing to an individual’s risk of T1D. Overall, the combination of linkage, association and large-scale GWAS approaches has provided evidence that many common and rare alleles, with a wide range of effect sizes, contribute genetically to T1D development.
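The pooling of evidence across GWAS mentioned above is commonly done with an inverse-variance weighted, fixed-effect meta-analysis of per-study effect estimates. The sketch below illustrates that general technique under stated assumptions; it is not the specific procedure used by any of the cited studies. The function name and the per-study log odds ratios and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    betas:           per-study effect estimates (e.g. log odds ratios)
    standard_errors: per-study standard errors of those estimates
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical example: two studies reporting the same log odds ratio
meta_beta, meta_se, meta_z, meta_p = fixed_effect_meta([0.1, 0.1], [0.05, 0.05])
```

With two equally sized hypothetical studies the pooled estimate is unchanged, but its standard error shrinks by a factor of √2, which is the gain in power that motivates combining studies.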
2.1. Linkage analysis approach
Genetic linkage studies have been most successful in discovering genetic loci underlying monogenic disorders, where risk factors, even if rare in frequency, have large effects and often lead to a change in amino acid sequence (Smith & Newton-Cheh 2009). In complex diseases such as T1D the situation is less straightforward, since loci of small, modest and large effect sizes all contribute to the disease (Concannon et al., 2009; Kere, 2010). Several linkage analyses of T1D provided evidence for linkage between the
2.2. Association analysis approach
2.2.1. Case–control design
The case-control design is one of the most common association study designs. A case-control study compares two groups of individuals, one with the disease (cases) and the other without the disease (controls). It is assumed that cases have a higher prevalence of susceptibility alleles for the disease of interest than controls, and that susceptibility alleles can be detected through direct comparison of allele frequencies between the two groups (McCarthy et al. 2008). A lot of attention is given to case ascertainment in order to minimize phenotypic heterogeneity. In addition, study power can be improved by restricting case selection, for example to those with extreme phenotypes (McCarthy et al. 2008). Since the incidence of T1D has been observed to increase strongly in the youngest group of children, an early age of disease onset could be an example of an extreme T1D phenotype, and enrichment of the sample set for such cases is likely to improve power (McCarthy et al. 2008; Maahs et al. 2010).
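The direct comparison of allele frequencies described above can be sketched as an allelic test on a 2×2 table of allele counts, reporting the odds ratio of the minor allele and a Pearson chi-square statistic with one degree of freedom. This is a minimal illustration, assuming unrelated individuals and no covariate adjustment; the function name and the allele counts are hypothetical.

```python
import math


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Allelic case-control test for one SNP.

    Arguments are allele counts (two alleles per genotyped individual).
    Returns (odds_ratio, chi_square, p_value) for the minor allele.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this simple sketch")
    n = a + b + c + d
    # Odds of carrying the minor allele in cases relative to controls
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (no continuity correction)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each),
# minor allele frequency 0.30 in cases and 0.25 in controls
cc_or, cc_chi2, cc_p = allelic_test(1200, 2800, 1000, 3000)
```

In practice association analyses are usually run as regression models so that covariates and ancestry can be adjusted for; the 2×2 test is only the simplest form of the comparison.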
In genetic epidemiological studies a lot of attention is also given to control selection. Controls are matched with cases by ethnicity to avoid population stratification, which may result in spurious associations (false positives). This means that controls are selected from the same population as cases, preferably from the same region (Zondervan & Cardon 2007). Nowadays, with genome-wide data, it is possible to estimate the level of relatedness among individuals and to match cases and controls by ancestry (Anderson et al. 2010). Principal component analysis is one of the most common methods for clustering individuals by ancestry (Figure 3). To further reduce stratification within the sample set, controls can also be matched to cases by age, sex and environmental factors. Association analyses are usually adjusted for covariates with a strong impact on the phenotype, to reduce the non-genetic contribution to phenotypic variation (Smith & Newton-Cheh 2009).
2.2.2. Family-based design
Family-based association studies, most commonly in the form of parent–offspring trios, use another analytical approach to test for association. To infer association with the disease, these studies test whether an allele is transmitted from heterozygous parents to affected offspring more frequently than expected by chance (Smith & Newton-Cheh 2009). Since these studies are conducted within families they offer protection from population stratification, but they also rely on informative parent–offspring trios, which usually reduces the effective sample size and thus power as well (McCarthy et al. 2008; Smith & Newton-Cheh 2009). Family-based studies are particularly useful in finding variants underlying relatively rare phenotypes that segregate within families. These studies also have advantages when the age of disease onset is low, as in the case of T1D, because this makes it easier to collect many family members (Smith & Newton-Cheh 2009).
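One widely used statistic for the transmission comparison described above is the transmission disequilibrium test (TDT), a McNemar-type chi-square on the counts of transmitted and untransmitted alleles from heterozygous parents. The sketch below assumes these counts have already been tallied from informative trios; the function name and the counts are hypothetical.

```python
import math


def transmission_disequilibrium_test(transmitted, untransmitted):
    """TDT for one allele in parent-offspring trios.

    transmitted:   times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents passed on the other allele instead
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    informative = transmitted + untransmitted
    if informative <= 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    # Under no association each allele is transmitted with probability 1/2
    chi_square = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical example: 60 transmissions versus 40 non-transmissions
tdt_chi2, tdt_p = transmission_disequilibrium_test(60, 40)
```

Only heterozygous parents contribute to the counts, which is why uninformative trios reduce the effective sample size.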
2.2.3. Genome-wide association studies (GWAS) and meta-analyses
The rationale underlying GWAS is the ‘common disease, common variant’ hypothesis. It is believed that both common and rare variants contribute to complex disease risk. However, GWAS are generally powered to detect only associations of common variants (allelic variants present in more than 5% of the population) with modest to large effect sizes (Manolio et al., 2009). GWAS are not designed to identify multiple rare mutations within a gene (Kere, 2010).
Because of their modest sample sizes, individual GWAS have limited power to detect all associations underlying complex diseases. Increasing the sample size by combining the statistical evidence of individual studies through meta-analysis can improve study power and increase the discovery of susceptibility loci. Nowadays, the majority of new genetic findings underlying complex traits come from meta-analyses. Since studies differ in design, sample collection, genotyping platform and analysis methodology, one of the most important prerequisites for meta-analysis is the ability to harmonise study results. Most genotyping platforms differ in their representation of genetic markers, and studies can be harmonised by expanding SNP coverage through imputation. Imputation infers and fills in missing genotypes on the basis of HapMap, the 1000 Genomes or other reference panels, allowing different studies to analyse the same set of common SNPs (de Bakker et al., 2010). The biggest meta-analysis for T1D combined results from two studies and included a total of 7,514 cases and 9,045 reference samples. This study identified another 18 regions associated with T1D that suggested novel candidate genes such as
2.3. Other types of genetic research
Many other types of genetic research contribute to the understanding of complex disease mechanisms. Many of these studies use the knowledge of susceptibility variants accumulated through linkage and association studies. Identified variants are usually further analysed for gene-gene and gene-environment interactions. It is also well known that genes interact through complex molecular networks, and integrating prior knowledge of the biological pathways of genes of interest may increase the chance of finding genes involved in disease development. These pathway-based analyses use different software packages that search a variety of web-based databases and take into account the existing data on biological pathways of the investigated genes (Wang et al., 2010). Cross-disorder overlap analysis is another approach, which looks for evidence of overlapping regions of the genome that affect several different diseases, such as T1D and other autoimmune traits (Eyre et al. 2010). All these supplementary analyses help to elucidate the genetic contribution to complex disease development.
Functional studies also use data derived from genetic analyses. The most commonly performed are gene expression analyses, which may investigate susceptibility genes or gene variants in different tissues or under different environmental stimuli. Combining information on gene expression profiles and alternative splicing sites across a range of human tissues with genetic mapping for the same samples will be valuable in deciphering the roles of genetic variants (McCarthy et al., 2008). Genetical genomics analysis also offers new means of understanding the genetic architecture of gene expression (Cui et al., 2010).
2.4. Finding the missing heritability
GWAS have identified more than 50 genetic variants associated with T1D. However, just as in most other complex traits, the associated variants explain only a small proportion of the heritability of T1D (they account for a sibling relative risk of λs ≈ 5, whereas the overall value is estimated to be 15) and have rather small effects on disease risk (Clayton, 2009). The remaining missing heritability can be explained in several ways: the influence of a much larger number of common variants of smaller effect sizes that still need to be identified; the influence of rare variants of modest and small effect sizes that have not yet been discovered because they are underrepresented on current genotyping platforms and because sample sizes are underpowered; the influence of structural variants, which are also poorly captured by existing platforms; and the generally low power to detect gene–gene and gene-environment interactions (Manolio et al., 2009). Sample size is generally one of the major limiting factors for the discovery of common alleles with small effect sizes. Increasing the number of investigated individuals to tens of thousands or more through meta-analysis is one way to discover new genetic loci (Lander, 2011). Sequencing, on the other hand, is the best way to discover rare and structural variants such as copy number variants, inversions, translocations, microsatellite repeat expansions, insertions of new sequence and complex rearrangements (Manolio et al., 2009). Because of the immense decrease in price, sequencing is becoming common practice, and next generation sequencing (exon or whole-genome sequencing) might provide many clues to the missing heritability. The 1000 Genomes Project (http://www.1000genomes.org/) aims to provide a complete catalogue of human genome sequence variation, and the pilot phase of the project has already identified around 15 million SNPs, 1 million short insertions and deletions and 20,000 structural variants (The 1000 Genomes Project Consortium et al., 2010).
Most of these variants were previously unknown and will provide a foundation for future genetic research into human diseases, including T1D. Sequencing of individuals with extreme phenotypes, for example individuals with an extreme age at T1D diagnosis, might also provide important findings because such individuals are thought to carry more deleterious, loss-of-function mutations (Romeo et al., 2007).
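The λs figures quoted at the start of this section can be made concrete with a short calculation. The sibling relative risk λs is the recurrence risk in siblings of an affected individual divided by the population risk. The sketch below uses the 6% sibling risk and 15-fold increase quoted elsewhere in this chapter; the implied population risk of 0.4% is derived from those two numbers and is an assumption, as is the multiplicative model under which locus-specific λs values multiply and the explained fraction is measured on the log scale.

```python
import math


def sibling_relative_risk(sibling_risk, population_risk):
    """lambda_s: recurrence risk in siblings divided by the population risk."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return sibling_risk / population_risk


def fraction_of_clustering_explained(lambda_known, lambda_total):
    """Share of familial clustering explained by known loci.

    Assumes a multiplicative model, so contributions add on the log scale.
    """
    return math.log(lambda_known) / math.log(lambda_total)


# 6% sibling risk; 0.4% population risk is an assumed value implied by the 15-fold figure
lambda_s = sibling_relative_risk(0.06, 0.004)

# Known variants account for lambda_s of about 5 out of an overall 15
explained = fraction_of_clustering_explained(5.0, 15.0)
```

Under these assumptions the known loci explain roughly 60% of the familial clustering on the log scale, which leaves a substantial remainder to be accounted for by the sources discussed above.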
It is also thought that some of the missing heritability might be discovered by conducting studies in populations of non-European ancestry. Most genetic studies have been limited to European populations, even though genetic variation is known to be highest in populations of recent African ancestry. Such studies might prove useful in detecting rare variant associations and in narrowing down associated regions owing to smaller LD windows (International HapMap Consortium et al., 2007). Family studies and isolated populations are other sample sets that might help in identifying missing heritability because they are enriched for unique genetic variants (Sabatti et al., 2009).
2.5. Prevention, diagnostics and clinical application of genetic findings
Genetic research into complex diseases aims to improve the understanding of biological and physiological pathways involved in disease etiology. The main goal is to integrate newly accumulated knowledge into clinical practice by developing more effective means of diagnosis, prevention, treatment and response to therapies. Identifying variants that predict a considerable proportion of disease is very challenging, even when many other risk variants with smaller effect sizes have been identified (Manolio et al., 2009). The biggest influence on T1D development is carried by
3. Genetic background of T1D
3.1. Rare monogenic forms of T1D
A very rare form of autoimmune diabetes is monogenic diabetes, which is caused by mutation of a single gene. In such cases, diabetes occurs as part of a set of multiple autoimmune diseases. One of them is the immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX) syndrome, in which the function of regulatory T cells is impaired (van Belle et al., 2011). It occurs as a result of mutations of
3.2. Family history of T1D
Over 85% of T1D patients do not have a positive family history of T1D; however, there is 6% disease clustering among siblings. Siblings have a 15 times greater chance of developing T1D than the general population, which gives strong evidence of the genetic background of this disease. The pattern of inheritance seems very complicated, and disease development further depends on triggers from the environment. Long-term monitoring showed that the concordance rate is greater than 50% in monozygotic twins, while it is 6-10% in dizygotic twins, which is similar to that of siblings. Interestingly, the siblings who share both identical haplotypes of
First and consistent evidence of
genes that are in high LD. Therefore, it is difficult to determine which gene gives the observed effect. It is considered that haplotypes of high risk for T1D are
3.4. Other non-
Linkage analyses additionally pointed to linkage of some non-
The first strong non-
More recently in 2004,
Another strongly associated gene,
Since 2001 a significant number of GWAS have been reported. Data from The International Type 1 Diabetes Genetics Consortium (T1DGC) collected through multiple genome-wide association studies are available to the scientific community on request. Recently, GWAS and large-scale meta-analyses identified more than 40 loci that affect the risk of developing T1D (Table 1) (Barrett et al., 2009; Pociot et al., 2010). The analysis included 7,514 cases and 9,045 control samples. Fifteen of these regions had previously been reported as associated with T1D susceptibility. Eighteen additional regions showed significant association with T1D, and several of them contain new candidate genes of possible relevance to T1D (
|SNP|Chromosome|OR (minor allele)|Gene of interest|
|rs4900384|14q32.2|1.09|(0; gene desert)|
Additional functional studies provided evidence of causality of several genes within established loci, such as several cytokines and their receptors (
There are more than 300 candidate genes in LD with T1D-associated genetic regions. It has also been shown that at least 10 T1D-associated regions do not contain a functional candidate gene, which suggests that distant, long-range gene regulation might underlie some of the observed associations. The main focus of current research is to identify causal risk genes and to understand how they influence the disease (Todd, 2010; Pociot et al., 2010). T1DGC is involved in research on many autoimmune diseases, since it is believed that many of them share a common genetic background. A genotyping assay called ImmunoChip, which includes ~200,000 SNPs that are expected to be involved in, or were previously associated with, immune reactions, was developed in order to disentangle the genetic background of various autoimmune diseases including T1D (Pociot et al., 2010).
3.5. Genetic markers in prediction and prevention of T1D
Recently, several population studies attempted to stratify children at birth according to their predisposition for T1D development by examining their
There has been a significant increase in the incidence of T1D over the last 50 years, which is mainly explained by changes in the environment. It is believed that environmental factors can affect the epigenetic mechanisms that control expression of candidate genes and thereby the development of T1D. Epigenetic mechanisms triggered by environmental factors can cause identical genotypes to exhibit different phenotypes. The proposed environmental factors that can trigger an autoimmune process involve nutrition and viruses. Nutrients that may trigger epigenetic mechanisms are considered to be substances that provide a methyl group (methionine, choline) or cofactors (folic acid, vitamin B12 and pyridoxal phosphate) required for DNA and histone methylation (Hewagama & Richardson, 2009). There are three ways in which phenotype can be altered by epigenetic modification of gene expression: methylation of DNA, histone modification and activation of microRNA. It is well known that gene expression can be silenced by methylation of cytosine in CpG dinucleotides. Acetylation, methylation, phosphorylation and ubiquitination of histones modify chromatin conformation, which can stimulate or silence gene expression. MicroRNAs bind to mRNA, causing its degradation before translation into protein. These mechanisms that alter gene expression may influence the development and function of the immune system, as well as the development, function and recovery of pancreatic β cells. Differentiation of T-helper cells is regulated by complex epigenetic control. A critical epigenetic process in T-helper cell differentiation is DNA methylation, which can affect the expression of specific cytokines (interferons, interleukins) and encourage autoreactivity. The development, function and regeneration of pancreatic β cells largely depend on the genetic profile that is expressed.
Progressive decline of pancreatic β cells in type 1 and type 2 diabetes is strongly associated with the expression of genes responsible for the development and function of β cells. It has been shown that the activity of the insulin gene depends on histone acetylation and methylation. It has also been shown that the blood glucose concentration can affect the activity of enzymes that regulate the methylation process, but this seems to be associated with type 2 diabetes (MacFarlane et al., 2009).
Numerous environmental factors are implicated in T1D development in genetically susceptible individuals. Many of these factors act in utero, in infancy and in early childhood, and are mainly associated with viral infections and diet (Norris, 2010; Roivainen & Klingel, 2010).
Viral infections are considered to be the major environmental factors predisposing to T1D. Rotaviruses, adenoviruses, retroviruses, reoviruses, cytomegalovirus, Epstein-Barr virus, mumps virus and rubella virus have all been implicated in T1D pathogenesis, but the highest risk is attributed to human enterovirus (HEV) intestinal infections. The coxsackievirus and echovirus serotypes of HEV are highly cytolytic: they can cause β cell cytolysis and activate the innate and adaptive immune systems, but can also activate autoreactive T cells (Roivainen & Klingel, 2010). A recent study examined the autoimmune microbiome for T1D and concluded that the microbiomes of healthy children differ from those of children who develop T1D later in life. This suggests that the microbiome could be used as a bacterial marker for early T1D diagnosis. The “healthy” microbiome might also be used to prevent T1D development in children at high genetic risk of developing the disease (Giongo et al. 2011).
Maternal diet during pregnancy, such as vegetable and vitamin D consumption, and early-life exposure to wheat, cow’s milk and omega-3 fatty acids are speculated to have a role in the aetiology of the disease. Introduction of cow’s milk and meat before 6 months of age has been shown to increase risk. Likewise, cereal, gluten or wheat antigens may cause an aberrant response in the developing immune system. Some other factors may protect against T1D development, such as early introduction of vegetable oil and high omega-3 fatty acid intake (Norris, 2010). A huge international collaborative effort, The Environmental Determinants of Diabetes in the Young (TEDDY) (http://teddy.epi.usf.edu/), was developed with the aim of identifying environmental factors that modify the risk of T1D (TEDDY, 2008).
It is generally considered that environmental and behavioural factors have stronger effects on disease development than the genetic loci themselves, but they are very hard to identify and measure accurately. The largest effects are expected from gene-environment interaction in individuals who are at high genetic risk of disease development (Clayton, 2009).
The main genetic predisposition to T1D comes from the HLA region. In addition, ~50 non-HLA loci that predispose to T1D have been identified to date. It is expected that the remaining T1D susceptibility will be explained by additional common and rare genetic variants, structural polymorphisms, gene-gene and gene-environment interactions and epigenetic events. The role of the associated genes and their protein products in disease aetiology is under intense investigation, but the candidacy of many loci points to the combined effect of adaptive and innate immune action in the destruction of insulin-producing β cells. Further genetic studies on much bigger datasets comprising tens of thousands of individuals, detailed genetic mapping, genotype-phenotype correlation studies and other functional studies will be crucial in deciphering the complete genetic architecture of T1D and understanding the disease mechanisms. The main goal of genetic research is to link research findings with advances in therapy, such as screening of individuals, implementation of preventive measures for those with a high genetic predisposition to T1D, and development of new, more efficient treatments and therapies. Detection of the major pathways in the development of T1D opens up new therapeutic targets, more efficient treatments and individualised approaches to patients.
We would like to thank Marina Pehlić, MD and Ante Kokan for their help in preparation of figures.