T1D susceptibility loci identified to date.
The prevalence of diabetes is increasing worldwide and to date it impacts the lives of approximately 200 million people (Steyn et al., 2009). It is estimated that by 2030, there will be 439 million adults affected by diabetes (International Diabetes Federation/diabetes prevalence: www.idf.org). Type 1 diabetes (T1D) represents approximately 10% of these patients and is most prevalent in populations of European ancestry, where there is ample evidence of increased annual incidence during the past five decades (Onkamo et al., 1999; EURODIAB ACE Study Group, 2000).
T1D is a complex trait that results from the interplay between environmental and genetic factors. Much evidence supports a strong genetic component associated with T1D. The epidemiological data showing differences in geographic prevalence is one clear indicator, with populations of European ancestry having the highest presentation rate. T1D has high concordance among monozygotic twins (33 to 42%) (Redondo et al., 2001) and runs strongly in families with sibling risk being approximately 10 times greater than in the general population (Clayton, 2009); this is in clear contrast to the “less genetic” type 2 diabetes, where the sibling risk ratio is relatively modest at 3.5 (Rich, 1990).
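The sibling risk figures quoted above can be written compactly as a ratio. As a sketch, using notation that is assumed here rather than taken from the cited studies (K for the risk in the general population and K_s for the risk to a sibling of an affected individual), the sibling risk ratio is:

```latex
\lambda_s = \frac{K_s}{K}, \qquad
\lambda_s^{\mathrm{T1D}} \approx 10, \qquad
\lambda_s^{\mathrm{T2D}} \approx 3.5
```

A value of \lambda_s = 1 would indicate no familial clustering. Values above 1 are consistent with a genetic contribution, although shared family environment can also inflate the ratio.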
T1D develops at all ages and occurs through the autoimmune destruction of pancreatic β-cells with resulting lack of insulin production. The immune system participates in β-cell destruction through several of its components including natural killer (NK) cells, B lymphocytes, macrophages, dendritic cells (DC), and antigen-presenting cells (APCs). Studies in human and animal models have shown that both innate and adaptive immune responses participate in disease pathogenesis, possibly reflecting the multifactorial nature of this autoimmune disorder.
In this review, we provide an update on genome-wide association study (GWAS) discoveries to date and discuss the latest associated regions added to the growing repertoire of gene networks predisposing to T1D.
2. Genetic component in Type 1 diabetes
2.1. Before genome-wide association studies
Historically, prior to GWAS, only six loci had been fully established to be associated with T1D. The human leukocyte antigen (HLA) region on chromosome 6p21 was the first known candidate to be strongly associated with T1D, in the 1970s (Singal & Blajchman, 1973; Nerup et al., 1974; Cudworth & Woodrow, 1975). This cluster of homologous cell-surface proteins is divided into class I (A, B, C) and class II (DP, DQ, DR). The HLA genes encode highly polymorphic proteins, which are essential in self versus non-self immune recognition. The class I molecules are ubiquitously expressed and present intracellular antigen to CD8+ T cells. Class II molecules are expressed mainly on professional APCs: DCs, macrophages, B-lymphocytes and thymus epithelium. Class II molecules are composed of A and B chains, and present antigens to CD4+ T cells, which promote inflammation by secreting cytokines upon recognition of their specific targets. Approximately half of the genetic risk for T1D is conferred by the genomic region harboring the HLA class II genes (primarily the HLA-DRB1, -DQA1 and -DQB1 genes). In 1984, the insulin (INS) gene on chromosome 11p15 was identified as the second locus linked with T1D (Bell et al., 1984). In 1996, the cytotoxic T-lymphocyte-associated protein 4 (CTLA4) gene on chromosome 2q33 was recognized as the third locus (Nistico et al., 1996). In 2004, the protein tyrosine phosphatase, non-receptor type 22 (PTPN22) gene on chromosome 1p13 was found to be associated with susceptibility to T1D in a case-control study (Bottini et al., 2004). Vella et al., 2005 reported the interleukin 2 receptor alpha (IL2RA) gene on chromosome 10p15 as the fifth T1D locus. In 2006, Smyth et al. identified the interferon-induced with helicase C domain 1 (IFIH1) gene on chromosome 2q24.3 as the sixth gene to be strongly associated with T1D.
2.2. GWAS of T1D
The advent of GWAS in the mid-2000s changed the situation dramatically, increasing the pace and efficiency of discovery of T1D-associated loci by a factor of ten. The critical platform for this work was laid by the HapMap project (International HapMap Consortium, 2003, 2005). The GWAS approach has been made possible by the development of high-density genotyping arrays. The genome is laid out in discrete linkage disequilibrium (LD) blocks with limited haplotype diversity within each of these blocks. Therefore, a minimal set of single nucleotide polymorphisms (SNPs) can detect almost all common haplotypes present, thus improving genotyping efficiency and reducing cost.
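The idea that a few SNPs can stand in for a whole LD block can be illustrated with a small sketch. The code below is only an illustration, not the procedure used to design any particular genotyping array: it computes pairwise r² between biallelic SNPs on a set of toy haplotypes and then greedily picks tag SNPs so that every SNP is captured by at least one tag. The haplotype data, the function names and the r² threshold of 0.8 (a commonly used convention) are all assumptions made for this example.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    pa = sum(a) / n                                   # allele frequency at SNP a
    pb = sum(b) / n                                   # allele frequency at SNP b
    pab = sum(x & y for x, y in zip(a, b)) / n        # frequency of the 1-1 haplotype
    d = pab - pa * pb                                 # LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose tag SNPs so each SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # covers[i] is the set of SNPs that SNP i would capture if chosen as a tag
    covers = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP capturing the most untagged SNPs; ties go to the lowest index
        best = max(range(n_snps), key=lambda i: (len(covers[i] & untagged), -i))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical haplotypes: SNPs 0-2 form one block, SNPs 3-4 another, SNP 5 stands alone
HAPLOTYPES = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]

tags = greedy_tag_snps(HAPLOTYPES)
print(f"{len(tags)} tag SNPs capture {len(HAPLOTYPES[0])} SNPs: {tags}")
```

In this toy data set, genotyping one SNP per block recovers the information carried by all six, which is the economy that LD-based array design relies on. Real tagging is done on reference panels such as HapMap and has to deal with phasing, missing data and rare variants, none of which are modeled here.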
The first full-scale GWAS for T1D were published in 2007 by our group (Hakonarson et al., 2007) and the Wellcome Trust Case-Control Consortium (WTCCC, 2007). We examined a large pediatric cohort of European descent using the Illumina HumanHap 550 BeadChip platform. The design involved 561 cases, 1,143 controls and 467 triads in the discovery stage, followed by a replication effort in 939 nuclear families. In addition to finding the “usual” suspects, including an impressive 392 SNPs capturing the very strong association across the major histocompatibility complex (MHC), we identified significant association with variation at the KIAA0350 gene, which we replicated in an additional cohort. The WTCCC study investigated seven common complex diseases including T1D by genotyping 2,000 cases and 3,000 controls with ~500,000 SNPs using the Affymetrix GeneChip, and reported a number of novel T1D loci, including the KIAA0350 genomic region (WTCCC, 2007). Todd et al., 2007 confirmed these findings using 4,000 cases, 5,000 controls and 3,000 T1D families, and also confirmed the association to the 12q13 region reported in the WTCCC study. In a separate effort we fast-tracked 24 SNPs at 23 distinct loci from our original study and established association to the 12q13 region with a combined
2.3. Meta-analyses of T1D GWAS datasets
To get the most from GWAS and to increase statistical power, several independent research groups carried out meta-analyses using datasets from different investigative groups. Cooper et al., 2008 performed the first meta-analysis using T1D datasets from the WTCCC, 2007 and the Genetics of Kidneys in Diabetes (GoKind) study (Mueller et al., 2006; Manolio et al., 2007), and confirmed associations for PTPN22, CTLA4, MHC, IL2RA, 12q13, 12q24, CLEC16A and PTPN2. The SNPs with lowest nominal
The meta-analysis by Barrett et al., 2009 uncovered in excess of 40 loci, including 18 novel regions, and confirmed a number of previously reported loci (Smyth et al., 2008; Fung et al., 2009; Cooper et al., 2009). The study included samples from the WTCCC, 2007, the GoKind study (Mueller et al., 2006) and controls and family sets from the Type 1 Diabetes Genetics Consortium (T1DGC). The meta-analysis observed association to 1q32.1 (which harbors the immunoregulatory interleukin genes IL10, IL19 and IL20), 9p24.2 (which contains only the Glis family zinc finger protein 3 gene, GLIS3; first suggested by us in Grant et al., 2009), 12p13.31 (which harbors a number of immunoregulatory genes including CD69) and 16p11.2 (harboring IL27). These findings were further supported by our
To identify additional genetic loci for T1D susceptibility, we examined associations in the largest meta-analysis to date between the disease and ~2.54 million genotyped and imputed SNPs in a combined cohort of 9,934 cases and 16,956 controls (Bradfield et al., 2011). Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three new loci associated with T1D that reached genome wide significance. The most significantly associated SNP (rs539514,
|Study||Discovery cohort||Replication cohort||Population||Method||Loci|
|Hakonarson et al., 2007||467 trios, 561 cases, 1,143 controls||2,350 individuals in 549 families;||European ancestry||GWAS||HLA-DQA2, CLEC16A, INS, PTPN22|
|WTCCC 2007||1,963 cases, 2,938 controls||see Todd et al., 2007||European, British||GWAS||HLA-DRB1, INS, CTLA4, PTPN22, IL2RA, IFIH1, PPARG, KCNJ11, TCF7L2|
|Todd et al., 2007||see WTCCC 2007||2,997 trios, 4,000 cases, 5,000 controls||European, British||GWAS||PHTF1-PTPN22, ERBB3, CLEC16A, C12orf30|
|Hakonarson et al., 2008||467 trios, 561 cases, 1,143 controls||549 families, 364 trios||European ancestry||GWAS||SUOX-IKZF4|
|Concannon et al., 2008||2,496 families||2,214 trios, 7,721 cases, 9,679 controls||European ancestry||GWAS||INS, IFIH1, CLEC16A, UBASH3A|
|Cooper et al., 2008||3,561 cases, 4,646 controls||6,225 cases, 6,946 controls, 3,064 trios||European ancestry||GWAS meta-analysis||PTPN22, CTLA4, HLA, IL2RA, ERBB3, C12orf30,|
|Grant et al., 2009||563 cases, 1,146 controls, 483 case-parent trios||636 families, 3,303 cases, 4,673 controls||European ancestry||GWAS||EDG7, BACH2, GLIS3, UBASH3A, RASGRP1|
|Awata et al., 2009||735 cases,||-||Japanese||TaqMan genotyping||ERBB3, CLEC16A|
|Zoledziewska et al., 2009||1,037 cases, 1,706 controls||-||European, Sardinian||TaqMan genotyping||CLEC16A|
|Fung et al., 2009||8,010 cases, 9,733 controls||-||European, British||TaqMan genotyping||STAT4, STAT3, ERAP1, TNFAIP3, KIF5A/PIP4K2C|
|Wu et al., 2009||205 cases,||-||Han Chinese||TaqMan genotyping||CLEC16A|
|Barrett et al., 2009||7,514 cases, 9,045 controls||4,267 cases, 4,670 controls, 4,342 trios||European||GWAS meta-analysis||MHC, PTPN22, INS, C10orf59, SH2B3, ERBB3, CLEC16A, CTLA4, PTPN2, IL2RA, IL27, C6orf173, IL2, ORMDL3, GLIS3, CD69, IL10, IFIH1, UBASH3A, COBL, BACH2, CTSH, PRKCQ, C1QTNF6, PGM1|
|Wallace et al., 2010||7,514 cases, 9,045 controls||4,840 cases, 2,670 controls, 4,152 trios||European ancestry||GWAS meta-analysis||DLK1, TYK2|
|Wang et al., 2010||989 cases, 6,197 controls||-||European ancestry||GWAS||PTPN22, IL10, IFIH1, KIAA0746, BACH2, C6orf173, TAGAP, GLIS3, L2R, INS, ERBB3, C14orf181, IL27, PRKD2, HERC2, CLEC16A, IFNG, IL26,|
|Reddy et al., 2011||1,434 cases, 1,864 controls||-||European ancestry, southeast USA||TaqMan genotyping||PTPN22, INS, IFIH1, SH2B3, ERBB3, CTLA4, C14orf181, CTSH, CLEC16A, CD69, ITPR3, C6orf173, SKAP2, PRKCQ, RNLS, IL27, SIRPG, CTRB2|
|Bradfield et al., 2011||9,934 cases, 16,956 controls||1,120 trios||European ancestry||GWAS meta-analysis||LMO7, EFR3B, 6q27, TNFRSF11B, LOC100128081, FOSL2|
|Asad et al., 2012||424 families,||-||European, Scandinavians||Genotyping and sequencing||HTR1A, RNF180|
|Huang et al., 2012||16,179 individuals||-||European ancestry||Genomes-based imputation||CUX2, IL2RA|
2.4. Immune components in T1D
The immune system is well organized and well regulated, with the basic function of protecting the host against pathogens. This places the immune system in a vital position between healthy and diseased states of the host. Its protective task is controlled by a complex regulatory mechanism involving a diverse army of humoral and cellular factors working in concert to protect the body against invaders. The human immune system has two components: innate and adaptive. Innate immunity comprises physical, chemical, and microbiological barriers to the entry of antigen, together with elements of the immune system (DCs, macrophages, mast cells, NK cells, neutrophils, monocytes, complement, cytokines and acute phase proteins) that provide immediate host defense. Adaptive immunity is the hallmark of the immune system of higher animals, with T and B cells as the key cellular players that provide more specific, life-long immunity.
In T1D this system breaks down: insulin-producing β-cells are subjected to specific attack by the host immune system. To better understand the etiology of T1D, a plethora of research has been done to link the systematic destruction of β-cells to the role of the immune system. Linkage studies in the 1970s revealed the MHC as the first key contributor to T1D susceptibility. Further linkage analysis and candidate gene association studies revealed additional loci associated with T1D. Starting in 2007, GWAS have increased the number of loci associated with T1D to almost 60. In Figure 1 we present 59 T1D susceptibility loci, which we have classified into loci harboring non-immune (14) vs. immune (45) genes. Functional aspects of some genes are discussed below.
The complex crosstalk between innate and adaptive immune cells has a major impact on the pathogenesis and development of T1D, as illustrated in Figure 2. The initiation phase (Phase I) of T1D development takes place in the pancreas, where conventional dendritic cells (cDCs) capture and process β-cell antigens. Apoptosis (‘natural cell death’) or viral infection can lead to β-cell death. Antiviral responses are mediated by invariant natural killer T (iNKT) cells; crosstalk between iNKT cells and plasmacytoid DCs (pDCs) controls viral replication, thus preventing subsequent inflammation and tissue damage and downregulating T1D pathogenesis. Migration of activated cDCs to the draining lymph node primes pathogenic islet antigen-specific T cells. This activation is promoted by macrophages through IL12 secretion. B cells present β-cell antigen to diabetogenic T cells and secrete autoantibodies in response. The activation of islet antigen-specific T cells can be inhibited by cDCs through engagement of programmed cell death ligand 1 (PDL1). In the expansion phase (Phase II), iNKT cells can further promote the recruitment of tolerogenic cDCs and pDCs. These DCs promote expansion of regulatory T (TReg) cells through the production of indoleamine 2,3-dioxygenase (IDO), IL10, transforming growth factor-β (TGFβ) and inducible T cell co-stimulator ligand (ICOSL). In Phase III, β-cells in the pancreas can be killed by diabetogenic T cells and NK cells through the release of interferon-γ (IFNγ), granzymes and perforin, as well as by macrophages through the production of tumour necrosis factor (TNF), IL-1β and nitric oxide (NO). IL12 produced by cDCs sustains the effector functions of activated diabetogenic T cells and NK cells. TReg cells that inhibit diabetogenic T cells and innate immune cells through IL10 and TGFβ can prevent β-cell damage. Tolerogenic pDCs stimulated by iNKT cells could also control diabetogenic T cells through IDO production. Lastly, β-cells can inhibit diabetogenic T cells by expressing PDL1 and so escape cell death.
2.5. CLEC16A (16p13)
The C-type lectin domain family 16, member A (CLEC16A) gene encodes a protein with a C-type lectin domain structure, which makes it potentially related to the immune response (Robinson et al., 2006). It is established that C-type lectins function both as adhesion receptors and as pathogen recognition receptors (PRRs) (Cambi & Figdor, 2003). In addition, CLEC16A is almost exclusively expressed in immune cells including DCs, B-lymphocytes and NK cells. Our 2007 GWAS in a large pediatric cohort of European descent identified CLEC16A as a novel T1D susceptibility gene within a 233-kb linkage disequilibrium block on chromosome 16p13. Three common non-coding variants of the CLEC16A gene (rs2903692, rs725613 and rs17673553) reached genome-wide significance for association with T1D (Hakonarson et al., 2007). Subsequent replication studies in an independent cohort confirmed the association. Importantly, the allele of CLEC16A linked to protection from T1D was also associated with higher levels of CLEC16A expression in NK cells (Hakonarson et al., 2007).
The 2007 WTCCC study independently discovered CLEC16A (formerly known as KIAA0350) as a T1D susceptibility locus associated with the non-coding variant rs12708716. This finding was confirmed immediately for T1D in populations of European descent (Todd et al., 2007; Cooper et al., 2008). To date, several SNPs (rs2903692, rs17673553, rs725613, rs12708716, rs12921922, rs12931878) within the CLEC16A gene have been reported to be associated with T1D in several populations: Sardinian (Zoledziewska et al., 2009), Spanish (Martinez et al., 2010), southeast USA (Reddy et al., 2011), Chinese (Wu et al., 2009; Sang et al., 2012), and Japanese (Yamashita et al., 2011). Recently, CLEC16A was also associated with adult-onset autoimmune diabetes (Howson et al., 2011).
Several GWAS in other autoimmune diseases, such as multiple sclerosis (MS) (Zuvich, 2011; Nischwitz et al., 2011), Addison’s disease (Skinningsrud et al., 2008), systemic lupus erythematosus (SLE) (Gateva et al., 2009; Zhang et al., 2011), celiac disease (Dubois et al., 2010), Crohn’s disease (Márquez et al., 2009), selective immunoglobulin A deficiency (Jagielska et al., 2012), alopecia areata (Jagielska et al., 2012), rheumatoid arthritis (Martinez et al., 2010) and primary biliary cirrhosis (Mells et al., 2011; Hirschfield et al., 2012), also demonstrated association of the 16p13 locus with disease risk, implying that the 16p13 region contains a key regulator of the self-reactive immune response.
Recently, Davison et al., 2012 reported that intron 19 of the CLEC16A gene behaves as a regulatory sequence that affects the expression of a neighboring gene, dexamethasone-induced (DEXI). While it is clear that intron 19 of CLEC16A is highly enriched for transcription-factor-binding events, more functional studies are needed to advance from GWAS to candidate causal genes and their biological functions.
To find causal variants of the CLEC16A gene, we sequenced the 16p13 region in 96 T1D patients and found 10 new non-synonymous SNPs resulting in one stop codon, two splice site mutations, and 7 amino acid changes (unpublished data). Studies are under way to examine whether these changes are correlated with CLEC16A expression and whether these SNPs are present in a control group.
Little has yet been proven about the functions of CLEC16A. Kim et al., 2010 characterized ema as an endosomal membrane protein that is required for endosomal trafficking and promotes endosomal maturation. Expression of CLEC16A, the human orthologue of ema, rescued the Drosophila mutant, demonstrating conserved function of the protein. A recent study by the same group also reported its requirement for the growth of autophagosomes and proposed that the Golgi is a membrane source for autophagosomal growth, and that ema facilitates this process (Kim et al., 2012). Expression of CLEC16A rescued the autophagosome size defect in the ema mutant, suggesting that regulation of autophagosome morphogenesis may be one of the fundamental functions of CLEC16A. Another recent study elucidated the dynamic expression changes and localization of CLEC16A in lipopolysaccharide (LPS)-induced neuroinflammatory processes in adult rats. CLEC16A expression was strongly induced in active astrocytes in inflamed cerebral cortex.
2.6. Other novel T1D susceptibility loci (2011-2012)
In our latest effort to identify additional genetic loci for T1D, we examined associations in the largest meta-analysis to date between T1D and ~2.54 million SNPs in a combined cohort of 9,934 cases and 16,956 controls. Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three novel loci associated with T1D that reached genome-wide significance (Bradfield et al., 2011).
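The gain in power from pooling cohorts can be sketched with a standard inverse-variance, fixed-effect meta-analysis of per-study log odds ratios. This is a generic illustration and is not presented as the exact method of Bradfield et al., 2011 or any other study cited here. The function name and the per-study effect sizes and standard errors are hypothetical, and the genome-wide significance threshold of 5 × 10⁻⁸ is the conventional value, which the text above does not state explicitly.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold (assumption)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis.

    studies: list of (log_odds_ratio, standard_error) tuples, one per cohort.
    Returns (pooled_log_or, pooled_se, z, p).
    """
    weights = [1.0 / se ** 2 for _, se in studies]    # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    return pooled, pooled_se, z, two_sided_p(z)


# Hypothetical results for one SNP in three cohorts: (log odds ratio, standard error)
STUDIES = [(0.20, 0.05), (0.15, 0.04), (0.18, 0.06)]

for beta, se in STUDIES:
    p_single = two_sided_p(beta / se)
    print(f"single cohort: OR={math.exp(beta):.2f}  p={p_single:.1e}  "
          f"genome-wide significant: {p_single < GENOME_WIDE_P}")

pooled, pooled_se, z, p = fixed_effect_meta(STUDIES)
print(f"meta-analysis: OR={math.exp(pooled):.2f}  p={p:.1e}  "
      f"genome-wide significant: {p < GENOME_WIDE_P}")
```

With these made-up inputs, no single cohort crosses the threshold but the pooled estimate does, which is the motivation for combining datasets. A fixed-effect model assumes the same true effect in every cohort; where effects are heterogeneous, a random-effects model would be the usual alternative.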
Nuclear receptor coactivator 1 (NCOA1) protein is a member of the p160/steroid receptor coactivator (SRC) family. The product of this gene binds to a variety of nuclear hormone receptors in a ligand-dependent manner, suggesting that NCOA1 may play a role as a bridging molecule between nuclear hormone receptors and general transcription factors (Onate et al., 1995; Torchia et al., 1997).
C2orf79 encodes peptidyl-tRNA hydrolase domain containing 1 (PTRHD1), a predicted protein of unknown function.
Centromere protein O (CENPO) gene encodes a component of the interphase centromere complex. The protein is localized to the centromere throughout cell division and is required for bipolar spindle assembly, chromosome segregation and checkpoint signaling during mitosis (Okada et al., 2006).
Adenylate cyclase 3 (ADCY3) gene encodes a membrane-associated enzyme. This protein catalyzes the formation of the second messenger cyclic adenosine monophosphate (cAMP) and is highly expressed in human placenta, testis, ovary, and colon (Ludwig & Seuwen, 2002). Wong et al., 2000 reported the presence of adenylyl cyclase 2, 3, and 4 in olfactory cilia. ADCY3 mutants failed olfaction-based behavioral tests, indicating that ADCY3 and cAMP signaling are critical for olfactory-dependent behavior.
DnaJ/Hsp40 homolog, subfamily C, member 27 (DNAJC27) gene encodes a 273-amino-acid protein with RAB-like GTPase and DNAJ domains. EST database entries suggest high expression in the nervous system and reproductive organs (Nepomuceno-Silva et al., 2004).
Pro-opiomelanocortin (POMC) gene encodes a polypeptide hormone precursor protein synthesized mainly in corticotroph cells of the anterior pituitary. POMC is essential for normal steroidogenesis and maintenance of adrenal weight. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation (Krude et al., 1998; Hung et al., 2012).
DNA (cytosine-5)-methyltransferase 3 alpha (DNMT3A) gene encodes a protein that functions as a
Additional fine mapping and functional studies are needed to determine the causal variants in the 2p23 region and their role in T1D.
Plant Homeo Domain (PHD) finger protein 10 (PHF10) encodes a subunit of an ATP-dependent chromatin-remodeling complex that functions in neural precursor cells (Yoo et al., 2009).
Delta-like 1 (Drosophila) (DLL1) is a human homolog of the Notch Delta ligand and a member of the delta/serrate/jagged family. It plays a role in mediating cell fate decisions during hematopoiesis and in cell communication (Santos et al., 2007; Dontje et al., 2006). The protein is expressed in heart, pancreas and brain. Su et al., 2006 reported that pancreatic regeneration in chronic pancreatitis requires activation of the Notch signaling pathway.
Family with sequence similarity 120B (FAM120B) gene encodes protein belonging to the constitutive coactivator of peroxisome proliferator-activated receptor gamma (PPARG) family. FAM120B functions in adipogenesis through PPARG activation in a ligand-independent manner (Li et al., 2007).
Proteasome (prosome, macropain) subunit, beta type, 1 (PSMB1) gene encodes a member of the proteasome B-type family, also known as the T1B family, that is a 20S core beta subunit (Trachtulec et al., 1997).
The TATA-binding protein (TBP) gene encodes a transcription factor that functions at the core of the DNA-binding multiprotein transcription factor IID (TFIID). Binding of TFIID to the TATA box is the initial step in assembly of the pre-initiation complex (PIC) and plays a role in the activation of eukaryotic genes transcribed by RNA polymerase II (Keutgens et al., 2010).
Programmed cell death 2 (PDCD2) gene encodes a nuclear protein that is highly expressed in placenta, heart, pancreas, lung, and liver, and expressed at low levels in spleen, lymph nodes, and thymus. Expression of this gene is known to be repressed by B-cell CLL/lymphoma 6 (BCL6), a transcriptional repressor (Agata et al., 1996).
In addition, our study observed evidence for association at three additional loci that did not reach genome-wide significance, containing the candidate genes LOC100128081, TNFRSF11B and FOSL2 (Bradfield et al., 2011). Of these, it is notable that the locus harboring tumor necrosis factor receptor superfamily, member 11B (TNFRSF11B) is strongly associated with bone mineral density, an association also discovered by GWAS, and that the locus harboring LOC100128081 has also been reported in a GWAS of SLE. FOS-like antigen 2 (FOSL2) gene encodes a leucine zipper protein that dimerizes with JUN family proteins to form the transcription factor complex activator protein 1 (AP-1). The FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation (Cohen et al., 1989).
This chapter provides a summary of recent advances in the identification of multiple variants associated with T1D. Genome-wide association studies have revolutionized the field of autoimmune-mediated disorders. In T1D, only six genetic factors were well established before GWAS. GWAS have contributed greatly by expanding the number of established T1D-associated genes to 57. Most of these genes are novel and were not on any investigator’s favorite list. For the first time there is real consensus on the role of specific genetic factors underpinning T1D pathogenesis.
The discovery through GWAS of genetic factors involved in the pathogenesis of T1D represents the first step in a much longer process leading to a cure. Genes uncovered using this approach are indeed fundamental to disease biology and will define the key molecular pathways leading to a cure for T1D. However, such genome-wide scans can lack coverage in regions that are difficult to genotype, so it is possible that other loci with reasonable effect sizes remain to be uncovered.
To date, most T1D-associated variants have been discovered using cohorts of European ancestry, because the SNP arrays were designed to optimally capture the haplotype diversity in this ethnicity. Novel SNP arrays with the same degree of capture in diverse populations are needed to elucidate the full role of each locus in a worldwide context.
The next challenge is to resolve the specific causal variants and determine how they affect the expression and function of these gene products. Next-generation sequencing (NGS) technology has opened new avenues to elucidate the role of coding and noncoding RNAs in health and disease and should speed up the identification of causative gene variants in T1D.
No doubt, the
This research was financially supported by a grant from the National Institutes of Health (DP3 DK085708-01) and an Institute Development Award to the Center for Applied Genomics from the Children’s Hospital of Philadelphia.