Genome-wide association studies (GWAS) have been fruitful in identifying common variants underlying many complex human diseases (McCarthy et al., 2008, Altshuler et al., 2008), with notable success in several autoimmune diseases (Lettre & Rioux, 2008, Zhernakova et al., 2009). Hundreds of distinct genomic loci have been associated with autoimmune diseases, including celiac disease (CeD), Crohn’s disease (CD), ulcerative colitis (UC), multiple sclerosis (MS), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE) and type 1 diabetes (T1D). Because single studies are typically underpowered, meta-analyses of GWAS have also been performed, identifying dozens of additional susceptibility loci for T1D (Cooper et al., 2008, Barrett et al., 2009, Bradfield et al., 2011), CD (Barrett et al., 2009, Franke et al., 2010) and UC (McGovern et al., 2010, Anderson et al., 2011). In addition, comparing susceptibility loci across autoimmune diseases has revealed important insights into their shared genetic architecture. For example, the interleukin 23 receptor gene (IL23R) has been consistently implicated in several related disorders, including CD, UC, ankylosing spondylitis and psoriasis, suggesting that it may be a common susceptibility factor for the major seronegative diseases (Cargill et al., 2007, Duerr et al., 2006, Tremelling et al., 2007). Another study compared genetic risk factors for T1D and CeD and reported multiple shared risk alleles (Smyth et al., 2008), suggesting that common biological mechanisms may contribute to the etiology of both diseases. Several similar studies that examined known CD susceptibility loci in GWAS of UC identified previously unreported susceptibility loci shared by these related disorders (Franke et al., 2008, Anderson et al., 2009, Anderson et al., 2011).
Taken together, these studies suggest that examining related autoimmune diseases can help reveal shared genetic pathways, and that evaluating known susceptibility loci for one disease in GWAS of another may uncover novel disease-locus relationships. The GWAS approach has been made possible by high-density genotyping arrays that leverage knowledge generated by the International HapMap Project (The International HapMap Project, 2003, International HapMap Consortium, 2005). HapMap showed that the genome is organized into discrete linkage disequilibrium (LD) blocks with limited haplotype diversity within each block. Therefore, a minimal set of tag SNPs can capture almost all common haplotypes present, making genotyping more efficient and reducing cost. Genome-wide genotyping of more than 500,000 SNPs, enough to tag the vast majority of common variation in the genome, can now be readily achieved (Steemers et al., 2006, Gunderson et al., 2005, Reich et al., 2005). Large-scale GWAS of patient and control cohorts have been enormously successful, providing compelling evidence for genetic variants involved in complex autoimmune diseases. A full catalog of these studies is available at the NIH website: http://www.genome.gov/gwastudies (Hindorff et al., 2010). Data are now accumulating rapidly for many complex disorders, especially T1D and IBD. To date, 57 loci have been robustly associated with T1D. Progress in the chronic IBDs has been similarly rapid: 99 IBD susceptibility loci have been identified, 71 associated with CD, 47 with UC and 28 with both UC and CD.
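The tagging idea can be made concrete with r², the standard pairwise measure of LD. The sketch below computes r² between two biallelic SNPs from phased haplotypes and then picks tag SNPs greedily. It is a minimal illustration under assumed conventions, not the procedure used to design any commercial array: the haplotype encoding ('0'/'1' strings), the r² ≥ 0.8 cutoff and the names `r_squared` and `greedy_tags` are hypothetical choices for this example.

```python
def r_squared(haplotypes: list[str], i: int, j: int) -> float:
    """Pairwise LD (r^2) between SNPs i and j from phased haplotypes coded '0'/'1'."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n  # allele frequency at SNP i
    p_j = sum(h[j] == "1" for h in haplotypes) / n  # allele frequency at SNP j
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n  # haplotype frequency
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes: list[str], threshold: float = 0.8) -> list[int]:
    """Greedily choose SNPs until every SNP has r^2 >= threshold with some chosen tag."""
    m = len(haplotypes[0])
    covers = {
        s: {t for t in range(m) if t == s or r_squared(haplotypes, s, t) >= threshold}
        for s in range(m)
    }
    untagged = set(range(m))
    tags = []
    while untagged:
        # pick the SNP that covers the most still-untagged SNPs; ties go to the lowest index
        best = max(range(m), key=lambda s: (len(covers[s] & untagged), -s))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy haplotypes: SNPs 0-2 form one LD block, SNPs 3-4 another.
haps = ["11100", "11111", "00011", "00000"]
print(greedy_tags(haps))  # [0, 3]
```

With five SNPs arranged in two perfectly correlated blocks, two tags capture all of them, which is the cost saving described above; real arrays work with far larger panels and imperfect LD.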
In this review, we first summarize these recent discoveries and then discuss the susceptibility loci shared by T1D, UC and CD. This comparison helps clarify the genetic architecture of these related diseases, including shared genetic pathways and risk factors with opposing effects. Data were obtained from two main sources: the National Human Genome Research Institute catalogue of published genome-wide association studies (http://www.genome.gov/gwastudies/; last accessed on 1 April 2011) and a PubMed literature search.
2.1. Type 1 diabetes (T1D)
T1D results from autoimmune destruction of pancreatic beta cells, leading to a lack of insulin production. T1D accounts for approximately 10% of all cases of diabetes and is most prevalent in populations of European ancestry, where there is ample evidence that its annual incidence has increased over the past five decades (Onkamo et al., 1999, EURODIAB ACE Study Group, 2000). T1D risk is strongly influenced by multiple genetic loci and by poorly understood environmental factors.
2.1.1. A genetic component to type 1 diabetes
T1D is a complex trait that results from the interplay between environmental and genetic factors. Multiple lines of evidence support a strong genetic component. Geographic differences in prevalence, with populations of European ancestry having the highest rates, are one indicator. T1D shows high concordance among monozygotic twins (33 to 42%) (Redondo et al., 2001) and runs strongly in families: the risk to siblings of affected individuals is approximately 10 times that of the general population (Clayton, 2009). This contrasts clearly with the “less genetic” type 2 diabetes (T2D), for which the sibling risk ratio is a relatively modest 3.5 (Rich, 1990).
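The sibling risk figures quoted above are recurrence risk ratios, usually written λs: the risk of disease in siblings of affected individuals divided by the prevalence in the general population. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the input values are invented only to reproduce a ratio of about 10, not estimates from the cited studies.

```python
def sibling_risk_ratio(sibling_risk: float, population_prevalence: float) -> float:
    """lambda_s: disease risk in siblings of affected individuals / population prevalence."""
    if not 0 < population_prevalence <= 1:
        raise ValueError("population prevalence must be in (0, 1]")
    if not 0 <= sibling_risk <= 1:
        raise ValueError("sibling risk must be a probability")
    return sibling_risk / population_prevalence


# Hypothetical inputs: 4% risk in siblings, 0.4% population prevalence.
lam = sibling_risk_ratio(0.04, 0.004)
print(f"lambda_s = {lam:.1f}")  # lambda_s = 10.0
```

A higher λs, as for T1D compared with T2D, indicates stronger familial clustering, although shared family environment also contributes to it.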
2.1.2. Before GWAS
Before GWAS, only five loci had been firmly established as associated with T1D. It has long been known that approximately half of the genetic risk for T1D is conferred by the genomic region harboring the HLA class II genes (primarily HLA-DRB1, -DQA1 and -DQB1), which encode highly polymorphic antigen-presenting proteins. Recent fine-mapping of the MHC investigated why the class II genes HLA-DQB1 and HLA-DRB1 cannot completely explain the association between T1D and the MHC region (Nejentsev et al., 2007). Most of the remaining detectable association turned out to be due to signals at HLA-B and HLA-A, and the existence of other major T1D genes in the extended MHC appeared unlikely. The other loci established before GWAS are the genes encoding insulin (INS) (Bell et al., 1984, Bennett et al., 1995), cytotoxic T-lymphocyte-associated protein 4 (CTLA4) (Nistico et al., 1996, Anjos et al., 2004), protein tyrosine phosphatase, non-receptor type 22 (PTPN22) (Bottini et al., 2004, Smyth et al., 2004) and interleukin 2 receptor alpha (IL2RA) (Vella et al., 2005, Lowe et al., 2007). However, most other associations reported in the pre-GWAS era have remained debatable (Guo et al., 2004, Mirel et al., 2002, Biason-Lauber et al., 2005): an initial report of association often did not hold up in subsequent replication attempts by other groups, partly because effect sizes estimated in the first, discovery study tend to be inflated, a phenomenon known as the “winner’s curse” (Lohmueller et al., 2003).
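The winner's curse can be illustrated with a short simulation: if many discovery studies estimate the same modest effect with sampling noise, the studies that happen to cross a significance threshold systematically overestimate it. The sketch below assumes effects on the log-odds scale, normally distributed estimates and a one-sided z > 2 reporting threshold; all parameter values and the function name are hypothetical.

```python
import random
import statistics


def winners_curse(true_effect: float = 0.1, se: float = 0.1,
                  n_studies: int = 10_000, z_threshold: float = 2.0,
                  seed: int = 1) -> tuple[float, float, int]:
    """Simulate discovery studies of one variant; compare all vs. 'reported' estimates.

    Each study's estimate is drawn from Normal(true_effect, se); only studies whose
    z-score exceeds z_threshold are treated as reported discoveries.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    reported = [b for b in estimates if b / se > z_threshold]
    return statistics.mean(estimates), statistics.mean(reported), len(reported)


mean_all, mean_reported, n_reported = winners_curse()
print(f"mean of all estimates:      {mean_all:.3f}")
print(f"mean of reported estimates: {mean_reported:.3f} ({n_reported} studies)")
```

Replication studies estimate the true effect on average, so a replication sample sized to detect the inflated discovery estimate will often be underpowered, which is one route to the failed replications described above.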
2.1.3. GWAS of T1D
The advent of GWAS changed this situation dramatically, accelerating the discovery of T1D loci and increasing the number of associated regions roughly tenfold. An early genome-wide genotyping approach using only 6,500 nonsynonymous SNPs (Smyth et al., 2006) was a precursor to the full GWAS that followed soon after; nevertheless, it uncovered a robust association with the interferon induced with helicase C domain 1 (IFIH1) gene. IFIH1 acts in antiviral immune responses, including the apoptosis of virally infected cells, which may support the notion of a connection between viral infection and the pathogenesis of T1D (Knip et al., 2005). Of interest, subsequent resequencing revealed additional rare variants in this gene that lower T1D risk (Nejentsev et al., 2009). The first full-scale GWAS of T1D came simultaneously from our group (Hakonarson et al., 2007) and the Wellcome Trust Case-Control Consortium (WTCCC, 2007). In our study, we examined a large pediatric cohort of European descent using the Illumina HumanHap 550 BeadChip platform. The design involved 561 cases, 1,143 controls and 467 triads in the discovery stage, followed by replication in 939 nuclear families. In addition to the established loci, including 392 SNPs capturing the very strong association across the MHC, we identified a significant association with variation at the KIAA0350 gene, which we replicated in an additional cohort. The WTCCC study investigated seven common complex diseases, including T1D (WTCCC, 2007), by genotyping 2,000 cases and 3,000 controls with ~500,000 SNPs on the Affymetrix GeneChip platform, and reported a number of novel T1D loci, including the KIAA0350 genomic region.
These findings were confirmed in a replication study of 4,000 cases, 5,000 controls and nearly 3,000 T1D family trios, reported in a companion paper published on the same day (Todd et al., 2007). In a separate replication effort, we fast-tracked 24 SNPs at 23 distinct loci that fell just below the threshold for genome-wide significance in our 2007 GWAS and established association with the 12q13 region, with a combined P-value of 9.13×10⁻¹⁰ (Hakonarson et al., 2008); this was the same locus as that reported by the WTCCC and Todd et al., 2007. The 12q13 region harbors several genes, including ERBB3, RAB5B, SUOX, RPS26 and CDK2. Additional laboratory studies are needed to identify both the causal variant and the corresponding gene. The clarity of the signals found in the 2007-2009 T1D GWAS highlights the strength and consistency of the GWAS approach, in contrast to traditional candidate-gene and family-based studies, on which consensus among geneticists was weak (Lohmueller et al., 2003).
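The single-marker case-control test behind such P-values can be sketched as a 1-degree-of-freedom allelic chi-square test on a 2×2 table of allele counts, together with the allelic odds ratio. This is a minimal illustration assuming Hardy-Weinberg equilibrium and no population stratification, not the exact pipeline of the studies above; family trios are typically analyzed with transmission-based tests instead, which this sketch does not cover. The 5×10⁻⁸ genome-wide threshold is the widely used convention, included here as an assumption; the function name and allele counts are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold (assumed)


def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """1-df allelic chi-square test and allelic odds ratio from allele counts.

    Returns (chi2, p_value, odds_ratio) for the 'alt' allele.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
chi2, p, or_ = allelic_test(case_alt=1200, case_ref=2800, ctrl_alt=1500, ctrl_ref=4500)
print(f"chi2 = {chi2:.1f}, P = {p:.2e}, OR = {or_:.2f}, genome-wide: {p < GENOME_WIDE_ALPHA}")
```

The chi-square statistic with 1 df is the square of a standard normal z-score, which is why the complementary error function gives its tail probability directly.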
2.1.4. Meta-analyses of T1D GWAS datasets
Genome-wide genotyping has so far been relatively expensive, representing a large financial investment when applied to large, well-powered case-control cohorts. To get the most from such an investment, GWAS investigators have combined datasets from different groups to carry out meta-analyses. We used this approach to identify additional T1D loci (Grant et al., 2009) conferring increasingly modest risks, with odds ratios in the region of 1.1 to 1.2. Through subsequent rounds of testing in an independent cohort of nuclear T1D families from Montreal, the Children’s Hospital of Philadelphia and the Type 1 Diabetes Genetics Consortium (T1DGC), followed by the WTCCC dataset and the Diabetes Control and Complications Trial (DCCT)/Epidemiology of Diabetes Interventions and Complications (EDIC) study cohort, we observed convincing association with the genes encoding ubiquitin-associated and SH3 domain-containing protein A (UBASH3A) and BTB (broad complex-tramtrack-bric-a-brac) and CNC (cap 'n' collar) homology 2 (BACH2). Further supporting our finding, the UBASH3A locus was subsequently implicated in T1D by a large linkage study using dense SNP genotyping data from affected sib pairs (Concannon et al., 2008).
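Combining datasets in this way is often done with an inverse-variance weighted fixed-effect meta-analysis of per-study log odds ratios. The sketch below shows that calculation under the assumption that all studies estimate the same underlying effect (no between-study heterogeneity). The function name and inputs are hypothetical, chosen to fall in the 1.1 to 1.2 odds-ratio range mentioned above rather than taken from any cited study.

```python
import math


def fixed_effect_meta(odds_ratios: list[float], standard_errors: list[float]):
    """Inverse-variance fixed-effect meta-analysis on the log-odds scale.

    standard_errors are the SEs of each study's log odds ratio.
    Returns (pooled_or, pooled_se, z, two_sided_p).
    """
    log_ors = [math.log(x) for x in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]  # precise studies count more
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_log_or / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return math.exp(pooled_log_or), pooled_se, z, p


# Three hypothetical cohorts, none genome-wide significant on its own.
pooled_or, pooled_se, z, p = fixed_effect_meta([1.12, 1.18, 1.15], [0.06, 0.05, 0.07])
print(f"pooled OR = {pooled_or:.3f}, SE = {pooled_se:.4f}, z = {z:.2f}, P = {p:.1e}")
```

The pooled standard error is smaller than that of any single study, which is where the extra power of meta-analysis comes from when individual effects are as small as 1.1 to 1.2.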
A subsequent meta-analysis by Cooper and colleagues (Cooper et al., 2008), using T1D datasets from the WTCCC (WTCCC, 2007) and the Genetics of Kidneys in Diabetes (GoKinD) study (Mueller et al., 2006, Manolio et al., 2007), confirmed association with the previously observed loci PTPN22, CTLA4, MHC, IL2RA, 12q13, 12q24, CLEC16A and PTPN2, but yielded weaker evidence for the IFIH1 and INS loci and reported no new T1D loci reaching genome-wide significance. The SNPs with the lowest nominal P-values were taken forward for genotyping in an additional British cohort of approximately 6,000 cases, 7,000 controls and 2,800 families. As a result, the IL2-IL21 association strengthened further, and strong evidence was found for the following loci: BACH2 (as we described previously; Grant et al., 2009); a 10p15 region harboring the protein kinase C, theta gene (PRKCQ); a 15q24 region harboring nine genes, including cathepsin H (CTSH); and a 22q13 region harboring the C1q and tumor necrosis factor related protein 6 (C1QTNF6) and somatostatin receptor 3 (SSTR3) genes. Additional studies are required to identify the causal genes and their mechanisms at the 15q24 and 22q13 loci.
The meta-analysis reported by Barrett et al., 2009 identified more than forty loci, including 18 novel regions, and confirmed a number of loci uncovered through cross-disease comparisons (Smyth et al., 2008, Fung et al., 2009, Cooper et al., 2009). This study not only involved samples from the WTCCC (WTCCC, 2007) and the GoKinD study (Mueller et al., 2006) but also brought in a further large set of cases, controls and families from the T1DGC. In addition to confirming known loci, it reported association with 1q32.1 (which harbors the interleukin genes IL10, IL19 and IL20), GLIS family zinc finger 3 (GLIS3; first suggested by us, Grant et al., 2009), CD69 and IL27. These findings were further supported by our in silico replication efforts (Qu et al., 2010).
To identify additional genetic loci for T1D susceptibility, we examined associations between the disease and ~2.54 million SNPs in the largest meta-analysis to date, a combined cohort of 9,934 cases and 16,956 controls (Bradfield et al., 2011). Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three new loci associated with T1D that reached genome-wide significance. The most significantly associated SNP (rs539514, P = 5.66×10⁻¹¹) resided in an intronic region of the LMO7 (LIM domain only 7) gene on 13q22. The second most significantly associated SNP (rs478222, P = 3.50×10⁻⁹) resided in an intronic region of the EFR3B (protein EFR3 homolog B) gene on 2p23; however, the region of linkage disequilibrium spans approximately 800 kb and harbors multiple additional genes, including NCOA1, C2orf79, CENPO, ADCY3, DNAJC27, POMC and DNMT3A. The third most significantly associated SNP (rs924043, P = 8.06×10⁻⁹) lay in an intergenic region on 6q27, where the region of association spans approximately 900 kb and harbors multiple genes, including WDR27, C6orf120, PHF10, TCTE3, C6orf208, LOC154449, DLL1, FAM120B, PSMB1, TBP and PDCD2. These newly associated regions add to the growing repertoire of gene networks predisposing to T1D. Figure 1 summarizes the 57 loci reported to date.
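When cohorts share only P-values and effect directions rather than effect sizes and standard errors, a common alternative to inverse-variance pooling is a sample-size weighted Z-score (Stouffer) meta-analysis, with each cohort weighted by the square root of its sample size. This is not necessarily the scheme used by Bradfield et al. (2011); the sketch illustrates the general technique only, and the function name and per-cohort inputs are hypothetical.

```python
import math


def weighted_z_meta(z_scores: list[float], sample_sizes: list[int]) -> tuple[float, float]:
    """Sample-size weighted Z-score meta-analysis (weights = sqrt(N)).

    Each z-score must be signed with respect to the same allele in every cohort.
    Returns (combined_z, two_sided_p).
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    p = math.erfc(abs(combined_z) / math.sqrt(2))  # two-sided normal P-value
    return combined_z, p


# Hypothetical cohorts: modest, consistent signals that grow stronger when combined.
z, p = weighted_z_meta([3.1, 2.8, 3.4, 2.5], [4000, 5000, 7000, 6000])
print(f"combined z = {z:.2f}, P = {p:.1e}, below 5e-8: {p < 5e-8}")
```

Signals of opposite sign cancel, so aligning every cohort to the same effect allele is essential; for case-control cohorts an effective sample size is often used in place of the raw total.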
2.2. Inflammatory bowel disease
The two main subtypes of inflammatory bowel disease (IBD), Crohn's disease (CD) and ulcerative colitis (UC), are common inflammatory disorders with a complex etiology involving multiple genes and environmental factors. UC is characterized by confluent inflammation of the colonic mucosa, and CD by discontinuous transmural intestinal inflammation. Based on current clinical investigations, gene association studies and laboratory experiments, IBD is thought to develop from a dysregulated immune response to normal gut flora in a genetically susceptible host. IBD affects a large number of people and is becoming more common worldwide as Western lifestyles are adopted. The prevalence of CD and UC has increased significantly over the last decade, with peak age of onset in the second to fourth decades of life (Lashner 1995). CD and UC are considered related disorders that share some genetic susceptibility loci and differ at others. IBD is highly heritable and has a complex genetic basis, as suggested by family, twin and phenotype concordance studies. Familial aggregation studies have reported a greater relative risk of IBD among twins and first-degree relatives of affected individuals than in the general population, and monozygotic twins show substantially higher disease concordance than dizygotic twins for both CD and UC (Thompson et al., 1996, Duerr, 2003, Halme et al., 2006).
Before the advent of GWAS, it had proven difficult to isolate genes conferring susceptibility to CD and UC using classical candidate-gene and linkage approaches, with two notable exceptions. Caspase recruitment domain family, member 15 (CARD15; also known as NOD2), the first and most widely replicated CD susceptibility gene, was positionally cloned using linkage analysis in 2001 (Hugot et al., 2001, Hampe et al., 2001). Linkage studies also identified a CD risk haplotype spanning the organic cation transporter gene SLC22A4 and other genes on chromosome 5q31 (IBD5) (Rioux et al., 2001, Peltekova et al., 2004). Owing to extensive LD in the region, the identity of the causal gene and variants has been debated (Duerr et al., 2006).
GWAS have yielded many associations for CD, UC and other autoimmune diseases. Duerr et al., 2006 reported the first GWAS of IBD, identifying an association between CD and variants in the interleukin 23 receptor gene (IL23R) on chromosome 1p31 in ileal CD cases of European ancestry.
Early-onset IBD shows distinct characteristics in clinical phenotype, severity and familial clustering. Extensive anatomical involvement at presentation, with early disease progression, is now clearly established as a feature of both childhood CD (Vernier-Massouille et al., 2008) and childhood UC (Van Limbergen et al., 2008a). Recent data suggest that stratifying subjects by age of onset may be effective in identifying genes contributing to IBD pathogenesis, and that individuals with early-onset disease may carry a stronger genetic component, which may compensate for the relatively small size of pediatric cohorts (Kugathasan et al., 2008, Imielinski et al., 2009). Meta-analysis of published studies is another approach for identifying common genetic factors. A handful of meta-analyses have examined susceptibility loci common to UC and CD, most notably NOD2, PTPN22, ATG16L1 and IRGM (Barrett et al., 2008, McGovern et al., 2010, Franke et al., 2010, Anderson et al., 2011). To date, large meta-analyses combining previously reported pediatric- and adult-onset IBD GWAS have confirmed 99 IBD susceptibility loci: 71 associated with CD, 47 with UC and 28 with both UC and CD (Lees et al., 2011).
2.3. Comparisons between T1D and IBD
It is becoming apparent that there is considerable overlap between the genes influencing different autoimmune diseases. For instance, CLEC16A, the first T1D locus we reported, has also been associated with multiple sclerosis in a GWAS of that disease (De Jager et al., 2009), and PTPN22 has similarly been implicated in Crohn’s disease, rheumatoid arthritis, systemic lupus erythematosus and autoimmune thyroiditis (Hafler et al., 2007, Bottini et al., 2006). In addition, our earlier comparative genetic analysis of IBD and T1D intriguingly implicated multiple loci with opposite directions of effect in the two diseases (Wang et al., 2010).
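Declaring that a locus has opposite effects in two diseases requires first expressing both associations relative to the same allele, since studies may report odds ratios for different alleles. A minimal sketch of that bookkeeping follows; the `Association` record, the function names and the example odds ratios are hypothetical, and strand issues (for example, A/T or C/G SNPs reported on different strands) are assumed to be already resolved.

```python
from dataclasses import dataclass


@dataclass
class Association:
    effect_allele: str  # allele to which the odds ratio refers
    other_allele: str
    odds_ratio: float


def or_for_allele(assoc: Association, allele: str) -> float:
    """Express an association's odds ratio relative to `allele`."""
    if allele == assoc.effect_allele:
        return assoc.odds_ratio
    if allele == assoc.other_allele:
        return 1.0 / assoc.odds_ratio  # switching the reference allele inverts the OR
    raise ValueError(f"allele {allele!r} not present at this SNP")


def direction(disease_a: Association, disease_b: Association) -> str:
    """Classify the effects of the same allele in two diseases as 'same' or 'opposite'."""
    allele = disease_a.effect_allele
    or_a = disease_a.odds_ratio
    or_b = or_for_allele(disease_b, allele)
    if or_a == 1.0 or or_b == 1.0:
        return "null in one disease"
    return "same" if (or_a > 1.0) == (or_b > 1.0) else "opposite"


# Hypothetical example: reported for allele A in disease 1 and for allele G in disease 2.
print(direction(Association("A", "G", 1.25), Association("G", "A", 1.20)))  # opposite
```

Without the alignment step, the two reported odds ratios above (both greater than 1) would wrongly appear to point in the same direction.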
2.3.1. T1D and IBD (UC and CD) – common genes: IL10, IL2/IL21, ORMDL3, PTPN2, CLEC16A
IL10 (1q32)
The IL10 gene, located on chromosome 1, encodes interleukin 10, an anti-inflammatory cytokine produced primarily by monocytes and lymphocytes. IL10 has pleiotropic effects in immunoregulation and inflammation. The GWAS of Franke et al., 2008 first reported association of UC with SNP rs3024505, which immediately flanks the IL10 gene in region 1q32.1. The meta-analysis of Barrett et al., 2009 revealed association with T1D. A study from our group (Wang et al., 2010) reported IL10 as a novel CD locus, with alleles showing opposite directions of association in T1D and IBD (both UC and CD), and found that SNP rs3024505 confers protection against UC and CD. Later, the meta-analysis by Franke et al., 2010 of six Crohn’s disease GWAS, comprising 6,333 affected individuals and 15,056 controls, with follow-up of the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios, confirmed the association.
The IL2 and IL21 genes, located on chromosome 4, encode interleukin 2 and interleukin 21, respectively. IL2 is a secreted cytokine important for the proliferation of T and B lymphocytes. IL2 signals through a receptor complex consisting of the IL2-specific receptor alpha chain (CD25), the IL2 receptor beta chain (CD122) and a common gamma chain shared by all members of this cytokine family. Receptor binding activates the Ras/MAPK, JAK/STAT and PI 3-kinase/Akt signaling modules. Interleukin 21 is a cytokine with potent regulatory effects on cells of the immune system, including natural killer (NK) cells and cytotoxic T cells, which can destroy virally infected or cancerous cells (Parrish-Novak et al., 2002).
GWAS have provided evidence that the 4q27 region is associated with a number of autoimmune phenotypes: type 1 diabetes and Graves' disease (GD) (Todd et al., 2007), systemic lupus erythematosus (SLE) (Sawalha et al., 2008), psoriatic arthritis (Liu et al., 2008), juvenile idiopathic arthritis (Albers et al.,), ulcerative colitis (Festen et al., 2009), Crohn's disease (Marquez et al., 2009a) and celiac disease (Garner et al., 2009). This region contains four genes in strong linkage disequilibrium: KIAA1109-TENR-IL2-IL21. KIAA1109 and ADAD1 (also known as TENR; adenosine deaminase domain containing 1, with testis-specific expression) have no known relationship to immune function, whereas IL2 and IL21 are both of interest, especially because IL2 is a T1D risk gene in the non-obese diabetic (NOD) mouse (Yamanouchi et al., 2007). Extensive linkage disequilibrium across the region has hindered fine-mapping (Todd et al., 2007), so resequencing followed by genotyping will be needed to identify all associated variants in the KIAA1109-TENR-IL2-IL21 region.
The CLEC16A gene, located on chromosome 16, encodes a protein of unknown function. CLEC16A is expressed specifically in immune cells, including dendritic cells, B lymphocytes and natural killer (NK) cells, all of which are pivotal in the pathogenesis of T1D (Poirot et al., 2004, Rodacki et al., 2007), suggesting that the associated variants probably contribute to disease by modulating immunity. Hakonarson et al., 2007 performed a GWAS in a large pediatric cohort of European descent to identify new genetic factors that increase the risk of T1D and found that T1D was significantly associated with variation within a 233-kb linkage disequilibrium block on chromosome 16p13. Three common non-coding variants in the gene (rs2903692, rs725613 and rs17673553), in strong linkage disequilibrium with one another, reached genome-wide significance for association with T1D, and a subsequent replication study in an independent cohort confirmed the association. No other genes lie within this LD block, making CLEC16A (also known as KIAA0350) a prime candidate for harboring the causal variant. We investigated the expression of KIAA0350 in four different NK cell lines and found higher expression in the NKL cell line; interestingly, this cell line is homozygous for allele A of rs2903692. The WTCCC 2007 GWAS also identified KIAA0350 as a T1D susceptibility locus. Several later studies confirmed the association with T1D (Todd et al., 2007, Cooper et al., 2008) and reported association with MS (De Jager et al., 2009) and celiac disease (Dubois et al., 2010). Recent studies of ema (endosomal maturation defective), a novel Drosophila gene and the orthologue of human CLEC16A, have defined roles for the protein in regulating endosomal trafficking (Kim et al., 2010). Such a function suggests mechanisms by which CLEC16A may confer susceptibility to autoimmune disorders. In 2009, Marquez et al., consistent with the previously reported role of this locus in T1D and MS, showed a specific association of a CLEC16A/KIAA0350 polymorphism with a distinct group of NOD2/CARD15-negative Crohn’s disease patients. This new evidence supports the hypothesis that alterations in genetic factors involved in bacterial recognition through different pathways result in CD, and may provide more insight into processes critical to the pathogenesis of this form of IBD.
The ORMDL3 gene, located on chromosome 17, encodes ORM1-like 3, a protein of unknown function that is expressed in heart, brain, lung, liver and kidney. GWAS have shown that common SNPs in the chromosome 17q12 region alter the risk of asthma and are also associated with the expression level of ORMDL3 (Moffatt et al., 2007). SNP rs7216389 was highlighted as the potential causal variant on the basis of evolutionary conservation. Several studies have replicated this association for adult asthma in different ethnic groups (Sleiman et al., 2008, Madore et al., 2008, Galanter et al., 2008, Hirota et al., 2008). The rs7216389 SNP lies within a large LD block that encompasses not only ORMDL3 but also the gasdermin B (GSDMB), zona pellucida binding protein 2 (ZPBP2) and IKAROS family zinc finger 3 (Aiolos; IKZF3) genes. Another variant in the same LD block (rs2872507) has been associated with CD (Barrett et al., 2008 GWAS and Franke et al., 2010 meta-analysis), suggesting that a shared causal variant in this block may confer susceptibility to both CD and asthma. More recently, the same region was associated with the risk of type 1 diabetes (rs2290400; Barrett et al., 2009) and UC (rs2305480; McGovern et al., 2010). Further elucidation of the mechanisms of transcriptional control in the ORMDL3 genomic region may shed light on the pathogenesis of multiple complex diseases.
The PTPN2 gene, located on chromosome 18, encodes tyrosine-protein phosphatase non-receptor type 2, a key negative regulator of inflammatory responses that is expressed in immune cells and in islet β-cells. Recent studies point to a possible role for PTPN2 in preventing β-cell apoptosis (Moore et al., 2009).
The WTCCC 2007 GWAS and Barrett et al., 2008 identified PTPN2 as a CD susceptibility gene. The GWAS of Franke et al., 2008 and Dubois et al., 2010 identified PTPN2 as a previously unknown disease locus for UC. The GWAS of Hakonarson et al., 2007 and Todd et al., 2007 and the meta-analyses of Cooper et al., 2008 and Barrett et al., 2009 implicated PTPN2 as a T1D susceptibility gene. The meta-analysis of Festen et al., 2011 identified PTPN2 as a shared risk locus for Crohn's disease and celiac disease. Of special interest for CD, PTPN2 regulates intestinal epithelial barrier function, establishing a novel link between a disease-associated gene and a key pathophysiological event in CD (Scharl et al., 2009). The autoimmunity-associated PTPN2 variant rs1893217(C) is associated with impaired IL2R signaling in CD4+ T cells (Long et al., 2011). Scharl et al., 2011 demonstrated that PTPN2 is activated by TNF-α and regulates TNF-α-induced MAPK signaling; at a functional level, loss of PTPN2 is associated with increased expression and secretion of proinflammatory mediators in the intestinal epithelium, suggesting a potential role for PTPN2 in the pathogenesis of CD.
2.3.2. Type 1 diabetes and Crohn’s disease – common genes: PTPN22, IL18RAP, BACH2, TAGAP, IL2RA, IL27, TYK2, MTMR3
The PTPN22 gene, located on chromosome 1, encodes tyrosine-protein phosphatase non-receptor type 22, a lymphoid-specific phosphatase that is an important downregulator of T-cell activation. It is mainly expressed in T cells, B cells, NK cells, macrophages, monocytes and dendritic cells. The PTPN22 C1858T polymorphism was originally described and associated with disease in T1D patients (Bottini et al., 2004), and this association was consistently replicated in independent populations (Smyth et al., 2004, Santiago et al., 2007). The same polymorphism was later associated with several other autoimmune disorders, including rheumatoid arthritis (Begovich et al., 2004, Orozco et al., 2005), systemic lupus erythematosus (Orozco et al., 2005), Wegener’s granulomatosis (Jagiello et al., 2005) and myasthenia gravis (Vandiedonck et al., 2006). C1858T is a nonsynonymous SNP that replaces arginine with tryptophan at position 620 of the encoded protein (R620W). Functional studies showed that LYP, the lymphoid tyrosine phosphatase encoded by PTPN22, has increased phosphatase activity in the variant encoded by the 1858T allele. The 620W autoimmune risk allele thus behaves as a gain-of-function mutation, producing a phosphatase with higher catalytic activity and more potent negative regulation of T-cell activation (Vang et al., 2005). By contrast, mice lacking Pep, the mouse ortholog of LYP, show increased T-cell activation and increased antibody production (Hasegawa et al., 2004).
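The variant name itself encodes simple arithmetic: coding position 1858 falls in codon 620, at the first codon position. The sketch below makes that mapping explicit. It assumes the conventional numbering in which position 1 is the A of the ATG start codon, and its codon table is deliberately limited to the two codons needed here. The reference codon CGG is inferred rather than stated in the text: tryptophan is encoded only by TGG, so a C>T change at the first position that yields tryptophan must start from CGG (arginine). Function names are hypothetical.

```python
# Only the two codons needed for this example (not a full genetic code).
CODON_TO_AA = {"CGG": "Arg (R)", "TGG": "Trp (W)"}


def locate(cds_position: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position within codon)."""
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


def mutate(codon: str, position_in_codon: int, new_base: str) -> str:
    """Return the codon with one base replaced (position_in_codon is 1-based)."""
    i = position_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos = locate(1858)
ref_codon = "CGG"  # inferred: TGG is the only Trp codon, so C>T at position 1 implies CGG
alt_codon = mutate(ref_codon, pos, "T")
print(codon_number, pos)  # 620 1
print(CODON_TO_AA[ref_codon], "->", CODON_TO_AA[alt_codon])  # Arg (R) -> Trp (W)
```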
PTPN22 was the first and most convincing example of a susceptibility gene common to diverse autoimmune phenotypes. The first T1D GWAS reported PTPN22 as a susceptibility locus (Todd et al., 2007, Hakonarson et al., 2007). PTPN22 was later associated with RA (Plenge et al., 2007, Stahl et al., 2010), Crohn’s disease (Barrett et al., 2008), Graves’ disease, Hashimoto thyroiditis, myasthenia gravis, systemic sclerosis and generalized vitiligo (Jin et al., 2010). For T1D and RA, the 620W allele confers almost two-fold risk of disease, with odds ratios of 3-4 for homozygous patients, making PTPN22 second in importance only to the MHC, in terms of association strength, for these two diseases (Gregersen & Olsson, 2009). Barrett et al., 2008 were the first to report “intriguing contrasts between genetic susceptibility to CD and other complex disorders”. They found that the same coding variant (620W) that is a risk factor for T1D and RA protects against CD. Our latest GWAS confirmed the association with both T1D and CD, including the protective effect in CD (Wang et al., 2010). In summary, the minor 620W allele of PTPN22 predisposes carriers to many immune-mediated diseases but protects against Crohn’s disease (Barrett et al., 2008, Wang et al., 2010).
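The per-allele and homozygote figures above are roughly what a multiplicative (log-additive) genetic model predicts, under which the homozygote odds ratio is the square of the per-allele odds ratio. This model is an assumption here rather than a reported result for PTPN22; the sketch simply shows the arithmetic for a hypothetical per-allele odds ratio of 1.9, and the function name is hypothetical.

```python
def genotype_odds_ratios(per_allele_or: float) -> dict[str, float]:
    """Genotype odds ratios vs. non-carriers under a multiplicative (log-additive) model."""
    return {
        "heterozygote": per_allele_or,     # one copy of the risk allele
        "homozygote": per_allele_or ** 2,  # two copies: per-allele effects multiply
    }


ors = genotype_odds_ratios(1.9)  # hypothetical "almost two-fold" per-allele OR
print({k: round(v, 2) for k, v in ors.items()})  # {'heterozygote': 1.9, 'homozygote': 3.61}
```

A homozygote odds ratio of about 3.6 falls within the 3-4 range quoted above; a clear departure from the squared value would instead point to dominant or recessive action.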
The IL18RAP gene, located on chromosome 2, encodes interleukin 18 receptor accessory protein. IL18RAP is strongly expressed in unstimulated T cells and NK cells (Su et al., 2004). Coexpression of IL18R1 and IL18RAP is required for the activation of NF-κB and MAPK8 (JNK) in response to IL18.
IL18RAP is a well-established locus for type 1 diabetes. Smyth et al., 2008 genotyped DNA samples from 8,064 T1D patients, 9,339 control subjects and 2,828 families at eight loci previously known to confer risk of celiac disease, reported association of the IL18RAP and TAGAP genes with T1D (P < 1.00×10⁻⁴) and showed that the minor allele (A) of IL18RAP SNP rs917997 confers protection against T1D. Zhernakova et al., 2008 demonstrated strong association of rs917997 with CD. The rs917997 genotype strongly correlates with IL18RAP expression, with lower expression in individuals homozygous for the risk allele, which may lead to differential IL18-mediated innate immune responses to infection (Zhernakova et al., 2010). In a recent GWAS, we confirmed the IL18RAP region as a T1D and CD locus and reported opposite directions of association for the same IL18RAP allele (rs917997) in T1D and CD (Wang et al., 2010). The meta-analysis by Franke et al., 2010 once again confirmed IL18RAP as a CD locus. The meta-analysis by Festen et al., 2011 reported allele rs6708413-G in IL18RAP as a shared risk allele for CD and celiac disease, validating previously identified associations.
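A genotype-expression correlation of this kind is an expression quantitative trait locus (eQTL) result. In its simplest additive form, expression is regressed on the number of copies (0, 1 or 2) of one allele. The sketch below computes the least-squares slope and Pearson correlation for that model; the genotype and expression values are invented for illustration and do not come from Zhernakova et al. (2010), and the function name is hypothetical.

```python
import math


def additive_eqtl(dosages: list[int], expression: list[float]) -> tuple[float, float]:
    """Least-squares slope and Pearson r of expression on allele dosage (0/1/2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx  # expression change per additional allele copy
    r = sxy / math.sqrt(sxx * syy)
    return slope, r


# Invented data: expression falls with each copy of the risk allele.
risk_copies = [0, 0, 1, 1, 2, 2]
expr = [10.0, 10.0, 8.0, 8.0, 6.0, 6.0]
print(additive_eqtl(risk_copies, expr))  # (-2.0, -1.0)
```

Real eQTL analyses also test the slope for significance and adjust for covariates such as batch or cell-type composition; those steps are omitted here.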
The BACH2 gene (‘BTB and CNC homology 1, basic leucine zipper transcription factor 2’), located on chromosome 6, encodes a transcriptional repressor (Oyake et al., 1996). The protein contains a BTB/POZ domain and a bZip domain. Bach2 forms heterodimers with the small Maf proteins (MafK, MafG and MafF) through its leucine zipper and then binds to Maf-recognition elements (MAREs). BACH2 is abundantly expressed in both B cells and neurons (Hoshino & Igarashi, 2002).
GWAS have provided convincing evidence of association of BACH2 with T1D in unrelated data sets (Grant et al., 2009, Cooper et al., 2008, Barrett et al., 2009). A second-generation GWAS of 4,533 celiac disease cases and 10,750 controls revealed the rs10806425-A allele as a risk factor for celiac disease (Dubois et al., 2010). Meta-analysis confirmed BACH2 as a susceptibility gene for both celiac disease and CD, with rs1847472-G as a common risk allele (Franke et al., 2010).
The TAGAP gene, located on chromosome 6, encodes T-cell activation RhoGTPase-activating protein and has been associated with multiple autoimmune diseases: T1D, CD, celiac disease and RA. Although its role in immune function is less well understood, it has been found to be co-regulated with IL2 and is expected to play a role in T-cell activation (Mao et al., 2004, Chang & Hsiao, 2005).
Smyth et al., 2008 first reported the association of TAGAP with T1D (P < 1.00 × 10^-4). A recent genotyping study using the Sequenom MassArray platform reported TAGAP (rs182429) as a genetic susceptibility variant shared by three autoimmune disorders: RA, T1D and celiac disease (Eyre et al., 2010). The meta-analysis by Franke et al., 2010 reported TAGAP association with CD, which another meta-analysis confirmed as a shared risk locus for CD and CeD (Festen et al., 2011). Interestingly, the TAGAP minor allele confers protection against RA (Eyre et al., 2010), as previously reported for T1D (Smyth et al., 2008), whereas in CD (Franke et al., 2010, Festen et al., 2011) and celiac disease (Smyth et al., 2008) the minor allele is associated with risk.
The IL2RA gene, located on chromosome 10, encodes the α subunit of the IL2 receptor complex, thereby mediating IL2 signaling in host defense and regulating Treg-mediated responses to autoantigens. The imbalance between Th1 and Th2 cytokines plays a crucial role in regulating the immune response and in the pathogenesis of autoimmune diseases (Cantagrel et al., 1999, Sartor, 1994). The genes encoding Th1 and Th2 cytokines and their receptors can therefore be considered good candidates for modifying the risk of these diseases.
IL2RA was associated with T1D even before the advent of GWAS (Vella et al., 2005). Several GWAS have consistently confirmed the association with T1D (WTCCC 2007; Cooper et al., 2008, Barrett et al., 2009). IL2RA has also been implicated in MS (Hafler et al., 2007, De Jager et al., 2009) and RA (Stahl et al., 2010). Most recently, a meta-analysis confirmed the association with Crohn's disease (Franke et al., 2010).
14q24 is a common susceptibility locus shared between T1D and CD. The meta-analysis by Barrett et al., 2009 first reported rs1465788-G (14q24.1) as a risk allele for type 1 diabetes; two genes, C14orf181 and ZFP36L1, lie within this LD region. In a GWAS, rs4899260-A in region 14q24.1 was found to be a risk allele for celiac disease (Dubois et al., 2010). A meta-analysis associated rs4902642-G in the same region with CD as a risk allele (Franke et al., 2010).
Of these, ZFP36L1 is a functionally interesting candidate: it encodes zinc finger protein 36, C3H type-like 1, a modulator of mRNA stability.
The IL27 gene, located on chromosome 16, encodes interleukin 27, a cytokine secreted by stimulated antigen-presenting cells. IL27 is a member of the same family as IL12 and IL23. It was initially associated only with the differentiation of Th1 lymphocytes, enhancement of the cellular immune response and the reciprocal inhibition of Th2 humoral immune reactions (Lucas et al., 2003). It was later shown to play an important role in regulating the activity of B and T lymphocytes as well as NK cells (Larousserie et al., 2006). Because IL27 both promotes cell-mediated immune reactions and limits inflammatory reactions, it could be important in the pathogenesis of autoimmune diseases (Jankowski et al., 2010). GWAS revealed association of IL27 with T1D (Barrett et al., 2009) and early-onset IBD (Imielinski et al., 2009). In a recent GWAS we confirmed IL27 (SNP rs4788084) as a T1D locus and reported a new association with Crohn’s disease; we also showed that the T1D risk allele at the IL27 locus protects against CD (Wang et al., 2010). A recent meta-analysis of six CD GWAS confirmed IL27 as a susceptibility locus for CD (Franke et al., 2010).
The TYK2 gene, located on chromosome 19, encodes tyrosine kinase 2, a member of the Janus kinase (JAK) family of signal-transducing kinases. It is involved in cytokine signaling by type I interferons, IL12 and IL23 and affects Th1 and Th17 lineage development. TYK2 also plays an important role in TLR-mediated responses in dendritic cells, including IL12 and IL23 production, and TYK2 mutations predispose to opportunistic infection (Ghoreschi et al., 2009). TYK2 was initially associated with SLE (Suarez-Gestal et al., 2009) and MS (Mero et al., 2010). Only meta-analysis provided enough power to establish TYK2 associations with T1D (Wallace et al., 2010) and CD (Franke et al., 2010) at genome-wide significance.
2.3.3 Type 1 diabetes and ulcerative colitis – common genes: TNFAIP3, IL26, HERC2
The TNFAIP3 gene, located on chromosome 6, encodes tumor necrosis factor, alpha-induced protein 3, a key regulator of the NF-κB signaling pathway that modulates cell activation, cytokine signaling and apoptosis (Beyaert et al., 2000). The first association between an autoimmune disease and SNPs spanning TNFAIP3 was identified for RA (Plenge et al., 2007). The TNFAIP3 region was then reported as a novel susceptibility locus for SLE in Europeans (Graham et al., 2008) and later confirmed in Asian populations (Han et al., 2009, Shimane et al., 2010). A GWAS by Fung et al., 2009 reported association with T1D for two statistically independent SNPs, rs6920220 and rs1049919. More recently, we tested this T1D susceptibility locus (TNFAIP3) in GWAS and found that it also confers UC risk (Wang et al., 2010). In a GWAS, Dubois et al., 2010 reported association with celiac disease. The meta-analysis by Anderson et al., 2011 confirmed TNFAIP3 as a UC risk locus.
The IL26 (interleukin 26) gene is located on chromosome 12 between the genes for two other important class 2 cytokines, IFN-γ and IL22 (Wilson et al., 2007). IL26 is a 171-amino-acid protein that shares sequence homology with interleukin 10. IL26 signals through a receptor complex comprising two distinct proteins, IL20 receptor 1 and IL10 receptor 2 (Sheikh et al., 2004). The variants rs2870946-G and rs1558744-A in region 12q15 were first associated with UC by Silverberg et al., 2009. A meta-analysis further confirmed the association of rs1558744-A with UC (McGovern et al., 2010). A later GWAS from our group reported a new association of the rs1558744 variant with T1D (Wang et al., 2010).
HERC2 (15q13)
The HERC2 gene, located on chromosome 15, encodes a ubiquitin ligase required for histone ubiquitylation at DNA double-strand breaks (DSBs). This gene belongs to the HERC gene family, which encodes a group of unusually large proteins with multiple structural domains. The protein has a COOH-terminal HECT domain, a motif responsible for E3 ubiquitin ligase activity. HERC2 was recently shown to be implicated in DNA damage repair (Bekker-Jensen et al., 2010). The HERC2 variant rs916977 was first identified and confirmed in a GWAS as a susceptibility locus specific to ulcerative colitis (Franke et al., 2008). A later GWAS from our group identified a novel association of rs916977 with T1D (Wang et al., 2010).
2.3.4 Crohn’s disease and Ulcerative colitis – 32 common loci
The interleukin-23 receptor (IL23R) gene, located on chromosome 1p31, encodes a subunit of the receptor for IL23. IL23R is highly expressed in dendritic cells. The receptor associates constitutively with JAK2 and also binds the transcription activator STAT3 in a ligand-dependent manner. The GWAS by Duerr et al., 2006 first identified IL23R as a CD susceptibility gene in a North American population, showing that the rare IL23R SNP rs11209026 (p.Arg381Gln; c.1142G>A) confers strong protection against CD. This association has since been replicated by several GWAS and meta-analyses in CD (Umeno et al., 2011; Ferguson et al., 2010) and UC (Umeno et al., 2011, Anderson et al., 2011, Waterman et al., 2010), as well as in pediatric CD (Baldassano et al., 2007, Gazouli et al., 2010), psoriasis (Bowes & Barton, 2010) and ankylosing spondylitis (Laukens et al., 2010). Notably, the multiple associations identified to date within the IL23R gene for both CD and UC suggest a strong involvement of the IL-23 pathway in IBD pathogenesis. These broad-ranging associations might reflect the effects of different causal variants on different haplotypes, or biological differences in the function of IL-23 and IL-23R signaling across diseases. Additional genetic, bioinformatic and laboratory assessment of the association signals, supplemented by deep resequencing, is needed to define the susceptibility genes and alleles more clearly, especially where association signals span several candidate genes, and to identify rare variants contributing to IBD pathophysiology.
The KIF21B gene, located on chromosome 1, encodes kinesin family member 21B. Kinesin family proteins are microtubule-dependent molecular motors involved in intracellular motility and organelle transport. KIF21B expression is detected in lungs, brain, testes and thymus (Harada et al., 2007). In the immune system, expression is highest in CD4+ and CD8+ T cells, CD56bright NK cells and B cells (T1Dbase, http://www.t1dbase.org/). Notably, the mouse homologue, Kif21b, lies in a region (Idd5.4a) linked to susceptibility in a mouse model of diabetes (Hunter et al., 2007). The SNP rs11584383, located downstream of KIF21B, was one of the 21 novel CD loci reported by Barrett et al., 2008. Several GWAS later identified KIF21B as an ulcerative colitis susceptibility locus (Franke et al., 2008, Anderson et al., 2009, McGovern et al., 2010). Recent meta-analyses have confirmed it as a common susceptibility locus for CD and UC (Anderson et al., 2011 and Umeno et al., 2011).
TNFRSF9, ERRFI1, UTS2, PARK7 (1p36)
The 1p36 region, recently shown to be associated with IBD, contains several genes. Tumor necrosis factor receptor superfamily, member 9 (TNFRSF9) contributes to the clonal expansion, survival and development of T cells. ERBB receptor feedback inhibitor 1 (ERRFI1) is a negative regulator of several EGFR family members. Urotensin 2 (UTS2) is a potent vasoconstrictive peptide that regulates both endothelium-dependent and endothelium-independent vasodilation. Parkinson disease (autosomal recessive, early onset) 7 (PARK7) is a positive regulator of androgen receptor-dependent transcription. A GWAS by Dubois et al., 2010 reported association of TNFRSF9 and PARK7 with UC in a European population. The meta-analysis by Anderson et al., 2011 further confirmed the association of the 1p36.23 region with UC.
The PUS10 gene, located on chromosome 2, encodes pseudouridylate synthase 10, which catalyzes formation of the universally conserved pseudouridine Psi55 in tRNA. These enzymes act as RNA chaperones, facilitating the correct folding and assembly of tRNAs (McCleverty et al., 2007). Barrett et al., 2008 were the first to report PUS10 as a Crohn’s disease susceptibility locus through GWAS. Later studies identified it as a risk locus for UC (McGovern et al., 2010) and CeD (Dubois et al., 2010), indicating that it is a shared risk locus. Recently, Festen et al., 2011, through a meta-analysis of GWAS data treating CD and CeD as a single disease phenotype, established PUS10 (P = 1.38 × 10^-11), along with IL18RAP, PTPN2 and TAGAP, as shared risk loci of genome-wide significance for Crohn’s disease and celiac disease. This meta-analysis approach provided the power, lacking in individual disease-specific GWAS datasets, to identify shared risk loci with small effects in each single disease.
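Why pooling adds power can be illustrated with a standard fixed-effect, inverse-variance meta-analysis of per-study log odds ratios. This is a generic sketch, not necessarily the method used by Festen et al. (who analysed CD and CeD as a single phenotype). The per-study estimates are invented, `fixed_effect_meta` is a hypothetical helper, and 5 × 10^-8 is assumed as the conventional genome-wide significance threshold. In the example, three studies that are each only suggestive on their own combine into a pooled estimate that passes the threshold.

```python
import math

# Conventional genome-wide significance threshold used in GWAS (assumed).
GENOME_WIDE_ALPHA = 5e-8


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect sizes (log odds ratios) for the same allele
    ses:   matching standard errors
    Returns (pooled beta, pooled SE, z, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    return beta, se, z, two_sided_p(z)


# Hypothetical per-study estimates for one SNP (invented for illustration).
study_betas = [0.12, 0.12, 0.12]
study_ses = [0.03, 0.03, 0.03]

for b, s in zip(study_betas, study_ses):
    p_single = two_sided_p(b / s)
    print(f"single study: P = {p_single:.1e}, genome-wide: {p_single < GENOME_WIDE_ALPHA}")

beta, se, z, p = fixed_effect_meta(study_betas, study_ses)
print(f"pooled: OR = {math.exp(beta):.2f}, P = {p:.1e}, genome-wide: {p < GENOME_WIDE_ALPHA}")
```

The pooled standard error shrinks roughly with the square root of the number of equally informative studies, so a small but consistent effect crosses the threshold only after pooling. A random-effects model would be the usual alternative when effects differ between studies or diseases.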
The REL gene, located on chromosome 2, encodes c-Rel, a transcription factor of the Rel/NF-κB family, which also includes RELA, RELB, NF-κB1/p50 and NF-κB2/p52. All five NF-κB members share a highly conserved, 300-amino-acid N-terminal Rel homology domain (RHD) responsible for DNA binding, dimerization, nuclear localization and binding to the NF-κB inhibitor. c-Rel plays a critical role in the production of pro-inflammatory cytokines and in coordinating the expression of genes that control the immune response, anti-apoptotic molecules and cell cycle modulators (Chen et al., 2003, Hayden et al., 2006, Belguise & Sonenshein, 2007; Tian & Liou 2009). Several studies have reported effects of c-Rel in autoimmune diseases, including SLE (Burgos et al., 2000) and IBD (Neurath et al., 1998, Visekruna et al., 2006). The REL locus is common to UC, CD and RA. Zhernakova et al., 2008 reported a moderate association of REL polymorphisms with UC. In 2009, Trynka et al. discovered two new loci, REL and OLIG3/TNFAIP3, associated with celiac disease. In an expansion of previous genome-wide association studies, Gregersen et al., 2009 described the intronic single nucleotide polymorphism (SNP) rs13031237 (G→T) in REL as a risk factor for RA. Varade et al., 2011 reported the first association of a REL polymorphism with SLE. Recent GWAS reported epistasis between CARD9 and REL in ulcerative colitis (McGovern et al., 2010) and Crohn’s disease (Dubois et al., 2010). The meta-analysis by Franke et al., 2010 further established the REL association, increasing the number of CD susceptibility loci to 71.
The GCKR gene, located on chromosome 2, encodes glucokinase (hexokinase 4) regulatory protein, which belongs to the GCKR subfamily. GCKR has been implicated in metabolic traits such as triglyceride levels (Aulchenko et al., 2009), fasting glucose (Dupuis et al., 2010) and serum uric acid (Kolz et al., 2009). The meta-analysis by Franke et al., 2010 first reported GCKR as a Crohn's disease susceptibility locus. A meta-analysis published by Umeno et al., 2011 confirmed this association and extended it to UC, although GCKR had never been shown to be associated with UC in any single study performed to date. However, little is known about how this gene affects the development of CD and UC; functional analysis should provide further understanding of the common pathogenesis of CD and UC.
The ATG16L1 gene, located on chromosome 2, encodes the autophagy-related 16-like 1 protein, which is involved in autophagosome formation during autophagy. The GWAS by Hampe et al., 2007 first identified ATG16L1 as a CD susceptibility gene and demonstrated mRNA and protein expression in the colon, small intestine, intestinal epithelial cells and leukocytes, with no difference in intestinal protein expression between CD patients and healthy controls. In contrast, Lees et al., 2008 found ATG16L1 mRNA to be down-regulated in colonic CD biopsies compared with healthy controls in a microarray dataset. Further analysis showed that the marker rs2241880 is a coding SNP encoding a threonine-to-alanine substitution at amino acid position 300 (T300A), which correlated with the incidence of CD in two German studies and one British study (Rioux et al., 2007). Prescott et al., 2007 reported that a non-synonymous SNP in ATG16L1 predisposes to ileal Crohn's disease independently of CARD15 and IBD5. Roberts et al., 2007 reported strong association of IL23R R381Q and ATG16L1 T300A with CD in a study of New Zealand Caucasians with IBD. Glas et al., 2008 reported strong association of the ATG16L1 variants rs2241879 and rs2241880 (T300A) with CD in a German population. Weersma et al., 2009 confirmed the association of ATG16L1, along with multiple other CD susceptibility loci, in a large Dutch-Belgian cohort. A number of studies have also included childhood-onset IBD cases, with similarly conflicting results. Baldassano et al., 2007 reported association of the non-synonymous T300A variant of ATG16L1 with susceptibility to pediatric Crohn's disease. Subsequently, Van Limbergen et al., 2008b reported that ATG16L1 influences susceptibility to Crohn's disease and disease location, but not childhood onset, in a Northern European population. The study by Latiano et al., 2008 confirmed the association of ATG16L1 with CD in both adult-onset and pediatric-onset subsets.
A meta-analysis in a Spanish population by Marquez et al., 2009a reported association of the Thr300Ala polymorphism with CD, regardless of CARD15 or IL23R status, but not with UC. The meta-analysis by Cheng et al., 2010 associated the ATG16L1 T300A polymorphism with susceptibility to both CD and UC. A recent study by Plantinga et al., 2011 demonstrated that the ATG16L1 variant conferring a higher risk for CD is associated with elevated production of pro-inflammatory cytokines after engagement of NOD2, providing a possible explanation for the excessive inflammatory response to gut-resident microorganisms observed in CD. These findings implicate autophagy and intestinal microbes in the pathogenesis of IBD and call for further studies.
MST1, UBA7, APEH, AMIGO3, GMPPB, BSN (3p21)
In a GWAS, Raelson et al., 2007 first reported association of the 3p21 SNP rs11718165 with CD. The WTCCC 2007 GWAS reported a different SNP, rs9858542, associated with CD in a Spanish population. Several studies have also reported association with UC (Fisher et al., 2008, Franke et al., 2008, and the meta-analysis by Anderson et al., 2011). Marquez et al., 2009c reported two SNPs significantly associated with CD. Beckly et al., 2008 demonstrated strong epistasis with known CD-associated variants in CARD15. Latiano et al., 2010 confirmed that variants at the 3p21 locus (BSN and MST1) influence susceptibility and phenotype in both adult and early-onset patients with IBD. Morgan et al., 2010 replicated the association of the 3p21 region with CD in a New Zealand population. The meta-analysis by Umeno et al., 2011 confirmed BSN-MST1 as a common susceptibility locus for UC and CD. Designated IBD9, 3p21 is potentially an important IBD locus, previously identified by early genome-wide linkage and fine-mapping studies. The region contains the CD candidate genes MST1, UBA7, APEH, AMIGO3, GMPPB and BSN. MST1, macrophage stimulatory protein 1, is involved in inflammation and in tissue remodeling for wound healing. Goyette et al., 2008 first identified a non-synonymous coding variant in the MST1 gene (rs3197999, R689C) associated with both CD and UC. APEH encodes a serine peptidase with a functional role in degrading bacterial peptide breakdown products in the gut, preventing an excessive immune response. Amphoterin-induced gene and ORF 3 (AMIGO3), a member of the AMIGO family of type I transmembrane proteins, is thought to play roles in neuronal axon tract development and cell adhesion. UBA7 encodes ubiquitin-like modifier-activating enzyme 7. GPX1 encodes the antioxidant enzyme glutathione peroxidase 1. The BSN (bassoon) gene encodes a protein involved in neurotransmitter release at central nervous system synapses.
Recent GWAS have provided evidence for the involvement of 3p21 in the pathogenesis of IBD, although it remains unclear which is the causative SNP and/or gene.
Prostaglandin E receptor 4, encoded by the PTGER4 gene on chromosome 5, belongs to the G-protein-coupled receptor family and is one of four receptors identified for prostaglandin E2 (PGE2). It can activate T-cell factor signaling, mediates PGE2-induced expression of early growth response 1 (EGR1), regulates the level and stability of cyclooxygenase-2 mRNA and leads to the phosphorylation of glycogen synthase kinase-3. Libioulle et al., 2007 first mapped a novel major CD susceptibility locus to a gene desert on 5p13.1 near PTGER4 by GWAS in a Belgian cohort. Kurz et al., 2006 had previously reported PTGER4 as an asthma susceptibility locus through fine mapping and positional candidate studies on chromosome 5p13. PTGER4 is a strong candidate susceptibility gene for CD because PTGER4 knockout mice have increased susceptibility to dextran sulfate sodium-induced colitis (Kabashima et al., 2002), a candidacy further supported by the observation that the CD susceptibility allele at marker rs4495224 was associated with increased PTGER4 transcript levels in lymphoblastoid cell lines. The association of PTGER4 with Crohn's disease was further replicated in subsequent GWAS meta-analyses (Barrett et al., 2008, Franke et al., 2010).
The candidate genes at locus 5q31.1 include a cytokine gene cluster (IL5, IL4, IL13), interferon regulatory factor 1 (IRF1), SLC22A4 (solute carrier family 22, member 4) and SLC22A5 (solute carrier family 22, member 5). SLC22A5 and SLC22A4 are high-affinity, sodium-dependent uptake transporters that function in the transport of L-carnitine and the elimination of cationic drugs in the intestine. The 5q31 locus, known as IBD5, was first associated with CD in a genome-wide linkage scan in Canadian families (Rioux et al., 2001). Two pediatric studies in white populations confirmed an association of SLC22A variants with CD (Babusukumar et al., 2006; Russell et al., 2006). Studies have also demonstrated epistasis between IBD5 and NOD2/CARD15 for both CD and UC (Giallourakis et al., 2003, McGovern et al., 2003). The first GWAS (WTCCC 2007) confirmed the association of this locus with CD, followed by Raelson et al., 2007, Hampe et al., 2007 and Franke et al., 2007. Furthermore, a meta-analysis by Barrett et al., 2008 pointed to another SNP, rs2188962, as the strongest disease-associated variant within IBD5. Besides CD, the SLC22A4/SLC22A5 TC haplotype has also been suggested to confer risk for UC (Waller et al., 2006; Palmieri et al., 2006). Preliminary data have also implicated this region in type 1 diabetes (Santiago et al., 2006). Although many studies have associated the chromosome 5 region containing SLC22A4 and SLC22A5 with CD, some investigators are hesitant to regard mutations in these genes as causative of CD because of the tight linkage disequilibrium among multiple genes in this chromosomal region. A study of IBD5 by Hradsky et al., 2010 confirmed the association of the IBD5 risk haplotype and showed a prominent role for rs6596075 and IGR2063b_1 in the Czech population. The meta-analysis by Franke et al., 2010 further confirmed the association with CD.
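The difficulty of naming a causal variant in a region of tight linkage disequilibrium can be seen from the standard pairwise LD measures D, D′ and r², computed from a two-SNP haplotype frequency and the two allele frequencies. The Python sketch below uses invented frequencies, and `ld_stats` is a hypothetical helper. When r² approaches 1, genotypes at the two SNPs are nearly interchangeable, so both carry almost the same association signal and association data alone cannot say which SNP, or which nearby gene, is causal.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype frequencies for two SNPs, each with allele frequency 0.3.
scenarios = {
    "independent": 0.09,  # p_ab = p_a * p_b, so D = 0
    "strong LD": 0.28,
    "perfect LD": 0.30,   # allele A always occurs with allele B
}
for label, p_ab in scenarios.items():
    d, d_prime, r2 = ld_stats(p_ab, 0.3, 0.3)
    print(f"{label}: D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

D′ can be high even when one allele is much rarer than the other, which is why r² is the more direct measure of how well one SNP tags another's association signal.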
IL12B encodes the p40 subunit of the heterodimeric cytokines IL12 and IL23 (Langrish et al., 2004), which are produced by monocytes and dendritic cells. Association with CD was previously reported (WTCCC 2007) but not confirmed. The key role of the IL12/IL23 pathway in chronic intestinal inflammation is supported by the association between IL23R and CD and by strong functional evidence from mouse models of colitis (Uhlig et al., 2006, Yen et al., 2006). Polymorphisms at IL12B were later associated with both CD (Parkes et al., 2007, Barrett et al., 2008, McGovern et al., 2010) and UC (Franke et al., 2008, Fisher et al., 2008) in several GWAS. The meta-analysis by Anderson et al., 2011 further confirmed the association with CD and UC. The gene has also been implicated in T1D susceptibility, with inconsistent results (Howson et al., 2009). Recently, Ferguson et al., 2010 reported opposing effects of two IL12B variants (rs1363670 and rs6887695) in a New Zealand population: carrying the rs1363670 C variant increased CD risk, whereas carrying the rs6887695 C variant decreased it.
PRDM1 encodes PR domain containing 1, with ZNF domain (also known as BLIMP1), a protein known to play important roles in the proliferation, survival and differentiation of B and T lymphocytes. BLIMP1 acts as a master transcriptional regulator of plasma cells and as a repressor of beta-interferon (β-IFN) gene expression. The association of PRDM1 with ulcerative colitis and Crohn’s disease was first reported in the meta-analysis by Anderson et al., 2011 and confirmed in the meta-analysis by Umeno et al., 2011.
The CDKAL1 gene encodes CDK5 regulatory subunit associated protein 1-like 1, a protein of unknown function belonging to the methylthiotransferase family. Variants in an intronic block of CDKAL1 have been associated with Crohn’s disease (Barrett et al., 2008 GWAS) and type 2 diabetes (T2D; Zeggini et al., 2008). CDKAL1 shows inconsistent replication in ulcerative colitis (McGovern et al., 2010 GWAS and Anderson et al., 2008 GWAS); it has also been associated with psoriasis, although not at genome-wide significance (Quaranta et al., 2009).
Janus kinase 2 (JAK2) is another IL23 pathway gene conferring susceptibility to IBD. The meta-analysis of GWAS by Barrett et al., 2008 first reported the association of a polymorphism in the JAK2 promoter region with CD. Several studies later reported association with UC (Fisher et al., 2008, Franke et al., 2008, Asano et al., 2009, Festen et al., 2009). McGovern et al., 2010 confirmed a strong association of SNP rs10974944 in the JAK2 locus with UC in a Dutch population. The meta-analysis by Franke et al., 2010 confirmed the association with CD. Recent meta-analyses by Anderson et al., 2011 and Umeno et al., 2011 further confirmed the association of JAK2 with both UC and CD. The JAK2 locus is currently the subject of detailed resequencing efforts and functional studies aimed at identifying the causal variants.
TNFSF15 (tumor necrosis factor ligand superfamily, member 15) was identified as the first CD susceptibility gene through a genome-wide association screen of 72,738 SNPs in a Japanese population (Yamazaki et al., 2005). The association has been replicated in several studies, including a Korean cohort (Yang et al., 2008), an independent Japanese cohort (Kakuta et al., 2006), a US cohort (Picornell et al., 2007) and a meta-analysis of GWAS in European populations (Barrett et al., 2008), although the association was less significant in Europeans than in Japanese or Korean individuals. Yang et al., 2008 replicated findings from studies in Caucasian populations and established TNFSF15 as a CD susceptibility gene in Korean patients. However, reports of TNFSF15 association with UC in Caucasians have been inconsistent (Franke et al., 2008, Wang et al., 2010, Haritunians et al., 2010). In contrast to their earlier CD findings, Yang et al., 2011 reported no association of TNFSF15 or IL23R with UC in Koreans, although TNFSF15 showed a marginal association with UC in male patients only.
The CARD9 gene, located on 9q34.3, encodes caspase recruitment domain family, member 9, an adaptor molecule in pattern recognition receptor (PRR) signaling. CARD9 is essential for innate immune signaling triggered by intracellular and extracellular pathogens and is therefore an attractive candidate gene for IBD (Underhill & Shimada, 2007). Zhernakova et al., 2008 performed a comprehensive analysis of 85 candidate genes in the innate immunity pathway in 1851 IBD patients and 1936 controls and observed an association, predominantly with UC, of a CARD9 variant located in an extended haplotype block on 9q34.3; the association replicated in the WTCCC GWAS data set. The later UC GWAS by Barrett et al., 2009 confirmed CARD9 susceptibility. In 2010, McGovern et al. reported epistasis between CARD9 and REL in ulcerative colitis. Franke et al., 2010 also confirmed the association of CARD9 with UC and IBD.
The cyclic adenosine 5'-monophosphate responsive element modulator (CREM) protein, encoded in humans by the CREM gene, binds to the IL2 gene promoter and suppresses expression of IL2 (Tenbrock et al., 2002), a cytokine critical for the initiation and termination of the immune response and for T-cell development. The meta-analysis by McGovern et al., 2010 confirmed the association of CREM with UC. The meta-analysis by Anderson et al., 2011 reported association with UC and CD. A recent meta-analysis by Umeno et al., 2011 confirmed the association of CREM with UC.
Cyclin Y (CCNY) belongs to a family of highly conserved proteins that activate cyclin-dependent kinases (Cdks) to regulate the cell cycle, transcription and other cellular processes. Until recently, cyclin Y had not been characterized in any model organism, and very little is known about the human ortholog, CCNY. The GWAS by Franke et al., 2008 first reported association of an SNP located in an intron of CCNY with both IBD subphenotypes, CD and UC. Weersma et al., 2009 confirmed CCNY as a CD susceptibility locus in a Dutch-Belgian cohort, although it is not yet clear whether CCNY plays a direct role in these diseases. The association of CCNY with UC and CD was later confirmed in independent GWAS (Wang et al., 2010 and Waterman et al., 2010) and in the meta-analysis by Umeno et al., 2011. The first mutant allele reported for a Y-type cyclin, a null allele of Drosophila CycY generated by Liu & Finley, 2010, should provide further insight into the role of this conserved protein in human diseases.
Rioux et al., 2007 reported the first CD association with rs224136, an SNP 40 kb downstream of zinc finger protein 365 (ZNF365). GWAS in populations of Northern European descent (WTCCC, 2007; Barrett et al., 2008) further confirmed the CD association. Torkvist et al., 2010 reported on ZNF365 in an analysis of 39 CD risk loci in Swedish inflammatory bowel disease patients. Amre et al., 2010 reported ZNF365 as a CD susceptibility locus in Canadian children. ZNF365 exists as four known transcript variants with variable expression across species (ZNF365 isoforms A-D; GenBank NM_014951, NM_199450, NM_199451 and NM_199452). Of interest, studies in human colorectal cell lines showing that ZNF148 interacts with histone acetyltransferase p300 during butyrate activation of the cyclin-dependent kinase inhibitor p21waf1 (a protein with increased levels in the colonic crypts of patients with ulcerative colitis; Hofseth et al., 2003), together with dextran sulfate sodium-induced colitis in mice homozygous for ZNF148-delN, highlight a role for ZNF148 in gastrointestinal homeostasis (Law et al., 2006). Haritunians et al., 2011 reported association of a variant in ZNF365 isoform D with CD. This study also demonstrated expression of ZNF365D in intestinal resections from both CD subjects and controls; markedly reduced ZNF365D expression in Epstein-Barr virus-transformed lymphoblastoid cell lines from CD subjects homozygous for the risk allele (Ala); and an association between ZNF365D rs7076156 genotype and altered expression of ZNF148, a transcription factor known to be involved in gut homeostasis. The meta-analysis by Umeno et al., 2011 further confirmed ZNF365 as a common susceptibility locus for Crohn’s disease and ulcerative colitis.
NK2 transcription factor related, locus 3 (NKX2-3), encoded by the NKX2-3 gene, is a member of the NKX family of homeodomain-containing transcription factors, which are implicated in many aspects of cell type specification and maintenance of differentiated tissue functions. The rs10883365 variant of the NKX2-3 gene on chromosome 10q24.2 was first identified as a CD susceptibility variant in the WTCCC 2007 GWAS in Caucasian patients. Parkes et al., 2007 reported association of rs10883365, within the NKX2-3 locus, with CD (combined P = 3.7 × 10^-10). In addition, Fisher et al., 2008 reported a modest association between rs10883365 and UC in a non-synonymous SNP scan (P = 3.3 × 10^-4 in the UC panel and P = 2.4 × 10^-6 using the expanded WTCCC control panel). In a GWAS, Franke et al., 2008 reported NKX2-3, previously known from studies of CD, as a new UC locus. Further confirmation came from Barrett et al., 2008 and Franke et al., 2010. Replication studies from the Netherlands (Weersma et al., 2009) and Japan (Yamazaki et al., 2009) further confirmed the association with CD. A Dutch study demonstrated that the association between the rs10883365 variant of NKX2-3 and CD was linked to smoking status, with the risk being more pronounced in active and passive smokers (Van der Heide et al., 2010). Recently, Meggyesi et al., 2010 confirmed that the rs10883365 variant was associated with UC and CD susceptibility in the allelic and genotypic models, with ORs of 1.53 and 1.24, respectively, in Eastern European patients.
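Allelic and genotype-based odds ratios come from different 2 × 2 tables: the allelic table counts chromosomes, while a genotype-based table counts individuals. The Python sketch below shows both, with a Woolf (log-based) 95% confidence interval. It assumes a dominant "carrier versus non-carrier" coding as one example of a genotypic model, which may differ from the model Meggyesi et al. used. The genotype counts are invented, and `odds_ratio_ci`, `allele_counts` and `carrier_counts` are hypothetical helper names.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-based) confidence interval.

    a, b: cases with / without the exposure (risk allele or carrier status)
    c, d: controls with / without the exposure
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def allele_counts(rr, rn, nn):
    """Genotype counts -> (risk allele count, non-risk allele count)."""
    return 2 * rr + rn, 2 * nn + rn


def carrier_counts(rr, rn, nn):
    """Dominant coding: carriers of at least one risk allele vs non-carriers."""
    return rr + rn, nn


# Hypothetical genotype counts (RR, RN, NN); invented, not study data.
cases = (120, 440, 440)
controls = (80, 400, 520)

allelic = odds_ratio_ci(*allele_counts(*cases), *allele_counts(*controls))
carrier = odds_ratio_ci(*carrier_counts(*cases), *carrier_counts(*controls))

for label, (or_, lo, hi) in (("allelic", allelic), ("carrier", carrier)):
    print(f"{label} OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The allelic table treats each subject's two alleles as independent observations, which is reasonable when genotypes are close to Hardy-Weinberg proportions. Genotype-based codings avoid that assumption but split the same subjects into different comparisons, so the two odds ratios need not agree.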
Leucine-rich repeat kinase 2 (LRRK2), also known as dardarin, is an enzyme of the leucine-rich repeat kinase family that in humans is encoded by the LRRK2 gene. The protein is mainly expressed in the cytoplasm of neurons, myeloid cells and monocytes and is thought to be involved in autophagy. LRRK2 was initially identified as the defective gene at the PARK8 susceptibility locus responsible for autosomal dominant Parkinson’s disease (PD). The meta-analysis of GWAS by Barrett et al., 2008 identified LRRK2-MUC19 as a CD susceptibility locus, and the GWAS of Franke et al., 2010 further confirmed the association with CD. However, the haplotype-tagging study of germline variation of MUC19 in inflammatory bowel disease by Phillips et al., 2010 suggested that LRRK2 represents the susceptibility gene in this region. Recently, the meta-analysis of published studies by Umeno et al., 2011 reported LRRK2-MUC19 as a common susceptibility locus for CD and UC. With no known physiological substrate, the role of LRRK2 in a cellular context remains poorly understood. The GWAS of Zhang et al., 2009 reported association of a SNP located upstream of the LRRK2 locus with higher susceptibility to Mycobacterium leprae infection. That study also highlighted other genes already known to be associated with CD, such as NOD2, TNFSF15 and C13orf31, suggesting that the pathways involved in the control of M. leprae might also be involved in susceptibility to CD. Gardet et al., 2010 reported that LRRK2 is an IFN-γ target gene whose expression increases in intestinal tissues upon CD inflammation, suggesting involvement in signaling pathways relevant to CD. MUC19, the other gene at this locus, is, together with LRRK2, an important player in the rapidly evolving story of IBD. MUC19 encodes a mucin involved in protecting the epithelial cell lining (Hollingsworth & Swanson, 2004).
Of interest, the role of mucin proteins in protecting the intestinal epithelium from injury is further supported by the finding that mucin deficiency leads to intestinal inflammation in mouse models of colitis (Van der Sluis et al., 2006).
The C13orf31 gene, encoding chromosome 13 open reading frame 31, was first associated with leprosy in a GWAS (Zhang et al., 2009). The meta-analysis by Anderson et al., 2011 reported C13orf31 as a newly confirmed UC risk locus, and the meta-analysis by Umeno et al., 2011 reported it as a common susceptibility locus for both CD and UC.
STAT3 encodes a transcription factor involved in multiple pathways and functions, including the Jak-STAT pathway, neuronal axon guidance, apoptosis, activation of immune responses and Th17 cell differentiation (Egwuagu, 2009). A SNP (rs744166) within the STAT3 gene has been associated with MS. Interestingly, the A allele of rs744166, which tags the MS-protective haplotype, is associated with Crohn disease (Barrett et al., 2008 GWAS), implying that STAT3 represents a shared risk locus for at least two autoimmune diseases. In addition, mutations in STAT3 are known to cause hyperimmunoglobulin E recurrent infection syndrome (HIES) (Holland et al., 2007), a rare autosomal-dominant disorder characterized by elevated immunoglobulin E levels and inflammation. STAT3 was reported as a T1D susceptibility locus by Fung et al., 2009, but this finding was not confirmed by other GWAS and meta-analyses. Recent meta-analyses by Anderson et al., 2011 and Umeno et al., 2011 confirmed the association with UC and CD.
The strawberry notch homologue 2 (SBNO2) gene, located on chromosome 19, encodes a large 170-kDa protein containing canonical DExD/H helicase domains. SBNO2 is a key component of the IL-10-mediated anti-inflammatory pathway (Kasmi et al., 2007). The recent meta-analysis by Umeno et al., 2011 identified SBNO2 as a novel common susceptibility locus for CD and UC.
TNFRSF6B, STMN3, RTEL1, ARFRP1, ZGPAT, SLC2A4RG, ZBTB46 (20q13)
The 20q13 signal resides in a complex telomeric region of LD and harbors the genes for STMN3 (stathmin-like 3), RTEL1 (regulator of telomere elongation helicase 1), TNFRSF6B (tumor necrosis factor receptor superfamily member 6B), ARFRP1 (ADP-ribosylation factor related protein 1), ZGPAT (zinc finger CCCH-type with G patch domain), LIME1 (Lck interacting transmembrane adaptor 1), SLC2A4RG (solute carrier family 2 member 4 regulator) and ZBTB46 (zinc finger and BTB domain containing 46). This locus contributes to both CD and UC susceptibility. The protein product of TNFRSF6B acts as a decoy receptor (DCR3) that prevents FasL-induced cell death, conferring resistance to FasL-dependent apoptosis. Kugathasan et al., 2008, in a GWAS of 1,011 individuals with IBD stratified by age of onset and 4,250 matched controls, identified previously unreported susceptibility loci for pediatric-onset IBD at 20q13 and 21q22. On the basis of differences in serum DCR3 concentration between individuals with IBD with and without the identified SNPs, and of its known biological function, that study highlighted TNFRSF6B as the most plausible candidate within the 20q13 locus.
Inducible costimulator-ligand (ICOS-L) is a member of the B7 family of costimulatory ligands (Coyle & Gutierrez-Ramos, 2001) and shares 19–20% sequence identity with CD80 and CD86. ICOS-L binds to ICOS, a T cell-specific costimulatory molecule homologous to CD28 and CTLA-4. ICOSLG is intimately involved in the proliferation and differentiation of T lymphocytes (Ito et al., 2007). The GWAS of Barrett et al., 2008 first confirmed the association of ICOSLG with CD. The locus on chromosome 21q22, harboring genes including inducible T cell costimulator ligand (ICOSLG), autoimmune regulator (AIRE) and periodic tryptophan protein 2 homologue (PWP2), has been shown to influence susceptibility to both adult and pediatric CD and ulcerative colitis (UC). Imielinski et al., 2009 reported ICOSLG among the common variants at five new loci associated with early-onset inflammatory bowel disease, and the meta-analysis by Franke et al., 2010 further confirmed the association with CD. Coquerelle et al., 2009 demonstrated that trinitrobenzene sulfonic acid-induced murine colitis was ameliorated by anti-CTLA-4 antibodies through the induction of ICOS-high T regulatory cells. Sato et al., 2004 demonstrated high expression of ICOS in activated CD4+ lamina propria mononuclear cells (LPMC) of IBD patients, contributing to the dysregulated immune responses in IBD and suggesting a role for the ICOS-ICOSLG interaction in CD pathogenesis. Henderson et al., 2011, using a genome-wide haplotype-tagging approach in the first family-based association analysis of this kind, reported that variation in ICOSLG influences CD susceptibility and demonstrated that the signal at the 21q22 locus is most likely due to germline variation at the 3′ end of the gene, which calls for deep sequencing of the 3′ UTR to identify causative variants.
This chapter gives an overview of comparative genetic analyses for type 1 diabetes and inflammatory bowel disease. Genome-wide association studies have revolutionized the field of complex disease genetics: for the first time there is real consensus on the role of specific genetic factors underpinning common disorders. However, genome-wide scans can lack coverage in regions that are difficult to genotype, so other loci with reasonable effect sizes may remain to be uncovered.
Larger sample sizes and bigger meta-analyses of GWAS datasets will clearly uncover further loci, albeit with progressively smaller effect sizes. However, it has been predicted that a myriad of rare variants (possibly with larger effects) contribute to disease but cannot be detected on current genotyping platforms. To uncover the remaining “missing heritability” in complex diseases like T1D and IBD (Manolio et al., 2009), investigators in the near future will need to undertake large, high-throughput sequencing efforts involving thousands of DNA samples from affected subjects and a similar number of controls.
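Why ever-larger samples are needed for ever-smaller effects can be made concrete with a rough power calculation. The sketch below is an illustration, not a calculation taken from the cited studies. It assumes a two-sided allelic (allele-count) test under Hardy-Weinberg equilibrium, equal numbers of cases and controls, the conventional genome-wide significance threshold of 5 × 10⁻⁸, 80% power, and a hypothetical control risk-allele frequency of 0.30. The function names are hypothetical, and the normal approximation for comparing two proportions is a simplification.

```python
import math
from statistics import NormalDist


def risk_allele_freq_in_cases(p0, odds_ratio):
    """Case risk-allele frequency implied by control frequency p0 and an allelic OR."""
    odds_cases = odds_ratio * p0 / (1 - p0)
    return odds_cases / (1 + odds_cases)


def individuals_per_group(p0, odds_ratio, alpha=5e-8, power=0.80):
    """Approximate number of cases (and as many controls) for a two-sided allelic test.

    Applies the normal-approximation sample-size formula for two proportions to
    allele counts, then halves the result because each diploid individual
    carries two alleles (this assumes Hardy-Weinberg equilibrium).
    """
    p1 = risk_allele_freq_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)


# Hypothetical control risk-allele frequency of 0.30; smaller ORs need many more subjects
for odds_ratio in (1.5, 1.2, 1.1):
    n = individuals_per_group(0.30, odds_ratio)
    print(f"OR {odds_ratio}: ~{n:,} cases and as many controls")
```

For modest effects the frequency difference between cases and controls is roughly proportional to log(OR), so the required sample grows approximately as 1/(log OR)²: halving the log odds ratio roughly quadruples the number of subjects, before even considering rare variants that genotyping arrays do not capture.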
Novel genomic techniques, such as next-generation DNA sequencing (NGS), have opened new avenues for elucidating genetic defects and have sped up the identification of causative gene variants, making it possible to systematically tackle previously intractable genetic disorders whose causes would be missed by GWAS. NGS is applied through three specific approaches: capture of specific DNA regions (usually GWAS-guided), typically in the 1-10 megabase range; exome sequencing (sequencing of all coding regions of the genome), with capture of about 50 megabases; and sequencing of the entire human genome (~3 billion base pairs), in affected and unaffected individuals. Recent studies have used this approach to identify mutations for Miller syndrome (Roach et al., 2010) and Charcot-Marie-Tooth disease (Lupski et al., 2010). Although powerful, NGS is still very expensive and time-consuming. However, as the cost of whole genome sequencing falls, it will likely become the dominant method for identifying mutations.
Recent GWAS have revealed genetic cross-talk among autoimmune diseases. We report 16 loci common to T1D and IBD, six of which have opposite effects. As predicted, some of the overlapping genes do not always play the same role in T1D, UC and CD. Thus, sequencing will be necessary to discover all common, opposing and distinct variants for autoimmune disorders.
This research was financially supported by a grant from the National Institutes of Health (DP3 DK085708-01) and an Institute Development Award to the Center for Applied Genomics from the Children’s Hospital of Philadelphia.