Open access peer-reviewed chapter

Genome-Wide Association in the Mitochondrial Genome Identifies Two Novel Genes Involved in Diabetes Mellitus Type 2

Written By

Julio Alejandro Valdez, Pedro Mayorga, Rafael Villa Angulo and Carlos Villa Angulo

Submitted: 28 February 2023 Reviewed: 13 March 2023 Published: 05 June 2023

DOI: 10.5772/intechopen.1001477

From the Edited Volume

Advances in Genetic Polymorphisms

Nouha Bouayed Abdelmoula and Balkiss Abdelmoula


Abstract

Diabetes Mellitus Type 2 (DM2) is a complex and multifaceted disorder currently listed as one of the epidemics of the twenty-first century because of its prevalence and the adverse cardiovascular effects it causes. This chapter examines the relationships between base-pair positions in the human mitochondrial genome and type 2 diabetes. The data included 510 complete mitochondrial genomes, of which 437 belonged to individuals with type 2 diabetes and 73 to healthy individuals. An alignment algorithm allowed us to inspect the genomes and choose a region with variable positions for analysis, principal component analysis allowed us to view the structure of the data, and regression analysis identified three base-pair positions associated with DM2. Examination of the genome annotation identified three genes as potential candidates for association, one of which had been linked to type 2 diabetes in previous studies. This chapter offers further evidence of a possible genetic link between type 2 diabetes and metabolic syndrome.

Keywords

  • genome-wide association study (GWAS)
  • type 2 diabetes mellitus (DM2)
  • logistic regression
  • principal component analysis (PCA)
  • risk factors

1. Introduction

Diabetes mellitus is a group of metabolic disorders characterized by chronic hyperglycemia, which can result from defects in insulin secretion, insulin action, or both. Hyperglycemia is accompanied by alterations in lipid and protein metabolism. Long-term sustained hyperglycemia is linked to damage, dysfunction, and failure of many different organs and systems, particularly the heart, blood vessels, nerves, and retina [1].

There are several types of diabetes and other categories of glucose intolerance. Type 1 diabetes mellitus (DM1): Its hallmark is destruction of the β cells, which causes absolute insulin deficiency and a tendency to ketoacidosis. In most cases this destruction is immune-mediated, as evidenced by the presence of antibodies (anti-GAD [anti-glutamic acid decarboxylase], anti-insulin, and anti-islet cell antibodies), with a strong association with specific DQA and DQB alleles of the major histocompatibility complex (HLA). DM1 can also be of idiopathic origin, in which case tests for these antibodies are negative [1]. It usually manifests in childhood or youth (before the age of 30), and the vast majority of cases are of autoimmune origin. It is characterized by a defect in insulin secretion and constitutes 5–10% of all cases of diabetes. It always requires insulin treatment [2].

Type 2 diabetes mellitus (DM2): This is the most prevalent type and is frequently linked to obesity or increased visceral fat. Ketoacidosis rarely develops spontaneously. The defect ranges from predominant insulin resistance accompanied by relative insulin deficiency to a progressive impairment of insulin secretion [1]. It is the most frequent form of diabetes, accounting for 90–95% of cases. It usually appears after the age of 40 and is associated with obesity, which is present in up to 80% of patients with DM2. Treatment consists of diet and exercise, alone or combined with oral antidiabetic agents and/or insulin [2].

Gestational Diabetes Mellitus (GDM): This category comprises glucose intolerance first detected during pregnancy. Hyperglycemia detected before 24 weeks of pregnancy is considered undiagnosed pre-existing diabetes [1]. GDM occurs in 1–14% of pregnant women and is associated with an increased risk of obstetric and perinatal complications [2].

Type 2 diabetes mellitus (DM2) is a complex and multifaceted disorder characterized by chronic hyperglycemia, resulting from the interaction of numerous genetic variants and environmental factors. The rising prevalence of obesity and physical inactivity, together with population aging, has contributed to a significant increase in the number of people worldwide with type 2 diabetes [3]. DM2 is considered one of the epidemics of the twenty-first century, both for its growing magnitude and for its negative impact on cardiovascular disease [4].

DM2 is a heterogeneous disease of multifactorial etiology that combines insulin resistance with inadequate compensatory insulin secretion by pancreatic beta cells; it manifests as chronic hyperglycemia accompanied by disorders of carbohydrate, fat, and protein metabolism. Susceptibility to the disease is determined by the combined effect of genetic and environmental factors [4].

Environment refers to all non-genetic factors that modulate the phenotype. These include environmental factors such as climate, geography, demographics, and socioeconomic status, as well as modifiable lifestyle factors such as diet, smoking, alcohol consumption, and physical activity [4].

The disease is regarded as a polygenic disorder in which each genetic variant confers a partial and additive effect. Only 5–10% of DM2 cases may be attributed to specific genetic defects; these include juvenile-onset diabetes, insulin-resistance syndromes, mitochondrial diabetes, and neonatal diabetes [5]. Examining DM2 susceptibility genes may help in the prediction, prevention, and early treatment of the disease.

Through genome-wide association studies (GWAS), the number of common genetic variants associated with DM2 has increased rapidly [6, 7, 8, 9, 10, 11, 12]. More than 40 genetic loci associated with DM2 have been identified; however, these loci have been identified primarily in European populations [13]. Still, additional genetic factors remain to be discovered, since the identified genetic regions account for only a small portion of the estimated heritability of DM2. The high economic cost and the large number of hypotheses tested are limitations of GWAS [14]. Several studies have examined the viability and efficiency of DNA pooling-based GWAS, which offers considerable time and cost savings [14, 15, 16]. In addition, whole-genome sequencing of multiple samples from a population provides an unprecedented opportunity to comprehensively characterize polymorphic variants in that population [17].

Type 2 diabetes, as mentioned, is a complex illness caused by numerous genetic and environmental factors; family and twin studies estimate its heritability at 22–73%. Recent estimates placed the age-adjusted prevalence of DM2 in adults at 7.6% in European Americans, 14.9% in African Americans, 4.3–8.2% in Asian Americans, and 10.9–15.6% in Hispanic Americans [18, 19, 20, 21]. Candidate gene association studies discovered a link between DM2 and missense variants in PPARG (MIM 601487) and KCNJ11 (MIM 600937), which encode targets of antidiabetic drugs, and implicated in DM2 common variants in genes responsible for Mendelian forms of diabetes [22, 23, 24, 25, 26, 27].

The first GWAS for DM2 [6, 7, 8, 9, 28] and fasting glucose [29] successfully identified multiple associated loci. More recent GWAS meta-analyses for DM2 [30] and quantitative glycemic traits [31] have significantly increased the number of DM2-associated loci in European populations; most of these variants act through defects in beta-cell function rather than insulin action. In total, known DM2-associated variants account for only about 10% of the genetic variation [30, 32], so additional loci and independent signals likely contribute to disease risk.

Less is known about the genetic factors that contribute to type 2 diabetes in populations outside Europe. A new locus (KCNQ1 [MIM 607542]) was discovered through GWAS in Japanese populations [33, 34] and was later found to carry independent associated alleles in people of European ancestry [30]. More recently, GWAS in Chinese [5, 35], Japanese [36], and South Asian [37] populations discovered additional DM2 loci that reached genome-wide significance. To date, GWAS in African Americans have been underpowered to detect new loci [38].

In a recent multiethnic meta-analysis, three DM2 risk loci in populations of European ancestry (GATAD2A/CILP2/PBX4, TH/INS, and SREBF1), one in populations of African ancestry (HMGA2), and one across multiple ethnic groups (BCL2) were identified, together with evidence supporting an allele-based gene score. Hence, multiethnic GWAS of DM2 are expected to reveal additional diabetes-associated genes relevant to numerous ethnic groups [13].


The purpose of this chapter was to perform an association study in the mitochondrial genome to identify base-pair (bp) genomic positions statistically associated with DM2. An alignment analysis enabled visualization and selection of a genomic region with allelic variability. Principal Component Analysis (PCA) was then used to visualize the structure of the data, followed by simple and multiple logistic regression analyses that identified base-pair positions associated with DM2. Finally, inspection of the mitochondrial genome annotation revealed three candidate genes potentially associated with DM2.


2. Methodology

This section describes the database used in this chapter and the techniques used to analyze the DNA sequences.

2.1 Database

We explored genetic variants of type 2 diabetes-associated genes in different populations using the genome-wide association analyses available in the Type 2 Diabetes Knowledge Portal (http://www.type2diabetesgenetics.org/). The search criteria were: patients with DM2, a p-value < 0.05 in the χ² test, and an Odds Ratio > 1.0. The resulting variants were evaluated and identified in NCBI dbSNP (https://www.ncbi.nlm.nih.gov/snp/), and their records were checked in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). Related polymorphisms were explored in the UCSC Genome Browser (https://genome.ucsc.edu/) using the GRCh37/hg19 assembly, and the reference/effect alleles and minor allele frequencies were compared with the information available in genomic databases. The allele frequencies and the heterozygous and homozygous genotype frequencies of the effect allele were queried in the 1000 Genomes data through Ensembl (http://grch37.ensembl.org/index.html). Finally, samples of different tissues from patients with type 2 diabetes were analyzed with the Orange package (https://orange.biolab.si). To identify differences in the expression of these genes across tissues, expression values for muscle, liver, and pancreas were obtained from GEO DataSets (https://www.ncbi.nlm.nih.gov/gds), and the differences were analyzed with the Mann–Whitney U test, considering p < 0.05 significant.
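The chapter does not say how the Mann–Whitney U test was computed. As an illustration only, the following minimal pure-Python sketch implements the two-sided test with the normal approximation, assigning tied values their average rank. The function names `average_ranks` and `mann_whitney_u` are hypothetical, and the sketch omits tie and continuity corrections, so for real analyses a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p), where U is the smaller of U_x and U_y.
    Simplification: no tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of sample x
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```

For example, `mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns U = 0 and p of about 0.0495. With samples this small the normal approximation is rough; exact tests are usually used instead.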

To explore the prevalence and distribution of mitochondrial polymorphisms associated with DM2, we searched the nucleotide database of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/nucleotide), which hosts GenBank, the most extensive collection of genetic and genomic data available, for complete sequences of the mitochondrial chromosome (16,569 base pairs). Shorter fragments were also considered, because most studies on the subject amplify only the control region (D-loop), in fragments smaller than 1000 base pairs. Sequence changes were identified, as well as insertion, deletion, and heteroplasmy sites. We also estimated the number of mitochondrial single nucleotide polymorphisms (mtSNPs), the average number of nucleotide differences between populations, and the numbers of fixed differences, shared polymorphisms, and monomorphic and polymorphic sites between populations. This made it easier to identify the most prevalent polymorphisms in the control region and in the mtDNA coding region.
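The site classification described above can be sketched as follows for two groups of aligned sequences (for example, cases and controls). This is a hypothetical illustration, not the authors' pipeline: `classify_sites` and `mean_between_group_differences` are illustrative names, columns containing gaps or IUPAC ambiguity codes are simply excluded, and a "shared polymorphism" is taken to be any site that is polymorphic in both groups. The number of mtSNPs is then the sum of the fixed, shared, and private categories.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

VALID = set("ACGT")


def classify_sites(group_a: List[str], group_b: List[str]) -> Dict[str, int]:
    """Classify the aligned columns of two groups of equal-length sequences.

    Assumptions (not from the chapter): columns with any non-ACGT character
    are excluded, and a shared polymorphism is polymorphic in both groups.
    """
    counts: Counter = Counter()
    for col in range(len(group_a[0])):
        a = {s[col] for s in group_a}
        b = {s[col] for s in group_b}
        if not (a | b) <= VALID:
            counts["excluded"] += 1
        elif len(a | b) == 1:
            counts["monomorphic"] += 1
        elif len(a) == 1 and len(b) == 1:
            counts["fixed_difference"] += 1  # each group fixed for a different base
        elif len(a) > 1 and len(b) > 1:
            counts["shared_polymorphism"] += 1
        else:
            counts["private_polymorphism"] += 1  # polymorphic in one group only
    return dict(counts)


def mean_between_group_differences(group_a: List[str], group_b: List[str]) -> float:
    """Average number of differing A/C/G/T sites over all cross-group pairs."""
    total = 0
    for s, t in product(group_a, group_b):
        total += sum(1 for x, y in zip(s, t) if x != y and x in VALID and y in VALID)
    return total / (len(group_a) * len(group_b))
```

Excluding gapped and ambiguous columns keeps the categories well defined; a real analysis would need an explicit policy for heteroplasmic (ambiguous) calls.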

The database is split into two FASTA files: the first contains 437 whole mitochondrial genome sequences from patients with type 2 diabetes, and the second contains 73 from healthy individuals. Each sequence is the most common (dominant) whole mitochondrial chromosome sequence of one individual, nominally 16,569 bases long, although there may be slight nucleotide variations between individuals. The shortest sequence is 16,554 bases.

Both files were merged into a single file, with sick individuals placed before healthy ones, resulting in a total of 510 sequences. After alignment with the MEGA software, the sequences had a length of 16,609 positions. A visual analysis of all aligned sequences was then carried out, looking for the region with the greatest variability; this was the region from position 16,170 to 16,410, a total of 241 positions.

In addition, to perform a cluster analysis, the nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T) were encoded as numbers as follows: A = 1, C = 2, G = 3, T = 4, and gap (−) = 5. Letters other than nucleotides, such as R, Y, W, and N, were also present; these were encoded as 9. According to the nomenclature of the International Union of Pure and Applied Chemistry (IUPAC), these letters correspond to:

  • R = GA (purine)

  • Y = TC (pyrimidine)

  • W = AT (weak bonds)

  • N = AGCT (any)
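The encoding described above can be expressed as a small lookup table. In this minimal sketch, the names `encode_sequence` and `encode_alignment` are hypothetical, and mapping *every* character other than A, C, G, T, and gap to 9 (not only R, Y, W, and N) is an assumption that extends the text's rule to the other IUPAC codes.

```python
from typing import List

# Encoding stated in the text: A=1, C=2, G=3, T=4, gap=5; ambiguity codes -> 9.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "-": 5}
AMBIGUOUS_CODE = 9


def encode_sequence(seq: str) -> List[int]:
    """Map one aligned sequence to integers; any other letter becomes 9."""
    return [NUCLEOTIDE_CODES.get(base, AMBIGUOUS_CODE) for base in seq.upper()]


def encode_alignment(seqs: List[str]) -> List[List[int]]:
    """Encode all sequences; rows are individuals, columns are aligned positions."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [encode_sequence(s) for s in seqs]
```

Note that integer codes impose an artificial ordering and spacing (for example, A and T end up farther apart than A and C), which affects distance-based methods such as PCA; one-hot encoding of each position is a common alternative that avoids this.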

In addition, we determined the ethnic groups of the individuals whose DNA sequences were in the database, with the following results: 239 sequences belong to Taiwanese individuals, 62 to Indian individuals, 6 to Italians, 11 to Chinese individuals, and 192 to Japanese individuals.

A total of 510 complete human mitochondrial genomes were used in this study. Of the total genomes, 437 were from people with DM2 and 73 from healthy people. The data was stored in a FASTA format file and the genomes were aligned using the CLUSTALW algorithm implemented in the MAFFT tool [39, 40]. The total length of the alignment was 16,610 nucleotides.

In the aligned genomes, an inspection was carried out to locate the region with the highest variability, resulting in the region from position 16,170 to 16,410, with a length of 241 nucleotides. This region was extracted from the alignment, and the rest of the analysis was performed with these data.
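The region was chosen by visual inspection. As a hypothetical, automated complement (not the authors' method), the sketch below flags variable columns, finds the fixed-width window that contains the most of them, and extracts a region using 1-based inclusive coordinates. With those coordinates, positions 16,170 to 16,410 give exactly 241 columns. All function names are illustrative.

```python
from typing import List, Tuple


def variable_columns(aligned: List[str]) -> List[int]:
    """1-based positions where not all sequences carry the same character."""
    return [i + 1 for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def most_variable_window(aligned: List[str], width: int) -> Tuple[int, int]:
    """(start, end), 1-based inclusive, of the window with most variable columns."""
    n = len(aligned[0])
    flags = [0] * n
    for pos in variable_columns(aligned):
        flags[pos - 1] = 1
    current = best = sum(flags[:width])
    best_start = 1
    for s in range(1, n - width + 1):  # slide the window one column at a time
        current += flags[s + width - 1] - flags[s - 1]
        if current > best:
            best, best_start = current, s + 1
    return best_start, best_start + width - 1


def extract_region(aligned: List[str], start: int, end: int) -> List[str]:
    """Columns start..end (1-based, inclusive) of every aligned sequence."""
    return [s[start - 1:end] for s in aligned]
```

For example, `extract_region(alignment, 16170, 16410)` returns sequences of length 241. Counting variable columns treats a single rare variant the same as a common one; the per-column entropy from Section 2.3 is one way to weight columns by how evenly their characters are distributed.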

2.2 Principal component analysis (PCA)

The main goal of principal component analysis (PCA) is to reduce the dimensionality of a data set, which often consists of a large number of interrelated variables, while retaining as much of the variation as possible. This is accomplished by transforming the data to a new set of variables, the principal components (PCs), which are uncorrelated with one another and ordered so that the first few retain most of the variation present in all of the original variables [41].

In theory, PCA provides the optimal linear transformation of a given data set in the least-squares sense. To obtain the principal components, consider an n-dimensional vector X = [x1, x2, …, xn]ᵀ with mean vector M = E(X) = [m1, m2, …, mn]ᵀ and covariance matrix C = E[(X − M)(X − M)ᵀ]. The eigenvalues λ1, λ2, …, λn and eigenvectors P1, P2, …, Pn of C are then calculated and ordered by magnitude, λ1 ≥ λ2 ≥ ⋯ ≥ λn. To represent the n variables with d < n dimensions, the first d eigenvectors are chosen; P1, P2, …, Pd are known as the principal components [41].

To apply PCA to the sequences, the nucleotides were transformed from the ACGT format to a numerical format: each nucleotide was assigned a value between 1 and 4 (A = 1, C = 2, G = 3, T = 4), and gaps were assigned 5. PCA was applied to the resulting numerical matrix to analyze the structure of the data and look for possible clusters that differentiated sick from healthy individuals.
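The chapter performed PCA in R (Section 3), where `prcomp()` computes the components through a singular value decomposition. For readers who want to see the eigendecomposition steps described above, here is a minimal NumPy sketch under stated assumptions. The function name `pca` is hypothetical, and the input is assumed to be the encoded alignment as a (samples × positions) matrix.

```python
import numpy as np


def pca(data: np.ndarray, d: int = 2):
    """PCA by eigendecomposition of the sample covariance matrix.

    data: (samples x variables) matrix, e.g. the numerically encoded alignment.
    Returns (scores, eigenvalues) for the first d principal components.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)             # subtract the mean vector M
    C = np.cov(centered, rowvar=False)        # covariance matrix C (columns = variables)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :d]                        # first d eigenvectors P_1, ..., P_d
    scores = centered @ P                     # coordinates on PC1, PC2, ...
    return scores, eigvals[:d]
```

The signs of eigenvectors are arbitrary, so a PC1 vs. PC2 plot may appear mirrored relative to another implementation without any change in interpretation.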

2.3 Entropy analysis

Shannon entropy, introduced by Claude E. Shannon, measures the uncertainty of a probability distribution; in decision analysis it is used to measure the contrast between criteria, and this information is then used to make decisions. Shannon showed that a measure H defined over the probabilities pi of a distribution P should satisfy the following properties [42]:

  1. H is a continuous, positive function;

  2. If all pi are equal, pi = 1/n, then H should be a monotonically increasing function of n; and

  3. For all n ≥ 2,

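    $$H(p_1, p_2, \ldots, p_n) = H(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\, H\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right) \tag{1}$$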

Shannon proved that the only function satisfying these conditions (up to a positive constant factor) is:

$$H_{\text{Shannon}} = -\sum_{i=1}^{n} p_i \log p_i \tag{2}$$

where $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$.
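The chapter does not specify how entropy was applied to the sequences. One natural use, shown here as a hypothetical sketch, is to compute Eq. (2) for the character distribution of each alignment column, so that conserved columns score 0 and highly variable columns score high. The sketch uses base-2 logarithms (bits) and treats gaps as an ordinary character; both are assumptions. The function names are illustrative, and the test checks the grouping property of Eq. (1) numerically.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """H = -sum p_i log p_i, using the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def column_entropies(aligned: List[str]) -> List[float]:
    """Entropy (bits) of the character distribution in each alignment column."""
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        n = len(column)
        result.append(shannon_entropy([c / n for c in counts.values()]))
    return result
```

With four equally frequent nucleotides a column reaches the maximum of 2 bits, consistent with property 2 (entropy increases with n for uniform distributions).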

2.4 Regression models

The objective of a linear regression model is to try to explain the relationship between a dependent variable (response variable) and a set of independent variables (explanatory variables) X1, …, Xn. In a simple linear regression model, we try to explain the relationship between the response variable (Y) and a single explanatory variable (X). Using the regression techniques of a variable Y on a variable X, we look for a function that is a good approximation of a cloud of points (xi, yi), by means of a curve [43].

Regression can be univariate or multivariate. Univariate regression identifies the dependence of the response on a single explanatory variable, as represented in Eq. (3) [44].

$$Y = \alpha + \beta X + \varepsilon \tag{3}$$

where Y is the dependent variable; X is the independent variable with coefficient β (the slope of the line, which indicates how Y changes when X increases by one unit); α is a constant (the intercept, i.e., the value Y takes when X is 0); and ε is the error term, which groups a large set of factors, each of which has only a small influence on the response. Because X and Y are random variables, an exact linear relationship between them cannot be established [43]. Multivariate regression, in turn, identifies the dependence on several variables simultaneously, as represented in Eq. (4) [44].

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \tag{4}$$

where ε is the error term, β0 is the intercept, and β1, …, βk are partial regression coefficients; for example, βi (1 ≤ i ≤ k) represents the change in the mean response corresponding to a unit change in xi when the other variables remain constant.

Regression models predict the outcome of the dependent variable from the independent variables, and regression analysis can be extended to handle more complex problems [44]. The objective of multiple linear regression is to estimate the set of coefficients Θ = (β0, β1, …, βk) given the observations X and the targets Y [45].

2.4.1 Linear regression

Linear regression is the most common predictive model for identifying the relationship between variables. It can be simple or multiple. Linear regression is described in Eq. (5) [44].

$$y = X\beta + \varepsilon \tag{5}$$

In Eq. (5), y is the dependent (response) variable, which is continuous in linear regression, and X contains the independent (explanatory) variables, which may be continuous or categorical. The model describes the conditional distribution of y given the explanatory variables [44].

2.4.2 Simple linear regression

Simple linear regression, depicted in Figure 1, is a regression analysis that uses a single independent variable, as described in Eq. (3). Like correlation, it describes the relationship between two variables; unlike correlation, however, it distinguishes between the dependent and independent variables [44].

Figure 1.

Simple linear regression [44].

2.4.3 Multiple linear regression

Multiple or Multivariate Linear Regression (MLR), depicted in Figure 2, is the prediction process with more than one independent (predictor) variable, similar to multivariate analysis, as described in Eq. (4) [44].

Figure 2.

Multiple linear regression [44].

Multiple linear regression is a statistical technique that uses several explanatory variables to predict the outcome of a response variable; its goal is to model the relationship between the explanatory and response variables. A multiple linear regression model with k predictor variables, x1, …, xk, has the form of Eq. (4) [45].

The MLR problem is frequently solved using least squares. If each predictor variable x1, x2, …, xk has n observations, then xij represents the i-th observation of the j-th predictor variable xj. For example, x31 represents the third observation of the first predictor variable x1. For each observation, Eq. (4) can then be written as [45]:

$$y_j = \beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \cdots + \beta_k x_{jk} + \varepsilon_j \tag{6}$$

where 1 ≤ j ≤ n and yj is the j-th target value. The system of n equations can be written in matrix form as y = Xβ + ε, where the design matrix X describes the levels of the predictor variables at each observation and the vector β contains all of the regression coefficients. The regression coefficients are estimated by least squares as follows [45]:

$$\hat{\beta} = \left(X^{T} X\right)^{-1} X^{T} y \tag{7}$$

After obtaining β̂, the estimated values of y and the residuals can be calculated as follows [45]:

$$\hat{y} = X\hat{\beta}, \qquad \epsilon = y - \hat{y} \tag{8}$$

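In this chapter, regression analysis was used to search for SNPs (base-pair positions) statistically associated with DM2. Because the outcome (DM2 vs. healthy) is binary, the logistic form of these models, in which the linear predictor models the log-odds of the outcome, was used in the association analysis.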
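The least-squares estimate of Eq. (7) and the fitted values and residuals of Eq. (8) can be illustrated with a short NumPy sketch. The function name `fit_ols` is hypothetical. The sketch prepends a column of ones so that the first coefficient is the intercept β0, and it solves the normal equations (XᵀX)β = Xᵀy rather than forming the inverse explicitly, which is numerically preferable while giving the same result.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = X beta + eps (Eqs. 7 and 8).

    A column of ones is prepended so beta_hat[0] is the intercept beta_0.
    Returns (beta_hat, y_hat, residuals).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:                       # a single predictor given as a vector
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) beta = X^T y instead of computing (X^T X)^{-1}.
    beta_hat = np.linalg.solve(design.T @ design, design.T @ y)
    y_hat = design @ beta_hat             # Eq. (8): fitted values
    return beta_hat, y_hat, y - y_hat     # residuals eps = y - y_hat
```

For example, fitting x = [0, 1, 2, 3] and y = [1, 3, 5, 7] returns β̂ ≈ [1, 2] with zero residuals. For nearly collinear predictors, `np.linalg.lstsq` is a more robust choice.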

2.5 Risk factors

An odds ratio (OR) is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared with the odds of the outcome occurring in the absence of that exposure. Odds ratios are most commonly used in case-control studies [46].

The odds ratio is used to compare the relative odds of an outcome of interest (e.g., a disease or disorder) given exposure to a particular variable (e.g., a health characteristic or an item of medical history). It can also be used to determine whether a particular exposure is a risk factor for a particular outcome, and to compare the magnitude of several risk factors for that outcome [46].

  • OR = 1: exposure does not affect the odds of the outcome.

  • OR > 1: exposure is associated with higher odds of the outcome.

  • OR < 1: exposure is associated with lower odds of the outcome.

The 95% confidence interval (CI) is used to estimate the precision of the OR: a small CI indicates high precision, whereas a large CI indicates low precision. Note that, unlike the p-value, the 95% CI does not report a measure's statistical significance. In practice, however, a 95% CI that does not overlap the null value (e.g., OR = 1) is often used as a proxy for statistical significance. Nevertheless, it would be incorrect to interpret an OR whose 95% CI spans the null value as evidence that exposure and outcome are unrelated [46].

To define risk factors, each base-pair position found to be significant in the association analysis (regression analysis) was inspected, applying the odds ratio calculation criteria and risk factor definition described in [46]. The statistical significance, OR value, and 95% CI of each variable were examined, and each base-pair position that met all of the following requirements was declared a risk factor:

  1. The p-value of the base-pair position was less than 0.05;

  2. The odds ratio (OR) was not equal to 1; and

  3. The 95% CI of the OR did not contain 1.

Hence, a base-pair position that met these three criteria with OR > 1 was declared a risk factor associated with higher odds of diabetes. Likewise, a position that met the three criteria with OR < 1 was declared a factor associated with lower odds of diabetes.
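The decision rule above can be sketched for a single position summarized as a 2 × 2 table (for example, carriers vs. non-carriers of a given allele among cases and controls; this exposure definition is an assumption, since the chapter does not state how the OR was tabulated). The sketch uses the standard Wald (Woolf) interval on the log odds ratio. The names `odds_ratio_2x2` and `OddsRatioResult` are hypothetical, and all four cells are assumed to be nonzero, because no continuity correction is applied.

```python
import math
from typing import NamedTuple


class OddsRatioResult(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    p_value: float
    label: str


def odds_ratio_2x2(a: int, b: int, c: int, d: int, z_crit: float = 1.959964) -> OddsRatioResult:
    """OR, Wald (Woolf) 95% CI, and two-sided Wald p-value from a 2x2 table.

    a = exposed cases, b = exposed controls, c = unexposed cases,
    d = unexposed controls. Assumes all cells > 0 (no continuity correction).
    """
    or_ = (a * d) / (b * c)
    log_or = math.log(or_)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # standard error of ln(OR)
    low = math.exp(log_or - z_crit * se)
    high = math.exp(log_or + z_crit * se)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided Wald test
    # Decision rule of Section 2.5: p < 0.05, OR != 1, and the CI excludes 1.
    if p < 0.05 and or_ != 1 and not (low <= 1 <= high):
        label = "risk factor (higher odds)" if or_ > 1 else "associated with lower odds"
    else:
        label = "not declared"
    return OddsRatioResult(or_, low, high, p, label)
```

For example, `odds_ratio_2x2(20, 5, 10, 20)` gives OR = 8 with a 95% CI of roughly 2.3 to 27.6. When the p-value and the CI are both derived from the same Wald statistic, as here, criteria 1 and 3 coincide; they can diverge when the p-value comes from a different test, such as the regression model.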


3. Results

The complete mitochondrial genomes of the 510 individuals were aligned with the MAFFT tool, and the result was visualized with the MEGA X software [47]. Visual inspection revealed one region with variability, while the rest of the alignment showed no variability. Figure 3a shows a fragment of the region with variability, and Figure 3b shows a fragment of a region without variability.

Figure 3.

Fragment of the alignment of the mitochondrial genomes of patients with DM2 and healthy individuals. (a) A region with variability; (b) a region with zero variability.

The region between positions 16,170 and 16,410, with a length of 241 nucleotides, was chosen to perform the rest of the analysis.

To analyze the structure of the data and look for possible clusters that would differentiate sick from healthy individuals, PCA was applied to the aligned region of high variability using the R statistical language. Figure 4 shows Principal Component 1 (PC1) plotted against Principal Component 2 (PC2). The two groups are intermixed, with no clear separation between them, which illustrates the complexity of the data.

Figure 4.

Principal Component 1 (PC1) vs. Principal Component 2 (PC2) for cases (DM2) and controls (healthy individuals).

The association analysis was performed in two steps. First, simple logistic regression was applied to each base-pair position of the variable region (241 positions), coding the dependent variable as 1 for all healthy individuals and 0 for all patients with DM2; positions that were statistically significant (p-value < 0.05) were selected. Second, a multiple logistic regression was fitted that included all positions significant in the simple regression, and those that remained significant in the multiple regression were declared associated with DM2. Table 1 shows the positions that were significant in the simple regression, together with their p-values in the multiple regression.

Genomic position (bp) | Simple regression (p-value) | Multiple regression (p-value) | Associated with DM2
--- | --- | --- | ---
16,184 | 0.0038 | 0.0021 | Yes
16,222 | 0.0384 | 0.6592 | No
16,257 | 0.0289 | 0.1037 | No
16,263 | 0.0415 | 0.6937 | No
16,282 | 0.0033 | 0.0064 | Yes
16,289 | 0.0426 | 0.4447 | No
16,344 | 0.0038 | 0.0159 | Yes
16,351 | 0.0438 | 0.1983 | No

Table 1.

Simple and multiple regression results.
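The two-step screening procedure described before Table 1 can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' R code. `logistic_fit` fits a maximum-likelihood logistic regression by Newton–Raphson (iteratively reweighted least squares) and reports two-sided Wald p-values, and `two_step_screen` runs one simple model per position followed by one multiple model on the positions that pass. The chapter does not say how nucleotides entered the model, so a 0/1 predictor per position (for example, carrying a non-consensus allele) is assumed. Following the chapter, y = 1 codes healthy and 0 codes DM2; flipping this coding only changes the sign of the coefficients, not the p-values.

```python
import math
import numpy as np


def logistic_fit(X, y, iters=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS); intercept is column 0.

    Returns (beta, p_values); p-values are two-sided Wald tests.
    No regularization: perfectly separated data will not converge.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))           # fitted probabilities
        info = X.T @ (X * (mu * (1 - mu))[:, None])    # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (mu * (1 - mu))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, p


def two_step_screen(genotypes, y, alpha=0.05):
    """Step 1: simple model per position; step 2: multiple model on the hits.

    genotypes: (individuals x positions) 0/1 matrix (hypothetical coding).
    Returns (column indices significant in step 1, indices significant in step 2).
    """
    G = np.asarray(genotypes, dtype=float)
    step1 = [j for j in range(G.shape[1])
             if G[:, j].max() > G[:, j].min()          # skip invariant columns
             and logistic_fit(G[:, [j]], y)[1][1] < alpha]
    if not step1:
        return [], []
    _, p_multi = logistic_fit(G[:, step1], y)
    step2 = [j for j, p in zip(step1, p_multi[1:]) if p < alpha]
    return step1, step2
```

With 241 positions tested at p < 0.05 in step 1, about 12 false positives would be expected by chance alone, so a multiple-testing correction or replication would strengthen such findings.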


4. Discussion

As shown in Table 1, three positions remained associated with DM2 after multiple regression: 16,184 (p = 0.0021), 16,282 (p = 0.0064), and 16,344 (p = 0.0159). To locate the associated genes, the human mitochondrial genome annotation in the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/) was inspected. Three genes were located within 3000 base pairs (bp) of the associated positions: CYTB, which encodes the cytochrome b protein and contributes to the conversion of energy from food into cellular energy (adenosine triphosphate, ATP); TRNP, which encodes the proline tRNA; and TRNT, which encodes the threonine tRNA. Notably, the TRNT gene was found to be associated with the maternal inheritance of DM2 in Chinese families [48], and in another study, by Momiyama et al., this gene was associated, as in our study, with genomic position 16,184 and was proposed as one of the causes of left ventricular hypertrophy in patients with DM2 from Japanese families [49].
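The annotation lookup can be sketched as an interval-distance query. The gene coordinates below are *assumed* values for the revised Cambridge Reference Sequence (rCRS, GenBank NC_012920.1), 1-based and inclusive, and should be verified against the current NCBI annotation before use. The sketch also assumes that the query positions are already in reference coordinates: positions in a gapped alignment (16,609 columns here, versus 16,569 bp in the reference) would first need to be mapped back. The circularity of the mitochondrial genome is ignored. The names `GENES`, `distance_to_gene`, and `genes_within` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed rCRS (NC_012920.1) coordinates, 1-based inclusive; verify before use.
GENES: Dict[str, Tuple[int, int]] = {
    "MT-CYB": (14747, 15887),
    "MT-TT": (15888, 15953),
    "MT-TP": (15956, 16023),
}


def distance_to_gene(pos: int, start: int, end: int) -> int:
    """0 if pos lies within [start, end]; otherwise bp to the nearest gene end."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def genes_within(pos: int, window: int = 3000) -> List[Tuple[str, int]]:
    """Genes lying within `window` bp of `pos`, nearest first (no wrap-around)."""
    hits = [(name, distance_to_gene(pos, s, e)) for name, (s, e) in GENES.items()]
    return sorted((h for h in hits if h[1] <= window), key=lambda h: h[1])
```

Under these assumed coordinates, all three associated positions fall within a few hundred base pairs of the three genes, well inside the 3000 bp window. A full lookup would load every feature from the annotation file rather than a hand-written table.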


5. Conclusions

In this association study, 510 complete mitochondrial genomes were analyzed: 437 from patients with DM2 and 73 from healthy individuals. A whole-genome alignment made it possible to locate a region with variable allelic content, PCA allowed us to visualize the complexity of the data, and logistic regression analysis identified three base-pair positions associated with DM2. The associated positions were located within 3 kbp of three genes, one of which (TRNT) had been reported by previous studies to be associated with DM2. This study adds new evidence of the association of mitochondrial genomic positions with DM2.


Acknowledgments

This work was developed within the Master’s and Doctorate in Science and Engineering (MYDCI) program offered by the Autonomous University of Baja California. In addition, it was supported by a CONACYT scholarship.

References

  1. 1. de Rojas E, Molina R, Rodríguez C. Definición, clasificación y diagnóstico de la diabetes mellitus. Revista Venezolana de Endocrinología y Metabolismo. 2012;10(1):7-12. Available from: https://www.redalyc.org/articulo.oa?id=375540232003
  2. 2. Mediavilla Bravo JJ. la diabetes mellitus tipo 2, Medicina Integral. Available from: https://www.elsevier.es/es-revista-medicina-integral-63-articulo-la-diabetes-mellitus-tipo-2-13025480
  3. 3. Chen L, Magliano DJ, Zimmet PZ. The worldwide epidemiology of type 2 diabetes mellitus--present and future perspectives. Nature Reviews. Endocrinology. 2011;8(4):228-236. DOI: 10.1038/nrendo.2011.183
  4. 4. Ofarrill LCL, Cuervo AM, Ferrer RL, Valdés MTL. Interacción genoma-ambiente en la diabetes mellitus tipo 2. Acta Médica del Centro. 2018;12(4). Available from: http://www.revactamedicacentro.sld.cu/index.php/amc/article/view/948
  5. Tsai FJ et al. A genome-wide association study identifies susceptibility variants for type 2 diabetes in Han Chinese. PLoS Genetics. 2010;6(2):e1000847. DOI: 10.1371/journal.pgen.1000847
  6. Scott LJ et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316(5829):1341-1345. DOI: 10.1126/science.1142382
  7. Saxena R et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316(5829):1331-1336. DOI: 10.1126/science.1142358
  8. Sladek R et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445(7130):881-885. DOI: 10.1038/nature05616
  9. Zeggini E et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316(5829):1336-1341. DOI: 10.1126/science.1142364
  10. Burton PR et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661-678. DOI: 10.1038/nature05911
  11. Zeggini E et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genetics. 2008;40(5):638-645. DOI: 10.1038/ng.120
  12. Gudmundsson J et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nature Genetics. 2007;39(8):977-983. DOI: 10.1038/ng2062
  13. Saxena R et al. Large-scale gene-centric meta-analysis across 39 studies identifies type 2 diabetes loci. American Journal of Human Genetics. 2012;90(3):410-425. DOI: 10.1016/j.ajhg.2011.12.022
  14. Baum AE et al. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Molecular Psychiatry. 2008;13(2):197-207. DOI: 10.1038/sj.mp.4002012
  15. Galvan A et al. Genome-wide association study in discordant sibships identifies multiple inherited susceptibility alleles linked to lung cancer. Carcinogenesis. 2009;31(3):462-465. DOI: 10.1093/carcin/bgp315
  16. Forstbauer LM et al. Genome-wide pooling approach identifies SPATA5 as a new susceptibility locus for alopecia areata. European Journal of Human Genetics. 2012;20(3):326-332. DOI: 10.1038/ejhg.2011.185
  17. Wong LP et al. Deep whole-genome sequencing of 100 southeast Asian Malays. American Journal of Human Genetics. 2013;92(1):52-66. DOI: 10.1016/j.ajhg.2012.12.005
  18. Cowie CC et al. Prevalence of diabetes and high risk for diabetes using A1C criteria in the U.S. population in 1988–2006. Diabetes Care. 2010;33(3):562-568. DOI: 10.2337/dc09-1524
  19. Díaz-Apodaca BA, Ebrahim S, McCormack V, de Cosío FG, Ruiz-Holguín R. Prevalence of type 2 diabetes and impaired fasting glucose: Cross-sectional study of multiethnic adult population at the United States-Mexico border. Revista Panamericana de Salud Pública. 2010;28(3):174-181. DOI: 10.1590/s1020-49892010000900007
  20. Lee JW, Brancati FL, Yeh HC. Trends in the prevalence of type 2 diabetes in Asians versus whites: Results from the United States National Health Interview Survey, 1997-2008. Diabetes Care. 2011;34(2):353-357. DOI: 10.2337/dc10-0746
  21. Bowden DW et al. Review of the Diabetes Heart Study (DHS) family of studies: A comprehensively examined sample for genetic and epidemiological studies of type 2 diabetes and its complications. The Review of Diabetic Studies. 2010;7(3):188-201. DOI: 10.1900/rds.2010.7.188
  22. Altshuler D et al. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nature Genetics. 2000;26(1):76-80. DOI: 10.1038/79216
  23. Gloyn AL, Hashim Y, Ashcroft SJ, Ashfield R, Wiltshire S, Turner RC. Association studies of variants in promoter and coding regions of beta-cell ATP-sensitive K-channel genes SUR1 and Kir6.2 with Type 2 diabetes mellitus (UKPDS 53). Diabetic Medicine. 2001;18(3):206-212. DOI: 10.1046/j.1464-5491.2001.00449.x
  24. Sandhu MS et al. Common variants in WFS1 confer risk of type 2 diabetes. Nature Genetics. 2007;39(8):951-953. DOI: 10.1038/ng2067
  25. Winckler W et al. Evaluation of common variants in the six known maturity-onset diabetes of the young (MODY) genes for association with type 2 diabetes. Diabetes. 2007;56(3):685-693. DOI: 10.2337/db06-0202
  26. Winckler W et al. Association of common variation in the HNF1alpha gene region with risk of type 2 diabetes. Diabetes. 2005;54(8):2336-2342. DOI: 10.2337/diabetes.54.8.2336
  27. Winckler W et al. Association testing of variants in the hepatocyte nuclear factor 4alpha gene with risk of type 2 diabetes in 7,883 people. Diabetes. 2005;54(3):886-892. DOI: 10.2337/diabetes.54.3.886
  28. Steinthorsdottir V et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nature Genetics. 2007;39(6):770-775. DOI: 10.1038/ng2043
  29. Bouatia-Naji N et al. A polymorphism within the G6PC2 gene is associated with fasting plasma glucose levels. Science. 2008;320(5879):1085-1088. DOI: 10.1126/science.1156849
  30. Voight BF et al. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nature Genetics. 2010;42(7):579-589. DOI: 10.1038/ng.609
  31. Dupuis J et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nature Genetics. 2010;42(2):105-116. DOI: 10.1038/ng.520
  32. So HC, Gui AH, Cherny SS, Sham PC. Evaluating the heritability explained by known susceptibility variants: A survey of ten complex diseases. Genetic Epidemiology. 2011;35(5):310-317. DOI: 10.1002/gepi.20579
  33. Yasuda K et al. Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitus. Nature Genetics. 2008;40(9):1092-1097. DOI: 10.1038/ng.207
  34. Unoki H et al. SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in East Asian and European populations. Nature Genetics. 2008;40(9):1098-1102. DOI: 10.1038/ng.208
  35. Shu XO et al. Identification of new genetic risk variants for type 2 diabetes. PLoS Genetics. 2010;6(9):e1001127. DOI: 10.1371/journal.pgen.1001127
  36. Yamauchi T et al. A genome-wide association study in the Japanese population identifies susceptibility loci for type 2 diabetes at UBE2E2 and C2CD4A-C2CD4B. Nature Genetics. 2010;42(10):864-868. DOI: 10.1038/ng.660
  37. Kooner JS et al. Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. Nature Genetics. 2011;43(10):984-989. DOI: 10.1038/ng.921
  38. Lettre G et al. Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genetics. 2011;7(2):e1001300. DOI: 10.1371/journal.pgen.1001300
  39. Kuraku S, Zmasek CM, Nishimura O, Katoh K. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Research. 2013;41(W1):W22-W28. DOI: 10.1093/nar/gkt389
  40. Katoh K, Rozewicki J, Yamada KD. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics. 2017;20(4):1160-1166. DOI: 10.1093/bib/bbx108
  41. Mateos-Valenzuela AG, González-Macías ME, Ahumada-Valdez S, Villa-Angulo C, Villa-Angulo R. Risk factors and association of body composition components for lumbar disc herniation in Northwest, Mexico. Scientific Reports. 2020;10(1):18479. DOI: 10.1038/s41598-020-75540-5
  42. Delgado A, Huamani A, Brillitt B. Applying Shannon Entropy to Analise Health System Level by departments in Peru. In: 2018 IEEE XXV International Conference on Electronics, Electrical Engineering and Computing (INTERCON); 8-10 Aug. 2018. pp. 1-4. DOI: 10.1109/INTERCON.2018.8526435
  43. Limeres CC. Regresión lineal simple [Simple linear regression]. Available from: http://eio.usc.es/eipc1/BASE/BASEMASTER/FORMULARIOS-PHP-DPTO/MATERIALES/Mat_50140116_Regr_%20simple_2011_12.pdf [Accessed: 19 January 2022]
  44. Kavitha S, Varuna S, Ramya R. A comparative analysis on linear regression and support vector regression. In: 2016 Online International Conference on Green Engineering and Technologies (IC-GET); 19 Nov. 2016. pp. 1-5. DOI: 10.1109/GET.2016.7916627
  45. Zhang Z, Li Y, Li L, Li Z, Liu S. Multiple linear regression for high efficiency video intra coding. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 12-17 May 2019. pp. 1832-1836. DOI: 10.1109/ICASSP.2019.8682358
  46. Szumilas M. Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry. 2010;19(3):227-229. Available from: https://pubmed.ncbi.nlm.nih.gov/20842279
  47. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution. 2018;35(6):1547-1549. DOI: 10.1093/molbev/msy096
  48. Li K, Wu L, Liu J, Lin W, Qi Q, Zhao T. Maternally inherited diabetes mellitus associated with a novel m.15897G>A mutation in mitochondrial tRNA(Thr) gene. Journal of Diabetes Research. 2020;2020:2057187. DOI: 10.1155/2020/2057187
  49. Momiyama Y et al. A mitochondrial DNA variant associated with left ventricular hypertrophy in diabetes. Biochemical and Biophysical Research Communications. 2003;312(3):858-864. DOI: 10.1016/j.bbrc.2003.10.195