Mitochondrial Haplogroups Associated with Japanese Parkinson’s Patients

Introduction
Mitochondria are essential cytoplasmic organelles generating cellular energy in the form of adenosine triphosphate (ATP) by oxidative phosphorylation. Most cells contain hundreds of mitochondria, each of which has several mitochondrial DNA (mtDNA) copies, so each cell contains thousands of mtDNA copies. mtDNA has a very high mutation rate, and when a mutation occurs the cell initially contains a mixture of wild-type and mutant mtDNAs, a situation known as heteroplasmy. If the percentage of mutant mtDNA increases enough that the cell's ATP production falls below the level needed for normal cell function, disease symptoms appear and become progressively worse. A wide variety of diseases, such as Parkinson's disease (PD), Alzheimer's disease (AD), and cancer, are reportedly linked to mitochondrial dysfunction, and it is clear that mitochondrial diseases encompass an extraordinary assemblage of clinical problems (Wallace 1999; Vila and Przedborski 2003; Taylor and Turnbull 2005). Although mtDNA mutations have been reported to be related both to a wide variety of diseases and to aging (Lin et al. 1992; Schoffner et al. 1993; Kosel et al. 1994; Mayr-Wohlfart et al. 1996; Schnopp et al. 1996; Simon et al. 2000; Tanaka et al. 2002; Dawson and Dawson 2003; Ross et al. 2003; Lustbader et al. 2004; Niemi et al. 2005; Alexe et al. 2007; Fuku et al. 2007; Chinnery et al. 2008; Kim et al. 2008; Maruszak et al. 2008; Feder et al. 2008), there are few reports regarding the relations between all mtDNA mutations and either disease patients or centenarians. The previous reports have also focused on mutations causing amino acid replacements in mitochondrial proteins and, although mitochondrial functions can of course be affected directly by amino acid replacements, they can also be affected indirectly by mutations in mtDNA control regions. It is therefore important to examine the relations between all mtDNA mutations and disease patients or centenarians.
In the article reported here the relations between Japanese PD patients and their mitochondrial single nucleotide polymorphism (mtSNP) frequencies were analyzed using a method based on radial basis function (RBF) networks (Poggio and Girosi 1990; Wu and McLarty 2000) and a modified method based on RBF network predictions (Takasaki 2009). In addition, the relations between the haplogroups of the PD patients and those of the other four classes of people (centenarians, AD patients, type 2 diabetes (T2D) patients, and healthy non-obese young males) are also described using the same analysis method. The results described here are quite different from those reported previously (Saxena et al. 2006; Alexe et al. 2007; Fuku et al. 2007; Bilal et al. 2008).

RBF-based method of mtSNP classification
An RBF network is an artificial neural network used in supervised learning problems such as regression, classification, and time series prediction. In supervised learning a function is inferred from examples (a training set) that a teacher supplies. The elements in the training set are paired values of the independent (input) variable and the dependent (output) variable. The RBF network shown in Fig. 1 was trained on a training set in which the mtSNPs of the PD patients were regarded as correct and the mtSNPs of the other four classes of people (centenarians, AD patients, T2D patients, and healthy non-obese young males) were regarded as incorrect. Similarly, in the mtSNP classification for the centenarians the mtSNPs of the centenarians are regarded as correct and those of the other four classes are regarded as incorrect. The mtSNPs of the AD patients, T2D patients, and healthy non-obese young males were also classified this way. The mitochondrial genome sequences of the PD patients were partitioned into two sets: training data comprising the sequences of 64 of the PD patients, and validation data comprising the sequences of the other 32 PD patients. The training and validation steps are described in detail elsewhere (Takasaki et al. 2006).
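The one-class-against-the-rest training described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian basis function, the hand-picked centers, the width `sigma`, the least-squares fit of the output weights, and the toy sequences are all assumptions made for illustration. Only the numeric encoding of A, G, C, and T as 1, 2, 3, and 4 and the 1/0 output labels come from the description of Fig. 1.

```python
import numpy as np

# Encoding stated in the Fig. 1 caption: A, G, C, T -> 1, 2, 3, 4
BASE_CODE = {"A": 1, "G": 2, "C": 3, "T": 4}

def encode(seq):
    """Convert the nucleotides at the polymorphic sites into a numeric vector."""
    return np.array([BASE_CODE[b] for b in seq], dtype=float)

def rbf_design(X, centers, sigma):
    """Gaussian basis activations: one row per individual, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(X, y, centers, sigma):
    """Least-squares output weights for fixed centers (an assumed training rule)."""
    H = rbf_design(X, centers, sigma)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def predict(X, centers, sigma, w):
    """Weighted sum f(X), clipped to [0, 1] so it can be read as a probability."""
    return np.clip(rbf_design(X, centers, sigma) @ w, 0.0, 1.0)

# Toy data (hypothetical sequences, not taken from the study)
target = ["AGCT", "AGCC", "AGCT"]   # target class, labelled 1 ("correct")
others = ["TTGA", "TCGA", "TTGG"]   # all other classes, labelled 0 ("incorrect")
X = np.stack([encode(s) for s in target + others])
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)

centers = X[[0, 3]]                 # one hand-picked center per label (assumption)
w = fit_rbf(X, y, centers, sigma=2.0)
probs = predict(X, centers, 2.0, w)
```

Repeating the fit with a different class labelled 1 gives the corresponding classifier for centenarians, AD patients, T2D patients, or healthy non-obese young males.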

Modified classification method based on probabilities predicted by the RBF network
Since an RBF network can predict the probabilities that persons with certain mtSNPs belong to certain classes (e.g., PD patients, centenarians, AD patients, T2D patients, or healthy non-obese young males), these predicted probabilities are used to identify mtSNP features. By examining the relations between individual mtSNPs and the persons with high predicted probabilities of belonging to one of these classes, we are able to identify other mtSNPs useful for distinguishing between the members of different classes. A modified classification method based on the probabilities predicted by the RBF network was thus carried out in the following way (Takasaki 2009).
1. Select the analysis target class (i.e., PD patients, centenarians, AD patients, T2D patients, or healthy non-obese young males).
2. Rank individuals according to their predicted probabilities of belonging to the target class.
3. Either select individuals whose probabilities are greater than a certain value or select the desired number of individuals from the top, and set them as a modified cluster.
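The ranking and selection steps can be sketched as follows. The function name `modified_cluster`, the individual IDs, and the probability values are hypothetical and serve only to show the two selection modes (probability cut-off or top-N).

```python
def modified_cluster(probabilities, threshold=None, top_n=None):
    """Rank individuals by predicted probability and keep a 'modified cluster'.

    probabilities: dict mapping an individual ID to the predicted probability
    of belonging to the target class. Exactly one of threshold / top_n is used.
    """
    if (threshold is None) == (top_n is None):
        raise ValueError("give exactly one of threshold or top_n")
    # Step 2: rank individuals from highest to lowest predicted probability
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    # Step 3: keep those above the cut-off, or the desired number from the top
    if threshold is not None:
        return [pid for pid, p in ranked if p > threshold]
    return [pid for pid, _ in ranked[:top_n]]

# Hypothetical predicted probabilities for the selected target class (step 1)
probs = {"P01": 0.63, "P02": 0.30, "P03": 0.529, "P04": 0.077, "P05": 0.625}
top3 = modified_cluster(probs, top_n=3)
above_half = modified_cluster(probs, threshold=0.5)
```

With these toy values both selection modes return the same three individuals, ordered from highest to lowest probability.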
Fig. 1. RBF network representation (input layer, hidden layer, output layer) of the relations between individual mtSNPs and the PD patients. The input layer is the set of mtSNP sequences represented numerically (A, G, C, and T are converted to 1, 2, 3, and 4). The hidden layer classifies the input vectors into several clusters depending on the similarities of individual input vectors. The output layer is determined depending on which analysis is carried out. In the case of PD patients, 1 corresponds to PD patients and 0 corresponds to the other four classes of people. In the case of centenarians, 1 corresponds to centenarians and 0 corresponds to the other four classes of people. The AD patients, T2D patients, and healthy non-obese young males are handled in a similar way. X_i: i-th input vector; TN: maximum number of vectors (in this example, TN = 320 (64 x 5)); T_SNP: maximum number of mtSNPs (in this example, T_SNP = 562); M_m: the location vector; m: the number of basis functions; µ: basis function; σ: standard deviation; w_i: i-th weighting variable; f(X): weighted sum function.

Associations between haplogroups and the mtSNPs of the PD patients
When the mtSNPs of the PD patients were classified by the RBF-based method described above, ten mtSNP clusters were obtained. The average predicted probabilities of belonging to the PD class for these clusters were respectively 63%, 62.5%, 52.9%, 30%, 29.4%, 15.4%, 7.7%, 4.3%, 3.4%, and 0%. Then the 15 individuals with the highest probabilities of becoming PD patients were selected using the modified classification method, and their nucleotide distributions at individual mtDNA positions were examined. After that, the relations between Asian/Japanese haplogroups and the mtSNPs for the PD patients were examined (Herrnstadt et al. 2002; Kong et al. 2003; Tanaka et al. 2004). The associations between the haplogroups and mtSNPs for the PD patients are shown in Fig. 2. The PD patients were associated with haplogroups L3-M-M7b2 (33%), L3-M-G2a (27%), L3-N-B4e (13%), B5b (13%), and N9a (7%).
Fig. 2. Associations between haplogroups and the mtSNPs of the 15 persons with the highest probabilities of becoming PD patients. This description of associations is based on the phylogenetic tree for macrohaplogroups M and N described in Tanaka et al. [26]. The locus of mtDNA polymorphism (mmm), the normal nucleotide (rCRS) at position mmm (N_N), the mtDNA mutation at that position (N_M), the number of mtDNA mutations at mmm in individual clusters (Y), and the number of normal nucleotides at mmm in individual clusters (X) are expressed as mmmN_N>N_M (Y/X). For example, 489T>C (10/5) indicates the mtDNA locus (489), the normal nucleotide at that position (T), the mutation at that position (C), the number of mutations (10), and the number of normal nucleotides in the cluster (5). (B) Centenarians, (C) AD patients, (D) T2D patients, (E) Non-obese young males.
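The mmmN_N>N_M (Y/X) notation of Fig. 2 is regular enough to be read mechanically. The following is a minimal sketch of such a parser; the names `SiteCount` and `parse_site` are hypothetical, and the derived `mutant_fraction` is an added convenience rather than a quantity reported in the figure.

```python
import re
from typing import NamedTuple

class SiteCount(NamedTuple):
    position: int      # mtDNA locus (mmm)
    reference: str     # normal (rCRS) nucleotide, N_N
    mutant: str        # mutated nucleotide, N_M
    n_mutant: int      # Y: individuals in the cluster carrying the mutation
    n_reference: int   # X: individuals in the cluster carrying the normal base

    @property
    def mutant_fraction(self):
        """Share of the cluster carrying the mutation at this locus."""
        return self.n_mutant / (self.n_mutant + self.n_reference)

_PATTERN = re.compile(r"^(\d+)([AGCT])>([AGCT])\s*\((\d+)/(\d+)\)$")

def parse_site(label):
    """Parse a label such as '489T>C (10/5)' into its five components."""
    m = _PATTERN.match(label.strip())
    if m is None:
        raise ValueError(f"not in mmmN>N (Y/X) form: {label!r}")
    pos, ref, mut, y, x = m.groups()
    return SiteCount(int(pos), ref, mut, int(y), int(x))

site = parse_site("489T>C (10/5)")
```

For the example given in the caption, 10 of the 15 selected individuals carry the mutation, so `site.mutant_fraction` is 10/15.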
To compare the mitochondrial haplogroups of the PD patients with those of other classes of people, we used the same modified method to examine the relations between the other four classes (i.e., centenarians, AD patients, T2D patients, and non-obese young males) and their mtSNPs. The associations between the haplogroups and mtSNPs for these four classes of Japanese people are shown in Fig. 2B to 2E. The centenarians were associated with haplogroups L3-M-M7b2 (40%), L3-M-D-D4b2a (27%), and L3-N-B5b (20%); the AD patients were associated with haplogroups L3-M-G2a (53%), L3-N-B4c1 (20%), and N9b1 (27%); the T2D patients were associated with haplogroups L3-M-D-D4 (13%), L3-M-M8a1 (13%), G (13%), L3-N-B5b (20%), and F1 (13%); and the healthy non-obese young males were associated with haplogroups L3-M-D-D4g (33%), D4b2a (20%), and D4b1b (27%). The relations among the haplogroups for these five classes of people are listed in Table 1.
Table 1. The relations among the haplogroups for five classes of people
In Table 1 we see that haplogroup M7b2 was common to PD patients and centenarians; G2a was common to PD patients and AD patients; and B5b was common to PD patients, centenarians, and T2D patients. No other class shares the full set of haplogroups found for the PD patients, and B4e and N9a were found only in the PD patients. The haplogroups of the PD patients are therefore different from those of the other four classes of people. These results are therefore considered new findings.
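The overlaps among the five classes can be checked directly from the haplogroup lists given in the text. The sketch below uses only the terminal haplogroup labels (the L3-M, L3-N, and L3-M-D prefixes are dropped) and compares labels by exact match, which is an assumption: it does not treat, for example, D4 and D4g as related clades.

```python
# Haplogroups reported in the text for each class (terminal labels only)
haplogroups = {
    "PD": {"M7b2", "G2a", "B4e", "B5b", "N9a"},
    "centenarians": {"M7b2", "D4b2a", "B5b"},
    "AD": {"G2a", "B4c1", "N9b1"},
    "T2D": {"D4", "M8a1", "G", "B5b", "F1"},
    "young_males": {"D4g", "D4b2a", "D4b1b"},
}

# Haplogroups each other class shares with the PD patients
shared_with_pd = {
    name: haplogroups["PD"] & groups
    for name, groups in haplogroups.items()
    if name != "PD"
}

# Haplogroups found only in the PD patients
others = set().union(*(g for n, g in haplogroups.items() if n != "PD"))
pd_only = haplogroups["PD"] - others
```

Under exact-label matching, the PD patients share M7b2 and B5b with the centenarians, G2a with the AD patients, B5b with the T2D patients, and nothing with the non-obese young males, while B4e and N9a appear only in the PD list.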

Comparison with previous works for T2D patients and Centenarians
Although there is no report regarding the relations between mtSNP haplogroups and PD patients, there have been a few studies concerning the relations between mtSNP haplogroups and T2D patients or centenarians. The differences between those previous works and the work reported here are therefore discussed on the basis of the mtSNP haplogroups obtained. Fuku et al. (2007) reported, in a large-scale association study using hospital-based sampling data, that mitochondrial haplogroup F was associated with a significantly increased risk of type 2 diabetes mellitus (T2DM) in Japanese individuals (odds ratio 1.53, P = 0.0032). They indicated that there were three mtSNPs in haplogroup F: 3970C>T, 13928G>C, and 10310G>A. In the present analysis, haplogroup F1 was associated with approximately 13% of the T2D patients (Fig. 2D). The other haplogroups associated with the T2D patients were B5b (20%), M8a1 (13%), D4 (13%), and G (13%) (Fig. 2D and Table 1). There were therefore big differences between the analyses of Fuku et al. (2007) and the results reported here. Fuku et al. (2007) found the significantly increased risk of T2DM in haplogroup F, whereas the haplogroup most strongly associated with the T2D patients in the present results was B5b. Although Fuku et al. (2007) indicated that haplogroup F carried the increased risk of T2DM, F has four sub-haplogroups: F1, F2, F3, and F4. In the work reported here, only haplogroup F1 was obtained by the modified clustering method. Haplogroup F in Fuku et al. (2007) was characterized by three mtSNPs (3970C>T, 13928G>C, and 10310G>A), whereas haplogroup F1 obtained by the proposed method was characterized by many mtSNPs (3970C>T, 13928G>C, 16304T>C, 6392T>C, 10310G>A, 6962G>A, 10609T>C, 12406G>A, and 12882C>T) (Tanaka et al., 2004) (Fig. 2D). Furthermore, as Saxena et al. (2006) reported that there was no evidence of association between common mtDNA polymorphisms and type 2 diabetes mellitus, the results obtained may indicate new findings for T2D patients.
In addition, Alexe et al. (2007) reported the associations between Asian haplogroups and the longevity of Japanese people using the same GiiB data. They showed the enrichment of the longevity phenotype in mtDNA haplogroups D4b2b, D4a, and D5 in the Japanese population using statistical techniques (t-test and P-value). However, the results here showed that the haplogroups M7b2, D4b2a, and B5b were associated with Japanese centenarians. The two methods therefore have no haplogroup in common. Alexe et al. (2007) showed that haplogroup D5 was characterized by mtSNPs 11944T>C, 12026A>G, 1107T>C, 5301A>G, 10397A>G, and 752C>T, whereas none of these mtSNPs appeared in the present analysis. Although they reported that centenarian enrichment was not found in haplogroup D4b2a, the present results showed that D4b2a was characterized by many mtSNPs with a frequency of 27% (Fig. 2B).
Although Alexe et al. (2007) described that there was no haplogroup having mtSNPs significantly enriched in centenarians other than the D mega-group in macrohaplogroup M, the present analysis indicated that haplogroup M7b2 was characterized by many mtSNPs (Figs. 2B and 3B). They also reported that there was no enriched haplogroup for centenarians in macrohaplogroup N, whereas haplogroup B5b obtained by the proposed method also had many mtSNPs enriched in centenarians (Figs. 2B and 3B). Bilal et al. (2008) reported that haplogroup D4a was a marker for extreme longevity in Japan by analyzing the complete mtDNA sequences from 112 Japanese semi-supercentenarians (aged over 105 years) combined with previously published data. These semi-supercentenarians were also examined using the proposed method. Since the predicted probabilities of individual clusters for the semi-supercentenarians were lower than those of the centenarians, 43 individuals with predicted probabilities over 46% (the average is 54%) were selected. The haplogroups obtained were D4a (30%), B4c1a (14%), M7b2 (12%), F1 (9%), M1 (7%), and B5b (2%), as shown in Fig. 4. The most frequent haplogroup, D4a, was the same as the marker described by Bilal et al. (2008). However, there are other haplogroups indicating the characteristics of semi-

Differences between statistical technique and the modified RBF method
The haplogroups of the PD patients were obtained by the modified RBF method, and there are clear differences between the previously reported statistical techniques and the method described here. As the previously reported methods analyzed the relations between mtSNPs and Japanese PD patients, centenarians, AD patients, T2D patients, or semi-supercentenarians using standard statistical techniques (Alexe et al., 2007; Fuku et al., 2007; Bilal et al., 2008), they could not indicate mutual relations among the other classes of people (centenarians, AD patients, T2D patients, and healthy non-obese young males). On the other hand, the proposed method was able to show differences and mutual relations among these classes of people. In addition, the prediction probabilities of associations between mtSNPs and these classes of people cannot be obtained by the statistical techniques used in the previous methods, whereas the proposed method is able to compute them by learning the mtSNPs of the individual classes.
It is considered that the relations among individual mtSNPs for these classes of people should be analyzed as mutual mtSNP connections across the entire set of mtSNPs. A learning method, an RBF network, was therefore adopted for extracting individual characteristics from the entire set of mtSNPs, whereas the previous methods used standard statistical techniques. The differences between the standard statistical technique and the proposed method are listed in Table 2. In the statistical technique, the analysis of odds ratios or relative risks is based on the relative relations between target and control data at each polymorphic mtDNA locus. In the modified RBF method, on the other hand, clusters indicating predicted probabilities are examined on the basis of the RBF network using correct and incorrect data for all the polymorphic mtDNA loci. The statistical technique determines the characteristics of haplogroups using independent mtDNA polymorphisms that indicate high odds ratios, whereas the modified RBF method determines them by checking individuals with high predicted probabilities. This means that the statistical technique uses the results for independent mutation positions, whereas the modified RBF method uses the results for all mutation positions together. Given these differences, which method is better remains to be determined by future research. Furthermore, the method described here may be usable in the initial diagnosis of various diseases or in the assessment of longevity on the basis of the individual predicted probabilities.
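For comparison, the per-locus calculation that underlies the statistical technique can be sketched as an odds ratio from a 2 x 2 table of mutant and normal nucleotide counts in target and control groups. This is a generic textbook illustration, not the procedure of any of the cited studies; the function name `odds_ratio`, the Woolf (log-normal) 95% confidence interval, the 0.5 continuity correction for zero cells, and the control counts are all assumptions.

```python
import math

def odds_ratio(case_mut, case_ref, ctrl_mut, ctrl_ref, correction=0.5):
    """Odds ratio at one polymorphic locus with an approximate 95% interval.

    Counts are individuals carrying the mutant / normal nucleotide in the
    target (case) and control groups. If any cell is zero, `correction` is
    added to every cell so that the ratio stays finite.
    """
    cells = [case_mut, case_ref, ctrl_mut, ctrl_ref]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), Woolf's method
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, (low, high)

# Target counts follow the 489T>C (10/5) example; control counts are hypothetical
ratio, (low, high) = odds_ratio(case_mut=10, case_ref=5, ctrl_mut=20, ctrl_ref=40)
```

Each locus is evaluated on its own here, which is the contrast drawn in the text: the RBF-based method instead scores each individual from all polymorphic loci at once.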

Table 2. Differences between the statistical technique and the proposed (modified RBF) method