Open access peer-reviewed chapter

The Analysis on the Effects of COMT, DRD2, PER3, eNOS, NR3C1 Functional Gene Variants and Methylation Differences on Behavioral Inclinations in Addicts through the Decision Tree Algorithm

Written By

Inci Zaim Gokbay, Yasemin Oyaci and Sacide Pehlivan

Submitted: 02 July 2022 Reviewed: 05 July 2022 Published: 29 August 2022

DOI: 10.5772/intechopen.106313

From the Edited Volume

Numerical Simulation - Advanced Techniques for Science and Engineering

Edited by Ali Soofastaei


Abstract

The aim of this study was to analyze the effects of Catechol-O-methyltransferase (COMT), Dopamine Receptor D2 (DRD2), Period Circadian Regulator 3 (PER3), Endothelial Nitric Oxide Synthase (eNOS), and Nuclear Receptor Subfamily 3 Group C Member 1 (NR3C1) functional gene variants on the possible inclinations of individuals with Substance Use Disorder (SUD) by using the decision tree algorithm, and to evaluate the similarities with former studies. The decision tree classification was structured by validating the effects of the genetic and epigenetic sequences of gene variants through 10-fold cross-validation under the headings of criminal history, continuum of substance use, former polysubstance abuse, attempted suicide, and inpatient treatment. Performance criteria were evaluated through accuracy, sensitivity, and precision values and their similarity with former studies. The branching structure of gene variants obtained by tree classification is consistent with the studies in the literature. Our study is the first to show that further comprehensive studies with data from different ethnic groups are needed to increase predictive accuracy, and to state that machine learning may guide the prediction of the effect of gene variants on behavior in the future.

Keywords

  • substance use disorder
  • COMT
  • DRD2
  • PER3
  • eNOS
  • NR3C1 gene variants
  • decision tree analysis
  • 10-fold cross validation

1. Introduction

Adolescence and young adulthood, which the World Health Organization (WHO) defines as the transition from childhood to adulthood, play a vital role in the lives of individuals. These periods are stages of mental and physical development in which a tendency toward substance abuse, nutritional disorders, mental problems, and risky behaviors is common. Gorker et al. [1] reported that patients referred to a child and adolescent psychiatry clinic were diagnosed, in that order, with anxiety disorder, mood disorder, mental retardation, expulsion disorder, disruptive behavior disorder, borderline intellectual functioning, communication disorder, somatoform disorder, and tic disorders. The same study states that these diagnoses are accompanied by mental retardation, adjustment disorder, attention deficit and hyperactivity, as well as substance use disorder (SUD). Studies show that adolescents and young people prefer substances with addiction potential for their euphoric effect, which is generally perceived as positive, and for relieving negative states through pain-reducing, stress-relieving, and relaxing effects [2]. However, chronic use of a limited number of substance classes that begins with such justifications causes physical, psychological, and behavioral changes in humans. SUD emerges under the influence of multiple factors, and these factors also accompany its persistence. Environmental factors, including family, peer relations, neighborhood relations, and the physical conditions of the neighborhood and educational environment, play various roles in addiction-related processes such as molecular pathways, cellular mechanisms, tolerance, relapse, and substance seeking.

The family is closely involved in an individual's developmental behavior disorders through its cultural roots and family attitudes. Biological effects can be classified under many factors, such as genetic, physiological, temperamental, and impulsive behavior tendencies. Nonetheless, the family is the first social unit to which the individual belongs. The baby is born into a family and learns the first social rules from the family. For this reason, the family is said to have an impact on both the biological and the sociological personality development of individuals. Biological disposition, namely temperament, is transferred by genetic heritage, while sociological disposition, namely character, is related to upbringing attitudes, attachment, and cultural heritage. Studies have shown that in SUD, having a substance-using family member plays a crucial role in the individual's life [3].

SUD refers to the tolerance and withdrawal that occur after chronic use of a substance, which an individual uses, especially between 11 and 16 years of age, as a tool to cope with factors perceived as negative or to feel a sense of belonging to a group; behavioral addiction, in contrast, refers to uncontrollable, persistent use despite negative physical, psychological, social, or legal consequences. Bozkurt [4] reported that the child-raising attitudes of parents are influential. Among substance-addicted individuals, 20.9% of the participants stated that they were exposed to physical violence and 40.9% stated that there was physical violence in the family. Furthermore, 69.8% of the participants had a family member with a substance use disorder. Ünal [5] reported that participants had family members with SUD, including fathers in 25% and siblings in 50% of cases. The family is also the setting in which the temperament of the individual, namely genetic tendencies, takes effect: an individual is born into the family with genetic predispositions. There are studies showing that individuals with COMT, DRD2, PER3, eNOS, and NR3C1 functional gene variants are prone to develop SUD.

The gene variant is a term used to describe the variation in the DNA sequence in the genome. The term variant may be used to describe a change that may be benign, pathogenic, or with an unknown significance.

DNA methylation is the covalent attachment of a methyl group to the 5-carbon of cytosine in a CpG dinucleotide, which alters gene expression and cell functions.

The aim of this study was to analyze, through the decision tree algorithm, the effects of COMT, DRD2, PER3, eNOS, and NR3C1 functional gene variants and of COMT, DRD2, and NR3C1 methylation status on the potential tendencies detected in individuals with SUD. The criminal record history, continuum of substance use, former polysubstance abuse, attempted suicide, and inpatient treatment are analyzed by using decision trees.


2. Materials and method

There are studies commenting on the association between statistical results and variables from sociological, psychological, neurological, and genetic fields. Statistical analyses are mathematical operations that process and summarize data based on probability and guide researchers who review the association between variables. Machine learning techniques are also based on probabilistic models, yet the outcome of these models provides predictions on mutually exclusive events in a wider event set. With these features, machine learning studies can analyze data and obtain predictions from different perspectives. Nevertheless, despite technological developments and improvements in the prediction accuracy of these systems, only a limited number of studies have addressed substance abuse.

The aim of this study designed in this context was to analyze the criminal record history, continuum of substance use, former polysubstance abuse, attempted suicide, and inpatient treatment by using decision trees and to discuss the association of the findings obtained with the literature.

2.1 Machine learning methods

Machine learning is the component of artificial intelligence applications that covers learning. It can be defined as the whole set of algorithms that imitate human intelligence without needing rules that people interpret and enter manually.

Machine learning applications learn the desired task by assimilating the presented datasets, just as people learn the concepts they see and hear on their own. Over time, they can make predictions about the outcome of new data that lies outside the data they have learned. The training set is used in the learning process, and the test set is used in the prediction process. For example, consider a system designed to pick the orchids out of a vase of mixed flowers and move them to a separate vase. For learning, as much data about the orchid genus as possible is included in the training set, and features such as height, color, leaf shape, leaf curl, flower shape, color distribution, and folds are selected so that the machine can form the distinguishing features that separate orchids from other flowers. After these distinctive classes are formed from the training set, the predictive ability of the machine is tested. The dataset used for this, called the test set, contains flowers in a vase with a new arrangement that the machine has not seen before. The machine is expected to determine whether there are orchids in this new cluster and, if so, to separate them. Machine learning processes are very similar in principle to human learning processes. In human development, learning is divided into behavioral and cognitive approaches with many sub-branches. Machine learning methods are likewise divided into branches such as supervised and unsupervised learning [6].

In the most general form, supervised learning is learning in which the relationship between input and output is learned by matching under the guidance of a supervisor. Unsupervised learning is learning by finding the regularities among the inputs entering the system, without a supervisor, and producing output. Problems solved using supervised learning are generally divided into classification and regression problems. The important point in supervised learning methods is that the dataset includes a target attribute. The type of the target attribute depends on the type of problem addressed: for classification problems it is a class label, while for regression problems it is a numerical value.

2.2 The classification method and the decision tree algorithm

The classification method, which is among the machine learning methods, is commonly preferred, especially in the field of medicine. Learning in classification algorithms is based on learning the form of the distribution from the given training set and classifying accordingly. Examples include the Support Vector Machine (SVM), Nonlinear Support Vector Machine, Naive Bayes Classifier, Decision Tree Classifier, and k-Nearest Neighbor classification algorithms. For instance, Zaim Gokbay et al. [7] presented a decision support system design to support the diagnosis of endocrine disease. The classification rules used in the model were created from the investigation of visible changes, which begins with complaints about physical appearance, together with the laboratory results. The patients stated their complaints by filling in a questionnaire. Physical changes were investigated by the endocrinologist during the exam, and findings such as mandible evagination and skin cracks were clinically evaluated and entered into the system. These three kinds of data entered into the system were used to classify the prediction model of three different endocrinological diseases. Each class represents a disease. The individual falls into a class according to the sum of the answers to the questions, and it is concluded that the individual has the potential to have the disease indicated by that class. The symptoms that bring the patient to the doctor serve as an indicator. The formation of such rules makes it possible to make predictions in the next steps. The logic of classification methods is to separate the data in a dataset that share common features into certain classes. Numerous algorithms have been developed for this purpose, for example entropy-based classifications, regression and decision trees, memory-based algorithms, and Bayesian classifiers.

In the decision tree classification method, the data are classified by being separated from the root to the leaf. The if-then rule is implemented in this separation: if condition 1 holds, then condition 2 and then condition 3 are checked in a chain, establishing a branching structure from root to leaf. The decision tree method was chosen for the classification performed in this study because it is based on rules that people can understand visually and because of the convenience it provides to multidisciplinary work in comparing the results with the literature.
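The chained if-then separation can be sketched as a small hand-written tree. This is only an illustration built on the orchid example from Section 2.1: the feature names (`petal_count`, `has_lip_petal`, `leaf_shape`) and the rules themselves are hypothetical and are not taken from the study.

```python
def classify_flower(flower):
    """Walk a tiny hand-written decision tree from root to leaf.

    All feature names and thresholds are hypothetical illustrations.
    """
    # Root node: condition 1
    if flower["petal_count"] == 3:
        # Condition 2 is only checked when condition 1 holds
        if flower["has_lip_petal"]:
            # Condition 3 is only checked when conditions 1 and 2 hold
            if flower["leaf_shape"] == "strap":
                return "orchid"  # leaf
            return "other"       # leaf
        return "other"           # leaf
    return "other"               # leaf


sample = {"petal_count": 3, "has_lip_petal": True, "leaf_shape": "strap"}
print(classify_flower(sample))
```

Each `return` corresponds to a leaf, and each path of nested conditions reads as one human-interpretable rule, which is the property the text cites as the reason for choosing decision trees.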

The model was established with 10-fold cross-validation in the study. The success rates of the decision tree classes were interpreted through the accuracy, class recall, and class precision values, and the association of the sequences obtained with the literature was discussed.
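A minimal sketch of how 10-fold cross-validation partitions a dataset of this size is given below. It assumes a simple shuffled split; the chapter does not state whether the folds were shuffled or stratified, and the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 211 is the number of participants reported for the dataset.
splits = list(k_fold_indices(211, k=10))
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)
```

Every sample is used for testing exactly once and for training in the remaining nine rounds, and the performance values are then summarized over the ten rounds.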

2.3 Classification model performance criteria

The simplest and most common measures used to evaluate the performance of classification models are the accuracy, precision, and recall rates.

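$$\text{Accuracy Rate} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \tag{1}$$

$$\text{Class Precision} = \frac{TP}{TP + FP} \times 100 \tag{2}$$

$$\text{Class Recall} = \frac{TP}{TP + FN} \times 100 \tag{3}$$

The parameters in formulas (1) to (3), true positive (TP), false negative (FN), true negative (TN), and false positive (FP), are defined as follows: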

  • TP: The number of values actually in the positive class and predicted in the positive class.

  • FN: The number of values actually in the positive class but predicted in the negative class.

  • FP: The number of values actually in the negative class but predicted in the positive class.

  • TN: The number of values actually in the negative class and predicted in the negative class.

In other words, accuracy refers to the percentage of samples classified correctly. Recall measures how many of the actually positive samples are predicted as positive, and Precision measures how many of the positively predicted outputs are actually positive.
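As a worked illustration of formulas (1) to (3), the sketch below applies them to the confusion-matrix counts listed for inpatient treatment in Table 2 (142, 46, 16, and 7). The function name is hypothetical. The accuracy computed from the pooled counts is close to, but not identical with, the 70.56% listed in Table 2; the assumption here is that the reported figure is an average over folds.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, class precision, and class recall as percentages."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total * 100   # formula (1)
    precision = tp / (tp + fp) * 100     # formula (2)
    recall = tp / (tp + fn) * 100        # formula (3)
    return accuracy, precision, recall


# Counts for the "Exists" class of inpatient treatment, as listed in Table 2.
accuracy, precision, recall = classification_metrics(tp=142, fp=46, fn=16, tn=7)
print(f"accuracy={accuracy:.2f}% precision={precision:.2f}% recall={recall:.2f}%")
```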

2.4 The dataset characteristics

This study was conducted with retrospective data of 211 male participants known to be addicted to at least one substance, obtained from studies completed and published with the approval (2019/87) of the Ethics Committee for Clinical Research of Istanbul Faculty of Medicine [8, 9, 10]. The average age of the individuals is 28.67 years, and ages range from 18 to 51 years. In terms of educational level, the individuals include 2 college graduates, 61 secondary school graduates, 49 primary school graduates, and 99 literate or illiterate participants. The marital status of the participants was as follows: 147 were single, 43 were married, 12 were divorced, and 9 were married but living separately. There were three students among the participants; 56 individuals were employed, whereas 152 were unemployed. Individuals who use at least one of cigarettes and alcohol and at least one of the addictive substances including cannabinoids, synthetic cannabinoids, cannabis, cocaine, ecstasy, and heroin were included in the study. The age at first use of one of these substances varies between 10 and 30 years.

There is no missing information in the dataset used within the scope of the study; therefore, no preprocessing was applied to the data. Descriptions of attributes, values, and variable names are presented in Table 1.

| Description | Entry variable name | Type | Values (data count) |
|---|---|---|---|
| eNOS is an important mediator of cardiovascular homeostasis due to its role in nitric oxide (NO) production. Nitric oxide has well-known vascular effects and significantly affects autonomic nervous system activity. A significant gene-environment interaction between eNOS and behavioral risk factors such as chewing tobacco and consuming alcohol was detected in a previous study. In particular, the “GG” genotype was associated with an increased risk of hypertension among individuals who use tobacco or consume alcohol [11]. A previous study concluded that lower nitric oxide levels may increase dopamine turnover or decrease dopamine release [12]. The suggestion was that these two effects are not mutually exclusive and may appear together, and that the reward mechanism based on the release of less usable dopamine in metabolism strengthens addictive behavior by directing the individual to consume more cannabis. | eNOS-Intron 4a/b VNTR | Nominal | AA (128), BA (66), BB (17) |
| | eNOS-rs1799983 | Nominal | GG (129), GT (73), TT (9) |
| PER3, which is located on chromosome 1p36.23, contains a polymorphic domain that expresses 4 or 5 copies of the 54-bp tandem repeat sequence (variable number tandem repeat, VNTR). This variation results in the addition/deletion of 18 amino acids and is linked to sleep and mood disorders as well as circadian preference in humans [13]. Studies have shown a clear association between poor sleep patterns and a range of negative health behaviors such as substance use, suicide attempts, and unintentional injury [14]. | PER3-rs57875989 | Nominal | 4R/4R (78), 4R/5R (97), 5R/5R (36) |
| Dopamine receptors, which are divided into two classes, D1-like receptors and D2-like receptors, regulate the effects of dopamine and dopamine components. The cyclic AMP (cAMP) is stimulated or suppressed in the regulation of adenylate cyclase activity, which is the most important of these regulation mechanisms. The first study demonstrating that a dopaminergic gene encoding DRD2, the dopamine receptor, may show population variants that may predispose to alcoholism was performed by Blum et al. [15, 16]. Many studies were subsequently conducted on DRD2 in particular, and it was stated that a certain form of the DRD2 gene may be associated with a seven-fold increase in susceptibility to alcohol abuse and that, when environmental effects are taken into account, individuals carrying this gene are 60% more prone to substance use than others [17]. | DRD2-rs1799732 | Nominal | Ins/Ins (168), Ins/Del (41), Del/Del (2) |
| | DRD2-METHYLATION | Nominal | PARTIAL (126), UNMETHYLATED (85) |
| The COMT enzyme is responsible for the degradation and elimination of dopamine neurotransmitters in the prefrontal cortex of the brain. The COMT enzyme and COMT gene functional variants, which play a role in the metabolism of catecholamine and catecholamine-containing substances such as dopamine and are thereby important elements of the dopaminergic system, have been the focus of many studies. Findings of the study conducted by Delisi et al. [18] suggest that COMT plays a role in cerebral areas that modulate self-regulation and expression of negative emotions, influencing antisocial personality disorder (ASPD) and delinquency. | COMT-METHYLATION | Nominal | PARTIAL (143), UNMETHYLATED (68) |
| | COMT-rs4680 | Nominal | Val/Met (97), Val/Val (63), Met/Met (51) |
| NR3C1 plays a critical role in HPA axis regulation and is thereby considered a possible cause of stress-related disorders. NR3C1 consists of 8 introns and 9 exons on chromosome 5q31-32, encoding the glucocorticoid receptor. Previous studies have reported that altered NR3C1 methylation may cause various psychopathologies including major depressive disorder, bipolar disorder, suicidal behavior, and substance use disorder [19, 20, 21]. | NR3C1-rs41423247 | Nominal | CC (139), GC (60), GG (12) |
| | NR3C1-METHYLATION | Nominal | PARTIAL (192), UNMETHYLATED (19) |

Table 1.

Attribute descriptions, variable names, types, and values.
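The nominal attributes of Table 1 can be written down as a small schema, which also makes it easy to check that every attribute accounts for all 211 participants. The sketch below is only one possible representation: the dictionary layout and the helper `invalid_attributes` are hypothetical, and the label UNMETHYLATED is used uniformly for the unmethylated class.

```python
N_PARTICIPANTS = 211

# Attribute name -> {nominal value: data count}, transcribed from Table 1.
ATTRIBUTE_COUNTS = {
    "eNOS-Intron 4a/b VNTR": {"AA": 128, "BA": 66, "BB": 17},
    "eNOS-rs1799983": {"GG": 129, "GT": 73, "TT": 9},
    "PER3-rs57875989": {"4R/4R": 78, "4R/5R": 97, "5R/5R": 36},
    "DRD2-rs1799732": {"Ins/Ins": 168, "Ins/Del": 41, "Del/Del": 2},
    "DRD2-METHYLATION": {"PARTIAL": 126, "UNMETHYLATED": 85},
    "COMT-METHYLATION": {"PARTIAL": 143, "UNMETHYLATED": 68},
    "COMT-rs4680": {"Val/Met": 97, "Val/Val": 63, "Met/Met": 51},
    "NR3C1-rs41423247": {"CC": 139, "GC": 60, "GG": 12},
    "NR3C1-METHYLATION": {"PARTIAL": 192, "UNMETHYLATED": 19},
}


def invalid_attributes(record):
    """Return the attribute names whose value in `record` is missing or not allowed."""
    return [
        name
        for name, counts in ATTRIBUTE_COUNTS.items()
        if record.get(name) not in counts
    ]


# Each attribute's counts should add up to the number of participants.
totals = {name: sum(counts.values()) for name, counts in ATTRIBUTE_COUNTS.items()}
print(all(total == N_PARTICIPANTS for total in totals.values()))
```

A check of this kind is consistent with the statement that the dataset contains no missing information: each attribute's value counts sum to 211.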


3. Findings

In the study, the effects of the eNOS-Intron 4a/b VNTR, eNOS-rs1799983, PER3-rs57875989, DRD2-rs1799732, COMT-rs4680, and NR3C1-rs41423247 gene variants and of DRD2, COMT, and NR3C1 gene methylation status on the criminal record history, continuum of substance use, former polysubstance abuse, suicidal behavior, and inpatient treatment were analyzed with decision trees, which are classification algorithms. The accuracy, recall (sensitivity), and precision rates of each model are presented in Table 2.

Behavior tendency: The Criminal Record History (accuracy rate: 52.68%)

| | True: Exists | True: Not Exists | Precision |
|---|---|---|---|
| Prediction: Exists | 71 | 55 | 47.06% |
| Prediction: Not Exists | 45 | 40 | 56.35% |
| Recall | 42.11% | 61.21% | |

Behavior tendency: Continuum of Substance Use (accuracy rate: 49.76%)

| | True: Continuous | True: Not Continuous | Precision |
|---|---|---|---|
| Prediction: Continuous | 71 | 76 | 51.70% |
| Prediction: Not Continuous | 29 | 35 | 45.31% |
| Recall | 29% | 68.47% | |

Behavior tendency: Former Polysubstance Abuse (accuracy rate: 51.21%)

| | True: Exists | True: Not Exists | Precision |
|---|---|---|---|
| Prediction: Exists | 59 | 67 | 46.83% |
| Prediction: Not Exists | 36 | 49 | 57.65% |
| Recall | 62.11% | 42.24% | |

Behavior tendency: Suicidal Behavior (accuracy rate: 65.00%)

| | True: Exists | True: Not Exists | Precision |
|---|---|---|---|
| Prediction: Exists | 16 | 28 | 36.36% |
| Prediction: Not Exists | 46 | 141 | 72.46% |
| Recall | 25.81% | 81.21% | |

Behavior tendency: Inpatient Treatment (accuracy rate: 70.56%)

| | True: Exists | True: Not Exists | Precision |
|---|---|---|---|
| Prediction: Exists | 142 | 46 | 75.53% |
| Prediction: Not Exists | 16 | 7 | 30.43% |
| Recall | 89.87% | 13.21% | |

Table 2.

Behavioral tendencies and performance rates in classification of gene variants by decision tree method.

3.1 The criminal record history

The tree structure established to examine the effect of the eNOS-Intron 4a/b VNTR, eNOS-rs1799983, PER3-rs57875989, DRD2-rs1799732, COMT-rs4680, and NR3C1-rs41423247 gene variants and of DRD2, COMT, and NR3C1 gene methylation status on the tendency to delinquency is presented in Figure 1; the weight distribution of the input variables (gene variants) is presented in Table 3. The tendency to delinquency was evaluated by questioning the forensic history of the participant. The existence of a criminal history was interpreted as a tendency to delinquency, whereas its absence was interpreted as the tendency being suppressed and not turning into action. Therefore, the decision tree classes were defined as “Criminal Record History - Yes” and “Criminal Record History - No.” There are 116 individuals in the “Criminal Record History - Yes” class of the dataset and 95 in the “Criminal Record History - No” class. The tree root is established over the NR3C1 gene, as may be seen in Figure 1. The presence of the CC genotype of NR3C1-rs41423247 following partial methylation of the NR3C1 gene is an effective sequence in predicting an individual's tendency to delinquency.

Figure 1.

The decision tree structure constructed with 10-fold cross-validation for evaluating the effect of gene variants on criminal history.

| Attribute | Average weight |
|---|---|
| COMT-rs4680 | 0.013 |
| NR3C1-rs41423247 | 0.010 |
| DRD2-METHYLATION | 0.008 |
| eNOS-rs1799983 | 0.007 |
| NR3C1-METHYLATION | 0.005 |
| PER3-rs57875989 | 0.002 |
| eNOS-Intron 4a/b VNTR | 0.002 |
| COMT-METHYLATION | 0.001 |
| DRD2-rs1799732 | 0.000 |

Table 3.

Average weight values of gene variants in the criminal record history within the context of information gain.

Table 3 shows that the attribute with the highest information gain is COMT-rs4680, with an average weight value of 0.013, and the attribute with the lowest nonzero weight is COMT-METHYLATION, with an average weight value of 0.001. It is concluded that the COMT-rs4680 variant carries the highest information gain on delinquency.
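For readers unfamiliar with information gain, the sketch below shows the standard entropy-based definition for a nominal attribute. It is a generic illustration on made-up toy data; the chapter does not state how its tool normalizes or averages the reported weights, so the values produced here are not comparable with those in the tables.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [label for v, label in zip(values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: genotype values and a yes/no class label.
genotypes = ["Val/Val", "Val/Val", "Val/Met", "Val/Met", "Met/Met", "Met/Met"]
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(information_gain(genotypes, labels))
```

An attribute whose values separate the classes perfectly has a gain equal to the entropy of the class labels, while an attribute unrelated to the classes has a gain of zero.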

3.2 The continuum of substance use

A tree structure was established to investigate the tendency toward continuous, uninterrupted use of the substance in individuals with SUD. The weight distributions of the input variables (gene variants) on the output are presented in Table 4. The tendency toward continuous use of the substance without interruption was evaluated by the answers of the participants to the question “Do you use the substance intermittently?”. The decision tree classes were defined as “Intermittent” and “Continuous.” In the dataset, 100 individuals declared intermittent substance use, whereas 111 individuals stated continuous substance use.

| Attribute | Average weight |
|---|---|
| COMT-METHYLATION | 0.018 |
| COMT-rs4680 | 0.011 |
| DRD2-rs1799732 | 0.010 |
| NR3C1-rs41423247 | 0.002 |
| DRD2-METHYLATION | 0.001 |
| NR3C1-METHYLATION | 0.001 |
| eNOS-rs1799983 | 0.001 |
| eNOS-Intron 4a/b VNTR | 0.001 |
| PER3-rs57875989 | 0.000 |

Table 4.

Average weight values of gene variants in the tendency to continuous substance use within the context of information gain.

The tree root starts at COMT-METHYLATION, split into methylated (partial) and unmethylated, and forms a wide branching structure. With partial COMT methylation, the branching continues through the DRD2-rs1799732 gene variant; if unmethylated, it continues with the NR3C1-rs41423247 gene variant.

Table 4 shows that the variable with the highest information gain is COMT-METHYLATION, with an average weight value of 0.018, and the variables with the lowest nonzero weight, 0.001, are DRD2-METHYLATION, NR3C1-METHYLATION, eNOS-rs1799983, and eNOS-Intron 4a/b VNTR. It is concluded that COMT-METHYLATION carries the highest information gain on continuous substance use.

3.3 The former polysubstance abuse

Individuals with SUD may use more than one substance, such as alcohol, cigarettes, cannabis, heroin, cocaine, toluene, and ecstasy. Combined use of at least two of cannabinoid, synthetic cannabinoid, ecstasy, heroin, cocaine, and toluene is investigated within the scope of this sub-assessment. Tobacco and/or alcohol use alongside any of these substances was not counted as polysubstance use, because all of the participants already use these two substances in combination with the other substances mentioned, so including it would clearly yield no meaningful result.

The weight distributions of the input variables (gene variants) on the output are presented in Table 5. The decision tree classes were defined as “Polysubstance Use - Yes” and “Polysubstance Use - No.” In the dataset, 95 people declared combined use of at least two of the substances mentioned, and 116 people declared use of one substance.

| Attribute | Average weight |
|---|---|
| COMT-rs4680 | 0.032 |
| NR3C1-rs41423247 | 0.021 |
| eNOS-rs1799983 | 0.018 |
| DRD2-rs1799732 | 0.008 |
| NR3C1-METHYLATION | 0.005 |
| PER3-rs57875989 | 0.003 |
| COMT-METHYLATION | 0.002 |
| DRD2-METHYLATION | 0.001 |
| eNOS-Intron 4a/b VNTR | 0.000 |

Table 5.

Average weight values of gene variants in the polysubstance use within the context of information gain.

The tree root starts at COMT-rs4680 and forms a wide branching structure. The Met/Met and Val/Val genotypes branch to NR3C1-METHYLATION, whereas the Val/Met genotype branches to NR3C1-rs41423247. It is concluded that there is no tendency to polysubstance use when the Met/Met genotype is followed by the unmethylated NR3C1-METHYLATION branch; however, there is a tendency when the Val/Val genotype is followed by partial methylation.

Table 5 shows that the variable with the highest information gain is COMT-rs4680, with an average weight value of 0.032, and the variable with the lowest nonzero weight is DRD2-METHYLATION, with an average weight value of 0.001. It is concluded that the COMT-rs4680 variant carries the highest information gain on multiple substance use.

3.4 The suicidal behavior

To evaluate suicidal behavior in individuals with SUD, the individuals were asked whether they had attempted suicide at least once. The 149 individuals who had never attempted suicide were classified under “No Suicide Attempt,” and the 62 individuals who had at least one suicide attempt were classified under “Suicide Attempts.”

The tree root starts at NR3C1-rs41423247 and forms a wide branching structure. It is concluded that suicidality is not seen in the GG genotype, and the tree branches to NR3C1-METHYLATION in the GC and CC genotypes. There is no tendency when NR3C1-METHYLATION is unmethylated in the CC genotype; however, the same pathway shows the tendency in the GC genotype.

Table 6 shows that the variable with the highest information gain is PER3-rs57875989, with an average weight value of 0.013, and the lowest nonzero weight, 0.005, is shared by COMT-rs4680 and DRD2-rs1799732. It is concluded that the PER3-rs57875989 variant carries the highest information gain on suicidality in individuals with SUD.
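The branches described for the suicidal behavior tree can be written as a nested structure and walked from root to leaf. The sketch below encodes only the paths stated in the text; branches the text does not describe (for example the partial methylation pathways) are left out, and the walker returns `None` for them. The dictionary layout and the function name `predict` are hypothetical, not the authors' implementation.

```python
# Only the branches described in the text are encoded; everything else is unknown.
SUICIDE_TREE = {
    "attribute": "NR3C1-rs41423247",
    "branches": {
        "GG": "No Suicide Attempt",
        "CC": {
            "attribute": "NR3C1-METHYLATION",
            "branches": {"UNMETHYLATED": "No Suicide Attempt"},
        },
        "GC": {
            "attribute": "NR3C1-METHYLATION",
            "branches": {"UNMETHYLATED": "Suicide Attempts"},
        },
    },
}


def predict(tree, record):
    """Follow the record's attribute values from the root until a leaf is reached.

    Returns the class label, or None when the path is not described in the text.
    """
    node = tree
    while isinstance(node, dict):
        value = record.get(node["attribute"])
        node = node["branches"].get(value)
    return node


print(predict(SUICIDE_TREE, {"NR3C1-rs41423247": "GC", "NR3C1-METHYLATION": "UNMETHYLATED"}))
```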

| Attribute | Average weight |
|---|---|
| PER3-rs57875989 | 0.013 |
| NR3C1-rs41423247 | 0.012 |
| DRD2-METHYLATION | 0.008 |
| eNOS-rs1799983 | 0.008 |
| eNOS-Intron 4a/b VNTR | 0.007 |
| COMT-rs4680 | 0.005 |
| DRD2-rs1799732 | 0.005 |
| NR3C1-METHYLATION | 0.000 |
| COMT-METHYLATION | 0.000 |

Table 6.

Average weight values of gene variants in the suicidal behavior within the context of information gain.

3.5 The inpatient treatment

Uzbay (2015) defined substance addiction in general as “a brain disease characterized by some behavioral disorders and the desire to take a substance continuously or periodically in order to feel the pleasurable effects of the substance, or to avoid the discomfort caused by its absence.” Based on this definition, the voluntary hospitalization of an individual with SUD for treatment may be evaluated as a desire to get rid of or to avoid substance use. The participants were classified into two groups depending on their history of hospitalization. There were 158 individuals in the class “History of Hospitalization - Yes” and 53 individuals in the class “History of Hospitalization - No.” These numbers suggest that the majority of the participants tend to avoid substance addiction. The tree root starts at NR3C1-METHYLATION and forms a wide branching structure. The branching continues through NR3C1-rs41423247 in both the partial methylation and the unmethylated pathways. Individuals with the NR3C1-rs41423247 CC genotype on the unmethylated pathway have a tendency to avoid the substance; on the partial methylation pathway, the same genotype branches further to the eNOS-rs1799983 gene variant.

Table 7 shows that the variable with the highest information gain is NR3C1-METHYLATION, with an average weight value of 0.017, and the lowest variable is COMT-METHYLATION with an average weight value of 0.001. This suggests that NR3C1-METHYLATION status provides the most information about turning the tendency to avoid or get rid of the substance into behavior.

| Attribute | Average weight |
|---|---|
| NR3C1-METHYLATION | 0.017 |
| eNOS-rs1799983 | 0.015 |
| DRD2-rs1799732 | 0.011 |
| NR3C1-rs41423247 | 0.007 |
| eNOS-Intron 4a/b VNTR | 0.003 |
| DRD2-METHYLATION | 0.002 |
| COMT-rs4680 | 0.001 |
| COMT-METHYLATION | 0.000 |
| PER3-rs57875989 | 0.000 |

Table 7.

Average weight values of gene variants in the inpatient treatment within the context of information gain.

3.6 Discussion and conclusion

According to the data of the Ministry of Justice, the number of people in prison for crimes related to substance addiction was 57,674 in 2018, corresponding to 21.78% of all convicts [22]. Some studies on substance addiction report that substance users have higher rates of prison history [23]. There is a similar pattern in the database used within the scope of the study, in which 53.7% of individuals with SUD have a forensic history. Thirteen of the 28 decision leaves obtained in the established tree structure ended with the existence of a criminal history, whereas 15 ended with its absence. The tree root is established over the NR3C1 gene, as seen in Figure 1. A recent study on convicted male individuals indicated that the NR3C1 gene is associated with violent behavior in adult males [24]. For instance, the presence of the CC genotype of NR3C1-rs41423247 following partial methylation of the NR3C1 gene is an effective sequence in predicting an individual's tendency to delinquency. The dominant variable in the tree is the COMT-rs4680 functional gene variant, which is in line with the studies in the literature [9, 25]. According to the success rates in Table 2, the accuracy rate of the model is 52.68%. The limited number of individuals in the dataset and its low diversity, such as the absence of individuals without SUD and the presence of individuals with the same genetic information as input, prevented better learning.

Substance addiction is a process that causes physiological changes in many systems and, through withdrawal, a desire to use the substance continuously [26]. A tendency toward continuous substance use is therefore the naturally expected result of substance addiction. However, the desire to get rid of the substance can be effective in moving away from this state for a while. From this point of view, the tree root, COMT-METHYLATION, which splits into partially methylated and unmethylated, gives rise to a wide branching structure. With partial methylation of COMT-METHYLATION, the branching continues through the DRD2-rs1799732 variant; if unmethylated, it continues with the NR3C1-rs41423247 gene variant. Forty-four decision leaves were formed on the tree, 20 of which resulted in intermittent use and 24 in continuous use. The variable with the highest information gain was COMT-METHYLATION. According to the performance rates in Table 2, the accuracy was 49.76%.

When COMT-rs4680 genotype and allele distributions were compared with clinical parameters in the statistical results for the dataset used in the study with X et al., multiple substance use was significantly lower in individuals with the Met/Met genotype than in those with the Val/Met and Val/Val genotypes, and significantly higher in carriers of the Val allele. Vandenbergh et al. also found that the high-activity Val allele was significantly more prevalent in individuals with multiple substance use [27]. In the decision tree classification, the root split on COMT-rs4680, and no tendency to multiple substance use was observed along the Met/Met branch when NR3C1-METHYLATION was unmethylated. On the partial methylation branch of the same pathway, the sequence continued with DRD2-rs1799732. On the Val/Val branch of the root, a tendency to multiple substance use appeared when NR3C1-METHYLATION was partially methylated. These findings are similar to those of previous studies. The tree had 35 result leaves, 15 ending with a tendency and 20 without. The variable with the highest information gain was COMT-rs4680. According to the performance rates in Table 2, the accuracy was 51.21%.

Studies of individuals who use substances state that they become suicidal because of an inability to cope with substance-related economic difficulties, feelings of inadequacy, family problems, exclusion from society, mood disorders, and depression experienced during substance withdrawal. The statistical results for the dataset in previous studies (reference) showed that suicide attempts were more frequent in individuals who started using substances before the age of 15 years. It was suggested that individuals aged 15 years and below have not yet completed their physical and mental development, may be easily influenced by their friends, and undergo emotional and hormonal changes in adolescence, which may have caused these differences. The tree root starts with NR3C1-rs41423247 and forms a wide branching structure. Suicidality is not seen in the GG genotype, whereas the GC and CC genotypes branch further on NR3C1-METHYLATION. There is no tendency when NR3C1-METHYLATION is unmethylated in the CC genotype; however, the same pathway shows the tendency in the GC genotype. These findings are similar to those of previous studies. Forty-three result leaves appeared on the tree. Fifteen of these leaves ended with a tendency, and 22 ended without any tendency. The variable with the highest information gain was PER3-rs57875989. According to the performance rates in Table 2, the accuracy was 65%.

For the tendency to get rid of the substance, examined through the presence or absence of an inpatient treatment history, the tree root was NR3C1-METHYLATION and formed a wide branching structure. The branching continues along both the partial methylation and unmethylated pathways through NR3C1-rs41423247. Individuals with the NR3C1-rs41423247 CC genotype on the unmethylated pathway have a tendency to avoid the substance, whereas on the partial methylation pathway the same genotype leads on to the eNOS-rs1799983 gene variant. Twenty-eight result leaves appeared on the tree. Fifteen of these leaves ended with the decision that there was a tendency, and eight ended without any tendency. The variable with the highest information gain was NR3C1-METHYLATION. According to the performance rates in Table 2, the accuracy was 70.56%.

In this study, the effects of genetic and methylation differences in COMT, DRD2, PER3, eNOS, and NR3C1 functional gene variants on potential tendencies in individuals with SUD were analyzed with the decision tree algorithm, and their similarities with previous studies in the literature were evaluated. The tendencies were grouped in a structure suitable for binary classification under the subgroups of tendency to delinquency, tendency to use the substance, tendency to use multiple substances, tendency to suicide, and tendency to abandon the substance, respectively. The dataset contains no missing data. The model was evaluated with 10-fold cross-validation. This method splits a dataset of m samples into k disjoint parts, each containing m/k samples. In each round, a different part is held out for testing and the remaining k-1 parts are used for training, so the classifier is trained k times. In the last step, the classifier performance is estimated as the average of the k errors obtained.
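The cross-validation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: the helper names (`k_fold_indices`, `cross_validate`, `train_majority`, `error_rate`) are hypothetical, the data are toy values, and a trivial majority-class classifier stands in for the decision tree so that the example stays self-contained.

```python
import random
from collections import Counter


def k_fold_indices(m, k, seed=0):
    """Shuffle the indices 0..m-1 and split them into k disjoint folds of about m/k samples."""
    if k > m:
        raise ValueError("k must not exceed the number of samples")
    indices = list(range(m))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, error_fn, k=10):
    """Train k times, each time testing on a different fold, and average the k errors."""
    errors = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_x = [samples[i] for i in range(len(samples)) if i not in held_out]
        train_y = [labels[i] for i in range(len(samples)) if i not in held_out]
        model = train_fn(train_x, train_y)
        test_x = [samples[i] for i in test_idx]
        test_y = [labels[i] for i in test_idx]
        errors.append(error_fn(model, test_x, test_y))
    return sum(errors) / k


def train_majority(train_x, train_y):
    """Stand-in classifier (an assumption): always predicts the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def error_rate(model, test_x, test_y):
    """Fraction of test labels that differ from the predicted (majority) label."""
    return sum(1 for y in test_y if y != model) / len(test_y)


# Hypothetical toy data: 20 samples with one categorical feature each.
toy_samples = [["GG"], ["CC"]] * 10
toy_labels = ["yes"] * 12 + ["no"] * 8
mean_error = cross_validate(toy_samples, toy_labels, train_majority, error_rate, k=10)
print(f"Mean error over 10 folds: {mean_error:.3f}")
```

Because every sample is used for testing exactly once, the averaged error is less dependent on a single lucky or unlucky split than one train/test partition would be, which matters for a dataset of limited size.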

Among machine learning classification methods, decision trees are a model that can provide effective results for binary classes. Our study is the first in the literature to examine the effects of gene variants on behavioral tendencies through machine learning methods. However, the low accuracy rates obtained in this study indicate that the dataset needs to be more diverse and comprehensive.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Işık G, Korkmazlar Ü, Durukan M, Aydoğdu A. Symptoms and diagnoses of first-time adolescent applications to a child and adolescent psychiatry out-patient clinic. The Journal of Clinical Psychiatry. 2004;7(2):103-110
  2. Kılıç FS. Addiction and stimulant drugs. Osmangazi Journal of Medicine. 2016;1:55-60
  3. Derin G, Okudan M, Aşıcıoğlu F. Alkol ve madde kullanım bozukluklarında ailevi risk faktörleri. In: Öztürk E, editor. Aile Psikopatolojisi. Ankara: Türkiye Klinikleri; 2021. pp. 118-126
  4. Bozkurt O. Madde Bağımlısı Bireylerin Bağımlılık Süreçlerinde Ailenin Etkisi. Sosyal Bilimler Enstitüsü; 2015
  5. Ünal M. Madde Bağımlılığı ve Alkolizmde Aile. Sosyal Politika Çalışmaları Dergisi. 2004;2(2):80-86
  6. Gökbay İZ. Artificial Intelligence Applications In Medicine – An Overview Of The Evolution Of Clinical Decision Support Systems In The Development Process Of Diagnostic And Treatment Methods From Antiquity To Artificial Intelligence. 2021. pp. 673-692. DOI: 10.26650/B/ET07.2021.003.33
  7. Gökbay İZ, Karman Ş, Yarman S, Yarman BS. An intelligent decision support tool for early diagnosis of functional pituitary adenomas. TWMS Journal of Applied and Engineering Mathematics. 2015;5(2):169-187
  8. Nursal AF, Aydın PÇ, Uysal MA, Pehlivan M, Oyacı Y, Pehlivan S. PER3 VNTR variant and susceptibility to smoking status/substance use disorder in a Turkish population. Arch Clin Psychiatry. 2020;47(3):71-74. DOI: 10.1590/0101-60830000000235
  9. Oyaci Y, Aytac HM, Pasin O, Cetinay Aydin P, Pehlivan S. Detection of altered methylation of MB-COMT promotor and DRD2 gene in cannabinoid or synthetic cannabinoid use disorder regarding gene variants and clinical parameters. Journal of Addictive Diseases. 2021;39(4):526-536. DOI: 10.1080/10550887.2021.1906618
  10. Pehlivan S, Aydin PC, Nursal AF, Pehlivan M, Oyaci Y, Yazici AB. A relationship between endothelial nitric oxide synthetase gene variants and substance use disorder. Endocrine, Metabolic & Immune Disorders Drug Targets. 2021;21(9):1679-1684. DOI: 10.2174/1871530320666201013154917
  11. Shankarishan P, Borah PK, Ahmed G, Mahanta J. Endothelial nitric oxide synthase gene polymorphisms and the risk of hypertension in an Indian population. BioMed Research International. 2014;2014:793040. DOI: 10.1155/2014/793040
  12. Isir AB, Nacak M, Balci SO, Aynacioglu AS, Pehlivan S. Genetic contributing factors to substance abuse: An association study between eNOS gene polymorphisms and cannabis addiction in a Turkish population. Australian Journal of Forensic Sciences. 2016;48(6):676-683. DOI: 10.1080/00450618.2015.1112428
  13. Guess J, Burch JB, Ogoussan K, et al. Circadian disruption, Per3, and human cytokine secretion. Integrative Cancer Therapies. 2009;8(4):329-336. DOI: 10.1177/1534735409352029
  14. Zhabenko O, Austic E, Conroy DA, et al. Substance use as a risk factor for sleep problems among adolescents presenting to the emergency department. Journal of Addiction Medicine. 2016;10(5):331-338. DOI: 10.1097/ADM.0000000000000243
  15. Blum K, Noble EP, Sheridan PJ, et al. Association of the A1 allele of the D2 dopamine receptor gene with severe alcoholism. Alcohol. 1991;8(5):409-416. DOI: 10.1016/0741-8329(91)90693-q
  16. Uhl G, Blum K, Noble E, Smith S. Substance abuse vulnerability and D2 receptor genes. Trends in Neurosciences. 1993;16(3):83-88. DOI: 10.1016/0166-2236(93)90128-9
  17. Pickens RW, Svikis DS, McGue M, Lykken DT, Heston LL, Clayton PJ. Heterogeneity in the inheritance of alcoholism. A study of male and female twins. Archives of General Psychiatry. 1991;48(1):19-28. DOI: 10.1001/archpsyc.1991.01810250021002
  18. DeLisi M, Vaughn MG. Foundation for a temperament-based theory of antisocial behavior and criminal justice system involvement. Journal of Criminal Justice. 2014;42(1):10-25
  19. Ishiguro H, Horiuchi Y, Tabata K, Liu QR, Arinami T, Onaivi ES. Cannabinoid CB2 receptor gene and environmental interaction in the development of psychiatric disorders. Molecules. 2018;23(8):18360
  20. Park S, Hong JP, Lee JK, et al. Associations between the neuron-specific glucocorticoid receptor (NR3C1) Bcl-1 polymorphisms and suicide in cancer patients within the first year of diagnosis. Behaviour Brain Function. 2016;12(1):22
  21. Schote AB, Jäger K, Kroll SL, et al. Glucocorticoid receptor gene variants and lower expression of NR3C1 are associated with cocaine use. Addiction Biology. 2019;24(4):730-742. DOI: 10.1111/adb.12632
  22. TUBİM. 2019 Türkiye Uyuşturucu Raporu, Türkiye Uyuşturucu ve Uyuşturucu Bağımlılığı İzleme Merkezi (Internet) 2020. Available from: http://www.narkotik.pol.tr/kurumlar/narkotik.pol.tr/TUB%C4%B0M/Ulusal%20Yay%C4%B1nlar/2019-TURKIYE-UYUSTURUCU-RAPORU.pdf
  23. Bilici R, Karakaş UG, Tufan E, Güven T, Uğurlu M. Bir bağımlılık merkezinde yatarak tedavi gören hastaların sosyo demografik özellikleri. Fırat Tıp Dergisi. 2012;17:223-227
  24. Liu L, Li J, Qing L, et al. Glucocorticoid receptor gene (NR3C1) is hypermethylated in adult males with aggressive behaviour. International Journal of Legal Medicine. 2021;135(1):43-51. DOI: 10.1007/s00414-020-02328-7
  25. Yang Y, Li J, Yang Y. The Research of the Fast SVM Classifier Method. In: 2015 12th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE. 2015. pp. 121-124
  26. Berke JD, Hyman SE. Addiction, dopamine, and the molecular mechanisms of memory. Neuron. 2000;25(3):515-532. DOI: 10.1016/s0896-6273(00)81056-9
  27. Vandenbergh DJ, Rodriguez LA, Miller IT, Uhl GR, Lachman HM. High-activity catechol-O-methyltransferase allele is more prevalent in polysubstance abusers. American Journal of Medical Genetics. 1997;74(4):439-442
