Open access peer-reviewed chapter

May Big Data Analysis Be Used to Diagnose Early Autism?

Written By

Terje Solsvik Kristensen

Submitted: 30 August 2022 Reviewed: 14 December 2022 Published: 24 January 2023

DOI: 10.5772/intechopen.109537

From the Edited Volume

Autism Spectrum Disorders - Recent Advances and New Perspectives

Edited by Marco Carotenuto

Abstract

In this chapter, a technique for early autism identification is presented. Both a multi-layered perceptron (MLP) neural network and a support vector machine (SVM) have been used for classification. Detection of early autism is important, since the prognosis for treating autism is then much better. The patterns used by both methods have been extracted from high-performance liquid chromatography (HPLC) data of urine samples. The training samples are of two types, one from normal children and one from children with autism. The classification rate has been estimated to about 80% or better for both algorithms, and SVM gave the best result. The program used for the analysis was developed in Java. A lot of work remains to improve the results and increase the recognition rate. The parameter values used in both networks, as well as the configuration of the networks, are not yet optimal. This could be addressed by using a particle swarm optimization (PSO) method. We have not yet used a deep learning network, for instance a TensorFlow network, to raise the classification rate of the different algorithms, and we have not yet classified between different types of autism within the autism spectrum. All this belongs to future work.

Keywords

  • autism
  • HPLC spectra
  • MLP
  • SVM
  • PSO
  • TensorFlow

1. Introduction

Autism is usually diagnosed by a series of behavioral tests and symptoms [1]. Autism affects the information processing in the brain by changing how the nerve cells (neurons) and their synapses connect and organize themselves. What triggers this process is not yet understood. Globally, about 25 million people are estimated to suffer from autism. Autism is therefore a huge problem to solve. However, at the moment there is no known cure for it.

Signs of autism may be identified as early as the age of 5 months. However, a clear diagnosis is usually not possible before the children are between one and a half and three years old. There seems to be growing evidence that the earlier behavioral therapies for autism are started, the better the chances are for the children to be able to live relatively normal lives when growing up.

Autism may be linked to metabolic abnormalities, and these metabolic changes may be detectable in the children’s urine. By using high-performance liquid chromatography (HPLC) spectral data [2], we have found that children in the autism group and in the normal group seem to have distinct chemical fingerprints in their urine.

An early test to identify children at risk of developing autism may therefore soon become a reality. The urine of children with autism may have a certain chemical signature. This also suggests that certain substances in the urine may trigger the onset of autism [3, 4].

If we are able to develop a method to identify early autism by a chemical or statistical test, rather than by observing full-blown behavior, we can start the treatment earlier. Some scientists link autism to the production of toxins that may interfere with brain development [5]. One compound that may be identified in the urine is N-methyl nicotinamide (NMND), which has also been associated with Parkinson’s disease. Other scientists argue that autism may be associated with metabolic products of certain bacteria that remain to be identified [6].

This work, however, is on how to use a multi-layered perceptron (MLP) neural network [7, 8] and a support vector machine (SVM) [9] to classify between HPLC samples belonging to normal children and samples belonging to children suffering from autism.

The organization of the chapter is as follows: in Section 2, the data is described, and in Section 3 the feature extraction techniques used are presented. Section 4 defines an MLP neural network and the algorithm used for training it. In Section 5, the SVM network and its training algorithm are presented, and in Section 6 we present the results of both a small-scale experiment and a proof-of-concept experiment. Section 7 gives the conclusion. In Section 8, we present further work on how we may use more advanced technology to confirm and validate the results achieved in this chapter.

2. The data

The HPLC data was recorded by the company Tipogen Ltd. at Bergen Hightech Centre, Norway. The company went bankrupt some years ago. The data of the first experiment was based on 30 samples of urine spectra from both normal children and children with autism. The aim of this first experiment was to verify a so-called proof of principle: we wanted to find out whether data mining based on machine learning algorithms could be used to detect autism using HPLC spectra of children [2]. Figure 1 shows what an HPLC spectrum looks like.

Figure 1.

An example of an HPLC spectrum of a child with autism.

The first axis represents what is called the “retention time.” This represents the peak ID. The second axis represents the intensities or the “peak area.” The data was delivered in spreadsheet format: one spreadsheet for normal children and one for children with autism.

3. Feature extraction

The sample length may vary for both control and autism data. This was not easy to handle in an adequate way early in the analysis. An example of a pattern generated from the data is given in Figure 2. Each sample has a specific number of peaks. The differing number of peaks belonging to each sample makes the analysis more complicated, and was an important parameter to be estimated in the recognition process.

Figure 2.

The patterns are generated by HPLC peak extraction from the data.
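
As a simple illustration of how the varying number of peaks per sample can be handled, the following Java sketch pads or truncates each sample’s peak-area vector to a fixed length so that all patterns get the same dimension. This is a hypothetical simplification for illustration only, not the exact feature extraction procedure used in the chapter:

```java
import java.util.Arrays;
import java.util.List;

public class PeakFeatureExtractor {

    /**
     * Converts a variable-length list of peak areas into a fixed-length
     * feature vector by truncating or zero-padding. A simplified,
     * hypothetical scheme for illustration only.
     */
    public static double[] toFixedLength(List<Double> peakAreas, int targetLength) {
        double[] features = new double[targetLength];
        int n = Math.min(peakAreas.size(), targetLength);
        for (int i = 0; i < n; i++) {
            features[i] = peakAreas.get(i);
        }
        // Remaining positions stay 0.0 (zero-padding).
        return features;
    }

    public static void main(String[] args) {
        List<Double> sample = List.of(12.4, 3.1, 44.9, 7.8);
        System.out.println(Arrays.toString(toFixedLength(sample, 6)));
        // -> [12.4, 3.1, 44.9, 7.8, 0.0, 0.0]
    }
}
```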

3.1 Pattern diagnostics

The most important aspect of a pattern diagnostic technique is to discriminate between patterns acquired from healthy individuals and patterns from individuals affected by a disease. Pattern diagnostics is based on the analysis of a huge amount of HPLC data [2] and may be used to find patterns that identify a disease.

Mass spectrometry (MS) data from blood is an alternative method [10, 11]. This method has given promising results in the detection of early cancer [12] and may also be used to indicate early autism. Mass spectrometry data consists of a set of m/z values (m is the atomic mass and z is the charge of the ion) and the corresponding relative intensities of all molecules present with that m/z ratio. The MS data of a chemical sample is thus an indication of the actual molecules present. The data might therefore be used to predict the presence of a disease condition and distinguish it from a sample taken from a healthy individual.

4. Multi-layered perceptron

A multi-layered perceptron (MLP) network generally contains three or more layers of processing units [7, 8]. The topology of such a network is shown in Figure 3. Here, the network contains three layers of nodes. The first layer is the input layer. The middle layer, or “hidden” layer, consists of “feature detectors”: units that respond to particular features that may appear in the input pattern. Usually, we may have more than one hidden layer. The output layer is the last layer. The activities of the output units are read as output from the network and define different categories of patterns.

Figure 3.

An MLP network consisting of three layers of nodes.

4.1 The hidden layers

An MLP network usually has several layers connected by adjustable weights. This removes the restriction of the ordinary perceptron network, which is only able to classify linearly separable patterns. By introducing hidden layers, more complex patterns can be classified. By inserting a new hidden layer, the network may learn more complex patterns, but at the same time more hidden layers could decrease the performance of the network, since many degrees of freedom may then be introduced that are not really needed. This depends, however, on the actual algorithm being used.

4.2 Training of MLP

A supervised learning algorithm is used to train the MLP network. The network is presented with a set of training examples, where a target vector is given for each training example. The target vector is then compared to the output vector. The weights of the network are adjusted to make the network perform better according to the training examples and the targets defined. The training algorithm used in this chapter is the backpropagation algorithm. This is a well-known learning algorithm used for classification [13, 14, 15].

Each node in the network is activated in accordance with the input of the node and the node activation function. The difference between the calculated output and the target output is computed. All the weights between the output layer, hidden layers, and the input layer are then adjusted by using this error value. The sigmoidal function $f(x) = 1/(1 + e^{-x})$ is used to compute the output of a node, but other functions may also be used and may even give better performance. The weighted sum $S_j = \sum_i w_{ji} a_i$ is inserted into the sigmoidal function, and the output value of a unit j is given by:

$f(S_j) = \dfrac{1}{1 + e^{-S_j}}$  (E1)

The error value of an output unit j is computed by the formula:

$\delta_j = (t_j - a_j)\, f'(S_j)$  (E2)

Here, $t_j$ and $a_j$ are the target and output values for unit j, and $f'$ is the derivative of the function f. The error value calculated for a hidden node is given by:

$\delta_j = \left(\sum_k \delta_k w_{kj}\right) f'(S_j)$  (E3)

From the formula, we see that the error of a processing unit in the hidden layer is computed from the error values of the units in the layer above. Finally, the weights can be adjusted by:

$\Delta w_{ji} = \alpha\, \delta_j\, a_i$  (E4)

Here, α is the learning rate parameter.

Very often another parameter is also used in the MLP network. It is called the momentum (β). This additional parameter can be very helpful in speeding up the convergence of the algorithm and avoiding local minima [7]. By including momentum, the weight update at the next iteration step can be written as (Eq. (5)):

$w_{ji}(t+1) = w_{ji}(t) + \alpha\, \delta_j\, a_i + \beta\, \Delta w_{ji}(t)$  (E5)

Here again, α is the learning rate, β is the momentum, and $\Delta w_{ji}(t)$ is the weight change from the previous processing step.
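
To make the update rules concrete, the following minimal Java sketch implements Eqs. (1)-(5) for a single output unit. It is a simplified illustration only, not the javANN code used in the experiments:

```java
public class BackpropStep {

    // Sigmoidal activation, Eq. (1).
    static double sigmoid(double s) {
        return 1.0 / (1.0 + Math.exp(-s));
    }

    // Derivative of the sigmoid expressed through its output a = f(s).
    static double sigmoidDerivative(double a) {
        return a * (1.0 - a);
    }

    /**
     * One weight update for an output unit j, Eqs. (2), (4) and (5).
     * weights[i]   : w_ji, the weight from input unit i to output unit j
     * prevDelta[i] : the weight change from the previous step (momentum term)
     * inputs[i]    : a_i, the activation of input unit i
     */
    static void updateOutputUnit(double[] weights, double[] prevDelta,
                                 double[] inputs, double target,
                                 double alpha, double beta) {
        double s = 0.0;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * inputs[i];                   // S_j = sum_i w_ji * a_i
        }
        double a = sigmoid(s);                             // a_j = f(S_j)
        double deltaJ = (target - a) * sigmoidDerivative(a);  // Eq. (2)
        for (int i = 0; i < weights.length; i++) {
            double change = alpha * deltaJ * inputs[i] + beta * prevDelta[i]; // Eq. (5)
            weights[i] += change;
            prevDelta[i] = change;
        }
    }
}
```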

5. SVM theory

SVM (support vector machine) is a computationally efficient learning algorithm that is now widely used in pattern recognition and classification problems [8]. The algorithm has been derived from ideas of statistical learning theory to control the generalization abilities of a learning machine [16, 17]. An optimal hyperplane is learnt that classifies the given patterns. By use of what are called kernel functions, the input feature space can be transformed into a higher-dimensional space where the optimal hyperplane can be learnt. Such an approach gives great flexibility, since one of many learning models can be chosen simply by changing the kernel function. Let $\Phi: \mathbb{R}^D \rightarrow F$ be a nonlinear mapping, where F represents the feature space, and let $k(x, x')$ be a Mercer kernel [18]. The inner product $k(x, x')$ of $\Phi$ is defined by:

$\Phi: \mathbb{R}^D \rightarrow F$  (E6)
$k(x, x') = \Phi^T(x)\,\Phi(x')$  (E7)

where the dimension D of the input space is much smaller than the dimension of the feature space F. Mercer kernels are known mathematical functions (polynomial, sigmoid, etc.), and therefore we can calculate the inner product of $\Phi$ without actually knowing $\Phi$ itself. The learning algorithm selects support vectors to build the decision surface in the feature space. Support vectors are patterns (vectors) that are most difficult to categorize and lie on the margins of the SVM classifier. The mapping is achieved by first solving a convex optimization problem and then applying a linear mapping from the feature space to the output space. The advantage of having a convex optimization problem is that the solution is unique. This is in contrast to an ANN, where we may have many local minima or maxima of the error function. Figure 4 illustrates the concept of SVM visually.

Figure 4.

The mappings of SVM.

Calculating $\Phi$ may be a time-consuming process and is often not feasible. However, Mercer’s theorem allows us to avoid this computation, so there is no need to explicitly describe the nonlinear mapping $\Phi$ nor the image points in the feature space F. This technique is known as the kernel trick [19].

5.1 The SVM classifier

The concept of the SVM classifier is illustrated in Figure 5. The figure shows the simplest case, where the data vectors (marked by ‘X’s and ‘O’s) can be separated by a hyperplane.

Figure 5.

An SVM classifier maximizes the margins between the different classes.

There may exist many separating hyperplanes. The SVM classifier seeks the separating hyperplane that produces the largest margin of separation. In a more general case, where the data points are not linearly separable in the input space, a nonlinear transformation is used to map the data vectors into a high-dimensional space (the feature space) prior to applying the linear maximum-margin classifier.

SVM was initially designed to classify only binary data, as in Figure 5. Multi-class SVMs have, however, been designed that allow classification of a finite number of classes [18]. This kind of learning assumes a priori knowledge of the data. In the case of autism, we may then have different kinds of autism (which is also the reality), and the SVM algorithm may then be able to learn to categorize between them.

5.2 The kernel

SVM uses a kernel function where the nonlinear mapping is implicitly embedded. The discriminant function of the SVM classifier can be defined as:

$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b$  (E8)

Here, $K(\cdot,\cdot)$ is the kernel function, $x_i$ are the support vectors determined from the training data, $y_i$ is the class indicator (e.g., +1 and −1 for a two-class problem) associated with each $x_i$, N is the number of support vectors determined during training, $\alpha_i$ is the Lagrange multiplier for each point in the training set, and b (bias) is a scalar representing the perpendicular distance of the hyperplane from the origin.

A problem soon appears when using SVM: as the dimension of the data increases, the complexity of the problem also increases. This is called the curse of dimensionality [20]. However, this may be overcome using the kernel trick from Mercer’s theorem.

The most commonly used kernel functions are the polynomial kernel given by:

$K(x_i, x_j) = (x_i^T x_j + 1)^p$, where $p > 0$ is a constant  (E9)

And the Gaussian radial basis function (RBF) kernel is given by:

$K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$  (E10)

Here, σ > 0 is a constant that defines the kernel width.
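
As an illustration, Eqs. (8)-(10) can be written directly in Java. The following sketch computes the polynomial and RBF kernels and evaluates the SVM discriminant function for a given input; it is a simplified illustration and not the LIBSVM code used in the experiments:

```java
public class SvmKernels {

    // Polynomial kernel, Eq. (9): K(xi, xj) = (xi^T xj + 1)^p
    static double polynomial(double[] xi, double[] xj, int p) {
        double dot = 0.0;
        for (int k = 0; k < xi.length; k++) dot += xi[k] * xj[k];
        return Math.pow(dot + 1.0, p);
    }

    // Gaussian RBF kernel, Eq. (10): K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
    static double rbf(double[] xi, double[] xj, double sigma) {
        double dist2 = 0.0;
        for (int k = 0; k < xi.length; k++) {
            double d = xi[k] - xj[k];
            dist2 += d * d;
        }
        return Math.exp(-dist2 / (2.0 * sigma * sigma));
    }

    // Discriminant function, Eq. (8): f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    static double decision(double[][] supportVectors, double[] alpha, int[] y,
                           double[] x, double b, double sigma) {
        double f = b;
        for (int i = 0; i < supportVectors.length; i++) {
            f += alpha[i] * y[i] * rbf(supportVectors[i], x, sigma);
        }
        return f;  // sign(f) gives the predicted class (+1 or -1)
    }
}
```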

The mapping to the output space is based on the Cover theorem [21], illustrated in Figure 6. By transforming the data to a higher-dimensional space, patterns that are not linearly separable in the input space may become linearly separable in the output space. In this way, we are able to categorize nonlinear data by use of the SVM algorithm.

Figure 6.

The Cover theorem.

6. Experiments and results

A Java program was written to generate the patterns from the original HPLC data. The generation of an optimal selection of patterns depended strongly on computing the right number of peaks from the HPLC spectra.

The training and testing data that we used were written to file and later read into a Java program. The Java program developed for training uses the package javANN (Java Artificial Neural Network). This package was developed by the company Pattern Solutions Ltd. [22] in Norway.

A Java program was then developed to change the format of the data, so that it could be used in the LIBSVM toolbox [23] to classify between a normal child and a child with autism. Before the training started, a regularization parameter C (cost) had to be determined. The value of C was determined experimentally, which is not an optimal way to do it. The performance of the SVM classifier was optimal for C values from 100 up to around 200. The SVM algorithm was tested on the same (unseen) samples of data as the MLP algorithm, with a constant C equal to 100 or 200. The SVM algorithm also uses another constant γ (gamma) that either has to be defined or is determined by the algorithm by default.
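
The conversion step mentioned above essentially writes each pattern in LIBSVM’s sparse text format, one line per sample of the form `label index:value index:value ...`. A minimal sketch of such a converter (our own illustration, with hypothetical class and method names; the original conversion program is not shown in the chapter) could look like this:

```java
import java.io.IOException;
import java.io.PrintWriter;

public class LibSvmWriter {

    /**
     * Writes feature vectors in LIBSVM format: "label 1:v1 2:v2 ...".
     * labels  : +1 for autism samples, -1 for control samples (a chosen convention)
     * features: one fixed-length feature vector per sample
     */
    public static void write(String fileName, int[] labels, double[][] features)
            throws IOException {
        try (PrintWriter out = new PrintWriter(fileName)) {
            for (int s = 0; s < features.length; s++) {
                StringBuilder line = new StringBuilder(Integer.toString(labels[s]));
                for (int i = 0; i < features[s].length; i++) {
                    // LIBSVM indices are 1-based; zero values may be omitted.
                    if (features[s][i] != 0.0) {
                        line.append(' ').append(i + 1).append(':').append(features[s][i]);
                    }
                }
                out.println(line);
            }
        }
    }
}
```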

6.1 MLP and SVM experiments

6.1.1 Small-scale experiments

In the first experiment, 18 samples were used for training and 12 samples for testing. The test samples are samples that the algorithm has never seen before. In the MLP experiments, the number of hidden nodes was set to 100. The learning rate was set to 0.1 and the momentum to 0.9. The number of iterations was set to 10,000.

The MLP algorithm was tested on the 12 unseen samples. The test data were unknown to the system, but we knew what category they belonged to, so the performance rate was easy to calculate. The MLP network was able to correctly classify 11 of the 12 samples as belonging to either a child with autism or a normal child. The best performance was thus estimated to 11/12 = 91.7%.

The best performance of the SVM algorithm was estimated to 83.4%, where 10 of 12 samples were classified correctly. On average, one false-positive sample occurred. This corresponds to a normal child being classified as having autism, which is a far more serious mistake than a false-negative classification error, where a child with autism is classified as normal.

6.1.2 Large-scale experiments

The second delivery of data consisted of 62 samples from children with autism and 52 samples from normal children, 114 samples in total. In the second analysis, we wanted to see if the proof-of-principle experiment on the first delivery of data could be extended to a proof-of-concept experiment. The training set now consisted of 71 samples, and 43 samples were used for testing.

The best performance of the SVM algorithm was estimated to 88.4% with a penalty constant C = 100. This implies that 38 of 43 samples were correctly classified. The average performance of using SVM was then estimated to about 85%.

The best performance of the MLP network was estimated to 81.4%, with an average value of 78.3%. This implies that 35 of 43 samples were classified correctly. The average number of false-positive cases in both experiments was equal to 2 for both algorithms.

7. Conclusion

Pattern diagnostics represents a new way to detect diseases early. The method may also be used to classify, for instance, between different DNA sequences [24, 25, 26]. In this chapter, we have used it to diagnose early autism. Such an analysis requires only a small amount of urine to create the HPLC spectra. Mass spectrometry (MS) data is, we believe, another method that could be used. The most important aspect of such an analysis is its very high throughput, since both HPLC and MS spectra can be determined in a short time. Another important aspect is that the patterns themselves are independent of the identity of the proteins used as discriminators. The classification can therefore be done before the identity of the proteins is determined.

Both a proof-of-principle and a proof-of-concept experiment have been carried out, and two quite independent algorithms have been used to analyze the data. Both algorithms have shown consistent results with respect to early identification of autism from HPLC data.

8. Future work

The values of the parameters used in the algorithms are not optimal. The selection of the different parameter values has been carried out by experiment. A lot of tuning of the parameter values is needed to adapt the algorithms to the given data. One method for optimization is particle swarm optimization (PSO) [27]. This method can be used to determine optimal values of the different parameters for both the neural network and the SVM. PSO is now widely used in many types of applications.

Another aspect of the MLP neural network used is that we have used a sigmoidal activation function. However, there are other types of neurons that can be used to introduce nonlinearities in the computation. Tanh neurons use a similar kind of nonlinearity as the sigmoidal function, but the output of tanh ranges from −1 to 1, compared to the sigmoidal function where the output ranges from 0 to 1. Tanh may in many cases give better performance of the neural network.

A different kind of nonlinearity is used by the rectified linear unit (ReLU) neuron. It uses the function f(x) = max(0, x) and may in many cases give the best performance of the ANN.

8.1 Particle swarm intelligence

Computational swarm intelligence (CSI) may be defined as algorithmic models whose design originates from the study of bird flocks and ant swarms and from simulating their behavior in computer models. These simulations showed a great ability to explore a multidimensional space and quickly turned into a new domain of algorithmic theory. A swarm can be defined as a group of agents cooperating to achieve some goal.

PSO is based on an intrinsic property of swarms: executing complex tasks through a self-organization process [28]. This is a new way to explore a high-dimensional search space to find optimal solutions. The particles are analogous to birds that fly through a hyperdimensional space. The social tendency of individuals is used to update the velocity of the particles. Each particle is influenced by the experience of its neighbors and its own knowledge. The norm is that the agents should behave with no centralized structure. The local interactions between the agents often lead to the emergence of global behavior [29, 30, 31, 32]. The particles in the multidimensional space, the search space, represent all feasible solutions of a given problem. By using a fitness function, their positions are updated in the process of finding an optimal or near-optimal solution.

How may this be used to improve the results of recognition of early autism? For a neural network, we could use PSO to find a correct configuration of the network: the number of neurons in the hidden layer, and the optimal values of the learning rate and momentum.

For the SVM network, we may use the PSO algorithm to find optimal values for the gamma and cost parameters. We could, for instance, use a global PSO algorithm [31] to find these values, as sketched below. So far we have not done this, but it belongs to our future work.
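
To make the idea concrete, the following is a minimal global-best PSO sketch in Java that searches for the cost C and gamma of an SVM (or, equally well, the learning rate and momentum of an MLP). The fitness function here is a dummy placeholder; in practice it would run a cross-validated training with the candidate parameters and return the classification error. This is written as an assumption of how such a tuner could look, not code from the chapter:

```java
import java.util.Random;

public class PsoTuner {

    // Placeholder fitness: in practice, train the SVM with (C, gamma) taken from
    // the particle position and return the cross-validated classification error.
    static double fitness(double[] position) {
        double c = position[0], gamma = position[1];
        return Math.abs(c - 150.0) / 150.0 + Math.abs(gamma - 0.01);  // dummy objective
    }

    public static void main(String[] args) {
        int particles = 20, dims = 2, iterations = 100;
        double w = 0.7, c1 = 1.5, c2 = 1.5;               // inertia and acceleration constants
        double[] min = {1.0, 1e-4}, max = {500.0, 1.0};   // search ranges for C and gamma

        Random rnd = new Random(42);
        double[][] pos = new double[particles][dims], vel = new double[particles][dims];
        double[][] pBest = new double[particles][dims];
        double[] pBestFit = new double[particles];
        double[] gBest = new double[dims];
        double gBestFit = Double.MAX_VALUE;

        // Initialize particles randomly inside the search range.
        for (int p = 0; p < particles; p++) {
            for (int d = 0; d < dims; d++) {
                pos[p][d] = min[d] + rnd.nextDouble() * (max[d] - min[d]);
            }
            pBest[p] = pos[p].clone();
            pBestFit[p] = fitness(pos[p]);
            if (pBestFit[p] < gBestFit) { gBestFit = pBestFit[p]; gBest = pos[p].clone(); }
        }

        // Global-best PSO main loop.
        for (int it = 0; it < iterations; it++) {
            for (int p = 0; p < particles; p++) {
                for (int d = 0; d < dims; d++) {
                    vel[p][d] = w * vel[p][d]
                            + c1 * rnd.nextDouble() * (pBest[p][d] - pos[p][d])
                            + c2 * rnd.nextDouble() * (gBest[d] - pos[p][d]);
                    pos[p][d] = Math.max(min[d], Math.min(max[d], pos[p][d] + vel[p][d]));
                }
                double f = fitness(pos[p]);
                if (f < pBestFit[p]) { pBestFit[p] = f; pBest[p] = pos[p].clone(); }
                if (f < gBestFit)    { gBestFit = f;    gBest = pos[p].clone(); }
            }
        }
        System.out.printf("Best C = %.2f, gamma = %.4f%n", gBest[0], gBest[1]);
    }
}
```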

8.2 Deep learning: backpropagation with TensorFlow

Deep learning is another approach that may be used to achieve better performance in the recognition of early autism. A deep neural network is simply a neural network with many hidden layers [33]; see Figure 7. The more hidden layers, the deeper the network. As the neural network gets deeper, the processing power needed to train the network increases substantially. We may also increase the number of nodes in each hidden layer, often called the width of the network. Multiple neural network frameworks exist today. Perhaps the most used is TensorFlow [34], an open-source machine learning library developed by Google.

Figure 7.

A simple and deep learning network.

In TensorFlow, numerical computations are expressed as data flow graphs (illustrated in Figure 8). The data is represented as Tensors. A Tensor in TensorFlow may be described as a typed multidimensional array. Nodes in the data flow graph are called ops (short for operations). Each op takes zero or more Tensors as input, performs some computation, and outputs zero or more Tensors.

Figure 8.

A TensorFlow Data graph.

The edges in the graph represent Tensors communicated between the nodes of the graph.

We could also apply a TensorFlow backpropagation algorithm as in [34]. To create a TensorFlow network, we need to break the network down into Tensors, define the input data and represent it as Tensors, and use placeholder operations to hold the data.

In the backpropagation algorithm, the cost function is minimized. If we want to see visually what is happening during the learning process, we may introduce a graph and create a TensorFlow session.

To optimize the neural network, different hyperparameters need to be tuned. This may be done by using for instance the PSO method discussed before. These hyperparameters are as follows:

  • Number of hidden layers

  • Number of nodes in each hidden layer

  • Activation function

  • Optimization algorithm

  • Cost (or error) function

  • Learning rate and momentum

  • Epochs (iterations)

When using a standard gradient descent algorithm to find the minimum of the cost function, or global error, the weights are updated after each iteration over the whole training set. By using a stochastic gradient descent algorithm, however, we may instead use small batches from the dataset in each iteration.

While standard gradient descent performs a parameter update after each run through the whole training set, a stochastic gradient descent performs a parameter update after each batch. According to LeCun et al. [35], one should use stochastic gradient descent if the training set is large (more than a few hundred samples) and redundant, and the task is classification.
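
A minimal sketch of the batch-wise update for a simple linear model with squared error is given below. It is our own plain-Java illustration of the idea, assuming hypothetical names, rather than the TensorFlow implementation:

```java
public class MiniBatchSgd {

    /**
     * One epoch of mini-batch stochastic gradient descent for a linear model
     * with squared error. A simplified illustration of the batch-wise update
     * described above, not the TensorFlow implementation.
     */
    static void epoch(double[] w, double[][] x, double[] y,
                      int batchSize, double learningRate) {
        for (int start = 0; start < x.length; start += batchSize) {
            int end = Math.min(start + batchSize, x.length);
            double[] grad = new double[w.length];
            for (int s = start; s < end; s++) {
                double pred = 0.0;
                for (int d = 0; d < w.length; d++) pred += w[d] * x[s][d];
                double err = pred - y[s];
                for (int d = 0; d < w.length; d++) grad[d] += err * x[s][d];
            }
            // Parameter update after each batch (not after the whole training set).
            for (int d = 0; d < w.length; d++) {
                w[d] -= learningRate * grad[d] / (end - start);
            }
        }
    }
}
```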

The deeper and wider the network is, the more computationally expensive it is to train. When using gradient descent, one could use the method “GradientDescentOptimizer” in TensorFlow. The algorithm then converges within reasonable time. Introducing a great number of hidden layers and/or a large number of nodes in each hidden layer will not necessarily increase the accuracy of the network, although it may in some cases give better performance.

Figure 9 shows how the cost decreases when using stochastic gradient descent with small batches from the dataset in each iteration; an example of a learning curve for a deep learning network trained this way may look like the one given in Figure 9. For the configuration of the network, we may, for instance, use up to 10 hidden layers with 10-20 nodes in each layer.

Figure 9.

Gradient descent in a deep learning network.

We have not yet used a deep learning network to classify between a child with autism and a normal child, but this belongs to future work. We may also use such a deep learning network to distinguish different kinds of autism within the autism spectrum and classify between them, based on their HPLC spectral data.

References

  1. Scientific American. 2012. p. 11
  2. Available from: http://stemedhub.org/resources/714/download/HPLCdata.pdf
  3. Molecular Autism. 2013;4:14. DOI: 10.1186/2040-2392-4-14
  4. Kristensen T. Classification of early autism based on HPLC data. In: IFMBE Proceedings of XIII Mediterranean Conference on Medical and Biological Engineering and Computing, Medicon 2013. Vol. 41. Sevilla, Spain: Springer; 2013. pp. 774-778
  5. Lilian RH, Farid H, Donald BR. Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum. Journal of Computational Biology. 2003;10:6
  6. New Scientist. 2010. p. 9
  7. Kristensen T. Neural Networks, Fuzzy Logic and Genetic Algorithms. Cappelen Academic Publisher (in Norwegian); 1997
  8. Mitchell TM. Machine Learning. McGraw-Hill Companies; 1997
  9. Burges CJ. A tutorial on support vector machines for pattern recognition. Knowledge Discovery and Data Mining. 1998
  10. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003;422
  11. Liotta LA, Ferrari M, Petricoin E. Clinical proteomics written in blood. Nature. 2003;425:905
  12. Petricoin E, Liotta LA. SELDI-TOF-based serum proteomic pattern diagnostics for early detection of cancer. Current Opinion in Biotechnology. 2004;15:24-30
  13. Haykin S. Neural Networks and Learning Machines. 3rd ed. Pearson; 2009
  14. Kristensen T, Patel R. Classification of eukaryotic and prokaryotic cells by a backpropagation network. In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2003); Portland, Oregon, USA. 2003
  15. Kristensen T. Prototypes of ANN biomedical pattern recognition systems. In: Proceedings of the IASTED International Conference on Simulation and Modeling (ASM 2002); Crete, Greece. 2002
  16. Vapnik VN. Statistical Learning Theory. New York: Wiley; 1998
  17. Vapnik VN. An overview of statistical learning theory. IEEE Transactions on Neural Networks. 1999
  18. Mercer J. Functions of positive and negative type, and their connection with the theory of integral equations. Transactions of the London Philosophical Society. 1909:415-446
  19. Huang TM, Kecman V, Kopriva I. Kernel Based Algorithms for Mining Huge Data Sets. Berlin, Heidelberg: Springer-Verlag; 2006
  20. Bellman R. Dynamic Programming. Princeton, USA: Princeton University Press; 1957
  21. Chattamvelli R. Data Mining Algorithms. Oxford, UK: Alpha Science International Ltd.; 2011
  22. Kristensen T. javANN: Java Artificial Neural Networks. Pattern Solutions AS. 2007. Available from: http://www.patternsolutions.no
  23. Chang CC, Lin CJ. LIBSVM: A library for support vector machines. 2001. Available from: http://www.csie.ntu.edu.tw/cjlin/libsvm
  24. Kristensen T, Guillaume F. Classification of DNA sequences by a MLP and a SVM network. In: Proceedings of the International Conference on Bioinformatics and Computational Biology, BIOCOMP'13, July 22-25; Las Vegas, USA: CSREA Press; 2013
  25. Kristensen T, Guillaume F. Different regimes for classification of DNA sequences. In: IEEE 7th International Conference on Cybernetics and Intelligent Systems & Robotics, Automation and Mechatronics (CIS-RAM 2015); Angkor Wat, Cambodia; 2015
  26. Kristensen T, Guillaume F. Different regimes for classification of DNA sequences. IEEE Press; 2015. pp. 114-119. DOI: 10.1109/ICCIS.7274558
  27. Kristensen T, Guillaume F. PSO in ANN, SVM and data clustering. In: Tan Y, editor. Swarm Intelligence, Volume 1: Principles of Current Algorithms and Methods. Chapter 18. London, UK: IET Publisher; 2018
  28. Kohonen T. Self-Organizing Maps. Springer Series in Information Sciences. Vol. 30. Springer-Verlag; 1995
  29. Kennedy J, Eberhart RC. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks. 1995. pp. 1942-1948
  30. Kennedy J, Eberhart RC. A discrete binary version of the particle swarm algorithm. In: Proceedings of the 1997 Conference on Systems, Man, and Cybernetics. Piscataway, NJ: IEEE Service Center; 1997. pp. 4104-4109
  31. Kennedy J, Eberhart RC, Shi Y. Swarm Intelligence. Morgan Kaufmann Academic Press; 2001
  32. Koay C, Srinivasan D. Particle swarm optimization-based approach for generator maintenance scheduling. In: Proceedings of the IEEE Swarm Intelligence Symposium. 2003. pp. 167-173
  33. Nielsen MA. Neural Networks and Deep Learning. Determination Press; 2015
  34. TensorFlow. 2016. Available from: https://www.tensorflow.org
  35. LeCun YA, Bottou L, Orr GB, Müller KR. Efficient BackProp. In: Neural Networks: Tricks of the Trade. Berlin, Heidelberg: Springer; 1998. pp. 9-48
