Data Processing Using Artificial Neural Networks

The artificial neural network (ANN) is a machine learning (ML) methodology that evolved from the idea of imitating the human brain. The artificial intelligence (AI) pyramid illustrates the evolution from ML to ANN and on to deep learning (DL). Nowadays, researchers are strongly attracted to DL because of its ability to overcome the selectivity-invariance problem. In this chapter, ANN is explained by discussing the network topology and the development parameters (number of nodes, number of hidden layers, learning rules and activation function). The basic concepts of the node and the neuron are explained with the help of diagrams, leading to the ANN model and its operation. The topics are arranged to give the reader the basic concepts clearly and sequentially, from the ANN perceptron model to deep learning models and their types.


Introduction
Artificial Intelligence (AI) is the knowledge domain that targets the development of computer systems that solve problems by being given the cognitive abilities to perform tasks that usually require human intelligence. Hence, the main objective of AI is to simulate human intelligence with computer programming and technologies. Machine learning (ML) is one of the branches of AI, in which computer systems are programmed based on the data and the type of input; ML gives AI the capability to solve problems from the available data. Likewise, the artificial neural network (ANN) is an evolved form of ML algorithms, developed on the concept of imitating the human brain [1-3].
A single neuron is considered a cell that processes electrochemical signals, or nerve impulses, and the human brain is a complicated network of interlinked neurons that transfer information. ANN models are considered the most popular AI models because of their architecture, which is a collection of neurons linked to other neurons in various layers. An ANN is a non-linear, complex system of neurons, in which each neuron is a mathematical unit [4].
Literature shows that ML, ANN and deep learning (DL) fall under the AI pyramid, as shown in Figure 1. Within ANN, DL has gained much importance among researchers. DL is a complex set of ANNs with multiple processing layers, which improves results by developing higher levels of insight. DL methodologies …
To comprehend the basic structure of an ANN, the concept of a 'node' must first be understood. The generic model of a node is shown in Figure 5.
Each node receives various inputs through connections and transfers its output to adjacent nodes. Figure 6 represents the general model of an ANN, which is inspired by the biological neuron.
The nodes are arranged and organised into linear arrays known as layers. Figure 6 shows that an ANN has three types of layers: the input layer, the hidden layer and the output layer.
In the input layer, $X_1, X_2, X_3, \ldots, X_n$ denote the inputs to the network, while $W_1, W_2, W_3, \ldots, W_n$ are the connection weights, which represent the strength of each connection. Weights are considered the most significant factors in an ANN, because these numerical parameters determine how strongly neurons affect one another and, by transforming the inputs, shape the output.
In an ANN, the processing is performed in the hidden layer, which executes two operations: the summation function and the transfer function, also known as the activation function. The summation function is the first step: each input $X_i$ is multiplied by its respective weight $W_i$, and the products $W_i X_i$ are accumulated into the weighted sum $\sum W_i X_i$. 'B' is a bias value; this parameter regulates the output of the neuron together with the weighted sum of the inputs. This process is denoted as Eq. (1):
$$\xi = \sum_{i=1}^{n} W_i X_i + B \qquad (1)$$
The activation function is the second step: it takes the signal received from the summation function and transforms it into the output of the node [1-3, 12, 13].
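The two steps of a node (weighted summation with bias, then activation) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the chapter's own code; the function name `node_output` is hypothetical, and the sigmoid default is only one possible choice of activation function.

```python
import numpy as np


def sigmoid(s):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-s))


def node_output(x, w, bias, activation=sigmoid):
    """Output of a single node: summation function followed by activation function."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    xi = np.dot(w, x) + bias      # summation: weighted sum of inputs plus bias B
    return activation(xi)         # transfer (activation) function


# Example: three inputs X_1..X_3 with weights W_1..W_3 and bias B = 0.2
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
print(node_output(x, w, 0.2))                          # sigmoid of the weighted sum
print(node_output(x, w, 0.2, activation=lambda s: s))  # raw weighted sum, approx. 0.5
```

Passing a different function as `activation` changes only the second step; the summation stays the same.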
In general, each ANN has three main components: node character, network topology and learning rules. The node character controls signal processing by determining the number of inputs and outputs, the weight associated with each input and output, and the activation function of each node. The learning rules establish how the weights are initialised and adjusted, while the network topology defines how the nodes are connected and organised (details are discussed in Section 3.2). Operating the ANN model means computing the outputs of all the neurons, which is an entirely deterministic calculation [1,2].

The activation function
An activation function is a mathematical function. Simply put, it receives the output of the summation function as its input and converts it into the final output of the node.
There are different types of activation functions, but non-linear functions are more popular than linear ones. A linear function is simply a polynomial of degree one, and a network that uses only linear activations behaves like a single-layer ANN model, with limited power to process complicated data. Therefore, non-linear activation functions are mostly used in designing ANN models for complex problems, and this property is what allows ANNs to act as universal function approximators.
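The claim that purely linear activations add no expressive power can be checked numerically: stacking layers with the identity (linear) activation gives exactly the same mapping as one linear layer with combined weights. The sketch below uses random weights purely for illustration; the helper names are hypothetical.

```python
import numpy as np


def forward_linear(layers, x):
    """Pass x through (W, b) layers using the identity (linear) activation."""
    for W, b in layers:
        x = W @ x + b
    return x


def collapse(layers):
    """Combine stacked linear layers into one equivalent (W, b) pair."""
    n_in = layers[0][0].shape[1]
    W_total, b_total = np.eye(n_in), np.zeros(n_in)
    for W, b in layers:
        W_total, b_total = W @ W_total, W @ b_total + b
    return W_total, b_total


rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), rng.normal(size=4)),   # hidden layer 1
          (rng.normal(size=(5, 4)), rng.normal(size=5)),   # hidden layer 2
          (rng.normal(size=(2, 5)), rng.normal(size=2))]   # output layer
x = rng.normal(size=3)

W, b = collapse(layers)
print(np.allclose(forward_linear(layers, x), W @ x + b))   # True: one layer suffices
```

Inserting a non-linear function such as `np.tanh` between the layers breaks this equivalence, which is why hidden layers need non-linear activations.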
The activation function takes the weighted sum ξ (Eq. (1)) as its input and determines whether, and how strongly, the neuron is activated. The most commonly known activation functions [1, 12-15] are shown in Table 1.

Table 1. Commonly used activation functions.
Function | Remarks
Linear | Useful for binary schemes.
… | Most popular activation function since 2015.
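Common activation functions are simple to write with NumPy. The selection below (linear, binary step, sigmoid, tanh and ReLU) is an assumption about what such a table usually covers, not a transcription of Table 1.

```python
import numpy as np


def linear(x):
    return np.asarray(x, dtype=float)                          # identity: output = input


def binary_step(x, threshold=0.0):
    return np.where(np.asarray(x) >= threshold, 1.0, 0.0)      # 1 at/above threshold, else 0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))   # squashes into (0, 1)


def tanh(x):
    return np.tanh(x)                                          # squashes into (-1, 1)


def relu(x):
    return np.maximum(0.0, x)                                  # zero for negatives, identity otherwise


xi = np.array([-2.0, 0.0, 2.0])   # example weighted sums
for f in (linear, binary_step, sigmoid, tanh, relu):
    print(f.__name__, f(xi))
```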

Network topology
The interconnection pattern between the nodes of an ANN is called its topology (or architecture). An ANN is composed of an input layer, hidden layers and an output layer, as already discussed with Figure 6. The number of hidden layers can range from none to many, depending on the model complexity. Each layer consists of many nodes, which are grouped into layers according to shared properties. A single-layer ANN with a single output is known as a perceptron. A conceptual model of layers and ANN topology is shown in Figure 7. Figure 7 shows n data entries in the input layer, $X_1, X_2, \ldots, X_n$, and L hidden layers in the ANN model, with i nodes in each hidden layer. The notations 1 × 1, 1 × i, L × 1 and L × i on the nodes identify each node, where 'L' is the (hidden) layer number, from 1 to L, and 'i' is the node number, from 1 to i. Y is the output of this ANN model.
The design of a network topology is based on the following factors: (1) the number of nodes in each layer, (2) the number of layers in the network, and (3) the connection paths among the nodes [1,2,12].
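The three design factors translate directly into the shapes of a network's weight matrices. The sketch below assumes the common fully connected case, where every node in one layer connects to every node in the next; the helper names are hypothetical. It uses the Figure 7 notation: n inputs, L hidden layers of i nodes each, and one output Y.

```python
def layer_shapes(n_inputs, hidden_sizes, n_outputs=1):
    """Weight and bias shapes for a fully connected feedforward topology."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # one (weights, biases) pair per connection between consecutive layers
    return [((a, b), (b,)) for a, b in zip(sizes[:-1], sizes[1:])]


def count_parameters(n_inputs, hidden_sizes, n_outputs=1):
    """Total number of trainable weights and biases."""
    return sum(rows * cols + nb for (rows, cols), (nb,) in
               layer_shapes(n_inputs, hidden_sizes, n_outputs))


# n = 3 inputs, L = 2 hidden layers with i = 4 nodes each, one output Y
print(layer_shapes(3, [4, 4]))      # [((3, 4), (4,)), ((4, 4), (4,)), ((4, 1), (1,))]
print(count_parameters(3, [4, 4]))  # 41
```

Adding a hidden layer or widening one changes both the shapes and the parameter count, which is one practical way to compare candidate topologies.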

Perceptron and multi-layer architectures
A single-layer ANN with a single output is known as the perceptron. The perceptron mostly uses the step function: if the computed sum of the inputs exceeds a threshold, the output is 1; otherwise, it is 0.
Multi-layer perceptrons (MLPs) are the most commonly used ANN architecture. An MLP consists of layers of neurons: an input layer, an output layer and at least one hidden layer. The layers are interlinked into a multi-layered architecture, which makes the model considerably more complex to process. The term MLP originates from the perceptron, but its problem-solving capabilities make it unique [1,14].
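A perceptron with a step function, and a small MLP built from such units, can be sketched as follows. The weights are hand-picked for illustration (an assumption, not trained values): one perceptron computes logical AND, while XOR, which no single perceptron can represent because it is not linearly separable, needs a hidden layer.

```python
import numpy as np


def step(z, threshold=0.0):
    """Step function: 1 if the sum exceeds the threshold, otherwise 0."""
    return np.where(z > threshold, 1, 0)


def perceptron(X, w, b):
    """Single-layer perceptron with one output."""
    return step(X @ w + b)


def mlp_xor(X):
    """Two-layer perceptron: hidden units compute OR and AND; output is OR and not AND."""
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])          # hidden unit 1 = OR, hidden unit 2 = AND
    h = step(X @ W1 + b1)
    w2, b2 = np.array([1.0, -1.0]), -0.5
    return step(h @ w2 + b2)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(perceptron(X, np.array([1.0, 1.0]), -1.5))   # AND: [0 0 0 1]
print(mlp_xor(X))                                  # XOR: [0 1 1 0]
```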

Connection types between nodes
The connections between nodes of ANN are classified into two categories: (1) the feedforward network, and (2) the feedback network or recurrent network.

Feedforward networks
A feedforward network has only one-way connections, with no backward loops. Such networks are static, because their signals travel in one direction only. Figure 8 shows an example model of a feedforward network.

Feedback networks
In a feedback network, nodes have backward loops, and through these connections the output of a node can be an input to nodes at the same level or to previous nodes. Unlike feedforward networks, feedback networks are dynamic, and signals are transmitted in both forward and backward directions [16]. Feedback occurs when the output (partial or full) is channelled back into the input of the network as part of a repeated cause-and-effect process [17]. In a feedback network, a single input generates a series of output cycles until the network reaches an equilibrium point. The equilibrium point refers to minimum error: if the error of a predicted output is large, the output is routed back and the parameters (weights and biases) are modified until the error becomes minimal [18]. Figure 9 shows an ANN model with feedback connections. Node H2x1 sends information back to node H1x1, and the cycle continues until the output reaches an equilibrium state, i.e. minimum error. A feedback network contains at least one interconnected path that leads back to the starting neuron; this path, called a cycle, may introduce a delay of specific time units [1,2,12]. This process will be better understood after the next section.
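The idea of a feedback network cycling until it settles can be illustrated with a tiny recurrent layer whose output is fed back as input until it stops changing. This is a simplified sketch under stated assumptions: the weights are chosen small enough for the cycles to converge, and "equilibrium" is taken to mean that further cycles no longer change the output; the weight adjustment used to minimise error is not shown.

```python
import numpy as np


def settle(W, U, x, tol=1e-10, max_cycles=1000):
    """Feed the layer's output back as input until it reaches equilibrium.

    W: feedback (recurrent) weights, U: input weights, x: external input.
    Returns the equilibrium output and the number of cycles used.
    """
    h = np.zeros(W.shape[0])
    for cycle in range(1, max_cycles + 1):
        h_new = np.tanh(W @ h + U @ x)        # previous output re-enters as input
        if np.max(np.abs(h_new - h)) < tol:   # output no longer changes: equilibrium
            return h_new, cycle
        h = h_new
    return h, max_cycles


W = np.array([[0.0, 0.3],
              [0.2, 0.0]])   # small feedback weights (assumed) so the cycles converge
U = np.eye(2)
x = np.array([0.5, -0.2])
h, cycles = settle(W, U, x)
print(h, cycles)
```

With larger feedback weights the same loop can oscillate or fail to settle within `max_cycles`, which is why the cycle count is returned.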

Training of ANN (learning process)
The training of an ANN is accomplished through a learning process in which the weights are modified to attain the required results: sample data are presented to the network, and the weights are adjusted to better approximate the desired output.
The learning process is generally classified into two categories: (1) supervised learning and (2) unsupervised learning.

Supervised learning
In supervised learning, a training set is presented to the model. The training set consists of input examples and their corresponding target outputs. The network's response to each input is observed, and the weights between the network's nodes are adjusted to reduce the error and attain the desired output. The network repeats this over successive iterations until the computed result converges to the correct one. Constructing the training set requires special care: an ideal training set should represent the underlying model well, because a reliable model with desirable results cannot be achieved with an unrepresentative training set.
In supervised learning, networks are trained before they are used for predictive outputs. Once the network computes the intended outputs for the series of inputs with its weights fixed, the ANN model is ready for the required operation. Well-known supervised learning algorithms include the Adaline (used for binary data), the Perceptron (used for continuous data) and the Madaline (developed from the Adaline).
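A minimal supervised learning loop, using the perceptron learning rule, is sketched below. Training examples and targets are presented repeatedly, and weights are adjusted only when the output is wrong, until every target is reproduced and the weights can be fixed. The AND data set and the learning rate are illustrative assumptions.

```python
import numpy as np


def predict(X, w, b):
    """Perceptron output: 1 if the weighted sum exceeds 0, otherwise 0."""
    return (X @ w + b > 0).astype(int)


def train_perceptron(X, t, lr=0.1, max_epochs=100):
    """Perceptron learning rule: adjust weights only on misclassified examples."""
    w, b = np.zeros(X.shape[1]), 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x_i, t_i in zip(X, t):
            error = t_i - predict(x_i[None, :], w, b)[0]   # target minus output
            if error != 0:
                w += lr * error * x_i                      # move weights towards the target
                b += lr * error
                errors += 1
        if errors == 0:   # every training target reproduced: weights can now be fixed
            return w, b, epoch
    return w, b, max_epochs


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])   # logical AND targets
w, b, epochs = train_perceptron(X, t)
print(w, b, epochs, predict(X, w, b))
```

Because AND is linearly separable, this loop is guaranteed to stop with zero errors; for data that are not separable it would run until `max_epochs`.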

Reinforcement learning
Reinforcement learning is a particular case of supervised learning in which the external environment only indicates whether an output is accepted or rejected, instead of indicating the correct output. In this process, the well-performing and most active neuron connections for the input are strengthened over successive iterations. A few well-known reinforcement learning algorithms are the Boltzmann machine, learning vector quantisation and Hopfield networks.

Unsupervised learning
Unsupervised learning does not use a training set with target outputs. Instead, it follows the patterns in the input data of the underlying model. The ANN model adjusts its weights in response to the supplied inputs, producing outputs similar to the inputs. Without any external support, the model recognises the patterns and differences in the inputs. In this process, clusters are formed, each consisting of a group of weights, in such a way that related input patterns result in similar outputs. If a new pattern is detected during the iterations, it is classified as a new cluster.
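The clustering behaviour described above can be sketched with simple competitive (winner-take-all) learning: for each input, only the weight vector closest to it is moved towards it, so each weight vector ends up representing a cluster of similar inputs. This is one illustrative unsupervised rule chosen for brevity, with hypothetical data; it is not the specific algorithm of any cited work.

```python
import numpy as np


def competitive_learning(X, n_clusters=2, lr=0.1, epochs=20, seed=0):
    """Winner-take-all learning: the closest weight vector moves towards each input."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching weight vector
            W[winner] += lr * (x - W[winner])                   # no target output is used
    return W


def assign(X, W):
    """Cluster index for each input: the nearest weight vector."""
    return np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=-1), axis=1)


X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],      # group near (0, 0)
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])     # group near (5, 5)
W = competitive_learning(X)
print(assign(X, W))   # first three share one label, last three the other
```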
Autoencoders, Hebbian learning, deep belief nets, self-organising maps, generative adversarial networks and Adaptive Resonance Theory (ART) are a few of the best-known unsupervised learning algorithms. Unsupervised ANN models are used in diagnosing diseases, image segmentation and many other tasks. Unsupervised algorithms have become useful and powerful tools for segmenting magnetic resonance images to detect anomalies in the body [1, 2, 4, 12, 14, 22-24].

Mapping by ANNs
ANNs are popular primarily because of their ability to approximate output data. Function approximation with an ANN model involves five main steps, as given below.

Data pre-processing
In data pre-processing, appropriate predictors are selected as inputs before the data are passed to the network for mapping. There are three general processes in data pre-processing, as follows:

Selection of network architecture
A network architecture comprises the number of hidden neurons, the number of hidden layers, the flow of data, the way the neurons are interconnected, and the specific transfer functions. Recurrent neural networks, multi-layer perceptrons (MLPs), probabilistic neural networks, radial basis function networks, generalised regression neural networks and time-delay neural networks are a few of the well-known architectures.

Network training
In function mapping, training is the calibration of the network with input-output pairs. During training, an ANN may suffer from overfitting or underfitting, both of which decrease the overall performance of the network. Underfitting during training can be managed by increasing the number of epochs, but the network may overfit if the number of epochs exceeds the required number. An epoch is one complete pass of the inputs through the network, with the corresponding modification of the weights. The optimal number of epochs can be determined by comparing the training error with the error obtained in model testing.
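Choosing the number of epochs by comparing errors can be sketched as early stopping: training stops once the error on held-out data has not improved for a few epochs, and the epoch with the lowest held-out error is kept. The loss values below are hypothetical numbers chosen to show the typical pattern, not results from this chapter; `patience` is an assumed parameter name.

```python
def best_epoch(val_losses, patience=3):
    """Return the epoch (1-based) with the lowest held-out loss, stopping early
    once the loss has failed to improve for `patience` consecutive epochs."""
    best, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:   # no improvement for `patience` epochs: stop training
                break
    return best


# Hypothetical loss curves: training loss keeps falling, held-out loss turns upward
train_losses = [0.90, 0.60, 0.40, 0.30, 0.22, 0.17, 0.13, 0.10]
val_losses = [0.95, 0.70, 0.50, 0.42, 0.40, 0.43, 0.47, 0.52]
print(best_epoch(val_losses))   # 5
```

Beyond epoch 5 the two curves diverge: the training loss keeps decreasing while the held-out loss rises, which is the signature of overfitting.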

Simulation
Simulation is the ultimate goal of applying ANNs. It is the generation of predicted output data by the ANN model.

Post-processing
Sample data are distributed into three sets: (i) the training set, (ii) the validation set and (iii) the testing set. The training set is used to train the ANN model; it is the sample data used to adjust the weights of the ANN to produce the desired outcome. The validation set is used to tell the ANN when training should be terminated (when the minimum error point is reached). The test set is sample data used to evaluate the ANN model, providing an entirely independent check of its precision. A rule of thumb for this random split is 70%, 15% and 15%, respectively [3,12,14].
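A random 70/15/15 split of sample indices can be sketched with NumPy as follows; the function name and the fixed seed are illustrative assumptions.

```python
import numpy as np


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)               # random order, no repeats
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    # the test set takes whatever remains, so no sample is lost to rounding
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))   # 70 15 15
```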
Post-processing comprises all the tests applied to a specific network to validate its results and to analyse, describe and improve its final performance. Results are compared using three statistics. The first is the root-mean-square error (RMSE), given in Eq. (2):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(est_i - obs_i\right)^2} \qquad (2)$$
The second is the percentage volume error (%VE), which measures the absolute relative bias error of the estimated values. It is formulated as Eq. (3):
$$\%VE = \frac{\left|\sum_{i=1}^{n}\left(est_i - obs_i\right)\right|}{\sum_{i=1}^{n} obs_i} \times 100 \qquad (3)$$
where $est_i$ is the ith estimated value, $obs_i$ is the ith observed value, and n is the number of observed values.
The third statistic is the correlation coefficient, which measures the linear correlation between the predicted and observed data.
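The three post-processing statistics can be computed with NumPy as below. RMSE and the Pearson correlation coefficient are standard; for %VE, the sketch assumes the form described in the text (absolute bias of the estimates relative to the observed total, in percent), which may differ in detail from the chapter's Eq. (3).

```python
import numpy as np


def rmse(est, obs):
    """Root-mean-square error between estimated and observed values."""
    est, obs = np.asarray(est, dtype=float), np.asarray(obs, dtype=float)
    return np.sqrt(np.mean((est - obs) ** 2))


def percent_volume_error(est, obs):
    """%VE, assumed here as |sum(est - obs)| / sum(obs) * 100."""
    est, obs = np.asarray(est, dtype=float), np.asarray(obs, dtype=float)
    return abs(np.sum(est - obs)) / np.sum(obs) * 100.0


def correlation(est, obs):
    """Pearson linear correlation coefficient between predictions and observations."""
    return np.corrcoef(est, obs)[0, 1]


obs = [1.0, 2.0, 3.0, 4.0]
est = [1.0, 2.0, 3.0, 5.0]
print(rmse(est, obs), percent_volume_error(est, obs), correlation(est, obs))
```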
If the post-processing results are unsatisfactory, the following can be modified: (1) weights and biases, (2) the number of hidden neurons, (3) transfer functions, and (4) the number of hidden layers [4,25].

Gradient descent
The term 'gradient descent' combines two words: 'gradient', which means a slope, and 'descent', which means moving downward. With gradient descent, the slope is descended to find the lowest point, i.e. the point with the smallest error. It is an iterative process that continues until the error of the ANN learning model has been corrected. During backpropagation in the ANN model, each iteration updates the weights and biases using the error multiplied by the derivative of the activation function. The steepest-descent step size is replaced by a similar size from the previous step.
Data Processing Using Artificial Neural Networks DOI: http://dx.doi.org/10.5772/intechopen.91935
A gradient is the derivative of the activation function, as shown in Figure 10.
The primary purpose of gradient descent is to reach, step by step, the minimum of the overall cost, i.e. the point with the lowest error. At this point, the model predictions are more reliable because the model fits the data well. The slope can be evaluated with the help of Figure 11, from which Eq. (4) is derived:
$$x_i^{\text{new}} = x_i^{\text{old}} - \alpha\,\frac{\partial y}{\partial x_i} \qquad (4)$$
where α is the learning rate and $\partial y/\partial x_i$ is the partial derivative of y with respect to $x_i$. In gradient descent, this update is applied to each variable, and a step is productive when it decreases y (δy < 0).
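The update of Eq. (4) can be sketched on a simple function whose minimum is known in advance. The cost y(x_1, x_2) = (x_1 - 3)^2 + (x_2 + 1)^2 is a hypothetical example with its lowest point at (3, -1); the learning rate and step count are assumptions.

```python
import numpy as np


def grad_y(x):
    """Partial derivatives of y = (x1 - 3)^2 + (x2 + 1)^2."""
    return np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])


def gradient_descent(grad, x0, alpha=0.1, steps=200):
    """Repeatedly step against the gradient: x_i <- x_i - alpha * dy/dx_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x


print(gradient_descent(grad_y, [0.0, 0.0]))   # approaches the minimum at (3, -1)
```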
Gradient descent can be performed in either stochastic or full-batch mode. In stochastic gradient descent, the gradient is calculated from a single sample, whereas in full-batch gradient descent it is calculated over the full training dataset. One advantage of stochastic gradient descent is that each gradient is fast to calculate [1,13,23].
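The difference between full-batch and stochastic gradient descent can be sketched by fitting a straight line y = w x + b. The data (noise-free points on y = 2x + 1), learning rates and epoch counts are illustrative assumptions: the batch version computes one gradient over all samples per step, while the stochastic version updates after every single sample.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0                      # hypothetical data: true w = 2, b = 1


def batch_gd(x, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the mean squared error."""
    w = b = 0.0
    for _ in range(epochs):
        err = w * x + b - y
        w -= lr * 2.0 * np.mean(err * x)   # gradient over the full dataset
        b -= lr * 2.0 * np.mean(err)
    return w, b


def sgd(x, y, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one sample per update, shuffled each epoch."""
    rng = np.random.default_rng(seed)
    w = b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            err = w * x[i] + b - y[i]
            w -= lr * 2.0 * err * x[i]     # gradient from a single sample
            b -= lr * 2.0 * err
    return w, b


print(batch_gd(x, y), sgd(x, y))   # both approach (2.0, 1.0)
```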

Training algorithm by delta rule
The biases and weights are the network parameters that must be adjusted before an ANN is operated. They can be modified using either a supervised or an unsupervised approach. For training, supervised learning is generally used to determine the biases and weights of an ANN, and supervised training can be carried out with the delta rule. The delta rule for updating $W_{ij}$ is expressed by Eqs. (5)-(7):
$$E = \frac{1}{2n}\sum_{p=1}^{n}\left(t_p - y_p\right)^2 \qquad (5)$$
$$\Delta W_{ij}^{L} = -\alpha\,\frac{\partial E}{\partial W_{ij}^{L}} \qquad (6)$$
$$W_{ij}^{L}(\text{new}) = W_{ij}^{L}(\text{old}) + \Delta W_{ij}^{L} \qquad (7)$$
where n is the number of data pairs, $W_{ij}^{L}$ is the weight of the link from the ith neuron to the jth neuron in the Lth layer, E is the average error of estimation, $t_p$ is the target output, $y_p$ is the simulated output, and α is the learning rate, whose value is chosen experimentally between 0 and 1.
The backpropagation algorithm is the most common way of applying the delta rule to train an ANN. Backpropagation turns the mathematical expression of the delta rule into a computational procedure that can be applied iteratively. It provides the gradient needed to find the minimum of the error function, computed efficiently with the chain rule of differentiation; for this reason the procedure is also known as the generalised delta rule. In each iteration, the network weights are moved along the negative of the gradient, i.e. in the steepest-descent direction of the performance function, and one complete pass over the training data is called an epoch. For a weight in the Lth hidden layer, the chain rule gives Eq. (8):
$$\frac{\partial E}{\partial W_{ij}^{L}} = \frac{\partial E}{\partial y_j^{L}}\,\frac{\partial y_j^{L}}{\partial \xi_j^{L}}\,\frac{\partial \xi_j^{L}}{\partial W_{ij}^{L}} \qquad (8)$$
where $\xi_j^{L} = \sum_i W_{ij}^{L} y_i^{L-1} + B_j^{L}$ is the net input of neuron j and $y_j^{L} = f(\xi_j^{L})$ is its output through the activation function f. The iterations continue until the expected training output is achieved. Training may be stopped when the performance function reaches a minimum target value, after a set number of epochs, or after a set run time; this is known as stopped training. For a training pattern p (with the constant factor 1/n absorbed into α), the equations above lead to the weight updates of Eqs. (9) and (10). For the output layer:
$$\Delta W_{pij}^{L} = \alpha\,\delta_{pj}^{L}\,y_{pi}^{L-1}, \qquad \delta_{pj}^{L} = \left(t_{pj} - y_{pj}^{L}\right) f'\!\left(\xi_{pj}^{L}\right) \qquad (9)$$
For the hidden layer, the same weight update is used with
$$\delta_{pj}^{L} = f'\!\left(\xi_{pj}^{L}\right)\sum_{k}\delta_{pk}^{L+1}\,W_{jk}^{L+1} \qquad (10)$$
After training, the ANN model is operated on sample input vectors with the final weights and biases to simulate the related outputs. ANN training can be performed as batch training or incremental training. In batch training, the biases and weights are adjusted after all inputs and targets have been presented; in incremental training, they are adjusted after each individual input is presented. The learning rate used in training affects network performance.
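The delta rule with backpropagation can be sketched for a network with one sigmoid hidden layer, using the average error E = 1/(2n) Σ (t_p − y_p)^2 and full-batch weight updates. This is an illustrative sketch, not the chapter's code: the network size, learning rate, epoch count and the XOR training data are assumptions, and the bias vectors b1 and b2 play the role of B.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Forward pass: returns hidden-layer and output-layer activations."""
    W1, b1, W2, b2 = params
    h = sigmoid(X @ W1 + b1)        # hidden layer output y = f(xi)
    y = sigmoid(h @ W2 + b2)        # output layer
    return h, y


def loss(params, X, T):
    """Average error E = 1/(2n) * sum over patterns of (t - y)^2."""
    _, y = forward(params, X)
    return np.sum((T - y) ** 2) / (2 * len(X))


def gradients(params, X, T):
    """Backpropagation: dE/dW for every weight via the chain rule."""
    _, _, W2, _ = params
    n = len(X)
    h, y = forward(params, X)
    delta_out = (T - y) * y * (1 - y)             # output layer: (t - y) f'(xi), f' = y(1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden layer: f'(xi) * sum_k delta_k W_jk
    # dE/dW = -(1/n) * sum over patterns of delta_j * (input to the layer)_i
    return (-(X.T @ delta_hid) / n, -delta_hid.sum(axis=0) / n,
            -(h.T @ delta_out) / n, -delta_out.sum(axis=0) / n)


def train(X, T, n_hidden=4, alpha=0.5, epochs=5000, seed=0):
    """Batch training with the delta rule: W_new = W_old - alpha * dE/dW."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(size=(X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(size=(n_hidden, T.shape[1])), np.zeros(T.shape[1])]
    for _ in range(epochs):
        grads = gradients(params, X, T)
        params = [p - alpha * g for p, g in zip(params, grads)]
    return params


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets
params = train(X, T)
print(loss(train(X, T, epochs=0), X, T), loss(params, X, T))   # error before vs after training
```

Whether the network fully solves XOR depends on the random initialisation, but the error falls as training proceeds. Comparing `gradients` with a finite-difference estimate of `loss` is a quick way to check an implementation like this.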
With a low learning rate, learning the synaptic weights takes an extremely long time. If the learning rate is set too high, the algorithm tends to oscillate and the performance of the trained network is reduced, because the weight changes are too drastic. The learning rate therefore controls the convergence of the algorithm. The weight changes can be applied after each pattern, or the changes computed for all patterns can be summed and then applied to the network weights, as shown in Eq. (11):
$$\Delta W_{ij}^{L} = \sum_{p=1}^{n} \Delta W_{pij}^{L} \qquad (11)$$
In dynamic networks, the inputs and targets are usually presented in sequence. In adaptive learning, the most recent data, observed just before the time of simulation, are considered more important than the data set as a whole [4,14,26].
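The effect of the learning rate can be seen on the one-dimensional cost y = x^2 (gradient 2x), a hypothetical example. Each gradient-descent step multiplies x by (1 - 2α), so a very small α converges slowly, a moderate α converges quickly, a large α makes x oscillate around the minimum, and an even larger α diverges.

```python
def descend(alpha, x0=1.0, steps=20):
    """Gradient descent on y = x^2; returns the sequence of x values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - alpha * 2.0 * xs[-1])   # x <- x - alpha * dy/dx
    return xs


for alpha in (0.01, 0.45, 0.95, 1.05):
    xs = descend(alpha)
    print(f"alpha={alpha}: x after 20 steps = {xs[-1]:.3g}, "
          f"first steps = {[round(v, 3) for v in xs[:4]]}")
```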

Deep learning
In the field of AI, deep learning (DL) has gained much popularity and become a trending research domain. One of the foremost shortcomings of conventional machine learning is its inability to solve the selectivity-invariance problem; because of this drawback, such methods have limited capability to process data in its raw form. Selectivity-invariance enables a model to select the parameters that carry more information and to disregard those with less information. DL's ability to overcome the selectivity-invariance dilemma makes it attractive to researchers and motivates them to advance machine learning using the DL approach.
The architecture of DL is composed of many layers of trainable parameters, which helps DL-based algorithms perform excellently in machine learning and AI applications. DL algorithms are implemented as deep neural networks (DNNs), which are usually trained end to end with backpropagation-based optimisation algorithms. DNNs achieve selectivity-invariance by extracting compound features through successive layers of neurons equipped with differentiable, non-linear activation functions, which provides a suitable platform for the backpropagation algorithm. A generic architectural model of a DNN with numerous hidden layers is shown in Figure 12. The output layer of a DNN usually uses the softmax function for classification problems. The softmax function, also known as the normalised exponential, is given in Eq. (12):
$$Y_i = \frac{e^{a_i}}{\sum_{j} e^{a_j}} \qquad (12)$$
where the sum runs over the set of output nodes j, $a_i$ is the net input to a particular output node i, and $Y_i$ is the value of that output node, in the range (0, 1).
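Eq. (12) can be implemented in a few lines. Subtracting the largest net input before exponentiating is a standard numerical safeguard (an implementation detail, not part of the formula); it leaves the result unchanged because the common factor cancels in the ratio. The net input values are hypothetical.

```python
import numpy as np


def softmax(a):
    """Normalised exponential: Y_i = exp(a_i) / sum_j exp(a_j)."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - np.max(a))   # shift by the maximum to avoid overflow
    return e / e.sum()


net_inputs = np.array([2.0, 1.0, 0.1])   # hypothetical net inputs a_i to three output nodes
Y = softmax(net_inputs)
print(Y, Y.sum())   # each Y_i lies in (0, 1) and they sum to 1
```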
DNN models with non-linear behaviour can build several levels of abstraction, transforming the original data into progressively more abstract representations that help in decision-making. This makes it easier to find solutions for non-linear and complex functions. DL is based on automated feature learning, which offers transfer learning and modularity. Unlike conventional machine learning, training DL networks requires a large amount of data. The convolutional neural network (CNN) and the recurrent neural network (RNN) are well-known deep networks [27,28].

Convolutional neural network (CNN)
CNN is a popular DL methodology inspired by the animal visual cortex. CNNs are very similar to ANNs and can be viewed as acyclic graphs formed by a well-arranged collection of neurons. However, unlike in a regular ANN model, the neurons in the hidden layers of a CNN are connected only to a subset of the neurons in the preceding layer. This sparse interconnectivity enables CNN models to learn discrete features of an object. CNN models are used for face recognition, scene labelling, image classification, document analysis and much more.
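The local connectivity of a CNN can be sketched with a plain 2-D convolution: each output value depends only on a small patch of the input, and the same kernel weights are reused at every position. As in most deep learning libraries, the operation below is technically a cross-correlation (the kernel is not flipped). The tiny image and the edge-detecting kernel are hypothetical examples.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation form) with a single kernel."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]    # only a local subset of the input
            out[r, c] = np.sum(patch * kernel)   # same (shared) weights at every position
    return out


image = np.zeros((4, 4))
image[:, 2:] = 1.0                    # dark left half, bright right half
kernel = np.array([[-1.0, 1.0]])      # responds to left-to-right brightness changes
print(conv2d(image, kernel))          # each row is [0. 1. 0.]: the vertical edge is detected
```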
The police department of Penang Island, Malaysia, installed more than 500 CCTV cameras around the island, many of them equipped with face recognition technology developed by IBM, with the main objective of controlling crime and capturing wanted criminals [29]. Likewise, China Pharmaceutical University installed a facial recognition system across its campus, including classrooms, labs, the library and entrance gates, to monitor student attendance and class discipline. This improved the students' overall response towards academics [30]. Face recognition technology is based on deep CNN models. It can be performed with both supervised and unsupervised approaches, but supervised methods are mostly preferred. Face recognition takes an image or video as input and converts it to greyscale for detection. The greyscale features are applied one by one and compared with pixel values. CNN models give higher accuracy than earlier techniques by overcoming problems such as variations in light intensity and facial expression, using models trained with more samples [31-33].

Recurrent neural network (RNN)
RNNs are used for tasks that require sequential inputs. Initially, RNNs were trained using backpropagation. An RNN processes one element of the input sequence at a time, keeping a state vector in its hidden nodes that implicitly contains information about all past elements of the sequence. RNNs are dynamic and fairly powerful systems, but during training the backpropagated gradients either shrink or grow at every time step, so after many time steps they may vanish (or explode). When an RNN is unfolded in time, it can be seen as a deep feedforward network in which all layers share the same weights. RNNs lack the capability to store information for a long time; this deficiency is known as the long-term dependency problem. To overcome it, an approach with explicit memory, known as long short-term memory (LSTM), was introduced. In LSTM, special hidden units store the input information for much longer. LSTM is well recognised for its better performance in speech recognition systems [1,27,28].
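Two properties described above can be sketched in NumPy: a hidden state that carries the past forward with the same weights reused at every time step, and gradients that shrink or grow as they are propagated back through many steps. The weight values are illustrative assumptions, and LSTM cells are not implemented here.

```python
import numpy as np


def rnn_forward(xs, W_h, W_x, b):
    """Vanilla RNN: processes one element at a time, sharing W_h, W_x and b across steps."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x + b)   # new state depends on the input and the old state
        states.append(h)
    return np.array(states)


def gradient_through_time(w, steps):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1}) starting from h_0 = 0."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        h = np.tanh(w * h)
        grad *= w * (1.0 - h ** 2)            # chain rule: multiply by d h_t / d h_{t-1}
    return grad


rng = np.random.default_rng(0)
states = rnn_forward(rng.normal(size=(5, 2)), 0.5 * rng.normal(size=(3, 3)),
                     rng.normal(size=(3, 2)), np.zeros(3))
print(states.shape)                       # (5, 3): one state per time step
print(gradient_through_time(0.5, 50))     # vanishes towards 0
print(gradient_through_time(1.5, 50))     # grows very large
```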
Apple's Siri, Amazon's Alexa, Microsoft's Cortana and Google Assistant are the most popular voice recognition tools; they are used to make phone calls, set reminders and alarms, provide driving directions and much more. These speech recognisers are built on RNNs with the LSTM-RNN architecture, which gives them the ability to deal with long-distance patterns and makes them suitable for learning long-span relations. The models are trained end-to-end to produce the output [34,35]. Other applications of RNN models include keyphrase recognition, meteorological data updating and speech-to-text [35-38]. The Massachusetts Institute of Technology (MIT) performed an interesting simulation study on self-driving cars, whose framework was also developed on a deep reinforcement learning model [39].

Supervised ANN model
A simple ANN model was developed in Python, following a supervised CNN methodology for image classification. Images of apples and oranges were collected to train and validate the model: 20 images of each fruit (40 in total) for training, and 10 more of each (20 in total) for validation. The data for the supervised process were arranged with a separate folder for each process, i.e. training and validation. In the folder named 'Training', the images of each fruit were placed in separate sub-folders named after the fruit, i.e. 'Apple' and 'Orange', and the same was done for the 'Validation' folder. In the classification and prediction process, the effectiveness of the model output was analysed against two parameters: (1) the effect of increasing the number of epochs per run, and (2) the number of hidden layers.
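The folder layout described above, with one sub-folder per class inside 'Training' and 'Validation', lets the class label be read from the folder name. The sketch below uses only the standard library; the function name and accepted file extensions are assumptions, and it is not the chapter's Appendix A code. Deep learning libraries such as Keras provide loaders that infer labels from the same kind of layout.

```python
from pathlib import Path


def list_labelled_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image path, label) pairs, taking each label from its sub-folder name."""
    pairs = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in extensions:
                pairs.append((image_path, class_dir.name))   # e.g. 'Apple' or 'Orange'
    return pairs


# Usage, assuming the layout Training/Apple/*.jpg, Training/Orange/*.jpg, etc.:
# training_pairs = list_labelled_images("Training")
# validation_pairs = list_labelled_images("Validation")
```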

Number of epochs per run
The effect of increasing the number of epochs for each run is shown in Table 2. The effectiveness of the output is measured by the % accuracy and % loss for different numbers of epochs, with the number of hidden layers kept constant for each run. Table 2 shows that increasing the number of epochs refines the output, increasing the accuracy and decreasing the loss. The model correctly predicted the fruit class in all runs.

Number of hidden layers
The effect of increasing the number of hidden layers for each run is shown in Table 3. The effectiveness of the output is measured by the % accuracy and % loss for various numbers of hidden layers, with the number of epochs kept constant for each run. Table 3 shows that increasing the number of hidden layers increases the model's effectiveness, increasing the accuracy and decreasing the loss. The model gave one wrong prediction with 2 hidden layers; as the number of hidden layers was increased, it began to predict correctly.

Overall summary
The output window of the model is shown in Figure 13; the model correctly predicted the output ('Apple'). The accuracy of the model increased with each epoch, from about 37% to 89%, while the loss decreased correspondingly. The program code for this model is given in Appendix A.

Unsupervised ANN model
A simple unsupervised ANN model was developed in Python for the colour quantization of an image, using the Self-Organising Map (SOM) methodology. SOMs are mainly used for feature detection.
Two different images of houses were selected for colour quantization by the SOM model, and separate tests were conducted on each image under the same model conditions. In each test, the SOM model reduced the number of distinct colours in the image and produced a new image: the model learned the colours in the image and then used those colours to reconstruct it. The output for each test is shown in Figure 14.
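Colour quantization with a SOM can be sketched in plain NumPy: each pixel's RGB value is an input vector, the node weights of a small map become the learned palette, and every pixel is then replaced by the colour of its best-matching node. Appendix B uses the MiniSom library; this dependency-free version is a simplified illustration, and the map size, decay schedules, iteration count and random test image are assumptions.

```python
import numpy as np


def train_som(pixels, grid=(3, 3), iterations=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM on RGB pixel vectors; returns the learned colour palette."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # initialise each node's weight vector with a randomly chosen pixel
    weights = pixels[rng.integers(0, len(pixels), rows * cols)].astype(float)
    weights = weights.reshape(rows, cols, -1)
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iterations):
        lr = lr0 * (1 - t / iterations)                  # learning rate decays linearly
        sigma = sigma0 * (1 - t / iterations) + 1e-3     # neighbourhood radius shrinks
        x = pixels[rng.integers(len(pixels))]
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.array(np.unravel_index(np.argmin(dist), dist.shape))   # best-matching unit
        grid_d2 = np.sum((coords - bmu) ** 2, axis=-1)
        influence = np.exp(-grid_d2 / (2 * sigma ** 2))  # neighbours of the BMU also learn
        weights += lr * influence[..., None] * (x - weights)
    return weights.reshape(-1, pixels.shape[1])


def quantize(pixels, palette):
    """Replace every pixel with the nearest palette colour."""
    d = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=-1)
    return palette[np.argmin(d, axis=1)]


# Usage with a hypothetical random image (H x W x 3, values in [0, 1])
rng = np.random.default_rng(1)
image = rng.random((32, 32, 3))
pixels = image.reshape(-1, 3)
palette = train_som(pixels)
quantized = quantize(pixels, palette).reshape(image.shape)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))   # at most 9 distinct colours
```

For a real photograph, the image would be loaded as an array, scaled to [0, 1] and reshaped to one RGB row per pixel in the same way.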

Overall summary
The output results show that, in each test, the model detected the distinct colours and reproduced the image using those same colours. The output window of the model is shown in Figure 15. The program code for this model is given in Appendix B.

Conclusions
ANN models simulate the operation of the human brain and fall under the knowledge domain of AI. The popularity of ANN models increased in the early 1990s, and many studies have been done since. The basic ANN model has three main layers, and the main processing is performed in the middle layer, known as the hidden layer. The output of an ANN model depends strongly on the characteristics and functions of its hidden layer. Of feedforward and feedback networks, the latter feeds the error back until it becomes minimal, giving more effective results. ANN models can perform supervised as well as unsupervised learning, depending on the task. DL algorithms are very popular among researchers because they produce effective outputs from large data. CNN and RNN are two well-known deep networks that have been used in various applications. The output accuracy of ANN models depends strongly on the number of hidden layers and the number of epochs.
In this era of automation, AI plays an important role, and most everyday applications are based on ANN architectures. ANN technology, combined with other advanced AI knowledge areas, is making life easier in almost every domain. The evolution of DNN models has led to the creation of Sophia the Robot (Hanson Robotics), and the journey is ongoing.
Sigmoid represents the activation function of this model.

Appendix B
The program code for the unsupervised SOM model is given below:

Step#1 Opening Python
Python was opened, and a conda environment was selected.
Step#2 Installing and Importing the Necessary Libraries (MiniSom can be installed with: pip install minisom)
from minisom import MiniSom
import numpy as np
import matplotlib.pyplot as plt