Open access peer-reviewed chapter

Pre-Informing Methods for ANNs

Written By

Mustafa Turker

Submitted: 31 July 2022 Reviewed: 02 August 2022 Published: 31 August 2022

DOI: 10.5772/intechopen.106906

From the Edited Volume

Artificial Neural Networks - Recent Advances, New Perspectives and Applications

Edited by Patrick Chi Leung Hui


Abstract

In the recent past, when computers first entered our lives, we could not even imagine what today would look like. If we look at the future with the same perspective today, only one assumption can be made about where technology will go in the near future: artificial intelligence applications will be an indispensable part of our lives. While today’s work is promising, there is still a long way to go. The structures that researchers call artificial intelligence today are in fact programmed systems with limits, built to produce a result. Real learning includes many complex features such as convergence, association, inference and prediction. This chapter demonstrates with an application how the input layer connections of human neurons can be transferred to an artificial learning network with the pre-informing method. When the results are compared, the learning load (weights) was reduced from 147 to 9 with the proposed pre-informing method, and the learning rate was increased by 15–30% depending on the activation function used.

Keywords

  • ANN
  • pre-informing
  • AHP
  • modified networks
  • interfered networks

1. Introduction

The learning mechanism makes human beings superior to all other creatures. Although today’s computers have much more processing power, the human brain is still far more efficient than any computer or any artificially developed intelligence.

Building a perfect learning network requires more than just cell structures and their weights. The human brain has a very complex network, and each brain is unique. Today’s technology is not enough to explain all the details of how our brain works. My observation of how our brain works starts from defining items. Every item has a key cell in our brain. The defining process is done through visuals, smell, touch, the item’s linguistic name and the sound it makes. If a key cell matches any of this information coming from the body’s inputs, thinking and learning continue; if no key cell was defined before, a new cell is assigned for this item. Then, your brain wants to explore the item’s behavior. You take the item in your hand and begin a physical observation. When the physical observation is satisfied, your brain starts to categorize it. After categorization, your brain checks other items with the same categorization and determines what other information can be learned. Whenever you see that someone has more knowledge than you, you want to talk about this newly learned item, or you want to do research on it. The key cell thus keeps developing itself with the explored information. Each key cell and its network can also connect to any part of another, if logical connections exist.

Today’s artificial intelligence studies are rather simple compared to reality. Mathematical modeling of learning in an artificial cell and solving the problem with an optimization mechanism has brought success in most areas. However, this success is due to the fast processing capacity of computers rather than to a perfect modeling of machine learning. Researchers therefore need to work on developing artificial neural networks that are closer to real learning.

In this study, the pre-informing method for artificial neural networks and its rules are explained with an example, in order to establish a more conscious and effective learning network instead of searching for relationships in random connections.

2. ANN structure

In the literature of ANN design, the first principles were introduced in the middle of the 20th century [1, 2]. Over the following years, network structures such as Perceptron, Artron, Adaline, Madaline, Back-Propagation, Hopfield Network, Counter-Propagation Network, Lamstar were developed [3, 4, 5, 6, 7, 8, 9, 10].

In most network configurations, the complex behavior of our brain is artificially imitated through layers. Basically, an artificial neural network has 3 types of layers: the input layer, the hidden layers and the output layer (see Figure 1), and all cells in these layers are connected to each other with artificial weights [1, 2].

Figure 1.

Basic ANN structure.

The input layer is the cluster of cells that presents the data influencing learning. Each cell represents a parameter with a variable data value. These values are scaled according to the limits of the activation function used in the next layers. The selection of input parameters requires knowledge and experience of the subject for which the artificial intelligence is to be created. In fact, this process is exactly the transfer of natural neuron input parameters from our brain to paper. However, this is not so easy, because a learning activity in our brain is connected to a huge number of networks managed subconsciously. To illustrate, our minds sometimes make inferences even on subjects we have no knowledge of, and we can make correct predictions about them. In some cases, we feel the result of an event that we do not know, but we cannot explain it. The best example of this is falling in love: no one can tell why you fall in love with a person; it happens, and then you look for the reason. This is proof that the subconscious mind plays a major role in learning. It also means that there may be input parameters we have not noticed. Therefore, it is necessary to focus on this layer and define the input parameters carefully.

The hidden layer(s) is where the data of the input parameters are interpreted and where the learning capability of the network is defined. Each cell in these layers transforms the data coming from the input layer cells, or from previous hidden layer cells, with the defined activation function and sends the result to all cells in the next layer. Learning of nonlinear behavior takes place in this layer. Increasing the number of layers and cells in this group does not always help; beyond a point it produces memorization rather than learning. It also increases the number of connections and thus greatly increases the amount of experienced data required to determine the weight values of these connections.

In general, the basic mechanism of an artificial neuron consists of two steps: summation and activation [1]. Summation is the process of summing the intensities of incoming connections. Activation, on the other hand, is the process of transforming the collected signals according to the defined function (See Figure 2).

Figure 2.

Artificial neuron structure.

There are many activation functions. The purpose of these functions is to emulate linear or non-linear behavior. The sigmoid function is one of the most commonly used activation functions.

Mathematically, the summation and activation process of an artificial neuron is expressed as below (See Eqs. (1) and (2)).

u = ∑_{i=1}^{n} x_i w_i + θ   (1)
y = f(u)   (2)

In these equations,

  • x_i: input value, or the output value of a cell in the previous layer,

  • w_i: weight value of the connection from that cell,

  • θ: bias value,

  • u: net collected output value of the cell,

  • y: activated output value of the cell.

In some cases, the learning network cannot find a logical connection between the results and the inputs; so that this does not stop learning, a bias value can be used for each cell. A high bias coefficient means that learning is low and memorization is high.
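As a concrete illustration, the short Python sketch below implements Eqs. (1) and (2) for a single artificial neuron, using the sigmoid function mentioned above as the activation; the example input values, weights and bias are assumptions chosen only for demonstration.

```python
import numpy as np

def sigmoid(u):
    # Sigmoid activation: squashes the net input into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-u))

def neuron_output(x, w, theta, activation=sigmoid):
    # Summation (Eq. 1): weighted sum of the incoming connections plus the bias
    u = np.dot(x, w) + theta
    # Activation (Eq. 2): transform the collected signal with the defined function
    return activation(u)

# Hypothetical example: three scaled inputs, their connection weights and a bias
x = np.array([0.2, 0.7, 0.5])
w = np.array([0.4, -0.1, 0.9])
print(neuron_output(x, w, theta=0.05))
```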

The output layer is the last layer in the chain and receives inputs from the last set of cells in the hidden layer. In this layer, the data is collected and, as a result, the output data is exported in the planned format.

The learning process of the network established with the input, hidden and output layers is actually an optimization problem. The connection values between the cells of the network converge toward the result depending on the optimization technique. A training set consisting of a certain number of input and output data is used for this purpose. If desired, a certain amount of the data set is also reserved for testing, to measure the consistency of the network. When the learning is complete, the values of the weights are fixed, and the network becomes serviceable. If desired, the mathematical equation of the network can be derived by following the cells from the output back to the input.
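As a minimal sketch of this optimization view of training, the following Python fragment fits the weights and bias of a single sigmoid cell to a tiny, made-up training set by minimizing the mean squared error with a gradient-based solver; the data and the choice of SciPy's BFGS method are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical training set: rows of scaled input values and their target outputs
X = np.array([[0.2, 0.7], [0.9, 0.1], [0.4, 0.5]])
t = np.array([0.8, 0.3, 0.6])

def forward(v):
    # Single sigmoid cell: summation with bias, then activation
    w, theta = v[:-1], v[-1]
    return 1.0 / (1.0 + np.exp(-(X @ w + theta)))

def loss(v):
    # Mean squared error between the network output and the training targets
    return np.mean((forward(v) - t) ** 2)

# Gradient-based optimization of the connection weights and the bias
result = minimize(loss, x0=np.zeros(X.shape[1] + 1), method="BFGS")
print(result.x)  # once learning is complete, these weights are fixed
```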

3. Pre-informing of ANNs

Pre-informing, unlike pre-training, is the processing of certain information or rules into the structure of the network. In reality, a person learns under certain prejudices. These prejudices are a mechanism that allows us to make predictions about an event before it occurs, and they make these inferences by drawing on similar events. With these prejudices, the number of training data required for learning decreases considerably. As a result, you have a clean and efficient way of learning.

For example, consider a child who goes out alone for the first time: the mother advises the child never to talk to strangers, and the child infers that if they talk to a stranger, the result may be bad. In this case, the people to talk to are the input parameters, and the possibility of something bad happening as a result of the conversation is the output parameter. If the mother had not advised her child, the child would talk to everyone and eventually learn that talking to a stranger is bad and dangerous. As a result of the mother’s suggestion, the weight of strangers among the input parameters (people to talk to) increased before the child even experienced it.

In order to transfer prejudices to artificial neural networks, some rules must be followed:

  1. The pre-informed network structure consists of 3 layers: input layer, hidden layer and output layer. The hidden layer consists of a single sublayer.

  2. Input parameters should be grouped if possible. For example, in a learning network that predicts heart attacks, personal characteristics form one group, bad habits another and genetic diseases another. If there is no natural grouping, the inputs should be treated as one group. These inputs should be scaled according to the activation function that will be used in the hidden layer.

  3. The information to be processed (pre-informing) should be in the weights between the input layer and the hidden layer.

  4. An artificial neuron cell is placed in the hidden layer for each input group to represent that group. This cell consists of 3 steps: summation, scaling and activation. Two or more different activation functions can be used in the hidden layer cells; in that case, the same number of representation cells should be defined in the hidden layer for each input group.

  5. The connections from cells in the input layer to the representation cells of groups other than their own are set to 0.

  6. The representation cells in the hidden layer are directly connected to the output layer.

  7. The optimization adjusts only the weights of the connections between the hidden layer cells and the output layer cells.

  8. The connection values from each input layer group to its representation cells in the hidden layer are determined and fixed, group by group, using the techniques in the literature.

Figure 3 illustrates such a network: a total of 23 input parameters belonging to 3 input groups, each group represented by two separate cells with hyperbolic tangent and sigmoid activations, a hidden layer consisting of a total of 6 cells, and finally an output layer.

Figure 3.

Pre-informed ANN structure.
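A minimal sketch of this connection pattern is given below, assuming the Figure 3 layout of 23 inputs split into hypothetical groups of 10, 9 and 4 parameters, each group feeding only its own pair of representation cells (rule 5); the mask simply marks which input-to-hidden connections exist.

```python
import numpy as np

# Assumed Figure 3 layout: 23 inputs in 3 groups, 2 representation cells per group
group_sizes = [10, 9, 4]
cells_per_group = 2            # one tanh cell and one sigmoid cell per group

mask = np.zeros((sum(group_sizes), len(group_sizes) * cells_per_group))
row = 0
for g, size in enumerate(group_sizes):
    cols = slice(g * cells_per_group, (g + 1) * cells_per_group)
    mask[row:row + size, cols] = 1   # connections to the group's own cells
    row += size                      # connections to other groups stay 0 (rule 5)

print(int(mask.sum()), "pre-informed connections out of", mask.size, "possible")
# These connections are fixed by pre-informing; only the hidden-to-output
# connections are left for the optimization to learn (rule 7).
```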

After the network structure is established, the next step is pre-informing the network. This stage is the transfer of information from the subconscious to the network weights, and it should be carried out separately for each group. The best method for this process is AHP (Analytic Hierarchy Process) evaluation. In AHP, each parameter is compared with the others using verbal expressions on a simple superiority scale. This means you can prepare a questionnaire and obtain the superiority information of the parameters from an expert’s mind. After some calculations you will have the weights, which are used directly in the network. The advantage of this technique is that a consistency analysis can be performed. In the end, if the input parameters are defined correctly, you obtain an academically defensible extraction of the expert’s subconscious information.

AHP is a multi-criteria decision making (MCDM) method. The earliest reference to AHP is from 1972 [11]. Saaty [12] later described the method fully in an article published in the Journal of Mathematical Psychology. AHP makes it possible to divide the problem into a hierarchy of sub-problems that can be grasped and evaluated subjectively more easily. Subjective evaluations are converted into numerical values, and each alternative is processed and ranked on a numerical scale. A schematic AHP hierarchy is given in Figure 4 below.

Figure 4.

AHP hierarchy.

At the top of the hierarchy is the goal/purpose, while at the bottom there are the alternatives. Between these two parts are the criteria and their sub-criteria. The most important feature of AHP is that it can make comparisons both locally and globally when evaluating the effect of sub-criteria at any level on the alternatives.

Data corresponding to the hierarchical structure are collected from experts or decision makers through pairwise comparison of the alternatives on a qualitative scale. Experts can rate each comparison as equal, moderately strong, strong, very strong or extremely strong. A general chart, as shown in Figure 5, is used for the expert evaluation of pairwise comparisons and for data collection. This design can be customized for the purpose, the method and the user.

Figure 5.

Pairwise comparison chart of alternatives A and B. B is very inferior compared to A.

Comparisons are made for each criterion and converted to quantitative numbers according to Table 1.

Scale | Definition | Description
1 | Equal | The two criteria are equally important.
3 | Little superior | One of the criteria has some superiority based on experience and judgment.
5 | Superior | One of the criteria has many advantages based on experience and judgment.
7 | Very superior | One criterion is considered clearly superior to the other.
9 | Extremely superior | Evidence that one criterion is superior to the other has great credibility.
2, 4, 6, 8 | Intermediate values | Intermediate values to be used for reconciliation.

Table 1.

Comparison scales and explanations.

The pairwise comparison values of the criteria are arranged in a matrix, as shown in Table 2.

     | C1           | C2           | C3           | … | Cn
C1   | a11 = 1      | a12          | a13          | … | a1n
C2   | 1/a12        | a22 = 1      | a23          | … | a2n
C3   | 1/a13        | 1/a23        | a33 = 1      | … | a3n
Cn   | 1/a1n        | 1/a2n        | 1/a3n        | … | ann = 1
Sum  | S1 = ∑_i ai1 | S2 = ∑_i ai2 | S3 = ∑_i ai3 | … | Sn = ∑_i ain

Table 2.

Pairwise comparison matrix of criteria.

In the next step, each a_ij value is normalized by dividing it by the corresponding column sum S_j, and the weights are obtained with the equation shown in the last row of Table 3.

     | K1     | K2     | K3     | … | Kn     | wi
K1   | a11/S1 | a12/S2 | a13/S3 | … | a1n/Sn | w1
K2   | a21/S1 | a22/S2 | a23/S3 | … | a2n/Sn | w2
K3   | a31/S1 | a32/S2 | a33/S3 | … | a3n/Sn | w3
Kn   | an1/S1 | an2/S2 | an3/S3 | … | ann/Sn | wn
Sum  | S1/S1  | S2/S2  | S3/S3  | … | Sn/Sn  | wi = (∑_j aij/Sj) / n

Table 3.

Obtaining the weights of the normalized comparison values of the criteria.
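To make the procedure of Tables 2 and 3 concrete, the sketch below computes AHP weights for a hypothetical 4-criterion comparison matrix in Python and adds the consistency check mentioned earlier in the text; the matrix entries are invented purely for illustration.

```python
import numpy as np

# Hypothetical pairwise comparison matrix filled with Table 1 scale values
# (lower triangle holds the reciprocals, a_ji = 1 / a_ij)
A = np.array([
    [1.0, 3.0, 5.0, 7.0],
    [1/3, 1.0, 3.0, 5.0],
    [1/5, 1/3, 1.0, 3.0],
    [1/7, 1/5, 1/3, 1.0],
])

S = A.sum(axis=0)                    # column sums S1..Sn (last row of Table 2)
weights = (A / S).mean(axis=1)       # Table 3: w_i = (1/n) * sum_j (a_ij / S_j)
print(weights.round(3))              # priority weights, they sum to 1

# Consistency analysis (Saaty): a CR below about 0.10 is usually acceptable
n = A.shape[0]
lambda_max = (A @ weights / weights).mean()
CI = (lambda_max - n) / (n - 1)
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}[n]   # random index values
print("CR =", round(CI / RI, 3))
```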

The connection of the input parameters to the network using AHP has been explained above. The next step is to assign the weights. Figure 6 shows how the AHP weights are assigned to the network connections.

Figure 6.

Connections of two input groups to three different types of representation cells and implementation of AHP weights.

In this way, a large number of connections are canceled, and a fast, efficient network that needs less data is obtained.
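The fragment below sketches how such a pre-informed network can be trained, assuming two small hypothetical input groups and the two activation types of Figure 3: the AHP weights of each group are fixed on the input-to-hidden connections, each group feeds one tanh and one sigmoid representation cell, and only the hidden-to-output connections are optimized. All numbers are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Fixed AHP weights of two hypothetical input groups (pre-informing, rules 3 and 8)
ahp_g1 = np.array([0.54, 0.30, 0.16])     # group 1: three parameters
ahp_g2 = np.array([0.67, 0.33])           # group 2: two parameters

def hidden_layer(x1, x2):
    # Each group feeds only its own representation cells (rule 5)
    u1, u2 = ahp_g1 @ x1, ahp_g2 @ x2
    tanh_cells = np.tanh([u1, u2])                          # tanh representation cells
    sigm_cells = 1.0 / (1.0 + np.exp(-np.array([u1, u2])))  # sigmoid representation cells
    return np.concatenate([tanh_cells, sigm_cells])

# Hypothetical training data: group-1 inputs, group-2 inputs and target outputs
X1 = np.array([[0.2, 0.5, 0.9], [0.8, 0.1, 0.3], [0.6, 0.6, 0.2]])
X2 = np.array([[0.4, 0.7], [0.9, 0.2], [0.1, 0.8]])
t = np.array([0.7, 0.4, 0.5])
H = np.array([hidden_layer(a, b) for a, b in zip(X1, X2)])

def loss(v):
    # Only the hidden-to-output weights and the output bias are optimized (rule 7)
    w_out, bias = v[:-1], v[-1]
    y = H @ w_out + bias                  # linear output cell, f(x) = x
    return np.mean((y - t) ** 2)

res = minimize(loss, x0=np.zeros(H.shape[1] + 1), method="BFGS")
print(res.x)   # learned output-layer weights; the pre-informed weights stay fixed
```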

4. Estimation of the severity of occupational accidents with using pre-informed ANN

The pre-informed neural network method was used by Turker [13] to predict the severity of occupational accidents in construction projects. In that study, the aim was to estimate how accidents would end if they occurred, rather than the probability of their occurrence. The scope covered the 4 most common accident types worldwide: falling from height, being hit by a thrown/falling object, structural collapse and electrical contact. In the study, 23 measures to be taken against occupational accidents are considered in 3 groups, and these measures have been associated with occupational accident severity in the artificial intelligence network (Table 4).

Collective protection measures (TKY):
(TKY-1) Construction site curtain system
(TKY-2) Colored excavation net
(TKY-3) Safety rope system
(TKY-4) Guardrail systems
(TKY-5) Facade cladding
(TKY-6) Safety field curtain
(TKY-7) First aid kit, fire extinguisher
(TKY-8) Facade safety net
(TKY-9) Mobile electrical distribution panel
(TKY-10) Warning and information signs

Personal protective equipment (KKD):
(KKD-1) Safety helmet
(KKD-2) Protective goggles
(KKD-3) Face mask
(KKD-4) Face shield
(KKD-5) Working suit
(KKD-6) Reflector
(KKD-7) Parachute safety belt
(KKD-8) Working shoes
(KKD-9) Protective gloves

Control, training, inspection (KEM):
(KEM-1) OHS specialist
(KEM-2) Occupational doctor
(KEM-3) Examination
(KEM-4) OHS trainings

Table 4.

Risk reduction measures in occupational accidents.

First of all, the defined measures against occupational accidents, which are the input parameters, were turned into a questionnaire by creating paired comparison questions for comparison within their own groups. Occupational health and safety experts working professionally in the sector were reached through a professional firm. The questionnaires were administered online and recorded. The survey results were then converted into weights with AHP matrices. The weights are shown in Tables 5–7.

Code | Structural collapse | Falling from height | Object hit | Contact w/ electricity
TKY-1 | 0.000 | 0.000 | 0.000 | 0.000
TKY-2 | 0.000 | 0.000 | 0.000 | 0.000
TKY-3 | 0.555 | 0.398 | 0.109 | 0.000
TKY-4 | 0.000 | 0.185 | 0.109 | 0.000
TKY-5 | 0.000 | 0.102 | 0.000 | 0.000
TKY-6 | 0.252 | 0.099 | 0.109 | 0.107
TKY-7 | 0.097 | 0.039 | 0.406 | 0.120
TKY-8 | 0.000 | 0.126 | 0.000 | 0.000
TKY-9 | 0.000 | 0.000 | 0.000 | 0.411
TKY-10 | 0.097 | 0.052 | 0.269 | 0.361

Table 5.

AHP weights of collective protection measures group.

Code | Structural collapse | Falling from height | Object hit | Contact w/ electricity
KKD-1 | 0.195 | 0.243 | 0.225 | 0.076
KKD-2 | 0.095 | 0.050 | 0.080 | 0.098
KKD-3 | 0.044 | 0.044 | 0.035 | 0.039
KKD-4 | 0.072 | 0.050 | 0.091 | 0.100
KKD-5 | 0.071 | 0.093 | 0.106 | 0.179
KKD-6 | 0.039 | 0.044 | 0.050 | 0.058
KKD-7 | 0.337 | 0.388 | 0.252 | 0.086
KKD-8 | 0.081 | 0.045 | 0.087 | 0.142
KKD-9 | 0.066 | 0.045 | 0.074 | 0.222

Table 6.

AHP weights of personal protective equipment group.

Code | Structural collapse | Falling from height | Object hit | Contact w/ electricity
KEM-1 | 0.481 | 0.167 | 0.399 | 0.426
KEM-2 | 0.210 | 0.167 | 0.161 | 0.134
KEM-3 | 0.098 | 0.167 | 0.083 | 0.067
KEM-4 | 0.210 | 0.500 | 0.357 | 0.372

Table 7.

AHP weights of control, training, inspection group.

After obtaining the pre-informing weights, 3 different artificial intelligence networks were created (Table 8). 140 historical accident records concerning the selected accident types were collected within a company. These data include the precautions that were in place at the time of the accident and how the accident ended. Accident results are divided into 4 categories: near miss, minor injury, serious injury and death. For each accident type, 35 data sets were collected; in total, 120 data sets were used for training the network and 20 for testing it.

Network | Regular ANN | Pre-informed ANN | Pre-informed ANN
Software | SPSS – Neural Networks engine | EXCEL VBA + SOLVER | EXCEL VBA + SOLVER
Network structure | Multilayer Perceptron (MP) | MP | MP
Number of hidden layers | 1 | 1 | 1
Cells in hidden layer | 6 (cells) + 1 (bias) | 6 (cells) + 3 (bias) | 6 (cells) + 3 (bias)
Activation function in hidden layer cells | Hyperbolic tangent (6 cells) | Hyperbolic tangent (6 cells) | Parabolic functions: 3 cells f(x) = x², 3 cells f(x) = x
Output function | f(x) = x | f(x) = x | f(x) = x
Scaling method | (x − x̄) / standard dev. | (x − x̄) / standard dev. | (1 − x) × 10
Optimization algorithm | Gradient methods | Gradient methods | Gradient methods
Randomizer | Mersenne Twister algorithm | Mersenne Twister | Mersenne Twister
Initial value | 10 | 10 | 0.1

Table 8.

3 alternative ANN structures.

Three alternative network structures were trained with the same data. As a result, the pre-informed neural network achieved a learning rate 5% better on the training set and 15% better on the test set than the neural network without a pre-informing stage. Among the pre-informed networks, the configuration using parabolic activation functions achieved a 1% better learning rate on the training set and 15% better on the test set than the configuration using the hyperbolic tangent. Configurations with other activation functions were not included in the comparisons because of their low learning rates. In summary, the pre-informing phase significantly increases the learning performance of artificial neural networks. In addition, the parabolic activation function performed better than the hyperbolic tangent in relating the prevention measures for occupational accidents to the accident outcome (Table 9).

Accident type | Data set | Regular ANN | Pre-informed ANN | Pre-informed ANN
Structural collapse | Training set | 26/30 (87%) | 29/30 (97%) | 30/30 (100%)
Structural collapse | Test set | 2/5 (40%) | 4/5 (80%) | 4/5 (80%)
Contact w/ electricity | Training set | 27/30 (90%) | 30/30 (100%) | 30/30 (100%)
Contact w/ electricity | Test set | 2/5 (40%) | 4/5 (80%) | 5/5 (100%)
Object hit | Training set | 30/30 (100%) | 30/30 (100%) | 30/30 (100%)
Object hit | Test set | 4/5 (80%) | 3/5 (60%) | 5/5 (100%)
Falling from height | Training set | 30/30 (100%) | 30/30 (100%) | 30/30 (100%)
Falling from height | Test set | 4/5 (80%) | 4/5 (80%) | 4/5 (80%)
TOTAL | Training set | 113/120 (94%) | 119/120 (99%) | 120/120 (100%)
TOTAL | Test set | 12/20 (60%) | 15/20 (75%) | 18/20 (90%)

Table 9.

3 alternative ANN structure results.

5. Conclusions

In this study, how the learning ability of artificial neural networks can be increased with the pre-informing method has been explained with rules and demonstrations. It is not possible to implement this method with the existing ready-made ANN software on the market. Instead, the ANN should be expressed mathematically, and the pre-informing method should be applied using programming languages or tools such as MATLAB, Excel VBA or Python.

In this chapter, the application of the method has been demonstrated on an artificial neural network in which the precautions against occupational accidents are associated with the accident outcomes, and high performance has been achieved. By following the specified rules, this method can be used to solve many problems. In future studies, it can be investigated which methods other than AHP can be used for the pre-informing phase.

Conflict of interest

The author declares no conflict of interest.

References

  1. Graupe D. Principles of Artificial Neural Networks. 3rd ed. Advanced Series in Circuits and Systems. Singapore: World Scientific Publishing Co. Pte. Ltd.; 2013. DOI: 10.1142/8868
  2. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics. 1943;5(4):115-133
  3. Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review. 1958;65(6):386
  4. Graupe D, Lynn J. Some aspects regarding mechanistic modelling of recognition and memory. Cybernetica. 1969;12(3):119
  5. Hecht-Nielsen R. Counterpropagation networks. Applied Optics. 1987;26(23):4979-4984
  6. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences. 1982;79(8):2554-2558
  7. Bellman R, Kalaba R. Dynamic programming and statistical communication theory. Proceedings of the National Academy of Sciences. 1957;43(8):749-751
  8. Widrow B, Winter R. Neural nets for adaptive filtering and adaptive pattern recognition. Computer. 1988;21(3):25-39
  9. Widrow B, Hoff ME. Adaptive Switching Circuits. Stanford, CA: Stanford University; 1960
  10. Lee RJ. Generalization of learning in a machine. In: Preprints of Papers Presented at the 14th National Meeting of the Association for Computing Machinery (ACM ’59). New York, NY, USA: Association for Computing Machinery; 1959. pp. 1-4. DOI: 10.1145/612201.612227
  11. Saaty TL. An Eigenvalue Allocation Model for Prioritization and Planning. Pennsylvania, USA: University of Pennsylvania; 1972. pp. 28-31
  12. Saaty TL. A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology. 1977;15(3):234-281
  13. Turker M. Estimation of the Severity of Occupational Accidents in the Building Process with Pre-informed Artificial Learning Method. Ankara: Gazi University; 2021
