An Effective Method for Secure Data Delivery in IoT

Mnar Alnaghes; Nickolas Falkner; Hong Shen

doi:10.5772/intechopen.104663

Abstract

The Internet of Things (IoT) has become very popular recently due to its important features that contribute to many aspects of our lives such as health and transportation. It consists of a vast number of different projects such as sensors, tags, actuators, and mobile devices, which can communicate and collaborate without human interactions. These devices carry small memory and low-energy battery, which affects their performance and lead to many issues. In this work, we are going to focus on the efficiency and security issues. We will propose a secure and efficient routing protocol for data delivery in order to improve its performance. The proposed technique will be evaluated in an implemented platform with appropriate case study. The expected outcome of this study will be a reference design and its practical implementation to support efficiency and security in IoT.

Keywords

IoT
data delivery
routing protocols
security efficiency

Author Information

Show +

Mnar Alnaghes*
- University of Adelaide, Australia
Nickolas Falkner
- University of Adelaide, Australia
Hong Shen
- Faculty of Sciences, School of Computer Science, University of Adelaide, Engineering, and Technology, Australia

*Address all correspondence to: mnar.alnaghes@adelaide.edu.au

1. Introduction

Nowadays, many research efforts have been concentrated on the efficiency and security of IoT devices to raise the performance and the level of protection for IoT data and detect possible attacks. It is significant to understand the types of data delivery challenges in IoT. The challenges related to wireless sensor networks (WSN), cyber-physical systems (CPS), and machine-to-machine (M2M) continue to appear within the context of IoT since the basic components of IoT networks include WSNs, CPS, and M2M. One of the challenges is the difficulty in providing communications using infrastructure-based wireless systems because of the high cost of deploying and maintaining this infrastructure with the rapid growth of IoT users and devices [1]. Furthermore, the IoT system is mobile and dynamic; thus, its perimeters are not well-defined. It is also robustly heterogeneous concerning the devices, protocols, and communication medium.

The other concern is that IoT system is vulnerable to malicious cyber-attacks. One of these aggressive attacks is a distributed DoS (DDoS) attack, which intends to bring down a victim system by preventing legitimate devices of service from accessing it. DDoS attackers may also aim to gain unlimited access to the victim machines and cause more damage consequently. These attacks are made not to be significantly distinct from the usual behavior practiced by the system. One of the techniques that can be used to detect cyber-attacks is intrusion detection (ID). Yet, the advanced ID schemes utilizing machine learning techniques struggle to detect some of the cyber-attacks. These attacks are made not to be significantly distinct from the usual behavior practiced by the system. Therefore, there is a need for an anomaly-based IDS combined with artificial intelligence and machine learning due to its ability to classify and identify earlier hidden attacks. This kind of IDS will help in detecting multi-stage DDoS attacks. Current schemes in the development of ID investigate artificial intelligence and machine learning in academia and industry, such as artificial neural networks and fuzzy logic.

IoT advanced systems can achieve high performance with a human being’s supervision for defining how to perform their duties. They also can automatically detect unusual patterns of web traffic with malicious activities and learn the patterns by themselves over time. Previous studies in the wireless network security area focused on ID based on a single hidden Markov model (HMM) and multi-class system classifier (MCSC) [2, 3]. Here, we study the potential applicability of the hierarchical hidden Markov model (HHMM) for intrusion detection in IoT systems in which the problem space can be several magnitudes higher than in wireless networks. And, we propose a probabilistic hierarchical hidden Markov model that reduces the high state-space without compromising classification accuracy. The proposed scheme shows better outcomes for detecting the DoS and DDoS attack patterns compared to the state-of-the-artwork.

The main contributions of the work are:

We propose a PHHMM model that translates high-dimensional IoT data to a discrete set of reliable data to be securely delivered with the ability to detect DDoS attacks.
We propose a method that learns and efficiently analyzes large amounts of data for classifying DDoS patterns in IoT traffic.
We conducted a performance comparison of our PHHMM with the baseline HHMM, Neural Network, and Naive Bayes models on the benchmark dataset from the CICIDS2019 database that contains 11 types of DoS and DDoS attacks collected over real-time for validating the proposed model.

The rest of this paper is organized as follows. Section II investigates the state of the art of some IoT data delivery used methods and presents the related work, followed by the background in Section III. Then, the model details are demonstrated in section IV and section V. Next, we show and discuss the experiments and simulation results in sections VI and VII. Finally, section V ends the paper, outlining some suggestions for future work.

2. Related work

2.1 Routing protocols in IoT

To design an efficient protocol in IoT networks is a risky task due to their characteristics. The efficient routing protocol has to respond to the changes that may happen in the topology as same as the bandwidth constraint. Most of the proposed protocols are only sub-optimal. Forster et al. [4] discuss three popular machine learning techniques on the communication layers in the WSNs. These algorithms are used in distributed environments to solve different problems such as ad hoc routing. They are categorized into three groups; reinforcement learning, supervised, and unsupervised. The aim is to find out a convergent mapping function that helps in prophesying the output results for any new input. Routing in IoT environments, as mentioned earlier, is associated with protocols in wireless sensors and ad hoc networks. One existing routing protocol for IoT networks is IP6 overpower personal area networks (6LoWPAN), which are used to route the data among non-IP sensors through networks with high processing capabilities. Its topology consists of a set of reduced function sensors that are linked to full function sensors [5]. It helps to support low cost, different length addresses, low bandwidth, different topologies, energy consumption, and lengthy sleep time. This protocol supports the multi-hop data delivery and reduces transmission overhead by providing header compression enclosing IPv6 long headers in the IEEE802.15.4 small packets [6]. Many of the real-world machine learning algorithms use both supervised and unsupervised learning as hybrid learning or semi-supervised learning to take advantage of the strengths of these main categories and minimize their cons [7]. Another standard protocol in IoT is the Routing Protocol for Low-Power and Lossy Networks (RPL) [8], which is a distance-vector protocol based on IPV6 that can prop lots of data-link protocols. It builds a destination-oriented directed acyclic graph (DODAG). It has only one path from each node to the root, and all the communications will be through that root. All nodes advertise themselves as the root by broadcasting a DODAG information object (DIO), and then the DODAG is gradually built. For the cognitive networks, as an extension of RPL, which is the Cognitive RPL Protocol (CORPL) is designed. It uses the DODAG topology generation. Constrained Application Protocol (CoAP) [6] is another IoT protocol that produces a lightweight RESTful (HTTP) interface to reduce overhead and power consumption. The next protocol is the Message Queue Telemetry Transport (MQTT), which was introduced for providing embedded connectivity between the party of middlewares and applications and the party of networks and communications. It is a publish/subscribe design that includes three parts: publishers, which are the sensors that connect to the broker to send their data, subscribers, which are the sensory data or applications, and the broker, which sends the data to the subscribers after classifying them in topics. Secure MQTT (SMQTT) [6] is an extension of MQTT to enhance its security features. It is encryption-based, where each message is encrypted and delivered to multiple nodes, which is common in IoT applications. For supporting a large range of IoT applications, ZigBee smart energy [6] is used. It has a wide star topology, peer-to-peer topology, or cluster-tree network topology. It also allows implementations with low memory and processing power. In addition, the Advanced Message Queuing Protocol (AMQP) [9] is designed for the financial industry. It is a publish/subscribe design built over TCP, but the broker here is divided into two main components: exchange and queues. The exchange receives publisher messages and distributes them to queues based on pre-defined conditions. Moreover, the long-term evolution advanced protocol (LTE-A) [9] is used for IoT applications in wireless networks. LTE-A design has a core network (CN) to control mobile devices, a radio access network (RAN) to establish data planes and control the wireless connections, and mobile nodes.

2.2 Intrusion detection systems in IoT

Because of the lack of training datasets, the current IoT intrusion detection systems are incapable of detecting the latest DoS and DDoS attacks [10], such as Network Time Protocol (NTP) attack, Network BIOS (NetBIOS) attack, UDP lag (delay). The authors in [11] proposed a hidden Markov model for predicting and detecting multi-stage attacks. Their work is not applicable for IoT systems as it fails on the high dimension state space since the incoming network traffic in IoT will have largely hidden states. The approach developed by [12] has a high detection rate as it identified most of the occurred attacks. However, they did not consider DDoS attacks. Authors in [2] proposed an anomaly detection module that uses Long Short-term memory for detecting both known and unknown attacks with a low false-positive rate. Their work shows high recognition rates. In [3], researchers discussed the multi-stage attack and its prediction. They proposed a multi-stage Naive Bayes model that can predict each stage of the multi-stage attack scenarios. However, schemes in [2, 3] are not suitable for predicting multiple attack intents in heteroecious environments. Besides, the authors in [13] propose a Hierarchical Hidden Markov Model (HHMM), which is an extension of the hidden Markov model (HMM), as the method for activity recognition. They analyzed the accuracy rate of their model with the Naive Bayes and HMM schemes. The comparison showed that the HHMM has the highest accuracy rate among others. However, they did not take into consideration the nature of IoT systems. The authors in [14] described routing-specific attacks in the IoT systems and concentrated on identifying the malicious node’s location and neighborhood to inform the network administrator. In [15], the researchers proposed an ID scheme to detect flood attacks in IoT networks. Their proposed model identifies the attacks through the back-propagation neural network model. Table 1 summarizes the properties of the major types of existing ID schemes.

Paper	HHMM	HMM	DDoS detection	Up-to-date dataset	Applicable for IoT
[1]	No	Yes	Yes	No	No
[2]	No	Yes	No	N/A	No
[3]	No	Yes	No	Yes	No
[4]	No	Yes	Yes	Yes	No
[5]	Yes	Yes	Yes	Yes	No
[6]	No	No	Yes	N/A	Yes
[7]	No	No	Yes	N/A	Yes

Table 1.

Intrusion detection schemes.

3. Preliminaries

This section provides definitions for the used terms in this paper:

3.1 Distributed denial of service attacks (DDoS)

DDoS is one of the potential attacks in IoT where attackers coordinate the utility of many machines connected to the network to send an overwhelming amount of unwanted requests to a targeted server [16]. They try to disrupt the traffic of the server with a flood of unwanted requests. Besides, DDoS reaches effectiveness by using various compromised devices as the roots of attack traffic. The more hacked devices, the more damage is caused to the servers. Thus, the attacker examines remote machines for security gaps using some tools such as worms to find their vulnerabilities and inject them with the attack code. Then, these compromised machines become zombies, which the attacker uses to send malicious packets to the targeted victim. DDoS may yet cause a long-term memory consumption of the relaying nodes in IoT environments due to nodes’ restricted resources. There are various DDoS attack types used to degrade the performance or availability of targeted services on the Internet. Some of these attacks are Botnet attacks, Spoof-packet flood attacks, Multi-Vector Attacks, and Misused Application Attacks. Besides, there are various schemes used to defend against DDoS attacks, which are under three categories; policy-based schemes, application-based schemes, and machine learning–based schemes. The policy-based defense scheme is placed in the switch to define the traffic that is allowed to be forwarded and the other ones are defined as malicious. It requires analyzing collected data samples of the network to classify malicious traffic. Numerous policy algorithms use different measurements such as standard deviation or measure the chi-square statistic of the sample to classify the packets as malicious or legitimate. Secondly, the application-based schemes handle and control packets in the network by the user interface layer. Finally, the machine Learning-based defense schemes deploy machine learning algorithms to investigate and classify the traffic to detect the DDoS attack.

3.2 Hierarchical hidden Markov model (HHMM)

The Hierarchical hidden Markov model (HHMM) is a multi-level stochastic process derived from the Hidden Markov model (HMM) by making each of the hidden states a self-contained autonomous probabilistic model. It is a statistical framework for modeling a sequence of observations. Each observation is emitted from a hidden state within the system by recursive activation. The basic idea of HHMM is that the upper-level states produce sequence states called “abstract” states [17]. And, the lower-level states produce single observations called “concrete” states [17]. The observations are governed by each of the sub-states (sub-HMMs). The process of recursive activations ends when reaching a state that produces output symbols like an HMM [17].

For estimating HHMM parameters, we define the generalized forward (α) and backward (β) probabilities as follows:

αi=PO,qil|λPO|λ=∑αTi

where qil is the number of sub-states of an abstract state.

βj=POqjlλ

We also define the generalized horizontal (ξ) and vertical transitions (χ) as follows:

ξTqjlql−1=Pit=qilit+1=qjl|O,λχTqilql−1=Pit=qil−1it+1=qlOλ

The model is represented as λPHHMM = <Aql,Bql,πql>. And, the states of an HHMM are denoted by Ql=qil, where l∈1,2,.…L, i is the state indexing, L is the output state, and l is the hierarchy indexing. It performs the following computation:

A probability transition matrix (Aql=aijql) is generated as the conditional probability of future traffic state is independent of the past states given the present state:
aijql=Pqt+1l+1=Sj|qtl+1=Si,1⩽i,j⩽Naijql⩾0∑aijql=1
where aij is a horizontal transition probability from state i to state j and all are sub-states of ql.
N is hidden states.
An emission matrix (Bql=bjhql) for observation probabilities given the hidden traffic state is generated by:
bjhql=POht|qtl=Sj,1⩽i,j⩽N1⩽h⩽M
where bjh is observed probability in state j.
M is observable states.
An initial state distribution (πql) is generated by:

πql=πqlqil+1=Pqt=s1

4. Framework of the proposed model

In this section, we first explain the structure of the traditional HHMM model then we illustrate the framework of our proposed model.

4.1 Framework of the HHMM

Learning, decoding, and evaluating are the three principal HHMM objectives as described in [18]. Briefly, the techniques applied to achieve these objectives are as follows:

Learning: Baum-Welch algorithm is used to create the object of the learning machine.
Decoding: Viterbi algorithm is applied to define the most probable state path of hidden states that can be transitioned given an observation sequence (O) and the model parameters (λ).
Evaluation: The forward and backward algorithms are used to determine the probability of an observed sequence.

This probabilistic hierarchical hidden Markov model should overcome the problem of the heterogeneity of IoT data. However, it suffers from high computational costs as the data increases in an exponential manner due to its used algorithms. Applying this scheme to IoT data undeviatingly will contribute to a problem of high state space. We, therefore, need to find a way to reduce the high state space without compromising the classification quality.

4.2 Framework of PHHMM

Our proposed model uses clustering and dimension reduction techniques to partition the massive incoming network traffic to overcome the problem of largely hidden states, before applying HHMM for classification. It follows the framework described in Figure 1 and achieves the objectives [18] and techniques are applied as follows:

Estimate the model parameters: Find the most probable parameter λ∗ of the model by applying the Baum-Welch algorithm [19]: given the PHHMM structure and one or more observation sequences λ=argmaxλ∗POtλ.
Learning: Calculate the probability of a sequence: Derive the maximum likelihood estimate of the parameters of PHHMM given the set of output sequences by applying the Baum-Welch algorithm: given a PHHMM and its parameters, determine the likelihood POλ of a sequence O to be generated by the model.
Decoding: Calculate the most probable state sequence: Find the state sequences that best explain the observations by applying the Viterbi algorithm: given a PHHMM, its parameters, and an observation sequence, determine the single state sequence that is most likely to generate the observation sequence.
Evaluation: Detect DDoS attacks: Evaluate the probability of the observed sequence and solve the detection problem to compute the probability of alert observations by applying the DDoS detection algorithm: given the probability of an observed attack alert sequence (O), detect DDoS attacks and predict future trails.

In our PHHMM model, applying dimension reduction techniques is a challenging step due to the lack of a standard approach for reducing the dimensionality of the observed IoT network traffic. It requires identifying the principal components and linear combinations of variables that describe the highest contrast in the massive data without compromising this data. Determining the principal components given a covariance matrix is computationally expensive as it claims the eigenvalue decomposition that requires the calculation of the covariance matrix. To overcome this challenge, we present an approach that avoids the direct computation of the covariance matrices but delivers the efficient subspace dimension. Our model applies the singular value decomposition (SVD) for calculating PCA to circumvent this expensive operation. The participating nodes in the algorithm use the PCA with the SVD learning mechanism to estimate principal components of the data traffic.

This work contributes to improving and resolving the common flaws in the application of HHMM in massive data by reducing the data dimensions based on traffic from both of the two streams being compared instead of depending only on some training data of normal traffic. Using only the most significant principal components, we could avoid the computation of the entire subspace. We can estimate a reduced number of principal components that are sufficiently effective in detecting malicious traffic. Our model allows the subsequent use of only the number of dimensions necessary at any given time.

5. Proposed model

This section explains the methods and techniques that we apply to the tasks in the tasks of the proposed model in Figure 1.

5.1 Data pre-processing

Collecting a large amount of attack traffic and normal traffic in a large real-time network is time and money-consuming. It needs significant resources, a diversity of normal IoT traffic, and a diversity of attack traffic. Instead, there are publicly available network traffic datasets, which can be used for this task. We analyze some of the available datasets based on the following:

Real-time network traffic
The most advanced DoS and DDoS attacks

The CICIDS2019 includes inbound and outbound traffic of the most advanced DoS and DDoS attacks. It contains lots of network flow-related characteristics and different types of current DoS and DDoS IoT attacks traffic collected over real-time networks.

5.2 Feature selection

The CICISD dataset has a huge number of attributes, selecting a subset of them is necessary for eliminating redundant and irrelevant ones. This helps in improving detection accuracy for DDoS attack detection. To better train our model for detecting the attack patterns, we have selected packet features that indicate DDoS attacks, which are useful for classification to distinguish between normal IoT traffic and DDoS IoT traffic:

Destination IP: According to the research in [20], the IoT device communicates with several numbers of expected destinations that rarely get modified over time. Thus, if the device communicates with many separate destinations in a short time-stamp, it is considered an attack.
Packet size: In [21], the authors declared that the size of a normal IoT packet size differs from 42 to 1434 bytes, while the size of a DDoS packet is smaller than 100 bytes. Thus, a sudden jump in the traffic with a fixed packet size of around 100 bytes represents a DDoS attack.
Packet Size Variance: DDoS attack packets share the same size frequently whereas IoT packets’ size differs from one to another.
Packet Time Interval: The time interval between DDoS packets is close to zero while IoT packets travel in a regular time interval.

5.3 Data clustering

The k-means clustering is a method of vector quantization that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean based on a similarity metric [20].

Let S=f1,f2,..…fn and K=C1,C2,.…Cn then Ci≠ϕ, Ci∩Cj=ϕ, and ∪i=1KCi=S, where i,j=1,2,…K, and i≠j.

The clustering method works as follows:

Input: N features f1f2…fn
Initialization: Finding cluster centers μ1,…μk.

Randomly initialize K:

Select K features as the primary cluster centers.
Reiteration:
Allocate each feature fn to its closest cluster center Ck=n:k=minkfn−μk2.
Determine the new center μk of the cluster Ck:μk=1∣Ck∣∑fn.
Repeat until the cluster centers do not move anymore.

When deploying k-mean to cluster IoT traffic, we reduce the dimension data to quantize the data into single-dimensional data. We classify the traffic flow depending on similar characteristics and identify clusters of homogeneous traffic flows and define their borders. It needs to have high intra-cluster homogeneity and inter-cluster heterogeneity.

For determining the optimal (fitting) value of K, we use the elbow algorithm repeatedly applying different values of K and plotting their heterogeneity. When the curve begins to flatten, it reaches the optimal value of K.

5.4 Dimensionality reduction

After clustering the data, initially, we have n states S1S2…Sn, applying the dimension reduction technique results in a new set of m states s1s2…sm where m<n, si=fiS1S2…Sn and fi represents a mapping function. Thus, its idea is to transform the massive data from a high-dimensional space, like IoT data, into a k-dimensional sub-space by partitioning the data-space into fully connected states. The low-dimensional form holds the top eigenvector v which has the meaningful features of the real data ideally close to its natural dimension [22]. The new set of features is extracted by some functional mapping. In this model, we considered the principal component analysis (PCA) and calculated it using the singular value decomposition (SVD) as the PCA presents a structure for reducing data dimensionality by outlining the maximum variation in the data.

To achieve dimension reduction by applying PCA, it requires placing the eigenvalues from the highest to lowest by their value. This ordering stores the elements in order of weight to the variance of the initial data matrix. This will allow us to drop the less important elements. Thus, we keep most of the information and lose a little noise. We can reduce the dimension of the original data. For instance, for any data of d dimensions, we only take the first r eigenvectors:

∑αr∑αd=α1+α2+.…+αrα1+α2+.…+αdE1

=α1,α2,.…,αrE2

Definition 1 [23]: For any matrix Y=y1,y2,…,yn of the size K×d can be re-written as Y=USVΤ where, U is an orthonormal matrix of size K×r, S is a diagonal matrix of size r×r, V is a matrix of eigenvectors of size r×d (a column is an eigenvector) (see Figure 2).

Figure 2.
Singular value decomposition of Y [23].

Assume that the data matrix Y is centered, i.e., the column means have been subtracted to be equal to zero. The covariance matrix (C) is calculated by:

C=XΤXK−1E3

Because the covariance matrix is symmetric, it can be diagonalized by:

C=VLVΤE4

where V is an eigenvectors matrix and L is a diagonal matrix with eigenvalues λi. The eigenvectors are called principal axes of the data, and, the data projections on the principal axes are called principal components [24]. After obtaining the singular value decomposition, C is defined by:

C=VSUΤUSVΤK−1E5

=VS2K−1VΤE6

As the result, the eigenvectors of C are the same as the matrix V (the right singular vectors of Y) and the eigenvalues of C can be defined from the singular values λi.

λi=si2K−1λiE7

The principal components are defined by:

YV=USVΤV=USE8

In short, the PCA is calculated as follows:

Obtain the maximum covariance in the data.
Keep meaningful information only to reduce data size.
Simplify the representation of the data:
The variance of the data is maximized.
Analyze the construction of the features and observations.

Based on Oja’s algorithm for stochastic PCA optimization [25], the primary concept of our algorithm is to implement stochastic m updates by uniformly sampling the columns yi at random, and reduce the variance of these updates.

Xt′=Xt−1+ηyityitΤXt−1E9

Xt=1‖Xt′‖XtE10

We use the variance-reduced stochastic schemes for convex optimization [23] to reduce the stochastic variance. Let B=1nXXΤ; then the updates in each iteration can be rewritten of our algorithm as

X′=I+ηBXt−1+ηyityitΤ−BXt−1−X¯f−1E11

Xt=1‖Xt′‖XtE12

The algorithm is burst into periods f = 1, 2, 3,. .., wherein all period we do a single exact power iteration by computing U¯. The steps to solve the problem are explained by the pseudo-code in Algorithm 1.

Algorithm 1: Dimensionality Reduction Algorithm.

Input Matrix Y=y1…yn.

Output Matrix X¯f.

1: Initialize Orthonormal Matrix X¯k×d.

2: For f=1,2,3…K Do

3: U¯=1n∑yiyiΤX¯f−1

4: X¯0=X¯f−1

5: For t=1,2,3,.…m Do

6: Bt−1=VUΤ, where

USVΤ is an decomposition of Xt−1ΤX¯f−1

7: Set it∈1,2,3,…n uniformaly at random

8: Xt′=Xt−1+ηyityitΤXt−1−yitΤX¯f−1Bt−1+U¯Bt−1

9: Xt=Xt′X′ΤtXt′1/2

This is to ensure that Wt has orthonormal columns

10: end for

11: X¯f=Xm

12: end for

5.5 Hierarchical hidden Markov classification

Similar to HHMM [17], the PHHMM model uses the Baum-Welch algorithm to calculate the likelihood-maximizing parameters of the model given the observed data. It comprises four phases: the initial phase where the λ is randomly assumed if there is no prior knowledge, the forward phase where the forward variable is calculated recursively, the backward phase where the backward variable is calculated, and the update phase to update the parameters. Then, the model uses the Viterbi algorithm to find the most likely sequence of the hidden states given the observed data and the parameters. Finally, it uses the DDoS detection algorithm to detect the multistage attack based on the observed alert sequence, where, the standard HHMM focuses on a single category with a limited amount of features thus it is difficult to detect attack traffic that appears to be normal traffic.

The traditional HHMM constitutes multi-single states that are considered as self-contained probabilistic models [18]. However, due to the heterogeneity of IoT traffic, we design each state to have multiple separate lower HMM layers and one upper HMM layer, each lower state constitutes three levels:

Learning: The observations of the first level train the L1LHMMi by the Baum-Welch algorithm to determine model parameters.
Decoding: The observations of the second level trains the L2LHMMi by Viterbi algorithm to find the most likely sequence of hidden states using model parameters by the Viterbi algorithm.
Evaluation: The observations of the third level trains the L3LHMMi and find the DDoS attacks sequence using the most probable sequence.

The model has one upper HMM state for predicting DDoS attacks that use the attack sequence from the lower states to learn new patterns of DDoS attacks by the DDoS detection algorithm. Thus, we can detect the multistage DDoS attacks in this extended mode, unlike the standard HHMM.

Model Training (Baum-Welch algorithm) Baum-Welch algorithm [19] is a recursive Expectation–Maximization method for estimating un-observed hidden parameters in an HHMM model. This algorithm facilitates the complex challenges of analytically applying maximum likelihood estimation. It trains the HHMMs to find the optimal λ. Starting with initialized values, the algorithm iteratively adjusts the parameters based on a set of observed feature vectors.

By this algorithm, for HHMM models (λ1ql,λ2ql,…,λnql) and a given sequence of observations Oql=O1qlO2ql…Otql, we choose λql=AqlBqlπql such that POλiql,i=1n is locally maximized.

Algorithm 2: The Baum–Welch algorithm.

Input Observation sequence Oql.

Output Re-estimated model parameters:

• The state transition matrix a′ijql, and.

• The Observation likelihood sequence b′jhql.

1: Initialize Observations (M), States (qi), Threshold (Th).

2: Estimate aijql, bjhql using initialization techniques.

3: Calculate the expected probability POλ.

4: reiterate.

5: Calculate forward variable αqjl:

αt+1lqjl=bjql−1Ot+1 ∑αtqjl+1αijqil

6: Calculate backward variable βqil:

βtlqil=∑bjql−1Ot+1 αijqil−1 βt+1dqjl

7: Calculate downward and upward variable εqilqjl:

ε=αtqilaijql−1βt+1qjlPOλ

8: Estimate the state probability γhqil and γfqil:

γhtqil=αtqilβtqjlPOλγftqil=αtqilβtqilPOλ

9: Compute the optimal state sequence Xij:

Xij=αtliaijlβt+1ljbjlOt+1∑∑αtliaijlβt+1ljbjlOt+1

10: Estimate a′ijql and b′jhql:

a′ijql=∑εqil+1qjl+1∑γhtqil+1b′jhql=∑γfqil+∑γhqil∑γfqil+∑γhqil

11: Calculate POλ through the estimated parameters:

ε=POλ′ql−POλqlPOλql=POλ′qlaijql=a′ijqlbjhql=b′jhql

12: until ε<Th.

13: return a′ijql,b′jhql.

Model Decoding (Viterbi algorithm) Viterbi decoding algorithm [26] predicts the hidden traffic states. This algorithm only uses state-optimized joint likelihood for observation data and the underlying Markovian state sequence as the objective function for estimation. Opposed to the BW algorithm, it does not update all likely paths for all states in the HHMM.

Algorithm 3: Viterbi algorithm [26].

Input Re-estimated model parameters a′ijql, b′jhql from BA algorithm,

Output Re-estimated state transition matrix a′ijql, and Alert observation likelihood sequence b′jhql.

1: Initialize Observations (M), States (qi), Threshold (Th), aij, bjvk.

2: Obtain the model (λ) through expected probability POλ.

3: reiterate.

4: Split Oql into N states through Viterbi decoding.

5: a′ijql=no.of states transitions fromitojtotalno.of states transition fromi

6: b′jqlvk=no.ofoccurrencesofobservationkinstatejtotalno.ofobservationsinstatej

where, bjlvk=POtql=vkqtql=Sj.

7: Normalize row sums of a′ijql and b′j(qlvk) to unity so that all elements ∈ [0,1].

8: Estimate λ′l from Oql, a′ijql and b′jqlvk.

9: λd=λ′ql, aijql=a′ijql, bjvkql=b′jqlvk.

10: until ∣λ′l−λl∣>Th.

11: return a′ijl,b′jvkl.

Model Detection algorithm This algorithm uses prior knowledge to learn about the previous attack behavior and track the attack alerts. It gets the likelihood probability of the observation sequence Oiqd then predicts the DDoS attack behavior based on the occurrence of the attack observation sequence in the previous algorithm.

Algorithm 4: Detection algorithm.

Input Alert observation sequence Oiql, no. of iterations G, λil=AqlBqlπql.

Output Attack alerts observation sequence O′iql.

1: Initialize α0q1lj=πjq1lbjq1lO0.

2: Calculate the probability of O1ql:

αjql1=πjqlbjqlO1

3: Calculate the probability of observation sequence (Oql):

αt+1qlj=∣∑αtqliaijql∣bjqlOt+1

4: Calculate the likelihood sequence of the observable sequence (O) obtained by the model:

POλqil=∑αtqilj

5: If (logPOλqil<−Th).

alerti=alerti+1

6: until G is reached.

7: return alert.

6. Experimental setup

6.1 Datasets

In this section, the performance of the PHHMM based anomaly detection approach was tested on traffic combining DDoS attack data with normal data from the prepared dataset. We place the prepared dataset into our PHHMM model to identify DDoS attack intentions and predict the possible attacks. The performance of implementing our proposed model is obtained through MATLAB R2020b simulations. To remove duplicate alerts, we wrote a script for extracting necessary fields such as IP Addresses, Alert ID, Destination Port, Source Port, and timestamp from Snort IDS alerts.

6.2 Evaluation metrics

We analyze and evaluate the performance on the common metrics for IDS performance evaluation; Accuracy (the rate of true results including true negatives and true positives), Precision (positive predictive value), Sensitivity (true positive rate), Specificity (false positive rate), and False Negative Rate (error rate) [27], all in an average sense (see Table 2).

		Predicted class
		Attack	Non-attack
Actual class	Attack	TP	FN
Actual class	Non-attack	FP	TN

Table 2.

Two-Class Case Confusion Matrix.

After generating the likely state sequences, we compare them to the known state sequences to define true positive (TP), false positive (FP), true negative (TN), and false-negative (FN) parameters [27]. The accuracy (ACC) is obtained by the following equation:

ACC=TP+TNTP+TN+FP+FNE13

The precision (PR), the fraction of the total number of positive cases that are correctly identified as attacks to the total number of attacks, is obtained by the following equation:

PR=TPTP+FPE14

Sensitivity (SN) or the true positive rate, the fraction of the total number of classified true positive that are accurately identified as attacks to the total number of positive cases, is calculated by the following equation:

SN=TPTP+FNE15

We use F_measure to evaluate the model’s overall accuracy considering both precision and sensitivity. Having a good F-measure value indicates that the model has low false positives and false negatives, which means that it correctly identifies attacks. It is calculated by the following equation:

Fmeasure=2×PR×SNPR+SNE16

The following equation is used to identify the error rate (ER) for false negative predictions:

ER=FP+FNTP+TN+FP+FNE17

7. Evaluation results

Compared to the original system, our model constructs an equivalent system with a minimal number of constraints over real-valued variables consisting of bounds on variations. This helps in reducing the high state space and improving the classification accuracy and time complexity as well.

7.1 Classification accuracy

The above results show that our proposed model obtains satisfactory results with regard to attack detection rate. The proposed model has 98.9% accuracy and 97.9% precision. Table 3 reviews the results of the performance of our model compared to the neural network (NN), Naive Bayes (NB), and HHMM classification algorithms.

Models (Training 80%/ Testing 20%)	ACC	PR	SN	FMeasure	ER
Neural network	0.92	0.90	0.94	0.92	0.05
Naive Bayes	0.45	0.68	0.56	0.61	0.06
HHMM	0.975	0.92	0.979	0.94	0.03
PHHMM	0.989	0.979	0.99	0.985	0.02

Table 3.

Experiment results summary I.

We perform the tests using different window sizes to understand their influence on the detection. It shows that increasing the size of the window results in better accuracy.

Figure 3 shows the ROC curves for the performance of our proposed model, compared to the HHMM, NN, and NB models. ROC curves help identify the balance between the true-positive rate and the false-positive rate for all possible thresholds. It illustrates the model’s strength to differentiate between attack and non-attack classes (see Figure 4).

Figure 3.
Roc curves of models performance.

Figure 4.
Comparison of average models performance.

7.2 Efficiency

Computation time is not associated instantly with classification; however, it describes the training time taken by the model. Table 4 shows that our model has a lower computation time compared to the HHMM. In [28], The time complexity of calculating the probability of a sequence and estimating the HHMM parameters as the model depends on the length of the observation equals OST3, where S is the number of states, and, T is the granted transactions number at each level. Meanwhile, the PHHMM time complexity equals OSlogS based on our calculations. Figure 5 shows the time complexity of HHMM vs. PHHMM.

Models	Computation time (sec)
HHMM	8.3
PHHMM	5.6

Table 4.

Comparison of computation time.

Figure 5.
Time complexity of HHMM vs. PHHMM algorithms.

8. Conclusion

In this work, we propose a probabilistic hierarchical hidden Markov model (PHHMM) applied for IoT intrusion detection which is more efficient than the existing HHMM without compromising classification accuracy. The main idea of our model is to reduce the huge problem state space of IoT traffic through dimensionality reduction by PCA and SVD. The proposed model is tested on the CICISD2019 dataset to detect and predict DDoS attacks. We evaluated our model on major performance metrics including Accuracy, Precision, Sensitivity, and False Negative Rate, Specificity and shows that our scheme has better detection accuracy and low error rates compared to Naive Bayes and neural network classification algorithms. It shows that PHHMM achieves a comparable accuracy as HHMM, better than NB and NN, and better efficiency than HHMM.

Acknowledgments

We would like to thank the Ministry of Higher Education in Saudi Arabia and the University of Adelaide for partially supporting this research.

References

1. Zhang J, Hu P, Xie F, Long J, He A. An energy efficient and reliable In-network data aggregation scheme for WSN. IEEE. 2018
2. Chadza T, Kyriakopoulos K, Lambotharan S. Analysis of hidden Markov model learning algorithms for the detection and prediction of multi-stage network attacks. FGCS. 2020
3. Ingale S, Paraye M, Ambawade D. Enhancing multi-step attack prediction using hidden Markov model and naive Bayes. ICESC. 2020
4. Forster A, Murphy AL. Machine Learning across the WSN Layers in Emerging Communications for Wireless Sensor Networks. Rijeka: In Tech; 2011
5. Chelloug S. Energy-efficient content-based routing in internet of things. Journal of Computer and Communications. 2015
6. Salman T, Jain R. Networking Protocols and Standards for Internet of Things. John Wiley and Sons, Inc.; 2017
7. Iqbal M, Bayoumi M. Secure end-to-end key establishment protocol for resource-constrained health care sensors in the context of IoT. In: Proceeding of the HPCS. 2016
8. Korte K, Sehgal A, Schönwälder J. A Study of the RPL Repair Process using ContikiRPL. In: Proceedings of the 6th International Conference on Autonomous Infrastructure, Management, and Security. 2012
9. Alam S, Siddiqui S. Security & Privacy Threats, attacks and countermeasures in internet of things. International Journal of Network Security and Its Applications. 2019
10. Beslin Pajila PJ, Golden Julie E. Detection of DDoS Attack Using SDN in IoT: A Survey. Springer; 2020
11. Brogi G, Bernardino E. Hidden Markov models for advanced persistent threats. IJSN. 2019
12. J. Koo, and S. Cho,“Effective intrusion type identification with edit distance for hmm-based anomaly detection system”, in Pattern Recognition and Machine Intelligence. Springer, 2005.
13. Suratkar S, Kazi F, Gaikwad R, Shete A, Kabra R, Khirsagar S. Multi hidden Markov models for improved anomaly detection using system call analysis. IBSSC. 2019
14. Bhosale S, Sonavane S. A real-time intrusion detection system for wormhole attack in the RPL based internet of things. Procedia Manufacturing. 2019
15. Pham T, Yeo C, Yanai N, Fujiwara T. Detecting flooding attack and accommodating burst traffic in delay-tolerant networks. IEEE Transactions on Vehicular Technology. 2018
16. Nazih W, Hifny Y, Elkilani WS, Dhahri H, Abdelkader T. Countering DDoS attacks in SIP based VoIP networks using recurrent neural networks. Sensors. 2020;20
17. Kong W, Dong D, Hill D. A hierarchical hidden Markov model framework for home appliance modeling. IEEE Transactions on Smart Grid. 2016
18. Fine S, Singer Y, Tishby N. The hierarchical hidden Markov model: Analysis and applications. Machine Learning. 1998;32
19. Chadzaab T, Kyriakopoulosa K, Lambotharan S. Analysis of hidden Markov model learning algorithms for the detection and prediction of multi-stage network attacks. Future Generation Computer System. 2020
20. Kumar S, Raza Z. A K-means clustering based message forwarding model for the internet of things (IoT). Confluence. 2018
21. Otoum Y, Liu D, Nayak A. DL-IDS: A deep learning-based intrusion detection framework for securing IoT. Transactions on Emerging Telecommunications Technologies. 2019
22. Pal S, Bandyopadhyay S, Biswas S. Pattern recognition and machine intelligence. PReMI. 2005
23. Tenenbaum J, Silva V, Langford J. A global geometric framework for nonlinear dimensionality reduction. Science. 2000
24. Smith L. A tutorial on Principal Components Analysis. 2002.
25. Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction. NIPS. 2013
26. Bongiovanni W, Guelfi A, Pontes E, Silva A, Zhou F, Kofuji S. Viterbi algorithm for detecting DDoS attacks. LCN. 2015
27. Lawal M, Shaikh R, Hassan S. An anomaly mitigation framework for IoT using fog computing. Electronics. 2020
28. Chunfu J, Feng Y. An intrusion detection method based on hierarchical hidden Markov models. Wuhan University Journal of Natural Sciences. 2007

[1] 1. Zhang J, Hu P, Xie F, Long J, He A. An energy efficient and reliable In-network data aggregation scheme for WSN. IEEE. 2018

[2] 2. Chadza T, Kyriakopoulos K, Lambotharan S. Analysis of hidden Markov model learning algorithms for the detection and prediction of multi-stage network attacks. FGCS. 2020

[3] 3. Ingale S, Paraye M, Ambawade D. Enhancing multi-step attack prediction using hidden Markov model and naive Bayes. ICESC. 2020

[4] 4. Forster A, Murphy AL. Machine Learning across the WSN Layers in Emerging Communications for Wireless Sensor Networks. Rijeka: In Tech; 2011

[5] 5. Chelloug S. Energy-efficient content-based routing in internet of things. Journal of Computer and Communications. 2015

[6] 6. Salman T, Jain R. Networking Protocols and Standards for Internet of Things. John Wiley and Sons, Inc.; 2017

[7] 7. Iqbal M, Bayoumi M. Secure end-to-end key establishment protocol for resource-constrained health care sensors in the context of IoT. In: Proceeding of the HPCS. 2016

[8] 8. Korte K, Sehgal A, Schönwälder J. A Study of the RPL Repair Process using ContikiRPL. In: Proceedings of the 6th International Conference on Autonomous Infrastructure, Management, and Security. 2012

[9] 9. Alam S, Siddiqui S. Security & Privacy Threats, attacks and countermeasures in internet of things. International Journal of Network Security and Its Applications. 2019

[10] 10. Beslin Pajila PJ, Golden Julie E. Detection of DDoS Attack Using SDN in IoT: A Survey. Springer; 2020

[11] 11. Brogi G, Bernardino E. Hidden Markov models for advanced persistent threats. IJSN. 2019

[12] 12. J. Koo, and S. Cho,“Effective intrusion type identification with edit distance for hmm-based anomaly detection system”, in Pattern Recognition and Machine Intelligence. Springer, 2005.

[13] 13. Suratkar S, Kazi F, Gaikwad R, Shete A, Kabra R, Khirsagar S. Multi hidden Markov models for improved anomaly detection using system call analysis. IBSSC. 2019

[14] 14. Bhosale S, Sonavane S. A real-time intrusion detection system for wormhole attack in the RPL based internet of things. Procedia Manufacturing. 2019

[15] 15. Pham T, Yeo C, Yanai N, Fujiwara T. Detecting flooding attack and accommodating burst traffic in delay-tolerant networks. IEEE Transactions on Vehicular Technology. 2018

[16] 16. Nazih W, Hifny Y, Elkilani WS, Dhahri H, Abdelkader T. Countering DDoS attacks in SIP based VoIP networks using recurrent neural networks. Sensors. 2020;20

[17] 17. Kong W, Dong D, Hill D. A hierarchical hidden Markov model framework for home appliance modeling. IEEE Transactions on Smart Grid. 2016

[18] 18. Fine S, Singer Y, Tishby N. The hierarchical hidden Markov model: Analysis and applications. Machine Learning. 1998;32

[19] 19. Chadzaab T, Kyriakopoulosa K, Lambotharan S. Analysis of hidden Markov model learning algorithms for the detection and prediction of multi-stage network attacks. Future Generation Computer System. 2020

[20] 20. Kumar S, Raza Z. A K-means clustering based message forwarding model for the internet of things (IoT). Confluence. 2018

[21] 21. Otoum Y, Liu D, Nayak A. DL-IDS: A deep learning-based intrusion detection framework for securing IoT. Transactions on Emerging Telecommunications Technologies. 2019

[22] 22. Pal S, Bandyopadhyay S, Biswas S. Pattern recognition and machine intelligence. PReMI. 2005

[23] 23. Tenenbaum J, Silva V, Langford J. A global geometric framework for nonlinear dimensionality reduction. Science. 2000

[24] 24. Smith L. A tutorial on Principal Components Analysis. 2002.

[25] 25. Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction. NIPS. 2013

[26] 26. Bongiovanni W, Guelfi A, Pontes E, Silva A, Zhou F, Kofuji S. Viterbi algorithm for detecting DDoS attacks. LCN. 2015

[27] 27. Lawal M, Shaikh R, Hassan S. An anomaly mitigation framework for IoT using fog computing. Electronics. 2020

[28] 28. Chunfu J, Feng Y. An intrusion detection method based on hierarchical hidden Markov models. Wuhan University Journal of Natural Sciences. 2007

Paper	HHMM	HMM	DDoS detection	Up-to-date dataset	Applicable for IoT
[1]	No	Yes	Yes	No	No
[2]	No	Yes	No	N/A	No
[3]	No	Yes	No	Yes	No
[4]	No	Yes	Yes	Yes	No
[5]	Yes	Yes	Yes	Yes	No
[6]	No	No	Yes	N/A	Yes
[7]	No	No	Yes	N/A	Yes

Paper	HHMM	HMM	DDoS detection	Up-to-date dataset	Applicable for IoT
[1]	No	Yes	Yes	No	No
[2]	No	Yes	No	N/A	No
[3]	No	Yes	No	Yes	No
[4]	No	Yes	Yes	Yes	No
[5]	Yes	Yes	Yes	Yes	No
[6]	No	No	Yes	N/A	Yes
[7]	No	No	Yes	N/A	Yes