
Machine Learning Approaches for Spectrum Management in Cognitive Radio Networks

By Ahmed Mohammed Mikaeil

Submitted: June 10th, 2017. Reviewed: January 29th, 2018. Published: March 28th, 2018.

DOI: 10.5772/intechopen.74599


Abstract

Cognitive radio (CR) improves the utilization of spectrum resources by introducing opportunistic usage of frequency bands that are not heavily occupied by a licensed spectrum user, or primary user (PU). In cognitive radio, detecting and estimating PU channel availability (unoccupied spectrum) are the key challenges that must be overcome to prevent interference with the licensed spectrum user and to improve spectrum resource utilization efficiency. This chapter focuses on developing new methods for detecting and estimating primary-user channel availability based on machine-learning (ML) techniques.

Keywords

  • machine learning
  • spectrum sensing
  • spectrum management
  • channel state estimation
  • cognitive radio

1. Introduction

In this chapter, we study the problem of detecting unoccupied primary-user spectrum (i.e., spectrum holes). We also introduce methods for estimating when the primary-user channel will be available, so that secondary spectrum users can adjust their transmission strategies accordingly.

The chapter is organized in two parts. The first part focuses on the problem of detecting the unoccupied spectrum left by the primary user. In this part, we introduce machine-learning (ML) techniques as the fusion algorithm in cooperative spectrum sensing based on energy detection [1, 2]. In particular, we train a machine-learning classifier (i.e., K-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), or decision tree (DT)) on a set containing energy test statistics of PU channel frames along with their corresponding decisions about the presence or absence of PU transmission in the channel. Then, we use the trained classifier to predict the decisions for newly unseen PU channel frames [3]. The second part focuses on estimating the near-future PU channel state. In the literature, many proposals have studied the problem of estimating the PU channel state in cognitive radio (CR) [4, 5, 6]. However, most of these studies predict the PU channel state in the frequency domain by converting the received digital signals using the fast Fourier transform (FFT), which increases system complexity due to the FFT computation. In the second part of the chapter, we introduce a new time-domain approach for PU channel state prediction based on time series prediction with machine-learning prediction models. In particular, a time series is used to capture the PU channel state detection sequence (PU channel "idle" or "occupied") in the time domain. Then, prediction models such as the hidden Markov model (HMM) and the Markov switching model (MSM) are used to predict the behavior of that time series [7].

2. Machine-learning fusion-based cooperative spectrum sensing

In this part, we first define the system model for energy detection-based spectrum sensing and present the method of calculating the thresholds for the energy detector under different fusion rules. We then formulate a machine-learning classification problem and present four machine-learning classifiers to solve it. Finally, we evaluate the performance of these classifiers with simulation experiments.

2.1. Energy detection-based cooperative spectrum sensing

Figure 1 shows a block diagram of the system model used for energy detection cooperative spectrum sensing based on a machine-learning fusion rule. In this model, we consider a cooperative CR network with $K$ cooperative nodes. Each node uses $N$ samples for energy detection, while $M$ frames are used for training the machine-learning (ML) classifier. The received signal of the $i$th frame at the $j$th cooperative node, $y_{ij}(n)$, $1 \le n \le N$, $1 \le i \le M$, $1 \le j \le K$, is given by

$$y_{ij}(n) = \begin{cases} w_{ij}(n), & H_0 \\ \gamma_{ij}\, s_{ij}(n) + w_{ij}(n), & H_1 \end{cases} \tag{1}$$

where $s_{ij}(n)$ is the PU signal, assumed to follow a Gaussian i.i.d. random process (zero mean, variance $\sigma_s^2$), and $w_{ij}(n)$ is the noise, also assumed to follow a Gaussian i.i.d. random process (zero mean, variance $\sigma_u^2$); $s_{ij}(n)$ and $w_{ij}(n)$ are independent. Because all $K$ nodes sense the same frame at a given time, the global decision about PU channel availability is made at the fusion center only. Thus, the energy statistic for the $i$th frame at the $j$th cooperative node, $Y_{ij}$, can be represented by the energy test statistic of the $i$th frame at the fusion center, which is given by

$$Y_i = \frac{1}{N} \sum_{n=1}^{N} \left| y_{ij}(n) \right|^2, \quad 1 \le i \le M \tag{2}$$

Figure 1.

Block diagram of machine-learning-based fusion rule spectrum sensing.

$Y_i$ is a random variable whose probability density function follows a chi-square distribution ($2N$ degrees of freedom for the complex-valued case and $N$ degrees of freedom for the real-valued case). If we assume that the channel remains unchanged during the observation interval and that a sufficient number of samples is observed ($N \ge 200$) [8], then we can approximate $Y_i$ using a Gaussian distribution as follows:

$$Y_i \sim \begin{cases} \mathcal{N}\!\left(\sigma_{ij}^2,\; 2\sigma_{ij}^4/N\right), & H_0 \\ \mathcal{N}\!\left(\sigma_{ij}^2(1+\gamma_{ij}),\; 2\sigma_{ij}^4(1+\gamma_{ij})^2/N\right), & H_1 \end{cases} \tag{3}$$

where $\sigma_{ij}^2$ is the variance of the noise samples $w_{ij}(n)$, and $\gamma_{ij}$ is the observed signal-to-noise ratio (SNR) of the $i$th frame sensed at the $j$th cooperative node. Assuming that the noise variance and the SNR at each node remain unchanged over all $M$ frames, $\gamma_{ij} = \gamma_j$ and $\sigma_{ij}^2 = \sigma_j^2$. For a chosen threshold $\lambda_j$ for each frame, the probability of false alarm $P_f$, as given in [9], can be written as

$$P_f(\lambda_j) = \Pr(Y_i > \lambda_j \mid H_0) = Q\!\left(\left(\frac{\lambda_j}{\sigma_j^2} - 1\right)\sqrt{\frac{N}{2}}\right) \tag{4}$$

and the probability of detection $P_d$ is given by

$$P_d(\lambda_j) = \Pr(Y_i > \lambda_j \mid H_1) = Q\!\left(\left(\frac{\lambda_j}{\sigma_j^2(1+\gamma_j)} - 1\right)\sqrt{\frac{N}{2}}\right) \tag{5}$$

where $Q(\cdot)$ is the complementary distribution function of the Gaussian distribution with zero mean and unit variance. To obtain the optimal threshold $\lambda$ for $K$ cooperative sensing nodes, data fusion rules are used. The calculation of the thresholds for single-user sensing and for the fusion rules is presented in subsections 2.1.1 and 2.1.2.

2.1.1. The detection threshold for single-user-based sensing

For single-user sensing, the number of cooperative nodes is one (i.e., $K = 1$, $\sigma_j^2 = \sigma_u^2$, $\gamma_j = \gamma_u$). From Eq. (4), for a given probability of false alarm $P_f$, the single-user threshold can be written as

$$\lambda_{\text{single}} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}(P_f) + 1\right) \sigma_u^2 \tag{6}$$

where $Q^{-1}(\cdot)$ is the inverse of the $Q(\cdot)$ function, and the probability of detection $P_d^{\text{single}}$ can be written as

$$P_d^{\text{single}} = Q\!\left(\left(\frac{\lambda_{\text{single}}}{\sigma_u^2 (1 + \gamma_u)} - 1\right)\sqrt{\frac{N}{2}}\right) \tag{7}$$
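To make the threshold computation concrete, here is a minimal Python sketch of Eqs. (6) and (7) (a sketch under our own naming; the chapter itself does not provide code). It uses scipy's `norm.isf` for $Q^{-1}(\cdot)$ and `norm.sf` for $Q(\cdot)$:

```python
import numpy as np
from scipy.stats import norm

def single_user_threshold(pf, noise_var, n_samples):
    """Eq. (6): lambda_single = (sqrt(2/N) * Q^-1(Pf) + 1) * sigma_u^2."""
    return (np.sqrt(2.0 / n_samples) * norm.isf(pf) + 1.0) * noise_var

def single_user_pd(lam, noise_var, snr, n_samples):
    """Eq. (7): Pd = Q((lambda / (sigma_u^2 (1 + gamma)) - 1) * sqrt(N/2))."""
    arg = (lam / (noise_var * (1.0 + snr)) - 1.0) * np.sqrt(n_samples / 2.0)
    return norm.sf(arg)  # norm.sf is the Gaussian tail probability, i.e., Q(x)

# Example: Pf = 0.1, unit noise variance, N = 1000 samples, SNR = -22 dB.
snr_lin = 10.0 ** (-22.0 / 10.0)
lam = single_user_threshold(0.1, 1.0, 1000)
print(lam, single_user_pd(lam, 1.0, snr_lin, 1000))
```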

2.1.2. The detection threshold for data fusion-based sensing

In a data-fusion spectrum sensing scheme, $K$ nodes cooperate in calculating the threshold used to make the global sensing decision. There are many fusion rules for calculating the global sensing decision threshold. They are divided into hard fusion rules, including the AND, OR, and majority rules, and soft fusion rules, including maximum ratio combining (MRC), equal gain combining (EGC), and square law selection (SLS).

2.1.2.1. AND fusion rule

The AND rule decides that the signal is present only if all users have detected it. For a system with $K$ cooperative nodes with the same false-alarm probability $P_f$ cooperating under the AND rule, the fusion center threshold can be expressed as

$$\lambda_{\text{AND}} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\!\left(P_f^{1/K}\right) + 1\right)\sigma_u^2 \tag{8}$$

and the detection probability $P_d^{\text{AND}}$ can be written as

$$P_d^{\text{AND}} = \left[ Q\!\left(\left(\frac{\lambda_{\text{AND}}}{\sigma_u^2 (1 + \gamma_u)} - 1\right)\sqrt{\frac{N}{2}}\right) \right]^K \tag{9}$$

2.1.2.2. OR fusion rule

The OR rule decides that a signal is present if any of the users detects it. The fusion center threshold for $K$ nodes cooperating under the OR rule can be expressed as

$$\lambda_{\text{OR}} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\!\left(1 - (1 - P_f)^{1/K}\right) + 1\right)\sigma_u^2 \tag{10}$$

and the detection probability $P_d^{\text{OR}}$ is

$$P_d^{\text{OR}} = 1 - \left[ 1 - Q\!\left(\left(\frac{\lambda_{\text{OR}}}{\sigma_u^2 (1 + \gamma_u)} - 1\right)\sqrt{\frac{N}{2}}\right) \right]^K \tag{11}$$
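Following the same pattern, here is a hedged sketch of the hard-fusion thresholds and detection probabilities of Eqs. (8)-(11), reusing the single-user helpers from the sketch above (the function names are ours):

```python
def and_rule_threshold(pf, noise_var, n_samples, k):
    """Eq. (8): AND rule -- per-node false-alarm target is Pf^(1/K)."""
    return single_user_threshold(pf ** (1.0 / k), noise_var, n_samples)

def and_rule_pd(lam, noise_var, snr, n_samples, k):
    """Eq. (9): all K nodes must detect, so Pd_AND = Pd^K."""
    return single_user_pd(lam, noise_var, snr, n_samples) ** k

def or_rule_threshold(pf, noise_var, n_samples, k):
    """Eq. (10): OR rule -- per-node false-alarm target is 1 - (1 - Pf)^(1/K)."""
    return single_user_threshold(1.0 - (1.0 - pf) ** (1.0 / k),
                                 noise_var, n_samples)

def or_rule_pd(lam, noise_var, snr, n_samples, k):
    """Eq. (11): at least one node detects, so Pd_OR = 1 - (1 - Pd)^K."""
    return 1.0 - (1.0 - single_user_pd(lam, noise_var, snr, n_samples)) ** k
```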

2.1.2.3. Maximum ratio combination (optimal MRC) fusion rule

In soft-combination fusion, $K$ cooperative nodes with noise variances $\{\sigma_{11}^2, \sigma_{22}^2, \ldots, \sigma_{MK}^2\}$ and instantaneous SNRs $\{\gamma_{11}, \gamma_{22}, \ldots, \gamma_{MK}\}$ send their $i$th-frame energy test statistics $Y_{ij} = \frac{1}{N}\sum_{n=1}^{N} |y_{ij}(n)|^2$, $1 \le j \le K$, to the fusion center. After receiving these energy statistics, the fusion center weighs them and adds them together as follows:

$$Y_{si} = \sum_{j=1}^{K} w_j Y_{ij}, \quad 1 \le i \le M \tag{12}$$

We assume that the SNRs and noise variances at the sensing nodes remain unchanged over all frames during the training process (i.e., $\gamma_{ij} = \gamma_j$, $\sigma_{ij}^2 = \sigma_j^2$). For soft optimal linear combination, we need to find the optimum weight vector $w_j$ that maximizes the detection probability. For additive white Gaussian noise (AWGN) channels, the fusion threshold for the MRC fusion rule is written as

$$\lambda_{\text{MRC}} = \sqrt{\frac{2}{N} \sum_{j=1}^{K} w_j^2 \sigma_j^4}\; Q^{-1}(P_f) + \sum_{j=1}^{K} w_j \sigma_j^2 \tag{13}$$

and the detection probability $P_d^{\text{MRC}}$ is given by

$$P_d^{\text{MRC}} = Q\!\left(\left(\frac{\lambda_{\text{MRC}}}{\sum_{j=1}^{K} w_j \sigma_j^2 (1 + \gamma_j)} - 1\right)\sqrt{\frac{N}{2}}\right) \tag{14}$$

where the weighting coefficient vector $w = [w_1, w_2, \ldots, w_K]$ can be obtained from

$$w = \operatorname{sign}\!\left(g^T w_0\right) \frac{w_0}{\|w_0\|}$$

where

$$w_0 = \frac{L_{H_1}^{-1/2} \left(L_{H_1}^{-1/2}\right)^T g}{\left\| L_{H_1}^{-1/2} \left(L_{H_1}^{-1/2}\right)^T g \right\|}$$

where

$$L_{H_1} = \frac{2}{N} \operatorname{diag}\!\left(\sigma_1^4 (1 + \gamma_1)^2, \ldots, \sigma_K^4 (1 + \gamma_K)^2\right)$$
$$g = \left[\sigma_1^2 \gamma_1,\; \sigma_2^2 \gamma_2,\; \sigma_3^2 \gamma_3,\; \sigma_4^2 \gamma_4,\; \ldots,\; \sigma_K^2 \gamma_K\right]^T$$

2.1.2.4. Equal gain combination (EGC) fusion rule

Equal-weight linear combining employs straightforward averaging of the received soft decision statistics: the received energies are equally weighted and then added together. The calculation of the threshold $\lambda_{\text{EGC}}$ and the detection probability $P_d^{\text{EGC}}$ follow Eqs. (13) and (14), respectively, with the weighting vector $w = [w_1, \ldots, w_K]$, where $w_1 = w_2 = \cdots = w_K = 1/K$ [10].

2.1.2.5. Square law selection (SLS) fusion rule

Here, the fusion center selects the node with the highest SNR, $\gamma_{\text{SLS}} = \max\{\gamma_1, \gamma_2, \ldots, \gamma_K\}$, and uses the noise variance $\sigma_{\text{SLS}}^2$ associated with that node. The fusion center threshold is then calculated as follows:

$$\lambda_{\text{SLS}} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\!\left(1 - (1 - P_f)^{1/K}\right) + 1\right)\sigma_{\text{SLS}}^2 \tag{15}$$

and the detection probability $P_d^{\text{SLS}}$ is

$$P_d^{\text{SLS}} = 1 - \left[ 1 - Q\!\left(\left(\frac{\lambda_{\text{SLS}}}{\sigma_{\text{SLS}}^2 (1 + \gamma_{\text{SLS}})} - 1\right)\sqrt{\frac{N}{2}}\right) \right]^K \tag{16}$$
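The soft rules translate just as directly into code. Below is a minimal sketch of EGC combining (equal weights $w_j = 1/K$) and of SLS node selection and thresholding per Eq. (15); the helper names and numpy usage are our assumptions, and `single_user_threshold` comes from the earlier sketch:

```python
import numpy as np

def egc_combine(node_energies):
    """EGC: equal weights w_j = 1/K applied to the per-node energies (Eq. 12)."""
    return np.mean(node_energies)

def sls_threshold(pf, snrs, noise_vars, n_samples):
    """Eq. (15): OR-style threshold using the highest-SNR node's noise variance."""
    k = len(snrs)
    j = int(np.argmax(snrs))               # node selected by SLS
    per_node_pf = 1.0 - (1.0 - pf) ** (1.0 / k)
    return single_user_threshold(per_node_pf, noise_vars[j], n_samples)
```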

2.2. Machine-learning classification problem formulation

The $i$th-frame energy test statistic ($Y_i$ for hard fusion or $Y_{si}$ for soft fusion) given in Eq. (2) or (12) is compared with the sensing threshold to produce the decision $d_i$ associated with the $i$th frame in the training data set as follows:

$$d_i = \begin{cases} 1, & Y_F \ge \lambda \\ -1, & Y_F < \lambda \end{cases}, \quad 1 \le i \le M \tag{17}$$

where $\lambda \in \{\lambda_{\text{single}}, \lambda_{\text{AND}}, \lambda_{\text{OR}}, \lambda_{\text{MRC}}, \lambda_{\text{EGC}}, \lambda_{\text{SLS}}\}$, $Y_F \in \{Y_i, Y_{si}\}$, $M$ is the number of frames in the training set, "$-1$" represents the absence of the primary user on the channel, and "$1$" represents the presence of primary-user transmission on the channel. The output of Eq. (17) gives a set of pairs $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, $d_i \in \{-1, 1\}$, representing frame energy test statistics and their corresponding decisions. If we want to find the decision (i.e., the class label) $d_x$ associated with a new frame energy test statistic $Y_x$, we can use one of the following machine-learning classifiers to solve this classification problem.
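As an illustration of how the training set of Eq. (17) can be generated, the following sketch simulates $M$ frames, computes the energy statistic of Eq. (2) for each, and labels it against a threshold $\lambda$. The frame simulation follows the signal model of Eq. (1), but the helper name and the 50% PU-activity assumption are ours:

```python
import numpy as np

def build_training_set(n_frames, n_samples, snr, noise_var, lam, rng):
    """Generate pairs (Y_i, d_i): frame energies labeled by threshold lambda."""
    energies = np.empty(n_frames)
    labels = np.empty(n_frames, dtype=int)
    for i in range(n_frames):
        pu_active = rng.random() < 0.5                 # assumed 50% PU activity
        y = rng.normal(0.0, np.sqrt(noise_var), n_samples)   # noise w(n)
        if pu_active:
            # PU signal s(n): Gaussian with variance gamma * sigma_u^2.
            y += rng.normal(0.0, np.sqrt(snr * noise_var), n_samples)
        energies[i] = np.mean(y ** 2)                  # Eq. (2) test statistic
        labels[i] = 1 if energies[i] >= lam else -1    # Eq. (17) decision
    return energies, labels

rng = np.random.default_rng(0)
Y_train, d_train = build_training_set(1000, 1000, snr_lin, 1.0, lam, rng)
```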

2.2.1. K-nearest neighbors (KNN) classifier

For the K-nearest neighbors classifier, the $K$ nearest points to $Y_x$ are used to predict the class label $d_x$ corresponding to $Y_x$ [11]. For $K = 1$, the Euclidean distance $dst$ between $Y_x$ and the training data points can be computed as

$$dst_i = \sqrt{(Y_x - Y_i)^2} = \left| Y_x - Y_i \right|, \quad i = 1, 2, \ldots, M \tag{18}$$

and the new $Y_x$ is classified with the label $d_x = d_{i_n}$, where $i_n$ is the index of the training point that achieves the minimum Euclidean distance $dst_i$ to $Y_x$.
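A minimal sketch of this 1-NN rule over the scalar energy statistics (our own implementation of Eq. (18)):

```python
import numpy as np

def knn_classify(y_new, Y_train, d_train):
    """1-NN (Eq. 18): label y_new with the class of the nearest Y_i."""
    distances = np.abs(y_new - Y_train)   # |Y_x - Y_i| for scalar statistics
    return d_train[int(np.argmin(distances))]
```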

2.2.2. Naïve Bayes classifier

Under the assumption that the classes $d_i = 1$ and $d_i = -1$ are independent, the prior probabilities for $d_i = 1$ and $d_i = -1$ given the training examples $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, can be calculated, and the class-conditional densities (likelihood probabilities) can also be estimated from the set $\{Y_1, Y_2, \ldots, Y_k\}$ in which the new $Y_x$ is expected to fall. The probability that the new $Y_x$ belongs to either the $d_i = 1$ or the $d_i = -1$ class is then calculated using the naïve Bayes assumption and Bayes' rule [12] as follows:

$$\text{class}(Y_x) = \arg\max_{d_i} \Pr(d_i) \prod_{j=1}^{k} \Pr(Y_j \mid d_i) \tag{19}$$

where the prior probabilities are given by

$$\Pr(d_i = 1) = \frac{\text{number of } Y_i \text{ with class label } 1}{\text{total number of class labels}}$$
$$\Pr(d_i = -1) = \frac{\text{number of } Y_i \text{ with class label } -1}{\text{total number of class labels}}$$

whereas the class-conditional densities (likelihood probabilities) can be estimated using a Gaussian density function:

$$\Pr(Y_j \mid d_i) = \frac{1}{\sigma_j \sqrt{2\pi}}\, e^{-\frac{(Y - \mu_j)^2}{2\sigma_j^2}}, \quad Y_1 < Y < Y_k, \; \sigma_j > 0$$

where $\mu_j$ and $\sigma_j^2$ are the mean and variance of the set $\{Y_1, Y_2, \ldots, Y_k\}$. Eq. (19) means that the naïve Bayes classifier labels the new $Y_x$ with the class label $d_i$ that achieves the highest posterior probability.
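A minimal Gaussian naïve Bayes sketch following Eq. (19), estimating each class's prior and Gaussian likelihood from the training pairs (our own implementation; the small constant guards against a zero standard deviation):

```python
import numpy as np
from scipy.stats import norm

def naive_bayes_classify(y_new, Y_train, d_train):
    """Eq. (19): pick the class maximizing prior * Gaussian likelihood."""
    best_class, best_posterior = None, -np.inf
    for c in (-1, 1):
        Yc = Y_train[d_train == c]
        prior = len(Yc) / len(Y_train)
        # Class-conditional density with the class's sample mean and std dev.
        likelihood = norm.pdf(y_new, loc=Yc.mean(), scale=Yc.std() + 1e-12)
        posterior = prior * likelihood
        if posterior > best_posterior:
            best_class, best_posterior = c, posterior
    return best_class
```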

2.2.3. Support vector machine (SVM) classifier

For a given training set of pairs $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, where $Y_i \in \mathbb{R}$ and $d_i \in \{+1, -1\}$, the minimum-norm weight $w$ and constant $b$ that maximize the margin between the positive and negative classes (i.e., $wY_i + b = \pm 1$) with respect to the hyperplane equation $wY_i + b = 0$ can be estimated with the support vector machine classifier by performing the following optimization [13]:

$$\min_{w,b} \frac{\|w\|^2}{2}, \quad \text{where } \|w\|^2 = w^T w \tag{20}$$

subject to $d_i(wY_i + b) \ge 1$, $i = 1, 2, \ldots, M$.

The solution of this quadratic optimization problem can be expressed using the Lagrangian function as

$$L(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{M} \alpha_i \left[ d_i (wY_i + b) - 1 \right], \quad \alpha_i \ge 0 \tag{21}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_M)$ are the Lagrange multipliers. If we set $\nabla L(w, b, \alpha) = 0$, we get $w = \sum_{i=1}^{M} \alpha_i d_i Y_i$ and $\sum_{i=1}^{M} \alpha_i d_i = 0$, and by substituting these into Eq. (21), the dual optimization problem that describes the hyperplane can be written as

$$\min_{\alpha} \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} d_i d_j Y_i Y_j \alpha_i \alpha_j - \sum_{i=1}^{M} \alpha_i, \quad \alpha_i \ge 0 \tag{22}$$

From expression (22), we can solve for $\alpha$ and compute $w$ using $w = \sum_{i=1}^{M} \alpha_i d_i Y_i$. Then, by choosing some $\alpha_i > 0$ from the vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_M)$ and calculating $b$ from $b = d_j - \sum_{i=1}^{M} \alpha_i d_i Y_i Y_j$, we classify a new instance $Y_x$ using the following classification function:

$$\text{class}(Y_x) = \operatorname{sign}\!\left( \sum_{i=1}^{M} \alpha_i d_i Y_i Y_x + b \right) \tag{23}$$

which means that the classification of a new $Y_x$ can be expressed as a dot product of $Y_x$ and the support vectors.
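In practice, the dual problem of Eq. (22) need not be solved by hand; a library SVM will do. A sketch using scikit-learn's `SVC` with a linear kernel (our choice of library, not one specified by the chapter), reusing `Y_train` and `d_train` from the earlier sketch:

```python
import numpy as np
from sklearn.svm import SVC

# Scalar energy statistics must be reshaped to an (n_samples, 1) feature matrix.
svm = SVC(kernel="linear")
svm.fit(Y_train.reshape(-1, 1), d_train)
d_x = svm.predict(np.array([[1.05]]))   # hypothetical new frame energy
```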

2.2.4. Decision tree (DT) classifier

For a training set of pairs $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, $d_i \in \{-1, 1\}$, the decision tree classifier builds a binary tree using an impurity- or node-error-based splitting rule to split the training set into separate subsets. It then applies the splitting rule recursively to each subset until the leaves become pure. After that, it minimizes the error in each leaf by taking the majority vote of the training examples in that leaf [14]. To classify a new example $Y_x$, the DT classifier selects the leaf where $Y_x$ falls and assigns it the class label that occurs most frequently in that leaf.
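A corresponding decision tree sketch using scikit-learn, evaluated with the accuracy, precision, and recall metrics reported in Tables 1-3 (the library choice and test-set generation are our assumptions, reusing `build_training_set`, `snr_lin`, `lam`, and `rng` from the earlier sketches):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

tree = DecisionTreeClassifier()
tree.fit(Y_train.reshape(-1, 1), d_train)

# Fresh frames for testing, mirroring the evaluation behind Tables 1-3.
Y_test, d_test = build_training_set(1000, 1000, snr_lin, 1.0, lam, rng)
d_hat = tree.predict(Y_test.reshape(-1, 1))
print(accuracy_score(d_test, d_hat),
      precision_score(d_test, d_hat),
      recall_score(d_test, d_hat))
```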

2.3. Performance discussion

Figure 2 shows the receiver operating characteristic (ROC) curves for single-user, soft, and hard fusion rules under an additive white Gaussian noise (AWGN) channel. To generate this figure, we assume a cognitive radio system with 7 cooperative nodes (i.e., $K = 7$) operating at SNR $\gamma_u = -22$ dB. The local node decisions are made after observing 1000 samples (i.e., $N = 1000$ energy detection samples). For the soft fusion rules, the node SNRs $\gamma_j$ are $\{-24.3, -21.8, -20.6, -21.6, -20.4, -22.2, -21.3\}$ dB and the noise variances $\sigma_j^2$ are $\{1, 1, 1, 1, 1, 1, 1\}$. The false-alarm probability $P_f$ is varied from 0 to 1 in steps of 0.025. The simulation results show that the soft EGC and optimal MRC fusion rules perform better than the other soft and hard fusion rules, even though the soft EGC fusion rule does not need any channel state information from the nodes.

Figure 2.

ROC curves for the soft and hard fusion rules under AWGN receiver noise, $\sigma_u^2 = 1$, $\gamma_u = -22$ dB, $K = 7$ users, and energy detection over $N = 1000$ samples.

Figure 3 shows the ROC curves depicting the performance of the SVM classifier in classifying 1000 new frames after training it on a set containing $M = 1000$ frames. The thresholds used for training the SVM classifier (i.e., the single-user threshold and the AND, OR, MRC, SLS, and EGC fusion rule thresholds) are obtained numerically for the cognitive system used to generate Figure 2; here, however, we set the false-alarm probability to $P_f = 0.1$.

Figure 3.

ROC curves showing the performance of the SVM classifier in predicting the decisions for 1000 new frames after training on a set containing 1000 frames, when the single-user, AND, OR, MRC, SLS, and EGC thresholds are used for the training process.

From Figure 3 and Table 1, we can notice that when the SVM classifier is trained with any one of the single-user, OR, MRC, SLS, or EGC thresholds, it detects 100% of the positive classes. We can also notice that training with the EGC threshold provides 90% precision in classifying the positive classes, with 10% harmful interference, whereas training the SVM with the AND threshold classifies the positive classes with 100% precision (and 97.8% recall). Table 1 shows the classification accuracy of the SVM classifier (i.e., the proportion of true classifications over all testing examples), the precision of classification (i.e., the proportion of true positive classes over all positive classifications), and the recall of classification (i.e., the effectiveness of the classifier in identifying positive classes).

Threshold   Single user (%)   AND rule (%)   OR rule (%)   MRC rule (%)   SLS rule (%)   EGC rule (%)
Accuracy    96.1              98.3           98.1          97.6           98.9           98.0
Precision   77.7              100            53.7          89.4           74.4           90.1
Recall      100               97.8           100           100            100            100

Table 1.

The accuracy, precision, and recall of the SVM classifier.

Figure 4 shows ROC curves comparing four machine-learning classifiers, K-nearest neighbor (KNN), support vector machine (SVM), naive Bayes, and decision tree, when used to classify 1000 frames after training on a set containing 1000 frames with the single-user threshold (note: the same system used to generate the simulation of Figure 3 is considered for computing the single-user threshold). We can notice from Figure 4 and Table 2 that the KNN and decision tree classifiers perform better than the naïve Bayes and SVM classifiers in terms of the accuracy of classifying the new frames.

Figure 4.

ROC curves showing a comparison of four machine-learning classifiers, KNN, SVM, naive Bayes, and decision tree, in classifying 1000 frames after training on a set of 1000 frames using the single-user scheme threshold.

Classifier      Accuracy (%)   Precision (%)   Recall (%)
KNN             100            100             100
Decision tree   100            100             100
Naïve Bayes     98.9           100             91.2
SVM             97.6           83.9            100

Table 2.

The accuracy, precision, and recall of the KNN, SVM, NB, and DT classifiers used in classifying 1000 new frames after being trained with 1000 frames.

Table 3 shows the accuracy, precision, and recall of the decision tree classifier when used to classify 3000 frames after training on a set containing 1000 frames, for the same cognitive system used to generate Figure 3. The single-user threshold is used for training the classifier, and the simulation was run with different numbers of samples for the energy detection process. It is clear from the table that the decision tree classifies all 3000 frames correctly, i.e., achieves a 100% detection rate, using only 200 samples for energy detection. Since the sensing time is proportional to the number of samples taken by the energy detector, fewer samples mean less sensing time. Thus, with machine-learning-based fusion such as the decision tree or KNN, the sensing time can be reduced from 200 to 40 μs for a 5 MHz bandwidth channel, for example, while still achieving a 100% detection rate of the spectrum hole.

Number of samples   Accuracy (%)   Precision (%)   Recall (%)
200                 100            100             100
400                 100            100             100
600                 100            100             100
800                 100            100             100
1000                100            100             100

Table 3.

The accuracy, precision, and recall of the decision tree classifier used in classifying 3000 frames for different numbers of samples.

3. Prediction of PU channel state based on hidden Markov and Markov switching models

In this part, the system model for forecasting the near-future PU channel state is divided into three models: (1) the model detecting the PU channel state (i.e., PU signal present or absent), which follows conventional single-user energy detection (the fusion techniques of Section 2.1 can also be used here); (2) the model that generates a time series capturing the PU channel state based on the detection sequence; and (3) the model predicting the generated time series based on the hidden Markov model (HMM) and the Markov switching model (MSM). The block diagram in Figure 5 illustrates these three models.

Figure 5.

Block diagram of PU channel state prediction model.

The PU channel state detection model follows Eq. (4); for a given probability of false alarm $P_f$, the detection threshold for a single-user energy detector can be written as

$$\lambda = \left(\sqrt{\frac{2}{N}}\, Q^{-1}(P_f) + 1\right)\sigma_u^2 \tag{24}$$

where $Q^{-1}(\cdot)$ is the inverse of the $Q(\cdot)$ function.

The sensing decision (i.e., the PU detection sequence) over time can then be written as follows:

$$D_t = \begin{cases} \text{"0"} \; (\text{PU absent}), & Y_t < \lambda \\ \text{"1"} \; (\text{PU present}), & Y_t \ge \lambda \end{cases}, \quad 1 \le t \le T \tag{25}$$

3.1. Time series generation model

Given the PU channel state detection sequence over time (PU absent, PU present), denote the period in which the PU is inactive as the "idle state" and the period in which the PU is active as the "occupied state." Our goal is to predict when the detection sequence $D_t$ will change from one state to another (i.e., "idle" to "occupied," or vice versa) before it happens, so that the secondary user can avoid interfering with the primary-user transmission. To this end, we generate a time series $z_t$ that maps each state of the detection sequence $D_t$ (i.e., "PU present" and "PU absent") into another observation space, using a different random-variable distribution for each state (i.e., $z_t \in \{v_1, v_2, \ldots, v_L\}$ represents the PU-absent or idle state, and $z_t \in \{v_{L+1}, \ldots, v_M\}$ represents the PU-present or occupied state). The time series $z_t$ can be written as

$$z_t \in \begin{cases} \{v_1, v_2, \ldots, v_L\}, & Y_t < \lambda \\ \{v_{L+1}, \ldots, v_M\}, & Y_t \ge \lambda \end{cases}, \quad 1 \le t \le T \tag{26}$$

Now, suppose we are given observation values $O = (O_1, O_2, \ldots, O_t, \ldots, O_T)$, $O_t \in \{v_1, v_2, \ldots, v_M\}$, and a PU channel state at time step $t$, $X_t \in \{s_i\}$, $i = 1, 2, \ldots, K$, $s_i \in \{0, 1\}$ (i.e., 0 for the PU idle state and 1 for the PU occupied state), and we want to estimate the channel state one time step ahead of the current state, $X_{t+1}$. We can solve this problem using the hidden Markov model Viterbi algorithm [15].
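A minimal sketch of the mapping in Eqs. (25)-(26). The symbol sets $\{1, 2, 3\}$ for idle and $\{4, 5, 6\}$ for occupied follow the experiment described in Section 3.4; the function name is ours:

```python
import numpy as np

def detection_to_series(energies, lam, rng):
    """Map each sensing slot to an observation symbol per Eqs. (25)-(26)."""
    z = np.empty(len(energies), dtype=int)
    for t, y in enumerate(energies):
        if y < lam:                    # D_t = "0": PU absent (idle)
            z[t] = rng.integers(1, 4)  # draw from {v1, v2, v3} = {1, 2, 3}
        else:                          # D_t = "1": PU present (occupied)
            z[t] = rng.integers(4, 7)  # draw from {v4, v5, v6} = {4, 5, 6}
    return z
```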

3.2. Primary user channel state estimation based on the hidden Markov model

The generic HMM is illustrated in Figure 6. In this figure, $X = (X_1, \ldots, X_t, \ldots, X_T)$ represents the hidden state sequence, where $X_t \in \{s_1, s_2, \ldots, s_K\}$ and $K$ is the number of hidden states of the Markov chain, and $O = (O_1, \ldots, O_t, \ldots, O_T)$ represents the observation sequence, where $O_t \in \{v_1, v_2, \ldots, v_M\}$ and $M$ is the number of observations in the observation space. $A$ and $B$ represent the transition probability matrix and the emission probability matrix, respectively, while $\pi$ denotes the initial state probability vector. An HMM is thus defined by $\theta = (\pi, A, B)$ (i.e., the initial state probabilities, the transition probabilities, and the emission probabilities) [15].

Figure 6.

Hidden Markov model.

The initial state probabilities for the HMM can be written as

$$\pi = [\pi_1, \pi_2, \ldots, \pi_i, \ldots, \pi_K], \quad \pi_i = P(X_1 = s_i), \; i = 1, 2, \ldots, K \tag{27}$$

For an HMM with two hidden states ($K = 2$),

$$\pi = [\pi_1, \pi_2]$$

The transition probabilities can be written as

$$A = [a_{ij}]_{K \times K}, \quad a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i), \; i, j = 1, \ldots, K \tag{28}$$

where $a_{ij}$ is the probability that the next state is $s_j$ when the current state is $s_i$. For an HMM with two states, the matrix $A$ can be written as

$$A = \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}$$

The emission probability matrix for the HMM is written as

$$B = [b_{jm}]_{K \times M}, \quad b_{jm} = P(O_t = v_m \mid X_t = s_j) \equiv b_j(O_t), \; j = 1, \ldots, K, \; m = 1, \ldots, M \tag{29}$$

$B$ and $b_j(\cdot)$ give the probability that the current observation is $v_m$ when the current state is $s_j$. For example, for an HMM with $M = 6$ and $K = 2$, $B$ is written as

$$B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} & b_{15} & b_{16} \\ b_{21} & b_{22} & b_{23} & b_{24} & b_{25} & b_{26} \end{pmatrix}$$

Now, for the problem described in subsection 3.1, assume that the HMM parameters $\theta = (\pi, A, B)$ and the observation values $O = (O_1, O_2, \ldots, O_t, \ldots, O_T)$ are given. Let $\delta_t(i)$ denote the maximum probability of a state sequence of length $t$ that ends at state $s_i$:

$$\delta_t(i) = \max_{X_1, \ldots, X_{t-1}} P(X_1, \ldots, X_t = s_i;\; O_1, \ldots, O_t \mid \theta) \tag{30}$$

If we let $\psi_t(i)$ be a vector that stores the arguments that maximize Eq. (30), we can write the Viterbi algorithm for the problem of subsection 3.1 as follows:

1) Step 1 initializes $\delta_1(i)$ and $\psi_1(i)$:

$$\delta_1(i) = \pi_i\, b_i(O_1), \quad \psi_1(i) = 0, \; i = 1, \ldots, K \tag{31}$$

2) Step 2 iterates to update $\delta_t(j)$ and $\psi_t(j)$:

$$\delta_t(j) = \max_{1 \le i \le K}\left[\delta_{t-1}(i)\, a_{ij}\right] b_j(O_t), \quad t = 2, \ldots, T, \; j = 1, \ldots, K \tag{32}$$
$$\psi_t(j) = \arg\max_{1 \le i \le K}\left[\delta_{t-1}(i)\, a_{ij}\right], \quad t = 2, \ldots, T, \; j = 1, \ldots, K \tag{33}$$

3) Step 3 terminates the update and calculates the likelihood probability $P^*$ and the estimated state $q_T^*$ at time $T$:

$$P^* = \max_{1 \le i \le K} \delta_T(i) \tag{34}$$
$$q_T^* = \arg\max_{1 \le i \le K} \delta_T(i) \tag{35}$$
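The three steps above translate directly into code. A compact Viterbi sketch for a discrete-observation HMM, written to match the notation of Eqs. (31)-(35) (our own implementation; `obs` holds zero-based observation indices):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for observation indices obs (Eqs. 31-35)."""
    K, T = len(pi), len(obs)
    delta = np.zeros((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, obs[0]]                          # Eq. (31)
    for t in range(1, T):
        for j in range(K):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)                 # Eq. (33)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]  # Eq. (32)
    q = np.zeros(T, dtype=int)
    q[-1] = np.argmax(delta[-1])                          # Eq. (35)
    for t in range(T - 2, -1, -1):                        # backtrack
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()                             # states, P* (Eq. 34)
```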

In our case, the HMM parameters $\theta = (\pi, A, B)$ are unknown and must be estimated. We estimate them statistically using the Baum-Welch algorithm [16].

3.2.1. Hidden Markov model parameter estimation using the Baum-Welch algorithm

If we assume that we are given training observations of length $L$, $(O_1, O_2, \ldots, O_t, \ldots, O_L)$, and want to approximate the HMM parameters $\theta = (\pi, A, B)$ from them, we can use maximum likelihood estimation. To do this, we define $\gamma_t(i)$ as the probability of being in state $s_i$ at time $t$, given $O_t$, $t = 1, 2, \ldots, L$:

$$\gamma_t(i) = P(X_t = s_i \mid O_1 \ldots O_L, \theta) \tag{36}$$

We also define $\zeta_t(i, j)$ as the probability of being in state $s_i$ at time $t$ and transiting to state $s_j$ at time $t+1$, given $O_t$, $t = 1, 2, \ldots, L$:

$$\zeta_t(i, j) = P(X_t = s_i, X_{t+1} = s_j \mid O_1 \ldots O_L, \theta) \tag{37}$$

Given $\gamma_t(i)$ and $\zeta_t(i, j)$, the expected number of transitions from state $s_i$ along the path is

$$E[\gamma_t(i)] = \sum_{t=1}^{L-1} \gamma_t(i) \tag{38}$$

and the expected number of transitions from state $s_i$ to state $s_j$ along the path is

$$E[\zeta_t(i, j)] = \sum_{t=1}^{L-1} \zeta_t(i, j) \tag{39}$$

Given $E[\zeta_t(i, j)]$ and $E[\gamma_t(i)]$, we can extract the model parameters $\theta = (\pi, A, B)$ from the training sequence, as given in [16], using the steps listed below:

1. For $i = 1, 2, \ldots, K$, let $\hat{\pi}_i$ be the expected frequency in state $s_i$ at time $t = 1$:

$$\hat{\pi}_i = \gamma_1(i) \tag{40}$$

2. For $i = 1, \ldots, K$ and $j = 1, \ldots, K$, compute

$$\hat{a}_{ij} = \frac{\text{expected number of transitions from } s_i \text{ to } s_j}{\text{expected number of transitions from } s_i} = \frac{E[\zeta_t(i, j)]}{E[\gamma_t(i)]} = \frac{\sum_{t=1}^{L-1} \zeta_t(i, j)}{\sum_{t=1}^{L-1} \gamma_t(i)} \tag{41}$$

3. For $j = 1, \ldots, K$ and $m = 1, \ldots, M$, compute

$$\hat{b}_{jm} = \frac{\text{expected number of times in state } s_j \text{ observing } v_m}{\text{expected number of times in state } s_j} = \frac{\sum_{t=1,\, O_t = v_m}^{L} \gamma_t(j)}{\sum_{t=1}^{L} \gamma_t(j)} \tag{42}$$

The estimation algorithm can be summarized in the following steps (a code sketch follows this list):

1. Get the observations $O_1, O_2, \ldots, O_L$.

2. Set an initial guess $\theta(1)$ of $\theta$; set $k = 1$.

3. Compute $\gamma_t(i)$ and $\zeta_t(i, j)$, $1 \le t \le L$, $1 \le i \le K$, $1 \le j \le K$, based on $O_1, O_2, \ldots, O_L$ and $\theta(k)$.

4. Compute $E[\gamma_t(i)]$ and $E[\zeta_t(i, j)]$ from Eqs. (38) and (39).

5. Compute, according to Eqs. (40)-(42), the new estimates of $a_{ij}$, $b_{jm}$, and $\pi_i$, and call them $\theta(k+1)$.

6. Set $k = k + 1$ and go to step 3 if not converged.

The prediction of the one-step-ahead PU channel state can then be made from the trained parameters $(\pi, A, B)$ with the help of Eqs. (31), (34), and (35) by setting $T = 1$.
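Below is a compact Baum-Welch sketch matching Eqs. (36)-(42). The scaled forward-backward pass is our addition for numerical stability and is not part of the chapter's derivation; `obs` again holds zero-based observation indices:

```python
import numpy as np

def baum_welch(obs, pi, A, B, n_iter=50):
    """Baum-Welch re-estimation of (pi, A, B) from one observation sequence."""
    obs = np.asarray(obs)
    pi, A, B = pi.copy(), A.copy(), B.copy()
    K, M, L = len(pi), B.shape[1], len(obs)
    for _ in range(n_iter):
        # Scaled forward pass (scaling avoids numerical underflow).
        alpha = np.zeros((L, K))
        c = np.zeros(L)
        alpha[0] = pi * B[:, obs[0]]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, L):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Scaled backward pass using the same scale factors.
        beta = np.ones((L, K))
        for t in range(L - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        # gamma_t(i), Eq. (36), and zeta_t(i, j), Eq. (37).
        gamma = alpha * beta
        zeta = (alpha[:-1, :, None] * A[None, :, :]
                * (B[:, obs[1:]].T * beta[1:])[:, None, :]
                / c[1:, None, None])
        # Re-estimation, Eqs. (40)-(42).
        pi = gamma[0] / gamma[0].sum()
        A = zeta.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_new = np.zeros_like(B)
        for m in range(M):
            B_new[:, m] = gamma[obs == m].sum(axis=0)
        B = B_new / gamma.sum(axis=0)[:, None]
    return pi, A, B
```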

3.3. Primary user channel state estimation based on the Markov switching model

An alternative way to estimate the PU channel state is to use the Markov switching model (MSM). For the time series in Eq. (26), we assume that $z_t$ obeys one of two Gaussian distributions, $\mathcal{N}(\mu_{z0}, \sigma_{z0}^2)$ or $\mathcal{N}(\mu_{z1}, \sigma_{z1}^2)$, based on the sensed PU channel state, "PU channel idle" or "PU occupied." We can rewrite Eq. (26) as follows:

$$z_t \sim \begin{cases} \mathcal{N}(\mu_{z0}, \sigma_{z0}^2), & Y_t < \lambda \\ \mathcal{N}(\mu_{z1}, \sigma_{z1}^2), & Y_t \ge \lambda \end{cases}, \quad 1 \le t \le T \tag{43}$$

Clearly, Eq. (43) represents a two-state Gaussian regime-switching time series, which can be modeled using MSM [17]. To estimate the switching time one state ahead of the current state for this time series, we need to derive the MSM regression model for the time series and estimate its parameters.

3.3.1. Derivation of the Markov switching model for a Gaussian regime-switching time series

A simple Markov switching regression model describing the two-state Gaussian regime-switching time series of Eq. (43) can be written, following Ref. [17], as

$$z_t = \mu_{s_t} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \sigma_{s_t}^2) \tag{44}$$

where $\mu_{s_t}$ is an array of predetermined variables measured at time $t$, which may include lagged values of $z_t$; $\epsilon_t$ is a white-noise process; and $s_t \in \{0, 1\}$ is a hidden Markov chain whose mean and standard deviation over time are $\mu_{s_t} = \mu_0 (1 - s_t) + \mu_1 s_t$ and $\sigma_{s_t} = \sigma_0 (1 - s_t) + \sigma_1 s_t$, respectively (the state variable $s_t$ follows a first-order, two-state Markov chain as in [18]). Given the past history of $s_t$, the probability of $s_t$ taking a certain value depends only on $s_{t-1}$, which yields the following Markov property:

$$P(s_t = j \mid s_{t-1} = i) = P_{ij} \tag{45}$$

where $P_{ij}$, $i, j \in \{0, 1\}$, denotes the transition probability to $s_t = j$ given that $s_{t-1} = i$. Clearly, the transition probabilities satisfy $P_{i0} + P_{i1} = 1$. We can gather the transition probabilities $P_{ij}$ into a transition matrix as follows:

$$P = \begin{pmatrix} P(s_t = 0 \mid s_{t-1} = 0) & P(s_t = 0 \mid s_{t-1} = 1) \\ P(s_t = 1 \mid s_{t-1} = 0) & P(s_t = 1 \mid s_{t-1} = 1) \end{pmatrix} = \begin{pmatrix} P_{00} & P_{10} \\ P_{01} & P_{11} \end{pmatrix} \tag{46}$$

The transition matrix $P$ governs the behavior of the state variable $s_t$, and it holds only two free parameters ($P_{00}$ and $P_{11}$). We do not observe $s_t$ directly; we only infer its operation from the observed behavior of $z_t$. The parameters that need to be estimated to fully describe the probability law governing $z_t$ are the variances of the Gaussian innovations $\sigma_0, \sigma_1$, the expectations of the dependent variable $\mu_0, \mu_1$, and the two state-transition probabilities $P_{00}$ and $P_{11}$.

3.3.2. Markov switching model parameter estimation via maximum likelihood estimation

There are many ways to estimate the parameters of the Markov switching model, among them quasi-maximum likelihood estimation (QMLE) and Gibbs sampling. In this section, we focus on maximum likelihood estimation (MLE).

If we denote by $\psi_{t-1} = \{z_{t-1}, z_{t-2}, \ldots, z_1\}$ the vector of training data up to time $t - 1$ and by $\theta = \{\sigma_0, \sigma_1, \mu_0, \mu_1, P_{00}, P_{11}\}$ the vector of MSM parameters, then $\psi_L = \{z_L, z_{L-1}, \ldots, z_1\}$ is the vector of all available information, of sample length $L$ (see Figure 7). To evaluate the likelihood of the state variable $s_t$ based on the current trend of $z_t$, we need to assess its conditional expectations $s_t = i$, $i \in \{0, 1\}$, based on $\psi$ and $\theta$. These conditional expectations include the prediction probabilities $P(s_t = i \mid \psi_{t-1}; \theta)$, which are based on the information prior to time $t$; the filtering probabilities $P(s_t = i \mid \psi_t; \theta)$, which are based on the past and current information; and, finally, the smoothing probabilities $P(s_t = i \mid \psi_L; \theta)$, which are based on the full-sample information $L$. After obtaining these probabilities, we get the log-likelihood function as a byproduct and can then compute the maximum likelihood estimates.

Figure 7.

Markov switching model.

Normally, the density of $z_t$ conditional on $\psi_{t-1}$ and $s_t = i$, $i \in \{0, 1\}$, is written as

$$F(z_t \mid s_t = i, \psi_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_{s_t}}\, e^{-\frac{(z_t - \mu_{s_t})^2}{2\sigma_{s_t}^2}} \tag{47}$$

where $F$ represents the probability density function. Given the prediction probabilities $P(s_t = i \mid \psi_{t-1}; \theta)$, the density of $z_t$ conditional on $\psi_{t-1}$ can be obtained from

$$F(z_t \mid \psi_{t-1}; \theta) = P(s_t = 0 \mid \psi_{t-1}; \theta)\, F(z_t \mid s_t = 0, \psi_{t-1}; \theta) + P(s_t = 1 \mid \psi_{t-1}; \theta)\, F(z_t \mid s_t = 1, \psi_{t-1}; \theta) \tag{48}$$

For $i = 0, 1$, the filtering probabilities of $s_t$ are given by

$$P(s_t = i \mid \psi_t; \theta) = \frac{P(s_t = i \mid \psi_{t-1}; \theta)\, F(z_t \mid s_t = i, \psi_{t-1}; \theta)}{F(z_t \mid \psi_{t-1}; \theta)} \tag{49}$$

The prediction probabilities are

$$P(s_{t+1} = i \mid \psi_t; \theta) = P(s_t = 0, s_{t+1} = i \mid \psi_t; \theta) + P(s_t = 1, s_{t+1} = i \mid \psi_t; \theta) = P_{0i}\, P(s_t = 0 \mid \psi_t; \theta) + P_{1i}\, P(s_t = 1 \mid \psi_t; \theta) \tag{50}$$

where $P_{0i} = P(s_{t+1} = i \mid s_t = 0)$ and $P_{1i} = P(s_{t+1} = i \mid s_t = 1)$ are the transition probabilities. Setting the initial values as given in [19], and assuming the Markov chain is ergodic,

$$P(s_0 = i \mid \psi_0) = \frac{1 - P_{jj}}{2 - P_{ii} - P_{jj}}$$

we can iterate Eqs. (49) and (50) to obtain the filtering probabilities $P(s_t = i \mid \psi_t; \theta)$ and the conditional densities $F(z_t \mid s_t = i, \psi_{t-1}; \theta)$ for $t = 1, 2, \ldots, T$. Then we can compute the logarithmic likelihood function using

$$\log L(\hat{\theta}) = \sum_{t=1}^{T} \sum_{i=1}^{2} \log\!\left[ F(z_t \mid s_t = i, \psi_{t-1}; \theta) \times P(s_t = i \mid \psi_t; \theta) \right] \tag{51}$$

where $L(\hat{\theta})$ is the maximized value of the likelihood function. The model estimate is finally obtained by finding the set of parameters $\hat{\theta}$ that maximizes Eq. (51) using a numerical search algorithm. The estimated filtering and prediction probabilities can then be calculated by plugging $\hat{\theta}$ into the formulae of these probabilities. We adopt the approximation in Ref. [20] for computing the smoothing probabilities $P(s_t = i \mid \psi_L; \theta)$:

$$P(s_t = i \mid s_{t+1} = j, \psi_L; \theta) \approx P(s_t = i \mid s_{t+1} = j, \psi_t; \theta) = \frac{P(s_t = i, s_{t+1} = j \mid \psi_t; \theta)}{P(s_{t+1} = j \mid \psi_t; \theta)} = \frac{P_{ij}\, P(s_t = i \mid \psi_t; \theta)}{P(s_{t+1} = j \mid \psi_t; \theta)}$$

and, for $i, j = 0, 1$, the smoothing probabilities are expressed as

$$P(s_t = i \mid \psi_L; \theta) = P(s_{t+1} = 0 \mid \psi_L; \theta)\, P(s_t = i \mid s_{t+1} = 0, \psi_L; \theta) + P(s_{t+1} = 1 \mid \psi_L; \theta)\, P(s_t = i \mid s_{t+1} = 1, \psi_L; \theta)$$
$$= P(s_t = i \mid \psi_t; \theta) \times \left[ P_{i0}\, \frac{P(s_{t+1} = 0 \mid \psi_L; \theta)}{P(s_{t+1} = 0 \mid \psi_t; \theta)} + P_{i1}\, \frac{P(s_{t+1} = 1 \mid \psi_L; \theta)}{P(s_{t+1} = 1 \mid \psi_t; \theta)} \right] \tag{52}$$

Using $P(s_L = i \mid \psi_L; \theta)$ as the initial value, we can iterate the filtering and prediction probability equations backward, along with the equation above, to obtain the smoothing probabilities for $t = L-1, \ldots, k+1$.
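The filtering and prediction recursion of Eqs. (47)-(50), together with the log-likelihood accumulation used to fit $\hat{\theta}$, is short to implement. A minimal sketch (the matrix convention `P[i, j]` $= P(s_t = j \mid s_{t-1} = i)$ and the use of $\sum_t \log F(z_t \mid \psi_{t-1}; \theta)$ as the likelihood are our assumptions):

```python
import numpy as np
from scipy.stats import norm

def hamilton_filter(z, mu, sigma, P):
    """Filtering/prediction probabilities and log-likelihood, Eqs. (47)-(50).

    mu, sigma: length-2 arrays of the two states' means and std deviations.
    P[i, j] = P(s_t = j | s_{t-1} = i).
    """
    T = len(z)
    pred = np.zeros((T, 2))   # P(s_t = i | psi_{t-1})
    filt = np.zeros((T, 2))   # P(s_t = i | psi_t)
    # Ergodic initial probabilities, as in [19].
    p00, p11 = P[0, 0], P[1, 1]
    pred[0] = [(1 - p11) / (2 - p00 - p11), (1 - p00) / (2 - p00 - p11)]
    loglik = 0.0
    for t in range(T):
        dens = norm.pdf(z[t], loc=mu, scale=sigma)   # Eq. (47), both states
        joint = pred[t] * dens
        marginal = joint.sum()                       # Eq. (48)
        filt[t] = joint / marginal                   # Eq. (49)
        loglik += np.log(marginal)
        if t + 1 < T:
            pred[t + 1] = filt[t] @ P                # Eq. (50)
    return filt, pred, loglik
```

A numerical optimizer can then search over $\theta = \{\mu_0, \mu_1, \sigma_0, \sigma_1, P_{00}, P_{11}\}$ to maximize the returned log-likelihood.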

3.4. Results and discussions

Figure 8 shows the training detection sequence, generated as training observations using randomly distributed PU channel states ("idle" and "occupied") over T = 250 ms of simulation time. We use these training observations to train the Baum-Welch algorithm to estimate the HMM parameters $\theta = (\pi, A, B)$, assuming the first estimate $\theta(1)$ is:

$$\pi^{(1)} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
$$A^{(1)} = \begin{pmatrix} 0.85 & 0.15 \\ 0.10 & 0.90 \end{pmatrix}$$
$$B^{(1)} = \begin{pmatrix} 0.17 & 0.16 & 0.17 & 0.16 & 0.17 & 0.17 \\ 0.60 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 \end{pmatrix}$$

Figure 8.

The training detection sequence for HMM and MSM.

Figure 9a shows the performance of the HMM algorithm in estimating the PU channel states (i.e., PU idle or PU occupied) from the time series that captures the detection sequence for a single-user cognitive radio network. Figure 9a contains three plots: the top plot shows the randomly distributed PU channel states over time T = 500 ms; the middle plot shows the generated time series following the distributions $z_t \in \{1, 2, 3\}$ for idle states and $z_t \in \{4, 5, 6\}$ for occupied states (note: the observation space can be constructed from these two distributions as $O_t \in \{1, 2, 3, 4, 5, 6\}$, $t = 1, 2, \ldots, 500$ ms); and the bottom plot shows the performance of the HMM algorithm in forecasting the time series generated to capture the PU detection sequence.

Figure 9.

(a) Performance of the HMM algorithm in predicting the time series generated to capture the PU channel state detection sequence. (b) Performance of the MSM algorithm in predicting the time series generated to capture the same PU detection sequence as in Figure 8.

Figure 9b shows the performance of the MSM algorithm in predicting the switching process between the two PU channel states for the same PU detection sequence given in Figure 9a (T = 500 ms). The top graph in Figure 9b shows the generated time series with distribution $z_t \sim \mathcal{N}(0.1, 0.5)$ for idle states and $z_t \sim \mathcal{N}(0.01, 0.2)$ for occupied states, and the bottom graph shows the prediction performance using MSM. As is clear from the figure, the prediction is smoother than with the HMM approach.

4. Conclusions

In this chapter, we have presented per-frame decision-based cooperative spectrum sensing using a machine-learning classifier as the fusion algorithm. Simulation and numerical results have shown that the machine-learning classifier-based fusion algorithm performs the same as conventional fusion rules in terms of sensing accuracy, with less sensing time and without the overheads and extra operations that limit the achievable cooperative gain among cognitive radio users. In addition, we have studied the problem of primary-user channel state prediction in cognitive radio networks and introduced the hidden Markov model and the Markov switching model to solve it. Finally, we showed by means of simulation that both models perform very well in predicting the time series that captures the actual primary-user channel state.

Acknowledgments

I gratefully acknowledge the funding received from Shanghai Jiao Tong University to undertake my PhD. I also thank Prof. Bin Guo for his encouragement and help on the topic.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
