Performance and the most relevant characteristics of the studies using ANNs in the context of SAHS classification, event detection, and AHI regression.
Open access peer-reviewed chapter
By Daniel Álvarez, Ana Cerezo-Hernández, Graciela López-Muñiz, Tania Álvaro-De Castro, Tomás Ruiz-Albi, Roberto Hornero and Félix del Campo
Submitted: May 20th 2016. Reviewed: October 26th 2016. Published: April 5th 2017.
Sleep apnea-hypopnea syndrome (SAHS) is a chronic and highly prevalent disease considered a major health problem in industrialized countries. The gold standard diagnostic methodology is in-laboratory nocturnal polysomnography (PSG), which is complex, costly, and time consuming. In order to overcome these limitations, novel and simplified diagnostic alternatives are demanded. Over the last decades, sleep scientists have carried out extensive research focused on the design of automated expert systems derived from artificial intelligence able to help sleep specialists in their daily practice. Among automated pattern recognition techniques, artificial neural networks (ANNs) have proven to be efficient and accurate algorithms for implementing computer-aided diagnosis systems aimed at assisting physicians in the management of SAHS. In this regard, several applications of ANNs have been developed, such as classification of patients suspected of suffering from SAHS, apnea-hypopnea index (AHI) prediction, detection and quantification of respiratory events, apneic events classification, automated sleep staging and arousal detection, alertness monitoring systems, and airflow pressure optimization in positive airway pressure (PAP) devices to fit patients’ needs. In the present research, current applications of ANNs in the framework of SAHS management are thoroughly reviewed.
In their daily practice, physicians must frequently decide on a definitive diagnosis or the most suitable treatment using several variables from multiple clinical data sources, which is a highly complex task. A huge amount of valuable healthcare-related information is currently available, from symptoms reported by the patient and details stored in their clinical history to biochemical data and outcomes from biomedical recordings or medical images. In this context, machine learning methods are essential to maximize the usefulness of medical data in order to expedite decisions and avoid misdiagnosis. In the last decades, the increasing development of computers and artificial intelligence has led to the use of decision support expert systems in the common clinical practice of several fields of medicine [1, 2]. The huge number of studies published in the context of biomedical engineering in recent years clearly shows this trend.
Bayesian theory was one of the first mathematical frameworks used to implement decision support systems. Regarding the classification of an item, according to the Bayes’ decision rule, the predicted class must be the one that maximizes the a posteriori probability in order to minimize the classification error. A major goal is to model the statistical characteristics of the problem under study, leading to expert systems able to assist physicians in decision-making processes. Among pattern recognition algorithms, conventional statistical classifiers, such as discriminant analysis or logistic regression (LR), and more recently artificial neural networks (ANNs), have been widely applied. Conventional statistical classifiers assume that the class density function of the input data is known a priori. When they hold, assumptions such as normal distribution, homoscedasticity, linearity, independence, or stationarity decrease the complexity of the classifier, minimize the classification error, and improve the performance. Nevertheless, these assumptions are not always met in real-world pattern classification problems, especially when working with limited datasets. Conversely, when using ANNs, no assumptions are made about the probability density functions of input features and the training data is used directly to optimize the decision rule. Nevertheless, ANNs are characterized by a complex design stage. Both statistical and ANN approaches have their advantages and limitations. However, the ability to model complex nonlinear problems, which are very common in biological systems, has made ANNs widely used in medical applications.
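The Bayes’ decision rule mentioned above can be illustrated with a minimal sketch. The prior and likelihood values are invented for illustration only (they do not come from any study), and `posteriors` and `map_decision` are hypothetical helper names.

```python
def posteriors(priors, likelihoods):
    """Return P(class | x) for each class using Bayes' theorem.

    priors[c]      -- P(class = c)
    likelihoods[c] -- p(x | class = c), evaluated at the observed pattern x
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # p(x), the normalizing constant
    return {c: joint[c] / evidence for c in joint}


def map_decision(priors, likelihoods):
    """Pick the class with the maximum a posteriori probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get)


# Invented numbers, for illustration only.
priors = {"SAHS": 0.6, "no SAHS": 0.4}
likelihoods = {"SAHS": 0.2, "no SAHS": 0.5}
print(posteriors(priors, likelihoods))
print(map_decision(priors, likelihoods))
```

Note that the class with the larger prior is not necessarily the one selected: the likelihood of the observed pattern can outweigh it.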
The first attempt to model information processing in biological systems by means of ANNs was carried out by McCulloch and Pitts in 1943. Since then, ANN-based algorithms have significantly evolved and their use in the field of medicine has increased considerably, particularly since the late 1990s. Some computer programs in the context of statistical medicine already include ANNs among their functionalities, which has contributed to increase their use in medical research. Nevertheless, “neural network” frequently remains a confusing term for many healthcare-related researchers. The implementation of an ANN has to be carried out by means of advanced software and some expertise is required to properly set up the user-dependent input parameters. However, once designed, they are reliable and easy-to-use tools, even for nontrained personnel. In addition, once optimized, the computational time is small, which is a major feature in order to speed up decision making.
Sleep research, and particularly sleep-related breathing disorders (SBD), is a field in which the application of automated pattern recognition algorithms has grown rapidly in recent years due to the need for automating their complex diagnostic processes. Particularly challenging is the management of sleep apnea-hypopnea syndrome (SAHS). The gold standard technique for SAHS diagnosis is in-lab nocturnal polysomnography (PSG). During PSG, several neuromuscular and cardiorespiratory signals (up to 32 biomedical recordings) are monitored and stored for subsequent interpretation by trained personnel, which is a highly complex and time-consuming task. In addition, accessibility to diagnosis and treatment is limited due to insufficient resources, both human (trained specialists) and technical (specialized sleep units), which has led to large waiting lists. In this context, automated computer-aided diagnosis systems have emerged as very useful tools to deal with complex rules involving several biomedical recordings simultaneously, in order to expedite diagnosis and treatment [10–12]. Among all the machine learning-based tools, ANNs have been widely applied in the context of SAHS and merit a thorough analysis.
In order to analyze the usefulness of ANNs in the management of SAHS, an exhaustive review of the studies published during the past decade has been carried out. The review is structured as follows. First, the most relevant tasks regarding the ANN learning process are outlined in Section 2. In this regard, some user-dependent decisions involving the ANN design and major issues concerning the training and testing processes are detailed. Second, in Section 3, the most relevant applications of ANNs are analyzed, including automated diagnosis, sleep staging, and treatment monitoring.
ANNs are mathematical models inspired by the information processing capabilities of the nervous system and designed to accomplish a predetermined task specified by the user [13, 14]. They were built to implement useful brain functions into a pattern recognition algorithm, such as parallel processing, distributed memory/storage, and environmental flexibility. ANNs are characterized by fast and effective processing, learned from a preceding training process. During the learning or training stage, a wide set of known representative samples is used in order to model the statistical properties of the problem under study and accordingly compose the structure of the network. Figure 1 illustrates a common network architecture of interconnected nodes arranged in layers simulating the brain’s neuronal synapses.
The following advantages can be obtained when ANNs are applied to pattern recognition problems: (i) no prior assumptions about the data distribution are made, as ANNs adjust themselves to the particular problem constraints during the learning process, (ii) ANNs are universal approximators able to approximate any continuous function with arbitrary accuracy, and (iii) they are nonlinear algorithms able to model real-world complex relationships.
There are two major classes of ANNs: feedforward multilayered networks and radial basis function (RBF) networks. Both types of ANNs are capable of approximating any continuous functional mapping by means of several units (neurons) arranged in different layers. The main difference is the way hidden units are activated, i.e., how the input data is used to compute the output of each unit. In feedforward ANNs, there is a fixed (usually nonlinear) activation function, whereas in RBF ANNs, the activation of each unit depends on the radial distance (typically Euclidean) between the input vector and a prototype vector (center).
The multilayer perceptron (MLP) is the most widely used feedforward ANN in computer-aided medical research. Indeed, feedforward networks, particularly MLP, are the most popular ANNs in the framework of SAHS management [19–21]. A particular implementation of MLP networks involving Bayesian inference during the learning process (BY-MLP), which increases the generalization ability and allows for relevance analysis of input variables, has proven useful in this context. Similarly, probabilistic neural networks (PNN), which also integrate the Bayes’ theory into the learning process, have been recently applied to the SAHS diagnosis problem. In addition, RBF ANNs [24, 25], such as learning vector quantization (LVQ), which is a precursor of self-organizing maps using the Hebbian learning-based approach [26, 27]; fuzzy neural networks (FNN), which incorporate the fuzzy inference system (FIS) into the learning process [26, 28, 29]; self-organizing maps (SOM) and adaptive resonance theory (ART) models, which are likely the most common unsupervised ANNs [30, 31]; and recurrent neural networks (RNN), which allow for closed-loop connections between units (feedback), have also been applied in the framework of automated SAHS management.
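The difference between the two kinds of hidden unit can be sketched as follows. This is only an illustration under stated assumptions: the hyperbolic tangent is used as the fixed activation of the feedforward unit and a Gaussian is assumed as the radial basis function (a common choice, not specified in the text). The function names and the `sigma` width parameter are hypothetical.

```python
import math


def feedforward_unit(x, w, b):
    """Feedforward hidden unit: fixed nonlinearity applied to a weighted sum."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(a)


def rbf_unit(x, center, sigma):
    """RBF hidden unit: Gaussian of the Euclidean distance to a prototype vector."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x = [0.3, -1.0]
print(feedforward_unit(x, w=[0.8, 0.1], b=0.2))
print(rbf_unit(x, center=[0.0, -1.0], sigma=0.5))
```

The RBF unit responds most strongly (value 1) when the input coincides with its center and decays with distance, whereas the feedforward unit responds to the projection of the input onto its weight vector.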
Next, an overview of the conventional multilayered network architecture is provided, as well as the most important issues regarding the design, training, and validation stages common to all approaches in the ANN-based framework. Figure 2 shows a flow diagram summarizing these stages.
The so-called neuron is the basic element within an ANN, which comprises its elementary mathematical functions. ANNs are composed of multiple interconnected nodes arranged in different levels or layers leading to a massive parallel structure. The first level is called the input layer. Neurons in the input layer directly process the feature vectors or patterns that feed the ANN. Similarly, each output from every neuron in a layer feeds the neurons composing the subsequent layer, leading to a distributed complex structure. The last level of the network, whose nodes provide the output of the ANN, is called the output layer. The remaining internal levels are called hidden layers. Both the number of hidden layers and the number of nodes are flexible and are determined during the design and learning process. The feedforward architecture is the most widely used, where each neuron in a layer is connected to every neuron in the next layer but neither connections between units in the same level nor closed loops (feedback) are allowed. Therefore, data is always moving forward from one layer to the next, i.e., from the input to the output.
No network architecture is known a priori to be the best for every problem under study in terms of performance. The mathematical operation accomplished by each neuron is always the same. Therefore, the functionality of the ANN, i.e., the way in which a particular problem is addressed, is determined by the strength of the link between each pair of neurons. This strength is characterized by the coefficients of the ANN, the so-called weights, which are optimized during the training stage. Similar to the process of memory, weights represent the information stored in the network, whereas the optimization procedure represents the learning process or statistical inference.
As aforementioned, the structure of an ANN depends on the number of hidden layers, the number of neurons per layer, and the connectivity strength among them. Regarding the number of levels, it is common to construct ANNs with a single hidden layer because it has been demonstrated that this architecture is able to achieve universal approximation. This is a user-dependent decision, whereas the number of neurons and the connectivity degree (weights) are both determined automatically during the learning process. Regarding the number of nodes in the hidden layer, it is commonly optimized by means of a hold-out or cross-validation approach using the data in a training dataset. In this regard, it is assumed that the more complex the problem, the higher the number of neurons required. Notwithstanding, even a small network with a reduced number of nodes can model complex problems and reach high prediction ability. In addition, the following design issues must be addressed before the learning process: the output coding scheme, the error function used in the network training, and the activation function of neurons in the hidden and output layers. The hyperbolic tangent function is a common activation function for neurons in the hidden layer since it has been demonstrated that it provides fast convergence of training algorithms [13, 17]. Figure 3 shows a common schema of a single neuron (perceptron) with a sigmoid activation function. Regarding the learning process, the scaled conjugate gradient (SCG) is a common method for updating the adjustable parameters of the ANN (weights and biases) during the training stage.
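The layered, strictly forward flow of data described above can be sketched for a network with a single hidden layer and a single output neuron. The activation functions (hyperbolic tangent in the hidden layer, logistic in the output) and the function name `forward_pass` are assumptions chosen for illustration; the weights are arbitrary numbers.

```python
import math


def forward_pass(x, W_hidden, b_hidden, w_out, b_out):
    """Propagate one input pattern from the input layer to the output layer.

    W_hidden -- list of weight vectors, one per hidden neuron
    b_hidden -- list of hidden biases, one per hidden neuron
    w_out    -- weights linking each hidden neuron to the output neuron
    b_out    -- bias of the output neuron
    """
    # Every hidden neuron receives every input (full connectivity, no feedback).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
        for w_row, b in zip(W_hidden, b_hidden)
    ]
    # The output neuron receives the output of every hidden neuron.
    a_out = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-a_out))  # logistic output in (0, 1)


# Arbitrary weights: 2 inputs, 3 hidden neurons, 1 output.
y = forward_pass(
    x=[0.5, -1.2],
    W_hidden=[[0.1, -0.4], [0.7, 0.2], [-0.3, 0.9]],
    b_hidden=[0.0, 0.1, -0.2],
    w_out=[0.5, -0.6, 0.3],
    b_out=0.05,
)
print(y)
```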
According to the mathematical nature of the output, ANNs can be applied to address two main kinds of problems: classification and regression. Regarding the classification approach, the goal of the ANN is to estimate the class membership for an input feature pattern among a set of predefined discrete categories. Conversely, in a regression task, the goal of the ANN is to estimate a continuous variable.
In the context of binary classification problems, an output layer with just a single neuron is needed. Regarding, for instance, a 2-class SAHS diagnosis problem, all input patterns are assigned to one of two mutually exclusive classes: SAHS positive (class C0 or positive class) or SAHS negative (class C1 or negative class). A possible target coding scheme would be the following: t = 0 for the positive class and t = 1 for the negative class. This architecture can also be used in regression problems, where the variable to be approximated is unidimensional and continuous. In the context of SAHS diagnosis, the goal of a regression ANN could be to estimate the apnea-hypopnea index (AHI).
Due to a highly flexible architecture, most ANNs can be used to model both classification and regression problems by just modifying certain design characteristics. The main difference between classification and regression ANNs is linked with the nature of the function to be approximated. The output of an ANN is provided in terms of probability in a classification task while it is an estimate of a continuous variable in a regression context. Accordingly, optimization procedures differ from one approach to another. Regarding a binary classification approach, the activation function of the output neuron could be a nonlinear function with output values ranging [0, 1], e.g., a sigmoid function such as the logistic function (the hyperbolic tangent, another sigmoid, ranges over [−1, 1] and would need to be rescaled). In this regard, the network output can be interpreted as the probability that the input feature pattern belongs to one class or another according to the Bayes’ theorem. Conversely, addressing a regression task such as AHI estimation, the network output values must be continuous and nonnegative. Therefore, a linear activation function with outputs restricted to [0, ∞) would be suitable.
Regarding the error function governing the learning process, the cross-entropy error function is widely used in the context of binary classification, whereas the sum-of-squares error function is commonly used for regression purposes.
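The two error functions named above can be written down in a few lines. This is a minimal sketch using their standard textbook definitions; the clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero, and the example values are invented.

```python
import math


def cross_entropy(targets, outputs, eps=1e-12):
    """Cross-entropy error for binary targets (0 or 1) and outputs in (0, 1)."""
    total = 0.0
    for t, y in zip(targets, outputs):
        y = min(max(y, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total


def sum_of_squares(targets, outputs):
    """Sum-of-squares error for continuous targets (e.g., AHI values)."""
    return 0.5 * sum((y - t) ** 2 for t, y in zip(targets, outputs))


# Invented example values.
print(cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))   # classification outputs
print(sum_of_squares([12.0, 3.5], [10.0, 5.0]))    # regression outputs (e/h)
```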
Normalization of input feature values is an important task in nonlinear pattern recognition methods. Bounded input values of similar magnitude are needed to accomplish suitable weight initialization. Input patterns are composed of features parameterizing different properties of the problem under study, e.g., the influence of recurrent apnea events typical of SAHS on cardiorespiratory signals. Usually, several features of different nature are involved in order to obtain as much information as possible, e.g., sociodemographic, anthropometric, and clinical variables and/or variables from automated feature extraction algorithms. Therefore, their values may differ significantly and thus they must be normalized. In this regard, simple linear rescaling can be used to standardize (zero mean and unit variance) the magnitudes of each input feature by subtracting its mean and dividing by its standard deviation.
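The linear rescaling described above can be sketched as follows. The helper names are hypothetical and the values are invented. Two choices here are assumptions rather than statements from the text: the population standard deviation is used, and the mean and standard deviation are estimated on the training data only and then reused for new patterns.

```python
import statistics


def fit_standardizer(train_column):
    """Estimate the mean and (population) standard deviation of one feature."""
    return statistics.mean(train_column), statistics.pstdev(train_column)


def standardize(column, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    if std == 0:  # constant feature: carries no information, map to zero
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


# Invented body mass index values (kg/m^2).
train_bmi = [22.0, 30.0, 38.0]
mean, std = fit_standardizer(train_bmi)
print(standardize(train_bmi, mean, std))   # zero mean, unit variance
print(standardize([27.0], mean, std))      # new pattern, same parameters
```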
Training is the most important stage when working with ANNs. The aim of the training process is to adapt the ANN to the problem under study by computing some adjustable parameters. The training or learning process can be (i) supervised, in which the learning process is guided by a static mapping between input patterns and known targets; (ii) reinforced, in which a performance function assesses the accuracy of the current output instead of knowing the actual target values; and (iii) unsupervised, in which ANNs adapt themselves to input patterns with no kind of feedback. In the context of medical decision support systems, the supervised approach is the most widely used. When using supervised learning, it is essential to know the target or actual output value for a wide set of input patterns. The dataset of examples used during the learning stage is referred to as the training set. According to these training input-output pairs, the network weights are tuned to fit each input to its corresponding target. It is important that the training set be large enough to fairly represent the problem under study.
Backpropagation learning is the most commonly used methodology for updating weights in feedforward ANNs due to its computational efficiency. Using this approach, all weights are updated every time an input pattern is fed from the training dataset in order to minimize an error function. First, the network weights are initialized randomly. During a supervised learning process, the training samples (input-target pairs) are fed into the network and the error function is computed, i.e., the difference between the estimated output value and the desired target according to a predefined suitable function. Then, the values of the network weights are modified in order to minimize the error. This procedure is repeated over a number of iterations set by the user. Once the training process is finished, all network weights have a fixed value, i.e., there is a single optimized ANN able to carry out the task for which it was designed.
Once optimized, an ANN is able to process new input patterns independent of the training dataset. In this regard, it is noteworthy that the goal of the training stage must be to build a general statistical model of the problem under study rather than to learn data samples from a particular training set. This is an essential characteristic common to all pattern recognition techniques and it is required to achieve good generalization ability. Generalization accounts for the ability to make good predictions for new unknown inputs.
In addition to the user’s capability to accomplish appropriate design and optimization procedures, the performance or generalization ability of an ANN is influenced by three main factors [13, 36]: (i) the size and completeness of the training dataset, i.e., whether the learning samples account for all the variability of the environment or problem of interest; (ii) the number of adjustable parameters in the model; and (iii) the complexity of the problem under study. The nature of the problem or model complexity is linked with the number of adjustable parameters in the ANN (network weights) and it cannot be controlled. Theoretically, the harder the problem, the more complex the ANN. In this regard, it is important to achieve a compromise between the generalization ability and complexity. An ANN with a small number of parameters, i.e., low flexibility, may lead to an underfitted model, insufficient to reach high generalization. On the contrary, an ANN with a large number of weights may lead to an overfitted model that matches a particular training dataset, resulting in poor generalization. Underfitting can be avoided by increasing the flexibility, whereas avoiding overfitting requires the training set to grow in accordance with the network complexity.
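The procedure above (random initialization, forward computation, error evaluation, and a weight update after every pattern) can be sketched for a network with one hidden layer. This is an illustration under stated assumptions, not the method of any particular study: hyperbolic tangent hidden units, a logistic output, the cross-entropy error, plain gradient descent with a fixed learning rate `lr`, and invented training data. All names are hypothetical.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Random initialization of the weights; biases start at zero."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return the hidden activations and the network output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    a = sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"]
    return h, 1.0 / (1.0 + math.exp(-a))


def backprop_step(net, x, t, lr=0.1):
    """Update all weights after presenting one (input, target) pair."""
    h, y = forward(net, x)
    delta_out = y - t  # error derivative at the output (logistic + cross-entropy)
    # Propagate the error backwards using the output weights before updating them.
    delta_hid = [delta_out * w * (1.0 - hj ** 2) for w, hj in zip(net["w2"], h)]
    for j, hj in enumerate(h):
        net["w2"][j] -= lr * delta_out * hj
    net["b2"] -= lr * delta_out
    for j, dj in enumerate(delta_hid):
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * dj * xi
        net["b1"][j] -= lr * dj


def train(net, samples, epochs=200, lr=0.1):
    """Repeat the update over the training set for a user-defined number of epochs."""
    for _ in range(epochs):
        for x, t in samples:
            backprop_step(net, x, t, lr)


# Invented, already standardized two-feature patterns with binary targets.
samples = [([-1.0, -0.8], 0), ([-0.5, -1.1], 0), ([0.9, 1.2], 1), ([1.1, 0.4], 1)]
net = init_network(n_inputs=2, n_hidden=3)
train(net, samples)
print([round(forward(net, x)[1], 3) for x, _ in samples])
```

The scaled conjugate gradient method mentioned earlier in the chapter replaces the fixed-step update used here with a more elaborate search; the gradient computed by backpropagation is the same in both cases.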
In the same way, the optimization of an ANN is closely related to the bias-variance trade-off. A too simple or inflexible model will have a large bias and may lead to underfitting. Conversely, models with a high variance provide high flexibility but could adapt to the noise present in the training set, leading to overfitting. Bias and variance are both complementary characteristics and thus the best generalization is obtained when a compromise between the conflicting requirements of small bias and small variance is achieved [15, 17].
A way to reduce both bias and variance simultaneously is to increase the number of training samples. A larger training set allows model complexity to be increased, which reduces the bias. At the same time, the constraints imposed by the training data become more rigorous, thereby also reducing variance. As mentioned earlier, to achieve this goal the size of the training set should increase in accordance with model complexity. Nevertheless, this requirement cannot always be met in real-world applications because the size of the training set is usually fixed and limited. Therefore, finding the optimum model complexity is a major issue. In order to deal with this optimization problem, a new trade-off arises: simpler models are preferred but a smooth mapping is needed to prevent poor generalization [13, 17]. In this regard, regularization techniques allow the ANN to control the effective complexity of the model by reducing the effective number of adjustable parameters during the training stage. Weight decay and early stopping are common regularization approaches. Weight decay is probably the most widely used and consists of adding a penalty term to the error function in order to penalize complex mappings.
An additional issue regarding the training sample size is the so-called curse of dimensionality. This term refers to the relationship between the size of the training set and the dimension of the feature space, i.e., the number of variables in the input feature vector. The curse of dimensionality states that the number of training samples needed to characterize the underlying problem grows exponentially as the number of input features increases. Therefore, the size of the training dataset must also increase according to the input space dimension in order to enhance generalization ability and avoid overfitting.
As previously stated, the size of the training set in real-world applications is fixed and usually limited, especially in the field of medicine. In this regard, dimensionality reduction techniques help to address the problem of overfitting due to the curse of dimensionality. An ANN fed with fewer input features needs to optimize fewer parameters (weights) and these are more likely to be properly characterized by a limited training dataset. The aim of dimensionality reduction algorithms is to compose a reduced subset of the most significant features governing a model. To achieve this goal, a fitness metric (relevancy, redundancy, completeness, or accuracy, among others) is used to obtain the optimum feature subset. There are several dimensionality reduction methodologies, but principal component analysis and stepwise feature selection are likely the most widely used in medical applications.
In order to estimate the actual prediction ability of an ANN, the learning, model selection, and performance assessment stages must be carried out using independent datasets, i.e., the so-called training, validation, and test datasets. The goal of model selection is to obtain the optimum network configuration by comparing the performance of several ANNs with different values of the design parameters, i.e., the number of neurons in the hidden layer and usually the regularization parameter. The hold-out method is commonly used for this purpose because it avoids a biased estimation of the results. In the hold-out method, the initial population/dataset is split into three independent groups for training, validation, and testing purposes. The network weights are adjusted in the training set for different configurations of the adjustable parameters specified by the researcher, i.e., multiple ANNs are actually trained, whereas the performance of each individual ANN is computed in the validation set to determine the optimum ANN for the problem under study. Since the weights are initialized randomly, the training process is frequently repeated several times to avoid a potential bias linked with this arbitrary decision. Thus, the performance metric for model selection from the validation set is averaged across all the repetitions. Nevertheless, this procedure can also lead to some overfitting, so the selected optimum ANN has to be further assessed in an independent test set composed of unseen data samples.
It is worth noting that, unfortunately, several studies from the literature do not implement a suitable validation of their proposed methodology, providing biased, overoptimistic results. On the other hand, sometimes the initial dataset is not large enough to properly derive the three independent subpopulations. In such cases, cross-validation techniques allow for training and validating the models in the same training set without biasing the selection of the optimum model. Bootstrap, leave-one-out, and k-fold cross-validation are common algorithms to deal with small populations under study.
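Both regularization approaches can be sketched briefly. The quadratic penalty below is the standard form of weight decay, with `lam` as the regularization parameter. The early stopping helper uses a simple patience rule, which is an assumption made for illustration (several stopping criteria exist), and the validation errors are invented.

```python
def weight_decay_penalty(weights, lam):
    """Penalty term added to the error function: (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)


def regularized_error(data_error, weights, lam):
    """Total error minimized during training when weight decay is used."""
    return data_error + weight_decay_penalty(weights, lam)


def decayed_gradient(grad, weights, lam):
    """Gradient of the regularized error: each weight is pulled towards zero."""
    return [g + lam * w for g, w in zip(grad, weights)]


def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch whose weights are kept.

    Training stops once the validation error has not improved
    for `patience` consecutive epochs.
    """
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


print(regularized_error(1.20, [1.0, -2.0], lam=0.1))
# Invented validation errors, one per epoch.
print(early_stopping_epoch([0.90, 0.70, 0.60, 0.65, 0.70, 0.80]))
```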
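A common way to picture the curse of dimensionality is to divide the range of each input feature into a fixed number of intervals and count the resulting cells of the feature space, each of which would need at least one training sample. The sketch below assumes 10 intervals per feature purely for illustration.

```python
def cells_needed(bins_per_feature, n_features):
    """Number of cells when each feature axis is divided into equal intervals."""
    return bins_per_feature ** n_features


for d in (1, 2, 5, 10):
    print(d, "features ->", cells_needed(10, d), "cells")
```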
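A greedy forward variant of stepwise feature selection can be sketched as follows. It is a simplified illustration: only the forward (inclusion) step is implemented, the `score` argument stands for whatever fitness metric is chosen, and the toy score with its per-feature gains is entirely invented.

```python
def forward_selection(features, score, max_features=None):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        scored = [(score(selected + [f]), f) for f in remaining]
        cand_score, candidate = max(scored, key=lambda pair: pair[0])
        if cand_score <= best_score:
            break  # no remaining feature improves the fitness metric
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score


# Invented toy fitness: per-feature gain minus a cost for every feature kept.
GAIN = {"ODI3": 0.5, "BMI": 0.2, "age": 0.05}


def toy_score(subset):
    return sum(GAIN[f] for f in subset) - 0.1 * len(subset)


print(forward_selection(["age", "BMI", "ODI3"], toy_score))
```

In practice the score would typically be a validation metric of a model trained on the candidate subset, which makes each step considerably more expensive than in this toy example.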
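The hold-out procedure can be sketched in two steps: a random three-way split, and the selection of the configuration with the lowest validation error averaged over repeated trainings. The 60/20/20 proportions, the helper names, and the validation errors are assumptions made for illustration only.

```python
import random


def hold_out_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split the data into independent training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def select_configuration(val_errors_by_config):
    """Pick the configuration with the lowest mean validation error over repetitions."""
    return min(
        val_errors_by_config,
        key=lambda cfg: sum(val_errors_by_config[cfg]) / len(val_errors_by_config[cfg]),
    )


train_set, val_set, test_set = hold_out_split(range(100))
print(len(train_set), len(val_set), len(test_set))

# Invented validation errors from repeated trainings with random initial weights.
errors = {"5 hidden": [0.30, 0.34], "10 hidden": [0.25, 0.27], "20 hidden": [0.26, 0.40]}
print(select_configuration(errors))
```

The test set is used only once, after the configuration has been chosen, to report the final performance.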
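As an illustration of k-fold cross-validation, the sketch below generates the index sets for each fold; leave-one-out corresponds to setting `k` equal to the number of samples. The function name is hypothetical and the samples are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k):
    """Yield (training indices, validation indices) for each of the k folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "validate:", val_idx)
```

Each sample is used for validation exactly once and for training in the remaining folds, so the whole dataset contributes to both tasks without a sample ever being in both sets at the same time.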
ANNs have been applied to model problems in several fields, such as industrial processes optimization, economic and financial modeling, chemistry, physics, biology, or medicine, among others [38–42]. In the framework of SAHS management, automated expert systems based on ANNs have been mainly applied to classify patients suspected of suffering from SAHS (binary classification: no SAHS vs. SAHS), to categorize the severity of the disease (multiclass classification: no SAHS, mild, moderate, and severe), to estimate the AHI (regression of a continuous variable), to detect and quantify respiratory events (normal breathing vs. apneic), and to categorize apneic events (central, obstructive, and mixed). ANNs have been also used to implement automated sleep staging and arousal detection, which are very useful functionalities incorporated in current commercial software applications for sleep analysis. In addition, ANNs play an important role in alertness monitoring systems and they are already integrated in positive airway pressure (PAP)-based treatment devices to fit user’s airflow needs, which are major issues for patients suffering from SBDs.
Most research in the field of SAHS focuses on binary classification in order to determine the presence or absence of the disease. Similarly, some studies have also applied ANNs for multiclass classification in order to characterize SAHS severity according to predefined discrete categories. Conversely, although it provides more information about the severity of the disease, only a few studies have estimated the AHI using a regression approach (continuous function).
Regarding the nature of the input data, ANNs aimed at assisting in SAHS diagnosis first used anthropometric and clinical features to compose input patterns [19, 43]. However, the increasing research in the context of biomedical signal processing allows physicians to derive essential information directly from signals monitored during the PSG. In this regard, blood oxygen saturation (SpO2) from oximetry and heart rate variability (HRV) from electrocardiogram (ECG) are the most widely used. In addition, airflow from both thermistor and nasal pressure, abdominal and chest movements, snoring sounds, and EEG have also been studied. Alternatively, in order to avoid sleep studies, automated signal processing of speech recordings and even image analysis for facial recognition have also been assessed as alternatives to PSG-derived signals to assist in the detection of SAHS.
The main goal of computer-aided tools for SAHS management is to simplify and speed up the diagnostic methodology, in order to alleviate large waiting lists and increase accessibility of patients to diagnostic resources. Current research focuses on analyzing a reduced set of biomedical recordings, which are preferably obtained at the patient’s home using existing commercial portable devices. Therefore, powerful tools are needed to obtain as much information as possible from this reduced subset of signals. In this regard, ANNs allow researchers to manage several features derived from the signals under study and thus they are suitable and reliable tools to help physicians in the diagnosis of SAHS. In order to obtain complementary information, different automated signal processing methods have been applied, such as common statistics (mean, median, variance, skewness, kurtosis), time domain analyses (detection and quantification of respiratory events), frequency domain analyses (Fourier analysis, time-frequency maps, wavelet transform, bispectrum), and/or nonlinear methods (entropy measures, Poincaré plots, complexity measures), among others, either individually or jointly.
ANNs were first used in the context of SAHS detection in the late 1990s, when Kirby et al. and El-Solh et al. carried out retrospective analyses aimed at designing ANNs based on clinical and anthropometric variables from patients showing clinical suspicion of SAHS. Table 1 summarizes the main characteristics of significant studies carried out during the last decade focused on applications of ANNs aimed at assisting in SAHS diagnosis. In the study by Kirby et al., 23 clinical variables fed a generalized regression neural network (GRNN), which is a kind of RBF network, for binary classification (SAHS vs. no SAHS). The authors reported 98.9% sensitivity, 80.0% specificity, and 91.3% accuracy (95% CI: 86.8–95.8). Similarly, El-Solh et al. used clinical and anthropometric variables in order to estimate the AHI by means of an MLP ANN. Using cutoffs of 10, 15, and 20 events per hour (e/h) for a positive diagnosis of SAHS, the sensitivity-specificity pairs were 94.9–64.7%, 95.3–60.0%, and 95.5–73.4%, respectively. Both studies achieved notably high sensitivity but poor to moderate specificity, which is a common trend of pattern recognition techniques in the context of SAHS.
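The way an estimated AHI is turned into a diagnosis at a given cutoff, and how sensitivity, specificity, and accuracy follow from it, can be sketched as below. The AHI values are invented and do not reproduce any of the cited results; the function names are hypothetical.

```python
def classify_by_ahi(ahi_values, cutoff):
    """Label a subject as SAHS positive (1) when the AHI reaches the cutoff (e/h)."""
    return [1 if ahi >= cutoff else 0 for ahi in ahi_values]


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from paired binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Se": tp / (tp + fn) if tp + fn else float("nan"),
        "Sp": tn / (tn + fp) if tn + fp else float("nan"),
        "Acc": (tp + tn) / len(pairs),
    }


# Invented values: AHI from PSG (reference) and AHI estimated by a regression ANN.
ahi_psg = [2, 12, 30, 8, 18]
ahi_ann = [4, 9, 28, 11, 22]
for cutoff in (10, 15, 20):
    metrics = diagnostic_metrics(classify_by_ahi(ahi_psg, cutoff),
                                 classify_by_ahi(ahi_ann, cutoff))
    print(cutoff, metrics)
```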
|Author (year) Ref.||ANN model||Purpose||Target function/class(es)||Input variables||Performance metrics|
|Part I – Anthropometric, clinical, and SpO2 features|
|Kirby et al. (1999) ||GRNN||Classification (binary)||No SAHS vs. SAHS (AHI ≥10 e/h)||Clinical||98.9% Se|
|El-Solh et al. (1999) ||MLP||Regression||AHI estimation||Clinical and anthropometric||CC = 0.852|
cutoff 10 e/h
cutoff 15 e/h
cutoff 20 e/h
|Su et al. (2012) ||MMTS|
|Normal/mild/moderate/severe||Anthropometric and questionnaire data||84.38% average Acc|
55.33% average Acc
34.04% average Acc
47.22% average Acc
53.82% average Acc
63.54% average Acc
13.20% average Acc
|Wang et al. (2016) ||FFBB|
|No SAHS/mild/moderate/severe||Anthropometric and questionnaire data||47.5% Acc, 0.145 k, 0.288 g-mean|
43.4% Acc, 0.181 k, 0.280 g-mean
|Karamanli et al. (2016) ||MLP||Classification||No SAHS vs. SAHS (AHI ≥10 e/h)||Sex, age, BMI, snoring status||86.6% Acc|
|Polat et al. (2008) ||FFBB|
|No SAHS vs. SAHS|
(AHI ≥ 5 e/h)
|In-lab PSG-derived||100% Se, 93.5% Sp, 95.1% Acc, 0.96 AUC|
|Ghandeharioun et al. (2015) ||SOM||Classification||No SAHS/mild/moderate/severe||In-lab PSG-derived and anthropometric||94.2% Se, 97.8% Sp, 96.5% Acc|
|Marcos et al. (2008) ||MLP||Classification||No SAHS vs. SAHS (AHI ≥10 e/h)||Nonlinear features from SpO2||89.8% Se, 79.4% Sp, 85.5% Acc|
|Marcos et al. (2008) ||RBF-KM|
|No SAHS vs. SAHS|
(AHI ≥10 e/h)
|Nonlinear features from SpO2||89.4% Se, 81.4% Sp, 86.1% Acc|
86.6% Se, 81.9% Sp, 84.7% Acc
89.8% Se, 79.4% Sp, 85.5% Acc
|Almazaydeh et al. (2012) ||MLP||Classification||Healthy vs. SAHS (AHI ≥5 e/h)||ODI3, delta index, CTM from SpO2||87.5% Se, 100% Sp, 93.3% Acc|
|Marcos et al. (2010) ||MLP|
|No SAHS vs. SAHS|
(AHI ≥10 e/h)
|Statistical, spectral, and nonlinear features from SpO2||86.4% Se, 62.8% Sp, 76.8% Acc|
87.8% Se, 82.4% Sp, 85.6% Acc
|Morillo et al. (2012) ||BY-MLP||Classification||No SAHS vs. SAHS (AHI ≥10 e/h)||Time, stochastic, spectral, and nonlinear features from SpO2||92.4% Se, 95.9% Sp, 93.9% Acc|
|Huang et al. (2015) ||FFBB|
|No SAHS vs. SAHS|
(AHI ≥5 e/h)
|ODI4 from SpO2||88.0% Se, 93.3% Sp, 90.7% Acc|
80.7% Se, 79.3% Sp, 80.0% Acc
90.7% Se, 86.0% Sp, 88.3% Acc
|Part II – Features from ECG, snoring, and EEG recordings|
|Khandoker et al. (2009) ||SVM|
|Healthy vs. SAHS|
(AHI ≥5 e/h)
|Wavelet decomposition of HRV and EDR from ECG||100% Se, 100% Sp, 100% Acc|
90% Se, 100% Sp, 93% Acc
80% Se, 90% Sp, 83% Acc
80% Se, 50% Sp, 70% Acc
|Khandoker et al. (2008) ||FFBB||Classification|
|Apneic vs. Normal|
Hypopnea vs. apnea
Obstructive vs. Central
|ECG||87.6% Se, 95.5% Sp, 95.1% Acc|
86.1% Se, 78.7% Sp, 83.4% Acc
93.7% Se, 99.2% Sp, 98.9% Acc
|Acharya et al. (2011) ||FFBB||Classification|
|Normal/apnea/hypopnea||Nonlinear measures from ECG||95.0% Se, 100% Sp, 99.1% Acc (normal)|
88.0% Se, 90.0% Sp, 96.5% Acc (apnea)
80.0% Se, 89.5% Sp, 87.8% Acc (hypopnea)
|Lweesky et al. (2011) ||FFBB||Classification||Normal breathing vs. apnea epochs||P-wave features from ECG||90.0% Se, 94.2% Sp, 92.0% Acc|
|Mendez et al. (2009) ||FFBB||Classification|
|Normal breathing vs. apnea|
(AHI ≥5 e/h)
|Time and spectral features from RRi and QRS area time series||89.0% Se, 86.0% Sp, 88.0% Acc (m-by-m)|
100% Acc (record)
|Nguyen et al. (2014) ||ANN (NS)|
|Normal sleep vs. sleep apnea epochs||HRV complexity by means of RQA||85.6% Se, 79.1% Sp, 83.2% Acc|
93.7% Se, 65.9% Sp, 84.1% Acc
86.4% Se, 83.5% Sp, 85.3% Acc
|Fiz et al. (2010) ||MLP||Classification|
|No SAHS vs. SAHS|
AHI ≥5 e/h
AHI ≥15 e/h
|Time and spectral features from snoring recordings||87.0% Se, 71.0% Sp|
80.0% Se, 90.0% Sp
|Nguyen and Won (2015) ||f-MLP|
|Normal breathing vs. snoring||Spectral content snoring recordings||96.0% overall Acc|
82.0% overall Acc
|Tagluk et al. (2011) ||MLP||Classification||Normal vs. SAHS EEG epochs||Bispectral analysis of EEG||94.1% Se, 98.2% Sp, 96.2% Acc|
|Liu et al. (2008) ||ART2||Classification||Healthy vs. SAHS subjects (AHI ≥5 e/h)||EEG energy in theta (Fourier transform) and pupil size||91.0% Acc|
|Lin et al. (2006) ||FFBB||Classification||No SAHS vs. SAHS epochs||EEG power in delta, theta, alpha, and beta using DWT||69.64% Se, 44.44% Sp|
|Akṣahin et al. (2012) ||FFBB|
|Obstructive/central/healthy patients||Coherence and mutual information of EEG||0.1450 MRAE error|
0.3692 MRAE error
0.2282 MRAE error
|Part III – Features from thoracic/abdominal effort, airflow, and combined features|
|Fontela et al. (2005) ||BY-MLP||Classification|
|Obstructive/central/mixed||Wavelet decomposition of thoracic effort||83.78% Acc (overall)|
80.90% Acc (obstr.)
89.95% Acc (centr.)
80.48% Acc (mixed)
|Tagluk et al. (2010) ||FFBB||Classification|
|Obstructive/central/mixed||Wavelet decomposition of abdominal effort||78.5% Acc (overall)|
73.42% Acc (obstr.)
94.23% Acc (centr.)
66.16% Acc (mixed)
|Berdiñas et al. (2012) ||Ensemble ANNs||Classification|
|Obstructive/central/mixed||Wavelet decomposition of thoracic effort||90.27% Acc (overall)|
94.62% Acc (obstr.)
95.47% Acc (centr.)
90.45% Acc (mixed)
|Weinreich et al. (2008) ||FFBB||Classification|
|OA/OH/CSR/normal breathing||Spectral entropy of airflow||91.5% Acc (overall)|
90.2% Se, 90.9% Sp
(OA vs. CSR)
91.3% Se, 94.6% Sp
(OH vs. normal)
|Várady et al. (2002) ||FFBB||Classification|
|Normal/apnea/hypopnea||IRA and IRI from airflow and RIP||93.0% Acc (overall)|
98.4% Se, 94.0% Sp (normal)
78.7% Se, 91.0% Sp (hypopnea)
97.0% Se, 88.7% Sp (apnea)
|Belal et al. (2011) ||MLP||Classification||Non-apneic vs. apneic event||Correlation and PCA of HR, RR, and SpO2||81.8% Se, 75.8% Sp, 76.8% Acc|
|Part IV – ANNs for regression|
|Marcos et al. (2012) ||MLP||Regression||AHI estimation||Spectral and nonlinear features from SpO2||ICC = 0.91|
cutoff 5 e/h
91.8% Se–58.8% Sp
cutoff 10 e/h
89.6% Se–81.3% Sp
cutoff 15 e/h
94.9% Se–90.9% Sp
|Gutiérrez-Tobal et al. (2013) ||MLP|
|Regression||AHI||Statistical, spectral, nonlinear features from airflow||ICC = 0.849 ± 0.002|
cutoff 10 e/h
92.5% Se, 89.5% Sp, 91.5% Acc
ICC = 0.748 ± 0.037
cutoff 10 e/h
92.5% Se, 57.9% Sp, 81.4% Acc
|de Silva et al. (2011) ||FFBB||Regression||AHI||Pitch, formant, and structure-based features from snoring sounds and the neck circumference||Cutoff 15 e/h|
91 ± 6% Se, 89 ± 5% Sp
Cutoff 30 e/h
86 ± 9% Se, 88 ± 5% Sp
|de Silva et al. (2012) ||FFBB||Regression||AHI||Pitch, formant, and structure-based features from snoring sounds and the neck circumference||Female, AHI ≥ 15 e/h|
91 ± 10% Se, 88 ± 5% Sp
Male, AHI ≥ 15 e/h
91 ± 6% Se, 89 ± 5% Sp
Comb., AHI ≥ 15 e/h
84 ± 10% Se, 83 ± 13% Sp
|Emoto et al. (2012) ||MLP||Regression||Breathing sound signal||Preceding samples of the breathing signal||89.2% average Se|
87.4% average Sp
Recent studies have built updated predictive models based on anthropometric and clinical data, since the characteristics of patients currently referred to sleep units have changed compared to those of patients in the last decade. In this regard, Su et al. proposed the multiclass Mahalanobis-Taguchi system (MMTS) and used both anthropometric information and questionnaire data in order to classify patients into normal subjects or mild, moderate, or severe SAHS patients. Additionally, LR, conventional feed-forward backpropagation (FFBB) and LVQ ANNs, support vector machines (SVM), a C4.5 decision tree (DT), and rough sets (RS) were also applied for comparison purposes. The proposed MMTS significantly outperformed the competing classifiers, reaching an average accuracy of 84.38% (normal: 87.50%; mild: 66.67%; moderate: 100%; severe: 83.33%). In particular, the FFBB and LVQ ANNs reached average accuracies of 34.04% (normal: 25.00%; mild: 33.33%; moderate: 11.11%; severe: 66.70%) and 47.22% (normal: 50.00%; mild: 16.67%; moderate: 22.22%; severe: 100%), respectively. Similarly, in a recent study carried out by Wang et al., several automated classifiers fed with anthropometric and questionnaire-based variables were also assessed to predict SAHS. The authors proposed a novel classifier based on fuzzy decision trees (FDT) to detect SAHS. In addition, LR, ANNs (backpropagation and LVQ), an SVM, and a conventional DT were used as benchmarks for comparison purposes. The proposed FDT achieved the highest performance (81.82% accuracy, 0.554 kappa, and 0.673 geometric mean). However, a synthetic oversampling approach (SMOTE) was used to deal with the common imbalance between SAHS positive and SAHS negative classes, which was not used in the remaining benchmark methods. Without SMOTE, FDTs slightly outperformed the backpropagation ANN (48.22% vs. 47.53% accuracy, 0.186 vs. 0.175 kappa, and 0.300 vs. 0.288 geometric mean), whereas the highest performance was achieved by the conventional LR approach (49.57% accuracy, 0.207 kappa, and 0.320 geometric mean). Karamanli et al. recently assessed an MLP ANN trained to classify healthy and SAHS patients using sex, age, BMI, and snoring status as input variables, reporting 86.6% accuracy. Nevertheless, it is important to highlight that input features derived automatically from cardiorespiratory and/or neuromuscular signals have been used predominantly, while anthropometric and clinical variables have been used marginally.
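The agreement metrics quoted above (accuracy, kappa, and geometric mean) can all be derived from a confusion matrix. The following is a minimal sketch under stated assumptions: the geometric mean is taken here as the geometric mean of the per-class accuracies, which may differ from the exact definition used by Wang et al., and the example counts are hypothetical rather than taken from any of the reviewed studies.

```python
import math


def summarize_confusion(cm):
    """Summarize a square confusion matrix.

    cm[i][j] is the number of subjects of true class i assigned to class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    # Overall accuracy: fraction of subjects on the main diagonal.
    accuracy = sum(cm[i][i] for i in range(k)) / total
    # Per-class accuracy (recall of each class) and its unweighted mean.
    per_class = [cm[i][i] / row_totals[i] for i in range(k)]
    average_accuracy = sum(per_class) / k
    # Cohen's kappa: agreement corrected for chance agreement.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - chance) / (1 - chance)
    # Geometric mean of per-class accuracies (assumed definition).
    g_mean = math.prod(per_class) ** (1 / k)
    return {
        "accuracy": accuracy,
        "average_accuracy": average_accuracy,
        "kappa": kappa,
        "g_mean": g_mean,
    }


# Hypothetical counts for normal/mild/moderate/severe (illustration only).
example = [
    [7, 1, 0, 0],
    [1, 4, 1, 0],
    [0, 0, 9, 0],
    [0, 1, 1, 10],
]
print(summarize_confusion(example))
```

As a consistency check on the figures reported by Su et al., the unweighted mean of the per-class accuracies of the MMTS, (87.50 + 66.67 + 100 + 83.33)/4 = 84.375, matches the reported average accuracy of 84.38%.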
In the study by Polat et al., different expert systems were assessed to classify patients with suspicion of SAHS using clinical features derived from in-lab polysomnography, including the arousal index and the AHI. An FFBB ANN reached 100% sensitivity, 93.55% specificity, 95.12% accuracy, and 0.96 AUC, a slightly lower and more unbalanced performance than that of a DT-based classifier (91.67% sensitivity, 96.55% specificity, 95.12% accuracy, and 0.97 AUC). This work assessed the usefulness of different expert systems in the context of SAHS, although using input variables computed from the whole PSG study limits its usefulness as a screening test for the disease. Similarly, Ghandeharioun et al. trained a 4-class SOM to classify patients suspected of suffering from SAHS into healthy, mild, moderate, and severe categories using PSG-derived and anthropometric variables. The proposed algorithm reached 94.2% sensitivity, 97.8% specificity, and 96.5% accuracy, although neither validation nor test stages were described.
Regarding SAHS diagnosis by means of ANNs, the SpO2 signal from nocturnal oximetry is probably the most widely used biomedical data source. In the study by Marcos et al., approximate entropy (ApEn), central tendency measure (CTM), and Lempel-Ziv complexity were applied to the SpO2 nocturnal profile to estimate irregularity, variability, and complexity, respectively. These nonlinear measures composed the input feature patterns to feed an MLP ANN for SAHS binary classification. A sensitivity of 89.8%, specificity of 79.4%, and accuracy of 85.5% were obtained in an independent test set, significantly improving the diagnostic performance of conventional oximetric indices. The same authors reached similar diagnostic performance using an RBF ANN in the same context: average accuracies of 86.1 ± 1.1% (89.4 ± 1.6% sensitivity, 81.4 ± 1.7% specificity), 84.7 ± 1.2% (86.6 ± 2.8% sensitivity, 81.9 ± 2.0% specificity), and 85.5 ± 0.0% (89.8 ± 0.0% sensitivity, 79.4 ± 0.0% specificity) were achieved using k-means, fuzzy c-means, and orthogonal least squares training algorithms, respectively. An MLP ANN was also assessed in the study by Almazaydeh et al. to perform binary classification. The ANN was fed with the conventional oxygen desaturation index of 3% (ODI3), the delta index, and the CTM from overnight oximetry recordings, reaching 87.5% sensitivity, 100% specificity, and 93.3% accuracy in a test set from the publicly available PhysioNet dataset.
Bayesian training has been applied to deal with overfitting of ANNs. In addition, Bayesian inference allows the user to quantify the influence of each input feature on the output of the model. The effectiveness of this approach was assessed in the study by Marcos et al. A sensitivity of 87.76%, specificity of 82.39%, and accuracy of 85.58% were reached, significantly improving the performance achieved using the conventional maximum likelihood criterion (86.42% sensitivity, 62.83% specificity, and 76.81% accuracy). Similarly, Sánchez-Morillo et al. applied a feedforward probabilistic ANN to classify patients into SAHS negative or SAHS positive using time, stochastic, spectral, and nonlinear features from nocturnal SpO2 recordings. A sensitivity of 92.42%, specificity of 95.92%, and accuracy of 93.91% were reached in a single training set using leave-one-out cross-validation. In a recent study by Huang et al., the automated analysis of the oxygen desaturation index of 4% (ODI4) from oximetry by means of a DT was proposed as an abbreviated method for SAHS screening. In this work, the authors assessed several pattern recognition techniques for automated diagnosis, including some ANNs, such as conventional backpropagation, learning vector quantization (LVQ), and adaptive network-based fuzzy inference system (ANFIS). The proposed DT reached 98.67% sensitivity, 90.67% specificity, and 94.67% accuracy, outperforming backpropagation (88.00% sensitivity, 93.33% specificity, 90.67% accuracy), ANFIS (90.67% sensitivity, 86.00% specificity, 88.33% accuracy), and LVQ (80.67% sensitivity, 79.33% specificity, 80.00% accuracy) ANNs. In this study, conventional LR and k-nearest neighbors (k-NN) combined with genetic algorithms (GAs) and particle swarm optimization (PSO) also outperformed ANNs.
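To illustrate how two of the nonlinear measures mentioned above can be computed from an oximetry recording, the sketch below implements the CTM (from the second-order difference plot) and ApEn in their standard textbook form. The parameter values (radius, embedding dimension `m`, tolerance factor) and the synthetic SpO2 profiles are assumptions chosen for illustration, not the settings used in the reviewed studies.

```python
import math
import random
import statistics


def central_tendency_measure(x, radius):
    """Fraction of second-order difference points within `radius` of the origin."""
    inside = 0
    for i in range(len(x) - 2):
        d1 = x[i + 1] - x[i]
        d2 = x[i + 2] - x[i + 1]
        if math.hypot(d1, d2) < radius:
            inside += 1
    return inside / (len(x) - 2)


def approximate_entropy(x, m=2, r_factor=0.2):
    """ApEn(m, r) with r = r_factor * SD(x); self-matches are counted."""
    n = len(x)
    tol = r_factor * statistics.pstdev(x)

    def phi(dim):
        vectors = [x[i:i + dim] for i in range(n - dim + 1)]
        logs = []
        for a in vectors:
            # Count vectors whose Chebyshev distance to `a` is within tolerance.
            matches = sum(
                1 for b in vectors
                if max(abs(p - q) for p, q in zip(a, b)) <= tol
            )
            logs.append(math.log(matches / len(vectors)))
        return sum(logs) / len(logs)

    return phi(m) - phi(m + 1)


# Synthetic SpO2 profiles (hypothetical): a flat trace and a noisy one.
rng = random.Random(0)
stable = [96.0] * 120
unstable = [94.0 + rng.gauss(0.0, 2.0) for _ in range(120)]

for name, signal in (("stable", stable), ("unstable", unstable)):
    print(name,
          round(central_tendency_measure(signal, radius=1.0), 3),
          round(approximate_entropy(signal), 3))
```

A flat trace yields the maximum CTM and zero ApEn, whereas an irregular trace yields a lower CTM and a higher ApEn, which is the kind of contrast these features are intended to capture.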
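Leave-one-out cross-validation, as mentioned above, can be sketched as follows. This is only an illustration of the validation loop: a nearest-centroid rule is swapped in as a stand-in classifier (it is not the probabilistic ANN of the reviewed study), and the two-feature dataset is hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: assign the label of the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out(features, labels, predict=nearest_centroid_predict):
    """Hold out each subject once, train on the rest, and pool the predictions."""
    tp = tn = fp = fn = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = predict(train_x, train_y, features[i])
        if labels[i] == 1:
            tp += predicted == 1
            fn += predicted == 0
        else:
            tn += predicted == 0
            fp += predicted == 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(features),
    }


# Hypothetical subjects: [desaturation index, irregularity measure]; 1 = SAHS.
features = [[1.0, 0.20], [2.0, 0.30], [1.5, 0.25], [2.5, 0.20],
            [20.0, 0.90], [25.0, 1.10], [30.0, 1.00], [22.0, 0.95]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(leave_one_out(features, labels))
```

Because every subject is used for testing exactly once, the pooled sensitivity, specificity, and accuracy are computed over the whole dataset, which is why the approach is common when only a single, small dataset is available.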
ECG recordings have also been widely used to assist in SAHS diagnosis. In the study by Khandoker et al., the spectral content of HRV and ECG-derived respiration (EDR) time series from single-lead ECG recordings was analyzed by means of the wavelet transform. The authors proposed a binary SVM for classification (healthy vs. SAHS) and compared its performance with LDA, k-NN, and PNN. The proposed SVM classifier reached 100% accuracy in the test set, whereas the PNN showed poor classification performance (80% sensitivity, 50% specificity, and 70% accuracy) probably due to a suboptimal setting of the spread parameter (σ) of the Gaussian function. In a previous study by Khandoker et al., the authors analyzed ECG short-term epochs from nocturnal PSG by means of wavelet decomposition to classify segments into normal breathing, obstructive apnea, and central apnea using a feedforward ANN. The authors reported accuracies of 95.10% in the classification of apneic and normal breathing epochs, 83.40% in the detection of hypopneas, and 98.96% in the classification of obstructive and central apneas. Similarly, Acharya et al. implemented an FFBB ANN using nonlinear measures from the ECG (ApEn, fractal dimension, correlation dimension, largest Lyapunov exponent, and Hurst exponent) to detect apneas, hypopneas, and normal breathing segments. The proposed ANN reached 99.1% accuracy (95.0% sensitivity, 100% specificity), 96.5% accuracy (88.0% sensitivity, 90.0% specificity), and 87.8% accuracy (80.0% sensitivity, 89.5% specificity) in the classification of normal breathing, apneas, and hypopneas, respectively. Lweesky et al. focused on the characterization of the P-wave of the ECG in order to feed an ANN aimed at discerning between apnea and normal breathing. The authors reported 90.0% sensitivity, 94.2% specificity, and 92.0% accuracy. In a previous study by Méndez et al., both time and spectral features from the R-to-R interval (RRi) and QRS area time series were used as inputs to an FFBB ANN aimed at discriminating between apneic and nonapneic segments. A sensitivity of 89%, specificity of 86%, and accuracy of 88% were reached in a minute-by-minute classification, whereas 100% accuracy was achieved when the whole recording was classified as normal or apneic. In a recent study, Nguyen et al. proposed a binary ANN to differentiate apnea from normal sleep based on a heart rate complexity measure obtained by means of the recurrence quantification analysis of HRV recordings. In addition, an SVM classifier and an ensemble combining the decisions from both binary classifiers by means of a confidence score (the weighted sum of the output scores of all binary classifiers) were also assessed. The ensemble reached the highest performance (86.37% sensitivity, 83.47% specificity, 85.26% accuracy), whereas the single ANN (85.57% sensitivity, 79.09% specificity, 83.23% accuracy) and the SVM (93.72% sensitivity, 65.88% specificity, 84.14% accuracy) classifiers reached slightly lower accuracy and a less balanced sensitivity-specificity pair.
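The confidence score described above, a weighted sum of the output scores of the individual binary classifiers, can be sketched in a few lines. The weights, the normalization by the sum of the weights, the decision threshold, and the example scores are all assumptions made for illustration; the reviewed study may have set them differently.

```python
def confidence_score(scores, weights):
    """Weighted sum of classifier output scores, normalized by the total weight."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def ensemble_decision(scores, weights, threshold=0.5):
    """Label the epoch as apneic when the combined score reaches the threshold."""
    return confidence_score(scores, weights) >= threshold


# Hypothetical output scores for one epoch: [ANN, SVM], both scaled to [0, 1].
epoch_scores = [0.62, 0.40]
equal_weights = [0.5, 0.5]
print(confidence_score(epoch_scores, equal_weights))
print(ensemble_decision(epoch_scores, equal_weights))
```

With equal weights the ensemble reduces to averaging the two scores; unequal weights let the more reliable classifier dominate the decision.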
ANNs have also been applied to the detection and characterization of snoring and to the assessment of its reliability in SAHS diagnosis. In the study by Fiz et al., a total of 22 features from time and frequency domains (number of snore episodes, average intensity, and power spectral density parameters) were used as inputs to an MLP ANN. A sensitivity of 87% and a specificity of 71% were achieved using a SAHS cutoff of 5 e/h, whereas 80% sensitivity and 90% specificity were reached for a cutoff of 15 e/h. In a recent study, Nguyen and Won proposed a novel correlational filter ANN (f-MLP) to distinguish normal breathing patterns from snoring patterns during sleep. This ANN implements a correlational filter operation in the frequency domain in a first hidden layer aimed at improving the discriminant power of the spectral content of input patterns, followed by a second feedforward hidden layer. In this study, the authors reported that the f-MLP classifier reached an average accuracy of 96%, outperforming the conventional MLP approach (82% average accuracy).
EEG signals from nocturnal PSG and ANNs have also been used to detect SAHS. Tagluk et al. estimated the quadratic phase coupling of EEG (C3-A2) using bispectral analysis and trained an MLP ANN to detect patients with SAHS. An overall diagnostic accuracy of 96.15% was reached. In the study by Liu et al., both the EEG energy in the theta band and the pupil size were used as inputs to an ANN aimed at discriminating between SAHS patients and healthy subjects. The authors reported 91% overall accuracy in the classification of both groups. Similarly, in the study by Lin et al., the EEG (C3-O1) signal power in the common frequency bands delta, theta, alpha, and beta was estimated by means of the discrete wavelet transform (DWT) and subsequently used to train an FFBB ANN in order to identify SAHS episodes. A sensitivity of 69.64% and a specificity of 44.44% were obtained. The EEG signal has also been used to classify apnea events into obstructive or central. Akṣahin et al. computed the synchronization (coherence and mutual information) between EEG channels (C4-A1 and C3-A2) and fed three different ANN-based binary classifiers: conventional FFBB, RBF, and distributed time-delay (DTD) ANNs. The conventional FFBB ANN reached the highest performance in terms of the mean relative absolute error (MRAE = 0.145).
Features from both thoracic and abdominal effort signals have also been used to classify sleep apneas into obstructive, central, and mixed by means of ANNs. In the study by Fontela-Romero et al., the wavelet coefficients from the DWT of the thoracic effort signal fed a Bayesian feedforward ANN, which achieved a mean accuracy of 83.78 ± 1.90%. Similarly, Tagluk et al. analyzed the abdominal respiration signal by means of the wavelet transform and fed an FFBB ANN aimed at classifying apneic events into obstructive, central, and mixed. The proposed methodology achieved an overall accuracy of 78.5% (obstructive: 73.42%; central: 94.23%; mixed: 66.16%). In a recent study by Guijarro-Berdiñas et al., the thoracic effort signal was used to reach the same goal. The DWT was applied to analyze the frequency content of the signal. The wavelet coefficients composed the input patterns of an ensemble of ANNs, which achieved an overall accuracy of 90.27 ± 0.79% (obstructive: 94.62%; central: 95.47%; mixed: 90.45%).
In the study by Weinreich et al., the spectral entropy was used to analyze the frequency content of airflow recordings and feed an ANN trained to discern among SAHS, Cheyne-Stokes respiration, and normal breathing. An overall accuracy of 91.5% was reached in the classification of airflow patterns into obstructive apneas, periodic respiration, and normal breathing during non-REM sleep. Similarly, Várady et al. trained a feedforward ANN to detect apneic events using respiratory signals. Data from both airflow and respiratory inductance plethysmography were used as inputs to the ANN. Up to 93% of input respiratory patterns were correctly classified into normal, apnea, or hypopnea, although no validation was performed.
ANNs have also been used to combine features from different biomedical recordings. In the study by Belal et al., the correlation coefficients between the heart rate (HR), respiratory rate (RR), and SpO2 signals were computed to detect apnea events in preterm infants in real time. Principal component analysis (PCA) was applied to the correlation coefficients and the components accounting for 70% of the total variance of the input data fed the MLP ANN, yielding 81.85% sensitivity, 75.83% specificity, and 76.78% accuracy.
It is noteworthy that most studies in the context of SAHS use ANNs for classification purposes, whereas only a few studies apply regression ANNs to estimate the AHI. This is a more challenging task but also a more useful approach, since the AHI is currently a standardized parameter widely used by physicians to assess SAHS severity and to decide whether CPAP treatment could be effective. In the aforementioned study by El-Solh et al., the authors compared the agreement of two automated regression approaches with the actual AHI from PSG. Multiple linear regression (MLR) and a regression MLP ANN, both trained with anthropometric and clinical variables, were assessed. Significantly higher correlation was reached using the MLP ANN (0.852 vs. 0.509). In the same way, Marcos et al. used spectral and nonlinear features from nocturnal SpO2 recordings to feed a regression MLP ANN. A high intraclass correlation coefficient was reported (ICC = 0.91), which outperformed the conventional MLR approach (ICC = 0.80). Similarly, in a recent study by Gutiérrez-Tobal et al., regression MLP and RBF ANNs were trained to estimate the AHI from PSG using statistical, spectral, and nonlinear features derived from the airflow signal (thermistor). The estimated AHI from the MLP network reached the highest agreement with the PSG-derived AHI (ICC = 0.849 ± 0.002), outperforming both the RBF and the conventional MLR models.
A snore-based approach has been proposed by de Silva et al. in order to estimate the actual AHI from PSG. Features from the automated analysis of snoring recordings (pitch, first formant, and the quantified recurrence probability density entropy) and the neck circumference were used as inputs to an FFBB ANN to predict the AHI. An average sensitivity of 91 ± 6% and specificity of 89 ± 5% were obtained using a cutoff of 15 e/h for positive SAHS, whereas for a cutoff of 30 e/h, 86 ± 9% sensitivity and 88 ± 5% specificity were achieved. In a similar subsequent study, de Silva et al. proposed this methodology to characterize differences in snoring sounds due to gender and assessed their influence on the performance of a snore-based SAHS screening model. Using an output threshold of 15 e/h, the gender-dependent regression ANN resulted in increased sensitivity (up to 7% higher) and specificity (up to 6% higher) values compared with the gender-neutral model. In the study by Emoto et al., an MLP ANN was used to predict the current value of the breathing sound signal using the preceding samples, i.e., the target output is the current sample, whereas the d-dimensional input feature pattern is composed of the preceding d samples of the breathing signal. In this way, the ANN was applied to distinguish snoring events from normal breathing by comparing the network output with an optimized threshold. The proposed method reached average sensitivity and specificity values of 89.2% and 87.4%, respectively.
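As an illustration of how DWT coefficients can form the input pattern of an ANN, the sketch below applies a multilevel orthonormal Haar decomposition to a short signal. The Haar wavelet, the number of levels, and the synthetic effort signal are assumptions chosen for simplicity; the reviewed studies do not necessarily use this wavelet family.

```python
import math


def haar_dwt(signal, levels):
    """Return [detail_1, ..., detail_L, approximation_L] (orthonormal Haar DWT)."""
    scale = 1 / math.sqrt(2)
    approx = list(signal)
    bands = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        # Detail coefficients capture the high-frequency content at this level.
        bands.append([(a - b) * scale for a, b in pairs])
        # Approximation coefficients are decomposed again at the next level.
        approx = [(a + b) * scale for a, b in pairs]
    bands.append(approx)
    return bands


def input_pattern(signal, levels=3):
    """Concatenate all coefficients into a single feature vector."""
    return [c for band in haar_dwt(signal, levels) for c in band]


# Synthetic respiratory effort segment (hypothetical), 16 samples long.
effort = [math.sin(0.4 * i) for i in range(16)]
pattern = input_pattern(effort, levels=3)
print(len(pattern))
print(round(sum(c * c for c in pattern), 6), round(sum(x * x for x in effort), 6))
```

Because the transform is orthonormal, the energy of the coefficients equals the energy of the original segment when its length is a multiple of 2 raised to the number of levels, which is a convenient sanity check.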
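The selection of principal components by explained variance, as described above, can be sketched as follows. The function only covers the selection step, assuming the eigenvalues of the covariance matrix are already available; the eigenvalues in the example are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.70):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues of the covariance matrix of the input features.
eigenvalues = [1.5, 0.9, 0.4, 0.2]
print(components_for_variance(eigenvalues, target=0.70))
```

In this example the first component explains 50% of the variance and the first two explain 80%, so two components would be retained as inputs to the network.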
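Regression ANNs are commonly assessed both by agreement with the PSG-derived AHI and by the diagnostic performance obtained when the estimated AHI is thresholded at clinical cutoffs. A minimal sketch of the second step follows; the AHI values are hypothetical and the function name is introduced here only for illustration.

```python
def diagnostic_performance(ahi_psg, ahi_estimated, cutoff):
    """Sensitivity, specificity, and accuracy of an AHI estimate at one cutoff."""
    tp = tn = fp = fn = 0
    for actual, estimated in zip(ahi_psg, ahi_estimated):
        is_positive = actual >= cutoff       # diagnosis from PSG
        called_positive = estimated >= cutoff  # diagnosis from the estimate
        if is_positive and called_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Hypothetical AHI values in events per hour (illustration only).
ahi_psg = [2, 4, 8, 12, 18, 25, 40, 55]
ahi_estimated = [3, 9, 11, 10, 14, 27, 35, 60]
for cutoff in (5, 10, 15):
    print(cutoff, diagnostic_performance(ahi_psg, ahi_estimated, cutoff))
```

The same estimate can therefore yield different sensitivity-specificity pairs at each cutoff, which is why the reviewed studies report performance separately for every threshold.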
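The input-target arrangement described for the one-step-ahead predictor, where the preceding d samples are used to predict the current sample, can be sketched as a simple sliding window. The signal values and the helper name are hypothetical.

```python
def make_lagged_patterns(signal, d):
    """Build (input, target) pairs: d preceding samples predict the current one."""
    inputs = [signal[i:i + d] for i in range(len(signal) - d)]
    targets = [signal[i + d] for i in range(len(signal) - d)]
    return inputs, targets


# Hypothetical breathing sound samples (illustration only).
breathing = [0.1, 0.4, 0.3, 0.7, 0.5]
inputs, targets = make_lagged_patterns(breathing, d=3)
for pattern, target in zip(inputs, targets):
    print(pattern, "->", target)
```

A signal of N samples yields N - d training patterns, each of dimension d.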
In order to identify and quantify the number of respiratory events per hour of sleep and derive the AHI, several neuromuscular and cardiorespiratory recordings from the overnight PSG have to be analyzed. However, the interpretation of a PSG is a complex and laborious task even for trained personnel. In this regard, ANNs have proven to be reliable and accurate tools to analyze both the macrostructure (automated sleep staging) and the microstructure (transient pattern detection) of sleep. In the context of sleep staging, nonlinear dynamic measures from EEG in combination with pattern classification algorithms have been shown to reach clinically significant results in sleep disorders diagnosis, treatment monitoring, and drug efficacy assessment. In fact, a number of automated algorithms are currently implemented in commercial software tools for PSG analysis. Nevertheless, the performance of automated pattern recognition algorithms varies greatly depending on the number of stages involved in the classification task, from 2 (wake vs. sleep) to 5 (wake, REM, N1-N3) states (6 classes if the conventional Rechtschaffen and Kales classification is used). In addition, the accuracy is also influenced by the number and kind of recordings involved in the classification task (EEG, EOG, and/or EMG). Table 2 summarizes the main characteristics of significant studies focused on applications of ANNs for automated sleep staging, arousal quantification, and drowsiness detection.
|Author (year) Ref.||ANN model||Purpose||Target function/class(es)||Input variables||Performance metrics|
|Part I – Automated sleep staging and transient pattern detection|
|Becq et al. (2005) ||MLP||6-class classification||Wake/NREM 1-4/REM||EEG (C3-A2) (overall variance, relative power)|
EMG (overall variance)
|ER: 28 ± 2%|
|Ventouras et al. (2005) ||MLP||Binary classification||Sleep spindle detection||Single channel EEG (Cz)||80.2% Se, 95.0% Sp|
|Caffarel et al. (2006) ||NS||4-class|
|Wake/light sleep/deep sleep/REM|
Wake vs. sleep
|EEG (Cz-A1)||k = 0.305|
k = 0.449
|Ebrahimi et al. (2008) ||NS||4-class classification||Wake/NREM1+REM/NREM2/SWS||Wavelet decomposition of single channel EEG||84.2% Se, 94.4% Sp, 93.0% Acc|
|Sinha (2008) ||FFBB||3-class classification||Sleep spindles (SS)/REM/Awake||Wavelet coefficients from EEG||95.35% Acc (overall)|
96.84% Acc (SS)
93.68% Acc (REM)
95.52% Acc (Awake)
|Hsu et al. (2013) |
|5-class classification||Wake/NREM1/NREM2/SWS/REM||Energy features from single-channel EEG||87.2% overall Acc|
81.1% overall Acc
81.8% overall Acc
|Shambroom et al. (2012) ||NS||Binary classification||Light vs. deep sleep|
Sleep vs. wake states
|Combined EEG/EOG/EMG activity by a single lead|
|Griessenberger et al. (2013) ||NS||Classification|
|Wake/REM/light sleep/deep sleep||Combined EEG/EOG/EMG activity by a single lead|
|72.6% overall Acc|
|Tagluk et al. (2010) ||FFBB||Classification|
|NREM 1 to 4/REM||Filtered EOG and EMG||74.7% overall Acc|
72.6% Acc (NREM1)
73.3% Acc (NREM2)
78.0% Acc (NREM3)
72.3% Acc (NREM4)
77.3% Acc (REM)
|Chapotot and Becq (2009) ||Ensemble MLP||Classification|
|Wake/N1 to N3/REM/Movement||Statistical, spectral, nonlinear features from EEG and EMG||36 ± 15% error rate|
0.48 ± 0.18 k
34% Acc (Wake)
43% Acc (N1)
51% Acc (N2)
82% Acc (N3)
82% Acc (REM)
13% Acc (Mov.)
|Charbonnier et al. (2011) ||Ensemble MLP||Classification|
|Wake/NREM1/NREM2/SWS/REM||Time and spectral (Fourier analysis) features from EEG, EMG, and EOG||85.5% overall Acc|
78.1% Acc (Wake)
64.8% Acc (NREM1)
86.9% Acc (NREM2)
94.8% Acc (SWS)
79.3% Acc (REM)
|Álvarez-Estevez and Moret-Bonillo (2009) ||FLD|
|Arousal detection||Energy in common bands of EEG (Fourier analysis)||0.196 ± 0.015 ER|
0.195 ± 0.015 ER
0.140 ± 0.012 ER
0.092 ± 0.010 ER
|Part II – Automated drowsiness/fatigue detectors|
|Patel et al. (2011) ||FFBB||Classification||Alert vs. fatigue||Spectral power (Fourier analysis) of HRV||90% Acc|
|Lin et al. (2006) ||FNN||Regression||Driver’s drowsiness level estimation||Spectral power of EEG and ICA||Pearson correlation:|
0.913 ± 0.027
|Kurt et al. (2009) ||MLP||Classification||Awake/drowsy/sleep||Wavelet decomposition of EEG, EOG and chin-EMG||97–98% overall Acc|
|Garcés et al. (2014) ||FFBB||Classification||Alert vs. drowsiness||Time, spectral, and wavelet decomposition of single-lead EEG||87.4% Se, 83.6% Sp|
In the study by Becq et al. , the relative power in the common frequency bands of the EEG (C3-A2), as well as the overall variance of EEG and EMG signals, was used to feed a 6-class MLP ANN. The proposed method reached the same performance as a k-NN classifier, achieving 28 ± 2% error rate. Ventouras et al.  trained a MLP ANN to detect sleep spindles using a bandpass filtered EEG channel (Cz) without feature extraction. The classifier achieved 80.2% sensitivity and 95.0% specificity in the whole sleep record after a consensus agreement among independent scorers. In the study by Caffarel et al. , an ANN-based commercial software using a single-channel EEG (Cz-A1) was assessed. The overall agreement between automated and manual scoring was relatively low in a 4-class classification task (kappa = 0.305) and slightly better in a 2-class classification task (kappa = 0.449). In a later study by Ebrahimi et al. , wavelet decomposition and ANNs were used to perform 4-class sleep staging using the EEG signal. An overall sensitivity of 84.2 ± 3.9%, specificity of 94.4 ± 4.5%, and accuracy of 93.0 ± 4.0% were reported. Wavelet coefficients from the EEG (P3-P4) and a backpropagation ANN were also used in the study carried out by Sinha . The author reported accuracies of 96.84%, 93.68%, and 95.52% in the detection of sleep spindles, REM sleep, and awake state, respectively. More recently, Hsu et al.  computed energy-based measures from a single EEG channel (Fpz-Cz) to feed a recurrent neural classifier (RNN), which achieved an overall accuracy of 87.2% in a 5-class classification task.
Adding features from additional biomedical signals as inputs to the ANN does not seem to improve significantly the classification performance. In the study by Shambroom et al. , a commercial wireless device for automated sleep staging based on the combined activity of EEG, EOG, and EMG is assessed. The Zeo device implements an ANN that achieved 81.1% agreement for light sleep versus deep sleep classification and 93.6% agreement for sleep versus wake classification when the gold standard is a consensus between two independent expert scorers. In a subsequent similar study , the same wireless system achieved an overall agreement of 72.6% in a 4-class approach (Wake, REM, light, and deep sleep). Tagluk et al.  used bandpass filtered EOG and EMG recordings as inputs to a feedforward ANN in a 5-class classification task, achieving an overall accuracy of 74.7 ± 1.63%. Similarly, using statistical, spectral, and nonlinear features from EEG and EMG signals and an ensemble classifier based on multiple MLP ANNs, 64% overall performance was achieved for wakefulness, movement, and intermediate sleep detection, while 82% accuracy was reached for deep and paradoxical sleep detection . In the study carried out by Charbonnier et al. , 85.5% overall accuracy was reached using EEG-, EOG-, and EMG-derived features as inputs to an ensemble of 4 MLP ANN for 5-class automated sleep staging.
In the study by Álvarez-Estevez and Moret-Bonillo , two EEG channels (C2-A2 and C4-A1) and the submental EMG channel were analyzed to automatically detect arousals in the context of SAHS classification. For these signals, the energy in the conventional frequency bands was computed by means of the Fourier transform and four automated expert systems were trained: Fisher’s linear and quadratic discriminants, a SVM, and a feedforward ANN. The ANN reached the highest performance, achieving 92% accuracy and 0.0921 ± 0.0098 error rate.
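The energy in the conventional EEG frequency bands can be estimated from the discrete Fourier transform of an epoch. The sketch below uses a direct (non-fast) DFT for self-containment; the band limits, the sampling rate, and the synthetic epoch are assumptions, since the source does not state the exact values used.

```python
import cmath
import math

# Assumed band limits in Hz; conventions vary slightly between studies.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_energies(signal, fs):
    """Sum the squared DFT magnitudes falling inside each frequency band."""
    n = len(signal)
    energies = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2):
        freq = k * fs / n
        # Direct DFT coefficient at bin k (O(n) per bin).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                energies[name] += power
    return energies


# Synthetic 2-second epoch sampled at 64 Hz with a 10 Hz rhythm (hypothetical).
fs = 64
epoch = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
energies = band_energies(epoch, fs)
print(max(energies, key=energies.get))
```

For real recordings a fast Fourier transform and windowing would normally replace the direct DFT; the resulting band energies form the feature vector fed to the classifiers.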
Besides ANNs, it is noteworthy that several competing algorithms have been applied for automated sleep staging, such as Gaussian mixture models (88.4% overall accuracy, 6-class, EEG-based) , discrete hidden Markov models (85.29% overall accuracy, 5-class, EEG/EOG/EMG-based) , linear (73.7% overall accuracy, 4-class, HRV-based), and quadratic (63.7% overall accuracy, 4-class, HRV-based) discriminant analysis (81% accuracy, 5-class, EEG/EOG/EMG/ECG-based) [82, 83], SVM (89.39% accuracy 5-class, single EEG) , DTs (72.6% accuracy, 5-class, EEG/EOG single lead). In the same way as ANNs, these approaches are characterized by variable performance.
A relevant application of ANNs in the context of SAHS is the detection of drivers’ fatigue and/or drowsiness, which is an important issue for patients suffering from SBD. In this regard, different physiological signals have been used to monitor alertness, such as spectral analysis of HRV (90% accuracy) and EEG (0.913 ± 0.027 correlation between actual and estimated alertness levels), wavelet coefficients of EEG combined with features from EOG and EMG (97–98% 3-class overall accuracy), and time, spectral, and wavelet features from single-lead EEG. Neuromuscular (EEG, EOG, EMG) and cardiac (ECG) signals have predominantly been analyzed to detect drowsiness, though additional physiological recordings (oximetry, skin conductance), physical measures (eye movement/blinks, face and mouth images), and measures of the driver’s performance (steering wheel movements) have also been proposed as inputs to different pattern recognition methods, especially Bayesian networks, SVMs, and ensembles of linear classifiers [89–91]. The main limitation of these automated algorithms is that a large amount of data is needed to train the pattern recognition method accurately. Nonetheless, alertness monitoring systems are already incorporated in many high-end vehicles.
The incorporation of automated decision support systems into routine clinical practice for SAHS diagnosis is still very limited. Conversely, the implementation of artificial intelligence-based expert systems in treatment devices for sleep-related breathing disorders has increased significantly during the last decade. In this regard, the rapid technological development of continuous positive airway pressure (CPAP) devices relies on the automated analysis of breathing patterns by means of expert systems, most of them based on ANNs. Currently, CPAP is the preferred primary treatment for mild, moderate, and severe SAHS and is thus considered the standard of care. During CPAP treatment, a continuous pressure of air is delivered to the patient’s upper airway to maintain its patency. Though nonintrusive, simple, and effective, the device delivers an unnecessarily high constant pressure during the whole night regardless of the patient’s actual needs, which decreases comfort and, in turn, treatment compliance. This is the main limitation of CPAP, and thus the most relevant improvements in recent years have focused on modulating the pressure delivered by the device to fit the patient’s needs. In this regard, the major companies operating in the SAHS therapy market have incorporated into their devices automated algorithms to monitor and modulate the breathing gas pressure. Nevertheless, most manufacturers provide no technical data about the design and implementation of their automated signal processing algorithms, which are therefore black boxes that are hard to interpret and assess.
As mentioned above, determining the optimal therapeutic pressure has been a major goal of research regarding CPAP treatment. Different respiratory-related signals have been assessed for automated regulation of the pressure. Airflow, SpO2 from oximetry, HRV, pharyngeal wall vibration, and snoring sounds have been used in automated algorithms aimed at detecting airflow limitation and respiratory events. Among them, the analysis of the airflow profile is the most widely used method [93, 94]. In this regard, several algorithms have been patented in recent years, which reflects the increasing interest of leading companies in this field. In the patent by Norman et al., a pretrained ANN fed with shape-based features from the airflow signal is used to detect the presence of airflow limitation in each individual breath. Eklund et al. were granted a patent for automatically adjusting the flow pressure when respiratory events are detected. To achieve this goal, an ANN is fed with respiration-related variables. In a recently granted patent, Waxman et al. proposed a Large Memory Storage and Retrieval (LAMSTAR) neural network to process the patient’s physiological data in order to predict breathing events and control the airway pressure level supplied to the user. This algorithm reached high prediction ability within the 30 s preceding the respiratory event. Similarly, in the patent by Hedner et al., the authors describe a pattern recognition system based on a plurality of ANNs aimed at controlling the breathing support therapy in order to increase its effectiveness. Leading companies, such as Philips Respironics, ResMed, and Fisher & Paykel, have incorporated these algorithms into their CPAP devices. Nevertheless, additional research is still needed to assess whether these technological advances can effectively improve CPAP adherence.
Automatic detection of wake and sleep states is a novel approach for enhancing the patient’s comfort [98, 99]. Ayappa et al. proposed an ANN to detect the irregular respiration characteristic of sleep/wake transitions. In this study, the CPAP flow signal is parameterized by means of breath timing and amplitude measures, which subsequently feed the ANN in order to detect irregular breathing. This algorithm is used in the commercial system SensAwake™ (Fisher & Paykel, Auckland, NZ) to automatically decrease the therapeutic pressure when the patient is awake. This ANN has been shown to be effective for sleep onset and awakening detection, though there is still little if any evidence supporting its actual long-term influence on the patient’s comfort and CPAP compliance.
In order to obtain the optimal CPAP pressure level for a patient, an individual titration procedure is needed. This technique is aimed at estimating the continuous pressure that normalizes the patient’s sleep and breathing during in-lab PSG, which contributes to lengthening the already long waiting lists. Therefore, alternative methods are demanded. In this regard, El-Solh et al. designed and trained a GRNN aimed at estimating the most effective continuous pressure using demographic and anthropometric variables (those from the Hoffstein formula, i.e., age, gender, BMI, neck circumference, and AHI). The authors reported high agreement between the optimal pressure determined by standard titration during overnight PSG and the pressure predicted by the ANN. In a later randomized study, El-Solh et al. reported that this ANN can be effectively used to guide CPAP titration. The authors showed that automated titration procedures using this methodology reached the optimal CPAP pressure in a shorter time than conventional PSG-based titration, with a lower rate of titration failure.
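To make the idea of a GRNN concrete, the following minimal sketch implements the general technique: the prediction is a Gaussian-kernel-weighted average of the training targets, with a single smoothing parameter. It is not the network trained by El-Solh et al.; the function name `grnn_predict`, the value of `sigma`, the choice of three standardized inputs, and all numbers are hypothetical and purely synthetic.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Predict a scalar target for `x` with a general regression neural network.

    Each training sample acts as one pattern unit whose activation is a
    Gaussian of the squared distance to `x`; the output is the
    activation-weighted mean of the training targets.
    """
    weights = []
    for xi in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed: fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Synthetic, standardized inputs (hypothetical columns: BMI, neck
# circumference, AHI) and made-up pressures in cmH2O. Not patient data.
train_x = [(-1.0, -1.0, -1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
train_y = [6.0, 9.0, 12.0]

pred = grnn_predict((0.0, 0.0, 0.0), train_x, train_y, sigma=0.5)
print(round(pred, 3))
```

Because a GRNN stores the training set and has only the smoothing parameter to tune, inputs are normally standardized first so that no single variable dominates the distance; the sketch assumes this has already been done.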
During the last decades, researchers have carried out exhaustive work on the design of automated expert systems derived from artificial intelligence that can help physicians in their daily practice. Accordingly, several computer-aided decision support systems have been proposed to overcome the limitations of the standard diagnostic methodology for SAHS. Among all the automated prediction methods, ANNs are probably the most widely used pattern recognition algorithm in the context of SAHS management. Their flexibility to model complex nonlinear problems and their high generalization ability allow ANNs to reach high performance in both classification and regression problems. In this regard, several applications of ANNs have been developed, such as classification of patients suspected of suffering from SAHS, AHI estimation, detection and quantification of respiratory events, apneic events classification, automated sleep staging and arousal detection, alertness monitoring systems, and airflow pressure optimization in PAP-based devices. On the other hand, the most common limitation of ANNs relates to the interpretation of the results in terms of the significance of the variables involved in the model. ANNs are often viewed as black boxes that cannot generate understandable rules, which is the main weakness of neural-based classifiers. Conversely, both decision trees and probabilistic networks also reach high performance while providing interpretable rules and relationships between input variables.
Regarding input features, ANNs are able to deal with high-dimensional spaces composed of several features. This is especially useful when working with many data sources providing information about the problem under study, such as symptoms reported by the patient, physical examination, sleep questionnaires, or PSG, among others. However, it is important to highlight that researchers sometimes compose a wide initial feature set in order to gather as much information as possible, including features from signal processing algorithms regardless of their relevance or clinical meaning. In this situation, feature selection strategies are very useful to identify the most significant features. In addition, dimensionality reduction algorithms allow ANNs to deal with the curse of dimensionality and to control overfitting. Nevertheless, just a few studies apply feature selection techniques before the classification stage.
ANNs have yielded reliable and accurate applications in the context of SAHS detection. Nevertheless, it is noteworthy that, in recent years, there has been a trend toward using different pattern recognition algorithms, particularly SVMs and ensemble classifiers. SVMs have emerged as powerful tools able to achieve high performance in both classification and regression problems. They are kernel-based maximum margin classifiers, i.e., the decision boundary is determined by a subset of the training data samples in a transformed space in which the margin (the distance between the boundary and the closest samples) is maximized. In this way, the optimization problem is relatively straightforward. Several recent studies have demonstrated the usefulness of SVMs in the framework of SAHS management [102–105]. Moreover, some of the studies reviewed in this chapter reported that SVM-based classifiers reached higher accuracy than ANNs [23, 45, 52]. Unlike ANNs, SVMs are capable of minimizing both structural and empirical risk, leading to higher generalization ability even when working with limited training datasets. On the other hand, they are also characterized as black boxes, and higher computational time is usually needed to optimize the classifier. Unfortunately, few studies assess the performance of different classification approaches under the same conditions (population under study and equal optimization of input parameters), leading to biased results and poor generalization. Open access databases, such as PhysioNet or the Sleep Heart Health Study (SHHS), provide a common benchmark to properly assess the performance of different methodologies using the same data. Nevertheless, these databases are limited and most studies are carried out using datasets that are not publicly available, which restricts comparisons.
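The notion of margin used above can be illustrated with a small sketch. It does not train an SVM and involves no kernel transformation: it only evaluates, for a given linear boundary, the distance to the closest samples and identifies which samples those are (the ones that would act as support vectors if this boundary were the maximum margin solution). The function name `geometric_margin`, the boundary, and the toy points are assumptions for illustration.

```python
import math


def geometric_margin(w, b, samples):
    """Return (margin, support_indices) for the hyperplane w.x + b = 0.

    samples: list of (x, label) pairs with label in {-1, +1}.
    The margin is the smallest signed distance y * (w.x + b) / ||w||;
    it is positive only if every sample lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, y in samples
    ]
    margin = min(distances)
    support = [
        i for i, d in enumerate(distances) if math.isclose(d, margin, abs_tol=1e-9)
    ]
    return margin, support


# Toy, linearly separable 2-D data (synthetic).
samples = [
    ((2.0, 0.0), 1),
    ((3.0, 1.0), 1),
    ((-2.0, 0.0), -1),
    ((-4.0, 2.0), -1),
]

# Candidate boundary: the vertical line x1 = 0.
margin, support = geometric_margin((1.0, 0.0), 0.0, samples)
print(margin, support)
```

Training an SVM amounts to searching for the boundary that makes this quantity as large as possible, which is why only the closest samples end up determining the solution.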
In addition, it is also noteworthy that ensemble classifiers, from the simplest majority vote to the more complex bagging, boosting, and stacking algorithms, have recently been introduced in the context of SAHS in order to improve classification performance [106, 107]. Different classification algorithms do not always misclassify the same samples. Accordingly, improved performance may be reached when working with several classifiers at the same time. In this way, ensemble algorithms take advantage of the information provided by all the classifiers involved in the classification or regression task. The studies by Guijarro-Berdiñas et al. and Nguyen et al. demonstrated the reliability and efficacy of ANN-based ensembles. Nevertheless, further research is still needed in order to exploit the full potential of this approach in the context of SAHS diagnosis.
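The simplest combination rule mentioned above, majority voting, can be sketched as follows. The three label vectors are synthetic and stand in for the outputs of any three base classifiers; the function name `majority_vote` is hypothetical. The example is built so that each classifier errs on a different sample, which is the situation in which voting helps.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equally long label lists, one per classifier.
    With an odd number of binary classifiers no ties can occur.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Synthetic ground truth and three imperfect classifiers.
truth = [1, 0, 1, 1, 0]
clf_a = [1, 0, 0, 1, 0]  # wrong on sample 2
clf_b = [1, 1, 1, 1, 0]  # wrong on sample 1
clf_c = [0, 0, 1, 1, 0]  # wrong on sample 0

ensemble = majority_vote([clf_a, clf_b, clf_c])
print(ensemble)
```

Each base classifier is 80% accurate on this toy set, while the vote recovers every label; when classifiers make the same mistakes, no such gain is obtained.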
This research has been partially supported by projects 153/2015 and 158/2015 of the Sociedad Española de Neumología y Cirugía Torácica (SEPAR), the project RTC-2015-3446-1 from the Ministerio de Economía y Competitividad and the European Regional Development Fund (FEDER), and the project VA037U16 from the Consejería de Educación de la Junta de Castilla y León and FEDER. D. Álvarez was in receipt of a Juan de la Cierva grant from the Ministerio de Economía y Competitividad.
© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.