Open access peer-reviewed chapter

Deep Network Model and Regression Analysis Using OLS Method for Predicting Lung Vital Capacity

Written By

Harun Sümbül

Submitted: 18 February 2022 Reviewed: 29 March 2022 Published: 18 May 2022

DOI: 10.5772/intechopen.104737

From the Edited Volume

Data and Decision Sciences - Recent Advances and Applications

Edited by Tien M. Nguyen

Abstract

With the advancement of technology, many new devices and methods with machine learning and artificial intelligence (ML-AI) have been developed and these methods have begun to play an important role in human life. ML-AI technology is now widely used in many applications such as security, military, communications, bioengineering, medical treatment, food industry, and robotics. In this chapter, deep learning methods and medical usage techniques that have become popular in recent years will be discussed. Experimental and simulation results and a comprehensive example of the biomedical use of the deep network model will be presented. In addition, the regression analysis using the ordinary least squares (OLS) method for estimating lung vital capacity (VC) will be discussed. The simulation results showed that the VC parameter was predicted with higher than 90% accuracy using the proposed deep network model with real data.

Keywords

  • deep network
  • data analytics
  • modeling
  • simulation
  • vital capacity
  • MLPFFNN
  • artificial intelligence
  • OLS
  • regression
  • spirometer

1. Introduction

Many people have been adversely affected by the coronavirus disease (COVID-19) outbreak, which spread all over the world and was declared a global pandemic by the World Health Organization (WHO). COVID-19 damages many organs, especially the lungs, and this poses a considerable additional risk in major diseases such as chronic obstructive pulmonary disease (COPD).

COPD is a common disease worldwide, and its mortality is increasing day by day [1]. COPD is closely related to lung volumes and capacities [2, 3]. Vital capacity (VC) is one of the most important lung parameters [4]. VC is the maximum volume of air that can be moved into and out of the lungs during breathing [5]. Accurate measurement of VC is of great importance for the diagnosis of obstructive, restrictive, or mixed lung diseases. Moreover, knowledge of lung volume changes is very important for tracking the history of many restrictive and obstructive lung problems and their response to various treatments [6].

VC measurements are used for monitoring diseases [7]. For example, a chronic reduction in VC may reduce lung compliance. These physiological changes increase the load on the weakened respiratory muscles and eventually cause a vicious circle of respiratory dysfunction [8]. In amyotrophic lateral sclerosis (ALS), the VC value is also related to the disease and can provide information about its severity [9]. Measurements of VC are used to predict disease progression and to help identify diaphragm dysfunction (DD) [10]. VC is also a very important parameter in noninvasive mechanical ventilation [11].

A healthy adult’s VC ranges from approximately 3000 to 5000 ml [12]. The parameters that affect a person’s VC are age, gender, height, mass, and ethnicity [13]. Many studies in the literature have measured and predicted VC [14]. Lung capacity has also been estimated from radiographic images [15]. However, the spirometer is the most widely used device both for measuring VC [16] and for collecting information about lung-related diseases [17].

Figure 1 shows the lung volume and capacity parameters measured with the aid of a spirometer [18].

Figure 1.

Lung volume and capacity parameters.

Although the spirometer is currently the most commonly used tool for measuring lung parameters, it has several drawbacks: many devices are not portable (and portable ones are very expensive), a technician is required to operate them, they cannot be used in the home environment because the results are printed on paper rather than recorded digitally, and their cost is high.

Machine learning (ML) and artificial intelligence (AI) techniques, together with fuzzy logic (FL), are widely used in biomedical studies. In particular, deep learning (DL) models have become quite popular recently because of their superior performance in prediction and classification problems [19]. In an extensive literature search, no VC prediction study based on a deep network structure was found. In this chapter, unlike previous work, the VC value is estimated from a person’s age, height, and weight using a deep learning technique. The reference VC values were measured on real subjects using a medical spirometer.

The objective of this chapter is twofold, namely, to:

a. Develop a deep model algorithm that can predict the VC parameter without the need for the spirometer,

b. Understand and measure the effect of input variables on VC from the developed model using OLS regression methods.

2. Materials and methods

2.1 Spirometric measurements and dataset

The performance of a DL architecture is closely tied to the dataset used as its input. DL architectures trained on a large amount of data generally perform better than those trained on a smaller volume of data. Architectures that are not very deep are known to give good results when sufficient examples are available.

In this book chapter, we collected a biomedical dataset comprising the VC, age, height, and weight parameters of normal subjects. A total of 491 healthy subjects (363 males and 128 females, aged between 21 and 61) were selected for data collection. To measure the VC parameter, 50 s of breathing performance was recorded from each subject, giving a total recording time of 6.82 h in the database. The measurements were performed using a biomedical spirometer (Fukuda Sangyo Spiroanalyzer ST-75) at our Biomedical Device Technology Laboratory (BCT Lab). Related information about the subjects in the study group is summarized in Table 1.

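| Patient No. | Height (cm) | Weight (kg) | Gender (F/M) | BMI (kg/m²) | Age | VC (L) |
|---|---|---|---|---|---|---|
| 1 | 161 | 86 | M | 33.18 | 34 | 6.2 |
| 2 | 166 | 78 | M | 28.31 | 28 | 6.3 |
| 3 | 174 | 91 | M | 30.06 | 32 | 7.7 |
| 4 | 173 | 86 | F | 28.73 | 28 | 5.8 |
| 5 | 166 | 74 | F | 26.85 | 23 | 6.4 |
| 6 | 170 | 93 | M | 32.18 | 24 | 6.9 |
| 7 | 179 | 91 | M | 28.4 | 31 | 7.8 |
| ... | ... | ... | ... | ... | ... | ... |
| 491 | 169 | 79 | F | 27.66 | 29 | 6.8 |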

Table 1.

General features of chosen patients.
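
The BMI column of Table 1 can be cross-checked against the height and weight columns. The following is a minimal sketch, assuming the standard definition BMI = weight (kg) / height (m)²; the function name `bmi` is illustrative, and the three rows are copied from Table 1.

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index in kg/m^2 from height in cm and weight in kg."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)

# (patient no., height in cm, weight in kg, BMI reported in Table 1)
rows = [(1, 161, 86, 33.18), (2, 166, 78, 28.31), (3, 174, 91, 30.06)]

for patient, height, weight, reported in rows:
    computed = round(bmi(height, weight), 2)
    print(patient, computed, reported)
```

For these rows the computed values agree with the reported BMI to two decimal places.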

The spirometric measurement system and the device used are shown in Figure 2.

Figure 2.

The spirometric measurement system.

At the end of the measurements, a new biomedical dataset was created as a data frame of 1,964 values in total (4 columns for age, height, weight, and VC, each column consisting of 491 values). Figure 3 shows some sample breathing performance signals from this dataset.

Figure 3.

Sample VC signals from the subject dataset used in this study.
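
The dataset figures quoted in this section follow from simple arithmetic. A short sketch using only the numbers stated in the text (the variable names are illustrative):

```python
n_subjects = 491          # healthy subjects in the study
seconds_per_subject = 50  # recorded breathing time per subject
n_columns = 4             # age, height, weight, VC

total_hours = n_subjects * seconds_per_subject / 3600
total_values = n_subjects * n_columns

print(f"{total_hours:.2f} h")   # 6.82 h
print(total_values)             # 1964
```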

2.2 Proposed deep network model

Neural networks that have more than one hidden layer are called deep neural networks [20]. A deep network therefore has multiple hidden layers within the artificial neural network (ANN) architecture. ANN algorithms can be adapted to many areas and are widely used in many different fields [21]. Structurally, an ANN consists of three types of layers (input, hidden, and output). Neurons in adjacent layers are linked to each other by connections that each carry a weight value [22]. In this chapter, a multilayer perceptron feed-forward neural network (MLPFFNN) was chosen as the ANN algorithm [23]. In the MLPFFNN model, data flow in one direction, from the input layer to the output layer [24]. The chapter aims to create a reliable deep model for predicting VC. The weight, height, and age parameters (independent variables) were given to the model as inputs, and the VC value (dependent variable) was taken as the output.

The designed MLPFFNN is multilayer, and every layer except the output layer contains many neurons; the output layer has a single neuron. The number of neurons in the hidden layers was varied step by step between 1 and 70, and the best-performing network structure was selected by trial and error. The best result was achieved in 1000 repetitions (the number of repetitions was increased in steps of 50). In the proposed deep network model, the mini-batch size is 40; the effect of the mini-batch size on model performance was examined, and after various trials this value was found to be the most suitable. The learning rate was set to 0.0012 and the gradient reduction factor to 0.85. The Adaptive Moment Estimation (Adam) algorithm is used as the optimizer. The Rectified Linear Unit (ReLU), one of the most commonly used activation functions, is used as the activation function [25], and it is defined as follows:

$$f(x) = \max(0, x) \tag{1}$$

where x is the weighted sum of the inputs and f(x) is the activation function. The function output is 0 for negative x and equal to x otherwise. The designed MLPFFNN structure is shown in Figure 4, where i, j, and k are the layers within the model; b1 and b2 are the biases; w1 and w2 are the weights; K, L, and M are the numbers of neurons in the layers; and VC is the vital capacity.

Figure 4.

MLPFFNN architecture.
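
The feed-forward computation described above can be illustrated with a small, untrained network. This is a minimal sketch rather than the chapter's implementation: the two hidden layers of 8 neurons, the random initialisation, and the helper names `dense`, `init_layer`, and `forward` are all assumptions made for illustration, and the inputs are passed in unscaled.

```python
import random

def relu(x: float) -> float:
    """Rectified linear unit, f(x) = max(0, x) as in Eq. (1)."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weights[j] holds the input weights of neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * v for w, v in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """Feed-forward pass: ReLU on hidden layers, linear single-neuron output."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)
    out_weights, out_biases = layers[-1]
    return dense(activations, out_weights, out_biases)[0]

rng = random.Random(0)
sizes = [3, 8, 8, 1]  # 3 inputs, two hidden layers (sizes assumed), 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Inputs in the order weight (kg), height (cm), age (years). The network is
# untrained, so the printed number is meaningless as a VC estimate.
print(forward([86.0, 161.0, 34.0], layers))
```

Data move strictly from the input layer to the single output neuron, which is the one-directional flow that defines a feed-forward network.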

2.3 Training of the proposed MLPFFNN model

In this section, the hyperparameters, including the learning rate, verbosity, batch size, number of iterations, and number of epochs, are determined. The Adam optimization algorithm was selected for backpropagation and for updating the model weights. To avoid overfitting, the number of epochs was adjusted and a dropout layer was added to the model. In the dropout layer, some nodes of the network are removed at random so that the network does not become dependent on any particular neuron. Thanks to dropout, the network is forced to learn correctly even in the absence of certain information [26]. The learning rate was reduced gradually. The model was trained and tested using a 66–33% training and testing partition of 329 and 162 samples, respectively. Performance metrics of the proposed model are shown in Figure 5, which plots the validation loss (val_loss) against the training loss (loss). The loss shows how close the neural network is to the optimum. One difference between val_loss and loss is that, when dropout is used, the validation loss can be lower than the training loss (which is usually not expected when dropout is not used). The values of the two losses are similar. The training loss decreases after almost every epoch and approaches 0, whereas val_loss stagnates.

Figure 5.

Training and validation loss of the model as a function of the number of epochs.
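
The data partition and mini-batch size reported above translate into concrete counts per epoch. The sketch below uses only the stated numbers (491 samples, 329 for training, 162 for testing, mini-batch size 40); the shuffling seed and the use of index lists are assumptions for illustration, since the chapter does not say how the split was drawn.

```python
import random

n_samples = 491
n_train, n_test = 329, 162   # partition reported in the text
batch_size = 40

indices = list(range(n_samples))
random.Random(42).shuffle(indices)      # seed is an arbitrary choice
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Mini-batches for one epoch; the last batch is smaller than batch_size.
batches = [train_idx[i:i + batch_size] for i in range(0, n_train, batch_size)]

print(len(train_idx), len(test_idx))    # 329 162
print(len(batches), len(batches[-1]))   # 9 9
print(f"{n_train / n_samples:.1%}")     # 67.0%
```

With these settings one epoch consists of nine weight updates, eight on full batches of 40 samples and one on the remaining 9.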

2.4 Regression algorithms

Regression algorithms estimate the output parameter from the input parameters. OLS is a least-squares method used to estimate the unknown parameters of a regression model. Following the least-squares principle, OLS minimizes the sum of the squared differences between the observed values of the dependent variable and the values predicted for the given dataset.

In the OLS model used in the study, the relationship between the dependent variable (VC) and the independent variables (age, height, and weight) was investigated using Eq. (2).

$$A = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + e \tag{2}$$

where A is VC; X1…X3 denote the age, height, and weight variables, respectively; α0 is the bias and α1…α3 are the coefficients of the variables; and e is the error term [27]. In this study, the relationship between the real (measured) values and the values predicted by the model was examined with several popular regression methods based on the OLS algorithm.
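
Eq. (2) can be fitted by solving the normal equations of the least-squares problem. The following is a minimal pure-Python sketch, not the chapter's code: the helper names `solve_linear_system` and `ols_fit` are hypothetical, and the data are synthetic values generated from made-up coefficients, so the example shows only that the procedure recovers known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols_fit(rows, targets):
    """Return [alpha0, alpha1, ...] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the bias alpha0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Synthetic, noise-free data generated from made-up coefficients
# (NOT the chapter's measurements): columns are age, height, weight.
true_alpha = [1.0, -0.02, 0.03, 0.01]
rows = [(25, 160, 60), (30, 170, 72), (41, 175, 80), (52, 168, 90),
        (36, 182, 85), (28, 158, 55)]
targets = [true_alpha[0] + sum(a * x for a, x in zip(true_alpha[1:], row))
           for row in rows]

alpha = ols_fit(rows, targets)
print([round(a, 4) for a in alpha])
```

Because the synthetic targets contain no noise, the fitted coefficients should match `true_alpha` up to floating-point rounding. With real measurements the error term e is nonzero and the fit is only approximate.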

2.4.1 Multi-linear regression (MLR)

MLR is a statistics-based analysis technique that is widely used in output variable estimation using different variables. The purpose of MLR is to model the linear relationship between the independent variables and the dependent variable.

2.4.2 Polynomial regression (PR)

The PR model parameters X, Y, b, and e can be represented in matrix form as the design input matrix, the output response vector, the parameter vector, and the random error, respectively, as given in Eq. (3) [28].

$$Y = b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2 + e \tag{3}$$
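
The right-hand side of Eq. (3) is linear in the coefficients b0…b5 once the two predictors are expanded into quadratic terms. A short sketch of that expansion follows; the function names and the coefficient values are made up for illustration, and the error term e is omitted.

```python
def quadratic_features(x: float, y: float) -> list:
    """Terms of Eq. (3) in order: 1, X, Y, X^2, XY, Y^2."""
    return [1.0, x, y, x * x, x * y, y * y]

def predict(x: float, y: float, b: list) -> float:
    """Evaluate b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2 (error term omitted)."""
    return sum(coef * term for coef, term in zip(b, quadratic_features(x, y)))

b = [0.5, 1.0, -2.0, 0.1, 0.0, 0.3]   # made-up coefficients for illustration
print(quadratic_features(2.0, 3.0))    # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
print(predict(2.0, 3.0, b))
```

Because the model is linear in b, the expanded feature vectors can be fitted with ordinary least squares in the same way as a multi-linear model.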

2.4.3 Support vector regression (SVR)

The SVR algorithm is known to be a very powerful tool in real-value estimation studies [29]. The general SVR estimation function is given in Eq. (4).

$$f(x) = w \cdot \Phi(x) + b \tag{4}$$

where w and b are the weight coefficient and the bias coefficient, respectively, and Φ(x) is the mapping of the input x into the feature space [30].

2.4.4 Decision tree regression (DTR)

DTR is a supervised learning method used for classification and regression. The decision tree-based regression algorithm can provide close to optimum distribution decisions [31].

2.4.5 Random forest regression (RFR)

RFR is an ensemble learning algorithm based on decision trees. Random forests for regression are formed by growing trees depending on a random vector [32]. The output values are numerical, and the training set is assumed to be drawn independently from the distribution of the random vector (Y, X). Here, h(x) denotes the tree predictor, and the mean-squared generalization error of any numerical predictor h(x) is given in Eq. (5).

$$E_{X,Y}\left(Y - h(X)\right)^2 \tag{5}$$

In regression tasks, the random forest prediction is obtained as the mean prediction of the K regression trees h_k(x), as given in Eq. (6).

$$\text{RFR prediction} = \frac{1}{K}\sum_{k=1}^{K} h_k(x) \tag{6}$$
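
Eqs. (5) and (6) can be illustrated with a handful of hand-written trees. This is a sketch only: the three depth-1 "trees", their thresholds, and the (height, VC) sample pairs are invented for illustration, whereas in an actual random forest each tree is grown from a bootstrap sample of the training data.

```python
def forest_predict(trees, x):
    """Eq. (6): mean of the K tree predictions h_k(x)."""
    return sum(tree(x) for tree in trees) / len(trees)

def mean_squared_error(predictor, samples):
    """Empirical counterpart of Eq. (5): average of (Y - h(X))^2 over samples."""
    return sum((y - predictor(x)) ** 2 for x, y in samples) / len(samples)

# Three hand-written "trees" (depth-1 stumps on height in cm).
trees = [
    lambda x: 4.0 if x < 170 else 5.0,
    lambda x: 3.5 if x < 165 else 5.5,
    lambda x: 4.5 if x < 175 else 6.0,
]

print(forest_predict(trees, 172))   # (5.0 + 5.5 + 4.5) / 3 = 5.0

samples = [(160, 4.0), (172, 5.0), (180, 5.0)]   # made-up (height, VC) pairs
print(mean_squared_error(lambda x: forest_predict(trees, x), samples))
```

Averaging smooths out the disagreement between individual trees, which is why the forest prediction is usually more stable than that of any single tree.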

2.5 Model Performance Evaluation

To evaluate our proposed method, accuracy is calculated using Eq. (7).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

TP, TN, FP, and FN in Eq. (7) are the true positives, true negatives, false positives, and false negatives, respectively [33]. A confusion matrix was formed to calculate the performance of the model used in the study.
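
Eq. (7) is a direct ratio of the confusion-matrix counts. A minimal sketch follows; the counts passed in are made up for illustration and are not the chapter's confusion matrix.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (7): share of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Made-up counts for illustration only.
print(accuracy(tp=40, tn=50, fp=6, fn=4))   # 90 / 100 = 0.9
```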

3. Results

In this book chapter, the multilayer perceptron feed-forward neural network (MLPFFNN) was selected for the ANN implementation. In the selected MLPFFNN design, the best result was achieved in 1000 repetitions (the number of repetitions was increased in steps of 50) with a mini-batch size of 40, a learning rate of 0.0012, and a gradient reduction factor of 0.85. The simulation environment is Python 3.8.5 (64-bit). Figure 6 shows the VC values predicted by the model versus the actual VC values.

Figure 6.

The graphical comparison of results.

Statistically, the actual value is the value that is obtained by observation or by measuring the available data. The predicted value is the value of the variable predicted based on the regression analysis.

The graphical comparison shows that the estimated VC values of all participants in the study track the VC values actually measured with the spirometer very closely.

When the three-parameter OLS models are compared in terms of R-squared, the best result is obtained with multi-linear regression (0.948). The R-squared values of the three-parameter OLS models are given in Table 2.

| Model | MLR | PR | SVR | DTR | RFR |
|---|---|---|---|---|---|
| R-squared | 0.948 | 0.794 | 0.874 | 0.775 | 0.889 |

Table 2.

Three-parameter OLS models result in terms of R-squared.
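
The chapter does not state how R-squared was computed; the sketch below assumes the usual coefficient of determination, 1 − SS_res / SS_tot. The actual values are the VC measurements of patients 1–5 from Table 1, while the predicted values are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# VC values (L) of patients 1-5 from Table 1 paired with made-up predictions.
actual = [6.2, 6.3, 7.7, 5.8, 6.4]
predicted = [6.0, 6.5, 7.4, 6.0, 6.3]
print(round(r_squared(actual, predicted), 3))
```

A value of 1 means the predictions reproduce the measurements exactly, and values closer to 1 indicate that the model explains more of the variance in VC.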

The agreement between the values predicted by the model and the actual values measured with the spirometer is shown in Figure 7 as a scatter plot of predicted against actual values. The results are quite close to each other. The blue dots in the figure show the data points relative to the linear regression line.

Figure 7.

Evaluation of model performance.

The confusion matrix obtained from the results is given in Table 3. The overall accuracy was 93.3%. As a result, an efficient deep model is provided for estimating the VC parameter.

AP
Predicted: ZeroPredicted: One
Actual: (Zero)1016
Actual: (One)219

Table 3.

The confusion matrix of the model.

4. Discussion

The chapter aims to build an accurate and reliable deep neural network model to predict VC using weight, height, and age parameters (independent variables) as input parameters to the model. The VC value (dependent variable) was taken as the output of the proposed deep neural network.

In this book chapter, a multilayer perceptron feed-forward neural network (MLPFFNN) was selected as the ANN algorithm. When the three-parameter OLS models were examined in terms of R-squared, the best result was obtained with multi-linear regression (R-squared = 0.948). The results showed that height and age have a more significant effect on VC than weight and that these variables play a significant role in the prediction of VC. Although studies on estimating lung volume were encountered in the literature search, no application of a deep neural network model that estimates the VC value from specified independent parameters was found.

Therefore, it is believed that the results presented in this chapter will fill an important gap in the literature in light of both the database specificity and the presented ML-AI method.

5. Conclusions

COPD has become an even more challenging problem under the effect of COVID-19. The fact that respiratory function tests, the most effective method for diagnosing COPD, cannot be performed at home has pushed researchers to look for new, cheap, and practical technological methods that address this challenge.

In this book chapter, it is suggested that the deep neural network-based VC prediction algorithm can be used in clinical tests to reduce the workload of doctors and nurses. As shown in this chapter, a fast and reliable diagnostic tool using ML-AI algorithm was obtained. The proposed ML-AI model provided 93.3% accuracy. The simulation results showed that the VC parameter can be determined with a high success rate using the proposed deep learning model with real data. With the proposed model, the rate of misdiagnosis can be reduced and spirometric measurements can be made quickly without waiting for hours to have Pulmonary Function Test (PFT) performed in hospitals.

The simulation results indicate that a smart tool using ML-AI technology can be a reliable alternative to medical spirometers. Currently, the developed model is planned to be tested clinically and the results will be reported in future studies. The goal is to provide this smart tool to be used in hospitals after approval from field experts and governmental health agencies.

Acknowledgments

This study is supported by the Coordinatorship of Ondokuz Mayıs University’s Scientific Research Projects (Project Number: PYO.YMY.1901.20.001), Samsun, Turkey. We would like to extend our heartfelt thanks to Dr. Kazım Sekeroğlu of Southeastern Louisiana University, Hammond, LA, USA.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Fındık G, Aydoğdu K, Kaya S. Travma sonrası semptomatik hale gelen diyafram evantrasyonu. Turkish Journal of Thoracic and Cardiovascular Surgery. 2011;19(1):107-109
  2. Sümbül H, Yüzer AH. The measurement of COPD parameters (VC, RR, and FVC) by using Arduino embedded system. In: 1st International Mediterranean Science and Engineering Congress. Adana, Turkey: Çukurova University, Congress Center; 2016. pp. 201-207
  3. Sümbül H, Yüzer AH. Development of a diagnostic device for COPD: A MEMS based approach. International Journal of Computer Science and Network Security. 2017;17(7):196-203
  4. Sümbül H, Yüzer AH. Measuring of diaphragm movements by using iMEMS acceleration sensor. In: International Conference on Electrical and Electronics Engineering. Bursa, Turkey: ELECO; 2015. pp. 166-170
  5. Irzaldy A, Wiyasihati SI, Purwanto B. Lung vital capacity of choir singers and nonsingers: A comparative study. Journal of Voice. 2016;30(6):717-720. DOI: 10.1016/j.jvoice.2015.08.008
  6. Tantucci C, Bottone D, Borghesi A, Guerini M, Quadri F, et al. Methods for measuring lung volumes: Is there a better one? Respiration. 2016;91(4):273-280. DOI: 10.1159/000444418
  7. Andrews JA, Meng L, Kulke SF, Rudnicki SA, Wolff AA, et al. Association between decline in slow vital capacity and respiratory insufficiency, use of assisted ventilation, tracheostomy, or death in patients with amyotrophic lateral sclerosis. JAMA Neurology. 2018;75(1):58-64. DOI: 10.1001/jamaneurol.2017.3339
  8. Santos DB, Boré A, Castrillo LDA, Lacombe M, Falaize L, et al. Assisted vital capacity to assess recruitment level in neuromuscular diseases. Respiratory Physiology & Neurobiology. 2017;243:32-38. DOI: 10.1016/j.resp.2017.05.001
  9. Pellegrino GM, Papa GFS, Centanni S, Corbo M, Kvarnberg D, et al. Measuring vital capacity in amyotrophic lateral sclerosis: Effects of interfaces and reproducibility. Respiratory Medicine. 2021;176:106277. DOI: 10.1016/j.rmed.2020.106277
  10. Brault M, Gabrysz-Forget F, Dubé BP. Predictive value of positional change in vital capacity to identify diaphragm dysfunction. Respiratory Physiology & Neurobiology. 2021;289:103668. DOI: 10.1016/j.resp.2021.103668
  11. Calvo A, Vasta R, Moglia C, Matteoni E, Canosa A, et al. Prognostic role of slow vital capacity in amyotrophic lateral sclerosis. Journal of Neurology. 2020;267(6):1615-1621. DOI: 10.1007/s00415-020-09751-1
  12. "Vital Capacity". Family Practice Notebook. Retrieved February 19, 2015
  13. Bhatti U, Rani K, Memon MQ. Variation in lung volumes and capacities among young males in relation to height. Journal of Ayub Medical College, Abbottabad. 2014;26(2):200-202
  14. Sümbül H, Yüzer AH. Estimating the value of the volume from acceleration on the diaphragm movements during breathing. Journal of Engineering Science and Technology, School of Engineering, Taylor’s University. 2018;13(5):1205-1221
  15. Pierce RJ, Brown DJ, Holmes M, Cumming G, Denison DM. Estimation of lung volumes from chest radiographs using shape information. Thorax. 1979;34(6):726-734. DOI: 10.1136/thx.34.6.726
  16. Lange B, Flynn S, Rizzo A, Bolas M, Silverman M, et al. Breath: A game to motivate the compliance of postoperative breathing exercises. In: Virtual Rehabilitation International Conference. Haifa, Israel: IEEE; 2009. pp. 94-97. DOI: 10.1109/ICVR.2009.5174212
  17. Yüzer AH, Sümbül H, Polat K. A novel wearable real-time sleep apnea detection system based on the acceleration sensor. IRBM. 2020;41(1):39-47. DOI: 10.1016/j.irbm.2019.10.007
  18. Peters U, Kaminsky DA, Maksym GN. Chapter 2 - Standardized pulmonary function testing. In: Ionescu C, editor. Lung Function Testing in the 21st Century. London, UK: Academic Press; 2019. pp. 5-23. DOI: 10.1016/B978-0-12-814612-5.00002-6
  19. Sindi H, Nour M, Rawa M, Öztürk Ş, Polat K. An adaptive deep learning framework to classify unknown composite power quality event using known single power quality events. Expert Systems with Applications. 2021;178:115023. DOI: 10.1016/j.eswa.2021.115023
  20. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444
  21. Yüzer AH, Sumbul H, Polat K, Nour M. A different sleep apnea classification system with neural network based on the acceleration signals. Applied Acoustics. 2020;163:107225. DOI: 10.1016/j.apacoust.2020.107225
  22. Faust O, Hagiwara Y, Tan JH, Oh SL, Acharya UR. Deep learning for healthcare applications based on physiological signals: A review. Computer Methods and Programs in Biomedicine. 2018;161:1-13. DOI: 10.1016/j.cmpb.2018.04.005
  23. Uçar MK, Bozkurt MR, Bilgin C, et al. Automatic detection of respiratory arrests in OSA patients using PPG and machine learning techniques. Neural Computing and Applications. 2017;28:2931-2945. DOI: 10.1007/s00521-016-2617-9
  24. Uçar MK, Uçar Z, Uçar K, Akman M, Bozkurt MR. Determination of body fat percentage by electrocardiography signal with gender based artificial intelligence. Biomedical Signal Processing and Control. 2021;68:102650. DOI: 10.1016/j.bspc.2021.102650
  25. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, MA: The MIT Press; 2017
  26. Özkaya U, Seyfi L, Öztürk Ş. Dimension optimization of multi-band microstrip antennas using deep learning methods. Pamukkale University Journal of Engineering Sciences. 2021;27(2):229-233. DOI: 10.5505/pajes.2020.23471
  27. Ahmad I, Dar MA, Fenta A, Halefom A, Nega H, et al. Spatial configuration of groundwater potential zones using OLS regression method. Journal of African Earth Sciences. 2021;177:104147. DOI: 10.1016/j.jafrearsci.2021.104147
  28. Paletta A, Alimehmeti G, Mazzetti G, Guglielmi D. Educational leadership and innovative teaching practices: A polynomial regression and response surface analysis. International Journal of Educational Management. 2021. DOI: 10.1108/IJEM-01-2021-0019
  29. Chen D, Liu Y, Feng W, Wang Y, Hu Q, et al. In-situ prediction of α-phase volume fraction in titanium alloy using laser ultrasonic with support vector regression. Applied Acoustics. 2021;177:107928. DOI: 10.1016/j.apacoust.2021.107928
  30. Liu Q, Wang F, Li J, Xiao W. A hybrid support vector regression with multi-domain features for low-velocity impact localization on composite plate structure. Mechanical Systems and Signal Processing. 2021;154:107547. DOI: 10.1016/j.ymssp.2020.107547
  31. Huo Y, Bouffard F, Joós G. Decision tree-based optimization for flexibility management for sustainable energy microgrids. Applied Energy. 2021;290:116772. DOI: 10.1016/j.apenergy.2021.116772
  32. Breiman L. Random forests. Machine Learning. 2001;45:5-32. DOI: 10.1023/A:1010933404324
  33. Md Mujeeb S, Praveen Sam R, Madhavi K. Adaptive exponential Bat algorithm and deep learning for big data classification. Sādhanā. 2021;46:15. DOI: 10.1007/s12046-020-01521-z
