Open access peer-reviewed chapter

Big Data Framework Using Spark Architecture for Dose Optimization Based on Deep Learning in Medical Imaging

By Clémence Alla Takam, Aurelle Tchagna Kouanou, Odette Samba, Thomas Mih Attia and Daniel Tchiotsop

Submitted: October 16th 2020. Reviewed: April 15th 2021. Published: May 4th 2021.

DOI: 10.5772/intechopen.97746



Deep learning and machine learning provide consistent tools and powerful functions for recognition, classification, reconstruction, noise reduction, quantification and segmentation in biomedical image analysis. Recently, several applications of deep learning and machine learning to low-dose optimization in computed tomography have been developed. Given the advances in reconstruction and processing technology, it has become crucial to develop architectures and methods based on deep learning algorithms that minimize radiation during computed tomography scans. This chapter extends the work done by Alla et al. in 2020 and explains that work in more detail. It introduces deep learning for low-dose optimization in computed tomography, presents examples described in the literature, briefly discusses new methods for computed tomography image processing, and provides conclusions. Based on the literature, we propose a pipeline for low-dose computed tomography image reconstruction that relies on deep learning and on big data technology using the Spark framework. We compare it with pipelines proposed in the literature to establish its efficiency and importance. We also propose a big data architecture for low-dose optimization using computed tomography images. This architecture relies on deep learning and allows us to develop effective and appropriate methods for dose optimization with computed tomography images. An implementation of the image denoising pipeline shows that the radiation dose can be reduced and that the proposed pipeline can then be used to improve the quality of the captured image.


  • Deep Learning
  • Computed Tomography Scan Image
  • Big Data technologies
  • Low Dose Optimization
  • Spark Framework

1. Introduction

Machine learning (ML) techniques are widely used in medical imaging in the form of many successful optimization, clustering, prediction and classification algorithms. ML is a branch of artificial intelligence (AI) and has been used in a wide variety of applications. It is used to analyze complex data sets and to find similarities, correlations and patterns in such data without explicit programming [1]. ML technology is an important part of medical imaging research. Recently, a highly flexible ML method called deep learning (DL) has emerged as a disruptive technology that improves the performance of existing ML methods and solves previously difficult problems [2]. DL comes from the ML and computer vision communities. The key to the success of DL-based methods lies in their independence from an explicit imaging model, the support of big data in a specific field, and the optimization of image quality by learning features in an end-to-end manner [3]. DL has recently been applied to natural language processing, facial recognition, speech recognition, image classification, automatic diagnosis and other problems, and has achieved good results [4, 5]. Nowadays, DL enables many applications in computed tomography (CT) and helps to improve interpretation speed, diagnostic accuracy and clinical efficiency. CT is a well-known imaging technique that can observe the inside of objects non-invasively [8], and it is widely used, for both research and clinical purposes, in detection, diagnosis and image-guided treatment [6, 7]. The main problem of CT scans is the optimization and minimization of the radiation dose during the examination, especially in pediatric skull scans. The development and optimization of dosimetry protocols for pediatric skull scans is of great interest to society and to the medical community worldwide. Indeed, because the radiosensitivity of children is much higher than that of adults, patient-specific dosimetry has aroused great interest in pediatric skull applications: children have a higher risk of cancer than adults receiving the same dose [9]. In view of the possible risk of X-ray radiation to pediatric patients, low-dose CT has attracted considerable interest in the field of biomedical imaging [10]. However, the main problem with low-dose CT is image noise and the resulting loss of image quality. To overcome this shortcoming, DL with a convolutional neural network (CNN) algorithm is used. In fact, one of the goals of the various DL and ML algorithms is to improve the consistency, quality and applicability of diagnostic data interpretation. DL can improve the image quality of low-dose CT skull scans.

This chapter extends the work done by Alla et al. in 2020 and explains it in more detail. We mainly focus on dose optimization in pediatric skull scans using a CNN for DL and on the image processing performed in [4]. We expand the work in [4] with further explanations and a broader review of the literature. The workflow performs the following steps: image denoising, image segmentation, CNN, image retrieval, image diagnosis and storage. We describe the importance of using big data technology (the Spark framework) to build our proposed architecture through the MapReduce method, and we compare it with the pipelines and architectures discussed in the literature. An FCNN implementation has also been carried out. The rest of the chapter is organized as follows: Section 2 presents the state of the art of published works in this field. Section 3 builds on these works and presents the architecture we propose. Section 4 introduces the implementation of our architecture and the results obtained. Section 5 examines and discusses the results. Section 6 provides conclusions and future work.


2. State of the art

ML and DL are becoming established disciplines in a wide range of AI fields for analyzing and exploiting patterns in data [11, 12]. The CNN model in DL refers to a class of models that learn a hierarchy of features by constructing high-level features from low-level ones, thereby automating the process of feature construction [13, 14]. For CT scan images, DL is usually used to minimize radiation exposure, to denoise images and to reconstruct CT images. CT uses X-rays, gamma rays, ultrasound or other types of beams, in conjunction with sensitive detectors, to scan successive sections of the human body [15]. However, obtaining excellent image quality from CT scans requires exposing the patient to very high radiation doses during the examination.

In addition, because the radiosensitivity of children is much higher than that of adults, patient-specific dosimetry has aroused great interest in pediatric skull applications [16]. Low-dose CT is therefore essential in these cases. In the past few years, low-dose CT biomedical imaging has become a focus of attention as a way to alleviate concerns about exposure to X-ray radiation from widely used CT scans [16]. In [17], the authors reviewed, in 2018, dosimetry applications in pediatric diagnostic methods, including CT and nuclear medicine applications.

To address these challenges, much work has been done to reduce the radiation dose during CT examinations while maintaining the quality of the captured images. In [18, 19, 20], the authors proposed new methods to optimize the dose during CT scans. However, reducing the radiation sometimes reduces image quality. DL can overcome this problem through reconstruction techniques that prevent useful information in the original CT image from being lost. Many studies have been published in the literature. To reduce radiation dose, some works rely on traditional methods (protocol optimization), while others rely on DL and ML methods. In Ref. [21], the authors propose a method for optimizing radiation dose based on a study of the scanning scheme used. Dalmazo et al. [22] investigated the radiation dose of CT procedures in a university hospital using a phantom and an ionization chamber; their research is based only on equipment surveys. In 2012, Dougeni et al. reviewed patient dose and optimization procedures in CT scans of adults and children [23]. They discussed and compared various published works on CT dose optimization, but did not propose a method of their own. In 2019, Smith-Bindman et al. conducted a study in 151 institutions in seven countries and, from more than 2 million adult CT examinations, proposed good practices to optimize the radiation dose during CT scans and other radiological examinations [24]. In 2020, Abdukkadir et al. conducted a study to optimize current local practices by investigating the radiation dose distribution of pediatric head and abdomen CT examinations and the existing routine scanning procedures at the Kelantan Radiology Department in Malaysia [25]. In addition, Cui et al. [26] proposed a method to optimize the dose and image quality for various exposure conditions and phantom diameters in pediatric abdominal CT scans. In [27], the authors evaluated the image quality of dose-optimized (DO) C-spine CT in emergency patients who can pull down their shoulders to reduce exposure and improve image quality. In a 2017 survey, Chen et al. examined how to improve image quality after a CT scan [28]. Nowadays, many works based on DL or ML study how to reduce the radiation dose during CT examinations while maintaining good resolution and quality in the captured images.

Regarding ML and DL in dose optimization, Kang et al. developed, in 2017, an algorithm using a CNN applied to the wavelet transform coefficients of low-dose CT images. Their CNN is built with a residual learning architecture, which speeds up network training and improves performance. Their results show that complex noise patterns are effectively removed from CT images obtained at a reduced X-ray dose, and that a wavelet-domain CNN is effective in reducing the noise of low-dose CT [29]. Jung et al. surveyed, in 2017, the latest applications of DL to CT and magnetic resonance imaging (MRI) biomedical image analysis across a range of tasks and target organs, with a focus on improving the accuracy and productivity of current diagnostic analysis [30]. They introduced some promising applications that have greatly changed the current workflow of biomedical imaging [30]. However, they did not provide any workflow for the methods described. Xuy et al. [31] proposed a DL method to address the problem of PET image reconstruction quality. In their work, a reliable clinical diagnosis can be obtained even when the radiation dose used to capture biomedical images with a PET scan is low [31]. Their method is based on encoder-decoder residual deep networks with chained skip connections. Liu et al. evaluated a DL-based optimization algorithm for low-dose coronary CT angiography (CCTA) in terms of image noise reduction and image quality (IQ) improvement [32].

Wurfl et al. proposed a new DL framework for 3-D CT reconstruction in [33]. They developed a new type of cone-beam back-projection layer that efficiently computes the forward pass, and their framework can jointly optimize the correction steps in the volume and projection domains. Although the performance is encouraging, their method is limited to post-processing. In 2018, Shan et al. introduced a conveying path-based convolutional encoder-decoder (CPCE) network, which performs low-dose CT noise reduction based on transfer learning from 2D to 3D configurations within the framework of Generative Adversarial Networks (GAN) [34]. Tian et al. combined two CNNs to increase the width of the network and thereby extract more features. This allowed them to design a novel network called the Batch Renormalized Noise Reduction Network (BRDNet), which removes a large amount of noise from the image [35]. However, the authors did not use CT images. Lee et al. developed a method using DL with a CNN that can handle CT image analysis tasks, such as object detection and semantic segmentation, as well as other biomedical imaging modalities, such as MRI and positron emission tomography (PET) scans [36]. However, their method is not based on low-dose CT. In 2019, Gu et al. combined random forests with dictionary learning to reduce CT scan radiation while preserving CT image quality through a new low-dose CT super-resolution reconstruction [22]. In the same year, Meineke et al. showed that ML can comprehensively identify chest CT examinations with potential for dose optimization [37]. They used 139 chest CT examinations to train and test different neural network layers and components, refined the resulting model, and predicted the volumetric CT dose index (CTDIvol) from patient metrics derived from the scan [37]. However, in the previous three works, the authors did not provide a framework based on big data technology and DL to optimize the dose in pediatric cranial CT scans.

According to the works cited above, and as far as we know, no author provides a specific pipeline and big data architecture that uses DL, the Spark framework and the MapReduce processing model to handle low-dose cranial CT scan images efficiently. This gap is the main motivation of this chapter. We have built a pipeline that implements a full CNN (FCNN) for processing CT scan images, and we propose a method that divides biomedical images into image blocks before applying the FCNN. This allows us to use the Spark framework and MapReduce programming, and so to shorten the processing time of our proposed architecture. The architecture makes it possible to work with low-dose skull CT images while still supporting a correct diagnosis.


3. Methods

Biomedical image processing is not new. Many software tools and programming methods divide the image into blocks for processing. In fact, dividing the image into small pieces, processing them and merging the results is routine engineering work that appears in any medical imaging pipeline. The new concept introduced in this chapter is to perform this processing in parallel, which in practice is rarely done for biomedical images. Traditionally, the image is divided into many chunks that are processed one by one. Instead, we use the Spark framework to process many or all blocks of the image at the same time. This section discusses our recommended workflow.
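The divide, process and merge routine described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names `split_into_blocks` and `merge_blocks`, the non-overlapping tiling and the block size are all assumptions made for the example.

```python
import numpy as np

def split_into_blocks(image, block):
    """Split a 2-D image into non-overlapping block x block tiles.

    Returns a list of ((row, col), tile) pairs so that each tile can be
    processed independently and put back in place afterwards.
    Assumption: image height and width are multiples of `block`.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    return [((r, c), image[r:r + block, c:c + block])
            for r in range(0, h, block)
            for c in range(0, w, block)]

def merge_blocks(blocks, shape):
    """Reassemble (position, tile) pairs into an image of the given shape."""
    out = np.empty(shape, dtype=blocks[0][1].dtype)
    for (r, c), tile in blocks:
        out[r:r + tile.shape[0], c:c + tile.shape[1]] = tile
    return out

# Toy 8 x 8 "image" split into four 4 x 4 blocks and merged back unchanged.
image = np.arange(64, dtype=np.float32).reshape(8, 8)
blocks = split_into_blocks(image, block=4)
restored = merge_blocks(blocks, image.shape)
```

Because every tile carries its own position, the tiles can be handed to independent workers in any order, which is the property a parallel framework relies on.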

As the use of CT in modern medicine continues to increase, attention is turning to the growing radiation dose that biomedical imaging delivers to the population and to the associated increase in the estimated risk of radiation-induced cancer [38]. Optimizing the scanning method is therefore important, so that the necessary clinical information can be captured while minimizing the radiation dose [39]. The proposed general workflow for biomedical image denoising is presented in Figure 1.

Figure 1.

Proposed General workflow for biomedical image denoising.

In Figure 1, the first step is data collection from various hospitals, medical centers and laboratories. The data include images from various medical applications; in this chapter, we use CT images. The second step consists of choosing the denoising technique. We list various techniques, and in this chapter we use DL. Feature extraction and selection is a critical step for image denoising using DL. Feature extraction methods analyze the preprocessed images in order to extract the most prominent features, which represent different feature sets based on the statistics of pixel intensity relationships.

In this part we also propose a pipeline for CNN-based low-dose CT image diagnosis. The pipeline relies on four main parts: image capture, multi-processing, denoising and diagnosis, and sharing or storage. Figure 2 shows all the parts of the proposed pipeline [4].

Figure 2.

Pipeline for low-dose CT image reconstruction. Only the multi-processing step is designed using the Spark framework.

3.1 Image denoising theory in CT Scan: overview

Image denoising has always been a basic problem in the field of image processing [40]. For researchers, removing noise from the original captured image is still a challenging problem in digital image processing. The goal of image denoising methods is to preserve image details while eliminating as much random noise as possible. Many noise reduction techniques rely on mathematical methods. The problem of image denoising can be modelled mathematically by Eq. (1) [41]:

$$y = x + b \qquad (1)$$

In Eq. (1), $y$ represents the noisy image, $x$ the clean image, and $b$ the noise. The noise is modelled as additive white Gaussian noise (AWGN) with standard deviation $\sigma$. The authors of [40, 41, 42, 43] showed that image denoising techniques operate either in the transform domain or in the spatial domain. In this part, we build on the work done in [44] and present the FCNN architecture applied in our work. The parameter update process in the CNN architecture is introduced in detail through a mathematical explanation. In this section, we deal with the CNN model for image denoising.
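The additive noise model of Eq. (1) can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the helper name `add_awgn`, the constant test image and the seed are hypothetical, and the value σ = 30 is borrowed from the Results section.

```python
import numpy as np

def add_awgn(clean, sigma, seed=None):
    """Return y = x + b, where b is zero-mean white Gaussian noise.

    clean : array holding the clean image x
    sigma : standard deviation of the noise b
    seed  : optional seed, only there to make the example reproducible
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=clean.shape)
    return clean.astype(np.float64) + noise

# Hypothetical flat "image" so that the noise statistics are easy to inspect.
x = np.full((256, 256), 100.0)
y = add_awgn(x, sigma=30.0, seed=0)
b = y - x  # recovered noise term of Eq. (1)
```

Each pixel of `b` is drawn independently, which is what "white" means in this model; a denoiser is then asked to estimate `x` given only `y`.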

We develop here the mathematical model of the CNN that allows us to predict a clean version of a noisy image, based on the CNN architecture and its training process. Noise reduction with structure preservation is one of the important tasks integrated into medical diagnostic imaging systems such as X-ray and computed tomography (CT). X-ray and CT images are formed by exposing the region of interest of the patient to X-rays and capturing the resulting attenuation. Figure 3 presents the CNN architecture for image denoising from [42, 43].

Figure 3.

The structure of the CNN denoiser [42].

The CNN model usually has three types of layers: an input layer, hidden layers and an output layer. Figure 4 shows the set of layers used to reduce image noise in the CNN architecture. A full description of this architecture is given in [4].

Figure 4.

Layer names for CT image denoising.

The CNN method is based on the following idea: the model operates on a local understanding of the image. By reusing the same parameters at every image location, it needs far fewer parameters than a fully connected network.
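A quick count makes the parameter saving concrete. The figures below are illustrative assumptions, not the chapter's architecture: a single 3 × 3 convolution with 64 output channels is compared with a single fully connected layer that maps a 180 × 180 image (the crop size used later in the Results section) to an output of the same size.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of one 2-D convolution layer.

    The same kernel is reused at every pixel, so the count does not
    depend on the image size.
    """
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

side = 180                      # image side length in pixels
pixels = side * side            # 32,400 pixels per image

conv = conv_params(kernel=3, in_ch=1, out_ch=64)   # 3*3*1*64 + 64
dense = dense_params(pixels, pixels)               # 32400*32400 + 32400

print(f"convolution layer:     {conv:,} parameters")
print(f"fully connected layer: {dense:,} parameters")
```

Under these assumptions the convolution needs 640 parameters while the fully connected layer needs about 1.05 billion, which is why weight sharing matters for full-size CT images.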

In the “multi-processing” step in Figure 2, our method builds on the many research results reported for CT low-dose optimization using deep CNNs [12]. An FCNN is composed of multiple layers of connected, neuron-like computational units and requires little step-by-step hand-crafted processing, which is how it achieves significant improvements [12]. In an FCNN, each layer is completely connected to the layer above it, so there is no need to preserve spatial relationships. However, training an FCNN is computationally demanding and requires large data sets that may not be easily available. To address the usually long training times, a large community of machine learning engineers and programmers is working on more general and faster software platforms for DL use cases. Examples include Keras, PyTorch and Torch, which offer a practical interface and a fast, memory-efficient implementation for training and testing many deep learning architectures [12, 45]. Nowadays, almost every framework includes convolution, deconvolution, max pooling, fully connected layers, dropout and batch normalization, and almost all popular optimization methods are implemented. Because we do not have access to a powerful computer, we propose in this chapter a DNN-based architecture that uses Spark to accelerate and improve CT low-dose image reconstruction. We set up a cluster with one master node and two slave nodes to reduce computation time. After the FCNN-based CT low-dose image reconstruction step, we move to the image diagnosis step in Figure 2, where the expert views the new image and makes a suitable diagnosis. Low doses can thus be used to protect the health of the patient, while our pipeline still allows experts to make a correct diagnosis from the captured images.

3.2 Suggested FCNN method for CT scan image denoising

The development and training of FCNNs is still a subject of research. In our process, we use DL for low-dose CT image reconstruction. In what follows, we use the Spark big data framework with MapReduce to design an architecture for building effective and appropriate techniques to process low-dose CT scan images. In this part, we introduce the Spark architecture that handles the most important steps of the pipeline described in Figure 1. The main goal of this architecture is to show how CT images are processed for reconstruction. Apache Spark builds on MapReduce for parallel programming and extends it with a data-sharing abstraction called the Resilient Distributed Dataset (RDD) [4, 5]. Using Spark for DL has two main advantages: large-scale prediction and hyperparameter tuning [5]. In addition, the Spark framework provides easy-to-use APIs that enable DL in a few lines of code through its MLlib library. Figure 5 shows the Apache Spark architecture with its different layers, and Figure 6 shows the Spark architecture with FCNN for low-dose CT image optimization. Figure 6 shows how MapReduce programming and an FCNN with forward propagation and back propagation are used to train on the input image. For larger images, we need to divide the image into a set of image blocks. Digital CT scan images are usually large, which increases the complexity of processing. To overcome this complexity, we divide the image into many image blocks and process each part of the image independently. By using a parallel programming method such as MapReduce, we can process these blocks at the same time, thus saving processing time compared with traditional methods. Figure 6 outlines how the FCNN reconstructs low-dose CT images in Spark. In fact, the Spark architecture allows us to develop effective and appropriate techniques for handling a large number of images. Training our FCNN model on the Spark framework involves two main steps of MapReduce programming, the Map step and the Reduce step, which are repeated until the total error is small enough [4].
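The repeated Map and Reduce steps can be illustrated without a cluster. The sketch below is a simplified stand-in and not the chapter's training code: it uses Python's built-in `map` and `functools.reduce` in place of Spark's `rdd.map(...)` and `rdd.reduce(...)`, and a small linear model in place of the FCNN. All names (`map_step`, `reduce_step`, `train`) and hyperparameters are assumptions made for the example.

```python
from functools import reduce
import numpy as np

def map_step(partition, w):
    """Map: forward pass and gradient on one data partition."""
    X, y = partition
    residual = X @ w - y              # forward propagation (linear stand-in)
    grad = X.T @ residual             # gradient of 0.5 * sum(residual**2)
    return grad, float(residual @ residual), len(y)

def reduce_step(a, b):
    """Reduce: add up gradients, squared errors and sample counts."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def train(partitions, n_features, lr=0.1, tol=1e-6, max_iter=500):
    """Repeat Map and Reduce until the mean squared error is below tol."""
    w = np.zeros(n_features)
    mse = float("inf")
    for _ in range(max_iter):
        mapped = map(lambda p: map_step(p, w), partitions)
        grad, sq_err, n = reduce(reduce_step, mapped)
        mse = sq_err / n
        if mse < tol:
            break
        w -= lr * grad / n            # parameter update from the reduced gradient
    return w, mse

# Synthetic data split into three partitions, standing in for image blocks.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(300, 3))
y = X @ true_w
partitions = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 300, 100)]
w, mse = train(partitions, n_features=3)
```

The design point is that `map_step` only needs its own partition and the current weights, so partitions can be evaluated on different workers, while `reduce_step` is associative and can combine partial results in any order.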

Figure 5.

Apache Spark features.

Figure 6.

FCNN-based Spark MapReduce pipeline for low-dose CT image reconstruction.

The scheme in Figure 6 allows us to process many CT images at the same time and so to reduce the processing time. By combining the Spark framework with the DL architecture, the process of dose optimization in pediatric skull scans becomes complete, easy and fast.
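Processing many images at the same time can be illustrated on a single machine. In the sketch below, a thread pool from Python's standard library stands in for the Spark workers, and a 3 × 3 mean filter stands in for the trained FCNN; both substitutions, as well as the names `mean_filter` and `denoise_all`, are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def mean_filter(image):
    """Placeholder denoiser: 3 x 3 mean filter with edge replication."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    acc = np.zeros((h, w), dtype=np.float64)
    for dr in range(3):
        for dc in range(3):
            acc += padded[dr:dr + h, dc:dc + w]
    return acc / 9.0

def denoise_all(images, workers=4):
    """Apply the denoiser to every image concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mean_filter, images))

# Eight synthetic noisy images (flat signal plus Gaussian noise).
rng = np.random.default_rng(0)
images = [50.0 + rng.normal(0.0, 30.0, size=(64, 64)) for _ in range(8)]
denoised = denoise_all(images)
```

On a real cluster the same pattern applies, with the list of images replaced by a distributed collection and the thread pool by the cluster's executors.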


4. Results

In this section, we build on the architecture proposed in Figure 6 and implement our CT image noise reduction algorithm. Our goal is to use the FCNN to learn Eq. (1) by minimizing the loss function presented in [4]. We treat the images from Kaggle [46] as clean (ground-truth) images 𝑦𝑖. The data set covers 82 patients (37 women and 45 men) and contains 4615 CT images. However, due to limited computing capacity, we reduced the number of images. Figure 7 shows some noisy and clean images from the data set. For each clean image, we generate a noisy version by adding white Gaussian noise: 𝑥𝑖 = 𝑦𝑖 + b𝑖 (see Eq. (1)), where b𝑖 is an image of the same size as the CT image in which each pixel is an independent realization of a zero-mean Gaussian distribution with standard deviation σ = 30. Note that the roles of the symbols are swapped here relative to the description of Eq. (1): 𝑦𝑖 denotes the clean image and 𝑥𝑖 the noisy one.
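The exact objective is the one given in [4] and is not reproduced in this chapter. As an assumption for illustration only, a common choice for this kind of denoiser is the mean squared error between the network output and the clean image. Using the notation of this section, with $f_{\theta}$ a hypothetical name for the network with parameters $\theta$ and $N$ the number of training images, it reads:

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{N} \sum_{i=1}^{N}
  \bigl\lVert f_{\theta}(x_i) - y_i \bigr\rVert_2^{2},
\qquad
x_i = y_i + b_i, \quad
b_i \sim \mathcal{N}\!\left(0, \sigma^{2} I\right), \quad \sigma = 30 .
```

Minimizing this quantity pushes the network output for the noisy input $x_i$ towards the clean image $y_i$, which is how the network implicitly learns to invert the noise model.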

Figure 7.

Clean and noisy CT images.

Indeed, when the dose is reduced during a CT scan, the captured image is noisy. Here, we model the noise with a Gaussian distribution. Since the CT images have different sizes, we use random crops of size 180 × 180. As mentioned in [47], the initialization of the weights is very important when training the model. The training loss and training PSNR as a function of the number of epochs are also presented in this section. The PSNR is defined in [7, 48] by Eq. (2):

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right) \qquad (2)$$

where MAX is the maximum possible pixel value of the image and MSE is the mean squared error between the reference image and the processed image.

PSNR gives an objective measure of distortion; a higher PSNR (greater than 30 dB) indicates good image quality [7, 48]. Figure 8a and b respectively show the training loss and the training PSNR as a function of the number of epochs. In Figure 8a, the training loss approaches 0.001, which indicates that training is effective, and the training PSNR approaches 33 dB (Figure 8b). Therefore, our DL method can efficiently denoise CT scan images. This effect can be seen in Figure 9, where we show a noisy image and its denoised version. To implement this work, we used a computer running Ubuntu and Spark, working locally in a cluster that we built with one node. Table 1 presents a summary of our results.
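For reference, PSNR in its standard form, 10·log10(MAX²/MSE), can be computed as in the short sketch below. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions; CT data stored with a different dynamic range would need a different `max_value`.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the largest possible pixel value (assumed 255 here).
    Returns infinity when the two images are identical.
    """
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Sanity check on a case that can be worked out by hand:
# a uniform error of one grey level gives MSE = 1.
clean = np.full((180, 180), 120.0)
off_by_one = clean + 1.0
value = psnr(clean, off_by_one)   # 10 * log10(255**2), about 48.13 dB
```

Because the scale is logarithmic, halving the root mean squared error raises the PSNR by about 6 dB.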

Figure 8.

Results of model training: (a) training loss; (b) training PSNR.

Figure 9.

Noisy image and denoised image obtained from our model.

Number of epochs | Training Loss | PSNR (dB) | Training Time (s)

Table 1.

Summary of our Training simulation Model.


5. Discussion

Nowadays, new workflows, pipelines and architectures are regularly proposed to advance the field of biomedical imaging. This work proposes a workflow for low-dose CT image reconstruction relying on FCNN and Spark. The distinctive feature of our workflow is that it identifies the techniques, methods and algorithms that can be used in each design phase. By using the features of MapReduce, we can perform parallel processing on the proposed architecture. Based on the observations in the previous section, our proposed pipeline and architecture introduce a new concept for low-dose optimization in pediatric skull scans, and they can be customized and adapted to many other biomedical applications. To put our proposed architecture in context, we compared it with other architectures suggested in the literature. In [8], the author proposed an architecture based on FCNN for CT low-dose optimization. However, that architecture is not based on Spark, so it cannot process many biomedical images at the same time. As in [8], we use two main training steps. The first is forward propagation, in which low-quality images are passed through the network and the output is obtained by computing a series of convolutions. The second is backpropagation, in which the derivative of the loss function with respect to each network parameter is computed and the resulting gradient is used to update the parameters so as to reduce the loss. Similarly, in [49], the authors designed a DL architecture for CT reconstruction based on the plug-and-play framework and obtained good results. Nevertheless, they did not use DL for low-dose reconstruction; they used it only for image noise reduction. As mentioned in Section 2, none of the works in the literature relies on the Spark framework for low-dose CT reconstruction using DL.


6. Conclusion

Deep learning has shown encouraging results in clinical studies because it can produce high-quality reconstructions from reduced-dose CT scans while preserving their diagnostic usefulness. In this chapter, we outlined some important research in the field of low-dose CT optimization and studied the problem of low-dose CT reconstruction from the perspective of DL. We proposed a pipeline for low-dose image reconstruction using FCNN on the Spark framework. To design the pipeline, we conducted a literature review to determine the most suitable methods for low-dose CT image optimization, which allowed us to select an appropriate architecture for each stage of the pipeline. To illustrate the proposed method, we built a Spark architecture that uses FCNN for low-dose CT reconstruction. The results obtained demonstrate the efficiency and effectiveness of the proposed method. The training data greatly affect the noise reduction performance of the model, which is a common problem in discriminative learning methods. In the future, we will build our own data set to improve the process of CT scan image noise reduction. We will also try to use quantum computing with deep learning on a large data set in order to improve quantitatively on the work done in this chapter.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Clémence Alla Takam, Aurelle Tchagna Kouanou, Odette Samba, Thomas Mih Attia and Daniel Tchiotsop (May 4th 2021). Big Data Framework Using Spark Architecture for Dose Optimization Based on Deep Learning in Medical Imaging, Artificial Intelligence - Latest Advances, New Paradigms and Novel Applications, Eneko Osaba, Esther Villar, Jesús L. Lobo and Ibai Laña, IntechOpen, DOI: 10.5772/intechopen.97746.
