Open access peer-reviewed chapter

Effective Screening and Face Mask Detection for COVID Spread Mitigation Using Deep Learning and Edge Devices

Written By

Xishuang Dong, Lucy Nwosu, Sheikh Rufsan Reza and Xiangfang Li

Reviewed: 12 September 2023 Published: 16 November 2023

DOI: 10.5772/intechopen.113176

From the Edited Volume

Internet of Things - New Insights

Edited by Maki K. Habib


Abstract

The emergence of COVID-19, stemming from the SARS-CoV-2 virus, has led to a widespread outbreak affecting countless individuals and inducing dire circumstances globally. Mitigating the transmission of COVID-19 has necessitated the implementation of effective measures such as rigorous COVID screening and physical safeguards, including practices like social distancing and the utilization of face masks. Notably, the application of advanced technologies such as deep learning, a subset of artificial intelligence (AI), has played a pivotal role in devising novel strategies for both detecting COVID-19 and curbing its propagation. This chapter presents a comprehensive overview of COVID screening methodologies based on deep learning, with a specific focus on biomedical image processing and the detection of face masks. Furthermore, it delves into initial endeavors concerning COVID image analysis and the creation of a mobile face mask detection system, designed to operate on edge devices. The ensuing discussions encompass detailed case studies, showcasing the practical implications and efficacy of these initiatives.

Keywords

  • COVID-19 pandemic
  • chest X-ray imaging
  • deep learning
  • face mask detection
  • Internet of Things

1. Introduction

The global pandemic caused by Coronavirus disease 2019 (COVID-19) continues to persist across more than 200 countries and territories, evoking significant apprehension within the international community [1, 2]. This crisis has engendered profound human losses and economic ramifications, reshaping lives in many countries through the implementation of lockdown measures. To curb the dissemination of COVID-19 and curtail its associated mortality, early detection of the disease assumes critical importance. Effective and prompt screening and testing are pivotal for the proficient management of individuals afflicted by COVID-19 [3, 4]. The evolution of advanced techniques has led to the proposition of increasingly efficient screening technologies aimed at attenuating the transmission of COVID-19.

1.1 COVID-19 screening

The goal of a COVID-19 screening test is to identify potential cases of the disease in individuals who are asymptomatic. This proactive approach aims to mitigate the spread of COVID-19 by detecting infections early, allowing for timely and effective treatment. During the initial stages of the COVID-19 pandemic, the primary method for detecting viral infections was the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test [5]. However, the effectiveness of the RT-PCR assay has been questioned due to its limited sensitivity [6], which can be attributed to various factors like sample preparation and quality control issues [7]. Furthermore, the current nucleic acid tests’ sensitivity necessitates repeated testing for a significant portion of suspected patients to achieve a reliable diagnosis. This underscores the need for the development of a complementary tool capable of providing lung-imaging information. Such a tool would serve as an invaluable resource for medical professionals, aiding them in enhancing the accuracy of COVID-19 diagnoses.

Chest X-rays and thoracic computed tomography (CT) scans are readily available imaging tools that offer significant support to medical practitioners in diagnosing lung-related ailments [8, 9, 10]. The application of artificial intelligence (AI) in enhancing image analysis of chest X-rays and thoracic CT data has garnered substantial interest, particularly in the development of effective COVID-19 screening techniques. AI, a burgeoning technology in the realm of medical imaging, has played a dynamic role in the battle against COVID-19 [11]. This is in contrast to traditional imaging processes that heavily rely on human interpretation, as AI offers imaging solutions that are safer, more accurate, and more efficient. Notably, the utilization of deep learning for data representation has exhibited remarkable success in image processing [12]. Convolutional neural networks (CNNs) [13, 14, 15, 16, 17, 18] have effectively tackled the challenge of representing digital images, particularly on extensive datasets such as the ImageNet dataset [19]. These advances demonstrate the potential of deep learning in transforming biomedical image analysis.

The potency of deep learning methods, such as Convolutional Neural Networks (CNNs), has been prominently demonstrated in the classification of COVID-19 cases. Ghoshal et al. [20] introduce a Bayesian Convolutional Neural Network designed to estimate diagnostic uncertainty in COVID-19 predictions. This approach incorporates 70 lung X-ray images from COVID-19 patients sourced from an online COVID-19 dataset [21], as well as non-COVID-19 images from Kaggle’s Chest X-Ray Images (Pneumonia), where Bayesian inference is employed to enhance detection accuracy. Narin et al. [10] focus on COVID-19 infection detection using X-ray images, employing a comparative analysis of three deep learning models: ResNet50, InceptionV3, and Inception-ResNetV2. The evaluation results indicate that the ResNet50 model surpasses the performance of the other two models. Zhang et al. [22] similarly harness the ResNet architecture for COVID-19 classification using X-ray images. An anomaly score is estimated to optimize the COVID-19 score, which in turn is used for classification. Wang et al. [23] introduce COVID-Net, a framework tailored for the detection of COVID-19 cases through X-ray images. Primarily, most ongoing studies employ X-ray images for discriminating between COVID-19 cases and other instances of pneumonia and healthy subjects. However, the limited quantity of available COVID-19 images raises concerns regarding the methods’ robustness and generalizability, urging further investigation.

Furthermore, it is of utmost importance to delineate the areas affected by COVID-19 infection, as this yields comprehensive insights crucial for accurate diagnosis. Semantic segmentation plays a pivotal role in identifying and quantifying COVID-19 by recognizing regions and associated patterns. This technique enables the assessment of regions of interest (ROIs) encompassing lung structures, lobes, bronchopulmonary segments, and infected regions or lesions within chest X-ray or CT images. Extracting handcrafted or learned features for diagnosis and other applications becomes feasible through the use of segmented regions. The advancement of deep learning has significantly propelled the evolution of semantic image segmentation. In the context of CT scans, the networks employed for COVID-19 include established models like U-Net [24, 25, 26], UNet++ [27], and VB-Net [28] to segment ROIs. Furthermore, segmentation approaches for COVID-19 can be categorized into two main groups: those oriented toward lung regions and those aimed at lung lesions. The former group focuses on distinguishing lung regions, encompassing entire lungs and lung lobes, from surrounding (background) regions in CT or X-ray images [29, 30]. For instance, Jin et al. [29] utilize UNet++ to detect the entire lung region. The latter group seeks to isolate lung lesions (or artifacts such as metal and motion) from lung regions [31, 32]. In addition to screening techniques, physical solutions for mitigating the spread of COVID-19 also hold efficacy.

1.2 Face mask detection

An effective physical measure to counter the spread of COVID-19 is the utilization of face masks in public settings [33]. Figure 1 illustrates the varying transmission risks between an infected individual and an uninfected person. When an infected person does not wear a mask, the risk of transmitting the virus to an uninfected person is substantially high, as depicted in the first row. This risk diminishes to a moderate level if either of them wears a face mask (depicted in the second row). The lowest risk of infection occurs when both individuals are wearing masks [34], as depicted in the third row. Thus, wearing face masks effectively reduces the spread of COVID-19. However, ensuring universal adherence to mask-wearing mandates poses challenges. AI-driven face mask detection [35] has emerged as a technique to identify compliance with face mask requirements and can serve as a reminder for those not wearing masks. Developing a face mask detection system from scratch presents challenges due to the scarcity of labeled images. As a result, deep transfer learning [36] offers a promising solution to this predicament. This technique involves adapting pre-trained models to the task of face mask detection. The complete process of constructing face mask detection models through deep transfer learning is delineated in Figure 2.

Figure 1.

Different risks of transmission between an infected person (left column) and an uninfected person (right column).

Figure 2.

The flow of building a face mask detector.

During the model training phase, a limited set of annotated images is loaded as training data. These images are then used to fine-tune pre-trained deep learning models, ultimately creating a robust face mask detector. In the testing phase, datasets with labeled ground truth are loaded. The face mask detector is applied to this data, and its performance is evaluated using predefined metrics. Subsequent sections delve into comprehensive details regarding COVID-19 screening methodologies and face mask detection on edge devices, all supported by illustrative case studies.

The structure of this chapter is as follows: In Section 2, we provide an overview of previous research on AI-driven COVID-19 screening techniques and face mask detection. Moving to Section 3, we delve into specific case studies, presenting associated outcomes that highlight effective screening and face mask detection strategies. These endeavors leverage deep learning and edge devices to mitigate the spread of COVID-19. Finally, Section 4 encapsulates the conclusions and outlines avenues for future research.


2. Related work

2.1 COVID-19 screening via AI-enhanced image processing

The advancement of AI-driven image processing has played a pivotal role in significantly advancing COVID-19 screening techniques, encompassing both COVID-19 classification and COVID-19 segmentation. Considerable attention has been directed toward the classification of COVID-19 cases versus non-COVID-19 cases, with a focus on employing deep learning models. These models aim to effectively differentiate between COVID-19 patients and non-COVID-19 subjects, wherein the latter group includes individuals with common pneumonia and those without pneumonia. The diagram depicted in Figure 3 illustrates the process of COVID-19 classification on a chest X-ray image. The COVID-19 classifier receives a chest X-ray image as input and provides an output that classifies the image as either indicating COVID-19 or non-COVID-19 status.

Figure 3.

A diagram of COVID-19 classification on chest X-ray images.

Considerable research efforts have been directed toward COVID-19 classification. Chen et al. [27] pursued COVID-19 classification using segmented lesion patterns extracted via UNet++. Their dataset included diverse patient cases, such as COVID-19 patients, viral pneumonia patients, and non-pneumonia patients. Given the visual similarity between common pneumonias, particularly viral pneumonia, and COVID-19, distinguishing these conditions becomes crucial for effective clinical screening. To address this, a 2D CNN model was proposed, employing manually delineated region patches for classification between COVID-19 and typical viral pneumonia. Additionally, Wang et al. [37] combined segmentation information with a proposed 2D CNN model to classify COVID-19 cases by considering handcrafted features of relative infection distance from the lung’s edge. Xu et al. [38] utilized candidate infection regions segmented by V-Net. They combined these region patches with handcrafted features representing the distance from the edge of the region to perform COVID-19 classification using a ResNet-18 model. Zheng et al. [24] employed U-Net for lung segmentation and utilized 3D CNNs to predict COVID-19 probabilities based on the segmented features. Their dataset consisted solely of chest CT images of COVID-19 and non-COVID-19 cases. Similarly, Jin et al. [29] introduced a UNet++-based segmentation model to identify lesions and a ResNet50-based classification model for diagnosis. Their larger dataset encompassed chest CT images of 1136 cases, including 723 COVID-19 positives and 413 COVID-19 negatives. In another work, Jin et al. [39] employed a 2D DeepLab v1 model for lung segmentation and a 2D ResNet152 model for slice-based identification of positive COVID-19 cases. In summary, ongoing efforts in COVID-19 classification primarily focus on learning from significant volumes of medical images. However, the application of these techniques is hindered by the considerable data requirements for building effective classifiers.

The segmentation of COVID-19 cases is achieved through image semantic segmentation techniques bolstered by deep learning models such as U-Net and V-Net [40, 41]. An illustrative instance of image semantic segmentation on a chest X-ray image is presented in Figure 4. This approach enables the precise identification and isolation of COVID-19-affected regions within medical images.

Figure 4.

An example of image semantic segmentation on a chest X-ray image.

The U-Net architecture is a powerful tool for segmenting both lung regions and lung lesions, facilitating the construction of effective image segmentation models [31]. U-Net, developed using a fully convolutional network [42], features a distinctive U-shaped design encompassing two symmetric paths: an encoding path and a decoding path. The layers at the corresponding levels in these two paths are interconnected through shortcut connections, fostering the acquisition of improved visual semantics and intricate contextual details. Zhou et al. [43] introduced UNet++, which inserts a nested convolutional structure between the encoding and decoding paths, further enhancing segmentation performance. In a similar vein, Milletari et al. [44] developed the V-Net, employing residual blocks as fundamental convolutional units and optimizing the network using a Dice loss. Furthermore, Shan et al. [28] devised VB-Net, enhancing segmentation efficiency by incorporating convolutional blocks with bottleneck blocks. Numerous variations of the U-Net architecture and its derivatives have been explored, yielding promising segmentation outcomes in COVID-19 diagnosis [27]. Advanced attention mechanisms are incorporated to identify the most discriminant features within deep learning models. Oktay et al. [45] introduced the Attention U-Net, capable of capturing intricate structures in medical images, rendering it suitable for segmenting lesions and lung nodules in COVID-19 applications. The integration of COVID-19 classification and segmentation empowers the implementation of multifaceted screening techniques across different levels.
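To make the encoder-decoder structure concrete, the following is a minimal Keras sketch of a U-Net-style segmentation network; the depth, filter counts, and input size are illustrative assumptions rather than the configurations used in the cited works.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the original U-Net encoding/decoding blocks
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(shape=input_shape)

    # Encoding path: each level halves the spatial resolution
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = conv_block(p2, 128)

    # Decoding path: upsample and concatenate with the symmetric
    # encoder level (the shortcut connections described above)
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 32)

    # Per-pixel sigmoid for a binary lesion-vs-background mask
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The concatenations implement the shortcut connections between the corresponding encoder and decoder levels that foster the improved visual semantics noted above.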

2.2 Face mask detection

Face mask detection has been a subject of extensive research, and these efforts can be broadly categorized into two classes. The first approach treats it as an object detection task, where the goal is to localize the face mask area using bounding boxes. For instance, Jiang et al. [46] proposed a one-stage face mask detector that employs a pre-trained ResNet for transfer learning and a feature pyramid network to extract semantic information. They introduced a novel context attention mechanism to enhance the detection of mask features. Similarly, Loey et al. [47] presented an object detection process using a combination of ResNet and YOLO V2. Another approach views face mask detection as an image classification problem [35]. Researchers in this category employed various convolutional neural networks (CNNs) such as MobileNet [48], Inception V3 [49], VGG-16 [50], and ResNet [51]. MobileNet, designed for edge devices, operates at high speed due to its smaller model size and complexity. Inception V3 utilizes factorizing convolutions to maintain robustness while reducing connections. VGG-16 explores the impact of depth on accuracy in large-scale image classification tasks.

In addition to training models from scratch, some researchers use pre-trained face detectors to extract faces, and then apply mask detection classification models on the detected faces [52, 53]. For example, Lippert et al. utilize OpenCV’s pre-trained face detector and a VGG-16-based classifier for face mask detection. The deployment of machine learning models on resource-constrained devices has gained popularity, as executing models locally is often preferable to sending data to the cloud due to issues such as limited bandwidth and privacy concerns.

In summary, face mask detection has been tackled through two main avenues: object detection using bounding boxes and image classification using various CNN architectures. These diverse approaches aim to enhance the accuracy and efficiency of detecting face mask presence and adherence.

2.3 Edge device

Edge computing becomes relevant when machine learning models must be executed on small IoT or edge devices. However, executing such models on these devices is not straightforward due to their computational intensity: models must be lightweight or otherwise compatible with the devices to be feasible. NVIDIA Jetson TX2 and Nano are popular test devices; both serve as embedded AI computing solutions. The Jetson TX2 features an NVIDIA Pascal GPU with 8 GB of memory, while the Nano is equipped with a Maxwell GPU and 4 GB of memory. Both devices are well-suited for computer vision and machine learning tasks. Figure 5 depicts the devices used in this chapter, and Table 1 provides a detailed comparison between the two devices.

Figure 5.

NVIDIA Jetson Nano (left) and NVIDIA Jetson TX2 (right).

         | Jetson TX2                           | Jetson Nano
CPU      | Dual-core NVIDIA Denver 2 64-bit CPU | Quad-core ARM Cortex-A57 MPCore processor
GPU      | 256-core Pascal @ 1300 MHz           | NVIDIA Maxwell architecture with 128 NVIDIA CUDA® cores
Memory   | 8 GB 128-bit LPDDR4                  | 4 GB 64-bit LPDDR4
Storage  | 32 GB eMMC 5.1                       | 16 GB eMMC 5.1

Table 1.

Comparison between NVIDIA Jetson TX2 and NVIDIA Jetson Nano.
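When deploying on Jetson hardware, a trained model is typically optimized first. The sketch below outlines one common route, TensorFlow-TensorRT (TF-TRT) conversion of a SavedModel to half precision; the directory names are placeholders, and the converter API has varied across TensorFlow versions, so this should be read as an assumption-laden outline rather than a verified recipe.

```python
# One possible TF-TRT conversion flow (API details differ between
# TensorFlow versions; both directory names are placeholders).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="face_mask_detector_savedmodel",
    precision_mode=trt.TrtPrecisionMode.FP16,  # FP16 suits the Jetson GPUs above
)
converter.convert()                            # build the TensorRT-optimized graph
converter.save("face_mask_detector_trt")       # save for inference on the device
```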


3. Case study

3.1 Case study on COVID-19 classification

In the context of this chapter, chest X-ray images were selected for both model development and validation. This choice was driven by the cost-effectiveness and greater accessibility of chest X-rays compared to CT scans, particularly in communities with limited medical resources. Additionally, chest X-rays offer a swift imaging solution, making them particularly attractive for large-scale patient screening during the COVID-19 pandemic. Utilizing chest radiography for screening COVID-19 patients is therefore considered a practical, efficient, and rapid approach [54, 55]. For validation purposes, this chapter utilized the comprehensive chest X-ray dataset “COVIDx” [23], which comprises 18,543 chest radiography images from 13,725 unique cases. It is important to note that when evaluating the distribution of classes between the training and testing datasets, a significant dissimilarity in class distribution becomes apparent.

This chapter employs two types of deep learning models: Convolutional Neural Networks (CNNs) and ResNet. CNNs have played a significant role in advancing various visual processing tasks like image classification [56], object detection and tracking [57, 58], and semantic segmentation [59]. The progress of CNNs has been facilitated by large datasets like ImageNet [56] and YouTube-BoundingBoxes [60], which provide ample training data for building large-scale models. The general architecture of a CNN for image classification is depicted in Figure 6. State-of-the-art CNN architectures such as AlexNet [56], VGG [50], and GoogLeNet [61] have propelled advancements in image classification. These architectures leverage millions of annotated samples from large datasets to successfully estimate appropriate parameters. Furthermore, CNNs have been enhanced by combining them with other deep learning models. For example, Wang et al. [62] combined CNN with Recurrent Neural Networks (RNNs) for multi-label image classification. Additionally, CNNs combined with autoencoders [63, 64] have demonstrated effectiveness in tasks like face detection. In this chapter, three small CNNs were trained and tested on a subset of the COVIDx dataset to demonstrate a proof-of-concept experiment.

Figure 6.

Convolutional neural network architecture for image classification.
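The compact classifiers evaluated here follow the general pattern of Figure 6. The Keras sketch below is illustrative only (layer sizes and depth are assumptions, not the exact CNN1-CNN3 architectures), with a three-way softmax over the normal, pneumonia, and COVID-19 classes.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A compact CNN in the spirit of CNN1-CNN3; deeper variants add more
# Conv2D/MaxPooling2D stages to extract more intricate features.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # normal / pneumonia / COVID-19
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```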

ResNet [51] is an artificial neural network architecture inspired by the structure of pyramidal cells in the cerebral cortex. It introduces skip connections, or shortcuts, which allow the network to bypass certain layers. The concept behind ResNet is that training a network to learn a residual mapping is simpler than training it to directly learn the underlying mapping. This is achieved using residual blocks, as depicted in Figure 7. A crucial modification in ResNet compared to a standard CNN is the “skip connection” for identity mapping. This identity mapping has no parameters; it simply adds the output from the previous layer to the next layer. However, the dimensions of x and F(x) might differ. Since convolutions usually reduce spatial resolution (e.g., a 3×3 convolution without padding on a 32×32 image results in a 30×30 image), the identity mapping is expanded with a linear projection W so that its channels match those of the residual, allowing the input x and F(x) to be combined as input to the subsequent layer. Given the effectiveness of ResNet, various ResNet architectures are employed in this chapter to screen for COVID-19 using the complete COVIDx dataset.

Figure 7.

General architecture of the residual block of ResNet [51].
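The residual block of Figure 7 can be sketched in a few lines of Keras. In this illustrative version (filter counts are assumptions), a 1×1 convolution plays the role of the linear projection W whenever the shapes of x and F(x) disagree.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, strides=1):
    # F(x): two convolutions with batch normalization
    fx = layers.Conv2D(filters, 3, strides=strides, padding="same")(x)
    fx = layers.BatchNormalization()(fx)
    fx = layers.ReLU()(fx)
    fx = layers.Conv2D(filters, 3, padding="same")(fx)
    fx = layers.BatchNormalization()(fx)

    # Identity shortcut when shapes match; otherwise a 1x1 convolution
    # acts as the linear projection W so that x and F(x) can be added.
    if strides != 1 or x.shape[-1] != filters:
        x = layers.Conv2D(filters, 1, strides=strides, padding="same")(x)

    return layers.ReLU()(layers.Add()([x, fx]))
```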

This chapter examines the proposed models from two distinct viewpoints. The first perspective involves assessing their capability to effectively identify COVID-19 cases from a limited dataset, using compact CNNs. The second perspective entails investigating whether these models can utilize the ResNet architecture to identify COVID-19 cases within an extensive dataset, all without relying on transfer learning techniques. For the small dataset scenario, a subset of 350 images was extracted from the original COVIDx dataset [23] for training and testing. Among these, 300 images were allocated for training, while the remaining 50 were reserved for testing. Three small CNNs were employed for this evaluation. The obtained training and testing accuracies for these three models are depicted in Figure 8. The results indicate that the initial shallow CNN (referred to as CNN1) encountered issues with under-fitting. However, as additional layers were incorporated to extract more intricate features, CNN3 exhibited superior accuracy performance.

Figure 8.

Performance on training and testing accuracy for three small CNNs.

In the context of a large dataset, all images encompassed within the COVIDx dataset were harnessed to establish a classifier for COVID-19 identification. The initial step encompassed data preprocessing, involving the compression of images. The original X-ray images within the dataset measured 1024×1024×3 in dimensions. To expedite the training process, the image size was compressed to 64×64×3. Subsequently, training was initiated using this preprocessed dataset. The outcomes of the training and testing phases are depicted in Figure 9.
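Assuming the X-ray images are loaded as a single float tensor (the variable name images is a placeholder), this compression step amounts to a one-line resize:

```python
import tensorflow as tf

# Compress 1024x1024x3 X-ray images to 64x64x3 to speed up training;
# 'images' is assumed to be an N x 1024 x 1024 x 3 float tensor.
images_small = tf.image.resize(images, size=(64, 64))
images_small = images_small / 255.0  # scale pixel values to [0, 1]
```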

Figure 9.

Performance comparison on training and testing.

The findings unveil the existence of an optimal configuration, specifically ResNet-34, which outperforms the alternative models. Beyond ResNet-34, performance declines consistently as the number of ResNet layers increases to 152. This can be attributed to the phenomenon where, as layers are progressively added, the models tend to overfit while the available data remains inadequate to train them effectively. It is noteworthy that although training time extended slightly with the inclusion of additional layers in the ResNet models, no corresponding performance gains materialized. Furthermore, the results underscore that this chapter achieves commendable performance by training ResNet models from scratch, without reliance on transfer learning techniques.

While supervised deep learning demonstrates impressive performance in classifying COVID-19 images, its practical application is hindered by the need for a substantial volume of annotated medical images for training. Given the limitations in available COVID-19-related data resources and the significant costs associated with labeling medical images, this approach becomes less feasible, further exacerbated by labeling inaccuracies that may arise [65]. To address this challenge, the focus has shifted toward semi-supervised deep learning, which has garnered considerable attention due to its capacity to enhance model generalization by leveraging both labeled and unlabeled data [66, 67, 68, 69]. This paradigm involves training deep neural networks through the simultaneous optimization of a standard supervised classification loss on labeled samples and an unsupervised loss on unlabeled data [67, 69]. Semi-supervised learning models aim to amplify the information derived from unlabeled data [70] or impose regularization on the network to enforce smoother and more consistent classification boundaries [68].

In the realm of COVID-19 research, particularly in tasks like COVID-19 image classification and image segmentation, semi-supervised learning has emerged as a solution to mitigate the scarcity of labeled data [71, 72, 73, 74, 75, 76]. However, within the domain of COVID-19 image classification, the studies conducted by Zhou et al. [76], Calderon et al. [72], and Paticchio et al. [74] have not thoroughly examined model performance using large-scale X-ray image datasets such as COVIDx [23]. Furthermore, they have not conducted comprehensive comparisons against state-of-the-art methods, particularly in scenarios where labeled data is severely limited, constituting less than 10% of the dataset. In response, this chapter introduces a semi-supervised deep learning model for COVID-19 image classification, systematically evaluating its performance on the COVIDx dataset [23].

Using the ResNet architecture, this chapter devised a two-path semi-supervised learning ResNet (referred to as SSResNet), which comprises three key components: a shared ResNet, a supervised ResNet, and an unsupervised ResNet. These two paths are formed by coupling the shared ResNet with either a supervised ResNet or an unsupervised ResNet. Both labeled and unlabeled data are leveraged in the computation of the unsupervised loss, utilizing the mean squared error loss (MSEL). Conversely, only labeled data contributes to the calculation of the supervised loss, employing the cross-entropy loss (CEL). To counterbalance data imbalance, a weighted cross-entropy loss (WCEL) was ingeniously crafted, assigning greater weight to the COVID-19 class. The primary objective of minimizing MSEL lies in enhancing image representation, while the reduction of WCEL aims to boost classification performance. For a comprehensive outline of the methodology, refer to Algorithm 1. The efficacy of the proposed model was thoroughly assessed using the extensive X-ray image dataset COVIDx. The experimental findings distinctly establish that the proposed model excels in the realm of COVID-19 image classification. Notably, even when trained with a notably limited quantity of labeled X-ray images, the model showcases remarkable performance.

Algorithm 1. Learning of Semi-supervised ResNet (SSResNet).

Require: training sample $x_i$, the set of labeled training samples $S$, label $y_i$ for $x_i$ ($i \in S$)

  1. for $t$ in $[1, \mathrm{num\_epochs}]$ do

  2.   for each minibatch $B$ do

  3.     $z_{i \in B} \leftarrow f_{\theta_{\mathrm{shared}}}(x_{i \in B})$  ▹ shared representation

  4.     $z^{\mathrm{sup}}_{i \in B} \leftarrow f_{\theta_{\mathrm{sup}}}(z_{i \in B})$  ▹ supervised representation

  5.     $z^{\mathrm{unsup}}_{i \in B} \leftarrow f_{\theta_{\mathrm{unsup}}}(z_{i \in B})$  ▹ unsupervised representation

  6.     $l^{\mathrm{WCEL}}_{B} \leftarrow -\frac{1}{|B|} \sum_{i \in (B \cap S)} w_i \log \phi(z^{\mathrm{sup}}_i)[y_i]$  ▹ supervised loss component (labeled samples only)

  7.     $l^{\mathrm{MSEL}}_{B} \leftarrow \frac{1}{C|B|} \sum_{i \in B} \lVert z^{\mathrm{sup}}_i - z^{\mathrm{unsup}}_i \rVert^2$  ▹ unsupervised loss component ($C$ classes)

  8.     $\mathrm{Loss} \leftarrow l^{\mathrm{WCEL}}_{B} + \lambda \, l^{\mathrm{MSEL}}_{B}$  ▹ total loss

  9.     update $\theta_{\mathrm{shared}}$, $\theta_{\mathrm{sup}}$, $\theta_{\mathrm{unsup}}$ using an optimizer, e.g., Adam

return $\theta_{\mathrm{shared}}$, $\theta_{\mathrm{sup}}$, $\theta_{\mathrm{unsup}}$
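A minimal TensorFlow rendering of one training step of Algorithm 1 might look as follows; the three sub-networks (shared_net, sup_net, unsup_net), the class-weight vector, and lam (λ) are stand-ins for the exact components described above, so this is a sketch rather than the authors' implementation.

```python
import tensorflow as tf

def train_step(x_labeled, y_labeled, x_unlabeled,
               shared_net, sup_net, unsup_net,
               class_weights, lam, optimizer):
    """One step of Algorithm 1 (sketch); sub-networks are assumed Keras models."""
    x_all = tf.concat([x_labeled, x_unlabeled], axis=0)
    n_lab = tf.shape(x_labeled)[0]
    with tf.GradientTape() as tape:
        z = shared_net(x_all, training=True)      # shared representation
        z_sup = sup_net(z, training=True)         # class probabilities phi(z_sup)
        z_unsup = unsup_net(z, training=True)     # unsupervised representation

        # WCEL: weighted cross-entropy on labeled samples only, with a
        # larger weight on the minority COVID-19 class.
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            y_labeled, z_sup[:n_lab])
        wcel = tf.reduce_mean(tf.gather(class_weights, y_labeled) * ce)

        # MSEL: mean squared error between the two paths on all samples.
        msel = tf.reduce_mean(tf.square(z_sup - z_unsup))

        loss = wcel + lam * msel                  # total loss of Algorithm 1

    variables = (shared_net.trainable_variables +
                 sup_net.trainable_variables +
                 unsup_net.trainable_variables)
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```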

Upon analyzing the class distribution disparities between our training and testing datasets, a noteworthy distinction came to light. Consequently, we undertook the task of reconstructing the data structure. This involved partitioning the dataset into distinct training and testing subsets, ensuring that their class distributions closely aligned. Specifically, 70% of the data was allocated for the training dataset, while the remaining 30% constituted the testing dataset. For a comprehensive breakdown of the reconstructed dataset, including sample distribution details, please refer to Table 2.

Dataset  | Normal | Pneumonia | COVID-19 | Total
Training | 6195   | 6708      | 75       | 12,978
Testing  | 2656   | 2876      | 33       | 5565
Total    | 8851   | 9584      | 108      | 18,543

Table 2.

Sample distribution in different classes for training and testing datasets.

It is evident that the distribution of samples in our dataset is highly skewed, particularly in relation to the COVID-19 class. This imbalance presents a significant hurdle in achieving a high-performing classifier. To surmount this issue, our proposed model employs a weighted cross-entropy loss: greater weight is assigned to the minority class (COVID-19) throughout the training process, as described for the WCEL above.
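The chapter does not report its exact weight values; one common scheme, sketched below, sets each class weight inversely proportional to its frequency in Table 2, which gives the 75 COVID-19 training samples by far the largest weight.

```python
import numpy as np

# Training counts from Table 2: normal, pneumonia, COVID-19
counts = np.array([6195, 6708, 75], dtype=np.float64)

# Inverse-frequency weights, normalized so they average to 1.
weights = counts.sum() / (len(counts) * counts)
print(weights)  # COVID-19 receives a weight of roughly 58; the others ~0.6-0.7
```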

In the experiment, the key hyperparameters for training the proposed model, determined by trial and error, are: minibatch size 256, number of epochs 50, Adam optimizer, and initial learning rate 0.1. The details of the model architecture are given in Table 3. We employ COVID-Net as a baseline supervised model representing the state of the art in COVID-19 image classification. Furthermore, we compare the proposed model with SRC-MT, the state-of-the-art semi-supervised learning method, since it outperformed the Π model and the mean teacher model in medical image classification.

Name                | Description
Input               | Medical images
Shared ResNet       | One convolutional layer, 2 residual blocks, batch normalization, one pooling layer
Supervised ResNet   | One convolutional layer, 2 residual blocks, batch normalization, one pooling layer
Unsupervised ResNet | One convolutional layer, 2 residual blocks, batch normalization, one pooling layer
Output              | Image class φ(z^sup) and a new representation z^unsup

Table 3.

The proposed network architecture.

Table 4 showcases a comprehensive comparison of the classification performance between SRC-MT and the newly introduced model (SSResNet). On the whole, the overall accuracies attained by SRC-MT surpass those achieved by the proposed SSResNet. Nonetheless, an interesting observation emerges when considering scenarios where merely 5% of labeled samples were utilized for training. In this specific scenario, the MacroF metric of SSResNet surpasses that of SRC-MT. This outcome indicates that the proposed model exhibits greater efficacy in identifying COVID-19 samples. Essentially, the utilization of the unsupervised path in SSResNet appears to remarkably enhance data representation, thereby leading to a more pronounced improvement in COVID-19 classification performance compared to SRC-MT.

Model         | Accuracy (%) | MacroP (%) | MacroR (%) | MacroF (%)
SRC-MT (5%)   | 90.67        | 61.08      | 60.75      | 60.59
SRC-MT (7%)   | 89.82        | 89.92      | 74.13      | 78.95
SRC-MT (9%)   | 92.79        | 93.61      | 79.15      | 84.15
SSResNet (5%) | 84.95        | 61.18      | 66.76      | 62.41
SSResNet (7%) | 84.21        | 63.67      | 67.85      | 62.83
SSResNet (9%) | 81.79        | 59.34      | 70.99      | 59.19

Table 4.

Comparing performance between SRC-MT and proposed model (semi-supervised ResNet (SSResNet)).

Furthermore, this study delved into a granular examination of per-class performance, elucidating the outcomes through confusion matrices, as depicted in Figure 10. A noteworthy observation emerges from this analysis: the proposed model outperforms SRC-MT in recognizing COVID-19 cases. This underscores the capability of SSResNet to glean more effective features from unlabeled data, resulting in a heightened ability to discern COVID-19 samples. Notably, as the ratio of labeled data is incrementally augmented, there is a marked improvement in the accuracy of COVID-19 recognition. This supports the proposition that the inclusion of an unsupervised path elevates image representations and consequently bolsters classification performance. In essence, the integration of unlabeled data distinctly contributes to a substantial enhancement in COVID-19 classification performance, primarily attributable to the amplification of image representations facilitated by the unsupervised path within the SSResNet.

Figure 10.

Comparison of confusion matrix generated with SRC-MT and SSResNets trained on different ratios of labeled data.
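The per-class analysis behind Figure 10, along with the macro-averaged scores reported in Table 4, can be reproduced with a few scikit-learn calls; the label arrays below are placeholders standing in for a model's actual predictions.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Integer class labels: 0 = normal, 1 = pneumonia, 2 = COVID-19 (an assumed ordering)
y_true = [0, 0, 1, 1, 2, 2]   # placeholder ground truth
y_pred = [0, 1, 1, 1, 2, 0]   # placeholder model predictions

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
macro_p, macro_r, macro_f, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")    # MacroP, MacroR, MacroF as in Table 4
print(cm)
print(macro_p, macro_r, macro_f)
```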

3.2 Face mask detection

This chapter also introduces the process of constructing a mobile model designed for face mask detection on edge devices. The primary goal of this endeavor is to develop an intelligent Internet of Things (IoT) device capable of performing real-time video processing to ascertain whether an individual is wearing a face mask in public settings. This technology finds practical application in scenarios such as enforcing mask-wearing within facilities or buildings. In this context, a smart IoT camera functions as a vigilant observer, detecting instances where individuals are not complying with the mask mandate and triggering alarms as necessary. To achieve this, mobile devices like FPGAs, integrated into the camera system, execute the face mask detection models. For a visual representation of this concept, refer to Figure 11, which presents an illustrative diagram depicting the potential of employing face mask detection to control access to a door.

Figure 11.

A diagram of a face mask alarm system. A face mask alarm mounted on the door consists of a camera, a face mask detector, and a converter. The camera sends the person's image to the face mask detector, which consists of mobile GPUs and artificial neural networks. The detector then checks for a face mask in the image by running the neural networks on the mobile GPUs and sends the detection result. Finally, if the result indicates that no face mask is present, the converter plays a speech reminder such as “Face Mask Required” to the person.

Instead of embarking on the construction of models from scratch, the approach taken involves the utilization of pre-trained models that have undergone training using more extensive datasets across broader classes. This pre-trained ensemble comprises four distinct convolutional neural networks (CNNs): MobileNet V2, Inception V3, VGG 16, and ResNet 50. Leveraging transfer learning, as opposed to constructing models from scratch, often leads to superior performance. These pre-trained models have been honed on the comprehensive ImageNet dataset [56]. The salient attributes extracted from these pre-trained models serve as the foundation, conveyed to a novel classifier positioned at the terminal end of the network. Subsequently, a mask detection classifier is trained on top of the pre-trained model. A pivotal distinction discernible among these models is their input size. Inception V3 adopts an image size of 299×299×3, while the other three models employ a size of 224×224×3. The efficacy of these models is rigorously verified through inference runs on mobile devices, encompassing the NVIDIA Jetson TX2 and NVIDIA Jetson Nano platforms.

Dataset and experiment setup are presented below.

  • Dataset: We utilized a publicly available dataset (https://github.com/chandrikadeb7/Face-Mask-Detection), which consists of 3890 images categorized into two classes: with mask and without mask. Among these, 1916 images featured individuals wearing masks, while 1930 images depicted individuals without masks, after removing redundant entries. This dataset was deliberately structured to maintain a balanced distribution for our classifier. However, it does present some exceptional cases, such as images containing multiple faces or instances where faces are partially occluded by other body parts.

  • Experiment setup: A learning rate of 0.0001 and a batch size of 10 were employed throughout. This modest batch size was deliberately selected to facilitate extended training while operating within memory limitations. The experimentation spanned 100 epochs, and binary cross-entropy was the loss function of choice.


In the model implementation, a diverse set of tools come into play. TensorFlow [77], a widely adopted open-source software library, forms the core framework for machine learning applications, supporting operations across various processing units including CPUs, GPUs, and TPUs. The flexibility it offers, in terms of both architectural design and working with pre-trained models, contributes to its popularity. Keras [78], which functions atop TensorFlow, is another critical component. Designed for expedited execution of deep learning models, Keras enhances the speed of development. For this research, Keras is employed to leverage pre-trained deep learning models, which are then fine-tuned to create the face mask detection classifier through transfer learning.
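Under this setup, constructing one of the four detectors reduces to freezing a pre-trained backbone and training a small binary head on top. The sketch below uses MobileNet V2 with the hyperparameters stated above (learning rate 0.0001, batch size 10, binary cross-entropy, 100 epochs); the dataset directory and the head's layer sizes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained ImageNet backbone with its classifier removed; frozen so
# only the new mask/no-mask head is trained (transfer learning).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = models.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet V2 expects inputs in [-1, 1]
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # with mask vs. without mask
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# 'dataset/' is a placeholder directory with one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/", image_size=(224, 224), batch_size=10)
model.fit(train_ds, epochs=100)
```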

This study delved into the robustness of the models by subjecting them to training on limited sample sizes. Traditionally, working with small training datasets results in diminished training and testing performance. However, given the unique context of face mask detection in the context of the COVID-19 outbreak, acquiring extensive data for training and testing becomes a challenge. Thus, developing models for face mask detection that excel with small training data becomes pivotal for creating effective applications. Additionally, these models must be locally deployable and capable of achieving tasks at an optimal speed.

To address these requirements, various ratios of training data were employed to assess model performance. The outcomes, presented in Table 5, were obtained by running these models on mobile NVIDIA GPUs. The experimental results yield noteworthy insights. MobileNet V2 emerges as the swiftest model, processing nearly 40 frames per second (FPS) on the Jetson TX2 platform. On the other hand, VGG 16 attains the highest accuracy among the models. Notably, when trained using just 1% of the available training data, Inception demonstrates superior performance.

Model        | Rtr | Ntr | Loss  | Training accuracy | Testing accuracy (TX2) | Inference time, ms (TX2) | FPS (TX2) | Testing accuracy (Nano) | Inference time, ms (Nano) | FPS (Nano)
MobileNet V2 | 1%  | 30  | 0.107 | 0.9776 | 0.6957 | 26.97   | 37.27 | 0.6926 | 42.40  | 23.58
MobileNet V2 | 5%  | 150 | 0.095 | 0.9733 | 0.7685 | 25.10   | 39.86 | 0.7659 | 42.00  | 23.80
MobileNet V2 | 10% | 300 | 0.114 | 0.9622 | 0.7842 | 25.10   | 39.87 | 0.7801 | 42.90  | 23.31
MobileNet V2 | 20% | 601 | 0.136 | 0.9490 | 0.7735 | 25.53   | 39.14 | 0.7699 | 43.03  | 23.24
ResNet 50    | 1%  | 30  | 0.071 | 0.9666 | 0.5000 | 70.90   | 14.11 | 0.7433 | 195.05 | 5.13
ResNet 50    | 5%  | 150 | 0.050 | 0.9845 | 0.7976 | 71.00   | 14.09 | 0.8023 | 176.45 | 5.67
ResNet 50    | 10% | 300 | 0.035 | 0.9900 | 0.7976 | 70.13   | 14.25 | 0.7962 | 173.05 | 5.78
ResNet 50    | 20% | 601 | 0.027 | 0.9927 | 0.8651 | 71.00   | 14.08 | 0.8599 | 173.65 | 5.76
Inception V3 | 1%  | 30  | 0.167 | 0.9556 | 0.8275 | 89.27   | 11.21 | 0.8275 | 187.00 | 5.35
Inception V3 | 5%  | 150 | 0.136 | 0.9578 | 0.8704 | 87.80   | 11.41 | 0.8713 | 188.53 | 5.31
Inception V3 | 10% | 300 | 0.136 | 0.9445 | 0.9066 | 92.33   | 10.85 | 0.9057 | 905.72 | 5.22
Inception V3 | 20% | 601 | 0.167 | 0.9379 | 0.8972 | 89.57   | 11.17 | 0.8977 | 192.77 | 5.19
VGG 16       | 1%  | 30  | 0.089 | 0.9667 | 0.6857 | 2138.77 | 0.47  | 0.6761 | 239.37 | 4.18
VGG 16       | 5%  | 150 | 0.006 | 0.9978 | 0.9231 | 2139.60 | 0.47  | 0.9173 | 245.23 | 4.08
VGG 16       | 10% | 300 | 0.010 | 0.9956 | 0.9370 | 2136.00 | 0.47  | 0.9325 | 236.17 | 4.23
VGG 16       | 20% | 601 | 0.006 | 0.9994 | 0.9607 | 2139.57 | 0.47  | 0.9580 | 237.70 | 4.21

Table 5.

Performance comparison on face mask detection on Jetson TX2 and Jetson Nano. FPS refers to the number of images processed per second. Rtr and Ntr denote the ratio of training data and the number of training images.
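Inference time and FPS figures such as those in Table 5 can be obtained with a simple timing loop. The sketch below assumes a loaded Keras model; it runs a warm-up pass before timing so that one-time graph and kernel setup does not skew the average.

```python
import time
import numpy as np

def measure_fps(model, input_shape=(1, 224, 224, 3), runs=100):
    frame = np.random.rand(*input_shape).astype("float32")  # stand-in for a camera frame
    model(frame, training=False)                 # warm-up: triggers graph/kernel setup
    start = time.perf_counter()
    for _ in range(runs):
        model(frame, training=False)             # single-image inference
    elapsed = time.perf_counter() - start
    ms_per_image = 1000.0 * elapsed / runs
    return ms_per_image, 1000.0 / ms_per_image   # inference time (ms) and FPS
```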

Training accuracy surpasses 90% for each model, yet the markedly lower testing scores indicate overfitting. As the proportion of training data is gradually increased to 5, 10, and 20%, the performance of all models improves, suggesting that additional samples help alleviate overfitting. VGG 16 secures the highest accuracy, while MobileNet V2 attains the lowest when trained on 20% of the data. ResNet 50 and Inception V3 show comparable performance throughout this experiment.

It is further observed in Table 6 that the pretrained models performed better at recognizing images containing masks in terms of precision, recall, and F-score. Overall, these pretrained models achieve promising F-scores.

Model        | Class        | Precision | Recall | Fscore
MobileNet V2 | With mask    | 0.99      | 0.83   | 0.90
MobileNet V2 | Without mask | 0.86      | 0.99   | 0.92
ResNet 50    | With mask    | 1.00      | 0.96   | 0.98
ResNet 50    | Without mask | 0.96      | 1.00   | 0.98
Inception V3 | With mask    | 1.00      | 0.98   | 0.99
Inception V3 | Without mask | 0.98      | 1.00   | 0.99
VGG 16       | With mask    | 1.00      | 1.00   | 1.00
VGG 16       | Without mask | 1.00      | 1.00   | 1.00

Table 6.

Comparison of precision, recall, and Fscore.

In summary, we leveraged pre-trained models such as MobileNet V2, ResNet 50, Inception V3, and VGG 16, chosen for their established performance across various applications. In terms of model complexity, MobileNet V2 stands out for its efficiency, boasting the lowest complexity among the four. VGG 16, by contrast, adheres to a classical architecture with a substantial number of parameters and therefore a higher overall complexity, while Inception V3, with its emphasis on capturing multi-scale features, adopts a more intricate architecture than MobileNet V2. Regarding speed and accuracy, as showcased in Table 5, MobileNet V2 outperforms the others in speed, whereas VGG 16 exhibits the slowest processing, consistent with the principle that more complex models tend to operate more slowly. It is noteworthy, however, that VGG 16 achieves the best accuracy relative to the other models, driven by its expansive model capacity. This underscores the trade-off between complexity and performance: VGG 16's higher capacity contributes to its superior accuracy at the cost of processing speed.

3.3 Challenges to real-world applications

Deploying COVID screening for real-world applications still presents several challenges:

  • COVID screening: To be effective in real-world scenarios, COVID screening requires a streamlined and efficient process. However, existing techniques, particularly deep learning-based X-ray COVID detection, involve intricate steps, including chest X-ray tests and subsequent COVID classification. Moreover, the quality of data obtained from chest X-ray tests might not be optimal for accurate COVID classification. Additionally, the reliance on high-performance computing hardware for deep learning-based classification hinders its applicability in underdeveloped regions.

  • Face Mask Detection: It encounters several obstacles that hinder its real-world applications. Firstly, the high accuracy achieved in controlled experiments might not translate directly to real-world scenarios due to significant differences in image backgrounds, object sizes and positions, and object overlaps. Secondly, face mask detection must work across various environments, including both indoor and outdoor settings. Adapting to outdoor environments requires techniques capable of handling diverse weather conditions and backgrounds, which poses difficulties for current methodologies. Lastly, users prefer seamless face mask detection that does not disrupt their daily activities. This demands the development of techniques that can perform detection without requiring users to actively interact with a camera.

In both cases, addressing these challenges requires innovative solutions that can simplify processes, adapt to diverse conditions, and accommodate user preferences without compromising performance or accuracy.


4. Conclusion

The deployment of technologies such as COVID-19 screening and face mask detection plays a crucial role in curbing the spread of the COVID-19 virus. This chapter has introduced advanced AI-driven methods designed to enhance both COVID-19 screening and the identification of face masks. Specifically, the utilization of deep learning for COVID-19 classification has demonstrated significant potential in the creation of effective screening tools. Moreover, the application of deep learning for face mask detection on mobile devices has exhibited promising results. In the pursuit of developing these innovative techniques, valuable insights are gained that can be applied in future pandemic situations. Potential future work includes the following.

(1) Overfitting is characterized by machine learning models achieving impressive performance during the training phase but faltering when tested on new data. To ensure the efficacy of large models for COVID screening and face mask detection, it is imperative to address this concern. In our forthcoming work, we aim to tackle this issue through data augmentation techniques that leverage expansive pre-trained image models. More specifically, these large pre-trained image models hold the capacity to generate synthetic data for both COVID screening and face mask detection. This synthetic data will encompass varying backgrounds, enriching the diversity of the training dataset and reducing overfitting tendencies. Additionally, we intend to incorporate human feedback as an integral part of the loop: an iterative process in which human assessment gauges the quality of the generated data, fostering continuous improvements in the data augmentation strategy. This combined approach of data augmentation and human feedback holds significant promise in enhancing the generalization capabilities of the models, thereby enabling robust and high-performing COVID screening and face mask detection systems in real-world scenarios.

(2) AI ethics continue to be a significant concern across various applications, encompassing issues such as privacy breaches and biased data. In our future efforts, we are dedicated to addressing these ethical challenges head-on. Our strategy involves the application of privacy-preserving techniques to safeguard user privacy, coupled with measures to mitigate biased data. To protect user privacy, we plan to employ our privacy-preserving edge intelligent computing framework. This entails training autoencoders in an unsupervised manner on individual edge devices; the latent vectors derived from these autoencoders are then transmitted to the edge server for classifier training. This approach effectively reduces communication overhead while safeguarding end-users' sensitive data from exposure. In tackling biased data concerns, our plan is to integrate fair pre-processing techniques from AIF360, an AI fairness toolkit. These techniques will be strategically applied during the data collection phase for both COVID screening and face mask detection, counteracting biases that may emerge in the data and ensuring equitable and unbiased outcomes. By proactively addressing privacy and bias concerns through cutting-edge privacy-preserving frameworks and fairness techniques, we aim to develop AI solutions that not only excel in performance but also uphold the highest standards of ethical conduct.


Acknowledgments

This research work is supported by US NSF awards 2018945, 2205891, and 2302469.


Conflict of interest

The authors declare no conflict of interest.

References

  1. 1. Organization W.H et al. Statement on the Second Meeting of the International Health Regulations Emergency Committee Regarding the Outbreak of Novel Coronavirus (2019-nCoV). Geneva, Switzerland; 2020
  2. 2. Organization W.H et al. WHO Director-general’s Opening Remarks at the Media Briefing on COVID-19-11 March 2020. Geneva: Switzerland; 2020
  3. 3. CDC. Testing for COVID-19. Available from: https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/testing.html. [Accessed: April 19 2020]
  4. 4. NY Times. Germany got testing right, what can we learn?. Available from: https://www.nytimes.com/2020/04/28/opinion/coronavirus-testing-united-states.html. [Accessed: April 29 2020]
  5. 5. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, et al. Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases. Radiology. 2020;296(2):32-40
  6. 6. Fang Y, Zhang H, Xie J, Lin M, Ying L, Pang P, et al. Sensitivity of chest CT for COVID-19: Comparison to RT-PCR. Radiology. 2020;296(2):115-117
  7. 7. Liang T et al. Handbook of COVID-19 Prevention and Treatment. Zhejiang: Zhejiang University School of Medicine; 2020
  8. 8. Kanne JP. Chest CT findings in 2019 novel coronavirus (2019-nCoV) infections from Wuhan, China: Key points for the radiologist. Radiology. 2020;295(1):16-17
  9. 9. Bernheim A, Mei X, Huang M, Yang Y, Fayad ZA, Zhang N, et al. Chest CT findings in coronavirus disease-19 (COVID-19): Relationship to duration of infection. Radiology. 2020;295(3):685-691
  10. 10. Narin A, Kaya C, Pamuk Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Analysis and Applications. 2021;24(3):1207-1220
  11. 11. Bullock J, Luccioni A, Pham KH, Lam CSN, Luengo-Oroz M. Mapping the landscape of artificial intelligence applications against COVID-19. Journal of Artificial Intelligence Research. 2020;69:807-845
  12. 12. Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(8):1798-1828
  13. 13. Dahl GE, Ranzato MA, Mohamed AR, Hinton G. Phone recognition with the mean-covariance restricted Boltzmann Machine. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. New Orleans Ernest N. Morial Convention Center; Dec 2010. pp. 469-477
  14. 14. Deng L, Seltzer ML, Yu D, Acero A, Mohamed AR, Hinton G. Binary coding of speech spectrograms using a deep auto-encoder. In: Eleventh Annual Conference of the International Speech Communication Association. 2010. pp. 1692-1695
  15. 15. Yu D, Seide F, Li G. Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland. Jun 2021. pp. 1-2
  16. 16. Hinton G, Deng L, Yu D, Dahl GE, Mohamed AR, Jaitly N, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine. 2012;29(6):82-97
  17. 17. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Computation. 2006;18(7):1527-1554. DOI: 10.1162/neco.2006.18.7.1527. URL doi:10.1162/neco.2006.18.7.1527
  18. 18. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. Aistats. 2011;15:275
  19. 19. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA; 2012. pp. 1087-1105
  20. 20. Ghoshal B, Tucker A. Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection. 2020. arXiv preprint arXiv:2003.10769
  21. 21. Cohen JP, Morrison P, Dao L. COVID-19 image data collection. 2020. arXiv preprint arXiv:2003.11597
  22. 22. Zhang J, Xie Y, Li Y, Shen C, Xia Y. COVID-19 screening on chest X-ray images using deep learning based anomaly detection. 2020. arXiv preprint arXiv:2003.12338
  23. 23. Wang L, Wong A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images. arXiv preprint arXiv:2003.09871. 2020
  24. 24. Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H, et al. Deep learning-based detection for COVID-19 from chest CT using weak label. MedRxiv. 2020
  25. 25. Huang L, Han R, Ai T, Yu P, Kang H, Tao Q, et al. Serial quantitative chest CT assessment of COVID-19: Deep-learning approach. Radiology: Cardiothoracic Imaging. 2020;2(2):e200075
  26. 26. Gozes O, Frid-Adar M, Greenspan H, Browning PD, Zhang H, Ji W, et al. Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection and patient monitoring using deep learning CT image analysis. 2020. arXiv preprint arXiv:2003.05037
  27. 27. Chen J, Wu L, Zhang J, Zhang L, Gong D, Zhao Y, et al. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: A prospective study. MedRxiv. 2020
  28. 28. Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, et al. Lung infection quantification of COVID-19 in CT images with deep learning. arXiv preprint arXiv:2003.04655. 2020
  29. 29. Jin S, Wang B, Xu H, Luo C, Wei L, Zhao W. et al. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks. MedRxiv. 2020
  30. 30. Tang L, Zhang X, Wang Y, Zeng X. Severe COVID-19 pneumonia: Assessing inflammation burden with volume-rendered chest CT. Radiology: Cardiothoracic Imaging. 2020;2(2):e200044
  31. 31. Cao Y, Xu Z, Feng J, Jin C, Han X, Wu H, et al. Longitudinal assessment of COVID-19 using a deep learning–Based quantitative CT pipeline: Illustration of two cases. Radiology: Cardiothoracic Imaging. 2020;2(2):e200082
  32. 32. Gaál G, Maga B, Lukács A. Attention U-Net based adversarial architectures for chest X-ray lung segmentation. 2020. arXiv preprint arXiv:2003.10304
  33. 33. Jefferson T, Dooley L, Ferroni E, Al-Ansary LA, van Driel ML, Bawazeer GA, et al. Physical interventions to interrupt or reduce the spread of respiratory viruses. Cochrane Database of Systematic Reviews. 2023. pp. 1-325
  34. 34. Desai AN, Patel P. Stopping the spread of COVID-19. JAMA. 2020;323(15):1516-1516
  35. 35. Loey M, Manogaran G, Taha MHN, Khalifa NEM. A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement. 2020;167:108288
  36. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: Artificial Neural Networks and Machine Learning - ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III. Springer International Publishing; 2018. pp. 270-279
  37. Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, et al. A deep learning algorithm using CT images to screen for corona virus disease (COVID-19). medRxiv. 2020
  38. Butt C, Gill J, Chun D, Babu BA. Deep learning system to screen coronavirus disease 2019 pneumonia. Applied Intelligence. 2020;53(4):4874
  39. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nature Communications. 2020;11(1):5088
  40. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II. Springer International Publishing; 2016. pp. 424-432
  41. Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, et al. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. In: Bildverarbeitung für die Medizin 2019: Algorithmen - Systeme - Anwendungen. Proceedings des Workshops vom 17. bis 19. März 2019 in Lübeck. Springer Fachmedien Wiesbaden; 2019. p. 22
  42. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III. Springer International Publishing; 2015. pp. 234-241
  43. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings. Springer International Publishing; 2018. pp. 3-11
  44. Milletari F, Navab N, Ahmadi SA. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). Stanford, CA, USA; October 2016. pp. 565-571
  45. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. 2018
  46. Jiang M, Fan X. RetinaMask: A face mask detector. arXiv preprint arXiv:2005.03950. 2020
  47. Loey M, Manogaran G, Taha MHN, Khalifa NEM. Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustainable Cities and Society. 2021;65:102600
  48. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. 2017
  49. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 2818-2826
  50. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014
  51. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA; 2016. pp. 770-778
  52. Inamdar M, Mehendale N. Real-time face mask identification using FaceMaskNet deep learning network. 2020. Available from SSRN: 3663305
  53. Chowdary GJ, Punn NS, Sonbhadra SK, Agarwal S. Face mask detection using transfer learning of InceptionV3. In: Big Data Analytics: 8th International Conference, BDA 2020, Sonepat, India, December 15-18, 2020, Proceedings. Springer International Publishing; 2020. pp. 81-90
  54. Farooq M, Hafeez A. COVID-ResNet: A deep learning framework for screening of COVID-19 from radiographs. arXiv preprint arXiv:2003.14395. 2020
  55. Hall LO, Paul R, Goldgof DB, Goldgof GM. Finding COVID-19 from chest X-rays using deep learning on a small dataset. arXiv preprint arXiv:2004.02060. 2020
  56. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA; June 2009. pp. 248-255
  57. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L. Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA; 2014. pp. 1725-1732
  58. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28. Montreal, Canada; 2015
  59. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA; 2015. pp. 3431-3440
  60. Real E, Shlens J, Mazzocchi S, Pan X, Vanhoucke V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 5296-5305
  61. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA; 2015. pp. 1-9
  62. Wang J, Yang Y, Mao J, Huang Z, Huang C, Xu W. CNN-RNN: A unified framework for multi-label image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA; 2016. pp. 2285-2294
  63. Yang J, Reed SE, Yang MH, Lee H. Weakly-supervised disentangling with recurrent transformations for 3D view synthesis. In: Advances in Neural Information Processing Systems. Montreal, Quebec, Canada; 2015. pp. 1099-1107
  64. Yim J, Jung H, Yoo B, Choi C, Park D, Kim J. Rotating your face using multi-task deep neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA; 2015. pp. 676-684
  65. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Medical Image Analysis. 2017;42:60-88
  66. Kingma DP, Mohamed S, Rezende DJ, Welling M. Semi-supervised learning with deep generative models. In: Advances in Neural Information Processing Systems. Montreal, Quebec, Canada; 2014. pp. 3581-3589
  67. Laine S, Aila T. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242. 2016
  68. Rasmus A, Valpola H, Honkala M, Berglund M, Raiko T. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems. Montreal, Quebec, Canada; 2015. pp. 3546-3554
  69. Weston J, Ratle F, Collobert R. Deep learning via semi-supervised embedding. In: Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland; July 2008. pp. 1168-1175
  70. Lee DH. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, the 30th International Conference on Machine Learning (ICML 2013). Atlanta, GA, USA; June 2013
  71. Berenguer AD, Sahli H, Joukovsky B, Kvasnytsia M, Dirks I, Alioscha-Perez M, et al. Explainable-by-design semi-supervised representation learning for COVID-19 diagnosis from CT imaging. arXiv preprint arXiv:2011.11719. 2020
  72. Calderon-Ramirez S, Yang S, Moemeni A, Elizondo D, Colreavy-Donnelly S, Chavarria-Estrada LF, et al. Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Applied Soft Computing. 2021;111:107692
  73. Ma J, Nie Z, Wang C, Dong G, Zhu Q, He J, et al. Active contour regularized semi-supervised learning for COVID-19 CT infection segmentation with limited annotations. Physics in Medicine & Biology. 2020;65(22):225-234
  74. Paticchio A, Scarlatti T, Mattheakis M, Protopapas P, Brambilla M. Semi-supervised neural networks solve an inverse problem for modeling COVID-19 spread. arXiv preprint arXiv:2010.05074. 2020
  75. Yang D, Xu Z, Li W, Myronenko A, Roth HR, Harmon S, et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Medical Image Analysis. 2021;70:101992
  76. Zhou J, Jing B, Wang Z, Xin H, Tong H. SODA: Detecting COVID-19 in chest X-rays with semi-supervised open set domain adaptation. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2021;19(5):2605-2612
  77. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. 2016
  78. Chollet F, et al. Keras. 2015. Available from: https://keras.io

Notes

  • https://github.com/lindawangg/COVID-Net
  • https://github.com/chandrikadeb7/Face-Mask-Detection
  • https://github.com/Trusted-AI/AIF360
