Open access peer-reviewed chapter

Adversarial Attacks on Image Classification Models: FGSM and Patch Attacks and Their Impact

Written By

Jaydip Sen and Subhasis Dasgupta

Submitted: 09 May 2023 Reviewed: 04 July 2023 Published: 26 July 2023

DOI: 10.5772/intechopen.112442

From the Edited Volume

Information Security and Privacy in the Digital World - Some Selected Topics

Edited by Jaydip Sen and Joceli Mayer

Abstract

This chapter introduces the concept of adversarial attacks on image classification models built on convolutional neural networks (CNNs). CNNs are widely used deep-learning models for image classification tasks. However, even powerful pre-trained CNN models that classify images very accurately may perform disastrously when they come under adversarial attack. In this work, two well-known adversarial attacks are discussed and their impact on the performance of image classifiers is analyzed. These two attacks are the fast gradient sign method (FGSM) and the adversarial patch attack. The attacks are launched on three powerful pre-trained image classifier architectures, ResNet-34, GoogleNet, and DenseNet-161. The classification accuracy of the models in the absence and presence of the two attacks is computed on images from the publicly accessible ImageNet dataset, and the results are analyzed to evaluate the impact of the attacks on the image classification task.

Keywords

  • image classification
  • convolutional neural network
  • adversarial attack
  • fast gradient sign method (FGSM)
  • adversarial patch
  • ResNet-34
  • GoogleNet
  • DenseNet-161
  • classification accuracy

1. Introduction

Szegedy et al. observed that a number of machine-learning models, including cutting-edge neural networks, are susceptible to adversarial examples [1]. In other words, these models misclassify inputs that differ only marginally from correctly classified examples drawn from the data distribution. Moreover, the same adversarial example is most often misclassified by a wide range of models of varied architecture that are trained on different sub-samples of the training data, which suggests that adversarial examples expose fundamental flaws in our training algorithms. It was initially unclear what caused these adversarial examples; speculative explanations suggested that they may be related to the extreme non-linearity of deep neural networks, combined with inadequate model averaging and insufficient regularization of the supervised learning problem that the models attempt to solve.

However, Goodfellow et al. argued against these speculative hypotheses [2]. They showed that linear behavior in high-dimensional spaces is sufficient to produce adversarial examples. This viewpoint makes it possible to generate adversarial examples quickly, which in turn makes adversarial training feasible. The authors also demonstrated that, in addition to the regularization benefits offered by techniques such as dropout [3], adversarial training can itself regularize deep learning models. Switching to nonlinear model families such as RBF networks can significantly reduce a model’s vulnerability to adversarial examples compared with generic regularization procedures like dropout, pretraining, and model averaging.

To see why such misclassification is risky, one may consider deep learning as it is employed in autonomous (driverless) automobiles [4]. Systems based on DNNs are used to recognize traffic signs and other vehicles on the road [5]. If tampering with the input of such a system, for example by significantly changing the body of a car, prevents the DNN from correctly recognizing it as a moving vehicle, the automobile might not stop and might end up in a collision with disastrous repercussions. There is a significant threat whenever an adversary can gain by avoiding detection or by having their input misclassified. Such attacks are already frequent against non-deep-learning classification systems [6, 7, 8, 9, 10].

Goodfellow et al. argue that there is a fundamental tension between building linear models that are easy to train and building models that use nonlinear effects to withstand adversarial perturbation [2]. In the long run, this trade-off may be avoided by designing more effective optimization methods that can successfully train more nonlinear models.

While the bulk of adversarial attacks have concentrated on slightly altering each pixel of an image, some attacks are not limited to barely discernible alterations. Brown et al. demonstrated an approach based on creating an image-independent patch and positioning it to cover a small area of the image [11]. In the presence of this patch, the classifier will reliably predict a particular class of the attacker’s choice for the image. This attack is significantly more dangerous than pixel-based attacks like FGSM because it can potentially cause even more damage and because the attacker does not need to know which image they are attacking when building the patch. An adversarial patch can therefore be produced once and disseminated for use by other attackers. Conventional defense strategies, which concentrate on protecting against minor perturbations, may not be robust to such large disturbances since the attack involves a massive perturbation.

This chapter discusses various adversarial attacks on image classification models and focuses particularly on two specific attacks, the fast gradient sign method (FGSM), and the adversarial patch attack. The impact of these two attacks on image classification accuracy is analyzed and extensive results are presented. The rest of the chapter is organized as follows. Section 2 presents a few related works. Some theoretical background information on adversarial attacks and pre-trained image classification models is discussed in Section 3. Section 4 presents detailed results and their analysis. Finally, the chapter is concluded in Section 5 highlighting some future works.

2. Related work

Deep learning systems are generally prone to adversarial instances. These instances are deliberately selected inputs that influence the network to alter its output without being obvious to a human [5, 12]. Several optimization techniques, including L-BFGS [1], Fast Gradient Sign Method (FGSM) [2], DeepFool [13], and Projected Gradient Descent (PGD) [14] can be used to find these adversarial examples, which typically change each pixel by only a small amount. Other attack strategies aim to change only a small portion of the image’s pixels (Jacobian-based saliency map [15]), or a small patch at a predetermined location [16].

A wide range of fascinating properties of neural networks and related models were demonstrated by Szegedy et al. [1]. Some of the important observations of the study are the following: (1) Box-constrained L-BFGS can consistently find adversarial examples. (2) The adversarial examples generated on the ImageNet [17] data are so similar to the original examples that a human cannot distinguish between the two. (3) The same adversarial example is commonly misclassified by a large number of classification models, each of which is trained on a different sample of the training data. (4) Even shallow Softmax regression models are vulnerable to adversarial examples. (5) Training on adversarial examples can lead to better regularization of the classification models.

By printing out a huge poster that resembles a stop sign or by applying various stickers to a stop sign, Eykholt et al. [12] demonstrated numerous techniques for creating stop signs that are misclassified by models.

These results suggest that classifiers developed using modern machine learning methods do not actually learn the underlying principles that determine the correct output label, even though they perform exceptionally well on test data. The classification algorithms underlying these models perform flawlessly on naturally occurring data, but their accuracy drops drastically for points that have low probability under the underlying data distribution. This poses a major challenge for image classification, since the convolutional neural networks used for classification effectively measure perceptual similarity in terms of Euclidean distance between feature representations. Such a notion of similarity breaks down, however, if images with unrealistically small perceptual distances actually belong to different classes according to the network’s representation.

The problem discussed above is particularly relevant to deep neural networks, although linear classifiers are not immune to it. No model has yet been able to resist adversarial perturbation while preserving state-of-the-art accuracy on clean inputs. However, several approaches to defending against small-perturbation adversarial attacks, as well as some novel training approaches, have been proposed by researchers [14, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26]. Some of these defense methods are briefly presented in the following.

Madry et al. designed and trained deep neural networks on the MNIST and CIFAR-10 image datasets that are robust to a wide range of adversarial attacks [14]. The authors formulated the problem as finding a saddle point of the error function and used projected gradient descent (PGD) as the adversary. The proposed approach was found to yield a classification accuracy of 89% against the strongest adversary on the test data.

Papernot et al. proposed a novel method for creating adversarial samples based on a thorough understanding of the mapping between the inputs and outputs of deep neural networks [15]. In a computer vision application, the authors demonstrated that, while changing an average of only 4.02% of the input features of each sample, their method can consistently create samples that humans classify correctly but that a deep neural network classifies into specific target classes with a 97% adversarial success rate. The authors then designed a hardness metric to assess the susceptibility of different sample classes to adversarial perturbations and outlined a defense mechanism against adversarial samples.

Tramer et al. observed that adversarial attacks are more impactful in a black-box setup, in which perturbations are computed on undefended models and then transferred [18]. Adversarial attacks are also very effective when they are launched in a single step that escapes the non-smooth neighborhood of the input data through a short random step. The authors proposed ensemble adversarial training, a method that adds perturbations obtained from other models to the training data. The proposed approach is found to be resistant to black-box adversarial attacks on the ImageNet dataset.

For assessing adversarial robustness on image classification tasks, Dong et al. developed a reliable benchmark [21]. The authors made several useful observations, including the following. First, adversarial training is one of the most effective defense strategies because it generalizes across different threat models. Second, model robustness curves are useful for evaluating the adversarial robustness of models. Finally, randomization-based defenses are more resistant to query-based black-box attacks.

Chen et al. examined and evaluated the features and effectiveness of several defense strategies against adversarial attacks [22]. The authors considered the evaluation from four different perspectives: (i) gradient masking, (ii) adversarial training, (iii) adversarial examples detection, and (iv) input modifications. The authors presented several benefits and drawbacks of various defense mechanisms against adversarial attacks and explored the future trends in designing robust methods to defend against such attacks on image classification models.

3. Background concepts

In this section, for the benefit of the readers, some background theories are discussed. The concepts of adversarial attack, fast gradient sign method (FGSM) attack, and three pre-trained convolutional neural network (CNN)-based deep neural network models, ResNet-34, GoogleNet, and DenseNet-161, are briefly introduced in this section.

3.1 Adversarial attacks

Many different adversarial attack schemes have been proposed, all of which aim to significantly affect a model’s prediction by only slightly changing the input image. For example, how can we modify the image of a goldfish so that a classification model that previously classified it correctly no longer recognizes it? At the same time, a human would still categorize the image as a goldfish without any doubt, so the true label of the image should not change. This goal is the same as that of the generator network in the framework of generative adversarial networks: try to fool another network (the discriminator) by altering its input.

3.2 Fast gradient sign method

The Fast Gradient Sign Method (FGSM), created by Goodfellow et al., is one of the earliest attack strategies proposed [2]. FGSM uses the gradients of a neural network to produce an adversarial image. Essentially, FGSM computes the gradients of a loss function (such as mean-squared error or categorical cross-entropy) with respect to the input image and uses the sign of those gradients to produce a new image (the adversarial image) that maximizes the loss. The result is an output image that, to the human eye, appears identical to the original but causes the neural network to predict a different class than it should. The FGSM is represented in Eq. (1).

adv_x = x + ε · sign(∇_x J(θ, x, y))      (1)

The symbols used in Eq. (1) have the following significance:

  1. adv_x: the adversarial image produced as the output

  2. x: the original image given as the input

  3. y: the actual class (i.e., the ground-truth label) of the input image

  4. ε: the noise intensity, a small fractional value by which the signed gradients are multiplied to create the perturbation. The perturbation should be small enough that the human eye cannot distinguish the adversarial image from the original image.

  5. θ: the parameters of the neural network model used for image classification

  6. J: the loss function

The FGSM attack on an image involves the following three steps.

  1. The value of the loss function is computed after the forward propagation in the network.

  2. The gradients are computed with respect to pixels in the original (i.e., input) image.

  3. The pixels of the input image are perturbed slightly in the direction of the computed gradients so that the value of the loss function is maximized.

In machine learning, computing the loss after a forward pass is typically the first step. A negative log-likelihood loss function is used to measure how closely the model’s prediction matches the actual class. When training neural networks, gradients are used to choose the direction in which to move the weights in order to lower the value of the loss function. Computing gradients with respect to the pixels of an image, however, is less common. In FGSM, the pixels of the input image are moved in the direction of the gradient so as to maximize the value of the loss function.
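
To make these steps concrete, the following minimal PyTorch sketch implements the FGSM perturbation for a batch of preprocessed images. It is an illustration only, not the exact code used in the experiments reported later; the default ε value and the use of the cross-entropy (negative log-likelihood) loss are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.02):
    """Return FGSM adversarial examples for a batch of preprocessed images.

    images: tensor of shape (batch, 3, H, W); labels: tensor of class indices.
    """
    model.eval()
    model.zero_grad()
    images = images.clone().detach().requires_grad_(True)

    # Step 1: forward pass and loss (negative log-likelihood via cross-entropy).
    loss = F.cross_entropy(model(images), labels)

    # Step 2: gradients of the loss with respect to the input pixels.
    loss.backward()

    # Step 3: move each pixel slightly in the direction of the gradient sign,
    # which (locally) maximizes the loss.
    return (images + epsilon * images.grad.sign()).detach()
```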

3.3 ResNet-34 architecture

He et al. presented a cutting-edge image classification neural network model containing 34 layers [27]. This deep convolutional neural network is known as the ResNet-34 model. The ImageNet dataset, which contains more than a million images spread over 1000 classes, served as the pre-training data for ResNet-34. The ResNet architecture differs from typical feed-forward networks in that each block carries its input forward through a skip connection and only learns a residual, which is added to that input in the layers that follow.
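
As an illustration of the residual idea, a minimal sketch of a basic residual block in PyTorch is given below; the channel width is an assumption chosen for brevity, and the block is a simplification of the actual ResNet-34 building block.

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A simplified ResNet basic block: output = ReLU(F(x) + x)."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # skip connection carries the input forward
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # the block only has to learn the residual
```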

3.4 GoogleNet architecture

Szegedy et al. introduced GoogleNet (also known as Inception V1) in their paper titled “Going Deeper with Convolutions” [28]. This architecture was the winner of the 2014 ILSVRC image classification competition. It employs methods such as global average pooling and 1×1 convolutions in the middle of the architecture. A network constructed with very deep layers may suffer from overfitting. To address this issue, the GoogleNet architecture was developed around the idea of having filters of various sizes that operate at the same level [28]. With this concept the network becomes wider rather than deeper. The architecture is 22 layers deep (27 layers if the pooling layers are counted). Nine linearly stacked inception modules are connected to the global average pooling layer. The reader may refer to the work of Szegedy et al. for more details [28].
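
The following sketch illustrates the idea of parallel filters of different sizes in an Inception-style block. It is a simplified illustration; the branch widths are assumptions and do not correspond to the exact GoogleNet configuration.

```python
import torch
import torch.nn as nn

class SimpleInceptionBlock(nn.Module):
    """Parallel 1x1, 3x3, 5x5 convolutions and max-pooling, concatenated channel-wise."""

    def __init__(self, in_ch, c1=16, c3=24, c5=8, cp=8):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3, kernel_size=1),             # 1x1 reduction
            nn.Conv2d(c3, c3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5, kernel_size=1),
            nn.Conv2d(c5, c5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, cp, kernel_size=1))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs can be concatenated.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)
```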

3.5 DenseNet-161 architecture

DenseNets are a class of CNNs that use dense connections between layers whose convolutional feature maps have matching sizes [29]. Groups of such densely connected layers are called dense blocks. To preserve the feed-forward nature of the network, each layer receives additional inputs from all earlier layers and passes its own feature maps to all later layers. Huang et al. demonstrated that a variant of the DenseNet architecture called DenseNet-161, with k = 48 features per layer and about 29 million parameters, can achieve a top-1 classification accuracy of 77.8% on the ImageNet ILSVRC classification dataset. As its name implies, the DenseNet-161 architecture contains 161 layers. More details on the DenseNet-161 architecture may be found in [29, 30].
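
A minimal sketch of the dense-connectivity pattern is given below; the growth rate and number of layers are illustrative assumptions and are much smaller than in the actual DenseNet-161.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Each layer receives the concatenation of all earlier feature maps."""

    def __init__(self, in_ch, growth_rate=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False)))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))  # dense connections
            features.append(new_features)
        return torch.cat(features, dim=1)
```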

4. Image classification results and analysis

Experiments are conducted to analyze the effect of two types of adversarial attacks on three well-known pre-trained CNN architectures. The two adversarial attacks considered in the study are the FGSM attack and the adversarial patch attack, launched on a set of images. The three pre-trained architectures on which the attacks are simulated are ResNet-34, GoogleNet, and DenseNet-161. The images are chosen from the ImageNet dataset [17]. The pre-trained ResNet-34, GoogleNet, and DenseNet-161 models available in PyTorch’s torchvision package are used in the experiments.
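
A minimal sketch of this setup is shown below. The preprocessing pipeline (the standard ImageNet resize, crop, and normalization) is assumed to follow the usual torchvision recipe; newer torchvision releases use a weights= argument in place of pretrained=True.

```python
import torchvision
from torchvision import transforms

# The three pre-trained classifiers used in the experiments.
models = {
    "resnet34": torchvision.models.resnet34(pretrained=True),
    "googlenet": torchvision.models.googlenet(pretrained=True),
    "densenet161": torchvision.models.densenet161(pretrained=True),
}
for m in models.values():
    m.eval()  # inference mode: no dropout, fixed batch-norm statistics

# Standard ImageNet preprocessing (assumed; matches the usual torchvision recipe).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```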

4.1 Classification results in the absence of an attack

Before studying the impact of adversarial attacks on the image classification models, we analyze the classification accuracy of the models in the absence of any attack. Since the ImageNet dataset includes 1000 classes, it is not prudent to evaluate a model’s performance on the basis of its Top-1 classification accuracy alone. Consider a model that consistently predicts the true label of an input image as the second-highest class under the Softmax activation function. Although we would say it recognizes the object in the image, its Top-1 accuracy is zero. Moreover, among ImageNet’s 1000 classes there is not always one single label that clearly applies to an image. This is why “Top-5 accuracy” is a popular alternative metric for image classification over a large number of classes: it measures how often the true label is among the model’s five most likely predictions. Since the three pre-trained architectures perform very well on the images in the ImageNet dataset, the error values, i.e., (1 - accuracy), are presented in the results instead of the accuracy values.
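
The Top-1 and Top-5 errors can be computed along the lines of the following sketch; the logits and labels tensors in the usage example are illustrative assumptions.

```python
def topk_error(logits, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    topk_preds = logits.topk(k, dim=1).indices             # shape: (batch, k)
    correct = (topk_preds == labels.unsqueeze(1)).any(dim=1)
    return 1.0 - correct.float().mean().item()

# Example usage (logits and labels are assumed tensors):
# top1_error = topk_error(logits, labels, k=1)
# top5_error = topk_error(logits, labels, k=5)
```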

Table 1 presents the performance of the three classification models on the whole ImageNet dataset containing 1000 classes of images. It is evident that all three models are highly accurate, as indicated by their Top-1 and Top-5 error percentages. The DenseNet-161 model has yielded the highest accuracy and the lowest error among the three architectures; its Top-5 and Top-1 error rates are 2.30% and 15.10%, respectively.

Metric          | ResNet-34 model | GoogleNet model | DenseNet-161 model
Top-1 error (%) | 19.10           | 25.26           | 15.10
Top-5 error (%) | 4.30            | 7.74            | 2.30

Table 1.

Classification errors (Top-1 and Top-5) of the ResNet-34, GoogleNet, and DenseNet-161 CNN models on the ImageNet data.

After evaluating the overall performance of the three models, we investigate some specific images in the dataset. For this purpose, the images with indices 0, 6, 13, and 18 are randomly chosen, and the way the models classify these images is examined. The images corresponding to these four indices belong to the classes “tench”, “goldfish”, “great white shark”, and “tiger shark”, respectively.

Table 2 presents the performance of the ResNet-34 model on the classification task for the four images. It is evident that the model has been very accurate in classification as the confidence associated with the true class of each of the four images is more than 90%. It may be noted that confidence here means the probability value that the model associates with the corresponding class. For example, the ResNet-34 model has yielded a confidence value of 0.9817 for the image whose true class is “tench” with the predicted class “tench”, implying that the model has associated a probability of 0.9817 with its classification of the image to the class “tench”.

Image index | Image true class  | Top-5 predicted classes (confidence)
0           | tench             | tench (0.9817), barracouta (0.0095), coho (0.0085), gar (0.0002), sturgeon (0.0001)
6           | goldfish          | goldfish (0.9982), tench (0.0005), barracouta (0.0005), tailed frog (0.0003), puffer (0.0002)
13          | great white shark | great white shark (0.9855), tiger shark (0.0109), submarine (0.0007), sturgeon (0.0006), hammerhead (0.0005)
18          | tiger shark       | tiger shark (0.9118), sturgeon (0.0251), great white shark (0.0202), puffer (0.0192), electric ray (0.0038)

Table 2.

The classification results of the ResNet-34 model for the chosen images.

Figure 1 depicts the classification results of the ResNet-34 model on the four images. In Figure 1, the input image is shown on the left and the confidence values of the model for the top five classes for the image are shown on the right. The confidence values are shown in the form of horizontal bars.

Figure 1.

The classification results of the ResNet34 model for the chosen images.

Table 3 presents the performance of the GoogleNet model on the classification task for the four images. The model has been very accurate for the “tench” and “goldfish” images. While its accuracy for the “great white shark” image is also high, the model has performed poorly for the “tiger shark” image. Nevertheless, even for the “tiger shark” image the model still assigned the highest confidence to the correct class, although that confidence is quite low, i.e., 0.3484.

Image index | Image true class  | Top-5 predicted classes (confidence)
0           | tench             | tench (0.9826), coho (0.0075), barracouta (0.0034), goldfish (0.0008), gar (0.0005)
6           | goldfish          | goldfish (0.9617), tench (0.0129), loggerhead (0.0018), barracouta (0.0014), coho (0.0010)
13          | great white shark | great white shark (0.8188), sea lion (0.0532), gray whale (0.0376), tiger shark (0.0144), loggerhead (0.0082)
18          | tiger shark       | tiger shark (0.3484), platypus (0.1978), hammerhead (0.0277), sturgeon (0.0245), great white shark (0.0166)

Table 3.

The classification results of the GoogleNet model for the chosen images.

Figure 2 depicts the classification results of the GoogleNet model on the four images. In Figure 2, the input image is shown on the left and the confidence values of the model for the top five classes for the image are shown on the right. The confidence values are shown in the form of horizontal bars.

Figure 2.

The classification results of the GoogleNet model for the chosen images.

Table 4 presents the performance of the DenseNet-161 model on the classification task for the four images. The performance of the model has been excellent: for all four images, the confidence values computed for the true class are higher than 0.94. The results also show that, among the three architectures, DenseNet-161 has been the most accurate model for classifying the four images chosen for analysis.

Image index | Image true class  | Top-5 predicted classes (confidence)
0           | tench             | tench (0.9993), barracouta (0.0003), coho (0.0002), gar (0.0001), platypus (0.0001)
6           | goldfish          | goldfish (0.9999), barracouta (0.0001), tench (0.0001), coho (0.0001), gar (0.0001)
13          | great white shark | great white shark (0.9490), tiger shark (0.0177), dugong (0.0127), sea lion (0.0113), gray whale (0.0074)
18          | tiger shark       | tiger shark (0.9932), great white shark (0.0047), gar (0.0008), sturgeon (0.0002), hammerhead (0.0001)

Table 4.

The classification results of the DenseNet-161 model for the chosen images.

Figure 3 depicts the classification results of the DenseNet-161 model on the four images. In Figure 3, the input image is shown on the left and the confidence values of the model for the top five classes for the image are shown on the right. The confidence values are shown in the form of horizontal bars.

Figure 3.

The classification results of the DenseNet161 model for the chosen images.

4.2 Classification results in the presence of the FGSM attack

After observing the performance of the three CNN architectures on the image classification task for the images in the ImageNet dataset, the impact of the adversarial attacks on the classifier models is studied. We start with the FGSM attack with an epsilon (ε) value of 0.02. A value of ε = 0.02 indicates that pixel values are changed by approximately 1 on the 0 to 255 scale over which a pixel value can vary. This change is so small that it is impossible to distinguish the adversarial image from the original one. The performance results of the three models in the presence of the FGSM attack with ε = 0.02 are presented in Tables 5-7. The results are pictorially depicted in Figures 4-6.
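
A sketch of how such an evaluation can be run for a given noise level is shown below. It reuses the fgsm_attack function from the earlier sketch; the data loader over preprocessed ImageNet images is an assumption, and this is an illustration rather than the exact evaluation code behind the results that follow.

```python
import torch

def evaluate_under_fgsm(model, loader, epsilon):
    """Top-1 and Top-5 error (in percent) of a model on FGSM-perturbed images."""
    top1_wrong, top5_wrong, total = 0, 0, 0
    for images, labels in loader:
        adv_images = fgsm_attack(model, images, labels, epsilon)  # from the earlier sketch
        with torch.no_grad():
            logits = model(adv_images)
        top5 = logits.topk(5, dim=1).indices
        top1_wrong += (top5[:, 0] != labels).sum().item()
        top5_wrong += (~(top5 == labels.unsqueeze(1)).any(dim=1)).sum().item()
        total += labels.shape[0]
    return 100.0 * top1_wrong / total, 100.0 * top5_wrong / total

# Sweep over different noise levels ε (loader is an assumed DataLoader):
# for eps in [round(0.01 * i, 2) for i in range(1, 11)]:
#     top1, top5 = evaluate_under_fgsm(models["resnet34"], loader, eps)
```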

Image true class  | Top-5 predicted classes (confidence)
tench             | coho (0.6684), barracouta (0.2799), gar (0.0321), tench (0.0128), sturgeon (0.0066)
goldfish          | barracouta (0.5885), tench (0.1392), gar (0.0945), tailed frog (0.0462), coho (0.0434)
great white shark | dugong (0.2661), tiger shark (0.2575), gray whale (0.0578), great white shark (0.0537), submarine (0.0303)
tiger shark       | otter (0.2462), puffer (0.1666), beaver (0.1543), platypus (0.1083), sea lion (0.0565)

Table 5.

The performance of ResNet-34 model under FGSM attack with ε = 0.02.

Image true class  | Top-5 predicted classes (confidence)
tench             | coho (0.2652), tench (0.2116), barracouta (0.1275), gar (0.0219), sturgeon (0.0153)
goldfish          | goldfish (0.0553), tench (0.0305), barracouta (0.0218), gar (0.0127), great white shark (0.0115)
great white shark | weasel (0.1880), sea lion (0.1489), otter (0.1402), platypus (0.0605), tailed frog (0.0400)
tiger shark       | platypus (0.5450), beaver (0.0336), American coot (0.0265), terrapin (0.0142), otter (0.0127)

Table 6.

The performance of GoogleNet model under FGSM attack with ε = 0.02.

Image true class  | Top-5 predicted classes (confidence)
tench             | coho (0.6793), tench (0.1645), gar (0.0466), barracouta (0.0373), sturgeon (0.0273)
goldfish          | barracouta (0.7139), tench (0.1357), coho (0.0665), gar (0.0645), goldfish (0.0137)
great white shark | sea lion (0.4830), dugong (0.4388), gray whale (0.0255), tiger shark (0.0099), snorkel (0.0023)
tiger shark       | great white shark (0.9025), gar (0.0606), barracouta (0.0059), tiger shark (0.0058), coho (0.0058)

Table 7.

The performance of the DenseNet-161 model under FGSM attack with ε = 0.02.

Figure 4.

The performance of the ResNet-34 model under FGSM attack with ε = 0.02.

Figure 5.

The performance of the GoogleNet model under FGSM attack with ε = 0.02.

Figure 6.

The performance of DenseNet-161 model under FGSM attack with ε = 0.02.

It is evident that all three models are adversely affected by the FGSM attack even with a value of ε as low as 0.02. While the adversarial images are impossible to distinguish from the original ones, the models assigned their highest confidence to an incorrect class for almost every one of the four images; the only exception is the GoogleNet model, which still ranked the correct “goldfish” class first, albeit with very low confidence. The classification errors over the full dataset for different noise levels are reported in Tables 8-10.

Noise level (ε) | Top-1 error (%) | Top-5 error (%)
0.01 | 83.44 | 43.76
0.02 | 93.56 | 60.54
0.03 | 95.66 | 68.60
0.04 | 96.24 | 72.42
0.05 | 96.76 | 74.78
0.06 | 97.00 | 76.18
0.07 | 96.98 | 76.92
0.08 | 97.00 | 77.54
0.09 | 96.94 | 77.68
0.10 | 96.92 | 77.56

Table 8.

Performance of ResNet34 under FGSM attack for different values of ε.

Noise level (ε) | Top-1 error (%) | Top-5 error (%)
0.01 | 82.76 | 49.52
0.02 | 91.10 | 65.86
0.03 | 93.72 | 72.68
0.04 | 94.66 | 75.86
0.05 | 95.14 | 77.62
0.06 | 95.26 | 78.40
0.07 | 95.36 | 78.96
0.08 | 95.40 | 79.04
0.09 | 95.46 | 79.24
0.10 | 95.40 | 79.20

Table 9.

Performance of GoogleNet under FGSM attack for different values of ε.

Noise level (ε) | Top-1 error (%) | Top-5 error (%)
0.01 | 79.08 | 33.10
0.02 | 90.08 | 50.64
0.03 | 92.98 | 58.64
0.04 | 94.04 | 62.88
0.05 | 94.38 | 65.12
0.06 | 94.38 | 66.38
0.07 | 94.34 | 66.76
0.08 | 94.42 | 66.94
0.09 | 94.18 | 66.68
0.10 | 94.10 | 66.70

Table 10.

Performance of DenseNet161 under FGSM attack for different values of ε.

The value of the parameter ε is increased from 0.01 to 0.10 in steps of 0.01. It is observed that, except for a few cases, the classification error increases consistently with ε until ε reaches a value in the range of 0.08-0.09. The impact of the FGSM attack is so severe that the classification error for the ResNet-34 model under this attack reaches values as high as 97.00% (Top-1 error) and 77.68% (Top-5 error). The corresponding values for the GoogleNet model are 95.46% (Top-1 error) and 79.24% (Top-5 error), and for DenseNet-161 they are 94.42% (Top-1 error) and 66.94% (Top-5 error). Among the three models, DenseNet-161 appears to be the most robust against the FGSM attack.

4.3 Classification results in the presence of the adversarial patch attack

As mentioned in Section 1, an attack can also be launched on image classification models by introducing adversarial patches [11]. In this attack, the strategy is to transform a small portion of the image into a desired form and shape, instead of FGSM’s approach of slightly altering every pixel. Such a patch can deceive the classification model and force it to predict a certain pre-determined class. In practical applications, this type of attack poses a greater hazard than FGSM. Consider an autonomous vehicle that receives a real-time image from a camera: to trick the vehicle into thinking that an automobile is actually a pedestrian, another driver may print out a particular design and stick it on the rear part of their vehicle.
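
A minimal sketch in the spirit of Brown et al. [11] is given below: a patch is optimized by gradient descent on the target-class loss while being pasted at random positions on training images. The optimizer settings, the patch initialization, and the assumption that the patch lives in the same normalized space as the preprocessed images are simplifications for illustration, not the exact procedure used in the experiments.

```python
import torch
import torch.nn.functional as F

def place_patch(images, patch, x, y):
    """Paste the patch into a batch of images with its top-left corner at (x, y)."""
    images = images.clone()
    ph, pw = patch.shape[-2:]
    images[:, :, y:y + ph, x:x + pw] = patch
    return images

def train_patch(model, loader, target_class, patch_size=64, steps=1000, lr=0.1):
    """Optimize an image-independent patch so that patched images are classified as target_class."""
    model.eval()
    # The patch is optimized directly in the normalized input space (a simplification).
    patch = torch.zeros(3, patch_size, patch_size, requires_grad=True)
    optimizer = torch.optim.SGD([patch], lr=lr)
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            images, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            images, _ = next(data_iter)
        # Random placement so the patch works regardless of its position in the image.
        x = torch.randint(0, images.shape[-1] - patch_size + 1, (1,)).item()
        y = torch.randint(0, images.shape[-2] - patch_size + 1, (1,)).item()
        patched = place_patch(images, patch, x, y)
        target = torch.full((images.shape[0],), target_class, dtype=torch.long)
        loss = F.cross_entropy(model(patched), target)  # push predictions toward the target class
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return patch.detach()
```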

Figure 7.

Five images used as patches: cock, balloon, computer keyboard, electric guitar, and radio. The sizes for the patch images: 32*32, 48*48, and 64*64.

To simulate the adversarial patch attack on the same set of images used for the FGSM attack, five images are first chosen randomly to serve as patches. As shown in Figure 7, the five patch images are (i) cock, (ii) balloon, (iii) computer keyboard, (iv) electric guitar, and (v) radio. To study the effect of patch size on the accuracy of the classification models, three sizes are considered for each patch image: (i) 32*32, (ii) 48*48, and (iii) 64*64, expressed in terms of the number of pixels along the x and y dimensions. Tables 11-16 present the Top-1 and Top-5 accuracies of the attack on the three models for the different patch images and sizes. Here, accuracy refers to the percentage of cases in which an image is classified as the target class (i.e., the patch class) with the highest confidence (Top-1) or within the five highest-confidence classes (Top-5). Figures 8-10 depict the performance of the classification models in the presence of a “balloon” patch of size 64*64; the pictures for the other patch images and sizes are omitted for the sake of brevity.
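
The attack-effectiveness figures reported in Tables 11-16 can be computed along the lines of the following sketch, which reuses place_patch from the earlier sketch; the data loader and the random patch placement are illustrative assumptions.

```python
import torch

def patch_attack_success(model, loader, patch, target_class, k=1):
    """Percentage of images whose top-k predictions include the patch (target) class."""
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for images, _ in loader:
            # Place the patch at a random position in every batch, then classify.
            x = torch.randint(0, images.shape[-1] - patch.shape[-1] + 1, (1,)).item()
            y = torch.randint(0, images.shape[-2] - patch.shape[-2] + 1, (1,)).item()
            logits = model(place_patch(images, patch, x, y))
            topk = logits.topk(k, dim=1).indices
            hits += (topk == target_class).any(dim=1).sum().item()
            total += images.shape[0]
    return 100.0 * hits / total

# Example usage (illustrative): Top-1 and Top-5 attack success for a 64*64 patch.
# top1 = patch_attack_success(model, loader, patch, target_class, k=1)
# top5 = patch_attack_success(model, loader, patch, target_class, k=5)
```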

Patch image       | 32*32 | 48*48 | 64*64
cock              | 78.75 | 92.01 | 97.82
balloon           | 81.17 | 92.35 | 97.44
computer keyboard | 0.04  | 68.46 | 92.97
electric guitar   | 54.47 | 87.08 | 95.93
radio             | 22.35 | 75.49 | 94.03

Table 11.

Top-1 accuracy (%) of ResNet34 model for different patch sizes.

Patch image       | 32*32 | 48*48 | 64*64
cock              | 93.48 | 98.59 | 99.84
balloon           | 93.73 | 98.88 | 99.83
computer keyboard | 1.22  | 91.63 | 99.45
electric guitar   | 77.43 | 97.15 | 99.64
radio             | 62.08 | 93.67 | 99.37

Table 12.

Top-5 accuracy (%) of ResNet34 model for different patch sizes.

Patch image       | 32*32 | 48*48 | 64*64
cock              | 0.00  | 0.27  | 0.94
balloon           | 85.06 | 95.67 | 98.76
computer keyboard | 17.17 | 80.81 | 97.22
electric guitar   | 70.14 | 93.64 | 98.76
radio             | 7.73  | 81.34 | 95.59

Table 13.

Top-1 accuracy (%) of GoogleNet model for different patch sizes.

Patch image       | 32*32 | 48*48 | 64*64
cock              | 0.09  | 8.69  | 32.63
balloon           | 96.76 | 99.80 | 99.99
computer keyboard | 66.72 | 96.81 | 99.94
electric guitar   | 90.45 | 99.60 | 99.98
radio             | 65.15 | 97.21 | 99.84

Table 14.

Top-5 accuracy (%) of GoogleNet model for different patch sizes.

Patch image       | 32*32 | 48*48 | 64*64
cock              | 0.00  | 0.00  | 0.01
balloon           | 14.70 | 35.41 | 40.25
computer keyboard | 0.02  | 0.08  | 47.46
electric guitar   | 1.91  | 5.97  | 46.74
radio             | 0.53  | 8.88  | 43.44

Table 15.

Top-1 accuracy (%) of DenseNet-161 model for different patch sizes.

Patch image       | 32*32 | 48*48 | 64*64
cock              | 0.09  | 0.08  | 0.16
balloon           | 41.63 | 69.44 | 69.75
computer keyboard | 0.76  | 1.46  | 79.86
electric guitar   | 13.10 | 22.67 | 75.43
radio             | 10.36 | 46.63 | 74.84

Table 16.

Top-5 accuracy (%) of DenseNet161 model for different patch sizes.

Figure 8.

The classification results of the ResNet-34 model in the presence of a patch image of a balloon with size 64*64.

Figure 9.

The classification results of the GoogleNet model in the presence of a patch image of a balloon with size 64*64.

Figure 10.

The classification results of the DenseNet-161 model in the presence of a patch image of a balloon with size 64*64.

The following observations are made on the results of the adversarial patch attack.

  1. For a given patch image, the attack deceives all three models, ResNet-34, GoogleNet, and DenseNet-161, into misclassification more effectively as the patch size grows. In other words, for all three models, the attack effectiveness is highest for the patch size 64*64.

  2. For most patch images and patch sizes, the attack is most effective with the “balloon” patch on all three models. However, for the ResNet-34 model with the 64*64 patch size, the “cock” patch yields the maximum effectiveness. For the GoogleNet model, the 64*64 “electric guitar” patch produces the same maximum Top-1 effectiveness as the “balloon” patch. For the DenseNet-161 model, the attack is most effective with the 64*64 “computer keyboard” patch for both the Top-1 and Top-5 cases.

  3. For obvious reasons, the attack effectiveness (i.e., the accuracy of the attack) is found to be always higher for the Top-5 case than its corresponding Top-1 counterpart.

5. Conclusion

In this chapter, some adversarial attacks on CNN-based image classification models are discussed. In particular, two attacks, the FGSM attack and the adversarial patch attack, are presented in detail. The former perturbs the pixels of an image in the direction of their gradients so that the value of the loss function is maximized. While the resultant adversarial image is indistinguishable from the original image to the human eye, a highly trained classification model will most likely assign the adversarial image to a class different from its ground truth. In the adversarial patch attack, an image patch of a different class is inserted into the original image in such a way that trained models are deceived into classifying the original image as the class of the patch image. It is observed in the study that, as the amount of perturbation introduced by the FGSM attack increases, the classification error rises until a threshold is reached at which the attack saturates; beyond this threshold, further increases in perturbation usually do not lead to further decreases in the classification accuracy of the models. For the adversarial patch attack, the attack effectiveness increases with the patch size.

References

  1. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, et al. Intriguing properties of neural networks. In: Proceedings of International Conference on Learning Representations (ICLR’14), Poster Track, April 14–16, 2014, Banff, Canada. 2014. DOI: 10.48550/arXiv.1312.6199
  2. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. In: Proceedings of International Conference on Learning Representations (ICLR’15), Poster Track, May 7–9, 2015, San Diego, CA, USA. 2015. DOI: 10.48550/arXiv.1412.6572
  3. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research. 2014;15(1):1929-1958
  4. NVIDIA. Solutions for self-driving cars. 2023. Available online at: https://www.nvidia.com/en-us/self-driving-cars [Accessed: May 9, 2023]
  5. Ciresan D, Meier U, Masci J, Schmidhuber J. Multi-column deep neural network for traffic sign classification. Neural Networks. 2012;32:333-338. DOI: 10.1016/j.neunet.2012.02.023
  6. Huang L, Joseph AD, Nelson B, Rubinstein BIP, Tygar JD. Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, Chicago, IL, USA, October 21, 2011. New York, NY: ACM Press; 2011. pp. 43-58. DOI: 10.1145/2046684.2046692
  7. Biggio B, Fumera G, Roli F. Pattern recognition systems under attack: Design issues and research challenges. International Journal of Pattern Recognition and Artificial Intelligence. 2014;28(7):1460002. DOI: 10.1142/S0218001414600027
  8. Biggio B, Corona I, Maiorca D, Nelson B, Srndic N, Laskov P, et al. Evasion attacks against machine learning at test time. In: Blockeel H et al., editors. Machine Learning and Knowledge Discovery in Databases. Vol. 8190. Berlin, Heidelberg, Germany: Springer; 2012. pp. 387-402. DOI: 10.1007/978-3-642-40994-3_25
  9. Anjos A, Marcel S. Counter-measures to photo attacks in face recognition: A public database and a baseline. In: Proceedings of the 2011 International Joint Conference on Biometrics (IJCB), October 11–13, 2011. Washington DC, USA: IEEE; 2011. pp. 1-7. DOI: 10.1109/IJCB.2011.6117503
  10. Fogla P, Lee W. Evading network anomaly detection systems: Formal reasoning and practical techniques. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, October 30–November 3, 2006, Alexandria, VA, USA. New York, USA: ACM; 2006. pp. 59-68. DOI: 10.1145/1180405.1180414
  11. Brown TB, Mane D, Roy A, Abadi M, Gilmer J. Adversarial patch. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17) Workshop, December 4-9, 2017, Long Beach, CA, USA. Red Hook, NY, USA: Curran Associates Inc; 2017. DOI: 10.48550/arXiv.1712.09665
  12. Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, et al. Robust physical-world attacks on deep learning visual classification. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’18), June 18–23, 2018, Salt Lake City, UT, USA. Piscataway, NJ, USA: IEEE Press. pp. 1625-1634. DOI: 10.1109/CVPR.2018.00175
  13. Dezfooli M, Fawzi A, Frossard P. DeepFool: A simple and accurate method to fool deep neural networks. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), June 27–30, 2016, Las Vegas, NV, USA. Piscataway, NJ, USA: IEEE Press; 2016. pp. 2574-2582. DOI: 10.1109/CVPR.2016.282
  14. Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. In: Proceedings of International Conference on Learning Representations (ICLR’18), Poster Track, April 30–May 3 2018, Vancouver, BC, Canada. 2018. DOI: 10.48550/arXiv.1706.06083
  15. Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A. The limitations of deep learning in adversarial settings. In: Proceedings of 2016 IEEE European Symposium on Security and Privacy (EuroS&P’16), March 21–24, 2016, Saarbruecken, Germany. Piscataway, NJ, USA: IEEE Press; 2016. pp. 372-387. DOI: 10.1109/EuroSP.2016.36
  16. Sharif M, Bhagavatula S, Bauer L, Reiter MK. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, October 24–28, 2016, Vienna, Austria. New York, NY, USA: ACM Press; 2016. pp. 1528-1540. DOI: 10.1145/2976749.2978392
  17. Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’09), June 20–25, 2009, Miami, FL, USA. Piscataway, NJ, USA: IEEE Press. pp. 248-255. DOI: 10.1109/CVPR.2009.5206848
  18. Tramer F, Kurakin A, Papernot N, Goodfellow I, Boneh D, McDaniel P. Ensemble adversarial training: Attacks and defenses. In: Proceedings of International Conference on Learning Representations (ICLR’18), Poster Track, April 30–May 3, 2018, Vancouver, BC, Canada. 2018. DOI: 10.48550/arXiv.1705.07204
  19. Gu S, Rigazio L. Towards deep neural network architectures robust to adversarial examples. In: Proceedings of International Conference on Learning Representations (ICLR’15), Poster Track, May 7–9, 2015, San Diego, CA, USA. 2015. DOI: 10.48550/arXiv.1412.5068
  20. Chalupka K, Perona P, Eberhardt F. Visual causal feature learning. In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, Amsterdam, Netherlands, July 12–16, 2015. Arlington, VA, USA: AUAI Press; 2015. pp. 181-190. DOI: 10.48550/arXiv.1412.2309
  21. Dong Y, Fu QA, Yang X, Pang T, Su H, Xiao Z, et al. Benchmarking adversarial robustness on image classification. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20), June 13–19, 2020. Seattle, WA, USA; 2020. pp. 318-328. DOI: 10.1109/CVPR42600.2020.00040
  22. Chen Y, Zhang M, Li J, Kuang X. Adversarial attacks and defenses in image classification: A practical perspective. In: Proceedings of the 7th International Conference on Image, Vision and Computing (ICIVC’22), July 26–28, 2022, Xian, China. Piscataway, NJ, USA: IEEE Press; 2022. pp. 424-430. DOI: 10.1109/ICIVC55077.2022.9886997
  23. Pestana C, Akhtar N, Liu W, Glance D, Mian A. Adversarial attacks and defense on deep learning classification models using YCbCr color images. In: Proceedings of 2021 International Joint Conference on Neural Networks (IJCNN’21), July 18–22, 2021, Shenzhen, China. Piscataway, NJ, USA: IEEE Press; 2021. pp. 1-9. DOI: 10.1109/IJCNN52387.2021.9533495
  24. Li C, Fan C, Zhang J, Li C, Teng Y. A block gray adversarial attack method for image classification neural network. In: Proceedings of 2022 IEEE 24th International Conference on High Performance Computing & Communications (HPCC’22), December 18–20, 2022, Hainan, China. Piscataway, NJ, USA: IEEE Press; 2022. pp. 1682-1689. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00255
  25. Yuan H, Li S, Sun W, Li Z, Steven X. An efficient attention based image adversarial attack algorithm with differential evolution on realistic high-resolution image. In: Proceedings of 2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall’21), October 1–15, 2021, Xian, China. Piscataway, NJ, USA: IEEE Press; 2021. pp. 115-120. DOI: 10.1109/ICISFall51598.2021.9627468
  26. Xu Y, Du B, Zhang L. Self-attention context network: Addressing the threat of adversarial attacks for hyperspectral image classification. IEEE Transactions on Image Processing. 2021;30:8671-8685. DOI: 10.1109/TIP.2021.3118977
  27. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), Las Vegas, NV, USA, June 27–30, 2016. Piscataway, NJ, USA: IEEE Press; 2016. pp. 770-778. DOI: 10.1109/CVPR.2016.90
  28. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15), Boston, MA, USA, June 7–12, 2015. Piscataway, NJ, USA: IEEE Press; 2015. pp. 1-9. DOI: 10.1109/CVPR.2015.7298594
  29. Huang G, Liu Z, Van Der Maaten L, Weinberger K. Densely connected convolutional networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17), Honolulu, HI, USA. Piscataway, NJ, USA: IEEE Press; 2017. pp. 2261-2269. DOI: 10.1109/CVPR.2017.243
  30. Pleiss G, Chen D, Huang G, Li T, van der Maaten L, Weinberger KQ. Memory-efficient implementation of DenseNets. Technical report, arXiv:1707.06990. 2017. DOI: 10.48550/arXiv.1707.06990
