key: cord-0474213-15z8gzw7
authors: Gupta, Akash Kumar; Chattopadhyay, Arpan; Yadav, Darpan Kumar
title: Compressive Sensing Based Adaptive Defence Against Adversarial Images
date: 2021-10-11
journal: nan
DOI: nan
sha: 4697ef43450f173e12b1e22b77e976dc56fdf5fe
doc_id: 474213
cord_uid: 15z8gzw7

Herein, the security of deep neural networks against adversarial attacks is considered. Existing compressive sensing based defence schemes assume that adversarial perturbations usually lie on high frequency components, whereas it has recently been shown that low frequency perturbations are more effective. This paper proposes a novel Compressive sensing based Adaptive Defence (CAD) algorithm which combats the distortion in the frequency domain instead of the time domain. Unlike the existing literature, the proposed CAD algorithm does not use information about the type of attack, such as l_0, l_2, l_∞, etc. The CAD algorithm uses the exponential weight algorithm for exploration and exploitation to identify the type of attack, compressive sampling matching pursuit (CoSaMP) to recover the coefficients in the spectral domain, and modified basis pursuit with a novel constraint for l_0 and l_∞ norm attacks. Tight performance bounds for the recovery schemes meant for the various attack types are also provided. Experimental results against five state-of-the-art white box attacks on MNIST and CIFAR-10 show that the proposed CAD algorithm achieves excellent classification accuracy and generates good quality reconstructed images at much lower computational cost.

The rapid development of Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) has resulted in the widening of computer vision applications such as object recognition, Covid-19 diagnosis using medical images [2], autonomous vehicles [3], face detection in security and surveillance systems [4], etc. In all these applications, images play a vital role. Recent studies have shown that smartly crafted, human-imperceptible, small distortions in pixel values can easily fool these CNNs and DNNs [5]-[7]. Such adversarial images result in incorrect classification or detection of an object or a face, leading to accidents on roads or by drones, traffic jams, missed identification of a criminal, etc. While many countermeasures have been proposed in recent years to tackle adversarial images, they are mostly based on heuristics and do not perform well against all classes of attacks. In this connection, the recent developments in compressive sensing [8], [9] allow signal recovery at sub-Nyquist rates, which is suitable for images, videos and audio signals that are sparse in the Fourier and wavelet domains. This also allows us to achieve lower complexity, lower power, smaller memory and fewer sensors, and to provide theoretical performance guarantees for the image processing algorithms. These reasons motivate us to use compressive sensing to combat adversarial images. In this paper, we propose a compressive sensing based adaptive defence (CAD) algorithm that can defend against l_0, l_2 and l_∞ adversaries as well as gradient attacks. In order to identify the attack type and choose the appropriate recovery method, we use the popular exponential weight algorithm [10] adapted from the multi-armed bandit literature for the exploration and exploitation decision, along with compressive sensing based recovery algorithms such as compressive sampling matching pursuit (CoSaMP) [11], standard basis pursuit, and modified basis pursuit with novel constraints to mitigate l_0 and l_∞ attacks.
Numerical results reveal that CAD is efficient in classifying both grayscale and colored images, and that it does not suffer from loss of clean data accuracy or from gradient masking.

Existing research on adversarial images is broadly focused on two categories: attack design and defence algorithm design.

Attack design: Numerous adversarial attacks have been proposed in the literature so far. They can be categorized as white box attacks and black box attacks. In white box attacks, the attacker has full knowledge of the trained classifier: its architecture, parameters and weights. Examples of white box attacks include the fast gradient sign method (FGSM [7]), projected gradient descent (PGD [12]), the Carlini-Wagner L2 (CW-L2) attack [13], the basic iterative method (BIM [6]), the Jacobian saliency map attack (JSMA [14]), etc. In a black box attack, the attacker generates the adversarial perturbation without having any knowledge of the target model. Transfer-based attacks [15], gradient estimation attacks [16] and the boundary attack [17] are some examples of black box attacks. Adversarial attacks can also be divided into targeted attacks and non-targeted attacks. In targeted attacks, an attacker seeks to classify an image into a target class which is different from the original class. On the other hand, in a non-targeted attack, the attacker's goal is just to misclassify an image. Based on the nature of the perturbation error, attacks are further grouped into various norm attacks, such as the CW (L_2) attack, the L_∞ BIM attack, etc.

Defence design: The adversarial image problem can be tackled either by (i) increasing the robustness of the classifier using image processing techniques, adversarial training, or compressive sensing techniques (see [18]-[22]), or by (ii) distinguishing between clean and malicious images [23], [24]. Existing defense schemes based on compressive sensing [18], [19] assume that images normally have heavy spectral strength at lower frequencies and little strength at higher frequencies, which allows the adversary to modify the high-frequency spectral components to fool the human eye. Most adversarial attacks [13], [25], [17] work by searching the whole available attack space and tend to converge to high frequency perturbations that fool the classifier. However, it has recently been observed that constraining the attack to low-frequency perturbations while keeping a small distortion bound in the l_∞ norm is more effective, and achieves high efficiency and transferability [26], [27]. The authors of [18] proposed a technique based on compressive sensing to combat the l_0 attack; the technique recovers the low frequency components corresponding to the 2D discrete cosine transform (DCT) basis. In their formulation, the adversarial image vector is y = x + e, where the original image x is k-sparse in the Fourier domain and the injected noise e is t-sparse in the time domain. This defense is based on the assumption that the perturbation crafted by an attacker is usually on the high frequency components, and hence is not perceptible to the human eye. Accordingly, the proposed defense works by recovering only the topmost low frequency DCT coefficients and reconstructing the images using those coefficients only. The authors of [19] extended the same framework and proposed the compressive recovery defense (CRD) to counter l_2 and l_∞ attacks. They proposed various algorithms for the different perturbation attacks, which require prior knowledge of the type of attack.
However, they did not prescribe how to choose the recovery algorithm when the type of perturbation is not known a priori.

Another popular technique to counter malicious attacks is adversarial training based defense. Here the goal is to increase the robustness of the model by training the classifier on several adversarial examples. The authors of [12] used projected gradient adversaries and clean images to train the network; though their proposed defense works well for datasets with grayscale images such as MNIST, it suffers from low classification accuracy on datasets with colored images such as CIFAR-10. The authors of [20] used the same method and considered the properties of the loss surface under various adversarial attacks in the parameter and input domains. They showed that model robustness can be increased by using the decision surface geometry as a parameter. However, the proposed defense has a very high computational complexity. The authors of [28] proposed collaborative multi-task training (CMT) to counter various attacks. They encoded training labels into label pairs, which allowed them to detect adversarial images by determining the pairwise connections between the actual output and an auxiliary output. However, an enormous volume of non-targeted malicious samples is needed for determining the encoding format in [28]. Also, the proposed defense is only applicable to non-targeted attacks.

Several classical image processing techniques have also been used to combat adversarial attacks. The authors of [21] used Gaussian kernels with various intensities to form multiple representations of the images in the dataset, and then fed these images to the classifier. Classification and attack detection were achieved by taking an average of the multiple confidence values given by the classifier. The authors of [22] used pre-processing techniques; they altered the pixel values of images in the training and testing datasets block-wise while maintaining a common key. Using these image pre-processing techniques as a defense requires a lot of computation for each image in the dataset. Also, these papers did not establish any performance bound. All the above papers deal with classification based defense. A detection based defense has been proposed in [23], where the authors proposed the adaptive perturbation based algorithm (APERT, a pre-processing algorithm) using principal component analysis (PCA), two-timescale stochastic approximation and the sequential probability ratio test (SPRT [29]) to distinguish between clean and adversarial images.

We make the following contributions in this paper:
• We propose a novel compressive sensing based adaptive defence (CAD) algorithm to combat l_0, l_2 and l_∞ norm attacks as well as gradient based attacks, with much lower computational complexity compared to existing works. The computational complexity is O(N^2), where N is the number of pixels in an image.
• CAD is the first algorithm that can detect the type of attack if it falls within certain categories (such as l_0, l_2, l_∞), and choose an appropriate classification algorithm to apply to the potentially adversarial image. To this end, we have adapted the popular exponential weight algorithms [10], [30] from the multi-armed bandit literature to our setting; they adaptively assign a score to each attack type, thus guiding us in choosing the appropriate recovery algorithm (e.g., CoSaMP, basis pursuit, etc.). The CAD algorithm does not require any prior knowledge of the adversary.
• We consider the adversarial perturbation in the frequency domain instead of the time domain while formulating the problem, which allows us to counter both low and high frequency spectral components.
• We propose modified basis pursuit using a novel constraint to mitigate l_0 and l_∞ norm attacks, and establish its performance bounds.
• Our work has the potential to trigger a new line of research where compressive sensing and multi-armed bandits can be used for the detection and classification of adversarial videos.

This paper is further arranged as follows. Descriptions of the various recovery algorithms and their performance bounds are provided in Section II. The proposed CAD algorithm is described in Section III. The complexity analysis of CAD is provided in Section IV, followed by the numerical results in Section V and conclusions in Section VI.

In this section, we define the basic problem and propose various recovery algorithms assuming that the attack type is known to the classifier. It is noteworthy that here we propose modified versions of basis pursuit to combat l_0 and l_∞ attacks in the spectral domain, and provide performance bounds for these algorithms. The background theory provided in this section is a prerequisite for understanding the performance of the proposed CAD algorithm later under various circumstances.

Let us consider a clean, vectorized image x ∈ R^{N×1}, and let us assume that it is k-sparse [9, Definition 2.1] in the discrete Fourier transform domain. Let its Fourier coefficients be x̂ = Fx, where F ∈ C^{N×N} is the DFT matrix. The adversary modifies the image in the spectral domain by adding an error vector e to x̂, and the distorted image becomes y = F^{-1}(x̂ + e). For an l_0 attack, e is assumed to be τ-sparse, so that x̂ + e becomes at most (k + τ)-sparse in the Fourier domain. Defining A := F^{-1} and β := F^{-1} e, the modified image becomes y = A x̂ + β. Our objective is to recover x̂ from y. We solve this problem iteratively using the compressive sensing based adaptive defense (CAD) algorithm, comprising compressive sampling matching pursuit (CoSaMP) and modified versions of basis pursuit for the various attacks, together with an adapted version of the exponential weight algorithm for selecting the recovery algorithm.

We know that images are compressible signals, as their coefficients decay rapidly in the Fourier domain when arranged according to their magnitudes. CoSaMP [11] iteratively recovers the approximate Fourier coefficients of a compressible signal from noisy samples, given that the signal is sparse in the Fourier domain; it is based on orthogonal matching pursuit (OMP), but provides stronger guarantees than OMP. The authors of [11] have shown that this algorithm produces a 2k-sparse recovered vector whose recovery error in the l_2 norm is comparable with the scaled approximation error in the l_1 norm. CoSaMP provides optimal error guarantees for sparse signals, compressible signals and arbitrary signals. Since we do not know a priori whether the attack is l_0, l_2, l_∞ or gradient-based, and since it is difficult to infer the type of the attack initially, we use CoSaMP along with various versions of basis pursuit for Fourier coefficient recovery. This is further motivated by the fact that CoSaMP is robust against arbitrary injected error [11].
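To make this recovery step concrete, the following is a minimal numpy sketch of the five CoSaMP steps (signal proxy, identification, support merger, least squares estimation, pruning) as used for Action 1 later; the function name, the stopping rule and the generic dense sensing matrix are our own illustrative choices, not the exact implementation of [11].

import numpy as np

def cosamp(A, y, k, max_iter=30, tol=1e-6):
    """Illustrative CoSaMP sketch: recover a k-sparse coefficient vector
    from measurements y ≈ A @ x_hat (not an optimized implementation)."""
    N = A.shape[1]
    x_hat = np.zeros(N, dtype=A.dtype)
    residual = y.copy()
    for _ in range(max_iter):
        # 1) Signal proxy: correlate the residual with the columns of A.
        proxy = A.conj().T @ residual
        # 2) Identification: indices of the 2k largest proxy magnitudes.
        omega = np.argsort(np.abs(proxy))[-2 * k:]
        # 3) Support merger: union with the current support of x_hat.
        support = np.union1d(omega, np.nonzero(x_hat)[0]).astype(int)
        # 4) Least-squares estimation restricted to the merged support.
        ls, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        # 5) Pruning: keep only the k largest entries of the LS solution.
        x_new = np.zeros(N, dtype=A.dtype)
        keep = np.argsort(np.abs(ls))[-k:]
        x_new[support[keep]] = ls[keep]
        residual = y - A @ x_new
        if np.linalg.norm(x_new - x_hat) < tol:
            x_hat = x_new
            break
        x_hat = x_new
    return x_hat, residual

In our setting, A would be the inverse DFT matrix F^{-1}, x_hat the recovered spectral coefficients, and the returned residual is what later feeds the feedback checks and the exponential-weight scoring of Section III.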
However, our proposed CAD algorithm (described in Section III) also adaptively assigns a score to each recovery scheme via the exponential weight algorithm, using a residual-based feedback for each algorithm, and probabilistically selects an algorithm in each iteration based on the assigned scores. The exponential weight algorithm is typically used to solve online learning problems that involve exploration and exploitation, and the robustness of CoSaMP facilitates exploration, especially in the initial phase when the algorithm has not developed a strong belief about the type of attack. In this connection, it is worth mentioning that CoSaMP has provably strong performance bounds in all cases and also works well for highly sparse signals.

Let us denote by x̂^0 the initialisation before applying the CoSaMP algorithm (usually we take x̂^0 = 0). The quantity x̂_{h(k)} is a k-sparse vector (i.e., its l_0 norm is at most k) that consists of the k largest entries (in terms of absolute value) of x̂. We also define x̂_{t(k)} = x̂ − x̂_{h(k)}. The iteration number in the CoSaMP algorithm is denoted by n. The performance guarantee of CoSaMP is provided through the following theorem:

Theorem 1. Suppose that the 4k-th restricted isometry constant δ_{4k} of the matrix A ∈ C^{N×N} is sufficiently small. Then, for any x̂ ∈ C^N, with S the support of x̂_{h(k)} (so that card(S) = k), the Fourier coefficient iterates x̂^n produced by CoSaMP with y = A x̂ + β satisfy
‖x̂^n − x̂_{h(k)}‖_2 ≤ ρ^n ‖x̂^0 − x̂_{h(k)}‖_2 + τ ‖A x̂_{t(k)} + β‖_2,   (1)
where the constants 0 < ρ < 1 and τ > 0 depend only on δ_{4k}.

Proof. The proof is similar to that of [9, Theorem 6.27].

C. Combating l_2 Attack using Basis Pursuit

Standard basis pursuit is chosen to counter the l_2 perturbation [31], since it minimizes the l_1 norm of the Fourier coefficients while constraining the l_2 norm of the injected error. Let us assume that the l_2 perturbation satisfies ‖F^{-1} e‖_2 ≤ η for a small η, and hence is imperceptible to the human eye. Since F^{-1} is an orthonormal matrix, we can write this as ‖e‖_2 ≤ η. Let σ_k(x̂)_1 := min_{‖z‖_0 ≤ k} ‖x̂ − z‖_1. The performance bound for the standard basis pursuit algorithm is provided in the following theorem:

Theorem 2. Suppose that the 2k-th restricted isometry constant of the matrix A ∈ C^{N×N} satisfies δ_{2k} < 0.624. Then, for any x̂ ∈ C^N and y ∈ C^N with ‖A x̂ − y‖_2 ≤ η, a solution x̂* of min_{z ∈ C^N} ‖z‖_1 subject to ‖Az − y‖_2 ≤ η approximates x̂ with errors
‖x̂ − x̂*‖_1 ≤ C σ_k(x̂)_1 + D √k η,   (2)
‖x̂ − x̂*‖_2 ≤ (C/√k) σ_k(x̂)_1 + D η,   (3)
where the constants C, D > 0 depend only on δ_{2k}.

Proof. The proof is similar to that of [9, Theorem 6.12].

From Theorem 2, it is clear that, in order to guarantee unique recovery of the largest k Fourier coefficients, the sensing matrix A should satisfy the restricted isometry property (RIP) [9, Definition 6.1] of order 2k. It has been observed that, with high probability, random Gaussian and partial Fourier matrices satisfy the RIP [32], which ensures that any 2k columns of the matrix A are linearly independent. We can relate the performance bound (3) in the spectral domain with that in the time domain, since F^{-1} is an orthonormal matrix.

D. Combating l_0 Attack using Basis Pursuit

In Section III, we employ another modified version of basis pursuit to counter the l_0 attack; this involves a slightly different formulation. Let us assume that the perturbation error e is τ-sparse, and let us arrange the entries of the error vector e so that its τ non-zero entries appear first, in ascending order of magnitude, as [e_1, e_2, ..., e_τ, 0, ..., 0]. In an l_0 attack, the attacker has constraints only on the number of Fourier coefficients that can be perturbed. Since, according to the uncertainty principle [33], an image cannot be simultaneously narrow in the pixel domain as well as in the spectral domain, the l_∞ norm of the injected error e under an l_0 attack should be small enough to remain imperceptible to the human eye, i.e., ‖e‖_∞ < η′, for some constant η′.
Now, it is well known that ‖e‖_2 ≤ ‖e‖_1, and we also notice that
‖e‖_1 ≤ τ |e_τ| ≤ τ η′,   (4)
so that ‖e‖_2 ≤ τ η′. The performance bound for the modified basis pursuit algorithm under an l_0 attack is provided in the following theorem:

Theorem 3. Suppose that the 2k-th restricted isometry constant of the matrix A ∈ C^{N×N} satisfies δ_{2k} < 0.624. Then, for any x̂ ∈ C^N and y ∈ C^N with ‖A x̂ − y‖_2 ≤ τ η′, a solution x̂* of min_{z ∈ C^N} ‖z‖_1 subject to ‖Az − y‖_2 ≤ τ η′ approximates x̂ with errors
‖x̂ − x̂*‖_1 ≤ C σ_k(x̂)_1 + D √k τ η′,   (5)
‖x̂ − x̂*‖_2 ≤ (C/√k) σ_k(x̂)_1 + D τ η′,   (6)
where the constants C, D > 0 depend only on δ_{2k}.

Proof. The theorem follows easily from Theorem 2 and the fact that ‖e‖_2 ≤ τ |e_τ| ≤ τ η′, as discussed earlier.

E. Combating l_∞ Attack using Basis Pursuit

Let us assume that ‖e‖_∞ < η′′. Now, since F is orthonormal, ‖F^{-1} e‖_2 = ‖e‖_2, and hence
‖F^{-1} e‖_2 = ‖e‖_2 ≤ √N ‖e‖_∞ < √N η′′.   (7)
The performance guarantee for modified basis pursuit under an l_∞ attack is provided in the following theorem:

Theorem 4. Suppose that the 2k-th restricted isometry constant of the matrix A ∈ C^{N×N} satisfies δ_{2k} < 0.624. Then, for any x̂ ∈ C^N and y ∈ C^N with ‖A x̂ − y‖_2 ≤ √N η′′, a solution x̂* of min_{z ∈ C^N} ‖z‖_1 subject to ‖Az − y‖_2 ≤ √N η′′ approximates x̂ with errors
‖x̂ − x̂*‖_1 ≤ C σ_k(x̂)_1 + D √k √N η′′,   (8)
‖x̂ − x̂*‖_2 ≤ (C/√k) σ_k(x̂)_1 + D √N η′′,   (9)
where the constants C, D > 0 depend only on δ_{2k}.

Proof. The proof follows easily from Theorem 2 and (7).

F. Combating l_1 Attack using Basis Pursuit

If e is such that ‖e‖_1 < η, then the error in the recovered image also satisfies ‖F^{-1} e‖_2 = ‖e‖_2 ≤ ‖e‖_1 ≤ η, and we can solve the same l_1 minimization problem with the same constraint as in Section II-C for the l_2 attack. Similarly, its performance bound is given by Theorem 2.

In this section, we propose our main algorithm to combat adversarial images. Since the CAD algorithm does not have any prior knowledge of the type of attack, it employs an adaptive version of the exponential weight algorithm [10], [30] for exploration and exploitation to assign a score to each possible attack type, and chooses an appropriate recovery method based on the inferred nature of the injected error. In this paper, we consider four actions, i.e., four different ways to recover the k-sparse Fourier coefficients, corresponding to different types of perturbation:
• CoSaMP (Action 1): This greedy approach allows us to accurately approximate the Fourier coefficients initially, when we do not have any belief about the type of attack. As the iterations progress, the algorithm explores other actions as well.
• Modified basis pursuit L_0 (Action 2): A modified form of basis pursuit with the novel constraint ‖e‖_2 ≤ τ η′ is used to tackle the l_0 perturbation attack.
• Standard basis pursuit L_1 and L_2 (Action 3): The standard basis pursuit method is used to counter both l_1 and l_2 attacks.
• Modified basis pursuit L_∞ (Action 4): Modified basis pursuit is used to tackle the l_∞ norm attack, using the novel constraint given by (7).
In the next three subsections, we discuss three major aspects of our proposed CAD algorithm: (i) the adaptive exponential weight algorithm for choosing an appropriate recovery scheme, (ii) actions and feedback, and (iii) stopping criteria.

A. The adaptive version of exponential weight for choosing the recovery scheme

Algorithm 1 summarizes the overall defence strategy. In each iteration t, the algorithm randomly chooses an action using a probability distribution p_{a_i}(t), where a_i, i ∈ {1, 2, 3, 4}, denotes the chosen action. The probability of choosing an action is given by the exponential weighting
p_{a_i}(t) = (1 − γ) · exp(σ S_{a_i}(t − 1)) / Σ_{j=1}^{4} exp(σ S_{a_j}(t − 1)) + γ/4,   (10)
where S_{a_i}(t − 1) = Σ_{τ=1}^{t−1} r_{a_i}(τ) is the cumulative score of action a_i so far. Here σ and γ are tuning parameters such that σ > 0 and γ ∈ (0, 1). The reward r_{a_i}(t) for action a_i at the t-th iteration is the following:
r_{a_i}(t) = λ / p_{a_i}(t) if a_i is chosen in the t-th iteration and f_{a_i}(t) = 1; r_{a_i}(t) = −1 / (1 − p_{a_i}(t)) if a_i is chosen in the t-th iteration and f_{a_i}(t) = 0; and r_{a_i}(t) = 0 if a_i is not chosen in the t-th iteration.   (11)
Here f_{a_i}(t) is a binary feedback that is obtained by checking certain conditions for action a_i in the t-th iteration; this feedback signifies the applicability of action a_i. If action a_i is chosen in the t-th iteration and its feedback is f_{a_i}(t) = 1, the actual reward λ > 0 is divided by p_{a_i}(t) so that an unbiased estimate of the reward is obtained. On the other hand, if f_{a_i}(t) = 0, then a penalty of −1 is assigned to a_i. However, this penalty is divided by (1 − p_{a_i}(t)) to ensure that, if p_{a_i}(t) is small because a_i has not been chosen frequently earlier, the penalty incurred by a_i in the t-th iteration remains small. The action in each iteration is chosen in the following way. With probability γ, one action is chosen uniformly at random. This is done to ensure sufficient exploration of all recovery algorithms, irrespective of the reward accrued by them in the initial phase. On the other hand, with probability (1 − γ), each action is chosen randomly with a probability depending on its accumulated score.
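A small illustrative sketch of this selection and scoring step follows; the helper names, the softmax-style normalisation used for (10) and the specific parameter values are assumptions layered on the description above, not the authors' exact implementation.

import numpy as np

rng = np.random.default_rng(0)

def action_probabilities(S, sigma, gamma):
    """Exponential weighting of cumulative scores S (one entry per action),
    mixed with a uniform distribution for exploration (cf. eq. (10))."""
    w = np.exp(sigma * (S - S.max()))       # subtract max for numerical stability
    p_exploit = w / w.sum()
    return (1 - gamma) * p_exploit + gamma / len(S)

def reward(i, chosen, feedback, p, lam):
    """Importance-weighted reward/penalty for action i (cf. eq. (11))."""
    if i != chosen:
        return 0.0
    if feedback == 1:
        return lam / p[i]                   # unbiased estimate of the positive reward
    return -1.0 / (1.0 - p[i])              # attenuated penalty for rarely chosen actions

# One iteration of the bandit layer (recovery and feedback steps omitted):
S = np.zeros(4)                             # cumulative scores for actions a1..a4
sigma, gamma, lam = 0.5, 0.1, 1.0           # illustrative tuning parameters
p = action_probabilities(S, sigma, gamma)
chosen = rng.choice(4, p=p)                 # sample an action
feedback = 1                                # would come from the residual-based checks
S += np.array([reward(i, chosen, feedback, p, lam) for i in range(4)])

Roughly speaking, the division by p_{a_i}(t) makes the expected reward of a chosen, applicable action equal to λ regardless of how often it has been sampled, which is what keeps the cumulative score an unbiased indicator of each action's suitability.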
Algorithm 1: CAD
Input: the measurement matrix A = F^{-1}, the test image vector y, the dimension N of the image vector, sparsity parameters τ and k, perturbation levels η, η′ and η′′, the Mahalanobis distance (MD) threshold θ, the stopping time T, stopping-time threshold parameters ∆ and δ, and also α, β, m, γ ∈ (0, 1), λ > 0, σ > 0.
Initialisation: set the cumulative scores S_{a_i}(0) = 0 for all i ∈ {1, 2, 3, 4}, the Fourier coefficients x̂_0 = 0, the residual error v_0 = y, and p_{a_i}(1) = 1/4 for all actions in A = {a_1, a_2, a_3, a_4}.
Result: x̂, a k-sparse approximation of the Fourier coefficients.
Actions: a_1 (CoSaMP), a_2 (modified basis pursuit for l_0), a_3 (standard basis pursuit for l_1/l_2), a_4 (modified basis pursuit for l_∞).
for t = 1, ..., T do
  1) Select action a_i, i ∈ {1, 2, 3, 4}, with sampling distribution p_{a_i}(t) using (10).
  2) Perform a few more initial iterations of the chosen action a_i than the last time a_i was chosen.
  3) Find the top k Fourier coefficients, i.e., x̂_t = x̂_{h(k)}, using the output of the previous step.
  4) Calculate the residual error v_t ← y − A x̂_t.
  5) Set the feedback f_{a_i}(t) = 1 if the conditions for the chosen action (listed below) hold.
  6) Calculate the reward r_{a_i}(t) using (11).
  7) Update the cumulative score S_{a_i}(t) = S_{a_i}(t − 1) + r_{a_i}(t).

B. Actions and Feedback

We choose the following feedback criteria, i.e., the conditions under which f_{a_i}(t) = 1, for each action:
• Action 1: It is quite intuitive that if there is no attack, then the l_2 norm of the residual error will be upper bounded just by the recovery error at the end of the algorithm. Hence, we set its upper bound equal to the parameter α. The maximum absolute value in the residual vector is upper bounded by the parameter m. If these inequalities are satisfied in each iteration, then the algorithm concludes that there is no attack, and hence f_{a_i} = 1. We can also calculate the Mahalanobis distance (MD) [34] between the residual error of a test image and that of the clean images using
MD(v) = √((v − m̄)ᵀ C^{-1} (v − m̄)),   (12)
where v is the residual error of the test image, and m̄ and C are the mean and covariance of the residual errors of clean images, respectively. This is used as an alternative criterion to determine whether the image is malicious or not, by comparing it with a threshold parameter θ; it is reminiscent of the popular χ^2 detector used in anomaly detection.
• Action 2: Under an l_0 attack, the number of non-zero entries in the perturbation vector should be upper bounded by the parameter τ. Hence, we use the conditions ‖v_t‖_2 > α and ‖v_t‖_0 < τ. This can be explained from the fact that v_t includes the perturbation error along with the recovery error.
• Action 3: Along with the previous condition ‖v_t‖_2 > α, here we assume that the maximum absolute perturbation in the case of an l_2 or l_1 attack is upper bounded by β and lower bounded by m.
• Action 4: Checking for an l_∞ attack additionally requires us to verify whether the maximum residual error component, which acts as a proxy for the maximum perturbation, is greater than β.

Choosing an action yields a feedback status which influences the reward values as in (11), and consequently the probabilities of choosing all actions. In addition to the above feedback criteria, we numerically observed that the residual vector contains a large number of non-zero entries for actions 3 and 4 on adversarial grayscale images such as those of the MNIST dataset. Hence, in our experiments in Section V, we additionally check whether ‖v_t‖_0 is above a threshold.

C. Stopping Criteria

CAD can be run until the maximum limit T on the number of iterations is reached. However, if either of the two conditions p_{a_i}(t) > ∆ for some i ∈ {1, 2, 3, 4} or ‖v_t‖_2 < δ is met before that, for two given threshold parameters ∆ and δ, then the iterations stop. The condition p_{a_i}(t) > ∆ means that it is optimal to choose action i with high probability, and hence no further exploration is required. The condition ‖v_t‖_2 < δ means that most likely the test image is clean, and hence there is no need to investigate it further. At the end, the appropriate recovery method is chosen according to the action which achieves the maximum cumulative score. However, if the maximum score is negative at this time, then it implies that CAD is unable to clearly identify the type of attack, and hence CoSaMP is chosen as the default recovery method due to its robustness.

CoSaMP has the following five steps: forming the signal proxy, identification, support merger, least squares estimation, and pruning. The sensing matrix A = F^{-1} has dimension N × N, and the sparsity level is k. Hence, following standard matrix-vector multiplication, the time complexity of each of the five steps is at most O(N^2) [11]. For any action, choosing the top k Fourier coefficients is similar to the CoSaMP pruning step, and it can be done by a sorting algorithm in O(k log k) time. The number of operations required to calculate the residual error v_t = y − A x̂_t for a k-sparse vector x̂_t is O(kN). Calculating the various norms such as l_2, l_0 and l_∞ requires O(N) each time. Also, the number of iterations is upper bounded by T. Hence, the overall computational complexity of CAD is O(TN^2 + Tp(N)).

We conducted our experiments on the MNIST [35] and CIFAR-10 [36] data sets, with pixel values lying in [0, 1]. The Discrete Cosine Transform (DCT) domain is used in the experiments to obtain sparse coefficients. We consider only white box attacks, since the attacker in a black box attack has access to much less information than a white box attacker and hence is less effective in general. All experiments were performed in Google Colab. Foolbox [37] is an open source Python library that can exploit the vulnerabilities of DNNs and generate various malicious attacks. All our evaluations are done using version 2.3.0 of the Foolbox library. We evaluate our compressive sensing based adaptive defense (CAD) against five major state-of-the-art white box adversarial attacks: projected gradient descent (PGD) [12], the basic iterative method (BIM) [6], the fast gradient sign method (FGSM) [7], the Carlini-Wagner (L2) (CW) attack [13], and the Jacobian saliency map attack (JSMA) [14]. In the PGD attack, 40 iteration steps with random start are used in Foolbox. For the C&W attack, we use 10,000 iteration steps with a learning rate of 0.01. In the BIM attack, the number of iterations is set to 10 and the limit on the perturbation size is set to 0.3.
We use the default parameters of the Foolbox library for the FGSM attack. In the JSMA attack, the maximum number of iterations is set to 2000, and the perturbation size in the l_0 norm is set to 20 and 35 for MNIST and CIFAR-10 respectively. All attacks used in this work are bounded under the l_∞ norm with perturbation size ε = 0.3 and ε = 8/255 for MNIST and CIFAR-10 respectively. The authors of [26] observed that data sets such as MNIST (28 × 28) and CIFAR-10 (32 × 32) are too low dimensional to exhibit a diverse frequency spectrum. Hence, we do not test our algorithm against low frequency adversarial perturbation attacks.

For training, we use clean, compressed, reconstructed images obtained using only the top k DCT coefficients (a short illustrative sketch of this compression step is given at the end of this section). We then test the DNN based classifier against perturbed images (without any reconstruction) and note down its adversarial accuracy and loss. We then employ our proposed CAD algorithm to reconstruct the adversarial images and obtain the corrected classification accuracy and loss for each attack. The model architecture used for MNIST is described in Table I. We use an RMSprop optimizer in Keras with cross-entropy loss for MNIST. For CIFAR-10, we use a ResNet (32 layers) [38] model with the Adam optimizer and cross-entropy loss, with batch size 128 and 50 epochs. We randomly choose 7000 and 2050 images for MNIST and CIFAR-10 respectively from the training set, and train the classifier with their reconstructed and compressed versions (reconstructed by taking the top k DCT coefficients of each image). For MNIST, we take 1000 corrupted images randomly from the test set for each attack. Since 3 channels are available in CIFAR-10, attacks are much more expensive to execute, and the time complexity is O(3TN^2 + 3Tp(N)). Hence, we choose only 250 images randomly from the test set for each attack to evaluate our CAD algorithm. Various parameters used in the algorithm are as follows: the stopping criterion parameters are ∆ = 0.8 and δ = 2. For Action 2, although we state in the algorithm that the number of non-zero entries should be less than τ to satisfy the feedback condition, there will be some recovery error components in practice. Hence, we count the number of entries greater than a threshold 0.5 in the residual error vector, instead of exactly counting the number of non-zero entries. For CIFAR-10, we run our algorithm channel-wise to obtain the reconstructed coefficients.

It has been observed that defenses that employ adversarial training suffer from a loss of clean data accuracy, i.e., classifiers trained with adversarial images perform poorly on clean images. In order to address this problem, we evaluate the cross-entropy loss and classification accuracy of our algorithm on 10,000 uncompressed, clean test images in the first rows of Tables II and III for both MNIST and CIFAR-10. Our results show that our model trained with reconstructed and compressed images works effectively in classifying the uncompressed clean images, compared to the competing algorithms. Reverse engineering attacks allow the attacker to determine the decision rule by monitoring the output of the classifier for a sufficient number of query images [39]. The CAD algorithm randomly chooses the recovery algorithm based on the nature of the recovery error, and hence it is completely non-deterministic. In order to impart more uncertainty to the recovered coefficients and confuse the attacker, one can randomly initialize x̂_0 instead of initializing it with the all-zero vector. Hence, generating a reverse engineered attack for our proposed CAD algorithm becomes difficult.
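Returning to the data preparation described above, the top-k DCT compression used to build the training images can be sketched as follows; the magnitude-threshold rule, the function name and the use of SciPy's 2D DCT are our illustrative assumptions, not a verbatim reproduction of the authors' code.

import numpy as np
from scipy.fft import dctn, idctn

def topk_dct_reconstruct(img, k):
    """Keep only the k largest-magnitude 2D DCT coefficients of a single-channel
    image (pixel values in [0, 1]) and reconstruct it; used here to build the
    compressed training set."""
    coeffs = dctn(img, norm='ortho')
    flat = np.abs(coeffs).ravel()
    if k < flat.size:
        thresh = np.partition(flat, -k)[-k]            # k-th largest magnitude
        coeffs = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
    return np.clip(idctn(coeffs, norm='ortho'), 0.0, 1.0)

# For CIFAR-10 the same operation would be applied channel-wise, e.g.:
# rec = np.stack([topk_dct_reconstruct(img[..., c], k) for c in range(3)], axis=-1)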
For comparison, we take five state-of-the-art defenses proposed in recent years:
1) CRD [19]: We choose the CRD defense for comparison since it is also based on compressive sensing. The authors of [19] provided different recovery methods for the Fourier coefficients for each norm attack, but did not test their defense against gradient based attacks like FGSM and PGD, since these two attacks do not yield any norm condition.
2) Madry et al. defense [12]: This is a min-max optimization based defense using adversarial training to combat adversarial attacks.
3) Yu et al. defense [20]: This is also based on adversarial training, using the decision surface geometry as a parameter.
4) CMT [28]: This defense uses collaborative learning to increase the complexity of searching for adversarial images for the attacker. It is applicable to non-targeted, black box and grey box attacks.
5) Bafna et al. defense [18]: This defense is based on compressive sensing techniques and is only applicable to the l_0 attack.
The classification accuracy of all the defenses is computed for targeted white box adversarial images (since the white box attack is more effective), except for the CMT [28] defense scheme, which is evaluated on non-targeted black box adversarial images. Experimental results for the cross-entropy loss and classification accuracy of both adversarial and clean images for each attack are provided in Tables II and III. It is clear from the tabulated results that the CAD algorithm outperforms CRD, except under the CW(L2) attack on the MNIST dataset, where the performance is slightly worse. The defenses proposed in [12] and [20] perform well for datasets with grayscale images (e.g., MNIST), but exhibit very low classification accuracy for datasets with colored images (e.g., CIFAR-10). Finally, we compare the CAD algorithm with CMT [28]. Though a white box attack usually performs better than a black box attack, the proposed CAD algorithm under white box attacks achieves much better classification accuracy than CMT under black box attacks for the MNIST data set. On the other hand, the CAD algorithm under white box attacks achieves classification accuracy comparable to CMT under black box attacks for the CIFAR-10 data set. It is to be noted that CMT exhibits extremely poor classification accuracy against the CW(L2) attack for both the MNIST and CIFAR-10 data sets. Since the PGD and BIM attacks are very similar, they show similar trends in Tables II and III.

In Table IV, we compare the existing defenses against the l_0 norm attack JSMA. It can be observed that the CAD algorithm significantly outperforms the others for MNIST. For CIFAR-10, the CAD algorithm achieves high classification accuracy compared to CRD [19] but poorer accuracy compared to CMT [28]; however, one should remember that here the CAD algorithm is evaluated against the white box JSMA attack, while CMT is evaluated against the black box JSMA attack. We also illustrate the reconstruction quality of randomly selected images (after performing the inverse DCT on the recovered coefficients) for each attack in Figures 1 & 2. It is observed that the reconstructed images lead to high classification accuracy.

Many recently proposed defenses suffer from the problem of obfuscated gradients [40], [41]; such a defense often prevents accurate gradients from being used while generating adversarial images in the testing phase. Here we argue that the CAD algorithm does not cause gradient masking, the reasons being the following:
1) Iterative attacks are usually superior to single step attacks.
In order to verify this, we randomly select 15 images on which Foolbox can craft a perturbed image. We choose FGSM and PGD as the single step attack and the iterative attack respectively, evaluate each image separately on our model, and plot the cross-entropy loss for each image. From Figure 3 it can be seen clearly that, for each image, the cross-entropy loss is always lower for the PGD attack than for the FGSM attack, for both MNIST and CIFAR-10. This matches the well-known fact that iterative attacks are superior to single step attacks.
2) We apply unbounded distortion for both FGSM and PGD and observe that each image is misclassified. Hence, the attack exhibits a 100% success rate, which is another desired condition.

In this paper, we have proposed a compressive sensing based adaptive defense (CAD) scheme. The CAD algorithm chooses an appropriate recovery algorithm in each iteration using multi-armed bandit theory, based on the observed nature of the residual error. While the standard basis pursuit algorithm was previously used to mitigate the l_2 attack, we have proposed a modified basis pursuit with novel constraints to combat l_0 and l_∞ attacks, and have also provided their performance bounds. The proposed CAD algorithm achieves excellent classification accuracy with low computational complexity and low memory requirements for both white box gradient attacks and norm attacks. While our paper combines compressive sensing and multi-armed bandit techniques for adversarial image classification, this approach can be adopted even for classifying and detecting adversarial videos. However, computational complexity will be a major challenge for videos, and it can be alleviated to some extent by opportunistically sampling frames and applying tools similar to those in this paper to them. Tools from restless bandit theory can also be useful for videos. Thus, our paper opens the possibility of starting a new research domain on adversarial image and video detection using theoretical tools, a domain which has traditionally seen mostly DNN and heuristic based efforts.

Compressive-Sensing-Based-Adaptive-Defence-Against-Adversarial-Images

References
Fast automated detection of Covid-19 from medical images using convolutional neural networks
Are we ready for autonomous driving? The KITTI vision benchmark suite
Face recognition based surveillance system using FaceNet and MTCNN on Jetson TX2
NaturalAE: Natural and robust physical adversarial examples for object detectors
Adversarial examples in the physical world
Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples
Stable signal recovery from incomplete and inaccurate measurements
An invitation to compressive sensing
Gambling in a rigged casino: The adversarial multi-armed bandit problem
CoSaMP: Iterative signal recovery from incomplete and inaccurate samples
Towards deep learning models resistant to adversarial attacks
Towards evaluating the robustness of neural networks
The limitations of deep learning in adversarial settings
Delving into transferable adversarial examples and black-box attacks
ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models
Decision-based adversarial attacks: Reliable attacks against black-box machine learning models
Thwarting adversarial examples: An l_0-robust sparse Fourier transform
Compressive recovery defense: Defending neural networks against l_2, l_∞, and l_0 norm attacks
Interpreting adversarial robustness: A view from decision surface in input space
Multi-scale defense of adversarial images
Encryption inspired adversarial defense for visual classification
Efficient detection of adversarial images
Detecting adversarial samples from artifacts
Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks
On the effectiveness of low frequency perturbations
Defending against adversarial attack towards deep neural networks via collaborative multi-task training
An introduction to signal detection and estimation
Exploiting channel sparsity for beam alignment in mmWave systems via exponential learning
On the stability of the basis pursuit in the presence of noise
Restricted isometry of Fourier matrices and list decodability of random linear codes
The uncertainty principle
The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems
The MNIST database of handwritten digits
Learning multiple layers of features from tiny images
Foolbox: A Python toolbox to benchmark the robustness of machine learning models
ResNet in ResNet: Generalizing residual architectures
Stealing machine learning models via prediction APIs
Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples
Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness