key: cord-0108455-nwj0zijm authors: Ambati, Anirudh; Dubey, Shiv Ram title: AC-CovidNet: Attention Guided Contrastive CNN for Recognition of Covid-19 in Chest X-Ray Images date: 2021-05-21 journal: nan DOI: nan sha: 79cf34f01decc3df30635ed8bb56c8220c19388c doc_id: 108455 cord_uid: nwj0zijm
The Covid-19 global pandemic continues to devastate health care systems across the world. At present, Covid-19 testing is costly and time-consuming. Chest X-Ray (CXR) testing can be a fast, scalable, and non-invasive alternative. Existing methods suffer from the limited number of CXR samples available for Covid-19. Motivated by the limitations of open-source work in this field, we propose an attention guided contrastive CNN architecture (AC-CovidNet) for Covid-19 detection in CXR images. The proposed method learns robust and discriminative features with the help of a contrastive loss. Moreover, guided by an attention mechanism, it gives more importance to the infected regions. We compute the sensitivity of the proposed method on a publicly available Covid-19 dataset. The proposed AC-CovidNet shows very promising performance compared to existing methods, even with limited training data, and can help tackle the bottleneck of small CXR Covid-19 datasets faced by researchers. The code used in this paper is publicly released at https://github.com/shivram1987/AC-CovidNet/.
Coronavirus disease 2019 (Covid-19) emerged very rapidly as a health risk affecting the whole world. It has been observed that the infection can spread through surfaces contaminated by an infected person. The spread of Covid-19 is categorized into different stages: stage 1 and stage 2 refer to small-scale spread, whereas stage 3 and beyond refer to large-scale spread due to a chain reaction.
The Covid-19 pandemic is being witnessed as the toughest time of the century during April-May 2021 due to its second wave, which has already entered stage 3/stage 4 (i.e., community spread). The pandemic has thus placed a huge burden on healthcare systems across the world. Testing is the most important part of the response to Covid-19 and must be scaled as much as possible. CXR based testing can be one of the fastest ways to test using existing infrastructure, and it can be scaled quickly and cost-effectively. Images of the lungs can be captured with different imaging tools, such as CT scans and X-rays. Getting a CT scan is a costly and time-consuming process; moreover, only major hospitals have CT scanners. In contrast, capturing an X-ray is affordable and efficient. Covid-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and infected patients show distinct visual features in Chest X-Ray (CXR) images. Hence, artificial intelligence based automated techniques can be used to detect the infection in CXR images, and such testing can be fast, scalable, and economical. Researchers have explored artificial intelligence based deep learning techniques for Covid-19 detection, such as COVID-Net [36], CovidAID [22], COVID-CAPS [1], CovXNet [21], DarkCovidNet [26] and a Convolutional Neural Network (CNN) Ensemble [6]. The main problem in developing deep learning based models for Covid-19 detection in CXR images is the lack of sufficient data to train and test them. Hence, there is an urgent need for a deep learning model that learns distinctive features from limited data. In order to learn discriminative and localized features, we propose an attention guided contrastive CNN (AC-CovidNet) for Covid-19 recognition from CXR images.
The contributions of this work are as follows:
- We propose a novel AC-CovidNet deep learning framework for Covid-19 recognition in CXR images.
- The attention module enforces the learning of localized visual features corresponding to Covid-19 symptoms.
- The contrastive loss increases the discriminative ability and robustness of the model by learning the similarity among Covid-19 infected samples and the dissimilarity between Covid-19 positive and negative samples.
- The impact of the proposed method is analyzed for different amounts of training data w.r.t. recent state-of-the-art models.
The remainder of the paper is organized as follows: Section 2 summarizes related work; Section 3 describes the proposed AC-CovidNet model; Section 4 details the experimental settings; Section 5 presents the results and analysis; and Section 6 summarizes the findings with concluding remarks.
The primary research by Wang and Wong (2020) [36] showed that chest radiograph images can be used for Covid-19 detection. It opened up a new, urgent and demanding area: the use of Artificial Intelligence for early, efficient and large-scale detection of the virus among people. Ng et al. [24] released the imaging profile of Covid-19 infection with radiologic findings. Li et al. [20] described the spectrum of CT findings and the temporal progression of Covid-19, which suggests that the problem can be addressed with imaging AI based tools. Bai et al. [4] studied the performance of radiologists in differentiating Covid-19 from viral pneumonia on chest CT. Deep learning has also been utilized for Covid-19 detection from Chest X-Ray (CXR) images [23]. In one of the first attempts, CXR radiograph images were used for Covid-19 detection with COVID-Net [36], a deep learning based convolutional neural network (CNN) model.
COVID-Net makes heavy use of a projection-expansion-projection-extension (PEPX) module, and various configurations of the model were evaluated. The authors created the COVID-Net architecture with a human-machine collaborative design strategy that combines human-driven prototyping with machine-driven exploration. They also assembled the COVIDx dataset from various sources, and it continues to be updated with new data. The dataset and models are publicly released for further research. Using the COVID-Net model, a sensitivity of 96% is observed in [36] on a test set of 100 CXR images. The CovidAID model proposed in [22] is a pretrained CheXNet [29] (a 121-layer DenseNet [14]) followed by a fully connected layer. Using CovidAID, 100% sensitivity is observed on a test set of 30 Covid-19 images. A capsule network based deep learning model (COVID-CAPS) is investigated in [1], with the aim of preventing the loss of spatial information observed in CNN based methods. Using COVID-CAPS, a sensitivity of 90% is reported on 100 test CXR images. The CovXNet model [21] uses transferable multi-receptive feature optimization; four different configurations of a network are used in CovXNet for training and prediction. Using CovXNet, 91% sensitivity is reported on 100 test CXR images. The DarkCovidNet model introduced in [26] is based on the DarkNet architecture, the backbone of the You Only Look Once (YOLO) real-time object detection system. Using DarkCovidNet, classification accuracies of 98.08% for the binary case and 87.02% for the multi-class case are observed. A CNN ensemble of DenseNet201, ResNet50-v2 and Inception-v3 is utilized for Covid-19 recognition in [6]. Only samples from two classes (i.e., covid and non-covid) are used to train this ensemble, for which a classification accuracy of 91.62% is reported.
The researchers in [13] developed an open-source framework of algorithms to detect Covid-19 from CT scan images, and the researchers in [15] developed Covid-CT using a Self-Trans approach. We have tested these algorithms on our CXR dataset configurations. The works above convincingly show that AI-powered deep learning methods can play a vital role in Covid-19 screening. CT scans and CXR radiographs are the main modalities used by imaging based techniques. So far, less attention has been given to CXR images because of the poor generalization performance caused by the limited availability of data [2], [7]. Given the need for mass screening at affordable cost, further rapid research on lung CXR images that works with limited data is very much needed. Thus, in this paper, we utilize attention mechanisms and contrastive learning to tackle learning with limited data for Covid-19 recognition from CXR images.
In this section, we first give a brief overview of deep learning, attention mechanisms and contrastive learning, and then present the proposed AC-CovidNet architecture. Deep learning has had a great impact on solving many challenging problems over the last decade [19]. Deep learning models consist of deep neural networks that automatically learn important features from the data. Deep models are generally trained using stochastic gradient descent based optimization [17], [10]. Convolutional neural network (CNN) based models have been used for image data in tasks such as image classification [18], face recognition [32], image retrieval [9], hyperspectral image analysis [31], and biomedical image analysis [11]. The attention mechanism in deep learning [3] facilitates the learning of localized features, which is particularly important for Covid-19 recognition from CXR images.
It has also been shown that attention based models can outperform plain neural network models [35]. The attention mechanism has also been utilized in applications such as facial micro-expression recognition [12], breast tumor segmentation [34], and face recognition [30]. Motivated by the success of attention mechanisms, we utilize attention in the proposed model for Covid-19 recognition. Contrastive learning is a recent trend for learning visual representations by modeling the similarity between similar samples and the dissimilarity between dissimilar samples in an abstract feature space [5], [16]. Generally, contrastive learning relies on the feature similarity between positive pairs and negative pairs [33]. Contrastive learning has shown very promising performance on different problems, such as face generation [8], image-to-image translation [27], medical visual representations [37], and video representation [28]. Motivated by the discriminative and robust feature representations obtained by contrastive learning, we utilize it in the proposed method.
In this paper, we propose an attention guided contrastive CNN for Covid-19 recognition, named AC-CovidNet. The architecture of the proposed AC-CovidNet model is illustrated in Fig. 1. The proposed model is based on the popular COVID-Net model [36], which heavily uses a lightweight residual projection-expansion-projection-extension (PEPX) module. The PEPX component is shown in Fig. 1 (right side). The architecture also uses selective long-range connectivity, which improves its representational capacity and makes the model easier to train. However, the extensive use of these long-range connections may introduce many redundant low-level features. To resolve this issue, the proposed model uses an attention mechanism. Attention helps the model prioritize the regions that are important w.r.t. the problem being solved.
Attention also helps suppress activations from redundant features of the initial layers and focus on the features that are required to solve the given problem. We use attention gates, as suggested in [25], at the layers of the COVID-Net architecture where many long-range connections are used. This improves the sensitivity, as the model attends better to the important visual features of the regions infected by Covid-19 in CXR images. As the difference between Covid-19 and Pneumonia features is very subtle, we propose to use the supervised contrastive loss, which encourages the network to increase the distance between the learnt representations of different classes as much as possible.
Architecture
The proposed AC-CovidNet architecture extends COVID-Net with attention gates placed where multiple long-range connections converge. The details of the architecture of the proposed model are illustrated in Fig. 1. We use PEPX layers and attention gates heavily as part of our architecture, and a contrastive loss function during training. The model is first trained using supervised contrastive learning before final fine-tuning using supervised learning.
PEPX Layer
The architecture of a Projection-Expansion-Projection-Extension (PEPX) layer is shown in Fig. 1 (right side). The idea of this module is to project features into a lower dimension using the first two conv1x1 layers, then expand those features using a depthwise convolution layer (DWConv3x3), and project them into a lower dimension again using two conv1x1 layers. Thus, the PEPX layer leads to an efficient model by reducing the number of parameters and operations.
Attention Gate
We use attention gates in the proposed AC-CovidNet model at various layers, as depicted in Fig. 1. The block diagram of the attention module is shown in Fig. 3. Features from multiple layers are passed through conv1x1 layers and added together.
Then, the aggregated features are passed through a ReLU activation function, followed by a conv1x1 layer and a sigmoid activation function. The feature map output by the sigmoid layer is then passed through a resampler. The output of the resampler is added to the …
First, we train the encoder network E(.) with the supervised contrastive loss (SupCon loss) using the projection network P(.). Then, we discard the projection network and add the classifier network C(.) to the encoder network. Finally, we freeze the weights of the encoder and train the classifier with the categorical cross entropy loss. The training of the encoder of the proposed model is summarized in Algorithm 1. We use the supervised contrastive learning method [16] to train the encoder network for feature extraction.
Supervised Contrastive Loss
Contrastive loss is most commonly used in unsupervised and self-supervised learning. To adapt it to supervised learning and take advantage of the available labels, supervised contrastive learning was investigated in [16]. This loss is used to train the encoder network of the proposed AC-CovidNet. Consider $z_i = P(E(x_i))$, where $x_i$ is the $i$-th input in a batch with index set $I$, and let $A(i) \equiv I \setminus \{i\}$ be the set of all indices except $i$. Then, the supervised contrastive loss is given as
$$\mathcal{L}^{sup} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)},$$
where $\tau \in \mathbb{R}^+$ is a scalar temperature parameter, $P(i) \equiv \{p \in A(i) : \tilde{y}_p = \tilde{y}_i\}$ is the set of indices of all positive samples other than $i$, and $|P(i)|$ is its cardinality. Note that $\tilde{y}$ represents the class label.
Cross Entropy Loss
The cross entropy loss is used to train the classifier, which takes its input from the feature extractor. The classifier outputs the probabilities of the three classes through a softmax activation function.
The cross entropy loss is given as
$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{3} y_i^j \log \hat{y}_i^j,$$
where $\hat{y}_i^j$ and $y_i^j$ are the output probability and ground-truth value, respectively, for the $j$-th class of the $i$-th sample, and $N$ is the number of samples in a batch.
The proposed AC-CovidNet model is pretrained on ImageNet, as suggested by [36]. The entire model is then trained in two stages. In the first stage, the encoder network (i.e., the feature extractor) is trained using the contrastive loss function suggested in [16], so that the distances between the learnt features are optimized. In the second stage, the feature extractor is frozen and a classifier added on top of it is trained with the cross entropy loss function. We train the proposed model, as well as the state-of-the-art models, on all three variations of the COVIDx dataset, i.e., with 100, 150 and 200 test images of different categories. The models are trained using the Adam optimizer [17] with a learning rate of 1.7e-4 and a batch size of 64. We use the ReLU activation function in every layer of the network and softmax in the last layer. Max-pooling is used after every group of PEPX layers. The three versions of the COVIDx dataset are used for training, and the per-class sensitivities are compared. Covid-19 sensitivity is the percentage of Covid-19 instances that are correctly identified. The model is trained using computational resources provided by Google Colab, with the Keras deep learning library and a TensorFlow backend.
We test the proposed AC-CovidNet model on all three configurations of the COVIDx dataset and calculate the sensitivity for Covid-19 and the other classes. To demonstrate the superiority of the proposed method, we also compute results using state-of-the-art deep learning based Covid-19 recognition models, namely CovXNet [21], COVID-CAPS [1], CNN Ensemble [6], DarkCovidNet [26], COVID-Net [36], and CovidAID [22].
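The two losses used in the two training stages can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions, not the authors' Keras/TensorFlow implementation: `supcon_loss` follows the supervised contrastive loss with the sets A(i) and P(i) defined earlier, L2-normalizing the embeddings as in [16] and averaging over anchors rather than summing (an assumption that makes the value independent of batch size), while `cross_entropy` is the batch-averaged categorical cross entropy used for the classifier. All function names and the temperature default `tau=0.1` are hypothetical; the paper does not state a temperature value.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (SupCon embeddings lie on the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(z, labels, tau=0.1):
    """Stage 1: supervised contrastive loss, averaged over anchors.

    z      -- embeddings z_i = P(E(x_i)) for one batch (lists of floats)
    labels -- class label of each sample
    tau    -- temperature (hypothetical default; not given in the paper)
    """
    z = [l2_normalize(v) for v in z]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        a_set = [a for a in range(n) if a != i]               # A(i)
        p_set = [p for p in a_set if labels[p] == labels[i]]  # P(i)
        if not p_set:
            continue  # an anchor without positives in the batch is skipped
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau for a in a_set}
        # Numerically stable log of the denominator sum over A(i)
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += -sum(logits[p] - log_denom for p in p_set) / len(p_set)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy(probs, targets):
    """Stage 2: categorical cross entropy averaged over the N samples of a batch.

    probs   -- softmax outputs (one list of class probabilities per sample)
    targets -- one-hot ground-truth vectors of the same shape
    """
    n = len(probs)
    # Terms with y = 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(
        y * math.log(p)
        for p_row, y_row in zip(probs, targets)
        for p, y in zip(p_row, y_row)
        if y > 0
    ) / n


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(z, [0, 0, 1, 1]))  # labels agree with geometry: low loss
    print(supcon_loss(z, [0, 1, 0, 1]))  # labels disagree with geometry: higher loss
    print(cross_entropy([[0.7, 0.2, 0.1]], [[1, 0, 0]]))  # equals -log(0.7)
```

In the second stage the encoder weights would be frozen, so only the classifier parameters would receive gradients from `cross_entropy`; the projection network used by `supcon_loss` is discarded after the first stage.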
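As a quick arithmetic check of the sensitivity definition above, the following sketch converts the Covid-19 sensitivities reported for AC-CovidNet on configurations I, II and III back into counts of correctly identified test images. The helper names are hypothetical; the Covid-19 test-set sizes (100, 150, 200) and sensitivities (96%, 96.66%, 96.5%) are the values reported in this paper.

```python
def sensitivity(true_positives, false_negatives):
    """Covid-19 sensitivity (recall) in %: share of Covid-19 cases identified correctly."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def correct_count(n_covid_test, sensitivity_pct):
    """Number of correctly identified Covid-19 images implied by a sensitivity value."""
    return round(n_covid_test * sensitivity_pct / 100.0)


# (number of Covid-19 test images, reported AC-CovidNet sensitivity in %)
REPORTED = {"I": (100, 96.0), "II": (150, 96.66), "III": (200, 96.5)}

if __name__ == "__main__":
    for config, (n_test, pct) in REPORTED.items():
        tp = correct_count(n_test, pct)
        fn = n_test - tp
        print(f"config {config}: {tp}/{n_test} correct, sensitivity {sensitivity(tp, fn):.2f}%")
```

Configuration II yields 145 of 150 correct, matching the count stated in the results. Since 145/150 is 96.67% when rounded to two decimals, the reported 96.66% appears to be truncated rather than rounded.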
The sensitivities for the Covid-19 class are reported in Table 2. On configuration I (i.e., the COVIDx-v1 dataset with 100 test images), configuration II (i.e., the COVIDx-v2 dataset with 150 test images), and configuration III (i.e., the COVIDx-v3 dataset with 200 test images), the proposed AC-CovidNet model achieves Covid-19 sensitivities of 96%, 96.66%, and 96.5%, respectively. As Table 2 shows, the proposed model outperforms or matches the remaining models on all three settings of the COVIDx dataset. We attribute this to the proposed model learning Covid-19 specific features through the attention module and increasing the separation between classes in feature space through the contrastive loss. On configuration I of the dataset (with 100 test images), the proposed model performs better than the CovXNet, COVID-CAPS, CNN Ensemble, DarkCovidNet and CovidAID models, and on par with COVID-Net. To demonstrate the advantage of the proposed model, we therefore compare results with fewer training samples and more test samples, which probes the generalization capability of the models: a larger test set is expected to give a more reliable estimate of how well a deep learning model generalizes. Thus, we experiment with configuration II, which has 467 training samples and 150 test samples of the Covid-19 category. As Table 2 shows, the proposed model retains similar performance, correctly classifying 145 of the 150 Covid-19 images in the test set, whereas the performance of the other models drops significantly. We also test the performance with even fewer training samples and more test samples in configuration III, with 417 training images and 200 test images from the Covid-19 category.
The performance of the proposed AC-CovidNet model remains similar, whereas the other models fail drastically to generalize with the limited training set. This indicates that the proposed AC-CovidNet model captures robust and discriminative features pertaining to Covid-19 infection and generalizes well even with limited training data. It also shows the positive impact of the attention modules and contrastive learning on Covid-19 recognition from CXR images. The impact of the attention and contrastive mechanisms is also investigated by adding only the attention mechanism or only the contrastive loss to the base network CovidNet (i.e., CovidNet + attention and CovidNet + contrastive loss, respectively). As the results in Table 5 show, AC-CovidNet, which uses both attention and contrastive mechanisms, performs better for Covid-19 recognition than the models that use only one of them. We also report the results for the other two classes, i.e., Pneumonia and Normal, in Table 5. The performance of the proposed model is comparable to the state-of-the-art for the Pneumonia and Normal classes.
In this paper, the AC-CovidNet model is proposed for Covid-19 recognition from chest X-Ray images. The proposed model utilizes an attention module to learn task-specific features by better attending to the infected regions in the images. It also utilizes contrastive learning to achieve better separation in the feature space, increasing its discriminative ability and robustness. The results are computed over three different configurations of the Covid-19 dataset with varying numbers of training and test samples, and are compared with six recent state-of-the-art deep learning models.
The proposed AC-CovidNet model outperforms the existing models in terms of sensitivity for the Covid-19 category. Moreover, the performance of the proposed model remains consistent with a limited training set, whereas the existing methods fail to do so, which shows the better generalization capability of the proposed method. Future work includes utilizing recent developments in deep learning to solve the Covid-19 recognition problem from chest X-Ray images with better performance.
References
[1] COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[2] Discovery of a generalization gap of convolutional neural networks on COVID-19 X-rays classification.
[3] Neural machine translation by jointly learning to align and translate.
[4] Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT.
[5] A simple framework for contrastive learning of visual representations.
[6] Automatic COVID-19 detection from X-ray images using ensemble learning with convolutional neural network.
[7] AI for radiographic COVID-19 detection selects shortcuts over signal.
[8] Disentangled and controllable face image generation via 3D imitative-contrastive learning.
[9] A decade survey of content based image retrieval using deep learning.
[10] diffGrad: An optimization method for convolutional neural networks.
[11] Local bit-plane decoded convolutional neural network features for biomedical image retrieval.
[12] MERANet: Facial micro-expression recognition using 3D residual attention network.
[13] Sample-efficient deep learning for COVID-19 diagnosis based on CT scans.
[16] Supervised contrastive learning.
[17] Adam: A method for stochastic optimization.
[18] ImageNet classification with deep convolutional neural networks.
[19] Deep learning.
[20] Coronavirus disease (COVID-19): Spectrum of CT findings and temporal progression of the disease.
[21] CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization.
[22] CovidAID: COVID-19 detection using chest X-ray.
[23] Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks.
[24] Imaging profile of the COVID-19 infection: Radiologic findings and literature review.
[25] Attention U-Net: Learning where to look for the pancreas.
[26] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[27] Contrastive learning for unpaired image-to-image translation.
[28] Spatiotemporal contrastive video representation learning.
[29] CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning.
[30] Attention-aware deep reinforcement learning for video face recognition.
[31] HybridSN: Exploring 3-D-2-D CNN feature hierarchy for hyperspectral image classification.
[32] Hard-mining loss based convolutional neural network for face recognition.
[33] What makes for good views for contrastive learning.
[34] Attention-enriched deep learning model for breast tumor segmentation in ultrasound images.
[35] Attention is all you need.
[36] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[37] Contrastive learning of medical visual representations from paired images and text.