key: cord-0191100-lu4rgiz1 authors: Jiang, Juntao; Lin, Shuyi title: COVID-19 Detection in Chest X-ray Images Using Swin-Transformer and Transformer in Transformer date: 2021-10-16 journal: nan DOI: nan sha: 4da98d806c85ec723adbac70dac6f95518b685d0 doc_id: 191100 cord_uid: lu4rgiz1

The Coronavirus Disease 2019 (COVID-19) has spread globally and caused serious damage. Chest X-ray images are widely used for COVID-19 diagnosis, and artificial intelligence methods can help increase the efficiency and accuracy of diagnosis. In the Chest XR COVID-19 Detection Challenge of the Ethics and Explainability for Responsible Data Science (EE-RDS) conference 2021, we proposed a method combining Swin Transformer and Transformer in Transformer to classify chest X-ray images into three classes, COVID-19, Pneumonia and Normal (healthy), and achieved 0.9475 accuracy on the test set.

The Coronavirus Disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, 2019-nCoV), has become a global pandemic and brought unprecedented damage worldwide. As chest X-ray tests typically have high sensitivity for the diagnosis of COVID-19 [1], [2], chest X-ray images can be used not only to follow up the effects of COVID-19 on lung tissue, but also for early detection of COVID-19, so that immediate isolation and treatment of suspected cases can be achieved. Recent years have witnessed the growing use of AI techniques [3]-[6], especially deep-learning-based methods, for disease detection on chest X-ray images, successfully increasing the accuracy and efficiency of early diagnosis. Since the outbreak of the pandemic, many approaches have been proposed to detect COVID-19. Shervin Minaee et al. [7] trained four state-of-the-art convolutional networks for COVID-19 detection on a dataset of around 5,000 X-ray images and achieved sensitivity and specificity rates above 90 percent. Asif Iqbal Khan et al.
[8] designed a deep convolutional neural network based on the Xception architecture to classify Normal, Pneumonia-bacterial, Pneumonia-viral and COVID-19 chest X-ray images. Linda Wang et al. [9] introduced a deep convolutional neural network called COVID-Net to detect COVID-19 cases from chest X-ray images, which is open source and available to the general public. Rachna Jain et al. [10] used Inception V3, Xception and ResNeXt for the classification. Sanhita Basu et al. [11] proposed a new concept called domain extension transfer learning (DETL), with a deep convolutional neural network pretrained on a related large chest X-ray dataset.

Special attention should be paid to the fact that Transformer-based methods have recently outperformed convolutional neural networks on datasets from several domains. Swin Transformer [12] and Transformer in Transformer [13] are successful works adapting the Transformer from language to vision and have achieved state-of-the-art results on different tasks, but we have seen few applications to chest X-ray image classification and COVID-19 detection. This paper is a technical report for the Chest XR COVID-19 Detection Challenge [14], part of the Ethics and Explainability for Responsible Data Science (EE-RDS) conference. We combined Swin Transformer and Transformer in Transformer to classify chest X-ray images into three classes (COVID-19, Pneumonia and Normal) on the dataset offered by the challenge organizer and achieved 0.9475 accuracy on the test set.

The dataset of this challenge contains three parts, used for training, validation and testing respectively:
• Training set: 17,955 chest X-ray images
• Validation set: 3,430 chest X-ray images
• Test set: 1,200 chest X-ray images
There are three types of chest X-ray images in this dataset: COVID-19, Pneumonia and Normal (healthy). Examples of the three classes are shown in Figure 1. Our task is to design an algorithm that automatically classifies chest X-ray images into these three classes.
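For reference, the split sizes above can be summarized programmatically. This is a trivial sketch using only the numbers stated in the challenge description:

```python
# Split sizes as stated in the challenge description.
splits = {"train": 17_955, "validation": 3_430, "test": 1_200}
total = sum(splits.values())  # 22,585 images overall
for name, n in splits.items():
    # Report each split's share of the full dataset.
    print(f"{name}: {n} images ({n / total:.1%} of the data)")
```

The training set thus accounts for roughly 79.5% of the images, validation for about 15.2%, and the held-out test set for about 5.3%.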
The workflow of our classification method is shown in Figure 2. We trained Swin Transformer and Transformer in Transformer separately and ensembled their results using a weighted average.

The workflow of our preprocessing for training is shown in Figure 3 and that for testing in Figure 4. First we resized the images to a fixed scale (224*224 or 384*384), then applied random flipping, then selected one policy from rotation, horizontal translation or vertical translation, and then applied random erasing. Finally we performed normalization. All these transformations are applied with small probability. Unlike natural images, the overall characteristics of medical images are significant for classification, so we avoid using crop or center crop as data augmentation. We only use horizontal flipping, and the rotation and translation augmentations are kept within a small range. We did not use preprocessing related to brightness or contrast, to avoid destroying the differences between the three classes.

1) Swin Transformer: Swin Transformer [12] is a hierarchical vision Transformer whose representation is computed with shifted windows; it is flexible to modeling at various sizes and has linear computational complexity with respect to image size. Swin Transformer has different variants according to model size, and we selected Swin-B(ase) for this task.

2) Transformer in Transformer: Transformer in Transformer (TNT) [13] regards local patches as "visual sentences" and further divides them into smaller patches as "visual words"; features of both words and sentences are aggregated to improve the representation ability. TNT also has different variants according to model size, and we selected TNT-S(mall) for this task. Larger models should perform better, but we could not try them due to the limited time.

Training settings. Our implementation is based on PyTorch [15] and mmclassification [16], and one NVIDIA Tesla V100 GPU is used for training.
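The augmentation policy described above can be sketched as follows. This is a minimal NumPy illustration, not the actual mmclassification pipeline: the probability `p`, the shift range, the erased-patch size, and the per-image normalization are assumed values for the sketch, not settings taken from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_erase(img, frac=0.1):
    """Zero out one small rectangle of a 2-D image (minimal random-erasing sketch)."""
    h, w = img.shape
    eh, ew = max(1, int(h * frac)), max(1, int(w * frac))
    y = int(rng.integers(0, h - eh + 1))
    x = int(rng.integers(0, w - ew + 1))
    out = img.copy()
    out[y:y + eh, x:x + ew] = 0.0
    return out

def augment(img, p=0.3):
    """One training-time pass over a resized image: horizontal flip, then a
    small shift standing in for the rotation/translation policy, then random
    erasing. Each step fires with small probability p, as in the text."""
    if rng.random() < p:
        img = img[:, ::-1].copy()  # horizontal flip only, no vertical flip
    if rng.random() < p:
        # Small translation; the +/-5 pixel range is an assumed "small range".
        shift = tuple(int(s) for s in rng.integers(-5, 6, size=2))
        img = np.roll(img, shift, axis=(0, 1))
    if rng.random() < p:
        img = random_erase(img)
    # Per-image standardization as a stand-in for dataset normalization.
    return (img - img.mean()) / (img.std() + 1e-8)
```

Applying `augment` to a resized 224*224 (Swin) or 384*384 (TNT) array yields one normalized training input; brightness or contrast jitter is deliberately absent, per the text.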
Models are trained with AdamW, with the initial learning rate, batch size and weight decay set to 0.001, 64 and 0.05 respectively. For the learning rate schedule, cosine annealing [17] with warmup [18] is used. We also applied label smoothing to the loss. In training, the input image size is 224*224 for Swin Transformer and 384*384 for Transformer in Transformer. This design is due to limited computational resources and for the convenience of directly using models pre-trained on ImageNet [19].

We obtained per-class probabilities from the two models and computed their weighted average. Finally, a linear classifier was used to obtain the classification results.

We trained the two models on the training set and validated on the validation set, then selected the best model on the validation set for inference on the test set. Due to the limited time, the experiments are not sufficient for a thorough comparison but may still be informative. The results are shown in Table 1. More specifically, our final results are:
• Accuracy: 0.9475
• Sensitivity: 0.9475
• Specificity: 0.9509
Our result ranks 10th on the leaderboard of this challenge.

In this paper, we applied Swin Transformer and Transformer in Transformer to classify chest X-ray images in the Chest XR COVID-19 Detection Challenge and ensembled the models using a weighted average. We achieved 0.9475 accuracy on the test set and ranked 10th on the leaderboard.
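As a closing illustration, the weighted-average ensembling step can be sketched as below. The weight `w`, the class ordering, and the use of a simple argmax in place of the linear classifier are all assumptions for the sketch; the report does not state these details.

```python
import numpy as np

# Assumed class ordering; the actual index mapping is not given in the report.
CLASSES = ["COVID-19", "Pneumonia", "Normal"]

def ensemble_predict(p_swin, p_tnt, w=0.5):
    """Weighted average of per-class probabilities from the two models.

    p_swin, p_tnt: (n_samples, 3) arrays of softmax probabilities from
    Swin-B and TNT-S respectively; w is the weight on the Swin outputs
    (0.5 is a hypothetical value). Argmax stands in for the final linear
    classifier used in the paper.
    """
    p = w * np.asarray(p_swin) + (1.0 - w) * np.asarray(p_tnt)
    return [CLASSES[i] for i in p.argmax(axis=1)]
```

For example, `ensemble_predict([[0.7, 0.2, 0.1]], [[0.5, 0.4, 0.1]])` averages the two probability vectors to (0.6, 0.3, 0.1) and returns the first class label.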
References
[1] Yanqing Fan, Chuansheng Zheng, Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study.
[2] Xia, Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
[3] Learning to read chest X-ray images from 16000+ examples using CNN.
[4] Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists.
[5] A novel transfer learning based approach for pneumonia detection in chest X-ray images.
[6] Detecting tuberculosis in chest X-ray images using convolutional neural network, 2017 IEEE International Conference on Image Processing (ICIP).
[7] Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning.
[8] CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images.
[9] COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[10] Deep learning based detection and analysis of COVID-19 on chest X-ray images.
[11] Deep Learning for Screening COVID-19 using Chest X-Ray Images, 2020 IEEE Symposium Series on Computational Intelligence (SSCI).
[12] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
[13] Transformer in Transformer.
[14] Chest XR COVID-19 detection.
[15] PyTorch: An Imperative Style, High-Performance Deep Learning Library.
[16] OpenMMLab's Image Classification Toolbox and Benchmark.
[17] SGDR: Stochastic Gradient Descent with Warm Restarts.
[18] Deep Residual Learning for Image Recognition.
[19] ImageNet: A large-scale hierarchical image database.