key: cord-0839916-ljx5tg7c
authors: Alrahhal, Maher; K P, Supreethi
title: COVID-19 Diagnostic System Using Medical Image Classification and Retrieval: A Novel Method for Image Analysis
date: 2021-05-19
journal: Comput J
DOI: 10.1093/comjnl/bxab051
sha: 6848d3800fd78d27cd39b842aa3db634aa24a422
doc_id: 839916
cord_uid: ljx5tg7c

With the rapid increase in the number of people infected with COVID-19 worldwide, and with the limited medical equipment available to detect it (testing kits), it has become necessary to provide an alternative detection method that relies mainly on Artificial Intelligence and radiographic Image Analysis. In this study, we propose a diagnosis system that detects COVID-19 using chest X-ray or computed tomography (CT) scan images; this system does not replace the reverse transcription-polymerase chain reaction (RT-PCR) test but rather complements it. The proposed system consists of the following steps: first, extracting the image features using Visual Words Fusion of ResNet-50 (a deep neural network) and Histogram of Oriented Gradient descriptors based on the Bag of Visual Words methodology; then, training an Adaptive Boosting classifier to classify the image as COVID-19 or NOTCOVID-19; and finally, retrieving the most similar images. We implemented our work on X-ray and CT scan databases, and the experimental results demonstrate the effectiveness of the proposed system. The classification accuracy was 100% for classifying the input image as X-ray or CT scan, 99.18% for classifying X-ray images as COVID-19 or NOTCOVID-19 and 97.84% for classifying CT scans as COVID-19 or NOTCOVID-19.

In the last months of 2019, COVID-19 appeared in China, specifically in Wuhan [1]. The common onset symptoms of COVID-19 patients were fever, cough, myalgia or fatigue. COVID-19 complications include severe infections of the respiratory system and acute cardiac injury; advanced cases require admission to the intensive care unit and may lead to death. The current tests are mostly based on reverse transcription-polymerase chain reaction (RT-PCR) [2]. One of the main obstacles to controlling the spread of COVID-19 is the disadvantages associated with RT-PCR, which can be summarized as follows:

• The limited efficiency of the test, as its accuracy does not exceed 71% [3, 4]. This means that an infected person may receive a negative (false) result and therefore will not receive treatment, which leads to rapid transmission of the infection among healthy people.
• The relatively long time required for the RT-PCR test, which ranges between 6 and 8 hours [5]; this hinders efforts to limit the spread of COVID-19.
• The high cost of the RT-PCR test [5].

• The shortage of medical equipment needed to perform RT-PCR tests, and thus, the inability to perform the

M. Alrahhal And K.P. Supreethi

Studies have shown that most cases of COVID-19 develop mild respiratory and constitutional symptoms such as fever, cough, dyspnea, myalgia and fatigue [6, 7]. The symptoms vary according to the degree of infection, which may be mild, moderate or severe. Studies have also shown that a chest X-ray reveals the changes in the patient's lungs 4-5 days after the onset of symptoms [8], whereas a computed tomography (CT) scan reveals these changes 2 days after the onset of symptoms [9, 10] or when symptoms begin to appear [11]. The effect of COVID-19 on the patient's lungs as seen in X-ray and CT scans is illustrated in Table 1.

These studies have led to research on the ability of radiography, with both X-ray and CT imaging, to detect suspected cases of COVID-19 [12] [13] [14]. Although X-ray images and CT scans may help in the early examination of suspected cases, the images of pneumonia caused by

. Evaluates the diagnosis of the scanned image by comparing the class of the majority of the retrieved images with the diagnosis of the image.

. Assists the specialist physician by making available the treatment protocols of the retrieved similar images.

• The diagnosis (classification) and retrieval tasks share the feature extraction step, which is the essential step in the whole work. Therefore, we propose two novel feature extraction methods, as follows:

-Proposed1 method, which relies on Visual Words Fusion of ResNet-50 and Histogram of Oriented Gradient (HOG) descriptors based on the Bag of Visual Words (BoVW) methodology; these features are used to train an Adaptive Boosting classifier and to retrieve the images most similar to the query images.
-Proposed2 method, which relies on Features Fusion of ResNet-50 and HOG descriptors based on the BoVW methodology; these features are used for the classification and retrieval tasks.

Furthermore, it is necessary to mention the following points:

• This study does not replace RT-PCR testing but rather complements it. When respiratory symptoms of the disease appear, it is preferable to use radiographic techniques based on Artificial Intelligence and Image Analysis. However, when respiratory symptoms are absent, the RT-PCR test remains the best way to detect the disease.
• The system is trained to diagnose COVID-19 using X-ray images or CT scans, and the choice of imaging method depends on multiple factors, including:

. Availability of medical imaging devices.

. The time elapsed since the onset of symptoms.

. Clinical examination of the patient; the final decision is up to the specialist physician.

• We implement our work on both types of images (X-ray and CT scan). X-ray imaging is faster, easier, cheaper and less harmful than CT, and it detects the changes in a COVID-19 patient's lungs 4 or 5 days after the onset of symptoms. CT scanning, on the other hand, shows soft tissues and internal organs more clearly and accurately than X-ray images, and it detects the changes in a lung infected with COVID-19 from the onset of symptoms. Because each imaging method has its pros and cons, we have developed COVID-19 detection methods for both types of medical images (X-ray and CT).

Many studies have addressed the use of deep learning for diagnosing COVID-19. We reviewed a few of them, summarized as follows. Xu et al. [22] started by preprocessing the CT scan images, then used a 3D CNN model to segment the images into multiple image cubes. After that, they used the ResNet-18 network for feature extraction and a Bayesian function to classify all image patches into the following types: COVID-19, Influenza-A viral pneumonia and normal. They implemented their work on CT scan images, and the database was as follows: 219 images in the COVID-19 category, 224 images in the viral pneumonia category and 175 images in the normal category.

Training deep learning models on a small dataset can easily lead to overfitting; Yang et al. [2] addressed this problem using two approaches: transfer learning and data augmentation. The authors used a large collection of chest X-ray images to pretrain a DenseNet and then fine-tuned this pretrained network on the COVID-19 dataset. They implemented their work on CT scan images, and the database was as follows: 275 images in the COVID-19 category and 195 images in the normal category.

Wang et al. [15] modified the typical Inception network in the last fully connected (FC) layers. They reduced the

The authors of [23] proposed a method that consists of three main steps. First, they extracted the main regions of the lungs and filled the blank of the lung segmentation with the lung itself to avoid noise caused by different lung contours. Second, they designed a Details Relation Extraction neural network (DRE-Net) to extract the top-K details in the CT images and obtain image-level predictions. Third, the image-level predictions were aggregated to achieve patient-level diagnoses. The database was as follows: 88 images in the COVID-19 category, 101 images in the viral pneumonia category and 86 images in the normal category.

Zheng et al. [24] proposed DeCoVNet, which consists of three stages: the first is a network stem, the second is composed of two 3D residual blocks (ResBlocks), and the third is a progressive classifier (ProClf), which contains three 3D convolution layers and an FC layer with the softmax activation function. They implemented their work on CT scan images, and the database was as follows: 313 images in the COVID-19 category and 229 images in the normal category.

Li et al. [25] proposed the COVNet model for COVID-19 detection, which uses ResNet-50 as the backbone; it takes a series of CT slices as input and generates features for the corresponding slices. The features extracted from all slices are then combined by a max-pooling operation, and the final feature map is fed to an FC layer with the softmax activation function to generate a probability score. The database was as follows: 1296 images in the COVID-19 category, 1736 images in the pneumonia category and 1325 images in the normal category.

Narin et al. [26] proposed COVID-19 detection models based on the ResNet-50, InceptionV3 and Inception-ResNetV2 networks. They implemented their work on X-ray images, and their database was as follows: 50 images in the COVID-19 category and 50 images in the normal category.

In this section, we reviewed a few works that addressed COVID-19 detection using deep learning techniques. Compared with these works, the importance of our work comes from the following factors. First, each of the reviewed works used only one type of medical image for diagnosis, whereas our work uses both types (X-ray and CT). Second, our work uses the image retrieval task to evaluate the diagnostic outcomes and support the final decision. Third, our work proposes novel feature extraction methods that combine deep learning features (ResNet-50) with handcrafted features (HOG) and use BoVW to reduce the feature dimensions.

Our proposed diagnosis system contains two main tasks, namely:

• Classification (Diagnosis) Task: This task is responsible for detecting the type of the medical image (X-ray or CT) and diagnosing whether it is infected with COVID-19 or not.
• Retrieval Task: This task is responsible for retrieving the images most similar to the scanned image by comparing the query image with the image archive (database) of previous patients, to assist the specialist physician in the treatment.

Both tasks intersect in the feature extraction step, which is the essential step in the whole work. Therefore, in this section, we discuss the proposed feature extraction methods and the ideas related to them. The architecture of the diagnosis system is illustrated in Figure 1.

The proposed method that relies on visual words fusion of ResNet-50 and HOG descriptors based on BoVW (Proposed1) consists of several steps, summarized in Figure 2, as follows:

Step1: Randomly choose 40% of the images from each class in the database to build the dictionary.

Step2: Preprocess the images to enhance their intensity, because this type of image has a narrow range of intensity levels.

Step3: Extract the feature vectors from the enhanced image after resizing it to 224 × 224, using:

• The 'avg_pool' layer of ResNet-50 [27]; we refer to this feature vector as $f_{ResNet50}$.
• The HOG method [28]; we refer to this feature vector as $f_{HOG}$.

Step4: To compact the feature vectors, we apply K-Means++ clustering [29] (a variant of K-Means with a smarter centroid initialization) separately to each type of feature vector ($f_{ResNet50}$, $f_{HOG}$); the cluster centers compose the visual words. The set of visual words constitutes the vocabulary (sometimes called the dictionary or bag), formed by merging the visual words as in Equations (1) and (2):

$$VOC_{ResNet50} = \{VW^{ResNet50}_1, VW^{ResNet50}_2, \ldots, VW^{ResNet50}_N\} \quad (1)$$

$$VOC_{HOG} = \{VW^{HOG}_1, VW^{HOG}_2, \ldots, VW^{HOG}_M\} \quad (2)$$

where $VW^{ResNet50}_1$ refers to the first visual word (first cluster center) resulting from applying K-Means++ to $f_{ResNet50}$, $N$ is the size of the vocabulary $VOC_{ResNet50}$ and $M$ is the size of $VOC_{HOG}$.
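The vocabulary construction in Step 4 can be sketched as follows. This is a minimal NumPy illustration of k-means++ seeding followed by ordinary Lloyd iterations, not the authors' MATLAB implementation. It assumes that each image contributes several local descriptors (one per row), which the text does not specify; the names `kmeans_pp_init` and `build_vocabulary` are hypothetical.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: the first centre is drawn uniformly at random; each
    further centre is drawn with probability proportional to its squared
    distance (D^2) from the nearest centre chosen so far."""
    n = X.shape[0]
    first = rng.integers(n)
    centres = [X[first]]
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        # Fall back to uniform sampling if every point coincides with a centre.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centres.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centres)


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors (one per row) into k visual words with k-means++
    seeding and standard Lloyd iterations; returns a (k, dim) array."""
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=float)
    C = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every descriptor.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                C[j] = members.mean(axis=0)
    return C
```

Under this assumption, Equations (1) and (2) would correspond to two separate calls, one on the ResNet-50 descriptors with k = N and one on the HOG descriptors with k = M. The assignment step materializes an n × k × dim array, so for 2048-dimensional ResNet-50 features a library implementation such as scikit-learn's `KMeans(init="k-means++")` is the more practical choice.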

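Step5: Fuse both vocabularies $VOC_{ResNet50}$ and $VOC_{HOG}$ by concatenating $VOC_{ResNet50}$ with $VOC_{HOG}$ vertically, as represented in Equation (3):

$$VOC_{fusion} = \begin{bmatrix} VOC_{ResNet50} \\ VOC_{HOG} \end{bmatrix} \quad (3)$$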

After building the vocabulary (dictionary) using Steps 1-5, feature extraction for a query image starts with Step 2 (preprocessing) and Step 3 (feature extraction using ResNet-50 and HOG); the outputs of Step 3 are then assigned to the visual words of $VOC_{fusion}$ to build two histograms, the first related to $f_{ResNet50}$ and the second related to $f_{HOG}$. Finally, the two histograms are concatenated to form the final feature vector used in the classification and retrieval tasks.
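The query-encoding step of Proposed1 can be sketched as below. The sketch assumes hard assignment of each local descriptor to its nearest visual word (Euclidean distance) and L1-normalized histograms; neither choice is stated in the text, and `bovw_histogram` and `proposed1_feature` are hypothetical names. Each descriptor type is encoded only against its own part of the fused vocabulary, so the result has length N + M.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest visual word (Euclidean)
    and return an L1-normalised histogram of word counts."""
    D = np.asarray(descriptors, dtype=float)
    V = np.asarray(vocabulary, dtype=float)
    d2 = ((D[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(V)).astype(float)
    return hist / hist.sum()


def proposed1_feature(resnet_desc, hog_desc, voc_resnet, voc_hog):
    """Visual-words fusion: one histogram per descriptor type, each computed
    against its own vocabulary, concatenated into an (N + M)-length vector."""
    h_resnet = bovw_histogram(resnet_desc, voc_resnet)
    h_hog = bovw_histogram(hog_desc, voc_hog)
    return np.concatenate([h_resnet, h_hog])
```

Because each half is normalized separately in this sketch, the ResNet-50 and HOG parts contribute equally to the fused vector regardless of how many descriptors each produced.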

The proposed method that relies on features fusion of ResNet-50 and HOG descriptors based on BoVW (Proposed2) consists of several steps, summarized in Figure 3, as follows:

Step1: Randomly choose 40% of the images from each class in the database to build the dictionary.

Step2 : Preprocessing step.

Step3: Extract the feature vectors from the enhanced image after resizing it to 224 × 224, using:

• The 'avg_pool' layer of ResNet-50, to get $f_{ResNet50}$.
• The HOG method, to get $f_{HOG}$.

Step4: Fuse the features by concatenating $f_{ResNet50}$ and $f_{HOG}$ to get $fv_{fusion}$.

Step5: To compact the feature vectors, we apply the K-Means++ clustering algorithm to $fv_{fusion}$; the cluster centers compose the visual words. The set of visual words constitutes the vocabulary, formed by merging the visual words as in Equation (4):

$$VOC_{fv_{fusion}} = \{VW^{fv_{fusion}}_1, VW^{fv_{fusion}}_2, \ldots, VW^{fv_{fusion}}_N\} \quad (4)$$

where $VW^{fv_{fusion}}_1$ refers to the first visual word (first cluster center) resulting from applying K-Means++ to $fv_{fusion}$, and $N$ is the size of the vocabulary $VOC_{fv_{fusion}}$.

After building the vocabulary using Steps 1-5, feature extraction for a query image starts with Step 2 (preprocessing), Step 3 (feature extraction using ResNet-50 and HOG) and Step 4 (feature fusion); the fused features are then assigned to the visual words of $VOC_{fv_{fusion}}$ to build a histogram, which is the final feature vector.

Section C: Computational Intelligence, Machine Learning and Data Analytics

The Computer Journal, Vol. 00 No. 00, 2021
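For contrast with Proposed1, Proposed2 fuses the descriptors before clustering. The sketch below assumes that the ResNet-50 and HOG descriptors of an image come in aligned pairs (the same number of rows), so they can be concatenated row by row into $fv_{fusion}$; this pairing, the hard assignment and the L1 normalization are assumptions, and `proposed2_feature` is a hypothetical name.

```python
import numpy as np

def proposed2_feature(resnet_desc, hog_desc, vocabulary):
    """Feature fusion: concatenate ResNet-50 and HOG descriptors row by row
    into fv_fusion, then encode them against a single vocabulary."""
    fused = np.hstack([np.asarray(resnet_desc, dtype=float),
                       np.asarray(hog_desc, dtype=float)])
    V = np.asarray(vocabulary, dtype=float)
    # Nearest visual word for every fused descriptor.
    d2 = ((fused[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(V)).astype(float)
    return hist / hist.sum()  # final feature vector of length N
```

One practical caveat: ResNet-50 activations and HOG values live on different scales, so one might standardize each part before concatenation; the text does not say whether this was done.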

The preprocessing step is essential in a medical system; it aims to enhance the image in terms of contrast and brightness, remove noise and adjust the size. In our proposed methods, the preprocessing consists of two steps, as illustrated in Figure 4:

• High-frequency emphasis filter (HFEF) for sharpening an image by emphasizing the edges.
• Histogram equalization for maximizing the image contrast and mapping the lowest and highest intensity pixels to 0 and 1, respectively.

In more detail, the preprocessing includes the following steps:

1. Convert the input image to the frequency domain by applying the Discrete Fourier Transform.
2. Apply the HFEF with a Gaussian high-pass filter (GHPF) to the transformed image using Equations (5) and (6):

$$H_{HFEF}(u,v) = k_1 + k_2 \, H_{GHPF}(u,v) \quad (5)$$

$$H_{GHPF}(u,v) = 1 - \exp\left(-\frac{D^2(u,v)}{2D_0^2}\right) \quad (6)$$

where $D(u,v)$ is the distance from point $(u,v)$ to the center of the frequency rectangle, $k_1$ offsets the value of the transfer function $H_{GHPF}$, $k_2$ controls the contribution of high frequencies, and $D_0$ is the cutoff frequency. In our work, we used $k_1 = 0.5$ and $k_2 = 0.75$.
3. Perform histogram equalization [30] to improve the image's contrast by spreading out the histogram.
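The two preprocessing steps can be sketched in NumPy as follows, using the standard high-frequency-emphasis form H = k1 + k2 · H_GHPF with k1 = 0.5 and k2 = 0.75. The cutoff D0 is not reported, so the default `d0=30.0` is a hypothetical placeholder; taking the real part of the inverse DFT and equalizing over 256 quantized levels are also choices made for this sketch, and all function names are hypothetical.

```python
import numpy as np

def hfef_gaussian(img, d0, k1=0.5, k2=0.75):
    """High-frequency emphasis filtering with a Gaussian high-pass filter."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))          # centred spectrum
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2         # squared distance from centre
    h_ghpf = 1.0 - np.exp(-D2 / (2.0 * d0 ** 2))
    h_hfef = k1 + k2 * h_ghpf
    out = np.fft.ifft2(np.fft.ifftshift(F * h_hfef))
    return np.real(out)


def histogram_equalize(img, levels=256):
    """Map intensities through the normalised cumulative histogram so that the
    lowest and highest occupied levels become 0 and 1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    q = np.round((img - lo) / (hi - lo) * (levels - 1)).astype(int)  # quantise
    cdf = np.cumsum(np.bincount(q.ravel(), minlength=levels)).astype(float)
    cdf_min = cdf[cdf > 0][0]
    lut = (cdf - cdf_min) / (cdf[-1] - cdf_min)
    return lut[q]


def preprocess(img, d0=30.0):
    """HFEF sharpening followed by histogram equalization."""
    return histogram_equalize(hfef_gaussian(img, d0))
```

At zero frequency $H_{GHPF} = 0$, so the filter keeps only the fraction k1 of the image's mean intensity while boosting edges; the equalization step then restores the full [0, 1] range.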

ResNet-50 [27] is a deep neural network that consists of 50 layers and more than 23 million parameters and takes a 224 × 224 color image as input. The network was trained on the well-known ImageNet database [31], which includes more than 1.3 million images distributed over 1000 classes; therefore, it has learned a very rich set of visual features. Neural networks tend to become deeper so that they can extract, analyze and represent the smallest details in the image, thereby increasing the network's classification accuracy. However, increasing the depth of the network causes its accuracy to saturate and then often to degrade. To solve this issue, ResNet introduces a new structural technique, Residual Learning, from which ResNet-50 takes its name. Residual learning is achieved by skip connections [32], which are extra connections between nodes in different layers of a neural network that skip one or more layers of nonlinear processing.
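The effect of a skip connection can be illustrated with a toy fully connected residual block, y = relu(F(x) + x). This is a simplified sketch: the real ResNet-50 uses convolutional bottleneck blocks with batch normalization and projection shortcuts when dimensions change, and the names below are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x);
    the '+ x' term is the identity skip connection."""
    return relu(w2 @ relu(w1 @ x) + x)


def plain_block(x, w1, w2):
    """The same two layers without the skip connection, for comparison."""
    return relu(w2 @ relu(w1 @ x))
```

With all weights at zero, a stack of residual blocks passes a non-negative input through unchanged, whereas the plain blocks collapse it to zero. Extra residual layers can therefore default to the identity mapping instead of degrading the signal, which is the intuition behind the remedy for the accuracy degradation described above.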

To train the network, we divided the image databases into 80% for training and 20% for testing. MATLAB R2018b was used to train the model, and all experiments were performed on a desktop computer with an Intel Core i7 processor and 8 GB of RAM running Windows 7. Regarding the training options, we used the stochastic gradient descent with momentum optimizer. The batch size, learning rate and number of epochs were experimentally set to 20, 1e-4 and 10, respectively, and the training data were shuffled before each training epoch. We performed data augmentation using geometric transformations applied to the original images, such as cropping, rotation, reflection around the X (Y) axis, horizontal (vertical) scaling, shearing and translation. The aim of data augmentation is to increase the number of training samples, prevent overfitting in neural networks [33, 34] and improve performance in imbalanced class problems [35].
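Two of the listed geometric transformations, reflection about either axis and translation with zero fill, can be sketched with plain NumPy as below. Cropping, rotation, scaling and shearing are omitted; `translate`, `augment` and the `max_shift` default are hypothetical, and this is not the MATLAB augmentation pipeline used in the paper.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift the image by (dy, dx) pixels, filling uncovered pixels with zeros.
    Assumes |dy| < height and |dx| < width."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[yd, xd] = img[ys, xs]
    return out


def augment(img, rng, max_shift=10):
    """Random reflections about the Y and X axes followed by a random shift."""
    if rng.random() < 0.5:
        img = img[:, ::-1]   # reflection about the vertical (Y) axis
    if rng.random() < 0.5:
        img = img[::-1, :]   # reflection about the horizontal (X) axis
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return translate(img, int(dy), int(dx))
```

Because each call draws new random parameters, applying `augment` once per epoch presents the network with a slightly different version of every training image.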

HOG feature extraction [28] involves the following steps:

• Convolve the image with two filters that are sensitive to horizontal and vertical brightness gradients. These filters capture edge, contour and texture information.
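The gradient step above, together with the orientation-binning step that follows it in the usual HOG pipeline [28], can be sketched as below. This simplified version uses [-1, 0, 1] derivative filters, 9 unsigned orientation bins and 8 × 8 pixel cells, but omits block normalization and bin interpolation, so it will not reproduce the output of a full implementation such as MATLAB's `extractHOGFeatures`; `simple_hog` is a hypothetical name.

```python
import numpy as np

def simple_hog(img, cell=8, bins=9):
    """Simplified HOG: magnitude-weighted histograms of unsigned gradient
    orientations (0-180 degrees) per cell, concatenated into one vector."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal [-1, 0, 1] filter
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical [-1, 0, 1]^T filter
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=bins)
    return hist.ravel()
```

For a vertical step edge, all gradient energy lands in the 0-degree bin of the cells that contain the edge, which is the edge and contour information the filters are meant to capture.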

First, the patient (who wants to know whether he or she has COVID-19) is clinically examined by the specialist physician to determine the symptoms and the appropriate radiographic method. If the person does not have any respiratory symptoms, the RT-PCR test remains the best method. When respiratory symptoms of the disease are present, it is helpful to use Artificial Intelligence and Image Analysis to detect COVID-19, because this reduces the need for RT-PCR tests and, as shown in this study, is faster, cheaper and more accurate than the RT-PCR test.

Regarding the image databases, we dealt with the two types of images (chest X-ray and CT) as follows:

• The chest X-ray image is classified into one of three types (NORMAL, PNEUMONIA or COVID-19) using two classification levels, as follows:
• The first level aims to classify the input image into

The conditional flow chart of the classification task is shown in Figure 5.

In this work, we used an Adaptive Boosting-based ensemble classifier [36], which builds a strong classifier by learning from the mistakes of the weak learners in the ensemble; that is, a number of iterations (weak classifiers) are combined to create a strong classifier, and each iteration updates the sample weights by comparing the predicted labels with the actual labels of the training data. Suppose we want to build an ensemble classifier that has three weak classifiers (WC1, WC2 and WC3) based on the decision tree algorithm. This classifier works as shown in Figure 6.
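The weighting scheme can be made concrete with discrete AdaBoost over decision stumps (one-level decision trees), run for three rounds to mirror WC1-WC3. This is a textbook sketch with labels in {-1, +1}, not the MATLAB ensemble configuration used in the paper; all function names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, t, pol):
    """Predict +1 where pol * (x_j - t) >= 0, otherwise -1."""
    return np.where(pol * (X[:, j] - t) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively find the single-feature threshold rule with the lowest
    weighted error; returns (error, feature, threshold, polarity)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, t, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost_fit(X, y, n_rounds=3):
    """Each round fits a weak learner on re-weighted data, then increases
    the weights of the samples it misclassified."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this learner
        pred = stump_predict(X, j, t, pol)
        w = w * np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * stump_predict(X, j, t, pol) for a, j, t, pol in ensemble)
    return np.where(score >= 0, 1, -1)
```

On one-dimensional toy data with positives at both ends and negatives in the middle (x = 1..6, y = +1, +1, -1, -1, +1, +1), no single stump is correct everywhere, but the weighted vote of three stumps classifies every point correctly.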

For the classification task, we used four different databases, each divided into 80% for training and 20% for testing. The databases are described as follows:

We used this database to train the classifier to classify the input image as X-ray or CT scan. The database consists of the following classes:

• Chest CT scan class, which consists of 300 images selected randomly from [37].
• Chest X-ray class, which consists of 300 images selected randomly from [38].

Regarding X-ray classification, we have two classification levels with two different databases. The aim is to classify chest X-ray images into one of three classes (NORMAL, PNEUMONIA and COVID-19). The first classification level classifies the chest X-ray image as COVID-19 or NOTCOVID-19; therefore, the image database here contains the following classes:

• COVID-19 class, which contains 225 images available in [38].
• NOTCOVID-19 class, which contains 1000 images selected from the Kaggle repository [39]: 500 NORMAL images and 500 PNEUMONIA images.

If the chest X-ray image is classified as NOTCOVID-19, we proceed to the second level to classify it as either NORMAL or PNEUMONIA. This is a binary classification, and the image database consists of the following classes:

We used this database to enable the system to classify CT scan images as COVID-19 or not (i.e. binary classification). Therefore, it consists of the following two classes:

• COVID-19 class, which contains 350 images.
• NOTCOVID-19 class, which contains 350 images.

These images were obtained from the open-source GitHub repository [37]. The details of the four databases are given in Table 2.
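The conditional flow that ties the four databases together can be written as a small routing function. The sketch assumes one trained classifier per database, passed in as callables that return string labels; these callables and the label strings (such as "XRAY" and "CT") are hypothetical stand-ins for the trained AdaBoost models.

```python
from typing import Any, Callable

Classifier = Callable[[Any], str]


def diagnose(image: Any, modality_clf: Classifier, xray_covid_clf: Classifier,
             xray_pneumonia_clf: Classifier, ct_covid_clf: Classifier) -> str:
    """Route an image through the classification levels described above."""
    modality = modality_clf(image)          # first database: "XRAY" or "CT"
    if modality == "CT":
        return ct_covid_clf(image)          # fourth database: COVID-19 / NOTCOVID-19
    label = xray_covid_clf(image)           # second database: first X-ray level
    if label == "COVID-19":
        return label
    return xray_pneumonia_clf(image)        # third database: NORMAL / PNEUMONIA
```

Only images that the first X-ray level labels NOTCOVID-19 reach the NORMAL/PNEUMONIA classifier, so each trained model sees only the kind of input it was trained on.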

The main performance measure [40] [41] [42] in the classification task is the confusion matrix, which is a table with four different combinations of predicted and actual values, as follows:

• True positive (TP), i.e. the image is classified as COVID-19, and it is actually COVID-19.
• True negative (TN), i.e. the image is classified as NOTCOVID-19, and it is actually NOTCOVID-19.
• False positive (FP), i.e. the image is classified as COVID-19; however, it is NOTCOVID-19 (known as a type I error).
• False negative (FN), i.e. the image is classified as NOTCOVID-19; however, it is COVID-19 (known as a type II error).

From the confusion matrix, we can derive five important measures: Sensitivity (Recall), Specificity, Precision, F-measure and Accuracy. For all metrics, higher is better. These metrics are calculated as follows:

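• Sensitivity (also known as Recall) is calculated using Equation (7):

$$Recall = \frac{TP}{TP + FN} \quad (7)$$

• Specificity is calculated using Equation (8):

$$Specificity = \frac{TN}{TN + FP} \quad (8)$$

• Precision is calculated using Equation (9):

$$Precision = \frac{TP}{TP + FP} \quad (9)$$

• F-measure is calculated using Equation (10):

$$F\text{-}measure = \frac{2 \times Recall \times Precision}{Recall + Precision} \quad (10)$$

• Accuracy is calculated using Equation (11):

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (11)$$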
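The five measures can be computed directly from the four confusion-matrix counts. The sketch below is a straightforward transcription of Equations (7)-(11); the function name and the example counts are hypothetical and not taken from the paper's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, precision, F-measure and accuracy
    from confusion-matrix counts, following Equations (7)-(11)."""
    recall = tp / (tp + fn)                                   # Eq. (7)
    specificity = tn / (tn + fp)                              # Eq. (8)
    precision = tp / (tp + fp)                                # Eq. (9)
    f_measure = 2 * recall * precision / (recall + precision)  # Eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (11)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

For these counts, recall is 0.8, specificity 0.9 and accuracy 0.85. On an imbalanced database, a classifier that favors the majority class can still score high accuracy while its recall for COVID-19 drops, which is why all five measures are reported.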

The vocabulary (dictionary) size is an important parameter in our proposed methods. Increasing the vocabulary size improves the performance of the system up to a certain level; beyond that level, larger vocabularies lead to overfitting, which decreases the performance. In this experiment, we studied the effect of different vocabulary sizes on the classification accuracy of the proposed methods. For the X-ray (first and second classification levels) and CT databases, we used vocabulary sizes of 50, 200, 400, 600, 800, 1000 and 1200 to find where the proposed techniques generate the best results.

According to the experimental results shown in Figure 7a-c, we conclude the following:

• The classification accuracy of the Proposed1 method increases from a vocabulary size of 50 up to 600 and then decreases because of overfitting. Note that in the Proposed1 method we fuse two vocabularies (the first for ResNet-50 and the second for HOG), and the vocabulary sizes shown for Proposed1 in Figure 7a-c are the size of each vocabulary. Therefore, the feature vector generated with a vocabulary size of x has length 2x.
• Similarly, the classification accuracy of the Proposed2 method increases from a vocabulary size of 50 up to 1000 and then decreases. As illustrated in Figure 7a-c, Proposed2 performs better than Proposed1 at a vocabulary size of 1000, owing to its more compact representation of the visual content of the images: its feature vector is half the length of that of the visual words fusion technique (Proposed1).
• The Proposed1 method performs better than the Proposed2 method for almost all vocabulary sizes on the X-ray (first and second levels) and CT scan databases. The best accuracies of the Proposed1 method were obtained with a vocabulary size of 600, as follows:

. 99.18% for the X-ray database (first classification level).
. 98.50% for the X-ray database (second classification level).
. 97.86% for the CT scan database.

• The best accuracies of the Proposed2 method were obtained with a vocabulary size of 1000, as follows:

. 98.78% for the X-ray database (first classification level).
. 96.5% for the X-ray database (second classification level).
. 96.43% for the CT scan database.

For all classification tasks, we used our proposed methods for feature extraction and the Adaptive Boosting ensemble classifier for classification. As concluded in the previous experiment, the best classification accuracy of the visual words fusion of ResNet-50 and HOG descriptors based on BoVW (Proposed1) was obtained with a vocabulary size of 600, whereas the best classification accuracy of the features fusion of ResNet-50 and HOG descriptors based on BoVW (Proposed2) was obtained with a vocabulary size of 1000.

In this experiment, on the X-ray (first and second classification levels) and CT databases, we compare our proposed methods (with a vocabulary size of 600 for Proposed1 and 1000 for Proposed2) in terms of sensitivity, specificity, precision, F-measure and accuracy with the deep learning-based COVID-19 detection methods in [2, 15, 22].

We used a vocabulary size of 600 for both ResNet-50 + BoVW and HOG + BoVW because it appears to be a good trade-off between improving the system's performance and avoiding overfitting. The experimental results shown in Figure 8a-c show that our proposed methods, which use both deep learning and handcrafted features (Proposed1 and Proposed2), improve the classification performance compared with standalone ResNet-50 based on BoVW, standalone HOG descriptors based on BoVW and other recent methods. The Proposed1 method gives the best results on these databases, as follows:

• The accuracy of classifying X-ray images as COVID-19 or NOTCOVID-19 is 99.18% (first level of X-ray classification; second database).
• The accuracy of classifying X-ray images as NORMAL or PNEUMONIA is 98.5% (second level of X-ray classification; third database).
• The accuracy of classifying CT scan images as COVID-19 or NOTCOVID-19 is 97.86% (fourth database).

The classification accuracies of the Proposed2 method for the second, third and fourth databases are 98.78%, 96.50% and 96.43%, respectively. The sensitivity, specificity, precision, F-measure and accuracy for the second, third and fourth databases are illustrated in Figure 8a-c.

• In the first database, we used only 600 chest images: 300 X-ray images and 300 CT scan images. We obtained 100% accuracy for both Proposed1 (vocabulary size 600) and Proposed2 (vocabulary size 1000), and the accuracy remained constant as the database size increased. The compared methods also achieved 100% accuracy.
• This system will be deployed on a central computer (server); thus, differentiating between a chest X-ray and a chest CT image is an important primary task. After the radiographic image is taken, it is sent to the server, which processes it and sends the results back to the sender.

• All the performance criteria used in the classification tasks are important because our X-ray database is imbalanced. Therefore, we did not rely only on accuracy but also on the other criteria to demonstrate the value of the work.
• The COVID-19 images used in our work undoubtedly do not cover all cases of the disease, due to its novelty. However, our work can be considered a smart prototype that was tested on the available images with promising results; given a large image database that includes all cases of COVID-19, we expect to obtain satisfactory results.
• This system provides the diagnosis result in a few seconds (depending on the system specifications). Moreover, it only needs a radiographic scan, which is cheaper than the RT-PCR test, and the accuracy of this system is higher than that of the RT-PCR test.

COVID-19 detection is a critical application, and because diagnosing COVID-19 using Artificial Intelligence and Image Analysis is not 100% accurate, a supporting mechanism is needed to detect wrong diagnoses (misclassifications). This is why the second task is needed: it retrieves the most similar images from the database based on a similarity measure. For example, if we retrieve 10 similar images from the database, the majority of the retrieved images should belong to the class of the query image, which allows us to verify the output of the classifier. Moreover, we assume that each image stored in the database has a medical report, which facilitates the diagnosis of the studied case.

The image retrieval task consists of off-line and on-line phases. The off-line phase [43] is defined as extracting the features from all images in the database to build a features database, whereas in the on-line phase, we extract the features from the query image and measure the similarity distance between the query features and the features database. In our work, we used the Euclidean distance as the similarity measure [44], defined in Equation (12):

$$Dist(Q, DB_j) = \sqrt{\sum_{i} \left(f^{DB}_{ji} - f^{Q}_{i}\right)^2} \quad (12)$$

where $f^{DB}_{ji}$ is the $i$th feature of the $j$th image in the database, and $f^{Q}_{i}$ is the $i$th feature of the query image.
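The on-line phase and the majority-vote check described above can be sketched in plain Python. The sketch assumes the off-line phase has already produced `db_feats` (one feature vector per database image) and `db_labels`; the function names and the k = 10 default are illustrative, and ties in distance are broken by database order.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors (Equation (12))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_feat, db_feats, k=10):
    """On-line phase: rank the precomputed database features by distance
    to the query and return the indices of the k nearest images."""
    order = sorted(range(len(db_feats)),
                   key=lambda j: euclidean(query_feat, db_feats[j]))
    return order[:k]


def verify_diagnosis(predicted_label, query_feat, db_feats, db_labels, k=10):
    """Compare the classifier's label with the majority class of the
    k retrieved images; returns (majority_label, agrees)."""
    top = retrieve(query_feat, db_feats, k)
    majority, _ = Counter(db_labels[j] for j in top).most_common(1)[0]
    return majority, majority == predicted_label
```

A disagreement between the classifier's output and the retrieved majority does not decide which one is right; it flags the case for the specialist physician, who can also consult the medical reports attached to the retrieved images.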

The performance measures used to evaluate the image retrieval task [44] are Precision (number of similar images when we

$$mAP = \frac{1}{N}\sum_{i=1}^{N} \text{Average precision}_i \quad (15)$$

where $N$ is the number of images in the database, each used in turn as a query. We obtained the mAP measure by calculating the precision for the top 10, top 20, ..., top 100 retrieved images and taking their average; this procedure is repeated for all images in the database.
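The mAP procedure described above can be sketched as follows. Because Equations (13) and (14) are not recoverable here, the sketch assumes the standard definitions: precision@k is the fraction of the top k retrieved images that share the query's class, and recall@k is the number of those hits divided by the number of database images in that class. It also assumes that the query image is excluded from its own ranking; these choices and all function names are assumptions.

```python
import math


def ranked_labels(q, feats, labels):
    """Labels of all other database images, sorted by Euclidean distance to image q."""
    others = [j for j in range(len(feats)) if j != q]
    others.sort(key=lambda j: math.dist(feats[q], feats[j]))
    return [labels[j] for j in others]


def precision_recall_at(ranked, query_label, k):
    """Precision and recall over the top k retrieved images."""
    relevant_total = ranked.count(query_label)
    hits = sum(1 for lab in ranked[:k] if lab == query_label)
    return hits / k, (hits / relevant_total if relevant_total else 0.0)


def mean_average_precision(feats, labels, cutoffs=range(10, 101, 10)):
    """AP_i = mean of precision@k over the cutoffs for query i;
    mAP = mean of AP_i with every database image used once as the query."""
    aps = []
    for q in range(len(feats)):
        ranked = ranked_labels(q, feats, labels)
        precs = [precision_recall_at(ranked, labels[q], k)[0] for k in cutoffs]
        aps.append(sum(precs) / len(precs))
    return sum(aps) / len(aps)
```

With the default cutoffs this reproduces the top 10, top 20, ..., top 100 averaging; the same `precision_recall_at` values at k = 10, 20, ..., 100 give the points of a P-R curve.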

We used the following databases to evaluate the performance of the retrieval task:

• X-ray database, which consists of three classes as the following:

-COVID-19 class: It contains 225 images taken from the second classification database.
-NORMAL class: It contains 500 images taken from the third classification database.
-PNEUMONIA class: It contains 500 images taken also from the third classification database.

• CT scan database, which consists of two classes as the following:

-COVID-19 class: It contains 350 images taken from the fourth classification database.

-NOTCOVID-19 class: It contains 350 images taken also from the fourth classification database.

We have evaluated the performance of the image retrieval task in terms of mAP on different vocabulary sizes (i.e. 50, 200, 400, 600, 800, 1000 and 1200) for both databases, using the following techniques:

• Visual words fusion of ResNet-50 and HOG descriptors based on the BoVW methodology (Proposed1).

Figure 10. P-R curves of our proposed methods and other methods using Euclidean distance for (a) X-ray and (b) CT scan databases.

According to the experimental results shown in Figure 9a and b, the experimental analysis is as follows:

• The best performance of the Proposed2 method in terms of mAP (85.35% for the X-ray database and 86.144% for the CT scan database) was obtained with a vocabulary size of 1000.

We conclude from the previous experiment that, in terms of mAP, Proposed1 performs best with a vocabulary size of 600 and Proposed2 performs best with a vocabulary size of 1000, for both databases.
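To make the role of the vocabulary size concrete, the sketch below shows a generic BoVW encoding: each local descriptor is assigned to its nearest visual word, and the image is represented by a histogram with one bin per word, so a 600-word vocabulary yields a 600-bin vector. This is a hypothetical illustration, not the authors' pipeline: the vocabulary is passed in ready-made (in practice it would typically be learned by clustering, e.g. k-means++), the L1 normalization is our assumption, and `visual_words_fusion` reflects only one plausible reading of "visual words fusion" (concatenating the ResNet-50 and HOG histograms).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Index of the closest visual word (cluster centre) by Euclidean distance."""
    return min(range(len(vocabulary)),
               key=lambda j: math.dist(descriptor, vocabulary[j]))

def bovw_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """Assign each local descriptor to a visual word and count; one bin per word."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    # L1 normalization is an assumption; it makes images with different
    # numbers of descriptors comparable.
    return [h / total for h in hist] if total else hist

def visual_words_fusion(cnn_desc: List[Vector], cnn_vocab: List[Vector],
                        hog_desc: List[Vector], hog_vocab: List[Vector]) -> List[float]:
    """Hypothetical fusion: concatenate the per-descriptor-type BoVW histograms."""
    return bovw_histogram(cnn_desc, cnn_vocab) + bovw_histogram(hog_desc, hog_vocab)

# Toy 2-word vocabulary; the experiments above use sizes such as 600 or 1000.
vocab = [[0.0, 0.0], [10.0, 10.0]]
print(bovw_histogram([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]], vocab))
```

A larger vocabulary gives finer-grained visual words but sparser histograms, which is one reason retrieval quality can peak at an intermediate size rather than growing monotonically.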

In this experiment, we measure the performance of our proposed methods at specific vocabulary sizes (600 for Proposed1 and 1000 for Proposed2) in terms of precision, recall and mAP. For both databases, we calculated precision (top 10 images), recall (top 100 images) and mAP, and we compared our proposed methods with the following:

• Deep learning based methods reported in [2, 15, 22].

• Standalone ResNet-50 features based on the BoVW methodology with a vocabulary size of 600.

• Standalone HOG features based on the BoVW methodology with a vocabulary size of 600.

The comparisons of our proposed methods with the aforementioned methods in terms of precision, recall and mAP for both databases are shown in Table 3. Moreover, Figure 10 shows the relationship between precision and recall through Precision-Recall (P-R) curves for both databases. The first point of each curve corresponds to the precision (top 10), the last point corresponds to the recall (top 100), and the points between them give the P-R values for 20, 30, …, 90 retrieved images.
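The points of such a P-R curve can be computed per query at 10, 20, …, 100 retrieved images and then averaged over queries, as sketched below. This is an illustration under an assumption we state loudly: recall at k is taken to be the number of relevant images in the top k divided by the number of images of the query's class in the database (e.g. 350 for a CT scan class); the exact recall definition behind the reported values is not recoverable from this section. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

CUTOFFS = tuple(range(10, 101, 10))  # 10, 20, ..., 100 retrieved images

def pr_points(relevant: Sequence[bool], n_relevant_in_db: int,
              cutoffs: Sequence[int] = CUTOFFS) -> List[Tuple[float, float]]:
    """(precision, recall) of one query at each cutoff k."""
    if len(relevant) < max(cutoffs):
        raise ValueError("ranked list is shorter than the largest cutoff")
    points = []
    for k in cutoffs:
        hits = sum(relevant[:k])  # relevant images among the top k
        # Assumed recall denominator: class size in the database.
        points.append((hits / k, hits / n_relevant_in_db))
    return points

def mean_pr_curve(per_query: List[Sequence[bool]], n_relevant: List[int],
                  cutoffs: Sequence[int] = CUTOFFS) -> List[Tuple[float, float]]:
    """Average the per-query (precision, recall) points over all queries."""
    curves = [pr_points(r, n, cutoffs) for r, n in zip(per_query, n_relevant)]
    return [(sum(c[i][0] for c in curves) / len(curves),
             sum(c[i][1] for c in curves) / len(curves))
            for i in range(len(cutoffs))]

# Toy query: 30 of its top 100 results are relevant; its class has 50 images.
pts = pr_points([True] * 30 + [False] * 70, n_relevant_in_db=50)
print(pts[0], pts[-1])  # (1.0, 0.2) (0.3, 0.6)
```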

From Table 3 and Figure 10 , we can conclude the following:

• Proposed1 (visual words fusion of ResNet-50 and HOG descriptors based on the BoVW methodology) performs better than Proposed2 (features fusion of ResNet-50 and HOG descriptors based on the BoVW methodology) in terms of precision, recall and mAP.

• Both proposed methods perform better than standalone ResNet-50 and standalone HOG based on the BoVW methodology.

In our study, we proposed a diagnosis system that detects COVID-19 using chest X-ray or CT scan images. This system performs two major tasks: classifying the image and retrieving similar images. We proposed two novel methods for feature extraction: the first (Proposed1) relies on Visual Words Fusion of ResNet-50 and HOG descriptors based on the BoVW methodology, and the second (Proposed2) relies on Features Fusion of ResNet-50 and HOG descriptors based on the BoVW methodology. We used these features to train the Adaptive Boosting classifier and to retrieve the images most similar to the scanned images. The results obtained in our work are very promising, and the system could be used in hospitals as a complementary method to the RT-PCR test.

In the future, we plan to extend the system to determine the stage of the infection: the system will diagnose the pathological condition and, when it is COVID-19, determine whether the patient is in the first week of the infection, the second week, and so on. Finally, we aim to make the system able to estimate the severity of COVID-19 and to suggest a therapeutic protocol appropriate to the specific degree of risk.

The data underlying this article are available in Mendeley Data at http://dx.doi.org/10.17632/h4jh5khgzy.1.

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

COVID-CT-dataset: a CT image dataset about COVID-19. arXiv:2003.13865v3

Sensitivity of chest CT for COVID-19: comparison to RT-PCR

Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases

Development and evaluation of a multiplex conventional reverse-transcription polymerase chain reaction assay for detection of common viral pathogens causing acute gastroenteritis

COVID-19 pneumonia in asymptomatic trauma patients; report of 8 cases

Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China

Diagnostic Imaging in COVID-19 Chest X-ray and CT

Dynamic evolution of COVID-19 on chest computed tomography: experience from Jiangsu Province of China

Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2

Asymptomatic novel coronavirus pneumonia patient outside Wuhan: the value of CT images in the course of the disease

The role of imaging in 2019 novel coronavirus pneumonia (COVID-19)

Chest X-ray in new coronavirus disease 2019 (COVID-19) infection: findings and correlation with clinical outcome

Diagnosing COVID-19 from chest X-ray in resource limited environment: case report

A deep learning algorithm using CT images to screen for corona virus disease (COVID-19)

Automated detection of diabetic subject using pre-trained 2D-CNN models with frequency spectrum images extracted from heart rate signals

Brain tumor detection using fusion of hand crafted and deep learning features

The skin cancer classification using deep convolutional neural network

A comparative study of deep learning architectures on melanoma detection

Detecting and classifying lesions in mammograms with deep learning

Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images

A deep learning system to screen novel coronavirus disease

Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images

A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT

Using Artificial Intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy

Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks

Deep residual learning for image recognition

Histograms of oriented gradients for human detection

k-means++: the advantages of careful seeding

Digital image processing

Image-net.org

Skip connections eliminate singularities

Best practices for convolutional neural networks applied to visual document analysis

Deep, big, simple neural nets for handwritten digit recognition

SMOTE: synthetic minority over-sampling technique

AdaBoost for feature selection, classification and its relation with SVM, a review. Phys. Procedia

COVID-19 Radiography Database (2020) Kaggle.com

Chest X-Ray Images (Pneumonia) (2020) Kaggle.com

Evaluation: from precision, recall and f-measure to ROC, informedness, markedness & correlation

Statistics notes: diagnostic tests 1: sensitivity and specificity

An introduction to ROC analysis

Content-based image retrieval using local patterns and supervised machine learning techniques

Multimedia image retrieval system by combining CNN with handcraft features in three different similarity measures
