key: cord-0959470-qhdtzelq
authors: Li, Dongguang; Li, Shaoguang
title: An artificial intelligence deep learning platform achieves high diagnostic accuracy for Covid-19 pneumonia by reading chest X-ray images
date: 2022-03-06
journal: iScience
DOI: 10.1016/j.isci.2022.104031
sha: 890c63ca25f58d87d18045cc7661038379826fe8
doc_id: 959470
cord_uid: qhdtzelq

Coronavirus disease 2019 (Covid-19) causes deadly lung infection (pneumonia). Accurate clinical diagnosis of Covid-19 is essential for guiding treatment. The Covid-19 RNA test does not reflect the clinical features and severity of the disease. Pneumonia in Covid-19 patients could be caused by non-Covid-19 organisms, and distinguishing Covid-19 pneumonia from non-Covid-19 pneumonia is critical. Chest X-ray detects pneumonia, but high diagnostic accuracy is difficult to achieve. We developed an artificial intelligence (AI) deep learning method with high diagnostic accuracy for Covid-19 pneumonia. We analyzed 10,182 chest X-ray images of healthy individuals, bacterial pneumonia and viral pneumonia (Covid-19 and non-Covid-19) to build and test AI models. Among viral pneumonia, diagnostic accuracy for Covid-19 reaches 99.95%. High diagnostic accuracy is also reached for distinguishing Covid-19 pneumonia from bacterial pneumonia (99.85% accuracy) or normal lung images (100% accuracy). Our AI models are accurate for clinical diagnosis of Covid-19 pneumonia by reading chest X-ray images alone.

Within the first year of the pandemic alone, more than 100 million people worldwide were diagnosed with Covid-19, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Lam et al., 2020; Ziegler et al., 2020), and a death rate of approximately 2% has been observed. In addition, fast-spreading SARS-CoV-2 variants have been identified worldwide (Grubaugh et al., 2020; Kirby, 2021; Korber et al., 2020; Tang et al., 2020). Although the majority of Covid-19 patients have mild symptoms and do not need specific treatment, approximately 15% of the patients end up developing severe pneumonia that

Journal Pre-proof

pre-trained network with transfer learning is typically much faster and easier than training a network from scratch. Medical image analysis and computer-assisted intervention problems have been increasingly addressed with deep-learning-based solutions (Campanella et al., 2019).

Although the available deep learning platforms are flexible, they do not provide specific functionality for medical image analysis, and their adaptation to this domain of application requires substantial implementation effort (Razavian, 2019). Consequently, there has been substantial duplication of effort, and incompatible infrastructure has been developed across many research groups (Appenzeller, 2017). Furthermore, deep learning helps to generate computational models consisting of multiple processing layers to learn representations of data with multiple levels of abstraction (LeCun et al., 2015). Compared to machine learning that learns to conduct classification tasks directly from data, deep learning learns and abstracts the relevant information automatically while the data are being processed. Within deep learning networks, the processing layers are interconnected via nodes (neurons), and each hidden layer receives information from the previous layer. Of all deep learning networks, convolutional neural networks (CNNs) are the most commonly used, because CNNs can transform a multidimensional input image into a desired output (LeCun and Bengio, 1998). In general, a CNN is composed of an input layer and an output layer with several hidden layers in between, and the most common layers are convolution, activation or ReLU (rectified linear activation function unit), and pooling.

Each layer learns to detect different features in the input data. Deep learning networks have been widely used in the artificial intelligence (AI) field for signal data classification, and we believe they could be powerful tools for analyzing X-ray images.
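To make the three layer types named above concrete, the following is a minimal pure-Python sketch of one convolution, ReLU and max-pooling pass over a toy grayscale image. The function names, the 3x3 vertical-edge kernel and the 6x6 input are illustrative assumptions; they are not the networks or filters used in this study, where filters are learned during training.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the core of a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; keeps the strongest response per window."""
    rows = len(fmap) // size
    cols = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + dr][c * size + dc]
                for dr in range(size) for dc in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical toy image: dark left half, bright right half (one vertical edge).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hypothetical hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 0.0, 1.0]] * 3

features = max_pool(relu(conv2d(image, kernel)))
```

In a trained CNN many such filters are stacked, so that later layers respond to increasingly abstract image features.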

Using deep learning, we here generate AI models that read only chest X-ray images of patients and reach nearly 100% diagnostic accuracy for Covid-19.


When developing accurate AI models for reading chest X-ray images, the quality of the images affects the outcomes. We found that sample variations were huge, posing a difficult challenge in building accurate AI models for reading chest X-ray images. A major type of sample variation was related to image collection, reflected by image darkness, contrast, size, orientation, etc. (Figure 1A). It was also common that some images contained unexpected non-human structures such as image labeling, pen marks, pictures of medical treatment devices, etc. (Figure 1B). We realized that we needed to build AI models capable of identifying and excluding those non-pneumonia variations when reading chest X-ray images. It is equally important that our AI models be able to distinguish Covid-19 pneumonia from non-Covid-19 pneumonia caused by viruses or bacteria.

As described above, we faced the difficult challenge of overcoming huge variations among chest X-ray images when generating highly accurate AI models for the diagnosis of Covid-19 pneumonia. It was clear to us that employing a single deep CNN, as other researchers often did in establishing AI models, would fail to generate an AI model that achieves high diagnostic accuracy upon reading chest X-ray images of patients. Therefore, we developed a unique voting algorithm that combines 17 CNNs and uses them as a whole to generate our AI models and optimize the fit to the data (Figure 2A), because we had previously succeeded in employing our AI models to read pathologic tissue images of patients with diffuse large B-cell lymphoma with 100% diagnostic accuracy. The architecture of the CNN used in this study comprised multiple layers including convolution, ReLU and pooling (Figure 2B). This advanced deep learning neural network approach was aimed at predicting Covid-19 disease in patients and increasing diagnostic accuracy using classification models with multiple CNNs based on deep learning (Figure 2B). We believe that the use of our voting algorithm with the combined 17 CNNs would be required for reconciling huge image variations to achieve high diagnostic accuracy in the clinic.

We reviewed a total of 10,182 chest X-ray images obtained from public datasets for healthy individuals (5,510 cases), bacterial pneumonia (2,530 cases) and viral pneumonia (non-Covid-19: 1,345 cases; Covid-19: 797 cases). We divided those X-ray images into classification groups based on the causes of pneumonia: Covid-19 virus, other viruses, bacteria and healthy (as a control) (Figure 2C). We used 80% of the X-ray images for model training and 10% of the images for model validation, with the remaining 10% of the images for model testing. We expected that our combined 17-CNN approach would help us to establish powerful AI models for reading chest X-ray images of Covid-19 pneumonia and distinguishing them from the images of other types of pneumonia with high diagnostic accuracy.
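The 80/10/10 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the split is drawn separately within each of the four groups (a stratified split, which the text does not state), and the function name `split_indices`, the class keys and the fixed seed are hypothetical.

```python
import random

# Image counts per group as reported in the text.
CLASS_COUNTS = {
    "healthy": 5510,
    "bacterial": 2530,
    "other_viral": 1345,
    "covid19": 797,
}


def split_indices(n_items, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle item indices and cut them into train / validation / test parts.

    The test part receives whatever remains after the first two cuts, so no
    image is dropped when the fractions do not divide n_items evenly.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_frac)
    n_val = int(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Assumption: one independent split per group, so every subset keeps
# roughly the same class proportions as the full dataset.
splits = {name: split_indices(count) for name, count in CLASS_COUNTS.items()}
total_images = sum(CLASS_COUNTS.values())
```

Keeping the test indices fixed and untouched during training and tuning is what allows the reported accuracy to be read as performance on unseen images.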

We took a binary classification approach using deep learning, i.e., we generally set up each comparison group as Covid-19 versus non-Covid-19 (other viruses, bacteria or healthy) with the combined 17 CNNs for classification (Figure 3A). We also recognized that training a deep CNN from scratch is computationally expensive and often requires a large amount of training data (about a few million), which is not available in any public database, including the one we used. Therefore, we used transfer learning (Figure 3A). We generated a separate AI model for each comparison group: classifier A (pneumonia vs healthy), classifier B (viruses vs bacteria), classifier C (Covid-19 vs other viruses), classifier D (Covid-19 vs bacteria) and classifier E (Covid-19 vs healthy) (Figure 3B). Specifically, we divided the X-ray images into classification groups based on the causes of pneumonia: Covid-19 virus, other viruses, bacteria and healthy (as a control). 80% of the X-ray images were used for model training and 10% of the images for model validation, with the remaining 10% of the images for model testing. Before performing the classification with multiple CNNs, each CNN was trained (or fine-tuned) with optimized parameters to achieve reasonably good performance. Those parameters include learning rate (0.001-0.0001), validation frequency (10-30), mini-batch size (16-64), maximum number of epochs (20-50), and training algorithm (sgdm, adam and rmsprop). In the end, all trained CNNs were fed into a platform where our core voting algorithm made them work together to produce the final classification results.
With a focus on specifically identifying Covid-19 pneumonia and distinguishing it from pneumonia caused by other viruses (classifier C) and bacteria (classifier D), we found that our AI models achieved 99.95% diagnostic accuracy for Covid-19 from reading chest X-ray images of virus-caused pneumonia and achieved 99.85% accuracy for Covid-19 from reading the images of pneumonia caused by Covid-19 and bacteria ( Figure 3C ).

We also generated an AI model for reading the Covid-19 and healthy images (classifier E), and the diagnostic accuracy for Covid-19 reached 100% ( Figure 3B ). These results demonstrate that our AI models provide accurate diagnosis of Covid-19 through reading chest X-ray images for clinical use.

In a practical sense, a patient with suspected lung infection requires further examination for pneumonia, for example, by X-ray. If pneumonia is present, it is beneficial to determine whether it is caused by viruses (including Covid-19) or by bacteria in order to guide proper treatment. Therefore, we generated an AI model to distinguish pneumonia from healthy chest X-ray images (classifier A) and another AI model to distinguish virus-caused pneumonia from pneumonia caused by bacteria (classifier B). The model of classifier A achieved 99.23% diagnostic accuracy for pneumonia and the model of classifier B achieved 99.06% accuracy for identifying pneumonia caused by viruses or bacteria (Figure 3B).
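The five comparison groups (classifiers A-E) can be written down as a relabeling of the four image groups into two sides per classifier. The sketch below is one possible encoding under stated assumptions: the group keys, the `CLASSIFIERS` table and the helper names are hypothetical, and treating the first-named side as label 1 is a convention chosen here, not something the text specifies. The image counts are those reported in the text.

```python
# Image counts per group as reported in the text.
CLASS_COUNTS = {
    "healthy": 5510,
    "bacterial": 2530,
    "other_viral": 1345,
    "covid19": 797,
}

# Each classifier compares a first side against a second side.
CLASSIFIERS = {
    "A": ({"covid19", "other_viral", "bacterial"}, {"healthy"}),  # pneumonia vs healthy
    "B": ({"covid19", "other_viral"}, {"bacterial"}),             # viruses vs bacteria
    "C": ({"covid19"}, {"other_viral"}),                          # Covid-19 vs other viruses
    "D": ({"covid19"}, {"bacterial"}),                            # Covid-19 vs bacteria
    "E": ({"covid19"}, {"healthy"}),                              # Covid-19 vs healthy
}


def binary_label(classifier, cause):
    """Return 1 for the first side, 0 for the second, None if the image is unused."""
    first, second = CLASSIFIERS[classifier]
    if cause in first:
        return 1
    if cause in second:
        return 0
    return None


def group_sizes(classifier):
    """Number of images available on each side of a comparison group."""
    first, second = CLASSIFIERS[classifier]
    return (
        sum(CLASS_COUNTS[c] for c in first),
        sum(CLASS_COUNTS[c] for c in second),
    )


sizes = {name: group_sizes(name) for name in CLASSIFIERS}
```

Under this encoding, an image can first be passed to classifier A, then to classifier B if pneumonia is found, and finally to classifier C or D, which mirrors the clinical workflow described above.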

We originally developed the 17-CNN deep learning platform and used it in this study. To demonstrate that our 17-CNN approach is superior to any individual CNN commonly used in the AI field, we analyzed all five classification groups using each of the 17 CNNs separately and achieved diagnostic accuracy ranging from 79% to 99%, contrasting sharply with the diagnostic accuracy of 99.06% to 100% achieved using our combined 17-CNN platform (Table 1). It is necessary to point out that when all classifiers (A-E) were analyzed by the same CNN, no individual CNN alone achieved a diagnostic accuracy greater than 99% (Table 1). By contrast, our combined 17-CNN platform achieved greater than 99% diagnostic accuracy for all classifiers (Table 1). In our study on AI-assisted pneumonia diagnosis for Covid-19 detection, we built several classifiers, and the classification outcomes of those classifiers shown in the confusion matrices also reflected the high accuracy of our combined multiple-CNN deep learning platform (Figure 4).

We should point out that accuracy only measures the number of correct predictions among the total number of predictions. Although it is a good measure of performance, it is not complete and does not work well when the cost of false negatives is high. To further evaluate our deep learning platform, we employed additional evaluation measures including precision (PPV: positive predictive value), NPV (negative predictive value), recall (sensitivity), specificity and F1 score, because these are believed to be valuable measures for performance evaluation (Powers, 2008; Tharwat, 2021). Our deep learning platform allowed us to obtain high values of precision (>99%), negative predictive value (>98%), recall (>98%), specificity (>99%) and F1 score (>98%) (Table 2).
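For reference, the evaluation measures named above follow from the four cells of a binary confusion matrix using their standard definitions. The sketch below is illustrative only: the function name `binary_metrics` is hypothetical, and the counts in the example are made-up numbers, not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures derived from a 2x2 confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives, true
    negatives and false negatives. Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)       # PPV: how many predicted positives are real
    npv = tn / (tn + fn)             # how many predicted negatives are real
    recall = tp / (tp + fn)          # sensitivity: how many real positives are found
    specificity = tn / (tn + fp)     # how many real negatives are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "npv": npv,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical test-set counts, chosen only to exercise the formulas.
example = binary_metrics(tp=78, fp=1, tn=134, fn=2)
```

Because recall depends only on true positives and false negatives, it exposes missed Covid-19 cases that a high overall accuracy on an imbalanced test set could hide.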

Besides chest X-ray, lung CT images are also taken for clinical diagnosis of Covid-19 pneumonia. Because the availability of CT examination is often limited to larger hospitals, we focused on developing our AI models using only chest X-ray images, which can be taken in medical facilities of almost any size, including small clinics and hospitals even in remote areas.

In this study, we used transfer learning which allows transferring knowledge from one domain to another by using trained weights from the previous domain. Traditionally, the weight matrices of several layers in a CNN are initially frozen while training on the secondary domain, and only the remaining layers are fine-tuned. This process works well when an overlapping region in the low-level features is shared by both domains. In our case, since the ImageNet and the COVID-19 datasets belong to non-overlapping domains, the trained weights from the ImageNet dataset were used to initialize the weights of our model, and none of them were frozen.

When multiple models are used to classify a single X-ray image, the final class is decided by majority rule (May, 1952). Majority rule is a decision rule that selects the alternative that has a majority, that is, the most votes among the models involved. The idea was introduced into this study from an election theory called approval voting.

Under approval voting, a voter indicates which candidates he or she approves. A candidate receives one point for each voter that approves of the candidate and no points for each voter that does not. For a single-winner election, the candidate with the most points wins the election. Naturally, approving of all candidates or disapproving of all candidates does not change the difference in the number of points the candidates receive. If there is an odd number of voters and no voter approves or disapproves of both candidates, then approval voting is equivalent to majority rule: each voter gives one point to the candidate that he or she prefers, and the candidate with the majority of the points wins the election. Determining a winner for a two-candidate election is easy and corresponds to a binary classification problem. It has been shown that majority rule is the only two-candidate election procedure in which each voter is treated equally, that is, only the number of votes matters, not who casts the votes; each candidate is treated equally, that is, only the number of votes that a candidate receives determines if he or she wins the election; and a candidate can never be harmed by receiving more votes, that is, if a candidate wins the election, then he or she would still win the election if some of the voters who had voted for the candidate's opponent now voted for the candidate (May, 1952).
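The two properties of approval voting used in this argument can be checked with a small sketch: with two candidates and one approval per voter the point totals are simply vote counts, and a voter who approves of both candidates leaves the margin unchanged. The function names, the candidate labels and the 10-to-7 ballot split are illustrative assumptions.

```python
def approval_points(ballots, candidates):
    """One point per voter who approves of a candidate, zero otherwise."""
    return {c: sum(1 for approved in ballots if c in approved) for c in candidates}


def winner(points):
    """Candidate with the most points, or None when the top score is tied."""
    best = max(points.values())
    leaders = [c for c, p in points.items() if p == best]
    return leaders[0] if len(leaders) == 1 else None


candidates = ("covid19", "non_covid19")

# Hypothetical ballots from 17 voters, each approving exactly one candidate.
ballots = [{"covid19"}] * 10 + [{"non_covid19"}] * 7
points = approval_points(ballots, candidates)

# An extra voter who approves of both candidates adds one point to each,
# so the difference between the two totals stays the same.
points_extra = approval_points(ballots + [set(candidates)], candidates)
margin = points["covid19"] - points["non_covid19"]
margin_extra = points_extra["covid19"] - points_extra["non_covid19"]
```

With an odd number of single-approval voters a tie cannot occur, which is why `winner` returns a candidate rather than `None` in this setting.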

In general, the quality of available chest X-ray images of patients varies hugely across hospitals, and it is challenging to generate highly accurate AI models for diagnosing Covid-19 pneumonia. Published deep learning studies with a relatively small number of cases report a diagnostic accuracy for Covid-19 of about 90% (Jin et al., 2020; Li et al., 2020b), implying that a significant number of false-negative cases existed. Practically, we believe that an accuracy close to 100% is required for Covid-19 diagnosis in a clinical setting. Recently, several studies have suggested that the use of chest X-ray images may help to assess the severity of Covid-19 (Cohen et al., 2020; Wong et al., 2021; Zhu et al., 2020), emphasizing the clinical significance of chest X-ray in diagnosing Covid-19 pneumonia. In our study, we have developed reliable AI models with nearly 100% diagnostic accuracy for Covid-19 pneumonia by reading only chest X-ray images of patients, building a solid foundation for employing the models in the clinic.

We have trained and built 5 binary classification models called Classifier A, Classifier B, Classifier C, Classifier D and Classifier E respectively (Figures 2 and 3) . Our AI models were built mainly for diagnosing Covid-19, but they were also capable of identifying non-Covid-19 pneumonia caused by other types of virus or bacteria, providing an opportunity to expand the use of our models to diagnosis of non-Covid-19 viral or bacterial pneumonia. This approach is meaningful because treatment options for viral and bacterial pneumonia are different. We envision that when individuals who have been healthy but might have had lung infection come to clinic, our AI models can help to read chest X-ray images to determine whether they have had any form of pneumonia caused by viruses or bacteria, followed by confirming whether they have had Covid-19 pneumonia.

In summary, we took a binary classification deep learning approach using our combined 17 CNNs and core voting algorithm by reading whole chest X-ray images and classifying them as either positive or negative to Covid-19. As a result, we have achieved nearly 100% diagnostic accuracy for Covid-19 pneumonia with high sensitivity and specificity. Our immediate next step would be to apply our AI models in a clinical trial for chest X-ray-based diagnosis of Covid-19.

[...] pneumonia using our deep learning method for potential clinical use. However, we could not explain why the accuracy did not reach 100% in some comparison groups. In other words, we do not know whether we need to further improve our deep learning method or to verify the correctness of image labeling in the public datasets; the latter is obviously impossible to achieve.

Prior to clinical use of our deep learning method for diagnosing Covid-19 pneumonia, we may need to ensure control of the image collection process to avoid possible mislabeling of any chest X-ray images.

This paper was produced using large volumes of publicly available image data. The authors have made every effort to make available links to these resources as well as the software methods and information used to produce the dataset, analyses, and summary information. The datasets used in this study are available online (https://fts.umassmed.edu/).

All Covid-19 chest X-ray images were obtained from a publicly available repository (https://github.com/ieee8023/covid-chestxray-dataset/tree/master/images). They are real cases of patients who tested positive for Covid-19 in hospitals across the globe. Non-Covid-19 chest X-ray images were obtained from Kaggle's Chest X-Ray Images (Pneumonia) dataset (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia). Based on the causes of pneumonia, we grouped all cases as either Covid-19 or non-Covid-19 (healthy, bacterial pneumonia and other viral pneumonia).

The dataset used to train and evaluate the proposed platform comprises a total of 10,182 chest X-ray images, and these images are available from https://github.com/ieee8023/covid-chestxray-dataset/tree/master/images and https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia. We combined and modified several different datasets that are publicly available (Kermany et al., 2018; Medicine, 2020; Paul, 2020; Wang et al., 2017). More information related to those images is also available (Cohen et al., 2020; Jaeger et al., 2014).

Like many other research groups, we have used different CNNs individually, but the diagnostic accuracy has not been satisfactory. In our study, we initially used each of the 17 CNNs separately to analyze the chest X-ray images in each of the five classification groups, and found that the diagnostic accuracy obtained with a single CNN ranged from 79% to 99% (Table 1). In our view, the diagnostic accuracy needs to be near 100%, or at least greater than 99%, before any deep learning model is employed in medical practice. This is why we programmed multiple models (17 CNNs) into one system with our core algorithms to enhance the performance of deep learning, with a goal of achieving 100% diagnostic accuracy. As a result, we reached up to 100% accuracy, which is superior to the use of any one of the 17 CNNs alone. In fact, we were not particularly interested in which network(s) contributed the most to the output, because our model was treated as a black box, which is how deep learning should work. However, what we do know is that the output of combining the 17 CNNs is much better than that of any individual network.
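One way to see why a vote among several networks can outperform each network alone is the following back-of-the-envelope calculation. It rests on a strong assumption that is not made or tested in this study: that the 17 CNNs err independently and share the same individual accuracy p. Real networks trained on the same images are correlated, so the numbers this sketch produces are an idealized upper-end illustration, not a prediction of Table 1. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(p, n=17):
    """Probability that more than half of n independent voters are correct.

    Each voter is assumed to be correct with the same probability p, and
    n is assumed to be odd so that ties cannot occur.
    """
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )


# Idealized comparison for a few hypothetical individual accuracies.
table = {p: majority_accuracy(p) for p in (0.5, 0.8, 0.9)}
```

Under these assumptions the majority is more accurate than a single voter whenever p exceeds 0.5, and no better than a coin flip at p = 0.5.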

To get the multiple CNNs to work together, we developed a core algorithm based on a voting mechanism. In the classification process, each individual CNN model votes either Yes or No for Covid-19. A Yes vote receives a score of +1 and a No vote receives a score of -1. An array [...]

(1) Our ability to combine 17 CNNs and use them together as a single model is unprecedented. This single model has all of the layers built into those 17 CNNs for conducting deep transfer learning with our datasets, and this novel approach allowed us to achieve high diagnostic accuracy for Covid-19.
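The +1/-1 scoring described in this section can be sketched as below. How the scores are combined is not fully recoverable from the text, so this sketch assumes the simplest reading: the scores from all CNNs are summed and a positive total means Covid-19. The function name `ensemble_decision`, the class labels and the 9-to-8 example are hypothetical.

```python
def ensemble_decision(votes):
    """Combine per-CNN votes into one label using +1/-1 scores.

    votes: iterable of booleans, one per CNN (True = Yes, Covid-19).
    Assumption: the final class is the sign of the summed scores. With an
    odd number of voters, such as 17, the total can never be zero.
    """
    scores = [1 if vote else -1 for vote in votes]
    total = sum(scores)
    label = "covid19" if total > 0 else "non_covid19"
    return label, total


# Hypothetical example: 9 of the 17 CNNs vote Yes and 8 vote No.
label, total = ensemble_decision([True] * 9 + [False] * 8)
```

Summing signed scores and taking the sign gives the same outcome as counting Yes votes and checking for a majority, so this is one concrete form of the majority rule discussed earlier.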

The CPU server and computer used for conducting all experiments were described previously. Briefly, MATLAB R2019a was used for training the AI models. For data preparation, programming and deployment, the toolboxes provided by MATLAB were used, including the Deep Learning Toolbox and the Image Processing Toolbox.

In this study, we used deep learning to generate computational models that are composed of multiple processing layers, including convolution, activation or ReLU, and pooling (Figure 2B). [...] is 20) before convergence. The detailed training procedure is illustrated in Figure 3B. We split our data into three datasets: 80% of the data were used for training, 10% for validation, and the remaining 10% were reserved for testing. During training, the validation data are useful for detecting whether the network is overfitting. All 17 trained CNNs were incorporated into a core voting algorithm to work out the final classification output (Figure 2A), which was compared with the performance of each individual CNN (Table 1).

Diagnostic accuracy was used as a measure to evaluate diagnostic performance, which involved the use of the following terms: true positive ( [...] virus, other viruses, bacteria and healthy (as a control). 80% of the X-ray images were used for model training and 10% of the images for model validation, with the remaining 10% of the images for model testing.

The scientists' apprentice

Clinical-grade computational pathology using weakly supervised deep learning on whole slide images

Predicting COVID-19

Structure of the RNA-dependent RNA polymerase from COVID-19 virus

Making Sense of Mutation: What D614G Means for the COVID-19 Pandemic Remains Unclear

Genetic variants are identified to increase risk of COVID-19 related mortality from UK Biobank data

Clinical features of patients infected with 2019 novel coronavirus in Wuhan

Machine-learning classification of texture features of portable chest X-ray accurately classifies COVID-19 lung infection

Two public chest X-ray datasets for computer-aided screening of pulmonary diseases

Development and evaluation of an artificial intelligence system for COVID-19 diagnosis

Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning

Deep-learning convolutional neural networks with transfer learning accurately classify COVID-19 lung infection on portable chest radiographs

New variant of SARS-CoV-2 in UK causes surge of COVID-19

Evidence that D614G Increases Infectivity of the COVID-19 Virus

Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins

Convolutional networks for images, speech, and time-series

Deep learning

A deep learning diagnostic platform for diffuse large B-cell lymphoma with high accuracy across multiple hospitals

Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy

A set of independent necessary and sufficient conditions for simple majority decisions

Tuberculosis chest X-ray image data sets

Covid-19 image data collection

Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation

Augmented reality microscopes for cancer histopathology

Deep learning in neural networks: an overview

Initial chest radiograph scores inform COVID-19 status, intensive care unit admission and need for mechanical ventilation

Emergence of a new SARS-CoV-2 variant in the UK

Classification assessment methods

Comorbidities and multi-organ injuries in the treatment of COVID-19

Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Location of Common Thorax Diseases. arXiv

A survey of transfer learning. J Big Data

Towards computer-aided severity assessment via deep neural networks for geographic and opacity extent scoring of SARS-CoV-2 chest X-rays

Pathological findings of COVID-19 associated with acute respiratory distress syndrome

Histopathologic Changes and SARS-CoV-2 Immunostaining in the Lung of a Patient With COVID-19

Distinct characteristics of COVID-19 patients with initial rRT-PCR-positive and rRT-PCR-negative results for SARS-CoV-2

Deep transfer learning artificial intelligence accurately stages COVID-19 lung disease severity on portable chest radiographs

SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues