key: cord-0630683-53aq480d authors: Cohen, Joseph Paul; Morrison, Paul; Dao, Lan title: COVID-19 Image Data Collection date: 2020-03-25 journal: nan DOI: nan sha: d59c6e06bad8e42029919130bb3214855f9e421d doc_id: 630683 cord_uid: 53aq480d

This paper describes the initial COVID-19 open image data collection. It was created by assembling medical images from websites and publications and currently contains 123 frontal view X-rays.

In the context of a COVID-19 pandemic, it is crucial to streamline diagnosis. Data is the first step to developing any diagnostic tool or treatment. While there exist large public datasets of more typical chest X-rays (Wang et al., 2017; Bustos et al., 2019; Irvin et al., 2019; Johnson et al., 2019; Demner-Fushman et al., 2016), there is no collection of COVID-19 chest X-rays or CT scans designed to be used for computational analysis.

In this paper, we describe the public database of pneumonia cases with chest X-ray or CT images, specifically COVID-19 cases as well as MERS, SARS, and ARDS. Data will be collected from public sources so as not to infringe on patient confidentiality. Example images are shown in Figure 1.

Our team believes that this database can dramatically improve identification of COVID-19. Notably, it would provide essential data to train and test a deep learning-based system, likely using some form of transfer learning. Such tools could be developed to distinguish COVID-19 characteristics from those of other types of pneumonia, or to predict survival.

Currently, all images and data are released under the following URL: https://github.com/ieee8023/covid-chestxray-dataset. As stated above, the images collected have already been made public.
Affiliations: 1 Mila, Quebec Artificial Intelligence Institute; 2 University of Montreal; 3 Department of Mathematics and Computer Science, Fontbonne University; 4 Faculty of Medicine, University of Montreal. Correspondence to: Joseph Paul Cohen.

This dataset can be used to study the progress of COVID-19 and how its radiological findings vary from other types of pneumonia. Just as the ChestX-ray14 dataset (Wang et al., 2017) enabled significant advances in medical imaging, tools can be developed to predict not only the type of pneumonia, but also its outcome. Eventually, our model could take inspiration from the work of Rajpurkar et al. (2017), which could predict pneumonia, as well as Cohen et al. (2019), which deployed such models.

Tools could be built to triage cases in the absence of physical tests, particularly in the context of a shortage of polymerase chain reaction (PCR) tests (Satyanarayana, 2020; Kelly Geraldine Malone, 2020). These tools could predict patient outcomes such as survival, allowing a physician to plan ahead for specific patients and facilitating their management. In extreme situations, where physicians could be faced with the extraordinary decision of choosing which patient should be allocated healthcare resources (Yascha Mounk, 2020), such a tool could potentially serve as a measuring device.

The current statistics as of March 25th, 2020 are shown in Table 1. For each image, the attributes listed in Table 2 are collected. Data is largely compiled from websites such as Radiopaedia.org, the Italian Society of Medical and Interventional Radiology, and Figure1.com. Images are extracted from online publications, websites, or directly from the PDF using the tool pdfimages. The goal during this process is to maintain the quality of the images.
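The PDF extraction step can be sketched in Python as follows. This is a minimal illustration, not the authors' actual procedure: it assumes the poppler-utils variant of pdfimages, whose `-all` option writes embedded images in their native format rather than re-encoding them, which fits the stated goal of maintaining image quality. The function names, file names, and output directory are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, out_prefix, first_page=None, last_page=None):
    """Build the argument list for a pdfimages call (poppler-utils assumed).

    -all keeps each embedded image in its native format where possible,
    so no extra re-encoding step is introduced during extraction.
    """
    cmd = ["pdfimages", "-all"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to scan
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def extract_images(pdf_path, out_dir):
    """Run pdfimages on one PDF and return the files it wrote (hypothetical helper)."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    # pdfimages names its outputs <prefix>-NNN.<ext>
    subprocess.run(build_pdfimages_command(pdf_path, out_dir / stem), check=True)
    return sorted(out_dir.glob(f"{stem}-*"))


# Example (requires pdfimages and a local PDF; both names are placeholders):
# files = extract_images("case_report.pdf", "extracted_images")
```

Separating command construction from execution makes it possible to check the arguments without the tool installed. Any extracted image would still need manual review before being added to a collection, since a PDF typically embeds logos and non-radiological figures as well.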
Data was collected from the following papers: (Phan et al., 2020; Liu et al., 2020; Chen et al., 2020; Paul et al., 2004; Silverstein et al., 2020; Shi et al., 2020; Holshue et al., 2020; Ng et al., 2020; Kong & Agarwal, 2020; Lim et al., 2020; Zu et al., 2020; Cheng et al., 2020; jin Zhang et al., 2020; Lee et al., 2020; Wu et al., 2020; Yoon et al., 2020; Hsih et al., 2020; Cuong et al., 2020; Thevarajan et al., 2020).

A large chest x-ray image dataset with multi-label annotated reports.
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet.
First case of coronavirus disease 2019 (COVID-19) pneumonia in Taiwan.
A Web Delivered Locally Computed Chest X-Ray Disease Prediction System.
The first Vietnamese case of COVID-19 acquired from China. The Lancet Infectious Diseases.
Preparing a collection of radiology examinations for distribution and retrieval.
First case of 2019 novel coronavirus in the United States.
Featuring COVID-19 cases via screening symptomatic patients with epidemiologic link during flu season in a medical center of central Taiwan.
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison.
Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China.
MIMIC-CXR: A large publicly available database of labeled chest radiographs.
Testing backlog linked to shortage of chemicals needed for COVID-19 test. CTV News.
Chest imaging appearance of COVID-19 infection. Radiology: Cardiothoracic Imaging.
A case of COVID-19 and pneumonia returning from Macau in Taiwan: Clinical course and anti-SARS-CoV-2 IgG dynamic.
Case of the index patient who caused tertiary transmission of coronavirus disease 2019 in Korea: the application of lopinavir/ritonavir for the treatment of COVID-19 pneumonia monitored by quantitative RT-PCR.
A locally transmitted case of SARS-CoV-2 infection in Taiwan.
Imaging profile of the COVID-19 infection: Radiologic findings and literature review. Radiology: Cardiothoracic Imaging.
Radiologic pattern of disease in patients with severe acute respiratory syndrome: The Toronto experience.
Importation and human-to-human transmission of a novel coronavirus in Vietnam.
Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks.
Shortage of RNA extraction kits hampers efforts to ramp up COVID-19 coronavirus testing.
Evolution of CT manifestations in a patient recovered from 2019 novel coronavirus (2019-nCoV) pneumonia in Wuhan, China. Radiology.
First imported case of 2019 novel coronavirus in Canada, presenting as mild pneumonia. The Lancet.
Breadth of concomitant immune responses prior to patient recovery: a case report of nonsevere COVID-19.
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases.
novel coronavirus (COVID-19) pneumonia: Serial computed tomography findings.
Clinical characteristics of imported cases of COVID-19 in Jiangsu province: A multicenter descriptive study. Clinical Infectious Diseases.
Coronavirus: Extraordinary Decisions For Italian Doctors. The Atlantic.
Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): Analysis of nine patients treated in Korea.
Coronavirus disease 2019 (COVID-19): A perspective from China.