key: cord-0769883-w2nnvjgw authors: Ikenoue, T.; KATAOKA, Y.; Matsuoka, Y.; Matsumoto, J.; Kumasawa, J.; Tochitatni, K.; Funakoshi, H.; Hosoda, T.; Kugimiya, A.; Shirano, M.; Hamabe, F.; Iwata, S.; Fukuma, S. title: Accuracy of deep learning based computed tomography diagnostic system of COVID-19: a consecutive sampling external validation cohort study date: 2020-11-18 journal: nan DOI: 10.1101/2020.11.15.20231621 sha: ae0637ccacde29aeb53d16540df71abc3444a270 doc_id: 769883 cord_uid: w2nnvjgw Objectives: Ali-M3, an artificial intelligence, analyses chest computed tomography (CT) and detects the likelihood of coronavirus disease (COVID-19) in the range of 0 to 1. It demonstrates excellent performance for the detection of COVID-19 patients with a sensitivity and specificity of 98.5 and 99.2%, respectively. However, Ali-M3 has not been externally validated. Our purpose is to evaluate the external validity of Ali-M3 using Japanese sequential sampling data. Methods: In this retrospective cohort study, COVID-19 infection probabilities were calculated using Ali-M3 in 617 symptomatic patients who underwent reverse transcription-polymerase chain reaction (RT-PCR) tests and chest CT for COVID-19 diagnosis at 11 Japanese tertiary care facilities, between January 1 and April 15, 2020. Results: Of 617 patients, 289 patients (46.8%) were RT-PCR-positive. The area under the curve (AUC) of Ali-M3 for predicting a COVID-19 diagnosis was 0.797 (95% confidence intervals [CI]: 0.762-0.833) and goodness-of-fit was P = 0.156. With a cut-off of probability of COVID-19 by Ali-M3 diagnosis set at 0.5, the sensitivity and specificity were 80.6% and 68.3%, respectively, while a cut-off of 0.2 yielded a sensitivity and specificity of 89.2% and 43.2%, respectively. Among 223 patients who required oxygen support, the AUC was 0.825 and sensitivity at a cut-off of 0.5 and 0.2 were 88.7% and 97.9%, respectively. 
Although the sensitivity was lower when the days from symptom onset were few, sensitivity increased for both cut-off values after 5 days. Conclusions: Ali-M3 was evaluated by external validation and shown to be useful to exclude a diagnosis of COVID-19.
medRxiv preprint doi: https://doi.org/10.1101/2020.11.15.20231621; this version posted November 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Introduction

A proper triage system is necessary during the coronavirus disease (COVID-19) pandemic, [1, 2] because improper triage may disadvantage patients, waste personal protective equipment (PPE), and cause hospital infections through the admission of infected patients to facilities, leading to collapse of the medical system. Although reverse transcription-polymerase chain reaction (RT-PCR) tests have been developed, the delay in waiting for RT-PCR results can hamper proper triage. Computed tomography (CT) is a fast and useful diagnostic tool. Some studies have reported the characteristic findings on chest CT images of COVID-19 patients. [3] [4] [5] [6] [7] [8] Interpretation of chest CT images by radiologists has shown high diagnostic performance for COVID-19. However, even radiologists' interpretations vary widely, depending on their familiarity with COVID-19 CT images. [9] Therefore, using CT as a diagnostic tool in general clinical practice is currently difficult.

Diagnostic support systems using artificial intelligence (AI) have the potential to replace many of the routine detection, characterisation, and quantification tasks currently performed by radiologists using cognitive ability. [10] AI can reduce the diagnostic variability that arises from inter- and intra-reader differences. In China, where COVID-19 infection originated, many AI systems were developed for establishing a diagnosis of COVID-19 based on chest CT images. [11] [12] [13] [14] [15] One such system, Ali-M3, outputs the likelihood of COVID-19 in the range of 0 to 1 and has shown excellent performance for the detection of COVID-19, with an accuracy, sensitivity, and specificity of 99.0%, 98.5%, and 99.2%, respectively. However, Ali-M3 was developed in a virtual population consisting of 3,067 examinations for COVID-19, 1,996 for community-acquired pneumonia, and 1,975 for non-pneumonia. This population differed from the general population, so the accuracy could be overestimated. [16] To use Ali-M3 to exclude a diagnosis of COVID-19, its external validity must be evaluated based on the distribution of diseases in a real-world setting.
Here, we conducted a retrospective cohort study to evaluate the external validity of Ali-M3 using Japanese sequential sampling data of patients who underwent RT-PCR tests and chest CT for the diagnosis of COVID-19.

Methods

This retrospective cohort study was conducted at 11 Japanese tertiary care facilities that provided treatment for COVID-19 in their respective areas. We partially followed the guidelines of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) Statement to plan and report this study (Supplemental Table 1). [17] The institutional review board of each facility approved the study, and the need to obtain written informed consent was waived.

We included patients who underwent both RT-PCR examinations and chest CT for the diagnosis of COVID-19. Potentially eligible participants were those for whom physicians advised that both an RT-PCR test and chest CT be obtained because the patients presented with symptoms or were suspected of having COVID-19. The detailed inclusion criteria are shown in Supplemental Table 2. We selected patients by consecutive sampling between January 1 and April 15, 2020. The RT-PCR results were extracted from the patients' medical records at each facility. Patients were excluded when the interval between chest CT and the first RT-PCR assay was longer than 7 days. All available data in the database were used to maximize the power and generalizability of the results.

All images were obtained on one of five types of CT systems, with the patient in the supine position. The details of scanning parameters and systems are shown in the supplemental material.
We used a three-dimensional deep learning framework for the detection of COVID-19 infections. [16] The details of this model are included in Appendix 1. Training of Ali-M3 was stopped before our evaluation. We set the cut-off point for the model output at 0.5 because this cut-off point was used during the development stage. The investigators who entered the CT image data into Ali-M3 were blinded to the RT-PCR results.

The diagnosis of COVID-19 was established by the RT-PCR test, which detected the nucleic acid of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in sputum, throat swabs, and lower respiratory tract secretions. [18] We established the RT-PCR test as the main reference standard. Although chest CT findings interpreted by radiologists were included in the reference standard in the derivation study, we did not include them in the reference standard in the present study.

Statistical analysis was performed using R statistical software, version 3.6.3 (R Foundation for Statistical Computing). Data analysis was performed in a complete-case dataset. Continuous variables are presented as means (standard deviations) and categorical variables as counts and percentages. Using the RT-PCR results as the reference, we calculated the area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value of the likelihood of COVID-19 derived from Ali-M3's analysis of chest CT images. The 95% confidence intervals (CIs) were determined by the Wilson score method. Goodness-of-fit was assessed using the le Cessie-van Houwelingen normal test statistic for the unweighted sum of squared errors.
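As a hedged illustration of the accuracy measures described above, the following Python sketch computes sensitivity, specificity, and predictive values from a 2x2 table, each with a 95% Wilson score interval. It is not the authors' R code. The function names (`wilson_ci`, `diagnostic_metrics`) are hypothetical, and the cell counts are approximations back-calculated from the reported percentages at a cut-off of 0.5 (289 RT-PCR-positive and 328 RT-PCR-negative patients), not data taken from the study.

```python
import math


def wilson_ci(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 gives 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp, fn, fp, tn):
    """Point estimates and Wilson intervals for an index test vs. a reference test."""
    cells = {
        "sensitivity": (tp, tp + fn),  # positives detected among reference-positives
        "specificity": (tn, tn + fp),  # negatives detected among reference-negatives
        "ppv": (tp, tp + fp),          # reference-positives among test-positives
        "npv": (tn, tn + fn),          # reference-negatives among test-negatives
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in cells.items()}


# Hypothetical counts, back-calculated from the reported 80.6% sensitivity
# and 68.3% specificity; they are an assumption, not the study's raw data.
metrics = diagnostic_metrics(tp=233, fn=56, fp=104, tn=224)
for name, (estimate, (lo, hi)) in metrics.items():
    print(f"{name}: {estimate:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The Wilson interval is written out by hand so that the sketch needs only the standard library; a statistics package would normally be used in practice.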
The objective of this study was to determine whether this AI model could be used as a screening tool for COVID-19 in the real world. In a clinical situation, physicians require an accurate diagnosis of COVID-19 and place more weight on sensitivity than on specificity. For the sensitivity analysis, we moved the cut-off point and observed the resulting sensitivities and specificities, with the aim of minimizing overlooked COVID-19 patients.

In the main analysis, we assumed RT-PCR to be a perfect reference (100% sensitivity and 100% specificity). However, in the real world, RT-PCR is not a perfect reference, since the sensitivity of the RT-PCR test has been estimated at 0-80%. [19] To evaluate the effect of this imperfect reference, we calculated the sensitivity, specificity, and AUC of Ali-M3 while varying the sensitivity of RT-PCR and fixing its specificity at 100%, using the methods and R code described in the Supplemental Material. [20]

3. Effect of the number of days after symptom onset

The number of days that have passed since the onset of symptoms affects the performance of antibody and RT-PCR tests in COVID-19 patients. [19, 21] However, it was not clear whether this could also affect CT images in COVID-19 patients. Among patients whose symptom onset date was known, sensitivity and specificity were calculated in 2-day strata from day 0 to day 13 after symptom onset and in a single stratum of 14 days or more.

Imaging is not routinely indicated as a screening test for COVID-19 in asymptomatic individuals. [22] However, CT images are used in the assessment of disease severity.
We established the severity by evaluating whether oxygen therapy was required and whether the patient was asymptomatic at the time of CT.

The thickness of the reconstruction slice can affect diagnostic performance. [23] Because there was a gap in our data set between slice thicknesses of 3 mm and 4 mm, we split the main-analysis dataset at a reconstruction slice thickness of 3 mm and calculated the performance of the model in each subset.

Results

Table 1. Overall, 289 patients (46.8%) were diagnosed with COVID-19 using the RT-PCR test. Thirteen patients needed more than two RT-PCR tests before being diagnosed with COVID-19. Major symptoms were dry cough (37.6%), fever (33.5%), and sore throat (25.8%). The performance of the confidence score after validation among symptomatic patients is shown in

1. Moving cut-off point

Table 2 shows the relationship between cut-off points for the confidence score and performance. When the cut-off point was 0.2, the sensitivity and specificity were 89.2% and 43.3%, respectively.

2. Simulation of imperfect reference

Figure 3 shows the sensitivity and specificity under the assumption that the RT-PCR test is an imperfect reference. The AUC was 0.865. When the cut-off point was set at 0.5, using the Youden index, the sensitivity and specificity were 80.6% and 81.3%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 89.2% and 51.9%, respectively.

Of all symptomatic patients, 600 (97.2%) were included in this sensitivity analysis; the number of days after the onset of symptoms was not known for the remaining 17 patients. Figure 4 shows the relationship between test performance and the number of days since the onset of symptoms when the cut-off for the Ali-M3 confidence score was set at 0.5 or 0.2. Sensitivity started at about 0.7 and increased to 1.0 by days 10-11 at both cut-offs, whereas specificity remained similar across the strata. Sensitivity rose above 0.9 when the cut-off was set at 0.2 and was higher than when the cut-off was set at 0.5.

The effects of changing the criteria for patient eligibility are shown in Figure 5. There were 86 asymptomatic patients (RT-PCR positive: 37). In these patients only, the AUC was 0.623. When the cut-off point was 0.5, the sensitivity and specificity were 51.4% and 59.2%, respectively. When the cut-off point was 0.2, the sensitivity and specificity were 73.0% and 44.9%, respectively. There were 223 patients who required oxygen support (RT-PCR positive: 97). In these patients only, the AUC was 0.828. When the cut-off point was set at 0.5, the sensitivity and specificity were 88.7% and 57.9%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 97.9% and 34.9%, respectively.
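The imperfect-reference simulation reported above can be illustrated with a minimal Python sketch. It is not the authors' R code from the Supplemental Material. It assumes, for illustration only, that RT-PCR specificity is 100% and that Ali-M3 and RT-PCR errors are conditionally independent given true disease status; the function name and parameters are hypothetical. Under these assumptions the sensitivity estimate is left unchanged, while the specificity is adjusted for infected patients whom the reference test missed. The AUC is not addressed here.

```python
def correct_for_imperfect_reference(se_obs, sp_obs, prev_obs, ref_sens):
    """Adjust index-test accuracy for a reference test with imperfect sensitivity.

    Assumptions (illustrative, not taken from the study's code):
      * the reference test has 100% specificity;
      * index-test and reference-test errors are conditionally independent.
    All arguments are proportions in [0, 1].
    """
    if not 0 < ref_sens <= 1:
        raise ValueError("ref_sens must be in (0, 1]")
    true_prev = prev_obs / ref_sens  # reference-positives are a fraction of the truly diseased
    if true_prev >= 1:
        raise ValueError("implied true prevalence must be below 1")
    missed = true_prev * (1 - ref_sens)  # diseased but reference-negative
    ref_negative = 1 - prev_obs          # all reference-negative patients
    # Test-negatives among reference-negatives mix truly non-diseased patients
    # with missed cases that the index test also failed to flag; remove the latter.
    sp_true = (sp_obs * ref_negative - missed * (1 - se_obs)) / (1 - true_prev)
    return {"sensitivity": se_obs, "specificity": sp_true, "prevalence": true_prev}


# Reported main-analysis values at a cut-off of 0.5; the reference sensitivities
# tried below are arbitrary illustrative choices.
for ref_sens in (1.0, 0.9, 0.8):
    result = correct_for_imperfect_reference(0.806, 0.683, 0.468, ref_sens)
    print(ref_sens, round(result["specificity"], 3), round(result["prevalence"], 3))
```

In this example, the corrected specificity rises as the assumed RT-PCR sensitivity falls, which is the direction of bias the simulation is meant to expose.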
5. Effect of the thickness of the CT reconstruction slice

There were 320 patients (RT-PCR positive: 121) with a reconstruction slice thickness of under 3 mm. In these patients only, the AUC was 0.825. When the cut-off point was set at 0.5, the sensitivity and specificity were 82.6% and 69.7%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 94.2% and 51.5%, respectively. In patients with a reconstruction slice thickness over 3 mm, the AUC was 0.789 (Supplement Figure 1).

Discussion

In this external validation study, our results indicated that Ali-M3, used with a lower cut-off, could be useful for early triage of symptomatic patients with suspected COVID-19. In particular, higher accuracy was observed in patients with more severe disease, in patients imaged a few days after symptom onset, and in images with a thinner reconstructed CT slice thickness.

Currently, all patients with symptoms such as fever are triaged as COVID-19 patients. Thus, medical practitioners must use PPE for each patient. [24] Additionally, bed zoning is essential to avoid contamination of non-infected patients. [25] On the other hand, under-triage causes hospital infections through the admission of infected patients to facilities. These precautions should therefore continue until a definitive diagnosis is established. Since Ali-M3 is available on the cloud, the physician can receive the results immediately by sending Digital Imaging and Communications in Medicine (DICOM) images from an ordinary picture archiving and communication system. When applying triage, clinicians require sufficient accuracy in terms of sensitivity, but specificity is less important. [19] The high sensitivity obtained at a cut-off of 0.2 with the AI diagnosis is useful for excluding the diagnosis of COVID-19.

Ali-M3 also has the potential to support a diagnosis of COVID-19. The tools currently used for diagnosing COVID-19 infection are antibody, antigen, and RT-PCR tests. Both antigen and RT-PCR tests use tracheal secretions or saliva. An antigen test requires an antigen protein above a given detectable level and is currently inferior to RT-PCR tests. Because the same patient sample is used, the antigen test cannot complement the RT-PCR test. The RT-PCR test is currently used as the gold standard, but its sensitivity changes depending on the number of days after the onset of symptoms. [19] Therefore, for an exclusion diagnosis, multiple tests staggered over time are needed rather than a single negative RT-PCR test. Even when testing is performed as rapidly as possible, it still takes a few days to obtain multiple test results. In contrast, Ali-M3 uses the structural information of patients' lungs and thus adds information different from that obtained by RT-PCR, thereby compensating for the drawbacks of RT-PCR.

In this study, the diagnostic accuracy at the validation stage was lower than the accuracy at the development stage.
A two-gate (case-control) design was used in the development of the AI system, whereas in the present study, which evaluated the ability of Ali-M3 to diagnose COVID-19 from chest CT images, we used a single-gate (cohort) design. Although many studies have used the two-gate design in the evaluation of AI for the diagnosis of COVID-19, [26] the two-gate design is generally prone to overestimating diagnostic test accuracy. [27] Thus, blindly applying results based on a two-gate design in a clinical situation can be inappropriate.

Moreover, other factors should be considered. With the use of a two-gate design, the fact that RT-PCR is an imperfect reference standard is typically ignored. Furthermore, performing culture and tests to ascertain the true sensitivity of this test is difficult. In the present study, we simulated the diagnostic ability of Ali-M3 under the assumption that the sensitivity of the reference standard was imperfect, a condition that leads to underestimation of the specificity and AUC of Ali-M3 without distorting the sensitivity.

Furthermore, the outcome used while developing Ali-M3 differed from the outcome used while validating it. Taking into account the patient flow in China, the outcome at the development stage was defined so that positive cases included patients with negative RT-PCR results and positive CT image findings. [28] This had a small effect on the sensitivity, but a large effect on the specificity. For example, if in the development stage 33.9% of the positive patients had negative RT-PCR results and positive CT image findings, [28] then the performance reported in the development of Ali-M3 (sensitivity of 98.5% and specificity of 99.2%) [16] changes to between 97.7% and 100% for sensitivity and between 80.8% and 81.6% for specificity when a positive RT-PCR result is the only reference used. Upgrading to a diagnostic AI that targets only RT-PCR-positive cases at the development stage is desirable.

This study had some limitations. First, the discrimination performance of Ali-M3 was poor in asymptomatic patients; thus, Ali-M3 should not be used to screen asymptomatic patients. Although an alternative to the RT-PCR test for COVID-19 is desired for screening for nosocomial infections and for screening on admission of patients with other diseases, Ali-M3 is not recommended for this purpose. Second, we could not differentiate COVID-19 from other viral pneumonias. Compared to the past five seasons, the number of Japanese people infected with influenza during this season was markedly low. [29] In fact, only a few patients in our cohort were diagnosed with other viral pneumonias. Third, our study could not reflect differences in imaging features caused by different COVID-19 types. In addition to type A COVID-19, which was initially prevalent in Asia, type B and type C were prevalent in Europe and in the United States. These types were not determined by the PCR test, and thus we could not evaluate these differences.

In conclusion, we conducted a retrospective cohort study for external validation of Ali-M3. Our results indicated that AI-based CT diagnosis could be useful for a diagnosis of exclusion of COVID-19 in symptomatic patients, particularly those requiring oxygen and those assessed a few days after symptom onset. Using Ali-M3 support can reduce PPE consumption and prevent hospital infections caused by the admission of covertly infected patients.
Moreover, Ali-M3 has the potential to support RT-PCR in the diagnosis of patients with suspected COVID-19. However, as Ali-M3 had some limitations in terms of development, further studies and learning are warranted for updating the system.
Light gray bars show the number of patients included in each stratum of days after the onset of symptoms, following the right axis. Each stratum covers 2 days from day 0 to day 13. The stratum at the extreme right includes 14 days or more. Following the left axis, solid lines show sensitivity and dashed lines show specificity in each stratum.

coordinate among asymptomatic patients. The PV+ is dark gray and the PV- is light gray. The maximum PV+ is 43.0% and the maximum PV- is 57.0%. (C) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cut-off level of the confidence score in patients using oxygen support. The area under the ROC curve is 0.623 and the Youden index is 0.25. (D) A plot of test sensitivity, specificity, PV+, and PV- in y-coordinate versus confidence score obtained from Ali-M3 in x-coordinate in patients using oxygen support. The PV+ is dark gray and the PV- is light gray. The maximum PV+ is 43.5% and the maximum PV- is 56.5%.
PCR, polymerase chain reaction; bpm, beats per minute. *Patients using oxygen support were included in symptomatic patients. + is continuous data and the others are count data. Continuous variables are expressed as mean (SD) and count data as number (percentage).

References

Triage of Scarce Critical Care Resources in COVID-19 An Implementation Guide for Regional Allocation: An Expert Panel Report of the Task Force for Mass Critical Care and the
Hospital surge capacity in a tertiary emergency referral centre during the COVID-19 outbreak in Italy
Role of Chest CT in Diagnosis and Management
A Systematic Review of Imaging Findings in 919 Patients
CT Features of Coronavirus Disease 2019 (COVID-19) Pneumonia in 62 Patients in Wuhan
Quantification of Tomographic Patterns associated with COVID-19 from Chest CT
CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity
Time Course of Lung Changes at Chest CT during Recovery from Coronavirus Disease 2019 (COVID-19)
Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT
Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine
Serial Quantitative Chest CT Assessment of COVID-19: Deep-Learning Approach
Evaluation of acute pulmonary embolism and clot burden on CTPA with deep learning
Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies
Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
COVID-19 AI Assisted Analysis Based On Chest CT Imaging
Moons: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement
Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19)
Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure
Fool's Gold: Why Imperfect Reference Tests Are Undermining the Evaluation of Novel Diagnostics: A Reevaluation of 5 Diagnostic Tests for Leptospirosis
Antibody responses to SARS-CoV-2 in patients with COVID-19
The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society
Effects of contrast-enhancement, reconstruction slice thickness and convolution kernel on the diagnostic performance of radiomics signature in solitary pulmonary nodule
World Health Organization: Rational use of personal protective equipment (PPE) for coronavirus disease (COVID-19): interim guidance
Gynecological prevention and control model based on ward rearrangement and zoning management in pandemic period of COVID-19
A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks. Sci Rep

We thank M3 Inc. and Clinical Porter for providing Ali-M3 and data storage free of charge; they did not participate in the preparation of the protocol or the manuscript.
Readers who wish to access Ali-M3 can contact M3 (m3-ai-lab@m3.com). We also thank Ms. Kyoko Wasai, who assisted in retrieving data.