key: cord-0880275-le3be44i authors: Ebrahimzadeh, Sanam; Islam, Nayaar; Dawit, Haben; Salameh, Jean-Paul; Kazi, Sakib; Fabiano, Nicholas; Treanor, Lee; Absi, Marissa; Ahmad, Faraz; Rooprai, Paul; Al Khalil, Ahmed; Harper, Kelly; Kamra, Neil; Leeflang, Mariska MG; Hooft, Lotty; Pol, Christian B; Prager, Ross; Hare, Samanjit S; Dennie, Carole; Spijker, René; Deeks, Jonathan J; Dinnes, Jacqueline; Jenniskens, Kevin; Korevaar, Daniël A; Cohen, Jérémie F; Van den Bruel, Ann; Takwoingi, Yemisi; de Wijgert, Janneke; Wang, Junfeng; Pena, Elena; Sabongui, Sandra; McInnes, Matthew DF title: Thoracic imaging tests for the diagnosis of COVID‐19 date: 2022-05-16 journal: Cochrane Database Syst Rev DOI: 10.1002/14651858.cd013639.pub5 sha: 854c79aa9361229f94d37f1e7d3d5e7f25df74ea doc_id: 880275 cord_uid: le3be44i BACKGROUND: Our March 2021 edition of this review showed thoracic imaging computed tomography (CT) to be sensitive and moderately specific in diagnosing COVID‐19 pneumonia. This new edition is an update of the review. OBJECTIVES: Our objectives were to evaluate the diagnostic accuracy of thoracic imaging in people with suspected COVID‐19; assess the rate of positive imaging in people who had an initial reverse transcriptase polymerase chain reaction (RT‐PCR) negative result and a positive RT‐PCR result on follow‐up; and evaluate the accuracy of thoracic imaging for screening COVID‐19 in asymptomatic individuals. The secondary objective was to assess threshold effects of index test positivity on accuracy. SEARCH METHODS: We searched the COVID‐19 Living Evidence Database from the University of Bern, the Cochrane COVID‐19 Study Register, The Stephen B. Thacker CDC Library, and repositories of COVID‐19 publications through to 17 February 2021. We did not apply any language restrictions. SELECTION CRITERIA: We included diagnostic accuracy studies of all designs, except for case‐control, that recruited participants of any age group suspected to have COVID‐19. 
Studies had to assess chest CT, chest X‐ray, or ultrasound of the lungs for the diagnosis of COVID‐19, use a reference standard that included RT‐PCR, and report estimates of test accuracy or provide data from which we could compute estimates. We excluded studies that used imaging as part of the reference standard and studies that excluded participants with normal index test results. DATA COLLECTION AND ANALYSIS: The review authors independently and in duplicate screened articles, extracted data and assessed risk of bias and applicability concerns using QUADAS‐2. We presented sensitivity and specificity per study on paired forest plots, and summarized pooled estimates in tables. We used a bivariate meta‐analysis model where appropriate. MAIN RESULTS: We included 98 studies in this review. Of these, 94 were included for evaluating the diagnostic accuracy of thoracic imaging in people with suspected COVID‐19. Eight studies were included for assessing the rate of positive imaging in individuals with initial RT‐PCR negative results and positive RT‐PCR results on follow‐up, and 10 studies were included for evaluating the accuracy of thoracic imaging in asymptomatic individuals. For all 98 included studies, risk of bias was high or unclear in 52 (53%) studies with respect to participant selection, in 64 (65%) studies with respect to reference standard, in 46 (47%) studies with respect to index test, and in 48 (49%) studies with respect to flow and timing. Concerns about the applicability of the evidence were high or unclear in eight (8%) studies with respect to participants, in seven (7%) studies with respect to index test, and in seven (7%) studies with respect to reference standard. Imaging in people with suspected COVID‐19 We included 94 studies. Eighty‐seven studies evaluated one imaging modality, and seven studies evaluated two imaging modalities. 
All studies used RT‐PCR alone or in combination with other criteria (for example, clinical signs and symptoms, positive contacts) as the reference standard for the diagnosis of COVID‐19. For chest CT (69 studies, 28,285 participants, 14,342 (51%) cases), sensitivities ranged from 45% to 100%, and specificities from 10% to 99%. The pooled sensitivity of chest CT was 86.9% (95% confidence interval (CI) 83.6 to 89.6), and pooled specificity was 78.3% (95% CI 73.7 to 82.3). Definition for index test positivity was a source of heterogeneity for sensitivity, but not specificity. Reference standard was not a source of heterogeneity. For chest X‐ray (17 studies, 8529 participants, 5303 (62%) cases), the sensitivity ranged from 44% to 94% and specificity from 24% to 93%. The pooled sensitivity of chest X‐ray was 73.1% (95% CI 64. to 80.5), and pooled specificity was 73.3% (95% CI 61.9 to 82.2). Definition for index test positivity and reference standard were not found to be sources of heterogeneity. For ultrasound of the lungs (15 studies, 2410 participants, 1158 (48%) cases), the sensitivity ranged from 73% to 94% and the specificity ranged from 21% to 98%. The pooled sensitivity of ultrasound was 88.9% (95% CI 84.9 to 92.0), and the pooled specificity was 72.2% (95% CI 58.8 to 82.5). Definition for index test positivity and reference standard were not found to be sources of heterogeneity. Indirect comparisons of modalities evaluated across all 94 studies indicated that chest CT and ultrasound gave higher sensitivity estimates than X‐ray (P = 0.0003 and P = 0.001, respectively). Chest CT and ultrasound gave similar sensitivities (P = 0.42). All modalities had similar specificities (CT versus X‐ray P = 0.36; CT versus ultrasound P = 0.32; X‐ray versus ultrasound P = 0.89). 
Imaging in PCR‐negative people who subsequently became positive For rate of positive imaging in individuals with initial RT‐PCR negative results, we included 8 studies (7 CT, 1 ultrasound) with a total of 198 participants suspected of having COVID‐19, all of whom had a final diagnosis of COVID‐19. Most studies (7/8) evaluated CT. Of 177 participants with initially negative RT‐PCR who had positive RT‐PCR results on follow‐up testing, 75.8% (95% CI 45.3 to 92.2) had positive CT findings. Imaging in asymptomatic PCR‐positive people For imaging asymptomatic individuals, we included 10 studies (7 CT, 1 X‐ray, 2 ultrasound) with a total of 3548 asymptomatic participants, of whom 364 (10%) had a final diagnosis of COVID‐19. For chest CT (7 studies, 3134 participants, 315 (10%) cases), the pooled sensitivity was 55.7% (95% CI 35.4 to 74.3) and the pooled specificity was 91.1% (95% CI 82.6 to 95.7). AUTHORS' CONCLUSIONS: Chest CT and ultrasound of the lungs are sensitive and moderately specific in diagnosing COVID‐19. Chest X‐ray is moderately sensitive and moderately specific in diagnosing COVID‐19. Thus, chest CT and ultrasound may have more utility for ruling out COVID‐19 than for differentiating SARS‐CoV‐2 infection from other causes of respiratory illness. The uncertainty resulting from high or unclear risk of bias and the heterogeneity of included studies limit our ability to confidently draw conclusions based on our results. 
• Most studies (n = 69) evaluated the accuracy of chest CT scans. Chest X-ray was evaluated in 17 studies and ultrasound of the lungs was evaluated in 15 studies. • Chest CT was sensitive and moderately specific in the diagnosis of COVID-19 in suspected cases. 
• Chest X-ray was moderately sensitive and moderately specific in the diagnosis of COVID-19 in suspected cases. • Ultrasound of the lungs was sensitive and moderately specific in the diagnosis of COVID-19 in suspected cases. • There was no statistical evidence indicating that reference standard conduct was a source of heterogeneity for chest CT studies. The definition used for index test positivity in chest CT studies appeared to impact sensitivity, as studies that used radiologists' impressions showed higher sensitivities than those that used formal scoring systems. However, the definition of index test positivity was not found to be a source of heterogeneity for chest CT specificity, chest X-ray accuracy or ultrasound accuracy. • The 'threshold' effect in chest CT studies that used the CO-RADS or RSNA scoring systems demonstrated a trade-off between sensitivity and specificity: as the threshold for index test positivity increased, sensitivity decreased and specificity increased. • Indirect test comparisons showed that chest CT (69 studies) and ultrasound (15 studies) both gave higher sensitivity estimates than chest X-ray (17 studies). Chest CT and ultrasound gave similar sensitivities. All modalities had similar specificities. • The rate of positive CT imaging among people with a positive repeat RT-PCR result (where the initial RT-PCR was negative) was 75.8% (95% CI 45.3 to 92.2). • Chest CT imaging had poor sensitivity and high specificity for detecting COVID-19 in asymptomatic individuals. Given various prevalence settings, predicted outcomes for the number of individuals receiving a false positive result or a false negative (missed) result per 1000 people undergoing chest CT, chest X-ray, and ultrasound of the lungs are outlined as follows. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and the resulting coronavirus disease 2019 (COVID-19) pandemic continue to present diagnostic evaluation challenges. 
While the World Health Organization (WHO) reports laboratory confirmation of COVID-19 infection, such as a positive reverse transcriptase polymerase chain reaction (RT-PCR) result, as the standard for diagnosing COVID-19, the value of imaging tests in the diagnostic pathway remains undefined (WHO 2020). Research on the role of imaging in COVID-19 patients is evolving and more refined assessment methods for imaging tests, such as the COVID-19 Reporting and Data System (CO-RADS), are being investigated (Prokop 2020). Also, asymptomatic transmission of COVID-19 is one of its biggest diagnostic challenges, with the WHO recently reminding the public of the distinction between asymptomatic patients and presymptomatic patients (Walker 2020). The role of imaging in the screening of asymptomatic patients remains undefined. Decisions about patient and isolation pathways for COVID-19 vary according to health services and settings, available resources, and outbreaks in different settings. They will change over time, as accurate tests, effective treatments, and vaccines are identified. The decision points between these pathways vary, but all include points at which knowledge of the accuracy of diagnostic information is needed to inform medical decisions. Therefore, it is essential to understand the accuracy of tests and diagnostic features to develop effective diagnostic and management pathways for different settings. This supports strategies aiming to identify those who are infected, and consequently the management of patients either through isolation precautions, contact tracing, quarantine, hospital admission or admission to a specialized facility, admission to the intensive care unit, or initiation of specific therapies, and implementation of mitigation strategies to limit the spread of the disease. 
This review from the suite of Cochrane 'living systematic reviews' summarizes evidence on the accuracy of different imaging tests and diagnostic features in participants regardless of their symptoms. Estimates of accuracy from this review will help inform diagnostic, screening, isolation, and patient-management decisions. We have included an explanation of terminology and acronyms in Appendix 1. The target condition being evaluated is COVID-19, the illness following acute infection with SARS-CoV-2 (Datta 2020). People infected with SARS-CoV-2 can be asymptomatic or can have a wide variety of symptoms, including fever, sore throat, diarrhoea, dyspnoea, headache, chest pain, stomach-ache, nausea, loss of taste, loss of smell, myalgia (muscle pain), fatigue, runny nose, cough, aches, and lethargy (either without difficulty breathing at rest or with shortness of breath and increased respiratory rate potentially requiring supplemental oxygen or mechanical ventilation). Furthermore, in people diagnosed with a pulmonary condition (e.g. pulmonary embolism), symptoms could be indicative of COVID-19, or could be a manifestation of the preexisting condition. Chest CT refers to the acquisition of images of the chest using computed tomography. Typical imaging protocols would not use intravenous (IV) contrast; however, in this review we considered all variations of imaging protocols with the exception of studies specifically targeted at evaluating the coronary arteries or the heart, which did not include the entire lungs in the field of view. This includes, but is not limited to, non-contrast chest CT, low-dose chest CT (with or without contrast), high-resolution chest CT, and chest CT with IV contrast (routine or pulmonary angiogram). Chest radiography refers to the evaluation of the lungs using X-rays. This often involves two orthogonal views, posterior-anterior (PA) and lateral, but may be done by a portable machine and only acquire an anterior-posterior (AP) view. 
In this review, we considered all variations of chest radiography protocols that evaluated the lungs. We excluded protocols that did not cover the entire thorax and were done for reasons other than assessment of pulmonary status (e.g. assessment of feeding tube position, which typically only includes the lower thorax, or dedicated evaluation of the ribs). Ultrasound of the lungs refers to any ultrasound of the thorax done with the intention of evaluating the status of the lungs. This includes, but is not limited to, point-of-care ultrasound, done at the bedside by a physician, as well as what is often termed 'consultative' ultrasound, which is done by a technologist and subsequently interpreted by a physician (typically a radiologist). We considered all possible technical parameters (e.g. type of probe, transducer frequency, use of contrast). This did not include ultrasound done with the intended purpose of evaluating only the heart or vessels of the chest. The optimal diagnostic pathway and the role of thoracic imaging for identifying people with COVID-19 are unclear. Compared to RT-PCR testing, a potential major advantage of thoracic imaging is that results are available faster and that it provides better insight into the status of the lungs. However, chest CT imaging is typically only available in secondary and tertiary healthcare settings, and availability varies across these settings. 1. Thoracic imaging may play an integral role in 'ruling out' COVID-19 pneumonia when RT-PCR is unavailable, pending or negative, or when clinical suspicion is 'low' based on other signs, symptoms and routine laboratory tests. Role of test: triage for RT-PCR, to make decisions about performing additional tests such as RT-PCR. 2. Thoracic imaging is used to rule in or rule out COVID-19 when results from other tests (e.g. RT-PCR) are not available in a timely manner. 3. 
Concurrent/combination testing with other diagnostic tests (as part of a pair or group of tests) to improve diagnostic accuracy. For example, thoracic imaging could be used to identify false negatives of other tests (e.g. RT-PCR), and to improve the overall accuracy of the testing strategy. 4. Thoracic imaging used to detect COVID-19 in asymptomatic patients. Several diagnostic pathways have been proposed that provide guidance for physicians to identify people with COVID-19. The order and components of these pathways differ, with varying dependence on pre-test probability, physical examination, laboratory tests and findings based on RT-PCR results and availability. However, some professional organizations recommend imaging for patients with moderate or severe features of COVID-19 (Rubin 2020). In some hospitals, the results of low-dose chest CT are one of the many parameters (among molecular test results, routine laboratory results and clinical signs and symptoms) used to categorize patients as low risk, moderate to high risk, and proven COVID-19 cases (China National Health Comission 2020). Given the rapid progression of COVID-19 and the constantly evolving evidence base, the diagnostic accuracy needed to inform the utility of thoracic imaging in these pathways is difficult to estimate. This 'living systematic review' aims to identify and summarize evidence regarding the diagnostic accuracy of thoracic imaging in people with suspected COVID-19. This represents our fourth version of this 'living systematic review' (Islam 2021). Other Cochrane diagnostic test accuracy (DTA) reviews in the suite of reviews address the following tests. 1. Signs and symptoms, which will be mainly used in primary care, including when presenting at the emergency department (Struyf 2020). 2. Routine laboratory testing, such as for C-reactive protein (CRP) and procalcitonin (PCT) (Stegeman 2020). 
3. Antibody tests (Deeks 2020). 4. Laboratory-independent point-of-care and near-patient molecular and antigen tests (Dinnes 2020; Dinnes 2021). 5. Electronic and animal noses (Leeflang 2021). In Salameh 2020a, studies that only included confirmed cases of COVID-19 reported high pooled sensitivities for chest CT and X-ray: 93.1% (95% CI 90.2 to 95.0) and 82.1% (95% CI 62.5 to 92.7), respectively. Thirteen studies that assessed chest CT in participants with suspected COVID-19 demonstrated sensitivity of 86.2% (95% CI 71.9 to 93.8) but a low specificity of 18.1% (95% CI 3.71 to 55.8). This indicated a lack of discrimination, as the chances of getting a positive chest CT result are 86% in patients with a SARS-CoV-2 infection and 82% in patients without. We did not evaluate accuracy estimates for chest X-ray and ultrasound of the lungs in participants with suspected COVID-19 in the initial review as these data were not available. Islam 2020 focused on people suspected of having COVID-19 and excluded studies evaluating only confirmed cases of COVID-19. Thirty-one studies that evaluated chest CT in suspected participants demonstrated a pooled sensitivity of 89.9% (95% CI 85.7 to 92.9) and a pooled specificity of 61.1% (95% CI 42.3 to 77.1). We were not able to evaluate pooled accuracy estimates for chest X-ray and ultrasound of the lungs in participants with suspected COVID-19 due to limited data. We explored the value of formal scoring systems for the evaluation of index tests, and 'threshold' effects of index test positivity; however, we could not perform formal analyses due to the limited number of included studies. Compared to Islam 2020, Islam 2021 had stricter inclusion criteria, excluding studies of case-control design and those that reported an overview of index test findings without explicitly classifying the imaging test as either COVID-19 positive or negative. 
Forty-one studies evaluated chest CT in suspected participants, nine studies evaluated X-ray and five studies evaluated ultrasound of the lungs in suspected participants. The pooled sensitivity of chest CT was 87.9% (95% CI 84.6 to 90.6) and the pooled specificity was 80.0% (95% CI 74.9 to 84.3). The pooled sensitivity of chest X-ray was 80.6% (95% CI 69.1 to 88.6) and the pooled specificity was 71.5% (95% CI 59.8 to 80.8). The pooled sensitivity of ultrasound was 86.4% (95% CI 72.7 to 93.9) and the pooled specificity was 54.6% (95% CI 35.3 to 72.6). Definition of index test positivity and reference standard conduct were not found to impact accuracy of chest CT. Based on an indirect comparison using all included studies, chest CT had a higher specificity than ultrasound. For this current update (fourth version of the review), we have further refined the inclusion criteria, excluding studies that used imaging as a reference standard and studies that excluded participants with normal index test results. We have also formally assessed the impact of definition of index test positivity on the accuracy of X-ray and ultrasound, along with chest CT. We also assessed the rate of positive imaging in people who had an initial RT-PCR negative result and a positive RT-PCR result on follow-up, and the accuracy of imaging for screening for COVID-19 in asymptomatic individuals. We do not have immediate future plans for this 'living systematic review'. Updates to the review and modifications to the protocol are made after discussion with many stakeholders including the author team, the Cochrane DTA COVID group, and the Cochrane Infectious Diseases Group (CIDG). Evolving research on imaging tests in COVID-19 patients includes the use of formal scoring systems to evaluate imaging tests, which offer the potential for improved specificity. 
Formal scoring systems include CO-RADS (Prokop 2020), the British Society of Thoracic Imaging (BSTI) COVID-19 Reporting Templates (BSTI 2020), and the Radiological Society of North America (RSNA) Expert Consensus on Reporting Chest CT Findings for COVID-19 (Simpson 2020). In Islam 2020, we explored the value of formal scoring systems, but we could not formally analyze them due to a limited number of studies that used these systems. In Islam 2021 we evaluated the value of formal scoring systems on accuracy estimates of imaging tests (Irwig 1995) and threshold effects of the CO-RADS scoring system for chest CT studies. Since Islam 2021, more studies with comparative designs that compare different imaging modalities are available, as well as more studies that evaluate the rate of positive imaging in those with initial RT-PCR negative results and positive RT-PCR results on follow-up, and the accuracy of imaging for screening asymptomatic individuals. The primary objectives are 1) to evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), chest X-ray and ultrasound) in the evaluation of people with suspected COVID-19, 2) to assess the rate of positive imaging in individuals with initial RT-PCR negative results and positive RT-PCR results on follow-up, and 3) to evaluate the accuracy of thoracic imaging for screening asymptomatic individuals. The secondary objective is to evaluate threshold effects of index test positivity on accuracy. We kept the eligibility criteria broad to be able to include all settings and all variations of a test. We included studies of all designs, with the exception of case-control studies. Studies had to include participants suspected of having the target condition and produce estimates of test accuracy or provide 2x2 data (true positive (TP), true negative (TN), false positive (FP), false negative (FN)), from which we could compute estimates for the primary objective. 
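As a minimal illustration of how per-study accuracy estimates follow from 2x2 data, the sketch below computes sensitivity and specificity from the four cell counts. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2x2 table against the reference standard."""
    tp: float  # index test positive, reference standard positive
    fp: float  # index test positive, reference standard negative
    fn: float  # index test negative, reference standard positive
    tn: float  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive participants with a positive index test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative participants with a negative index test
        return self.tn / (self.tn + self.fp)


# Hypothetical study: 100 participants with COVID-19 and 100 without
example = TwoByTwo(tp=80, fp=30, fn=20, tn=70)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

These are per-study point estimates only; the pooled estimates reported in this review come from a bivariate meta-analysis model, which this sketch does not attempt to reproduce.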
Studies with fewer than 10 participants who underwent the index test and reference standard were excluded. Our focus was on studies that recruited participants suspected of having COVID-19 as outlined in the Target condition being diagnosed section. We included studies with 'symptomatic populations' or 'mixed populations' (asymptomatic and symptomatic participants). There were no age or gender restrictions. We also included 'asymptomatic populations' for the objective on imaging of asymptomatic individuals in this review. To reduce the effect of selection bias, we excluded studies that excluded participants who had normal index test results. The index tests were chest CT, chest X-ray, or ultrasound of the lungs, meeting the criteria described in the Index test(s) section. The roles of the test could have been a replacement of RT-PCR, an add-on test, a triage test, rapid testing, or used concurrently with other diagnostic tests. We included only index tests interpreted by humans, and not by an algorithm (machine learning/artificial intelligence (AI)). We included studies involving interpretation by an algorithm only if they provided data pertaining to diagnostic accuracy of human interpretation. Inclusion was limited to 'diagnostic test accuracy studies' in which the study authors explicitly indicated that the index test aims to distinguish between patients with and without COVID-19. Specifically, studies with index test readers either (1) using a radiological scoring system (e.g. CO-RADS), or (2) explicitly classifying patients as having a positive or negative imaging test were included. Studies that reported an overview of index test findings without explicitly classifying the imaging test as either COVID-19 positive or negative were excluded. There has been considerable heterogeneity and changes over time in the definitions used for positive imaging findings. 
Some groups have used constellations of specific findings (such as multiple peripheral ground-glass opacities on CT), some have used an approach in which they consider the combined effect of specific findings (a 'gestalt' approach), and some have used formal scoring systems, such as CO-RADS (five categories; Prokop 2020), the BSTI COVID-19 Reporting Templates (four categories; BSTI 2020), and the RSNA Expert Consensus on Reporting Chest CT Findings for COVID-19 (four categories; Simpson 2020). As such, we did not limit ourselves to a predefined definition or threshold for positivity. Instead, we extracted the definition for positivity used in each study, and the constellation of imaging features used to inform this definition. This offers an opportunity to determine if the definition of positivity contributes to variability in accuracy. As explained above, our target condition is COVID-19. However, we included all studies reporting data on COVID-19 or COVID-19 pneumonia that might provide data relevant to our objective. A positive diagnosis for COVID-19 by one or a combination of the following: 1. a positive RT-PCR test for SARS-CoV-2 infection, from any manufacturer in any country, and from any sample type, including nasopharyngeal swabs or aspirates, oropharyngeal swabs, bronchoalveolar lavage fluid, sputum, saliva, serum, urine, rectal or faecal samples; 2. positive on WHO criteria for COVID-19; 3. positive on China CDC criteria for COVID-19; 4. positive serology for SARS-CoV-2 antibodies in addition to consistent symptomatology; 5. positive on study-specific list of criteria for COVID-19 which includes other criteria (symptoms, other tests, infected contacts). A negative diagnosis for COVID-19 by one or a combination of the following: 1. suspected COVID-19 with negative RT-PCR test results, whether tested once or more than once; 2. currently healthy or with another disease (no RT-PCR test). 
Studies that used imaging as a part of the reference standard were excluded because of a risk of incorporation bias. We assessed methodological quality based on our judgement of how likely it was that the reference standard definition used in each study would correctly classify individuals as positive or negative for COVID-19. All reference standards are likely to be imperfect in some way; details of reference standard evaluation are provided in Appendix 2. We used a consensus process to agree on the classification of the reference standard as to what we regarded as good, moderate and poor. 'Good' reference standards need to have very little chance of misclassification; 'moderate', a small but acceptable risk; and 'poor', a larger and probably unacceptable risk. We used three different sources for our electronic searches through 17 February 2021, which were devised with the help of an experienced Cochrane Information Specialist with DTA expertise (RSp). These searches aimed to identify all articles related to COVID-19 and SARS-CoV-2 and were not restricted to those evaluating imaging tests. Thus, the searches used no terms that specifically focused on an index test, diagnostic accuracy or study methodology. Due to the increased volume of published and preprint articles, we used artificial intelligence text analysis from 25 May 2020 onwards to conduct an initial classification of documents as relevant or irrelevant, based on their title and abstract information. See Appendix 3. We used the COVID-19 living search results of the Institute of Social and Preventive Medicine (ISPM) at the University of Bern. This search includes PubMed, Embase and preprints indexed in bioRxiv and medRxiv databases. The strategies, as described on the ISPM website (ispmbern.github.io/covid-19), are shown in Appendix 4. We also included searches undertaken by Cochrane to develop the Cochrane COVID-19 Study Register. 
These include searches of trials registers at ClinicalTrials.gov and the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP), as well as PubMed (see Appendix 4 for details). Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID-19. We did not apply any language limits. We included Embase records within the CDC library on COVID-19 research articles database (see Appendix 4 for details) and deduplicated these against the Cochrane COVID-19 Study Register. We checked repositories of COVID-19 publications against these search results including the following. The review authors screened studies independently, in duplicate. A third, experienced review author resolved disagreements about initial title and abstract screening. We resolved disagreements about eligibility assessments through discussion between three review authors. The review authors performed data extraction independently, in duplicate. Three review authors discussed any disagreements to resolve them. For each study, we extracted 2x2 contingency tables of the number of true positives, false positives, false negatives and true negatives. If a study reported accuracy data for more than one index test reader, we took the average of the data from all readers to compute the average 2x2 contingency table (McGrath 2017). If a study reported accuracy data for both an AI algorithm and one or more radiologists, we extracted only the 2x2 contingency table corresponding to the radiologist accuracy data. If a study used multiple reference standards, but we could determine 2x2 contingency tables that included only RT-PCR as the reference standard, we extracted and analyzed these data. If a study reported accuracy data for multiple thresholds of index test positivity (e.g. studies that used the CO-RADS scoring system, and/or the RSNA scoring system), we extracted the 2x2 contingency tables for all available thresholds. 
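The reader-averaging rule described above can be sketched as a cell-wise mean of the per-reader 2x2 tables. This is an illustrative sketch under stated assumptions: the function name `average_readers` and the reader counts are hypothetical, and the text does not say whether averaged cells were rounded, so no rounding is applied here.

```python
from typing import Sequence, Tuple

# A 2x2 table as (TP, FP, FN, TN)
Table = Tuple[float, float, float, float]


def average_readers(tables: Sequence[Table]) -> Table:
    """Return the cell-wise mean of the 2x2 tables reported for each reader."""
    if not tables:
        raise ValueError("at least one reader table is required")
    n = len(tables)
    tp, fp, fn, tn = (sum(t[i] for t in tables) / n for i in range(4))
    return (tp, fp, fn, tn)


# Two hypothetical readers interpreting the same 200 participants
reader_a = (80, 30, 20, 70)
reader_b = (90, 20, 10, 80)
print(average_readers([reader_a, reader_b]))
```

Because every reader assesses the same participants, the averaged table keeps the same number of reference-positive and reference-negative participants as each individual reader table.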
Two of the 11 studies that used the CO-RADS scoring system did not report the 2x2 data for all five CO-RADS thresholds. For these two studies, we contacted the corresponding authors but could not obtain the complete data; thus, we were only able to extract data for a CO-RADS threshold of 3. One of the five studies that used the RSNA scoring system did not report the 2x2 data for all four RSNA thresholds. For this one study, we contacted the corresponding authors but could not obtain the complete data; thus we were only able to extract data for RSNA thresholds from 3 to 4 for this study. In addition, we extracted the following items. We used a bivariate model for meta-analyses, taking into account the within- and between-study variance, and the correlation between sensitivity and specificity across studies (Chu 2006; Reitsma 2005). We performed meta-analyses when four or more studies evaluated a given modality. We also performed sensitivity analyses by limiting inclusion in the meta-analysis to studies published in peer-reviewed journals. We undertook meta-analyses using metandi in STATA (Harbord 2009; StataCorp 2019). If a study reported accuracy data at multiple thresholds of index test positivity, we used the 2x2 contingency table corresponding to the threshold producing the highest Youden's Index (YI) (YI = sensitivity + specificity - 1) for inclusion in the meta-analysis. In addition, for studies that evaluated positive chest CT imaging in repeat RT-PCR positive results, we presented rates of positive imaging per study using forest plots. We used the same meta-analysis methods for all primary and secondary objectives (metandi and meqrlogit in STATA, specifically). We investigated heterogeneity by visual inspection of paired forest plots and summary receiver operating characteristics (SROC) plots.
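The threshold selection rule described above (keep the 2x2 table with the highest Youden's Index) can be sketched as follows. This is an illustration only: the function names `youden_index` and `best_threshold` are hypothetical, and the counts are invented for a single imaginary study with 100 cases and 100 non-cases.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives, true negatives)
Table = Tuple[int, int, int, int]


def youden_index(tp: int, fp: int, fn: int, tn: int) -> float:
    """YI = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def best_threshold(tables: Dict[int, Table]) -> int:
    """Return the positivity threshold whose 2x2 table maximizes YI."""
    return max(tables, key=lambda t: youden_index(*tables[t]))


# Hypothetical counts for one study reporting four positivity thresholds
example_tables: Dict[int, Table] = {
    2: (94, 55, 6, 45),
    3: (90, 30, 10, 70),
    4: (83, 16, 17, 84),
    5: (67, 8, 33, 92),
}
selected = best_threshold(example_tables)
print(selected, round(youden_index(*example_tables[selected]), 2))
```

If two thresholds tie, `max` returns the first one in dictionary order; the review does not state how ties were handled.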
For chest CT studies, we evaluated the impact of reference standard conduct (RT-PCR performed at least twice in all participants with initial negative results versus RT-PCR not done twice). For chest CT, chest X-ray and ultrasound of the lungs, we evaluated the definition for index test positivity (radiologist impression versus formal scoring system). To investigate the impact of these factors on accuracy estimates, we used meta-regression with the variable of interest added as a covariate to a bivariate model. Using the model parameters, we used a post estimation command to compute absolute differences in pooled sensitivity and specificity and we obtained their 95% CI using the delta method. We obtained P values using the Wald test. We performed meta-regression when variables of interest consisted of subgroups with five or more studies in each subgroup, an arbitrary threshold chosen to facilitate convergence of the analyses using the bivariate model. We undertook meta-regression using meqrlogit in STATA (StataCorp 2019). We performed meta-analyses using a bivariate model for studies that used common thresholds for test positivity (i.e. chest CT studies at CO-RADS thresholds 2, 3, 4 and 5 and chest CT studies at RSNA thresholds 2, 3 and 4). We used ggplot2 and ggforce in R to generate a plot displaying pooled accuracy estimates at varying CO-RADS and RSNA thresholds (Wickham 2016; Pedersen 2020; R Core Team 2021). We compared imaging modalities indirectly using meta-regression with modality type (i.e. chest CT, chest X-ray, and ultrasound of the lungs) added as a covariate to a bivariate model. We obtained P values using the Wald test. In future updates, as more data become available, we will also perform test comparisons that are restricted to only comparative studies (i.e. direct comparisons). It should be noted that there were not enough studies for direct comparisons.
We also generated a plot displaying meta-analysis results across Salameh 2020a, Islam 2020, Islam 2021 and this version of this review (i.e. pooled sensitivity and specificity estimates from Salameh 2020a published in September 2020, Islam 2020 published in November 2020, Islam 2021 published in February 2021, and this current version) using ggplot2 and ggforce in R (Wickham 2016; Pedersen 2020; R Core Team 2021). For this review, we did not undertake tests for publication bias and made no formal assessment of reporting bias. We provided a summary of the key findings of this review in Summary of findings 1, indicating the certainty of evidence for each finding and emphasizing the main gaps in our current level of available evidence. Islam 2020 and Islam 2021 contained studies up to 22 June 2020 and up to 30 September 2020 respectively. This fourth version contains the results of an updated search performed on 17 February 2021. We identified 7734 search results and imported 976 studies for screening. Subsequently, we removed 11 duplicates. We then screened a total of 965 unique references (published or preprint studies) for inclusion; this is inclusive of the 773 references we screened in Salameh 2020a, Islam 2020, and Islam 2021. Of the 188 records selected for full-text assessment, we included 98 studies in this review for all objectives. Of these 98 studies, 94 were included for evaluating the diagnostic accuracy of thoracic imaging in the evaluation of people with suspected COVID-19; of these 94 studies, four have been included since our initial review (Salameh 2020a), 12 have been included since the first update of this review (Islam 2020), and 29 have been included since the second update of this review (Islam 2021).
Furthermore, 10 studies of the 98 included in this review were included for evaluating the accuracy of thoracic imaging for imaging asymptomatic individuals, and eight were included for assessing the rate of positive imaging in individuals with initial RT-PCR negative results and positive RT-PCR results on follow-up. We included 94 studies (64 CT, 12 X-ray, 11 ultrasound, three both CT and X-ray, two both CT and ultrasound, and two both X-ray and ultrasound) with a total of 37,631 participants suspected of having COVID-19, of whom 19,768 (53%) had a final diagnosis of COVID-19. Suspicion could be on the basis of symptoms or epidemiological risk factors such as close contact with a confirmed case. The median sample size was 234 (interquartile range (IQR) 101.25 to 478.75). Sixty-five studies were conducted in Europe (Italy 19, the Netherlands 9, France 9, Belgium 5, Turkey 6, Germany 7, UK 4, Switzerland 2, Czech Republic 1, Ireland 1, Spain 1, Denmark 1), 19 were conducted in Asia (China 9, Korea 1, India 4, Iran 2, Japan 1, Pakistan 1, United Arab Emirates 1), and the remaining studies were conducted in North America (USA 6, Canada 1) and South America (Brazil 3). Index test readings were performed by radiologists in 49 studies (52%), radiology residents in two studies (2%), both radiologists and residents in three studies (4%), and radiographers and radiologists in one study (1%); 39 studies (37%) did not clearly report the level of training of readers. Technical parameters regarding the protocol of chest CT used in 69 studies were not clearly reported in 31 (44%) studies, while non-contrast CT was used in 25 (36%) studies, high-resolution chest CT was used in eight (11%) studies, low-dose CT with or without contrast was used in 11 (15%) studies and CT with IV contrast was used in five (7%) studies. Manuscripts of three (3%) of the studies were available only as preprints at the time of the search.
Characteristics of the included studies are summarized in Table 1, and outlined in detail in the Characteristics of included studies. All participants were suspected of having COVID-19. Seventy (74%) studies involved only symptomatic participants, 20 (21%) studies involved both symptomatic and asymptomatic participants, and four (4%) studies did not clearly report participants' symptom status. Fifty-seven studies included only adult participants (aged 16 years and over), 32 studies included both children and adults (although in most cases, only a minority of included patients were children), one study included only children, one study included participants aged 70 years and older, and the remaining three studies did not clearly report the age range of participants. All 94 studies used RT-PCR as the reference standard for the diagnosis of COVID-19, with 82 studies using only RT-PCR as the reference standard and seven studies using a combination of RT-PCR and other criteria (laboratory tests 2, clinical signs and symptoms 2, clinical signs on follow-up 1, positive contacts 1, and follow-up phone calls 1) as the reference standard. With respect to RT-PCR testing, eight studies tested each participant once, 42 studies tested some participants with initial negative RT-PCR results at least twice, 19 studies tested all participants with initial negative RT-PCR results at least twice, and 25 studies did not report on the frequency of testing per participant. Seventeen studies included inpatients, 65 studies included outpatients, one study included both in- and outpatients, while the remaining 23 studies were conducted in unclear settings. Thirty-three (35%) studies described the co-morbidities of the study population, which commonly included hypertension, cardiovascular disease, and diabetes; however, the overall presence of co-morbidities in the participant groups of these studies was unclear.
Cochrane Database of Systematic Reviews
We included eight studies (seven CT, one ultrasound) with a total of 198 participants suspected of having COVID-19, all of whom had a final diagnosis of COVID-19. All studies were also included for the primary objective. Seven studies were conducted in Europe (Italy 4, France 2, Belgium 1), and one was conducted in Asia (China). Index test readings were performed by radiologists in five studies (62%), while three studies (37%) did not clearly report the level of training of readers. Technical parameters regarding the protocol of chest CT used in seven studies were not clearly reported in two (29%) studies, while non-contrast CT was used in four (57%) studies and low-dose CT with or without contrast was used in one (14%) study. Characteristics of the included studies are summarized in Table 2, and outlined in detail in the Characteristics of included studies. Five studies included only adult participants (aged 16 years and over), and three studies included both children and adults. This reflects the fact that most participants were symptomatic and so had a relatively high pretest probability of COVID-19. All the studies used RT-PCR as the reference standard for the diagnosis of COVID-19. With respect to RT-PCR testing, one study tested all participants with initial negative RT-PCR results at least twice, and seven studies tested some participants with initial negative RT-PCR results at least twice. Five studies included outpatients, two studies included inpatients, while the remaining one study was conducted in an unclear setting. Three (37%) studies described the co-morbidities of the study population, which included hypertension, cardiovascular disease, diabetes, and asthma. However, the overall presence of co-morbidities in the participant groups of these studies was unclear.
We included 10 studies (Dafydd 2021; De Smet 2020; Dini 2020; Dogan 2020; Gumus 2020; Hernigou 2020; Hwang 2020; Ooi 2021; Puylaert 2020; Yassa 2020) (seven CT, one X-ray, two ultrasound) with a total of 2007 participants suspected of having COVID-19, of whom 127 (6%) had a final diagnosis of COVID-19. For example, one study included patients who had preoperative chest CT (Gumus 2020). Of these 10 studies, six were also included for the primary objective. Eight studies were conducted in Europe (Italy 1, UK 2, Belgium 2, the Netherlands 1, Turkey 3), and one was conducted in Korea. Index test readings were performed by radiologists in three studies (30%) and by a radiologist and a resident in one study (10%); the other six studies (60%) did not clearly report the level of training of readers. Technical parameters regarding the protocol of chest CT used in three studies were not clearly reported in six (60%) studies, while non-contrast CT was used in two (20%) studies, low-dose CT with or without contrast was used in one (10%) study and high-resolution CT was used in one (10%) study. Characteristics of the included studies are summarized in Table 3, and outlined in detail in the Characteristics of included studies. Six studies included only adult participants (aged 16 years and over), three studies included both children and adults, and one study included participants aged 70 years and older. All the studies used RT-PCR as the reference standard for the diagnosis of COVID-19. With respect to RT-PCR testing, two studies tested each participant once, one study tested all participants with initial negative RT-PCR results at least twice, five studies tested some participants with initial negative RT-PCR results at least twice, and two studies did not report on the frequency of testing per participant. Three studies included outpatients, five studies included inpatients, while the remaining two studies were conducted in unclear settings.
One study (10%) described the co-morbidities of the study population, which included hypertension, kidney disease, heart failure, and diabetes; however, the overall presence of co-morbidities in the participant groups of these studies was unclear. Our primary objective was to evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), X-ray and ultrasound) in people with suspected COVID-19. Also, we assessed the rate of positive imaging in people who had an initial RT-PCR negative result and a positive RT-PCR result on follow-up, and the diagnostic accuracy of thoracic imaging for screening COVID-19 in asymptomatic individuals. With respect to the primary objective, 87 studies evaluated a single imaging modality and seven studies evaluated two imaging modalities. In total, the 94 studies reported a total of 101 imaging modality evaluations for the diagnostic accuracy of thoracic imaging in people with suspected COVID-19. Chest CT was evaluated in 69 studies, chest X-ray was evaluated in 17 studies, and ultrasound of the lungs was evaluated in 15 studies. For the objective for positive imaging in repeat RT-PCR positive results, all studies evaluated a single imaging modality. Chest CT was evaluated in seven studies and ultrasound of the lungs was evaluated in one study. For the objective for asymptomatic screening, all studies evaluated a single imaging modality. Chest CT was evaluated in seven studies, chest X-ray was evaluated in one study, and ultrasound of the lungs was evaluated in two studies. Figure 2 provides a summary of the overall methodological quality assessment using the QUADAS-2 tool for all 98 included studies. Figure 3 displays a study-level quality assessment.
Across all 98 included studies, we found risk of bias based on concerns about the selection of participants to be high in 10 (10%) and unclear in 42 (42%) studies; the main concern in this domain was high risk of bias due to inappropriate exclusions (n = 10). Risk of bias because of concerns regarding application of chest CT (73 studies) was high in six (8%) and unclear in 27 (36%) studies; risk of bias because of concerns regarding application of chest X-ray (17 studies) was unclear in seven (41%) studies, and risk of bias because of concerns regarding application of ultrasound of the lungs (15 studies) was unclear in six (37%) studies. The six CT studies with a high risk of bias did not predefine the positivity criteria for index tests or did not blind index test readers to reference standard results (n = 1). Risk of bias based on concerns about the reference standard was high in 25 (26%) and unclear in 39 (39%) studies; the 25 studies with a high risk of bias used a single RT-PCR protocol that was not likely to correctly classify the target condition. Risk of bias based on concerns related to participant flow and timing was high in nine (9%) and unclear in 39 (41%) studies; the nine studies with a high risk of bias did not provide the same reference standard to all participants (n = 3), or did not have an appropriate time interval between the reference standard and index test (n = 6). Concerns about the applicability of the evidence to participants were high in three studies (3%) and unclear in five (5%) studies. Concerns about the applicability of the evidence to the index test were high in one (1.4%) and unclear in two (2.7%) of the 73 chest CT studies, high in two (12%) and unclear in one (6%) of the 17 chest X-ray studies, and unclear in one (6%) of the 15 ultrasound studies. Concerns about the applicability of the evidence to the reference standard were high in two (2%) studies and unclear in five (5%) studies.
Additional details about risk of bias and applicability assessment are presented in Figure 3. The forest plots for chest X-ray and ultrasound of the lungs are presented in Figure 6. The sensitivity of chest X-ray in 17 studies (including 5303 (62%) cases in 8529 participants) ranged from 44% to 94% and the specificity ranged from 24% to 93%. The pooled sensitivity for chest X-ray was 73.1% (95% CI 64.1 to 80.5) and the pooled specificity was 73.3% (95% CI 61.9 to 82.2). The scatter of the study points in ROC space on the SROC plot (Figure 7) shows substantial variability in sensitivity and specificity for chest X-ray. The sensitivity of ultrasound of the lungs in 15 studies (including 1158 (49%) cases in 2410 participants) ranged from 73% to 94% and the specificity ranged from 21% to 98%. The pooled sensitivity for ultrasound was 88.9% (95% CI 84.9 to 92.0), and the pooled specificity was 72.2% (95% CI 58.8 to 82.5). The scatter of the study points in ROC space on the SROC plot (Figure 8) shows substantial variability in sensitivity and specificity for ultrasound of the lungs. For CT studies with suspected participants, we excluded the three studies published as preprints and found this did not affect summary sensitivity and specificity; studies published in peer-reviewed journals (n = 66) had a pooled sensitivity of 87.5% (95% CI 84.3 to 90.1) and a pooled specificity of 78.0% (95% CI 72.9 to 82.4). These results are outlined in Table 4. The publication status of studies has been updated as of 17 February 2021. Investigations of heterogeneity found that reference standard conduct did not have an impact on accuracy of chest CT. Definition for index test positivity impacted the sensitivity, but not specificity, of chest CT. Definition for index test positivity did not impact the accuracies of chest X-ray or ultrasound. The results of the investigations of heterogeneity are outlined in Table 5.
Eleven studies that evaluated CT used the CO-RADS scoring system to define index test positivity. We obtained the 2x2 data at all five CO-RADS thresholds for nine studies; two studies only reported 2x2 data at a CO-RADS threshold of 3, and the authors could not provide any additional data. The forest plots of chest CT studies that used CO-RADS and reported 2x2 data for CO-RADS thresholds ≥ 2, ≥ 3, ≥ 4 and = 5 are presented in Figure 9. Table 6 and Figure 10 summarize the results.
Figure 10. Pooled sensitivity and specificity estimates and 95% confidence intervals at varying CO-RADS thresholds: CO-RADS 2 (n = 9), CO-RADS 3 (n = 11), CO-RADS 4 (n = 9), and CO-RADS 5 (n = 9).
• At a CO-RADS threshold of 5 (9 studies), the sensitivity ranged from 42% to 80% and the specificity ranged from 84% to 99%; the pooled sensitivity was 67.3% (95% CI 57.9 to 75.6) and the pooled specificity was 92.2% (95% CI 89.3 to 94.3).
• At a CO-RADS threshold of 4 (9 studies), the sensitivity ranged from 56% to 90% and the specificity ranged from 68% to 91%; the pooled sensitivity was 83.3% (95% CI 76.1 to 88.7) and the pooled specificity was 84.0% (95% CI 81.3 to 86.4).
• At a CO-RADS threshold of 3 (11 studies), the sensitivity ranged from 65% to 95% and the specificity ranged from 54% to 87%; the pooled sensitivity was 90.3% (95% CI 85.9 to 93.5) and the pooled specificity was 69.7% (95% CI 64.3 to 74.6).
• At a CO-RADS threshold of 2 (9 studies), the sensitivity ranged from 75% to 100% and the specificity ranged from 11% to 57%; the pooled sensitivity was 94.0% (95% CI 89.8 to 96.6) and the pooled specificity was 45.4% (95% CI 38.4 to 52.5).
• We did not perform meta-analysis for a CO-RADS threshold of 1, since at this threshold, all sensitivity values are equal to 1, and all specificity values are equal to 0.
Five studies that evaluated CT used the RSNA scoring system to define index test positivity.
We obtained the 2x2 data at all four RSNA thresholds for four studies; one study did not report 2x2 data at an RSNA threshold of 1 or 2, and the authors could not provide any additional data. The forest plots of chest CT studies that used RSNA and reported 2x2 data for RSNA thresholds 2, 3, and 4 are presented in Figure 11. Table 7 and Figure 12 summarize the results.
• At an RSNA threshold of 4 (5 studies), the sensitivity ranged from 34% to 88% and the specificity ranged from 74% to 97%; the pooled sensitivity was 68.9% (95% CI 47.1 to 84.7) and the pooled specificity was 90.1% (95% CI 79.4 to 94.4).
• At an RSNA threshold of 3 (5 studies), the sensitivity ranged from 50% to 97% and the specificity ranged from 57% to 80%; the pooled sensitivity was 87.6% (95% CI 69.4 to 95.7) and the pooled specificity was 63.4% (95% CI 57.1 to 69.2).
• At an RSNA threshold of 2 (4 studies), the sensitivity ranged from 55% to 100% and the specificity ranged from 10.7% to 43.6%; the pooled sensitivity was 91.6% (95% CI 67.1 to 98.3) and the pooled specificity was 27.9% (95% CI 17.0 to 42.1).
• We did not perform meta-analysis for an RSNA threshold of 1, since at this threshold, all sensitivity values are equal to 1, and all specificity values are equal to 0.
Indirect comparisons of modalities evaluated across all 94 studies in suspected participants indicated that chest CT (69 studies) and ultrasound (15 studies) gave higher sensitivity estimates than X-ray (P = 0.0003 and P = 0.001, respectively). Chest CT and ultrasound gave similar sensitivities (P = 0.42). All modalities had similar specificities (CT versus X-ray P = 0.36; CT versus ultrasound P = 0.32; X-ray versus ultrasound P = 0.89).
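The threshold behaviour reported above for the CO-RADS and RSNA scores (sensitivity falling and specificity rising as the positivity threshold increases, with sensitivity 1 and specificity 0 at a threshold of 1) follows directly from how an ordinal score is dichotomized. The Python sketch below illustrates this for a five-category score; the function name `accuracy_at_threshold` and the per-category counts are hypothetical and are not taken from any included study.

```python
from typing import Dict, Tuple


def accuracy_at_threshold(
    cases_by_score: Dict[int, int],
    noncases_by_score: Dict[int, int],
    threshold: int,
) -> Tuple[float, float]:
    """Treat score >= threshold as test positive; return (sensitivity, specificity)."""
    tp = sum(n for s, n in cases_by_score.items() if s >= threshold)
    fn = sum(n for s, n in cases_by_score.items() if s < threshold)
    fp = sum(n for s, n in noncases_by_score.items() if s >= threshold)
    tn = sum(n for s, n in noncases_by_score.items() if s < threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical numbers of participants in each score category (1 to 5)
cases = {1: 5, 2: 5, 3: 10, 4: 20, 5: 60}       # reference standard positive
noncases = {1: 45, 2: 25, 3: 15, 4: 10, 5: 5}   # reference standard negative

curve = {t: accuracy_at_threshold(cases, noncases, t) for t in range(1, 6)}
for t, (sens, spec) in curve.items():
    print(f"threshold {t}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

At a threshold of 1 every participant is test positive, so no meta-analysis is informative there, which matches the reason given in the text for omitting that threshold.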
For rate of positive imaging in repeat RT-PCR positive results (where initial RT-PCR was negative), we included eight studies (7 CT, 1 ultrasound) with a total of 198 participants suspected of having COVID-19, who had an initial negative RT-PCR test result and a positive result on repeat RT-PCR testing. For chest CT (7 studies, 177 participants), rate of positive imaging in repeat RT-PCR positive results (where initial RT-PCR was negative) ranged from 21% to 100%, and the pooled rate was 75.8% (95% CI 45.3 to 92.2). For ultrasound of the lungs (one study, 21 participants), the sensitivity was 90.4%. The forest plot of chest CT studies for repeat RT-PCR positive results where initial RT-PCR was negative is presented in Figure 13. We included 10 studies for imaging asymptomatic individuals (7 CT, 1 X-ray, 2 ultrasound). Figure 14 and Figure 15. With respect to chest X-ray, which was evaluated only in Islam 2021 and the current version, the specificities appear to be similar, while the sensitivity appears to slightly increase in the current version. With respect to ultrasound of the lungs, which was evaluated only in Islam 2021 and the current version, the sensitivities appear to be similar, while the specificity appears to increase in the current version. This is the fourth version of a Cochrane living systematic review evaluating the diagnostic accuracy of thoracic imaging (computed tomography (CT), chest X-ray and ultrasound) in the evaluation of people suspected to have COVID-19. This version of the review is based on published studies and preprints up to 17 February 2021.
There was no statistical evidence of the effect of reference standard conduct on the sensitivity or specificity of chest CT; studies that performed reverse transcriptase polymerase chain reaction (RT-PCR) testing at least twice for all initial negative results and studies that did not perform repeat RT-PCR testing for all initial negative results had similar sensitivities and specificities. These findings align with those of Salameh 2020a, Islam 2020 and Islam 2021. The definition used for index test positivity in chest CT studies appeared to impact sensitivity but not specificity, as studies that used radiologists' impressions showed higher sensitivities than those that used formal scoring systems. A possible explanation is that a 'threshold effect' seems to apply to the different definitions for index test positivity. Thus, there are differences in the interpretation of chest CT between the formal scoring system and radiologist impression groups. Chest X-ray (17 studies, 8529 participants with 5303 (62%) cases) demonstrated a sensitivity of 73.1% (95% CI 64.1 to 80.5), and a specificity of 73.3% (95% CI 61.9 to 82.2) for the diagnosis of COVID-19 in suspected participants. Compared to Islam 2021, the specificities appear to be similar, while the sensitivity appears to slightly increase in the current version. Ultrasound of the lungs (15 studies, 2410 participants with 1158 (49%) cases) demonstrated a sensitivity of 88.9% (95% CI 84.9 to 92.0), and a specificity of 72.2% (95% CI 58.8 to 82.5). Compared to Islam 2021, the sensitivities appear to be similar, while the specificity appears to increase in the current version. In chest CT studies that used the CO-RADS scoring system to define index test positivity (11 studies), as expected, when the threshold for index test positivity increased (i.e. from 2 to 5), sensitivity decreased and specificity increased. The same pattern can be seen for the RSNA scoring system.
In chest CT studies that used the RSNA scoring system to define index test positivity (5 studies), when the threshold for index test positivity increased (i.e. from 2 to 4), sensitivity decreased and specificity increased. Based on indirect comparisons of all included studies, chest CT and ultrasound gave higher sensitivity estimates than X-ray. Chest CT and ultrasound gave similar sensitivities. All modalities had similar specificities. The pooled rate of positive chest CT imaging (7 studies, 177 participants all of whom had a final diagnosis of COVID-19) in repeat RT-PCR positive results where initial RT-PCR was negative, was 75.8% (95% CI 45.3 to 92.2). We were unable to derive pooled rates for X-ray and ultrasound due to insufficient available data. in asymptomatic participants. We were unable to derive pooled accuracy estimates for screening with X-ray and ultrasound due to insufficient available data. Our findings show that imaging is not useful for screening asymptomatic patients. Based on the visual assessments of the ggplot graphs, with respect to the four versions of this review, the sensitivity estimates of chest CT appear to remain similar across Salameh 2020a, Islam 2020, Islam 2021, and this current version, while the specificity estimates of chest CT appear to increase with Islam 2020 and Islam 2021. However, the specificity estimates of chest CT appear to remain similar between Islam 2021 and the current version. Given the large number of chest CT studies included in the prior review, which provided sensitivity and specificity estimates with narrow confidence intervals, we had expected that sensitivity and specificity estimates of chest CT would not notably differ in future updates of this review. The results of the current review align with this expectation. For chest X-ray, the specificities between Islam 2021 and this current version appear to be similar, while the sensitivity appears to have slightly increased in the current version.
For ultrasound of the lungs, the sensitivities between Islam 2021 and this current version appear to be similar, while the specificity appears to have increased in the current version. Our search strategy was broad and allowed for identification of a wide range of articles about COVID-19 diagnosis. The review authors screened records, extracted data, and assessed study methodology independently and in duplicate. Though we are relatively confident in the accuracy and completeness of our findings, please inform us at mmcinnes@toh.ca should errors be found so that we can address them in a future update. Furthermore, compared to Salameh 2020a, Islam 2020, and Islam 2021, this current update includes a greater number of studies that evaluated accuracy estimates of imaging tests in the diagnosis of suspected COVID-19 participants. We included studies that involved only symptomatic participants, as well as studies that had a mixed population (i.e. symptomatic and asymptomatic participants). Thus, there may be situations when asymptomatic individuals are suspected of having COVID-19, such as if they have infected contacts or other risk factors for infection. However, not all the studies clearly reported information on participants' symptoms. We identified that how index test positivity is defined impacts chest CT sensitivity but not that of any other modality. These findings may suggest that the variables we investigated did not significantly contribute to variability; alternatively, there may be unmeasured confounding variables blurring our analyses. Due to insufficient granularity of data, we were unable to investigate additional potential sources of variability, particularly participant setting (inpatient versus outpatient). We plan to perform these analyses in future updates, when sufficient data become available. In this update, we addressed additional objectives of evaluating the rate of positive imaging in repeat RT-PCR positive results where initial RT-PCR was negative.
Furthermore, we evaluated the diagnostic accuracy of thoracic imaging (CT, chest X-ray and ultrasound) in asymptomatic individuals. We explored indirect comparisons of chest CT, chest X-ray and ultrasound of the lungs. Due to the limited number of studies that evaluated multiple imaging modalities in the same population, we did not formally evaluate direct comparisons of different imaging tests at this stage. We plan to conduct formal analyses of direct comparisons of imaging tests in future updates, as more studies with comparative designs become available. We were not able to evaluate accuracy estimates based on specific findings of imaging tests (e.g. ground-glass, consolidation, pleural effusion) or combinations of such findings because of the lack of data granularity reported in included studies; however, we will consider this in future updates of the review. We hope that in future versions of this review we will be able to evaluate these associations as research on the role of imaging tests in the diagnosis of COVID-19 evolves. It should be noted that any association between number of days after symptom onset, symptom severity and the findings on chest imaging for patients with COVID-19 might impact the diagnostic performance of chest CT in the future versions. The quality of the primary studies included in this review continues to impact the overall robustness of the review. Several studies failed to describe their participants (e.g. recruitment method), the details of reference standard conduct used for identifying COVID-19 cases, and the definition used for positivity of the imaging tests. In this version, half of all studies seemed to have low risk of bias data, while, in Islam 2021, most were high or unclear. Of the studies that did report recruitment methods, most reported including 'consecutive' participants. However, many of these studies did not actually recruit 'consecutive' participants that represent the target population (i.e.
individuals suspected of having COVID-19), but instead included all consecutive participants that underwent an imaging test and RT-PCR testing. These studies did not describe whether all suspected patients in the recruitment setting underwent both an imaging test and RT-PCR as a part of standard practice (which would result in a true 'consecutive' recruitment), or whether imaging tests were only performed in patients with specific clinical signs (e.g. severe symptoms). In studies where the latter situation is present, included participants may not represent the target population, and this could create selection bias.

We recommend that the accuracy estimates reported in this review be interpreted with caution because of the use of RT-PCR as the reference standard. RT-PCR is not always sensitive, and it is possible that chest CT may be more sensitive than the reference standard in some patients. However, our investigations of heterogeneity for chest CT studies did not identify different accuracy estimates between studies that used at least two RT-PCR test results to define disease-negative status and studies that used only one RT-PCR test result to define disease-negative status. At this stage, despite its limitations, RT-PCR remains the best tool for diagnosing COVID-19. However, the best reference standard may vary across clinical questions, settings, and populations (Korevaar 2020). In future updates of this review, we may consider the use of a latent-class bivariate model for meta-analysis, which adjusts for the imperfect accuracy of the reference standard (Butler-Laporte 2021).

Three out of 98 included studies (3%) were only available as preprints at the time of the search. We will update data extracted from these studies in future versions of our review as these studies become published in peer-reviewed journals.
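The caution above about using RT-PCR as an imperfect reference standard can be illustrated with a short calculation. The sketch below is not the latent-class bivariate model mentioned in the text; it only shows, under the simplifying assumption that the index test and the reference standard err independently given true disease status, how the apparent accuracy of an index test shifts when the reference standard misses some cases. The function name and all numeric inputs are hypothetical and are not taken from the review.

```python
def apparent_accuracy(se_index, sp_index, se_ref, sp_ref, prevalence):
    """Apparent (sensitivity, specificity) of an index test when it is
    judged against an imperfect reference standard.

    Assumption (illustrative only): index test and reference standard
    make errors independently, conditional on true disease status.
    """
    p = prevalence          # true disease prevalence
    q = 1.0 - prevalence    # true non-disease proportion

    # Probability that both tests are positive, and that the reference is positive
    both_pos = se_index * se_ref * p + (1 - sp_index) * (1 - sp_ref) * q
    ref_pos = se_ref * p + (1 - sp_ref) * q

    # Probability that both tests are negative, and that the reference is negative
    both_neg = sp_index * sp_ref * q + (1 - se_index) * (1 - se_ref) * p
    ref_neg = sp_ref * q + (1 - se_ref) * p

    return both_pos / ref_pos, both_neg / ref_neg


if __name__ == "__main__":
    # Hypothetical inputs: an index test with 90% sensitivity and specificity,
    # a reference standard with 80% sensitivity and 100% specificity,
    # and 50% prevalence among suspected participants.
    sens, spec = apparent_accuracy(0.9, 0.9, 0.8, 1.0, 0.5)
    print(f"apparent sensitivity = {sens:.3f}, apparent specificity = {spec:.3f}")
```

With these hypothetical inputs the apparent specificity falls below the true value, because cases missed by the reference standard but detected by the index test are counted as false positives. This is one reason why a model that adjusts for reference standard error may be of interest in future updates.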
As the studies in our cohort included participants with suspected COVID-19, our findings are applicable to individuals suspected to have COVID-19. Our search did not identify many studies that evaluated the accuracy of chest CT, ultrasound of the lungs, and chest X-ray for the diagnosis of COVID-19 in paediatric populations. Thus, the diagnostic accuracy of these modalities in children is not as well established. In addition, the lack of data in the included studies on the signs and symptoms of presenting cases, the severity of symptoms, and the timing of symptom onset adds complexity to the interpretation of the findings in this review. It should be noted that the results apply mostly to imaging interpreted by radiologists.

Our findings indicate that chest computed tomography (CT), chest X-ray and ultrasound all give higher proportions of positive results for individuals with COVID-19 as compared to those without. For ultrasound of the lungs, the chances of getting a positive result are 88.9% (95% CI 84.9 to 92.0) in individuals with COVID-19 and 23.7% (95% CI 13.3 to 33.8) in those without. Due to the limited availability of data, accuracy estimates of chest X-ray and ultrasound of the lungs for the diagnosis of COVID-19 in suspected participants should be carefully interpreted.

From our current pool of included studies, we can draw limited conclusions regarding the diagnostic performance of thoracic imaging modalities. Additional studies evaluating the accuracy of chest X-ray and ultrasound of the lungs for diagnosing COVID-19 in suspected patients are needed to allow for more reliable findings. In this update, we were unable to assess several objectives of interest due to the lack of available data required to formally evaluate direct comparisons of different imaging modalities, and the effect of time since onset of symptoms on the diagnostic performance of various index tests.
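The ultrasound figures quoted above (a positive result in 88.9% of individuals with COVID-19 and in 23.7% of those without) can be re-expressed as likelihood ratios and predictive values. The following is a minimal sketch of that arithmetic. It uses the point estimates only and ignores the confidence intervals, and the prevalence passed to `predictive_values` is a hypothetical input chosen for illustration, not a figure from the review. The function names are also illustrative.

```python
def likelihood_ratios(sensitivity, false_positive_rate):
    """Return (LR+, LR-) from sensitivity and the false-positive rate."""
    specificity = 1.0 - false_positive_rate
    lr_pos = sensitivity / false_positive_rate
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity, false_positive_rate, prevalence):
    """Return (PPV, NPV) at an assumed prevalence among those tested."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    true_neg = (1.0 - false_positive_rate) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Point estimates for ultrasound of the lungs as quoted in the text
    sens, fpr = 0.889, 0.237
    lr_pos, lr_neg = likelihood_ratios(sens, fpr)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

    # Hypothetical prevalence values, for illustration only
    for prevalence in (0.1, 0.3, 0.5):
        ppv, npv = predictive_values(sens, fpr, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Because predictive values depend strongly on prevalence, the same pooled estimates can imply quite different post-test probabilities in different settings, which is consistent with the call above for careful interpretation.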
Future studies should ideally pre-define positive imaging findings and include direct comparisons of the various modalities of interest in the same participant population in order to provide robust and reliable data. Furthermore, improved transparency and reporting are necessary for more efficient data extraction in updated versions of this review. We encourage authors and investigators to refer to the STARD 2015 checklist (Bossuyt 2015; Hong 2018) to ensure that any relevant information is clearly reported in their studies. Also, the uncertainty resulting from high or unclear risk of bias of included studies limits our ability to draw confident conclusions from our results. We hope that future updates of this review include more informative studies to allow for additional investigations of variability with improved power and further evaluations of additional objectives.

Members of the Cochrane COVID-19 Diagnostic Test Accuracy Review Group include the following.
• The wider team of systematic reviewers from University of Birmingham, UK who assisted with title and abstract screening across the entire suite of reviews for the diagnosis of COVID-19 (Agarwal R, Baldwin S, Berhane S, Herd C, Kristunas C, Quinn L, Scholefield B).

We thank Dr Jane Cunningham (World Health Organization) for participation in technical discussions and comments on the manuscript.

Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Low concern
DOMAIN 2: Index Test (Chest CT)
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified?
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard?
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
DOMAIN 2: Index Test (Ultrasound of the lungs)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard?
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Ultrasound of the lungs)
Is the reference standard likely to correctly classify the target condition?
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Boussouar 2020 (Continued)
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
DOMAIN 2: Index Test (Ultrasound of the lungs)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Caruso 2020 (Continued)
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard?
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Ultrasound of the lungs)
Cozzi 2020 (Continued)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Unclear
Were all patients included in the analysis? Yes
Cozzi 2020 (Continued)
Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard?
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? No
High risk
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
De Smet 2020 (Continued)
Was a consecutive or random sample of patients enrolled? Yes
Was a case-control design avoided? Yes
Did the study avoid inappropriate exclusions? Yes
Are there concerns that the included patients and setting do not match the review question? Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Low concern
Was there an appropriate interval between index test and reference standard?
Definition for positive diagnosis on CT:
1. any one of the following:
a. single, multiple, or diffuse GGO, with thickened blood vessels and thickened bronchial shadows passing through, with or without localised lobular septal grid thickening
b. single or multiple real shadows
2. re-examination 3-5 days later showed that the original GGO or consolidation range increased, the number increased, or accompanied by pleural effusion on one or both sides
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Low concern
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Deng 2020 (Continued)
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Dimeglio 2021 (Continued)
Low concern
DOMAIN 2: Index Test (Chest CT)
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Yes
Unclear risk
Low concern
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Dini 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Unclear
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition? Unclear
Djangang 2020 (Continued)
Were the reference standard results interpreted without knowledge of the results of the index tests?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Djangang 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (Chest CT)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? No
High risk
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Unclear risk
Dogan 2020
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Dogan 2020 (Continued)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Dogan 2020 (Continued)
Was a consecutive or random sample of patients enrolled? Yes
Was a case-control design avoided? Yes
Did the study avoid inappropriate exclusions? Yes
Are there concerns that the included patients and setting do not match the review question? Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
High risk
Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Low concern
Was there an appropriate interval between index test and reference standard?
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? No
High risk
High
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Erxleben 2021 (Continued)
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Ferda 2020 (Continued)
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Ultrasound of the lungs)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Fink 2021 (Continued)
Low concern
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Fonsi 2020 (Continued)
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Fonsi 2020 (Continued)
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Gietema 2020 (Continued)
Low concern
DOMAIN 2: Index Test (Chest CT)
DOMAIN 2: Index Test (Chest X-ray)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes Low concern Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Library Trusted evidence. Informed decisions. Better health. Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? No Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Unclear Haak 2021 (Continued) Low concern Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? 
Unclear Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes He 2020 (Continued) Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Yes Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Hermans 2020 (Continued) Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Hermans 2020 (Continued) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Unclear risk Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? 
Yes Were all patients included in the analysis? Yes Hernigou 2020 (Continued) Herpe 2020 Are there concerns that the included patients and setting do not match the review question? Low concern Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Herpe 2020 (Continued) Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Herpe 2020 (Continued) Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Unclear Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Hwang 2020 (Continued) Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Yes Unclear risk Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Ippolito 2020 (Continued) Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Unclear Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) DOMAIN 2: Index Test (Chest X-ray) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? 
Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Jalil 2020 (Continued) High risk Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Krdzalic 2020 (Continued) Low concern Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Lieveld 2021a (Continued) Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? 
Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern DOMAIN 4: Flow and Timing Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Luo 2020a (Continued) High risk Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Majeed 2020 (Continued) Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? 
Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Majeed 2020 (Continued) Low concern Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Mei 2020 (Continued) Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? Yes If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? 
Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern DOMAIN 4: Flow and Timing Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Unclear Murphy 2020 (Continued) Low concern Were the index test results interpreted without knowledge of the results of the reference standard? Yes If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear Narinx 2020 (Continued) Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? 
Low concern Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Narinx 2020 (Continued) Did the study avoid inappropriate exclusions? Yes Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? Yes If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern DOMAIN 4: Flow and Timing Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) Is the reference standards likely to correctly classify the target condition? 
Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard?
Definition for positive diagnosis on X-ray: positive if the report included infection in the differential, as defined by words such as opacity, consolidation, or airspace disease; negative if no abnormality was noted, an abnormality was noted but attributed to a non-infectious aetiology, or the report was inconclusive for an infectious process. Definition for positive diagnosis on ultrasound: positive if any B-lines were detected. Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? No Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Pare 2020 (Continued) Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard?
Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Low concern DOMAIN 2: Index Test (Chest CT) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 2: Index Test (Chest X-ray) DOMAIN 2: Index Test (Ultrasound of the lungs) Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern DOMAIN 4: Flow and Timing Any one of the following: a) Single, multiple, or diffuse ground-glass opacity, with thickened blood vessels and thickened bronchial shadows passing through, with or without localized lobular septal grid thickening; b) Single or multiple real shadows, (2) Reexamination 3 to 5 days later showed that the original ground-glass opacity or consolidation range increased, the number in- RT-PCR once 0.7 On the final report, patients were rated as "Surely COVID+" when presenting with peripheral, bilateral, or multifocal GGO of rounded morphology ± consolidation or crazy paving, reversed halo sign, or subpleural bands of consolidations. 
Patients were rated as "Possible COVID+" when presenting with multifocal, diffuse, peripheral, or unilateral GGO ± consolidation lacking a specific distribution and non-rounded or non-peripheral or with only few very small GGO with a non-rounded and non-peripheral distribution or Radiologist RT-PCR twice, in some with initial negative results 0.4 CT features were classified as "typical," "indeterminate," "atypical," and "negative" for COVID-19 pneumonia, according to RSNA expert consensus Radiologist. RT-PCR twice, in some with initial negative results 0 A structured report about the probability of COVID-19 pneumonia Resident RT-PCR twice, in some with initial negative results 0.6 Positive findings for COVID-19 defined as bilateral, multifocal, multilobar ground glass opacities with or without sub-segmental consolidations or crazy paving pattern in a peripheral distribution (Han 2020; Lee 2020; Simpson 2020) Negative findings defined as presence of isolated lobar consolidation, pleural effusion, nodularity and absence of the positive findings of COVID-19. Indeterminate cases defined as having multilobar ground glass opacities or consolidation with central or diffuse distribution lacking subpleural pattern or unilateral ground glass opacities; these were further categorized as positive or negative for COVID-19 on the basis of clinical history, mutual consensus and RT-PCR results, if available. RT-PCR twice, in some with initial negative results 0 Computed tomography images were divided into 3 groups: normal, consistent with COVID-19, and inconsistent with COVID-19. Multifocal consolidation, ground-glass opacity, and reversed halo sign on CT were considered to be consistent with COVID-19.
Table 5. Meta-regression analyses for chest CT, X-ray, and US of suspected cases
Abbreviations: CI: confidence interval; CT: computed tomography; US: ultrasound; RT-PCR: reverse transcription polymerase chain reaction.
RT-PCR) test for SARS-CoV-2 infection, from any manufacturer in any country, from any source, including nasopharyngeal swabs or aspirates, oropharyngeal swabs, bronchoalveolar lavage fluid (BALF), sputum, saliva, serum, urine, rectal or faecal samples
• Positive on WHO criteria for COVID-19 which includes some testing RT-PCR negative
• Positive on China CDC criteria for COVID-19 which includes some testing RT-PCR negative
• Positive serology in addition to consistent symptomatology
• Positive on study specific list of criteria for COVID-19 which includes some testing RT-PCR negative
• Other criteria (symptoms, imaging findings, other tests)
A negative diagnosis for COVID-19 by the following
• COVID suspects with negative RT-PCR test results, whether tested once or more than once
• Current healthy or with another disease (no RT-PCR test)
Although RT-PCR is considered the best available test, it is suspected of missing a substantial proportion of cases, and thus may not be the ideal reference standard if used as a standalone test.
Copyright © 2022 The Authors. Cochrane Database of Systematic Reviews published by John Wiley & Sons, Ltd. on behalf of The Cochrane Collaboration.
René Spijker: the Dutch Cochrane Centre (DCC) has received grants for performing commissioned systematic reviews.
Junfeng Wang received a consultancy fee from Biomind, an Artificial Intelligence (AI) company providing machine intelligence solutions in medical imaging. The consultancy service was about design of clinical studies, not related to this review.
The company had no influence , Commonwealth and Development Office (FCDO), UK
• Government of Ontario Ministry of Health COVID-19 Rapid Response Research Grant program, Canada
• University of Ottawa Faculty of Medicine COVID-19 Pandemic Response Funding Program
Risk of bias assessment
The criteria for the index test and reference standard domains of the QUADAS-2 tool were modified for this update (Appendix 2). For studies that used formal scoring systems with clearly defined thresholds, even if the signalling question about using a 'prespecified threshold' was 'unclear' or 'no', the index test domain was not considered to have an 'unclear' or 'high' risk of bias based on the 'prespecified threshold' signalling question. For studies that used RT-PCR testing as the reference standard, even if the signalling question about 'blinding' was 'unclear' or 'no', the reference standard domain was not considered to have an 'unclear' or 'high' risk of bias based on the 'blinding' signalling question.
Authors' judgement
Presented below are all the data for all of the tests entered into the review.
• COVID-19: coronavirus disease 2019, the clinical manifestations/symptoms caused by infection with SARS-CoV-2, name given to the disease associated with the virus SARS-CoV-2
• COVID-19 pneumonia: COVID-19 that presents as infection-inflammation of the lungs
• Index test: the test that is being assessed (the index test will often be a new test)
• False negative: the test does not detect a condition in someone when it is present
• False positive: the test detects a condition in someone when it is not present
• Negative predictive value: the probability that someone who has tested negative for the target condition with the index test will really not have it (a true negative)
• Positive predictive value: the probability that someone who has tested positive for the target condition with the index test will actually have it (a true positive)
• Reference standard: the most reliable method for determining if the target condition is present or absent, used to verify index test results. This could be a combination of tests.
• RT-PCR: reverse transcription polymerase chain reaction (RT-PCR) is a laboratory technique that combines reverse transcription of RNA into DNA and amplification of specific DNA targets using polymerase chain reaction. In this context it is used to detect the presence of SARS-CoV-2 RNA
• SARS-CoV-2: severe acute respiratory syndrome coronavirus 2, the name given to the 2019 novel coronavirus
• SARS-CoV-2 infection: people infected with severe acute respiratory syndrome coronavirus 2, but who may or may not have any clinical manifestations of infection
• Secondary care: medical care that is provided by a specialist or facility upon referral by a primary care physician and that requires more specialized knowledge, skill, or equipment than the primary care physician can provide
• Sensitivity: the proportion of people with the target condition (with disease) that are correctly identified by the index test
• Specificity: the proportion of people without the target condition (without disease) that are correctly identified by the index test
• Tertiary care: specialized care, usually for inpatients and on referral from a primary or secondary health professional, in a facility that has personnel and facilities for advanced medical investigation and treatment
• Target condition: the disease or condition of interest
• True negative: a correct diagnosis of a condition being absent
• True positive: a correct diagnosis of a condition being present
Appendix 2. QUADAS-2
Imaging studies of the chest (computed tomography (CT), chest X-ray and ultrasound) for diagnosis of COVID-19
People with suspected COVID-19
All settings, in particular secondary care, emergency care and ICUs
In people presenting with suspected COVID-19; suspicion may be based on prior testing, such as general lab testing.
Signs and symptoms often used for triage or referral
A positive diagnosis for COVID-19 by the following.
way; details of reference standard evaluation are provided in the 'Risk of bias' tool below. We will use a consensus process to agree the classification of the reference standard as to what we regard as good, moderate and poor.
'Good' reference standards need to have very little chance of misclassification, 'moderate', a small but acceptable risk, 'poor', a larger and probably unacceptable risk.
Was a consecutive or random sample of patients enrolled?
YES: if a study explicitly states that all participants within a certain time frame were included; that this was done consecutively; or that a random selection was done.
NO: if it is clear that a different selection procedure was employed; e.g. selection based on clinician's preference, or based on institutions (i.e. 'convenience' series).
UNCLEAR: if the selection procedure is not clear or not reported at all.
Was a case-control design avoided?
YES: if a study explicitly states that all participants came from the same group of (suspected) patients.
NO: if it is clear that a different selection procedure was employed for the participants depending on their COVID-19 status (e.g. proven infected patients in one group and proven non-infected patients in the other group).
UNCLEAR: if the selection procedure is not clear or not reported at all.
Did the study avoid inappropriate in- or exclusions?
This needs to be addressed on a case-to-case basis.
YES: if all eligible patients were more or less equally suspected of having COVID-19 and were included and if the numbers in the flow chart show not too many excluded participants (a maximum of 20% of eligible patients excluded without reasons).
NO: if over 20% of eligible patients were excluded without providing a reason; if only proven patients were included, or only proven non-patients were included; if in a retrospective study participants without index test or reference standard result were excluded; if exclusion was based on severity assessment post-factum or comorbidities (cardiovascular disease, diabetes, immunosuppression); or if the study oversampled patients with particular characteristics likely to affect estimates of accuracy.
UNCLEAR: if the exclusion criteria are not reported.
HIGH: if one or more signalling questions were answered with NO, as any deviation from the selection process may lead to bias.
LOW: if all signalling questions were answered with YES.

Is there concern that the included patients do not match the review question?
This needs to be addressed on a case-by-case basis, based on the objective that the included study addresses.
HIGH: if accuracy was assessed in a case-control design, or the study was able to only estimate sensitivity or specificity.
UNCLEAR: if the selection method was unclearly reported.

Were the index test results interpreted without knowledge of the results of the reference standard?
YES: if blinding was explicitly stated or the index test was recorded before the results from the reference standard were available.
NO: if it was explicitly stated that the index test results were interpreted with knowledge of the results of the reference standard.
UNCLEAR: if blinding was unclearly reported.

If a threshold was used, was it prespecified?
YES: for any of these index tests it is highly unlikely that any numerical threshold is used. Still, we expect studies to report their criteria for test positivity (e.g. the constellation of imaging findings used). If these criteria are reported in the methods section, we will score 'YES' for this question.
NO: if the optimal criterion for test positivity was based on the reported data (for example, different scores on a quantitative scoring system), we will score 'NO'.
UNCLEAR: if the criteria for test positivity were not reported or were unclearly reported.

HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.
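The domain-level rule stated above (HIGH if any signalling question is answered NO, LOW if all are answered YES) can be sketched as a small function. This is an illustrative sketch only, not the review authors' tooling: the function name `domain_risk_of_bias` is hypothetical, and returning UNCLEAR for every remaining combination is an assumption, since the text here does not spell out that case. Case-specific exceptions are not encoded.

```python
from typing import Iterable

ALLOWED_ANSWERS = {"YES", "NO", "UNCLEAR"}


def domain_risk_of_bias(answers: Iterable[str]) -> str:
    """Combine signalling-question answers into one domain judgement.

    HIGH and LOW follow the rule quoted in the text. UNCLEAR for any
    other combination is an assumption made for this sketch.
    """
    normalized = [answer.strip().upper() for answer in answers]
    for answer in normalized:
        if answer not in ALLOWED_ANSWERS:
            raise ValueError(f"unexpected answer: {answer!r}")

    # One or more NO answers is enough for a HIGH judgement.
    if "NO" in normalized:
        return "HIGH"
    # LOW requires every signalling question to be answered YES.
    if normalized and all(answer == "YES" for answer in normalized):
        return "LOW"
    # Assumed fallback: at least one UNCLEAR answer and no NO answers.
    return "UNCLEAR"


if __name__ == "__main__":
    print(domain_risk_of_bias(["YES", "YES", "YES"]))
    print(domain_risk_of_bias(["YES", "NO", "UNCLEAR"]))
    print(domain_risk_of_bias(["YES", "UNCLEAR"]))
```

Checking for NO first means that a single NO answer outweighs any number of YES or UNCLEAR answers, which matches the "one or more" wording of the rule.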
Note: For studies that use formal scoring systems with clearly defined thresholds, even if the signalling question about using a 'prespecified threshold' is 'unclear' or 'no', this domain should not be considered as having an 'unclear' or 'high' risk of bias based on the aforementioned question.

Is there concern that the index test, its conduct, or interpretation differ from the review question?
There is not a huge amount of variability from a technical perspective. Therefore, this question will probably be answered 'LOW' in all cases except when assessments are made using personnel not available in practice, or personnel not trained for the job, or using modalities that are uncommon in practice. We will consult expert clinicians on a case-by-case basis to judge this question.

Is the reference standard likely to correctly classify the target condition?
YES: for COVID-19: RT-PCR, done by trained personnel, and repeated after a first negative RT-PCR, following guidelines for confirmed cases and done with an assay targeting a minimum of 2 targets in the genes N, E, S or RdRP (one target is acceptable in a zone with known transmission). To clarify, a low risk of bias reference standard for a true negative would require 2 (or more) negative RT-PCR results.

Is there concern that the target condition as defined by the reference standard does not match the review question?
HIGH: there is a high concern regarding applicability of the reference standard if the reference standard actually measures a different target condition than the one we are interested in for the review. For example, if the diagnosis is only based on clinical picture, without excluding other possible causes of this clinical picture (e.g. other respiratory pathogens), then there is considerable concern that the reference standard is actually measuring something other than COVID-19.
In addition, a positive RT-PCR only measures SARS-CoV-2 infection and not COVID-19, and therefore the reference standard for COVID-19 is a combination of positive RT-PCR and symptoms and/or imaging findings.

A more efficient approach was required to keep up with the rapidly increasing volume of COVID-19 literature. A classification model for COVID-19 diagnostic studies was built with the model building function within EPPI-Reviewer, which uses the standard SGDClassifier in Scikit-learn on word trigrams. As outputs, new documents receive a percentage (from the predict_proba function) where scores close to 100 indicate a high probability of belonging to the class 'relevant document' and scores close to 0 indicate a low probability of belonging to the class 'relevant document'. We used three iterations of manual screening (title and abstract screening, followed by full-text review) to build and test classifiers. The final included studies were used as relevant documents, while the remainder of the COVID-19 studies were used as irrelevant documents. The classifier was trained on the first round of selected articles, and tested and retrained on the second round of selected articles. Testing on the second round of selected articles revealed poor positive predictive value but 100% sensitivity at a cut-off of 10. The poor positive predictive value was mainly due to the broad scope of our topic (all diagnostic studies in COVID-19), poor reporting in abstracts, and a small set of included documents. The model was retrained using the articles selected for the second and third rounds of screening, which added a considerable number of additional documents. This led to a large increase in positive predictive value, at the cost of a lower sensitivity, which led us to reduce the cut-off to 5. The largest proportion of documents had a score between 0 and 5. This set did not contain any of the relevant documents.
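The trade-off between sensitivity and positive predictive value at a given cut-off, as described above, can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the code used in the review: the function name `screen_at_cutoff`, the example scores and the relevance labels are all hypothetical, and documents are assumed to be sent for manual screening when their score is at or above the cut-off. In practice the scores would be the classifier percentages described in the text.

```python
from typing import Optional, Sequence, Tuple


def screen_at_cutoff(
    scores: Sequence[float],
    relevant: Sequence[bool],
    cutoff: float,
) -> Tuple[Optional[float], Optional[float], float]:
    """Return (sensitivity, positive predictive value, fraction flagged).

    A document is flagged for manual screening when score >= cutoff
    (an assumption made for this sketch).
    """
    if not scores or len(scores) != len(relevant):
        raise ValueError("scores and relevant must be non-empty and equal in length")

    flagged = [score >= cutoff for score in scores]
    true_pos = sum(1 for f, r in zip(flagged, relevant) if f and r)
    false_neg = sum(1 for f, r in zip(flagged, relevant) if not f and r)
    false_pos = sum(1 for f, r in zip(flagged, relevant) if f and not r)

    # Sensitivity: share of relevant documents that were flagged.
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else None
    # PPV: share of flagged documents that are relevant.
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else None
    return sensitivity, ppv, sum(flagged) / len(scores)


if __name__ == "__main__":
    # Hypothetical scores (0-100) and manual screening decisions.
    example_scores = [2, 4, 7, 12, 35, 60, 88, 95]
    example_relevant = [False, False, True, False, False, True, False, True]
    for cutoff in (10, 5):
        print(cutoff, screen_at_cutoff(example_scores, example_relevant, cutoff))
```

With these made-up numbers, lowering the cut-off flags more documents for manual screening and recovers a relevant document that the higher cut-off would have missed.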
This version of the classifier, with a cut-off of 5, was used in subsequent rounds and accounted for approximately 80% of the screening burden.

From 27 April 2020, we retrieved the curated bioRxiv/medRxiv dataset link
Embase: ncov OR (wuhan AND corona) OR COVID
bioRxiv/medRxiv: ncov or corona or wuhan or COVID

14 April 2022. New citation required and conclusions have changed. The results for chest X-ray and ultrasound have changed. The results for chest computed tomography (CT) have changed.

All authors reviewed, edited, contributed to, and approved this review update. The search was performed by RS, MMGL, and LH.

test; determining if there is an association between number of days after symptom onset, symptom severity and the findings on thoracic imaging for patients with COVID-19; and determining the rate of alternative diagnoses identified by thoracic imaging. We had planned to undertake additional sensitivity analyses to determine whether low risk of bias for all QUADAS-2 domains had an effect on findings. However, since most included studies had an overall high or unclear risk of bias due to study design and only two studies had an overall low risk of bias, it was not possible to undertake these analyses. Our protocol included additional sources of heterogeneity to be evaluated, such as disease prevalence, participant symptoms (severity), timing of symptom onset, participant co-morbidities and other potential candidate variables. Due to the lack of available data, we did not investigate these covariates.

Islam 2021 included studies of cross-sectional or case-control designs that either: 1. reported specific criteria for index test positivity (i.e. used a scoring system, such as CO-RADS); 2. did not report specific criteria, but had the index test reader(s) explicitly classify the imaging test result as either COVID-19 positive or negative; or 3.
reported an overview of index test findings, without having the index test reader(s) explicitly classify index tests as either COVID-19 positive or negative.

The inclusion of case-control studies may have been a source of bias, as the disease prevalence in the samples of these types of studies does not represent the prevalence in the target population. The inclusion of studies that only reported an overview of index test findings (i.e. studies not intended to be 'diagnostic test accuracy studies') was a possible source of bias identified by sensitivity analysis in Islam 2021 and may have limited our ability to evaluate the sensitivity and specificity of chest CT, chest X-ray and ultrasound. In this update, we excluded studies with case-control designs, and studies that only reported an overview of index test findings without having the index test reader(s) explicitly classify index tests as either COVID-19 positive or negative. The body of evidence has grown to the point that sufficient studies that meet these preferred criteria are now available.

Investigations of variability were limited in Islam 2021 due to limited available data. The assessment of secondary objectives, such as the association between number of days after symptom onset, symptom severity and the findings on thoracic imaging for patients with COVID-19, was also not possible. In this update, we evaluated the impact of reference standard conduct (RT-PCR, performed at least twice in all initial negative results versus RT-PCR, not performed at least twice in all initial negative results) and definition used for index test positivity (formal scoring system versus radiologist impression), but we were unable to conduct further investigations of variability due to limited available data. We also formally evaluated the impact of threshold effects on accuracy estimates in this update, particularly for studies that used the CO-RADS scoring system.
We were unable to evaluate threshold effects in other types of formal scoring systems due to the limited number of included studies that used other systems.

Of the studies included in Islam 2021, several failed to clearly report key information about their study design, as well as their methods for recruiting participants and delivering the reference standard. Therefore, data derived from these studies may have a high risk of bias, and the reporting quality and weaknesses of these primary studies affected the overall robustness of our review. In this update, several included studies also failed to report key information and had a high or unclear risk of bias with respect to participant selection, index test, reference standard, and participant flow.

The interpretation of the accuracy estimates in Islam 2021 involved several uncertainties. While RT-PCR is considered the best available test, it is not always sensitive; sensitivity depends on the timing of specimen collection, with high sensitivity around the onset of symptoms and during the symptomatic period but lower sensitivity before and after that window (Kucirka 2020), and collection of an appropriate specimen for testing can also be challenging. RT-PCR alone may not be the ideal reference standard (Li 2020b; Loeffelholz 2020), and it is possible that chest CT may be more sensitive than the reference standard in some patients, as some patients identified as having a false-positive diagnosis on CT may have been missed by the RT-PCR test. In this update, similar uncertainties with respect to the use of RT-PCR as the reference standard exist. However, our meta-regression analyses for studies that performed RT-PCR testing at least twice for all participants with initial negative results (i.e.
studies that addressed, to some extent, the low sensitivity of RT-PCR testing by conducting at least two RT-PCR tests to define disease-negative status) compared with studies that did not perform repeat RT-PCR testing for all participants with initial negative results, did not identify significantly different accuracy estimates between the groups. The quality of reporting and the design of the included studies also affected the generalizability of our findings and our ability to assess their validity.