key: cord-0839253-5za3izyi authors: Islam, Nayaar; Ebrahimzadeh, Sanam; Salameh, Jean-Paul; Kazi, Sakib; Fabiano, Nicholas; Treanor, Lee; Absi, Marissa; Hallgrimson, Zachary; Leeflang, Mariska MG; Hooft, Lotty; Pol, Christian B; Prager, Ross; Hare, Samanjit S; Dennie, Carole; Spijker, René; Deeks, Jonathan J; Dinnes, Jacqueline; Jenniskens, Kevin; Korevaar, Daniël A; Cohen, Jérémie F; Van den Bruel, Ann; Takwoingi, Yemisi; de Wijgert, Janneke; Damen, Johanna AAG; Wang, Junfeng; McInnes, Matthew DF title: Thoracic imaging tests for the diagnosis of COVID‐19 date: 2021-03-16 journal: Cochrane Database Syst Rev DOI: 10.1002/14651858.cd013639.pub4 sha: 93c54e3bfe55365525d8d42e13edd10a385435ae doc_id: 839253 cord_uid: 5za3izyi BACKGROUND: The respiratory illness caused by SARS‐CoV‐2 infection continues to present diagnostic challenges. Our 2020 edition of this review showed thoracic (chest) imaging to be sensitive and moderately specific in the diagnosis of coronavirus disease 2019 (COVID‐19). In this update, we include new relevant studies, and have removed studies with case‐control designs, and those not intended to be diagnostic test accuracy studies. OBJECTIVES: To evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), X‐ray and ultrasound) in people with suspected COVID‐19. SEARCH METHODS: We searched the COVID‐19 Living Evidence Database from the University of Bern, the Cochrane COVID‐19 Study Register, The Stephen B. Thacker CDC Library, and repositories of COVID‐19 publications through to 30 September 2020. We did not apply any language restrictions. SELECTION CRITERIA: We included studies of all designs, except for case‐control, that recruited participants of any age group suspected to have COVID‐19 and that reported estimates of test accuracy or provided data from which we could compute estimates. DATA COLLECTION AND ANALYSIS: The review authors independently and in duplicate screened articles, extracted data and assessed risk of bias and applicability concerns using the QUADAS‐2 domain‐list. We presented the results of estimated sensitivity and specificity using paired forest plots, and we summarised pooled estimates in tables. We used a bivariate meta‐analysis model where appropriate. We presented the uncertainty of accuracy estimates using 95% confidence intervals (CIs). MAIN RESULTS: We included 51 studies with 19,775 participants suspected of having COVID‐19, of whom 10,155 (51%) had a final diagnosis of COVID‐19. Forty‐seven studies evaluated one imaging modality each, and four studies evaluated two imaging modalities each. All studies used RT‐PCR as the reference standard for the diagnosis of COVID‐19, with 47 studies using only RT‐PCR and four studies using a combination of RT‐PCR and other criteria (such as clinical signs, imaging tests, positive contacts, and follow‐up phone calls) as the reference standard. Studies were conducted in Europe (33), Asia (13), North America (3) and South America (2); including only adults (26), all ages (21), children only (1), adults over 70 years (1), and unclear (2); in inpatients (2), outpatients (32), and setting unclear (17). Risk of bias was high or unclear in thirty‐two (63%) studies with respect to participant selection, 40 (78%) studies with respect to reference standard, 30 (59%) studies with respect to index test, and 24 (47%) studies with respect to participant flow. 
For chest CT (41 studies, 16,133 participants, 8110 (50%) cases), the sensitivity ranged from 56.3% to 100%, and specificity ranged from 25.4% to 97.4%. The pooled sensitivity of chest CT was 87.9% (95% CI 84.6 to 90.6) and the pooled specificity was 80.0% (95% CI 74.9 to 84.3). There was no statistical evidence indicating that reference standard conduct and definition for index test positivity were sources of heterogeneity for CT studies. Nine chest CT studies (2807 participants, 1139 (41%) cases) used the COVID‐19 Reporting and Data System (CO‐RADS) scoring system, which has five thresholds to define index test positivity. At a CO‐RADS threshold of 5 (7 studies), the sensitivity ranged from 41.5% to 77.9% and the pooled sensitivity was 67.0% (95% CI 56.4 to 76.2); the specificity ranged from 83.5% to 96.2%; and the pooled specificity was 91.3% (95% CI 87.6 to 94.0). At a CO‐RADS threshold of 4 (7 studies), the sensitivity ranged from 56.3% to 92.9% and the pooled sensitivity was 83.5% (95% CI 74.4 to 89.7); the specificity ranged from 77.2% to 90.4% and the pooled specificity was 83.6% (95% CI 80.5 to 86.4). For chest X‐ray (9 studies, 3694 participants, 2111 (57%) cases) the sensitivity ranged from 51.9% to 94.4% and specificity ranged from 40.4% to 88.9%. The pooled sensitivity of chest X‐ray was 80.6% (95% CI 69.1 to 88.6) and the pooled specificity was 71.5% (95% CI 59.8 to 80.8). For ultrasound of the lungs (5 studies, 446 participants, 211 (47%) cases) the sensitivity ranged from 68.2% to 96.8% and specificity ranged from 21.3% to 78.9%. The pooled sensitivity of ultrasound was 86.4% (95% CI 72.7 to 93.9) and the pooled specificity was 54.6% (95% CI 35.3 to 72.6). Based on an indirect comparison using all included studies, chest CT had a higher specificity than ultrasound. For indirect comparisons of chest CT and chest X‐ray, or chest X‐ray and ultrasound, the data did not show differences in specificity or sensitivity. AUTHORS' CONCLUSIONS: Our findings indicate that chest CT is sensitive and moderately specific for the diagnosis of COVID‐19. Chest X‐ray is moderately sensitive and moderately specific for the diagnosis of COVID‐19. Ultrasound is sensitive but not specific for the diagnosis of COVID‐19. Thus, chest CT and ultrasound may have more utility for excluding COVID‐19 than for differentiating SARS‐CoV‐2 infection from other causes of respiratory illness. Future diagnostic accuracy studies should pre‐define positive imaging findings, include direct comparisons of the various modalities of interest in the same participant population, and implement improved reporting practices.
A positive diagnosis for COVID-19 by one or a combination of the following.
• A positive RT-PCR test for SARS-CoV-2 infection, from any manufacturer in any country, from any source, including nasopharyngeal swabs or aspirates, oropharyngeal swabs, bronchoalveolar lavage fluid (BALF), sputum, saliva, serum, urine, rectal or faecal samples
• Positive on WHO criteria for COVID-19, which includes some testing RT-PCR-negative
• Positive on China CDC criteria for COVID-19, which includes some testing RT-PCR-negative
• Positive serology in addition to consistent symptomatology
• Positive on study-specific list of criteria for COVID-19, which includes some testing RT-PCR-negative
• Other criteria (symptoms, imaging findings, other tests, infected contacts)

A negative diagnosis for COVID-19 by one or a combination of the following.
• People with suspected COVID-19 with negative RT-PCR test results, whether tested once or more than once
• Currently healthy or with another disease (no RT-PCR test)

Risk of bias:
• Participant selection: high in 10 (20%) studies and unclear in 22 (43%) studies
• Application of index tests - chest CT: high in 5/41 (12%) studies and unclear in 15/41 (37%) studies
• Application of index tests - chest X-ray: unclear in 6/9 (67%) studies
• Application of index tests - ultrasound of the lungs: unclear in 4/5 (80%) studies
• Reference standard: high in 20 (39%) studies and unclear in 20 (39%) studies
• Flow and timing: high in 2 (3.9%) studies and unclear in 22 (43%) studies

Applicability concerns:
• Participants: high in 1 (2%) study
• Index test - chest CT: low in all 41 studies
• Index test - chest X-ray: high in 1/9 (11%) study and unclear in 1/9 (11%) study
• Index test - ultrasound of the lungs: unclear in 1/5 (20%) study
• Reference standard: low in all 51 studies

• We included 51 studies (19,775 participants suspected of having COVID-19, 10,155 (51%) cases)
• Studies evaluated chest CT scans (41 studies), chest X-ray (9 studies) and ultrasound of the lungs (5 studies)
• Chest CT was sensitive and moderately specific in the diagnosis of COVID-19 in suspected cases.
• Chest X-ray was moderately sensitive and moderately specific in the diagnosis of COVID-19 in suspected cases.
• Ultrasound of the lungs was sensitive, but not specific, in the diagnosis of COVID-19 in suspected cases.
• Sensitivity analysis in chest CT studies showed that publication status had a minimal effect on our findings.
• The 'threshold' effect in chest CT studies that used the CO-RADS scoring system demonstrated a trade-off between sensitivity and specificity; as the threshold for index test positivity increased from 2 to 5, sensitivity decreased and specificity increased.
• There was no statistical evidence indicating that reference standard conduct and definition of index test positivity were sources of heterogeneity for chest CT studies.
• Indirect test comparisons showed that chest CT has a higher specificity than ultrasound. Chest CT and ultrasound have similar sensitivities, chest CT and chest X-ray have similar sensitivities and specificities, and chest X-ray and ultrasound have similar sensitivities and specificities.

Given various prevalence settings, the predicted numbers of individuals receiving a false positive result or a false negative (missed) result per 1000 people undergoing chest CT, chest X-ray, and ultrasound of the lungs are presented in the summary of findings table (not reproduced here; a worked sketch of how such projections follow from sensitivity, specificity and prevalence is given below).

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and resulting coronavirus disease 2019 pandemic continue to present diagnostic evaluation challenges. While the World Health Organization (WHO) considers laboratory confirmation, such as a positive reverse transcriptase polymerase chain reaction (RT-PCR) result, to be the standard for diagnosing COVID-19, the value of imaging tests in the diagnostic pathway remains undefined (WHO 2020). Research on the role of imaging in COVID-19 patients is evolving, and more refined assessment methods for imaging tests, such as the COVID-19 Reporting and Data System (CO-RADS), are being investigated (Prokop 2020).

Decisions about patient and isolation pathways for COVID-19 vary according to health services and settings, available resources, and outbreaks in different settings. They will change over time, as accurate tests, effective treatments, and vaccines are identified. The decision points between these pathways vary, but all include points at which knowledge of the accuracy of diagnostic information is needed to inform medical decisions. Therefore, it is essential to understand the accuracy of tests and diagnostic features to develop effective diagnostic and management pathways for different settings. This supports strategies aiming to identify those who are infected, and consequently the management of patients either through isolation precautions, contact tracing, quarantine, hospital admission or admission to a specialised facility, admission to the intensive care unit, or initiation of specific therapies, and implementation of mitigation strategies to limit the spread of the disease.

This review from the suite of Cochrane 'living systematic reviews' summarises evidence on the accuracy of different imaging tests and diagnostic features in participants regardless of their symptoms. Estimates of accuracy from this review will help inform diagnostic, screening, isolation, and patient-management decisions. We have included an explanation of terminology and acronyms in Appendix 1.

The target condition being evaluated is COVID-19, the illness following acute infection with SARS-CoV-2 (Datta 2020). People infected with SARS-CoV-2 can be asymptomatic; these people are not considered to have COVID-19 and thus not within the scope of this review.
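The per-1000 projections referred to above follow directly from pooled sensitivity, pooled specificity and an assumed pre-test prevalence. The sketch below (in R, which the review also used for some of its figures) illustrates only that arithmetic, using the pooled chest CT estimates reported in this update (sensitivity 87.9%, specificity 80.0%); the prevalences of 10% and 50% are illustrative assumptions, not values taken from the summary of findings table.

```r
# Project expected test outcomes per 1000 people tested, given
# sensitivity, specificity and pre-test prevalence.
project_per_1000 <- function(sensitivity, specificity, prevalence, n = 1000) {
  diseased     <- n * prevalence
  non_diseased <- n - diseased
  data.frame(
    prevalence      = prevalence,
    true_positives  = round(diseased * sensitivity),
    false_negatives = round(diseased * (1 - sensitivity)),    # missed cases
    false_positives = round(non_diseased * (1 - specificity)),
    true_negatives  = round(non_diseased * specificity)
  )
}

# Pooled chest CT estimates from this review; the two prevalences are illustrative.
do.call(rbind, lapply(c(0.10, 0.50),
                      function(p) project_per_1000(0.879, 0.800, p)))
```

At a prevalence of 50%, for example, these pooled estimates correspond to roughly 100 false positive and about 60 false negative (missed) results per 1000 people examined with chest CT.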
People with COVID-19 can have a wide variety of symptoms, including fever, sore throat, diarrhoea, dyspnoea, headache, chest pain, stomach ache, nausea, loss of taste, loss of smell, myalgia, fatigue, runny nose, cough, aches, and lethargy (either without difficulty breathing at rest or with shortness of breath and increased respiratory rate potentially requiring supplemental oxygen or mechanical ventilation). Furthermore, in people diagnosed with a pulmonary condition (e.g. pulmonary embolism), symptoms could be indicative of COVID-19, or could be a manifestation of the pre-existing condition. In this review, we focused on people suspected of having COVID-19 who had thoracic imaging as part of their evaluation or care.

Chest CT refers to the acquisition of images of the chest using computed tomography. Typical imaging protocols would not use intravenous (IV) contrast; however, in this review we considered all variations of imaging protocols with the exception of studies specifically targeted at evaluating the coronary arteries or the heart, which did not include the entire lungs in the field of view. This includes, but is not limited to, non-contrast chest CT, low-dose chest CT (with or without contrast), high-resolution chest CT, and chest CT with IV contrast (routine or pulmonary angiogram).

Chest radiography refers to the evaluation of the lungs using X-rays. This often involves two orthogonal views, posterior-anterior (PA) and lateral, but may be done by a portable machine and only acquire an anterior-posterior (AP) view. In this review, we considered any and all variations of chest radiography protocols that evaluated the lungs. We did not include protocols that did not include the entire thorax and were done for reasons other than for assessment of pulmonary status (e.g. assessment of feeding tube position, which typically only includes the lower thorax, or dedicated evaluation of the ribs).

Ultrasound of the lungs refers to any ultrasound of the thorax done with the intention of evaluating the status of the lungs. This includes, but is not limited to, point-of-care ultrasound, done at the bedside by a physician, as well as what is often termed 'consultative' ultrasound, which is done by a technologist and subsequently interpreted by a physician (typically a radiologist). We considered all possible technical parameters (e.g. type of probe, transducer frequency, use of contrast). This did not include ultrasound done with the intended purpose of evaluating only the heart or vessels of the chest.

At present, the optimal diagnostic pathway and the role of thoracic imaging for identifying people with COVID-19 are unclear. Compared to RT-PCR testing, a potential major advantage of thoracic imaging is that results are available faster and that it provides a better insight into the status of the lungs. However, chest CT imaging is typically only available in secondary and tertiary healthcare settings, and availability varies across these settings. Given the rapid progression of COVID-19 and the constantly evolving evidence base, the diagnostic accuracy needed to inform the utility of thoracic imaging in these pathways is difficult to estimate.

This 'living systematic review' aims to identify data regarding the diagnostic accuracy of thoracic imaging in people with suspected COVID-19. This represents our second update of this 'living systematic review' (Islam 2020). Other Cochrane diagnostic test accuracy (DTA) reviews in the suite of reviews address the following tests.
1. Signs and symptoms, which will be mainly used in primary care, including when presenting at the emergency department (Struyf 2020)
2. Routine laboratory testing, such as for C-reactive protein (CRP) and procalcitonin (PCT) (Stegeman 2020)
3. Antibody tests (Deeks 2020)
4. Laboratory-independent point-of-care and near-patient molecular and antigen tests (Dinnes 2020)
5. Molecular laboratory tests

In our initial review, studies that only included confirmed cases of COVID-19 reported high pooled sensitivities for chest CT and X-ray: 93.1% (95% CI 90.2 to 95.0) and 82.1% (95% CI 62.5 to 92.7), respectively (Salameh 2020a). Thirteen studies that assessed chest CT in participants with suspected COVID-19 demonstrated a sensitivity of 86.2% (95% CI 71.9 to 93.8) but a low specificity of 18.1% (95% CI 3.71 to 55.8). This indicated a lack of discrimination, as the chances of getting a positive chest CT result are 86% in patients with a SARS-CoV-2 infection and 82% (i.e. 100% minus the specificity of 18.1%) in patients without. We did not evaluate accuracy estimates for chest X-ray and ultrasound of the lungs in participants with suspected COVID-19 in the initial review as these data were not available.

The first update of this review focused on people suspected of having COVID-19 and excluded studies evaluating only confirmed cases of COVID-19 (Islam 2020). Thirty-one studies that evaluated chest CT in suspected participants demonstrated a pooled sensitivity of 89.9% (95% CI 85.7 to 92.9) and a pooled specificity of 61.1% (95% CI 42.3 to 77.1). This indicated that chest CT performs well in identifying COVID-19, but may have limited capability in differentiating SARS-CoV-2 infection from other causes of respiratory illness. We did not identify publication status as a source of variability for accuracy estimates of chest CT, and further investigations of additional variables were not possible due to limited data. We were not able to evaluate pooled accuracy estimates for chest X-ray and ultrasound of the lungs in participants with suspected COVID-19 in the first update of this review due to limited data. We did explore the value of formal scoring systems for the evaluation of index tests, and 'threshold' effects of index test positivity; however, we could not perform formal analyses due to the limited number of included studies.

Compared to the first update, this second update has stricter inclusion criteria, excluding studies of case-control design and those that report an overview of index test findings without explicitly classifying the imaging test as either COVID-19 positive or negative. We included more studies in this update, and we evaluated both chest X-ray and ultrasound of the lungs in addition to chest CT. Furthermore, this update formally assesses the value of formal scoring systems, 'threshold' effects, and time trends of accuracy estimates of chest CT, as well as indirect comparisons with respect to accuracy of imaging modalities (i.e. chest CT, X-ray and ultrasound).

Evolving research on imaging tests in COVID-19 patients includes the use of formal scoring systems to evaluate imaging tests, which offer the potential for improved specificity. Formal scoring systems include CO-RADS (Prokop 2020), the British Society of Thoracic Imaging (BSTI) COVID-19 Reporting Templates (BSTI 2020), and the Radiological Society of North America (RSNA) Expert Consensus on Reporting Chest CT Findings for COVID-19 (Simpson 2020).
In the initial version of this review, most studies either did not specify what criteria were used for index test positivity, or used 'any abnormality' to define index test positivity. In the first update of this review, we explored the value of formal scoring systems but we could not formally analyse them due to a limited number of studies that used these systems. In this update, as well as in future updates of this review, we will evaluate the value of formal scoring systems and the impact of 'threshold' effects of index test positivity on accuracy estimates of imaging tests (Irwig 1995).

To evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), chest X-ray and ultrasound) in the evaluation of people with suspected COVID-19. To evaluate 'threshold' effects of index test positivity on accuracy.

We kept the eligibility criteria broad to be able to include all settings and all variations of a test. We included studies of all designs, with the exception of case-control studies. Studies had to include participants suspected of having the target condition and produce estimates of test accuracy or provide 2x2 data (true positive (TP), true negative (TN), false positive (FP), false negative (FN)) from which we could compute estimates for the primary objective. If data were not available, we contacted study authors for additional data if the study met the primary objective only. Studies with fewer than 10 participants who underwent the index test and reference standard were excluded.

Our focus was on studies that recruited participants suspected of having COVID-19 as outlined in the Target condition being diagnosed section. We included studies with 'symptomatic populations' or 'mixed populations' (asymptomatic and symptomatic participants). There were no age or gender restrictions.

The index tests were chest CT, chest X-ray, or ultrasound of the lungs, meeting the criteria described in the Index test(s) section. The roles of the test could have been a replacement of RT-PCR, an add-on test, a triage test, rapid testing, or used concurrently with other diagnostic tests. We included only index tests interpreted by humans, and not by an algorithm (machine learning/artificial intelligence (AI)). We included studies involving interpretation by an algorithm only if they provided data pertaining to diagnostic accuracy of human interpretation. Inclusion was limited to 'diagnostic test accuracy studies' in which the study authors explicitly indicated that the index test aims to distinguish between patients with and without COVID-19. Specifically, we included studies with index test readers either (1) using a radiological scoring system (e.g. CO-RADS), or (2) explicitly classifying patients as having a positive or negative imaging test. Studies that reported an overview of index test findings without explicitly classifying the imaging test as either COVID-19 positive or negative were excluded. Since COVID-19 is such a new disease, and the imaging findings were unknown until recently, there is considerable heterogeneity and change in the definitions used for positivity.
Some groups have used constellations of specific findings (such as multiple peripheral ground-glass opacities on CT), some have used an approach in which they consider the combined effect of specific findings (a 'gestalt' approach), and some have used formal scoring systems, such as CO-RADS (5 categories; Prokop 2020), the BSTI COVID-19 Reporting Templates (4 categories; BSTI 2020), and the RSNA Expert Consensus on Reporting Chest CT Findings for COVID-19 (4 categories; Simpson 2020). As such, we did not limit ourselves to a predefined definition or threshold for positivity. Instead, we extracted the definition for positivity used in each study, and the constellation of imaging features used to inform this definition. This offers an opportunity to determine if the definition of positivity contributes to variability in accuracy.

As explained above, our target condition is COVID-19. However, we included all studies reporting data on COVID-19 or COVID-19 pneumonia that might provide data relevant to our objective.

A positive diagnosis for COVID-19 by one or a combination of the following:
1. a positive RT-PCR test for SARS-CoV-2 infection, from any manufacturer in any country, and from any sample type, including nasopharyngeal swabs or aspirates, oropharyngeal swabs, bronchoalveolar lavage fluid, sputum, saliva, serum, urine, rectal or faecal samples;
2. positive on WHO criteria for COVID-19;
3. positive on China CDC criteria for COVID-19;
4. positive serology for SARS-CoV-2 antibodies in addition to consistent symptomatology;
5. positive on study-specific list of criteria for COVID-19 which includes:
   a. other criteria (symptoms, imaging findings, other tests, infected contacts).

A negative diagnosis for COVID-19 by one or a combination of the following:
1. suspected COVID-19 with negative RT-PCR test results, whether tested once or more than once;
2. currently healthy or with another disease (no RT-PCR test).

We assessed methodological quality based on our judgement of how likely it was that the reference standard definition used in each study would correctly classify individuals as positive or negative for COVID-19. All reference standards are likely to be imperfect in some way; details of reference standard evaluation are provided in Appendix 2. We used a consensus process to agree on the classification of the reference standard as to what we regarded as good, moderate and poor. 'Good' reference standards need to have very little chance of misclassification; 'moderate', a small but acceptable risk; and 'poor', a larger and probably unacceptable risk.

We used three different sources for our electronic searches through 30 September 2020, which were devised with the help of an experienced Cochrane Information Specialist with DTA expertise (RSp). These searches aimed to identify all articles related to COVID-19 and SARS-CoV-2 and were not restricted to those evaluating imaging tests. Thus, the searches used no terms that specifically focused on an index test, diagnostic accuracy or study methodology. Due to the increased volume of published and preprint articles, we used artificial intelligence text analysis from 25 May 2020 onwards to conduct an initial classification of documents, based on their title and abstract information, into relevant and irrelevant documents (see Appendix 3).

We used the COVID-19 living search results of the Institute of Social and Preventive Medicine (ISPM) at the University of Bern.
This search includes PubMed, Embase and preprints indexed in the bioRxiv and medRxiv databases. The strategies, as described on the ISPM website (ispmbern.github.io/covid-19), are shown in Appendix 4. We also used the Cochrane COVID-19 Study Register, which includes records from the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP), as well as PubMed (see Appendix 4 for details). Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID-19. We did not apply any language limits. We included Embase records within the CDC library on COVID-19 research articles database (see Appendix 4 for details) and deduplicated these against the Cochrane COVID-19 Study Register. We checked repositories of COVID-19 publications against these search results.

The review authors screened studies independently, in duplicate. A third, experienced review author resolved disagreements about initial title and abstract screening. We resolved disagreements about eligibility assessments through discussion between three review authors. The review authors performed data extraction independently, in duplicate. Three review authors discussed any disagreements to resolve them.

For each study, we extracted 2x2 contingency tables of the number of true positives, false positives, false negatives and true negatives. If a study reported accuracy data for more than one index test reader, we took the average of the data from all readers to compute the average 2x2 contingency table (McGrath 2017). If a study reported accuracy data for both an AI algorithm and one or more radiologists, we extracted only the 2x2 contingency table corresponding to the radiologist accuracy data. If a study used multiple reference standards, but we could determine 2x2 contingency tables that included only RT-PCR as the reference standard, we extracted and analysed these data. If a study reported accuracy data for multiple thresholds of index test positivity (e.g. studies that used the CO-RADS scoring system), we extracted the 2x2 contingency table for all available thresholds. Three of the nine studies that used the CO-RADS scoring system did not report the 2x2 data for all five CO-RADS thresholds. We contacted the corresponding authors and successfully received the complete data for one of the three studies. For the two remaining studies, we were only able to extract data for a CO-RADS threshold of 3.

In addition, we extracted the following items.
1. Study setting (including country), age of study participants, study dates, disease prevalence at the time of acquisition (as reported in the study), number of participants, participant symptoms, number of imaging studies (and if more than one study was done per participant), participant outcomes and other relevant participant demographic parameters
2. Study design
3. Imaging timing relative to disease course
4. CT, chest X-ray and ultrasound findings
5. Criteria for 'positive' diagnosis of COVID-19 on imaging
6. Index test technical parameters
7. Reference standard results and details; if RT-PCR was performed, timing of test, number of tests and method of acquisition (or similar details regarding other reference standards used)
8. Details regarding interpretation of the index test (level of training, number of readers, the inter-observer variability)
9. The number of true positives, false positives, false negatives and true negatives, or summary statistics from which they can be computed
10. Participant co-morbidities as described in the studies

The review authors assessed the risk of bias and applicability concerns independently, in duplicate, using the QUADAS-2 domain-list. Three review authors resolved any disagreements through discussion. See Appendix 2 for an explanation of the operationalisation of the four QUADAS-2 domains: participant selection, index test(s), reference standard(s), and flow and timing.

We presented estimates of sensitivity and specificity using paired forest plots and we summarised pooled estimates in tables. We analysed the data on a participant level, not a lesion or lung-segment level, since this is what determines care. We used a bivariate model for meta-analyses, taking into account the within- and between-study variance, and the correlation between sensitivity and specificity across studies (Chu 2006; Reitsma 2005). We performed meta-analyses when four or more studies evaluated a given modality. We also performed sensitivity analyses by limiting inclusion in the meta-analysis to studies published in peer-reviewed journals. We undertook meta-analyses using metandi in STATA (Harbord 2009; StataCorp 2019). If a study reported accuracy data at multiple thresholds of index test positivity, we used the 2x2 contingency table corresponding to the threshold producing the highest Youden's Index (YI; YI = sensitivity + specificity − 1) for inclusion in the meta-analysis.

We investigated heterogeneity by visual inspection of paired forest plots and summary receiver operating characteristic (SROC) plots. For chest CT studies, we evaluated the impact of reference standard conduct (RT-PCR performed at least twice in all participants with initial negative results versus RT-PCR performed only once in all participants with initial negative results or RT-PCR performed twice in some but not all participants with initial negative results), and definition for index test positivity (radiologist impression versus formal scoring system), on accuracy estimates using meta-regression with the variable of interest added as a covariate to a bivariate model. Using the model parameters, we used a post-estimation command to compute absolute differences in pooled sensitivity and specificity, and we obtained their 95% CIs using the delta method. We obtained P values using the Wald test. We performed meta-regression when variables of interest consisted of subgroups with five or more studies in each subgroup, an arbitrary threshold chosen to facilitate convergence of the analyses using the bivariate model. We undertook meta-regression using meqrlogit in STATA (StataCorp 2019). If a study within a subgroup reported accuracy data at multiple thresholds of index test positivity, we used the 2x2 contingency table corresponding to the threshold producing the highest YI for inclusion in the meta-regression.

We performed meta-analyses using a bivariate model for studies that used common thresholds for test positivity (i.e. chest CT studies at CO-RADS thresholds 2, 3, 4 and 5). We used ggplot2 and ggforce in R to generate a plot displaying pooled accuracy estimates at varying CO-RADS thresholds (Wickham 2016; Pedersen 2020; R Core Team 2021).
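To make the threshold-selection rule above concrete, the following minimal R sketch computes sensitivity, specificity and Youden's index from 2x2 counts reported at several CO-RADS thresholds and keeps the table with the highest index. The counts are hypothetical, invented only for demonstration (the review itself carried out this step as part of its STATA-based workflow).

```r
# Sensitivity, specificity and Youden's index from a 2x2 table.
accuracy <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(sensitivity = sens, specificity = spec, youden = sens + spec - 1)
}

# Hypothetical counts for one study reporting several CO-RADS positivity thresholds.
thresholds <- data.frame(
  corads = 2:5,
  tp = c(96, 92, 84, 67),
  fp = c(55, 30, 16, 8),
  fn = c(4, 8, 16, 33),
  tn = c(45, 70, 84, 92)
)

results <- cbind(thresholds,
                 t(mapply(accuracy, thresholds$tp, thresholds$fp,
                          thresholds$fn, thresholds$tn)))

# The 2x2 table carried into the meta-analysis: the threshold with the highest YI.
results[which.max(results$youden), ]
```

The same per-study selection can be applied before pooling, exactly as described for the meta-analysis and meta-regression above.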
We undertook comparisons of test accuracy between imaging modalities, regardless of whether or not studies compared imaging modalities head-to-head in the same study population (i.e. indirect comparison). We performed this using meta-regression with modality type (i.e. chest CT, chest X-ray, and ultrasound of the lungs) added as a covariate to a bivariate model. We obtained P values using the Wald test. In future updates, as more data become available, we will also perform test comparisons that are restricted to only comparative studies (i.e. direct comparisons).

For chest CT studies, we performed univariate, cumulative, random-effects meta-analyses of sensitivity and specificity with logit transforms using STATA (StataCorp 2019). We incorporated primary studies sequentially in the meta-analysis, according to their rank with respect to publication date, to iteratively recalculate summary estimates of logit sensitivity, logit specificity and their variances (Cohen 2016; Lau 1992). Then we assessed time trends by fitting a weighted linear regression model in which the summary estimate up to and including a given primary study is modelled as a linear function of rank of publication, with a first-order autoregressive process to account for the correlation between successive estimates. We generated plots displaying changes in cumulative logit sensitivity and cumulative logit specificity over time. The above cumulative meta-analyses were restricted to the chest CT studies included in the current review. We also generated a plot displaying meta-analysis results across all versions of this review (i.e. pooled sensitivity and specificity estimates from the initial version published in September 2020 (Salameh 2020a), the first update published in November 2020 (Islam 2020), and this current update) using ggplot2 and ggforce in R (Wickham 2016; Pedersen 2020; R Core Team 2021).

For this review, we did not undertake tests for publication bias and made no formal assessment of reporting bias. We provided a summary of the key findings of this review in Summary of findings 1, indicating the strength of evidence for each finding and emphasising the main gaps in our current level of available evidence.

The prior version of this review contained studies up to 22 June 2020 (Islam 2020). This updated review contains the results of an updated search performed on 30 September 2020. Given the substantial number of studies published since 30 September 2020, we plan to update this review again to include studies up to February 2021.

We identified 4734 search results and imported 782 studies for screening. Subsequently, we removed nine duplicates. We then screened a total of 773 unique references (published or preprint studies) for inclusion; this is inclusive of the 668 references we screened in our previous reviews. Of the 358 records selected for full-text assessment, we included 51 studies in this review; of these 51 included studies, four have been included since our initial review (Salameh 2020a), and 12 have been included since the first update of this review (Islam 2020). Refer to Figure 1 for the PRISMA flow diagram of search and inclusion results (Salameh 2020b; Moher 2009).
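As an aside on the cumulative meta-analysis described in the methods above, the following is a minimal base R sketch of the general idea: studies are ordered by publication date, logit sensitivity and its variance are computed with a 0.5 continuity correction, and a univariate random-effects (DerSimonian-Laird) pooled estimate is recalculated each time a study is added. This is a simplified stand-in for the STATA routines the review actually used, and the study counts and dates shown are hypothetical.

```r
# Cumulative univariate random-effects meta-analysis of logit sensitivity.
# Hypothetical per-study counts among RT-PCR-positive participants, ordered by date.
studies <- data.frame(
  pub_date = as.Date(c("2020-03-01", "2020-04-15", "2020-06-10", "2020-08-20")),
  tp       = c(45, 120, 80, 200),   # index test positive among confirmed cases
  fn       = c(5, 10, 15, 30)       # index test negative among confirmed cases
)
studies <- studies[order(studies$pub_date), ]

# Logit sensitivity and its variance, with a 0.5 continuity correction.
tp <- studies$tp + 0.5
fn <- studies$fn + 0.5
y  <- log(tp / fn)
v  <- 1 / tp + 1 / fn

# DerSimonian-Laird random-effects pooled estimate on the logit scale.
pool_dl <- function(y, v) {
  if (length(y) == 1) return(y)
  w       <- 1 / v
  y_fixed <- sum(w * y) / sum(w)
  q       <- sum(w * (y - y_fixed)^2)
  tau2    <- max(0, (q - (length(y) - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_star  <- 1 / (v + tau2)
  sum(w_star * y) / sum(w_star)
}

# Recalculate the pooled estimate after each successive study (by publication rank).
cum_logit <- sapply(seq_along(y), function(k) pool_dl(y[1:k], v[1:k]))
data.frame(studies_included   = seq_along(y),
           pooled_sensitivity = plogis(cum_logit))
```

A time-trend line can then be fitted to the cumulative estimates against publication rank, as the review did in STATA with an autoregressive error term.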
Exclusions were mainly due to ineligible study design (17 studies), ineligible study outcomes (9 studies), or ineligible patient populations (6 studies).

We included 51 studies (38 CT, seven X-ray, two ultrasound, one both CT and X-ray, two both CT and ultrasound, and one both X-ray and ultrasound) with a total of 19,775 participants suspected of having COVID-19, of whom 10,155 (51%) had a final diagnosis of COVID-19. The median sample size was 211 (interquartile range 94.5 to 486). Thirty-three studies were conducted in Europe (Italy 11, the Netherlands 7, France 5, Belgium 3, Turkey 3, Germany 2, UK 2), 13 were conducted in Asia (China 10, Korea 1, India 1, Japan 1) and the remaining studies were conducted in North America (USA 3) and South America (Brazil 2). Index test readings were performed by radiologists in 39 studies (76%), radiology residents in two studies (4%), both radiologists and residents in one (2%) study, and radiographers in one study (2%); eight studies (16%) did not clearly report the level of training of readers. Technical parameters regarding the protocol of chest CT used in 41 studies were not clearly reported in 18 (44%) studies, while non-contrast CT was used in 10 (24%) studies, high-resolution chest CT was used in three (7%) studies, low-dose CT with or without contrast was used in seven (17%) studies and CT with IV contrast was used in three (7%) studies.

Three (6%) of the included studies were published as preprints at the time of the search. We updated the publication status of all four of the preprint studies previously included in the first update of our review as of 1 November 2020: two studies were published since then, though there were no changes to the data between the preprint and published versions; one study remained as a preprint and had an updated version with updated data, which we re-extracted for analysis; and one study remained as a preprint without any updated versions. Characteristics of the included studies are summarised in Table 1, and outlined in detail in the Characteristics of included studies.

Twenty-six studies included only adult participants (aged 16 years and over), 21 studies included both children and adults (although in most cases, only a minority of included patients were children), one study included only children, one study included participants aged 70 years and older, and the remaining two studies did not clearly report the age range of participants. All participants were suspected of having COVID-19. Thirty-three (65%) studies involved only symptomatic participants, 11 (22%) studies involved symptomatic and asymptomatic participants, and seven (14%) studies did not clearly report participants' symptom status.

All the studies used RT-PCR as the reference standard for the diagnosis of COVID-19, with 47 studies using only RT-PCR as the reference standard and four studies using a combination of RT-PCR and other criteria (clinical signs 1, clinical signs and imaging tests 1, positive contacts 1, and follow-up phone calls 1) as the reference standard. With respect to RT-PCR testing, two studies tested each participant once, 18 studies tested some participants at least twice, if necessary, 11 studies tested all participants at least twice, if necessary, and 20 studies did not report on the frequency of testing per participant. Two studies included inpatients, 32 studies included outpatients, while the remaining 17 studies were conducted in unclear settings.
Seventeen (33%) studies described the co-morbidities of the study population, which commonly included hypertension, cardiovascular disease, and diabetes; however, the overall prevalence of co-morbidities in the participant groups of these studies was unclear.

Forty-seven studies evaluated a single imaging modality and four studies evaluated two imaging modalities. In total, the 51 studies reported 55 imaging modality evaluations. Chest CT was evaluated in 41 studies, chest X-ray was evaluated in nine studies, and ultrasound of the lungs was evaluated in five studies.

Figure 2 provides a summary of the overall methodological quality assessment using the QUADAS-2 tool for all 51 included studies. Figure 3 displays a study-level quality assessment. Overall, we found risk of bias based on concerns about the selection of participants to be high in 10 (20%) and unclear in 22 (43%) studies; the main concern in this domain was high risk of bias due to inappropriate exclusions (n = 10). Risk of bias because of concerns regarding application of chest CT (41 studies) was high in five (12%) and unclear in 15 (37%) studies; risk of bias because of concerns regarding application of chest X-ray (9 studies) was unclear in six (67%) studies, and risk of bias because of concerns regarding application of ultrasound (5 studies) was unclear in four (80%) studies. The five CT studies with a high risk of bias did not predefine the positivity criteria for index tests (n = 3) or did not blind index test readers to reference standard results (n = 2). Risk of bias based on concerns about the reference standard was high in 20 (39%) and unclear in 20 (39%) studies; the 20 studies with a high risk of bias used an RT-PCR protocol that was not likely to correctly classify the target condition. Risk of bias based on concerns related to participant flow and timing was high in two (3.9%) and unclear in 22 (43%) studies; the two studies with a high risk of bias did not provide the same reference standard to all participants (n = 1), or did not have an appropriate time interval between the reference standard and index test (n = 1). Concerns about the applicability of the evidence to participants were high in one study (2%). Concerns about the applicability of the evidence to the index test were low in all 41 chest CT studies, high in one (11%) and unclear in one (11%) of the nine chest X-ray studies, and unclear in one (20%) of the five ultrasound studies. Concerns about the applicability of the evidence to the reference standard were low in all 51 studies. Additional details about risk of bias and applicability assessment are presented in Figure 3.

The forest plot for chest CT is presented in Figure 4. The sensitivity of CT in 41 studies (involving 8110 (50%) cases in 16,133 participants) ranged from 56.3% to 100%, and the specificity ranged from 25.4% to 97.4%. The pooled sensitivity for chest CT was 87.9% (95% CI 84.6 to 90.6) and the pooled specificity was 80.0% (95% CI 74.9 to 84.3). The scatter of the study points in ROC space on the SROC plot (Figure 5) shows substantial variability in sensitivity and specificity.

The forest plots for chest X-ray and ultrasound of the lungs are presented in Figure 6.
The sensitivity of chest X-ray in nine studies (including 2111 (57%) cases in 3694 participants) ranged from 51.9% to 94.4% and the specificity ranged from 40.4% to 88.9%. The pooled sensitivity for chest X-ray was 80.6% (95% CI 69.1 to 88.6) and the pooled specificity was 71.5% (95% CI 59.8 to 80.8). The scatter of the study points in ROC space on the SROC plot (Figure 7) shows substantial variability in sensitivity and specificity for chest X-ray.

The sensitivity of ultrasound of the lungs in five studies (including 211 (47%) cases in 446 participants) ranged from 68.2% to 96.8% and the specificity ranged from 21.3% to 78.9%. The pooled sensitivity for ultrasound was 86.4% (95% CI 72.7 to 93.9) and the pooled specificity was 54.6% (95% CI 35.3 to 72.6). The scatter of the study points in ROC space on the SROC plot (Figure 8) shows substantial variability in sensitivity and specificity for ultrasound of the lungs.

Sensitivity analysis for CT studies limiting inclusion to studies published in peer-reviewed journals gave accuracy estimates similar to those of the overall included studies. When we excluded the three studies published as preprints, studies published in peer-reviewed journals (n = 38) had a pooled sensitivity of 88.5% (95% CI 85.2 to 91.2) and a pooled specificity of 81.2% (95% CI 76.0 to 85.3). These results are outlined in Table 2.

Investigations of heterogeneity for chest CT studies found that reference standard conduct, as well as definition for index test positivity, did not have an effect on accuracy estimates. The results of the investigations of heterogeneity are outlined in Table 3. Stratification by reference standard conduct gave pooled sensitivity estimates of 88.1% (95% CI 78.5 to 93.8) for studies that performed RT-PCR testing at least twice for all participants with initial negative results versus 86.7% (95% CI 82.3 to 90.1) for studies that did not perform repeat RT-PCR testing for all participants with initial negative results (P = 0.74). Pooled specificity estimates were 71.3% (95% CI 59.9 to 80.6) for studies that performed RT-PCR testing at least twice for all participants with initial negative results versus 82.6% (95% CI 77.8 to 86.6) for studies that did not perform repeat RT-PCR testing for all participants with initial negative results (P = 0.05). For the subgroup of CT studies that did not perform repeat RT-PCR testing in all participants with initial negative results (n = 22), the proportion of participants that underwent repeat RT-PCR testing ranged from 0% to 61% amongst the nine studies that reported this information; the remaining thirteen studies did not clearly report this information.

Stratification by definition used for index test positivity gave pooled sensitivity estimates of 90.3% (95% CI 84.5 to 94.1) for studies that defined index test positivity based on radiologists' impressions versus 85.9% (95% CI 81.2 to 89.2) for studies that used a formal scoring system to define index test positivity (P = 0.15). Pooled specificity estimates were 77.2% (95% CI 67.0 to 84.9) for studies that used radiologists' impressions versus 80.0% (95% CI 75.0 to 84.2) for studies that used a formal scoring system (P = 0.58).
For studies that used a formal scoring system, we used the threshold demonstrating the highest Youden's index in each study (or, in the case of the two studies that did not report data at all thresholds, the only available threshold (i.e. CO-RADS 3)) in the analysis.

Nine studies that evaluated CT (involving 1139 (41%) cases amongst 2807 participants) used the CO-RADS scoring system to define index test positivity. We obtained the 2x2 data at all five CO-RADS thresholds for seven studies; two studies only reported 2x2 data at a CO-RADS threshold of 3, and the authors could not provide any additional data. The forest plots of chest CT studies that used CO-RADS and reported 2x2 data for CO-RADS thresholds 2, 3, 4 and 5 are presented in Figure 9.
• At a CO-RADS threshold of 5 (7 studies), the sensitivity ranged from 41.5% to 77.9% and the specificity ranged from 83.5% to 96.2%; the pooled sensitivity was 67.0% (95% CI 56.4 to 76.2) and the pooled specificity was 91.3% (95% CI 87.6 to 94.0).
• At a CO-RADS threshold of 4 (7 studies), the sensitivity ranged from 56.3% to 92.9% and the specificity ranged from 77.2% to 90.4%; the pooled sensitivity was 83.5% (95% CI 74.4 to 89.7) and the pooled specificity was 83.6% (95% CI 80.5 to 86.4).
• At a CO-RADS threshold of 3 (9 studies), the sensitivity ranged from 65.5% to 98.8% and the specificity ranged from 56.6% to 86.9%; the pooled sensitivity was 90.7% (95% CI 85.2 to 94.3) and the pooled specificity was 69.4% (95% CI 63.3 to 74.9).
• At a CO-RADS threshold of 2 (7 studies), the sensitivity ranged from 75.4% to 99.7% and the specificity ranged from 26.5% to 57.1%; the pooled sensitivity was 94.3% (95% CI 88.6 to 97.2) and the pooled specificity was 44.1% (95% CI 36.5 to 52.0).
• We did not perform meta-analysis for a CO-RADS threshold of 1, since at this threshold, all sensitivity values are equal to 1, and all specificity values are equal to 0.
(A simplified sketch of how these pooled estimates can be plotted appears at the end of this subsection.)

Four included studies evaluated two modalities each (one study on chest CT and chest X-ray, two studies on chest CT and ultrasound, and one study on chest X-ray and ultrasound). Both modalities in each study were evaluated in the same population, or the second modality was evaluated in a subset of the population in which the first modality was assessed (i.e. direct comparisons). Paired SROC plots for these four comparative studies are presented in Figure 11, Figure 12, and Figure 13. In the one study that evaluated chest CT and chest X-ray, both modalities had similar sensitivities and specificities. In the two studies that evaluated chest CT and ultrasound, chest CT had similar sensitivities and higher specificities compared to ultrasound. In the one study that evaluated chest X-ray and ultrasound, chest X-ray had a lower sensitivity and a higher specificity compared to ultrasound. We could not perform formal analyses to compare accuracy estimates of modalities assessed in the same population directly due to the limited number of studies.
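The methods describe using ggplot2 and ggforce to display pooled accuracy at varying CO-RADS thresholds. The sketch below is a much simplified illustration of that kind of display: it plots only the pooled point estimates listed above, with no confidence intervals and none of the ggforce elements of the review's actual figure.

```r
library(ggplot2)

# Pooled estimates (%) at CO-RADS thresholds 2 to 5, as reported in the text above.
corads <- data.frame(
  threshold = rep(2:5, times = 2),
  measure   = rep(c("Sensitivity", "Specificity"), each = 4),
  estimate  = c(94.3, 90.7, 83.5, 67.0,   # pooled sensitivity
                44.1, 69.4, 83.6, 91.3))  # pooled specificity

ggplot(corads, aes(x = threshold, y = estimate, colour = measure)) +
  geom_line() +
  geom_point() +
  labs(x = "CO-RADS threshold for index test positivity",
       y = "Pooled estimate (%)",
       colour = NULL) +
  theme_minimal()
```

The trade-off described above is immediately visible in such a display: moving the positivity threshold from 2 to 5 lowers pooled sensitivity and raises pooled specificity.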
Indirect comparisons of modalities evaluated across all 51 studies indicated that: chest CT (41 studies) and chest X-ray (9 studies) gave similar sensitivity (P = 0.10) and specificity (P = 0.12) estimates; chest CT and ultrasound (5 studies) gave similar sensitivity estimates (P = 0.77), while chest CT gave higher specificity estimates than ultrasound (P = 0.0052); and chest X-ray and ultrasound gave similar sensitivity (P = 0.43) and specificity (P = 0.13) estimates. These findings are summarised in Table 5.

Cumulative meta-analyses and time trends analyses of the 41 chest CT studies included in this current review indicated a decrease in cumulative estimates of sensitivity and an increase in cumulative estimates of specificity over time; both P values < 0.001. Figure 14 displays the cumulative meta-analyses of logit sensitivity and logit specificity over time. Based on visual assessment, the meta-analysis estimates for sensitivity and specificity appear to stabilise near their final values after 20 consecutive studies have been included in the meta-analysis.

Based on the pooled sensitivity and specificity estimates derived across all three versions of this review (i.e. initial review published in September 2020 (Salameh 2020a), first update published in November 2020 (Islam 2020), and this current update), the sensitivity estimates of chest CT appear to be similar across all versions, while the specificity estimates of chest CT appear to increase with each update. Figure 15 displays the pooled sensitivity and specificity estimates with 95% CIs from all versions of this review.

This is the second update of a Cochrane living systematic review evaluating the diagnostic accuracy of thoracic imaging (CT, chest X-ray and ultrasound) in the evaluation of people suspected to have COVID-19. This version of the review is based on published studies and preprints up to 30 September 2020. Chest CT (41 studies, 16,133 participants, 8110 (50%) cases) demonstrated a sensitivity of 87.9% (95% CI 84.6 to 90.6), and a specificity of 80.0% (95% CI 74.9 to 84.3) for the diagnosis of COVID-19 in suspected participants. Compared with the findings of the first update of this 'living' systematic review, in which we determined that chest CT had a sensitivity of 89.9% (95% CI 85.7 to 92.9) and specificity of 61.1% (95% CI 42.3 to 77.1), our current update demonstrates similar sensitivity, but notably higher specificity. Possible explanations for this improved specificity could include the stricter study inclusion criteria applied for this update, particularly the exclusion of studies that report an overview of index test findings in participants with and without the target condition, without explicitly classifying the imaging test as either COVID-19 positive or negative. The improved specificity could also be due to the addition of more studies that used well-developed definitions for index test positivity used by index test readers (e.g. CO-RADS, BSTI COVID-19 Reporting Template, RSNA Expert Consensus on Reporting Chest CT Findings for COVID-19).
Furthermore, studies from the early stage of the pandemic were included in our initial review and studies from a later stage were added in this update; thus, the stage of the pandemic during which included studies were conducted has likely influenced these differing specificity estimates through improved knowledge about the pathophysiology and imaging manifestations of COVID-19. This is supported by the results of our cumulative meta-analysis and time trends analysis, which indicate that specificity has increased over time. The influence of these proposed variables on only specificity and not sensitivity remains unexplained and requires exploration in future updates.

Sensitivity analysis for chest CT studies, limiting inclusion to studies published in peer-reviewed journals, gave accuracy estimates similar to those of the overall included studies. Based on this, publication status had a minimal effect on our results. In our previous review, publication status did not appear to contribute to heterogeneity.

There was no statistical evidence of an effect of reference standard conduct on the sensitivity or specificity of chest CT; studies that performed RT-PCR testing at least twice for all initial negative results and studies that did not perform repeat RT-PCR testing for all initial negative results had similar sensitivities and specificities. These findings align with those of our previous review, in which a sensitivity analysis limiting inclusion to studies that implemented RT-PCR testing at least twice for all initial negative results gave accuracy estimates similar to those of the overall included studies. However, as the results of this meta-regression in the current review were close to statistical significance, reference standard conduct will continue to be of interest in future updates of this review.

Surprisingly, the definition used for index test positivity in chest CT studies did not appear to affect test accuracy, as studies that used a formal scoring system demonstrated comparable accuracy to those that used radiologists' impressions. A possible explanation is that radiologists using the 'radiologist impression' method may have implicitly used a formal scoring system that they were previously familiar with (e.g. CO-RADS). Thus, there may be minimal differences in the interpretation of chest CT between the formal scoring system and radiologist impression groups. In chest CT studies that used the CO-RADS scoring system to define index test positivity (9 studies, 2807 participants with 1139 (41%) cases), as expected, when the threshold for index test positivity increased (i.e. from 2 to 5), sensitivity decreased and specificity increased.

Chest X-ray (9 studies, 3694 participants with 2111 (57%) cases) demonstrated a sensitivity of 80.6% (95% CI 69.1 to 88.6), and a specificity of 71.5% (95% CI 59.8 to 80.8) for the diagnosis of COVID-19 in suspected participants. Ultrasound (5 studies, 446 participants with 211 (47%) cases) demonstrated a sensitivity of 86.4% (95% CI 72.7 to 93.9), and a specificity of 54.6% (95% CI 35.3 to 72.6). As meta-analysis was not performed for chest X-ray or ultrasound in our previous review due to a limited number of studies, comparisons between our current and previous findings are not possible. The number of included studies for chest X-ray and ultrasound was not sufficient to conduct meta-regression analyses.
Based on indirect comparisons of all included studies, chest CT and chest X-ray had similar sensitivities and specificities. Chest CT and ultrasound had similar sensitivities, but chest CT had a higher specificity than ultrasound. Chest X-ray and ultrasound had similar sensitivities and specificities. The indirect comparisons between chest CT and chest X-ray, and between chest CT and ultrasound, are supported by the limited evidence provided by studies that performed direct comparisons. The one study that compared chest CT and chest X-ray in the same population showed that sensitivities and specificities for both modalities were similar (Borakati 2020). Both studies that compared chest CT and ultrasound in the same population found that the two modalities had similar sensitivities and that chest CT had a higher specificity than ultrasound (Fonsi 2020; Narinx 2020). However, the one study that compared chest X-ray and ultrasound in the same population showed that X-ray had a lower sensitivity and a higher specificity compared to ultrasound (Pare 2020), which contradicts the findings of the indirect comparison. These findings require investigation with more studies of direct comparative design in future updates.

With respect to the three versions of this review, the sensitivity estimates of chest CT appear to remain similar across all versions, while the specificity estimates of chest CT appear to increase with each update. With the current number of chest CT studies included in this review, sensitivity and specificity estimates appear to be stable and have narrow confidence intervals. This may suggest that pooled sensitivity and specificity estimates of chest CT will not notably differ in future updates of this review.

Our search strategy was broad and allowed for identification of a wide range of articles about COVID-19 diagnosis. The review authors screened records, extracted data, and assessed study methodology independently and in duplicate. Though we are relatively confident in the accuracy and completeness of our findings, please inform us at mmcinnes@toh.ca should errors be found so that we can address them in a future update. Furthermore, compared to our initial review (Salameh 2020a), as well as the first update of our review (Islam 2020), this current update includes a greater number of studies that evaluated accuracy estimates of imaging tests in the diagnosis of suspected COVID-19 participants. We included studies that involved only symptomatic participants, as well as studies that had a mixed population (i.e. symptomatic and asymptomatic participants); there may be situations in which asymptomatic individuals are suspected of having COVID-19, such as when they have infected contacts or other risk factors for infection. However, not all the studies clearly reported information on participants' symptoms.

We did not identify reference standard conduct or definition for index test positivity as sources of variability for chest CT accuracy. These findings may suggest that the variables we investigated did not significantly contribute to variability; alternatively, there may be confounding variables obscuring our analyses. Due to insufficient granularity of data, we were unable to investigate additional potential sources of variability, particularly participant setting (inpatient versus outpatient) and symptom status (symptomatic versus asymptomatic).
Furthermore, we were unable to investigate potential sources of variability of chest X-ray and ultrasound accuracy estimates due to the limited number of studies. We plan to perform these analyses in future updates, when sufficient data become available. In this update, we addressed our secondary objective of evaluating threshold effects of imaging findings of COVID-19 on accuracy measures, particularly with respect to the CO-RADS scoring system. We could not evaluate threshold effects for studies that used other scoring systems (e.g. BSTI COVID-19 Reporting Template, RSNA Expert Consensus on Reporting Chest CT Findings for COVID-19) due to the limited number of included studies that used them. We explored indirect comparisons of chest CT, chest X-ray and ultrasound of the lungs, and we qualitatively assessed studies that evaluated multiple imaging modalities in the same population (i.e. direct comparisons). However, due to the limited number of studies that evaluated multiple imaging modalities in the same population, we did not formally evaluate direct comparisons of different imaging tests at this stage. We plan to conduct formal analyses of direct comparisons of imaging tests in future updates, as more studies with comparative designs become available.

We performed the cumulative meta-analyses and time trends analyses of chest CT accuracy estimates using a univariate model, whereas we performed all other meta-analyses in this review using a bivariate model. The use of a univariate model, which analyses sensitivity and specificity separately rather than jointly, is therefore a limitation of the cumulative meta-analyses and time trends analyses and should be considered when interpreting their results. We did not perform cumulative meta-analyses and time trends analyses of chest X-ray and ultrasound accuracy due to the limited number of studies. We were not able to evaluate accuracy estimates based on specific findings of imaging tests (e.g. ground-glass, consolidation, pleural effusion) or combinations of such findings because of the lack of data granularity reported in included studies; however, we will consider this in future updates of the review.

We were not able to evaluate several planned additional secondary objectives due to insufficient data. Important questions that remain a concern are: 1. the rate of positive thoracic imaging in individuals with initial negative RT-PCR results who have positive RT-PCR results on repeat testing; 2. possible associations between findings on thoracic imaging for patients with COVID-19, the number of days after symptom onset, symptom severity and subsequent hospitalisation; and 3. the screening of asymptomatic individuals. We hope that in future updates of this review we will be able to evaluate these associations as research on the role of imaging tests in the diagnosis of COVID-19 evolves.

The quality of the primary studies included in this review continues to impact the overall robustness of the review. Several studies failed to describe their participants (e.g. recruitment method), the details of reference standard conduct used for identifying COVID-19 cases, and the definition used for positivity of the imaging tests. Furthermore, of the studies that described reference standard conduct, one study used a composite reference standard including index test findings, which creates the risk of incorporation bias.
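For readers less familiar with the distinction drawn above between the univariate and bivariate approaches, the bivariate random-effects model used for the main meta-analyses can be written in the standard Reitsma-type form below; the notation is ours and is intended only to show why sensitivity and specificity are modelled jointly.

```latex
% Bivariate random-effects (Reitsma-type) model; notation is illustrative, not the review's own.
\[
\mathrm{TP}_i \sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i),
\]
\[
\operatorname{logit}(\mathrm{Se}_i) = \mu_A + \nu_{Ai}, \qquad
\operatorname{logit}(\mathrm{Sp}_i) = \mu_B + \nu_{Bi},
\]
\[
\begin{pmatrix}\nu_{Ai}\\ \nu_{Bi}\end{pmatrix} \sim
\mathcal{N}\!\left(\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix}\sigma_A^{2} & \sigma_{AB}\\ \sigma_{AB} & \sigma_B^{2}\end{pmatrix}\right).
\]
```

Here n_1i and n_0i are the numbers of participants with and without COVID-19 in study i; the univariate approach corresponds to fitting the two logit equations separately and ignoring the correlation term sigma_AB. Under this formulation, the pooled sensitivity and specificity are the back-transformed mu_A and mu_B.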
Future studies need to prioritise scientific rigour and completeness of reporting, and we encourage investigators to refer to the STARD 2015 checklist (Bossuyt 2015; Hong 2018). A limitation of primary studies that may not be captured in our risk of bias evaluation concerns the recruitment of participants. Of the studies that did report recruitment methods, the majority reported including 'consecutive' participants. However, many of these studies did not actually recruit 'consecutive' participants that represent the target population (i.e. individuals suspected of having COVID-19), but instead included all consecutive participants that underwent an imaging test and RT-PCR testing. These studies did not describe whether all suspected patients in the recruitment setting underwent both an imaging test and RT-PCR as a part of standard practice (which would result in a true 'consecutive' recruitment), or whether imaging tests were only performed in patients with specific clinical signs (e.g. severe symptoms). In studies where the latter situation is present, included participants may not represent the target population, and this could create bias.

We recommend that the accuracy estimates reported in this review are interpreted with caution because of the use of RT-PCR as the reference standard. The results of RT-PCR are not always sensitive, and it is possible that chest CT may be more sensitive than the reference standard in some patients. However, our investigations of heterogeneity for chest CT studies did not identify different accuracy estimates between studies that used at least two RT-PCR test results to define disease-negative status versus studies that used only one RT-PCR test result to define disease-negative status. At this stage, despite its limitations, RT-PCR remains the best tool for diagnosing COVID-19. In future updates of this review, we may consider the use of a latent-class bivariate model for meta-analysis, which adjusts for the imperfect accuracy of the reference standard (Butler-Laporte 2021). Three out of 51 included studies (6%) were only available as preprints at the time of the search. We will update data extracted from these studies in future versions of our review as these studies become published in peer-reviewed journals.

As the studies in our cohort included suspected COVID-19 participants, our findings are applicable to individuals suspected to have COVID-19. Our search did not identify many studies that evaluated the accuracy of chest CT, ultrasound of the lungs, and chest X-ray for the diagnosis of COVID-19 in paediatric populations. Thus, the diagnostic accuracy of these modalities in children is not as well established. In addition, the lack of data available in the included studies pertaining to signs and symptoms of presenting cases, the severity of the symptoms, as well as the timing of symptom onset adds complexity to the interpretation of the findings in this review. The uncertainty resulting from high or unclear risk of bias and the heterogeneity of included studies limit our ability to confidently draw conclusions based on our results.

Our findings indicate that chest computed tomography (CT), chest X-ray and ultrasound all give higher proportions of positive results for individuals with COVID-19 than for those without. For chest CT, the chances of getting a positive result are 87.9% (95% CI 84.6 to 90.6) in individuals with COVID-19 and 20.0% (95% CI 15.7 to 25.1) in those without.
For chest X-ray, the chances of getting a positive result are 80.6% (95% CI 69.1 to 88.6) in individuals with COVID-19 and 28.5% (95% CI 19.2 to 40.2) in those without. For ultrasound of the lungs, the chances of getting a positive result are 86.4% (95% CI 72.7 to 93.9) in individuals with COVID-19 and 45.4% (95% CI 27.4 to 64.7) in those without. Due to the limited availability of data, accuracy estimates of chest X-ray and ultrasound of the lungs for the diagnosis of COVID-19 in suspected participants should be carefully interpreted.

From our current pool of included studies, we can draw limited conclusions regarding the diagnostic performance of thoracic imaging modalities. Additional studies evaluating the accuracy of COVID-19 diagnosis in suspected patients are needed to allow for more reliable findings. In this update, we were unable to assess several secondary objectives due to the lack of available data required to formally evaluate direct comparisons of different imaging modalities, and the effect of time since onset of symptoms on the diagnostic performance of the various index tests. Future studies should ideally pre-define positive imaging findings and include direct comparisons of the various modalities of interest in the same participant population in order to provide robust and reliable data. Furthermore, improved transparency and reporting are necessary for more efficient data extraction in updated versions of this review. We encourage authors and investigators to refer to the STARD 2015 checklist (Bossuyt 2015; Hong 2018) to ensure that all relevant information is clearly reported in their studies. We hope that future updates of this review include more informative studies to allow for additional investigations of variability with improved power and further evaluations of secondary objectives.

Members of the Cochrane COVID-19 Diagnostic Test Accuracy Review Group include the following.

• The wider team of systematic reviewers from the University of Birmingham, UK, who assisted with title and abstract screening across the entire suite of reviews for the diagnosis of COVID-19 (Agarwal R, Baldwin S, Berhane S, Herd C, Kristunas C, Quinn L, Scholefield B).

We thank Dr Jane Cunningham (World Health Organization) for participation in technical discussions and comments on the manuscript.
Characteristics of included studies: per-study QUADAS-2 risk of bias and applicability judgements (participant selection; index test for chest CT, chest X-ray and ultrasound of the lungs; reference standard; flow and timing) for the included studies, including Bar 2020, Borakati 2020, Cartocci 2020, Choudhury 2020, Debray 2020, Deng 2020, Dini 2020, Ducray 2020, Fonsi 2020, Fujioka 2020, Gezer 2020, Giannitto 2020, Gietema 2020, Guillo 2020, He 2020, Hermans 2020, Hernigou 2020, Herpe 2020, Hwang 2020, Kuzan 2020 and Murphy 2020.
Table 5. Indirect comparisons of sensitivity and specificity of chest CT, chest X-ray and ultrasound

A P P E N D I C E S

Appendix 1. Glossary

• COVID-19: coronavirus disease 2019, the clinical manifestations/symptoms caused by infection with SARS-CoV-2; the name given to the disease associated with the virus SARS-CoV-2
• COVID-19 pneumonia: COVID-19 that presents as infection-inflammation of the lungs
• Index test: the test that is being assessed (the index test will often be a new test)
• False negative: the test does not detect a condition in someone when it is present
• False positive: the test detects a condition in someone when it is not present
• Negative predictive value: the probability that someone who has tested negative for the target condition with the index test will really not have it (a true negative)
• Positive predictive value: the probability that someone who has tested positive for the target condition with the index test will actually have it (a true positive)
• Reference standard: the most reliable method for determining if the target condition is present or absent, used to verify index test results. This could be a combination of tests.
• RT-PCR: reverse transcription polymerase chain reaction (RT-PCR) is a laboratory technique that combines reverse transcription of RNA into DNA and amplification of specific DNA targets using polymerase chain reaction.
In this context it is used to detect the presence of SARS-CoV-2 RNA.
• SARS-CoV-2 infection: people infected with severe acute respiratory syndrome coronavirus 2, but who may or may not have any clinical manifestations of infection
• Secondary care: medical care that is provided by a specialist or facility upon referral by a primary care physician and that requires more specialised knowledge, skill, or equipment than the primary care physician can provide
• Sensitivity: the proportion of people with the target condition (with disease) that are correctly identified by the index test
• Specificity: the proportion of people without the target condition (without disease) that are correctly identified by the index test
• Tertiary care: specialised care, usually for inpatients and on referral from a primary or secondary health professional

The exclusion of case-control studies, as well as of studies that report an overview of index test findings in participants with and without the target condition without explicitly classifying the imaging test as either COVID-19 positive or negative, is a modification from the study protocol and the previous versions of this review. These changes were made prior to initiating the update, with approval by the Cochrane COVID-19 Diagnostic Test Accuracy Group as well as all of the review authors. The criteria for the index test and reference standard domains of the QUADAS-2 tool were modified for this update (Appendix 2). For studies that used formal scoring systems with clearly defined thresholds, even if the signalling question about using a 'prespecified threshold' was 'unclear' or 'no', the index test domain was not considered to have an 'unclear' or 'high' risk of bias based on the 'prespecified threshold' signalling question. For studies that used RT-PCR testing as the reference standard, even if the signalling question about 'blinding' was 'unclear' or 'no', the reference standard domain was not considered to have an 'unclear' or 'high' risk of bias based on the 'blinding' signalling question. These changes were approved by the Cochrane COVID-19 Diagnostic Test Accuracy Group, as well as all of the review authors.

We did not address several planned secondary objectives due to insufficient available data (McInnes 2020). These objectives include: evaluating the rate of positive imaging in patients with initial RT-PCR-negative results who have a positive result on a follow-up RT-PCR test; determining if there is an association between number of days after symptom onset, symptom severity and the findings on thoracic imaging for patients with COVID-19; and determining the rate of alternative diagnoses identified by thoracic imaging. We had planned to undertake additional sensitivity analyses to determine whether low risk of bias for all QUADAS-2 domains had an effect on findings. However, since the majority of included studies had an overall high or unclear risk of bias due to study design and only two studies had an overall low risk of bias, it was not possible to undertake these analyses. Our protocol included additional sources of heterogeneity to be evaluated, such as disease prevalence, participant symptoms (severity), timing of symptom onset, participant co-morbidities and other potential candidate variables. Due to the lack of available data, we did not investigate these covariates.

The previous version of this review included studies of cross-sectional or case-control designs that either: 1. reported specific criteria for index test positivity (i.e. used a scoring system, such as CO-RADS); 2. did not report specific criteria, but had the index test reader(s) explicitly classify the imaging test result as either COVID-19 positive or negative; or 3. reported an overview of index test findings, without having the index test reader(s) explicitly classify index tests as either COVID-19 positive or negative. The inclusion of case-control studies may have been a source of bias, as the disease prevalence in the samples of these types of studies does not represent the prevalence in the target population. The inclusion of studies that only reported an overview of index test findings (i.e.
studies not intended to be 'diagnostic test accuracy studies') was a possible source of bias identified by sensitivity analysis in our previous review and may have limited our ability to evaluate the sensitivity and specificity of chest CT, chest X-ray and ultrasound. In this update, we excluded studies with case-control designs, and studies that only reported an overview of index test findings without having the index test reader(s) explicitly classify index tests as either COVID-19 positive or negative. The body of evidence has grown to the point that sufficient studies that meet these preferred criteria are now available.

Investigations of variability were limited in the previous review due to limited available data. The assessment of secondary objectives, such as the association between number of days after symptom onset, symptom severity and the findings on thoracic imaging for patients with COVID-19, was also not possible. In this update, we evaluated the impact of reference standard conduct (RT-PCR performed at least twice in all initial negative results versus RT-PCR not performed at least twice in all initial negative results) and the definition used for index test positivity (formal scoring system versus radiologist impression), but we were unable to conduct further investigations of variability due to limited available data. We also formally evaluated the impact of threshold effects on accuracy estimates in this update, particularly for studies that used the CO-RADS scoring system. We were unable to evaluate threshold effects in other types of formal scoring systems due to the limited number of included studies that used other systems.

Of the studies included in the previous review, several failed to clearly report key information about their study design, as well as their methods for recruiting participants and delivering the reference standard. Therefore, data derived from these studies may have a high risk of bias, and the overall robustness of our review reflected this quality of reporting and the weaknesses in the primary studies. In this update, several included studies also failed to report key information and had a high or unclear risk of bias with respect to participant selection, index test, reference standard, and participant flow.

The interpretation of the accuracy estimates in the previous review involved several uncertainties. While RT-PCR is considered the best available test, the results of RT-PCR are not always sensitive; sensitivity depends on the timing of specimen collection, with high sensitivity around the onset of symptoms and during the symptomatic period but lower sensitivity before and after that window (Kucirka 2020), and collection of an appropriate specimen for testing can also be challenging. RT-PCR alone may not be the ideal reference standard (Li 2020b; Loeffelholz 2020), and it is possible that chest CT may be more sensitive than the reference standard in some patients, as some patients identified as having a false-positive diagnosis on CT may have been missed by the RT-PCR test. In this update, similar uncertainties with respect to the use of RT-PCR as the reference standard exist. However, our meta-regression analyses for studies that performed RT-PCR testing at least twice for all participants with initial negative results (i.e.
studies that addressed, to some extent, the low sensitivity of RT-PCR testing by conducting at least two RT-PCR tests to define disease-negative status), compared with studies that did not perform repeat RT-PCR testing for all participants with initial negative results, did not identify significantly different accuracy estimates between the groups.

The quality of reporting and the design of the included studies also affected the generalisability of our findings and our ability to assess their validity. About a quarter of the studies (9/34; 26%) included in the previous review were only available as preprints at the time of the search and had not yet been through the peer-review process; of the four preprint studies that were included in the previous review and also included in this update, two have since been published (publication statuses are updated as of 1 November 2020). Compared to the previous review, this update includes a notably smaller proportion of preprint studies (3/51; 6%). We will update data extracted from these studies and include them in future versions of our review as these studies become published in peer-reviewed journals.

René Spijker: the Dutch Cochrane Centre (DCC) has received grants for performing commissioned systematic reviews. Junfeng Wang received a consultancy fee from Biomind, an Artificial Intelligence (AI) company providing machine intelligence solutions in medical imaging; the consultancy service was about the design of clinical studies.
People with suspected COVID-19
All settings, in particular secondary care, emergency care and intensive care units (ICUs)
In people presenting with suspected COVID-19; suspicion may be based on prior testing, such as general lab testing
Signs and symptoms often used for triage or referral

A positive diagnosis for COVID-19 by the following.
1. A positive reverse transcriptase polymerase chain reaction (RT-PCR) test for SARS-CoV-2 infection, from any manufacturer in any country, from any source, including nasopharyngeal swabs or aspirates, oropharyngeal swabs, bronchoalveolar lavage fluid (BALF), sputum, saliva, serum, urine, rectal or faecal samples
2. Positive on WHO criteria for COVID-19, which includes some testing RT-PCR-negative
3. Positive on China CDC criteria for COVID-19, which includes some testing RT-PCR-negative
4. Positive serology in addition to consistent symptomatology
5. Positive on a study-specific list of criteria for COVID-19, which includes some testing RT-PCR-negative
6. Other criteria (symptoms, imaging findings, other tests)

A negative diagnosis for COVID-19 by the following.
1. People with suspected COVID-19 with negative RT-PCR test results, whether tested once or more than once
2. Currently healthy or with another disease (no RT-PCR test)

This list is not exhaustive, as we anticipate that studies will use a variety of reference standards, and we plan to include all of them, at least for the earlier versions of the review. Although RT-PCR is considered the best available test, it is suspected of missing a substantial proportion of cases, and thus may not be the ideal reference standard if used as a standalone test (Li 2020b; Loeffelholz 2020). Therefore, we are likely to use alternative reference standards, such as a combination of RT-PCR and symptoms or imaging findings, or both. We will judge how likely each reference standard definition is to correctly classify individuals in the assessment of methodological quality. All reference standards are likely to be imperfect in some way; details of reference standard evaluation are provided in the 'Risk of bias' tool below. We will use a consensus process to agree the classification of the reference standard as to what we regard as good, moderate and poor. 'Good' reference standards need to have very little chance of misclassification; 'moderate', a small but acceptable risk; 'poor', a larger and probably unacceptable risk.
Was a consecutive or random sample of patients enrolled?
YES: if a study explicitly states that all participants within a certain time frame were included; that this was done consecutively; or that a random selection was done.
NO: if it is clear that a different selection procedure was employed, e.g. selection based on clinician's preference, or based on institutions (i.e. 'convenience' series).
UNCLEAR: if the selection procedure is not clear or not reported at all.

Was a case-control design avoided?
YES: if a study explicitly states that all participants came from the same group of (suspected) patients.
NO: if it is clear that a different selection procedure was employed for the participants depending on their COVID-19 status (e.g. proven infected patients in one group and proven non-infected patients in the other group).
UNCLEAR: if the selection procedure is not clear or not reported at all.

Did the study avoid inappropriate in- or exclusions?
This needs to be addressed on a case-to-case basis.
YES: if all eligible patients were more or less equally suspected of having COVID-19 and were included, and if the numbers in the flow chart show not too many excluded participants (a maximum of 20% of eligible patients excluded without reasons).
NO: if over 20% of eligible patients were excluded without providing a reason; if only proven patients were included, or only proven non-patients were included; if in a retrospective study participants without an index test or reference standard result were excluded; if exclusion was based on severity assessment post-factum or comorbidities (cardiovascular disease, diabetes, immunosuppression); or if the study oversampled patients with particular characteristics likely to affect estimates of accuracy.
UNCLEAR: if the exclusion criteria are not reported.

HIGH: if one or more signalling questions were answered with NO, as any deviation from the selection process may lead to bias.
LOW: if all signalling questions were answered with YES.

Is there concern that the included patients do not match the review question?
This needs to be addressed on a case-to-case basis, based on the objective the included study answers to.
HIGH: if accuracy was assessed in a case-control design, or the study was able to estimate only sensitivity or only specificity.
LOW: any situation where imaging is generally available.
UNCLEAR: if a description of the participants is lacking.

Were the index test results interpreted without knowledge of the results of the reference standard?
YES: if blinding was explicitly stated or the index test was recorded before the results from the reference standard were available.
NO: if it was explicitly stated that the index test results were interpreted with knowledge of the results of the reference standard.
UNCLEAR: if blinding was unclearly reported.

If a threshold was used, was it prespecified?
YES: for any of these index tests it is highly unlikely that any numerical threshold is used. Still, we expect studies to report their criteria for test-positivity (e.g. the constellation of imaging findings used). If these criteria are reported in the methods section, we will score 'YES' for this question.
NO: if the optimal criterion for test-positivity was based on the reported data (for example, different scores on a quantitative scoring system), we will score 'NO'.
UNCLEAR: if the criteria for test positivity were not reported or were unclearly reported.
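Purely as an illustration of how the domain-level judgements in this appendix combine the signalling questions (the HIGH/LOW rules, and the two note-based exceptions for the 'prespecified threshold' and 'blinding' questions, are stated below), a small sketch follows. The function and the example answers are ours and are not part of the review's tooling.

```python
def quadas_domain_risk(signalling_answers, exempt_questions=()):
    """Domain-level risk of bias from QUADAS-2 signalling answers.

    Illustrates the rule described below: 'high' if any non-exempt question is
    answered 'no', 'low' if all are 'yes', otherwise 'unclear'. Questions named
    in exempt_questions (e.g. 'prespecified threshold' for studies using a
    formal scoring system, or 'blinding' when RT-PCR is the reference standard)
    are ignored, per the review's modified criteria."""
    considered = {q: a.lower() for q, a in signalling_answers.items()
                  if q not in exempt_questions}
    if any(a == "no" for a in considered.values()):
        return "high"
    if all(a == "yes" for a in considered.values()):
        return "low"
    return "unclear"

# Hypothetical example: index test domain for a study using CO-RADS.
answers = {"blinded to reference standard": "yes", "prespecified threshold": "unclear"}
print(quadas_domain_risk(answers, exempt_questions={"prespecified threshold"}))  # -> "low"
```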
Could the conduct or interpretation of the index test have introduced bias?
HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.
Note: for studies that use formal scoring systems with clearly defined thresholds, even if the signalling question about using a prespecified threshold is answered 'unclear' or 'no', this domain should not be considered as having an 'unclear' or 'high' risk of bias on the basis of that question alone.

Is there concern that the index test, its conduct, or interpretation differ from the review question?
There is not a huge amount of variability from a technical perspective. Therefore, this question will probably be answered 'LOW' in all cases, except when assessments are made by personnel not available in practice, by personnel not trained for the job, or using modalities that are uncommon in practice. We will consult expert clinicians on a case-by-case basis to judge this question.

Is the reference standard likely to correctly classify the target condition?
YES: for COVID-19, RT-PCR done by trained personnel, repeated after a first negative RT-PCR, following guidelines for confirmed cases, and done with an assay targeting at least two targets in the N, E, S or RdRP genes (one target is acceptable in a zone with known transmission). To clarify, a low risk of bias reference standard for a true negative would require two (or more) negative RT-PCR results.
NO: any other test.
UNCLEAR: if no reference standard was reported, or if it was only reported that RT-PCR was done.

Were the reference standard results interpreted without knowledge of the results of the index test?
YES: if it was explicitly stated that the reference standard results were interpreted without knowledge of the results of the index test, or if the result of the index test was obtained after the reference standard.
NO: if it was explicitly stated that the reference standard results were interpreted with knowledge of the results of the index test, or if the index test was used to make the final diagnosis (incorporation bias).

Could the reference standard, its conduct, or its interpretation have introduced bias?
HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.
Note: for studies that use RT-PCR testing as the reference standard, even if the signalling question about blinding is answered 'unclear' or 'no', this domain should not be considered as having an 'unclear' or 'high' risk of bias on the basis of that question alone (a schematic sketch of this judgement rule follows below).

Is there concern that the target condition as defined by the reference standard does not match the review question?
HIGH: there is high concern regarding the applicability of the reference standard if it actually measures a different target condition than the one we are interested in for the review. For example, if the diagnosis is based only on the clinical picture, without excluding other possible causes of this clinical picture (e.g. other respiratory pathogens), then there is considerable concern that the reference standard is actually measuring something other than COVID-19. In addition, a positive RT-PCR only measures SARS-CoV-2 infection and not COVID-19; therefore the reference standard for COVID-19 is a combination of a positive RT-PCR and symptoms and/or imaging findings.
LOW: if the above situations are not present.
UNCLEAR: if the intention for testing is not reported in the study.
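The HIGH/LOW/UNCLEAR judgements above, together with the notes exempting particular signalling questions (the prespecified-threshold question for formal scoring systems such as CO-RADS, and the blinding question for RT-PCR reference standards), amount to a simple decision rule. The following is a minimal sketch of that rule only; the function, data structures and example are illustrative, are not part of QUADAS-2 or of this review's tooling, and in practice each judgement was made by the review authors rather than computed.

```python
# Minimal sketch of the domain-level risk of bias rule described above:
# HIGH if any signalling question is answered NO, LOW if all are YES,
# otherwise UNCLEAR. Questions listed in `exempt` (e.g. the prespecified-
# threshold question for formal scoring systems, or the blinding question
# for RT-PCR reference standards) are not allowed to raise the judgement
# on their own, per the notes above. Names are illustrative only.

def domain_risk_of_bias(answers, exempt=frozenset()):
    """answers maps each signalling question to 'YES', 'NO' or 'UNCLEAR'."""
    decisive = {q: a for q, a in answers.items() if q not in exempt}
    if any(a == "NO" for a in decisive.values()):
        return "HIGH"
    if all(a == "YES" for a in decisive.values()):
        return "LOW"
    return "UNCLEAR"

# Example: a study using a formal scoring system, where the threshold
# question does not count against the domain.
answers = {
    "blinded_to_reference_standard": "YES",
    "threshold_prespecified": "UNCLEAR",
}
print(domain_risk_of_bias(answers, exempt={"threshold_prespecified"}))  # LOW
```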
Was there an appropriate interval between index test(s) and reference standard?
YES: the situation of a patient, including clinical presentation and disease progression, evolves rapidly, and new or ongoing exposure can result in a change in case status; on the other hand, negative PCR results need to be repeated over several days. Therefore, an appropriate time interval is within seven days.
NO: if there were more than seven days between the index test and the reference standard, or if patients are otherwise reported to have been assessed with the index test and the reference standard at moments of different disease severity.

A more efficient approach was required to keep up with the rapidly increasing volume of COVID-19 literature. A classification model for COVID-19 diagnostic studies was built with the model-building function within Eppi Reviewer, which uses the standard SGDClassifier in scikit-learn on word trigrams (Thomas 2010). As outputs, new documents receive a percentage (from the predict_proba function), where scores close to 100 indicate a high probability of belonging to the class 'relevant document' and scores close to 0 indicate a low probability of belonging to that class. We used three iterations of manual screening (title and abstract screening, followed by full-text review) to build and test classifiers. The final included studies were used as relevant documents, while the remainder of the COVID-19 studies were used as irrelevant documents. The classifier was trained on the first round of selected articles, and tested and retrained on the second round of selected articles. Testing on the second round of selected articles revealed poor positive predictive value but 100% sensitivity at a cut-off of 10. The poor positive predictive value is mainly due to the broad scope of our topic (all diagnostic studies in COVID-19), poor reporting in abstracts, and a small set of included documents. The model was retrained using the articles selected in the second and third rounds of screening, which added a considerable number of additional documents. This led to a large increase in positive predictive value, at the cost of lower sensitivity, which led us to reduce the cut-off to 5. The largest proportion of documents had a score between 0 and 5; this set did not contain any of the relevant documents. This version of the classifier, with a cut-off of 5, was used in subsequent rounds and accounted for approximately 80% of the screening burden (an illustrative sketch of this kind of pipeline is given below).

Embase records were taken from the Stephen B. Thacker CDC Library COVID-19 Research Articles Downloadable Database. Records were obtained by the CDC Library by searching Embase through Ovid using the following search strategy.

The results for chest computed tomography (CT) have changed.
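As an illustration of the kind of screening classifier described above, the following sketch uses scikit-learn directly. It is not a reconstruction of Eppi Reviewer's internal model-building function: the TF-IDF weighting, the toy training records, the labels and the variable names are all assumptions made for illustration only; the review text specifies only an SGDClassifier on word trigrams with a predict_proba score on a 0 to 100 scale and a cut-off.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only: 1 = relevant (included diagnostic
# accuracy study), 0 = irrelevant (other COVID-19 record).
train_texts = [
    "diagnostic accuracy of chest CT in patients with suspected COVID-19",
    "lung ultrasound sensitivity and specificity for SARS-CoV-2 pneumonia",
    "economic consequences of lockdown policies for small businesses",
    "survey of remote working practices during the pandemic",
]
train_labels = [1, 1, 0, 0]

# Word n-grams up to trigrams (the review describes word trigram features);
# loss="log_loss" is needed so that predict_proba is available
# (in scikit-learn < 1.1 the same loss is called "log").
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),
    SGDClassifier(loss="log_loss", random_state=0),
)
pipeline.fit(train_texts, train_labels)

# Score new records on a 0-100 scale and keep those at or above the cut-off (here 5).
new_texts = [
    "chest X-ray diagnostic performance in emergency department patients with suspected COVID-19",
    "opinion piece on vaccine distribution logistics",
]
scores = pipeline.predict_proba(new_texts)[:, 1] * 100
needs_manual_screening = [doc for doc, s in zip(new_texts, scores) if s >= 5]
print(needs_manual_screening)
```

Records scoring below the chosen cut-off are set aside rather than screened manually, which is how a cut-off of 5 could remove roughly 80% of the screening burden in later rounds while the higher-scoring records still go to human reviewers.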