key: cord-0810829-95v0nfme
authors: Caini, Saverio; Bellerba, Federica; Corso, Federica; Díaz-Basabe, Angélica; Natoli, Gioacchino; Paget, John; Facciotti, Federica; De Angelis, Simone Pietro; Raimondi, Sara; Palli, Domenico; Mazzarella, Luca; Pelicci, Pier Giuseppe; Vineis, Paolo; Gandini, Sara
title: Meta-analysis of diagnostic performance of serological tests for SARS-CoV-2 antibodies up to 25 April 2020 and public health implications
date: 2020-06-11
journal: Euro Surveill
DOI: 10.2807/1560-7917.es.2020.25.23.2000980
sha: cc0b1a814108ddb24b290698930593455531fb46
doc_id: 810829
cord_uid: 95v0nfme

We reviewed the diagnostic accuracy of SARS-CoV-2 serological tests. Random-effects models yielded a summary sensitivity of 82% for IgM, and 85% for IgG and total antibodies. For specificity, the pooled estimates were 98% for IgM and 99% for IgG and total antibodies. In populations with ≤ 5% of seroconverted individuals, unless the assays have perfect (i.e. 100%) specificity, the positive predictive value would be ≤ 88%. Serological tests should be used for prevalence surveys only in hard-hit areas.

Testing of patients for ongoing infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is mostly conducted by detecting viral RNA in respiratory specimens using reverse transcription (RT)-PCR-based assays.
While these tests can confirm infection, they may prove less helpful in quantifying the actual number of coronavirus disease (COVID-19) cases in the population if a large proportion of infected individuals are either asymptomatic [1, 2] or have only mild symptoms, and therefore have no incentive to seek medical care or to be tested. In this event, such cases may go unnoticed by surveillance systems and public health entities. Moreover, once the infection is resolved, RT-PCR tests do not inform on past infection. To overcome these shortcomings, serology-based tests are increasingly being used to gain more insight into the true prevalence of persons who have or have had COVID-19 and to assess the degree of herd immunity that has been acquired by the population. Serology-based tests have thus become a key public health element in the COVID-19 pandemic, and the number of available SARS-CoV-2 serological tests has grown rapidly since February 2020. These tests differ from one another in several ways, including the antigens used for antibody detection, the type of antibodies identified, and the laboratory method. Here, we conducted a systematic review and meta-analysis of the diagnostic accuracy of currently available SARS-CoV-2 serological tests, and assessed their real-world performance under scenarios of varying proportion of infected individuals in the population being tested.

Searching studies assessing serological tests for severe acute respiratory syndrome coronavirus 2

We carried out a systematic literature search (up to 25 April 2020) of scientific articles on immunological tests for detection of SARS-CoV-2 antibodies. Both peer-reviewed and non-peer-reviewed reports in English were retrieved by interrogating the PubMed, medRxiv and bioRxiv databases with the following keywords: 'SARS-COV-2 OR COVID' AND 'IgM OR IgG OR IgA OR antibody OR serological' AND 'test'.
The search also extended to the reference lists of the reports found and to technical manuals of tests mentioned therein. Reports/technical documents considered in this review are referred to as 'studies' henceforth. We considered independent studies that specified the antigen used for antibody detection, used quantitative methods, and reported the number of true positives, true negatives, false positives, and false negatives. This information was extracted from each study as well as the laboratory method used as reference. Based on the 2×2 contingency table, we calculated the test sensitivity and specificity (with 95% confidence intervals (CI)) and the diagnostic odds ratio (DOR), to provide an overall measure of the test performance [3]. We then calculated the positive (PPV) and negative (NPV) predictive values assuming a true prevalence of 5%, 10% and 20%.

Concerning the pooled estimates of the performance parameters of serological tests, it should be noted that some of the studies included in the current systematic review assessed more than one assay. Among three investigations employing the 'Beijing Wantai' kit (Beijing Wantai Biological, Beijing, China), one also used the 'Xiamen InnoDx Biotech' kit (Xiamen InnoDx Biotech Co., Xiamen, China) to measure in particular IgG and total antibodies [4]. In the meta-analysis for calculating summary values, we only entered data for the 'Beijing Wantai' kit (instead of the 'Xiamen InnoDx Biotech' kit) from this study [4]. This was for consistency with the other studies found. Moreover, when tests with the nucleocapsid (N) protein and the spike protein antigens were both reported in a single study, we only entered data derived from assays with the N protein, because they generally showed better sensitivity.
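The per-study quantities described above (sensitivity, specificity, DOR, and predictive values at an assumed prevalence) can be sketched as follows. This is a minimal illustration, not the authors' code: the class and function names are hypothetical, the counts in the usage note are made up, and the 0.5 continuity correction for zero cells is an assumption (the article does not say how zero cells were handled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of index test versus reference method."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def sensitivity(t: TwoByTwo) -> float:
    # Proportion of reference-positive subjects that the test detects.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Proportion of reference-negative subjects that the test clears.
    return t.tn / (t.tn + t.fp)


def diagnostic_odds_ratio(t: TwoByTwo, correction: float = 0.5) -> float:
    # DOR = (TP * TN) / (FP * FN).
    # Assumption: add a continuity correction to every cell only when
    # at least one cell is zero, so that the ratio stays finite.
    cells = [t.tp, t.fp, t.fn, t.tn]
    if 0 in cells:
        cells = [c + correction for c in cells]
    tp, fp, fn, tn = cells
    return (tp * tn) / (fp * fn)


def predictive_values(sens: float, spec: float, prevalence: float):
    # Bayes' rule applied to an assumed true prevalence.
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence)
    )
    npv = (spec * (1 - prevalence)) / (
        spec * (1 - prevalence) + (1 - sens) * prevalence
    )
    return ppv, npv
```

For a made-up table with 85 true positives, 15 false negatives, 2 false positives and 198 true negatives, these functions give a sensitivity of 0.85, a specificity of 0.99 and a DOR of 561; at an assumed prevalence of 5%, the PPV is about 0.82 and the NPV about 0.99.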
To assess the robustness of the choice to enter N protein data, sensitivity analyses were conducted, whereby parameters were re-calculated by swapping data obtained from assays based on the N protein with those obtained from assays based on the spike protein. Pooled estimates of sensitivity and specificity were obtained through random-effects models after Freeman-Tukey double arcsine transformation. DORs were pooled by fitting a bivariate model, which takes into account the correlation between sensitivity and specificity and uses their log-transformed values as normally distributed variables. Between-studies heterogeneity was assessed using the I² statistic, which quantifies the percentage of variation attributable to heterogeneity rather than chance. An I² below 50% was considered as an indicator of acceptable heterogeneity.

Following the identification of 71 records, screening led to the exclusion of 61 of these, leaving nine studies included in the current systematic review [4-12] (Figure). Six of the nine studies were based on commercial assays including enzyme-linked immunosorbent assay (ELISA) or chemiluminescence microparticle immunoassay (CMIA)/chemiluminescent immunoassay (CLIA), and three on in-house tests, for detecting SARS-CoV-2 antibodies (Table 1 and Supplementary Table S1). Most studies (n = 8) evaluated sensitivity and specificity separately for IgG and IgM, while only some (n = 4) reported those values for total antibodies. Only one study tested IgA [10]. Real-time RT-PCR was always used as the reference method for sensitivity, while the definition of patients testing negative varied across studies (Supplementary Table S2). The reviewed studies had sample sizes ranging between 46 and 436 patients. For IgM, sensitivity ranged from 68% in the study by Liu et al. [7], which tested 370 patients, to 100% in two smaller studies (46 and 84 patients) [8, 9].
For IgM testing, the PPV had lowest values of 19% to 52% (in the 5% and 20% true-prevalence scenarios, respectively) in the study by Lin et al. (n = 159 patients) [5], while it was 100% in all scenarios in another study [12] and in the largest study (n = 314 patients) by Liu et al. [6]. For IgG, the PPV ranged between 47% and 81% (depending on the assumed prevalence) in the study by Lassaunière et al. (n = 112 patients) [10], while it amounted to 100% in other studies [4, 6, 8, 9, 12] as well as in the one by Lou et al. with the largest number of patients (n = 380) [4]. The NPV fell in the range 96-100% for all IgG and IgM kits when the prevalence was assumed to be 10% (the lower limit of the range became 98% and 92% for the 5% and 20% true-prevalence scenarios, respectively) as shown in Table 1.

Figure footnotes: a Including PubMed, medRxiv and bioRxiv databases. b Including reference lists of reports found through public databases and technical manuals of serological tests mentioned in these reports.

Table 1. Main characteristics of studies included in the systematic review and meta-analysis of the performance of serological tests for SARS-CoV-2, along with test sensitivity and specificity and positive and negative predictive values assuming a true COVID-19 prevalence of 5%, 10% and 20% in the population tested, as at 25 April 2020 (n = 9 studies). SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; sens: sensitivity; spec: specificity. a PPV-5, PPV-10, and PPV-20 are PPVs at a hypothesised prevalence of 5%, 10% and 20%. b NPV-5, NPV-10, and NPV-20 are NPVs at a hypothesised prevalence of 5%, 10% and 20%. c Commercial kit.

Meta-analysis yielded a summary sensitivity of 82% (95% CI: 75-88%) for IgM, and 85% for both IgG (95% CI: 73-93%) and total antibodies (95% CI: 74-94%) (Table 2 and Supplementary Figure S1-S6). Pooled specificity was 98% (95% CI: 92-100%) for IgM and 99% (95% CI: 98-100%) for both IgG and total antibodies.
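To make the pooling step behind such summary estimates concrete, the sketch below pools proportions after a Freeman-Tukey double arcsine transformation and reports I². It rests on assumptions the article does not confirm: a DerSimonian-Laird estimator for the between-study variance, the simple back-transformation sin²(t/2) rather than Miller's inverse, and hypothetical function names. It is not meant to reproduce the article's figures.

```python
import math


def freeman_tukey(events: int, n: int):
    """Double arcsine transform of events/n and its approximate variance."""
    t = math.asin(math.sqrt(events / (n + 1))) + math.asin(
        math.sqrt((events + 1) / (n + 1))
    )
    return t, 1.0 / (n + 0.5)


def pool_random_effects(studies):
    """Pool (events, n) pairs; return (pooled proportion, tau^2, I^2 in %).

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    transformed = [freeman_tukey(x, n) for x, n in studies]
    ys = [t for t, _ in transformed]
    vs = [v for _, v in transformed]

    # Fixed-effect (inverse-variance) mean, needed for Cochran's Q.
    w = [1.0 / v for v in vs]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(studies) - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled value on the transformed scale.
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled_t = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Assumption: simple back-transformation to the proportion scale.
    pooled_p = math.sin(pooled_t / 2.0) ** 2
    return pooled_p, tau2, i2
```

Calling `pool_random_effects([(80, 100), (80, 100), (80, 100)])` on three identical made-up studies returns a pooled proportion close to 0.80 with I² equal to 0, whereas two discordant studies such as `[(50, 100), (95, 100)]` give an I² well above the 50% threshold used in the article.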
Mostly due to the low proportion of false positives, the pooled DOR was generally very high (ca 2,800 for IgM, and ca 1,300 for IgG and total antibodies). Both sensitivity and specificity were 93% for the only assay testing IgA.

While some SARS-CoV-2 serological tests reported an excellent ability to discriminate between seroconverted and non-seroconverted individuals, others showed markedly lower diagnostic accuracy. In particular, the pooled sensitivity for all types of antibodies was unsatisfactory (82-85%), as a substantial fraction (one sixth on average) of seroconverted individuals would be incorrectly classified as non-seroconverted. Specificity was generally very high (≥ 98%), yet this may not suffice to guarantee satisfactory real-world performance in areas with a very low prevalence of infected individuals. A specificity just less than perfect (99%) would in fact produce a PPV ranging between 76% and 88% when combined with a true prevalence equal to 5%, meaning that around one fifth of those labelled as seroconverted would in reality be false positives. According to the World Health Organization (WHO), 2-3% of the global population may have been infected by the end of the first epidemic wave [13], thus the PPV in most areas could indeed be much lower than in our simulations.

Further reasons for concern lie in the low number of patients on whom some estimates are based, the variability in terms of the gold standard used to define sensitivity and specificity, the possible heterogeneity of testing procedures (which should be harmonised internationally to ensure comparability), the fact that some of the included studies were not, or had not yet been, peer-reviewed [4, 5, 8-12], and, above all, the uncertainty as to whether positivity according to the test means that effective protection against reinfection has been established [14, 15].
Of note, some of the above factors may have contributed to widening the range of reported sensitivity and specificity between studies and, in turn, to the high heterogeneity observed in most of the pooled results. Our choice to consider the N protein instead of the spike protein was justified by the generally better sensitivity for the former, but can be questioned as the latter is generally more likely to induce neutralising antibodies.

While the available serological tests included here can be used for research purposes, our data suggest that their use for large-scale prevalence surveys (or to grant 'immunity passports', which could possibly entail exemption from use of personal protective equipment for healthcare personnel, and from face masks and social distancing measures for the general population) currently appears justified only in hard-hit regions, and only for tests showing very high diagnostic accuracy, while they should be used with caution elsewhere. Moreover, issues of cost, speed, and availability should also be taken into account when planning large seroprevalence surveys, as well as the medical and non-medical costs of diagnostic errors. Finally, SARS-CoV-2 serological tests are being developed at a fast pace, and the conclusions of our report may need revision in the coming months, also depending on the further spread of the pandemic.
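The prevalence argument made in the discussion can be illustrated by counting expected test outcomes in a survey. The sketch below is an illustration under stated assumptions, not part of the original analysis: the function names and the survey size of 10,000 are hypothetical, and the inputs in the usage note are the pooled IgG point estimates (85% sensitivity, 99% specificity) taken at face value, ignoring their confidence intervals.

```python
def expected_counts(n_tested: float, prevalence: float,
                    sens: float, spec: float) -> dict:
    """Expected 2x2 counts when n_tested people are screened."""
    infected = n_tested * prevalence
    not_infected = n_tested - infected
    tp = infected * sens            # seroconverted, correctly flagged
    fn = infected - tp              # seroconverted, missed
    tn = not_infected * spec        # non-seroconverted, correctly cleared
    fp = not_infected - tn          # non-seroconverted, wrongly flagged
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp, "ppv": tp / (tp + fp)}


def min_prevalence_for_ppv(target_ppv: float, sens: float,
                           spec: float) -> float:
    """Smallest prevalence at which the PPV reaches target_ppv.

    Obtained by solving PPV = s*p / (s*p + (1 - spec)*(1 - p)) for p.
    """
    false_positive_rate = 1.0 - spec
    if false_positive_rate == 0.0:
        return 0.0  # a perfectly specific test has PPV = 1 at any prevalence
    numerator = target_ppv * false_positive_rate
    return numerator / (sens * (1.0 - target_ppv) + numerator)
```

With 10,000 people tested at a prevalence of 3%, `expected_counts(10_000, 0.03, 0.85, 0.99)` gives about 255 true positives against about 97 false positives, i.e. a PPV of roughly 72%. Under the same assumed test characteristics, `min_prevalence_for_ppv(0.95, 0.85, 0.99)` indicates that a PPV of 95% would require a prevalence of about 18%.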
Table 2. Summary estimates of sensitivity and specificity, with 95% confidence intervals, of the serological tests for SARS-CoV-2 included in this systematic review, as at 25 April 2020

Federica Bellerba is a PhD student at the European School of Molecular Medicine (SEMM), Milan, Italy.

Funding: No particular funding was received for this study.

Conflict of interest: None declared.

Saverio Caini: conceptualisation, investigation, methodology, writing original draft.
Federica Bellerba: conceptualisation, data curation, formal analysis, investigation, methodology, writing-review and editing.
Federica Corso: data curation, formal analysis, investigation, methodology, software, writing-review and editing.
Angélica Díaz-Basabe: conceptualisation, investigation, writing-review and editing.
Gioacchino Natoli: conceptualisation, investigation, methodology, validation, writing-review and editing.
John Paget: methodology, validation, writing original draft.
Federica Facciotti: conceptualisation, formal analysis, methodology, validation, writing-review and editing.
Simone Pietro De Angelis: conceptualisation, formal analysis, methodology, validation, writing-review and editing.
Sara Raimondi: conceptualisation, data curation, formal analysis, methodology, software, writing-review and editing.
Domenico Palli: conceptualisation, supervision, validation, writing-review and editing.
Luca Mazzarella: conceptualisation, supervision, validation, writing-review and editing.
Pier Giuseppe Pelicci: conceptualisation, funding acquisition, investigation, supervision, validation, writing-review and editing.
Paolo Vineis: conceptualisation, funding acquisition, methodology, supervision, validation, writing-review and editing.
Sara Gandini: conceptualisation, data curation, formal analysis, investigation, methodology, project administration, software, visualisation, writing-review and editing.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0) Licence. You may share and adapt the material, but must give appropriate credit to the source, provide a link to the licence and indicate if changes were made. Any supplementary material referenced in the article can be found in the online version.

References
1. Covid-19: identifying and isolating asymptomatic people helped eliminate virus in Italian village.
2. Q&A: Similarities and differences - COVID-19 and influenza. Geneva: WHO.
3. The diagnostic odds ratio: a single indicator of test performance.
4. Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset.
5. Evaluations of serological test in the diagnosis of 2019 novel coronavirus (SARS-CoV-2) infections during the COVID-19 outbreak.
6. Evaluation of Nucleocapsid and Spike Protein-Based Enzyme-Linked Immunosorbent Assays for Detecting Antibodies against SARS-CoV-2.
7. Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019.
8. SARS-CoV-2 IgG ELISA Kit (DEIASL019).
9. Novel Coronavirus COVID-19 IgG ELISA Kit. Product information.
10. Evaluation of nine commercial SARS-CoV-2 immunoassays.
11. Anti-SARS-Cov-2 Total Reagent Pack. Instructions for use.
12. Evaluation of antibody testing for SARS-Cov-2 using ELISA and lateral flow immunoassays.
13. Coronavirus disease (COVID-2019) press briefing 20.
14. Coronavirus disease (COVID-2019) press briefing 17.
15. "Immunity passports" in the context of COVID-19. Geneva: WHO.