key: cord-318426-kv7aa0og authors: Kritsotakis, Evangelos I. title: On the importance of population-based serological surveys of SARS-CoV-2 without overlooking their inherent uncertainties date: 2020-05-22 journal: nan DOI: 10.1016/j.puhip.2020.100013 sha: doc_id: 318426 cord_uid: kv7aa0og
Abstract The SARS-CoV-2 epidemic has caused an unprecedented public health situation, and more than ever it is important to be well informed on methods to monitor and analyse the progression of the epidemic. This brief note aims to explain the purpose of conducting large-scale serological surveys of SARS-CoV-2 to define the landscape of population immunity, without overlooking the inherent uncertainty stemming from sampling design and diagnostic validity. The note concludes with a succinct appendix of simple statistical methods for estimating prevalence from random population samples using imperfect diagnostic tests.
To date we know little about how far the SARS-CoV-2 virus has spread in the general population. Our great uncertainty stems from the fact that the virus spreads easily between people but many COVID-19 infections are mild or subclinical [1] and therefore go unnoticed. The actual number of people already exposed to SARS-CoV-2 may be much higher than the number of confirmed COVID-19 patients who have been seriously ill and/or tested positive for SARS-CoV-2. Most experts would agree that it is reasonable to assume that reported numbers understate infections by a factor of at least 10, but a recent report suggests that the actual number of infections may be as much as 85 times higher than that reported [2]. From a public health standpoint, knowing how many people, and which ones, have already been exposed to SARS-CoV-2 gives a clearer picture of how widespread the virus is in local populations. This is extremely useful because public health measures depend on how far the coronavirus has already penetrated into the general population.
In the absence of precise estimates from a random sample of the general population, we are essentially operating in the dark and likely to continue taking restrictive measures without being able to assess their effectiveness. Population-based serological surveys can generate much-needed data [3]. They use serological tests to examine a large number of blood samples from people without a confirmed SARS-CoV-2 infection to detect signs that they were once infected with the virus. That is, serological tests detect the body's response to the virus but not the virus itself (as opposed to molecular tests). Therefore, they cannot be used early in infection, before the patient's body has developed an antibody response. Thus, serological tests are of limited help to clinicians diagnosing infection in individual patients. However, they are extremely useful for epidemiological purposes, to understand the immunity landscape of the population at large. Estimating the true rate of SARS-CoV-2 infection allows epidemiologists to predict the likely future course of the epidemic in specific locations or populations and helps public health authorities to better design interventions to control the epidemic. This is because we expect, although no one is entirely certain yet, that antibodies to the virus will provide immunity, that is, protection for some period of time. Detecting people who are potentially immune to SARS-CoV-2 could even play an important role in deciding when and how social distancing restrictions are lifted. The results of serological surveys can also be useful in guiding strategic decisions on essential staffing in hospitals and other health care facilities, for example by assigning to the front line those who are probably immune. It is therefore desirable to conduct targeted serological studies of healthcare workers. The results of serological surveys come with uncertainty, but it is important to note that this uncertainty can be assessed.
Uncertainty stems from two main sources: (a) sampling variability, that is, the fact that we examine only a small part of the overall population, and (b) diagnostic validity, that is, the imperfect accuracy of the immunoassay in detecting the presence or absence of antibodies. Therefore, it is critical that serological surveys are based on both appropriate sampling designs that ensure population representativeness and accurate serological tests. Due to urgency and demand, several serological tests have recently been developed and placed on the market. Manufacturers' own data [4] and independent evaluations [5] indicate that sufficiently accurate tests are currently available: their probability of successfully detecting people exposed to SARS-CoV-2 (sensitivity) exceeds 90% a few days after infection and their success in detecting non-infected individuals (specificity) reaches 99%. Available serological tests are not perfect but are acceptable for use in the context of surveying populations for SARS-CoV-2 antibodies, because survey estimates can be corrected for imperfect diagnostic performance. For example, let us assume that a serological survey of $n = 1{,}000$ people found that $a = 100$ are positive for SARS-CoV-2 antibodies, meaning that $P_A = a/n = 10\%$ were apparently infected. The test used was imperfect, say with known sensitivity $S_e = 92\%$ and specificity $S_p = 98\%$, but we can correct our estimate for these inaccuracies. The corrected estimate of the true prevalence of SARS-CoV-2 turns out to be $P_T = 8.9\%$. We can express the uncertainty associated with this estimate using a 95% confidence interval, which in this case is from 6.7% to 11.1%. In this way, we get a fairly precise idea of the extent to which the virus has spread in the population.
Large-scale seroprevalence surveys are an important tool in combating COVID-19, as they can provide much-needed estimates of the fraction of the population with antibodies against SARS-CoV-2.
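For the worked survey example above (1,000 people tested, 100 positive, sensitivity 92%, specificity 98%), the arithmetic behind the corrected estimate can be written out explicitly. This is a sketch that assumes the correction is the Rogan-Gladen formula summarised in the appendix; the symbols $P_A$, $P_T$, $S_e$ and $S_p$ are notation chosen here for the apparent prevalence, corrected prevalence, sensitivity and specificity.

```latex
\[
P_T \;=\; \frac{P_A + S_p - 1}{S_e + S_p - 1}
    \;=\; \frac{0.10 + 0.98 - 1}{0.92 + 0.98 - 1}
    \;=\; \frac{0.08}{0.90}
    \;\approx\; 0.089
\]
```

The numerator removes the expected share of false positives ($1 - S_p = 2\%$) from the apparent prevalence, and the denominator rescales for the infections the test misses.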
The quality of the antibody prevalence estimates depends on the sampling design and the diagnostic accuracy of serological tests. This appendix provides a summary of simple methods to estimate prevalence using imperfect diagnostic tests.
Assume that the prevalence of infection $(\pi_T)$ in the target population is a fixed, but unknown quantity. To estimate $\pi_T$, we do a diagnostic test on $n$ randomly sampled individuals from the target population and $a$ individuals test positive. However, the test is imperfect, with sensitivity $(S_e)$ and/or specificity $(S_p)$ that are below 100%. Thereby, the apparent prevalence $\pi_A$ differs from the true prevalence:
$$\pi_A = \pi_T S_e + (1 - \pi_T)(1 - S_p).$$
Let $P_T$ denote the true prevalence proportion that we would observe if the diagnostic test was perfect. It is easy to confirm that the apparent prevalence $P_A = a/n$ and the true prevalence $P_T$ are related by:
$$P_T = \frac{P_A + S_p - 1}{S_e + S_p - 1}.$$
$P_T$ is known as the Rogan-Gladen estimator [6]. Assuming $S_e$ and $S_p$ are known with certainty, $P_T$ is an unbiased estimate of the true population prevalence $\pi_T$. It is also a maximum likelihood estimate of $\pi_T$ [7]. Note that $P_T$ is meaningful under the reasonable requirement that the diagnostic test is better than the flip of a coin $(S_e + S_p > 1)$. Nevertheless, $P_T$ is not guaranteed to lie between 0 and 1 (especially when $P_A$ is very small) and a "clipped" estimate may need to be used: $\tilde{P}_T = \min[\max(P_T, 0), 1]$.
The standard error of $P_T$ is:
$$SE(P_T) = \frac{SE(P_A)}{S_e + S_p - 1},$$
where $SE(P_A)$ depends on the sampling design used. For a simple random sample from a large population:
$$SE(P_A) = \sqrt{\frac{P_A (1 - P_A)}{n}}.$$
For large $n$, the statistic $(P_T - \pi_T)/SE(P_T)$ can be treated as a standard normal variate. Thus, an approximate 95% confidence interval for $\pi_T$ is obtained as:
$$P_T \pm 1.96 \, SE(P_T).$$
The "clipped" estimate $\tilde{P}_T = \min[\max(P_T, 0), 1]$ is asymptotically equivalent to $P_T$ [8], so the large sample theory is valid in that case too.
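The estimator, its standard error and the large-sample interval described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `rogan_gladen` is hypothetical, sensitivity and specificity are treated as known constants, simple random sampling from a large population is assumed, and clipping the interval limits to [0, 1] is an added convenience.

```python
import math


def rogan_gladen(a, n, se, sp, z=1.96):
    """Return (clipped estimate, lower limit, upper limit) for true prevalence.

    a  : number of positive tests
    n  : number of individuals tested
    se : test sensitivity, assumed known
    sp : test specificity, assumed known
    z  : normal quantile (1.96 for an approximate 95% interval)
    """
    if se + sp <= 1:
        raise ValueError("the test must satisfy se + sp > 1")
    p_a = a / n                                # apparent prevalence
    j = se + sp - 1                            # denominator of the correction
    p_t = (p_a + sp - 1) / j                   # Rogan-Gladen estimate
    se_pa = math.sqrt(p_a * (1 - p_a) / n)     # SE under simple random sampling
    se_pt = se_pa / j                          # SE of the corrected estimate

    def clip(x):
        # Keep values inside the valid range for a proportion.
        return min(max(x, 0.0), 1.0)

    return clip(p_t), clip(p_t - z * se_pt), clip(p_t + z * se_pt)


est, lower, upper = rogan_gladen(a=100, n=1000, se=0.92, sp=0.98)
print(f"{est:.3f} ({lower:.3f}, {upper:.3f})")
```

For the survey example in the main text this prints roughly 0.089 (0.068, 0.110), which is close to the interval quoted there; small differences can arise from the interval method used.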
Essentially, for fixed $S_e$ and $S_p$, a 95% confidence interval $[l, u]$ for the apparent prevalence $\pi_A$ can be converted to a 95% confidence interval for the true prevalence $\pi_T$ by
$$\left[ \frac{l + S_p - 1}{S_e + S_p - 1}, \; \frac{u + S_p - 1}{S_e + S_p - 1} \right].$$
Consequently, in situations where asymptotic assumptions are not met (e.g. small sample size and/or very low prevalence), exact methods (e.g. Clopper-Pearson) can be applied to calculate confidence limits for the apparent prevalence that can be converted to confidence limits for the true prevalence using the formula above [9].
If $S_e$ and $S_p$ are not known with certainty, but independent binomial estimates are available from a validation study on persons whose infection status is known, then $P_T$ is biased but to a much lesser degree than $P_A$ [6]. In that case, a more valid quantification of standard error that captures the uncertainty in $S_e$ and $S_p$ is given by:
$$SE(P_T) = \frac{1}{S_e + S_p - 1} \sqrt{ \frac{P_A (1 - P_A)}{n} + \frac{S_e (1 - S_e)}{n_1} P_T^2 + \frac{S_p (1 - S_p)}{n_2} (1 - P_T)^2 },$$
where $n_1$ and $n_2$ denote the numbers of infected and non-infected individuals in the validation study [6]. A double sampling design that partly utilises a more definitive diagnostic test can also be used [10]. Using a binomial distribution model for the number of positive tests $a$ out of the $n$ individuals tested, a Bayesian approach may also be used to estimate $\pi_T$; it does not yield explicit formulae but is computationally easy [11, 12].
1. The SARS-CoV-2 outbreak: What we know
2. COVID-19 Antibody Seroprevalence
3. Use of serological surveys to generate key insights into the changing global landscape of infectious disease
4. Covid-19 antibody tests face a very specific problem. Evaluate Vantage
5. Antibody tests in detecting SARS-CoV-2 infection: a meta-analysis
6. Estimating prevalence from the results of a screening test
7. A three-population model for sequential screening for bacteriuria
8. The Statistical Precision of Medical Screening Procedures: Application to Polygraph and AIDS Antibodies Test Data
9. Exact confidence limits for prevalence of a disease with an imperfect diagnostic test
10. Sampling of Populations: Methods and Applications: Fourth Edition
11. Estimating Prevalence Using an Imperfect Test
12. A tutorial in estimating the prevalence of disease in humans and animals in the absence of a gold standard diagnostic
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
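Two of the appendix methods, converting exact confidence limits for the apparent prevalence to the true-prevalence scale and widening the standard error when sensitivity and specificity come from a validation study, can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: all function names are hypothetical, the Clopper-Pearson limits are found by simple bisection on the binomial distribution using only the standard library, and the standard error is the delta-method form with binomial variances for the validation estimates.

```python
import math


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(a, n, alpha=0.05):
    """Exact (Clopper-Pearson) limits for the apparent prevalence a / n."""
    def root(g):
        # Bisection for an increasing function g with g(0) < 0 < g(1).
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if a == 0 else root(
        lambda p: 1 - _binom_cdf(a - 1, n, p) - alpha / 2)
    upper = 1.0 if a == n else root(
        lambda p: alpha / 2 - _binom_cdf(a, n, p))
    return lower, upper


def exact_true_prevalence_ci(a, n, se, sp, alpha=0.05):
    """Convert exact limits for apparent prevalence to the true-prevalence scale."""
    j = se + sp - 1
    l, u = clopper_pearson(a, n, alpha)
    clip = lambda x: min(max(x, 0.0), 1.0)   # clipping is an added convenience
    return clip((l + sp - 1) / j), clip((u + sp - 1) / j)


def se_with_validation(a, n, se, sp, n1, n2):
    """Standard error of the corrected estimate when se and sp are estimated
    from n1 infected and n2 non-infected validation subjects."""
    p_a = a / n
    j = se + sp - 1
    p_t = (p_a + sp - 1) / j
    var = (p_a * (1 - p_a) / n
           + p_t ** 2 * se * (1 - se) / n1
           + (1 - p_t) ** 2 * sp * (1 - sp) / n2) / j ** 2
    return math.sqrt(var)


print(exact_true_prevalence_ci(100, 1000, 0.92, 0.98))
print(se_with_validation(100, 1000, 0.92, 0.98, n1=100, n2=100))
```

With small validation samples the specificity term tends to dominate the standard error at low prevalence, which is why the second value printed is noticeably larger than the standard error obtained when the test characteristics are treated as known.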