key: cord-0846169-3p9yc6ti
authors: Burstyn, Igor; Goldstein, Neal D.; Gustafson, Paul
title: It can be dangerous to take epidemic curves of COVID-19 at face value
date: 2020-06-23
journal: Can J Public Health
DOI: 10.17269/s41997-020-00367-6
sha: 34ff0cd9c98c40f72a8624cc658cb0e3fa72101c
doc_id: 846169
cord_uid: 3p9yc6ti

During an epidemic with a new virus, we depend on modelling to plan the response: but how good are the data? The aim of our work was to better understand the impact of misclassification errors in the identification of true cases of COVID-19 on epidemic curves. Data originated from Alberta, Canada (available on 28 May 2020). There is presently no information on the sensitivity (Sn) and specificity (Sp) of the laboratory tests used in Canada for the causal agent of COVID-19. Therefore, we examined the best attainable performance in other jurisdictions and for similar viruses. This suggested perfect Sp and Sn of 60–95%. We used these values to re-calculate epidemic curves to visualize the potential bias due to imperfect testing. If sensitivity improved over time, the observed and adjusted epidemic curves likely fall within the 95% confidence intervals of the observed counts. However, bias in the shape and peak of the epidemic curves can be pronounced if sensitivity either degrades or remains poor in the 60–70% range. These issues are minor early in the epidemic, but hundreds of undiagnosed cases are likely later on. It is therefore hazardous to judge the progress of the epidemic from observed epidemic curves unless the quality of testing is better understood.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.17269/s41997-020-00367-6) contains supplementary material, which is available to authorized users.

It is well known that outcome misclassification can bias epidemiologic results, yet it is infrequently quantified and adjusted for. In the context of infectious disease outbreaks, such as the COVID-19 pandemic of 2019-2020, false positive diagnoses may lead to a waste of limited resources, such as testing kits and hospital beds, and to loss of healthcare workforce. On the other hand, false negative diagnoses contribute to uncontrolled spread of contagion. In an ongoing epidemic, where the test sensitivity (Sn) and specificity (Sp) of case ascertainment are fixed, the prevalence of the outcome (infection) determines whether false positives or false negatives dominate. Poor sensitivity is more tolerable when prevalence is low early in the epidemic, but less so as the number of infected persons grows (illustrated with a short calculation below).

At present, the accuracy of testing for SARS-CoV-2 viral infection, the causal agent of COVID-19, is unknown in Canada. All available research suggests that assuming near-perfect Sp is justified, but Sn can range from 60% to 95% (Binsaeed et al. 2011; COVID-19 Science Report: Diagnostics 2020; Fang et al. 2020; Ai et al. 2020; Konrad et al. 2020; US FDA 2020; He et al. 2020). Full details of our efforts to learn more about Sn and Sp are given in Burstyn et al. (2020). We conducted a Monte Carlo (probabilistic) sensitivity analysis of the impact of the plausible extent of this misclassification on bias in epidemic curves in Alberta, Canada. Data on observed counts of "confirmed" positive diagnoses versus presumed incident dates ("date reported to Alberta Health") were obtained on May 28, 2020 from "Figure 3: COVID-19 cases in Alberta by day status", posted on https://www.alberta.ca/stats/covid-19-alberta-statistics.htm under the "Case counts" tab.
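Before turning to those data, the point above about fixed test characteristics can be made concrete with a short calculation: with Sp effectively perfect and Sn fixed, the expected number of missed infections scales with prevalence. The R sketch below uses a hypothetical test volume, Sn, and prevalence values (not Alberta data) purely to illustrate this reasoning.

```r
# Expected false negatives (FN) and false positives (FP) among tested persons
# for fixed test characteristics. The test volume, Sn, and prevalence values
# below are hypothetical and for illustration only.
sn <- 0.75        # assumed sensitivity
sp <- 1.00        # assumed (near-)perfect specificity
n_tested <- 10000 # hypothetical number of persons tested

for (prev in c(0.001, 0.05, 0.20)) {       # rising prevalence among those tested
  fn <- n_tested * prev * (1 - sn)         # infected persons missed by the test
  fp <- n_tested * (1 - prev) * (1 - sp)   # uninfected persons falsely positive
  cat(sprintf("prevalence %.3f: expected FN = %.0f, expected FP = %.0f\n",
              prev, fn, fp))
}
```

With (1 − Sp) near zero, false positives stay negligible at any prevalence, whereas the count of false negatives grows in direct proportion to prevalence; this is why imperfect Sn becomes progressively more consequential as the epidemic grows.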
Samples (e.g., nasopharyngeal swab, bronchial wash) undergo nucleic acid testing that uses primers/probes targeting the E (envelope protein; Corman et al. 2020) and RdRp (RNA-dependent RNA polymerase; qualitative detection method developed at the Alberta Public Health Laboratory, Alberta Precision Laboratories (APL)) genes of the COVID-19 virus. The relevant data notes are reproduced in full here: "Data sources: The Provincial Surveillance Information system (PSI) is a laboratory surveillance system which receives positive results for all Notifiable Diseases and diseases under laboratory surveillance from Alberta Precision Labs (APL). The system also receives negative results for a subset of organisms such as COVID-19. … Disclaimer: The content and format of this report are subject to change. Cases are under investigation and numbers may fluctuate as cases are resolved. Data included in the interactive data application are up-to-date as of midday of the date of posting."

The data were digitized as shown in Table 1. Counts from May 28, 2020 were excluded because the data notes state that the "current day is partial." Dates are the "date reported to Alberta Health." For each observed count of incident cases, we estimated the true count as (observed count)/Sn, under the assumption that specificity is indistinguishable from perfect. Because values of Sn are neither estimated nor known, we simulated values of Sn as arising from a Beta distribution, with means ranging from 0.60 to 0.95 and a fixed standard deviation of 0.05. The results of 10 simulation realizations for each scenario are plotted in Fig. 1. We assumed that Sn either improved (realistic best case: green) or degraded (pessimistic worst case: grey). Our analysis reflects only systematic errors, for illustrative purposes and under the common assumption (and experience) that they dwarf random errors. Calculations were performed in R (R Core Team 2019) and the code is available via Burstyn et al. (2020); an illustrative sketch of the adjustment appears after the results below.

Figure 1 shows that, as expected, if sensitivity improves over time with the natural refinement of laboratory procedures (green lines), then the true epidemic curve is expected to be flatter than the observed one. In that scenario, the observed and true curves may well fall within the 95% confidence intervals around the observed counts (blue lines). If sensitivity decreases over time (grey lines), e.g., due to increased pressure of work or scaling up of the analyses with less experienced staff, then the true epidemic curve is expected to be steeper than the observed one. In either scenario, there can be an under-counting of cases by nearly a factor of two, most apparent as incidence grows, such that on April 23, 2020, there may have been almost 500 cases vs. 336 observed. Even at the tail end of the epidemic, on May 27, 2020, our simulation predicts that there may have been as many as 45 to 50 cases vs. 27 observed. This is alarming, because misdiagnosed patients who have not self-isolated can spread infection unimpeded, and it is impossible to know who they are among the symptomatic persons who tested negative around that time.
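The adjustment described above can be reproduced in outline with a few lines of R. This is a minimal sketch, not the authors' released code (which is available via Burstyn et al. 2020); the daily counts and the placement of the break-points below are hypothetical assumptions for illustration.

```r
# Minimal sketch of the Monte Carlo adjustment described above.
# Not the authors' released code; the daily counts and break-point
# placement are hypothetical, for illustration only.
set.seed(1)

# Hypothetical observed daily case counts (the real series is in Table 1)
observed <- c(5, 12, 30, 80, 150, 220, 336, 280, 190, 120, 60, 27)

# Convert a (mean, SD) pair into Beta distribution shape parameters
beta_shapes <- function(m, s) {
  nu <- m * (1 - m) / s^2 - 1
  c(m * nu, (1 - m) * nu)
}

# "Improving" scenario: mean Sn steps from 0.60 to 0.75 to 0.95 at break-points
mean_sn <- c(rep(0.60, 4), rep(0.75, 4), rep(0.95, 4))

# One realization: draw Sn for each day, then adjust counts as observed / Sn
adjust_counts <- function(observed, mean_sn, sd_sn = 0.05) {
  sn <- sapply(mean_sn, function(m) {
    p <- beta_shapes(m, sd_sn)
    rbeta(1, p[1], p[2])
  })
  observed / sn
}

# Ten realizations, as plotted in Fig. 1
realizations <- replicate(10, adjust_counts(observed, mean_sn))
matplot(realizations, type = "l", lty = 1, col = "darkgreen",
        xlab = "Day", ylab = "Case count")
lines(observed, lwd = 2, col = "blue")  # observed epidemic curve
```

The "degrading" scenario simply reverses the sequence of break-point means (0.95, 0.75, 0.60); drawing Sn from a distribution rather than fixing it is what produces the fan of realizations around each adjusted curve.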
Alberta initially restricted COVID-19 testing to "any person exhibiting any symptom of COVID-19, all close contacts of confirmed COVID-19 cases, all workers and/or residents at specific outbreak sites, all workers and residents at long-term care and level 4 supportive living facilities …, all patients admitted to continuing care or transferred between continuing care and hospital settings" (Government of Alberta 2020). Currently, testing is available to any person who wishes to be tested (Government of Alberta 2020). Consequently, the sensitivity of surveillance has likely improved over time, with the number of test-positive persons more closely resembling the occurrence of the disease in the population. However, even as the capture of truly infected persons for testing improves, the sensitivity of the laboratory tests themselves, which is the focus of our work, may or may not have improved, leaving bias of unknown magnitude in the surveillance data.

[Fig. 1 Uncertainty in the epidemic curve of COVID-19 on May 28, 2020 in Alberta, Canada, due to imperfect sensitivity (Sn) that varies in time; standard deviation of Sn = 5%; assumes specificity of 100% (10 simulation realizations plotted). Legend: observed counts (C*) and 95% CI; Sn increases from 60% to 75% to 95% at break-points; Sn decreases from 95% to 75% to 60% at break-points.]

We conclude that it is of paramount importance to validate laboratory tests and to share this knowledge, especially as the epidemic matures into its full force. In the absence of such knowledge, the observed epidemic curves can be dangerously misleading for surveillance purposes.

References
Ai et al. (2020). Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
Binsaeed et al. (2011). A validation study comparing the sensitivity and specificity of the new Dr. KSU H1N1 RT-PCR kit with real-time RT-PCR for diagnosing influenza A (H1N1).
Burstyn et al. (2020). Towards reduction in bias in epidemic curves due to outcome misclassification through Bayesian analysis of time-series of laboratory test results: case study of COVID-19 in Alberta.
R Core Team (2019). R: A language and environment for statistical computing.
Corman et al. (2020). Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
COVID-19 Science Report: Diagnostics (2020). NUS Saw Swee Hock School of Public Health.
Fang et al. (2020). Sensitivity of chest CT for COVID-19: comparison to RT-PCR.
On the importance of early testing even when imperfect in a pandemic such as COVID-19.
He et al. (2020). Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China.
Konrad et al. (2020). Rapid establishment of laboratory diagnostics for the novel coronavirus SARS-CoV-2 in Bavaria.
US FDA (2020). Accelerated emergency use authorization (EUA) summary: COVID-19 RT-PCR test (Laboratory Corporation of America).

Acknowledgements The authors thank Isaac R. Burstyn and Marguerite R. Burstyn for their diligence in extracting the Alberta data from the online charts. The authors thank Jean-Michel Galarneau for translating the abstract into French.

Funding information Research reported in this publication was partially supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number K01AI143356 (to NDG).

Conflict of interest The authors declare that they have no conflict of interest.