key: cord-0732046-ffaoyu4u authors: Albani, V. V. L.; Loria, J.; Massad, E.; Zubelli, J. P. title: COVID-19 Underreporting and its Impact on Vaccination Strategies date: 2021-03-15 DOI: 10.1101/2021.03.11.21253404 sha: 4010799f3b53d679dfe104fd1766c7635641646a doc_id: 732046 cord_uid: ffaoyu4u

We present a novel methodology for estimating the stable rates of hospitalization and death related to Coronavirus Disease 2019 (COVID-19) using publicly available reports from several distinct communities. These rates are then used, together with the reported daily hospitalizations and deaths, to estimate underreported infections in the corresponding areas. The impact of underreported infections on vaccination strategies is estimated under different disease-transmission scenarios using a Susceptible-Exposed-Infective-Removed-like (SEIR-like) epidemiological model.

Underreporting of infectious disease cases poses a major challenge to the analysis of their epidemiological characteristics and dynamics. Without accurate numerical estimates, it is difficult to quantify precisely the proportions of severe and critical cases, as well as the mortality rate (1). Such estimates can be obtained, e.g., by testing for the presence of the virus. However, implementing such testing during an ongoing epidemic is a daunting task. This work therefore presents a methodology to estimate underreported infections based on approximations of the stable rates of hospitalization and death. To find these rates, we seek time periods in which the daily testing rate is sufficiently large relative to the population size and the proportion of positive tests is sufficiently small. Within such periods, we evaluate the daily empirical rates of hospitalization and death and look for rates that fluctuate around a mean value. This is done through a careful data analysis that produces several statistical indicators, which lead to the necessary correction.
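The period-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the minimum number of tests per 100,000 inhabitants is a hypothetical threshold, the 10% positivity cut-off mirrors the value used for Chicago and NYC, and "fluctuating around a mean value" is approximated here by a coefficient of variation below an arbitrary bound. The function names and toy data are invented for illustration.

```python
from statistics import mean, pstdev


def adequate_testing_days(tests, positives, population,
                          min_tests_per_100k=100.0, max_positivity=0.10):
    """Indices of days with enough tests per capita and low test positivity.

    min_tests_per_100k is a hypothetical threshold; max_positivity=0.10
    mirrors the 10% positivity cut-off used for Chicago and NYC.
    """
    days = []
    for t, (n_tests, n_pos) in enumerate(zip(tests, positives)):
        if n_tests == 0:
            continue
        tests_per_100k = 1e5 * n_tests / population
        if tests_per_100k >= min_tests_per_100k and n_pos / n_tests < max_positivity:
            days.append(t)
    return days


def stable_rate(events, cases, days, max_cv=0.25):
    """Mean of the daily rates events/cases over `days`, or None if unstable.

    Stability is approximated (an assumption) by requiring the coefficient
    of variation of the daily rates to be at most max_cv.
    """
    rates = [events[t] / cases[t] for t in days if cases[t] > 0]
    if len(rates) < 2:
        return None
    m = mean(rates)
    if m == 0:
        return None
    return m if pstdev(rates) / m <= max_cv else None


# Toy data (invented): five days of tests, positive tests, hospitalizations.
population = 2_693_976
tests = [5000, 6000, 5500, 800, 6200]
positives = [300, 420, 330, 200, 400]
hospitalizations = [30, 41, 34, 40, 39]

days = adequate_testing_days(tests, positives, population)
rate = stable_rate(hospitalizations, positives, days)
print(days, rate)
```

Day 3 is excluded because its testing volume is too low; the remaining daily hospitalization rates cluster near 10%, so a mean is returned. In practice the thresholds would have to be tuned per location, as the text notes for BA.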
A schematic representation summarizing the proposed methodology is given in Figure 1. Since COVID-19 severity strongly depends on age and gender (2-6), we evaluate the above-mentioned rates accounting for demography, to improve the accuracy of the estimated number of infections. These estimated infection numbers will be called corrections. They are evaluated from the empirical rates of hospitalization and death as follows: for a given day in the time series, the observed number of hospitalizations or deaths is divided by the corresponding stable rate, which yields the implied number of infections. For example, if on that day the reported hospitalization rate is one and the projected (stable) rate is one half, then the correction is twice the reported number of infections. As an important byproduct, we evaluate the impact of underreporting on the design of vaccination strategies: the larger the number of unaccounted infections, the larger the chance of vaccinating an already immune individual. This can restrict the capability of vaccination to reduce hospitalizations and deaths, as scenarios simulated with an SEIR-like model (7) show.

This version was posted March 15, 2021 (https://doi.org/10.1101/2021.03.11.21253404) as a medRxiv preprint, which was not certified by peer review. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

To estimate underreported infections, the formulas in Eq. (S.2) are applied, taking into account the daily COVID-19 cases. A graphical comparison between the observed and corrected numbers of infections for Chicago is shown in Figure 2. Table 1 presents the corrected and observed accumulated numbers of COVID-19 infections in Chicago during the period 01-Mar-2020 to 23-Dec-2020.
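The per-day correction rule described above (observed hospitalization rate of one versus a stable rate of one half) can be sketched as below. This assumes the correction is simply the day's hospitalizations (or deaths) divided by the stable rate; Eq. (S.2) in the Supplementary Materials is not reproduced in this section and may contain further terms, such as age stratification. The function names are hypothetical.

```python
def corrected_infections(events, stable_rate):
    """Infections implied by one day's hospitalizations or deaths.

    events / stable_rate equals reported_cases * observed_rate / stable_rate,
    where observed_rate = events / reported_cases.
    """
    if stable_rate <= 0:
        raise ValueError("stable_rate must be positive")
    return events / stable_rate


def correction_factor(reported_cases, events, stable_rate):
    """Ratio of corrected to reported infections for one day."""
    return corrected_infections(events, stable_rate) / reported_cases


# The example from the text: observed hospitalization rate of one
# (events == reported cases) and a projected stable rate of one half.
reported = 100
hospitalized = 100
print(corrected_infections(hospitalized, 0.5))         # 200.0
print(correction_factor(reported, hospitalized, 0.5))  # 2.0
```

Writing the correction as events divided by the stable rate makes clear that it does not depend on how many cases were reported, only on the severe outcomes and on the rate estimated in the well-tested period.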
To observe the effect of the corrections over time, we divided the period 01-Mar-2020 to 23-Dec-2020 into three sub-periods: 01-Mar-2020 to 31-Jul-2020, 01-Aug-2020 to 05-Oct-2020, and 06-Oct-2020 to 23-Dec-2020. Additional results using data from other places, as well as implementation details of the techniques used in this work, can be found in the Supplementary Materials. Corrections based on hospitalization rates are smaller than those obtained with death rates. This can be explained by the considerably larger in-hospital death rates observed during the outbreaks of March to May and of October to November. The estimates for 01-Mar-2020 to 31-Jul-2020 are larger than those for the other periods, suggesting that underreporting was more likely at the beginning of the pandemic. The corrections suggest that, for 01-Mar-2020 to 31-Jul-2020, the number of infections may have been 32% to 632% larger than reported. For the whole period 01-Mar-2020 to 23-Dec-2020, COVID-19 infections may have been 10% to 238% larger. Thus, between 8% and 25% of the population of Chicago could have been infected during the study period, instead of the observed proportion of 7.3%. The datasets from NYC do not include daily reports by age range or gender.
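As a quick arithmetic check of the Chicago figures above, applying the 10% and 238% increases to the observed share of 7.3% gives roughly 8% and 25% of the population. The helper below is illustrative only.

```python
def corrected_share(observed_share_pct, pct_increase):
    """Share of the population infected (in %) after a relative increase (in %)."""
    return observed_share_pct * (1.0 + pct_increase / 100.0)


observed = 7.3  # % of Chicago's population reported infected
low = corrected_share(observed, 10)    # about 8.03
high = corrected_share(observed, 238)  # about 24.67
print(f"{low:.1f}% to {high:.1f}%")    # 8.0% to 24.7%
```

The same relation reproduces the other locations' ranges, e.g. a 303% increase over BA's reported 4.53% gives about 18%.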
We considered two different periods to estimate the stable rates of hospitalization and death. The resulting corrected infections, given in Table S.3, represent 7.5% to 30% of the NYC population, instead of the observed proportion of 4.41%. For BA, unfortunately, the percentage of positive tests was mostly above 10% during the study period, which made the empirical analysis difficult. We therefore considered the period in which the positivity rate was below 20%. Table S.4 presents the estimated rates of hospitalization and death. Death rates for individuals younger than 60 years are similar to the corresponding rates observed in Chicago. For older individuals in BA, however, the death rates are considerably larger. The corrections in Table S.5 suggest infection numbers 3.4% to 303% larger than the notified cases, representing 4.7% to 18% of the estimated BA population for 2020, instead of the reported 4.53%. For MC, we could not identify a period in which the rates of death or hospitalization stabilized around mean values. We therefore used the rates estimated for Chicago to provide corrections. Using the age-range-specific death rates from Chicago seems to be the most accurate way to estimate underreported cases in other places, since the Chicago data satisfied the hypotheses made to find stable rates. The corrections are 44% to 681% larger than the observed cases, representing 5.5% to 30% of the estimated population of MC for 2015. Let us now turn to the impact of underreporting on the capacity of vaccination strategies to reduce hospitalizations and deaths. We consider two different scenarios.
In both cases, we assume that the proportion of the population in the recovered, exposed, or any infective compartment of the model in Eqs. (S.3)-(S.11) ranges from 5% to 30%. Moreover, in all cases only 5% of the population is observed (reported) as infected. This means that the probability of vaccinating someone who has already had contact with the virus is proportional to the percentage of the population in the exposed, non-hospitalized infective, and recovered compartments that was not included in the reports. Thus, in our simulations, if 5% of the population was infected, then 100% of the vaccinated individuals were susceptible, whereas if 30% of the population was infected, only 73.4% of the vaccinated individuals were susceptible. We also assume that the vaccine is 90% effective and that 0.5% of the population is vaccinated every day for 150 days. The hospitalization rate was also decreased in proportion to the number of underreported infections. Under contained spread, the transmission parameter among mildly infective individuals is set to βM = 0.23. Under uncontained transmission, βM is set to 0.44. The resulting accumulated numbers over the vaccination campaign, for both situations, are given in Table 2. The assumed size of this hypothetical population is 2,693,976 individuals. In Table 2 […] and deaths can decrease, indicating the achievement of herd immunity.
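The vaccination arithmetic above can be approximated with a static back-of-envelope sketch. It assumes that reported infections are never vaccinated and that doses are otherwise allocated at random, so the susceptible share of doses is (1 - infected) / (1 - reported). For 30% infected this gives 0.70/0.95, about 73.7%, close to the 73.4% obtained in the simulations; the simulated figure comes from the full compartmental model in Eqs. (S.3)-(S.11), which this sketch does not reproduce. Constant and function names are illustrative.

```python
POPULATION = 2_693_976   # size of the hypothetical population in the text
DAILY_FRACTION = 0.005   # 0.5% of the population vaccinated per day
CAMPAIGN_DAYS = 150
EFFICACY = 0.90          # vaccine efficacy assumed in the text


def susceptible_share(true_infected, reported_infected=0.05):
    """Share of vaccinated people who are still susceptible.

    Assumption: reported infections are never vaccinated and doses are
    otherwise allocated at random among everyone else.
    """
    eligible = 1.0 - reported_infected
    unreported = true_infected - reported_infected
    return (eligible - unreported) / eligible


daily_doses = POPULATION * DAILY_FRACTION
total_doses = daily_doses * CAMPAIGN_DAYS  # 75% of the population overall
for infected in (0.05, 0.30):
    share = susceptible_share(infected)
    # Doses that protect a susceptible person on the first day of the campaign.
    protected = daily_doses * EFFICACY * share
    print(f"{infected:.0%} infected: share={share:.3f}, protected/day={protected:.0f}")
```

Holding the share fixed is only valid at the start of the campaign: as susceptibles are vaccinated, the unvaccinated pool becomes increasingly dominated by already-infected people, which is one reason a dynamic model is needed for Table 2.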
Therefore, estimating underreporting helps to quantify and explain possible limitations of vaccination strategies. This work proposes ways to estimate underreported COVID-19 infections from daily reports of cases, hospitalizations, and deaths, taking demography into account. The proposed correction methodology is applied to data from Chicago, NYC, BA, and MC. Moreover, we estimate the potential impact of underreporting on vaccination strategies using an SEIR-like model with parameters estimated from real data. Estimating underreporting during an ongoing epidemic is hard, and only a seroprevalence study can address this task appropriately. However, if the stable rates of hospitalization and death related to the disease can be estimated, then the reports can be used to estimate the correct number of infections. The major difficulty of this approach is identifying the periods in which these rates can be observed or approximated. First, we require the number of tests performed daily to be sufficiently large and the proportion of positive tests to be sufficiently small. Choosing these thresholds is subtle, and the data from different places must be compared. For Chicago and NYC, we required the positivity rate to be below 10%; for BA, the threshold was 20%, since in the corresponding periods we identified a stabilization of the rates around mean values. For MC, we could not find such a period. For Chicago, NYC, and MC, the corrections suggest that the number of infected individuals during the study period could reach 30% of the population, which in some cases is more than six times the reported numbers. Such estimates must be considered when evaluating the outcome of vaccination strategies, since underreporting, as the numerical examples illustrate, can reduce the effectiveness of vaccination in lowering mortality and hospitalization rates.
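For MC, where no stable period was found, the corrections reuse Chicago's age-specific death rates, as described earlier. A minimal sketch of such an age-stratified correction sums, over age ranges, the deaths divided by the reference death rate. The age bands and rate values are hypothetical placeholders rather than the estimates in the Supplementary Materials, and this is not necessarily the exact form of Eq. (S.2).

```python
# Hypothetical reference death rates (deaths per infection) by age range;
# NOT the values estimated for Chicago in this work.
REFERENCE_DEATH_RATES = {"0-39": 0.0005, "40-59": 0.005, "60+": 0.05}


def age_stratified_correction(deaths_by_age, rates_by_age):
    """Estimated infections: sum over age ranges of deaths / death rate."""
    missing = set(deaths_by_age) - set(rates_by_age)
    if missing:
        raise KeyError(f"no reference rate for age ranges: {sorted(missing)}")
    return sum(d / rates_by_age[age] for age, d in deaths_by_age.items())


# Toy daily deaths by age range in the target city (invented numbers).
deaths_target = {"0-39": 1, "40-59": 5, "60+": 50}
estimate = age_stratified_correction(deaths_target, REFERENCE_DEATH_RATES)
print(round(estimate))  # 4000
```

Transferring rates this way implicitly assumes the age-specific severity is similar between places; the text notes that this holds for BA below age 60 but not for older individuals.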
Estimating underreporting can be useful, for example, to adjust the number of vaccine doses given daily in order to reach the target reduction in infections, hospitalizations, and deaths. Using age-dependent death rates seems to be a reliable way of estimating underreporting, since such rates can be used even if the age pattern of the infected population changes during the epidemic. Thus, we expect that the more demographic information is incorporated into the death rates, the more reliable the corrections will be. In summary, using the methodology described in Figure 1 and a judiciously chosen data-analysis implementation, we estimate COVID-19 underreporting from publicly available data. This provides a powerful way of quantifying the impact of underreporting on the efficacy of vaccination strategies. The data that support the findings of this study are available from the following public sources: data.cityofchicago.org (Chicago), www1.nyc.gov (NYC), datos.cdmx.gob.mx (BA), and datos.cdmx.gob.mx (MC). The numerical scripts used to generate the corrections and to simulate the scenarios can be found in the GitHub repository github.com/JennySorio/Under Reporting. References (9-16)
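The argument above, that age-dependent death rates remain usable when the age pattern of infections changes, can be illustrated with a toy two-group example. The rates and infection counts are invented; the "pooled" estimator is shown only as a contrast and is not claimed to be a method used in this work.

```python
# Hypothetical age-specific death rates; illustrative only.
RATES = {"young": 0.001, "old": 0.05}


def expected_deaths(infections):
    """Deaths by age group implied by infections by age group."""
    return {age: n * RATES[age] for age, n in infections.items()}


def pooled_rate(infections):
    """Single crude death rate for a given age mix of infections."""
    return sum(expected_deaths(infections).values()) / sum(infections.values())


def pooled_correction(deaths_by_age, rate):
    """Infections estimated with one crude rate, ignoring age."""
    return sum(deaths_by_age.values()) / rate


def age_specific_correction(deaths_by_age):
    """Infections estimated with age-specific rates."""
    return sum(d / RATES[age] for age, d in deaths_by_age.items())


# Crude rate calibrated in a period dominated by young infections...
early = {"young": 9000, "old": 1000}
crude = pooled_rate(early)
# ...then applied to a later period in which the age mix has shifted.
late = {"young": 5000, "old": 5000}
late_deaths = expected_deaths(late)

print(sum(late.values()))                            # true infections: 10000
print(round(pooled_correction(late_deaths, crude)))  # 43220, a large overestimate
print(round(age_specific_correction(late_deaths)))   # 10000
```

Because the later period has more older infections, deaths rise faster than infections, and a single crude rate calibrated earlier mistakes that for hidden infections; the age-specific correction is unaffected by the shift.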
COVID-19 Response Team.
Modeling Infectious Diseases in Humans and Animals.
Estimating, Monitoring, and Forecasting the COVID-19 Epidemics: A Spatio-Temporal Approach Applied to NYC Data.