key: cord-0867873-bpvhtszb
authors: Garcia-Garcia, D.; Vigo, M. I.; Fonfria, E. S.; Herrador, Z.; Navarro, M.; Bordehore, C.
title: Retrospective Methodology to Estimate Daily Infections from Deaths (REMEDID) in COVID-19: the Spain case study
date: 2020-06-23
journal: nan
DOI: 10.1101/2020.06.22.20136960
sha: 43cef94529c3b9809794f09e98a13a50296c0bb4
doc_id: 867873
cord_uid: bpvhtszb

The number of new daily infections is one of the main parameters to understand the dynamics of an epidemic. During the COVID-19 pandemic in 2020, however, such information has been underestimated. Here, we propose a retrospective methodology to estimate daily infections from daily deaths, because the latter are usually more accurately documented. The methodology is applied to Spain and its 19 administrative regions. Our results showed that probable infections were 34 to 42 times higher than the official ones on 14 March, when the national government decreed the national lockdown. The lockdown had a strong effect on the growth rate of virus transmission, which began to decrease immediately. Finally, the first infection in Spain may have occurred on 11 January 2020, around 40 days before it was officially reported. In summary, we state that our methodology is adequate to reinterpret official daily infections, being more accurate in magnitude and dates.

The copyright holder for this preprint (this version posted June 23, 2020) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101

1. Introduction

The key parameter to understand and model the evolution of the COVID-19 pandemic [...] because some Spanish regions reviewed and modified the cases in previous days [8, 9]. This inconsistency of time series hampers any kind of accurate analysis.

High-fidelity time series of each parameter of an epidemic are crucial to run reliable epidemiological models.
The number of new infections per unit time is one of the essential parameters. The official data reflect neither the actual day of infection nor the real number of infections. To overcome this limitation, we propose a retrospective methodology to infer daily infections from daily deaths, on the premise that the number of deaths is a more reliable parameter than the official number of daily infections [1]. These estimated infections are dated closer to the actual date of infection, avoiding the delays from the incubation period and from symptom onset to diagnosis. These time series of daily infections would help to better understand the pandemic dynamics and to quantify the effectiveness of the different containment strategies, avoiding the bias of the official data. In addition, those numbers can be used to feed epidemiological models with more realistic data. Finally, we apply this methodology to Spain as a whole and to each of its 19 administrative regions (17 autonomous communities and 2 autonomous cities).

The date of infection (DI) of an individual may be estimated from the death date (DD) by subtracting the incubation period (IP) and illness onset to death (IOD) periods, hence

DI + IP + IOD = DD.

Therefore, DI can be estimated from DD as long as IP and IOD are known; however, neither IP nor IOD is a fixed value. On the contrary, they are random variables that can be approximated by probability distributions. From dozens of cases in Wuhan, Linton et al. [10] approximated IP for COVID-19 with a lognormal distribution, X_IP, with a mean of 5.6 days and median of 5 days, and IOD with a lognormal distribution, X_IOD, with a mean of 14.5 days and median of 13.2 days.
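As a worked step, the lognormal parameters implied by each mean and median pair can be written in closed form. This assumes the standard parameterisation in which ln X is normal with mean μ and standard deviation σ; the source does not state which parameterisation was used, so the symbols μ and σ and the rounded values below are illustrative.

```latex
\begin{aligned}
\text{median} &= e^{\mu}, \qquad \text{mean} = e^{\mu + \sigma^{2}/2}
  \;\Longrightarrow\;
  \mu = \ln(\text{median}), \quad
  \sigma = \sqrt{2\,\ln\!\left(\text{mean}/\text{median}\right)} \\[4pt]
X_{IP}:  &\quad \mu = \ln 5 \approx 1.609, \qquad
          \sigma = \sqrt{2\,\ln(5.6/5)} \approx 0.476 \\
X_{IOD}: &\quad \mu = \ln 13.2 \approx 2.580, \qquad
          \sigma = \sqrt{2\,\ln(14.5/13.2)} \approx 0.433
\end{aligned}
```

Because means add, the sum of the two variables has mean 5.6 + 14.5 = 20.1 days regardless of the shape of either distribution.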
Therefore, the infection to death period is a random variable, X_{IP+IOD}, that follows the distribution of X_IP + X_IOD.

Although the sum of two lognormal random variables does not follow any commonly used probability distribution, its probability density function (PDF) can be estimated by convolving the PDFs of the two variables [11]. If g(t) and h(t) are the PDFs of IP and IOD, respectively, then their convolution defines the PDF of X_{IP+IOD},

$$ f(t) = (g * h)(t) = \int_{0}^{t} g(\tau)\, h(t - \tau)\, d\tau, $$

where t > 0 is a real number representing days from infection. f(t) has a mean of 20.1 days and a median of 18.8 days. Figure 1 shows g(t), h(t), and f(t), as well as a lognormal PDF with the same mean and median as f(t) for comparison. Note that the probability that X_{IP+IOD} is less than or equal to 33 days is 0.95.

[ Figure 1 ]

Given a time series of deaths produced by the illness, x(t), we estimate the infection time series that produced such deaths, y(t). If we assume a case fatality ratio (CFR) of 100%, y(t) would represent all daily infections. To calculate y(t), we use the following likelihood-based estimation procedure.

Because the relative likelihood that a given infection will produce a death t days later is f(t), the infections at a given time t_0 can be estimated as

[ Equation ]

In practice, x(t) is a discrete time series and could be written as x(n), where n is an integer representing entire days. Let F(n) be a discrete approximation to f(t) as follows:

[ Equation ]

Then,

[ Equation 1 ]
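The procedure described above can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' MATLAB code: the exact form of Equation 1 is not reproduced here, so the discretisation (F[k] as the integral of f over the interval (k, k+1]) and the truncated sum y[n0] = Σ_k x[n0 + k] F[k] are assumptions, and all function names are hypothetical. The lognormal parameters are derived from the means and medians quoted in the text.

```python
import math

def lognormal_pdf(t, mean, median):
    """Lognormal density with the given mean and median (both in days)."""
    if t <= 0.0:
        return 0.0
    mu = math.log(median)                              # median = exp(mu)
    sigma = math.sqrt(2.0 * math.log(mean / median))   # mean = exp(mu + sigma^2 / 2)
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def infection_to_death_pdf(dt=0.1, t_max=100.0):
    """Riemann-sum convolution f = g * h sampled at t = 0, dt, 2*dt, ..."""
    n = int(round(t_max / dt)) + 1
    g = [lognormal_pdf(i * dt, 5.6, 5.0) for i in range(n)]     # incubation period
    h = [lognormal_pdf(i * dt, 14.5, 13.2) for i in range(n)]   # illness onset to death
    return [dt * sum(g[j] * h[i - j] for j in range(i + 1)) for i in range(n)]

def daily_weights(f, dt=0.1):
    """Assumed discretisation: F[k] approximates the integral of f over (k, k+1]."""
    per_day = int(round(1.0 / dt))
    days = (len(f) - 1) // per_day
    return [dt * sum(f[k * per_day + 1:(k + 1) * per_day + 1]) for k in range(days)]

def estimate_infections(deaths, weights):
    """y[n0] = sum_k deaths[n0 + k] * F[k], truncated at the end of the death series."""
    n = len(deaths)
    return [sum(deaths[n0 + k] * weights[k]
                for k in range(min(len(weights), n - n0)))
            for n0 in range(n)]
```

A call such as `estimate_infections(deaths, daily_weights(infection_to_death_pdf()))` returns a series of the same length as `deaths`, corresponding to the CFR = 100% case. The most recent entries are underestimated because part of their associated deaths has not yet occurred, and dividing the result by an assumed CFR (expressed as a fraction) would rescale it to total infections.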
[...] China/situacionActual.htm). In those reports, daily data were replaced by deaths during the week after 23 May. From that date, we obtained daily infections by dividing the weekly infections by 7 and used the 7-day running means. Those data will not show [...]

The COVID-19 official deaths and MoMo ED time series overlap for the period from 4 March to 1 June 2020 for Spain and its 19 regions (Figure 2 [...]

Infections associated with the COVID-19 deaths were calculated from Equation 1 and are shown in Figure 3. By definition of Equation 1, y(n) represents all daily infections assuming CFR = 100%. Note that the obtained time series is 33 days shorter than the death time series. We did this to ensure that all infections were estimated with more than 95% of their associated deaths (see Section 2). The estimated infections in Spain reached a maximum on 14 March 2020 (Table 2), the day that the national government decreed the state of emergency and national lockdown. This means that the measures adopted had an immediate effect, but it was not officially observed until the maximum of recorded infections was reached on 30 March, 16 days later (estimated from the 14-day running mean time series). This delay is longer than the expected ~11 days obtained by adding the mean incubation period of 5.78 days [2] and the mean illness onset to diagnosis period of 5.2 days [14, 15]. On the other hand, the maximum number of deaths was reached on 2 April, 19 days after the inferred infection maximum. This delay is similar to the 20 days expected from infection to death (Figures 1 and 3).

[...] March, when the state of emergency and lockdown were enforced. On that date, official daily infections were 1,832; however, estimated daily infections were 62,860 (CI 95%: 59,088-67,888), 34 times more (Table 3). Thus, the documented daily infections were only 2.9% (CI 95%: 2.7%-3.1%) of estimated infections. Officially there were 223,054 accumulated infections before 28 April (last day of estimated daily infections from MoMo ED, see next section), but with the assumed CFR there were 2,405,617 total cases (CI 95%: 2,261,274-2,598,063). Note that this value is similar to the 2.35 million infections estimated by the first phase of the National Seroprevalence Study (ISCIII 2020) for the period 27 April - 11 May. This 2.40 million is 11 times larger than the official data, suggesting that only 7.8% (CI 95%: 7.2%-8.2%) of the probable infections were detected. Although detection improved from 14 March to 28 April, it was still far below our estimation of the probable infected population.

Almost all regions showed a delay of 1 month or more between the first estimated case and the first documented case. The exceptions were Islas Baleares and Canarias, with shorter delays of 25 and 28 days, respectively (Table 2). Note that these regions are the most isolated in Spain. Although Canarias and Islas Baleares were the first regions to report cases, their estimated infections ranked 10th and 14th, respectively. In contrast, Madrid, Cataluña, and País Vasco were the 4th, 5th, and 8th regions to report their first case, but their estimated infections ranked 1st, 2nd, and 3rd, respectively.

Madrid reached the maximum on 11 March, when the educational centres were shut down. The last region to reach the peak was Asturias. All regions showed one maximum, except for Galicia and Navarra with two maxima each. In Galicia, the second maximum was lower than the first and was reached on 11 April, 24 days after the first one. In Navarra, the second maximum was larger than the first and was reached on 31 March, 16 days after the first one. Islas Baleares and Canarias did not show a second maximum, but had inflection points 9 and 15 days after their maxima, respectively.

On 14 March, Asturias had detected 12.1% (CI 95%: 8.7%-16.7%) of estimated daily infections. The other regions, however, detected only 1.1% to 7.9% of the probable infections (Table 3). The accumulated infections in the regions before 28 April were similar to those of the entire country, which generally under-detected infections (Table 4). Excluding Melilla (57.0%) and Ceuta (27.0%), all regions detected between 6.3% and [...]
If we assume that MoMo ED accounts for both recorded and non-recorded COVID-19 deaths, negative deaths are meaningless, so they were set to zero. Then, the associated daily infections can be estimated, as in Section 3.2, with a CFR of 100% from MoMo ED for Spain (Figure 5). Note three main differences between this time series and that estimated from official COVID-19 deaths: (1) the time series estimated from MoMo ED was 7 days shorter due to the updating delay in MoMo statistics; (2) MoMo data present an error band that was inherited by the estimated infections; (3) MoMo ED estimated infections reached a maximum of 1,428 (CI 99%: 1,322-1,529), doubling the 720 inferred daily infections from official COVID-19 deaths in Figure 3. This is because the maximum MoMo ED was 1,569 (CI 99%: 1,462-1,668) and the maximum of COVID-19 official deaths was 929, both estimated from the 14-day running mean time series. The maximum of inferred infections was reached on 13 March, just one day prior to the state of emergency and lockdown. The expected and observed delays with respect to official infections and MoMo ED were similar to those observed for estimated [...]
Table 2 shows the estimated date of first infection for Spain and by region. Note that the first cases estimated from MoMo ED in Spain were on 11 January and in Madrid on 10 January, which is possible because excess deaths that are significant in a region may not be significant for the whole country. In general, the maxima of daily infections were closer to 14 March when they were inferred from MoMo ED than from official COVID-19 deaths. All regions showed a unique maximum, except for Asturias, Islas Baleares, and Ceuta, which showed two maxima. Also note that those regions were different from those with two maxima in Figure 4. Daily infections on 14 March were comparable to those estimated from official COVID-19 deaths. The CIs of MoMo estimated infections are so conservative that they contained those from official COVID-19 estimated infections, except for Castilla-La Mancha, Cataluña, and Madrid (Table 3). Such differences were expected because those three regions presented significant differences in accumulated deaths between MoMo ED and official COVID-19 deaths before 14 May (Table 1) [...]

This misreporting is mainly due to: (1) the lack of testing for asymptomatic and mild symptomatic people [3], which could have been detected with a "test, track and trace" strategy, as done in South Korea, China, and Singapore [16]; (2) [...]

(2) Time series of daily deaths. We use two sources, the official COVID-19 death figures from the CCAES and the MoMo ED. We proceed in this way because official [...]
[...] sources of mortality is recommended. In any case, it is a valuable source of data that must be explored considering the huge bias in official data of daily infections.

[...] April CFRs of 0.56%, 7.18%, and 13.53% were reported for Iceland, the world, and Italy, respectively [24]. In addition, we assumed a constant CFR throughout the study, although introducing time dependence in the CFR would improve the estimates of infections. A constant CFR does not account for those affected by the saturation of the health systems (due to limited human and material resources). Moreover, the CFR among severe cases (mostly detected) is much greater than that among asymptomatic and mild cases (mostly undetected). Consequently, the CFR calculation has a high degree of uncertainty. Because information on the CFRs will improve, calculations in this study can be redone easily using the MATLAB code provided as Supplementary Material.

We ran the REMEDID algorithm to provide the estimated time series of infections for Spain and for its 19 regions for the period from 10 January to 28 April 2020. These time series provide valuable information to understand the time evolution of the pandemic. Our main findings for Spain are: [...]
[...] Madrid, the estimated infections reached their maximum on 11 March, when the regional government warned the population to stay at home, and schools and universities were closed, forcing 1.2 million students to stay at home. Moreover, overall recommendations on disease control and social distancing were given by the Ministry of [...]

Knowledge of the real epidemic dynamics of different population nodes is key to successful modelling attempts, because we could calculate the sole effect of group size [25]. This spatially explicit information, in combination with population size per node and mobility, would allow us to use a metapopulation approach in future models [26-28].

Among the limitations of the methodology, the most important is that it can only be implemented retrospectively. Thus, these estimates cannot be used to control the pandemic in real time. But considering that different regions or countries are at different stages at the same time, results of this methodology for the first communities affected could be applied elsewhere. Furthermore, it is useful to enhance models and improve our knowledge of the pandemic dynamics, including the effectiveness of the different measures adopted to flatten the curve, and to design safe post-lockdown measures. In addition, more accurate models could be obtained by using the more realistic daily number of infections provided by REMEDID, which, in turn, would improve the model outcomes and enhance comparisons of different lockdown and post-lockdown measures [29].

We believe that the REMEDID methodology applied to daily COVID-19 deaths (if accurately reported) or to MoMo ED could be useful to analyse the dynamics of the pandemic retrospectively and to quantify more accurately the real daily infections with respect to the official numbers. We only need the CFR and, for greater precision, the PDF of Infection to Death or, alternatively, the PDFs of Incubation Period and Illness Onset to Death. This approach could be implemented anywhere, improving our understanding of the dynamics of the pandemic and of the effectiveness of the confinement measures. This is of high importance for preparing strategies to face future epidemic episodes successfully and reduce their effects.

11. Uspensky, J. V. Introduction to mathematical probability. Ed. McGraw-Hill Book Company, New York and London (1937).
[...] June 2020 at https://www.rtve.es/noticias/20200529/radiografia-del-coronavirus-residencias-ancianos-espana/2011609.shtml (RTVE, 2020
Centro de Coordinación de Alertas y Emergencias Sanitarias [...] China/documentos/Actualizacion_90_COVID-19.pdf (CCAES, 2020e)
Our World in Data [...]
[...] Epidemiological effects of group size variation in social species [...]
[...] Some demographic and genetic consequences of environmental heterogeneity for biological control [...]

Canarias 1,168 (714; 1817) 2,216 (141 730) 2,831 (1,368; 4,690) 109 [...]
Cantabria 18895 005) 442,121 (312,385; [...]
[...] 387) 4,553 11.5 (10.6; [...]