key: cord-0269184-tpdinhy1
authors: Bhatia, S.; Parag, K. V.; Wardle, J.; Imai, N.; Elsland, S. L. V.; Lassmann, B.; Cuomo-Dannenburg, G.; Jauneikaite, E.; Unwin, H. J. T.; Riley, S.; Ferguson, N.; Donnelly, C. A.; Cori, A.; Nouvellet, P.
title: Global predictions of short- to medium-term COVID-19 transmission trends : a retrospective assessment
date: 2021-07-22
journal: nan
DOI: 10.1101/2021.07.19.21260746
sha: d1bb5f9cb65156b27f8ae5f9edeb47dfb15f8bde
doc_id: 269184
cord_uid: tpdinhy1

Background: As of July 2021, more than 180,000,000 cases of COVID-19 have been reported across the world, with more than 4 million deaths. Mathematical modelling and forecasting efforts have been widely used to inform policy-making and to create situational awareness. Methods and Findings: From 8th March to 29th November 2020, we produced weekly estimates of SARS-CoV-2 transmissibility and forecasts of deaths due to COVID-19 for countries with evidence of sustained transmission. The estimates and forecasts were based on an ensemble model comprising of three models that were calibrated using only the reported number of COVID-19 cases and deaths in each country. We also developed a novel heuristic to combine weekly estimates of transmissibility and potential changes in population immunity due to infection to produce forecasts over a 4-week horizon. We evaluated the robustness of the forecasts using relative error, coverage probability, and comparisons with null models. Conclusions: During the 39-week period covered by this study, we produced short- and medium-term forecasts for 81 countries. Both the short- and medium-term forecasts captured well the epidemic trajectory across different waves of COVID-19 infections with small relative errors over the forecast horizon. The model was well calibrated with 56.3% and 45.6% of the observations lying in the 50% Credible Interval in 1-week and 4-week ahead forecasts respectively. We could accurately characterise the overall phase of the epidemic up to 4-weeks ahead in 84.9% of country-days. The medium-term forecasts can be used in conjunction with the short-term forecasts of COVID-19 mortality as a useful planning tool as countries continue to relax stringent public health measures that were implemented to contain the pandemic.

As of July 2021, more than 4 million deaths have been attributed to COVID-19 with 39 over 180 million cases reported globally [1] . The scale of the current pandemic has 40 led to a widespread adoption of data-driven public health responses across the globe. 41 Epidemiological quantities such as the reproduction number and the herd immunity 42 threshold have become a part of the public discourse, used by governments to plan 43 their response and by the media to aid public understanding of the health emergency. 44 Outbreak analysis and real-time modelling, including short-term forecasts of future 45 incidence, have been used to inform decision making and response efforts in several 46 past public health challenges including the West African Ebola epidemic and seasonal 47 Methods for estimating transmissibility during epidemics typically rely on the time series 91 of incident cases combined with the natural history parameters of the pathogen [32, 33] . 92 However, in the current pandemic, interpretation and comparison of estimates across 93 countries based on the number of cases was made difficult by the differences in case 94 definitions, testing regimes, and variable reporting across countries as well as over time 95 within each country [34] . We therefore developed methods that relied on the number 96 of reported deaths to estimate COVID-19 transmissibility and to produce short-term 97 forecasts of deaths. Since we use deaths to estimate transmissibility, our estimates reflect 98 the epidemiological situation with a delay corresponding to the delay from infection to 99 death [35] . Despite this, our estimates and the short-term forecasts contributed and 100 continue to contribute to situational awareness by providing near real-time insights into 101 the dynamics of the pandemic. They also provide useful, albeit indirect, evidence into 102 the effectiveness of various interventions such as lockdowns and the impact of reopening. 103 The instantaneous reproduction number is frequently used to quantify transmissibility. 104 It is defined as the average number of secondary cases that an individual infected at 105 time t would generate if conditions remained as they were at time t [36]. When applied 106 to the incidence of deaths (rather than cases), the instantaneous reproduction number 107 R D t represents the average number of secondary deaths "generated by" the death of a 108 primary case at time t. We developed three different models, each of which estimated 109 transmissibility in the recent past and produced forecasts of COVID-19 deaths. We then 110 incorporated the outputs of these models to build an ensemble model. We produced 111 short-term forecasts (i.e. 1-week ahead), for which changes in the population immunity 112 level could be ignored. We also produced medium-term forecasts (up to 4-weeks ahead) 113

accounting for the depletion of the susceptible population due to the increased levels of 114 natural host immunity. The methods underlying the individual models are illustrated in 115 Hereafter, D t and C t represent the number of reported COVID-19 deaths and cases 117 at time t respectively. Since we only used reported deaths to estimate transmissibility, 118

for ease of notation, we drop the superscript D from R D t and use R t to denote the 119 instantaneous reproduction number with respect to deaths at time t. R[t 1 , t 2 ] is the 120 

where D t = {D 1 , D 2 , . . . D T −τ } and D t = D t for t = T − τ + 1, . . . , T . The 142

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) reproduction numbers (R[T − τ + 1, T ]) from the joint posterior distribution obtained in 145 the estimation process, and projected future incidence D T +i for i ≥ 1 conditional on 146 these as follows:

where D t = D t for t = T − τ + 1, . . . , T . During the period covered in the analysis, 148 the epidemiological situation in most countries was changing rapidly with public health 149 measures being reviewed weekly. At the same time, there was a strong 'weekend effect' 150 in the observed data, with typically fewer deaths reported on Saturdays and Sundays. 151 We therefore assumed a fixed calibration window of 10 days to incorporate the rapid 152 dynamics and offset the lower reporting over the weekend. We ran the MCMC for 10000 153 iterations. We sampled 1000 sets of R curr T and back-calculated incidence, and for each 154 sampled set, we drew 10 stochastic realisations of the projected incidence of deaths.

Model 2 (APEestim) 156 Similarly to Model 1, Model 2 relies on the renewal equation (Eq. (1)) but uses the full 157 time series of observed deaths, and uses information theory to optimise the choice of the 158 calibration window i.e. the time-window of size τ over which R[T − τ + 1, T ] is assumed 159 to be constant in the estimation process [38] . Choices of window size can influence the 160 bias and variance of resulting estimates of transmissibility [39] . We integrated over the 161 entire posterior distribution of R t (under a given window size), to obtain the posterior 162 predictive distribution of incidence at time t + 1 as

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) where R[t − τ + 1, t] represents the posterior distribution of R t assuming a window 165 of length τ . We computed this distribution sequentially for t ∈ {1, 2, . . . T − 1} and then 166 evaluated every observed count of deaths according to their likelihood under the posterior 167 predictive distribution. This allowed us to construct the accumulated predictive error 168 (APE) for a window length τ and under a given serial interval distribution as [38] :

Here, D t+1 is the observed number of deaths at time t+1. The optimal window length 170 τ * was then chosen as the window for which AP E τ is minimised (Fig. 1b) , optimising 171 the bias-variance trade-off (long windows reduce the estimate variance but increase bias 172 and short windows do the converse).

Again, forward projections were made assuming that transmissibility over the projec-174 tion horizon remains the same as that in the last τ * days. That is, R curr T is set to be 175

We then obtain forecasts of deaths as

for i ≥ 1. We drew 1000 samples from the posterior distribution of R curr T and for 177 each sampled value, simulated 10 forward trajectories.

Model 3 (DeCa) 179 Models 1 (RtI0) and 2 (APEestim) use only the time series of deaths to estimate R t . 180

Model 3 exploits the signal from both the reported deaths and cases to forecast deaths. 181 We assumed that the delay δ between a case being reported and the case dying (for those 182 who die) is distributed according to a gamma distribution with mean µ and standard 183 deviation σ. Let f Γ be the probability mass function of a discretised gamma distribution. 184

The cumulative number of reported cases at time t weighted by the delay distribution 185 from case report to death, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint quantity at time t can be thought of as an observed case fatality ratio (Fig. 1c) . We 188 assume that deaths are distributed according to a binomial distribution:

The model likelihood is given by

We obtained a posterior distribution for ρ 1 , ρ 2 , . . . , ρ T using the conjugate beta 191 prior for ρ t (using the R package binom [40] ), assuming that parameters of the delay 192 distribution µ and σ are known and fixed. The forecasted number of deaths at time 193 T + i for i ≥ 1 were then drawn from a binomial distribution as We drew 10000 samples from the posterior distribution of ρ T and 10000 samples 204 from a gamma distribution to augment the observed cases. We then drew 10000 samples 205

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint from a binomial distribution (Eq. (9)) for each pair of augmented cases trajectory and 206 sampled ρ T .

Ensemble Model 208 We combined the estimates of R curr T , and the outputs of models RtI0, APEestim, and 209

DeCa into an unweighted ensemble model by sampling the forecasts and reproduction 210 number from each model described above. We also explored building a weighted ensemble 211 by weighting the contribution of each model according to the relative error of the model 212 in the previous week, all previous weeks, across all countries, or estimating the weights 213 independently for each country. We did not find any substantial difference in the 214 performance of the unweighted and weighted ensembles (not shown here). We therefore 215 restricted our analyses to an unweighted ensemble model. The short-term forecast horizon was set to be 1 week. We produced forecasts for the 221 week ahead (Monday to Sunday) using the latest data up to (and including) Sunday. 222 We did not model the potential changes in the population immunity levels as any such 223 change is not expected to affect the trajectory of the epidemic over this short time 224 horizon.

Medium-term forecasts 226 Over the course of the epidemic, the effect of the potential depletion of the susceptible 227 population on the trajectory of the epidemic may become more pronounced. Inherently, 228 by estimating transmissibility in real-time, the models outlined above account for any 229 general decrease in the proportion of population being susceptible. However, the forecasts 230 produced do not account for any further decrease in this proportion, which may become 231 substantial when forecasting over a medium-to long-term time horizon. 232 9/32 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. those weeks, with a probability that decays exponentially in the past to favour the more 241 recent estimates. That is, the distribution of R curr T −7k contributed to R w T with a relative 242 weight proportional to e −βk (Fig. 1d ). Each week, the rate of decay β was optimised by 243 minimising the relative error in the predictions for the previous week.

Accounting for depletion of the susceptible population due to naturally-245 acquired immunity 246 As the weighted reproduction number R w t already accounts for the population immunity 247 at time t, we first estimated an effective reproduction number R ef f t , defined as the 248 reproduction number if the entire population were susceptible. That is,

where R w t is the weighted reproduction number at time t and p S t is the proportion of 250 population that is susceptible to infection at time t. p S t is given by 1 − t j=0 I j /N where 251 I j is the number of infections at time j and N is the total population. In estimating 252 the potential future population immunity using this formulation, we only accounted 253 for naturally acquired immunity assuming that the immunity acquired after infection 254

persists. Since we were forecasting deaths (rather than infections), the true number 255 of infections was estimated using a country-specific age-distribution weighted Infection 256

Fatality Ratio (IFR) (SI Sec. 3). 257 We then incorporated the effect of a declining proportion of susceptible population 258 due to naturally acquired immunity as 259 10/32 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint

The model forecasts were validated against observed deaths as these became available. 283

To quantitatively assess the performance of the model for both short-and medium-term 284 forecasts, we used the mean relative error (MRE) For the weekly analysis, we defined a country as having evidence of active transmission 296 if at least 100 deaths had been reported, and at least ten deaths were observed in each 297 of the past two weeks (SI Sec. 2). Countries with large variability in the reported deaths 298 within each week over the analysis period were excluded from the final analysis for this 299 work (SI Sec. 2 lists the full exclusion criterion). Results for 81 countries were included 300 in the work presented here. 301 We assumed a gamma distributed serial interval with mean 6.48 days and standard 302 deviation of 3.83 days following [44] . For simplicity, we assumed that the delay in 303 reporting a death is the same as the delay from onset to a case being reported. We 304 assumed that the delay in reporting of deaths follows a gamma distribution with mean of 305 10 days, and standard deviation of 2 days. These figures are roughly consistent with an 306 onset-to-death delay of 15.9 days [45] and onset-to-diagnosis delay of 6.6-6.8 days [46] . 307

The serial interval and delay distributions were discretised using R package EpiEstim [33] . 308 We used a country-specific population-adjusted IFR estimated using the IFR reported 309 in the REACT study (SI Sec. 3). 

The MRE across all countries and all weeks was 0.4 (SD 0.4) (Fig. 3) . That is, on 323 average the model forecasts were 0.4 times lower or higher than the observed incidence. 324

In most countries, the reporting of both cases and deaths through the week was variable, 325 with fewer numbers reported on some days of the week (typically, Saturday and Sunday). 326

The variability in reported deaths strongly influenced the model performance. The MRE 327 scaled linearly with the coefficient of variation (ratio of the standard deviation to the 328 mean) in the reported deaths for the week of forecasting. Thus, the error in forecasts 329 was on average similar to the variability in the reported deaths (SI Fig. 6 ). The MRE of 330 the model scaled inversely with the weekly incidence i.e. the error was relatively large 331 when the incidence was low (SI Fig. 6 ), as estimates of reproduction number when the 332 incidence is low are inherently more unstable [47] .

The model performance was largely consistent across epidemic phases (growing, 334 likely growing, decreasing, likely decreasing and indeterminate) with similar coverage 335 probability and MRE (SI Table 1 ). The slightly larger proportion of observations in 336 the 50% and 95% credible intervals for the 'indeterminate' phase and the larger MRE 337 in this phase together suggest that the model was 'under-confident' with large credible 338 intervals [48] . 339 We compared the performance of the model with that of a null no-growth model. In 340 most instances, the ensemble model outperformed the null model. In 80.9% of the weeks 341 13/32 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint in 'definitely decreasing' phase and 61.4% of weeks in 'definitely growing' phase, the 342 absolute error of the model was smaller than the error made by the null model (Fig. 3, 343 SI Sec. 5.2, SI Table 2 ). The null model performed better when the trajectory of the 344 epidemic in a country was relatively stable exhibiting little to no change over the time 345 frame of comparison. This is to be expected as the null model describes precisely this 346 stable dynamic. Indeed, in 68.1% of the weeks in the 'likely growing' phase and 67.1% 347 weeks classified as 'indeterminate' phase, the absolute error of the model was larger than 348 the error made by the null model. However, the relative error of the model remained 349 small even in countries and weeks where it did not perform as well as the null model. 350 Similarly, our model performed better than a linear growth model across all phases, 351 specifically in 96.4% of the weeks in 'definitely decreasing' phase and 70.3% weeks in 352 'definitely growing' phase (SI Sec. 5.3, SI Table 2 ).

The rapidly changing situation and the various interventions deployed to stem the growth 355 of the pandemic make forecasting at any but the shortest of time horizons extremely 356 challenging [49] . Despite these challenges, we find that our medium-term forecasts were 357 able to robustly capture the epidemic trajectory (Fig. 4) in all countries included in the 358 analysis (4).

Overall, the MRE remained small over a 4-week forecast horizon, with errors increasing 360 over the projection horizon (SI Sec. 6.1). We therefore restricted the projection horizon 361 to 4 weeks. The MRE across all countries in 1-week ahead forecasts was 0.4 (SD 0.3), 362 increasing to 2.6 (SD 28.3) in 4-week ahead forecasts (Fig. 5, SI Fig. 10 ). The MRE for 363 1-week ahead forecasts was less than 1 (indicating that the magnitude of the error was 364 smaller than the observation) in 91.1% of weeks for which we produced forecasts. The 365 corresponding figure for 4-week ahead forecasts was 66.0% (SI Table 3 ).

The proportion of observations in the 50% CrI remained consistent across the forecast 367 horizon and varied from 56.3% (SD 33.4%) in 1-week ahead forecasts to 45.6% (SD 368 40.9%) in 4-week ahead forecasts (SI Fig. 11, SI Fig. 12) .

More importantly, using R S t estimates from Eq. (11), we accurately characterised 370 the phase of the epidemic in each country. Across the 81 countries and 2210 weeks 371

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint (15470 days) for which we produced both short-and medium-term forecasts, the phase 372 definition using the reproduction number estimates from medium-term forecasts (R S t ) 373

was consistent with that using the estimates from the short-term forecasts (R curr t ) in 374 87.6% (13559/15470) of country-days (number of countries X number of days for which 375 we produced forecasts. The phase definition using reproduction number estimates from 376 medium-term forecasts was updated each day over the forecast horizon while the short-377 term forecasts assigned the same phase to all days of a week.) in 1-week ahead forecasts 378 and in 84.9% (13138/15470) of country-days in 4-week ahead forecasts (Fig. 6) . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint computational time needed to fit complex models make scaling them difficult and delays 402 the timely provision of risk estimates.

In addition to the variable availability of surveillance data across countries, the wide-404 scale societal and behavioural changes brought about by the pandemic impose practical 405 constraints on utilising data that are available for multiple countries. For instance, widely 406 available data on the changes in mobility inferred from mobile phone usage released by 407

Google and Apple were informative of the changes in transmission in the early phase of 408 the COVID-19 pandemic and were used in several modelling studies [50, 51] . Although 409 these data continue to be available, recent evidence suggests a decoupling of transmission 410 and mobility in most countries [35, 52] . Models that relied on such additional data [51] 411 or assumptions about non-pharmaceutical interventions [53] could not fit the observed 412 trajectory well as the situation continued to change over the course of the epidemic. [55] [56] [57] , and for several countries across the globe [41, 58, 59] . In contrast to 418 models built for a region or country and calibrated using local data, models that aim to 419 provide a global overview must be sufficiently general to describe the epidemic trajectory 420 in a range of countries/regions using widely available data that are consistently available 421 over time. 422 We have produced short-term forecasts and estimates of transmissibility for 81 423 countries for more than 65 weeks at the time of writing implementing three simple 424 models that use only the time series of COVID-19 cases and deaths. We have thus 425 traded particularity for generality, to allow us to carry out analysis for a large number 426 of countries over a long period of time. As our methods make few assumptions and use 427 routine surveillance data, they can be easily used during any other future outbreaks. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint deaths and/or large variability in reported deaths over weeks reflects this trade-off. 434

In the absence of more detailed data, we assumed that epidemiological parameters 435 such as the delay from onset of symptoms to death were the same across all countries 436 and throughout the period of analysis. These parameters are likely to vary over time 437 and between countries and using country-specific parameters could lead to moderate 438 improvements in the model fits and forecast performance. could investigate integrating more data streams into the models. In addition to the 453 weekly reports that we publish, our work has also contributed to other international 454 forecasting efforts [22, 48, 55] . 455 We developed a simple heuristic to combine past estimates of transmissibility and a 456 decline in the proportion of susceptible population to produce medium-term forecasts. 457 We were able to achieve good model performance in forecasting up to 4 weeks ahead. 458 Consistent with findings from other modelling studies [22], we found that the model 459 error was unacceptably high beyond 4 weeks, suggesting that forecasting beyond this 460 horizon is difficult. Importantly, our characterisation of the epidemic phase using 461 weighted estimates of transmissibility were largely in agreement with that using short-462 term transmissibility estimates. Thus, our method was successful at capturing the 463 broad trends in transmission up to 4 weeks ahead. The medium-term forecasts can 464 therefore serve as a useful planning tool as governments around the world plan further 465

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint Our method incorporates the depletion of susceptible population and hence can in 467 principle be extended to account for increasing population immunity as vaccination is 468 rolled out across the world. However, inclusion of vaccine induced immunity depends 469 on the availability of reliable data on vaccination coverage. Further, even if such data 470 were available, teasing apart the impact of vaccination on transmission and mortality 471 could be non-trivial. In light of these issues, it might be challenging to extend our 472 approach to rigorously assess the effect of vaccination on epidemic trajectory on a global 473 scale. However, our latest estimates of transmissibility indirectly reflect the impact of 474 vaccination on transmission, allowing for the delay from vaccination to full immunity, 475 and from infection to death. As we continue to track COVID-19 transmissibility globally, 476 any temporal changes in transmissibility would implicitly account for the changes due to 477 differential vaccination coverage. The model is fitted using only the data in this window (T − τ + 1 to T ) to jointly estimate the initial incidence of deaths and R[T − τ + 1, T ]. (b) Model 2 optimises the window over which R t is assumed to be constant by minimising the cumulative predictive error over the entire epidemic time series. Estimates from R[T − τ * + 1, T ] are used to forecast into the future, with τ * the window of optimal length. (c) Model 3 uses data from both cases and deaths. The dashed blue curve represents the incidence of reported cases weighted by the case-report to death delay distribution, where µ is the mean delay. ρ t is the ratio of the observed deaths and the weighted cases at time t and is analogous to an observed case fatality ratio. Forecasts of deaths are obtained by sampling from a binomial distribution using the most recent estimate of ρ T . See also SI Fig. 3. (d) To obtain medium-term forecasts, we combine the most recent transmissibility estimate R curr T (shown in dark blue) with estimates of transmissibility in the previous weeks to produce a weighted estimate of transmissibility R w T (filled in pink) at time T . Estimates from previous weeks are combined with the most recent estimates if the 95% CrI of estimates in week k, R curr T −7k overlaps the 95% CrI of R curr T . Estimates for weeks where the 95% CrI overlap are shown in light purple, and where the 95% CrI do not overlap in grey. The dashed horizontal lines represent the 2.5 th and 97.5 th quantile of R curr T . We combine the estimates by sampling from the posterior distribution of R curr T −7k with probability proportional to e −β * k , where β is a rate at which the probability decays as we go back in time.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 22, 2021. The ratio of the absolute error of the model to the absolute error of a no-growth null model that uses the average of the last 10 days as a forecast for the week ahead. Shades of green show weeks for a given country where the ratio was smaller than 1 i.e., the ensemble model error was smaller, and weeks where the ratio was greater than 1 i.e. the ensemble model error was larger than the null model error are shown in shades of red (yellow to red). Dark blue indicates weeks when the ratio was larger than 2. In order to present a representative sample, we first ranked all countries by the percentage of weeks in which ensemble model error was smaller than the null model error. We then selected every third country from the top 75 countries in this list. Results for the selected 25 countries are shown here. See SI Fig. 4 for the results for other countries. Ordering of countries in the figure reflects the order in the ranked list i.e. countries with the highest percentage of weeks with model error smaller than null model error are shown on the top.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted July 22, 2021. from the short-term forecasts and the median (green line) and the 95% CrI (shaded green area ) of R S t i.e. the reproduction number accounting for depletion of susceptible population from the medium-term forecasts over a 4-week horizon ( Methods). The dashed red line indicates the R S t = 1 threshold. Note that the y-axis is different for each subfigure. The forecasts were produced every week over a 4-week forecast horizon. The figure shows all non-overlapping forecasts over the course of the pandemic. See SI 2 for results for all other countries and weeks.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted July 22, 2021. ; Fig 5. Relative error of medium-term forecasts. The mean relative error of the model in 1-week, 2-week, 3-week and 4-week ahead forecasts for each week when a forecast was made (x-axis) and for each country (y-axis). Blue cells indicate weeks where the relative error of the model was greater than 2. For ease of presentation, results are shown for the same 25 countries as Fig. 2 . See SI Sec. 5 for the results for other countries. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260746 doi: medRxiv preprint

A model to 559 forecast regional demand for COVID-19 related hospital beds

COVID-19-Associated Hospitalizations under Different Levels of Social Distanc-563

IHME COVID-19 health service utilization forecasting team and Murray

ICU-days, ventilator-568 days and deaths by US state in the next 4 months

COVID-19: Short-term 571 forecast of ICU beds in times of crisis

Key epidemiological drivers and impact of interventions in the

CoV-2 epidemic in England

Underdetection of cases of COVID-19 in France threatens epidemic control

Reduction 618 in mobility and COVID-19 transmission

Estimating individual and household reproduction numbers in an 621 emerging epidemic

A 623 simple approach to measure transmissibility and forecast incidence

Using information theory to optimise epidemic 626 models for real-time prediction and estimation

Adaptive Estimation for Epidemic Renewal 629

Binomial Confidence Intervals For Several Parameterizations

Epi-644 demiological characteristics of COVID-19: a systematic review and meta-analysis

Transmission characteristics of the COVID-647 19 outbreak in China: a study driven by data. Epidemiology; 2020

Improved estimation of time-varying reproduction numbers at low 650 case incidence and between epidemic waves. Epidemiology; 2020

European Covid-19 Forecast Hub: Weekly reports

The turning point and end of an 655 expanding epidemic cannot be precisely forecast

A 658 sub-national analysis of the rate of transmission of COVID-19 in Italy. Public 659 and Global Health

State-level tracking of COVID-19 in the United States

Evidence of initial 665 success for China exiting COVID-19 social distancing policy after achieving contain-666

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Short-term forecasting of COVID-19 in Germany and Poland during the second 674 wave-a preregistered study

COVID-19 Austria

Inference of COVID-19 epidemiological distributions from Brazilian 678 hospital data

COVID-19 Daily Epidemic Forecasting

Underreporting of death by COVID-19 in Brazil's second most populous state

Estimation of US SARS-CoV-2 infections, 687 symptomatic infections, hospitalizations, and deaths using seroprevalence surveys

The authors acknowledge funding from the MRC Centre for Global Infectious Disease 492 Analysis (reference MR/R015600/1), jointly funded by the UK Medical Research Council 493 18/32