key: cord-1004225-ia6sv81x
authors: Reis, Ruy Freitas; Oliveira, Rafael Sachetto; Quintela, Bárbara de Melo; Campos, Joventino de Oliveira; Gomes, Johnny Moreira; Rocha, Bernardo Martins; Lobosco, Marcelo; dos Santos, Rodrigo Weber
title: The Quixotic Task of Forecasting Peaks of COVID-19: Rather Focus on Forward and Backward Projections
date: 2021-03-16
journal: Front Public Health
DOI: 10.3389/fpubh.2021.623521
sha: a7c1f4f9c8346a2b43f88f3258a841059e3152a0
doc_id: 1004225
cord_uid: ia6sv81x

Over the last months, mathematical models have been extensively used to help control the COVID-19 pandemic worldwide. Although extremely useful in many tasks, most models have performed poorly in forecasting the pandemic peaks. We investigate this common pitfall by forecasting four countries' pandemic peak: Austria, Germany, Italy, and South Korea. Far from the peaks, our models can forecast the pandemic dynamics 20 days ahead. Nevertheless, when calibrating our models close to the day of the pandemic peak, all forecasts fail. Uncertainty quantification and sensitivity analysis revealed the main obstacle: the misestimation of the transmission rate. Inverse uncertainty quantification has shown that significant changes in transmission rate commonly precede a peak. These changes are a key factor in forecasting the pandemic peak. Long forecasts of the pandemic peak are therefore undermined by the lack of models that can forecast changes in the transmission rate, i.e., how a particular society behaves, changes of mitigation policies, or how society chooses to respond to them. In addition, our studies revealed that even short forecasts of the pandemic peak are challenging. Backward projections have shown us that the correct estimation of any temporal change in the transmission rate is only possible many days ahead. Our results suggest that the distance between a change in the transmission rate and its correct identification in the curve of active infected cases can be as long as 15 days. This is intrinsic to the phenomenon and how it affects epidemic data: a new case is usually only reported after an incubation period followed by a delay associated with the test. In summary, our results suggest the phenomenon itself challenges the task of forecasting the peak of the COVID-19 pandemic when only epidemic data is available. Nevertheless, we show that exciting results can be obtained when using the same models to project different scenarios of reduced transmission rates. 
Therefore, our results highlight that mathematical modeling can help control the COVID-19 pandemic through backward projections that characterize the phenomenon's essential features and through forward projections in which different scenarios and strategies can be tested and used for decision-making.

Epidemiology is defined in the International Epidemiological Association's dictionary as "the study of the occurrence and distribution of health-related events, states, and processes in specified populations, including the study of the determinants influencing such processes, and the application of this knowledge to control relevant health problems" (1). One of its main objectives, as stated in the definition, is to provide data so governments can plan and execute actions to prevent and control diseases. The current COVID-19 pandemic has put epidemiology at the center of the debate as, to date, there are no antivirals with proven efficacy against the disease (2-6). The first vaccines have just become available, but it is unknown how long immunity lasts after vaccination. Due to the lack of pharmaceutical treatments, non-pharmaceutical interventions suggested by epidemiologists have been used by many countries to deal with the pandemic, more specifically to reduce transmission and the impact on healthcare systems (7-12).

Mathematical and computational tools can be used by epidemiological studies to describe and predict the dynamics of the spread of a disease over time and space (13, 14). In addition, these tools can be used to assess the impact of non-pharmaceutical interventions, such as isolation (15-18). Several models have been proposed to describe the spread of diseases (14). Over the last months, many more have been developed to represent the dynamics of populations and their interactions, as well as to forecast the dynamics of the COVID-19 pandemic. Most describe the spread of COVID-19 based on ODEs (Ordinary Differential Equations) (17, 19-28), but statistical (29, 30), chaotic (27), and stochastic/probabilistic models (16, 18, 31, 32) have also been used.

Determining the pandemic's peak provides valuable information for planning the health resources needed to cope with the disease. In the case of COVID-19, it is also relevant for economic reasons, since many countries adopted lockdowns to reduce the spread of the disease, impairing their gross domestic product and, consequently, their budget, while increasing the costs of their health and social protection systems. Although most of the models found in the literature can have their parameters adjusted to COVID-19 data, i.e., to describe the behavior of its spread in different cities, regions, or countries, they usually fail to forecast the peak of the pandemic accurately (26-28). One could ask if one particular modeling technique would be more appropriate to forecast the dynamics of COVID-19 than others. However, a review of the literature does not clearly show that this hypothesis holds. In fact, in the literature we can find examples of forecasts that failed using SEIR (26), SIRD (25), extensions of SEIR with more compartments (28), statistical (33, 34), agent-based (35), machine-learning (36), and chaos-based theory models (27).

Another possible explanation for the failed forecasts could be related to the classical problem of overfitting (37) , where the model can replicate the data it is adjusted to but fails on any attempt of extrapolation or forecasting. In this paper, we took several precautions to prevent the issue of overfitting. A simple mathematical model, based on the classical SIRD model, was adopted with a reduced number of parameters. We decided to keep the model as simple as possible since adding more compartments increases the number of unknown parameters to be estimated, which hinders the accurate calibration of the model. We also used the methods of forward and inverse uncertainty quantification (UQ). The parameters of the models were treated as probability density functions (PDFs) during the task of model-to-data fitting (via inverse UQ) and during the tasks of forecasting and projections (via forward UQ). In addition, during the fitting phase, we also considered a possible discrepancy between model and reality (38) . Nevertheless, in this paper, we show that the above precautions to avoid overfitting did not solve the problem of mispredicting the peak of COVID-19.

This work shows that this common pitfall is likely due to fast and unpredictable changes in the disease's transmission rate. The models are useful for predictions in a more controlled environment. It is like trying to predict the trajectory of a paper airplane: this is much easier on a windless day than during a storm. Likewise, it is challenging to predict significant changes in how a particular society behaves, mitigation policies, or how society responds to them during a pandemic. However, these all have a direct impact on the transmission rate, which in turn significantly affects the dynamics of the pandemic, as shown in previous studies (20, 39, 40). Nevertheless, unlike the weather, mitigation policies can be planned and controlled to some extent. This fact brings up the importance of projections of different scenarios during this pandemic. Unlike forecasts, projections aim to study one or more hypothetical scenarios. In contrast, forecasts use the available data and try to predict future trends (41).

The models and techniques used in this work were first described in previous work (20). The model consists of a non-linear system of ordinary differential equations subject to uncertainty in some of its parameters and initial conditions. Probability density functions (PDFs) were used to describe the uncertainties associated with these parameters, so they are not scalars. Some of them, such as the transmission rate, are additionally time-dependent. After we adjusted the model's PDFs to a particular dataset (via inverse Uncertainty Quantification), the model was able to provide useful insights in terms of characterization of the pandemic dynamics in a particular country.

The current study uses four countries as examples, with distinct population sizes and demographics: Austria, Germany, Italy, and South Korea. For each country, we first show that the proposed model and methods correctly described the dynamics of total reported cases, active infected, and deaths when fitting model to data, i.e., our models can reproduce the different dynamics and peaks. Next, we show that our models can forecast the pandemic dynamics 20-30 days ahead when far from the peaks. However, all pandemic peaks' forecasts fail, even when adjusting the model to the data up to 5 days before the peak in each country. We further investigate this weakness, which is shared by many distinct models presented in the literature, by analyzing which parameter was misestimated via backward projection or inverse UQ. The analysis pointed to the misestimation of changes in the transmission rate near the peak as the primary source of error. In addition, the correct estimation of any temporal change in the transmission rate was only possible many days ahead.

Finally, we performed projections, adjusted the model to the data up to 10 days before the peak, and focused on different scenarios that considered changes in the transmission rate. The projections that simulated significant reductions in the transmission rate were the ones where the pandemic peaks were closest to the real observed ones. Therefore, our results highlight how mathematical models can help the fight against the COVID-19 pandemic: by characterizing important parameters that dictate the pandemic dynamics, as performed before in our previous work (20) ; and via projections, when different scenarios and strategies can be tested and used for decision-making. In addition, our analysis suggests that the misestimation of changes in the transmission rate near the peaks is the main source of error during the task of forecasting the peaks of COVID-19 pandemic.

To demonstrate the impact of the transmission rate on forecasting the peak of the COVID-19 pandemic, the parameters of our model (20) were calibrated according to total and active COVID-19 cases and deaths in four countries that have already reached the pandemic peak: Austria, Germany, South Korea, and Italy. Peak predictions are performed considering a distinct number of available days.

The characterization of COVID-19 in these four countries is performed using inverse UQ techniques. Therefore, during the calibration of the model, the coefficients are treated as unknown probability density functions. Once estimated, the PDFs of the coefficients, their means, standard deviations (SD), and shape provide important information on model parameters that are essential in the characterization of the COVID-19 pandemic. The model and how it is adjusted are briefly described in this section to facilitate the understanding of the results. More details about the model, calibration of the parameters, and uncertainties can be found in our previous work (20).

The model used in this work (20) is based on the classic compartmental SIRD model (13, 14, 42-44), and was kept as simple as possible to reduce the number of unknown parameters to be estimated.

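The model is described by the following set of equations:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\alpha(t)}{N}\, S I, \\
\frac{dI}{dt} &= \frac{\alpha(t)}{N}\, S I - \beta I - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dD}{dt} &= \beta I, \\
I_r &= \theta I, \\
R_r &= \theta R, \\
C &= I_r + R_r + D,
\end{aligned} \tag{1}
$$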

where S, I, R, D, I_r, R_r, and C are the variables that represent the number of individuals within a population of size N that are susceptible, infected, recovered, dead, reported as infected, reported as recovered, and total confirmed cases, respectively. The term α(t) = a(t)b denotes the rate at which a susceptible individual becomes infected, where a(t) denotes the probability of contact and b the rate of infection. The function a(t) models temporal changes in the transmission rate:

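$$
a(t) =
\begin{cases}
1, & \text{if } t < t_i, \\
\dfrac{r - 1}{\Delta}\,(t - t_i) + 1, & \text{if } t_i \le t \le t_i + \Delta, \\
r, & \text{otherwise.}
\end{cases} \tag{2}
$$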

Each transmission change starts at t_i, and the transmission rate is changed by a factor r at the final time t_i + Δ. The mortality rate of infected individuals is modeled by the constant β = m(1/τ_o), where m is the probability of death. It must be noted that this is not the same as the death rate, nor as the percentage of deaths among the reported cases of positive infection (I_r). The number of days from infection until death is represented by τ_o = τ_1 + τ_2, where τ_1 is the incubation time of the virus and τ_2 is the time between the first symptoms and death. Similarly, the number of days from infection until recovery is represented by τ_r = τ_1 + τ_3, where τ_3 is the time between the first symptoms and recovery. The rate at which infected individuals recover from the virus is given by the constant γ = (1 − m)(1/τ_r).

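Lastly, the percentage of confirmed infected individuals that are notified or reported is represented by θ. For making projections using the model beyond the last day used during the fitting, we consider α(t) = a_p(t)b:

$$
a_p(t) =
\begin{cases}
a(t), & \text{if } t < t_f, \\
\dfrac{r_f - r}{\Delta_f}\,(t - t_f) + r, & \text{if } t_f \le t \le t_f + \Delta_f, \\
r_f, & \text{otherwise.}
\end{cases} \tag{3}
$$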

The function a_p(t) is similar to a(t), but it adds different scenarios in terms of how the transmission rate evolves after the last day of data used for model fitting. The constant t_f is the last day used during the calibration of the model, r_f is the final value of a_p(t) during the projection phase, and Δ_f is the time interval for a_p(t) to change from r to r_f. Figure 1 is an illustrative example for Equation (3).
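The two contact factors a(t) and a_p(t) can be sketched in a few lines of Python. This is a minimal illustration, assuming linear ramps between levels and assuming that t_f falls after the fitted transition has ended (t_f ≥ t_i + Δ); the function and argument names are illustrative and are not taken from the authors' code.

```python
def a(t, t_i, delta, r):
    """Contact factor during calibration: 1 before t_i, r after t_i + delta.

    Assumption of this sketch: the two levels are joined by a linear ramp.
    """
    if t < t_i:
        return 1.0
    if t <= t_i + delta:
        return 1.0 + (r - 1.0) * (t - t_i) / delta
    return r


def a_p(t, t_i, delta, r, t_f, delta_f, r_f):
    """Contact factor for projections beyond the last fitted day t_f.

    Follows a(t) before t_f, then moves (linearly, by assumption) from the
    fitted level r to the scenario level r_f over delta_f days.
    Assumes t_f >= t_i + delta, so that a(t) has already reached r at t_f.
    """
    if t < t_f:
        return a(t, t_i, delta, r)
    if t <= t_f + delta_f:
        return r + (r_f - r) * (t - t_f) / delta_f
    return r_f
```

For example, `a_p(t, 10, 5, 0.2, 30, 7, 0.05)` stays at 1 until day 10, drops to 0.2 by day 15, and then declines further to 0.05 between days 30 and 37.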

Model parameters were adjusted using the differential evolution (DE) optimization method (45) implemented in the C programming language. The DE was used to estimate each of the parameters of the proposed mathematical model, respecting the limits established for each one of them [see (20) ].
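To make the optimization step concrete, the following is a minimal sketch of a common differential evolution variant (DE/rand/1/bin) in Python. It is not the authors' C implementation: the population size, the mutation factor `f`, the crossover rate `cr`, and the number of generations are illustrative assumptions, and the function name is hypothetical.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` inside the box `bounds` with DE/rand/1/bin.

    Returns (best_vector, best_cost, evaluated), where `evaluated` holds every
    (vector, cost) pair tried, so that candidates can be filtered afterwards.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    evaluated = [(x[:], c) for x, c in zip(pop, cost)]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    v = min(max(v, lo), hi)  # respect the parameter limits
                else:
                    v = pop[i][j]
                trial.append(v)
            c = objective(trial)
            evaluated.append((trial, c))
            if c <= cost[i]:  # greedy selection between parent and trial
                pop[i], cost[i] = trial, c
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best], evaluated
```

Keeping every evaluated candidate, rather than only the best one, is a design choice of this sketch: it allows the offspring to be reused later when building parameter distributions.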

The parameter values were estimated based on official data from the epidemic reported in each country (Austria, Germany, S. Korea, and Italy) and made available by Dong et al. (46). Here, we are using Î(t) as the reported number of active cases, D̂(t) as the number of deaths, and Ĉ(t) as the total confirmed cases. The following objective function, Equation (5), was used to minimize the relative error (R_E(λ, λ̂)) between the data and the model:

where p is the set of parameters to be estimated and ω n is a weight. For this work, we used ω 1 = ω 2 = ω 3 = 1.0. Some input parameters of the model are subject to uncertainties and variations, due to measurement errors, technical limitations, and resource availability. Parameters such as the incubation period, mortality, period from symptoms to death, period from symptoms to recovery, and the effectiveness of contact reduction are hampered by uncertainties, as suggested in data from literature. Therefore, when evaluating models such as the one studied in this work, it is important to evaluate the uncertainties from estimated input parameters, providing a confidence interval for the predictions.
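A weighted relative-error objective of this kind can be sketched as follows. This is only an illustration: the Euclidean norm, the function names, and the ordering of the three series (active cases, deaths, confirmed cases) are assumptions of the sketch, not details confirmed by the text.

```python
import math


def relative_error(model, data):
    """Relative error ||model - data|| / ||data||.

    The Euclidean norm is an assumption of this sketch.
    """
    num = math.sqrt(sum((m - d) ** 2 for m, d in zip(model, data)))
    den = math.sqrt(sum(d ** 2 for d in data))
    return num / den


def objective(model_series, data_series, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of relative errors over (active, deaths, confirmed)."""
    return sum(w * relative_error(m, d)
               for w, m, d in zip(weights, model_series, data_series))
```

With all weights equal to 1.0, a model that overshoots each of the three series by 10% everywhere yields an objective value of about 0.3.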

Although the parameter intervals are reported in the literature, we still do not know how their probability density functions behave. So, an inverse UQ technique was used to estimate the PDFs and corresponding uncertainties of the input parameters or coefficients of the model during model calibration. For each model parameter, we determined its PDF from the fitting procedure using the DE method. Among the offspring generated by the DE during the fitting process, we selected individuals with o(p) ≤ 25%, where o(p) is defined in Equation (6).

in which p is the set of parameters to be estimated and ω n are the same weights defined in Equation (5) . Using this procedure, we acknowledge a possible discrepancy of up to 25% between the model and reality. Thus, from these samples, we estimate the covariance matrix and mean of all parameters. We use these data to generate a multivariate normal distribution to perform a forward UQ analysis via the Monte Carlo method with a total of 10,000 samples using the ChaosPy library (47) .
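The selection and sampling steps can be illustrated with the standard library alone. The sketch below is not the ChaosPy-based workflow used in this work: it filters candidates by their misfit, estimates the mean and covariance, and draws multivariate normal samples through a Cholesky factor. All function names are hypothetical.

```python
import math
import random


def accepted(evaluated, threshold=0.25):
    """Keep parameter vectors whose misfit o(p) is at most `threshold`."""
    return [p for p, cost in evaluated if cost <= threshold]


def mean_and_cov(samples):
    """Sample mean vector and unbiased covariance matrix of row vectors."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_mvnormal(mu, cov, n, seed=0):
    """Draw n vectors from N(mu, cov) via x = mu + L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    d = len(mu)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        draws.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                      for i in range(d)])
    return draws
```

A normal distribution is unbounded, so in practice samples that fall outside the admissible parameter limits would need to be handled separately; this sketch does not address that.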

Finally, we evaluate how the uncertainties in the input parameters of the model impact its outputs using forward UQ techniques. The forward UQ technique propagates the uncertainty of the input parameters to the outputs. Among the consolidated methods from the literature, Monte Carlo is one of the most used to perform uncertainty propagation (48-50). Briefly, this method draws samples of the input parameters and evaluates the model using them to provide statistical properties for the quantities of interest.
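The Monte Carlo propagation described above can be sketched in a few lines. The example assumes that `model` is any function mapping a parameter vector to a trajectory (a list of values over time) and summarizes the ensemble by a pointwise mean and a central percentile band; using percentiles to build the 95% band is an assumption of this sketch, and the function names are hypothetical.

```python
import statistics


def propagate(model, parameter_samples):
    """Evaluate `model` on every parameter sample; one trajectory per sample."""
    return [model(p) for p in parameter_samples]


def summarize(trajectories, level=0.95):
    """Pointwise mean and central `level` band across the trajectories."""
    lo_q = (1.0 - level) / 2.0
    mean, lower, upper = [], [], []
    for values in zip(*trajectories):  # iterate over time instants
        ordered = sorted(values)
        n = len(ordered)
        mean.append(statistics.fmean(ordered))
        lower.append(ordered[int(lo_q * (n - 1))])
        upper.append(ordered[int((1.0 - lo_q) * (n - 1))])
    return mean, lower, upper
```

The mean curve plays the role of the expected value and the band plays the role of the shaded uncertainty region when the model outputs are plotted against the data.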

The model was calibrated using the data reported by the Center for Systems Science and Engineering at Johns Hopkins University (46), between 01/22/2020 and 12/20/2020. The bounds used for the parameters are described in (20). Tables 1-4 show the mean and standard deviation of all offspring solutions with o(p) ≤ 25%. The value of N is 9.00 × 10^6 for Austria, 83.02 × 10^6 for Germany, 60.42 × 10^6 for Italy, and 51.47 × 10^6 for South Korea.

First, we checked whether the proposed model is able to fit the available data of the countries during the peaks. A summary of the inverse UQ analysis results is presented in Table 1, which presents the mean and standard deviation (SD) of the estimated PDFs of the parameters for the four countries. Figure 2 compares the results of the fitted models to the original data for each country. For each time instant t, I(t), C(t), and D(t) are PDFs, as a result of the forward uncertainty quantification process. It should be noted that the same model, with different parameters, was able to reproduce the distinct scenarios and peaks of the COVID-19 pandemic in Germany, Austria, Italy, and S. Korea. For the case of Italy, we fitted the recent second peak that has just been reached.

Figure 3 shows that the models can correctly forecast the dynamics of COVID-19 for the four countries when away from the peaks. The match between forecasts and real data is observed to last between 20 and 30 days after the fitting phase, which used 30 days for all countries. We have chosen different but all recent phases for these forecasts. The curves' shapes are very distinct, but as long as the forecast is away from the peak, the prediction is within the calculated confidence interval. Note the particular valley shape captured by the forecast for Germany. Table 2 presents the results of the calibration process.

TABLE 1 | The model parameters were calibrated using data from the first day with more than 100 cases to 05/11/2020 for Austria, Germany, and S. Korea. For Italy, the model parameters were calibrated from 08/17/2020 to 12/14/2020.

TABLE 2 | The model parameters were calibrated using 30 days of data. The model was then used to forecast the following 30 days for Austria, Germany, S. Korea, and Italy.

TABLE 3 | The model parameters were calibrated using data from the first day with more than 100 cases to 5 days before the pandemic peak for Austria, Germany, and S. Korea. For Italy, the model parameters were calibrated from 08/17/2020 to 5 days before the second peak.

TABLE 4 | The model parameters were calibrated using data from the first day with more than 100 cases to 10 days before the pandemic peak for Austria, Germany, and S. Korea. For Italy, the model parameters were calibrated from 08/17/2020 to 10 days before the second peak.

In Tables 1-4: b, COVID-19 transmission rate; m, death probability; r, contact reduction; t_i, start of intervention policy; Δ, duration of intervention policy; τ_1, incubation period; τ_2, period from symptoms to death; τ_3, period from symptoms to recovery; θ, fraction of notified cases.

As mentioned in the introduction, a literature review shows that many models fail to forecast the peak of the pandemic (26-28), regardless of whether they are based on SEIR (26), SIRD (25), extensions of SEIR with more compartments (28), statistical (33, 34), agent-based (35), machine-learning (36), or chaos-based theory models (27). The same happens with our model. The following experiment was performed: the parameters were adjusted again using data available for active cases until 10 days before the pandemic peak. Then, we tried to predict the number of active cases in the following days. The same experiment was then performed using the data available until 5 days before the pandemic peak. We chose to calibrate the models up to 5 or 10 days before the peak to show that forecasts may fail even when the peak is very close. A summary of the results of the inverse UQ analysis is presented in Tables 3, 4, which present the mean and standard deviation of the estimated PDFs of the parameters adjusted for the four countries using data until 5 and 10 days, respectively, before the pandemic peak. Figure 4 presents the forecasts. As one can observe, even in the case where the model was adjusted using data available 5 days before the peak, the calibrated model was not able to correctly predict the peaks.

FIGURE 3 | Total number of cases, active cases and deaths for Austria, Germany, South Korea, and Italy. Short forecasts of the dynamics of the pandemic for four countries. Available data is represented by • and ⋆. • represents the days used for fitting (before the vertical dotted line) and ⋆ represents the data that was not considered for fitting (after the vertical dotted line). The solid lines indicate the expected value obtained numerically after parameters have been fitted; shaded regions indicate the 95% confidence interval (CI). The x-axis represents days in dd/mm format.

FIGURE 4 | Forecasting the dynamics of the pandemic for four countries based on fitting the model to active cases data available until (A) 5 days before the peak and (B) 10 days before the peak. Available data is represented by • and ⋆. • represents the days used for fitting (before the vertical dotted line) and ⋆ represents the data that was not considered for fitting (after the vertical dotted line). The solid lines indicate the expected value obtained numerically after parameters have been fitted; shaded regions indicate the 95% confidence interval (CI). The x-axis represents days in dd/mm format.

In this section we investigate the possible sources of errors when forecasting the peaks of COVID-19. First, by comparing the inverse uncertainty quantification results presented in Tables 1, 3, 4, we can observe that the main difference between the simulations that capture the peaks (Figure 2) and those that do not (Figure 4) lies in the estimation of the parameters that describe the time-varying transmission rate (a(t)): t_i, r, and Δ. When the peak is well captured by the model, a(t) reduces significantly near the peak.

We continue this investigation by performing a sequence of four backward projection experiments, named E1 to E4. To describe these experiments, let P denote the day of the peak in the corresponding country considered in this study. The first experiment (E1) calibrates the model parameters up to 5 days before the peak, as performed before, but focusing near the peak, from P − 10 to P − 5 days. For the second experiment (E2), we expand the model calibration toward the peak, from P − 10 to P. We continue expanding the window surrounding the peak in the third experiment (E3), performing the adjustment from P − 10 to P + 10 days. Finally, in the last experiment (E4), the calibration is performed from P − 10 to P + 15 days. Then we compare all the parameters to check those that vary most between the different calibrations, which include the one we used before that failed to forecast the peak (E1, P − 10 to P − 5) and the one that captures the peak (E4). Once again, the main difference between the different experiments' estimated parameters was in those that describe the time-varying transmission rate (a(t)): t_i, r, and Δ.

Figure 5 presents the experiments E1-E4 for Austria and each corresponding estimated a(t). In this case, we note that from the adjusted and experimental data of E1 (active cases), there is little to no indication that the epidemic is slowing down and reaching the peak. Also, by comparing the shapes of the estimated a(t), we observe that only 10 days after the peak (E3) does the crucial information on how the transmission rate evolves before the peak converge [the shapes of a(t) obtained in E3 and E4 are nearly the same]. Figure 6 presents the experiments E1-E4 for Italy and each corresponding estimated a(t). In this case, we note that even from the adjusted and experimental data of E2 (active cases from P − 10 to P), there is little to no indication that the epidemic is slowing down and reaching the peak. Again, by comparing the shapes of the estimated a(t), we observe that only 15 days after the peak (E4) can we correctly estimate the shape of a(t) before the peak.

FIGURE 5 | Backward projection of the pandemic's peak for Austria: active cases (top) and transmission rate a(t) (bottom). Fitting the model to active cases data until (E1) 5 days before the peak, (E2) the peak, (E3) 10 days after the peak, and (E4) 15 days after the peak. The x-axis represents days in dd/mm format.

FIGURE 6 | Backward projection of the pandemic's peak for Italy: active cases (top) and transmission rate a(t) (bottom). Fitting the model to active cases data until (E1) 5 days before the peak, (E2) the peak, (E3) 10 days after the peak, and (E4) 15 days after the peak. The x-axis represents days in dd/mm format.
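The four calibration windows E1 to E4 can be written down relative to the peak day P. The short Python sketch below does this with calendar dates; the function name is hypothetical, and any peak date passed to it is supplied by the user rather than taken from the data used in this study.

```python
from datetime import date, timedelta


def calibration_windows(peak_day):
    """Fitting windows (first_day, last_day) for experiments E1 to E4,
    expressed relative to the peak day P."""
    offsets = {
        "E1": (-10, -5),   # P - 10 to P - 5
        "E2": (-10, 0),    # P - 10 to P
        "E3": (-10, 10),   # P - 10 to P + 10
        "E4": (-10, 15),   # P - 10 to P + 15
    }
    return {name: (peak_day + timedelta(days=first),
                   peak_day + timedelta(days=last))
            for name, (first, last) in offsets.items()}
```

Calling `calibration_windows(date(2020, 4, 3))`, with a placeholder peak date, returns the four windows, which can then be used to slice the active-case series before each calibration.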

The last experiment makes projections considering different scenarios of transmission rate reduction. Again, the parameters were adjusted using data available for active cases until 5 and 10 days before the peak (in the active cases) for the four countries. The idea here is to evaluate the impact of different shapes of a_p(t) after the calibration, i.e., different scenarios of transmission rate reduction.

Since the projection is done considering the same dates used for the forecasts, the values in Tables 3, 4 are the same for both experiments. In the projections we consider two distinct values for r_f, 0.10 and 0.05, i.e., two different final values for a_p(t), and two distinct values for Δ_f, 7 and 14. All projections considered t_i = 0. Therefore, we are considering different scenarios in which the transmission rate reduces after the calibration, with a_p(t) decreasing to 0.10 or 0.05 after 1 or 2 weeks. Figure 7 presents the projections for the four countries with Δ_f = 7. The peaks become visible when projecting a more significant reduction in the transmission rates, i.e., with r_f = 0.05. Figure 8 presents the projections with Δ_f = 14. The results are similar to the previous projections. The main difference is that the peaks are higher and occur further ahead. These results support that the fastest way to control the pandemic is with strict mitigation policies that can significantly reduce the transmission rate in a short period.

FIGURE 7 | Projections considering different isolation policies for Austria, Germany, S. Korea, and Italy, considering data until 5 days before the peak (A) and 10 days before the peak (B). All projections considered Δ_f = 7 days. Projections in blue and red considered a final contact reduction of 10% and 5%, respectively. • represents the days used for fitting (before the vertical dotted line), and ⋆ represents the days that were not considered for the fitting (after the vertical dotted line). The solid lines indicate the expected value obtained after parameters were fitted; shaded regions indicate the 95% confidence interval (CI). The x-axis represents days in dd/mm format.

FIGURE 8 | Projections considering different isolation policies for Austria, Germany, S. Korea, and Italy, considering data until 5 days before the peak (A) and 10 days before the peak (B). All projections considered Δ_f = 14 days. Projections in blue and red considered a final contact reduction of 10% and 5%, respectively. • represents the days used for fitting (before the vertical dotted line), and ⋆ represents the days that were not considered for the fitting (after the vertical dotted line). The solid lines indicate the expected value obtained after parameters were fitted; shaded regions indicate the 95% confidence interval (CI). The x-axis represents days in dd/mm format.
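The qualitative effect of these scenarios can be reproduced with a toy simulation. The sketch below integrates a SIRD-type model with forward Euler and a contact factor that ramps linearly from the fitted level r to r_f over Δ_f days after the last fitted day t_f. All numerical values other than the scenario values of r_f and Δ_f are illustrative assumptions, not fitted parameters from Tables 3, 4, and the linear ramp and the Euler scheme are choices made for this sketch.

```python
def simulate(b, m, tau_o, tau_r, n, i0, r, t_f, delta_f, r_f,
             days=120, dt=0.1):
    """Forward-Euler integration of a SIRD-type model with a projected
    contact factor. Returns (times, active) for the infected compartment I."""
    beta = m / tau_o            # mortality rate of infected individuals
    gamma = (1.0 - m) / tau_r   # recovery rate of infected individuals
    s, i = n - i0, float(i0)
    times, active = [0.0], [i]
    for k in range(int(round(days / dt))):
        t = k * dt
        # Contact factor: level r until t_f, then a linear ramp down to r_f.
        if t < t_f:
            a = r
        elif t <= t_f + delta_f:
            a = r + (r_f - r) * (t - t_f) / delta_f
        else:
            a = r_f
        new_inf = a * b * s * i / n
        s += dt * (-new_inf)
        i += dt * (new_inf - (beta + gamma) * i)
        times.append((k + 1) * dt)
        active.append(i)
    return times, active


def peak(times, active):
    """Time and height of the maximum of the active-case curve."""
    k = max(range(len(active)), key=active.__getitem__)
    return times[k], active[k]


# Illustrative (not fitted) values; only r_f and delta_f follow the text.
base = dict(b=0.25, m=0.02, tau_o=20.0, tau_r=14.0, n=9.0e6, i0=100,
            r=1.0, t_f=30.0)
scenarios = {(r_f, d_f): peak(*simulate(r_f=r_f, delta_f=d_f, **base))
             for r_f in (0.10, 0.05) for d_f in (7.0, 14.0)}
```

In this toy setting, the scenarios with the longer transition (Δ_f = 14) peak later and higher than those with Δ_f = 7, which is the same qualitative behavior described for the projections.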

First of all, one should observe that the model used in this work can reproduce the dynamics of COVID-19 for distinct countries. The pandemic peak for all countries considered here was accurately reproduced, since the adjusted model captures both the day on which the peak occurs and its maximum value, as shown in Figure 2. Figure 3 also shows that the model can be very useful in forecasting the dynamics of COVID-19.

There is, however, one main weakness of this model: forecasts near the pandemic peak usually fail. In Figure 4 , the same method was applied to adjust the parameters of the model, but this time we did not use the entire dataset. The model was adjusted using data until 5 and 10 days before the peak of the pandemic. As can be observed in Figure 4 , the forecasts overestimate the number of active cases and mispredict the peaks by more than a month.

In this work, we have used several tools based on forward and inverse UQ to identify the source of this problem. First, we compared the inverse UQ results presented in Tables 1, 3, 4 and observed that the main difference between the simulations that capture the peaks (Figure 2) and those that do not (Figure 4) lies in the estimation of the parameters that describe the time-varying transmission rate (a(t)): t_i, r, and Δ. When the peak is well captured by the model, a(t) reduces significantly near the peak. This reduction in the transmission rate is key in forecasting the pandemic peak. Long forecasts of the pandemic peak are therefore undermined by the lack of models that can forecast changes in the transmission rate, i.e., how a particular society behaves, changes of mitigation policies, or how society chooses to respond to them.

In addition, we performed a sequence of four backward projection experiments, named (E1) to (E4). The first experiment (E1) calibrates the model parameters up to 5 days before the peak. The second one (E2) calibrates it up to the peak, (E3) up to the peak plus 10 days, and (E4) up to the peak plus 15 days. Then we compared all the parameters to check those that vary most between the different calibrations. Once again, the main difference between the different experiments' estimated parameters was in those that describe the time-varying transmission rate, a(t). Figures 5, 6 also show that the shape of the estimated a(t) only converges to the correct one when using epidemic data that includes many days after the peak. These results show a delay between changes in the transmission rate and their impact on the curve of active cases, which can be as long as 15 days. This is intrinsic to the phenomenon and how it affects epidemic data: a new case is usually only reported after an incubation period followed by a delay associated with the test.

The presence of this delay suggests that the challenging task of forecasting the pandemic peak might require additional data and constant monitoring to capture the transmission rate better.
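To illustrate why a reporting delay hides a change in transmission, the toy example below shifts a synthetic series of new infections by an incubation period plus a testing delay. All numbers (growth factors, a 5-day incubation period, a 5-day testing delay) are hypothetical and chosen only for illustration; this is not the model used in this work.

```python
def simulate_new_infections(days, change_day, growth=1.15, decline=0.92, start=10.0):
    """Daily new infections that grow until change_day and decline afterwards."""
    series = [start]
    for day in range(1, days):
        factor = growth if day <= change_day else decline
        series.append(series[-1] * factor)
    return series


def reported_cases(infections, incubation_days, test_delay_days):
    """Shift infections by the total reporting delay; early days report nothing."""
    delay = incubation_days + test_delay_days
    return [0.0] * delay + infections[: len(infections) - delay]


def peak_day(series):
    """Index of the largest value in the series."""
    return max(range(len(series)), key=series.__getitem__)


if __name__ == "__main__":
    true_series = simulate_new_infections(days=60, change_day=30)
    observed = reported_cases(true_series, incubation_days=5, test_delay_days=5)
    print("transmission changes on day", peak_day(true_series))
    print("reported data turn on day", peak_day(observed))
```

In this sketch, any calibration that only sees the reported series before its turning point has no information about the change in transmission that has already occurred.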

These results show how inverse UQ and backward projections can provide important information on the dynamics of the COVID-19 pandemic. Finally, we also performed forward projections to assess different scenarios of transmission rate reduction. Figures 7, 8 show how significant changes in the transmission rate impact the dynamics and influence the location of the peaks. In addition, these results support the view that the fastest way to control the pandemic is with strict mitigation policies that can significantly reduce the transmission rate in a short period.
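The idea of a forward projection can be sketched with a generic SIR model integrated by the forward Euler method. This is a simplified stand-in, not the model calibrated in this work; the parameter values, the linear ramp in the transmission rate, and the function name `project_active_cases` are assumptions made for illustration.

```python
def project_active_cases(r, t_i=20, ramp_days=7, a0=0.35, recovery=0.1,
                         population=1_000_000, initial_infected=100,
                         days=300, steps_per_day=10):
    """Project daily active cases with a generic SIR model (forward Euler).

    r is the fraction of the initial transmission rate a0 that remains
    after a linear reduction starting on day t_i and lasting ramp_days.
    """
    dt = 1.0 / steps_per_day
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    daily_active = []
    for step in range(days * steps_per_day):
        t = step * dt
        if step % steps_per_day == 0:
            daily_active.append(infected)  # record once per day
        # Assumed piecewise-linear transmission rate.
        if t <= t_i:
            rate = a0
        elif t >= t_i + ramp_days:
            rate = a0 * r
        else:
            rate = a0 * (1.0 - (1.0 - r) * (t - t_i) / ramp_days)
        new_infections = rate * susceptible * infected / population * dt
        recoveries = recovery * infected * dt
        susceptible -= new_infections
        infected += new_infections - recoveries
    return daily_active


if __name__ == "__main__":
    for r in (1.0, 0.6, 0.4, 0.2):
        curve = project_active_cases(r)
        peak = max(curve)
        print(f"r = {r}: peak of {peak:.0f} active cases on day {curve.index(peak)}")
```

Comparing the projected curves for several values of `r` shows how both the height and the timing of the peak depend on the assumed reduction, which is the kind of scenario analysis the forward projections provide.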

In summary, our results highlight how mathematical models can help the fight against the COVID-19 pandemic: by characterizing important parameters that dictate the dynamics of the pandemic, as in our previous work (20); and via projections, in which different scenarios can be tested and used for decision-making. In addition, they suggest that forecasting the peaks of COVID-19 can be quixotic because of the difficulty of precisely estimating how the transmission rate evolves.

The dataset used for this study can be found in the CSSEGISandData repository at: http://github.com/CSSEGISandData/COVID-19.

RF, BM, and RS: software, methodology, and formal analysis. BdM and JM: formal analysis. JdO: software and formal analysis. ML: methodology and formal analysis. RW: conceptualization, software, methodology, and formal analysis. All authors wrote the original draft, reviewed and edited it, and have read and approved the final manuscript.

This work was partially supported by Fapemig, CNPq, CAPES, UFJF, UFSJ, and CEFET-MG.

A Dictionary of Epidemiology

The COVID-19 vaccine development landscape

Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2

Discovering drugs to treat coronavirus disease 2019 (COVID-19)

COVID-19-The search for effective therapy

Effectiveness of convalescent plasma therapy in severe COVID-19 patients

Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand

Isolation, quarantine, social distancing and community containment: pivotal role for old-style public health measures in the novel coronavirus (2019-nCoV) outbreak

Can we contain the COVID-19 outbreak with the same measures as for SARS?

Interrupting transmission of COVID-19: lessons from containment efforts in Singapore

COVID-19: what is next for public health?

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak

Mathematical epidemiology of infectious diseases: model building, analysis and interpretation

The mathematics of infectious diseases

Beyond just "flattening the curve": Optimal control of epidemics with purely non-pharmaceutical interventions

Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts

Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

Early dynamics of transmission and control of COVID-19: a mathematical modelling study

An optimal predictive control strategy for COVID-19 (SARS-CoV-2) social distancing policies in Brazil

Characterization of the COVID-19 pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in South Korea, Italy, and Brazil

A parametrized nonlinear predictive control strategy for relaxing COVID-19 social distancing measures in Brazil

A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action

Data-based analysis, modelling and forecasting of the COVID-19 outbreak

Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China

Analysis and forecast of COVID-19 spreading in China, Italy and France

Prediction of the epidemic peak of Coronavirus Disease in Japan, 2020

Chaos theory applied to the outbreak of COVID-19: an ancillary approach to decision making in pandemic context

Predicting the number of reported and unreported cases for the COVID-19 epidemics in China

Real-time forecasts of the COVID-19 epidemic in China from

Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang

Assessing the potential impact of COVID-19 in Brazil: mobility, morbidity and social vulnerability

On an alternative susceptible-infected-removed epidemic model in discrete-time

Propagation analysis and prediction of the COVID-19

Caution warranted: using the institute for health metrics and evaluation model for predicting the course of the COVID-19 pandemic

Modelling transmission and control of the COVID-19 pandemic in Australia

Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions

The problem of overfitting

Maier BF, Brockmann D. Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China

Imitation dynamics in the mitigation of the novel coronavirus disease (COVID-19) outbreak in Wuhan

Forecasting and Prediction of Discrete Time Series

A contribution to the mathematical theory of epidemics

Contributions to the mathematical theory of epidemics-I

Modeling Infectious Diseases in Humans and Animals

Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces

An interactive web-based dashboard to track COVID-19 in real time

Chaospy: an open source tool for designing methods of uncertainty quantification

Uncertainty estimation and Monte Carlo simulation method

Markov Chain Monte Carlo

Engineering application of experimental

The authors would like to thank the researchers Guilherme Côrtes Fernandes, Thaiz Ruberti Schmal and Luis Paulo da Silva Barra for the motivating discussions that improved the quality of this work.