key: cord-1010811-3rmrkzuq authors: Cotta, Renato Machado; Naveira-Cotta, Carolina Palma; magal, pierre title: Modelling the COVID-19 epidemics in Brasil: Parametric identification and public health measures influence date: 2020-04-03 journal: nan DOI: 10.1101/2020.03.31.20049130 sha: 92acb561ba0ebfbeb8efb529279ae659dcd9a548 doc_id: 1010811 cord_uid: 3rmrkzuq A SIRU-type epidemic model is proposed for the prediction of COVID-19 spreading within Brasil, and analyse the influence of public health measures on simulating the control of this infectious disease. Since the reported cases are typically only a fraction of the total number of the symptomatic infectious individuals, the model accounts for both reported and unreported cases. Also, the model allows for the time variation of both the transmission rate and the fraction of asymptomatic infectious that become reported symptomatic individuals, so as to reflect public health interventions, towards its control, along the course of the epidemic evolution. An analytical exponential behaviour for the accumulated reported cases evolution is assumed at the onset of the epidemy, for explicitly estimating initial conditions, while a Bayesian inference approach is adopted for parametric estimations employing the present direct problem model with the data from the known portion of the epidemics evolution, represented by the time series for the reported cases of infected individuals. The direct-inverse problem analysis is then employed with the actual data from China, with the first half been employed for the parametric estimation and the second half for validation of the predictive capability of the proposed approach. The full dataset for China is then employed in another parameter identification, including the average times that asymptomatic infectious individuals and that symptomatic individuals are infectious. Following this validation, the available data on reported cases in Brasil from February 15th till March 29th, 2020, is used for estimating parameters and then predict the epidemy evolution under these conditions. Finally, public health interventions are simulated, aimed at diminishing the effects of the disease spreading, by acting on both the transmission rate and the fraction of the total number of the symptomatic infectious individuals, considering time variable exponential behaviours for these two parameters, usually assumed constant in epidemic evolutions without intervention. It is demonstrated that a combination of actions to affect both parameters can have a much faster and effective result in the control of the epidemy dynamics. A new human coronavirus started spreading in Wuhan, China, by the end of 2019, and turned into a pandemic disease called COVID-19 as declared by the World Health Organization on March 11 th , 2020. The affected countries and cities around the world have been reacting in different ways, towards locally controlling the disease evolution. These measures include general isolation through quarantine and massive testing for focused isolation, with varying degrees of success so far, as can be analysed from the limited data available. Naturally, China offers the longest time series on reported infected cases and the resulting effects of combining different public health interventions. As of March 26 th , 2020, there were no reports in China of further internal contaminations, and all the new cases are associated with infected individuals that (re)entered in the country. Despite the apparent success of the interventions in China, each region or country might require a specific combination of measures, due to demographic spatial distribution and age structure, health system capabilities, and social-economical characteristics. In this sense, it urges to have a mathematical model that would allow for the simulation of such possible interventions on the epidemic evolution within the following few weeks or months. This article presents a collaborative research effort towards the construction of an epidemic evolution prediction tool, which combines direct and inverse problem analysis and is both reliable and easy to implement and execute, initially motivated by offering some insight into the control of COVID-19 within Brasil. The classical susceptible-infectious-recovered (SIR) model describes the transmission of diseases between susceptible and infective individuals and provides the basic framework for almost all later epidemic models. At the onset of the coronavirus epidemy in China, there were some initial studies for the prediction of its evolution and and France [5] [6] [7] . Besides identifying unreported cases, this simple and robust model also allows for introducing a latency period and a time variable transmission rate, which can simulate a public health orientation change such as in a general isolation measure. In addition, an analytical exponential behaviour is assumed for the accumulated reported cases evolution along a second phase just following the onset of the epidemy, which, upon fitting of the available data, allows for the explicit analytical estimation of the transmission rate and the associated initial conditions required by the model. Here, the SIR-type model in [2] [3] [4] [5] [6] [7] is implemented for the direct problem formulation of the COVID-19 epidemic evolution, adding a time variable parametrization for the fraction of asymptomatic infectious that become reported symptomatic individuals, a very important parameter in the public health measure associated with massive testing and consequent focused isolation. The same analytical identification procedure is maintained for the required initial conditions, as obtained from the early stages exponential behaviour. However, a Bayesian inference approach is here adopted for parametric estimation, employing the Markov Chain Monte Carlo method with the Metropolis-Hastings sampling algorithm [8] [9] [10] [11] [12] . At first, the goal of the inverse problem analysis was estimating the parameters associated with the transmission rate and the fraction of asymptomatic infectious that become reported symptomatic individuals, which can be quite different in the various regions and countries and also very according to the public health measures. Then, in light of the success in this parametric identification, an extended estimation was also employed which incorporates the average time the asymptomatic infectious are asymptomatic and the average time the infectious stay in the symptomatic condition, due to the relative uncertainty on these parameters in the literature. The proposed approach was then applied to the data from China, first by taking just the first half of these data points in the estimation, while using the second half to validate the model using the estimated parameters with just the first half of the epidemy evolution, and second by employing the whole time series in the MCMC estimation procedure, thus identifying parameters for the whole evolution period. This second estimation was particularly aimed at refining the data for the average times that asymptomatic infectious individuals and that symptomatic individuals remain infectious. Upon validation of the approach through the data for China, we have proceeded to the analysis of the epidemic dynamics in Brasil, after about 35 days of collected information on reported infected individuals. First, the available data was employed in the parametric estimation, followed by the prediction of the epidemy evolution in Brasil. Then, we have explored the time variation of both the transmission rate and the fraction of asymptomatic infectious that become reported symptomatic individuals, so as to reflect public health interventions, in simulating possible government measures, as described in what follows. The implemented SIR-type model [2] [3] [4] [5] [6] [7] is given by the following initial value problem: where, with initial conditions + ν2(t) = ν. The transmission rate, τ(t), is also allowed to be a time variable function along the evolution process. Figure 1 below illustrates the infection process as a flow chart. The time variable coefficients, τ(t) and f(t), are given by: These parametrized functions are particularly useful in interpreting the effects of public health interventions. For instance, the transmission rate, τ(t), is particularly affected by a reduced circulation achieved through a general isolation or quarantine measure, while the fraction f(t) of asymptomatic infectious that become reported, thus isolated, cases can be drastically increased by a massive testing measure with focused isolation. In the above relations, is the attenuation factor for the transmission rate, N is the time in days for application of the public health intervention to change transmission rate, is the argument of the f(t) variation between the limits ( 0 , ). The first time variable function has been previously considered, while the second one has been introduced in the present work, so as to allow the examination of combined measures. The cumulative number of reported cases at time t, ( ), which is the quantity offered by the actual available data, and the a priori unknown cumulative number of unreported cases, ( ), are given by: The daily number of reported cases from the model, ( ), can be obtained by computing the solution of the following equation: with initial conditions Inverse problem analysis is nowadays a common practice in various science and engineering contexts, in which the groups involved with experimental data and numerical simulation synergistically collaborate so as to obtain the maximum information from the available data, towards the best possible use of the modelling for the problem under study. Here, as mentioned in the introduction, we first review an analytical parametric identification described in more details in [4] [5] [6] [7] , that from the initial phases of the epidemic evolution allows to explicitly obtain the unknown initial conditions of the model, while offering a reliable estimate for the transmission rate at the onset of the epidemy. Nevertheless, even after these estimates, a few other parameters in the model remain uncertain, either due to the specific characteristics of the physical conditions or reaction to the epidemy in each specific region, or due to lack of epidemiological information on the disease itself. Therefore, an inverse problem analysis was undertaken aimed at estimating the main parameters involved in the model, as summarized in Table 1 below. First, for the dataset on the accumulated reported cases for China, the focus is on the parametrized time variation of the transmission rate ( 0 and ) and the fraction of asymptomatic infectious that become reported ( 0 ), in this case assumed constant, followed by an effort to refine the information on the average times (1/ν and 1/η) through All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint a simultaneous estimation of the five parameters. Then, employing the dataset for Brasil, the parametrized time variation of the transmission rate ( 0 and ) and the fraction of asymptomatic infectious that become reported ( 0 ), initially assumed constant, are estimated. In addition, due to the behaviour of the estimated CR(t) curve in this case, it is also attempted to estimate a possible time variation for the fraction of asymptomatic infectious that become reported, ( ), by parametrization of an abrupt variation that requires just the estimation of and . The statistical inversion approach here implemented falls within the Bayesian statistical framework [8] [9] [10] [11] [12] , in which (probability distribution) models for the measurements and the unknowns are constructed separately and explicitly, as shall be briefly reviewed in what follows. As explained in previous works employing this model [4] [5] [6] [7] , it is assumed that in the early phase of the epidemic, the cumulative number of reported cases grows approximately exponentially, according to the following functional form: After fitting this function to the early stages of the epidemic evolution, one may extract the information on the unknown initial conditions, in the form [4] [5] [6] [7] : All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint In addition, an excellent estimate for the initial transmission rate can be obtained from the same fitted function, in the form: Also, the the basic reproductive number for this initial phase model is estimated as: The statistical approach for the solution of inverse problems here adopted employs where M is the number of parameters. For estimating P, we assume that a vector of measured data is available (Y) containing the measurements Yi at time ti, i = 1, …, I. Bayes' theorem can then be stated as [8] [9] : where  posterior (P) is the posterior probability density, that is, the conditional probability of the parameters P given the measurements Y,  prior (P) is the prior density, that is, the coded information about the parameters prior to the measurements,  (Y|P) is the likelihood function, which expresses the likelihood of different measurement outcomes involved, allowing one to do Bayesian inference even in rich and complex models. The idea behind the Metropolis-Hasting sampling algorithm is illustrated below, and these steps should be repeat until it is judged that a sufficiently representative sample has been generated. Start the chain with an initial value, that usually comes from any prior information that you may have; 2) Randomly generate a proposed jump aiming that the chain will move around and efficiently explores the region of the parameter space. The proposal distribution can take on many different forms, in this work a Gaussian random walk was employed, implying that the proposed jumps will usually be near the current one; 3) Compute the probability of moving from the current value to the proposed one. Candidates moving to regions of higher probability will be definitely accepted. Candidates in regions of lower probability can be accepted only probabilistically. If the proposed jump is rejected, the current value is tally again. For more details on theoretical aspects of the Metropolis-Hastings algorithm and MCMC methods and its application, the reader should refer to [8] [9] [10] [11] [12] . Before proceeding to the analysis of the COVID-19 epidemic evolution within Brasil, the major concern in the present contribution, the need was felt in validating the proposed direct-inverse problem analysis approach. In this sense, due to the largest available dataset All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . on this pandemic, we have chosen to use the information from China in terms of the accumulated confirmed infectious cases. The data for China was extracted from [6] , complemented by the most recent data from [13] up to March 25 th , 2020. The exponential fit for the early phase of the China CR(t) dataset provided the estimates of the three parameters, 1 = 0.14936, 2 = 0.37680, 3 = 1.0, from which we have estimated 0 = 5.046. The remaining data for the initial conditions, 0 and 0 , and the early stage transmission rate, 0 , are in fact recalculated from within the MCMC algorithm, since the changing values of f will affect them, according to eqs. (7.c-e). The average times in the model were taken as 1/ν=7 and 1/η=7 days and the isolation measures were taken at N=29 days [6] . First, experimental data from China from the period of January 19 th up to February 17 th was employed in demonstrating the estimation of three parameters, 0 , , and 0 , assuming there is no significant time variation in the function f(t) ( = 0). In the absence of more informative priors, uniform distributions were employed for all three parameters under estimation. Table 1 presents the prior information and the initial guesses for the parameters. If the initial guesses were used to predict the CR(t) behavior, an over-estimation of the accumulated reported infected individuals would occur, especially in the long term, as can be noticed in Figure 1 , confirming the need for a proper parameter estimation. Table 1 -Prior distributions and initial guesses for the parameters to be estimated 0 , , and 0 (China). the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is Table 2 . It should be recalled that non-informative priors were adopted for the three parameters, as presented in Table 1 , and except for the transmission rate, when eq.(7.e) provides an excellent initial guess, the remaining guesses were completely the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint arbitrary, such as in the analysis for a less complete dataset, as will be discussed in the next section. Although the present estimated parameters have led to a good prediction of the second half of the China epidemic evolution data, there are still uncertainties associated with the average times here assumed both equal to 7 days, according to [6] . This choice was based on early observations of the infected asymptomatic and symptomatic patients in Wuhan, but more recent studies have been refining the information on the epidemic evolution and All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint the disease itself, such as in [14] [15] [16] [17] . For this reason, we have also implemented a statistical inverse analysis with the full dataset of China, but now seeking the estimation of five parameters, so as to simultaneously estimate the average times (1/ν and 1/η). Both uniform and Gaussian distributions were adopted for the two new parameters, with initial guesses of 1/ν=7 days and 1/η=7 days, and N=29 days, as employed in [6] . Table 3 provides the estimated values and 95% confidence intervals for all five parameters, with Gaussian priors for the two average times with data obtained from [14, 17] . The most affected parameter in comparison with the previous estimates is the average time 1/η, which is also the one with widest confidence interval. This behaviour is also evident from the Markov chains for this parameter, now simultaneously estimated. Figure 5 compares the theoretical predictions with the model incorporating the five estimated parameters as in Table 3 , against the full CR(t) dataset for China, confirming the improved agreement. The CR(t) data for the accumulated reported infectious in Brasil, from February 25 th , when the first infected individual was reported, up to March 29 th , is presented in the Appendix. First, the exponential phase of the evolution was fitted, taking the data from day 10 to 25, yielding the estimates of the three parameters, 1 For instance, in the analysis of the Italy epidemic evolution reported in [6] , with data from January 31 st to March 8 th , a comparable low attenuation factor of = 0.032 was identified. It is also possible to observe the lower value of the parameter 0 , in comparison to the value obtained for the China dataset, which represents that only around 30% of the infected symptomatic individuals become in fact reported cases. This result could reflect an initial protocol of not thoroughly testing the mildly symptomatic individuals or just a lack of enough testing kits. This fact shall be discussed again further ahead when the impact of public health measures is analysed. Figure 6 illustrates the good agreement of Brasil's full dataset (period from February 25th till March 29th) with the mathematical model predictions, after adopting the estimated values for the parameters in Table 4 . The theoretical CR(t) curve is plotted together with the 95% confidence interval bounds for this simulated evolution. It should be recalled that non-informative priors were adopted for the three parameters, as in the China example, and except for the transmission rate, All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is Next, this parameter estimation is employed in the prediction of the COVID-19 evolution in Brasil. Five scenarios were here explored: (i) the present public health interventions remain unchanged; (ii) a stricter isolation is implemented from now on, further reducing the transmission rate; (iii) an attenuation on the social isolation policy, leading to an increased transmission rate; (iv) an increment on the fraction of reported cases, through a massive blood testing campaign, for instance, forcing more unreported cases to become All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint reported ones, thus isolating them earlier; (v) a combination of public health measures acting on both reducing the transmission rate and on increasing the conversion factor of unreported to reported cases; In the first scenario, it is assumed that no further public health interventions are implemented, other than those already reflected by the data which should be fully maintained throughout the control period, and the epidemics should evolve from the present stage, under the parameters above identified. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint Finally, in the fifth scenario, the combination of public health measures acting on both the transmission rate and on the conversion factor or unreported to reported cases is analyzed for Brasil. Therefore, let us consider after day N2=40, 50% improvement with respect to the value of here identified, thus around, 2 =0.0831, and simultaneously increase the fraction of reported and unreported infectious cases, to become = 0.7185, also starting after day N2=40, with =0.5. The changes in the accumulated reported and unreported cases, as shown in Figure 11 , are the most encouraging in the present analysis. The predicted number of unreported infectious cases is now reaching after 150 days around 36,770 individuals, while the reported cases should reach 50,006 individuals, with a marked decrease to a total of around 86,777 infectious cases, about 30% reduction with respect to the base case. The predicted evolution of the daily reported infectious cases would then show a peak at around t=46 days of about 2,196 reported cases. Again, though this peak value is higher than for the base case, before the public health improvements, a number of these are of mild symptomatic cases that were moved from the unreported to the reported cases evolution, thus moved to monitored isolation earlier, and not necessarily requiring hospitalization. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint Though the three parameters estimation provides a fairly good reproduction of the behaviour of the CR(t) curve for Brasil, one may observe a change in the pattern of the evolution around day 30, that could not be entirely followed by the proposed model. It is also a known fact that the initial amount of kits for blood testing that were purchased by the Brazilian government were finished around this time, and before being fully supplemented, there could have been a reduction on the number of executed exams of the symptomatic individuals, that might have affected the partition of reported to unreported cases by the end of this period covered by the present dataset. Therefore, the more general model including the time variation of the partition f(t), eqs.(4.c,d), is here implemented for a more refined inverse problem analysis. It is then expected that a reduction on the f value can be identified ( < 0 ), with an abrupt variation on the exponential behaviour, here assumed as a sharp functional time dependence (large ). Therefore, an additional statistical inverse problem analysis is undertaken, this time for estimating five parameters, namely, 0 , , 0 , , and , aimed at improving the overall agreement with the CR(t) data behaviour, with a possible reduction on the partition of the reported and unreported the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint infectious cases. With uniform distributions for all five parameters, taking the previous estimates for the three first parameters, an arbitrary guesses for , and , the five estimated parameters are shown in Table 5 , together with the 95% confidence interval for each parameter. Figure 12 shows the theoretical CR(t) curve obtained with the five parameters estimation, plotted together with the 95% confidence interval bounds for this simulated evolution. One can see the marked reduction on the f(t) parameter from the estimates in Table 5, which results in the increase of the unreported to reported infectious cases, as is shown in Figure 13 .a for CR(t) and CU(t) predictions up to 150 days. Clearly, the reduction on the testing, and thus on the isolation of reported infectious individuals, leads to an impressive increase on the total number of infected individuals after 150 days (723,698 cases), including unreported (609,125) and reported cases (114,572). Figure 13 .b presents the predicted evolution of the daily reported infectious cases, which shows a peak at around t=61 days of about 2,672 reported cases. Hopefully this difficulty with the availability of enough testing kits that occurred around day 30 has been already solved and the desirable increase on the number of tests and reported cases will be apparent from the next few entries in the accumulated reported cases. From the present results it is quite clear that the reduction on the testing has unfortunate consequences on the epidemic evolution. At the end of this report, the predicted results for CR(t) provided the value of 5438 reported cases, in comparison to the officially announced value of 5717 cases on March 31 st , 2020. All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint Figure 13 .b -Prediction of the daily reported data distribution, DR(t), with the five estimated parameter values from the available daily reported cases dataset for Brasil from February 25 th up to March 29 th (red dots). The present work implements a mixed analytical-statistical inverse problem analysis to the prediction of epidemics evolution, with focus on the COVID-19 progression in Brasil. A SIRU-type model is implemented for the direct problem solution, while a mixture of an analytical parametric estimation for the early phase epidemic exponential behavior with a Bayesian inference approach for the entire period, are considered for the inverse problem analysis. The evolution of the COVID-19 epidemy in China is considered for validation purposes, by taking the first part of the dataset to estimate parameters, and retaining the rest of the evolution data for direct comparison with the predicted results, with excellent agreement. Then, the same approach is applied to the Brazilian case, this time employing the available time series so far for the parametric estimates, and then offering an evolution prediction. Also, some public health intervention measures are critically examined, in addition to those already implemented, permitting the inspection of their impact on the overall dynamics of the disease proliferation. Clearly, a combination of public health interventions can offer a considerable impact reduction on the disease progression within Brasil, as illustrated by the implemented modelling. It was also analyzed the negative impact due to the scarcity of testing kits during a period, which if not solved and even incremented, would lead to an increase on the ratio of unreported to reported symptomatic cases, and consequently on a dramatic epidemic evolution. All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2020.03.31.20049130 doi: medRxiv preprint Further improvement on the modelling is envisioned by enriching the model with latency effects, age structure discrimination, spatial demographic distribution dependence, and recovery factor differentiation among isolated and non-isolated patients. Estimation of the Transmission Risk of 2019-nCov and Its Implication for Public Health Interventions Understanding unreported cases in the 2019 -nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions The parameter identification problem for SIR epidemic models: Identifying Unreported Cases Identifying the Number of Unreported Cases in SIR Epidemic Models Predicting the cumulative number of cases for the COVID -19 epidemic in China from early data Predicting the number of reported and unreported cases for the COVID -19 epidemic in South Korea A COVID -19 epidemic model with latency period SSNR Statistical and Computational Inverse Problems, Applied Mathematical Sciences Stochastic Simulation for Bayesian Inference Thermal Measurements and Inverse Techniques Doing Bayesian Data Analysis: A Tutorial with R, JAGS and Stan Inverse Problems in Heat Transfer: New Trends on Solution Methodologies and Applications The Incubation Period of Coronavirus Disease From Publicly Reported Confirmed Cases: Estimation and Application Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand, Imperial College COVID-19 Response Team All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is The authors are deeply grateful to Dr. Tania Mattos Petraglia, MD, for the valuable information on COVID-19 pathology and treatment.