key: cord-0559537-tq7tzagp
authors: Rodrigues, Tulio; Helene, Otaviano
title: A Monte Carlo approach to model COVID-19 deaths and infections using Gompertz functions
date: 2020-08-11
journal: nan
DOI: nan
sha: 9ac0b794dc98383f401b45a2c27685248979f801
doc_id: 559537
cord_uid: tq7tzagp

This study describes the dynamics of COVID-19 deaths and infections via a Monte Carlo approach. The analyses include death's data from USA, Brazil, Mexico, UK, India and Russia, which comprise the four countries with the highest number of deaths/confirmed cases, as of Aug 07, 2020, according to the WHO. The Gompertz functions were fitted to the data of weekly averaged confirmed deaths per day by mapping the $chi^2$ values. The uncertainties, variances and covariances of the model parameters were calculated by propagation. The fitted functions for the average deaths per day for USA and India have an upward trend, with the former having a higher growth rate and quite huge uncertainties. For Mexico, UK and Russia, the fits are consistent with a slope down pattern. For Brazil we found a subtle trend down, but with significant uncertainties. The USA, UK and India data shown a first peak with a higher growth rate when compared to the second one, demonstrating the benefits of non-pharmaceutical interventions of sanitary measures and social distance flattening the curve. For USA, a third peak seems quite plausible, most likely related with the recent relaxation policies. Brazil's data are satisfactorily described by two highly overlapped Gompertz functions with similar growth rates, suggesting a two-steps process for the pandemic spreading. The 95% CI for the total number of deaths ($times 10^3$) predicted by the model for Aug 31, 2020 are 160 to 220, 110 to 130, 59 to 62, 46.6 to 47.3, 54 to 63 and 16.0 to 16.7 for USA, Brazil, Mexico, UK, India and Russia, respectively. Our estimates for the prevalences of infections are in reasonable agreement with some preliminary reports from serological studies carried out in USA and Brazil. The method represents an effective framework to estimate the line-shape of the infection curves and the uncertainties of the relevant parameters based on the actual data.

The outbreak of the new coronavirus disease 2019 (COVID-19) brought a challenging scenario worldwide [1] [2] [3] , urging timely and effective responses from the authorities regarding the availability of intensive care units [4, 5] , as well as the implementation of nonpharmaceutical interventions of social distance and protective sanitary measures [6] [7] [8] . Epidemiological models [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] and other statistical approaches [19] [20] [21] have been very useful to guide actions to manage this crisis and to shed light on how to safely and gradually resume economics and social activities [22] . On the other hand, quantitative analyses are strongly susceptible to several uncertainties, such as under-reporting of confirmed cases and deaths [23] [24] [25] , lack of massive tests in some countries [26] , changes in policies and methods for reporting confirmed cases and deaths as time evolves during the * tulio@if.usp.br pandemic growth, very distinct socio-economic patterns and health facilities capabilities among different countries and also among different focuses of the disease within the same country. Such complex and puzzling scenario directly influences the forecast capabilities of the calculations, supporting the need of a multidisciplinary cooperation of the scientific community and the development of mathematical models to provide plausible estimates for the uncertainties of the relevant parameters [27] [28] [29] [30] . Therefore, the present analysis provides an effective phenomenological method to estimate the magnitude and the relevant uncertainties of some important quantities as the pandemic evolves, such as: (1) the peak day(s), growth rate(s) and total number of infected people for the reconstructed infection curves, (2) forecast distribution of deaths per day, and (3) forecast total number of deaths until Aug 31, 2020. It is worth mentioning, however, that the predictions of the model strongly depend on the prevailing conditions already in place for the specific country which data were analyzed, as any substantial change in governmental policies, either in the direction of loosening or tightening social distance, will generate different dynamics for the spread of the virus. The best-fit parameters are generally strongly correlated, but its uncertainties reflect the dispersion of the data at each epidemiological week, making the present analysis a suitable quantitative method to describe the pandemic line-shape behavior with few parameters. Moreover, such approach could also be useful for the identification of upward trends and its correlations with relaxation policies and procedures as global social and economics activities are gradually resumed [31] . The calculations assume that the total number of infected people increase according with a Gompertz-type function, which is a sigmoid curve with a lower growth rate at the beginning and at the end, such that:

where N represents the asymptotic number of infected people for t → ∞, λ is the growth rate and t 0 the peak time of the derivative of I(t). In such model, the number of infected people per time period can be written as:

For the specific case of COVID-19, the data of confirmed cases depend very strongly on testing and reporting policies for the related country. These policies may also vary along time, distorting the shape of the distributions. This complicated scenario disfavor the use of confirmed cases as a reliable source of information to describe the pandemic dynamics, which is crucial to guide government actions and decisions. In order to overcome these difficulties, we have adopted the data of deaths per day, as they should be more consistent with the actual spreading mechanism of the virus. The connection between the trial function of the number of infected people per day and the number of deaths per day takes into account the probability distribution function for the elapsed time between the infection and the death, which can be satisfactorily described by the sum of two time periods, namely, the incubation period t inc , and the symptom onset to death period t s−d . Both periods are generated in the Monte Carlo algorithm assuming that they are independent and gamma distributed with an average of 5.1 and 17.8 days and coefficient of variation of 0.86 and 0.45, respectively, as proposed elsewhere [32] based on previous studies of Wuhan data [6, 33] [34] . So, the time of the death t d can be written as: t d = t inf + t inc + t s−d , with t inf representing the time of infection. The analysis of the deaths data [35] were done considering the weekly averaged deaths per day and its corresponding standard deviation of the average for the respective epidemiological week, counted retrospectively from the end date of Aug 7, 2020. The time counting considered the day of the first confirmed case for each country and the first epidemiological weeks were chosen in such a way that all days of the week had at least one death, meaning that the present calculations do consider the early stages of the pandemic, including from 19 up to 22 weeks depending on the country (see Table I ). The mean day of each epidemiological week was chosen with a time bin of ± 3.5 days, and the mean of each day was taken at the half of the day. The trial line-shapes of the number of infected people per day were generated considering a single Gompertz function for the case of Mexico and Russia, a sum of two Gompertz functions for Brazil, United Kingdom, and India and three Gompertz functions for USA. Such approach allows the inclusion of two or more superimposed dynamics of the disease as one would expect considering the changes in policies as time evolves (which might remarkably reduce the growth rates) and also the cluster structure expected for large countries with multiple focuses of the disease (not observed for Mexico and Russia so far). Such method would also allow the inclusion of other peaks as the countries start their processes of re-opening social and economics activities, which seems to be the case for USA.

The fitting procedure was performed by mapping the χ 2 , defined as:

whereF iw is the trial function for deaths calculated at time t iw (the mean day of the corresponding week), y iw the weekly averaged deaths per day and σy iw its standard deviation. In that sense, the average values with higher standard deviations had lower weights in the fitting procedure. The best fit parameters of the Gompertz functions (N, λ and t 0 ) where obtained by sorting 5000 random sets of parameters around plausible guessing values and calculating the respective trial function and χ 2 for each candidate, assuming a total of 10 million infections using a time bin of one day. The trial functionsF iw were calculated for each set of parameters by the connection between the 10 million infection events with the respective death events using the probability distribution function of t inc + t s−d , herein denoted P rob(∆t) = P rob(t d −t inf ) (see the insert of Fig.2 ). This procedure was done several times with progressively narrow bins for each parameter's increments until the respective χ 2 converged to the minimal value (the convergence criteria required that the χ 2 obtained for 5000 random sets of parameters is lower than the χ 2 obtained for 4000 runs and their difference is lower than 0.05 units).

The calculation of the total number of infected people N (three Gompertz functions for USA; two for Brazil, UK and India and one for Mexico and Russia) was performed assuming an infection-fatality-ratio (ifr ) of 0.66% [33] . A least square method [36] was applied for the calculation of the uncertainties of the best fit parameters, which covariance matrix can be written as:

whereF =F ′ iw,j stands for the partial derivative ofF iw at any t iw in respect to the P j parameter of each Gompertz function (N, λ and t 0 ) . The variance matrix of the death's data is diagonal, such that: V iw,iw = (σy 2 iw ) −1 . Given the lower number of deaths per day for India and Russia and the huge variation of the data in each week, we have included an additional uncertainty of 5% (σy iw → σy iw + 0.05y iw ) in order to achieve a successful fitting. For the calculation ofF ′ iw,j we have used the resulting convolution between the reconstructed curve of infected people G(t) and the probability density function P rob(∆t), such that:

where k max = 1 for Mexico and Russia, 2 for Brazil, UK and India and 3 for USA. The propagation of the uncertainties of the best fit parameters to the reconstructed infection curve took into account the full co-variance matrix V b, as similarly described in [37] , with the vector G ′ m being defined as:

where G ′ m,1 , ...G ′ m,jmax are the partial derivatives of the infection curve G(t m , N k , λ k , t 0k ) in respect to the parameter P 1 , ...P jmax calculated at each day t m (j max = 3 for Mexico and Russia, 6 for Brazil, UK and India and 9 for USA). Consequently, the uncertainty of the infection curve at each point can be written as:

The uncertainties in the convoluted functions can be calculated as:

where σG(τ ) is obtained by the interpolation of σG m . Figure 1 shows the weekly averaged deaths per day distributions for all six countries (data points) and its respective convoluted functions F c (t) (dashed-dotted gray lines). The upper and lower estimates F ± C (t) [95% Confidence Intervals(CI)] are presented by the red and blue dashed-dotted lines, respectively. The total number of deaths and its 95% CI at any given time t f can be written as:

where N d (t i ) corresponds to the actual data of accumulated deaths until the day (t i ) for each country. Table I summarizes all the results, including the model predictions for the total number of deaths and its 95% CI for Aug 31, 2020. The countries with well defined first peaks (USA and UK) present the highest initial growth rates (0.109 ± 0.012d −1 and 0.100 ± 0.017d −1 ) and uncertainties in the peak days of 1.0 and 2.2 days, respectively. The second peaks in both cases have much lower growth rates, demonstrating the flattening of the infection curve due to non-pharmacological interventions of social distance and sanitary measures. Besides UK, which clearly shows a well controlled scenario, Mexico and Russia also have a trend down with modest uncertainties, which is a consequence of the fitting being successfully performed with a single Gompertz function. For the case of India, the first Gompertz function has a quite small contribution (∼2%) in the total number of infections and the second curve dominates the distribution of deaths per day; resulting in modest uncertainties. For the case of Brazil, there are few data points to properly constraint the second Compertz function, which has a similar growth rate compared with the first one. The large overlap between the two functions leads to high correlation coefficients between N 1 and λ 1 (-0.989), N 2 and λ 2 (-0.972), λ 1 and λ 2 (-0.841) and N 1 and N 2 (-0.973), influencing for the large uncertainties. A similar situation also play a role for the huge uncertainties found in the parameters of the third Gompertz function for USA, which is weakly constrained with few data points. The death's peak days have an average shift of 22.9 days from the corresponding infection's peaks, which is the average of Prob(t ) (the sum of the averages of the two gamma functions). The estimates for the prevalences of infections at the beginning of each month are also shown in Table I Obviously that these comparisons should be done with parsimony, given the fact that our results refer to an overall estimate for each country and are strictly related with an infection-fatality-ratio of 0.66% [33] . The upper panel of Figure 2 shows the model predictions for the accumulated number of infections (solid lines) and the lower panel shows the corresponding model estimates for the accumulated deaths, in comparison with the available data (data points) at each 5 day time interval. Once again it is verified a nice agreement between the data and the model, with some discrepancies found in the early stages of the pandemic (accumulated number of deaths 100). The Monte Carlo generated probability distribution Prob(t ), and the two gamma distributions (incubation and symptom onset to deaths periods) are presented in the insert of the lower panel of Figure 2 .

In conclusion, we have presented a few-parameter model to describe the dynamics of a pandemic in terms of Gompertz functions using Monte Carlo techniques to determine the best fit parameters and a least square method -weighted by the dispersion of the death's data in each epidemiological week -to estimate the relevant uncertainties. Model predictions for the accumulated deaths (solid lines) versus the available data (data-points), which are presented within 5 days' time intervals for clarity. The insert of the lower panel shows the Monte Carlo generated distribution Prob(t) (blue histogram) and the gamma distributions for the incubation (black histogram) and symptom onset to death (red histogram) periods.

A new coronavirus associated with human respiratory disease in China

A new twenty-first century science for effective epidemic response

Emerging understandings of 2019-ncov

Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations

Strengthening hospital capacity for the COVID-19 pandemic

Impact of nonpharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand

The effect of human mobility and control measures on the COVID-19 epidemic in China

Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing

Early dynamics of transmission and control of COVID-19: a mathematical modelling study

Using statistics and mathematical modelling to understand infectious disease outbreaks

Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study

A simple SIR model with a large set of asymptomatic infectives

Social interaction layers in complex networks for the dynamical epidemic modeling of COVID-19 in Brazil

First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment

Evolution and epidemic spread of sars-cov-2 in Brazil

Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: A data-driven analysis

Two complementary model-based methods for calculating the risk of international spreading of anovel virus from the outbreak epicentre. the case of COVID-19

Covid-19 predictions using a Gauss model

The dynamics of covid-19: weather, demographics and infection timeline

Level of underreporting including underdiagnosis before the first peak of COVID-19 in various countries: Preliminary retrospective results based on wavelets and deterministic modeling

Correcting under-reported COVID-19 case numbers: estimating the true scale of the pandemic

Countries test tactics in waragainst COVID-19

Parameter estimation and uncertainty quantication for an epidemic model, Mathematical biosciences and engineering

Application of the CDC Ebola Response modeling tool to disease predictions

Prediction of the epidemic peak of coronavirus disease in japan

On stable parameter estimation and forecasting in epidemiology by the levenberg-marquardt algorithm with broydens rank-one updates for the jacobian operator

Covid-19 mass testing facilities could end the epidemic rapidly

Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries

Estimates of the severity of coronavirus disease 2019: a model-based analysis

The average elapsed time from symptom onset to death was corrected to 17.8 days in [33], instead of their previous estimates of 18.8 days used by

The deaths data included in the analysis were taken from the European Centre for Disease Prevention and Control webpage

Covariance analysis by means of the least squares method, in Update of X ray and gamma ray decay data standards for detector calibration and other applications. V. 2: Data selection, assessment and evaluation procedures

Useful and little-known applications of the least square method and some consequences of covariances

COVID-19 antibody seroprevalence in

Remarkable variability in sars-cov-2 antibodies across Brazilian regions: nationwide serological household survey in 27 states

A populationbased study of the prevalence of COVID-19 infection in Espirito Santo, Brazil: methodology and results of the first stage