key: cord-0669690-tsoga8he authors: Duan, Qibin; Wu, Jinran; Wu, Gaojun; Wang, You-Gan title: Predication of Inflection Point and Outbreak Size of COVID-19 in New Epicentres date: 2020-07-15 sha: 1de6b10730c5a257620f233bfc80c788e14d20e5 doc_id: 669690 cord_uid: tsoga8he

The coronavirus disease 2019 (COVID-19) had caused more than 8 million infections as of mid-June 2020. Recently, Brazil has become a new epicentre of COVID-19, while India and the African region are potential epicentres. This study aims to predict the inflection point and outbreak size of these new/potential epicentres at the early phase of their epidemics by borrowing information from the more 'mature' curves of other countries. We fitted the cumulative cases to well-known sigmoid growth curves to describe the epidemic trends under mixed-effects models, using the four-parameter logistic model after power transformations. The African region is predicted to have the largest total outbreak size of 3.9 million cases (2.2 to 6 million), and its inflection point will come around September 13, 2020. Brazil and India are predicted to have a similar final outbreak size of around 2.5 million cases (1.1 to 4.3 million), with inflection points arriving on June 23 and July 26, respectively. We conclude that in Brazil, India and the African region the COVID-19 epidemics have not yet passed their inflection points; these regions could potentially overtake the USA in terms of outbreak size.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first reported in Wuhan, China at the end of 2019, spread across China and the globe, and the outbreak was declared a pandemic on March 11, 2020. As of mid-June 2020, it had caused more than 8 million infections. As reported by WHO (2020), the USA and Europe have been epicentres of COVID-19 in recent months and have experienced rapid increases in outbreak size.
Although the growth of daily confirmed infections in these regions is slowing down, there is still no sign that the epidemic has been brought under control or has flattened there; even worse, Brazil, India and the African region (all affected African countries) have become new epicentres of COVID-19, with the number of confirmed infections increasing every day (Lancet, 2020; Pearson et al., 2020). Mathematical modelling, including statistical modelling, is an important tool for understanding and predicting the dynamics of new diseases. Since COVID-19 was first identified, different approaches have been developed to simulate and characterize its dynamics and spread (Grasselli et al., 2020; Yang et al., 2020; Roosa et al., 2020; Petropoulos & Makridakis, 2020; Cui & Hu, 2020; Perc et al., 2020; Sajadi et al., 2020; Zhang et al., 2020). The classical approach is dynamic infectious disease modelling, using deterministic ODE models or stochastic individual-based models; such models can incorporate the underlying mechanisms of spread and various risk factors when simulating transmission (Peng et al., 2020; Wynants et al., 2020; Kucharski et al., 2020; Fanelli & Piazza, 2020). This approach is commonly used to identify the crucial transmission parameters and to assess the potential impact of public health interventions. Although such models can also predict transmission trends, setting them up requires detailed information on local demography and human behaviour (praxiology), which is difficult to obtain accurately. For prediction purposes, data-driven (or phenomenological) methods, e.g., machine learning and statistical modelling, are preferred (Alimadadi et al., 2020; Ribeiro et al., 2020; Benvenuto et al., 2020; Ceylan, 2020; Zheng et al., 2020; Kavadi et al., 2020).
Most existing modelling approaches have been developed to characterize the transmission and impact of COVID-19 in a specific country or region for which this information can be estimated, e.g., forecasting the confirmed cases and deaths in China (Gao et al., 2020; Al-Qaness et al., 2020). However, for many regions such modelling studies are still unavailable. Knowledge of the inflection point (when the curve flattens) and the maximal outbreak size is crucial for understanding the evolving trend of an epidemic; for COVID-19 this information remains unclear, yet it influences the timing of changes to policy restrictions and the recovery of the global economy. Furthermore, accurately predicting this information at an early stage is difficult because of the lack of detailed data on testing availability, reporting and infection processes, and governmental restrictions. To address these problems, we use a non-linear mixed-effects model for the (transformed) daily reported cumulative number of confirmed cases, with the data grouped by country or region. In countries at the later stage of the epidemic, the reported cumulative number of confirmed cases shows a sigmoidal shape over time, so we use the four-parameter logistic model (FPLM), a generalization of the logistic growth model, to model the growth patterns. By fitting this non-linear mixed-effects model, we can predict the inflection point and the final outbreak size in the modelled countries and regions. The data set was downloaded from the European Centre for Disease Prevention and Control (ECDC), which provides the geographic distribution of COVID-19 cases worldwide, e.g., daily incidence (newly confirmed cases), the cumulative number of confirmed cases and the population of each country.
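The ECDC file reports daily incidence, while the models below use the cumulative count as the response. A minimal Python sketch of that preprocessing step, assuming the daily counts for one country arrive as a plain list: it forms the running total and drops the days before the total first reaches a threshold. The function name `cumulative_series` and the threshold `min_cases` are hypothetical; the paper does not state which rule was used to discard early observations.

```python
from itertools import accumulate


def cumulative_series(daily_new, min_cases=100):
    """Return (day_index, cumulative_cases) pairs from daily incidence.

    min_cases is a hypothetical cutoff: days before the running total
    first reaches it are dropped as early, unreliable observations.
    Later days are kept even if a negative correction dips the total.
    """
    cumulative = list(accumulate(daily_new))  # running total of new cases
    start = next(
        (day for day, total in enumerate(cumulative) if total >= min_cases),
        len(cumulative),  # threshold never reached: keep nothing
    )
    return [(day, cumulative[day]) for day in range(start, len(cumulative))]


# Hypothetical daily counts; keeps days 5 and 6 (totals 134 and 254).
print(cumulative_series([1, 0, 3, 10, 40, 80, 120]))
```

Keeping the original day index preserves the time axis t, so the retained observations stay aligned with the calendar after the early days are dropped.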
Note that we select two groups of countries/regions: the first consists of countries in the late stage of the outbreak, including Australia, China, France, Italy, Germany, Spain and UK; the second consists of countries/regions still in the early stage of the outbreak, including USA, Africa (as a whole), Brazil, India and Russia. The data set contains officially confirmed and reported cases, which are inevitably inaccurate and under-reported because of limited testing coverage, especially in the early period of the outbreak. Moreover, this study is more interested in the future growth trend and the final outbreak size. Hence, early observations were discarded so that the model is not constrained by them and can fit the early period more flexibly. Let y(t) be the cumulative number of cases at time t and Y(t) its derivative, representing the growth rate. If x_t represents explanatory variables (such as temperature, or behaviour changes due to government restrictions) believed to be related to the growth rate, we need to incorporate their effect into the growth model via a link function, where f(y) specifies the growth rate as a function of the current size under constant environmental conditions, while g models how the growth rate might change when the environmental conditions x_t change. Here σ(t) is a zero-mean stochastic error representing environmental or measurement perturbations, possibly with heteroscedasticity. More details can be found in Wang (1999). A simple linear f corresponds to the asymptotic regression model (also known as the von Bertalanffy growth curve). However, we are particularly interested in sigmoidal curves, for which the inflection point is of great interest. Well-known curves of this type include the logistic (autocatalytic), Richards and Gompertz curves (Seber & Wild, 1989). In this study, we apply mixed-effects models assuming that each country follows the same curve but with a different set of parameters.
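Why a linear f cannot describe an inflection point can be seen in the simplest deterministic case. A minimal Python sketch, assuming a constant environment (g = 1) and no error term, both simplifications made only for this illustration: it integrates dy/dt = f(y) with Euler steps for a linear f (asymptotic regression) and a logistic f, and reports the size at which the growth rate peaks. The values of `K`, `R` and the step size are hypothetical.

```python
K = 1000.0  # hypothetical final outbreak size
R = 0.5     # hypothetical rate constant


def linear_rate(y):
    """Linear f: asymptotic regression (von Bertalanffy) growth rate."""
    return R * (K - y)


def logistic_rate(y):
    """Logistic (autocatalytic) f: growth rate peaks at y = K / 2."""
    return R * y * (1.0 - y / K)


def simulate(f, y0, dt=0.01, steps=5000):
    """Euler integration of dy/dt = f(y), assuming g = 1 and no error term."""
    ys, rates = [y0], [f(y0)]
    y = y0
    for _ in range(steps):
        y += dt * f(y)
        ys.append(y)
        rates.append(f(y))
    return ys, rates


def size_at_peak_growth(f, y0):
    """Step index and size y at which the growth rate dy/dt is largest."""
    ys, rates = simulate(f, y0)
    peak = max(range(len(rates)), key=rates.__getitem__)
    return peak, ys[peak]


print(size_at_peak_growth(linear_rate, 1.0))    # peak at step 0: no inflection
print(size_at_peak_growth(logistic_rate, 1.0))  # peak near y = K / 2
```

The linear f grows fastest at the very start, so its curve is concave throughout and has no inflection point; the logistic f grows fastest at half the final size, which is why a sigmoidal curve is needed when the inflection point is the quantity of interest.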
These parameters could potentially be modelled as functions of population size and other attributes; more details can be found in Pinheiro & Bates (2006). A nonlinear mixed-effects model can be written as

$$g(y_{ij}) = f(\beta_i, t_{ij}) + \epsilon_{ij}, \qquad \beta_i = A_i \beta + B_i b_i, \qquad (2)$$

for observation $j$, $j = 1, \ldots, n_i$, in group $i$, $i = 1, \ldots, M$. In model (2), $\beta_i$ combines the fixed effects $\beta$ and the random effects $b_i$. Specifically, in our case $y$ is the cumulative number of confirmed cases, $g$ is the transformation used, $i$ indexes the country/region, and $t$ is the time index of the observation (day). Here $A_i$ and $B_i$ are the design matrices of the $i$th group for the fixed and random effects, respectively. The advantage of the mixed-effects model is that it produces more precise estimates of $b_i$ and $\beta_i$ by borrowing strength from the rest of the sample; see Pinheiro & Bates (2006) for more details about non-linear mixed-effects models. To model the confirmed cases over time, we use the Four Parameter Logistic Model (FPLM), so that the group-specific parameters are $\beta_i = \phi_i = (\phi_{i1}, \phi_{i2}, \phi_{i3}, \phi_{i4})^\top$ and

$$f(\phi_i, t_{ij}) = \phi_{i1} + \frac{\phi_{i2} - \phi_{i1}}{1 + \exp\{(\phi_{i3} - t_{ij})/\phi_{i4}\}},$$

where the parameters are:
• $\phi_{i1}$, the minimum theoretical value of the response as $t_{ij} \to -\infty$;
• $\phi_{i2}$, the maximum value as $t_{ij} \to \infty$;
• $\phi_{i3}$, the inflection point: at $t_{ij} = \phi_{i3}$ the response is midway between $\phi_{i1}$ and $\phi_{i2}$;
• $\phi_{i4}$, a scale parameter for time.
We are particularly interested in the following two quantities: $n_{\max}$, the maximum number of cases (the asymptote), and $n_{\mathrm{Infl}}$, the number of cases at the inflection point, where the response equals $(\phi_1 + \phi_2)/2$. We first fitted this model to the raw data (Model 1):

$$y_{ij} = f(\phi_i, t_{ij}) + \epsilon_{ij}. \qquad (3)$$

We then used power and logarithmic transformations of the cumulative number of confirmed cases as the response, as follows.
a. Power (square root) transformation (Model 2):

$$\sqrt{y_{ij}} = f(\phi_i, t_{ij}) + \epsilon_{ij}. \qquad (4)$$

b. Logarithmic transformation (Model 3):

$$\log_{10} y_{ij} = f(\phi_i, t_{ij}) + \epsilon_{ij}. \qquad (5)$$

Here we fit the transformed cumulative cases to well-known growth curves to describe the epidemic trends in the countries and regions of interest, under the mixed-effects model with the four-parameter logistic model.
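The FPLM mean function and the stated properties of its parameters can be checked numerically. A minimal Python sketch, assuming hypothetical parameter values on the $\log_{10}$ scale of Model 3; it only evaluates the curve and does not fit the mixed-effects model. The function name `fplm` is illustrative, and the curve is written in a numerically stable form so that very early or very late times do not overflow `math.exp`.

```python
import math


def fplm(t, phi1, phi2, phi3, phi4):
    """Four-parameter logistic: phi1 + (phi2 - phi1) / (1 + exp((phi3 - t) / phi4))."""
    z = (phi3 - t) / phi4
    if z >= 0:
        e = math.exp(-z)
        weight = e / (1.0 + e)              # equals 1 / (1 + exp(z)) without overflow
    else:
        weight = 1.0 / (1.0 + math.exp(z))  # exp(z) < 1 here, so no overflow
    return phi1 + (phi2 - phi1) * weight


# Hypothetical log10-scale parameters: 10^2 to 10^6 cases, midpoint on day 100.
PHI = (2.0, 6.0, 100.0, 15.0)
print(fplm(-1e6, *PHI))   # close to phi1 (lower asymptote)
print(fplm(100.0, *PHI))  # exactly midway: (phi1 + phi2) / 2 = 4.0
print(fplm(1e6, *PHI))    # close to phi2 (upper asymptote)
```

This parameterisation coincides with R's self-starting `SSfpl(input, A, B, xmid, scal)` model, with A = φ1, B = φ2, xmid = φ3 and scal = φ4; the paper does not state which software was used for fitting.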
The advantage of the mixed-effects model is that it borrows information from members with rich information (Pinheiro & Bates, 2006). The four-parameter logistic model has been shown to perform well in describing epidemic growth (Wu et al., 2020; Chen et al., 2020). We included 12 countries and regions in our study. Nine countries (Australia, China, France, Germany, Italy, Russia, Spain, UK and USA) have experienced almost a full growth curve, and the other regions (Brazil, India and the African region), which are at the early phase of the epidemic, are expected to share some similarities with some of these nine countries in transmission and response strategies. This may result in similar growth in confirmed cases. The fitted results for all selected countries/regions under the three models are shown in Table 1. For the log-transformed model, $n_0 = 10^{\phi_1}$, $n_{\max} = 10^{\phi_2}$, and the number of cases at the inflection point is $n_{\mathrm{Infl}} = \sqrt{10^{\phi_1+\phi_2}} = \sqrt{n_0 n_{\max}}$. Models 1, 2 and 3 correspond to no transformation, the square-root transformation and the log transformation of the cumulative number of confirmed cases in (3), (4) and (5), respectively. Our models also confirm that the nine countries mentioned above had passed their inflection points (the square markers in Figure 1). Specifically, China had its inflection point on February 9, 2020, with a final outbreak size of 87k (Gu et al., 2020). Australia, Italy, Spain, Germany and France passed their inflection points in late March and early April; in these countries the current outbreak size has almost reached the estimated maximal level (their upper confidence bounds are not visible in Figure 1). Unfortunately, cases in the other countries (i.e., USA, Russia and UK) will continue to increase, possibly to 2,357k (up to 2,425k), 538k (up to 546k) and 309k (up to 313k), respectively. The other three regions (Brazil, India and the African region) are still in the early outbreak phase (before the inflection point).
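Converting fitted FPLM asymptotes back to case counts depends on the transformation used. A minimal Python sketch, assuming the asymptotes φ1 and φ2 of one country have already been estimated (the values below are hypothetical): it inverts the response transformation of Models 1 to 3 and evaluates it at φ1, φ2 and the midpoint (φ1 + φ2)/2, which the response reaches at t = φ3. The dictionary `INVERSE` and the function `case_counts` are illustrative names. For the log10 model this reproduces $n_{\mathrm{Infl}} = \sqrt{n_0 n_{\max}}$.

```python
import math

# Inverse of the response transformation for each model (names are illustrative).
INVERSE = {
    "raw": lambda v: v,            # Model 1: no transformation
    "sqrt": lambda v: v ** 2,      # Model 2: square-root transformation
    "log10": lambda v: 10.0 ** v,  # Model 3: log10 transformation
}


def case_counts(phi1, phi2, model):
    """Back-transform FPLM asymptotes to (n0, n_max, n_infl) in cases."""
    inverse = INVERSE[model]
    n0 = inverse(phi1)
    n_max = inverse(phi2)
    # At t = phi3 the transformed response is midway between phi1 and phi2.
    n_infl = inverse((phi1 + phi2) / 2.0)
    return n0, n_max, n_infl


n0, n_max, n_infl = case_counts(2.0, 6.0, "log10")
print(n0, n_max, n_infl)                            # 100.0 1000000.0 10000.0
print(math.isclose(n_infl, math.sqrt(n0 * n_max)))  # True
```

The transformation matters for interpretation: with asymptotes of 100 and 10^6 cases, the log model places the inflection at 10^4 cases, the geometric mean and far below n_max/2, whereas on the raw scale (Model 1) it would sit at the arithmetic mean of n0 and n_max.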
Predicting the inflection point from the current data for these regions is very difficult; however, by borrowing the curve shapes of other countries, the non-linear mixed-effects model provides sensible predictions, albeit with wide error intervals (left panel of Figure 1). The African region is predicted to have the largest total outbreak size of 3.9 million cases (2.2 to 6 million), and its inflection point will come around September 13, 2020. Note that the African region here includes all African countries affected by COVID-19. Brazil and India are predicted to have a similar final outbreak size of around 2.5 million cases (1.1 to 4.3 million), with inflection points arriving on June 23 and July 26, respectively. The epidemic in Brazil has entered the rapid growth stage, and its cumulative number of confirmed cases is increasing quickly. India is in a similar situation, but its growth is predicted to lag behind Brazil's by about one month. In these developing areas, community transmission of COVID-19 has been ongoing for a while, but large-scale testing has become available only in recent weeks; the rapid increase in confirmed cases can be attributed to the increasing testing coverage. Currently the African region has a relatively small number of confirmed infections (fewer than 150k), but this number might continue to increase to the level estimated in this study if the epidemic is not controlled effectively (Pearson et al., 2020). Although the USA passed its inflection point around April 18, 2020, its growth rate (the number of daily new confirmed infections) has not decreased rapidly. Russia has an epidemic curve similar to that of the USA. In summary, our model predicts the inflection point and maximal outbreak size in Brazil, India and the African region; these regions might eventually overtake the USA in terms of outbreak size.
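How a mixed-effects model lets data-poor regions borrow from data-rich ones can be illustrated with the simplest mixed model, a linear random-intercept model. There, the predicted random effect shrinks each group's mean toward the overall mean by a factor that depends on how many observations the group has. This is a swapped-in linear analogue for intuition only, not the non-linear FPLM fit used in the paper; the variance components are treated as known, the overall mean is replaced by a simple pooled mean, and all numbers and the function name `shrunken_group_means` are hypothetical.

```python
def shrunken_group_means(groups, sigma2, tau2):
    """Partial-pooling estimates of group means in a linear random-intercept model.

    Model: y_ij = mu + b_i + e_ij, with b_i ~ N(0, tau2) and e_ij ~ N(0, sigma2).
    The variances are treated as known and mu is replaced by the pooled mean of
    all observations (a simplification); the predicted b_i is
    tau2 / (tau2 + sigma2 / n_i) * (group mean - mu).
    """
    all_obs = [y for ys in groups.values() for y in ys]
    mu = sum(all_obs) / len(all_obs)
    estimates = {}
    for name, ys in groups.items():
        n = len(ys)
        group_mean = sum(ys) / n
        weight = tau2 / (tau2 + sigma2 / n)  # close to 1 for data-rich groups
        estimates[name] = mu + weight * (group_mean - mu)
    return estimates


# Hypothetical data: two data-rich groups and one group with only two observations.
groups = {"A": [5.0] * 20, "B": [7.0] * 20, "C": [9.0, 9.0]}
print(shrunken_group_means(groups, sigma2=1.0, tau2=1.0))
```

Group C keeps only two thirds of its deviation from the overall mean, while groups A and B keep 20/21 of theirs. In the non-linear model the same mechanism acts on the FPLM parameters, which is how early-phase regions inherit curve shapes from the more mature epidemics.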
This work only fits growth curves to the reported number of confirmed infections; incorporating localized intervention policies and behavioural parameters would improve fitting and prediction performance. Furthermore, in this work we test three transformations of the cumulative number of cases (none, square root and logarithm); however, they are likely not the optimal choice for all countries and regions. Other power transformations are worth testing, and the possibility of applying different transformations to different countries/regions should be explored within the same mixed-effects modelling framework.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
Artificial intelligence and machine learning to fight COVID-19
Optimization method for forecasting confirmed cases of COVID-19 in China
Application of the ARIMA model on the COVID-2019 epidemic dataset
Estimation of COVID-19 prevalence in Italy, Spain, and France. Science of the Total Environment
Reconstructing and forecasting the COVID-19 epidemic in the United States using a 5-parameter logistic growth model
Nonlinear regression in COVID-19 forecasting
Analysis and forecast of COVID-19 spreading in China, Italy and France
Forecasting the cumulative number of COVID-19 deaths in China: a Boltzmann function-based modeling study
Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: early experience and forecast during an emergency response
The inflection point about COVID-19 may have passed
Partial derivative nonlinear global pandemic machine learning prediction of COVID-19
Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases (2020)
COVID-19 in Brazil: so what?
Projected early spread of COVID-19 in Africa through 1
Epidemic analysis of COVID-19 in China by dynamical modeling
Forecasting COVID-19
Forecasting the novel coronavirus COVID-19
Mixed-effects models in S and S-PLUS
Short-term forecasting COVID-19 cumulative confirmed cases: perspectives for Brazil
Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th
Temperature and latitude analysis to predict potential spread and seasonality for COVID-19. Available at SSRN 3550308
Nonlinear regression
Estimating equations for parameters in stochastic growth models from tag-recapture data
Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world
Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal. BMJ
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Predicting turning point, duration and attack rate of COVID-19 outbreaks in major western countries
Predicting COVID-19 in China using hybrid AI model