key: cord-0931958-107t3ffw authors: Hasan, A.; Putri, E.; Susanto, H.; Nuraini, N. title: Data-driven modeling and forecasting of COVID-19 outbreak for public policy making date: 2020-08-02 journal: nan DOI: 10.1101/2020.07.30.20165555 sha: d7f197974ba5e020b65bfd9412da81ea65d7279c doc_id: 931958 cord_uid: 107t3ffw
This paper presents a data-driven approach for COVID-19 outbreak modeling and forecasting, which can be used by public policy and decision makers to control the outbreak through Non-Pharmaceutical Interventions (NPI). First, we apply an extended Kalman filter (EKF) to a discrete-time stochastic augmented compartmental model to estimate the time-varying effective reproduction number R_t. We use daily confirmed cases, active cases, recovered cases, deceased cases, Case-Fatality-Rate (CFR), and infectious time as inputs for the model. Furthermore, we define a Transmission Index (TI) as the ratio between the instantaneous and the maximum value of the effective reproduction number. The value of TI shows the disease transmission in a contact between a susceptible and an infectious individual under current measures, such as physical distancing and lock-down, relative to a normal condition. Based on the value of TI, we forecast different scenarios to see the effect of relaxing and tightening public measures. Case studies in three countries are provided to show the practicability of our approach.
The new coronavirus disease 2019 (COVID-19), originating from Wuhan, China, has spread worldwide and caused a severe outbreak. The virus had infected more than 17.5 million people, with more than 680 thousand confirmed deaths, by the end of July 2020 [1]. The outbreak triggered a crisis that goes beyond health and is expanding into an economic and social crisis. The effect of the crisis on society has become a multidimensional problem that needs to be minimized through measurable public policies [2].
Intervention measures such as physical distancing and lock-down are introduced to contain the outbreak and to prevent it from continuing to grow and spread [3, 4]. To this end, a thorough evaluation of the available options is urgently needed. Both quantitative and qualitative evaluations involving key characteristics of the COVID-19 outbreak can be conducted based on epidemiological parameters [5]. As the incidence grows, quantitative evaluations to identify a minimum physical distancing policy have aimed to control the outbreak transmission in Australia [6], China [7], and Italy [8]. A measure of the disease's transmission, known as the time-varying effective reproduction number R_t, reflects the extent of disease transmission in the presence of interventions. Therefore, estimation of the time-varying [...]
1.2. Literature review
A deterministic SIR model-based method for R_t estimation, which assumes that the data used represent the actual outbreak well, was presented in [11]. Different data sets representing levels of quarantine measures are used in [7] to describe the growth of cases and the effective reproduction number R_t for each level of measures. To accommodate uncertainties in incidence data, noise is added to the model in [9]. The authors used daily new cases, active cases, recovered cases, and deceased cases as model inputs to estimate the spread of the disease, and extended the evaluation to R_t estimation based on the stochastic model.
Based on estimates of R_t, several authors have proposed methods to forecast the evolution of the outbreak. Data-based analysis, modelling, and forecasting based on a Susceptible-Infectious-Recovered-Deceased (SIRD) model was presented in [12]. The authors fit the reported data with the SIRD model to estimate the epidemiological parameters.
The main drawback when fitting the model with the data is that the estimated parameters can be unrealistic. In [13], the authors attempted to use phenomenological models that have been validated during previous outbreaks. The models are used to generate and assess short-term forecasts of the cumulative number of confirmed reported cases. However, since COVID-19 is a new disease, the models were not reliable and the forecasts can only be used for the very short term. Other authors use analysis of [...]
In this paper, we propose a data-driven approach for COVID-19 modeling and forecasting, which can be used by public policy and decision makers to control the outbreak through Non-Pharmaceutical Interventions (NPI). Considering the drawbacks of existing methods, we present two contributions: (i) estimation of the time-varying effective reproduction number R_t based on real-time data fitting using an extended Kalman filter (EKF), and (ii) short- to medium-term forecasting based on different public policies. As the effective reproduction number R_t simply shows the extent of transmission under population immunity or intervention in the form of public policy [9], we propose a new measure called the Transmission Index (TI) (see Section 2.5), which describes the disease transmission relative to a normal condition. Like R_t, the value of TI can be used to measure the effectiveness of public health measures. Furthermore, TI is used to forecast different public policy scenarios by relaxing or tightening the current measures.
Briefly, this paper is organized as follows. In Section 2, we discuss the methods used in evaluating the spread of the disease, including data availability and reliability, the data-driven framework, modelling, estimation, and forecasting. Then, a discussion of the method and its application to estimating the Transmission Index in the United Arab Emirates (UAE), Australia, and Denmark is presented in Section 3. Lastly, the conclusion is given in Section 4.
medRxiv preprint, this version posted August 2, 2020. https://doi.org/10.1101/2020.07.30.20165555. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
In this section, we describe the data-driven modeling and forecasting approach that can be used by public policy and decision makers to control the COVID-19 pandemic through NPI. We acknowledge that no country knows the total number of people infected with COVID-19, partially due to lack of testing and undetected asymptomatic cases. Thus, the presented approach can only be used for countries, regions, or areas that have performed mass testing with laboratory confirmation. To this end, we assume the difference between the actual and the reported cases is minimized when mass testing has been conducted.
The COVID-19 pandemic generates a large amount of data. Typically, government officials report daily confirmed cases, active cases, recovered cases, and deceased cases (see Table 1). These data are available for almost all countries and regions and can be accessed by the public through online websites. Some websites, such as https://www.worldometers.info and https://ourworldindata.org/, also provide information regarding the number of tests per capita. The data can be utilized to obtain important epidemiological parameters such as the time-varying effective reproduction number R_t, which can be used by decision and public policy makers.
Unfortunately, not all countries or regions are able to provide mass testing for their citizens. The WHO advised governments that the positivity rate (i.e., out of all tests conducted, how many came back positive for COVID-19) should remain at 5% or lower for at least 14 days [1]. Rich countries such as the UAE, Australia, and Denmark have successfully achieved this target.
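As a rough illustration of the positivity criterion quoted above, the following Python sketch checks whether a series of daily counts stays at or below 5% for the most recent 14 days. The function name, the argument names, and the day-by-day reading of "remain at 5% or lower" are assumptions made for this example; they are not taken from the paper or from WHO guidance.

```python
def meets_positivity_target(daily_positive, daily_tests, threshold=0.05, window=14):
    """Return True if every one of the last `window` days has a positivity
    rate (positive results / tests conducted) at or below `threshold`.

    Assumption: the criterion is applied day by day, not to a pooled average.
    """
    if len(daily_positive) != len(daily_tests) or len(daily_tests) < window:
        return False  # not enough data to evaluate the criterion
    recent = zip(daily_positive[-window:], daily_tests[-window:])
    # Days without any tests are treated as not meeting the target.
    return all(tests > 0 and positive / tests <= threshold
               for positive, tests in recent)
```

A region reporting 40 positives out of 1000 tests on each of the last 14 days would satisfy this check, while a single day at 8% within the window would not.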
The reported cases from the pandemic are used for two purposes: (i) to estimate the time-varying effective reproduction number R_t, and (ii) to project the numbers of active, recovered, deceased, and total cases, which are important for preparing healthcare systems.
To model the transmission of the coronavirus, we use a discrete-time stochastic augmented compartmental model presented in [9]. The model consists of six [...]
The equations (1)-(4) can be obtained from a standard Susceptible-Infectious-Recovered-Deceased (SIRD) model. We augment two equations: equation (5) takes into account the number of daily confirmed cases C, while equation (6) says that the effective reproduction number R_t is assumed to be a piecewise constant function with a jump every one-day time interval. The noise terms w_1(k), w_2(k), w_3(k), w_4(k), w_5(k), w_6(k) are used to model the uncertainty.
The discrete-time stochastic augmented compartmental model (1)-(6) has three constant parameters: the population size N, the recovery rate γ, and the death rate κ. The recovery and death rates depend on the infectious time T_i and the Case-Fatality-Rate (CFR), and are given by [...]
The infectious time is obtained from clinical data. For COVID-19, the infectious period lasts on average 9 days with a standard deviation of 3 days [15]. The CFR is unknown and needs to be estimated.
However, to simplify the calculation, in this paper we assume the CFR is equal to the latest number of deceased cases divided by the latest total number of infected cases. To account for under-reported cases, this estimate can be divided by a correction factor c, with both counts taken at k_f, the index of the latest data. In our examples in Section 3, we assume the number of under-reported cases is 3 times the number of reported cases, so that the total is 4 times the reported count. Thus, we take c = 4.
The time-varying effective reproduction number R_t is estimated by applying the EKF to the discrete-time stochastic augmented compartmental model (1)-(6). The details regarding the implementation can be found in [9]. The algorithm has two tuning parameters: the covariance of the process noise Q and the covariance of the observation noise R, which can be chosen such that the Root Mean Square Error (RMSE) between the reported and estimated data is minimized. The EKF serves as a real-time data-fitting tool: it updates its estimates whenever new data are recorded. Once this estimation process works, the EKF also produces an estimate of R_t from (6).
Figure 2 shows results from applying the EKF to (1)-(6) using data from the United Arab Emirates (UAE). It can be observed that the EKF is able to estimate the active, recovered, deceased, and daily confirmed cases accurately. This data-fitting process produces an estimate of R_t based on the correlation in the model (1)-(6).
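The estimation step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (for which see [9]). In particular, the state ordering [S, I, R, D, C, R_t], the explicit form of the update in `step`, and the rate formulas γ = (1 - CFR)/T_i and κ = CFR/T_i are assumed here, because the model equations did not survive in this copy of the text. All function names are hypothetical.

```python
import numpy as np

# Observation matrix: active (I), recovered (R), deceased (D), daily confirmed (C).
H = np.zeros((4, 6))
H[[0, 1, 2, 3], [1, 2, 3, 4]] = 1.0


def rates(cfr, infectious_days):
    """Assumed split of the removal rate 1/T_i into recovery and death rates."""
    return (1.0 - cfr) / infectious_days, cfr / infectious_days


def step(x, N, gamma, kappa, dt=1.0):
    """One noise-free step of the assumed augmented SIRD model."""
    S, I, R, D, C, Rt = x
    new_inf = (gamma + kappa) * dt * Rt * I * S / N
    return np.array([
        S - new_inf,
        I + new_inf - (gamma + kappa) * dt * I,
        R + gamma * dt * I,
        D + kappa * dt * I,
        new_inf,  # daily confirmed cases
        Rt,       # R_t held constant over one day
    ])


def jacobian(x, N, gamma, kappa, dt=1.0):
    """Jacobian of `step` with respect to the state, evaluated at x."""
    S, I, _, _, _, Rt = x
    a = (gamma + kappa) * dt
    dS, dI, dRt = a * Rt * I / N, a * Rt * S / N, a * I * S / N
    F = np.eye(6)
    F[0, [0, 1, 5]] = [1.0 - dS, -dI, -dRt]
    F[1, [0, 1, 5]] = [dS, 1.0 - a + dI, dRt]
    F[2, 1] = gamma * dt
    F[3, 1] = kappa * dt
    F[4, :] = 0.0  # C(k+1) does not depend on C(k)
    F[4, [0, 1, 5]] = [dS, dI, dRt]
    return F


def ekf(ys, x0, P0, Q, R_obs, N, gamma, kappa):
    """Run the EKF over observations ys (rows: I, R, D, C); return state estimates."""
    x, P = np.array(x0, dtype=float), np.array(P0, dtype=float)
    estimates = []
    for y in ys:
        # Prediction: propagate the state and its covariance.
        F = jacobian(x, N, gamma, kappa)
        x = step(x, N, gamma, kappa)
        P = F @ P @ F.T + Q
        # Correction: blend the prediction with the newly reported data.
        S_cov = H @ P @ H.T + R_obs
        K = P @ H.T @ np.linalg.inv(S_cov)
        x = x + K @ (y - H @ x)
        P = (np.eye(6) - K @ H) @ P
        P = 0.5 * (P + P.T)  # keep the covariance symmetric
        estimates.append(x)
    return np.array(estimates)
```

The last column of the returned array holds the R_t estimate for each day. In practice, `Q` and `R_obs` would be tuned, for instance by a simple search, so that the RMSE between `H @ x` and the reported data is small, and the CFR passed to `rates` would be the corrected estimate (deceased over total infected at k_f, divided by c).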
Forecasting is done for different scenarios. To this end, we define a Transmission Index (TI) as the ratio between the instantaneous and the maximum value of the effective reproduction number.
In this section, we run simulations for three countries: the UAE, Australia, and Denmark.
The UAE has conducted more than 4.9 million tests since the outbreak began, or 502.14 total tests per thousand population [16]. This makes the UAE one of the countries with the highest number of tests. The study of [17] estimates that the percentage of symptomatic COVID-19 cases reported in the UAE, using case fatality ratio estimates, is 98% (95% credible interval: 86%-100%).
The first confirmed cases were reported on 29 January, from an infected family of four who came to the country on holiday from Wuhan [18, 19]. As the number of positive cases steadily increased, the government took immediate public measures, such as the closure of schools and universities across the country [...] March including the whole month of Ramadan [21], as well as night curfews for disinfection on 26 March for an extended period of time, which limited movement within the country [22]. The extreme measures, together with the government's wider National Screening Programme, which seeks to test as many people as possible with the aim of identifying, isolating, and treating patients as quickly as possible, yielded positive results in the decrease of the reproduction number almost immediately afterwards.
With the current TI at 31%, our forecast shows that the daily cases will steadily decrease.
The first confirmed case of COVID-19 in Australia was found in January 2020, when a traveller returned to Victoria from Wuhan, China. The number of cases passed 1000 in March 2020 and doubled after three days. The growth in incidence during March and April is considered the first wave of the pandemic, with R_t = 3 (see Figure 6). The effects of the pandemic on the health sector started to reach other sectors such as trade, travel, economy, and finance, and intensive interventions to prevent the pandemic from growing have been carried out [6]. The Australian Government closed the borders to all non-residents and non-citizens on 20 March 2020 and applied a 14-day self-isolation requirement for all arrivals. Quarantine/lock-down related policies such as physical distancing or self-isolation have been applied in the form of school and workplace closures, mass gathering cancellations, contact tracing, etc. In addition, all non-essential services were stopped to maximize physical distancing. The policy was applied for the following three months [24].
As the quarantine-related policy was lifted at the beginning of June, after three months of a slow rate of infection (R_t < 1), there has been a rise in the number of positive cases. This has led the Australian Government to apply a softer version of the policy, in order both to prevent higher case numbers in the second wave and to reduce the long-term impact on sectors other than health. Australia's TI as of 27 July was 62%, and its short-term projection shows that the active cases will increase sharply if there are no further interventions. The number of recovered and deceased cases will increase in the next 90 days. The short-term projection is shown in Figure 7.
The effect of the lock-down can be observed after three weeks, when R_t < 1. A very slow and gradual reopening was initiated on 15 April, by opening [...]
Denmark's TI as of 27 July was 20%. A short-term projection shows that the number of active cases will remain steady under the current measures. The number of recovered individuals will increase, while the number of deaths will decrease significantly.
Figure 9: Forecasting for the next 90 days in Denmark (legend: TI = 50%, 40%, 30%, 20%, 10%).
We present a data-driven approach for modeling and forecasting of the COVID-19 outbreak for public policy making. The method relies on the quality of the data.
Thus, the estimated R_t and the forecast results need to be interpreted carefully when the number of tests is not sufficient. By estimating the Transmission Index (TI), our approach can be used for short- to medium-term forecasts based on the policy to be taken by the decision makers. The forecast can be used to predict the course of COVID-19, including the probability of an upcoming second wave. Simulation results using data from three countries show that our approach gives reasonable forecasts.
[1] World Health Organization, WHO Coronavirus Disease (COVID-19) Dashboard
[2] Everyone included: Social impact of COVID-19
[3] Scientific and ethical basis for social-distancing interventions against COVID-19
[4] Modeling the control of COVID-19: Impact of policy interventions and meteorological factors
[5] A quantitative and qualitative analysis of the COVID-19 pandemic model
[6] Modelling transmission and control of the COVID-19 pandemic in Australia
[7] Transmission dynamics and control strategies of COVID-19 in Wuhan
[8] Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: Early experience and forecast during an emergency response
[9] A new estimation method for COVID-19 time-varying reproduction number using active cases
[10] Developing WHO guidelines: time to formally include evidence from mathematical modelling studies
[11] How many can you infect? Simple (and naive) methods of estimating the reproduction number
[12] Data-based analysis, modelling and forecasting of the COVID-19 outbreak
[13] Real-time forecasts of the COVID-19 epidemic in China from
[14] Analysis and forecast of COVID-19 spreading in China, Italy and France
[15] Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study
[16] Data, Coronavirus (COVID-19) testing
[17] Reconstructing the global dynamics of under-ascertained COVID-19 cases and infections
[18] Coronavirus in UAE: Four of a family infected, Gulf News
[19] Coronavirus: UAE records first case, The National
[20] Why UAE school closures are an important opportunity to fight coronavirus
[21] Coronavirus: Prayers at mosques and all other places of worship in UAE suspended
[22] UAE imposes night curfew as it carries out disinfection campaign
[23] UAE coronavirus (COVID-19) updates
[24] Australian quarantine policy: From centralization to coordination with mid-pandemic COVID-19 shifts