key: cord-0747540-dmn16gcl
authors: Achterberg, Massimo A.; Prasse, Bastian; Ma, Long; Trajanovski, Stojan; Kitsak, Maksim; Van Mieghem, Piet
title: Comparing the accuracy of several network-based COVID-19 prediction algorithms
date: 2020-10-09
journal: Int J Forecast
DOI: 10.1016/j.ijforecast.2020.10.001
sha: 5fef572d8bdf18cc0b6bf9a7c905abe47b0f2e1f
doc_id: 747540
cord_uid: dmn16gcl

Researchers from various scientific disciplines have attempted to forecast the spread of the Coronavirus Disease 2019 (COVID-19). The proposed epidemic prediction methods range from basic curve fitting methods and traffic interaction models to machine-learning approaches. If we combine all these approaches, we obtain the Network Inference-based Prediction Algorithm (NIPA). In this paper, we analyse a diverse set of COVID-19 forecast algorithms, including several modifications of NIPA. Among the diverse set of algorithms that we evaluated, the original NIPA performs best on forecasting the spread of COVID-19 in Hubei, China and in the Netherlands. In particular, we show that network-based forecasting is superior to any other forecasting algorithm.

[...] Removed (SIR) epidemic model [2, 8]. By fitting the number of infected cases to a sigmoid curve, we implicitly assume that the spread in a particular region is independent of other regions, which contrasts with the strong interconnectedness of our modern world. Network-based techniques take into account the interaction between different regions, which is due to the movement of people.

The interaction can be described by a network G with N nodes. Each node i in the network G represents a particular region (country, province, municipality or city), and the link $a_{ij} \in \{0, 1\}$ represents the existence of an interaction from region j to region i, specified by a link weight $\beta_{ij}$ denoting the infection probability from region j to region i.
The self-infection probability within a region i is given by $\beta_{ii}$, which we expect to be dominant over the other infection probabilities, because the interaction within a region is stronger than the interaction with other regions. The N × N infection probability matrix B, with elements $\beta_{ij}$, is, however, unknown and must be derived from past observations of the epidemic. We will address this issue in more detail in Section 2.

Throughout this work, we often use "the number of infected cases", which [...] the number of asymptomatic cases cannot be determined on a daily basis, we confine ourselves to the number of reported cases in this work.

Many scientific disciplines investigate and forecast the spread of COVID-19. Statistical approaches are commonly based on Kalman filtering [10] or consider Bayesian approaches [11]. Network-based approaches consider aeroplane networks, daily commute traffic or cell phone traffic [12]. Data scientists apply machine learning algorithms, like adaptive neuro-fuzzy inference system [13] or Long Short-Term Memory (LSTM) [14]. Mathematicians perform parameter estimation on compartmental models like the Susceptible-Infected-Removed model (SIR) [14, 15] or the Susceptible-Exposed-Infected-Removed (SEIR) model [16].

Most epidemic models forecast the number of infected cases as a point forecast (generally: the mean of a distribution) rather than a complete distribution. All models in this work have been designed to provide point forecasts, but can be generalised to provide prediction intervals. We discuss this topic further in Section 2.
The focus of this work is the comparison of a diverse set of methods to forecast the spread of COVID-19, ranging from fitting closed-form epidemic curves and comprehensive machine-learning algorithms to network-based approaches. We focus on the spread of COVID [...] first COVID-19 case was reported on day k = 1. Given that nearly all governments report their epidemic data once a day, we take a time step of 1 day as a natural choice and investigate the effect of the time step on the prediction accuracy in [...] and the fraction of susceptible individuals follows as [...] Here, $\beta_{ij}[k] \geq 0$ denotes the infection probability from region j to region i at time k, and $\delta_i > 0$ denotes the curing probability of region i.

The spread of COVID-19 cannot be described exactly by the SIR equations (1). The COVID-19 pandemic evolves in continuous time, whereas the SIR model (1) evolves in discrete time, with a time step of 1 day. Additionally, the SIR model (1) is unable to describe phenomena like personal social distancing, nation-wide lockdowns and the availability of vaccinations. Each of these model assumptions introduces model errors.
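To make the roles of the infection probabilities $\beta_{ij}$ and the curing probabilities $\delta_i$ concrete, the following minimal sketch advances a discrete-time SIR model on a network of regions by one day. The update rule is the commonly used networked SIR form and is an assumption here, since equations (1) are not reproduced in this text. The function name `sir_step` and all numerical values are hypothetical.

```python
from typing import List, Sequence, Tuple


def sir_step(
    infected: Sequence[float],
    removed: Sequence[float],
    beta: Sequence[Sequence[float]],
    delta: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Advance a networked discrete-time SIR model by one time step (1 day).

    Assumed update rule (standard form, not copied from the paper):
      S_i[k]   = 1 - I_i[k] - R_i[k]
      I_i[k+1] = (1 - delta_i) * I_i[k] + S_i[k] * sum_j beta_ij * I_j[k]
      R_i[k+1] = R_i[k] + delta_i * I_i[k]
    """
    n = len(infected)
    new_infected: List[float] = []
    new_removed: List[float] = []
    for i in range(n):
        susceptible = 1.0 - infected[i] - removed[i]
        # Infection pressure on region i from all regions j (including i itself).
        pressure = sum(beta[i][j] * infected[j] for j in range(n))
        new_infected.append((1.0 - delta[i]) * infected[i] + susceptible * pressure)
        new_removed.append(removed[i] + delta[i] * infected[i])
    return new_infected, new_removed


# Hypothetical two-region example in which self-infection dominates.
beta = [[0.30, 0.02],
        [0.02, 0.25]]
delta = [0.10, 0.10]
infected, removed = [0.01, 0.0], [0.0, 0.0]
for _ in range(30):
    infected, removed = sir_step(infected, removed, beta, delta)
```

In this sketch the outbreak starts in region 1 only and reaches region 2 solely through the off-diagonal entry of the matrix, which mirrors the interaction between regions described above.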
Prior to the introduction of several forecasting algorithms, we explain how model errors can be used to obtain prediction intervals for the forecasted number of infected cases.

2.1. Potential generalisation to prediction intervals

Before introducing the different prediction methods, we emphasise that this work focusses on short-term point forecasts. The long-term epidemic behaviour is very random, and providing forecast intervals is essential to give a complete picture of the long-term viral spread [20]. Extending the point forecast methods in this work to prediction intervals is outside the scope of this work. Nonetheless, we consider it valuable to conceptually discuss an extension of the SIR equations (1) to allow for the computation of prediction intervals. No real epidemic follows the SIR model (1) exactly.
Instead, the infection state $I_i[k]$ evolves from time k to k + 1 as

[...] (4)

where $w_i[k]$ denotes the model error of region i at time k, see also Appendix A. The equations (4) can be used as a basis for prediction intervals with a Monte Carlo approach. Define the N × 1 error vector as $w[k] = (w_1[k], \ldots, w_N[k])^T$ for all times k. Then, based on equation (4), the past observations I[1], ..., I[n] and the errors w[1], ..., w[n − 1], point forecast algorithms provide an estimate of the viral state I[k] at future times k > n.

Conceptually, a prediction interval for the future viral state $I_i[k]$ can be obtained in two steps. First, obtain random samples from the distribution of the model errors w[1], ..., w[n − 1]. Second, for every sample of the errors w[1], ..., w[n − 1], obtain a point forecast of the future viral states I[k].
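The two steps can be illustrated for a single region with the sketch below. It is a strong simplification and not the method of this work: the errors are assumed to be independent Gaussian variables with a known standard deviation, they are added directly to the observations instead of entering equation (4), and `toy_point_forecast` is a hypothetical stand-in for any point forecast algorithm. The interval is read from the empirical quantiles of the resulting ensemble.

```python
import random
from typing import List, Sequence, Tuple


def toy_point_forecast(history: Sequence[float], horizon: int) -> float:
    """Hypothetical point forecaster: extrapolate the mean one-day growth ratio."""
    ratios = [b / a for a, b in zip(history[:-1], history[1:]) if a > 0]
    growth = sum(ratios) / len(ratios)
    return history[-1] * growth ** horizon


def prediction_interval(
    history: Sequence[float],
    horizon: int,
    error_std: float,
    n_samples: int = 2000,
    coverage: float = 0.9,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo interval from an ensemble of point forecasts."""
    rng = random.Random(seed)
    ensemble: List[float] = []
    for _ in range(n_samples):
        # Step 1: draw one sample of the errors w[1], ..., w[n - 1]
        # (assumed Gaussian and independent; this is an assumption).
        errors = [rng.gauss(0.0, error_std) for _ in range(len(history) - 1)]
        perturbed = [history[0]] + [
            max(value + w, 1e-12) for value, w in zip(history[1:], errors)
        ]
        # Step 2: compute a point forecast for this error sample.
        ensemble.append(toy_point_forecast(perturbed, horizon))
    ensemble.sort()
    low_index = int((1.0 - coverage) / 2.0 * (n_samples - 1))
    high_index = int((1.0 + coverage) / 2.0 * (n_samples - 1))
    return ensemble[low_index], ensemble[high_index]


# Hypothetical infected fractions on five consecutive days.
history = [0.010, 0.012, 0.015, 0.018, 0.022]
point = toy_point_forecast(history, horizon=3)
low, high = prediction_interval(history, horizon=3, error_std=0.0005)
```

The fixed seed only serves reproducibility of the sketch. The number of samples controls the trade-off between the accuracy of the quantiles and the computational cost.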
The prediction intervals for the future viral state I[k] can be obtained from the ensemble of point forecasts.

The details of the outlined method for obtaining prediction intervals are beyond the scope of this paper. Two particular challenges are the determination of the distribution of the model errors w[k] and the implementation of a computationally efficient sampling method.

The logistic function is a well-known example of an epidemiological sigmoid curve [3, 7]. We assume the cumulative number of infected cases $y_i[k]$ in region i at time k to follow a logistic function:

$$y_i[k] = \frac{y_{\infty,i}}{1 + e^{-K_i (k - t_{0,i})}}$$

where $y_{\infty,i}$ is the long-term fraction of infections, $K_i$ is the logistic growth rate and $t_{0,i}$ is the inflection point, also known as the epidemic peak. The parameters $y_{\infty,i}$, $K_i$ and $t_{0,i}$ are estimated for each region separately using a nonlinear curve fitting procedure, which is explained in Appendix F. Other sigmoid curves, like the Hill function and Gompertz function, are also discussed in Appendix F.

[...] which is commonly used in forecasting [32].
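As an illustration of fitting a logistic curve to the cumulative cases of one region, the sketch below assumes the standard logistic form $y[k] = y_\infty / (1 + e^{-K (k - t_0)})$ and estimates the three parameters by least squares. It is not the procedure of Appendix F: the growth rate K and the inflection point $t_0$ are searched over a hypothetical grid, and for each pair the value of $y_\infty$ follows in closed form because the curve is linear in $y_\infty$. All names and values are illustrative.

```python
import math
from typing import Sequence, Tuple


def logistic(k: float, y_inf: float, growth: float, t0: float) -> float:
    """Logistic curve with plateau y_inf, growth rate and inflection point t0."""
    return y_inf / (1.0 + math.exp(-growth * (k - t0)))


def fit_logistic(
    observed: Sequence[float],
    growth_grid: Sequence[float],
    t0_grid: Sequence[float],
) -> Tuple[float, float, float]:
    """Least-squares fit: grid search over (growth, t0), closed form for y_inf."""
    times = range(1, len(observed) + 1)  # the first observation is at k = 1
    best = None
    for growth in growth_grid:
        for t0 in t0_grid:
            shape = [logistic(k, 1.0, growth, t0) for k in times]
            # For fixed (growth, t0) the model is y_inf * shape, so the optimal
            # y_inf is the ordinary least-squares coefficient.
            y_inf = sum(s * y for s, y in zip(shape, observed)) / sum(
                s * s for s in shape
            )
            sse = sum((y_inf * s - y) ** 2 for s, y in zip(shape, observed))
            if best is None or sse < best[0]:
                best = (sse, y_inf, growth, t0)
    return best[1], best[2], best[3]


# Synthetic, noise-free data for 30 days (hypothetical parameter values).
observed = [logistic(k, 0.02, 0.3, 20.0) for k in range(1, 31)]
growth_grid = [0.05 * m for m in range(1, 13)]
t0_grid = list(range(10, 31))
y_inf_hat, growth_hat, t0_hat = fit_logistic(observed, growth_grid, t0_grid)

# Forecast the next six days from the fitted curve.
forecast = [logistic(k, y_inf_hat, growth_hat, t0_hat) for k in range(31, 37)]
```

Each region is fitted separately in this sketch, so no interaction between regions is taken into account. A gradient-based nonlinear least-squares routine could replace the grid search when a finer resolution is required.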
Furthermore, we quantify the Percentage Error (PE) for every region i and time k to investigate over- and underestimations. We [...] representatives of the spread of COVID-19, let alone general infectious diseases. Rather, these regions illustrate the strengths and weaknesses of our methods.

As a second case study, we consider the spread of COVID-19 in the Netherlands [...]

We have compared the prediction accuracy of eight algorithms to forecast the spread of COVID-19. We summarise the results in Table 2. The error in Table 2 is obtained by averaging over all sMAPE forecast errors for forecast horizons between one and six days. Fitting a sigmoid curve, like the logistic function, [...] we mention that the sMAPE error penalises over- and underestimates of the true number of infected cases equally. In a practical setting, such as the current COVID-19 pandemic, policymakers might prefer to anticipate a worst-case scenario. In that case, an asymmetric error metric that penalises underestimations more significantly than overestimations may be more suitable.
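The error metrics mentioned above can be sketched as follows. The exact definitions used in this work are not reproduced in this text, so the formulas below are assumptions: the sMAPE follows a common convention (absolute error divided by the mean of the absolute true and forecasted values, in percent), and the PE is taken as positive for overestimates. The function `asymmetric_error` and its parameter `under_weight` are hypothetical and merely illustrate one way to penalise underestimates more heavily.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error in percent (assumed definition)."""
    terms = []
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2.0
        terms.append(0.0 if denominator == 0 else abs(f - a) / denominator)
    return 100.0 * sum(terms) / len(terms)


def percentage_error(actual: float, forecast: float) -> float:
    """Signed percentage error; positive values indicate an overestimate."""
    return 100.0 * (forecast - actual) / actual


def asymmetric_error(
    actual: Sequence[float], forecast: Sequence[float], under_weight: float = 2.0
) -> float:
    """Hypothetical metric: underestimates count under_weight times as much."""
    total = 0.0
    for a, f in zip(actual, forecast):
        pe = percentage_error(a, f)
        total += under_weight * abs(pe) if pe < 0 else abs(pe)
    return total / len(actual)


# Hypothetical case counts for two regions: one over- and one underestimate.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
overall = smape(actual, forecast)
signed = [percentage_error(a, f) for a, f in zip(actual, forecast)]
weighted = asymmetric_error(actual, forecast, under_weight=2.0)
```

The signed PE values show the direction of each error, which the sMAPE alone cannot reveal.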
Priedhorsky Data Streams in Epidemic Forecast
Anonymous, details omitted for double-blind reviewing
A contribution to the mathematical theory of epidemics
Covid-19: four fifths of cases are asymptomatic, China figures indicate
Short-term forecasts and long-term mitigation evaluations for the COVID-19 epidemic in
Epidemic Model to Quantify the Effects of Contact Tracing, Testing, and Containment
Mobility network modeling explains higher SARS-CoV-2 infection rates among disadvantaged groups and informs reopening strategies
Optimization Method for Forecasting Confirmed Cases of COVID-19 in China
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Meso-scale modeling of COVID-19 spatio-temporal outbreak dynamics in Germany, medRxiv (2020)
SEIR modeling of the COVID-19 and its dynamics
An individual-based approach to SIR epidemics in contact networks
Anonymous, details omitted for double-blind reviewing
Anonymous, details omitted for double-blind reviewing
Tail risk of contagious diseases
Finding Structure in Time
Deep Learning
ing Based Natural Language Processing
Long Short-Term Memory
LSTM recurrent networks learn simple context-free and context-sensitive languages
Learning to Forget: Continual Prediction with LSTM
Proc. of ICML (32nd International Conference on Machine Learning)
A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures
Adam: A Method for Stochastic Optimization
Anonymous, details omitted for double-blind reviewing
Anonymous, details omitted for double-blind reviewing
Another look at measures of forecast accuracy
Baidu Migration website
Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China
The M4 Competition: 100,000 time series and 61 forecasting methods
RIVM, Actuele informatie over het nieuwe coronavirus (COVID-19), retrieved on
83628NED/table? 83628NED/table?ts=1583844319444
Google COVID-19 Community Mobility Reports
Analysis, Estimation, and Validation of Discrete-Time Epidemic Processes
Convex Optimization
Regression Shrinkage and Selection via the Lasso
Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission
Modeling household and community transmission of Ebola virus disease: Epidemic growth, spatial dynamics and insights for epidemic control
The Gompertz Curve as a Growth Curve