key: cord-0747540-dmn16gcl
authors: Achterberg, Massimo A.; Prasse, Bastian; Ma, Long; Trajanovski, Stojan; Kitsak, Maksim; Van Mieghem, Piet
title: Comparing the accuracy of several network-based COVID-19 prediction algorithms
date: 2020-10-09
journal: Int J Forecast
DOI: 10.1016/j.ijforecast.2020.10.001
sha: 5fef572d8bdf18cc0b6bf9a7c905abe47b0f2e1f
doc_id: 747540
cord_uid: dmn16gcl

Researchers from various scientific disciplines have attempted to forecast the spread of the Coronavirus Disease 2019 (COVID-19). The proposed epidemic prediction methods range from basic curve fitting methods and traffic interaction models to machine-learning approaches. If we combine all these approaches, we obtain the Network Inference-based Prediction Algorithm (NIPA). In this paper, we analyse a diverse set of COVID-19 forecast algorithms, including several modifications of NIPA. Among the diverse set of algorithms that we evaluated, original NIPA performs best on forecasting the spread of COVID-19 in Hubei, China and in the Netherlands. In particular, we show that network-based forecasting is superior to the other forecasting algorithms we evaluated.

… Removed (SIR) epidemic model [2, 8]. By fitting the number of infected cases to a sigmoid curve, we implicitly assume that the spread in a particular region is independent of other regions, which contrasts with the strong interconnectedness of our modern world. Network-based techniques take into account the interaction between different regions, which is due to the movement of people.

The interaction can be described by a network $G$ with $N$ nodes. Each node $i$ in the network $G$ represents a particular region (country, province, municipality or city), and the link $a_{ij} \in \{0, 1\}$ represents the existence of an interaction from region $j$ to region $i$, specified by a link weight $\beta_{ij}$ denoting the infection probability from region $j$ to region $i$. The self-infection probability within a region $i$ is given by $\beta_{ii}$, which we expect to dominate the other infection probabilities, because the interaction within a region is stronger than the interaction with other regions. The $N \times N$ infection probability matrix $B$, with elements $\beta_{ij}$, is, however, unknown and must be derived from past observations of the epidemic. We address this issue in more detail in Section 2.

Throughout this work, we often use "the number of infected cases", which … the number of asymptomatic cases cannot be determined on a daily basis, we confine ourselves to the number of reported cases in this work.

Many scientific disciplines investigate and forecast the spread of COVID-19. Statistical approaches are commonly based on Kalman filtering [10] or consider Bayesian approaches [11]. Network-based approaches consider aeroplane networks, daily commute traffic or cell phone traffic [12]. Data scientists apply machine learning algorithms, like the adaptive neuro-fuzzy inference system [13] or Long Short-Term Memory (LSTM) [14]. Mathematicians perform parameter estimation on compartmental models like the Susceptible-Infected-Removed (SIR) model [14, 15] or the Susceptible-Exposed-Infected-Removed (SEIR) model [16].

Most epidemic models forecast the number of infected cases as a point forecast (generally, the mean of a distribution) rather than as a complete distribution. All models in this work have been designed to provide point forecasts, but can be generalised to provide prediction intervals. We discuss this topic further in Section 2.

The focus of this work is the comparison of a diverse set of methods to forecast the spread of COVID-19, ranging from fitting closed-form epidemic curves and comprehensive machine-learning algorithms to network-based approaches. We focus on the spread of COVID … first COVID-19 case was reported on day $k = 1$. Given that nearly all governments report their epidemic data once a day, we take a time step of 1 day as a natural choice and investigate the effect of the time step on the prediction accuracy in …

and the fraction of susceptible individuals follows as

$$S_i[k] = 1 - I_i[k] - R_i[k].$$

Here, $\beta_{ij}[k] \ge 0$ denotes the infection probability from region $j$ to region $i$ at time $k$, and $\delta_i > 0$ denotes the curing probability of region $i$.
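The SIR equations (1) are not reproduced in this excerpt. As an illustration only, the sketch below assumes the standard discrete-time network SIR form $I_i[k+1] = (1-\delta_i) I_i[k] + S_i[k] \sum_{j} \beta_{ij} I_j[k]$ and $R_i[k+1] = R_i[k] + \delta_i I_i[k]$, with a time-invariant matrix $B$ for simplicity. The function name `sir_step` and the toy parameter values are hypothetical.

```python
from typing import List, Tuple


def sir_step(I: List[float], R: List[float], B: List[List[float]],
             delta: List[float]) -> Tuple[List[float], List[float]]:
    """Advance the assumed discrete-time network SIR model by one day.

    I[i], R[i]: infected and removed fractions of region i at time k.
    B[i][j]:    infection probability from region j to region i (assumed constant).
    delta[i]:   curing probability of region i.
    """
    n = len(I)
    I_next, R_next = [], []
    for i in range(n):
        S_i = 1.0 - I[i] - R[i]                            # susceptible fraction
        pressure = sum(B[i][j] * I[j] for j in range(n))   # infections from all regions j
        I_next.append((1.0 - delta[i]) * I[i] + S_i * pressure)
        R_next.append(R[i] + delta[i] * I[i])              # cured individuals are removed
    return I_next, R_next


# Toy example: two regions with dominant self-infection (beta_ii > beta_ij).
B = [[0.30, 0.02],
     [0.01, 0.25]]
delta = [0.10, 0.10]
I, R = [0.01, 0.0], [0.0, 0.0]
for _ in range(3):
    I, R = sir_step(I, R, B, delta)
```

Region 2 starts without infections and only becomes infected through the off-diagonal term $\beta_{21}$, which is the mechanism that distinguishes network-based models from fitting each region in isolation.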

The spread of COVID-19 cannot be described exactly by the SIR equations (1). The COVID-19 pandemic evolves in continuous time, whereas the SIR model (1) evolves in discrete time, with a time step of 1 day. Additionally, the SIR model (1) is unable to describe phenomena like personal social distancing, nationwide lockdowns and the availability of vaccinations. Each of these model assumptions introduces model errors. Before introducing several forecasting algorithms, we explain how model errors can be used to obtain prediction intervals for the forecasted number of infected cases.

2.1. Potential generalisation to prediction intervals

Before introducing the different prediction methods, we emphasise that this work focuses on short-term point forecasts. The long-term epidemic behaviour is highly random, and providing forecast intervals is essential to give a complete picture of the long-term viral spread [20]. Extending the point forecast methods in this work to prediction intervals is beyond the scope of this paper. Nonetheless, we consider it valuable to discuss conceptually how the SIR equations (1) can be extended to allow for the computation of prediction intervals. No real epidemic follows the SIR model (1) exactly. Instead, the infection state $I_i[k]$ evolves from time $k$ to $k + 1$ as

$$I_i[k+1] = (1 - \delta_i)\, I_i[k] + S_i[k] \sum_{j=1}^{N} \beta_{ij}[k]\, I_j[k] + w_i[k], \qquad (4)$$

where $w_i[k]$ denotes the model error of region $i$ at time $k$; see also Appendix A.
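Rearranging the error equation shows how past model errors could be read off from data once parameter estimates are available: assuming the additive form $I_i[k+1] = (1-\delta_i) I_i[k] + S_i[k]\sum_j \beta_{ij}[k] I_j[k] + w_i[k]$, each $w_i[k]$ is a one-step residual. The sketch below computes these residuals for a single region ($N = 1$) with a time-invariant $\beta$; the helper names `model_errors` and `simulate` and the toy numbers are hypothetical, and in practice $\beta$ and $\delta$ would come from the chosen fitting procedure.

```python
from typing import List, Tuple


def model_errors(I: List[float], S: List[float], beta: float, delta: float) -> List[float]:
    """One-step residuals w[k] of the additive error model for one region (N = 1).

    I[k], S[k]: infected and susceptible fractions, indexed from 0 here
    (the text indexes observations from 1).
    beta, delta: estimated, time-invariant infection and curing probabilities.
    Returns the n - 1 residuals between consecutive observations.
    """
    return [I[k + 1] - (1.0 - delta) * I[k] - S[k] * beta * I[k]
            for k in range(len(I) - 1)]


def simulate(I0: float, beta: float, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Noise-free trajectory (all w = 0) of the assumed single-region SIR model."""
    I, R = [I0], [0.0]
    for _ in range(n - 1):
        i_k, r_k = I[-1], R[-1]
        s_k = 1.0 - i_k - r_k
        I.append((1.0 - delta) * i_k + s_k * beta * i_k)
        R.append(r_k + delta * i_k)
    S = [1.0 - i - r for i, r in zip(I, R)]
    return I, S


# Perturb one observation: the residual just before it picks up the perturbation.
I_obs, S_obs = simulate(0.01, beta=0.3, delta=0.1, n=30)
I_obs[10] += 1e-4
w = model_errors(I_obs, S_obs, beta=0.3, delta=0.1)
```

On noise-free data every residual vanishes up to rounding, so any nonzero $w[k]$ measures how far the observations deviate from the SIR model.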

:::

The equations (4) can be used as a basis for prediction intervals with a Monte Carlo approach. Define the $N \times 1$ error vector as $w[k] = (w_1[k], \ldots, w_N[k])^T$ for all times $k$.

Then, based on equation (4), the past observations $I[1], \ldots, I[n]$ and the errors $w[1], \ldots, w[n-1]$, point forecast algorithms provide an estimate of the viral state $I[k]$ at future times $k > n$.

Conceptually, a prediction interval for the future viral state $I_i[k]$ can be obtained in two steps. First, obtain random samples from the distribution of the model errors $w[1], \ldots, w[n-1]$. Second, for every sample of the errors $w[1], \ldots, w[n-1]$, obtain a point forecast of the future viral states $I[k]$. The prediction intervals for the future viral state $I[k]$ then follow from the ensemble of point forecasts. The details of this method for obtaining prediction intervals are beyond the scope of this paper. Two particular challenges are the determination of the distribution of the model errors $w[k]$ and the implementation of a computationally efficient sampling method.
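The two-step procedure can be instantiated in many ways. One simple variant, chosen here purely for illustration and not the method of this paper, treats the past residuals as draws from the error distribution, resamples them with replacement (a residual bootstrap), adds them to the forward iteration of an assumed single-region SIR model with fixed $\beta$ and $\delta$, and reads the interval off the empirical quantiles of the ensemble. Unlike the procedure in the text, it samples errors for the future steps and skips re-estimating the model for each sample. The function `mc_interval`, its defaults and the clipping step are hypothetical.

```python
import random
from typing import List, Tuple


def mc_interval(I_n: float, R_n: float, beta: float, delta: float,
                residuals: List[float], horizon: int, n_samples: int = 1000,
                level: float = 0.9, seed: int = 0) -> Tuple[float, float]:
    """Residual-bootstrap prediction interval for I[n + horizon] of one region.

    I_n, R_n: last observed infected and removed fractions (time n).
    residuals: past model errors w[1], ..., w[n-1], e.g. one-step residuals.
    """
    rng = random.Random(seed)               # seeded for reproducibility
    finals = []
    for _ in range(n_samples):
        I, R = I_n, R_n
        for _ in range(horizon):
            S = 1.0 - I - R
            w = rng.choice(residuals)       # resample one past model error
            I, R = (1.0 - delta) * I + S * beta * I + w, R + delta * I
            I = min(max(I, 0.0), 1.0 - R)   # keep I a feasible fraction
        finals.append(I)
    finals.sort()
    # Empirical quantiles of the ensemble give the interval bounds.
    lower = finals[int((1.0 - level) / 2.0 * (n_samples - 1))]
    upper = finals[int((1.0 + level) / 2.0 * (n_samples - 1))]
    return lower, upper


# Hypothetical usage: a 90% interval six days ahead.
low, high = mc_interval(0.01, 0.0, beta=0.3, delta=0.1,
                        residuals=[-1e-3, 0.0, 1e-3], horizon=6)
```

With a single zero residual the interval collapses onto the deterministic point forecast, which is a quick sanity check of the sampling loop.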

The logistic function is a well-known example of an epidemiological sigmoid curve [3, 7]. We assume the cumulative number of infected cases $y_i[k]$ in region $i$ at time $k$ to follow a logistic function:

$$y_i[k] = \frac{y_{\infty,i}}{1 + \exp\left(-K_i \,(k - t_{0,i})\right)},$$

where $y_{\infty,i}$ is the long-term fraction of infections, $K_i$ is the logistic growth rate and $t_{0,i}$ is the inflection point, which corresponds to the peak of new daily infections (the epidemic peak). The parameters $y_{\infty,i}$, $K_i$ and $t_{0,i}$ are estimated for each region separately using a nonlinear curve-fitting procedure, which is explained in Appendix F. Other sigmoid curves, like the Hill function and the Gompertz function, are also discussed in Appendix F.
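A minimal sketch of such a fit, assuming an ordinary least-squares objective: once $K_i$ and $t_{0,i}$ are fixed, the logistic curve is linear in $y_{\infty,i}$, so a coarse grid search over $(K_i, t_{0,i})$ with a closed-form $y_{\infty,i}$ already gives a usable fit. This is a simple stand-in chosen for illustration, not the procedure of Appendix F; the function names and grid ranges are hypothetical.

```python
import math
from typing import List, Tuple


def logistic(k: float, y_inf: float, K: float, t0: float) -> float:
    """Logistic curve y[k] = y_inf / (1 + exp(-K (k - t0)))."""
    return y_inf / (1.0 + math.exp(-K * (k - t0)))


def fit_logistic(y: List[float], K_grid: List[float],
                 t0_grid: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit by grid search over (K, t0); y[0] is observed at k = 1.

    For fixed K and t0 the model is y_inf * g[k] with g[k] = logistic(k, 1, K, t0),
    so the optimal y_inf is sum(y * g) / sum(g * g).
    """
    best = None
    for K in K_grid:
        for t0 in t0_grid:
            g = [logistic(k, 1.0, K, t0) for k in range(1, len(y) + 1)]
            y_inf = sum(a * b for a, b in zip(y, g)) / sum(b * b for b in g)
            sse = sum((a - y_inf * b) ** 2 for a, b in zip(y, g))
            if best is None or sse < best[0]:
                best = (sse, y_inf, K, t0)
    return best[1], best[2], best[3]
```

In practice, a gradient-based nonlinear least-squares routine would refine the best grid point; the grid search only shows why the problem reduces to two nonlinear parameters.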

which is commonly used in forecasting [32]. Furthermore, we quantify the Percentage Error (PE)

for every region $i$ and time $k$ to investigate over- and underestimations. We … representatives of the spread of COVID-19, let alone general infectious diseases.

Rather, these regions illustrate the strengths and weaknesses of our methods.

As a second case study, we consider the spread of COVID-19 in the Netherlands …

We have compared the prediction accuracy of eight algorithms to forecast the spread of COVID-19. We summarise the results in Table 2. The error in Table 2 is obtained by averaging the sMAPE forecast errors over forecast horizons between one and six days. Fitting a sigmoid curve, like the logistic function, … we mention that the sMAPE error penalises over- and underestimates of the true number of infected cases equally. In a practical setting, such as the current COVID-19 pandemic, policymakers might prefer to anticipate a worst-case scenario. In that case, an asymmetric error metric that penalises underestimations more heavily than overestimations may be more suitable.
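The formulas for sMAPE and PE are not reproduced in this excerpt. The sketch below assumes the common M4-style definition $\text{sMAPE} = \frac{100}{h}\sum 2|\hat{y} - y| / (|y| + |\hat{y}|)$ and $\text{PE} = 100\,(\hat{y} - y)/y$, so that positive PE indicates overestimation. Both the sign convention and `weighted_ape`, one possible asymmetric metric of the kind mentioned above, are assumptions rather than the definitions used in this paper.

```python
from typing import List


def smape(y_true: List[float], y_pred: List[float]) -> float:
    """sMAPE in percent (assumed M4-style definition).

    Symmetric in the sense that swapping y_true and y_pred leaves it unchanged.
    A term with y = y_hat = 0 is taken as 0 to avoid 0/0.
    """
    terms = []
    for a, f in zip(y_true, y_pred):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0.0 else 2.0 * abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


def percentage_error(a: float, f: float) -> float:
    """PE in percent for true value a != 0; assumed: positive = overestimation."""
    return 100.0 * (f - a) / a


def weighted_ape(a: float, f: float, under_weight: float = 2.0) -> float:
    """Hypothetical asymmetric metric: absolute percentage error in which
    underestimates (f < a) are multiplied by under_weight > 1."""
    ape = 100.0 * abs(f - a) / abs(a)
    return under_weight * ape if f < a else ape


# Same absolute miss of 10 cases, once too high and once too low:
print(percentage_error(100.0, 110.0), percentage_error(100.0, 90.0))  # 10.0 -10.0
print(weighted_ape(100.0, 110.0), weighted_ape(100.0, 90.0))          # 10.0 20.0
```

The weight `under_weight` encodes how much worse a policymaker considers an underestimate than an overestimate of the same size; choosing it is a policy decision, not a statistical one.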

Priedhorsky

Data Streams in Epidemic Forecast

Anonymous, details omitted for double-blind reviewing

A contribution to the mathematical theory of epidemics

Covid-19: four fifths of cases are asymptomatic, China figures indicate

Short-term forecasts and long-term mitigation evaluations for the COVID-19 epidemic in

Epidemic Model to Quantify the Effects of Contact Tracing, Testing, and Containment

Mobility network modeling explains higher SARS-CoV-2 infection rates among disadvantaged groups and informs reopening strategies

Optimization Method for Forecasting Confirmed Cases of COVID-19 in China

Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions

Meso-scale modeling of COVID-19 spatio-temporal outbreak dynamics in Germany, medRxiv (2020)

SEIR modeling of the COVID-19 and its dynamics

An individual-based approach to SIR epidemics in contact networks

Anonymous, details omitted for double-blind reviewing

Anonymous, details omitted for double-blind reviewing

Tail risk of contagious diseases

Finding Structure in Time

Deep Learning

ing Based Natural Language Processing

Long Short-Term Memory

LSTM recurrent networks learn simple context-free and context-sensitive languages

Learning to Forget: Continual Prediction with LSTM

Proc. of ICML (32nd International Conference on Machine Learning)

A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures

Adam: A Method for Stochastic Optimization

Anonymous, details omitted for double-blind reviewing

Anonymous, details omitted for double-blind reviewing

Another look at measures of forecast accuracy

Baidu Migration website

Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China

The M4 Competition: 100,000 time series and 61 forecasting methods

RIVM, Actuele informatie over het nieuwe coronavirus (COVID-19) [Current information on the new coronavirus (COVID-19)], retrieved on

83628NED/table?

83628NED/table?ts=1583844319444

Google COVID-19 Community Mobility Reports

Analysis, Estimation, and Validation of Discrete-Time Epidemic Processes

Convex Optimization

Regression Shrinkage and Selection via the Lasso

Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission

Modeling household and community transmission of Ebola virus disease: Epidemic growth, spatial dynamics and insights for epidemic control

The Gompertz Curve as a Growth Curve