key: cord-0782087-qejmwvst
authors: Yang, Wuyue; Zhang, Dongyan; Peng, Liangrong; Zhuge, Changjing; Hong, Liu
title: Rational evaluation of various epidemic models based on the COVID-19 data of China
date: 2020-03-16
journal: nan
DOI: 10.1101/2020.03.12.20034595
sha: 5f91066bc15728c9bdb5441363b2ff4671b674ba
doc_id: 782087
cord_uid: qejmwvst

During the study of epidemics, one of the most significant and also challenging problems is to forecast the future trends, on which all follow-up actions of individuals and governments heavily rely. However, to pick out a reliable predictable model/method is far from simple, a rational evaluation of various possible choices is eagerly needed, especially under the severe threat of COVID-19 epidemics which is spreading worldwide right now. In this paper, based on the public COVID-19 data of seven provinces/cities in China reported during the spring of 2020, we make a systematical investigation on the forecast ability of eight widely used empirical functions, four statistical inference methods and five dynamical models widely used in the literature. We highlight the significance of a well balance between model complexity and accuracy, over-fitting and under-fitting, as well as model robustness and sensitivity. We further introduce the Akaike information criterion, root mean square errors and robustness index to quantify these three golden means and to evaluate various epidemic models/methods. Through extensive simulations, we find that the inflection point plays a crucial role in the choice of the size of dataset in forecasting. Before the inflection point, no model considered here could make a reliable prediction. We further notice the Logistic function steadily underestimate the final epidemic size, while the Gomertz's function makes an overestimation in all cases. Since the methods of sequential Bayesian and time-dependent reproduction number take the non-constant nature of the effective reproduction number with the progression of epidemics into consideration, we suggest to employ them especially in the late stage of an epidemic. The transition-like behavior of exponential growth method from underestimation to overestimation with respect to the inflection point might be useful for constructing a more reliable forecast. Towards the dynamical models based on ODEs, it is observed that the SEIR-QD and SEIR-PO models generally show a better performance than SIR, SEIR and SEIR-AHQ models on the COVID-19 epidemics, whose success could be attributed to the inclusion of self-protection and quarantine, and a proper trade-off between model complexity and fitting accuracy.

During the study of epidemics, one of the most significant and also challenging problems is to forecast the future trends, on which all follow-up actions of individuals and governments heavily rely. However, to pick out a reliable predictable model/method is far from simple, a rational evaluation of various possible choices is eagerly needed, especially under the severe threat of COVID-19 epidemics which is spreading worldwide right now.

In this paper, based on the public COVID-19 data of seven provinces/cities in China reported during the spring of 2020, we make a systematical investigation on the forecast ability of eight widely used empirical functions, four statistical inference methods and five dynamical models widely used in the literature. We highlight the significance of a well balance between model complexity and accuracy, over-fitting and under-fitting, as well as model robustness and sensitivity. We further introduce the Akaike information criterion, root mean square errors and robustness index to quantify these three golden means and to evaluate various epidemic models/methods.

Through extensive simulations, we find that the inflection point plays a crucial role in the choice of the size of dataset in forecasting. Before the inflection point, no model considered here could make a reliable prediction. We further notice the Logistic function steadily underestimate the final epidemic size, while the Gomertz's function makes an overestimation in all cases. Since the methods of sequential Bayesian and time-dependent reproduction number take the non-constant nature of the effective reproduction number with the progression of epidemics into consideration, we suggest to employ them especially in the late stage of an epidemic. The transition-like behavior of exponential growth method from underestimation to overestimation with respect to the inflection point might be useful for constructing a more reliable forecast.

Towards the dynamical models based on ODEs, it is observed that the SEIR-QD and

During the study of epidemics, one of the most significant and challenging problem is to forecast the future trends, like how many individuals might be infected each day, when the epidemics stop spreading, what kinds of policies and actions have to taken and how they influence the epidemics, and so forth [1] [2] [3] [4] Since the accurate epidemic forecast is so critical, there are diverse methods reported in the literature to try to achieve this goal [8] [9] [10] [11] [12] . Among them, empirical functions, methods based on statistical inference and dynamical models (difference equations, ODEs and PDEs) are three major routines (see Fig. 1 ). Empirical functions, especially those with explicit forms, are most popular ones. They are simple, easily understandable, fast implemented and analyzable. So many merits lay down the unreplaceable role of empirical functions in this field. The statistical inference methods are also highly welcomed, especially in the presence of a large amount of first-hand data. The basic goal of most statistical methods in epidemics is to estimate the basic/effective reproduction number, which serves as a key quantity to evaluate the future trends of an infectious disease. In dynamical models, the basic/effective reproduction number is transformed into reaction coefficients. Based on compartment assumptions on populations involved in epidemics, such as susceptible, exposed, infected, quarantined, hospitalized, recovered and death, classical SI, SIR, SEIR and other generalized models show a great ability to correctly reproduce the basic features of the spreading process of an infectious disease, to reveal the hidden dynamics, like the numbers of exposed cases and asymptomatic carriers which are hard to be learnt from usual epidemiology investigation, to forecast the future trends of epidemics, as well to evaluate the influence of diverse control policies and actions against the spreading of infectious diseases in quantity.

However, in the face of so many possible choices, which one would be our best choice, especially for the purpose of a reliable estimation on the future epidemic trend? In this paper, based on the COVID-19 data of Shanghai and other six provinces/cities in China in the spring of 2020, we explore this critical issue systematically by comparing the performance 3 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03. 12.20034595 doi: medRxiv preprint of eight widely used empirical functions, four statistical inference methods and five dynamical models reported in the literature. The basic guiding principles and evaluation criteria are summarized in Section II. Detailed analyses, comparisons and evaluations among various epidemic models are carried out in Section III. In Section IV, we forecast the epidemic trends in South Korea, Italy and Iran in the near future. The last section contains a conclusion and some brief discussions.

It is far from a trivial problem to evaluate the forecast ability of various functions/models/method in a rational way 10, 12, 13 . Many aspects should be considered at the same time. Here, as a very general guiding principle, we propose three golden means which can be measured through explicitly calculable quantities, i.e.

• Complexity v.s. Accuracy. We seek for a well-balance between the model complexity and fitting accuracy. Neither too complicated models with too many free parameters and unverified mechanisms nor over simplified models without sufficient capability to mimic the real complicated situations and all available data is welcomed.

This issue is also closely related to the over-fitting and under-fitting problem met in numerics.

• Fitting v.s. Prediction. It is a very one-sided pursuit of the lowest fitting errors (measured by MSE, RMSE, correlation coefficients, etc.) for a predictive model, though it is always the case in most published works! In fact, there are tremendous evidences to show that the best fitting does not always lead to the best forecast (see the Boltzmann factor).

• Robustness v.s. Sensitivity. On one hand, we hope our model be sensitive to parameter changes in order to model the influence of different situations and strategies, 4 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03. 12.20034595 doi: medRxiv preprint to pick out the best fitting parameters, etc. On the other hand, the model is expected to be robust (insensitive in other words) to against perturbations coming from various sources, such as numerical errors, data noise, incomplete knowledge about epidemic mechanisms, etc. Obviously, these are two opposite pursuits just like two sides of the same coin and can not be optimized simultaneously. So that we need to relax our requirement on model robustness and sensitivity, and focus on the reproducibility of key dynamical features (like the inflection point, half time, etc.), the asymptotic stability (basic/effective reproduction number R 0 ), the bifurcation point, etc.

Above three golden means reflect the struggle and compromise between complexity and simplicity, over-fitting and under-fitting, short-term and long-term goals, robustness and sensitivity, as well as energetic and entropic, deterministic and statistical, local and global views.

These issues are met here and there, now and then. The overemphasis of one aspect by omitting the other either deliberately or unintentionally would lead to unsatisfactory predictions.

As a perfect reflection of the central spirit of Confucianism -Zhongyong, our three golden means provide a practical solution to above difficulties both qualitatively and quantitatively, and can be used for making a rational evaluation of different fucntions/methods/models for epidemic forecast and other related scientific problems.

To quantify these three golden means, here we propose three easily calculable factors for model evaluation.

(1) The Akaike information criterion (AIC) and its various modified versions, like AICc, AICu, QAIC, BIC, etc. AIC was introduced by Japanese statistician Akaike in the early 1970s 14 . It is based on the concept of entropy, and incorporate the model complexity and its goodness of fit together in accordance with our first golden mean.

in which k is the total number of free parameters in a model, while L is the likelihood function. Models with less free parameters and high fitting accuracy will have lower AIC values. In this work, we use AICc = −2 ln(L)/N + (N + k)/(N − k − 2) proposed by

McQuarrie and Tsai 15 to remove the size dependence on data set N .

(2) The robustness index (RB). There are plenty of ways to quantity model robust-5 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (3) The root mean square error (RMSE). It is defined as

in which N denotes the data size, x i and y i are the true values and predicted ones separately.

The RMSE is widely used to quantify the accuracy of regression models and needs no more explanation.

In this part, we apply our three golden means to evaluate most widely used models/methods in the field of epidemics. Far from complete, we collect eight empirical function with explicit formulas -linear, quadratic, cubic, exponential, Hill's, Logistic, Gompertz's and Richards' functions 16 , four statistical inference methods -exponential growth, maximum likelihood, sequential Bayesian and time-dependent reproduction number 17, 18 , as well as five ODE models -SIR, SEIR, SEIR-QD 19 , SEIR-AHQ 20 and SEIR-PO (see Fig. 1 ). The unknown parameters involved in different models are estimated respectively by fitting the models to the epidemic data through the standard non-linear least squares approach.

To make a quantitative comparison, here we focus on the outbreaks of COVID-19 caused by the novel coronavirus -SARS-CoV-2, which currently spreads severely worldwide. We collect the data of daily reported confirmed infected cases C(t) from the China CDC (http://www.chinacdc.cn/). The forty-day data from Jan. 20th, 2020 to Feb. 28th, 2020 of Shanghai, which is located in eastern China and is considered as one of the most successful 6 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Our main findings are summarized as follows:

(1) The best model for epidemic forecast can be picked out based on AIC and RB. The best model (judged by RMSE of test data) can be easily picked out from each group based on the values of AICc and RB (depending only on the model and training data!). The lower the AICc value is, the better trade-off between model complexity and 7 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. fitting accuracy is achieved. Meanwhile, a medium RB value (0.5 − 0.9 in general) could take the requirements on model robustness and sensitivity into consideration at the same time. Among 27 groups (9 cases times 3 groups of models) under comparison, the AICc value helps to find out 18 best models (by RMSE), and the rest 9 best models all have the second lowest AICc values (see Table. I and Table. II for details).

(2) Sigmoid functions are more suitable for epidemic forecast. Linear, quadratic, cubic and exponential functions are not suitable for describing epidemic data in general (see SI), while Hill's, Logistic, Gompertz's and Richards' functions can well capture the typical S-shaped curve for the cumulative infected cases.

(3) At the early stage of an epidemic, no model is reliable. In the case of very limited data during the early stage of an epidemic, there is no way to tell which model is superior than the other. They may either overestimate or underestimate the real situation of a epidemic in an unpredictable way. Since the model with fewer parameters are more robust, we suggest to adopt either exponential function or even linear function, though their valid region is quite narrow. A more delicate way is to combine the knowledge of the reproduction number by statistics methods and the forecast ability of exponential or logistic functions. However, it should be noted that during the early stage the variance of 8 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03.12.20034595 doi: medRxiv preprint the derived basic/effective reproduction number is generally very large (see SI), making a long-term reliable forecast almost impossible.

(4)The inflection is crucial for forecast. The inflection point plays in role in forecast. It was suggested by Zhao et al. 16 in Zika research that in the period when enough data are collected, typically when the epidemic passes the inflection point, predictions on the final epidemic size by the sigmoid empirical functions, such as Logistic, Gompertz's and Richards' functions, will converge to the true values. Here we basically reproduce their results. As shown in the last row of Fig. 3 , the RMSE of predictions on the COVID-19 epidemic data of Shanghai grows in an exponential way with respect to the size of training data, meanwhile the RMSE of fitting keeps nearly unchanged. Interestingly, before Jan. 31st which is also the inflection point of Shanghai, all functions fail to make a reliable fitting (though some functions fail even earlier) and their RB values rapidly drop to zero. (5) The observation that Logistic function underestimates the epidemic size 9 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. while Gompertz's function overestimates it requires further validation. In all nine cases (including three cases for Shanghai), we notice the Logistic functions always underestimate the total number of infected cases, while the Gomertz's function makes an overestimate (see Fig. 4 ). If this is not a coincidence, it would be extremely useful for estimating the lower and upper bounds for the real total infected populations, though it still 10 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03.12.20034595 doi: medRxiv preprint requires further more evidences. The predictions of the other two functions are not so stable and their goodness-of-fit varies from case to case, which reflects the lacking of theoretical support of empirical functions during application.

(6) Methods fo sequential Bayesian and time-dependent reproduction number are more accurate at the late stage of an epidemic. For statistical methods, since sequential Bayesian and time-dependent reproduction number methods take the nonconstant nature of the effective reproduction number with the progression of epidemics into consideration, they appear to be more accurate than the exponential growth and maximum likelihood methods in the late stage. In addition, the sequential Bayesian method seems to be less robust than the time-dependent reproduction number method. The latter inherits the merit of Logistic function by slightly underestimate the true epidemic size. It is further observed that the estimated basic reproduction number R 0 by the exponential growth method exhibits a transition from overestimation to underestimation with respect to the inflection point, which is in accordance with the S-shaped curve for the total infected populations. As a consequence, in the early stage of Shanghai COVID-19 epidemic, the exponential growth method after combining with the Logistic function makes an underestimation on the final epidemic size, and a contrary overestimation based on accumulated data in the late stage.

Finally, we find the maximum likelihood method overestimates the epidemic size to a large extent in all seven cases, indicating this method may not be suitable for studying COVID-19 epidemics.

(7) The SEIR-QD and SEIR-PO models are suitable for modeling COVID-10 epidemics. The dynamical models generally requires more reliable data to achieve a reliable forecast than empirical functions, since the former usually involves more free parameters and more complicated mathematical structure than the latter. Based on their performance, the dynamical models can be classified into three groups. The classical SIR model and SEIR model seem to be inadequate to describe the outbreak of COVID-19, especially the final equilibration phase. Contrarily, the SEIR-AHQ model involves too many free parameters as reflected through the large AICc value. As a consequence, its robustness is also the poorest among all five models. The SEIR-QD and SEIR-PO models are two suitable ones for modeling COVID-19 by appropriately incorporating the effects of quarantine and self-protection.

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 12 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 

Based on our previous evaluations on different epidemic models and methods, we try to forecast the epidemic trends in several hot countries, which is currently facing severe threat of the outbreak of COVID-19 epidemics. However, after a very careful examination, we find that except China most countries with more than 100 confirmed COVID-19 cases are still within the early stage of epidemics (the inflection point has not be reached yet). It is really terrible! As a consequence, we refer to the statistical inference method, the time-dependent reproduction number to be exact, to try to derive the basic/effective reproduction number.

Then we adopt the Gompertz's function (usually overestimate the final epidemic size as we claimed above), Logistic function, as well as Logistic function combined with R 0 derived from the exponential growth model (usually underestimate the final epidemic size before the inflection point and make an overestimation after that) to try to provide more reasonable upper and lower bounds for the total confirmed infected case at least in the near future.

Here we look into three examples -South Korea, Italy and Iran, which have the second to fourth largest confirmed infected populations by COVID-19 till Mar. 10th, 2020. Based on the effective reproduction number R t derived from the method of time-dependent reproduction number, it is observed that R t for South Korea becomes lower than 1, meaning the COVID-19 epidemic is under control. The number of total confirmed cases is estimated to be 8, 000 − 10, 800. In contrast, R t for Italy and Iran are both larger than 1, suggesting a rapid spreading of COVID-19 in these countries. A conservative estimate of final confirmed cases is around 18, 000 − 23, 000 for Italy and 9, 500 − 13, 500 for Iran. Similar severe epidemic situations are observed in Germany, France and Spain (data not shown), meaning these European countries (and also some others) are in great danger too.

In this paper, based on the COVID-19 data of seven provinces/cities in China during the spring of 2020, we make a systematical investigation on the forecast ability of eight widely used empirical functions, four statistical inference methods and five dynamical models reported in the literature. We highlight the significance of a well balance between model complexity and accuracy, over-fitting and under-fitting, as well as model robustness and 13 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. sensitivity. We further introduce the Akaike information criterion, root mean square errors and robustness index to quantify these three golden means and to evaluate various epidemic models/methods. Through extensive simulations and detailed comparisons, we find that the inflection point plays a crucial role for making reliable forecasts, in agreement with previous reports 16 . The RMSE of model prediction decays exponentially with respect to the size of training data set, while the model robustness characterized through the variance of final epidemic size also approaches to unity rapidly after the inflection point. Furthermore, the forecast abilities of several epidemic models are also closely related to the inflection point. For example, the estimated basic reproduction number R 0 by the exponential growth method exhibits a transition from overestimation to underestimation with the increasing in the size of training data set, and the inflection point acts as the demarcation.

We notice the Logistic functions always underestimate the total number of infected cases, 14 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03. 12.20034595 doi: medRxiv preprint while the Gomertz's function makes an overestimate in all cases we studies. Hill's and Richards' functions do not have such a consistency. Since the sequential Bayesian and time-dependent reproduction number methods take the non-constant nature of the effective reproduction number with the progression of epidemics into consideration, we think they are more accurate than the exponential growth and maximum likelihood methods especially in the late stage of an epidemic. The transition of exponential growth method from underestimation to overestimation with respect to the inflection point could be quite useful for constructing a more reliable forecast. Towards the dynamic models based on ODEs, it is observed that the SEIR-QD and SEIR-PO models generally show a better performance than the other three, highlighting the essential of a trade-off between model complexity and fitting accuracy. The success of the former two models could also be attributed to the inclusion of self-protection and quarantine during the progression of COVID-19 epidemics.

This work has some limitations. The analyses are highly relied on the quality of the epidemic data. The initial conditions and parameter regions may also affect the results significantly, though our qualitative conclusions will not be changed. Besides five simple ODE models, many other epidemic models inclduing PDEs, stochastic equations and timedelayed equations are not considered. In addition, further validations than seven cases are needed to explore the applicable regions of all the methods/models.

Important Concepts in Mathematical Modeling of Infectious Diseases

No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted

Applying infectious disease forecasting to public health: a path forward using influenza forecasting examples

Complexity in mathematical models of public health policies: A guide for consumers of models

Mathematical modeling of the transmission dynamics of clostridium difficile infection and colonization in healthcare settings: A systematic review

Mathematical modelling of sars and other infectious diseases in china: a review

A review of mathematical models of influenza a infections within a host or cell culture: lessons learned and challenges ahead

Models to predict h1n1 outbreaks: a literature review

Mathematical models to characterize early epidemic growth: A review

Modelling the global spread of diseases: A review of current practice and capability

Assessing the performance of real-time epidemic forecasts: A case study of ebola in the western area region of sierra leone

Model selection and parameter estimation for dynamic epidemic models via iterated filtering: application to rotavirus in Germany

No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted

Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models

A framework for evaluating epidemic forecasts

A new look at the statistical model identification

Simple framework for real-time forecast in a data-limited situation: the zika virus (zikv) outbreaks in brazil from 2015 to 2016 as an example

The r0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks

Estimation of the epidemic properties of the 2019 novel coronavirus: A mathematical modeling study. medRxiv

Epidemic analysis of covid-19 in china by dynamical modeling

Estimation of the transmission risk of the 2019-ncov and its implication for public health interventions

The authors acknowledged the financial supports from the National Natural Science Foun- 

All rights reserved. No reuse allowed without permission.(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 18 All rights reserved. No reuse allowed without permission.(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.The copyright holder for this preprint this version posted March 16, 2020. . https://doi.org/10.1101/2020.03.12.20034595 doi: medRxiv preprint