key: cord-0569671-hmnxpnpg authors: Guglielmi, Nicola; Iacomini, Elisa; Viguerie, Alex title: Identification of Time Delays in COVID-19 Data date: 2021-11-26 journal: nan DOI: nan sha: f00b2feb0a9116a8904e11b75064f74152636af5 doc_id: 569671 cord_uid: hmnxpnpg COVID-19 data released by public health authorities features the presence of notable time-delays, corresponding to the difference between actual time of infection and identification of infection. These delays have several causes, including the natural incubation period of the virus, availability and speed of testing facilities, population demographics, and testing center capacity, among others. Such delays have important ramifications for both the mathematical modeling of COVID-19 contagion and the design and evaluation of intervention strategies. In the present work, we introduce a novel optimization technique for the identification of time delays in COVID-19 data, making use of a delay-differential equation model. The proposed method is general in nature and may be applied not only to COVID-19, but for generic dynamical systems in which time delays may be present. The outbreak of COVID-19 in 2020 and into 2021 has led to a surge in interest in the mathematical modeling of the COVID-19 epidemic, as well as the mathematical modeling of epidemics generally. These models have taken many types. A full review of the relevant literature is beyond the current work. However, some general classes of models include data-driven and machine learning approaches [1, 2, 3, 4, 5, 6] , models based on partial differential equation (PDE) systems [7, 8, 9, 10, 11, 12, 13, 14] , agent-based models [15] , and models based on ordinary differential equation (ODE) systems. This last category is by far the most common such model, with such articles numbering in the thousands. Some works of this type include e.g. [16, 17, 18, 19, 20, 21, 22, 23, 24] . We note that the categories listed above are not necessarily strict delimiters; indeed, many of the above cited works, as well myriad others, have characteristics of multiple of different model classes. One commonality that nearly all the presented models share is that they are, in some way, compartmental models in the SIR family owing to the seminal work of Kermack and Mackendrick [25] (see also: [26, 27] ). One of the most important aspects of COVID-19 modeling deals with the presence of time delays in the data. Time delays certainly manifest themselves via the natural incubation period of the disease; this is reflected in the employed models commonly incorporating an exposed compartment, to model the period after exposure to the virus but before symptomatic infection [1, 2, 11, 16] . Such an approach is a reasonable one, though not without difficulties. In particular, the exposed compartment is not necessarily easily known from data, requiring its estimation. This may be done, for example, through rule-of-thumb estimates based on the infected population, which result in high uncertainty [13, 12, 9] . Compounding this problem further are the well-known presence of asymptomatic patients, who may spread the disease while never showing appreciable symptoms, and, at least in the earlier stages of the pandemic, were almost certainly under-counted [9, 11] . In lieu of an exposed compartment, in [28, 5] , the authors employed a delay-differential equation (DDE) model, in which the incubation period was modeled with a time-delay term, rather than a separate compartment. Aside from providing the advantage of eliminating the potentially troublesome exposed compartment, such a modeling approach eliminates the assumption that the sojourn time of the incubation period is exponentially distributed, which is wellknown to be a questionable assumption in some settings [26] . Other works incorporating DDE models into the analysis of COVID-19 include [29, 30, 31] , while references for more general epidemiological models of this type may be found in [26, 27, 32, 33, 34, 35] . Aside from the presence of incubation periods, delays in data may result from other sources: additional time from initial onset of symptoms to when a person decides to be tested, the availability of testing, the speed of test result processing, testing center capacity, and other factors may have a large influence on when a given case is properly identified in the data [36] . The presence of such lags is important, particularly when attempting to model measured data, as such processes are intrinsic to the data [37] . Hence, when attempting to properly evaluate different intervention strategies, for instance, such delays must be considered in the modeling process. A model designed to fit measured parameters, while not taking into account the delays in measurement, may fail provide a wholly accurate description of the dynamics. In this sense, we may view such delays as no longer time from exposure to infection, but rather time from exposure to identification. In the present work, we seek to rigorously quantify this time delay effect in data. In order to accomplish this task, we first introduce, as a forward problem, a delayed-SIRD (susceptible-infected-recovered-deceased) model, incorporating a range of weighted possible time delays. We then employ an optimization process against measured data over the different weights, using the output of the optimization process to evaluate the presence of the different time delays present within the data. We note that the introduced method, while applied in the current work to COVID-19, is general in nature and such a technique may be employed in any situation in which a dynamical system and its associated measurements may exhibit time-delayed dynamics. The article is outlined as follows. We first introduce the employed delay-differential equation model and discuss some of its characteristics. We then introduce and describe the optimization process for the identification of the relevant time delays. We then validate the proposed technique on an interval of data for the COVID-19 outbreak in Italy, demonstrating its effectiveness under several different modeling assumptions. Finally, we conclude with a summary of our main findings and possible directions for future research in this area. For the model used in the present work, we make the following assumptions: 1. That the time scales considered are sufficiently short such that we do not need to consider new births or non-COVID-19 deaths; 2. That the time scales considered are sufficiently short such that we do not need to consider waning immunity; 3. That probability of contact and transmission remains constant throughout the infectious period; 4. That recovery and mortality rates follow an exponential distribution; We acknowledge that some of these assumptions are simplifications; however, for the scope of the current work, we feel they are nonetheless reasonable. Differently from [28] we model here the interaction with infected individuals by a convolution term where σ min and σ max represent the extrema of the latency period and w(σ) ≥ 0 is a weight function (or a distribution) such that σmax σmin w(σ)dσ = 1. We note that this term resembles the renewal function described first in [25] , from which the classical SIR model is then derived (see also [38] ). In the model discussed in [28] we considered (with δ(·) a Dirac-δ distribution) for a certainσ ∈ [σ min , σ max ] a representative constant delay identifying the latency period of the infection. Here instead we aim to optimize the function w in terms of the avalable measured data, allowing a certain variability of the latency, depending on the features of the population, In order to approach the problem numerically we discretize (1) by In this way we can make use of an optimization procedure to estimate the weights, and thus the memory effect, in terms of the observed data. As we will see the discrete profile does not differ qualitatively from a bell-like function. This indicates that there is a prevalent constant delay in the model and this justifies the approximation considered in [28] . The advantage is that in this way we are able to estimate numerically the most suitable value of a possibly constant delay to be used in the model. Under the above assumptions, we are led to consider the following delayed-SIRD system of differential equations with delays σ j , where j = 1 : k, with k the number of considered delays: It was shown in [28] that one may expect the model given by (2)-(5) to exhibit stable behavior provided that: which in the present work is easily extended to: for each j. We note additionally that above we have provided the density-dependent version of the model for generality. One may easily obtain the frequency-dependent version of (2)-(5) by instead considering β = β/N , where N is the total population, in lieu of β in (2), (3). Such a change will not affect the stability bounds (6)-(7), though we note that the units ofβ become Days −1 . We seek to use (2)-(5) identify the time delays σ j present in COVID-19 data. To this end, we solve the inverse problem for the delay weights w j . Denote the measured susceptible, infected, recovered, and deceased (from COVID-19) populations as s, i, r, d, respectively. We then define the following optimization problem: For a given β, φ r , φ d , Time-delay weight j Dimensionless Table 1 : Relevant parameters, parameter names, and units for the equations. The novelty of the proposed method above lies in its treatment of the delay term as a weighted sum of different delays. By considering the system in this manner, it is possible to obtain an optimization problem for the time delay that is linear in the delay term. This is important, as attempting to optimize directly on the σ j gives a highly nonlinear optimization problem that is likely untractable. This method of discretizing the time lags and optimizing for the weights as a means of time-delay identification is, to the authors' knowledge, novel. We note that this approach is highly general and is not restricted to models of COVID-19 or epidemiological models in general. Moreover, the accuracy and the detailed description of the delay depend on how k is chosen. This allows this method to be very applicable for other models as well. We note that the model (2)-(5) depends heavily not only on the time delays σ j and their respective weights w j , but also on the contact rate β, recovery rate φ r , and mortality rate φ d . In order to obtain robust estimates for the w j through the optimization procedure defined above, we resort to a Monte Carlo method. We consider the parameters β, φ r , φ d as independent random variables sampled from Gaussian distributions. The mean of the respective distributions is obtained via an empirical parameter fitting, with the standard deviation of each distribution taken to be 5% the value of the mean. We then run the optimization procedure (8a)-(8d) a total of 1000 times, corresponding to 1000 different parameter samplings. We then identify frequency with which different w j appear in the optimization, as well as the mean value of each w j over the sampled trial. We note as well, that a limitation of our implementation is its equal treatment of time delays across compartments. In general, we consider the infection and deceased data to be the most reliable such data and is emphasized in our calculations. As the easiest quantity to measure, the deceased compartment is assumed the most accurate. Further, the as the rate of new infections is the driving point behind policy measures, it is considered the second most accurate measure for our purposes. However, one may modify (8a) to include other compartments, or different subsets of the compartment set. In terms of implementation, there are several important points to note. We define the optimization procedure using the MATLAB routine fmincon. In order to solve the delay differential equation system (2)-(5), we use the MATLAB function dde23. We note that dde23 uses adaptive grid points, and hence the temporal points of the solution obtained using dde23 do not correspond to those in our dataset in general. To circumvent this, we use deval to evaluate the dde23 solution at the desired temporal points. Mean Min. Max. s − s L 2 / s L 2 .0051 .0022 .0104 .1101 .0798 .2757 r − r L 2 / r L 2 .2602 .071 .5913 d − d L 2 / d L 2 .0695 .0249 .2063 Table 2 : Error table: relative errors in L 2 norm against measured data for the different compartments over 1000 simulations, optimizing over i and d. Mean Min. Max. s − s L 2 / s L 2 .0051 .0022 .0108 Table 3 : Error table: relative errors in L 2 norm against measured data for the different compartments over 1000 simulations, optimizing only over i. For our measured data, we consider the COVID-19 outbreak in Italy, as reported by the newspaper Il Sole 24 [39] . We consider the dates ranging from September 11, 2020 to February 7, 2021. This date range was chosen to represent a 150-day span, in which different governmental policies were enacted, widespread testing was available, and before large-scale population vaccination. This choice was intentional, as we wish to avoid as many confounding factors as possible. We consider the following distributions for the parameters β ∼ N (.1131/n 0 , .05(.1131/n 0 )) (9) φ d ∼ N (1/940, .05(1/940)) (10) φ r ∼ N (1/24, .05(1/24)), (11) where N (µ, σ) denotes a normal distribution with mean µ and standard distribution σ and n 0 denotes the initial living population. For t > 73, where t 0 = 0, we replace β by β/3 to model the introduced government restrictions, designed to curb the virus spread. The values of φ d and φ r remain unchanged throughout the considered interval. The mean values of these parameters were obtained via a preliminary parameter tuning. For the delays, we considered 12 possible values, from σ = 2 days to σ = 35 days, with 3 days of spacing between the different days. We also performed some experiments (not shown) using different discretizations over the same range (for instance, 10 possible values) to show the robustness of the optimization procedure, which did not appreciably change the conclusions drawn. To examine the possible differences in delays among the different compartments, we perform the optimization four times: once considering the i and d compartments together in the objective function, and for the i, r and d compartments, considering each compartment individually. For each ensemble, we report the minimum, maximum, and mean L 2 error over the simulation ensemble when compared to the measured data, as well as the mean value and frequency of the different weights w j corresponding to the delays σ j . The aggregated results of the simulations in terms of the infected, recovered, and deceased compartments, when optimizing over the i and d compartments are shown in Fig. 1 . We see good qualitative agreement with the measured and simulated data across the three compartments of interest, with the major relevant trends being captured. In table 2, we report the mean, minimum, and maximum of each error value computed over the simulation ensemble. We see particularly strong agreement in the susceptible and deceased compartments, reasonable agreement in the infected compartment (especially considering its rapid-changing trajectory), and somewhat worse performance in the recovered compartment. Given that the r compartment was not included in the optimization, this is unsurprising. Turning our attention to the influence of time delays, we report the mean and frequency of the different weights w j in Figure 5 . We see a heavy concentration of weights values, for both frequency and mean, in the range of σ = 8 to σ = 17 days. This is consistent with what has been reported in general media, as the incubation period of COVID is believed to have an average of 3-11 days [40] . When combined with natural delays coming from the testing and data reporting, a total time lag in the range of 8-17 days would appear consistent with this information. Table 4 : Error table: relative errors in L 2 norm against measured data for the different compartments over 1000 simulations, optimizing only over r. Mean Min. Max. Table 5 : Error table: relative errors in L 2 norm against measured data for the different compartments over 1000 simulations, optimizing only over d. We then repeat this experiment three more times: once each for the i, r, and d compartments considered individually in the optimization process. The error tables for the i, r and d compartments can be found in Table 3 , 4, and 5 respectively. The corresponding simulation outputs for each case are plotted in Figures 2, 3 and 4 . The behavior is consistent with what one may intuitively expect; in each instance, the error behavior is minimized for the considered compartment, with the simulation variance greatly reduced for the isolated compartment. We note that one can observe from the relevant tables and figures that there is a much higher degree similarity between the i and d error behavior when compared to the r compartment, which appears to act more independently. This is confirmed when looking at the frequency and mean distribution of the weights w j in Figures 6, 7 and 8 for the i, r, and d compartments respectively. In the case of the i compartment, we see concentration between σ=8 and σ=17 days, similar to when i and d are considered jointly; however, for i alone, σ =14 appears as the most dominant lag term, rather than σ =11. The behavior appears similar but even more pronounced when considering d alone. In this instance σ=14 is again the dominant delay term; however, the grouping is tightly clustered between σ =11 and σ =17, with terms outside this range showing very little influence. In contrast, when examining the case of isolated r in 7, we see a different distribution of delay terms. Indeed, in this instance, the dominant delay terms range from σ = 14 to σ = 20, and are very tightly concentrated within this range. This suggests a distinct and longer time scale when compared to the i and d, and may explain why the error behavior for the r compartment shows a distinctly different behavior in the plots and tables shown. In all, we can conclude three key takeaway points from these results: 1. The time delays present in COVID-19 data range from σ = 8 to σ = 20 days, with the most likely value for σ j the i and d compartments in the range of σ=11 to σ=14 days. Figure 1 : Aggregated results of the infected, recovered, and deceased compartments over the range of simulations when optimizing over the i and d compartments. We see good agreement across all compartments with the measured data, and particularly strong agreement (less than 10% in L 2 norm) in the deceased compartment. Figure 2 : Aggregated results of the infected, recovered, and deceased compartments over the range of simulations when optimizing over the i compartment. We see good agreement across all compartments with the measured data, and particularly strong agreement (less than 10% in L 2 norm) in the deceased compartment. Figure 3 : Aggregated results of the infected, recovered, and deceased compartments over the range of simulations when optimizing over the r compartment. We see good agreement in the r compartment; the agreement with the d compartment remains acceptable, though somewhat lesser than other optimization choices. The error in the infected compartment is notably higher. This different error behavior is reflected in the optimization results; when optimizing over r, the optimal time delays are different than when optimizing over i and d. For both frequency and mean, σ = 11 appears as the most represented delay. 2. The scale of the time delays across the compartments is not uniform. It is similar for new infections and for deceased persons, but occurs over a somewhat longer time scale for recovered individuals.This is not surprising due to the different recovery rate and the inherent difficulty of measuring this compartment, given delays in testing and in the reporting of results. 3. The time scales revealed by the optimization procedure shown here are longer than most estimates of the incubation period, suggesting that other delays are present in the data, and a period of 14 to 17 days should be considered when analyzing the effects of interventions. In the present work, we have attempted to identify time delays present in COVID-19 data by means of newlyintroduced method. This method consists of discretizing k possible time delays σ j , j = 1 : k and expressing these terms as a weighted sum k j=1 w j i(t − σ j ). This formulation then yields a an optimization problem which is linear in the weights w j . After a preliminary parameter fitting, we then solve an ensemble of optimization problems, finding the weights w j which minimize the difference between simulated and measured data over different samplings of parameter values. We recover a distribution of delay terms for the different variables, allowing us to recover the inherent time-delays within the measured data. We tested this approach on COVID-19 data in Italy, and found that the time-delays differ among different data points, but generally speaking range between 8 and 17 days for i and d, with the most likely value between 11 and 14 days, and over a somewhat longer time scale (14-20 days) for the recovered compartment. This is consistent with our expectations given the incubation period of the disease and the natural time delays one may expect from the testing and data collection process. Our results confirm that proper evaluation of a given intervention for COVID-19 should allow for a sufficiently long time window to pass before conclusions are made. There are several important directions for future work. Given the increased importance of waning vaccine efficacy over time, a similar analysis given vaccine data and data on breakthrough infections could be helpful in identifying similar such time-delay trends for this problem. While we have elected to avoid vaccines in the current work, preferring to establish a proof of concept on a more simplified problem, the approach offered is extremely general and may be applied to these important problems. Further, given the generality of the proposed optimization procedure, it may have wide applications outside of COVID-19 and epidemiology generally, and may be applied to any model where time delays are present, including applications in traffic modeling, pedestrian flow, remote sensing, and other fields. From the theoretical and methodological standpoint, an extension of the framework shown here for problems in which the delays are non-constant or state-dependent is also an important direction of future research. Is it safe to lift covid-19 travel bans? the newfoundland story Covid-19 dynamics across the us: A deep learning study of human mobility and social behavior Bayesian-based predictions of covid-19 evolution in texas using multispecies mixture-theoretic continuum models Dynamic mode decomposition in adaptive mesh refinement and coarsening simulations Coupled and uncoupled dynamic mode decomposition in multi-compartmental systems with applications to epidemiological and additive manufacturing problems System inference for the spatio-temporal evolution of infectious diseases: Michigan in the time of covid-19 Hyperbolic models for the spread of epidemics on networks: kinetic description and numerical methods Hyperbolic compartmental models for epidemic spread on networks with uncertain data: Application to the emergence of covid-19 in italy Kinetic modelling of epidemic dynamics: social contacts, control with uncertain data, and multiscale spatial dynamics Simulating the spread of covid-19 via a spatially-resolved susceptible-exposed-infected-recovered-deceased (seird) model with heterogeneous diffusion Diffusion-reaction compartmental models formulated in a continuum mechanics framework: application to covid-19, mathematical analysis, and numerical study Adaptive mesh refinement and coarsening for diffusion-reaction epidemiological models Assessing the spatio-temporal spread of COVID-19 via compartmental models with diffusion in Italy, USA, and Brazil. Archives of Computational Least-squares finite element method for a meso-scale model of the spread of covid-19 An agent-based computational framework for simulation of global pandemic and social response on planet x Spread and dynamics of the covid-19 epidemic in italy: Effects of emergency containment measures Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures COVID-19 and Italy: what next? The Lancet A modified sir model for the covid-19 contagion in italy Monitoring italian covid-19 spread by a forced seird model Estimating the reproductive number and the outbreak size of covid-19 in korea Mathematical modeling of the spread of the coronavirus disease 2019 (covid-19) taking into account the undetected infections. the case of china Suihter: A new mathematical model for covid-19. application to the analysis of the second epidemic outbreak in italy Containing papers of a mathematical and physical character An Introduction to Mathematical Population Dynamics: Along the Trail of Volterra and Lotka Mathematical Biology I: An Introduction Delay differential equations for the spatially-resolved simulation of epidemics with specific application to covid-19 Solvable delay model for epidemic spreading: the case of covid-19 in italy Seir model for covid-19 epidemic using delay differential equation The analysis of a time delay fractional covid-19 model via caputo type fractional derivative. Mathematical methods in the applied sciences Global asymptotic properties of a delay sir epidemic model with finite incubation times Mathematical models in population biology and epidemiology Delay differential equation models in mathematical biology Global stability of an sir epidemic model with information dependent vaccination Correcting notification delay and forecasting of covid-19 data A modelling approach for correcting reporting delays in disease surveillance data On the formulation of epidemic models (an appraisal of kermack and mckendrick) The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application