key: cord-0813110-t4p6850g authors: Bai, Yue; Safikhani, Abolfazl; Michailidis, George title: Hybrid Modeling of Regional COVID-19 Transmission Dynamics in the U.S date: 2021-12-21 journal: IEEE Journal on Selected Topics in Signal Processing DOI: 10.1109/jstsp.2022.3140703 sha: 6904abcb528c63effc7dcbe502374b350deb619c doc_id: 813110 cord_uid: t4p6850g

The fast transmission rate of COVID-19 worldwide made this virus the most important challenge of 2020. Many mitigation policies were imposed by governments at different regional levels (country, state, county, and city) to stop the spread of the virus. Quantifying the effect of such mitigation strategies on the transmission and recovery rates, and predicting the rate of new daily cases, are two crucial tasks. In this paper, we propose a hybrid modeling framework which not only accounts for such policies but also utilizes spatial and temporal information to characterize the pattern of COVID-19 progression. Specifically, a piecewise susceptible-infected-recovered (SIR) model is developed, while the dates at which the transmission/recovery rates change significantly are defined as "break points" in this model. A novel, data-driven algorithm is designed to locate the break points using ideas from the fused lasso and thresholding. To enhance the forecasting power and to describe additional temporal dependence among the daily number of cases, this model is further coupled with spatial smoothing covariates and a vector auto-regressive (VAR) model. The proposed model is applied to several U.S. states and counties, and the results confirm the effect of "stay-at-home orders" and some states' early "re-openings" by detecting break points close to such events. Further, the model provides satisfactory short-term forecasts of the number of new daily cases at regional levels by utilizing the estimated spatio-temporal covariance structures.
These forecasts were also better than, or on par with, other models proposed in the literature, including flexible deep learning ones. Finally, selected theoretical results and the empirical performance of the proposed methodology on synthetic data are reported, which justify the good performance of the proposed method.

In regions hit hard early in the pandemic, such as Italy, Spain and the tri-state area of New York, New Jersey and Connecticut, various mitigation strategies were rapidly put in place, with the most stringent one being "stay-at-home" orders. The key purpose of such strategies was to reduce the virus transmission rate and, consequently, the pressure on public health infrastructure [1]. To that end, the California governor issued a "stay-at-home" order on March 19, 2020, that was quickly followed by another 42 states by early April. All states with such orders proceeded with multi-phase reopening plans starting in early May, allowing various non-essential businesses to operate, possibly at reduced capacity levels to enforce social distancing guidelines. In addition, mask wearing mandates also came into effect [2] as emerging evidence from clinical and laboratory studies showed that masks reduce the spread [3]. However, these reopening plans led to a substantial increase in the number of confirmed COVID-19 cases in many US states, followed by an increased number of fatalities throughout the summer of 2020, concentrated primarily in the Southern US states. Different states and local communities adopted and implemented different non-pharmaceutical interventions to reduce infections, but a "3rd wave" emerged in the fall of 2020 with cooling temperatures and people spending more time indoors. Further, during late fall of 2020, various variants of concern started emerging around the world, characterized by higher transmission capabilities and potentially increased severity based on hospitalizations and fatalities [4].
Variants that exhibited a certain degree of […]

The emergence of the COVID-19 pandemic led to the development of many data science and signal processing modeling approaches addressing diverse issues, including forecasting the progress of the disease, the impact of non-pharmaceutical intervention strategies [5], methods to estimate the Infection and Case Fatality Rates (IFR/CFR) [6], pre-existing conditions and clinical factors that impact the CFR [7], computer audition for diagnosing COVID-19 [8], image analysis for COVID-19 [9], phylogenetic network analysis of COVID-19 genomes [10], the impact of aerosol transmission on public health [11], and guidelines for reopening critical social activities such as schools [12]. Note that as COVID-19 progressed, together with our knowledge about it, the range of topics addressed expanded significantly, while the focus also exhibited certain shifts. As an example, early on (March 2020) it was believed that the virus can spread through contaminated surfaces, known as fomites, and this informed both the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC) recommendations 2 on surface cleaning and disinfection. However, subsequent studies and investigations of outbreaks indicated that the majority of transmissions occur through droplets and aerosols, which led to a revision of the recommendations by the WHO and the CDC 3. It also led to new research on aerosol dispersion models and on the role of ventilation in mitigating transmission [13]. Nevertheless, forecasting the spread of the epidemic throughout its course (initially with the imposition of various mitigation strategies, and more recently through the emergence of more transmissible variants and the increased pace of vaccination campaigns around the world) has remained a key task, and a number of signal processing approaches have been developed, as briefly summarized next. A number of epidemic models have been developed to analyze and predict COVID-19 transmission dynamics.
Mathematical models, such as the class of susceptible-infectious-recovered (SIR) models, are widely used to model and forecast epidemic spreads. [14] proposed a time-dependent SIR model and tracked the transmission and recovery rates at each time point by employing ridge regression, while [15] proposed a discrete-time susceptible-infectious-recovered-dead (SIRD) model and provided estimates of the basic reproduction number (R 0), and of the infection mortality and recovery rates, via the least squares method. Moreover, [16] and [17] built an extended SIR model with time-varying transmission rates and implemented a Markov Chain Monte Carlo (MCMC) algorithm to obtain posterior estimates and credible intervals of the model parameters. A number of models focused on identifying a change in the parameters of the underlying model employed. For example, [18] combined the widely used SIR model (see Section II) with Bayesian parameter inference through MCMC algorithms, assuming a time-dependent transmission rate. Instead of directly estimating a change point in the transmission rate and the other parameters of the SIR model, they assumed a fixed value for the number of change points, and imposed informative prior distributions on their locations, as well as on the transmission rate, based on information from intervention policies. Further, [19] proposed to model the time series of the log-scaled cumulative confirmed cases and deaths of each country via a piecewise linear trend model. They combined the self-normalization (SN) change-point test with the narrowest-over-threshold (NOT) algorithm [20] to achieve multiple change-point estimation.

1 https://covid.cdc.gov/covid-data-tracker/#variant-proportions
2 https://www.who.int/news-room/commentaries/detail/transmission-of-sars-cov-2-implications-for-infection-prevention-precautions
3 https://www.cdc.gov/coronavirus/2019-ncov/science/science-briefs/sars-cov-2-transmission.html
Moreover, [21] and [22] analyzed the effect of social distancing measures adopted in Europe and the United States, respectively, using an interrupted time series (ITS) analysis of the confirmed case counts. Their work aims to find a change point in the time series of confirmed case counts at which there is a significant change in the growth rate. In [21], the change points were determined by linear threshold regression models of the logarithm of daily cases, while [22] used an algorithm developed in [23], based on an L0 penalty on changes in slope, to identify the change points. Finally, [24] utilized a branching process for modeling and forecasting the spread of COVID-19.

Another line of work employed spatio-temporal models for parameter estimation and forecasting of the spread of COVID-19. For example, [25] introduced an additive varying coefficient model and coupled it with a non-parametric approach for modeling the data, to study spatio-temporal patterns in the spread of COVID-19 at the county level. Further, [26] proposed a heterogeneous infection rate model with human mobility from multiple regions and trained it using weighted least squares at regional levels, while [27] fitted a generalized additive model (GAM) to quantify the province-specific associations between meteorological variables and the daily cases of COVID-19 during the period under consideration.

In addition to mathematical methods, many machine learning/deep learning methods were applied to forecasting COVID-19 transmission. For example, [28] and [29] employed Artificial Neural Networks (ANN) and Long Short-Term Memory (LSTM) type deep neural networks to forecast future COVID-19 cases in Iran and Canada, respectively, while [30] developed a modified stacked auto-encoder for modeling the transmission dynamics of the epidemic. The review paper [31] presents a summary of recent COVID-19 forecasting models.
The previous brief overview of the literature indicates that there are two streams of models, the first mechanistic and the second statistical in nature. The former (SIR/SIRD models) describe key components of the transmission chain and its dynamics, and have proved useful in assessing scenarios for the evolution of a contagious disease by altering the values of key model parameters. However, they are macroscopic in nature and cannot easily incorporate additional information provided either by mitigation strategies or by other features, such as movement of people assessed through cell phone data. Statistical models can easily utilize such information in the form of covariates to improve their forecasting power. However, they primarily leverage correlation patterns in the available data, which may be noisy, especially at more granular spatio-temporal scales (e.g., county or city level) that are of primary interest to public health officials and policy makers. To that end, this paper aims to develop an interpretable hybrid model that combines a mechanistic and a statistical model, one that respects the theoretical transmission dynamics of the former, but also incorporates additional spatio-temporal characteristics, resulting in improved forecasting capabilities at fairly granular spatio-temporal scales. Specifically, we analyze confirmed cases and deaths related to COVID-19 from several states and counties/cities in the United States from March 1st, 2020 to March 31st, 2021. In the absence of non-pharmaceutical interventions, the spread of COVID-19 can be modeled by an SIR model with fixed transmission and recovery rates. One of the main reasons to select that model as the building block for the proposed methodology is that the transmission and recovery rates are easy to interpret and hence can be used in policy decision making. However, many diverse mitigation policies were put in place at different regional levels in the U.S.
Thus, the simple SIR model may not be a good fit for the data. Instead, we propose a piecewise stationary SIR model (Model 1), i.e., the SIR model parameters may change at certain (unknown) time points. Such time points are defined as "break (change) points". Unlike some other methods discussed in the literature review, in our modeling framework the number of change points and their locations are assumed to be unknown and must be inferred from the data. Such flexibility on the modeling front allows inferring potentially different temporal patterns across different regions (states or counties), and yields a data-driven segmentation of the data which subsequently improves the fit (see more details in Section IV), but also complicates the model fitting procedure. To that end, a novel data-driven algorithm is developed to detect all break points and to estimate the model parameters within each stationary segment. Specifically, we define certain time blocks and assume the model parameters are fixed during each block of time points. Then, a fused lasso penalty is used to estimate all model parameters [32]. This procedure is further coupled with hard-thresholding and exhaustive search steps to estimate the number and locations of change points (details provided in Section II). To enhance the forecasting power of the model and to capture additional spatial and temporal dependence not explained through the SIR model, the piecewise constant SIR model is coupled with spatial smoothing (Model 2) and time series components (Model 3). The former is accomplished through the addition of a spatial effect term which accounts for the effect of neighboring regions, while the latter through a Vector Auto-Regressive (VAR) component which captures additional auto-correlations among new daily cases and deaths.
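To make the piecewise setup concrete, the following is a minimal sketch (not the authors' code) that simulates the discrete-time SIR recursion with a piecewise-constant transmission rate and then recovers the segment-specific rates by least squares, assuming the single break point is already known; all parameter values are illustrative, and the actual break-point detection algorithm is the one described in Section II and Appendix A.

```python
import numpy as np

def simulate_piecewise_sir(N, I0, betas, gammas, breaks, T):
    """Discrete-time SIR with piecewise-constant rates; `breaks` split
    the horizon 1..T into len(betas) stationary segments."""
    edges = [0] + list(breaks) + [T]
    S, I, R = [float(N - I0)], [float(I0)], [0.0]
    for seg in range(len(betas)):
        for _ in range(edges[seg], edges[seg + 1]):
            new_inf = betas[seg] * S[-1] * I[-1] / N   # beta * S * I / N
            rec = gammas[seg] * I[-1]                  # gamma * I
            S.append(S[-1] - new_inf)
            I.append(I[-1] + new_inf - rec)
            R.append(R[-1] + rec)
    return np.array(S), np.array(I), np.array(R)

def fit_segment_rates(S, I, R, N):
    """Least-squares (beta, gamma) on one stationary segment, using the
    difference equations dI = beta*S*I/N - gamma*I and dR = gamma*I."""
    dI, dR = np.diff(I), np.diff(R)
    x_beta = S[:-1] * I[:-1] / N      # regressor attached to beta
    x_gamma = I[:-1]                  # regressor attached to gamma
    y = np.concatenate([dI, dR])      # stack both difference equations
    X = np.block([[x_beta[:, None], -x_gamma[:, None]],
                  [np.zeros((len(x_gamma), 1)), x_gamma[:, None]]])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# a mitigation policy at t = 60 cuts the transmission rate below gamma
N, brk, T = 1_000_000, 60, 150
S, I, R = simulate_piecewise_sir(N, 100, [0.25, 0.03], [0.05, 0.05], [brk], T)
b1, g1 = fit_segment_rates(S[:brk + 1], I[:brk + 1], R[:brk + 1], N)
b2, g2 = fit_segment_rates(S[brk:], I[brk:], R[brk:], N)
```

Because the transmission rate drops below the recovery rate after the break, the infection curve turns downward there, which is the qualitative pattern the piecewise model is designed to capture.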
Capturing the spatio-temporal dependence through Model 3 aids in reducing the prediction error significantly (sometimes by around 80%) compared to the piecewise SIR model, which confirms the usefulness of a hybrid modeling framework (for more details see Section IV). To verify the applicability of the proposed methodology to other data sets with similar characteristics, the developed algorithm is tested on several simulation settings and exhibits very satisfactory performance (details in Section III), and some theoretical properties of the proposed method (prediction consistency, as well as detection accuracy) are established in Appendix B.

The remainder of the paper is organized as follows. In Section II, the proposed statistical models are introduced and data-driven algorithms are described to estimate their parameters. The proposed algorithms are tested on various simulation settings and the results are reported in Section III. The proposed models are applied to several U.S. states and counties and the results are described in Section IV. Finally, some concluding remarks are drawn in Section V.

The proposed class of hybrid models leverages the framework of the SIR model, which is presented next to set up key concepts. The standard SIR model [33] is a mechanistic model, wherein the total population is divided into the following three compartments: susceptible (uninfected), infected, and recovered (healed or dead). It is assumed that each infected individual, on average, infects β other individuals per unit time, and each infected individual recovers at rate γ. The two key model parameters, the transmission rate β and the recovery rate γ, are assumed to be fixed over time. The temporal evolution of the SIR model is governed by the following system of three ordinary differential equations:

dS/dt = −β S I_f / N,
dI_f/dt = β S I_f / N − γ I_f,
dR/dt = γ I_f,

where S, I_f and R represent the individuals in the population in the susceptible, infected and recovered stages, respectively.
Note that the variables S, I_f and R always satisfy S + I_f + R = N, where N is the total population size. In this formulation, we ignore changes in the total population, so that N remains constant over time. Because COVID-19 records are discrete in time (∆t = 1 day), we consider the discrete-time version of the SIR model, so that for each t = 1, . . . , T − 1 the system comprises the following three difference equations:

S(t + 1) − S(t) = −β S(t) I_f(t) / N,
I_f(t + 1) − I_f(t) = β S(t) I_f(t) / N − γ I_f(t),
R(t + 1) − R(t) = γ I_f(t),

where S(t) stands for the number of susceptible individuals at time t, I_f(t) for the number of infected ones, and R(t) for those recovered. Note that these three variables still satisfy the constraint S(t) + I_f(t) + R(t) = N. Notice that the number of infected cases I_f(t) is not observable. Specifically, confirmed COVID-19 case counts may not capture the total infected cases due to limited testing availability/capacity, especially at the beginning of the pandemic (testing was primarily restricted to individuals with moderate to severe symptoms). For example, in the United States, over 90% of COVID-19 infections were not identified/reported at the beginning of the pandemic [34], [35]. To that end, we define a relationship between the true infected cases and the observed/recorded infected cases through an under-reporting function. Specifically, we set ∆I(t) = ∆I_f(t) × (1 − u(t + 1)), where I_f(t) denotes the true number of infected cases, I(t) the observed/recorded number, and u(t) the under-reporting function. We consider a parametric model for the function u(t) (see more details in Section III). Note that we could also use a non-parametric method to address the under-reporting issue, but since the sample size of the time series is limited, in this paper we adopt a parametric method instead. Given the observed infected cases I(t) and the under-reporting function u(t), one can transform the data back to I_f(t) through the inverted increments ∆I_f(t) = ∆I(t)/(1 − u(t + 1)), accumulated for t = 2, . . . , T.
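The under-reporting relationship ∆I(t) = ∆I_f(t)(1 − u(t + 1)) and its inversion can be checked numerically with a few lines; here the index alignment is simplified so that the stored value u[t] is the rate applied to the t-th increment, and all counts are illustrative.

```python
import numpy as np

def observe(delta_I_f, u):
    """Thin the true daily increments by the under-reporting rate:
    observed increment = true increment * (1 - u)."""
    return np.asarray(delta_I_f, dtype=float) * (1.0 - np.asarray(u, dtype=float))

def invert(delta_I, u):
    """Recover the true increments from the observed ones."""
    return np.asarray(delta_I, dtype=float) / (1.0 - np.asarray(u, dtype=float))

true_inc = np.array([10.0, 20.0, 40.0, 80.0])   # true daily new infections
u = np.array([0.9, 0.8, 0.5, 0.0])              # reporting improves over time
obs = observe(true_inc, u)                      # heavily thinned early on
rec = invert(obs, u)                            # exact recovery when u is known
```

Cumulative sums of the recovered increments then reproduce I_f(t) for t = 2, . . . , T, as described above.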
Combining the difference equations (2), normalized by the total population, with the transformations stated in Equation (5) yields the following simple linear equations:

∆I_f(t) = β (S(t)/N) I_f(t) − γ I_f(t),
∆R(t) = γ I_f(t),

for each t = 1, . . . , T − 1, where ∆I(t) = I(t + 1) − I(t) and ∆R(t) = R(t + 1) − R(t). Next, we extend the standard SIR model to accommodate temporal and spatial heterogeneity, as well as to include stochastic temporal components. The former is achieved by allowing the transmission and recovery rates to vary over time and through the inclusion of an additional term in (7) that captures spatial effects, while the latter is achieved through adding a vector auto-regressive component [36], [37]. Variants of the SIR model with time-varying parameters have been proposed in the literature [38]. For our application, we assume that the transmission and recovery rates are piecewise constant over time, reflecting the fact that their temporal evolution is impacted by intervention strategies and environmental factors (Model 1). Second, the standard SIR model and its piecewise stationary counterpart do not account for any influence due to inter-region mobility and travel activity. We incorporate such inter-region information by considering the influence exerted by a few neighboring regions, such as cities, counties or states (Model 2); see also [26]. Third, the standard homogeneous SIR model, previously discussed, is deterministic; hence, the output of the model is fully determined by the parameter values of the transmission and recovery rates and the initial conditions. Its stochastic counterpart [39], [40] possesses some inherent randomness. Alternatively, the general stochastic epidemic model can be approximated by a stochastic differential equation (see e.g. [41]) of the form

dX(t) = f(X(t)) dt + G(X(t)) dW(t), X(t) = (S(t), I_f(t))',

where the random variables S(t) and I_f(t) are continuous, and W = (W_1, W_2) is a vector of two independent Wiener processes, i.e., W_i(t) ∼ N(0, t).
Given (8), the stochastic SIR model can be written in discretized form as

X(t + ∆t) = X(t) + f(X(t)) ∆t + G(X(t)) ∆W(t),

with the increments ∆W_1(t) and ∆W_2(t) being two independent normal random variables, i.e., ∆W_i(t) ∼ N(0, ∆t). It can be seen that the resulting regression model, based on the discrete analogue of (10), will have an error term exhibiting temporal correlation, driven by I_f(t). In the full model (11), the transmission and recovery rates are piecewise constant,

(β(t), γ(t)) = (β^(j), γ^(j)) for t_{j−1} ≤ t < t_j,

where {t_1, . . . , t_{m_0}} are m_0 unknown "change points" such that the transmission and recovery rates change from (β^(j), γ^(j)) to (β^(j+1), γ^(j+1)) at t_j, while remaining fixed until the next break point. Hence, these break points divide the time series data into stationary segments. Moreover, Z_t, the weighted average of the spatial effect over the q neighboring regions at time t, enters the model through the term α Z_t, where N_j denotes the total population in neighboring region j, α is a spatial effect parameter, and the ω_j's are spatial weights such that ω_1 + . . . + ω_q = 1. The latter two parameters capture inter-region mobility patterns. Finally, e_t is white noise with mean 0 and variance σ², and φ_1, . . . , φ_p are the corresponding autoregressive parameters of the error process. In the sequel, model (11) with α = φ_1 = . . . = φ_p = 0 is considered as Model 1 (piecewise SIR), the case of only restricting φ_1 = . . . = φ_p = 0 is considered as Model 2 (piecewise SIR with spatial effects), while the full model with no constraints is considered as Model 3.

The estimation of the model parameters is accomplished in the following three steps:
Step 1: Fit Model 1 for each region of interest to obtain the change points;
Step 2: Obtain the transmission and recovery rates and the spatial effect as in Model 2;
Step 3: Compute the residuals (ε_t) from Step 2 and fit a VAR model to them; see [36].

The rationale behind this step-wise algorithm is that, assuming the influence of the spatial effect component Z_t is small, we can use Model 1 for each region of interest to estimate both the change points and the corresponding transmission and recovery rates. Denoting the final estimated change points by Model 1 as A^f_n = {t^f_1, . . .
, t^f_{m_f}}, segment-specific transmission and recovery rates, combined with an overall spatial effect, can be readily estimated using least squares applied to an augmented linear model which concatenates all segments at the time points t^f_j and includes the spatial effect. Finally, the residuals of this augmented linear model, computed from the least squares estimates, can be regressed (again by least squares) on their own previous values to yield estimates of the autoregressive parameters. The difficult part of the algorithm is estimating the number and locations of the break points. Details of the algorithm are presented in Appendix A, while a brief summary is provided next. The first step of the algorithm selects candidate change points A_n among the blocks by solving a block fused lasso problem. The change points obtained by the block fused lasso step include all block boundaries with non-zero estimated parameter changes, which leads to overestimating the number of true change points in the model. Nevertheless, the block fused lasso parameter estimates enjoy a prediction consistency property, which implies that the prediction error converges to zero with high probability as n → +∞. This result is stated and proved (see Theorem 1 in Appendix B) under some mild conditions on the tail behavior of the error distributions. A hard-thresholding step is then added to reduce the over-selection from the fused lasso step by "thinning out" redundant change points exhibiting small changes in the estimated coefficients. After the hard-thresholding step, candidate change points located far from any true change point are eliminated when the block size is appropriate. On the other hand, more than one selected change point may remain in a small neighborhood of each true change point.
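The hard-thresholding step just described can be sketched in a few lines: block boundaries whose consecutive parameter estimates (from the fused lasso step) differ by more than a threshold τ are kept as candidate change points. The estimates, block layout, and threshold below are illustrative stand-ins, not the paper's tuned values.

```python
import numpy as np

def threshold_change_points(theta, tau):
    """Keep boundaries between consecutive blocks whose estimated
    parameter jump exceeds tau; returns block indices of candidates."""
    theta = np.asarray(theta, dtype=float)
    jumps = np.abs(np.diff(theta, axis=0))
    if jumps.ndim > 1:                 # several parameters per block:
        jumps = jumps.max(axis=1)      # use the largest coordinate jump
    return [k + 1 for k, j in enumerate(jumps) if j > tau]

# noisy blockwise estimates of (beta, gamma): one real jump after block 3
est = np.array([[0.20, 0.05], [0.21, 0.05], [0.20, 0.05],
                [0.08, 0.05], [0.09, 0.05], [0.08, 0.05]])
cands = threshold_change_points(est, tau=0.05)
```

Small spurious jumps (here of size 0.01) are thinned out, while the genuine drop in the transmission rate survives the threshold.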
To remedy this issue, the remaining estimated change points are clustered and, within each cluster, an exhaustive search examines every time point inside the neighborhood search region defined by the cluster of candidate change points and selects the best time point as the final estimated change point.

We evaluate the performance of the proposed models on predictive accuracy, change point detection and parameter estimation. We consider three simulation scenarios (see additional simulation settings in Section B of the supplementary material). The details of the simulation settings for each scenario are explained in Section III-A. All results are averaged over 100 random replicates. We assess the results for the three models presented: Model 1, the piecewise stationary SIR model; Model 2, the piecewise stationary SIR model with spatial effect; and Model 3, the piecewise stationary SIR model with spatial effect and a VAR(p) error process. The out-of-sample Mean Relative Prediction Error (MRPE) is used as the performance criterion, defined as

MRPE = (1/n_test) Σ_{t=T+1}^{T+n_test} |Î(t) − I(t)| / I(t),

where n_test is the number of time points for prediction, Î(t) is the predicted count of infected cases at time t, and I(t) the observed one. The MRPE of R(t) is obtained by replacing Î(t) and I(t) with R̂(t) and R(t), respectively. The predicted numbers of infected and recovered cases are obtained by iterating the estimated difference equations forward from the last observed values, for all t = T + 1, · · · , T + n_test. For change point detection, we report the locations of the estimated change points and the percentage of replicates that correctly identify each change point. This percentage is calculated as the proportion of replicates in which an estimated change point lies close to the corresponding true break point. Specifically, to compute the selection rate, a selected break point is counted as a "success" for the j-th true break point t_j if it falls in a prescribed neighborhood of t_j, j = 1, . . . , m_0. We also report the mean and standard deviation of the estimated parameters for each model.
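The two evaluation criteria can be computed as below; the symmetric ± window used in the selection-rate check is an illustrative stand-in for the neighborhood prescribed in the paper.

```python
import numpy as np

def mrpe(pred, obs):
    """Out-of-sample Mean Relative Prediction Error:
    average of |predicted - observed| / observed over the test period."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean(np.abs(pred - obs) / obs))

def selection_rate(estimated_cps, true_cp, window):
    """Share of replicates with at least one estimated change point
    within +/- window of the true change point (window is illustrative)."""
    hits = sum(any(abs(e - true_cp) <= window for e in rep)
               for rep in estimated_cps)
    return hits / len(estimated_cps)

err = mrpe([110, 95, 100], [100, 100, 100])   # relative errors 0.10, 0.05, 0
rate = selection_rate([[99], [101, 150], [120], [100]],
                      true_cp=100, window=5)  # 3 of 4 replicates succeed
```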
All results are reported in Table I.

We consider three different simulation settings; the SIR model coefficients and the under-reporting functions are depicted in Figure 4 in Section B. In the first scenario, … and γ^(2) = 0.04. For the spatial effect, we set α = 1, β_s(t) = 0.10 − 0.05 t/(T − 1), γ_s(t) = 0.04, t = 1, . . . , T − 1. We first generate the spatial effect data from the SIR model in (11) with parameters β_s(t) and γ_s(t), and generate the error term from a VAR(1) model with noise covariance matrix Σ_ε = 0.1 I_2, where I_2 is the two-dimensional identity matrix. Plugging the spatial effect data and the error term into (11), we generate the response variable Y_t. The autoregressive coefficient matrix has entries 0.8, 0, 0.2, 0.7 (from top left to bottom right). We assume no under-reporting in this scenario, i.e., u(t) = 0, hence ∆I(t) = ∆I_f(t), for t = 1, . . . , T − 1.

Simulation Scenario B (Model 1 with exponentially decreasing under-reporting rate): In this scenario, we set the number of time points T = 250, m_0 = 2, and the change points t_1 = 100 and t_2 = 200. We choose β^(1) = 0.10, β^(2) = 0.05, β^(3) = 0.10, γ^(1) = 0.04, γ^(2) = 0.06, γ^(3) = 0.04. Results are based on data generated from the SIR model in (11). The under-reporting rate is chosen to change over time; specifically, we set the under-reporting rate u(t) = 1 − …

The mean and standard deviation of the location of the selected change point, relative to the number of time points T (i.e., t^f_1/T), for all simulation scenarios are summarized in Table I. The results clearly indicate that, in the piecewise constant setting, our procedure accurately detects the location of the change points. The results for the estimated transmission rate β, recovery rate γ, spatial effect α and under-reporting parameter a suggest that our procedure produces accurate estimates of the parameters under the various under-reporting function settings (b is assumed to be known).
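The VAR(1) error process used in the first scenario (coefficient matrix with entries 0.8, 0, 0.2, 0.7 and noise covariance 0.1·I_2, as stated above) can be generated with a short recursion; starting the recursion from zero rather than the stationary distribution is a simplification.

```python
import numpy as np

def simulate_var1(Phi, Sigma, T, rng):
    """Generate a bivariate VAR(1) process e_t = Phi e_{t-1} + w_t,
    with w_t ~ N(0, Sigma), started from e_0 = 0."""
    Phi = np.asarray(Phi, dtype=float)
    L = np.linalg.cholesky(Sigma)          # correlate the Gaussian draws
    e = np.zeros((T, 2))
    for t in range(1, T):
        e[t] = Phi @ e[t - 1] + L @ rng.standard_normal(2)
    return e

Phi = np.array([[0.8, 0.0], [0.2, 0.7]])   # entries from the scenario above
Sigma = 0.1 * np.eye(2)                    # noise covariance 0.1 * I_2
errs = simulate_var1(Phi, Sigma, T=300, rng=np.random.default_rng(0))
```

Both eigenvalues of the coefficient matrix lie inside the unit circle, so the simulated error process is stationary, as the scenario requires.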
We generate an additional 20 days' worth of data to measure the prediction performance. The MRPE results for I(t) and R(t) are provided in Table I.

The COVID-19 data used in this study are obtained from [42]. The curated data and code used in the analysis are available at the authors' GitHub repository 4. The analysis is performed both at the state and the county level, and the raw data include both cases and deaths, as reported. We calculate the number of recovered cases for each region (state/county) as follows: the number of deaths in the region, multiplied by the nationwide cumulative recovered cases and divided by the nationwide deaths. That is, we assume that the recovered-versus-deceased ratio for each state/county is fixed and can be well approximated by the nationwide recovery-to-death ratio. As coronavirus infections increased while laboratory testing faced capacity constraints, reporting only confirmed cases and deaths led to (possibly severe) under-estimation of the disease's impact. On April 14, 2020, the CDC advised states to count both confirmed and probable cases and deaths. As more states and localities have done so since then, in this study we focus on the combined counts, which include both confirmed and probable cases. The populations of states and counties are obtained from [44]. Further, to decide which neighboring states/counties to include in Model 2, their distance to the target state/county of interest is used; the latter is obtained from [45]. In the results presented, we define regions within 500 miles for states and within 100 miles for counties/cities as neighboring ones in Models 2 and 3. For areas with a large number of neighboring regions, such as New York state, we only consider the five regions with the smallest distances. We assume the probability of becoming susceptible again after having recovered from the infection to be 0.5%. The reinfection rate in the short run (∼ 6 months) is believed to be very low.
Some evidence from health care workers (median age 38 years) estimates it at 0.2% [46]. Hence, the selected value seems a reasonable upper bound for the task at hand. Note that small variations of this rate did not impact the results. We analyzed the daily count of COVID-19 cases at the state level from March 1, 2020, to March 31, 2021. Among the parametric forms considered for the under-reporting function, the quadratic one performed best (see Table XVII in the Supplement); therefore, all presented results in this Section are based on the quadratic function. Finally, to estimate the under-reporting parameter a, we perform a grid search within the interval [0.1, 0.3]. The main reason for selecting this interval is that it matches the approximately 90% COVID-19 under-reporting rate at the beginning of the pandemic, as investigated and reported in [34], [35]. Note that we also assume that the COVID-19 under-reporting rate after December 2020 is very low, as most regions had built up their testing capacity; therefore, we set u(t) = 0 after December 2020. Most of the states were selected due to being severely affected for a certain period of time during the course of COVID-19. The remaining regions illustrate interesting patterns gleaned from the proposed models. Let I(t) and R(t) denote the number of infected and recovered individuals (cases) on day t. Day 1 refers to the first day after March 1, 2020, for which the region records at least one positive COVID-19 case. Detailed estimation results are reported in Table XIII in the supplementary material. When the spatial resolution is high (county-level aggregated data), neighboring regions may exhibit similar patterns in the evolution of the transmission and recovery rates. Therefore, constructing weight matrices based on distance is meaningful, as is common practice in spatial statistics [47]. Such spatial smoothing through proper weight matrices is especially helpful in increasing statistical power through an increased effective sample size, thus yielding more accurate predictions.
However, the evolution of COVID-19 may exhibit different patterns across neighboring states due to the coarse spatial resolution; thus, defining weight matrices based on distance may not be ideal. Hence, Models 2.3 and 2.4 select regions based on similarities of infected/recovered cases, where the similarity between the region of interest and the j-th potential similar region is measured by a discrepancy score s_j between their case trajectories. In the equal weight setting (Model 2.1), ω_j = 1/q for any j = 1, . . . , q. In both the distance-based and the similarity-based weight settings, power distance weights are used, wherein the weight is a function of the distance/similarity to the neighboring region, ω_j ∝ 1/d_j or ω_j ∝ 1/s_j, where d_j is the distance score and s_j the similarity score for the j-th region. Under the constraint ω_1 + . . . + ω_q = 1, the normalized weights are ω_j = d_j^{-1} / (d_1^{-1} + . . . + d_q^{-1}), and analogously for the s_j. Among the weighting schemes, the similarity-based one proved to be the best performing.

As expected, change points detected for state data are related to "stay-at-home" orders or phased reopening dates issued by state governments. We define the reopening date as the time when either the "stay-at-home" order expired or state governments explicitly lifted orders and allowed (selected or even all) businesses to reopen [48]. The "stay-at-home" and reopening dates for all states are shown in Table III. In Model 1, a change point is detected between March and April for all five states: New York, Oregon, Florida, California and Texas. These change points coincide with the onset of "stay-at-home" orders and correspond to a significant decrease in the transmission rate. The first change points detected are around two weeks after the states' Governors signed statewide "stay-at-home" orders (three weeks for CA), which is consistent with the fact that COVID-19 symptoms develop 2 days to 2 weeks following exposure to the virus. As can be seen from Figure 7, in July many states paused plans to reopen, amid rising infected case counts 5.
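The normalized inverse-score weights described above (used for both the distance- and similarity-based schemes) reduce to a two-line computation; the scores below are made-up distances for illustration.

```python
import numpy as np

def inverse_score_weights(scores):
    """Power distance weights: omega_j proportional to 1/s_j,
    normalized so that the weights sum to one."""
    inv = 1.0 / np.asarray(scores, dtype=float)
    return inv / inv.sum()

# closer (or more similar) regions receive larger weights
w = inverse_score_weights([100.0, 200.0, 400.0])
```

A region at half the distance receives twice the weight, and the weights always form a proper convex combination over the q neighbors.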
These pausing actions effectively slowed the spread of COVID-19, as can be clearly seen in the downward trend of the transmission rate in July and August in Oregon, Florida, California and Texas. Interestingly, a change point in July is detected in Oregon, Florida and Texas, and a change point in August is detected in California, mainly related to this pausing (see the left panel of Figure 1). The observed and fitted numbers of infected cases are displayed in Figure 2; the fitted numbers of infected and recovered cases are obtained recursively from the estimated model parameters for all $t = 2, \ldots, T$. In summary, the piecewise constant SIR model (Model 1) with detected change points significantly improves the prediction of the number of infected cases compared with the piecewise constant SIR model with pre-specified change points, where the pre-specified change points are defined as the stay-at-home or reopening order dates for each region. It can be seen from the first two rows in Figure 2 that estimating when break points occurred using the developed algorithm improves the fit significantly, especially for states in which multiple change points are selected, such as Oregon, California and Texas. Model 2.3 further improves the fit for Florida and California, due to the addition of a spatial smoothing effect. Finally, Model 3 provides further improvements, in particular for New York, which empirically justifies the use of hybrid modeling to analyze regional transmission dynamics of COVID-19. To determine the significance of the spatial effect in Model 2, we provide the estimate, p-value, and 95% confidence interval for the parameter $\alpha$ in Table XIV in the Supplement. We find that the influence of the infected or recovered cases in adjacent states is statistically significant (p-value < 0.05) for all states. Next, we assess the prediction performance of the three proposed models. The out-of-sample MRPE, given by (12), is used as the performance measure.
We set the last two weeks of our observation period as the testing period and use the remaining time points for training the model. Note that the predicted numbers of infected and recovered cases are defined by (13). The out-of-sample MRPEs of $I(t)$ and $R(t)$ in the selected regions are reported in Table VI. The calculated MRPEs of $I(t)$ show that Model 3, which includes spatial effects and a VAR temporal component, outperforms the other models. Spatial smoothing itself (Model 2) reduces the prediction error significantly in some states; for example, in New York, spatial smoothing reduced the MRPE of $I(t)$ by 64% when using Model 2.3 (similarity-based weights). Finally, the reduction in MRPEs using Model 3 justifies the presence of the VAR component in the modeling framework. Additional results related to the VAR component (including the estimated auto-regressive parameters) are reported in Section E of the Supplement. We also compare the developed model and associated methodology with the extended SIR, ANN and LSTM based models proposed in [17], [28] and [29], respectively. The extended SIR model was trained using the "eSIR" package in the R programming language; the transmission rate modifier $\pi(t)$ was specified according to actual interventions at different times and regions, as described in [17]. The ANN was trained using the "nnfor" package in R; it comprises 3 hidden layers with 10 nodes each and a linear output activation function, which is the exact architecture in [28]. The number of repetitions for this algorithm was set to 20 for $I(t)$ and 10 for $R(t)$. The LSTM architecture was implemented in PyTorch. Since [29] did not specify the network architecture (number of layers, number of neurons), we performed a grid search over the number of layers ranging from 1 to 3 and the number of neurons in {10, 50, 100}.
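The out-of-sample evaluation above relies on the mean relative prediction error; a minimal sketch of a common MRPE form is below (the paper's exact definition is its equation (12), so treat this as an illustrative assumption).

```python
import numpy as np

def mrpe(actual, predicted):
    """Mean relative prediction error over a test window:
    the average of |actual - predicted| / actual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))
```

For example, predicting 90 and 220 for actual counts 100 and 200 gives relative errors of 0.1 each, so an MRPE of 0.1.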
The best architecture in terms of minimizing the prediction error on the validation data is selected as the optimal LSTM architecture; note that the prediction results for the different LSTM network architectures were very similar. The selected numbers of layers and neurons based on the grid search, and the corresponding prediction error for each region, are also provided in Tables IV and V. The out-of-sample MRPEs for $I(t)$ and $R(t)$ in the five states under consideration, together with their sample standard deviations in brackets, are reported in Tables IV and V, respectively. The proposed method clearly outperforms the extended SIR (eSIR) model across all five states for both $I(t)$ and $R(t)$, and it broadly matches the performance of the ANN and LSTM for most states. Further, note that the proposed model is easy to interpret, since its key parameters (infection and recovery rates) are routinely used by policy makers (see also the discussion in Section IV-C).

In December, Southern California was experiencing a fast and sustained outbreak, believed to be driven by a new strain designated as CAL.20C. To that end, we analyzed the daily counts of cases in Riverside and Santa Barbara counties in CA; see Figure 14. We also provide the out-of-sample MRPEs of $I(t)$ and $R(t)$ of selected counties in the Supplement.

Notation. Denote the indicator function of a subset $A$ by $\mathbf{1}_A$; $\mathbb{R}$ denotes the set of real numbers.

Steps of the Proposed Algorithm. Block Fused Lasso: the objective is to partition the observations into blocks of size $b_n$, wherein the model parameter $B_t$ remains fixed within each block, and to select only those blocks for which the corresponding change in the parameter vector is much larger than the others. Specifically, let $n = T - 1$ be the number of time points for the response data $Y_t$ and define a sequence of time points $1 = r_0 < r_1 < \cdots < r_{k_n} = n + 1$ for block segmentation, such that $r_i - r_{i-1} = b_n$ for $i = 1, \ldots, k_n - 1$ and $b_n \le r_{k_n} - r_{k_n - 1} < 2 b_n$, where $k_n = \lceil n / b_n \rceil$ is the total number of blocks. For ease of presentation, it is further assumed that $n$ is divisible by $b_n$, so that $r_i - r_{i-1} = b_n$ for all $i = 1, \ldots, k_n$. By partitioning the observations into blocks of size $b_n$ and fixing the model parameters within each block, we set $\theta_1 = B^{(1)}$ and $\theta_i = B^{(i)} - B^{(i-1)}$ otherwise, for $i = 2, 3, \ldots, k_n$. Note that $\theta_i \neq 0$ for $i \ge 2$ implies that $\theta_i$ has at least one non-zero entry and hence a change in the parameters. Next, we formulate the following linear regression model in terms of $\Theta(k_n) = (\theta_1, \ldots, \theta_{k_n})$: $Y = X\,\Theta(k_n) + E$, with $Y \in \mathbb{R}^{2n}$, $X \in \mathbb{R}^{2n \times 2k_n}$, $\Theta(k_n) \in \mathbb{R}^{2k_n}$ and $E \in \mathbb{R}^{2n}$. A simple estimate of the parameters $\Theta(k_n)$ can be obtained by an $\ell_1$-penalized least squares regression, which uses a fused lasso penalty to control the number of change points in the model. This penalty encourages the parameters across consecutive time blocks to be similar or even identical; hence, only large changes are registered, aiding the identification of change points. Further, a hard-thresholding procedure is added to cluster the jumps into two sets, large and small, so that redundant change points with small changes in the estimated parameters can be removed: we declare a change point at the end point of a block only when it is associated with a large jump in the model parameters.

Hard Thresholding: is based on a data-driven procedure for selecting the threshold $\eta$. The idea is to combine the K-means clustering method [50] with the BIC criterion [51] to cluster the changes in the parameter matrix into two subgroups. The main steps are: Step 1 (initial state): denote the jump for each block by $v_k = \|\hat\theta_k\|_2^2$, $k = 2, \ldots, k_n$, and let $v_1 = 0$. Denote the set of selected blocks with large jumps as $J$ (initially empty) and set $\mathrm{BIC}_{\mathrm{old}} = \infty$.
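In the block reparametrization above, the fused lasso penalty on consecutive block parameters becomes a plain $\ell_1$ penalty on the differences $\theta_i$, so the estimate can be computed with any lasso solver. The sketch below solves that $\ell_1$-penalized least squares problem with a simple proximal-gradient (ISTA) loop; it is a generic lasso solver under this reparametrization, not the paper's implementation, and for simplicity it penalizes all coordinates.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_lasso(X, y, lam, n_iter=500):
    """Solve min_theta (1/2n)||y - X theta||_2^2 + lam * ||theta||_1 by ISTA.

    Under the block-differenced parametrization, nonzero entries of theta
    correspond to candidate change points at the block boundaries.
    """
    n, p = X.shape
    theta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta
```

With an identity design, the solution reduces to soft-thresholding the response, which is a quick sanity check on the solver.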
Block clustering: the Gap statistic [52] is applied to determine the number of clusters of the candidate change points. The basic idea is to run a clustering method (here, K-means) over a grid of possible numbers of clusters, and to pick the optimal one by comparing the changes in within-cluster dispersion with those expected under an appropriate reference null distribution (for more details, see Section 3 in [52]). Exhaustive search: the final estimated change point $t_i^f$ is selected by an exhaustive search within each cluster of candidate change points. Note that a full exhaustive search over all time points would be computationally expensive and not scalable to large data sets. Simple fused lasso (blocks of size one) is another method which, although computationally fast, leads to over-estimating the number of break points; hence it requires additional "screening" steps to remove redundant break points found by the fused lasso algorithm [53], [54]. Such screening steps usually involve tuning several hyper-parameters, which not only slows down the detection method but is also not robust. The proposed approach (block fused lasso coupled with hard thresholding) aims to resolve this issue by choosing appropriate block sizes, while only a single tuning parameter (the threshold) needs to be estimated. Once the locations of the break points are obtained, one can estimate the model parameters by running a separate regression on each identified stationary segment of the time series; the work of [54] shows that this strategy yields consistent model parameter estimates. Grid search of the parameter in the under-reporting function: we use a grid search to estimate the parameter $a$ in the under-reporting function $u(t)$. Given a parameter grid for $a$, we transform the observed infected data $I(t)$ by $I^f(t) = \Delta I(t)/(1 - u(t)) + I^f(t-1)$, apply the transformed data to the above method, and compute the in-sample mean relative prediction error (MRPE) of $\Delta I^f(t)$. We then choose the value of $a$ that minimizes the in-sample MRPE of $\Delta I^f(t)$.
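The grid search for the under-reporting parameter can be sketched as below. Both `u_func` (the under-reporting function $u(t;a)$) and `fit_predict` (the change point estimation routine returning fitted values) are hypothetical placeholders standing in for the paper's components; the transformation follows the recursion quoted in the text, under the assumption that $u(t) < 1$ on the training window.

```python
import numpy as np

def grid_search_a(delta_I, u_func, fit_predict,
                  grid=np.arange(0.1, 0.31, 0.05)):
    """Grid search for the under-reporting parameter a over [0.1, 0.3].

    For each candidate a: transform the observed daily increments using
    u(t; a), fit the model on the transformed series, and keep the a that
    minimizes the in-sample MRPE of the transformed daily increments.
    """
    best_a, best_err = None, np.inf
    for a in grid:
        u = np.array([u_func(t, a) for t in range(len(delta_I))])
        # I^f(t) = sum of adjusted increments, per the recursion in the text
        transformed = np.cumsum(np.asarray(delta_I, float) / (1.0 - u))
        fitted = fit_predict(transformed)
        err = np.mean(np.abs(np.diff(transformed) - np.diff(fitted))
                      / np.abs(np.diff(transformed)))
        if err < best_err:
            best_a, best_err = a, err
    return best_a
```

With a perfect-fit placeholder for `fit_predict`, every candidate attains zero error and the first grid value is returned, which exercises the bookkeeping.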
In this section, we establish the prediction consistency of the estimator from (16). The following assumptions are needed. (A1) (Deviation bound) There exist constants $c_i > 0$ such that, with probability at least $1 - c_1 \exp(-c_2 \log 2n)$, the stated deviation bound on $X^\top E / 2n$ holds. (A2) There exists a positive constant $M_B > 0$ bounding the model parameters. Theorem 1. Suppose A1-A2 hold. Choose $\lambda_n = 2 C_1 \sqrt{\log 2n / 2n}$ for some large constant $C_1 > 0$, and assume $m_0 \le m_n$ with $m_n = o(\lambda_n^{-1})$. Then, with probability approaching 1 as $n \to +\infty$, the stated prediction error bound holds. Proof of Theorem 1. By the definition of $\hat\Theta$ in (16), the objective function in (16) is minimized at $\hat\Theta$. Denoting by $A = \{t_1, t_2, \ldots, t_{m_0}\}$ the set of true change points, the claimed bound then follows, with probability approaching one, from the deviation bound in (19). This completes the proof. Theoretical properties of the lasso have been studied by several authors [55], [56], [57], [58]. In controlling the statistical error, a suitable deviation condition on $X^\top E / 2n$ is needed. Such deviation bound conditions (e.g., assumption A1) are known to hold with high probability under several mild conditions: when the error term satisfies $E \sim N(0, \sigma^2 I_{2n})$, the deviation bound holds with high probability by Lemma 3.1 in [56]; and given that $p$ (the number of time series components) is small and fixed, we have $n \gg \log p$, so when $X$ is a zero-mean sub-Gaussian matrix with parameters $(\Sigma_x, \sigma_x^2)$ and $E$ is a zero-mean sub-Gaussian matrix with parameters $(\Sigma_e, \sigma_e^2)$, the deviation bound condition holds with high probability by Lemma 14 in [57]. Detection Accuracy.
When the block size is large enough that $\log n / b_n$ remains small, if a selected change point $\tilde t_j$ is close to a true change point, the estimated $\hat\theta_j$ will be large (asymptotically similar to the true jump size in the model parameters); if the selected change point $\tilde t_j$ is far away from all true change points, the estimated $\hat\theta_j$ will be quite small (converging to zero as the sample size tends to infinity). Therefore, after the hard thresholding, candidate change points located far from any true change point are eliminated. In other words, for any selected change point $\tilde t_j \in \tilde A_n$, there exists a true change point $t_j \in A_n$ close by, with distance at most $b_n$. Thus, the number of clusters (of radius $b_n$) is a reasonable estimate of the true number of break points in the model. On the other hand, since the set of true change points $A_n$ has cardinality less than or equal to that of the set of selected change points $\tilde A_n$, i.e., $m_0 \le \tilde m$, there may be more than one selected change point remaining in $\tilde A_n$ within the $b_n$-neighborhood of each true change point. For a set $A$, define $\mathrm{cluster}(A, x)$ to be the minimal partition of $A$ in which each subset has diameter at most $x$. Denote the subsets in $\mathrm{cluster}(\tilde A_n, b_n)$ by $\{C_1, C_2, \ldots, C_{m_0}\}$, where each subset $C_i$ has diameter at most $b_n$, i.e., $\max_{a, b \in C_i} |a - b| \le b_n$. Then, with high probability converging to one, the number of subsets in $\mathrm{cluster}(\tilde A_n, b_n)$ is exactly $m_0$: all candidate change points in $\tilde A_n$ lie within a $b_n$-neighborhood of at least one true change point, and hence, with high probability converging to one, there is a true change point $t_i$ within the $b_n$-neighborhood of each subset. Results are based on data generated from the SIR model in (11) in the main paper; the coefficients for the SIR model are depicted in Figure 4. We assume no under-reporting in this scenario, i.e., $u(t) = 1$, $\Delta I(t) = \Delta I^u(t)$, $t = 1, \ldots, T - 1$.
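The partition cluster(A, x) above admits a simple construction for points on a line: a greedy left-to-right sweep over the sorted points yields a minimal partition into subsets of diameter at most x. A minimal sketch:

```python
def cluster(points, x):
    """Minimal partition of a set of points on the line into subsets whose
    diameter (max pairwise distance) is at most x.

    A greedy left-to-right sweep is optimal in one dimension: start a new
    subset whenever the next point is more than x from the subset's leftmost
    point.
    """
    parts, current = [], []
    for p in sorted(points):
        if current and p - current[0] > x:
            parts.append(current)
            current = []
        current.append(p)
    if current:
        parts.append(current)
    return parts
```

For candidate change points {1, 2, 10, 11, 25} and radius 3, this yields three subsets, matching the intuition that the number of $b_n$-diameter clusters estimates the number of true break points.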
In the next simulation scenario, we generate the spatial effect data from the SIR model in (11) in the main paper. By plugging in the spatial effect data, we generate the response variable $Y_t$ with an additional white noise error term drawn from $N(0, I_2)$. The SIR model coefficients for the response variable and the spatial effect variable are depicted in Figure 4. We assume no under-reporting in this scenario, i.e., $u(t) = 1$, $\Delta I(t) = \Delta I^u(t)$, $t = 1, \ldots, T - 1$. In the final scenario, results are based on data generated from the SIR model in (11) in the main paper with time-varying rates; the coefficients for the SIR model are depicted in Figure 4, and again we assume no under-reporting, i.e., $u(t) = 1$, $\Delta I(t) = \Delta I^u(t)$, $t = 1, \ldots, T - 1$. The mean and standard deviation of the location of the selected change point, relative to the number of time points $T$ (i.e., $t_1^f / T$), for all simulation scenarios are summarized in Table VII. The results clearly indicate that, in the piecewise constant setting, our procedure accurately detects the location of change points; they also indicate the proposed algorithm's robustness with respect to block size selection. The results for the estimated transmission rate $\beta$ and recovery rate $\gamma$ are also reported. The block size is set to $b_n = 7$, i.e., the observations are partitioned into blocks of 7 days for all regions, and the tuning parameter $\lambda_n$ is selected via a cross-validated grid search [59]. A few additional plots and tables related to the results of applying our method to some U.S. states are provided in this section. Let $I(t)$ and $R(t)$ denote the numbers of infected and recovered individuals (cases) on day $t$. Figure 5 depicts the actual case numbers $I(t)$ and $R(t)$ in the five states considered. In Figure 6, we provide the fractions of infected and recovered cases in the given states and their neighboring states selected by similarity score in Model 2.3.
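Synthetic data with a single break point, as in the simulation scenarios above, can be generated from the discrete SIR dynamics with piecewise constant rates; a minimal sketch (deterministic mean dynamics, with the specific rate values and change point location chosen for illustration, not taken from the paper):

```python
import numpy as np

def simulate_sir(N, I0, beta, gamma):
    """Simulate daily infected/recovered counts from a discrete SIR model with
    piecewise constant daily rates beta[t], gamma[t]; observation noise can be
    added to the resulting series afterwards."""
    T = len(beta)
    I = np.empty(T + 1)
    R = np.empty(T + 1)
    I[0], R[0] = float(I0), 0.0
    for t in range(T):
        S = N - I[t] - R[t]                      # susceptible pool
        I[t + 1] = I[t] + beta[t] * S * I[t] / N - gamma[t] * I[t]
        R[t + 1] = R[t] + gamma[t] * I[t]
    return I, R

# Illustrative setup: one change point at day 50 (beta drops after a policy)
beta = np.r_[np.full(50, 0.4), np.full(50, 0.15)]
gamma = np.full(100, 0.1)
I, R = simulate_sir(N=1e6, I0=100, beta=beta, gamma=gamma)
```

The resulting series grows while the transmission rate exceeds the recovery rate and flattens after the break, which is the pattern the change point procedure is designed to recover.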
In Figure 7, we show the 7-day moving averages of the measured transmission rate $\beta(t)$ and recovery rate $\gamma(t)$, along with the estimated $\beta(t)$ and $\gamma(t)$ from Model 1. To further examine the spread of COVID-19, one could consider $R_0(t) = \beta(t)/\gamma(t)$, where $R_0(t)$ is the basic reproduction number of a newly infected person at time $t$; it means that an infected person can further infect on average $R_0(t)$ persons. Generally, when $R_0 > 1$, the infection will start spreading through a population at an exponential rate, and the epidemic will be hard to control. Further, the observed and fitted numbers of infected and recovered cases are displayed in Figures 2 and 8, respectively; the fitted numbers of infected and recovered cases are computed for all $t = 2, \ldots, T$. In Table XIV, we provide the estimate, p-value, and 95% confidence interval for the parameter $\alpha$ in Model 2 for all five states. For a stationary time series $X_t$, the auto-covariance function (ACVF) at lag $h$ is defined as $\gamma_X(h) = \mathrm{Cov}(X_{t+h}, X_t)$, and the auto-correlation function (ACF) at lag $h$ is defined as $\rho_X(h) = \gamma_X(h)/\gamma_X(0)$. We use the ACF to measure the degree of dependence among the residual processes of Models 2 and 3 at different times. Figure 9 shows the ACF of these residual time series. As can be seen from these plots, in all states there is statistically significant auto-correlation in the residuals of Model 2 at different lags, while this temporal dependence is reduced significantly in the residuals of Model 3, which confirms the applicability of VAR modeling for the considered data sets. The out-of-sample MRPEs of $I(t)$ and $R(t)$ in the five states are reported in Tables XV and XVI. In Figure 10, we provide the predicted values over the last 2 weeks from all three models in the five states: New York, Oregon, Florida, California and Texas.
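The sample ACF used for the residual diagnostics above can be computed directly from the ACVF definition; a minimal sketch:

```python
import numpy as np

def acf(x, max_lag):
    """Sample auto-correlation function rho(h) = gamma(h) / gamma(0), where
    gamma(h) is the lag-h sample auto-covariance of the (centered) series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma0 = (x @ x) / n                         # lag-0 auto-covariance
    return np.array([(x[:n - h] @ x[h:]) / n / gamma0
                     for h in range(max_lag + 1)])
```

By construction the lag-0 value is always 1; for a linearly increasing series such as 1, 2, 3, 4, the lag-1 sample auto-correlation (with the 1/n normalization) works out to 0.25.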
For both the daily infected cases $\Delta I(t)$ and the daily recovered cases $\Delta R(t)$, the predictions of Models 2 and 3 outperform those of Model 1 in all states. The change point detection results using the exponential under-reporting function are presented in Table XVII; as shown there, with the exponential function we missed some of the change points detected with the quadratic function. Since the parameters obtained from (11) vary greatly, the 7-day moving averages of the transmission rate $\beta(t)$ and recovery rate $\gamma(t)$ are provided in Figure 14; as can be seen from those figures, our estimated piecewise parameters are very close to the 7-day moving averages of the true parameters. A few additional plots and tables related to the results of applying our method to some U.S. counties are provided in this section. Figure 11 depicts the observed infected case numbers $I(t)$ and recovered case numbers $R(t)$ in the nine counties/cities. In Figures 12 and 13, we provide the fractions of infected and recovered cases in the given counties and their neighboring regions selected by similarity score in Model 2.3. The observed and fitted numbers of infected and recovered cases are displayed in Figures 15 and 16, respectively. Note that we calculate the number of recovered cases via an approximate nationwide recovery-to-death ratio, which probably leads to a less satisfactory fit of the recovered cases at the county level; nevertheless, it does not affect the change point detection results. In Table XXI, we provide the estimate, p-value, and 95% confidence interval for the parameter $\alpha$ in Model 2 for additional counties. The results in Table XXI indicate that the neighboring counties have a significant impact on estimating and forecasting the transmission dynamics. Finally, we use the auto-correlation function (ACF) to measure the degree of dependence among residuals at different time points; Figure 17 shows the ACF of these residual time series in several counties/cities.
After adding the VAR(p) component to the modeling framework, the residual series of Model 3 exhibits significantly reduced auto-correlation. We also provide the out-of-sample MRPEs of $I(t)$ and $R(t)$ in Tables XXII and XXIII. In Figure 18, we provide the predicted values over the last 2 weeks from all three models in the nine counties/cities; for both the daily infected cases $\Delta I(t)$ and the daily recovered cases $\Delta R(t)$, the predictions of Models 2 and 3 outperform those of Model 1 in all regions. In Figure 3 in the main paper, we provide heatmaps of the transmission and recovery rates in the 67 counties of Florida. In this section, the details of the calculations of the estimated transmission rate $\hat\beta$, the estimated recovery rate $\hat\gamma$, and the corresponding standard errors of $\hat\beta$ and $\hat\gamma$ are presented, where $\hat\sigma^2$ is the sample variance; see Figure 19. [Figure: panel (h), Scenario C (under-reporting rate)]
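The VAR component that removes the residual auto-correlation can be illustrated with a minimal VAR(1) least-squares fit; this is a generic sketch (the paper's model has its own lag order and covariates), regressing each observation on an intercept and the previous observation.

```python
import numpy as np

def fit_var1(Y):
    """Least-squares fit of a VAR(1) model Y_t = c + A Y_{t-1} + e_t,
    returning the stacked coefficients [c; A] and the residual series."""
    Y = np.asarray(Y, dtype=float)
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])   # intercept + lagged values
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    resid = Y[1:] - X @ coef
    return coef, resid
```

On a noise-free series generated by $Y_t = 1 + 0.5\,Y_{t-1}$, the fit recovers the intercept and autoregressive coefficient exactly and leaves (numerically) zero residuals, which is the behavior one hopes for, approximately, in the Model 3 residual diagnostics.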
REFERENCES

[1] How will country-based mitigation measures influence the course of the COVID-19 epidemic?
[2] Considerations for wearing masks
[3] Respiratory virus shedding in exhaled breath and efficacy of face masks
[4] NERVTAG note on B.1.1.7 severity
[5] Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
[6] A systematic review and meta-analysis of published research data on COVID-19 infection-fatality rates
[7] Factors associated with COVID-19-related death using OpenSAFELY
[8] Recent advances in computer audition for diagnosing COVID-19: An overview
[9] Point of care image analysis for COVID-19
[10] Phylogenetic network analysis of SARS-CoV-2 genomes
[11] Consideration of the aerosol transmission for COVID-19 and public health
[12] Data and policy to guide opening schools safely to limit the spread of SARS-CoV-2 infection
[13] A guideline to limit indoor airborne transmission of COVID-19
[14] A time-dependent SIR model for COVID-19 with undetectable infected persons
[15] Data-based analysis, modelling and forecasting of the COVID-19 outbreak
[16] An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China
[17] Extended SIR prediction of the epidemics trend of COVID-19 in Italy and compared with Hunan, China
[18] Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
[19] Time series analysis of COVID-19 infection curve: A change-point perspective
[20] Narrowest-over-threshold detection of multiple change points and change-point-like features
[21] The effect of social distance measures on COVID-19 epidemics in Europe: an interrupted time series analysis
[22] Social distancing merely stabilized COVID-19 in the United States
[23] Detecting changes in slope with an L0 penalty
[24] The challenges of modeling and forecasting the spread of COVID-19
[25] Spatiotemporal dynamics, nowcasting and forecasting of COVID-19 in the United States
[26] Learning to forecast and forecasting to learn from the COVID-19 pandemic
[27] COVID-19 transmission in mainland China is associated with temperature and humidity: A time-series analysis
[28] Exponentially increasing trend of infected patients with COVID-19 in Iran: a comparison of neural network and ARIMA forecasting models
[29] Time series forecasting of COVID-19 transmission in Canada using LSTM networks
[30] Artificial intelligence forecasting of COVID-19 in China
[31] A review on COVID-19 forecasting models
[32] Sparsity and smoothness via the fused lasso
[33] Containing papers of a mathematical and physical character
[34] Substantial underestimation of SARS-CoV-2 infection in the United States
[35] Evaluating the massive underreporting and undertesting of COVID-19 cases in multiple global epicenters
[36] New introduction to multiple time series analysis
[37] Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
[38] Infectious disease models with time-varying parameters and general nonlinear incidence rate
[39] Some evolutionary stochastic processes
[40] The total size of a general stochastic epidemic
[41] Stochastic epidemic modeling, in Mathematical and Statistical Estimation Approaches in Epidemiology
[42] The New York Times, 2020, Coronavirus (COVID-19) data in the United States
[43] Comparing and integrating US COVID-19 daily data from multiple sources: A county-level dataset with local characteristics
[44] Census Bureau, 2019, Population and Housing Unit Estimates Datasets
[45] Population and Housing Unit Estimates Datasets
[46] Antibody status and incidence of SARS-CoV-2 infection in health care workers
[47] Statistics for spatial data
[48] See how all 50 states are reopening
[49] Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
[50] Algorithm AS 136: A K-means clustering algorithm
[51] Estimating the dimension of a model
[52] Estimating the number of clusters in a data set via the gap statistic
[53] Multiple change-point estimation with a total variation penalty
[54] Joint structural break detection and parameter estimation in high-dimensional non-stationary VAR models
[55] Simultaneous analysis of lasso and Dantzig selector
[56] The adaptive and the thresholded lasso for potentially misspecified models (and a lower bound for the lasso)
[57] High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity
[58] A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
[59] The elements of statistical learning: data mining, inference, and prediction
[60] Linear regression analysis

Estimates, p-values, and 95% confidence intervals for the parameter $\alpha$ in counties/cities are obtained from (7); using linear regression analysis [60], one can calculate the estimated coefficients and their standard errors.