key: cord-338024-8kq5nzv5 authors: Lee, Sokbae; Liao, Yuan; Seo, Myung Hwan; Shin, Youngki title: Sparse HP filter: Finding kinks in the COVID-19 contact rate date: 2020-09-26 journal: J Econom DOI: 10.1016/j.jeconom.2020.08.008 doc_id: 338024 cord_uid: 8kq5nzv5

In this paper, we estimate the time-varying COVID-19 contact rate of a Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate is constructed using data on actively infected, recovered and deceased cases. We propose a new trend filtering method that is a variant of the Hodrick-Prescott (HP) filter, constrained by the number of possible kinks. We term it the sparse HP filter and apply it to daily data from five countries: Canada, China, South Korea, the UK and the US. Our new method yields kinks that are well aligned with actual events in each country. We find that the sparse HP filter provides fewer kinks than the ℓ1 trend filter, while both methods fit the data equally well. Theoretically, we establish risk consistency of both the sparse HP and ℓ1 trend filters. Ultimately, we propose to use time-varying contact growth rates to document and monitor outbreaks of COVID-19.

Since March 2020, there has been a meteoric rise in economic research on COVID-19. New research outputs have been appearing on a daily and weekly basis at an unprecedented level. 1 To sample a few, Ludvigson et al. (2020) quantified the macroeconomic impact of COVID-19 by using data on costly and deadly disasters in recent US history; Manski and Molinari (2020) and Manski (2020) applied the principle of partial identification to the infection rate and antibody tests, respectively; and Chernozhukov et al. (2020) used US state-level data to study determinants of social distancing behavior.
Across a wide spectrum of research, there is a rapidly emerging strand of literature based on the Susceptible-Infected-Recovered (SIR) model and its variants (see, e.g., Hethcote, 2000, for a review of the SIR and related models). Many economists have embraced the SIR-type models as new tools to study the COVID-19 pandemic. Avery et al. (2020) provided a review of the SIR models for economists, calling for new research in economics. A variety of economic models and policy simulations have been built on the SIR-type models. See Acemoglu et al. (2020), Alvarez et al. (2020), Atkeson (2020), Eichenbaum et al. (2020), Pindyck (2020), Stock (2020), Kim et al. (2020), and Toda (2020), among many others. One of the central parameters in the SIR-type models is the contact rate, typically denoted by β. 2 It measures "the average number of adequate contacts (i.e., contacts sufficient for transmission) of a person per unit time" (Hethcote, 2000). The contact number β/γ is the product of β and the average infectious period, denoted by 1/γ; the contact number is interpreted as "the average number of adequate contacts of a typical infective during the infectious period" (Hethcote, 2000). The goal of this paper is to estimate the time-varying COVID-19 contact rate, say β_t. In canonical SIR models, β is a time-constant parameter. However, it may vary over time due to multiple factors. For example, as pointed out by Stock (2020), self-isolation, social distancing and lockdown may reduce β. To estimate an SIR-type model, Fernández-Villaverde and Jones (2020) allowed for a time-varying contact rate to reflect behavioral and policy-induced changes associated with social distancing. In particular, they estimated β_t using data on deaths at the city, state and country levels. Their main focus was to simulate future outcomes for many cities, states and countries. Researchers have also adopted nonlinear time-series models from the econometric toolbox.
For example, Li and Linton (2020) analyzed daily data on the number of new cases and the number of new deaths with a quadratic time trend model in logs. Their main purpose was to estimate the peak of the pandemic. Liu et al. (2020) studied density forecasts of the daily number of active infections for a panel of countries/regions. They modeled the growth rate of active infections as autoregressive fluctuations around a piecewise linear trend with a single break. Hartl et al. (2020) used a linear trend model in logs with a trend break to fit German confirmed cases. Harvey and Kattuman (2020) used a Gompertz model with a time-varying trend to fit and forecast German and UK new cases and deaths. In this paper, we aim to synthesize the time-varying contact rate with nonparametric time series modeling. In particular, we build a new nonparametric regression model for β_t that allows for a piecewise linear trend with multiple kinks at unknown dates. We analyze daily data from the Johns Hopkins University Center for Systems Science and Engineering (Dong et al., 2020, JHU CSSE) and suggest a particular transformation of the data that can be regarded as a noisy measurement of the time-varying β_t. Our measurement of β_t, which is constructed from daily data on confirmed, recovered and deceased cases, is different from that of Fernández-Villaverde and Jones (2020), who used only death data. We believe the two measurements complement each other. However, the SIR model is at best a first-order approximation to the real world; a raw series of β_t would be too noisy to draw inferences about the underlying contact rate. In fact, the raw series exhibits a high degree of skewness and time-varying volatility even after the log transformation. To extract the time-varying signal from the noisy measurements, we consider nonparametric trend filters that produce possibly multiple kinks in β_t, where the kinks are induced by government policies and changes in individual behavior.
A natural candidate method that yields the kinks is ℓ1 trend filtering (e.g., Kim et al., 2009). However, ℓ1 trend filtering is akin to LASSO; hence, it may produce too many kinks, just as LASSO tends to select too many covariates. In view of this concern, we propose a novel filtering method by adding a constraint on the maximum number of kinks to the popular Hodrick and Prescott (1997) (HP) filter. It turns out that this method produces fewer kink points than ℓ1 trend filtering when both methods fit the data equally well. In view of that, we call our new method the sparse HP filter. We find that the estimated kinks are well aligned with actual events in each country. To document and monitor outbreaks of COVID-19, we propose to use piecewise constant contact growth rates derived from the piecewise linear trend estimates of the sparse HP filter. They provide not only an informative summary of past outbreaks but also a useful surveillance measure. The remainder of the paper is organized as follows. In Section 2, we describe a simple time series model of the time-varying contact rate. In Section 3, we introduce two classes of filtering methods. In Section 4, we have a first look at the US data, as a benchmark country. In Section 5, we present empirical results for five countries: Canada, China, South Korea, the UK and the US. In Section 6, we establish risk consistency of both the sparse HP and ℓ1 trend filters. Section 7 concludes, and the appendices include additional materials. The replication R codes for the empirical results are available at https://github.com/yshin12/sparseHP. Finally, we add the caveat that the empirical analysis in the paper was carried out in mid-June using daily observations up to June 8th. As a result, some remarks and analysis might be out of sync with the COVID-19 pandemic in real time. In this section, we develop a time-series model of the contact rate.
Our model specification is inspired by the classical SIR model, which has been adopted by many economists in the current coronavirus pandemic. We start with a discrete version of the SIR model, augmented with deaths, adopted from Pindyck (2020), given in (2.1), where the (initial) population size is normalized to one, S_t is the proportion of the population that is susceptible, I_t the fraction infected, D_t the proportion that have died, and R_t the fraction that have recovered. The parameter γ = γ_r + γ_d governs the rate at which infectives transfer to the state of being deceased or recovered. In the emerging economics literature on COVID-19, the contact rate β is viewed as the parameter that can be affected by changes in individual behavior and government policies through social distancing and lockdown. We follow this literature and let β = β_t be time-varying. Let C_t be the proportion of confirmed cases, that is, C_t = I_t + R_t + D_t. In words, the confirmed cases consist of actively infected, recovered and deceased cases. We use the equations in (2.1) to obtain the measurement equation (2.2). Assume that we have daily data on ΔC_t, ΔR_t and ΔD_t. From these, we can construct the cumulative series C_t, R_t and D_t. Then S_t = 1 − C_t and I_t = C_t − R_t − D_t. This means that we can obtain the measurement Y_t of β_t that we use throughout the paper. We use daily data from JHU CSSE, and they are subject to measurement errors, which could bias our estimates. In Appendix A, we show that the time series model given in this section is robust to some degree of under-reporting of confirmed cases. However, our estimates are likely to be biased if the under-reporting is time-varying. For example, this could happen because testing capacity in many countries has expanded over the sample period. Nonetheless, we believe that our measurement of Y_t primarily captures the genuine underlying trend of β_t.
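To make the construction concrete, the sketch below (ours, not from the paper's replication code; numpy assumed, and the function names are our own) builds Y_t = ΔC_t/(S_{t−1} I_{t−1}) from daily increments, as implied by (2.1), and then applies a three-day moving average and the log transformation described later in the paper (the alignment of the moving average is our choice). In a noiseless SIR simulation with constant β, the recovered Y_t equals β exactly.

```python
import numpy as np

def contact_rate_series(dC, dR, dD):
    """Y_t = dC_t / (S_{t-1} * I_{t-1}) from daily increments of the
    confirmed, recovered and deceased fractions (population normalized to 1)."""
    C, R, D = np.cumsum(dC), np.cumsum(dR), np.cumsum(dD)
    S = 1.0 - C               # susceptible fraction
    I = C - R - D             # actively infected fraction
    return dC[1:] / (S[:-1] * I[:-1])

def log_smoothed(Y):
    """Three-day simple moving average, then log (alignment is our choice)."""
    Y3 = (Y[:-2] + Y[1:-1] + Y[2:]) / 3.0
    return np.log(Y3)
```

With measurement noise, the raw Y_t is volatile, which is exactly what motivates the trend filtering methods in Section 3.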
Moreover, because the SIR model in (2.1) is at best a first-order approximation, a raw series of Y_t would be too noisy to be used as the actual series of the underlying contact rate β_t. In other words, β_t will not equal Y_t exactly in actual data, and it would be natural to include an error term in Y_t. Because β_t has to be positive, we adopt a multiplicative error structure and make the following assumption. Assumption 2 (Time-Varying Signal plus Regression Error). For each t, the unobserved random contact rate satisfies Y_t = β_t exp(u_t), where the error term u_t has mean zero conditional on the past and bounded moments. Define y_t := log Y_t. Under Assumption 2, (2.2) can be rewritten as y_t = f_{0,t} + u_t, where f_{0,t} := log β_t. We consider two different classes of trend filtering methods to produce piecewise linear estimators of f_{0,t} := log β_t. The first class is based on ℓ1 trend filtering, which has become popular recently. See, e.g., Kim et al. (2009), Tibshirani (2014), and Wang et al. (2016), among others. The starting point of the second class is the HP filter, which has been popular in macroeconomics and has been frequently used to separate trend from cycle. The standard convention in the literature is to set λ = 1600 for quarterly time series. Among related contributions, Ravn and Uhlig (2002) suggested a method for adjusting the HP filter for the frequency of observations; de Jong and Sakarya (2016) and Cornea-Madeira (2017) established some representation results; Hamilton (2018) provided criticism of the HP filter; Phillips and Shi (2019) advocated a boosted version of the HP filter via L2-boosting (Bühlmann and Yu, 2003) that can detect multiple structural breaks. We view that kinks might be more suitable than breaks for modelling β_t using daily data. It is unlikely that social distancing and lockdown would diminish the degree of contagion of COVID-19 with an abrupt jump within a few days. The original HP filter cannot produce any kink, just as ridge regression does not select any variable.
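For intuition on why the plain HP filter cannot produce kinks, recall that it solves a ridge-type problem with a closed-form linear solution. A minimal numpy sketch (our illustration, not the paper's implementation):

```python
import numpy as np

def hp_filter(y, lam):
    """HP filter: argmin_f sum_t (y_t - f_t)^2 + lam * sum_t (D2 f_t)^2,
    solved in closed form as f = (I + lam * D'D)^{-1} y."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]   # second-difference rows
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
```

With λ = 0 the filter returns the data; as λ grows the trend approaches a straight line, but the second differences shrink smoothly toward zero rather than becoming exactly sparse, so no kinks arise, in direct analogy with ridge regression.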
We build the sparse HP filter by drawing on the recent literature that uses an ℓ0-constraint or ℓ0-penalty (see, e.g., Bertsimas et al., 2016; Lee, 2018, 2020; Huang et al., 2018). In ℓ1 trend filtering, the trend estimate f̂_t is a minimizer of (3.1), which is related to Hodrick and Prescott (1997) filtering; the latter is the minimizer of (3.2). In this paper, the main interest is to find the kinks in the trend. For that purpose, ℓ1 trend filtering is more suitable than HP filtering. The main difficulty of using (3.1) is the choice of λ. This is especially challenging since the time series behavior of y_t is largely unknown. The ℓ1 trend filter is akin to LASSO. In view of an analogy to square-root LASSO (Belloni et al., 2011), it might be useful to consider a square-root variant of (3.1), given in (3.3). We will call (3.3) square-root ℓ1 trend filtering. Both (3.1) and (3.3) can be solved via convex optimization software, e.g., CVXR (Fu et al., 2017). As an alternative to ℓ1 trend filtering, we may exploit Assumption 3 and consider an ℓ0-constrained version of trend filtering, given in (3.4). It is known, however, that best subset selection can perform poorly when the signal-to-noise ratio (SNR) is low (Hastie et al., 2017; Mazumder et al., 2017). This is likely to be a concern for our measurement of the log contact rate. To regularize the best subset selection procedure, it has been suggested in the literature that (3.4) can be combined with ℓ1 or ℓ2 penalization (Bertsimas and Van Parys, 2020; Mazumder et al., 2017). We follow Bertsimas and Van Parys (2020) and propose an ℓ0-constrained version of the Hodrick-Prescott filter, given in (3.5). As in (3.4), the tuning parameter κ controls how many kinks are allowed for. Thus, we have direct control of the resulting segments of different slopes. The ℓ2 penalty term is useful for dealing with the low SNR problem in the COVID-19 data. We will call (3.5) sparse HP trend filtering. Problem (3.5) can be solved by mixed integer quadratic programming (MIQP).
We can rewrite the objective function in (3.5) using binary variables z_t that indicate whether there is a kink at t. This is called a big-M formulation; it requires the constraints |f_{t−1} − 2f_t + f_{t+1}| ≤ M z_t and f ≤ f_t ≤ f̄, with constants M, f and f̄ that are large enough not to exclude the optimum. We need to choose the auxiliary parameters f, f̄ and M. We set f = min_t y_t and f̄ = max_t y_t. One simple practical method for choosing M is to set M = 2(f̄ − f), since |f_{t−1} − 2f_t + f_{t+1}| ≤ 2(f̄ − f) whenever every f_t lies in [f, f̄]. To implement the proposed method, it is simpler to write the MIQP problem above in matrix notation. Let y denote the (T × 1) vector of y_t's and 1 a vector of 1's whose dimension may vary. We solve the MIQP problem (3.7), where the entries of the constraint matrices that are not shown are zero. Let f̂ and ẑ denote the resulting minimizers. It is straightforward to see that f̂ also solves (3.5). Therefore, f̂ is the (T × 1) vector of trend estimates, and K̂ := {t = 2, . . . , T − 1 : ẑ_t = 1} is the index set of estimated kinks. The MIQP problem can be solved via modern mixed integer programming software, e.g., Gurobi. Because the sample size for y_t is typically less than 100, the computational speed of MIQP is fast enough to carry out cross-validation to select tuning parameters. We summarize the equivalence between the original and MIQP formulations in the following proposition: let f̂_MIQP denote a solution to (3.7); then f̂_SHP and f̂_MIQP are equivalent in the sense that they attain the same minimum value of (3.5). We first consider the sparse HP filter. There are two tuning parameters: λ and κ. It is likely that there will be an initial stage of coronavirus spread, followed by lockdown or social distancing. Even without any policy intervention, the contact rate will come down, since many people will voluntarily select into self-isolation and there is a chance of herd immunity. Hence, the minimum κ is at least 1. If κ is too large, it is difficult to interpret the resulting kinks. In view of these considerations, we set the possible values κ ∈ K = {2, 3, 4}. For each pair of (κ, λ), let f̂_{−s}(κ, λ) denote the leave-one-out estimator of f_s.
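The paper solves (3.5) via MIQP with Gurobi, but for the small T considered here the ℓ0-constrained problem can also be solved by brute force: enumerate candidate kink sets, fit a continuous piecewise linear trend for each, and add the λ-penalty on the squared slope changes (which are exactly the nonzero second differences). The sketch below is our illustration of the objective in (3.5), not the paper's implementation.

```python
import itertools
import numpy as np

def sparse_hp_bruteforce(y, kappa, lam):
    """min_f sum (y_t - f_t)^2 + lam * sum (D2 f_t)^2 subject to at most
    kappa nonzero second differences, by enumerating kink sets.
    For a fixed kink set K, f is continuous piecewise linear:
    f_t = a + b*t + sum_{k in K} c_k * max(t - k, 0), and D2 f_k = c_k."""
    T = len(y)
    t = np.arange(T, dtype=float)
    best_obj, best_f, best_K = np.inf, None, None
    for size in range(kappa + 1):
        for K in itertools.combinations(range(1, T - 1), size):
            B = np.column_stack([np.ones(T), t] +
                                [np.maximum(t - k, 0.0) for k in K])
            # ridge penalty applies only to the slope-change coefficients
            pen = np.hstack([np.zeros((size, 2)), np.sqrt(lam) * np.eye(size)])
            A = np.vstack([B, pen])
            b = np.concatenate([y, np.zeros(size)])
            theta, *_ = np.linalg.lstsq(A, b, rcond=None)
            f = B @ theta
            obj = np.sum((y - f) ** 2) + lam * np.sum(theta[2:] ** 2)
            if obj < best_obj:
                best_obj, best_f, best_K = obj, f, K
    return best_obj, best_f, best_K
```

This enumeration is O(T^κ), so it is only a sanity check for small κ; the MIQP route scales better and is what the cross-validation in Section 3.3 relies on.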
That is, it is the sparse HP filter estimate obtained by solving (3.8). The only departure from (3.5) is that we replace the fidelity term Σ_{t=1}^T (y_t − f_t)² with its leave-one-out counterpart that omits observation s. We then choose (κ, λ) by minimizing the leave-one-out cross-validation criterion (3.9) over K × Λ, where Λ is the set of possible values of λ. We view λ as an auxiliary tuning parameter that mitigates the low SNR problem. Hence, we take Λ to be in the range of relatively smaller values than the typical values used for the HP filter. In the numerical work, we set Λ to be a grid of equi-spaced points on the log2 scale. We now turn to the HP, ℓ1 and square-root ℓ1 trend filters. For each filter, we choose λ such that the fidelity term Σ_{t=1}^T (y_t − f̂_t)² is the same as that of the sparse HP filter. In this way, we can compare different methods while holding the level of fit to the data fixed. Alternatively, we may choose λ by leave-one-out cross validation for each filtering method. However, in that case, it would be more difficult to make a comparison across different methods. Since our main focus is to find the kinks in the contact rate, we fine-tune all the filters to have the same level of Σ_{t=1}^T (y_t − f̂_t)², based on the sparse HP filter's cross validation result. As a benchmark, we have a first look at the US data. The dataset is obtained via the R package coronavirus (Krispin, 2020), which provides a daily summary of COVID-19 cases from the Johns Hopkins University Center for Systems Science and Engineering (Dong et al., 2020). Following Liu et al. (2020), we set the first date of the analysis to begin when the number of cumulative cases reaches 100 (that is, March 4 for the US). To smooth the data minimally, we take Y_t in (2.2) to be a three-day simple moving average of Ỹ_t, where Ỹ_t is the daily observation of Y_t constructed from the dataset. 3 Then, we take the log to obtain y_t = log Y_t. A majority of US states had imposed stay-at-home orders up to March 30, which we will call the "lockdown" date for simplicity, although there was no lockdown at the national level.
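The leave-one-out scheme can be illustrated on the plain HP filter, where dropping the term (y_s − f_s)² amounts to zeroing one weight in an otherwise linear problem. A minimal numpy sketch (our illustration; the paper applies the same idea to the sparse HP filter via MIQP, and the grid below is made up):

```python
import numpy as np

def hp_loocv(y, lam):
    """Leave-one-out CV score for the HP filter: for each s, drop
    (y_s - f_s)^2 from the fidelity term (weight zero), refit, and
    score the held-out prediction at s."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    P = lam * D.T @ D
    cv = 0.0
    for s in range(T):
        W = np.eye(T)
        W[s, s] = 0.0                       # omit observation s
        f = np.linalg.solve(W + P, W @ y)   # weighted HP problem
        cv += (y[s] - f[s]) ** 2
    return cv

# choose lambda on a small illustrative grid
grid = [0.1, 1.0, 10.0]
y = np.sin(np.linspace(0, 3, 15))
scores = [hp_loocv(y, l) for l in grid]
lam_star = grid[int(np.argmin(scores))]
```

Note that W + P stays invertible after zeroing one weight because the null space of the second-difference penalty (linear functions) cannot vanish at all remaining points.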
As a noisy measurement of β_t, Y_t shows enormous skewness and fluctuations, especially in the beginning of the study period. This indicates that the signal-to-noise ratio is low and time-varying as well. This pattern of the data has motivated Assumption 2. Because S_{t−1} is virtually one throughout the analysis period (0.994 on June 8, which is the last date of the sample), Y_t ≈ ΔC_t/I_{t−1}, which is daily positives divided by the lagged infectives. As a parametric benchmark, consider specification (4.1), where t_0 is March 30. The simple idea behind (4.1) is that an initial, time-constant contact rate began to diminish over time after a majority of US states imposed stay-at-home orders. In simple SIR models, the contact number β/γ is identical to the basic reproduction number, denoted by R_0, which is viewed as a key threshold quantity in the sense that "an infection can get started in a fully susceptible population if and only if R_0 > 1 for many deterministic epidemiology models" (Hethcote, 2000). Since β_t is time-varying in our framework, we may define a time-varying basic reproduction number by R_0(t) := β_t/γ. The top-right panel shows the estimates of the time-varying R_0(t), 4 where γ = 1/18 is taken from Acemoglu et al. (2020). This corresponds to 18 days of average infectious period. The parametric estimates of R_0(t) started above 4 and reached 0.15 at the end of the sample period. The bottom-left panel shows the residual plot in terms of y_t, and the bottom-right panel the residual plot in terms of R_0(t). In both panels, the estimated residuals seem to be biased and show autocorrelation. In particular, the positive residuals at the end of the sample period are worrisome because the resulting prediction would be too optimistic. In this section, we present estimation results for five countries: Canada, China, South Korea, the UK and the US.
These countries are not meant to be a random sample of the world; they are selected based on our familiarity with them so that we can interpret the estimated kinks with narratives. We look at the US as a benchmark country and provide a detailed analysis in Section 5.1. A condensed version of the estimation results for the other countries is provided in Section 5.2. We can see that the choice of κ seems to matter more than that of λ. Clearly, κ = 2 provides the worst result, while κ = 3 and κ = 4 are relatively similar. The LOOCV criterion function was minimized at (κ̂, λ̂) = (4, 1). We now turn to the different filtering methods. In Figure 5, we show the selection of λ for the HP, ℓ1 and square-root ℓ1 filters. As explained in Section 3.3, the penalization parameter λ is chosen to ensure that all the different methods attain the same level of fit to the data. Figure 6 shows the estimation results for the HP filter. The HP trend estimates trace the data pretty well after late March, as is clear in the residual plots. However, there is no kink in the estimates, due to the nature of the ℓ2 penalty term in the HP filter. The tuning parameter was λ = 30, which is 30 times as large as the one used in the sparse HP filter. This is because for the HP filter, λ is the main tuning parameter, whereas for the sparse HP filter, λ plays the minor role of regularizing the ℓ0-constrained method. Note: The tuning parameter λ is chosen by minimizing the distance between the two fidelities, as described in Section 3.3. The selected tuning parameters for the HP, ℓ1, and square-root ℓ1 filters are 30, 0.9, and 0.5, respectively. In Figure 7, we plot the estimation results using ℓ1 trend filtering. The results look similar to those in Figure 6; however, ℓ1 trend filtering yields kinks, which we define as the points t where |Δ² log β̂_t| > η, where Δ² is the double difference operator and η = 10⁻⁶ is an effective zero. 5 The tuning parameter λ = 0.9 was chosen by minimizing the distance between the fidelity of the ℓ1 filter and that of the sparse HP filter.
Recall that the sparse HP filter produces kinks on March 16, March 20, April 14, and May 13. In other words, the ℓ1 filter estimates six more kinks than the sparse HP filter when both fit the data equally well. It is unlikely that two adjacent dates (March 15-16 and March 20-21) correspond to two different regimes in the time-varying contact rate. This suggests that the ℓ1 filter may over-estimate the number of kinks. Figure 8 shows the estimation results for the square-root ℓ1 trend filter. The chosen λ = 0.5 was smaller than that of the ℓ1 trend filter due to the change in the scale of the fidelity term; however, the trend estimates look very similar, and the estimated kinks are identical between the ℓ1 and square-root ℓ1 trend filters. In Figure 9, we plot the sparse HP filter estimates along with the ℓ1 filter estimates. Both methods have produced very similar trend estimates, but the number of kinks is substantially different: only 4 kinks for the sparse HP filter but 10 kinks for the ℓ1 filter. In this section, we provide condensed estimation results for the other countries. We focus on the sparse HP and ℓ1 filters, whose tuning parameters are chosen as in the previous section. Appendices B and C contain the details of the selection of tuning parameters. The results are robust to the size of the effective zero and do not change even if we set η = 10⁻³. Gurobi, which is used for the sparse HP filtering, also imposes some effective zeros in various constraints; we use their default values. For example, the integer tolerance level and the general feasibility tolerance level are 10⁻⁵ and 10⁻⁶, respectively. A state of emergency was announced in Ontario on March 17, and all nonessential businesses were ordered to close on March 23 (Global News, 2020).
We set the lockdown date in Canada to March 13, as the federal government and other provincial governments started to strongly recommend social distancing measures, along with the cancellation of various events, on that date (CBC News, 2020). These tight lockdown and social distancing measures seem to have contributed to the sharp decline of the contact rate in the second period. Both governments started to announce plans to lift the lockdown measures at the end of April, which corresponds to the third period. Lockdown fatigue may also have caused the slower decrease of the contact rate. In sum, a series of social distancing measures has been effective in decreasing the contact rate, but with some lags. The sparse HP filter separates these periods reasonably well. However, the ℓ1 filter overfits the model with 5 kinks. 3. March 14 - March 24: This period shows a V-turn of the contact rate on the log β_t scale. It also shows an upward trend on the R_0(t) scale, but the level is lower than that in early February. The mass quarantine of Wuhan was partially lifted on March 19 (Bloomberg, 2020). Most provinces downgraded their public health emergency response level, and factories and stores started to reopen in this period. 4. March 24 - April 18: The contact rate still increased, but at a lower rate. It started to decrease again at the end of this period. We can see a slight increase in R_0(t). The mass quarantine of Wuhan was lifted further, and travel to other provinces was allowed on April 8 (Bloomberg, 2020). The contact rate went down quickly and was flattened at a low level. The last hospitalized COVID-19 patient in Wuhan was discharged on April 26 (Xinhua, 2020). Overall, the trend of the contact rate is quite similar to those of the US and Canada. The kinks are located mostly in the initial periods, and the series shows a steady downward trend after the prime minister's lockdown announcement.
This results in a smooth curve on the R_0(t) scale. The trend estimates of the ℓ1 filter are almost identical to those of the sparse HP filter; however, the ℓ1 filter indicates 10 kinks, which seems excessive. The sparse HP filter produces kinks where the slope changes on the log β_t scale, thereby providing a good surveillance measure for monitoring the ongoing epidemic situation. Policy responses are based on various scenarios, and the contact rate is one of the most important measures that determine different developments. As a summary statistic of the time-varying contact rate, we propose to consider the time-varying growth rate of the contact rate, which we call the contact growth rate: ξ(t) := Δ log β_t = log β_t − log β_{t−1}. (5.1) Recall that we have defined the time-varying basic reproduction number by R_0(t) = β_t/γ. Because γ is fixed over time, we have that Δ log R_0(t) = Δ log β_t = ξ(t). The growth rates, expressed as percentages, are obtained by (5.1) using the sparse HP trend estimates. The contact growth rates are therefore also growth rates of R_0(t). The kink dates separating distinct periods are different for each country, and they are reported in Sections 5.1 and 5.2. Definition (5.1) involves only simple algebra, which implies that ξ(t) will be piecewise constant if log β_t is piecewise linear. This simple algebraic relationship shows that a change in the slope at a kink on the log β_t scale translates into a break in the time-varying contact growth rate, and therefore in the growth rate of the time-varying basic reproduction number. When ξ(t) is a large positive number, that is a warning signal for policymakers. On the contrary, if ξ(t) is a large negative number, that may suggest that the policy measures imposed before have been effective in reducing contagion. Table 1 reports the time-varying contact growth rates in the five countries that we investigate, using the sparse HP trend estimates.
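A small numerical check of this algebra (ours; the slopes and dates below are made-up illustrative numbers): differencing a piecewise linear log β_t yields a piecewise constant ξ(t), and since γ is constant, R_0(t) inherits exactly the same growth rates.

```python
import numpy as np

t = np.arange(20, dtype=float)
# piecewise linear log contact rate: slope 0.05, then -0.08 after t = 10
log_beta = np.where(t < 10, 0.05 * t, 0.5 - 0.08 * (t - 10))
xi = np.diff(log_beta)              # contact growth rates, as in (5.1)
gamma = 1.0 / 18.0
log_R0 = log_beta - np.log(gamma)   # log R0(t) = log beta_t - log gamma
assert np.allclose(np.diff(log_R0), xi)                   # same growth rates as R0(t)
assert np.allclose(xi[:10], 0.05) and np.allclose(xi[10:], -0.08)  # piecewise constant
```

Multiplying ξ(t) by 100 gives the percentage growth rates reported in Table 1; a kink in log β_t shows up as a discrete break in this series.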
For the US, the explosive growth rate of 7.5% in the second period is followed by the negative growth rates of −7.7%, −3.4%, and −1%. However, it will be mainly useful for a short-term projection of the contact growth rate, because it is not designed to make long-term trend predictions. In this section, we examine theoretical properties of the sparse HP and ℓ1 filters in terms of risk consistency. Let ‖·‖_0 denote the usual ℓ0-(pseudo)norm, that is, the number of nonzero elements, and let ‖·‖_r and ‖·‖_∞, respectively, denote the ℓ_r norm for r = 1, 2 and the sup-norm. Define the class F as in (6.1), and let f* denote the ideal sparse filter in the sense that it minimizes the risk over F. Let f̂ denote the sparse HP filter defined in Section 3.2. Then, the excess risk R(f̂, f*) in (6.2) is always nonnegative. Following the literature on empirical risk minimization, we bound the excess risk R in (6.2) and establish conditions under which it converges to zero. Recall that the sparse HP filter minimizes the sample objective function. Therefore, it suffices to bound the two terms above. For the second term, we can use (3.6) and (6.1) to bound it. We summarize the discussion above in the following lemma. Lemma 6.1. Let f̂ denote the sparse HP filter. Then, the excess risk R(f̂, f*) satisfies the bound displayed above. To derive an asymptotic result, we introduce subscripts indexed by the sample size T when necessary for clarification. Let G_κ denote the set of all continuous and piecewise linear functions whose slopes and levels are bounded by C_1 and C_2, respectively, and whose number of kinks is bounded by κ. Assumption 4. Assume that F in (6.1) satisfies the stated conditions, where log β_t = f*_{T,t}, f*_T ∈ F_T, and u_t satisfies sup_{t=1,2,...} E|u_t|^p < ∞ for some p ≥ 2 and Assumption 2. Finally, λκT^{−(1−1/p)} → 0 as T → ∞. Then, we have the following proposition. Proposition 6.1. Let Assumption 4 hold. Then, we have that, as T → ∞, conditions (6.4) and (6.5) hold, and (λκ/T) max_{t=2,...,T−1} |y_{t−1} − 2y_t + y_{t+1}|² →_p 0. Therefore, R(f̂, f*) →_P 0. Proposition 6.1 establishes consistency in terms of the excess risk R.
Assumption 4 provides sufficient conditions for (6.4) and (6.5). Condition (6.4) is a uniform law of large numbers for the class F, and condition (6.5) imposes a weak condition on λ. Proposition 6.1 follows immediately from Lemma 6.1 once (6.4) and (6.5) are established. Proof of Proposition 6.1. Note that the summand in (6.4) can be rewritten so that its sample average vanishes due to the law of large numbers (LLN) for a martingale difference sequence (mds). We now turn to sup_{f∈F} T^{−1} Σ_{t=1}^T f_t u_t. The marginal convergence is straightforward, since f_t u_t is an mds with bounded second moments, due to the LLN for mds. Next, note that the bound above implies the stochastic equicontinuity of the process indexed by f ∈ F_T. Finally, recall the Arzelà–Ascoli theorem (see, e.g., Van Der Vaart and Wellner, 1996) to conclude that G_κ is totally bounded with respect to |·|_∞. Therefore, (6.4) follows by a generic uniform convergence theorem, e.g., Andrews (1992). To show condition (6.5), note that it is bounded by 16(λκ/T) max_{t=1,...,T} y_t², which is in turn O_p(λκT^{−(1−1/p)}) due to the moment condition on u_t. The ℓ1 trend filtering problem (3.1) can be expressed as a LASSO problem. We now derive the deviation bound for ‖f̂ − f*‖_2. First, the problem is equivalent to a regular LASSO problem, as stated in Lemma 6.2 below. Write D = (D_3, D_2), where D_2 has two columns. Additionally, partition G = (G_2, g_1), where g_1 is T × 2, G_2 is T × (T − 2), and the remaining zero block is 2 × (T − 2). Let P_{g_1} = g_1(g_1′g_1)^{−1}g_1′. Lemma 6.2. We have f̂ = y − ỹ + Xθ̂, where ỹ := (I − P_{g_1})y, X := (I − P_{g_1})G_2 and θ̂ := arg min_θ ‖ỹ − Xθ‖²_2 + λ‖θ‖_1. Proof of Lemma 6.2. Let D_1 = (0 : I_2) be a 2 × T matrix, so that the stacked matrix D̃ = (D_3′, D_1′)′ is upper triangular and invertible. Then G := D̃^{−1} = (G_2, g_1). Then, for a generic f ∈ R^T, we can define α := D̃f and partition it as α = (θ′, a′)′. So (θ, a) also depends on f, and Gα = g_1 a + G_2 θ. Then the problem is equivalent to: f̂ = g_1 â + G_2 θ̂, where (â, θ̂) := arg min_{a,θ} ‖y − (g_1 a + G_2 θ)‖²_2 + λ‖θ‖_1.
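The reparameterization underlying Lemma 6.2 — mapping f to its second differences plus two boundary values via an invertible matrix — can be checked numerically. A small sketch (ours; the matrix names follow the text, but the construction details are our own):

```python
import numpy as np

T = 8
D3 = np.zeros((T - 2, T))
for t in range(T - 2):
    D3[t, t:t + 3] = [1.0, -2.0, 1.0]              # second-difference rows
D1 = np.hstack([np.zeros((2, T - 2)), np.eye(2)])  # picks out f_{T-1}, f_T
Dt = np.vstack([D3, D1])                           # stacked map; invertible
G = np.linalg.inv(Dt)
assert np.allclose(Dt @ G, np.eye(T))

# any f is recovered from (its second differences, its last two values)
f = np.arange(T, dtype=float) ** 2
alpha = Dt @ f
assert np.allclose(G @ alpha, f)
```

Because the map is a bijection, penalizing the ℓ1 norm of the second-difference coordinates θ while leaving the two boundary coordinates a unpenalized is exactly a LASSO problem after concentrating out a, as the lemma states.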
To solve the problem, we concentrate out a: given θ, the optimal a is (g_1′g_1)^{−1}g_1′(y − G_2θ), and the optimal g_1 a is P_{g_1}(y − G_2θ). Substituting, the problem becomes a regular LASSO problem: min_θ ‖ỹ − Xθ‖²_2 + λ‖θ‖_1. Here, {f_{0,t} : t = 1, . . . , T} denote the true elements of f. For a generic vector θ ∈ R^{T−2}, let θ_J and θ_{J^c}, respectively, be its subvectors whose elements are in J and J^c. Now we define the restricted eigenvalue constant. Proposition 6.2. Let f* denote the true value of f and u := y − f*. Suppose the event 2.5‖X′u‖_∞ < λ holds. Then the stated deviation bound follows, where in the first equality we used the mean value theorem for some f̃_t between f̂_t and f*_t. Therefore, the analogous result holds for exp(f̂). We have developed a novel method to estimate the time-varying COVID-19 contact rate using data on actively infected, recovered and deceased cases. Our preferred method, called the sparse HP filter, has produced kinks that are well aligned with actual events in each of the five countries we have examined. We have also proposed contact growth rates to document and monitor outbreaks. Theoretically, we have outlined the basic properties of the sparse HP and ℓ1 filters in terms of risk consistency. The next step might be to establish a theoretical result that distinguishes between the two methods by looking at kink selection consistency. It would also be important to develop a test for the presence of kinks, as well as an inference method on the location and magnitude of kinks and on contact growth rates. In the context of nonparametric kernel regression of the trend function, Delgado and Hidalgo (2000) explored the distribution theory for jump estimates but did not offer testing for the presence of a jump. Compared to the kernel smoothing approach, it is easier to determine the number of kinks using our approach, as we have demonstrated.
Furthermore, the linear trend specification is more suitable for forecasting immediate future outcomes, at least until the next kink arises. Long-term prediction is more challenging and beyond the scope of this paper. Finally, it would be useful to develop a panel regression model for the contact rate at the city, state or country level. These are interesting topics for future research. In Section 2, it is assumed that we observe (C_t, R_t, D_t). In this appendix, we show that our time series model in Section 2 is robust to some degree of under-reporting of positive cases. Assume that what we observe is only a fraction of the changes in C_t. This assumption reflects the reality that the daily reported number of newly positive cases of COVID-19 is likely to be under-reported. Suppose that we observe Δc_t in period t such that Δc_t := ρΔC_t, where 0 < ρ < 1 is unknown. Then, assuming that c_0 = C_0 = 0, summing over periods gives c_t = ρC_t. In words, ρ is the constant ratio between reported and true cases. Formally, we make the following assumption. Assumption 5 (Fraction Reporting). For each t, we observe (c_t, r_t, d_t) such that c_t := ρC_t, r_t := ρR_t and d_t := ρD_t, where 0 < ρ < 1. The two simplifying conditions in Assumption 5 are that (i) ρ is identical across the three time series and (ii) ρ is constant over time. In reality, the fraction of reported deaths might be higher than that of reported cases, and ρ might be time-varying, especially at the beginning of the pandemic, due to capacity constraints in testing. However, we believe that ρ is unlikely to vary over time as much as β_t does; thus, we take a simple approach to minimize complexity. The common ρ can be thought of as a broad measure of the ability to detect COVID-19 in a community. Define i_t := c_t − r_t − d_t and s_t := 1 − c_t.
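Assumption 5 implies that the per-period growth measure Δc_t/i_{t−1} is unaffected by the reporting fraction ρ, since ρ cancels in the ratio. A minimal numerical sanity check (stylized, hypothetical series rather than real case counts):

```python
import numpy as np

# Sanity check of Assumption 5 (Fraction Reporting) with stylized,
# hypothetical series: scaling (C, R, D) by any reporting fraction rho
# leaves the growth measure Delta c_t / i_{t-1} unchanged, because rho
# cancels in the ratio.
rng = np.random.default_rng(0)

T = 50
C = np.cumsum(rng.integers(1, 100, size=T).astype(float))  # true cumulative cases
R, D = 0.6 * C, 0.05 * C                                   # stylized recovered/deceased

def growth(C, R, D):
    """g_t = (C_t - C_{t-1}) / I_{t-1}, where I = C - R - D is active cases."""
    I = C - R - D
    return np.diff(C) / I[:-1]

g_true = growth(C, R, D)
for rho in (0.1, 0.3, 0.9):
    # reported series under Assumption 5: c = rho*C, r = rho*R, d = rho*D
    assert np.allclose(growth(rho * C, rho * R, rho * D), g_true)
```

The same cancellation fails for s_{t−1} = 1 − ρC_{t−1}, which is why the measurement error appears only through the susceptible fraction.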
Under Assumption 5, the reported fraction infected at time t (i_t) underestimates the true fraction infected, while the reported fraction susceptible at time t (s_t) overestimates the true fraction susceptible. Note that g_t := Δc_t/i_{t−1} = ρΔC_t/(ρI_{t−1}) = ΔC_t/I_{t−1}. However, s_{t−1} = 1 − ρC_{t−1}, which differs from the true fraction 1 − C_{t−1}. In words, we have a measurement error problem for s_{t−1} but not for g_t. It follows from (2.2) that the observed g_t and s_{t−1} are related by (A.2). The right-hand side of (A.2) is likely to exhibit an increasing trend since C_{t−1} is the cumulative fraction ever infected. To alleviate this problem, we divide both sides of (A.1) by c_{t−1}, which is positive, to obtain (A.3). On the one hand, if ρ = 1, (A.3) is identical to (2.2). On the other hand, if ρ → 0, the term inside the brackets on the right-hand side of (A.3) diverges to infinity. In the intermediate case, the behavior depends on the relative size of s_{t−1}/c_{t−1} and (ρ − 1)/ρ. We use the UK data, together with the estimate by the Office for National Statistics (2020), to argue that the latter is negligible relative to the former.

Figure note: The red dashed line denotes the equalizer between the fidelity of the sparse HP filter and that of the ℓ1 filter: λ = 4.9 for Canada, λ = 8.9 for China, λ = 3.0 for South Korea, and λ = 2.7 for the UK. The analysis period ends when the number of newly confirmed cases, averaged over 3 days, falls below 10: April 26 (China) and April 29 (South Korea).

References
- Optimal targeted lockdowns in a multi-group SIR model
- A simple planning problem for COVID-19 lockdown
- Generic uniform convergence
- What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios
- COVID-19 doesn't need lockdowns to destroy jobs: The effect of local outbreaks in Korea
- Policy implications of models of the spread of coronavirus: Perspectives and opportunities for economists
- Boris Johnson speech: PM unveils 'conditional plan' to reopen society
- Coronavirus: Boris Johnson's address to the nation in full
- Square-root lasso: pivotal recovery of sparse signals via conic programming
- Best subset selection via a modern optimization lens
- Sparse high-dimensional regression: Exact scalable algorithms and phase transitions
- China to lift lockdown over virus epicenter Wuhan on April 8
- Boosting with the L2 loss
- Coronavirus: Here's what's happening in Canada and around the world on March 13
- Best subset binary prediction
- Binary classification with covariate selection through ℓ0-penalized empirical risk minimization
- Causal impact of masks, policies, behavior on early COVID-19 pandemic in the U.S.
- The explicit formula for the Hodrick-Prescott filter in a finite sample
- Covid-19 in Quebec: A timeline of key dates and events
- The econometrics of the Hodrick-Prescott filter
- Nonparametric inference on structural breaks
- An interactive web-based dashboard to track COVID-19 in real time
- The macroeconomics of epidemics
- Estimating and simulating a SIRD model of COVID-19 for many countries, states, and cities
- CVXR: An R package for disciplined convex optimization
- A timeline of the novel coronavirus in Ontario
- Why you should never use the Hodrick-Prescott filter
- Measuring the impact of the German public shutdown on the spread of Covid-19. Covid Economics, Vetted and Real-Time Papers
- Time series models based on growth curves with applications to forecasting coronavirus. Covid Economics, Vetted and Real-Time Papers
- Extended comparisons of best subset selection, forward stepwise selection, and the lasso
- The mathematics of infectious diseases
- Postwar U.S. business cycles: An empirical investigation
- A constructive approach to ℓ0 penalized regression
- ℓ1 trend filtering
- Estimating a breakpoint in the pattern of spread of COVID-19 in South Korea
- The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset
- When will the Covid-19 pandemic peak? Working Paper 2020/11, University of Cambridge
- Panel forecasts of country-level Covid-19 infections
- Covid-19 and the macroeconomic effects of costly disasters
- Bounding the predictive values of COVID-19 antibody tests
- Estimating the COVID-19 infection rate: Anatomy of an inference problem
- Subset selection with shrinkage: Sparse linear modeling when the SNR is low
- Coronavirus (COVID-19) Infection Survey pilot
- Boosting: Why you can use the HP filter
- COVID-19 and the welfare effects of reducing contagion
- On adjusting the Hodrick-Prescott filter for the frequency of observations
- Data gaps and the policy response to the novel coronavirus
- How South Korea flattened the curve
- How the coronavirus pandemic unfolded: a timeline
- See how all 50 states are reopening
- See which states and cities have told residents to stay at home
- Adaptive piecewise polynomial estimation via trend filtering
- Susceptible-infected-recovered (SIR) dynamics of Covid-19 and economic impact. Covid Economics, Vetted and Real-Time Papers
- Weak convergence
- Trend filtering on graphs
- Fighting Covid-19, China in action

The event 2.5‖ũ'X‖_∞ < λ in Proposition 6.2 is assumed to hold; then, on this event, the deviation bound (6.6) below obtains. Proof of Proposition 6.2. Let θ* = Df*. Consider the vector form of the model y = f* + u. Then ỹ = Xθ* + ũ, where ũ = (I − P_{g_1})u. By Lemma 6.2, f̂ = y − ỹ + Xθ̂, where θ̂ := argmin_θ ‖ỹ − Xθ‖_2^2 + λ‖θ‖_1. The standard argument for the LASSO deviation bound implies, on the event 2.5‖ũ'X‖_∞ < λ, the bound in (6.6). To achieve risk consistency, λ has to be chosen to make the second term on the right-hand side of (6.6) asymptotically small and to ensure that the event 2.5‖ũ'X‖_∞ < λ holds with high probability. The first term on the right-hand side of (6.6) will converge to zero under mild conditions on u.
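The reduction in Lemma 6.2 is easy to check numerically. The sketch below uses simulated data and a plain ISTA loop as a generic LASSO solver (both illustrative assumptions, not the paper's implementation): it builds the second-difference operator D, the invertible D̃, and G = D̃^{-1} = (G_2, g_1), solves the concentrated LASSO, and verifies the algebraic identity Df̂ = θ̂ behind the reduction.

```python
import numpy as np

# Sketch of the Lemma 6.2 reduction for the l1 trend filter
#   min_f ||y - f||_2^2 + lam * ||D f||_1,
# with D the second-difference operator. Simulated data; ISTA is used
# here only as a generic LASSO solver (an illustrative choice).
rng = np.random.default_rng(0)

T = 60
t = np.arange(T, dtype=float)
f_true = np.where(t < 30, 0.1 * t, 3.0 - 0.05 * (t - 30))  # piecewise-linear trend
y = f_true + rng.normal(0.0, 0.3, size=T)

D = np.diff(np.eye(T), n=2, axis=0)        # (T-2) x T second differences
D1 = np.zeros((2, T)); D1[:, -2:] = np.eye(2)
Dt = np.vstack([D, D1])                    # upper triangular, invertible
G = np.linalg.inv(Dt)
G2, g1 = G[:, :T - 2], G[:, T - 2:]        # G = (G2, g1)

P = g1 @ np.linalg.solve(g1.T @ g1, g1.T)  # projection onto col(g1)
y_til = y - P @ y                          # (I - P) y
X = G2 - P @ G2                            # (I - P) G2

# ISTA for the concentrated LASSO: min_theta ||y_til - X theta||^2 + lam ||theta||_1
lam = 10.0
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1/L, L = Lipschitz const of gradient
theta = np.zeros(T - 2)
for _ in range(20000):
    z = theta + 2.0 * step * (X.T @ (y_til - X @ theta))
    theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

f_hat = y - y_til + X @ theta              # Lemma 6.2: f_hat = P y + X theta

obj = lambda th: np.sum((y_til - X @ th) ** 2) + lam * np.sum(np.abs(th))

# Algebraic identities behind the reduction
assert np.allclose(Dt @ G, np.eye(T), atol=1e-8)
assert np.allclose(D @ f_hat, theta, atol=1e-6)  # D f_hat recovers theta exactly
# Monotone ISTA: objective no worse than at the starting point theta = 0
assert obj(theta) <= obj(np.zeros(T - 2)) + 1e-9
```

The second assertion is the key point of the lemma: because DG_2 = I and Dg_1 = 0, applying D to f̂ = P_{g_1}y + Xθ̂ returns θ̂, so the kinks of the fitted trend are exactly the nonzero entries of the LASSO coefficient.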
It is reassuring that the ℓ1 trend filter fits COVID-19 data well in our empirical results. In this subsection, we obtain risk consistency of the exponentiated estimates exp(f̂_t) from both filters. To do so, we first rewrite the excess risk in (6.2) in terms of f = (f_1, ..., f_T)' and f* = (f*_1, ..., f*_T)'. We have proved in the previous sections that R(f̂, f*) →_P 0 for both the sparse HP and ℓ1 trend filter estimates. Then, under the assumption that there exists a constant C < ∞ such that max_t |f_t| + max_t |f*_t| < C for all {f_t} on the parameter space, we get, uniformly for all f on the parameter space, |exp(f_t) − exp(f*_t)| ≤ e^C |f_t − f*_t|, so that risk consistency of the exponentiated estimates follows from that of the estimates themselves.
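The Lipschitz step underlying this argument can be verified numerically. A minimal check with hypothetical simulated values, where each sequence is bounded by C (a slight simplification of the condition max_t |f_t| + max_t |f*_t| < C):

```python
import numpy as np

# Numerical check of the Lipschitz step behind the exp(.) risk-consistency
# argument: on a parameter space bounded by C, the mean value theorem gives
# |exp(a) - exp(b)| <= e^C |a - b|, so the squared-error risk of the
# exponentiated estimate is at most e^(2C) times the original risk.
# (Hypothetical simulated values, not the paper's data.)
rng = np.random.default_rng(1)

C = 2.0
T = 200
f_star = rng.uniform(-C, C, size=T)            # "true" bounded sequence
f_hat = f_star + rng.normal(0, 0.05, size=T)   # an estimate close to it
f_hat = np.clip(f_hat, -C, C)                  # keep it on the parameter space

risk = np.mean((f_hat - f_star) ** 2)
risk_exp = np.mean((np.exp(f_hat) - np.exp(f_star)) ** 2)

# pointwise Lipschitz bound and its risk-level consequence
assert np.all(np.abs(np.exp(f_hat) - np.exp(f_star))
              <= np.exp(C) * np.abs(f_hat - f_star) + 1e-12)
assert risk_exp <= np.exp(2 * C) * risk + 1e-12
```

So if the excess risk of f̂ converges to zero in probability, the excess risk of exp(f̂) does as well, with the constant e^{2C} absorbed into the rate.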