key: cord-1005280-oks8ziws
authors: Buckman, Shelby R.; Glick, Reuven; Lansing, Kevin J.; Petrosky-Nadeau, Nicolas; Seitelman, Lily M.
title: Replicating and projecting the path of COVID-19 with a model-implied reproduction number
date: 2020-08-28
journal: Infect Dis Model
DOI: 10.1016/j.idm.2020.08.007
sha: 538ce8da96ca77d07cbedd553f9d7e7317314bbd
doc_id: 1005280
cord_uid: oks8ziws

We demonstrate a methodology for replicating and projecting the path of COVID-19 using a simple epidemiology model. We fit the model to daily data on the number of infected cases in China, Italy, the United States, and Brazil. These four countries can be viewed as representing different stages, from later to earlier, of a COVID-19 epidemic cycle. We solve for a model-implied effective reproduction number R t each day so that the model closely replicates the daily number of currently infected cases in each country. For out-of-sample projections, we fit a behavioral function to the in-sample data that allows for the endogenous response of R t to movements in the lagged number of infected cases. We show that declines in measures of population mobility tend to precede declines in the model-implied reproduction numbers for each country. This pattern suggests that mandatory and voluntary stay-at-home behavior and social distancing during the early stages of the epidemic worked to reduce the effective reproduction number and mitigate the spread of COVID-19.

As of July 19, 2020, the ongoing COVID-19 pandemic has infected nearly 15 million people worldwide, accounting for over 600,000 deaths. The two hardest-hit nations are the United States and Brazil, as measured by the total number of confirmed cases. In recent months, epidemiology models have been used to project the path of the epidemic in different locations and help guide decisions about public health interventions.

This paper demonstrates a methodology for replicating and projecting the path of COVID-19 using a simple epidemiology model. We fit a standard compartmental epidemiology model (called a SEIR model) to daily data on the number of COVID-19 infected cases and closed cases (recovered or deceased) in four countries: China, Italy, the United States, and Brazil. These four countries can be viewed as representing different stages, from later to earlier, of a COVID-19 epidemic cycle. China (specifically Hubei Province) has experienced a nearly complete epidemic cycle in which the number of COVID-19 infected cases dropped to a value of only 55 on June 10. Italy is three months beyond its peak number of infected cases that occurred on April 19. The number of infected cases in both the United States and Brazil continues to increase. In the United States, the number of infected cases reached a local peak on May 30. But after trending down for five days, the number of infected cases reversed course and has continued to rise through the end of our data sample on July 19. The trailing 7-day average daily growth rate of infected cases in the United States started trending up in the first week of June, but has recently leveled off at a value near 1.5%. In Brazil, the trailing 7-day average daily growth rate of infected cases is also near 1.5%, but the growth rate is more volatile than in the United States.

In addition to representing different stages of the COVID-19 epidemic, the four countries that we examine represent different magnitudes in the total number of cases (infected plus closed). China has recorded only about 84,000 total cases, whereas Italy has nearly three times that number. In contrast, the total numbers of cases in the United States and Brazil are currently about 3.9 million and 2.1 million, respectively.

Based on epidemiological evidence, we calibrate the incubation period for COVID-19 (the average time between exposure and subsequent infection) to be 5.1 days for each country. Based on the nearly complete epidemic cycle for China, we calibrate the illness duration parameter (the average time between infection and either recovery or death) to be 20 days for each country. This value allows the SEIR model's law of motion for China to approximately match the end-of-sample number of closed cases on July 19. We introduce an additional country-specific parameter in the law of motion for closed cases so that we can exactly match the end-of-sample smoothed number of closed cases in each country. The additional parameter allows us to capture cross-country differences in the reporting of recoveries or deaths that can influence the transition rate from infected cases to closed cases. For the out-of-sample projections, we assume that the additional parameter converges towards 1.0 in a manner that approximates the quasi-real time trajectory of the calibrated value for China.

Given the model parameter values, we solve for the model-implied reproduction number R t each day so that our SEIR model exactly replicates a centered 7-day moving average of the number of infected cases in each country.

We use smoothed data in place of the raw data for this computation because it helps to reduce the sensitivity of the model's out-of-sample projections to daily fluctuations in new infected cases. But in-sample, the model continues to closely replicate the raw number of infected and closed cases in each country.

During the early stages of the epidemic, the model-implied R t is typically large and volatile to capture the rapid and uneven growth in the number of infected cases. But as the epidemic progresses, the model-implied R t tends to decline and become less volatile, providing a daily indicator that can track the degree to which mandatory or voluntary actions by individuals may be helping to mitigate the spread of the disease. Our model-implied reproduction number should not be interpreted literally as the average number of secondary infections per infected case, as usually defined in the epidemiology literature. Rather, the model-implied reproduction number can be interpreted as the analog to the "Solow residual" in economics, acting as a stand-in for whatever time-varying model complexities are needed to closely replicate the observed time series of infected cases. 6

For the out-of-sample projections, we fit a behavioral function to the in-sample data that allows for the endogenous response of R t to movements in the lagged number of infected cases. The function captures the idea that a rising number of infections will trigger a behavioral response by individuals or health authorities that helps to mitigate the spread of the disease. Our methodology allows us to make projections about the future path of the epidemic while closely replicating the in-sample data. Nevertheless, we wish to emphasize that our out-of-sample projections are subject to enormous uncertainty and can sometimes shift by large amounts from one week to the next, depending on recent incoming data. We illustrate this important point with a quasi real-time experiment in which we plot a sequence of out-of-sample projections for China and the United States using different end-of-sample starting points for the projections. Given the wide range of estimates for COVID-19 fatality rates, we do not attempt to separately project recoveries versus deaths, but we do report some statistics on closed case fatality rates and estimates of more refined fatality rates from other studies.

6 The model-implied R t can be viewed as a reverse-engineered stochastic shock. For examples of this approach in economics, see Gelain, Lansing, and Natvik (2018) and Lansing (2019).

The COVID-19 scenarios examined here are intended to demonstrate our methodology and provide a qualitative view of potential epidemic trajectories in a small sample of selected countries. The out-of-sample projections should not be viewed as definitive forecasts. 7 At the end of our raw data sample on July 19, the epidemic cycle in China [...]

Finally, we show that declines in measures of population mobility tend to precede declines in the model-implied reproduction numbers for each country. This pattern suggests that mandatory and voluntary stay-at-home behavior and social distancing during the early stages of the epidemic worked to reduce the effective reproduction number and mitigate the spread of COVID-19. More recently, measures of population mobility have been trending upwards in all four countries. This pattern reflects both the relaxation of mandatory containment measures and increased voluntary mobility. But as of July 19, a resurgence of new infections in some areas of the United States has triggered a reinstatement of some containment measures, consistent with our behavioral hypothesis. At the end of our data sample, measures of population mobility for the United States appear to have plateaued at a level that is below the pre-epidemic baseline.

The number of new COVID-19 related research papers is growing in a manner that may rival the growth rate of the disease itself. It is not possible to summarize the many related contributions to the literature, whether in epidemiology, economics, or other fields. Nevertheless, we wish to highlight some known contributions that employ methods that appear closely related to our approach. Kucinskas (2020) and Arroyo-Marioli, Bullano, and Rondón-Moreno (2020) employ SIR models and data on the number of infected cases to infer the time path of the effective reproduction number in various countries using a Kalman filter that treats the reproduction number as an unobserved component. Beenstock and Dai (2020) compute daily values of the effective reproduction number in various countries using a "perpetual inventory method" that cumulates the number of infected cases over time while assuming a fixed period of contagiousness for each infected case. Dandekar and Barbastathis (2020) allow for time variation in their SEIR model-implied reproduction number by introducing a new variable called the "strength of quarantine." They solve for the time path of the unobserved quarantine variable and other parameters to produce a best fit of the number of infected and recovered cases in various locations. Toda (2020) estimates values of the COVID-19 transmission rate for many countries by fitting a SIR model to daily data on the fraction of confirmed cases in the population.

As discussed by Ma (2020), "phenomenological models," or curve-fitting approaches, represent an alternative to epidemiology models when forecasting the evolution of an epidemic. An influential example of this approach applied to COVID-19 is the model developed by the University of Washington's Institute for Health Metrics and Evaluation (IHME 2020). Other recent examples include Roosa et al. (2020), Li and Linton (2020), Liu, Moon, and Schorfheide (2020), and Harvey and Kattuman (2020).

A COVID-19 forecasting model developed by Atkeson, Kopecky, and Zha (2020) combines a curve-fitting approach with a simple SIRD model. Specifically, they fit a smooth curve to daily data on the cumulative number of deaths in a given location and then solve for the values of the model parameters (including initial conditions) and time paths of the model variables (including the effective reproduction number) so as to exactly replicate the smoothed curve of cumulative deaths. Fernández-Villaverde and Jones (2020) adopt a similar approach by inverting a simple SIRD model to solve for the time path of the effective reproduction number that causes the model to replicate the smoothed number of cumulative and daily deaths in various locations. In both papers, the number of infected and recovered cases is inferred from the model; only the number of deaths is considered observable.

In contrast, our approach closely replicates the number of infected and closed cases (recovered or deceased) in the data. In reality, data on the number of infections, recoveries, or deaths are all measured with error, so in the end, it comes down to which variables the model builder chooses to replicate. Atkeson (2020a) and Stock (2020) present epidemiology model simulations for different "flattening the curve" strategies that define the out-of-sample trajectory of the effective reproduction number. Eichenbaum, Rebelo, and Trabandt (2020), among a long list of others, explicitly model the welfare-maximizing choices of individuals and policymakers that, in turn, influence the economic and epidemiological consequences of the disease. Atkeson (2020b), Korolev (2020), and Fernández-Villaverde and Jones (2020) each demonstrate that different sets of epidemiology model parameters can fit the in-sample data equally well, yet imply markedly different long run forecasts. Our quasi real-time projections make a similar point. Hong, Wang, and Yang (2020) consider an epidemiology model in which the effective reproduction number is subject to stochastic shocks. They show that, relative to the deterministic version of the same model, the stochastic version can predict a substantially lower number of infections, even at horizons beyond 12 months.

The remainder of the paper is organized as follows. Section 2 presents the model, followed by the derivation of the model-implied reproduction number in section 3. The data, parameter values, and initial conditions are discussed in section 4. Section 5 shows time series plots of the model-implied reproduction numbers for China, Italy, the United States, and Brazil. Out-of-sample projections for each country are presented in section 6. Time series plots of population mobility indices versus model-implied reproduction numbers are presented in section 7.

The appendix outlines an extended version of our model that includes asymptomatic infected cases.

The canonical SEIR model of epidemics divides the population N into four compartments: Susceptible S t, Exposed E t (but not yet infected due to an incubation period), Infected I t, and Removed (or Resolved) R t, representing closed cases, i.e., those who are either recovered or deceased. Homogeneous random mixing between susceptible and infected individuals creates exposed individuals who later fall ill at the end of a disease incubation period.

Infected individuals experience a period of illness, after which they may either recover or die. At the beginning of an epidemic, the share of the population susceptible to infection is high. The share of the population that is infected accelerates as each infected person can infect more than one other person. The number of new infected cases eventually slows as there are fewer susceptible individuals to infect and more individuals who have become non-infectious because they recover or die. The basic model employed here does not separate recoveries from deaths.

The propagation of an epidemic depends crucially on the daily transmission rate β t . The value of β t may be influenced by public health measures known as non-pharmaceutical interventions (NPIs) or by the endogenous response of the population as awareness of the disease grows. Other model parameters include σ, the rate at which exposure leads to infection (the inverse of the incubation period) and γ, the rate of recovery or death (the inverse of the illness duration). Epidemiological models frequently refer to a "basic reproduction number," denoted by R 0 ≡ β 0 /γ. This is the number of secondary infections that one infected case produces in a fully susceptible population at t = 0 through the duration of the infectious period (given by 1/γ). As the epidemic evolves (t > 0), the number of susceptible individuals in the population is reduced. For t > 0, we define the effective reproduction number as R t ≡ β t /γ (also called the normalized transmission rate) which measures the average number of secondary infections per infected case in a population that is no longer fully susceptible. When R t > 1, the number of infected cases continues to grow until the disease eventually spreads to nearly the entire population. However, when R t < 1, the growth rate of infected cases is slow enough so that the disease eventually dies out before reaching a large fraction of the population.

Given parameter values and a set of initial conditions I 0 , E 0 , R 0 , and S 0 = N − I 0 − E 0 − R 0 , the four health compartments evolve according to the following laws of motion:

where we have made the substitution β t = R t γ into equations (1) and (2). The ratio S t−1 /N is the recent fraction of the population that is susceptible to the disease. This ratio will be high during the initial stages of an epidemic like COVID-19 for which the population has little or no herd immunity. 12 To facilitate the computation of a model-implied value of R t , we postulate that the daily number of exposed cases E t in equation (2) immediately impacts the daily number of infected cases I t in equation (3). 13 In equation (4), we introduce the additional parameter θ T > 0. This parameter allows the model to capture country-specific differences in the reporting of recoveries or deaths that can influence the transition rate from infected to closed cases. 14 In-sample, we calibrate the value of θ T for each country so that the model exactly matches the end-of-sample smoothed number of closed cases, denoted by R T . For the out-of-sample projections (t > T ), we assume that θ t converges towards 1.0 according to the following law of motion:

where κ > 0 governs the speed of convergence. We estimate the value of κ using the quasi real-time evolution of the calibrated value of θ T for China, which has gone through a nearly complete COVID-19 epidemic cycle. 15
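The displayed equations (1) through (5) referred to in this section are missing from this copy. For (1) through (4), a reconstruction consistent with the surrounding description is sketched below. It is an assumption, not a quotation of the original: the reproduction number is written as a calligraphic R to keep it apart from closed cases R t (a notational choice made here), and the dating of the σE terms follows the statement that E t enters equation (3) contemporaneously. The functional form of equation (5) cannot be recovered from the text and is left out.

```latex
\begin{align}
S_t &= S_{t-1} - \mathcal{R}_t\,\gamma\, I_{t-1}\,\frac{S_{t-1}}{N}, \tag{1}\\
E_t &= E_{t-1} + \mathcal{R}_t\,\gamma\, I_{t-1}\,\frac{S_{t-1}}{N} - \sigma E_{t-1}, \tag{2}\\
I_t &= I_{t-1} + \sigma E_t - \gamma I_{t-1}, \tag{3}\\
R_t &= R_{t-1} + \theta_T\,\gamma\, I_{t-1}. \tag{4}
\end{align}
```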

As described below, we fit the above model to smoothed data on the number of COVID-19 infected and closed cases in China, Italy, the United States, and Brazil. We then project the out-of-sample path of the epidemic using a behavioral function that governs the evolution of R t .
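As a rough illustration of how such a model can be stepped forward one day at a time, the sketch below assumes a particular discrete-time form: new exposures equal R t γ I t−1 S t−1 /N, the current day's exposed cases feed the current day's infected cases, and closed cases accumulate at rate θγ. The function names, the constant reproduction number of 2.5, and the initial values are hypothetical and chosen only for the example; only σ = 1/5.1, γ = 1/20, and E 0 = 4I 0 come from the text.

```python
def seir_step(S, E, I, R, rep_num, N, sigma=1 / 5.1, gamma=1 / 20, theta=1.0):
    """Advance the four compartments by one day.

    Assumed form (a reconstruction, not quoted from the paper):
    new exposures = rep_num * gamma * I * S / N, and today's E feeds today's I.
    """
    new_exposed = rep_num * gamma * I * S / N
    S_next = S - new_exposed
    E_next = E + new_exposed - sigma * E
    I_next = I + sigma * E_next - gamma * I
    R_next = R + theta * gamma * I
    return S_next, E_next, I_next, R_next


def adding_up_gap(S, E, I, R, N):
    """Percentage deviation 100 * (S + E + I + R - N) / N."""
    return 100.0 * (S + E + I + R - N) / N


# Hypothetical initial conditions, using E0 = 4 * I0 as in the text.
N = 60_000_000
I0, R0 = 1_000.0, 50.0
E0 = 4 * I0
state = (N - I0 - E0 - R0, E0, I0, R0)
for _ in range(30):
    # A constant reproduction number is used purely for illustration.
    state = seir_step(*state, rep_num=2.5, N=N)
```

The helper `adding_up_gap` computes the percentage deviation from the adding-up constraint that footnote 15 describes.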

Starting from equations (1) through (3), and then solving for R t yields the following model-implied value of the reproduction number:

which is not influenced by the additional parameter θ T . Given values for σ, γ, and N, together with the initial conditions of the model variables, we use equation (6) to solve for the value of R t each day for t = 1, 2, 3... so that the model exactly replicates a centered 7-day moving average of the number of infected cases in the data for the in-sample period. Specifically, the values of I t and I t−1 in equation (6) are taken from the smoothed data, which runs through July 16. We use smoothed data for I t and I t−1 because this helps to reduce the sensitivity of the model's out-of-sample projections (described below) to daily fluctuations in new infected cases. But in-sample, the model continues to closely replicate the raw number of infected and closed cases in each country.

12 Fine, Eames, and Heymann (2011) examine the concept of "herd immunity" from theoretical and practical perspectives.

13 Our discrete-time model approximates the continuous-time derivative for any variable X t as dX t /dt ≈ X t − X t−1 . In the continuous-time limit, there is no distinction between the value of right-side variables dated either t or t − 1.

14 An extreme example of this phenomenon can be found in the COVID-19 data for Norway. The reported number of recovered cases remained constant at 32 from mid-April through May 21. On May 22, the reported number of recovered cases jumped to 7,727.

15 The adding-up constraint S t + E t + I t + R t = N is relaxed when θ t ≠ 1. In our model projections, the resulting percentage deviation, defined as 100 × (S t + E t + I t + R t − N) /N, never exceeds 1.6% in absolute value for any country. This deviation can be interpreted as reflecting changes in N over time (due to births, deaths, or migration) or errors in measuring I t or R t .
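Equation (6) is missing from this copy. One way to recover an expression of the kind described, offered as a derivation under stated assumptions rather than as the authors' displayed formula, is to suppose that the exposed and infected compartments follow $E_t = (1-\sigma)E_{t-1} + \mathcal{R}_t\gamma I_{t-1}S_{t-1}/N$ and $I_t = (1-\gamma)I_{t-1} + \sigma E_t$, with $\mathcal{R}_t$ denoting the reproduction number. Substituting the first into the second and solving gives a ratio whose denominator contains $S_{t-1}I_{t-1}/N$, in line with the text:

```latex
\begin{align*}
I_t &= (1-\gamma)\, I_{t-1} + \sigma(1-\sigma)\, E_{t-1}
      + \sigma\gamma\, \mathcal{R}_t\, I_{t-1}\,\frac{S_{t-1}}{N} \\[4pt]
\Longrightarrow\quad
\mathcal{R}_t &= \frac{I_t - (1-\gamma)\, I_{t-1} - \sigma(1-\sigma)\, E_{t-1}}
                     {\sigma\gamma\, S_{t-1} I_{t-1}/N}.
\end{align*}
```

Under this form, θ T does not appear, which agrees with the statement that the model-implied reproduction number is not influenced by that parameter.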

During the early stages of the epidemic when the value of the denominator in equation (6) is low (because I t−1 is low and S t−1 /N ≈ 1), the model-implied reproduction number is typically large (i.e., R t well above 1) and volatile to capture the rapid and uneven growth in the number of infected cases. 16 As the epidemic progresses, the quantity S t−1 I t−1 /N in the denominator increases and the model-implied reproduction number tends to decline and become less volatile. During the progression stage, the model-implied reproduction number can serve as a daily indicator that can track the degree to which mandated or voluntary behavior on the part of individuals in the population may be helping to mitigate the spread of the disease. Towards the end of the epidemic cycle when the quantity S t−1 I t−1 /N again becomes low, the model-implied reproduction number can once again become more volatile.
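The inversion described above can be sketched in a few lines. The code below assumes the same one-day timing as before (today's exposed cases feed today's infected cases) and backs out a reproduction number from a series of infected cases while updating the unobserved S and E compartments along the way. The function names and the synthetic series are hypothetical; the round trip at the end simply checks that a known path is recovered.

```python
def implied_reproduction_numbers(I_series, E0, S0, N, sigma=1 / 5.1, gamma=1 / 20):
    """Back out a daily reproduction number from a series of infected cases.

    Assumes E_t = (1 - sigma) E_{t-1} + new exposures and
    I_t = (1 - gamma) I_{t-1} + sigma E_t. Requires I_{t-1} > 0 and S_{t-1} > 0.
    Returns a list with one value for each t = 1, 2, ...
    """
    S, E = S0, E0
    out = []
    for t in range(1, len(I_series)):
        I_prev, I_now = I_series[t - 1], I_series[t]
        numer = I_now - (1 - gamma) * I_prev - sigma * (1 - sigma) * E
        denom = sigma * gamma * S * I_prev / N
        rep = numer / denom
        # Update the unobserved compartments with the implied value.
        new_exposed = rep * gamma * I_prev * S / N
        S, E = S - new_exposed, (1 - sigma) * E + new_exposed
        out.append(rep)
    return out


def simulate_infected(rep_path, I0, E0, S0, N, sigma=1 / 5.1, gamma=1 / 20):
    """Generate an infected series from a known reproduction-number path."""
    S, E, I = S0, E0, I0
    series = [I]
    for rep in rep_path:
        new_exposed = rep * gamma * I * S / N
        S, E = S - new_exposed, (1 - sigma) * E + new_exposed
        I = (1 - gamma) * I + sigma * E
        series.append(I)
    return series


# Synthetic round trip with hypothetical numbers.
N = 1_000_000.0
I0 = 100.0
E0 = 4 * I0
S0 = N - 5 * I0
true_path = [3.0, 2.5, 2.0, 1.5, 1.2, 0.9]
I_series = simulate_infected(true_path, I0, E0, S0, N)
recovered = implied_reproduction_numbers(I_series, E0, S0, N)
```

Because the denominator scales with S t−1 I t−1 /N, small absolute changes in the infected series translate into large swings in the implied value whenever that quantity is small, which is the volatility the text describes.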

We can see examples of this end-of-cycle volatility in Figure 1 for China. But in these late stages of the cycle, the model-implied R t has already served its purpose in tracking the daily progression of the disease.

In the appendix, we consider an extended version of the model that allows a fraction of infected cases to be asymptomatic. We show that a model that does not explicitly account for asymptomatic cases when they are indeed present can exhibit a larger model-implied reproduction number, thus capturing the impact of the asymptomatic cases in a reduced-form way.

Raw data for the daily number of infected (or active) cases and closed cases (recovered or deceased) are from www.worldometers.info/coronavirus/. 17 Starting from the raw data ending on July 19, we apply a centered 7-day moving average to construct the time series for I t that is used to compute R t from equation (6). For China, we use January 25, 2020 to represent t = 0. For Italy and the United States, we use February 25, 2020 to represent t = 0. For Brazil, we use March 1, 2020 to represent t = 0. These dates allow for some smoothing of the raw data before computing the initial model-implied reproduction numbers. Given that our raw data sample runs through July 19, the endpoint T of the smoothed data is July 16.

16 The model-implied R t can even turn briefly negative if [...]

17 The data for China shows only 66 infected cases on April 17. But the data for April 16 and April 18 show 1,081 and 1,058 infected cases, respectively. We interpreted the April 17 number to be a data entry error and recoded it as 1,066 infected cases.
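The smoothing step explains why raw data ending on July 19 yield a smoothed series ending on July 16: a centered 7-day window needs three observations on each side. A minimal sketch, assuming an odd window and using hypothetical function names and made-up counts:

```python
from datetime import date, timedelta


def centered_moving_average(values, window=7):
    """Centered moving average; the first and last window // 2 points are dropped."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def smoothed_dates(first_day, n_raw, window=7):
    """Calendar dates that survive the centered smoothing."""
    half = window // 2
    return [first_day + timedelta(days=i) for i in range(half, n_raw - half)]


# Ten days of made-up raw counts, July 10 through July 19.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smooth = centered_moving_average(raw)
days = smoothed_dates(date(2020, 7, 10), len(raw))
```

Here `days[-1]` is July 16, 2020, three days before the last raw observation.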

We calibrate N to equal the total population of each country with the exception of China, where N equals the population of Hubei Province, the area that accounts for nearly all confirmed cases. The values of I 0 and R 0 are the smoothed number of infected and closed cases at t = 0. Following Atkeson (2020a), we set E 0 = 4I 0 in all four countries, such that S 0 = N − 5I 0 − R 0 . Based on a recent study of COVID-19 cases in China by Lauer et al. (2020), we set σ = 1/5.1 in all four countries, implying an average incubation period of 5.1 days.

When θ T = 1, the model's law of motion for closed cases, equation (4)

where R T is the smoothed number of closed cases at the end of our data sample on day T and the denominator is the cumulative sum of smoothed infected cases through day T − 1. Using this formula, we obtain γ ≈ 1/20 for China, which is the only country so far to have experienced a nearly complete COVID-19 epidemic cycle. Based on this result, we set γ = 1/20 for all countries, implying an illness duration of about three weeks on average.
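The formula referred to in this passage is missing from this copy. Iterating an assumed law of motion for closed cases, $R_t = R_{t-1} + \theta_T \gamma I_{t-1}$, from $t = 1$ to $T$ with $\theta_T = 1$ suggests the expression below. Here $R_0$ denotes closed cases at $t = 0$, not the basic reproduction number, and whether the original formula subtracts this initial value in the numerator cannot be recovered from the text, so that term is an assumption:

```latex
\begin{equation*}
R_T = R_0 + \gamma \sum_{t=0}^{T-1} I_t
\quad\Longrightarrow\quad
\gamma = \frac{R_T - R_0}{\sum_{t=0}^{T-1} I_t}.
\end{equation*}
```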

Given the common value of γ = 1/20, we solve for the value of θ T so that the model-predicted value of R T exactly matches the end-of-sample smoothed number of closed cases in each country. Specifically, we set θ T =

For China, we obtain θ T ≈ 1 by construction. For Brazil, we obtain θ T = 1.07, implying a somewhat faster transition rate from infected to closed cases. But for Italy and the United States we obtain θ T = 0.64 and θ T = 0.33, respectively, implying slower transition rates from infected to closed cases. These faster or slower transition rates may reflect the lack of uniform standards for the reporting of recoveries among local, state, or national governments. 18 But death counts can also be inaccurate, as evidenced by the April 17 revision to the number of COVID-19 deaths in Wuhan, China, which caused the number to jump from 2,579 to 3,869, an increase of 50%. 19 Figure A.1 in the appendix plots the quasi real-time evolution of θ T for each country. For the out-of-sample projections, we estimate the value of the speed-of-convergence parameter κ in equation (5) using the quasi real-time evolution of θ T for China. The estimation yields κ = 0.07 with a standard error of 0.01.
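The calibration of θ T can be illustrated with a short sketch. It assumes that closed cases accumulate as R t = R t−1 + θ T γ I t−1 , so that θ T is the ratio of the observed change in closed cases to γ times the cumulative sum of infected cases through day T − 1. The function name and the numbers are hypothetical, and subtracting the initial closed cases is an assumption of this sketch.

```python
def calibrate_theta(I_smooth, closed_start, closed_end, gamma=1 / 20):
    """Solve closed_start + theta * gamma * sum(I_0 .. I_{T-1}) = closed_end for theta.

    I_smooth holds smoothed infected cases for t = 0, ..., T.
    """
    cumulative_infected = sum(I_smooth[:-1])  # days 0 through T - 1
    return (closed_end - closed_start) / (gamma * cumulative_infected)


# Made-up series: cumulative infected through T - 1 is 510.
I_smooth = [100.0, 120.0, 140.0, 150.0, 130.0]
theta = calibrate_theta(I_smooth, closed_start=10.0, closed_end=27.0)

# Check: iterating the assumed law of motion reproduces the end value.
closed = 10.0
for infected in I_smooth[:-1]:
    closed += theta * (1 / 20) * infected
```

A value below 1, as in this example, means closed cases accumulate more slowly than γ alone would imply, which is the pattern reported for Italy and the United States.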

18 Regarding data on recoveries, Worldometer states "This statistic is highly imperfect, because reporting can be missing, incomplete, incorrect, based on different definitions, or dated (or a combination of all of these) for many governments, both at the local and national level...In view of this, "Active Cases" and "Closed Cases Outcome" which both depend on the number of recoveries (in addition to an accurate death count and a satisfactory rate of case detection, both of which are lacking in the vast majority of countries) can be affected by this inherent flaw for many countries and for the total worldwide count." Source: https://www.worldometers.info/coronavirus/about/.

19 According to the Wall Street Journal, "A growing pool of global death statistics indicates that few countries are accurately capturing fatalities from the new coronavirus-and in some the shortfall is significant." Source: https://www.wsj.com/articles/most-countries-fail-to-captureextent-of-covid-19-deaths-11590658200.

To construct model projections for the out-of-sample paths of I t and R t , we must project the future evolution of the effective reproduction number R t . Along the lines of Eksin, Paarporn, and Weitz (2019) and Cochrane (2020), we postulate a behavioral function that allows for the endogenous response of R t to movements in the number of infected cases. Specifically, we assume that the out-of-sample value of R t evolves according to the law of motion

where η > 0. Equation (7) implies that the out-of-sample reproduction number is highly persistent, but it responds negatively to an increase in the lagged number of infected cases. This function captures the idea that a rising number of infections will trigger a behavioral response by individuals or health authorities that helps to mitigate the spread of the disease. A number of recent COVID-19 studies present empirical evidence in support of this type of behavioral response (Maloney and Taskin 2020, Hatzius, Struyven, and Rosenberg 2020, Goolsbee and Syverson 2020, and Winkler 2020). 20

Given the in-sample time path of the model-implied R t , we solve for the best fit values of the starting reproduction number R 0 and the behavioral response parameter η that cause the end-of-sample value of R t computed from equation (7) to hit an end-of-sample target value. 21 For Italy, the United States, and Brazil, the end-of-sample target value is the model-implied R t from equation (6) averaged over the most recent 7 days. As before, using a 7-day average helps to reduce the sensitivity of the out-of-sample projections to daily fluctuations in new infected cases. For China, we set the end-of-sample target value to 0.1, reflecting our view that the epidemic cycle in Hubei Province is nearly complete. Otherwise, the end-of-sample target value can be unduly influenced by the end-of-cycle volatility in the model-implied R t , as evidenced in Figure 1. 22 For the first out-of-sample projection, we set R t−1 in equation (7) equal to the end-of-sample target value for each country. Table 1 summarizes the initial conditions and parameter values used in the projections.

Table 1 notes: For all countries, σ = 1/5.1, γ = 1/20, and κ = 0.07. The values of θ T , R 0 and η are computed using smoothed data that runs through T = July 16. H.P. = Hubei Province.

20 Starting on June 25, 2020, the COVID-19 model developed by the University of Washington's Institute for Health Metrics and Evaluation (IHME) employs a behavioral function in which the trend of easing containment measures in a given location continues along its current trajectory until the daily death rate rises above a threshold, thus triggering a reintroduction of stricter containment measures. For details, see http://www.healthdata.org/covid/updates.

21 In an earlier version of this paper, we assumed that the out-of-sample reproduction number evolved according to the exogenous law of motion: R t = R 0 exp(−ηt) + [1 − exp(−ηt)]R ∞ , with R 0 and η estimated from in-sample data and R ∞ = 0.1.

22 For the quasi real-time projections plotted in Figure 7 for the earlier stages of the epidemic cycle in China, the end-of-sample target value is the model-implied R t from equation (6) averaged over the most recent 7 days.
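Because the behavioral function in equation (7) is not reproduced in this copy, the sketch below illustrates only the simpler exogenous law of motion quoted in footnote 21, R t = R 0 exp(−ηt) + [1 − exp(−ηt)]R ∞ . The helper that picks η so the path hits a target on day T is a one-parameter simplification introduced here for illustration, not the authors' estimation procedure, and the starting value, target, and horizon are hypothetical.

```python
import math


def exogenous_path(t, rep0, eta, rep_inf=0.1):
    """Footnote 21 form: rep0 * exp(-eta t) + (1 - exp(-eta t)) * rep_inf."""
    decay = math.exp(-eta * t)
    return rep0 * decay + (1.0 - decay) * rep_inf


def eta_to_hit_target(rep0, target, T, rep_inf=0.1):
    """Decay rate for which the path equals `target` on day T.

    Illustrative helper; requires rep_inf < target < rep0 and T > 0.
    """
    return -math.log((target - rep_inf) / (rep0 - rep_inf)) / T


# Hypothetical numbers: start at 5.0 and reach 0.8 after 100 days.
eta = eta_to_hit_target(rep0=5.0, target=0.8, T=100)
path = [exogenous_path(t, 5.0, eta) for t in range(0, 201)]
```

The path declines monotonically from its starting value toward R ∞ , so unlike a behavioral rule it cannot respond to a resurgence in infected cases.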

Since China (specifically Hubei Province) has experienced a nearly complete COVID-19 epidemic cycle, it offers a template for modelling the evolution of the epidemic in other countries. The model-implied R t for China together with the "China trajectory" are plotted in Figure 1. The level and volatility of the model-implied R t for China are high at the beginning stages of the epidemic cycle when the quantity S t−1 I t−1 /N in the denominator of equation (6) [...] The China trajectory that is used for out-of-sample projections is the estimated version of equation (7) [...]

The model-implied R t for Italy together with the "Italy trajectory" are plotted in Figure 2. As with China, the level and volatility of the model-implied R t are high during the first 25 days of the epidemic. The peak number of infections for Italy occurred on April 19 (t = 54). Compared to China, it took longer for Italy to reach its peak number of infections. The model-implied R t for Italy tracks below 1.0 after the infection peak, reflecting the persistent decline in the number of infected cases. The Italy trajectory that is used for the out-of-sample projections starts at R 0 = 6.0 and then declines over time to hit the end-of-sample target value of 0.81.

[...] The United States trajectory crosses below 1.0 on August 7 (t = 164), one day before the projected date of peak infections on August 8.

The model-implied R t for Brazil together with the "Brazil trajectory" are plotted in Figure 4 . As with the other countries, the level and volatility of the model-implied R t are high during the first 25 days of the epidemic. But after an interval where the level and volatility are both declining, the model-implied R t for Brazil exhibits some sharp downward and upward jumps during the middle part of April (t = 40 to 50), which reflect corresponding jumps in the number of infected cases in the data. These jumps may reflect reporting errors or corrections to reporting errors. 25 Since then, however, the level and volatility of the model-implied R t have resumed their declines. The Brazil trajectory that is used for out-of-sample projections starts at R 0 = 11.4 and then declines over time to hit the end-of-sample target value of 1.56. The Brazil trajectory crosses below 1.0 on August 9 (t = 161), one day before the projected date of peak infections on August 10. Based on this trajectory, Brazil appears roughly aligned with the United States in the COVID-19 epidemic cycle. During the month of May, it had appeared that Brazil was about two to three weeks behind the United States in the cycle. But the incoming data during the months of June and July has served to delay the projected date of peak infections for the United States.

Using the foregoing framework, we construct out-of-sample projections for the number of infected cases and the number of closed cases (recovered or deceased) in each country. In-sample, we assume that R t is given by the country's model-implied value that is computed using smoothed data that runs through July 16. For the out-of-sample projections starting on July 20, we assume that R t evolves according to the estimated version of equation (7).

The top panels of Figure 5 show the out-of-sample predictions for China. At the end of our data sample, the epidemic cycle in Hubei Province appears nearly complete with only a small number of infected cases. The most recent recorded death from COVID-19 occurred on May 17. The peak number of infections occurred on February

25 The raw number of infected cases dropped from 21,929 on April 13 to only 9,704 on April 14. Four days later on April 18, the raw number of infected cases was back up to 20,335. The raw number then dropped to 14,062 on April 19.

Even though COVID-19 emerged just a few weeks prior to the Chinese New Year (a period of typically high travel), the rapid deployment of NPIs proved to be effective in limiting the spread of the outbreak. This is a remarkable achievement for an area with a population of around 60 million people. 26 A study by Lai, et al. (2020) concludes that "if NPIs were conducted one week, two weeks, or three weeks later, the number of cases could have shown a 3-fold, 7-fold, and 18-fold increase across China, respectively." 27 The same study acknowledges that "If NPIs could have been conducted one week, two weeks, or three weeks earlier in China, [then] cases could have been reduced by 66%, 86%, and 95%, respectively."

At the end of our data sample, China has recorded a total of 4,634 deaths out of 83,660 closed cases, yielding a closed case fatality rate of 5.5%. But more refined estimates yield much lower fatality rates. After adjusting for lags in the reporting of deaths and differences in fatality rates by age, China's fatality rate from COVID-19 has been estimated to be in the range of 1.1% to 1.4% (Verity et al. 2020, Guan et al. 2020). Further adjustments to include estimates of asymptomatic cases in the denominator yield even lower fatality rates, in the range of 0.5% to 0.7%.

The bottom panels of Figure 5 show the out-of-sample predictions for Italy. At the end of our data sample, there are about 12,400 infected cases and about 232,000 closed cases. The peak number of infections occurred on April 19 (t = 54) at 108,165. The projected number of closed cases at the end of the epidemic is around 260,000.

At the end of our data sample, Italy has recorded a total of 35,045 deaths out of 231,994 closed cases, yielding a closed case fatality rate of 15.1%, well above the 5.5% closed case fatality rate for China. Rinaldi and Paradisi (2020) use population level statistics of death records comparing pre-COVID and post-COVID sample periods to estimate a fatality rate of 1.29% for Italy. Using a modified SIR model, Calafiore, et al. (2020) estimate a fatality rate of 1.18% for Italy using cases that tested positive.

26 The first cases were identified in early December 2019. On December 31, 2019, the Wuhan Health Commission notified the China Center for Disease Control and Prevention and the World Health Organization (WHO) of a potential virus problem. On January 23, 2020, travel from Wuhan City was shut down, followed by similar travel shutdowns for 16 other cities in Hubei Province. Sources: Wu and McGoogan (2020), and Leung et al. (2020).

27 According to Lai et al. (2020): "In Wuhan, where the largest number of infected people live, residents were required to measure and report their temperature daily to confirm their onset, and those with mild and asymptomatic infections were also quarantined in 'Fang Cang' hospitals, which are public spaces such as stadiums and conference centers that have been repurposed for medical care." The early detection and isolation of cases was estimated to prevent more infections than travel restrictions and contact reductions.

The top panels of Figure 6 show the out-of-sample projections for the United States. At the end of our data sample, there are about 1.953 million infected cases and about 1.946 million closed cases. The number of infected cases reached a local peak on May 30. But after trending down for five days, the number of infections reversed course and has continued to rise through the end of our data sample. The peak number of infections is projected to occur on August 8 (t = 165) at about 2.23 million. This projection reflects what might be called a "resurgent first wave" because the plot of the actual and projected number of infections (top left panel of Figure 6) exhibits a double-peaked shape.

The projected number of closed cases at the end of the epidemic is around 8.89 million (top right panel of Figure 6). The calibrated value of θ T for the United States is well below 1.0 and the peak number of infections has yet to be reached. Consequently, the projected number of closed cases at the end of the epidemic is somewhat sensitive to the value of the speed-of-convergence parameter κ that appears in equation (5). 28 Our baseline projection of 8.89 million closed cases employs κ = 0.07. When κ = 0.04, the projected number of closed cases declines to around 7.88 million. When κ = 0.10, the projected number of closed cases rises to around 9.37 million.
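To give a rough sense of what the three values of κ imply for the speed of convergence, the sketch below iterates a simple partial-adjustment rule, θ t = θ t−1 + κ(1 − θ t−1). The exact form of equation (5) is not reproduced in this section, so this rule is an assumption chosen only because it converges to 1.0 at a rate set by κ. The starting value of 0.5 and the function names are likewise hypothetical.

```python
import math

def theta_path(theta_T, kappa, horizon):
    """Assumed partial-adjustment rule: the gap (1 - theta) shrinks by the
    factor (1 - kappa) each day. This is not necessarily equation (5)."""
    path = [theta_T]
    for _ in range(horizon):
        prev = path[-1]
        path.append(prev + kappa * (1.0 - prev))
    return path

def half_life_days(kappa):
    """Days needed for the gap (1 - theta) to halve under the assumed rule."""
    return math.log(0.5) / math.log(1.0 - kappa)

# theta_T = 0.5 is a hypothetical end-of-sample value, not a calibrated one.
for kappa in (0.04, 0.07, 0.10):
    path = theta_path(theta_T=0.5, kappa=kappa, horizon=30)
    print(f"kappa={kappa:.2f}: theta after 30 days = {path[-1]:.3f}, "
          f"half-life = {half_life_days(kappa):.1f} days")
```

Under this assumed rule, a larger κ closes the gap between θ t and 1.0 sooner, which is one way to see why the projected number of closed cases moves with κ when θ T starts well below 1.0.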

At the end of our data sample, the United States has recorded a total of 143,289 deaths out of 1,945,627 closed cases, yielding a closed case fatality rate of 7.4%, somewhat above the 5.5% closed case fatality rate for China. According to the U.S. Centers for Disease Control and Prevention, the best estimate of the overall infection fatality rate for COVID-19 is 0.65%. 29
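The closed case fatality rates quoted for China, Italy, and the United States are simply total deaths divided by total closed cases. The short sketch below reproduces that arithmetic from the counts reported in the text; the function name `closed_case_fatality_rate` is an illustrative choice.

```python
def closed_case_fatality_rate(deaths, closed_cases):
    """Deaths as a percentage of closed (recovered or deceased) cases."""
    if closed_cases <= 0:
        raise ValueError("closed_cases must be positive")
    return 100.0 * deaths / closed_cases

# (deaths, closed cases) at the end of the data sample, as reported in the text.
reported = {
    "China": (4_634, 83_660),
    "Italy": (35_045, 231_994),
    "United States": (143_289, 1_945_627),
}

for country, (deaths, closed) in reported.items():
    rate = closed_case_fatality_rate(deaths, closed)
    print(f"{country}: {rate:.1f}%")
```

Because the denominator excludes both open cases and undetected infections, this ratio tends to sit well above the infection fatality rate estimates cited alongside it.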

On July 20, 2020, the University of Washington's Institute for Health Metrics and Evaluation (IHME) was projecting about 225,000 total deaths for the United States for the period through November 1, with an uncertainty range of about 197,000 to 268,000 deaths. 30 Prior to May 4, 2020, IHME employed a purely phenomenological model that fitted a statistical distribution to the hump-shaped curve of daily deaths in various locations and then used the fitted distribution to project out-of-sample. Starting on May 4, 2020, the IHME projection methodology was augmented to include a SEIR model component in which the effective reproduction number is allowed to vary over time to closely match the observed number of deaths in each location. 31 Upon introduction of these updates, the projected number of total deaths from COVID-19 for the United States jumped from 72,433 to 134,475. This example helps to illustrate the wide range of uncertainty surrounding out-of-sample projections, even when constructed by professional epidemiologists. 32

28 For the other three countries, the sensitivity of the out-of-sample projections to the value of κ is much lower because θ T is already close to 1.0 (China and Brazil) or because the number of infections is well past the peak (Italy).

29 Source: www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html.

30 Daily updates of the projections can be found at https://covid19.healthdata.org/projections.

31 Details of the May 4 update can be found at http://www.healthdata.org/sites/default/files/files/Projects/COVID/Estimation_update_050420.pdf.

The bottom panels of Figure 6 show the out-of-sample projections for Brazil. At the end of our data sample, Brazil has recorded a total of 79,533 deaths out of 1,285,663 closed cases, yielding a closed case fatality rate of 5.5%, the same as China. An epidemiological study of COVID-19 deaths by Ganem, et al. (2020) estimates a case fatality rate of 1.6% for Brazil.

The four countries we examine have large differences in population, which can affect the total number of cases and the number of resulting deaths from COVID-19. Table 2 provides population-adjusted statistics for the total number of cases (infected plus closed) and the total number of deaths for each country. As before, we use the population of Hubei Province to compute the statistics for China because that area accounts for nearly all confirmed cases. Table 2 shows that China has the lowest number of population-adjusted cases whereas the United States has the highest number. China also has the lowest number of population-adjusted deaths whereas Italy has the highest number.

Our out-of-sample projections are subject to enormous uncertainty and can sometimes shift by large amounts from one week to the next, depending on recent incoming data. This is a typical feature of epidemiology (and economic) prediction models. 33 Figure 7 illustrates this important point. Specifically, we plot a sequence of "quasi real-time" projections for the number of infected cases and the number of closed cases in China and the United States. 34 Each projection uses a different end-of-sample starting point. For each end-of-sample starting point, we recalibrate the values of θ T , R 0 , and η according to the procedures described in Section 4.

32 Atkeson (2020c) provides a simplified example of IHME's pre-May 4 forecasting approach. He shows that when mapped into the daily number of deaths predicted by a simple SIRD model, the IHME's approach implies an effective reproduction number that falls linearly over time, possibly resulting in an optimistic forecast if the declining time trend does not materialize in practice. Similarly, Wang, Wu, and Yang (2012) demonstrate a one-to-one mapping between the parameters of a curve-fitting approach based on the Richards (1959)

The left-side panels in Figure 7 show that our out-of-sample projections can significantly underpredict or overpredict the number of infected cases during the early stages of the epidemic when the model-implied R t is above 1.0 and highly volatile. But as the epidemic evolves and the model-implied R t declines and becomes less volatile, the out-of-sample projections exhibit less sensitivity to incoming data. The sensitivity to incoming data also declines after the peak number of infections has been reached. Similarly, Fernández-Villaverde and Jones (2020) find that their out-of-sample projections for daily deaths from COVID-19 become less noisy after the peak number of daily deaths in a given location has been reached.
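The mechanics of a quasi real-time exercise can be sketched as a loop over end-of-sample points, where each projection is refit using only the data available up to that point. The sketch below is illustrative only: the forecaster `fit_and_project` is a stand-in (a log-linear trend extrapolation) rather than the recalibration of θ T , R 0 , and η described in the paper, and the hump-shaped series is synthetic.

```python
import numpy as np

def fit_and_project(infected, horizon, window=7):
    """Stand-in forecaster (an assumption, not the paper's model):
    extrapolate the log-linear trend of the last `window` observations."""
    log_recent = np.log(infected[-window:])
    slope, intercept = np.polyfit(np.arange(window), log_recent, 1)
    future = np.arange(window, window + horizon)
    return np.exp(intercept + slope * future)

def quasi_real_time(infected, end_points, horizon):
    """For each end-of-sample point T, refit using only infected[:T] and
    project `horizon` days ahead, as if later data did not yet exist."""
    return {T: fit_and_project(infected[:T], horizon) for T in end_points}

# Synthetic hump-shaped series of infected cases (illustrative only).
days = np.arange(120)
infected = 1000.0 * np.exp(-((days - 60) / 20.0) ** 2)

vintages = quasi_real_time(infected, end_points=[30, 45, 75], horizon=14)
for T, projection in vintages.items():
    actual = infected[T + 13]  # realized value on the last projected day
    print(f"T={T}: projected {projection[-1]:.0f} vs. actual {actual:.0f}")
```

Comparing each vintage with the realized data is what reveals how far a projection made at a given date would have missed, which is the comparison plotted in Figure 7.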

The right-side panels of Figure 7 show that shifts in the projected trajectory of infected cases can translate into large shifts in the projected number of closed cases at the end of the epidemic (and correspondingly large shifts in the projected number of total deaths). This result highlights the difficulty of formulating a set of health policy containment measures that strike the appropriate balance between epidemiological benefits and the costs that derive from negative impacts to the economy and other health metrics. We note that recent studies of optimal COVID-19 containment policy often treat key model parameters, such as the disease transmission rate, as known constants, thereby suppressing a major source of uncertainty. Hornstein (2020) is an example of one study that does take into account the uncertainty regarding COVID-19 disease parameters. He shows that model-projected outcomes for total deaths as a fraction of the population can vary by a factor of nine.

33 For epidemiology models, see the record of real-time forecasts from the University of Washington's Institute for Health Metrics and Evaluation (IHME) model, which are available from https://www.covid-projections.com.

34 Orphanides and van Norden (2002) employ this quasi real-time methodology to demonstrate that most of the variation in real-time estimates of the output gap (defined as the percent deviation of actual GDP from trend GDP) is due to new incoming data, as opposed to revisions to older data. The COVID-19 data from www.worldometers.info/coronavirus/ are frequently revised without any notifications to the user. Taking into account these real-time data revisions would increase the uncertainty surrounding our out-of-sample projections.

What accounts for the declines in the model-implied reproduction numbers plotted in Figures 1 through 4? A number of studies have linked declines in daily COVID-19 infections, deaths, or effective reproduction numbers to both mandatory and voluntary containment measures. For example, Xu, et al. (2020) argue that there were two turning points of daily new infections or deaths in the United States which appear to be linked to the implementation of stay-at-home orders in 10 states on March 23 and the Centers for Disease Control and Prevention's recommendation for the wearing of face masks on April 3. A study by Pei, et al. (2020) of major United States metropolitan areas estimates significant declines in reproduction numbers that appear linked to declines in real-time mobility indices. Maloney and Taskin (2020) present evidence that reductions in mobility for various countries (as measured by Google mobility indices) are driven mainly by voluntary responses. A cross-country study by Deb et al. (2020) finds that daily numbers of infected cases and deaths declined in the 30 days following the implementation of government-mandated containment measures. 35 Based on trends in Google mobility indices, Hatzius, Struyven, and Rosenberg (2020) conclude that voluntary social distancing started in many places before mandatory government controls were enacted, possibly due to fear of the virus. Figure 8 shows that declines in measures of population mobility tend to precede declines in the model-implied R t for each country. This pattern suggests that mandatory and voluntary stay-at-home behavior and social distancing during the early stages of the epidemic worked to reduce the effective reproduction number and mitigate the spread of COVID-19.

35 Data on the various containment measures are from the University of Oxford's Coronavirus Government Response Tracker: www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker.
36 The Google mobility indices are available from https://www.google.com/covid19/mobility/.

37 Data on the Goldman Sachs lockdown index are available from https://research.gs.com/content/research/en/reports/2020/07/15/38f54e72-93ba-4fdd-a166-5781558b43fd.pdf. See also Tilton and Struyven (2020).
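One simple way to quantify whether declines in mobility precede declines in the model-implied R t is to search for the lag at which lagged mobility is most strongly correlated with R t . The sketch below is a descriptive illustration on synthetic data, not the analysis behind Figure 8: the function names, the logistic shape of both series, and the 7-day lead built into the example are assumptions. A correlation in levels between two trending series says nothing about causality.

```python
import numpy as np

def best_lead(mobility, r_t, max_lag=21):
    """Return the lag k (in days) that maximizes corr(mobility[t-k], r_t[t]),
    together with that correlation."""
    n = len(mobility)
    best_k, best_corr = 0, -np.inf
    for k in range(max_lag + 1):
        corr = np.corrcoef(mobility[: n - k], r_t[k:])[0, 1]
        if corr > best_corr:
            best_k, best_corr = k, corr
    return best_k, best_corr

def logistic_drop(t, midpoint, scale=5.0):
    """Smooth decline from 0 to -1 centered on `midpoint` (synthetic shape)."""
    return -1.0 / (1.0 + np.exp(-(t - midpoint) / scale))

days = np.arange(120)
# Synthetic mobility index: percent deviation from a pre-epidemic baseline.
mobility = 100.0 * logistic_drop(days, midpoint=30)
# Synthetic R_t with the same shape, shifted 7 days later by construction.
r_t = 3.0 + 2.0 * logistic_drop(days - 7, midpoint=30)

lead, corr = best_lead(mobility, r_t)
print(f"mobility leads R_t by {lead} days (correlation {corr:.3f})")
```

With actual data, both series would typically be smoothed first, and the estimated lead would be sensitive to the sample window chosen.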

More recently, measures of population mobility have been trending upwards in all four countries. This pattern reflects both the relaxation of mandatory containment measures and increased voluntary mobility. 38 But as of July 19, a resurgence of new infections in some areas of the United States has triggered a reinstatement of some containment measures, consistent with our behavioral hypothesis set forth in equation (7). At the end of our data sample, measures of population mobility for the United States appear to have plateaued at a level that is below the pre-epidemic baseline.

Modeling the evolution of COVID-19 is fraught with challenges. There is an enormous range of uncertainty surrounding the projected numbers of infections, recoveries, or deaths. At the same time, this enormous uncertainty highlights the potentially large risks of relaxing containment measures too early. Some countries, including the United States, which had started to relax containment measures are now reversing course after seeing a resurgence in the number of infected cases.

Previous influenza pandemics have typically been followed by a second (and sometimes even a third) wave of infections (Moore, et al. 2020). A second wave of infections could be magnified by "seasonal forcing" that serves to push up the effective reproduction number of COVID-19 during the Fall of 2020 (Kissler et al. 2020). Some infectious disease experts advocate for maintaining strict containment measures long after the effective reproduction number drops below 1.0. 39 This is because a delayed relaxation date permits the number of infected cases to be driven much lower, resulting in a slower spread of the disease when random mixing between infected and susceptible groups eventually recommences. Clearly, there are epidemiological benefits of maintaining strict containment measures, but these epidemiological benefits must be balanced against the economic costs and the collateral health damage costs of doing so.

38 Chakrabarti and Pinkovskiy (2020) find that the relaxation of mandatory containment measures contributes to increases in mobility after accounting for trends that were already in place at the time of relaxation.

39 See, for example, McBryde, Meehan, and Trauer (2020) and the following Washington Post news article from April 8, 2020: https://www.washingtonpost.com/national/health-science/as-social-distancing-shows-signs-of-working-whats-next-crush-the-curveexperts-say/2020/04/08/3c720e06-7923-11ea-b6ff-597f170df8f8_story.html.


According to the U.S. Centers for Disease Control and Prevention, the best estimate of the percentage of COVID-19 infections that are asymptomatic is 40%. 40 Following Aguilar et al. (2020) , this appendix extends our model to allow a fraction of infected cases to be asymptomatic. We show that a model that does not explicitly account for asymptomatic cases when they are in fact present can nevertheless capture the impact of asymptomatic cases on the model-implied reproduction number in a reduced-form way. The laws of motion for the generalized model are given by:

where the superscripts s and a denote symptomatic and asymptomatic infected cases, respectively. The parameter α is the fraction of exposed cases that are infected without showing any symptoms, i.e. the probability of becoming an asymptomatic case. The effective reproduction number in the generalized model is given by R t ≡ β t /γ, where we have assumed that the daily transmission rate and the average illness duration are the same for both types of infected cases. 41

Solving equations (A.1) through (A.4) for R t yields

which collapses to equation (6) when I a t = I a t−1 = 0. The above expression implies ∂ R t /∂I a t > 0, i.e., an increase in asymptomatic cases serves to magnify the effective reproduction number for any given values of I s t , I s t−1 , I a t−1 , and 

where R t is the model-implied reproduction number from equation (6) in the reduced-form model that does not account for asymptomatic cases. Solving equation (A.7) for R t yields:

In other words, if the reproduction number R t in the true model with asymptomatic cases is sufficiently high to satisfy this condition, then the model-implied reproduction number R t in the reduced-form model that does not account for asymptomatic cases will be even higher. For example, at the start of the epidemic we have S t−1 /N ≈ 1 (because few individuals are infected) and I a t ≈ I a t−1 (because infections grow very slowly at the start). In this case, we have

Notes: Given the common value of γ = 1/20 for all countries, we solve for the value of θ T so that the model-predicted value of R T exactly matches the end-of-sample smoothed number of closed cases for each country. The figure plots the quasi-real time evolution of θ T for each country. For the out-of-sample projections (t > T), we assume that θ t converges towards 1.0, as governed by equation (5) with κ = 0.07, which is estimated from the quasi-real time evolution of θ T for China. The dashed lines show the out-of-sample paths of θ t for each country.

Investigating the impact of asymptomatic carriers on COVID-19 transmission

Dynamics of transmission and control of COVID-19: A real-time estimation using the Kalman filter

What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios

How deadly is COVID-19? Understanding the difficulties with estimation of its fatality rate

On using SIR models to model disease scenarios for COVID-19

Estimating and forecasting disease scenarios for COVID-19 with a SIR model

Policy implications of models of the spread of coronavirus: Perspectives and opportunities for economists

The natural and unnatural histories of Covid-19 contagion

A modified SIR model for the COVID-19 contagion in Italy

The early phase of the COVID-19 outbreak in

Did state reopenings increase social interactions? Federal Reserve Bank of New York

An SIR model with behavior, The Grumpy Economist blog

Quantifying the effect of quarantine control in Covid-19 infectious spread using machine learning

The effect of containment measures on the COVID-19 pandemic

Complexity of the basic reproduction number (R0)

The macroeconomics of epidemics

Systematic biases in disease forecasting--The role of behavior change

Estimating and simulating a SIRD model of COVID-19 for many countries, states, and cities

Herd immunity: A rough guide

The impact of early social distancing at COVID-19 outbreak in the largest metropolitan area of Brazil

Explaining the boom-bust cycle in the U.S. housing market: A reverseengineering approach

Fear, lockdown, and diversion: Comparing drivers of pandemic economic decline

Clinical characteristics of coronavirus disease 2019 in China

Time series models based on growth curves with applications to forecasting coronavirus

The effect of virus control measures on the outbreak

Implications of stochastic transmission rates for managing pandemic risks

Social distancing, quarantine, contact tracing, and testing: Implications of an augmented SEIR model, Federal Reserve Bank of Richmond

Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries

A contribution to the mathematical theory of epidemics

Social distancing strategies for curbing the COVID-19 epidemic

Identification and estimation of the SEIRD epidemic model for COVID-19

Mapping global variation in human mobility

Early dynamics of transmission and control of COVID-19: A mathematical modelling study

Tracking R of COVID-19

Effect of non-pharmaceutical interventions for containing the COVID-19 outbreak in China

Real business cycles, animal spirits, and stock market valuation

COVID-19) from publicly reported confirmed cases: Estimation and application

First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: A modelling impact assessment, The Lancet

When will the COVID-19 pandemic peak? Cambridge-INET Working

Panel Forecasts of Country-Level COVID-19 Infections

Estimating epidemic exponential growth rate and basic reproduction number

Determinants of social distancing and economic activity during COVID-19: A global view

Flattening the curve is not enough, we need to squash it: An explainer using a simple model

COVID-19: The CIDRAP viewpoint. The future of the COVID-19 pandemic: Lessons learned from pandemic influenza

The unreliability of output-gap estimates in real time

Differential effects of intervention timing on COVID-19 spread in the U.S

A flexible growth function for empirical use

An empirical estimate of the infection fatality rate of COVID-19 from the first Italian outbreak

Real-time forecasts of the COVID-19 epidemic in China from

Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship

Data gaps and the policy response to the novel coronavirus, National Bureau of Economic Research

Effective lockdown index: July 15 update

Susceptible-Infected-Recovered (SIR) dynamics of COVID-19 and economic impact

Estimates of the severity of coronavirus disease 2019: A model-based analysis

Richards model revisited: Validation by and application to infection dynamics

Evolving epidemiology and impact of non-pharmaceutical on the outbreak of coronavirus disease

For the economy, cases matter more than deaths, Deutsche Bank Research, FX Blog

Nowcasting and forecasting the potential domestic and international spread of the 2019-Ncov outbreak originating in Wuhan, China: A modelling study

Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention

Associations of stay-at-home order and face-masking