key: cord-0756341-fy5oivct
authors: Yang, Wan; Shaman, Jeffrey
title: Author Correction: Development of a model-inference system for estimating epidemiological characteristics of SARS-CoV-2 variants of concern
date: 2021-12-16
journal: Nat Commun
DOI: 10.1038/s41467-021-27703-9
sha: 1c1a3af8e4d86048bfbf5d71793bbbd3b4a477ed
doc_id: 756341
cord_uid: fy5oivct

nan

Many respiratory infections tend to occur seasonally and are predominantly prevalent during certain months of the year (e.g., cold months in temperate climates). 1 This seasonal pattern has been documented for influenza viruses, 2 respiratory syncytial viruses, 3 and endemic human coronaviruses. 4 In addition, studies have showed that this seasonality may be associated with climate conditions -particularly, temperature and humidity -as they may modulate the survival and transmission of respiratory viruses. 5, 6, 7, 8 For the SARS-CoV-2 virus, our work has also shown that a winter-time seasonality exists, similar to endemic human coronaviruses in New York City (NYC), and that models accounting for this seasonality enable more accurate projection of COVID-19 pandemic dynamics than those do not. 9, 10 However, to date, no mechanistic models exist that quantify the response of the SARS-CoV-2 virus to temperature and humidity and in turn the seasonality of COVID-19. In addition, seasonal trends may differ by climate. For instance, epidemics of influenza can occur any time of the year in subtropical and tropical climates; it is thus more difficult to characterize the seasonality of respiratory infections in these climates. To address these challenges, we recently developed a flexible climate-forced model of epidemic dynamics for subtropical and tropical climates; results with this model also describe the response to temperature and humidity conditions common in temperate climates. 11 Thus, to account for the potentially diverse seasonal trends of COVID-19 in the UK (temperate climates), South Africa (mostly temperate climates), and Brazil (mostly tropical climates), we applied this climate-forcing to temperature and humidity data for each country and computed the relative seasonal trend for each country.

Specifically, the climate-forcing takes the following form: where R0(t) is the basic reproduction number at time t; q(t) is specific humidity (i.e. a measure of absolute humidity) at time t; and T(t) is temperature at time t. In essence, the forcing function assumes that specific humidity has a bimodal effect on R0, with both low and high humidity conditions favoring transmission; in addition, this effect is moderated by temperature, where low temperatures promote transmission and temperatures above a certain threshold (i.e., Tc in Eqn. S1) limit transmission. Further, to link the coefficients a, b, and c to humidity q and R0, Yuan et al. 11 reparametrized the forcing function by solving the parabola with a nadir at (qmid, R0max -R0diff) and maxima at both (qmin, R0max) and (qmax, R0max), such that: Yuan et al. 11 estimated the parameters R0max (i.e., the maximum R0), R0diff (i.e., the difference between the maximum and minimum R0), qmin, qmid, and qmax (i.e., the minimum, median, and maximum specific humidity for the response), Tc (the threshold temperature) and Texp (the exponent in Eqn S1) for influenza in Hong Kong, a subtropical city, based on long-term epidemic data collected therein during 1998 -2018. Here we use their mean estimates for these parameters and temperature and humidity data for each country (see main text and Supplementary Fig 7) to compute the seasonal trend for each country using Eqns. S1-2. However, as these parameters were estimated for influenza, the outputs do not represent the actual R0 for the SARS-CoV-2 virus. Thus, we instead compute the relative seasonal trend, by scaling the weekly country output from Eqn S1 by the country mean output, such that this scaled output provides a relative, seasonality-related transmissibility for each week of the year. These relative estimates also decouple the seasonality-related and variant-specific transmissibility (assuming no interaction; see below). The estimated seasonal trends are shown in Supplementary Fig 2 and used in the epidemic model to represent disease seasonality for each country via the parameter bt in Eqn S3 (see below).

The model-inference system developed for this study consists of an SEIRSV model to simulate the transmission dynamics of SARS-CoV-2 and the ensemble Kalman adjustment filter (EAKF) 12 to estimate the model state variables and parameters, based on case and mortality data. Here we describe the model and the filtering method in detail.

2.1. Epidemic model. The SEIRSV (susceptible-exposed-infectious-removed-susceptible-vaccination) model uses the following set of equations to simulate the transition of sub-populations between different disease stages, while accounting for disease seasonality, concurrent non-pharmaceutical interventions (NPIs), and vaccination: 

where S, E, I, R are the number of susceptible, exposed (but not yet infectious), infectious, and removed (i.e., recovered or immune) individuals, respectively; N is the population size. The parameter ε represents travel-related importation of infections (nominally set to 1 per 20 days per 1 million population, unless specified otherwise).

To account for local seasonality, bt, the estimated relative seasonal trend for each country (see Section 1 above and Supplementary Fig 2) , is used to adjust the relative transmission rate at time t. For instance, for the UK ( Supplementary Fig 2A) , the estimated seasonal trend bt is 1. 15 for week 1 (a week during winter), indicating the transmission rate for that week is 1.15 times the yearly average, due to the more favorable winter weather conditions; similarly, estimated bt is 0.86 for week 27 (a week during summer), indicating the transmission rate for that week is 86% of the yearly average, due to the less favorable summer weather conditions. Different NPIs (e.g., stay-at-home mandate for nonessential workers, school closures, masking mandate, and other social distancing rules and preventive measures) have been implemented during the COVID-19 pandemic. However, the specific NPIs implemented varied by location and time; in addition, the effectiveness of each NPI remains unclear. It is thus difficult to detail and capture specific NPI and effectiveness in the model. Nevertheless, previous studies have shown that the overall impact of NPIs (e.g., reduction in the reproduction number Rt) has been highly correlated with population mobility during the COVID-19 pandemic. 9, 13 14 As such, here we used the relative mobility as observed in each study location to represent the overall impact of NPIs. Specifically, to account for concurrent NPIs, the term mt, the relative population mobility at time t (in this study, we use data from Google Community Mobility Reports; 15 see main text and Supplementary Fig 2) , is used to adjust the transmission rate. For instance, for the UK ( Supplementary Fig 2A) , relative mobility mt was 41% of pre-COVID levels for the week starting March 22, 2020; per Eqn 3, the overall transmission rate for that week was reduced by 41%, before adjusting for the effectiveness. To further account for the potential changes in effectiveness, the model additionally includes a parameter, et, to scale NPI effectiveness at time t. This effectiveness (et) was then estimated by the model-inference system along with other model variables and parameters.

For virus-specific characterization, N K is the variant-specific transmission rate at time t, Z is the latency period, D is the infectious period, and L is the immunity period. Note that the parameters et, N K , Z, D, and L are estimated by the model-inference system as described below.

To incorporate changes in population susceptibility due to vaccination, the model accounts for two-dose vaccination via v1(t) and v2(t). Specifically, v1(t) is the number of individuals successfully immunized after the first dose of the vaccine and is computed using vaccination data and vaccine efficacy for one dose (see detailed settings in Supplementary 10 Specifically, we include 1) a time-lag from infectiousness to detection (i.e., an infection being diagnosed as a case) -drawn from a gamma distribution with a mean of Tm and standard deviation (SD) of Tsd days -to account for delays in diagnosis and detection; 2) an infection-detection rate (r), i.e. the fraction of infections (including subclinical or asymptomatic infections) reported as cases, to account for underdetection; 3) a time-lag from infectiousness to death, drawn from a gamma distribution with a mean of 14 days and a standard deviation of 10 days, empirically based on mortality data; and 4) an infection-fatality risk (IFR), i.e., the fraction of infections that die from COVID-19. To compute the model-simulated number of new cases per week, we multiply the modelsimulated number of new infections per day by the infection-detection rate, and further distribute these simulated cases in time per the distribution of time-from-infectiousness-todetection. We then aggregate the daily lagged, simulated cases to weekly totals for model inference (see below). Similarly, to compute the model-simulated deaths per week and account for delays in time to death, we multiply the simulated-infections by the IFR and then distribute these simulated deaths in time per the distribution of time-from-infectiousness-to-death, and aggregate these daily numbers to weekly totals. For each week, the infection detection rate (r), the mean (Tm) and standard deviation (Tsd) of time-from-infectiousness-to-detection, and the IFR are estimated based on weekly case and mortality data, along with other model parameters.

2.3. Inference using the EAKF At the end of each week, the inference system uses the EAKF to update the state variables and parameters based on model-generated prior estimates and case and mortality data. Briefly, the EAKF uses an ensemble of model realizations (n=500 here), each with initial parameters and variables randomly drawn from a prior range (see Supplementary Table 2 ). After model initialization, the system integrates the model ensemble forward in time for a week (per Eqn S3) stochastically to compute the prior distribution for each model state variable or parameter, as well as the model-simulated number of cases and deaths for that week as described in Section 2.2. The system then combines the prior estimates with the observed case and death data for the same week to compute the posterior per Bayes' theorem. 12 During this filtering process, the system updates the posterior distribution of all model parameters and variables for each week. 12 As such, it is able to capture the time-varying changes in transmission dynamics including the variant-specific transmission rate (N K ) and infectious period (D) -the two parameters that we use to compute variant-specific transmissibility over time.

However, unlike previous studies using similar model-inference approaches, here we further modify the EAKF filtering process to test different potential combinations of changes in transmissibility and immune escape. To enable this exploration of systemic changes (e.g. due to the emergence of a new variant), we randomly replace a small fraction of ensemble members (3-10%) using values randomly drawn from specified ranges. This technique, termed space reprobing (SR), was developed in order to explore state space without corrupting performance of the filter. 16 Specifically for this application, we apply SR to a given related set of parameters/variables and restrict the EAKF update of non-related parameters/variables, for 14 different hypothesized behaviors. These hypothesized changes are as follows:

1) Hypothesis 1 (minor changes in transmissibility, no immune escape): Large updates are only allowed for the two transmissibility-related parameters N K and D; to explore the changes, the system applies SR to these two parameters using values drawn from prior ranges 10-20% higher than the initial priors. 2) Hypothesis 2 (major changes in transmissibility only, no immune escape): Similar to 1); but to explore the changes, the system applies SR to N K and D using values drawn from prior ranges 30-40% higher than the initial priors.

3) Hypothesis 3 (minor immune escape only, no changes to transmissibility): Large updates are only allowed for S, the population susceptibility, up to a total loss of 50% of the prior immunity. 4) Hypothesis 4 (major immune escape only, no changes to transmissibility): Large updates are only allowed for S, the population susceptibility, up to a total loss of 95% of the prior immunity. 5) Hypothesis 5 (minor changes in transmissibility + minor immune escape): combining 1) and 3) above. 6) Hypothesis 6 (major changes in transmissibility + major immune escape): combining 2) and 4) above. 7) Hypothesis 7 (minor changes in transmissibility + major immune escape): combining 1) and 4) above. 8) Hypothesis 8 (major changes in transmissibility + minor immune escape): combining 2) and 3) above. 9) Hypothesis 9 (changes in both transmissibility and immune escape, no restriction on magnitude of change): Large updates are allowed for N K and D as well as S. To explore the changes, initial SR uses values drawn from prior ranges 10-20% higher than the initial priors, and values up to 30-40% higher than the initial priors if the inference system detects the prior continues to underestimate the observed cases and deaths with the 10-20% initial increase in SR values. In addition, SR allows updates of S up to 95% of the prior immunity.

To account for slower changes in overall population immunity (i.e., in the entire country) as the new variant gradually spreads to different sub-regions across a large geographic space, such as in Brazil, we also explore the fitting using the following five additional settings: 10) Hypothesis 10 (immune escape only and changes to overall population immunity occur slowly over time): Large updates are only allowed for S, up to a total loss of 95% of the prior immunity; however, SR is applied to a smaller fraction of ensemble members than in 1)-9) such that changes in S occur gradually. 11) Hypothesis 11 (minor changes in transmissibility + minor immune escape; both occur slowly over time): Large updates are allowed for N K and D as well as S. Adjustment to S is allowed as in 10) but up to only 50% of prior immunity. In addition, for transmissibility, the system applies SR to N K and D using values drawn from prior ranges 10-20% higher than the initial priors. 12) Hypothesis 12 (major changes in transmissibility + minor immune escape; both occur slowly over time): Similar to 11); however, for N K and D, initial SR uses values drawn from prior ranges 10-20% higher than the initial priors, and values up to 30-40% higher than the initial priors if the inference system detects the prior continues to underestimate the observed cases and deaths with the 10-20% initial increase in SR values. 13) Hypothesis 13 (minor changes in transmissibility + major immune escape; both occur slowly over time): Similar to the settings specified in 11) but adjustment to S is allowed up to 95% of prior immunity. 14) Hypothesis 14 (major changes in transmissibility + major immune escape; both occur slowly over time): Similar to settings specified in 12) but adjustment to S is allowed up to 95% of the prior immunity.

We carry out the model-inference process for each of the 14 settings described above and for each country dataset. We then select the most plausible hypothesis for each country based on the following criteria: 1) model fitting to case and mortality data, as indicated by the relative root-mean-squared-error (RRMSE) between the posterior estimates for the corresponding variable (i.e. case rate or mortality rate) and data; 2) the accuracy of one-step ahead prediction, as indicated by the RRMSE between the prior estimates for the corresponding variable (i.e. case rate or mortality rate) and data; 3) the level of adjustment needed for two key variables, i.e., infection rate and case rate, as indicated by the RRMSE between the prior and posterior estimates for each variable; 4) a penalty on the number of variables needing SR adjustment; and 5) a penalty on the frequency of SR adjustment. We combine all these metrics by weighting them heuristically using the following set of weights: 0.27 for the two metrics in 1); 0.13 for the two metrics in 2), 0.03 for the two metrics in 3), and 0.07 for both 4) and 5). We also tested other sets of weights and found that higher weights should be given to 1) and 2) based on results from the synthetic testing where the 'true' values of the state variables and parameters are known; however, in general, the final results are similar if there are minor changes to these weights.

To account for model stochasticity, we repeat each model-inference 100 times for each dataset, each with initial parameters and variables randomly drawn from the prior distributions (Supplementary Table 2 ). Each model-inference tests the 14 hypotheses described above, selects the one with the best performance (i.e. minimizing the combined metric described above), and outputs the estimates of the best-performing run. That is, the model estimates reported in the main text are aggregated from 100 best-performing model runs (each with 500 ensemble members and totaling 50,000 individual model realizations).

To test the accuracy of our model-inference system, we generate 10 synthetic datasets using a separate multi-variant SEIRS model, similar to models developed in Yang et al. 17 and Gog & Grenfell. 18 Within this model, variants can interact via cross-immunity, which protects a portion of individuals with prior infection (i.e. polarized immunity) by reducing transmission. Specifically, the model takes the following form:

where N is the population size; Si, Ei, Ii, and Ri, are, respectively, the numbers of susceptible, exposed-but-not-yet infectious, infectious, and recovered individuals, with respect to variant-i (here, the wild-type SARS-CoV-2 virus or a new variant); N = , Di, and Li are, respectively, the transmission rate, mean infectious period, and mean immunity period, for variant-i; and cij measures the strength of cross-immunity to variant-i conferred by infection with variant-j (e.g., close to 0 if it is weak and cii=1 for infection by the same variant). The parameter ε represents travel-related importation of infections; to generate the synthetic data (i.e., "truths"), we set ε to 1 per week for the first 5 weeks and 1 every 3 days for the rest of the first wave (here weeks 1-20); for the second wave, as transmission has been established locally, we set ε to 0 for simplicity. The terms bt, mt, and et are the same as in Eqn S3 and account for seasonality and NPIs over time. For simplicity, we omit birth, death, and vaccination.

To generate the synthetic data (i.e., "truths"), we seed the Eq. S4 model with 2 infections of wild-type virus at the start of each simulation and 50 infections of a new variant at the start of Week 21, for N = 1 million people; we run the model stochastically with a daily time-step from the week starting 3/1/2020 to the week starting 2/21/2021 (i.e. 52 weeks in total) using the parameters listed in Supplementary Table 3 . To compute the weekly number of cases and deaths, we use the same procedure as described in Section 2.2 above for each variant. We then combine the case/mortality estimates for both variants, add random noises drawn from a Poisson distribution to mimic observational error. The final noise-added weekly case and mortality time series are then used as synthetic data for testing the model-inference system (described in Section 2 above). To compare the posterior estimates of key parameters and variables (e.g. transmissibility and population susceptibility) from the model-inference system, we compute the true values of population susceptibility and transmissibility over time as the weighted average of the two variants based on the relative prevalence during each week. 

We modify the multi-variant model in Eqn S4 to further include age structure and vaccination. The inclusion of age structure here allows incorporation of age-specific parameters (e.g., transmission rate and infection-fatality risk) as well as age-specific vaccination coverage and rates. Specifically, this multi-variant, age-structured SEIRSV model takes the following form:

Model parameters in Eqn S5 are similar to those in Eqn S4, except for those related to age, which are indicated by the superscripts. , where N = Y; is the transmission rate of variant-i from age-group a to age-group A). The vaccination model component is similar to Eqn S3, but with age-stratification. However, the terms R =,S Y ($) and R =,* Y ($) are variant-specific, as indicated by the additional subscript i; that is, they additionally account for the reduction in vaccine efficacy against the new variants, based on scenario assumptions specified in Supplementary Table 5 .

As Due to a lack of information, we do not account for potential differences in infection fatality risk by variant; therefore, the simulated mortality under different scenarios only reflect the relative infection rate by age group, for which we apply age-specific infection-fatality risk (see Supplementary Table 5 ). In addition, due to the uncertainty of the infection fatality risk among breakthrough infections (i.e., those who have been vaccinated), we only show mortality-related simulations for the "Same VE" scenario which assumes no reduction in VE.

We run the model for each scenario 1000 times stochastically, with the parameters and initial conditions randomly drawn from uniform distributions with ranges specified in Supplementary  Table 5 . Results are summarized from the 1000 model runs for each scenario. Supplementary Fig 1. Model-inference system validation using model-generated synthetic data with an infection-detection rate of 10%. For this testing, the true values of incidence and mortality by week (A), transmissibility by week (B, top panel), population susceptibility by week (b, bottom), and overall changes in transmissibility and immune escape due to a new variant (C) were generated by model simulations with prescribed parameters and conditions. Unlike the real-world in which most epidemiological characteristics are unobserved, here these quantities (i.e. the 'Truth') are prescribed and known and thus can be compared to estimates made with the model-inference system using the synthetic, model-generated incidence and mortality data Table 5) and NPIs (as indicated by the x-axis labels, see detail in Supplementary Weekly average temperature and specific humidity for the three countries. These data were used to estimate the disease seasonality for each country. 

Seasonality of Respiratory Viral Infections

Global influenza seasonality: reconciling patterns across temperate and tropical regions

Epidemic dynamics of respiratory syncytial virus in current and future climates

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period

Environmental predictors of seasonal influenza epidemics across temperate and tropical climates

Absolute humidity modulates influenza survival, transmission, and seasonality

Absolute humidity and the seasonal onset of influenza in the continental United States

Mechanisms by which ambient humidity may affect viruses in aerosols

Effectiveness of non-pharmaceutical interventions to contain COVID-19: a case study of the 2020 spring pandemic wave in New York City

Estimating the infection-fatality risk of SARS-CoV-2 in New York City during the spring 2020 pandemic wave: a model-based analysis. The Lancet Infectious diseases 21

Modeling influenza seasonality in the tropics and subtropics

An ensemble adjustment Kalman filter for data assimilation

Timing of Community Mitigation and Changes in Reported COVID-19 and Community Mobility -Four

The effect of human mobility and control measures on the COVID-19 epidemic in China

A simple modification for improving inference of non-linear dynamical systems

Dynamic interactions of influenza viruses in Hong Kong during

Dynamics and selection of many-strain pathogens

Projections of COVID19 Epidemic Outcomes and Healthcare Demands for

Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside Hubei province, China: a descriptive and modelling study

Estimates of the severity of coronavirus disease 2019: a model-based analysis

Large outbreak during the first wave (See Fig 1 and Supplementary Fig 1) . No increase in transmissibility (i.e. same β and D as for the wildtype virus); 80% increase in immune escape. Other parameters same as the wildtype virus.

Large outbreak during the first wave (See Fig 1 and Supplementary Fig 1) . 50% increase in transmissibility (i.e. β = 0.65 x 1.5 = 0.975 per day); no immune escape. Other parameters same as the wildtype virus.

Large outbreak during the first wave (See Fig 1 and Supplementary Fig 1) . 50% increase in transmissibility (i.e. β = 0.65 x 1.5 = 0.975 per day); 80% increase in immune escape. Other parameters same as the wildtype virus.

Large outbreak during the first wave (See Fig 1 and Supplementary Fig 1) . 25% increase in transmissibility (i.e. β = 0.65 x 1.25 = 0.8125 per day); 40% increase in immune escape. Other parameters same as the wildtype virus.

Small outbreak during the first wave (See Fig 1 and Supplementary Fig 1) . 50% increase in transmissibility (i.e. β = 0.65 x 1.5 = 0.975 per day); no immune escape. Other parameters same as the wildtype virus.Infectiondetection rate 20% for results shown in Fig 1 and 10% for results shown in Supplementary Fig  1.