key: cord-0853032-sqdk57f9 authors: Mandal, M.; Mandal, S. title: Detection of transmission change points during unlock-3 and unlock-4 measures controlling COVID-19 in India date: 2020-11-18 journal: nan DOI: 10.1101/2020.11.17.20233221 sha: 87f4eaa341b9c3bb814eaa5cfc4880eb92ea464d doc_id: 853032 cord_uid: sqdk57f9 Documentation in scientific literature is not available on prospective evaluation of the efficiency of the unlock measure related to COVID-19 transmission change points in India, projecting the infected population, planning suitable measures related to future interventions and lifting of restrictions so that the economic settings are not damaged beyond repair. We have applied SIR model and Bayesian approach combined with Monte Carlo Markov algorithms on the Indian COVID-19 daily new infected cases from 1 August 2020 to 30 September 2020. We showed that the COVID-19 epidemic declined after implementing unlock-4 measure and the identified change-points were consistent with the timelines of announced unlock-3 and unlock-4 measure, on 1 August 2020 and 1 September 2020, respectively, effectiveness of which were quantified as the change in both effective transmission rates (100% reduction) and the basic reproduction number attaining 1, implying measures taken to control and mitigate the COVID-19 epidemic in India managed to flatten and recede the epidemic curve. To contain COVID-19 spread in India, strong phasic lockdowns were implemented leading to reduction of human contact to a maximum 55%, and 34% at the end of lockdown, followed by stratified unlock measures with gradual return to activities, controlling social contacts to 19% reduction, as on 30 September, 2020 (IHME, 2020). India, currently is the world's second-worst-hit country with nearly 11.7 million COVID-19 infections including more than 98,000 deaths, as on 30 September 2020 (COVID19India, 2020). Until COVID-19 is completely eradicated, and effective treatment or vaccine become available, non-pharmaceutical intervention policies are the key public health options to control the epidemics (Varghese et al., 2020) . During the evolution of COVID-19, India implemented lockdowns in four phases from 24 March to 31 May 2020 as containment and mitigation measure, followed by unlocks in four phases from 1 June to 30 September 2020, featured by conditional relaxations of restrictions outside containment zones in graded manner, to minimise the negative economic and social consequences of strict lockdown measures (MOHWF, GoI, 2020). With the escalating case numbers and prolonged COVID-19 epidemic situation in India, the present study is an investigation to determine, if within the currently implemented non-pharmaceutical strategies, taking into account the simulated stochastic SIR model of transmission dynamics, is effective in curbing the spread of COVID-19. For this purpose, we made posterior inference on transmission rate , recovery rate , reproduction number 0 , number of initially infected people 0 , reporting delay , width of liklihood between observed daily infected cases and its best fit estimates, effective transmission rate λ * = λ − μ, based on data-driven likelihood updates of prior settings. We determined also the changepoints in disease transmission and investigated the effectiveness of unlock-3 and unlock-4 measures, with respect to their strength, timing and duration. We used an open source probabilistic programming in Python code PyMC3 with theano to compute gradients via automatic differentiation variational inference (ADVI), and followed model interpretation on German COVID-19 data (Dehning et al., 2020) , based on GitHub repository (Priesemann-Group, 2020), to analyse the recent COVID-19 pandemic situation in India with emphasis on unlock-3 and unlock-4 measure. The data on ongoing new daily and cumulative COVID-19 cases in India were retrieved from Johns Hopkins University Centre for Systems Science and Engineering dashboard, up to September 30, 2020 (CSSE, 2020). The codes for this research article pertaining to the analysis of unlock-3 and unlock-4 situation in India was run on Jupyter notebook using PymC3=3.8 and was based on GitHub repository (Priesemann-Group, 2020), by importing python-based data analysis toolkit (pandas); libraries for working with arrays (numpy), plotting (matplotlib), scientific and technical computing (scipy), and multi-dimensional arrays (theano); modules including Basic date and time types (datetime), Systemspecific parameters and functions (sys), and Python object serialization (pickle); package for Bayesian statistical modeling and probabilistic machine learning with MCMC and ADVI algorithms (PymC3) (Kucukelbir et al., 2007) . The SIR model was based on time-varying cumulative number of COVID-19 cases, where the total population size ( ) was categorized into three mutually exclusive infection levels, assuming that any infectious person ( ), is likely to contact any susceptible person ( ), and later recovered ( ), so that = + + . The dynamics of the pandemic in India was modelled using the following three differential equations: where represents the transmission rate of the infected people to infect susceptible people and denotes the recovery rate of the infected people to recover (Kermack et al., 1927) . This is solved by using a forward finite-difference scheme (Carcione et al., 2014) : where is a natural number which divides time in discrete time steps, = . The fraction of maximum number of infected people, ( ) = 1 + The SEIR model is an extension of the SIR with an added exposure ( ) period due to the reported incubation period of COVID-19 during which individuals are not yet infectious (Hethcote, 2000) . The SEIR models the total population size ( ) divided into four mutually exclusive infection stages, = + + + , and is based on following differential equations: where, σ is the rate at which individuals in incubation become infectious. The differential equation is solved as (Carcione et al., 2014) : A reporting delay D, was incorporated in becoming infected ( ) and being reported, such that the daily reported cases at any time t was given by (Dehning et al., 2020 ), = − . To examine if there was any weekend effect on daily reported case numbers, a periodic sine function was assigned to the reporting fraction ( ) expressed as, where and are the weekly modulation amplitude and phase respectively (Dehning et al., 2020). where, ( ) = ∏ ( , ) =1 is the likelihood function. The likelihood is a measure of the goodness of fit between model prediction and the observed data on reported case numbers, applied hereby using Student-t distribution. The evidence = ( 1 , … , ) = ∫ ( 1 , … , , ) ( ) = ∫ ( ) ( ) . (9) The Bayesian posterior interval estimate for and , ∈ (0,1), = ( , ), is given by, The Bayesian predictive distribution is ( , ) = ∫ ( , ) ( | ) (11) The inferences about a function = ( ), so that cumulative distribution function for is ( , ) = ( ( ≤ , ) where = ( : g(θ) ≤ t); the posterior density is ( , ) =ˊ( , ). The prior distribution settings for the model parameter estimation were made by incorporating LogNormal values of λ, μ, and D and half-Cauchy distribution for I0, and σ (Table 1 and 2). The priors on change points in transmission rate were based on announcements of applied intervention including unlock-3 on 1 August 2020 and unlock-4 on 1 September 2020. This method is essentially a Monte Carlo sampling with multiple Markov chains, used to approximate the posterior distribution of model parameters by including ADVI, 1000 tuning steps with NUTS (No U Turn Sampling) algorithm (Hoffman et al., 2014) for each of four chains, and R-hat diagnostics for equilibrated chain convergence of model parameters (Vehtari et al., 2019) . A sequence of random variables{ 1 , … , }, on a discrete state apace is called a Markov chain if ( = , −1 = −1 , … , 1 = 1 ) = ( = , −1 = −1 ) (13) We wanted to find a setting of a parameter ∈ , such that the expectation ℎ( ) ≡ ( , ) = 0, the updates were applied as (Hoffman et al., 2014) : where t is iteration, is the step size schedule, is a freely chosen point that the iterates are shrunk towards, > 0 is a free parameter that controls the amount of shrinkage towards , 0 ≥ 0 is a free parameter that stabilizes the initial iterations of the algorithm, ≡ − is a step size schedule satisfying is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; the conditions, ∑ = ∞; ∑ 2 < ∞. The step size parameter was set for NUTS using stochastic optimization with vanishing adaptation of the primal-dual algorithm. For each iteration we defined the statistic and its expectation when the chain reached equilibrium as (Hoffman et al., 2014) : where is the set of all states explored during the final doubling of iteration of the Markov chain; −1 , ,0 are the initial position and sampled momentum for the ℎ iteration of the Markov chain; is the average acceptance probability. We applied in the above updates equation: and ≡ for the step size to combine ℎ = for any ∈ (0,1). For model fit and comparison using MCMC, following computations were made using (leave one out) package in PyMC3: the Bayesian estimate of the expected log pointwise predictive density ( -) for a new point, standard error ( ) of -, the difference between and the non-cross-validated log posterior predictive density ( )interpreted as the effective number of parameters (Vehtari et al., 2017) . Lower scores indicated better consistency between models. scores with < 1 represented model compatibility while scores > 1 indicated mismatch between the models. is the distribution of the true data generating process for ~, which is approximated by cross-validation with computed with draws from a posterior distribution=̂= computed log pointwise The -, calculated by cross-validation by running the model times is The median posterior distribution of COVID-19 epidemiological parameters using SIR model (from Eqs. 1 and 2) combined with Bayesian inference (generated by Eqs. from 8 to 15) during unlock-3 were Table 3 ). The median estimates of 0 decreased from 1.07 in the unlock-3 period to 1.00 in the unlock-4 period, with SIR model, indicating slowing down of the spread of the disease. The 0 was reported as 1.14 at the end of August 2020, and 1.12 in the mid of September 2020 in India, with stationary-time-series auto regressive integrated moving average model (Yadav et al., 2020) . The mathematical models help to determine the effect of preventive policies against COVID-19, primarily by maintaining the reproduction number 0 < 1 to inhibit further spread of infection, whereas 0 > 1 indicate continuation of the epidemics, which fade away when the transmissibility is reduced by (1 − is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; the end of unlock-3 (Figs. 1B and 2B). The difference in daily infected cases showed approximately 1.125 fold change to 5,625 during unlock-4 on 30 September, 2020, from 5000 during unlock-3 on 31 August, 2020 (Figs. 1C and 2C). The COVID-19 daily cases increased during unlock-3 but decreased during unlock-4, though with a higher end point (Figs. 1A and 2A) . The real-time daily and cumulative cases were consistent with SIR-model displaying linear-semi-logarithmic variation. A decreasing first order differences of the logarithm of the cumulative cases over time indicated exponential growth as found for India, whereas a constant trend indicated logarithmic growth of the epidemic curve as seen in the US, for the period from 17 September 2020 to 1 October 2020 (Baruah, 2020) . Both daily and cumulative infected cases remain unaltered until the duration of and change-point were over, beyond which both continued to rise, build on the hypothesis that ongoing unlock phase and its post-effect prevailed, but continuation of pre-unlock situation caused decline in both, the effect being more significant in the daily cases ( Fig. 3A and 3D) . The daily as well as cumulative case numbers showed rising trend in post unlock-3 scenario (Fig. 3A) , however, the onset timings of intervention had no effect on case numbers (Fig. 3B) . With declining new cases and rising cumulative cases (Fig. 3D) , during unlock-4, advancing or delaying change-point onset by five days showed insignificant difference in cumulative cases (Fig. 3E) . The cumulative cases remained unchanged with the change in transient duration of intervention, the new cases showed similar variation as ( Fig. 3C and 3F) . The SIR-model parameters with two identified change-points without weekend effect showed that the first change-point matched with the timelines of publicly announced strategies around 1 August 2020, when unlock-3 began, coinciding with continued closure of educational institutions and banned social gatherings, permission of interstate transport, besides release of night curfews. This change-point featured λ 1 = 0.17 (0.12 − 0.21) that unfolded over 2.9 days (1.6 − 5.4) (Fig. 4F) . The second changepoint was detected around 1 September 2020, which coincided with the announcement of unlock-4, featured by lockdown measures remaining in force in containment zones, some activities permitted outside containment zones with reopening of metro-rail in graded manner, small gatherings permitted, continued compulsion of face-masking in public. The second change point had λ 2 = 0.15 (0.10 − 0.19) that unfolded over 3.3 days (1.7 − 6.7) (Fig. 4F ). With λ 0 = 0.16, = 15.6, = 9.6 days, = 0.15, the change-points were quantified as λ 1 * = λ 1 -= 0.02, 0 = 1.13 during unlock-3 and λ 2 * = λ 2 -= 0, 0 = 1 during unlock-4, implying effectiveness of unlock-4 measure bringing 100% reduction of * and decline of the COVID-19 epidemic (Fig. 4) . Thus the effectiveness of an intervention modelled as Bayesian change points could help us interpret the impact of different control measures and to include them into forecasts. Previously, Bayesian inference of COVID-19 change points correlating with social distancing restrictions were applied using the example of Germany (Fig. 5) . Sampling were run for 4 chains with 1000 tunes and 500 draw iterations so that a total of 6000 draws occurred. Thus, in our study the lower in the former, differing by 1.18 , indicated higher consistency of the twochange-points SIR model with data excluding the weekend-factor that implied homogenous reporting of daily new cases through the entire week irrespective of weekend effect. However, higher number of COVID-19 daily new cases were reported during weekdays compared to weekends in Germany, substantiated by lower score in weekend-effect model compared to that without weekend-effect model (Dehning et al., 2020) . Application of SEIR model based on Eqs. 4 and 5 with two-change-points and weekend-modulation as per Eqs. 6 and 7 ( = 0.8, = 2.7), exhibited greater negativity of the effective transmission rate with unlock-4 compared to unlock-3 measure (λ 1 * = 1 − = −0.03, λ 2 * = 2 − = −0.07) (Fig. 6) , and higher transmission and recovery rates ( 0 = 0.36, 1 = 0.30, 2 = 0.26, = 0.33) (Fig. 6) compared to the SIR model (Fig. 5) . Cross-validation showed the -for the SEIR-model (−650.94, = 7.16, = 13.66, computed from 2000 by 68 log-likelihood matrix) slightly greater (by 5.97) than that of the corresponding SIR-model (Fig. 5) , with 1 (0.34) higher variation (Table 3) , indicating considerably greater evidence for SIR model with respect to SEIR in explaining current COVID-19 data in Indian context. Similarly, SIR model displayed superior goodness of fit to the SEIR on South African data whereas SEIR produced a slightly better score than the SIR main model on German data (Dehning et al., 2020; Mbuvha et al., 2020). Estimating the effect of change-points is vital for priors settings that help to anticipate the effects of any impending change points and accordingly make future projections. The SIR-model with one change point (Fig. 7) centred around the implementation of unlock-3 on 1 August 2020 showed superior goodness of fit with the COVID-19 observed data in India compared to the model with two changepoints announcement of unlock-3 and unlock-4 on 1 August 2020 and 1 September 2020 respectively (Fig. 5) ; both models examined over the period from 25 July 2020 to 30 September 2020, with weekendmodulation. This was evident from the lower (by 43.14) score with one change point (−700.05, = 6.14, = 6.24) (Table 3) , ( 0 = 0.17, and 1 = 0.16, = 0.15, = 17.6, = 0.8, = 2.6, = 8.7 days, λ 1 * = 1 − = 0.01) (Fig. 7) that fitted the observed data better compared to that with two change-points (Fig. 5) with < 1 (= 0.68) lower difference (Table 3 ). This was also clear from the simulation effect of hypothetical inventions on future COVID-19 cases in India, which showed that continuation of pre-unlock situation would have caused further decrease in both daily new and cumulative infected cases ( Fig. 3A and 3D (Table 3) , using Eqs. 16 to 18. SIR models with three change-points described the data better than fewer change points on German data and SIR model with two change points was the best fit on South African data, as exhibited by the cross-validation and all change points coinciding with respective government interventions (Dehning et al., 2020; Mbuvha et al., 2020) . Surveillance of COVID-19 pandemic involved a reporting delay factor (range: 7 − 13 days) that was composed of testing delay between the incubation period of the virus (time period for the symptoms to develop following infection with the virus, with median estimates of 5 − 6 days) and the testing date (1 − 3 days); an additional delay occured between the testing date and results date (1 − 4 days) (Lauer et al., 2020) . The D extrapolated from SIR-model with two change-points (12.8 days) (Fig. 5F ) versus one change point (8.7 days) with weekend modulation (Fig. 7F) , SIR model with two change points without weekend modulation (10.2days) (Fig. 5F ), SEIR model with two change points with weekend modulation (reporting plus incubation delay 10.9 days) (Fig. 6F ), indicate consistency with the above mentioned summated delay factor; the inherent difference within the obtained values might be due to the experimental conditions and model types adopted. The median change duration in all such situations were estimated to be around 3 days that were necessary to enact interventions, in the form of continuing is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; and lifting of restrictions based on containment areas. Thus, the reporting delay combined with the change duration ranged from 11 to 16 days, which represented the time gap required to identify any change points in infected case numbers that in conjunction with the effective COVID-19 transmission rate, help to determine pertinent containment measures. The India COVID-19 daily infected case numbers using SIR and SEIR models were estimated to be around 75,000 and 72,000 respectively; the cumulative infected case numbers using both models were estimated at 8000,000 as of October 18, 2020 (for the period from 25 July 2020 to 30 September 2020); the effective transmission rate stabilized at less than zero, 0 = 1 and 0.79 using SIR and SEIR models respectively (Figs. 5 and 6 ). Overall, the SIR model, including weekend-modulation and one-change point, with continuation of intervention similar to the unlock 3 situation was favoured over other models. However, the finding from the SIR model including weekend-modulation and two-change points that, was 0.15, * was 10% and 'zero' during unlock-3 and unlock-4 respectively, implied unlock-4 measure brought 100% reduction of * , beginning around 5 September, 2020 (Fig. 5A-G) , indicating new recoveries exceeding the new infections. Therefore, the epidemic curve is expected to decline to the baseline level, when the effective transmission rate becomes remarkably negative leading to sustained dwindling of new infections, provided no re-infection occurs and non-pharmaceutical interventions such as voluntary face-masking, physical-distancing, in addition to government measures including graded lockdown intervention in containment zones are maintained. Author's contribution: Manisha Mandal and Shyamapada Mandal jointly designed the study, analysed and interpreted the data, discussed and wrote the manuscript. There is no conflict of interest by the authors. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; Each unlock measure featured re-opening of activities outside Containment Zones in phased manner and strict lockdown in containment zones only. Specifically unlock-3 removed night curfews, reopened recreational centres like gymnasiums and yoga centres. Unlock4 reopened metro rail in graded manner, and permitted limited gatherings. Under the extended relaxations, social distancing were hypothesized to be ~ 0.9 factor stronger and ~0.9 factor milder respectively, as a pre-and post-effect of unlock measure. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; https://doi.org/10.1101/2020.11.17.20233221 doi: medRxiv preprint Time-series SIR model fit estimates of the (A) effective transmission rate * ( ), (B) daily infected cases compared to the observed data, and (C) cumulative infected cases compared to the observed data. Inset shows semi-log plots. Underreporting factor on 18 October 2020, for daily and cumulative infected cases were and respectively, using SIR model. (D-G) Priors and posterior distribution of model parameters, values are expressed in median and 95% of the posteriors. The SIR exhibited 5.97 lower LOO score than SEIR, both models with 2 change points and a weekend factor. Thus SIR model represented better consistency with observed data compared to SEIR model. Time-series SIR model fit estimates of the (A) * ( ), (B) daily infected cases compared to the observed data, and (C) cumulative infected cases compared to the observed data. Inset shows semi-log plots. Underreporting factor on 18 October 2020, for daily infected cases were higher using SEIR model compared to SIR model whereas the underreporting factor for cumulative infected cases were same as SIR model. (D-G) Priors and posterior distribution of model parameters, values are expressed in median and 95% CIs of the posteriors. The SEIR model featured an additional incubation period with prior lognormal (5, 1) scale parameter 0.418, and an initial exposed function 0 ~H alfCauchy (10) . The corresponding priors for reporting delay were (5, 0.2), 0 (2, 0.7), (0.3, 0.3), and the other priors were same as SIR model. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted November 18, 2020. ; https://doi.org/10.1101/2020.11.17.20233221 doi: medRxiv preprint A Numerical Study of the Current COVID-19 Spread Patterns in India, the USA and the World Bayesian inference of phylogeny Bayesian inference in statistical analysis In wave Fields in Real Media: Theory and numerical simulation of wave propagation in anisotropic, anelastic, porous and electromagnetic media (3 rd edn Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions Impact of non-pharmaceutical interventions (NPIS) to reduce COVID-19 mortality and healthcare demand The mathematics of infectious diseases The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo Institute for Health Metrics and Evaluation (IHME), 2020. COVID-19 mortality, infection, testing, hospital resource use, and social distancing projections BayesSMILES: Bayesian segmentation modeling for longitudinal epidemiological studies A contribution to the mathematical theory of epidemics Automatic differentiation variational inference The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application COVID-19 pandemic scenario in India compared to China and rest of the world: a data driven and model analysis Bayesian inference of COVID-19 spreading rates in South Africa Ministry of Health and Family Welfare Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China COVID-19 in India: Moving from containment to mitigation Practical Bayesian model evaluation using leave-oneout cross-validation and WAIC Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC The peak of COVID-19 in India