key: cord-322543-lo1ra50f authors: Li, Z.; Xu, T.; Zhang, K.; Deng, H.-W.; Boerwinkle, E.; Xiong, M. title: Causal Analysis of Health Interventions and Environments for Influencing the Spread of COVID-19 in the United States of America date: 2020-09-29 journal: nan DOI: 10.1101/2020.09.29.20203505 sha: doc_id: 322543 cord_uid: lo1ra50f
As of August 27, 2020, the number of cumulative cases of COVID-19 in the US exceeded 5,863,363 and included 180,595 deaths, thus causing a serious public health crisis. Curbing the spread of COVID-19 is still urgently needed. Given the lack of vaccines and effective medications, non-pharmaceutical interventions are the major option to curtail the spread of COVID-19. Accurately estimating the potential impact of different non-pharmaceutical measures on containing the spread of COVID-19, and identifying the risk factors that influence it, is crucial for planning the most effective interventions to curb the spread and to reduce deaths. Additive model-based bivariate causal discovery for scalar factors and multivariate Granger causality tests for time series factors are applied to the surveillance data of lab-confirmed COVID-19 cases in the US, University of Maryland (UMD) data, and Google mobility data from March 5, 2020 to August 25, 2020 in order to evaluate the contributions of social-biological factors, economics, the Google mobility indexes, and the rate of virus testing to the number of new cases and the number of deaths from COVID-19.
We found that active cases/1000 people, workplaces, tests done/1000 people, imported COVID-19 cases, unemployment rate and unemployment claims/1000 people, mobility trends for places of residence (residential), retail and test capacity were the most significant risk factors for the new cases of COVID-19 in 23, 7, 6, 5, 4, 2, 1 and 1 states, respectively, and that active cases/1000 people, workplaces, residential, unemployment rate, imported COVID cases, unemployment claims/1000 people, mobility trends for transit stations (transit), tests done/1000 people, grocery, testing capacity, retail, percentage of change in consumption, and percentage of working from home were the most significant risk factors for the deaths from COVID-19 in 17, 10, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1 states, respectively. We observed that no metrics showed significant evidence of mitigating the COVID-19 epidemic in FL and only a few metrics showed evidence of reducing the number of new cases of COVID-19 in AZ, NY and TX. Our results showed that the majority of non-pharmaceutical interventions had a large effect on slowing the transmission and reducing deaths, and that health interventions were still needed to contain COVID-19.
As of August 27, 2020
The copyright holder has placed this preprint (which was not certified by peer review) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors. This version posted September 29, 2020. https://doi.org/10.1101/2020.09.29.20203505 doi: medRxiv preprint
Suresh, 2020; Anderson et al. 2020), airflow (Priyadarsini and Suresh, 2020), and socioeconomics such as median income (Coccia 2020; Saadat et al. 2020).
The most explored non-pharmaceutical public health interventions and digital technologies for curbing the spread of COVID-19 include social distancing, case isolation and quarantine, closing borders and schools, travel restrictions, use of face masks, and testing.
Although association analysis is of great importance for curbing the spread of COVID-19, association only measures dependence between two variables or two sets of variables in the data, and uses that dependence for prediction and for evaluation of the effects of environmental and socioeconomic factors and public health interventions on the spread of COVID-19 (Altman and Krzywinski 2015; Sharkey and Wood 2020). It is well recognized that association analysis is not a direct method to discover the causal mechanism of complex diseases. Association analysis may detect superficial patterns between intervention measures and transmission variables of COVID-19. Its signals provide limited information on the causal mechanism of the transmission dynamics of COVID-19 (Steigera et al. 2020). Association analysis has been a major paradigm for statistical evaluation of the effects of influencing factors and health interventions on the spread of COVID-19 (Li et al. 2020). Understanding the transmission mechanism of COVID-19 based on association analysis remains elusive. The question of uncovering the transmission mechanisms of COVID-19 is causal in nature.
intervention measures are time series data. An essential difference between time series and cross-sectional data is that time series data have a temporal order, but cross-sectional data do not have any order. As a consequence, the causal inference methods for cross-sectional data cannot be directly applied to time series data.
Basic tools in statistical analysis are the law of large numbers and the central limit theorem. Applications of these tools usually assume that all moment functions are constant. When the moment functions of the time series vary over time, the law of large numbers and the central limit theorem cannot be applied. In order to use basic probabilistic and statistical theories, the nonstationary time series must be transformed to stationary time series (Johansen 1991). A widely used concept of causality for time series data is Granger causality (Granger 1969; Eichler 2013). Underlying Granger causality are the following two principles: (1) the effect does not precede the cause in time; (2) the causal series contains unique information about the effect series which is not present elsewhere. The multivariate linear Granger causality test will be used to test causality between the number of new cases and deaths from COVID-19 and environmental, economic and intervention time series variables (Bai et al. 2010). The proposed additive noise models (ANMs) and multivariate linear Granger causality analysis methods are applied to the surveillance data of lab-confirmed COVID-19 cases in the US, UMD data, and Google mobility data from March 5, 2020 to August 25, 2020 in order to evaluate the contributions of social-biological factors, economics, the Google mobility indexes, and the rate of virus testing to the number of new cases and the number of deaths from COVID-19.
Nonlinear additive noise models for bivariate causal discovery
where n is not necessarily equal to m. Procedures for using the ANM to assess causal relationships between two variables X and Y are summarized below (Jiao et al. 2018).
Step 1. Regress Y on X using the training dataset and non-parametric regression methods (equation 3).
Step 2. Calculate the residual using the test dataset and test whether the residual is independent of the potential causal variable X to assess the ANM in the direction X → Y.
Step 3. Repeat the procedure to assess the ANM in the direction Y → X.
Step 4. If the ANM in one direction is accepted and the ANM in the other is rejected, then the former is inferred as the causal direction.
There are many non-parametric methods that can be used to regress Y on X or regress X on Y (Cleveland, 1979). In this paper, the smoothing spline regression method was used to fit the regression models. Covariance can be used to measure association, but cannot be used to test independence between two variables with a non-Gaussian distribution. A covariance operator, which generalizes the finite dimensional covariance matrix to an infinite dimensional feature space, can be used to test for independence between two variables with arbitrary distributions. Specifically, we will use the Hilbert-Schmidt norm of the cross-covariance operator or its approximation, the Hilbert-Schmidt independence criterion (HSIC), to measure the degree of dependence between the residuals and the potential causal variable and to test for their independence. The covariance operator can be defined through the covariance between any nonlinear functions of the two variables, expressed with an inner product in the Hilbert space.
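The four steps above, combined with an HSIC-based independence test, can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: all function names (`hsic_permutation_test`, `anm_pvalue`, `infer_direction`) are hypothetical, a Nadaraya-Watson kernel smoother is swapped in for the smoothing spline so that the example needs only the Python standard library, the HSIC uses Gaussian kernels with a median-heuristic bandwidth, the first half of the sample is used for training and the second half for testing, and the 0.05 threshold is an assumed default.

```python
import math
import random
import statistics


def gaussian_gram(values, bandwidth):
    """Gram matrix with entries exp(-(v_i - v_j)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def median_bandwidth(values):
    """Median heuristic: median pairwise distance (falls back to 1.0 if zero)."""
    dists = sorted(abs(a - b) for i, a in enumerate(values) for b in values[i + 1:])
    med = dists[len(dists) // 2]
    return med if med > 0 else 1.0


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_statistic(k_centered, l_gram, order):
    """Biased empirical HSIC, trace(K H L H) / n^2, with the second sample reordered."""
    n = len(k_centered)
    total = sum(k_centered[i][j] * l_gram[order[i]][order[j]]
                for i in range(n) for j in range(n))
    return total / (n * n)


def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Return (HSIC, P-value); P-value = share of permuted statistics >= observed."""
    rng = random.Random(seed)
    k_centered = center(gaussian_gram(x, median_bandwidth(x)))
    l_gram = gaussian_gram(y, median_bandwidth(y))
    order = list(range(len(x)))
    observed = hsic_statistic(k_centered, l_gram, order)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permuting one sample breaks any dependence
        if hsic_statistic(k_centered, l_gram, order) >= observed:
            exceed += 1
    return observed, exceed / n_perm


def kernel_smoother(x_train, y_train):
    """Nadaraya-Watson regression (assumed stand-in for the smoothing spline)."""
    h = 1.06 * statistics.stdev(x_train) * len(x_train) ** (-0.2) or 1.0
    fallback = statistics.mean(y_train)

    def predict(x0):
        w = [math.exp(-((x0 - xi) ** 2) / (2.0 * h * h)) for xi in x_train]
        s = sum(w)
        return sum(wi * yi for wi, yi in zip(w, y_train)) / s if s > 0 else fallback

    return predict


def anm_pvalue(cause, effect, n_perm=200, seed=0):
    """Steps 1-2: fit effect = f(cause) + residual on the first half of the data,
    then test residual independence from the cause on the second half."""
    half = len(cause) // 2
    fit = kernel_smoother(cause[:half], effect[:half])
    test_cause = cause[half:]
    residuals = [e - fit(c) for c, e in zip(test_cause, effect[half:])]
    return hsic_permutation_test(test_cause, residuals, n_perm, seed)[1]


def infer_direction(x, y, alpha=0.05):
    """Steps 3-4: accept a direction only if its ANM is kept and the reverse is rejected."""
    p_xy, p_yx = anm_pvalue(x, y), anm_pvalue(y, x)
    if p_xy >= alpha > p_yx:
        return "X -> Y"
    if p_yx >= alpha > p_xy:
        return "Y -> X"
    return "undetermined"
```

In this sketch a large P-value means the residuals look independent of the candidate cause, so the ANM in that direction is not rejected. Calling `infer_direction(x, y)` on two equal-length lists of floats returns one of three labels; "undetermined" covers the cases where both directions are accepted or both are rejected.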
The Hilbert-Schmidt norm of the covariance operator can be used as a criterion for assessing independence between two random variables and is called the Hilbert-Schmidt independence criterion (HSIC). The HSIC is defined through the Hilbert-Schmidt norm of the centered covariance operator.
In summary, the general procedure for testing independence between the average number of new cases or new deaths and the scalar factor or intervention measure is given as follows (Mooij et al.).
Step 1: Divide the data set into a training data set for fitting the model and a test data set for testing the independence.
Step 2: Use the training data set and smoothing spline nonparametric regression to fit the models: (a) regress Y on X.
Step 3: Use the test data set and the estimated smoothing spline regression to predict the residuals.
Step 4: Calculate the dependence measures.
Step 5: Infer the causal direction.
The P-value is defined as the proportion of the statistics computed on the permuted data that are greater than or equal to the statistic computed on the original test data.
Before performing the multivariate linear Granger causality test, we first need to transform nonstationary time series to stationary time series. Consider a multi-variable VAR with multiple lags, in which the observation at each time point is a vector, the coefficients are matrices, and the residual vector is assumed to have mean zero with no autocorrelation. When two non-stationary variables are cointegrated, the VAR model should be augmented with an error correction term for testing the Granger causality (Engle and Granger, 1987); the matrices of the resulting vector error correction model (VECM) are functions of the VAR coefficient matrices.
Likelihood ratio tests for multivariate Granger causality are given by the following. (1) The likelihood ratio statistic for testing the first null hypothesis is asymptotically distributed as a central chi-square under the null hypothesis. (2) The likelihood ratio statistic for testing the second null hypothesis is likewise asymptotically distributed as a central chi-square under the null hypothesis.
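Because the equations of this section did not survive in the text, the block below gives the standard textbook forms that the description appears to refer to. It is only a sketch under assumptions: the symbols ($k$, $p$, $A_i$, $\Pi$, $\Gamma_i$, $T$, $\hat\Sigma_0$, $\hat\Sigma_1$) are chosen here for illustration and are not taken from the authors' notation, and the exact partitioning and degrees of freedom used in the paper may differ.

```latex
% k-variable VAR with p lags; the residual has mean zero and no autocorrelation
Y_t = A_1 Y_{t-1} + \cdots + A_p Y_{t-p} + \varepsilon_t ,
\qquad E[\varepsilon_t] = 0 .

% Equivalent error correction form; \Pi and \Gamma_i are functions of the A_i
\Delta Y_t = \Pi Y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \, \Delta Y_{t-i} + \varepsilon_t ,
\qquad \Pi = \sum_{i=1}^{p} A_i - I ,
\qquad \Gamma_i = -\sum_{j=i+1}^{p} A_j .

% Likelihood ratio statistic for H_0: the lagged candidate causes have zero
% coefficients in the equations of the effect variables.
% \hat\Sigma_0: residual covariance of the restricted model,
% \hat\Sigma_1: residual covariance of the unrestricted model,
% T: number of usable time points, q: number of coefficients set to zero.
LR = T \left( \log \lvert \hat\Sigma_0 \rvert - \log \lvert \hat\Sigma_1 \rvert \right)
\;\xrightarrow{d}\; \chi^2_{q} \quad \text{under } H_0 .
```

For $p = 2$ the error correction form can be checked directly: $\Delta Y_t = (A_1 + A_2 - I) Y_{t-1} - A_2 \Delta Y_{t-1} + \varepsilon_t$, which matches $\Pi = A_1 + A_2 - I$ and $\Gamma_1 = -A_2$.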
The scalar variables tested for causation of the new cases and deaths from COVID-19 in the US included the number of contact tracing workers per 100,000 people, percent of population above 60 years of age, median income, population density, percentage of African Americans, percentage of Hispanic Americans, percentage of males, employment density, number of points of interest for crowd gathering per 1000 people, number of staffed hospital beds per 1000 people, and number of ICU beds per 1000 people. The numbers of new cases and deaths were averaged over time. Each state was a sample. Since the sample sizes were small, the P-value threshold for declaring significance was 0.05 without Bonferroni correction for multiple comparisons. The P-values for testing 11 scalar potential causes of the number of new cases and deaths from COVID-19 in the US are summarized in Table 1. We observed from Table 1 that population density (P-value < 0.0002) and percentage of males (P-value < 0.03) showed significant evidence of causing the spread of COVID-19. Percentage of Hispanic Americans (P-value < 0.0575) was close to significance. Percentage of African Americans (P-value < 0.024) and population density (P-value < 0.025) showed significant evidence of causing deaths due to COVID-19. P-values of employment density (P-value < 0.059) and percentage of Hispanic Americans (P-value < 0.064) were close to the significance level 0.05 for causing death. The second most significant demographic risk factor for the spread of COVID-19 was percentage of males. We found higher COVID-19 morbidity in males than females.
However, we did not find higher COVID-19 mortality in males than females. Population density was an important risk factor for both the spread and death from COVID-19. The percentage of Hispanic Americans was a weak risk factor for both the spread and death from COVID-19, while the literature showed stronger evidence that Hispanic communities were highly vulnerable to COVID-19 (Calo et al. 2020). It was reported that higher COVID-19 mortality in males than females can be due to the following factors (Bwire 2020). The first factor was higher expression of angiotensin-converting enzyme-2 (ACE 2; receptors for coronavirus) in males than females. The second factor was sex-based immunological differences due to sex hormones and the X chromosome.
Daily mobility and social distancing data came from a COVID-19 impact analysis platform and the Google community mobility reports (2020). The total number of variables to be tested was 19. The P-value for declaring significance after Bonferroni correction was 0.0025. Although CA was the most affected and the most populated state, all 19 metrics except protest attendance showed strong significance in causing the rapid spread of COVID-19 (Table 2 and Table S1). Table 4 listed the most significant risk factor for the new cases of COVID-19 in each of the 50 states in the US.
Active cases/1000 people, workplaces, number of tests completed/1000 people, imported COVID cases, unemployment rate and unemployment claims/1000 people, mobility trends for places of residence (residential), mobility trends for retail and recreation places such as restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters (retail), and test capacity were the most significant risk factors for the new cases of COVID-19 in 23, 7, 6, 5, 4, 2, 1 and 1 states of the US, respectively. Table 5 summarized the most significant risk factor for the deaths from COVID-19 in each of the 50 states in the US. Active cases/1000 people, workplaces, residential, unemployment rate, imported COVID cases, unemployment claims/1000 people, transit, tests done/1000 people, grocery, testing capacity, retail, percentage of change in consumption, and percentage of working from home were the most significant risk factors for the deaths from COVID-19 in 17, 10, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1 states, respectively. We also observed that the number of protest attendees showed mildly significant evidence of increasing the number of new cases of COVID-19 in KY (P-value < 0.00012), KS (P-value < 0.00026), NH (P-value < 0.00108), MA (P-value < 0.0016) and TN (P-value < 0.0024), or of causing more deaths from COVID-19 in OR (P-value < 5.11E-05), TX (P-value < 0.00017), ME (P-value < 0.00028), KS (P-value < 0.00061), MI (P-value < 0.0015), OH (P-value < 0.0021) and NC (P-value < 0.0023).
To illustrate the causal relationships between the risk factors and the number of new cases and deaths from COVID-19, we plotted Figures 1 and 2. Figure 1 plotted the social distance index curves as a function of time from March 5, 2020 to August 25, 2020 in Florida (FL) and Rhode Island (RI). Figure 1 showed that the social distance index in FL was much higher than that in RI, which resulted in a larger number of new cases of COVID-19 in FL than in RI.
Our data suggested that men were more vulnerable to COVID-19 than women. However, our analysis did not conclude that more men than women were dying from COVID-19. We also discovered that more Black Americans were dying from COVID-19. The reasons for this were complex. Black Americans had higher rates of chronic disease conditions, including diabetes, heart disease, and lung disease, were poorer, were more easily exposed to COVID-19, and lived in cramped housing. Inequities in the social determinants of health affected mortality and morbidity of COVID-19 for Hispanic Americans with much milder significance.
Therefore, workplaces should be considered a very important risk mitigation measure to reduce the number of new cases and deaths from COVID-19. Tests done/1000 people was the second population intervention in the US. It was a significant cause of the new cases of COVID-19 in 46 states and a significant cause of death in 47 states. Virus testing results in quick case identification and isolation to contain COVID-19, and rapid treatment to reduce the number of deaths. Imported COVID cases were also a top significant risk factor for speeding the spread and increasing the deaths from COVID-19. Our results showed that the imported COVID case metric was a significant causal factor for the new cases in 46 states and a significant causal factor for the deaths in 47 states.
Our results showed that the high numbers of cases and deaths from COVID-19 were due to the lack of strong interventions and to high population density. We observed that no metrics showed significant evidence of mitigating the COVID-19 epidemic in FL and only a few metrics showed evidence of reducing the number of new cases of COVID-19 in AZ, NY and TX. Our results showed that strong interventions were needed to contain COVID-19.
Although we tried to systematically and comprehensively analyze the data, this study has multiple limitations. First, we only analyzed the causal relationship between mobility patterns and the number of new cases or deaths and ignored the role of other potential mitigating factors (e.g., wearing face masks) that could also have contributed to the reduction of new cases or deaths from COVID-19. When data are available, more metrics should be included in the analysis. Second, we have not addressed the issue of confounding bias. When the confounders are unknown, methods that adjust for confounding cannot be applied to eliminate confounding bias from the causal analysis. Unadjusted confounding bias will distort the inferred causal relationship between the number of new cases or deaths from COVID-19 and the metrics for social distancing when these two variables share common causes.
This will have substantive implications for developing interventions to mitigate the spread of COVID-19 and reduce the deaths from COVID-19. However, removing confounding from causal analysis for COVID-19 is complicated and will be investigated in the future. In summary, our analysis has provided information for both individuals and governments to plan future interventions for containing COVID-19 and reducing deaths from COVID-19.
HW Deng was partially supported by NIH grants U19AG05537301 and R01AR069055. Momiao Xiong was partially supported by NIH grant U19AG05537301. The authors thank Sara Barton for editing the manuscript.
Grocery: Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.
Parks: Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens.
Transit: Mobility trends for transit stations, that is, places like public transport hubs such as subway, bus, and train stations.
Workplaces: Mobility trends for places of work.
Residential: Mobility trends for places of residence.
Test Rate: Ratio of the number of individuals who have taken the virus test to the total population in the region.
Attendee: Number of attendees in the protest.
Time to use the p-word? Coronavirus enters dangerous new phase
Factors influencing the epidemiological characteristics of pandemic COVID 19: A TISM approach
The math behind why we need social distancing, starting right now
Understanding Economic and Health Factors Impacting the Spread of COVID-19 Disease
Influencing factors of COVID-19 spreading: a case study of Thailand
COVID-19 pandemic: environmental and social factors influencing the spread of SARS-CoV-2 in the expanded metropolitan area of São Paulo
A country level analysis measuring the impact of government actions, country preparedness and socioeconomic factors on COVID-19 mortality and related health outcomes
Socioeconomic Factors influencing the Spatial Spread
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
An empirical investigation of chronic diseases: a visualization approach to Medicare in the United States
COVID-19 spread in the UK: the end of the beginning?
Environmental perspective of COVID-19
Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
School closure and management practices during coronavirus outbreaks including COVID-19: a rapid systematic review
Digital technologies in the public-health response to COVID-19
Effectiveness of airport screening at detecting travelers infected with 2019-nCoV
Applications of digital technology in COVID-19 pandemic planning and response
Association, correlation and causation
The Causal Effect of Social Distancing on the Spread of SARS-CoV-2
Causal analysis of COVID-19 observational data in German districts reveals effects of mobility, awareness, and temperature
Difference-in-Differences to Identify Causal Effects of COVID-19 Policies. Discussion Papers of DIW Berlin 1870
Multivariate Analysis of Factors Affecting COVID-19 Case and Death Rate
Causal Inference for COVID-19 Interventions
Dynamic causal modelling of COVID-19
Causal impact of masks, policies, behavior on early COVID-19 pandemic in the U
Air transportation, population density and temperature predict the spread of COVID-19 in Brazil
High population density in India associated with spread of COVID-19
Strong Effects of Population Density and Social Characteristics on Distribution of COVID-19 Infections in the United States
Racial demographics and COVID-19 confirmed cases and deaths: a correlational analysis of 2886 US counties
Reaching the Hispanic Community About COVID-19 Through Existing Chronic Disease Prevention Programs
Effect of non-pharmaceutical interventions to contain COVID-19 in China. medRxiv