key: cord-0699688-40y8688t authors: Her, Peter Hyunwuk; Saeed, Sahar; Tram, Khai Hoan; Bhatnagar, Sahir R title: Novel mobility index tracks COVID-19 transmission following stay-at-home orders date: 2022-05-10 journal: Sci Rep DOI: 10.1038/s41598-022-10941-2 sha: ea81647aae3376cf77ed081d989a859feb1ea5e0 doc_id: 699688 cord_uid: 40y8688t
Considering the emergence of SARS-CoV-2 variants and low vaccine access and uptake, minimizing human interactions remains an effective strategy to mitigate the spread of SARS-CoV-2. Using a functional principal component analysis, we created a multidimensional mobility index (MI) from six metrics compiled by SafeGraph for all counties in Illinois, Ohio, Michigan and Indiana between January 1 and December 8, 2020. Changes in mobility were defined relative to a time-updated 7-day rolling average. Associations between our MI and COVID-19 cases were estimated using a quasi-Poisson hierarchical generalized additive model adjusted for population density and the COVID-19 Community Vulnerability Index. Individual mobility metrics varied significantly by county and by calendar time. In each state, more than 50% of the variability in the data was explained by the first principal component, indicating good dimension reduction. While a single metric of mobility was not associated with surges of COVID-19, our MI was independently associated with COVID-19 cases in all four states at varying time lags. Following the expiration of stay-at-home orders, a single metric of mobility was not sensitive enough to capture the complexity of human interactions. Monitoring mobility can be an important public health tool; however, it should be modelled as a multidimensional construct.
Mobility index. We defined mobility as the change in each mobility metric relative to its average over the preceding week (a time-updated rolling average). For each county $j = 1, \ldots, 365$, we index each of the 6 mobility metrics $s = 1, \ldots, 6$ by calendar day $t = 1, \ldots, m_j$, where $m_j$ is the total number of observed days since re-opening in county $j$. We define the following quantities:
• $X_{j,t,s}$: the scalar value of mobility metric $s$ measured on day $t$ in county $j$.
• $X_{j,t-7,\ldots,t-1,s} = (X_{j,t-7,s}, \ldots, X_{j,t-1,s}) \in \mathbb{R}^7$: the values of mobility metric $s$ measured on days $t-7, \ldots, t-1$, i.e., the 7 days prior to day $t$ in county $j$. This is a vector quantity.
• $\bar{X}_{j,t-7,\ldots,t-1,s}$: the average of the elements of $X_{j,t-7,\ldots,t-1,s}$.
The change from baseline of mobility metric $s$ on day $t$ in county $j$ is given by
$$\tilde{X}_{j,t,s} = \frac{X_{j,t,s} - \bar{X}_{j,t-7,\ldots,t-1,s}}{\bar{X}_{j,t-7,\ldots,t-1,s}} \qquad (1)$$
The use of a rolling average is unique to this analysis. Most studies have used a static baseline period, such as mobility trends between January and February 2020 9,33,34. This common approach does not account for seasonal variability in mobility or for changes resulting from the pandemic [35] [36] [37]. In contrast, our baseline (rolling average) takes into consideration temporal trends that were likely changing with evolving public health policies. To check for outliers and the appropriateness of using the average, we calculated the coefficient of variation (CV) for each individual metric in each county (Supplemental Figure S13). We found no strong outliers, as all CVs were less than 2.5, suggesting that the average was a valid summary to use. We also considered using the median but did not find a significant difference in the results relative to the average. Since we hypothesized that each metric could be attributed to a common underlying notion of mobility, we used an unsupervised machine learning method known as functional principal component analysis (fPCA) to create our latent mobility index 38. Briefly, PCA is a technique for reducing the dimensionality of multiple variables while minimizing information loss. This is done by creating new uncorrelated variables (principal components) that successively maximize variance.
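The time-updated baseline can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the change from baseline is the relative change (x_t − baseline) / baseline, measured against the mean of the 7 preceding days, and the function name `change_from_baseline` and the example counts are hypothetical.

```python
from statistics import mean


def change_from_baseline(series, window=7):
    """Relative change of each day's value versus the mean of the `window`
    days immediately before it (a time-updated rolling baseline).

    Assumption: change = (x_t - baseline) / baseline. Days without a full
    window of history, or with a zero baseline, are returned as None.
    """
    changes = []
    for t, x_t in enumerate(series):
        if t < window:
            changes.append(None)  # not enough prior days yet
            continue
        baseline = mean(series[t - window:t])  # days t-window, ..., t-1 (excludes t)
        changes.append(None if baseline == 0 else (x_t - baseline) / baseline)
    return changes


# Hypothetical daily counts of devices leaving home in one county.
devices_leaving_home = [100, 100, 100, 100, 100, 100, 100, 120, 90]
print(change_from_baseline(devices_leaving_home))
```

Because the baseline moves with each day, a sustained shift in behaviour is absorbed into the baseline after a week, which is the contrast with a fixed January to February reference period.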
A "functional" PCA accounts for the longitudinal nature of the data. We applied fPCA to the changes from baseline defined in Eq. (1), separately for each county, and extracted the first principal component, i.e., the linear combination of individual mobility metrics that explained the most variance. We denote this first principal component by $\mathrm{fPCA}_{j,t}$, a score summarizing mobility in county $j$ on day $t$. To enable comparability between counties and states, the $\mathrm{fPCA}_{j,t}$ scores were scaled as Z-scores, which defined our mobility index (MI):
$$\mathrm{MI}_{j,t} = \frac{\mathrm{fPCA}_{j,t} - \overline{\mathrm{fPCA}}_{j}}{\sqrt{\operatorname{Var}(\mathrm{fPCA}_{j})}} \qquad (2)$$
where $\overline{\mathrm{fPCA}}_{j} = \frac{1}{m_j}\sum_{k=1}^{m_j} \mathrm{fPCA}_{j,k}$ and $\operatorname{Var}(\mathrm{fPCA}_{j})$ are the average and variance, respectively, of the fPCA scores in county $j$ over the observed time period. The MI is interpreted as follows: MI = 0 means that, on average, there was no change in mobility relative to the previous week; MI = 1 means that, on average, mobility increased by one standard deviation relative to the previous week; and MI = −1 means that, on average, mobility decreased by one standard deviation relative to the previous week. An animation was created to visualize the relative daily changes of the MI by county.
Association with COVID-19. For each county $j$, let $y_{j,t}$ be the number of confirmed COVID-19 cases on day $t = 1, \ldots, m_j$, and let $q_{j,t} = [\mathrm{MI}_{j,t-0}, \ldots, \mathrm{MI}_{j,t-21}]$ denote the vector of lagged values of our mobility index (defined in Eq. (2)), with 0 and 21 days as the minimum and maximum lags, respectively. In words, the first element of $q_{j,t}$ is the value of our mobility index on day $t$, the second element is its value one day prior to $t$, and so on.
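The Z-scored first component of Eq. (2) can be illustrated with NumPy. This sketch swaps in ordinary PCA (a singular value decomposition of the standardized days × metrics matrix) for functional PCA, so it ignores the temporal smoothing that fPCA provides; standardizing the metrics first and the function name `mobility_index` are assumptions of this sketch. The sign of a principal component is arbitrary, so in practice it would be oriented so that higher scores mean more mobility.

```python
import numpy as np


def mobility_index(changes):
    """Return (MI, share of variance explained by PC1) for one county.

    `changes` is an (m_j days x metrics) array of changes from baseline.
    Ordinary PCA is used as a simplified stand-in for functional PCA.
    """
    X = np.asarray(changes, dtype=float)
    # Standardize each metric (assumption) so no metric dominates by scale.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD: rows of vt are principal directions; squared singular values
    # are proportional to the variance along each direction.
    _, sing, vt = np.linalg.svd(Xs, full_matrices=False)
    pc1 = Xs @ vt[0]                         # daily first-component score
    explained = sing[0] ** 2 / np.sum(sing ** 2)
    mi = (pc1 - pc1.mean()) / pc1.std()      # Z-score over the county's days
    return mi, explained


# Hypothetical county: six metrics driven by one latent mobility signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
loadings = np.array([1.0, 0.8, 0.9, 0.7, 0.6, 0.5])
changes = np.outer(latent, loadings) + 0.3 * rng.normal(size=(200, 6))
mi, explained = mobility_index(changes)
print(f"PC1 explains {explained:.0%} of the variance")
```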
From the time stay-at-home orders expired until December 8, 2020, the relationship between daily counts of COVID-19 cases ($y_{j,t}$) and mobility ($q_{j,t}$), accounting for up to 21 days of lag, was estimated with a quasi-Poisson hierarchical generalized additive model (HGAM) 39,40 of the form: where $\beta_0$ is the intercept, $s(\cdot)$ are smooth non-parametric functions of the predictor variables, $\mathrm{CCVI}_j$ is the COVID-19 Community Vulnerability Index and $\mathrm{density}_j$ is the population density (people per square kilometer) at the county level. The term $s(q_{j,t})$ in Eq. (3) captures the potentially non-linear and delayed effect of mobility on COVID-19 cases through a cross-basis function 41. We used penalized cubic regression splines 39 for both dimensions, with interior knots placed at Z-scores of −3, −2, −1, 0, 1, 2 and 3 for $\mathrm{MI}_{j,t}$, and at 7 and 14 days for the lag. The penalty is given by $\lambda \beta^\top S \beta$, where $\beta$ are the regression parameters, $S$ is a matrix of known coefficients 42, and $\lambda$ is the tuning parameter that controls the degree of smoothing, chosen via generalized cross-validation. Given the heterogeneity of COVID-19 epidemiology across counties, models included both a state-level calendar time effect $s(\mathrm{time}_t)$ using thin plate regression splines 43 and a county-level calendar time effect $s_j(\mathrm{time}_t)$ using a factor-smoother interaction basis 40. Population size was used as an offset in each model. A separate model was run for each of the selected states. There were two main advantages of using an HGAM to evaluate the association between mobility and COVID-19 cases: (1) it can quantify non-linear functional relationships over time where the shape of each function varies across counties, and (2) it can model varying lags 44. Based on a recent systematic review of 42 studies, the mean incubation period of SARS-CoV-2 was 8 days (95% CI 10.3, 16 days) 45.
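The lagged exposure $q_{j,t}$ and the cross-basis behind $s(q_{j,t})$ can be sketched as follows. Each cross-basis column sums, over lags 0 to 21, the product of an exposure basis function evaluated at $\mathrm{MI}_{j,t-\ell}$ and a lag basis function evaluated at $\ell$. For brevity the sketch uses plain polynomial bases, a hypothetical simplification: the model described above uses penalized cubic regression splines with the stated knots, fitted in R with mgcv and dlnm. The function names are hypothetical.

```python
import numpy as np


def lag_matrix(mi, max_lag=21):
    """Row t is q_t = [MI_t, MI_{t-1}, ..., MI_{t-max_lag}].

    Entries that would reach before the first observed day are NaN.
    """
    mi = np.asarray(mi, dtype=float)
    q = np.full((mi.size, max_lag + 1), np.nan)
    for lag in range(max_lag + 1):
        if lag >= mi.size:
            break  # series shorter than this lag: column stays NaN
        q[lag:, lag] = mi[: mi.size - lag]
    return q


def cross_basis(q, var_degree=3, lag_degree=2):
    """Column (k, m) on day t = sum over lags l of f_k(q[t, l]) * g_m(l).

    Hypothetical bases: f_k(x) = x**k (k = 1..var_degree) for the exposure and
    g_m(l) = l**m (m = 0..lag_degree) for lags rescaled to [0, 1].
    """
    lags = np.arange(q.shape[1], dtype=float) / (q.shape[1] - 1)
    f = np.stack([q ** k for k in range(1, var_degree + 1)], axis=-1)  # (days, lags, K)
    g = np.stack([lags ** m for m in range(lag_degree + 1)], axis=-1)  # (lags, M)
    w = np.einsum("tlk,lm->tkm", f, g)  # sum over lags -> (days, K, M)
    return w.reshape(q.shape[0], -1)    # columns ordered k-major


q = lag_matrix(np.arange(30.0))  # hypothetical MI series of 30 days
w = cross_basis(q)
print(q.shape, w.shape)          # (30, 22) (30, 9); rows with NaN (t < 21) would be dropped before fitting
```

The regression then estimates one coefficient per cross-basis column, so the exposure-lag surface is a smooth function of both the MI value and the lag rather than 22 unrelated lag coefficients.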
This lag between time of infection and becoming symptomatic/testing positive can vary at both the patient level (the time it takes to develop symptoms and get tested) and the county level (the time it takes to perform and report the test results). Given this variation, we accounted for varying lagged exposures (up to 21 days) at the county level. To evaluate the utility of our mobility index, we compared the dose-response relationship between mobility and COVID-19 cases, and the goodness-of-fit statistics, for our latent MI versus a single measure of mobility (the fraction of devices leaving home). All analyses were performed using R version 4.0.2 46 along with the mgcv 39 and dlnm 47 packages. Code and data for reproducing all the results, figures and animation in this paper are available at https://github.com/sahirbhatnagar/covid19-mobility.
Mobility patterns. Daily mobility changes in 365 counties from the four most populous states in the Midwest (Illinois, Ohio, Michigan and Indiana) were analyzed between January 1 and December 8, 2020. State-level sociodemographic and economic characteristics were similar across the four states and are summarized in Table 1. Figure 1 illustrates the average daily changes in the six mobility metrics between January and December 2020 for each state (average of all counties) relative to the week before. Overall, each metric had a unique trajectory, but trends were similar across the four states. Based on the average change, the number of devices not at home and delivery behavior (more than 3 stops lasting less than 20 min) remained stable over time. There was more variation in the metrics related to full-time work and the time spent away from home. Of the four states, mobility changes were most pronounced in Ohio. Across all states, relative to the previous seven days, mobility increased daily between March and May.
Mobility metrics varied considerably by county (Supplemental Figures S1, S2, S3 and S4), illustrating how aggregating changes at the state level may mask granular changes at the county level.
First fPCA summarizes mobility patterns by county. We created a latent index of mobility for each county, as given by Eq. (2), which is derived from the first fPCA. Table 2 provides the median and interquartile range of the proportion of variance explained by the first fPCA across counties in a given state. More than 50% of the variance was explained by the first fPCA for the majority of the counties analyzed, indicating good dimension reduction. In Supplemental Figures S5, S6, S7 and S8, we provide the absolute Pearson correlations between our MI and each individual metric by county for Illinois, Ohio, Michigan and Indiana, respectively. The correlations were particularly strong with full-/part-time work behavior and with time spent away from home. Importantly, there was significant heterogeneity across counties, which would otherwise be missed when aggregating mobility metrics at the state level. Figure 2 compares the MI on the day stay-at-home policies expired with the MI on July 4 (Independence Day). Blue shades indicate MI < 0 (decrease in mobility) and red shades indicate MI > 0 (increase in mobility). This graph provides some evidence that our MI is appropriately capturing mobility, as we would expect more movement on a traditionally busy U.S. holiday than earlier in the pandemic, when stay-at-home orders were lifted. In the Supplemental material, we also provide an animation illustrating the daily changes from reopening to December 8, 2020. The animation shows substantial differences in mobility patterns across counties that vary from day to day. The most dramatic changes over time are increases in mobility from weekdays to weekends.
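The per-county summaries above, the median and interquartile range of the variance explained (Table 2) and the absolute Pearson correlations between the MI and each metric (Supplemental Figures S5 to S8), amount to a few lines of NumPy. This is an illustrative sketch; the function names and the toy values are hypothetical.

```python
import numpy as np


def median_iqr(values):
    """Median and interquartile range (25th, 75th percentiles) across counties."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, (q1, q3)


def abs_correlations(mi, metrics):
    """|Pearson r| between the MI and each column of `metrics` (days x metrics)."""
    mi = np.asarray(mi, dtype=float)
    metrics = np.asarray(metrics, dtype=float)
    return np.array([abs(np.corrcoef(mi, metrics[:, s])[0, 1])
                     for s in range(metrics.shape[1])])


# Hypothetical inputs: PC1 variance share in five counties, and one county's
# MI alongside three metrics over five days.
share_explained = [0.4, 0.5, 0.6, 0.7, 0.8]
mi = [1, 2, 3, 4, 5]
metrics = np.column_stack([[2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]])
print(median_iqr(share_explained))
print(abs_correlations(mi, metrics))
```

Taking absolute values makes the comparison insensitive to the arbitrary sign of a principal component.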
To evaluate the validity of the MI, we compared its association with COVID-19 cases to that of a commonly used single metric of mobility (fraction of devices leaving home) (Fig. 3). Notably, the single metric was not associated with COVID-19 cases in any state at any lagged time point. However, there was a clear dose-response relationship between our MI and COVID-19 cases following a 10- to 21-day time lag in all four states. Across all four states, the MI model resulted in significantly better goodness-of-fit statistics compared to the single metric (Table 3).
The COVID-19 pandemic is now fueled by highly transmissible variants of concern. Understanding the association between mobility and disease transmission can help tailor non-pharmaceutical interventions to mitigate outbreaks and could potentially serve as an early indicator of surges in new infections. We leveraged freely available cell phone data with an unsupervised machine learning approach to create a multidimensional index of mobility. Results from our study suggest that, following the expiration of stay-at-home physical distancing policies, single metrics of mobility were not sensitive enough to capture the complexity of human mobility related to disease transmission. Our MI was correlated with COVID-19 cases across all counties in Illinois, Ohio, Michigan and Indiana. In comparison, the single metric of mobility (fraction of devices leaving home) was not associated with incident cases. Our results also demonstrate the importance of evaluating changes at a granular level, as there was significant heterogeneity within states. Tracking mobility has the potential to become a powerful tool for determining the impact of public health policies 25. A growing subfield of COVID-19 research involves the analysis of mobility data and patterns. In the last 2 years, a variety of metrics and sources have been used to track mobility 49,50.
Initial studies evaluated how populations adhered to stay-at-home policies by tracking mobility trends 11,51. Later, it became evident that mobility may be useful as a public health surveillance tool, as studies evaluated the correlation between mobility and COVID-19 diagnoses 3,52-59. A study by Lasry et al. used SafeGraph mobility data as a proxy for social distancing in the metropolitan areas of Seattle, San Francisco, New York City, and New Orleans and found an association between changes in mobility (% of personal mobile devices leaving home) at the state level and COVID-19 cases during the first COVID-19 wave 9. In all four metropolitan areas, the number of mobile devices leaving home
Figure 3. Model results comparing the association of the MI with COVID-19 cases to that of a commonly used single metric of mobility (fraction of devices leaving home). For each state, the left panel summarizes the multidimensional MI; the right panel represents the percentage of devices leaving home (x-axis); the y-axis is the adjusted incidence rate ratio of COVID-19, at varying lagged responses (0-21 days) (z-axis).
threshold during the data aggregation period. This means that less densely populated areas are likely to be underrepresented by Facebook mobility data. Google also provides another source of aggregate mobility data, based on the time spent by users at several geolocations as recorded by Google Maps. While studies using Google data have found that increases in mobility lead to increased COVID-19 cases and deaths 61, to date studies have not taken into account changes in mobility following the end of stay-at-home policies. Apple Mobility also provides mobility data, using Apple Maps to create aggregated counts of direction requests. A study by James and Menzies used Apple data and found that mobility data and national financial indices exhibited similar trajectories.
Apple Mobility has several limitations, as it counts any search for directions as a measure of mobility and therefore may not be representative of actual community mobility 62. Mobility data serve several functions as a public health tool. While we focused on the number of COVID-19 cases, mobility can also be used to estimate and model transmission rates 52,60. Spatially explicit models of disease transmission using census data are often used to guide disease intervention decisions. However, it remains important to define mobility as a multidimensional construct. We demonstrated that, across hundreds of counties in four states, time-updated relative changes in mobility were associated with increases in COVID-19 cases. Furthermore, results from our study suggest that mobility should be considered an important confounder when evaluating the impact of other non-pharmaceutical interventions. A strength of our study was the use of multiple advanced statistical methods to measure mobility and then validate its utility by evaluating its association with COVID-19 cases. The fPCA used to create the mobility index effectively captured the heterogeneity of the individual metrics over time and across counties within a given state. The unsupervised nature of this approach prevented the model from overfitting when evaluating the association with cases. Furthermore, we modelled a non-linear functional relationship between mobility and COVID-19 cases using an HGAM while simultaneously fitting different lagged time periods. Our results confirmed the expectation that the lag time would vary across states. These methods have been underappreciated in epidemiological and public health studies; we provide code and data to encourage their use, as we believe they could have wide applications in future research. We also highlight the need to track population-level mobility at a granular level, as we show significant heterogeneity across counties.
Our study also has limitations. The results of our study are based on data from all 365 counties in four Midwestern states. While these counties represented varying population densities, socioeconomic conditions, and party affiliations (which may have resulted in different adherence to and uptake of other NPIs), our results may have limited generalizability to larger metropolitan areas. Cell phone data were freely available and could help predict trends during the pandemic, but they are only a proxy for human contact. In this study, we attempted to construct a more robust definition of mobility; however, it remains a surrogate exposure. The association between mobility and COVID-19 cases may be underestimated, given that our outcome depends on testing. Testing capacity changed significantly throughout the pandemic in the United States. Seroprevalence studies estimate that reported cases underrepresent true infections by a factor of three 63. Although we do not believe this underrepresentation is differential between counties, outcomes such as COVID-19-related deaths and hospitalizations may be less biased. While the advantage is clear, the utility of these outcomes as a "real-time" public health tool is debatable, as the latency period (time from infection to outcome) is long (greater than 21 days). As with all observational studies, associations should not be interpreted causally. Our model does not take into consideration confounding interventions that could also increase or mitigate transmission, such as the proportion of the population adhering to physical distancing guidelines, mask wearing, interactions outdoors vs indoors, or air quality. Measuring social distancing patterns with individual-level data (from either cell phones or wearable technology such as fitness trackers) would be more sensitive than using aggregate data, but this raises ethical and privacy concerns 64.
Recent reports have hypothesized that COVID-19 transmission may not follow a normal distribution but may instead be overdispersed or driven by "super spreader" events, which we did not account for in our model 65. Finally, while PCA has the advantage of reducing overfitting, it has several assumptions and limitations. We must assume that the features are linearly related to each other and that the data can be appropriately summarised by the mean and variance 66. Furthermore, PCA can be heavily influenced by outliers (values more than three standard deviations from the sample mean), requires the PCs to be orthogonal to each other, and results in information loss because only a relatively small number of PCs are selected for downstream analysis. Specifically for our data, we show in Table 2 that the first fPCA explains over 50% of the variance for the majority of the counties analysed. The data did not have any significant outliers, as seen in Supplemental Figure S13, which shows that the coefficients of variation (standard deviation divided by the mean) are less than 2.5 for all mobility metrics across all counties. We did not pursue nonlinear dimension reduction techniques such as kernel PCA, but we think this would be an interesting direction for future research.
Our study underscores the potential of using freely available cell phone data as a public health tool. We show that changes in mobility can be used as a predictor of surges in COVID-19 cases. However, monitoring mobility in the absence of strict non-pharmaceutical interventions such as stay-at-home policies will require robust definitions.
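The two outlier checks mentioned in the limitations, the coefficient of variation (standard deviation divided by the mean) and the three-standard-deviation rule, can be written with Python's standard library. This is a sketch: using the sample standard deviation is an assumption of this example, the function names are hypothetical, and the CV is only meaningful for metrics with a positive mean, such as raw device counts.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes mean > 0)."""
    return stdev(values) / mean(values)


def three_sd_outliers(values):
    """Indices of values lying more than three standard deviations from the mean."""
    m, sd = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > 3 * sd]


# Hypothetical daily values of one mobility metric in one county.
daily_metric = [8, 10, 12, 9, 11]
print(round(coefficient_of_variation(daily_metric), 3))
print(three_sd_outliers(daily_metric))  # -> []
```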
Deliberation, dissent, and distrust: Understanding distinct drivers of Coronavirus disease 2019 vaccine hesitancy in the United States
COVID-19 vaccine tracker: See your state's progress
Mobility and mortality during the COVID-19 pandemic
The impact of the COVID-19 pandemic on Italian mobility
Influence of population mobility on the novel coronavirus disease (COVID-19) epidemic: Based on panel data from Hubei
The mobility gap: Estimating mobility thresholds required to control SARS-CoV-2 in Canada
Impact of the Delta variant on vaccine efficacy and response strategies
Nonpharmaceutical interventions remain essential to reducing Coronavirus disease 2019 burden even in a well-vaccinated society: A modeling study
Timing of community mitigation and changes in reported COVID-19 and community mobility-four US metropolitan areas
Global and local mobility as a barometer for COVID-19 dynamics
Mobile device location data reveal human mobility response to state-level stay-at-home orders during the COVID-19 pandemic in the USA
Effects of social distancing on the spreading of COVID-19 inferred from mobile phone data
Lockdown for COVID-19 and its impact on community mobility in India: An analysis of the COVID-19 Community Mobility Reports
Analysis of mobility trends during the COVID-19 coronavirus pandemic: Exploring the impacts on global aviation and travel in selected cities
Mobility and sales activity during the Corona crisis: Daily indicators for Switzerland
Mobility and the effective reproduction rate of COVID-19
Evaluating the impact of mobility on COVID-19 pandemic with machine learning hybrid predictions
Reduction in mobility and COVID-19 transmission
Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: A modelling study using mobile phone data
World Health Organization. Tracking SARS-CoV-2 variants
Pathophysiology, transmission, diagnosis, and treatment of Coronavirus disease 2019 (COVID-19): A review
The silent and dangerous inequity around access to COVID-19 testing: A call to action
Hospitalization rates and characteristics of patients hospitalized with laboratory-confirmed Coronavirus disease 2019-COVID-NET, 14 States
COVID-19 exacerbating inequalities in the US
Aggregated mobility data could help fight COVID-19
COVID-19) Data in the United States
Surgo Ventures. The U.S. covid community vulnerability index (CCVI)
A COVID-19 community vulnerability index to drive precision policy in the US
A social vulnerability index for disaster management
Peer reviewed: The role of public health in COVID-19 emergency response efforts from a rural health perspective
Centers for Disease Control & Prevention. COVID-19 secondary data and statistics 2020
Pandemic politics: Timing state-level social distancing responses to COVID-19
Community movement and COVID-19: A global study using Google's community mobility reports
Mapping county-level mobility pattern changes in the United States in response to COVID-19
Surveillance metrics of SARS-CoV-2 transmission in central Asia: Longitudinal trend analysis
Moving average based index for judging the peak of the COVID-19 epidemic
Seasonality of tuberculosis in the Republic of Korea
Functional data analysis
Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC Texts in Statistical Science
Hierarchical generalized additive models in ecology: An introduction with mgcv
Distributed lag non-linear models
Thin plate regression splines
A penalized framework for distributed lag non-linear models
The incubation period during the pandemic of COVID-19: A systematic review and meta-analysis
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Distributed lag linear and non-linear models in R: The package dlnm
The New York Times. Presidential Election Results: Biden Wins
How mobility habits influenced the spread of the COVID-19 pandemic: Results from the Italian case study
The effect of human mobility and control measures on the COVID-19 epidemic in China
Policy and weather influences on mobility during the early US COVID-19 pandemic
Real-time pandemic surveillance using hospital admissions and mobility data
Quantifying COVID-19 importation risk in a dynamic network of domestic cities and international countries
Monitoring the COVID-19 epidemic with nationwide telecommunication data
Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race
Impacts of introducing and lifting nonpharmaceutical interventions on COVID-19 daily growth rate and compliance in the United States
Mobility restrictions are more than transient reduction of travel activities
COVID-19 lockdown induces disease-mitigating structural changes in mobility networks
Human mobility and Coronavirus disease 2019 (COVID-19): A negative binomial regression analysis
Risk mapping for COVID-19 outbreaks in Australia using mobility data
Stay-at-home works to fight against COVID-19: International evidence from Google mobility data
Efficiency of communities and financial markets during the 2020 pandemic
Use of US blood donors for national serosurveillance of severe acute respiratory syndrome Coronavirus 2 antibodies: Basis for an expanded national donor serosurveillance program
Ethical implications of user perceptions of wearable devices
Mobility network models of COVID-19 explain inequities and inform reopening
Principal Component Analysis
All the authors have seen and approved the final manuscript and have participated sufficiently in the work to take public responsibility for its content. S.S. and P.
The authors declare no competing interests.
The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-10941-2.
Correspondence and requests for materials should be addressed to S.R.B.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.