key: cord-0741068-kizmret5
authors: Cho, Jung-Hoon; Kim, Dong-Kyu; Kim, Eui-Jin
title: Multi-scale causality analysis between COVID-19 cases and mobility using ensemble empirical mode decomposition and causal decomposition
date: 2022-04-30
journal: Physica A
DOI: 10.1016/j.physa.2022.127488
sha: 0eadd2717808f0841ac84eb96b8dec86555baa5a
doc_id: 741068
cord_uid: kizmret5

The global spread of the coronavirus disease 2019 (COVID-19) pandemic has affected the world in many ways. Because the disease is communicable, it is difficult to sufficiently investigate the causes of the epidemic’s spread. This study comprehensively investigates the causal relationship between the spread of COVID-19 and the mobility level at multiple time scales, together with its influencing factors, using ensemble empirical mode decomposition (EEMD) and the causal decomposition approach. Linear regression analysis is used to investigate the significance and importance of the influential factors on the intrastate and interstate causal strength. The results of the EEMD analysis indicate that the mid-term and long-term domains portray the macroscopic component of the states’ mobility level and COVID-19 cases, which represents their overall intrinsic characteristics. In particular, the mobility level is more strongly associated with the long-term variations of COVID-19 cases than with the short-term variations. Intrastate causality analysis identifies significant effects of median age and political orientation on the causal strength at specific time scales, some of which cannot be identified by the existing method. Interstate causality results show a negative association with interstate distance and a positive association with airline traffic in the long-term domain. Clustering analysis confirms that states with a higher gross domestic product and a political orientation closer to the Democrats tend to adhere more closely to social distancing.
The findings of this study have a practical implication for policymakers: whether social distancing policies are working effectively should be monitored through long-term trends of COVID-19 cases rather than short-term ones.

The recent coronavirus disease 2019 (COVID-19) pandemic has affected daily lives in various ways. The United States has suffered from the pandemic by being one of the countries with many

decomposition has merits when investigating the causality of the mutual causation system in different frequency domains (Yang et al., 2018). Absolute and relative causal strengths derived from causal decomposition are also compared to find the causal relationships between the paired IMFs.

The purpose of this paper is to comprehensively investigate the spread of COVID-19 in different frequency domains (i.e., different time scales), ranging from short-term to long-term influences, using signal decomposition methods. Linear regression analysis is also applied to identify the state-specific influential factors for the intrastate causal strength between the number of COVID-19 cases and the number of daily trips (i.e., the social distancing measure), and to compare their significance and importance. Further, interstate causal analysis is performed to identify the mutual causation between different states at different time scales and to find the strength and direction of the causal relationship among the states. Lastly, clustering analysis is conducted to group the states by energy distribution of COVID-19

EEMD is also useful when investigating the intrinsic multi-scale characteristics of various sources (Kim et al., 2019). The IMFs obtained from the EEMD method were grouped into high-, medium-, and low-frequency components, which can represent the short-, medium-, and long-term variations of the original signal, respectively.
The IMFs extracted by the EMD process represent the local characteristics of passenger flows well (Chen and Wei, 2011).

Causal decomposition is a causality analysis method based on EEMD. It is known that a combination of various factors affects the transmission of infectious diseases in different frequency domains (Liu et al., 2021b). Therefore, if the effects of these factors could be analyzed by decomposing them into signals having different periods, the hidden characteristics caused by the interference of different periods could be disclosed more clearly. We thus applied causal decomposition to reveal the unseen causal relationship by separating signals with different frequency domains. Other methods can be considered aside from causal decomposition. For instance, Granger causality is superior in a linear stochastic situation where separability is guaranteed, but it is not appropriate for a non-linear deterministic system. On the other hand, causal decomposition has the advantage of reflecting real-world data and phenomena based on the instantaneous phase dependency between cause and effect, that is, oscillatory stochastic and deterministic mechanisms (Yang et al., 2018). Several studies have applied this technique, for example to the rate of change of the GDP time series of major economies at different time scales (Mao et al., 2020) and to the investigation of malaria epidemics (Craciunescu et al., 2019).

The effect of social distancing and travel restrictions on COVID-19 outbreaks has been investigated in the reviewed studies. Their efforts are mainly focused on the influence of road or air traffic on the spread of COVID-19, or on microscopic analysis of a specific policy such as lockdown. However, due to the difficulty of quantifying and decomposing macroscopic data, most of the reviewed studies did not consider the macroscopic effect of COVID-19 cases and social distancing at the state level.
Therefore, EEMD and causal decomposition, which have been applied in the analysis of other macroscopic time series, can fill this research gap.

Empirical mode decomposition (EMD) was developed to decompose non-linear and non-stationary signals into orthogonal sub-signals called intrinsic mode functions (IMFs) and a trend, each of which represents a distinct time scale (Huang et al., 1998). The decomposed IMFs represent the short-term periodic patterns or part of the long-term trend at each time scale. The Hilbert transform was used to derive the instantaneous phase and frequency from the IMFs obtained by EMD; each IMF represents an amplitude- and frequency-modulated signal, as noted in Equation 1.

Ensemble EMD (EEMD) was developed to deal with the drawbacks of the original EMD (Wu and Huang, 2009). EEMD is a noise-assisted data analysis technique used to extract the true signal from the data. These true IMF components are regarded as the mean of an ensemble of trials added by the

EEMD typically uses added white noise with a standard deviation ranging from 0.1 to 0.4 (Wu and Huang, 2009); this paper uses 0.35. The number of ensemble members is set to 1000, which should be enough to average out the added noise.

be regarded as the coordinates in multidimensional space, and the variance-weighted Euclidean distance is then calculated to quantify the causal strength (Yang et al., 2018). Absolute causal strengths are defined in Equations 6 and 7, and they range between 0 and 1.

Table 1 describes the data used in this study. The data for COVID-19 confirmed cases in the US were acquired from USA Facts, a non-profit organization that provides data and reports on the US population (USA Facts, 2020). To represent how many people are traveling in each state, we use the

Transportation Statistics, 2020b).
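The statement in the EEMD description that 1000 ensemble members should be enough to average out the added noise can be illustrated with a minimal sketch. It is not the authors' implementation and it omits the EMD sifting step entirely: each trial only adds white noise to a synthetic signal, and the trials are averaged, so that the residual noise can be compared with the sigma / sqrt(N) decay expected for the mean of N independent noise draws. The synthetic signal, the names `ensemble_mean` and `rms`, the fixed seed, and the reading of 0.35 as a fraction of the signal's standard deviation are all assumptions made for illustration.

```python
import math
import random
import statistics


def ensemble_mean(signal, noise_ratio, n_trials, rng):
    """Average n_trials noisy copies of signal (stand-in for averaging IMFs)."""
    # Assumption: noise std is noise_ratio times the signal's own std.
    sigma = noise_ratio * statistics.pstdev(signal)
    total = [0.0] * len(signal)
    for _ in range(n_trials):
        for i, value in enumerate(signal):
            total[i] += value + rng.gauss(0.0, sigma)
    return [t / n_trials for t in total], sigma


def rms(values):
    """Root-mean-square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic 222-day signal (hypothetical, not the paper's data).
signal = [math.sin(2 * math.pi * d / 7) + 0.02 * d for d in range(222)]
rng = random.Random(0)

n_trials = 1000
averaged, sigma = ensemble_mean(signal, 0.35, n_trials, rng)
residual = rms([a - s for a, s in zip(averaged, signal)])
expected = sigma / math.sqrt(n_trials)

print(f"added noise std:          {sigma:.4f}")
print(f"residual after averaging: {residual:.4f}")
print(f"sigma / sqrt(N):          {expected:.4f}")
```

In a full EEMD run, each noisy copy would be decomposed by EMD and the resulting IMFs averaged component by component; the sketch only shows why a large ensemble leaves little of the added noise behind.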
All variables are standardized so that the parameter estimates of variables with different units can be compared without concern about multicollinearity.

We checked the correlation coefficients for the intrastate variables and the interstate variables, which are used in the linear regression analysis in a later section. Regarding the intrastate variables, the variance inflation factor values, which represent multicollinearity, were lower than 1.50. The Pearson correlation coefficient between the intrastate variables was −0.036. Given the small correlations among the intrastate and interstate variables, we investigated the effect of those variables using linear regression analysis in a later section.

Variability of amplitude and phase in different frequency domains was reported for the previous spread of influenza, which was related to geographical distance or temperature (Alonso et al., 2007). The travel pattern was also associated with seasonality, with variability depending on gender, race, income, and geographical attributes (Yang et al., 2011), and it has a nonlinear relationship with COVID-19 cases (Li et al., 2021). Therefore, as with the impact of policy, EEMD is required to analyze daily trips and COVID-19 cases, as explained by different periods with multiple

We first obtained IMFs by applying the EEMD method to the time series of COVID-19 cases and the number of daily trips for all 50 states and a federal district (Washington, D.C.) in the United States. Figure 2 depicts the decomposed IMFs after employing EEMD in California and New York state. The number of state policies, represented as a time series, and the corresponding IMF are presented as gray and red lines, respectively. The left y-axis describes the normalized amplitude of the IMF, while the right y-axis shows the number of state policies.
Each IMF represents an orthogonal (i.e., mutually independent) signal that is associated with a specific frequency domain. To characterize those IMFs, the average period, the energy strength, and the correlation coefficient with each state policy were calculated and displayed on the right side of the figure. The energy strength ranges from 0% to 100% and indicates how much of the energy density distribution corresponds to each IMF, as denoted in Equation 10.

The IMF having the highest cross-correlation coefficient is highlighted in boldface and colored blue in

New York state, both COVID-19 cases and the number of trips showed the highest correlation at IMF 5, with a period of 74 days. The correlation coefficient was commonly highest at a low-frequency IMF, although the details differed slightly from state to state. These results show that the effect of state policy on the number of daily trips, and in turn on the number of COVID-19 cases, appeared as a long-term effect. Therefore, sustained policy measures may be more effective than temporary ones.

approximately one week, while IMF 5 and IMF 6, representing long-term patterns, have a period of more than a couple of months. IMF 3 and IMF 4 denote mid-term patterns and have the largest variation among the states. In other words, the intrinsic and intricate pattern of a state could be recognized from the mid-term pattern, while there was no significant difference by region between the average periods in the short term (i.e., the weekly pattern) and the long term. Therefore, this study mainly conducted state-level analysis focusing on the short term and the long term, because the time scale of the mid-term IMFs varies from state to state.
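The two descriptors used above, the energy strength of each IMF and its average period, can be sketched as follows. Equation 10 is not reproduced in this text, so the sketch assumes a common convention: the energy of an IMF is the sum of its squared amplitudes, expressed as a percentage of the total over all IMFs. The average period is estimated here from sign changes, which is also an assumption; the paper may use a different estimator (for example, one based on local extrema). The function names and the synthetic "IMFs" are hypothetical.

```python
import math


def energy_share(imfs):
    """Percent of total squared amplitude carried by each IMF."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    return [100.0 * e / total for e in energies]


def average_period(imf):
    """Mean period in samples, estimated from sign changes."""
    # Two sign changes correspond to one full oscillation.
    crossings = sum(1 for a, b in zip(imf, imf[1:]) if a * b < 0)
    return 2.0 * len(imf) / crossings if crossings else float("inf")


# Synthetic stand-ins for two IMFs over a 222-day record (not the paper's data).
days = range(222)
weekly = [1.0 * math.sin(2 * math.pi * d / 7 + 0.5) for d in days]
long_term = [2.0 * math.sin(2 * math.pi * d / 74 + 0.5) for d in days]

shares = energy_share([weekly, long_term])
periods = [average_period(weekly), average_period(long_term)]
for name, share, period in zip(("weekly", "long-term"), shares, periods):
    print(f"{name}: {share:.1f}% of energy, average period {period:.1f} days")
```

With these synthetic inputs, the component with twice the amplitude carries roughly four times the energy, which is why a low-frequency IMF with a large amplitude can dominate the energy distribution even though it completes only a few cycles in the record.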
The decomposed functions of the number of daily trips (see red triangles) also appeared to be similar to those of the COVID-19 cases in the short-, medium-, and long-term patterns. However, the average periods were relatively steady and varied only slightly among the states in the short and

Notes. * = p < 0.05; ** = p < 0.01; *** = p < 0.001.

To find the mutual causal relationship between the COVID-19 cases and the number of daily trips, causal decomposition was conducted, followed by a regression analysis. The bottom of Table 2 presents each IMF's average causal strength between the COVID-19 cases and the number of daily trips, and the results of the regression of the causal strength on the intrastate variables. For the causal analysis between COVID-19 cases and the number of trips, it is necessary to compute the average causal strength in each direction, since it is not known whether the number of confirmed cases or the social distancing mandate is the cause, and these patterns are expected to change continuously over time and space. The average causal strength of IMF 5, representing long-term causality, is the largest among the IMFs, which indicates that the long-term effects dominate the influences between COVID-19 and the number of daily trips (i.e., the degree of social distancing). The average convergent cross mapping (CCM) coefficient also tends to be larger for the long-term effects. While the short-term effect of social distancing policies is reported to shift the peak of the epidemic while keeping its height at the same level, those policies significantly flatten the long-term trend of the spread (Bastos and Cajueiro, 2020). Therefore, the energy strength of the long-term domain of COVID-19 cases may partially represent the degree of social distancing.

The causal strength from the COVID-19 cases to the daily trips, as obtained by causal decomposition, becomes stronger when the state's median age is high.
The higher the median age, the more successfully trips were reduced, or social distancing practiced, in response to the spread of the disease. Elderly people considerably reduced their trips during the COVID-19 pandemic, as they are thought to be one of the most susceptible populations (Liu et al., 2021a). This is also comparable to the results of Hu et al. (2021), who found that areas with a higher proportion of the 40 to 65 age group are likely to reduce more trips than those with a higher proportion of the 0 to 24 age group. For the causal strength at the low-frequency IMF, the stronger the state's Democratic tendency, the greater was the effect of reducing trips in response to outbreaks of COVID-19. Political orientation is known to be a significant factor affecting how trips respond to cases. Democrats are more likely

The spread of epidemics does not only occur within states. It can also occur through the interaction between states. Reports on the number of confirmed cases and the spread of an epidemic may influence people's decisions to travel. We applied causal decomposition to discover mutual causation of two kinds: between the COVID-19 cases of two states, and between the COVID-19 cases of one state and the daily trips of another. The former captures the transmission of COVID-19 between states, and the latter investigates the impact of other states' COVID-19 cases on people's mobility level. In this study, a total of 2,550 (51×50) pairs of causal strengths are calculated to investigate the mutual causal relationship among the 50 states and a federal district. As in the case of intrastate causality, we only measure the average causal strength in each direction.

Table 3 presents the average causal strength and the parameter estimates for each IMF.
Similar to the intrastate causality, the average causal strength of IMF 5 is the largest among the IMFs, which indicates that the long-term effects also dominate the influences between the COVID-19 cases of a pair of states and those between COVID-19 cases and the number of daily trips. In the United States, the distance between two states greatly influences road traffic, so the amount of road traffic can be considered by examining the distance between states. The distance variable has a negative association in IMF 3, 4, and 5. This result implies that the smaller the distance, the more likely mutual travel is, and the higher the causal strength between COVID-19 cases. However, IMF 6, representing the longest term, shows a positive relationship between the geographic distance and the causal strength. IMF 6 shows a period of 74 or 111 days, which is nearly a third or a half, respectively, of the total period of 222 days. Therefore, this trend appears to be determined by features of the entire period, such as seasonality, rather than by specific factors.

On the other hand, since airplanes tend to be used for relatively long-distance travel, the amount of airline traffic reflects the number of trips made between states over a longer distance. Airline traffic showed a positive association in IMF 4 and 5. This is in line with the expectation that the more air traffic occurs over the long term, the greater is the impact on other states. It is acknowledged that there is a significant linear correlation between domestic air traffic volume and the spread of COVID-19 (Lau et al., 2020). However, the results show that air traffic has a negative association in a high-frequency IMF, which differs from the hypothesis.
This implies that the difference is due to the greater effect of not only traffic but also other unobserved factors, including noise, that were not considered in the regression model (Xiao et al., 2020).

Table 3 Notes. * = p < 0.1; ** = p < 0.05; *** = p < 0.01.

Due to the seriousness of COVID-19, the number of confirmed cases worldwide is disclosed on a daily basis. As this information is widely accessible to the public, the spread of COVID-19 in one state may greatly influence the trip decision-making processes in other states. These linear regression results describe the causality from the reported COVID-19 cases in one state to the number of daily trips in another state as a function of the distance or the estimated amount of airline traffic between the states. The results show that the distance between states is significant in all frequency domains. The association is negative from IMF 1 to IMF 5, which means that trips in one state are less likely to be affected by new COVID-19 cases in a distant state. This association is also consistent with the notion that near things are more related than distant things. On the other hand, the positive influence in IMF 6, which also appeared in the case-to-case analysis, could be seen as the effect of an overall trend rather than any specific factor. This effect could be examined further with a longer data collection period. The domestic airline traffic variable shows a significant positive association in IMF 5 and a negative association in IMF 1 to 3. In the long-term domain, the higher the air traffic volume between two states, the higher their causality, whereas in the short term, more air traffic was associated with a lower causal strength between the two states. This is the same trend as revealed in the relationship between the COVID-19 cases shown above.
These results might stem from a phenomenon that we have not considered or the

The preceding results show that each state has a certain frequency domain that dominantly affects or is affected. We conduct a clustering analysis to group states having similar time-scale energy strengths of COVID-19 cases. IMF 1 and 2 are classified as short-term, IMF 3 and 4 as mid-term, and IMF 5 and 6 as long-term components. Clustering methods can be divided into simultaneous clustering and hierarchical clustering. We considered the basic method of each approach, K-means clustering and agglomerative hierarchical clustering, both of which can cluster points in 3D space into several groups. K-means clustering is sensitive to outliers and has the disadvantage that it can only find circular clusters. Figure 5 depicts the scatter plots for each method according to the predetermined number of clusters. Since the three variables used for clustering are the relative ratios of the energy strength of the short-term, mid-term, and long-term domains, they sum to 1.0 and can therefore be drawn on one plane (i.e., x + y + z = 1, where x, y, and z denote the three ratios). Each color represents a cluster.

The silhouette method is applied to evaluate the clustering performance. A higher overall average silhouette coefficient indicates a better clustering result. Table 4 reports the calculated average silhouette coefficient of the K-means and agglomerative hierarchical clustering. Both methods require the number of clusters to be determined in advance. To derive the best combination of clustering method and number of clusters, we evaluated the clustering performance using the average silhouette coefficient, which compares the distances within a cluster with the distances to other clusters. The evaluation showed that K-means clustering at k=2 has the highest average silhouette coefficient, 0.52, so we adopted it in this study.
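The selection procedure described above, clustering the three energy ratios for several candidate numbers of clusters and keeping the one with the highest average silhouette coefficient, can be sketched in plain Python. This is a hedged illustration rather than the authors' code: the data are synthetic energy profiles (two hypothetical groups of 25 points each), the farthest-point initialization is a choice made here to keep the run deterministic, only K-means is shown, and all function names are made up for the example. The resulting scores are therefore not comparable with the 0.52 reported in the text.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def init_centers(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette taken as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def make_state(base, rng):
    """Jitter a (short, mid, long) energy profile and renormalize to sum 1."""
    raw = [max(v + rng.uniform(-0.05, 0.05), 0.01) for v in base]
    total = sum(raw)
    return tuple(v / total for v in raw)


rng = random.Random(0)
points = [make_state((0.60, 0.25, 0.15), rng) for _ in range(25)]   # short-term dominant
points += [make_state((0.20, 0.20, 0.60), rng) for _ in range(25)]  # long-term dominant

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print({k: round(s, 2) for k, s in scores.items()}, "best k:", best_k)
```

Because every point satisfies short + mid + long = 1, the points lie on a plane in 3D space, which is why the scatter plots described above can be drawn in two dimensions without losing information.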
According to the energy strength in the short-, mid-, and long-term IMFs, the states are clustered into two groups. The optimal number of clusters is determined by the silhouette method, which measures the similarity of each element to its own cluster compared with the other clusters (Rousseeuw, 1987).

corresponding standardized values of the intrastate variables. Cluster 1, consisting of 29 states, has relatively higher energy strength in the short-term IMFs; Cluster 2 includes 21 states, having higher energy strength in the long-term IMFs. The logistic regression used to classify the clusters reveals a statistically significant association with the intrastate variables, including GDP and political orientation, as shown in Table 6. A positive coefficient indicates that the attribute increases the probability of a state being classified into Cluster 2. Compared to Cluster 1, states belonging to Cluster 2 tend to produce more GDP and are politically closer to the Democrats. This reveals a relationship between the clusters and socio-demographic characteristics: the higher the GDP and the closer the political orientation to the Democrats, the higher the

and these states are more likely to embrace social distancing policies. Therefore, in Cluster 2, COVID-19 cases would be mainly affected by long-term variations stemming from social distancing policies and global trends rather than by short-term variations of COVID-19 outbreaks. These results are consistent with the finding that the causality between social distancing and COVID-19 cases mainly comes from the long-term rather than the short-term scale. These findings suggest that state policymakers need to continuously monitor long-term trends of COVID-19 cases to determine whether social distancing policies are working effectively.
Prior works have focused on the influence of road or airline traffic on the spread of COVID-19

show a negative association with the interstate distance, which implies that distance plays a significant role in the spread of COVID-19. It is also found that, in the long-term domain, the higher the air traffic volume between two states, the higher their causality, whereas in the short term, more air traffic was associated with a lower causal strength between the two states, owing to noisy and unobserved factors.

Clustering analysis helps reveal the relationship between the clusters and socio-demographic characteristics. Cluster 1 has relatively higher energy strength in the short-term IMFs, and Cluster 2 shows higher energy strength in the long-term IMFs. It turned out that the higher the GDP and the closer the political orientation to the Democrats, the higher the energy strength of the long-term IMFs. Considering the tendency to

Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong
Convergent cross mapping (CCM) based approach for isolating the source of plant-wide disturbances
Polarization and public health: Partisan differences in social distancing during the coronavirus pandemic
Seasonality of Influenza in Brazil: A Traveling Wave from the Amazon to the Subtropics
Modeling and forecasting the early evolution of the Covid-19
Time series regression studies in environmental epidemiology
Bureau of Transportation Statistics, 2020a.
Trips by Distance
Airline Origin and Destination Survey (DB1B)
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Causality Detection Methods Applied to the Investigation of Malaria Epidemics
Partisan differences in physical distancing are linked to health outcomes during the COVID-19 pandemic
COVID-19: US federal accountability for entry, spread, and inequities-lessons for the future
Social distancing across vulnerability, race, politics, and employment: How different Americans changed behaviors before and after major COVID-19 policy announcements
Modeling COVID-19 scenarios for the United States
Coronavirus infections and deaths by poverty status: The effects of social distancing
Contributions to the mathematical theory of epidemics. II. The problem of endemicity
Spatiotemporal filtering method for detecting kinematic waves in a connected environment
A Hybrid Approach Based on Variational Mode Decomposition for Analyzing and Predicting Urban Travel Speed
The association between international and domestic air traffic and the coronavirus (COVID-19) outbreak
The relationship between trends in COVID-19 prevalence and traffic levels in South Korea
COVID-19, flattening the curve, and Benford's law. Phys. A Stat
Outbreak dynamics of COVID-19 in Europe and the effect of travel restrictions
Elderly mobility during the COVID-19 pandemic: A qualitative exploration in Kunming
The role of seasonality in the spread of COVID-19 pandemic
A model based study on the dynamics of COVID-19: Prediction and control
Dynamic correlations at different time-scales with empirical mode decomposition
Socio-Economic Disparities and COVID-19 in the USA
COVID-19 US State Policy Database
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
Effect of Nonpharmaceutical Interventions on Transmission of Severe Acute Respiratory Syndrome Coronavirus 2
Modeling Social Network Influence on Joint Trip Frequency for Regular Activity Travel Decisions
Detecting causality in complex ecosystems. Science
SQGDP2 Gross domestic product (GDP) by state
American Community Survey 1-Year Estimates Detailed Tables: MEDIAN AGE BY SEX [WWW Document]
American Community Survey 1-Year Estimates Detailed Tables: RACE [WWW Document]
USA Facts, 2020. US COVID-19 cases and deaths by state
Ensemble empirical mode decomposition: A noise-assisted data analysis method
Gold price analysis based on ensemble empirical model decomposition and independent component analysis
Tourism Traffic Demand Prediction Using Google Trends Based on
Cross-correlation analysis of stock markets using EMD and EEMD
Causal decomposition in the mutual causation system
Variability and seasonality of active transportation in USA: evidence from the 2001 NHTS
Analysis of economic growth fluctuations based on EEMD and causal decomposition

verify these findings apart from the US, it is necessary to apply this study to other regions with different cultural backgrounds. Lastly, the strict social distancing mandates have recently been relaxed around the world following vaccination. Thus, future research should explore the possible influence of vaccination and the modified mandates on the spread of COVID-19.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.