key: cord-0773684-qlcmnh4j authors: Liu, Mengyang; Liu, Mengmeng; Li, Zhiwei; Zhu, Yingxuan; Liu, Yue; Wang, Xiaonan; Tao, Lixin; Guo, Xiuhua title: The spatial clustering analysis of COVID-19 and its associated factors in mainland China at the prefecture level date: 2021-02-20 journal: Sci Total Environ DOI: 10.1016/j.scitotenv.2021.145992 sha: 7ccfa81cb8a04713d923482f611dd5cdd5931692 doc_id: 773684 cord_uid: qlcmnh4j
Coronavirus disease 2019 (COVID-19) has become a worldwide public health threat. Many associated factors, including population movement, meteorological parameters, air quality and socioeconomic conditions, can affect COVID-19 transmission. However, no study has combined these various factors in a comprehensive analysis. We collected data on COVID-19 cases and the factors of interest in 340 prefectures of mainland China from 1 December 2019 to 30 April 2020. Moran's I statistic, the Getis-Ord Gi* statistic and Kulldorff's space-time scan statistic were used to identify spatial clusters of COVID-19, and the geographically weighted regression (GWR) model was applied to investigate the effects of the associated factors on COVID-19 incidence. A total of 67,449 laboratory-confirmed cases were reported during the study period. Wuhan city and its surrounding areas were the cluster areas, and 25 January to 21 February 2020 was the clustering period of COVID-19. The population outflow from Wuhan played a significant role in COVID-19 transmission, with local coefficients varying from 14.87 to 15.02 across the 340 prefectures. Among the meteorological parameters, relative humidity and precipitation were positively associated with COVID-19 incidence, while average wind speed showed a negative correlation, and the relationship between average temperature and COVID-19 incidence was inconsistent between northern and southern China. NO2 was positively associated, and O3 negatively associated, with COVID-19 incidence.
Environments with high levels of inbound migration or travel, poor ventilation, high humidity or heavy rainfall, low temperature and high air pollution may be favorable for the growth, reproduction and spread of SARS-CoV-2. Therefore, applying appropriate lockdown measures and travel restrictions, strengthening the ventilation of living and working environments, controlling air pollution and making sufficient preparations for a possible second wave in the relatively cold autumn and winter months may help control and prevent COVID-19. Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a worldwide public health threat. Human-to-human transmission of SARS-CoV-2 has been identified based on clinical and epidemiological evidence (Shereen et al., 2020), and the infectivity of SARS-CoV-2 is greater than that of two earlier coronavirus diseases, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) (Petrosillo et al., 2020). The WHO characterized COVID-19 as a pandemic on 11 March 2020 (WHO, 2020b), and by 30 June 2020 there were 10 185 374 cases and 503 862 deaths across more than 200 countries (WHO, 2020a). However, this pandemic is much more than a health crisis: the United Nations has described it as a human, economic and social crisis (UN, 2020). Global trade has been projected to fall by 13% to 32%, depending on the depth and extent of the global economic downturn (CRS, 2020). Although COVID-19 is still spreading in many countries around the world, China and some other Asian countries have essentially controlled the epidemic. Studying the COVID-19 epidemic in China is therefore important, because it can give the public an overall and comprehensive understanding of the disease.
Since COVID-19 is an emerging and serious infectious disease, it is critical to implement spatio-temporal surveillance, which can help prioritize locations for targeted interventions, rapid testing and resource allocation. A study by Yang et al. explored the spatio-temporal patterns of the COVID-19 epidemic in Hubei Province and found that the spatial clusters were concentrated in Wuhan city. Other studies conducted at both the prefecture and county levels in Hubei Province and at the provincial level in mainland China found similar results (Mo et al., 2020; Xiong et al., 2020). Several studies have explored the spatial and temporal distribution of COVID-19 in mainland China, but their results were limited to the provincial level or to Hubei Province alone, and few studies have been conducted at a finer spatial scale with appropriate statistical methods. Many factors, including population movement, meteorological parameters, air quality and socioeconomic conditions, can affect COVID-19 transmission. A positive association was observed between population outflow from Wuhan and the number of COVID-19 cases (Zhang et al., 2020). Many studies have explored the relationship between meteorological factors and COVID-19 transmission, especially temperature and humidity, but the results are contradictory. A multicity study in China identified a negative association: each 1 °C increase in ambient temperature and each 1% increase in absolute humidity was related to a decline in daily confirmed case counts, with corresponding pooled RRs of 0.80 (95% CI: 0.75, 0.85) and 0.72 (95% CI: 0.59, 0.89), respectively. Another study conducted in 166 countries worldwide reported similar results: a 1 °C increase in temperature or a 1% increase in relative humidity was associated with a 3.08% (95% CI: 1.53%, 4.63%) or a 0.85% (95% CI: 0.51%, 1.19%) reduction in daily new cases of COVID-19, respectively. However, Xie et al.
reported the opposite result, a positive association, in China, while Pan et al. did not find significant relationships between meteorological factors and the R0 of COVID-19 in 202 locations of 8 countries (Pan et al., 2021). Air pollutants such as fine particulate matter (PM2.5) and inhalable particles (PM10) have also been shown to increase the risk of SARS-CoV-2 infection. Moreover, increasing population density, construction land area proportion and aged population density were associated with an increased COVID-19 morbidity rate, whereas gross domestic product (GDP) per unit of land area and hospital density were associated with a decreased COVID-19 morbidity rate in Wuhan, China (You et al., 2020). In summary, most previous studies focused on only one or a few influencing factors or on limited regions (Hubei Province or Wuhan city), and their results were sometimes inconsistent. These factors can influence each other and none of them acts completely independently, yet no study has combined the various influencing factors in a single analysis for mainland China. This study aims to investigate the spatial clustering pattern of COVID-19 in mainland China at the prefecture level and to explore the relationships between the incidence of COVID-19 and multiple factors, including population movement, meteorological parameters, air quality and socioeconomic conditions. Laboratory-confirmed COVID-19 case data for 340 prefectures of mainland China (Figure S1, Table S1) were extracted from the R package "nCov2019". This package aggregates global COVID-19 data from four different sources and provides real-time and historical data on COVID-19 in every country worldwide, including daily confirmed case counts.
The four data sources are the China National Health Commission (CNHC, 2020), the Tencent SARS-CoV-2 website (Tencent, 2020), the non-governmental organization Dingxiangyuan (DXY, 2020) and a public GitHub repository (GitHub Repo, 2020). These four sources are in turn based on official websites, including those of the WHO, the China National Health Commission, Chinese provincial and city health agencies, and public health agencies in other countries. The authors of the package collect, check and synthesize the information from these sources to produce the final database. The epidemic data for China originated mainly from the National Health Commission and the provincial and city health commissions, covering late December 2019 to 30 April 2020. Cases imported from other countries were excluded from the study. In addition, the epidemiological investigation by Chinese CDC experts indicated that the onset of illness in the earliest patients could be traced back to 8 December 2019 (Wuhan MHC, 2020). To account for the associated factors before onset, we incorporated influencing factors such as meteorology and air pollutants from 1 December 2019 onward. Meteorological data for every prefecture were obtained from the China Meteorological Data Sharing Service System (https://data.cma.cn/en), the authoritative platform through which the China Meteorological Administration shares its meteorological data resources with the public. We obtained daily data on average temperature (AT, °C), average wind speed (AWS, m/s), average relative humidity (ARH, %), average atmospheric pressure (AAP, hPa) and average precipitation (AP, mm) from 1 December 2019 to 30 April 2020, and averaged them over a total of 151 days for subsequent modeling.
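The averaging step turns each prefecture's daily series into a single value per variable. The sketch below assumes a hypothetical record layout of (prefecture, date, AT, AWS, ARH) tuples with `None` for missing days; real downloads from the CMA service use their own format, and the function name `period_means` is illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout (an assumption, not the CMA file format):
# (prefecture, ISO date, AT in °C, AWS in m/s, ARH in %), None = missing.
def period_means(records, variables=("AT", "AWS", "ARH")):
    """Average each variable over all available days for each prefecture."""
    by_prefecture = defaultdict(lambda: defaultdict(list))
    for prefecture, _date, *values in records:
        for name, value in zip(variables, values):
            if value is not None:  # skip missing observations
                by_prefecture[prefecture][name].append(value)
    return {
        pref: {name: mean(vals) for name, vals in series.items()}
        for pref, series in by_prefecture.items()
    }

daily = [
    ("Wuhan", "2019-12-01", 8.0, 2.1, 70.0),
    ("Wuhan", "2019-12-02", 6.0, 1.9, None),
    ("Harbin", "2019-12-01", -15.0, 3.0, 60.0),
]
print(period_means(daily))
```

Dropping missing days, rather than treating them as zero, keeps a gap in one station's record from biasing its mean downward.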
Air pollution data were obtained from more than 1600 fixed-site air quality monitoring stations of the China National Environmental Monitoring Centre (http://www.cnemc.cn/) between 1 December 2019 and 30 April 2020, covering every prefecture in mainland China. The average concentrations of six pollutants, namely PM2.5, PM10, carbon monoxide (CO), sulfur dioxide (SO2), nitrogen dioxide (NO2) and ozone (O3), as well as the air quality index (AQI), were used in this study. We obtained the daily proportion of the population outflow from Wuhan to other cities (%) and of the population inflow from other cities to Wuhan (%) from 10 January to 15 March 2020, a period covering the China Spring Festival travel season, from data released by Baidu Qianxi. Baidu Qianxi provides real-time dynamic information on regional population movements based on Baidu location-based services (LBS) and Baidu Tianyan. Data from Baidu Qianxi during the Spring Festival are freely available to the public (https://qianxi.baidu.com). Socioeconomic factors for every province were obtained from the China Statistical Yearbook 2020 released by the National Bureau of Statistics (http://data.stats.gov.cn/). The factors collected included population density, the proportion of people aged 65 and above, GDP per capita, the number of beds per 10 000 people, the number of health technicians per 10 000 people and the number of high schools and colleges per 100 000 people, covering population, health care and education. Note that these data reflected the overall condition of each province in 2018 rather than 2020. In addition, population data for every prefecture in mainland China were collected from the China Regional Economic Statistical Yearbook to calculate COVID-19 incidence.
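Incidence is reported per 1 000 000 residents. The snippet below shows that arithmetic and the median across prefectures. The prefecture names and counts are hypothetical placeholders, not study data.

```python
from statistics import median

def incidence_per_million(cases, population):
    """Cumulative incidence per 1 000 000 residents for one prefecture."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 1_000_000 / population

# Hypothetical (cases, resident population) pairs, for illustration only.
prefectures = {
    "Prefecture A": (50, 4_000_000),
    "Prefecture B": (0, 2_500_000),
    "Prefecture C": (3, 3_000_000),
}
rates = {name: incidence_per_million(c, p) for name, (c, p) in prefectures.items()}
print(rates)
print(median(rates.values()))  # -> 1.0
```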
First, the global Moran's I index was used to test whether COVID-19 incidence and the associated factors showed global spatial autocorrelation across the entire region. Global Moran's I generally ranges from -1 to 1. A positive Moran's I indicates that similar values tend to cluster in neighboring locations, while a negative Moran's I implies that high and low values are interspersed. When Moran's I is near 0, there is no spatial clustering, meaning that the data are randomly distributed. The Z-score and P-value were calculated to evaluate the significance of the global Moran's I index (Huo et al., 2012; Jiang and Zhao, 2011). Then, local spatial autocorrelation methods, including the Anselin Local Moran's I statistic and the Getis-Ord Gi* statistic, were used to analyse clusters of COVID-19 incidence and to identify COVID-19 incidence hotspots in the study period. As with the global index, the Z-score and P-value were used to test whether the Anselin Local Moran's I statistic was statistically significant. The results of the Anselin Local Moran's I statistic are presented as spatial clusters (high-high and low-low clusters) and spatial outliers (high-low and low-high outliers) at specific locations (Anselin, 1995). The Getis-Ord Gi* statistic measures the degree of spatial clustering of COVID-19 incidence at each study location. The degree of clustering and its statistical significance are assessed at a given confidence level according to the Z (Gi*)-score and P-value. Districts with Z-scores > 2.58, between 1.96 and 2.58, and between 1.65 and 1.96 were considered significant at the 99% (P < 0.01), 95% (P < 0.05) and 90% (P < 0.10) confidence levels, respectively.
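The following sketch computes global Moran's I and Anselin's local I_i with their high/low quadrant labels. It assumes a binary contiguity weight matrix with a zero diagonal and four hypothetical prefectures in a row. For significance it uses random-permutation inference, which is a swapped-in technique: the text does not state whether the software used an analytic or a permutation-based Z-score. Unlike the global index, local I_i values are not bounded by -1 and 1.

```python
import random
from statistics import mean, pstdev

def global_morans_i(x, w):
    """Global Moran's I for values x and an n-by-n weight matrix w with zero diagonal."""
    n = len(x)
    xbar = mean(x)
    z = [v - xbar for v in x]
    s0 = sum(sum(row) for row in w)  # total of all spatial weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

def moran_permutation_test(x, w, permutations=999, seed=0):
    """Pseudo z-score and two-sided pseudo p-value from randomly relabelling x."""
    rng = random.Random(seed)
    observed = global_morans_i(x, w)
    values = list(x)
    sims = []
    for _ in range(permutations):
        rng.shuffle(values)
        sims.append(global_morans_i(values, w))
    centre, spread = mean(sims), pstdev(sims)
    extreme = sum(1 for s in sims if abs(s - centre) >= abs(observed - centre))
    return observed, (observed - centre) / spread, (extreme + 1) / (permutations + 1)

def local_morans_i(x, w):
    """Anselin's local I_i with a high/low quadrant label for each location."""
    n = len(x)
    xbar = mean(x)
    z = [v - xbar for v in x]
    m2 = sum(v * v for v in z) / n
    results = []
    for i in range(n):
        lag = sum(w[i][j] * z[j] for j in range(n))  # weighted neighbour deviations
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] / m2 * lag, label))
    return results

# Four hypothetical prefectures in a row; neighbours share a border (binary weights).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [10, 9, 1, 0]
print(moran_permutation_test(x, w))
print(local_morans_i(x, w))
```

Here the two high-incidence units are labelled HH and the two low ones LL. A real analysis would also attach a per-location p-value to each local I_i, for example by conditional permutation, before mapping clusters.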
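The Gi* Z-score and the confidence bands quoted in the text can be sketched as follows. This assumes binary weights that include each location itself (w[i][i] = 1), which is what distinguishes Gi* from Gi. The five-prefecture layout and values are hypothetical.

```python
from math import sqrt

def getis_ord_gi_star(x, w):
    """Gi* z-scores; each row of w includes the self-weight w[i][i]."""
    n = len(x)
    xbar = sum(x) / n
    s = sqrt(sum(v * v for v in x) / n - xbar ** 2)
    scores = []
    for i in range(n):
        wi = w[i]
        sum_w = sum(wi)
        sum_w2 = sum(v * v for v in wi)
        num = sum(wi[j] * x[j] for j in range(n)) - xbar * sum_w
        den = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores

def confidence_band(z):
    """Map a Gi* z-score to the hot/cold-spot bands used in the text."""
    a = abs(z)
    if a > 2.58:
        level = "99%"
    elif a > 1.96:
        level = "95%"
    elif a > 1.65:
        level = "90%"
    else:
        return "not significant"
    return ("hot spot " if z > 0 else "cold spot ") + level

# Five hypothetical prefectures in a row; each row covers itself and its neighbours.
w = [[1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1]]
x = [10, 9, 1, 0, 0]
scores = getis_ord_gi_star(x, w)
print([confidence_band(z) for z in scores])
```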
A positive and significant Z-score indicates a hotspot, a negative and significant Z-score indicates a cold spot, and a Z (Gi*) score close to zero indicates no obvious spatial clustering (Lu et al., 2019; Peeters et al., 2015). Finally, Kulldorff's space-time scan statistic was used to explore the spatial and temporal clusters of COVID-19. The space-time scan statistic is defined by a cylindrical window whose base corresponds to space and whose height corresponds to time. The window is moved across space and time to identify potential geographical locations and time periods of clustering. The most likely cluster is the window with the maximum likelihood, and secondary clusters are also reported if they are statistically significant. The P-value was estimated by Monte Carlo simulation with 999 replications, and the relative risk was calculated as the ratio of the estimated risk within the cluster to that outside the cluster (Liu et al., 2018). Before the regression analysis of associated factors and COVID-19 incidence, Lasso regression was used to select the essential factors from among all candidate factors, which also mitigates collinearity among similar variables. Then, the geographically weighted regression (GWR) model was used to explore the spatial variation in the relationship between COVID-19 incidence and the essential associated factors. Because the GWR model takes the spatial positions of the data into account, its local parameters vary with spatial position. A conventional GWR model can be described by the following equation:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$

where y_i is the COVID-19 incidence in prefecture i, (u_i, v_i) are the coordinates of prefecture i, β_0(u_i, v_i) is the local intercept, β_k(u_i, v_i) is the local coefficient of the k-th associated factor x_ik, p is the number of factors, and ε_i is the random error. Kulldorff's space-time scan statistical analysis was conducted using SaTScan 9.3 software, and the GWR model was built with GWR 4.0 software.
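The cylinder-scanning idea can be sketched in a few dozen lines. This follows the discrete Poisson form of Kulldorff's log-likelihood ratio, but several choices are assumptions rather than the settings used with SaTScan 9.3: the probability model, expected counts built from each location's population share of each day's cases, circular bases formed by a centre plus its nearest neighbours, and the small window limits. The grid of counts is hypothetical. The Monte Carlo p-value uses the usual rank formula (rank + 1)/(R + 1).

```python
import math
import random

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for c observed vs e expected cases (0 if no excess)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

def relative_risk(c, e, total):
    """Risk inside the window divided by risk outside it."""
    return (c / e) / ((total - c) / (total - e))

def cylinders(coords, n_times, max_locations, max_length):
    """Circular bases (a centre plus its nearest neighbours) times contiguous periods."""
    n = len(coords)
    for centre in range(n):
        nearest = sorted(range(n), key=lambda j: math.dist(coords[centre], coords[j]))
        for k in range(1, min(max_locations, n) + 1):
            zone = frozenset(nearest[:k])
            for start in range(n_times):
                for end in range(start, min(n_times, start + max_length)):
                    yield zone, start, end

def most_likely_cluster(observed, expected, coords, max_locations, max_length):
    total = sum(map(sum, observed))
    best_llr, best = 0.0, None
    for zone, t0, t1 in cylinders(coords, len(observed[0]), max_locations, max_length):
        c = sum(observed[i][t] for i in zone for t in range(t0, t1 + 1))
        e = sum(expected[i][t] for i in zone for t in range(t0, t1 + 1))
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best_llr, best = llr, (zone, t0, t1, c, e)
    return best_llr, best

def scan_with_monte_carlo(observed, expected, coords, max_locations, max_length,
                          replications=999, seed=1):
    """Max LLR, its window, and a Monte Carlo p-value from null redistributions."""
    rng = random.Random(seed)
    n, n_times = len(observed), len(observed[0])
    cells = [(i, t) for i in range(n) for t in range(n_times)]
    weights = [expected[i][t] for i, t in cells]
    total = sum(map(sum, observed))
    llr, window = most_likely_cluster(observed, expected, coords, max_locations, max_length)
    as_extreme = 0
    for _ in range(replications):
        # Null: the same total number of cases scattered in proportion to expected counts.
        sim = [[0] * n_times for _ in range(n)]
        for i, t in rng.choices(cells, weights=weights, k=total):
            sim[i][t] += 1
        if most_likely_cluster(sim, expected, coords, max_locations, max_length)[0] >= llr:
            as_extreme += 1
    return llr, window, (as_extreme + 1) / (replications + 1)

# Hypothetical daily counts for 4 locations over 5 days (rows = locations).
observed = [[1, 6, 7, 1, 1],
            [1, 5, 6, 1, 1],
            [1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1]]
population = [100, 100, 100, 100]
coords = [(0, 0), (1, 0), (5, 5), (6, 5)]
daily_totals = [sum(day) for day in zip(*observed)]
expected = [[c_t * p / sum(population) for c_t in daily_totals] for p in population]
llr, window, p = scan_with_monte_carlo(observed, expected, coords, 2, 3, replications=99)
zone, t0, t1, c, e = window
print(sorted(zone), t0, t1, c, e, relative_risk(c, e, sum(daily_totals)), p)
```

In this toy grid, the scan picks out locations 0 and 1 on days 1 to 2 as the most likely cluster. Secondary clusters would be found by reporting further non-overlapping windows with significant LLRs.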
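The Lasso screening step can be illustrated with a plain coordinate-descent solver on standardized predictors. The text does not say which software or penalty value was used. Here λ is fixed by hand, whereas in practice it is often chosen by cross-validation, and the two predictors are hypothetical. Predictors whose coefficients shrink exactly to zero are the ones screened out.

```python
from statistics import mean, pstdev

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 over standardized predictors."""
    n = len(y)
    # Standardize so each column has mean 0 and (1/n)*sum(x^2) == 1
    # (a constant column would divide by zero and should be dropped first).
    x = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        x.append([(v - m) / s for v in col])
    ybar = mean(y)
    residual = [v - ybar for v in y]
    beta = [0.0] * len(x)
    for _ in range(sweeps):
        for j, xj in enumerate(x):
            # Correlation of x_j with the residual that excludes x_j's own contribution.
            rho = sum(xj[i] * (residual[i] + xj[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= xj[i] * delta
                beta[j] = new
    return beta

x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, -1, -1, -1, -1, 1]   # uncorrelated with x1 after centring
y = [3 * v for v in x1]       # only x1 drives the outcome
beta = lasso_coordinate_descent([x1, x2], y, lam=0.5)
selected = [name for name, b in zip(["x1", "x2"], beta) if b != 0.0]
print(beta, selected)
```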
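The GWR equation above can be estimated location by location with kernel-weighted least squares. This sketch keeps one predictor so the weighted normal equations have a closed form. It assumes a fixed Gaussian kernel with a hand-picked bandwidth; the kernel and bandwidth actually used in GWR 4.0 are not stated, and bandwidths are typically selected by a criterion such as AICc or cross-validation. The coordinates and values are hypothetical and built so that the slope flips sign between two regions.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Fixed Gaussian kernel: weight 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_single_predictor(coords, x, y, bandwidth):
    """Local (intercept, slope) at every location from kernel-weighted least squares."""
    estimates = []
    for here in coords:
        w = [gaussian_weight(math.dist(here, there), bandwidth) for there in coords]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        # Weighted normal equations for y = b0 + b1*x; undefined if x is locally constant.
        slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        intercept = (sy - slope * sx) / sw
        estimates.append((intercept, slope))
    return estimates

coords = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1), (10, 2)]
x = [1, 2, 3, 1, 2, 3]
y = [3, 5, 7, -1, -3, -5]  # y = 1 + 2x in one region, y = 1 - 2x in the other
est = gwr_single_predictor(coords, x, y, bandwidth=1.0)
for here, (b0, b1) in zip(coords, est):
    print(here, round(b0, 3), round(b1, 3))
```

A single global regression would average the two opposite slopes toward zero. The local fit recovers +2 and -2, which is the kind of spatially varying coefficient the GWR results report.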
All statistical tests were two-sided, and P < 0.05 was considered statistically significant. A total of 67 449 laboratory-confirmed COVID-19 cases reported from late December 2019 to 30 April 2020 were included in the study. The incidence rate varied from 0 to 1513.16 per million, with a median of 1.07 per million, across the 340 prefectures (Table 1, Figure 1a). The distributions of the associated factors are also shown in Table 1. These comprise population movement, five meteorological factors, six air pollutants and the AQI at the prefecture level, and six socioeconomic factors at the province level. The regions with the highest COVID-19 incidence were Wuhan and its surrounding cities, all of which are located in Hubei Province (Figure 1a). The global Moran's I statistic indicated that the COVID-19 cases were not randomly distributed [...] such as Shenzhen and Dongguan in Guangdong Province were also scattered cluster regions (Figure 1d). For temporal patterns, the clustering time of the most likely cluster ranged from 4 February 2020 to 18 February 2020, while nine secondary clusters occurred from 25 January 2020 to 21 February 2020 (Figure S2). [...] Figure 3a) and -9.30 (range: -11.27 to -5.19; Table 2, Figure 3b), respectively. Regarding the meteorological factors, the influence of AT varied by region from southern to northern China, with local coefficients varying from -0.16 to 0.20 (Table 2, Figure 3c). AWS was negatively associated with COVID-19 incidence, whereas ARH and AP were positively associated with it, with coefficients of -0.04 (range: [...] Our study not only explored the spatial distribution of COVID-19 but also examined the association between COVID-19 incidence and associated factors including [...] For the meteorological factors, the relationship between temperature and COVID-19 incidence was complex and varied from region to region.
In northern China, where temperatures are relatively low, temperature was positively correlated with COVID-19 incidence, whereas in southern China, where temperatures are relatively high, the correlation was negative. We speculate that different temperature ranges have different effects on SARS-CoV-2 infection and that the relationship may not be linear but rather an inverted U shape. In the lower temperature range, on the left side of the inverted U, infectivity increases with temperature. In the higher range, on the right side, it decreases with temperature. The left and right sides of the inverted U may correspond to northern and southern China, respectively. Studies based on either epidemiological surveys or laboratory experiments also support this result (Casanova et al., 2010; Prata et al., 2020; Xie and Zhu, 2020). For example, a national study of 122 Chinese cities observed that when the mean temperature over the previous two weeks was below 3 °C, daily confirmed COVID-19 cases increased by 4.86% with each 1 °C increase. In the tropical cities of Brazil, by contrast, each 1 °C increase in AT was associated with a 4.89% decrease in the number of daily COVID-19 cases (Prata et al., 2020). Nevertheless, the relationship between temperature and COVID-19 incidence is complex. Whether an inverted U-shaped relation exists, and what the optimum temperature for SARS-CoV-2 is, need further research. Densely populated and poorly ventilated environments, whether indoors or outdoors, could also increase the risk of airborne diseases such as SARS, tuberculosis and SARS-CoV-2 infection (Olsen et al., 2003; Zhang et al., 2019). Conversely, wind can accelerate air flow and virus dispersion, thereby reducing the COVID-19 incidence rate.
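The speculated inverted U can be made concrete with a hypothetical quadratic curve in temperature T. Here a, b and c are illustrative symbols, not parameters estimated in this study. The sign of the slope on either side of the vertex T* matches the north/south pattern described in the paragraph above.

$$
\begin{aligned}
y(T) &= a + bT + cT^{2}, \qquad c < 0 \\
\frac{dy}{dT} &= b + 2cT = 2c\,(T - T^{*}), \qquad T^{*} = -\frac{b}{2c} \\
\frac{dy}{dT} &> 0 \ \text{for } T < T^{*}, \qquad \frac{dy}{dT} < 0 \ \text{for } T > T^{*}
\end{aligned}
$$

Under this assumption, a locally estimated linear slope, such as the GWR coefficient for AT, would be positive where a prefecture's temperatures lie below T* (colder, northern prefectures) and negative where they lie above it (warmer, southern prefectures).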
Both ARH and AP were positively associated with COVID-19 incidence, perhaps because SARS-CoV-2 needs a relatively humid environment for growth and reproduction (Yao et al., 2020). The first COVID-19 case was detected in the Huanan Seafood Wholesale Market of Wuhan, and another wave of the epidemic was recently detected in the Xinfadi Wholesale Market of Beijing. Both findings support the theory that a wet environment is favorable for the transmission of COVID-19 (Wang et al., 2020). A climate-dependent epidemic model of the SARS-CoV-2 pandemic was run under different scenarios based on known coronavirus biology, and it suggested that strong outbreaks were likely in more humid climates (Baker et al., 2020). Research on MERS and other coronaviruses also showed that the virus begins to decay rapidly at lower humidity (Pyankov et al., 2018). Some other studies reported the opposite results for ARH, but they controlled for only a limited number of meteorological factors (Qi et al., 2020); our analysis controlled for many more confounding factors and should therefore be more robust. Among the air pollutant parameters, higher O3 may decrease the risk of SARS-CoV-2 infection, which may be attributed to two mechanisms. First, many previous studies have found that O3 has a broad sterilization spectrum and can inactivate a wide range of pathogens, including viruses, bacteria, fungi and protozoa (Elvis and Ekta, 2011). For example, a previous study reported that ozone in water at a high concentration of 27.73 mg/L completely inactivated the SARS virus within 4 minutes, and medium and low concentrations also inactivated the virus with varying efficacy (Zhang et al., 2004).
Second, a moderate dose of O3 can trigger a cascade of immune reactions by inducing large amounts of inflammatory cytokines, including interferons, interleukins and tumor necrosis factor. O3 has therefore been used to treat many kinds of diseases and may be a protective factor against COVID-19 (Elvis and Ekta, 2011). In addition, our research found that NO2 was positively associated with COVID-19 incidence. Studies from the United States and Europe also observed positive relationships between NO2 and the COVID-19 case-fatality and mortality rates (Liang et al., 2020; Ogen, 2020). In the US, each 4.6 ppb increase in NO2 was associated with increases of 7.1% (95% CI 1.2% to 13.4%) and 11.2% (95% CI 3.4% to 19.5%) in the COVID-19 case-fatality rate and mortality rate, respectively (Liang et al., 2020). NO2 is a highly reactive exogenous oxidant. It can enhance oxidative stress and induce inflammatory reactions that generate reactive nitrogen and oxygen species, which may eventually damage the immune system and make the human body more susceptible to SARS-CoV-2 infection (Bevelander et al., 2007). However, correlation does not imply causation, and further studies are needed to establish causal relationships between the influencing factors and COVID-19. This study has several limitations. First, socioeconomic information was not available for each prefecture, only at the provincial level for 2019, so these factors were entered into the model as global variables. Second, this was a cross-sectional study with all data averaged over the study period, so time effects were ignored and causality cannot be verified. Third, the population movement data were limited to the Spring Festival travel rush period, from 10 January to 15 March.
Although this did not cover the whole study period, it fully represented the population flow during the study period for the following reasons: 1) the Spring Festival in China is usually accompanied by large-scale population movements, and the 2020 travel rush started on 10 January, after which population movement grew dramatically until the end of the holiday; 2) a lockdown was implemented in Wuhan from 23 January 2020 until 8 April 2020, so population outflows from Wuhan and inflows from other cities to Wuhan were small enough (except for some special personnel such as medical staff) to be ignored; and 3) we used the average proportion of population movement rather than the absolute number of people, which was relatively stable for a particular region. In conclusion, our study found that a variety of associated factors including [...]

Number of high schools and colleges per 100 000 people: 4124.00 5208.00 5311.00 5585.00 6633.00 377.00
Note: & The COVID-19 incidence (per 1 000 000) was calculated from late December 2019 to 30 April 2020. * The "Population outflow from Wuhan to other cities (%)" was calculated from 10 January to 15 March 2020. * The "Population inflow from other cities to Wuhan (%)" was calculated from 10 January to 15 March 2020. # "Socioeconomic factors" in this study described the socioeconomic condition at the provincial level in 2018.

-0.19 -0.17 -0.12 -0.07 -0.02 0.17
Note: * The "Population outflow from Wuhan to other cities (%)" was calculated from 10 January to 15 March 2020. * The "Population inflow from other cities to Wuhan (%)" was calculated from 10 January to 15 March 2020.
Local indicators of spatial association-LISA
Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic
Nitrogen dioxide promotes allergic sensitization to inhaled antigen
Geographically weighted regression: a method for exploring spatial nonstationarity
Effects of air temperature and relative humidity on coronavirus survival on surfaces
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China
Epidemic notification
Global economic effects of COVID-19
DXY (DingXiangYuan), 2020. Real-time data on the novel coronavirus
Ozone therapy: A clinical review
Combining geostatistics with Moran's I analysis for mapping soil heavy metals in Beijing, China
Application of spatial autocorrelation method in epidemiology
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Evidence for probable aerosol transmission of SARS-CoV-2 in a poorly ventilated restaurant
Urban air pollution may enhance COVID-19 case-fatality and mortality rates in the United States
Impact of meteorological factors on the COVID-19 transmission: A multi-city study in China
Population movement, city closure in Wuhan and geographical expansion of the 2019-nCoV pneumonia infection in China in
Spatial and temporal clustering analysis of tuberculosis in the mainland of China at the prefecture level
Analysis of Epidemiological characteristics of scarlet fever in
An analysis of spatiotemporal pattern for COIVD-19 in China based on space-time cube
Assessing nitrogen dioxide (NO(2)) levels as a contributing factor to coronavirus (COVID-19) fatality
Transmission of the severe acute respiratory syndrome on aircraft
Warmer weather unlikely to reduce the COVID-19 transmission: An ecological study in 202 locations in 8 countries
Getis-Ord's hot- and cold-spot statistics as a basis for multivariate spatial clustering of orchard tree data
COVID-19, SARS and MERS: are they closely related?
Temperature significantly changes COVID-19 transmission in (sub)tropical cities of Brazil
Survival of aerosolized coronavirus in the ambient air
COVID-19 transmission in Mainland China is associated with temperature and humidity: A time-series analysis
COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses
Real-time tracking of the coronavirus infection
UN (the United Nations), 2020. The social impact of COVID-19
Increasing SARS-CoV-2 nucleic acid testing capacity during the COVID-19 epidemic in Beijing: experience from a general hospital
WHO (World Health Organization), 2020a. Coronavirus disease (COVID-19) situation report-145
WHO (World Health Organization), 2020b. WHO characterizes COVID-19 outbreak as pandemic
Open-source analytics tools for studying the COVID-19 coronavirus outbreak
Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries
Wuhan MHC (Wuhan Municipal Health Commission), 2020. The latest report of viral pneumonia of unknown cause from experts interpret
Association between ambient temperature and COVID-19 infection in 122 cities from China
Spatial statistics and influencing factors of the COVID-19 epidemic at both prefecture and county levels in Hubei Province, China
Spatio-temporal patterns of the 2019-nCoV epidemic at the county level in Hubei Province, China
On airborne transmission and control of SARS-Cov-2
Distribution of COVID-19 morbidity rate in association with social and economic factors in Wuhan, China: implications for urban development
Examination of the efficacy of ozone solution disinfectant in inactivating SARs virus
Spatial distribution of tuberculosis and its association with meteorological factors in mainland China
Exploring the roles of high-speed train, air and coach services in the spread of COVID-19 in China
Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China

Xiaonan Wang: Writing - review & editing, Supervision. Lixin Tao: Writing - review & editing, Project administration, Supervision. Xiuhua Guo: Writing - review & editing, Conceptualization, Project administration, Supervision.