key: cord-0792216-plc2ksfo authors: Xu, Gang; Jiang, Yuhan; Wang, Shuai; Qin, Kun; Ding, Jingchen; Liu, Yang; Lu, Binbin title: Spatial Disparities of Self-reported COVID-19 Cases and Influencing Factors in Wuhan, China date: 2021-10-25 journal: Sustain Cities Soc DOI: 10.1016/j.scs.2021.103485 sha: 1e6a82ed04034b5b876b2167320bf55c333936c5 doc_id: 792216 cord_uid: plc2ksfo
The lack of detailed COVID-19 case data at a fine spatial resolution restricts the investigation of spatial disparities in its attack rate. Here, we collected nearly one thousand self-reported cases from a social media platform during the early stage of the COVID-19 epidemic in Wuhan, China. We use kernel density estimation (KDE) to explore spatial disparities of epidemic intensity and adopt a geographically weighted regression (GWR) model to quantify the influences of population dynamics, transportation, and social interactions on the COVID-19 epidemic. Results show that self-reported COVID-19 cases were concentrated in commercial centers and populous residential areas. Blocks with higher population density, a higher aging rate, more metro stations, more main roads, and more commercial points of interest (POIs) have a higher density of COVID-19 cases. These five explanatory variables explain 76% of the variance of self-reported cases in an OLS model. Commercial POIs have the strongest influence: a one-standard-deviation increase in commercial POIs raises COVID-19 cases by 28%. The GWR model performs better than the OLS model, with an adjusted R² of 0.96. Spatial heterogeneities of the coefficients in the GWR model show that influencing factors play different roles in different communities. We further discuss potential implications for healthy cities and urban planning for the sustainable development of cities.
The COVID-19 epidemic has spread across the world. As of September 30, 2021, there had been 230 million COVID-19 cases and 4.7 million deaths worldwide (Dong et al., 2020).
Urban residents are the primary victims of this epidemic, as more than 95% of COVID-19 cases lived in cities (HABITAT, 2020; Xu et al., 2021b). Therefore, it is of great significance to investigate the spatial disparities and determinants of the COVID-19 epidemic through an urban lens (Acuto et al., 2020). Intra-urban analysis of the COVID-19 epidemic can not only reveal how infectious diseases spread within cities but also support the building of healthy cities in the post-pandemic era (Frumkin, 2021; Megahed and Ghoneim, 2020).
Attack rates of the COVID-19 epidemic present apparent spatial heterogeneities (Das et al., 2021). Infectious diseases spread through human-to-human contact and diffuse with the mobility of urban residents. Spatial heterogeneities of infectious diseases are therefore strongly correlated with urban population dynamics, including the spatial aggregation of infected individuals, individual dispersal characteristics, social interactions, and contact patterns (Real and Biek, 2007; Sun et al., 2020). In addition, the demographic structure strongly influences the COVID-19 epidemic in space, because elderly people are more vulnerable to the virus (Team, 2020). Socioeconomic status, such as income and education, is also correlated with COVID-19 outcomes (Chang et al., 2021; Drefahl et al., 2020). Low-income people are less likely to follow social distancing and other prevention policies, which in turn leads to higher morbidity and mortality (Mena et al., 2021). The built environment and the layout of urban form are associated with COVID-19 incidence rates (Kashem et al., 2021). The physical environment, such as air quality and meteorological factors, also influences the spread of the COVID-19 epidemic (Zhang et al., 2020). However, most of these studies are at the country or provincial level, while few studies have investigated the spatio-temporal variations of the COVID-19 epidemic at the intra-urban scale (Cordes and Castro, 2020; Maroko et al., 2020).
Intra-urban studies of the COVID-19 epidemic require the locations of COVID-19 cases at a fine scale, while official statistics of the epidemic are generally released by administrative unit. It is difficult to obtain the spatial distribution of individual cases. Social media data are an alternative data source for intra-urban modeling of the COVID-19 epidemic and have been used in COVID-19 related studies (Li et al., 2020; Peng et al., 2020). Wuhan was the first city in China to report COVID-19, and it is also the most affected city in China. The local government of Wuhan published confirmed COVID-19 cases for the 13 administrative districts in the city. However, district-level data are too coarse to reveal spatial disparities of infection intensity within a city.
In this study, we collected nearly one thousand self-reported COVID-19 cases in Wuhan through a Chinese social media platform (Weibo, similar to Twitter) and used them to represent spatial disparities in attack rates of the COVID-19 epidemic within the city. We further chose five explanatory variables, namely, population density, aging rate, metro stations, main roads, and commercial points of interest (POIs), to quantify their influences on the spatial heterogeneities of the COVID-19 epidemic in Wuhan. Considering the spatial nonstationarity of self-reported COVID-19 cases and influencing factors, we adopted the geographically weighted regression (GWR) model and compared its results with those of the global ordinary least squares (OLS) model (Maiti et al., 2021; Mollalo et al., 2020; Xu et al., 2021a).
Wuhan is the capital city of Hubei Province, located in the eastern part of the Jianghan Plain and the middle reaches of the Yangtze River (Figure 1). It lies at the confluence of the Yangtze River and its largest tributary, the Han River, which divide the city into three parts (Hankou, Wuchang, and Hanyang) (Figure 1).
In 2019, more than ten million people lived in Wuhan, with a GDP of over 1.6 trillion RMB. Wuhan is the largest inland water-land-air transportation hub in Central China, and it is the only city in Central China with direct flights to five continents. Most initial COVID-19 cases reported in Wuhan had been exposed to one seafood market in Jianghan District, Wuhan (Figure 1). As of May 18, 2020, the date of the last COVID-19 case in Wuhan, there were a total of 50,340 confirmed COVID-19 cases in Wuhan, with 3,869 deaths (http://www.wuhan.gov.cn/zwgk/tzgg/202005/t20200521_1325022.shtml). The overall mortality rate among confirmed cases is 7.7% for the whole city. The most severely affected areas are Wuchang, Jiang'an, Jianghan, and Qiaokou districts on both sides of the Yangtze River. Wuchang District has the largest number of confirmed cases, reaching 7,551, and Jianghan District has the highest morbidity rate, 1.23%.
In the early stage of the COVID-19 epidemic in Wuhan, medical resources and other aspects of the response were not prepared in time. As a result, some infected or suspected cases could not be admitted to hospitals promptly. China's largest social media platform, Weibo (similar to Twitter), opened a channel for help, allowing patients who self-reported as infected to post their symptoms and onset time on the platform (Li et al., 2020; Peng et al., 2020). The information was collected, sorted, and sent to the local government to better help patients. We collected the information on self-reported COVID-19 cases from Weibo using a Python crawler. There are 910 self-reported cases in total, with records containing the age, time of onset, place of residence, date of post, and report text. Among them, 693 cases reported their specific time of illness. We filled in part of the incomplete information based on the report text.
We obtained the latitude and longitude of each location from the text address for spatial analysis and modeling. Detailed information (age, location, post time, and confirmed time) about the 910 self-reported COVID-19 cases was shared on GitHub after removing personal information (https://github.com/Inn905/COVID19_Self-reported_Data_Weibo).
We choose the following five explanatory variables to explain spatial disparities in the intensity of self-reported COVID-19 cases: population density, aging rate (over 60 years old), metro stations, main roads, and commercial points of interest (POIs). The population density and aging rate are at the street block level, from the annual population survey in 2014. Transportation is an essential medium for disease transmission. We use metro stations and the road network to represent public and private transportation, respectively. Metro stations and the main road network are from the Gaode Map (https://www.amap.com) in 2019. The metro system accounts for more than 50% of inter-city commuters in Wuhan. Commercial activities correspond to frequent contact and interaction between people within a city, with a high probability of indirect contact with strangers. We collected more than 120,000 commercial POIs from the Gaode Map, which cover shopping malls, hotels, restaurants, leisure and entertainment sites, and life service sites.
The spatial extent is the main urban area of Wuhan, with the block as the analysis unit. The average size of street blocks is 0.72 km², with the smallest block being 0.012 km² and the largest 15.8 km². There are 1107 blocks in total. A block is an area enclosed by main roads. In Chinese cities, this is a finer spatial unit than district-level administrative divisions. By using blocks as the analysis unit, we obtain more samples to reveal the spatial heterogeneity of the epidemic and the spatial differences of its influencing factors at a finer resolution.
The block-level average of the kernel density estimation (KDE, see Methods for detailed information) of self-reported COVID-19 cases is the dependent variable. Population density and aging rate (with their initial values) at the block level are two explanatory variables. For the transportation variables (metro stations and main road network) and commercial POIs, we also used the block-level KDE of each factor, giving the other three explanatory variables.
Kernel density estimation (KDE) takes each sample as the center (core) and calculates the density per unit area of the sample points within the search radius (bandwidth) to indicate the spatial distribution of densities of geographic elements. KDE has a wide range of applications in disease mapping. The mathematical expression of kernel density estimation is given in Equation (1):
$$ f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (1) $$
where K() is the kernel function, x - x_i represents the distance between the sample point x_i and the output grid location x, h is the bandwidth, that is, the radius of the circle, and n is the sample size. This study uses the kernel density estimation function in ArcGIS 10.3 software to analyze the spatial intensity of self-reported COVID-19 cases and influencing factors at the block level in Wuhan. The adaptive bandwidth is used as the search radius.
We first build the ordinary least squares (OLS) linear regression model (Equation (2)) to quantify the influencing factors of COVID-19 cases:
$$ y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon \qquad (2) $$
where y is the KDE of self-reported COVID-19 cases at blocks, x_i are the five explanatory variables, k is the number of explanatory variables, \beta_0 is a constant, \beta_i is the regression coefficient of x_i, and \varepsilon is the error term.
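As a rough illustration of the two steps described above, the following Python sketch evaluates a kernel density surface at a set of locations and then fits a global OLS model. It is not the workflow used in this study, which relied on the kernel density tool in ArcGIS 10.3 with an adaptive bandwidth. Here a fixed-bandwidth Gaussian kernel is assumed for simplicity, the two-dimensional normalisation 1/(n h^2) replaces the one-dimensional 1/(n h), and the function names (`gaussian_kde_2d`, `fit_ols`) and the synthetic data are hypothetical.

```python
import numpy as np


def gaussian_kde_2d(points, grid, h):
    """Fixed-bandwidth 2-D Gaussian KDE evaluated at each grid location.

    points: (n, 2) sample coordinates; grid: (m, 2) evaluation coordinates.
    """
    # Pairwise distances between every grid location and every sample point.
    d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    # Standard bivariate Gaussian kernel evaluated at the scaled distance d / h.
    k = np.exp(-0.5 * (d / h) ** 2) / (2.0 * np.pi)
    # In two dimensions the normalisation is 1 / (n * h^2).
    return k.sum(axis=1) / (len(points) * h ** 2)


def fit_ols(X, y):
    """Least-squares fit of y = b0 + sum_i b_i * x_i.

    Returns the coefficient vector (intercept first) and the adjusted R^2.
    """
    A = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, p = A.shape  # p = k + 1 parameters including the intercept
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, adj_r2


# Synthetic stand-ins (assumptions, not the study data).
rng = np.random.default_rng(0)
cases = rng.normal(0.0, 1.0, size=(200, 2))        # hypothetical case locations
centroids = rng.uniform(-2.0, 2.0, size=(50, 2))   # hypothetical block centroids
density = gaussian_kde_2d(cases, centroids, h=0.5)

X = rng.normal(size=(50, 5))                        # five explanatory variables
true_beta = np.array([1.0, 0.5, 0.2, 0.3, 0.1, 0.8])
y = true_beta[0] + X @ true_beta[1:]                # noise-free response
beta, adj_r2 = fit_ols(X, y)
```

With noise-free synthetic data the fit recovers `true_beta`; on real block-level data the same call would return the global coefficients and the adjusted R² that a GWR fit is later compared against.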
We use the percent change (PC) of self-reported COVID-19 cases caused by an increase of one standard deviation in each explanatory variable to compare the contributions of different explanatory variables, defined in Equation (3) (Xu et al., 2020):
$$ PC_i = \frac{\beta_i \times sd_i}{\bar{y}} \times 100\% \qquad (3) $$
where PC_i is the percent change of y caused by an increase of one standard deviation of x_i, \beta_i is the regression coefficient of x_i, sd_i is the standard deviation of x_i, and \bar{y} is the average of self-reported COVID-19 cases over all blocks.
Spatial heterogeneity is a fundamental characteristic of geographical variables, leading to spatial variation in their relationships (Fotheringham et al., 2015). The global ordinary least squares (OLS) regression model assumes that the relationships between explanatory variables and the dependent variable are spatially stationary, and so fails to capture how these relationships vary in space. A spatially varying coefficient modeling strategy is needed in geographical analysis (Murakami et al., 2019). The geographically weighted regression (GWR) model was proposed for this purpose and has been widely used in many disciplines. Its generic formulation is shown in Equation (4) (Fotheringham et al., 2002):
$$ y_i = \beta_{i0} + \sum_{j=1}^{k} \beta_{ij} X_{ij} + \varepsilon_i \qquad (4) $$
where, at block i, y_i is the averaged KDE of self-reported COVID-19 cases, \beta_{i0} is the intercept, \beta_{ij} is the j-th regression parameter, X_{ij} is the value of the j-th explanatory variable, and \varepsilon_i is a random error term. In the GWR model, closer observations have a higher influence on the estimate of the local set of coefficients than distant observations. In matrix form, the regression parameters of the GWR model at each block are given by Equation (5) (Fotheringham et al., 2002):
$$ \hat{\beta}(i) = \left(X^{T} W(i) X\right)^{-1} X^{T} W(i)\, y \qquad (5) $$
where \hat{\beta}(i) is the vector of parameter estimates for block i, W(i) is the diagonal weights matrix specified for block i, X is the matrix of explanatory variables with a first column of 1s for the intercept, and y is the vector of the dependent variable.
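The percent-change measure and the local weighted least-squares estimator described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this study, which was run with the GWmodel package in R. A Gaussian distance-decay kernel with a fixed bandwidth is assumed, and the function names (`percent_change`, `gwr_coefficients`) and the synthetic data are hypothetical.

```python
import numpy as np


def percent_change(beta_i, sd_i, y_mean):
    """Percent change of y for a one-standard-deviation increase in x_i."""
    return beta_i * sd_i / y_mean * 100.0


def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients (X^T W(i) X)^{-1} X^T W(i) y at every observation.

    Assumes a Gaussian kernel with a fixed bandwidth for the weights.
    Returns an (n, k + 1) array with the local intercept in column 0.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # first column of 1s for the intercept
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)         # diagonal of W(i)
        AtW = A.T * w                                   # same as A.T @ diag(w)
        betas[i] = np.linalg.solve(AtW @ A, AtW @ y)
    return betas


# Synthetic stand-ins (assumptions, not the study data).
rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
x = rng.normal(size=(200, 1))

# Case 1: spatially constant relationship, recovered at every location.
y_const = 2.0 + 3.0 * x[:, 0]
betas_const = gwr_coefficients(coords, x, y_const, bandwidth=2.0)

# Case 2: the slope grows from west to east, so local slopes should differ.
y_var = 1.0 + (1.0 + 0.2 * coords[:, 0]) * x[:, 0]
betas_var = gwr_coefficients(coords, x, y_var, bandwidth=1.5)
east = coords[:, 0] > 5.0

# Case 3: a very large bandwidth makes all weights nearly equal,
# so the local estimates collapse to the global OLS solution.
betas_wide = gwr_coefficients(coords, x, y_var, bandwidth=1e6)
ols_beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(200), x]), y_var, rcond=None)
```

The third case shows why the global OLS model can be read as a limiting case of GWR: as the bandwidth grows, W(i) approaches the identity matrix and Equation (5) reduces to the ordinary least-squares estimator.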
The weights matrix W(i) is calculated with a specified kernel function and a bandwidth. The Gaussian function is a widely used kernel function. The bandwidth is specified either by a fixed distance or by a fixed number of nearest neighbors, namely, a fixed bandwidth or an adaptive bandwidth, respectively. More detailed information on the kernel function and bandwidth in GWR modeling can be found in the references (Fotheringham and Oshan, 2016; Lu et al., 2017; Lu et al., 2014a; Wheeler and Tiefelsdorf, 2005). In this study, the GWR modeling was conducted using the "GWmodel" package in R (Lu et al., 2014b).
The temporal variations of self-reported COVID-19 cases are shown in Figure 2. Self-reported COVID-19 cases started to post their symptoms for help on February 3, 2020, and the number of self-reported cases increased quickly in the following days (Figure 2a). On February 5, more than 180 self-reported cases posted their symptoms on social media for help. Wuhan urgently constructed two infectious disease hospitals and also opened mobile cabin hospitals for the isolation and treatment of patients with mild symptoms. As the treatment capacity increased, self-reported cases were quickly admitted to hospitals. As a result, the number of self-reported COVID-19 cases dropped significantly, and there were fewer than 10 self-reported cases after February 15. We calculated the interval from onset to self-report for each case and plotted the histogram of these intervals.
We assess the biases of the self-reported data by comparing them with official data. We calculate the proportion of self-reported COVID-19 cases in each district in Wuhan and compare these with the proportions from the officially published data (Figure 2c, 2d). Overall, all districts are scattered around the 1:1 line, showing a general consistency of proportions between self-reported COVID-19 cases and official data.
The proportion of self-reported COVID-19 cases in highly infected districts (such as Wuchang, Qiaokou, and Jiang'an districts) is higher than the officially published proportion (Figure 2c). A chi-square test between self-reported cases and official statistics (as of May 18, 2020) gives a chi-square value of 122.55 (P < 0.001), which we interpret as indicating that the proportions among districts in the self-reported cases and official data are consistent. The average age of the self-reported COVID-19 cases is 59.8 years. We also compare the age distribution of self-reported COVID-19 cases with the officially published data (Team, 2020) (Figure 2d). The overall distributions are similar, with the peak group being 50-70 years old. However, the age distribution of self-reported cases is more biased towards the elderly.
The spatial distribution of the more than nine hundred self-reported COVID-19 cases is presented in Figure 3.
[Figure: The spatial distribution of explanatory variables and results of kernel density estimation]
Descriptive statistics of self-reported COVID-19 cases and explanatory variables at blocks are shown in Table 1. Self-reported COVID-19 cases, metro stations, main roads, and POIs are summarized from their KDE results.
The correlation matrix between self-reported COVID-19 cases and explanatory variables is shown in Figure 5. The Pearson's r between the five explanatory variables and self-reported cases at the block level ranges from 0.32 to 0.82, and all are significantly and positively correlated (P < 0.001). Among them, the Pearson's r between commercial POIs and self-reported cases is the strongest, reaching 0.82. The scatter plot between aging rate and self-reported cases is relatively dispersed, resulting in the lowest correlation. In each plot of Figure 5, the blue straight line is the regression line, shown with Pearson's r and the p value.
We take the average KDE of self-reported COVID-19 cases in a block as the dependent variable and build the global ordinary least squares (OLS) regression model with the five explanatory variables (Table 2). The adjusted R² of the OLS model is 0.76, indicating that the five explanatory variables explain 76% of the variance of self-reported COVID-19 cases at the block level. All explanatory variables are significantly (P < 0.001) and positively correlated with the dependent variable. The percent changes of self-reported COVID-19 cases for a one-standard-deviation increase in each explanatory variable (Equation (3)) are shown in Figure 6. Commercial POIs show the strongest influence on self-reported COVID-19 cases, increasing COVID-19 cases by 28% for an increase of one standard deviation. In terms of transportation, metro stations have a stronger influence on COVID-19 cases than main roads. This is consistent with public transportation carrying a higher risk of spreading infectious diseases. As for population, an increase of one standard deviation in population density and aging rate increases COVID-19 cases by 14% and 6%, respectively.
We further investigate the influences of explanatory variables on self-reported COVID-19 cases using the geographically weighted regression (GWR) method. The spatial distributions of the coefficients and local R² in the GWR model are presented in Figure 7. The adjusted R² of the GWR model is 0.96, considerably higher than that of the OLS model (0.76). The GWR model therefore explains the dependent variable more effectively, with high explanatory power and a better fit in most blocks. The corrected Akaike information criterion (AICc) of the GWR model (1305.6) is 57% lower than that of the global OLS model (AICc = 3052.5), confirming the improved performance of the GWR model.
Numerous studies investigating the COVID-19 epidemic are at large scales, such as the country, province, and state levels, while detailed exploration at smaller spatial scales is limited.
This study helps to reveal spatial disparities of COVID-19 cases and influencing factors at a fine spatial resolution in Wuhan, China. We collected more than nine hundred self-reported COVID-19 cases in Wuhan through a large Chinese social media platform (Weibo, similar to Twitter), compensating for the lack of detailed confirmed COVID-19 cases at the intra-urban scale. The proportions of self-reported cases at the district level are consistent with officially published data, and they also share a very similar age distribution with officially published data, suggesting that self-reported cases are representative enough to quantify spatial heterogeneities of the COVID-19 epidemic in Wuhan. Such social media data and other big data have great potential for applications in response to disasters and public emergencies. Nevertheless, self-reported COVID-19 cases are not final confirmed cases, and their spatial distribution may be biased relative to that of final cases.
Overall, there are obvious spatial clusters of self-reported COVID-19 cases, showing obvious spatial heterogeneities of the COVID-19 epidemic, which have been demonstrated by previous related studies (Mena et al., 2021; Yang et al., 2021). Areas with higher morbidity rates are mainly concentrated in commercial centers and populous residential areas, where there are higher population densities and a higher frequency of social interactions. The OLS model shows that population dynamics, transportation, and social interaction account for 76% of the variance in self-reported COVID-19 cases. The GWR model performs better (adjusted R² = 0.96) than the OLS model and reveals spatial disparities in the influences of explanatory variables on self-reported COVID-19 cases.
The COVID-19 epidemic asks us to rethink the city and to reflect on urban planning, construction, and governance, including city size, urban density, and community design (Batty, 2020; Moosa and Khatatbeh, 2021; Sharifi and Khavarian-Garmsir, 2020).
The core of a city is the agglomeration of people, which is reflected in two aspects, size and density. Size and density support the economic output and knowledge innovation of cities, but also provide hotbeds for the breeding and spread of infectious diseases (Bettencourt et al., 2007; Lei et al., 2021). It is especially necessary to incorporate the healthy city concept into urban planning. Firstly, in high-density cities, the focus should be on improving the accessibility of public facilities and services and on increasing urban green space and open space (Liu et al., 2021). Secondly, we should improve urban resilience and the ability of cities to deal with emergencies by improving resource self-sufficiency and reducing the ecological footprint. Finally, local realities should be taken into consideration in community design.
This study also has limitations. Many other factors may influence the spread of the COVID-19 epidemic in cities, such as socioeconomic status, occupation, urban structure, and housing quality (Hu et al., 2021; Mansour et al., 2021; Megahed and Ghoneim, 2020). In addition, these demographic and urban characteristics have complex effects on infectious diseases. For example, the enclosed community in China is usually criticized for isolating human mobility and increasing traffic congestion. However, it is the enclosed community that allowed the stay-at-home order to be implemented well in Wuhan and other Chinese cities (Huang et al., 2021). From this point of view, it would be very meaningful to analyze COVID-19 infection rates in enclosed and open communities in Wuhan.
Wuhan is the initial epicenter of the COVID-19 epidemic. This study uses self-reported COVID-19 cases in Wuhan from a Chinese social media platform (Weibo) to quantify the spatial intensity of the COVID-19 epidemic within the city. The self-reported cases share consistent properties with officially published data at the macro level but come with detailed locations.
Self-reported cases are mainly concentrated in commercial centers and populous residential areas, verifying spatial clusters of the COVID-19 epidemic there. Population dynamics, transportation, and social interactions strongly determine the spatial disparities of the COVID-19 epidemic at the block level. The GWR model characterizes local variations in the influences of the five explanatory variables on the COVID-19 epidemic and performs better than the OLS model. Our findings can inform efforts to optimize urban design, transform urban infrastructure, and create a healthy city.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Seeing COVID-19 through an urban lens
The Coronavirus crisis: What will the post-pandemic city look like?
Growth, innovation, scaling, and the pace of life in cities
Mobility network models of COVID-19 explain inequities and inform reopening
Spatial analysis of COVID-19 clusters and contextual factors in New York City
Living environment matters: Unravelling the spatial clustering of COVID-19 hotspots in Kolkata megacity
An interactive web-based dashboard to track COVID-19 in real time
A population-based cohort study of socio-demographic risk factors for COVID-19 deaths in Sweden
Geographically weighted regression: the analysis of spatially varying relationships
Geographical and Temporal Weighted Regression (GTWR)
Geographically weighted regression and multicollinearity: dispelling the myth
Environmental Health Perspectives 129, 075001.
HABITAT, U., (2020) Policy Brief: COVID-19 in an Urban World
The role of built and social environmental factors in Covid-19 transmission: A look at America's capital city
Importance of community containment measures in combating the COVID-19 epidemic: from the perspective of urban planning
Exploring the nexus between social vulnerability, built environment, and the prevalence of COVID-19: A case study of Chicago
Urban scaling in rapidly urbanising China. Urban Studies
Association of built environment attributes with the spread of COVID-19 at its initial stage in China
Associating COVID-19 Severity with Urban Factors: A Case Study of Wuhan
The impacts of the built environment on the incidence rate of COVID-19: A case study of King County
Geographically weighted regression with parameter-specific distance metrics
Geographically weighted regression with a non-Euclidean distance metric: a case study using hedonic house price data
The GWmodel R package: further topics for exploring spatial heterogeneity using geographically weighted models
Exploring spatiotemporal effects of the driving factors on COVID-19 incidences in the contiguous United States
Sociodemographic determinants of COVID-19 incidence rates in Oman: Geospatial modelling using multiscale geographically weighted regression (MGWR)
COVID-19 and Inequity: a Comparative Spatial Analysis of New York City and Chicago Hot Spots
Antivirus-built environment: Lessons learned from Covid-19 pandemic
Socioeconomic status determines COVID-19 incidence and related mortality in
GIS-based spatial modeling of COVID-19 incidence rate in the continental United States
The density paradox: Are densely-populated regions more vulnerable to Covid-19? The International Journal of Health Planning and Management
The Importance of Scale in Spatially Varying Coefficient Modeling
Exploring Urban Spatial Features
Spatial dynamics and genetics of infectious diseases on heterogeneous landscapes
The COVID-19 pandemic: Impacts on cities and major lessons for urban planning, design, and management
A spatial analysis of the COVID-19 period prevalence in U.S. counties through
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19)-China
Multicollinearity and correlation among local regression coefficients in geographically weighted regression
Geographically varying relationships between population flows from Wuhan and COVID-19 cases in Chinese cities
Lockdown Induced Night-Time Light Dynamics during the COVID-19 Epidemic in Global Megacities
Compact Urban Form and Expansion Pattern Slow Down the Decline in Urban Densities: A Global Perspective
Urban design attributes and resilience: COVID-19 evidence from
Effects of meteorological conditions and air pollution on COVID-19 transmission: Evidence from 219 Chinese cities
COVID-19: Challenges to GIS with Big Data