key: cord-0850567-36sodc87
authors: Sun, Yeran; Xie, Jing; Hu, Xuke
title: Detecting Spatial Clusters of Coronavirus Infection Across London During the Second Wave
date: 2021-08-03
journal: Appl Spat Anal Policy
DOI: 10.1007/s12061-021-09413-3
sha: 6644998c9e0de1a4f08ebceb58c31b976572bf28
doc_id: 850567
cord_uid: 36sodc87

The identification of seriously infected areas across a city, region, or country can inform policies and assist in resource allocation. Concentrations of coronavirus infection can be identified by applying cluster detection methods to coronavirus cases over space. To enhance the identification of seriously infected areas in relevant studies, this study focused on coronavirus infection by small area across a city during the second wave. Specifically, we first explored spatiotemporal patterns of new coronavirus cases. Subsequently, we detected spatial clusters of new coronavirus cases by small area. Empirically, we used London-wide small-area coronavirus infection data collected in aggregate form. Methodologically, we applied a newly developed fast Bayesian model-based detection method to new coronavirus cases by small area. As empirical evidence on the association between socioeconomic factors and coronavirus spread has been found, spatial patterns of coronavirus infection are arguably associated with socioeconomic and built environmental characteristics. Therefore, we further investigated the socioeconomic and built environmental characteristics of the clusters detected. The results show that the most significant clusters of new cases during the second wave are likely to occur around the airports, and that lower income or lower healthcare accessibility is associated with concentration of coronavirus infection across London.

To contain the spread of coronavirus, governments implemented stay-at-home policies and social distancing measures. The closure of public places (e.g., pubs, bars, and schools) and working from home help prevent the spread of coronavirus among people. The identification of seriously infected areas across a city, region, or country can inform policies and assist in resource allocation. Concentrations of coronavirus infection can be identified by applying cluster detection methods to coronavirus cases over space. Since the outbreak of the global coronavirus pandemic, georeferenced coronavirus cases have been reported on a regular basis. To protect patient privacy, coronavirus cases are not released at the individual level. Instead, they are aggregated into areal units (e.g., neighborhoods/districts, towns/cities, provinces/states, or countries) before being released. Recently, aggregate-level coronavirus cases by small area have become available in some countries or regions. Small-area coronavirus cases therefore enable researchers to better explore spatial patterns of coronavirus infection and model the spread of coronavirus over space and time. On the one hand, some studies performed spatial analysis of coronavirus cases across a country (e.g., Adekunle et al., 2020; Guliyev, 2020; Huang et al., 2020; Mollalo et al., 2020). They explored spatial variations of coronavirus infection rate in relation to socioeconomic and environmental factors across China (e.g., Guliyev, 2020; Huang et al., 2020), the United States (e.g., Mollalo et al., 2020), and Africa (e.g., Adekunle et al., 2020). City-wide studies of coronavirus cases and deaths by small area have been conducted as well (e.g., Cordes & Castro, 2020; Harris, 2020). On the other hand, some scholars modelled the dynamic spread of coronavirus according to travel patterns of people across China (Zheng et al., 2020), Italy (Gatto et al., 2020), and the U.S. (Velásquez & Lara, 2020).
As most of the relevant studies used coronavirus case data collected between February and April, when mass testing was not available, their findings are potentially biased.

Moreover, the identification of areas seriously infected by coronavirus has been conducted by some studies (Desjardins et al., 2020; Hohl et al., 2020). Hohl et al. (2020) and Desjardins et al. (2020) applied a spatio-temporal cluster detection method (i.e., Kulldorff's prospective space-time scan statistic) to county-level coronavirus infection across the U.S. To enhance the identification of seriously infected areas, we further focused on coronavirus infection by small area across a city during the second wave. Specifically, we first explored spatiotemporal patterns of new coronavirus cases. Subsequently, we detected spatial clusters of new coronavirus cases by small area. Empirically, we used London-wide small-area coronavirus infection data collected in aggregate form. The data collected since June are of higher quality than those collected before June: because mass testing has been available in London since June, the number of confirmed cases after June is more reliable than that before June. Methodologically, we applied a newly developed fast Bayesian model-based detection method (Gómez-Rubio et al., 2018) to new coronavirus cases by small area because of two advantages: it is a model-based approach that accounts for covariates, and it uses a fast approximation method (integrated nested Laplace approximation) instead of the conventional one (Markov chain Monte Carlo methods). Gómez-Rubio et al. (2018) extend previous work (e.g., Jung, 2009) to cope with new problems and provide a new way of assessing cluster significance and importance by means of more general model selection criteria. As empirical evidence on the association between socioeconomic factors and coronavirus spread has been found (e.g., Adekunle et al., 2020; Cordes & Castro, 2020), we could speculate that spatial clusters of coronavirus infection might be associated with socioeconomic and built environmental characteristics.
Therefore, we further investigated the socioeconomic and built environmental characteristics of the clusters detected. Compared to relevant studies (Desjardins et al., 2020; Hohl et al., 2020), this study makes the following new contributions: (1) the use of more reliable data collected when mass testing was available; (2) the application of a new and faster cluster detection method that can further incorporate covariates into the cluster detection; and (3) a city-wide study of coronavirus infection by small area.

The coronavirus case count by small area is available for the UK (https://coronavirus.data.gov.uk/details/cases). The data offer the monthly number of new coronavirus cases by Middle Layer Super Output Area (MSOA). In this study, the latest demographic data by MSOA are used as covariates in the cluster detection. Specifically, population by MSOA is available for 2019 (https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/middlesuperoutputareamidyearpopulationestimates), and annual household income by MSOA is available for 2015/2016 (https://data.london.gov.uk/dataset/ons-model-based-income-estimates--msoa). The locations of hospitals, police stations and schools were acquired from the Ordnance Survey (https://www.ordnancesurvey.co.uk/business-government/products/points-of-interest), while the land use data for 2018 were downloaded from Copernicus (https://land.copernicus.eu/local/urban-atlas).

In this study, we first explored spatial and temporal variations of coronavirus infection respectively.

In this study, we subsequently explored the spatial shift of highly infected areas by investigating the local spatial association between the coronavirus infection rate before the wave and the coronavirus infection rate during the wave. Accordingly, the bivariate local Moran's I test is used to quantify the local spatial association between two variables. Specifically, a positive association (a positive Moran's I value) means that high (low) values of one variable are surrounded by high (low) values of the other variable, whilst a negative association (a negative Moran's I value) means that high (low) values of one variable are surrounded by low (high) values of the other variable. The bivariate local Moran's I test was implemented in GeoDa (http://geodacenter.github.io/index.html).
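The statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the GeoDa implementation: it assumes row-standardised contiguity weights, uses z-scores of both variables, and omits the permutation-based significance test. The function names, the toy rates and the neighbour structure are all hypothetical.

```python
import math

def standardise(values):
    """Return z-scores (mean 0, population standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values must not all be identical")
    return [(v - mean) / sd for v in values]

def bivariate_local_moran(x, y, neighbours):
    """Bivariate local Moran's I per area: z_x[i] times the spatial lag of z_y.

    x, y       : values per area (e.g. infection rates in two months)
    neighbours : dict mapping area index -> list of neighbouring area indices
    Returns a list of (I_i, label) tuples; significance is NOT assessed here.
    """
    zx, zy = standardise(x), standardise(y)
    results = []
    for i in range(len(x)):
        nbrs = neighbours[i]
        # Row-standardised weights: every neighbour receives weight 1/len(nbrs).
        lag_y = sum(zy[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        own = "High" if zx[i] >= 0 else "Low"
        surrounding = "High" if lag_y >= 0 else "Low"
        results.append((zx[i] * lag_y, f"{own}-{surrounding}"))
    return results

# Hypothetical rates for four areas arranged in a line (0-1-2-3).
october = [10.0, 9.0, 2.0, 1.0]
november = [12.0, 11.0, 3.0, 2.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

results = bivariate_local_moran(october, november, neighbours)
for area, (value, label) in enumerate(results):
    print(area, round(value, 3), label)
```

In this toy example the high-rate areas sit next to each other in both months, so every local value is positive and each area is labelled as part of a 'High-High' or 'Low-Low' cluster.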

Based on the model-based approach of Jung (2009), the cluster detection model is

$$\log(\mu_{i,t}) = \log(E_{i,t}) + \gamma_j c^{(j)}_{i,t} \qquad (1)$$

where $\mu_{i,t}$ is the mean of area $i$ at time $t$, and $E_{i,t}$ is the expected number of cases in area $i$ at time $t$. $c^{(j)}_{i,t}$ is a cluster dummy variable for spatio-temporal cluster $j$, and $\gamma_j$ is the coefficient of the cluster dummy variable. Moreover, the expected number of cases $E_{i,t}$ is computed by fitting a Poisson regression (generalised linear model) with offset $\log(E_{i,t})$ on the covariates (Gómez-Rubio et al., 2018). Table 1 lists the covariates at the MSOA level, including socioeconomic characteristics (i.e., income, hospital accessibility, police station accessibility, school accessibility, population density) and built environment characteristics (i.e., land use composition and land use mix). Since a large portion of MSOAs have no hospitals or police stations, this study selected distance-based measures instead of density-based or count-based measures to quantify accessibility to hospitals and police stations. Table 2 shows the descriptive statistics for all the covariates in this study.
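As a rough numerical illustration of the log-linear model log(μ) = log(E) + γc in Eq. (1): for a single cluster dummy and fixed expected counts, the Poisson maximum-likelihood estimate of γ reduces to the log of the observed-to-expected ratio inside the cluster. The sketch below assumes this frequentist simplification (the study itself fits the model with integrated nested Laplace approximation), and the counts and function names are made up for illustration.

```python
import math

def cluster_coefficient(observed, expected, in_cluster):
    """Maximum-likelihood gamma for log(mu) = log(E) + gamma * c.

    With one dummy and no other free parameters, the estimate is
    log(sum of observed in cluster / sum of expected in cluster).
    """
    obs_in = sum(o for o, c in zip(observed, in_cluster) if c)
    exp_in = sum(e for e, c in zip(expected, in_cluster) if c)
    return math.log(obs_in / exp_in)

def poisson_loglik(observed, expected, in_cluster, gamma):
    """Poisson log-likelihood under Eq. (1), dropping the constant log(O!) terms."""
    total = 0.0
    for o, e, c in zip(observed, expected, in_cluster):
        mu = e * math.exp(gamma * c)
        total += o * math.log(mu) - mu
    return total

# Hypothetical counts for four areas; the first two form the candidate cluster.
observed = [30, 24, 10, 8]
expected = [15.0, 12.0, 10.0, 9.0]
in_cluster = [1, 1, 0, 0]

gamma_hat = cluster_coefficient(observed, expected, in_cluster)
gain = (poisson_loglik(observed, expected, in_cluster, gamma_hat)
        - poisson_loglik(observed, expected, in_cluster, 0.0))
print(gamma_hat, gain)
```

Here the cluster holds 54 observed against 27 expected cases, so the estimated coefficient is ln 2 (about 0.693), i.e. the relative risk inside the cluster is roughly doubled.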

In this study, the cluster detection was implemented in R. Specifically, the model-based cluster detection method used is supported by an R package named "DClusterm" (Gómez-Rubio et al., 2019).

We first explored spatiotemporal patterns of coronavirus infection in London. Figure 3a shows the city-level daily number of new coronavirus cases in London. According to the daily number of new confirmed cases, London experienced the first wave in April and the second wave in November. As mass testing was not available until June 2020, the number of new coronavirus cases before then was likely to be undercounted. At the same time, the testing count during the second wave is dramatically higher than that during the first wave (see Fig. 3b). Therefore, the reported number of new coronavirus cases during the second wave is more reliable than that during the first wave. From September, a number of international students who were infected arrived in London. Since most of the students live in student dormitories, the coronavirus is likely to be transmitted among students in dormitories. Students are also likely to gather in pubs and travel around the city. Not only students but also residents have been at increasing risk of being infected since the start of the new semester.

We performed the bivariate local Moran's I test of 'COVID-19 infection rate in October' and 'COVID-19 infection rate in November'. The result is shown in Fig. 5, which maps the clusters and outliers of the two variables. In Fig. 5, all the clusters and outliers are statistically significant at the 0.05 level. Clusters and outliers indicate the existence of positive and negative local spatial association respectively. Specifically, 'High-High' and 'Low-Low' represent two types of clusters, whilst 'Low-High' and 'High-Low' represent two types of outliers.

In this subsection, we detected spatial clusters of new cases across London during the second wave (i.e., November). We applied the fast Bayesian model-based cluster detection method to the 983 observations (983 MSOAs) with no covariates and after adjusting for covariates respectively. 

First of all, we implemented the model-based cluster detection method with no covariates. The standardised expected number of new cases $E_{i,t}$ was computed by fitting a Poisson regression (generalised linear model) with offset $\log(E_{i,t})$ (see Eq. (1)). As a result, 24 statistically significant clusters were detected with a p-value below 0.05. After ranking these clusters according to the DIC, the top 5 clusters are listed in Table 3 and mapped in Fig. 6. Together, those 5 clusters contain 20% of the population of London. All 5 clusters are located in Outer London rather than Inner London (central London).
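The no-covariate detection step can be sketched as a scan over candidate clusters. The following is a simplified stand-in rather than the DClusterm procedure: it assumes expected counts come from the overall case rate, that candidate clusters are built by growing a window of nearest neighbours around each area centroid up to a population cap, and that candidates are ranked by their gain in Poisson log-likelihood instead of the DIC used in the study. All names, coordinates and counts are hypothetical.

```python
import math

def expected_cases(cases, population):
    """Expected counts under a common rate (internal standardisation)."""
    rate = sum(cases) / sum(population)
    return [p * rate for p in population]

def loglik_gain(cases, expected, members):
    """Coefficient and log-likelihood gain of adding one cluster dummy."""
    obs_in = sum(cases[i] for i in members)
    exp_in = sum(expected[i] for i in members)
    if obs_in <= exp_in:  # keep only clusters of elevated risk
        return 0.0, 0.0
    gamma = math.log(obs_in / exp_in)
    return gamma, obs_in * gamma - (obs_in - exp_in)

def candidate_clusters(coords, population, max_pop_fraction):
    """Yield (centre, members): nearest-neighbour windows under a population cap."""
    total = sum(population)
    for centre, origin in enumerate(coords):
        order = sorted(range(len(coords)),
                       key=lambda j: math.dist(origin, coords[j]))
        members, pop = [], 0
        for j in order:
            pop += population[j]
            if pop > max_pop_fraction * total:
                break
            members.append(j)
            yield centre, tuple(members)

def rank_clusters(cases, population, coords, max_pop_fraction=0.5):
    """Return candidates with positive gain, best first."""
    expected = expected_cases(cases, population)
    scored = []
    for centre, members in candidate_clusters(coords, population, max_pop_fraction):
        gamma, gain = loglik_gain(cases, expected, members)
        if gain > 0:
            scored.append((gain, gamma, centre, members))
    return sorted(scored, reverse=True)

# Hypothetical data: five areas on a line, cases concentrated in areas 0 and 1.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
population = [1000, 1000, 1000, 1000, 1000]
cases = [50, 45, 12, 10, 8]

ranking = rank_clusters(cases, population, coords)
best_gain, best_gamma, best_centre, best_members = ranking[0]
print(sorted(best_members), round(best_gamma, 3))
```

In this toy example the highest-ranked candidate is the pair of adjacent high-count areas, since grouping them explains more of the excess than either area alone.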

Subsequently, we implemented the model-based cluster detection method after adjusting for covariates. The covariate-standardised expected number of coronavirus cases $E_{i,t}$ was computed by fitting a Poisson regression (generalised linear model) with offset $\log(E_{i,t})$ on one covariate: AHI (annual household income). The estimated generalised linear model (GLM) is shown in Table 4. As expected, AHI is statistically significantly and negatively associated with the observed number of coronavirus cases (the response), and Dis_Hos is statistically significantly and positively associated with it. Unexpectedly, apart from Den_Sch and LUX, the remaining covariates (i.e., Den_Pop, Res_Per, CIT_Per and Rec_Per) are statistically significantly and negatively associated with the observed number of coronavirus cases. Eventually, 28 statistically significant clusters were detected with a p-value below 0.05. After ranking these clusters according to the DIC, the top 5 clusters are listed in Table 5 and mapped in Fig. 7. The difference between the top 5 clusters detected with and without the covariate is not large. This means that the occurrence of those top 5 clusters is insufficiently explained by the covariates. Compared to the other clusters, Cluster 4 and Cluster 5 experience a dramatic decrease in size and at-risk population (i.e., a decrease of over 40%) after adjusting for covariates (see Figs. 6 and 7 as well as Tables 3 and 5). This indicates that Cluster 4 and Cluster 5 are partly attributable to the socioeconomic and built environmental characteristics.
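The covariate adjustment can be illustrated with a small Poisson regression with an offset and a single covariate, fitted here by Newton-Raphson in plain Python. This is only a sketch under stated assumptions: the covariate is a standardised income measure, the data are invented, and the function name `fit_poisson_offset` is hypothetical; in practice an established GLM routine would be used.

```python
import math

def fit_poisson_offset(cases, expected, x, iterations=50, tol=1e-12):
    """Fit log(mu_i) = log(E_i) + b0 + b1 * x_i by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(expected, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(o - m for o, m in zip(cases, mu))
        g1 = sum((o - m) * xi for o, m, xi in zip(cases, mu, x))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system information * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: four areas with equal unadjusted expected counts and
# a standardised income covariate (lower income, more observed cases).
cases = [30, 24, 18, 12]
expected = [20.0, 20.0, 20.0, 20.0]
income_z = [-1.5, -0.5, 0.5, 1.5]

b0, b1 = fit_poisson_offset(cases, expected, income_z)
adjusted = [e * math.exp(b0 + b1 * xi) for e, xi in zip(expected, income_z)]
print(round(b1, 3), [round(a, 1) for a in adjusted])
```

The fitted values play the role of covariate-adjusted expected counts: areas whose case numbers are already explained by income are then less likely to be flagged as clusters in the subsequent detection step.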

London has consistently had the highest coronavirus infection rate in the UK, whilst the UK is one of the most seriously infected countries worldwide. London is densely populated and a top destination for international travellers. In particular, after the summer most international students need to fly to London before going on to their final destinations by train, bus, or car. An increasing number of coronavirus infections were brought to London by students after the start of the fall semester in September. This is arguably one key cause of the second wave.

The top 3 clusters are insufficiently attributable to socioeconomic and built environmental characteristics. Additionally, a comparison of Figs. 1b and 7 shows that the top 5 clusters are located around the areas that had been highly infected during the first wave (April 2020). Arguably, the top 5 clusters are partly attributable to locals infected during the first wave. However, given the lack of individual-level coronavirus contact-tracing data, it is nearly impossible to assess precisely how much locals and travellers each contributed.

Although clusters of coronavirus infection during the second wave are insufficiently attributable to socioeconomic characteristics, the findings of this study suggest that socioeconomically disadvantaged areas (i.e., areas with a lower income or lower healthcare accessibility) are more likely to suffer a high risk of coronavirus or similar pandemics. Resource allocation by government should prioritise socioeconomically disadvantaged areas, and stricter measures should be implemented in those areas to reduce spatial disparities in coronavirus infection across London. In addition, effective tracking tools (e.g., coronavirus contact-tracing apps) are needed to better curb the spread of coronavirus in those areas.

In this study, we detected clusters of new coronavirus cases across London during the second wave. We applied a fast Bayesian model-based cluster detection method to the small-area number of new cases in November. As a result, the most significant clusters of new cases during the second wave are likely to occur in low-income areas with a low level of hospital access or a low level of police station access around the airports. The empirical study suggests a policy implication: socioeconomically disadvantaged areas are more likely to suffer a high risk of coronavirus or similar pandemics. In addition, the fast Bayesian model-based detection method is efficient and robust.

However, this study has some limitations. Firstly, we did not undertake spatiotemporal cluster detection due to data sparsity. Secondly, we did not compare the highly infected areas between November and April (the first wave). Because mass testing has been available only since June, the number of confirmed cases is not appropriately comparable between November (mass testing available) and April (mass testing unavailable), although it would be of much interest to compare the concentration of coronavirus infection between the two waves. Thirdly, in explaining the occurrence of coronavirus clusters, we took no account of mobility patterns, such as the number of daily trips or time spent out of home, due to the lack of data. Fourthly, given the lack of accurate categorisation in the POI data used, schools were not further classified into primary schools, secondary schools, universities/colleges and others. The association between schools and coronavirus infection might differ between primary schools and universities/colleges, since the latter have a substantially higher proportion of students who had travelled internationally. Finally, unexpectedly, this study found that all three main land use categories (i.e., residential land, recreational land, and commercial, industrial & transportation land) are likely to curb coronavirus infection, and that land use mix is not statistically significantly associated with coronavirus infection.

In the future, we will improve this study by addressing those limitations. Firstly, we would undertake spatiotemporal cluster detection at a lower geography level to address the data sparsity issue. Secondly, we would compare spatial clusters of coronavirus infection between the two waves once the number of coronavirus cases recorded before mass testing became available can be adjusted through modelling. The results would be compared with those in this paper to discuss the influence of the time gap in some data on the model estimation results. Thirdly, we would attempt to acquire mobility data from social media or mobile phone data, and investigate how mobility patterns contribute to the clusters of coronavirus infection. Fourthly, we would focus more on accessibility to universities/colleges than on accessibility to schools in general once the locations of accurately categorised schools can be acquired. Finally, more studies are needed to further investigate the association between land use characteristics and coronavirus infection.

Modelling spatial variations of coronavirus disease (COVID-19) in Africa

Spatial analysis of COVID-19 clusters and contextual factors in New York City

Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters

Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures

Fast Bayesian classification for disease mapping and the detection of disease clusters

DClusterm: Model-based detection of disease clusters

Determining the spatial effects of COVID-19 using the spatial panel data model

Exploring the neighbourhood-level correlates of Covid-19 deaths in London using a difference across spatial boundaries method

Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States

Spatial-temporal distribution of COVID-19 in China and its prediction: A data-driven modeling analysis

A generalized linear models approach to spatial scan statistics for covariate adjustment

GIS-based spatial modeling of COVID-19 incidence rate in the continental United States

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations

Forecast and evaluation of COVID-19 spreading in USA with Reduced-space Gaussian process regression

Spatial transmission of COVID-19 via public and private transportation in China

The authors would like to thank the anonymous reviewers.