key: cord-0450831-gucrtnqx
authors: Yucel, Shiv G.; Pereira, Rafael H. M.; Peixoto, Pedro S.; Camargo, Chico Q.
title: Impact of Network Centrality and Income on Slowing Infection Spread after Outbreaks
date: 2022-02-08
journal: nan
DOI: nan
sha: 28117b085016ba3ed09b6c987f634deb97efd6c7
doc_id: 450831
cord_uid: gucrtnqx

The COVID-19 pandemic has shed light on how the spread of infectious diseases worldwide is shaped by both human mobility networks and socio-economic factors. Few studies, however, have examined the interaction of mobility networks with socio-spatial inequalities to understand the spread of infection. We introduce a novel methodology, called the Infection Delay Model, to calculate how the arrival time of an infection varies geographically, considering both effective distance-based metrics and differences in regions' capacity to isolate, a feature associated with socioeconomic inequalities. To illustrate an application of the Infection Delay Model, this paper integrates household travel survey data with cell phone mobility data from the São Paulo metropolitan region to assess the effectiveness of lockdowns to slow the spread of COVID-19. Rather than operating under the assumption that the next pandemic will begin in the same region as the last, the model estimates infection delays under every possible outbreak scenario, allowing for generalizable insights into the effectiveness of interventions to delay a region's first case. The model sheds light on how the effectiveness of lockdowns to slow the spread of disease is influenced by the interaction of mobility networks and socio-economic levels. We find that a negative relationship emerges between network centrality and the infection delay after lockdown, irrespective of income. Furthermore, for regions across all income and centrality levels, outbreaks starting in less central locations were more effectively slowed by a lockdown. Using the Infection Delay Model, this paper identifies and quantifies a new dimension of disease risk faced by those most central in a mobility network.

Since the start of the COVID-19 pandemic, an active literature has evolved to study the spread and dynamics of the disease from mobility networks [1, 2, 3, 4] or socio-spatial perspectives [5, 6, 7]. However, very few studies look at how both socio-economic conditions and network properties interact, and how they influence outbreaks together. Further, while extensive work has been done to model the spread of the virus and non-pharmaceutical intervention effectiveness in terms of cases, hospitalizations, and deaths [6, 7, 8, 9, 10, 11], there is a lack of emphasis on the timing of case spread, and how interventions can delay a region's first infection. Given the spatio-temporal granularity of cell phone mobility data capturing responses to lockdown policies, it is now possible to develop generalized, preventative methodologies which seek to further our understanding of disease vulnerability, and better prepare for novel outbreaks or variants. This paper develops the Infection Delay Model (IDM), a novel effective distance-based methodology that can be used for assessing how lockdowns can delay a region's first case and their intersection with socio-economic inequalities. The IDM captures the difference between disease arrival times with and without a lockdown, using a novel application of cellphone mobility data for effective distance research. To develop a forward-looking understanding of the impacts of interventions on the timing of disease spread, a use-case of the IDM is presented which considers the potential variability of future outbreak scenarios. Drawing from recent studies of network-driven contagion phenomena [12, 13, 14], we simulate epidemics from every node in the transport network. By connecting those simulations with socio-economic data, generalizable insights are uncovered which can be applicable beyond the specific spreading patterns observed during COVID-19.
This paper uses the Metropolitan Region of São Paulo (MRSP) as a case study to apply the IDM. Given its unique position as an area of early disease introduction and high intrastate transmission, COVID-19 studies in the MRSP can help with preparation for future variants of COVID-19 or other pandemics [1, 15] .

One branch of literature on COVID-19 has focused on mobility networks to model the spread of the disease and assess the risks of cases and deaths. The data sources used to generate such networks range from domestic and international flight records [1, 16], to cell phone mobility records and geo-located visits to places of interest [2, 3, 17, 18]. The varying spatio-temporal granularity of the data sources used in these analyses has led to diverse outputs to identify regions at risk and explore how non-pharmaceutical interventions (NPIs) such as lockdowns impact mobility and vulnerability.

This area of literature uses transport flows to construct aggregated networks of population movement. Various methods have been implemented to study COVID-19 risks on these mobility networks. Effective distance-based studies calculate the 'distance' between two regions based on the degree of mobility flows between them: more connected regions are effectively 'closer' [1]. The effective distance of a region from an outbreak location has been shown to be predictive of infection arrival times [12, 13]. Other studies build compartmental models on top of the mobility networks, calibrated to regional epidemic trajectories, and use epidemiological parameters and outbreak locations to simulate the course of an epidemic [3, 14, 2, 17]. As greater mobility and person-to-person contact are associated with transmission, epidemic simulations can be run on mobility networks with adjusted levels of mobility or contact patterns to explore the impacts of real or hypothetical interventions on health outcomes [4, 17].

A separate branch of literature on COVID-19 has focussed on disease vulnerability and its intersection with existing socio-spatial inequalities. The range of analyses includes studies on how socioeconomic levels are associated with differences in terms of cases, hospitalizations, and deaths [6, 7] , health care facility access [19, 20] and inequalities in NPI adherence [6, 5, 21, 22] . These spatial analyses often seek to uncover how variables such as race and income relate to COVID-19 risks, to identify how existing inequalities are being compounded by the ongoing pandemic.

In an analysis of hospitalization and deaths in São Paulo, it was found that black and pardo Brazilians were more likely to be hospitalized and die of COVID-19 [6]. Similarly, an analysis of clusters and contextual factors of COVID-19 in New York City found that regions with larger black populations without health insurance had higher positive testing rates [7]. Cell phone mobility data has also been used to study the interaction of lockdown adherence and socio-economic inequalities. Conceptualizing mobility restrictions as a luxury not everyone can afford, it has been found that more vulnerable individuals were less able to reduce their mobility, potentially due to a lower probability of furlough or teleworking opportunities [5, 6].

The first contribution of this paper is the integrated analysis of mobility networks and socio-economic characteristics to measure disease risk. While the socio-spatial branch of literature has consistently identified intersections between socio-economic vulnerability and disease burdens, current network-based studies either omit socioeconomic data, or include it to identify at-risk regions based on network-based results [1] . There is a lack of investigation into the interaction between network properties and socio-economic factors, and how they jointly impact the distribution of disease risk. It cannot be assumed that features such as network centrality and income are proxies for each other, justifying an investigation which explicitly examines both.

The second contribution of this study is its focus on the ability of lockdown restrictions to slow the spread of disease. Existing network-based and socio-spatial research on cases, hospitalizations, and deaths fails to measure a crucial goal of early lockdowns: slowing the spread of disease and delaying the time until a region's first case. Delaying disease onset with early interventions can buy time for health systems to increase hospital and intensive care capacity, and establish rapid testing sites [23]. This unexplored dimension of disease risk is investigated using the Infection Delay Model, an effective distance-based method of calculating disease arrival times under baseline and lockdown mobility scenarios. Current literature which explores rankings of disease arrivals using effective distances does so while assuming a single known outbreak location [12, 13], or including a small subset of potential outbreak locations [1]. These studies also overlook how differences in social isolation levels across regions are shaped by socioeconomic inequalities. Given recent literature on the outsized influence of the outbreak region on the trajectory of a communicable disease [24], this study simulates outbreaks beginning in every region of the MRSP, to allow for generalizable findings that do not assume that the next outbreak will begin in the same region as the last.

Cellphone Mobility Data

Through an agreement with InLoco [25], a Brazilian cellphone analytics company now known as Incognia, this paper had access to daily isolation levels for the MRSP from March 1, 2020 to April 19, 2020. These data come spatially aggregated on a hexagonal grid using the H3 index at resolution 8 [26]. The data set contains 2893 hexagonal cells of roughly 740 m² across the MRSP, of which 2599 had suitable time frames and auxiliary data after interpolation to be used in the analysis. The hexagonal isolation data is openly available in a data repository (see Availability of Data and Materials section). InLoco/Incognia gathers data by partnering with mobile phone applications, and uses software development kits to harvest location data while individuals are using the partnered app [2]. This form of location gathering provides precise geo-coordinates, which are anonymized and aggregated to develop the social-isolation indices. For a given hexagon cell, the proportion of individuals who reside in the cell and stay within it on a given day is recorded. This proportional value is used as a proxy for social isolation [6], recording the extent to which individuals travel outside their residence area. Higher or lower social isolation values indicate that fewer or more individuals are leaving their residence area, respectively [18]. The distribution of social isolation hexagon cells is presented in Figure 1. The uneven coverage of the MRSP hexagon cells is a feature of the data set provided by InLoco/Incognia, discussed in the limitations section.

The travel survey data for the MRSP were gathered from the 2017 MRSP household travel survey, conducted by the São Paulo Metropolitan Transportation Department between June 2017 and October 2018 [27]. The original data set is a table of survey responses regarding the total daily trips of 86,318 individuals who reside in the MRSP. On average, each individual reports 2.12 daily journeys, leading to a total of 182,994 trip reports [27]. The key information in each report is the journey origin and destination, along with the travel time. The interviews were conducted across 39 municipalities within the MRSP, divided into 510 research zones for the purposes of the survey. Of all the research zones, 66% lie within the main municipality in the MRSP, São Paulo. The survey was designed to be statistically representative across the MRSP, and includes journey and population weights to scale responses by their frequency in the true population. From these weights, the total 2017 mobility flows between travel survey zones and 2017 estimates of populations were calculated. Population levels in 2020 were estimated by determining the geometric growth rate from 2010 and 2019 population totals, and scaling the 2017 populations [28, 29].

This study uses socio-economic data from the official 2010 Brazilian Census, focused on the census tracts within the MRSP [30, 31]. Within the state of São Paulo, there are 68,296 tracts, with data included on the total population, racial aggregates, average income per capita (Brazilian Real per calendar month), functioning water networks, and other relevant socio-economic features. The census tracts within São Paulo state cover a larger area than both the cellphone mobility hexagon cells and travel survey zones, which are primarily focused on the MRSP. The population data from the MRSP travel survey is more up to date than the 2010 census, and is therefore used in favour of the census population totals. The census data remains useful for calculating regional income per capita averages, which are interpolated from census tracts into the hexagon cells.

Interpolating Data to Hexagon-Level

While the social isolation hexagon cells provide spatially and temporally granular information on the daily proportion of residents leaving a given area, information on which population subgroups are included in each hexagon cell remains unknown. This problem is shared across the growing body of literature using cellphone mobility data for public health purposes, where anonymity measures by cellphone data providers obscure information on the sample [32]. While fundamental selection biases in the mobile phone data are a persistent issue, discussed in the limitations section, traditional data sources can be leveraged to generate population estimates within the hexagons [33].

The census tracts and travel survey zones have differing spatial structures, which must be mapped to the social isolation hexagon cells. This process, known as spatial interpolation, is used in geospatial studies to estimate values in unknown area units using values in known geographic units [34]. The spatial interpolation method used in this analysis is known as areal weighting, which integrates socio-economic estimates based on proportional overlap [34]. This method depends on the assumption of homogeneously distributed characteristics within census tracts and travel survey zones, but benefits from transparency and simplicity relative to interpolation methods which rely on auxiliary information [34]. Each hexagon cell's overlap with census tracts and travel survey zones was determined relative to their total areas. This proportional overlap area was used to generate a weighted allocation for income and population levels. For example, if a hexagon cell covered 50% of a travel survey zone with a population of 20, the hexagon cell would be assigned 10 individuals. Figure 1 geographically displays the interpolated populations across the hexagon cells in the MRSP, and Figure 2 and

To interpolate the 2017 travel survey network to the hexagonal cells, the homogeneity assumptions of areal weighting are extended to mobility flows between travel survey zones [35]. It is assumed that a hexagon cell overlapping with a given origin zone has a proportional quantity of outflow to all its targets. Similarly, hexagon cells overlapping with a given destination zone receive inflow from all relevant origin zones proportional to their intersection with that destination zone. An illustration of the travel flow interpolation to hexagon cells is provided in Figure 3.
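The population allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout, and the cell and zone identifiers are hypothetical, and the overlap fractions are assumed to have been computed beforehand with a GIS tool.

```python
def areal_weighted_population(overlaps, zone_population):
    """Allocate zone populations to hexagon cells by proportional area overlap.

    overlaps: {(hex_id, zone_id): fraction of the zone's area inside the hexagon}
    zone_population: {zone_id: population of the zone}
    Assumes people are spread homogeneously within each zone.
    """
    hex_population = {}
    for (hex_id, zone_id), share in overlaps.items():
        allocated = share * zone_population[zone_id]
        hex_population[hex_id] = hex_population.get(hex_id, 0.0) + allocated
    return hex_population


# The example from the text: a cell covering 50% of a zone with 20 residents.
overlaps = {("hex_1", "zone_a"): 0.5, ("hex_2", "zone_a"): 0.5}
zone_population = {"zone_a": 20}
hex_population = areal_weighted_population(overlaps, zone_population)
print(hex_population["hex_1"])  # 10.0
```

Population is a count, so shares of it are summed. An average such as income per capita would instead need an overlap-weighted mean, which this sketch does not cover.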

Figure 3 Travel survey data interpolation strategy from travel zones to hexagon cells. Outflow and inflow are proportional to a hexagon cell's overlap with a travel zone. To estimate travel patterns within a given hexagon, known inflow and outflow between travel zones A and B are proportionally allocated based on the overlap. For example, Hexagon X overlaps with 12% of Region A, and therefore 12% of Region A's outflow is assigned to Hexagon X. Hexagon Y overlaps with 25% of Region B, and therefore 25% of the 12% outflow is assigned to Hexagon Y as inflow.
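The flow allocation in Figure 3 amounts to multiplying each zone-to-zone flow by the origin overlap share and the destination overlap share. The sketch below illustrates this under the stated homogeneity assumption. The function name, the data layout, and the figure of 1000 trips are hypothetical and chosen only to make the arithmetic visible.

```python
def interpolate_flows(zone_flows, origin_share, dest_share):
    """Spread zone-to-zone trips over hexagon pairs by overlap shares.

    zone_flows: {(origin_zone, dest_zone): trips}
    origin_share / dest_share: {zone: {hex_id: fraction of the zone covered by the hexagon}}
    """
    hex_flows = {}
    for (zone_o, zone_d), trips in zone_flows.items():
        for hex_o, share_o in origin_share.get(zone_o, {}).items():
            for hex_d, share_d in dest_share.get(zone_d, {}).items():
                key = (hex_o, hex_d)
                hex_flows[key] = hex_flows.get(key, 0.0) + trips * share_o * share_d
    return hex_flows


# Hexagon X covers 12% of Region A, Hexagon Y covers 25% of Region B.
zone_flows = {("A", "B"): 1000.0}  # hypothetical number of trips from A to B
shares = {"A": {"X": 0.12}, "B": {"Y": 0.25}}
hex_flows = interpolate_flows(zone_flows, shares, shares)
print(hex_flows[("X", "Y")])  # roughly 30 trips: 1000 * 0.12 * 0.25
```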

Based on the interpolated mobility network, the in-degree centrality of each hexagon cell is calculated. In-degree centrality is the number of edges that directly flow into a cell, representing the diversity of inflow connections, which is associated with a region's time to infection [36, 37]. The weighted in-degree (total travellers in) and weighted out-degree (total travellers out) are also highly correlated with the in-degree, and have been shown to influence the spread of disease [38]. The distribution of in-degree centrality in the hexagon-interpolated mobility network is presented in Figure 4 and Table .
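As an illustration of these centrality measures, the sketch below counts incoming edges and sums incoming travellers from a flow dictionary. The data layout is hypothetical, and skipping self-loops and zero flows is an assumption made here, not a detail given in the text.

```python
def in_degree_centrality(flows):
    """Return (in_degree, weighted_in_degree) for each destination cell.

    flows: {(origin, destination): travellers}
    In-degree counts distinct origins with a positive flow into the cell;
    weighted in-degree sums the travellers arriving in the cell.
    """
    in_degree = {}
    weighted_in = {}
    for (origin, dest), travellers in flows.items():
        if origin == dest or travellers <= 0:
            continue  # assumption: ignore self-loops and empty edges
        in_degree[dest] = in_degree.get(dest, 0) + 1
        weighted_in[dest] = weighted_in.get(dest, 0.0) + travellers
    return in_degree, weighted_in


flows = {("a", "c"): 5.0, ("b", "c"): 15.0, ("c", "a"): 2.0, ("c", "c"): 40.0}
in_degree, weighted_in = in_degree_centrality(flows)
print(in_degree["c"], weighted_in["c"])  # 2 20.0
```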

The inputs to the Infection Delay Model are the effective distances between all pairs of hexagon cells, calculated under baseline and intervention mobility scenarios, the baseline being the scenario with no mobility restrictions, i.e. no form of lockdown. These effective distances are translated into two sets of infection arrival times, whose differences represent the 'infection delay' of an intervention. Subsequently, an algorithm is developed to compute the infection delays of an intervention for a given region over the course of an outbreak, outputting a time series plot defined as the infection delay curve.

To calculate the effective distances for the hexagon-scaled mobility network, this analysis uses the 'dominant path' effective distance, a metric used in numerous disease arrival time analyses [13, 12, 1, 39], implemented in Python by [13]. Measures of the dominant path effective distance focus solely on the most probable path of transmission from hexagon i to j. To calculate this value, for every connected origin i and destination j in the network, we calculate the transition rate as the proportion of total travellers beginning in hexagon i who arrive in hexagon j, denoted as $0 \le P_{ij} \le 1$ [12]. The effective distance between hexagons i and j is calculated as:

$$d_{ij} = 1 - \ln P_{ij},$$

which is used as an edge weight for every pair of i, j hexagons, or nodes in the mobility network [12]. These edge weights, greater than or equal to one, are used in a weighted shortest path analysis to determine the dominant path effective distance between every pair of hexagon cells. With $d_{ij}$ calculated for all edges in the network, the dominant path between i and j is chosen as the path which minimizes the sum of effective distance edge weights between them. Finally, the dominant path effective distance between two hexagons ($D_{ij}$) is calculated as the sum of the effective distances along the determined shortest path. This basic dominant path effective distance can be used to detect rankings of arrival times for a given outbreak location [12].
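The weighted shortest path step can be illustrated with Dijkstra's algorithm from the Python standard library. This sketch assumes the edge weight takes the form $d_{ij} = 1 - \ln P_{ij}$, which matches the statement that the weights are greater than or equal to one. It is not the implementation from [13]; the function names and the toy network are hypothetical.

```python
import heapq
import math


def transition_probabilities(flows):
    """P_ij: share of travellers starting in i who arrive in j."""
    out_total = {}
    for (i, _j), travellers in flows.items():
        out_total[i] = out_total.get(i, 0.0) + travellers
    return {(i, j): t / out_total[i] for (i, j), t in flows.items() if t > 0}


def dominant_path_distances(flows, source):
    """Dijkstra over edge weights 1 - ln(P_ij); returns {node: D_source,node}."""
    neighbours = {}
    for (i, j), p in transition_probabilities(flows).items():
        neighbours.setdefault(i, []).append((j, 1.0 - math.log(p)))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in neighbours.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nxt, math.inf):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


flows = {("a", "b"): 90.0, ("a", "c"): 10.0, ("b", "c"): 100.0, ("c", "a"): 100.0}
dist = dominant_path_distances(flows, "a")
print(round(dist["c"], 3))  # 2.105: the path a -> b -> c beats the direct edge a -> c
```

In the toy network the direct edge from a to c carries only 10% of a's travellers, so the two-step route through b is the dominant path even though it has more hops.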

The traditional dominant path effective distance model is solely based on the mobility network, captured by $P_{ij}$, and does not have parameters which can incorporate changing epidemiological parameters or rates of mobility reduction in the network. To add epidemiological and mobility-based parameters, useful for a comparative analysis, the effective distance formula is altered to

$$d_{ij} = \ln\left(\frac{\beta - \mu}{\kappa}\right) - \lambda - \ln P_{ij},$$

shown in [13], where β and µ are the infection and recovery rates. The mobility compound parameter κ, representing the proportion of the circulating population, is altered to incorporate mobility reductions, given by $\kappa = (1 - \text{mobility reduction}/100)\,\kappa_0$, where the mobility reduction goes from 0 to 100%. In this compound parameter, $\kappa_0$ is the mobility rate, chosen to be 10%, which also ensures the logarithm is positive after the subtraction of λ, the Euler-Mascheroni constant [13]. As $\kappa_0$ is constant between the baseline and intervention scenarios, its value does not impact the infection delay value when the differences in arrival times are calculated between the two. The reproductive number $R_0$ is chosen to be 2.9, based on an epidemiological characterisation of the MRSP early in the pandemic [40]. The infectious period is chosen to be 9.2 days, from a mathematical analysis of COVID-19 in Brazil [41]. The infection rate is thus $\beta = R_0/\text{infectious period} = 2.9/9.2$, and the recovery rate is given by $\mu = 1/\text{infectious period} = 1/9.2$. It is important to note that the transition rate $P_{ij}$ calculation is unaltered from the traditional model. As the mobility compound parameter κ rises, $d_{ij}$ decreases, indicating that i and j are effectively closer. Similarly to the traditional model, for every potential outbreak and target hexagon cell in the network, the dominant path effective distance is generated from the weighted shortest path analysis, generating a 2599 × 2599 matrix of effective distances. This method is able to calculate effective distances between hexagons irrespective of whether they are directly or indirectly connected.
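The parameter choices above can be checked numerically. The sketch below assumes the edge distance has the form $d_{ij} = \ln((\beta - \mu)/\kappa) - \lambda - \ln P_{ij}$, with λ the Euler-Mascheroni constant and $\kappa = (1 - \text{mobility reduction}/100)\,\kappa_0$. The function names and the example transition rate of 0.2 are hypothetical.

```python
import math

EULER_MASCHERONI = 0.5772156649015329


def epidemic_rates(r0=2.9, infectious_period=9.2):
    """Infection rate beta and recovery rate mu from R0 and the infectious period."""
    beta = r0 / infectious_period
    mu = 1.0 / infectious_period
    return beta, mu


def edge_distance(p_ij, mobility_reduction=0.0, kappa_0=0.10,
                  r0=2.9, infectious_period=9.2):
    """Assumed parameterised effective distance of a single edge i -> j."""
    if not 0.0 <= mobility_reduction < 100.0:
        raise ValueError("mobility_reduction must be in [0, 100)")
    beta, mu = epidemic_rates(r0, infectious_period)
    kappa = (1.0 - mobility_reduction / 100.0) * kappa_0
    return math.log((beta - mu) / kappa) - EULER_MASCHERONI - math.log(p_ij)


baseline = edge_distance(0.2)                           # no lockdown
lockdown = edge_distance(0.2, mobility_reduction=30.0)  # 30% less mobility
print(round(baseline, 3), round(lockdown, 3))
```

With these values, $\ln((\beta - \mu)/\kappa_0) \approx 0.725$, which stays positive after subtracting the constant, and a lower κ gives a larger distance. A reduction of exactly 100% is rejected in this sketch because κ would be zero.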

Two 2599 × 2599 matrices of effective distances are calculated for every potential origin i and destination j, under the following mobility flow scenarios:

1 No mobility reduction (baseline scenario)
2 Reduction in mobility based on hexagonal isolation changes

The first scenario assumes no interventions, where arrival times are calculated using the baseline travel pattern information (mobility reduction = 0). The second scenario assumes that hexagon cells reduce their mobility by the same amount as observed during the first wave of the pandemic, through leveraging the cellphone social isolation information. To determine the extent of the mobility reduction for each region, the marginal change in social isolation from pre-lockdown to post-lockdown is calculated. The initial isolation value for each hexagon is calculated as the mean across March 1 to March 15, the two weeks leading up to the MRSP's lockdown [42]. The lockdown isolation value for each hexagon is calculated as the mean from March 16 to March 30, 2020, capturing the initial regional responses to lockdown measures. After determining the marginal change in real isolation for each origin hexagon, the effective distance calculation becomes:

$$d_{ij} = \ln\left(\frac{\beta - \mu}{\kappa_i^{\text{mobility reduction}}}\right) - \lambda - \ln P_{ij}, \qquad \kappa_i^{\text{mobility reduction}} = \left(1 - \frac{\text{mobility reduction}_i}{100}\right)\kappa_0$$

This representation of effective distance is used to approximate how rapidly a disease would spread from hexagon i to j given the observed change in pandemic isolation for region i. The adjustment of the compound κ term to $\kappa_i^{\text{mobility reduction}}$ is a novel contribution of the study, allowing the analysis to capture heterogeneous changes in mobility based on cellphone mobility data, known to intersect with socioeconomic vulnerability in the MRSP [6].

To generate an estimation of arrival times based on the effective distances, this paper employs the methods used in [13], dividing the effective distance by the effective velocity, defined as $V_{EF} \approx \beta - \mu$, where β is the infection rate and µ is the recovery rate. The time for a disease to arrive at location j from location i, including both the dominant path effective distance $D_{ij}$ (the sum of effective distances along the shortest path from i to j) and the velocity, is thus:

$$T_{ij} = \frac{D_{ij}}{V_{EF}} \approx \frac{D_{ij}}{\beta - \mu}$$

Having generated the arrival times under both scenarios for every i, j combination, the infection delay by an intervention for an introductory case arriving from origin i to destination j is calculated as:

$$ID_{ij} = T^{\text{intervention}}_{ij} - T^{\text{baseline}}_{ij}$$

The infection delay ($ID_{ij}$) values are calculated for every pair of hexagon cells, generating a 2599 × 2599 matrix where each i, j value represents the additional time to a case arriving from i to j given a reduction in traffic proportional to i's real mobility change.
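For intuition, the infection delay of a single direct edge can be worked out by hand. The derivation below is a sketch under three assumptions: the edge distance has the form $d_{ij} = \ln((\beta - \mu)/\kappa) - \lambda - \ln P_{ij}$, the arrival time is the distance divided by $\beta - \mu$, and the dominant path from i to j is the same single edge in both scenarios. The symbol $r_i$ (the percentage mobility reduction of origin i) is introduced here for brevity.

```latex
\begin{align*}
ID_{ij} &= T^{\text{intervention}}_{ij} - T^{\text{baseline}}_{ij}
         = \frac{d^{\text{intervention}}_{ij} - d^{\text{baseline}}_{ij}}{\beta - \mu} \\
        &= \frac{1}{\beta - \mu}\left[\ln\frac{\beta - \mu}{(1 - r_i/100)\,\kappa_0}
           - \ln\frac{\beta - \mu}{\kappa_0}\right]
         = \frac{-\ln(1 - r_i/100)}{\beta - \mu} \\
% Worked example: r_i = 30, beta - mu = (2.9 - 1)/9.2
ID_{ij} &\approx \frac{0.357}{0.2065} \approx 1.73 \text{ days for } r_i = 30
\end{align*}
```

In this single-edge case both $\kappa_0$ and $P_{ij}$ cancel, so the delay depends only on the origin's mobility reduction and on $\beta - \mu$. On multi-hop paths the reductions of every origin along the path contribute, and the dominant path itself may differ between scenarios.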

Using known changes in mobility to understand intervention effectiveness takes into account the inequality in regional responses, and allows intervention scenarios to mimic the real capacities of hexagon cells to isolate and adhere to policy guidelines. Having the arrival times in $T^{\text{intervention}}_{ij}$ reflect the real mobility changes allows for an infection delay analysis which better captures the lived experience of each of the 2599 hexagon cells in determining the relative benefits from early interventions.

From the MRSP's first case of COVID-19 to its widespread presence, this analysis determines the time 'added' until a region's first case (infection delay) by an intervention at every hypothetical time t, assuming no intervention before t. At time t = 0, only the initial outbreak location $i_0$ has the disease, and each hexagon's infection delay by an intervention is $ID^{0}_{i_0 j}$, representing the change in intervention arrival time relative to the baseline arrival time from $i_0$ to j. For every t ≥ 1, each hexagon cell's infection delay value is determined based on the currently infected regions. To calculate this value, for every hexagon cell j and discrete time step t, the following algorithm is developed:

1 Determine all infected hexagon cells at time t
2 Determine the infection delay of an intervention across all currently infected hexagon cells relative to destination j
3 Select the minimum infection delay value

Following this algorithm, the IDM generates a time-series infection delay curve. An example plot is presented in Figure 5, for a given hexagon A and outbreak location B. There are two primary factors that interact to create the structure of the infection delay curve: (1) the effective distance of infected hexagon cells to the hexagon cell of interest; (2) the degree of mobility reduction of infected hexagon cells. The outbreaks used in this analysis are simulations calculated from a compartmental epidemiological model.

Figure 5 Hypothetical infection delay curve for region-at-risk A caused by a lockdown, following an outbreak beginning in region B. At time t = 0, location B would be the only infected region, as the outbreak location. At this time, a lockdown would allow region A to gain approximately 6.6 days (y-axis) until its first case of COVID-19. If the disease were to spread unmitigated until time t = 30 days, a lockdown would provide a gain of only 2 days before region A's first case. At the 40-day mark following an outbreak in region B, without any intervention, region A would already be infected. Thus, a lockdown intervention at this point would have no ability to delay the onset of infection, with a y-axis value of 0.
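The three steps above translate directly into a short loop. The sketch below is illustrative only: the function name and data layout are hypothetical, the infection times would come from the outbreak simulation, and returning 0 once the target is itself infected follows the description of the hypothetical curve for region A.

```python
def infection_delay_curve(target, infection_time, delay, horizon):
    """Infection delay of an intervention at each time step t = 0..horizon.

    infection_time: {cell: time step at which the cell becomes infected}
    delay: {(i, j): infection delay ID_ij for a case travelling from i to j}
    """
    curve = []
    for t in range(horizon + 1):
        # Step 1: all cells infected at time t (the outbreak location has time 0).
        infected = [c for c, t_inf in infection_time.items() if t_inf <= t]
        if target in infected:
            curve.append(0.0)  # already infected: nothing left to delay
            continue
        # Steps 2 and 3: delays from every infected cell, then the minimum.
        curve.append(min(delay[(i, target)] for i in infected))
    return curve


# Toy outbreak starting in B, reaching C at t = 2 and A at t = 4.
infection_time = {"B": 0, "C": 2, "A": 4}
delay = {("B", "A"): 6.6, ("C", "A"): 2.0}
curve = infection_delay_curve("A", infection_time, delay, horizon=5)
print(curve)  # [6.6, 6.6, 2.0, 2.0, 0.0, 0.0]
```

The curve can only stay flat or drop as more cells become infected, because the minimum is taken over a growing set.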

To generalize the findings of the infection delay analysis to outbreak scenarios other than those observed during COVID-19, epidemic outbreaks are simulated beginning in each of the 2599 hexagons in the MRSP. This paper employs the commuter-oriented susceptible-infected-removed (SIR) model used in [43] , on GitHub as EpiCommute. While the original model is used to simulate the spread of COVID-19 in 401 German counties, this analysis uses the 2599 social isolation hexagons, providing their interpolated populations and mobility flows.

For each outbreak scenario, the calculated arrival times are used in conjunction with the IDM to generate infection delay curves for every hexagon cell. The end result is 2598 infection delay curves for every hexagon (excluding its own outbreak), each one encapsulating the infection delay to the first case by an intervention at every time t.

To extract key information from each hexagon cell's 2598 infection delay curves, the median infection delay from an intervention over the first 10 days is used to summarize the curve. The first 10 days are chosen as they best exemplify the differences in infection delays across early outbreak scenarios, after which the curves begin to converge. Figure 6 displays the pipeline for calculating median infection delay curves for each hexagon cell. In the first set of results, rather than assigning every infection delay curve an equal weight, which would assume that each scenario is equally likely, each curve is weighted by the in-degree centrality of its outbreak location. The hexagon cells are divided into centrality and income quartiles, and their relationships to infection delays are explored. A one-way ANOVA test is performed on the infection delay values to determine whether the differences are statistically significant. In the second analysis, each hexagon cell's 2598 infection delay curves are divided into two groups based on the in-degree centrality of the outbreak location. A Student's t-test is performed on the two groups of infection delay values to test whether the differences are statistically significant.
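The summary step can be sketched as follows. Several details are assumptions made for illustration: "the first 10 days" is taken to mean the first ten entries of a daily curve, and the weighted median uses the lower-median convention (the smallest value at which the cumulative weight reaches half the total). The function names are hypothetical.

```python
import statistics


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    total = sum(weights)
    if not values or total <= 0:
        raise ValueError("need at least one value and a positive total weight")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2.0:
            return value


def summarize_cell(curves, outbreak_in_degree, days=10):
    """Median of each curve's first `days` entries, then an in-degree-weighted median."""
    values, weights = [], []
    for outbreak, curve in curves.items():
        values.append(statistics.median(curve[:days]))
        weights.append(outbreak_in_degree[outbreak])
    return weighted_median(values, weights)


# Three outbreak scenarios for one cell; outbreak location C is the most central.
curves = {"B": [6.0] * 12, "C": [2.0] * 12, "D": [4.0] * 12}
outbreak_in_degree = {"B": 1, "C": 5, "D": 1}
summary = summarize_cell(curves, outbreak_in_degree)
print(summary)  # 2.0, whereas the unweighted median of the three scenarios is 4.0
```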

The relationship between greater centrality and lower infection delay values is displayed in Figure 7 and Table 3. Within every income quartile, greater centrality is associated with a lower median infection delay value. These differences between infection delay values across centrality quartiles, controlling for income quartile, are statistically significant at the p < 0.01 level based on the one-way ANOVA test. Figure 8 shows the geographic distribution of weighted median infection delay values.

Table 3 Weighted median infection delay values across income and in-degree quartiles. Median income per capita (Brazilian Real per month) and in-degree centrality within each quartile subgroup are shown. The differences in weighted median infection delay values across in-degree centrality quartiles are statistically significant (p < 0.01).

Figure 9 Weighted median infection delay values across income quartiles, while controlling for in-degree centrality (axes: income quartile, weighted infection delay).

Each hexagon cell's infection delay value is subsequently calculated and shown when the outbreak location is in the bottom versus top 50% of centrality. For every hexagon cell, this creates two infection delay values, shown side-by-side in Figures 10 and 11. We see that greater centrality is associated with lower infection delays, irrespective of income, and no clear pattern across income groups is observed when controlling for centrality, similarly to Figures 7 and 9. The results also show that irrespective of the income and centrality grouping, outbreaks beginning in hexagon cells of lower centrality lead to greater infection delays of lockdowns. The Student's t-test indicates a statistically significant (p < 0.01) difference between infection delay values depending on whether the outbreak location's centrality is below or above the median.

This analysis has sought to uncover how the socio-economic and network characteristics of a region relate to the delay of its first case from an early intervention. The results of the Infection Delay Model indicate that the centrality of a region, independent of its income level, plays the largest role in determining how an early intervention will delay its first infection. There is no discernible relationship between income levels and the ability of a lockdown to slow the arrival of disease when controlling for centrality. Although previous studies have shown that vulnerable communities with lower isolation levels have higher infection rates of COVID-19 [5, 6, 7], our results suggest that the influence of socioeconomic and isolation inequalities in determining disease arrival is overridden by the outsized influence of centrality in the network. Because the analysis is based on effective distance, more central regions tend, on average, to be 'closer' to infected regions. This proximity reduces the potential infection delay of a lockdown, with an opposite mechanism in play for less central regions.

The literature produced during the COVID-19 pandemic has thoroughly highlighted the importance of socio-economic factors and their relationship to disease risk, rationalizing their use as more than a passive add-on to network-based results. The growing prioritization of socio-economic inequalities as a driving force of disease risk is exemplified in studies such as [17], which examines how eviction rates in Philadelphia have a measurable impact on the spread of COVID-19. The Infection Delay Model reflects socio-economic inequalities in the MRSP by incorporating real-life mobility reductions, known to be weaker in vulnerable areas [6], as a core component in the effective distance network analysis. Income is then used as a key axis to explore infection delays, and its influence is found to be overpowered by a region's centrality.

Rather than contradicting existing literature on the health burden inequalities associated with socio-economic status, this paper uncovers an unexplored perspective on pandemic preparedness. The emphasis of previous literature on case, death, and hospitalization counts illuminates how vulnerable groups are most at risk during the course of an outbreak [6, 23, 21, 5, 7, 19, 1]. This paper targets a different, intervention-focused question: how much can an early lockdown delay a region's first case? It cannot be assumed that the same mechanisms leading to greater disease risk during an outbreak lead to reduced intervention effectiveness prior to an outbreak. Our results, in conjunction with the established literature on socio-economic vulnerability and COVID-19, illuminate an additional burden faced by low-income, centrally located regions.

A major contribution of this study is its generalized, forward-looking characterisation of intervention effectiveness. Rather than relying on a single set of initial conditions when modelling a disease, or using a subset of transport hubs as outbreak locations, this analysis incorporates all possible outbreak locations when assessing how early interventions lead to infection delays. This yields a broad understanding of intervention effectiveness whose validity does not depend on the next epidemic beginning in the same location as the last. It also addresses the recently explored importance of outbreak locations for disease trajectories, providing generalizable insights for future disease preparedness [43]. We are able to use the abundance of scenarios to generate weighted median infection delay values (Figures 7 and 9), emphasizing the dominant role of centrality. Further, we can divide outbreak locations into low and high centrality groups (Figures 10 and 11) and show that the infection delays of interventions vary with the centrality of the outbreak location. Irrespective of the income or centrality quartile of recipient regions, outbreaks beginning in less central regions tend to lead to greater slowdowns.
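Aggregating one region's infection delays over many outbreak scenarios into a weighted median can be sketched as follows. This section does not restate how the scenarios are weighted, so the weights below (for example, the population of each outbreak location) are an assumption, and the numbers are hypothetical.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    half_total = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half_total:
            return value
    return max(values)  # only reached through floating-point rounding


# Hypothetical infection delays (days) for one recipient region,
# one value per outbreak scenario.
delays = [2.0, 5.0, 1.0, 8.0]
# Hypothetical scenario weights (assumed: population of the outbreak location).
scenario_weights = [10, 1, 30, 2]

median_delay = weighted_median(delays, scenario_weights)
unweighted_median = weighted_median(delays, [1, 1, 1, 1])
```

With these numbers the heavily weighted scenario pulls the weighted median down to 1.0, whereas equal weights give the lower median 2.0.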

Research into the effectiveness of government interventions to slow disease spread is essential, as the disaster resulting from the COVID-19 pandemic and its emerging new variants continues globally. The novel Infection Delay Model proposed in this study provides a method of capturing how mobility reductions can slow the spread of an outbreak while considering the network patterns of mobility flows, an important element of intervention effectiveness. The data-linkage approach, interpolating travel behaviour and socio-economic data, allowed for insights into the social context of regions and how interventions can delay a region's first case. The integration of cellphone mobility data into the effective distance metrics captures heterogeneous changes in isolation, found in prior literature to intersect with socio-economic inequalities [6, 5]. While this analysis is focused on Brazil, a country where income, health, and transport inequalities are stark [44], the presented approach can be applied in other regions to observe the intersection of intervention effectiveness, centrality, and socio-economic vulnerability. Similarly, the epidemiological parameters in this analysis are chosen to mimic COVID-19, but the reproduction rate and infectious period of a novel variant or disease could be substituted. Adopting interdisciplinary methodologies to investigate the effectiveness of interventions, with a focus on exploring inequalities, may provide novel insights into the factors driving the unequal playing field exposed during the COVID-19 pandemic.

In the Infection Delay Model algorithm, the delays calculated for a given region depend on the reductions of the mobility flows that arrive at it, rather than on its own mobility reduction. This operates well under a regime where first cases arrive with individuals travelling from other locations. Extensions of the Infection Delay Model that capture how a first disease introduction to a region can originate from one of its residents travelling elsewhere would capture an important dimension of disease transmission. This may lead socioeconomic and isolation inequalities to play a stronger role in shaping infection delay curves. Further, rather than calculating the time to a region's first case, a case threshold such as a 5% infection rate in the population could be implemented, in which case a region's own social isolation capabilities would more directly affect its infection delay value. These adaptations can expand the model's scope in capturing intervention effectiveness, as its current focus on the delay to a region's first case is only one important element.

When considering cellphone data sources originally collected for commercial purposes, coverage bias should be noted. Because Inloco/Incognia is a cellphone analytics company, the sample of users in its data set is determined by its market share rather than by an emphasis on representative sampling [45]. The near-global ubiquity of cellphones does not preclude biases, as possession and use rates vary across demographic and income groups [46]. The elderly are often underrepresented in such samples, while educated urban males are overrepresented relative to lower-income individuals [46].

In a preliminary analysis, a modified radiation model was used to determine if the results using real commuting data could be replicated with a generalized model. When observing the outbreak locations which led to above and below average infection delays for the rest of the MRSP, the radiation network overstated the influence of income-related mobility reductions relative to centrality. This may have occurred because the radiation model failed to replicate regional hubs with disproportionately large connectivity throughout the commuting network. This caveat should be considered for future research using effective distance-based metrics on artificially generated commuting data.
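For reference, the standard radiation model of Simini et al. predicts the flow from region i to region j as T_ij = T_i m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where m_i and n_j are the origin and destination populations, T_i is the number of commuters leaving i, and s_ij is the population within the circle of radius r_ij centred on i, excluding i and j. The sketch below implements this standard form only. The modification used in the preliminary analysis is not described in this section and is not reproduced, and the inputs are hypothetical.

```python
import math


def radiation_flows(populations, coords, commuters_out):
    """Predicted commuting flows under the standard radiation model.

    populations[i]   -- population of region i
    coords[i]        -- (x, y) location of region i
    commuters_out[i] -- total number of commuters leaving region i
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m_i = populations[i]
        for j in range(n):
            if i == j:
                continue
            n_j = populations[j]
            r_ij = math.dist(coords[i], coords[j])
            # Population within distance r_ij of i, excluding origin and destination
            s_ij = sum(
                populations[k]
                for k in range(n)
                if k not in (i, j) and math.dist(coords[i], coords[k]) <= r_ij
            )
            flows[i][j] = (commuters_out[i] * m_i * n_j
                           / ((m_i + s_ij) * (m_i + n_j + s_ij)))
    return flows


# Hypothetical example: three equally sized regions on a line
populations = [100, 100, 100]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
commuters_out = [10, 10, 10]

flows = radiation_flows(populations, coords, commuters_out)
```

Because the predicted flows depend only on population and distance rank, a model of this form has no mechanism for producing the disproportionately connected regional hubs mentioned above.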

The suitability of integrating traditional household travel survey data with the aggregated social isolation cellphone data deserves exploration by future research. This study recommends comparing granular cellphone mobility location pairs, and observing how daily travel patterns and their changes after lockdown resemble those found in this paper's analysis. If providing similar results, the greater anonymity of the aggregated social isolation data may render it a more readily accessible and minimally invasive method for granular mobility-related studies. 

Assessing the spread of COVID-19 in Brazil: Mobility, morbidity and social vulnerability

Modeling future spread of infections via mobile geolocation data and population dynamics. An application to COVID-19 in Brazil

Mobility network models of COVID-19 explain inequities and inform reopening

Effects of short-term travel on COVID-19 spread: A novel SEIR model and case study in Minnesota

The association between socioeconomic status and mobility reductions in the early stage of England's COVID-19 epidemic

Higher risk of death from COVID-19 in low-income and non-White populations of São Paulo, Brazil

Spatial analysis of COVID-19 clusters and contextual factors in New York City. Spatial and spatio-temporal epidemiology

Estimating the effects of non

Modeling the effect of lockdown timing as a COVID-19 control measure in countries with differing social contacts

The effect of human mobility restrictions on the COVID-19 transmission network in China

Impact of lockdown on COVID-19 prevalence and mortality during 2020 pandemic: observational analysis of 27 countries

The hidden geometry of complex, network-driven contagion phenomena (Science (1337))

Effective distances for epidemics spreading on complex networks

Multiscale mobility networks and the spatial spreading of infectious diseases

Evolution and epidemic spread of SARS-CoV-2 in Brazil

Airline transportation and arrival time of international disease spread: A case study of Covid-19

The effect of eviction moratoriums on the transmission of SARS-CoV-2. medRxiv

A snapshot of a pandemic: The interplay between social isolation and COVID-19 dynamics in Brazil

Geographic access to COVID-19 healthcare in Brazil using a balanced float catchment area approach

Examining spatial accessibility to COVID-19 testing sites in Florida

Neighbourhood income and physical distancing during the COVID-19 pandemic in the United States

COVID-19 policy analysis: labour structure dictates lockdown mobility behaviour

Effect of socioeconomic inequalities and vulnerabilities on health-system preparedness and response to COVID-19 in Brazil: a comprehensive analysis

Finding disease outbreak locations from human mobility data

Incognia: Política de Privacidade COVID-19

METRÔ-SP: Pesquisa Origem Destino 2017: A mobilidade urbana da Região Metropolitana de São Paulo em detalhes

Sinopse de Censo Demográfico 2010

Demographic Census

geobr: Loads shapefiles of official spatial data sets of brazil

The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology

Quantifying the importance and location of SARS-CoV-2 transmission events in large metropolitan areas

Spatial interpolation using areal features: A review of methods and opportunities using new forms of data with coded illustrations

Interpolating Spatial Interaction Data

A Model for the Spread of Infectious Diseases in a Region

Infection in Social Networks: Using Network Analysis to Identify High-Risk Individuals

Corona and coffee on your commute: a spatial analysis of COVID-19 mortality and commuting flows in England in 2020

Global disease spread: Statistics and estimation of arrival times

Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil

Mathematical model of COVID-19 intervention scenarios for São Paulo-Brazil

The Impact of COVID-19 Partial Lockdown on Primary Pollutant Concentrations in the Atmosphere of Rio de Janeiro and São Paulo Megacities (Brazil)

COVID-19 lockdown induces disease-mitigating structural changes in mobility networks

Coronavirus in Brazil: The heavy weight of inequality and unsound leadership

On the Use of Human Mobility Proxies for Modeling Epidemics

Mapping global variation in human mobility

We would like to thank Inloco/Incognia for providing access to valuable privately owned mobility data.

The authors declare that they have no competing interests.

Authors' contributions: All authors conceived and designed the study. S.Y. implemented the models, carried out the analysis and wrote the first draft. R.H.M.P. and P.S.P. performed data curation and contributed to the formal analysis. C.Q.C. performed the initial analyses and coordinated the project. All authors discussed, edited, and reviewed the manuscript, and gave final approval for publication.