key: cord-0584711-jia5kn1w authors: Pan, Wang; Deng, Qipu; Li, Jiadong; Wang, Zhi; Zhu, Wenwu title: STSIR: Spatial Temporal Pandemic Model with Mobility Data date: 2020-12-07 journal: nan DOI: nan sha: 746d81a2d1cc1acb3d27b2ec9ff5859ca5481ca1 doc_id: 584711 cord_uid: jia5kn1w

With the outbreak of COVID-19, mitigating and suppressing its spread has become a major issue for governments. Public health departments need powerful models to describe and predict the trend and scale of such a pandemic, and models that can evaluate the effect of public policy are equally essential in the fight against COVID-19. A main limitation of existing models is that they can only evaluate a policy by calculating $R_0$ after infections have happened, instead of providing an observable index. To tackle this, based on the transmission characteristics of COVID-19, we propose a novel framework, the Spatial-Temporal-Susceptible-Infected-Removed (STSIR) model. In particular, we merge both intra-city and inter-city mobility indexes with the traditional SIR dynamics to form a dynamic system. We prove that the STSIR system is a closed system, which makes it self-consistent. Finally, we propose a Multi-Stage Simulated Annealing (MSSA) algorithm to find the optimal parameters of the system. In our experiments, based on the Baidu Mobility dataset and the China pandemic dataset provided by Dingxiangyuan, our model effectively predicts the total scale of the pandemic and also gives clear policy analysis with observable indexes.

Building accurate models to predict and quantify the effect of infectious diseases is very important to today's public health. The classic SI, SIR and SEIR models [3] have been used for a long time. However, with the development of modern transportation, the range and effect of infectious diseases are greatly amplified, and the traditional SI, SIR and SEIR models have become insufficient to describe the transmission characteristics of new infectious diseases accurately.
For example, the outbreak of the new COVID-19 pandemic has caused nearly 20 million infections and hundreds of thousands of deaths so far. Losses on this scale were not expected, which urges us to build better infectious disease models to analyze and monitor the pandemic's development. We can expect to rely on such tools to understand the transmission characteristics of the epidemic and to make correct and reasonable prevention policies that save lives.

To better fit and analyze the spread of such infectious diseases, scholars have proposed various new methods. On one hand, several scholars add new transition phases (e.g., Asymptomatic, Hospitalized) to infectious disease models [4, 5, 6, 7]. However, simply adding phases has several shortcomings. Firstly, the abundance of parameters makes parameter calculation very laborious. Secondly, after adding multiple phases, the model only simulates a single city or region and cannot model the flow of people between multiple regions, so the transmission characteristics between regions are ignored. Thirdly, because the phases are defined differently, properly categorizing patients into those phases is difficult.

To model multiple regions, other scholars have added transportation statistics such as civil aviation and railway data to model pandemic evolution [5, 8, 9, 10, 11]. Nevertheless, the granularity of civil aviation and railway statistics is too coarse to give accurate inter-city information, so such modeling can only give a rough pandemic trend with low prediction accuracy. Such models consider the inter-region transmission from the origin of the outbreak to other major cities. But the development of modern transportation complicates this picture.
New situations, such as infected people traveling not only from the outbreak city to a first destination but also from the first destination to a second one, are also crucial for trend analysis, yet they have not been considered by current models. Furthermore, the basic reproduction number $R_0$ is essential in modeling. However, this parameter can only be calculated from infection numbers after infections have happened. A model can then only analyze the effect of a quarantine or intervention policy by heuristically assuming how far the policy can suppress $R_0$, without giving any quantifiable index that we can monitor [7, 10]. A systematic, fine-grained spatial-temporal model that better describes the trend of the pandemic and gives quantifiable indexes for policies aiming to mitigate its spread is urgently needed.

To achieve this target, combining mobility data with classical infectious disease models is a promising idea. With the prevalence of mobile phones and mobile apps, mobility data provided by large companies gives an accurate and timely overview of the migration of people [12]. Such near real-time, fine-grained data is undoubtedly helpful in building new pandemic modeling frameworks [13, 14]. Multiple studies have shown that non-pharmaceutical interventions have clear effects on mobility [15, 16]. For example, Baidu and Tencent provide a clear migration curve of people between regions [1], based on data collected from various apps (Figure 2). Moreover, thanks to accurate positioning and the development of ITS, the trajectory information generated when people travel within a city also allows us to describe the degree of intra-city activity (Figure 3). During the pandemic, compared with the fixed and coarse-grained civil aviation and railway statistics, mobility data gave a more fine-grained overview of transportation from both inter-region and intra-region perspectives.
With the considerations listed above, we propose the STSIR framework: a succinct, accurate, quantifiable framework that gives pandemic trend predictions and quantifiable indexes that could help mitigate the spread of the pandemic. The STSIR model consists of a system of differential equations in which we design two novel parts to model the spread of infectious diseases intra-regionally and inter-regionally. We set one day as a time step. By fusing the intra-city activity index and the inter-city activity index, we only need the initial information from the start of the outbreak to produce a pandemic evolution curve.

Since the outbreak of COVID-19, numerous studies of the disease have emerged. New models adopt the transmission characteristics of COVID-19 in order to achieve better performance. Initially, Wu et al. [10] integrated inter-regional traffic originating from Wuhan with the SEIR model to capture the pandemic's transmission characteristics. However, inter-regional transmission between other big cities was ignored: the model does not consider latent infection transmission between big cities other than the initial outbreak point. Following this research, Wang et al. [7] analyzed the pandemic by intuitively adjusting $R_0$. This method gives an overview of the pandemic under different values of $R_0$. However, no observable index related to $R_0$ is mentioned in the model, which makes it difficult to evaluate the effect of a policy. Furthermore, Gatto et al. [4] and Zhou [5] added a new asymptomatic phase to the pandemic evolution procedure. Gatto et al. [4] also added pre-symptomatic and hospitalized phases to give more accurate dynamics of COVID-19. Nevertheless, because the phases are defined differently, accurately categorizing patients into phases is quite difficult. Lau et al.'s work [9] used air traffic statistics to model transmission traits.
However, the granularity of the statistical data limits the model's performance. Ting et al. [12], Ienca et al. [14] and Buckee et al. [13] pointed out the importance of mobility data in fighting COVID-19. With this insight, Jia et al. [17] combined mobility data into a model that helps analyze the risk distribution of the pandemic. Gatto et al. [4] and Wu et al. [10] also enhanced their work with inter-regional mobility data. Nevertheless, mobility within each region is not considered thoroughly.

In this subsection, we present the fundamental concepts of basic epidemiology models. We start with the pure SI and SIR models and analyze their usefulness for today's COVID-19 cases. As the analysis goes on, we derive our new STSIR framework from the basic models.

Susceptible-Infected (SI) model. The most basic epidemiology model is the Susceptible-Infected (SI) model. As shown in figure 4, there are two transition phases in the dynamic model. In its standard form, with $S(t)$ and $I(t)$ the susceptible and infected populations, $N$ the total population and $\beta$ the transmission rate, the dynamics follow

$$\frac{dS(t)}{dt} = -\beta \frac{S(t) I(t)}{N}, \qquad \frac{dI(t)}{dt} = \beta \frac{S(t) I(t)}{N},$$

with initial condition $S(0) = S_0$, $I(0) = I_0$.

Susceptible-Infected-Removed (SIR) model. An extension of the SI model is the Susceptible-Infected-Removed (SIR) model. As shown in figure 5, there are three transition phases in the dynamic model. A new Removed phase is included to describe people who are recovered, dead, or quarantined and who therefore no longer affect the infection process. In its standard form, with $R(t)$ the removed population and $\gamma$ the removal rate, the dynamics follow

$$\frac{dS(t)}{dt} = -\beta \frac{S(t) I(t)}{N}, \qquad \frac{dI(t)}{dt} = \beta \frac{S(t) I(t)}{N} - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t).$$

Other models with new phases. During the COVID-19 pandemic, clinical reports [18] have shown that people may show no symptoms or only mild symptoms after they are infected. However, such people are still able to infect others, which allows the virus to spread easily and widely. As shown in figure 6, numerous scholars have added new phases to enrich the dynamic model.
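The SI and SIR dynamics described above can be illustrated with a short numerical sketch. The following Python code is a minimal forward-Euler integration of the standard SIR equations with a one-day step. The function name `simulate_sir`, the parameter names `beta` and `gamma`, and all toy values are illustrative assumptions, not values taken from this work. Setting `gamma` to 0 reduces the sketch to the SI model.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Integrate the standard SIR equations with a one-day forward-Euler step.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    n = s0 + i0 + r0  # total population, constant over time
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        new_infected = min(beta * s * i / n, s)  # cannot infect more than S
        new_removed = min(gamma * i, i)          # cannot remove more than I
        s -= new_infected
        i += new_infected - new_removed
        r += new_removed
        history.append((s, i, r))
    return history


# Toy values (assumptions for illustration only).
sir_history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.4, gamma=0.1, days=120)
# With gamma = 0 nobody is ever removed, which is the SI model.
si_history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.4, gamma=0.0, days=120)
```

Each entry of the returned history is a tuple `(S, I, R)` for one day, so the infection curve is the second component of each tuple.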
Although these models increased the granularity of the transiting

Figure 7: Whole framework of the STSIR system

STSIR Framework
As shown in figure 7, the whole framework consists of several STSIR Layers. Each layer models the daily infection and transmission process of the pandemic. Given the initial infections and the necessary information as input, the framework outputs simulated infection numbers based on the corresponding mobility indexes.

Mobility Index
To better describe the pandemic's transmission characteristics, we adopt two kinds of mobility index in the model: the intra-city index and the inter-city index. The intra-city index describes the activity density within a city. The inter-city index describes the movement between regions. As shown in figure 2, after the government shut down Wuhan, the origin of the outbreak, inter-city mobility from Wuhan to other big cities in China dropped from a high value to near 0. However, as figure 1 shows, there is still movement between other big cities, which means that infected people can move to multiple big cities and seed transmission there. Furthermore, as figure 3 shows, the intra-city index dropped dramatically when the quarantine and stay-at-home policies began. However, low-level activity within the city remains, which still gives chances of infection. So both the inter-city and intra-city indexes should be included in our model.

STSIR Layer
After considering the problems listed above, we propose the STSIR Layer, which contains a dynamic model given by the following equations: with initial condition:

Given $k$ cities in the system, $S(t)$, $I(t)$, $R(t)$ are the system susceptible, infectious and recovered vectors with shape $[k, 1]$ at time $t$, respectively. $N$ denotes the population vector of each city with shape $[k, 1]$. $\beta_1$ represents the city-wise multiplier for the contribution of intra-city activity to pandemic transmission.
$\beta_2$ is a unified multiplier that denotes the transmission between multiple cities. $\gamma_1$ denotes the removing coefficient, that is, the removal rate of infectious people. $T_{intra}(t)$ and $T_{inter}(t)$ are the intra-city mobility index with shape $[k, 1]$ at time $t$ and the inter-city mobility index with shape $[k, k]$ at time $t$, respectively. Detailed meanings can be found in table 1.

Now we show the derivation of the system model. For each city $i$, we consider the transition procedure for each phase in turn. For the susceptible phase, the change in number consists of three parts:
• The people who were susceptible and get infected within the city
• The people who are susceptible coming from other cities
• The people who are susceptible moving to other cities
By the same idea, the change in the infectious number consists of four parts:
• The people who were susceptible and get infected within the city
• The people who are infectious coming from other cities
• The people who are infectious going to other cities
• The people who are quarantined, dead, or recovered and thus removed from the system.
Then for each city $i$, we have a single dynamic system as follows: with initial condition

We can then aggregate the cities with the above dynamics to obtain the form of the dynamics shown in equations (11) to (16).

Theorem 1. Each STSIR Layer is a self-consistent system, which means that no new members are introduced to the system. The total population of the whole system always remains the same.

Proof. For each layer at time $t$, we can add (17), (18) and (19) together. We get: And for each △ ( ) we can add them together for $k$ cities. We have: Based on equation (24), the inter-city transmission mechanism causes no change in the total population.

Parameter Learning
There are several parameters in the STSIR model: $\theta = \{\beta_1, \beta_2, \gamma_1\}$.
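Using the three parameters just listed, one layer of the per-city dynamics and the self-consistency property can be illustrated numerically. The Python sketch below is hypothetical: the functional form of each term (infection scaled by the intra-city index, movers taken as a fraction `beta2 * T_inter[j][i]` of the source city's compartment, removed people assumed not to travel) is an assumption chosen only to match the listed parts of change, and should not be read as the exact STSIR equations. The names `stsir_layer` and `simulate` and all toy numbers are likewise illustrative.

```python
def stsir_layer(S, I, R, N, beta1, beta2, gamma1, T_intra, T_inter):
    """Advance S, I, R (lists of length k) by one day.

    Assumed form, NOT the paper's equations:
      infection in city i : beta1[i] * T_intra[i] * S[i] * I[i] / N[i]
      movers from j to i  : beta2 * T_inter[j][i] * X[j]   for X in {S, I}
      removal in city i   : gamma1 * I[i]
    """
    k = len(S)
    S_next, I_next, R_next = [], [], []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        infected = beta1[i] * T_intra[i] * S[i] * I[i] / N[i]
        removed = gamma1 * I[i]
        s_in = sum(beta2 * T_inter[j][i] * S[j] for j in others)
        s_out = sum(beta2 * T_inter[i][j] * S[i] for j in others)
        i_in = sum(beta2 * T_inter[j][i] * I[j] for j in others)
        i_out = sum(beta2 * T_inter[i][j] * I[i] for j in others)
        S_next.append(S[i] - infected + s_in - s_out)
        I_next.append(I[i] + infected + i_in - i_out - removed)
        R_next.append(R[i] + removed)  # removed people do not travel
    return S_next, I_next, R_next


def simulate(S0, I0, R0, N, beta1, beta2, gamma1, T_intra_series, T_inter_series):
    """Stack one layer per day and record the system-wide total after each day."""
    state = (list(S0), list(I0), list(R0))
    totals = [sum(sum(x) for x in state)]
    for T_intra, T_inter in zip(T_intra_series, T_inter_series):
        state = stsir_layer(*state, N, beta1, beta2, gamma1, T_intra, T_inter)
        totals.append(sum(sum(x) for x in state))
    return state, totals


# Toy three-city example (all numbers are made up for illustration).
N = [1000.0, 500.0, 800.0]
S0, I0, R0 = [990.0, 500.0, 800.0], [10.0, 0.0, 0.0], [0.0, 0.0, 0.0]
days = 30
T_intra_series = [[1.0, 1.0, 1.0]] * days
T_inter_series = [[[0.0, 2.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]] * days
(final_S, final_I, final_R), totals = simulate(
    S0, I0, R0, N, beta1=[0.3, 0.3, 0.3], beta2=0.01, gamma1=0.1,
    T_intra_series=T_intra_series, T_inter_series=T_inter_series,
)
```

In this sketch every person leaving city $j$ arrives in some city $i$, every new infection moves one person from S to I, and every removal moves one person from I to R, so the recorded `totals` stay constant up to floating-point error. This mirrors the closed-system argument: summing the per-city changes over all $k$ cities cancels the inter-city terms.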
Because of the asymptomatic state of COVID-19 and the complexity of case testing, we consider the officially announced cases to be a delayed expression of the genuine situation. Instead of fitting the officially confirmed curve, we aim to train the model to produce the trend of the pandemic by simulation. Given two time sequences, ( ) and ( ), which denote the simulated and the officially confirmed numbers of daily infections respectively, we want to minimize the gap between the total number of simulated cases and the real confirmed numbers after the test delay, which can be defined as the following optimization problem:

To obtain the parameters of such a multi-system model, we developed a Multi-Stage Simulated Annealing algorithm. Simulated Annealing [19] has been used for many optimization tasks [20], including both single-objective and multi-objective optimization problems, and has proven effective in optimizing parameters. However, traditional simulated annealing is not suitable for optimizing a dynamic model with multiple interacting systems. So, based on our problem, we use the Multi-Stage Simulated Annealing algorithm to find the optimal parameters of the model. The algorithm is described in Algorithm 1.

To evaluate the effectiveness of the STSIR model, we conduct experiments with the Baidu Mobility dataset [1] and the China pandemic dataset provided by Dingxiangyuan [2]. We provide a brief data description in Section 4.1. In Section 4.2, we analyze the data and describe our experiment settings. In Section 4.3, we demonstrate the strong predictive power of our model. In Section 4.4, we analyze how well the model helps us assess the effect of different policies on pandemic transmission.

Baidu Mobility Dataset
The Baidu Mobility dataset [1] is provided by the Baidu company. The Baidu search engine, together with its various applications, has over 1 billion users in China.
Every time an application is launched, Baidu collects the IP address of the end device. Baidu can thus infer how the crowd moves from one place to another from changes in the IP address. The information is represented as a mobility index, where a higher number means more frequent movement of the crowd. For privacy reasons, the data is fully anonymized and collected according to Baidu's terms and conditions. The dataset contains two kinds of mobility data for 388 administrative areas: four municipalities, 343 prefecture-level cities, and 41 counties directly governed by provinces (including Hong Kong, Macau, and Taiwan). The two kinds of mobility data, the intra-city index and the inter-city index, are floating-point numbers representing the daily activity strength within a city and between cities, respectively. The records range from 2020-01-01 to 2020-03-15. Since this range covers the Chinese New Year, the dataset also provides a historical intra-city index for the same period of the lunar calendar for reference.

Table 1

As China has enforced strict distancing and quarantine policies, the pandemic in China became relatively stable (Figure 8) during late February. This enables us to test our model by training it at an early stage and then feeding it the mobility index to simulate the final scale of the pandemic. We calculate the MAE (Mean Absolute Error) against the genuine final scale of the pandemic by:

In this section, we demonstrate how well our model predicts the final scale in the real data. The experiments show that the model performs well in predicting the total confirmed cases. As shown in the optimization problem (25), we want the model to give the current transmission situation based on the concurrent confirmed cases. Because of the asymptomatic character of COVID-19, there is always a delay between the tested confirmed cases and the real infected cases.
In the experiment, we assume that the outbreak has lasted for 50 days and that the delay between testing and real infection is 20 days. We then train the model at this stage and predict the total confirmed cases at day 70, based on the real intra-city and inter-city activity. As shown in 4, the STSIR model achieves an overall MAE of 7.76 in predicting the final scale of the pandemic, which demonstrates its predictive power. As shown in 9, we can see the gap between the real confirmed cases and the simulated curve, which shows the delay effect of case testing. When quarantine and distancing policies are enforced, transmission tends to stop, and the model predicts the final scale of the pandemic well.

In this section, we conduct experiments on how severe the pandemic would have become in China if no "quarantine" and "shut down" policies had been made. Since the transportation system is extremely active during the Chinese New Year, the pandemic would spread far more widely without such policies. We insert last year's mobility index into the model to see what would happen in Hubei Province. As shown in Figure 10, if Wuhan had not been shut down, massive infected populations would have moved to cities near Wuhan and caused massive infections.

To tackle the spread of COVID-19, we introduce the STSIR model to quantify the scale of the pandemic and to help analyze the effect of policies by introducing different mobility indexes. The system is designed to be self-consistent, and a new MSSA algorithm is introduced to learn the model parameters. Experimental results verify the effectiveness of our study: the model shows strong predictive power and analysis capability.
Covid-19/2019-ncov infection time series data warehouse
Series A, Containing papers of a mathematical and physical character
Spread and dynamics of the covid-19 epidemic in italy: Effects of emergency containment measures
Evaluating new evidence in the early dynamics of the novel coronavirus covid-19 outbreak in wuhan, china with real time domestic traffic and potential asymptomatic transmissions. medRxiv
Estimating the efficacy of traffic blockage and quarantine for the epidemic caused by 2019-ncov (covid-19). medRxiv
Phase-adjusted estimation of the number of coronavirus disease 2019 cases in wuhan, china
Sars-cov-2 infection among travelers returning from wuhan, china
The association between international and domestic air traffic and the coronavirus (covid-19) outbreak
Nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study
Covid-19 progression timeline and effectiveness of response-to-spread interventions across the united states. medRxiv
Digital technology and covid-19
Aggregated mobility data could help fight covid-19
On the responsible use of digital data to tackle the covid-19 pandemic
Private sources of mobility data under covid-19
Covid-19 outbreak response, a dataset to assess mobility changes in italy following national lockdown
Population flow drives spatio-temporal distribution of covid-19 in china
Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
Optimization by simulated annealing. Science
A survey of simulated annealing as a tool for single and multiobjective optimization