key: cord-0666617-hyz61yf6
authors: Ghosh, Shreya; Mukherjee, Anwesha; Ghosh, Soumya K; Buyya, Rajkumar
title: STOPPAGE: Spatio-temporal Data Driven Cloud-Fog-Edge Computing Framework for Pandemic Monitoring and Management
date: 2021-04-04
journal: nan
DOI: nan
sha: 425fbe8c32d0709832a898bb58a4654585277344
doc_id: 666617
cord_uid: hyz61yf6

Several studies and evidence show the increasing likelihood of pandemics (large-scale outbreaks of infectious disease), which have far-reaching consequences in all aspects of human life, ranging from high mortality rates to economic and social disruption across the world. Recently, the COVID-19 (Coronavirus Disease 2019) pandemic disrupted normal human life and, motivated by the urgent need to combat COVID-19, researchers have put significant effort into modelling and analysing the disease spread patterns for effective preventive measures (in addition to developing pharmaceutical solutions, such as vaccines). In this regard, it is necessary to develop an analytics framework that extracts and incorporates knowledge from heterogeneous data sources to deliver insights for improving administrative policy and enhancing the preparedness to combat the pandemic. Specifically, human mobility, travel history and other transport statistics have significant impacts on the spread of any infectious disease. In this direction, this paper proposes a spatio-temporal knowledge mining framework, named STOPPAGE, to model the impact of human mobility and other contextual information over large geographic areas at different temporal scales. The framework has two major modules: (i) a spatio-temporal data and computing infrastructure using a fog/edge based architecture; and (ii) a spatio-temporal data analytics module to efficiently extract knowledge from heterogeneous data sources. Specifically, we develop a Pandemic-knowledge graph to discover correlations between mobility information and disease spread, and a deep learning architecture to predict the next hot-spot zones; we also provide the necessary support for home-health monitoring utilizing Femtolet and fog/edge based solutions. The experimental evaluations on real-life datasets related to COVID-19 in India illustrate the efficacy of the proposed methods.

The significant growth of global travel, improved communication, urbanization, and greater exploitation of the natural environment have escalated the likelihood of outbreaks of infectious diseases on a larger geographical scale. These fatal infectious diseases pose real threats to public health, and government agencies require effective health measures in pandemic situations. In this context, the global outbreak of the infectious disease COVID-19 (Coronavirus Disease 2019), caused by SARS-CoV-2, has swept 200+ countries or territories and infected 160+ million people (as of the third week of March 2021). The disease is highly contagious through human-to-human transmission, and it has already been observed that traditional infection-control or public-health measures to combat COVID-19 are inadequate. In this exigency, governments have taken varied measures and policies such as travel restriction, lockdown 1 of several regions, and self-quarantine to control the rapid growth/spread of the pandemic. However, the threat of a second wave of this pandemic still persists in several countries of the world.

Email addresses: shreya.cst@gmail.com (Shreya Ghosh), anweshamukherjee2011@gmail.com (Anwesha Mukherjee), skg@cse.iitkgp.ac.in (Soumya K Ghosh), rbuyya@unimelb.edu.au (Rajkumar Buyya)

The rapid development and emergence of the Internet of Things (IoT) has significantly improved all spheres of life and industry by connecting billions of devices and people around the world. IoT has brought useful solutions in different fields such as agriculture, farming, healthcare, smart building, smart city, personalized recommendation etc. [1], [2]. The IoT devices need to send data to cloud servers frequently for processing and analysing the accumulated data. However, this increases the delay and therefore affects the Quality of Service (QoS). Here, edge and fog nodes extend the functionality of cloud computing by processing, analysing and storing the information near the end user. The combination of these emerging technologies facilitates several time-critical applications, namely, continuous patient monitoring at home, in an ambulance, and in hospital, as well as assisting users in emergency situations, such as disasters [3].

In recent times, there is a growing need to analyse spatio-temporal datasets for extracting meaningful information and providing location-aware services, such as trip-planning, weather forecasting and even health-management. Notably, one of the major aspects of epidemiological analysis is retrieving the correlations between people and disease spread in the spatial and temporal dimensions to effectively adapt the countermeasures. For instance, a spatio-temporal data analysis [4] found that the source of the Cholera outbreak in London in 1854 was contaminated bore wells. The finding was immensely helpful in combating the spread of Cholera. To this end, the Internet of Spatial Things (IoST) combines IoT with spatial context [5], where the location information of the objects plays an important role. To fight a pandemic, spatio-temporal information and health data need to be integrated and analysed to predict the spread of the disease as well as to inform users about their risk of infection. Here, we propose the term Internet of Spatio-Health Things (IoSHT) for the integration of IoST with IoHT (Internet of Health Things), where all the health-related information is combined with the location data. This work mainly focuses on how a novel spatio-temporal data analytics method helps in strategic administrative planning to enhance the preparedness to curb the pandemic. The pandemic situation caused by COVID-19 poses enormous challenges worldwide:

• (Q1) How to share information about available medical facilities and disease spread patterns in varied locations?

• (Q2) What is the most effective measure to analyse available data sources and predict next probable hotspots to further enhance the preparation for pathological disease outbreak?

To resolve Q1, an efficient data infrastructure is required, which can store, manage and share authentic information amongst stakeholders. Further, a novel spatio-temporal data analytics module is a must for understanding the impact of human movement data on the disease spread and summarizing the information in this situation (Q2). To this end, we propose a Pandemic-knowledge graph (PKG) to capture the interdependence and connections among several entities and users' movement logs, and subsequently identify the hotspot/containment zones. A knowledge graph is represented as a multi-relational graph consisting of entities as nodes and relations as edges, and supports several Artificial Intelligence (AI) related applications. However, it is difficult to represent mobility and other contextual information that change over time using a conventional knowledge graph. On the other side, deep learning is the most feasible solution to extract the correlations among several factors such as population density, mobility information and other data sources.

Although a full lockdown (complete restriction of movement) is one of the most effective ways to reduce the human-to-human transmission rate, it has a drastic impact on the economy of the country. This paper analyses the impact of movement in two phases in the context of India: (a) In the pre-lockdown phase, when international travel was allowed, it analyses the in-flow of international flights in several regions of the country and the cases reported within a temporal buffer of the visits. The approximate count of the travellers and aircraft in the time span of March 10 to March 21, 2020 2 and the reported cases 3 have been analysed; (b) In the next phase, we develop a disease spread module based on factors such as population density, changes of mobility patterns 4 , POI (point-of-interest) information etc. Here, we have proposed a variant of the co-occurrence pattern in which the mobility information has been augmented.

The major significance and contributions of STOPPAGE are as follows:

• IoSHT: An end-to-end spatial data infrastructure framework named IoSHT, consisting of cloud, fog, edge and IoT layers, is presented, which stores, manages and analyses pandemic related data effectively and assists in taking effective decisions. IoSHT is capable of collecting BAN (Body area network) information and other contextual data such as environment temperature, location data, mobility information etc. Femtolet (a small cell base station with storage and computation ability) [23] is used for faster reporting of patient information of hospitals. The delay and power consumption of the user device for health status monitoring have been determined and compared with a cloud-only system to show the efficacy of the proposed framework.

• Pandemic-Knowledge Graph (PKG): We propose a novel time-dependent Pandemic-knowledge graph (PKG) based on the active cases, POI, road and air-travel connectivity index and users' movement history. PKG captures correlations and impacts among locations, temporal information, movement semantics and infectious disease spread.

• The paper proposes a deep learning network which incorporates the spatio-temporal data instances and finds out the probable next hotspot zones by mining the inherent knowledge. To be specific, the deep learning architecture is capable to incorporate the impacts of human mobility and other spatio-temporal features of PKG, and finds out the hotspots effectively.

Table 1: Four major features and the existing approaches in the context of COVID-19

[I] Wang et al. [6]: Comprehensive knowledge discovery framework to extract fine-grained knowledge elements from scientific literature, and a case study on Drug Repurposing Report Generation has been presented
[I] Esteva et al. [7]: Retriever-ranker semantic search engine to resolve complex queries to find scientific answers
[I] Fernandez et al. [8]: Cause-and-effect network generated from scientific literature and presented a knowledge graph. The paper analyzes comorbidities, symptoms, and discovered over 300 candidate drugs for COVID-19
[I] Chen et al. [9]: Formalization and extraction of meaningful insights from the PubMed dataset and generation of knowledge graph using two methods: co-occurrence frequency and cosine similarity
[I] Reese et al. [10]: Present flexible framework to integrate biomedical data for producing knowledge graphs (KGs)
[I] Remarks: Most of the works deal with knowledge mining related to pharmaceutical data, drug discovery or extracting information from scientific literature. No existing works deal with information storage and knowledge extraction from heterogeneous data sources, like travel-logs, aggregate movement information, COVID-19 statistics in different spatio-temporal granularity and find the correlations amongst them.

[II] Kraemer et al. [11]: The spatial distribution of COVID-19 cases in China and the mobility traces are analyzed
[II] Pepe et al. [12]: The daily time-series of three different aggregated mobility metrics have been presented. The authors use 170,000 smartphone users' data to monitor the impact and assisting in making decisions.
[II] Huajun et al. [13]: Identification of suspected infected crowds based on the human movement trajectories.
[II] Samuel et al. [14]: Analyse human mobility data to find out disease spread for restrictions order to stay home.
[II] Hamda et al. [15]: A mathematical modelling with the movement dynamics and disease outbreaks have been proposed. The work helps in taking measures of social distancing and other policies in the 25 counties in the USA

[III] Rahman et al. [16]: Distributed deep learning framework for COVID-19 diagnosis in 5G network at the edge
[III] Kapoor et al. [17]: Forecasting approach using Graph Neural Networks and mobility data for COVID-19 case prediction.
[III] Luz et al. [18]: An efficient method of COVID-19 screening in chest X-rays in less time and computational cost.
[II]-[III] Remarks: To the best of our knowledge, there is no deep learning module which analyzes the movement patterns and other contextual facts to understand the impact of the disease spread and predicts hotspots efficiently.

[IV] Tuli et al. [19]: Cloud computing based framework to predict growth of the epidemic and appropriate strategies have been presented. The authors present a prediction framework.
[IV] Wang et al. [20]: Anonymous and secure aggregation model in fog-based public cloud computing model
[IV] Md et al. [21]: Utilize mobile and fog computing to trace and prevent COVID-19 community transmission
[IV] Adarsh et al. [22]: Multi-layered architecture to collect real-time information from drones and utilize in disease monitoring, control, thermal imaging, social distancing, and statistics generation

STOPPAGE (Proposed framework): Cloud-fog-edge enabled collaborative framework conducive to construct a novel time-dependent Pandemic-Knowledge graph (PKG) based on the spatio-temporal data. Presenting deep learning based analytics module to find out hotspots by mining contextual data.

We have also accumulated individuals' mobility history from Google Map Timeline from specific regions of India to evaluate the efficacy of the STOPPAGE framework.

The rest of the paper is organized as follows. Section 2 summarizes the related works in this domain. Then, the system architecture and the deep learning module have been presented in section 3. Then we present a real-life example of home-health monitoring system and usages of Femtolet to reduce the reporting time in section 4 followed by the experimental evaluations in section 5. Finally, we conclude the paper with research challenges and opportunities in this domain in section 6.

To begin with, the Internet of Spatio-Health Things is an emerging field of IoT, where IoHT and IoST are integrated to improve the health care system. In IoHT, the IoT devices are integrated with mobile technologies to process and exchange data for monitoring the health condition of individuals [24, 25]. While several IoT-enabled medical devices and BANs accumulate health data and provide insights into symptoms and trends, the information exchange, communication and interoperability of IoT devices make healthcare services more effective. Inside health-care centers, IoHT can enable vital signs monitoring of a patient [24]. However, the COVID-19 pandemic cannot be managed by IoHT solutions alone; it also requires monitoring of the human-to-human transmission of the disease. To this end, IoHT needs to be integrated with IoST to monitor the overall spread of the disease and reduce the transmission rate through early identification and preventive measures, such as restricting human movement or close contact. Further, IoT devices associated with real-time location tracking of medical equipment such as oxygen pumps, wheelchairs and defibrillators make medical resource management efficient in this pandemic. Location information is therefore highly significant for curbing the pandemic. Specifically, mobility traces play a pivotal role in several real-life applications, such as location prediction [26, 27] and outdoor navigation of visually impaired people [28]. Mobility is also an important factor in the spread of infectious diseases. Internet of Spatial Things (IoST) is a new domain and only very few works (e.g. [5, 29]) have been carried out in this direction. In this paper, we introduce the Internet of Spatio-Health Things (IoSHT), where individual health monitoring and assistance can be provided anytime and anywhere, and where information sharing and pandemic/disease spread patterns are analysed to enhance preparedness.

The onset of COVID-19 has resulted in a large volume of research papers proposing several methods to curb the pandemic. While there are numerous articles dealing with pharmaceutical methods, we place more emphasis on the analysis of contextual information collected from mobile phones and other sources, such as movement patterns, cumulative counts etc., to predict the future spread of the disease. Table 1 shows four major features and the existing approaches in the context of COVID-19. Data management and information retrieval is one of the major steps in combating the pandemic. A comprehensive insight into how IoMT (Internet of Medical Things) systems can be beneficial in the context of COVID-19, along with the architecture, tools and technologies, is given in [30]. Marcello et al. [31] state that large scale collection of data while maintaining privacy and public trust is one of the prime steps. There are several works on summarizing the knowledge from drug discovery or other pharmaceutical methods [8], [9]. Also, researchers [6] have put significant effort into summarizing information from scientific literature by proposing a knowledge graph. There are works on aggregate movement dynamics and understanding their impact on disease spread [11], [12]. Huajun et al. [13] present a novel trajectory pruning method to find suspected crowds in a region. However, to the best of our knowledge, no existing work has extensively studied the movement patterns and contextual information in a region in the context of COVID-19 while effectively storing and managing the information for better decision making. There are also a few works leveraging the cloud-fog paradigm for contact tracing, real-time drone based systems and analysing the growth of the disease [19], [21].

It may be concluded that machine learning and movement data analytics may act as major scientific tools for combating the spread of COVID-19 pandemic. In this direction, our work explores and analyses the human movement related data proposing novel knowledge graph (PKG) and deep learning architecture.

The proposed framework, STOPPAGE, has two major modules, namely a spatio-temporal data infrastructure (SDI) and a data analytics engine running over the SDI. Figure 1 illustrates the overall workflow and modules of STOPPAGE. The spatial data infrastructure consists of a backbone IoT network and a geospatial resource management module. The spatio-temporal data analytics engine analyses the COVID-19 data and mobility related information to find the probable hotspot zones by deploying a deep learning architecture. Furthermore, STOPPAGE proposes the use of Femtolet to report any health events with minimal latency. Also, health status monitoring is provisioned in the framework through a common API-endpoint. Fig. 2 illustrates the overall block-diagram of the data analytics engine of STOPPAGE. It has three major components: (i) Pandemic knowledge graph (PKG) construction, (ii) deep learning based movement analytics and (iii) faster reporting of COVID-19 events and data sharing using the cloud-fog-edge based paradigm. Fig. 3 (a) represents the SDI (Spatial Data Infrastructure) and Fig. 3 (b) illustrates the deployment of IoSHT in the cloud-fog-edge structure.

Spatial data infrastructure (SDI), a term coined in 1993 by the U.S. National Research Council, denotes a framework consisting of technologies, policies and orchestration methods to create, exchange, and utilize geospatial information and services among the community [32]. The goals of this framework are:

• ease of search and discovery of geospatial services and information

• reduce data duplication of information among the national agencies (government)

• seamless data sharing technique

• maintaining the data integrity and privacy

Further, SDI enables the communication between the repositories of cities, states, countries and industries to share the data and services. In other words, the difficulty in sharing and accessing different format/structure of data is resolved by SDI.

In the context of COVID-19, different stakeholders, such as the health department, transportation authority or epidemic-control team, along with the end users, need to seamlessly share data and communicate to handle the situation effectively. Without a proper underlying infrastructure for data sharing and management, any epidemic-control measure or policy is difficult to implement. Therefore, in our proposed IoSHT framework, we have adopted SDI as the backbone infrastructure. The SDI in the context of IoSHT is represented in Fig. 3 (a). To combat the disease spread and take countermeasures, we need to analyse heterogeneous data sources. Authentic data about the country's population, demography, health care centers etc. can be obtained from the government departments. In the context of India, we collect demography and population data from SOI and open-source platforms 7 , 8 , health-related information from MoHFW 9 , transport data from NHAI 10 and raster information (satellite data) from ISRO 11 . All of these datasets on the country's population, demography, health care centers and traffic are useful for analysing the disease spread patterns. Further, crowdsourced data from Google Map, OpenStreetMap (OSM), or Android applications are also beneficial for obtaining the present status of the region. The data and health service providers provide the metadata information to the catalog. This catalog is maintained in each of the nearest fog devices, where the data or the service is generated. A user can search the catalog at any time and extract the metadata of the information or service he/she requires. Once the user selects the required service/information, he/she can consume the service.
It may be noted that, in the proposed STOPPAGE framework, the services are: (i) contacting the health care center for medical help; (ii) personalized health-checkup by sending the health-parameter values collected from the IoT devices/BAN; (iii) getting the present disease spread information of any region; (iv) finding the risk of infection by analysing the user's movement history. In our model, the metadata information is retrieved in .xml or .json format. All of these analytics are computed in the cloud servers, and the user gets the result through the web-dashboard or Android app.

Femtolets are used in the health care centers for faster reporting of the number of patients affected, deceased and cured, because this information is also very important for monitoring the current status of the pandemic in a geographic region. A Femtolet is a small cell base station which has storage and processing ability [23]. In the proposed model, Femtolets are deployed in each ward of the hospital to store and process the information of the patients currently admitted in that ward. The number of Femtolets to be allocated depends on the coverage area, storage capacity and data processing speed of the Femtolet, and the number of patients currently admitted in that ward. The health status of a patient is collected using a BAN and sent to an edge device (a BAN and an edge device are used for each patient). The edge device (ED) accumulates the data and sends it to the Femtolet connected to that edge device. The Femtolet processes the incoming data and stores it under the respective patient registry (a data registry is maintained for each patient). If an emergency situation arises, the Femtolet sends a signal to the connected alarm, which starts ringing on receiving the signal. The health personnel of that ward then take the required action. The Femtolet periodically sends the patient health status to the private cloud servers of the health centre, which maintain the information regarding the number of patients cured, dead and affected.
From the private cloud servers, the numbers of patients affected, cured, and deceased in the hospital with respect to the disease are reported to the public cloud servers periodically. The overall architecture of the Femtolet based health care model is shown in Fig. 4. As the public cloud has the information regarding the number of patients admitted, cured, and dead due to COVID-19 in each hospital, the total number of admitted, cured and dead patients due to COVID-19 (considering all the hospitals' information) is determined. Any person can obtain this information on request. However, the information of each hospital will not be disclosed without authorization, for privacy reasons. The cloud already has the information regarding hotspot zones (hotspot identification is discussed in Section 3.4). By combining both pieces of information, people can be made aware of the current scenario.
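The reporting chain described above (BAN and edge device, ward-level Femtolet, private cloud, public cloud) can be illustrated with a minimal Python sketch. The class and function names, the reading format and the SpO2 alarm rule are hypothetical and chosen only for illustration; the paper does not specify them.

```python
from collections import Counter, defaultdict


class Femtolet:
    """Ward-level node: keeps a registry per patient and raises alarms."""

    def __init__(self, spo2_alarm_below=90):  # hypothetical emergency rule
        self.registry = defaultdict(list)  # patient-id -> list of readings
        self.status = {}                   # patient-id -> affected/cured/deceased
        self.alarms = []                   # stands in for the ringing ward alarm
        self.spo2_alarm_below = spo2_alarm_below

    def receive(self, patient_id, reading):
        """Called by the patient's edge device with one BAN reading."""
        self.registry[patient_id].append(reading)
        self.status.setdefault(patient_id, "affected")
        if reading["spo2"] < self.spo2_alarm_below:
            self.alarms.append(patient_id)

    def report(self):
        """Periodic summary sent to the hospital's private cloud."""
        return Counter(self.status.values())


def public_cloud_totals(hospital_reports):
    """Sum the counts over all hospitals; per-hospital figures are not returned."""
    total = Counter()
    for ward_reports in hospital_reports.values():
        for ward_report in ward_reports:
            total.update(ward_report)
    return dict(total)


ward = Femtolet()
ward.receive("p1", {"spo2": 96})
ward.receive("p2", {"spo2": 87})   # below the assumed threshold, triggers the alarm
ward.status["p1"] = "cured"
totals = public_cloud_totals({"hospital-A": [ward.report()]})
```

Only the aggregated `totals` leave the public-cloud function, which mirrors the statement that the information of each hospital is not disclosed without authorization.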

Pandemic-Knowledge Graph (PKG) Construction

This section describes the features of the Pandemic-Knowledge graph (PKG) and how it is constructed from the available dataset. It is a multi-layer network which captures the movement behaviour of individuals as well as the overall movement semantics of the ROI (Region-of-interest) and the statistics related to COVID-19, such as the growth pattern of active cases, changes of aggregated mobility patterns, available medical facilities etc. The intuitions behind constructing the knowledge graph are: (i) transforming the underlying relations of mobility semantics into a machine-readable format to support information retrieval and query-processing; (ii) the complex spatio-temporal mobility dataset can be represented effectively by a graph structure instead of other storage; (iii) updates of relations and facts can be incorporated more easily in this structure than in SQL-based processing, which therefore provides a more flexible schema; and finally (iv) a knowledge graph provides a higher level abstraction of information, which may help to extract more complex and previously unknown interrelations by merging more than one relation of the graph. Therefore, it facilitates building a semantically enriched knowledge-base.

The Pandemic-Knowledge graph is formulated as a set of triplets $\langle s, r, o \rangle$, where $s \in \Omega$ and $o \in \Omega$ are entities and $r \in \phi$ is the relation between the two entities. The sets of entities and relations are denoted by $\Omega$ ($\Omega = \Omega_1 \cup \Omega_2$) and $\phi$ respectively. Notably, in our Pandemic-Knowledge graph proposition, each triplet (or fact) has a time-slot in which the fact is valid. Therefore, the facts of PKG take the form $MF: \langle s, r, o, [t_1, t_2], f \rangle$, where the $[t_1, t_2]$ entry denotes the time-interval when the fact is true and $f$ is the feature value of the relation. This $f$ is introduced to capture the semantic correlation with the COVID data.
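A time-dependent fact of this form can be held in a small record type. The following Python sketch is only an illustration of the five-element fact with a validity check; the field names and the example user and place identifiers are assumptions, not part of the paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    """One PKG fact <s, r, o, [t1, t2], f>."""
    subject: str     # entity s (a user-id or place-id)
    relation: str    # relation r, e.g. "visit"
    obj: str         # entity o
    t1: datetime     # start of the validity interval
    t2: datetime     # end of the validity interval
    feature: float   # feature value f of the relation

    def valid_at(self, t: datetime) -> bool:
        """True if the fact holds at time t (closed interval [t1, t2])."""
        return self.t1 <= t <= self.t2


def facts_valid_at(facts, t):
    """Return the facts of the graph that are valid at time t."""
    return [fact for fact in facts if fact.valid_at(t)]


# Hypothetical facts for illustration only.
facts = [
    Fact("u1", "visit", "p7", datetime(2020, 3, 12, 9), datetime(2020, 3, 12, 11), 0.4),
    Fact("u2", "visit", "p7", datetime(2020, 3, 13, 9), datetime(2020, 3, 13, 10), 0.1),
]
snapshot = facts_valid_at(facts, datetime(2020, 3, 12, 10))
```

Filtering by time in this way yields the snapshot of the graph at a given instant, which is what distinguishes the time-dependent PKG from a static triplet store.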

The entities consist of $U \subseteq \Omega_1$ and $P \subseteq \Omega_2$, where $U$ and $P$ are the set of users and the set of place-ids of a ROI. Each entity $pl \in P$ has the attributes POI-type, location (latitude and longitude) and enclosing area. A few $pl$ have additional features such as opening/closing time and available facilities. Each entity $u \in U$ has a unique user-id, age, gender, residence-area, health-profile and travel-history. The derived attributes of POIs, such as GPS footprint or edge segment, are extracted from the GPS log of the users in the ROI. The facts connectedBy and boundingBox can be directly computed from the road-network and the latitude and longitude information of the POI. The relations $r \in \phi$ are extracted based on the historical movement log.

Apart from the mobility fact label, each edge in the graph has a feature value associated, which depicts the probability of the occurrence of the edge. Few such facts are represented as follows:

visit: User $u_a$ visits place $p_i$ in the time-interval $[t_1, t_2]$ with a probability of being infected $f_x$. Here, the feature value of the fact is computed by the graph embedding approach [33].

The sequence of places $(p_1, \ldots, p_m)$ is visited by a group of users ($U$) in the same time-intervals. $f_x$ may have a value within $[0, 1)$.

The above mentioned movement flow (MF) is detected from $p_a$ to $p_b$ in a specific time-slot, when a set of users $(u_1, \ldots, u_n)$ visit the POI $p_a$ followed by $p_b$ in the same time-interval. The count ($n$) of GPS footprints satisfying the fact must be greater than a threshold value $\nu$. In other words, if $\geq \nu$ people visit the sequence of POIs in the same time-interval, then it is considered that there is a mobility flow from $p_a$ to $p_b$.

The place $p_b$ can be visited from place $p_a$ in the time-interval $[t_1, t_2]$, where $f_p$ denotes the number of available routes. It may be noted that, due to the lockdown measure, this fact changes frequently for all pairs of places. The connectivity index (CI) of a place $p$ is computed based on the connectivity and the in-flow and out-flow of traffic from that place. A high connectivity index between two places represents higher mobility flow, and thus can be a potent feature for hotspot detection.
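The flow fact can be illustrated with a short Python sketch that counts, per time-slot, the distinct users who visit one POI directly followed by another and keeps the transitions supported by at least $\nu$ users. The input layout `(user, slot, timestamp, place)` and the function name are assumptions made for this example, and the "at least $\nu$" reading of the threshold is one of the two readings the text allows.

```python
from collections import defaultdict


def detect_flows(visits, nu):
    """Return {(slot, p_a, p_b): user_count} for transitions with >= nu users.

    visits: iterable of (user, slot, timestamp, place) tuples (assumed layout).
    """
    # Order each user's visits inside every time-slot.
    per_user_slot = defaultdict(list)
    for user, slot, ts, place in visits:
        per_user_slot[(user, slot)].append((ts, place))

    # Collect the distinct users supporting each consecutive transition.
    support = defaultdict(set)
    for (user, slot), seq in per_user_slot.items():
        seq.sort()
        for (_, pa), (_, pb) in zip(seq, seq[1:]):
            if pa != pb:
                support[(slot, pa, pb)].add(user)

    return {key: len(users) for key, users in support.items() if len(users) >= nu}


# Hypothetical movement log for illustration only.
visits = [
    ("u1", "d1-am", 1, "A"), ("u1", "d1-am", 2, "B"),
    ("u2", "d1-am", 1, "A"), ("u2", "d1-am", 3, "B"),
    ("u3", "d1-am", 1, "A"), ("u3", "d1-am", 2, "C"),
    ("u1", "d1-pm", 5, "B"), ("u1", "d1-pm", 6, "A"),
]
flows = detect_flows(visits, nu=2)
```

Here only the transition from `A` to `B` in slot `d1-am` is supported by two users, so it is the only flow reported; the remaining transitions fall below the threshold.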

Next, we define the cascading pattern ($P_{ca}$) and the co-occurrence pattern ($P_{co}$) extracted from PKG. These patterns can be analysed from the travel history of the persons who are detected COVID-19 positive. Cascading patterns represent events whose instances are located together and occur serially. For instance, analysing the PKG data of a hotspot may reveal that a group of users visits a POI on a day and spends a particular time-duration within a distance threshold. Air-travel of a group of users from a region with a high number of active cases, followed by the emergence of a new hotspot in the destination region within a time-span, is another example of a cascading pattern. $P_{ca}$ also covers patterns such as: a higher number of medical facilities in a city-region, more cases of COVID-19 patient transfer, and the emergence of a hotspot. On the other side, co-occurrence patterns reflect patterns occurring frequently within a spatial range and a temporal span. These patterns comprise properties from different contexts. For example, given three different contexts, namely, population density, aggregate movement pattern and COVID-19 active/new cases (say, within a spatial buffer of 2 km and a time-span of 7 days): $P_{co}: \{\text{population-density} > \delta;\ \text{aggregate movement pattern} > \gamma;\ \text{Hotspot}_{prob} = \text{HIGH}\}$. $P_{co}$ also covers patterns such as: festival/gathering events in a region and the emergence of a new hotspot; or a larger number of location-based service requests (ride-sharing services/food delivery services) and hotspot detection in the same region within a temporal bound. Algorithm 1 shows the steps of extracting cascading and co-occurrence patterns from PKG, which in turn helps in extracting the knowledge of disease spread and finding the possible future hotspot zones. The algorithm needs a threshold value called the pattern participation index (PI) to prune the irrelevant patterns.
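The co-occurrence example $P_{co}$ can be read as a rule over per-region records. The Python sketch below checks the two context conditions and measures how often they coincide with a HIGH hotspot probability. The record fields, the sample values and the simple ratio are assumptions for illustration; the ratio is a stand-in and not the participation index used by the paper.

```python
def matches_context(region, delta, gamma):
    """Context part of P_co: dense population and high aggregate movement."""
    return region["population_density"] > delta and region["movement"] > gamma


def cooccurrence_ratio(regions, delta, gamma):
    """Share of matching regions whose hotspot probability is HIGH."""
    matching = [r for r in regions if matches_context(r, delta, gamma)]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r["hotspot_prob"] == "HIGH")
    return hits / len(matching)


# Hypothetical records, each assumed to be already aggregated over one
# spatial buffer (e.g. 2 km) and one time-span (e.g. 7 days).
regions = [
    {"population_density": 12000, "movement": 0.8, "hotspot_prob": "HIGH"},
    {"population_density": 15000, "movement": 0.9, "hotspot_prob": "HIGH"},
    {"population_density": 11000, "movement": 0.7, "hotspot_prob": "LOW"},
    {"population_density": 3000, "movement": 0.9, "hotspot_prob": "HIGH"},
]
ratio = cooccurrence_ratio(regions, delta=10000, gamma=0.5)
```

Three of the four sample regions satisfy the context conditions and two of those are hotspots, so a threshold on this ratio would decide whether the pattern is kept or pruned.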

Here, MF represents the mobility facts and events present in the PKG. PI measures the conditional probability of a pattern given the possible events/facts of a similar type. It is also a useful measure for predicting the near-future occurrence of a pattern in the spatio-temporal proximity of an observed instance of a participating event type. We set two threshold values, $PI_1$ and $PI_2$, for pruning the irrelevant patterns from the PKG. The neighbor spatio-temporal (ST) relation represents the temporal and spatial constraints on finding the co-occurrence patterns. In brief, the algorithm first extracts the plausible candidates from the PKG and then applies a pruning technique to eliminate irrelevant patterns.

Construction of the Pandemic-Knowledge graph consists of the following steps: (i) In the first step, the POIs and users, along with their attributes, are extracted and stored. This information is discovered through a geo-tagging step and by segmenting the trajectories of individuals at different time-scales. (ii) Next, the links between the entities (or facts) at different time instances are discovered by a knowledge graph embedding approach. Relations such as visit, group and flow are defined. The movement log is analysed to measure the plausibility of any such fact. The facts are checked over different temporal instances, since the facts or relations of the knowledge graph change over time. The backbone SDI helps in sharing heterogeneous data, maintaining the authenticity and integrity of the data used to build the PKG, and facilitating services seamlessly to the user.

Algorithm 1
Genpat(i) ← $P_{ca}$ of length (i − 1)  ▷ Extract new patterns of length i from patterns of length (i − 1)
temp ← ST-Join($P_{ca}$[i], NR)  ▷ Perform spatio-temporal join with instances from the direct neighbor relation
6: if checkThreshold(temp) ≥ $PI_1$ then
7:   ▷ Append the new pattern if it satisfies the threshold condition
8: end if
9: end for
10: for t = 1 to (n − 1) do
11:   te ← GenCand(PKG(MF, NR))  ▷ Generate candidates from the PKG using possible facts/events and the neighbor relation
12:   ▷ Sweep-line based method to extract events satisfying the ST-property and threshold
13: end for
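The pruning step can be illustrated with a short sketch. The text does not give a formula for PI, so the code below assumes the participation index commonly used in co-location pattern mining: for each event type in a pattern, the fraction of that type's instances that take part in at least one pattern instance, with PI being the minimum of these fractions. The function names, the data layout and the example counts are hypothetical.

```python
from collections import defaultdict


def participation_index(instances, totals):
    """Assumed PI: min over event types of (participating instances / all instances).

    instances: list of dicts {event_type: instance_id}, one dict per pattern instance.
    totals:    dict {event_type: total number of instances of that type}.
    """
    seen = defaultdict(set)
    for instance in instances:
        for event_type, instance_id in instance.items():
            seen[event_type].add(instance_id)
    if not seen:
        return 0.0
    return min(len(ids) / totals[event_type] for event_type, ids in seen.items())


def prune_patterns(candidates, totals, pi_threshold):
    """Keep only the candidate patterns whose PI reaches the threshold."""
    kept = {}
    for name, instances in candidates.items():
        pi = participation_index(instances, totals)
        if pi >= pi_threshold:
            kept[name] = pi
    return kept


# Hypothetical event counts and candidate patterns.
totals = {"gathering": 4, "hotspot": 2, "ride": 10}
candidates = {
    "gathering+hotspot": [
        {"gathering": "g1", "hotspot": "h1"},
        {"gathering": "g2", "hotspot": "h1"},
    ],
    "ride+hotspot": [{"ride": "r1", "hotspot": "h2"}],
}
relevant = prune_patterns(candidates, totals, pi_threshold=0.5)
```

With these counts, the first pattern is kept (PI = 0.5) and the second is pruned (PI = 0.1).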

Architecture to find out Hotspots

"You cannot fight a fire blindfolded. And we cannot stop this pandemic if we don't know who is infected." 12

As long as no vaccine is available to curb the pandemic, identifying the affected persons and restricting mobility to reduce human-to-human transmission is the only viable solution. Analyzing the spatio-temporal pattern of the disease outbreak is an integral part of pandemic control: it helps in identifying spatio-temporal hotspots (disease emerging areas), and subsequently assists the planning of emergency measures for monitoring, surveillance and prevention of the disease spread. Most governments have enforced partial or complete lockdowns to prevent the disease spread. However, a complete lockdown has a negative impact on the socio-economic condition of a country, and a more feasible solution is required, such as identifying high-risk zones (with a large number of affected persons) and imposing lockdown in those regions. In such a scenario, it is necessary to find the correlation between the disease spread and additional information from the spatio-temporal context, for instance, whether the disease spread is related to population density, demography data etc. Moreover, whether specific ...

12 WHO Director-General's opening remarks at the media briefing on COVID-19, dated March 16, 2020

The primary objective of our analytics module is to identify the high-risk areas of disease spread in a region. We aim to find out the correlation of contextual information such as mobility with the probable spread of the disease.

Of late, deep learning has gained significant research interest for exploiting the correlations between related tasks and improving classification accuracy by jointly learning more than one task. Inspired by this paradigm, STOPPAGE uses a deep learning module to predict the hotspot areas. Note that in our problem set-up the mobility patterns have a significant impact, and learning the representation of these contextual variables is one of the important aspects of our deep learning architecture. Fig. 5 illustrates the recurrent architecture of the proposed deep learning module. In our framework, we have considered the following features: The aggregated mobility report is collected from the Google Community Mobility Report 15 . The individual mobility history is collected from the users' smart devices and the Google Maps timeline of individuals 16 in the study region.

All of this information is managed using an efficient storage method. The region is segmented into grids, and each fog device stores the information of all the grids within its coverage using a hashing scheme [34] . Thus, the information is segregated into hierarchical segments, and data extraction becomes efficient.
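As an illustration of grid-based hashed storage, the following minimal sketch maps a coordinate to a grid cell and keeps the records of each cell in a hash table. It is not the scheme of [34], whose details are not given here; the class name, the degree-based cell size and the sample coordinates are assumptions made only for the example.

```python
import math


class FogGridStore:
    """Hypothetical per-fog-device store that hashes records by grid cell."""

    def __init__(self, origin_lat, origin_lon, cell_deg):
        self.origin_lat = origin_lat
        self.origin_lon = origin_lon
        self.cell_deg = cell_deg      # cell edge length in degrees (assumed unit)
        self._cells = {}              # Python dict: hash table keyed by (row, col)

    def cell_of(self, lat, lon):
        """Discretize a continuous location into a (row, col) grid id."""
        row = math.floor((lat - self.origin_lat) / self.cell_deg)
        col = math.floor((lon - self.origin_lon) / self.cell_deg)
        return (row, col)

    def put(self, lat, lon, record):
        self._cells.setdefault(self.cell_of(lat, lon), []).append(record)

    def get(self, lat, lon):
        """Return all records stored for the cell containing (lat, lon)."""
        return list(self._cells.get(self.cell_of(lat, lon), []))


store = FogGridStore(origin_lat=22.0, origin_lon=88.0, cell_deg=0.5)
store.put(22.57, 88.36, {"new_cases": 12})
store.put(22.60, 88.40, {"new_cases": 7})   # falls in the same cell
store.put(22.20, 88.90, {"new_cases": 3})   # falls in a different cell
```

A lookup then touches only the bucket of one cell, which is what makes extraction for a given grid cheap.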

Next, it is important to find out the spatio-temporal correlation (SC) factor of the COVID-19 spread in neighbouring regions, which has been computed based on Moran's I [35] as follows:

Here, $v_a$ represents the number of events/incidences (here, the number of new cases reported) observed in region $a$, $\bar{v}$ is the mean of the newly confirmed cases over the whole region, and $o$ is the total number of observations. $w_{ab}$ represents the spatial adjacency between $a$ and $b$. We have used different types of adjacency relations instead of only a distance measure. Direct spatial adjacency is defined as follows: (1) two regions share a common border; (2) a direct route is present to reach $b$ from $a$. Next, we sort the regions based on their (3) population density, (4) literacy rate, (5) available medical facilities, and (6) aggregate movement flow. For each of these variables, we compute $SC$ with a region being adjacent to its previous-ranked and next-ranked regions. We compute these values at different time-spans (specifically, in each week of the study period).

$$SC \begin{cases} = 0, & \text{no spatial autocorrelation} \\ > 0, & \text{positive spatial autocorrelation} \\ < 0, & \text{negative spatial autocorrelation} \end{cases} \qquad (8)$$

As shown in equation (8), an $SC$ value of 0 represents no spatial autocorrelation. Similarly, a larger absolute $SC$ value demonstrates stronger spatial autocorrelation. This information ($SC$ on six different adjacency metrics) is used in the deep learning module to efficiently predict the next possible hotspot zones.
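The computation of $SC$ can be sketched as follows, assuming it takes the standard form of Moran's I, $SC = \frac{o}{\sum_a \sum_b w_{ab}} \cdot \frac{\sum_a \sum_b w_{ab}(v_a - \bar{v})(v_b - \bar{v})}{\sum_a (v_a - \bar{v})^2}$, with the symbols defined above. The helper `rank_adjacency` is a hypothetical reading of the rank-based adjacency used for metrics (3) to (6); the sample values are invented.

```python
def morans_i(values, weights):
    """Standard Moran's I (assumed form of SC) for region values and adjacency weights."""
    o = len(values)
    mean = sum(values) / o
    dev = [v - mean for v in values]
    w_sum = sum(weights[a][b] for a in range(o) for b in range(o))
    num = sum(weights[a][b] * dev[a] * dev[b] for a in range(o) for b in range(o))
    den = sum(d * d for d in dev)
    if w_sum == 0 or den == 0:
        return 0.0  # undefined in theory; treated here as "no autocorrelation"
    return (o / w_sum) * (num / den)


def rank_adjacency(scores):
    """Adjacency where each region neighbours its previous- and next-ranked region."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    w = [[0] * n for _ in range(n)]
    for pos in range(n - 1):
        a, b = order[pos], order[pos + 1]
        w[a][b] = w[b][a] = 1
    return w


# Four regions in a chain (0-1, 1-2, 2-3) with invented weekly new-case counts.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
sc_clustered = morans_i([1, 2, 3, 4], chain)      # similar neighbours, SC > 0
sc_alternating = morans_i([10, 0, 10, 0], chain)  # dissimilar neighbours, SC < 0
```

Computing one such value per adjacency definition and per week yields the six $SC$ inputs described above.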

Specifically, a trajectory segment/movement history of an individual records the consecutive stay-points in the path of the user, which are continuous spatial locations. Each grid is identified by a unique id; thus, we discretize the continuous spatial location information. We use the skip-gram model to learn the representation of the spatial locations $(l_1, l_2, \dots, l_T)$ in the training dataset:

$$\frac{1}{T}\sum_{t=1}^{T} \log\left[p(l_{t-c}, \dots, l_{t-1}, l_{t+1}, \dots, l_{t+c} \mid l_t)\right] \qquad (9)$$

which can be written as:

where context: $-c \le j \le c,\ j \ne 0$

where $l_{t+j}$ is a neighboring location of the present location $l_t$. Based on the spatial proximity rule that two neighboring locations have similar representations, we use the softmax function as defined:
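A minimal sketch of the skip-gram objective with a softmax over locations is given here for illustration. It assumes the standard skip-gram formulation, in which $p(l_{t+j} \mid l_t)$ is a softmax over dot products between the vector of the current location and the context vectors of all $N$ locations; `in_vecs` and `out_vecs` are hypothetical names for those two embedding tables.

```python
import math


def softmax_prob(center_vec, out_vecs, target):
    """p(target | center): softmax over dot products with every location's context vector."""
    scores = [sum(c * x for c, x in zip(center_vec, v)) for v in out_vecs]
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[target] / sum(exps)


def skipgram_objective(trajectory, in_vecs, out_vecs, c):
    """Average log-probability of the locations within a window of size c around each l_t."""
    T = len(trajectory)
    total = 0.0
    for t, loc in enumerate(trajectory):
        for j in range(-c, c + 1):
            if j == 0 or not (0 <= t + j < T):
                continue                       # skip l_t itself and out-of-range positions
            total += math.log(softmax_prob(in_vecs[loc], out_vecs, trajectory[t + j]))
    return total / T


# Three grid ids with untrained (all-zero) 2-d embeddings: every location is equally likely.
zeros = [[0.0, 0.0] for _ in range(3)]
objective = skipgram_objective([0, 1, 2], zeros, zeros, c=1)
```

Training would adjust the embeddings to maximize this objective, so that grids visited close together in a trajectory obtain similar vectors.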

Thus, the embedding layer converts the spatial locations into the representation of equation (12), where $N$ is the number of regions. Next, we embed the temporal information. The representation of the temporal information comprises the day of the week, the timestamp and the time duration spent in a location. Here, the skip-gram model is not efficient, so we use the paragraph-vector model to obtain the vector representation of the temporal information. After embedding the spatial and temporal features, we deploy a bidirectional LSTM layer to capture the shared information of both tasks. The basic building block of the LSTM layer is as follows:

where the input, output, and forget gates are represented by $i$, $o$ and $f$, respectively. The hidden representation is denoted by $h_t$. Since STOPPAGE uses a bidirectional LSTM, the output of the layer is modified as:

Here, the outputs of the forward and backward propagation layers are represented by $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, respectively. The next layer utilizes a gated recurrent unit (GRU), which is similar to the LSTM. This layer is used for extracting the impact of the other contexts such as demography, age and aggregated movement. The GRU has a reset gate and an update gate, which are formally defined as:

where $W_z$, $W_r$ and $W_h$ are weight matrices. The update gate ($z_t$) helps to extract the required information from the past timestep and pass it to the future. The reset gate ($r_t$), on the other hand, determines how much past information needs to be eliminated. Therefore, in the proposed model, the GRU is capable of filtering and storing information using the reset and update gates. Here, we use specific information such as $SC$, $P_{ca}$ and $P_{co}$. This is crucial for capturing the different phases of the disease spread, since the GRU layer mitigates the vanishing gradient problem while passing the relevant information on to the next steps of the network. Note that the proposed approach uses a deep LSTM architecture, which allows the network to learn at different time scales over the input. Furthermore, deep architectures can make better use of parameters by distributing them over the space through multiple layers.
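For reference, one GRU step can be sketched as below. The sketch assumes the common formulation $z_t = \sigma(W_z[h_{t-1}, x_t])$, $r_t = \sigma(W_r[h_{t-1}, x_t])$, $\tilde{h}_t = \tanh(W_h[r_t \odot h_{t-1}, x_t])$, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; bias terms are omitted because only $W_z$, $W_r$ and $W_h$ are named above, and the sign convention of the last equation varies between references.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU update under the assumed formulation (no bias terms)."""
    concat = h_prev + x_t                              # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(W_z, concat)]      # update gate
    r = [sigmoid(a) for a in matvec(W_r, concat)]      # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in matvec(W_h, gated)]
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


# Hidden size 2, input size 1; all-zero weights give z = r = 0.5 and a zero candidate state.
W_zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_next = gru_step([1.0], [0.4, -0.2], W_zero, W_zero, W_zero)
```

In this degenerate case the new state is simply half of the previous state, which makes the roles of the two gates easy to check by hand.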

Another important module of the architecture is the attention mechanism. It is used to capture the relationship between all of these contextual factors and hotspots at different spatio-temporal resolutions. Here, we have used the dot-product attention function $f_{att}$, and the representation is defined as:

Here, $m$ is the vector representation of the spatio-temporal features from the previous layer. The GRU input layer is replaced by the weighted representation ($r_{att}$). On the other hand, the air travel connectivity index is utilized for the initial phase of the disease spread; here, the attention layer gives more weight to long-distance travel and connectivity between regions. For both phases of the disease spread, the final layer is a softmax layer into which the outputs of the different layers are fed.
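A dot-product attention step of this kind can be sketched as follows. The sketch assumes the usual form: a score for each feature vector in $m$ is its dot product with a query vector, the scores are normalized with a softmax, and $r_{att}$ is the resulting weighted sum. The query vector is a hypothetical stand-in for whatever learned context the model uses.

```python
import math


def dot_attention(query, m):
    """Return (attention weights, weighted representation r_att) for feature rows m."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in m]
    top = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(m[0])
    r_att = [sum(a * row[k] for a, row in zip(alphas, m)) for k in range(dim)]
    return alphas, r_att


features = [[1.0, 0.0], [0.0, 1.0]]
uniform_alphas, uniform_r = dot_attention([0.0, 0.0], features)   # no preference
focused_alphas, focused_r = dot_attention([10.0, 0.0], features)  # favours the first row
```

A query aligned with one feature row concentrates almost all of the weight on that row, which is the behaviour described for long-distance travel in the initial phase.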

The network is trained to minimize the cross-entropy loss as:

The number of training samples and the number of classes are represented by $TS$ and $Cl$, respectively. We define four classes here: (i) > 20 cases within 500 m; (ii) > 50 cases within 1 km; (iii) < 5 cases within 1 km; and (iv) < 10 cases within 2 km. We define (i) and (ii) as hotspot zones. $y$ is the ground truth and $\hat{y}$ is the prediction probability.
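The loss can be sketched as below, assuming the usual categorical cross-entropy $-\frac{1}{TS}\sum_{s=1}^{TS}\sum_{k=1}^{Cl} y_{s,k}\log \hat{y}_{s,k}$; whether the original equation averages over the $TS$ samples is an assumption. Class indices 0 to 3 stand for classes (i) to (iv), and the sample predictions are invented.

```python
import math

HOTSPOT_CLASSES = {0, 1}   # classes (i) and (ii) are treated as hotspot zones


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over TS samples and Cl classes (assumed form)."""
    ts = len(y_true)
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        for y, p in zip(truth, pred):
            total -= y * math.log(max(p, eps))   # eps guards against log(0)
    return total / ts


def is_hotspot(pred):
    """A region is flagged as a hotspot if its most probable class is (i) or (ii)."""
    return max(range(len(pred)), key=lambda k: pred[k]) in HOTSPOT_CLASSES


y_true = [[1, 0, 0, 0], [0, 0, 1, 0]]
y_pred = [[0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
loss = cross_entropy(y_true, y_pred)
```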

It may be noted that, in the MR feature analysis, we also include the information of whether any neighboring region was a containment zone/hotspot zone (high infection rate) in the last 14 days. This information is stored in the respective fog devices of the region. Thus, the architecture encodes and learns different mobility semantics and other parameters in varied contexts, and determines whether the region is the next hotspot zone.

In this section, we show how STOPPAGE can be utilized to combat a pandemic situation. We discuss two use cases, and how the data analytics engine and SDI of STOPPAGE can be utilized in such applications. An Android application, along with API endpoints, has been developed to retrieve and share information. We also report delay and power consumption to illustrate the feasibility of STOPPAGE in terms of latency and energy consumption.

It is also observed that the risk of infection spread depends on other factors as well. For instance, the probability of infecting other people is much higher when an asymptomatic person is in a workplace or shopping mall than in a less crowded park or playground. The duration of contact with other persons is also an important factor. Therefore, STOPPAGE can use the individual mobility history to find suspected cases from the PKG analysis. If any user tests COVID-19 positive, the information is sent to the nearby fog device and, based on the PKG analysis, all other users are notified. Alongside, using the reverse pruning technique, we can also find asymptomatic persons by analysing the mobility data. Analysing and assisting people in real time is the major challenge here. The fog devices keep track of a particular locality and assist users within that region. Since the dataset is huge, any compute-intensive task is carried out in the distant cloud servers. However, since the fog nodes store the results, the communication delay is reduced.

Home

In this crucial time, it may not be possible to clinically test everyone. However, preliminary testing can be done at home. The basic health parameter values such as body temperature, blood pressure (systolic and diastolic), pulse rate and SPO2 (oxygen saturation) level can be collected using the BAN and sent to the smart phone of the user. In the case of COVID-19, the common symptoms are fever, abnormality in breathing, cough etc. Here, fever and breathing abnormality can be predicted from the collected body temperature, pulse rate and SPO2 level. Smart phones already have a GPS (Global Positioning System) tracking module and modules for detecting the environmental temperature and humidity. The smart phone can therefore check whether the collected values fall in the normal range with respect to the user's health profile and contextual information (location, motion and acceleration, environmental temperature and humidity). If the collected values fall in the normal range, the smart phone shows that the health status is normal. Let the normal range of a health parameter $hp$ be $hp^{low}_u$ to $hp^{up}_u$, i.e. the upper value is $hp^{up}_u$ and the lower value is $hp^{low}_u$ for a user $u$ with respect to his/her health profile and contextual information, where $hp \in H$ and $H$ is the set of health parameters. Let $hp^{cl}_u$ be the collected value of the health parameter $hp$ for the user $u$. If $hp^{up}_u \ge hp^{cl}_u \ge hp^{low}_u$, then the collected health parameter value lies in the normal range. This check is done for all health parameters considered in the experiment. If $hp^{cl}_u$ falls in the normal range for all $hp \in H$, then the status is predicted as normal. Otherwise, the predicted health status is abnormal, and the smart phone shows an alert message and sends the collected data, with the user's health profile and contextual information, to the cloud. The health care centre of the region to which the user belongs accesses the data and contacts the user for further check-up. In this way, a preliminary health monitoring service can be provided to the user.

Figure 6: MyHealth: Android app for health monitoring
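The range check described above can be sketched in a few lines. The parameter names and numeric ranges below are illustrative placeholders only, not clinical guidance and not values from the paper; in the described system the ranges would come from the user's health profile and contextual information.

```python
def out_of_range(collected, normal_ranges):
    """Return the health parameters whose collected value lies outside [low, up]."""
    abnormal = []
    for hp, value in collected.items():
        low, up = normal_ranges[hp]
        if not (low <= value <= up):
            abnormal.append(hp)
    return abnormal


def predict_status(collected, normal_ranges):
    """'normal' only if every collected parameter is within its normal range."""
    return "normal" if not out_of_range(collected, normal_ranges) else "abnormal"


# Hypothetical per-user ranges (low, up).
ranges_u = {
    "body_temperature_c": (36.1, 37.5),
    "pulse_rate_bpm": (60, 100),
    "spo2_percent": (95, 100),
}
reading_ok = {"body_temperature_c": 36.8, "pulse_rate_bpm": 72, "spo2_percent": 98}
reading_alert = {"body_temperature_c": 38.6, "pulse_rate_bpm": 110, "spo2_percent": 93}
status_ok = predict_status(reading_ok, ranges_u)
status_alert = predict_status(reading_alert, ranges_u)
```

An "abnormal" result is the point at which the app would raise the alert and forward the data.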

We have designed an Android application (app) called MyHealth, which can be used for personal health status checking. The collected health data, profile and contextual information are processed, and if the user's health status seems to be abnormal (e.g. body temperature is high, pulse rate is abnormal, SPO2 level is low), then the user is advised to stay at home, and medical personnel will contact him/her soon for further medical tests (refer to Fig. 6 ). This provides faster health care provisioning than the traditional system, where the user always has to contact the medical staff himself/herself for further medical tests.

Let the total delay in collecting the health parameter data and transmitting it to the smart phone be $Delay_h$, and the delay in collecting contextual information be $Delay_c$. The health profile is already present inside the smart phone. As the contextual information is collected while the smart phone receives health parameter data from the BAN, the total delay in collecting and accumulating health data with profile and contextual information is given as:

where $Delay_{mob}$ is the delay in data accumulation.

Let the amount of data processed inside the smart phone be $D_{mob}$, inside the fog device be $D_f$, and inside the cloud be $D_c$, and let the processing speeds of the smart phone, fog device and cloud be $S_{mob}$, $S_f$ and $S_c$, respectively. Then the delay in processing the data is given as,

The total delay in predicting health status is given as,

$$Delay_{tot} = Delay_{ca} + Delay_{com} + Delay_{pro} \qquad (22)$$

The delay in predicting health status in the proposed system and in the cloud-only system is presented in Fig. 7 . It is observed that the proposed health care system reduces the delay by approximately 21-60% compared with the cloud-only health care system. In the cloud-only system the communication delay is higher, whereas in the proposed system the intermediate nodes participate in data processing, which in turn reduces the communication delay. Consequently, the total delay is reduced in the proposed system.
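The delay model can be sketched as follows. Only equation (22) is taken directly from the text. The collection delay is assumed to be $\max(Delay_h, Delay_c) + Delay_{mob}$, which matches the simultaneous collection described above, and the processing delay is assumed to be the sum of data amount divided by processing speed over the smart phone, fog device and cloud. The communication delay is passed in as a number because its definition is not reproduced here; all numeric values are invented.

```python
def collection_delay(delay_h, delay_c, delay_mob):
    """Assumed Delay_ca: health and context data arrive in parallel, then are accumulated."""
    return max(delay_h, delay_c) + delay_mob


def processing_delay(workloads):
    """Assumed Delay_pro: sum of D / S over (data amount, processing speed) pairs."""
    return sum(d / s for d, s in workloads)


def total_delay(delay_ca, delay_com, delay_pro):
    """Equation (22): Delay_tot = Delay_ca + Delay_com + Delay_pro."""
    return delay_ca + delay_com + delay_pro


# Invented example: delays in ms, data in KB, speeds in KB/ms.
d_ca = collection_delay(delay_h=2.0, delay_c=5.0, delay_mob=1.0)
d_pro = processing_delay([(100, 50), (400, 100), (0, 1000)])  # phone, fog, cloud
d_tot = total_delay(d_ca, delay_com=3.0, delay_pro=d_pro)
```

Shifting work from the cloud entry to the fog entry changes both the processing term and, in practice, the communication term, which is the trade-off compared in Fig. 7.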

In this module, the smart phone is used as an edge device that collects, accumulates and sends health and context related information. In this context, it is important to measure and optimize the power consumption of the mobile device. Let the power consumption of the smart phone in data transmission, reception, active and idle modes be $P_t$, $P_r$, $P_a$ and $P_i$, respectively.

The power consumption of the smart phone during the data collection and accumulation period is given as,

$$P_{ca} = (P_a \cdot Delay_{mob}) + (P_r \cdot \max(Delay_h, Delay_c)) \qquad (23)$$
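Equation (23) translates directly into code; the sketch below only restates it, with invented input values. Since each term multiplies a power by a duration, the result is expressed in units of power times time.

```python
def collection_power(p_a, p_r, delay_h, delay_c, delay_mob):
    """Equation (23): active-mode cost of accumulation plus reception cost of collection."""
    return (p_a * delay_mob) + (p_r * max(delay_h, delay_c))


# Invented values: powers in W, delays in s.
p_ca = collection_power(p_a=2.0, p_r=1.0, delay_h=3.0, delay_c=5.0, delay_mob=4.0)
```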

The power consumption of the smart phone during data processing period is given as,

The power consumption of the smart phone in the proposed system and in the cloud-only system is presented in Fig. 8 . As the intermediate nodes participate in data processing in the proposed system, the delay is reduced, and consequently the power consumption of the smart phone is also reduced. It is observed that the proposed system reduces the power consumption of the smart phone by approximately 19-50% compared with the cloud-only system. In the conventional strategy followed in most developing countries, the user reads the parameter values and answers questions regarding health monitoring over a voice call or message; the health status is then predicted from these answers, and if it seems abnormal, the user is asked to visit a health centre for further check-up. This whole procedure takes considerable time. With mobile health monitoring, the latency is much lower than in the conventional health monitoring system. Moreover, as data collection and processing are carried out entirely by the device, the accuracy and reliability of the prediction are also higher. In addition, the user can do this from home; there is no need to visit the health centre for a preliminary check-up unless an emergency occurs. This in turn helps to avoid unnecessary gatherings and to maintain social distancing. Thus, we strongly recommend the use of mobile health monitoring in such situations to avoid unnecessary delay and gathering.

In this section, we present the implementation details of the proposed framework, STOPPAGE. We have used the Compute Engine and App Engine of Google Cloud Platform (GCP) to carry out the spatio-temporal data analysis. In the test-bed, we have used a Raspberry Pi 3 as the fog device. We have also designed an Android application, using Android Studio 4.1 with Firebase database support, which collects the location, acceleration, proximity, temperature and light sensor data from the smartphone's in-built sensors using the Android sensor framework. The application is also capable of communicating with wearable devices such as a smart-watch (Fitbit). On the Raspberry Pi 3, we have installed the Eddystone Bluetooth Beacon for sending data periodically.

For training the deep learning module, we have used the popular Adam algorithm to update the network weights iteratively and optimize the parameters. The run-time (training time) of the model is 86 minutes for all five regions for 100 rounds (epochs). We also observe the variation in performance (accuracy@5 for hotspot prediction) of the model with different cell sizes and batch sizes, and obtain the best results with cell size 64 and batch size 10.

For evaluating the spatio-temporal analysis, we have implemented the methods in a GCP VM and the QGIS framework. The overall mobility change in India is shown in Fig. 10 , which also shows the phase-wise lockdown and unlock measures taken by the government. Based on the overall situation of the disease spread, we have evaluated the prediction in two phases: (i) the initial phase of the disease spread (up to unlock 1.0) and (ii) the next phase (till mid-December). The accuracy of predicting the hotspots is shown in Fig. 9 . For this prediction, we have considered West Bengal, a state of India, as the study region. We have executed the proposed method on five selected regions of the state where we have access to the individual mobility history of the residents. The precision and recall values capture the efficacy of STOPPAGE in predicting the boundary of the hotspot region. The ground-truth data has been collected from the state government dashboard 17 on a daily basis. It is observed that the precision and recall in both phases of the disease spread are quite high: STOPPAGE can identify the hotspot areas a priori with more than 85% precision and recall. Table 2 presents the accuracy measure of STOPPAGE along with the baseline methods, namely KNN, Decision Tree (DT), SVM, CNN [37] and ST-RNN [38] . The parameter for KNN is selected as 5. We have chosen the radial basis function (RF) as the activation function in NN. A linear kernel is selected for SVM. The results of different runs are captured, and the average precision, recall and accuracy measures are reported. We have performed an ablation study of the deep learning architecture of STOPPAGE to demonstrate the significance of each of the modules. Table 2 shows the comparison values of the baselines and the ablation result. It is observed that STOPPAGE outperforms all other baselines by a significant margin. The ablation study shows that the architecture performs better than the other set-ups considered. This shows the novelty and the significance of the deep learning module of STOPPAGE in the COVID-19 context. The Android application can be used to monitor the current pandemic status of the country and to assist the users. It retrieves the predicted hotspot areas from the nearest fog device and sends notifications to the users within the predicted hotspots.

17 https://wb.gov.in/containment-zones-in-west-bengal.aspx

Figure 11: Accuracy of PKG
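For concreteness, precision and recall of a hotspot prediction can be computed as in the sketch below. It assumes that predictions and ground truth are compared as sets of grid ids, which is one plausible reading of the evaluation; the grid ids are invented.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted hotspot grids against ground-truth hotspot grids."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example: four grids predicted as hotspots, five reported on the dashboard.
precision, recall = precision_recall(
    predicted={"g1", "g2", "g3", "g4"},
    actual={"g2", "g3", "g4", "g5", "g6"},
)
```

Precision penalizes grids flagged wrongly, while recall penalizes real hotspot grids that were missed, so together they describe how well the predicted boundary matches the reported one.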

One of the major contributions of this paper is constructing the Pandemic-Knowledge graph (PKG) and summarizing the information in the knowledge graph structure. We have implemented the PKG with a real-life dataset and compared the execution time needed to extract relations or facts for different numbers of entities, ranging from 1000 to 50,000. The performance of the PKG is compared with a baseline knowledge graph [33] and a temporal knowledge graph [36] . The execution time to extract the facts or relations is shown in Table 3. It has been observed that our proposed framework STOPPAGE has a lower execution time than the existing approaches, which indicates that STOPPAGE provides better time-efficiency. Fig. 11 shows the accuracy measures for two tasks, namely relation resolution and entity resolution. Relation resolution finds facts or relations in the knowledge graph, and entity resolution refines the entity information. In the temporal knowledge graph [36] , the authors present a deep learning based framework that combines the evolving entities and their dynamically changing relationships over time, and represent this as knowledge evolution. However, in the context of COVID-19, this approach does not work well. It is observed that the baseline and temporal KGs achieve accuracy in the range 0.42-0.72, while STOPPAGE achieves accuracy in the range 0.91-0.93. Therefore, it is observed empirically that STOPPAGE outperforms the existing approaches by a large margin in terms of accuracy and execution time.

In this paper, we have presented an efficient pandemic management and monitoring framework, namely STOPPAGE, which leverages the Internet of Spatio-Health Things, a Cloud/Fog/Edge/IoT based efficient SDI, to overcome the various present challenges due to the outbreak of COVID-19. Specifically, we have presented a novel spatio-temporal data analytics method using deep learning to augment varied contextual information and identify probable hotspots for taking preventive measures. A novel pandemic knowledge graph (PKG) is proposed to find the correlations between the disease outbreak and semantic spatio-temporal information. STOPPAGE presents varied types of movement semantics (cascading and co-occurrence patterns) and spatio-temporal correlation with the COVID-19 spread, which are potent features for predicting the next hotspot zones. The proposed fog/edge based architecture outperforms existing approaches in terms of delay, power consumption and accuracy. It is shown that the SDI is one of the major components in combating the pandemic situation. In the future, we will explore the possibility of incorporating climatology data to enhance the accuracy of STOPPAGE. STOPPAGE facilitates a spatio-temporal data-driven analytics engine running over an SDI backbone, which is capable of extracting meaningful insights from the available data using technologies such as IoT and mobility analytics in a spatial context. In future, we will also extend STOPPAGE by incorporating domain knowledge from medical and epidemiology experts to make the system more robust and reliable in combating any type of pandemic situation efficiently.

[1] Personalized recommendation system based on collaborative filtering for IoT scenarios
[2] Ubiquitous data accessing method in IoT-based information system for emergency medical services
[3] Application management in fog computing environments: A taxonomy, review and future directions
[4] Cholera, chloroform, and the science of medicine: a life of John Snow, Medicine
[5] Internet of spatial things: A new reference model with insight analysis
[6] COVID-19 literature knowledge graph construction and drug repurposing report generation
[7] CO-Search: COVID-19 information retrieval with semantic search, question answering, and abstractive summarization
[8] COVID-19 knowledge graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology
[9] Coronavirus knowledge graph: A case study
[10] KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response
[11] The effect of human mobility and control measures on the COVID-19 epidemic in China
[12] COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown
[13] Efficient suspected infected crowds detection based on spatio-temporal trajectories
[14] Staying at home: mobility effects of COVID-19
[15] Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study, The Lancet Infectious Diseases
[16] B5G and explainable deep learning assisted healthcare vertical at the edge: COVID-19 perspective
[17] Examining COVID-19 forecasting using spatio-temporal graph neural networks
[18] Towards an efficient deep learning model for COVID-19 patterns detection in X-ray images
[19] Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing
[20] Anonymous and secure aggregation scheme in fog-based public cloud computing
[21] A privacy-preserving mobile and fog computing framework to trace and prevent COVID-19 community transmission
[22] A drone-based networked system and methods for combating coronavirus disease (COVID-19) pandemic, Future Generation Computer Systems (FGCS)
[23] Femtolet: A novel fifth generation network device for green mobile cloud computing
[24] Internet of health things: Toward intelligent vital signs monitoring in hospital wards
[25] Online heart monitoring systems on the internet of health things environments: A survey, a reference model and an outlook
[26] Effective fine-grained location prediction based on user check-in pattern in LBSNs
[27] Human mobility in opportunistic networks: Characteristics, models and prediction methods
[28] Real-time GPS track simplification algorithm for outdoor navigation of visually impaired
[29] Mobi-IoST: mobility-aware cloud-fog-edge-IoT collaborative framework for time-critical applications
[30] IoMT amid COVID-19 pandemic: Application, architecture, technology, and security
[31] On the responsible use of digital data to tackle the COVID-19 pandemic
[32] Citizens as voluntary sensors: spatial data infrastructure in the world of Web 2.0, International Journal of Spatial Data Infrastructures Research
[33] Knowledge graph embedding based question answering
[34] MARIO: A spatio-temporal data mining framework on Google Cloud to explore mobility dynamics from taxi trajectories
[35] Notes on continuous stochastic phenomena
[36] Know-Evolve: deep temporal reasoning for dynamic knowledge graphs
[37] A convolutional neural network approach for modeling semantic trajectories and predicting future locations
[38] Predicting the next location: A recurrent model with spatial and temporal contexts

The work is partially supported by DST Research Grant. The authors are thankful to the local medical authority and stakeholders for supporting this work.