key: cord-0145445-oxokbyqz authors: Cao, Yang; Takagi, Shun; Xiao, Yonghui; Xiong, Li; Yoshikawa, Masatoshi title: PANDA: Policy-aware Location Privacy for Epidemic Surveillance date: 2020-05-01 journal: nan DOI: nan sha: a1aeff34b793834e144b4702a99c7709a45a02c7 doc_id: 145445 cord_uid: oxokbyqz In this demonstration, we present a privacy-preserving epidemic surveillance system. Recently, many countries that suffer from coronavirus crises attempt to access citizen's location data to eliminate the outbreak. However, it raises privacy concerns and may open the doors to more invasive forms of surveillance in the name of public health. It also brings a challenge for privacy protection techniques: how can we leverage people's mobile data to help combat the pandemic without scarifying our location privacy. We demonstrate that we can have the best of the two worlds by implementing policy-based location privacy for epidemic surveillance. Specifically, we formalize the privacy policy using graphs in light of differential privacy, called policy graph. Our system has three primary functions for epidemic surveillance: location monitoring, epidemic analysis, and contact tracing. We provide an interactive tool allowing the attendees to explore and examine the usability of our system: (1) the utility of location monitor and disease transmission model estimation, (2) the procedure of contact tracing in our systems, and (3) the privacy-utility trade-offs w.r.t. different policy graphs. The attendees can find that it is possible to have the full functionality of epidemic surveillance while preserving location privacy. We are fighting with the pandemic of COVID-19 disease. To prevent the spread of such a highly contagious virus, This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 12 the crucial information that we need is people's location history for epidemic surveillance. Recently, many countries that suffer from coronavirus crises attempt to access citizen's location data to eliminate the outbreak. The US pumped 500 million dollars of emergency funding into the CDC for building a surveillance and data collection system [1] and discussed with Facebook and Google for sharing people's location data to combat the coronavirus. In South Korea, the government created a public map of coronavirus patients using location data from telecom and credit card companies [4] . Italy's telecom companies are sharing location data with health authorities to check whether people are remaining at home [3] . China's giant tech companies provide a "health code" service to certificate a user's health status based on her health status and travel history, which are collected by the cellphone Apps [2]. Although these special measures of personal data collection for public health emergency may be temporary and under stringent government regulation, it raises concerns over privacy, and people are worried that it may open the doors to surveillance activities in the name of public health. It also brings a challenge for location privacy protection techniques: how can we utilize people's mobile data to help combat the pandemic without scarifying our location privacy. Location privacy has been extensively studied in the literature [14] . However, the state-of-the-art location privacy models are not flexible enough to balance the individual privacy and public interest in an emergency as we are witnessing in the COVID-19 crisis. The early studies on location privacy were extending k-anonymity [16] and were flexible enough to be adapted to different scenarios such as personalized location anonymity [9] . But the recent studies revealed that k-anonymity might not be rigorous enough since they suffer many realistic attacks [12, 13] when the adversary has background knowledge about the original dataset. The recent state-of-the-art location privacy models [5, 19, 18, 17] were extended from differential privacy (DP) [8] to private location release since DP is considered a rigorous privacy notion. Although these DP-based location privacy models are rigorously defined, yet they are not flexible and customizable for different scenarios with various requirements on privacy-utility trade-off. Taking an example of Geo-Indistinguishability [5] , which is the first and influential DP- Perturbed Loc. Loc. DB Figure 1 : Private location sharing with Customizable Policy. based location privacy metrics, the strength of protection is solely controlled by a single parameter to achieve indistinguishability among all possible locations. It is hard to make a good privacy-utility trade-off using this single in a complicated setting. We should have a flexible and rigorous location privacy model that enables customizable location privacy policy, which defines which locations are sensitive, which are not. The policy should be adjustable for different people, at different time, and with different purposes. For instance, under the emergency of COVID-19, a location privacy policy for contact tracing could be "allowing to disclose a user's true locations of the past two weeks if she is a diagnosed coronavirus patient; otherwise, ensuring indistinguishability of the user's location"; if the patient's location trace and the time period are confirmed, we can dynamically update the location privacy policy for each person to find all contacts of the confirmed patient. A policy for all other people could be "allowing to disclose a user's true locations if she has been stay in the same location at the same period; otherwise, ensuring indistinguishability of the user's location" In this way, we can guarantee both full usability of contact tracing and reasonable privacy. In this demonstration, we present PANDA, i.e., Policyaware privAcy preserviNg epiDemic surveillAnce, which implements our recently proposed Policy Graph-based Location Privacy (PGLP) [6] and mechanisms for epidemic surveillance. Our system is featured by the customizable location privacy policy graph, which provides a new dimension to tune utility-privacy trade-off. In our recent study [6] , we proposed a formal representation of location privacy policy using a graph, which is inspired by a statistical privacy notion of Blowfish privacy [10] . In our setting of private location release, a privacy policy graph (such as the ones shown in Fig.2 ) includes all possible locations that need to be protected as its nodes, and the edges indicate indistinguishability between two possible locations. A user could arbitrarily customize the location policy graph according to her privacexy and utility requirement and enjoy plausible deniability regarding her whereabouts. The definition of PGLP can be seen as a generalization of two influential DP-based location privacy models: Geo-Indistinguishability [5] and Location Set Privacy [19] . Under appropriate configuration of policy graphs, an algorithm satisfying PGLP w.r.t. the policy graphs could also satisfy Geo-Indistinguishability or Location Set Privacy. In [6] , we also designed mechanisms for PGLP by adapting the Laplace mechanism and Planar Isotropic Mechanism (PIM) (i.e., an optimal mechanism for Location Set Privacy [19] ) w.r.t. a given location policy graph. However, it is not trivial to directly apply PGLP for a location-based application such as epidemic surveillance due to the following reasons. First, it is not clear how to design a proper policy graph with reasonable privacy and functional utility. Second, when there are multiple choices for location privacy policies, we lack a tool to explore and compare the utility gain w.r.t. different location privacy policies. Third, it is difficult for users to understand the privacy implications (i.e., the privacy risks) of a given location privacy policy. To address the above issues and motivated by the significant impact of the pandemic of COVID-19 in the world, we demonstrate a policy-based location privacy-preserving epidemic surveillance system. Our contributions are summarized below. First, we design an epidemic surveillance system with three primary functions: location monitoring, epidemic analysis, and contact tracing. The scenario is shown in Fig.1 , where users locally maintain location databases (e.g., all locations in the past two weeks) and share perturbed locations satisfying PGLP w.r.t. a specific policy graph with a semi-honest server. The policy graph essentially acts as an information filter to control what could be shared and what should not be shared. Second, we demonstrate three policy graphs with the distinct granularity that are appropriate for different functions in the epidemic surveillance. Specifically, we visualize the utility gain or loss between different policy graphs. It turns out that no policy could be the best for all. The attendees of the conference can find that it is possible to have the full functionality of epidemic surveillance while preserving location privacy. Third, we visualize the trade-off between privacy and utility. Although we can specify a policy graph that enables the full usability of the system, yet it is not clear what is the privacy implication given a policy graph. The policy graph itself could be semantically meaningful, but we lack a quantitative measurement. We provide empirical privacy metrics as the adversary's successful inference [15] with an interactive tool The attendees can randomly generate a policy graph to explore its effect on the privacy-utility trade-off. The code is available in github 1 . A prototype of a mobile phone App will be available soon. Inspired by Blowfish privacy [10] , we use an undirected graph to define which location should be protected and which could not, i.e., location privacy policies. The nodes are secrets and the edges are the required indistinguishability, which indicate an attacker should not be able to distinguish the input secrets by observing the perturbed output. In our setting, we treat possible locations as nodes, and the indistinguishability between the locations as edges. Definition 2.1 (Location Policy Graph). A location policy graph is an undirected graph G = (S, E) where S denotes all the locations (nodes) and E represents indistinguishability (edges) between these locations. Definition 2.2 (Distance in Policy Graph). We define the distance between two nodes si and sj in a policy graph as the length of the shortest path between them, denoted by dG(si, sj). In DP, the two possible database instances with or without a user's data are called neighboring databases, which can be interpreted as two nodes with an edge in a policy graph. We generalize it to k-neighbors defined below. Definition 2.3 (k-Neighbors). The k-neighbors of location s, denoted by N k (s), is the set of nodes that reach s within k hops, i.e., N k (s) = {s | dG(s, s ) ≤ k, s ∈ S}. We define ∞-neighbors as the nodes having a path with s, denoted by N ∞ (s). In our system, we assume that the location policy graph is determined by the server for the purposed of utility maximization. The user has the right to reject a privacy policy so that no location will be released. By making the policy graph public, the system has a high level of transparency. We now formalize PGLP (i.e., Policy-based Location Privacy), which guarantees indistinguishability for every pair of neighbors (i.e., for each edge) in a location policy graph. In PGLP, privacy is rigorously guaranteed through ensuring indistinguishability between any two neighboring locations specified by a customizable location policy graph. The user enjoys plausible deniability about her whereabout. Lemma 2.1. An algorithm A satisfies { , G}-location privacy, iff any two ∞-neighbors si, sj ∈ G are · dG(si, sj)indistinguishable. Lemma 2.1 indicates that, if there is a path between two nodes (locations) si, sj in the policy graph, the corresponding indistinguishability is required at a certain degree; if two nodes are not connected (i.e., dG(si, sj) = ∞), the indistinguishability is not required by the policy. As an extreme case, if a node is not connected with any other nodes, it allows to release it without any perturbation. We analyze the relation between PGLP and two influential DP-based location privacy models, i.e., Geo-Indistinguishability [5] and δ-Location Set Privacy [19] . We show that PGLP implies each of them under proper configurations of location policy graphs. Figure 3 : System Overview. Geo-Indistinguishability [5] guarantees a level of indistinguishability between two locations si and sj that is scaled with their Euclidean distance, i.e., ·dE(si, sj)-indistinguishability, where dE(·, ·) denotes Euclidean distance. Let G1 be a location policy graph that every location has edges with its closest eight locations on the map as shown in Fig.2 (left) . We can derive the following theorem by the fact of dG(si, sj) ≤ dE(si, sj) for any si, sj ∈ G1 and Lemma 2.1. Theorem 2.1. An algorithm satisfying { , G1}-location privacy also achieves -Geo-Indistinguishability. δ-Location Set Privacy [19] extends differential privacy on a subset of possible locations, which is assumed as adversarial knowledge. δ-Location Set Privacy ensures indistinguishability among any two locations in the δ-location set. Let G2 be a location policy that is a complete graph among locations in the δ-location set as shown in Fig.2 (right) . Theorem 2.2. An algorithm satisfying { , G2}-location privacy also achieves δ-Location Set privacy. The proofs and the mechanisms for PGLP are presented in a full version of this paper [6] for interested readers. Our system provides consist of three main modules: PGLP mechanisms, Location Policy Configuration, and Epidemic Surveillance Apps as shown in Fig.3 . PGLP mechanisms are proposed in [6] for achieving rigorous and customization location privacy. It takes inputs of , location policy graph G and the user's true location, and outputs a perturbed location to the server. The policy G recommended by Location Policy Configuration and approved by the user. Location Policy Configuration defines different location policies according to the application of epidemic surveillance. Three primary functions (Apps) for epidemic surveillance are location monitoring, epidemic analysis and contact tracing. Location monitoring focuses on understanding people's movement between different cities or provinces in a coarsegrained level, which provides essential insights when combining with the incidence rate in each city along with the people's movement. It could also provide a "health code" service, i.e., allowing certification of the users health status, in a privacy-preserving way. A location policy for location monitoring can be "ensuring indistinguishability inside each coarse-grained area and allowing the locations are distinguishable in different coarse-grained areas" such as Ga shown in Fig.4 since such a monitor only requires the people moving between different cities. Epidemic analysis aims at building a predictive disease transmission model such as the SEIR model [11] . The fine-grained data would be beneficial for the estimation of the parameters such as R0 (i.e., basic reproduction number). A location policy for epidemic analysis is similar to the previous one, but more fine-grained, such as G b in Fig.4 . Contact tracing attempts to find all contacts of a diagnosed case so that to stop the spread of disease by finding and isolating patients. A policy for contact tracing can be "ensuring indistinguishability only if the user is not in an infected area, but allowing disclose true location if the user accesses an infected location", which can be formally represented by a graph Gc in Fig.4 . We introduce more details about contact tracing below. We demonstrate the system using Geolife [20] and Gowalla [7] datasets. Interested readers can find a more detailed configuration in [6] . We provide an interactive tool that allows the attendees to explore and examine the usability of our system: (1) the utility of location monitor and coronavirus transmission model estimation, (2) the procedure of contact tracing in our systems, and (3) the privacy-utility trade-offs, as shown in Fig.5 w.r.t. different policy graphs. First, we evaluate the utility of location monitoring as the Euclidean distance between perturbed locations and real locations. We test the accuracy of transmission model estimation using the difference between (i.e., basic reproduction number) R0 estimated over accurate locations and the perturbed locations, respectively. Second, we demonstrate the procedure of contact tracing using our system and dynamic policy graphs (such as Gc in Fig.4) . The goal is identifying the people who have the risk of infection (the decision rule of suspected infection could be advised by CDC or WHO; here we assume a simple rule of two persons have been the same location at the same time at least twice). At each time point, each user sends the perturbed location w.r.t. her policy graph and stores the past two weeks of location history in a local database. When the server confirms a diagnosed patient's location history, the Policy Graph Configuration module will update the location privacy policy of the users who have the risk of infection during the past two weeks (according to our simple rule). Then, the corresponding user will be asked to re-send his past location using the updated privacy policy (the places where the diagnosed patient has been are allowed to be disclosed). In this way, the user can get alerted and tested in case of infection. Third, similar to the previous utility evaluation, we will also allow the attendees to evaluate the empirical privacy that is measured by adversary error [15] . One can choose predefined policy graphs, as shown in Fig.4 , or randomly generate policy graphs to explore its effect on the privacy-utility trade-off. Geo-indistinguishability: differential privacy for location-based systems Customizable and rigorous location privacy through policy graph Friendship and mobility: User movement in location-based social networks Differential privacy Protecting location privacy with personalized k-anonymity: Architecture and algorithms Blowfish privacy: tuning privacy-utility trade-offs using policies Global stability for the seir model in epidemiology t-closeness: Privacy beyond k-anonymity and l-diversity L-diversity: privacy beyond k-anonymity The long road to computational location privacy: A survey Quantifying location privacy A model for protecting privacy Geo-graph-indistinguishability: Protecting location privacy for LBS over road networks Prolonging the hide-and-seek game: Optimal trajectory privacy for location-based services Protecting locations with differential privacy under temporal correlations 0: a location-based social networking service