key: cord-169081-34z49l4b authors: Sturzenegger, David; Sardon, Aetienne; Deml, Stefan; Hardjono, Thomas title: Confidential Computing for Privacy-Preserving Contact Tracing date: 2020-06-25 journal: nan DOI: nan sha: doc_id: 169081 cord_uid: 34z49l4b

Contact tracing is paramount to fighting the pandemic, but it comes with legitimate privacy concerns. This paper proposes a system that enables both contact tracing and data privacy. We propose the use of the Intel SGX trusted execution environment to build a privacy-preserving contact tracing backend. While the concept of a confidential computing backend proposed in this paper can be combined with any existing contact tracing smartphone application, we describe a full contact tracing system for demonstration purposes. A prototype of a privacy-preserving contact tracing system based on SGX has been implemented by the authors in a hackathon.

The COVID-19 pandemic has caused a severe human and economic tragedy. As of April 2020, more than 1.2 million COVID-19 infections have been confirmed globally [1]. Governments all over the world have taken action to prevent their health systems from being overwhelmed. National lockdowns as well as social and physical distancing measures have been imposed to slow the spread of the disease [2]. However, these measures have also brought large parts of the economy to a standstill, elevating the risk of a sustained economic downturn [3].

A. The importance of contact tracing

Current research suggests that contact tracing could play a critical role in avoiding or leaving lockdown [4]. Contact tracing can help maintain a relatively unrestricted society and economy while minimizing the damage to the health of the population [5]. Since most transmissions are estimated to occur from pre-symptomatic individuals, traditional manual contact tracing procedures are too slow to effectively contain the spread of COVID-19 [6].
However, smartphone apps that immediately alert recent close contacts and prompt them to self-isolate may significantly increase the efficacy of contact tracing. It is estimated that 60% of a country's population would need to participate in contact tracing for it to be effective, but privacy concerns may slow adoption [7].

The fundamental problem is that determining whether two people were in contact requires comparing their location data. This directly conflicts with the desire of most people to keep their location data private, leading to a trade-off between health and privacy. Contact tracing systems built by the Chinese and South Korean governments have favored health over privacy in the context of the current pandemic. These systems recently came under public scrutiny over issues of data protection and privacy. Critics argue that emergency measures tend to be expanded beyond their original scope [8]. Hence, liberal countries are clearly in favor of opt-in apps that use privacy-preserving technologies to minimize intrusions on privacy and civil liberties [9]. For example, a group of European experts recently launched the Pan-European Privacy-Preserving Proximity Tracing initiative to provide guidance on best practices for developing contact tracing apps, but privacy concerns remain [10], [11].

Conventional systems rely on a Trusted Third Party (TTP) to keep track of potential infection chains and orchestrate notifications (see section II). This has led some to conclude that elaborate governance structures are needed. For example, the authors of [12] suggest amending the Epidemics Act to incorporate so-called data trustees, who would be entrusted with guaranteeing proper data handling. Such considerations are based on the assumption that contact tracing requires the presence of a TTP. However, with the advent of confidential computing this assumption seems outdated, as Trusted Execution Environments (TEEs) may make TTPs obsolete.
Confidential computing refers to performing computations with additional data confidentiality and integrity guarantees. TEEs have recently emerged as one of the most flexible and mature technologies that can enable confidential computing. Many of today's leading technology companies are actively developing and promoting confidential computing technologies. For example, companies like Microsoft, Google, Alibaba Cloud and others have joined forces to form the Confidential Computing Consortium under the Linux Foundation [13]. Currently, Intel's SGX is the most advanced TEE implementation and the main technology the members of the Confidential Computing Consortium focus on [14]. This paper proposes an Intel SGX-based contact tracing system which provably cannot reveal any user's location data while providing all benefits of a traditional contact tracing system. We focus on a confidential computing backend that can be used in combination with any of the currently existing contact tracing apps, requiring only minimal modifications.

Current contact tracing apps typically rely on pushing the infected users' location data to the entire system. These location data include GPS and/or proximity data, i.e. (typically randomized) identifiers of devices that were close to the current device. On every user's phone, all infected users' data are then compared to the locally stored location data in order to determine whether the mobile user has been in close proximity to infected individuals. If the data is GPS data, this immediately reveals the infected user's past movements and offers very little privacy. If the data is proximity data, this may leak substantial private information as well. Privacy loss may occur (a) to the mobile-phone user, and/or (b) to the diagnosed patient. (Footnote 1: Attackers will prefer to attack large data sets located at hospital servers.) These privacy problems can negatively influence a user's decision and willingness to disclose their infection to the system.
They may therefore substantially degrade the system's overall effectiveness. Two examples of existing contact tracing systems are discussed in the following.

Israel's health ministry recently launched the contact tracing system HaMagen [15], [16], [17]. HaMagen claims that it only processes the users' location data on their devices. However, the system relies on pushing the location data of all infected users over government servers to all users in the system. Hence, the location data of infected people is not protected at all.

Singapore's TraceTogether takes an approach similar to the idea behind Apple's "find my device" technology [18]. Every active phone continuously monitors for Bluetooth Low Energy (BLE) beacon messages, which are broadcast from other devices together with some identifier. When it picks up one of these signals, the participating phone tags the data and stores it. As a result, no location data is stored on the device, but rather a list of "identifiers" of the users one has met. In order to make location tracking more difficult, regularly changing random identifiers, derived from a user's secret key, are used. However, in order to identify potential transmissions, an infected user has to reveal his or her entire proximity data to a central authority, increasing the risk of re-identification (see section II).

Current contact tracing apps need to address the following problems:
• Revealing data of infected users. Contact tracing apps like HaMagen perform on-device transmission detection. While this protects non-infected user data, it exposes infected individuals to re-identification risk by pushing their identifiers to all edge devices for local matching.
• Trustworthiness of central data processing. Other systems, like TraceTogether or PEPP-PT [11], require all edge devices to send their collected location or proximity data to a central server, where matching of infected and non-infected identifiers is performed.
Typically, this makes it difficult for users to verify how their data is processed by the server. More specifically, it becomes impossible to guarantee that their contributed data will not be used beyond the pre-agreed purpose of contact tracing and will be deleted afterwards.

Contact tracing systems consist of two components: the smartphone contact tracing app, installed on the user's device, and a contact tracing backend. While special-purpose TEEs exist on smartphones, they currently do not offer all the guarantees that are needed to conduct confidential computing. In particular, remote attestation is lacking in most existing smartphone implementations, which makes them impractical for the use case discussed in this work. Hence, we propose to build a confidential contact tracing backend to address the problems mentioned in section III. While this backend can in general be used with any contact tracing app, we propose a full contact tracing system (i.e. including a specific app) for demonstration purposes. The proposed backend leverages Intel SGX to confidentially determine potential chains of transmission without ever exposing any user data to anyone, not even to the platform operator. Much of the following description is not particular to a confidential computing solution. The key benefits of using Intel SGX are twofold: one can prove that the system works as described, thereby preventing data misuse, and one can achieve a higher level of privacy protection than with conventional systems.

Using SGX technology, the GPS data from the infected patients is encrypted by the hospital in such a way that it can only be decrypted inside the SGX TEE. Similarly, GPS data from the user's mobile phone is encrypted for the same target SGX environment. Once both data sets are within the trusted boundary of the SGX TEE, they can be safely decrypted and compared.
If a positive match is found, the SGX TEE reports the result over a secure channel to (a) the mobile user, and (b) optionally also to the hospital. The benefit of the SGX TEE is that GPS data is never accessible in plaintext outside the TEE. Once the SGX TEE finishes the comparison of the GPS data sets, it deletes ("flushes") the data from its memory. This ensures that the original GPS data sets are present inside the SGX TEE only for a very short time, so attackers are unable to obtain access to large GPS data sets there. We therefore recommend that hospitals in possession of GPS data sets of infected patients encrypt their data while in storage.

Assume each device generates and emits a random identifier in discrete time intervals ∆t. For example, device A emits a_1 during [t_0, t_0 + ∆t), a_2 during [t_0 + ∆t, t_0 + 2∆t), and so forth. Devices in proximity to one another (footnote 2) pick up these random identifiers reciprocally and locally store the sent and received identifiers in a contact tuple log. For example, assume that, while in proximity to one another, device A sends a_1 and device B sends b_1. In this case, A locally stores (a_1, b_1) and B stores (b_1, a_1) (see figure 1).

Let us now assume user C is tested at a health authority H. If the test is positive, the health authority submits this information to the confidential computing backend. The (authenticated) user C polls the backend to see whether or not the test was positive. Note that C cannot produce false infection notifications, since only the health authority can perform these calls to the backend. If C decides to notify the user network of his or her infection, he or she sends his or her contact tuple log, e.g. {(c_1, a_1), (c_2, b_2), ...}, to the backend, which stores it in an encrypted database that is provably (footnote 3) only accessible to the backend (footnote 4). All devices regularly poll the backend for matches in the encrypted storage by sending their contact tuples.
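The identifier exchange, log submission and polling steps described above can be illustrated with a minimal sketch. It is not the authors' prototype: the names `Device`, `Backend`, `meet` and `run_demo`, the 128-bit identifier size, and the in-memory sets are assumptions made for illustration, and the `Backend` class only stands in for logic that would run inside an attested SGX enclave with encrypted storage.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Device:
    """A phone that emits rotating random identifiers (hypothetical model)."""
    name: str
    current_id: str = ""
    # Contact tuple log: (identifier sent, identifier received).
    log: list = field(default_factory=list)

    def rotate_identifier(self) -> str:
        # Called once per interval ∆t; 128 random bits is an assumed size.
        self.current_id = secrets.token_hex(16)
        return self.current_id


def meet(x: Device, y: Device) -> None:
    """Both devices record the encounter reciprocally."""
    x.log.append((x.current_id, y.current_id))
    y.log.append((y.current_id, x.current_id))


class Backend:
    """Stand-in for the enclave-hosted backend; no real SGX is involved."""

    def __init__(self) -> None:
        self._confirmed = set()  # users with a positive test on record
        self._infected = set()   # contact tuples submitted by infected users

    def report_positive_test(self, user: str) -> None:
        # In the described system only the health authority may call this.
        self._confirmed.add(user)

    def submit_log(self, user: str, log: list) -> None:
        # A user without a confirmed positive test cannot notify the network.
        if user not in self._confirmed:
            raise PermissionError("no positive test on record")
        self._infected.update(log)

    def poll(self, log: list) -> bool:
        # A polling device's (sent, received) tuple mirrors the
        # (received, sent) tuple in an infected user's log.
        # The polled tuples are only read here, never stored.
        return any((received, sent) in self._infected for sent, received in log)


def run_demo() -> dict:
    a, b, c = Device("A"), Device("B"), Device("C")
    for device in (a, b, c):
        device.rotate_identifier()
    meet(a, c)  # A and C were in proximity
    meet(a, b)  # A and B were in proximity; B never met C
    backend = Backend()
    backend.report_positive_test("C")  # health authority H reports C's test
    backend.submit_log("C", c.log)     # C decides to notify the network
    return {"A": backend.poll(a.log), "B": backend.poll(b.log)}
```

In this sketch `run_demo()` reports a match for A, who met C, and none for B, who did not. Matching on the mirrored tuple rather than on a single identifier is one possible reading of the contact tuple log; the source does not specify the exact comparison.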
For example, when A polls the storage, he or she sends {..., (a_1, c_1), ...}. As there is a match between A's and C's tuples, A is informed that he or she has been in contact with an infected individual. Note that provably neither A's tuples nor the fact that A was in contact with an infected individual are stored or submitted elsewhere by the backend (consider again footnote 3).

V. DISCUSSION

Much of the system proposed above is similar to existing contact tracing systems. The main difference is that, using Intel SGX, it can be proved by means of remote attestation, memory encryption and memory isolation (see also [19] for a high-level introduction to these concepts) that the backend operates exactly as advertised. It is important to stress again that the concept of a confidential computing backend can be used in combination with any contact tracing application, not just the one described here for demonstration purposes. A prototype of a privacy-preserving contact tracing system based on SGX has been implemented and open-sourced in the context of the CodeVsCovid19 hackathon [20].

The main benefits of the confidential computing-based backend are twofold: on the one hand, it enables effective data minimization (i.e., data does not need to be exposed to perform contact tracing logic) (footnote 5); on the other hand, it provides transparent and verifiable data processing. This means that users can be guaranteed that their data is only used for the pre-agreed specific purpose of contact tracing. The specific data processing logic can be open-sourced, audited and verified by independent parties. Note that the confidentiality and integrity guarantees of any system, including confidential computing systems, depend on a correct implementation. We did not describe such a full implementation. The purpose of this paper is to demonstrate the concept and feasibility of a privacy-preserving contact tracing system.
We believe that a privacy-preserving backend can enable a more widespread and therefore more effective contact tracing system.

We described the need for a privacy-preserving contact tracing solution: without a strong focus on data privacy, contact tracing is unlikely to be widely adopted in liberal countries. We propose the use of Intel SGX to build a confidential computing backend that provably cannot reveal any user data, and we outline a complete contact tracing system for demonstration purposes. Together with currently available contact tracing smartphone applications, such a privacy-preserving contact tracing system could help mitigate some of the adverse effects of the current pandemic.

(2020, 4) Coronavirus disease (COVID-19) outbreak situation
(2020, 3) Coronavirus deaths in Italy overtake China as economic damage mounts
Restarting the Economy and Avoiding Big Brother
(2020, 3) Coronavirus deaths in Italy overtake China as economic damage mounts
(2020, 3) Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
(2020, 4) European experts ready smartphone technology to help stop coronavirus
(2020, 3) Snowden warns: The surveillance states we're creating now will outlast the coronavirus
(2020, 4) US and Europe race to develop 'contact tracing' apps
Pan-European Privacy-Preserving Proximity Tracing
(2020, 4) An even deeper dive into the Secure Enclaves
(2020, 4) COVID-19: Gouvernanzmodell für ein digitales Proximity Tracing
Confidential Computing Consortium. (2020) Confidential Computing Consortium
Intel SGX
Israel's Ministry of Health's COVID-19 Exposure Prevention App
Israel's Ministry of Health. (2020, 3) Partial Error at the "HaMagen" Application
"Hamagen" Application - Fighting the Corona Virus
Singapore's Ministry of Health. (2020, 3) Help speed up contact tracing with TraceTogether
(2020) An even deeper dive into the Secure Enclaves
cocotrace - Confidential Contact Tracing