key: cord-0214509-pnj5eq9s
authors: Barthe, Gilles; Viti, Roberta De; Druschel, Peter; Garg, Deepak; Gomez-Rodriguez, Manuel; Ingo, Pierfrancesco; Lentz, Matthew; Mehta, Aastha; Scholkopf, Bernhard
title: PanCast: Listening to Bluetooth Beacons for Epidemic Risk Mitigation
date: 2020-11-16
journal: nan
DOI: nan
sha: c3d422c7ed661b7a76046cfa61cc4c8b2c6d4415
doc_id: 214509
cord_uid: pnj5eq9s

During the ongoing COVID-19 pandemic, there have been burgeoning efforts to develop and deploy smartphone apps to expedite contact tracing and risk notification. Most of these apps track pairwise encounters between individuals via Bluetooth and then use these tracked encounters to identify and notify those who might have been in proximity of a contagious individual. Unfortunately, these apps have not yet proven sufficiently effective, partly owing to low adoption rates, but also due to the difficult tradeoff between utility and privacy and the fact that, in COVID-19, most individuals do not infect anyone but a few superspreaders infect many in superspreading events. In this paper, we propose PanCast, a privacy-preserving and inclusive system for epidemic risk assessment and notification that scales gracefully with adoption rates, utilizes location and environmental information to increase utility without tracking its users, and can be used to identify superspreading events. To this end, rather than capturing pairwise encounters between smartphones, our system utilizes Bluetooth encounters between beacons placed in strategic locations where superspreading events are most likely to occur and inexpensive, zero-maintenance, small devices that users can attach to their keyring. PanCast allows healthy individuals to use the system in a purely passive "radio" mode, and can assist and benefit from other digital and manual contact tracing systems. Finally, PanCast can be gracefully dismantled at the end of the pandemic, minimizing abuse from any malevolent government or entity.

During the ongoing COVID-19 pandemic, efforts have been made to develop and deploy smartphone apps to expedite contact tracing and risk notification. Most of these apps track pairwise encounters between individuals via Bluetooth and then use the tracked encounters to identify and notify those who might have been in proximity of contagious individuals. Unfortunately, the effectiveness of these apps for contact tracing has been limited, partly due to low adoption rates, a difficult tradeoff between utility and privacy, and COVID-19's overdispersion, i.e., most infected individuals do not infect anybody else but a few superspreaders infect many.

PanCast is a system for epidemic risk assessment and notification that scales gracefully with adoption rates, utilizes location and environmental information to increase utility without tracking its users, and can be used to identify superspreading events.

In PanCast, Bluetooth beacons placed in strategic locations continuously broadcast ephemeral IDs that change over time. A subset of these beacons, called network beacons, also broadcast risk information (associated with times when individuals who tested positive were near specific beacons). Users carry inexpensive, zero-maintenance, small dongles in the form of cards or keyfobs that listen to these beacon broadcasts passively (without transmitting anything), store beacon ephemeral IDs in local memory and, when in proximity of a network beacon, compare these stored IDs with the broadcast risk information. When an individual tests positive for the infectious disease, they can optionally disclose (a selected part of) the list of ephemeral IDs stored in their dongle, which becomes part of the global risk information. This means that the information about which locations were dangerous at which times gets included in the broadcast risk messages. By design, a user learns their risk of contagion due to proximity to diagnosed individuals but does not learn anything about locations visited by other individuals unless the user was in physical proximity at the same time.

PanCast has the following key properties:

• It is useful despite low adoption. PanCast lends itself to incremental deployment naturally by placing beacons in locations where superspreading events are most likely to occur and it can assist and benefit from existing digital contact tracing systems and manual contact tracing.

• It ensures data minimization and prevents tracking of users. A healthy individual can use the system in a purely passive radio mode. Individuals who test positive can optionally and partially disclose the same type of information as in a manual contact tracing interview. Disclosed information is accessible only to individuals at risk and in a privacy-preserving way.

• It can be used to identify superspreading events. Similar to manual contact tracing, PanCast makes use of location and environmental information (in a privacy-preserving way). This is crucial to the identification of superspreading events.

• It is an inclusive system. PanCast enables the participation of technology challenged, economically disadvantaged, and physically challenged individuals, who cannot or do not wish to use smartphones.

• It can be gracefully dismantled. At the end of the pandemic (or at any time), users who only used the system in passive radio mode can reset or destroy their dongles. Risk information uploaded by users who voluntarily decided to disclose their list of stored ephemeral IDs does not contain any identifiable personal data and might be valuable for epidemiological studies.

Containing infectious diseases such as the ongoing COVID-19 pandemic requires effective testing, contact tracing, and isolation of infected individuals (TTI) [34, 44]. Among these, contact tracing is an important tool that can help use the limited test resources for those who are most likely to be infected, by identifying infected individuals and those who came into close contact with them during their infectious period. Furthermore, contact tracing provides knowledge of the circumstances of contagion, which in turn informs the implementation of public health policies and interventions that can help contain further disease spread. For instance, outbreaks of COVID-19 in meat packing plants in Germany [9] provided insights into conditions that can potentially breed infection hotspots, which led to regulatory action.

To expedite contact tracing and assist health officials in responding to patients and at-risk individuals, many digital contact tracing systems have been deployed [10, 11, 12, 13, 15, 17, 33]. We collectively refer to these systems as SPECTs (smartphone-based, pairwise encounter-based contact tracing systems). In SPECTs, individuals install a smartphone application that records instances of physical proximity with devices of other individuals via close-range Bluetooth exchanges, referred to as encounters. When an individual is diagnosed with the infectious disease, they report their recent encounter history to a health authority, which can then be used to notify other infected or at-risk individuals who might have come into contact with the diagnosed individual during the individual's period of contagiousness (indicated by mutual encounters).

Depending on how risk status is determined and delivered to users, current SPECTs are broadly grouped into two types: centralized and decentralized. In a centralized system, the central health authority computes the risk status for all users in the system, and either broadcasts it to everyone (e.g., PEPP-PT-NTK [11], systems based on the BlueTrace protocol [25] like OpenTrace [10] or TraceTogether [15]) or returns the user-specific risk status to an individual upon inquiry (e.g., ROBERT [12]). In a decentralized system, the health authority broadcasts information about patients' encounters, and the smartphone app computes the user's risk status locally on the device and reports it to the user. Examples of decentralized systems include those based on DP3T [17], PACT [33], and the TCN coalition [13].

In centralized systems, all user data is collected and managed centrally, which places a high degree of trust in the central authority. On the other hand, decentralized systems aim to preserve the privacy of users and, therefore, minimize data that is shared with authorities. However, this data minimization also prevents aggregation of data for epidemiological analysis.

Despite strong efforts to deploy SPECTs and the availability of tracing apps in many countries, there is scarce evidence of the effectiveness of SPECTs [6, 28]. This can be partially due to: (i) low adoption rates (<25%) in countries that do not mandate their use [7]; (ii) a difficult tradeoff between utility and privacy, which has precluded the use of location and environmental features as in manual contact tracing; and (iii) multiple lines of evidence that suggest that, for COVID-19, the number of secondary infections caused by a single individual is overdispersed, i.e., most individuals infect few and a few superspreaders infect many [32, 42, 60].

Whether centralized or decentralized, existing solutions have several limitations that may contribute to their limited adoption and effectiveness:

• Inclusion. SPECTs require that participating users carry smartphones, thus excluding individuals who cannot or do not wish to use such a device. However, such individuals may be among the most vulnerable due to their age, economic situation, or special needs. In contrast, we seek to address inclusion by supporting simple, inexpensive, zero-maintenance devices that users can attach to their keyring, or wear around their wrist or neck. In the rest of this paper, we refer to such user devices as dongles.

• Circumstantial information. To protect user privacy, existing systems collect pairwise encounter events and their durations, but no information about the circumstances of encounters. For instance, they do not record locations, which would allow identification of infection hotspots. They also do not capture other features relevant for disease transmission, such as indoor vs. outdoor, room size, ventilation, air quality [58], or noise level. From an epidemiological point of view, it would be desirable to get accurate probabilistic transmission models describing the influence of such features on the infection risk associated with an encounter; however, controlled trials are ruled out by ethical concerns, and data-driven models require data including the relevant features. Such models would help us understand disease transmissions, justify political decisions, reduce false positive rates when alerting individuals, and enable more economical use of testing resources [21].

We seek to address this issue by using Bluetooth beacons that are strategically placed at known locations (or trajectories in case of beacons placed in vehicles) and tagged with the relevant features. User dongles capture encounters with the beacons (not other user dongles as in SPECTs), thus indirectly capturing location information useful for risk identification.

• Privacy. SPECTs, including decentralized variants, raise privacy concerns related to the fact that user smartphones continuously transmit information. These transmissions may enable eavesdroppers to learn the whereabouts of users who subsequently test positive and possibly infer the identities of patients using background information. Moreover, SPECTs collect and store sensitive contact information on smartphones, where their privacy is subject to multinational platform providers' compliance with national privacy laws, and to the robustness of these platforms to hacking [53].

We seek to address this concern by enabling user dongles to just listen passively to Bluetooth broadcasts from beacons, without transmitting anything during normal operation. Dongles transmit only when a patient is diagnosed positive and, even then, they only disclose information explicitly approved by their owners. Moreover, dongles rely on a single-purpose, custom software and hardware platform with a small attack surface and are independent of multinational platform providers.

• User transparency. Current systems are non-transparent in that users cannot readily interpret what information is exchanged with other users' devices and with the backend server (if there is one). Consequently, users cannot make informed, selective decisions about information they wish to release.

We seek to address this limitation by presenting the information gathered by a dongle to the user as a time series of visited beacons labeled with their locations. A diagnosed user can easily interpret the labeled information and can select parts of their history that they wish to disclose.

• Scaling with adoption level. Arguably, the lack of inclusion and user transparency, as well as concerns about privacy, impede adoption of SPECTs, which in turn limits their effectiveness. Worse, SPECTs rely on pairwise encounters between user devices and therefore suffer from the $n^2$ problem [33]: if only a relatively small fraction $n < 1$ of a population uses the system, then at most a fraction $n^2 \ll 1$ of all encounters can be detected. Indeed, few SPECTs have adoption rates larger than 10% in their countries [52], which implies that most relevant encounters are not detected.

We address this scaling issue by relying not on pairwise encounters between individuals, but on encounters between individuals' dongles and beacons installed in relevant locations. In this case, if a fraction n of people carry dongles, we detect n of the encounters of sick people with beacons. As argued below, this even allows certain risk predictions that do not involve (and indeed should not be limited to) pairwise encounters. Moreover, our system lends itself to incremental deployment naturally by placing beacons strategically in locations where superspreading events are more likely to occur.
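The scaling argument above can be made concrete with a small calculation. The following is a minimal sketch, not part of the PanCast design: the function names are hypothetical, and it assumes that users are spread uniformly over the population and that a beacon is present wherever the encounter takes place.

```python
def pairwise_detection_fraction(n: float) -> float:
    """Upper bound on the fraction of encounters a pairwise system can detect.

    Both parties must run the system, so the bound is n * n
    (assumes adoption is independent between the two parties).
    """
    return n * n


def beacon_detection_fraction(n: float) -> float:
    """Fraction of diagnosed individuals' beacon encounters that are recorded.

    Only the diagnosed person needs a dongle; a beacon is assumed to be
    installed at the location in question.
    """
    return n


if __name__ == "__main__":
    # Illustrative adoption levels only; these are not measured values.
    for n in (0.10, 0.25, 0.50):
        print(
            f"adoption={n:.2f}  "
            f"pairwise<={pairwise_detection_fraction(n):.4f}  "
            f"beacon={beacon_detection_fraction(n):.2f}"
        )
```

Under these assumptions, the gap between the two approaches is widest at low adoption, which is the regime where an incremental deployment starts.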

• Non-contemporaneous transmission. SPECTs can only detect infection transmissions between individuals who were in physical proximity of each other, i.e., have been in the same location at the same time. These systems cannot detect fomite transmissions (contaminated surfaces) nor airborne transmissions caused by the dissemination of droplet nuclei (aerosols) that may remain infectious over longer time periods when suspended in air, especially when ventilation is poor [48].

By relying on beacons, we can detect potential transmission among users who visit a location within a certain period of time without ever meeting.

• Identifying superspreading events. Given a diagnosed patient A, contact tracing can find individuals at risk in two fundamentally different ways. First, by finding people that A may have met and infected. Second, by identifying the event where A was infected, and then looking for others who have attended that event. Most European countries have focused on the former but there is growing evidence that countries that have focused on the latter, e.g., Japan, have been more successful at containing the spread of COVID-19. This can be in part due to the observed overdispersion: most infected individuals do not infect anyone, while a few superspreaders infect many [32, 42, 43, 60]. Therefore, while the diagnosed patient A is unlikely to be a superspreader herself, she is likely to have been infected by a superspreader. If contact tracing can identify the event where she was infected, we may be able to trace an entire cluster [37].

Due to the lack of circumstantial information, SPECTs do not facilitate identification of such events, leading to the recommendation that citizens should in addition keep a 'cluster diary' [35]. Our beacon-based system can address this limitation. Users diagnosed after encountering a beacon can help authorities identify a potentially relevant event. In this case, authorities may decide to notify all attendees of the event, even if they did not attend at the same time as a diagnosed patient, and even if they do not carry a dongle (by conventional means of contact tracing or even via broadcast media in the case of public events).

• Interoperability with manual contact tracing. While digital contact tracing has not played a major role in containing COVID-19 in most countries to date, manual contact tracing has proven important [27]. However, most digital solutions fail to provide assistance to manual contact tracing and cannot benefit from information obtained through manual contact tracing. This is in contrast with our beacon-based system, which can assist and benefit from manual contact tracing.

A number of ongoing efforts seek to address some but not all of the limitations of SPECTs. Specifically, efforts are underway to rely on simple, small form-factor dongles as user devices such as the Trace Together Token [14], the Corona Warning Buzzer [3], the Corona Armband [2], Contact Harald [1], and Minew [8]. In the same spirit as our system, other ongoing efforts, such as the MIT Safe Paths [29], WiFiTrace [55], and CrowdNotifier [5], depart from pairwise Bluetooth encounters and instead focus on presence tracing. In these systems, users match their personal diary of location data on their smartphones, acquired using GPS (Safe Paths), WiFi network logs (WiFiTrace) or QR codes (CrowdNotifier), with the anonymized location history of infected patients. However, by relying on smartphones, these systems are less inclusive and, even in their decentralized versions, are more prone to privacy and security attacks than our system.

Finally, this work focuses only on the tracing component of TTI and, as with SPECTs, the broader TTI strategy that it contributes to could suffer from weaknesses in testing and isolation that are beyond the scope of this work. These weaknesses need to be addressed by governments or health authorities separately. For example, a recent survey [4] has shown that (i) not every individual who needs to get tested is actually tested, either for lack of testing capacity or unwillingness of people to get tested, and (ii) many individuals are unable to self-isolate for a variety of reasons, including sharing their household with others, having to take care of children, not being able to afford weeks of isolation, not being able to take medical leave, psychological reasons, and fear of stigmatization (most prevalent among youth).

We propose PanCast, a system that enables rapid dissemination of risk information to users in a passive, secure, and privacy-preserving manner. Unlike existing SPECTs that record pairwise encounters between user smartphones, we rely on simple dongles recording the presence of BLE (Bluetooth Low Energy) beacons installed in strategic locations. Thus, dongles do not capture or disclose trajectories of interpersonal encounters. Instead, they collect space-time information about their encounters with beacons. In these encounters, the dongles by default only listen to the beacon broadcasts passively (without transmitting anything).

When a user tests positive for the infectious disease, they can voluntarily choose to connect their dongle to the system using a terminal that has both a BLE interface and an internet connection. Through the terminal, the dongle uploads the user's recent history of encountered beacons to the backend together with a certificate of positive diagnosis from the health facility. When permitted or required by law, PanCast can also be set up to allow the user to select a subset of their history for upload by manually going through locations they have visited (see Section 3.3 for details).

A subset of the beacons, called network beacons, are connected with health authorities over the internet and play a special role: they continuously broadcast current risk information over BLE. Risk information is a list of ephemeral ids (called risk entries) of beacons where diagnosed individuals were recently present, as well as encounter circumstances considered risky. Whenever a dongle encounters a network beacon, it passively listens to the beacon's risk broadcast. If the dongle finds a risk entry matching its own recorded space-time history, it alerts its user of potential risk, e.g., through a blinking LED. In case of an alert, health authorities may encourage (or mandate that) the dongle owner present themselves for testing.

The above design enables PanCast to have the following properties:

• Usability. Users carry inexpensive, zero-maintenance, small devices (e.g., keyfobs attached to a keyring or worn on wrist or around the neck) with a minimal user interface. Thus, PanCast enables the participation of technology-challenged, economically-disadvantaged, and physically-challenged individuals.

• Utility. PanCast provides utility for users and health authorities. Users obtain risk notification, while health authorities receive the relevant space-time information about infection events, which they can use for epidemiological analysis.

• Privacy. In PanCast, a user device never transmits any information unless the owner chooses to do so (e.g., after the user has tested positive). When an individual is diagnosed, they explicitly consent to the transmission and can select the information they wish to transmit from their device to the health authority. Even the most privacy-conscious users who disclose nothing and never transmit anything from their dongles (i.e., only listen passively) receive risk notifications arising from space-time proximity to other diagnosed users who choose to disclose information after their diagnosis. Thus, from a privacy standpoint, the user experience is closer to that of manual contact tracing, where users decide specifically which information to share and those at risk get notified. Moreover, a user learns about potential risks in a visited location only when a diagnosed individual visited the location within a time window, but does not otherwise learn the location history of a diagnosed individual. Finally, the risk broadcast provides strong differential privacy guarantees for the number of risk entries and the number of diagnosed individuals contained in a risk broadcast. This ensures that an adversary, even one with offline information about some users, learns nothing about the health status or whereabouts of the remaining users through the system.

• Security. PanCast's data collection and risk dissemination protocols are immune to many of the attacks that are possible with SPECTs [24, 57]. Moreover, PanCast's simpler devices have a smaller attack surface compared to smartphones.

• Interoperability. PanCast can effectively and transparently complement manual contact tracing. Health authorities may manually obtain location data from consenting diagnosed individuals and insert records into PanCast. Thus, even users who do not carry a dongle can contribute to subsequent risk estimates and broadcasts. Vice versa, by providing a user-comprehensible record of visited beacon locations, PanCast can be used as a diary aiding the memory of individuals who participate in manual contact tracing. Moreover, since our system can associate risk events with locations, information about potential superspreading events can be broadcast also using traditional means of communication. Furthermore, PanCast can be designed to inter-operate with existing SPECTs.

We acknowledge that PanCast requires investment in infrastructure: beacons must be installed in relevant locations and users need to be provided with dongles. However, PanCast's utility scales more quickly with the degree of deployment than that of SPECTs, which suffer from the $n^2$ problem as described above. Thus, a strategic deployment plan can help ease the burden of investment while covering high-risk areas that contribute to significant spread of infection. Initially, beacons can be placed in locations where infections are most likely. In addition, network beacons can be colocated or integrated with existing WiFi base stations, which can further reduce the installation and maintenance costs for these beacons. Finally, we expect the battery-operated BLE beacons and dongles to be very cheap, on the order of 10 euros per device when produced in bulk. Future updates to PanCast protocols can also be easily pushed to the devices through signed, over-the-air firmware upgrades, thus providing extensibility.

Possible deployment scenarios include the case where, initially, specific sites such as a hospital, a company, or a school decide to roll out PanCast across their premises. This may enable a country to keep critical infrastructure open during lockdown phases of a pandemic, while retaining the ability to swiftly test, trace, and isolate cases even if manual contact tracing systems are overloaded. It would also allow us to gather data to better understand where infections take place in order to take countermeasures. Public acceptance for such measures to improve safety in the workplace is likely to be higher than for systems such as SPECTs that continuously monitor encounter events even when people are in their homes.

We start with an overview of PanCast's components and our trust assumptions about them. Figure 1 shows an overview of PanCast's architecture. PanCast comprises two types of beacons (BLE-only and BLE+network), personal devices (dongles), terminals, and a backend platform that relays risk notifications and aggregates data for epidemiological analysis. All beacons and dongles are registered and authenticated with the backend, receive a secret key from the backend at the time of registration, and have a coarse-grained timer and a small amount of flash storage. When a user receives a dongle, they receive a list of one-time passwords (OTPs) that are also stored in the dongle. The user uses these OTPs to authenticate to the dongle during testing and to control upload of data from the dongle. We elaborate on the components and their functionalities next.

Beacons. There are two types of beacons. BLE beacons are commodity, battery-operated BLE-only beacons and require no network connection to the backend. Network beacons also use BLE, but additionally require mains power and a network connection to provide connectivity to backend servers. The beacons serve two purposes:

(i) Every beacon provides a localization point in a specific place (e.g., an office, a bar, a public bus).

A beacon periodically broadcasts an ephemeral id, its device id, and a location id that is either a fixed geo-coordinate or a service id identifying the beacon's trajectory. The device id and the location id are cryptographically signed by the backend, while the ephemeral id is generated by the beacon by hashing a beacon-specific secret key, its location id, and an epoch number derived from its local clock. Because the hash includes the epoch number, a fresh ephemeral id is broadcast in every epoch. This localizes every encounter between the beacon and a dongle to a specific epoch. The epoch length is set upfront to a small value, e.g., 15 minutes.

(ii) Network beacons additionally broadcast global risk information received from the backend periodically using a protocol that balances privacy and efficiency.
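The ephemeral id derivation described in (i) can be sketched as follows. This is a minimal illustration rather than the PanCast specification: the text only states that the id is a hash over the beacon's secret key, its location id, and an epoch number. SHA-256, the length-prefixed field encoding, and the 16-byte truncation are assumptions made here, and the function names are hypothetical.

```python
import hashlib
import struct

EPOCH_SECONDS = 15 * 60  # epoch length; the text gives 15 minutes as an example
EPH_ID_BYTES = 16        # ASSUMPTION: output length is not specified in the text


def epoch_number(clock_seconds: int) -> int:
    """Derive the epoch number from the beacon's coarse-grained local clock."""
    return clock_seconds // EPOCH_SECONDS


def ephemeral_id(secret_key: bytes, location_id: bytes, epoch: int) -> bytes:
    """Hash (secret key, location id, epoch) into a per-epoch ephemeral id.

    ASSUMPTION: SHA-256 with length-prefixed fields, so that different
    (key, location) splits cannot produce the same hash input.
    """
    h = hashlib.sha256()
    for field in (secret_key, location_id):
        h.update(struct.pack(">H", len(field)))  # 2-byte big-endian length prefix
        h.update(field)
    h.update(struct.pack(">Q", epoch))           # 8-byte big-endian epoch number
    return h.digest()[:EPH_ID_BYTES]


if __name__ == "__main__":
    key = b"\x01" * 32                 # placeholder beacon secret, illustration only
    loc = b"hypothetical-location-id"  # placeholder location id
    now = 1_600_000_000                # placeholder clock value in seconds
    print(ephemeral_id(key, loc, epoch_number(now)).hex())
    print(ephemeral_id(key, loc, epoch_number(now + EPOCH_SECONDS)).hex())
```

Because the backend holds each beacon's secret key and location id, it can recompute the same sequence of ephemeral ids for any epoch, while an observer without the key cannot link ids from different epochs under the usual assumptions about the hash function.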

Beacons are installed in specific locations, for instance, under the guidance of health authorities or by organizations who voluntarily place them on their own premises. Stationary beacons broadcast a GPS coordinate or a named identifier (e.g., a city or zip code). Beacons may also be installed in mobile locations like a city bus or a train; they broadcast an id that identifies their trajectory (e.g., a train service id). All beacons are registered with the backend using their id and their location id comprising their stationary coordinate or trajectory, as well as information about their location that may be epidemiologically relevant (e.g., indoor, outdoor, ventilation, air quality, ambient noise level). This information can be used by the backend when computing infection risks or performing epidemiological analyses.

Beacons may serve one or both of the above functionalities. Simple, battery-operated BLE beacons would account for the majority of beacons. They are cheap and easy to install, because they do not require mains power or network connectivity. We expect them to be installed wherever infection transmission is likely to occur (e.g., in places where people congregate). A smaller number of network beacons provide nearby dongles with risk information. To reduce installation costs, we expect them to be installed where power and network connectivity are already available, e.g., next to a WiFi base station.

Beacon placement. The density and placement of beacons are important for minimizing false positives and false negatives in PanCast. False positives arise when a user receives a risk notification even though they have not been in close contact with a diagnosed user. For instance, a false risk notification may be generated when two users encounter a beacon placed on a glass door but from opposite sides of the door. Such false positives can be reduced by placing a sufficient number of beacons in a location and using well-known localization techniques, and by relying on additional labels on beacons (e.g., whether a beacon is indoor or outdoor). False negatives arise when potential transmissions between users are missed, for instance, because of users meeting in locations where there are no beacons. Beacon deployments can be planned strategically to minimize false negatives. For instance, restaurants are likely to be more crowded than parks; therefore, restaurants should be prioritized in a partial rollout.

Dongles. Dongles are small, simple devices that users can attach to a keyring, or wear on the wrist or around the neck. They operate off a coin battery and have a minimal user interface in the form of an LED that indicates risk status and battery condition, and a button to control the LED notification.

Dongles continuously listen for BLE transmissions from nearby beacons. They receive ephemeral ids from both types of beacons and store them along with the timestamps. When in proximity of a network beacon, they additionally receive risk information, compare that information with the device's stored history of ephemeral ids received from beacons, and alert the user in case they were in proximity of a confirmed patient under circumstances that suggest a possible transmission.
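The matching step that a dongle performs can be sketched as below. This is a simplified illustration under stated assumptions: a risk broadcast is modeled as a plain collection of ephemeral ids, the `Encounter` record and both function names are hypothetical, and the additional encounter circumstances considered risky, as well as the privacy-preserving encoding of the broadcast, are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Encounter:
    """One stored beacon sighting (hypothetical record layout)."""
    ephemeral_id: bytes
    timestamp: int  # dongle-local time in seconds


def matching_encounters(
    history: Iterable[Encounter], risk_entries: Iterable[bytes]
) -> List[Encounter]:
    """Return the stored encounters whose ephemeral id appears in the broadcast.

    Each ephemeral id is valid for a single epoch, so a match ties the
    encounter to one beacon during one epoch.
    """
    risk = set(risk_entries)  # set membership keeps the scan linear in the history
    return [e for e in history if e.ephemeral_id in risk]


def should_alert(history: Iterable[Encounter], risk_entries: Iterable[bytes]) -> bool:
    """True if at least one stored encounter matches a risk entry."""
    return len(matching_encounters(history, risk_entries)) > 0


if __name__ == "__main__":
    history = [Encounter(b"id-a", 100), Encounter(b"id-b", 1000)]
    print(should_alert(history, [b"id-b", b"id-z"]))  # a match exists
    print(should_alert(history, [b"id-z"]))           # no match
```

The comparison runs entirely on the dongle, so nothing is transmitted when a match is found; the user is alerted locally, for example through the LED.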

As discussed in more detail in Section 3.3, dongles transmit information only when in the presence of terminals and with the explicit consent of the owner (terminals are introduced below). Normally, such information uploads occur only when the owner has tested positive. However, users have the option of contributing their beacon encounter histories on a regular basis to aid health authorities with epidemic analytics.

Backend service. The backend maintains several databases. First, it maintains a database of registered beacons, their locations/trajectories, and the secret key used by each beacon to generate the unique sequence of ephemeral ids it broadcasts. A second database contains registered users, their dongles, and the cryptographic keys required to authenticate each dongle. A third database, called the risk database, contains the uploaded encounter histories of recently diagnosed individuals. Finally, a fourth database, called the epidemiology database, contains the encounter histories of healthy users who chose to contribute their data for epidemic analytics.

The encounter database is available to health authorities for analytics, e.g., to identify hotspots and superspreading events, and to estimate epidemiological parameters. Moreover, it is used by the backend to transmit risk information to the network beacons, which those beacons in turn broadcast to nearby user dongles.

User terminals. User terminals are provided at locations that issue dongles as well as health care facilities that do testing. Terminals allow users to connect to their dongles over BLE to change their privacy settings, inspect what data is recorded on their dongles, upload data to the backend and, when allowed or required by law, decide what subset of the recorded information they wish to upload when they are diagnosed. Users can also use personal computers or smartphones as terminals to perform these tasks in the comfort of their homes, or use the smartphone of a care provider who visits their home.

Beacons. Beacons are operated by untrusted parties, may fail, be accidentally misconfigured, or be corrupted by malicious parties. However, we assume that only a small fraction of beacons is affected at any time.

BLE and network beacons are managed by untrusted third parties, but they are registered and approved by the backend, run trusted firmware, and only accept firmware updates signed by the backend. Network beacons relay risk information from the backend to user dongles. This information is signed by the backend so dongles can detect unauthorized changes by corrupt beacons. For timely dissemination of risk information (liveness), PanCast requires that most beacons transmit risk information without change.

BLE beacons are prone to four types of issues, which may lead to inconsistencies in data collection and subsequently incorrect risk notification and epidemiological analysis. First, beacons may fail; for example, they may run out of battery or may be rebooted at any time. Clocks in beacons may go out of sync due to failures and reboots, or due to clock drift. Second, a beacon may be installed in a location different from the one it was registered for; such misconfigurations may result from human error or deliberate acts. Misconfigured beacons can cause the backend to mis-identify the physical location of potential transmissions, hotspots, or superspreading events. Third, attackers could relay and re-broadcast a legitimate beacon's transmissions at a different location, even in near-real time. Relay attacks can cause the system to temporarily infer transmission risks that do not actually exist (false positives). Fourth, an attacker could obtain illegitimate access to a beacon and hack it or break into it physically. A compromised beacon allows the attacker to steal its secret key and replicate the beacon's broadcast, thus creating false positives as with relay transmissions above. We assume that such failures, misconfigurations, and attacks are sporadic and uncoordinated. The backend can detect misbehaving and failed beacons, and initiate a manual repair. PanCast's backend can also detect and correct inconsistencies in the collected data to a large extent. While a beacon is unavailable for any reason, transmissions in its vicinity may be missed by PanCast (false negatives).

We do not address side-channel leaks of beacon secrets, for instance, through power consumption or electromagnetic radiation.

Dongles. Dongles may fail or be corrupted by malicious parties. However, we assume that only a small fraction of dongles is affected at any time.

Like beacons, dongles are registered and approved by the backend, run trusted firmware, and only accept firmware updates signed by the backend. What information users must provide to register and obtain a dongle is subject to local policy and legislation. Minimally, evidence should be required to prevent any individual from registering many dongles. Requiring additional information such as names and contact details enables health authorities to actively trace individuals at risk, but raises privacy concerns that may limit adoption.

Like beacons, dongles may fail and be physically compromised. Dongle failures (e.g., reboots, permanent failures) may cause false negatives due to missed beacon broadcasts while the dongle is unavailable or the user obtains a new one. A physical compromise may allow the attacker to steal the owner's data from the dongle and make up plausible encounter histories, thereby potentially causing some false positives, some false negatives or adding misleading information about user behavior.

Side-channel leaks of secrets from dongles are out of scope. Also, PanCast does not address leaks through the sizes and timing of location history uploads from dongles to the backend. In principle, these can be mitigated by adding chaff traffic to dongle uploads, and having users upload chaff data at random times.

Backend. PanCast's backend service and its controlling authority are trusted to not leak or misuse user registration information, secret keys shared with devices, the encounter histories that users upload, and the backend's private key. The backend is also trusted to transmit risk information correctly to beacons.

User terminals. Terminals are trusted to a limited extent.

There are two cases. First, users may use their personal computing devices (computers or smartphones) as terminals. These computing devices are implicitly trusted by users. Second, users who do not own personal computing devices may use public terminals, or use the smartphone of a care provider. These terminals are trusted to a limited extent: (i) When such a terminal is used to select information on a dongle prior to upload, the terminal must be trusted to not leak that information. (ii) When such a terminal is used to send commands to a user dongle, the terminal is trusted to send the correct commands. Terminals are not trusted in any other way.

Eavesdroppers. Any BLE device may receive and store beacon broadcasts. However, we assume that no party has the ability to continuously receive and collect the BLE transmissions of a significant fraction of beacons within a large, contiguous geographic region.

BLE eavesdroppers can receive the transmissions of beacons they are close to and conspiring users may aggregate their observations. However, we assume that no party has the ability to continuously receive and collect the transmissions of a significant fraction of beacons within a large, contiguous geographic region. Malicious users and eavesdroppers may attempt to combine information obtained through PanCast with any amount of auxiliary information about a subset of users obtained through offline channels to try and violate the privacy of the remaining users.

We elaborate on PanCast's security and privacy properties in Section 6.

We next describe PanCast's operation involving device registration, encounter logging, encounter uploads to the backend, and encounter data processing at the backend.

When a dongle is registered, it receives a dongle id $d$, the backend's public key, an initial clock $C_d$ synced to real time, a secret key $sk_d$, and a list of one-time passwords (OTPs) from the backend. When a beacon is registered, it receives a beacon id $b$, the backend's public key, a location id $loc_b$ corresponding to the location where the beacon is supposed to be installed, an initial clock $C_b$ synced to real time, and a secret key $sk_b$ from the backend. The backend stores the registration data of the beacons and dongles in a device database. Specifically, the database contains entries of the form:

{device type, device id} :- {secret key, initial clock, clock offset, <location id>}

where the device type is either beacon or dongle, and the location id exists only if the device type is beacon. The clock offset in the backend's database denotes any divergence between the local timer of a device and real time that is known to the backend. It is initialized to 0 during device registration. The clock offsets for a dongle $d$ and a beacon $b$ are denoted $\delta_d$ and $\delta_b$, respectively. To simplify the presentation, in this document, we assume that all beacons and dongles are initialized synchronously, i.e., the initial clock is set to zero in all devices at the same time; however, this is not a necessary assumption.
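As an illustration, the device database entry described above could be modeled as follows. This is a minimal sketch: the class and method names (`DeviceRecord`, `DeviceDatabase`, `register`, `lookup`) are hypothetical and only mirror the fields named in the text; the in-memory dictionary stands in for whatever storage the backend actually uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class DeviceRecord:
    secret_key: bytes
    initial_clock: int                   # minutes, as assigned at registration
    clock_offset: int = 0                # initialized to 0 at registration
    location_id: Optional[bytes] = None  # present for beacons only


class DeviceDatabase:
    """Maps (device type, device id) to the registration record."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, int], DeviceRecord] = {}

    def register(self, device_type: str, device_id: int, secret_key: bytes,
                 initial_clock: int,
                 location_id: Optional[bytes] = None) -> DeviceRecord:
        if device_type not in ("beacon", "dongle"):
            raise ValueError("device type must be 'beacon' or 'dongle'")
        # The location id exists if and only if the device is a beacon.
        if (device_type == "beacon") != (location_id is not None):
            raise ValueError("location id is required for beacons only")
        record = DeviceRecord(secret_key, initial_clock, 0, location_id)
        self._records[(device_type, device_id)] = record
        return record

    def lookup(self, device_type: str, device_id: int) -> DeviceRecord:
        return self._records[(device_type, device_id)]
```

Keying on the pair (device type, device id) lets a beacon and a dongle share the same numeric id without colliding, which matches the entry format given above.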

Each dongle and beacon has a coarse-grained timer of 1-minute resolution ($t_d$ and $t_b$, respectively), which is set to the initial clock value provided by the backend, and subsequently increments once every minute. A device stores its timer value to local flash storage at intervals of fixed length $L$, called epochs. A non-persistent variable tracks the epoch id or the number of epochs elapsed since the device's start ($i_d$ and $i_b$ for the dongle and beacon, respectively). As stated earlier, in practice, we suggest an epoch length $L = 15$ minutes.

The secret key of a device is known only to the device and the backend. This key is used to mutually authenticate the device and the backend to each other, and to establish a secure channel between the two, whenever the two communicate with each other. The OTPs provided to a dongle are also given to the dongle's owner in human-readable form (e.g., printed on paper). These OTPs are used to mutually authenticate the owner and the dongle whenever the owner interacts with the dongle (e.g., to initiate data upload to the backend). See Section 3.3 for details of the communication between devices and the backend, and between dongles and their owners.

A beacon generates a new ephemeral id every epoch. In its $i$-th epoch $i_b$, the beacon $b$ generates an ephemeral id $eph_{b,i}$ that is computed as follows.

$$eph_{b,i} = \mathrm{hash}(sk_b, loc_b, i_b)$$

The beacon broadcasts the tuple $\{eph_{b,i}, b, loc_b, t_b\}$, which is captured by nearby user dongles. Suppose a dongle $d$ encounters a beacon $b$ when the dongle's local timer is $t_d$, and the beacon's timer, epoch and ephemeral id are $t_b$, $i_b$, and $eph_{b,i}$, respectively. The dongle logs an entry, enctr, in its persistent storage where enctr is a tuple defined as:

$$enctr = \{eph_{b,i}, b, loc_b, t_b, t_d\}$$

Beacons may broadcast the ephemeral ids several times within an epoch. Dongles persist only one entry for each unique ephemeral id they receive.

Data structure size and storage requirements. Location identifiers $loc_b$ are 8 bytes in size. These bytes may be used to represent GPS coordinates or a hierarchical location naming scheme (e.g., country.city.zip, or country.city.bus_id). Beacon timers and epoch counters are 4 bytes in size and have a 1-minute resolution. Thus, they can represent $2^{32}$ minutes (more than 8,000 years), which is well beyond the expected lifetime of the beacons. Beacon identifiers $b$ are 4 bytes in size, and represent a simple global counter for all beacons. Thus, we can support roughly 4.3 billion beacons in the world. The hash in the ephemeral id is generated by computing SHA-256 of the inputs and taking the least significant 15 bytes of the result. Overall, each beacon broadcast ($\{eph_{b,i}, b, loc_b, t_b\}$) is 15 + 4 + 8 + 4 = 31 bytes in length.
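The ephemeral id computation and the 31-byte broadcast layout can be sketched as follows. Several details are assumptions made only for this sketch: the hash inputs are concatenated in the order secret key, location id, epoch id; integers are encoded as 4-byte big-endian values; "least significant 15 bytes" is read as the last 15 bytes of the digest; and the epoch id is derived as the timer value divided by the epoch length, which relies on the simplifying assumption that timers start at zero. The function names are hypothetical.

```python
import hashlib
import struct

EPH_ID_LEN = 15    # bytes of the SHA-256 digest kept as the ephemeral id
LOC_ID_LEN = 8     # bytes per location identifier
EPOCH_MINUTES = 15


def ephemeral_id(sk_b: bytes, loc_b: bytes, epoch: int) -> bytes:
    """Truncated SHA-256 over (secret key, location id, epoch id)."""
    # Assumed encoding: plain concatenation, epoch as 4-byte big-endian.
    digest = hashlib.sha256(sk_b + loc_b + struct.pack(">I", epoch)).digest()
    return digest[-EPH_ID_LEN:]


def beacon_broadcast(sk_b: bytes, beacon_id: int, loc_b: bytes, t_b: int,
                     epoch_len: int = EPOCH_MINUTES) -> bytes:
    """Build the payload {eph, b, loc_b, t_b} for beacon timer value t_b."""
    if len(loc_b) != LOC_ID_LEN:
        raise ValueError("location id must be 8 bytes")
    epoch = t_b // epoch_len  # assumes the timer started at 0
    eph = ephemeral_id(sk_b, loc_b, epoch)
    return eph + struct.pack(">I", beacon_id) + loc_b + struct.pack(">I", t_b)


packet = beacon_broadcast(b"\x01" * 16, 42, b"\x02" * 8, t_b=47)
print(len(packet))  # 15 + 4 + 8 + 4 bytes
```

All broadcasts within one epoch carry the same ephemeral id, while the timer field keeps advancing, which is what allows a dongle to store a single entry per ephemeral id.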

Similar to beacons, dongle timers and epoch counters are 4 bytes each and have a 1-minute resolution, and dongle identifiers are 4 bytes in size. Thus, for each unique ephemeral id received from a beacon, a dongle stores 35 bytes of data (the 31-byte beacon broadcast plus the 4-byte dongle timer). We store an encounter with a beacon if the dongle observes the beacon's ephemeral id broadcast for at least $E_{min}$ minutes, which we assume to be 5 minutes.
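The logging rule (one entry per unique ephemeral id, persisted only after the id has been observed for at least $E_{min}$ minutes) might be sketched as below. The sketch assumes that the observation period is measured as the span between the first and the latest reception on the dongle's timer, and that the timer values of the first reception are the ones stored; neither choice is specified in the text. The class `EncounterLog` is hypothetical.

```python
from typing import Dict, Tuple

# (eph, beacon id, location id, beacon timer t_b, dongle timer t_d)
Entry = Tuple[bytes, int, bytes, int, int]


class EncounterLog:
    def __init__(self, e_min: int = 5) -> None:
        self.e_min = e_min
        self._pending: Dict[bytes, Entry] = {}   # seen, but not long enough yet
        self.persisted: Dict[bytes, Entry] = {}  # at most one entry per eph id

    def observe(self, eph: bytes, beacon_id: int, loc_b: bytes,
                t_b: int, t_d: int) -> None:
        """Process one received broadcast at dongle time t_d (minutes)."""
        if eph in self.persisted:
            return  # repeated broadcasts of a stored id are ignored
        first = self._pending.setdefault(eph, (eph, beacon_id, loc_b, t_b, t_d))
        # Persist once the id has been observed over at least e_min minutes.
        if t_d - first[4] >= self.e_min:
            self.persisted[eph] = self._pending.pop(eph)
```

A real dongle would also have to expire stale pending entries; that bookkeeping is omitted here for brevity.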

For making estimates in later sections, we assume that users encounter on average no more than one unique beacon ephemeral id every 5 minutes throughout the day (24 hours). Accordingly, dongles need to store data for 288 encounters in one day or 4032 encounters in a 14-day window (the infectious period for the current COVID-19 disease as determined by health experts), which requires roughly 138KB of persistent storage in a dongle. In reality, a dongle is very unlikely to encounter a distinct beacon ephemeral id every 5 minutes continuously for 14 days, so this estimate is conservative.
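The storage estimate above follows from simple arithmetic, reproduced here as a short sketch. The function name and parameter names are hypothetical; the default values are the ones stated in the text.

```python
def dongle_storage(minutes_per_encounter: int = 5, window_days: int = 14,
                   entry_bytes: int = 35):
    """Return (encounters per day, encounters per window, bytes per window)."""
    per_day = 24 * 60 // minutes_per_encounter
    per_window = per_day * window_days
    return per_day, per_window, per_window * entry_bytes


per_day, per_window, total_bytes = dongle_storage()
print(per_day, per_window, total_bytes, round(total_bytes / 1024, 1))
# 288 4032 141120 137.8
```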

Diagnosed individuals may share their encounter data with health authorities to enable dissemination of risk information to people who might have been in their vicinity. We first discuss the requirements for enabling data upload from users' dongles. Then, we describe different upload mechanisms that can enable users with different levels of technological access to upload their data.

(i) User dongles only have Bluetooth connectivity; therefore, they need to use a networked BLE node to upload their data to the backend. (ii) Since dongles have a minimal user interface, users fundamentally need access to a separate device through which they can authorize their dongle to upload data in case of a positive test. This device, subsequently called a terminal, can be any device that has a graphical user interface, supports BLE, and has an Internet connection. A user needs to trust the terminal to send correct commands to their dongle. In case they wish to perform a selective upload (described below), then they must also trust the terminal with the data stored on their dongle. The terminal can be part of a kiosk installed in a test center, clinic, or doctor's office, or a personal device (e.g., a smartphone or a computer) owned by the user or a care provider. (iii) Users may be allowed to share their data selectively. For this, a terminal can be used to allow users to review the history stored in their dongles and select what subset of the information they wish to release to the backend.

Users typically visit a test center or a clinic for testing and receive their results offline via email or phone. If the test is positive, they may wish to upload their data. Users may also voluntarily contribute their data for epidemic analytics in the absence of a test. We present different mechanisms for data upload, which vary in the time when data is uploaded and when it is actually released to the backend.

In all cases, patients identify themselves at the time a test is taken using the normal procedures in place for this purpose. Normally, their contact details are recorded along with the id of the test kit used for the patient. Once the test results are available, the user is informed using their contact details. If the result is positive, the notification includes a certificate that the patient has tested positive on the given date, which the user can forward to the PanCast backend.

Delayed release. Here, a user initiates the data upload after they receive a positive test result. If the patient owns a smartphone or home computer, they can use it as a terminal for the upload. Users who don't own or have access to such a device and cannot leave their home to visit a terminal may use the smartphone of a person who visits them to provide essential services (e.g., delivering groceries).

When a user wishes to contribute their data, either because they tested positive and wish to warn other people they have encountered or because they wish to contribute their data for epidemic analytics, they establish a secure connection to their dongle via the terminal. For this, the user enters one of the OTPs they share with the dongle into the terminal. The dongle and the terminal, acting on behalf of the user, use this OTP as a common shared secret to authenticate each other and establish a secure connection. Once the connection is established, the user may enable the upload and attach their test certificate if applicable. The dongle then encrypts the data it stores with a symmetric secret key it shares with the backend, and uploads it to the backend via the terminal, along with any certificate.

Early release. Alternatively, users may choose to upload their data into escrow at the time of their testing, pending a positive test result. Here, the user establishes a secure connection to their dongle using a terminal at the testing site, and consents to contributing their data if the test result turns out positive. The data is immediately uploaded to the backend in encrypted form, using an encryption key derived from one of the OTPs the dongle shares with the user. The user is asked to reveal this OTP to the testing site, which uploads it to the backend along with the certificate if and when the result is positive.

With the early release approach, users trust the testing center to release their escrowed data only in case of a positive test result. Alternatively, users themselves can upload the OTP to release their escrowed information upon receiving a positive test result, for instance, through an automated phone system.

When a user enables a data upload or release, they can optionally select a subset of their recent history. For instance, they may choose not to upload data about visits to sensitive locations. The user may perform this selection using a terminal at the testing center, or using a smartphone or computer at any time. The user must trust this terminal with the currently recorded data on their dongles, as it is being displayed on the terminal.

Once the backend receives a user's encryption key, it decrypts their dongle's encounter entries and verifies their consistency. If the entries are consistent, the backend adds each encounter entry in the form of $\{eph_{b,i}, loc_b, T\}$ to the risk database and/or the epidemiology database. Here, $T$ corresponds to the real time of encounter between a dongle and a beacon.

The consistency check verifies that the value of each ephemeral beacon id $eph_{b,i}$ is consistent with the time at which the dongle received it. It accommodates known clock offsets between encountered beacons and the dongle. We explain this in more detail below. We describe the handling of other forms of inconsistencies, for instance, due to misconfigured beacons or relay attacks, in Section 6.2.

Verifying an encounter ephemeral id, $eph_{b,i}$. Recall that an encounter entry is of the form

$$enctr = \{eph_{b,i}, b, loc_b, t_b, t_d\}$$

Suppose the encounter happened at real time $T$. Taking timer offsets into account, we have the following equation:

$$T = t_d + \delta_d = t_b + \delta_b$$

Here, $\delta_d$ and $\delta_b$ are the clock offsets of the dongle and the beacon known to the backend. The backend checks that $t_d + \delta_d = t_b + \delta_b$. Next, the backend calculates the beacon's epoch id $i_b$ at the time of encounter. The backend retrieves the beacon's secret key $sk_b$ using the beacon identifier $b$ from the encounter entry, and computes the expected ephemeral id $\mathrm{hash}(sk_b, loc_b, i_b)$ for the epoch $i_b$. If this expected ephemeral id matches the ephemeral id $eph_{b,i}$ in the encounter entry, then the encounter entry is consistent and the backend adds it to its risk database, else the entry is ignored.
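The backend-side check can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the epoch id is derived as the beacon timer divided by the epoch length (valid only under the simplifying assumption that timers start at zero), the hash inputs are concatenated with a 4-byte big-endian epoch, and the names `Encounter`, `expected_eph`, and `verify_encounter` are hypothetical.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Encounter:
    eph: bytes       # ephemeral id received from the beacon
    beacon_id: int   # b, used by the backend to look up sk_b
    loc_b: bytes     # location id broadcast by the beacon
    t_b: int         # beacon timer (minutes)
    t_d: int         # dongle timer (minutes)


def expected_eph(sk_b: bytes, loc_b: bytes, epoch: int) -> bytes:
    # Assumed encoding: concatenation, last 15 bytes of the digest.
    digest = hashlib.sha256(sk_b + loc_b + struct.pack(">I", epoch)).digest()
    return digest[-15:]


def verify_encounter(enc: Encounter, sk_b: bytes, delta_b: int, delta_d: int,
                     epoch_len: int = 15) -> Optional[int]:
    """Return the real time T of a consistent entry, or None otherwise."""
    real_time = enc.t_d + delta_d
    if real_time != enc.t_b + delta_b:   # t_d + delta_d must equal t_b + delta_b
        return None
    epoch = enc.t_b // epoch_len         # assumes the beacon timer started at 0
    if not hmac.compare_digest(expected_eph(sk_b, enc.loc_b, epoch), enc.eph):
        return None
    return real_time
```

Entries for which the function returns `None` correspond to the entries that the backend ignores; the returned value corresponds to $T$ in the risk database entry.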

In this section, we describe a protocol for risk dissemination. Such a protocol must balance privacy and efficiency concerns. On the one hand, the protocol must protect the location history and identity of diagnosed patients. On the other hand, the protocol must ensure that relevant risk information reaches potentially affected users' dongles in a timely manner. We opt for a protocol where network beacons broadcast global risk information periodically, and user dongles passively listen for relevant risk information whenever they are in proximity of such a beacon. With this approach, dongles receive risk information while revealing no information about their identity or their history to the backend, the network beacons, or to other users.

The risk information consists of a list of ephemeral ids. The ephemeral id of a beacon b for epoch e is included in the list if and only if a diagnosed individual encountered b in epoch e. To reduce the bandwidth overheads and broadcast delays, the list of risk entries is limited to the period of contagion of the disease (e.g., 14 days in the case of COVID-19). The list can also be compressed using cuckoo filters [40] , at the cost of a small percentage of false positives. For example, to ensure an average false positive rate of 0.01% for each user in a 14-day period, we require a cuckoo filter with entries of size just 27 bits (as opposed to the 15-byte ephemeral ids), which reduces the size of a risk broadcast by ∼4.4x. The risk information may additionally contain values of parameters relevant for risk score calculation performed on user devices, e.g., the weights of various features of an encounter in the risk estimation. All risk information is signed by the backend to allow detection of any tampering by intermediate nodes, including network beacons.

If a dongle has previously received any of the ephemeral ids listed in the risk information, its owner may have been exposed to a diagnosed individual. The device computes a risk score based on the number of matched ephemeral ids and other features of each encounter encoded in the beacon broadcast, using the latest weights received from the backend. If the risk exceeds a certain threshold, the dongle notifies the user via an LED so they can self-isolate and get tested. Users press a button on the dongle to activate the LED; this ensures that they are not notified unexpectedly in public places, which may make nearby people uncomfortable or stigmatize the user.

In the following, we discuss the need to add noise in the risk dissemination protocol and the efficiency of the entire protocol.

We present two scenarios where the number of entries in the risk broadcast could potentially reveal an individual's identity, location history, or health status to an adversary in the locality of the individual. These leaks arise without the adversary having even encountered the individual at all. We then describe our solution to mitigate such leaks.

(i) Whereabouts of diagnosed individuals. Suppose Alice learns from the local news that there was only one case of infection in the past few days within some geographic region. Furthermore, Alice happens to know that Bob was diagnosed, that he carries a dongle, and that he agreed to upload his encounter history when he got diagnosed. Now, if Alice learns that the risk information for her region includes a non-zero number of ephemeral ids, she can infer that Bob left his home at some point while he was contagious. Dually, if Alice sees that no risk information is provided for her region, she can infer that Bob has not been near a beacon within the region and during the period in question. In short, the length of a risk notification broadcast (zero vs. non-zero) reveals to an adversary information about the whereabouts of a diagnosed individual. Note that this problem exists even if a cuckoo filter is used to encode risk information, because the size of a filter optimized for space reveals information about the number of elements it includes.

This information leak arises only if the location history of diagnosed users is always uploaded to the backend. In practice, users have a choice to not upload their history to the backend. If users exercise this choice frequently, then an adversary cannot learn much from the absence of risk entries. Nevertheless, we recognize that, with what we have described so far, such a leak is possible.

(ii) Health status of an individual. Suppose Alice lives in an area with very few people, say n people, and Alice is able to track the movements of n − 1 of these people through outside channels. Suppose the risk information Alice receives contains more ephemeral ids than can be accounted for by the movements of the n − 1 people Alice is tracking. At this point, Alice knows that the nth person (whom Alice is not tracking) must be sick as well. Even though such an attack requires a significant amount of offline information and may be difficult to use in practice, it does raise privacy concerns.

Note that both these leaks rely solely on the number of ephemeral ids in a risk notification. We propose to mitigate these leaks by adding noise to the risk notification broadcast in order to hide the actual number of ephemeral ids. For this, we add junk ids that do not correspond to any real beacon and, therefore, do not match the history of any user dongle. Since our threat model assumes that no adversary can monitor the ephemeral ids from a significant fraction of beacons, no adversary can distinguish these junk ids from legitimate ids. The number of junk ids can be chosen to satisfy differential privacy for all individual users, which we describe next.

Differential privacy. Our goal is to add a random number of additional junk ids to a risk broadcast to make the total number of entries differentially private for all individual users. The number of junk ids we add must obviously be non-negative. For this, we adapt a mechanism proposed in prior work [19] . We describe our adapted mechanism next.

Given a risk broadcast, we add N junk ids to it, where

Here, t is a natural number, and X is a random value sampled from a Laplacian distribution with mean 0 and parameter λ truncated to the interval [−t, ∞). Note that N is always non-negative. The values of t and λ we use depend on how much privacy we want. Specifically, to get (ε, δ)-differential privacy, we pick the following λ and t.

Here, A is the sensitivity of the risk broadcast function; it equals the maximum number of risk entries that could be contributed by a single diagnosed individual. We prove in Appendix A that this mechanism is actually (ε, δ)-differentially private. Table 1 shows the 99th percentiles of the number of noise entries required for different values of ε and δ. Here, we conservatively assume that A is 4032 (see Section 3.2). In practice, the number of junk ids must be tuned for the smallest granularity of a region for which statistical information is available from other public sources. For instance, in Germany, noise should be added to the statistics of cities and towns since statistics in Germany are reported at this granularity.
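To make the sampling step concrete, the following sketch draws a non-negative number of junk ids as an offset t plus Laplacian noise truncated to [−t, ∞). It is an illustration only: the truncation is realized by rejection sampling, the noise is rounded down to an integer (an assumption of this sketch), and t and λ are passed in as plain parameters. The values used in the example are arbitrary placeholders, not values derived from ε, δ, and A.

```python
import math
import random


def sample_junk_count(t: int, lam: float, rng: random.Random) -> int:
    """Sample t + floor(X), with X ~ Laplace(0, lam) truncated to [-t, inf)."""
    if t < 0 or lam <= 0:
        raise ValueError("t must be non-negative and lam must be positive")
    while True:
        # The difference of two unit exponentials is a standard Laplace variate.
        x = lam * (rng.expovariate(1.0) - rng.expovariate(1.0))
        if x >= -t:                   # reject draws below the truncation point
            return t + math.floor(x)  # x >= -t and integer t imply a result >= 0


rng = random.Random(2020)
counts = [sample_junk_count(t=100, lam=20.0, rng=rng) for _ in range(5)]
print(counts)  # five non-negative integers
```

Rejection sampling is used here for clarity; when t is several multiples of λ, as one would expect for small δ, rejections are rare.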

We now discuss the anticipated bandwidth and time requirements for a network beacon to broadcast risk information of a certain size. We estimate the time to broadcast this risk information using the latest BLE (Bluetooth Low Energy) protocol v5.2 [16] .

Bandwidth. As discussed in Section 3.2, we conservatively assume that a diagnosed individual records about 288 unique beacon ephemeral ids in a day ($N_{day}$). Furthermore, we assume that individuals are diagnosed at the very end of the infection window $W$; in other words, they upload their encounter entries for the last $W$ days. Thus, the number of risk entries from each diagnosed individual is $N = N_{day} \cdot W$. Given $I$ diagnosed individuals, each having generated $N$ risk entries of size $S$ bytes each, the total number of bytes required for broadcasting all risk entries is $B = I \cdot N \cdot S$ bytes.

Bluetooth Low Energy (BLE) v5.2 supports a new form of communication: isochronous channels [16] . Isochronous channels support both connection- and broadcast-oriented schemes. Here, we use only broadcast-oriented channels. A broadcast isochronous group (BIG) can contain up to 31 individual broadcast isochronous streams (BISs); each stream transmits data as bursts of packets in events that are synchronized to a fixed time interval. Streams can be sequential or interleaved. Devices can join a BIG and choose to listen to one or more BISs. As per the specification, BISs transmitted on a physical channel of 2 Mbps (LE 2M PHY) can nominally achieve a goodput of 1.66 Mbps. In practice, interference from WiFi and channel errors may reduce this goodput. Assuming that we can practically achieve a goodput of at least 0.8 Mbps, the time to broadcast $B$ bytes in a single BIS is at most $\mathrm{Delay} = (B \cdot 8)/(0.8 \cdot 10^6)$ seconds.
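The two formulas above can be combined into a small calculator. The function name is hypothetical, and the figure of 1,000 diagnosed individuals in the example is an arbitrary illustrative input, not a value taken from the tables; the example also uses uncompressed 15-byte entries without noise.

```python
def broadcast_cost(diagnosed: int, entries_per_day: int = 288,
                   window_days: int = 14, entry_bytes: int = 15,
                   goodput_bps: float = 0.8e6):
    """Return (B in bytes, delay in seconds) for a single BIS."""
    entries_per_person = entries_per_day * window_days           # N = N_day * W
    total_bytes = diagnosed * entries_per_person * entry_bytes   # B = I * N * S
    delay_s = total_bytes * 8 / goodput_bps                      # 8B / goodput
    return total_bytes, delay_s


total_bytes, delay_s = broadcast_cost(diagnosed=1000)
print(total_bytes, round(delay_s, 1))  # 60480000 604.8
```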

Broadcast estimates. Table 2 shows the bandwidth and latency requirements based on daily new COVID-19 cases reported in different countries [31] . S is 15 bytes and W is 14 days. Here, we assume that the risk entries of each region are coded as a cuckoo filter of size equal to the number of risk entries (column 3), and with entries of size 27 bits, which corresponds to a false positive rate of 0.01% for users in a 14-day period. Moreover, we include 318,262 noise entries, the 99%-ile of noise needed to provide differential privacy with ε = 0.1 and δ = 0.001.

In the above, note that the estimates in Table 2 are conservative in two ways: the maximum number of risk entries uploaded by a single diagnosed individual and the amount of differentially private noise added. The latter is also affected by the maximum number of risk entries possible from a single individual. In practice, most users are unlikely to encounter 288 unique ephemeral ids every day (as this corresponds to seeing a new ephemeral id every 5 minutes).

Table 2: Bandwidth and time requirements for broadcasting risk information (organized as cuckoo filters) in different countries. We include the cost of 318,262 noise entries, which is the 99%-ile for differential privacy with ε = 0.1 and δ = 0.001. Column 2 contains the 7-day moving average of daily new cases as of 11 October 2020.

Optimizing broadcasts. Users may be interested in the risk information of only a few specific regions, such as their neighbourhood and place of work, or their daily commute routes. We propose several optimizations to allow faster dissemination of relevant information to users. First, we organize the risk entries by regions of reasonable size (e.g., countries or cities) and transmit the entries of each region on a separate BIS stream. Second, we reorder the streams in different network beacons based on anticipated priorities of users in the region. For example, beacons at airports may broadcast the risk information of source and destination cities before the information of other regions, whereas beacons within a city may broadcast risk information of the city first, followed by that of neighboring cities, the state, the country, and so on. Finally, multiple network beacons can be placed in a location to parallelize risk broadcasts of different regions. Beacons can be connected to a common power source and configured to use nonoverlapping frequency bands for broadcast streams. Each beacon then broadcasts the information of a few regions using broadcast streams in its frequency bands. User dongles can tune to specific beacons to receive risk information of only specific regions.

Disregarded alternative: Actively querying risk information. We considered another alternative for risk dissemination, whereby users actively query the backend directly for risk information of specific regions. However, in this alternative, users' queries are inevitably revealed to the backend, which can then potentially infer their location history. Hence, we disregard this design in favor of the completely privacy-preserving passive broadcast model. Note that, in our current design, a user can receive risk notifications without ever transmitting anything from their dongle.

Whenever a user dongle receives risk information from the backend, it updates the owner's risk score locally (within the dongle). The individual risk score is proportional to the period during which the individual and diagnosed individuals were near the same beacons, as measured by the number of ephemeral ids contained in the risk information matching those stored in the dongle. Each ephemeral id may be weighted differently according to beacon-dependent parameters, such as indoor/outdoor, air quality, ventilation, and ambient noise. These parameters were stored by the dongle when it received the beacon's transmission. How these features are weighted depends on parameters provided by the backend as part of the risk information. The parameters can be determined by the backend using machine learning techniques [45] and reflect the latest scientific knowledge about the disease. An individual's risk score can serve as an additional input in subsequent laboratory testing of the individual if we view a laboratory test as a probabilistic procedure with nontrivial sensitivity and specificity characteristics [18] .
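A dongle-side score update along these lines might look as follows. The scoring rule is an assumption made for illustration: each matched ephemeral id contributes a base value plus a weighted sum of the features stored with the encounter. The text does not fix the functional form, and the feature names, weights, and threshold below are hypothetical.

```python
from typing import Dict, Iterable


def risk_score(stored: Dict[bytes, Dict[str, float]],
               risk_ids: Iterable[bytes],
               weights: Dict[str, float],
               base: float = 1.0) -> float:
    """Sum a per-encounter contribution over all matched ephemeral ids."""
    risky = set(risk_ids)
    score = 0.0
    for eph, features in stored.items():
        if eph in risky:
            # Features without a weight in the latest broadcast count as 0.
            score += base + sum(weights.get(name, 0.0) * value
                                for name, value in features.items())
    return score


stored = {b"a": {"indoor": 1.0}, b"b": {"indoor": 0.0}, b"c": {}}
score = risk_score(stored, risk_ids=[b"a", b"c", b"z"], weights={"indoor": 0.5})
print(score, score >= 2.0)  # 2.5 True
```

Because the computation only reads the dongle's own log and the broadcast list, it needs no transmission from the dongle, in line with the passive design.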

Propagating risk scores. In principle, risk scores could also be propagated from user to user in a probabilistic model if a significant number of yet-to-be-diagnosed users who were exposed to diagnosed users decide to voluntarily upload their history of encounters. However, there is a non-trivial trade-off between privacy and the correct calculation of these scores. To see this, note that if we have a large number of encounters with many different individuals each of whom has a nonzero probability of being infected, then our own infection risk approaches 1 [20, 41] . If the encounters, however, were all with the same person, then the risk approaches the infection risk of that person. In the above, for privacy reasons, we may not be able to disambiguate our contacts. This could be partly addressed by having location-specific models, provided we have information about location-dependent visit profiles (i.e., how many different people tend to visit a beacon site), but this topic is beyond the scope of the present paper.

Table 3: Information about a user disclosed to other entities during PanCast's operation. Notes: *Only if user wishes to use the terminal to select data to upload. **Other users only learn a list of (location, time) pairs where they intersected with at least one sick user. There is no information about how many users were sick, or their identities. ***Beacons only observe differentially-private (noised) broadcast size.

6 PanCast privacy and security

We discuss PanCast's privacy and security properties in this section. Table 3 summarizes the information disclosed to different entities at various stages of PanCast's operation. Note that BLE beacons only transmit information unidirectionally and learn nothing about nearby users. We next elaborate on the possible attacks by the remaining principals based on the information disclosed to them.

The backend learns the encounter histories of users who volunteer to upload this information by design. How much personally identifiable information (PII) the backend learns about users depends on the dongle registration policies of a given jurisdiction. From a technical standpoint, it suffices if the backend knows a pseudonym for each registered user and the associated dongle. In practice, some additional information may be required for Sybil mitigation, such as an email address, phone number, or other id. The backend learns that the owner of a dongle is sick when the dongle uploads its encounter history and the associated test certificate. In addition, the backend learns a subset of the owner's recent whereabouts approved by the owner, at the granularity of beacon visits and epochs. Additionally, the backend learns the whereabouts of volunteer users who have opted to share their encounter history even when not sick. The backend can also infer an over-approximation of the possible social contacts among sick users and volunteers at the granularity of beacon visits and epochs.

In PanCast, user dongles learn nothing about other users except through the risk notifications. Therefore, users cannot learn information about other healthy users through the system. Moreover, users can learn at most the information about diagnosed users that those users have volunteered to upload.

As described in Section 4.1, PanCast provides differential privacy in the number of risk entries from diagnosed users and the number of diagnosed users itself.

However, we note that this measure is effective only against adversaries whose history does not match any of the risk information entries, i.e., against users who were not in the vicinity of a diagnosed patient who uploaded their history. If a diagnosed user Alice uploaded their data to the backend, a user Bob who was recently near a beacon at the same time as Alice may be able to identify Alice or learn about her whereabouts from the risk dissemination information. Here, we describe two scenarios in which a user can learn some information about a diagnosed individual based on the risk information broadcast.

Location history of a diagnosed individual. Suppose Alice learns from the local news that there was one case of infection in the past few days, and she receives the associated risk information. If some ephemeral ids in the risk notification match Alice's recorded ephemeral ids, Alice learns a partial location history of the infected individual. Furthermore, by intersecting the common locations visited by the patient with her out-of-band observations of who was present, Alice may be able to identify the individual or narrow the list of candidates.

Such a leak is inherent in any risk notification system, including decentralized SPECTs, in which user devices compute their own risk score based on encounters with a patient. However, only users who have visited some of the same beacons around the same time as a sick person can benefit from the leak. To exploit the leak to learn the identity and whereabouts of arbitrary sick people, an attacker would require the ability to collect the ephemeral ids transmitted by all beacons within a geographic region of interest, which we believe is largely impractical.

Identity of a diagnosed individual. Suppose Alice visited a location at a time when there was only one other individual, Bob, around. Thus, the ephemeral id of the nearby beacon is recorded only in the dongles of Alice and Bob. If the ephemeral id in question appears in a risk notification, and Alice did not report sick, then she knows that Bob reported sick. This leak arises when only two (or a small number of) people were present at a given time at a given place. Note that such a leak is also inherent to all digital contact tracing systems, centralized or decentralized.

As long as dongles listen to broadcasts of network beacons passively (which is sufficient for PanCast's operation), network beacons cannot learn anything about the presence or identity of the dongle.

When a dongle uploads data to the backend via a terminal, the terminal can observe the size of the upload. The terminal cannot observe the content of the (encrypted) communication nor does it learn the identity of the dongle or user. Since the size of an upload could reveal how many beacons a dongle has recently visited, uploads can be padded to obfuscate this information.
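A minimal sketch of such padding follows, assuming it is applied to the plaintext before encryption so that the terminal only sees sizes that are multiples of a fixed bucket. The bucket size, the 4-byte length prefix, and the function names are illustrative choices, not part of PanCast's specification.

```python
def padded_length(n: int, bucket: int = 1024) -> int:
    """Round n up to the next multiple of `bucket` (at least one bucket)."""
    return max(1, -(-n // bucket)) * bucket


def pad_upload(payload: bytes, bucket: int = 1024) -> bytes:
    """Prefix the true length, then zero-pad to a bucket boundary."""
    framed = len(payload).to_bytes(4, "big") + payload
    return framed.ljust(padded_length(len(framed), bucket), b"\x00")


def unpad_upload(padded: bytes) -> bytes:
    """Recover the original payload using the 4-byte length prefix."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]
```

With a 1024-byte bucket, every upload of up to 1020 payload bytes appears identical in size to the terminal; larger buckets hide more at the cost of bandwidth.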

If the user uses the terminal to select records prior to upload, then the terminal learns the part of the user's history that is stored on the dongle at the time of the selection. To mitigate this concern, a user can use a trusted terminal for this purpose, e.g., their own personal computer or smartphone if they have one.

User dongles transmit data only when uploading information to the backend via a terminal, or to a terminal in case the user wants to select records. In both cases, the data is transmitted from the dongle only over a secure connection, and only after the dongle authenticates the other endpoint. Otherwise, retrieving the information stored in a dongle requires physical access.

We now analyze security risks posed by misconfigured devices and adversarial principals in PanCast, which may lead to inconsistent encounters. Inconsistent encounters may arise in three ways: (i) the clocks of beacons or dongles are out of sync with real time; (ii) a beacon is misconfigured and placed at a location different from where it is registered; or, (iii) an illegitimate beacon re-transmits a legitimate beacon's transmissions at a different location.

In the above, note that inconsistencies in beacons and dongles are irrelevant as long as there are no infections being reported. Next, we discuss mechanisms to identify and mitigate inconsistencies in encounters reported to the backend.

Clock inconsistencies. Encounters become inconsistent when an ephemeral id is found to have been used for more than one epoch length in real time. This may happen when devices crash and reboot after a long time, leading to encounter timestamps that are out of sync with real clock time. The backend can detect and fix such an inconsistency if it receives at least two encounters of an inconsistent device with consistent devices, where one encounter occurred before and the other after the device's crash. That is, the backend can fix an inconsistent beacon if it receives at least two encounters from consistent dongles with the beacon, and similarly, it can fix an inconsistent dongle if it receives at least two encounters of the dongle with consistent beacons. We now show an example of how the backend fixes beacon and dongle inconsistencies.

Suppose, at real time $T$, a dongle and a beacon encountered each other with local timers at $t_d$ and $t_b$, respectively, and clock offsets at $\delta_d$ and $\delta_b$, respectively. Suppose the beacon crashes and reboots at real time $T + \tau$, where $\tau > L$. Without loss of generality, assume that the same dongle encounters the beacon after the beacon has rebooted. The beacon's timer after reboot is $t_b + 1$, while the dongle's timer is $t_d + \tau$. Assume that $t_b$ and $t_b + 1$ lie in a single epoch interval, i.e., the beacon's epoch id corresponding to the two times is the same. The dongle's encounter history includes two entries of encounters with the beacon as follows:

When the backend observes the two entries from the dongle with the same ephemeral id but with the dongle's local timers more than one epoch length apart, the backend knows that the inconsistency was introduced due to a beacon restart. In this case, the backend updates the beacon's clock offset $\delta_b$ and the real time from when the offset comes into effect, and appends the tuple into the beacon's entry in the database. Specifically, the backend appends $\{C_b, \delta_b\}$ to a list of {clock, offset} values stored with the beacon, where $C_b$ is the beacon's global clock in the backend at the time of encounter, and

A similar mechanism can be used to detect inconsistencies due to the crash of a user dongle. If there are multiple entries where the difference in the dongle timestamps does not match with the difference in beacon timestamps, the backend would update the dongle's clock offset appropriately. A beacon or dongle is restored to a consistent state when its clock offset equals the difference between its clock and local timer values.
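The detection rule for a beacon restart can be sketched as follows. The `Entry` layout and function names are hypothetical. The offset computation relies only on the stated condition that a consistent device's offset equals the difference between its clock and its local timer, and it assumes the reporting dongle is itself consistent.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    """One uploaded encounter (hypothetical layout)."""
    ephemeral_id: str
    beacon_timer: int   # beacon's local timer carried in the broadcast
    dongle_timer: int   # dongle's local timer when the broadcast was received


def detect_beacon_restart(first: Entry, second: Entry, epoch_len: int) -> bool:
    """Same ephemeral id seen with dongle timers more than one epoch apart."""
    return (first.ephemeral_id == second.ephemeral_id
            and abs(second.dongle_timer - first.dongle_timer) > epoch_len)


def new_beacon_offset(second: Entry, dongle_offset: int) -> Tuple[int, int]:
    """Return (clock, offset) to append to the beacon's list in the backend.

    Assumes a consistent dongle, so clock = dongle timer + dongle offset,
    and the beacon's new offset = clock - beacon's local timer.
    """
    clock = second.dongle_timer + dongle_offset
    return clock, clock - second.beacon_timer
```

The dongle case is symmetric: swap the roles of the two timers and compare against encounters with consistent beacons.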

Beacon misconfiguration. Inconsistencies also arise if a beacon transmits information inconsistent with its location. Such inconsistencies can arise if (i) a beacon was (accidentally or maliciously) installed in a location different from where it is registered, (ii) a spoofed beacon configured with the secret key of a legitimate beacon re-transmits the same ephemeral ids in a different location, or (iii) an adversary replays the ephemeral ids of a legitimate beacon in other locations [24] . All cases lead to the same inconsistencies due to the fact that the spoofed beacon is in a location different from where it is expected. First, dongles that travel between a spoofed beacon and nearby, legitimate beacons will appear to have traveled implausibly long distances in a short period of time. The backend can detect this when such a dongle uploads its history. Second, users with GPS-enabled smartphones can directly observe the problem when they see a beacon transmission with a signed location that is different from the phone's current location by more than the BLE range. Such phones may report the inconsistency to the backend.
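The first check, implausible travel between consecutive beacon visits, can be sketched with a great-circle distance and a speed threshold. The visit tuple layout, the 70 m/s threshold, and the function names are assumptions made for illustration; a real deployment would also need to account for clock error and BLE range.

```python
import math
from typing import List, Tuple

Visit = Tuple[str, float, float, float]  # (beacon_id, lat_deg, lon_deg, time_s)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (radius 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def implausible_hops(visits: List[Visit], max_speed_mps: float = 70.0) -> List[Tuple[str, str]]:
    """Flag consecutive visits (sorted by time) whose implied speed exceeds the limit."""
    flagged = []
    for (b1, la1, lo1, t1), (b2, la2, lo2, t2) in zip(visits, visits[1:]):
        dt = max(t2 - t1, 1.0)  # avoid division by zero for same-second entries
        if haversine_m(la1, lo1, la2, lo2) / dt > max_speed_mps:
            flagged.append((b1, b2))
    return flagged
```

A flagged pair does not by itself say which of the two beacons is misplaced; correlating flags across many uploaded histories would be needed to single out the offending beacon.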

In this section, we first introduce two deployment scenarios where PanCast may be quickly adopted, then explain how PanCast can be used to support manual contact tracing, and finally describe how PanCast can operate together with existing digital contact tracing systems.

Customer registration. In a growing list of countries, a wide variety of businesses, from restaurants, bars and hotels to cultural institutions and sport facilities, are required to keep records of all their customers, including their names, contact details (e.g., phone number) and times of visits, for a certain period of time [23, 49, 51, 54, 56]. To this end, upon arrival, customers are typically asked to write down this information with pen and paper or to check in by scanning a QR code with their smartphones. Then, if a customer tests positive, the health authorities can use these records to more easily contact customers who might have been at risk. PanCast could provide a similar but more private mode of customer registration. Customers carrying a dongle can register by scanning a code using an NFC (near-field communication) protocol. However, unlike in other systems, the registration information will be stored in the customers' dongles themselves.

In-person teaching at schools and universities. To have in-person teaching, it is important to implement effective testing and tracing strategies in schools and universities [46, 47, 50]. In this context, PanCast could provide a low-cost tracing solution with minimal maintenance: schools and universities could afford to distribute dongles to their student population for free and, as a result, achieve very high adoption levels from day one. This is in contrast with existing digital contact tracing systems, which rely on more expensive devices such as smartphones. Moreover, PanCast could provide valuable contact data to inform investigations and studies concerning the role of children in the transmission of COVID-19, which is not yet very well understood [39].

Scaling manual contact tracing. While the effectiveness of digital contact tracing has yet to be proven, it is widely agreed that manual contact tracing is effective at reducing the spread of infectious diseases such as COVID-19 [30, 38, 59]. In manual contact tracing, contact tracers interview individuals who tested positive and ask them to recall any recent contacts they may have had. Unfortunately, existing digital contact tracing systems are, by design, of little use during a contact tracing interview. In contrast, PanCast could increase the utility of such an interview. More specifically, if the individual who tested positive owns a dongle, the contact tracer could advise them to use the encounters saved in their dongle to better recall the locations they have visited. If the individual who tested positive does not own a dongle, the contact tracer could manually create an entry in the risk database for any of the individual's visits to places with installed beacons that they recall.

Interoperability with existing digital contact tracing systems. In addition to supporting encounter data collection and risk dissemination, PanCast could implement other BLE-based contact tracing protocols in some of the deployed beacons, both centralized ones like PEPP-NTK and decentralized ones like DP3T. By doing so, the encounter data of any individual who tested positive and used one of these other BLE-based systems could also be used to populate the PanCast risk database. Moreover, PanCast could provide relevant location-dependent information to augment contact tracing data from these other BLE-based systems, increasing the precision and specificity of their notifications [45].

Effective digital contact tracing solutions are required to assist and scale manual contact tracing in the face of an infectious pandemic. In this paper, we introduce PanCast, a secure, privacy-preserving, inclusive, and interoperable digital contact tracing and risk notification system. PanCast relies on Bluetooth encounters between simple user devices and strategically positioned beacons, which enables capturing both contemporaneous and non-contemporaneous contagions. The encounters include rich contextual information, such as location and environmental information, which facilitates epidemiological analysis and accurate risk prediction. PanCast minimizes privacy leaks for users, as their dongles mostly operate in passive listening mode: they listen to beacon broadcasts to capture encounters and to receive eventual risk notifications, and only actively transmit encounter information with their owners' explicit consent. Moreover, user dongles and beacons are low cost, have a small attack surface, and require near-zero maintenance and no technical expertise on the part of users. Thus, PanCast enables participation of and provides utility to a larger proportion of the public. Finally, PanCast can complement manual contact tracing and interoperate with existing digital contact tracing solutions, thus providing a path for incremental deployment and adoption.

Combining everything we get the required differential privacy inequality:

Covid-19 impact survey

CrowdNotifier - Decentralized Privacy-Preserving Presence Tracing

Early Evidence of Effectiveness of Digital Contact Tracing for SARS-CoV-2 in Switzer

Global contact tracing app downloads lag behind effective levels

More than 1,300 workers test positive: Germany fights to control coronavirus spread at meat plant

Data Protection and Information Security Architecture

ROBERT: ROBust and privacy-presERving proximity Tracing

TCN Coalition

Bluetooth Core Specification v5

Decentralized Privacy-Preserving Proximity Tracing

Crackovid: Optimizing group testing

Nontracking web analytics

Stochasticity and heterogeneity in the transmission dynamics of sars-cov-2

Aerosol emission and superemission during human speech increase with voice loudness

QR codes for check

Mind the gap: Security & privacy risks of contact tracing apps

BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders

Social distancing alters the clinical course of covid-19 in young adults: A comparative cohort study

Automated and partly automated contact tracing: a systematic review to inform the control of COVID-19. The Lancet Digital Health

Automated and partly automated contact tracing: a systematic review to inform the control of COVID-19. The Lancet: Digital Health

MIT Safe Paths Privacy Preserving WiFi Co-location for Contact Tracing without Prior Scanning of WiFi Signals

SARS-CoV-2 transmission dynamics should inform policy. SSRN 3692807

Privacy sensitive protocols and mechanisms for mobile contact tracing

Michael Wibral, and Viola Priesemann. The challenges of containing sars-cov-2 via test-trace-and-isolate

Ein Plan für den Herbst. Die Zeit

The algorithmic foundations of differential privacy

Implication of backward contact tracing in the presence of overdispersed transmission in COVID-19 outbreak. medRxiv

Contact tracing for COVID-19: current evidence, options for scale-up and an assessment of resources needed

European Centre for Disease Prevention and Control. COVID-19 in children and the role of school settings in COVID-19 transmission

Cuckoo Filter: Practically better than Bloom

COVIsim: an agent-based model for evaluating methods of digital contact tracing

Superspreading in early transmissions of COVID-19 in Indonesia. medRxiv

Characterizing superspreading events and age-specific infectiousness of sars-cov-2 transmission in georgia, usa

The foreshadow of a second wave: An analysis of current covid-19 fatalities in germany

A spatiotemporal epidemic model to quantify the effects of contact tracing, testing, and containment

Reopening schools during COVID-19

How schools can reopen safely during the pandemic

How can airborne transmission of COVID-19 indoors be minimised

Contact tracing at your workplace

Determining the optimal strategy for reopening schools, the impact of test and trace interventions, and the risk of occurrence of a second COVID-19 epidemic wave in the uk: a modelling study

Singapore Government

Adoption of government endorsed COVID-19 contact tracing apps in selected countries as of

Seyit Camtepe, and Damith Ranasinghe. Vetting Security and Privacy of Global COVID-19 Contact Tracing Applications

The Governing Mayor of Berlin. Measures against the corona virus

Network-based Contact Tracing for Infectious Diseases Using Passive WiFi Sensing

Maintaining records of staff, customers and visitors to support NHS Test and Trace

Centralized or Decentralized? The Contact Tracing Dilemma

Is there an association between the level of ambient air pollution and COVID-19?

Contact tracing in the context of COVID-19

Evaluating transmission heterogeneity and super-spreading event of COVID-19 in a metropolis of China

Hence, $\tilde{f}(x)$ is a function of $f(x) + \tilde{X}$. Consequently, by the post-processing theorem of differential privacy [36], it is enough to show that the function $g(x) = f(x) + \tilde{X}$ is $(\epsilon, \delta)$-differentially private. So, pick two adjacent inputs $x, x'$ and any output set $O$. We need to show that $\Pr[g(x) \in O] \le e^{\epsilon} \Pr[g(x') \in O] + \delta$.

Before delving into the details of these proofs, we explain the intuition behind these bounds and our definition of $O_b$. When $g(x) \in O_b$, because of the way we defined $O_b$, $g(x) \le f(x) - t + A$. Since the distance between $f(x')$ and $f(x)$ can be $A$ in the worst case ($x, x'$ are adjacent by assumption and $A$ is the sensitivity of $f$), it is possible in this case that $g(x) \le (f(x') - A) - t + A = f(x') - t$. Note that the lower end of $g(x')$'s range is exactly $f(x') - t$. Hence, in this case, it is possible that $g(x')$ will never equal $g(x)$, so differential privacy could "fail" in this case. This is why this case corresponds to the "$\delta$" part. Dually, when $g(x) \in O \setminus O_b$, we will have $g(x) > f(x) - t + A \ge f(x') - t$, so $g(x')$ will always have a non-zero probability of matching $g(x)$. Hence, this corresponds to the "$e^{\epsilon} \Pr[g(x') \in O]$" case of differential privacy. Now we prove the bounds formally. We start by showing $\Pr[g(x) \in O_b] \le \delta$. Let $X$ denote a random variable sampled from an untruncated (standard) Laplace distribution with mean 0 and parameter $\lambda$. We have: $\Pr[g(x) \in O_b] = \Pr$
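One way to evaluate a bound of this form is sketched here under two assumptions that are not taken from the paper: $\lambda$ is the scale parameter of the Laplace density $\frac{1}{2\lambda} e^{-|x|/\lambda}$, and $A \le t$. Since $g(x) \in O_b$ implies $g(x) \le f(x) - t + A$, i.e., $\tilde{X} \le A - t$, and $\tilde{X}$ is $X$ conditioned on $X \ge -t$:

```latex
% Assumptions (illustrative): \lambda is the Laplace scale, and A \le t,
% so that the CDF F(x) = \tfrac{1}{2} e^{x/\lambda} applies on [-t, A - t].
\Pr[g(x) \in O_b]
  \;\le\; \Pr[\tilde{X} \le A - t]
  \;=\; \frac{\Pr[-t \le X \le A - t]}{\Pr[X \ge -t]}
  \;=\; \frac{\tfrac{1}{2} e^{-t/\lambda}\left(e^{A/\lambda} - 1\right)}
             {1 - \tfrac{1}{2} e^{-t/\lambda}}
  \;=\; \frac{e^{-t/\lambda}\left(e^{A/\lambda} - 1\right)}{2 - e^{-t/\lambda}} .
```

The right-hand side decays exponentially in $t/\lambda$, so under these assumptions a larger truncation point $t$ gives a smaller failure probability for a fixed sensitivity $A$.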

We would like to thank Nuria Oliver, Viola Priesemann, Nasim Rahaman, Peter Schwabe, Clara Scheidewind and Michael Meyer-Hermann for helpful feedback on an earlier version of this white paper, and the CIFAR contact tracing working group for their helpful discussion during the presentation of an earlier version of PanCast.

A Proof of differential privacy of noise added to risk broadcasts

We prove the following differential privacy theorem, adapted from a similar theorem in the Appendix of [19]. Theorem 1. Let $t \in \mathbb{R}^{+}$, and let $\tilde{X}$ be a random variable sampled from the Laplace distribution with mean 0 and parameter $\lambda$, truncated to the interval $[-t, \infty)$. Let $f$ be a $\mathbb{Z}$-valued function with sensitivity $A$. Then, the function $\tilde{f}$ defined as $\tilde{f}(x) = f(x) + t + \tilde{X}$