key: cord-0596643-gpxl3b27
authors: Li, Xiao; Wu, Weili; Chen, Tiantian
title: Blockchain Driven Privacy Preserving Contact Tracing Framework in Pandemics
date: 2022-02-18
journal: nan
DOI: nan
sha: 00a4e7576ec1cc6ee51efe1b9e3e074aebf71c5b
doc_id: 596643
cord_uid: gpxl3b27

Contact tracing has been proven to be an effective approach to controlling virus spread in pandemics such as COVID-19. As an emerging decentralized technique, blockchain has been explored to ensure data privacy and security in contact tracing. However, most existing works are high-level designs without sufficient demonstration, and they treat blockchain as a separate storage system that assists third-party central servers, ignoring the importance and capability of the consensus and incentive mechanisms. In this paper, we propose a lightweight, fully third-party-free Blockchain-Driven Contact Tracing framework (BDCT) to bridge this gap. In BDCT, an RSA encryption based transaction verification method (RSA-TVM) is proposed to ensure contact tracing correctness; it achieves more than 96% accuracy in recording contact cases even when each person has a 60% probability of failing to verify the contact information. A Reputation Corrected Delegated Proof of Stake (RC-DPoS) consensus mechanism is proposed together with an incentive mechanism, which ensures timely reporting of contact cases and keeps the blockchain decentralized. A novel contact tracing simulation environment is created, which considers three different contact scenarios based on population density. The simulation results demonstrate the effectiveness, robustness and attack resistance of RSA-TVM and RC-DPoS in the proposed BDCT.

Since the first case of the novel coronavirus COVID-19 was discovered in December 2019, there have been over 510 million confirmed cases globally, including 6 million deaths, as of April 2022. The COVID-19 pandemic has brought a considerable degree of fear, emotional stress and anxiety to individuals around the world [1]. The virus causes severe acute respiratory infection, with symptoms such as cough, fever, fatigue and breathlessness that closely resemble those of regular influenza. In addition, its high contagiousness makes it even harder to control. To help people who have been in contact with patients get timely medical treatment, it is imperative to record the contact histories of patients.

The World Health Organization (WHO) has emphasized the importance of contact tracing since the Ebola outbreak in 2014 [2]. Formally, contact tracing is the process of identifying past contact cases of people who may have come into contact with infected patients. Many countries have developed contact tracing methods, such as TraceTogether in Singapore [3] and the QR code system in China [4]. Some technology companies have also developed contact tracing tools; for example, Google and Apple developed a Bluetooth-based API that third parties can use to develop smartphone apps [5]. These apps mostly use Bluetooth to recognize nearby devices, or GPS signals to obtain accurate location coordinates, to determine contact cases. Most of these tracing systems rely on central servers controlled by governments or healthcare authorities, which may collect users' identities and other private data through an application installed on smartphones.

Systems based on centralized servers suffer from single points of failure and are vulnerable to attacks. Decentralized contact tracing methods, which give users more control, have therefore been promoted. In a decentralized model, users are not required to upload all their data to a server; they can hold data locally and share it when necessary.

As an emerging decentralized technique for generating, sharing and storing data, blockchain has been introduced into contact tracing to promote security and privacy. A blockchain stores data in blocks that are connected to each other as a chain, and the data stored in blocks cannot be tampered with. Smart contracts deployed on a blockchain can perform various functionalities. Furthermore, encryption and anonymization technologies can be applied in a blockchain system to protect users' identities. The consensus mechanism allows a blockchain system to keep working stably without a central server.

There have been some initial attempts at contact tracing systems using blockchain technologies. Hasan et al. [6] propose proof of location and develop smart contracts to ensure the privacy of contact lists. However, no simulation is provided, and there is no incentive mechanism to motivate users to join the system. The authors assume that plenty of users in the system behave honestly, which is hard to achieve in practice. Xu et al. [7] propose BeepTrace, a blockchain-based contact tracing solution in which a blockchain system plays a neutral role in bridging data transmission between different parties, such as patients, doctors and government authorities. Users' geodata are securely preserved in a specially designed blockchain. However, the efficiency of this system is not demonstrated, and no specific consensus mechanism or incentive mechanism is specified in the paper. Lv et al. [8] propose Bychain, a three-layer contact tracing framework without reliance on trusted third parties. Proof of Location (PoL) is proposed to verify contact records, and an incentive mechanism is designed to maximize the contact tracing range. However, Bychain is not able to produce accurate person-to-person contact cases.

arXiv:2202.09407v2 [cs.CR] 21 May 2022

We identify four main challenges in developing a third-party-free blockchain-based contact tracing method, and these challenges overlap with each other: 1) Instead of simply treating blockchain as a separate storage method, how can the powerful consensus mechanism of a blockchain system be leveraged to promote data security? 2) How can an effective consensus mechanism be designed to organize data storage while achieving low latency in recording contact information? Popular consensus mechanisms are usually too computationally expensive for mobile devices and may introduce significant delays in recording contact information, yet people should be able to access the latest contact records in time to prevent further virus spread. 3) How can the incentive mechanism be designed so that people are motivated to join the contact tracing system and behave honestly? 4) Due to the lack of real-world contact data, as well as the high cost of testing a whole system in practice, it is hard to evaluate the effectiveness and efficiency of whole systems. The difficulty of collecting real-world contact information stems not only from privacy concerns but also from the diversity of contact scenarios. Contact cases in crowded cities and those in rural areas are entirely different scenarios, with different frequencies and numbers. This diversity makes it challenging to design an incentive mechanism that is fair to everyone.

Few works clearly address all four challenges above. In addition, according to a survey conducted in multiple countries [9], although most people accept app-based tracing methods, concerns about security and privacy are still an obstacle to the widespread adoption of tracing apps in many countries. Therefore, in this article, we aim to tackle the four challenges while ensuring users' privacy, by proposing a fully third-party-free contact tracing framework based on blockchain technology. An RSA-based transaction verification algorithm is proposed to ensure the correctness of recorded contact information and improve system robustness. To efficiently store contact information in blocks, we propose the Reputation Corrected Delegated Proof of Stake (RC-DPoS) consensus mechanism, which controls the right to append new blocks. An incentive mechanism is then designed to work with RC-DPoS, motivating people to behave honestly and maintaining system decentralization. Finally, we design a contact tracing simulation method that simulates different real-world contact scenarios to evaluate the effectiveness of the proposed framework.

The remainder of this article is organized as follows.

In Section 2, we discuss existing work related to contact tracing. Section 3 presents an overview of the proposed contact tracing framework. Next, we elaborate on the transaction verification algorithm, RC-DPoS and the incentive mechanism in Sections 4, 5 and 6, respectively. Simulation experiments and discussion are presented in Section 7. Finally, Section 8 concludes the paper.

Contact tracing refers to the process of recording people's contact histories so that those who have been in contact with a patient can be informed and receive timely medical treatment, controlling the spread of the virus.

Various contact tracing tools have been developed using location technologies such as GPS, WiFi, cell tower signals and Bluetooth [10], [11], [12], [13], [14], [15].

Reichert et al. [16] propose a centralized contact tracing method, which assumes every user has their location history stored on their device and health authorities are able to read the data. Nisar et al. [12] propose using call data records to trace a patient once she/he is diagnosed positive. However, the resulting trace is not practical: most people do not answer phone calls very frequently during a day, so only a limited number of locations can be obtained. Contact tracing systems based on GPS signals are not reliable indoors, while indoor contact is a major route of virus spread due to short contact distances and long contact times. Some works propose using WiFi or wireless access points to discover contact cases [10], [11], [13]. These frameworks require users to connect their devices to specific wireless access points. However, in public areas such as shopping malls, airports and train stations, people may not join public WiFi due to network security concerns. Bluetooth technology can scan nearby devices and obtain device identities within a small range, which can help generate contact cases [15]. In this article, we also leverage this advantage to protect users' privacy, avoiding disclosure of users' real identities and specific locations.

Chan et al. [17] propose the PACT protocol, where every user holds contact tracing data on their own local device and broadcasts their contact information to a public platform only when they test positive. Every other user then checks the list on the platform to determine whether they have been in contact with anyone on that list. Although this protocol is third-party free and easy to implement in practice, users are not guaranteed to behave honestly, and the public platform is easy to compromise since it is open to anyone.

Most existing works are centralized, using third-party servers to collect users' personal data and contact histories and to match contact records [18]. Centralized models are exposed to risks of single points of failure, private data leakage and security compromise. Although some contact tracing methods are designed to be decentralized [17], [19], they still require a server to perform data computation and are vulnerable to dishonest behaviours from malicious users.

Blockchain technology was first proposed in [20] as a distributed ledger for the Bitcoin system, which ensures data security without any trust placed in third parties. A blockchain system usually constructs a peer-to-peer network, where each user plays exactly the same role and follows the same protocol. Every user stores a complete copy of the blockchain, so the data on the blockchain are extremely hard to tamper with and single points of failure are naturally avoided. There is no need for a central server to perform functions in the system, such as collecting, computing or storing data. Users (peers) in a blockchain system have equal rights to perform functionalities by executing smart contracts deployed in the system. A consensus mechanism is enforced in the system to control which user is qualified to generate a new block at each step. An incentive mechanism is also important to motivate users to compete for the right to generate a block.

Blockchain technology, first known as a distributed ledger [20], can make a system work stably without any trust established among parties. With anonymization and data encryption techniques, users in a blockchain system can share data securely without compromising privacy. Blockchain technology has demonstrated significant feasibility in IoT applications, which have requirements similar to those of contact tracing systems [21], [22], [23], [24], [25].

Blockchain technology shows great potential for developing privacy preserving and efficient contact tracing applications. Idrees et al. [26] point out several challenges and risks associated with the available contact tracing apps and analyze how the adoption of a blockchain-based decentralized network could provide users with privacy-preserving contact tracing.

Besides BeepTrace [7] mentioned above, there exist many other blockchain-based contact tracing frameworks and systems. Arifeen et al. [27] propose a high-level blockchain-based contact tracing framework in which blockchain is used by patients to publish contact lists. Zhang et al. [28] propose PTBM, which leverages both permissionless and permissioned blockchains to manage users' location data, with 5G technology supporting low-latency communication. In PTBM, authorized third parties, such as medical centers and medical organizations, are able to compute contact histories and publish patients' historical routes.

Peng et al. [29] propose P2B, where users can upload contact information to blockchain storage to be further verified and cross-checked by clients and authorities. P2B is shown to achieve higher data transmission efficiency than BeepTrace. Vangipuram et al. [30] propose a three-tier architecture for storing the large volumes of data collected by the Internet of Medical Things (IoMT) for contact tracing. In this architecture, blockchain is employed to securely transfer data from the infected person to the hospital system using edge infrastructure.

Zuhair et al. [31] consider a sixth-generation (6G)-assisted, unmanned aerial vehicle (UAV)-empowered mass surveillance system in dense areas, which can monitor people's body temperature with thermal imaging sensors. Blockchain also works as a storage system in their work, and with the high bandwidth of 6G, data can be processed with low latency. Salimibeni et al. [32] consider indoor contact tracing scenarios and propose the TB-ICT contact tracing framework, where a dynamic Proof of Work (dPoW) credit-based consensus algorithm, coupled with Randomized Hash Window (W-Hash) and dynamic Proof of Credit (dPoC) mechanisms, is proposed to differentiate between honest and dishonest nodes. TB-ICT can motivate people to behave honestly, since better credit decreases mining difficulty. However, a PoW-based consensus mechanism may bring high computational overhead, while the BLE-equipped devices adopted in the system are usually not computationally powerful.

In this paper, we formulate the contact tracing problem as follows: given Bluetooth signals on smart devices, and under the constraint of preserving privacy, we aim to output pairwise user contact lists by discovering nearby Bluetooth devices. The goals for the contact tracing problem are the completeness and correctness of the contact lists, the robustness of contact tracing, and attack resistance.

We assume that our Blockchain-Driven Contact Tracing framework (BDCT) is implemented and deployed through clients on smart devices. People can join the contact tracing system by installing the client on their smart devices, and each user is assumed to carry one device with the client installed. The client generates a private-public key pair and a unique device ID for each device. The client on a device uses Bluetooth to share its device ID and to obtain the device IDs of other nearby devices. Bluetooth can estimate the distance between two devices within a certain range from the strength of the Bluetooth signal, so the contact distance can be easily computed [33]. The furthest contact distance considered in this paper is 5 meters, within which Bluetooth produces a signal strong enough to support accurate computation. Since we directly record the device IDs of contacts, rather than recording precise GPS coordinates and matching contact information afterwards, no precise location data is recorded and privacy is preserved.
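The text cites [33] for computing distance from Bluetooth signal strength but does not fix a model. As a minimal sketch, assuming the common log-distance path-loss model with a hypothetical calibrated RSSI of -59 dBm at 1 m and a path-loss exponent of 2 (free space), the estimate could look like this. Both constants are illustrative assumptions; real deployments would calibrate them per device and environment.

```python
from statistics import mean
from typing import Sequence

# Hypothetical calibration values (not from the paper): the RSSI measured at
# 1 m, and a free-space path-loss exponent. Both vary by device and setting.
TX_POWER_AT_1M_DBM = -59.0
PATH_LOSS_EXPONENT = 2.0


def estimate_distance_m(rssi_samples: Sequence[float],
                        tx_power_dbm: float = TX_POWER_AT_1M_DBM,
                        n: float = PATH_LOSS_EXPONENT) -> float:
    """Estimate distance in meters from RSSI readings (dBm).

    Inverts the log-distance path-loss model
        rssi = tx_power - 10 * n * log10(d)
    after averaging several readings to damp Bluetooth signal noise.
    """
    rssi = mean(rssi_samples)
    return 10 ** ((tx_power_dbm - rssi) / (10 * n))
```

With these assumed constants, a mean RSSI of -65 dBm maps to roughly 2 m, close to the contact threshold in the text, which illustrates how sensitive the contact/witness boundary is to calibration.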

At a given frequency, the client on a smart device scans and records the device IDs of all nearby devices within range. This process is fast and secure, since the client only scans surrounding devices without establishing connections to them, which also avoids cyberattacks through the Bluetooth channel. If a device is detected within 2 meters (footnote 2), the client identifies this as a contact case. The client then stores the device IDs of the contacted devices locally in a contact list, in a special format specified in the next section.

Most previous works ignore the fact that mobile devices are not as robust as computers in terms of internet connectivity, system robustness and security. A device may fail to collect the contacted device information, or be attacked into recording a false contact list. To improve data integrity, this paper introduces a special role, the witness. All devices that are more than 2 meters but still within 5 meters from the current device are considered witnesses of the contact case. Witnesses play important roles in BDCT: they help verify the reported contact list, speed up the verification process, and recover missed contacts. The client also stores the device IDs of witness devices locally in a witness list, in a format similar to the contact list.
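The split between contacts and witnesses is a simple threshold rule on estimated distance. The sketch below uses the 2 m and 5 m thresholds from the text; treating each boundary as inclusive, and the helper name `classify_nearby`, are assumptions for illustration.

```python
from typing import Dict, List, Tuple

CONTACT_RANGE_M = 2.0  # contact threshold from the text (adjustable, footnote 2)
WITNESS_RANGE_M = 5.0  # furthest distance considered in the text


def classify_nearby(distances_m: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split scanned device IDs into (contact list, witness list).

    Devices within CONTACT_RANGE_M are contacts; devices beyond that but
    within WITNESS_RANGE_M are witnesses; anything farther is ignored.
    """
    contacts: List[str] = []
    witnesses: List[str] = []
    for device_id, d in sorted(distances_m.items()):
        if d <= CONTACT_RANGE_M:
            contacts.append(device_id)
        elif d <= WITNESS_RANGE_M:
            witnesses.append(device_id)
    return contacts, witnesses


# The Figure 1 setting, with made-up distances from u1 to its neighbours.
scan = {"u2": 1.2, "u3": 1.8, "u4": 3.0, "u5": 4.5, "u6": 7.0, "u7": 9.5}
contacts, witnesses = classify_nearby(scan)
```

Here `u2` and `u3` become contacts, `u4` and `u5` witnesses, and `u6` and `u7` are irrelevant, matching the example described for Figure 1.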

2. The distance can be adjusted according to particular scenarios.

With the pseudonymous IDs of contacted devices and witness devices stored locally, users can check at any time whom they have contacted and who has witnessed their contact cases, without knowing the real identities of the device owners. Based on the above setting, we now illustrate the whole Blockchain-Driven Contact Tracing framework (BDCT) in Figure 1 with an example.

In Figure 1, at a given timestamp, assume user $u_1$ would like to report his current contact case, so he initiates a contact tracing transaction $T_{con}$. Assume users $u_2$ and $u_3$ are within 2 meters of $u_1$ and hence are considered in contact with $u_1$. Users $u_4$ and $u_5$ are more than 2 meters but still within 5 meters from $u_1$, so they witness that $u_1$ is with $u_2$ and $u_3$. Users $u_6$ and $u_7$ are irrelevant to this contact case. As shown in the figure, there are 6 steps from generating a contact record as a blockchain transaction to storing the transaction in the blockchain on every device.

Step 1: User $u_1$ initiates a blockchain transaction $T_{con} = \{T_{id}, u_1, ContactList, WitnessList, Timestamp\}$, which records the contact case of $u_1$ at $Timestamp$. Each transaction represents one contact case of a user at some timestamp. $T_{id}$ is a unique transaction ID. $Timestamp$ is the exact time at which $u_1$ contacts the users in ContactList. ContactList and WitnessList contain secret messages from $u_1$ encrypted with the public key of each contacted device ($u_2$ and $u_3$) or witness device ($u_4$ and $u_5$). The formal definitions of ContactList and WitnessList are presented in Section 4.

Step 2: User $u_1$ then broadcasts the transaction $T_{con}$ over the internet to every user who has the client installed. Since no one knows the identities of others, $u_1$ cannot send messages directly to $u_2$, $u_3$, $u_4$ and $u_5$.

Step 3: When other users receive the transaction $T_{con}$, each checks whether it contacted $u_1$ at $Timestamp$ or witnessed the reported contact case. The contacted users in this example, $u_2$ and $u_3$, and the witnesses, $u_4$ and $u_5$, then try to decrypt the received messages, sign the decrypted messages and broadcast the signed transaction, which the transaction generator $u_1$ receives.

Step 4: After receiving the signed transaction $T_{con}$, $u_1$ verifies each signature with the contact's or witness's public key to make sure that the contact list and witness list were signed by the correct people. $u_1$ waits for signatures within a specific delay $d$, such as 60 minutes. Only the records in ContactList that are verified valid are finally preserved in $T_{con}$. $u_1$ then puts $T_{con}$ into a shared transaction pool, which is synchronized on every device along with the blockchain.

Step 5: At a given frequency, one of the candidate miners is selected to package all the transactions in the transaction pool into a block. In this paper, we propose the Reputation-Corrected DPoS (RC-DPoS) mechanism to choose the candidate miners, which is presented in detail in Section 5.

Step 6: The miner appends the block to the blockchain and broadcasts it to all users in the network for synchronization.

Steps 1 to 4 are elaborated in Section 4, where we propose the RSA-based transaction verification method. In Section 5, RC-DPoS and the corresponding incentive mechanism are presented to complete Steps 5 and 6.

In this section, we present the RSA-based Transaction Verification Method (RSA-TVM). A contact tracing system has two major goals: 1) data integrity: the collected contact cases should be as complete, untampered and correct as possible; and 2) privacy: the system should never proactively disclose any location or identity information of users.

We propose RSA-TVM to ensure that the contact records in a transaction are valid while preserving anonymity. We employ the RSA algorithm [34] as the encryption module. RSA is an asymmetric encryption algorithm that generates a key pair (public key, private key) for each user. The public key is known to everyone, while the private key is known only to its owner. A secret message encrypted with a public key can be decrypted only by the owner of the corresponding private key. A message can also be signed with a private key to indicate the owner's consent, and anyone can verify the signature with the corresponding public key to confirm that it was produced by the private key's owner.
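To make the four RSA operations concrete, here is a textbook-RSA sketch in plain Python. It is illustrative only: it builds keys from two well-known Mersenne primes instead of randomly generated 1024-bit keys, and it omits the padding schemes (such as OAEP for encryption and PSS for signatures) that a real client should obtain from a vetted cryptographic library. All helper names are hypothetical.

```python
from typing import NamedTuple, Tuple


class PublicKey(NamedTuple):
    n: int  # modulus
    e: int  # public exponent


class PrivateKey(NamedTuple):
    n: int  # modulus
    d: int  # private exponent


def toy_keypair(p: int = 2**31 - 1, q: int = 2**61 - 1,
                e: int = 65537) -> Tuple[PublicKey, PrivateKey]:
    """Textbook RSA key generation from two known primes (toy sizes only)."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+); ValueError if none
    return PublicKey(p * q, e), PrivateKey(p * q, d)


def encrypt(m: int, pub: PublicKey) -> int:
    """Anyone can encrypt with the public key; only the key owner can decrypt."""
    if not 0 <= m < pub.n:
        raise ValueError("message must be smaller than the modulus")
    return pow(m, pub.e, pub.n)


def decrypt(c: int, priv: PrivateKey) -> int:
    return pow(c, priv.d, priv.n)


def sign(m: int, priv: PrivateKey) -> int:
    """The owner signs with the private key to express consent to m."""
    return pow(m, priv.d, priv.n)


def verify(m: int, signature: int, pub: PublicKey) -> bool:
    """Anyone holding the public key can check the signature."""
    return pow(signature, pub.e, pub.n) == m


pub, priv = toy_keypair()
secret = int("3fa9c01be2", 16)  # a 10-hex-character secret message
ciphertext = encrypt(secret, pub)
signature = sign(secret, priv)
```

Decrypting `ciphertext` with `priv` recovers `secret`, and `verify(secret, signature, pub)` returns `True`; these are exactly the two properties RSA-TVM relies on.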

Next we will first describe how to initialize credentials for each user and then present RSA-TVM.

When a person $u$ installs the tracing client on a smart device and becomes a user of the tracing system, the client first assigns the device a unique device ID, denoted $u_{D\_ID}$, and then generates an RSA key pair (public key, private key), denoted $(u_{Pub\_key}, u_{Pri\_key})$. The length of each key is set to 1024 bits. The private key is stored locally on the smart device. The client includes the public key and the device information in a transaction, which is then stored in the blockchain. This transaction is called a "Registration Transaction" and is defined as $T_{reg} = \{T_{id}, \{u_{D\_ID}, u_{Pub\_key}, t\}\}$. $T_{id}$ is the unique ID of each transaction and is generated by the SHA256 algorithm [35] from the timestamp $t$ together with the transaction content $\{u_{D\_ID}, u_{Pub\_key}, t\}$, so that any change to the content results in a different $T_{id}$.
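A minimal sketch of building $T_{reg}$, assuming the content is serialized as canonical JSON before hashing (the paper specifies SHA256 but not the encoding) and using a random UUID as a hypothetical way to mint the device ID. The public key is a placeholder string here.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationTx:
    tx_id: str        # T_id
    device_id: str    # u_{D_ID}
    pub_key: str      # u_{Pub_key}, serialized (format is an assumption)
    timestamp: float  # t


def compute_tx_id(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the content (encoding assumed)."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_registration_tx(device_id: str, pub_key: str,
                         timestamp: float) -> RegistrationTx:
    # The timestamp is hashed together with the content, so any change to
    # the device ID, key or time yields a different T_id.
    content = {"device_id": device_id, "pub_key": pub_key, "t": timestamp}
    return RegistrationTx(compute_tx_id(content), device_id, pub_key, timestamp)


device_id = uuid.uuid4().hex  # hypothetical way to mint a unique device ID
tx = make_registration_tx(device_id, "PUBKEY-PLACEHOLDER", 1700000000.0)
```

Canonical encoding (sorted keys, fixed separators) matters because every peer must recompute the same $T_{id}$ from the same content.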

Once the registration transaction is stored in the blockchain, every user holds the public keys of all other users, since every user has a synchronized copy of the whole blockchain. Users can change their device ID or credentials by submitting a new registration transaction, so that every other user obtains the updated device ID or public key.
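Because every peer holds the full chain, each client can derive its own device-ID-to-public-key directory without a key server. The sketch below assumes that the most recent registration for a given device ID supersedes earlier ones; how a change of device ID is linked to the old ID is not specified in the text and is not handled here. The names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Registration(NamedTuple):
    device_id: str
    pub_key: str
    timestamp: float


def build_key_directory(registrations: Iterable[Registration]) -> Dict[str, str]:
    """Map each device ID to its most recently registered public key.

    Every client can run this over its own synchronized copy of the chain,
    so no central key server is needed.
    """
    latest: Dict[str, Tuple[float, str]] = {}
    for reg in registrations:
        seen = latest.get(reg.device_id)
        if seen is None or reg.timestamp >= seen[0]:
            latest[reg.device_id] = (reg.timestamp, reg.pub_key)
    return {dev: key for dev, (_, key) in latest.items()}


chain = [
    Registration("dev-a", "key-a1", 100.0),
    Registration("dev-b", "key-b1", 120.0),
    Registration("dev-a", "key-a2", 300.0),  # dev-a rotates its key
]
directory = build_key_directory(chain)
```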

Users scan nearby devices via Bluetooth at a given frequency to obtain their device IDs and record them locally. To improve security, no device connection is established over Bluetooth channels. The client only collects device IDs and looks up the corresponding registration transactions to obtain the public keys, which are later used to generate the secret messages in ContactList or WitnessList. If the user then wants to report contact cases and store the contact information in the blockchain, a "Contact Transaction" is initialized.

If user $u$'s device detects nearby devices within 2 meters, $u$ can generate a "Contact Transaction", denoted $T_{con} = \{T_{id}, \{u_{D\_ID}, C, W, t\}\}$, where $T_{id}$ is the unique ID of the transaction, generated from the timestamp $t$ and the transaction content $\{u_{D\_ID}, C, W, t\}$. $u_{D\_ID}$ is the device ID of $u$, and $t$ is the timestamp of this contact case. $C$ and $W$ are the Contact List and Witness List, which contain information about the contacted people (devices) and the witnesses of this contact case, respectively. To generate $C$ and $W$, user $u$ first chooses an original secret message $D$ and then encrypts it with the public key of each contacted person (e.g., $u_i$) and each witness (e.g., $u_j$). For each contacted person $u_i$, the ciphertext encrypted with $u_{i,Pub\_key}$, denoted $D_{u_{i,Pub\_key}}$, is generated. Similarly, for each witness $u_j$, $D_{u_{j,Pub\_key}}$ is generated. Formally, the Contact List is defined as the set of tuples
$$C = \{(u_{i,Pub\_key}, D_{u_{i,Pub\_key}}) \mid \forall u_i\},$$
and, similarly, the Witness List is defined as
$$W = \{(u_{j,Pub\_key}, D_{u_{j,Pub\_key}}) \mid \forall u_j\}.$$
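A minimal sketch of assembling $T_{con}$: one secret $D$ of 10 hex characters (the length chosen in the text) is encrypted under each contact's and each witness's public key. As before, this uses unpadded textbook RSA with toy keys built from known Mersenne primes, and canonical JSON before SHA-256 is an assumed encoding; function names are hypothetical.

```python
import hashlib
import json
import secrets
from typing import List, Tuple

PublicKey = Tuple[int, int]  # (n, e) for textbook RSA; illustration only


def toy_keypair(p: int, q: int, e: int = 65537) -> Tuple[PublicKey, int]:
    """Return ((n, e), private_exponent) built from two known primes."""
    return (p * q, e), pow(e, -1, (p - 1) * (q - 1))


def rsa_encrypt(m: int, pub: PublicKey) -> int:
    n, e = pub
    return pow(m, e, n)


def make_contact_tx(device_id: str, contact_keys: List[PublicKey],
                    witness_keys: List[PublicKey], t: float):
    """Build T_con = {T_id, {u_D_ID, C, W, t}} and return it with the secret D."""
    secret_hex = secrets.token_hex(5)  # secret message D: 10 hex characters
    secret_int = int(secret_hex, 16)
    # Each tuple pairs a recipient's public key with D encrypted under it.
    C = [(list(pub), rsa_encrypt(secret_int, pub)) for pub in contact_keys]
    W = [(list(pub), rsa_encrypt(secret_int, pub)) for pub in witness_keys]
    content = {"device_id": device_id, "C": C, "W": W, "t": t}
    canonical = json.dumps(content, sort_keys=True)
    tx = {"T_id": hashlib.sha256(canonical.encode("utf-8")).hexdigest(), **content}
    return tx, secret_hex


alice_pub, alice_priv = toy_keypair(2**31 - 1, 2**61 - 1)   # contact u2
bob_pub, bob_priv = toy_keypair(2**61 - 1, 2**89 - 1)       # contact u3
carol_pub, carol_priv = toy_keypair(2**89 - 1, 2**107 - 1)  # witness u4
tx, D = make_contact_tx("dev-u1", [alice_pub, bob_pub], [carol_pub], 1700000000.0)
```

Only the holder of the matching private exponent can recover $D$ from its tuple, which is what lets the contacted users and witnesses prove their involvement in the next step.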

Ideally, the secret message should be unique for every $u_i$ and $u_j$ in every transaction to ensure security, which requires the secret message to be as long as possible. However, encryption time increases rapidly with the length of the text. In practice, we set each secret message to contain 10 hexadecimal characters (0-9, a-f), which can represent about $1.1 \times 10^{12}$ different messages.
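The count of distinct secret messages follows from each hexadecimal character carrying 4 bits, so a 10-character message carries 40 bits, far below a 1024-bit RSA modulus and therefore encryptable as a single RSA block:

```latex
\underbrace{16 \cdot 16 \cdots 16}_{10\ \text{characters}}
  = 16^{10} = 2^{4 \cdot 10} = 2^{40}
  = 1{,}099{,}511{,}627{,}776 \approx 1.1 \times 10^{12}
```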

Witness list W can be very helpful to avoid contact case loss and improve robustness against dishonest user behaviors or system failure. We will show this later in Section 7.

The transaction $T_{con}$ is then broadcast to all users in order to protect privacy, since the reporter does not know the identities of its contacts and cannot address them directly. Each user checks whether $C$ or $W$ in the received $T_{con}$ contains his/her public key. If so, the related tuples require his/her verification. Since the messages are encrypted, only the user holding the matching private key can decrypt the secret message.

When $u_i$ identifies the tuple $(u_{i,Pub\_key}, D_{u_{i,Pub\_key}})$ in $C$, $u_i$ decrypts the ciphertext $D_{u_{i,Pub\_key}}$ with the private key $u_{i,Pri\_key}$ to obtain the secret message $D$. Then $u_i$ checks its local contact history. If $u_i$ has a record showing that it contacted $u$ within $t \pm 3$ min, then $u_i$ confirms the tuple $(u_{i,Pub\_key}, D_{u_{i,Pub\_key}})$ in $T_{con}$ as valid. $u_i$ then needs to send a message back to $u$ indicating that the contact record about $u_i$ in $T_{con}$ is confirmed. Specifically, $u_i$ signs the secret message $D$ with its private key; the signed text is denoted $SD_{u_{i,Pri\_key}}$. Then $u_i$ replaces $(u_{i,Pub\_key}, D_{u_{i,Pub\_key}})$ with $(u_{i,Pub\_key}, SD_{u_{i,Pri\_key}})$ in $T_{con}$ and broadcasts it to all users. Each witness $u_j$ conducts a similar verification on its related tuple in the witness list $W$: if $u_j$ has a record that the transaction generator $u$ contacted all users in $C$ at $t$, then $u_j$ considers all tuples in $C$ valid and signs the secret message in its related tuple in $W$.

If $u_i$ cannot find any local record showing that it contacted $u$ within $t \pm 3$ min, then $u_i$ considers the record wrong and signs a predefined warning message $Z$ = "Wrong Record" instead of the secret message $D$. The tuple $(u_{i,Pub\_key}, D_{u_{i,Pub\_key}})$ in $T_{con}$ then becomes $(u_{i,Pub\_key}, Z_{u_{i,Pri\_key}})$. Once the transaction generator $u$ receives the updated $T_{con}$ from user $u_i$, $u$ verifies the signature with the public key of $u_i$.
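Both sides of this exchange can be sketched with textbook RSA: the contacted user decrypts $D$, checks its local history within $t \pm 3$ min, and signs either $D$ or $Z$; the reporter then classifies the returned signature with the user's public key. Hashing $Z$ with SHA-256 to an integer below the modulus is an assumed encoding, the toy keys are not secure, and the function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

PublicKey = Tuple[int, int]  # (n, e), textbook RSA for illustration only
WINDOW_S = 3 * 60            # the t +/- 3 min matching window from the text
WARNING_Z = b"Wrong Record"  # the predefined warning message Z


def toy_keypair(p: int, q: int, e: int = 65537) -> Tuple[PublicKey, int]:
    return (p * q, e), pow(e, -1, (p - 1) * (q - 1))


def as_int(message: bytes, n: int) -> int:
    """Hash Z to an integer below n so it can be signed (assumed encoding)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def respond(enc_secret: int, my_pub: PublicKey, my_priv_exp: int,
            reporter_id: str, t: float,
            local_history: List[Tuple[str, float]]) -> Tuple[PublicKey, int]:
    """Contacted user u_i: decrypt D, check local history, sign D or Z."""
    n, _ = my_pub
    secret = pow(enc_secret, my_priv_exp, n)  # decrypt with u_i's private key
    met = any(dev == reporter_id and abs(ts - t) <= WINDOW_S
              for dev, ts in local_history)
    payload = secret if met else as_int(WARNING_Z, n)
    # The returned pair replaces (pub, encrypted D) in T_con.
    return my_pub, pow(payload, my_priv_exp, n)


def check_response(signature: int, pub: PublicKey, secret: int) -> str:
    """Reporter u: verify the returned signature with u_i's public key."""
    n, e = pub
    recovered = pow(signature, e, n)
    if recovered == secret:
        return "confirmed"
    if recovered == as_int(WARNING_Z, n):
        return "warning"
    return "invalid"


pub, priv_exp = toy_keypair(2**31 - 1, 2**61 - 1)
D = int("3fa9c01be2", 16)
enc = pow(D, pub[1], pub[0])          # what u placed in u_i's tuple
history = [("dev-u1", 1000.0)]        # u_i met dev-u1 at t = 1000 s
_, sig_ok = respond(enc, pub, priv_exp, "dev-u1", 1100.0, history)
_, sig_bad = respond(enc, pub, priv_exp, "dev-u1", 2000.0, history)
```

A report 100 s after the recorded meeting is confirmed, while one 1000 s away falls outside the 3-minute window and yields a signed warning.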

A tuple in the contact list $C$ of $T_{con}$ is considered valid if: 1) the tuple contains no signed warning message, and 2) either the secret message in the tuple is correctly signed by the contacted person, or at least one tuple in the witness list is correctly signed by a witness. Due to network or system failures on users' smart devices, users may not respond to their related tuples in $C$ or $W$ within the given delay $d$. In this case, the tuple is still considered valid as long as one witness has verified that the contact case is correct.

If not all tuples in the contact list $C$ are verified valid within the delay $d$, only the valid tuples in $C$ are preserved in $T_{con}$. The transaction $T_{con}$ is then put into the shared transaction pool to wait to be mined, i.e., permanently stored in the blockchain. Any user $u_i$ or $u_j$ who signed $D$ or $Z$ receives a reward for helping verify the contact case; reward policies are discussed in Section 6. Figure 2 shows the RSA-TVM process in which $u_2$ verifies $u_1$'s contact case.
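The validity rule and the delay $d$ combine into a small filter over the responses the reporter has verified. This sketch assumes responses have already been classified (for example by a signature check) and that anything arriving after $d$, including a late warning, counts as no response; it uses the 60-minute example delay from the text. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DELAY_D_S = 60 * 60  # the example delay d of 60 minutes from the text


@dataclass
class Response:
    status: str       # "confirmed" (signed D) or "warning" (signed Z)
    arrival_s: float  # seconds after T_con was broadcast


def within_delay(r: Optional[Response]) -> Optional[str]:
    """Responses arriving after d are treated as no response (assumption)."""
    return r.status if r is not None and r.arrival_s <= DELAY_D_S else None


def preserved_contacts(contacts: Dict[str, Optional[Response]],
                       witnesses: List[Optional[Response]]) -> List[str]:
    """Keep only the tuples in C that RSA-TVM deems valid.

    A tuple is valid if it carries no signed warning and either the contacted
    person confirmed it or at least one witness confirmed the contact case.
    """
    witness_ok = any(within_delay(w) == "confirmed" for w in witnesses)
    kept: List[str] = []
    for device_id, resp in contacts.items():
        status = within_delay(resp)
        if status == "warning":
            continue
        if status == "confirmed" or witness_ok:
            kept.append(device_id)
    return kept


contacts = {
    "u2": Response("confirmed", 600),  # confirmed within d
    "u3": None,                        # device offline, no reply
    "u8": Response("warning", 900),    # claims no such contact
}
```

Without any witness confirmation only `u2` survives; once a single witness confirms, the silent `u3` is also recovered, which is how witnesses compensate for unresponsive devices.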

Most existing works consider it straightforward to let the transaction generator directly package its verified transactions into blocks and broadcast them to all users, instead of choosing a miner to do the job. However, this strategy causes an unfair incentive reward problem due to the diversity of contact scenarios. Unfair incentive reward problem: users are rewarded for reporting contact cases by generating contact transactions, but people have different chances of having contact cases due to differences in jobs and lifestyles. People who live or work in densely populated areas, such as cashiers in markets and staff at transport stations, will obviously have many more contact cases than those who stay or work at home, and thus gain much more reward. This may even encourage people to go out and make contacts in order to earn rewards, which runs against social distancing policies during pandemics.

In addition, since the miner can be someone outside the ContactList and WitnessList, choosing a miner helps prevent group cheating, in which a small group deliberately generates fake contact cases, verifies the contact transactions for each other, packages the transactions and appends new blocks in order to gain a large amount of reward rapidly. Therefore, the consensus mechanism and incentive mechanism must be carefully designed to balance rewards. The consensus mechanism is required to be computationally lightweight and to have high transaction throughput, so as to satisfy the huge data storage demand on smart devices, which usually have low computational power.

The Delegated Proof of Stake (DPoS) consensus mechanism [36] is a popular lightweight consensus mechanism. DPoS reaches consensus quickly, so that new transactions can be stored into blocks in a timely manner. In DPoS, each user holds some stake, usually cryptocurrency. Whenever there are no candidate miners, every user votes for someone they trust. The weight of a vote is proportional to the voter's stake; that is, more stake gives the voter more voting power. After voting, the $k$ users with the highest total weighted votes are selected as the $k$ candidate miners. Whenever a block is waiting to be appended to the blockchain, one candidate miner is randomly chosen to do the job and is removed from the candidate miner set once the job is done. Once the candidate miner set is empty, a new round of voting starts.
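The plain DPoS round described above can be sketched as stake-weighted vote tallying, a top-$k$ cut, and random draws without replacement. Breaking ties by user ID is an assumption, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def dpos_candidates(votes: Dict[str, str], stakes: Dict[str, float],
                    k: int) -> List[str]:
    """Return the k users with the highest stake-weighted vote totals.

    votes maps voter -> the user they vote for; each vote weighs the
    voter's stake.
    """
    totals: Dict[str, float] = defaultdict(float)
    for voter, candidate in votes.items():
        totals[candidate] += stakes[voter]
    # Sort by total (descending); ties broken by user ID (an assumption).
    return sorted(totals, key=lambda u: (-totals[u], u))[:k]


def next_miner(candidates: List[str], rng: random.Random) -> str:
    """Pick a random candidate for the next block and remove it from the set."""
    miner = rng.choice(candidates)
    candidates.remove(miner)
    return miner


votes = {"a": "c", "b": "c", "c": "d", "d": "a"}
stakes = {"a": 1.0, "b": 1.0, "c": 5.0, "d": 1.0}
candidates = dpos_candidates(votes, stakes, k=2)  # ["d", "c"]
rng = random.Random(0)
miners = [next_miner(candidates, rng) for _ in range(2)]
```

Note how the single high-stake voter `c` decides the top candidate on its own, even though `c` itself received two votes; this concentration of voting power is the weakness discussed next.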

DPoS can achieve high throughput without compromising the decentralization of a blockchain system if everyone is honest and voting is random. However, it cannot be directly applied in the proposed BDCT. To motivate people to share their contact information, rewards must be given to those who honestly generate contact transactions. If the reward is stake, as in DPoS, people living or working in densely populated areas will accumulate stake quickly. Their votes will then gradually carry high weight and easily determine the selection of candidate miners. In other words, the whole blockchain system would be dominated by the people who frequently generate contact cases.

To solve this issue, we propose the Reputation-Corrected DPoS (RC-DPoS) consensus mechanism. In RC-DPoS, each user is assigned a reputation, represented by a credit $c$. Users gain a reputation (credit) reward rather than a stake reward for honestly reporting their contact cases, and gain a stake reward only for working as a miner. Specifically, the RC-DPoS mechanism works as follows:

Step 1: When new users first join the contact tracing framework, they are initialized with a fixed start-up stake $s_0$ and credit $c_0$.

Step 2: Initially, the candidate miner set is empty, so the candidate selection process starts. Each user votes for one other trusted user; users cannot vote for themselves. As in DPoS, each vote is weighted by the voter's stake, but the total vote received by a user is corrected by the receiver's credit. Formally, let $N$ denote the total number of users in the system. For user $u_i$, $i \in \mathbb{Z}_N$, the total vote score accumulated by $u_i$ is calculated according to Equation 1:

$$V(u_i) = \frac{RF(u_i) + 1}{2} \sum_{u_k \text{ votes } u_i} s_k \qquad (1)$$

where the sum, taken over the users $u_k$ who vote for $u_i$, is the total weighted vote received by $u_i$, and $s_k$ is the current stake of $u_k$. $c_i$ is the current credit of $u_i$. $RF(u_i)$ is the reputation correction factor of user $u_i$, which is defined as:

$RF(u_i) \in [0, 1]$, and hence $\frac{RF(u_i)+1}{2} \in [0.5, 1]$. The intuition behind this equation is that users with a good reputation should have a higher chance of becoming candidate miners, which improves system security, while users with lower reputation are not punished too heavily (at most a 50% reduction of the received votes).

Step 3: Rank all users in descending order of their vote scores. The top $N/5$ users are selected into the candidate miner set. The size of the candidate miner set can be adjusted for specific applications.

Step 4: At a given mining frequency (e.g., every 3 or 5 minutes), one arbitrary miner selected from the candidate miner set packages all the transactions in the shared transaction pool into a block and appends it to the miner's local blockchain. The shared transaction pool is then emptied and waits for newly verified transactions. The structure of the blockchain storage is illustrated in Figure 3.

Step 5: The miner then broadcasts this blockchain update to all users, who update their local blockchains and local transaction pools. The miner is given a stake reward and a reputation reward; the reward details are elaborated in Section 6. The miner is then removed from the candidate miner set.

Step 6: If a miner fails to do this job within an excusable delay (e.g., 10 minutes) because of network disconnection or system failure, a penalty is applied by taking away some of the miner's credit, and no stake reward is given. The miner is removed from the candidate miner set, and another miner is delegated to do the job.

Step 7: If the candidate miner set is empty, back to Step 2.
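Steps 2 to 7 can be sketched as follows. This is a minimal single-process illustration, not the BDCT implementation. The vote score multiplies the stake-weighted votes a user receives by $(RF(u_i)+1)/2$, as described in Step 2, but `reputation_factor` uses a hypothetical normalisation $RF(u_i) = c_i / \max_j c_j$ purely as a stand-in for the paper's own definition of $RF$. `delivers` is a hypothetical callback standing in for "the miner appended and broadcast the block within the excusable delay", and the default credit reward (1 unit) and penalty (5 units) follow incentive policies 3) and 4).

```python
import random
from collections import defaultdict


def reputation_factor(credits):
    """HYPOTHETICAL RF(u_i) = c_i / max_j c_j, which lies in [0, 1]."""
    top = max(credits.values())
    return {u: (c / top if top > 0 else 0.0) for u, c in credits.items()}


def vote_scores(stakes, credits, votes):
    """Step 2: stake-weighted votes received, scaled by (RF + 1) / 2 in [0.5, 1]."""
    received = defaultdict(float)
    for voter, target in votes.items():
        if voter == target:
            raise ValueError(f"user {voter!r} may not vote for themselves")
        received[target] += stakes[voter]
    rf = reputation_factor(credits)
    return {u: (rf[u] + 1) / 2 * received[u] for u in stakes}


def select_candidates(scores, fraction=0.2):
    """Step 3: the top N/5 users (fraction = 0.2) by vote score; ties broken by user id."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


def mine_one_block(candidates, reselect, pool, chain, credits, stakes, delivers,
                   stake_reward, rng, credit_reward=1, penalty=5, max_attempts=100):
    """Steps 4-7 for one mining interval; reselect() re-runs Steps 2-3."""
    for _ in range(max_attempts):
        if not candidates:                        # Step 7: empty set -> back to Step 2
            candidates.extend(reselect())
        miner = rng.choice(candidates)            # Step 4: an arbitrary candidate
        candidates.remove(miner)                  # removed whether it succeeds or fails
        if delivers(miner):
            chain.append(list(pool))              # Step 4: package the whole shared pool
            pool.clear()                          # pool waits for new verified transactions
            stakes[miner] += stake_reward(miner)  # Step 5: stake reward
            credits[miner] += credit_reward       # Step 5: reputation reward
            return miner
        credits[miner] -= penalty                 # Step 6: credit penalty, no stake reward
    return None


stakes = {u: 10.0 for u in "abcde"}
credits = {"a": 100, "b": 50, "c": 100, "d": 100, "e": 100}
votes = {"a": "b", "b": "a", "c": "a", "d": "b", "e": "b"}
scores = vote_scores(stakes, credits, votes)
print(scores["a"], scores["b"], select_candidates(scores))  # 20.0 22.5 ['b']
```

In the example, "b" receives more weighted votes (30) than "a" (20), and its lower credit trims its score to 22.5, which still keeps it ahead; with a larger credit gap the correction could reverse the order.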

The proposed BDCT contact tracing framework is highly automated and has no central server. BDCT relies entirely on people to generate transactions, store contact cases into blocks and maintain decentrality, so it is crucial to design an incentive mechanism that motivates people to generate contact transactions and append blocks to the blockchain honestly. It is also important that the incentive mechanism does not favor a particular group of people; otherwise the system becomes centralized and dominated by that group. If the rewards are taken over by a small, specific group of people, others will be discouraged, and the whole system will be of little help for contact tracing. In this paper, we design the incentive mechanism as a composition of the following four incentive policies.

1) Users are rewarded with 1 unit³ of credit for generating a transaction. Users do not receive the reward until the transaction is accepted by the shared transaction pool. This motivates users to report their contact lists honestly. According to Equation 1, users with more credit are more likely to be selected as candidate miners and can thus earn more credit reward as well as stake reward. 2) Users are rewarded with 1 unit of credit after successfully verifying the related tuple in a contact transaction. This motivates users to participate in generating transactions and speeds up the verification of contact cases. 3) Users are rewarded with $R_i$ units of stake reward and 1 unit of credit reward for mining a block, i.e., appending a new block to the existing blockchain. $R_i$ depends on the total number of transactions that $u_i$ has generated and is formally defined as:

3. The numbers in the incentive mechanism indicate only relative amounts; they can be in any unit.

$w$ is a predefined reward amount (e.g., 5 units) and $t_i$ is the total number of transactions generated by $u_i$. $TF(u_i) \in [0, 1]$ is called the transaction correction factor. From this definition, the more transactions $u_i$ has generated, the lower the stake reward given to $u_i$. The intuition behind $R_i$ is that people who generate many transactions should not gain a large stake reward for completing mining jobs, since they naturally have more chances to become miners under incentive policy 1). Conversely, people who generate fewer transactions receive more stake reward per completed mining job.

The factor $\frac{TF(u_i)+1}{2} \in [0.5, 1]$ reduces the stake reward by at most 50%. Therefore, $R_i$ helps balance the stake reward among users in different contact scenarios and hence keeps the voting power distributed. 4) A user is punished for failing to complete a mining job: 5 units of credit are deducted from the miner.
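The four incentive policies can be sketched as a handful of account updates. This is a minimal sketch under stated assumptions: the exact formula for $TF(u_i)$ is not given in this section, so `transaction_factor` uses a hypothetical stand-in $TF(u_i) = 1 - t_i / \max_j t_j$, and the stake reward is assumed to take the form $R_i = w \cdot \frac{TF(u_i)+1}{2}$, which is consistent with the stated maximum 50% reduction. The `Account` type is illustrative; its starting balances of 100 units match the initial stake and credit used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class Account:
    stake: float = 100.0   # initial stake (100 units in the experiments)
    credit: float = 100.0  # initial credit (100 units in the experiments)
    transactions: int = 0  # t_i: contact transactions generated so far


def on_transaction_accepted(acc):
    """Policy 1): +1 credit once the transaction enters the shared pool."""
    acc.transactions += 1
    acc.credit += 1


def on_tuple_verified(acc):
    """Policy 2): +1 credit for verifying the related tuple."""
    acc.credit += 1


def transaction_factor(t_i, t_max):
    """HYPOTHETICAL TF(u_i) in [0, 1] that falls as t_i grows (assumes t_i <= t_max)."""
    return 1.0 - t_i / t_max if t_max > 0 else 1.0


def mining_reward(acc, t_max, w=5.0):
    """Policy 3), assuming R_i = w * (TF(u_i) + 1) / 2, i.e. at most a 50% cut."""
    return w * (transaction_factor(acc.transactions, t_max) + 1) / 2


def on_block_mined(acc, t_max, w=5.0):
    """Policy 3): R_i units of stake and +1 credit for appending a block."""
    acc.stake += mining_reward(acc, t_max, w)
    acc.credit += 1


def on_mining_failed(acc, penalty=5):
    """Policy 4): deduct 5 credits when a mining job is not completed."""
    acc.credit -= penalty


busy, quiet = Account(transactions=12), Account(transactions=0)
print(mining_reward(busy, t_max=12), mining_reward(quiet, t_max=12))  # 2.5 5.0
```

Under these assumptions, the user with the most transactions earns half the stake reward per mined block of a user with none, which is the balancing effect the policies aim for.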

The stake reward is usually pecuniary cryptocurrency, which can be distributed by government or healthcare authorities. Since the system requires neither frequent maintenance nor a large computation center, the saved budget can fund the pecuniary stake reward. Moreover, by enabling accurate and efficient contact tracing, BDCT can save the government money by helping control the spread of the virus.

Although there are some well-known real-world trajectory datasets describing real people's movements in specific areas [37], [38], [39], [40], [41], [42], they are mostly based on records obtained from vehicles or cell phone calls; the trajectories are not continuous, or the number of trajectories is insufficient to support the simulation in terms of frequency and amount. Since it is hard to collect real-world trajectories over a wide range because of privacy concerns and the diversity of contact scenarios, we conduct experiments on synthetic datasets that simulate different contact scenarios to demonstrate the effectiveness of the proposed BDCT contact tracing framework.

We consider three general contact scenarios defined by population density: Low density (Sparse), Medium density (Medium) and High density (Crowded). Each scenario intuitively represents one kind of real-world contact situation. "Sparse" represents contacts in rural or residential areas. "Medium" represents contacts in schools, parks or other common public areas. "Crowded" represents contacts in very crowded places, such as shopping malls and sports events.

People in the three scenarios differ in how often they have contact cases and in the numbers of contacted people and witnesses. Therefore, the frequency of generating transactions and the lengths of the contact list and the witness list are adjusted to simulate the three scenarios. The settings for the three scenarios are as follows:

Low density (Sparse) case: In each transaction, the lengths of the contact list and the witness list follow the normal distributions $N(\mu = 0, \sigma = 2)$ and $N(\mu = 0, \sigma = 1)$, respectively. Transactions are generated at a frequency of 1 case/hr.

Medium density (Medium) case: In each transaction, the lengths of the contact list and the witness list follow the normal distributions $N(\mu = 2, \sigma = 4)$ and $N(\mu = 2, \sigma = 2)$, respectively. Transactions are generated at a frequency of 3 cases/hr.

High density (Crowded) case: In each transaction, the lengths of the contact list and the witness list follow the normal distributions $N(\mu = 5, \sigma = 2)$ and $N(\mu = 7, \sigma = 2)$, respectively. Transactions are generated at a frequency of 12 cases/hr.

We implement the framework in Python 3.7, and all simulations are conducted on a machine with an Intel Core i7-8750H (8 cores) and 32 GB of memory. Each user is implemented as a Python thread, and all threads run simultaneously to simulate real-time contact. We generate the contact cases for each user randomly without modeling realistic trajectories, since the trajectories do not affect the evaluation of the effectiveness and efficiency of the whole framework. For the lengths of the contact list and the witness list, we only adopt non-negative numbers sampled from the corresponding normal distributions.
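The three scenario settings can be sampled as below. This is a sketch under assumptions: each sampled length is rounded to a whole number of people and clipped at zero (the text states only that negative samples become 0, not how fractional values are handled), and a user generates exactly the listed number of transactions per hour rather than a random count. `SCENARIOS`, `sample_length` and `sample_hour` are hypothetical names.

```python
import random

# (mu, sigma) of contact-list length, (mu, sigma) of witness-list length, transactions/hour
SCENARIOS = {
    "Sparse": ((0, 2), (0, 1), 1),
    "Medium": ((2, 4), (2, 2), 3),
    "Crowded": ((5, 2), (7, 2), 12),
}


def sample_length(mu, sigma, rng):
    """Normal sample, rounded to a whole number of people and clipped at zero."""
    return max(0, round(rng.gauss(mu, sigma)))


def sample_hour(scenario, rng):
    """(contact-list length, witness-list length) for each transaction one user makes in an hour."""
    (c_mu, c_sigma), (w_mu, w_sigma), per_hour = SCENARIOS[scenario]
    return [(sample_length(c_mu, c_sigma, rng), sample_length(w_mu, w_sigma, rng))
            for _ in range(per_hour)]


rng = random.Random(7)
for name in SCENARIOS:
    print(name, sample_hour(name, rng))
```

Passing an explicit `random.Random` instance keeps each simulated user's stream reproducible, which matters when the simulation is repeated 10 times.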

It is crucial to maintain the decentrality of BDCT, so that the voting power is distributed and every user can be equally motivated to keep contributing contact cases honestly.

We simulate 200 users for each contact scenario, so the whole simulation environment contains 600 users in total. To measure the decentrality of the system, we draw Lorenz curves and calculate the Gini coefficient (Gini index) of three factors over the 600 users: user balance (cumulative stake reward), user credit (cumulative reputation reward) and the total number of mined blocks.

The Lorenz curve was originally proposed to plot the cumulative share of income held by units sorted in ascending order of income [43]. The closer the income distribution is to a uniform distribution, the closer the corresponding Lorenz curve is to the line $y = x$. In this article, we use the Lorenz curve to illustrate the decentrality of the proposed RC-DPoS.

The Gini coefficient (or Gini index) is a metric that quantitatively measures the inequality of a distribution and can be derived from the Lorenz curve [44]. It is defined as a ratio with values between 0 and 1: the numerator is the area between the line of uniform distribution and the Lorenz curve of the distribution, and the denominator is the total area under the line of uniform distribution. Hence, Gini = 0 indicates a perfectly equal distribution, and Gini = 1 indicates that the distribution is entirely concentrated on one unit.
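The Lorenz curve and Gini coefficient described above can be computed from per-user values (balance, credit or mined-block count) as follows. This sketch applies the trapezoidal rule to the discrete Lorenz curve, so $Gini = (0.5 - A)/0.5$, where $A$ is the area under the curve; the function names are illustrative.

```python
def lorenz_curve(values):
    """Points (cumulative share of units, cumulative share of total), units sorted ascending."""
    xs = sorted(values)
    total = sum(xs)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Area between y = x and the Lorenz curve, divided by the area under y = x (0.5)."""
    pts = lorenz_curve(values)
    area_under_curve = sum((x1 - x0) * (y0 + y1) / 2
                           for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return (0.5 - area_under_curve) / 0.5


print(gini([100, 100, 100, 100]))  # 0.0
print(gini([0, 0, 0, 1]))          # 0.75
```

With $n$ discrete units, a distribution where one unit holds everything yields $(n-1)/n$ rather than exactly 1; for 600 users the difference is negligible.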

We adopt the DPoS mechanism as the baseline. In the baseline, no credit reward is given to users; users receive 1 unit of stake reward for generating or verifying a transaction and 5 units of stake reward for mining a block. Other settings are the same as in the proposed BDCT. The initial stake and credit of users are set to 100 units. A random voting strategy is adopted for voting for the candidate miners.

We run the simulation 10 times and evaluate the statistical significance of the difference in the Gini coefficient of user balance between the baseline and the proposed BDCT. We conduct a two-sided t-test of the null hypothesis that the two frameworks' stake reward distributions have identical average (expected) values. The resulting $p = 1.22 \times 10^{-32}$ indicates that BDCT achieves significantly better stake reward decentrality. Figure 4 shows the Gini coefficients and Lorenz curves of the three factors. Figure 4(a) and (b) show how the Gini coefficients change as more and more data are stored in the blockchain. Figure 4(c) shows that the Gini coefficient of balance under the baseline DPoS remains as high as 0.56 when the blockchain height is 10k, indicating that the stake rewards are mostly given to people in the Crowded scenario. Since the baseline DPoS mechanism does not give any credit reward, the Gini coefficient of credit is 0 in Figure 4(a) and (c). In addition, the Gini coefficient of the mined block count in the baseline is close to 0, because under the random voting strategy in DPoS all users have the same expected chance of being selected as a miner. Figure 4(b) and (d) show the results of the proposed BDCT framework. In Figure 4(b), the Gini coefficient of balance in BDCT decreases as the blockchain height grows, which means that BDCT achieves increasingly balanced stake rewards as more data are recorded. In Figure 4(d), as expected, the Gini coefficient of credit is 0.57 when the blockchain height is 10k, showing that users in dense areas can indeed earn more credit than other users. The Gini coefficient of the mined block count is 0.27, higher than 0.12 in the baseline, indicating that people in dense areas indeed have a higher chance of becoming miners. The Gini coefficient of balance is 0.19, significantly lower than 0.56 in the baseline, which demonstrates that the proposed RC-DPoS and incentive mechanism successfully balance the stake reward among different groups of users.

We further investigate the stake reward distribution. In the DPoS baseline, the 200 users (1/3 of all users) in the Crowded scenario together hold more than 85% of the stake rewards, which gives them more than 85% of the voting power. In contrast, in the proposed BDCT, the three groups of users in the three contact scenarios hold 23%, 42% and 34% of the stake rewards, respectively.

Mobile devices usually have low computational power and a low security level, and may suffer from system failures, network delays or disconnections. All of these factors can cause failures in detecting contact cases, verifying contact lists or receiving transactions. In this paper, we introduce a witness role for every contact case in the BDCT framework, which improves the robustness of recording correct contact cases. As mentioned in Section 4.2, if a tuple in a contact case is not verified because of the failures mentioned above, the tuple is still considered valid as long as at least one witness in $W$ has verified the contact list $C$.

To evaluate the robustness of the proposed system in recording contact information, we assign each user a failure rate $p$, meaning that the user fails to verify the corresponding transaction with probability $p$. We then count how many contact cases $(u_i, u_j, t)$, i.e., $u_i$ contacting $u_j$ at timestamp $t$, are lost out of the 300k given contact cases. We use a baseline, BDCT-w/o-Witness, which is BDCT without the witness role, so that $(u_i, u_j, t)$ is lost when both $u_i$ and $u_j$ fail to verify the corresponding tuples in $C$. This is also a common design in existing works [19], [45]. We run this experiment 10 times and report the average results. Table 1 shows the simulation results. Our framework loses significantly fewer contact cases than the baseline at every failure rate $p$. BDCT correctly records 96.31% (1 − 3.69%) of the total contact cases even when every node has a failure probability of 0.6, about 35 percentage points more than the baseline, which preserves only about 61.26% (1 − 38.74%) of the contact cases.
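A back-of-the-envelope model helps explain why witnesses matter. Assuming that every user fails independently with probability $p$, a contact case in the baseline is lost only if both $u_i$ and $u_j$ fail (probability $p^2$), whereas with witnesses it is lost only if every witness in $W$ fails as well (probability $p^{2+|W|}$). The independence model and the sample witness counts below are assumptions for illustration; they are not meant to reproduce the simulated values in Table 1.

```python
def loss_without_witness(p):
    """Baseline: a case (u_i, u_j, t) is lost only if both u_i and u_j fail."""
    return p ** 2


def loss_with_witness(p, witness_counts):
    """With witnesses: lost only if u_i, u_j and every witness in W all fail.

    witness_counts: witness-list lengths |W|, one per contact case.
    """
    if not witness_counts:
        raise ValueError("need at least one witness-list length")
    return sum(p ** (2 + w) for w in witness_counts) / len(witness_counts)


hypothetical_counts = [0, 1, 2, 5, 7]  # illustrative |W| values, not simulation data
for p in (0.2, 0.4, 0.6):
    print(p, loss_without_witness(p), loss_with_witness(p, hypothetical_counts))
```

Each extra witness multiplies the loss probability by another factor of $p$, so even one or two witnesses sharply cut losses at high failure rates.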

Malicious users may report false or fake contact cases to generate more transactions, which may bring them more credit or stake reward. The only way to carry out this attack is group cheating, in which several malicious users jointly create and verify transactions. More specifically, malicious users have two approaches to getting a fake transaction verified. They can create a contact list in which all the contacts are malicious users, or they can put a malicious witness in the witness list, so that the whole transaction is verified as long as this malicious witness verifies the related tuple in the witness list, regardless of whether the contact case in the contact list is real or fake. We next describe how each attack approach impacts the whole system.

If a malicious user chooses the first approach, i.e., creating a contact list composed of other malicious users, no false information is brought to honest users. The malicious users may earn more credit by generating or verifying numerous fake transactions, but the number of transactions is limited by the Bluetooth scanning interval specified by the system, e.g., 5 minutes. In other words, a user can generate at most twelve transactions per hour. In addition, the proposed RC-DPoS and incentive mechanism balance the stake reward well, as shown in Figure 4. Therefore, malicious users will not be able to dominate the stake reward.

If a malicious user attacks through the second approach, i.e., putting a malicious witness in the witness list of every false transaction, false contact cases may be included in the contact list. However, under the proposed RSA-TVM method, a tuple recording a false contact case fails verification and is therefore not preserved before the transaction is put into the transaction pool. Hence, the impact of this attack can also be controlled.
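The protection against the second attack rests on the fact that a tuple signed by an honest user cannot be produced without that user's private key. The sketch below shows only the generic RSA sign/verify step with a textbook-sized toy key ($p = 61$, $q = 53$); it is not RSA-TVM itself (whose protocol is described in Section 4), it uses no padding, and its modulus is far too small to be secure. The 10-hex-character message mirrors the secret-message length used in the simulation.

```python
import hashlib

# Textbook RSA toy key: p = 61, q = 53, n = 3233, e = 17, d = e^-1 mod (p-1)(q-1) = 2753.
# Illustration only: no padding, and the modulus is far too small to be secure.
N, E, D = 3233, 17, 2753


def digest_mod_n(message: bytes, n: int = N) -> int:
    """Hash the message with SHA-256 and reduce it modulo n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, d: int = D, n: int = N) -> int:
    """Only the holder of the private exponent d can produce this value."""
    return pow(digest_mod_n(message, n), d, n)


def verify(message: bytes, signature: int, e: int = E, n: int = N) -> bool:
    """Anyone holding the public key (n, e) can check the signature."""
    return pow(signature, e, n) == digest_mod_n(message, n)


secret = b"3fa9c01b7e"                 # a 10-hex-character secret message
sig = sign(secret)
print(verify(secret, sig))             # True
print(verify(secret, (sig + 1) % N))   # False: an altered signature is rejected
```

Because $x \mapsto x^e \bmod n$ is a permutation of $\mathbb{Z}_n$ for a valid RSA key, any signature other than the genuine one fails verification for the same message; the real system relies on 1024-bit keys generated with pycryptodome.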

In the BDCT framework, every user holds a full copy of the blockchain in which the contact transactions are stored. We evaluate the expected blockchain storage cost of the proposed BDCT framework by calculating the expected number of transactions and blocks generated per hour under our experiment settings.

We denote the expected size of the total blockchain segment generated per hour as $E(S_{TB/h})$, the expected size of all block heads generated per hour as $E(S_{BH/h})$, and the expected size of all block bodies generated per hour as $E(S_{BB/h})$. The size of a single block head is denoted as $s_{BH}$, and the number of blocks generated per hour as $N_{B/h}$. Hence, $E(S_{BH/h}) = E(N_{B/h}) \cdot s_{BH}$. The block bodies contain the transactions, so $E(S_{BB/h}) = E(S_{T/h})$, where $E(S_{T/h})$ is the expected size of all transactions generated per hour. Consequently, $E(S_{TB/h})$ is calculated by Equation 5:

$$E(S_{TB/h}) = E(S_{BH/h}) + E(S_{BB/h}) = E(N_{B/h}) \cdot s_{BH} + E(S_{T/h}) \qquad (5)$$

The block generation interval is predefined in the system, e.g., one block every 5 minutes, so $E(N_{B/h}) = 12$. Based on the block structure illustrated in Figure 3, $s_{BH}$ can be calculated as in Equation 6:

$$s_{BH} = 2 \cdot s_{BlockHash} + s_{uDID} + s_{Timestamp} = 2 \times 256\ \text{bits} + 10\ \text{bytes} + 32\ \text{bits} = 64\ \text{bytes} + 10\ \text{bytes} + 4\ \text{bytes} = 78\ \text{bytes} \qquad (6)$$

$s_{BlockHash}$ is the size of a unique block ID, which is a SHA-256 hash value, so $s_{BlockHash} = 256$ bits. $s_{uDID}$ is the size of the block generator's device ID. In our framework, the device ID is a string of 10 hex characters, which can represent $2^{10 \times 4} \approx 1.1 \times 10^{12}$ unique devices. Since each hex character is stored as a 1-byte char, $s_{uDID} = 10$ bytes. $s_{Timestamp}$ is the size of a Unix timestamp stored as a 32-bit integer, so $s_{Timestamp} = 32$ bits.
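The block-head arithmetic of Equation 6 and the hourly head cost $E(S_{BH/h}) = E(N_{B/h}) \cdot s_{BH}$ can be checked with a few lines of Python. The reading that the two 256-bit hashes are the block's own ID and its predecessor's ID is an assumption here; the text only gives $2 \times 256$ bits.

```python
BITS_PER_BYTE = 8

S_BLOCK_HASH = 256 // BITS_PER_BYTE  # SHA-256 block ID: 32 bytes
S_DEVICE_ID = 10                     # 10 hex characters stored as 1-byte chars
S_TIMESTAMP = 32 // BITS_PER_BYTE    # 32-bit Unix timestamp: 4 bytes


def block_header_bytes():
    """Equation 6: two block hashes + generator device ID + timestamp."""
    return 2 * S_BLOCK_HASH + S_DEVICE_ID + S_TIMESTAMP


def header_bytes_per_hour(block_interval_minutes=5):
    """E(S_BH/h) = E(N_B/h) * s_BH for a fixed block interval."""
    blocks_per_hour = 60 // block_interval_minutes  # 12 blocks for a 5-minute interval
    return blocks_per_hour * block_header_bytes()


print(block_header_bytes())     # 78
print(header_bytes_per_hour())  # 936
print(16 ** 10)                 # distinct 10-hex-character device IDs (2**40)
```

The block heads therefore add under 1 KB per hour, so the storage cost is dominated by the block bodies.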

In our experiment settings, three contact scenarios are considered, with different contact case generation frequencies, numbers of contacted people and numbers of witnesses. Although block bodies may also contain registration transactions, these require little storage. In this discussion, we consider the general case in which block bodies contain only contact transactions. Then $E(S_{T/h})$ is the sum of the expected sizes of all transactions generated in the three contact scenarios, as in Equation 7:

$$E(S_{T/h}) = E(S_{TS/h}) + E(S_{TM/h}) + E(S_{TC/h}) \qquad (7)$$

$E(S_{TS/h})$, $E(S_{TM/h})$ and $E(S_{TC/h})$ are the expected sizes of the transactions generated per hour in the Sparse, Medium and Crowded scenarios, respectively.

We denote by $E(N_{TS/h})$ the expected number of transactions (contact cases) per hour in the Sparse scenario and by $E(s_{TS})$ the expected size of a transaction generated in the Sparse scenario. Given the contact transaction structure described in Section 4.2, $E(s_{TS})$ consists of the size of the transaction ID $s_{Tid}$, the size of the transaction generator's device ID $s_{uDID}$, the size of the transaction timestamp $s_{Timestamp}$, the expected size of the contact list $E(s_C)$ and the expected size of the witness list $E(s_W)$. Each contact list $C$ or witness list $W$ contains signed tuples $(u_i^{PubKey}, SD_{u_i}^{PriKey})$, whose size is denoted as $s_{st}$. In our experiments, we generate 1024-bit RSA keys with the Python package Crypto⁴. With a secret message of 10 hex characters, $s_{st} = 56\ \text{bytes} + 161\ \text{bytes} = 217\ \text{bytes}$ in our simulation.

Although the lengths of the contact list and the witness list are sampled from the normal distributions described in Section 7.1, we set a length to 0 if the sampled value is less than 0. The expected length of the contact list $E(N_C)$ under this sampling strategy satisfies Equation 8.

4. https://pycryptodome.readthedocs.io

$$E(S_{TS/h}) = E(N_{TS/h}) \cdot E(s_{TS}) = 1 \cdot \left(s_{Tid} + s_{uDID} + s_{Timestamp} + E(s_C) + E(s_W)\right) = 256\ \text{bits} + 10\ \text{bytes} + 32\ \text{bits} + E(s_C) + E(s_W) = 46\ \text{bytes} + E(N_C) \cdot s_{st} + E(N_W) \cdot s_{st} \leq 46\ \text{bytes} + (\ldots$$
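The expected list length under the clip-at-zero strategy can be evaluated in closed form: for $X \sim N(\mu, \sigma^2)$, $E[\max(0, X)] = \mu\,\Phi(\mu/\sigma) + \sigma\,\varphi(\mu/\sigma)$, where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. The sketch below uses this to estimate the expected transaction size and the hourly transaction bytes per user in each scenario. It treats lengths as continuous (ignoring any rounding to whole people) and is not a reproduction of the paper's Equation 8; the 46-byte fixed part and $s_{st} = 217$ bytes are taken from the text, while the function names are hypothetical.

```python
import math

S_TX_FIXED = 32 + 10 + 4  # bytes: SHA-256 transaction ID + device ID + 32-bit timestamp
S_SIGNED_TUPLE = 217      # bytes: s_st for a signed tuple in the simulation

# (mu, sigma) of contact-list length, (mu, sigma) of witness-list length, transactions/hour
SCENARIOS = {
    "Sparse": ((0, 2), (0, 1), 1),
    "Medium": ((2, 4), (2, 2), 3),
    "Crowded": ((5, 2), (7, 2), 12),
}


def clipped_normal_mean(mu, sigma):
    """E[max(0, X)] for X ~ N(mu, sigma^2)."""
    z = mu / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))         # standard normal CDF at z
    return mu * cdf + sigma * pdf


def expected_tx_bytes(contact, witness):
    """E(s_T) = 46 bytes + E(N_C) * s_st + E(N_W) * s_st."""
    e_nc = clipped_normal_mean(*contact)
    e_nw = clipped_normal_mean(*witness)
    return S_TX_FIXED + (e_nc + e_nw) * S_SIGNED_TUPLE


def expected_tx_bytes_per_hour(scenario):
    """Expected transaction bytes one user adds per hour: E(N_T/h) * E(s_T)."""
    contact, witness, per_hour = SCENARIOS[scenario]
    return per_hour * expected_tx_bytes(contact, witness)


for name in SCENARIOS:
    print(f"{name}: {expected_tx_bytes_per_hour(name):.1f} bytes/hour per user")
```

Multiplying each scenario's figure by its number of users and summing gives $E(S_{T/h})$ as in Equation 7 under these assumptions.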

In this article, we propose a Blockchain-Driven Contact Tracing framework (BDCT), a fully decentralized framework that requires no third party. We introduce the "witness" role to promote the integrity of contact tracing data, and the RSA-based Transaction Verification Method (RSA-TVM) to verify the correctness of reported contact cases. The Reputation-Corrected Delegated Proof of Stake (RC-DPoS) consensus mechanism selects miners based on both users' reputation and users' stake. An incentive mechanism is further developed to motivate people to keep reporting contact cases honestly; together with RC-DPoS, it achieves a balanced stake reward distribution that keeps the whole framework decentralized. For evaluation, we build a simulation environment that mixes three contact scenarios with different population densities. The simulation results demonstrate that the proposed framework achieves significantly better decentrality than the baseline framework, and that RSA-TVM combined with the "witness" role greatly improves system robustness.

For future work, a better consensus mechanism should be designed to lower the communication cost. In BDCT, although the computation cost is eliminated and the storage cost is considered acceptable, the communication cost of circulating contact transactions among devices to get contact cases verified is still high, especially when contact cases are numerous. Synchronizing the blockchain and the shared transaction pool also imposes communication stress on smart devices. Therefore, a better communication protocol is highly desirable for building more scalable contact tracing applications.

An interactive web-based dashboard to track COVID-19 in real time

Contact tracing during an outbreak of Ebola virus disease

BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders

COVID-19 and health code: How digital platforms tackle the pandemic in China

Apple and Google partner on COVID-19 contact tracing technology

COVID-19 contact tracing using blockchain

BeepTrace: Blockchain-enabled privacy-preserving contact tracing for COVID-19 pandemic and beyond

Towards large-scale and privacy-preserving contact tracing in COVID-19 pandemic: A blockchain perspective

Acceptability of app-based contact tracing for COVID-19: Cross-country survey study

EPIC: Efficient privacy-preserving contact tracing for infection detection

ENACT: Encounter-based architecture for contact tracing

A privacy preserved and cost efficient control scheme for coronavirus outbreak using call data record and contact tracing

WiFiTrace: Network-based contact tracing for infectious diseases using passive WiFi sensing

CONTAIN: Privacy-oriented contact tracing protocols for epidemics

COVID-19 and your smartphone: BLE-based smart contact tracing

Privacy-preserving contact tracing of COVID-19 patients

PACT: Privacy sensitive protocols and mechanisms for mobile contact tracing

A first look at privacy analysis of COVID-19 contact-tracing mobile applications

A survey of COVID-19 contact tracing apps

Bitcoin: A peer-to-peer electronic cash system

A blockchain-enabled ecosystem for distributed electricity trading in smart city

Cloud/edge computing resource allocation and pricing for mobile blockchain: An iterative greedy and search approach

An incentive mechanism for building a secure blockchain-based internet of things

Reliability-aware offloading and allocation in multilevel edge computing system

Edge computing integrated with blockchain technologies, in Complexity and Approximation: In Memory of Ker-I Ko, ser. Lecture Notes in Computer Science

Blockchain-based digital contact tracing apps for COVID-19 pandemic management: Issues, challenges, solutions, and future directions

Blockchain-enable contact tracing for preserving user privacy during COVID-19 outbreak

Privacy-preserving contact tracing in 5G-integrated and blockchain-based medical applications

P2B-Trace: Privacy-preserving blockchain-based contact tracing to combat pandemics

CoviChain: A blockchain based framework for nonrepudiable contact tracing in healthcare cyber-physical systems during pandemic outbreaks

BloCoV6: A blockchain-based 6G-assisted UAV contact tracing scheme for COVID-19 pandemic

TB-ICT: A trustworthy blockchain-enabled system for indoor COVID-19 contact tracing

Experiments on local positioning with Bluetooth

A method for obtaining digital signatures and public-key cryptosystems

Revised Papers, ser. Lecture Notes in Computer Science

Delegated proof-of-stake (DPoS)

Simulating human mobility patterns in urban areas

Grab-Posisi: An extensive real-life GPS trajectory dataset in Southeast Asia

Data source: DiDi Chuxing GAIA open dataset initiative

CellNet: Inferring road networks from GPS trajectories

T-Drive: Driving directions based on taxi trajectories

One-month Beijing taxi GPS trajectory dataset with taxi IDs and vehicle status

Applications of Lorenz curves in economic analysis

A formula for the Gini coefficient

A survey of automatic contact tracing approaches using Bluetooth Low Energy

respectively. He is currently pursuing the Ph.D. degree with the Department of Computer Science, University of Texas at Dallas