key: cord-201675-3bvshhtn authors: Ng, Pai Chet; Spachos, Petros; Plataniotis, Konstantinos title: COVID-19 and Your Smartphone: BLE-based Smart Contact Tracing date: 2020-05-28 journal: nan DOI: nan sha: doc_id: 201675 cord_uid: 3bvshhtn Contact tracing is of paramount importance when it comes to preventing the spreading of infectious diseases. Contact tracing is usually performed manually by authorized personnel. Manual contact tracing is an inefficient, error-prone, time-consuming process of limited utility to the population at large as those in close contact with infected individuals are informed hours, if not days, later. This paper introduces an alternative way to manual contact tracing. The proposed Smart Contact Tracing (SCT) system utilizes the smartphone's Bluetooth Low Energy (BLE) signals and machine learning classifier to accurately and quickly determined the contact profile. SCT's contribution is two-fold: a) classification of the user's contact as high/low-risk using precise proximity sensing, and b) user anonymity using a privacy-preserving communications protocol. SCT leverages BLE's non-connectable advertising feature to broadcast a signature packet when the user is in the public space. Both broadcasted and observed signatures are stored in the user's smartphone and they are only uploaded to a secure signature database when a user is confirmed by public health authorities to be infected. Using received signal strength (RSS) each smartphone estimates its distance from other user's phones and issues real-time alerts when social distancing rules are violated. The paper includes extensive experimentation utilizing real-life smartphone positions and a comparative evaluation of five machine learning classifiers. Reported results indicate that a decision tree classifier outperforms other states of the art classification methods in terms of accuracy. Lastly, to facilitate research in this area, and to contribute to the timely development of advanced solutions the entire data set of six experiments with about 123,000 data points is made publicly available. Abstract-Contact tracing is of paramount importance when it comes to preventing the spreading of infectious diseases. Contact tracing is usually performed manually by authorized personnel. Manual contact tracing is an inefficient, error-prone, time-consuming process of limited utility to the population at large as those in close contact with infected individuals are informed hours, if not days, later. This paper introduces an alternative way to manual contact tracing. The proposed Smart Contact Tracing (SCT) system utilizes the smartphones Bluetooth Low Energy (BLE) signals and machine learning classifier to accurately and quickly determined the contact profile. SCTs contribution is two-fold: a) classification of the users contact as high/low-risk using precise proximity sensing, and b) user anonymity using a privacy-preserving communications protocol. SCT leverages BLEs non-connectable advertising feature to broadcast a signature packet when the user is in the public space. Both broadcasted and observed signatures are stored in the users smartphone and they are only uploaded to a secure signature database when a user is confirmed by public health authorities to be infected. Using received signal strength (RSS) each smartphone estimates its distance from other users phones and issues real-time alerts when social distancing rules are violated. The paper includes extensive experimentation utilizing real-life smartphone positions and a comparative evaluation of five machine learning classifiers. Reported results indicate that a decision tree classifier outperforms other states of the art classification methods in terms of accuracy. Lastly, to facilitate research in this area, and to contribute to the timely development of advanced solutions the entire data set of six experiments with about 123,000 data points is made publicly available. Index Terms-Bluetooth Low Energy, Smartphone, COVID-19, Physical Distancing, Proximity, Contact Tracing C ONTACT tracing is an important step in containing a disease outbreak [1] , [2] . Many efforts have been devoted to tracing a list of contacts when a person is diagnosed with a highly infectious disease, such as COVID-19. The current contact tracing method, which requires a collaborative effort from several authorized personnel, is labor-intensive and timeconsuming [3] . Since it takes time to trace the contact, the group of users who have been in contact with an infected individual might spread the disease to another group of people before they get informed. It is critical to have an effective contact tracing method that not only can automatically inform the potential users immediately but also reducing the required amount of labor force [4] . To this end, a smart contact tracing (SCT) system is introduced by exploiting the Bluetooth Low Energy (BLE) signals on smartphones. BLE is ubiquitous and is readily available on many smartphones making it ideal for the introduced system [5] . On the other hand, smartphones have become an intimate device in our everyday life. While we might leave the smartphone away from us when we are in our private space (e.g., home, private office, etc.), we always carry the smartphone when we do the grocery shopping, commute on public transport, walk along the open street, etc. In this way, smartphones are the best choice for contact tracing, in which the tracing is only performed when the user is in the public space. An overall illustration of our introduced SCT system is shown in Fig. 1 . At any time, no location or any other information regarding the users is collected or transmitted. The application uses only BLE signals and no information exchange. The system has three main objectives: preserveprivacy, provide accurate contact tracing, and provide real-time proximity alerts. Preserve-privacy. We leverage the beaconing feature in BLE wireless technology to broadcast an encrypted packet periodically [6] . This encrypted packet is broadcast on the nonconnectable advertising channels (i.e., CH 37, 38, and 39). Hence, our proposed SCT system can prevent unauthorized access to the user's smartphone. Furthermore, the packet encrypts a piece of unique signature information based on the ambient environmental features the smartphone encountered at a particular time. This signature is unique and is almost impossible to be duplicated by another device on another arXiv:2005.13754v1 [cs.LG] 28 May 2020 occasion. All the broadcast signatures and observed signatures will be stored in the local storage. The user is only required to upload their own broadcast signatures to the signature database when the user is confirmed to be infected with the contagious disease. Otherwise, the signature store in the local storage will be deleted automatically when it is expired. We define the signature expiration according to the disease spreading time window, as suggested by the health authorities. By comparing the signature of each smartphone, a list of possible contacts can be retrieved without explicitly revealing the sensitive information of the infected user. Accurate contact tracing. The smartphone application identifies contacts in proximity, over time. It records the estimated distance and the duration of interaction between individuals. In this way, it will identify when someone has been too close to an infected person for too long (the too close for too long (TC4TL) problem). For instance, when people hug each other they are too close for a short period of time, while inside the cabin of a flight, people can be in ten meters distance for too long, breathing the same air. At the same time, a distance of two meters in a classroom might be safe while three meters in a subway train might trigger an alert. We have to rely heavily on the virologists and the epidemiologists to identify the healthy distance in different environments, and through this application, we can give them access to these crucial details. Real-time proximity alerts. The application will provide a real-time alert to the user if the physical distancing between any two users is not maintained. This will be achieved by detecting the proximity between any users in a given location, including the grocery store, public transit, etc. This proximity information can be retrieved by inspecting the RSS patterns from the user's smartphone [7] . A smartphone can measure the RSS value upon seeing the packet broadcast by the nearby smartphone. Since RSS is inversely proportional to the square of the distance [8] , we can use it to estimate the distance between any two smartphones and then classify the proximity based on the recommended physical distancing rule. Precise distance estimation through the RSS is necessary to determine the proximity between any two smartphones. However, RSS is subject to severe fluctuation especially the body shadowing effect since the smartphone is carried by users [9] . We examine five machine learning-based classifiers: decision tree (DT), linear discriminant analysis (LDA), naive bayes (NB), k nearest neighbors (kNN), and support vector machine (SVM), over six different smartphone positions: Hand-to-Hand (HH), Hand-to-Pocket (HP), Hand-to-Backpack (HB), Pocketto-Backpack (PB), Pocket-to-Pocket (PP), and Backpack-to-Backpack (BB). In summary, this paper has the following contributions: • Privacy-preserving signature protocol: our SCT system provides a secure contact tracing by using the nonconnectable advertising channels and an encrypted packet containing unique signature information based on the ambient environmental features observed by a smartphone. • Proximity sensing and real-time physical distance alert, with precise distance estimation: we classify the proximity of a user to any user by estimating the distance between any two users based on the RSS values measured by each smartphone, while we push a notification to alert the users when anyone violates the physical distancing rule. After approximately 10 s of interaction between smartphones, the system is able to provide a reliable estimation. DL is the most accurate classifier. • Smartphone implementation and effects of smartphone's position: we prototyped our system design and implemented the application in modern smartphones to demonstrate the feasibility of our proposed SCT. The energy requirements of the application are negligible. We compared the classifiers in terms of their estimation accuracy, while we examine six different positioning sets of smartphones. When the users have their smartphones in similar positions, the classifiers can improve accuracy. • Extensive experiments: we performed extensive experiments in a real-world setting verify the effectiveness of our SCT. All the collected data is available in the IEEE Dataport [10] and Github [11] . The overall dataset contains the measurement data obtained from six experimental sets, amounting to a total of 123,718 data points. We believe that the dataset will serve as an invaluable resource for researchers in this field, accelerating the development of contact tracing applications. Contact tracing aims to track down a group of users who have encountered with an infected individual. The goal is to inform this group of users regarding the potential risk that they might face so that they can take appropriate actions as recommended by the local health authority. Contact tracing could be a viable solution in resuming the normal lifestyle while preventing the further virus outbreak. An illustration of the differences in having a contact tracing system is shown in Fig. 2 . In practice, a person will be quarantined immediately when they are confirmed to be infected with the disease. However, those people who have been in close contact with the infected individual are still free to move without realizing that they may have already got infected and became a virus carrier. With contact tracing, we can inform most of the potential close contact so that they can take appropriate action to isolate themselves from the crowd. Recognizing the importance of contact tracing, many countries have put effort to develop a smartphone-based contact tracing system. • China: In China, a close contact detector based on QR codes technology is implemented [12] . The application is developed based on a surveillance strategy in monitoring the people's movement within the country and it can push an alert to users if they have been in close contact with the infected individual. • South Korea: In South Korea, the location data (i.e., the GPS data) obtained from the user's smartphone are used to detect the distance of the users from the infected individual. The tracker application will push a notification that contains the personal details of the infected individuals to the potential users who have been in contact with the infected individual [13] . • Singapore: In Singapore, a privacy-preserving approach is adopted, by using the BLE signal on the smartphone to detect the proximity between any two individuals. The TraceTogether application broadcasts an encrypted packet, which is generated by a secret key distributed by the Ministry of Health, given the phone number [14] . The application will also alert the users when they are in close contact with the infected individual. The first two approaches might compromise user's privacy since their applications require to monitor the user's mobility and locations. On the other hand, in the third approach, although it preserves privacy by tracking only the proximity between users without explicit location information, the encryption process involves the user's phone number. Hence, the phone number might be retrieved by a malicious hacker. Besides the national-level effort, there are collaborations in industry and academia in delivering an effective contact tracing solution while preserving user privacy [15] , [16] . Rather than using location data, many of these initiatives focus on the use of BLE signals for proximity detection. For instance, Pan European Privacy-Preserving Proximity Tracing (PEPP-PT) detects the proximity based on the broadcast BLE packet containing a full anonymous ID [17] . COVID-19 Watch, on the other hand, can automatically alert the user when they are in contact with the infected individual [18] . Similarly, the Privacy-Preserving Automated Contact Tracing (PACT) exploits the BLE signals in combination with secure encryption to detect possible contacts while protecting privacy [19] . Most of the above initiatives assume that the BLE signals will work for proximity detection without considering the smartphone positioning effects on it. To the best of our knowledge, there is a lack of a comprehensive study on the accuracy of BLE signals for contact tracing proximity sensing. Furthermore, most of the encryption methods are based on information provided by the user, which might be subject to possible information leaks if the encryption method is compromised. To bridge the gap, this paper studies the proximity sensing with the BLE signals broadcast from the smartphones carried by the user while designing a privacypreserving signature protocol that uses the environmental feature instead of user information for packet broadcasting. Six experimental sets, with different smartphone positioning are examined, to investigate the feasibility of the system under different realistic conditions. BLE provides a short-range communication over the 2.4 GHz ISM band [20] . It is ubiquitous and has been adopted by many smart devices (e.g., smartwatches, earphones, smart thermostats, etc.) as the main communication platform [21] . Furthermore, BLE is readily available in most modern smartphones regardless of the operating system. There are two modes of communication available with BLE: 1) nonconnectable advertising, and 2) connectable advertising [22] . The latter advertising mode allows another device to request a connection by sending a CONNECT REQ packet on the advertising channels. In this work, we focus on the nonconnectable advertising mode, in which the device cannot accept any incoming connection requests. This feature is useful for our SCT system in ensuring no neighboring devices can access the smartphone to retrieve sensitive information. For contact tracing purposes, we configure the smartphone to periodically broadcast the advertising packet via the nonconnectable advertising mode. These packets can be heard and received by any nearby smartphones as long as these smartphones are within the broadcast range. These smartphones can also measure the received signal strength (RSS) upon receiving the packet. However, there are two major challenges: 1) the length of the advertising packet is only up to 47 bytes, and 2) the RSS values are subject to severe fluctuation. 1) Advertising Packet: In the non-connectable advertising mode, the smartphone will broadcast the advertising packet over the three advertising channels periodically according to the system-defined advertising interval, T a . The advertising interval defines how frequent a packet is broadcast. For example, if T a = 100 ms, we shall expect to see at least 10 packets per second. The advertising packet can take up to 47 bytes, as shown in Fig. 3 . Note that 16 bytes are used for preamble (1 byte), access address (4 bytes), header (2 bytes), MAC address (6 bytes), and CRC (3 bytes only 31 bytes left to put in the information related to the environmental signature. This poses a question on how to construct a unique yet useful signature that can be encapsulated into this 31 bytes payload. 2) Received Signal Strength (RSS): Following the inverse square law [23] , the RSS is inversely proportional to the square of the distance. Let P r denotes the signal strength in dBm, then: where d is the distance between any two devices and n is the path loss exponent, and its value is subject to the environmental setting when the measurement is taken. As shown in Fig. 4 , different environments have different effects on the RSS variation even though the distance between any two devices in these environments are the same. Hence, we need to take the environmental factor into consideration when applying the path loss model to estimate the distance given the RSS. Section V provides a further discussion on our distance estimation approach that addresses the above problem. The intimacy of smartphones in our everyday life motivates us to adopt the smartphone for contact tracing purposes. However, there are privacy concerns about using such an intimate device for contact tracing [24] . Many users might worry that their sensitive information which resides in the smartphone will be exposed to the public during the contact tracing. Our introduced SCT uses the non-connectable advertising mode, hence, none of the neighboring devices are able to connect to the user's device to retrieve any information. Furthermore, we are using a unique environmental signature that contains no information about the user's identity. Research efforts that tried to address this privacy issue provide better encryption methodologies [15] , [16] . However, none of the works discuss the contact tracing in private and public locations. While most users might willingly to participate in contact tracing in the public locations in a hope to flatten the disease spreading curve, they might feel a bit uncomfortable to let the contact tracing application running when they are having their private time in the private location (e.g., home, sleeping room, car, etc.). Future work can be conducted using the embedded sensors on the smartphone to check if the user is in the private or public location. Then, we can use this information to turn on and turn off the contact tracing application accordingly. Observe Signatur Local Databases: Observed Signatures III. PROPOSED SMART CONTACT TRACING SYSTEM There are two major phases with our SCT system, as shown in Fig. 5 : the interaction phase and the tracing phase. The interaction phase focuses on the following two main components: 1) privacy-preserving signature protocol, and 2) precise proximity sensing; whereas the tracing phase aims to provide an efficient signature matching. The interaction phase involves the day to day activities in public locations, such as workplaces, public transports, grocery stores, outdoor parks, etc. The contact tracing application starts automatically when it detects the user is in the public location. The application executes the following functions: i. Signature generation: The smartphone scans for the ambient environmental features. These features are selectively processed to generate a unique signature that can be fit into the 31 bytes advertising payload. The signature will be updated every few minutes. ii. Signature broadcasting: The smartphone broadcasts the advertising packet containing the unique signature periodically according to the advertising interval of T a . The packet is broadcasted through the non-connectable advertising channels. iii. Signatures Observation: The smartphone scans the three advertising channels to listen for the advertising packet broadcast by the neighboring smartphones. The scanning is performed in between the broadcasting event. iv. Proximity sensing: The smartphone measures the RSS values and uses them to estimate how close it is to the neighboring smartphones. It is assumed to be in proximity when the distance is less than 2 m. v. Physical distancing alert: The smartphone triggers a realtime alert to warn the user to keep a healthy distance from nearby users when it detects any physical distancing violation. All the generated signatures and observed signatures will be stored inside the user's local storage, as shown in Fig. 5 . Since the signature does not contain any information about the owner, there is no way for the user to trace or identify the original owner of the observed signatures. Furthermore, the signatures are deleted from the local storage permanently once it is expired. We define the expiration period for each signature based on the virus spreading timeframe recommended by the health authorities. For instance, for COVID-19 the expiration period should be 14 days from the day the signature was recorded. After 14 days, the corresponding signature will be deleted. If a user is diagnosed with an infectious disease, they can upload all the signatures to the signature database. In Fig. 5 , user A uploaded all his signatures to the signature database after he became an infected individual. The database will distribute the signature to all the users' smartphones. The signature matching computation is taken placed on each individual smartphone and a local alert is triggered when there is a match. The local alert means that the alert is triggered by the smartphone's program itself, not the centralized alert sent by the server. The server is only used to distribute the data. No program/code is executed on the server to find the close contact. In this way, we can protect the user from revealing their identity and to ensure that none of the match cases can be eavesdropped by malicious hackers. Besides signature matching, the application also performs the classification to classify the potential risk of a user according to the time and distance the user spent with the infected individual. While the smartphone can use the non-connectable advertising mode to refuse any incoming connection request attempt, we also need to ensure that the packet broadcasting will not reveal one's identity. Several methods have been suggested to protect the user's identity by using an encrypted packet [19] , [25] . However, these methods require a random generated secret key that can be compromised. For instance, in TraceTogether application, a secret key is used to encode the phone number of the user. If the secret key is hacked, then the phone number can be retrieved. In this work, we propose to use the ambient environmental signal to construct a signature vector that can fit into the advertising packet. When the application starts the contact tracing, it first generates a signature that can fit into the 31 bytes advertising packet. The signature is a transformed vector containing the ambient environmental features. When the smartphone scans for the packet broadcast by the nearby smartphones, it may also see other BLE devices. For example, in a grocery store, the smartphone might see the BLE beacon attached to the promotional item, the BLE signal from a smartwatch, Apple pencil, smart thermostat, smart lighting control, etc. The signal strength of each of these devices observed by a user's smartphone is changing depending on the location of the user. Furthermore, some of these devices (i.e., smartwatch, Apple pencil) might not always remain at the same location. Let P r (d) be a function that returns the time average RSS value measured from a BLE device located at a certain distance from smartphone and B = {b 1 , b 2 , . . . , b m } be a set of BLE devices excluding the smartphone used for contact tracing, then the observed vector can be expressed as follows: where o u (t) ∈ R m is an m-dimensional vector observed by a smartphone of user u at time t. The length of the vector m is dependent on the size of B. Rather than truncating the vector when m > 31 or filling the vector with some arbitrary values when m < 31, we define a dictionary Ψ ∈ R 31×m to transform the m-dimensional vector to a 31-dimensional vector. This dictionary is also known as the secret transformation key to the observed vector. We can define the dictionary as follows: By multiplying the dictionary with the observed vector, we obtain our unique signature vector. where Ψ j = (ψ 1,j , ψ 2,j , . . . , ψ 31,j ) T ∈ R 31 is the j-th column vector from the dictionary. The dictionary for each smartphone is different. If any two smartphones observe the same ambient features, i.e., similar time average RSS values from a similar set of BLE devices, the generated signature vector is still different. Note that the above case is very rare because even though both persons appear at the same location at the same time, the RSS values from the same set of BLE devices might be different due to the receiver sensitivity of the smartphone, the smartphone's antenna, the position and orientation of the smartphone, etc. Upon the generation of signature, the smartphone encapsulates this signature information into its advertising packet and broadcasts the packet through the non-connectable advertising channels. The advertising interval determines the broadcasting Signature generation Signature generation Signature generation Own Signatures Observed Signatures Fig. 6 : Timing diagram for the advertising, scanning and signature generation activities. All the generated and observed signatures will be logged in the local database, together with a timestamp τ . frequency. Suppose that T a = 100 ms, then we should expect about 10 packets per second. However, this also depends on the scanning window and the interval of a smartphone. More precisely, the smartphone can only see the packet when its scanning activity overlaps with the advertising activity. The timing diagram for the advertising, scanning, and signature generation activities is shown in Fig. 6 . Each activity is triggered periodically according to their interval, i.e., generation interval T g , advertising interval T a , and scanning interval T s . Given T s , the smartphone will only stay active to listen for the incoming packet for a duration defined by the scanning window T w . As shown in Fig. 6 , smartphone A fails to receive s B (t 1 ) from smartphone B, for the first two times since there is no scanning activity in smartphone A when s B (t 1 ) arrived. However, smartphone A manages to receive s B (t 1 ) when smartphone B broadcasts the same packet the third time. According to [26] , the likelihood to see the advertising packet broadcast by neighboring smartphones is high as long as T a < T s . Intuitively, when the broadcasting frequency is higher than the scanning frequency given the scanning window is sufficiently long, then it is likely for one of the broadcast packets from A meets the scanning windows of B. We can use a continuous scanning (i.e., set T w = T s ) to increase the packet receiving rate. However, such a scanning approach can greatly affect the energy consumption of a smartphone and eventually create an adverse effect on a user's experience. This work mainly focuses on the privacy and preciseness of contact tracing, balancing the energy consumption and packet receiving rate is a possible direction for future work. The smartphone logs the generated signatures and the observed signatures in its own local storage, as shown in Fig. 6 . A timestamp τ is logged when the smartphone saves a signature into the local database. We could use either a SQL or NoSQL approach to construct this database. The logged timestamp is useful to examine the total time two persons spend in close proximity to each other. Note that for the observed signature, we also log the RSS value upon receiving the packet. This RSS value provides useful information for proximity sensing. Proximity sensing has been employed in many scenarios, including identification of the user's proximity to museum collection [27] , and gallery art pieces [28] . There are also works study the proximity detection in dense environment [29] , or proximity accuracy with filtering technique [30] . However, most of these works study the proximity detection between a human and an object attached to a BLE beacon [31] . There is no work studying the proximity sensing between the devices carried by humans. We use RSS to infer the distance between any two smartphones [32] . Given the RSS value, the distance of a smartphone to another smartphone that broadcasts the packet can be estimated as follows: where n is the path loss exponent and c is the constant coefficient. Both n and c can be obtained through least square fitting. Given the distance, then we can determine if the user follows the safe physical distance as recommended by the health authorities. An alert is sent to remind the user if they violate the physical distance rule. In the distance estimation context, accuracy indicates how close an estimated value to the true value. In other words, the error between the estimated value and true value is close to zero for an accurate estimation. Precision, on the other hand, tells if any two estimated values fall into the same region given similar measurement input (i.e., the RSS value). For contact tracing purposes, an accurate distance estimation is not that critical as compared to precise proximity sensing. We do not need an accurate estimation to tell if the user is in proximity to the infected individual. Rather, a precise estimation is more critical in determining the risk of a user. In particular, we consider that a user belongs to the high-risk group when the user is in close proximity (i.e., d ≤ 2m) with the infected individual, otherwise, the user is considered to be in the lowrisk group. The problem of classifying the risk of a potential contact can be modeled as a binary hypothesis test. In particular, consider a risk mapping function R : (d) −→ {+1, −1}, where +1 indicates high-risk and −1 low-risk, then there are three hypotheses including the null hypothesis, since we need to consider also the case of false positive and false negative. False positive is also known as false alarm, in which a user actually belongs to the low-risk group, but the system wrongly classifies them to the high-risk group. False negative, on the other hand, wrongly classifies the user as the low-risk group, but they are actually in close proximity to the infected individual. Let H + denote the hypothesis that the user belongs to the high-risk (+1) group and H − the hypothesis that the user belongs to the low-risk (−1), then the possible hypotheses are: where R(d) = 0 means that the user is not in contact with the infected individual. This is valid when the signature matching returns null, which means the user did not encounter any infected individual. Let h, l, and a be the ground truth label for high-risk, low-risk and absence (i.e., the user is not in contact with the infected individual, then: The possible classification outcomes given the above hypotheses are illustrated in Fig. 7 . It is obvious that miss detection is undesirable because the user might be at risk but the system considers the user is safe. False negative misclassified the high-risk user to low-risk, but in comparison to miss detection, it can at least detect the user. However, this may give a wrong impression to the user that the possibility for them to get infected is low, but actually the possibility could be high. False positive misclassified the low-risk user to high-risk. Even though it is a bit conservative to alarm the user that they are most likely to get infected while they may not, this is a relatively safer outcome than miss detection and false negative. We have developed a smartphone application to demonstrate our proposed SCT. The application has the following two functions: 1) contact tracing based on the privacy-preserving signature protocol, and 2) physical distancing alert based on precise proximity sensing. We describe our experimental setup and then discuss our experimental results by comparing the performance with another five classifiers, i.e., decision tree (DT), linear discriminant analysis (LDA), naive bayes (NB), k nearest neighbors (kNN), and support vector machine (SVM). Fig. 8 : For the experimental purpose, we created another version of the application by (a) adding a manual button to control the start and the end of the experiment, (b) an input field to key in the truth distance measured through the measuring tape, and (c) a save button that save the measurement data. We built an Android Application to demonstrate the functionalities of our proposed SCT. First, the application generates a signature packet according to the privacy-preserving signature protocol. Then, it pushes an alert notification when the user violates the physical distancing rule. All the generated signatures, observed signatures, and their corresponding signal strengths are all stored in the smartphone's storage. We installed the application into Android smartphones, including Nokia 8.1 with Android 10, HTC M9 with Android 7.0 Nougat. At least API 23 is required for the BLE to operate. According to Google, at least 95% of smartphones support API 23. If a user has a lower API version, then they can still use the application, however, only to receive BLE signals. Other IoT devices, such as BLE beacons can be used from these users to enable them to transmit signals [33] . When running the application the power requirement is less than 0.24 W which is negligible. For experimental purposes, we created another version of the application that allows us to log the ground truth during the experiment, in order to be able and compare the estimation with the real data. For experimental purposes, the application that allows us to log the ground truth was used. We use the ground truth to evaluate the classification performance. The following information is logged: the truth distance, name of smartphone, MAC address of BLE chipset, the packet payload, RSS values, time elapsed, and timestamp. The time elapsed indicates the time difference between the previous broadcast packet and the current broadcast packet, whereas the timestamp is the exact time when the smartphone received the packet. The true distance is measured with a measuring tape during the experiment, as shown in Fig. 8 . In this experiment, both users are required to hold the smartphone on their hand while doing the measurement. However, the position of the hand is not fixed, and the user can randomly hold the smartphone according to their own comfort. The Android BLE API only provides three possible advertising interval settings, as shown in Table I . In our experiment, the application is configured to advertise at a "ADVERTISE MODE LOW LATENCY". The application is initiated to start the scan when the user presses the scan button. The scan will continue until the user press the button again. We repeated the experiment for distance from 0.2 m to 2.0 m (with 0.2 m increment each step), and 3 m to 5 m (with 1 m increment each step). Hence, there are a total of 13 distance points where the measurement is conducted. For each distance, the application was running for at least 60 s. All the measurement data are saved into a "comma-separated values" (.csv) file format and are exported to Matlab for further experiments. There are a total of 19,903 data points collected. The statistical descriptions of our experimental data are shown in Table II . We can see that the variance is high at some distance points. This is mostly due to the multipath effects, in which the signal takes multiple paths to reach the receiver. At the receiving end, the signals might be added up constructively or destructively, resulting in many different RSS measurements even the distance is the same. Furthermore, the reflection by moving objects in practical environments might cause undesirable outliers. To mitigate the possible outlier, we apply a moving average filter to the raw data. A comparison between the raw RSS data with the filtered data, for d ={0.2, 1, 2, 3, 5} m is shown in Fig. 9 . It is clear that the filtered data provides a much smoother signal. However, there are still variations in signal strength even though the distance is the same. In practice, it is hard to obtain only line-of-sight (LOS) signal in indoor environments due to the reflection and diffraction of the signal. Extracting multipath profile features to achieve better distance estimation could be a possible future direction. We applied a non-linear least square fitting to determine the value for coefficients n and c, in Eq. (5). We used the mean RSS described in Table II as the dependent variable and distance as the independent variable. Using this fitted path loss model, we can then estimate the distance by feeding the RSS value measured at each time step into the model. The ultimate goal is to classify if the user belongs to the high-risk or low-risk group assuming the other user is infected with the disease. Based on the physical distancing rule recommended by the Canada Health authority [34] , we classify the user as high-risk if the estimated distance is ≤ 2 m and low-risk if the estimated distance is > 2 m. We compared the fitted path loss model with five machine learning-based classifiers: DT, LDA, NB, kNN, and SVM. We separated the measurement data into 80% training data and 20% testing data. The input RSS was encoded into an 8-bit binary feature. We used confusion matrix to evaluate the performance of each classifier, as shown in Fig. 10 . Overall, DT method achieves the highest accuracy, i.e., 82.9%. However, if we examined the matrix, we can see that DT also produces a very high false negative rate, i.e., it incorrectly classifies the high-risk group as low-risk for 10.5%. This is not a desirable result for contact tracing purposes because those in the high-risk group are those people that are very likely to get infected but DT method classified them as low-risk. On the other hand, both NB and kNN have a higher false positive rate, i.e., 21.2%. This is relatively acceptable as it is rather to be more conservative than to be ignored. While the overall results are acceptable, there is still room for improvement. Instead of using the raw measurement, we compared the results with preprocess data. Furthermore, we would like to understand how the distancing threshold affects accuracy. 1) Implications of Filtered Window: As shown in Fig. 9 , we can mitigate the possible outliers by preprocessing the data. Note that Fig. 9 is obtained by applying the moving average with window size equals to 10. We further examine the effect of window size on the risk classification performance, and the results are shown in Fig. 11 the window size increases. However, NB does not show any performance gain with increased window size. LDA, on the other hand, starts to show fluctuation when the window size increases. This could be due to overfitting during the training process. Overall, we see that the performance starts to saturate when the window size is more than 100. The performance gain with respect to window size is shown in Fig. 12 . The performance gain is computed by subtracting the accuracy obtained by the filtered RSS with the accuracy obtained by the raw RSS and then divided by the accuracy obtained by the raw RSS. From Fig. 12 , we can see that PL is the one that has benefited from the filtered RSS, in which it has higher performance gain than the rest. In particular, we see that the accuracy obtained via PL is increased from 79.6% to 83.36%, which is corresponding to 4.68% performance gain. However, not all the methods are benefited from the filtered RSS. Some methods show a performance drop when the window size increases, for example, SVM. Even though DT and LDA can achieve better performance than PL model, both of these methods require extensive training and the accuracy may drop when there are not sufficient data for training. 2) Implications of Physical Distancing Threshold: As discussed previously, our SCT system classifies the contact to high-risk or low-risk according to the physical distancing rule recommended by the health authority. While we might incorrectly classify the high-risk contact to low-risk due to the fluctuation of the RSS value, we observed that the classification accuracy, in fact, increases when the distancing threshold is smaller. This is preferable as we would definitely like to classify the user as high-risk when the user is very close to the infected person. We plotted the accuracy score for different distance thresholds for PL, DT, and LDA. These three methods are selected because they show a good performance, as discussed previously. We compared the accuracy score between raw data and filtered data, by setting the window size to 100. This window size is selected based on the window effect on the accuracy score discussed previously. The accuracy is high when the distancing threshold is less than 1 m, as shown in Fig. 13 Fig. 13 : The effect of distance thresholds (i.e., the distance rule used to classify the high-risk and low-risk contact) on the accuracy. system might produce some false negative, but this is mostly happening to the group of users with distance in between 1 m to 2 m from the infected individual. Interestingly, we also see that the accuracy increases when the distance threshold is increased from 1.5 m to 2 m. This indicates that the system is a bit conservative, in which it tends to classify the user as high-risk when the distancing threshold is more than 1.5 m. Overall, it is safer to have high false positive than high false negative especially if the virus is very contagious. We extended the experiment to investigate the proximity sensing performance in connection to the positions of smartphone on the body. The reason is that the user might not carry the smartphone on their hand most of the time. When they are walking on the street or doing grocery shopping, the user might either carry the phone on their hand, put their phone on their pocket or backpack. As shown in Fig. 14 , we consider additional five cases on top of the "Hand-to-Hand (HH)" case discussed previously. All the measurement data collected from all these six cases can be found in IEEE Dataport [10] and Github [11] . There are a total of 123,718 data points, in which HH contributes 19,903 data points. The additional five cases and their total data points are listed as follows: • Hand-to-Pocket (HP): one user carries the phone on his/her hand and another user keeps the phone in their pocket. There are a total of 16,081 data points collected for this case. Previously, we have verified that the accuracy in distance estimation did affect the classification performance. Hence, we examine the distance estimation performance for all the five cases above. In particular, we used the mean absolute error (MAE) to compute the error between the estimated distance and the ground truth distance. The CDF of the distance estimation errors for all the six cases is plotted, as shown in Fig. 15 . It is obvious that the filtered data achieves a better performance. The window size 100 is selected based on the justification provided in Fig. 11 . From the CDF, we can see that, for 80% of the time, the error is less than 1.27 m for HH case, 0.76 m HP case, 3.93 m HB case, 14.94 m PB case, 1.79 m PP case, and 1.48 m BB case. We observe that PB has the worst performance. This can be explained by the fact that the signals from both smartphones were suffered through different paths of attenuation. Hence, even though we tried to calibrate the model based on the environmental factor, the model is unable to capture such variations. For HH, PP, and BB cases, the signals from both sides might suffer similar path of attenuation. Take the BB case for example, the smartphone observed a signal blocked by a human body since the smartphone on the other side was located inside the backpack. Similarly, the smartphone on the other side also observed similar signal blocked by another human body. Hence, as long as both smartphones measure the signal from similar position, it is most likely to produce a good estimation. We also examine the classification performances for all these six cases. Table III shows the classification accuracy obtained using the raw data and the filtered data. From the table, we can see that DT achieves the best performance with more than 81% accuracy for all the cases. Compared to the classification based on distance estimation, the machine learning based is more robust to the signal variations caused by body shadowing. Rather than estimating the distance, these classifiers tried to memorize the output given the labeled input during the training process. Hence, the amount of data and the validity of data during the training is very important to train a good classifier. In general, the machine learning approach can be adopted when there are sufficient training data available, otherwise, the PL model is the best choice for instant proximity sensing. The accuracy of the PL model increases when the time duration users spent in contact increases, as shown in Fig. 16 . In general, when the duration increases, the smartphone will be able to observe more signals that help to produce better distance estimation and increase the classification accuracy. However, the accuracy starts to saturate after 10 s for most of the cases except HB case. This result indicates that the smartphone has already observed sufficient RSS data for making the most accurate distance estimation when the time duration is at least 10 s. The accuracy of the HB case, on the other hand, drops when the time duration increases. Since the signals arrive at two smartphones from different attenuation paths, the more signals the smartphone observes, the more confusing the smartphone in making a correct estimation. Note that both HB and PB cases converge to the same accuracy score when the time duration increases. It is clear that the varying attenuation paths due to the position of smartphones on different body positions can severely affect accuracy. Future work can be conducted to study the effect of attenuation paths from both sides and come up with an adaptive path loss model that can cater to such diversity in attenuation paths. VII. CONCLUSIONS Contact tracing is an essential measure in containing the further spread of a highly infected disease. We propose a smart contact tracing (SCT) system that can provide precise proximity sensing and classify the risk of encountered contact while providing a privacy-preserving signature protocol. From the experimental results, we verified that a BLE-based system for contact tracing is a prominent solution for epidemic control and prevention. Our SCT system offers tangible results of using RSS values for proximity sensing between two human beings. We have also shared the dataset in an open-source repository to encourage further research. 77.04% (76.87%, 81.29%) 81.44% (77.20%, 81.60%) LDA 76.99% (76.82%, 78.68%) 78.85% (77.16%, 79.02%) NB 76.70% (76.55%, 76.55%) 76.70% (76.86%, 76.86%) kNN 76.92% (76.75%, 77.82%) 77 PB DT 87.01% (87.02%, 87.45%) 87.51% (87.16%, 87.58%) LDA 87.01% (87.02%, 87.02%) 87.01% (87.18%, 87.18%) NB 87.12% (87.05%, 87.05%) 87.13% (87.21%, 87.21%) kNN 87.03% (86.96%, 86.96%) PP DT 73.35% (73.12%, 87.18%) 87.26% (73.47%, 87.34%) LDA 72.94% (72.82%, 87.13%) 87.23% (73.47%, 87.34%) NB 71.34% (71.22%, 73.77%) 73.90% (73.07%, 87.33%) kNN 71.41% (71.29%, 73.79%) BB DT 77.36% (77.26%, 90.78%) 90.85% (77.45%, 90.91%) LDA Next generation technology for epidemic prevention and control: Data-driven contact tracking Contact tracing and disease control Five things we need to do to make contact tracing really work Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing Smartphones and ble services: Empirical insights Ble beacons for internet of things applications: Survey, challenges, and opportunities Improved distance estimation with ble beacon using kalman filter and svm Rss localization using unknown statistical path loss exponent model Body shadowing and furniture effects for accuracy improvement of indoor wave propagation models China launches coronavirus 'close contact detector' app Coronavirus mobile apps are surging in popularity in south korea Privacy guidelines for contact tracing applications Tracesecure: Towards privacy preserving contact tracing Pan-european privacy-preserving proximity tracing We put the power to reduce the spread of covid-19 in the palm of your hand Pact: Private automated contact tracing Overview and evaluation of bluetooth low energy: An emerging low-power wireless technology Bluetooth: a viable solution for iot? Secure seamless bluetooth low energy connection migration for unmodified iot devices Rssi-based localization for wireless sensor networks with a mobile beacon Projecting the transmission dynamics of sars-cov-2 through the postpandemic period Epic: Efficient privacypreserving contact tracing for infection detection Modeling neighbor discovery in bluetooth low energy networks Ble beacons for indoor positioning at an interactive iot-based smart museum Notify-and-interact: A beacon-smartphone interaction for user engagement in galleries High resolution beacon-based proximity detection for dense deployment Improving ble beacon proximity estimation accuracy through bayesian filtering A compressive sensing approach to detect the proximity between smartphones and ble beacons Face-to-face proximity estimationusing bluetooth on smartphones Ble beacons in the smart city: Applications, challenges, and research opportunities Physical distancing: How to slow the spread of covid-19