key: cord-0894657-ipd3ecqu
authors: Sei, Yuichi; Ohsuga, Akihiko
title: Differentially Private Mobile Crowd Sensing Considering Sensing Errors
date: 2020-05-14
journal: Sensors (Basel)
DOI: 10.3390/s20102785
sha: d7df1ac8abe46ecf7e46a8cd0042ed7437e7acb2
doc_id: 894657
cord_uid: ipd3ecqu

An increasingly popular class of software known as participatory sensing, or mobile crowdsensing, is a means of collecting people’s surrounding information via mobile sensing devices. To avoid potential undesired side effects of this data analysis method, such as privacy violations, considerable research has been conducted over the last decade to develop participatory sensing that looks to preserve privacy while analyzing participants’ surrounding information. To protect privacy, each participant perturbs the sensed data in his or her device, then the perturbed data is reported to the data collector. The data collector estimates the true data distribution from the reported data. As long as the data contains no sensing errors, current methods can accurately evaluate the data distribution. However, there has so far been little analysis of data that contains sensing errors. A more precise analysis that maintains privacy levels can only be achieved when a variety of sensing errors are considered.

Today's smartphones are powerful minicomputers that contain an impressive array of sensing components such as cameras or accelerometers, with the ability to collect and analyze users' surrounding information [1] (Figure 1 ). Extensive research shows that as well as through mobile phones, data is collected through different means of transportation, such as trains, cars or bicycles. Such information collection is referred to as participatory sensing or mobile crowdsensing. Many studies have been conducted on participatory sensing. For example, Bridgelall et al. proposed a system that detects anomaly locations of roadways using participatory vehicle sensors [2] . Kozu et al. developed a hazard map of bicycle accidents based on data from accelerometers of participatory smartphones [3] .

Although high participation is necessary for participatory sensing to be successful, participants may be discouraged by privacy concerns or having to use extra battery power. As such, it is necessary to develop a participatory sensing method featuring both low battery power requirements and high privacy protection [4] .

Several frameworks use geotagged posts of Twitter and/or Instagram [5, 6] . Although Twitter and Instagram users disclose their locations intentionally, a privacy mechanism could motivate the users to share more geotagged posts.

Several privacy-preserving techniques have been proposed for participatory sensing, such as in References [7, 8]. By perturbing data based on ε-differential privacy [9, 10], privacy leakage can be controlled. Differential privacy has been used in many studies, such as References [11] [12] [13], as it is one of the strongest privacy metrics [14].

It is problematic, however, that although most collected data contain sensing errors, these seem to have been overlooked in the majority of existing studies. Therefore, the methods used in existing studies reconstruct not the true values but the sensed values with sensing errors (see Table 1). As such, the accuracy of the analysis based on current methods is compromised.

Table 1. The difference between the existing methods and our method.

Existing methods: The estimated distribution of sensing data containing errors.
Our method: The estimated distribution of true data without errors.

In this paper, we propose an architecture of privacy-preserving participatory sensing considering sensing errors. The proposed architecture consists of two parts. One is the anonymization technique on each participant's side (perturbing data with sensing errors [PDE] ). Each device perturbs its sensed data and then reports the perturbed data to the data collector. Because perturbed data is reported to the data collector, the data collector cannot know the true data distribution. Therefore, the proposed architecture also provides an estimation technique, which estimates the true data distribution based on the reported data, on the data collector's side (estimating true distribution considering sensing errors [ETE]).

We define our proposed model. This model is the same as that used in an existing study [8] except for sensing errors.

Sensed data on participants' surrounding environment that features some sensing errors, such as their location or the radiation level, is collected on mobile phones and sent to the data collector. It is then assumed that the data collector's analysis results in an accurate data distribution (see Figure 2 ).

Many factors are worth considering when developing mobile crowdsensing applications, such as radiation levels, urban planning, class of vehicle (for example, whether it is a flatbed truck, taxi or ambulance), and anonymous driver monitoring, as well as more general information such as the participant's city of residence, surrounding noise levels or personal data such as age and gender [15, 16].

There are several stages to the mobile crowdsensing application process. First, the crowdsensing application ID is determined by the data collector so that a selection of crowdsensing applications can be used simultaneously and still be easily distinguishable. Following from this, the data collector must source participants who own an electronic mobile device such as a GPS device or smartphone. Once a participant has volunteered to collaborate with the crowdsensing application, PDE, the proposed anonymization algorithm, is applied on the participant's device. The final stage is for the data collector to analyze the mass of reported data using ETE.

Because several studies suggest that measurement errors follow a normal distribution [17] , it is used in this paper as an error model. Standard deviation is defined by the parameter σ, which typifies normal distribution. It is widely recognized that true sensing data falls within the normal distribution pattern [18] [19] [20] . Indeed, a study of 29,000 items of GPS data collected by Devon et al. [21] and real-time gesture recognition achieved by the pose tracking accuracy of the Microsoft Kinect 2 reported by Wang et al. [22] both follow normal distribution patterns.

It can thus be predicted that error probability also, for the most part, emulates a normal distribution pattern. The accuracy of a sensor is normally depicted on a data sheet shown by sensor vendors. For example, a standard deviation of a normal distribution is shown on the data sheet. If an average error is shown, we can obtain the standard deviation of the normal distribution.

Jiang et al. proposed a fault diagnosis system that took into account a measurement error problem [23] . They assumed that the measurement error usually follows a normal distribution. Wang et al. proposed a measurement system for the rotational angle of the wheel [24] . They considered sensing error analysis to be a very important problem. Location errors of an accelerometer were set to follow normal distributions in their experiments.

MPU-6000 IMU is a low-cost navigation system for ground vehicles. Gonzalez et al. [25] collected real sensing data from MPU-6000 IMU and concluded that sensing errors of the accelerometers of MPU-6000 follow normal distributions, and that sensing errors of the gyroscopes of MPU-6000 can be modeled as pseudo normal distribution processes, although the errors do not follow a perfect normal distribution. They also collected real sensing data from Ekinox IMU and showed that sensing errors of its accelerometers and gyroscopes followed normal distributions.

Nguyen et al. discussed how sensing location errors affect mobile, robotic, wireless sensor networks [26]. The sensing location errors were modeled to follow normal distributions in their proposed algorithm. They showed that their algorithm realized a high performance using real data sets containing sensing errors.

Similarly, a machine learning technique featuring deep neural networks has been adopted by sensing systems. Several studies based on deep neural networks reported that prediction errors followed a normal distribution [27] [28] [29] . If several training samples can be amassed, a data collector can analyze the standard deviation of the error distribution.

Although not all sensing errors follow normal distributions, many sensing errors are considered to follow normal distributions, as described above. Our proposed method targets the situation in which sensing errors can be considered to follow normal distributions.

Assume that the data collector wants to analyze the noise level in each location to tackle a plane noise problem. To increase the number of participants, the data collector wants to mitigate the privacy issues of the participants. In this case, each participant can perturb her/his location information, and then each participant reports the perturbed location information. Because the reported location information is perturbed by each participant, the data collector should reconstruct the true information. However, because existing studies did not consider the sensing errors of the true location information, the accuracy of the reconstructed location information with the data collector will decrease.

In this paper, we aim to increase the accuracy of the reconstructed information with the data collector by modeling the sensing errors at the participants' side.

Extensive research in the data-mining field [30] [31] [32] indicates that differential privacy [33] is among the most powerful privacy measures available. The following context can be considered: an honest data holder with a database containing participants' true information is paired with a malicious data analyst desiring access to that database. Whenever the database is accessed by the analyst, noise is added to the query response based on a privacy mechanism A. Differential privacy can be understood in the following manner, with ε as a positive real number:

Definition 1 (ε-differential privacy). Databases D and D′ are neighboring databases if they differ in at most one record. A privacy mechanism A satisfies ε-differential privacy if and only if, for any output Y, the following inequality holds for all neighboring databases D and D′:

$$\Pr[A(D) = Y] \le e^{\varepsilon} \Pr[A(D') = Y].$$

This notion can be used for privacy-preserving participatory sensing [34]:

Definition 2 (local privacy). Databases x and x′ are neighboring databases of size 1. A privacy mechanism A satisfies ε-differential privacy if and only if, for any output y, the following inequality holds for all databases x and x′:

$$\Pr[A(x) = y] \le e^{\varepsilon} \Pr[A(x') = y].$$

Data analysis can be achieved through the data collector producing a data distribution, which is expressed by a (multidimensional) histogram or a cross-tabulation. To measure the difference between the distribution of the original data, which is known to neither the participants nor the data collector, and the distribution that the data collector estimates from the reported data, the mean squared error (MSE) is employed as the utility metric.

Let N denote the number of participants, and let $H_1, H_2, \ldots, H_{b_n}$ denote the bins of a histogram of sensed data. Here, $b_n$ represents the number of bins. Let $§_j$ denote the number of participants whose true data is categorized into $H_j$, and let $‡_j$ denote the number of participants categorized into $H_j$ in the histogram estimated by the data collector.

We use the MSE between $‡_j$ and $§_j$ to quantify the utility of the estimated histogram:

$$\mathrm{MSE} = \frac{1}{b_n} \sum_{j=1}^{b_n} \left( ‡_j - §_j \right)^2. \quad (2)$$
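As an illustration of the utility metric, the following is a minimal sketch that assumes the MSE is the average, over the $b_n$ bins, of the squared difference between the estimated and true counts. The function name `histogram_mse` and the example counts are hypothetical and not taken from the paper.

```python
from typing import Sequence


def histogram_mse(estimated: Sequence[float], true: Sequence[float]) -> float:
    """Average squared difference between two histograms over the same bins.

    Assumption: the error is averaged over the number of bins b_n and is
    computed on raw counts (no normalization by the number of participants).
    """
    if len(estimated) != len(true):
        raise ValueError("histograms must have the same number of bins")
    b_n = len(true)
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / b_n


# Hypothetical counts for a histogram with four bins.
true_counts = [10, 30, 40, 20]
estimated_counts = [12, 28, 41, 19]
mse = histogram_mse(estimated_counts, true_counts)
```

A smaller value indicates that the estimated histogram is closer to the histogram of the true data.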

The objective is to ensure that ε-differential privacy is achieved, each sensed value is anonymized, and a (multidimensional) histogram is created, while the MSE is minimized to retain superior quality. This is outlined below:

Problem 1. Given a set of participants U (the size of U is N), their sensed data $x_i$ (i = 1, . . . , N), and a privacy parameter ε, find anonymized data $y_i$ satisfying ε-differential privacy for all i. Moreover, given the anonymized data $y_i$, find the estimated data distribution $‡_i$ (i = 1, . . . , $b_n$) that minimizes the MSE.

Several privacy-preserving systems based on encryption, including [35] [36] [37], can be established for this context. They assume that the data collector might be a malicious entity but that the fraction of participants conspiring with the data collector can be no higher than a predefined value γ. Honest participants' private data could be leaked if the data collector colludes with more than γ% of the participants. It must also be highlighted that, as demonstrated in Section 2.1, it is quite simple for data collectors to create smartphone emulators and thereby collude freely in mobile crowdsensing situations. One increasingly trusted technique that safeguards participants' data even if the data collector and N − 1 of the N participants are conspiring together is randomized response [38]. Here, a sensed value represents a predefined category, which is substituted by another category with a certain probability before the data collector receives it. In this way, participants' privacy is to some extent ensured, as the true data is sent to the server with probability p and perturbed data with probability 1 − p. Although the data collection server cannot obtain reliable information about each participant's data, by collecting information from many participants and conducting a statistical analysis, it is possible for the data collection server to estimate the true data distribution with some degree of accuracy.
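The idea of randomized response can be sketched as follows. This is a simplified illustration and not the exact scheme of [38]: it assumes k categories, that a perturbed report is drawn uniformly from the other k − 1 categories, and that the collector inverts the known perturbation probabilities. The names `randomize` and `estimate_distribution` are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def randomize(category: int, k: int, p: float, rng: random.Random) -> int:
    """Report the true category with probability p, otherwise another one.

    Assumption: a perturbed report is uniform over the other k - 1 categories.
    """
    if rng.random() < p:
        return category
    other = rng.randrange(k - 1)
    return other if other < category else other + 1


def estimate_distribution(reports: Sequence[int], k: int, p: float) -> List[float]:
    """Estimate the true category frequencies from the perturbed reports.

    The observed frequency of category j is p * pi_j + q * (1 - pi_j) with
    q = (1 - p) / (k - 1); solving for pi_j gives the estimator below.
    """
    n = len(reports)
    counts = Counter(reports)
    q = (1.0 - p) / (k - 1)
    return [(counts.get(j, 0) / n - q) / (p - q) for j in range(k)]


rng = random.Random(0)
k, p = 4, 0.6
true_values = [0] * 5000 + [1] * 3000 + [2] * 1500 + [3] * 500
reports = [randomize(v, k, p, rng) for v in true_values]
estimate = estimate_distribution(reports, k, p)
```

No single report is reliable, but with many participants the estimated frequencies approach the true ones (0.5, 0.3, 0.15 and 0.05 in this hypothetical population).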

Several methods that extend randomized response have been proposed, such as [7, 8, 39] . In a method called S2Mb (single to randomized multiple dummies with bayes) [8] , each participant selects and reports several category IDs to the data collector. By adjusting the probability of selecting the original category ID and the number of selected category IDs, S2Mb can achieve higher accuracy while maintaining the privacy protection level. S2Mb outperformed other privacy-preserving methods [8] .

Task allocation is one of the main issues of mobile crowdsensing. Yang et al. proposed a privacy-preserving framework that can allocate tasks to each participant [40]. They assume that the data collector is a trusted entity, and the data collector has a signed agreement with participants. On the contrary, the data collector in our proposed method does not need such an agreement.

The protecting location privacy (PLP) framework was proposed by Ma et al. [41] . Each participant specifies her/his privacy location in advance, and the participant sends all sensing data with location information as they are, except for the data sensed in the specified privacy location. Because several data are sent to the data collector without any modification, PLP does not satisfy differential privacy. PLP uses another privacy metric named δ-privacy, and satisfying differential privacy is out of the scope of the PLP framework.

In mobile crowdsensing, there is a tradeoff between participants' privacy and data utility. Gao et al. proposed a game model that addressed this contradictory issue [42] . Their method helps to determine the value of the privacy budget of differential privacy. As noted in their paper, their method does not care about how to add noises to the sensing data or how to conduct statistical analysis. Our proposed perturbing data with sensing errors (PDE) and estimating true distribution considering sensing errors (ETE) can be used for adding noise and conducting statistical analysis, respectively.

Huai et al. proposed a privacy-preserving aggregation framework [43] . Their method can realize high data utility while preserving privacy. They assume that the data collector might not be a trusted entity. However, if many participants collude with the data collector, an honest participant's privacy will be leaked to the data collector. Because it is difficult for the honest participant to know how many other honest participants there are, the honest participant still might have privacy concerns.

Huang et al. proposed a privacy-preserving incentive mechanism for mobile crowdsensing [44] . Their target application is a noise monitoring system that collects noise levels and corresponding location information. Because the noise level is related to the location, an attacker could estimate each participant's location using a location inference attack. Although they did not consider sensing errors, Huang et al.'s proposed mechanism satisfies differential privacy and prevents location inference attacks.

Nonetheless, sensing errors are not taken into account in the methods outlined.

There are many privacy metrics other than differential privacy. For example, k-anonymity was originally proposed as a privacy model when publishing medical data [45], and it is used today in many studies [46, 47]. k-anonymity ensures that there are k or more records that have the same quasi-identifier values so that k-anonymity can protect against "identity disclosure". For example, a database that originally recorded ages in 1-year increments can be abstracted to 30s, 40s, and so forth. Even if an attacker knows all the quasi-identifier information about a given user, because there are k or more records corresponding to that user, the attacker cannot tell beyond a 1/k level of confidence which record belongs to that user. There are also k-anonymity related privacy metrics such as l-diversity [48] and t-closeness [49]. These privacy metrics are also important; however, applying our proposed model to privacy metrics other than differential privacy is out of the scope of this paper and considered for future work.

An incentive mechanism is a very important issue for mobile crowdsensing. If the incentive mechanism works well, it is expected that the crowdsensing system can gather many participants even if the privacy levels are relatively low. On the other hand, if there are no good incentive mechanisms, the privacy levels should be higher to recruit many participants.

Suliman et al. proposed an incentive-compatible mechanism for group recruitment [50] . They considered the greediness of participants of in-group recruitment, and the proposed mechanism can increase the quality of the collected information by selecting participants who are expected to give high-quality data at a low cost.

A reverse auction mechanism also can be used for recruiting participants. The participants bid their expected rewards, and the crowdsensing manager selects good participants. In general, the winning probability is not known to the participants. Modified reverse auction (MRA) mechanisms proposed by Saadatmand et al. provide the estimated winning probability to participants [51] .

The participants can modify their bidding price to increase their probability of winning. Wu et al. proposed a modified Thompson sampling worker selection (MTS-WS) mechanism, which uses reinforcement learning to estimate each participant's data quality [52] .

The prevention techniques against false data injection attacks are also important for the success of mobile crowdsensing. We can use these techniques, such as in References [53] [54] [55] , to select reliable participants who contribute to maximizing the quality of mobile crowdsensing.

Zhang et al. proposed a privacy-preserving crowdsensing framework using an auction mechanism [56] . They assumed that the data collector is a trusted entity, and each participant sends her/his sensitive data to the data collector as-is. Therefore, the privacy information of participants is known to the data collector. On the contrary, we assume that the data collection server might not be a trusted entity. Each participant's original data need not be sent to any other entities in our proposed method.

There are several important mobile crowdsensing survey articles. Capponi et al. analyzed mobile crowdsensing studies and outlined future research directions [57] . Liu et al. [58] focused on privacy and security, resource optimization, and incentive mechanisms. They argued that ensuring privacy and trustworthiness is important.

Pouryazdan et al. [59] proposed three new metrics to quantify the performance of mobile crowdsensing: platform utility, user utility, and false payments. Using these metrics, they showed that data trustworthiness and data utility could be improved by collaborative reputation scores, which are calculated based on statistical reputation scores and vote-based reputation scores.

Pouryazdan et al. [60] proposed a gamification incentive mechanism. They formulated a game theory approach and showed that their mechanism could improve data trustworthiness greatly. Moreover, the proposed mechanism could prevent the data collector from paying rewards to malicious participants.

Xiao et al. formulated the interactions between the data collector and the participants as a Stackelberg game [61] . Because the sensing accuracy determined the reward, each participant was motivated to sense highly accurate data. Deep Q-Network, a reinforcement learning algorithm with deep neural networks, was used to determine the optimal reward.

Privacy-preserving mechanisms, including our proposed method, could be combined with such incentive mechanisms to increase participants while maintaining a low cost.

Domínguez et al. [5] proposed a method that detects unusual events based on geolocated posts on Instagram. The framework uses DBSCAN, a density-based clustering algorithm that executes an outlier detection algorithm to detect unusual events. INRISCO, an incident detection platform for smart cities, was proposed by Igartua et al. [6] . INRISCO uses Twitter and Instagram posts along with the data of vehicular and mobile ad hoc networks. Although Twitter and Instagram users disclose their locations intentionally, privacy-preserving mechanisms and incentive mechanisms could motivate the users to share more geotagged posts. As a result, the ability to detect unusual events can be improved.

We assume that sensing errors follow a probability distribution such as a normal distribution, as described in Section 2.1.

Here, there are two scenarios. In the first scenario, the standard deviation of the sensing error is not considered private information. Because the standard deviation itself does not have any sensitive meaning, this scenario is reasonable. In the second scenario, we consider the standard deviation of the sensing error to also be private information. For example, if the standard deviation is correlated with the sensing value, then the second scenario is preferred. Our proposed architecture can address both scenarios.

A differentially private value can be obtained by adding Laplace noise to a target value [9]. Each participant adds a Laplace noise to the sensed value; then, the noised value is reported to the data collector. The data collector estimates the data distribution (see Figure 2) from all of the reported values. If only one person participates in the participatory sensing, then the data collector concludes that the reported value is most likely to be the real value. However, if there are many participants, the data collector can estimate a more accurate data distribution through the statistical analysis proposed in this paper.

Our main notations are summarized in Table 2.

Set of standard deviations of the normal distributions of sensing errors of all participants.
b_n: Number of bins of a histogram.
maxv_org: Maximum value of the sensing data.
minv_org: Minimum value of the sensing data.
maxv_rep: Maximum value of the reported data.
minv_rep: Minimum value of the reported data.
maxσ_org: Maximum value of a standard deviation.
minσ_org: Minimum value of a standard deviation.
b_v: Scale factor of a Laplace noise with regard to the sensing value.
b_σ: Scale factor of a Laplace noise with regard to the standard deviation.
†_i: Number of participants whose reported values were categorized into the ith bin.
‡_i: Estimated number of participants whose true values were categorized into the ith bin.

In this section, we propose an anonymization technique at each participant's side: perturbing data with sensing errors (PDE).

The data collector determines the minimum and maximum values of the sensed data for which to use differential privacy. For example, the data collector can decide that the noise volume sensed by a participant ranges from 0 to 120 dB. If the sensed value is out of this range, the value is set to 0 (if the sensed value is less than 0) or 120 (if the sensed value is greater than 120) on the participant's device. Let minv org and maxv org represent the minimum and maximum values of the sensed values.

The value range of the perturbed data is unbounded because a Laplace noise is added to the sensed data. To avoid decreasing the accuracy of an estimated histogram, the data collector also determines the minimum and maximum values of the reported data with which to create a histogram. Let minv rep and maxv rep represent these values.

If the data collector considers the standard deviation of the sensing error to also be private information, then the data collector will determine the minimum and maximum values of the standard deviation. Let minσ org and maxσ org represent these values.

The Laplace mechanism [9], which adds noise based on the Laplace distribution, can be used. The theorem of the Laplace mechanism for data collection is as follows.

Theorem 1. A privacy mechanism A realizes ε-differential privacy if A adds the Laplace noise Lap(∆/ε), where ∆ is the range of the target attribute's possible values, and Lap(b) returns independent Laplace random variables with the scale parameter b.
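To make the role of the scale parameter concrete, the following minimal sketch samples Laplace noise with scale ∆/ε and checks numerically that the ratio of output densities for the two most distant inputs stays within $e^{\varepsilon}$. The value range (0 to 120), ε = 1, and the helper names `laplace_pdf`, `laplace_noise` and `worst_ratio` are assumptions made for illustration only.

```python
import math
import random
from typing import Iterable


def laplace_pdf(y: float, loc: float, scale: float) -> float:
    """Density of a Laplace distribution with location loc and scale b."""
    return math.exp(-abs(y - loc) / scale) / (2.0 * scale)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Hypothetical attribute range and privacy budget.
lower, upper, epsilon = 0.0, 120.0, 1.0
scale = (upper - lower) / epsilon  # Delta / epsilon


def worst_ratio(outputs: Iterable[float]) -> float:
    """Largest density ratio between the two extreme inputs over the outputs."""
    return max(
        laplace_pdf(y, lower, scale) / laplace_pdf(y, upper, scale)
        for y in outputs
    )


ratio = worst_ratio([-500.0, 0.0, 60.0, 120.0, 500.0])

# The noise has zero mean, so the average of many reports stays near the input.
rng = random.Random(0)
samples = [70.0 + laplace_noise(scale, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

Because any two inputs differ by at most ∆, the density ratio is bounded by exp(∆/b) = exp(ε), which is the property the theorem relies on.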

If the standard deviation is considered private information, then a Laplace noise is added to the standard deviation as well as to the sensing data.

In the second scenario, in which the standard deviation σ of the sensing error is considered private information, a Laplace noise is added to not only the sensed value x but also the value of σ. If two elements are protected by ε-differential privacy, we should divide the privacy budget between the two elements [34].

Algorithm 1 shows the PDE algorithm.

Input: minv_org, maxv_org, minv_rep, maxv_rep, minσ_org, maxσ_org, ε.
Output: Reported value v and standard deviation σ of the sensing error
1: Obtain sensed value v and standard deviation σ of sensing error
2: if the standard deviation is considered as private information then
3:   ε ← ε/2
4: end if
5: v ← min(max(minv_org, v), maxv_org) /* If v is smaller than minv_org (or larger than maxv_org), v is set to minv_org (or maxv_org). */
6: v ← v + Lap((maxv_org − minv_org)/ε) /* The global sensitivity is maxv_org − minv_org. */
7: v ← min(max(minv_rep, v), maxv_rep) /* If v is smaller than minv_rep (or larger than maxv_rep), v is set to minv_rep (or maxv_rep). */
8: if the standard deviation is considered as private information then
9:   σ ← min(max(minσ_org, σ), maxσ_org) /* If σ is smaller than minσ_org (or larger than maxσ_org), σ is set to minσ_org (or maxσ_org). */
10:   σ ← σ + Lap((maxσ_org − minσ_org)/ε) /* The global sensitivity is maxσ_org − minσ_org. */
11: end if
12: Report v and σ.

First, a sensing device for each participant measures target data. The device obtains the sensed value v and the standard deviation σ (Line 1). If the standard deviation is considered to be private information, the privacy budget ε is divided by two (Lines 2-4).

If v is smaller than minv org , v is set to minv org , and if v is larger than maxv org , v is set to maxv org (Line 5). Then, PDE adds a Laplace noise to v to satisfy ε-differential privacy (Line 6). Here, the global sensitivity is maxv org − minv org .

Finally, if the value of v with Laplace noise is smaller than minv rep (or larger than maxv rep ), v is set to minv rep (or maxv rep ) (Line 7). If the standard deviation σ is considered to be private information, PDE adds a Laplace noise to σ (Line 10).
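The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative translation under stated assumptions rather than the authors' implementation: the function name `pde`, the optional `sigma_range` argument (used here to signal that σ is private), and sampling the Laplace noise as a difference of two exponential variables are all choices made for this sketch.

```python
import random
from typing import Optional, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(x: float, lo: float, hi: float) -> float:
    return min(max(lo, x), hi)


def pde(
    v: float,
    sigma: float,
    epsilon: float,
    minv_org: float,
    maxv_org: float,
    minv_rep: float,
    maxv_rep: float,
    sigma_range: Optional[Tuple[float, float]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Perturb a sensed value (and optionally sigma) before reporting.

    sigma_range=(min_sigma_org, max_sigma_org) means sigma is private;
    None means sigma is reported as is (first scenario).
    """
    rng = rng or random.Random()
    if sigma_range is not None:
        epsilon = epsilon / 2.0  # Lines 2-4: split the privacy budget
    v = clip(v, minv_org, maxv_org)  # Line 5
    v += laplace_noise((maxv_org - minv_org) / epsilon, rng)  # Line 6
    v = clip(v, minv_rep, maxv_rep)  # Line 7
    if sigma_range is not None:
        min_s, max_s = sigma_range
        sigma = clip(sigma, min_s, max_s)  # Line 9
        sigma += laplace_noise((max_s - min_s) / epsilon, rng)  # Line 10
    return v, sigma  # Line 12


# Hypothetical noise-level example: sensing range 0 to 120 dB,
# reporting range -60 to 180 dB.
rng = random.Random(1)
v1, s1 = pde(130.0, 2.0, 1.0, 0.0, 120.0, -60.0, 180.0, rng=rng)
v2, s2 = pde(65.0, 2.0, 1.0, 0.0, 120.0, -60.0, 180.0,
             sigma_range=(0.5, 5.0), rng=rng)
```

In the first call σ is returned unchanged; in the second call both v and σ are perturbed, each with half of the privacy budget.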

The proposed PDE realizes ε-differential privacy.

Proof. The global sensitivity $\Delta_v$ of a sensing value and the global sensitivity $\Delta_\sigma$ of the standard deviation of a sensing error are ($maxv_{org} - minv_{org}$) and ($max\sigma_{org} - min\sigma_{org}$), respectively. According to Theorem 1, when a Laplace noise with scale $\Delta_v/\varepsilon$ is added to the sensing value, we can achieve ε-differential privacy with regard to the sensing value. Similarly, when a Laplace noise with scale $\Delta_\sigma/\varepsilon$ is added to the standard deviation of the sensing error, we can achieve ε-differential privacy with regard to the standard deviation. When we consider the standard deviation to be private information, we should achieve ε-differential privacy for the combination of the sensing value and the standard deviation. In this case, PDE achieves ε/2-differential privacy with regard to the sensing value and with regard to the standard deviation, respectively. Therefore, according to Reference [34], PDE achieves ε-differential privacy in total.

In this section, we propose an estimation technique that estimates the true data distribution based on the reported data, at the data-collector side: estimating true distribution considering sensing errors (ETE).

The data collector estimates the true data's distribution, which is represented by a (multi-dimensional) histogram, from the reported data. The true data point of each participant might be unknown even to the participant. Let F(y; x, θ) be the probability density function with regard to y, which represents the reported sensing value, where x represents the true value and θ represents the set of parameters comprising the sensing error and a Laplace noise. Let $x_i$ and $y_i$ represent the true sensing value and the reported sensing value of participant i, respectively. The value $y_i$ contains a sensing error following a normal distribution and a Laplace noise to satisfy ε-differential privacy. That is, when the true value is $x_i$, the probability density with which the reported value becomes $y_i$ is $F(y_i; x_i, \theta)$. Let X and Y represent $\{x_1, \ldots, x_N\}$ and $\{y_1, \ldots, y_N\}$, respectively. Based on F(y; x, θ), by using Bayes' technique, we can estimate the distribution of X from Y.

Let w be the width of each bin of the histogram. The value of w is calculated by

$$w = \frac{maxv_{rep} - minv_{rep}}{b_n}, \quad (3)$$

where $b_n$ represents the number of bins of an estimated histogram, as determined by the data collector. The function F(y; x, θ) is a probability density function, and y is a continuous random variable. The number of samples of y is finite in a real situation; therefore, we approximate the probability density function as a probability mass function. The domain of y is defined as $V = (minv_{rep} + w/2,\ minv_{rep} + 2w/2,\ minv_{rep} + 3w/2,\ \ldots,\ minv_{rep} + b_n w/2)$.
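The discretization step can be illustrated with a short sketch that computes the bin width from the reporting range and counts how many reported values fall into each bin. It assumes equal-width bins over [minv_rep, maxv_rep] and uses zero-based bin indices, whereas the text counts bins from 1; the function names and the example values are hypothetical.

```python
from typing import List, Sequence


def bin_width(minv_rep: float, maxv_rep: float, b_n: int) -> float:
    """Width w of each of the b_n equal-width bins."""
    return (maxv_rep - minv_rep) / b_n


def bin_index(y: float, minv_rep: float, maxv_rep: float, b_n: int) -> int:
    """Zero-based bin index of a reported value y.

    Values on or beyond the range limits are assigned to the first or last bin.
    """
    w = bin_width(minv_rep, maxv_rep, b_n)
    idx = int((y - minv_rep) // w)
    return min(max(idx, 0), b_n - 1)


def reported_histogram(
    values: Sequence[float], minv_rep: float, maxv_rep: float, b_n: int
) -> List[int]:
    """Count the reported values per bin (the counts written as dagger_i)."""
    counts = [0] * b_n
    for y in values:
        counts[bin_index(y, minv_rep, maxv_rep, b_n)] += 1
    return counts


# Hypothetical reported noise levels with a reporting range of -60 to 180 dB.
reported = [-60.0, -31.0, 0.0, 45.5, 179.9, 180.0]
w = bin_width(-60.0, 180.0, 8)
counts = reported_histogram(reported, -60.0, 180.0, 8)
```

With eight bins the width is 30 dB, and the value 180.0, which lies exactly on the upper limit, is counted in the last bin.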

Let P be the b n × b n matrix and P(i, j) represent the value of P in the ith row and jth column. P(i, j) represents the probability that the reported value is categorized into jth bin when the true value is categorized into ith bin.

Let $†_i$ be the number of participants whose reported values are categorized into the ith bin, and let $‡_i$ be the estimated number of participants whose true values are categorized into the ith bin. Let † and ‡ be the sets $\{†_1, \ldots, †_{b_n}\}$ and $\{‡_1, \ldots, ‡_{b_n}\}$, respectively.

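Based on the iterative Bayes' technique [62], we have

$$‡_i^{(t+1)} = \sum_{j=1}^{b_n} †_j \, \frac{P(i,j)\, ‡_i^{(t)}}{\sum_{k=1}^{b_n} P(k,j)\, ‡_k^{(t)}}, \quad (5)$$

where $‡_i^{(t)}$ denotes the estimate of $‡_i$ after the tth iteration.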
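A minimal sketch of an iterative Bayesian update is given below. It assumes the standard form of the technique, in which each estimate is redistributed according to the posterior probability that a report in bin j originated from bin i, and it starts from a uniform estimate; the function name, the starting point and the small 3 × 3 matrix are hypothetical and chosen only to keep the example verifiable.

```python
from typing import List, Sequence


def iterative_bayes(
    reported_counts: Sequence[float],
    P: Sequence[Sequence[float]],
    iterations: int = 500,
) -> List[float]:
    """Estimate true bin counts from reported bin counts.

    P[i][j] is the probability that a value whose true bin is i is
    reported in bin j (zero-based indices).
    """
    b_n = len(reported_counts)
    n = sum(reported_counts)
    estimate = [n / b_n] * b_n  # assumed uniform starting point
    for _ in range(iterations):
        # denom[j]: expected number of reports in bin j under the estimate
        denom = [
            sum(P[k][j] * estimate[k] for k in range(b_n)) for j in range(b_n)
        ]
        estimate = [
            sum(
                reported_counts[j] * P[i][j] * estimate[i] / denom[j]
                for j in range(b_n)
                if denom[j] > 0
            )
            for i in range(b_n)
        ]
    return estimate


# Hypothetical perturbation matrix and true counts (100, 50, 10); the
# reported counts are the exact expectation of the true counts under P.
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
reported_counts = [85.0, 62.0, 13.0]
estimated = iterative_bayes(reported_counts, P)
```

Each update preserves the total number of participants, so the estimates always sum to the number of reports.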

Equation (5) is repeated a sufficient number of times. Several values of the estimated data distribution might be negative. Therefore, the data distribution should be adjusted so that all values are greater than or equal to zero. The values are adjusted based on the probability simplex algorithm [63]. Moreover, because the data collector determines the value range for sensing in advance, values that are out of range should be zero. Note that to use differential privacy as a privacy metric, the value range must be determined in advance even if any other method that satisfies differential privacy is used. Therefore, in each iteration, for each i with

$$minv_{rep} + i\,w < minv_{org} \quad (6)$$

and each i with

$$minv_{rep} + (i-1)\,w > maxv_{org}, \quad (7)$$

we set

$$‡_i = 0 \quad (8)$$

because the true values are within $minv_{org}$ and $maxv_{org}$. Now, we describe how to obtain P. Each value of P(i, j) is calculated by the following equation for all values of i:

$$
\begin{aligned}
P(i, 1) &= \int_{-\infty}^{minv_{rep} + w} F(y;\ minv_{rep} + (i - 1) w + w/2,\ \theta)\, dy \\
P(i, j) &= \int_{minv_{rep} + (j - 1) w}^{minv_{rep} + j w} F(y;\ minv_{rep} + (i - 1) w + w/2,\ \theta)\, dy \quad \text{for } j = 2, \ldots, b_n - 1 \\
P(i, b_n) &= \int_{minv_{rep} + (b_n - 1) w}^{\infty} F(y;\ minv_{rep} + (i - 1) w + w/2,\ \theta)\, dy.
\end{aligned}
\quad (9)
$$

The function $F(y; x, \theta)$ differs for the two scenarios. First, we consider the scenario in which the standard deviation of the error distribution is not private information. That is, a Laplace noise is added to the sensed value before the value is reported to the data collector, but each participant reports the standard deviation of the sensing error to the data collector as it is. In this case, the data collector can determine the true standard deviation of the sensing error's normal distribution. Let $b_v$ be the scale factor of the Laplace noise with regard to the sensed value. The value $b_v$ is represented by

$$b_v = \frac{maxv_{org} - minv_{org}}{\epsilon}, \quad (10)$$

and we can consider $\theta = \{\sigma, b_v\}$. In this case,

$$F(y; x, \theta) = \int_{-\infty}^{\infty} N(t; x, \sigma)\, L(y; t, b_v)\, dt, \quad (11)$$

where $N(t; x, \sigma)$ represents the probability density of $t$ in a normal distribution with a mean of $x$ and a standard deviation of $\sigma$, and $L(y; t, b_v)$ represents the probability density of $y$ in a Laplace distribution with a mean of $t$ and a scale factor of $b_v$.
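For the first scenario, the matrix P can be evaluated numerically. The sketch below assumes that F is the convolution of the normal sensing error with the Laplace noise, integrates the closed-form Laplace CDF against a discretised normal density, and takes differences of the resulting CDF at the bin edges as in Equation (9). The helper names (`laplace_cdf`, `report_cdf`, `transition_matrix`), the grid size, and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def laplace_cdf(z, b):
    """CDF of a zero-mean Laplace distribution with scale b."""
    z = np.asarray(z, dtype=float)
    tail = 0.5 * np.exp(-np.abs(z) / b)
    return np.where(z < 0, tail, 1.0 - tail)

def report_cdf(y, x, sigma, b_v, n_grid=4001):
    """Pr(reported value <= y | true value x): normal error, then Laplace noise."""
    t = np.linspace(x - 8 * sigma, x + 8 * sigma, n_grid)   # grid of sensed values
    pdf = np.exp(-0.5 * ((t - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    weights = pdf * (t[1] - t[0])
    weights /= weights.sum()               # renormalise the discretised normal
    return float(np.sum(weights * laplace_cdf(y - t, b_v)))

def transition_matrix(minv_rep, maxv_rep, b_n, sigma, b_v):
    """P[i, j] = Pr(reported bin j | true value at the centre of bin i), 0-based."""
    w = (maxv_rep - minv_rep) / b_n
    P = np.zeros((b_n, b_n))
    for i in range(b_n):
        x = minv_rep + i * w + w / 2       # centre of true bin i
        inner = [report_cdf(minv_rep + j * w, x, sigma, b_v) for j in range(1, b_n)]
        cdf = np.array([0.0] + inner + [1.0])   # edge bins absorb the tails
        P[i] = np.diff(cdf)
    return P

P_example = transition_matrix(0.0, 10.0, 5, sigma=1.0, b_v=2.0)
print(np.round(P_example, 3))
```

Because the first and last bins integrate to minus and plus infinity, every row of the matrix sums to one by construction.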

In the second scenario, where the standard deviation σ of the sensing error is considered to be private information, a Laplace noise is added to not only the sensed value x, but also the value σ, as described in Section 4.2.

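Let $b_v$ and $b_\sigma$ be the scale factors of the Laplace noise with regard to the sensed value and the standard deviation, respectively. The values $b_v$ and $b_\sigma$ are represented by

$$b_v = \frac{maxv_{org} - minv_{org}}{\epsilon/2} \quad (12)$$

and

$$b_\sigma = \frac{max\sigma_{org} - min\sigma_{org}}{\epsilon/2}. \quad (13)$$

In this case, we consider $\theta = \{\sigma, b_v, b_\sigma\}$, and obtain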

Figure 3 shows a high-level diagram of the estimation algorithm (ETE), and Algorithm 2 shows the details.

Because the values of $P(i, j)$ ($i = 1, \ldots, b_n$ and $j = 2, \ldots, b_n - 1$) are the same when the values of $|i - j|$ are the same, we calculate only $P(1, j)$ (represented by $Q(j)$) and two additional values, represented by $left$ and $right$, in Lines 9-13. Then, we construct $P(i, j)$ in Lines 14-20. Figure 4a represents the relationship between $P(1, j)$ and $Q(j)$. Each value of $Q(j)$ represents the area marked by the corresponding arrow. The curve represents $F(y; x, \theta)$. The value of $x$ in Algorithm 2 can be arbitrary but is set to the middle of the area represented by $Q(1)$. Because the sum $left + right + \sum_{j=1}^{b_n} Q(j)$ is equal to one, we obtain the value of $right$ as $1 - left - \sum_{j=1}^{b_n} Q(j)$ in Line 13.
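The shortcut described above can be sketched in Python as follows. The sketch assumes that F depends only on the difference between the reported and true values and is symmetric, so it is passed in as a CDF of that difference. The first column is filled as the remainder of each row, which is an assumption made here for illustration; the function name `build_P_from_Q` and the plain Laplace CDF used in the example are likewise hypothetical stand-ins.

```python
import numpy as np

def build_P_from_Q(cdf, w, b_n):
    """cdf(d) = Pr(reported - true <= d). Returns P, Q, left, right (requires b_n >= 2)."""
    edges = -w / 2 + w * np.arange(b_n + 1)     # bin edges relative to x (centre of bin 1)
    left = cdf(edges[0])                         # mass below the first bin
    Q = np.diff([cdf(e) for e in edges])         # Q[j-1]: mass in bin j when truth is in bin 1
    right = 1.0 - left - Q.sum()                 # mass above the last bin
    P = np.zeros((b_n, b_n))
    for i in range(1, b_n + 1):                  # 1-based indices, as in the text
        for j in range(2, b_n):
            P[i - 1, j - 1] = Q[abs(i - j)]      # interior entries depend only on |i - j|
        P[i - 1, b_n - 1] = Q[b_n - i:].sum() + right   # last bin absorbs the upper tail
        P[i - 1, 0] = 1.0 - P[i - 1, 1:].sum()   # first bin takes the remainder (assumed)
    return P, Q, left, right

def laplace_cdf(d, b=1.0):
    """Stand-in for the CDF of F: a zero-mean Laplace distribution with scale b."""
    tail = 0.5 * np.exp(-abs(d) / b)
    return tail if d < 0 else 1.0 - tail

P_q, Q, left, right = build_P_from_Q(laplace_cdf, w=1.0, b_n=4)
print(np.round(P_q, 3))
```

In this example the first column decreases and the last column increases as the row index grows, which matches the behaviour of $P(i, 1)$ and $P(i, b_n)$ illustrated in Figure 4.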

Input: $Y$, $Y_\sigma$, $\epsilon$, $minv_{org}$, $maxv_{org}$, $minv_{rep}$, $maxv_{rep}$, $min\sigma_{rep}$, $max\sigma_{rep}$, $b_n$
Output: $\ddagger$
1: $\sigma_{ave} \leftarrow$ Average($Y_\sigma$) /* Consider $\sigma_{ave}$ to be the standard deviation of each participant */
2: if the standard deviation is considered private information then
3:     $b_v$ and $b_\sigma$ are calculated by Equations (12) and (13), and set $\theta = \{\sigma_{ave}, b_v, b_\sigma\}$.
4: else
5:     $b_v$ is calculated by Equation (10), and set $\theta = \{\sigma_{ave}, b_v\}$.
6: end if
7: $w \leftarrow (maxv_{rep} - minv_{rep})/b_n$ /* $w$ represents the width of each bin */
8: $x \leftarrow$ an arbitrary real number
9: $left \leftarrow \int_{-\infty}^{x - w/2} F(y; x, \theta)\, dy$
10: for $j = 1, \ldots, b_n$ do
11:     $Q(j) \leftarrow$ …
12: end for
13: $right \leftarrow 1 - left - \sum_{j=1}^{b_n} Q(j)$
14: for $i = 1, \ldots, b_n$ do
15-17:     …
18:     end for
19:     $P(i, b_n) \leftarrow \sum_{j=b_n-i+1}^{b_n} Q(j) + right$
20: end for
21: Set $\dagger_i$ for each $i$ based on $Y$.
22: for Repeat sufficient times do
23:     for $i = 1, \ldots, b_n$ do
24:         $d_i \leftarrow 0$
25:         for $j = 1, \ldots, b_n$ do
26:             $d_i \leftarrow d_i + P(k, j) * \ddagger_k$ /* Calculation of the denominator of Equation (5) */
27:         end for
28:     end for
29:     for $i = 1, \ldots, b_n$ do
30:         for $j = 1, \ldots, b_n$ do
31-40:             …

Figure 3. A high-level diagram of the estimation algorithm. The flowchart proceeds from the calculation of $P$ to the execution of the iterative Bayes' technique and the output of the estimated results.

Figure 4b,c represent the relationship between $P(2, j)$ and $Q(j)$ and the relationship between $P(3, j)$ and $Q(j)$, respectively. As $i$ increases, $P(i, 1)$ decreases and $P(i, b_n)$ increases.

Our proposed architecture models sensing errors. If the sensing errors are not considered, the estimation assumes that only a Laplace noise is added to the true data, even if the sensed data differs from the true data in a real situation. To verify the usefulness of considering the sensing errors, we developed a method that considers only the Laplace noise. We refer to this method as the Laplace mechanism. In this section, we compare our proposal with the Laplace mechanism and with S2Mb, which is described in Section 3. The Laplace mechanism, S2Mb, and the proposed method all use the iterative Bayes' technique. For each method and each simulation, we set the number of iterations to the value that gave the best result, up to a maximum of 100,000 iterations.

The source code for the proposed architecture can be obtained from https://uecdisk.cc.uec.ac.jp/index.php/s/WfIyH8hRMhoF01R. This source code consists of the server (data collector) program and the client (participant) program.

Apple's deployment ensures that $\epsilon$ is equal to 1 or 2 for each datum [64], and that the total privacy loss is 16 per day. An Apple differential privacy team set $\epsilon$ = 2, 4, 8 for its evaluations [65]. Based on these settings, $\epsilon$ is set in the range 1-15 in the experiments.

First, we evaluated the MSE using synthetic datasets. We conducted experiments using several distributions to determine how different data distributions would affect the results. We used three distributions: normal, uniform, and peak. In the uniform distribution, all values of §_i were set to the same value. In the normal distribution, the values of §_i followed a normal distribution. In the peak distribution, all of the participants had the same true value.

Every setting was executed 10 times. The average results are shown in Figure 5 for the case in which the standard deviation of sensing errors is not considered private information. Because the MSEs measure the difference between the true number of people and the estimated number of people within each bin, the MSEs become larger as the number of participants $N$ becomes larger. A large value of $\epsilon$ means a low privacy-protection level. Therefore, when $\epsilon$ is large, the MSEs tend to become small for all methods. Figure 6 represents the experimental results when the standard deviation of the sensing errors is considered private information. Because the standard deviation should be protected in the same way as the sensed values in this situation, the MSEs are larger than those of the results in Figure 5. In all of the settings, the MSEs of our proposed architecture were the smallest among the three methods.
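The dependence of the MSE on N can be illustrated with a short Python sketch. It assumes that the MSE is the mean over bins of the squared difference between the true and estimated counts; the function name `histogram_mse` and the counts are made up for illustration.

```python
import numpy as np

def histogram_mse(true_counts, estimated_counts):
    """Mean over bins of the squared difference between true and estimated counts."""
    t = np.asarray(true_counts, dtype=float)
    e = np.asarray(estimated_counts, dtype=float)
    return float(np.mean((t - e) ** 2))

small = histogram_mse([10, 20, 30], [12, 18, 30])        # 60 participants
large = histogram_mse([100, 200, 300], [120, 180, 300])  # same relative error, 600 participants
print(small, large)
```

With the same relative error per bin, multiplying the number of participants by 10 multiplies this count-based MSE by 100.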

We measured the calculation time at the data collection server's side. All of the experiments were conducted on a desktop PC with an Intel i7-4770 CPU and 16 GB of RAM. The average calculation time was less than 1 s for the Laplace mechanism and for S2Mb. Our proposed ETE required 14.7 s for each simulation, on average. Although the calculation time of the proposed method is longer than those of the other methods, we believe that the time does not greatly impact the data analysis because gathering participants takes a much longer time (for example, a few days). 

We implemented our proposed PDE as a smartphone application for Android to obtain real sensing data with sensing errors and to verify the algorithm's feasibility.

Operating systems such as iOS and Android express location by latitude, longitude, and uncertainty (https://developer.apple.com/documentation/corelocation/cllocation [Accessed on 26 March 2020], https://developer.android.com/reference/android/location/Location [Accessed on 26 March 2020]). Uncertainty means a radius of a circle centered at the location's latitude and longitude, and the true location is inside the circle with 68% probability. In a normal distribution, 68% of the data fall within one standard deviation from the mean.

The smartphone was kept in the same place and sensed its location along with its uncertainty 200 times. In this experiment, we considered that 200 different people were in the same place. The true distribution of locations is shown in Figure 7. The smartphone reported its differentially private location and uncertainty to the data-collection server. We evaluated the MSEs of each method. Figure 8 represents the results. The MSEs of our proposed method were much smaller than those of the other methods. Figures 9 and 10 show example histograms generated with the Laplace mechanism, S2Mb, and the proposed method. The standard deviation of the sensing errors was considered private information in Figure 10. The histograms of Figures 9c and 10c, which were generated by our proposed architecture, are similar to the true histogram (Figure 7). However, the histograms generated by the Laplace mechanism and S2Mb (Figures 9a,b and 10a,b) are very different from the true histogram. Furthermore, because some participants are concerned about battery consumption [66], we measured the calculation time needed for sensing the GPS and generating differentially private data. The smartphone used in this experiment was an SH-M09 with a Snapdragon 845 CPU and 4 GB of RAM. The application was developed in Java. The average time over 10 simulations was 100.6 ms. Our PDE is efficient for smartphones, and participants do not need to worry about their smartphones' battery life.

Crowdsensing might collect the output of a machine learning model, such as a deep neural network (DNN). For example, each participant's device can recognize his/her activity from an accelerometer, magnetometer, and gyroscope [67, 68] and recognize surrounding people's ages from pictures [69, 70]. Surrounding information, such as how many people there are and how old they are, is useful for analyzing a pandemic such as the coronavirus pandemic. For example, age is an important factor for COVID-19 [71, 72].

The estimated values from deep neural networks might include estimation errors, and researchers such as [27] [28] [29] have reported that such estimation errors followed a normal distribution. Several machine-learning models can output the probability distribution of a model's estimated value. For example, the age-estimation model [73] outputs the probability of a person being each age (e.g., the probability of being 1 year old is 0.01%, the probability of being 2 years old is 0.05%, . . ., the probability of being 33 years old is 32.3%, . . .). We developed a deep neural network model that estimates a person's age from a picture, based on Reference [73].
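Given such a per-age probability vector, a point estimate and a standard deviation can be derived on the device. The following Python sketch shows one way to do this; the function name `mean_and_std` and the three-age distribution are hypothetical, and treating the expected value as the point estimate is an assumption made for illustration.

```python
import numpy as np

def mean_and_std(ages, probs):
    """Expected age and standard deviation of a discrete age distribution."""
    ages = np.asarray(ages, dtype=float)
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                                   # normalise in case of rounding
    mean = float(np.sum(ages * p))
    std = float(np.sqrt(np.sum(p * (ages - mean) ** 2)))
    return mean, std

mean_age, std_age = mean_and_std([30, 31, 32], [0.25, 0.5, 0.25])
print(mean_age, round(std_age, 4))   # 31.0 0.7071
```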

We assume that it does not make a big difference whether the participants report sensing data or an estimated age value. This is because the estimation errors of deep neural networks can be considered to follow normal distributions, much as sensing errors do. Not all estimation errors of deep neural networks follow normal distributions; however, several do, and our proposed method targets such deep neural networks. We conducted this experiment to confirm that our proposed method can be used for the outputs of deep neural networks.

Table 3 shows the architecture of the deep neural network model we constructed. All of the activation functions of the layers are rectified linear units (ReLUs [74]). The loss function was the softmax function. Our aim is not to maximize the accuracy of the deep neural network itself; the accuracy might be increased further by tuning the architecture or parameters.

We assumed that a crowdsensing application on each smartphone would estimate the ages of surrounding people. Because the model outputs the probability distribution of age, our PDE can calculate the standard deviation of errors at each device. Figure 11 represents the probability distributions of age, which were obtained from the trained deep neural network model. These distributions can be considered normal distributions. We used the WIKI dataset, which consists of 22,578 instances (1 GB) (https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/wiki_crop.tar [Accessed on 26 March 2020]). Fifty percent of the dataset was used for our prediction task; that is, we assumed that 11,289 people were the participants. The data collector estimated the true age distribution from the reports. Because each picture in the WIKI dataset is labeled with the true age, we can evaluate the performance of the Laplace mechanism, S2Mb, and the proposal. Figure 12 summarizes the results of this experiment. In both scenarios, the MSEs of the proposal were smaller than those of the other methods in almost all settings. The true and estimated data distributions are shown in Figure 13. The line of the proposal fits the true values' line in Figure 13a

In this paper, we assume that a sensing campaign assigns a single sensing task, for simplicity of discussion. However, our method can also easily be used for multiple tasks.

Assume that there are two tasks. For example, the first task is collecting noise levels, and the second task is collecting humidity. In this case, we assume that the aim of the data collector is to create a 3D histogram (Figure 14).

Each participant perturbs the two values separately by our proposed PDE method. Then, each participant reports the resulting values and the standard deviations to the data collector. The data collector constructs $P_1(i_1, j_1)$ for the first task (noise sensing) and $P_2(i_2, j_2)$ for the second task (humidity sensing) separately (Lines 1-20 in Algorithm 2). Here, $P_1(i_1, j_1)$ represents the probability that the reported value of the first task is categorized into the $j_1$th bin when the true value of the first task is categorized into the $i_1$th bin in the first dimension. In the example in Figure 14, $P_1(1, 2)$ represents the probability that the reported value of the noise is "Noise 2" when the true value of the noise is "Noise 1".

Assume that the number of bins for the first task is $b_{n1}$, and the number of bins for the second task is $b_{n2}$. In the example in Figure 14, $b_{n1} = 4$ and $b_{n2} = 5$. The data collector constructs $P_{1,2}([i_1, i_2], [j_1, j_2])$ for $i_1, j_1 = 1, \ldots, b_{n1}$ and $i_2, j_2 = 1, \ldots, b_{n2}$, which represents the probability that the reported values of the first and second tasks are categorized into the $j_1$th and $j_2$th bins, respectively, while the true values of the first and second tasks are categorized into the $i_1$th and $i_2$th bins, respectively. In the example in Figure 14, $P_{1,2}([1, 3], [2, 1])$ represents the probability that the reported values are "Noise 2" and "Humidity 1," while the true values are "Noise 1" and "Humidity 3".

Because each sensed value is perturbed separately, we can calculate $P_{1,2}([i_1, i_2], [j_1, j_2]) = P_1(i_1, j_1) \cdot P_2(i_2, j_2)$. Then, the data collector executes the iterative Bayes' technique using $P_{1,2}([i_1, i_2], [j_1, j_2])$ (Lines 21-40 in Algorithm 2). Finally, the data collector obtains the estimated number of people in each two-dimensional bin (Figure 14).
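The product structure of the joint matrix corresponds to a Kronecker product once each pair of bin indices is flattened into a single index. The Python sketch below illustrates this with two small made-up matrices; the flattening order and the helper name `joint` are choices made for this example (indices are 0-based here).

```python
import numpy as np

P1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])             # task 1: 2 bins (made-up values)
P2 = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.2, 0.7]])        # task 2: 3 bins (made-up values)

# Row index = i1 * b_n2 + i2, column index = j1 * b_n2 + j2.
P12 = np.kron(P1, P2)

def joint(i1, i2, j1, j2, b_n2=P2.shape[0]):
    """Pr(reported bins (j1, j2) | true bins (i1, i2)) looked up in the flattened matrix."""
    return P12[i1 * b_n2 + i2, j1 * b_n2 + j2]

print(P12.shape)            # (6, 6)
print(joint(0, 2, 1, 0))    # equals P1[0, 1] * P2[2, 0]
```

The flattened matrix can then be passed to the same iterative Bayes' routine as in the single-task case, and the resulting estimate vector reshaped to $b_{n1} \times b_{n2}$ to recover the two-dimensional histogram.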

Participatory sensing is growing in popularity. Differential privacy can protect a user's privacy by adding noise to a target value that must be protected. However, in participatory sensing scenarios, the target value contains sensing errors. Because existing studies do not consider the sensing errors, the accuracy of the data analysis decreases when the sensing data contain errors. In this paper, therefore, we proposed an architecture that addresses the sensing errors contained in the sensed value. The true data might be unknown to the participants; however, our proposal estimated the participants' true data distribution with higher accuracy than existing methods by modeling the sensing error.

The proposed architecture consists of two parts. One is the anonymization technique for each participant's side (PDE). Each device perturbs its sensed data and then reports the perturbed data to the data collector. The proposed architecture also provides an estimation technique, which estimates the true data distribution based on the reported data for the data collector's side (ETE). We have proved that the PDE satisfies differential privacy. We showed that the accuracy of ETE outperformed existing studies in our experiments. Further, the calculation time of PDE with a normal smartphone was less than 1 s. Therefore, participants do not need to worry about the battery life of their smartphones.

In this paper, we target numerical sensing data. However, images could also be sent directly to the data collector. In recent years, several methods of protecting images based on differential privacy have been proposed [75]. We will apply our proposal to such data in our future work.

The authors declare no conflicts of interest.

Sensing meets mobile social networks

Accuracy Enhancement of Anomaly Localization with Participatory Sensing Vehicles

User participatory construction of open hazard data for preventing bicycle accidents

Do Monetary Incentives Influence Users' Behavior in Participatory Sensing? Sensors

Sensing the city with Instagram: Clustering geolocated data for outlier detection

INRISCO: INcident monitoRing In Smart COmmunities

Discrete Distribution Estimation under Local Privacy

Differential Private Data Collection and Analysis Based on Randomized Multiple Dummies for Untrusted Mobile Crowdsensing

Calibrating Noise to Sensitivity in Private Data Analysis

The Algorithmic Foundations of Differential Privacy

Generalized Gaussian Mechanism for Differential Privacy

High-dimensional crowdsourced data publication with local differential privacy

Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning

Towards Accurate Histogram Publication under Differential Privacy

Road Traffic Congestion Monitoring in Social Media with Hinge-Loss Markov Random Fields

Generalized Outlier Detection with Flexible Kernel Density Estimates

Why are Normal Distributions Normal? Br

Angle of arrival localization for wireless sensor networks

Effects of core position uncertainty on optical shape sensor accuracy

Sonar sensor models and their application to mobile robot localization

Minimizing uncertainty and improving accuracy when fusing multiple stationary GPS receivers

Evaluation of pose tracking accuracy in the first and second generations of microsoft Kinect

Permissible Area Analyses of Measurement Errors with Required Fault Diagnosability Performance

Research on Rotational Angle Measurement for the Smart Wheel Force Sensor

Performance Assessment of an Ultra Low-Cost Inertial Measurement Unit for Ground Vehicle Navigation

Adaptive Placement for Mobile Sensors in Spatial Prediction under Locational Errors

State estimation in distribution smart grids using autoencoders

Pedestrian Stride-Length Estimation Based on LSTM and Denoising Autoencoders

Efficient Discrimination and Localization of Multimodal Remote Sensing Images Using CNN-Based Prediction of Localization Uncertainty

Discovering frequent patterns in sensitive data

Differentially private transit data publication

Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees

Proceedings of the ICALP

What Can We Learn Privately?

Efficient and privacy-aware data aggregation in mobile sensing

EPPA: An Efficient and Privacy-Preserving Aggregation Scheme for Secure Smart Grid Communications

A privacy-preserving data aggregation scheme for dynamic groups in fog computing

A Framework for High-Accuracy Privacy-Preserving Mining

FRAPP: A framework for high-accuracy privacy-preserving mining

Density-Based Location Preservation for Mobile Crowdsensing with Differential Privacy

PLP: Protecting Location Privacy Against Correlation Analyze Attack in Crowdsensing

A Differential Game Model for Data Utility and Privacy-Preserving in Mobile Crowdsensing

Efficient Privacy-Preserving Aggregation for Mobile Crowdsensing

Incentivizing Crowdsensing-based Noise Monitoring with Differentially-Private Locations

Achieving k-anonymity privacy protection using generalization and suppression

An incentive mechanism for K-anonymity in LBS privacy protection based on credit mechanism

Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion

l-diversity: Privacy beyond k-anonymity

Privacy beyond k-anonymity and l-diversity

A greedy-proof incentive-compatible mechanism for group recruitment in mobile crowd sensing

MRA: A modified reverse auction based framework for incentive mechanisms in mobile crowdsensing systems

A Context-Aware Multiarmed Bandit Incentive Mechanism for Mobile Crowd Sensing Systems

A Misbehaving-Proof Game Theoretical Selection Approach for Mobile Crowd Sourcing

FIDC: A framework for improving data credibility in mobile crowdsensing

Quick and Accurate False Data Detection in Mobile Crowd Sensing

Privacy-Preserving Crowdsensing: Privacy Valuation, Network Effect, and Profit Maximization

A Survey on Mobile Crowdsensing Systems: Challenges, Solutions, and Opportunities

Data-Oriented Mobile Crowdsensing: A Comprehensive Survey

Quantifying user reputation scores, data trustworthiness, and user incentives in mobile crowd-sensing

Intelligent Gaming for Mobile Crowd-Sensing Participants to Acquire Trustworthy Big Data in the Internet of Things

A Secure Mobile Crowdsensing Game with Deep Reinforcement Learning

Privacy preserving OLAP

Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application

Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12. arXiv 2017

EffSense: A Novel Mobile Crowd-Sensing Framework for Energy-Efficient and Cost-Effective Data Uploading

Deep learning for sensor-based activity recognition: A survey

Convolutional Neural Networks for human activity recognition using mobile sensors

Deep Regression Forests for Age Estimation

Using Ranking-CNN for Age Estimation

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges

Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China

DEX: Deep EXpectation of apparent age from a single image

Rectified linear units improve restricted boltzmann machines

Image pixelization with differential privacy