title: How good is good enough for COVID19 apps? The influence of benefits, accuracy, and privacy on willingness to adopt
authors: Kaptchuk, Gabriel; Goldstein, Daniel G.; Hargittai, Eszter; Hofman, Jake; Redmiles, Elissa M.
date: 2020-05-09

A growing number of contact tracing apps are being developed to complement manual contact tracing. A key question is whether users will be willing to adopt these contact tracing apps. In this work, we survey over 4,500 Americans to evaluate (1) the effect of both accuracy and privacy concerns on reported willingness to install COVID19 contact tracing apps and (2) how different groups of users weight accuracy vs. privacy. Drawing on our findings from these first two research questions, we (3) quantitatively model how the amount of public health benefit (reduction in infection rate), amount of individual benefit (true-positive detection of exposures to COVID19), and degree of privacy risk in a hypothetical contact tracing app may influence Americans' willingness to install. Our work takes a descriptive ethics approach toward offering implications for the development of policy and app designs related to COVID19.

A growing number of coronavirus (COVID19) contact tracing apps are being developed and released with the goal of tracking and reducing the spread of COVID19 [19]. These apps are designed to complement manual contact tracing efforts, using location data or Bluetooth communication to automatically detect whether a user may have been exposed to the virus [2, 18]. Unlike manual contact tracing, in which an investigator reaches out directly to affected parties, the benefits of these apps for public health scale quadratically with participation [2], because both data collection and data distribution are part of the app's operation. Thus, it is critical to understand the factors that determine whether people will be willing to adopt these apps.

A large number of considerations may influence users' willingness to adopt [12]. For example, a person may weigh the features the app offers, the app's benefits to themselves and their community [17], the provider offering the app [7], how well the app will preserve the user's privacy [2, 18], and the app's accuracy. Understanding the impact of each of these factors can help app developers make design decisions that maximize their impact. Drawing from the idea of descriptive ethics as a fairer approach to setting societal norms [4, 5], in this work we use surveys to evaluate how well COVID19 apps need to function for users to be willing to adopt them. We present the results of a series of surveys of a total of 4,615 Americans, sampled using both crowd-sourcing and online survey panels that satisfy census-representative demographic quotas.

There are many ways to measure how well a COVID19 app works. From a public health perspective, reducing the infection rate (i.e., the basic reproduction number) is a key measure of success. However, in order to understand individuals' choices when it comes to adopting an app, we must also consider what it means for a COVID19 app to work well for the individual who has it installed. Thus, we examine not only the societal-level public health benefit of infection rate reduction, but also how app accuracy and app privacy (the risk of the app exposing the information it collects to others) influence reported willingness to adopt a COVID19 app.
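To make the quadratic-scaling point concrete, the following back-of-the-envelope sketch is our own illustration, not a model taken from [2]: if an exposure event is detectable only when both parties carry the app, then with adoption rate p the detectable fraction of exposures grows as p², further discounted by any per-contact sensing error such as the Bluetooth error rates discussed below.

```python
# Toy model (illustration only): fraction of exposure events a contact
# tracing app can detect. An exposure is detectable only if *both*
# parties have the app installed, hence the quadratic term; per-contact
# sensing error (e.g., Bluetooth proximity error) discounts it further.

def detectable_exposure_fraction(adoption_rate: float, sensing_error: float = 0.0) -> float:
    """Expected fraction of exposure events the app detects.

    adoption_rate: fraction of the population with the app installed (0-1).
    sensing_error: per-contact probability that a true proximity event
        goes unregistered (0-1).
    """
    return adoption_rate ** 2 * (1.0 - sensing_error)

if __name__ == "__main__":
    for p in (0.2, 0.4, 0.6, 0.8):
        # 10% sensing error, within the 7-15% range estimated for Bluetooth
        frac = detectable_exposure_fraction(p, sensing_error=0.10)
        print(f"adoption {p:.0%}: ~{frac:.1%} of exposures detectable")
```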
Within app accuracy, we consider both false negatives, the app failing to detect an exposure to COVID19, and false positives, the app falsely notifying the user that they were exposed when they were not. Understanding how rates of app failure may influence adoption allows us to estimate user response to potential app designs. For example, Saxena et al. [16] have shown that using Bluetooth as a method for detecting proximity may be innately error prone, estimating an approximate error rate of 7-15% (including both false positive and false negative rates). Our results allow us to estimate user response to such error rates.

In summary, we address the following three research questions:

(RQ1) Do both accuracy (precision and recall) and privacy influence whether people want to install a COVID19 app?
(RQ2) Do different types of people weight accuracy or privacy more heavily?
(RQ3) How much public health benefit, accuracy, and/or privacy is necessary for people to want to adopt COVID19 apps?

We find that:
• Between 70-80% of Americans report being willing to install an app that is "perfectly" private and/or accurate, a significant increase from the 50-60% who are willing to install an app with unspecified privacy or accuracy [7, 20].
• False negatives have a significantly stronger influence on reported willingness than false positives or privacy risks.
• Reported willingness to install correlates with the public health benefit and/or personal health benefit of a contact tracing app. Specifically, the majority of Americans report being willing to install an app that offers at least a 50% improvement in public health or in personal safety over the baseline rate offered when not using the app.

We conducted a series of surveys to answer our research questions. In this section we discuss our questionnaires, questionnaire validation, sampling approaches, analysis approaches, and the limitations of our work. All studies were approved by the Microsoft Research IRB, a federally recognized ethics review board.

In this first survey we sought to understand how accuracy and/or privacy considerations might influence willingness to adopt (RQ1) and how respondent demographics and experiences might affect the relative weight of these considerations (RQ2). We used a vignette survey [1] to examine these questions, as vignette surveys are known to maximize external validity.

Questionnaire. Our questions were framed around a contact-tracing app scenario. Half of the respondents were placed in the proximity contact tracing scenario, while the other half were placed in the location scenario. The proximity scenario was phrased as:

Imagine that there is a mobile phone app intended to help combat the coronavirus. This app will collect information about who you have been near (within 6 feet), without revealing their identities. The app will use this information to alert you if you have been near someone who was diagnosed with coronavirus. If you decide to inform the app that you have been diagnosed with coronavirus, the app will inform those you've been near that they are at risk, without revealing your identity.

The location scenario was phrased as:

Imagine that there is a mobile phone app intended to help combat the coronavirus. This app will collect information about your location.
The app will use this information to notify you, without revealing anyone's identity:
• if you have been near someone who tested positive for coronavirus
• about locations near you that were recently visited by people who tested positive for coronavirus

If you decide to report to the app that you have been diagnosed with coronavirus, the app will inform those you've been near that they are at risk, without revealing your identity.

Participants were then routed to a set of control questions or a set of experimental questions regarding accuracy and privacy (in randomized order). All participants were asked "Would you install this app?" after a given question, with answer choices "Yes", "No", and "It depends on the [risk, chance of information being revealed, etc.]".

Control. We had three control conditions (respondents saw only one of these three questions).

Perfect accuracy: Imagine that this app will work perfectly. It will never fail to notify you when you are at risk, nor will it ever incorrectly notify you when you are not at risk.

Perfect privacy: Imagine that this app perfectly protects your privacy. It will never reveal any information about you to the US government, to a tech company, to your employer, or to anyone else.

Perfect accuracy and privacy: Imagine that this app works perfectly and protects your privacy perfectly. It will never fail to notify you when you are at risk, nor will it ever incorrectly notify you when you are not at risk. It will also never reveal any information about you to the US government, to a tech company, to your employer, or to anyone else.

Experimental. These participants were asked about accuracy and privacy, in randomized order.

Accuracy (false negatives): Imagine that this app occasionally fails to notify you when you have been near someone who was diagnosed with coronavirus.

Accuracy (false positives): Imagine that this app occasionally notifies you that you have been near someone who has coronavirus when you actually have not been exposed.

Privacy: Imagine that this app might reveal information about [who you have been near/your location] to [entity]. This information may be used for a purpose unrelated to the fight against coronavirus. We asked about four entities, drawn from the list of 10 examined by Redmiles and Hargittai [7]: "non-profit institutions verified by the government", "technology companies", "the US government", and "your employer".

Validation. The questionnaire design was validated through expert reviews with multiple external researchers. Additionally, three attention check questions were included: one general attention check and two scenario-specific attention checks that ensured respondents understood the scenarios described.

Sample. 789 Americans answered our survey. The sample was quota sampled by Cint to be representative of the US population demographics on age, gender, income, and race.

Analysis. We answered RQ1 using χ² proportion tests to compare responses to our different sets of questions. We answered RQ2 by constructing two mixed effects binomial logistic regression models. In both models, our dependent variable was willingness to install the app, with "Yes" and "It depends on the risk" grouped together as a positive outcome and "No" treated as a negative outcome. We model responses to the accuracy and privacy questions separately, controlling for data type and entity in the privacy model, and for data type and accuracy type in the accuracy model. We included as independent variables the respondents' age, gender, race, internet skill (as measured using the Web Use Skill Index [6]), level of educational attainment, party affiliation, and whether they know someone who died due to complications from COVID19. Finally, we include a mixed effects term to account for our within-subjects design.
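As a concrete sketch of this analysis pipeline, the snippet below shows how the two steps could be run in Python with statsmodels. The counts, data file, and column names are hypothetical placeholders, and statsmodels offers only a Bayesian mixed-effects logistic model (a common frequentist alternative is R's lme4::glmer), so this is an illustration of the approach rather than our actual analysis code.

```python
# Hypothetical illustration of the Survey 1 analysis; counts, file, and
# column names are placeholders, not the study's data or code.
import pandas as pd
from statsmodels.stats.proportion import proportions_chisquare
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# RQ1: chi-squared proportion test between two conditions, e.g. willingness
# under "perfect accuracy" (412/520, hypothetical) vs. false negatives (301/515).
chi2, p, _ = proportions_chisquare(count=[412, 301], nobs=[520, 515])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # then apply multiple-testing correction

# RQ2: mixed effects binomial logistic regression, with a respondent-level
# random effect standing in for the within-subjects term.
df = pd.read_csv("survey1_responses.csv")  # hypothetical long-format responses
model = BinomialBayesMixedGLM.from_formula(
    "install ~ question_type + data_type + age + gender + internet_skill",
    vc_formulas={"respondent": "0 + C(respondent_id)"},
    data=df,
)
print(model.fit_vb().summary())
```

A Holm-style step-down correction of the χ² p-values, as applied throughout our results, can be reproduced with statsmodels.stats.multitest.multipletests(pvals, method="holm").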
In this survey we sought to evaluate how people's reported willingness to install coronavirus apps correlates with the amount of public health benefit (infection rate reduction) and individual health benefit (notification of at-risk status, i.e., accuracy) of a hypothetical coronavirus tracking app.

Questionnaire. All questions, except one control condition (the FN app control, addressed below), were asked in the context of the following scenario. As the type of information compromised, as well as the entity that could compromise the information, had relatively little effect on willingness to install in our first survey (see Section 3), we consider only proximity-based data in this scenario. Future work may wish to replicate these results for location information.

Please consider the following scenario. Imagine that public health workers will notify you if they are able to determine that you have recently been near (within 6 feet) someone who was diagnosed with coronavirus.
• You do not have to do anything in order for the public health workers to monitor whether you have recently been near someone diagnosed with coronavirus.
• However, the public health workers are not aware of every time you are near someone diagnosed with coronavirus.

Imagine that there is also a mobile phone app available that will alert you if you have been within 6 feet of someone diagnosed with coronavirus.
• The app will do this by collecting information about who you have been within 6 feet of (who you have been "near").
• The app will not reveal the identity of the people you have been near.

Participants were then assigned to one of the branches in Table 2. No information in this survey was expressed in terms of percentages, due to a plethora of research in health risk and numeracy showing that people interpret rates far more accurately than percentages [9, 15]. Below we describe exactly how each of the questions referenced in Table 2 was asked in our survey.

Implicit privacy. Pilot tests of our survey revealed that people had implicit privacy perceptions of the app described, which influenced their willingness to adopt the apps. We used a modified version of the Paling Perspective scale [10], a well-validated tool for eliciting health risk perception, to assess respondents' perception of the likelihood that information collected by this app would be compromised. This measurement allows us to (a) report on people's perceptions of the likelihood that information from a coronavirus app will be compromised, (b) control for the effect of differing implicit privacy perceptions on willingness to install, and (c) validate the influence of these privacy perceptions by comparing willingness to install given an implicit privacy perception vs. an explicit one that we set by telling the participant the risk that their privacy will be compromised (described in the next section). The question we used to assess implicit privacy belief was:

Studies show that despite best attempts to protect the data of those who use this app, some people may have information about who they have been near compromised and used for purposes other than the fight against coronavirus. Please indicate on the chart below how many app users you think will have this information compromised over the next year.
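For analysis, responses to such a chart-based scale are typically converted to a numeric risk value; the sketch below shows one plausible encoding on a log scale. The specific response buckets are our assumption for illustration only; the modified scale we administered may differ.

```python
# Illustrative encoding of Paling-Perspective-style responses ("X out of N
# app users affected per year") as log10 probabilities for use as a
# regression control. Bucket values are assumptions, not the exact scale.
import math

PALING_BUCKETS = {
    "1 in 100": 1e-2,
    "1 in 1,000": 1e-3,
    "1 in 10,000": 1e-4,      # 0.01%
    "1 in 100,000": 1e-5,     # 0.001%
    "1 in 1,000,000": 1e-6,
}

def log10_risk(label: str) -> float:
    """Map a scale response to log10(probability), giving a roughly
    linear covariate for the willingness-to-install regressions."""
    return math.log10(PALING_BUCKETS[label])

print(log10_risk("1 in 10,000"))  # -4.0
```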
Explicit privacy. In order to understand how different privacy risks impacted respondents' reported willingness to install coronavirus apps, we asked some participants about their willingness to install in the context of explicitly known (as opposed to implicitly perceived, as above) privacy risks. We asked about explicit risk using the following question:

Studies show that despite best attempts to protect the data of those who use this app, some people may have information about who they have been near compromised and used for purposes other than the fight against coronavirus. X out of 1000 people who use this app will have this information compromised.

We also asked all participants in this branch the false negative question (below) in order to be able to cross-validate the impact of an explicit declaration of privacy risk vs. the effect of an implicit perception of that risk on willingness to install. Because it would make little sense to assess implicit risk and then ask respondents whether they would install given their own implicit perception of that risk, privacy questions needed to be paired with a benefit question. We chose to make our comparison using the false negative questions, since the results of Survey 1 showed that false negatives were at least as important as privacy in users' consideration of whether to install.

False negative. We asked respondents whether they would be willing to install an app that could detect FN out of 100 exposures to coronavirus, compared to manual contact tracing, which could detect exposures a baseline number of times: 1 out of 100. The question was phrased as follows:

Imagine that you are exposed to someone who has coronavirus 100 times over the next year. If you do not use the app, 1 out of 100 times public health workers will be able to detect and notify you that you were exposed. If you use the app, FN out of 100 times the app will be able to detect and notify you that you were exposed.

To compare against a 1%-effective app as a baseline, we also had an FN control condition. This condition consisted of a scenario that did not describe manual contact tracing, but just described the app (in the same way as above), which respondents were told could detect 1 in 100 exposures (the same rate as the manual contact tracing offered in the other conditions).

False positive. We asked respondents whether they would be willing to install an app that detected all exposures to coronavirus, but raised FP out of 100 additional false positives. The question was phrased as follows:

Imagine that you are exposed to someone who has coronavirus 100 times over the next year. If you do not use the app, 1 out of 100 times public health workers will be able to detect and notify you that you were exposed. The app is not perfect. If you use the app, the app will correctly notify you every time that you were exposed (100 out of 100 times). The app will also incorrectly notify you an additional FP times, when you were not actually exposed.

Public health benefit. Finally, some respondents were assigned to a branch that evaluated how a reduction in infection rate among app users would influence people's willingness to install an app. We chose 3% as the baseline infection rate without app use, as this is the U.S. infection rate currently estimated by the IHME [3]. The question was phrased as follows:

Studies show that 30 out of 1000 people who do not use the app will be infected with coronavirus in the next year. H out of 1000 people who use the app will be infected with coronavirus in the next year.
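To connect this rate-based framing to the relative improvements discussed in our findings, the helper below (our illustration, not study code) converts the "H out of 1000" framing into a fractional reduction over the 30-in-1000 baseline.

```python
# Illustration (not study code): relative public health benefit implied by
# the "H out of 1000 app users infected" framing, against the 30-in-1000
# (3%) baseline infection rate used in the survey.

BASELINE_PER_1000 = 30  # infections per 1000 non-users per year

def relative_reduction(h_per_1000: float) -> float:
    """Fractional reduction in infection rate relative to the baseline."""
    return (BASELINE_PER_1000 - h_per_1000) / BASELINE_PER_1000

for h in (25, 20, 15, 10):
    print(f"H = {h}/1000 -> {relative_reduction(h):.0%} reduction")
# H = 15 corresponds to the roughly 50% improvement at which a majority of
# respondents reported being willing to install (see Findings).
```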
Validation. The questionnaire design was validated through expert reviews with multiple external researchers. Additionally, three attention check questions were included, one general and two scenario-specific, as in Survey 1.

Sample. 3,826 Amazon Mechanical Turk workers responded to our survey. These workers were split across the different survey branches described above, so each results section notes the number of respondents used in a particular analysis.

Analysis. We analyze the data obtained in this survey descriptively, through data visualization, and using binomial logistic regression, with willingness to install as the dependent variable and, as independent variables, the varied factor (e.g., chance of FN) and perceived implicit privacy risk. To evaluate the impact of privacy on decision making, we use a χ² proportion test to compare the proportion of respondents willing to install given some FN rate in the implicit and explicit privacy conditions.

As with all surveys, the answers represented in these results are people's self-reported intentions regarding how they would behave. As shown in prior literature on security, such intentions are likely to align directionally with actual behavior, but to overestimate it [14]. The goal of this work is to show how willingness to adopt may be influenced by privacy and accuracy considerations, and thus the precise numeric estimates should not be interpreted as precise adoption estimates. Additionally, regarding the RQ3 survey, there are always concerns about the generalizability of crowdsourced results. To address these concerns, we also conducted the RQ1/RQ2 survey on Amazon Mechanical Turk. We found only one significant difference (with small effect size) between the AMT results and the online survey panel results. Given the quantitative nature of the RQ3 survey and the sample size required, and given our confidence in the relative representativeness of AMT results on this particular topic (verified by our RQ1/RQ2 comparison, as well as by prior work on the generalizability of AMT results in security and privacy surveys [13]), we chose to proceed with AMT for RQ3.

In this section, we detail our findings. For those who prefer a swifter visual summary, please see http://www.cs.umd.edu/~eredmiles/how-good-good-enough.pdf.

The results of our first survey, shown in Figure 1, illustrate that both accuracy and privacy do indeed significantly influence reported willingness to install (χ² tests in comparison to the control conditions, p < 0.05, Bonferroni-Holm multiple testing correction (BH correction)). We find that respondents did not significantly differentiate between perfect privacy vs. perfect accuracy (χ² prop. test, p=0.178, BH correction), perfect accuracy vs. both perfect accuracy and privacy (χ² prop. test, p=0.670, BH correction), or perfect privacy vs. both perfect accuracy and privacy (χ² prop. test, p=0.069, BH correction). On the other hand, respondents were 8% less likely to install an app with false negatives, regardless of the FN rate, than one with false positives, regardless of the FP rate (χ² prop. test, p=0.003, BH correction). Respondents were similarly less likely to install an app with false negatives, regardless of the FN rate, than one with privacy leaks to any of the entities examined (χ² prop. test, p=0.003, BH correction).
Respondents were equally as likely to install an app with false positives as one with privacy leaks (χ² prop. test, p=0.609, BH correction). Respondents were more likely to say that their decision to install would depend on the risk of false positives (31%) or false negatives (32%) than on the risk of a privacy leak (17% across entities). Finally, respondents' reported willingness to install did not significantly differ (χ² prop. tests, p>0.05, BH correction) based on what data the app might leak to a particular entity, except for hypothetical leaks to the respondent's employer (Figure 2). Only 23% of respondents were willing to install an app that might leak their location to their employer, while 31% were willing to install an app that might leak information about who they have been near (their proximity data) to their employer. The next section provides regression comparisons of willingness to install based on the entity to which the information was leaked, and also controls for data type differences (finding no significant differences).

In order to examine whether some Americans weighed accuracy or privacy considerations more heavily than others, we constructed two mixed effects logistic regression models as described in Section 2.

Table 2: Mixed effects logistic regression model of willingness to install apps with accuracy errors. Question baseline is FN, data baseline is location, political leaning baseline is Republican; the mixed effects term controls for the within-subjects design.

We find that those who know someone who died from COVID19 are over 5× as likely as those who do not to be willing to install an app that has errors in accuracy. Additionally, we validate that, even when controlling for demographic variance, respondents are more comfortable with false positives than false negatives: respondents are 65% more likely to report that they would install an app with false positives than one with false negatives. Respondents were more comfortable installing an app with potential privacy leaks to a non-profit organization verified by the government than an app with potential leaks to any other entity (their employer, a technology company, or the U.S. government). Those who identify as Democrats are nearly 3× as likely as those who identify as Republicans to be willing to install an app with privacy risks. Finally, younger respondents and women are less likely to report that they would install an app with privacy risks. This gender finding aligns with past work showing that women may be more privacy sensitive than men [8, 11]. Those who have higher internet skill are more willing to install an app that has either errors in accuracy or privacy leaks, likely because those with higher skill are more likely to install COVID19 apps in general [7].

In the findings above, we validate that the individual considerations of accuracy and privacy both impact reported willingness to install. In our second survey we examine whether we can model how the quantitative amount of public health benefit (i.e., infection rate reduction) and individual benefit (i.e., FN and FP rates) influences willingness to install. Figure 3 provides an overview of these findings. To examine the relationship between amount of benefit and willingness to install beyond visual inspection, we construct logistic regression models.
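A minimal sketch of how such a model might be fit and read off as odds ratios follows; the data file and column names are hypothetical placeholders, not the study's code.

```python
# Hypothetical sketch: binomial logistic regression of willingness to
# install on the varied benefit factor plus implicit privacy perception.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey2_responses.csv")  # placeholder data file
# install: 0/1 outcome; benefit: e.g., percentage-point infection rate
# reduction offered; log_privacy_risk: log10 of the implicitly perceived
# probability of a privacy leak.
fit = smf.logit("install ~ benefit + log_privacy_risk", data=df).fit()

# Exponentiated coefficients are odds ratios: a finding such as "5% more
# likely to install per 1% infection rate reduction" corresponds to an
# odds ratio of roughly 1.05 on the benefit term.
print(np.exp(fit.params))
print(np.exp(fit.conf_int()))
```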
We find that, for every 1% reduction in infection rate offered by the app, respondents are 5% more likely to report that they would install.

3.4 Implicitly perceived risk of privacy leak in COVID19 apps influences willingness to adopt; risk of COVID19 app privacy leak perceived by respondents as between 0.01% and 0.001%

In our second survey, we not only measured willingness to install based on amount of benefit, but also measured implicit privacy risk perception. In this section we briefly summarize how likely respondents thought it was that information from a COVID19 contact tracing app would be leaked, and we confirm the result of Survey 1: privacy risk, even when unmentioned, influences willingness to adopt a COVID19 app. Figure 4 summarizes respondents' implicit perceptions of the risk of a privacy leak of COVID19 app information in the next year. The median respondent (n=1,610) perceived the risk of a privacy leak of app information in the next year as between 0.01% and 0.001%, equivalent to the annual risk of an American having unattended property stolen. 82% of respondents perceived the risk as between 0.1% and 0.00001%.

Finally, we compare the proportion of respondents who were willing to install a COVID19 app given an explicit statement of privacy risk (privacy risks were drawn from the portion of the implicit risk distribution reported by the majority of respondents) vs. their own implicit perception. We find no significant difference between the proportion of respondents who were willing to install an app with a given false negative rate when relying on their own implicit privacy assumption and the proportion who were willing to install given an explicit statement of the risk of a privacy leak. A regression model of willingness to install in the explicit condition finds a significant relationship between risk perception and willingness to install (O.R.: 1.80, 95% CI: [1.31, 2.44], p < 0.001). This lends support to our implicit privacy risk measurements and suggests that these implicit risk perceptions affect willingness to install similarly to explicit risk statements. Finally, further confirming the relevance of all three components studied in this paper (benefits, accuracy, and privacy) in users' consideration of whether to install, when we add implicit privacy risk as a factor to the regression models for willingness to install dependent on public health and individual benefit, we find that it is significant in all three models.

References

[1] Experimental vignette studies in survey research.
[2] Privacy sensitive protocols and mechanisms for mobile contact tracing.
[3] Infection fatality rate: a critical missing piece for managing COVID-19.
[4] The definition of morality.
[5] Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction.
[6] An update on survey measures of web-oriented digital literacy.
[7] Will Americans be willing to install COVID-19 tracking apps? Scientific American Blog Network.
[8] Gender differences in privacy-related measures for young adult Facebook users.
[9] Effect of risk communication formats on risk perception depending on numeracy.
[10] Strategies to help patients understand risks.
[11] Net benefits: Digital inequities in social capital, privacy preservation, and digital parenting practices of US social media users.
[12] User concerns & tradeoffs in technology-facilitated contact tracing.
[13] How well do my results generalize? Comparing security and privacy survey results from MTurk, web, and telephone samples.
[14] Asking for a friend: Evaluating response biases in security user studies.
[15] To put that in perspective: Generating analogies that make numbers easier to understand.
[16] Smartphone-based automated contact tracing: Is it possible to balance privacy.
[17] COVID-19 contact tracing and privacy: Studying opinion and preferences.
[18] Decentralized privacy-preserving proximity tracing.
[19] A scramble for virus apps that do no harm. The New York Times.

Acknowledgments

With thanks to Eric Horvitz for the idea of investigating quantitative tradeoffs in public benefit, accuracy, and privacy. With thanks to Cormac Herley and Carmela Troncoso for survey feedback and general contact tracing conversations that contributed to this paper. This work was funded by Microsoft Research.