key: cord-0782569-6q9a9jz0
title: Population-scale Longitudinal Mapping of COVID-19 Symptoms, Behavior, and Testing
authors: Allen, William E.; Altae-Tran, Han; Briggs, James; Jin, Xin; McGee, Glen; Shi, Andy; Raghavan, Rumya; Kamariza, Mireille; Nova, Nicole; Pereta, Albert; Danford, Chris; Kamel, Amine; Gothe, Patrik; Milam, Evrhet; Aurambault, Jean; Primke, Thorben; Li, Weijie; Inkenbrandt, Josh; Huynh, Tuan; Chen, Evan; Lee, Christina; Croatto, Michael; Bentley, Helen; Lu, Wendy; Murray, Robert; Travassos, Mark; Coull, Brent A.; Openshaw, John; Greene, Casey S.; Shalem, Ophir; King, Gary; Probasco, Ryan; Cheng, David R.; Silbermann, Ben; Zhang, Feng; Lin, Xihong
date: 2020-08-26
journal: Nat Hum Behav
DOI: 10.1038/s41562-020-00944-2
sha: 461756bfd48cfbc75d96231ebf30c39a7478d255
doc_id: 782569
cord_uid: 6q9a9jz0

Despite the widespread implementation of public health measures, COVID-19 continues to spread in the United States. To facilitate an agile response to the pandemic, we developed How We Feel, a web and mobile application that collects longitudinal self-reported survey responses on health, behavior, and demographics. Here we report results from over 500,000 users in the United States from April 2, 2020 to May 12, 2020. We show that self-reported surveys can be used to build predictive models to identify likely COVID-19-positive individuals. We find evidence among our users for asymptomatic or presymptomatic presentation, show a variety of exposure, occupation, and demographic risk factors for COVID-19 beyond symptoms, reveal factors associated with which users have been PCR tested for SARS-CoV-2, and highlight the temporal dynamics of symptoms and self-isolation behavior. These results highlight the utility of collecting a diverse set of symptomatic, demographic, exposure, and behavioral self-reported data to fight the COVID-19 pandemic.

The rapid global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the novel virus causing coronavirus disease 2019 (COVID-19) 1-3, has created an unprecedented public health emergency. In the United States, efforts to slow the spread of disease have included, to varying extents, social distancing, home quarantine and treatment of infected patients, mandatory facial covering, closure of schools and non-essential businesses, and test-trace-isolate measures 4,5. The COVID-19 pandemic and ensuing response have produced a concurrent economic crisis of a scale not seen for nearly a century 6, exacerbating the effect of the pandemic on different socioeconomic groups and producing adverse health outcomes beyond COVID-19. As a result, there is currently intense pressure to safely wind down these measures. Yet, in spite of widespread lockdowns and social distancing throughout the US, many states continue to exhibit steady increases in the number of cases 7. In order to understand where and why the disease continues to spread, there is a pressing need for real-time, individual-level data on COVID-19 infections and tests, as well as on the behavior, exposure, and demographics of individuals at the population scale with granular location information. These data will allow medical professionals, public health officials, and policy makers to understand the effects of the pandemic on society, tailor intervention measures, efficiently allocate testing resources, and address disparities. One approach to collecting this type of data on a population scale is to use web- and mobile-phone-based surveys that enable large-scale collection of self-reported data.
Previous studies, such as FluNearYou, have demonstrated the potential for using online surveys for disease surveillance 8. Since the start of the COVID-19 pandemic, several different applications have been launched throughout the world to collect COVID-19 symptoms, testing, and contact-tracing information 9. Studies in the US and Canada (CovidNearYou 10,11), the UK (Covid Symptom Study 12,13, also deployed in the US), and Israel (PredictCorona 14) have reported large cohorts of users drawn from the general population, with the goal of capturing information about COVID-19 along a variety of dimensions, from symptoms to behavior, and have demonstrated some ability to detect and predict the spread of disease 12-14. This field has rapidly evolved since the beginning of the pandemic, with many analyses of these datasets focusing on COVID-19 diagnostics (i.e., symptoms, test results, medical background) 11, care-seeking 15, contact-tracing 16, patient care 17, effects on healthcare workers 18, hospital attendance 19, cancer 20, primary care 21, clinical symptoms 22, and triage 23. Here we perform a comprehensive analysis of a new source of COVID-19-related information spanning diagnostic and behavioral factors, sampled from the general population during the beginning of the pandemic in the United States. We consider exposure, demographic, and behavioral factors that affect the chain of transmission, examine the factors associated with who has been tested, and study the prevalence of asymptomatic, presymptomatic, and mildly symptomatic cases 24.

To address these needs and overcome the limitations of existing approaches, we developed How We Feel (HWF, http://www.howwefeel.org) (Fig. 1a-d), a web and mobile-phone application for collecting de-identified self-reported COVID-19-related data. Rather than targeting suspected COVID-19 patients or existing study cohorts, HWF aims to collect data from users representing the population at large. Because HWF draws from a large user base across the US that learns about the study through word of mouth and government partnerships, these results are complementary to other studies, such as the Covid Symptom Study and CovidNearYou, that also include sizable US populations and are targeted towards the general public. Users are asked to share information on demographics (gender, age, race/ethnicity, household structure, ZIP code), COVID-19 exposure, and pre-existing medical conditions. They then self-report daily how they feel (well or not well), any symptoms they may be experiencing, test results, behavior (e.g., use of face coverings), and sentiment (e.g., feeling safe to go to work) (Fig. 1c, Extended Data Fig. 1). To protect privacy, users are not identifiable beyond a randomly generated number that links repeated logins on the same device. A key feature of the app is the ability to rapidly release revised versions of the survey as the pandemic evolves. In the first month of operation, we released three iterations of the survey with increasingly expanded sets of questions (Fig. 1b).

We find that symptomatic subjects, health care workers, and essential workers are more likely to be tested. Due to asymptomatic and mildly symptomatic individuals and heterogeneous symptom presentation, our results show that commonly used symptoms may not be sufficient criteria for evaluating COVID-19 infection. Further, we find that exposure both outside and within the household is a major risk factor for users testing positive, and we build a predictive model to identify likely COVID-19-positive users.
African-American users, Hispanic/Latinx users, and health care workers and essential workers are at a higher risk of infection, after accounting for the effects of pre-existing medical conditions. Finally, we find that even at the height of lockdowns throughout the U.S., the majority of users were leaving their homes, and a large fraction were not engaging in social distancing or face protection.

The app was launched on April 2, 2020 in the United States. As of May 12, 2020, the app had 502,731 users in the United States, with 3,661,716 total responses (Fig. 1b) (Supplementary Table 1). 74% of users responded on multiple days, with an average of 7 responses per user (Extended Data Fig. 2). Each day, ~5% of users who accessed the app reported feeling unwell (Fig. 1b). The user base was distributed across all 50 states and several US territories, with the largest numbers of users in more populous states such as California, Texas, Florida, and New York (Fig. 1d). Connecticut had the largest number of users per capita, as the result of a partnership with the Connecticut state government (Fig. 1d). Users were required to be 18 years of age or older and were 42 years old on average (mean: 42.0; SD: 16.3), including 18.4% in the 60+ age bracket, which has experienced the highest mortality rate from COVID-19 (Fig. 1e) 25,26. Users were primarily female (82.7%) (Fig. 1f) and white (75.5%, excluding 20.3% with missing data) (Fig. 1g). Although the survey ran from April 2 through May 12, users could report test results from earlier than April 2.

A major ongoing problem in the US is the overall lack of testing across the country 27 and disparities in test accessibility, infection rates, and mortality rates in different regions and communities 28,29. In the absence of population-scale testing, it will be critical during a reopening to allocate limited testing resources to the groups or individuals most likely to be infected in order to track the spread of disease and break the chain of infection. We therefore first examined who in our user base is currently receiving testing. We analyzed 4,759 users who took the Version 3 (V3) survey and who were PCR tested for SARS-CoV-2 (out of 272,392 total users) (Fig. 2a, Extended Data Fig. 3a). Of these, 8.8% were PCR positive. The number of tests reported by test date displays a similar trend to the estimated number of tests across the US, suggesting that our sampling captures the increase in test availability (Fig. 2a). The number of PCR tests per HWF user is highly correlated with external estimates of per-capita tests by state (Fig. 2b, Extended Data Fig. 3b; Pearson correlation 0.77) 30.

We first examined, via logistic regression, which factors, either collected in the survey or inferred from US Census data by user ZIP code, were associated with receiving a SARS-CoV-2 PCR test, regardless of test result. As expected, we observed a higher fraction of tested users in states with higher per-capita test numbers, according to the COVID Tracking Project 30 (Extended Data Fig. 3b). Healthcare workers (OR: 2.94; 95% CI: [2.75, 3.15]; p<0.001) and other essential workers (OR: 1.39; 95% CI: [1.28, 1.52]; p<0.001) were more likely to have received a PCR test compared to users who did not report those professions (Fig. 2c). Users who reported experiencing fever, cough, or loss of taste/smell (among other symptoms) had higher odds of being tested compared to users who never reported these symptoms (Fig. 2c).
The majority of these symptoms are listed as common for COVID-19 cases by the Centers for Disease Control and Prevention (CDC) (Fig. 2c, starred) 31. A less common symptom, a tight feeling in one's chest, was also associated with receiving a PCR-based test (OR: 2.27, 95% CI: [1.93, 2.66]; p<0.001). These results suggest that the most commonly reported symptoms are being used as screening criteria for determining who receives a test, potentially missing asymptomatic and mildly symptomatic individuals. This group could include those who are at high risk for infection but do not meet the testing eligibility criteria.

To obtain a global view of self-reported symptom patterns, we applied an unsupervised manifold learning algorithm to visualize how symptoms were correlated across users (Methods). As expected, we found that symptom presentation separated broadly by feeling well versus feeling unwell (Fig. 2d, Extended Data Fig. 4). Users who felt unwell were concentrated in a single cluster, indicating similar overall symptom profiles; this cluster was characterized by high proportions of common COVID-19 symptoms as defined by the CDC 31 (Fig. 2e) and contained the vast majority of responses from users with both positive (+) and negative (−) SARS-CoV-2 PCR tests (Fig. 2f). Thus COVID-19 symptoms tend to overlap with symptoms of other diseases and do not necessarily predict positive test results. This overlap suggests that commonly used symptoms may not be sufficient criteria for evaluating COVID-19 infection.

It has previously been reported that many people infected with SARS-CoV-2 are asymptomatic, mildly symptomatic, or in the presymptomatic phase of their presentation 32-34 and are therefore unaware that they are infected. In our dataset, on the day of their test, most users (73%) who tested PCR positive for SARS-CoV-2 reported feeling unwell with the common symptoms listed by the CDC (dry cough, shortness of breath, chills/shaking, fever, muscle/joint pain, sore throat, loss of taste/smell). However, 11.5% of positive users reported feeling unwell but exclusively reported symptoms not listed as common for COVID-19 by the CDC on the day of their test, and 15.4% reported feeling no symptoms at all (Fig. 2g). Because of the commonly used symptom- and occupation-based screening criteria for receiving a PCR test, and because of under-testing, this total of 26.9% likely underestimates the true fraction of asymptomatic, presymptomatic, and mildly symptomatic cases, which in Wuhan, China was estimated to be ~87% 24, and in the US was estimated to be >80%. A large number of asymptomatic cases were also observed in serological studies 35,36. 48.9% of users testing negative for SARS-CoV-2 reported feeling unwell with the most common COVID-19 symptoms, compared to an expected false negative rate of 20-30% for PCR-based tests of symptomatic patients 37, again suggesting symptom presentation overlap with other diseases (Fig. 2g).

We investigated the symptoms that were most predictive of COVID-19 by exploring the distribution and dynamics of symptoms in PCR test (+) and (−) users around the test date. PCR test (+) users reported higher rates of common COVID-19 symptoms, including dry cough, fever, loss of appetite, and loss of taste and/or smell, than PCR test (−) users (Fig. 2h). Many PCR-tested users longitudinally reported symptoms in the app in an interval extending ±2 weeks from their test date (Extended Data Fig. 5).
We used these data to examine the time course of symptoms among those who tested positive. In the days preceding a test, dry cough, muscle pain, and nasal congestion were among the most commonly reported symptoms. Reported symptoms peaked in the week following a test and declined thereafter (Fig. 2i). Taking the ratio of the symptom rates at each point in time between PCR test (+) and (−) users showed that the most distinguishing feature in users who tested positive was loss of taste and/or smell, as has been previously reported 13 (Fig. 2j).

We next investigated medical and demographic factors associated with testing PCR positive for acute SARS-CoV-2 infection, focusing on 3,829 users who took the V3 survey within ±2 weeks of their reported PCR test date (315 positive, 3,514 negative) (Fig. 3a, Supplementary Tables 2-6). These users are a subset of all the users who reported taking a test in the V3 survey, as some reported test results were outside this time window. To correct for the selection bias of receiving a PCR test when studying the risk factors of a positive test result, we incorporated the probability of receiving a PCR test as inverse probability weights (IPW) in our logistic model of PCR test result status (+/−) (Methods) 38. As in the analysis of who received a test, among the reported symptoms, loss of taste and/or smell was most strongly associated with a positive test result (OR: 33). However, we note that this result is based on a small sample of 48 pregnant women included in this analysis (9 test-positive, 39 test-negative) and is unstable, subject to potentially high selection bias. Performing this analysis with and without correction for selection bias produced similar results (Fig. 3a). As a further sensitivity analysis, we reran the analyses excluding users from CA and CT, the states containing the most users (Extended Data Figure 7a), and correcting for broader demographic differences using US Census data (Extended Data Figure 7b), obtaining similar results to the uncorrected model in both cases. Finally, we performed Firth-corrected logistic regression to check for bias in our testing model related to the large fraction of users testing negative, and obtained similar results to our uncorrected model (Extended Data Figure 8).

Motivated by previous studies reporting that high cluster transmission occurred in families in China, Korea, and Japan 39-41, we explored household and community exposures as risk factors for users testing PCR positive. The odds of testing positive were much higher for those who reported within-household exposure to someone with confirmed COVID-19 than for those who reported no exposure at all (Methods) (OR: 19.10, 95% CI: [12.30, 30.51]; p<0.001) (Fig. 3a, Supplementary Table 5). This association is stronger than that for exposure outside the household versus no exposure at all (OR: 3.61, 95% CI: [2.54, 5.18]; p<0.001). Further, the odds of testing PCR positive were much higher for those exposed within the household versus those exposed outside their household or not exposed at all, after adjusting for similar factors (OR: 10.3, 95% CI: [6.7, 15.8]; p<0.001) (Supplementary Table 10). These results are consistent with previous findings that indicate a very high relative risk associated with within-household infection 40,42-45.
This is consistent with findings that other closed settings with high levels of congregation and close proximity, such as churches 46, food processing plants 47, and nursing homes 48, have shown similarly high risk of transmission.

Developing models to predict who is likely to be SARS-CoV-2(+) from self-reported data has been proposed as a means to help overcome testing limitations and identify disease hotspots 13,14. We used data from the 3,829 users who used the app within ±2 weeks of their reported PCR test results to develop a set of prediction models that were able to distinguish positive and negative results with a high degree of predictive accuracy on cross-validated data (Fig. 3b). We used the machine learning method XGBoost, which outperformed other classification methods (Extended Data Fig. 6). For each user, we predicted their test results using either data from before the test ("Pre-test"), which would be most useful in predicting COVID-19 cases in the absence of molecular testing, or data from both before and after the test ("All data"), as a benchmark for the best possible prediction we could make using all available data. We considered: (1) a symptoms-only model, which included only the most common COVID-19 symptoms listed by the CDC; (2) an expanded model, which further incorporated other features observed in the survey; and (3) a minimal-features model, which retained only the four most predictive features (loss of taste and/or smell, exposure to someone with COVID-19, exposure in the household to someone with confirmed COVID-19, and exposure to household members with COVID-19 symptoms) (Methods, Supplementary Tables 11-14). The symptoms-only model achieved a cross-validated AUC (area under the ROC curve) of 0.76 using data before and after a test, and an AUC of 0.69 using just the pre-test data. Expanding the set of features to include other survey questions substantially improved performance (cross-validated AUC 0.92 all data, 0.79 pre-test). In the minimal-features model, we were able to retain high accuracy (cross-validated AUC 0.87 all data, 0.80 pre-test) despite including only 4 questions, one of which was a symptom and three of which referred to potential contact with known infected individuals. Restricting the observed inputs to the 1,613 users (89 positive, 1,524 negative) who answered the survey in the 14 days prior to being tested limited the sample size and reduced the overall accuracy, but the relative performance of the models was similar (Fig. 3b).

The fact that a fraction of SARS-CoV-2(+) users report no symptoms or only less common symptoms (Fig. 2g) raises the possibility that many infected users might behave in ways that could spread disease, such as leaving home while unaware that they are infectious. In spite of widespread shelter-in-place orders during the sample period, we found extensive heterogeneity across the US in the fraction of users reporting leaving home each day, with 61% of the responses from April 24 to May 12 indicating the user had left home that day (Fig. 4a). The majority (77%) of these users reported leaving for non-work reasons, including exercising; 19% left for work (Fig. 4b). Of people who left home, a majority, but not all, reported social distancing and using face protection (Fig. 4c). Different states had persistently different levels of people wearing masks and leaving home (Extended Data Figure 9).
This incomplete shutdown with partial adherence and lack of total social and physical protective measures, coupled with insufficient isolation of infected cases, may contribute to continued disease spread. Given the large number of people leaving home each day, it is important to understand the behavior of people who are potentially infectious and therefore likely to spread SARS-CoV-2. To this end, we further analyzed the behavior of people reporting either PCR test (+) or (−) results. There was an abrupt, large increase in users reporting staying home after receiving a positive test result (Fig. 4d,e). Many, but not all, PCR test (+) users reported staying home in the 2-7 days after their test date (7% still went to work, N=14 out of 203 users), whereas 23% of untested users (N=62,483 out of 269,833) and 26% of PCR test (−) users (N=664 out of 2,533) left for work (Fig. 4d,e). Similarly, 3% of PCR test (+) users (N=7 out of 203) reported going to work without a mask, in contrast with untested (12.7%, N=34,481 out of 269,833) and PCR test (−) (10%, N=255 out of 2,533) users (Fig. 4f). Positive individuals reported coming into close contact with a median of 1 individual over 3 days, in contrast to individuals who tested negative or were untested, who typically came into close contact with a median of 4 people within 3 days (Fig. 4g). Regression analysis suggested that healthcare workers (OR: 9.3, 95% CI: [7.3, 11.8]; p<0.001) and other essential workers (OR: 6.8, 95% CI: [5.2, 8.9]; p<0.001) were much more likely to go to work after taking a positive or negative test, and PCR-positive users were more likely to stay home (OR: 0.1, 95% CI: [0.1, 0.2]; p<0.001) (Fig. 4h, Supplementary Table 15).

Using individual-level data collected from the How We Feel app, we showed that incorporating information beyond symptoms, in particular household and community exposure, is vital for identifying infected individuals from self-reported data. This finding is particularly important for risk assessment at the early stage of transmission, e.g., during the latent and presymptomatic periods when subjects have not yet developed symptoms, so that high-risk subjects can be prioritized for testing and quarantine and their close contacts can be traced, in order to block the transmission chain early on. Our results show that vulnerable groups include subjects with household and community exposure, health care workers and essential workers, and African-American and Hispanic/Latinx users. They are at higher risk of infection and should be prioritized for testing and protection. Our findings also show a significant racial disparity that persists after adjusting for the effects of pre-existing medical conditions, and that needs to be addressed. We find evidence among our users for several factors that could contribute to continued COVID-19 spread despite widespread implementation of public health measures. These include a substantial fraction of users leaving their homes on a daily basis across the US, users who report not self-isolating or who return to work after receiving a positive PCR test, self-reports of asymptomatic, mildly symptomatic, or presymptomatic presentation, and a much higher risk of infection for people with within-household exposure.

That said, we note several limitations of this study. The HWF user base is inherently a nonrandom sample of voluntary users of a smartphone app, and hence our results may not fully generalize to the broader US population.
In particular, the study may be subject to selection bias by not capturing populations without internet access, such as low-income or minority populations who may be at elevated risk, and by the over-representation of females. Our results are based on self-reported survey data and hence may suffer from misclassification bias, particularly the results based on self-reported behaviors. Moreover, a relatively small percentage of subjects received PCR testing. As shown in Figure 2, the subjects who were tested were more likely to be symptomatic, health care workers and essential workers, and people of color. Naïve regression analysis of test results using only the responses of subjects who were tested could therefore be subject to selection bias. To mitigate this, we attempted to correct for these selection biases via the inverse probability weighting approach, estimating the selection probability (the probability of receiving a test) using the observed covariates (Methods). Some residual bias may persist if there remain unobserved factors related to both underlying disease status and receiving a test, or if the selection model is misspecified. Furthermore, the HWF user base may not be representative of the broader US population. Although our regression analysis conditioned on a wide range of covariates in order to account for possible selection bias, if any unobserved factors associated with underlying disease status are also related to using the app (e.g., health literacy, access to the internet, or membership in particularly vulnerable groups such as low-income families), the results may be subject to additional selection bias.

Although there is enormous economic pressure on states, businesses, and individuals to return to work as quickly as possible, our findings highlight the ongoing importance of social distancing, mask wearing, large-scale testing of symptomatic, asymptomatic, and mildly symptomatic people, and potentially even more rigorous 'test-trace-isolate' approaches 49-52, as implemented in several states, such as Massachusetts, New York, New Jersey, and Connecticut, which have bent the infection curve. Applying predictive models on a population scale will be vitally important to provide an "early warning" system for timely detection of a second wave of infections in the US and for guiding an effective public policy response. As testing resources are expected to remain limited, HWF results could be used to identify which groups should be prioritized, or potentially to triage individuals for molecular testing based on predicted risk. HWF's integration of behavioral, symptom, exposure, and demographic data provides a powerful platform to address emerging problems in controlling infection chains, rapidly assist public health officials and governments with developing evidence-based guidelines in real time, and stop the spread of COVID-19.

The How We Feel application was approved as exempt by the Ethical & Independent Review Services LLP IRB (Study ID: 20049-01). The analysis of HWF data was also approved as exempt by the Harvard University Longwood Medical Area IRB (Protocol #: IRB20-0514) and the Broad Institute of MIT and Harvard IRB (Protocol #: EX-1653). Informed consent was obtained from all users and the data were collected in de-identified form.
Open-Source Software: We used the following open-source software in the analysis. The How We Feel application was developed in React Native (https://reactnative.dev/), using Google App Engine (https://cloud.google.com/appengine) and Google BigQuery (https://cloud.google.com/bigquery) for the backend, and launched on the Android and iOS platforms.

Users were identified only with a device-specific, randomly generated number. Users below the age of 18 were not allowed to use the application. If a user logged in multiple times in a day, only the first response was retained. We excluded any users who responded to a survey version on one day and then on a later day responded to an older survey version. We excluded any users who reported different genders on different days, and we excluded any observations with missing feeling, gender, or smoking history. Prior to survey version 3, users reported only whether or not they received a COVID-19 test, and we assumed that they received a PCR test. In survey version 3, users reported the type of test they received, and we excluded antibody tests from analyses.

Logistic regression: receiving a test (Fig. 2): For the analysis of who received a test, the outcome was 1 if a user reported a swab test and 0 otherwise. We fit a logistic regression model using demographics, professions, exposure, and symptoms, among other covariates. Time-varying measures (e.g., symptoms) were averaged over each user's V3 survey responses. Analysis was conducted with the statsmodels package (v0.11.1) in Python. We report the log odds ratios and odds ratios, along with corresponding 95% confidence intervals. Supplementary Table 3 lists the covariates used in the selection (who received a test) regression model, as well as the estimated coefficients, 95% confidence intervals, and p-values.

(Fig. 2d-f)

Asymptomatic analysis (Fig. 2g): Each reported symptom was categorized as either a CDC symptom or a non-CDC symptom. CDC symptoms were dry cough, shortness of breath, chills/shaking, fever, muscle/joint pain, sore throat, and loss of taste/smell, reported by users feeling either well or unwell. Non-CDC symptoms were any reported symptoms not listed by the CDC, including abdominal pain, confusion, diarrhea, facial numbness, headache, irregular heartbeat, loss of appetite, nasal congestion, nausea/vomiting, tinnitus, wet cough, runny nose, etc. We restricted the analysis to the subset of users for whom we observed symptom data on their test date. Each user who tested positive or negative was categorized into one of three groups: {CDC symptoms, Non-CDC symptoms, Asymptomatic}. Participants were grouped into CDC symptoms if they reported any CDC symptoms; participants who reported only non-CDC symptoms were grouped into the Non-CDC symptoms category; and participants were considered asymptomatic if they reported none of the above symptoms. Proportions were reported and graphically represented for each group in Figure 2g.

(Fig. 2h-j)

Logistic regression: test results (Fig. 3a): A large number of risk factor survey questions were added in V3 of the survey, so we restricted the analysis to V3 survey data for the purposes of identifying risk factors associated with SARS-CoV-2(+) test results. User responses were selected using a symmetric 28-day window around the last reported COVID-19 swab test date for any given user.
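The logistic regressions described here (both who received a test and, below, who tested positive) report log odds ratios, odds ratios, and 95% confidence intervals. The following is a minimal, self-contained sketch of that workflow with statsmodels; the data are synthetic and the column names are hypothetical placeholders rather than the actual HWF schema.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-in for the per-user analysis table; in the real analysis,
# time-varying measures (e.g. symptoms) are averaged over each user's V3 responses.
df = pd.DataFrame({
    "healthcare_worker": rng.binomial(1, 0.10, n),
    "essential_worker": rng.binomial(1, 0.15, n),
    "fever": rng.binomial(1, 0.05, n),
    "loss_of_taste_smell": rng.binomial(1, 0.02, n),
})
logit_p = -3 + 1.1 * df["healthcare_worker"] + 0.3 * df["essential_worker"] + 1.5 * df["fever"]
df["received_pcr_test"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

covariates = ["healthcare_worker", "essential_worker", "fever", "loss_of_taste_smell"]
X = sm.add_constant(df[covariates].astype(float))
fit = sm.Logit(df["received_pcr_test"], X).fit(disp=False)

# Log odds ratios, odds ratios, 95% confidence intervals, and p-values.
ci = fit.conf_int()
report = pd.DataFrame({
    "log_odds_ratio": fit.params,
    "odds_ratio": np.exp(fit.params),
    "or_95ci_lower": np.exp(ci[0]),
    "or_95ci_upper": np.exp(ci[1]),
    "p_value": fit.pvalues,
})
print(report.round(3))
```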
Users who had no reported test outcome, or who reported both positive and negative outcomes in different responses, were removed. Users who identified as "other" in the gender response were dropped due to small sample size. Median neighborhood household income was estimated by mapping users' ZIP codes to the corresponding ZCTAs from the census, and then using the American Community Survey 5-year average results from 2018 to infer a neighborhood household income (B19013_001E). Population density was calculated at the county level for each user based on data from the Yu Group at UC Berkeley (https://github.com/Yu-Group/covid19-severity-prediction). Race was a categorical variable with distinct groups: "white", "African-American", "Hispanic/Latinx", "Asian", "multi-racial" for those who marked two or more race categories, "other" for those who marked "other", "Native American", or "Hawaiian/Pacific Islander", and "unknown" for those who did not disclose their race. A given food source was marked as True if the user had indicated the use of that food source in any response within the given time window. Because the HWF app asks for a separate set of symptoms depending on whether or not the user reported feeling "well", there is not a one-to-one correspondence between symptoms reported by those feeling "well" and "not well". We excluded symptoms that were only present among those feeling "well" or only among those feeling "not well". For symptoms reported by both those who were "well" and "not well", we combined them into single symptoms. Supplementary Table 2 shows the variables merged using the "any" function. Each symptom's responses were then averaged over all available responses over the 28-day window. Similarly, the distribution of sleep was averaged across the time window. Multiple logistic regression was performed using statsmodels, with the binary response outcome being the swab test outcome (positive coded as 1, negative as 0), to obtain unadjusted coefficients, which were converted to odds ratios by exponentiation. Supplementary Table 4 lists the covariates used in the unadjusted outcome regression model, as well as the estimated coefficients, 95% confidence intervals, and p-values.

To mitigate the selection bias inherent in restricting the analysis to those who received a test, we used several inverse probability weighting (IPW) adjustments. The probability of selection was estimated via the logistic regression analysis of who received a test described above. These probabilities were incorporated into the outcome model via inverse probability weighting, and we report confidence intervals based on robust (sandwich-form) standard errors and on bootstrap standard errors. As IPW can be sensitive to very small selection probabilities, we truncated them at several thresholds: at 0.1 and 0.9, and at 0.05 and 0.95. The results using the IPW selection probabilities truncated at 0.1 and 0.9 are reported in Figure 3; the results using probabilities truncated at 0.05 and 0.95 were similar. Supplementary Table 5 lists the covariates used in the outcome regression model with IPW truncation at 0.1 and 0.9, as well as the estimated coefficients and 95% confidence intervals. Confidence intervals were obtained by bootstrapping the entire selection and outcome modeling process with 2,000 replicates.
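The sketch below illustrates this two-step procedure on synthetic data: a selection model for the probability of being tested, truncated inverse probability weights, a weighted outcome model among tested users, and a bootstrap over the whole pipeline (whose specifics are elaborated in the next paragraph). The column names are hypothetical placeholders, and the weighted fit uses statsmodels GLM frequency weights only to obtain weighted point estimates; interval estimates come from the bootstrap, as in the analysis described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20000

# Synthetic user-level table (placeholder columns, not the HWF schema).
df = pd.DataFrame({
    "healthcare_worker": rng.binomial(1, 0.10, n),
    "fever": rng.binomial(1, 0.05, n),
    "loss_of_taste_smell": rng.binomial(1, 0.02, n),
    "household_exposure": rng.binomial(1, 0.03, n),
})
p_tested = 1 / (1 + np.exp(-(-4 + 1.2 * df["healthcare_worker"] + 2.0 * df["fever"])))
df["received_pcr_test"] = rng.binomial(1, p_tested)
p_pos = 1 / (1 + np.exp(-(-3 + 2.5 * df["loss_of_taste_smell"] + 2.0 * df["household_exposure"])))
df["pcr_positive"] = rng.binomial(1, p_pos) * df["received_pcr_test"]

SEL = ["healthcare_worker", "fever"]                       # selection-model covariates
OUT = ["loss_of_taste_smell", "household_exposure"]        # outcome-model covariates

def ipw_outcome_coefs(data, lo=0.1, hi=0.9):
    # Step 1: selection model for receiving a test, fit on all users.
    Xs = sm.add_constant(data[SEL].astype(float))
    p = sm.Logit(data["received_pcr_test"], Xs).fit(disp=False).predict(Xs).clip(lo, hi)
    # Step 2: outcome model among tested users, weighted by truncated 1/p.
    tested = data["received_pcr_test"] == 1
    Xo = sm.add_constant(data.loc[tested, OUT].astype(float))
    fit = sm.GLM(data.loc[tested, "pcr_positive"], Xo,
                 family=sm.families.Binomial(),
                 freq_weights=np.asarray(1.0 / p[tested])).fit()
    return fit.params

# Bootstrap the entire two-step pipeline (the paper used 2,000 replicates).
boots = pd.DataFrame([
    ipw_outcome_coefs(df.sample(frac=1, replace=True).reset_index(drop=True))
    for _ in range(200)
])
print(pd.DataFrame({
    "odds_ratio": np.exp(boots.mean()),
    "or_95ci_lower": np.exp(boots.quantile(0.025)),
    "or_95ci_upper": np.exp(boots.quantile(0.975)),
}).round(2))
```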
Specifically, for each bootstrap replicate, the entire dataset was resampled with replacement, a new selection/propensity model for who gets a test was fitted, followed by a new IPW outcome model fit using the inferred propensities from that bootstrap sample. The coefficients of the IPW models across the bootstrap samples were used to generate the confidence intervals and the mean value of each coefficient.

For additional sensitivity analysis, we used the bivariate probit model with sample selection used in econometrics to simultaneously estimate a selection (who gets tested) equation and an outcome (who tests positive) equation, incorporating the selection probability as an additional covariate. Due to possible collinearities, not all features could be used in both the selection model and the outcome model. Specifically, profession could only be included in the selection model, and the corresponding estimates should therefore be interpreted with caution. Supplementary Table 6 lists the covariates used in the full-information maximum likelihood estimates of the selection and outcome regression models, as well as the estimated coefficients, 95% confidence intervals, and p-values. Qualitatively, the trends observed in the simultaneous selection/outcome model fitting are similar to those found in the two-step selection + IPW outcome logistic models.

To address sample bias in the user distribution relative to the distribution of individuals in the US, we employed a post-stratification correction for non-probability sampling as an additional analysis. Post-stratification using age, gender, ethnicity, and location was performed on the testing selection model, which generates the IPW weights for the testing-positive model. The US was subdivided into the 9 major census regions (see Supplementary Table 7). A joint distribution of estimated population over age, gender, ethnicity, and region was obtained from the American Community Survey 5-year estimates from 2018. The corresponding distribution of users was generated across the same variables, and the ratio between each cell in the census distribution and the user distribution was used as the corresponding inverse probability weight in the testing selection model. The testing selection model thus represents a user's probability of getting tested under a corrected user base distribution matching major US census demographics. The census-corrected testing selection model was used to generate IPW weights for the subsequent testing-positive model, which was otherwise fit as before. Bootstrapping was performed on the entire process. The model coefficients for the post-stratification testing model are shown in Supplementary Table 8, while the coefficients and confidence intervals for the subsequent post-stratified IPW test outcome model are shown in Supplementary Table 9. A comparison of model coefficients with and without post-stratification can be found in Extended Data Figure 9. A comparison of the census-based post-stratification corrected models to the uncorrected models can be found in Fig. S7a. Performing census-based post-stratification correction yields model coefficients and confidence intervals that are similar to those obtained when no census-based post-stratification is performed.

To assess whether or not the states with the largest numbers of users bias the results, we also performed a comparison between the selection and outcome models with IPW correction with and without users from CA and CT (see Fig. S7b).
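To make the post-stratification weighting described above concrete, here is a minimal sketch on fabricated data. It uses a reduced set of strata (age group, gender, and census region; the analysis above also stratifies by ethnicity), and the census counts and user table are synthetic stand-ins for the ACS 2018 5-year joint distribution and the HWF user table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
strata = ["age_group", "gender", "census_region"]  # the paper also uses ethnicity

# Hypothetical census population counts per stratum (stand-in for ACS 2018 5-year estimates).
cells = pd.MultiIndex.from_product(
    [["18-39", "40-59", "60+"], ["female", "male"],
     ["Northeast", "South", "Midwest", "West"]],
    names=strata)
census = pd.Series(rng.integers(1_000_000, 5_000_000, len(cells)),
                   index=cells, name="population")

# Hypothetical HWF user table with the same stratum variables.
users = pd.DataFrame({
    "age_group": rng.choice(["18-39", "40-59", "60+"], 10000, p=[0.5, 0.3, 0.2]),
    "gender": rng.choice(["female", "male"], 10000, p=[0.8, 0.2]),
    "census_region": rng.choice(["Northeast", "South", "Midwest", "West"], 10000),
})

# Post-stratification weight per cell = census share / user share, so strata
# over-represented among HWF users are down-weighted.
census_share = census / census.sum()
user_share = users.groupby(strata).size() / len(users)
ps_weight = (census_share / user_share).rename("ps_weight")

users = users.join(ps_weight, on=strata)
# These per-user weights would then be supplied to the testing-selection model,
# which in turn yields the IPW weights for the test-outcome model.
print(users["ps_weight"].describe().round(2))
```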
When removing CA and CT data, the coefficients from the selection and outcome models remain largely similar, suggesting limited bias due to CA and CT. Moreover, there is an overall increase in the confidence interval widths of the outcome model, reflecting an overall increase in variance. Together, this comparison suggests that the CA and CT user base adds additional data points without adding substantial bias that might make the overall sample and corresponding analyses unrepresentative of the entire US population.

Household exposure analysis: In HWF survey version 3, users were first asked if they were exposed to someone with confirmed COVID-19. If they answered yes, they were then asked whether that person lived in their household. We removed users who answered something other than "yes" to the first question yet answered the second question. Additionally, we restricted the analysis to users who reported a negative or positive COVID-19 swab test and who reported 2 or more household members. The outcome of interest was the binary outcome of testing positive on the COVID-19 swab test. The exposure of interest was the binary variable of having a household member test positive for COVID-19; we grouped respondents who answered no together with those who did not answer the question regarding household members. The rest of the analysis proceeded similarly to the analysis for Fig. 3a, including the covariates used and the strategy of collapsing symptoms for each user across their responses within the two-week windows before and after the test. We also performed a sensitivity analysis using symptoms prior to the test. The difference between this analysis and that in Fig. 3a is that the reference group for household exposure was any other exposure or no exposure, whereas the reference groups for household exposure and for other exposure in Fig. 3a were no exposure. For both the unadjusted and adjusted analyses, we performed logistic regression without and with the covariates. Supplementary Table 10 shows the results; 95% confidence intervals were calculated on the log odds ratio scale and then exponentiated to obtain odds ratios.

Sensitivity analysis: Firth regression: Because of the small number of users in the user base who received a SARS-CoV-2 PCR test (1.7%) and the small number of tested users who received a positive test (8.2%), standard logistic regression could be biased. To address this issue, we performed a sensitivity analysis with Firth regression (Firth, D., 1993. Bias reduction of maximum likelihood estimates. Biometrika 80, 27-38), as implemented in the logistf R package (https://cran.r-project.org/package=logistf). We found very little difference between the Firth regression results and the logistic regression results presented in the paper (Extended Data Figure 8), indicating that the imbalance of tested users or users who tested positive was not so severe as to bias the results.

Prediction models (Fig. 3b): XGBoost was compared across different featurizations and subsets of the data to assess the predictiveness of the algorithm on the HWF test result data. Two datasets were generated according to the data selection and featurization used in the regression analysis of COVID-19 swab test outcomes, with the differences between the two sets being the time span used for the window and the inclusion of additional features not used for inference.
In the "pre-test" dataset, the window was selected such that only responses from 14 days before the test up until the day before the last reported test were included for analysis. The post-test dataset, on the other hand, is identical to the regression analysis dataset, using data from 14 days before and after the last reported test. The features for the different feature sets are shown in Supplementary Tables 11-13 . Mask wearing and social isolation were computed as time averages of the responses to these questions. Models were trained and tested using 5-fold cross-validation over the datasets. Within each fold, an additional 3-fold cross validation was performed on the training set to optimize model hyperparameters before testing on the test set of that fold (see Supplementary Table 14 for grid search coordinates). Test set AUCs from each fold were averaged to form a final AUC estimate. Final ROC curves were computed using the combined test set scoring and test set labels from each fold. In addition to the models shown in the main text, we tested a range of classifiers, feature sets, and data aggregation strategies for their performance at predicting COVID-19 test results from HWF survey data (shown in Extended Data Fig. 4) . Input data was restricted to v3 survey data collected between 04-24 and 05-12, and to qPCR tested users who responded within −10 and +14 days of their test: total of 3,514 negative tests and 315 positive tests. Three different feature sets, each consisting of a series of binary input variables from the HWF survey, were used: 56 symptoms, 77 additional features, or all 133 features together (see 'HWF_model_comparison_final.py' for full feature lists). Note that this featurization differs slightly from the featurization used in the logistic regression described above, the goal of which was inference rather than prediction. Each of the 3,829 qPCR tested users responded between 1 and 25 times within the time window of analysis. To account for time and sparse response rates we binned data across time in four different ways: i) average response for each feature in the 9 days preceding the test data ('pre-test'); ii) average response from −10 to +14 days ('average'); iii) bin the data into three weeks ([−10,−1], [0, 7] , [8, 14] ) and average each separately, creating a separate time indexed feature label for each time bin ('week_bins_avg'); or iv) imputing the response for days with no data by backfilling, then forward filling, then proceeding as in iii ('week_bins_imp'). The classifiers were implemented from the scikit-learn and XGBoost Python packages with the following parameter choices: LogisticRegression(), LassoCV(max_iter=2000), ElasticNetCV(max_iter=2000), RandomForestClassifier(n_estimators=100), MLPClassifier(max_iter=2000), XGBClassifier(). Hyperparameters for CV methods were automatically optimized by grid-search using 5-fold cross-validation. Mean AUC was calculated for each classifier using 5-fold cross-validation. See FigS4.1 for results and 'HWF_model_comparison_final.py' for full implementation. Post-Test Behavior Analysis (Fig. 4d-g) -Users with post test information (in the 2-7 days) after their test date (or hypothetical test date for untested users) were collected and analyzed. All featurization on this post test window was identical to that of the selection/test outcome models. 
To compute whether a user went to work at least once, all version 3 responses in which users reported whether or not they left the house were used, and if any response for a user contained a "yes" answer to leaving the house for work, the user was marked as leaving the home for work. A similar analysis was performed for leaving for work without a mask, marking the user as a yes if they reported that they were going to work and separately reported not using a mask when leaving the house that day. Proportions of each behavior across the three populations (tested positive, tested negative, and untested) were computed and were bootstrapped with 2000 replicates to generate confidence intervals. The estimated number of contacts was computed similarly, except using the average value over individual user responses across the 2-7 days after their test. Logistic regression analysis was performed to understand the effect of PCR test result on user behavior in the 2-7 days after the test, adjusting for other potential covariates. Supplementary Table 15 lists the covariates used in the unadjusted outcome regression model, as well as the estimated coefficients, 95% confidence intervals, and p-values.

Extended Data figure legend: Comparison of the testing outcome regression analysis between IPW correction alone and a, census-based post-stratification + IPW correction, and b, IPW correction on the dataset with CT and CA users removed from the analysis. From left to right: 1) comparison of the testing selection logistic regression model, 2) comparison of the predicted probability of getting tested using the testing selection logistic regression model, 3) comparison of the bootstrapped mean model coefficients from the testing outcome model, 4) comparison of the bootstrapped 95% confidence interval widths from the testing outcome model.

Figure 1 legend: a, The How We Feel (HWF) app: longitudinal tracking of self-reported COVID-19-related data. b, Responses over time, as well as the percentage of users reporting feeling unwell, with releases of major updates to the survey indicated. c, Information collected by the HWF app. d, Users by state across the United States. e, Age distribution of users. Note: users had to be older than 18 to use the app. f, Distribution of self-reported sex. g, Distribution of self-reported race or ethnicity. Users were allowed to report multiple races. "Multiracial" = the user indicated more than one category. "Other" includes American Indian/Alaskan Native and Hawaiian/Pacific Islander, as well as users who selected "Other".

Figure 3 legend: Reference categories are indicated where relevant, and when not indicated, the reference is not having that specific feature. Log odds ratios and their confidence intervals are plotted, with red indicating positive association and blue indicating negative association. Darker colors indicate confidence intervals that do not cover 0. Population density and neighborhood household income were approximated using county-level data. L = lower bound, U = upper bound of the 95% confidence interval. N = 3,829 users (315 positive, 3,514 negative) who took the V3 survey within ±2 weeks of receiving a test. b, Prediction of positive test results using ±2 weeks of data from the test date, using 5-fold cross-validation, shown as receiver operating characteristic (ROC) curves. The XGBoost model was trained on different subsets of questions: CDC Symptom Questions = using just the subset of COVID-19 symptoms listed by the CDC. All Survey Questions = using the entire survey. 4 Question Survey = using a reduced set of 4 questions that were found to be highly predictive.
Numerical values are AUC = area under the ROC curve. N = 3,829 users.

References
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Virological assessment of hospitalized patients with COVID-2019
High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2
Health Response to the Initiation and Spread of Pandemic COVID-19 in the United States
The effect of human mobility and control measures on the COVID-19 epidemic in China
The Impact of the COVID-19 Pandemic on Consumption: Learning from High Frequency Transaction Data. SSRN Electronic Journal
Flu near you: Crowdsourced symptom reporting spanning 2 influenza seasons
Building an international consortium for tracking coronavirus health status
Syndromic Surveillance for COVID-19 in Canada
Rapid implementation of mobile technology for real-time epidemiology of COVID-19
Real-time tracking of self-reported symptoms to predict potential COVID-19
A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys
Key predictors of attending hospital with COVID19: An association study from the COVID Symptom Tracker App in 2,618,948 individuals. medRxiv
A First Look at Contact Tracing Apps
Emergency Response to COVID-19 in Canada: Platform Development and Implementation for eHealth in Crisis Management
Risk of symptomatic Covid-19 among frontline healthcare workers
Key predictors of attending hospital with COVID19: An association study from the COVID Symptom Tracker App in 2,618,948 individuals. medRxiv
Cancer and risk of COVID-19 through a general community survey
Longitudinal symptom dynamics of COVID-19 infection in primary care
The effect of a national lockdown in response to COVID-19 pandemic on the prevalence of clinical symptoms in the population
Who should we test for COVID-19? A triage model built from national symptom surveys
Full-spectrum dynamics of the coronavirus disease outbreak in Wuhan, China: a modeling study of 32,583 laboratory-confirmed cases
Estimates of the severity of coronavirus disease 2019: a model-based analysis
Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy
Maxmen, A. Thousands of coronavirus tests are going unused in US labs
COVID-19 and the Potential Devastation of Rural Communities: Concern from the Southeastern Belts
Geographic access to United States SARS-CoV-2 testing sites highlights healthcare disparities and may bias transmission estimates
How to Use the Data | The COVID Tracking Project
Coronavirus Disease 2019 (COVID-19)
Presymptomatic Transmission of SARS-CoV-2 - Singapore
Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)
Universal Screening for SARS-CoV-2 in Women Admitted for Delivery
SARS-CoV-2 Infection in Residents of a Large Homeless Shelter in Boston
Boon, D. & Lessler, J. Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure
Collider bias undermines our understanding of COVID-19 disease risk and severity
Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19)
Closed environments facilitate secondary transmission of coronavirus disease 2019 (COVID-19)
Contact Tracing during Coronavirus Disease Outbreak, South Korea
Temporal dynamics in viral shedding and transmissibility of COVID-19
Household transmission of SARS-CoV-2
Household Secondary Attack Rate of COVID-19 and Associated Determinants
Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study
High SARS-CoV-2 Attack Rate Following Exposure at a Choir Practice
COVID-19 Among Workers in Meat and Poultry Processing Facilities
Epidemiology of Covid-19 in a Long-Term Care Facility in King County
Association of Public Health Interventions with the Epidemiology of the COVID-19 Outbreak in Wuhan, China
COVID-19 pandemic: some lessons learned so far (UK House of Commons Science and Technology Committee)
Ten Weeks to Crush the Curve
It's Not Too Late to Go on Offense Against the Coronavirus

The How We Feel Project would like to thank operational volunteers Ari Simon, Ricki Seidman, Arun Ranganathan, Celie O'Neil-Hart, Debbie Adler, Divya Silbermann, Jack Chou, Lother Determann, Mark Terry, Rhiannon Macrae, Robert Barretto, Ron Conway, Sid Shenai, Tony Falzone, and Yurie Shimabukuro. We would also like to thank Andrew Tang for graphic design support. We are grateful to the HWF participants who took our survey and allowed us to share our analysis. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The How We Feel Project is a non-profit corporation.