key: cord-293174-4ucqudn4 authors: Sun, Ziheng; Di, Liping; Sprigg, William; Tong, Daniel; Casal, Mariana title: Community venue exposure risk estimator for the COVID-19 pandemic date: 2020-09-29 journal: Health Place DOI: 10.1016/j.healthplace.2020.102450 sha: doc_id: 293174 cord_uid: 4ucqudn4

Complexities of virus genotypes and the stochastic contacts in human society create a big challenge for estimating the potential risks of exposure to a widely spreading virus such as COVID-19. To increase public awareness of exposure risks in daily activities, we propose a birthday-paradox-based probability model, implemented in a web-based system named COSRE (community social risk estimator), to make timely community exposure risk estimates during the ongoing COVID-19 pandemic. We define exposure risk to mean the probability of people meeting potential cases in public places such as grocery stores, gyms, libraries, restaurants, coffee shops, offices, etc. Our model has three inputs: the real-time number of active and asymptomatic cases, the population in local communities, and the customer counts in the room. With COSRE, possible impacts of the pandemic can be explored through spatiotemporal analysis, e.g., a variable number of people may be projected into public places through time to assess changes of risk as the pandemic unfolds. The system has the potential to advance understanding of the true exposure risks in various communities. It introduces an objective element to planning, preparation and response during a pandemic. Spatial analysis tools are used to draw county-level exposure risks of the United States from April 1 to July 15, 2020. The correlation experiment with the new cases in the next two weeks shows that the risk estimation model offers promise in assisting people to be more precise about their personal safety and control of daily routine and social interaction. It can inform business and municipal COVID-19 policy to accelerate recovery. 
History repeats from pandemic to pandemic (Ghendon, 1994). Reducing the impact of pandemics relies on early detection and mitigation strategies that slow the spread to allow more preparation and reduce impacts on patient care. However, as the COVID-19 pandemic continues to spread, the burden weighs heavy on all of us to control further spread and consequences through social distancing. Lifestyles leading up to this pandemic are capable of spreading disease further and faster than ever before. New and more intense factors amplify disease transmission (Dashraath et al., 2020). The risks apply reasonably well to nearly all densely populated areas. Environmental changes, such as a disruptive climate, may also contribute (Ogen, 2020). As coronavirus is monitored across the U.S., some communities see a diminishing number of cases, while others have yet to see the peak. Reporting errors and delays also play a role in changes in disease incidence. An objective, publicly accessible means to estimate risk of infection is needed in order to cope safely with the consequences of social distancing and sequestration.

Although various self-quarantine and self-isolation policies are in place in most of the U.S., the virus is not universally contained. People need to visit grocery stores, pharmacies and hospitals. They wish to attend sport, recreation and entertainment events, and community activities and family gatherings. Considering the peak in coronavirus cases might have passed, some states have started to reopen business to restore the economy. Risk assessment tools are essential in order to assist reopening with key information as to where and when it may be safest to return to places of need or interest.

Reopening is a very daunting challenge for many businesses in the United States this year (WORKPLACES, 2020). The duration of the pandemic is uncertain and its influence could last for a long time. 
The current guidelines for reopening are very general and not very customizable to accommodate the real scenarios of life in different communities. Communities differ in population, store types, average visit time, patients, air quality, etc. There is no one-size-fits-all solution. A safer plan should be able to flexibly fit each circumstance by considering specific factors like the locally confirmed cases, potential asymptomatic patients, store space, layout, people density, local population composition, commodities, age distribution, capacity limit, social distancing, temperature check, violation enforcement, etc. However, precise recommendations are rarely available for business managers to use. Very few quantitative models can estimate the impact of varying these factors. Thus, few individual businesses can tailor their reopening policies to make smarter decisions in balancing life, safety, and economy.

To find a balancing point between a full lockdown in fear of the virus and a plan of reopening businesses out of socioeconomic concerns, a mechanism to assist that decision is urgently needed. The decision-makers to be supported include municipal policymakers, business managers and customers. The mechanism, or system, should provide a customized, real-time risk estimation for store managers, who may then adapt their operations to the unpredictable pandemic. An estimation of risk will also give potential customers the information necessary to assess their own risks of visiting businesses or other venues in an area, with increased awareness and confidence about their decision.

To achieve that goal, this project proposes a straightforward social probability model based on the birthday paradox theory (if n people are selected at random, what is the probability that at least r people will have the same birthday?) 
(McKinney, 1966; Wagner, 2002) to estimate the venue risk of people meeting at least one COVID case, either asymptomatic, pre-symptomatic or actively symptomatic, in various public venues such as shopping centers, grocery stores, gyms, recreational areas, restaurants, DMVs (Departments of Motor Vehicles), and federal agency public field offices. A prototype web-based system named COSRE (community social risk estimator) (Sun, 2020a) is implemented to take zip codes as inputs and output a percentage value indicating the probability of exposure to the virus in the community. Risk is calculated based on real-time data collected from public resources. The purpose of the tool is to give people a reasonable quantitative estimate of their risks of exposure to the COVID-19 virus in their communities.

Pandemics are large-scale outbreaks of infectious diseases that can increase mortality significantly over a great geographic area and cause significant economic, social and political disruption (Madhav et al., 2017). Extensive global travel and long-distance contacts in human society raise the likelihood of pandemics. One recent example, the 2003 severe acute respiratory syndrome (SARS) pandemic, was a critical threat to public health (Ksiazek et al., 2003). The World Health Organization compels member states to meet specific standards to detect, report on, and respond to infectious disease outbreaks. This international framework contributed to a more coordinated global response during the 2009 influenza pandemic than in the previous 2003 event (Katz, 2009). Global public health departments look to improve pandemic preparedness through refined standards and response plans to flatten the time-incidence curves and reduce deaths. Management of community risks in a pandemic needs to apply more restrictive emergency management strategies than other hazards. 
The objectives of risk management are to strengthen our response capacities to contain the diseases, and to enable and promote linkage and integration across governments and societies. Accurate estimation of risks is the first and fundamental step in making an effective risk management plan. A common response practice is to limit points of entry to reduce the possibility of the virus traveling. People, communities, states, and countries should continue to communicate with information and advice. Precise information provided early and often will enable the community to correctly understand the health risks they face, and will make it easier to engage in actions to protect themselves (WHO, 2018).

The SARS-CoV-2 virus was first isolated from a patient with a pneumonia of unknown origin in Wuhan, China. Genetic analysis revealed that it is closely related to SARS-CoV-1 and genetically clusters within the genus Betacoronavirus, subgenus Sarbecovirus (Cont, 2020). Although pathology studies have been done for the COVID-19 disease, as an ongoing pandemic few tools are available to evaluate the real-time social exposure risks. An accepted classification groups risk assessment into several levels: little, lower, medium, high, very high, and severe. Nicholas et al. proposed a conceptual full-risk-spectrum Comprehensive Pandemic Risk Management System (CPRMS) to prevent, prepare for, respond to, and mitigate the multisector impacts of severe pandemics (Studzinski, 2020). It contains six institutional building blocks in the global domain: governance and leadership, sustainable financing, information and knowledge systems, human capital resources, essential commodities and logistics, and operational service delivery. But the detailed framework still needs refinement in order to be implemented in each geographical area (Studzinski, 2020). 
In the COVID-19 pandemic, there is still a long way to go to implement such a framework.

The US Centers for Disease Control and Prevention (CDC) has developed an online tool called the Influenza Risk Assessment Tool (IRAT) to assess the potential pandemic risk posed by influenza A viruses (Cox et al., 2014). The IRAT uses 10 scientific criteria to measure the potential pandemic risk associated with each of the life scenarios mentioned previously. These 10 criteria can be grouped into three overarching categories: "properties of the virus", "attributes of the population", and "ecology & epidemiology of the virus". A composite score for each virus is calculated based on the given scenario. The score gives the means to rank and compare influenza viruses to each other in terms of their potential pandemic risks. It is an evaluation tool, not a predictive tool (CDC, 2020).

Cromley discussed how to use GIS and related technologies to analyze the geography of disease and the relationships between pathological factors and their environments (Cromley, 2003). To develop predictive risk mapping of leptospirosis, an environmentally driven infectious disease, a spatial Bayesian network was proposed (Mayfield et al., 2018). In a risk model developed for Rift Valley Fever, exposure was measured as the proportion of total outbreak years for each district in prior epizootics, whereas the district's risk of outcome was assessed as severity of observed disease in humans and animals. Waller et al. extended hierarchical spatial models to create spatio-temporal maps of disease rates (Waller et al., 1997). The methods of assessing and mapping the potential economic and public health risks associated with avian influenza outbreaks have been discussed (Dudley, 2008). Olaf Berke proposed an exploratory approach to map spatial relative risk using the background risk in the unexposed population. 
The results can reveal the importance and geographical distribution of unknown spatial risk factors (Berke, 2005). Another paper offered a critical review of the methods of disease mapping and spatial regression based on male lip cancer incidence data in Scotland (Wakefield, 2007). However, these existing models require input data that are unavailable or require contact tracing, or the targeted risks are not venue-based exposure risks. These studies have inspired and influenced our work, which proposes a new index to measure community venue risks in the COVID-19 pandemic without contact tracing data.

The idea of calculating birthday paradox probability (Flajolet et al., 1992) is reused here. The probability distribution is similar to the Binomial distribution (Altham, 1978). The basic formula is:

$$P(N, C_i, C_{i-d}, n) = 1 - \prod_{k=0}^{n-1} \frac{N - (C_i - C_{i-d}) - k}{N - k}$$

where $P$ is the probability of meeting someone with active disease; $N$ is the total community population (town, village, city, county, region, country); $C_i$ and $C_{i-d}$ are the total numbers of potential COVID-19 cases in the area on day $i$ and day $i-d$ (defined as all accumulated COVID hosts in the community at the given time, including asymptomatic, symptomatic, hospitalized and self-quarantined; $d$ is the average number of days that the disease takes for people to recover or die (~30 days for COVID-19); people who have recovered from the virus are assumed unable to infect others and are not counted in potential cases); and $n$ is the number of people in the business venue, e.g. grocery stores, shopping centers, gyms, restaurants, workplaces and recreational areas (Sun, 2020b).

First, we calculate the probability of NOT meeting any infected person and subtract it from 1 to get the probability of meeting at least one infected patient in that specific group of people. This model assumes that all the people in the population visit a store with equal chance. 
If the chance is not equal for everyone, the corresponding input values should be changed to represent their varying probabilities. For example, the population of Fairfax County, VA is over one million (Census, 2010).

Calculations for the maps in May differ from those for the April maps, a change based on physician discovery that most confirmed COVID-19 cases will either recover or die within a 30-day window. The d for COVID-19 is set to be 30 days. For the May calculation, patient numbers for the same day in April (one month earlier) are removed because those patients are assumed to be no longer infectious. As May has 31 days and April only has 30 days, we use April 30 data in the calculation of May 31.

This risk estimation model can help people to evaluate risk and react accordingly instead of panicking or just being indifferent towards the virus. The formula takes three parameters as inputs and the output is the probability of meeting at least one infected person in the store. The result ranges from 0 to 1. The risk can be generally interpreted as the probability of exposure to another COVID-infected person. For policy making, it may be useful to classify the probabilities into several levels. For example, a risk below 25% may be considered relatively low, 25-50% medium, 50-75% high (the place has a high probability of hosting a SARS-CoV-2 carrier), and above 75% very high, so that people might think twice before taking action. This is just an example and the classification should be validated with real exposure data collected in businesses and public areas. Based on the correlation analysis results, policymakers and community members will have a better understanding of the risk when going outside in the community and may take corresponding actions to avoid being exposed in their specific county. 
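The risk model described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the COSRE implementation: the function names (`active_cases`, `exposure_risk`, `risk_level`) are hypothetical, the product form assumes the venue's occupants are drawn uniformly at random without replacement from the community, and the handling of values that fall exactly on a class boundary is an assumption, since the text gives only the ranges.

```python
from math import prod


def active_cases(cumulative, i, d=30):
    """Potential cases on day i: cumulative count on day i minus the
    cumulative count d days earlier (those cases are assumed resolved)."""
    earlier = cumulative[i - d] if i - d >= 0 else 0
    return cumulative[i] - earlier


def exposure_risk(population, cases, venue_size):
    """Probability that at least one of `venue_size` people, drawn at random
    without replacement from `population`, is among `cases` potential cases."""
    if cases <= 0 or venue_size <= 0:
        return 0.0
    if venue_size > population - cases:
        return 1.0  # not enough uninfected people to fill the venue
    # Probability that none of the people in the venue is a potential case.
    p_none = prod(
        (population - cases - k) / (population - k) for k in range(venue_size)
    )
    return 1.0 - p_none


def risk_level(p):
    """Example classification from the text; boundary handling is assumed."""
    if p < 0.25:
        return "low"
    if p < 0.50:
        return "medium"
    if p < 0.75:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the paper.
    risk = exposure_risk(population=1_000_000, cases=2_000, venue_size=50)
    print(f"risk = {risk:.3f} ({risk_level(risk)})")
```

With these made-up inputs (one million residents, 2,000 potential cases, 50 people in the venue) the estimated risk is a little under 10%, which the example classification would label as low.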
To evaluate the proposed risk score, we use several correlation methods to analyze its relationships with new case data for the next two weeks. For non-spatial correlation analysis, Spearman's correlation coefficient (Spearman's ρ) (Spearman, 1961), a statistical measure of the strength of a monotonic relationship between two variables, is utilized. Its interpretation is similar to that of Pearson's R (Pearson, 1895), and it can be used for analyzing non-linear (monotonic) relationships. In addition, spatial autocorrelation analysis is used to study the relationship between risk estimates in one county and those in neighboring counties. If there is a significant correlation, a cluster will be generated to identify the hot spot regions (Oster, 2020). Moran's I is the most common indicator for spatial correlation estimation. Moran's I ranges from -1 to 1.

Using the proposed algorithm, we calculated the risks for all counties of the contiguous United States and generated a series of maps starting from April 1, 2020. Fig. 1 shows six of the maps on the dates of April 1, April 15, May 1, May 15, July 1, and July 15, 2020, from which the trends of social exposure risks in the pandemic can be observed. The maps for June are similar to those for May and are omitted for brevity.

The map of April 1 (Fig. 1a) shows that New York City, Albany GA, New Orleans, Denver, Salt Lake City, and Sun Valley in Idaho became the centers with highest exposure risks. In contrast, the two states where the very first COVID-19 patients were found, California and Washington, have relatively low risk. Overall, the epidemic centers had emerged by April 1 and began to spread among neighboring counties. At that time, according to abundant news media accounts, most states had instituted social distancing and stay-at-home orders. These orders required people to stay at least 6 feet apart and avoid gathering in groups. 
Most mass gatherings were canceled following CDC guidelines. However, some virus-affected people started to show symptoms and confirmed cases were reported from coast to coast. As Fig. 1(a) displays, almost every county showed social risk, but risks were low compared to succeeding months, with the exception of several states that exhibited outbreaks. On April 1, risks were highest in the metropolitan areas of New York City, New Orleans, and Denver, and in Blaine County, Idaho, and Summit County, Utah. A potential outbreak could be foreseen from these areas as the time-incidence curves began to rise.

The April 15 map (Fig. 1b) [...]

It is important to note at this point that state-to-state and region-to-region comparisons of critical parameters, such as confirmed cases or population-patient ratios, depend on consistency and calibration of methods, tools, training and, in some cases, infrastructure. With such inter-comparability, the information content of the raw data and model output increases in important ways, e.g., nation-wide priority allocation of medical resources and emergency assistance.

[...] COVID-19 cases make a very high case-to-population ratio. Unlike metropolitan areas, Navajo County has fewer stores and public places, which would make it even riskier if businesses reopen and people gather. Also, in mid-April, the Detroit and Chicago metropolitan areas show signs of spreading incidence. Other regional differences appear. For example, risks of contagion in Washington and California remain relatively low compared to outbreak centers in the U.S. south and east. Florida shows a pattern of relatively high risk around Miami and less risk elsewhere.

The map of May 1 (Fig. 1c) shows that New York, Pennsylvania, New Jersey, and Massachusetts are at high exposure risk in their coastal counties. New York City and its associated counties are the areas of highest risk. 
For these counties, the risk map highlights the importance of following health guidelines to wear masks and maintain social distancing in public places.

Other east coast states, including Maryland, Virginia, Washington D.C., North Carolina, and South Carolina, all turn darker red. Inland regions, such as western Pennsylvania, West Virginia, western Virginia, western North Carolina, and eastern Tennessee, show relatively lower exposure risks. On May 1 (Fig. 1c), the populous counties in midwestern states like Iowa, Nebraska, Ohio, Indiana, Kentucky, and Minnesota have turned dark red. But there is less outward spread than in highly urbanized areas like New York. The risks are restricted to certain counties and show no obvious signs of a state-wide outbreak. Those red counties are scattered and isolated. One explanation is that the other less populous counties intentionally reduced their contacts with the populous counties. The connections among different counties in these regions are not as close as those in the east coast counties. The risk in Washington, California, Nevada, Texas, and Florida is rising slowly and remains lower than in the other outbreak states.

According to the map of May 15 (Fig. 1d), the riskiest areas are still along the east coast, though with a sign that risks may be on the decline in some suburban counties in New York, Pennsylvania and further south. In Georgia, counties surrounding Columbus County saw less risk, suggesting the new case load may have peaked. New York appears to show a downward trend in new confirmed COVID-19 cases.

High-risk areas on July 1 (Fig. 1e) and July 15 (Fig. 1f) include highly populated counties in the Mississippi basin and the corn belt states, which have become darker, and western counties in Oklahoma and Kansas show that coronavirus has spread further (Fig. 1e). 
Also, risk across Georgia, Florida, Texas, and Louisiana has turned more serious, as has risk in southern California, which has become higher after [...] Missouri. None show a downward trend in COVID-19 case incidence. However, in many of these states, stay-at-home rules were slackened in order to "reopen" and spur the economy. Health experts warn that without data and without proper assessment, decisions to reopen will likely result in a spike of coronavirus cases. Yet, the success of any reopening will depend on decisions made by each and every citizen. Our research helps inform everyone's individual, smart choices during reopening.

To evaluate the reliability of the COSRE risk score, we validate its accuracy by comparing the risk score for a county with the number of new COVID-19 cases confirmed in the following two weeks. A two-week interval was used because symptomatic patients are normally detected within two weeks of infection. Fig. 2 shows the Spearman correlation analysis results. In Fig. 2, the x axis is the COSRE risk score at time t, and the y axis is new cases per thousand people in the two weeks after t. Each dot in the plot represents a county. Spearman's correlation coefficient is consistently above 0.6, which falls in the "strong" category. That means a large COSRE score is strongly correlated with more new cases in two weeks, and a small COSRE score is strongly correlated with fewer new cases in two weeks. Chances of coincidence are diminished because all the time steps show similar correlation. The results show that the COSRE risk scores represent future exposure risks well. Specifically, the April results show that risks for most counties are below 30% while the new cases per thousand in two weeks are less than 5. On April 1 the confirmed numbers surpass exposure risks in many counties (many dots are above the diagonal line). 
This is probably caused by insufficient testing in the early stage, so the risk is underestimated. In May, the increase in risk scores doesn't correspond to an obvious increase in new cases. The new cases per thousand for most counties remain under 5. One reason is that people became concerned and most states had stay-at-home orders in place. So even though the confirmed cases were climbing, the number of people with confirmed cases in public places was actually much lower than expected, resulting in low numbers of infections. Things changed in July. Stay-at-home orders in most states expired, and people gradually returned to normal life routines. The Spearman's ρ values are higher in July than in May, meaning that the increased new cases are more positively correlated with the COSRE risk score. In summary, the Spearman correlation results indicate that COSRE risk has a strong positive correlation with the new cases in the next two weeks, which suggests that the risk scores reflect real exposure risk when the quarantine orders relax or expire and people start to go back to normal life.

Figure 2. Correlation between COSRE risk score and the new confirmed cases in the following two weeks (14 days). Spearman's correlation coefficients are added in the tiles; the x axis is the COSRE risk score, and the y axis is new cases per thousand people in the following two weeks.

(1) Spatial Autocorrelation and Moran's I

We calculated the Global Univariate Moran's I of all the semi-monthly COSRE risk maps from April to July. As shown in Fig. 3 [...] means that for each county the restriction on in-person activities in public places in late April, May and June works to reduce risk. However, after entering July the correlation began to show significance between two consecutive time steps; a county with high confirmed cases will very likely have a similar number of confirmed cases at the next checkpoint. 
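The two correlation measures used in this evaluation, Spearman's ρ and global univariate Moran's I, can be sketched in plain Python as follows. This is an illustrative sketch only and not the tooling used for the reported results: the function names are hypothetical, the spatial weights are assumed to be supplied as a full square matrix (for example, 1 for neighboring counties and 0 otherwise), and no significance test or p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def morans_i(values, weights):
    """Global univariate Moran's I for one value per county.

    weights[i][j] is the spatial weight between counties i and j
    (assumed here: 1 for neighbors, 0 otherwise, 0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den
```

In the setting of this section, `x` would hold the COSRE risk score of each county at time t and `y` the new cases per thousand people in the following two weeks. For Moran's I, values near 1 indicate that similar risk scores cluster among neighboring counties, and values near -1 indicate that neighbors tend to differ.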
Overall, correlation of the risk scores of two consecutive time steps is positive and has higher significance when venue activities increase.

Figure 3. The trends of global univariate Moran's I and bivariate Moran's I

Cluster maps based on local univariate Moran's I between one county's COSRE score and those of its neighbor counties are also generated to extract those counties that share the same pattern of case spikes or lower cases (as shown in Fig. 4). A p-value of at most 0.05 is normally considered statistically significant. We use 0.05 as the p-value threshold to extract the significantly correlated counties; they are plotted in Fig. 4. All the colored counties have a p-value smaller than 0.05, which means significant correlation between the COSRE score of the county and its spatial lags. The high-high hotspot regions agree well with our observations in Section 4.1. The New York, New Orleans, Detroit, Denver and Seattle metropolitan areas and neighboring counties are the major hotspot clusters in April. In May, the Chicago metropolitan area joined the list while Seattle and Detroit dropped off, along with many major cities in the Midwest states. The Navajo Nation became a hotspot and has persisted since then. In July, New York and the other northeastern states turn blue after consistent low risks are observed.

[...] values less than 0.05 are considered significantly correlated regions and are colored in Fig. 5. The results show that COSRE risk scores can actually reveal those hotspot counties on April 1, like those in New York, Louisiana, and Detroit; case increases in the next two weeks agree very well with model estimates in those regions. In all cluster maps, those hotspot regions are very consistent with the high-high regions identified in Fig. 
4, which means that the correlation between COSRE risks and the new cases in the following weeks is significantly positive, especially in those hotspot regions and when few venue activity restrictions are in place. Overall, our COSRE risk shows reliable performance, with stable risk scores that indicate true exposure risks, from the early stage to the reopening stage.

It should be clarified that exposure risk is not equal to the actual transmission risk. The community risk of exposure is an important contributing factor to the risk of transmission. The relationship is delineated using the following generalized equation:

$$\text{Risk}_{\text{transmission}} = \frac{\text{Risk}_{\text{exposure}} \times \text{Risk}_{\text{infection}}}{\text{Immunity}}$$

where $\text{Risk}_{\text{exposure}}$ is the exposure risk in community daily activities and $\text{Risk}_{\text{infection}}$ is the infection risk. The two risks come from two steps. The former estimates the possibility of people coming in contact with virus hosts and sources. For example, people who stay home are at lower risk of exposure than people who go to the grocery store. The latter estimates the possibility of being infected by the virus after contact with a source. If people wear mouth and nose masks, safety glasses and disposable gloves and footwear, their risk of contracting the virus will be much lower than for people who wear no protection. $\text{Immunity}$ represents the ability of the person or the community to be immune to the virus. People with antibodies could be immune to certain types of viruses and their immunity is higher. The transmission risk is inversely proportional to the immunity of the person or the entire community.

The work reported herein gives a customized quantitative estimation of the social exposure risks of people who show up in public places. The results are tailored for every community by considering local at-risk population, store numbers, and potential guest flow. The purpose is to help people increase awareness of their risks and help U.S. businesses adjust their store policies. 
Following this guidance should accelerate the reopening of business while maintaining a low risk of virus contact for the wellbeing of both employees and customers. Successful application of tools from this project will give both customers and businesses a clearer understanding of the risk of opening up and going to a place of business or recreation venue. Decisions can be made on the fly. The correlation evaluation has demonstrated that the COSRE risk score has a strong positive correlation with new cases for the following two-week period. The rate of new cases is basically determined by regional transmission risks, for which the venue exposure risk is a precondition. The positive correlation can be used as evidence to validate the accuracy and effectiveness of the COSRE risk score. We expect the score to serve as a simple, stable and reliable indicator for measuring current risk and estimating new cases for the next two-week period.

A reliable model designed for real-world application must consider many factors. More refinement is needed to improve the accuracy of the three input parameters. For example, the model assumes that the confirmed patients comprise all the existing patients and all are free to move about the community; in reality, this is not entirely true. As an improvement, $R_0$ should be used to calculate new potential COVID patient numbers, as close to the real situation as possible. As official virus tests become more available, the ability to monitor and predict community-level risks amid a pandemic will improve. For the population, the model assumes that everyone in the population has the same chance of showing up in one store, which is also not true. Individual preferences exist for which grocery store, shopping center, coffee shop or library to frequent. 
To make the model more realistic, one can obtain store visiting data from SafeGraph (SafeGraph, 2020), which allows analysis of the age, gender, and other characteristics of a store's customers. From this, a more accurate customer pool is possible based on the regional population census. One may also remove from the population used for calculation those people who will not show up, as when data to populate the April and May maps of Figure 1 were adjusted to accommodate the findings of physicians. Further improvement will come by adjusting the real-time store visitation based on popular hours, business density, county income level and real-time customer counts.

Also, real-world exposure data are scarce. Apple and Google are among the exceptions with more accurate and more extensive data, where smartphone Bluetooth has been used to trace potential coronavirus patients. Based on patient tracing data such as this, the real probability of a venue with COVID patients present could be compared with the estimated risks derived from our formula. Since the pandemic is still ongoing, such a dataset is relatively sensitive and hard to retrieve at present. Model evaluation with real-world data is our next step of work.

Public awareness of virus exposure risks is important. Individual decisions will be made during a pandemic as people decide to leave home sanctuaries and reengage in social activities. To inform and assist, we propose a birthday-paradox-based probability model, coupled with a publicly accessible web-based system to calculate community exposure risks in public gatherings. Model-derived risks are generated based on the real-time potential COVID-19 cases, the population in local communities, and the number of people in a given venue. 
With this web-based system, people may explore effects of the pandemic through a geographical spatiotemporal view, moving through time and testing different venues and the expected numbers of people in them, in order to assess changes in risk as the pandemic unfolds. The system integrates the risk estimation model, computational tools, and the analysis of evolutionary pathways, together with refinements to virus surveillance and to new, research-based understanding of this novel virus.

The proposed model and system are an improvement in assessing risks posed by the SARS-CoV-2 virus and other virus outbreaks, epidemics or pandemics. We are consequently better equipped to prepare for and respond to the ongoing pandemic and to all future vector-borne diseases. Yet caveat and caution are advised.

The application scenarios of the system are set in the middle of an outbreak or pandemic, after testing and tracking of patients are in place. Today, without extensive and comparable national testing, the estimated risks might be far from reality. An important objective of our study is to show what can be accomplished with extensive testing and tracking, where methods, materials and objectives are comparable, that is, where apples are compared with apples, without regard to state or region. Valuable information can be drawn from the current web-based system. Complex interactions among factors and inaccuracy of any or all of the input parameters will distort the model's results away from reality. We will continue to work to limit these distortions and to improve accuracy and reliability through peer and crowd-sourced review.

The authors would like to thank the anonymous reviewers, Professor Dieter Pfoser, Professor Andreas Zufle, and Professor Pat Gillevet from George Mason University, the NASA project (17-HAQ17-0044) team members, and the MIT Datathon team for their valuable advice.
Thanks go to Johns Hopkins University for the COVID-19 dataset, to the U.S. Census for the population data, and to SafeGraph Inc. for kindly providing the foot traffic data. Thanks to all the developers of the Python libraries and tools used in this work.

Two generalizations of the binomial distribution
GeoDa: an introduction to spatial data analysis, Handbook of applied spatial analysis
Exploratory spatial relative risk mapping
Spatial relative risk mapping of pseudorabies-seropositive pig herds in an animal-dense region
Assessing real-time Zika risk in the United States
CDC, 2020. Influenza Risk Assessment Tool (IRAT): Questions & Answers
County Population Totals
Outbreak of acute respiratory syndrome associated with a novel coronavirus
Pandemic preparedness and the influenza risk assessment tool (IRAT), Influenza pathogenesis and control-Volume I
GIS and disease
Coronavirus disease 2019 (COVID-19) pandemic and pregnancy
An interactive web-based dashboard to track COVID-19 in real time. The Lancet infectious diseases
Public health and epidemiological considerations for avian influenza risk mapping and risk assessment
Birthday paradox, coupon collectors, caching algorithms and self-organizing search
Introduction to pandemic influenza through history
Matplotlib: A 2D graphics environment
GeoPandas: Python tools for geographic data
Real-time estimation of the risk of death from novel coronavirus (COVID-19) infection: inference using exported cases
Use of revised International Health Regulations during influenza A (H1N1) epidemic, 2009. Emerging infectious diseases 15
A novel coronavirus associated with severe acute respiratory syndrome
Pandemics: risks, impacts, and mitigation, Disease Control Priorities: Improving Health and Reducing Poverty
Predictive risk mapping of an environmentally-driven infectious disease using spatial Bayesian networks: A case study of leptospirosis in Fiji
Generalized birthday problem
Notes on continuous stochastic phenomena
Predictive factors and risk mapping for Rift Valley fever epidemics in Kenya
Assessing nitrogen dioxide (NO2) levels as a contributing factor to the coronavirus (COVID-19) fatality rate
Trends in Number and Distribution of COVID-19 Hotspot Counties-United States
Notes on Regression and Inheritance in the Case of Two Parents. Proceedings of the
PyShp: This library reads and writes ESRI shapefiles in pure Python
SafeGraph Places Patterns data
The proof and measurement of association between two things
Comprehensive pandemic risk management: A systems approach. London: Visiting International Research Fellow Policy Institute, King's College; 2020
COSRE: COVID-19 Community Social Risk Estimator
What is the chance of meeting a COVID-19 infected person in grocery stores
Estimation of the percentages of asymptomatic patients and undiagnosed patients of the novel coronavirus (SARS-CoV-2) infection in Hokkaido, Japan by using birth-death process with recursive full tracing
A generalized birthday problem
Disease mapping and spatial regression with count data
Hierarchical spatio-temporal mapping of disease

Highlights:
• A birthday-paradox-based probability model is proposed for estimating the community exposure risks in pandemics.
• A web-based system named COSRE is implemented to calculate the daily risks of each U.S. county in the ongoing COVID-19 pandemic.
• The model successfully reflected the early signs of outbreaks in counties severely affected by COVID-19, including the Navajo Nation, New York, Georgia, and Louisiana.
• The spatial correlation analysis shows the proposed risk score has a significant positive relationship with the new cases in the next two weeks.