key: cord-1016156-ogfwlr4m authors: Plasek, Joseph M; Tang, Chunlei; Zhu, Yangyong; Huang, Yajun; Bates, David W title: Following Data as it Crosses Borders During the COVID-19 Pandemic date: 2020-04-20 journal: J Am Med Inform Assoc DOI: 10.1093/jamia/ocaa063 sha: 88ea6354a5e2285d9ae78346f538f0b3b28e7e00 doc_id: 1016156 cord_uid: ogfwlr4m
Data changes the game in terms of how we respond to pandemics. Global data on disease trajectories and on the effectiveness and economic impact of different social distancing measures are essential to facilitate effective local responses to pandemics. COVID-19 data flowing across geographic borders are extremely useful to public health professionals for many purposes, such as accelerating the pharmaceutical development pipeline and making vital decisions about intensive care unit rooms, where to build temporary hospitals, or where to boost supplies of personal protective equipment, ventilators, or diagnostic tests. Sharing data enables quicker dissemination and validation of pharmaceutical innovations, as well as improved knowledge of what prevention and mitigation measures work. Even if physical borders around the globe are closed, it is crucial that data continues to transparently flow across borders to enable a data economy to thrive, which will promote global public health through global cooperation and solidarity.
Tracing the origins of new diseases through their growth into global pandemics, such as the 2019 RNA virus strain from the Coronaviridae family known as COVID-19, necessitates following the flow of relevant data. Two weeks after the first COVID-19 hospitalization, virologists conducted metagenomic RNA sequencing on a patient and published its molecular blueprint (a dizzying string of roughly 30,000 letters) about a month later [1] [2].
News reports and other biosurveillance-related data pointed to a cluster of pneumonia cases that BlueDot, an AI-driven algorithm, identified as an outbreak on December 31, 2019, a week before global public health officials notified the public [3]. Outbreaks, such as the one on the Diamond Princess cruise ship, provided valuable information about how the disease spreads and about its incubation period [4] [5]. Electronic health record systems have augmented their self-reported travel screening questionnaires to help identify patients who have recently visited areas where community spread is present [6]. Transportation data have been used to simulate the spread of a disease and to estimate the effect of local and intercontinental travel restrictions [7]. Air, sea, and land transport networks continue to expand in reach, speed of travel, and volume of passengers carried, providing a vector for infectious disease spread. Simulations suggested cancelling the Spring Festival in China, a period known for crowded buses, trains, planes, and ferries culminating in an estimated 3 billion trips. Prescriptive analytics on outbreak data, through algorithms or models, can simulate possible outcomes and help answer the question "what should we do?" when the outbreak constitutes a public health emergency of local or international concern. Decision-making about travel advisories and quarantines is done locally, and each locale has its own level of preparedness for an outbreak. Some areas have used innovative approaches; for example, Taiwan integrated its health insurance database with biometric entry and exit data to generate real-time alerts based on travel history and clinical symptoms to aid in case identification, and it has used these data to decide whom to quarantine and track at the border [8]. The global health security index (GHSI) encompasses disease prevention, detection, reporting, and response capabilities for each country.
Countries with a higher GHSI, like Singapore, can identify otherwise undetected cases through increased epidemiological surveillance and contact tracing, leading to improved accuracy regarding disease prevalence [9]. Disease surveillance systems can draw on many potential international data sources [10].
The flow of COVID-19 data across borders also has economic implications. In the field of biomedical informatics, we sometimes ignore the economic effects that the data and data products we create and consume may have on the global economy, but it is worth examining them in the context of a global pandemic. Certainly, COVID-19 has had a devastating effect on the global economy, and that could affect public health in a variety of ways [18]. From a health data economy perspective, the inflows and outflows of data and information across geopolitical boundaries have the potential to generate enormous economic value in a digitally connected global healthcare economy [19] [20]. The capital value of global COVID-19 data can be maximized when the data are analyzed using descriptive, predictive, or prescriptive analytics for the purposes of clinical research, public health, and pharmaceutical development. These cross-border data flows have the potential to be a driver of global economic growth, though altruism has largely dictated the free flow of data and ideas in the current crisis. COVID-19 data enable significant new opportunities for innovation and disruption within the health data economy, especially for emerging infectious diseases [20] and telemedicine. Governance of data and data product sharing can take the form of the OHDSI network, with the free flow of data products for a collective research gain; a commercial data sharing model, such as that between Google and HCA Healthcare [21]; or a self-governance model like DataBox [22], where profit sharing from the data transfer can be realized by the data owners.
The goal of flattening the curve is to reduce the reproduction ratio. The reproduction ratio is the number of people to whom one infected person passes the disease (e.g., if the reproduction ratio is four, then each infected patient transmits COVID-19 to four more people, on average [23]). However, the number of reported cases may not be a very useful indicator unless you know something about how the COVID-19 testing is being conducted and how the data are being gathered [23]. When there are major differences between COVID-19 testing strategies, as there have been in this pandemic, it is difficult to make accurate direct comparisons because the testing strategies can skew case counts [23]. Accurate estimation of the reproduction ratio depends on having comprehensive, diverse, and heterogeneous data sets to overcome the limitations of individual localized data sources. For COVID-19, countries that conducted comparatively high numbers of tests had lower mortality rates even though they reported high case counts that alarmed the public in the short run [23]. Tracking the viral mutations of COVID-19 cases in New York suggests that most cases originated with travelers returning from Europe, not Asia as originally expected [24]. Missing this hidden spread due to insufficient testing and screening at the borders meant that the suspension of air travel and mandatory quarantines for travelers from Europe occurred too late. Even if physical borders around the globe are closed, it is crucial that data related to COVID-19 continue to transparently flow across borders to enable a data economy to thrive, which will promote global public health through global cooperation and solidarity.
Funding Statement: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. The authors have no competing interests to declare.
Contributorship Statement: Tang C, Zhu Y, Huang Y, and Bates DW built on and extended the initial idea. Tang C drafted this manuscript. All authors contributed substantially to editing the paper. Plasek JM, in particular, contributed substantial additional content that expanded the manuscript. All the authors are accountable for the integrity of this work.
[1] A new coronavirus associated with human respiratory disease in China
[2] Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome. National Center for Biotechnology Information
[3] An AI epidemiologist sent the first warnings of the Wuhan virus
[4] Failures on the Diamond Princess Shadow Another Cruise Ship Outbreak
[5] A novel coronavirus from patients with pneumonia in China
[6] Epic pushes out software update to help spot coronavirus. Healthcare IT News
[7] The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science
[8] Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing
[9] Quantifying bias of COVID-19 prevalence and severity estimates in Wuhan, China that depend on reported cases in international travelers
[10] Implementing syndromic surveillance: a practical guide informed by the early experience
[11] Children's hospital
[12] I'm a researcher who's helped change how we tackle pandemics like coronavirus forever - this is what we've learned. Independent.co.uk
[13] An interactive web-based dashboard to track COVID-19 in real time
[14] Epidemiological data from the COVID-19 outbreak, real-time case information
[15] Medical Center Opens at Boston Convention and Exhibition Center
[16] Covid-19: How to triage effectively in a pandemic
[17] COVID-19 updates page. Observational health data sciences and informatics
[18] How the coronavirus' economic toll could also affect public health
[19] Digital globalization: The new era of global flows. McKinsey Global Institute
[20] Rethinking Data Sharing at the Dawn of a Health Data Economy: A Viewpoint
[21] Google Cloud Launch COVID-19 Data Sharing Platform
[22] Self-governing openness of data
[23] Coronavirus case counts are meaningless. FiveThirtyEight.com
[24] Most New York Coronavirus Cases Came From Europe, Genomes Show. NYTimes
[25] Calling all coronavirus researchers: keep sharing, stay open
The authors would like to thank Sheng Wang, PhD for valuable comments and suggestions on early drafts. The content is solely the responsibility of the authors.