key: cord-0749949-vruttv2s authors: Arvisais-Anhalt, Simone; Lehmann, Christoph U; Park, Jason Y; Araj, Ellen; Holcomb, Michael; Jamieson, Andrew R; McDonald, Samuel; Medford, Richard J; Perl, Trish M; Toomay, Seth M; Hughes, Amy E; McPheeters, Melissa L; Basit, Mujeeb title: What the COVID-19 Pandemic Has Reinforced: The Need for Accurate Data date: 2020-11-04 journal: Clin Infect Dis DOI: 10.1093/cid/ciaa1686 sha: 22a0ce1e3333681829b2293286f4e2a0f1b85569 doc_id: 749949 cord_uid: vruttv2s

The COVID-19 pandemic has challenged the United States’ existing national public health informatics infrastructure. This report details the factors that have contributed to COVID-19 data inaccuracies and reporting delays and their effect on the modeling and monitoring of the COVID-19 pandemic.

Accepted Manuscript

The COVID-19 pandemic has challenged the United States' existing national public health and laboratory infrastructure. Central to fighting the pandemic is accurate and timely reporting of COVID-19 tests and case patients. Achieving this requires multiple elements, including unambiguous data requests; clearly defined variables; consistent labeling and processing of laboratory samples; uniform reports; accurate and complete data collection, aggregation, and transmission; and trained personnel to identify quality issues. In the US, COVID-19 interventions and prevention strategies have been decentralized and not harmonized across jurisdictions. The resulting data complexity has caused reporting inaccuracies and delays at all levels of government, curtailing efforts to interrupt the spread of COVID-19.

Requests for COVID-19-related data sent to healthcare institutions by multiple governmental and non-governmental entities are duplicative, ambiguous, underspecified, and lacking in granularity.
Despite efforts to limit information gathering to what is sufficient for informing policy and the public, the various requests commonly overlap, revealing poor communication between data requestors and a lack of understanding of data availability, structure, and organization. Entities with legitimate data needs commonly require reports in different formats (faxes, PDFs, CSV or XLS files, emails, HL7 feeds, online portals) and at different intervals (daily, weekly, monthly). Currently, our regional healthcare system reports to seven organizations (2 city, 1 county, 1 regional, 1 state, 2 national). Data requests include information about testing (6 of 7), hospital census (5 of 7), ventilators (3 of 7), and staffing/capacity (1 of 7). Although electronic case reporting from established electronic health records (EHRs) to public health agencies can be automated, uniform adoption has not been achieved because of the high cost and complexity for vendors and institutions. The uncoordinated data collection, the lack of data sharing among entities, and the inability of the EHR to generate reports automatically required us to create a reporting team of 11.85 full-time equivalents drawn from the clinical laboratory, data warehouse, hospital operations, and hospital quality team. Not every institution has the resources required to create these complex reports.

Data requests from public health and other authorities should be precise, actionable, and unambiguous. Early in the COVID-19 pandemic, there was confusion about which date should be attributed to a positive test result (e.g., date of first symptoms, specimen collection, specimen resulting, or specimen reporting).[1] For a single specimen, these dates varied by weeks. Reporting requests lacking the necessary specificity have led to confusion. The White House Coronavirus Task Force requested daily COVID-19 test results from hospital-based laboratories.[2] The instructions (since reversed because of duplicate reporting) were: "If all of your COVID-19 testing is sent out to private labs and performed by one of the commercial laboratories on the list below, you do not need to report using this spreadsheet." Hence, patient samples collected at institutions not performing COVID-19 testing and sent to one of the listed commercial laboratories for processing resulted in ONE report to the government. However, patient samples collected at such institutions and sent for processing to laboratories not on the list may have been reported TWICE to the government: once by the hospital laboratory sending the specimen and once by the laboratory processing it.

Ambiguity was also seen in local (city, county, and state) data requests. Researchers voiced concerns about COVID-19 results being reported multiple times,[3] and some requesting entities revealed that they could not address duplicated results. For example, the Texas Department of State Health Services is unable to de-duplicate the test numbers from private laboratories.[4] Because neither the extent of these errors across entities nor how often they are detected is known, data quality remains in doubt.

Fulfilling data requests from different entities is challenging for laboratories when associated specimen data are missing. Accredited laboratories require two forms of patient identifier (e.g., name and date of birth), the collection date, and the collection time to accept specimens. While SARS-CoV-2 testing sites often provide the minimum data needed to meet laboratory acceptance criteria, they do not uniformly collect all of the data requested by the various entities. Specimens are frequently shipped from third parties (e.g., nursing homes, physician offices) to affiliated hospitals, which then ship them to reference laboratories.
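Where a receiving entity gets patient-level records rather than aggregate counts, the double counting described above can in principle be removed by keying each report on the fields accredited laboratories already require: two patient identifiers plus collection date and time. The following is a minimal, hypothetical sketch, not the method of any health department. The `ResultReport` fields, the `dedup_key` and `deduplicate` helpers, and the name normalization are all assumptions made for illustration. Exact matching on names will still miss variants such as reordered or misspelled names, which is one argument for a universal healthcare identifier.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class ResultReport:
    """One SARS-CoV-2 result as received by a public health entity (hypothetical fields)."""
    patient_name: str
    birth_date: date
    collection_date: date
    collection_time: time
    result: str
    reporter: str  # laboratory that submitted this copy of the result


def dedup_key(report: ResultReport) -> tuple:
    """Key on the two patient identifiers plus specimen collection date and time."""
    # Upper-case and collapse whitespace so trivial formatting differences match.
    name = " ".join(report.patient_name.upper().split())
    return (name, report.birth_date, report.collection_date, report.collection_time)


def deduplicate(reports):
    """Keep the first report seen for each key; later copies are treated as duplicates."""
    unique = {}
    for report in reports:
        unique.setdefault(dedup_key(report), report)
    return list(unique.values())


# Hypothetical example: one specimen reported by both the sending hospital
# laboratory and the processing laboratory, plus one unrelated specimen.
reports = [
    ResultReport("Jane Doe", date(1950, 1, 2), date(2020, 5, 1), time(9, 30), "POSITIVE", "hospital laboratory"),
    ResultReport("JANE  DOE", date(1950, 1, 2), date(2020, 5, 1), time(9, 30), "POSITIVE", "processing laboratory"),
    ResultReport("John Roe", date(1972, 3, 4), date(2020, 5, 1), time(10, 0), "NEGATIVE", "processing laboratory"),
]
print(len(reports), "reports ->", len(deduplicate(reports)), "unique tests")  # 3 reports -> 2 unique tests
```

Keeping the first copy is an arbitrary choice here; a real system would also need a rule for conflicting results and could not apply this at all to the aggregate counts some entities receive.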
When receiving specimens with missing reportable information, the laboratory must either reject the specimen, delay processing until the information is received, or process the specimen despite the missing information. In the latter case, the laboratory must choose between delaying the report and reporting incomplete information. At our medical center, collecting missing reportable information is a manual, unfunded process that has taken between 1 and 20 days. Once missing data are collected and reported, downstream practices can affect data accuracy depending on each entity's reporting policies. For example, counting results by the date they were reported to an entity rather than by the specimen collection date will skew the shape and trajectory of the epidemic curve.

Local and regional data aggregation strategies contribute to reporting delays depending on the automation, integration, and interoperability of public health departments' electronic systems. Public health information technology infrastructure and capabilities vary widely. Resources allocated to local health departments depend on the investment and infrastructure provided by the federal government and state health departments, and on categorical funding through initiatives and grants. Because categorical funding addresses specific public health goals, it is rarely used to develop or maintain flexible infrastructure that can address future issues. Before COVID-19, health departments across the country received reportable infectious disease information through a combination of electronic laboratory reporting systems, faxes, and emails.[5] Health department staffing was usually sufficient to handle the volume of disparate reports. The volume generated by SARS-CoV-2 testing increased demands and strained resources, creating reporting failures.[6]

The confluence of these problems, together with reporting timeframes at the institutional level (Figure 1), resulted in inaccurate monitoring of the pandemic. For example, in Dallas County, data reported by authorities on June 2 showed the daily incidence of COVID-19-positive cases decreasing from April 27, 2020 to May 27, 2020 (Figure 2a). This decrease was an artifact: on June 9, after backlogged data for the same timeframe were added, the revised data showed that daily incidence had been stable or increasing.

The two different daily incidence series for the same period changed the calculated effective reproduction number (R), a measure of the virus's transmissibility (Figure 2b). Using the June 2 data, R for May 27 was 0.73, a value below 1 at which the spread of disease would be expected to subside. Using the revised June 9 data, R for May 27 was 1.03, a value above 1 at which the disease would continue to spread. The discrepancy also affected SIR (susceptible, infectious, recovered) modeling of the estimated and 14-day projected COVID-19 cases (Figure 2c). With the June 2 data, the projected number of COVID-19 cases in Dallas County was 1,005; with the backlogged data reported on June 9 included, the projection was 2,054. Data reporting delays directly affected policymakers' ability to interpret the trajectory of the epidemic, evaluate trends in cases and in SARS-CoV-2 testing penetration and capacity, assess the role of interventions, and respond to emerging data in a timely manner.

To address an emerging infectious disease with pandemic potential and limited public health and therapeutic interventions, such as COVID-19 or H1N1, the availability of accurate, reliable, and timely laboratory and diagnostic data is critical for researchers, policymakers, and the public.
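Why the Dallas County R values of 0.73 and 1.03 lead to such different projections can be illustrated with the classic SIR equations, dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, in which the effective reproduction number at the start of a projection is βS/(γN). The article does not describe how its R estimates or SIR projections were computed, so this is not the authors' model. It is a minimal forward-Euler sketch under hypothetical assumptions (a 7-day mean infectious period, a population of 2.6 million, 5,000 people infectious at day 0), and the `sir_projection` helper is invented for illustration. The removed compartment is not tracked because it does not feed back into S or I.

```python
"""Toy SIR projection contrasting two effective reproduction numbers.

Hypothetical assumptions (not taken from the article): a 7-day mean
infectious period, a population of 2.6 million, and 5,000 people
infectious at the start of a 14-day projection.
"""


def sir_projection(r_eff, population, infectious,
                   infectious_days=7.0, days=14, steps_per_day=10):
    """Integrate dS/dt = -beta*S*I/N and dI/dt = beta*S*I/N - gamma*I
    with forward Euler steps.

    beta is chosen so that beta * S / (gamma * N) equals r_eff at day 0.
    Returns (cumulative new infections, people infectious at the end).
    """
    gamma = 1.0 / infectious_days                # removal rate per day
    susceptible = float(population - infectious)
    beta = r_eff * gamma * population / susceptible
    dt = 1.0 / steps_per_day
    infected_now = float(infectious)
    new_infections = 0.0
    for _ in range(days * steps_per_day):
        infections = beta * susceptible * infected_now / population * dt
        removals = gamma * infected_now * dt
        susceptible -= infections
        infected_now += infections - removals
        new_infections += infections
    return new_infections, infected_now


for r in (0.73, 1.03):
    new, current = sir_projection(r, population=2_600_000, infectious=5_000)
    print(f"R={r:.2f}: {new:,.0f} new infections, {current:,.0f} infectious at day 14")
```

Under these assumptions the R = 0.73 run ends with fewer people infectious than it started with and the R = 1.03 run ends with more; the absolute numbers depend entirely on the hypothetical inputs and are not comparable to the Dallas County projections.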
Informatics pitfalls experienced during the H1N1 pandemic[7] and re-experienced during the COVID-19 pandemic have shown how robust data complement surveillance strategies and ensure that public health and government entities can identify and trace cases and contacts, assess disease trends and patterns, evaluate epidemiology and transmission dynamics, and assess mitigation strategies. It is imperative that our national informatics strategy be improved. Local, regional, and national partners who share common data needs should identify a finite list of clearly defined variables for a minimum reportable data set to be shared across entities. This will reduce the reporting burden on healthcare providers.[6, 8] Existing public health information technology infrastructure must ensure high-quality data and enable real-time data sharing. Healthcare institutions must not opt out of electronic reporting to public health agencies. Electronic health record vendors must increase interoperability among institutions and support basic public health reporting to local, regional, and national public health entities at no additional cost to institutions. Unique identifiers, such as a universal healthcare identifier, should be implemented to minimize errors in demographic information and to improve the integration of healthcare data with non-healthcare data. The restrictions on data sharing under the Health Insurance Portability and Accountability Act (HIPAA) should be revisited, and criteria should be developed for the use of novel digital data, such as cellular data, with transparent public communication.[3, 9-11] Data management practices must be strengthened to avoid preventable methodological and interpretation errors, for example by standardizing analytic strategies[12, 13] and publishing best practices for presenting data through dashboards.[14, 15] These investments will facilitate responses to the COVID-19 pandemic and to the inevitable next infectious disease outbreak.

The authors of this article have no conflicts of interest to disclose.

References
Why is coronavirus data so damn difficult to communicate? Errors, lag, and perplexing charts: trying to understand COVID-19 data has become a major headache for many Georgians. Here's why
Letter from the Vice President to Hospital Administrators
The COVID-19 pandemic highlights shortcomings in U.S. healthcare informatics infrastructure: a call to action
Texas Case Counts COVID-19 Coronavirus Disease
Coronavirus Response: The Fax Machine. Before public health officials can manage the pandemic, they must deal with a broken data system that sends incomplete results in formats they can't easily use
Transmission dynamics: data sharing in the COVID-19 era. Learn Health Syst: 8
Infectious Disease Infrastructure: Impact and Continued Improvements Due to H1N1 Investments
When past is not a prologue: adapting informatics practice during a pandemic
Balancing health privacy, health information exchange, and research in the context of the COVID-19 pandemic
On the responsible use of digital data to tackle the COVID-19 pandemic
COVID-19 and the Need for a National Health Information Technology Infrastructure
Accurate Statistics on COVID-19 Are Essential for Policy Guidance and Decisions
Methodological challenges of analysing COVID-19 data during the pandemic
Tracking COVID-19 in the United States From Information Catastrophe to Empowered Communities

Figure 2