key: cord-0059729-zqkci53q
authors: Magnuson, J. A.; Hopkins, Richard; McFarlane, Timothy D.
title: Informatics in Disease Prevention and Epidemiology
date: 2020-02-27
journal: Public Health Informatics and Information Systems
DOI: 10.1007/978-3-030-41215-9_14
sha: d653f33fdbf38c23fc6f5b4ec1e4d1f2feed9c25
doc_id: 59729
cord_uid: zqkci53q

Technology changes continually, but the principles underlying informatics, epidemiology, and disease control are persistent. This chapter explores these principles and illustrates the varied information systems that support epidemiology. Concepts including reportable/notifiable diseases, passive/active surveillance, and the common components of public health prevention programs are discussed. Public health information systems support certain common functions. Four of these functions and their informatics implications are examined: public health surveillance; outbreak or cluster recognition and response; acquisition of laboratory information; and field investigation. The chapter provides an understanding of public health interoperability and integration challenges, solutions, and future goals.

Technology is always in a state of change, but the principles underlying informatics in epidemiology and disease control are persistent. Information systems are critical to the functions of public health surveillance, epidemiologic investigation of diseases and outbreaks, and case management.

Systems to support public health surveillance are often based on traditional passive case-reporting, with enhancements such as electronic laboratory reporting and direct reporting from electronic health records. Several varieties of surveillance systems are discussed in this chapter, including syndromic surveillance; registries for reporting and follow-up of cases of cancer, birth defects, lead poisoning, hepatitis B, etc.; and population-based surveys (such as BRFSS or PRAMS).

Information systems must collect data and manage summary information about outbreaks, investigations, and responses. Epidemiologic investigation of outbreaks and clusters may be supported by generic tools and specialized toolkits (see section "Outbreak Tracking").

Surveillance systems are often integrated to some degree with systems to support case management, contact tracing, and case-based disease control interventions. This chapter discusses opportunities and choices in the design and implementation of these varied systems.

1. Describe the range of information systems in current use to support public health surveillance, epidemiologic investigations, and disease prevention.
2. Identify opportunities for more effective epidemiology and disease prevention through implementation of informatics practices.
3. Describe the challenges and opportunities presented by integration of information systems for epidemiology and disease prevention.

While specific tools and technologies may change, the underlying principles and components of a disease prevention program are generally constant. Historically, public health programs to prevent disease typically have been designed and implemented one disease at a time. Each disease has its own patterns of distribution in populations and risk factors. Therefore, each disease may also have different optimal and practical intervention strategies that will be effective in controlling, preventing, or even eliminating cases of the disease. Some examples of these different prevention strategies include:

• Measles: vaccination
• Gonorrhea: antibiotic treatment of case contacts before they become ill themselves (e.g., prophylaxis)
• Cervical cancer: screening with Pap smears, treatment of preclinical disease, and HPV vaccinations
• Neural tube defects: folic acid supplementation of selected foods

Despite the variety in prevention strategies, each disease prevention program's components are drawn from a relatively short list. These common components and specific examples are presented in the accompanying table (Table 14.1). Ideally, program managers choose the most effective combination of these program components to prevent or control the disease or diseases they are charged with addressing. However, as this must be done within the constraints imposed by the available resources, cost-effectiveness and staffing are the usual criteria for choosing the preferred combination of program components.

Table 14.1 (partial) Common components and specific examples
• Planning processes leading to growth or reduction of a program
• Grant reporting, including evaluation and lessons learned
• Public health surveillance: electronic laboratory reporting (ELR) of communicable diseases; syndromic surveillance systems; emergency medical services (EMS) data surveillance
• Outbreak or cluster recognition and response

Public health agencies are typically organized both by disease and by function, a factor which has contributed to both siloed activities and siloed funding. For example, each disease-specific program usually does not have its own laboratory, and a single public health clinical facility and its staff may provide varied services such as immunizations for well children, treatment of people with tuberculosis (TB) and their contacts, and Pap smear services. To varying degrees, they may even combine activities in a single patient encounter, for example, testing women for gonorrhea and chlamydia infections at the same visit where they get a Pap smear, or offering hepatitis B vaccination during a visit for sexually transmitted diseases (STD) treatment.

As information technology has become more widely used in public health, information systems have typically been implemented program area by program area, as resources became available. This has led to the creation of information silos. For example, laboratory information systems often were developed in isolation from systems to support clinical care or public health surveillance. Similarly, cases may be present within disparate and unlinked program surveillance systems (e.g., hepatitis C and HIV/AIDS), limiting understanding of shared risk factors that may be useful in prevention and control efforts (e.g., injection drug use).

Information systems to support clinical operations of public health departments (for example, clinical services for STDs, childhood immunizations, HIV/AIDS, TB, or family planning services) have characteristics similar to those of other electronic health record systems in ambulatory care. However, in some health departments, clinical information systems have been separated by disease or clinic. If an information system were to be designed from scratch for a set of multiple disease prevention programs, there would be potential savings and efficiencies from identifying the ways that the components of one program depend upon information from another program or how the new system could serve multiple programs. Potential efficiencies can be viewed from two perspectives:

• which are frequently developed using three methods:
  - deterministic matching (exact matching), which employs highly discriminatory factors (e.g., medical record number, social security number)
  - probabilistic matching, which bases matching on the probability of two records being the same person given a set of matching factors
  - a combination of both deterministic and probabilistic techniques (also known as fuzzy matching)
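The three matching approaches listed above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the record fields, the example records, the per-field agreement probabilities (m and u), and the match threshold are invented, and the log-ratio weighting shown is only one common way to score probabilistic matches, not a description of any particular public health system.

```python
import math

# Hypothetical records for the same person held in two program systems.
std_record = {"ssn": "123-45-6789", "last": "Rivera", "first": "Ana",
              "dob": "1985-03-02", "zip": "32301"}
hep_record = {"ssn": None, "last": "Rivera", "first": "Anna",
              "dob": "1985-03-02", "zip": "32301"}

def deterministic_match(a, b, key="ssn"):
    """Exact match on a single highly discriminating identifier."""
    return a.get(key) is not None and a.get(key) == b.get(key)

# Assumed probabilities per field (illustrative values only):
#   m = P(field agrees | records are the same person)
#   u = P(field agrees | records are different people)
FIELD_WEIGHTS = {
    "last": (0.95, 0.01),
    "first": (0.90, 0.02),
    "dob": (0.97, 0.001),
    "zip": (0.85, 0.05),
}

def probabilistic_score(a, b):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_WEIGHTS.items():
        if a.get(field) == b.get(field):
            score += math.log2(m / u)              # agreement adds evidence
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return score

def combined_match(a, b, threshold=10.0):
    """Try the deterministic rule first, then fall back to the score."""
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold
```

In this sketch the two records fail the deterministic rule because one identifier is missing, yet still link on the probabilistic score despite the differing first-name spelling. In practice the threshold would need to be tuned against manually reviewed pairs.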

In reality, it is rare to have an opportunity to design such extensive information systems as a single project. Public health continues to deal with numerous legacy systems that were designed to support program-specific workflows.

A key challenge for the public health informaticist/informatician is to help their agency make decisions about where information system integration will yield substantial benefits and where it will not (e.g., unnecessary, impractical, not worth the cost).

There are numerous factors involved in making decisions about system integration. For example, if there is an isolated request to determine how many people in a jurisdiction have been reported with both syphilis and hepatitis B during a particular time interval, investigators could do an ad hoc match of information from two independent surveillance information systems. This task might take an analyst a few days or weeks to accomplish-which is almost certainly inexpensive compared to the cost of building a new information system that could do this task almost immediately. But if the request is to be ongoing, a more efficient solution should be reached.

For many purposes, it may be useful and sufficient to be able to display multiple streams of surveillance or programmatic data in the same environment, on the same screen or even in the same chart. For example, originally designed for syndromic surveillance, the ESSENCE analytic environment now commonly links pre-diagnostic, diagnostic, and ancillary data. In Florida, syndromic surveillance integrates emergency department visits, urgent care visits, death certificate information, reportable disease data, and poison center information into the ESSENCE analytic environment so that trends for similar conditions by age, sex, and geographic area in the multiple data streams can be easily compared [1] . However, if it is desired to have real-time information available to the STD clinic staff about past diagnoses of hepatitis B, or about past receipt of hepatitis B vaccine, then the information system needs to be designed to support this kind of look-up; the usual solution is a shared MPI between the two systems. Alternatively, a common data repository can be designed in which all information about each person is permanently linked.

In 2012, the Public Health Informatics Institute published a detailed analysis of the typical workflow involved in surveillance, investigation, and intervention for reportable diseases, and the corresponding information system requirements [2] . The work group of representatives from nine different state and local health departments were able to identify a large number of processes that were common to all nine jurisdictions. These common processes included case-finding, case investigation, data analysis and visualization, monitoring and reporting, and case/contact specific intervention. These common processes can then serve as a basis for designing information systems to support case-reporting, surveillance, and case-based intervention work.

As discussed earlier (Table 14.1), there are certain components common to disease control and prevention programs, such as policy and guidance development, public education, etc. Similarly, there are specific, common PH functions that are supported by PH information systems. This section addresses information systems designed to support four of those functions:

• Public health surveillance
• Outbreak or cluster recognition and response
• Acquisition of laboratory information
• Field investigation

The Centers for Disease Control and Prevention (CDC) defines public health surveillance as "the ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation and evaluation of public health practice, closely integrated with the dissemination of these data to those who need to know and linked to prevention and control" [3]. Each word of this definition is carefully chosen, and has implications for the design of surveillance information systems. A one-time data collection activity is not surveillance. Data collection for research purposes is not surveillance. Surveillance data are collected to support public health action, and analyses and recommendations based on these data must be shared with those who provided the data and with others who need to know. Well-designed electronic surveillance systems can facilitate standardization of data and improve timeliness of reporting, which in turn support rapid investigations and implementation of control and prevention activities.

Objectives of surveillance systems differ at the local, state, and federal levels. At the local level, immediate response to individual cases is relatively more important, while at the federal level the analysis of larger-scale patterns is the most important function of surveillance. For state health departments, both uses of surveillance data may be important, depending on the disease, the size of the state, and the structure of health departments (e.g., centralized vs. decentralized).

Public health surveillance systems may be based on data capture from a variety of sources, as shown in Table 14.2. Data sources may include case reports, population-based surveys, sentinel providers, electronic health records or administrative data. For some non-infectious diseases, surveillance is carried out through registries. Registries are usually established by specific legislation, and typically relate to a single topic, for example a registry of records for a disease, or of immunization records. Registries may be restricted to a geographic region.

An important distinction of diseases is that they may be reportable and/or notifiable. Reportable diseases must, by state law or regulation, be reported to the appropriate state/territory. These reports include personal identifying data. Notifiable diseases are voluntarily reported to CDC by states/territories, and do not include personal identifiers. Nationally notifiable diseases in the United States are determined by the Council of State and Territorial Epidemiologists (CSTE) in collaboration with the CDC and are modified annually. In 2019 this list included 120 diseases, 113 of which were classified as infectious [13].

Surveillance may be considered passive or active. A passive surveillance system utilizes regular, ongoing reporting based on specific criteria, such as reporting by health-care providers and electronic laboratory reporting (ELR). In passive surveillance, reporting entities initiate reports [12] as needed, following a protocol, without the health department actively collecting or soliciting them. Such systems may require considerable effort to design and implement. While passive surveillance limits the resource expenditure of the health department, the burden is shifted to reporters, and as a result case report data may be incomplete, delayed, and may not represent the true disease incidence in the population [14] . An active surveillance system requires the health department to actively collect data, and can be used to evaluate passive reporting mechanisms or supplement case reports obtained through passive surveillance when more detailed information is required (e.g., during outbreak investigations). Surveillance of chronic diseases and their risk factors based on surveys such as the Behavioral Risk Factor Surveillance System (BRFSS) and National Health and Nutrition Examination Survey (NHANES) also does not depend on providers to make case-reports. There is not a sharp line between active and passive surveillance systems. The latter can be enhanced, for example, by periodic review of reports received to identify reporting entities that appear to be delinquent in making required reports.

Information systems to support reportable disease surveillance contain records representing case reports that previously were, for the most part, entered manually into a database or information system by public health staff, based on information received from doctors, infection control practitioners, hospitals, and laboratories. Increasingly, the laboratory information in these records comes from electronic records transmitted by the public health laboratory, hospital laboratories, and commercial laboratories. Electronic laboratory reporting (ELR) is an enhanced passive reporting system in which a formatted message is triggered to transmit when a laboratory result matches specific reporting criteria (e.g., a positive IgM antibody test for hepatitis A). These records typically contain a combination of clinical, laboratory, and epidemiologic information about each case. In future, it is possible that increasing proportions of these case reports will be entered directly into a website by the practitioner creating the case report. The CDC-developed National Electronic Disease Surveillance System (NEDSS) Base System (NBS) is one of the tools available to process and share data supporting public health investigations, including an ELR user interface (UI). In 2019, 20 state health departments and six territorial/district health departments utilized NBS to manage investigations and transmit surveillance data to CDC [15] . Other alternatives include systems developed in-house, commercially available off-the-shelf (COTS) solutions, or hybrid systems that combine elements of more than one system.
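The triggering step described above, in which a laboratory result that matches reporting criteria causes a message to be generated, can be sketched as a simple rule lookup. The rule table, field names, and record layout below are hypothetical; real ELR implementations rely on standardized vocabularies and message formats that are not shown here. The single rule mirrors the hepatitis A example in the text.

```python
# Hypothetical reporting criteria: (test name, result) pairs that trigger a report.
REPORTABLE_RULES = {
    ("hepatitis A IgM antibody", "positive"): "hepatitis A",
}

def elr_messages(lab_results):
    """Return one report per laboratory result that matches a reporting rule."""
    messages = []
    for result in lab_results:
        key = (result["test"], result["result"].lower())
        condition = REPORTABLE_RULES.get(key)
        if condition is not None:
            messages.append({
                "condition": condition,
                "patient_id": result["patient_id"],
                "test": result["test"],
                "result": result["result"],
            })
    return messages

# Invented laboratory results for illustration.
sample_results = [
    {"patient_id": "A1", "test": "hepatitis A IgM antibody", "result": "Positive"},
    {"patient_id": "B2", "test": "hepatitis A IgM antibody", "result": "Negative"},
    {"patient_id": "C3", "test": "complete blood count", "result": "Normal"},
]
reports = elr_messages(sample_results)
```

Only the first sample result matches a rule, so only one report is produced. Keeping the criteria in a table rather than in code is one way to let reporting rules change without modifying the matching logic.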

Regardless of the specific underlying technology, ELR systems have clear value to PH. ELR has demonstrated improvements in both completeness and timeliness of disease reporting, providing significant benefit to PH surveillance and activities [16].

In epidemiological surveillance practice, there is usually a relatively short list of required elements in the initial case report received from a physician, hospital, or laboratory. For some diseases, this is the only information received on all cases. For other diseases, usually those with high importance and lower case numbers, an additional data collection form is initiated by the receiving health department, which gathers information as appropriate from the ill person, the treating physician, and health records. The optimum amount of information to collect in the initial case report, as opposed to the disease-specific case report form, is a matter of judgment and may change as technology changes. In a largely manual system, health departments typically desire to minimize barriers to reporting of cases, so the incentive is to keep the initial case report form short. If much of the information desired for the disease-specific case report form can in fact be extracted from an electronic health record with no additional effort by the person making an electronic case report, then the balance changes. Careful decisions are needed when determining if follow-up interviews will be necessary for certain cases of certain diseases [17] .

Until relatively recently, virtually all of the epidemiological surveillance information used at the federal level was collected initially at the local (or sometimes state) level, where it was used in the first instance for local response. In this scenario, as the case report information passes from the local to the state to the federal level, it is subjected to validation and cleaning. Cases not meeting the surveillance case definition are removed from the data submitted to the federal level; missing data have been filled in to the extent possible; and cases have been classified as to whether they are confirmed, probable, or suspected, using standard national surveillance case definitions (case definitions developed by the CSTE in consultation with CDC) [18] .

More recently, advances in technology have allowed case reports, and the information on which they are based, to move almost instantaneously from electronic health record systems, maintained by doctors, hospitals, and laboratories, to public health authorities. There are no technical barriers to these data being available at the federal level essentially as soon as they are at the local and state levels. This ready availability of unfiltered clinical information may allow more rapid awareness by public health officials at all levels of individual cases of high-priority diseases (like botulism or hemorrhagic fevers like Ebola virus infection), and thus lead to more rapid detection and characterization of likely outbreaks. Data that are rapidly transmitted in this way typically have not benefitted from cleaning, error checking, or collection of initially-missing data by local staff.

The simultaneous availability of raw data to multiple agencies at different levels of government also presents certain challenges. The user at the local level will have ready access to information from many sources about local conditions and events, which can be used to interpret local observations. They will be in a position to understand when an apparent anomaly in their surveillance data is due to an artifact or to local conditions that are not a cause for alarm. They will also know whether a problem is already under investigation. A user at a state or federal level will be able to see patterns over a larger area, and thus may be able to identify multi-jurisdictional outbreaks, patterns, or trends that are not evident at a local level. The fact that several users may be examining the same raw data at the same time requires that these multiple users be in frequent communication about what they are seeing in their data and which apparent anomalies are already explained or need further investigation. There is a danger that users at a higher level may prematurely disseminate or act on information that, while based on facts, is incomplete or misleading. Similarly, users at a local level may not realize that what they are seeing is part of a larger phenomenon.

From an information management perspective, an important question is where to put human review of case reports in this information flow. For example, it is technically possible for likely cases of reportable diseases to be recognized automatically in health care electronic record systems. Some of these could be passed on to public health authorities without human review, in the same way that reportable laboratory results are already passed on in ELR. For which constellations of findings in the electronic health record would this be appropriate? Should some electronic case reports generated by electronic health record systems be passed to state or even federal public health officials before they are reviewed and validated at the local or state levels? If so, which ones? As always, there is a tension between the speed of information flow and its quality and completeness.

There is a need for research to determine which constellations of findings in electronic health records have adequate sensitivity (the fraction of the people who have the disease who are detected) and positive predictive value (the fraction of case reports received that meet the surveillance case definition for the disease) to warrant automated identification of a person as being likely to have a case of a reportable disease. The acceptable sensitivity and positive predictive value will vary by disease.

The relationship between sensitivity and positive predictive value (PPV) can be illustrated with an example of a surveillance system that collects case reports of a disease. Reporters are asked to report cases meeting certain criteria, a surveillance case definition. It is known that, in practice, some true cases of the disease won't meet the surveillance case definition (for reasons which might include missing lab data, too early in illness, very mild case, etc.), and that some reported cases will turn out to be cases of other diseases with similar symptoms and clinical findings. If epidemiologists want to increase the sensitivity of the surveillance system, without making any other changes to the system, they can change the surveillance case definition to make it easier for potential cases to meet the definition; e.g., by dropping a requirement for laboratory confirmation by a certain test or by requiring fewer symptoms. By thus including milder or less typical cases in the scope of what is to be reported, the chance of missing cases is reduced (higher sensitivity), but the chance that the cases that are reported are not actually cases of the disease of interest is increased (lower PPV). Conversely, if the epidemiologists want to reduce the number of cases reported that are not true cases (i.e., to increase PPV), they can make the surveillance case definition tighter (e.g., requiring laboratory confirmation or a more typical clinical picture). However, this tighter definition may cause the epidemiologists to miss some true cases that would have been captured before the definition change (lower sensitivity).
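The trade-off described above can be made concrete with a small calculation. The counts are hypothetical and chosen only to show the direction of the effect: a "tight" definition that captures fewer true cases but few non-cases, and a "loose" definition that captures more of both.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of people with the disease who are detected."""
    return true_pos / (true_pos + false_neg)

def ppv(true_pos, false_pos):
    """Fraction of reported cases that are true cases."""
    return true_pos / (true_pos + false_pos)

# Hypothetical community with 100 true cases of the disease.
TRUE_CASES = 100

# Invented report counts under two surveillance case definitions.
tight = {"reported_true": 60, "reported_false": 5}
loose = {"reported_true": 90, "reported_false": 40}

def summarize(definition):
    """Return (sensitivity, PPV) for one case definition."""
    tp = definition["reported_true"]
    fn = TRUE_CASES - tp  # true cases that the definition missed
    return sensitivity(tp, fn), ppv(tp, definition["reported_false"])

tight_sens, tight_ppv = summarize(tight)
loose_sens, loose_ppv = summarize(loose)
```

With these invented counts, loosening the definition raises sensitivity from 0.60 to 0.90 while PPV falls from about 0.92 to about 0.69.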

A similar conversation is evolving around the use of electronic health records for surveillance of chronic disease. In the absence of a robust passive surveillance system, public health relies on population-based surveys for local, state, and national estimates of chronic disease prevalence. Surveys, such as the BRFSS and the more locally focused community health assessments, possess several notable limitations that may be overcome by leveraging existing data sources, e.g., electronic health records. While sampling schemes and statistical procedures are applied to make surveys representative of the population as a whole, they often lack precision in describing the health status of smaller geographical areas, such as census tracts or neighborhoods, largely due to the small sample size obtained through surveys. An illustration of this is provided by 2015 data from Marion County, Indiana, a county which contains the capital city of Indianapolis. The BRFSS sampled 357 adults (0.05% of the population) while electronic health records captured 530,244 (75% of the population) adult residents. The larger sample resulted in greater precision of prevalence estimates. Similar to automated communicable disease reporting, there is a need to balance sensitivity and positive predictive value when choosing case definitions for electronic health record-based chronic disease surveillance. As case definition sensitivity and PPV change, the resulting point estimates (e.g., prevalence) will also vary. For example, electronic health record-derived hypertension prevalence in Marion County Indiana varied from 13.7% using only diagnostic codes to 36.2% when including diagnostic codes, measured blood pressure, or dispensed blood pressure medication [19] .
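The effect of sample size on precision in the Marion County comparison can be illustrated with the usual normal-approximation confidence interval for a proportion. This is a sketch under strong assumptions: it treats both data sources as simple random samples and uses a hypothetical true prevalence of 30%. Survey weighting and design effects are ignored, and the calculation addresses sampling error only, not whether either source is representative.

```python
import math

def ci_half_width(prevalence, n, z=1.96):
    """Approximate 95% CI half-width for a proportion (simple random sample)."""
    standard_error = math.sqrt(prevalence * (1 - prevalence) / n)
    return z * standard_error

ASSUMED_PREVALENCE = 0.30  # hypothetical value, for illustration only

brfss_half_width = ci_half_width(ASSUMED_PREVALENCE, 357)     # survey sample size from the text
ehr_half_width = ci_half_width(ASSUMED_PREVALENCE, 530_244)   # EHR sample size from the text

print(f"BRFSS: +/- {brfss_half_width * 100:.1f} percentage points")
print(f"EHR:   +/- {ehr_half_width * 100:.2f} percentage points")
```

Under these assumptions the survey estimate carries a margin of roughly five percentage points, while the margin for the electronic health record sample is a small fraction of one point.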

While electronic health records hold promise to be sources of large, timely chronic disease surveillance data about a population, at this time, they should be viewed as a supplement to, and not replacement for, population-based surveys. A key advantage of population-based surveys is the ability to collect information pertaining to chronic disease indicators and risk factors. The data generated from clinical systems represent only those who come in contact with health care services and thus are not likely representative of the entire population, and the risk factor data present may be inadequate/nonexistent. Additionally, depending upon the type of clinical encounter data utilized (outpatient vs. inpatient; acute vs. wellness) estimates of chronic disease may be biased. For example, a study leveraging electronic health records to quantify childhood obesity prevalence found encounters coded as wellness visits demonstrated a lower overweight or obese prevalence (33.9%) than other encounter types (37.8%) [20] . As electronic health records become more advanced and health systems expand their focus on population health, it is possible these systems will evolve to contain sufficient risk factor data and be tuned to represent the population, or at the very least, the catchment area of the health care facilities.

In 2001, CDC published the Updated Guidelines for Evaluating Public Health Surveillance Systems [21]. This document identifies a set of key attributes of surveillance systems to be assessed during a surveillance system evaluation, including simplicity, flexibility, data quality, acceptability, sensitivity, predictive value positive, representativeness, timeliness, and stability. These are also useful attributes to consider when designing a surveillance information system [22]. The relative importance of these attributes will vary depending on the condition under surveillance and the main purposes for surveillance. As an example, a surveillance system to detect cases of botulism for immediate public health response puts a high premium on timeliness, and its operators are likely to be willing to accept a modest number of false-positive reports (i.e., a lower positive predictive value) in order to assure that reports are received very quickly. On the other hand, surveillance to support planning of cancer prevention programs and treatment services is less time-sensitive, given the quite long incubation periods for most cancers; therefore the surveillance is more concerned with diagnostic accuracy of every case report than with speed of reporting. Timeliness, PPV, and sensitivity of a public health surveillance system are always in tension with each other; increasing two of these always compromises the third. Figure 14.1 illustrates this tension with two curves separated by a vertical line; one curve for the number of healthy individuals in the population (left), another for cases of disease (right), and a vertical line representing the sensitivity and specificity of a given surveillance case definition. The area where the curves overlap is of concern as it represents potential false negative and false positive case reports, the ratio of which depends on the circumstance.
In circumstances requiring immediate public health response, such as respiratory anthrax, the vertical line of case definition can be shifted left to ensure that all true positive cases are detected. However, for a given disease prevalence, increased sensitivity is accompanied by a decrease in the likelihood that case reports truly meet the surveillance case definition (i.e., a decrease in positive predictive value). Similarly, shifting the vertical line to the right will increase the likelihood that case reports meet the surveillance case definition (increasing positive predictive value), but at the expense of an increase in undetected cases (poorer sensitivity).

Fig. 14.1 The relationship between sensitivity and positive predictive value for performance of a surveillance system
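The effect of shifting the case-definition line in Figure 14.1 can be sketched numerically. The two normal distributions, their means and spreads, the prevalence, and the threshold values are all assumptions chosen for illustration; the figure itself does not specify them. Individuals whose score falls at or above the threshold are treated as reported cases.

```python
from statistics import NormalDist

# Assumed score distributions: healthy individuals (left), cases (right).
healthy = NormalDist(mu=0.0, sigma=1.0)
cases = NormalDist(mu=2.0, sigma=1.0)
PREVALENCE = 0.10  # hypothetical disease prevalence

def performance(threshold):
    """Return (sensitivity, PPV) when scores >= threshold are reported."""
    sens = 1 - cases.cdf(threshold)   # cases to the right of the line
    spec = healthy.cdf(threshold)     # healthy people to the left of the line
    true_pos = PREVALENCE * sens
    false_pos = (1 - PREVALENCE) * (1 - spec)
    return sens, true_pos / (true_pos + false_pos)

# Shifting the line left (lower threshold) versus right (higher threshold).
left_sens, left_ppv = performance(0.5)
right_sens, right_ppv = performance(1.5)
```

Under these assumptions, moving the line left raises sensitivity and lowers PPV, and moving it right does the reverse, matching the behavior described for the figure.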

When evaluating positive and negative predictive value metrics, it is important to consider the prevalence of the disease in question. As prevalence increases it is more likely that individuals in the population will truly have the disease and as prevalence decreases it is more likely that individuals in the population will be free of disease. Consider detection of a disease in a population of 100 individuals, in which sensitivity and specificity are fixed at 85% and 70%, respectively. When the underlying disease prevalence is 30%, the positive predictive value is about 55% and the negative predictive value is about 92%. However, when the underlying disease prevalence is decreased to 10%, the positive predictive value falls to about 23% while the negative predictive value increases to about 97%.
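The prevalence example above can be checked by deriving predictive values from sensitivity, specificity, and prevalence. The function name is an illustrative choice; the arithmetic is the standard definition of positive and negative predictive value.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as fractions of the whole population."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity 85% and specificity 70%, at the two prevalences in the text.
ppv_30, npv_30 = predictive_values(0.85, 0.70, 0.30)
ppv_10, npv_10 = predictive_values(0.85, 0.70, 0.10)
```

With sensitivity and specificity held fixed, the computed PPV is about 55% at 30% prevalence and about 24% at 10% prevalence, while the NPV is about 92% and about 98%, respectively. The slightly lower figures of 23% and 97% at 10% prevalence are consistent with rounding to whole persons in a population of 100 (8 of 10 cases detected and 27 false positives give 8/35 and 63/65).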

In systems based on case-reporting from doctors, hospitals, and laboratories, and receipt of electronic health records from these same organizations, records for an individual can, in principle, be linked with records for that same individual in numerous public health information systems, including those supporting clinical service, immunization registries, case-investigation, partner or contact identification, partner or contact notification, and provision of interventions to partners or contacts. Sometimes this will be done best by automated messaging of structured data from one system to another, sometimes by supporting real-time look-up capabilities, and sometimes by development of an MPI to underlie some or all of these applications. One critical decision is which application to consider as the hub or key for this information sharing, for example, the surveillance application itself or a clinical application.

Surveillance systems that are based on sample surveys (such as the BRFSS), on sentinel practices (such as ILI-Net for surveillance of influenza-like illness [23]), or on syndromic surveillance do not (usually) have individual patient identifiers, and so intrinsically cannot be linked at the individual level to information systems supporting other disease control program components. For syndromic surveillance, additional data sources have been brought into the analytic environments, including death certificates, poison center consultations, over-the-counter medication sales, and various measures of social media attention/concern about syndromes or diseases of interest. Data for these types of surveillance are typically managed in systems built on standard statistical software packages or other independent systems.

Syndromic surveillance systems are based on rapid acquisition of unfiltered, real-time, electronic records, usually without individual identifiers, from hospital emergency rooms [24] and urgent care centers, and also, increasingly, from outpatient physicians' offices and from hospital admissions [25]. The primary purpose of these systems is to support detection and characterization of community disease outbreaks, as they are reflected in care received at emergency departments, physicians' offices, or hospitals. Each visit to an emergency department is assigned to a category or syndrome, based on words and strings contained in the patient's chief complaint and/or the triage nurse's notes. As the records received by the health department usually do not have individual identifiers, they cannot be linked to records in other information systems. However, records received by the syndromic surveillance system should contain unique identifiers that could allow the epidemiologist analyzing the data to work back through the sending facility to an identified clinical record. This traceback might become necessary if the person appeared to have a case of a reportable disease or to be part of a significant outbreak. Adding outpatient visits and hospital admissions to the scope of syndromic surveillance is opening up additional uses for this technology, especially in the areas of real-time non-infectious disease surveillance. This topic is discussed in detail in Chap. 16.
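Assigning a visit to a syndrome from the words in the chief complaint can be illustrated with simple pattern matching. The sketch below is hypothetical: the syndrome names, the keyword patterns in `SYNDROME_PATTERNS`, and the function `classify_chief_complaint` are assumptions made for illustration and do not reproduce the definitions used by any operational syndromic surveillance system.

```python
import re

# Hypothetical syndrome definitions (illustrative keyword patterns only).
SYNDROME_PATTERNS = {
    "influenza-like illness": [r"\bflu\b", r"\binfluenza\b",
                               r"\bfever\b.*\bcough\b", r"\bcough\b.*\bfever\b"],
    "gastrointestinal": [r"\bvomit", r"\bdiarrh", r"\bnausea\b"],
    "respiratory": [r"\bshortness of breath\b", r"\bwheez", r"\bcough\b"],
}


def classify_chief_complaint(text):
    """Return the list of syndromes whose patterns match the free-text complaint."""
    normalized = text.lower()
    matches = [
        syndrome
        for syndrome, patterns in SYNDROME_PATTERNS.items()
        if any(re.search(pattern, normalized) for pattern in patterns)
    ]
    return matches or ["unclassified"]


for complaint in ("Fever and cough x3 days", "Vomiting, diarrhea", "N/V/D since last night"):
    print(complaint, "->", classify_chief_complaint(complaint))
```

The last complaint uses an abbreviation that none of the patterns cover, so it falls through to "unclassified"; handling abbreviations, misspellings, and negation is a large part of the real work in syndrome classification.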

Surveillance for cancers [26], stroke [27], birth defects [28], and some other chronic diseases like amyotrophic lateral sclerosis (ALS) is carried out through registries. Registries are usually established by specific legislation, and typically relate to a single topic or class of data, for example a registry of records for a disease, or of immunization records. Registries may be restricted to a geographic region. Another distinctive feature of registries is that individual case reports are kept open for long periods of time, up to several or many years, allowing additional information about treatment, hospitalization, and death or other outcomes to be added. Registries thus serve as systems to monitor type, duration, and outcome of treatment for these diseases, in addition to the occurrence of new cases of disease (disease incidence). They may also support outreach efforts to patients or their families, as a way to document that appropriate steps have been taken to link patients to needed types and sources of care.

Most cases recorded in state-level cancer registries are acquired from hospital-level registries, using an electronic case report in a standardized format [29] . Some case abstracts are obtained directly by registry personnel or contractors, when hospitals do not have suitable registries of their own. Case reports require extensive review and abstraction of medical records by trained workers. Birth defect registries may also be built on active search for cases in hospital and other medical records, and abstraction of those records to make case reports. They also may be built by electronically linking records from vital statistics (birth and death records), centralized hospital discharge record systems, and clinical service providers for children with birth defects (such as state programs for children with special medical needs) [30] . The latter are much less expensive to develop but cannot be assumed to have captured all cases of the disease under surveillance, or captured them correctly [31] .

In general, public health surveillance informs practice through analysis, interpretation, and dissemination of data for program planning and evaluation (Fig. 14.2). Analysis of surveillance activities results in descriptive studies examining outcomes or risk factors by characteristics of person (e.g., age, race, sex, occupation, lifestyle, genetics), place (e.g., state, county, city, rurality, event, nearby industry), and time (e.g., day, week, year, season). Analytic studies often seek to compare effectiveness of public health programs to generate evidence for best practices, known as evidence-based public health. Results of analyses and interpretations should be disseminated to those providing surveillance data, as well as to public health practitioners and community stakeholders to inform current and future programs.

Fig. 14.2 Cycle of public health surveillance and program activities

A disease outbreak is defined as a number of cases greater than the number expected during a particular time interval in a geographic area or population. This term usually is used for events due to infectious diseases, and sometimes for those of toxic origin. A similar increase above expected numbers for a non-infectious disease, such as birth defects or cancer, is usually called a cluster. Outbreaks and clusters may be due to diseases for which individual cases are reportable (like shigellosis or breast cancer), or diseases for which they are not (like food poisoning due to staphylococcal or Clostridium perfringens toxins in most states, SARS when it was new, or multiple sclerosis).

Surveillance systems are designed to facilitate recognition of outbreaks or clusters by frequent examination of the most current information available. The design of the user interface is particularly important. The interface should allow users to: flexibly display line lists, case counts by date of event (epidemic curves), and maps of location of cases; flexibly select subsets of cases for display; apply appropriate statistical tests to detect improbable increases in case counts; and display multiple streams of data on the same chart. For example, users may want to display the epidemic curve of an influenza outbreak for several different regions of a state or for several different age groups, or to display counts of positive influenza tests and emergency department visits for influenza-like illness on the same graph with different scales for each. Syndromic surveillance systems have been leaders in developing and evaluating statistical algorithms for automated detection of anomalies which may, on investigation, turn out to be outbreaks. Such algorithms have less frequently been applied for automated detection of possible outbreaks or clusters in reportable disease data streams.
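One simple way to flag an improbable increase in case counts is to compare each day's count with the mean and standard deviation of a preceding baseline window. The sketch below is an illustrative assumption, not the algorithm of any particular surveillance system; the function name `flag_anomalies`, the 7-day baseline, the threshold of 3 standard deviations, and the minimum standard deviation of 0.5 are all hypothetical choices.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, baseline_days=7, threshold_sd=3.0, min_sd=0.5):
    """Return (day_index, count, z_score) for days far above their baseline.

    Each day is compared with the mean and sample standard deviation of the
    immediately preceding `baseline_days` days.
    """
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        baseline_mean = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not divide by zero.
        baseline_sd = max(stdev(baseline), min_sd)
        z_score = (daily_counts[day] - baseline_mean) / baseline_sd
        if z_score > threshold_sd:
            flagged.append((day, daily_counts[day], round(z_score, 1)))
    return flagged


# Hypothetical daily visit counts for one syndrome; the last day jumps sharply.
counts = [4, 5, 3, 6, 4, 5, 4, 5, 4, 15]
print(flag_anomalies(counts))
```

A flag produced this way is only a prompt for investigation. Day-of-week effects, holidays, and changes in reporting facilities can all produce signals that are not outbreaks.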

Most outbreaks and clusters are in fact not recognized by examination of regularly-collected surveillance system data. Instead, they are recognized by private citizens (such as the organizer of a social event, a teacher or school nurse, the manager of a child care center, the manager of a food service facility, an employer, or the ill people themselves) or by practicing doctors, and brought to public health attention via a phone call or e-mail or entry on a web site established for the purpose [32] .

Public health workers assess the information and make the decision whether or not to do a formal investigation of the outbreak. One part of such an assessment is to look at available streams of surveillance data and determine whether there is information supporting the occurrence of an outbreak. For example, a report of a possible influenza outbreak in a high school might prompt closer examination of syndromic surveillance data from nearby hospital emergency departments to determine whether there is an increase in visits for influenza-like illness. A report of a neighborhood cluster of brain cancers would prompt closer examination of available cancer registry information, which might or might not support an interim conclusion that such a cluster is real and statistically significant.

In order to be accountable for the effectiveness of their work, local and state health departments need to track the occurrence of outbreaks and the public health response to those outbreaks. Since outbreaks can be due to reportable or nonreportable diseases, this cannot be done only by actions such as identifying some cases in the reportable disease data system as being part of an outbreak. Systems to track the occurrence of outbreaks need to document a specific set of items (Table 14.3).

This information about outbreaks should be stored for ready retrieval, and to serve as a basis for quality improvement efforts. For quality improvement purposes, it is also helpful to document the content of the summary report written about each outbreak. When the outbreak is due to a reportable disease, individual cases in the reportable disease surveillance information system can be linked to the outbreak, for example by having an outbreak identifier attached to their records.

If preliminary information about outbreaks in a jurisdiction is entered into the outbreak information system in real time, as the investigation is proceeding, and if the outbreak database is readily searchable by all communicable disease investigators in the jurisdiction, then local investigators can use the outbreak database to help them with investigations of new illness or outbreak complaints. For example, if investigators receive a complaint that illness has occurred in people who consumed a particular food product, they can look in the database and determine whether other recent or current complaints or outbreaks mention the same food product. If they receive a report about a gastroenteritis outbreak in a childcare center, they can determine what agents have been found to be responsible for recent or current similar outbreaks in nearby communities; this can help focus laboratory testing and initial control strategies. Some US states have had long-standing systems to document all outbreaks investigated by local or state personnel, but others have not. A major variable in the design of such systems is the state-local division of responsibilities in each state, including the degree of state oversight of 'routine' local outbreak investigations.
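The kind of lookup described above, checking whether recent complaints or outbreaks mention the same food product, can be sketched with a small in-memory record structure. Everything here is hypothetical: the `OutbreakRecord` fields, the sample records, and the 60-day lookback are assumptions for illustration and are not the items an actual outbreak tracking system is required to document.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class OutbreakRecord:
    """Hypothetical minimal outbreak record (field names are illustrative)."""
    outbreak_id: str
    jurisdiction: str
    date_reported: date
    setting: str
    suspected_vehicles: list = field(default_factory=list)
    agent: str = "unknown"


def find_related(records, keyword, as_of, lookback_days=60):
    """Return recent records whose suspected vehicles mention the keyword."""
    cutoff = as_of - timedelta(days=lookback_days)
    needle = keyword.lower()
    return [
        record for record in records
        if record.date_reported >= cutoff
        and any(needle in vehicle.lower() for vehicle in record.suspected_vehicles)
    ]


# Hypothetical sample data.
records = [
    OutbreakRecord("OB-001", "County A", date(2024, 5, 1), "restaurant",
                   ["Raw oysters"], "norovirus"),
    OutbreakRecord("OB-002", "County B", date(2024, 5, 20), "wedding reception",
                   ["chicken salad"]),
    OutbreakRecord("OB-003", "County C", date(2023, 11, 2), "festival", ["oysters"]),
]
for match in find_related(records, "oyster", as_of=date(2024, 6, 1)):
    print(match.outbreak_id, match.jurisdiction, match.agent)
```

A production system would search a shared database rather than a Python list, but the design point is the same: preliminary records entered in real time are only useful to other investigators if they are searchable by fields such as food product, setting, and agent.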

The actual investigation of an outbreak or cluster may involve enhanced active case-finding, use of case-report forms, group surveys, and formal epidemiologic studies. Active case-finding includes regular solicitation of case reports from doctors, hospitals, and laboratories. Managing the reports of possible, probable, and confirmed cases that are part of the outbreak is an important task. For a reportable disease, the jurisdiction's reportable disease surveillance system may be adequate to manage reported cases. It may be necessary, however, to create a continuously updated line list of cases and their current status, which is outside the scope of the standard reportable disease application.

Outbreak investigation surveys will typically involve interviewing everyone with a possible exposure (e.g., all attendees of a wedding reception), whether or not they felt ill. Formal studies involve interviewing selected non-ill people as part of a case-control study. The investigation may also involve obtaining and sending to a laboratory a large number of specimens from ill persons, and sometimes from exposed non-ill persons and from environmental sources (food, water, air, soil, etc.). Managing these disparate types of information is a challenge, especially in a large outbreak or one involving multiple jurisdictions. There is currently no single widely accepted and satisfactory way to manage data in such settings. Each investigation team typically uses the tools it is most familiar with, including some combination of data management tools like MS Excel, MS Access, or Epi Info [33], and standard statistical packages such as SAS and R. Many health departments maintain libraries of standard questionnaires with associated empty databases, for use during outbreak investigations. Complaint forms, such as the one developed by the Environmental Health Specialists Network (EHS-Net), can be used to collect information and determine whether a complaint may be linked to an outbreak [34]. When CDC is involved in a multistate outbreak, the investigation team at the local or state level needs to be able to produce and transmit timely case report and other information in the format desired by CDC. The services of an experienced public health informaticist can be extremely helpful to the investigation team when outbreaks are large and multifocal. An ongoing challenge for CDC and the states is how to make the transition from specialized case reporting during an outbreak of a new disease, such as West Nile Virus encephalitis or SARS, to routine case-based surveillance. If this transition is not well-managed, it is likely to result in the creation of a permanent stand-alone surveillance information system (or silo) for that disease. If the new disease is of national importance, cases should be made nationally notifiable and its surveillance should be incorporated into existing systems.

Laboratory information is a critical component of disease surveillance and prevention, and laboratory data form the foundation of many surveillance systems. There are different types of laboratories involved in the public health data stream. Laboratories providing data to public health fall into the general categories of commercial or private industry, hospital or clinical, and public health laboratories.

Public health laboratory information systems (LIS) contain information about test results on specimens submitted for primary diagnosis; for confirmation of a commercial or hospital laboratory's results; for identification of unusual organisms; or for further characterization of organisms into subgroupings (like serotypes) that are of epidemiologic importance. In some states, all clinical laboratories must submit all isolates of certain organisms to the public health laboratory. Many of the results obtained in a public health laboratory turn out to be for diseases that are not reportable and not targets of specific prevention programs. Some of those results may, however, be for cases of non-reportable diseases that are historically rare in the jurisdiction but of great public health importance, or are new or newly recognized. Public health laboratories are discussed in detail in Chap. 15.

The main business of clinical laboratories (located both inside and outside hospitals) is to test specimens for pathogens or groups of pathogens specified by the ordering physician, and return the results to the person who ordered the test. Public health agencies have, since the early 1990s, asked or required such laboratories to also identify results meeting certain criteria (indicating the presence of a case of a reportable disease/condition) and send a copy of the results to the public health agency for public health surveillance. Initially, case reporting by laboratories was accomplished on paper forms, which were mailed or faxed to public health departments. Some laboratories very soon moved to mailing printouts of relevant laboratory results, then to sending diskettes, then to transferring computerized files containing laboratory results by direct modem-to-modem transfer, and eventually to transferring such files via the Internet using standard formats and vocabularies as is done with ELR. In some states, public clinics (e.g., STD clinics) have used contract laboratories for their testing needs. In this situation, the outside laboratory supplies both positive and negative results to the public health agency, increasingly by transfer of electronic results in standard formats.

Laboratories provide data on reportable conditions to their local or state public health authority. Reportable diseases/conditions are determined by each state; clinicians, hospitals, and/or laboratories must report to public health when these conditions are identified. Some reportable conditions may also be nationally notifiable. De-identified cases of these are voluntarily notified by states and territories to CDC. CSTE, in collaboration with CDC, decides which conditions to add to or subtract from the listing of nationally notifiable conditions that includes both infectious (e.g., rabies, TB) and non-infectious (e.g., blood lead, cancer) conditions [35] .

The public health partnership with laboratories has led to the very successful and still increasing implementation of electronic laboratory reporting (ELR) in the US. ELR refers to the secure, electronic, standards-based reporting of laboratory data to public health. ELR implementation has been steadily escalating since its inception around the year 2000, replacing previous reporting systems that relied on slower, more labor-intensive paper reporting.

The benefits of ELR include more rapid reporting of reportable cases to public health departments, allowing faster recognition of priority cases and outbreaks for investigation and response, and thus more effective prevention and control [36] . ELR also is expected to reduce the number of missed cases, as automated systems do not require laboratory staff to actively remember to make case reports, and to improve the item-level completeness and quality of case reports. Although experience shows that the expected improvements in timeliness, sensitivity, completeness, and accuracy are generally being realized [37] , timeliness may not be improved substantially for those diseases where clinicians routinely report based on clinical suspicion without waiting for laboratory confirmation (for example, meningococcal disease) [38] . In addition, laboratories often do not have access in their own information systems to home addresses for people whose specimens they are testing, and some have struggled with providing complete demographic information to public health agencies.

Implementation of an operational ELR system is not a trivial undertaking. Laboratories must configure data into an acceptable message format, most commonly Health Level Seven (HL7®) [39]. Laboratory tests and results should be reported with correlated vocabulary or content codes. Two of the most common code systems used for laboratory tests and their associated results are Logical Observation Identifiers Names and Codes (LOINC®) [40] and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT®) [41]. Neither of these systems is sufficient by itself to encode all the information needed for public health surveillance. Health data standards are discussed in more detail in Chap. 8.
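To make the idea of a coded laboratory result concrete, the sketch below splits a single HL7 version 2 observation (OBX) segment into its test code and result code. It is a minimal illustration under stated assumptions: the function `parse_obx` is hypothetical, the sample segment uses placeholder codes rather than real LOINC or SNOMED CT codes, and the parsing ignores escape sequences, repetitions, and subcomponents that a real HL7 parser must handle.

```python
def parse_obx(segment):
    """Extract test and result from one pipe-delimited HL7 v2 OBX segment.

    Field positions follow the OBX layout: OBX-2 value type,
    OBX-3 observation identifier, OBX-5 observation value, OBX-11 result status.
    """
    fields = segment.split("|")
    if fields[0] != "OBX" or len(fields) < 6:
        raise ValueError("not a usable OBX segment")

    def coded(value):
        # A coded element is code^text^coding-system; pad missing components.
        parts = (value.split("^") + ["", "", ""])[:3]
        return {"code": parts[0], "text": parts[1], "system": parts[2]}

    value_type = fields[2]
    return {
        "value_type": value_type,
        "test": coded(fields[3]),
        "result": coded(fields[5]) if value_type in ("CE", "CWE") else fields[5],
        "status": fields[11] if len(fields) > 11 else "",
    }


# Placeholder codes for illustration only; "LN" and "SCT" are the HL7
# coding-system identifiers for LOINC and SNOMED CT.
SAMPLE_OBX = ("OBX|1|CWE|00000-0^Example stool culture test^LN||"
              "000000000^Example organism^SCT||||||F")

parsed = parse_obx(SAMPLE_OBX)
print(parsed["test"]["system"], parsed["test"]["code"])
print(parsed["result"]["system"], parsed["result"]["code"], parsed["status"])
```

The example shows why both vocabularies appear together in practice: one code identifies which test was performed and the other identifies what was found.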

Historically, public health jurisdictions have introduced ELR to their partner laboratories using one or more of the following approaches:

a. Persuasion: relies on establishing goodwill and collaboration with laboratory partners. While this collegial approach is very appealing, it may be unable to overcome significant barriers such as lack of laboratory funding or resources, and some facilities will supply data only in methods specifically required by law.
b. Incentives: involves offering either financial or technical assistance to laboratory partners, assisting them in the startup process of ELR. While this approach may be preferred by many laboratories, relatively few jurisdictions have the discretionary funds (or are able to receive federal assistance funds) to implement the approach.
c. Legislation: relies on reporting rules or legislation that require laboratories to participate in ELR. Including low-cost options for smaller laboratories, such as web data entry, will allow those entities to benefit from an ELR-"lite" implementation.

The mainstreaming of ELR systems in the US has pioneered a clear trajectory for public health to begin maximizing its presence in the domain of electronic data interchange.

At a local level, case reports for communicable diseases prompt action [42] . Although the specific action varies by disease, the general approach is the same. It starts with an interview of the ill person (or that person's parents and other surrogates) to determine who or what the person was in contact with in ways that facilitate transmission, both to determine a likely source of infection and to identify other people who may be at risk from exposure to this person.

Information systems to support contact tracing, partner notification, and post-exposure prophylaxis (for STDs or TB, for example) contain records about all elicited contacts (exposed persons) for each reported case of the disease in question. These records contain information about each contact, such as where they were located, whether they received post-exposure prophylaxis, and the results of any additional partner-elicitation interviews or clinical testing that were completed. Information systems to support surveillance for other reportable diseases also increasingly contain information about what disease-appropriate action was taken in response to each case; such actions may include identification of contacts, education of household members, vaccination or antibiotic prophylaxis of contacts, isolation of the case (including staying home from work or school), or quarantine of exposed people. STD and TB information systems typically capture full locating information for contacts, and can be used both to support field work and to generate statistics on effectiveness of partner notification activities worker by worker and in the aggregate. Systems for other reportable diseases may capture only the fact that various interventions were done, and the date that these were initiated. Information about the timeliness of initiation of recommended control measures is now required as a performance measure for selected diseases, by CDC's Public Health Emergency Preparedness Cooperative Agreement [43] .

In the investigation of a case of meningococcal disease, contacts are people who had very close contact with the original person, for example a household member, boyfriend, or regular playmate. Health department staff determine who the close contacts are. Each will then be offered specific antibiotic treatment to prevent illness. For syphilis, contacts are people who have had sex with the original case. Contacts will be examined by a clinician and assessed serologically to see if they are already infected, and offered appropriate antibiotic treatment or prophylaxis. For measles, contacts may include anyone who spent even a few minutes in the same room as a case. Contacts whose exposure was recent enough, and who are not fully immunized already, will receive a dose of measles-containing vaccine, and all contacts will be asked to self-isolate immediately if they develop symptoms of measles. In investigating a common-source outbreak of legionellosis, histoplasmosis, or anthrax, the local health department may want to locate everyone who had a specified exposure to the suspected source of the infection. These exposed people may need antibiotic prophylaxis or may be told to seek medical care promptly if they become ill.

Information systems to support this type of work typically have three purposes:
1. Recordkeeping: Serve as a place for workers to record and look up information about people who are or may be contacts, and to track which contacts have and have not yet received needed interventions.
2. Information source: Serve as a source of information for calculating indices of program or worker timeliness and performance, such as the average number of sexual contacts elicited per syphilis patient interviewed, or the percentage of measles contacts who were identified in a timely way and who received post-exposure measles vaccine prophylaxis.
3. Documentation: Document the workload and effort put in by epidemiology field staff.

It seems logical that the surveillance information system should serve as the basis for a system to support field investigation, and this is often the case. The fact that the recommended interventions vary by disease makes designing a single system more complex. Existing systems that track field worker activities in detail are much more common for STD and TB programs than for others. For general communicable disease fieldwork, it is currently more common that the system simply documents which interventions were done and when, rather than using the application to track specific named contacts or exposed people.
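The performance indices mentioned above, such as contacts elicited per interviewed patient or the percentage of contacts receiving timely prophylaxis, are straightforward to compute once contact records are stored in a structured form. The sketch below is hypothetical: the record fields, the function names, and the 3-day timeliness window (`max_days`) are assumptions for illustration, not a program standard.

```python
from datetime import date


def contacts_per_interviewed_case(interviewed_case_ids, contacts):
    """Average number of contacts elicited per interviewed case."""
    if not interviewed_case_ids:
        return 0.0
    elicited = sum(1 for c in contacts if c["case_id"] in interviewed_case_ids)
    return elicited / len(interviewed_case_ids)


def percent_timely_prophylaxis(contacts, max_days=3):
    """Percentage of contacts given prophylaxis within max_days of exposure."""
    if not contacts:
        return 0.0
    timely = sum(
        1 for c in contacts
        if c["prophylaxis_date"] is not None
        and (c["prophylaxis_date"] - c["exposure_date"]).days <= max_days
    )
    return 100.0 * timely / len(contacts)


# Hypothetical contact records for three interviewed cases (case C named no contacts).
interviewed = {"A", "B", "C"}
contacts = [
    {"case_id": "A", "exposure_date": date(2024, 3, 1), "prophylaxis_date": date(2024, 3, 3)},
    {"case_id": "A", "exposure_date": date(2024, 3, 1), "prophylaxis_date": date(2024, 3, 6)},
    {"case_id": "B", "exposure_date": date(2024, 3, 2), "prophylaxis_date": None},
    {"case_id": "B", "exposure_date": date(2024, 3, 2), "prophylaxis_date": date(2024, 3, 4)},
]
print(round(contacts_per_interviewed_case(interviewed, contacts), 2))
print(percent_timely_prophylaxis(contacts))
```

These indices can only be calculated when the system tracks named contacts with dates. A system that records only that an intervention was done supports documentation but not this kind of worker-level or program-level measurement.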

Existing or planned surveillance systems for multiple diseases and conditions each have three broad functions: acquiring the raw data; cleaning, processing, analyzing, and managing the data; and making the data available to partners. Each of these functions potentially can be integrated, to varying degrees. For example, multiple surveillance systems may benefit from receiving electronic laboratory reports with a result indicating the presence of a case of a reportable disease. Laboratories appreciate having a single set of instructions and a single destination for all their required reports, as this simplifies their work. The laboratories then benefit from the ability of the recipient health department to route the reports internally to the right surveillance information system.
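Internal routing of incoming laboratory reports from a single intake point can be pictured as a lookup from reported condition to destination system. The sketch below is purely illustrative: the condition names, the destination system names in `ROUTING_TABLE`, and the fallback destination are hypothetical, and a real implementation would route on standardized codes rather than free-text condition names.

```python
# Hypothetical mapping from reported condition to internal destination system.
ROUTING_TABLE = {
    "tuberculosis": "tb_system",
    "syphilis": "std_system",
    "gonorrhea": "std_system",
    "hiv infection": "hiv_system",
    "elevated blood lead": "lead_registry",
}
DEFAULT_DESTINATION = "general_communicable_disease_system"


def route_report(report):
    """Return the name of the internal system that should receive the report."""
    condition = report["condition"].strip().lower()
    return ROUTING_TABLE.get(condition, DEFAULT_DESTINATION)


# Hypothetical incoming reports, all received at one intake point.
incoming = [
    {"lab": "Lab 1", "condition": "Gonorrhea"},
    {"lab": "Lab 2", "condition": "Tuberculosis"},
    {"lab": "Lab 1", "condition": "Salmonellosis"},
]
for report in incoming:
    print(report["condition"], "->", route_report(report))
```

Keeping the routing table inside the health department is what lets laboratories send everything to one destination under one set of instructions.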

At the other end of the data pathway, users appreciate having a single interface or dashboard with which to examine data about multiple conditions or diseases, using the same commands and definitions. The users do not have to understand how different surveillance information systems may internally code the same concept in different ways. They also appreciate being able to directly compare information that originally was submitted for the use of different program areas, for example, hepatitis B and gonorrhea in the same chart or table.

In the short to medium term, it is not necessary to build a single integrated data repository or an MPI to achieve these goals, even if that is what would be designed if developing an entirely new system. However, if users want to be able to see information about the same person that originates and is stored in multiple systems (for example, so that TB clinicians can see HIV data on their patients and vice versa), then some degree of interoperability and integration is required. Some solutions might include an integrated data repository, an MPI, a dashboard that can integrate data, or a multi-system query capability. Modifying existing systems to be able to carry out these functions is time consuming and expensive, so the business case and requirements need to be especially clear.

Public health surveillance is the foundation of public health practice. The ability of public health to respond to disease outbreaks and other health-related events in the population relies upon timely and valid surveillance data. Recent advances in technology allowing electronic transfer of disease case reports have enabled more complete and rapid reporting and public health responses compared to historical methods of paper-based and fax reporting. Technology and science continue to rapidly evolve and change, as do public health information systems. However, the underlying principles of public health informatics and information systems are quite constant. This chapter has provided an overview of epidemiologic activities and discussion of information system requirements, both historic and future.

One of the goals most important to the future of public health and public health informatics is the elimination of data silos. Public health surveillance comprises work from thousands of health care facilities, laboratories, and local, state, territorial, and tribal public health agencies. Across the levels of surveillance there is a large degree of heterogeneity in information systems, particularly between health care and public health. As a result, surveillance data are often stored in silos that cannot readily exchange data with other information systems, creating the potential for incidents of public health significance to go undetected.

One example of work towards this future is the Digital Bridge, established in 2016 to use currently available health information technology to establish bidirectional communication between public health and health care. The use of standards-based information exchange in the Digital Bridge seeks to eliminate siloed data exchange, ease health care providers' reporting burden from the current passive surveillance system, and facilitate rapid investigation of potential public health threats by automatically generating and transmitting case reports from electronic health records to public health agencies (automated electronic case reporting [44]). In addition to modernizing the reporting of reportable diseases from health care and laboratories to public health, advances in surveillance infrastructure are needed to improve collaboration within public health. Currently, state health departments submit information to more than 100 different CDC surveillance systems, which reduces the efficiency of state health departments. The CDC is currently developing the Surveillance Data Platform, a cloud-based platform that aims to centralize and share data using a standards-based approach to improve data collection, storage, and use [45].

As stated throughout this chapter, the underlying principles of public health surveillance and public health informatics remain constant, despite the emergence of new technologies. The immediate goals of public health surveillance are to focus on the application of current standards and technologies to improve data infrastructure and data sharing nationwide. The ultimate goal of eliminating data silos will require the widespread adoption of existing standards to facilitate interoperability within the public health agency as well as between the agency and health care entities (Chaps. 8 and 18). Another requirement will be the establishment of improved funding structures to avoid the creation of new silos, and to support the integration of existing silo systems. Finally, development of broader informatics visions across and between agencies will be necessary in order to achieve these improvements.

Florida's Essence system: from syndromic surveillance to routine epidemiologic analysis across syndromic and non-syndromic data sources

Public Health Informatics Institute. Redesigning public health surveillance in an eHealth world. Decatur: Public Health Informatics Institute

History of public health surveillance

National Outbreak Reporting System (NORS)

About the Behavioral Risk Factor Surveillance System (BRFSS)

European Centre for Disease Prevention and Control. Sentinel surveillance

Review Questions
1. What are some of the methods for surveillance besides case-reporting?
2. How are registries different from other surveillance information systems?
3. What are the advantages and disadvantages of building a master person index across surveillance information systems for multiple diseases?
4. What are the expected benefits of electronic laboratory reporting as a method to enhance surveillance?
5. What are the advantages and disadvantages of building a system to manage information about case contacts as part of the surveillance information system?
6. Who determines for which diseases cases are nationally notifiable?

Metropolitan Atlanta Congenital Defects Program (MACDP)

Healthcare Cost and Utilization Project (HCUP)

National Poison Data System

National Vital Statistics System

National Notifiable Diseases Surveillance System (NNDSS)

Evaluation of reporting timeliness of public health surveillance systems for infectious diseases

A comparison of the completeness and timeliness of automated electronic laboratory reporting and spontaneous reporting of notifiable conditions

Prioritizing investigations of reported cases of selected enteric infections. Paper presented at Council of State and Territorial Epidemiologists

Surveillance Case Definitions

Using electronic health records for public health hypertension surveillance

Towards estimating childhood obesity prevalence using electronic health records

Updated guidelines for evaluating public health surveillance systems

Design and operation of local and state infectious disease surveillance systems

National Syndromic Surveillance Program Knowledge Repository. Electronic syndromic surveillance using hospital inpatient and ambulatory clinical care electronic health record data: recommendations from the ISDS Meaningful Use Workgroup

National Program of Cancer Registries (NPCR)

About the Coverdell Program

Implementation Guidelines

Report on birth defects in Florida

A comparison of two surveillance strategies for selected birth defects in Florida

Online foodborne illness complaint form

Foodborne Illness Complaint Form

National Notifiable Diseases Surveillance System (NNDSS)

Statewide system of electronic notifiable disease reporting from clinical laboratories: comparing automated reporting with conventional methods

A comparison of the completeness and timeliness of automated electronic laboratory reporting and spontaneous reporting of notifiable conditions

Potential effects of electronic laboratory reporting on improving timeliness of infectious disease notification

Logical observation identifiers names and codes (LOINC ® )

SNOMED International Health Terminology Standards Development Organisation

Using technologies for data collection and management

Public health emergency preparedness cooperative agreement, budget period 1, at-a-glance: requirements and recommendations for strengthening public health emergency preparedness programs

Digital Bridge

Surveillance Data Platform (SDP) Program