key: cord-017942-og0b2l6b authors: Chen, Yi-Da; Tseng, Chunju; King, Chwan-Chuen; Wu, Tsung-Shu Joseph; Chen, Hsinchun title: Incorporating Geographical Contacts into Social Network Analysis for Contact Tracing in Epidemiology: A Study on Taiwan SARS Data date: 2007 journal: Intelligence and Security Informatics: Biosurveillance DOI: 10.1007/978-3-540-72608-1_3 sha: doc_id: 17942 cord_uid: og0b2l6b In epidemiology, contact tracing is a process to control the spread of an infectious disease and identify individuals who were previously exposed to patients with the disease. After the emergence of AIDS, Social Network Analysis (SNA) was demonstrated to be a good supplementary tool for contact tracing. Traditionally, social networks for disease investigations are constructed only with personal contacts. However, for diseases which transmit not only through personal contacts, incorporating geographical contacts into SNA has been demonstrated to reveal potential contacts among patients. In this research, we use Taiwan SARS data to investigate the differences in connectivity between personal and geographical contacts in the construction of social networks for these diseases. According to our results, geographical contacts, which increase the average degree of nodes from 0 to 108.62 and decrease the number of components from 961 to 82, provide much higher connectivity than personal contacts. Therefore, including geographical contacts is important to understand the underlying context of the transmission of these diseases. We further explore the differences in network topology between one-mode networks with only patients and multi-mode networks with patients and geographical locations for disease investigation. We find that including geographical locations as nodes in a social network provides a good way to see the role that those locations play in the disease transmission and reveal potential bridges among those geographical locations and households. 
In epidemiology, contact tracing is a process to control the spread of an infectious disease and to identify individuals who were previously exposed to patients with the disease. Through contact tracing, healthcare workers can trace the possible source of infection for those patients, monitor individuals who may develop the disease, and prevent further spread of the disease. Traditional contact tracing is based on the notion that disease spread is either serial or parallel, but linear in either case [13]. According to this notion, the transmission of infectious diseases is like a branching tree with a single source and no interconnection among leaf nodes. This notion, however, ignores the fact that people are interconnected in a complex social context and that such interconnection has implications for the spread of infectious diseases.

Although the idea of social network analysis in contact tracing can be traced back to the 1930s [1], it received major attention only after the emergence of AIDS. In 1985, Klovdahl [6] used AIDS as an example to illustrate the usefulness of Social Network Analysis (SNA) in studying the transmission of an infectious disease. Since then, SNA has been successfully applied to AIDS [8] [14], gonorrhea [4], syphilis [15], tuberculosis (TB) [7] [9], and Severe Acute Respiratory Syndrome (SARS) [10]. Through those studies, SNA has been demonstrated to be a good supplementary tool for contact tracing. Compared to traditional contact tracing, SNA provides a perspective to conceptualize the disease transmission within a group of people, identify the people who act as bridges between subgroups, and reveal some previously unrecognized patterns of transmission. In a study of a TB outbreak, McElroy et al. [9] found that the use of crack cocaine was an important factor in the outbreak which had not been recognized by the local TB control program.
Since personal contacts are the most identifiable paths for disease transmission, the majority of SNA studies in epidemiology use personal contacts to model the spread of diseases. For sexually transmitted diseases (STDs), which require intimate contact for transmission, personal contacts are adequate to form a well-connected network for investigation. However, for diseases which can also transmit through the air or through contaminated inanimate objects, such as TB and SARS, personal contacts alone are not enough to explain the underlying context of disease transmission. In a study of a TB outbreak in Houston, Texas, Klovdahl et al. [7] included places as a type of actor in SNA and found that geographical contacts were critical for understanding the outbreak. In another study of a TB outbreak, McElroy et al. [9] also included attendance at exotic dance clubs in their network analysis and discovered some potential contacts among patients.

Incorporating geographical contacts into SNA for contact tracing may raise some doubts about the connection of two people via their geographical contacts. Since geographical locations are places of social aggregation, many people are easily connected via their geographical contacts without any actual contact. However, these two studies show that incorporating geographical contacts into SNA provides a good way to find potential connections among patients and to see the role that those geographical locations play in disease outbreaks. Therefore, in this research, we aim to further investigate the necessity of incorporating geographical contacts into SNA for contact tracing and to explore the strengths of multi-mode networks with patients and geographical locations for disease investigation.

The remainder of the paper is organized as follows. In Section 2, we review the studies of SNA in epidemiology. In Sections 3 and 4, we discuss our research questions and present our research design.
In Section 5, we report our analysis results. Section 6 concludes this paper with implications and future directions.

In epidemiology, the modeling of disease transmission traditionally focuses on biological factors, such as the period of infectiousness and duration of incubation, and treats social factors as random or homogeneous mixing [11]. After the emergence of AIDS, SNA was shown to provide a better understanding of the spread of some infectious diseases which transmit through intimate personal contact [6]. SNA has two major steps: network construction and analysis.

During network construction, individuals in a group are represented as nodes in a network and connected through certain types of contact. In a study of a TB outbreak in Houston, Texas, Klovdahl et al. [7] included geographical contacts in their network and found that they were critical for understanding that outbreak. Later, McElroy et al. [9] also included geographical contacts and had similar findings in their investigation. During the SARS outbreak in 2003, several super-spreaders, patients who directly infected more than 10 other people, were reported in Singapore [2] and Beijing [16]. To illustrate SARS transmission through super-spreaders, social networks were constructed from the reported sources of patients' infection.

There are three levels of network analysis: network visualization, measurement, and simulation. At the visualization level, SNA is applied to visualize the disease transmission within a particular group. Through network visualization, we can identify subgroups within the group and the people who act as bridges to transmit the investigated disease from one subgroup to another. Combining this with background information, we can further characterize each group and explain the underlying context of the transmission. In their investigation of a syphilis outbreak, Rothenberg et al.
[15] reported that network visualization uncovered several people who acted as bridges between subgroups. They also included ethnographic data in their analysis and indicated that a group of young white girls was at the center of the outbreak and had interactions with several groups of boys, including a group of African-American boys.

Network visualization for disease investigation suffers from some limitations. First, if the input network data are incomplete, the constructed network may have missing nodes or fragments which could mislead our interpretation of the investigated scenario [4]. Second, the constructed network alone cannot provide a complete picture of the disease transmission [13]. We need complementary data, such as ethnographic data, to explain the environmental context.

At the measurement level, we measure structural properties of the constructed networks and make inferences about disease transmission from those properties. In epidemiology, the measurement focuses on the centrality of networks, which provides information on the importance of individuals in a network and reveals potential bridges between subgroups. By understanding the network structure, we can identify the group of people who are central to the disease transmission and design effective containment strategies to break the chains of transmission [6]. Most SNA studies construct and measure only one social network. However, some studies construct several networks at different points in time in order to investigate changes in network structure over time. For example, in a study of AIDS transmission, Rothenberg et al. [14] constructed three social networks with the same patients at different points in time and assessed network stability, behavioral change, and structural change among these three networks.
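As a minimal sketch of the measurement level described above, the following computes two common centrality measures, degree and betweenness, on a small network. The network (two tightly knit subgroups joined through one person), the node labels, and the function names are hypothetical and are not taken from any of the cited studies; betweenness is computed with Brandes' algorithm, which is one standard choice rather than the method those studies necessarily used.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) contacts."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node: the number of lines incident with it."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected network."""
    betweenness = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Every unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in betweenness.items()}


# Hypothetical network: subgroups {A, B, C} and {E, F, G} joined through D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
network = build_adjacency(edges)
degree = degree_centrality(network)
betweenness = betweenness_centrality(network)
bridge = max(betweenness, key=betweenness.get)
print(bridge, degree[bridge], betweenness[bridge])  # D 2 9.0
```

In this toy network, D has only two contacts but lies on every shortest path between the two subgroups, which is the sense in which centrality measures can flag a potential bridge that degree alone would not.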
In network simulation, mathematical models are used to form a contact network in which several parameters of the network may influence the transmission of the investigated disease. Network simulation is particularly useful when we have only limited knowledge about the investigated disease. For example, Meyers et al. [10] used three different networks, urban, power law, and Poisson, to study how contact patterns influence SARS transmission and found that in the same network settings, different contact patterns yield totally different epidemiological outcomes.

The studies of SNA in epidemiology primarily use only personal contacts to construct social networks. This may stem from the fact that personal contacts are the most common and identifiable paths for disease transmission. However, for some diseases which transmit through multiple vectors, personal contacts alone may not be sufficient to explain the underlying context of the transmission. In an investigation of a TB outbreak in Houston, Texas, Klovdahl et al. [7] reported that among 37 investigated patients only 12 personal contacts were found to explain that outbreak. Therefore, they incorporated geographical contacts into SNA by including geographical locations as nodes in their network and found that geographical contacts were critical for understanding the outbreak. In a study of another TB outbreak, McElroy et al. [9] included attendance at exotic dance clubs in their investigation and found some potential contacts among patients.

Incorporating geographical contacts into SNA for contact tracing may raise some doubts about the connection of patients via their geographical contacts. First, geographical locations are places of social aggregation and it is easy to find many potential contacts among patients through their geographical contacts. Second, it is also questionable to connect two patients together only because they have been to the same place. They may have been to the place in different months.
However, as noted by Klovdahl et al. [7], "the appropriate relative person/place emphasis would depend on characteristics of an outbreak and the populations at risk." For diseases which transmit not only through personal contacts and may have hospital or community outbreaks, incorporating geographical contacts into investigations would pinpoint the role that those geographical locations play in disease transmission.

Since geographical locations are places of social aggregation, including geographical contacts in SNA helps to find potential connections among patients but also brings some noise into network analysis. Previous studies have not systematically discussed the necessity of incorporating geographical contacts or investigated the differences in connectivity between personal and geographical contacts in the construction of social networks or, more precisely, contact networks for disease investigation. Previous studies also have not explored the differences in network topology between one-mode networks with only patients and multi-mode networks with patients and geographical locations. In this research, we aim to answer the following questions:

1. What are the differences in connectivity between personal and geographical contacts in the construction of contact networks for the diseases which transmit not only through personal contacts?
2. What are the differences in network topology between one-mode networks with only patients and multi-mode networks with patients and geographical locations for disease investigation?

In this section, we first introduce our research test bed, which contains Taiwan SARS data, and summarize the important events of SARS transmission in 2003. Then we present our research design. All data reported are scrubbed to protect privacy.

SARS first emerged in Guangdong Province, mainland China, in November 2002 [12]. By late February 2003, it had spread to more than 20 countries, including Hong Kong, Singapore, Canada, and Taiwan.
SARS was transmitted mainly through close personal contacts with infectious droplets. However, there was evidence showing that SARS could also transmit through the air [17] and contaminated inanimate objects [3]. SARS patients were primarily infected in healthcare and hospital settings [12]. The first SARS patient in Taiwan, a businessman working in Guangdong Province, was reported on 8 March 2003 [5]. On 24 April, Taipei Heping Hospital, a municipal hospital, was reported to have the first SARS outbreak in Taiwan. In the beginning of May, several hospitals in Taiwan were reported to have SARS outbreaks.

Taiwan SARS data was collected by the Graduate Institute of Epidemiology at National Taiwan University. It contains the contact tracing data of all SARS patients in Taiwan. In the test dataset, there are 961 patients, including 638 suspected and 323 confirmed SARS patients. The contact tracing data has two main categories, personal and geographical contacts, and nine types of contact. Table 2 summarizes the numbers of records and patients involved in each type of contact.

Figure 1 shows our research design. We first apply personal and geographical contacts to construct contact networks. Then we perform connectivity analysis and evaluate network topology on the constructed networks.

For personal contacts, we represent each SARS patient as a node in a contact network. If two patients have any personal contact, such as being family members, we connect these two patient nodes with a line. For geographical contacts, we introduce areas, such as countries or cities, and hospitals as two additional kinds of node in the contact network. For each geographical contact, we connect the patient node of that contact to the node of the geographical location with a line. Figure 2 demonstrates the contact network construction.

Connectivity analysis has two steps: network transformation and connectivity measurement.
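The construction rules described above (patient nodes joined by personal contacts, plus area and hospital nodes joined to the patients who visited them) can be sketched as follows. This is an illustrative sketch only: the record layout, the function names, and the four example patients are hypothetical and do not reproduce the Taiwan SARS data or the exact network in Figure 2.

```python
def build_contact_network(patients, personal_contacts, geographical_contacts):
    """Build an undirected multi-mode contact network as an adjacency map.

    Nodes are (kind, name) tuples, where kind is "patient", "area",
    or "hospital".
    """
    adjacency = {}

    def connect(u, v):
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Every patient is a node, even one with no recorded contact.
    for patient in patients:
        adjacency.setdefault(("patient", patient), set())

    # Personal contact: a line between two patient nodes.
    for first, second in personal_contacts:
        connect(("patient", first), ("patient", second))

    # Geographical contact: a line between a patient node and a location node.
    for patient, kind, location in geographical_contacts:
        connect(("patient", patient), (kind, location))

    return adjacency


def count_nodes_by_kind(network):
    """Number of nodes of each kind in the network."""
    counts = {}
    for kind, _name in network:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def count_edges(network):
    """Number of lines; each one appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in network.values()) // 2


# Hypothetical contact tracing records.
patients = ["P1", "P2", "P3", "P4"]
personal_contacts = [("P1", "P2")]
geographical_contacts = [
    ("P2", "area", "Area 1"),
    ("P3", "area", "Area 1"),
    ("P3", "hospital", "Hospital 1"),
]

network = build_contact_network(patients, personal_contacts, geographical_contacts)
node_counts = count_nodes_by_kind(network)
edge_count = count_edges(network)
print(node_counts, edge_count)
```

Keeping the node kind in the node key is one simple way to hold patients, areas, and hospitals in a single multi-mode network while still being able to tell them apart later.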
In this research, we want to know how many other patients a particular patient is connected with after we apply a certain type of contact to construct a network. We first construct a contact network with only the type of contact we want to investigate. Table 3 lists the types of contact in our investigation.

Since a constructed contact network is a multi-mode network which has three kinds of nodes, patients, areas, and hospitals, we need to transform it into a one-mode network which has only one kind of node, patients. After the transformation, we can use standard SNA measures to determine how many other patients a particular patient connects to through the type of contact under investigation. In the network transformation, we connect two patient nodes in the one-mode network if they are connected to the same geographical node in the contact network. For example, in Figure 2, Patients 2 and 3 are connected to Area 1 in that contact network. In the network transformation, we therefore connect Patients 2 and 3 in the one-mode network since both have been to Area 1.

In this research, connectivity is defined as the degree to which a type of contact can connect individual patients to form a network. We use two measures commonly used in SNA, the degree of a node and the number of components, to measure connectivity. The degree of a node in a network is the number of lines that are incident with the node. The number of components is the number of maximal connected sub-networks in a network. Logically, if a type of contact has high connectivity in network construction, the degree of a patient node should increase significantly from zero and, at the same time, the number of components should decrease significantly from the total number of nodes.
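The two steps of connectivity analysis described above, transformation to a one-mode network and measurement of degree and number of components, can be sketched as follows. The five patients, their contacts, and all function names are hypothetical; the sketch only illustrates the definitions in the text and does not reproduce the study's data or software.

```python
from collections import defaultdict
from itertools import combinations


def project_to_one_mode(patients, personal_contacts, location_visits):
    """Build the one-mode patient network.

    Two patients are connected if they have a personal contact or if they
    are connected to the same geographical node (area or hospital).
    """
    adjacency = {patient: set() for patient in patients}
    for first, second in personal_contacts:
        adjacency[first].add(second)
        adjacency[second].add(first)

    visitors = defaultdict(set)
    for patient, location in location_visits:
        visitors[location].add(patient)
    for group in visitors.values():
        for first, second in combinations(sorted(group), 2):
            adjacency[first].add(second)
            adjacency[second].add(first)
    return adjacency


def average_degree(adjacency):
    """Mean number of lines incident with a node."""
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def count_components(adjacency):
    """Number of maximal connected sub-networks."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def connectivity(patients, personal_contacts, location_visits):
    network = project_to_one_mode(patients, personal_contacts, location_visits)
    return average_degree(network), count_components(network)


# Hypothetical records: Patients 2 and 3 both went to Area 1.
patients = [1, 2, 3, 4, 5]
personal_contacts = [(1, 2)]
location_visits = [(2, "Area 1"), (3, "Area 1"),
                   (3, "Hospital 1"), (4, "Hospital 1")]

baseline = connectivity(patients, [], [])
personal_only = connectivity(patients, personal_contacts, [])
geographical_only = connectivity(patients, [], location_visits)
print(baseline)           # (0.0, 5)
print(personal_only)      # (0.4, 4)
print(geographical_only)  # (0.8, 3)
```

With no contacts applied, every patient is its own component and the average degree is zero; each type of contact is then judged by how far it raises the average degree and lowers the number of components from that baseline.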
After each constructed contact network is transformed into a one-mode network, we perform the measurement on it.

In this research, we also want to know whether including geographical locations as nodes in contact networks provides additional insights in disease investigation. We first evaluate the differences in network topology between a contact network and its transformed one-mode network. Then we analyze the potential insights that a contact network with geographical nodes provides.

In this section, we present our research results. In the connectivity analysis, we first evaluate the differences in connectivity between the two main categories, personal and geographical contacts, in the construction of SARS contact networks. Table 4 shows the results of connectivity analysis for personal and geographical contacts. From Table 4, we can see that geographical contacts provide much higher connectivity than personal contacts do in the SARS study. If we apply only personal contacts to construct a contact network, the average degree of nodes increases from 0 to 0.31 and the number of components decreases from 961, the total number of patient nodes, to 847. In other words, the contact network constructed with only personal contacts is too sparse to give a comprehensive understanding of SARS transmission. In contrast, geographical contacts increase the average degree of nodes from 0 to 108.62 and decrease the number of components from 961 to 82. This means that the majority of SARS patients have no personal contacts with each other and that the connections among them are mainly through attendance at some hospitals or high-risk areas.

In Table 4, the maximum degree of nodes in geographical contacts is 474. The node with the maximum degree represents a news reporter. He went to two hospitals during the outbreak investigation and then was admitted to another hospital after his onset of symptoms.
Therefore, through his geographical contacts he is connected to other patients of these three hospitals.

Table 5 shows the connectivity analysis of all nine types of contact. We can see that hospital admission history has the highest connectivity and hospital visit the second highest. This is consistent with the fact that patients are primarily infected in hospitals. Among personal contacts, family member has the highest connectivity. However, its connectivity, as measured by average degree and number of components, is still lower than that of any type of contact in the geographical contacts category.

We use the contact network with all available personal and geographical contacts to explore the differences in network topology between the contact network and its transformed one-mode network. In this contact network, there are 961 patient nodes, 22 area nodes, 14 hospital nodes, and a total of 1313 edges. Figures 3 and 4 show the contact network and its transformed one-mode network. As seen in the one-mode network shown in Figure 4, with only patient nodes, it is difficult to distinguish the interconnections between them through network visualization when their relationships get complicated. Compared to its one-mode network, the contact network clearly shows the role of those geographical locations in disease transmission. Since there were several hospital outbreaks during the SARS period in Taiwan, we can see from Figure 3 that the majority of patient nodes are spread around the hospital nodes in the contact network.

Furthermore, incorporating geographical locations into the modeling of disease transmission also helps to reveal people who potentially act as bridges to transfer a disease from one subgroup to another. Figure 5 shows the potential bridges among hospitals and households in the contact network. For a hospital outbreak, including geographical contacts in the network is also useful to see the possible disease transmission scenario.
Figure 6 demonstrates the evolution of a small contact network of Heping Hospital through the onset dates of symptoms.

The studies of SNA in epidemiology primarily use personal contacts to construct social networks and model the transmission of diseases. However, for diseases which also transmit through the air or through contaminated inanimate objects, personal contacts alone are not enough to explain the underlying transmission context. Previous studies have demonstrated that including geographical contacts in SNA can reveal potential contacts among patients. In this research, by using Taiwan SARS data as the test dataset, we further investigate the differences in connectivity between personal and geographical contacts in the network construction for these diseases. Our results show that geographical contacts provide much higher connectivity in network construction than personal contacts. Therefore, for modeling the transmission of these diseases, incorporating geographical contacts into SNA is necessary to construct a well-connected contact network for investigation.

We also explore the strengths of including geographical locations as nodes in the network visualization. Our results show that introducing geographical locations in SNA provides a good way not only to see the role that those locations play in the disease transmission but also to identify potential bridges between those locations. If we apply some context data, such as the onset dates of symptoms, we can further understand the development of some outbreaks.

For SARS, there is evidence showing that some patients were infected on flights or trains [12]. Incorporating transportation contacts into SNA will be our next focus. In addition, including geographical locations in SNA helps to reveal potential contacts but also brings some noise into network analysis. Therefore, we will also study approaches to filter those potential contacts and find the critical ones.
Automating Social Network Models for Tuberculosis Contact Investigation
Severe Acute Respiratory Syndrome - Singapore
The Role of Sexual Partnership Networks in the Epidemiology of Gonorrhea
Severe Acute Respiratory Syndrome Epidemic in Taiwan
Social Networks and the Spread of Infectious Diseases: the AIDS Example
Networks and Tuberculosis: an Undetected Community Outbreak Involving Public Places
Social Networks and Infectious Disease: the Colorado Springs Study
A Network-Informed Approach to Investigating a Tuberculosis Outbreak: Implications for Enhancing Contact Investigations
Network Theory and SARS: Predicting Outbreak Diversity
Epidemiology and Social Networks: Modeling Structured Diffusion
The Severe Acute Respiratory Syndrome
Commentary: the Relevance of Social Network Concepts to Sexually Transmitted Disease Control
Social Network Dynamics and HIV Transmission
Using Social Network and Ethnographic Tools to Evaluate Syphilis Transmission
Evidence of Airborne Transmission of the Severe Acute Respiratory Syndrome Virus