authors: Willem, Lander; Hoang, Thang Van; Funk, Sebastian; Coletti, Pietro; Beutels, Philippe; Hens, Niel
title: SOCRATES: An online tool leveraging a social contact data sharing initiative to assess mitigation strategies for COVID-19
date: 2020-03-06
DOI: 10.1101/2020.03.03.20030627

Objective: Establishing a social contact data sharing initiative and an interactive tool to assess mitigation strategies for COVID-19.

Results: We organized data sharing of published social contact surveys via online repositories and formatting guidelines. We analyzed these social contact data in terms of weighted social contact matrices, next generation matrices, relative incidence and $R_0$. We incorporated location-specific isolation measures (e.g. school closure or telework) and captured their effect on transmission dynamics. All methods have been implemented in an online application based on R Shiny and applied to COVID-19 with age-specific susceptibility and infectiousness. Using our online tool with the available social contact data, we illustrate that social distancing could have a considerable impact on reducing transmission for COVID-19. The effect itself depends on assumptions made about disease-specific characteristics and the choice of intervention(s).

Keywords: social contact data, user interface, transmission dynamics, infectious diseases, epidemics, social distancing, behavioral changes, data sharing initiative, open-source, COVID-19

Given the expanding pandemic of SARS-CoV-2, which causes COVID-19 disease, it is of great importance to consider and plan intervention strategies to slow down SARS-CoV-2 spread, with the aim of flattening the epidemic peak and thus reducing surge capacity problems in health care provision and essential supplies.
This may also buy time for interventions, such as specific antivirals (and perhaps vaccines), to become available for widespread use [1]. Social distancing on a large scale, first at the epicentre of the outbreak in Wuhan and later in other locations, was shown to slow down SARS-CoV-2 spread (e.g. in Shanghai [2]).

Social contact surveys have proven to be an invaluable source of information about how people mix in the population [3] [4] [5]. They have been shown to explain close contact infectious disease data well [6] [7] [8]. During the A(H1N1)v2009 pandemic, contact survey data were used to reproduce the observed incidence patterns of the emerging outbreak [9]. Hens et al. [10] used social contact data collected in the POLYMOD project [4] to quantify the impact of school closure on the spread of airborne infections. This was done by comparing the basic reproduction number, i.e. the average number of secondary infections caused by a single infectious individual in a completely susceptible population, derived using mixing patterns observed on weekends or during a holiday period with that derived using mixing patterns observed on weekdays. By considering mixing patterns at different locations, and including or excluding the contribution of some of these locations, social distancing measures can be mimicked and their impact on disease spread can be investigated to potentially guide policy makers.

In this research note, we highlight a social contact data sharing initiative we recently launched and present an online tool to facilitate access to these data. We build upon the socialmixr R package [11] and hope to contribute to the analysis of social distancing measures. As a case study, we use the tool to quantify the potential impact of school closures and of a shift of workers from a common workplace to teleworking at home.

The social contact data sharing initiative started under the umbrella of the ERC consolidator grant "TransMID" (Grant number: 682540).
Following a systematic review [3], the authors of publications describing different social contact surveys were contacted to share their data subject to ethical approvals and GDPR compliance. These authors were either requested to format their data according to guidelines we developed during a TransMID Social Contact Data Hackathon (6 and 7 November 2017), or the data were reformatted by TVH and PC. Each survey is split into multiple files to capture data on participants, contacts, survey days, households and time use. For each data type, there is one "common" file containing the variables that are available in most contact surveys, and an "extra" file containing more specific variables related to the survey. Each data set contains a dictionary to interpret the columns correctly (see socialcontactdata.org for more information).

To extrapolate survey data to the country level, we apply participant weights to account for age and the day of the week on which they participated. Reference data on demography are based on the United Nations' World Population Prospects 2015 provided by the wpp2015 package [12]. Weights for type of day account for the proportion of weekdays (5/7) and weekend days (2/7) in a week. We constrain weights to a maximum of 3 to limit the influence of single participants. We denote by $w^{d}_{it}$ the weight for participant $t$ of age $i$ who was surveyed on day type $d \in \{\text{weekday}, \text{weekend}\}$. The $(i, j)$th element of the (weighted) social contact matrix, $m_{ij}$, represents the mean number of contacts with people in age class $j$ during one day reported by a respondent in age class $i$ and can be estimated by:

$$m_{ij} = \frac{\sum_{t} w^{d}_{it} \, y_{ijt}}{\sum_{t} w^{d}_{it}}$$

where $y_{ijt}$ denotes the reported number of contacts made by participant $t$ of age $i$ with someone of age $j$, and the sums run over all participants $t$ of age $i$. By nature, contacts are reciprocal and thus $m_{ij} N_i$ should be equal to $m_{ji} N_j$. Due to differences in reporting, reciprocity needs to be imposed by considering

$$m^{\text{reciprocal}}_{ij} = \frac{m_{ij} N_i + m_{ji} N_j}{2 N_i}$$

with $N_i$ the population size in age class $i$ [13].
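The weighting and reciprocity steps described above can be illustrated with a minimal Python sketch. It is not the socialmixr implementation: the participant layout, the example numbers and the function names are invented for illustration, and the averaging formula in `make_reciprocal` is one common way to make $m_{ij} N_i$ equal $m_{ji} N_j$.

```python
MAX_WEIGHT = 3.0  # cap on participant weights, as stated in the text


def weighted_contact_matrix(participants, n_age):
    """Weighted mean number of daily contacts m[i][j].

    participants: list of (age_class, weight, contacts), where contacts[j]
    is the number of contacts reported with age class j (hypothetical layout).
    """
    numerator = [[0.0] * n_age for _ in range(n_age)]
    denominator = [0.0] * n_age
    for age, weight, contacts in participants:
        w = min(weight, MAX_WEIGHT)  # limit the influence of single participants
        denominator[age] += w
        for j in range(n_age):
            numerator[age][j] += w * contacts[j]
    if any(d == 0.0 for d in denominator):
        raise ValueError("every age class needs at least one participant")
    return [[numerator[i][j] / denominator[i] for j in range(n_age)]
            for i in range(n_age)]


def make_reciprocal(m, population):
    """Average total contacts so that m_rec[i][j] * N_i == m_rec[j][i] * N_j."""
    n = len(m)
    return [[(m[i][j] * population[i] + m[j][i] * population[j])
             / (2.0 * population[i]) for j in range(n)]
            for i in range(n)]


# Invented example: two age classes, four participants.
participants = [
    (0, 1.0, [4, 2]),
    (0, 5.0, [2, 4]),  # weight 5 is capped to 3
    (1, 1.0, [1, 3]),
    (1, 1.0, [3, 5]),
]
population = [100, 200]  # hypothetical population sizes N_i

m = weighted_contact_matrix(participants, n_age=2)
m_rec = make_reciprocal(m, population)
print(m)      # [[2.5, 3.5], [2.0, 4.0]]
print(m_rec)  # [[2.5, 3.75], [1.875, 4.0]]
```

In the example, the total number of contacts between the two classes is the same from both sides after the adjustment (3.75 × 100 = 1.875 × 200 = 375), which is the property reciprocity is meant to restore.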
This reciprocal behavior might not be valid for specific contact types, e.g. contacts at work for retail workers are most likely not contacts at work for their customers. Therefore, reciprocity should not always be imposed.

The next generation matrix $G$ with elements $g_{ij}$ indicates the average number of secondary infections in age class $i$ through the introduction of a single infectious individual of age class $j$ into a fully susceptible population [14]. The next generation matrix is defined by:

$$G = D \, M \, q$$

with $D$ the mean duration of infectiousness, $M$ the contact matrix and $q$ a proportionality factor [8, 10]. The proportionality factor $q$ can be age-dependent and combines several characteristics that are related to susceptibility and infectiousness. It can also be considered a correction factor expressing the extent to which the contact matrix is a proxy for the circumstances under which transmission between infectious and susceptible persons occurs for the particular pathogen under analysis. The basic reproduction number $R_0$ can be calculated as the dominant eigenvalue of the next generation matrix. The expected incidence by age is proportional to the leading right eigenvector of $G$ [4].

We focus on interventions and how they affect $R_0$ and the relative incidence. By cancelling out disease-specific features (though these could be readily implemented by allowing the proportionality factor $q$ to be age-dependent), we focus on the impact of adjusted social contact patterns only, in line with the so-called social contact hypothesis [6]. To estimate the relative change in $R_0$, we used the $R_0$ ratio:

$$\frac{R_{0,a}}{R_{0,b}} = \frac{\max \operatorname{eigenvalue}(G_a)}{\max \operatorname{eigenvalue}(G_b)}$$

where indices $a$ and $b$ refer to the different conditions. The $R_0$ ratio can be estimated using only social contact rates when assuming $q$ to be constant, since the normalizing constants cancel [10]. Under the same condition, the ratio of relative incidences is given by the ratio of normalised right eigenvectors for conditions $a$ and $b$, respectively.
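As a worked illustration of these definitions, the sketch below computes $R_0$ as the dominant eigenvalue of $G$, the $R_0$ ratio between two conditions, and the ratio of relative incidences from the normalised right eigenvectors. It is a minimal example under stated assumptions: the two-age-class contact matrices and the values of $q$ and $D$ are invented, $q$ is taken as constant, and a plain power iteration (which assumes a non-negative, primitive matrix) stands in for a full eigenvalue routine.

```python
def dominant_eigenpair(matrix, max_iter=1000, tol=1e-12):
    """Power iteration: dominant eigenvalue and right eigenvector (summing to 1).

    Assumes a non-negative, primitive matrix, so the iteration converges.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    value = 0.0
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        value = sum(product)  # vector sums to 1, so this estimates the eigenvalue
        new_vector = [x / value for x in product]
        change = max(abs(a - b) for a, b in zip(new_vector, vector))
        vector = new_vector
        if change < tol:  # stop once the eigenvector itself has settled
            break
    return value, vector


def next_generation_matrix(contacts, q, duration):
    """G = D * M * q for a constant proportionality factor q."""
    return [[duration * q * m_ij for m_ij in row] for row in contacts]


# Hypothetical contact matrices for two age classes (not survey data).
M_baseline = [[1.0, 2.0], [3.0, 2.0]]
M_intervention = [[0.0, 1.0], [3.0, 2.0]]
q, D = 0.1, 6.0  # invented values; they cancel in the ratios below

r0_base, inc_base = dominant_eigenpair(next_generation_matrix(M_baseline, q, D))
r0_int, inc_int = dominant_eigenpair(next_generation_matrix(M_intervention, q, D))

r0_ratio = r0_int / r0_base
incidence_ratio = [a / b for a, b in zip(inc_int, inc_base)]
print(round(r0_base, 6), round(r0_int, 6), round(r0_ratio, 6))  # 2.4 1.8 0.75
print([round(x, 6) for x in incidence_ratio])                   # [0.625, 1.25]
```

Because $q$ and $D$ are constant, running `dominant_eigenpair` directly on the contact matrices gives the same $R_0$ ratio (3/4). The example also shows that the relative incidence in one age class can rise (ratio 1.25) while $R_0$ falls, since relative incidence only describes how infections are distributed over age classes.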
Based on the reported contact locations, it is possible to exclude or reduce subsets of the social contact data. To do so, contacts at multiple locations were assigned to a single location in the following hierarchical order: (1) contacts at home, (2) contacts at work, (3) contacts at school, (4) contacts during transportation, (5) contacts during leisure activities and (6) contacts in other locations. For example, school closure can be simulated by excluding all contacts that took place "at school" before calculating $m_{ij}$. To account for an increase in telework to proportion $p^{\text{target}}_{\text{telework}}$, we rescale the observed social contacts at work $M^{\text{observed}}_{\text{work}}$ using the observed proportion of telework $p^{\text{observed}}_{\text{telework}}$:

$$M^{\text{telework}}_{\text{work}} = M^{\text{observed}}_{\text{work}} \cdot \frac{1 - p^{\text{target}}_{\text{telework}}}{1 - p^{\text{observed}}_{\text{telework}}}$$

To simulate the effect of telework and school closure, the social contact matrix $M$ is calculated as:

$$M = M_{\text{home}} + M^{\text{telework}}_{\text{work}} + M_{\text{transport}} + M_{\text{leisure}} + M_{\text{other}}$$

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

We used the R package shiny [15] to build an interactive web application to access and visualise the social contact data. This application consists of a user interface (UI) and a server script that use data processing algorithms based on the socialmixr package [11]. The UI enables the selection of a country, age categories, type of day (weekday, weekend, holiday, regular), contact duration (<15min, >15min, >1h, >4h), contact intensity (physical or non-physical) and gender (female-male, male-female, male-male, female-female). Using a selection box, the user can opt to disable the assumption of reciprocity and the participant weights by age and type of day. Finally, the user can enable reactive strategies such as school closure and an increased level of telework. Please note that the proportion of telework can only increase given a specified observed proportion.
The extrapolation of social contact matrices given reductions in telework falls outside the scope of this project.

Based on the selected inputs shown on the left-hand side of the screen, the social contact matrix $M$ is plotted on the right-hand side. We use a color scale to indicate the number of contacts and superimpose the numeric values on the figure. Below this figure, the principal results of the social contact analysis are printed: the elements of $M$ along with participant info. For reciprocal and/or weighted matrices, the demography data and weights used are also displayed. If reactive strategies are selected, the effects in terms of $R_0$, $M$ and the relative incidence ratios are presented. All results can be downloaded as an RData file. Note that we will continue to develop this tool and thus the output, plots and scenarios might change in future editions.

We estimate the effect of school closure and telework on disease transmission dynamics. To do this, we use 3 age classes: 0-18 years, 19-60 years and over 60 years of age. For each country, we calculate contact rates between age groups after excluding data from holiday periods. To simulate quarantine-like scenarios, we exclude compensation behavior when people do not go to work or school. We fixed the reference proportion of telework at 5%, in line with European observations [Eurostat, 2020; https://ec.europa.eu/eurostat/data/]. We analyse the change in transmission dynamics with 20%, 35% and 50% telework, with and without school closure, based on earlier survey-based responses on the possibilities of employees to conduct their work activities remotely as teleworkers [16].
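The case-study scenarios can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the application's code: work contacts are assumed to scale with the share of workers still at the workplace, i.e. by (1 - target telework) / (1 - observed telework); school closure drops the school matrix entirely; the location-specific matrices for the three age classes are invented; and "other" lumps together transport, leisure and other locations to keep the example short.

```python
def dominant_eigenvalue(matrix, max_iter=1000, tol=1e-12):
    """Power iteration for a non-negative, primitive matrix."""
    n = len(matrix)
    vector = [1.0 / n] * n
    value = 0.0
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        value = sum(product)
        new_vector = [x / value for x in product]
        change = max(abs(a - b) for a, b in zip(new_vector, vector))
        vector = new_vector
        if change < tol:
            break
    return value


def work_scaling(p_observed, p_target):
    """Assumed scaling of work contacts; telework can only increase."""
    if not (0.0 <= p_observed <= p_target <= 1.0 and p_observed < 1.0):
        raise ValueError("need 0 <= p_observed <= p_target <= 1, p_observed < 1")
    return (1.0 - p_target) / (1.0 - p_observed)


def scenario_matrix(by_location, p_observed, p_target, school_closed):
    """Sum location-specific matrices, rescaling work and optionally dropping school."""
    factor = work_scaling(p_observed, p_target)
    n = len(by_location["home"])
    total = [[0.0] * n for _ in range(n)]
    for location, matrix in by_location.items():
        if location == "school" and school_closed:
            continue
        weight = factor if location == "work" else 1.0
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * matrix[i][j]
    return total


# Invented matrices for age classes 0-18, 19-60 and over 60 (not survey data).
CONTACTS = {
    "home":   [[2.0, 2.0, 0.2], [1.0, 1.5, 0.3], [0.2, 0.6, 1.0]],
    "work":   [[0.0, 0.2, 0.0], [0.1, 3.0, 0.2], [0.0, 0.4, 0.1]],
    "school": [[5.0, 1.0, 0.0], [0.5, 0.3, 0.0], [0.0, 0.0, 0.0]],
    "other":  [[1.5, 1.0, 0.3], [0.5, 2.0, 0.5], [0.3, 1.0, 1.5]],
}
P_OBSERVED = 0.05  # reference proportion of telework used in the case study

baseline = scenario_matrix(CONTACTS, P_OBSERVED, P_OBSERVED, school_closed=False)
r0_baseline = dominant_eigenvalue(baseline)

ratios = {}
for school_closed in (False, True):
    for p_target in (0.20, 0.35, 0.50):
        matrix = scenario_matrix(CONTACTS, P_OBSERVED, p_target, school_closed)
        ratios[(school_closed, p_target)] = dominant_eigenvalue(matrix) / r0_baseline
        print(school_closed, p_target, round(ratios[(school_closed, p_target)], 3))
```

With a 5% reference, the work matrix is multiplied by roughly 0.84, 0.68 and 0.53 for 20%, 35% and 50% telework. The printed $R_0$ ratios depend entirely on the invented matrices and should not be read as estimates for any country.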
The http://www.socialcontactdata.org initiative, status 1 March 2020, includes data for Belgium, Finland, Germany, Italy, Luxembourg, the Netherlands, Poland and the UK from the POLYMOD study [4], as well as data from further studies of social mixing in France [17], China [18], Hong Kong [19], Peru [20], UK [21], Russia [22], Zimbabwe [23], South Africa and Zambia [24]. All data are available on Zenodo [25] [26] [27] [28] [29] [30] [31] [32] [33] and can be retrieved from within R using the socialmixr package. Survey details are provided in the systematic review of Hoang et al. [3]. The data sets for France and Zimbabwe contain multiple days per participant, hence we selected the first day for each participant (to minimise the effect of reporting fatigue).

The SOcial Contact RATES (Socrates) data tool enables quick and convenient generation of social contact matrices relevant for the spread of infectious diseases. Figure 1 presents a screenshot of the user interface. The analysis of school closure and of increasing the proportion of telework is only one demonstration of the potential uses of this platform. The options and potential of using social contact patterns to simulate infectious disease transmission seem endless, and we hope with this initiative to support data-driven modeling endeavours. We provide the source code via github.com/lwillem/.

COVID19 case study

Figure 2 shows the effect of an increasing proportion of telework by country, with and without school closure. For most countries, we predict a 10% decrease in $R_0$ with a telework proportion of 50%. In some countries, like China, Poland and Hong Kong, the reduction is slightly higher.
The analysis for Peru shows little impact of telework. This can be explained by the observation of Grijalva et al. [20] that participants reported few contacts at work, whereas a substantial proportion of contacts was reported at the market or on the street. Cultural differences in how "at work" is understood should be taken into account when interpreting results. The effect of school closure is country-specific, e.g. a 10% reduction for Belgium and Vietnam, which appears to be similar in effect size to an increase in telework up to 50%. For other countries, e.g. Italy, Luxembourg and France, we predict school closure to decrease $R_0$ by 20%.

The relative incidence, as presented in Figure 3, shows the impact of school closure compared to an increase in telework. The predicted relative incidence in people 19-60 years of age decreases with an increasing proportion of telework. That is, this measure provides some protection from exposure, which might be of interest if these age groups are more vulnerable compared to children, as is the case for COVID-19 [34]. The relative incidence in the age group above 60 years of age increases in both situations compared to no intervention. This does not imply that the absolute number of cases in this age group would rise. It only means that the risk of infection in the other age groups is more affected by the intervention (which reduces overall incidence) relative to their normal social contact behavior. Given that our intervention measures target only children and the population of working age, this observation is as expected.

Most survey designs were derived from the POLYMOD survey design, though each survey had additional features and objectives which could provide useful additional information. Therefore, tools such as this one do not capture the full potential of each data set separately. The social contact analysis presented here focuses only on adapting school and work contacts.
It does not capture compensation behavior due to not being at school or work, nor social distancing due to (pandemic) scares. Our estimates only account for adapted social contact patterns and do not account for age-specific differences in susceptibility or shedding. For example, assuming that susceptibility and infectivity are lower for children would imply that school closure as an intervention would have less impact, whereas telework would have a larger impact. Our estimates also do not take into account travel restrictions or cancellation of public events, both of which may well have a large impact.

The application contains a local version of each data set, with some additional data reformatting, though our future aim is to add an option to directly use data from the Zenodo repository. Note that other social contact surveys are available on Zenodo, though we have not yet included those surveys because they have a different setup. For example, the study from China [18] recorded groups of contacts instead of unique records, and the study from the UK [21] recruited only infants. In addition, data from Zambia and South Africa had almost no information for individuals aged 0-18 years, and data from Zimbabwe did not include location. Therefore these data were omitted here.

The social contact data sharing initiative is part of the ERC consolidator grant "TransMID", which received ethical approval from the Hasselt University Medical Ethical Committee (CME2016/618).

All data and material are open source.

The authors declare no competing interests.
Figure 1: Screenshot of the online application. This application enables the selection of country data in combination with temporal and contact features. The social contact matrix is shown on the right in addition to numerical results. The impact of intervention measures can be estimated in terms of $R_0$ and relative incidence ratios (not shown). See main text for more info.

Figure 2: Predicted $R_0$ ratio by country due to increased teleworking and/or school closure. The reference proportion for telework is fixed at 5% to present a relative increase in telework.

Figure 3: Predicted relative incidence by country with increased teleworking and/or school closure. The reference proportion for telework is fixed at 5% to present a relative increase in telework.
Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study. The Lancet
A descriptive study of the impact of diseases control and prevention on the epidemics dynamics and clinical features of SARS-CoV-2 outbreak in Shanghai
A systematic review of social contact surveys to inform transmission models of close-contact infections
Social contacts and mixing patterns relevant to the spread of infectious diseases
A nice day for an infection? Weather conditions and social contact patterns relevant to influenza transmission
Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents
Using empirical social contact data to model person to person infectious disease transmission: an illustration for varicella
Estimating infectious disease parameters from data on social contacts and serological status
The impact of illness and the impact of school closure on social contact patterns
Estimating the impact of school closure on social mixing behaviour and the transmission of close contact infections in eight European countries
socialmixr: Social Mixing Matrices for Infectious Disease Modelling. The Comprehensive R Archive Network
Population Division
Handbook of Infectious Disease Data Analysis
On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations
Shiny: web application framework for R. R package version
The French connection: the first large population-based contact survey in France relevant for the spread of infectious diseases
patterns of human social contact and contact with animals in
Social contact patterns relevant to the spread of respiratory infectious diseases in Hong Kong
A household-based study of contact networks relevant for the spread of infectious diseases in the highlands of Peru
The social life of infants in the context of infectious disease transmission; social contacts and mixing patterns of the very young
Reactive school closure weakens the network of social interactions and reduces the spread of influenza
Social contact structures and time use patterns in the Manicaland Province of Zimbabwe
Age- and sex-specific social contact patterns and incidence of Mycobacterium tuberculosis infection
Russian Contact Matrices by Age
Social Contact Data for Vietnam
Social Contact Data for Hong Kong
Social Contact Data for Zambia and South Africa
Social Contact Data for China Mainland
Clinical characteristics of coronavirus disease 2019 in China