key: cord-0110193-47jlhc3d authors: Qundus, Jamal Al; Schafermeier, Ralph; Karam, Naouel; Peikert, Silvio; Paschke, Adrian title: ROC: An Ontology for Country Responses towards COVID-19 date: 2021-04-15 journal: nan DOI: nan sha: 53cd46ac2446d11f4e0cd66cd0c3505077ccd1fa doc_id: 110193 cord_uid: 47jlhc3d

The ROC ontology for country responses to COVID-19 provides a model for collecting, linking and sharing data on the COVID-19 pandemic. It follows semantic standardization (the W3C standards RDF, OWL and SPARQL) for the representation of concepts and the creation of vocabularies. ROC focuses on country measures and enables the integration of data from heterogeneous data sources. The proposed ontology is intended to facilitate statistical analysis to study and evaluate the effectiveness and side effects of government responses to COVID-19 in different countries. The ontology contains data collected by OxCGRT from publicly available information. This data has been compiled from information provided by ECDC for most countries, as well as from various repositories used to collect data on COVID-19.

Almost all countries are affected by the (second wave of the) COVID-19 pandemic. Many organisations have set up systems and projects to collect and publish global data on the impact of the virus, making it possible to monitor the pandemic. Most countries have implemented measures in reaction to COVID-19. The impact of the pandemic and the policies implemented differ widely among countries. Meanwhile, data on the spread of infection and on government responses during the first phase of the pandemic (March to July 2020) is available, enabling research on the effects of individual policies. Policy makers face tough decisions on how to deal with the pandemic, since the responses considered most effective also have significant negative impacts on the economy and social life. This is strongly reflected in the measures implemented by the countries.
The policies implemented and the reactions of society vary greatly among countries and cities. While many people support their government's strategy, others reject the measures it includes. Empirical studies need to be conducted to support decision-makers and information campaigns. Which responses have proven effective in containing the virus, and what are their effects on the economy and society? These questions apply to individual countries as well as to cultural, political and geographical regions. To enable empirical research, global data on infections, recoveries, death rates and the policies implemented in various countries and regions is published via various sources. The challenge for data scientists and epidemiologists is that these data sources do not share common standards or methodologies for reporting their data. The reported data is influenced by time zones and holidays, different political and/or economic incentives, variation in counting methods 1, discovered versus undiscovered case numbers, etc. In addition, in many cases a country's reported figures differ depending on the reporting institute. For example, the number of infected people in Germany differs between the Johns Hopkins Coronavirus Resource Center 2 (USA) and the Robert Koch Institute 3 (GER). Unfortunately, the reasons for this difference cannot be traced. To study this valuable information and perform statistical analysis on it, a common standard for harmonizing and reporting the data is required.

Many researchers are taking up this challenge, and the first solutions to this problem have already been developed. The COviD-19 ontology (CODO) is an ontology for organizing case data and patient information and aims to use knowledge graph technology to analyse the pandemic [3]. Our work follows the same approach to ontology design and shares its motivation. However, it focuses on an area not yet covered, namely government responses to the pandemic, and is therefore a distinct contribution to this field.
The ontology developed in this paper addresses the following goals:
• serve as a reference scheme for use in reporting COVID-19 related data.
• provide a common conceptualization and thereby abstract from the heterogeneous structures of existing sources of (static) data and provide a common linking schema, which is essential for data accessibility.
• provide a defined data structure with fixed semantics for further analysis or monitoring systems.
• supplement the existing ontologies in order to build a common global data model.
• offer a template to organize data from other pandemics or nationwide events or measures.
• (as a follow-up outcome) enable countries to make coordinated strategic decisions while avoiding lockdowns and their entailed risks.

For the ontology development process, we followed a hybrid approach, which is driven by available data on the one hand and by questions of interest that should be answerable by the ontology on the other hand. We mainly collected data from OxCGRT 4, ILO 5 and ECDC 6. ECDC provides data on infection rates, OxCGRT systematically monitors government actions related to the pandemic, and ILO provides economic indicators focused on the labour market. The modeling of the data began with a manual review and merging of the data. Concepts were then extracted and linked using logical relationships. To design and create the ontology, we used Protégé and then ingested the data based on the ROC ontology.

The remainder of the paper is organized as follows: Section 2 contains a review of related work. Section 3 describes the development methodology. Section 4 introduces the ROC ontology and its concepts. Section 5 describes the data ingestion process and querying capabilities. Section 6 concludes the paper with a short summary and future work.

This section provides an overview of related work on ontologies related to diseases, especially those relevant to COVID-19.
The Human Disease Ontology (DO) 7 classifies thousands of human diseases. It has moved to a multi-editor model in the Web Ontology Language, enabling collaboration among several working groups, and has recently been extended by DOID 8 to include COVID-19 concepts [7]. The Infectious Disease Ontology (IDO) [2] and the Ontology of Coronavirus Infectious Disease (CIDO) [5] define vocabulary relating to infectious diseases such as flu, malaria, and brucellosis. CIDO additionally represents comparative analysis of COVID-19 and other diseases with respect to symptoms, drugs, clinical trials, etc. One of the most relevant works is the ontology for cases and patient information (CODO) 9, which initiated the development of CIDO. It provides a standards-based and comprehensive open source model for data collection on the COVID-19 pandemic. The ontology is very well suited for the integration of data from heterogeneous data sources and thus represents one of the major inspirations for our work. Furthermore, our methodology follows the procedure for term definition described in that work, which is based on the reuse of concepts from other leading vocabularies and the use of the W3C standards RDF, OWL, SWRL and SPARQL. The evaluation of CODO was conducted on the basis of data received from the Indian government [3]. Also relevant are the World Health Organization (WHO) ontology COVIDCRFRAPID 10, a data model for COVID-19 RAPID case data, the Kg-COVID-19 ontologies 11 for the creation of knowledge graphs including SARS-CoV-2, and the Linked-Data COVID-19 ontology 12.

4 https://www.bsg.ox.ac.uk/
5 https://www.ilo.org/global/lang--en/index.htm
6 https://www.ecdc.europa.eu/
7 http://www.disease-ontology.org
8 https://disease-ontology.org/term/DOID:11725/
9 https://github.com/biswanathdutta/CODO
10 https://bioportal.bioontology.org/ontologies/COVIDCRFRAPID
11 https://github.com/Knowledge-Graph-Hub/kg-covid-19
These ontologies contain conceptualizations (and partially instance data) related to COVID-19 cases and are primarily aimed at software applications such as question answering or monitoring dashboards [3]. However, to our knowledge, the work described in this paper is the first to take government responses into account and link them to epidemiological data, such as case data. All these works support the containment of the pandemic, but more importantly, they build on and complement each other. The focus so far has been mainly on gathering cases, symptoms and information from infected people, for example to identify and isolate hot spots. One field has not been covered so far, namely that of countries' government responses. It is precisely this knowledge gap that our work on the ROC ontology aims to fill. In the next sections we describe the methodology for the development of the ROC ontology based on state-of-the-art principles, as well as its structure and evaluation.

The artifact created and investigated in this work is an ontology authored using the Web Ontology Language 2 (OWL 2) 13, the semantics of which are based on description logics. The main purpose of the ontology, as pointed out in Section 1, is to integrate public data on national responses to the COVID-19 pandemic, to provide a layer of interoperability between different and diverse resources, and to answer questions of interest from the data. The development of the ontology was therefore both bottom-up (data-driven) and top-down (application-driven). A further requirement was the integration of existing ontologies in the domain of COVID-19, both to avoid redundancy and to leverage the knowledge gained from combining existing sources (such as the combination of data about national responses with case data).
A multitude of ontology development methods exist, including METHONTOLOGY [4] and On-To-Knowledge [9], both of which provide general ontology development guidelines and principles; Diligent [11], which defines an argumentation-based development process; and agile development methods such as RapidOWL [1]. The selection of the ontology development method was driven by the non-functional requirements to (a) integrate existing data sources and (b) integrate existing ontological sources, as well as (c) by content-related requirements, which are formulated as competency questions. We decided to use the NeOn methodology [8], since it provides guidelines for each of the above-mentioned requirement types and allows combining them. NeOn identifies a set of scenarios and provides an ontology development method for each scenario. The scenarios applicable to the development of the ROC ontology are scenario 1, "From specification to implementation" (which involves requirement engineering and is suitable for our content-related requirement (c)); scenario 2, "Reusing and re-engineering non-ontological resources", as we use external, non-ontological data sources (item (a)); and scenario 3, "Reusing ontological resources", since we reuse concepts from the CODO ontology and manually align concepts derived from the external data sources with existing ones in the CODO ontology (item (b)). The ontology development process therefore consists of three main activities, which start in parallel and are divided into subtasks, some of which interact (see Figure 1).

The ontology 14 was designed around a set of competency questions. For instance, one can ask the following:
CQ1 Which countries establish a certain response?
CQ2 At which incidence level do individual countries establish certain responses / response levels?
CQ3 How long do individual countries keep their response measures active?
CQ4 Were countries which established responses at low incidence levels able to avoid high incidence rates?
CQ5 Is there an effect of certain responses on infection rates? Can we measure the effect or the delay of the effect?

The ontology consists of 27 OWL classes, 10 object properties, 42 data properties and 3 annotation properties. Its expressivity is ALEHI(D), i.e., AL with full existential qualification, role hierarchy, inverse roles and data types, making it a member of the OWL 2 DL profile. The central domain concepts of the ROC ontology are the indicators as defined by the Oxford Covid-19 Government Response Tracker (OxCGRT) mentioned in Section 1. The values for each of these indicators are acquired and stored in a data record. A record is represented in the ROC ontology as an instance of the class ResponseStatistics, which is a subclass of the CODO class CountryWiseStatistics (see Figure 2a). In alignment with the structure of the data sources, which assign numerical values to each of these indicators, we modeled the indicators as data properties. We established a data property hierarchy reflecting the taxonomy of OxCGRT coding categories (containment and closure (C), economic response (E), health systems (H), and miscellaneous (M)) (see Figure 2b). Creating a common super property for each category of response values allows a Description Logic reasoner to infer that if any of the response values in a certain category has a value, then the super data property (representing the whole category) has a value.

This section describes how the ontology is used to ingest a set of data collected from multiple sources. It outlines the data transformation process and how to query the resulting RDF knowledge base (KB) to answer competency questions. We manually reviewed and merged data collected from OxCGRT, ILO and ECDC. We then transformed the data coming from those different sources into RDF based on the ROC ontology.
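The record model, the data property hierarchy and the transformation into triples described above can be illustrated with a minimal Python sketch. It rests on several assumptions: the namespace and all property names are hypothetical (the published ROC IRIs are not given in the text), the indicator values and the two placeholder countries are made up, and a hand-written sub-property rule stands in for the Description Logic reasoner.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, object]

# Hypothetical namespace; NOT the published IRI of the ROC ontology.
ROC = "http://example.org/roc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical data property hierarchy (child -> parent), loosely mirroring
# the OxCGRT categories C, E and H under one assumed top-level property.
SUB_PROPERTY_OF: Dict[str, str] = {
    ROC + "C1_schoolClosing": ROC + "containmentAndClosure",
    ROC + "E1_incomeSupport": ROC + "economicResponse",
    ROC + "H2_testingPolicy": ROC + "healthSystems",
    ROC + "H6_facialCoverings": ROC + "healthSystems",
    ROC + "containmentAndClosure": ROC + "responseIndicator",
    ROC + "economicResponse": ROC + "responseIndicator",
    ROC + "healthSystems": ROC + "responseIndicator",
}


def record_to_triples(record_id: str, country: str, date: str,
                      indicators: Dict[str, Optional[int]]) -> List[Triple]:
    """Turn one merged data record into triples about a ResponseStatistics instance."""
    subject = ROC + record_id
    triples: List[Triple] = [
        (subject, RDF_TYPE, ROC + "ResponseStatistics"),
        (subject, ROC + "country", country),
        (subject, ROC + "date", date),
    ]
    for name, value in indicators.items():
        if value is not None:  # indicators without a value produce no triple
            triples.append((subject, ROC + name, value))
    return triples


def infer_super_properties(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply the sub-property rule: (s p o) and p subPropertyOf q gives (s q o)."""
    closed: Set[Triple] = set(triples)
    pending: List[Triple] = list(closed)
    while pending:
        s, p, o = pending.pop()
        parent = SUB_PROPERTY_OF.get(p)
        if parent is not None and (s, parent, o) not in closed:
            closed.add((s, parent, o))
            pending.append((s, parent, o))  # follow the hierarchy upwards
    return closed


def countries_with_category(triples: Set[Triple], category: str) -> List[str]:
    """List countries having at least one record with a value in the category."""
    subjects = {s for s, p, _ in triples if p == ROC + category}
    return sorted({str(o) for s, p, o in triples
                   if p == ROC + "country" and s in subjects})


# Made-up example records for two placeholder countries (not OxCGRT data).
asserted = (
    record_to_triples("rec1", "CountryA", "2020-04-01",
                      {"H2_testingPolicy": 2, "H6_facialCoverings": None})
    + record_to_triples("rec2", "CountryB", "2020-04-01",
                        {"C1_schoolClosing": 3})
)
kb = infer_super_properties(asserted)

print(countries_with_category(kb, "healthSystems"))
print(countries_with_category(kb, "containmentAndClosure"))
```

In this sketch, a query against a category property such as the hypothetical `healthSystems` succeeds only because of the inferred triples; no record asserts a category-level value directly. In an actual triple store the same effect would depend on the store's reasoning support being enabled for the property hierarchy.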
The resulting KB serves as an integrated view for answering queries that span all required sources. For data transformation, we made use of the Karma integration tool [10]. The tool offers a user interface (as depicted in Figure 3) for mapping different types of structured data and publishing them in an RDF format. It automatically generates an R2RML 15 mapping model based on user input. The model can be stored and reused on similarly structured data. The resulting KB contains 1850 instances with data for Germany, Jordan and Sweden collected between January and the beginning of November 2020. We loaded the RDF data into the Virtuoso 16 triple store, making it accessible and queryable through a SPARQL endpoint.

We devised an initial set of queries to answer the competency questions over the RDF data. As an example, for the question "List the countries and their respective health responses?" the corresponding query is the one depicted in Listing 1. The query results are depicted in Figure 4. We can see, for instance, that Germany has a higher testing policy index and the highest emergency investments in healthcare and vaccines. We can also note that Sweden did not implement a facial covering policy. Based on the correlation between countries' stringency and actual statistics, the effectiveness or failure of implemented measures can be derived, informing future government decisions.

The present work focuses on country responses to COVID-19 and proposes a novel ontology, ROC, to enable the integration of data from heterogeneous data sources and to answer questions of interest. This facilitates statistical analysis to investigate and evaluate the effectiveness and side effects of such responses. The ontology consists of 27 OWL classes, 10 object properties, 42 data properties and 3 annotation properties. The data collected by OxCGRT, ILO and ECDC were manually reviewed and merged.
We then converted the data from these different sources into RDF based on the ROC ontology. The resulting RDF serves as an integrated view for answering queries that span all required sources. The resulting KB contains 1850 instances of data for Germany, Jordan and Sweden collected between January and early November 2020. We uploaded the RDF data to the Virtuoso triple store and made it accessible and queryable through a SPARQL endpoint. Given that most experts are familiar with neither SPARQL nor Semantic Web technologies, we plan to connect our Controlled Natural Language querying system [6], which offers a user-friendly interface for querying the KB. Other factors could influence the effectiveness of country responses. These are either difficult to capture 17 or relatively easy to determine 18. Taking such factors into account would require extending the properties of the concepts/terms, which would increase the semantic expressiveness of the ontology. Lastly, a technical and a goal-based evaluation of the effectiveness of the approach is a challenge to be addressed by future work.

17 such as culture, acceptance of people, discipline, people's perseverance, family structure, types of contacts in society, etc.
18 such as public events during the pandemic, foreign trade, tourism, weather, country's experience of pandemics, time (how long the reactions will last)

[1] RapidOWL - An Agile Knowledge Engineering Methodology
[2] The infectious disease ontology in the age of COVID-19
[3] CODO: An ontology for collection and analysis of COVID-19 data
[4] METHONTOLOGY: from Ontological Art towards Ontological Engineering
[5] CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis
[6] Answering controlled natural language questions over RDF clinical data
[7] Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data
[8] The NeOn Methodology for Ontology Engineering
[9] On-To-Knowledge Methodology (OTKM)
[10] Learning the semantics of structured data sources
[11] Argumentation-Based Ontology Engineering

The research presented in this article is partially funded by the German Federal Ministry of Education and Research (BMBF) through the project QURATOR (Unternehmen Region, Wachstumskern, grant no. 03WKDA1F). http://qurator.ai