key: cord-0577663-znbtur8w authors: Ahmad, Aakash; Bandara, Madhushi; Fahmideh, Mahdi; Proper, Henderik A.; Guizzardi, Giancarlo; Soar, Jeffrey title: An Overview of Ontologies and Tool Support for COVID-19 Analytics date: 2021-10-12 journal: nan DOI: nan sha: 2cb957614aac49be5f9629be1daca1a34cee0acd doc_id: 577663 cord_uid: znbtur8w

Context: The SARS-CoV-2 pandemic and the disease it causes (COVID-19) demand that existing medical, economic, and social emergency backend systems be empowered with data analytics capabilities. An impediment to taking advantage of data analytics in these systems is the lack of a unified framework or reference model. Ontologies are highlighted as a promising solution to bridge this gap by providing a formal representation of COVID-19 concepts such as symptoms, infection rates, contact tracing, and drug modelling. Ontology-based solutions enable the integration of diverse data sources, which leads to a better understanding of pandemic data, management of smart lockdowns by identifying pandemic hotspots, and knowledge-driven inference, reasoning, and recommendations to tackle surrounding issues.
Objective: This study aims to investigate COVID-19 related challenges that can benefit from ontology-based solutions, analyse available tool support, and identify emerging challenges that impact research and development of ontologies for COVID-19. Moreover, reference architecture models are presented to facilitate the design and development of innovative solutions that rely on ontology-based solutions and relevant tool support to address a multitude of challenges related to COVID-19.

Method: We followed the formal guidelines of systematic mapping studies and systematic reviews to identify a total of 56 published research solutions on ontology models for COVID-19, and qualitatively selected 10 of them for the review.

Results: Thematic analysis of the investigated solutions pinpoints five research themes: telehealth, health monitoring, disease modelling, data intelligence, and drug modelling. Each theme is supported by tool(s) enabling automation and user-decision support. Furthermore, we present four reference architectures that can address recurring challenges towards the development of the next generation of ontology-based solutions for COVID-19 analytics.

Index Terms: COVID-19, Ontology, Analytics, Semantic Web, Reference Architecture, Tool Support

Throughout history, pandemics have ravaged humanity with plagues and infections that created humanitarian crises, severed social ties, hindered economic growth, and caused loss of human lives [1]. With the most recent outbreak of the COVID-19 pandemic, states, communities, and individuals are facing a warlike situation, with economic recession and an exponentially growing infection rate, that needs to be restrained [2]. Such an 'invisible enemy' has forced the world into a grand lockdown that has not been experienced in the past; at the time of writing, COVID-19 has infected more than 207 million people with 4.35 million recorded deaths.
Researchers and practitioners across various domains such as medical and life sciences, economics, and engineering are striving to put forward solutions to counter such a threat and aid society in coping with the fallout [3]-[5]. In the same context, researchers and practitioners in knowledge management communities face the question of how ontology-based systems can be exploited to tackle the current pandemic. Ontology-based systems utilise semantic models, data storage and processing software, and user-oriented analytics tools to enable information integration, multi-modal data analysis, and data visualisation related to COVID-19 infections [6]-[9]. They can help annotate real-time context-sensitive data from different sources such as tracing apps, perform context-aware data integration, and run analytics functions that aid in taking the right actions to protect citizens and prevent transmission of the disease [10], [11]. The aim of this paper is to investigate existing ontology-based solutions proposed by the research community and evaluate their capabilities that assist in analysing and managing COVID-19 spread. The primary contributions of this research are to:
• Investigate the role of ontology-based solutions in software systems to address the challenges related to COVID-19 data.
• Evaluate the available tool support that complements ontology-based solutions through automation and customisation (i.e., enabling user decision support).
• Present reference architectures as a generic solution to address the recurring problems for COVID-19 analytics.

To our knowledge, there is no existing systematic investigation, classification, and comparison of ontology-based solutions and tool support for COVID-19 analytics. Furthermore, the presented reference architecture models can provide guidance for developing software systems that exploit ontologies to address the issues related to the COVID-19 pandemic. The rest of the paper is organised as follows.
Section II presents the context and background details of this study. Section III presents the methodology. Section IV presents the results of this study. Section V discusses the key findings of this research. Section VI concludes the paper.

With a widespread proliferation of COVID-19 infections and their socio-economic impacts across the globe, nations and their administrative stakeholders are striving hard to exploit existing disaster management infrastructures [3] or smart city frameworks [4] to counter the pandemic. Moreover, novel approaches such as those that unify big data and artificially intelligent systems [5] accumulate an unprecedented amount of data from public health monitoring to support data-driven intelligence for pandemic management. We classify the most relevant existing work as (i) modelling of pandemic data (Section II.A), (ii) ontological management of the pandemic (Section II.B), and (iii) analytics for COVID-19 (Section II.C).

Some recent research and development efforts [1] represent a catalogue of solutions for data-driven modelling of pandemic scenarios along with decision strategies to counter the impacts of the pandemic on health care, social norms, and economic downturns. Modelling pandemic data (e.g., infection tracing, spread rates, and social distancing) is fundamental to visualising, simulating, and analysing pandemic spread, as demonstrated by several studies conducted across the globe from Asia [2] to Europe [12], the Americas [6], and Africa [13]. Specifically, Kaxiras & Neofotistos [7] and Bastos & Cajueiro [6] indicate how mathematical models helped simulate infection growth and contact tracing in countries like India and Brazil, which were epicentres of the pandemic. For example, Bastos & Cajueiro [6] modelled data collected from public health and surveillance units to simulate long-term scenarios of the pandemic that reflect different levels of engagement with the Brazilian social distancing policy.
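To make the idea of scenario simulation concrete, the following is a minimal sketch of a discrete-time SIR (susceptible, infected, recovered) model in which a `distancing` factor scales down the contact rate. It is not the model used in [6] or [7]; the function names, the `distancing` parameter, and all numeric values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SirState:
    """Compartment sizes on a given day."""
    susceptible: float
    infected: float
    recovered: float


def simulate_sir(population, initial_infected, contact_rate,
                 recovery_rate, days, distancing=0.0):
    """Run a daily-step SIR simulation and return one state per day.

    distancing is a hypothetical engagement level in [0, 1] that
    reduces the effective contact rate proportionally.
    """
    if not 0.0 <= distancing <= 1.0:
        raise ValueError("distancing must lie in [0, 1]")
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must lie in [0, 1]")

    effective_rate = contact_rate * (1.0 - distancing)
    state = SirState(population - initial_infected, initial_infected, 0.0)
    history = [state]
    for _ in range(days):
        new_infections = (effective_rate * state.susceptible
                          * state.infected / population)
        # Never infect more people than are still susceptible.
        new_infections = min(new_infections, state.susceptible)
        new_recoveries = recovery_rate * state.infected
        state = SirState(
            state.susceptible - new_infections,
            state.infected + new_infections - new_recoveries,
            state.recovered + new_recoveries,
        )
        history.append(state)
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people."""
    return max(state.infected for state in history)
```

Calling `simulate_sir` twice with the same parameters but different `distancing` values, and comparing `peak_infected` for the two runs, gives a rough picture of how policy engagement can flatten the infection curve in such a model. Real studies calibrate these parameters against surveillance data, which this sketch does not attempt.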
Multi-disciplinary research efforts, including but not limited to mathematical models, software tools, biological structures, and chemical compositions, have been combined for tasks ranging from infection modelling and spread prediction to the simulation of contact tracing [1], [7]. Despite the strategic benefits of pandemic modelling, issues such as performance, scalability, and accuracy represent potential limitations of model-based solutions [1], [2], [6], [13].

Several ontology-based techniques have been proposed to conceptualise fundamental concepts of COVID-19 data (e.g., infection symptoms or patients' health) and define essential relationships between these concepts to enable automated reasoning about data [10], [14]. The Infectious Disease Ontology (IDO) is a suite of interoperable ontology modules that aims to provide coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health. IDO provides foundations for ontology-based reasoning for COVID-19 use-cases and supports reproducibility of infectious disease research [14]. Some existing solutions [10], [15] have extended IDO, enabling data representation and automated reasoning, to support safety monitoring for indoor individuals and to analyse patients' data about the COVID-19 pandemic. Specifically, the COVID-19 Ontology for cases and patient information (CODO) extends IDO to enable the representation of patient and infection data to create a network that supports the behaviour analysis of the disease, possible paths of disease spread, and various factors of disease transmission [10]. Such ontology-based models are useful because they enable (i) structural representation of COVID-19 data, (ii) semantics of the data, and (iii) behavioural analysis of data to support predictive analytics for pandemic spread [11].
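As a rough illustration of how an ontology combines structure, semantics, and reasoning, the sketch below stores facts as subject-predicate-object triples and infers the implied classes of an individual by following subclass links transitively. The class names, predicate names, and the `TinyOntology` helper are hypothetical and are not taken from the IDO or CODO vocabularies; production systems would use an OWL reasoner rather than this hand-written traversal.

```python
SUBCLASS_OF = "subClassOf"
TYPE = "type"
HAS_SYMPTOM = "hasSymptom"


class TinyOntology:
    """A toy triple store with subclass-based type inference."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """All objects o such that (subject, predicate, o) is asserted."""
        return {o for s, p, o in self.triples
                if s == subject and p == predicate}

    def superclasses(self, cls):
        """All ancestors of cls reachable through subClassOf links."""
        seen = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            for parent in self.objects(current, SUBCLASS_OF):
                if parent not in seen:  # guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def inferred_types(self, individual):
        """Asserted types plus every superclass of those types."""
        asserted = self.objects(individual, TYPE)
        inferred = set(asserted)
        for cls in asserted:
            inferred |= self.superclasses(cls)
        return inferred


def build_example():
    """Populate the store with illustrative (invented) facts."""
    onto = TinyOntology()
    onto.add("Covid19Case", SUBCLASS_OF, "InfectiousDiseaseCase")
    onto.add("InfectiousDiseaseCase", SUBCLASS_OF, "DiseaseCase")
    onto.add("patient_17", TYPE, "Covid19Case")
    onto.add("patient_17", HAS_SYMPTOM, "Fever")
    return onto
```

Here `build_example().inferred_types("patient_17")` yields the asserted class together with its two ancestors, even though only one type was stated explicitly. This is the kind of automated inference that lets data annotated against a specialised ontology be queried through the more general concepts it extends.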
The ontologies in [11], [15] leverage big data analytics techniques to extract significant information from multiple data sources for generating real-time statistics for contactless temperature sensing, mask detection, and monitoring social distancing.

The research on addressing technical challenges associated with COVID-19 analytics is still in its early phase and primarily focused on survey-based studies investigating the applications of existing solutions [8], [16] or proposing data analytics methods for sensing, mining, and visualising data from healthcare units [17], [18]. Most recently, a multitude of survey-based studies have been published on contact tracing apps [8], COVID-19 dashboards for data visualisation [19], and predictive analytics for infection spread and social distancing [16]. Such studies streamlined the needed solutions and provided recommendations for developing predictive analytics for COVID-19 [16]. A few works have relied on big data analytics for real-time contact tracing based on travel history and clinical symptoms of potential infections [17], [19]. For example, to support real-time visualisation of infection spread, the C2SMART team exploited data mining and cloud computing techniques to investigate the impact of COVID-19 on mobility and sociability, based on a case study of New York City and Seattle [19]. The solution provides a dashboard for interactive data analytics and visualisation to facilitate the understanding of the impact of the outbreak and corresponding policies, such as social distancing, on transportation systems. In general, ontology-based systems that utilise knowledge graphs and linked data are being used for data integration, information retrieval, recommender systems, and explainable machine learning [20]. Such approaches can feasibly be adopted to solve analytics challenges associated with the pandemic [21]-[23].
Currently, ontology-based solutions have been largely deployed for health information exchange and communication only [24]. Given the significance of applications that can use ontological reasoning for COVID-19 data, and the lack of systematic studies that explore how ontologies are utilised in designing and supporting COVID-19 analytics, this paper aims to investigate existing solutions based on published literature on ontology-based solutions and tool support for COVID-19.

We followed the methodology of Systematic Mapping Studies (SMS) [25] to conduct and document this review. An SMS provides an overview (systematic map) of a research topic by showing the type of research and the results that have been published, categorising them in line with answering a specific research question [25], [26]. This review is conducted to answer three research questions, stated as follows:
• RQ1 - How are ontologies used to address the challenges related to COVID-19 analytics?
• RQ2 - What level of tool support is provided in identified studies for ontology models and applications that support COVID-19 analytics?
• RQ3 - What are the recurring challenges and reference architecture modules that emerge from investigated solutions?

We followed the process proposed by Petersen et al. [25] and conducted the initial evidence search on three databases (IEEE Xplore, ACM Digital Library, and ScienceDirect) for published research on ontologies for COVID-19. We selected these three primary sources because we wanted to analyse COVID-19 specific ontologies from a computing or information systems point of view [25], [27], explicitly highlighting reference architectures and tools while discarding ontology solutions that overlook the design (i.e., architecture) and implementation (i.e., tool support) of the solutions.
The Google Scholar search engine was used to search for literature that may have been missed by the three main sources and to ensure that the latest studies, which may be available only as preprints, are also included in our evidence. Findings were further extended through the snowballing approach proposed by Wohlin [28]. All the solutions identified in the search phase were reviewed for relevancy. If a paper satisfied the selection criteria, we included it in the list of studies qualified for the synthesis. Below are the exclusion criteria we adapted from Khan et al. [27]:
• Books and news articles
• Papers where ontologies were not incorporated to address COVID-19 related challenges
• Vision papers
• Papers not written in English
• Full text that was not available for public access or through digital library services

Through the initial database search, we identified 55 empirical studies as candidates for review. Among those, 8 studies were qualitatively selected as relevant, based on the study quality assessment and exclusion criteria. The same steps were applied to the 8 studies identified through snowballing, and we identified 1 additional relevant paper for our study. One additional study was included based on expert recommendations. To avoid the inclusion of duplicate studies, which would inevitably bias the result of the synthesis, we thoroughly checked whether very similar studies were published in more than one paper. Out of 56, we eventually selected a total of 10 studies (approximately 18% of the identified literature) that were included in the synthesis of evidence.

The following section presents the results based on thematic mapping of the identified studies and how they answer each of the research questions of our study. To understand the nature of COVID-19 analytics applications that exploit ontology-based solutions, we mapped the ten identified studies into key themes as illustrated in Fig. 1.
We identified five themes, with the majority of studies using ontologies for health monitoring, disease modelling, drug modelling, and data intelligence. One study [S1] uniquely proposed to utilise an ontology for telehealth applications to provide semi-automated recommendations and suggestions for telehealth patients and practitioners.

A. RQ1 - How are ontologies used to address the challenges related to COVID-19 analytics?

To answer RQ1, we mapped the identified studies to the challenges they address and the solutions they propose, along with the method of evaluation and the ontologies they use or propose. This mapping is summarised in Table I. Five studies [S2, S3, S4, S5, S10] propose new ontologies, and four studies [S6, S7, S8, S9] propose solutions based on existing ontologies. S1 is the only study that is not specific to a particular ontology; it proposes a placeholder to plug in any existing ontology that suits the end-user.

To answer RQ2, we identified and characterised the tools proposed or used in the identified studies, including the type of tool, its intent, source type, and level of automation (Table II). We observed that [S1, S2] proposed new tools. One study [S4] does not report the tool(s) used to develop the ontology it proposes. The rest of the studies utilise existing open source tools and libraries such as Protégé and KGX. Other than Protégé, the identified studies tend to use a unique set of tools and libraries to match their application intent. The level of automation of existing tools is largely limited to rule execution and data annotation. [S1, S2] are the only applications that go beyond this traditional scope and use ontologies to automate the analytics and recommendation aspects of COVID-19 information systems.
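To clarify what "rule execution and data annotation" can look like in practice, the sketch below tags free-text notes with term identifiers from a small lexicon and then applies simple if-then rules over the resulting tags. The `EX:` identifiers, the lexicon, and the rule are invented for illustration and do not correspond to terms or tools from the reviewed studies.

```python
import re

# Hypothetical lexicon: surface phrase -> illustrative term identifier.
LEXICON = {
    "fever": "EX:Fever",
    "high temperature": "EX:Fever",
    "dry cough": "EX:Cough",
    "cough": "EX:Cough",
    "loss of smell": "EX:Anosmia",
}

# Hypothetical rules: if all required terms are present, add the conclusion.
RULES = [
    (frozenset({"EX:Fever", "EX:Cough"}), "EX:SuspectedRespiratoryInfection"),
]


def annotate(text):
    """Return the set of term identifiers whose phrases occur in text."""
    lowered = text.lower()
    found = set()
    for phrase, term in LEXICON.items():
        # Whole-phrase match so that short phrases do not hit inside words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(term)
    return found


def apply_rules(terms):
    """Forward-chain over RULES until no new term can be derived."""
    derived = set(terms)
    changed = True
    while changed:
        changed = False
        for required, conclusion in RULES:
            if required <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived
```

For instance, `apply_rules(annotate("Patient reports high temperature and a dry cough."))` adds the derived tag to the two annotated symptom terms. The sketch ignores negation (a note saying "no fever" would still be tagged), which is one reason annotation pipelines in this area typically need more than keyword matching.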
To answer RQ3, we created a template-based specification (serving as a structured catalogue) that maps existing studies to recurring challenges, proposed solutions, and the reference architectures that emerge from the proposed solutions. Recurring challenges are challenging scenarios of COVID-19 analytics, common to different application domains, that need unique architecture-centric solutions. Furthermore, the mapping we developed contains a thumbnail architectural view of the solution and examples from the identified studies. This information is captured using a topical structure whose elements include the recurring challenge and the proposed solution.

Derived from the reviewed studies, we defined four recurring challenges, referred to as: 1. Agent and intelligent systems; 2. Infection monitoring and analysis; 3. Integration and interoperability of disease data; and 4. Establishing information repositories. Details of the findings for each category are presented here.
• Recurring Challenge: How to exploit data-driven ontology-based intelligence for semi-autonomous systems that can assist in COVID-19 scenarios in the context of telehealth care?
• Proposed Solution: An ontology-based domain model structures a knowledge base that represents information about the disease, its typical symptoms, possible courses of action, and aspects to be monitored. Moreover, this domain model can structure data about the patient's characteristics and health status. A semi-autonomous telehealth system can then:
  - Reason over the knowledge base to infer the best course of action to recommend for that particular patient.
  - With the approval and monitoring of that patient's physician, activate a telehealth agent to pursue a number of assisting actions (e.g., monitoring and transmitting specific signs of the patient, notifying the proper health authorities).

[Table residue, partially recovered from extraction:]
• Recurring Challenge: How to support integration and interoperability for data from various domains?
• Proposed Solution: A methodology for building novel, powerful, pathogen-specific ontologies that represent data about novel diseases and that can be used to easily compare multiple dimensions of data pre-curated from past diseases.
• Recurring Challenge: How to support text mining and semantic interoperability of unstructured data in the COVID-19 domain?
• Proposed Solution: A novel ontology that forms the basis for establishing a reference namespace for a COVID-19 knowledge graph, encapsulating COVID-19 domain-specific topics ranging from epidemiology and prevention and control to genetics and molecular processes. Ontology: The COVID-19 Ontology.
Table note a: Provides a placeholder to plug in any ontology.

Based on the evidence gathered through the three research questions of the SMS, we identified a unique set of COVID-19 analytics challenges that are addressed by ontologies. We identified the following areas as emerging research trends, in line with the interests of researchers and the capabilities developed to date:
• Telehealth agents
• Public and open data and models for disease modelling
• Explainable statistical algorithms and data mining techniques used in infection modelling
• Drug and impact modelling, particularly for drug repurposing

Furthermore, the Internet of Things (IoT), as a network of interconnected devices and an enabling technology, has exploited ontologies to represent COVID-19 specific knowledge for applications such as orchestrating sensors for context-sensing of infection transmission [15] and health symptom monitoring [29]. Ontology-driven IoT for managing COVID-19 is a diverse topic that requires future work to investigate the integration of ontologies that provide a knowledge base and decision support for IoT sensors [29], [30].

A noticeable limitation in the identified studies is the lack of rigour in the evaluation of the proposed solutions. Many studies lack concrete implementations, real-world validations, and use-cases that can validate the practical applicability of the proposed solutions.
This issue is understandable, given that COVID-19 analytics is an emerging field and its literature is immature.

In this paper, we provided a review of existing ontology-based solutions concerning COVID-19 analytics. We investigated challenges that have already been addressed in existing research, along with the availability of tool support. Furthermore, we derived reference architectures from the identified studies to represent recurring challenges associated with COVID-19 analytics. As future work, it is necessary to conduct a qualitative analysis of the novel ontologies proposed in the identified studies, drawing on existing ontology quality evaluations of popular knowledge bases such as Wikidata and using a formal set of criteria for ontology quality evaluation [31].

APPENDIX A: SELECTED STUDIES FOR REVIEW

[S1] Agents and robots for collaborating and supporting physicians in healthcare scenarios
[S2] In-pandemic development of an application ontology for COVID-19 surveillance in a primary care sentinel network
[S3] The infectious disease ontology in the age of COVID-19
[S4] CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis
[S5] Towards integrated and open COVID-19 data
[S6] Towards an ontology proposal model in data lake for real-time COVID-19 cases prevention
[S7] Reference ontology and database annotation of the COVID-19 open research dataset
[S8] Ontological and bioinformatic analysis of anti-coronavirus drugs and their implication for drug repurposing against COVID-19
[S9] COVID-KOP: integrating emerging COVID-19 data with the ROBOKOP database
[S10] The COVID-19 Ontology

REFERENCES

[1] Charting the next pandemic: modeling infectious disease spreading in the data science age
[2] Generalized logistic growth modeling of the COVID-19 pandemic in Asia
[3] National disaster management system: COVID-19 case in Korea
[4] COVID-19 pandemic: a review of smart cities initiatives to face new outbreaks
[5] How big data and artificial intelligence can help better manage the COVID-19 pandemic
[6] Modeling and forecasting the COVID-19 pandemic in Brazil
[7] Multiple epidemic wave model of the COVID-19 pandemic: modeling study
[8] A survey of COVID-19 contact tracing apps
[9] Empowering virus sequence research through conceptual modeling
[10] CODO: an ontology for collection and analysis of COVID-19 data
[11] Towards an ontology proposal model in data lake for real-time COVID-19 cases prevention
[12] Modeling and prediction of COVID-19 pandemic using Gaussian mixture model
[13] Modeling the transmission dynamics of the COVID-19 pandemic in South Africa
[14] The infectious disease ontology in the age of COVID-19
[15] IoT-based system for COVID-19 indoor safety monitoring
[16] Data analytics: COVID-19 prediction using multimodal data
[17] Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing
[18] An interactive data visualization and analytics tool to evaluate mobility and sociability trends during COVID-19
[19] Chasing John Snow: data analytics in the COVID-19 era
[20] Semantic modeling for engineering data analytics solutions
[21] Semantic web technologies for explainable machine learning models: a literature review
[22] A retrospective of knowledge graphs
[23] On the role of knowledge graphs in explainable AI
[24] Implications of knowledge organization systems for health information exchange and communication during the COVID-19 pandemic
[25] Systematic mapping studies in software engineering
[26] Guidelines for performing systematic literature reviews in software engineering
[27] Systematic reviews to support evidence-based medicine
[28] Guidelines for snowballing in systematic literature studies and a replication in software engineering
[29] Special issue on IoT for fighting COVID-19
[30] Intelligent COVID-19 forecasting, diagnoses and monitoring systems: a survey
[31] A quality evaluation framework for bio-ontologies