key: cord-177610-8aodvgww authors: Groza, Adrian title: Detecting fake news for the new coronavirus by reasoning on the Covid-19 ontology date: 2020-04-26 journal: nan DOI: nan sha: doc_id: 177610 cord_uid: 8aodvgww In the context of the Covid-19 pandemic, many were quick to spread deceptive information. I investigate here how reasoning in Description Logics (DLs) can detect inconsistencies between trusted medical sources and untrusted ones. The untrusted information comes in natural language (e.g. "Covid-19 affects only the elderly"). To convert it automatically into DL, I used the FRED converter. Reasoning in Description Logics is then performed with the Racer tool. In the context of the Covid-19 pandemic, many were quick to spread deceptive information [7]. Fighting against misinformation requires tools from various domains such as law and education, and also from information technology [21, 16]. Since a lot of trusted medical knowledge is already formalised, I investigate here how an ontology on Covid-19 could be used to signal fake news. Specifically, I examine how reasoning in description logic can detect inconsistencies between a trusted medical source and untrusted ones. The untrusted information comes in natural language (e.g. "Covid-19 affects only the elderly"). To convert it automatically into description logic (DL), I used the FRED converter [12]. Reasoning in Description Logics is then performed with the Racer reasoner [15]. The rest of the paper is organised as follows: Section 2 succinctly introduces the syntax of description logic and shows how inconsistency can be detected by reasoning. Section 4 analyses FRED translations for the Covid-19 myths. Section 5 illustrates how to formalise knowledge patterns for automatic conflict detection. Section 6 browses related work, while Section 7 concludes the paper.
2 Finding inconsistencies using Description Logics

In Description Logics, concepts are built using the set of constructors formed by negation, conjunction, disjunction, value restriction, and existential restriction [4] (Table 1). Here, C and D represent concept descriptions, while r is a role name.

Table 1: Syntax and semantics of the DL constructors
  conjunction              C ⊓ D     C^I ∩ D^I
  disjunction              C ⊔ D     C^I ∪ D^I
  existential restriction  ∃r.C      {x ∈ ∆^I | ∃y : (x, y) ∈ r^I ∧ y ∈ C^I}
  value restriction        ∀r.C      {x ∈ ∆^I | ∀y : (x, y) ∈ r^I → y ∈ C^I}
  individual assertion     a : C     a^I ∈ C^I
  role assertion           r(a, b)   (a^I, b^I) ∈ r^I

The semantics is defined based on an interpretation I = (∆^I, ·^I), where the domain ∆^I of I contains a non-empty set of individuals, and the interpretation function ·^I maps each concept name C to a set of individuals C^I ⊆ ∆^I and each role r to a binary relation r^I ⊆ ∆^I × ∆^I. The last column of Table 1 shows the extension of ·^I for non-atomic concepts. A terminology (TBox) is a finite set of terminological axioms of the form C ≡ D or C ⊑ D.

Example 1 (Terminological box) "Coronavirus disease is an infectious disease caused by a newly discovered coronavirus" can be formalised as:

  Covid-19 ≡ CoronavirusDisease                                        (1)
  InfectiousDisease ⊑ Disease                                          (2)
  CoronavirusDisease ⊑ InfectiousDisease ⊓ ∀causedBy.NewCoronavirus    (3)

Here the concept Covid-19 is the same as the concept CoronavirusDisease. We know from (2) that an infectious disease is a disease (i.e. the concept InfectiousDisease is included in the more general concept Disease). We also learn from (3) that the coronavirus disease is included in the intersection of two sets: the set InfectiousDisease and the set of individuals for which all causedBy roles point towards instances of the concept NewCoronavirus. An assertional box (ABox) is a finite set of concept assertions i : C or role assertions r(i, j), where C designates a concept, r a role, and i and j are two individuals.
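The set-theoretic semantics of Table 1 can be made concrete with a minimal sketch (not the paper's code): a finite interpretation is represented as plain Python sets, and each constructor becomes a set operation. The domain elements and concept names below are illustrative assumptions chosen to match Example 1.

```python
# Minimal sketch: evaluating the DL constructors of Table 1 over a finite
# interpretation I = (Delta, .^I). Concepts are sets of individuals,
# roles are sets of pairs. All names here are illustrative.

delta = {"covid19", "flu", "sars_cov_2", "influenza_virus"}

concept = {
    "Disease": {"covid19", "flu"},
    "InfectiousDisease": {"covid19", "flu"},
    "NewCoronavirus": {"sars_cov_2"},
}
role = {
    "causedBy": {("covid19", "sars_cov_2"), ("flu", "influenza_virus")},
}

def conj(c, d):                 # C ⊓ D  ->  C^I ∩ D^I
    return c & d

def disj(c, d):                 # C ⊔ D  ->  C^I ∪ D^I
    return c | d

def exists(r, c):               # ∃r.C: x with some r-successor in C
    return {x for x in delta if any((x, y) in r and y in c for y in delta)}

def forall(r, c):               # ∀r.C: x whose r-successors are all in C
    return {x for x in delta if all(y in c for y in delta if (x, y) in r)}

# Right-hand side of axiom (3): InfectiousDisease ⊓ ∀causedBy.NewCoronavirus
rhs = conj(concept["InfectiousDisease"],
           forall(role["causedBy"], concept["NewCoronavirus"]))
print("covid19" in rhs, "flu" in rhs)   # True False
```

Under this interpretation, covid19 satisfies the right-hand side of axiom (3) while flu does not, since flu's causedBy-successor is not a NewCoronavirus.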
Example 2 (Assertional box) SARS-CoV-2 : Virus says that the individual SARS-CoV-2 is an instance of the concept Virus. hasSource(SARS-CoV-2, bat) formalises the information that SARS-CoV-2 comes from bats. Here the role hasSource relates the two individuals SARS-CoV-2 and bat, where bat is an instance of mammals (i.e. bat : Mammal). A concept C is satisfiable if there exists an interpretation I such that C^I ≠ ∅. The concept D subsumes the concept C (C ⊑ D) if C^I ⊆ D^I for all interpretations I. Constraints on concepts (e.g. disjointness) or on roles (domain, range, inverse roles, or transitivity) can be specified in more expressive description logics. By reasoning on these mathematical constraints, one can detect inconsistencies among different pieces of knowledge, as illustrated in the following inconsistency patterns. An ontology O is incoherent iff there exists an unsatisfiable concept in O. For instance, the ontology

  Covid-19 ⊑ InfectiousDisease     (4)
  Covid-19 ⊑ ¬InfectiousDisease    (5)

is incoherent because Covid-19 is unsatisfiable in O, since it is included in two disjoint sets. In most cases, reasoning is required to signal that a concept is included in two disjoint concepts:

  Covid-19 ⊑ InfectiousDisease                                                        (6)
  InfectiousDisease ⊑ Disease ⊓ ∃causedBy.(Bacteria ⊔ Virus ⊔ Fungi ⊔ Parasites)      (7)
  Covid-19 ⊑ ¬Disease                                                                 (8)

From axioms (6) and (7), one can deduce that Covid-19 is included in the concept Disease. From axiom (8), one learns the opposite: Covid-19 is outside the same set Disease. A Description Logics reasoner will signal an incoherence. An ontology is inconsistent when an unsatisfiable concept is instantiated. For instance, inconsistency occurs when the same individual is an instance of two disjoint concepts:

  SARS-CoV-2 : Virus       (9)
  SARS-CoV-2 : Bacteria    (10)
  Virus ⊑ ¬Bacteria        (11)

We learn that SARS-CoV-2 is an instance of both the Virus and the Bacteria concepts. Axiom (11) states that viruses are disjoint from bacteria. A Description Logics reasoner will signal an inconsistency.
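The simplest inconsistency pattern above, an individual asserted in two disjoint concepts as in axioms (9)-(11), can be sketched in a few lines (this is an illustrative check, not RacerPro's algorithm; the data-structure names are assumptions):

```python
# Sketch: flag an ABox inconsistency when the same individual is asserted
# to belong to two concepts declared disjoint, as in axioms (9)-(11).

disjoint = [("Virus", "Bacteria")]                       # Virus ⊑ ¬Bacteria
abox = [("SARS-CoV-2", "Virus"), ("SARS-CoV-2", "Bacteria")]

def find_clashes(abox, disjoint):
    """Return (individual, concept, disjoint_concept) triples that clash."""
    members = {}
    for ind, c in abox:
        members.setdefault(ind, set()).add(c)
    clashes = []
    for ind, cs in members.items():
        for c, d in disjoint:
            if c in cs and d in cs:
                clashes.append((ind, c, d))
    return clashes

print(find_clashes(abox, disjoint))
# [('SARS-CoV-2', 'Virus', 'Bacteria')]
```

A full DL reasoner additionally derives concept memberships (e.g. via axioms (6)-(8)) before checking for such clashes; this sketch only covers directly asserted memberships.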
Two more examples of such antipatterns are:

Antipattern 1 (Onlyness Is Loneliness - OIL)

  A ⊑ ∀r.B
  A ⊑ ∀r.C
  B ⊑ ¬C

Here, concept A can only be linked with role r to B. Next, A can only be linked with role r to C, which is disjoint with B.

Example 6 (OIL antipattern) Assume that axioms UE2 and UE3 come from a trusted source, while axiom UE1 comes from the social web. By combining all three axioms, a reasoner will signal the inconsistency or incoherence. The technical difficulty is that information from the social web comes in natural language. Sample medical misconceptions on Covid-19 are collected in Table 2. Organisations such as the WHO provide facts for some myths (denoted f_i in the table). Let for instance myth m1 ("Covid-19 spreads via 5G mobile networks") with the formalisation:

  5G : MobileNetworks      (24)
  Covid-19 : Virus         (25)
  spread(Covid-19, 5G)     (26)

Assume the following formalisation for the corresponding fact f1:

  Virus ⊑ ¬(∃travel.MobileNetworks)    (27)

The following line of reasoning signals that the ontology is inconsistent:

  Virus ⊑ ¬(∃travel.MobileNetworks)    (28)
  Virus ⊑ ∀travel.¬MobileNetworks      (29)
  Virus ⊑ ∀spread.¬MobileNetworks      (30)

Here we need the subsumption relation between the roles (travel ⊑ spread). The reasoner finds that the individual 5G (which is a mobile network by axiom (24)) is a spread-successor of Covid-19 (which is a virus by axiom (25)), in conflict with axiom (30).
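The m1 conflict can be sketched procedurally (an illustrative approximation of what the reasoner does, with assumed data-structure names): close the role assertions under the role hierarchy, then test the value restriction of axiom (30) against the merged ABox.

```python
# Sketch: detect the 5G-myth conflict. Role assertions are closed under the
# hierarchy travel ⊑ spread, then the trusted axiom
# Virus ⊑ ∀spread.¬MobileNetworks is checked against the ABox.

abox_concepts = {"5G": {"MobileNetworks"}, "Covid-19": {"Virus"}}
abox_roles = {("Covid-19", "5G", "travel")}     # from the myth
role_hierarchy = {"travel": "spread"}           # travel ⊑ spread

# Close role assertions under role subsumption.
closed = set(abox_roles)
for (s, o, r) in abox_roles:
    if r in role_hierarchy:
        closed.add((s, o, role_hierarchy[r]))

# Virus ⊑ ∀spread.¬MobileNetworks: no spread-successor of a Virus
# may be a MobileNetworks instance.
conflicts = [(s, o) for (s, o, r) in closed
             if r == "spread"
             and "Virus" in abox_concepts.get(s, set())
             and "MobileNetworks" in abox_concepts.get(o, set())]
print(conflicts)   # [('Covid-19', '5G')]
```

The single role-closure step suffices here because the hierarchy has depth one; a general implementation would iterate to a fixpoint.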
Table 2 (fragment): Covid-19 myths (m_i) and the corresponding facts (f_i)

  m...: Regularly rinsing your nose with saline helps prevent infection with Covid-19
  f...: There is no evidence that regularly rinsing the nose with saline has protected people from infection with the new coronavirus
  m7: Eating raw ginger counters the coronavirus
  f7: There is no evidence that eating garlic has protected people from the new coronavirus
  m9: The new coronavirus can be spread by Chinese food
  f9: The new coronavirus cannot be transmitted through food
  m10: Hand dryers are effective in killing the new coronavirus
  f10: Hand dryers are not effective in killing the 2019-nCoV
  m11: Cold weather and snow can kill the new coronavirus
  f11: Cold weather and snow cannot kill the new coronavirus
  m12: Taking a hot bath prevents the new coronavirus disease
  f12: Taking a hot bath will not prevent you from catching Covid-19
  m13: Ultraviolet disinfection lamps kill the new coronavirus
  f13: Ultraviolet lamps should not be used to sterilise hands or other areas of skin, as UV radiation can cause skin irritation
  m14: Spraying alcohol or chlorine all over your body kills the new coronavirus
  f14: Spraying alcohol or chlorine all over your body will not kill viruses that have already entered your body
  m15: Vaccines against pneumonia protect against the new coronavirus

As a second example, let the myth m33 in Table 2 ("Covid-19 affects only the elderly"):

  Covid-19 ⊑ ∀affects.Elderly

The corresponding fact f33 states that people of all ages can be infected by the new coronavirus. The inconsistency will be detected on an ABox that contains an individual affected by Covid-19 who is not elderly, e.g.:

  affectedBy(jon, Covid-19)
  hasAge(jon, 40)

We also need some background knowledge, such as a definition of the elderly, the inverse role of affects, and a nominal for Covid-19:

  Elderly ≡ Person ⊓ ∃hasAge.(≥ 65)
  affects⁻ ≡ affectedBy
  Covid-19 ≡ {Covid-19}

Based on the definition of Elderly and on jon's age, the reasoner learns that jon does not belong to that concept (i.e. jon : ¬Elderly). From the inverse roles affects⁻ ≡ affectedBy, one learns that the virus Covid-19 affects jon. Since the concept Covid-19 includes only the individual with the same name Covid-19 (defined with the one-of constructor for nominals), the reasoner will be able to detect the inconsistency.
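The two inference steps in the elderly example, deciding concept membership from a concrete age and deriving the inverse role assertion, can be sketched as follows (the cut-off value and all names are assumptions for illustration):

```python
# Sketch: (1) decide jon ∉ Elderly from his age, using an assumed cut-off
# for the Elderly definition; (2) derive affects(Covid-19, jon) from the
# inverse role equivalence affects⁻ ≡ affectedBy.

ELDERLY_MIN_AGE = 65    # assumed threshold in the definition of Elderly

abox = {
    "age": {"jon": 40},
    "affectedBy": {("jon", "Covid-19")},
}

# Concrete-domain style membership test for Elderly.
is_elderly = abox["age"]["jon"] >= ELDERLY_MIN_AGE

# Inverse role: every affectedBy(x, y) yields affects(y, x).
affects = {(o, s) for (s, o) in abox["affectedBy"]}

print(is_elderly, ("Covid-19", "jon") in affects)   # False True
```

Together with the value restriction from m33 (Covid-19 ⊑ ∀affects.Elderly), the derived assertion affects(Covid-19, jon) and jon : ¬Elderly produce the clash that a reasoner reports.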
Note that we need some background knowledge (like the definition of Elderly) to signal the conflict. Note also the need for a trusted Covid-19 ontology. There is ongoing work on formalising knowledge about Covid-19. First, the Coronavirus Infectious Disease Ontology (CIDO). Second, the Semantics for Covid-19 Discovery adds semantic annotations to the CORD-19 dataset. The CORD-19 dataset was obtained by automatically analysing publications on Covid-19. Note also that the above formalisation was obtained manually. Yet, in most cases we need automatic translation from natural language to description logic. Transforming unstructured text into a formal representation is an important task for the Semantic Web. Several tools contribute towards this aim: FRED [12], OpenIE-based approaches [17], controlled languages (e.g. ACE), Framester [11], or KNEWS [6]. I use here the FRED tool, which takes a text in natural language and outputs a formalisation in description logic. FRED is a machine reader for the Semantic Web that relies on Discourse Representation Theory, frame semantics and ontology design patterns [8, 12]. FRED leverages multiple natural language processing (NLP) components by integrating their outputs into a unified result, which is formalised as an RDF/OWL graph. FRED relies on several linguistic resources (Table 3). VerbNet [19] contains semantic roles and patterns that are structured into a taxonomy. FrameNet [5] introduces frames to describe a situation, state or action. The elements of a frame include: agent, patient, time, location. A frame is usually expressed by verbs or other linguistic constructions, hence all occurrences of frames are formalised as OWL n-ary relations, all being instances of some type of event or situation.
We exemplify next how FRED handles linked data, compositional semantics, plurals, modality and negation, with examples related to Covid-19. Let the myth "Hand dryers are effective in killing the new coronavirus", whose automatic translation in DL appears in Figure 1. FRED creates the individual situation_1 : Situation. The role involves from the Boxing ontology is used to relate situation_1 with the instance hand_dryers_1: boxing:involves(situation_1, hand_dryers_1). Note that hand_dryers_1 is an instance of the concept Hand_dryer from DBpedia. The plural is formalised by the role hasQuantifier from the Quant ontology: q:hasQuantifier(hand_dryers_1, q:multiple). The information that hand dryers are effective is modelled with the role hasQuality from the Dolce ontology: dul:hasQuality(hand_dryers_1, effective). Note also that the instance effective is related to the instance situation_1 with the role involves: boxing:involves(situation_1, effective). The instance kill_1 is identified as an instance of the verb Kill_42030000 from VerbNet and also as an instance of the Event concept from the Dolce ontology. Here the concept New is identified as a subclass of the Quality concept from Dolce. Note that FRED has successfully linked the information from the myth with relevant concepts from the DBpedia, VerbNet, and Dolce ontologies. It also nicely formalises the plural of "dryers" and uses compositional semantics for "hand dryers" and "new coronavirus". Here, the instance kill_1 has the object coronavirus_1 as patient. (Note that the Patient role has the semantics from the VerbNet ontology and there is no connection with a patient as a person suffering from the disease.) Also, the instance kill_1 has as Agent something (i.e.
thing_1) in which situation_1 is located:

  in(situation_1, thing_1)                  (47)
  vn.role:Agent(kill_1, thing_1)            (48)
  vn.role:Patient(kill_1, coronavirus_1)    (49)

The resulting meaning would be: "The situation involving hand dryers is in something that kills the new coronavirus". One possible flaw in the automatic translation from Figure 1 is that hand dryers are identified as the same individual as the coronavirus: owl:sameAs(hand_dryers_1, coronavirus_1). This might be caused by the term "are" in the myth ("Hand dryers are ..."), which signals a possible definition or equivalence. This flaw requires post-processing. For instance, we can automatically remove all the sameAs relations from the generated ABox. Actually, the information encapsulated in the given sentence is: "Hand dryers kill coronavirus". Given this simplified version of the myth, FRED outputs the translation in Figure 2. Here the individual kill_1 is correctly linked with the corresponding verb from VerbNet and also identified as an event in Dolce. The instance kill_1 has the agent dryer_1 and the patient coronavirus_1. This corresponds to the intended semantics: hand dryers kill the coronavirus. Deceptive information makes extensive use of modalities. Since OWL lacks formal constructs to express modality, FRED uses the Modality class from the Boxing ontology: • boxing:Necessary: e.g. will, must, should. Let the following myth related to Covid-19: "You should take vitamin C" (Figure 3). The frame is formalised around the instance take_1. The instance is related to the corresponding verb from VerbNet and also to an event from the Dolce ontology. The agent of the take verb is a person and has the modality necessary. The individual C is an instance of the concept Vitamin. Although the above formalisation is correct, the following axioms are wrong. First, FRED links the concept Vitamin from the Covid-19 ontology with the Vitamin C singer from DBpedia.
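The suggested post-processing step, dropping every sameAs relation from the generated ABox, amounts to a simple filter over the RDF triples. A minimal sketch (the triple format and instance names are illustrative assumptions, not FRED's actual serialisation):

```python
# Post-processing sketch: remove every owl:sameAs statement from the
# ABox generated by FRED, fixing flaws such as
# owl:sameAs(hand_dryers_1, coronavirus_1).

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    ("hand_dryers_1", OWL_SAMEAS, "coronavirus_1"),   # spurious equivalence
    ("hand_dryers_1", "boxing:involves", "situation_1"),
    ("kill_1", "vn.role:Patient", "coronavirus_1"),
]

cleaned = [(s, p, o) for (s, p, o) in triples if p != OWL_SAMEAS]
print(len(cleaned))   # 2
```

In a real pipeline the same filter would be expressed over the RDF graph returned by FRED (e.g. with an RDF library or a SPARQL DELETE), before handing the axioms to the reasoner.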
Second, the concept Person from the Covid-19 ontology is linked with the Hybrid Theory album from DBpedia, instead of the Person from schema.org. By performing word sense disambiguation (see Figure 4), FRED correctly links the vitamin C concept with the noun vitamin from WordNet, which is a subclass of the substance concept in WordNet and also of PhysicalObject from Dolce. Most of the myths are in positive form. For instance, in Table 2 only myths m3 and m22 include negation. Let the translation of myth m3 in Figure 5. The frame is built around the recover_1 event (recover_1 is an instance of the dul:Event concept). Indeed, FRED signals that the event recover_1: • has truth value false (axiom 51) • has modality "possible" (axiom 52) • has agent a person (axiom 53) • has source an infection of type coronavirus (axiom 54). The axioms in Figure 9 state that an elderly is a person and that the instance person_1 is not elderly. The conflict detection pattern is built on the quality Only (i.e. ∃dul:hasQuality.Only). The SWRL rule states that for each individual ?x with the quality only that is related via the role experiencer to two distinct individuals ?y and ?z (where ?y is an instance of the concept Elderly), the individual ?z is also an instance of Elderly. The conflict comes from the fact that person_1 is not an instance of Elderly, but is still affected by Covid-19 (i.e. experiencer(affect_1, person_1)). The system architecture appears in Figure 10. We start with a core ontology for Covid-19. This ontology is enriched with trusted facts on Covid-19 using the FRED converter. Information from untrusted sources is also formalised in DL using FRED. The merged axioms are given to Racer, which is able to signal conflicts. To help the user understand which knowledge from the ontology is causing incoherences, we use Racer's explanation capabilities.
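The SWRL rule above can be approximated as a single forward-chaining step over the extracted assertions (an illustrative sketch with assumed instance names, not the actual SWRL engine): if ?x has the quality only and one of its experiencers is Elderly, every other experiencer of ?x is derived to be Elderly, and a clash is reported for any experiencer asserted not Elderly.

```python
# Sketch of the conflict-detection rule as forward chaining.
# If hasQuality(?x, only) ∧ experiencer(?x, ?y) ∧ Elderly(?y)
#    ∧ experiencer(?x, ?z)  then  Elderly(?z).
# A clash arises when some derived Elderly(?z) contradicts ¬Elderly(?z).

triples = {
    ("hasQuality", "affect_1", "only"),
    ("experiencer", "affect_1", "elderly_1"),
    ("experiencer", "affect_1", "person_1"),
}
elderly = {"elderly_1"}          # asserted Elderly instances
not_elderly = {"person_1"}       # asserted ¬Elderly instances (from fact f33)

derived = set()
for (p, x, q) in triples:
    if p == "hasQuality" and q == "only":
        ys = {y for (r, x2, y) in triples if r == "experiencer" and x2 == x}
        if ys & elderly:         # rule body matched with some Elderly ?y
            derived |= ys        # rule head: every experiencer ?z is Elderly

clash = sorted(derived & not_elderly)
print(clash)   # ['person_1']
```

The clash on person_1 is exactly the conflict the paper's pattern is designed to surface: the myth forces every experiencer to be Elderly, while the trusted fact asserts a non-elderly experiencer.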
RacerPro provides explanations for unsatisfiable concepts, for subsumption relationships, and for unsatisfiable ABoxes through the commands (check-abox-coherence), (check-tbox-coherence) and (check-ontology), or (retrieve-with-explanation). These explanations are given to an ontology verbaliser in order to generate natural language explanations of the conflict. We aim to collect a corpus of common misconceptions that are spread in online media, to analyse these misconceptions, and to build evidence-based counterarguments to each piece of deceptive information. We also aim to annotate this corpus with concepts and roles from trusted medical ontologies. Our topic is related to the more general issue of fake news [9]. Particular to the medical domain, there has been continuous concern about the reliability of online health information [1]. In this line, Waszak et al. have recently investigated the spread of fake medical news in social media [22]. Amith and Tao have formalised the Vaccine Misinformation Ontology (VAXMO) [3]. VAXMO extends the Misinformation Ontology, aiming to support vaccine misinformation detection and analysis. Teymourlouie et al. have recently analysed the importance of contextual knowledge in detecting ontology conflicts. The added contextual knowledge is applied in [20] to the task of debugging ontologies. In our case, the contextual ontology is represented by patterns of conflict detection between two merged ontologies. The output of FRED is given to the Racer reasoner, which detects conflicts based on a trusted medical source and conflict detection patterns. The FiB system [10] labels news as verified or non-verified. It crawls the Web for news similar to the current one and summarises them. The user reads the summaries and figures out which information from the initial news might be fake. We aim a step forward, towards automatically identifying possible inconsistencies between a given piece of news and the verified medical content.
The MERGILO tool reconciles knowledge graphs extracted from text, using graph alignment and word similarity [2]. One application area is to detect knowledge evolution across document versions. To obtain the formalisation of events, MERGILO uses both FRED and Framester. Instead of using metrics to compute graph similarity, I used here knowledge patterns to detect conflicts. Enriching ontologies with complex axioms has been given some consideration in the literature [13, 14]. The aim would be to bridge the gap between a document-centric and a model-centric view of information [14]. Gyawali et al. translate text in the SIDP format (i.e. System Installation Design Principle) into axioms in description logic. The proposed system combines an automatically derived lexicon with a hand-written grammar to automatically generate axioms. Here, the core Covid-19 ontology is enriched with axioms generated by FRED fed with facts in natural language. Instead of a grammar, I formalised knowledge patterns (e.g. axioms in DL or SWRL rules) to detect conflicts. Conflict detection depends heavily on the performance of the FRED translator. One can replace FRED with related tools such as Framester [11] or KNEWS [6]. Framester is a large RDF knowledge graph (about 30 million RDF triples) acting as an umbrella for FrameNet, WordNet, VerbNet, BabelNet, and the Predicate Matrix. In contrast to FRED, KNEWS (Knowledge Extraction With Semantics) can be configured to use different external modules as input, but also to produce different output modes (i.e. frame instances, word-aligned semantics or first-order logic). The frame representation outputs RDF tuples in line with the FrameBase model. First-order logic formulae are in a syntax similar to TPTP and include WordNet synsets and DBpedia ids as symbols [6]. Even if fake news in the health domain is old hat, many technical challenges remain to effectively fight against medical myths.
This is preliminary work on combining two heavy machineries, natural language processing and ontology reasoning, aiming to signal fake information related to Covid-19. The ongoing work includes: i) system evaluation and ii) verbalising explanations for each identified conflict.

References

[1] Revisiting the online health information reliability debate in the wake of Web 2.0: An inter-disciplinary literature and website review
[2] Event-based knowledge reconciliation using frame embeddings and frame similarity. Knowledge-Based Systems
[3] Representing vaccine misinformation using ontologies
[4] The Description Logic handbook: Theory, implementation and applications
[5] The Berkeley FrameNet project
[6] KNEWS: Using logical and lexical semantics to extract knowledge from natural language
[7] The COVID-19 social media infodemic
[8] FRED: From natural language text to RDF and OWL in one click
[9] The current state of fake news: challenges and opportunities
[10] The current state of fake news: challenges and opportunities
[11] Framester: a wide coverage linguistic linked data hub
[12] Semantic Web machine reading with FRED
[13] Ontology enrichment using semantic wikis and design patterns
[14] Mapping natural language to description logic
[15] The RacerPro knowledge representation and reasoning system
[16] The science of fake news
[17] OpenIE-based approach for knowledge graph construction from text
[18] Antipattern detection: how to debug an ontology without a reasoner
[19] VerbNet: A broad-coverage, comprehensive verb lexicon
[20] Detecting hidden errors in an ontology using contextual knowledge
[21] The spread of true and false news online
[22] The spread of medical fake news in social media: the pilot quantitative study