Ontology-based semantic similarity: a new feature-based approach

David Sánchez¹, Montserrat Batet, David Isern, Aida Valls
Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA). Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Avda. Països Catalans, 26. 43007 Tarragona (Spain)

Abstract. Estimation of the semantic likeness between words is of great importance in many applications dealing with textual data, such as natural language processing, knowledge acquisition and information retrieval. Semantic similarity measures exploit knowledge sources as the basis for their estimations. In recent years, ontologies have grown in interest thanks to global initiatives such as the Semantic Web, offering a structured knowledge representation. Thanks to the possibilities that ontologies enable regarding the semantic interpretation of terms, many ontology-based similarity measures have been developed. According to the principle on which those measures base the similarity assessment, and the way in which ontologies are exploited or complemented with other sources, several families of measures can be identified. In this paper, we survey and classify most of the ontology-based approaches developed in order to evaluate their advantages and limitations and compare their expected performance from both theoretical and practical points of view. We also present a new ontology-based measure relying on the exploitation of taxonomical features. The evaluation and comparison of our approach's results against those reported by related works under a common framework suggest that our measure provides high accuracy without some of the limitations observed in other works.

Keywords: semantic similarity, semantic relatedness, ontologies, feature-based similarity, WordNet

¹ Corresponding author. Address: Departament d'Enginyeria Informàtica i Matemàtiques. Universitat Rovira i Virgili. Avda. Països Catalans, 26. 43007. Tarragona. Spain. Tel.: +34 977 556563; Fax: +34 977 559710; E-mail: david.sanchez@urv.net.

1. Introduction

With the enormous success of the Information Society and the World Wide Web, the amount of textual electronic information available has significantly increased. As a result, computer understanding of text has acquired great interest in the research community in order to enable a proper exploitation, management, classification or retrieval of textual data. One of the most basic problems when aiming to interpret textual data is the assessment of semantic likeness between terms because, as has been demonstrated in psychological experiments (Goldstone, 1994), it acts as a fundamental organizing principle by which humans organize and classify objects. It is important to note that two different paradigms can be found in the literature. On the one hand, semantic similarity states how taxonomically near two terms are, because they share some aspects of their meaning (e.g., dogs and cats are similar to the extent that they are mammals). On the other hand, the more general concept of semantic relatedness does not necessarily rely on a taxonomic relation (e.g., car and wheel, or pencil and paper); other non-taxonomic relationships (e.g., meronymy, antonymy, functionality, cause-effect, etc.) are also considered. Semantic similarity/relatedness computation has many direct and relevant applications.
Some basic natural language processing tasks such as word sense disambiguation (Patwardhan, Banerjee, & Pedersen, 2003), synonym detection (Lin, 1998) or automatic spelling error detection and correction (Budanitsky & Hirst, 2001) rely on the assessment of words' semantic resemblance. Direct applications can be found in the knowledge management field, such as thesauri generation (Curran, 2002), information extraction (M. Y. Chen, Chu, & Chen, 2010; P. Chen, Lin, & Chu, 2011; D. Sánchez & Isern, 2009; Stevenson & Greenwood, 2005) or ontology learning (D. Sánchez, 2010; David Sánchez & Antonio Moreno, 2008; D. Sánchez & A. Moreno, 2008), in which new terms related to already existing concepts should be acquired from textual resources. The Semantic Web is an especially relevant application area when dealing with automatic annotation of documents (Cimiano, Handschuh, & Staab, 2004; Chu, Chen, & Chen, 2009; D. Sánchez, Isern, & Millan, 2010) and text clustering (Song, Li, & Park, 2009).

Despite its usefulness, robust measurement of semantic similarity/relatedness between textual terms remains a challenging task (Bollegala, Matsuo, & Ishizuka, 2007). Many works have been developed in recent years, especially with the increasing interest in the Semantic Web and the popularization of ontologies (Lanzenberger, et al., 2008). Proposed methods aim to automatically assess a numerical score between a pair of terms according to the semantic evidence observed in one or several knowledge sources, which are used as semantic background. Ontologies have been of great interest for the semantic similarity research community as they offer a structured and unambiguous representation of knowledge in the form of conceptualizations interconnected by means of semantic pointers. These structures can be exploited in order to assess the degree of semantic proximity between terms. According to the principle on which the similarity/relatedness computation is based, and the way in which the ontology is exploited and/or complemented with other sources (e.g., thesauri, domain corpora, etc.), different families of methods can be identified.

In this paper we survey and compare most of the ontology-based similarity/relatedness measures developed in recent years. We have tried to collect as many relevant approaches as possible in order to offer an updated and detailed review and comparison of the expected performance of these measures from both theoretical and practical points of view. Concretely, for each family of functions, we identify their main advantages and limitations along the dimensions of expected accuracy, computational complexity, dependency on knowledge sources (type, size, structure-dependency and pre-processing) and parameter tuning. We also present a new feature-based method for semantic likeness assessment based on the exploitation of the taxonomical knowledge available in an ontology. By considering semantic evidence typically omitted by other feature-based approaches, this measure aims to rival state-of-the-art ontology-based approaches in terms of accuracy, while retaining a low computational complexity. In order to compare all the semantic similarity/relatedness measures in a practical setting and evaluate them against our own approach in an objective manner, we collected the results reported by the related works covered in the paper and tested our approach under the same conditions. Several widely used benchmarks have been considered in order to enable an objective comparison.
The rest of the paper is organized as follows. Section 2 surveys, reviews and compares related works in ontology-based semantic similarity/relatedness estimation, classifying them into different families. Section 3 presents the new method for measuring semantic likeness. Section 4 summarizes the evaluation results reported by related works for the analysed measures and tests the accuracy of our own approach under the same conditions. Section 5 discusses and compares the different measures according to the reported evaluation results. The final section contains the conclusions.

2. Ontology-based semantic similarity/relatedness

Ontologies provide a formal specification of a shared conceptualization (Guarino, 1998). Being machine readable and constructed from the consensus of a community of users or domain experts, they represent a very reliable and structured knowledge source. For this reason, and thanks to initiatives such as the Semantic Web, which has brought about the creation of thousands of domain ontologies (Ding, et al., 2004), ontologies have been extensively exploited in knowledge-based systems (Valls, Gibert, Sánchez, & Batet, 2010) and, more precisely, to compute semantic likeness. A paradigmatic example is WordNet (Fellbaum, 1998), a domain-independent, general purpose thesaurus that describes and organizes more than 100,000 general English concepts, which are semantically structured in an ontological fashion. It contains words (nouns, verbs, adjectives and adverbs) that are linked to sets of cognitive synonyms (synsets), each expressing a distinct concept (i.e., a word sense). Synsets are linked by means of conceptual-semantic and lexical relations such as synonymy, hypernymy (is-a), six types of meronymy (part-of), antonymy, complementarity and so on. The backbone of the network of words is the subsumption hierarchy, which accounts for more than 80% of all the modelled semantic links, with a maximum depth of 16 nodes. The result is a network of meaningfully related words, where the graph model can be exploited to interpret the meaning of concepts. In fact, WordNet has been used as the background ontology in all the related works.

In this section, we survey and compare related works in ontology-based similarity assessment according to the following classification: 1. Edge-counting approaches. 2. Feature-based measures. 3. Measures based on Information Content.

2.1. Edge-counting approaches

An ontology can be seen as a directed graph in which concepts are interrelated mainly by means of taxonomic (is-a) and, in some cases, non-taxonomic links. By mapping input terms to ontological concepts by means of their textual labels, a straightforward method to calculate their similarity is to compute the minimum Path Length connecting their corresponding ontological nodes via is-a links (Rada, Mili, Bichnell, & Blettner, 1989). The longer the path, the more semantically distant the terms are. Let us define path(a,b) = {l_1, ..., l_k} as a set of links connecting the terms a and b in a taxonomy, and let |path(a,b)| = k be the length of this path. Then, considering all the possible paths from a to b, their semantic distance as defined by (Rada, et al., 1989) is (1):

$dis_{rad}(a,b) = \min_{\forall i} |path_i(a,b)|$ (1)

Several variations and improvements of this edge-counting approach have been proposed.
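For illustration, the path computation of eq. (1) can be reproduced over WordNet's is-a graph. The following minimal sketch uses NLTK's WordNet interface (assuming the nltk package and its WordNet corpus are installed); NLTK also provides an equivalent built-in, Synset.shortest_path_distance().

```python
# A minimal sketch of Rada et al.'s path-length distance (eq. 1), assuming
# NLTK and its WordNet corpus are installed. The breadth-first search treats
# is-a links as undirected edges, so it returns the length of the shortest
# taxonomical path between two synsets (None if no path exists).
from collections import deque

from nltk.corpus import wordnet as wn

def rada_distance(s1, s2):
    """Shortest is-a path length between synsets s1 and s2."""
    if s1 == s2:
        return 0
    visited, frontier = {s1}, deque([(s1, 0)])
    while frontier:
        synset, dist = frontier.popleft()
        # Explore both upwards (hypernyms) and downwards (hyponyms).
        for neighbour in synset.hypernyms() + synset.hyponyms():
            if neighbour == s2:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return None

print(rada_distance(wn.synset('dog.n.01'), wn.synset('cat.n.01')))  # 4
```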
In addition to this absolute distance between terms, Wu and Palmer (Wu & Palmer, 1994) considered that the relative depth in the taxonomy of the concepts corresponding to the evaluated terms is an important dimension, because concept specializations become less distinct as they are recursively refined. So, equally distant pairs of concepts belonging to an upper level of a taxonomy should be considered less similar than those belonging to a lower level. Wu and Palmer's measure counts the number of is-a links (N_1 and N_2) from each term to their Least Common Subsumer (LCS) (i.e., the most concrete taxonomical ancestor that subsumes both terms), and also the number of is-a links from the LCS to the root of the ontology (N_3) (2):

$sim_{w\&p}(a,b) = \frac{2 \times N_3}{N_1 + N_2 + 2 \times N_3}$ (2)

Based on the same principle, Leacock and Chodorow (Leacock & Chodorow, 1998) also proposed a measure that considers, in a non-linear fashion, both the number of nodes N_p separating the ontological nodes corresponding to terms a and b (including themselves) and the depth D of the taxonomy in which they occur (3):

$sim_{l\&c}(a,b) = -\log(N_p / 2D)$ (3)

Li et al. (Li, Bandar, & McLean, 2003) also proposed a similarity measure that combines the shortest path length and the depth of the ontology in a non-linear function (4):

$sim_{li}(a,b) = e^{-\alpha \cdot \min_{\forall i}|path_i(a,b)|} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}$ (4)

where h is the minimum depth of the LCS in the hierarchy, and α ≥ 0 and β > 0 are parameters scaling the contribution of the shortest path length and the depth, respectively. Based on benchmark data, the authors stated that the optimal parameters for the measure with respect to a concrete set of human judgements were α = 0.2 and β = 0.6. However, this is just an empirical finding for a specific setting; it lacks a theoretical basis and cannot be generalized.

Al-Mubaid and Nguyen (H. Al-Mubaid & Nguyen, 2006) proposed a cluster-based measure that combines the minimum path length and the taxonomical depth. They define clusters for each of the branches in the hierarchy with respect to the root node. They measure the common specificity of two terms by subtracting the depth of their LCS from the depth D_c of the cluster (5):

$CSpec(a,b) = D_c - depth(LCS(a,b))$ (5)

The common specificity is used to consider that lower-level pairs of concept nodes are more similar than higher-level pairs, as in Wu and Palmer's approach. So, the proposed distance measure (sem) is defined as follows (6):

$dis_{sem}(a,b) = \log((\min_{\forall i}|path_i(a,b)| - 1)^{\alpha} \times CSpec(a,b)^{\beta} + k)$ (6)

where α > 0 and β > 0 are contribution factors of the path length and common specificity features, and k is a constant. The authors use k = 1, since they proved that the distance is positive for k ≥ 1. Moreover, in their experiments, they give equal weight to the contribution of the two components (path length and common specificity) by using α = β = 1.

Both Li et al.'s and Al-Mubaid and Nguyen's approaches are often considered in the literature (Petrakis, Varelas, Hliaoutakis, & Raftopoulou, 2006; Pirró, 2009) as "hybrid" approaches, as they combine several structural characteristics (such as path length, depth and local density) and assign weights to balance the contribution of each component to the final similarity value. Even though their accuracy for a concrete scenario (see the evaluation section) is higher than that of more basic edge-counting measures, they depend on the empirical tuning of weights according to the ontology and the input terms.
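For reference, ready-made implementations of the two depth-aware measures above are available in NLTK's WordNet module, under the same setup assumed in the previous sketch:

```python
# wup_similarity implements Wu and Palmer (eq. 2) and lch_similarity
# implements Leacock and Chodorow (eq. 3).
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.wup_similarity(cat))  # Wu & Palmer, bounded in (0, 1]
print(dog.lch_similarity(cat))  # Leacock & Chodorow, requires same POS
```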
Hirst and St-Onge (Hirst & St-Onge, 1998) extended the notion of taxonomical edge-counting by also considering non-taxonomic semantic links in the path (full_path). All types of relations found in WordNet, together with rules that restrict possible semantic chains, are considered, along with the intuition that the longer the path and the more changes in relation direction, the lower the likeness. The following path directions are considered: upward (such as hypernymy and meronymy), downward (such as hyponymy and holonymy) and horizontal (such as antonymy). The resulting formula is (7):

$sim_{h\&s}(a,b) = C - full\_path(a,b) - k \times turns(a,b)$ (7)

where C and k are constants (C = 8 and k = 1 are used by the authors), and turns(a,b) is the number of times the path's direction changes. Due to the non-taxonomic nature of some of the relations considered during the assessment, Hirst and St-Onge's measure captures a more general sense of relatedness than the taxonomical similarity assessed by the approaches detailed above.

The main advantage of the presented measures is their simplicity. They only rely on the graph model of an input ontology, whose evaluation requires a low computational cost (in comparison to approaches dealing with text corpora; see Section 2.3). However, several limitations hamper their performance. First, edge-counting measures only consider the shortest path between concept pairs. When they are applied to wide and detailed ontologies such as WordNet, which incorporate multiple taxonomical inheritance, the result is that several taxonomical paths are not taken into account. Other features that also influence concept semantics, such as the number and distribution of common and non-common taxonomical ancestors, are not considered either. As a result, by taking only the minimum path between concepts, much of the taxonomical knowledge explicitly modelled in the ontology is omitted. Another commonly acknowledged problem of path-based measures (Bollegala, Matsuo, & Ishizuka, 2009; Wan & Angryk, 2007) is that they rely on the notion that all links in the taxonomy represent a uniform distance. In practice, the semantic distance among concept specializations/generalizations in an ontology depends on the degree of granularity and taxonomic detail implemented by the knowledge engineer.

2.2. Feature-based measures

Feature-based methods try to overcome the limitation of path-based measures arising from the fact that taxonomical links in an ontology do not necessarily represent uniform distances. This is addressed by considering the degree of overlap between sets of ontological features. As a result, they are more general and could potentially be applied in cross-ontology similarity estimation settings (i.e., when concept pairs belong to two different ontologies), a situation in which edge-counting methods cannot be directly applied (Petrakis, et al., 2006). So, in contrast to edge-counting measures which, as stated above, are based on the notion of minimum path distance, feature-based approaches assess the similarity between concepts as a function of their properties. This is based on Tversky's model of similarity (8) which, derived from set theory, takes into account the common and non-common features of the compared terms, subtracting the latter from the former. In fact, common features tend to increase similarity and non-common ones tend to diminish it (Tversky, 1977).
Formally, let Ψ(a) and Ψ(b) be the features of terms a and b respectively, let Ψ(a) ∩ Ψ(b) be the intersection of those two sets of features, and Ψ(a)\Ψ(b) the set obtained by eliminating the elements of Ψ(b) from the set of features of concept a, Ψ(a). Then, the similarity between a and b is computed as a function of Ψ(a) ∩ Ψ(b), Ψ(a)\Ψ(b) and Ψ(b)\Ψ(a):

$sim_{tve}(a,b) = \alpha \cdot F(\Psi(a) \cap \Psi(b)) - \beta \cdot F(\Psi(a) \setminus \Psi(b)) - \gamma \cdot F(\Psi(b) \setminus \Psi(a))$ (8)

where F is a function that reflects the salience of a set of features, and α, β and γ are parameters that weight the contribution of each component. The definition of the set of features is crucial in this model. Existing approaches rely on information that is available in ontologies; in particular, the set of synonyms (called synsets in WordNet), definitions (i.e., glosses, containing textual descriptions of word senses) and different kinds of semantic relationships are considered.

In Rodriguez and Egenhofer (Rodríguez & Egenhofer, 2003), the similarity is computed as the weighted sum of the similarities between the synsets, features (e.g., meronyms, attributes, etc.) and neighbour concepts (those linked via semantic pointers) of the evaluated terms (9):

$sim_{r\&e}(a,b) = w \cdot S_{synsets}(a,b) + u \cdot S_{features}(a,b) + v \cdot S_{neighborhoods}(a,b)$ (9)

where w, u and v weight the contribution of each component, which depends on the characteristics of the ontology, and S represents the overlap between the different features, computed as:

$S(A,B) = \frac{|A \cap B|}{|A \cap B| + \gamma(a,b) \cdot |A \setminus B| + (1 - \gamma(a,b)) \cdot |B \setminus A|}$ (10)

where A and B are the sets of terms describing the concepts corresponding to a and b, A\B is the set of terms in A but not in B, and B\A is the set of terms in B but not in A. Finally, γ(a,b) is computed as a function of the depth of a and b in the taxonomy as follows:

$\gamma(a,b) = \begin{cases} \frac{depth(a)}{depth(a)+depth(b)} & \text{if } depth(a) \leq depth(b) \\ 1 - \frac{depth(a)}{depth(a)+depth(b)} & \text{if } depth(a) > depth(b) \end{cases}$ (11)

In Petrakis et al. (Petrakis, et al., 2006), a feature-based function called X-similarity relies on the matching between synsets and concept glosses extracted from WordNet (i.e., words extracted by parsing term definitions). They consider that two terms are similar if the synsets and glosses of their concepts, and those of the concepts in their neighbourhood (following semantic relations), are lexically similar. The similarity function is expressed as follows:

$sim_{X-Similarity}(a,b) = \begin{cases} 1 & \text{if } S_{synsets}(a,b) > 0 \\ \max\{S_{neighborhoods}(a,b), S_{glosses}(a,b)\} & \text{if } S_{synsets}(a,b) = 0 \end{cases}$ (12)

The similarity for the semantic neighbourhoods, S_neighborhoods, is calculated as follows:

$S_{neighborhoods}(a,b) = \max_{i \in R} \frac{|A_i \cap B_i|}{|A_i \cup B_i|}$ (13)

where each different semantic relation type i in the set R (i.e., is-a and part-of in WordNet) is computed separately and the maximum (considering all the synsets of all concepts up to the root of each hierarchy) is taken. Equivalently, the similarities for glosses, S_glosses, and synonyms, S_synsets, are both computed as:

$S(a,b) = \frac{|A \cap B|}{|A \cup B|}$ (14)

where A and B denote the sets of synsets or gloss words for the terms a and b.

Feature-based measures exploit more semantic knowledge than edge-counting approaches, evaluating both the commonalities and the differences of the compared concepts. However, by relying on features like glosses or synsets (in addition to taxonomic and non-taxonomic relationships), those measures limit their applicability to ontologies in which this information is available. Only big ontologies/thesauri like WordNet include this kind of information.
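To make the set-overlap computations concrete, the sketch below instantiates Tversky's model (eq. 8) and the overlap coefficient of eq. (14) over WordNet synonyms and gloss words. The choice of feature sets, the use of set cardinality as the salience function F, and the α, β, γ values are illustrative assumptions, not the exact configurations of the surveyed measures.

```python
# A minimal sketch of the set-overlap computations of eqs. (8) and (14),
# using WordNet synonyms and gloss words as the feature sets Psi.
from nltk.corpus import wordnet as wn

def features(synset):
    """Feature set Psi: synonym labels plus the words of the gloss."""
    return set(synset.lemma_names()) | set(synset.definition().split())

def sim_tversky(a, b, alpha=1.0, beta=0.5, gamma=0.5):
    # Common features increase similarity; differential ones diminish it.
    fa, fb = features(a), features(b)
    return alpha * len(fa & fb) - beta * len(fa - fb) - gamma * len(fb - fa)

def overlap(a, b):
    """Eq. (14): |A intersect B| / |A union B| (Jaccard coefficient)."""
    fa, fb = features(a), features(b)
    return len(fa & fb) / len(fa | fb)

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(sim_tversky(dog, cat), overlap(dog, cat))
```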
Indeed, an investigation of the structure of existing ontologies via the Swoogle ontology search engine (Ding, et al., 2004) reveals that domain ontologies only occasionally model semantic features apart from taxonomical relationships. Another problem is their dependency on the weighting parameters that balance the contribution of each feature. In all cases, those parameters should be tuned according to the nature of the ontology and even to the evaluated terms. This hampers their applicability as a general purpose solution. Only the measure of Petrakis et al. (Petrakis, et al., 2006) does not depend on weighting parameters, as the maximum similarity provided by each feature alone is taken. Even though this adapts the behaviour of the measure to the characteristics of the ontology, the contribution of the other features is omitted if only the maximum value is taken each time.

2.3. Measures based on Information Content

Also acknowledging some of the limitations of edge-counting approaches, Resnik (Resnik, 1995) proposed to complement the taxonomical structure of an ontology with the information distribution of the evaluated concepts in input corpora. He exploited the notion of Information Content (IC), by associating appearance probabilities to each concept in the taxonomy, computed from their occurrences in a given corpus. The IC of a term a is computed as the negative log of its probability of occurrence, p(a) (15). In this manner, infrequent words are considered more informative than common ones.

$IC(a) = -\log p(a)$ (15)

According to Resnik, semantic similarity depends on the amount of shared information between two terms, a dimension which is represented by their Least Common Subsumer (LCS) in an ontology. Two terms are maximally dissimilar if an LCS does not exist (i.e., in terms of edge-counting, it would not be possible to find a path connecting them). Otherwise, their similarity is computed as the IC of the LCS (16):

$sim_{res}(a,b) = IC(LCS(a,b))$ (16)

One of the problems of Resnik's metric is that any pair of terms having the same LCS results in exactly the same semantic similarity. Both Lin (Lin, 1998) and Jiang and Conrath (Jiang & Conrath, 1997) extended Resnik's work by also considering the IC of each of the evaluated terms. Lin proposed that the similarity between two terms should be measured as the ratio between the amount of information needed to state their commonality and the information needed to fully describe them. As a corollary, his measure considers, on the one hand, commonality in the same manner as Resnik's approach and, on the other hand, the IC of each concept alone (17):

$sim_{lin}(a,b) = \frac{2 \times sim_{res}(a,b)}{IC(a) + IC(b)}$ (17)

The measure proposed by Jiang and Conrath quantifies, in some way, the length of the taxonomical links as the difference between the IC of a concept and that of its subsumer. When comparing term pairs, they compute the distance by subtracting twice the IC of their LCS from the sum of the ICs of each term alone (18):

$dis_{j\&c}(a,b) = (IC(a) + IC(b)) - 2 \times sim_{res}(a,b)$ (18)

It is important to note that, in order to behave properly, IC-based measures require the probability of appearance p to monotonically increase as one moves up in the taxonomy (i.e., $\forall c_i$: if $c_j$ is a hypernym of $c_i$, then $p(c_i) \leq p(c_j)$). This is achieved by computing p(a) as the probability of encountering any instance of a in the given corpus.
In practice, each individual occurrence of any noun in the corpus is counted as an occurrence of each taxonomic class containing it (19) (Resnik, 1995):

$p(a) = \frac{\sum_{w \in W(a)} count(w)}{N}$ (19)

where W(a) is the set of nouns in the corpus whose senses are subsumed by a, and N is the total number of nouns in the corpus. As a result, an accurate computation of concept probabilities requires a proper disambiguation and annotation of each noun found in the corpus. This process is usually done manually to ensure the correctness of the tagging, hampering the scalability and applicability of this approach with large corpora. Moreover, if either the taxonomy or the corpus changes, re-computations need to be recursively executed for the affected concepts. So, a manual and time-consuming analysis of corpora is necessary, and the resulting probabilities depend on the size and nature of the input corpora. Moreover, the background taxonomy must be as complete as possible (i.e., it should include most of the specializations of each concept) in order to provide reliable results. Partial taxonomies with a limited scope may not be suitable for this purpose. All those aspects limit the scalability and applicability of those approaches (David Sánchez, Batet, Valls, & Gibert, 2009).

Considering the limitations of IC-based approaches due to their dependency on corpora, some authors have tried to intrinsically derive IC values from an ontology. These works rely on the assumption that the taxonomic structure of ontologies like WordNet is organized in a meaningful way, according to the principle of cognitive saliency (Blank, 2003). This states that humans specialise concepts when they need to differentiate them from already existing ones. So, concepts with many hyponyms (i.e., specializations) are more general and provide less information than the concepts at the leaves of the hierarchy. From the Information Theory point of view, abstract ontological concepts are considered more likely to appear in a corpus, as they subsume many other ones. In this manner, the probability of appearance of a concept (i.e., its IC) is estimated as a function of its number of hyponyms and/or its relative depth in the taxonomy. Seco et al. (Seco, Veale, & Hayes, 2004) and Pirró and Seco (Pirró & Seco, 2008) base IC calculations on the number of hyponyms. With hypo(a) being the number of hyponyms of concept a, and max_nodes the number of hyponyms of the root node, they compute the IC of a concept in the following way (20):

$IC_{seco}(a) = 1 - \frac{\log(hypo(a) + 1)}{\log(max\_nodes)}$ (20)

The denominator ensures that IC values are normalized in the range [0..1]. This approach only considers the hyponyms of a given concept in the taxonomy; as a consequence, concepts with the same number of hyponyms but different degrees of generality receive the same IC value. In order to tackle this problem, and in the same manner as for edge-counting measures, Zhou et al. (Zhou, Wang, & Gu, 2008) proposed to complement hyponym-based IC computation with the relative depth of each concept in the taxonomy. The IC of a concept is computed as (21):

$IC_{zhou}(a) = k \left(1 - \frac{\log(hypo(a) + 1)}{\log(max\_nodes)}\right) + (1-k) \left(\frac{\log(depth(a))}{\log(max\_depth)}\right)$ (21)

In addition to hypo and max_nodes, which have the same meaning as in eq. (20), depth(a) corresponds to the depth of concept a in the taxonomy and max_depth is the maximum depth of the taxonomy. The factor k adjusts the weight of the two features involved in the IC assessment. The authors use k = 0.5.
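The intrinsic IC of eq. (20) is simple to compute over WordNet by counting the transitive closure of the hyponymy relation. The sketch below (an illustration under the same NLTK setup assumed in the earlier examples) plugs it into Lin's measure (eq. 17); using the total noun-synset count as max_nodes is an approximation of the hyponym count of the root concept.

```python
# A minimal sketch of intrinsic IC (eq. 20) over WordNet nouns, plugged
# into Lin's similarity (eq. 17).
import math

from nltk.corpus import wordnet as wn

MAX_NODES = len(list(wn.all_synsets('n')))  # approximates hypo(root)

def ic_seco(synset):
    # hypo(a): size of the transitive closure of the hyponymy relation.
    hypo = len(list(synset.closure(lambda s: s.hyponyms())))
    return 1.0 - math.log(hypo + 1) / math.log(MAX_NODES)

def lin_intrinsic(a, b):
    lcs = a.lowest_common_hypernyms(b)[0]  # Least Common Subsumer
    return 2.0 * ic_seco(lcs) / (ic_seco(a) + ic_seco(b))

print(lin_intrinsic(wn.synset('dog.n.01'), wn.synset('cat.n.01')))
```

For the corpus-based variants (eqs. 16-18), NLTK also ships pre-computed IC files (e.g., ic-brown.dat in its wordnet_ic corpus), usable through the res_similarity, lin_similarity and jcn_similarity methods.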
Both ways of intrinsically computing IC have been applied directly to the similarity functions proposed by Resnik, Lin, and Jiang and Conrath. Those approaches overcome most of the problems observed for corpus-based IC approaches (specifically, the need for corpus processing and the resulting data sparseness), competing with and even improving on them in terms of accuracy (as will be shown in the evaluation) when applied over WordNet. However, they require large and fine-grained taxonomies/ontologies with a detailed taxonomical structure in order to properly differentiate concepts' IC. For small or very specialized ontologies with a limited taxonomical depth and a low branching factor, the resulting IC values would be too homogeneous to enable a proper differentiation between concepts.

3. A new feature-based measure exploiting taxonomical knowledge

From the study of ontology-based semantic similarity measures presented in the previous section, the basic conclusions that we want to stress are the following:

• Pure ontology-based measures, like those based on edge-counting, features and intrinsic IC computation, are characterized by their simplicity and computational efficiency, as they only exploit the semantic network provided by the ontology. Edge-counting measures are the simplest ones and, in consequence, their accuracy has been surpassed (as will be shown in the evaluation section) by more complex approaches exploiting additional semantic evidence. Feature-based approaches, however, rely on features which are hardly found in domain ontologies, such as non-taxonomic relationships, attributes, synonym sets or glosses. In consequence, their applicability and accuracy depend on the availability of this information.

• Information Content approaches based on semantically-annotated textual data aim to improve pure ontology-based ones by capturing implicit semantics as a function of concept distribution in corpora. However, the association between the words found in a corpus and ontological concepts, which is needed to compute accurate concept appearance frequencies, is not straightforward, requiring a process of manual word sense tagging for disambiguation. This hampers the applicability of these methods in practice, which are also affected by corpora availability and data sparseness.

In this section we present a new measure that relies on taxonomical features extracted from an ontology. Being a feature-based method, our proposal follows a similar principle to Tversky's model (eq. 8), which considers that the similarity between two concepts can be computed as a function of their common and differential features. Unlike the previous feature-based approaches presented in Section 2.2, we rely only on taxonomic information. This is due to the fact that available ontologies rarely model other kinds of knowledge apart from taxonomical relationships (Ding, et al., 2004). As a matter of fact, as stated in Section 2, taxonomical knowledge represents more than 80% of the total amount of relationships modelled in WordNet. Moreover, in contrast to other feature-based measures, since only one type of feature is considered (i.e., taxonomic relationships), no tuning parameters are needed to weight the contribution of potentially scarce semantic features, overcoming one of the limitations observed in related works and improving the generality of our measure.

3.1. Evaluating concept dissimilarity
Our measure considers as features the taxonomical categorization of concepts given by the ontology in order to evaluate the amount of dissimilarity (i.e., semantic distance) between concepts. Concretely, we consider that a term can be semantically distinguished from others by comparing the set of concepts that subsume it. Using the same notation introduced by Tversky in (Tversky, 1977), the following definitions formalize this idea.

Definition 1. Let C be the set of concepts of an ontology; we define concept subsumption (≤) as a binary relation ≤ : C×C. Given two concepts c_i and c_j, c_i ≤ c_j is fulfilled if c_i is a hierarchical specialization of c_j or if c_i = c_j (i.e., they are the same concept, even though they could be expressed by means of equivalent synonyms). Including the concept itself in the subsumption relation assumes the notion of dominance as a reflexive relation (Partee, ter Meulen, & Wall, 1990).

Definition 2. The set of taxonomical features describing the concept a is defined in terms of the relation ≤ as:

$\phi(a) = \{c \in C \mid a \leq c\}$ (22)

It is important to note that several immediate generalizations (i.e., categorizations) per concept may be available in ontologies modelling multiple inheritance, such as in WordNet (Devitt & Vogel, 2004) or in detailed domain ontologies such as MeSH or SNOMED-CT (H. Al-Mubaid & Nguyen, 2006). So, in our case, the set of taxonomical features associated with a concept includes all the upper categories found when recursively traversing all the upper taxonomical paths modelled in the ontology for that concept. In contrast, related ontology-based works (Jiang & Conrath, 1997; Leacock & Chodorow, 1998; Lin, 1998; Rada, et al., 1989; Resnik, 1995; Wu & Palmer, 1994) deal with the case of multiple taxonomical generalizations per concept by taking the one that yields the maximum similarity with respect to the other concept (e.g., taking the minimum path between both concepts). In our opinion, this simplification omits a large amount of explicitly available knowledge (i.e., other generalizations and their corresponding taxonomical paths). So, an important characteristic of our definition of taxonomical features (22) is that it does not introduce this limitation.

Given two concepts which are semantically described according to their taxonomic features (those of eq. 22), we consider the degree of disjunction between their feature sets (non-common taxonomical features) to be a function of their distance (or dissimilarity), whereas the degree of overlap (common features) is proportional to their similarity. Considering the set of differential taxonomical features of a with respect to b as $\phi(a) \setminus \phi(b) = \phi(a) - \phi(b) = \{c \in C \mid c \in \phi(a) \wedge c \notin \phi(b)\}$, we formally define the dissimilarity between a and b as:

Definition 3. The dissimilarity $dis: C \times C \rightarrow \Re$ between a and b is given by the cardinality of the set of differential features of a with respect to b plus the cardinality of the set of differential features of b with respect to a:

$dis(a,b) = |\phi(a) \setminus \phi(b)| + |\phi(b) \setminus \phi(a)|$ (23)

In order to enable accurate comparisons between the dissimilarities computed for pairs of concepts belonging to taxonomical branches with different degrees of taxonomical detail or multiple inheritance, the value given by (23) should be normalized, taking into account the total size of the set of features. To include this normalizing factor, we divide the absolute dis value by the whole amount of features extracted for both terms.
This value corresponds to the sum of the cardinalities of the differential and common taxonomical feature sets, that is, $|\phi(a) \setminus \phi(b)| + |\phi(b) \setminus \phi(a)| + |\phi(a) \cap \phi(b)|$. This yields a relative dissimilarity value in the [0..1] interval. As a final consideration, many authors have argued that an information-theoretic approach, in which semantic features are evaluated in a non-linear fashion (H. Al-Mubaid & Nguyen, 2006; Leacock & Chodorow, 1998; Li, et al., 2003), better approximates the notion of similarity. Following a similar principle to intrinsic IC approaches (Seco, et al., 2004; Zhou, et al., 2008), in which the IC of a concept is computed as a non-linear function of the amount of semantic features, we introduce a logarithm in our calculation, obtaining the following formulation.

Definition 4. The normalized dissimilarity between a and b according to their taxonomical features is calculated as:

$dis_{norm}(a,b) = \log_2 \left(1 + \frac{|\phi(a) \setminus \phi(b)| + |\phi(b) \setminus \phi(a)|}{|\phi(a) \setminus \phi(b)| + |\phi(b) \setminus \phi(a)| + |\phi(a) \cap \phi(b)|}\right)$ (24)

In this expression, the logarithm is calculated after adding 1 to the ratio. In this manner, we avoid infinite values for equivalent terms (i.e., equal terms or exact synonyms corresponding to the same concept), which would lead to a zero numerator because there are no unique features differentiating them. Moreover, this expression keeps the value range of the results in the interval [0..1], the boundary values being:

min_dis = log2(1+0) = 0, for equivalent terms (i.e., $\phi(a) = \phi(b)$, that is, $\forall x : x \in \phi(a) \Leftrightarrow x \in \phi(b)$).

max_dis = log2(1+1) = 1, for maximally different terms with disjoint feature sets, resulting in an equal numerator and denominator (i.e., $\forall x : x \in \phi(a) \Rightarrow x \notin \phi(b)$).
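Eqs. (22)-(24) can be implemented compactly. The sketch below is an illustration over WordNet, under the same NLTK assumptions as in the earlier examples; the hypernym closure naturally gathers all taxonomical paths when multiple inheritance is present, and the min over word senses anticipates the generalization of eq. (25) introduced in Section 3.4.

```python
# A minimal sketch of the proposed measure (eqs. 22-24) over WordNet.
import math

from nltk.corpus import wordnet as wn

def phi(synset):
    """Taxonomical features: the concept itself plus all its subsumers."""
    return {synset} | set(synset.closure(lambda s: s.hypernyms()))

def dis_norm(a, b):
    fa, fb = phi(a), phi(b)
    diff = len(fa - fb) + len(fb - fa)   # non-common features
    return math.log2(1 + diff / (diff + len(fa & fb)))

def dis_generalized(word_a, word_b):
    """Minimum dissimilarity over all noun-sense pairs of two words."""
    return min(dis_norm(a, b)
               for a in wn.synsets(word_a, pos=wn.NOUN)
               for b in wn.synsets(word_b, pos=wn.NOUN))

print(dis_generalized('car', 'automobile'))  # 0.0: both map to car.n.01
```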
3.2. Example

To illustrate the behavior of our approach, let us consider the portion of an ontology shown in Figure 1. At the lowest level we have four concepts: surfing, sailing, swimming and sunbathing. All of them are beach leisure activities, but some of them are also sports related to wind and/or water. If only the minimal path connecting the concepts is considered, all of them are at the same distance, because they are all siblings with respect to the concept beach_leisure. So, this is the answer that would be given by the path-length measures presented in Section 2.1. With our proposal, which takes into account all the subsumers of the concepts, we are able to distinguish different levels of dissimilarity, which better captures the semantic distance that one would attach to those terms.

Figure 1. Ontology example.

In our case, the sets of features generated for the four concepts are:

φ(surfing) = {surfing, wind, water, sport, activity, beach_leisure}
φ(sailing) = {sailing, wind, water, sport, activity, beach_leisure}
φ(swimming) = {swimming, water, sport, activity, beach_leisure}
φ(sunbathing) = {sunbathing, beach_leisure, activity}

From these sets of features we can calculate the dissimilarity between any pair of terms. Table 1 contains the results of applying eq. (24). As the measure is symmetric, we present only the lower triangular distance matrix. As an example, the dissimilarity between sailing and sunbathing is calculated as follows:

$dis_{norm}(sailing, sunbathing) = \log_2 \left(1 + \frac{4 + 1}{4 + 1 + 2}\right) = 0.78$

where the number of differential taxonomical features of sailing with respect to sunbathing is 4 (i.e., sailing, wind, water and sport), the number of differential features of sunbathing with respect to sailing is 1 (i.e., sunbathing), and the set of common features has a cardinality of 2 (i.e., beach_leisure and activity).

Table 1. Dissimilarities calculated with disnorm for the leaf concepts presented in Figure 1.

Concepts | Surfing | Sailing | Swimming | Sunbathing
Surfing | 0 | | |
Sailing | 0.36 | 0 | |
Swimming | 0.51 | 0.51 | 0 |
Sunbathing | 0.78 | 0.78 | 0.65 | 0

For the ontology fragment in Figure 1, the concepts sailing and surfing are the most similar ones (least distant), as expected, because they are sports practised in the water with the help of the wind, as well as beach leisure activities (as explicitly stated in the ontology via taxonomical relationships). Swimming is more similar to sailing and surfing than sunbathing is, because it is also a water sport. In fact, sunbathing is only a relaxing activity on the beach, not related to sports, so it is considered semantically farthest from the rest of the concepts. All this semantic evidence is properly captured by our approach by exploiting taxonomical knowledge (especially in the case of multiple inheritance), as presented in Table 1.

3.3. Properties

In order to show the validity of the presented measure, we have studied the properties that a dissimilarity measure must fulfill. It is important to note that the fulfilment of those properties is a requirement if the measure is to be used in conjunction with some reasoning or data mining techniques (for example, similarity is a core element in case-based reasoning stages: case base building, case retrieval, and even case adaptation (O'Sullivan, Smyth, & Wilson, 2005)). In fact, the coherency of the results obtained by clustering algorithms relying on similarity measures may depend on the fulfilment of those properties (Everitt, Landau, & Leese, 2001). In this respect, several of the measures proposed in related works (especially weighted and feature-based ones (Li, et al., 2003; Rodríguez & Egenhofer, 2003; Tversky, 1977)) do not fulfil basic properties such as minimality, hampering their applicability in some data mining algorithms.

Proposition 1. The function disnorm fulfills the properties of dissimilarity measures (Euzenat & Shvaiko, 2007):

$\forall a, b \in O, \; dis(a,b) \geq 0$ (positiveness)
$\forall a \in O, \; dis(a,a) = 0$ (minimality)
$\forall a, b \in O, \; dis(a,b) = dis(b,a)$ (symmetry)

Proof. By adding 1 to the inner expression of the logarithm, values range positively from 0 to 1, so positiveness is ensured.
Equivalent terms result in an empty set of unique features and in a disnorm value of log2(1) = 0, fulfilling the second property, which says that the minimum dissimilarity must be given by the comparison of an item with itself. Finally, as a and b are evaluated according to their feature sets (i.e., the order of the elements in the sets is irrelevant) and the expression is invariant to swapping a and b (the two set differences are added together and the intersection is commutative), the measure fulfils the symmetry property. □

3.4. Dealing with polysemic terms

When input terms are raw words extracted from text, some of them may be polysemous. In general ontologies such as WordNet, polysemic words correspond to several concepts (i.e., one per word sense), which can be found by mapping words to concept synsets. As a matter of fact, in WordNet 2, polysemic nouns correspond to an average of 2.77 concepts². A proper disambiguation of the input terms may solve the ambiguity, assigning input words to unique ontological concepts. However, as stated in the introduction, semantic similarity is promoted precisely for applications dealing with various levels of ambiguity, because in texts we find words rather than concepts (Budanitsky & Hirst, 2006). In previous works, polysemic words have been tackled by retrieving all the possible concepts corresponding to a term, computing individual similarities for each possible pair of concepts and selecting, as the final result, the maximum similarity value obtained. The rationale for this criterion is that, in order to evaluate the similarity between two non-disambiguated words (i.e., no context is available), human subjects would pay more attention to their similarities (i.e., their most related senses) rather than their differences, as has been demonstrated in psychological studies (Tversky, 1977). Therefore, we have taken the same approach to solve this problem, taking the minimum dissimilarity value obtained over all the possible combinations.

² http://wordnet.princeton.edu/wordnet/man2.1/wnstats.7WN.html

Definition 5. The generalized dissimilarity measure, which is able to deal with polysemic terms, is defined as:

$dis_{generalized}(a,b) = \min_{\forall a' \in A, \forall b' \in B} dis_{norm}(a', b')$ (25)

where A is the set of concepts (i.e., word senses) for the term a, and B likewise for the term b.

Proposition 2. The function disgeneralized fulfills the properties of dissimilarity measures: positiveness, minimality and symmetry.

Proof. The proof is straightforward, considering that disnorm fulfills positiveness, minimality and symmetry, and that the minimum operator preserves these properties. □

4. Results

As stated in (Bollegala, et al., 2009), an objective evaluation of the accuracy of a semantic similarity function is difficult because the notion of similarity is a subjective human judgement. In order to enable fair comparisons, several authors created evaluation benchmarks consisting of word pairs whose similarity was assessed by a set of humans. Rubenstein and Goodenough (Rubenstein & Goodenough, 1965) defined the first experiment in 1965, in which a group of 51 students, all native English speakers, assessed the similarity of 65 word pairs selected from ordinary English nouns on a scale from 0 (semantically unrelated) to 4 (highly synonymous). Miller and Charles (Miller & Charles, 1991) re-created the experiment in 1991 by taking a subset of 30 noun pairs whose similarity was reassessed by 38 undergraduate students. The correlation obtained with respect to the Rubenstein and Goodenough experiment was 0.97.
Resnik (Resnik, 1995) replicated the same experiment again in 1995, in this case requesting 10 computer science graduate students and post-doc researchers to assess similarity. The correlation with respect to the Miller and Charles results was 0.96. Finally, Pirró (Pirró, 2009) replicated and compared the three above experiments in 2009, involving 101 human subjects, both English and non-English native speakers. He obtained an average correlation of 0.97 with respect to the Rubenstein and Goodenough experiment, and 0.95 with respect to the Miller and Charles experiment. It is interesting to see the high correlation obtained between the experiments, even though they were performed over a period of more than 40 years and with heterogeneous sets of people. This shows that the similarity between the selected words is stable over the years, making them a reliable source for comparing measures. In fact, the Rubenstein and Goodenough and Miller and Charles benchmarks have become de facto standard tests to evaluate and compare the accuracy of similarity measures. Authors quantify the accuracy of their measures by computing the correlation between the similarity ratings reported in these benchmarks and those obtained by means of the computerized assessments. If the two ratings are exactly the same, which means that the similarity function perfectly mimics human judgements, the correlation coefficient will be 1, whereas 0 means that the automatic assessments are unrelated to human opinions. Spearman's and Pearson's correlation coefficients have been commonly used in the literature; both are equivalent if the rating sets are ordered (which is the case). They are also invariant to linear transformations which may be performed over the results, such as a change between distance and similarity obtained by changing the sign of the value, or a normalization of values into a range. The use of these benchmarks and of the correlation coefficient as an evaluation measure enables an objective comparison between measures.

In order to evaluate the accuracy of the related works discussed in Section 2, we have taken the correlation values originally reported by related works for the Rubenstein and Goodenough and Miller and Charles benchmarks (when available) and summarized them in Table 2. In cases in which a concrete measure depends on certain parameters (such as weights or corpora selection/processing), the best correlation value reported in the authors' experiments according to optimum parameter tuning was compiled. It is important to note that, even though some of them rely on different knowledge sources (such as tagged corpora), all measures use WordNet as the background ontology. Concretely, WordNet 2 is the most common version used in related works. In cases in which the original authors used an older version (WordNet 2 was released in July 2003), we took a more recent replication of the measure's evaluation performed by another author in order to enable a fair comparison. In the end, we compiled correlation results reported by authors in papers published from 2004 to 2009. In order to evaluate and compare our approach against related works under the same conditions, we applied it to the two benchmarks, also using WordNet 2 as the ontology. The correlation values obtained in our case are shown in the last row of Table 2.

Table 2. Correlation values for each measure.
From left to right: authors, measure type, correlation for the Miller and Charles benchmark, correlation for the Rubenstein and Goodenough benchmark, and the reference in which those correlations were reported.

Measure | Type | M&C | R&G | Evaluated in
Rada et al. (path length) | Edge-counting | 0.59 | N/A | (Petrakis, et al., 2006)
Wu and Palmer | Edge-counting | 0.74 | N/A | (Petrakis, et al., 2006)
Leacock and Chodorow | Edge-counting | 0.74 | 0.77 | (Patwardhan & Pedersen, 2006)
Li et al. | Edge-counting | 0.82 | N/A | (Petrakis, et al., 2006)
Al-Mubaid and Nguyen (sem) | Edge-counting | N/A | 0.815 | (Hisham Al-Mubaid & Nguyen, 2009)
Hirst and St-Onge | Edge-counting | 0.78 | 0.81 | (Wan & Angryk, 2007)
Rodriguez and Egenhofer | Feature | 0.71 | N/A | (Petrakis, et al., 2006)
Tversky | Feature | 0.73 | N/A | (Petrakis, et al., 2006)
Petrakis et al. (X-similarity) | Feature | 0.74 | N/A | (Petrakis, et al., 2006)
Resnik | IC (corpus) | 0.72 | 0.72 | (Patwardhan & Pedersen, 2006)
Lin | IC (corpus) | 0.7 | 0.72 | (Patwardhan & Pedersen, 2006)
Jiang and Conrath | IC (corpus) | 0.73 | 0.75 | (Patwardhan & Pedersen, 2006)
Resnik (IC computed as Seco et al.) | IC (intrinsic) | N/A | 0.829 | (Zhou, et al., 2008)
Lin (IC computed as Seco et al.) | IC (intrinsic) | N/A | 0.845 | (Zhou, et al., 2008)
Jiang and Conrath (IC computed as Seco et al.) | IC (intrinsic) | N/A | 0.823 | (Zhou, et al., 2008)
Resnik (IC computed as Zhou et al.) | IC (intrinsic) | N/A | 0.842 | (Zhou, et al., 2008)
Lin (IC computed as Zhou et al.) | IC (intrinsic) | N/A | 0.866 | (Zhou, et al., 2008)
Jiang and Conrath (IC computed as Zhou et al.) | IC (intrinsic) | N/A | 0.858 | (Zhou, et al., 2008)
Our approach (eq. 25) | Feature | 0.83 | 0.857 | -

5. Discussion

In the following, based on the results reported in Table 2, we make a comparative analysis of the different measures covered in this paper. Together with their accuracy, their main advantages and drawbacks from the application point of view are discussed. The basic path-length measure (Rada, et al., 1989) presents the lowest accuracy (0.59), due to the fact that the absolute length of the path connecting two concepts may not accurately represent their specificity. This is the case in WordNet, since concepts higher in the hierarchy are more general than those lower in the hierarchy (Pirró, 2009). As a result, other edge-counting approaches that also exploit the relative depth of the taxonomy (Wu and Palmer (Wu & Palmer, 1994), Leacock and Chodorow (Leacock & Chodorow, 1998)) offer a higher accuracy (0.74). The correlation values obtained by the Li et al. (Li, et al., 2003) and Al-Mubaid and Nguyen (H. Al-Mubaid & Nguyen, 2006) approaches, which combine the length of the path with the depth of the concepts in a weighted and non-linear manner, are remarkable. However, they rely on empirical parameters whose values have been experimentally determined to optimize the accuracy for the evaluated benchmark, hampering their generality. Hirst and St-Onge's measure (Hirst & St-Onge, 1998) presents a similar behaviour, also relying on tuning parameters but, in this case, using non-taxonomic relationships that capture a more general notion of relatedness. Feature-based methods try to overcome the limitations of path-based measures by considering different kinds of ontological features. The problem, which has also been noted for some edge-counting measures, is their dependence on the parameters introduced to weight the contribution of each feature (in the Rodriguez and Egenhofer (Rodríguez & Egenhofer, 2003) and Tversky (Tversky, 1977) approaches).
Their correlation values are, however, very similar to those offered by edge-counting measures (0.71-0.74) in these benchmarks. This can be explained by the fact that they rely on concept features, such as synsets, glosses or non-taxonomic relationships, which have secondary importance in ontologies like WordNet in comparison with taxonomical knowledge. In fact, those kinds of features are scarce in ontologies (Ding, et al., 2004), which causes those approaches to be based on partially modelled knowledge. As a result, those measures, even being more complex, are not able to significantly outperform the state of the art of edge-counting measures.

For IC-based measures, we observe that approaches relying on an intrinsic computation of IC (based on the number of concept hyponyms) clearly outperform approaches relying on corpora (0.72 vs. 0.84 on average). This is very convenient, as corpora dependency seriously hampers the applicability of classic IC measures. The difference between the two ways of computing IC is caused by two factors. Firstly, the data sparseness problem that appears when relying on tagged corpora (which are necessarily small due to manual tagging) to obtain accurate concept appearance frequencies. Secondly, the fact that WordNet's taxonomy is detailed and fine-grained, which enables an accurate estimation of a term's generality as a function of its number of hyponyms. With regard to the performance of each measure, Lin's (Lin, 1998) tends to improve on Resnik's (Resnik, 1995) when IC is computed intrinsically, as the former is able to differentiate terms with an identical LCS but different taxonomical depths. With regard to the way in which intrinsic IC is computed, more complex approaches also exploiting relative depth and relying on weighting parameters (Zhou et al. (Zhou, et al., 2008)) offer the highest accuracy (0.86).

Comparing the proposed measure with other ontology-based works, one can note that our approach's accuracy surpasses that of the basic edge-counting approaches (0.83 vs. 0.74). In general, in complex and detailed ontologies like WordNet, where multiple taxonomical paths can be found connecting concept pairs (overlapping hierarchies), path-based measures waste explicitly available taxonomical knowledge, as only the minimum path is considered as an indication of distance. Only Li et al.'s measure is able to achieve a very similar accuracy, when the appropriate scaling parameters are empirically chosen. Feature-based approaches' correlations are also surpassed (0.84 vs. 0.74), even though they are based on other, non-taxonomical features and weighting parameters. This shows that taxonomical knowledge plays a more relevant role in stating term similarity than other, scarcer features which are typically poorly covered in available ontologies. The same situation is repeated for corpus-based IC measures (0.84 vs. 0.73), showing that the exploitation of the high-quality taxonomical knowledge available in ontologies provides even more reliable semantic evidence than unstructured textual resources. This is coherent with what is observed for approaches computing IC in an intrinsic manner, which conceptually follow a similar principle to our approach. In their case, similarity is computed as a function of the number of hyponyms, whereas in our case it is estimated as a function of overlapping and non-overlapping hypernyms.
Moreover, the Lin and Jiang and Conrath measures computing IC intrinsically follow the same principle as feature-based measures: similarity is proportional to feature overlap (in their case, represented by the IC of the LCS) and inversely proportional to the differences (in their case, the IC of each individual term). So, if the IC is computed from taxonomical evidence (i.e., the number of hyponyms), it is coherent that their correlation values are similar to those of our approach. The only case in which they surpass our measure's correlation is when IC is computed as in Zhou et al. (Zhou, et al., 2008), in which a weighting parameter is introduced to optimize the assessment.

Summarizing, our measure is able to provide high accuracy without any dependency on data availability, data pre-processing or parameter tuning for a concrete scenario. As it relies only on the most commonly available ontological feature, our measure ensures its generality as a domain-independent proposal. At the same time, it retains the low computational complexity and lack of constraints of edge-counting measures, as it only requires retrieving, comparing and counting ontological subsumers. This ensures its scalability when it must be used in engineering or data mining applications, which may require dealing with large sets of terms (Armengol, 2009; Batet, Valls, & Gibert, 2008). Compared to other approaches based on taxonomical knowledge, the exploitation of the whole set of unique and shared subsumers seems to give solid evidence of semantic resemblance. First, the distinctive features implicitly include information about the different paths connecting the pair of terms. In the same manner, the depth of the Least Common Subsumer of those concepts is implicitly included in the set of shared subsumers (i.e., the deeper the LCS, the larger the amount of common features). Other features that have been identified in the literature, such as relative taxonomical densities and branching factors, are also implicitly considered, all of which are useful dimensions for assessing semantic similarity. As with any other ontology-based measure, the final accuracy will depend on the coverage, detail, completeness and coherency of the taxonomical knowledge. Moreover, most of the improvements achieved by our approach derive from the fact that similarity is estimated from the total set of subsumer concepts, considering the different taxonomical hierarchies. If the input ontology offers little taxonomical detail or does not consider multiple inheritance, the accuracy improvements of our approach with respect to measures based on the minimum path are likely to be less noticeable. Fortunately, large and broad ontologies are being developed, like WordNet as a general purpose description of concepts, or the UMLS repository in the medical context.

6. Conclusions

As explained in the introduction, semantic similarity assessment is a crucial component embedded in many applications framed in the artificial intelligence research area. This paper provides an up-to-date survey of ontology-based semantic similarity measures that can be used to estimate the resemblance between terms. A new measure based on taxonomical features has also been presented and compared in the context of the survey. In this measure, the set of features is built from the categorization (i.e., the subsumers) of the concepts modelled in the ontology.
6. Conclusions
As explained in the introduction, semantic similarity assessment is a crucial component embedded in many applications framed in the artificial intelligence research area. This paper provides an up-to-date survey of ontology-based semantic similarity measures that can be used to estimate the resemblance between terms. A new measure based on taxonomical features has also been presented and compared in the context of the survey. In this measure, the set of features is built from the categorization (i.e., the subsumers) of the concepts modelled in the ontology. In our case, we consider subsumers as labels that describe the meaning of the concept at different levels of generality. Unlike other feature-based approaches, this measure relies only on taxonomic ontological knowledge (which is the most commonly available kind), avoiding corpus dependency and parameter tuning. It is computationally efficient, as only taxonomical branches are explored, and it fulfils the mathematical properties required in many applications for coherent similarity computation (Everitt, et al., 2001; O'Sullivan, et al., 2005).
The paper has analysed, under a common framework, the pros and cons of both related works and our proposal, with the aim of giving some insight into their accuracy, applicability, dependencies and limitations. In addition, a complete comparison of all these measures in a practical setting is reported, using two widely used benchmarks. The conclusions extracted from those analyses should help practitioners to select the measure that best fits the requirements of a concrete application. In particular, the results reported by our measure for the two benchmarks suggest a promising accuracy, improving the correlations reported by most other ontology-based approaches, while minimizing the constraints that may hamper its applicability both from the computational efficiency and the resource-dependency points of view. For this reason, as future work, we want to study how the inclusion of this measure can improve the results of some concrete applications. In particular, we are studying semantically grounded data mining processes and statistical disclosure control methods, which may directly benefit from a more accurate similarity assessment.
Acknowledgements
This work has been partially supported by the Universitat Rovira i Virgili (a pre-doctoral grant of M. Batet, a post-doctoral grant of D. Isern and project 2009AIRE-04), the Spanish Ministry of Science and Innovation (DAMASK project, Data mining algorithms with semantic knowledge, TIN2009-11005) and the Spanish Government (PlanE, Spanish Economy and Employment Stimulation Plan).
References
Al-Mubaid, H., & Nguyen, H. A. (2006). A cluster-based approach for semantic similarity in the biomedical domain. In 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2006 (pp. 2713-2717). New York, USA: IEEE Computer Society.
Al-Mubaid, H., & Nguyen, H. A. (2009). Measuring Semantic Similarity between Biomedical Concepts within Multiple Ontologies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 39, 389-398.
Armengol, E. (2009). Using explanations for determining carcinogenecity in chemical compounds. Engineering Applications of Artificial Intelligence, 22, 10-17.
Batet, M., Valls, A., & Gibert, K. (2008). Improving classical clustering with ontologies. In 4th World Conference of the IASC and 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis, IASC 2008 (pp. 137-146). Yokohama, Japan: International Association for Statistical Computing.
Blank, A. (2003). Words and Concepts in Time: Towards Diachronic Cognitive Onomasiology. In R. Eckardt, K. von Heusinger & C. Schwarze (Eds.), Words and Concepts in Time: towards Diachronic Cognitive Onomasiology (pp. 37-66). Berlin, Germany: Mouton de Gruyter.
Bollegala, D., Matsuo, Y., & Ishizuka, M. (2007). Measuring Semantic Similarity between Words Using Web Search Engines. In C. Williamson & M. E. Zurko (Eds.), 16th International Conference on World Wide Web, WWW 2007 (pp. 757-766). Banff, Alberta, Canada: ACM.
Bollegala, D., Matsuo, Y., & Ishizuka, M. (2009). A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web. In P. Koehn & R. Mihalcea (Eds.), Conference on Empirical Methods in Natural Language Processing, EMNLP 2009 (pp. 803-812). Singapore, Republic of Singapore: ACL and AFNLP.
Budanitsky, A., & Hirst, G. (2001). Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (pp. 10-15). Pittsburgh, USA.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32, 13-47.
Cimiano, P., Handschuh, S., & Staab, S. (2004). Towards the self-annotating web. In 13th International Conference on World Wide Web, WWW 2004 (pp. 462-471). New York, USA: ACM.
Curran, J. R. (2002). Ensemble Methods for Automatic Thesaurus Extraction. In Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (pp. 222-229). Philadelphia, PA, USA: Association for Computational Linguistics.
Chen, M. Y., Chu, H. C., & Chen, Y. M. (2010). Developing a semantic-enable information retrieval mechanism. Expert Systems with Applications, 37, 322-340.
Chen, P., Lin, S. J., & Chu, Y. C. (2011). Using Google latent semantic distance to extract the most relevant information. Expert Systems with Applications, 38, 7349-7358.
Chu, H. C., Chen, M. Y., & Chen, Y. M. (2009). A semantic-based approach to content abstraction and annotation for content management. Expert Systems with Applications, 36, 2360-2376.
Devitt, A., & Vogel, C. (2004). The topology of WordNet: Some Metrics. In P. Sojka, K. Pala, P. Smrz, C. Fellbaum & P. Vossen (Eds.), The Second Global WordNet Conference, GWC 2004 (pp. 106-111). Brno, Czech Republic: Masaryk University.
Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R. S., Peng, Y., Reddivari, P., Doshi, V., & Sachs, J. (2004). Swoogle: A Search and Metadata Engine for the Semantic Web. In Thirteenth ACM International Conference on Information and Knowledge Management, CIKM 2004 (pp. 652-659). Washington, D.C., USA: ACM Press.
Euzenat, J., & Shvaiko, P. (2007). Ontology Matching: Springer Verlag.
Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster Analysis. London: Arnold.
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Cambridge, Massachusetts: MIT Press.
Goldstone, R. L. (1994). Similarity, interactive activation, and mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 3-28.
Guarino, N. (1998). Formal Ontology in Information Systems. In N. Guarino (Ed.), 1st International Conference on Formal Ontology in Information Systems, FOIS 1998 (pp. 3-15). Trento, Italy: IOS Press.
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database (pp. 305-332): MIT Press.
Jiang, J. J., & Conrath, D. W. (1997). Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In International Conference on Research in Computational Linguistics, ROCLING X (pp. 19-33). Taiwan.
Lanzenberger, M., Sampson, J., Kargl, H., Wimmer, M., Conroy, C., O'Sullivan, D., Lewis, D., Brennan, R., Ramos Gargantilla, J. Á., Gómez-Pérez, A., Fürst, F., Trichet, F., Euzenat, J., Polleres, A., Scharffe, F., & Kotis, K. (2008). Making Ontologies Talk: Knowledge Interoperability in the Semantic Web. IEEE Intelligent Systems, 23, 72-85.
Leacock, C., & Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In WordNet: An Electronic Lexical Database (pp. 265-283): MIT Press.
Li, Y., Bandar, Z., & McLean, D. (2003). An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering, 15, 871-882.
Lin, D. (1998). An Information-Theoretic Definition of Similarity. In J. Shavlik (Ed.), Fifteenth International Conference on Machine Learning, ICML 1998 (pp. 296-304). Madison, Wisconsin, USA: Morgan Kaufmann.
Miller, G. A., & Charles, W. G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6, 1-28.
O'Sullivan, D., Smyth, B., & Wilson, D. C. (2005). Understanding case-based recommendation: A similarity knowledge perspective. International Journal on Artificial Intelligence Tools, 14, 215-232.
Partee, B., ter Meulen, A., & Wall, R. (1990). Mathematical Methods in Linguistics: Kluwer Academic Publishers.
Patwardhan, S., Banerjee, S., & Pedersen, T. (2003). Using Measures of Semantic Relatedness for Word Sense Disambiguation. In A. F. Gelbukh (Ed.), 4th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2003 (Vol. 2588, pp. 241-257). Mexico City, Mexico: Springer Berlin / Heidelberg.
Patwardhan, S., & Pedersen, T. (2006). Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together (pp. 1-8). Trento, Italy.
Petrakis, E. G. M., Varelas, G., Hliaoutakis, A., & Raftopoulou, P. (2006). X-Similarity: Computing Semantic Similarity between Concepts from Different Ontologies. Journal of Digital Information Management, 4, 233-237.
Pirró, G. (2009). A semantic similarity metric combining features and intrinsic information content. Data & Knowledge Engineering, 68, 1289-1308.
Pirró, G., & Seco, N. (2008). Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content. In R. Meersman & Z. Tari (Eds.), OTM 2008 Confederated International Conferences CoopIS, DOA, GADA, IS, and ODBASE 2008 (Vol. 5332, pp. 1271-1288). Monterrey, Mexico: Springer Berlin / Heidelberg.
Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19, 17-30.
Resnik, P. (1995). Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In C. S. Mellish (Ed.), 14th International Joint Conference on Artificial Intelligence, IJCAI 1995 (Vol. 1, pp. 448-453). Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc.
Rodríguez, M. A., & Egenhofer, M. J. (2003). Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering, 15, 442-456.
Rubenstein, H., & Goodenough, J. (1965). Contextual correlates of synonymy. Communications of the ACM, 8, 627-633.
Sánchez, D. (2010). A methodology to learn ontological attributes from the Web. Data & Knowledge Engineering, 69, 573-597.
Sánchez, D., Batet, M., Valls, A., & Gibert, K. (2009). Ontology-driven web-based semantic similarity. Journal of Intelligent Information Systems, 35, 383-413.
Sánchez, D., & Isern, D. (2009). Automatic extraction of acronym definitions from the Web. Applied Intelligence, doi: 10.1007/s10489-009-0197-4 (in press).
Sánchez, D., Isern, D., & Millan, M. (2010). Content Annotation for the Semantic Web: an Automatic Web-based Approach. Knowledge and Information Systems, doi: 10.1007/s10115-010-0302-3 (in press).
Sánchez, D., & Moreno, A. (2008). Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering, 63, 600-623.
Sánchez, D., & Moreno, A. (2008). Pattern-based automatic taxonomy learning from the Web. AI Communications, 21, 27-48.
Seco, N., Veale, T., & Hayes, J. (2004). An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In R. López de Mántaras & L. Saitta (Eds.), 16th European Conference on Artificial Intelligence, ECAI 2004, including Prestigious Applications of Intelligent Systems, PAIS 2004 (pp. 1089-1090). Valencia, Spain: IOS Press.
Song, W., Li, C. H., & Park, S. C. (2009). Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures. Expert Systems with Applications, 36, 9095-9104.
Stevenson, M., & Greenwood, M. A. (2005). A semantic approach to IE pattern induction. In K. Knight (Ed.), 43rd Annual Meeting of the Association for Computational Linguistics, ACL 2005 (pp. 379-386). Ann Arbor, Michigan, USA: Association for Computational Linguistics.
Tversky, A. (1977). Features of Similarity. Psychological Review, 84, 327-352.
Valls, A., Gibert, K., Sánchez, D., & Batet, M. (2010). Using ontologies for structuring organizational knowledge in Home Care assistance. International Journal of Medical Informatics, 79, 370-387.
Wan, S., & Angryk, R. A. (2007). Measuring Semantic Similarity Using WordNet-based Context Vectors. In M. El-Hawary (Ed.), IEEE International Conference on Systems, Man and Cybernetics, SMC 2007 (pp. 908-913). Montreal, Quebec, Canada: IEEE Computer Society.
Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (pp. 133-138). Las Cruces, New Mexico: Association for Computational Linguistics.
Zhou, Z., Wang, Y., & Gu, J. (2008). A New Model of Information Content for Semantic Similarity in WordNet. In S. S. Yau, C. Lee & Y.-C. Chung (Eds.), Second International Conference on Future Generation Communication and Networking Symposia, FGCNS 2008 (pp. 85-89). Sanya, Hainan Island, China: IEEE Computer Society.