Data-Driven Metaphor Recognition and Explanation

Hongsong Li, Microsoft Research Asia, hongsli@microsoft.com
Kenny Q. Zhu, Shanghai Jiao Tong University, kzhu@cs.sjtu.edu.cn
Haixun Wang, Google Research, haixun@google.com

Abstract

Recognizing metaphors and identifying their source-target mappings is an important task, as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using these knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.

1 Introduction

A metaphor is a way of communicating. It enables us to comprehend one thing in terms of another. For example, the metaphor Juliet is the sun allows us to see Juliet much more vividly than if Shakespeare had taken a more literal approach. We utter about one metaphor for every ten to twenty-five words, or about six metaphors a minute (Geary, 2011).

Specifically, a metaphor is a mapping of concepts from a source domain to a target domain (Lakoff and Johnson, 1980). The source domain is often concrete and based on sensory experience, while the target domain is usually abstract. Two concepts are connected by this mapping because they share some common or similar properties, and as a result, the meaning of one concept can be transferred to another. For example, in "Juliet is the sun," the sun is the source concept while Juliet is the target concept. One interpretation of this metaphor is that both concepts share the property that their existence brings about warmth, life, and excitement. In a metaphorical sentence, at least one of the two concepts must be explicitly present. This leads to three types of metaphors:

1. Juliet is the sun. Here, both the source (sun) and the target (Juliet) are explicit.

2. Please wash your claws before scratching me. Here, the source (claws) is explicit, while the target (hands) is implicit, and the context of wash is in terms of the target.

3. Your words cut deep. Here, the target (words) is explicit, while the source (possibly, knife) is implicit, and the context of cut is in terms of the source.

In this paper, we focus on the recognition and explanation of metaphors. For a given sentence, we first check whether it contains a metaphoric expression (which we call metaphor recognition), and if it does, we identify the source and the target concepts of the metaphor (which we call metaphor explanation). Metaphor explanation is important for understanding metaphors. Explaining type 2 and 3 metaphors is particularly challenging and, to the best of our knowledge, has not been attempted for nominal concepts (i.e., concepts represented by noun phrases) before. In our examples, knowing that Juliet and hands are the target concepts avoids the confusion that may arise if the source concepts sun and claws are taken literally in understanding the sentences. This, however, does not mean that the source concept is a useless embellishment.
In the 3rd sentence, knowing that words is mapped to knife enables the system to understand the emotion or sentiment embedded in the text. This is why metaphor recognition and explanation are important to applications such as affect mining (Smith et al., 2007).

It is worth noting that some prefer to consider the verb "cut", rather than the noun "words", to be metaphoric in the 3rd sentence above. We instead concentrate on nominal metaphors and seek to explain source-target mappings in which at least one domain is a nominal concept. This is because verbs usually have nominal arguments, as either subject or object; explaining the source-target mapping of the nominal argument therefore covers most, if not all, cases where a verb is metaphoric.

In order for machines to recognize and explain metaphors, they must have extensive human knowledge. It is not difficult to see why metaphor recognition based on simple context modeling (e.g., by selectional restriction/preference (Resnik, 1993)) is insufficient. First, not all expressions that violate the restriction are metaphors. For example, I hate to read Heidegger violates selectional restriction, as the context (embodied by the verb read) prefers an object other than a person (Heidegger). But Heidegger is not a metaphor; it is a metonymy, which in this case denotes Heidegger's books. Second, not every metaphor violates the restriction. For example, life is a journey is clearly a metaphor, but selectional restriction or preference is of no help when it comes to the isA context.

Existing approaches based on human-curated knowledge bases fall short of the challenge. First, the scale of a human-curated knowledge base is often very limited, which means at best it covers a small set of metaphors. Second, new metaphors are created all the time, and the challenge is to recognize and understand metaphors that have never been seen before. This requires extensive knowledge. As a very simple example, even if the machine knows Sports cars are fire engines is a metaphor, it still needs to know what a sports car is before it can understand that My Ferrari is a fire engine is also a metaphor. Third, existing human-curated knowledge bases (including metaphor databases and WordNet) are not probabilistic. They cannot tell how typical an instance is of a category (e.g., a robin is a more typical bird than a penguin), or how commonly an expression (e.g., a breath of fresh air) is used as a source concept to describe targets in another concept (e.g., young girls). Unfortunately, without the necessary probabilistic information, not much reasoning can be performed for metaphor explanation.

In this paper, we address the above challenges. We start with a probabilistic isA knowledge base of many entities and categories harvested from billions of web documents using a set of strict syntactic patterns known as the Hearst patterns (Hearst, 1992). We then automatically acquire a large probabilistic metaphor database with the help of both syntactic patterns and the isA knowledge base (Section 3). Finally, we combine the two knowledge bases with a probabilistic reasoning mechanism for automatic metaphor recognition and explanation (Section 4).

This paper makes the following contributions:

1. To our knowledge, we are the first to introduce the metaphor explanation problem, which seeks to recover missing or implied source or target concepts in an implicit metaphor.
2. This is the first big-data-driven, unsupervised approach to metaphor recognition and explanation. One benefit of leveraging big data is that the knowledge we obtain is less biased, has broad coverage, and can be updated in a timely manner. More importantly, a data-driven approach can associate probabilities with each piece of knowledge; such probabilities are not available in human-curated knowledge bases but are indispensable for inference and reasoning.

3. Our results show the effectiveness of our approach in terms of both coverage and accuracy. We acquire one of the largest metaphor knowledge bases in existence, with a precision of 82%, and our metaphor recognition accuracy significantly outperforms state-of-the-art methods (Section 5).

2 Related Work

Existing work on metaphor recognition and interpretation can be divided into two categories: context-oriented and knowledge-driven. The approach proposed in this paper touches on both categories.

2.1 Context-oriented Methods

Some previous work relies on context to differentiate metaphorical expressions from literal ones (Wilks, 1978; Resnik, 1993). The selectional restriction theory (Wilks, 1978) argues that the meaning of an expression is restricted by its context, and violations of the restriction imply a metaphor.

Resnik (1993) uses KL divergence to measure selectional preference strength (SPS), i.e., how strongly a context restricts an expression. Although he did not use this measure directly for metaphor recognition, SPS (and a related measure called selectional association) is widely used in more recent approaches to metaphor recognition and interpretation (Mason, 2004; Shutova, 2010; Shutova et al., 2010; Baumer et al., 2010). For example, Mason (2004) learns domain-specific selectional preferences and uses them to find mappings between concepts from different domains. Shutova (2010) defines metaphor interpretation as a paraphrasing task; the method discriminates between literal and figurative paraphrases by detecting selectional preference violations. The results of this work are compared with our approach in Section 5. Shutova et al. (2010) identify concepts in the source domain of a metaphor by clustering verb phrases and filtering out verbs that have weak selectional preference strength. Baumer et al. (2010) use semantic role labeling to calculate selectional preferences over semantic relations instead of grammatical relations for metaphor recognition.

A less related but also context-based work is analogy interpretation by relation mapping (Turney, 2008), where the goal is to generate a mapping between source and target domains by computing pair-wise co-occurrences for different contextual patterns.

Our approach uses selectional restriction when enriching the metaphor knowledge base, and adopts context preference when explaining type 2 and 3 metaphors by focusing on the verbs near a potential source or target concept.

2.2 Knowledge-driven Methods

A growing number of works use knowledge bases for metaphor understanding (Martin, 1990; Narayanan, 1997; Barnden et al., 2002; Veale and Hao, 2008). MIDAS (Martin, 1990) checks whether a sentence contains an expression that can be explained by a more general metaphor in a human-curated metaphor knowledge base. ATT-Meta (Barnden et al., 2002) performs metaphor reasoning with a human-curated metaphor knowledge base and first-order logic, and it focuses on affect detection (Smith et al., 2007; Agerri, 2008; Zhang, 2010).
Krishnakumaran and Zhu (2007) use the isA relation in WordNet (Miller, 1995) for metaphor recognition. Gedigian et al. (2006) use FrameNet (Fillmore et al., 2003) and PropBank (Kingsbury and Palmer, 2002) to train a maximum entropy classifier for metaphor recognition. TroFi (Birke and Sarkar, 2006) redefines literal and non-literal as two senses of the same verb and provides each sense with seed sentences from human-curated resources such as WordNet and known metaphor and idiom sets. For a given sentence containing a target verb, it compares the similarity of the sentence with the two seed sets; if the sentence is closer to the non-literal sense set, the verb is recognized as a non-literal usage.

While the above work all relies on human-curated data sets or manual labeling, Veale and Hao (2008) introduced the notion of talking points, which are figurative properties of noun-based concepts. For example, the concept "Hamas" has the following talking points: is islamic:movement and governs:gaza strip. They automatically constructed a knowledge base called Slip Net from WordNet and a web corpus. Concepts that are connected on the Slip Net can "slip" to one another and are hence considered related in a metaphor. However, straightforward traversal of the Slip Net can become computationally impractical, and the authors did not elaborate on the implementation details. In contrast, the knowledge base acquired in this paper is much larger, and our algorithms are computationally more feasible.

3 Obtaining Probabilistic Knowledge

In this section, we describe how to use a large, general-purpose, probabilistic isA knowledge base ΓH to create a probabilistic metaphor dataset Γm. ΓH contains isA pairs as well as scores associated with each pair. The metaphor dataset Γm contains metaphors of the form (source, target), together with a weight function Pm that maps a metaphor pair to a probabilistic score. The purpose of creating ΓH is to help clean and expand Γm, and to perform probabilistic inference for metaphor detection.

3.1 IsA Knowledge ΓH

ΓH, a general-purpose, probabilistic isA knowledge base, was previously constructed by Wu et al. (2012); the dataset can be found at http://probase.msra.cn/. ΓH contains isA relations in the form of (x, hx), a pair of hyponym and hypernym, for example, (Steve Ballmer, CEO of IT companies), and each pair is associated with a set of probabilistic scores. The two most important scores are known as typicality: P(x|hx), the typicality of x in category hx, and P(hx|x), the typicality of category hx for instance x; both are used in metaphor recognition and explanation. The scores are approximated by frequencies, e.g.,

  P(x|hx) = (# of (x, hx) in the Hearst extractions) / (# of hx in the Hearst extractions)

In total, ΓH contains 16 million unique isA relationships and 2.7 million unique concepts or categories (the hx's in the (x, hx) pairs). The importance of big data is obvious: ΓH contains millions of categories, with probabilistic scores for each category, which enables inference for metaphor understanding, as we will show next.
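To make the typicality scores concrete, here is a minimal sketch, assuming the Hearst-pattern extraction simply yields a multiset of (instance, hypernym) pairs; the variable names and toy counts are ours for illustration, not part of the paper or the Probase release.

```python
from collections import Counter

# Toy stand-in for pairs extracted with Hearst patterns ("companies such as Apple", ...).
# In the paper these counts come from billions of web pages; here they are made up.
hearst_pairs = [
    ("apple", "company"), ("apple", "company"), ("apple", "fruit"),
    ("google", "company"), ("banana", "fruit"), ("banana", "fruit"),
]

pair_count = Counter(hearst_pairs)                 # n(x, hx)
hyper_count = Counter(h for _, h in hearst_pairs)  # n(hx)
inst_count = Counter(x for x, _ in hearst_pairs)   # n(x)

def p_x_given_h(x, h):
    """Typicality of instance x within category h, i.e., P(x|hx)."""
    return pair_count[(x, h)] / hyper_count[h] if hyper_count[h] else 0.0

def p_h_given_x(x, h):
    """Typicality of category h for instance x, i.e., P(hx|x)."""
    return pair_count[(x, h)] / inst_count[x] if inst_count[x] else 0.0

print(p_x_given_h("apple", "company"))  # 2/3
print(p_h_given_x("apple", "fruit"))    # 1/3
```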
3.2 Acquiring Metaphors Γm

We acquire an initial set of metaphors Γm from similes. A simile is a figure of speech that explicitly compares two different things using words such as "like" and "as". For example, the sentence Life is like a journey is a simile; without the word "like," it becomes a metaphor: Life is a journey. This property makes similes an attractive first target for metaphor extraction from a large corpus. We use the following syntactic pattern for extraction:

  〈target〉 BE/VB like [a] 〈source〉    (1)

where BE denotes is/are/has been/have been, etc., VB denotes a verb other than BE, and 〈target〉 and 〈source〉 denote noun phrases or verb phrases.

Note that not every extracted pair is a metaphor. Poetry is like an art matches the pattern, but it is not a metaphor because poetry really is an art. We will use ΓH to clean such pairs. Furthermore, due to the idiosyncrasies of natural language, it is not trivial to correctly extract the 〈target〉 and the 〈source〉 from each sentence that matches the pattern. We apply a POS tagger and a lemmatizer to the sentences, and we developed a rule-based system containing more than two dozen rules for extraction. For example, a high-precision but low-recall rule is "〈target〉 must be at the beginning of a sentence or the beginning of a clause (e.g., following the word that)".

Finally, from 8,552,672 sentences that match the above pattern (pattern 1), we obtain 1.2 million unique (x, y) pairs, and after filtering, we are left with close to 1 million unique metaphor pairs, which form the starting point of Γm.
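The paper does not spell out the pattern matcher beyond the two dozen hand-written rules, so the following is only a rough sketch of the BE variant of pattern (1), assuming lemmatized, lower-cased input; the regular expression and the single filtering rule are illustrative stand-ins, not the authors' actual rules.

```python
import re

# Rough stand-in for the BE case of pattern (1): <target> BE like [a] <source>.
# Noun-phrase boundaries are approximated by word spans; the paper's rule-based
# system uses a POS tagger, a lemmatizer, and ~2 dozen rules instead.
SIMILE = re.compile(
    r"^(?P<target>[\w ]+?) (?:is|are|was|were|has been|have been|be) "
    r"like (?:a |an |the )?(?P<source>[\w ]+?)[.!?]?$"
)

def extract_simile(sentence):
    match = SIMILE.match(sentence.strip().lower())
    if not match:
        return None
    target, source = match.group("target"), match.group("source")
    # One illustrative high-precision rule: reject targets that span a comma,
    # so the extracted target plausibly starts the sentence or clause.
    if "," in target:
        return None
    return (source, target)

print(extract_simile("Life is like a journey."))    # ('journey', 'life')
print(extract_simile("That idea is like a stew."))  # ('stew', 'that idea')
print(extract_simile("He ran like the wind."))      # None: only BE is handled here
```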
3.3 Cleaning, Expanding, and Weighting Γm

The simile pattern only allows us to extract some of the available metaphor pairs. To expand Γm, we use a more flexible but also noisier pattern to extract more candidate metaphor pairs from billions of sentences in the web corpus:

  〈target〉 BE [a] 〈source〉    (2)

The above "is a" pattern covers metaphors such as Life is a journey. But many pairs thus extracted are not metaphors, for example, Malaysia is a tropical country. That is, the pairs extracted by the "is a" pattern contain at least two types of relations: literal isA relations and metaphor relations. The problem is how to distinguish one from the other. In theory, the set of all isA relations, I, and the set of all metaphor relations, M, do not overlap, because by definition the source concept and the target concept in a metaphor are not the same thing. Our intuition is thus the following: the set of pairs produced by the simile pattern, S, is a subset of M, while the set of pairs extracted by the Hearst patterns, H, is a subset of I. Since M and I hardly overlap, S and H should have little overlap, too. In practice, very few people would say something like journeys such as life. Figure 1 illustrates this scenario.

[Figure 1: Relations among different sets (example pairs include (beast, sports car), (sports car, ferrari), (vehicle, ferrari), and (beast, ferrari)). Dotted circles represent relations (ground truth); solid circles represent pairs extracted by the syntactic patterns (Hearst, simile, and "is a").]

To verify this intuition, we randomly sampled 1,000 sentences and manually annotated them. Of these sentences, 40 contain an isA relation, of which 27 are enclosed in a Hearst pattern and 13 can be extracted by the "is a" pattern. Furthermore, 28 of these 1,000 sentences contain a metaphoric expression, and of these 28 metaphors, 15 are embedded in a simile pattern. More importantly, there is no overlap between the isA relations and the metaphors (and hence the similes).

In a larger-scale experiment, we crawled 1 billion sentences matching the "is a" pattern (2) from the web corpus. From these, we extracted 180 million unique (x, y) pairs. 24.8% of ΓH can be found in the "is a" pattern pairs, and 16.8% of Γm can be found in the "is a" pattern pairs. Furthermore, there is almost no overlap between ΓH and Γm: 1.26% of ΓH can be found in Γm, and 1.31% of Γm can be found in ΓH.

Our goal is to use the information collected through the syntactic patterns to enrich the metaphor relations in Γm. Armed with the above observations, we draw two conclusions. First, the (life, journey) pair extracted from life is a journey is more likely a metaphor, since it does not appear in the set extracted by the Hearst patterns. Second, if any existing pair in Γm also appears in ΓH, we can remove that pair from Γm.

From the 180 million unique (x, y) pairs extracted earlier, by filtering out low-frequency pairs and pairs that appear in ΓH, we obtain 2.6 million fresh metaphors. (To set the frequency threshold, we randomly sample pairs of frequency 1, 2, ..., 10 from Γm and check the precision of each group; we filter out pairs with frequency less than 5 to optimize precision.) This is almost 3 times larger than the initial metaphor set obtained from the simile pattern.

We further expand Γm by adding metaphors derived from Γm and ΓH: if (x, y) ∈ Γm and (x, hx) ∈ ΓH, then we add (hx, y) to Γm. For example, if (Julie, sun) ∈ Γm, then we add (person name, sun) to Γm, since (Julie, person name) ∈ ΓH. This enables the metaphor detection approach we describe in Section 4. Note that we ignore transitivity in the isA relations from ΓH, as such transitivity is not always reliable. For example, a car seat is a chair, and a chair is furniture, but a car seat is not furniture. How to handle transitivity in a data-driven isA taxonomy is a challenging problem and is beyond the scope of this paper.

Finally, we calculate the weight of each metaphor (x, y). The weight Pm(x, y) is calculated as follows:

  Pm(x, y) = (occurrences of (x, y) in the "is a" pattern) / (occurrences of the "is a" pattern)    (3)

The weights of derived metaphors, such as (person name, sun), are calculated as follows:

  Pm(hx, y) = Σ_{(x, hx) ∈ ΓH} Pm(x, y)    (4)
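The cleaning, expansion, and weighting steps above can be pictured with a small sketch; the toy dictionaries below stand in for the web-scale Γm and ΓH, and the probability values are made-up placeholders rather than measured weights.

```python
from collections import defaultdict

# Toy inputs; in the paper these come from the web-scale extractions.
gamma_m = {("life", "journey"): 1e-6, ("julie", "sun"): 2e-7,
           ("lamborghini", "beast"): 3e-7}          # Pm(x, y), Eq. (3)
gamma_h = {("julie", "person name"), ("lamborghini", "sports car"),
           ("ferrari", "sports car"), ("malaysia", "tropical country")}

# Cleaning: drop candidate pairs that are literal isA relations.
gamma_m = {pair: w for pair, w in gamma_m.items() if pair not in gamma_h}

# Expansion and weighting (Eq. 4): if (x, y) is a metaphor and (x, hx) is an
# isA pair, add the derived metaphor (hx, y) with Pm(hx, y) = sum of Pm(x, y).
derived = defaultdict(float)
for (x, y), w in gamma_m.items():
    for (xx, hx) in gamma_h:
        if xx == x:
            derived[(hx, y)] += w
gamma_m.update(derived)

print(gamma_m[("sports car", "beast")])   # 3e-07, derived from (lamborghini, beast)
print(gamma_m[("person name", "sun")])    # 2e-07, derived from (julie, sun)
```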
4 Probabilistic Metaphor Understanding

In this paper, we consider two aspects of metaphor understanding: metaphor recognition and metaphor explanation. The latter is needed for type 2 and 3 metaphors, where either the source or the target concept is implicit or missing. Next, we describe a probabilistic approach to these two tasks.

4.1 Type 1 Metaphors

In a type 1 metaphor, both the source and the target concepts appear explicitly. When a sentence matches the "is a" pattern (pattern 2), it is a potential metaphoric expression: the first noun in the pattern is the target candidate, while the second noun is the source candidate.

To recognize type 1 metaphors, we first obtain the candidate (source, target) pair from the sentence. Then, we check whether we have any knowledge about the pair. Intuitively, if the pair exists in the metaphor dataset Γm, then it is a metaphor; if the pair exists in the isA knowledge base ΓH, then it is not. But because Γm is far from complete, if a pair exists in neither Γm nor ΓH, it may still be a metaphor we have never seen before. In this case, we reason as follows.

Consider a sentence such as My Ferrari is a beast. Assume (Ferrari, beast) ∉ Γm, but (sports car, beast) ∈ Γm. Note that (sports car, beast) may itself be a derived metaphor added to Γm during metaphor expansion, the original metaphor extracted from the web data being (Lamborghinis, beast). Furthermore, from ΓH we know that Ferrari is a sports car, that is, (Ferrari, sports car) ∈ ΓH. We can then infer that Ferrari to beast is very likely a metaphoric mapping.

Specifically, let (x, y) be the pair we are concerned with. We want to compute the odds of (x, y) representing a metaphor versus a normal isA relationship:

  P(x, y) / (1 − P(x, y))    (5)

where P(x, y) is the probability that (x, y) forms a metaphor. Combining the knowledge we have in ΓH, we have

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(x, hx, y)    (6)

Here, hx is a possible superconcept, i.e., a possible interpretation, of x. For example, if x = apple, then two highly probable interpretations are company and fruit. In Eq. (6), we aggregate over all possible interpretations (all superconcepts) of x, which is possible because of the massive size of the concept space in ΓH. We can rewrite Eq. (6) as

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(y|x, hx) P(x|hx) P(hx)    (7)

Here, P(y|x, hx) is the probability, when x is interpreted as an hx, that y is the metaphorical concept used for hx. Given hx, y is therefore independent of x, so P(y|x, hx) can simply be replaced by P(y|hx). We can then rewrite Eq. (7) as

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(y|hx) P(x|hx) P(hx) = Σ_{(x, hx) ∈ ΓH} P(hx, y) P(x|hx)    (8)

Clearly, P(hx, y) is simply Pm(hx, y) in Eq. (4), given by the metaphor dataset Γm. Furthermore, P(x|hx) is the typicality of x in the hx category, and P(hx) is the prior of the category hx; both are available from the isA knowledge base ΓH. Thus, we can calculate Eq. (8) using the information in the two knowledge bases we have created. If the odds in Eq. (5) are greater than a threshold δ, determined empirically to be δ = P(metaphor)/P(isA) (the ratio between the number of metaphors and isA pairs in a random sample of "is a" pattern sentences), we declare (x, y) a metaphor.
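The type 1 decision rule can be sketched in a few lines; the numbers below are invented for illustration, and p_m, typicality, and delta stand in for the quantities of Eqs. (4), (8), and (5).

```python
p_m = {("sports car", "beast"): 3e-7}          # Pm(hx, y) from Eq. (4)
typicality = {("ferrari", "sports car"): 0.2,  # P(x|hx) from the isA base
              ("ferrari", "company"): 0.05}

def p_metaphor(x, y):
    """P(x, y) of Eq. (8): sum over the possible interpretations hx of x."""
    return sum(p_m.get((hx, y), 0.0) * p
               for (xx, hx), p in typicality.items() if xx == x)

def is_type1_metaphor(x, y, delta=1e-8):
    """Odds test of Eq. (5); delta plays the role of P(metaphor)/P(isA)."""
    p = p_metaphor(x, y)
    return p / (1.0 - p) > delta

print(is_type1_metaphor("ferrari", "beast"))   # True with these toy numbers
```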
4.2 Context Preference Modeling

It is more difficult to recognize metaphors when the source concept or the target concept is not explicitly given in a sentence. In this case, we rely on the context in the sentence.

Given a sentence, we find metaphor candidates and the context. Here, candidates are noun phrases in the sentence that can potentially be the target or the source concept of a metaphor, while the context consists of words that have a grammatical dependency with the candidate. The dependency can be subject-predicate, predicate-object, modifier-head, etc. The context can be a verb, a noun phrase, or an adjective that has a certain preference over the target or source candidate. For example, the word horse prefers verbs such as jump, drink, and eat; the word flower prefers modifiers such as red, yellow, and beautiful.

In this work, we focus on analyzing the preferences of verbs, using the subject-predicate or predicate-object relation between the verb and the noun phrases. We select the 2,226 most frequent verbs from the web corpus. For each verb, we construct the distribution of noun phrases that depend on the verb in sentences sampled from the web corpus. The noun phrases are restricted to those that occur in ΓH. More specifically, for any noun phrase y that appears in ΓH, we calculate

  Pr(C|y) = f_r(y, C) / Σ_C f_r(y, C)    (9)

where f_r(y, C) is the frequency of y occurring with context C under relation r. Note that we can build preference distributions for contexts other than verbs, since in theory r can be any relation (e.g., the modifier-head relation).

4.3 Type 2 and Type 3 Metaphors

If a sentence contains a type 2 or type 3 metaphor, either the source or the target concept in the sentence is missing. For each noun phrase x and context C in such a sentence, we want to know whether x is a literal or a metaphoric use. It is a metaphoric use if the selectional preference of some y, which is a source or target concept of x in Γm, is larger than the selectional preference of any superconcept of x in ΓH by a factor δ. Formally, there exists a y with (x, y) ∈ Γm or (y, x) ∈ Γm such that

  P(y|x, C) / P(h|x, C) ≥ δ,  ∀(x, h) ∈ ΓH.    (10)

To compute (10), we have

  P(y|x, C) = P(x, y, C) / P(x, C) = P(x, y) P(C|x, y) / P(x, C)    (11)

Assuming x is a target concept and y is a source concept (a type 3 metaphor), we can obtain P(x, y) by Eq. (8); type 2 metaphors are handled similarly. Furthermore, C is independent of x in a type 2 or 3 metaphor, since a metaphor is an unusual use of x (the target) within the given context. Therefore P(C|x, y) = P(C|y), where P(C|y) is available from Eq. (9). Similarly, we have

  P(h|x, C) = P(x, h) P(C|h) / P(x, C)    (12)

where P(x, h) is obtained from ΓH and P(C|h) comes from the context preference distribution. To explain the metaphor, i.e., to uncover the missing concept, we compute

  y* = argmax_{y : (y, x) ∈ Γm} P(y|x, C) = argmax_{y : (y, x) ∈ Γm} P(y, x) P(C|y)

As a concrete example, consider the sentence My car drinks gasoline. There are two possible targets: car and gasoline. The context for both targets is the verb drink. Let x = car. By Eq. (11), we first find all y's for which (car, y) ∈ Γm or (y, car) ∈ Γm; we obtain terms such as woman, friend, gun, horse, etc. When we calculate P(car, y) by Eq. (8), we also need to find the hypernyms of car in ΓH, which may include vehicle, product, asset, etc. For each candidate yi, P(yi|car, C) is calculated from the metaphor knowledge P(x, yi) and the context preference P(C|yi). Table 1 shows the result. Since the selectional preference of horse (from Γm) is much larger than that of the literal uses of car, the sentence is recognized as a metaphor, and the missing source concept is horse.

Table 1: Log probabilities (M: metaphor, L: literal).

  Type  yi       log P(yi, car)  log P(C|yi)  log P(yi|car, C)
  L     vehicle  -6.2            −∞           −∞
  L     product  -6.9            −∞           −∞
  L     asset    -6.3            −∞           −∞
  M     woman    -8.5            -2.8         -11.3
  M     friend   -8.0            -3.0         -11.0
  M     gun      -8.4            −∞           −∞
  M     horse    -8.2            -2.4         -10.6
  ...   ...      ...             ...          ...
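A compact sketch of this explanation step follows; the log probabilities loosely mirror Table 1 but are illustrative placeholders, and the shared denominator P(x, C) of Eqs. (11) and (12) is dropped because it cancels when candidates are compared.

```python
import math

# Toy knowledge for "My car drinks gasoline"; real values come from the
# metaphor base (Eq. 8), the isA base, and the verb preference distributions (Eq. 9).
p_joint = {("car", "horse"): math.exp(-8.2),   # P(x, y) for metaphor candidates
           ("car", "woman"): math.exp(-8.5),
           ("car", "vehicle"): math.exp(-6.2)} # P(x, h) for literal hypernyms
p_context = {("drink", "horse"): math.exp(-2.4),
             ("drink", "woman"): math.exp(-2.8),
             ("drink", "vehicle"): 0.0}        # vehicles are never subjects of "drink"

def score(x, concept, verb):
    # Numerator of Eq. (11) / Eq. (12); P(x, C) cancels in the comparison.
    return p_joint.get((x, concept), 0.0) * p_context.get((verb, concept), 0.0)

def explain(x, verb, metaphor_cands, literal_cands, delta=10.0):
    best = max(metaphor_cands, key=lambda y: score(x, y, verb))
    best_literal = max(score(x, h, verb) for h in literal_cands)
    # Eq. (10): the metaphoric reading must beat every literal reading by delta.
    if score(x, best, verb) > 0 and score(x, best, verb) >= delta * best_literal:
        return best
    return None

# The recovered source concept for "my car drinks gasoline" should be "horse".
print(explain("car", "drink", ["horse", "woman"], ["vehicle"]))
```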
5 Experimental Results

We evaluate the performance of metaphor acquisition, recognition, and explanation in our system and compare it with several state-of-the-art methods.

5.1 Metaphor Acquisition

From the web corpus, we collected 8,552,672 sentences matching the "is like a" pattern (pattern 1), from which we extracted 932,621 unique high-quality simile mappings. These simile mappings became the core of Γm. ΓH contains 16,736,068 unique isA pairs. We also collected 1,131,805,382 sentences matching the "is a" pattern (pattern 2), from which 180,446,190 unique mappings were extracted. These mappings contain both metaphors and isA relations. From there, we identified 2,663,127 metaphor pairs unseen in the simile set; these new pairs were added to Γm. Random samples show that the precisions of the core metaphor dataset and of the whole dataset are 93.5% and 82%, respectively. All of the above datasets, a sample of the context preference distributions, and the test sets mentioned in this section can be found at http://adapt.seiee.sjtu.edu.cn/~kzhu/metaphor.

5.2 Type 1 Metaphor Recognition

We compare our type 1 metaphor recognition with the method (denoted KZ) of Krishnakumaran and Zhu (2007). For sentences containing the "x is a y" pattern, KZ uses WordNet to detect whether y is a hypernym of x; if not, the sentence is considered a metaphor. Our test set is 200 random sentences that match the "x BE a y" pattern. We label a sentence in the set as a metaphor if the two nouns connected by BE do not actually have an isA relation, or if they do have an isA relation but the sentence expresses a strong emotion (e.g., "this man is an animal!").

Table 2: Type 1 metaphor recognition.

                Precision  Recall  F1
  KZ            13%        30%     18%
  Our Approach  73%        66%     69%

The results are summarized in Table 2. KZ does not perform as well, due to the small coverage of the WordNet taxonomy. Only 33 of the 200 sentences contain a concept x that exists in WordNet and has at least one hypernym, and among these, only 2 sentences contain a y that is a hypernym ancestor of x in WordNet. Clearly, the bottleneck is the scale of WordNet.

5.3 Type 2/3 Metaphor Recognition

For type 2/3 metaphor recognition, we compare our results with three other methods. The first competing method (called SA) employs the selectional association proposed by Resnik (1993). Selectional association measures the strength of the connection between a predicate c and a term e by

  A(c, e) = Pr(e|c) log(Pr(e|c) / Pr(e)) / S(c),    (13)

where

  S(c) = KL(Pr(e|c) || Pr(e)) = Σ_e Pr(e|c) log(Pr(e|c) / Pr(e))

Given an NP-predicate pair, if its SA score is less than a threshold α (set to 10^-4 empirically), the pair is recognized as a metaphoric context.

The second competing method (called CP) is the contextual preference approach (Resnik, 1993) introduced in Section 4.2. To establish context preference distributions, we randomly select 100 million sentences from the web corpus, parse each sentence with the Stanford parser (Group, 2013) to obtain all subject-predicate-object triples, and aggregate the triples into 33,236,292 subject-predicate pairs and 38,890,877 predicate-object pairs. The occurrences of these pairs are used as the context preference. Given an NP-predicate pair, if its context preference score is less than a threshold β (set to 10^-5 empirically), the pair is considered metaphoric. (Since the original work does not specify α or β, we pick values that optimize the performance of these baselines.)

The third competing method (called VH) is a variant of our own algorithm with Γm replaced by a metaphor database derived from the Slip Net proposed by Veale and Hao (2008), which we call ΓVH. We built a Slip Net containing 21,451 concept nodes associated with 27,533 distinct talking points. We consider two concepts to be metaphoric if they are at most 5 hops apart on the Slip Net; the choice of 5 hops is a trade-off between precision and recall for the Slip Net. We thus created ΓVH with 5,633,760 pairs of concepts.

We sampled 1,000 sentences from the BNC dataset (Clear, 1993) as follows. We prepare a list of 2,945 frequent verbs (and their different forms). For each verb, we obtain at most 5 sentences from the BNC dataset that contain this verb as a predicate. This yields a total of 22,601 sentences, from which we randomly sample 1,000 sentences to form the test set. Each sentence in the set is then manually labeled as "metaphor" or "non-metaphor". We label them according to the following procedure:

1. for each verb, we collect the intended use, i.e., the categories of its arguments (subject or object), according to Merriam-Webster's dictionary;

2. if the argument of the verb in the sentence belongs to the intended category, the sentence is labeled "non-metaphor";
3. if the argument and the intended meaning form a metonymy, which uses a part or an attribute to represent the whole object, the sentence is labeled "non-metaphor";

4. otherwise, the sentence is labeled "metaphor".

Table 3: Type 2/3 metaphor recognition.

                Precision  Recall  F1
  SA            23%        20%     21%
  CP            50%        20%     26%
  VH            11%        86%     20%
  Our Approach  65%        52%     58%

The results for type 2 and 3 metaphor recognition are shown in Table 3. Our knowledge-based approach significantly outperforms the other methods in F1. Although VH achieves good recall, its precision is poor. This is because (i) the Slip Net construction makes heavy use of sibling terms in WordNet, but sibling terms are not necessarily similar terms; and (ii) many pairs generated by slipping over the Slip Net are related in theory but are rarely uttered in practice for lack of a practical context.

[Figure 2: Metaphor recognition of type 2 and 3 metaphors: F1 scores of SA, CP, VH, and our approach, for verbs grouped by selectional preference strength (SPS in (2,3], (3,4], and (4,5]).]

Fig. 2 compares the four methods on verbs with different selectional preference strength, which indicates how strongly a verb's arguments are restricted to a certain scope of nouns (no verb has an SPS larger than 5). Again, our method shows a significant advantage across the board.

We explain why our approach works better using the examples in Table 4.

Table 4: Metaphor recognition for some example sentences from the BNC dataset (HM: human label, M: metaphor, L: literal).

  ID        Sentence                                                           HM  SA  CP  VH  Ours
  AAU 200   Road-block salvo shatters Bucharest's fragile silence.             M   L   L   M   M
  ABG 2327  Obstruction and protectionism do not stalk only big companies.     M   L   L   M   M
  AN8 1309  But when science proposes to manipulate the life of a human baby,  L   M   M   M   L
  ACH 1075  Nevertheless, recent work on Mosley and the BUF has concurred      L   M   M   M   L
            about their basic unimportance.

In sentence AAU 200, shatters is a metaphoric usage because silence is not a thing that can be broken into pieces. The SA and CP scores for the shatters-silence pair are high because this word combination is quite common, and hence these methods incorrectly treat it as a literal expression. The situation is similar for the stalk-company pair in ABG 2327. On the other hand, in AN8 1309, manipulate-life is a rare combination and hence has low SA and CP scores and is deemed a metaphor, while in reality it is a literal use. A similar case occurs for the work-concur pair in ACH 1075. In all these cases, our knowledge bases Γm and ΓH are comprehensive and accurate enough to correctly separate metaphors from non-metaphors. On the contrary, the metaphor database ΓVH covers so many pairs that it treats every pair as a metaphor.

Besides our own dataset, we also experiment on the TroFi Example Base (available at http://www.cs.sfu.ca/~anoop/students/jbirke/), which consists of 50 verbs and 3,736 sentences containing these verbs. Each sentence is annotated as a literal or non-literal use of the verb. Our algorithm is used to classify the subjects and the objects of the verbs. We use the Stanford dependency parser to obtain collapsed typed dependencies for these sentences and, for each sentence, run our algorithm to classify the subjects and objects related to the verb, if the verb acts as a predicate. Our approach achieves 77.5% precision but just under 5% recall. The recall is low because (i) non-literal uses in the TroFi dataset include not only metaphor but also metonymy, irony, and other anomalies; (ii) our approach currently considers only subject-predicate and predicate-object dependencies in a sentence, but the target verbs do not act as predicates in many of the example sentences; and (iii) the Stanford dependency parser is not robust enough, so half of the sentences are not parsed correctly.

5.4 Metaphor Explanation

In this experiment, we use the classic labeled metaphoric sentences from Lakoff and Johnson (1980).
Lakoff and Johnson provide 24 metaphoric mappings, with about ten example sentences for each mapping; in total, there are 214 metaphoric sentences. Among them, we focus on the 83 sentences whose metaphor is expressed by a subject-predicate or predicate-object relation, as this paper focuses on verb-centric context preferences.

We evaluate the results of the competing algorithms by the following labeling criteria. We consider an output (i.e., a pair of mapped concepts) a match if the produced pair exactly matches the ground-truth pair, or if the pair is subsumed by the ground-truth pair. For example, the ground truth for the sentence Let that idea simmer on the back burner is ideas → foods according to Lakoff and Johnson (1980); if our algorithm outputs idea → stew, it is considered a match since stew belongs to the food category. An output pair is considered correct if it is not a match to the ground truth but is judged metaphoric by at least 2 of the 3 human judges.

Given a sentence, our algorithm returns a list of possible explanations for the missing concept, ranked by probability, so we evaluate the results by three different metrics: Match Top 1, where a result is considered correct if the top explanation is a match; Match Top 3, where a result is considered correct if there is a match among the top 3 ranked explanations; and Correct Top 3, where a result is considered correct if there is a correct explanation among the top 3.

Table 5: Precision of metaphor explanation using different metaphor databases.

        Match Top 1  Match Top 3  Correct Top 3
  ΓVH   26%          49%          54%
  Γm    43%          67%          78%

Comparison with Slip Net. We compare the result of our algorithm (from Section 4.3) against the variant that uses the ΓVH obtained in Section 5.3. Table 5 summarizes the precision of the two algorithms under the three metrics. Some of the sentences and the top explanations given by our algorithm are listed in Table 6; the explanations are ordered from left to right by score (in the original table, the concept to be explained is italicized, and matching or correct explanations are marked in bold or bold italics).

Table 6: Metaphor sentences explained by the system.

  Metaphor mapping      Sentence                                                   Explanation
  Ideas are food        Let that idea simmer on the back burner.                   stew; carrot; onion
                        We don't need to spoon-feed our students with knowledge.   egg roll; acorn; word
  Eyes are containers   His eyes displayed his compassion.                         window; symbol; tiny camera
                        His eyes were filled with anger.                           hollow ball; water balloon; balloon
  Emotional effect is   His mother's death hit him hard.                           enemy; monster
  physical contact      That idea bowled me over.                                  punch; stew; onion
  Life is a container   Her life is crammed with activities.                       tapestry; beach; dance
                        Get the most out of life.                                  game; journey; prison

Comparison with paraphrasing. While we define metaphor explanation as the task of recovering the missing noun-based concept in a source-target mapping, an alternative way to explain a metaphor (Shutova, 2010) is to find a paraphrase of the verb in the metaphor. Here we evaluate Shutova's paraphrasing task on the verbs in the metaphoric sentences. For a metaphoric verb V in a sentence, Shutova (2010) selects a set of verbs that probabilistically best match the grammatical relations of V, then filters out those verbs that are not related to V according to WordNet, and finally re-ranks the remaining verbs by selectional association.

In some sense, Shutova's work uses a framework similar to ours: first restrict the paraphrase candidate set using a knowledge base, then select the most appropriate word based on the context.
The difference is that the target of Shutova (2010) is the verb in the sentence, while our approach focuses on the noun.

To implement Shutova's algorithm, we extract and count each grammatical relation in 1 billion sentences. These counts are used to calculate the context matching of (Shutova, 2010) and also the selectional association. We run Shutova's paraphrasing on the verbs of the 83 sentences; only 25 of them find a good paraphrase among Shutova's top 3 results. After removing 17 sentences that contain light verbs (e.g., take, give, put), the algorithm finds 21 good paraphrases in the top 3 results. One reason for the low recall is that WordNet is inadequate for providing candidate metaphor mappings. This is also the reason why our metaphor base is better than the metaphor base generated from talking points.

6 Conclusion

Knowledge is essential for a machine to identify and understand metaphors. In this paper, we show how to use two probabilistic knowledge bases, automatically acquired from billions of web pages, for this purpose. This work currently recognizes and explains metaphoric mappings between nominal concepts with the help of selectional preferences over subject-predicate and predicate-object contexts only. An immediate next step is to extend this framework to more general contexts; a further improvement will be to identify mappings between any source and target domains.

7 Acknowledgements

Kenny Q. Zhu was partially supported by a Google Faculty Research Award and NSFC Grants 61100050, 61033002, and 61373031.

References

Rodrigo Agerri. 2008. Metaphor in textual entailment. In COLING (Posters), pages 3–6.

John Barnden, Sheila Glasbey, Mark Lee, and Alan Wallington. 2002. Reasoning in metaphor understanding: the ATT-Meta approach and system. In COLING '02, pages 1–5.

Eric P. S. Baumer, James P. White, and Bill Tomlinson. 2010. Comparing semantic role labeling with typed dependency parsing in computational metaphor identification. In CALC '10, pages 14–22.

Julia Birke and Anoop Sarkar. 2006. A clustering approach for nearly unsupervised recognition of nonliteral language. In Proceedings of EACL-06, pages 329–336.

Jeremy H. Clear. 1993. The British national corpus. In The Digital Word, pages 163–187.

Charles J. Fillmore, Christopher R. Johnson, and Miriam R. L. Petruck. 2003. Background to FrameNet. International Journal of Lexicography, 16.3:235–250.

James Geary. 2011. I is an Other: The Secret Life of Metaphor and How It Shapes the Way We See the World. Harper.

Matt Gedigian, John Bryant, Srini Narayanan, and Branimir Ciric. 2006. Catching metaphors. In Workshop on Scalable Natural Language Understanding.

Stanford NLP Group. 2013. The Stanford parser. http://nlp.stanford.edu/software/lex-parser.shtml.

Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING '92, pages 539–545.
Paul Kingsbury and Martha Palmer. 2002. From TreeBank to PropBank. In Language Resources and Evaluation.

Saisuresh Krishnakumaran and Xiaojin Zhu. 2007. Hunting elusive metaphors using lexical resources. In Proceedings of the Workshop on Computational Approaches to Figurative Language, pages 13–20, Rochester, New York, April. ACL.

George Lakoff and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press, Chicago, USA.

J. H. Martin. 1990. A Computational Model of Metaphor Interpretation. Academic Press Professional, Inc.

Zachary J. Mason. 2004. CorMet: a computational, corpus-based conventional metaphor extraction system. Computational Linguistics, 30:23–44, March.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38:39–41, November.

Srinivas Sankara Narayanan. 1997. Knowledge-based action representations for metaphor and aspect (KARMA). Technical report.

Philip Stuart Resnik. 1993. Selection and information: a class-based approach to lexical relationships. Ph.D. thesis.

Ekaterina Shutova, Lin Sun, and Anna Korhonen. 2010. Metaphor identification using verb and noun clustering. In COLING '10, pages 1002–1010.

Ekaterina Shutova. 2010. Automatic metaphor interpretation as a paraphrasing task. In HLT '10, pages 1029–1037.

Catherine Smith, Tim Rumbell, John Barnden, Bob Hendley, Mark Lee, and Alan Wallington. 2007. Don't worry about metaphor: affect extraction for conversational agents. In ACL '07, pages 37–40.

P. D. Turney. 2008. The latent relation mapping engine: Algorithm and experiments. Journal of Artificial Intelligence Research, 33(1):615–655.

Tony Veale and Yanfen Hao. 2008. A fluid knowledge representation for understanding and generating creative metaphors. In COLING, pages 945–952.

Yorick Wilks. 1978. Making preferences more active. Artificial Intelligence, 11(3):197–223.

Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Qili Zhu. 2012. Probase: a probabilistic taxonomy for text understanding. In SIGMOD Conference, pages 481–492.

Li Zhang. 2010. Metaphor interpretation and context-based affect detection. In COLING (Posters), pages 1480–1488.