Data-Driven Metaphor Recognition and Explanation

Hongsong Li, Microsoft Research Asia, hongsli@microsoft.com
Kenny Q. Zhu, Shanghai Jiao Tong University, kzhu@cs.sjtu.edu.cn
Haixun Wang, Google Research, haixun@google.com

Abstract

Recognizing metaphors and identifying their source-target mappings is an important task, as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using these knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.

1 Introduction

A metaphor is a way of communicating. It enables us to comprehend one thing in terms of another. For example, the metaphor Juliet is the sun allows us to see Juliet much more vividly than if Shakespeare had taken a more literal approach. We utter about one metaphor for every ten to twenty-five words, or about six metaphors a minute (Geary, 2011).

Specifically, a metaphor is a mapping of concepts from a source domain to a target domain (Lakoff and Johnson, 1980). The source domain is often concrete and based on sensory experience, while the target domain is usually abstract. Two concepts are connected by this mapping because they share some common or similar properties, and as a result, the meaning of one concept can be transferred to another. For example, in "Juliet is the sun," the sun is the source concept while Juliet is the target concept. One interpretation of this metaphor is that both concepts share the property that their existence brings about warmth, life, and excitement. In a metaphorical sentence, at least one of the two concepts must be explicitly present. This leads to three types of metaphors:

1. Juliet is the sun. Here, both the source (sun) and the target (Juliet) are explicit.

2. Please wash your claws before scratching me. Here, the source (claws) is explicit, while the target (hands) is implicit, and the context of wash is in terms of the target.

3. Your words cut deep. Here, the target (words) is explicit, while the source (possibly, knife) is implicit, and the context of cut is in terms of the source.

In this paper, we focus on the recognition and explanation of metaphors. For a given sentence, we first check whether it contains a metaphoric expression (which we call metaphor recognition), and if it does, we identify the source and the target concepts of the metaphor (which we call metaphor explanation). Metaphor explanation is important for understanding metaphors. Explaining type 2 and 3 metaphors is particularly challenging and, to the best of our knowledge, has not been attempted for nominal concepts (i.e., concepts represented by noun phrases) before. In our examples, knowing that Juliet and hands are the target concepts avoids the confusion that may arise if the source concepts sun and claws are taken literally in understanding the sentences. This, however, does not mean that the source concept is a useless embellishment.
In the 3rd sentence, knowing that words is mapped to knife enables the system to understand the emotion or sentiment embedded in the text. This is why metaphor recognition and explanation are important to applications such as affect mining (Smith et al., 2007).

It is worth noting that some prefer to consider the verb "cut", rather than the noun "words", to be metaphoric in the 3rd sentence above. We instead concentrate on nominal metaphors and seek to explain source-target mappings in which at least one domain is a nominal concept. This is because verbs usually have nominal arguments, as either subject or object; explaining the source-target mapping of the nominal argument therefore covers most, if not all, cases where a verb is metaphoric.

In order for machines to recognize and explain metaphors, they must have extensive human knowledge. It is not difficult to see why metaphor recognition based on simple context modeling (e.g., by selectional restriction/preference (Resnik, 1993)) is insufficient. First, not all expressions that violate the restriction are metaphors. For example, I hate to read Heidegger violates selectional restriction, as the context (embodied by the verb read) prefers an object other than a person (Heidegger). But Heidegger is not a metaphor; it is a metonymy, which in this case denotes Heidegger's books. Second, not every metaphor violates the restriction. For example, life is a journey is clearly a metaphor, but selectional restriction or preference is of no help when it comes to the isA context.

Existing approaches based on human-curated knowledge bases fall short of the challenge. First, the scale of a human-curated knowledge base is often very limited, which means at best it covers a small set of metaphors. Second, new metaphors are created all the time, and the challenge is to recognize and understand metaphors that have never been seen before. This requires extensive knowledge. As a very simple example, even if the machine knows Sports cars are fire engines is a metaphor, it still needs to know what a sports car is before it can understand that My Ferrari is a fire engine is also a metaphor. Third, existing human-curated knowledge bases (including metaphor databases and WordNet) are not probabilistic. They cannot tell how typical an instance is of a category (e.g., a robin is a more typical bird than a penguin), or how commonly an expression (e.g., a breath of fresh air) is used as a source concept to describe targets in another concept (e.g., young girls). Unfortunately, without the necessary probabilistic information, not much reasoning can be performed for metaphor explanation.

In this paper, we address the above challenges. We start with a probabilistic isA knowledge base of many entities and categories harvested from billions of web documents using a set of strict syntactic patterns known as the Hearst patterns (Hearst, 1992). We then automatically acquire a large probabilistic metaphor database with the help of both syntactic patterns and the isA knowledge base (Section 3). Finally, we combine the two knowledge bases with a probabilistic reasoning mechanism for automatic metaphor recognition and explanation (Section 4).

This paper makes the following contributions:

1. To our knowledge, we are the first to introduce the metaphor explanation problem, which seeks to recover missing or implied source or target concepts in an implicit metaphor.
2. This is the first big-data-driven, unsupervised approach to metaphor recognition and explanation. One benefit of leveraging big data is that the knowledge we obtain is less biased, has broad coverage, and can be updated in a timely manner. More importantly, a data-driven approach can associate probabilities with each piece of knowledge; such probabilities are not available in human-curated knowledge bases but are indispensable for inference and reasoning.

3. Our results show the effectiveness of our approach in terms of both coverage and accuracy. We acquire one of the largest metaphor knowledge bases in existence, with a precision of 82%, and our metaphor recognition accuracy significantly outperforms state-of-the-art methods (Section 5).

2 Related Work

Existing work on metaphor recognition and interpretation can be divided into two categories: context-oriented and knowledge-driven. The approach proposed in this paper touches on both categories.

2.1 Context-oriented Methods

Some previous work relies on context to differentiate metaphorical expressions from literal ones (Wilks, 1978; Resnik, 1993). The selectional restriction theory (Wilks, 1978) argues that the meaning of an expression is restricted by its context, and violations of the restriction imply a metaphor.

Resnik (1993) uses KL divergence to measure selectional preference strength (SPS), i.e., how strongly a context restricts an expression. Although he did not use this measure directly for metaphor recognition, SPS (and a related measure called selectional association) is widely used in more recent approaches to metaphor recognition and interpretation (Mason, 2004; Shutova, 2010; Shutova et al., 2010; Baumer et al., 2010). For example, Mason (2004) learns domain-specific selectional preferences and uses them to find mappings between concepts from different domains. Shutova (2010) defines metaphor interpretation as a paraphrasing task; the method discriminates between literal and figurative paraphrases by detecting selectional preference violations. The results of this work are compared with our approach in Section 5. Shutova et al. (2010) identify concepts in the source domain of a metaphor by clustering verb phrases and filtering out verbs that have weak selectional preference strength. Baumer et al. (2010) use semantic role labeling to calculate selectional preferences over semantic relations instead of grammatical relations for metaphor recognition.

A less related but also context-based work is analogy interpretation by relation mapping (Turney, 2008), where the goal is to generate a mapping between source and target domains by computing pair-wise co-occurrences for different contextual patterns.

Our approach uses selectional restriction when enriching the metaphor knowledge base, and adopts context preference when explaining type 2 and 3 metaphors by focusing on the verbs near a potential source or target concept.

2.2 Knowledge-driven Methods

A growing number of works use knowledge bases for metaphor understanding (Martin, 1990; Narayanan, 1997; Barnden et al., 2002; Veale and Hao, 2008). MIDAS (Martin, 1990) checks whether a sentence contains an expression that can be explained by a more general metaphor in a human-curated metaphor knowledge base. ATT-Meta (Barnden et al., 2002) performs metaphor reasoning with a human-curated metaphor knowledge base and first-order logic, and it focuses on affect detection (Smith et al., 2007; Agerri, 2008; Zhang, 2010).
Krishnakumaran and Zhu (2007) use the isA relation in WordNet (Miller, 1995) for metaphor recognition. Gedigian et al. (2006) use FrameNet (Fillmore et al., 2003) and PropBank (Kingsbury and Palmer, 2002) to train a maximum entropy classifier for metaphor recognition. TroFi (Birke and Sarkar, 2006) redefines literal and non-literal as two senses of the same verb and provides each sense with seed sentences from human-curated resources such as WordNet and known metaphor and idiom sets. For a given sentence containing a target verb, it compares the similarity of the sentence with the two seed sets; if the sentence is closer to the non-literal sense set, the verb is recognized as a non-literal usage.

While the above work all relies on human-curated data sets or manual labeling, Veale and Hao (2008) introduced the notion of talking points, which are figurative properties of noun-based concepts. For example, the concept "Hamas" has the following talking points: is islamic:movement and governs:gaza strip. They automatically constructed a knowledge base called Slip Net from WordNet and a web corpus. Concepts that are connected on the Slip Net can "slip" to one another and are hence considered related in a metaphor. However, straightforward traversal of the Slip Net can become computationally impractical, and the authors did not elaborate on the implementation details. In contrast, the knowledge base acquired in this paper is much larger, and our algorithms are computationally more feasible.

3 Obtaining Probabilistic Knowledge

In this section, we describe how to use a large, general-purpose, probabilistic isA knowledge base ΓH to create a probabilistic metaphor dataset Γm. ΓH contains isA pairs as well as scores associated with each pair. The metaphor dataset Γm contains metaphors of the form (source, target), together with a weight function Pm that maps a metaphor pair to a probabilistic score. The purpose of creating ΓH is to help clean and expand Γm, and to perform probabilistic inference for metaphor detection.

3.1 IsA Knowledge ΓH

ΓH, a general-purpose, probabilistic isA knowledge base, was previously constructed by Wu et al. (2012); the dataset can be found at http://probase.msra.cn/. ΓH contains isA relations in the form of (x, hx), a pair of hyponym and hypernym, for example, (Steve Ballmer, CEO of IT companies), and each pair is associated with a set of probabilistic scores. The two most important scores are known as typicality: P(x|hx), the typicality of x in category hx, and P(hx|x), the typicality of category hx for instance x; both are used in metaphor recognition and explanation. The scores are approximated by frequencies, e.g.,

  P(x|hx) = (# of (x, hx) in the Hearst extractions) / (# of hx in the Hearst extractions)

In total, ΓH contains 16 million unique isA relationships and 2.7 million unique concepts or categories (the hx's in the (x, hx) pairs). The importance of big data is obvious: ΓH contains millions of categories, with probabilistic scores for each category, which enables inference for metaphor understanding, as we will show next.
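To make the typicality scores concrete, here is a minimal sketch, assuming the Hearst-pattern extraction simply yields a multiset of (instance, hypernym) pairs; the variable names and toy counts are ours for illustration, not part of the paper or the Probase release.

```python
from collections import Counter

# Toy stand-in for pairs extracted with Hearst patterns ("companies such as Apple", ...).
# In the paper these counts come from billions of web pages; here they are made up.
hearst_pairs = [
    ("apple", "company"), ("apple", "company"), ("apple", "fruit"),
    ("google", "company"), ("banana", "fruit"), ("banana", "fruit"),
]

pair_count = Counter(hearst_pairs)                 # n(x, hx)
hyper_count = Counter(h for _, h in hearst_pairs)  # n(hx)
inst_count = Counter(x for x, _ in hearst_pairs)   # n(x)

def p_x_given_h(x, h):
    """Typicality of instance x within category h, i.e., P(x|hx)."""
    return pair_count[(x, h)] / hyper_count[h] if hyper_count[h] else 0.0

def p_h_given_x(x, h):
    """Typicality of category h for instance x, i.e., P(hx|x)."""
    return pair_count[(x, h)] / inst_count[x] if inst_count[x] else 0.0

print(p_x_given_h("apple", "company"))  # 2/3
print(p_h_given_x("apple", "fruit"))    # 1/3
```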
3.2 Acquiring Metaphors Γm

We acquire an initial set of metaphors Γm from similes. A simile is a figure of speech that explicitly compares two different things using words such as "like" and "as". For example, the sentence Life is like a journey is a simile; without the word "like," it becomes a metaphor: Life is a journey. This property makes similes an attractive first target for metaphor extraction from a large corpus. We use the following syntactic pattern for extraction:

  〈target〉 BE/VB like [a] 〈source〉    (1)

where BE denotes is/are/has been/have been, etc., VB denotes a verb other than BE, and 〈target〉 and 〈source〉 denote noun phrases or verb phrases.

Note that not every extracted pair is a metaphor. Poetry is like an art matches the pattern, but it is not a metaphor because poetry really is an art. We will use ΓH to clean such pairs. Furthermore, due to the idiosyncrasies of natural language, it is not trivial to correctly extract the 〈target〉 and the 〈source〉 from each sentence that matches the pattern. We apply a POS tagger and a lemmatizer to the sentences, and we developed a rule-based system containing more than two dozen rules for extraction. For example, a high-precision but low-recall rule is "〈target〉 must be at the beginning of a sentence or the beginning of a clause (e.g., following the word that)".

Finally, from 8,552,672 sentences that match the above pattern (pattern 1), we obtain 1.2 million unique (x, y) pairs, and after filtering, we are left with close to 1 million unique metaphor pairs, which form the starting point of Γm.
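The paper does not spell out the pattern matcher beyond the two dozen hand-written rules, so the following is only a rough sketch of the BE variant of pattern (1), assuming lemmatized, lower-cased input; the regular expression and the single filtering rule are illustrative stand-ins, not the authors' actual rules.

```python
import re

# Rough stand-in for the BE case of pattern (1): <target> BE like [a] <source>.
# Noun-phrase boundaries are approximated by word spans; the paper's rule-based
# system uses a POS tagger, a lemmatizer, and ~2 dozen rules instead.
SIMILE = re.compile(
    r"^(?P<target>[\w ]+?) (?:is|are|was|were|has been|have been|be) "
    r"like (?:a |an |the )?(?P<source>[\w ]+?)[.!?]?$"
)

def extract_simile(sentence):
    match = SIMILE.match(sentence.strip().lower())
    if not match:
        return None
    target, source = match.group("target"), match.group("source")
    # One illustrative high-precision rule: reject targets that span a comma,
    # so the extracted target plausibly starts the sentence or clause.
    if "," in target:
        return None
    return (source, target)

print(extract_simile("Life is like a journey."))    # ('journey', 'life')
print(extract_simile("That idea is like a stew."))  # ('stew', 'that idea')
print(extract_simile("He ran like the wind."))      # None: only BE is handled here
```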
3.3 Cleaning, Expanding, and Weighting Γm

The simile pattern only allows us to extract some of the available metaphor pairs. To expand Γm, we use a more flexible but also noisier pattern to extract more candidate metaphor pairs from billions of sentences in the web corpus:

  〈target〉 BE [a] 〈source〉    (2)

The above "is a" pattern covers metaphors such as Life is a journey. But many pairs thus extracted are not metaphors, for example, Malaysia is a tropical country. That is, the pairs extracted by the "is a" pattern contain at least two types of relations: literal isA relations and metaphor relations. The problem is how to distinguish one from the other. In theory, the set of all isA relations, I, and the set of all metaphor relations, M, do not overlap, because by definition the source concept and the target concept in a metaphor are not the same thing. Our intuition is thus the following: the set of pairs produced by the simile pattern, S, is a subset of M, while the set of pairs extracted by the Hearst patterns, H, is a subset of I. Since M and I hardly overlap, S and H should have little overlap, too. In practice, very few people would say something like journeys such as life. Figure 1 illustrates this scenario.

[Figure 1: Relations among different sets (example pairs include (beast, sports car), (sports car, ferrari), (vehicle, ferrari), and (beast, ferrari)). Dotted circles represent relations (ground truth); solid circles represent pairs extracted by the syntactic patterns (Hearst, simile, and "is a").]

To verify this intuition, we randomly sampled 1,000 sentences and manually annotated them. Of these sentences, 40 contain an isA relation, of which 27 are enclosed in a Hearst pattern and 13 can be extracted by the "is a" pattern. Furthermore, 28 of these 1,000 sentences contain a metaphoric expression, and of these 28 metaphors, 15 are embedded in a simile pattern. More importantly, there is no overlap between the isA relations and the metaphors (and hence the similes).

In a larger-scale experiment, we crawled 1 billion sentences matching the "is a" pattern (2) from the web corpus. From these, we extracted 180 million unique (x, y) pairs. 24.8% of ΓH can be found in the "is a" pattern pairs, and 16.8% of Γm can be found in the "is a" pattern pairs. Furthermore, there is almost no overlap between ΓH and Γm: 1.26% of ΓH can be found in Γm, and 1.31% of Γm can be found in ΓH.

Our goal is to use the information collected through the syntactic patterns to enrich the metaphor relations in Γm. Armed with the above observations, we draw two conclusions. First, the (life, journey) pair extracted from life is a journey is more likely a metaphor, since it does not appear in the set extracted by the Hearst patterns. Second, if any existing pair in Γm also appears in ΓH, we can remove that pair from Γm.

From the 180 million unique (x, y) pairs extracted earlier, by filtering out low-frequency pairs and pairs that appear in ΓH, we obtain 2.6 million fresh metaphors. (To set the frequency threshold, we randomly sample pairs of frequency 1, 2, ..., 10 from Γm and check the precision of each group; we filter out pairs with frequency less than 5 to optimize precision.) This is almost 3 times larger than the initial metaphor set obtained from the simile pattern.

We further expand Γm by adding metaphors derived from Γm and ΓH: if (x, y) ∈ Γm and (x, hx) ∈ ΓH, then we add (hx, y) to Γm. For example, if (Julie, sun) ∈ Γm, then we add (person name, sun) to Γm, since (Julie, person name) ∈ ΓH. This enables the metaphor detection approach we describe in Section 4. Note that we ignore transitivity in the isA relations from ΓH, as such transitivity is not always reliable. For example, a car seat is a chair, and a chair is furniture, but a car seat is not furniture. How to handle transitivity in a data-driven isA taxonomy is a challenging problem and is beyond the scope of this paper.

Finally, we calculate the weight of each metaphor (x, y). The weight Pm(x, y) is calculated as follows:

  Pm(x, y) = (occurrences of (x, y) in the "is a" pattern) / (occurrences of the "is a" pattern)    (3)

The weights of derived metaphors, such as (person name, sun), are calculated as follows:

  Pm(hx, y) = Σ_{(x, hx) ∈ ΓH} Pm(x, y)    (4)
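The cleaning, expansion, and weighting steps above can be pictured with a small sketch; the toy dictionaries below stand in for the web-scale Γm and ΓH, and the probability values are made-up placeholders rather than measured weights.

```python
from collections import defaultdict

# Toy inputs; in the paper these come from the web-scale extractions.
gamma_m = {("life", "journey"): 1e-6, ("julie", "sun"): 2e-7,
           ("lamborghini", "beast"): 3e-7}          # Pm(x, y), Eq. (3)
gamma_h = {("julie", "person name"), ("lamborghini", "sports car"),
           ("ferrari", "sports car"), ("malaysia", "tropical country")}

# Cleaning: drop candidate pairs that are literal isA relations.
gamma_m = {pair: w for pair, w in gamma_m.items() if pair not in gamma_h}

# Expansion and weighting (Eq. 4): if (x, y) is a metaphor and (x, hx) is an
# isA pair, add the derived metaphor (hx, y) with Pm(hx, y) = sum of Pm(x, y).
derived = defaultdict(float)
for (x, y), w in gamma_m.items():
    for (xx, hx) in gamma_h:
        if xx == x:
            derived[(hx, y)] += w
gamma_m.update(derived)

print(gamma_m[("sports car", "beast")])   # 3e-07, derived from (lamborghini, beast)
print(gamma_m[("person name", "sun")])    # 2e-07, derived from (julie, sun)
```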
4 Probabilistic Metaphor Understanding

In this paper, we consider two aspects of metaphor understanding: metaphor recognition and metaphor explanation. The latter is needed for type 2 and 3 metaphors, where either the source or the target concept is implicit or missing. Next, we describe a probabilistic approach to these two tasks.

4.1 Type 1 Metaphors

In a type 1 metaphor, both the source and the target concepts appear explicitly. When a sentence matches the "is a" pattern (pattern 2), it is a potential metaphoric expression: the first noun in the pattern is the target candidate, while the second noun is the source candidate.

To recognize type 1 metaphors, we first obtain the candidate (source, target) pair from the sentence. Then, we check whether we have any knowledge about the pair. Intuitively, if the pair exists in the metaphor dataset Γm, then it is a metaphor; if the pair exists in the isA knowledge base ΓH, then it is not. But because Γm is far from complete, if a pair exists in neither Γm nor ΓH, it may still be a metaphor we have never seen before. In this case, we reason as follows.

Consider a sentence such as My Ferrari is a beast. Assume (Ferrari, beast) ∉ Γm, but (sports car, beast) ∈ Γm. Note that (sports car, beast) may itself be a derived metaphor added to Γm during metaphor expansion, the original metaphor extracted from the web data being (Lamborghinis, beast). Furthermore, from ΓH we know that Ferrari is a sports car, that is, (Ferrari, sports car) ∈ ΓH. We can then infer that Ferrari to beast is very likely a metaphoric mapping.

Specifically, let (x, y) be the pair we are concerned with. We want to compute the odds of (x, y) representing a metaphor versus a normal isA relationship:

  P(x, y) / (1 − P(x, y))    (5)

where P(x, y) is the probability that (x, y) forms a metaphor. Combining the knowledge we have in ΓH, we have

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(x, hx, y)    (6)

Here, hx is a possible superconcept, i.e., a possible interpretation, of x. For example, if x = apple, then two highly probable interpretations are company and fruit. In Eq. (6), we aggregate over all possible interpretations (all superconcepts) of x, which is possible because of the massive size of the concept space in ΓH. We can rewrite Eq. (6) as

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(y|x, hx) P(x|hx) P(hx)    (7)

Here, P(y|x, hx) is the probability, when x is interpreted as an hx, that y is the metaphorical concept used for hx. Given hx, y is therefore independent of x, so P(y|x, hx) can simply be replaced by P(y|hx). We can then rewrite Eq. (7) as

  P(x, y) = Σ_{(x, hx) ∈ ΓH} P(y|hx) P(x|hx) P(hx) = Σ_{(x, hx) ∈ ΓH} P(hx, y) P(x|hx)    (8)

Clearly, P(hx, y) is simply Pm(hx, y) in Eq. (4), given by the metaphor dataset Γm. Furthermore, P(x|hx) is the typicality of x in the hx category, and P(hx) is the prior of the category hx; both are available from the isA knowledge base ΓH. Thus, we can calculate Eq. (8) using the information in the two knowledge bases we have created. If the odds in Eq. (5) are greater than a threshold δ, determined empirically to be δ = P(metaphor)/P(isA) (the ratio between the number of metaphors and isA pairs in a random sample of "is a" pattern sentences), we declare (x, y) a metaphor.
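The type 1 decision rule can be sketched in a few lines; the numbers below are invented for illustration, and p_m, typicality, and delta stand in for the quantities of Eqs. (4), (8), and (5).

```python
p_m = {("sports car", "beast"): 3e-7}          # Pm(hx, y) from Eq. (4)
typicality = {("ferrari", "sports car"): 0.2,  # P(x|hx) from the isA base
              ("ferrari", "company"): 0.05}

def p_metaphor(x, y):
    """P(x, y) of Eq. (8): sum over the possible interpretations hx of x."""
    return sum(p_m.get((hx, y), 0.0) * p
               for (xx, hx), p in typicality.items() if xx == x)

def is_type1_metaphor(x, y, delta=1e-8):
    """Odds test of Eq. (5); delta plays the role of P(metaphor)/P(isA)."""
    p = p_metaphor(x, y)
    return p / (1.0 - p) > delta

print(is_type1_metaphor("ferrari", "beast"))   # True with these toy numbers
```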
4.2 Context Preference Modeling

It is more difficult to recognize metaphors when the source concept or the target concept is not explicitly given in a sentence. In this case, we rely on the context in the sentence.

Given a sentence, we find metaphor candidates and the context. Here, candidates are noun phrases in the sentence that can potentially be the target or the source concept of a metaphor, while the context consists of words that have a grammatical dependency with the candidate. The dependency can be subject-predicate, predicate-object, modifier-head, etc. The context can be a verb, a noun phrase, or an adjective that has a certain preference over the target or source candidate. For example, the word horse prefers verbs such as jump, drink, and eat; the word flower prefers modifiers such as red, yellow, and beautiful.

In this work, we focus on analyzing the preferences of verbs, using the subject-predicate or predicate-object relation between the verb and the noun phrases. We select the 2,226 most frequent verbs from the web corpus. For each verb, we construct the distribution of noun phrases that depend on the verb in sentences sampled from the web corpus. The noun phrases are restricted to those that occur in ΓH. More specifically, for any noun phrase y that appears in ΓH, we calculate

  Pr(C|y) = f_r(y, C) / Σ_C f_r(y, C)    (9)

where f_r(y, C) is the frequency of y occurring with context C under relation r. Note that we can build preference distributions for contexts other than verbs, since in theory r can be any relation (e.g., the modifier-head relation).

4.3 Type 2 and Type 3 Metaphors

If a sentence contains a type 2 or type 3 metaphor, either the source or the target concept in the sentence is missing. For each noun phrase x and context C in such a sentence, we want to know whether x is a literal or a metaphoric use. It is a metaphoric use if the selectional preference of some y, which is a source or target concept of x in Γm, is larger than the selectional preference of any superconcept of x in ΓH by a factor δ. Formally, there exists a y with (x, y) ∈ Γm or (y, x) ∈ Γm such that

  P(y|x, C) / P(h|x, C) ≥ δ,  ∀(x, h) ∈ ΓH.    (10)

To compute (10), we have

  P(y|x, C) = P(x, y, C) / P(x, C) = P(x, y) P(C|x, y) / P(x, C)    (11)

Assuming x is a target concept and y is a source concept (a type 3 metaphor), we can obtain P(x, y) by Eq. (8); type 2 metaphors are handled similarly. Furthermore, C is independent of x in a type 2 or 3 metaphor, since a metaphor is an unusual use of x (the target) within the given context. Therefore P(C|x, y) = P(C|y), where P(C|y) is available from Eq. (9). Similarly, we have

  P(h|x, C) = P(x, h) P(C|h) / P(x, C)    (12)

where P(x, h) is obtained from ΓH and P(C|h) comes from the context preference distribution. To explain the metaphor, i.e., to uncover the missing concept, we compute

  y* = argmax_{y : (y, x) ∈ Γm} P(y|x, C) = argmax_{y : (y, x) ∈ Γm} P(y, x) P(C|y)

As a concrete example, consider the sentence My car drinks gasoline. There are two possible targets: car and gasoline. The context for both targets is the verb drink. Let x = car. By Eq. (11), we first find all y's for which (car, y) ∈ Γm or (y, car) ∈ Γm; we obtain terms such as woman, friend, gun, horse, etc. When we calculate P(car, y) by Eq. (8), we also need to find the hypernyms of car in ΓH, which may include vehicle, product, asset, etc. For each candidate yi, P(yi|car, C) is calculated from the metaphor knowledge P(x, yi) and the context preference P(C|yi). Table 1 shows the result. Since the selectional preference of horse (from Γm) is much larger than that of the literal uses of car, the sentence is recognized as a metaphor, and the missing source concept is horse.

Table 1: Log probabilities (M: metaphor, L: literal).

  Type  yi       log P(yi, car)  log P(C|yi)  log P(yi|car, C)
  L     vehicle  -6.2            −∞           −∞
  L     product  -6.9            −∞           −∞
  L     asset    -6.3            −∞           −∞
  M     woman    -8.5            -2.8         -11.3
  M     friend   -8.0            -3.0         -11.0
  M     gun      -8.4            −∞           −∞
  M     horse    -8.2            -2.4         -10.6
  ...   ...      ...             ...          ...
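A compact sketch of this explanation step follows; the log probabilities loosely mirror Table 1 but are illustrative placeholders, and the shared denominator P(x, C) of Eqs. (11) and (12) is dropped because it cancels when candidates are compared.

```python
import math

# Toy knowledge for "My car drinks gasoline"; real values come from the
# metaphor base (Eq. 8), the isA base, and the verb preference distributions (Eq. 9).
p_joint = {("car", "horse"): math.exp(-8.2),   # P(x, y) for metaphor candidates
           ("car", "woman"): math.exp(-8.5),
           ("car", "vehicle"): math.exp(-6.2)} # P(x, h) for literal hypernyms
p_context = {("drink", "horse"): math.exp(-2.4),
             ("drink", "woman"): math.exp(-2.8),
             ("drink", "vehicle"): 0.0}        # vehicles are never subjects of "drink"

def score(x, concept, verb):
    # Numerator of Eq. (11) / Eq. (12); P(x, C) cancels in the comparison.
    return p_joint.get((x, concept), 0.0) * p_context.get((verb, concept), 0.0)

def explain(x, verb, metaphor_cands, literal_cands, delta=10.0):
    best = max(metaphor_cands, key=lambda y: score(x, y, verb))
    best_literal = max(score(x, h, verb) for h in literal_cands)
    # Eq. (10): the metaphoric reading must beat every literal reading by delta.
    if score(x, best, verb) > 0 and score(x, best, verb) >= delta * best_literal:
        return best
    return None

# The recovered source concept for "my car drinks gasoline" should be "horse".
print(explain("car", "drink", ["horse", "woman"], ["vehicle"]))
```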
5 Experimental Results

We evaluate the performance of metaphor acquisition, recognition, and explanation in our system and compare it with several state-of-the-art methods.

5.1 Metaphor Acquisition

From the web corpus, we collected 8,552,672 sentences matching the "is like a" pattern (pattern 1), from which we extracted 932,621 unique high-quality simile mappings. These simile mappings became the core of Γm. ΓH contains 16,736,068 unique isA pairs. We also collected 1,131,805,382 sentences matching the "is a" pattern (pattern 2), from which 180,446,190 unique mappings were extracted. These mappings contain both metaphors and isA relations. From there, we identified 2,663,127 metaphor pairs unseen in the simile set; these new pairs were added to Γm. Random samples show that the precisions of the core metaphor dataset and of the whole dataset are 93.5% and 82%, respectively. All of the above datasets, a sample of the context preference distributions, and the test sets mentioned in this section can be found at http://adapt.seiee.sjtu.edu.cn/~kzhu/metaphor.

5.2 Type 1 Metaphor Recognition

We compare our type 1 metaphor recognition with the method (denoted KZ) of Krishnakumaran and Zhu (2007). For sentences containing the "x is a y" pattern, KZ uses WordNet to detect whether y is a hypernym of x; if not, the sentence is considered a metaphor. Our test set is 200 random sentences that match the "x BE a y" pattern. We label a sentence in the set as a metaphor if the two nouns connected by BE do not actually have an isA relation, or if they do have an isA relation but the sentence expresses a strong emotion (e.g., "this man is an animal!").

Table 2: Type 1 metaphor recognition.

                Precision  Recall  F1
  KZ            13%        30%     18%
  Our Approach  73%        66%     69%

The results are summarized in Table 2. KZ does not perform as well, due to the small coverage of the WordNet taxonomy. Only 33 of the 200 sentences contain a concept x that exists in WordNet and has at least one hypernym, and among these, only 2 sentences contain a y that is a hypernym ancestor of x in WordNet. Clearly, the bottleneck is the scale of WordNet.

5.3 Type 2/3 Metaphor Recognition

For type 2/3 metaphor recognition, we compare our results with three other methods. The first competing method (called SA) employs the selectional association proposed by Resnik (1993). Selectional association measures the strength of the connection between a predicate c and a term e by

  A(c, e) = Pr(e|c) log(Pr(e|c) / Pr(e)) / S(c),    (13)

where

  S(c) = KL(Pr(e|c) || Pr(e)) = Σ_e Pr(e|c) log(Pr(e|c) / Pr(e))

Given an NP-predicate pair, if its SA score is less than a threshold α (set to 10^-4 empirically), the pair is recognized as a metaphoric context.

The second competing method (called CP) is the contextual preference approach (Resnik, 1993) introduced in Section 4.2. To establish context preference distributions, we randomly select 100 million sentences from the web corpus, parse each sentence with the Stanford parser (Group, 2013) to obtain all subject-predicate-object triples, and aggregate the triples into 33,236,292 subject-predicate pairs and 38,890,877 predicate-object pairs. The occurrences of these pairs are used as the context preference. Given an NP-predicate pair, if its context preference score is less than a threshold β (set to 10^-5 empirically), the pair is considered metaphoric. (Since the original work does not specify α or β, we pick values that optimize the performance of these baselines.)

The third competing method (called VH) is a variant of our own algorithm with Γm replaced by a metaphor database derived from the Slip Net proposed by Veale and Hao (2008), which we call ΓVH. We built a Slip Net containing 21,451 concept nodes associated with 27,533 distinct talking points. We consider two concepts to be metaphoric if they are at most 5 hops apart on the Slip Net; the choice of 5 hops is a trade-off between precision and recall for the Slip Net. We thus created ΓVH with 5,633,760 pairs of concepts.

We sampled 1,000 sentences from the BNC dataset (Clear, 1993) as follows. We prepare a list of 2,945 frequent verbs (and their different forms). For each verb, we obtain at most 5 sentences from the BNC dataset that contain this verb as a predicate. This yields a total of 22,601 sentences, from which we randomly sample 1,000 sentences to form the test set. Each sentence in the set is then manually labeled as "metaphor" or "non-metaphor". We label them according to the following procedure:

1. for each verb, we collect the intended use, i.e., the categories of its arguments (subject or object), according to Merriam-Webster's dictionary;

2. if the argument of the verb in the sentence belongs to the intended category, the sentence is labeled "non-metaphor";
3. if the argument and the intended meaning form a metonymy, which uses a part or an attribute to represent the whole object, the sentence is labeled "non-metaphor";

4. otherwise, the sentence is labeled "metaphor".

Table 3: Type 2/3 metaphor recognition.

                Precision  Recall  F1
  SA            23%        20%     21%
  CP            50%        20%     26%
  VH            11%        86%     20%
  Our Approach  65%        52%     58%

The results for type 2 and 3 metaphor recognition are shown in Table 3. Our knowledge-based approach significantly outperforms the other methods in F1. Although VH achieves good recall, its precision is poor. This is because (i) the Slip Net construction makes heavy use of sibling terms in WordNet, but sibling terms are not necessarily similar terms; and (ii) many pairs generated by slipping over the Slip Net are related in theory but are rarely uttered in practice for lack of a practical context.

[Figure 2: Metaphor recognition of type 2 and 3 metaphors: F1 scores of SA, CP, VH, and our approach, for verbs grouped by selectional preference strength (SPS in (2,3], (3,4], and (4,5]).]

Fig. 2 compares the four methods on verbs with different selectional preference strength, which indicates how strongly a verb's arguments are restricted to a certain scope of nouns (no verb has an SPS larger than 5). Again, our method shows a significant advantage across the board.

We explain why our approach works better using the examples in Table 4.

Table 4: Metaphor recognition for some example sentences from the BNC dataset (HM: human label, M: metaphor, L: literal).

  ID        Sentence                                                           HM  SA  CP  VH  Ours
  AAU 200   Road-block salvo shatters Bucharest's fragile silence.             M   L   L   M   M
  ABG 2327  Obstruction and protectionism do not stalk only big companies.     M   L   L   M   M
  AN8 1309  But when science proposes to manipulate the life of a human baby,  L   M   M   M   L
  ACH 1075  Nevertheless, recent work on Mosley and the BUF has concurred      L   M   M   M   L
            about their basic unimportance.

In sentence AAU 200, shatters is a metaphoric usage because silence is not a thing that can be broken into pieces. The SA and CP scores for the shatters-silence pair are high because this word combination is quite common, and hence these methods incorrectly treat it as a literal expression. The situation is similar for the stalk-company pair in ABG 2327. On the other hand, in AN8 1309, manipulate-life is a rare combination and hence has low SA and CP scores and is deemed a metaphor, while in reality it is a literal use. A similar case occurs for the work-concur pair in ACH 1075. In all these cases, our knowledge bases Γm and ΓH are comprehensive and accurate enough to correctly separate metaphors from non-metaphors. On the contrary, the metaphor database ΓVH covers so many pairs that it treats every pair as a metaphor.

Besides our own dataset, we also experiment on the TroFi Example Base (available at http://www.cs.sfu.ca/~anoop/students/jbirke/), which consists of 50 verbs and 3,736 sentences containing these verbs. Each sentence is annotated as a literal or non-literal use of the verb. Our algorithm is used to classify the subjects and the objects of the verbs. We use the Stanford dependency parser to obtain collapsed typed dependencies for these sentences and, for each sentence, run our algorithm to classify the subjects and objects related to the verb, if the verb acts as a predicate. Our approach achieves 77.5% precision but just under 5% recall. The recall is low because (i) non-literal uses in the TroFi dataset include not only metaphor but also metonymy, irony, and other anomalies; (ii) our approach currently considers only subject-predicate and predicate-object dependencies in a sentence, but the target verbs do not act as predicates in many of the example sentences; and (iii) the Stanford dependency parser is not robust enough, so half of the sentences are not parsed correctly.

5.4 Metaphor Explanation

In this experiment, we use the classic labeled metaphoric sentences from Lakoff and Johnson (1980).
Lakoff and Johnson provide 24 metaphoric mappings, with about ten example sentences for each mapping; in total, there are 214 metaphoric sentences. Among them, we focus on the 83 sentences whose metaphor is expressed by a subject-predicate or predicate-object relation, as this paper focuses on verb-centric context preferences.

We evaluate the results of the competing algorithms by the following labeling criteria. We consider an output (i.e., a pair of mapped concepts) a match if the produced pair exactly matches the ground-truth pair, or if the pair is subsumed by the ground-truth pair. For example, the ground truth for the sentence Let that idea simmer on the back burner is ideas → foods according to Lakoff and Johnson (1980); if our algorithm outputs idea → stew, it is considered a match since stew belongs to the food category. An output pair is considered correct if it is not a match to the ground truth but is judged metaphoric by at least 2 of the 3 human judges.

Given a sentence, our algorithm returns a list of possible explanations for the missing concept, ranked by probability, so we evaluate the results by three different metrics: Match Top 1, where a result is considered correct if the top explanation is a match; Match Top 3, where a result is considered correct if there is a match among the top 3 ranked explanations; and Correct Top 3, where a result is considered correct if there is a correct explanation among the top 3.

Table 5: Precision of metaphor explanation using different metaphor databases.

        Match Top 1  Match Top 3  Correct Top 3
  ΓVH   26%          49%          54%
  Γm    43%          67%          78%

Comparison with Slip Net. We compare the result of our algorithm (from Section 4.3) against the variant that uses the ΓVH obtained in Section 5.3. Table 5 summarizes the precision of the two algorithms under the three metrics. Some of the sentences and the top explanations given by our algorithm are listed in Table 6; the explanations are ordered from left to right by score (in the original table, the concept to be explained is italicized, and matching or correct explanations are marked in bold or bold italics).

Table 6: Metaphor sentences explained by the system.

  Metaphor mapping      Sentence                                                   Explanation
  Ideas are food        Let that idea simmer on the back burner.                   stew; carrot; onion
                        We don't need to spoon-feed our students with knowledge.   egg roll; acorn; word
  Eyes are containers   His eyes displayed his compassion.                         window; symbol; tiny camera
                        His eyes were filled with anger.                           hollow ball; water balloon; balloon
  Emotional effect is   His mother's death hit him hard.                           enemy; monster
  physical contact      That idea bowled me over.                                  punch; stew; onion
  Life is a container   Her life is crammed with activities.                       tapestry; beach; dance
                        Get the most out of life.                                  game; journey; prison

Comparison with paraphrasing. While we define metaphor explanation as the task of recovering the missing noun-based concept in a source-target mapping, an alternative way to explain a metaphor (Shutova, 2010) is to find a paraphrase of the verb in the metaphor. Here we evaluate Shutova's paraphrasing task on the verbs in the metaphoric sentences. For a metaphoric verb V in a sentence, Shutova (2010) selects a set of verbs that probabilistically best match the grammatical relations of V, then filters out those verbs that are not related to V according to WordNet, and finally re-ranks the remaining verbs by selectional association.

In some sense, Shutova's work uses a framework similar to ours: first restrict the paraphrase candidate set using a knowledge base, then select the most appropriate word based on the context.
The difference is that the target of Shutova (2010) is the verb in the sentence, while our approach focuses on the noun.

To implement Shutova's algorithm, we extract and count each grammatical relation in 1 billion sentences. These counts are used to calculate the context matching of (Shutova, 2010) and also the selectional association. We run Shutova's paraphrasing on the verbs of the 83 sentences; only 25 of them find a good paraphrase among Shutova's top 3 results. After removing 17 sentences that contain light verbs (e.g., take, give, put), the algorithm finds 21 good paraphrases in the top 3 results. One reason for the low recall is that WordNet is inadequate for providing candidate metaphor mappings. This is also the reason why our metaphor base is better than the metaphor base generated from talking points.

6 Conclusion

Knowledge is essential for a machine to identify and understand metaphors. In this paper, we show how to use two probabilistic knowledge bases, automatically acquired from billions of web pages, for this purpose. This work currently recognizes and explains metaphoric mappings between nominal concepts with the help of selectional preferences over subject-predicate and predicate-object contexts only. An immediate next step is to extend this framework to more general contexts; a further improvement will be to identify mappings between any source and target domains.

7 Acknowledgements

Kenny Q. Zhu was partially supported by a Google Faculty Research Award and NSFC Grants 61100050, 61033002, and 61373031.

References

Rodrigo Agerri. 2008. Metaphor in textual entailment. In COLING (Posters), pages 3–6.

John Barnden, Sheila Glasbey, Mark Lee, and Alan Wallington. 2002. Reasoning in metaphor understanding: the ATT-Meta approach and system. In COLING '02, pages 1–5.

Eric P. S. Baumer, James P. White, and Bill Tomlinson. 2010. Comparing semantic role labeling with typed dependency parsing in computational metaphor identification. In CALC '10, pages 14–22.

Julia Birke and Anoop Sarkar. 2006. A clustering approach for nearly unsupervised recognition of nonliteral language. In Proceedings of EACL-06, pages 329–336.

Jeremy H. Clear. 1993. The British national corpus. In The Digital Word, pages 163–187.

Charles J. Fillmore, Christopher R. Johnson, and Miriam R. L. Petruck. 2003. Background to FrameNet. International Journal of Lexicography, 16.3:235–250.

James Geary. 2011. I is an Other: The Secret Life of Metaphor and How It Shapes the Way We See the World. Harper.

Matt Gedigian, John Bryant, Srini Narayanan, and Branimir Ciric. 2006. Catching metaphors. In Workshop on Scalable Natural Language Understanding.

Stanford NLP Group. 2013. The Stanford parser. http://nlp.stanford.edu/software/lex-parser.shtml.

Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING '92, pages 539–545.
Paul Kingsbury and Martha Palmer. 2002. From TreeBank to PropBank. In Language Resources and Evaluation.

Saisuresh Krishnakumaran and Xiaojin Zhu. 2007. Hunting elusive metaphors using lexical resources. In Proceedings of the Workshop on Computational Approaches to Figurative Language, pages 13–20, Rochester, New York, April. ACL.

George Lakoff and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press, Chicago, USA.

J. H. Martin. 1990. A Computational Model of Metaphor Interpretation. Academic Press Professional, Inc.

Zachary J. Mason. 2004. CorMet: a computational, corpus-based conventional metaphor extraction system. Computational Linguistics, 30:23–44, March.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38:39–41, November.

Srinivas Sankara Narayanan. 1997. Knowledge-based action representations for metaphor and aspect (KARMA). Technical report.

Philip Stuart Resnik. 1993. Selection and information: a class-based approach to lexical relationships. Ph.D. thesis.

Ekaterina Shutova, Lin Sun, and Anna Korhonen. 2010. Metaphor identification using verb and noun clustering. In COLING '10, pages 1002–1010.

Ekaterina Shutova. 2010. Automatic metaphor interpretation as a paraphrasing task. In HLT '10, pages 1029–1037.

Catherine Smith, Tim Rumbell, John Barnden, Bob Hendley, Mark Lee, and Alan Wallington. 2007. Don't worry about metaphor: affect extraction for conversational agents. In ACL '07, pages 37–40.

P. D. Turney. 2008. The latent relation mapping engine: Algorithm and experiments. Journal of Artificial Intelligence Research, 33(1):615–655.

Tony Veale and Yanfen Hao. 2008. A fluid knowledge representation for understanding and generating creative metaphors. In COLING, pages 945–952.

Yorick Wilks. 1978. Making preferences more active. Artificial Intelligence, 11(3):197–223.

Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Qili Zhu. 2012. Probase: a probabilistic taxonomy for text understanding. In SIGMOD Conference, pages 481–492.

Li Zhang. 2010. Metaphor interpretation and context-based affect detection. In COLING (Posters), pages 1480–1488.