Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary Jayant Krishnamurthy Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 jayantk@cs.cmu.edu Tom M. Mitchell Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 tom.mitchell@cmu.edu Abstract We present an approach to learning a model- theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as “Re- publican front-runner from Texas” whose se- mantics cannot be represented using the Free- base schema. Our approach directly converts a sentence’s syntactic CCG parse into a log- ical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribu- tion over denotations for each textual pred- icate. A training phase produces this prob- abilistic database using a corpus of entity- linked text and probabilistic matrix factoriza- tion with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also com- pare our approach against manually annotated Freebase queries, finding that our open pred- icate vocabulary enables us to answer many questions that Freebase cannot. 1 Introduction Traditional knowledge representation assumes that world knowledge can be encoded using a closed vocabulary of formal predicates. In recent years, semantic parsing has enabled us to build compo- sitional models of natural language semantics us- ing such a closed predicate vocabulary (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005). These semantic parsers map natural language state- ments to database queries, enabling applications such as answering questions using a large knowl- edge base (Yahya et al., 2012; Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013; Kwiatkowski et al., 2013; Berant et al., 2013; Berant and Liang, 2014; Reddy et al., 2014). Furthermore, the model- theoretic semantics provided by such parsers have the potential to improve performance on other tasks, such as information extraction and coreference res- olution. However, a closed predicate vocabulary has inher- ent limitations. First, its coverage will be limited, as such vocabularies are typically manually con- structed. Second, it may abstract away potentially relevant semantic differences. For example, the se- mantics of “Republican front-runner” cannot be ad- equately encoded in the Freebase schema because it lacks the concept of a “front-runner.” We could choose to encode this concept as “politician” at the cost of abstracting away the distinction between the two. As this example illustrates, these two problems are prevalent in even the largest knowledge bases. An alternative paradigm is an open predicate vocabulary, where each natural language word or phrase is given its own formal predicate. This paradigm is embodied in both open information ex- traction (Banko et al., 2007) and universal schema (Riedel et al., 2013). Open predicate vocabularies have the potential to capture subtle semantic distinc- tions and achieve high coverage. However, we have yet to develop compelling approaches to composi- tional semantics within this paradigm. This paper takes a step toward compositional se- 257 Transactions of the Association for Computational Linguistics, vol. 
3, pp. 257–270, 2015. Action Editor: Katrin Erk. Submission batch: 12/2014; Revision batch 3/2015; Published 5/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

mantics with an open predicate vocabulary. Our approach defines a distribution over denotations (sets of Freebase entities) given an input text. The model has two components, shown in Figure 1. The first component is a rule-based semantic parser that uses a syntactic CCG parser and manually-defined rules to map entity-linked texts to logical forms containing predicates derived from the words in the text. The second component is a probabilistic database with a possible worlds semantics that defines a distribution over denotations for each textually-derived predicate. This database assigns independent probabilities to individual predicate instances, such as P(FRONT-RUNNER(/EN/GEORGE BUSH)) = 0.9. Together, these components define an exponentially large distribution over denotations for an input text; to simplify this output, we compute the marginal probability, over all possible worlds, that each entity is an element of the text's denotation.

The learning problem in our approach is to train the probabilistic database to predict a denotation for each predicate. We pose this problem as probabilistic matrix factorization with a novel query/answer ranking objective. This factorization learns a low-dimensional embedding of each entity (entity pair) and category (relation) such that the denotation of a predicate is likely to contain entities or entity pairs with nearby vectors. To train the database, we first collect training data by analyzing entity-linked sentences in a large web corpus with the rule-based semantic parser. This process generates a collection of logical form queries with observed entity answers. The query/answer ranking objective, when optimized, trains the database to rank the observed answers for each query above unobserved answers.

We evaluate our approach on a question answering task, finding that our approach outperforms several baselines and that our new training objective improves performance over a previously-proposed objective.

Input text: "Republican front-runner from Texas"
Logical form: λx.∃y,z. FRONT-RUNNER(x) ∧ y = /EN/REPUBLICAN ∧ NN(y,x) ∧ z = /EN/TEXAS ∧ FROM(x,z)
Predicted denotation (entity, probability): /EN/GEORGE BUSH 0.57; /EN/RICK PERRY 0.45; ...
Figure 1: Overview of our approach. Top left: the text is converted to logical form by CCG syntactic parsing and a collection of manually-defined rules. Bottom: low-dimensional embeddings of each entity (entity pair) and category (relation) are learned from an entity-linked web corpus. These embeddings are used to construct a probabilistic database. The labels of these matrices are shortened for space reasons. Top right: evaluating the logical form on the probabilistic database computes the marginal probability that each entity is an element of the text's denotation.
We also evaluate the trade-offs between open and closed predicate vocabularies by compar- ing our approach to a manually-annotated Freebase query for each question. This comparison reveals that, when Freebase contains predicates that cover the question, it achieves higher precision and recall than our approach. However, our approach can cor- rectly answer many questions not covered by Free- base. 2 System Overview The purpose of our system is to predict a denota- tion γ for a given natural language text s. The de- notation γ is the set of Freebase entities that s refers to; for example, if s = “president of the US,” then γ = {/EN/OBAMA, /EN/BUSH, ...}.1 Our system 1This paper uses a simple model-theoretic semantics where the denotation of a noun phrase is a set of entities and the de- notation of a sentence is either true or false. However, for no- tational convenience, denotations γ will be treated as sets of entities throughout. 258 represents this prediction problem using the follow- ing probabilistic model: P(γ|s) = ∑ w ∑ ` P(γ|`,w)P(w)P(`|s) The first term in this factorization, P(`|s), is a dis- tribution over logical forms ` given the text s. This term corresponds to the rule-based semantic parser (Section 3). This semantic parser is deterministic, so this term assigns probability 1 to a single logical form for each text. The second term, P(w), repre- sents a distribution over possible worlds, where each world is an assignment of truth values to all possible predicate instances. The distribution over worlds is represented by a probabilistic database (Section 4). The final term, P(γ|`,w), deterministically evalu- ates the logical form ` on the world w to produce a denotation γ. This term represents query evaluation against a fixed database, as in other work on seman- tic parsing. Section 5 describes inference in our model. To produce a ranked list of entities (Figure 1, top right) from P(γ|s), our system computes the marginal probability that each entity is an element of the de- notation γ. This problem corresponds to query eval- uation in a probabilistic database, which is known to be tractable in many cases (Suciu et al., 2011). Section 6 describes training, which estimates pa- rameters for the probabilistic database P(w). This step first automatically generates training data using the rule-based semantic parser. This data is used to formulate a matrix factorization problem that is op- timized to estimate the database parameters. 3 Rule-Based Semantic Parser The first part of our compositional semantics system is a rule-based system that deterministically com- putes a logical form ` for a text s. This component is used during inference to analyze the logical struc- ture of text, and during training to generate training data (see Section 6.1). Several input/output pairs for this system are shown in Figure 2. The conversion to logical form has 3 phases: 1. CCG syntactic parsing parses the text and ap- plies several deterministic syntactic transfor- mations to facilitate semantic analysis. 2. Entity linking marks known Freebase entities in the text. 3. Semantic analysis assigns a logical form to each word, then composes them to produce a logical form for the complete text. 3.1 Syntactic Parsing The first step in our analysis is to syntactically parse the text. We use the ASP-SYN parser (Kr- ishnamurthy and Mitchell, 2014) trained on CCG- Bank (Hockenmaier and Steedman, 2002). 
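For concreteness, the following fragment sketches one way to represent the logical forms that this component ultimately produces (Section 3.3; see also Figure 2): existentially quantified conjunctions of unary and binary predicates whose arguments are variables or linked Freebase entities. The class and the predicate and entity names are illustrative only and do not reflect our actual implementation.

```python
# Illustrative sketch only (not our implementation): a logical form as a
# conjunction of predicate applications whose arguments are either variables
# or linked Freebase entities.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    predicate: str           # e.g., "front-runner", "from", "N/N"
    args: Tuple[str, ...]    # variables ("x", "y") or entity ids ("/en/texas")

# "Republican front-runner from Texas" (Figure 1), with the existential
# quantifiers left implicit:
#   lambda x . exists y, z . FRONT-RUNNER(x) & y = /EN/REPUBLICAN & NN(y, x)
#                            & z = /EN/TEXAS & FROM(x, z)
republican_front_runner = [
    Atom("front-runner", ("x",)),
    Atom("N/N", ("/en/republican", "x")),
    Atom("from", ("x", "/en/texas")),
]
```

Under this representation, the denotation of the text is the set of bindings for x that satisfy every atom.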
We then automatically transform the resulting syntactic parse to make the syntactic structure more amenable to semantic analysis. This step marks NPs in conjunctions by replacing their syntactic category with NP[conj]. This transformation allows seman- tic analysis to distinguish between appositives and comma-separated lists. It also transforms all verb ar- guments to core arguments, i.e., using the category PP/NP as opposed to ((S\NP)\(S\NP))/NP. This step simplifies the semantic analysis of verbs with prepositional phrase arguments. The final transformation adds a word feature to each PP cat- egory, e.g., mapping PP to PP[by]. These features are used to generate verb-preposition relation predi- cates, such as DIRECTED BY. 3.2 Entity Linking The second step is to identify mentions of Freebase entities in the text. This step could be performed by an off-the-shelf entity linking system (Ratinov et al., 2011; Milne and Witten, 2008) or string matching. However, our training and test data is derived from Clueweb 2009, so we rely on the entity linking for this corpus provided by Gabrilovich et. al (2013). Our system incorporates the provided entity links into the syntactic parse provided that they are con- sistent with the parse structure. Specifically, we re- quire that each mention is either (1) a constituent in the parse tree with syntactic category N or NP or (2) a collection of N/N or NP/NP modifiers with a single head word. The first case covers noun and noun phrase mentions, while the second case cov- ers noun compounds. In both cases, we substitute a single multi-word terminal into the parse tree span- ning the mention and invoke special semantic rules for mentions described in the next section. 259 Dan Hesse, CEO of Sprint λx.∃y.x = /EN/DAN HESSE ∧ CEO(x)∧ OF(x,y)∧y = /EN/SPRINT Yankees pitcher λx.∃y.PITCHER(x)∧ NN(y,x)∧y = /EN/YANKEES Tom Cruise plays Maverick in the movie Top Gun. ∃x,y,z.x = /EN/TOM CRUISE ∧ PLAYS(x,y) ∧ y = /EN/MAVERICK (CHARACTER) ∧ PLAYS IN(x,z)∧z = /EN/TOP GUN Figure 2: Example input/output pairs for our seman- tic analysis system. Mentions of Freebase entities in the text are indicated by underlines. 3.3 Semantic analysis The final step uses the syntactic parse and entity links to produce a logical form for the text. The sys- tem induces a logical form for every word in the text based on its syntactic CCG category. Composing these logical forms according to the syntactic parse produces a logical form for the entire text. Our semantic analyses are based on a relatively naı̈ve model-theoretic semantics. We focus on lan- guage whose semantics can be represented with existentially-quantified conjunctions of unary and binary predicates, ignoring, for example, tempo- ral scope and superlatives. Generally, our sys- tem models nouns and adjectives as unary predi- cates, and verbs and prepositions as binary predi- cates. Special multi-word predicates are generated for verb-preposition combinations. Entity mentions are mapped to the mentioned entity in the logical form. We also created special rules for analyzing conjunctions, appositives, and relativizing conjunc- tions. The complete list of rules used to produce these logical forms is available online.2 We made several notable choices in designing this component. First, multi-argument verbs are ana- lyzed using pairwise relations, as in the third exam- ple in Figure 2. 
This analysis allows us to avoid reasoning about entity triples (quadruples, etc.), which are challenging for the matrix factorization due to sparsity. Second, noun-preposition combinations are analyzed as a category and relation, as in the first example in Figure 2. We empirically found that combining the noun and preposition in such instances resulted in worse performance, as it dramatically increased the sparsity of training instances for the combined relations. Third, entity mentions with the N/N category are analyzed using a special noun-noun relation, as in the second example in Figure 2. Our intuition is that this relation shares instances with other relations (e.g., "city in Texas" implies "Texan city"). Finally, we lowercased each word to create its predicate name, but performed no lemmatization or other normalization.

2 http://rtw.ml.cmu.edu/tacl2015_csf

3.4 Discussion
The scope of our semantic analysis system is somewhat limited relative to other similar systems (Bos, 2008; Lewis and Steedman, 2013) as it only outputs existentially-quantified conjunctions of predicates. Our goal in building this system was to analyze noun phrases and simple sentences, for which this representation generally suffices. The reason for this focus is twofold. First, this subset of language is sufficient to capture much of the language surrounding Freebase entities. Second, for various technical reasons, this restricted semantic representation is easier to use (and more informative) for training the probabilistic database (see Section 6.3).

Note that this system can be straightforwardly extended to model additional linguistic phenomena, such as additional logical operators and generalized quantifiers, by writing additional rules. The semantics of logical forms including these operations are well-defined in our model, and the system does not even need to be re-trained to incorporate these additions.

4 Probabilistic Database
The second part of our compositional semantics system is a probabilistic database. This database represents a distribution over possible worlds, where each world is an assignment of truth values to every predicate instance. Equivalently, the probabilistic database can be viewed as a distribution over databases or knowledge bases.

Formally, a probabilistic database is a collection of random variables, each of which represents the truth value of a single predicate instance. Given entities e ∈ E, categories c ∈ C, and relations r ∈ R, the probabilistic database contains boolean random variables c(e) and r(e1,e2) for each category and relation instance, respectively. All of these random variables are assumed to be independent. Let a world w represent an assignment of truth values to all of these random variables, where c(e) = w_{c,e} and r(e1,e2) = w_{r,e1,e2}. By independence, the probability of a world can be written as:

P(w) = ∏_{e∈E} ∏_{c∈C} P(c(e) = w_{c,e}) × ∏_{e1∈E} ∏_{e2∈E} ∏_{r∈R} P(r(e1,e2) = w_{r,e1,e2})

The next section discusses how probabilistic matrix factorization is used to model the probabilities of these predicate instances.
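As a toy illustration of this definition (the probabilities below are invented for the example, not learned values), the following sketch enumerates the worlds of a three-instance database and checks that their probabilities sum to one:

```python
# Toy probabilistic database: three independent predicate instances, each with
# its own probability of being true. Probabilities are illustrative only.
import itertools

prob = {
    ("front-runner", ("/en/bush",)): 0.9,
    ("front-runner", ("/en/texas",)): 0.1,
    ("from", ("/en/bush", "/en/texas")): 0.8,
}

def world_probability(world):
    """P(w): product of independent instance probabilities.
    `world` maps each instance key to a truth value."""
    p = 1.0
    for key, p_true in prob.items():
        p *= p_true if world[key] else (1.0 - p_true)
    return p

# Enumerate all 2^3 worlds; by independence their probabilities sum to 1.
keys = list(prob)
total = sum(
    world_probability(dict(zip(keys, values)))
    for values in itertools.product([True, False], repeat=len(keys))
)
assert abs(total - 1.0) < 1e-9
```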
4.1 Matrix Factorization Model
The probabilistic matrix factorization model treats the truth of each predicate instance as an independent boolean random variable that is true with probability:

P(c(e) = TRUE) = σ(θ_c^T φ_e)
P(r(e1,e2) = TRUE) = σ(θ_r^T φ_{(e1,e2)})

Above, σ(x) = e^x / (1 + e^x) is the logistic function. In these equations, θ_c and θ_r represent k-dimensional vectors of per-predicate parameters, while φ_e and φ_{(e1,e2)} represent k-dimensional vector embeddings of each entity and entity pair. This model contains a low-dimensional embedding of each predicate and entity such that each predicate's denotation has a high probability of containing entities with nearby vectors. The probability that each variable is false is simply 1 minus the probability that it is true.

This model can be viewed as matrix factorization, as depicted in Figure 1. The category and relation instance probabilities can be arranged in a pair of matrices of dimension |E| × |C| and |E|^2 × |R|. Each row of these matrices represents an entity or entity pair, each column represents a category or relation, and each value is between 0 and 1 and represents a truth probability (Figure 1, bottom right). These two matrices are factored into matrices of size |E| × k and k × |C|, and |E|^2 × k and k × |R|, respectively, containing k-dimensional embeddings of each entity, category, entity pair and relation (Figure 1, bottom left). These low-dimensional embeddings are represented by the parameters φ and θ.

5 Inference: Computing Marginal Probabilities
Inference computes the marginal probability, over all possible worlds, that each entity is an element of a text's denotation. In many cases – depending on the text – these marginal probabilities can be computed exactly in polynomial time.

The inference problem is to calculate P(e ∈ γ | s) for each entity e. Because both the semantic parser P(ℓ|s) and query evaluation P(γ|ℓ,w) are deterministic, this problem can be rewritten as:

P(e ∈ γ | s) = Σ_γ 1(e ∈ γ) P(γ|s) = Σ_w 1(e ∈ ⟦ℓ⟧_w) P(w)

Above, ℓ represents the logical form for the text s produced by the rule-based semantic parser, and 1 represents the indicator function. The notation ⟦ℓ⟧_w represents the denotation produced by (deterministically) evaluating the logical form ℓ on world w. This inference problem corresponds to query evaluation in a probabilistic database, which is #P-hard in general. Intuitively, this problem can be difficult because P(γ|s) is a joint distribution over sets of entities that can be exponentially large in the number of entities.

However, a large subset of probabilistic database queries, known as safe queries, permit polynomial time evaluation (Dalvi and Suciu, 2007). Safe queries can be evaluated extensionally using a probabilistic notion of a denotation that treats each entity as independent. Let ⟦ℓ⟧_P denote a probabilistic denotation, which is a function from entities (or entity pairs) to probabilities, i.e., ⟦ℓ⟧_P(e) ∈ [0,1]. The denotation of a logical form is then computed recursively, in the same manner as a non-probabilistic denotation, using probabilistic extensions of the typical rules, such as:

⟦c⟧_P(e) = Σ_w P(w) 1(w_{c,e})
⟦r⟧_P(e1,e2) = Σ_w P(w) 1(w_{r,e1,e2})
⟦c1(x) ∧ c2(x)⟧_P(e) = ⟦c1⟧_P(e) × ⟦c2⟧_P(e)
⟦∃y.r(x,y)⟧_P(e) = 1 − ∏_{y∈E} (1 − ⟦r⟧_P(e,y))

The first two rules are base cases that simply retrieve predicate probabilities from the probabilistic database. The remaining rules compute the probabilistic denotation of a logical form from the denotations of its parts.3 The formula for the probabilistic computation on the right of each of these rules is a straightforward consequence of the (assumed) independence of entities. For example, the last rule computes the probability of an OR of a set of independent random variables (indexed by y) using the identity A ∨ B = ¬(¬A ∧ ¬B).
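The sketch below combines the two pieces just described: predicate probabilities computed from the factorization, and the extensional rules applied to the safe query λx. FRONT-RUNNER(x) ∧ ∃y. FROM(x,y). The embeddings are hand-picked toy values, not learned parameters, and the treatment of unobserved entity pairs is one of several possible choices.

```python
# Sketch only: toy embeddings feeding extensional evaluation of a safe query.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy parameters: theta for predicates, phi for entities and entity pairs.
theta = {"front-runner": np.array([1.5, 0.0]), "from": np.array([0.0, 2.0])}
phi_entity = {"/en/bush": np.array([1.0, 0.0]), "/en/perry": np.array([0.4, 0.0])}
phi_pair = {("/en/bush", "/en/texas"): np.array([0.0, 1.2]),
            ("/en/perry", "/en/texas"): np.array([0.0, 0.8])}

def p_category(c, e):
    return sigmoid(theta[c] @ phi_entity[e])

def p_relation(r, pair):
    return sigmoid(theta[r] @ phi_pair[pair])

def marginal(e, candidates):
    """P(e in denotation) for lambda x . front-runner(x) & exists y . from(x, y).
    Pairs without an embedding are treated here as having probability zero."""
    p_exists = 1.0 - np.prod([1.0 - p_relation("from", (e, y))
                              for y in candidates if (e, y) in phi_pair])
    return p_category("front-runner", e) * p_exists

for e in phi_entity:
    print(e, round(marginal(e, ["/en/texas"]), 3))
```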
For safe queries, ⟦ℓ⟧_P(e) = P(e ∈ γ | s); that is, the probabilistic denotation computed according to the above rules is equal to the marginal probability distribution. In practice, all of the queries in the experiments are safe, because they contain only one query variable and do not contain repeated predicates. For more information on query evaluation in probabilistic databases, we refer the reader to Suciu et al. (2011).

Note that inference does not compute the most probable denotation, max_γ P(γ|s). In some sense, the most probable denotation is the correct output for a model-theoretic semantics. However, it is highly sensitive to the probabilities in the database, and in many cases it is empty (because a conjunction of independent boolean random variables is unlikely to be true). Producing a ranked list of entities is also useful for evaluation purposes.

6 Training
The training problem in our approach is to learn parameters θ and φ for the probabilistic database. We consider two different objective functions for learning these parameters that use slightly different forms of training data. In both cases, training has two phases. First, we generate training data, in the form of observed assertions or query-answer pairs, by applying the rule-based semantic parser to a corpus of entity-linked web text. Second, we optimize the parameters of the probabilistic database to rank observed assertions or answers above unobserved assertions or answers.

3 This listing of rules is partial as it does not include, e.g., negation or joins between one-argument and two-argument logical forms. However, the corresponding rules are easy to derive.

6.1 Training Data
Training data is generated by applying the process illustrated in Figure 3 to each sentence in an entity-linked web corpus. First, we apply our rule-based semantic parser to the sentence to produce a logical form. Next, we extract portions of this logical form where every variable is bound to a particular Freebase entity, resulting in a simplified logical form. Because the logical forms are existentially-quantified conjunctions of predicates, this step simply discards any conjuncts in the logical form containing a variable that is not bound to a Freebase entity. From this simplified logical form, we generate two types of training data: (1) predicate instances, and (2) queries with known answers (see Figure 3). In both cases, the corpus consists entirely of assumed-to-be-true statements, making obtaining negative examples a major challenge for training.4

6.2 Predicate Ranking Objective
Riedel et al. (2013) introduced a ranking objective to work around the lack of negative examples in a similar matrix factorization problem. Their objective is a modified version of Bayesian Personalized Ranking (Rendle et al., 2009) that aims to rank observed predicate instances above unobserved instances.

This objective function uses observed predicate instances (Figure 3, bottom left) as training data. This data consists of two collections, {(c_i, e_i)}_{i=1}^{n} and {(r_j, t_j)}_{j=1}^{m}, of observed category and relation instances. We use t_j to denote a tuple of entities, t_j = (e_{j,1}, e_{j,2}), to simplify notation. The predicate ranking objective is:

O_P(θ,φ) = Σ_{i=1}^{n} log σ(θ_{c_i}^T (φ_{e_i} − φ_{e′_i})) + Σ_{j=1}^{m} log σ(θ_{r_j}^T (φ_{t_j} − φ_{t′_j}))

where e′_i is a randomly sampled entity such that (c_i, e′_i) does not occur in the training data. Similarly, t′_j is a random entity tuple such that (r_j, t′_j) does not occur.
Maximizing this function attempts to find θ_{c_i}, φ_{e_i} and φ_{e′_i} such that P(c_i(e_i)) is larger than P(c_i(e′_i)) (and similarly for relations). During training, e′_i and t′_j are resampled on each pass over the data set according to each entity or tuple's empirical frequency.

4 A seemingly simple solution to this problem is to randomly generate negative examples; however, we empirically found that this approach performs considerably worse than both of the proposed ranking objectives.

Original sentence: "General Powell, appearing Sunday on CNN's Late Edition, said ..."
Logical form: ∃w,x,y,z. w = /EN/POWELL ∧ GENERAL(w) ∧ APPEARING(w,x) ∧ SUNDAY(x) ∧ APPEARING ON(w,y) ∧ y = /EN/LATE ∧ 'S(z,y) ∧ z = /EN/CNN ∧ SAID(w,...)
Simplified logical form: ∃w,y,z. w = /EN/POWELL ∧ GENERAL(w) ∧ APPEARING ON(w,y) ∧ y = /EN/LATE ∧ 'S(z,y) ∧ z = /EN/CNN
Instances: GENERAL(/EN/POWELL); APPEARING ON(/EN/POWELL, /EN/LATE); 'S(/EN/CNN, /EN/LATE)
Queries and answers: λw.GENERAL(w) ∧ APPEARING ON(w, /EN/LATE) → /EN/POWELL; λy.APPEARING ON(/EN/POWELL, y) ∧ 'S(/EN/CNN, y) → /EN/LATE; λz.'S(z, /EN/LATE) → /EN/CNN
Figure 3: Illustration of training data generation applied to a single sentence. We generate two types of training data, predicate instances and queries with observed answers, by semantically parsing the sentence and extracting portions of the generated logical form with observed entity arguments. The predicate instances are extracted from the conjuncts in the simplified logical form, and the queries are created by removing a single entity from the simplified logical form.

6.3 Query Ranking Objective
The previous objective aims to rank the entities within each predicate well. However, such within-predicate rankings are insufficient to produce correct answers for queries containing multiple predicates – the scores for each predicate must further be calibrated to work well with each other given the independence assumptions of the probabilistic database. We introduce a new training objective that encourages good rankings for entire queries instead of single predicates. The data for this objective consists of tuples, {(ℓ_i, e_i)}_{i=1}^{n}, of a query ℓ_i with an observed answer e_i (Figure 3, bottom right). Each ℓ_i is a function with exactly one entity argument, and ℓ_i(e) is a conjunction of predicate instances. For example, the last query in Figure 3 is a function of one argument z, and ℓ(e) is a single predicate instance, 'S(e, /EN/LATE). The new objective aims to rank the observed entity answer above unobserved entities for each query:

O_Q(θ,φ) = Σ_{i=1}^{n} log P_rank(ℓ_i, e_i, e′_i)

P_rank generalizes the approximate ranking probability defined by the predicate ranking objective to more general queries. The expression σ(θ_c^T (φ_e − φ_e′)) in the predicate ranking objective can be viewed as an approximation of the probability that e is ranked above e′ in category c. P_rank uses this approximation for each individual predicate in the query. For example, given the query ℓ = λx.c(x) ∧ r(x,y) and entities (e, e′), P_rank(ℓ, e, e′) = σ(θ_c^T (φ_e − φ_e′)) × σ(θ_r^T (φ_{(e,y)} − φ_{(e′,y)})). For this objective, we sample e′_i such that (ℓ_i, e′_i) does not occur in the training data.
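The fragment below sketches the two per-example quantities involved: the log-sigmoid term of O_P for a single predicate, and P_rank for a conjunctive query, which is simply the product of such sigmoid terms over the query's predicates. It is toy code: the arguments are plain NumPy vectors, and the sampling of negatives, AdaGrad optimization, and regularization used in our experiments (Section 7.3) are omitted.

```python
# Sketch only: per-example quantities for the two ranking objectives.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predicate_rank_logprob(theta, phi_pos, phi_neg):
    """One term of O_P: log sigma(theta^T (phi_e - phi_e'))."""
    return np.log(sigmoid(theta @ (phi_pos - phi_neg)))

def p_rank(category_terms, relation_terms):
    """P_rank for a conjunctive query such as lambda x . c(x) & r(x, y).
    Each element of the argument lists is (theta, phi_observed, phi_sampled);
    the result is the product of the per-predicate ranking approximations."""
    p = 1.0
    for theta, phi_pos, phi_neg in category_terms + relation_terms:
        p *= sigmoid(theta @ (phi_pos - phi_neg))
    return p
```

Because log P_rank is then a sum of log-sigmoid terms, its gradient decomposes over the query's predicates, which is exactly what the simplification below exploits.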
When ℓ's body consists of a conjunction of predicates, the query ranking objective simplifies considerably. In this case, ℓ can be described as three sets of one-argument functions: categories C(ℓ) = {λx.c(x)}, left arguments of relations R_L(ℓ) = {λx.r(x,y)}, and right arguments of relations R_R(ℓ) = {λx.r(y,x)}. Furthermore, P_rank is a product, so we can distribute the log:

O_Q(θ,φ) = Σ_{i=1}^{n} [ Σ_{λx.c(x) ∈ C(ℓ_i)} log σ(θ_c (φ_{e_i} − φ_{e′_i})) + Σ_{λx.r(x,y) ∈ R_L(ℓ_i)} log σ(θ_r (φ_{(e_i,y)} − φ_{(e′_i,y)})) + Σ_{λx.r(y,x) ∈ R_R(ℓ_i)} log σ(θ_r (φ_{(y,e_i)} − φ_{(y,e′_i)})) ]

This simplification reveals that the main difference between O_Q and O_P is the sampling of the unobserved entities e′ and tuples t′. O_P samples them in an unconstrained fashion from their empirical distributions for every predicate. O_Q considers the larger context in which each predicate occurs, with two major effects. First, more negative examples are generated for categories because the logical forms ℓ are more specific. For example, both “president of Sprint” and “president of the US” generate instances of the PRESIDENT predicate; O_Q will use entities that only occur with one of these as negative examples for the other. Second, the relation parameters are trained to rank tuples with a shared argument, as opposed to tuples in general.

Note that, although P_rank generalizes to more complex logical forms than existentially-quantified conjunctions, training with these logical forms is more difficult because P_rank is no longer a product. In these cases, it becomes necessary to perform inference within the gradient computation, which can be expensive. The restriction to conjunctions makes inference trivial, enabling the factorization above.

7 Evaluation
We evaluate our approach to compositional semantics on a question answering task. Each test example is a (compositional) natural language question whose answer is a set of Freebase entities. We compare our open domain approach to several baselines based on prior work, as well as a human-annotated Freebase query for each example.

7.1 Data
We used the Clueweb09 web corpus5 with the corresponding Google FACC entity linking (Gabrilovich et al., 2013) to create the training and test data for our experiments. The training data is derived from 3 million webpages, and contains 2.1m predicate instances, 1.1m queries, 172k entities and 181k entity pairs. Predicates that appeared fewer than 6 times in the training data were replaced with the predicate UNK, resulting in 25k categories and 2.2k relations.

Our test data consists of fill-in-the-blank natural language questions such as “Incan emperor ___” or “Cunningham directed Auchtre's second music video ___.” These questions were created by applying the training data generation process (Section 6.1) to a collection of held-out webpages. Each natural language question has a corresponding logical form query containing at least one category and relation.

5 http://www.lemurproject.org/clueweb09.php

# of questions: 220
Avg. # of predicates / query: 2.77
Avg. # of categories / query: 1.75
Avg. # of relations / query: 1.02
Avg. # of answers / query: 1.92
# of questions with ≥ 1 answer (found by at least one system): 116
Table 1: Statistics of the test data set.

We chose not to use existing data sets for semantic parsing into Freebase as our goal is to model the semantics of language that cannot necessarily be modelled using the Freebase schema. Existing data sets, such as Free917 (Cai and Yates, 2013) and WebQuestions (Berant et al., 2013), would not allow us to evaluate performance on this subset of language. Consequently, we evaluate our system on a new data set with unconstrained language. However, we do compare our approach against manually-annotated Freebase queries on our new data set (Section 7.5).
All of the data for our experiments is available at http://rtw.ml.cmu.edu/tacl2015_csf.

7.2 Methodology
Our evaluation methodology is inspired by information retrieval evaluations (Manning et al., 2008). Each system predicts a ranked list of 100 answers for each test question. We then pool the top 30 answers of each system and manually judge their correctness. The correct answers from the pool are then used to evaluate the precision and recall of each system. In particular, we compute average precision (AP) for each question and report the mean average precision (MAP) across all questions. We also report a weighted version of MAP, where each question's AP is weighted by its number of annotated correct answers. Average precision is computed as (1/m) Σ_{k=1}^{m} Prec(k) × Correct(k), where Prec(k) is the precision at rank k, Correct(k) is an indicator function for whether the kth answer is correct, and m is the number of returned answers (at most 100).

Statistics of the annotated test set are shown in Table 1. A consequence of our unconstrained data generation approach is that some test questions are difficult to answer: of the 220 queries, at least one system was able to produce a correct answer for 116. The remaining questions are mostly unanswerable because they reference rare entities unseen in the training data.

                      MAP     Weighted MAP
CLUSTERING            0.224   0.266
CORPUSLOOKUP          0.246   0.296
FACTORIZATION (O_P)   0.299   0.473
FACTORIZATION (O_Q)   0.309   0.492
ENSEMBLE (O_P)        0.391   0.614
ENSEMBLE (O_Q)        0.406   0.645
Upper bound           0.527   1.0
Table 2: Mean average precision for our question answering task. The difference in MAP between each pair of adjacent models is statistically significant (p < .05) via the sign test.

7.3 Models and Baselines
We implemented two baseline models based on existing techniques. The CORPUSLOOKUP baseline answers test questions by directly using the predicate instances in the training data as its knowledge base. For example, given the query λx.CEO(x) ∧ OF(x, /EN/SPRINT), this model will return the set of entities e such that CEO(e) and OF(e, /EN/SPRINT) both appear in the training data. All answers found in this fashion are assigned probability 1.

The CLUSTERING baseline first clusters the predicates in the training corpus, then answers questions using the clustered predicates. The clustering aggregates predicates with similar denotations, ideally identifying synonyms to smooth over sparsity in the training data. Our approach is closely based on Lewis and Steedman (2013), though is also conceptually related to approaches such as DIRT (Lin and Pantel, 2001) and USP (Poon and Domingos, 2009). We use the Chinese Whispers clustering algorithm (Biemann, 2006) and calculate the similarity between predicates as the cosine similarity of their TF-IDF weighted entity count vectors. The denotation of each cluster is the union of the denotations of the clustered predicates, and each entity in the denotation is assigned probability 1.

We also trained two probabilistic database models, FACTORIZATION (O_P) and FACTORIZATION (O_Q), using the two objective functions described in Sections 6.2 and 6.3, respectively. We optimized both objectives by performing 100 passes over the training data with AdaGrad (Duchi et al., 2011) using an L2 regularization parameter of λ = 10^-4. The predicate and entity embeddings have 300 dimensions. These parameters were selected on the basis of preliminary experiments with a small validation set.

Finally, we observed that CORPUSLOOKUP has high precision but low recall, while both matrix factorization models have high recall with somewhat lower precision. This observation suggested that an ensemble of CORPUSLOOKUP and FACTORIZATION could outperform either model individually. We created two ensembles, ENSEMBLE (O_P) and ENSEMBLE (O_Q), by calculating the probability of each predicate as a 50/50 mixture of each model's predicted probability.

Figure 4: Averaged 11-point precision/recall curves for the 116 answerable test questions (precision against recall; curves shown for ENSEMBLE (O_Q), ENSEMBLE (O_P), FACTORIZATION (O_Q), FACTORIZATION (O_P), CORPUSLOOKUP, and CLUSTERING).

7.4 Results
Table 2 shows the results of our MAP evaluation, and Figure 4 shows a precision/recall curve for each model. The MAP numbers are somewhat low because almost half of the test questions have no correct answers and all models get an average precision of 0 on these questions. The upper bound on MAP is the fraction of questions with at least 1 correct answer. Note that the models perform well on the answerable questions, as reflected by the ratio of the achieved MAP to the upper bound. The weighted MAP metric also corrects for these unanswerable questions, as they are assigned 0 weight in the weighted average.

These results demonstrate several findings. First, we find that both FACTORIZATION models outperform the baselines in both MAP and weighted MAP. The performance improvement seems to be most significant in the high recall regime (right side of Figure 4). Second, we find that the query ranking objective O_Q improves performance over the predicate ranking objective O_P by 2-4% on the answerable queries. The precision/recall curves show that this improvement is concentrated in the low recall regime. Finally, the ensemble models are considerably better than their component models; however, even in the ensembled models, we find that O_Q outperforms O_P by a few percent.

# of questions w/ an annotated MQL query: 142
  query returns ≥ 1 answer: 95
  query returns no answers: 47
# of questions w/o an MQL query: 78
Table 3: Statistics of the Freebase MQL queries annotated for the test data set.

7.5 Comparison to Semantic Parsing to Freebase
A natural question is whether our open vocabulary approach outperforms a closed approach for the same problem, such as semantic parsing to Freebase (e.g., Reddy et al. (2014)). In order to answer this question, we compared our best performing model to a manually-annotated Freebase query for each test question. This comparison allows us to understand the relative advantages of open and closed predicate vocabularies.

The first author manually annotated a Freebase MQL query for each natural language question in the test data set. This annotation is somewhat subjective, as many of the questions can only be inexactly mapped on to the Freebase schema.
We used the following guidelines in performing the mapping: (1) all relations in the text must be mapped to one or more Freebase relations, (2) all entities mentioned in the text must be included in the query, (3) adjective modifiers can be ignored, and (4) entities not mentioned in the text may be included in the query. The fourth condition is necessary because many one-place predicates, such as MAYOR(x), are represented in Freebase using a binary relation to a particular entity, such as GOVERNMENT OFFICE/TITLE(x, /EN/MAYOR).

Statistics of the annotated queries are shown in Table 3. Coverage is reasonably high: we were able to annotate a Freebase query for 142 questions (65% of the test set). The remaining unannotatable questions are due to missing predicates in Freebase, such as a relation defining the emperor of the Incan empire. Of the 142 annotated Freebase queries, 95 of them return at least one entity answer. The queries with no answers typically reference uncommon entities which have few or no known relation instances in Freebase. The annotated queries contain an average of 2.62 Freebase predicates.

We compared our best performing model, ENSEMBLE (O_Q), to the manually annotated Freebase queries using the same pooled evaluation methodology. The set of correct answers contains the correct predictions of ENSEMBLE (O_Q) from the previous evaluation along with all answers from Freebase. Results from this evaluation are shown in Table 4.6

                 MAP
ENSEMBLE (O_Q)   0.263
Freebase         0.385
Table 4: Mean average precision of our best performing model compared to a manually annotated Freebase query for each test question.

6 The numbers in this table are not comparable to the numbers in Table 2 as the correct answers for each question are different.

In terms of overall MAP, Freebase outperforms our approach by a fair margin. However, this initial impression belies a more complex reality, which is shown in Table 5. This table compares both approaches by their relative performance on each test question. On approximately one-third of the questions, Freebase has a higher AP than our approach. On another third, our approach has a higher AP than Freebase. On the final third, both approaches perform equally well – these are typically questions where neither approach returns any correct answers (67 of the 75). Freebase outperforms in the overall MAP evaluation because it tends to return more correct answers to each question.

                               # of queries
Freebase higher AP             75 (34%)
equal AP                       75 (34%)
ENSEMBLE (O_Q) higher AP       70 (31%)
Table 5: Question-by-question comparison of model performance. Each test question is placed into one of the three buckets above, depending on whether Freebase or ENSEMBLE (O_Q) achieves a better average precision (AP) for the question.

Note that the annotated Freebase queries have several advantages in this evaluation. First, Freebase contains significantly more predicate instances than our training data, which allows it to produce more complete answers. Second, the Freebase queries correspond to the performance of a perfect semantic parser, while current semantic parsers achieve accuracies around 68% (Berant and Liang, 2014).

The results from this experiment suggest that closed and open predicate vocabularies are complementary. Freebase produces high quality answers when it covers a question. However, many of the remaining questions can be answered correctly using an open vocabulary approach like ours.
This evalu- ation also suggests that recall is a limiting factor of our approach; in the future, recall can be improved by using a larger corpus or including Freebase in- stances during training. 8 Related Work Open Predicate Vocabularies There has been considerable work on generating semantic representations with an open predicate vo- cabulary. Much of the work is non-compositional, focusing on identifying similar predicates and enti- ties. DIRT (Lin and Pantel, 2001), Resolver (Yates and Etzioni, 2007) and other systems (Yao et al., 2012) cluster synonymous expressions in a corpus of relation triples. Matrix factorization is an alter- native approach to clustering that has been used for relation extraction (Riedel et al., 2013; Yao et al., 2013) and finding analogies (Turney, 2008; Speer et al., 2008). All of this work is closely related to dis- tributional semantics, which uses distributional in- formation to identify semantically similar words and phrases (Turney and Pantel, 2010; Griffiths et al., 2007). Some work has considered the problem of com- positional semantics with an open predicate vocab- ulary. Unsupervised semantic parsing (Poon and Domingos, 2009; Titov and Klementiev, 2011) is a clustering-based approach that incorporates com- position using a generative model for each sentence that factors according to its parse tree. Lewis and Steedman (2013) also present a clustering-based ap- proach that uses CCG to perform semantic compo- sition. This approach is similar to ours, except that we use matrix factorization and Freebase entities. Finally, some work has focused on the problem of textual inference within this paradigm. Fader et al. (2013) present a question answering system that learns to paraphrase a question so that it can be an- swered using a corpus of Open IE triples (Fader et al., 2011). Distributional similarity has also been used to learn weighted logical inference rules that can be used for recognizing textual entailment or identifying semantically similar text (Garrette et al., 2011; Garrette et al., 2013; Beltagy et al., 2013). This line of work focuses on performing inference between texts, whereas our work computes a text’s denotation. A significant difference between our work and most of the related work above is that our work computes denotations containing Freebase entities. Using these entities has two advantages: (1) it en- ables us to use entity linking to disambiguate textual mentions, and (2) it facilitates a comparison against alternative approaches that rely on a closed predi- cate vocabulary. Disambiguating textual mentions is a major challenge for previous approaches, so an entity-linked corpus is a much cleaner source of data. However, our approach could also work with automatically constructed entities, for example, cre- ated by clustering mentions in an unsupervised fash- ion (Singh et al., 2011). Semantic Parsing Several semantic parsers have been developed for Freebase (Cai and Yates, 2013; Kwiatkowski et al., 2013; Berant et al., 2013; Berant and Liang, 2014). Our approach is most similar to that of Reddy et al. (2014), which uses fixed syntactic parses of unla- beled text to train a Freebase semantic parser. Like our approach, this system automatically-generates query/answer pairs for training. However, this sys- tem, like all Freebase semantic parsers, uses a closed predicate vocabulary consisting of only Freebase predicates. 
In contrast, our approach uses an open predicate vocabulary and can learn denotations for words whose semantics cannot be represented using 267 Freebase predicates. Consequently, our approach can answer many questions that these Freebase se- mantic parsers cannot (see Section 7.5). The rule-based semantic parser used in this pa- per is very similar to several other rule-based sys- tems that produce logical forms from syntactic CCG parses (Bos, 2008; Lewis and Steedman, 2013). We developed our own system in order to have control over the particulars of the analysis; however, our ap- proach is compatible with these systems as well. Probabilistic Databases Our system assigns a model-theoretic semantics to statements in natural language (Dowty et al., 1981) using a learned distribution over possible worlds. This distribution is concisely represented in a probabilistic database, which can be viewed as a simple Markov Logic Network (Richardson and Domingos, 2006) where all of the random vari- ables are independent. This independence simplifies query evaluation: probabilistic databases permit ef- ficient exact inference for safe queries (Suciu et al., 2011), and approximate inference for the remain- der (Gatterbauer et al., 2010; Gatterbauer and Suciu, 2015). 9 Discussion This paper presents an approach for compositional semantics with an open predicate vocabulary. Our approach defines a probabilistic model over deno- tations (sets of Freebase entities) conditioned on an input text. The model has two components: a rule- based semantic parser that produces a logical form for the text, and a probabilistic database that defines a distribution over denotations for each predicate. A training phase learns the probabilistic database by applying probabilistic matrix factorization with a query/answer ranking objective to logical forms de- rived from a large, entity-linked web corpus. An ex- perimental analysis demonstrates that this approach outperforms several baselines and can answer many questions that cannot be answered by semantic pars- ing into Freebase. Our approach learns a model-theoretic semantics for natural language text tied to Freebase, as do some semantic parsers, except with an open predi- cate vocabulary. This difference influences several other aspects of the system’s design. First, because no knowledge base with the necessary knowledge exists, the system is forced to learn its knowledge base (in the form of a probabilistic database). Sec- ond, the system can directly map syntactic CCG parses to logical forms, as it is no longer neces- sary to map words to a closed vocabulary of knowl- edge base predicates. In some sense, our approach is the exact opposite of the typical semantic pars- ing approach: usually, the semantic parser is learned and the knowledge base is fixed; here, the knowl- edge base is learned and the semantic parser is fixed. From a machine learning perspective, train- ing a probabilistic database via matrix factorization is easier than training a semantic parser, as there are no difficult inference problems. However, it remains to be seen whether a learned knowledge base can achieve similar recall as a fixed knowledge base on the subset of language it covers. There are two limitations of this work. The most obvious limitation is the restriction to existentially quantified conjunctions of predicates. This limita- tion is not inherent to the approach, however, and can be removed in future work by using a system like Boxer (Bos, 2008) for semantic parsing. 
A more serious limitation is the restriction to one- and two-argument predicates, which prevents our system from representing events and n-ary relations. Con- ceptually, a similar matrix factorization approach could be used to learn embeddings for n-ary entity tuples; however, in practice, the sparsity of these tu- ples makes learning challenging. Developing meth- ods for learning n-ary relations is an important prob- lem for future work. A direction for future work is scaling up the size of the training corpus to improve recall. Low re- call is the main limitation of our current system as demonstrated by the experimental analysis. Both stages of training, the data generation and matrix factorization, can be parallelized using a cluster. All of the relation instances in Freebase can also be added to the training corpus. It should be feasible to increase the quantity of training data by a factor of 10-100, for example, to train on all of ClueWeb. Scaling up the training data may allow a semantic parser with an open predicate vocabulary to outper- form comparable closed vocabulary systems. 268 Acknowledgments This research was supported in part by DARPA un- der contract number FA8750-13-2-0005, and by a generous grant from Google. We additionally thank Matt Gardner, Ndapa Nakashole, Amos Azaria and the anonymous reviewers for their helpful com- ments. References Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. Open infor- mation extraction from the web. In Proceedings of the 20th International Joint Conference on Artifical Intel- ligence. Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Gar- rette, Katrin Erk, and Raymond Mooney. 2013. Mon- tague meets markov: Deep semantics with probabilis- tic logical form. In Second Joint Conference on Lexi- cal and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Jonathan Berant and Percy Liang. 2014. Semantic pars- ing via paraphrasing. In Proceedings of the 52nd An- nual Meeting of the Association for Computational Linguistics. Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Lan- guage Processing. Chris Biemann. 2006. Chinese whispers: An efficient graph clustering algorithm and its application to natu- ral language processing problems. In Proceedings of the First Workshop on Graph Based Methods for Nat- ural Language Processing. Johan Bos. 2008. Wide-coverage semantic analysis with boxer. In Proceedings of the 2008 Conference on Se- mantics in Text Processing. Qingqing Cai and Alexander Yates. 2013. Large-scale Semantic Parsing via Schema Matching and Lexicon Extension. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Nilesh Dalvi and Dan Suciu. 2007. Efficient query eval- uation on probabilistic databases. The VLDB Journal, 16(4), October. David R. Dowty, Robert E. Wall, and Stanley Peters. 1981. Introduction to Montague Semantics. John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, July. Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information ex- traction. In Proceedings of the Conference on Empiri- cal Methods in Natural Language Processing. 
Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-driven learning for open question answering. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Evgeniy Gabrilovich, Michael Ringgaard, and Amar- nag Subramanya. 2013. FACC1: Freebase anno- tation of ClueWeb corpora, Version 1 (Release date 2013-06-26, Format version 1, Correction level 0). http://lemurproject.org/clueweb09/. Dan Garrette, Katrin Erk, and Raymond Mooney. 2011. Integrating logical representations with probabilistic information using markov logic. In Proceedings of the International Conference on Computational Seman- tics. Dan Garrette, Katrin Erk, and Raymond J. Mooney. 2013. A formal approach to linking logical form and vector-space lexical semantics. In Harry Bunt, Johan Bos, and Stephen Pulman, editors, Computing Mean- ing, volume 4, pages 27–48. Wolfgang Gatterbauer and Dan Suciu. 2015. Approx- imate lifted inference with probabilistic databases. Proceedings of the VLDB Endowment, 8(5), January. Wolfgang Gatterbauer, Abhay Kumar Jha, and Dan Su- ciu. 2010. Dissociation and propagation for efficient query evaluation over probabilistic databases. In Pro- ceedings of the Fourth International VLDB workshop on Management of Uncertain Data (MUD 2010) in conjunction with VLDB 2010, Singapore, September 13, 2010. Thomas L. Griffiths, Joshua B. Tenenbaum, and Mark Steyvers. 2007. Topics in semantic representation. Psychological Review 114. Julia Hockenmaier and Mark Steedman. 2002. Acquir- ing compact lexicalized grammars from a cleaner tree- bank. In Proceedings of Third International Confer- ence on Language Resources and Evaluation. Jayant Krishnamurthy and Tom M. Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empir- ical Methods in Natural Language Processing and Computational Natural Language Learning. Jayant Krishnamurthy and Tom M. Mitchell. 2014. Joint syntactic and semantic parsing with combinatory cat- egorial grammar. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguis- tics. Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 269 2013 Conference on Empirical Methods in Natural Language Processing. Mike Lewis and Mark Steedman. 2013. Combined distributional and logical semantics. Transactions of the Association for Computational Linguistics, 1:179– 192. Dekang Lin and Patrick Pantel. 2001. DIRT — discov- ery of inference rules from text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Christopher D. Manning, Prabhakar Raghavan, and Hin- rich Schütze. 2008. Introduction to Information Re- trieval. Cambridge University Press, New York, NY, USA. David Milne and Ian H. Witten. 2008. Learning to link with wikipedia. In Proceedings of the 17th ACM Con- ference on Information and Knowledge Management. Hoifung Poon and Pedro Domingos. 2009. Unsuper- vised semantic parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Lan- guage Processing. Lev Ratinov, Dan Roth, Doug Downey, and Mike An- derson. 2011. Local and global algorithms for dis- ambiguation to wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Siva Reddy, Mirella Lapata, and Mark Steedman. 2014. Large-scale semantic parsing without question-answer pairs. 
Transactions of the Association of Computa- tional Linguistics – Volume 2, Issue 1. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian per- sonalized ranking from implicit feedback. In Proceed- ings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning, 62(1- 2):107–136, February. Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Asso- ciation for Computational Linguistics. Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. 2011. Large-scale cross- document coreference using distributed inference and hierarchical models. In Association for Computa- tional Linguistics: Human Language Technologies (ACL HLT). Robert Speer, Catherine Havasi, and Henry Lieberman. 2008. AnalogySpace: Reducing the dimensionality of common sense knowledge. In AAAI. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. 2011. Probabilistic databases. Synthesis Lec- tures on Data Management, 3(2):1–180. Ivan Titov and Alexandre Klementiev. 2011. A bayesian model for unsupervised semantic parsing. In Proceed- ings of the 49th Annual Meeting of the Association for Computational Linguistics. Peter D. Turney and Patrick Pantel. 2010. From fre- quency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), Jan- uary. Peter D. Turney. 2008. The latent relation mapping en- gine: Algorithm and experiments. Journal of Artificial Intelligence Research, 33(1):615–655, December. Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum. 2012. Natural language questions for the web of data. In Proceedings of the 2012 Joint Conference on Em- pirical Methods in Natural Language Processing and Computational Natural Language Learning. Limin Yao, Sebastian Riedel, and Andrew McCallum. 2012. Unsupervised relation discovery with sense dis- ambiguation. In Proceedings of the 50th Annual Meet- ing of the Association for Computational Linguistics: Long Papers - Volume 1. Limin Yao, Sebastian Riedel, and Andrew McCallum. 2013. Universal schema for entity type prediction. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction. Alexander Yates and Oren Etzioni. 2007. Unsupervised resolution of objects and relations on the web. In Pro- ceedings of the 2007 Annual Conference of the North American Chapter of the Association for Computa- tional Linguistics. John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic pro- gramming. In Proceedings of the thirteenth national conference on Artificial Intelligence. Luke S. Zettlemoyer and Michael Collins. 2005. Learn- ing to map sentences to logical form: structured clas- sification with probabilistic categorial grammars. In UAI ’05, Proceedings of the 21st Conference in Un- certainty in Artificial Intelligence. 270