title: Entity Summarization with User Feedback
authors: Liu, Qingxia; Chen, Yue; Cheng, Gong; Kharlamov, Evgeny; Li, Junyou; Qu, Yuzhong
date: 2020-05-07
journal: The Semantic Web
DOI: 10.1007/978-3-030-49461-2_22

Semantic Web applications have benefited from entity summarization techniques, which compute a compact summary for an entity by selecting a set of key triples from the underlying data. A wide variety of entity summarizers have been developed. However, the quality of the summaries they generate is still not satisfactory, and we lack mechanisms for improving computed summaries. To address this challenge, in this paper we present the first study of entity summarization with user feedback. We consider a cooperative environment where a user reads the current entity summary and provides feedback to help an entity summarizer compute an improved summary. Our approach represents this iterative process as a Markov decision process in which the entity summarizer is modeled as a reinforcement learning agent. To exploit user feedback, we represent the interdependence of the triples in the current summary and the user feedback by a novel deep neural network, which is incorporated into the policy of the agent. Our approach outperforms five baseline methods in extensive experiments with both real users and simulated users.

Entity summarization is the task of computing an optimal compact summary for an entity by selecting a size-constrained subset of triples [13]. It has found application in many domains. For example, in Google's Knowledge Graph, an entity may be described by dozens or hundreds of triples. Showing all of them in an entity card would overload users. Google therefore performs entity summarization by selecting the key triples that users are likely to need for that particular entity.

An entity summarizer is a tool that computes entity summaries. A wide variety of entity summarizers have been developed [13]. They generate summaries for general purposes [4, 7, 21, 22] or for specific applications such as Web browsing [9, 23], Web search [10, 28], and crowdsourcing [5, 6]. However, entity summarization is a difficult task. Recent evaluation results [14] show that summaries generated by existing entity summarizers still differ significantly from ground-truth summaries created by human experts (F1 < 0.6). Moreover, current entity summaries are static: there is a lack of mechanisms for improving an entity summary when its quality does not satisfy users' information needs.

Research Challenges. One promising direction for improving entity summarization is to exploit user feedback. This idea has been practiced in related research such as document summarization [1, 27] and document retrieval [15, 24]. One can establish a cooperative environment where a user reads the current entity summary and conveniently provides feedback to help an entity summarizer compute an improved summary, which in turn motivates the user to keep providing feedback. To effectively incorporate user feedback into entity summarization, there are two research challenges. First, we need to represent the cooperative process using a formal model. Second, to exploit user feedback, we need to represent the interdependence of the current summary and the user feedback. This is non-trivial because triples have both textual semantics and structural features.
We address these challenges and study entity summarization with user feedback in the following cross-replace scenario; our approach can easily be extended to support other scenarios. As illustrated in Fig. 1, a user reads a computed summary S_i for the entity Solson Publications and provides negative feedback by crossing off an irrelevant triple f_i. An entity summarizer analyzes the connection between S_i and f_i, and then replaces f_i with a more relevant triple r_i to form an improved summary S_{i+1}. The process can be repeated.

To represent this cooperative process, we model an entity summarizer as a reinforcement learning agent, and model the iterative process as a Markov decision process. Further, we represent the interdependence of the triples in the current summary and the user feedback by a novel deep neural network, which is incorporated into the policy of the agent. Our approach is referred to as DRESSED, short for Deep Reinforced Entity Summarization with uSer fEeDback. We carry out a user study to demonstrate the effectiveness of DRESSED. We also conduct extensive offline evaluation based on two benchmarks for evaluating entity summarization and a standard framework for simulating user behavior. DRESSED outperforms five baseline methods including entity summarizers and relevance feedback models for document summarization/retrieval.

To summarize, our contributions in this paper include
- the first research effort to improve entity summarization with user feedback,
- a representation of entity summarization with iterative user feedback as a Markov decision process,
- a representation of sets of triples and their interdependence as a novel deep neural network, and
- the first empirical study of entity summarization with user feedback based on both real users and simulated users.

The remainder of the paper is organized as follows. We formulate the problem in Sect. 2 and describe our approach in Sect. 3. The online user study and the offline evaluation with simulated users are reported in Sects. 4 and 5, respectively. We discuss related work in Sect. 6 before we conclude in Sect. 7.

In this section we define the terms used in the paper and formulate the problem.

Entity Description. Let I, B, L be the sets of all IRIs, blank nodes, and literals in RDF, respectively. An RDF dataset T is a set of RDF triples:

  T ⊆ (I ∪ B) × I × (I ∪ B ∪ L).

For a triple t ∈ T, let subj(t), pred(t), obj(t) return the subject, predicate, and object of the triple, respectively. The description of an entity e comprises all the triples in T where e appears as the subject or as the object:

  Desc(e) = {t ∈ T : subj(t) = e or obj(t) = e}.

Entity Summarization. Given an integer size constraint k, a summary of entity e is a subset of triples S ⊆ Desc(e) such that |S| ≤ k. The problem of entity summarization is to generate an optimal summary from the original entity description by selecting an optimal subset of triples. Optimality may depend on the task and/or the context. We follow most existing research and generate entity summaries for general purposes.

User Feedback. Users and an entity summarizer work in a cooperative environment towards obtaining optimal summaries that best satisfy users' information needs. We consider the following cross-replace scenario where a user reads a computed summary and can provide negative feedback. Specifically, the summarizer computes and presents a summary S_i for entity e. The user reads S_i and crosses off an irrelevant triple f_i ∈ S_i. Based on this negative feedback, the summarizer selects a new candidate triple r_i ∈ (Desc(e) \ S_i) to replace f_i and form an improved summary S_{i+1} = (S_i \ {f_i}) ∪ {r_i}. The process can be repeated until the user provides no further feedback, due to satisfaction or loss of patience, or until the candidate triples are used up. The problem we study in this paper is how a summarizer exploits user feedback to identify relevant triples for replacement.
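To make the formulation concrete, here is a minimal Python sketch of the data model behind the cross-replace scenario. It is illustrative only: the Triple class and the desc and cross_replace helpers are our own names rather than identifiers from the paper's implementation, and the toy triples are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subj: str
    pred: str
    obj: str

def desc(entity, dataset):
    """Desc(e): all triples in which e appears as subject or object."""
    return {t for t in dataset if entity in (t.subj, t.obj)}

def cross_replace(summary, feedback, replacement):
    """One iteration: S_{i+1} = (S_i - {f_i}) | {r_i}."""
    assert feedback in summary and replacement not in summary
    return (summary - {feedback}) | {replacement}

# Toy usage with an invented entity description and k = 2.
T = {Triple("Solson_Publications", "industry", "Comics"),
     Triple("Solson_Publications", "owner", "Gary_Brodsky"),
     Triple("Solson_Publications", "type", "Agent")}
S0 = {Triple("Solson_Publications", "type", "Agent"),
      Triple("Solson_Publications", "owner", "Gary_Brodsky")}
f0 = Triple("Solson_Publications", "type", "Agent")   # user crosses this off
r0 = next(iter(desc("Solson_Publications", T) - S0))  # summarizer's pick
S1 = cross_replace(S0, f0, r0)
print(sorted(t.pred for t in S1))                     # ['industry', 'owner']
```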
In the cross-replace scenario, an entity summarizer interacts with a user by iteratively exploiting user feedback to compute improved summaries. We want to optimize the user experience during the entire iterative process. This inspires us to model the summarizer as a reinforcement learning agent. Furthermore, an irrelevant triple crossed off by the user should not be presented again. The iterative process therefore has states and can be modeled as a Markov decision process (MDP), which we describe in Sect. 3.1. We first review MDPs and then model the cross-replace scenario as an MDP.

MDP. An MDP is represented as a state-action-reward-transition quadruple ⟨Z, A, ρ, τ⟩. An agent interacts with the environment in discrete time steps i = 0, 1, ..., I. At time step i, the agent is in state Z_i ∈ Z, and follows a θ-parameterized policy π_θ : A × Z → [0, 1] to choose an action A_i ∈ A to take. For action A ∈ A and state Z ∈ Z, the policy π_θ(A|Z) gives the probability of taking action A when the agent is in state Z. At time step i + 1, the agent receives a real-valued immediate reward R_{i+1} ∈ ℝ and enters state Z_{i+1}. We assume immediate rewards and state transitions can be deterministically characterized by functions ρ : Z × A → ℝ and τ : Z × A → Z, respectively. An iterative process in the cross-replace scenario is represented as a trajectory

  ξ = ⟨Z_0, A_0, R_1, Z_1, A_1, R_2, ..., Z_{I−1}, A_{I−1}, R_I⟩.

The main learning problem is to find a policy π_θ that maximizes the expected discounted sum of the immediate rewards over ξ:

  J(θ) = E_{ξ∼π_θ}[ Σ_{i=0}^{I−1} γ^i · R_{i+1} ],    (4)

where γ ∈ [0, 1] is a discount-rate parameter.

We model an iterative process in the cross-replace scenario as an MDP. For integer i ≥ 0, let S_i be the summary computed at time step i, i.e., in the i-th iteration. User feedback is part of the environment. Let f_i be the irrelevant triple crossed off by the user at time step i, and let F_i = {f_j : 0 ≤ j ≤ i − 1} be the set of all irrelevant triples crossed off prior to time step i. An entity summarizer is an agent. The set of candidate triples for time step i is C_i = Desc(e) \ (S_i ∪ F_i). The full model is defined as follows:

  Z_i = ⟨S_i, f_i, F_i, C_i⟩,
  A_i = r_i ∈ C_i,
  R_{i+1} = ρ(Z_i, A_i) = rel(r_i),
  τ(Z_i, A_i) = Z_{i+1}, with S_{i+1} = (S_i \ {f_i}) ∪ {r_i} and F_{i+1} = F_i ∪ {f_i},
  π_θ(A_i = t | Z_i) = exp(score(t|Z_i, θ)) / Σ_{t′ ∈ C_i} exp(score(t′|Z_i, θ)).    (5)

The policy π_θ uses a softmax function to map the scores of candidate triples to a probability distribution. Scores are computed by a θ-parameterized deep neural network shown in Fig. 2, which we will describe in Sect. 3.2. In the computation of reward during training, rel(r_i) is the binary relevance label of triple r_i: either rel(r_i) = 1 (relevant) or rel(r_i) = 0 (irrelevant). We will describe the generation of labeled data in Sect. 3.3.
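The state, transition, and reward of this MDP can be sketched directly from the definitions above. The following hypothetical Python fragment (reusing the Triple type from the earlier sketch) is a paraphrase of Eq. (5), not the paper's code; since the next crossed-off triple comes from the user, the transition leaves it unset until feedback arrives.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class State:
    summary: FrozenSet            # S_i: current summary
    feedback: Optional[object]    # f_i: triple the user just crossed off
    history: FrozenSet            # F_i: triples crossed off before step i
    candidates: FrozenSet         # C_i = Desc(e) minus (S_i and F_i)

def transition(state, action, entity_desc):
    """tau(Z_i, A_i): replace f_i with the chosen candidate r_i = action."""
    summary = (state.summary - {state.feedback}) | {action}
    history = state.history | {state.feedback}
    candidates = entity_desc - summary - history
    return State(frozenset(summary), None,        # f_{i+1} awaits the user
                 frozenset(history), frozenset(candidates))

def reward(action, gt_summary):
    """rho(Z_i, A_i) during training: the binary label rel(r_i) of Eq. (7)."""
    return 1.0 if action in gt_summary else 0.0
```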
The core of our MDP is the representation of the policy. A learned policy informs an entity summarizer of how to exploit the irrelevant triples in user feedback to identify relevant candidate triples for replacement. The decision should be conditioned on the current summary and the current user feedback as well as the user feedback in history. Therefore, the key to the design of our policy is to properly represent all of these triples and their interdependence in a state. We design a novel deep neural network, shown in Fig. 2, to represent the θ-parameterized policy π_θ. All the parameters in the network are collectively referred to as θ and will be jointly learned.

We rewrite the four elements S_i, F_i, C_i, f_i of a state as three sets of triples which are fed into the network as input:
- S_i \ {f_i}, the current summary excluding the crossed-off triple,
- F_i ∪ {f_i}, the set of irrelevant triples crossed off till now, and
- C_i, the set of candidate triples for replacement.

Below we detail the four modules of our policy network in Fig. 2. We describe the encoding of a single triple, the encoding of a set of triples, and the encoding of triple interdependence. Based on the encoded triple interdependence in a state as the context, a candidate triple will be selected for replacement.

Encoding Triples. For each input triple t, we jointly encode its textual semantics and structural features using an embedding layer that converts t into a vector representation. Specifically, for each element of t, i.e., the subject, the predicate, or the object of t, we obtain its textual form from its rdfs:label if it is an IRI or a blank node, or from its lexical form if it is a literal. We average the pre-trained fastText embeddings [3] of all the words in this textual form as a vector representation of the element, encoding its textual semantics. Then we concatenate the vector representations of the three elements of t to jointly encode its textual semantics and structural features.

Encoding Sets of Triples. A state Z_i is fed into the network as three sets of triples. The representation of a set should be invariant to the order of its elements, so networks that are sensitive to input order (e.g., RNNs) are not suitable. We use a multilayer perceptron (MLP) with two fully connected hidden layers of size 300 and 150, applying Leaky ReLU activations, to process each triple in a set. Then we perform average pooling over all the triples in the set to generate a vector representation of the set, which satisfies permutation invariance. Separate copies of this network (MLP_S, MLP_F, MLP_C in Fig. 2) are used to encode the three input sets. Their vector representations are concatenated to represent state Z_i.

Encoding Triple Interdependence and Scoring Candidates. Finally, we encode the interdependence of the three sets in Z_i and each candidate triple t ∈ C_i in order to score t. We concatenate the vector representation of Z_i with the vector representation of t, and feed the result into an MLP with two fully connected hidden layers of size 64 and 1, applying Leaky ReLU activations. This MLP intercorrelates t and the three sets of triples in Z_i to encode their interdependence, and its output is taken as the score of t, i.e., score(t|Z_i, θ) in Eq. (5). The score thus considers the current summary, the user feedback in history, as well as the other candidate triples. The scores of all the candidate triples in C_i are normalized by a softmax layer into a probability distribution, i.e., π_θ(t|Z_i) in Eq. (5).

Selecting Replacement Triples. One candidate triple in C_i will be selected as the replacement triple r_i. During training, the selection follows the current probability distribution π_θ(t|Z_i) to address the well-known exploration-exploitation trade-off in reinforcement learning. During testing, exploitation is primary, and hence we greedily select the candidate with the highest probability.
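As a shape-level illustration of the four modules, the NumPy sketch below follows the dimensions stated above (300-d fastText vectors, set encoders with layers of size 300 and 150, a scoring MLP with layers of size 64 and 1). It is an assumption-laden sketch, not the authors' code: the weights are random stand-ins for the learned θ, word_vec is a stub replacing a real fastText lookup, and a triple is represented as a 3-tuple of textual forms.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 300                        # fastText dimension; a triple embeds to 3*D

def word_vec(word):            # stub: replace with a real fastText lookup
    return rng.standard_normal(D)

def embed_triple(triple):
    """Average word vectors per element, then concatenate subj|pred|obj."""
    parts = [np.mean([word_vec(w) for w in text.split()], axis=0)
             for text in triple]          # triple = (subj, pred, obj) texts
    return np.concatenate(parts)          # shape (900,)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def make_mlp(sizes):
    """One MLP copy with random stand-in weights for the learned theta."""
    Ws = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
    def forward(x):
        for W in Ws:
            x = leaky_relu(x @ W)
        return x
    return forward

mlp_S = make_mlp([3 * D, 300, 150])       # separate copies, as in Fig. 2
mlp_F = make_mlp([3 * D, 300, 150])
mlp_C = make_mlp([3 * D, 300, 150])
scorer = make_mlp([3 * 150 + 3 * D, 64, 1])

def encode_set(triples, mlp):
    """Per-triple MLP, then permutation-invariant average pooling."""
    return np.stack([mlp(embed_triple(t)) for t in triples]).mean(axis=0)

def policy(summary, crossed, candidates):
    """pi_theta(t|Z_i): score each candidate against the state, then softmax."""
    z = np.concatenate([encode_set(summary, mlp_S),     # S_i minus f_i
                        encode_set(crossed, mlp_F),     # F_i plus f_i
                        encode_set(candidates, mlp_C)]) # C_i; state is 450-d
    scores = np.array([scorer(np.concatenate([z, embed_triple(t)]))[0]
                       for t in candidates])
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

Called on a state, policy returns a distribution over the candidates; training samples from it, while testing takes the argmax, as described above.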
Now we describe our learning algorithm and the generation of labeled data. To learn an optimal policy π_θ that maximizes J(θ) in Eq. (4), we implement REINFORCE [25], a standard policy gradient method in reinforcement learning. Specifically, we update θ by computing the following gradient:

  ∇_θ J(θ) = E_{ξ∼π_θ}[ Σ_{i=0}^{I−1} γ^i · G_i · ∇_θ log π_θ(A_i|Z_i) ],  where G_i = Σ_{j=i}^{I−1} γ^{j−i} · R_{j+1}.    (6)

Our implementation uses the Adam optimizer based on TensorFlow with learning rate 0.01. In Eq. (6), we set the discount rate γ = 0.6 to reduce the influence of rewards after 10 iterations below 1% (i.e., 0.6^{10−1} ≈ 0.01), because users may not be patient with many iterations of interaction.

Generating Labeled Data. It is expensive and inflexible to train with real user feedback. We follow a standard synthetic setting in recent information retrieval research [11] and train our model with simulated user behavior. Simulation is based on relevance labels on triples, which can easily be obtained from the ground-truth summaries provided by existing benchmarks for evaluating entity summarization such as ESBM [14]. Specifically, for an entity description Desc(e) and a ground-truth summary S_gt thereof, a triple t ∈ Desc(e) is relevant if it appears in S_gt, and otherwise irrelevant. The rel function in Eq. (5) is defined accordingly:

  rel(t) = 1 if t ∈ S_gt,  and rel(t) = 0 otherwise.    (7)

We follow a standard framework for simulating user behavior [11] and adapt it to the cross-replace scenario over the entity summarization task. For entity e, an initial summary S_0 is generated under size constraint k = |S_gt| using any standard entity summarizer. Then, in the i-th iteration of the cross-replace scenario, a simulated user (a) decides whether to provide any feedback, and if so, (b) selects an irrelevant triple f_i from the current summary S_i to cross off. In our implementation we simulate a perfect user [11] who (a) stops providing feedback if and only if S_i = S_gt, and (b) always provides noise-free feedback, i.e., never mistakenly crosses off triples in S_i ∩ S_gt. We leave experiments with other user models (e.g., with noise) as future work.

When S_i ≠ S_gt, there may be more than one irrelevant triple in S_i, and any of them could be crossed off. To make our simulated user behave consistently, we compute and cross off the triple with the highest degree of irrelevance (doi). We learn the doi of a triple t by exploiting all the available ground-truth summaries, denoted by S_GT. Existing benchmarks such as ESBM usually provide multiple ground-truth summaries created by different human experts for an entity; a triple that appears in fewer ground-truth summaries is more irrelevant. We implement this idea by feeding the vector representation of t defined in Sect. 3.2 into a two-layer neural network that outputs doi(t) ∈ [0, 1]. We train this network on S_GT to minimize the following logistic loss function:

  L = − Σ_{S_gt ∈ S_GT} Σ_{t ∈ Desc(e)} [ (1 − rel(t)) · log doi(t) + rel(t) · log(1 − doi(t)) ],

where S_gt is a ground-truth summary for entity e, and rel is defined by Eq. (7).
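Episode generation and the REINFORCE update can be sketched end to end. To stay self-contained, the fragment below makes two simplifications of our own: a linear scorer over fixed triple features stands in for the deep policy network, and the simulated user crosses off the first irrelevant triple it finds rather than the doi-ranked one. The sampling, the binary rel reward, and the γ-discounted update follow Sect. 3.1-3.3; the toy features and initial summary are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
GAMMA = 0.6

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def run_episode(theta, features, gt_mask, k=2, max_steps=10):
    """Roll out one cross-replace episode against a simulated perfect user."""
    n = len(features)
    summary = {int(t) for t in rng.choice(n, size=k, replace=False)}
    crossed = set()                       # random S_0 stands in for CD's output
    grads, rewards = [], []
    for _ in range(max_steps):
        irrelevant = [t for t in summary if not gt_mask[t]]
        if not irrelevant:                # S_i = S_gt: perfect user stops
            break
        f = irrelevant[0]                 # paper: highest-doi triple; here: first
        summary.discard(f)
        crossed.add(f)
        cand = [t for t in range(n) if t not in summary and t not in crossed]
        if not cand:
            break
        pi = softmax(features[cand] @ theta)   # linear stand-in for the scorer
        j = rng.choice(len(cand), p=pi)        # sample: exploration in training
        r = cand[j]
        # grad of log-softmax for a linear scorer: phi(r) - E_{t~pi}[phi(t)]
        grads.append(features[r] - pi @ features[cand])
        rewards.append(1.0 if gt_mask[r] else 0.0)   # rel(r_i), Eq. (7)
        summary.add(r)
    return grads, rewards

def reinforce_update(theta, grads, rewards, lr=0.01):
    """One REINFORCE step per episode, matching the discounted sum in Eq. (6)."""
    G = 0.0
    for i in reversed(range(len(rewards))):
        G = rewards[i] + GAMMA * G        # G_i: discounted return from step i
        theta = theta + lr * (GAMMA ** i) * G * grads[i]
    return theta

# Toy training run on invented features: 8 candidate triples, 2 in S_gt.
d, n = 16, 8
features = rng.standard_normal((n, d))
gt_mask = np.array([True, True] + [False] * (n - 2))
theta = np.zeros(d)
for _ in range(200):
    theta = reinforce_update(theta, *run_episode(theta, features, gt_mask))
```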
In our first experiment, we carry out a preliminary user study with 24 participants. They are graduate students with at least a basic background in RDF and/or knowledge graphs. To the best of our knowledge, DRESSED is the first entity summarizer that can exploit user feedback. We compare it with two baselines: a state-of-the-art entity summarizer that cannot exploit user feedback, and a document summarizer that can exploit user feedback, adapted to perform entity summarization.

FACES-E [8] is a state-of-the-art entity summarizer. We obtained its implementation and configuration from its authors. FACES-E relies on UMBC's SimService, which is no longer available; we replace it with a string metric [20]. For entity e, FACES-E generates a ranking of the triples in Desc(e) and chooses the k top-ranked triples as a summary. Although it cannot exploit user feedback, in each iteration we take its top-ranked candidate triple as the replacement triple.

IPS [27] is a popular document summarizer that exploits user feedback to compute an improved document summary by selecting a new set of sentences. To adapt it to entity summarization, we transform each triple into a sentence by concatenating the textual forms of the three elements of the triple. We constrain the search space of IPS such that a re-computed summary differs from the current summary by exactly one triple; this triple becomes the replacement triple. Originally, IPS only supports positive feedback. In our implementation we negate the effect of feedback to fit the negative feedback in our scenario.

Training and Tuning. Following Sect. 3.3, we train DRESSED with simulated user behavior over ground-truth summaries from ESBM v1.0. Each ground-truth summary of an entity consists of 5 triples selected by a human expert from the original entity description. We set epoch = 100 and batchsize = 16. We also use this dataset to tune the two hyperparameters δ and λ of IPS. Their optimal values are found in the range of 0-10 using grid search. We use CD [26] to generate initial summaries throughout the experiment. However, we cannot compare with CD itself because its output cannot be treated as a ranking of candidate triples like that of FACES-E.

For each participant, we randomly sample 35 entities, including 25 entities from DBpedia version 2015-10 and 10 entities from LinkedMDB. Different entities may be assigned to different participants, and they are disjoint from the entities in ESBM which we use for training and parameter tuning. As a within-subject design, for each entity, the participant starts from the initial summary and separately interacts with each summarizer to help improve the summary. The three summarizers are provided in random order, and the experiment is blind, i.e., the participant does not know the order of the systems. In each iteration, the participant is required to cross off an irrelevant triple and then rate the relevance of the replacement triple. This rating, Q_rplc, is in the range of 1-5. Participants are instructed to assess relevance with reference to a satisfying general-purpose summary. When the participant decides to stop providing feedback for an entity, s/he rates the quality of the final summary. This rating, Q_stop, is also in the range of 1-5. We also record the number of iterations until termination, denoted by I.

Table 1 presents the results of the online user study. We compare DRESSED with the two baselines and perform a two-tailed t-test to analyze whether their differences are statistically significant (p < 0.01). DRESSED is generally the best-performing approach. First, with FACES-E and DRESSED, participants stop quickly (I < 4) and obtain reasonably good summaries (Q_stop > 4). The replacement triples selected by the feedback-aware DRESSED during the iterative process are significantly better than those of the feedback-unaware FACES-E according to the results for Q_rplc, demonstrating the usefulness of user feedback and the effectiveness of our approach. Second, compared with DRESSED, participants using IPS perform significantly more iterations, and the replacement triples and final summaries they receive are significantly worse. We further discuss the performance of these systems in Sect. 5.5.
Compared with evaluation with real user feedback, in recent information retrieval research [11] it has become more common to evaluate with simulated user behavior. Such offline evaluation makes it more achievable and affordable to evaluate many methods at different time steps under varying conditions and, more importantly, makes the results easily reproducible. Our second experiment follows such a standard synthetic setting [11], adapted to the cross-replace scenario over the entity summarization task as described in Sect. 3.3.

As described in Sect. 3.3, simulated user behavior is derived from ground-truth summaries. We obtain ground-truth summaries from the two largest available benchmarks for evaluating entity summarization: ESBM and FED. ESBM v1.0 provides 600 ground-truth summaries for entities in DBpedia version 2015-10, which we refer to as ESBM-D. It also provides 240 ground-truth summaries for entities in LinkedMDB, which we refer to as ESBM-L. FED provides 366 ground-truth summaries for entities in DBpedia version 3.9. In all these datasets, a ground-truth summary of an entity consists of 5 triples selected by a human expert from the original entity description. For each dataset, we partition the ground-truth summaries and the derived user simulation into 5 equal-sized subsets to support 5-fold cross-validation: 60% for training, 20% for validation, and 20% for testing.

We compare DRESSED with 5 baselines. FACES-E [8] and IPS [27] have been described in Sect. 4.1. In this experiment we add three relevance feedback models for document retrieval as baselines. NRF [24] is a well-known work that exploits negative relevance feedback, and PDGD [15] represents the state of the art in online learning to rank. Both of them re-rank documents based on user feedback. To adapt them to entity summarization, we transform each triple into a document by concatenating the textual forms of the three elements of the triple. The name of the entity to summarize is treated as a keyword query. After re-ranking, the top-ranked candidate triple is selected as the replacement triple. NRF has three strategies, among which we implement the SingleQuery strategy; in fact, the three strategies are essentially equivalent in our scenario, where the user feedback in each iteration is a single triple. For PDGD, we obtained its implementation and configuration from its authors. It has two variants: PDGD-L, using a linear model, and PDGD-N, using a neural model.

Training and Tuning. IPS has two hyperparameters δ and γ in the range of 0-10. NRF has three hyperparameters: k_1 in the range of 0-2, b in the range of 0-1, and γ in the range of 0.5-2. We tune them on the validation set using grid search. PDGD and DRESSED require training; we train their models on the training set. For DRESSED we set epoch = 50 and batchsize = 1. Initial summaries are generated using CD [26] throughout the experiment.

Since we simulate a perfect user, it is meaningless to evaluate the quality of the final summary S_I, which is exactly the ground-truth summary S_gt. Instead, we evaluate the iterative process, using two metrics for different elements of the process: NDCF for summaries, and NDCG for replacement triples.

NDCF. Following ESBM, we assess the quality of a computed summary S_i by comparing it with a ground-truth summary S_gt and calculating F1:

  P = |S_i ∩ S_gt| / |S_i|,  R = |S_i ∩ S_gt| / |S_gt|,  F1 = 2 · P · R / (P + R).

Note that in our experiments P = R = F1 because |S_i| = |S_gt| = 5. We evaluate the sequence of summaries S_1, S_2, ..., S_I computed during the iterative process. Considering that users will be better satisfied if high-quality summaries are computed earlier, we calculate the normalized discounted cumulative F1 (NDCF) over the first i iterations (1 ≤ i ≤ I):

  NDCF@i = ( Σ_{j=1}^{i} β^{j−1} · F1(S_j) ) / ( Σ_{j=1}^{i} β^{j−1} ),

where β ∈ [0, 1] is a discount factor representing the decay of importance. We set β = 0.6 to reduce the influence of summaries after 10 iterations below 1% (i.e., 0.6^{10−1} ≈ 0.01). The result of NDCF is in the range of 0-1. We are particularly interested in NDCF@I, which evaluates the entire iterative process.

NDCG. We can also assess the quality of the sequence of replacement triples r_0, r_1, ..., r_{I−1} selected during the iterative process. We treat the sequence as a (partial) ranking of the triples in Desc(e). Considering that users will be better satisfied if relevant triples are selected earlier, we calculate the normalized discounted cumulative gain (NDCG) of the ranking at position i (1 ≤ i ≤ I):

  NDCG@i = DCG@i / IDCG@i,  with DCG@i = Σ_{j=1}^{i} rel(r_{j−1}) / log_2(j + 1),

where rel is defined by Eq. (7) and IDCG@i is the DCG@i of an ideal ranking. NDCG has been widely used in information retrieval. The result is in the range of 0-1. We are particularly interested in NDCG@I, which evaluates the entire iterative process.
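Both metrics are straightforward to implement. In the sketch below, the NDCF normalizer divides by the best achievable discounted sum (F1 = 1 in every iteration) and IDCG is computed from the ideally reordered relevance labels of the selected replacements; these normalizers follow the definitions above and the standard NDCG convention, and should be treated as our assumptions rather than the paper's exact formulas.

```python
import math

BETA = 0.6                     # discount factor from the NDCF definition

def f1(summary, gt):
    """F1 of a computed summary against a ground-truth summary (sets)."""
    overlap = len(summary & gt)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(summary), overlap / len(gt)
    return 2 * p * r / (p + r)

def ndcf(summaries, gt, i):
    """NDCF@i over S_1..S_i, normalized by the ideal (all-F1-equal-1) run."""
    num = sum(BETA ** (j - 1) * f1(summaries[j - 1], gt) for j in range(1, i + 1))
    den = sum(BETA ** (j - 1) for j in range(1, i + 1))
    return num / den

def ndcg(replacements, gt, i):
    """NDCG@i over replacement triples r_0..r_{i-1} with binary rel labels."""
    rels = [1.0 if t in gt else 0.0 for t in replacements[:i]]
    dcg = sum(g / math.log2(j + 2) for j, g in enumerate(rels))
    idcg = sum(g / math.log2(j + 2)
               for j, g in enumerate(sorted(rels, reverse=True)))
    return dcg / idcg if idcg > 0 else 0.0
```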
Table 2. Overall results of the offline evaluation (mean ± standard deviation). For each method, significant improvements and losses over other methods are indicated at p < 0.01 or p < 0.05.

Table 2 presents the overall results of the offline evaluation. We compare DRESSED with the five baselines and perform a two-tailed t-test to analyze whether their differences are statistically significant (p < 0.01 and p < 0.05). DRESSED significantly (p < 0.01) outperforms FACES-E, IPS, and NRF in terms of both NDCF@I and NDCG@I on all three datasets. FACES-E is better than IPS. These results are consistent with the results of our online user study reported in Sect. 4. PDGD-L and PDGD-N are stronger baselines: these recent online learning to rank models achieve better results, but DRESSED still significantly (p < 0.01) outperforms them in terms of both NDCF@I and NDCG@I on ESBM-L and FED. However, the difference between DRESSED and PDGD-N in NDCF@I is not significant (p < 0.05) on ESBM-D.

In Fig. 3 and Fig. 4, we plot NDCF@i and NDCG@i for 1 ≤ i ≤ 10, respectively. The results reflect user experience over varying numbers of iterations. DRESSED is consistently above all the baselines in terms of both NDCF@i and NDCG@i on all three datasets. It establishes its superiority when i is very small, i.e., DRESSED better exploits early feedback to quickly improve computed summaries. In particular, NDCG@1 in Fig. 4 indicates the proportion of iterative processes in which the first replacement triple is relevant. We observe a very high value of NDCG@1 = 0.782 achieved by DRESSED on ESBM-L.

We now try to partially explain the performance of the participating systems. FACES-E is a state-of-the-art entity summarizer but cannot exploit user feedback. Compared with FACES-E, the better performance of DRESSED demonstrates the usefulness of user feedback and the effectiveness of our exploitation of it. IPS generates a summary whose word distribution is similar to that of the original data. This feature is useful in document summarization but less useful when adapted to entity summarization. For example, entity descriptions in LinkedMDB often contain many triples with the performance property, which are thus favored by IPS but are rarely included in ground-truth summaries. NRF relies on word distributions and exact word matching.
Such simple text-processing techniques are not very effective when processing the textual form of a triple, which can be very short and sparse. By contrast, DRESSED concatenates word embeddings to represent a triple with both textual and structural features, thereby exploiting the semantics of the triple more comprehensively. The two variants of PDGD represent the state of the art in relevance feedback research. Compared with PDGD, one possible reason for the better performance of DRESSED is that we directly and comprehensively model the interdependence of the current summary, the current user feedback, and the user feedback in history, whereas PDGD does not explicitly model all such interdependence.

We discuss three related research topics.

Entity Summarization. Entity summarization has been studied for years. RELIN [4] computes informativeness. DIVERSUM [21] improves the diversity of an entity summary by choosing triples with different properties. CD [26] jointly optimizes informativeness and diversity. In addition, FACES [7] and its extension FACES-E [8] consider the frequency of property values, while LinkSUM [22] computes PageRank. ES-LDA [16, 17] relies on a Latent Dirichlet Allocation (LDA) model. However, none of these methods can exploit user feedback to compute improved summaries. From a technical perspective, these methods are unsupervised, whereas our model is based on reinforcement learning. Some application-specific entity summarizers [10, 23, 28] are supervised, based on sets of carefully designed features. Our approach avoids such manual feature engineering and learns deep representations of triples and their interdependence.

Document Summarization with User Feedback. Some document summarizers exploit user feedback to compute improved summaries. They allow users to select interesting topics [12], keywords [18], or concepts (e.g., named entities) [2]. IPS [27] is the method most similar to our approach. It supports clicking an interesting sentence in a document summary and leverages this feedback to compute an improved summary. Compared with such work on unstructured documents, we process structured triples and encode both the textual semantics and the structural features of a triple. Besides, the above summarizers are unsupervised and stateless, while we model a reinforcement learning agent and represent the entire iterative process as a Markov decision process.

Relevance Feedback Models. Relevance feedback improves the quality of document retrieval based on user-provided relevance judgments. Wang et al. [24] implement a set of methods including the well-known Rocchio algorithm, which is based on the vector space model. It modifies the query vector according to user-specified relevant and irrelevant documents. Recent online learning to rank models formulate document retrieval with iterative user feedback as a reinforcement learning problem, usually a dueling bandit problem. A state-of-the-art method is PDGD [15], which constructs a pairwise gradient to infer preferences between document pairs from user clicks. As described in Sect. 5.2, these methods can be adapted to perform entity summarization with user feedback, but their effectiveness may be affected by the adaptation: in document retrieval there is a query, and the order of the retrieved documents is often used to interpret user feedback, whereas in entity summarization there may not be any query, and the triples in an entity summary can be presented in any order.
We presented the first attempt to improve entity summarization with user feedback. Our reinforcement learning based modeling of the task and, in particular, our deep neural policy network for representing triples and their interdependence showed better performance than a wide variety of baselines. Our approach has the potential to replace the static entity cards deployed in existing applications, thereby facilitating task completion and improving user experience. Our encoder for triples and their interdependence may also find application in other knowledge graph based tasks such as entity clustering.

We studied the cross-replace scenario, but our implementation can easily be extended to support other scenarios, e.g., crossing off multiple triples in each iteration, crossing off without replacement, or providing positive feedback. We will experiment with these extensions in future work. To further improve generalizability, e.g., to deal with paths and structures more complex than triples, one may extend the scope of an entity summary with concepts like concise bounded description [19] or RDF sentence [29], to better process blank nodes in RDF.

Acknowledgments. This work was supported in part by the NSFC under Grant 61772264 and in part by the Qing Lan Program of Jiangsu Province.

References
[1] Joint optimization of user-desired content in multi-document summaries by learning from user feedback
[2] Joint optimization of user-desired content in multi-document summaries by learning from user feedback
[3] Enriching word vectors with subword information
[4] RELIN: relatedness and informativeness-based centrality for entity summarization
[5] C3D+P: a summarization method for interactive entity resolution
[6] Summarizing entity descriptions for effective and efficient human-centered entity linking
[7] FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering
[8] Gleaning types for literals in RDF triples with application to entity summarization
[9] Relatedness-based multi-entity summarization
[10] Dynamic factual summaries for entity cards
[11] To model or to intervene: a comparison of counterfactual and online learning to rank from user interactions
[12] iNeATS: interactive multi-document summarization
[13] Entity summarization: state of the art and future challenges
[14] ESBM: an entity summarization benchmark
[15] Differentiable unbiased online learning to rank
[16] Combining word embedding and knowledge-based topic modeling for entity summarization
[17] ES-LDA: entity summarization using knowledge-based topic modeling
[18] A multiple-document summarization system with user interaction
[19] CBD - concise bounded description
[20] A string metric for ontology alignment
[21] The notion of diversity in graphical entity summarisation on semantic knowledge graphs
[22] LinkSUM: using link analysis to summarize entity data
[23] Contextualized ranking of entity types based on knowledge graphs
[24] A study of methods for negative relevance feedback
[25] Simple statistical gradient-following algorithms for connectionist reinforcement learning
[26] CD at ENSEC 2016: generating characteristic and diverse entity summaries
[27] Summarize what you are interested in: an optimization framework for interactive personalized summarization
[28] Summarizing highly structured documents for effective search interaction
[29] Ontology summarization based on RDF sentence graph