T-GAP: Learning to Walk across Time for Temporal Knowledge Graph Completion
Jaehun Jung, Jinhong Jung, U Kang
date: 2020-12-19

Temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge, as opposed to static knowledge graphs. Naturally, automatic TKG completion has drawn much research interest for a more realistic modeling of relational reasoning. However, most of the existing models for TKG completion extend static KG embeddings that do not fully exploit TKG structure, thus lacking in 1) accounting for temporally relevant events already residing in the local neighborhood of a query, and 2) path-based inference that facilitates multi-hop reasoning and better interpretability. In this paper, we propose T-GAP, a novel model for TKG completion that maximally utilizes both temporal information and graph structure in its encoder and decoder. T-GAP encodes the query-specific substructure of a TKG by focusing on the temporal displacement between each event and the query timestamp, and performs path-based inference by propagating attention through the graph. Our empirical experiments demonstrate that T-GAP not only achieves superior performance against state-of-the-art baselines, but also competently generalizes to queries with unseen timestamps. Through extensive qualitative analyses, we also show that T-GAP enjoys transparent interpretability and follows human intuition in its reasoning process.

Knowledge graphs (KGs), due to their expressiveness over structured knowledge, have been widely used in various applications including recommender systems (Wang et al. 2019), information retrieval (Liu et al. 2018), synonym discovery (Kang et al. 2012; Papalexakis et al. 2013), concept discovery (Jeon et al. 2015, 2016), and question answering (Zhang et al. 2018). Moreover, the inherent sparseness of KGs gave rise to research interest in automatic knowledge graph completion: predicting the missing entity for incomplete queries of the form (subject, predicate, ?). Recent advancements in KG completion have extended to the more challenging domain of temporal knowledge graphs (TKGs), as they model realistic events that are only temporarily valid. Triples in temporal graphs are annotated with a corresponding time token, taking the form (subject, predicate, object, timestamp). Naturally, the TKG completion task can be formulated as predicting the missing tail entity for queries of the form (subject, predicate, ?, timestamp). The majority of existing approaches to TKG completion propose a straightforward extension of conventional KG embeddings onto temporal graphs (Dasgupta et al. 2018; Lacroix, Obozinski, and Usunier 2019). Although these approaches are generally successful in TKG completion, we observe two major areas of improvement in the existing models, one in the encoding phase and one in the decoding phase. In the encoding phase, a model could benefit from the rich neighborhood information residing in the structure of TKGs. Extracting and encoding query-relevant information from the neighborhood nodes and their associated edges would help in fine-grained modeling of entity representations. The importance of neighborhood encoding has already been appreciated in static KGs (Chang et al. 2018; Nathani et al.
2019), but extending these models to TKGs is non-trivial due to the additional time dimension in each triple. Next, in the decoding phase, relational reasoning on TKGs could leverage path-based inference. Several works have adopted path-traversal models in static KGs (Das et al. 2018), showing preferable performance in relational reasoning compared to embedding-based models. Although path-based inference helps in capturing long-term dependencies between nodes and gives better interpretability of the model's reasoning process, these approaches are yet to be examined in TKG completion tasks. To this end, we propose T-GAP (Temporal GNN with Attention Propagation), a novel model for TKG completion that tackles both challenges stated above. In the encoder, we introduce a new type of temporal graph neural network (GNN), which attentively aggregates query-relevant information from each entity's local neighborhood. Specifically, we focus on encoding the temporal displacement between the timestamps of the input query and each edge being encoded. An intuitive example is presented in Figure 1. Evidently, the two most important facts for answering the given query (A, infects, ?, 12/20) are that A was infected with COVID-19 on 12/18, and that A met B on 12/19. Here, one should note that the valuable information lies in the fact that A got infected 2 days before the time of interest, not that A was infected on the specific day of 12/18. What matters most when accounting for temporal events is the relative displacement between the event and the time of interest, rather than the absolute time of the event. To effectively capture the temporal displacement, our proposed encoder separately encodes both the sign of the displacement (i.e., whether the time of the event belongs to the past, present, or future) and the magnitude of the displacement (i.e., how far the event is from the time of interest). Also, T-GAP performs generalized path-based inference over the TKG, based on the notion of Attention Flow (Xu et al. 2018). In each decoding step, our model explores the KG by propagating the attention value at each node to its reachable neighbor nodes, rather than sampling a single node from the neighborhood to walk to. The soft approximation of path traversal with attention propagation not only allows our model to be easily trained with end-to-end supervised learning, but also provides better interpretability of its reasoning process, compared to embedding-based models. In summary, our contributions are as follows:
• We introduce a new GNN encoder that effectively captures query-relevant information from temporal KGs.
• Based on the encoder, we present T-GAP, a novel path-based TKG reasoning model. We examine T-GAP on 3 benchmark datasets for the TKG completion task, and the quantitative metrics show clear improvements on all benchmarks compared to the state-of-the-art baselines.
• By analyzing the inferred attention distribution, we show that T-GAP possesses clear interpretability over its reasoning process, which has not yet been extensively discussed in the TKG domain.
Various approaches have been made toward automatic completion of static KGs. The majority of conventional approaches propose embedding-based models, including translative (Bordes et al. 2013; Wang et al. 2014) and factorization-based models (Yang et al. 2014; Trouillon et al. 2016). To compensate for the weak representation power of KG embeddings, several recent works incorporate a neural network either into the scoring function or as an additional encoding layer. ConvE (Dettmers et al.
2018) adopts a convolutional layer to model sophisticated interactions between entities in the scoring function. KBGAT (Nathani et al. 2019) and RGHAT (Zhang et al. 2020) adopt a variant of graph attention network to contextualize entity embeddings with the corresponding neighborhood structure. DPMPN (Xu et al. 2019) employs two GNNs to encode both the original graph and an induced subgraph, for scalable learning of KG structure. We extend the neighborhood encoding scheme of these prior works to temporal graphs, especially focusing on query-dependent encoding of KG structure. Existing works on TKG completion primarily focus on extending static KG embeddings to dynamic graphs. Different models mainly differ in how they represent independent timestamps and incorporate them into their scoring functions. HyTE (Dasgupta et al. 2018) extends TransH (Wang et al. 2014), projecting entity and relation embeddings onto a time-specific hyperplane. García-Durán et al. (2018) propose to represent a temporal relation as a sequence of the relation type and the characters in the timestamp, and encode the sequence using an RNN. TComplEx (Lacroix, Obozinski, and Usunier 2019) considers the score of each triple as a canonical decomposition of an order-4 tensor in the complex domain, adding a time embedding to the order-3 decomposition of ComplEx. Goel et al. (2019) suggest learning entity representations that change over time, transforming part of the embedding with a sinusoidal activation of learned frequencies. Meanwhile, path-based reasoning has been actively employed for node prediction on knowledge graphs. Lin et al. (2015) infuse multi-hop path information into entity representations, using an additive composition of relation embeddings. Das et al. (2018) and Lin, Socher, and Xiong (2018) consider the KG reasoning problem as a partially observed Markov decision process, training a policy network that starts traversing from the query head and arrives at the predicted tail entity. Xu et al. (2018) propose a soft approximation of path traversal with an attention distribution. T-GAP aligns with this line of work by exploring informative paths relevant to the input query with attention propagation, to maximally utilize the KG structure during the decoding process. Lastly, we find connections to our work in recently suggested GNNs for dynamic graphs. Pareja et al. (2020) propose a variant of the graph convolutional network (GCN) suited for TKGs, employing an RNN to evolve the GCN parameters across time. Han et al. (2020) present the Graph Hawkes Neural Network, modeling temporal dependencies between events with a Hawkes process. While these works focus on modeling the evolution of the graph as a whole, we discuss a new methodology of encoding the temporal displacement between each event and the input query, which better suits our goal of exploring query-relevant paths and reaching the answer node.
Overview. First, we denote a TKG as G_KG, whose facts take the form (subject, relation, object, timestamp), where V_KG is its set of entities, R_KG is its set of relations, and T_KG is the set of timestamps associated with the relations. Given the graph G_KG and a query q = (v_query, r_query, ?, t_query), TKG completion is formulated as predicting the entity u ∈ V_KG most probable to fill in the query. We also denote ←N_i as the set of incoming neighbor nodes of v_i, i.e., nodes possessing edges toward v_i, and →N_i as the set of outgoing neighbor nodes of v_i. Figure 2 illustrates an overview of T-GAP's reasoning process. T-GAP consists of 4 sub-modules: Preliminary GNN (PGNN), Subgraph GNN (SGNN), Subgraph Sampling, and Attention Flow.
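To ground the notation above, here is a minimal Python sketch of the data the task operates on. The quadruple layout, the toy facts (borrowed from the Figure 1 scenario), and the dictionary-based neighbor sets are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# A TKG as a set of quadruples (subject, relation, object, timestamp); the toy
# facts mirror the Figure 1 scenario. A query replaces the tail entity with "?".
quadruples = [
    ("COVID-19", "infects", "A", "2020-12-18"),
    ("A", "meets", "B", "2020-12-19"),
]
query = ("A", "infects", "?", "2020-12-20")  # predict the missing tail entity

# Incoming / outgoing neighbor sets of each node, as used by the GNN modules.
in_neighbors, out_neighbors = defaultdict(set), defaultdict(set)
for s, r, o, t in quadruples:
    out_neighbors[s].add((r, o, t))
    in_neighbors[o].add((r, s, t))

print(out_neighbors["A"])  # {('meets', 'B', '2020-12-19')}
print(in_neighbors["A"])   # {('infects', 'COVID-19', '2020-12-18')}
```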
Given G_KG and the query, in the encoding phase, T-GAP first uses PGNN to create a preliminary node feature h_i for all entities in G_KG. Next, at each decoding step t = 1, ..., T, T-GAP iteratively samples a subgraph G^(t)_sub, and SGNN computes a subgraph node feature g^(t)_i for its entities, incorporating the query vector q and the preliminary feature h_i. As noted in Xu et al. (2019), this additional encoding of the induced subgraph not only helps in extracting only the query-relevant information from the original graph, but also scales up the GNN by input-dependent pruning of irrelevant edges. Using both h_i and g^(t)_i, Attention Flow computes transition probabilities to propagate the attention value of each node to its reachable neighbor nodes, creating the next step's node attention distribution a^(t+1)_i. After the final propagation step T, the answer to the input query is inferred as the node with the highest attention value a^(T)_i.
Given G_KG, T-GAP first randomly initializes the node feature h_i for all v_i ∈ V_KG. Then, to contextualize the representation of entities in G_KG with the graph structure, each layer in PGNN updates the node feature h_i of entity v_i by attentively aggregating v_i's neighborhood information. The important intuition underlying PGNN is that the temporal displacement between the timestamps of the query and each event is integral to capturing the time-related dynamics of each entity. Therefore, for each timestamp t_e of edge e in G_KG, we resolve to separately encode the sign and magnitude of the temporal displacement Δt_e = t_e − t_query. Concretely, PGNN computes the message m_ij from entity v_i to v_j using the node feature h_i and ρ_ij, a relation-specific parameter associated with r_ij, which denotes the relation that connects v_i to v_j. In addition to the entity and relation, we learn a discretized embedding of the magnitude of the temporal displacement, i.e., τ_{|Δt_ij|}. We take account of the sign of the displacement by applying a sign-specific weight for each event. Next, the new node feature h_j is computed as an attention-weighted sum of all incoming messages to v_j. The attention values are computed by applying softmax over all incoming edges of v_j, with h_j as the query and m_ij as the key. In addition, we extend this attentive aggregation scheme to multi-headed attention, which helps to stabilize the learning process and jointly attend to different representation subspaces (Veličković et al. 2017). Hence, our message aggregation scheme is modified to concatenate the independently aggregated neighborhood features from each attention head, where K is a hyperparameter indicating the number of attention heads.
At each decoding step t, SGNN updates the node feature g_i for all entities that are included in the induced subgraph of the current step, G^(t)_sub. We present the detailed procedure of subgraph sampling in the upcoming section. Essentially, SGNN not only contextualizes g_i with the respective incoming edges, but also infuses the query context vector into the entity representation. First, the subgraph features for entities newly added to the subgraph are initialized to their corresponding preliminary features h_j. Next, SGNN performs message propagation, using the same message computation and aggregation scheme as PGNN (Eq. 1-3), but with separate parameters. This creates an intermediate node feature g̃_j. The intermediate features are then concatenated with the query context vector q, and linearly transformed back to the node embedding dimension, creating the new feature g_j. Here, the query context vector is formed from h_query, the preliminary feature of v_query, and ρ_query, the relation parameter for r_query.
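Since the paper's numbered message equations are not reproduced in this extraction, the following PyTorch sketch shows one plausible way the displacement-aware message described above could be implemented. The module name, the argument layout, and the exact composition of the three terms (h_i + ρ_ij + τ_{|Δt|}, followed by a sign-specific linear map) are assumptions consistent with the description, not the authors' code.

```python
import torch
import torch.nn as nn

class DisplacementMessage(nn.Module):
    # Sketch of the PGNN/SGNN message described above. The message from v_i to v_j
    # combines the node feature h_i, the relation parameter rho_ij, and an embedding
    # of the magnitude of the temporal displacement; a weight chosen by the sign of
    # the displacement (past / present / future) is then applied.
    def __init__(self, dim, num_relations, max_disp):
        super().__init__()
        self.rel = nn.Embedding(num_relations, dim)   # rho, one vector per relation type
        self.disp = nn.Embedding(max_disp + 1, dim)   # tau_{|dt|}, discretized magnitude
        self.w_sign = nn.ModuleList([nn.Linear(dim, dim) for _ in range(3)])  # past, present, future

    def forward(self, h_i, rel_id, t_edge, t_query):
        dt = t_edge - t_query                          # temporal displacement of the edge
        sign = 0 if dt < 0 else (1 if dt == 0 else 2)  # past / present / future
        mag = min(abs(dt), self.disp.num_embeddings - 1)
        msg = h_i + self.rel(torch.tensor(rel_id)) + self.disp(torch.tensor(mag))
        return self.w_sign[sign](msg)

# Hypothetical usage with a 100-dimensional feature, as in the experiments:
# m_ij = DisplacementMessage(100, 230, 365)(h_i, rel_id=3, t_edge=352, t_query=354)
```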
T-GAP models path traversal with a soft approximation of attention flow, iteratively propagating the attention value of each node to its outgoing neighbor nodes. Initially, the node attention is set to 1 for v_query and 0 for all other entities. Hereafter, at each step t, Attention Flow propagates edge attentions a^(t)_ij and aggregates them into the node attention a^(t+1)_j. The key here is the transition probability T_ij. In this work, we define T_ij by applying softmax over the sum of two scoring terms, regarding both the preliminary feature h and the subgraph feature g. The first scoring term accounts only for the subgraph features g_i and g_j, giving additional score to entities that are already included in the subgraph (note that g_i is initialized to zero for entities not yet included in the subgraph). Meanwhile, the second scoring term could be regarded as an exploration term, as it relatively prefers entities not included in the subgraph, by modeling the interaction between g_i and h_j. As T-GAP consists only of differentiable operations, the path traversal of T-GAP can be trained end-to-end by directly supervising the node attention distribution after T propagation steps. We train T-GAP to maximize the log probability of the answer entity u_label at step T.
The decoding process of T-GAP depends on the iterative sampling of a query-relevant subgraph G^(t)_sub. The initial subgraph G^(0)_sub before the first propagation step contains only one node, v_query. As the propagation proceeds, edges with high relevance to the input query, measured by the attention value assigned to the edges, are added to the previous step's subgraph. Specifically, the subgraph sampling at step t proceeds as follows:
• Find the x core nodes with the highest (nonzero) node attention values a^(t−1)_i at the previous step.
• For each core node, sample y edges that originate from the node.
• Among the x · y sampled edges, find the z edges with the highest edge attention values a^(t)_ij at the current step.
• Add the z edges to G^(t−1)_sub.
In this module, x, y, and z are hyperparameters. Intuitively, we only collect 'important' events that originate from 'important' entities (core nodes) with respect to the query, while keeping the subgraph size under control (edge sampling). We provide an illustrative example of subgraph sampling in Appendix A. Note that although edge sampling brings stochasticity to T-GAP's inference, this does not hinder the end-to-end training of the model. Since the sampling is not parameterized and we only use the node features g from the sampled subgraph, gradients back-propagate through g, not through the sampling operation.
We evaluate our proposed method on three benchmark datasets for TKG completion: ICEWS14, ICEWS05-15, and Wikidata11k, all suggested by García-Durán, Dumančić, and Niepert (2018). ICEWS14 and ICEWS05-15 are subsets of ICEWS, each containing socio-political events in 2014 and from 2005 to 2015, respectively. Wikidata11k is a subset of Wikidata, possessing facts with various timestamps that span from A.D. 20 to 2020. All facts in Wikidata11k are annotated with an additional temporal modifier, occurSince or occurUntil. For the sake of consistency and simplicity, we follow García-Durán, Dumančić, and Niepert (2018) to merge the modifiers into predicates rather than modeling them in a separate dimension (e.g., (A, loves, B, since, 2020) transforms to (A, loves-since, B, 2020)). Detailed statistics of the three datasets are provided in Appendix B.
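As a small illustration of the preprocessing just described for Wikidata11k, the sketch below folds the temporal modifier into the predicate, following the (A, loves, B, since, 2020) → (A, loves-since, B, 2020) example; the 5-tuple layout of the raw facts is an assumption.

```python
def merge_modifier(fact):
    # Fold the temporal modifier (occurSince / occurUntil, abbreviated here as in
    # the paper's example) into the predicate, yielding a plain 4-tuple quadruple.
    subj, pred, obj, modifier, timestamp = fact
    return (subj, f"{pred}-{modifier}", obj, timestamp)

print(merge_modifier(("A", "loves", "B", "since", 2020)))
# ('A', 'loves-since', 'B', 2020)
```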
For each dataset, we create G_KG with only the triples in the train set. We add inverse edges to G_KG for proper path-based inference on reciprocal relations. Also, we follow Xu et al. (2019) by adding self-loops to all entities in the graph, allowing the model to stay at the 'answer node' if it reaches an optimal entity in t < T steps. To measure T-GAP's performance in head entity prediction, we add reciprocal triples to the valid and test sets as well. For all datasets, we find through empirical evaluation that setting the maximal path length T = 3 results in the best performance. Following previous works, we fix the dimension of the entity / relation / displacement embeddings to 100. Except for the embedding size, we search for the best set of hyperparameters using grid-based search, choosing the value with the best Hits@1 while all other hyperparameters are fixed. We implement T-GAP with PyTorch and DGL, and plan to make the code publicly available. We provide further implementation details, including hyperparameter search bounds and the best configuration, in Appendix C.
Table 1 shows the overall evaluation results of T-GAP against baseline methods. Along with Hits@1, 3, and 10, we report the MRR of the ground-truth entity, compared to the baseline methods. As seen in the table, T-GAP outperforms the baseline models on all benchmarks, improving relative performance by up to 10% consistently across all metrics. We find through an ablation study that the improvements mainly come from resolving the two shortcomings of preexisting TKG embeddings indicated in earlier sections: the absence of 1) a neighborhood encoding scheme, and 2) path-based inference with scalable subgraph sampling.
Table 3: Generalization performance over unseen timestamps in ICEWS14. Accounting for relative displacement rather than independent timestamps, our model is the most robust to queries with unseen timestamps.
To further examine the effect of our proposed method in solving the aforementioned two problems, we conduct an ablation study as shown in Table 2. First, we consider T-GAP without temporal displacement encoding. In both PGNN and SGNN, we do not consider the sign and magnitude of the temporal displacement, and simply learn the embedding of each timestamp as is. While computing the message m_ij, the two GNNs simply add the time embedding τ_{t_ij} to h_i + ρ_ij. No sign-specific weight is multiplied, and all edges are linearly transformed with the same weight matrix. In this setting, T-GAP's performance on ICEWS14 degrades by about 30% in Hits@1, performing similarly to TA-DistMult in Table 1. The result attests to the importance of temporal displacement for neighborhood encoding in temporal KGs. Next, to analyze the effect of subgraph sampling on overall performance, we resort to a new configuration of T-GAP where no subgraph sampling is applied, and SGNN creates node features g_i for all entities in G_KG. Here, T-GAP's performance slightly degrades, by about 1 percent in all metrics. This implies the importance of subgraph sampling in pruning query-irrelevant edges, helping T-GAP to concentrate on the plausible substructure of the input graph. Finally, we analyze the effect of PGNN by training T-GAP with different numbers of PGNN layers. We find that T-GAP trained with a 1-layer PGNN outperforms the model without PGNN by an absolute gain of 1% in MRR. However, adding more PGNN layers gives only a minor gain, or even degrades the test set accuracy, mainly owing to early overfitting on the triples in the train set.
Generalizing to Unseen Timestamps. We conduct an additional study that measures the performance of T-GAP in generalizing to queries with unseen timestamps. Following Goel et al. (2019), we modify ICEWS14 by including all triples except those on the 5th, 15th, and 25th day of each month in the train set, and creating the valid and test sets using only the excluded triples. The performance of T-GAP against the strongest baselines on this dataset is presented in Table 3. In this setting, DE-SimplE and T(NT)ComplEx perform much more similarly to each other than in Table 1, while T-GAP outperforms all baselines. DE-SimplE shows strength in generalizing over time, as it represents each entity as a continuous function over the temporal dimension. However, the model is weak when the range of timestamps is large and sparse, as shown for Wikidata in Table 1. Meanwhile, TComplEx and TNTComplEx show fair performance on Wikidata, but infer poorly for unseen timestamps, as they only learn independent embeddings of discrete timestamps. In contrast to these models, T-GAP not only shows superior performance on all benchmarks but is also robust to unseen timestamps, by accounting for the temporal displacement rather than independent time tokens.
We provide a detailed analysis of the interpretability of T-GAP in its relational reasoning process.
Relation Type and Temporal Displacement. Intuitively, the query relation type and the temporal displacement between relevant events and the query are closely correlated. For a query such as (PersonX, member of sports team, ?, t_1), events that happened 100 years before t_1 or 100 years after t_1 will most likely be irrelevant. On the contrary, for a query given as (NationX, wage war against, ?, t_2), one might have to consider events far from the time of interest. To verify whether T-GAP understands this implicit correlation, we analyze the attention distribution over edges with different temporal displacements when T-GAP is given input queries with a specific relation type. The visualization of the distributions for three relation types is presented in Figure 3. For all queries in the test set of Wikidata11k with a specific relation type, we visualize the average attention value assigned to edges with each temporal displacement (red bars). We compare this with the original distribution of temporal displacement, counted over all edges reachable in T steps from the head entity v_query (blue bars). Remarkably, in contrast to the original distribution, which has high variance over a wide range of displacements, T-GAP tends to focus most of the attention on edges with a specific temporal displacement, depending on the relation type. For queries with relation award won, the attention distribution is extremely skewed, focusing over 90% of the attention on events with displacement = 0 (i.e., events in the same year as the query). Note that we have averaged the distribution over all queries with award won, including both temporal modifiers occurSince and occurUntil. The skewed distribution mainly results from the fact that the majority of the 'award' entities in Wikidata11k are annual awards, such as the Latin Grammy Award or the Emmy Award. The annual nature of the candidate entities naturally makes T-GAP focus on clues such as the nomination of awardees, or significant achievements of awardees, in the year of interest. Next, we test T-GAP for queries with relation member of sports team-occurUntil.
In this case, the attention is more evenly distributed than in the former case, but slightly biased toward past events. We find that this phenomenon is mainly due to the existence of temporally reciprocal edges in G_KG, which are a crucial key to solving the given query. Here, T-GAP sends more than half of the attention value (on average) to an event with relation member of sports team-occurSince that happened a few years before the time of interest. The inference follows our intuition to look for the last sports club that the player became a member of before the timestamp of the query. The third case, with relation educated at-occurSince, is the opposite of the second case. The majority of the attention is concentrated on events 1-5 years after the query time, searching for the first event with relation educated at-occurUntil that happened after the time of interest. As the analysis suggests, T-GAP discovers important clues for each relation type, adequately accounting for the temporal displacement between the query and related events, while aligning with human intuition.
We resort to a case study to provide a detailed view of T-GAP's attention-based traversal. In this study, our model is given the input query (North Korea, threaten, ?, 2014/04/29) in ICEWS14, where the correct answer is South Korea. For each propagation step, we list the top-5 edges that received the highest attention values in that step. The predominant edges and their associated attention values are shown in Table 4.
Table 4: List of predominant edges for the case study. Numbers on the right are the corresponding edge attention values assigned to each edge. Predicates in red carry negative meaning, while predicates in blue hold positive meaning. We find through the case study that attention propagation allows T-GAP to fix its misleading focus on sub-optimal entities.
In the first step, T-GAP attends to various events pertinent to North Korea, which mostly include negative predicates against other nations. As seen in the table, the two plausible query-filling candidates are Japan and South Korea. Japan receives slightly more attention than South Korea, as it is associated with more relevant facts such as "North Korea threatened Japan on May 12th". In the second step, however, T-GAP discovers additional relevant facts that could be crucial in answering the given query. As these facts have either Japan or South Korea as the head entity, they could not be discovered in the first propagation step, which only propagates the attention from the query head North Korea. T-GAP attends to the events (South Korea, threaten / criticize or denounce, North Korea) that happened only a few days before our time of interest. These facts imply the strained relationship between the two nations around the query time. Also, T-GAP finds that most of the edges from Japan to North Korea within a few months before or after the time of interest tend to be positive events. As a result, in the last step, T-GAP propagates most of the node attention on North Korea to the events associated with South Korea. The highest attention is assigned to the relation make statement. Although the relation itself does not hold a negative meaning, in ICEWS14 make statement is typically accompanied by threaten, as entities formally threaten other entities by making statements. Through the case study, we find that T-GAP leverages the propagation-based decoding as a tool to fix its traversal over wrongly-selected entities.
Although Japan seemed like an optimal answer in the first step, T-GAP understands through the second step that the candidate was sub-optimal with respect to the query, propagating the attention assigned to Japan back to North Korea. T-GAP fixes its attention propagation in the last step, resulting in a completely different set of attended events compared to the first step. Such an amendment would not have been possible with conventional approaches to path-based inference, which greedily select an optimal entity to traverse at each decoding step.
In this paper, we propose a novel approach to TKG completion named T-GAP, which explores a query-relevant substructure of the TKG with attention propagation. Unlike other embedding-based models, the proposed method effectively gathers useful information from the existing KG by accounting for the temporal displacement between the query and the respective edges. Quantitative results show that T-GAP not only achieves state-of-the-art performance consistently over all three benchmarks, but also competently generalizes to queries with unseen timestamps. Through extensive analysis, we also show that the propagated attention distribution serves well as an interpretable proxy for T-GAP's reasoning process that aligns with human intuition.
Figure 4 is an illustrative example of the subgraph sampling procedure in T-GAP. The hyperparameters for the example are as follows: x = 2 (the maximum number of core nodes), y = 3 (the maximum number of candidate edges considered by each core node), and z = 2 (the number of sampled edges added to the subgraph). In the initial state, the only node associated with nonzero attention a^(0)_i is the query head v_query. Also, the initial subgraph G^(0)_sub consists only of the node v_query. After the first propagation step t = 1, T-GAP first finds the top-x core nodes (where x = 2) w.r.t. the nonzero node attention scores a^(0)_i of the previous step t = 0. Since the only node with a nonzero attention value is v_query, it is retrieved as the core node. Next, T-GAP randomly samples at most y = 3 edges that originate from the core node (e.g., the dashed edges with weights). Among the sampled edges, it selects the top-z (where z = 2) edges in order of their edge attention values a^(1)_ij at the current step; they are then added to G^(0)_sub, creating the new subgraph G^(1)_sub. After the second propagation step t = 2, T-GAP again finds x core nodes that correspond to the highest attention values a^(1)_i (e.g., the nodes annotated with 0.1 and 0.2, respectively). Then, y outgoing edges for each core node are sampled; among the x · y sampled edges, the z edges with the highest edge attention values a^(2)_ij are added to G^(1)_sub, creating the new subgraph G^(2)_sub. As seen in the figure, the incremental subgraph sampling scheme allows our model to iteratively expand the range of nodes and edges to attend to, while guaranteeing that the critical nodes and edges from the previous steps remain included in the later subgraphs. By flexibly adjusting the subgraph-related hyperparameters x, y, and z, T-GAP can be readily calibrated between reducing computational complexity and optimizing predictive performance. Intuitively, with more core nodes, more sampled edges, and more edges added to the subgraph, T-GAP can better attend to substructures of the TKG that might otherwise have been discarded. Meanwhile, with small x, y, and z, T-GAP can easily scale up to large graphs by reducing the number of message-passing operations in SGNN.
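The x / y / z procedure walked through above can be summarized in a few lines of Python; the sketch below is a hedged illustration of one sampling step, where the attention dictionaries and the edge representation are assumed data layouts rather than the authors' implementation.

```python
import random

def sample_subgraph_step(node_att, edge_att, out_edges, subgraph, x=2, y=3, z=2):
    # One subgraph-sampling step with the x / y / z scheme described above:
    # pick the top-x nodes by previous-step node attention, randomly sample up to
    # y outgoing edges per core node, keep the top-z of those candidates by
    # current-step edge attention, and add them to the (set-valued) subgraph.
    cores = sorted((v for v, a in node_att.items() if a > 0),
                   key=node_att.get, reverse=True)[:x]
    candidates = []
    for v in cores:
        edges = out_edges.get(v, [])
        candidates += random.sample(edges, min(y, len(edges)))
    candidates.sort(key=lambda e: edge_att.get(e, 0.0), reverse=True)
    subgraph |= set(candidates[:z])
    return subgraph
```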
Figure 4: Example of subgraph sampling in T-GAP. The graphs above represent the respective node / edge attention distributions at the initial state (t = 0), after the first propagation step (t = 1), and after the second propagation step (t = 2). The graphs below show the sampled subgraph at each step t. The x (= 2) orange nodes are the core nodes retrieved at each step, and the y (= 3) dashed edges from each core node are candidate edges for the sampled subgraph. Among the candidate edges, the z (= 2) orange edges are newly added to the previous subgraph.
References
Translating embeddings for modeling multi-relational data
Structure-aware convolutional neural networks
Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning
HyTE: Hyperplane-based temporally aware knowledge graph embedding
Learning Sequence Encoders for Temporal Knowledge Graph Completion
Diachronic embedding for temporal knowledge graph completion
Graph Hawkes Neural Network for Future Prediction on Temporal Knowledge Graphs
Mining billion-scale tensors: algorithms and discoveries
HaTen2: Billion-scale tensor decompositions
Towards time-aware knowledge graph completion
GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries
Tensor Decompositions for Temporal Knowledge Base Completion
Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Modeling Relation Paths for Representation Learning of Knowledge Bases
Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval
Embedding models for episodic knowledge graphs
Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
Large Scale Tensor Decompositions: Algorithmic Developments and Applications
EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs
Complex embeddings for simple link prediction
Explainable reasoning over knowledge graphs for recommendation
Knowledge Graph Embedding by Translating on Hyperplanes
Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning
Modeling Attention Flow on Graphs