key: cord-0480207-735uooqr
authors: Cui, Jian; Kim, Kwanwoo; Na, Seung Ho; Shin, Seungwon
title: Meta-Path-based Fake News Detection Leveraging Multi-level Social Context Information
date: 2021-09-13
journal: nan
DOI: nan
sha: 0134a5da8a52d0b37462499e53ac999f6aaadb4e
doc_id: 480207
cord_uid: 735uooqr

Fake news, false or misleading information presented as news, has a significant impact on many aspects of society, such as in politics or healthcare domains. Due to the deceiving nature of fake news, applying Natural Language Processing (NLP) techniques to the news content alone is insufficient. The multi-level social context information (news publishers and engaged users in social media) and temporal information of user engagement are important information in fake news detection. The proper usage of this information, however, introduces three chronic difficulties: 1) multi-level social context information is hard to be used without information loss, 2) temporal information is hard to be used along with multi-level social context information, 3) news representation with multi-level social context and temporal information is hard to be learned in an end-to-end manner. To overcome all three difficulties, we propose a novel fake news detection framework, Hetero-SCAN. We use Meta-Path to extract meaningful multi-level social context information without loss. Meta-Path, a composite relation connecting two node types, is proposed to capture the semantics in the heterogeneous graph. We then propose Meta-Path instance encoding and aggregation methods to capture the temporal information of user engagement and produce news representation end-to-end. According to our experiment, Hetero-SCAN yields significant performance improvement over state-of-the-art fake news detection methods.

The wide dissemination of fake news has become a major social problem worldwide. Recent and infamous examples include the fraud claims surrounding the 2020 United States presidential election [9] and COVID-19 rumors [1]. Both industry and government are making efforts to prevent the spread of fake news [10]. Nevertheless, fake news verification still relies on human experts and their manual efforts in analyzing the news content with additional evidence. Therefore, an automatic and efficient way to identify the veracity of news is needed.

Anonymous Submission to The Web Conference, 2022.

The most typical way to detect fake news is to apply Natural Language Processing (NLP) techniques to the news content [15, 18]. Considering that even people struggle to judge news authenticity from the content alone, these NLP solutions are insufficient on their own. Thus, more information is required to improve fake news detection.

The first important source of information is the users on social media. Social media is one of the most influential mediums for propagating information, and it has become common practice for people to share their thoughts there. Even though regular users use social media as a communication tool, some users, known as instigators, intentionally spread fake news. Instigators usually have a highly partisan-biased personal description and many followers and followings, which differs significantly from the profiles of regular users (see Figure 1). Therefore, analyzing the users engaged with the news can provide additional evidence for identifying news authenticity. The publisher information can also play an important role because certain partisan-biased publishers are more likely to publish fake news [3, 5, 6]. Information on users and publishers can be viewed as multi-level social context information, and it provides additional clues for fake news detection.

In addition to multi-level social context information, temporal information of user engagement (temporal information for short) is another instrumental source of information in fake news detection. Fake and real news show different propagation properties in social media: fake news is periodically mentioned by people and usually lasts longer, whereas real news receives attention only shortly after its publication [27]. Hence, the temporal information should be included in the news representation along with multi-level social context information.

Using multi-level social context and temporal information, however, leads to three chronic difficulties. First, due to the heterogeneity of multi-level social context information, it is hard to use this information without loss. Second, temporal information is hard to use along with multi-level social context information. A graph is a typical way to present social context and its connectivity to the news, but a graph by itself has complications in presenting temporal information. The last difficulty is learning the news representation end-to-end. Multi-level social context and temporal information are two different kinds of information, which makes it harder to adopt end-to-end learning while utilizing both. Yet end-to-end learning is necessary for high-performing fake news detection: it eliminates the effect of the sub-tasks and optimizes the training parameters with a single news detection objective.

Figure 1: Example of fake news distribution and dissemination. Publishers publish the news, and users tweet the news. Some publishers are regarded as low-credibility sources according to the well-known fact-checking website MBFC. User A is an example of an instigator on Twitter, and User B is an example of a regular user.

To the best of our knowledge, existing approaches fail to address all three difficulties, so we propose a novel fake news detection framework, Hetero-SCAN, to tackle them. In Hetero-SCAN, we use the Meta-Path to preserve multi-level social context information. A Meta-Path is a composite relation connecting two node types, aiming to capture the semantics in a heterogeneous graph. We define two Meta-Paths covering different aspects of news (users and publishers) to extract multi-level social context information without information loss. Moreover, Meta-Path instance encoding and aggregation methods are proposed to capture the temporal information of user engagement and learn the news representation end-to-end.

To show that our proposed method outperforms existing solutions, we test Hetero-SCAN on two real-world datasets [16, 32]. The results show that Hetero-SCAN achieves significant improvement over previous approaches in terms of F1 score, accuracy, and AUC score. Our code and data are released on GitHub 1 for reproducibility. Our major contributions are:

(1) We pose three chronic difficulties in social context-aware fake news detection and address them by proposing a novel fake news detection framework, Hetero-SCAN.
(2) We conduct diverse experiments on two real-world fake news datasets, covering the broad definition of fake news (Section 3), and demonstrate that Hetero-SCAN shows better performance than existing solutions.
(3) We provide new insights into the differences in the behavior of engaged users between intentional and unintentional fake news.

1 https://github.com/(anonymous)/hetero_scan

Fake news detection methods can be categorized into two types: content-based and graph-based approaches. The content-based approach models the content of the news, such as the headline or body text, to detect news authenticity. Some research on content-based approaches utilizes linguistic features such as stylometry, psycholinguistic properties, and rhetorical relations [12, 34, 35, 38]. Researchers also use multi-modal approaches, which combine visual and linguistic features, to verify news authenticity [20, 24, 36, 46, 48].

The graph-based approach, also known as the social context-aware approach, adds auxiliary information about the user or publisher to model the news. CSI [39] is a framework that aims to capture the information of users and their temporal engagements. CSI, however, does not consider publishers, and it also ignores the connection between users and news. Bi-GCN [11] and SAFER [13] use the Graph Convolution Network (GCN) [25] to obtain the news representation with user information. However, they suffer from severe information loss since they present news and user information in a homogeneous graph. In other words, they fail to take the node and relation types into account. Most recently, FANG [32] was proposed to preserve information by dividing the fake news detection task into several sub-tasks, such as textual encoding and stance detection. Nonetheless, dividing the task into sub-tasks causes an error propagation problem: if the sub-tasks have errors, the errors can propagate up to the final news representation and thereby deteriorate the detection performance. AA-HGNN [37] uses adversarial active learning and extends the Graph Attention Network (GAT) [45] to heterogeneous graphs to learn the news representation with limited training data. AA-HGNN, however, does not consider users or their temporal engagement. Table 1 compares Hetero-SCAN with existing fake news detection methods.

Graph Neural Networks, the extension of deep learning methods to graphs, have shown their effectiveness on graph-represented data. The first method proposed is the Graph Convolutional Network (GCN) [25], which aggregates the features from the adjacent nodes in the graph. To further improve it, some methods adopt the attention mechanism and a random walk with restart sampling strategy, namely Graph Attention Network (GAT) [45] and GraphSAGE [22].

As these methods are designed for homogeneous graphs, they are not general enough to apply to heterogeneous graphs, so new approaches tailored to heterogeneous graphs were then proposed. To model the multiple relations in the graph, the Relation aware GCN (R-GCN) [40] was proposed first. HetGNN [51] uses a sampling strategy based on random walk with restart and a Bi-LSTM to aggregate the node features in the heterogeneous graph. Later, methods based on the Meta-Path and the attention mechanism, such as HAN [23] and MAGNN [19], were proposed.

Despite the amount of research done, the term fake news was only recently defined, in the work of Zhou and Zafarani [52]. They define fake news in two scopes, broad and narrow. The broad definition emphasizes the authenticity of the information, and the narrow one emphasizes the intentions of the author. Most research on fake news detection has employed the broad definition of fake news. We experiment on two datasets (with and without intention) following the broad definition and analyze how intention affects detection performance (Section 5.3).

A heterogeneous graph is defined as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ associated with a node type mapping function $\phi: \mathcal{V} \to \mathcal{A}$ and an edge type mapping function $\psi: \mathcal{E} \to \mathcal{R}$. $\mathcal{A}$ and $\mathcal{R}$ denote the predefined sets of node types and edge types, respectively, with $|\mathcal{A}| + |\mathcal{R}| > 2$.

Definition 3.5 (Meta-Path Instance). Given a Meta-Path $P$ of a heterogeneous graph, a Meta-Path instance $p$ of $P$ is defined as a node sequence in the graph following the schema defined by $P$.

To integrate multi-level social context information, we build a heterogeneous graph of news (Figure 2). The graph consists of three types of nodes (publisher, news, and user) and four types of edges (citation, publication, tweet, and following). Formally, the heterogeneous graph of news is denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, and the set of three node types is $\mathcal{A} = \{\text{publisher}, \text{news}, \text{user}\}$.
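As an illustration only, the typed graph described above could be held in a structure like the following minimal sketch. The class name `HeteroGraph`, the node IDs, and the edge directions are assumptions made for this example; only the three node types and four edge types come from the text.

```python
from collections import defaultdict

# Node and edge types named in the text; the concrete IDs below are made up.
NODE_TYPES = {"publisher", "news", "user"}
EDGE_TYPES = {"citation", "publication", "tweet", "following"}


class HeteroGraph:
    """Minimal typed graph: every node and every edge carries a type label."""

    def __init__(self):
        self.node_type = {}             # node id -> node type
        self.edges = defaultdict(list)  # edge type -> list of (src, dst)

    def add_node(self, node_id, ntype):
        assert ntype in NODE_TYPES
        self.node_type[node_id] = ntype

    def add_edge(self, src, dst, etype):
        assert etype in EDGE_TYPES
        self.edges[etype].append((src, dst))

    def is_heterogeneous(self):
        """Check the condition |A| + |R| > 2 on the types actually in use."""
        used_node_types = set(self.node_type.values())
        used_edge_types = {t for t, pairs in self.edges.items() if pairs}
        return len(used_node_types) + len(used_edge_types) > 2


g = HeteroGraph()
g.add_node("pub1", "publisher")
g.add_node("news1", "news")
g.add_node("news2", "news")
g.add_node("userA", "user")
g.add_edge("pub1", "news1", "publication")
g.add_edge("pub1", "news2", "publication")
g.add_edge("userA", "news2", "tweet")
```

Keeping edges grouped by type makes it cheap to pull out the publisher-news or user-news sub-graph later.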

Before utilizing this heterogeneous graph, it is necessary to construct initial node features for the three types of nodes in the graph. For news nodes, Doc2Vec [28] is applied to the news article to construct their initial features. The user and publisher nodes, however, need additional information to construct their respective initial features. Users' profiles are used for user nodes, since Shu et al. [43] have shown the importance of user profiles for detecting news authenticity. The distinct feature of each publisher is acquired from the about-us page on its official website; if there is no about-us page on the publisher's official website, we use Wikipedia's description instead. Doc2Vec is applied again to leverage these text contents. To also include the structural role that users and publishers play in their respective networks, we apply Node2Vec [21] to capture user connections and citations among publishers as features. By concatenating the two vectors obtained from Doc2Vec and Node2Vec, we construct the initial features of user and publisher nodes. Figure 2 shows the overall node feature construction process.

After constructing the initial node features, we need to learn a news representation containing multi-level social context and temporal information. Multi-level social context information should be used without loss, which is the first difficulty in social context-aware fake news detection. To address this difficulty, we use the concept of the Meta-Path (defined in Section 3). Meta-Paths can be used to extract meaningful social context with respect to publishers and users. We define two Meta-Paths that reflect the method used for actual news verification. When people verify news authenticity, they need to cross-check both the publisher and the other news published by this publisher. The same goes for users: user information, as well as the news tweeted by the user, needs to be reviewed. From these two intuitions, the set of Meta-Paths $\mathcal{P}$ that we find useful is defined as below:

$$\mathcal{P} = \{P_{pub}, P_{user}\},$$

where $P_{pub}$: news → publisher → news and $P_{user}$: news → user → news.

After defining the set of Meta-Paths, we extract the Meta-Path instances following each Meta-Path, $P_{pub}$ (news → publisher → news) or $P_{user}$ (news → user → news), for each target news node. To extract Meta-Path instances efficiently, we first divide the whole graph into two sub-graphs, each containing only the node types specified in one Meta-Path. Then, in each sub-graph, the Meta-Path instances following the corresponding Meta-Path are extracted. The corresponding collections of features are fed into Hetero-SCAN to get the final representation of the target news node. For a target node $v$, the sets of instances following $P_{pub}$ and $P_{user}$ are denoted as $P^{v}_{pub}$ and $P^{v}_{user}$, respectively.

For instance, to extract the Meta-Path instances of target news node 2 in Figure 3, we first divide the whole graph into two sub-graphs: one composed of news and publisher nodes, and the other of news and user nodes. Then, the Meta-Path instances following $P_{pub}$ or $P_{user}$ are selected from each sub-graph, and the features of the nodes along these instances are prepared for our model. In particular, Meta-Path instance 1 is made of the features of the nodes following $P_{pub}$: news → publisher → news, namely news node 1, publisher node 1, and news node 2 in the graph. Instances 2, 3, and 4 are extracted in the same manner. In this case, $P^{v}_{pub} = \{1, 2\}$ and $P^{v}_{user} = \{3, 4\}$ are the sets of Meta-Path instances of target news node 2.
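The extraction step can be sketched as follows. This is a toy illustration rather than the authors' implementation: the edge lists, node IDs, and the function name `metapath_instances` are hypothetical, and the choice to skip instances that return to the target news itself is an assumption.

```python
from collections import defaultdict

# Hypothetical toy sub-graphs: (publisher, news) and (user, news) pairs.
publication = [("pub1", "news1"), ("pub1", "news2"), ("pub1", "news3")]
tweet = [("userA", "news2"), ("userA", "news4"), ("userB", "news2")]


def metapath_instances(edges, target):
    """Return (target, middle, other) triples for news -> middle -> news."""
    news_of = defaultdict(list)    # middle node -> news linked to it
    middle_of = defaultdict(list)  # news -> middle nodes linked to it
    for middle, news in edges:
        news_of[middle].append(news)
        middle_of[news].append(middle)
    instances = []
    for middle in middle_of[target]:
        for other in news_of[middle]:
            if other != target:    # assumption: ignore paths back to the target
                instances.append((target, middle, other))
    return instances


# One call per sub-graph, mirroring the two Meta-Paths.
pub_instances = metapath_instances(publication, "news2")
user_instances = metapath_instances(tweet, "news2")
```

Each sub-graph is scanned once to build the two lookup tables, after which the instances of any target news node can be enumerated directly.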

There are usually a large number of users engaged per news in the real world. To cope with this situation, we extract Meta-Path instances from our heterogeneous graph of news with random sampling. Specifically, a certain number of Meta-Path instances are randomly sampled for each news node according to a pre-defined Meta-Path. Finally, to capture the temporal information, the model should be aware of the chronological order of the Meta-Path instances. Thus, the Meta-Path instances from the user Meta-Path $P_{user}$ are sorted chronologically before being fed into the proposed model. In the following sections, we assume that the Meta-Path instances from $P_{user}$ are sorted in chronological order.
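A minimal sketch of the sampling and ordering step is given below. The record layout, the `time` field, the sample size `k`, and the function name `sample_and_sort` are assumptions for illustration; the text does not specify them.

```python
import random

# Hypothetical user-path instances with an engagement timestamp (seconds).
instances = [
    {"user": "userA", "other_news": "news4", "time": 300},
    {"user": "userB", "other_news": "news7", "time": 100},
    {"user": "userC", "other_news": "news5", "time": 200},
    {"user": "userD", "other_news": "news9", "time": 50},
]


def sample_and_sort(instances, k, seed=0):
    """Randomly keep at most k instances, then order them by engagement time."""
    rng = random.Random(seed)
    chosen = instances if len(instances) <= k else rng.sample(instances, k)
    return sorted(chosen, key=lambda inst: inst["time"])


sequence = sample_and_sort(instances, k=3)
```

Sorting after sampling keeps the cost proportional to the sample size rather than to the full number of engaged users.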

Hetero-SCAN takes the vectors from the previous step as input and processes them in four steps, as shown in Figure 4, to tackle the difficulties that remain unaddressed.

Node Feature Transformation. The initial node features have different dimensions since different sources and techniques are used in the feature engineering process (Section 4.1). To make them lie in the same latent space, we apply a type-specific linear transformation to the features of each type of node. Type-specific transformation refers to the linear projection of a vector into another dimension for each type of node in the graph. The transformed feature for a node $v \in \mathcal{V}$ of type $A \in \mathcal{A}$ is:

$$h_v = W_A \cdot x_v,$$

where $x_v \in \mathbb{R}^{d_A}$ is the initial feature of node $v$, and $W_A \in \mathbb{R}^{d' \times d_A}$ is the learnable type-specific weight matrix for node type $A$.
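The projection can be illustrated with a small pure-Python sketch. The input sizes, the latent size of 4, and the random matrices standing in for learned weights are all assumptions for this example.

```python
import random


def init_matrix(rows, cols, rng):
    """Small random weights standing in for a learned matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Compute W @ x for a list-of-lists matrix and a list vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


rng = random.Random(0)
d_out = 4                                           # shared latent size d'
input_dim = {"news": 6, "user": 5, "publisher": 3}  # hypothetical raw sizes
W = {t: init_matrix(d_out, d, rng) for t, d in input_dim.items()}


def transform(x, node_type):
    """Project a raw feature vector with the matrix of its node type."""
    assert len(x) == input_dim[node_type]
    return matvec(W[node_type], x)


h_news = transform([1.0] * 6, "news")
h_user = transform([0.5] * 5, "user")
```

After this step, vectors from all three node types have the same length and can be combined along a Meta-Path instance.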

Meta-Path Instance Encoding. The first step transforms all node features into the same dimension. We then need to efficiently summarize the Meta-Path instances for the remaining aggregation steps, which is important for capturing temporal information and learning the representation end-to-end. To encode node features efficiently, we adopt methods that show excellent performance in knowledge graph triple embedding [17, 44, 49]. The major advantage of using knowledge graph triple embedding is the structural similarity between knowledge graph triples and our Meta-Paths. In a knowledge graph, a triple usually refers to the subject, predicate, and object $(s, r, o)$. The Meta-Path we defined is similar to a knowledge graph triple in the sense that it has the same format with one more entity and relation. Formally,

$$\text{Triple: } (e_s, e_r, e_o), \qquad \text{Meta-Path: } (h_v, r, h_u, r^{-1}, h_w), \tag{3}$$

where $v$ is the target node, and $u$ and $w$ refer to the nodes along the Meta-Path. Considering the Meta-Paths defined in Section 4.2, $v$ and $w$ are news nodes, and $u$ is either a publisher node or a user node. $r$ and $r^{-1}$ are the relations between $v$ and $u$, and between $u$ and $w$, respectively. $h$ is the transformed embedding of a node as stated in Section 4.3.1, and $e$ is the embedding of an element of the knowledge graph triple.

Several studies in the knowledge graph domain tackle the triple embedding problem [17, 44, 49]. We use TransE [49] as the main encoding method for the proposed model. TransE represents relations as translations, so the object vector $e_o$ in a triple is considered a translation of the subject vector $e_s$ by the predicate vector $e_r$. Besides TransE, the RotatE [44] and ConvE [17] knowledge graph embedding methods are also examined in our work. An ablation study on different knowledge graph triple embedding methods and their descriptions are provided in the Appendix.

In a knowledge graph, there are usually explicit features for predicates ($e_r$ in Equation 3). In our case, there are no explicit features for the relations ($r$ in Equation 3), so we use learnable embedding vectors to represent them. Inverse relationships are represented by taking the sign inverse: if $r$ is the embedding of a relation, its inverse relation is represented by $-r$.

The existing knowledge graph triple embedding methods explained above are designed for two nodes and the relation between them. A Meta-Path instance, however, has a total of three nodes and two relations. We deal with this by slightly adapting the formulation to our needs. The original formulations of the knowledge graph triple embedding methods and ours are summarized in Table 2. In this table, $\bar{h}$ denotes the reshaping of vector $h$ into a 2D form, and ⊙ and ∥ represent the element-wise product and vector concatenation, respectively.
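To make the idea of a translation-based encoding concrete, the sketch below shows one plausible way to encode a three-node instance. It is an assumption for illustration and not the formulation of Table 2: each hop is encoded as a TransE-style residual (subject + relation - object), the inverse relation is the sign-flipped embedding, and the two hop residuals are concatenated. The names `translate_residual` and `encode_instance` are hypothetical.

```python
def translate_residual(subj, rel, obj):
    """TransE-style residual subj + rel - obj (zero when the triple fits)."""
    return [s + r - o for s, r, o in zip(subj, rel, obj)]


def encode_instance(h_v, h_u, h_w, rel):
    """Encode v -rel-> u -(-rel)-> w as two concatenated hop residuals."""
    inv_rel = [-r for r in rel]          # inverse relation by sign flip
    first_hop = translate_residual(h_v, rel, h_u)
    second_hop = translate_residual(h_u, inv_rel, h_w)
    return first_hop + second_hop        # list concatenation, size 2 * d'


rel = [0.5, -0.5]                        # stands in for a learned embedding
h_v, h_u, h_w = [1.0, 2.0], [1.5, 1.5], [1.0, 2.0]
encoded = encode_instance(h_v, h_u, h_w, rel)
```

In this toy case both hops fit the translation exactly, so every entry of `encoded` is zero; the output length is twice the node embedding size.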

Meta-Path Instance Aggregation. The encoded vectors from two different Meta-Paths are aggregated by using different methods.

The encoded vectors from Meta-Path $P_{pub}$: news → publisher → news contain information about other news from the same publisher. Among the news published by the publisher, not all will contain valuable information for detection. Thus, the model should 'focus' on some of the news published by this publisher and include this information in the aggregated representation. For each Meta-Path instance $p \in P^{v}_{pub}$:

$$e_p = a^{\top} h_p, \qquad \alpha_p = \frac{\exp(e_p)}{\sum_{p' \in P^{v}_{pub}} \exp(e_{p'})},$$

$$h^{v}_{P_{pub}} = \sigma\Big(\sum_{p \in P^{v}_{pub}} \alpha_p \cdot h_p\Big), \tag{6}$$

where $e_p$ is the attention value calculated by multiplying the encoded Meta-Path instance $h_p$ with the attention vector $a \in \mathbb{R}^{2d'}$. It is normalized by a softmax function over all Meta-Path instances of the target node $v$, and the result is denoted as $\alpha_p$. To alleviate the effect of the high variance of the data in a heterogeneous graph, we adopt a multi-head attention mechanism. $K$ independent attention mechanisms execute the transformation shown in Equation 6, and their features are concatenated after they pass the activation function $\sigma$. The output feature representation can be formulated as:

$$h^{v}_{P_{pub}} = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{p \in P^{v}_{pub}} [\alpha_p]_k \cdot h_p\Big),$$

where $[\alpha_p]_k$ is the normalized attention value of Meta-Path instance $p$ of target node $v$ at the $k$-th attention head.
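The attention-based aggregation can be sketched in a few lines of pure Python. The attention vectors below are fixed stand-ins for learned parameters, `tanh` is an assumed choice for the activation function, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(instances, a):
    """One head: score each encoded instance with a, then take a weighted sum."""
    scores = [sum(ai * hi for ai, hi in zip(a, h)) for h in instances]
    alphas = softmax(scores)
    dim = len(instances[0])
    pooled = [sum(al * h[i] for al, h in zip(alphas, instances)) for i in range(dim)]
    return [math.tanh(x) for x in pooled], alphas  # tanh is an assumed activation


def multi_head(instances, heads):
    """Concatenate the outputs of several independent attention heads."""
    out = []
    for a in heads:
        pooled, _ = attend(instances, a)
        out.extend(pooled)
    return out


encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoded instances
heads = [[1.0, 0.0], [0.0, 1.0]]                # stand-ins for learned vectors
_, alphas = attend(encoded, heads[0])
h_pub = multi_head(encoded, heads)
```

With the first head, the first and third instances receive the same score and therefore the same weight, while the second instance is down-weighted.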

Temporal information of user engagement is another critical feature for determining the veracity of the given news, and incorporating this information is the second difficulty to resolve. To capture the temporal information, we aggregate the Meta-Path instances following $P_{user}$: news → user → news through a Recurrent Neural Network (RNN). Since the Meta-Path instances are already encoded in the previous step, we can feed them directly into the RNN. There are usually a large number of users engaged per news, so we choose GRU [14] as our RNN unit to mitigate the vanishing or exploding gradient problem.

The last hidden state of the GRU is used for the downstream task as it is the high-level representation that summarizes the temporal information of the user engagement.
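As a rough sketch of this aggregation, the following pure-Python GRU cell consumes a time-ordered sequence of encoded instances and returns its last hidden state. The weights are random stand-ins for learned parameters, bias terms are omitted for brevity, and the update uses the convention h = (1 - z) * h_prev + z * candidate; the class name `GRUCell` and the dimensions are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Plain GRU cell without bias terms; weights are random stand-ins."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate (z, r, h).
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], gated))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def last_hidden(self, sequence):
        """Run over a time-ordered sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


cell = GRUCell(input_dim=2, hidden_dim=3)
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # already sorted by time
h_user = cell.last_hidden(sequence)
```

Because the hidden state is updated one instance at a time, the result depends on the order of the sequence, which is why the instances are sorted chronologically beforehand.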

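Two vectors from the previous step, $h^{v}_{P_{pub}}$ and $h^{v}_{P_{user}}$, represent two different aspects of the news. The final news representation is produced by fusing these two vectors, which enables us to learn the news representation end-to-end (the third difficulty). As the two Meta-Paths show two different aspects of a given news, the model should be able to weigh the importance of the two aspects differently for different news. To this end, we adopt another attention mechanism. Before applying the attention mechanism, non-linear transformations are applied to summarize $h^{v}_{P_{pub}}$ and $h^{v}_{P_{user}}$. Thus for $P \in \{P_{pub}, P_{user}\}$:

$$s_P = \frac{1}{|\mathcal{V}_n|} \sum_{v \in \mathcal{V}_n} \tanh\big(M \cdot h^{v}_{P} + b\big).$$

Here, $M \in \mathbb{R}^{d_m \times d'}$ and $b \in \mathbb{R}^{d_m}$ are a learnable weight matrix and bias vector, and $\mathcal{V}_n$ is the set of news nodes.

Then we apply the attention mechanism to aggregate the two vectors and obtain our final news representation $h^{\mathcal{P}}_v$:

$$e_P = q^{\top} s_P, \qquad \beta_P = \frac{\exp(e_P)}{\sum_{P' \in \{P_{pub}, P_{user}\}} \exp(e_{P'})}, \qquad h^{\mathcal{P}}_v = \sum_{P \in \{P_{pub}, P_{user}\}} \beta_P \cdot h^{v}_{P},$$

where $q \in \mathbb{R}^{d_m}$ is the attention vector and $\beta_P$ is the normalized importance of Meta-Path $P$.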
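The fusion of the two Meta-Path views can be sketched as follows. This assumes a summary that averages a tanh-transformed projection over all news nodes before scoring, which is one reading of the description above; the toy vectors, the identity matrix `M`, and the function names are hypothetical.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def summarize(vectors, M, b):
    """Average tanh(M h + b) over all news nodes for one Meta-Path."""
    dim = len(b)
    total = [0.0] * dim
    for h in vectors:
        proj = matvec(M, h)
        for i in range(dim):
            total[i] += math.tanh(proj[i] + b[i])
    return [t / len(vectors) for t in total]


def fuse(h_pub, h_user, M, b, q):
    """Weigh the two Meta-Path views with softmax-normalized importance."""
    scores = []
    for vectors in (h_pub, h_user):
        s = summarize(vectors, M, b)
        scores.append(sum(qi * si for qi, si in zip(q, s)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    beta = [e / sum(exps) for e in exps]
    fused = [
        [beta[0] * p + beta[1] * u for p, u in zip(hp, hu)]
        for hp, hu in zip(h_pub, h_user)
    ]
    return fused, beta


# Toy values: two news nodes, 2-dimensional views, 2-dimensional attention space.
h_pub = [[1.0, 0.0], [0.5, 0.5]]
h_user = [[0.0, 1.0], [0.2, 0.8]]
M = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, -1.0]
news_repr, beta = fuse(h_pub, h_user, M, b, q)
```

Here `beta` holds one weight per Meta-Path, and each row of `news_repr` is the weighted combination of the two views for one news node.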

The final representation of the target news node is passed to the classification layer to obtain the classification result. During training, the predictions and labels are used to calculate the loss, and the learnable parameters of the model are updated by backpropagation. The loss function used in Hetero-SCAN is the cross-entropy loss.
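For reference, a minimal sketch of a two-class cross-entropy loss is shown below. The softmax output layer, the mean reduction over the batch, and the label encoding (1 for real, 0 for fake) are assumptions made for this example.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Mean negative log-likelihood of the true class over a batch."""
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        loss -= math.log(probs[y])
    return loss / len(labels)


# Assumed encoding: class index 1 = real, class index 0 = fake.
logits = [[2.0, -1.0], [0.0, 0.0], [-1.5, 1.0]]
labels = [0, 1, 1]
loss = cross_entropy(logits, labels)
```

A prediction with equal scores for both classes contributes exactly log 2 to the sum, which is a handy sanity check for an untrained classifier.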

The overall learning algorithm is summarized in Algorithm 1 (Appendix).

To test the effectiveness of our method, we conducted our experiments with two real-world datasets: FANG [32] and FakeHealth [16] .

The FANG dataset was composed in a study by Nguyen et al. [32] based on the datasets collected by related work on rumor and news classification [26, 29, 41]. The original news content was obtained through the provided news URLs; for the 100 news URLs whose content was not available, the content was retrieved by manually searching for the news title. From the provided tweet IDs, users and their profiles on Twitter were found through the Twitter API [8]. The labels of the news in FANG are obtained from two well-known fact-checking websites: Snopes [7] and PolitiFact [4].

FakeHealth is another publicly available benchmark dataset for fake news detection, mainly focused on the healthcare domain. The dataset consists of two subsets, HealthStory and HealthRelease; HealthStory was used in our study because the number of news articles in HealthRelease is too small. HealthStory is collected from the healthcare information review website HealthNewsReviews [2]. On this website, professional reviewers gave each news article a score from 1 to 5. Following the original study that published the FakeHealth dataset, an article is considered fake if its score is less than three and real otherwise. The detailed statistics of the datasets used in our experiments are listed in Table 4.

In each dataset, we used 70% of the news articles as the training set, and the remaining 30% were divided equally into validation and test sets. For the hyper-parameters, the transformed hidden dimension and the learning rate are set to 512 and 0.0001, respectively. An early-stopping training strategy with patience 20 is adopted to avoid overfitting. Since fake news detection is a binary classification problem, the real class was treated as positive and the fake class as negative.
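The labeling rule, the data split, and the early-stopping rule described above can be sketched as follows. The shuffling seed, the absence of stratification, and the helper names are assumptions; only the score threshold, the 70/15/15 proportions, and the patience value come from the text.

```python
import random


def label_from_score(score):
    """HealthStory rule from the text: below three is fake, otherwise real."""
    return "fake" if score < 3 else "real"


def split_dataset(items, seed=0):
    """Shuffle, then cut into 70% train, 15% validation, 15% test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train, val, test = split_dataset(range(100))
```

In a training loop, `should_stop` would be called once per epoch with the current validation loss.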

We trained Hetero-SCAN by connecting the output representation to a fully connected layer to classify the news. After training, we evaluated our news representation with five classical machine learning baselines, such as Naive Bayes and Logistic Regression. The metrics used for comparison are precision, recall, accuracy, F1 score, and AUC score, and the evaluation results are summarized in Table 3. As shown in Table 3, the trained classification layer gives relatively better results than the other machine learning algorithms in terms of F1 score and accuracy, because the classification layer is optimized with the classification objective (cross-entropy loss). In terms of AUC score, SVM gives a better result, but in terms of standard deviation, random forest generally gives more stable results. Based on this, random forest is chosen as the classification algorithm for the upcoming evaluations. Regardless of the downstream classification method, Hetero-SCAN surpasses existing fake news detection methods (details in Section 5.4). On the HealthStory dataset, Hetero-SCAN does not give an ideal result. The explanation for this result is discussed in the next section.

Wardle et al. [50] published a report on information disorder for the Council of Europe in 2017. The report examines information disorder and its related challenges. The authors argue that much of what is called 'fake news' consists of three concepts: misinformation, disinformation, and malinformation. They point out the importance of distinguishing fake news according to the creators' intention and define the three terms as follows:

Definition 5.1 (Disinformation). Information that is false and deliberately created to harm a person, social group, organization or country.

Definition 5.2 (Misinformation). Information that is false, but not created with the intention of causing harm.

Definition 5.3 (Malinformation). Information that is based on reality, used to inflict harm on a person, organization or country.

According to the definition of malinformation, it is the information based on reality, while the fake news we talk about in this paper is false information. Therefore, we mainly consider disinformation and misinformation here, which are classified according to the news creator's intention. Considering the definition of fake news given in Section 3, the narrow definition of fake news only covers disinformation, but the broad definition of fake news covers both disinformation and misinformation.

The FANG dataset is mainly composed of checked news from PolitiFact and Snopes, which are politics-related fact-checking websites. Thus, the fake news in this dataset is either partisan-biased news or false information meant to demean certain politicians, both of which are considered information intended to harm a specific person or organization. Hence, the fake news in this dataset can be considered disinformation. The news in HealthStory is collected and fact-checked by Health News Review, which evaluates and rates the completeness, accuracy, and balance of news stories that include claims about medical treatments, health care journalism, etc. Most of this information is not spread deliberately to harm anyone, so the fake news in the HealthStory dataset can be regarded as misinformation.

Figure 5 compares the number of engaged users over time to show how people react to disinformation, misinformation, and real news. As shown in Figure 5, the disinformation (fake, left panel) has many periodic spikes, which means that users periodically talk about disinformation. In contrast, the misinformation (fake, right panel) does not have any periodic spikes and converges to zero not long after the news is published, which is similar to real news. As such, disinformation behaves significantly differently from real news, but misinformation behaves in a similar manner to real news.

To see the impact of temporal information in Hetero-SCAN, we replace the RNN in Hetero-SCAN with an attention mechanism. In other words, we check the difference in detection performance between Hetero-SCAN with and without temporal information. We set the hyperparameters the same for both approaches for a fair comparison, with Random Forest chosen as the classification algorithm. The evaluation results on the datasets can be found in Table 5.

The results show that the RNN-based approach performs better than the attention-based one on the FANG dataset, but on the HealthStory dataset, the performance is better when attention is applied. This suggests that temporal information is not helpful in detecting misinformation. Furthermore, on the FANG dataset, the validation loss of Hetero-SCAN with RNN converges much faster than that of the variant with the attention mechanism; by contrast, the convergence speed of the two approaches is similar on the HealthStory dataset (see Figure 6 in the Appendix).

To sum up, on a dataset that exhibits a temporal behavior difference between the real and fake classes (i.e., a disinformation dataset), Hetero-SCAN with RNN not only improves fake news detection performance but also accelerates learning.

To show that Hetero-SCAN is superior to other fake news detection methods, we compared it with existing methods. The benchmarked detection methods can be categorized into text-based and graph-based approaches. For the text-based approach, we use three different document embedding methods, TF-IDF, LIWC [33], and Doc2Vec [28], combined with SVM as baselines. Several representative graph-based fake news detection frameworks [13, 32, 37, 39] are also compared in this experiment.

Hetero-SCAN is also compared with several Graph Neural Network (GNN) methods to show that it is better than simply applying a GNN to the graph. The basic GNN methods [22, 25, 45], as well as the methods tailored to heterogeneous graphs [23, 40], are compared. Brief descriptions of the aforementioned fake news detection methods and GNN baselines are listed in the Appendix.

The results in Table 6 indicate that Hetero-SCAN outperforms existing text-based and graph-based fake news detection methods. This is because these existing approaches cannot produce a representation with rich social context and temporal information as Hetero-SCAN does, i.e., they fail to tackle all three difficulties. CSI and SAFER, for example, do not use multi-level social context, and they also incur some information loss as they ignore the node and relation types. AA-HGNN and SAFER miss temporal information in the news representation. AA-HGNN also does not use users as social context. FANG performs better than these methods since it tries to preserve multi-level social context and temporal information. To preserve information, FANG divides the fake news detection task into several sub-tasks, and each sub-task deals with certain information. Dividing the task into several sub-tasks is ineffective because errors in a sub-task are propagated up to the final news representation and thus harm the detection performance. As such, the result emphasizes the importance of resolving the three proposed difficulties in fake news detection.

For the GNN baselines, the graph embedding methods made for homogeneous graphs, such as GCN, GAT, and GraphSAGE, did not give ideal results since node types and relations are ignored in these cases. R-GCN and HAN, which are designed for heterogeneous graphs, also show no significant improvement, which implies that Hetero-SCAN is better than a simple application of these graph embedding methods to the heterogeneous graph of news. The failure of the GNN baselines that target heterogeneous graphs can be attributed to the missing temporal information of user engagement, which is the second difficulty that needs to be resolved in social context-aware fake news detection.

Fake news datasets normally have limited training data because labeling requires a large amount of human labor, so a model should work well with few training samples. To show that Hetero-SCAN outperforms existing methods when training data is scarce, we gradually enlarge the training set from 10% to 90% and compare the fake news detection results with those of existing methods. Table 7 shows the comparison. The AUC score of Hetero-SCAN exceeds 0.8 with only 30% of the training data and even outperforms the other methods trained with 90% of the training data. AA-HGNN is designed to overcome the scarcity of training data in the fake news detection task, but Hetero-SCAN still performs better than AA-HGNN even when the training set is small.
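The protocol above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the helper names `subsample` and `auc`, the seed, and the toy set of 100 article ids are all assumptions, and the training step itself is left as a comment.

```python
import random

def subsample(train_ids, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the training ids."""
    rng = random.Random(seed)
    k = max(1, round(len(train_ids) * fraction))
    return sorted(rng.sample(train_ids, k))

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train_ids = list(range(100))  # hypothetical article ids
for fraction in (0.1, 0.3, 0.5, 0.7, 0.9):
    subset = subsample(train_ids, fraction)
    # A real run would train a detector on `subset` here and report
    # auc(test_labels, test_scores) on a fixed held-out test set.
    print(f"{int(fraction * 100)}% -> {len(subset)} training articles")
```

Keeping the test set fixed while only the training subset grows is what makes the AUC values at different fractions comparable.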

A deep-learning-based approach that deals with graph-structured data should generalize well enough to produce useful predictions for unseen data. A method is an inductive approach if it can generate embeddings for nodes that were not seen during training. In contrast, it is called a transductive approach if it cannot generate embeddings for nodes that appear for the first time in the testing phase. For example, GCN is inductive, whereas Node2Vec is transductive.

In graph-based fake news detection, unseen nodes can appear in the testing phase. They might be newly published news, new publishers, or new users. Some approaches that use matrix decomposition [39, 42] are not able to generate embeddings for newly published news with social context information. In Hetero-SCAN, however, the learnable parameters are applied after Meta-Path extraction with random sampling, and they are shared by all nodes. Therefore, our method is highly inductive; that is, Hetero-SCAN can generate embeddings for news that was not seen during training.
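The difference between the two settings can be illustrated with a toy example. The sketch below is not the Hetero-SCAN architecture: the lookup table, the shared weight matrix `W`, the function `embed`, and all numeric values are hypothetical and serve only to show why parameters shared across nodes allow embeddings for unseen news.

```python
# Transductive: one free embedding per training node, stored in a lookup table.
lookup = {"news_1": [0.2, 0.7], "news_2": [0.9, 0.1]}

# Inductive: a single weight matrix shared by every node (illustrative values).
W = [[0.5, -0.5], [0.25, 0.75]]

def mean(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def embed(neighbor_features):
    """Average the sampled neighbors' features, then apply the shared weights."""
    m = mean(neighbor_features)
    return [sum(w * x for w, x in zip(row, m)) for row in W]

# A news article that first appears at test time, with two sampled neighbors.
unseen_neighbors = [[1.0, 0.0], [0.0, 1.0]]
print(embed(unseen_neighbors))   # depends only on W and the neighbor features
print(lookup.get("news_3"))      # None: the table has no row for the unseen article
```

The shared-weight function needs only the features of the sampled neighbors, so it applies to any node, whereas the lookup table has to be retrained before it can return anything for `news_3`.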

A single news article is typically engaged with by a large number of users. Using every user's information as a feature is therefore impractical, so we used simple random sampling to select a fixed number of users. A better method for selecting important users is needed to overcome this limitation. In addition, to apply the proposed method, we must first identify the tweets relevant to a particular news article. Since this paper focuses on classifying news when the news and its related tweets are already given, finding the relevant tweets for a particular news article is left as future work.
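Simple random sampling of engaged users can be sketched as below. The function name `sample_engagements`, the `(user_id, timestamp)` record layout, and the choice to return the sample in chronological order (so that a sequence model could consume it) are assumptions made for illustration, not details taken from the paper.

```python
import random

def sample_engagements(engagements, k, seed=None):
    """Pick at most k engagements uniformly at random, returned in time order.

    `engagements` is a list of (user_id, timestamp) pairs.
    """
    rng = random.Random(seed)
    if len(engagements) <= k:
        chosen = list(engagements)          # nothing to drop
    else:
        chosen = rng.sample(engagements, k)  # uniform, without replacement
    return sorted(chosen, key=lambda e: e[1])

# Fifty hypothetical engagements with increasing timestamps.
engagements = [("u%d" % i, 1000 + 7 * i) for i in range(50)]
picked = sample_engagements(engagements, k=5, seed=42)
print(picked)
```

Uniform sampling treats every user as equally informative, which is the limitation noted above; a weighted scheme would replace the `rng.sample` call.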

Fake news is a critical social problem that threatens many aspects of the lives of the general public. We pose three difficulties in social context-aware fake news detection and address them by proposing a novel fake news detection framework, Hetero-SCAN. Our model overcomes the shortcomings of previous graph-based approaches and exhibits state-of-the-art performance. We also provide insight into misinformation and disinformation by clarifying their different propagation properties. Hetero-SCAN can aid future studies not only on fake news detection but also on various events involving disinformation.

• FANG [32]: FANG divides the detection task into several sub-tasks, such as textual encoding and stance detection. The final detection objective is optimized by defining loss functions for those sub-tasks.
• AA-HGNN [37]: AA-HGNN uses active learning to tackle the limited training data problem and extends GAT [45] to learn the news representation in the graph.

GNN baselines:

• GCN [25]: GCN is a deep learning based method on graph-structured data. Each node is learned by aggregating the feature information from its neighbors and the feature of itself.
• GAT [45]: GAT is similar to GCN, but it introduces the attention mechanism to replace the statically normalized convolution operation in GCN.
• GraphSAGE [22]: GraphSAGE is a general inductive framework that learns a node representation by sampling its neighbors and aggregating features of sampled nodes.
• R-GCN [40]: R-GCN is an application of the GCN framework for modeling relational data. In R-GCN, edges can represent different relations.
• HAN [47]: HAN is an extension of GAT on the heterogeneous graph. Meta-Path extraction strategy and attention mechanism are adopted to learn the representation of a node.

In Section 5.3, to see the impact of temporal information in Hetero-SCAN, we replace the RNN with an attention mechanism. In other words, we compare Hetero-SCAN trained with and without temporal information. The two variants are trained on two datasets, and the corresponding validation loss during training is shown in Figure 6. On the FANG dataset, Hetero-SCAN trained with temporal information converges faster than Hetero-SCAN trained without it; on the HealthStory dataset, however, the two models show no significant difference. Considering that fake news in the FANG dataset is disinformation and fake news in HealthStory is misinformation, temporal information can accelerate the convergence of training when identifying disinformation.

To show the performance differences when different knowledge triple embedding methods are applied, the F1 score, accuracy, and AUC score were measured on two datasets: FANG and HealthStory. Table 9 indicates that TransE gives better results than the others. A likely reason is that TransE requires fewer parameters and operations than RotatE and ConvE. With limited training data, complex models are prone to over-fitting, which degrades performance.
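For reference, TransE models a relation as a translation in the embedding space, so a triple (head, relation, tail) is plausible when head + relation lies close to tail. The sketch below shows only this standard scoring function; the entity names, the two-dimensional vectors, and the function name `transe_score` are hypothetical, and how the score is integrated into Hetero-SCAN is not shown.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

publisher = [0.5, 1.0]
publishes = [0.25, -0.5]   # hypothetical relation vector
news_a = [0.75, 0.5]       # equals publisher + publishes, so the triple fits
news_b = [-1.0, 2.0]       # far from publisher + publishes

print(transe_score(publisher, publishes, news_a))
print(transe_score(publisher, publishes, news_b))
```

The score involves only one vector per entity and one per relation combined by addition and a norm, which is consistent with the observation above that TransE needs few parameters and operations.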

To show that the news representation produced by Hetero-SCAN is better than that of existing methods, t-SNE was adopted to visualize the news representations in a two-dimensional plane (Figure 7). The t-SNE technique is a well-known method for visualizing high-dimensional data in a two-dimensional plane [30]. As can be seen in Figure 7, the representations of Hetero-SCAN are clustered more tightly than those of the other methods, implying a significant improvement over existing methods.

Coronavirus: The viral rumours that were completely wrong

Left bias Publishers checked by MBFC

Questionable Publishers checked by MBFC

Right bias Publishers checked by MBFC

US election 2020: Fact-checking Trump team's main fraud claims

2021. Facebook Media: Working to Stop Misinformation and False News

Rumor detection on social media with bi-directional graph convolutional networks

Information credibility on twitter

Graph-based Modeling of Online Communities for Fake News Detection

Empirical evaluation of gated recurrent neural networks on sequence modeling

Automatic deception detection: methods for finding fake news

Ginger cannot cure cancer: Battling fake health news with a comprehensive data repository

Convolutional 2d knowledge graph embeddings

Syntactic stylometry for deception detection

MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding

Multimodal Fake News Detection with Textual, Visual and Semantic Information

node2vec: Scalable feature learning for networks

Inductive Representation Learning on Large Graphs

Graph neural networks with continual learning for fake news detection from social media

Mvae: Multimodal variational autoencoder for fake news detection

Semi-supervised classification with graph convolutional networks

PHEME dataset for Rumour Detection and Veracity Classification

Prominent features of rumor propagation in online social media

Distributed representations of sentences and documents

Detecting rumors from microblogs with recurrent neural networks

Visualizing data using t-SNE

Efficient estimation of word representations in vector space

FANG: Leveraging social context for fake news detection using graph representation

The development and psychometric properties of LIWC2015

Automatic Detection of Fake News

A Stylometric Inquiry into Hyperpartisan and Fake News

Hierarchical multi-modal contextual attention network for fake news detection

Adversarial active learning based heterogeneous graph neural network for fake news detection

Truth and deception at the rhetorical structure level

Csi: A hybrid deep model for fake news detection

Modeling relational data with graph convolutional networks

FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media

Beyond news contents: The role of social context for fake news detection

The role of user profiles for fake news detection

Rotate: Knowledge graph embedding by relational rotation in complex space

Graph attention networks

Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection

Heterogeneous graph attention network

Eann: Event adversarial neural networks for multi-modal fake news detection

Knowledge graph embedding by translating on hyperplanes

Information disorder: Toward an interdisciplinary framework for research and policy making. Council of

Heterogeneous graph neural network

A survey of fake news: Fundamental theories, detection methods, and opportunities