key: cord-354783-2iqjjema
authors: Wang, Wei; Ma, Yuanhui; Wu, Tao; Dai, Yang; Chen, Xingshu; Braunstein, Lidia A.
title: Containing misinformation spreading in temporal social networks
date: 2019-04-24
journal: Chaos
DOI: 10.1063/1.5114853
sha: 
doc_id: 354783
cord_uid: 2iqjjema

Many researchers from a variety of fields, including computer science, network science, and mathematics, have focused on how to contain outbreaks of Internet misinformation that threaten social systems and undermine societal health. Most research on this topic treats the connections among individuals as static, but these connections change in time, and thus social networks are also temporal networks. Currently there is no theoretical approach to the problem of containing misinformation outbreaks in temporal networks. We thus propose a misinformation spreading model for temporal networks and describe it using a new theoretical approach. We propose a heuristic-containing (HC) strategy, based on optimizing the final outbreak size, that outperforms simpler strategies such as random containing (RC) and targeted containing (TC). We verify the effectiveness of our HC strategy on both artificial and real-world networks by performing extensive numerical simulations and theoretical analyses. We find that the HC strategy greatly increases the outbreak threshold and decreases the final outbreak size.

Many communications platforms, e.g., Twitter, Facebook, email, WhatsApp, and mobile phones, allow numerous ways of sharing information [1] [2] [3] [4] [5] [6]. One task for researchers is developing ways to distinguish between true and false information, i.e., between "news" and "fake news" [7]. This task is important because access to true information is essential for intelligent decision-making [8] [9] [10]. For example, when Severe Acute Respiratory Syndrome (SARS) spread across Guangzhou, China in 2003, the Chinese Southern Weekly published a newspaper article entitled "There is a Fatal Flu in Guangzhou." This information was forwarded over 126 million times by TV news and other newspapers [11, 12]. Individuals receiving this true information could adopt simple, effective protective measures against infection (e.g., staying at home, washing hands, or wearing masks). Misinformation, on the other hand, encourages irrational behavior and reckless decision-making, and its spread can undermine societal well-being and sway the outcome of elections [13] [14] [15] [16]. Bovet and Makse [17] analyzed 171 million tweets sent during the five months prior to the 2016 US presidential election and found that misinformation strongly affected the outcome of that election.

To contain the spread of misinformation we must understand the information spreading mechanisms that facilitate it [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28]. Vosoughi et al. [1] examined true and false information on Twitter from 2006 to 2017 and found that misinformation spreads more quickly than true information. Building on the spreading mechanisms revealed by real-data analysis, researchers have proposed several mathematical models to describe the spreading dynamics of true and false information [29] [30] [31] [32] [33] [34]. Moreno et al. [29] developed mean-field equations describing the spread of classical misinformation on static scale-free networks, which enable a theoretical study that does not require extensive numerical simulations. Borge-Holthoefer and Moreno [35] found that although there are no influential spreaders in the classical misinformation model of Ref. [29], nodes with high k-core and ranking values are more likely to be influential spreaders of true information and of infectious diseases [36] [37] [38] [39] [40]. When the bursty behavior of individuals is included in the misinformation model, hubs emerge as influential nodes [41]. Real-world data show that social networks evolve in time, and thus evolving temporal networks represent the topology of real-world networks more accurately than static networks do [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52].

Researchers have found that the temporal nature of networks strongly affects their spreading dynamics. Perra et al. [53] found that in susceptible-infected-susceptible (SIS) epidemic spreading, a temporal network suppresses spreading more effectively than the corresponding static integrated network. Researchers have also found that the SIS and susceptible-infected-recovered (SIR) models on temporal networks exhibit the same outbreak threshold [54] [55] [56]. Nadini et al. [49] found that tightly connected clusters in temporal networks inhibit SIR processes but accelerate SIS spreading. Pozzana et al. [57] found that when node attractiveness (a node's role as a preferential target for interactions) is heterogeneous in temporal networks, the contagion process is altered. Karsai et al. [58] found that strong ties between individuals strongly inhibit the classical spreading dynamics of misinformation.

Several strategies for containing the spread of misinformation in temporal networks have been proposed [59] [60] [61] [62]. Liu et al. [60] examined epidemic spreading on activity-driven temporal networks and developed mean-field theoretical approaches for three control strategies: random, targeted, and egocentric. The egocentric strategy, which immunizes a randomly selected neighbor of a node in the observation window, is the most effective. Other effective approaches based on extensive numerical simulations have also been proposed [63] [64] [65]. For example, Holme and Liljeros [63] take into account the time variation of nodes and edges and propose a strategy for containing epidemic outbreaks based on the birth and death of links.

Because there is still no theoretical approach to containing the spread of misinformation in temporal networks, we here systematically examine its spread in activity-driven networks. The rest of the paper is organized as follows. Section II describes the misinformation spreading dynamics on temporal networks and develops a theory to describe them. Section III proposes three containment strategies. Section IV describes the results of our extensive stochastic numerical simulations, which show that our theory agrees with the simulations. Section V presents our conclusions.

We here introduce our model for the spreading dynamics of misinformation in temporal networks.

The approaches widely used to describe temporal networks mathematically are either event-based or snapshot representations [47]. The event-based representation describes a temporal network as an ordered sequence of events $\{(u_i, v_i, t_i, \Delta t_i);\ i = 1, 2, \dots\}$, where nodes $u_i$ and $v_i$ are connected at time $t_i$ for a duration $\Delta t_i$. The snapshot representation describes a temporal network as a discrete sequence of static networks $G = \{G(1), G(2), \dots, G(t_{\max})\}$, where $G(t)$ is the snapshot network at time $t$ and $t_{\max}$ is the number of snapshots of the temporal network.

Each snapshot network $G(t)$ contains a fixed number $N$ of nodes and $M_t$ edges, so its average degree is $\langle k \rangle_t = 2M_t/N$. We here adopt the snapshot approach and encode each snapshot by its adjacency matrix $A(t)$. As for static networks, $A_{uv}(t) = 1$ when nodes $u$ and $v$ are connected at time $t$, and $A_{uv}(t) = 0$ otherwise; for undirected temporal networks, $A_{uv}(t) = A_{vu}(t)$. The average degree of node $u$ in the temporal network $G$ is

$$k_u = \frac{1}{t_{\max}} \sum_{t=1}^{t_{\max}} \sum_{v=1}^{N} A_{uv}(t).$$

From the adjacency matrix $A$ we obtain its eigenvalues $\Lambda_1(A) \geq \Lambda_2(A) \geq \cdots \geq \Lambda_N(A)$, listed in decreasing order. The spectral radius is thus $\Lambda_1(A)$, which determines the outbreak threshold of epidemics in temporal networks.
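To make the two definitions above concrete, here is a minimal pure-Python sketch, assuming each snapshot $A(t)$ is given as a dense 0/1 list of lists. The helper names `average_degrees` and `spectral_radius` are hypothetical. Power iteration is run on the shifted matrix $A + \mathbb{1}$ because on bipartite snapshots the unshifted iteration can oscillate between $\pm\Lambda_1$. Since the text does not specify which matrix $A$ is meant, the function can be applied to a single snapshot or to an aggregated matrix.

```python
from typing import List

Matrix = List[List[float]]


def average_degrees(snapshots: List[Matrix]) -> List[float]:
    """k_u = (1 / t_max) * sum_t sum_v A_uv(t) for every node u."""
    t_max = len(snapshots)
    n = len(snapshots[0])
    return [sum(sum(A[u]) for A in snapshots) / t_max for u in range(n)]


def spectral_radius(A: Matrix, iters: int = 10_000, tol: float = 1e-13) -> float:
    """Largest eigenvalue Lambda_1 of a symmetric non-negative matrix.

    Power iteration on the shifted matrix A + I: the shift keeps Lambda_1 + 1
    strictly dominant even when A is bipartite (eigenvalues +/- Lambda_1).
    """
    n = len(A)
    v = [1.0] * n  # positive start vector overlaps the Perron eigenvector
    estimate = 0.0
    for _ in range(iters):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(w)  # entries stay positive, so max is the max-norm
        v = [x / norm for x in w]
        converged = abs(norm - estimate) < tol
        estimate = norm
        if converged:
            break
    return estimate - 1.0  # undo the identity shift
```

For example, for a triangle (zeros on the diagonal, ones elsewhere) `spectral_radius` returns 2.0, and for a three-node path it converges to $\sqrt{2}$.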

We use the classical activity-driven network [53] to model a temporal network with N nodes. We build the activity-driven network using the following steps.

1. We assign to each node $i$ an activity potential $x_i$ drawn from a given probability density $f(x)$. The activity of node $i$ is $a_i = \eta x_i$, where $\eta$ is a rescaling factor; at each time step, node $i$ is active with probability $a_i$. The higher the value of $\eta$, the higher the average degree of the temporal network, and the higher the value of $a_i$, the higher the degree of node $i$. We assume that $f(x)$ follows a power law, $f(x) \sim x^{-\gamma}$, where $\gamma$ is the potential exponent and $x \geq \epsilon$, with $\epsilon$ the minimum value of the activity potential $x_i$. We make this assumption in order to generate a temporal network with a heterogeneous degree distribution.

2. At each time step $t$, each active node creates $m$ edges, each linking it to a randomly chosen network node. An edge thus connects a given pair of nodes with probability $m/N$. Note that in the thermodynamic limit of a sparse temporal network there are neither multiple edges between nodes nor non-local loops.

3. At the end of time step $t$, we delete all edges in network $G(t)$.

4. We repeat steps (2) and (3) until $t_{\max}$ in order to generate the temporal network $G$.
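The four construction steps can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: activity potentials are drawn from $f(x) \propto x^{-\gamma}$ on $[\epsilon, 1]$ by inverse-transform sampling (the upper cutoff 1 is our assumption; the text only gives the lower bound $\epsilon$), each active node picks its $m$ targets uniformly without replacement, and an activity $a_i \geq 1$ simply means the node is always active. The function name is hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def activity_driven_snapshots(
    n: int, t_max: int, eta: float, m: int, gamma: float, eps: float,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[Set[Edge]]]:
    """Return node activities a_i and the edge sets of G(1), ..., G(t_max)."""
    if gamma == 1.0 or not 0 < m < n:
        raise ValueError("need gamma != 1 and 0 < m < n")
    rng = random.Random(seed)
    # Step 1: inverse-transform sampling of f(x) ~ x^(-gamma) on [eps, 1]
    # (the upper cutoff of 1 is an assumption).
    lo, hi = eps ** (1.0 - gamma), 1.0
    xs = [(lo + rng.random() * (hi - lo)) ** (1.0 / (1.0 - gamma)) for _ in range(n)]
    activity = [eta * x for x in xs]  # a_i = eta * x_i; a_i >= 1 means always active
    snapshots: List[Set[Edge]] = []
    for _ in range(t_max):  # step 4: repeat steps 2 and 3 until t_max
        edges: Set[Edge] = set()  # a fresh, empty G(t)
        for i in range(n):
            if rng.random() < activity[i]:  # step 2: node i is active
                others = [j for j in range(n) if j != i]
                for j in rng.sample(others, m):  # m edges to random nodes
                    edges.add((min(i, j), max(i, j)))  # undirected; duplicates merge
        snapshots.append(edges)  # step 3: edges are not carried over to t + 1
    return activity, snapshots
```

Storing each snapshot as a set of ordered pairs keeps the network undirected and silently merges the rare duplicate edge, consistent with the sparse-limit remark in step 2.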

We use an ignorant-spreader-refractory model to describe the spreading dynamics of misinformation [29]. Here each node is either ignorant, spreader, or refractory. Ignorant nodes are unaware that the information is false and are susceptible to adopting it. Spreader nodes are aware that the information is false and are willing to transmit it to ignorant nodes. Refractory nodes have received the misinformation but do not spread it. The misinformation spreading dynamics on temporal networks evolves as follows. We first randomly select a small fraction $\rho_0$ of nodes as spreader seeds in network $G(t_0)$, where $1 \leq t_0 \leq t_{\max}$, and designate the remaining $1 - \rho_0$ nodes as ignorant. At time step $t$, each spreader $i$ transmits the misinformation with probability $\lambda$ to each of its ignorant neighbors in network $G(t)$. In addition, each spreader $i$ becomes refractory with probability $\mu_i(t)$ given by Eq. (1), which depends on the intrinsic recovery probability $\mu$ and on the number $n_i(t)$ of neighbors of node $i$ in the spreader or refractory states. The dynamics evolve until no spreader nodes remain. Note that when $t$ reaches $t_{\max}$, the misinformation spreads on $G(1)$ in the next time step. Figure 1 shows misinformation spreading on a temporal network.
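A single stochastic realization of these dynamics can be sketched as below, reusing snapshots stored as edge sets. Two assumptions are ours: Eq. (1) for $\mu_i(t)$ is not reproduced here, so the `recovery` argument defaults to the hypothetical placeholder `constant_recovery`, i.e., $\mu_i(t) = \mu$ (the early-time approximation used later in the text), and can be replaced by the paper's expression; and all updates within a time step are synchronous. The optional `immune` set anticipates the containment strategies introduced later: immunized nodes never receive or transmit the misinformation. All names are hypothetical, and `t0` is a 0-based snapshot index.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

IGNORANT, SPREADER, REFRACTORY = 0, 1, 2


def constant_recovery(mu: float, n_i: int) -> float:
    """Assumed placeholder for Eq. (1): mu_i(t) = mu regardless of n_i(t)."""
    return mu


def simulate_isr(
    n: int,
    snapshots: List[Set[Tuple[int, int]]],
    lam: float,
    mu: float,
    rho0: float,
    t0: int = 0,
    immune: FrozenSet[int] = frozenset(),
    recovery: Callable[[float, int], float] = constant_recovery,
    seed: Optional[int] = None,
    max_steps: int = 100_000,
) -> float:
    """Return the final outbreak size R (fraction of nodes ever informed)."""
    rng = random.Random(seed)
    neighbors = []  # adjacency lists of each snapshot G(t)
    for edges in snapshots:
        nb: List[List[int]] = [[] for _ in range(n)]
        for u, v in edges:
            nb[u].append(v)
            nb[v].append(u)
        neighbors.append(nb)
    state = [IGNORANT] * n
    candidates = [i for i in range(n) if i not in immune]
    for i in rng.sample(candidates, max(1, round(rho0 * n))):
        state[i] = SPREADER  # seeds
    t, steps = t0, 0
    while SPREADER in state and steps < max_steps:
        nb = neighbors[t % len(snapshots)]  # after G(t_max) comes G(1) again
        new_state = state[:]
        for i in range(n):
            if state[i] != SPREADER:
                continue
            for j in nb[i]:  # transmit to ignorant, non-immunized neighbors
                if state[j] == IGNORANT and j not in immune and rng.random() < lam:
                    new_state[j] = SPREADER
            n_i = sum(1 for j in nb[i] if state[j] != IGNORANT)  # S or R neighbors
            if rng.random() < recovery(mu, n_i):
                new_state[i] = REFRACTORY
        state, t, steps = new_state, t + 1, steps + 1
    return sum(1 for s in state if s != IGNORANT) / n
```

Averaging `simulate_isr` over many seeds gives the numerical $R$ that the theory below is compared against.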

We here develop a generalized discrete Markovian approach to describe the misinformation spreading dynamics on temporal networks [54]. We denote by $I_i(t)$, $S_i(t)$, and $R_i(t)$ the probabilities that node $i$ is in the ignorant, spreader, and refractory states, respectively, at time $t$. Because a node can only be in one of the three states,

$$I_i(t) + S_i(t) + R_i(t) = 1.$$

An ignorant node $i$ becomes a spreader with probability $p_i(t) = 1 - q_i(t)$ at time $t$. Here

$$q_i(t) = \prod_{j=1}^{N} \left[1 - \lambda A_{ij}(t) S_j(t)\right]$$

is the probability that node $i$ has not received any misinformation from its neighbors in $G(t)$ at time $t$. At time step $t + 1$, the probability that node $i$ is still ignorant is

$$I_i(t+1) = I_i(t)\, q_i(t).$$

The decrease of $I_i(t)$ equals the increase of $S_i(t)$, because an ignorant node becomes a spreader when it obtains the information from neighbors in state $S$. In addition, spreader $i$ becomes refractory with probability $\mu_i(t)$. Thus the evolution of node $i$ in the spreader state is

$$S_i(t+1) = S_i(t)\left[1 - \mu_i(t)\right] + I_i(t)\left[1 - q_i(t)\right].$$

The evolution of node $i$ in the refractory state is

$$R_i(t+1) = R_i(t) + S_i(t)\, \mu_i(t).$$

Using Eqs. (3)-(5), we obtain the fraction of nodes in each state at time $t$,

$$H(t) = \frac{1}{N} \sum_{i=1}^{N} H_i(t),$$

where $H \in \{I, S, R\}$. Note that in the steady state there are no spreader nodes, only refractory and ignorant nodes. The fraction of nodes that receive the misinformation in the final state is $R \equiv R(\infty)$.
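The discrete Markovian equations above can be iterated directly to obtain $R$. The sketch below is a minimal pure-Python version, assuming dense adjacency matrices, homogeneous initial conditions $S_i = \rho_0$, $I_i = 1 - \rho_0$, $R_i = 0$, and, because Eq. (1) is not reproduced here, a constant recovery probability $\mu_i(t) = \mu$. The function name `mmca_final_size` is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def mmca_final_size(
    snapshots: List[Matrix], lam: float, mu: float, rho0: float,
    tol: float = 1e-12, max_steps: int = 100_000,
) -> float:
    """Iterate I_i, S_i, R_i until no spreaders remain; return R = mean_i R_i."""
    n = len(snapshots[0])
    I = [1.0 - rho0] * n
    S = [rho0] * n
    R = [0.0] * n
    t = 0
    while max(S) > tol and t < max_steps:
        A = snapshots[t % len(snapshots)]  # cycle G(1), ..., G(t_max), G(1), ...
        # q_i(t): probability that node i is not informed by any neighbor in G(t)
        q = [1.0] * n
        for i in range(n):
            for j in range(n):
                if A[i][j]:
                    q[i] *= 1.0 - lam * A[i][j] * S[j]
        # Synchronous update: the right-hand sides use the old I, S, R.
        I, S, R = (
            [I[i] * q[i] for i in range(n)],
            [S[i] * (1.0 - mu) + I[i] * (1.0 - q[i]) for i in range(n)],
            [R[i] + S[i] * mu for i in range(n)],
        )
        t += 1
    return sum(R) / n
```

The three updates conserve $I_i + S_i + R_i = 1$ exactly, so with $\lambda = 0$ the sketch returns $R \approx \rho_0$: the seeds recover and no one else is informed.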

Here $R$ is the order parameter of a continuous phase transition in $\lambda$. If the misinformation transmission probability $\lambda$ exceeds the critical threshold, i.e., $\lambda > \lambda_c$, the size of the global misinformation outbreak is of the order of the system size. Otherwise, for $\lambda \leq \lambda_c$, $R = 0$ in the thermodynamic limit. At early times, a vanishingly small fraction of nodes has received the misinformation, i.e., $S_i(t) \approx 0$ and $R_i(t) = 1 - S_i(t) - I_i(t) \approx 0$, so that $I_i(t) \approx 1$. The recovery probability of a spreader node $i$ in Eq. (1) is then $\mu_i(t) \approx \mu$, since node $i$ must be connected to the spreader that supplied the misinformation and is unlikely to be connected to other spreader or refractory neighbors. Linearizing $q_i(t)$ in $S_j(t)$, Eq. (4) can be rewritten as

$$S_i(t+1) \approx \sum_{j=1}^{N} \left[(1-\mu)\,\delta_{ij} + \lambda A_{ij}(t)\right] S_j(t),$$

where $\delta_{ij}$ is the Kronecker delta, i.e., $\delta_{ij} = 1$ if $i = j$ and zero otherwise. We define the transmission tensor $M$ to be

$$M_{ij}(t) = (1-\mu)\,\delta_{ij} + \lambda A_{ij}(t).$$

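We mask the tensorial origin of the space through the map $(i, t) \mapsto \alpha = N(t-1) + i$, which turns $M$ into an $N t_{\max} \times N t_{\max}$ matrix. Inserting Eq. (9) into (7), we have

$$\mathbf{S}(\tau+1) = M\, \mathbf{S}(\tau),$$

where $\mathbf{S}(\tau)$ collects the probabilities that each node is in the spreader state at each time step $t$ during the $\tau$-th period of $t_{\max}$ snapshots.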

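Here $\mathbf{S}(\tau)$ increases exponentially if the largest eigenvalue of $M$, denoted $\Lambda_1(M)$, is larger than 1, in which case the misinformation spreads. The threshold condition is thus [54]

$$\Lambda_1(M) = 1.$$

In an unweighted undirected network $G$, the largest eigenvalue of $M$ is

$$\Lambda_1(M) = \left[\Lambda_1(P)\right]^{1/t_{\max}},$$

where

$$P = \prod_{t=1}^{t_{\max}} \left[(1-\mu)\mathbb{1} + \lambda A(t)\right]$$

and $\mathbb{1}$ is the $N \times N$ identity matrix. The threshold $\lambda_c$ is therefore the value of $\lambda$ at which $\Lambda_1(P) = 1$.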
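The threshold condition can be evaluated numerically. Below is a minimal pure-Python sketch, assuming dense adjacency matrices and $0 < \mu < 1$: it builds the time-ordered product $P$ (later snapshots multiply on the left, matching the order in which snapshots act in time), estimates $\Lambda_1(P)$ by power iteration, and bisects on $\lambda$, using the fact that the largest eigenvalue of a non-negative matrix does not decrease when its entries increase. Because $\Lambda_1(M) = [\Lambda_1(P)]^{1/t_{\max}}$, the condition $\Lambda_1(P) = 1$ is equivalent to $\Lambda_1(M) = 1$. The helper names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Dense product X @ Y for square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def propagator(snapshots: List[Matrix], lam: float, mu: float) -> Matrix:
    """P = [(1-mu)1 + lam A(t_max)] ... [(1-mu)1 + lam A(1)] (time-ordered)."""
    n = len(snapshots[0])
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for A in snapshots:
        step = [[(1.0 - mu if i == j else 0.0) + lam * A[i][j] for j in range(n)]
                for i in range(n)]
        P = matmul(step, P)  # later snapshots act on the left
    return P


def perron_root(P: Matrix, iters: int = 20_000, tol: float = 1e-13) -> float:
    """Lambda_1 of a non-negative matrix with positive diagonal, by power iteration."""
    n = len(P)
    v = [1.0] * n
    estimate = 0.0
    for _ in range(iters):
        w = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(w)  # positive because P has a positive diagonal and v > 0
        v = [x / norm for x in w]
        converged = abs(norm - estimate) < tol
        estimate = norm
        if converged:
            break
    return estimate


def critical_lambda(snapshots: List[Matrix], mu: float, lam_max: float = 1.0,
                    tol: float = 1e-10) -> Optional[float]:
    """Bisection for lambda_c with Lambda_1(P) = 1; None if no outbreak up to lam_max."""
    if perron_root(propagator(snapshots, lam_max, mu)) <= 1.0:
        return None
    lo, hi = 0.0, lam_max  # at lambda = 0, Lambda_1(P) = (1 - mu)^t_max < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if perron_root(propagator(snapshots, mid, mu)) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

As a sanity check, if every snapshot equals the same matrix $A$, then $P = [(1-\mu)\mathbb{1} + \lambda A]^{t_{\max}}$ and the sketch recovers the static result $\lambda_c = \mu/\Lambda_1(A)$; for a triangle with $\mu = 0.2$ it returns approximately 0.1.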

Because misinformation spreading on social networks can induce social instability, threaten political security, and endanger the economy, we propose three strategies (random, targeted, and heuristic) for containing the spread of misinformation in temporal networks using a given fraction $f$ of containing nodes. We first immunize a fraction $f$ of nodes using a static containment strategy, and the misinformation then spreads on the residual temporal network. An immunized node $i$ can neither receive the misinformation from its neighbors nor transmit it to them. We denote the set of immunized nodes by $\mathcal{V}$, whose size is $|\mathcal{V}| = \lceil fN \rceil$, and set $v_i = 1$ if node $i$ is immunized and $v_i = 0$ otherwise. After immunization, Eqs. (2)-(4) are rewritten accordingly. An effective containment strategy suppresses the misinformation spreading dynamics for a given fixed fraction $f$ of immunized nodes, i.e., it solves

$$\min_{\mathcal{V}} R \quad \text{subject to} \quad |\mathcal{V}| = \lceil fN \rceil,$$

with the dynamical equations for the immunized network as additional constraints.

• Strategy I: Random containment (RC). The most widely used strategy for containing the spread of misinformation is to immunize a randomly chosen fraction $f$ of nodes [66].

• Strategy II: Targeted containment (TC). Another intuitive approach is to immunize the nodes with the highest average degree $k$ in the temporal network $G$. Specifically, we first compute the average degree of each node $i$ as $k_i = \frac{1}{t_{\max}} \sum_{t=1}^{t_{\max}} \sum_{j=1}^{N} A_{ij}(t)$. We then rank all nodes in descending order of average degree in the vector $W$. Finally, we immunize the top $\lceil fN \rceil$ nodes of $W$.

• Strategy III: Heuristic containment (HC). Because TC performs much better than RC, the HC strategy starts from the TC immunization set and improves it by repeatedly swapping immunized and non-immunized nodes. When the number of repetitions is very large, the immunized set approaches an optimal configuration. The procedure is as follows.

(i): We initialize a vector $W$ by ranking the nodes in descending order of average degree. We immunize the first $\lceil fN \rceil$ nodes of $W$, which form the set $W_0$, and denote the resulting final misinformation outbreak size by $R_o$. The remaining nodes form the set $W_1 = W \setminus W_0$.

(ii): We randomly select a node $v_0$ from $W_0$ and a node $v_1$ from $W_1$, swap their positions in vector $W$, and denote the new vector by $W_n$. We immunize the first $\lceil fN \rceil$ nodes of $W_n$ and compute the final misinformation outbreak size $R_n$.

(iii): If $R_n < R_o$, we update the vector, $W \to W_n$ (and hence $W_0$, $W_1$, and $R_o \to R_n$). Otherwise, $W$ is unchanged.

(iv): We repeat steps (ii) and (iii) until $\frac{1}{t_s} \sum_{i=1}^{t_s} |W - W_n| < \epsilon'$. In the simulations we set $t_s = 100$ and $\epsilon' = N^{-1}$.
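Strategies I-III can be sketched generically as below. This is an illustration under our own assumptions, not the authors' implementation: the final outbreak size of a candidate immunized set is supplied by a caller-provided function `outbreak_size` (for instance, an average over stochastic runs or an iteration of the Markovian equations with immunized nodes removed), swaps are accepted only when they reduce $R$, and the stopping rule of step (iv) is interpreted as "the fraction of accepted swaps over the last $t_s$ attempts falls below $\epsilon'$", with a cap of `max_iter` iterations. All function names are hypothetical.

```python
import math
import random
from collections import deque
from typing import Callable, List, Optional, Set


def random_containment(n: int, f: float, rng: random.Random) -> Set[int]:
    """Strategy I (RC): immunize ceil(f N) nodes chosen uniformly at random."""
    return set(rng.sample(range(n), math.ceil(f * n)))


def targeted_order(avg_degree: List[float]) -> List[int]:
    """Strategy II (TC): vector W of nodes sorted by average degree, highest first."""
    return sorted(range(len(avg_degree)), key=lambda i: avg_degree[i], reverse=True)


def heuristic_containment(
    avg_degree: List[float], f: float,
    outbreak_size: Callable[[Set[int]], float],
    t_s: int = 100, eps_prime: Optional[float] = None,
    max_iter: int = 100_000, seed: Optional[int] = None,
) -> Set[int]:
    """Strategy III (HC): start from TC and keep only swaps that lower R."""
    rng = random.Random(seed)
    n = len(avg_degree)
    k = math.ceil(f * n)
    eps_prime = 1.0 / n if eps_prime is None else eps_prime
    W = targeted_order(avg_degree)  # step (i): the TC solution is W[:k]
    if k in (0, n):
        return set(W[:k])
    R_o = outbreak_size(set(W[:k]))
    recent: deque = deque(maxlen=t_s)  # 1 if a swap was accepted, else 0
    for _ in range(max_iter):
        a, b = rng.randrange(k), rng.randrange(k, n)  # step (ii): v0 in W0, v1 in W1
        W_n = W[:]
        W_n[a], W_n[b] = W_n[b], W_n[a]
        R_n = outbreak_size(set(W_n[:k]))
        accepted = R_n < R_o  # step (iii): keep the swap only if R decreases
        if accepted:
            W, R_o = W_n, R_n
        recent.append(int(accepted))
        if len(recent) == t_s and sum(recent) / t_s < eps_prime:  # step (iv)
            break
    return set(W[:k])
```

Every candidate set requires a fresh estimate of $R$, so the cost is dominated by `outbreak_size`; when that estimate is stochastic, noise can cause spurious acceptances, which is one reason to average many realizations per evaluation.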

For the activity-driven network, we set $N = 10^3$, $t_{\max} = 20$, $\eta = 10$, $m = 50$, $\gamma = 2.1$, and $\epsilon = 10^{-3}$. For real-world networks, we use the data collected by the SocioPatterns group [?], which record the interactions among the participants of a conference with a time resolution of 20 s. Because this temporal network is sparse, it is difficult for information to spread on the original network. We thus aggregate the temporal network using four time windows, $w = 30$, 60, 120, and 240 min. All simulation results are averaged over more than 1000 realizations.

We use the variability

$$\chi = \frac{\sqrt{\langle R^2 \rangle - \langle R \rangle^2}}{\langle R \rangle}$$

to locate the numerical, network-size-dependent outbreak threshold [67, 68], where $R$ is the relative size of the misinformation outbreak at the steady state and $\langle \cdot \rangle$ denotes an average over realizations. At the outbreak threshold $\lambda_c$, $\chi$ exhibits a peak. When $\lambda \leq \lambda_c$ the global misinformation does not break out, but when $\lambda > \lambda_c$ it does. Figure 2 shows misinformation spreading on activity-driven networks. Note that the final misinformation outbreak size $R$ increases with $\lambda$. The larger the recovery probability $\mu$, the lower the value of $R$, because spreader nodes are less likely to transmit the misinformation to ignorant neighbors before becoming refractory [see Fig. 2(a)]. Our theoretical and numerical predictions of $R$ agree. Figure 2(b) shows the variability $\chi$ as a function of $\lambda$; there is a peak at the misinformation outbreak threshold $\lambda_c$. Figure 3 shows that $\lambda_c$ increases linearly with $\mu$. The theoretical predictions of $\lambda_c$ obtained from Eq. (11) agree with the stochastic simulations. Figure 4 shows misinformation spreading in real-world temporal networks. As in Fig. 2, $R$ increases with $\lambda$ and decreases with $\mu$. As in SIR epidemic spreading [59], the effective outbreak threshold $(\lambda/\mu)_c$ is constant. In addition, when the aggregating window $w$ is small, spreaders have fewer opportunities to transmit the misinformation to ignorant neighbors, so the misinformation does not break out globally, i.e., $R$ is smaller for smaller $w$.
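The variability $\chi$ and the peak-finding procedure described above can be sketched as follows, assuming a list of final outbreak sizes $R$ from independent realizations at each value of $\lambda$. Returning $\chi = 0$ when $\langle R \rangle = 0$ is our own convention, and the function names are hypothetical.

```python
import math
from typing import Sequence


def variability(samples: Sequence[float]) -> float:
    """chi = sqrt(<R^2> - <R>^2) / <R> over independent realizations."""
    m1 = sum(samples) / len(samples)
    if m1 == 0.0:
        return 0.0  # convention: no realization produced any informed node
    m2 = sum(r * r for r in samples) / len(samples)
    return math.sqrt(max(m2 - m1 * m1, 0.0)) / m1  # clamp tiny negative rounding


def estimate_threshold(lams: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """Numerical lambda_c: the value of lambda at which chi peaks."""
    chis = [variability(r) for r in runs]
    return lams[max(range(len(lams)), key=chis.__getitem__)]
```

Near $\lambda_c$ some realizations die out while others reach a finite fraction of the network, so the relative spread of $R$, and hence $\chi$, is largest there.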

Once again our theoretical results agree with the numerical simulations.

We next examine the performance of our proposed strategies for mitigating misinformation spreading on artificial and real-world temporal networks. Figure 5 shows $R$ versus $\lambda$ for different values of the fraction of containing nodes $f$. Note that $R$ decreases with $f$, because immunized nodes can neither receive nor transmit the misinformation. Note also that the TC strategy performs much better than the RC strategy, because the nodes with high average degree $k$ are contained and spreaders can no longer transmit the misinformation through them. Thus when we immunize the same fraction of nodes, the values of $R$ for the TC strategy are smaller than those for the RC strategy. For example, when $f = 0.5$, about 30% of the nodes are informed by the misinformation under the RC strategy, but none are informed under the TC strategy. In addition, the outbreak threshold $\lambda_c$ to contain the misinformation the fraction of nodes informed by the misinformation is finite, i.e., $R \approx 0.25$. Our theoretical results agree with the numerical simulation results.

An effective containment strategy with a given fraction $f$ of immunized nodes yields a large outbreak threshold $\lambda_c$ and greatly decreases the final misinformation outbreak size $R$. Figure 7 shows the effective outbreak threshold $(\lambda/\mu)_c$ versus $f$ on activity-driven networks for the RC, TC, and HC strategies. Here $(\lambda/\mu)_c$ increases with $f$, and for fixed $f$ it is largest under the HC strategy. When $f$ is sufficiently large, no value of $\lambda$ can induce a global misinformation outbreak.

We denote by $f_c$ the critical fraction of nodes that must be contained to halt misinformation spreading in temporal networks. We find that $f_c$ is smallest for the HC strategy. The value of $f_c$ for the RC strategy is 5 times that for the TC strategy, and the value for the TC strategy is 2.5 times that for the HC strategy. Thus the HC strategy is the most effective.

[Figure caption: The lines and symbols are the theoretical and numerical predictions of $(\lambda/\mu)_c$, respectively. The vertical line represents the critical fraction $f_c$.]

We finally examine real-world networks to verify the effectiveness of the three proposed strategies. Figure 8 compares the performance of the TC and HC strategies by showing $R$ versus $f$ for given values of $\lambda$. As in Fig. 6, the HC strategy contains misinformation spreading on temporal networks most effectively, irrespective of the value of $\lambda$. In addition, Fig. 9 shows that the effective outbreak threshold $(\lambda/\mu)_c$ is largest when using the HC strategy. Again, our theory accurately predicts the numerical simulation results.

We have systematically examined the dynamics of misinformation spreading on temporal networks. We use activity-driven networks to describe temporal networks and a discrete Markovian approach to describe the spreading dynamics. We find that the global misinformation outbreak threshold correlates with the topology of temporal networks. Using extensive numerical simulations, we find that our theoretical predictions agree with the numerical results in both artificial and real-world networks.

[Figure caption: The lines and symbols are the theoretical and numerical predictions of $R$, respectively.]

To contain misinformation spreading on temporal networks, we propose three strategies: random containing (RC), targeted containing (TC), and heuristic containing (HC). We perform numerical simulations and a theoretical analysis on artificial networks and four real-world networks, and find that the HC strategy outperforms the other two strategies: it maximizes the outbreak threshold and minimizes the final outbreak size. Our proposed containment strategy expands our understanding of how to contain public sentiment and maintain social stability.

This work was partially supported by the China Postdoctoral Science Foundation (Grant No. 2018M631073) and the Fundamental Research Funds for the Central Universities. LAB thanks UNMdP and CONICET.

Weekly Releases

2013 IEEE 13th International Conference on Data Mining

Complex Spreading Phenomena in Social Systems

Temporal networks

A Guidance to Temporal Networks

Temporal Network Epidemiology

Journal of Physics: Conference Series