key: cord-0584196-edcyqvgd authors: Lapus, Raymond; Simon, Frank; Tittmann, Peter (De La Salle University, Manila, Philippines; Mittweida University of Applied Sciences, Mittweida, Germany) title: Random Information Spread in Networks date: 2010-08-12 sha: 88ee2578173aad996fc3ac67f92eba5459feceb9 doc_id: 584196 cord_uid: edcyqvgd Let G = (V, E) be an undirected loopless graph, possibly with parallel edges, and let s and t be two vertices of G. Assume that vertex s is labelled at the initial time step. In each time step, every labelled vertex copies its label to its neighbours along the edges with exactly one labelled endpoint, independently with probability p per edge. In this paper, we establish the equivalence between the expected s-t first arrival time of this spread process and the notion of the stochastic shortest s-t path. Moreover, we briefly discuss analytical results for special graphs, including the complete graph and s-t series-parallel graphs. Finally, we propose some lower bounds for the expected s-t first arrival time. The spreading of information in networks is a random process. Consider an undirected loopless graph G = (V, E), possibly with parallel edges. Let s ∈ V be a chosen vertex that is labelled at time step 0. In addition, we assume that in one time step every edge of G with a labelled endpoint independently copies the label to its other, possibly unlabelled, endpoint with the given infection probability p. Applications include network-based models of virus spread in epidemiology and the analysis of gossiping in social networks. The analysis of malware propagation in computer networks, such as the spread of computer viruses or worms, is another important application in technical networks. The epidemic threshold is an interesting measure that characterises the transition from local to global spread.
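The spread process just described is straightforward to simulate. The following Python sketch is our own illustration, not the authors' code. It assumes that the graph is given as a list of vertex pairs, with parallel edges written as repeated pairs. It also assumes that all edges share the infection probability p and that t is reachable from s; otherwise the loop never terminates. The function names `simulate_first_arrival` and `estimate_expected_arrival` are hypothetical.

```python
import random


def simulate_first_arrival(edges, s, t, p, rng):
    """Run one realisation of the spread process and return the s-t first arrival time."""
    labelled = {s}
    step = 0
    while t not in labelled:
        step += 1
        newly_labelled = set()
        for u, v in edges:
            # Only edges with exactly one labelled endpoint can transmit;
            # each such edge fires independently with probability p.
            if (u in labelled) != (v in labelled) and rng.random() < p:
                newly_labelled.add(v if u in labelled else u)
        # All transmissions of one time step take effect simultaneously.
        labelled |= newly_labelled
    return step


def estimate_expected_arrival(edges, s, t, p, runs=20000, seed=1):
    """Monte Carlo estimate of the expected s-t first arrival time."""
    rng = random.Random(seed)
    total = sum(simulate_first_arrival(edges, s, t, p, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    path = [(0, 1), (1, 2)]  # a path of length 2
    print(estimate_expected_arrival(path, 0, 2, 0.5))  # should be close to 2 / 0.5 = 4
```

For a path of length n the estimate should approach n/p, the value stated for paths later in the paper. Averaging many runs is slow, but it is a simple way to cross-check exact computations on small graphs.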
Random graphs and small-world networks have received the most attention in this field (see, for instance, [1] or [8]). Certain regularity conditions are necessary to make the analysis of large or infinite graphs possible. An application to the outbreak of Severe Acute Respiratory Syndrome (SARS) is presented in [6]. In this paper we deal with the following questions. First, in Section 2, we present the problem of finding the expected s-t first arrival time using the generating function approach. Second, in Section 3, we establish the relationship between the spread process and the stochastic shortest s-t path problem stated in [9, Chapter 9]. In Section 4 we introduce the s-t spreading resistance of a graph. We show that this notion is related to Kulkarni's concept [4] of the expected length of a shortest s-t path when the edge lengths are exponentially distributed random variables with intensity p. Subsequently, we establish two reduction techniques for calculating the expected first arrival time in s-t series-parallel graphs (see Section 6). Because the spread process is computationally complex, we propose in Section 7 methods for obtaining lower and upper bounds for the expected s-t first arrival time in terms of the s-t reliability polynomial and the distance between s and t. All graphs considered in this paper are finite and loopless, but may have parallel edges. In this section, we describe how the spread process propagates in arbitrary graphs. Let G = (V, E) be a graph and let $X_k \subseteq V$ be the set of labelled vertices of G at the discrete time step k ∈ {0, 1, 2, …}. Assume that $u \in X_k$ copies the message to an adjacent unlabelled vertex v along the edge {u, v} with probability $p_{uv} = 1 - q_{uv}$. We refer to $p_{uv}$ and $q_{uv}$ as the infection and noninfection probabilities from u to v.
The spread is performed simultaneously along all edges that link labelled vertices to unlabelled vertices within one unit of time. The copying of labels along the edges is assumed to be stochastically independent. In addition, the message spread is symmetric, that is, $p_{uv} = p_{vu}$ for all u, v ∈ V. We presuppose that $p_{uv} > 0$ for every $\{u, v\} \in E$, whereas $\{u, v\} \notin E$ implies $p_{uv} = 0$. Therefore the spread process can be described by a homogeneous Markov chain $\{X_k\}_{k \in \mathbb{N}}$ whose state space is a subset of $2^V$. If G is connected, then the label spreads until all vertices of G are labelled, that is, $\lim_{k\to\infty} X_k = V$ with probability 1. Let G = (V, E) be a graph and v ∈ V. The open neighbourhood of v, denoted by N(v), is the set of vertices that are adjacent to v in G. For each A ⊆ V, the open neighbourhood of A is defined by $N(A) = \bigl(\bigcup_{v \in A} N(v)\bigr) \setminus A$. For A, B ⊆ V, the edge set (A, B) is the set of edges of G with one endpoint in A and the other endpoint in B, that is, $(A,B) = \{\{u,v\} \in E : u \in A,\ v \in B\}$. The transition probability $P_G(A,B)$ from A to B vanishes unless $A \subseteq B \subseteq A \cup N(A)$. Moreover, if G is a simple graph and $A \subseteq B \subseteq A \cup N(A)$, then $$P_G(A,B) = \prod_{u \in A}\prod_{v \in N(A)\setminus B} q_{uv}\;\prod_{x \in B\setminus A}\Bigl(1 - \prod_{u \in A} q_{ux}\Bigr).$$ PROOF. Figure 1 illustrates the derivation of $P_G(A,B)$ in a simple graph G. There must be no label propagation along the edges from A to N(A) \ B, which are drawn as broken lines. This gives the first double product. The remaining products correspond to transmissions along the solid edges: for each vertex x ∈ B \ A, at least one edge must transport the label, which gives the term in brackets. From now on, we denote the transition probability from A to B in the graph G by P(A, B) whenever G is clear from the context. We first describe some general properties of the spread process. To simplify the presentation, we now assume that all infection probabilities are identical, that is, $p_{uv} = p = 1 - q$ for all $\{u, v\} \in E$. In this case, for $A \subseteq B \subseteq A \cup N(A)$, the transition probability becomes $$P(A,B) = q^{|(A,\,N(A)\setminus B)|}\prod_{x \in B\setminus A}\bigl(1 - q^{|(A,\{x\})|}\bigr),$$ and this formula also holds for graphs with parallel edges. Let G = (V, E) be a graph and X ⊆ V a nonempty vertex subset. The graph $G_X$ is obtained by merging the vertex subset X in G. The merged vertex in $G_X$ is denoted by X′, and $G_X$ contains an edge {X′, y} for each edge {x, y} ∈ E of G with x ∈ X and y ∈ V \ X.
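The following Python sketch makes the transition probabilities concrete. It evaluates P(A, B) in the equal-probability case. It then uses P(A, B) to compute the expected s-t first arrival time exactly through the first-step recurrence for $T_{At}(G)$ derived later in this section, solved for $T_{At}(G)$. This is our own illustration under stated assumptions: the graph is a list of vertex pairs, all edges share a noninfection probability q < 1, and t is reachable from s. The function names are hypothetical. The enumeration is exponential in the number of vertices, so it is only meant for small graphs.

```python
from functools import lru_cache
from itertools import combinations


def cut_multiplicities(edges, A):
    """Map each vertex x outside A to |(A, {x})|, the number of edges joining x to A."""
    counts = {}
    for u, v in edges:
        if (u in A) != (v in A):
            outside = v if u in A else u
            counts[outside] = counts.get(outside, 0) + 1
    return counts


def transition_probability(edges, A, B, q):
    """P(A, B) when every edge has noninfection probability q."""
    if not A <= B:
        return 0.0
    counts = cut_multiplicities(edges, A)
    if any(x not in counts for x in B - A):
        return 0.0  # in one step, B can only add neighbours of A
    prob = 1.0
    for x, k in counts.items():
        # x in B \ A: at least one of its k edges fires; x in N(A) \ B: none fires.
        prob *= (1.0 - q ** k) if x in B else q ** k
    return prob


def expected_first_arrival(edges, s, t, q):
    """Exact T_st(G) via T_At = (1 + sum_{C > A} P(A,C) T_Ct) / (1 - P(A,A))."""

    @lru_cache(maxsize=None)
    def T(A):
        if t in A:
            return 0.0
        neighbours = list(cut_multiplicities(edges, A))
        if not neighbours:
            raise ValueError("t is not reachable from s")
        total = 1.0
        # Only supersets C = A | S with S a nonempty subset of N(A) have P(A, C) > 0.
        for r in range(1, len(neighbours) + 1):
            for S in combinations(neighbours, r):
                C = A | frozenset(S)
                total += transition_probability(edges, A, C, q) * T(C)
        return total / (1.0 - transition_probability(edges, A, A, q))

    return T(frozenset([s]))


if __name__ == "__main__":
    triangle = [(0, 1), (0, 2), (1, 2)]
    print(expected_first_arrival(triangle, 0, 2, 0.5))  # approximately 16/9
```

Memoising over frozensets mirrors the triangular structure of the linear system: each $T_{At}$ depends only on values for strict supersets of A.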
Consequently, the exponents of q in Corollary 3 remain unchanged when G is transformed into $G_X$. Thus it is advantageous to work with $G_X$ instead of G, since the number of vertices is reduced. However, $G_X$ need not be simple even if G is a simple graph. Nevertheless, $G_X$ can be transformed into a simple graph by replacing all parallel edges joining a pair of vertices u, v ∈ V by a single edge {u, v}. That is, two parallel edges e = {u, v} and f = {u, v} with infection probabilities $p_e$ and $p_f$ are replaced by a new edge g with infection probability $p_g = p_e + p_f - p_e p_f$. In this way we obtain the more general model, in which edges may have different infection probabilities. Definition 6. Let G = (V, E) be a graph and A, B ⊆ V. Denote by $Z_{AB}$ the random variable of the first time at which all vertices of B are infected, given that the vertex set A is infected at the beginning of the spread process. The following lemma describes the transition from one state to another in terms of a Markov chain; it will be used to establish Theorem 9. In the following, ⊊ denotes proper subset inclusion. Lemma 7. Let A, B ⊆ V with $A \cap B \subsetneq B$. Then $$\Pr(\{Z_{AB} = n\}) = \sum_{C \supseteq A} P(A,C)\,\Pr(\{Z_{CB} = n-1\})$$ holds for all n ≥ 1. PROOF. The condition $A \cap B \subsetneq B$ implies that at least one vertex t ∈ B is not yet infected when the spread process reaches the vertex subset A. The event that all vertices in B are infected for the first time after n time steps can be divided into disjoint events by considering the first time step. In one time step a superset C of A is infected, where the transition probabilities are given by P(A, C). To infect B for the first time after n time steps, B must then be reached from C for the first time in exactly n − 1 time steps, which accounts for the elapsed time step. The ordinary generating function of the A-B first arrival times is defined as $$\Phi_{AB}(G,z) = \sum_{n\ge0}\Pr(\{Z_{AB} = n\})\,z^n.$$ It follows directly from the definition of $Z_{AB}$ that $\Phi_{AB}(G,z) = 1$ whenever A ⊇ B. Theorem 9. Let A, B ⊆ V with $A \cap B \subsetneq B$. Then the generating functions satisfy the system of linear equations $$\Phi_{AB}(G,z) = z\sum_{C\supseteq A} P(A,C)\,\Phi_{CB}(G,z).$$ PROOF.
The statement follows directly from Lemma 7. Note that the assumption $A \cap B \subsetneq B$ implies $\Pr(\{Z_{AB} = 0\}) = 0$, as B cannot be reached from A in zero time steps. Observe that the system of linear equations in Theorem 9 has a triangular structure, because P(A, B) = 0 whenever A is not a subset of B; this permits the practical solution of systems with more than $10^5$ equations. We can now derive a recurrence relation for the expectation of the random variable $Z_{AB}$ from the generating function of the A-B first arrival time probabilities. A similar approach for the exponential model, discussed later, is presented in [4]. The expectation of $Z_{AB}$, denoted by $T_{AB}(G)$, obeys the recurrence equation $$T_{AB}(G) = 1 + \sum_{C\supseteq A} P(A,C)\,T_{CB}(G) \qquad (2)$$ for $A \cap B \subsetneq B$, with the initial conditions $T_{AB}(G) = 0$ for any A ⊇ B. When the sets A and B are singletons, say A = {s} and B = {t}, we write $T_{st}(G)$ instead of $T_{\{s\}\{t\}}(G)$. This notation is also used when only one of the two sets is a singleton; for example, $T_{At}(G)$ means $T_{A\{t\}}(G)$. The same convention applies to the random variable $Z_{AB}(G)$ and to the spreading resistance $\rho_{At}(G)$ defined later. In this section, we recast the spread process as a stochastic shortest path problem. A good introduction to solving stochastic shortest path problems algebraically, for discrete arc lengths that take finitely many values, is found in [9, Chapter 9]. Definition 11. Let G = (V, E) be a graph and s, t ∈ V. Furthermore, let $D_{st}$ be the random variable of the length of a shortest s-t path in G when the lengths L(e) of the edges e ∈ E are independent geometric random variables with parameter p, where p equals the success (infection) probability of the spread process. Theorem 12. Let G = (V, E) be a graph and s, t ∈ V. Then the random variables $Z_{st}$ and $D_{st}$ are identically distributed. PROOF. For all e ∈ E, let L(e) be independent random variables representing the edge lengths, drawn from a geometric distribution with success probability p = 1 − q.
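That is, for all n ≥ 1, $\Pr(\{L(e) = n\}) = q^{n-1}p$. Now, for all A ⊆ V, the random variable $D_{At}$ denotes the minimum of the lengths of shortest u-t paths in G over all u ∈ A, that is, $D_{At} = \min\{D_{ut} : u \in A\}$. Note that whenever the target vertex t satisfies t ∈ A, we have $\Pr(\{D_{At} = 0\}) = 1$. For v ∈ N(A), let $L(A,v) = \min\{L(e) : e \in (A,\{v\})\}$. We observe that $\Pr(\{L(A,v) > 1\}) = q^{|(A,\{v\})|}$, and that these events are independent for distinct v. With the notions of the length L(e) of an edge e and L(A, v), we can restate the random variable $D_{At}$ for t ∉ A as $D_{At} = \min_{v \in N(A)}\bigl(L(A,v) + D_{vt}\bigr)$. Moreover, the probability that the set of vertices v ∈ N(A) with L(A, v) = 1 equals B \ A is $$q^{|(A,\,N(A)\setminus B)|}\prod_{x\in B\setminus A}\bigl(1 - q^{|(A,\{x\})|}\bigr) = P(A,B),$$ which coincides with the transition probability of Section 2. Applying the law of total probability to the event $\{D_{At} = n\}$ yields $$\Pr(\{D_{At} = n\}) = \sum_{B \supseteq A} P(A,B)\,\Pr\bigl(\{D_{At} = n\} \mid \{v \in N(A) : L(A,v) = 1\} = B\setminus A\bigr).$$ The conditional probability on the right-hand side can be restated by reducing, for all v ∈ N(A), the realisation of the random variables L(A, v) by one and asking for the probability of a shortest A-t path of length n − 1. Since geometric random variables are memoryless, this is the probability that a shortest B-t path in G has length n − 1, that is, $$\Pr(\{D_{At} = n\}) = \sum_{B \supseteq A} P(A,B)\,\Pr(\{D_{Bt} = n-1\}).$$ This is the recurrence of Lemma 7 for $Z_{At}$, with the same initial conditions, which proves the claim. An important consequence of this correspondence is the symmetry of the s-t first arrival time probabilities, which becomes evident only in the stochastic shortest path formulation of the spread process. Corollary 13. Let G = (V, E) be a graph and s, t ∈ V. Then $Z_{st}$ and $Z_{ts}$ are identically distributed. The spread process is connected to a continuous-time Markov chain, which was already examined by Kulkarni and Corea in [4, 2]. For A ⊆ V and t ∈ V, define the A-t spreading resistance $\rho_{At}(G)$ recursively by $\rho_{Ct}(G) = 0$ for t ∈ C and $$\rho_{At}(G) = \frac{1 + \sum_{c \in N(A)} |(A,\{c\})|\,\rho_{A\cup\{c\},t}(G)}{|(A,N(A))|}$$ for t ∉ A. Then $\rho_{At}(G) = \lim_{q\to1}(1-q)\,T_{At}(G)$. PROOF. Observe that the system of linear equations for the expected A-t first arrival times obtained from (2) with the initial condition $T_{Ct}(G) = 0$ for all C ⊆ V with t ∈ C can be restated as $$T_{At}(G) = \frac{1 + \sum_{C\supsetneq A} P(A,C)\,T_{Ct}(G)}{1 - P(A,A)}. \qquad (3)$$ As a consequence, $$(1-q)\,T_{At}(G) = \frac{1-q}{1-P(A,A)} + \sum_{C\supsetneq A}\frac{P(A,C)}{1-P(A,A)}\,(1-q)\,T_{Ct}(G),$$ so the limit of $(1-q)\,T_{At}(G)$ as q tends to 1 is governed by the two limits from the right-hand side of (3), $$L_1 = \lim_{q\to1}\frac{1-q}{1-P(A,A)} \quad\text{and}\quad L_2(C) = \lim_{q\to1}\frac{P(A,C)}{1-P(A,A)}. \qquad (4)$$ By Corollary 3, $P(A,A) = q^{|(A,N(A))|}$, and L'Hôpital's rule applied to $L_1$ gives $L_1 = 1/|(A,N(A))|$. Now suppose |C \ A| ≥ 2. Then for every c ∈ C \ A there is a vertex v ∈ C \ A with v ≠ c that contributes the factor $1 - q^{|(A,\{v\})|}$ to the product P(A, C) in (4). Hence $L_2(C) = 0$ whenever |C \ A| ≥ 2. In the case C = A ∪ {c} with c ∈ N(A), one readily finds $L_2(C) = |(A,\{c\})|/|(A,N(A))|$. By induction on |V \ A|, starting from $\rho_{Ct}(G) = 0$ for t ∈ C, these limits yield exactly the recursion defining $\rho_{At}(G)$. Definition 17. Let G = (V, E) be a graph.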
The exponential spreading model in G is a spread process in which every edge e ∈ E is weighted with ω(e), where {ω(e) : e ∈ E} is a collection of independent exponential random variables with intensity p. The following result, due to Kulkarni [4], shows the connection between $\rho_{At}(G)$ and the exponential spreading model in G. If $\tau_{At}(G)$ denotes the expected length of a shortest A-t path in the exponential spreading model, then $\tau_{Ct}(G) = 0$ for t ∈ C and $$\tau_{At}(G) = \frac{1 + p\sum_{c\in N(A)}|(A,\{c\})|\,\tau_{A\cup\{c\},t}(G)}{p\,|(A,N(A))|}$$ for t ∉ A. The substitution $(1-q)\,\tau_{At}(G) = \rho_{At}(G)$ transforms Kulkarni's result into the recursive definition of $\rho_{At}(G)$. This yields an interpretation of the A-t spreading resistance: $\rho_{At}(G)$ is the expected length of a shortest A-t path when all edge lengths are independent exponential random variables with intensity 1. This section illustrates the spread process discussed in Sections 2 and 4 on graphs with a special structure. The simplest is a path $P_n$ of length n. If s and t are the end vertices of $P_n$ and p is the infection probability assigned to every edge of $P_n$, then $T_{st}(P_n) = n/p$ and $\rho_{st}(P_n) = n$. This result is useful for deriving some simple upper bounds, as seen in Section 7. Another trivial example is a tree: since every two distinct vertices of a tree are joined by a unique path, $T_{st}(G) = d(s,t)/p$ for any tree G. … for each i = 1, 2, …, n. Now define the random variable $Y = \min\{Y_i : i = 1, 2, \dots, n\}$. Because Y is a continuous random variable that takes nonnegative real values, its expectation obeys $E[Y] = \int_0^\infty \Pr(\{Y > y\})\,dy$. In the resulting expression for E[Y], the inner sum runs over all nonnegative integers $i_k$ with k = 1, 2, …, n satisfying $i_1 + i_2 + \dots + i_n = j$ and $i_k < m_k$ for all k with 1 ≤ k ≤ n. The desired claim follows by multiplying E[Y] by p = 1 − q. Davis and Prieditis [3] considered the expected s-t first arrival time for the complete graph with respect to the exponential model. The same analysis can also be carried out for the spread process, as the following result shows. Theorem 20. Let $K_n = (V, E)$ be the complete graph with n ≥ 2 vertices, t ∈ V, A ⊆ V \ {t}, |A| = i, and $T_i := T_{At}(K_n)$.
Then $T_i$ satisfies the recurrence relation $$T_i = \frac{1 + \sum_{j=i+1}^{n-1}\binom{n-i-1}{j-i}\,q^{i(n-j)}(1-q^i)^{j-i}\,T_j}{1-q^{i(n-i)}}$$ with the initial condition $T_{n-1} = 1/(1 - q^{n-1})$. PROOF. Note that in the complete graph $K_n$ the transition probabilities P(A, B) do not depend on the sets A and B themselves, but only on their cardinalities. Assume therefore that A is an i-element set and that B is a j-element superset of A with t ∉ B. Then we find $$P(A,B) = q^{i(n-j)}(1-q^i)^{j-i}. \qquad (5)$$ Hence the expected A-t first arrival time of $K_n$ is obtained by plugging (5) into the recurrence formula (2) for the A-t first arrival times. That is, with A ⊆ V \ {t} and |A| = i, we have $$T_i = 1 + \sum_{j=i}^{n-1}\binom{n-i-1}{j-i}\,q^{i(n-j)}(1-q^i)^{j-i}\,T_j. \qquad (6)$$ The result then follows by solving (6) for $T_i$. Consequently, solving the recurrence relation in Theorem 20 and setting i = 1 gives the expected s-t first arrival time of $K_n$. In effect, for n ≥ 2 the s-t spreading resistance of $K_n$ equals the ratio of the (n − 1)-st harmonic number and n − 1, that is, $\rho_{st}(K_n) = H_{n-1}/(n-1)$. This result was derived in a different fashion by Davis and Prieditis [3]. For s-t series-parallel graphs, we present an algebraic approach that utilises the Hadamard product of formal power series. The approach presented here can be viewed as a special case of Shier's [9] general algebraic approach to the stochastic shortest path problem under the assumption that all s-t paths in G are edge disjoint. Theorem (Series reduction technique). Let G = (V, E) be a connected graph with an articulation vertex a ∈ V such that there are two subgraphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with $G_1 \cup G_2 = G$ and $G_1 \cap G_2 = (\{a\}, \emptyset)$. Let $s \in V_1 \setminus \{a\}$ and $t \in V_2 \setminus \{a\}$. Then the following equations hold: $$\Phi_{st}(G,z) = \Phi_{sa}(G_1,z)\,\Phi_{at}(G_2,z), \qquad T_{st}(G) = T_{sa}(G_1) + T_{at}(G_2).$$ PROOF. Every s-t path passes through a, and $G_1$ and $G_2$ share no edges. Hence the s-t first arrival time in G is the sum of the independent first arrival times at a in $G_1$ and at t in $G_2$. Correspondingly, let $\Phi_{st}(G,z)$ be the ordinary generating function for the s-t first arrival time probabilities in G. Similarly, the functions $\Phi_{sa}(G_1,z)$ and $\Phi_{at}(G_2,z)$ denote the ordinary generating functions for the first arrival times at which the spread process reaches a in $G_1$ and t in $G_2$, respectively. Hence $\Phi_{st}(G,z) = \Phi_{sa}(G_1,z)\,\Phi_{at}(G_2,z)$ follows.
Formally differentiating with respect to the indeterminate z and then setting z = 1 gives the desired expression $T_{st}(G) = T_{sa}(G_1) + T_{at}(G_2)$. To establish the parallel reduction technique (Theorem 24), we need the Hadamard multiplication in the ring $\mathbb{C}[[z]]$ of all formal power series with complex-valued coefficients in the indeterminate z. For $A(z) = \sum_{k\ge0} a_k z^k$ and $B(z) = \sum_{k\ge0} b_k z^k$, the Hadamard product is $A(z)\odot B(z) = \sum_{k\ge0} a_k b_k z^k$. In other words, the formal power series A(z) ⊙ B(z) is obtained by termwise multiplication of A(z) and B(z). Clearly, the Hadamard product is well defined and $\mathbb{C}[[z]]$ is closed under it, as presented in [10]. Together with the usual addition and scalar multiplication, $\mathbb{C}[[z]]$ forms a commutative algebra over the field $\mathbb{C}$, both with respect to the usual multiplication of power series and with respect to the Hadamard product. In addition, the geometric series $J(z) = \sum_{k\ge0} z^k = 1/(1-z)$ is the identity element with respect to ⊙. Let r be a positive integer. We call the formal power series $J_r(z)$ the r-geometric series, defined as $$J_r(z) = \frac{1}{(1-z)^r} = \sum_{k\ge0}\binom{k+r-1}{r-1} z^k.$$ The following lemma presents a closed rational expression for the Hadamard product of an m- and an n-geometric series. Lemma 23. Let m and n be positive integers with m ≥ n. Then, for any complex numbers a and b, $$J_m(az)\odot J_n(bz) = J_{m+n-1}(abz)\sum_{j=0}^{n-1}\binom{m-1}{j}\binom{n-1}{j}(abz)^j.$$ PROOF. Suppose a and b are nonzero complex numbers, and m and n are positive integers with m ≥ n. Let $F(z) = \sum_{k\ge0} f(k) z^k$ be the formal power series obtained as the Hadamard product of $J_m(az)$ and $J_n(bz)$. Then for each k ∈ ℕ, the coefficient of $z^k$ in F(z) is $f(k) = \binom{k+m-1}{m-1}\binom{k+n-1}{n-1}(ab)^k$. We see that $$\binom{k+n-1}{n-1} = \frac{(k+1)^{\overline{n-1}}}{(n-1)!} = \sum_{j=0}^{n-1}\binom{n-1}{j}\binom{k}{j}, \qquad (7)$$ where the rightmost part of (7) is obtained by treating $(k+1)^{\overline{n-1}}/(n-1)!$ as a polynomial in k and applying Newton's forward difference formula to it. Using (7) and setting w = abz in F(z), we now obtain (8). Observe that we can factor $J_{n+m-1}(w)$ from the sum in (8). This gives us (9). The identity holds for each i with 0 ≤ i ≤ j. In effect, F(w) in (9) becomes (10). We interpret the inner sum of (10) combinatorially. Let Y be an (m + j − 1)-element set.
Let I ⊆ Y be a j-element subset of marked elements of Y. For each A ⊆ I, we define $S_A$ to be the collection of (m − 1)-element subsets of Y that contain none of the elements of A. By the principle of inclusion-exclusion, we have $$\sum_{A\subseteq I}(-1)^{|A|}\,|S_A| = \sum_{i=0}^{j}(-1)^i\binom{j}{i}\binom{m+j-1-i}{m-1}.$$ This is the number of ways to draw an (m − 1)-element subset of Y that contains all marked elements of I. In turn, this equals the number of ways to choose a j-element subset of an (m − 1)-element set, namely $\binom{m-1}{j}$. A known lower bound for $\tau_{st}(G)$ was presented by Lyons [5] and is connected to the ideas presented in [7]. That is, if $\mathrm{Res}_{st}(G)$ is the electrical resistance between the vertices s and t in a connected graph G whose edges have unit resistance, then $\tau_{st}(G) \ge \mathrm{Res}_{st}(G)/p$. A lower bound for $T_{st}(G)$ can be found by considering the s-t reliability polynomial of G. Definition 29. Let G = (V, E) be a connected graph with m edges and let $R_{st}(G,q) = c_0 + c_1 q + c_2 q^2 + \dots + c_m q^m$ be the s-t reliability polynomial of G, that is, the probability that there is at least one intact s-t path in G if the edges of G fail independently with probability q. The s-t insertion probability, denoted by $\bar R_{st}(G,q)$, is then defined as $$\bar R_{st}(G,q) = \sum_{i=1}^{m}\frac{c_i}{q^i-1}.$$ Theorem 30. Let G = (V, E) be a graph and s, t ∈ V. If d(s, t) is the distance between s and t in G, then $$d(s,t) - 1 + \bar R_{st}(G,q) \le T_{st}(G).$$ PROOF. For each edge e ∈ E, let L(e) be independent geometric random variables with noninfection probability q, representing the edge lengths. Now denote by $G_k = (V, E_k)$, k = 0, 1, 2, …, a discrete random process that gives an infinite sequence of subgraphs $G_k$ of G, where $e \in E_k$ if and only if the event {L(e) ≤ k} occurs. Moreover, denote by $I_{st}$ the random variable of the smallest k such that $G_k$ contains an s-t path. Then the inequality $$\Pr(\{Z_{st} \le d(s,t)+k-1\}) \le \Pr(\{I_{st} \le k\}) \qquad (13)$$ holds for all k ≥ 0. To prove this estimate, consider an elementary event E in $\{Z_{st} \le d(s,t)+k-1\}$, i.e.
a realisation of edge lengths such that the length of a shortest s-t path is at most d(s, t) + k − 1. For E to occur, there must be at least one s-t path in G whose edge lengths sum to at most d(s, t) + k − 1. Every edge length on this path is then at most k. Indeed, suppose one edge of the path had length at least k + 1. The path has at least d(s, t) edges and every edge length is at least 1, so the sum of its edge lengths would be at least (k + 1) + (d(s, t) − 1). This contradicts the assumption $Z_{st} \le d(s,t)+k-1$. As all edge lengths are at most k, the event E also belongs to $\{I_{st} \le k\}$ by definition, i.e., $G_k$ contains this s-t path. Furthermore, the probabilities of the event E coincide in both probability spaces, so (13) follows. By (13), the expectation of the random variable $I_{st}$ is at most the expectation of the random variable $Z_{st} - d(s,t) + 1$. Hence, $$E[I_{st}] = \sum_{k\ge0}\Pr(\{I_{st} > k\}) \le \sum_{k\ge0}\Pr(\{Z_{st} > d(s,t)+k-1\}) = T_{st}(G) - d(s,t) + 1.$$ To relate $E[I_{st}]$ to the s-t reliability polynomial, consider an edge e = {u, v} and the event {L(e) ≤ k}. Since $\Pr(\{L(e) > k\}) = q^k$, we find $\Pr(\{L(e) \le k\}) = 1 - q^k$. Now assume that there are k parallel edges between u and v that fail independently with probability q; then the probability that at least one of them does not fail is also $1 - q^k$. Therefore, the probability of the event $\{I_{st} > k\}$ coincides with the probability that there is no s-t path in the multigraph G(k), where G(k) is obtained from G by replacing each edge of G with k parallel edges. Equivalently, $\Pr(\{I_{st} > k\}) = 1 - R_{st}(G, q^k, \dots, q^k)$, and we can write $$E[I_{st}] = \sum_{k\ge1} k\,\bigl(R_{st}(G,q^k,\dots,q^k) - R_{st}(G,q^{k-1},\dots,q^{k-1})\bigr).$$ Now suppose that the s-t reliability polynomial is expressed as $R_{st}(G,q) = \sum_{i=0}^{m} c_i q^i$. Since $c_0 = R_{st}(G,0) = 1$, we obtain $$E[I_{st}] = \sum_{k\ge0}\Bigl(1 - \sum_{i=0}^{m} c_i q^{ik}\Bigr) = \sum_{i=1}^{m}\frac{c_i}{q^i-1} = \bar R_{st}(G,q),$$ and hence $d(s,t) - 1 + \bar R_{st}(G,q) \le T_{st}(G)$. We have shown that there is a one-to-one correspondence between the spread process and the stochastic shortest path problem.
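This correspondence can be probed numerically. By Theorem 12, sampling independent geometric edge lengths and computing a shortest s-t path should reproduce the distribution of the s-t first arrival time. For the triangle $K_3$ with p = 1/2, solving the recurrence (2) by hand gives $T_{st} = 16/9$. The Python sketch below is our own illustration, not the authors' method. It uses Dijkstra's algorithm, hypothetical function names, and a graph given as a list of vertex pairs, and it assumes that t is reachable from s.

```python
import heapq
import random


def geometric_length(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    length = 1
    while rng.random() >= p:
        length += 1
    return length


def shortest_path_length(edges, s, t, p, rng):
    """Sample D_st: draw a geometric length for every edge, then run Dijkstra."""
    adjacency = {}
    for u, v in edges:
        w = geometric_length(p, rng)  # parallel edges get independent lengths
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    raise ValueError("t is not reachable from s")


def sample_distribution(edges, s, t, p, runs=20000, seed=2):
    """Empirical mean of D_st and empirical probabilities of {D_st = d}."""
    rng = random.Random(seed)
    samples = [shortest_path_length(edges, s, t, p, rng) for _ in range(runs)]
    freq = {}
    for d in samples:
        freq[d] = freq.get(d, 0) + 1
    return sum(samples) / runs, {d: c / runs for d, c in sorted(freq.items())}


if __name__ == "__main__":
    triangle = [(0, 1), (0, 2), (1, 2)]
    mean, dist = sample_distribution(triangle, 0, 2, 0.5)
    print(mean)     # should be close to 16/9
    print(dist[1])  # {D_st = 1} needs the edge {s, t} to have length 1: close to p = 0.5
```

Dijkstra's algorithm applies because all sampled lengths are positive integers. Drawing fresh lengths for every run keeps the samples independent.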
Furthermore, we have shown that this process is well described by the exponential model when the infection probability p tends to zero. Several closed formulae, or at least efficient calculation schemes, are presented for special graphs such as complete graphs, parallel paths, and s-t series-parallel graphs. Finally, we propose some bounds that utilise the well-known concept of the s-t reliability polynomial, which can be computed efficiently for graphs of bounded tree-width. The authors conjecture the following bound for the s-t first arrival time. It is asymptotically sharp as q tends to 1, since $\lim_{q\to1} T_{st}(G)/\tau_{st}(G) = 1$. Conjecture 31. Let G = (V, E) be a graph, s, t ∈ V, and let $T_{st}(G)$ and $\tau_{st}(G)$ be the expected s-t first arrival times of the spread process and the exponential model, respectively. Then $\tau_{st}(G) \le T_{st}(G)$ holds for all 0 ≤ q < 1. [1] Statistical mechanics of complex networks. [2] Shortest paths in stochastic networks with arc lengths having discrete distributions. Networks. [3] The expected length of a shortest path. [4] Shortest paths in networks with exponentially distributed arc lengths. [5] Resistance bounds for first-passage percolation and maximum flow. [6] Network theory and SARS: Predicting outbreak diversity. [7] Random walk and electric currents in networks. [8] The structure and function of complex networks. [9] Network reliability and algebraic structures. The authors Raymond Lapus and Frank Simon receive grants from the German Academic Exchange Service (DAAD) and the European Social Fund (ESF), respectively. The authors thank Frank Göring from the University of Technology in Chemnitz for a direct and elegant proof of the symmetry stated in Corollary 13. Theorem 24 (Parallel reduction technique).
Let G = (V, E) be a graph and let {s, t} be a separating vertex pair of G such that H and K are two connected nontrivial graphs with H ∪ K = G and H ∩ K = ({s, t}, ∅). Then $$\Phi_{st}(G,z) = 1 - (1-z)\left(\frac{1-\Phi_{st}(H,z)}{1-z}\odot\frac{1-\Phi_{st}(K,z)}{1-z}\right).$$ PROOF. Let $Z_{st}(G)$ be the random variable of the s-t first arrival time in G, and let $Z_{st}(H)$ and $Z_{st}(K)$ be the random variables of the s-t first arrival times in H and K, respectively. Since every s-t path lies entirely in H or entirely in K, we have $Z_{st}(G) = \min\{Z_{st}(H), Z_{st}(K)\}$. As the events $\{Z_{st}(H) > n\}$ and $\{Z_{st}(K) > n\}$ are independent, we have $$\Pr(\{Z_{st}(G) > n\}) = \Pr(\{Z_{st}(H) > n\})\,\Pr(\{Z_{st}(K) > n\}).$$ Multiplying both sides of this equation by $z^n$ and summing over all n ≥ 0 yields $$\frac{1-\Phi_{st}(G,z)}{1-z} = \frac{1-\Phi_{st}(H,z)}{1-z}\odot\frac{1-\Phi_{st}(K,z)}{1-z},$$ which leads, after rearranging terms, to the desired result. We apply the parallel reduction to the series-parallel graph G with s and t as its start and terminal vertices such that {s, t} forms a separating vertex pair. The subsequent result deals with the case in which one of the two subgraphs of G is a path of length r ≥ 1. Corollary 25. Let G = (V, E) be a graph and {s, t} a separating vertex pair of G. Furthermore, assume that H and K are subgraphs of G with G = H ∪ K and H ∩ K = ({s, t}, ∅), and that K is a path of length r connecting s and t. Then the following equations hold: PROOF. For the first arrival times of the path K of length r in G, one finds $\Phi_{st}(K,z) = \bigl(pz/(1-qz)\bigr)^r$, since $Z_{st}(K)$ is the sum of r independent geometric random variables with parameter p. By Theorem 24, the ordinary generating function for the first arrival time of G yields … After evaluating the formal derivative at z = 1 we obtain … Another instance of the parallel reduction technique is given by two parallel paths of lengths n and m between s and t. Corollary 26. Let G = (V, E) be the graph that consists of two parallel paths of lengths n ≥ 1 and m ≥ 1 between the vertices s, t ∈ V, where m ≥ n. Then … PROOF. Let $\Phi_{st}(P_\ell, z)$ be the ordinary generating function for the s-t first arrival time in a path of length ℓ. Then the ratio $\Phi_{st}(P_\ell, z)/(1-z)$ becomes … by Equation (11) and partial fraction decomposition. With ℓ = n and ℓ = m, it follows from the parallel reduction technique that (12) holds. The desired result now follows by applying Lemma 23 to (12). PROOF.
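Removing an edge e ∈ E from G cannot decrease the expected s-t first arrival time. For every realisation of the edge lengths, a shortest s-t path in G − e is at least as long as a shortest s-t path in G, so the claim follows from Theorem 12. By considering a shortest s-t path as a subgraph of G, one obtains the following corollary, which gives an upper bound that is not very tight in general: $T_{st}(G) \le d(s,t)/p$.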
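The two bounds of this section can be compared on small examples. The Python sketch below is our own illustration with hypothetical function names; it does not reproduce the authors' computations. It computes the coefficients $c_i$ of $R_{st}(G,q)$ by brute force over all $2^m$ edge subsets, so it is only practical for small graphs. It then evaluates the lower bound $d(s,t) - 1 + \bar R_{st}(G,q)$ of Theorem 30 and the upper bound d(s, t)/p. Both are compared with the exact value $T_1(K_n)$ for complete graphs. For that exact value it uses our reading of the recurrence in Theorem 20, $T_i = \bigl(1 + \sum_{j=i+1}^{n-1}\binom{n-i-1}{j-i}q^{i(n-j)}(1-q^i)^{j-i}T_j\bigr)/\bigl(1-q^{i(n-i)}\bigr)$.

```python
from collections import deque
from itertools import combinations
from math import comb


def distance(edges, s, t):
    """Breadth-first search distance d(s, t); None if t is unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist.get(t)


def reliability_coefficients(edges, s, t):
    """Coefficients c_0..c_m of R_st(G, q) = sum_i c_i q^i (q = edge failure probability)."""
    m = len(edges)
    c = [0] * (m + 1)
    for mask in range(1 << m):
        present = [edges[i] for i in range(m) if (mask >> i) & 1]
        if distance(present, s, t) is None:
            continue
        f = m - len(present)  # number of failed edges
        # q^f (1 - q)^(m - f) = sum_j binom(m - f, j) (-1)^j q^(f + j)
        for j in range(m - f + 1):
            c[f + j] += comb(m - f, j) * (-1) ** j
    return c


def reliability_lower_bound(edges, s, t, q):
    """d(s, t) - 1 + sum_{i >= 1} c_i / (q^i - 1), the bound of Theorem 30."""
    c = reliability_coefficients(edges, s, t)
    insertion = sum(c[i] / (q ** i - 1) for i in range(1, len(c)))
    return distance(edges, s, t) - 1 + insertion


def path_upper_bound(edges, s, t, q):
    """d(s, t) / p: the expected arrival time along a single shortest s-t path."""
    return distance(edges, s, t) / (1 - q)


def complete_graph_time(n, q):
    """T_1(K_n) from the recurrence read off Theorem 20 (T[i] with |A| = i)."""
    T = [0.0] * n
    for i in range(n - 1, 0, -1):
        total = 1.0
        for j in range(i + 1, n):
            total += comb(n - i - 1, j - i) * q ** (i * (n - j)) * (1 - q ** i) ** (j - i) * T[j]
        T[i] = total / (1 - q ** (i * (n - i)))
    return T[1]


if __name__ == "__main__":
    q = 0.5
    for n in (3, 4, 5):
        k_n = list(combinations(range(n), 2))
        lower = reliability_lower_bound(k_n, 0, n - 1, q)
        exact = complete_graph_time(n, q)
        upper = path_upper_bound(k_n, 0, n - 1, q)
        print(n, lower, exact, upper)  # expect lower <= exact <= upper
```

For a single edge both bounds coincide with the exact value 1/p. For the triangle with q = 1/2 they give roughly 1.52 ≤ 16/9 ≤ 2.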