Predictability of real temporal networks
Tang, Disheng; Du, Wenbo; Shekhtman, Louis; Wang, Yijie; Havlin, Shlomo; Cao, Xianbin; Yan, Gang
Natl Sci Rev, 2020-07-09. DOI: 10.1093/nsr/nwaa015

Abstract: Links in most real networks often change over time. Such temporality of links encodes the ordering and causality of interactions between nodes and has a profound effect on network dynamics and function. Empirical evidence has shown that the temporal nature of links in many real-world networks is not random. Nonetheless, it is challenging to predict temporal link patterns while considering the entanglement between topological and temporal link patterns. Here we propose an entropy-rate framework, based on combined topological-temporal regularities, for quantifying the predictability of any temporal network. We apply our framework to various model networks, demonstrating that it indeed captures the intrinsic topological-temporal regularities, whereas previous methods considered only temporal aspects. We also apply our framework to 18 real networks of different types and determine their predictability. Interestingly, we find that for most real temporal networks, despite the greater complexity of predictability brought by the increase in dimension, the combined topological-temporal predictability is higher than the temporal predictability. Our results demonstrate the necessity of incorporating both temporal and topological aspects of networks in order to improve predictions of dynamical processes.

Link temporality describes the time-varying nature of couplings and interactions between nodes in real networks [1-12], which has been found to significantly affect network dynamics.
Examples include innovative or epidemic diffusion [13], information aggregation [14], the emergence of cooperation [15] and the achievability of control [16]. Hence, in order to alter network dynamical states in a desirable way, it is essential to quantitatively understand both topological and temporal patterns. This raises a fundamental question: how predictable are real temporal networks? This question is much broader than, and distinct from, time-series forecasting [17-19], which aims to predict the future evolution of single variables, and link prediction [20-23], the goal of which is to uncover missing or future links in static networks. Here we offer an entropy-rate-based framework that considers the combined topological-temporal patterns and apply it to a wide range of model and real weighted and unweighted temporal networks, uncovering the prediction limits of real temporal networks. A temporal network with n nodes consists of a series of snapshots (Fig. 1A), which can be described as a 2D expanded matrix M (Fig. 1B). Each column in M represents one snapshot and each row represents the temporality of a possible link, i.e. whether this link is present in a specific snapshot and its weight. Since all pairs of nodes must be taken into account, the number of rows in M is n². The full information of the temporal network is encoded by this matrix M, which can be viewed as a stochastic vector process, i.e. a sequence of random vectors. To quantify the predictability of this vector process, we use the entropy rate, H, i.e. the asymptotic lower bound on the per-symbol description length [24], which is a rigorous measure of the level of randomness in the process. As illustrated in Fig.
1C, H can be calculated using a generalized Lempel-Ziv algorithm [25], the essence of which is to compute the recurrence times of different patterns within a square: a 2D square with side k is defined as $M_{C(k)}$, where $C(k) = \{v = (t, s) \in \mathbb{Z}^2 : 0 \le t \le k,\ 0 \le s \le k\}$ and v denotes the coordinates of an element in M; $\Lambda_v$ represents the smallest integer k such that the block $M_{v-C(k)}$ does not occur within the rectangle $(0, v]$ except at position v, where $0 = (0, 0)$. It has been proven [25] that

$$\liminf_{n \to \infty} \frac{n^2 \log n^2}{\sum_{v \in C(n)} (\Lambda_v)^2} \to H. \tag{1}$$

Thus, the entropy rate H of the matrix M is captured by Eq. (1) when the temporal network has a large number of snapshots (see Supplementary Material, Section II). The predictability Π(M) of a temporal network is the probability that a predictive algorithm can correctly forecast the future evolution of this network based on its history. Once we have the entropy rate H(M), the upper bound of predictability $\Pi^{\max}(M)$ can be obtained by solving

$$H(M) = -\left[\Pi^{\max}\log_2 \Pi^{\max} + (1 - \Pi^{\max})\log_2\frac{1 - \Pi^{\max}}{N - 1}\right], \tag{2}$$

where N is the number of unique values in matrix M (see the 'Methods' section and Supplementary Material, Section III). Here, $\Pi^{\max}(M)$ is the fundamental limit of predictability, i.e. in principle, no algorithm can predict the temporal network with an accuracy higher than $\Pi^{\max}(M)$. It is worth stressing that the entropy rate obtained with the generalized Lempel-Ziv algorithm is an asymptotic measure of randomness, and Eq. (1) becomes more accurate as the number of time steps grows. Hence, given the finite number of snapshots in real temporal network datasets, the calculated value of $\Pi^{\max}(M)$ should be interpreted as an asymptotic estimate of the upper bound of predictability. Moreover, $\Pi^{\max}(M)$ is an intrinsic property of the temporal network and does not depend on a specific predictive algorithm. The snapshots of real temporal networks are usually very sparse, so most rows in M consist of many zeros. Thus, we sort the rows, i.e.
all potential links, in descending order according to the number of their occurrences across all snapshots, and remove those links that are present in <10% of the snapshots, obtaining a new matrix M̃ (see the 'Methods' section). Our analyses of both model and real networks show that the filtering process and the ordering of rows in M̃ have a negligible effect on the predictability (see Fig. 1D, and also Supplementary Material, Sections IV and V); therefore, we use $\Pi^{\max}(\tilde{M})$ to quantify the predictability of temporal networks hereafter. Note that the original entropy rate (Eq. 1) applies to square matrices only, although the matrix M̃ of a temporal network can be non-square. To overcome this issue, we split the original matrix into smaller squares with shorter history and find a linear relationship between the predictability and the number of squares, implying that a longer history leads to higher predictability and allowing us to calculate the predictability of any temporal network (see the 'Methods' section and Supplementary Figs 1 and 2 in Supplementary Material, Section III). Next, we test and validate our measure, $\Pi^{\max}(\tilde{M})$, i.e. the topological-temporal predictability (TTP), on synthetic weighted temporal networks (Fig. 2A). The initial snapshot is a network with communities generated by a stochastic block model [26], with links assigned random weights. In each subsequent snapshot, to generate a neighbor correlation for each link, we activate either the temporal parameter γ or the structural parameter β. With probability β, we modify the structure and the link changes its weight to that of an adjacent link; with probability γ, we modify the temporal aspect and the link weight stays the same as in the last snapshot; otherwise, the link is assigned a random value (see the 'Methods' section).
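The snapshot-generation rule just described can be sketched in a few lines. The following is a minimal Python illustration (the function and argument names are ours, chosen for clarity, not part of any released implementation), assuming the structural and temporal branches are activated with equal probability, as in the Methods:

```python
import random

def next_snapshot(prev_col, weights, beta, gamma, p_beta=0.5):
    """Generate one new snapshot (one column of the matrix) of link weights.

    prev_col : link weights in the previous snapshot.
    weights  : finite set of admissible link weights.
    beta     : prob. of copying a neighboring link's weight (structural memory).
    gamma    : prob. of keeping the previous weight (temporal memory).
    Links are filled top to bottom, so the "neighboring link" is the element
    generated just above in the same column.
    """
    col = []
    for i, w_prev in enumerate(prev_col):
        if random.random() < p_beta:               # structural branch
            if i > 0 and random.random() < beta:
                col.append(col[i - 1])             # adopt the adjacent link's weight
            else:
                col.append(random.choice(weights)) # random value
        else:                                      # temporal branch
            if random.random() < gamma:
                col.append(w_prev)                 # keep last snapshot's weight
            else:
                col.append(random.choice(weights)) # random value
    return col
```

With p_beta = 0 and γ = 1 the snapshot is frozen in time, while with p_beta = 1 and β = 1 every link copies its upper neighbor, which are the two extreme memory regimes of the model.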
Long-range correlations are generated through 2D fractional Gaussian noise (FGN) [27], with a power-law correlation function $C(r, \phi) = r^{-\gamma_x}\cos^2\phi + r^{-\gamma_y}\sin^2\phi$, where (r, φ) are polar coordinates and $\gamma_x$ is regarded as a decay parameter in the temporal dimension, while $\gamma_y$ is for the topological dimension. We compare, in Fig. 2B and C, our TTP with an existing measure, namely temporal predictability (TeP) [28], which considers the links of a temporal network as merely a set of uncorrelated time series and captures only the temporal regularity (see the 'Methods' section), and also with three predictive algorithms. For this, we employ three commonly used methods, namely Markov [29], ConvLSTM [30] and PredNet [31], to forecast the future evolution of real networks. Markov considers a temporal network as a set of uncorrelated time series, ConvLSTM takes link correlations into consideration, and PredNet is a dynamic matrix-prediction algorithm based on ConvLSTM (see Supplementary Material, Section IX for details). As shown in Fig. 2B, TeP is significantly smaller than TTP and, when the parameters β and γ increase, TeP can be seen to be nearly independent of the structural parameter β, due to the fact that TeP only partially characterizes the regularity of temporal networks. Note that, since β and γ are not completely independent of each other, TTP is still slightly higher than TeP even when β and γ approach zero (highest correlations).

[Figure caption] The fluctuations of topological-temporal predictability (TTP) for different orders of rows in matrix M̃. All matrices are extracted from a synthetic temporal network in Fig. 2A with rewiring probability p = 0.5. We also change p and observe that it has no effect on the results (see Supplementary Material, Section V).

The higher accuracy of PredNet relative to the upper bound of predictability provided by TeP, as well as the poor performance of Markov, both indicate the significance of topological information.
The unexpectedly poor performance of ConvLSTM, however, is caused by the deconvolution layer, which introduces errors. We further find similar results for 2D FGN, although the range over which the predictabilities vary is much smaller due to fewer possible values. We also introduce synthetic unweighted temporal networks (Fig. 2D) to validate our measure (TTP). The initial snapshot is a ring and each subsequent snapshot is generated by randomly rewiring a fraction p of the links in the most recent snapshot. Obviously, as p increases, the network becomes more random, and hence less predictable (see Supplementary Material, Section VI). However, the structural consistency of the aggregated network, a measure that captures only topological regularity in static networks [32], shows a conflicting increase in predictability (Fig. 2E), demonstrating again the necessity of considering link temporality. In contrast, our measure, TTP, decreases monotonically as p increases. Yet, due to the high sparsity of these temporal networks, TTP remains high for all values of p. To remove the impact of sparsity, we define and calculate the normalized topological-temporal predictability, NTTP = (TTP − TTP_bl)/(1.0 − TTP_bl) for TTP_bl < 1.0 (see the 'Methods' section), where TTP_bl is the TTP of the shuffled network, which can be viewed as the lower bound of the predictability of temporal networks. For comparison with NTTP, we also normalize TeP over shuffled links (called here the normalized temporal predictability, NTeP; see the 'Methods' section). As shown in Fig. 2F, for p = 0, the network is fully predictable (NTTP ≈ 1.0) and, for p = 1.0, the network becomes totally random and unpredictable (NTTP vanishes). Even though NTeP shows analogous decreasing behavior, it is usually lower than NTTP due to the lack of topological information. Therefore, NTTP indeed captures the intrinsic regularity of temporal networks.
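The evolving small-world model and the normalization above can be sketched as follows. This is a minimal Python illustration with names of our own choosing; the TTP values themselves would come from the entropy-rate calculation described earlier:

```python
import random

def ring_edges(n):
    """Initial snapshot: a ring of n nodes."""
    return {(i, (i + 1) % n) for i in range(n)}

def rewire(edges, n, p):
    """Next snapshot: rewire a fraction p of links uniformly at random,
    keeping the number of links fixed and avoiding self-loops/duplicates."""
    edges = set(edges)
    for e in random.sample(sorted(edges), int(p * len(edges))):
        edges.discard(e)
        while True:
            u, v = random.randrange(n), random.randrange(n)
            if u != v and (u, v) not in edges and (v, u) not in edges:
                edges.add((u, v))
                break
    return edges

def nttp(ttp, ttp_bl):
    """Normalized TTP: rescale against the shuffled-network baseline,
    defined for baselines below 1."""
    return (ttp - ttp_bl) / (1.0 - ttp_bl)
```

Each call to `rewire` preserves the link count, so sparsity (and hence the baseline TTP_bl) is held fixed while p tunes the randomness.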
We apply our framework to 18 real temporal networks in diverse scenarios, including animal interactions, human contacts, online communications, political events and transportation (see Supplementary Material, Section I for a description of these network datasets). We group these networks into five categories and reveal the intrinsic predictability profile, consisting of NTTP and NTeP, for each network (Fig. 3A). We find that human contacts have the highest averaged NTTP, probably resulting from their synchronized bursty nature, while temporal regularities dominate the overall predictability of transportation networks due to the periodicity of each link (see Supplementary Material, Section VIII for details). Since the baselines for the normalizations in NTTP and NTeP are different, NTeP can be higher than NTTP. However, we find another interesting phenomenon in most networks (excluding Enron-Email (EE), Levant-Event (LE), Aviation-Network (AN) and Britain-Transportation (BT)): the intrinsic combined predictability is higher than TeP, despite the greater complexity of capturing 2D regularity rather than 1D. This implies the significance of the topological information as well as the correlation between the temporal and topological patterns. Surprisingly, we also find strong correlations between topological regularity (characterized by the Hamming distance between each link pair) and the difference between TTP and NPIL (normalized predictability of individual links) (see Fig. 3B and C), suggesting that the intrinsic predictability of real networks mostly originates from temporal and topological regularity, rather than from the interdependence between them. Next, we compare our measure to the predictive power of existing algorithms.

Figure 2 caption: (A) To test the impact of link weights on predictability, we develop a temporal stochastic block model with nearest-neighbor correlations (see Supplementary Material, Section VII for details) and long-range correlations [27]. The initial snapshot is generated by the stochastic block model [26], consisting of nodes uniformly assigned to specific communities. There are four communities with 100 nodes and 300 snapshots in each network, and the degree of each node is 3. The link weights in subsequent snapshots are generated according to topological and temporal parameters for the two models, respectively, without changing the network topology. (B) Two predictability measures (TTP and TeP) with three predictive algorithms (Markov [29], ConvLSTM [30] and PredNet [31]) on nearest-neighbor correlations with topological parameter β and temporal parameter γ. The maximum of β or γ corresponds to the strongest memory in the topological or temporal dimension. TeP is obtained by averaging PIL. (C) Two predictability measures with three predictive algorithms on long-range correlations with a power-law correlation function $C(r, \phi) = r^{-\gamma_x}\cos^2\phi + r^{-\gamma_y}\sin^2\phi$, where (r, φ) are polar coordinates and $\gamma_x$ is regarded as the temporal parameter, while $\gamma_y$ is the topological one. Results are averaged over 10 independent realizations of the networks. (D) To test the impact of network topology on predictability, we develop an evolving small-world network model. The first snapshot is a ring network; subsequent topologies are generated by randomly rewiring a fraction p of the links in the previous snapshot. (E, F) Predictabilities of evolving small-world networks against rewiring probability. The networks are generated by the model in (D) with 50 nodes and average degree 2. Structural consistency (SC) is an existing predictability measure for static undirected and unweighted networks [32]. We normalize TTP over TTP_bl to eliminate the impact of link sparsity (see the 'Methods' section), obtaining the intrinsic predictability of a temporal network, and also obtain normalized TeP for comparison (see the 'Methods' section).
We find that the existing algorithms described above mostly fall short in prediction (see Fig. 4). Indeed, for a few networks their accuracy is higher than the maximum predictability found by TeP. This is probably because TeP fails to incorporate the topological aspects. However, we find that the predictability given by our TTP measure always remains out of reach of the current algorithms, indicating again that TeP alone cannot characterize the regularities in temporal networks.

Figure 4 caption: Predictive power of existing algorithms. Markov considers a temporal network as a set of uncorrelated time series [29], ConvLSTM takes link correlations into consideration [30], and PredNet is a dynamic matrix-prediction algorithm based on ConvLSTM [31] (see Supplementary Material, Section IX for details). Error bars show the standard deviation of each algorithm over 10 different runs. Note that none of the algorithms reaches our topological-temporal predictabilities of real temporal networks. The accuracy of at least one algorithm is higher than TeP on the AC, Marseilles-Contact (MC), Workplace-Contact (WC), College-Forum (CF) and CM networks.

We developed a 2D framework, based on combined topological-temporal features, for quantifying the intrinsic predictability and uncovering the predictability profile of any temporal network. Importantly, we find that the accuracy of current algorithms can be higher than temporal-only predictability bounds for some real temporal networks, yet it never exceeds our TTP measure. Given that predictability is an essential property of temporal networks, our findings suggest that more accurate predictive algorithms are needed to capture the regularities of real temporal networks, i.e. there is room for researchers to continue improving their predictive algorithms.
In addition, applying our measure of predictability to detect the change points of temporal networks, and systematically investigating the impact of predictability on dynamical processes and control on temporal networks, are worthwhile future pursuits.

As mentioned above, most rows in matrix M for a real temporal network consistently remain zero. Such link sparsity leads to high predictability. We sort the links $\ell_1, \ell_2, \ldots, \ell_n$ in the matrix M by their activation rates $a_1, a_2, \ldots, a_n$ in descending order, hence $a_1 > a_2 > \cdots > a_n$, and then obtain M̃ according to the filtering rules. Since the estimation of the entropy rate converges to the real entropy when the size of the matrix goes to infinity [25], we include at least the $m_\theta$ most active links or 60% of the non-zero elements in the matrix to diminish errors. Due to computational restrictions, we set $m_\theta = 1000$, although, in principle, it could be higher with sufficient resources. All calculations are performed on M̃. We show in Supplementary Material, Section IV that the matrix-filtering method has no influence on the results after reducing the matrix M to M̃.

Although a detailed derivation is provided in Supplementary Material, Section II, here we outline the main steps used to derive the upper bound of predictability. The entropy rate of a temporal network, characterized as a random field, is defined as the per-element limit of the block entropy, $H = \lim_{n\to\infty} H(M_{C(n)})/n^2$, where the history of $M_t^l$ is $\tau_t^l \equiv \{M_i^j : (j < t) \text{ or } (j = t \text{ and } i < l)\}$. Suppose $P(M_t^l = \hat{M}_t^l \mid \tau_t^l)$ is the probability that, based on the history $\tau_t^l$, the actual value of $M_t^l$ agrees with our estimate $\hat{M}_t^l$, and $\lambda(\tau_t^l)$ is the probability that $M_t^l$ takes the most likely value given $\tau_t^l$; thus $\lambda(\tau_t^l) = \max_{\hat{M}_t^l} P(M_t^l = \hat{M}_t^l \mid \tau_t^l)$. Let $P(\tau_t^l)$ be the probability of observing a specific history.
It can be demonstrated that the best prediction strategy based on this history is to adopt the most likely value [28]; thus, the predictability of $M_t^l$ is $\pi(\tau_t^l) = \lambda(\tau_t^l)$. Then the overall predictability of a random field is the average over histories,

$$\Pi = \sum_{\tau_t^l} P(\tau_t^l)\,\pi(\tau_t^l).$$

Because the entropy increases as the distribution becomes more uniform, the distribution created by preserving the most likely value $\lambda(\tau_t^l) = p_{\max}$ and setting the remaining probabilities to be equal has an entropy no less than that of the original distribution. Note that $M_v \in A$, where A is the finite set of possible values, and denote $N \equiv |A|$. The entropy of the new distribution is

$$H_F(p_{\max}) = -\left[p_{\max}\log_2 p_{\max} + (1 - p_{\max})\log_2\frac{1 - p_{\max}}{N - 1}\right].$$

Then the solution $\Pi^{\max}(M)$ of $H(M) = H_F(\Pi^{\max})$ is the upper bound of the predictability Π(M). We adopt the entropy estimator [25] as the entropy rate H(M̃) for the calculation of the predictability's upper bound $\Pi^{\max}(\tilde{M})$.

The link sparsity of a temporal network largely determines its predictability even after matrix filtering. To remove the impact of sparsity and obtain the intrinsic predictability of a temporal network, we normalize $\Pi^{\max}(\tilde{M})$ over the baseline (i.e. the predictability of the shuffled network), which captures only the regularity in the link-weight distribution. Therefore, we have NTTP = (p − b)/(1 − b), where p is the original predictability and b is the baseline. It is worth noting that NTeP is the average of NPIL, which is obtained by normalizing PIL over its own baseline PIL_bl.

Since the application of our entropy estimator is limited to square matrices [25], we explore the correlation between the weighted-average predictability and the number of squares by gradually splitting the non-square matrix into a set of units, i.e. 1 × 1 squares, and computing the predictability of each square. We find a linear relationship between the weighted-average predictability and the number of squares, including units. Assume matrix M̃ is split into Q squares $s_1, s_2, \ldots, s_Q$ in the first splitting stage, along with u units, and let $e_{s_1}, e_{s_2}, \ldots, e_{s_Q}$ be the side lengths of the squares; then the areas of the squares are $e_{s_1}^2, e_{s_2}^2, \ldots, e_{s_Q}^2$. It is worth noting that we define the predictability of a unit as 1/|A|, where A is the finite value set of link weights in the temporal network. The weighted-average predictability $p_i$ at stage i, in which the weight of each square equals the proportion of that square's area in the matrix, and the number of squares $N_i$ at splitting stage i satisfy $N_1 = Q + u$. Since there is a linear relationship between $p_i$ and $N_i$, we have $p_i = k N_i + b$, where $1 \le i \le Q - 1$ and k and b are constants. We observe k < 0; this negative linear relationship between $p_i$ and $N_i$ indicates a positive correlation between the length of memory and predictability, since a smaller $N_i$ means larger squares with more memory. We define this as predictive congruency and use it to obtain the TTP of each temporal network with a non-square matrix.

The temporal stochastic block model is used to test the impact of link weights while the topology remains invariant. When generating neighbor correlations in the temporal stochastic block model, we determine the weight of each link individually according to the parameters β and γ. For each link, there is a probability of modifying the link weight based on the structural or temporal aspect. If the structural parameter is selected, the probability for the link to adopt the same weight as its neighboring link is β; when the temporal parameter is activated, the probability for the link to remain the same as in the previous snapshot equals γ. Otherwise, the link is assigned a random value. Suppose there are n links $\ell_1, \ell_2, \ldots, \ell_n$ in the matrix; then the probability density function of the weight $w_i^t$ of link $\ell_i$ at time t is

$$f(w_i^t) = p_\beta\left[\beta\,\delta_{w_i^t w_j^t} + (1 - \beta)\,r\right] + p_\gamma\left[\gamma\,\delta_{w_i^t w_i^{t-1}} + (1 - \gamma)\,r\right],$$

where $i, j \in [1, n]$, $i \ne j$; $p_\beta$ and $p_\gamma$ are the probabilities of choosing the structural parameter β and the temporal parameter γ, respectively, with $p_\beta + p_\gamma = 1$; $\delta_{xy}$ is the Kronecker delta function, and r is a random number. Specifically, we assume $p_\beta = p_\gamma = 0.5$ and generate the matrix column by column, from top to bottom within each column.
If a link is determined to adopt the weight of its neighboring link at a certain time, we assign it the prior link weight, i.e. that of its adjacent element in the matrix. The initial snapshot of the evolving small-world model is a ring network; we then obtain each snapshot by rewiring a fraction of the links in the previous snapshot. TeP is the average PIL in the network (Fig. 1). To eliminate the influence of sparsity, we also normalize PIL over its own baseline to obtain NPIL, with NTeP being the average of NPIL: NPIL = (PIL − PIL_bl)/(1 − PIL_bl) for PIL_bl < 1, where PIL_bl is the PIL of shuffled links. Supplementary data are available at NSR online.

Each temporal network can be considered as a random field on the integer lattice $\mathbb{Z}^2$, i.e. a family of random variables $\{M_v : v \in \mathbb{Z}^2\}$ indexed by a two-dimensional vector $v = (t, s)$. It is natural to assume that each potential link in the temporal network takes weights in a finite set A, i.e. $M_v \in A$ for $v \in \mathbb{Z}^2$. The uncertainty of a random field can be measured by the entropy rate, which equals the minimum rate required to encode the field without any distortion. Let $C(n)$ be the set $\{v = (t, s) : 1 \le t \le n, 1 \le s \le n\}$; the entropy rate of a random field is then defined as $H = \lim_{n\to\infty} H(M_{C(n)})/n^2$, where $M_v$ is a certain element of the matrix M. We use $\Omega_v$ to denote the history of $M_v$. To calculate the entropy rate, we use the recurrence-time estimator of Eq. (1), which has been demonstrated to perform well for random fields on the integer lattice [25]. Let $P(M_v = \hat{M}_v \mid \Omega_v)$ be the probability that the actual value of $M_v$ agrees with our estimate $\hat{M}_v$, and $\lambda(\Omega_v)$ the probability that, given $\Omega_v$, $M_v$ takes the most likely value; thus $\lambda(\Omega_v) = \max_{\hat{M}_v} P(M_v = \hat{M}_v \mid \Omega_v)$. The goal of predictive algorithms is to achieve $\lambda(\Omega_v)$, i.e. to predict the most likely value based on $\Omega_v$. Next, we define the predictability $\Pi(t, s)$ of a certain element of the random field based on its history.
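For concreteness, the recurrence-time estimator can be sketched in code. The following naive Python illustration (square blocks ending at each position, searched over the full preceding rectangle, with entropy measured in bits) reflects our reading of the estimator; the exact block and boundary conventions of [25] may differ, and all names are ours:

```python
from math import log2

def block(M, r, c, k):
    """k x k block of M whose lower-right corner is (r, c)."""
    return tuple(tuple(M[i][c - k + 1:c + 1]) for i in range(r - k + 1, r + 1))

def recurrence_length(M, r, c):
    """Smallest k such that the k x k block ending at (r, c) does not
    occur, ending at another position, inside the rectangle [0..r] x [0..c]."""
    k = 1
    while True:
        if k > r + 1 or k > c + 1:
            return k  # block would leave the observed history: treated as novel
        target = block(M, r, c, k)
        seen = any(block(M, i, j, k) == target
                   for i in range(k - 1, r + 1)
                   for j in range(k - 1, c + 1)
                   if (i, j) != (r, c))
        if not seen:
            return k
        k += 1

def entropy_rate(M):
    """Recurrence-time estimate of the entropy rate of a square matrix M,
    following the form  n^2 log2(n^2) / sum_v (Lambda_v)^2  of Eq. (1)."""
    n = len(M)
    s = sum(recurrence_length(M, r, c) ** 2
            for r in range(n) for c in range(n))
    return n * n * log2(n * n) / s
```

This brute-force search is only meant to make the definition concrete; a constant matrix yields long recurrence lengths and hence a low entropy estimate, while an irregular matrix yields short ones and a high estimate.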
Let $P(\omega_v)$ be the probability of observing a specific history $\omega_v$. The overall predictability Π of a random field can be obtained by averaging over histories, $\Pi = \sum_{\omega_v} P(\omega_v)\,\lambda(\omega_v)$. Since the function $\mathcal{F}(x)$ monotonically decreases with x, setting $\mathcal{F}(x) = H(M)$ gives a value of x that is no less than Π; this value, denoted $\Pi^{\max}$, is the upper bound of the predictability, i.e. $\Pi^{\max} \ge \Pi$.

The entropy-rate estimator applies only to square matrices; here we generalize the calculation of predictability to general matrices. Specifically, we split a non-square matrix into a set of squares according to the following rules: start from the left or the top of the matrix; split off the largest square from the remaining part; a unit cannot be split (see Fig. S1C). This splitting process is executable and unique for any non-square matrix, and the predictability of the non-square matrix is defined as the weighted-average predictability of all the square matrices split from it, where the weight of a square equals the proportion of its area to the original matrix area. In order to discover the relationship between the number of squares and the predictability, we continue to successively split all squares into units. To ensure the uniqueness of the splitting procedure, we split the closest square matrix into a set of units to obtain the subsequent phase, and recompute the predictability for the current splitting scheme. The process is repeated until the whole matrix is completely split into units. Suppose the cardinality of the finite set A from which the potential links of a temporal network take weights is |A|; the predictability of a unit is then 1/|A|, since no historical information remains. As shown in Fig. S1A, we find a linear relationship between the predictability and the number of squares N.
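Because $\mathcal{F}$ is monotone on $[1/N, 1]$, the upper bound can be recovered numerically by bisection. A minimal Python sketch (the function name is illustrative):

```python
from math import log2

def fano_bound(H, N, tol=1e-10):
    """Solve  H = -p*log2(p) - (1-p)*log2((1-p)/(N-1))  for p on [1/N, 1].

    The right-hand side decreases monotonically on this interval (it equals
    log2(N) at p = 1/N and 0 at p = 1), so the root, the predictability
    upper bound Pi_max, is found by bisection. N >= 2 is the number of
    distinct values the matrix elements can take.
    """
    def F(p):
        if p >= 1.0:
            return 0.0
        return -p * log2(p) - (1 - p) * log2((1 - p) / (N - 1))
    lo, hi = 1.0 / N, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > H:   # entropy still too high: root lies at larger p
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, a binary matrix (N = 2) with entropy rate 1 bit per element is maximally random and gives an upper bound of 0.5, i.e. no better than chance.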
Since our measure, the topological-temporal predictability (TTP), corresponds to the value obtained when there is only one square split from the original matrix, we can obtain TTP by extending the linear relationship to the point N = 1 (Fig. S1B). We examine this calculation in all the other real temporal networks, finding minor deviations only in the AN and LE networks (Fig. S6). Suppose matrix M̃ is split into squares $s_1, s_2, \ldots, s_Q$, along with u units, in the original splitting phase, and the square sizes are $e_{s_1}, e_{s_2}, \ldots, e_{s_Q}$, respectively; then the predictability acquired at stage i is the corresponding area-weighted average.

[Figure S1 caption fragment] The area approaching the axis is zoomed in; there we use the linear relationship to obtain the TTP of a matrix. (C) Different phases generated by the splitting process. Each matrix corresponds to a splitting phase, and the color of each square represents the splitting order: green denotes the first square to be split in the next phase, orange the last square to be split, while blue represents units. In this instance, we obtain six predictabilities from the corresponding splitting phases.

To obtain TTP_baseline, we usually need to average over many different realizations of the shuffled network to diminish errors; yet, considering the high time complexity of TTP, we attempt to find the minimum number of realizations that roughly achieves the same goal. We obtain 100 different shufflings of a network and average over 100 different combinations of those realizations. Figure S3 indicates stabilization once we average over 40 different realizations of the shuffled network; thus we adopt the average of at least 40 runs as the TTP_baseline of the network. Due to the extremely high sparsity of real-world networks, we use matrix filtering to remove node pairs that never or seldom have connections (see Methods).
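The greedy splitting rule, the area-weighted average, and the extrapolation to N = 1 can be sketched as follows (a minimal Python illustration; the per-square predictabilities would come from the entropy-rate estimator, and the names are ours):

```python
def split_into_squares(rows, cols):
    """Greedily split a rows x cols region into the largest possible squares,
    as in the splitting rule above; returns the side lengths (an exact tiling)."""
    sides = []
    while rows > 0 and cols > 0:
        k = min(rows, cols)
        sides.append(k)
        if rows >= cols:
            rows -= k   # a strip of height rows-k remains
        else:
            cols -= k   # a strip of width cols-k remains
    return sides

def weighted_predictability(sides, preds):
    """Area-weighted average of per-square predictabilities."""
    total = sum(k * k for k in sides)
    return sum(p * k * k for k, p in zip(sides, preds)) / total

def extrapolate_ttp(ns, ps):
    """Least-squares line p = k*N + b through (N_i, p_i); TTP is its value at N = 1."""
    n = len(ns)
    mx, my = sum(ns) / n, sum(ps) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(ns, ps))
         / sum((x - mx) ** 2 for x in ns))
    return k * 1 + (my - k * mx)
```

For instance, a 5 x 3 matrix is tiled into squares of sides 3, 2, 1, 1, and extrapolating a set of (N_i, p_i) points with negative slope to N = 1 yields the TTP of the whole matrix.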
By changing the portion of the matrix being used, we find that, although TTP and TTP_baseline increase as more inactive links are included, matrix filtering has no impact on NTTP once 30% of the most active links are included in the matrix. We continue to remove inactive links from the remaining matrix of real-world networks and find that NTTP remains stable at an early stage (Sec. VII). In this section, we systematically examine the impact of row order. First, we use a Genetic Algorithm (GA) to obtain the specific order of rows that achieves the highest or lowest predictability. In the GA, the row order is encoded as the chromosome, and TTP serves as the fitness function (16). Assume there are n rows in the matrix and the ordering sequence is $O = (o_1, o_2, \ldots, o_n)$, where $o_i$ represents the sequence number of a row; evidently $o_i \in \{1, 2, \ldots, n\}$ and $o_i \ne o_j$ for $i \ne j$. The genetic sequence is denoted by $G = (g_1, g_2, \ldots, g_n)$, and $B = (b_1, b_2, \ldots, b_n)$ represents the ascending sequence $(1, 2, \ldots, n)$. The encoding process is then as follows: 1) suppose $o_i = b_j$, i.e. the position of $o_i$ in B is j; then set $g_i = j$; 2) delete $b_j$ from B; 3) repeat 1) and 2) until the end. Since $g_i \in \{1, 2, \ldots, n - i + 1\}$, in crossover operations we exchange only the corresponding parts of two chromosomes, and in mutation operations we mutate a gene only within its value range. Let $D = (d_1, d_2, \ldots, d_n)$ be the decoding sequence; the decoding process is quite similar to encoding: 1) suppose $b_j$ is the $g_i$-th element of B; then $d_i = b_j$; 2) delete $b_j$ from B; 3) repeat 1) and 2) until the end. The parameters are population size 50, number of generations 100, crossover probability 0.5, and mutation probability 0.001. To retain good individuals with higher probability in the selection process, we use an exponential function to enlarge the difference between good and bad individuals. The goal of the genetic algorithm is to obtain not only max(TTP), the maximum value of TTP over row orders, but also min(TTP), the minimum value.
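The order-based encoding and decoding steps can be sketched directly. This is a Python illustration of the procedure as we read it (1-based positions, as in the description above); the GA machinery itself (selection, crossover, mutation) is omitted:

```python
def encode(order):
    """Lehmer-style encoding of a row order: g_i is the 1-based position
    of o_i among the not-yet-used row numbers."""
    pool = sorted(order)            # the ascending sequence B
    chrom = []
    for o in order:
        j = pool.index(o) + 1       # position of o_i in the shrinking pool
        chrom.append(j)
        pool.remove(o)              # delete b_j from B
    return chrom

def decode(chrom):
    """Inverse of encode: rebuild the row order from the chromosome."""
    pool = list(range(1, len(chrom) + 1))
    return [pool.pop(g - 1) for g in chrom]
```

Because $g_i \in \{1, \ldots, n - i + 1\}$ by construction, position-wise crossover and in-range mutation always produce a chromosome that decodes to a valid permutation, which is the point of this encoding.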
We extract a group of submatrices of different sizes from the real temporal network AN and observe the variations in max(TTP) and min(TTP), as well as the TTP of 100 random orders. For the datasets ME and RM, there are fluctuations in TTP and NTTP when the snapshot duration changes, but their varying patterns are quite similar (Fig. S7). When the snapshot duration becomes smaller and the number of snapshots increases, there is a short decline and then a steady increase in TTP. The varying patterns of TTP and NTTP are more similar for ME, while the NTTP of RM fluctuates within a small interval. TTP changes in the opposite trend to link density, mainly due to the more predictable nature of sparser networks, especially for the same network with different time windows. Baselines are quite close to TTP in both datasets, implying that network sparsity is the most significant contributor to the high predictability. In this paper, we adopt the most frequently used snapshot duration for each real network. To test the impact of data incompleteness on the predictability of temporal networks, we remove a proportion of links in each real dataset and calculate its NTTP. In this section, we examine the predictability of a part of a temporal network (Fig. S9). These results indicate that the predictability of a real temporal network can be well estimated from its submatrices. In particular, when the network is extremely large, using submatrices to calculate NTTP can dramatically reduce the time complexity at the cost of less than 5% error for all the datasets studied in this paper. To determine the cause of the high predictability, we study the correlation between TTP and TTP_baseline. Interestingly, TTP is proportional to TTP_baseline for both model and real networks, indicating that theoretically there is significant room to improve the quantification of predictability. Furthermore, the high predictability mostly comes from the distribution of link weights, i.e.
the sparsity of networks, further underlining the importance of adopting the normalized predictability when analyzing the intrinsic predictable nature of temporal networks.

Figure S11. Correlation between TTP and TTP baseline for real networks as well as the temporal stochastic block model.

While there seems to be no significant pattern in animal interactions and political events, the activities in human-contact networks are quite bursty and highly synchronized (Figure S12). The fundamental cause lies in the regular nature of human life. Being less restricted by space and distance than proximity networks, online communications are considerably less bursty and synchronized, leading to generally lower predictability. The periodicity of BT contributes to its highly predictable nature, while the less regular pattern of AN underlies its more random character. Although a periodic pattern is only observed in human-related networks (human contacts, online communications and transportation) in Figure S12, we find that all the real networks are highly synchronized (see Figure S13), implying that contacts or events tend to happen at the same time. The symmetry and local maxima also accord well with the observations in Figure S11, again supporting the periodicity of real networks. Although there is no significant correlation between NTTP and burstiness for the real networks (see Figure S14), human contacts show both the highest predictability and the most bursty behavior among all networks. Based on the characteristics of real networks, burstiness is thus one of the main causes of the highly predictable nature of human contacts.

The Markov predictor treats a temporal network as a set of uncorrelated time series and uses only the temporal information of each sequence for prediction.
Suppose the time series is X, its length is L, and the memory length is set to k; then the time series is divided into L − k groups, of which the i-th group consists of {x_i, x_{i+1}, …, x_{i+k}}, with {x_i, x_{i+1}, …, x_{i+k−1}} being the history used to predict x_{i+k}. These (k + 1)-length groups are randomly shuffled, since there are no inter-group correlations under this circumstance, and 70% of the groups are used as the training set while the remaining 30% form the test set. For each link in the temporal network we construct a k-th order transition matrix, in which the rows are the input states {x_i, x_{i+1}, …, x_{i+k−1}} and the columns are the output states of the next timestamp, x_{i+k}. We estimate the transition matrix from the training set and use the test set to evaluate its performance. When predicting a time series, the k symbols preceding the target determine the input state, and the output state with the highest transition probability in the matrix is the output of the algorithm.

The Long Short-Term Memory network (LSTM) (17) is widely considered an effective model for processing sequential data and is used as a basic model in this paper. The most significant difference between predicting a time series and predicting a network is that there are correlations among the links of a network, which can be exploited to improve prediction. We therefore adopt the convolutional LSTM (ConvLSTM) (18), an improved version of the LSTM network, which captures the spatiotemporal correlations of a network rather than just its temporality. Assume there are T snapshots in a temporal network, and the history length is set to h, which means we use the h preceding snapshots to predict the next one. As a result we obtain T − h groups of network packages for prediction. For each network package, the former h snapshots are used as the input of ConvLSTM, with the last one as the output, or label, of ConvLSTM, since this is supervised learning. 70% of the network packages are used as the training set and 30% as the test set.
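The per-link k-th order Markov baseline described above can be sketched as follows (an illustrative implementation with our own variable names; for simplicity it uses a deterministic 70/30 split in place of the random shuffle, which is equivalent in expectation when groups are exchangeable):

```python
from collections import Counter, defaultdict

def markov_predictor(series, k, split=0.7):
    """k-th order Markov baseline for one link's time series.
    Returns the prediction accuracy on the held-out 30% of groups."""
    # build the L - k groups {x_i, ..., x_{i+k}}
    groups = [tuple(series[i:i + k + 1]) for i in range(len(series) - k)]
    cut = int(split * len(groups))
    train, test = groups[:cut], groups[cut:]

    # transition counts: history (k symbols) -> frequencies of the next symbol
    trans = defaultdict(Counter)
    for g in train:
        trans[g[:k]][g[k]] += 1

    correct = 0
    for g in test:
        counts = trans.get(g[:k])
        # predict the most probable output state; fall back to 0 (inactive link)
        pred = counts.most_common(1)[0][0] if counts else 0
        correct += (pred == g[k])
    return correct / len(test)

acc = markov_predictor([0, 1] * 50, k=2)  # perfectly alternating link -> 1.0
```

Because the predictor sees each link in isolation, any cross-link regularity is invisible to it, which is exactly the gap ConvLSTM is meant to close.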
For the ConvLSTM, the inputs X_1, X_2, …, X_t, cell outputs C_1, C_2, …, C_t, hidden states H_1, H_2, …, H_t, and gates i_t, f_t, o_t are three-dimensional tensors, while their counterparts in the FC-LSTM are two-dimensional vectors. Therefore we transform the two-dimensional m_1 × m_2 matrix (with m_2 = n², so that each snapshot becomes an n × n frame) into a three-dimensional m_1 × n × n tensor as the input. The ConvLSTM determines the future state of a certain cell in the grid from the inputs and previous states of its local neighbors, which is achieved by using a convolution operator in the state-to-state and input-to-state transitions (see Figure S17). The ConvLSTM cell otherwise has the same structure as the LSTM cell (see Figure S17).

Since each link of the network takes values in a discrete set, the output of our predictive algorithm should also be discretized. Therefore, before evaluating a prediction, we discretize it to the closest element in the value set of the network. Suppose r is the number of correct predictions and w the number of incorrect ones; then the accuracy of prediction for series i is defined as α_i = r/(r + w), and the accuracy of predicting the whole network is the average over all link series, α = (1/n²) Σ_i α_i.

PredNet is a deep recurrent neural network inspired by the principles of predictive coding from the neuroscience literature. It is trained for next-frame video prediction, on the premise that prediction is an effective objective for unsupervised learning. We adopt a three-layer PredNet model as a predictive algorithm in this paper, and its input and data preprocessing are the same as those of the ConvLSTM.
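The discretization step and the accuracy α described above can be sketched as follows (illustrative NumPy code; the function names are ours, and the value set {0, 1} in the example corresponds to an unweighted network):

```python
import numpy as np

def discretize(pred, values):
    """Snap each continuous prediction to the closest element of the
    network's discrete value set (e.g. {0, 1} for unweighted links)."""
    values = np.asarray(sorted(values), dtype=float)
    idx = np.abs(pred[..., None] - values).argmin(axis=-1)
    return values[idx]

def network_accuracy(pred, truth, values):
    """Per-series accuracy alpha_i = r / (r + w), averaged over all link series."""
    snapped = discretize(np.asarray(pred, dtype=float), values)
    per_series = (snapped == np.asarray(truth, dtype=float)).mean(axis=1)
    return per_series.mean()

pred = np.array([[0.1, 0.9, 0.4],    # 2 link series, 3 timestamps each
                 [0.8, 0.2, 0.6]])
truth = np.array([[0, 1, 0],
                  [1, 0, 0]])
acc = network_accuracy(pred, truth, values={0, 1})  # (3/3 + 2/3) / 2 = 5/6
```

Snapping to the nearest admissible value before scoring keeps the continuous-output models (ConvLSTM, PredNet) directly comparable with the inherently discrete Markov baseline.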
- Temporal networks
- Empirical analysis of an evolving social network
- Community structure in time-dependent, multiscale, and multiplex networks
- Temporal networks: slowing down diffusion by long lasting interactions
- Spontaneous recovery in dynamical networks
- Causality-driven slowdown and speed-up of diffusion in non-Markovian temporal networks
- Compensating for population sampling in simulations of epidemic spread on temporal contact networks
- Modelling sequences and temporal networks with dynamic community structures
- Dynamic small-world behavior in functional brain networks unveiled by an event-related networks approach
- Tracking brain dynamics via time-dependent network analysis
- A dynamic network approach for the study of human phenotypes
- Reconstruction of stochastic temporal networks through diffusive arrival times
- Impact of non-Poissonian activity patterns on spreading processes
- Gossip-based aggregation in large dynamic networks
- Dynamic social networks promote cooperation in experiments with humans
- The fundamental advantages of temporal networks
- Introduction to Time Series and Forecasting
- Quantifying long-term scientific impact
- Hierarchical structure and the prediction of missing links in networks
- Network-based prediction of protein interactions
- Network-based prediction of drug combinations
- The link-prediction problem for social networks
- Some results regarding the entropy rate of random fields (Corresp.)
- Nonparametric entropy estimation for stationary processes and random fields, with applications to English text
- Stochastic blockmodels: first steps
- Method for generating long-range correlations for large systems
- Limits of predictability in human mobility
- Convolutional LSTM network: a machine learning approach for precipitation nowcasting
- Deep predictive coding networks for video prediction and unsupervised learning
- Toward link predictability of complex networks
- Time-ordered networks reveal limitations to information flow in ant colonies
- The multilayer temporal network of public transport in Great Britain
- Triadic closure in two-mode networks: redefining the global and local clustering coefficients
- Patterns and dynamics of users' behavior and interaction: network analysis of an online community
- The Enron email dataset database schema and brief statistical report (Information Sciences Institute technical report)
- Motifs in temporal networks
- Matching organizational structure and social network extracted from email communication
- Impact of human mobility on opportunistic forwarding algorithms
- What's in a crowd? Analysis of face-to-face behavioral networks
- Estimating potential infection transmission routes in hospital wards using wearable proximity sensors
- Contact patterns among high school students
- Reality mining: sensing complex social systems
- Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers
- A fast and elitist multiobjective genetic algorithm: NSGA-II
- Long short-term memory

Conflict of interest statement: None declared.