key: cord-0005573-x0ygzz4u authors: Wu, Guang; Yan, Shao-min title: Mutation trend of hemagglutinin of influenza A virus: a review from a computational mutation viewpoint date: 2006-05-03 journal: Acta Pharmacol Sin DOI: 10.1111/j.1745-7254.2006.00329.x sha: e9dd01bd0bd4a7b03a5034809e8c251b1b6d8879 doc_id: 5573 cord_uid: x0ygzz4u Since 1999 we have developed two computational mutation approaches to analyze the protein primary structure whose methodology and implications were reviewed in 2002. Our first approach is the calculation of predictable and unpredictable portions of amino-acid pairs in a protein, and the second is the calculation of amino-acid distribution rank in a protein. Both approaches provide quantitative measures to present a protein, which we have used to study a number of proteins with numerous mutations such as p53 proteins. More recently, we focussed our efforts on analyzing the proteins mutating frequently over time such as hemagglutinins of influenza A viruses. In this review we summarise our findings and their implications for hemagglutinin mutations in combination with some newly available data. Our approaches throw light on the true nature of genetic heterogeneity of influenza virus hemagglutinins; that is, the protein variability is highly relevant to its amino-acid construction. Using these approaches, we can monitor new mutations from influenza virus hemagglutinins and may predict their mutations in the future. Influenza A viruses have been responsible for four pandemics of severe human respiratory disease over the last century, resulting in tens of thousands of deaths all over the world [1] [2] [3] [4] [5] . Since 1997, avian influenza virus infections in poultry have taken on new significance, with increasing numbers of cases involving bird-to-human transmission and the resulting production of clinically severe and fatal human infections [6] [7] [8] . At present, two out of three general conditions for the onset of a pandemic have been met; namely, the emergence of a new virus and its ability to replicate in humans causing serious illness. Should the virus achieve efficient human-to-human transmission, the next influenza pandemic might occur [9] . By accumulating point mutations (genetic drift), the influenza A viruses change their antigenic properties mainly in the RNA genes, which code for ten proteins [10] . Hemagglutinin is one of the antigens for neutralizing antibodies and is involved in the binding of virus particles to receptors on host cells [11, 12] . It plays an important role in the propagation of the virus and in the infection of the host [13, 14] . There are two mechanisms to explain the host-mediated variation of influenza A viruses. The first mechanism is the pressure of the antibody [15] . The second is the selective pressure for the appearance of the host cell variant with altered receptor binding specificities [16] . The sequence analysis is widely used to estimate the mutations of influenza viruses [17] [18] [19] [20] . Other approaches have been used to study the mutations of influenza viruses, such as the model of protein evolution [21] , the mathematical model capturing both realistic epidemiological dynamics and viral evolution at the sequence level [22] , the travelling waves in a one-dimensional model [23] , and so on. From 1999 to 2002, we have developed two computational mutation approaches to analyze the protein primary structure, which include (i) the calculation of predictable and unpredictable portions of amino-acid pairs in a protein [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] , and (ii) the calculation of amino-acid distribution rank in a protein [39] [40] [41] [42] [43] . We reviewed these approaches and their implications in 2002 [44] (Table 1) . Since then, we have mainly been applying these approaches to analyze mutations in different proteins. One of the advantages of our approaches is that we can use a single value as a numerical measure to represent a protein, and then compare different proteins among a protein family [45] [46] [47] [48] [49] , evaluate the mutation effect on proteins [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] , reveal the mutation trend in proteins [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] , calculate the mutation periodicity [66, 67] , and trace the mutation process along the time course [66, 67] . In this review we summarise our findings and their implications for mutation features of influenza A virus hemagglutinins in combination with some newly available data. With our first approach (Appendix), we mainly use the unpredictable portion of amino-acid pairs to study the historical trend of hemagglutinin mutations because the unpredictable portion is not engineered by randomness and is deliberately constructed. Also it can present precisely a hemagglutinin with a single value, which is an unambiguous record of hemagglutinin evolutionary process as we can view each hemagglutinin as a sample from its evolution. Figure 1 displays the general mutation trend of influenza A virus hemagglutinins from 1918 to 2005 using the unpredictable type and frequency of amino-acid pairs. Each symbol represents the mean value of unpredictable portion of all full-length hemagglutinins in the given year. Several points can be drawn in Figure 1 : (i) Two regressed lines indicate that the unpredictable portions of amino-acid pairs are decreasing over time; that is, the hemagglutinins are structur- [66] . With our second approach (Appendix) and similar rationale, we use the amino-acid distribution rank to study the historical trend in hemagglutinin mutations. Figure 2 shows the general mutation trend of influenza A virus hemagglutinins from 1918 to 2005 using the amino-acid distribution rank. The regressed lines indicate the trend of the amino-acid distribution rank with respect to different species over time. This measure is quite stable for the human hemagglutinins, which suggests that the functional clusters in human hemagglutinins are relatively stable over time because any permanent re-distribution of amino acids would lead to the change in its regressed line. On the other hand, the amino-acid distribution rank increases in avian and equine hemagglutinins and decreases in swine hemagglutinins over time, which indicates that these hemagglutinins have experienced the re-distribution of amino acids. As a result, the function of these hemagglutinins has either increased or decreased over this period of time; that is, the hemagglutinin structure of influenza A viruses becomes more probabilistically complicated for avian and equine, but less complicated for swine [67] . The fluctuated data in Figures 1 and 2 imply that there may be some kind of periodicity in mutation sensitivity of influenza A virus hemagglutinins. Figure 3 displays the periodicity obtained from 1963 to 2005 using the fast Fourier transform (Appendix), where each stick represents a periodicity and the stick height is the magnitude of change with respect to unpredictable type/frequency of amino-acid pairs and amino-acid distribution rank. For example, the first prominent peak in the unpredictable type of amino-acid pair is located at 2.3 years/cycle with the height of 2.24%, which means the hemagglutinins would experience 9 type mutations (2.24%/0.25%=8.96, Appendix) at an interval of 2.3 years. This observation is different from the findings in our previous study [66] , where we found 5 type mutations based on the data from 1971 to 2004. This difference, of course, results from different databases as more data are documented herein, however it further supports that the number of hemagglutinin mutations is indeed decreasing over time. Notably, the prominent peak in the unpredictable frequency of amino-acid pair is located at a cycle of 11 years, which could approximately be connected to the interval of three pandemics (the second pandemic in 1957, the third in 1968 and the fourth in 1977 [68, 69] ). Compared to the periodicities based upon unpredictable type/frequency of amino-acid pairs and amino-acid distribution rank [67] , we understand that there would be type mutations in quite a short period of time with the change in distribution of functional clusters, then there would be frequency mutations in a relative long cycle. In other words, some types of amino-acid pairs in functional clusters appear or disappear during a short period of time, thereafter the mutations are more likely to modify the existing types of amino-acid pairs. As can be seen in Figure 1 , the historical trend of hemag-glutinin mutations is that the unpredictable portions are decreasing. Thus we would deduce that the larger the unpredictable portion is, the larger the mutation trend is, and this deduction is identical to the findings in our studies on other proteins [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] . Figure 4 illustrates the unpredictable portions of aminoacid pairs and distribution rank of amino acids in different hemagglutinin subtypes from influenza A viruses. Although each hemagglutinin subtype has different sensitivity to future mutations, a slight trend can be found in the top panel, for example, the unpredictable type decreases from H2 to H4, from H5 to H7, and increases from H11 to H14. We are particularly interested in H1, H2, H3, H5, H7, and H9 because they are linked to previous human infections. It is important to note that the unpredictable type of H5 hemagglutinins is remarkably larger than others, so the H5 hemagglutinins have a stronger mutation trend than others [46, 66] , we would there- fore expect to see more mutations in H5 hemagglutinins until their unpredictable type reaches the average level, which is currently about 63.3%. This means that we would anticipate 12 type mutations in the future (3%/0.25%) seeing that the mean value of unpredictable type is about 66.5% in H5 hemagglutinins. However, H5 hemagglutinins do not appear remarkably larger than the others in unpredictable frequency and distribution rank (the middle and bottom panels in Figure 4 ). As mentioned in the Appendix, the unpredictable type includes both absent and present types of amino-acid pairs, whereas the unpredictable frequency and distribution rank deal only with the amino acids present in a protein. Thus, the unpredictable type of amino-acid pair does indicate the mutation trend for the future, as some absent types can appear through mutations. Although the percentage of unpredictable portions indicates the mutation trend, the unpredictable portions include a large number of amino-acid pairs, for example, the hemagglutinin of 1918 "Spanish" influenza virus contains 565 amino-acid pairs, of which 434 are unpredictable (76.81%). To search the amino-acid pair with big mutation potency, we use the difference between actual and predicted frequencies (Appendix) to determine the amino-acid pairs sensitive to mutations because this difference can serve as a measure of the structural stability of an amino-acid pair; that is, the smaller the difference the more stable the construction of the amino-acid pair. In particular, the larger the positive difference the more unpredictable the present amino-acid pair and, in contrast, the larger the negative difference the more unpredictable the absent amino-acid pair. In practice, the aminoacid pairs whose actual frequency is larger than their predicted one are likely to be targeted by mutations. In contrast, the amino-acid pairs whose actual frequency is smaller than their predicted one are more likely to be formed through mutation according to our previous studies [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] . Hence, the hemagglutinin containing a large number of amino-acid pairs with a big difference between actual and predicted frequencies is more sensitive to mutations than that with a large number of amino-acid pairs with a small difference between actual and predicted frequencies. Figure 5 gives us an idea of such an analysis, where two aspects can be seen. (i) The bars are not symmetric with respect to the zero difference (x-axis), indicating that there are more amino-acid pairs with a positive difference and less with a negative difference. In general, there are more amino-acid pairs being targeted by mutation than those formed through mutation. (ii) The avian hemagglutinins have more aminoacid pairs with larger positive differences than human ones. For instance, the largest positive difference is up to 9 in avian, but only 6 in humans. Seeing that the species vulnerability depends on the number of amino-acid pairs with larger differences, there would be more mutations in avian, this elucidates why so many mutations have been found in avian influenza viruses, and supports the notion that the avian species harbour a large reservoir of influenza virus strains [70] [71] [72] [73] [74] [75] . In addition, we can compare the mutation trend among different birds by reason that they are directly correlated to the occurrence of bird flu. Figure 6 shows that there are more unpredictable types in aquatic avian than in terraneous ones, thus water flock, especially aquatic birds and geese, are more sensitive to mutations [66] . These features provide the evidence supporting the hypothesis that aquatic birds are the primordial source of all influenza viruses in other species [1, 76] . Because unpredictable portions of amino-acid pairs and their fluctuation are becoming smaller along the time course (Figure 1 ), the trend line and channel clearly suggest the future of influenza virus hemagglutinins (Figure 7 ). Generally speaking, we would expect that the unpredictable portions and their fluctuation are continuing smaller along the trend line and channel although some fluctuations may go beyond the band. In such a case, there would be fewer pandemics in this century than in the last one. This presumption can be supported by the fact that there were four major influenza epidemics/pandemics recorded during the less than 20 years between 1830 and 1848 [77] . It was quite the opposite, with only four influenza pandemics recorded in the last century. These phenomena can be explained by our approach [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] 66] . The unpredictable amino-acid pairs should deliberately be conserved for a certain purpose because their construction requires more time and energy, so that a protein maintains only absolutely necessary numbers of unpredictable aminoacid pairs. During the evolutionary process, nature is trying to minimize the unpredictable portion through mutations, which may bring about new unpredictable amino-acid pairs, and the newly introduced mutations result in the fluctuations [66] . Based on the fast Fourier transform of unpredictable type of amino-acid pairs, we can approximately designate the hemagglutinin evolutionary process with a cycle of around 7 years, and furthermore we can estimate our position at the current cycle of hemagglutinin evolutionary process to determine how many years remain before the next spike, which may be related to severe mutations. Figure 8 demonstrates our position at the current cycle of hemagglutinin evolutionary process, and we are approaching the last year of the cycle. Compared with five historical lines, the unpredictable type of amino-acid pairs has a 3/5 chance of going up. The undetermined factor for future mutation is the impact, which can either lead to the mutation or has no effect, meanwhile a mutation can occur without any impact (Appendix). Figure 11 shows the change in amino-acid distribution rank of different hemagglutinin subtypes from 1992 to 2002. Although we have yet to know what the impact was in 1997, the impact did interrupt the pattern of amino-acid distribution ranks in different subtypes of hemagglutinins along the time course. Nevertheless, the hemagglutinin of influenza A virus must have the ability to mutate itself, otherwise no impact would have an effect on it. This ability is the probability of the occurrence of mutation, which can be determined using the cross-impact analysis with Bayesian equation (Appendix). Table 2 shows the probability of the occurrence of mutation in different subtypes of hemagglutinins and neuraminidases. As can be seen in Table 2 , the H5 hemagglutinins have the maximal of 0.427 chance of occurrence of mutation without any impact, which is the largest probability of occurrence of spontaneous mutations among different hemagglutinin subtypes. Also, the chance of occurrence of spontaneous mutation is larger in N1 neuraminidases than in N2 ones [65] . Thus, the H5N1 viruses have the largest chance of simultaneous occurrence of mutations among different subtypes of hemagglutinins and neuraminidases, which can explain why there are so many mutations in H5N1 influenza viruses [72] [73] [74] [75] [76] . In fact, the mutation chance listed in the lower part of Table 2 is quite small if we consider that these probabilities represent the largest chance of simultaneous occurrence of mutation. This means that the impact does play an important role in the enhancement of the occurrence of mutation. Thus, we need to implement the probability of occurrence of mutation P(1) into a dynamic frame; that is, the chance of occurrence of mutation with different intensities of an impact. This is defined by the cross-impact analysis as the impacted probabilities P(1/2) and P(1/2) with equations 1 and 2 (Appendix). Figure 12 illustrates the impacted probabilities with respect to the probability of occurrence of mutation P(1) and to the probability of occurrence of impact P(2) in different hemagglutinin subtypes from influenza A viruses, which can be read as follows. First, the left panels show these three probabilities in a 3-dementional configuration: the P(1) in xaxis, the P(2) in z-axis and the impacted probabilities in yaxis. Second, both impacted probabilities P(1/2) and P(1/2) are represented by two triangles ABC and ACD, respectively. Third, they are transferred into a 2-demensional figure (black and gray triangles in the right panels), so the dynamics of impact on the occurrence of mutation can be viewed easily and clearly [65] . Three clues to insights can be drawn from Figure 12 : (i) as the intensity of the impact P(2) increases, probability of the occurrence of spontaneous mutations P(1|2) decreases (black triangle ABC), whereas the probability of the occurrence of P(1/2) induced mutations P(1/2) increases (gray triangle ACD); (ii) the impact can significantly enhance the probability of the occurrence of mutations P(1), whose range enlarges to the top-right corner from bottom-left; for example, the P(1) in H1 hemagglutinins changes to the range of 0.653-1 from the range of 0-0.347 (top panel); and (iii) among different hemagglutinin subtypes, the probability of the occurrence of mutations (both the black and gray triangles) is large in H5 hemagglutinins, which further supports the view that H5 hemagglutinins mutate more frequently. Thus, Figure 12 provides a quantitative way to predict the mutation trend of influenza virus hemagglutinins if we could predict an impact and define its intensity. In this review, we summarise the results at the first stage of our studies [46, 48, [65] [66] [67] 78] ; that is, we focus mainly on the general trend of hemagglutinin mutations. This is not only because we are still developing our methods, but also because at this stage the systematic trends are more important than the trends in an individual case, which is masked by numerous random and uncertain factors. We hope to understand how the principal average or trend arises before we are able to explain what is causing individual cases to depart from it. In our future studies, one direction is to monitor the change in unpredictable portions and distribution rank with new available data, since we should be alert to a possible outbreak of influenza and bird flu if a significant change is recorded. The other direction is to predict future mutations, because given a mutation in a protein, our first approach reveals the mutation trend of an individual amino-acid pair; our second approach indicates the sensitive position, and our third approach, which is the translation probability between RNA and mutated amino acid [79] , highlights the possible mutated amino acid. Together with these three approaches, we may predict the mutations at a specific amino acid in a particular region of influenza virus hemagglutinins in the future. The rationale and detailed description of our approaches have been published in all of our previous publications 78] ; however, as they are not yet familiar to many researchers, we describe our methods with the example of the hemagglutinin of 1918 "Spanish" influenza A virus (accession number AF117241), which is the earliest complete sequence documented in the data bank [80, 81] . As we know that an amino-acid pair in a protein is composed of any 20 kinds of amino acids, so theoretically there are 400 possible types of amino-acid pairs. In terms of amino-acid pairs, the distinguishing of protein differs in the number of possible types of amino-acid pairs and/or in the frequency of each type. The 1918 hemagglutinin is composed of 566 amino acids, thus there are 565 amino-acid pairs. Of 400 possible types, 137 are absent and 263 present: 111 types appear once, 71 twice, 42 three, 24 four, 8 five, 5 six, 1 nine and 1 eleven times, respectively. Randomly predictable present type of amino-acid pair with predictable frequency T here are 4 4 glycines (G) and 37 threonines (T) in the 1918 hemagglutinin, the frequency of random presence of amino-acid pair GT is 3 (44/566×37/565×565=2.876); that is, GT would appear three times in the protein. In fact, we do find three GT in this hemagglutinin, so the actual frequency of GT is 3. In this case both the presence of the type GT and its frequency are predictable, and the difference between its actual and predicted frequencies is 0. Randomly predictable present type of amino-acid pair with unpredictable frequency There are 51 leucines (L) in the 1918 hemagglutinin, the frequency of random presence of aminoacid pair LL is 5 (51/566×50/565×565=4.505); that is, there would be five LL in this hemagglutinin. But actually the LL appears eleven times, so the presence of LL is predictable, bu t its frequ ency is unpredictable, and the difference between its actual and predicted frequencies is 6. It is also the case that the actual frequency is larger than the predicted one. Another case is that the actual frequency is smaller than predicted. For instance, there are 35 glutamic acids (E) in the protein, the predicted frequency of EG is 3 (35/566×44/565×565= 2.721), while its actual frequency is only 1 and the difference between its actual and predicted frequencies is -2. There are 13 histidines (H) in the 1918 hemagglutinin, and the predicted frequency of HH is 0 (13/566×12/565×565=0.276), so the type HH would not appear in this hemagglutinin. However it appears twice in the reality, thus the presence of HH is unpredictable. Naturally its frequency is unpredictable too, and the difference between its actual and predicted frequencies is 2. Randomly predictable absent type of amino-acid pair There are 8 methionines (M) and 17 glutamines (Q) in the 1918 hemagglutinin. The predicted frequency of MQ is 0 (8/566×17/565×565= 0.240); that is, the MQ would not appear in this hemagglutinin, which is true in the real situation. Thus the absence of MQ with its frequency is predictable, and the difference between its actual and predicted frequencies is 0. Randomly unpredictable absent type of amino-acid pair There are 49 serines (S) and 37 alanines (A) in the 1918 hemagglutinin. The predicted frequency of SA is 3 (49/566×37/565×565=3.203); that is, there would be three SA in this hemagglutinin. However, in reality no SA appears, therefore the absence of SA from this hema gglu ti ni n is unpr edicta ble. N at ur al ly it s fr equency is unpredictable too, and the difference between its actual and predicted frequencies is -3. Predictable and unpredictable portions of amino-acid pairs After the calculations described above, the amino-acid pairs in a protein can be classified as predictable and unpredictable portions with respect to type and frequency, and the sum of both predictable and unpredictable portions is 100%. Of the amino-acid pairs in the 1918 hemagglutinin, the unpredictable type and frequency are 68.75% and 76.81%, respectively. Either the predictable or unpredictable portions can serve as a quantitative measure to present a protein. Difference between actual and predicted frequencies of aminoacid pairs The unpredictable amino-acid pairs can be further divided into two parts: one is the actual frequency smaller than predicted, while the other is the actual frequency larger than predicted. To compare the constructive stability of amino-acid pairs, we calculate the difference between actual frequency and predicted frequency of amino-acid pairs in a protein. Considering 400 theoretical aminoacid types of the 1918 hemagglutinin, 153 types (38.25%) have the actual frequencies smaller than predicted and the average difference is -1.183, while 122 types (30.50%) show the actual frequencies larger than predicted and the average difference is 1.492. Taking 565 amino-acid pairs of the 1918 hemagglutinin into account, 71 pairs (12.57%) contain actual frequencies smaller than predicted and the average difference is -1.197, whereas 363 pairs (64.25%) reveal the actual frequencies larger than predicted and the average difference is 1.846. Type mutation and frequency mutation As there are 400 types of theoretically possible amino-acid pairs and we use the 100% to classify them as predicable and unpredictable types, thus 0.25% represents one of 400 types, so a 0.25% change indicates that one of 400 types mutates to an unpredictable type from a predictable type or vice versa. T his is the type mutation. However, the situation related to the frequency of amino-acid pairs is dependent on the length of amino-acid sequence. Hence, approximately 0.18% (1/565) change can be regarded as a modification in an amino-acid pair in the 1918 hemagglutinin, as there are 565 amino-acid pairs in the protein. This is the frequency mutation. The position of any 20 kinds of amino acids in a protein can be determined by experimental approach, so each kind of amino acid has a certain distribution pattern in a protein with respect to its position. Furthermore a certain distribution pattern can be associated with a certain probability, which can be calculated according to the occupancy problems of subpopulations and partitions [82] . For a certain distribution of a kind of amino acid in a protein, its distribution probability is equal to r!/(q 0 !×q 1 !×...×q n !)×r!/(r 1 !×r 2 !×...×r n !)×n -r , where ! is the factorial function; that is, n!=n×(n -1)×(n -2)×...×1, 0!=1 by definition, r is the number of a type of amino acid, q is the number of parts with the same number of amino acids and n is the number of grouped parts in the protein for a type of amino acid. For example, there are eight metheonines (M), the least abundant amino acid, in the 1918 hemagglutinin, how do these 8 M distribute among 566 amino acids in this hemagglutinin? We can imagine to group this hemagglutinin into 8 parts, and each one contains about 71 amino acids (566/8=70.75). Table 3 lists all 22 possible distribution patterns regarding 8 M in 8 parts and their distribution probabilities and ranks. The first eight columns in Table 3 show that the 1918 hemagglutinin is grouped into 8 parts, and the first eight cells in each row represent a possible distribution pattern of M, and the last two columns display the corresponding distribution probability and rank. As different distribution patterns can have the same distribution probability, we rank the distribution probabilities according to a descending order, thus the largest distribution probability is ranked as one. Again in our example, there are 22 possible distributions for 8 M in 8 parts in Table 3 , while there are only 18 distribution ranks. In general, the smaller the distribution rank is, the larger the distribution probability is. Although there are many possible distributions for a type of amino acids in a protein (such as 22 possible distributions for 8 M), the protein in question possesses only one distribution pattern, therefore there is only one distribution probability/rank for each kind of amino acid and a maximum of 20 distribution probabilities/ ranks in a protein. Similarly, we imagine to group 11 parts for 11 tryptophans As different kinds of amino acids have different contributions to a protein, we standardize them by means of the distribution rank per amino acid, which is calculated by dividing the rank of each kind of amino acids by the number of corresponding amino acids. In the 1918 hemagglutinin, the distribution rank of metheonines is 1/8= 0.125, because these 8 metheonines distribute in the hemagglutinin with the largest probability (0.2523). Accordingly, the sum of ranks Table 3 . Distributions of 8 methionines in 8 parts of the 1918 "Spanish" influenza virus hemagglutinin with their distribution probabilities and ranks (the bold font indicates the actual distribution probability and rank). for all 20 kinds of amino acids is 22.8698, thus the distribution rank in this protein is 22.8698/20=1.1435. Naturally, the distribution rank can serve as a quantitative measure to present a protein. The construction of a protein with large predictable portions of amino-acid pairs and with small amino-acid distribution rank is certainly a way to adapt to the fast-changes in surroundings and environments, as the speed of construction of a protein might be crucial for its survival, this would require the least time and energy. However, nature should deliberately construct the amino-acid pairs whose actual frequency differs from the predicted frequency and whose amino acids cluster somewhere of a protein with a larger aminoacid distribution rank. The functional amino-acid pairs should be deliberately evolved, so a protein keeps only absolutely necessary unpredictable amino-acid pairs. During the evolutionary process, nature is trying to minimize the unpredictable portion through mutations, which may bring about new unpredictable amino-acid pairs, so the evolutionary process is continuing. One of important applications of Fourier analysis is to determine the periodicity in a chaotic fluctuating dataset, so we use it to analyze the potential periodicity of hemagglutinin mutations over time [66, 67] . An impact can either lead to the mutation or has no effect, meanwhile a mutation can occur without any impact. In the context of our study, these relationships can be represented by means of the cross-impact analysis in Figure 13 [83] [84] [85] [86] [87] [88] . At the impact level, P(2) is the probability of not occurring of an impact, and the P(2) is the probability of occurring of an impact. At the mutation level, the P(1|2) is the impacted probability of occurring of mutation without an impact, the P(1|2) is the impacted probability of not occurring of mutation without an impact, the P(1|2) is the impacted probability of not occurring of mutation with an impact, and the P(1|2) is the impacted probability of occurring of mutation with an impact. At the influenza level, both P (12) and P (12) are the probabilities of not occurring of influenza without and with an impact, whereas both P (12) and P (12) are the probabilities of occurring of influenza without and with an impact. The impact on mutation can be traced through Figure 13 , for example, the probability P(12) will increase if an impact enhances the chance of the occurrence of mutation. Most importantly, Bayes' law indicates that the probabilities of occurrences of two events can be related by where P(1) is the probability of the occurrence of mutation. In the cross-impact analysis, both enhancement and inhibition have been defined with respect to coupled events. The enhancement is that the occurrence of the second event enhances the probability of occurrence of the first event; that is, P(1|2)>P(1), in our case, an impact increases the occurrence of mutation. The inhibition is that the occurrence of second event inhibits the probability of the occurrence of the first event; that is, P(1|2)