key: cord-0002459-i4c6i6rj
authors: Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren
title: Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis
date: 2016-11-01
journal: mBio
DOI: 10.1128/mbio.01801-16
sha: 6e6e95ecc4688bc3345ecfa11f8fee61eccb82bc
doc_id: 2459
cord_uid: i4c6i6rj

Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available.

Because of their compact genome, viruses usually encode multifunctional proteins, including viral polymerase proteins. Viral RNA-dependent RNA polymerase (vRdRp) is used by many RNA viruses for transcription and replication. Functions of vRdRp can be grouped into two classes: canonical and noncanonical. The canonical vRdRp functions include template and nucleotide binding, initiation, and elongation (17) (18) (19) . Among different classes of RNA viruses, these canonical functions and corresponding protein structural features are conserved (17) (18) (19) (20) (21) (22) . The noncanonical functions of vRdRp, however, are specific to each virus. For example, multimerization of hepatitis C virus (HCV) vRdRp is essential for viral replication. Thus, the interacting residues in HCV vRdRp are noncanonical functional residues specific to HCV (23, 24) . Moreover, vRdRp often recruits cellular machinery for replication and plays a role in inhibition of the cellular immune response (25) (26) (27) (28) (29) (30) (31) . Noncanonical functional residues are usually involved in the performance of those functions and thus are essential for viral replication. Noncanonical functional residues in vRdRp are difficult to determine by commonly used methods and are not as well studied as the key residues for polymerase catalytic functions. However, the noncanonical functional residues are indispensable for thorough protein characterization and may act as drug targets. As a result, it is essential to develop methods that enable the identification of noncanonical functional residues.

We previously developed a method to systematically identify functional residues by coupling experimental fitness measurement with protein stability prediction (16) . Here, we extended this method to annotate functional residues in combination with structural comparison of homologous proteins. The method consists of three steps. First, the effect of PB1 mutations on viral replication at single-nucleotide resolution is examined by saturation mutagenesis and high-throughput sequencing. Second, functional PB1 residues that are essential for viral growth but do not affect protein stability are identified by protein stability prediction. Third, homologous structural alignment is used to further annotate specific biological functions (canonical versus noncanonical functions) for each functional residue (Fig. 1) . We achieved high sensitivity in identifying and annotating the canonical polymerase functional residues. Moreover, we also identified noncanonical functional residues, which are exemplified by a cluster of residues located in the loop region of the PB1 ␤-ribbon. These previously uncharacterized residues were shown to be important for PB1 protein nuclear import by interacting with Ranbinding protein 5 (RanBP5) (32) .

High-throughput genetics have been applied to a number of viral, bacterial, and cellular proteins (16, 33-38, 111, 112) . Here, point mutations were randomly introduced into segment 2 of influenza A/WSN/33 virus through errorprone PCR. To provide a more accurate quantification of the fitness effect of single mutations, we employed the "small-library" method that we recently developed (16) . Nine small libraries were generated to cover all of segment 2 (see Fig. S1A in the supplemental material). Each small library was transfected into 293T cells together with seven plasmids that encoded the other wild-type viral segments (39) . Reconstituted mutant virus libraries were used to infect A549 cells at a multiplicity of infection (MOI) of 0.05, and supernatants were collected 24 h postinfection. The input DNA libraries, posttransfection libraries, and postinfection libraries were subjected to Illumina sequencing. To control for technical error and assess library quality, biological duplicates were included in both transfection and subsequent infection steps ( Fig. 2A) .

The distribution of the number of mutations in the input DNA library was examined. Thirty to 35% of the input DNA library plasmids contained the desired single nucleotide mutations (see Fig. S1B in the supplemental material). We achieved at least 20,000ϫ sequencing coverage for each nucleotide position (see

Comparison of the conservation-based method and our method. The conservation-based method is commonly used to identify and annotate functional protein residues, but it has three major limitations. First, it is limited by the insufficient sampling of protein functional space in natural evolution. Second, it is challenging for this method to dissect residues with structural or functional constraints. Lastly, it is limited to distinguishing the diverse functions within the same protein. The method we present here may overcome these limitations and provide a systematic way to annotate functional residues. Using high-throughput fitness profiling, we can identify essential residues for viral replication. Through mutant protein stability prediction, we are able to dissect the structural and functional constraints. Homologous structural analysis is used to further annotate canonical and noncanonical functional residues.

Fig. S1C). The library covered 94.9% of the nucleotides in segment 2 and included 98.2% of the single nucleotide mutations of observed positions (see Fig. S2A and C). To further improve the accuracy of fitness quantification, we focused on the mutations that make up Ͼ0.1% of the plasmid mutant library. After this quality control, we were still able to observe 94.2% of the nucleotide positions with 63.9% of the single nucleotide mutations. More than 82% of the nucleotide positions were covered with two or three nucleotide mutations (see Fig. S2B and D in the supplemental material). To assess the quality and reproducibility of our mutant library, we compared the relative frequencies of single mutations between biological replicates. We obtained a strong Spearman correlation coefficient of 0.93 for two independent transfections and 0.75 for infections (Fig. 2B) . A relative fitness (RF) index was calculated for individual mutations as the ratio of relative frequency in the infection library to that in the input DNA library. The profiling data of all of segment 2 are shown in Fig. 2C , where most of the mutations had a fitness cost (log 10 RF index of Ͻ0).

Systematic identification of deleterious mutations of the PB1 protein. Segment 2 of influenza A virus encoded three proteins: PB1, PB1-F2, and N40. N40 was a truncated form of the PB1 protein that lacked the first 39 amino acids. PB1-F2 is not essential for viral replication in vitro, as completely abolishing PB1-F2 expression had no effect on viral growth (40, 41) (see Fig. S3 in the supplemental material). So we focused on the PB1 protein for downstream analysis. The RF indexes of silent mutations were considered an internal quality control since most, if not all, of them were expected to have a growth capacity comparable to that of the wild type. In the fitness profile of the PB1 protein, the RF indexes of silent mutations followed a normal distribution with a mean of 0.9 and were significantly higher than those of nonsense mutations (two-tailed t test, P ϭ 4.6E-21) (see Fig. S4 in the supplemental material). This result confirms the presence of fitness selection and validates the data quality.

To systematically identify deleterious mutations, we chose a stringent RF index cutoff of Յ0.1. A total of 2.4 percentage points of silent mutations fell below the cutoff, which represented type I error. A total of 43.1 percentage points of missense mutations that satisfied this cutoff were identified as deleterious mutations (Fig. 3A) . We randomly selected 14 deleterious mutations and reconstructed them individually. Rescue experiments were performed, and the resultant viral titers were quantified by 50% tissue culture infective dose (TCID 50 ) assay. Thirteen of 14 mutant viruses had at least a 10-fold drop in the viral titer compared to that of the wild type. The other mutant also showed a more-than-6- fold titer decrease (Fig. 3B ). These results validated the approach we used to systematically quantify the RF and identify deleterious mutations of the PB1 protein.

Identifying functional residues by dissecting structural and functional constraints. A mutation might be deleterious because of structural or functional constraints (16, 42) . We have recently demonstrated that coupling high-throughput genetics with mutant stability predictions can identify residues that are dominated by functional constraints (16) . Briefly, deleterious mutations that do not destabilize the protein are identified as functional residues. Here, we modeled protein stability by using two computational tools: I-Mutant and Rosetta ddg monomer (see Data Set S1 in the supplemental material).

I-Mutant is a supporter vector machine-based software used to predict the effect of single-site mutations on protein stability (⌬⌬G) (43) (44) (45) . On the basis of the predicted ⌬⌬G, mutations can be classified as destabilizing (⌬⌬G, ՅϪ0.5), neutral (Ϫ0.5 Ͻ ⌬⌬G Ͻ 0.5), or stabilizing (⌬⌬G, Ն0.5). We applied I-Mutant predictions for all missense mutations in PB1 with the structure resolved from the bat influenza A virus polymerase complex (Protein Data Bank [PDB] code 4WSB) (46, 47) . Of the mutations for which structure information is available, 64.5% were shown to be destabilizing, 33.5% were neutral, and 2% were stabilizing (see Fig. S5A in the supplemental material). As expected, destabilizing mutations had a significantly small solvent-accessible surface area (SASA) (48-50) (see Fig. S5B ). To further reduce the rate of falsenegative functional residue identification, we performed protein stability prediction with Rosetta for all deleterious mutations (16, 42, 44) . Unlike the machine learning algorithm used by I-mutant, Rosetta generated structural models for single amino acid mutations based on a preoptimized wild-type structure. With a highresolution protocol, 50 models of wild-type and mutant protein structures were generated and the three lowest ⌬⌬G values were averaged on the basis of optimized rotamers. The absolute correlation coefficient of the predictions that resulted from these two methods was 0.3 (see Fig. S5C ). Aiming at getting a conserved classification of functional residues, we classified a residue as func-tional if it had one or more missense mutations satisfying both the deleterious RF index cutoff and nondestabilizing criteria of ⌬⌬G predictions from either software. We identified 297 residues as functional.

To examine the sensitivity of our method of identifying functional residues in PB1, we performed a thorough literature search, compiled 31 residues that were reported to be functional in PB1 (32, (51) (52) (53) (54) , and compared the performance of our method with that of four other methods: FireStar, Frpreq, Consurf, and Concavity (6, 10, 55-58) ( Table 1) . Our method was able to identify 21 of the 31 residues and thus had a sensitivity of~68%. FireStar failed to identify any of them. Frprep, Concavity, and Consurf identified 4 (Frprep score, Ն8), 7 (Concavity score, Ͼ0.1), and 17 (Consurf score, 9) residues, respectively. Notably, our method was the only one that identified functional residues related to noncanonical polymerase functions (four of the eight residues) that were not conserved in sequence or structure. Overall, these results validated our method of combining high-throughput genetics with mutant stability prediction to identify functional residues in PB1 in a sensitive and unbiased manner (16, 42, 44) .

Annotating functional residues by homologous structural alignment. The vRdRp family has a conserved "right-handed" structure. It consists of three major conserved domains (finger, palm, and thumb) and six motifs (pre-A/F and A to E) (20) . Since canonical vRdRp functional residues of the PB1 protein are expected to be structurally conserved, they aligned well with other protein structures from the vRdRp family. Therefore, homologous structural alignment might enable us to further annotate PB1 residues by distinguishing canonical and noncanonical vRdRp functional residues. The recent improvement of algorithms provides opportunities for more accurate structure comparison. Here we used TM-align and 3DCOMB for pairwise and multiple structure alignments (MSAs) (59) (60) (61) . Both softwares use TM-score to quantify protein structural similarity, which is robust to local structural variation and is protein length independent (59, 60) . Moreover, 3DCOMB takes into account both local and global features, which is suitable for alignment of distantly related protein structures (61) .

Twenty representative vRdRp structures were selected from positive single-stranded RNA (ssRNA) viruses, negative ssRNA viruses, and double-stranded RNA (dsRNA) virus families on the basis of previously stated criteria (20) . Briefly, representative structures were selected from each of the Baltimore classes that encoded vRdRp, including positive ssRNA viruses (Caliciviridae, Flaviviridae, Picornaviridae, Cystoviridae), dsRNA viruses (Birnaviridae, Cystoviridae, and Reoviridae), and negative ssRNA viruses (Bunyaviridae) (62) (63) (64) (65) (66) (67) (68) (69) (70) (71) (72) (73) (74) (75) (76) (77) (78) (79) (80) (81) . Structures with no mutations and with a bound substrate were preferred. PDB files with the highest resolution were picked for each protein (see Table S1 in the supplemental material).

To ensure sufficient structural similarity, a pairwise structural comparison was performed with the selected protein and PB1 by using TM-align. The structures with TM scores of Ͼ0.5 were kept for multiple structural alignment, which generally indicated similar protein folding (43) . Figure S6A in the supplemental material provides an example superimposition of the PB1 protein with HCV NS5B (PDB code 2XI3) with decent alignment in major protein domains (67) . A total of 16 proteins were included for MSA with PB1 by using 3DCOMB (see Table S1 in the supplemental material).

The root mean square deviation (RMSD), the measurement of the average distance between the atoms and superimposed proteins, was reported by 3DCOMB for each residue as the representative of structure conservation. As the reported aligned residues had RMSD scores ceiled at 9, we assigned the residues that did not align among structures with an RMSD value of 10 ( Fig. 4A) . Low RMSDs meant that the residues were conserved in the vRdRp family and thus more likely to have canonical vRdRp functions. As expected, the structurally conserved residues were less tolerant of mutations. The average RF index of structurally conserved residues was significantly lower than that of nonconserved residues (two-tailed t test, P ϭ 0.0006, Fig. 4B ). The RMSDs of all of the identified functional residues of the PB1 protein were plotted. A smooth curve of RMSDs was fitted by local polynomial (loess) regression. We could clearly identify the six conserved domains (pre-A/F and A to E) of vRdRp as valleys on the smooth curve (Fig. 4C) . These results demonstrated the feasibility of using homologous structural alignment to identify canonical vRdRp residues.

Identification of noncanonical functional residues, ones involved in nuclear import of the PB1 protein. Forty-three percent of the functional residues identified could not be aligned with other protein structures from the vRdRp family. Although this could be due to poor alignment quality, it is also possible that these residues have noncanonical functions that are essential for viral growth. Interestingly, 62% of these residues belong to the protein interface between PB1 and PB2 or PA, as identified by the change in SASA upon complex formation by using Sppider (residues with at least a 4% decrease in SASA and Ͼ5 Å 2 of exposed surface area upon complex formation) (82) (see Fig. S6B in the supplemental material). These interface residues also accounted for some of the peaks (residue 50 to 80, residues 350 to 400, and residues at the C terminus of PB1) in the smooth RMSD curve of functional residues in Fig. 3C . We then performed a detailed analysis of the noncanonical functional residues that were not located in the heterotrimerforming interface. When mapped onto the protein structure, some of them (residues 180 to 220) formed a noticeable cluster ( Fig. 5A and B) . This clustered region is unique to the PB1 protein, which consists of a long twisted ␤-ribbon connected by a nonstructured loop (47) . It protrudes from the polymerase complex structure and is fully solvent exposed. Two nuclear localization signals (NLSs) were reported in the ␤-ribbon region (amino acids 187 to 190 and 207 to 210) to mediate PB1 nuclear import through interaction with RanBP5 (32, 83) . Nonetheless, the function of this loop region is not completely clear. It is suspected to interact with the viral genome in the resolved influenza B and C virus structures (46, 47, 84) , and K198 of influenza A virus was suggested to be related to host adaptation (85) . As the density of the loop region (residues 195 to 198) is missing from the influenza A virus polymerase crystal structure, we used kinematic loop modeling in the Rosetta software to computationally reconstruct the loop region (86) . From the above-described analysis, D193 in the loop region was identified as a noncanonical functional residue. Interestingly, it was the only negatively charged residue located within a highly positively charged environment. It was 100% conserved among all of the human influenza A virus PB1 sequences from the Influenza Research Database under purifying selection (ratio of nonsynonymous to synonymous evolutionary changes [dN/dS ratio] of 0.015) (87) (88) (89) 113 ) (see Table S2 in the supplemental material). Two positively charged residues (K197, K198) located on the side opposite D193 in the loop region were also highly conserved in human influenza A viruses (Ͼ99%) and possibly interact with D193. Although they were not classified as essential residues according to our high-throughput fitness profile, their mutations in charges (K197E, K198E) resulted in a Ͼ6-fold drop in the RF index. To examine if the loop region has possible noncanonical functions, we introduced single substitutions (D193G, K197E, K198E) and double substitutions (D193G-K197E, G193G-K198E, K197E-K198E) into the PB1 protein. We also constructed mutant versions with substitutions in the NLS region (K188A-R189A, R208A-K209A) and mutant versions that decreased the polymerase activity (W55R, H184R, H47L, Q268L) as controls. Of note, all of the controls were identified as deleterious in our high-throughput fitness profile. Viral production of all of the mutant versions was measured by TCID 50 assay with viral rescue experiments. D193G, D193G-K197E, G193G-K198E, K197E-K198E, and the reported substitutions in the NLS region (K188A-R189A) had severe impacts on viral production, with no detectable viral titer posttransfection (Fig. 5C) (32) . Consistently, these mutations also resulted in a significantly lower viral growth rate in A549 cells (Fig. 5D) . To examine the vRdRp function of these mutations, we used a minigenome replicon assay by cotransfecting a virus-inducible luciferase reporter and polymerase segments (PB2, PB1, PA, NP) in 293T cells. The reported mutant NLS (K188A-R189A), which was highly deleterious for viral replication, still had~50% polymerase activity in the minigenome replicon assay. Similarly, D193G and all of the double substitutions in the vRdRp family. The PB1 structure is rainbow colored according to the RMSD of each residue. (B) Histograms of the RF indexes are shown for residues that cannot be aligned (red) and residues that can be aligned with other structures in the vRdRp family. The RF indexes of residues that cannot be aligned were significantly higher (two-tailed t test, P ϭ 0.0006). (C) RMSDs of functional residues. A smooth curve was fitted by loess regression. Conserved vRdRp domains (pre-A/F and A to E) are labeled and shown as valleys on the smooth RMSD curve.

(D193G-K197E, D193G-K198E, K197E-K198E) showed discordance between vRdRp function and viral growth capacity. Compared with W55R, H184R, H47L, and Q268L, which remained at 0.1 to 65% polymerase activity, the fitness drop caused by these newly identified loop mutations was much more severe, indicating that they might have a noncanonical polymerase function of PB1 (Fig. 5C) .

Unlike other RNA viruses, the genome replication and transcription of influenza virus are performed inside the nucleus. Nuclear localization function is thus specific to influenza virus and belongs to noncanonical functions of the PB1 protein. We tested if the mutations identified in the loop region (D193G, D193G-K197E, K197E-K198E) had effects on protein nuclear import. A549 cells were infected with wild-type and mutant viruses at an MOI of 0.1. Cells were fixed and subjected to immunofluorescence analysis (IFA) at 18 h postinfection. As expected, the PB1 proteins of the wild-type virus were localized mostly in the nucleus. However, the PB1 proteins of mutant viruses were significantly enriched in the cytoplasm, suggesting that these mutations were defective in PB1 protein nuclear import (Fig. 6A and B) . More severe defects were observed for double mutations (D193G-K197E, K197E-K198E). Similar results were observed at earlier time points (8 h postinfection) at an MOI of 0.5 (see Fig. S7 in the supplemental material). Interestingly, for those PB1 mutant versions, the nuclear import of PA protein was also delayed, which is consistent with the notion that PA and PB1 are imported into the nucleus as a complex (32, 83, 90, 91 ) (see Fig. S7 ).

RanBP5 belongs to the importin-␤ family, which has a nonclassical nuclear import function (92, 93) . RanBP5 has been shown to be important for influenza A virus PB1 nuclear import. The NLS mutations affected protein nuclear import by decreasing binding to RanBP5 (32, 83, 92) . Thus, we further tested if mutations in the loop region (D193G, D193G-K197E, and K197E-K198E) would also affect the interaction between PB1 and RanBP5. Immunoprecipitation (IP) was performed by cotransfecting the FLAG-tagged PB1 protein and the hemagglutinin (HA)-tagged RanBP5 protein into 293T cells. Two days later, the total cell lysate was collected and subjected to IP with anti-HA antibody-conjugated beads or IgG-conjugated beads. As shown in Fig. 6B , all three mutant proteins showed decreased binding with RanBP5. Consistent with our IFA results, double mutations (D193G-K197E, K197E-K198E) produced a greater reduction in protein binding. The above-described results indicate that the residues in the loop region are important for nuclear import of the influenza A virus PB1 protein through interaction with RanBP5, which is a noncanonical function in the vRdRp family.

For a comprehensive characterization of protein function, identification and annotation of functional residues are the fundamental tasks. Here we present a systematic approach to these tasks by using influenza A virus PB1 as the target protein. Our approach combines high-throughput fitness profiling with mutant stability prediction and homologous structural alignment to identify and annotate canonical and noncanonical vRdRp functional residues (Fig. 1) . Interestingly, we identified a cluster of mutations that were highly deleterious for viral replication but resulted in relatively intact vRdRp function. These mutations were located in the loop region of the PB1 ␤-ribbon and were shown to be important for PB1 nuclear import. The combination of high-throughput fitness profiling and structural analysis provided a general approach to the identification and annotation of functional residues that can be applied to a wide range of proteins about which homologous structural information is available.

In the context of evolutionary biology, proteins from the same homolog family have an ancestor in common and possess significant sequence and structural similarities (94) (95) (96) (97) . Structural similarities are postulated to be maintained by functional constraints (98, 99) . vRdRps probably evolved from a common ancestor (100) . Although their sequence identity is~20%, they have adopted similar structural domains and use similar catalytic mechanisms (20) . Throughout evolution, different proteins also evolved diverse functions to satisfy the needs of specific organisms. Thus, the specific structural motifs that differentiate one protein from homologous proteins may have organism-specific functions. Here we used homologous protein structure information to further annotate the diverse protein functions. Therefore, a multifunctional protein might harbor both canonical (evolutionarily conserved) and noncanonical (organism-specific) functions. The combination of high-throughput genetic screening with homologous-structure analysis enabled us to systematically understand functional residues and important single nucleotide polymorphisms.

Here we show that the residues in the loop region of the PB1 ␤-ribbon are important for PB1 nuclear import. Unlike other RNA viruses, influenza A virus performs its genome replication inside the nucleus. Thus, the polymerase complex needs to be translocated into the nucleus to perform its function. It is known that PB1 and PA are translocated together as a complex, while PB2 can be translocated by itself (101) . RanBP5 is important for the nuclear import of PB1 and PA through direct interaction with PB1. Besides the two reported NLSs, we show that the mutations in the loop region also impact the interaction between PB1 and RanBP5, thus causing the defect in PB1 nuclear import. We do not have direct evidence that the loop region works as a direct NLS or by affecting the nearby NLS regions, but on the basis of the sequence of the loop region, it did not fall into any of the six classes of NLSs (32, 102) . Thus, we suspected that this region affected PB1 nuclear import by affecting the nearby NLS regions. In agreement with previous observations, there seems to be no clear consensus sequence that is responsible or important for RanBP5 binding (32, 103) . The detailed mechanism needs to be further defined.

Genetic studies are greatly facilitated by the improvement of sequencing capacity and the growing number of protein structures being resolved. Large amounts of information generated with current technologies demand more effective approaches to determine structure-function relationships. Coupling mutagenesis with high-throughput sequencing, high-throughput fitness profiling provides a sensitive and unbiased way to identify the essential residues of targeted proteins (16, (33) (34) (35) (36) (37) (104) (105) (106) (107) . The same principle applies to other proteins/organisms, as long as the proper functional measurement can be made (37) . For example, we can study the proteins related to cell proliferation by using the cell growth rate as a readout. By using saturated mutagenesis, we can learn which mutation is related to an abnormal cell growth rate and can further use flow cytometry to differentiate cells in different phases. We can also investigate the roles of mutant proteins in cancer metastasis through transwell migration assays in vitro or by using mouse xenograft models in vivo. The structures of target or homologous proteins can be linked to a genetic profile and further facilitate the understanding of biomolecular functions related to each functional residue. We foresee that this approach will become more powerful as more protein structures are determined at an accelerated rate by crystallography and cryoelectron microscopy and the escalating sequencing technology.

In summary, we have developed a systematic and sensitive method to identify and annotate functional residues. More importantly, the method presented here is generally applicable to other proteins with structural information of homologous proteins.

A/WSN/33 virus segment 2 mutant libraries were generated with the eight-plasmid transfection system (39) . In brief, the entire influenza virus gene was separated into nine small 240-bp segments. Random mutagenesis was performed with error-prone polymerase Mutazyme II (Stratagene). For each small library, mutagenesis was performed separately and the amplified segment was gel purified, BsaI digested, ligated to the vector, and transformed with MegaX DH10B T1R cells (Life Technologies). As each small library was expected to have~1,000 single mutations,~50,000 bacterial colonies were collected to cover the entirety. Plasmids from collected bacteria were midiprepped as the input DNA library. Individual mutant viral plasmids were generated with a quick-change system. To generate mutant virus,~2 million 293T cells were transfected with 10 g of DNA. To measure the growth curve,~1 million A549 cells were infected at an MOI of 0.1 and supernatants were collected at the times indicated.

Sequencing library construction and data analysis. Viral RNA was extracted with the QIAamp Viral RNA Minikit (Qiagen Sciences). DNase I (Life Technologies) treatment was performed, followed by reverse transcription with the SuperScript III system (Life Technologies). At least 10 6 viral copies were used to amplify the mutated segment. The amplified segment was then digested with BpuEI and ligated with the sequencing adaptor, which had three nucleotides multiplexing ID to distinguish between different samples.

Deep sequencing was performed with Illumina sequencing MiSeq PE250. Raw sequencing reads were demultiplexed by using the threenucleotide ID. Sequencing error was corrected by filtering unmatched forward and reverse reads. Mutations were called by comparing sequencing reads with the wild-type sequence. Clones containing two or more mutations were discarded. The RF index was calculated for individual point mutations, and only mutations that had a frequency of Ͼ0.1% in the DNA library were reported. The formula used was RF index mutant All data processing and analysis was performed with customized python scripts, which are available upon request.

Protein structural analysis. Chain B (PB1 protein) of PDB code 4WSB was used for protein ⌬⌬G prediction with single amino acid mutations (46, 47) . ⌬⌬G predictions were performed with both the I-Mutant 2.0 package and ddg_monomer in the Rosetta software (43, 108) . Default parameters (temperature of 25°C, pH 7.0) were used in the I-Mutant package. The parameters used for Rosetta were the same as those previous described (16, 109) . A ⌬⌬G of Ͻ0 in I-Mutant and a ⌬⌬G of Ͼ0 in Rosetta mean destabilization.

The DSSP tool was used to calculated SASA, which was then normalized to the empirical scale as previously described (48) (49) (50) . Sppider was used to identify the protein-protein interface. Residues with at least a 4% reduction and a Ͼ5-Å 2 reduction in SASA upon complex formation were identified as protein-protein interface residues (82) .

TM-align and 3DCOMB were used for pairwise structural alignment and multiple structural alignment (59, 61) . TM-score normalized to the PB1 protein was used.

Protein loop modeling. In the loop region of the PB1 ␤-ribbon, electron density for residues 195 to 198 is missing from the X-ray crystal structure (PDB code 4WSB). Rosetta software was used to computationally reconstruct the loop region, which was based on Monte Carlo sampling with exact kinematic loop closure (86) . After energy optimization, each model was ranked by Rosetta full atom energy function (80) . The lowest-energy model with a hairpin-like loop was selected.

Polymerase activity assay. One hundred nanograms each of PB2, PB1 (wild type and indicated mutations), PA, and NP; 50 ng of a virusinducible luciferase reporter; and 5 ng of PGK-Renilla luciferase were transfected into 293T cells in 24-well plates (110) . Cells were lysed at 24 h posttransfection, and luciferase assay was measured with the Dual-Luciferase Assay kit (Promega).

IFA. The localizations of wild-type PB1 and mutant PB1 proteins were determined by Immunofluorescence analysis (IFA). Infected A549 cells were fixed in 2% paraformaldehyde, permeabilized with 0.1% Triton X-100, and then blocked with 3% bovine serum albumin and 10% fetal bovine serum. Viral PB1 protein was detected with anti-PB1 antibody (GeneTex GTX125923). Hoechst 33342 dye was used for nucleic acid staining.

IP. Immunoprecipitation (IP) experiments were performed with HAand FLAG-tagged proteins expressed in 293T cells. Briefly, cells were transfected with corresponding expression plasmids with Lipofectamine 2000 reagents (Invitrogen) and lysed at 2 days posttransfection with radioimmunoprecipitation assay (RIPA) buffer (50 mM Tris-HCl [pH 7.4], 0.5% NP-40, 150 mM KCl, 1 mM EDTA, protease inhibitor). Cell lysates were incubated with 1 g of anti-HA antibody for 4 h at 4°C with constant agitation, washed with RIPA buffer five times, and eluted with 60 l of SDS-PAGE sample buffer. All samples were subjected to SDS-PAGE and Western blotting.

Western blotting. Proteins in SDS-PAGE sample buffer were heated at 95°C, resolved by SDS-PAGE, and then transferred onto polyvinylidene difluoride membrane. Proteins were detected with antibodies against FLAG-epitope, HA-epitope, or actin.

Phylogenetic analysis. PB1 coding sequences were downloaded from the Influenza Research Database (87) . Multiple sequence alignment was performed with MUSCLE (88). We randomly sampled 3,000 sequences for dN/dS calculation by Fubar with HyPhy (89) .

Accession number(s). Raw sequencing data have been submitted to the NIH Short Read Archive under accession number PRJNA318707.

Supplemental material for this article may be found at http://mbio.asm.org/ lookup/suppl/doi:10.1128/mBio.01801-16/-/DCSupplemental. Figure S1 , TIF file, 22.9 MB. Figure S2 , TIF file, 22.9 MB. Figure S3 , TIF file, 22.9 MB. Figure S4 , TIF file, 22.9 MB.

November/December 2016 Volume 7 Issue 6 e01801-16 ® mbio.asm.org 9 Figure S5 , TIF file, 23.7 MB. Figure S6 , TIF file, 23.7 MB. Figure S7 , TIF file, 23.6 MB. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Biochemical functional predictions for protein structures of unknown or uncertain function

Computational approaches to identify functional genetic variants in cancer genomes

Automated structurebased prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking

Fitness effects of beneficial mutations: the mutational landscape model in experimental evolution

Functional annotations improve the predictive score of human diseaserelated mutations in proteins

ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information

INTREPID: a web server for prediction of functionally important residues by evolutionary analysis

The use of evolutionary patterns in protein annotation

Prediction of functional sites by analysis of sequence and structure conservation

Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure

Enhanced performance in prediction of protein active sites with THE-MATICS and support vector machines

A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites

FINDSITE: a combined evolution/ structure-based approach to protein function prediction

Automated prediction of protein function and detection of functional sites from structure

Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

Functional constraint profiling of a viral protein reveals discordance of evolutionary conservation and functionality

Common and unique features of viral RNAdependent polymerases

Structural organization of viral RNA-dependent RNA polymerases

Structure and function of RNA replication

Evolution of tertiary structure of viral RNA dependent polymerases

A structural and primary sequence comparison of the viral RNA-dependent RNA polymerases

Structure-function relationships underlying the replication fidelity of viral RNA-dependent RNA polymerases

Oligomerization and cooperative RNA synthesis activity of hepatitis C virus RNA-dependent RNA polymerase

Interactions between viral nonstructural proteins and host protein hVAP-33 mediate the formation of hepatitis C virus RNA replication complex on lipid raft

Human host factors required for influenza virus replication

Identification of host proteins required for HIV infection through a functional genomic screen

Genome-wide RNAi screen identifies human host factors crucial for influenza virus replication

Influenza virus protein PB1-F2 inhibits the induction of type I interferon by binding to MAVS and decreasing mitochondrial membrane potential

The influenza virus protein PB1-F2 inhibits the induction of type I interferon at the level of the MAVS adaptor protein

Pathogenic influenza viruses and coronaviruses utilize similar and contrasting approaches to control interferonstimulated gene responses

A comprehensive collection of systems biology data characterizing the host response to viral infection

Characterization of the interaction between the influenza A virus polymerase subunit PB1 and the host nuclear import factor Ran-binding protein 5

High-throughput identification of loss-of-function mutations for anti-interferon activity in the influenza A virus NS segment

A quantitative high-resolution genetic profile rapidly identifies sequence determinants of hepatitis C viral fitness and drug sensitivity

High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution

Evolvability as a function of purifying selection in article evolvability as a function of purifying selection in TEM-1 ␤-lactamase

Deep mutational scanning: a new style of protein science

Genome-wide mutagenesis of influenza virus reveals unique plasticity of the hemagglutinin and NS1 proteins

A DNA transfection system for generation of influenza A virus from eight plasmids

A novel influenza A virus mitochondrial protein that induces cell death

Influenza A virus PB1-F2 protein contributes to viral pathogenesis in mice

Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design

I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure

Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details

Assessing predictors of changes in protein stability upon mutation using self-consistency

Structural insight into cap-snatching and RNA synthesis by influenza polymerase

Structure of influenza A polymerase bound to the viral RNA promoter

A series of PDB related databases for everyday needs

Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features

Maximum allowed solvent accessibilities of residues in proteins

Functional analysis of conserved motifs in influenza virus PB1 protein

Integrating computational modeling and functional assays to decipher the structurefunction relationship of influenza virus PB1

Functional analysis of PA binding by influenza A virus PB1: effects on polymerase activity and viral infectivity

A new promoter-binding site in the PB1 subunit of the influenza A virus polymerase

Firestar-prediction of functionally important residues using structural templates and alignment reliability

Prediction of protein functional residues from sequence by probability density estimation

ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids

Firestar-advances in the prediction of functionally important residues

TM-align: a protein structure alignment algorithm based on the TM-score

Scoring function for automated assessment of protein structure template quality

Alignment of distantly related protein structures: algorithm, bound and implications to homology modeling

Crystal structures of oseltamivirresistant influenza virus neuraminidase mutants

Structure-based inhibition of norovirus RNA-dependent RNA polymerases

Structural insights into mechanisms of catalysis and inhibition in Norwalk virus polymerase

Structural and functional characterization of sapovirus RNA-dependent RNA polymerase

Conformational flexibility of the dengue virus RNA-dependent RNA polymerase revealed by a complex with an inhibitor

Further insights into the roles of GTP and the C terminus of the hepatitis C virus polymerase in the initiation of RNA synthesis

The structure of the RNA-dependent RNA polymerase from bovine viral diarrhea virus establishes the role of GTP in de novo initiation

Sequential structures provide insights into the fidelity of RNA replication

The crystal structure of the RNA-dependent RNA polymerase from human rhinovirus: a dual function target for common cold antiviral therapy

The crystal structure of coxsackievirus B3 RNA-dependent RNA polymerase in complex with its protein primer VPg confirms the existence of a second VPg binding site on Picornaviridae polymerases

Structural basis for active site closure by the poliovirus RNA-dependent RNA polymerase

Noncatalytic ions direct the RNA-dependent RNA polymerase of bacterial double-stranded RNA virus 6 from de novo initiation to elongation

RNA synthesis in a cage-structural studies of reovirus polymerase lambda3

Mechanism for coordinated RNA packaging and genome replication by rotavirus polymerase VP1

Structural insights into bunyavirus replication and its regulation by the vRNA promoter

Crystal structure of the full-length Japanese encephalitis virus NS5 reveals a conserved methyltransferase-polymerase interface

Molecular basis for RNA polymerization by Q␤ replicase

The N-terminus of the RNA polymerase from infectious pancreatic necrosis virus is the determinant of genome attachment

Activation mechanism of a noncanonical RNAdependent RNA polymerase

Structure of foot-and-mouth disease virus RNAdependent RNA polymerase and its complex with a template-primer RNA

Prediction-based fingerprints of proteinprotein interactions

Role of Ran binding protein 5 in nuclear import and assembly of the influenza virus RNA polymerase complex

Crystal structure of the RNA-dependent RNA polymerase from influenza C virus

Novel polymerase gene mutations for human adaptation in clinical isolates of avian H5N1 influenza viruses

Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling

Influenza research database: an integrated bioinformatics resource for influenza research and surveillance Influenza Other Respir Viruses

MUSCLE: multiple sequence alignment with high accuracy and high throughput

HyPhy: hypothesis testing using phylogenies

The temperature-sensitive and attenuation phenotypes conferred by mutations in the influenza virus PB2, PB1, and NP genes are influenced by the species of origin of the PB2 gene in reassortant viruses derived from influenza A/California/07/2009 and A/WSN/33 viruses

Temperature-sensitive mutants in the influenza A virus RNA polymerase: alterations in the PA linker reduce nuclear targeting of the PB1-PA dimer and result in viral attenuation

Ran-binding protein 5 (RanBP5) is related to the nuclear transport factor importin-beta but interacts differently with RanBP1

Cloning and characterization of human karyopherin beta3

Exon structure conservation despite low sequence similarity: a relic of dramatic events in evolution?

Striking structural and functional similarities suggest that intestinal sucrase-isomaltase, human lysosomal alpha-glucosidase and Schwanniomyces occidentalis glucoamylase are derived from a common ancestral gene

Predicting protein function from sequence and structure

An introduction to sequence and series

Supersites within superfolds. Binding site similarity in the absence of homology

Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution

Structure of the RNAdependent RNA polymerase of poliovirus

Nuclear import of the influenza A virus transcriptional machinery

Six classes of nuclear localization signals specific to different binding grooves of importin alpha

Nuclear import by karyopherin-␤s: recognition and inhibition

Protein tolerance to random amino acid change

Capturing the mutational landscape of the beta-lactamase TEM-1

The spatial architecture of protein function and adaptation

Comprehensive mutational scanning of a kinase in vivo reveals substratedependent fitness landscapes

Macromolecular modeling with Rosetta

Role of conformational sampling in computing mutation-induced changes in protein structure and stability

Virus-inducible reporter genes as a tool for detecting and quantifying influenza A virus replication

An experimentally determined evolutionary model dramatically improves phylogenetic fit

Accurate measurement of the effects of all amino-acid mutations on influenza hemagglutinin

FUBAR : A Fast, Unconstrained Bayesian AppRoximation for inferring selection