title: Insights from molecular structure predictions of the infectious bronchitis virus S1 spike glycoprotein
authors: Leyson, Christina Lora M.; Jordan, Brian J.; Jackwood, Mark W.
journal: Infection, Genetics and Evolution 46 (2016)
date: 2016-11-09
DOI: 10.1016/j.meegid.2016.11.006

Infectious bronchitis virus (IBV) is an important respiratory pathogen in chickens. The IBV S1 spike is a viral structural protein that is responsible for attachment to host receptors and is a major target for neutralizing antibodies. To date, there is no experimentally determined structure for the IBV S1 spike. In this study, we sought to predict the tertiary structure of IBV S1 using I-TASSER, an automated homology modeling platform. We found that the predicted structures were robust and consistent with experimental data. For instance, all four residues (38, 43, 63, and 68) that have been shown to be critical for binding to host tissues were found at the surface of the predicted structure of the Massachusetts (Mass) S1 spike. Together with antigenicity index analysis, we were also able to show that the Ma5 vaccine has higher antigenicity indices than the M41 vaccine at residues close to the receptor-binding region, thereby providing a possible mechanism for how Ma5 achieves better protection against challenge. Examination of the predicted structure of the Arkansas IBV S1 spike also gave insights into the effect of polymorphisms at position 43 on the surface availability of receptor-binding residues. This study showcases advancements in protein structure prediction and contributes useful, inexpensive tools that provide insights into the biology of IBV.

Infectious bronchitis virus (IBV) is a highly transmissible respiratory virus of poultry. It belongs to the family Coronaviridae and genus Gammacoronavirus (International Committee on Taxonomy of Viruses et al., 2012).
Biosecurity and vaccination are the main control strategies against IBV, and vaccination has proven effective in preventing IBV outbreaks. Typically, broiler chickens are vaccinated by spray with live-attenuated IBV vaccine on the day of hatch. However, problems still occur: poor vaccine coverage and lateral transmission of vaccine virus among birds are examples of vaccination issues encountered in the field (Jackwood et al., 2009).

The spike glycoprotein is one of the structural proteins of all coronaviruses. In electron micrographs, it appears as large bulbous projections radiating from the virus particle, producing the hallmark "crown-like" appearance characteristic of the Coronaviridae. The spike glycoprotein is a major determinant of host and tissue tropism (Wickramasinghe et al., 2014). Binding of the spike protein to host cells depends on α-2,3 sialic acid, though a secondary protein receptor has been proposed (Winter et al., 2006). The spike protein is composed of two subunits (Cavanagh, 1983): the S1 subunit, which binds the virus to host cells (Cavanagh and Davis, 1986), and the S2 subunit, which mediates viral-cellular membrane fusion (Bosch et al., 2003). The S1 subunit of IBV is also the major target of virus-neutralizing antibodies (Cavanagh et al., 1988; Cavanagh et al., 1997; Koch et al., 1990; Mockett et al., 1984; Moore et al., 1997) as well as an important antigen for cell-mediated immunity (Collisson et al., 2000). The S1 subunit gene is the most genetically diverse region of the IBV genome. Within the S1 spike gene, three hypervariable regions have been identified as hotspots for nucleotide sequence changes (Koch et al., 1990; Moore et al., 1998; Niesters et al., 1986).
Notably, as little as 2-4% difference in S1, or 10-15 amino acid changes, is sufficient to alter serotype (Cavanagh, 2003; de Wit et al., 2011; Hodgson et al., 2004). Thus, it appears that changes in just one or a few amino acids in S1 could destroy an epitope or alter the protein's structure, with implications for binding to host cells and for inducing an immune response against the virus. Several three-dimensional structures are currently available for the spike protein of other coronaviruses, particularly members of the genera Alphacoronavirus and Betacoronavirus (Protein Data Bank; rcsb.org). However, none exists for IBV or any other viral species in the genus Gammacoronavirus. It has recently been proposed that the spike proteins of all coronaviruses share a common evolutionary origin and that it is reasonable to predict structure from existing structural data despite sequence identity below 30% (Li, 2012). Furthermore, the same study proposed that Alpha-, Beta-, and Gammacoronaviruses likely share protein domains, namely the N-terminal domain (NTD) and C-terminal domain (CTD) of the S1 spike protein. Thus, it should be possible to map these proteins and identify putative regions that function as epitopes or as receptor-binding residues. In this study, we sought to produce a three-dimensional structural model of the full-length S1 spike glycoprotein of various IBV vaccine serotypes and to map regions of significant interest in the S1 spike, such as the receptor-binding region. To produce a structural model for the entire S1 spike protein, we attempted conventional homology modeling using SWISS-MODEL (Biasini et al., 2014).
The first step was to align the S1 spike protein of IBV with the S1 spike protein of a coronavirus of known crystal structure. Among the coronavirus species examined, the receptor-binding domain of human coronavirus NL63 (Alphacoronavirus; PDB ID: 3KBH) had the highest sequence similarity to IBV S1, albeit only 17%. An amino acid alignment between the IBV and NL63 S1 proteins was performed in preparation for homology modeling. However, the NL63 receptor-binding domain (RBD) aligned to IBV S1 positions 239-377, which does not correspond to the region of the IBV S1 protein previously shown to contain the residues critical for binding to host cells (Promkuntod et al., 2014). This was not surprising, since sequence identity between the NL63 and IBV S1 spike proteins was extremely low, and the RBD of NL63 lies in the CTD (Wu et al., 2009), whereas the RBD of IBV S1 has been mapped to the NTD (Promkuntod et al., 2014). It has been proposed that the NTD and CTD of coronaviruses have distinct receptor functions: NTDs from various coronavirus species primarily interact with carbohydrates such as sialic acid, while CTDs primarily interact with host protein receptors (Li, 2015). Thus, the structure of the NL63 spike RBD was not a useful template, since IBV likely uses a receptor-binding strategy distinct from that of NL63. Although other three-dimensional spike protein structures are available for members of the Alphacoronavirus and Betacoronavirus genera (PDB, rcsb.org), only three include the entire S1 protein; most of the others include only the RBD. Fig. 1 shows the domains of IBV S1 and its amino acid sequence identity to the other S1 spike proteins whose full-length structures have been determined. Unfortunately, sequence identity was low, indicating that those proteins would not be useful templates for structure prediction.
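Both the identity percentages discussed above and statements such as "the NL63 RBD aligned to IBV S1 positions 239-377" are read off a pairwise alignment. The sketch below is illustrative only and assumes gapped, equal-length strings with `-` for gaps, as exported by an aligner such as Muscle. It counts identity over columns where both sequences have a residue. That is one convention among several; dividing by alignment length or by the shorter sequence gives different numbers. The toy sequences are made up.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity counted over columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    paired = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not paired:
        return 0.0
    matches = sum(1 for a, b in paired if a == b)
    return 100.0 * matches / len(paired)


def map_positions(aln_a: str, aln_b: str, start_a: int = 1, start_b: int = 1) -> dict:
    """Map residue numbers in sequence A to the residue numbers they align with in B.

    start_a and start_b are the residue numbers of the first residue of each
    aligned string (e.g. 19 for an IBV S1 sequence lacking its signal peptide).
    Residues aligned to a gap are left out of the mapping.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    pos_a, pos_b = start_a, start_b
    for a, b in zip(aln_a, aln_b):
        if a != "-" and b != "-":
            mapping[pos_a] = pos_b
        # Advance each residue counter only when that sequence has a residue here.
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
    return mapping


# Toy alignment (made-up residues, not real coronavirus sequences).
seq_a = "MKV-LLAG"
seq_b = "MRVQLL-G"
print(round(percent_identity(seq_a, seq_b), 1))  # 83.3
print(map_positions(seq_a, seq_b))  # {1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 7: 7}
```

Taking the smallest and largest mapped values for a segment of sequence A gives the corresponding range in sequence B.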
Since we wanted to model the entire S1 protein, we sought an alternative strategy. I-TASSER (Iterative Threading Assembly Refinement) is an automated program that identifies structural templates by threading the query sequence against a database of known structures, assembles full-length models from them, and uses additional algorithms to refine the structure and to infer function (Roy et al., 2010; Yang et al., 2014; Zhang, 2008). An online I-TASSER service is currently available at http://zhanglab.ccmb.med.umich.edu/I-TASSER/. We submitted the IBV S1 sequences of two vaccine virus serotypes, namely the Massachusetts (Mass) and Arkansas (Ark) serotypes. The submitted S1 sequences span from amino acid 19 through the last amino acid before the furin cleavage site that delineates the S1 and S2 spike protein subunits (aa 533-538 in Mass). The signal peptide of the S1 spike (amino acid positions 1-18) was removed from the submitted sequence, since it is not present in the mature S1 spike protein. Among the top five models predicted by I-TASSER, we chose the model with the highest C-score, a confidence score that estimates the quality of the predicted models. Fig. 2 shows the overall structure of the S1 spike protein for the Ma5 (Mass) and ArkDPI (Arkansas Delmarva Poultry Industry) vaccine strains. The predicted structures have a dumbbell shape with two distinct domains composed primarily of β-sheets. This observation is consistent with the proposal, mentioned above, that all coronavirus S1 spike proteins have two functional domains, the NTD and CTD. Cryo-electron microscopy structures of the homotrimeric spike protein have been determined for two other coronavirus species, mouse hepatitis virus (Walls et al., 2016) and human coronavirus HKU1 (Kirchdoerfer et al., 2016). The S1 spike in these structures is V-shaped, with the NTD and CTD as distinct domains composed of β-sheets, similar to the predicted structures obtained for IBV.
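Because residues 1-18 are removed before submission, residue numbers in the submitted sequence are offset from full-length S1 numbering. The minimal sketch below trims a full-length S1 sequence to the mature region and converts between the two numbering schemes. It assumes that the returned model numbers the submitted sequence from 1, and that the caller supplies the last residue before the furin site. The function names and the toy sequence are hypothetical and are not part of I-TASSER.

```python
SIGNAL_PEPTIDE_LEN = 18  # IBV S1 signal peptide = residues 1-18, per the text


def mature_s1(full_s1: str, last_residue: int, signal_len: int = SIGNAL_PEPTIDE_LEN) -> str:
    """Return residues (signal_len + 1)..last_residue of a full-length S1 sequence.

    Positions are 1-based and inclusive; last_residue is the final residue
    before the furin cleavage site, which the caller must supply.
    """
    if not signal_len < last_residue <= len(full_s1):
        raise ValueError("last_residue is outside the mature S1 region")
    # Python slices are 0-based and end-exclusive, so [18:last] covers 19..last.
    return full_s1[signal_len:last_residue]


def to_model_numbering(full_pos: int, signal_len: int = SIGNAL_PEPTIDE_LEN) -> int:
    """Full-length S1 position -> position in the submitted (mature) sequence."""
    if full_pos <= signal_len:
        raise ValueError("position lies in the removed signal peptide")
    return full_pos - signal_len


def to_full_numbering(model_pos: int, signal_len: int = SIGNAL_PEPTIDE_LEN) -> int:
    """Position in the submitted sequence -> full-length S1 position."""
    return model_pos + signal_len


# Toy sequence: 18 placeholder signal residues, 4 "mature" residues, then a made-up tail.
toy = "A" * 18 + "MNPQ" + "RRSRR"
print(mature_s1(toy, last_residue=22))  # MNPQ
print([to_model_numbering(p) for p in (38, 43, 63, 69)])  # [20, 25, 45, 51]
```

Under this assumption, residue 43 of full-length S1 appears as residue 25 in the submitted sequence and in the returned model.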
We observed that the minimal RBD previously reported for M41, which spans amino acids 19-272 (Promkuntod et al., 2014), mapped as one of the distinct protein domains in M41. After aligning the M41 RBD with the ArkDPI S1 protein using Muscle (MegAlign Pro, DNAStar, Wisconsin, USA), we observed that two β-strands found in the NTD of M41 (amino acids 254 to 265) were located in the CTD of ArkDPI (amino acids 261 to 272). This suggests that the minimal RBD of ArkDPI may differ from that of M41. Aside from the minimal RBD, we also highlighted amino acid residues (38, 43, 63, 69) that were identified as critical for binding to host tissues (Promkuntod et al., 2014). These amino acids mapped to the surface of the S1 spike, suggesting that they could be part of the receptor-binding site; however, they could also be affecting the conformation of the receptor-binding region (Fig. 3). Note, however, that these residues were identified in M41, a Mass-type IBV. Other amino acid residues may play a critical role in binding to the host in serotypes other than Mass.

Fig. 1. Domains and amino acid sequence identity between IBV S1 and S1 spike proteins whose three-dimensional structures have been determined for the entire protein. Full-length S1 spike protein sequences were obtained for coronaviruses whose trimeric spike protein structures have been determined by cryo-electron microscopy, namely NL63, MHV, and HKU1. Multiple sequence alignment was performed with Muscle to compare the S1 spikes of IBV strains Ma5 and ArkDPI with those of the other coronaviruses over three regions: residues 19 to 69, the area of the IBV S1 protein found to be an important determinant of binding; the N-terminal domain (NTD), residues 19 to 272; and the C-terminal domain (CTD), residues 273 to 537 (Promkuntod et al., 2014). (A) Schematic representation of domains in S1. (B) Amino acid identity. Low sequence identity was observed between IBV and coronaviruses from the other genera. PDB IDs are 5SZS for NL63, 3JCL for MHV, and 5I08 for HKU1. GenBank accession numbers are AAS67647.1 for Ma5 and ADP06471.2 for ArkDPI.

1.1. Examination of predicted structure of S1 protein of Mass IBV vaccines

The Mass-type IBV vaccine is the most commonly used vaccine across the globe. Several genotypes of Mass vaccines are commercially available, among them the Ma5, M41, H52 and H120 vaccines. Vaccine trials in our laboratory comparing two vaccine strains, Ma5 and M41, have shown that Ma5 achieves better protection against challenge than M41 (unpublished results). One hypothesis that could partially explain this observation is that the Ma5 S1 protein is more antigenic than the M41 S1 protein. To test this, we calculated the Jameson-Wolf antigenicity indices (Jameson and Wolf, 1988) for the S1 spike protein at each amino acid position and compared the antigenicity scores of the two Mass-type vaccine strains position by position (Fig. 4). The overall difference in antigenicity between Ma5 and each other vaccine strain was obtained by taking the absolute value of the difference in antigenicity score at each amino acid position and calculating the mean. The overall antigenicity indices differed by only 0.0592, 0.0185, and 0.0593 between Ma5 and M41, H52, and H120, respectively. As expected, most of the regions with the largest differences in antigenicity indices coincide with hypervariable regions of S1 (Cavanagh et al., 1992) (Fig. 4). The differences in antigenicity indices between Ma5 and M41 were mapped onto their respective predicted tertiary structures (Fig. 5).
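The overall-difference calculation described above, the mean of the absolute per-position differences, can be sketched as follows. This is a minimal sketch that assumes the per-residue Jameson-Wolf scores have already been exported, for example from Protean, as equal-length lists that share position numbering. For strains with indels, the sequences would need to be aligned first. The signed differences correspond to the overlays described for Fig. 5 (Ma5 − M41 and M41 − Ma5). All score values below are made up.

```python
from statistics import mean


def antigenicity_differences(scores_a, scores_b):
    """Signed per-position differences (a - b); profiles must share numbering."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score profiles must cover the same positions")
    return [a - b for a, b in zip(scores_a, scores_b)]


def mean_absolute_difference(scores_a, scores_b):
    """Overall difference: the mean of |a - b| over all positions."""
    return mean(abs(d) for d in antigenicity_differences(scores_a, scores_b))


def positions_where_higher(scores_a, scores_b, first_position=1):
    """Positions (numbered from first_position) where profile a scores above b."""
    diffs = antigenicity_differences(scores_a, scores_b)
    return [first_position + i for i, d in enumerate(diffs) if d > 0]


# Made-up scores for four consecutive positions (not real Ma5/M41 values).
ma5 = [0.6, 1.2, -0.3, 0.9]
m41 = [0.6, 0.8, -0.3, 1.0]
print(round(mean_absolute_difference(ma5, m41), 3))  # 0.125
print(positions_where_higher(ma5, m41, first_position=19))  # [20]
```

Because differences of opposite sign do not cancel under the absolute value, even a small overall value can hide local regions where one strain is clearly more antigenic. This is why the per-position map is informative alongside the single summary number.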
Interestingly, Ma5 had a higher antigenicity index in a region close to the residues critical for binding to host tissues (Fig. 5). There are several possible explanations for why the Ma5 vaccine achieves better protection than M41 against challenge, including replication efficiency of the vaccine, the ability to induce neutralizing antibodies that closely match challenge-strain epitopes, and the ability to induce antibodies near the receptor-binding region that presumably would block attachment of the challenge virus. Our data show a higher antigenicity index close to the S1 receptor-binding region of the Ma5 vaccine, which could lead to attachment-blocking antibodies that contribute to the better protection observed for the Ma5 vaccine.

Ark IBV is the most common serotype isolated in the United States (Jackwood et al., 2005; Nix et al., 2000). Currently, ArkDPI is the only commercially available vaccine against Ark IBV. However, ArkDPI has been shown to replicate poorly in birds (Roh et al., 2013) and to persist within a flock (Jackwood et al., 2009). Re-isolation of ArkDPI vaccine virus from vaccinated birds reveals that certain polymorphisms emerge, particularly in the S1 spike gene (Ammayappan et al., 2009; McKinley et al., 2008; Ndegwa et al., 2014; Nix et al., 2000; Roh et al., 2013; Toro et al., 2012). One of the most common polymorphisms observed in re-isolated ArkDPI is a tyrosine-to-histidine change at position 43 (Y43H). As mentioned above, amino acid position 43 is one of the residues critical for binding to host tissues. A histidine is conserved at position 43 (H43) in many other IBV serotypes (Fig. 6A). However, the reference sequence of the ArkDPI vaccine (GenBank accession ADP06471.2) has a tyrosine at position 43. Upon re-isolation of ArkDPI vaccine virus from chickens, S1 spike sequencing reveals that almost all re-isolated ArkDPI vaccine virus has H43.
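A statement such as "almost all re-isolated ArkDPI vaccine virus has H43" comes from tallying the residue at one position across sequenced re-isolates. The minimal sketch below assumes ungapped sequences that share the reference numbering, with no insertions or deletions upstream of the position. An `offset` parameter handles sequences whose N-terminus was trimmed, such as mature S1 lacking residues 1-18. The helper names and the sequences are hypothetical.

```python
from collections import Counter


def residue_at(seq: str, position: int, offset: int = 0) -> str:
    """Residue at a 1-based reference position.

    offset is the number of reference residues absent from the N-terminus of
    seq (e.g. 18 if the signal peptide was trimmed off).
    """
    index = position - offset - 1
    if not 0 <= index < len(seq):
        raise IndexError(f"position {position} is outside this sequence")
    return seq[index]


def tally_position(seqs, position, offset=0):
    """Count which amino acid each sequence carries at the given position."""
    return Counter(residue_at(s, position, offset) for s in seqs)


# Made-up re-isolate sequences: identical except at position 43.
prefix = "G" * 42
reisolates = [prefix + "H" + "KL", prefix + "H" + "KL", prefix + "Y" + "KL"]
counts = tally_position(reisolates, 43)
print(counts["H"], counts["Y"])  # 2 1
print(f"{counts['H'] / sum(counts.values()):.0%} carry H43")  # 67% carry H43
```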
The predominance of H43 among re-isolates suggests that H43 affects the ability of IBV to infect chickens, likely by influencing the binding properties of the S1 spike. To examine the impact of Y43H, we submitted the ArkDPI S1 spike with either Y43 or H43 to the I-TASSER server. The predicted tertiary structures revealed that position 43 is buried in the ArkDPI S1 spike with either Y43 or H43 (Fig. 6B and C). However, ArkDPI with H43 allowed a conformational change that exposes amino acid residues 38 and 63, which have been implicated in binding to host tracheal tissues. Our binding experiments on host tissues have shown that a tyrosine (Y) at position 43 allows the S1 spike to bind strongly to the chorioallantoic membrane of embryos, whereas a histidine (H) at the same position allows the S1 spike to bind well to mature chicken trachea (Leyson et al., 2016). These findings provide an explanation for the emergence of the Y43H polymorphism in ArkDPI vaccine virus re-isolated from chickens.

We have shown in this study that prediction of three-dimensional protein structure using I-TASSER can offer insights that help explain experimental data as well as phenomena observed in the field. In silico analysis has advanced to the point where protein structure prediction has become robust. Indeed, such structure predictions are especially valuable for proteins whose structures are difficult to determine empirically. Applying this type of analysis to other IBV serotypes will offer a fast, inexpensive way to design experiments and to predict the outcomes of sequence changes observed in the field.

Fig. 2. Predicted molecular structure of the S1 spike protein has two distinct domains. The S1 spike protein sequences of a Massachusetts (Ma5) and an Arkansas (ArkDPI) serotype vaccine virus were submitted to I-TASSER (Zhang, 2008). The signal peptide (amino acid positions 1-18) was removed from the submitted sequences to reflect the mature peptide sequence. The predicted three-dimensional structures of the Ma5 (A) and ArkDPI (B) S1 spike proteins exhibited two distinct domains. The minimal receptor-binding domain (Promkuntod et al., 2014), mapped to amino acids 19-272, is shown in blue. Interestingly, the minimal receptor-binding domain forms a distinct domain in both IBV S1 spike proteins examined. This observation is consistent with the proposition that the S1 spike of all coronaviruses has two domains, namely the N-terminal and C-terminal domains (Li, 2015). NTD = N-terminal domain and CTD = C-terminal domain. Visualization of protein structures was done using PyMol (Schrödinger, 2015). GenBank accession numbers are AAS67647.1 for Ma5 and ADP06471.2 for ArkDPI.

Fig. 3. Residues critical for binding to chicken tissues map to a localized region on the surface of S1. The predicted molecular structure of the Ma5 (Mass) vaccine S1 spike protein is shown as a surface representation to examine solvent-exposed residues. Amino acids identified as critical for binding to host tissues (Promkuntod et al., 2014) are colored red. All four amino acids (N38, H43, P63, T69) mapped to the surface, consistent with their function in binding to host receptors. NTD = N-terminal domain and CTD = C-terminal domain. Visualization of protein structures was done using PyMol (Schrödinger, 2015).

Fig. 4. Antigenicity indices were compared between two Mass IBV vaccine strains, Ma5 and M41. The Jameson-Wolf antigenicity index (Jameson and Wolf, 1988) was applied to the S1 spike amino acid sequences of four common Mass-type IBV vaccines, namely Ma5, M41, H120, and H52. The antigenicity score at each amino acid position in Ma5 was numerically compared to that of M41, H120, or H52. The heat map shows regions where antigenicity scores differed between Ma5 and the other Mass-type IBV vaccine strains. Blue marks amino acid positions where Ma5 has a higher antigenicity score than M41, H120, or H52, whereas yellow marks positions where M41, H120, or H52 has a higher antigenicity score than Ma5. The green bars at the bottom represent known hypervariable regions in the S1 spike (Cavanagh et al., 1992). Jameson-Wolf antigenicity indices were calculated using Protean (DNAStar, Wisconsin, USA). The heat map was generated in Microsoft Excel (Washington, USA).

Fig. 5. Differences in antigenicity indices in the receptor-binding region of the S1 spikes of Massachusetts-type IBV vaccine viruses. The antigenicity index scores for Ma5 and M41 were determined by the method of Jameson and Wolf (1988), and the scores were compared at each amino acid position. The M41 scores were subtracted from the Ma5 scores, and the resulting values are shown on the Ma5 model (A). The Ma5 scores were subtracted from the M41 scores, and the resulting values are shown on the M41 model (B). The values were overlaid onto the predicted molecular structures as a heat map indicating the magnitude of the difference, where blue represents the lowest value and red the highest. The amino acid residues shown to be critical for binding are indicated by yellow dots (surface model) or by residue name and number (inset). Ma5 had higher antigenicity scores close to the receptor-binding region. This may help explain the better efficacy of the Ma5 vaccine, since this model suggests that Ma5 could elicit more neutralizing antibodies that target the receptor-binding region. NTD = N-terminal domain and CTD = C-terminal domain. Visualization of protein structures was done using PyMol (Schrödinger, 2015).

Fig. 6. Y43H polymorphism in the ArkDPI S1 spike protein potentially alters the receptor-binding region. Amino acids 38, 43, 63, and 68 have been shown to be critical for S1 spike binding to host tissues (Promkuntod et al., 2014). (A) Multiple alignment of amino acid positions 36 to 70 from Mass and Ark serotypes. In most serotypes, a histidine (H43) is well conserved; in the ArkDPI vaccine, however, position 43 has a tyrosine (Y43). (B) The predicted tertiary structure of the ArkDPI vaccine S1 spike revealed that positions 38, 43, and 63 are occluded from the surface of the S1 spike, while position 68 is completely buried in the structure as part of a β-strand. (C) In contrast, the predicted structure of ArkDPI S1 with H43 has amino acids 38 and 63 on its surface, perhaps allowing it to interact with host receptors and to bind more effectively to host cells. NTD = N-terminal domain and CTD = C-terminal domain. Multiple alignment was done using the ClustalW algorithm and eBioX software (A). Visualization of protein structures was done using PyMol (Schrödinger, 2015) (B, C).
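The figure renderings described above can be reproduced in outline with a PyMOL command script. The hypothetical Python helper below only writes such a script as text. It colors the minimal RBD (19-272) blue and the critical residues from Fig. 3 (38, 43, 63, 69) red on a surface view. Several assumptions apply: the model file name (`model1.pdb`) and object name are placeholders; the model is assumed to number the submitted mature sequence from 1, so residues are shifted by 18 with the `alter ..., resv = resv + N` idiom; and the colors only approximate the published figures. Checking the renumbering in your PyMOL version, for example with `iterate`, is advisable before relying on the coloring.

```python
def pymol_script(model_path, obj, offset=18, domain=(19, 272), critical=(38, 43, 63, 69)):
    """Return PyMOL commands that color the minimal RBD and critical residues."""
    resi_sel = "+".join(str(r) for r in critical)
    commands = [
        f"load {model_path}, {obj}",
        # Assumption: the model is numbered from 1, so shift to full-length S1 numbering.
        f"alter {obj}, resv = resv + {offset}",
        f"sort {obj}",
        f"hide everything, {obj}",
        f"show surface, {obj}",
        f"color gray, {obj}",
        f"color blue, {obj} and resi {domain[0]}-{domain[1]}",
        # Applied after the domain coloring so the critical residues stay red.
        f"color red, {obj} and resi {resi_sel}",
        f"orient {obj}",
    ]
    return "\n".join(commands) + "\n"


# model1.pdb and ma5 are placeholder names for a downloaded I-TASSER model.
script = pymol_script("model1.pdb", "ma5")
print(script)
```

Saved to a `.pml` file, the output can be run from within PyMOL with `@filename.pml`.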
tertiary and quaternary structure using evolutionary information
The coronavirus spike protein is a class I virus fusion protein: structural and functional characterization of the fusion core complex
Coronavirus IBV: further evidence that the surface projections are associated with two glycopolypeptides
Severe acute respiratory syndrome vaccine development: experiences of vaccination against avian infectious bronchitis coronavirus
Coronavirus IBV: removal of spike glycopolypeptide S1 by urea abolishes infectivity and haemagglutination but not attachment to cells
Amino acids within hypervariable region 1 of avian coronavirus IBV (Massachusetts serotype) spike glycoprotein are associated with neutralization epitopes
Location of the amino acid differences in the S1 spike glycoprotein subunit of closely related serotypes of infectious bronchitis virus
Relationship between sequence variation in the S1 spike protein of infectious bronchitis virus and the extent of cross-protection in vivo
Cytotoxic T lymphocytes are critical in the control of infectious bronchitis virus in poultry
Infectious bronchitis virus variants: a review of the history, current situation and control measures
Recombinant infectious bronchitis coronavirus Beaudette with the spike protein gene of the pathogenic M41 strain remains attenuated but induces protective immunity
Data from 11 years of molecular typing infectious bronchitis virus field isolates
Infectious bronchitis virus field vaccination coverage and persistence of Arkansas-type viruses in commercial broilers
The antigenic index: a novel algorithm for predicting antigenic determinants
Pre-fusion structure of a human coronavirus spike protein
Antigenic domains on the peplomer protein of avian infectious bronchitis virus: correlation with biological functions
Polymorphisms in the S1 spike glycoprotein of Arkansas-type infectious bronchitis virus (IBV) show differential binding to host tissues and altered antigenicity
Evidence for a common evolutionary origin of coronavirus spike protein receptor-binding subunits
Receptor recognition mechanisms of coronaviruses: a decade of structural studies
Avian coronavirus infectious bronchitis attenuated live vaccines undergo selection of subpopulations and mutations following vaccination
Monoclonal antibodies to the S1 spike and membrane proteins of avian infectious bronchitis coronavirus strain Massachusetts M41
Identification of amino acids involved in a serotype and neutralization specific epitope within the S1 subunit of avian infectious bronchitis virus
Sequence comparison of avian infectious bronchitis virus S1 glycoproteins of the Florida serotype and five variant isolates from Georgia and California
Comparison of vaccine subpopulation selection, viral loads, vaccine virus persistence in trachea and cloaca, and mucosal antibody responses after vaccination with two different Arkansas Delmarva poultry industry-derived infectious bronchitis virus vaccines
The peplomer protein sequence of the M41 strain of coronavirus IBV and its comparison with Beaudette strains
Emergence of subtype strains of the Arkansas serotype of infectious bronchitis virus in Delmarva broiler chickens
Mapping of the receptor-binding domain and amino acids critical for attachment in the spike protein of avian coronavirus infectious bronchitis virus
Evaluation of infectious bronchitis virus Arkansas-type vaccine failure in commercial broilers
I-TASSER: a unified platform for automated protein structure and function prediction
The PyMOL Molecular Graphics System (Version 1.8)
Infectious bronchitis virus subpopulations in vaccinated chickens after challenge
Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer
The avian coronavirus spike protein
Sialic acid is a receptor determinant for infection of cells by avian infectious bronchitis virus
Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor
The I-TASSER suite: protein structure and function prediction
I-TASSER server for protein 3D structure prediction