key: cord-0798914-jkc5ronu
authors: Li, Wei
title: Delving deep into the structural aspects of a furin cleavage site inserted into the spike protein of SARS-CoV-2: A structural biophysical perspective
date: 2020-06-29
journal: Biophys Chem
DOI: 10.1016/j.bpc.2020.106420
sha: 6f9e120d7b96628db47d177a9a0bf8d92f34201a
doc_id: 798914
cord_uid: jkc5ronu

As one notable feature of the SARS-CoV-2 genome, the spike (S) protein of SARS-CoV-2 has a polybasic furin cleavage site (FCS) at its S1-S2 boundary, introduced through the insertion of 12 nucleotides encoding the four amino acid residues PRRA. Quite intriguingly, this polybasic FCS is absent in coronaviruses of the same clade as SARS-CoV-2. Thus, with currently available experimental structural data for the S protein, this short article presents a comprehensive structural characterization of the insertion of the FCS into the S protein, and argues against the hypothesis that SARS-CoV-2 originated from purposeful manipulation: (1) the inserted FCS is located in a random coil loop region, mostly distant and solvent-exposed (instead of deeply buried), with no structural proximity to the other parts of the S protein; (2) the insertion of the FCS itself does not alter, i.e., neither stabilizes nor destabilizes, the three-dimensional structure of S; (3) the net result is the insertion of a furin cleavage site into the S protein, whose S1 and S2 subunits will still be strongly electrostatically bonded together from a structural and biophysical point of view, even if the polybasic FCS is actually cleaved by furin protease before or after viral cell entry.

sequences that mediate fusion of viral and cell membranes [3, 4, 5, 6, 7]. Recently, it was reported that the S protein contains a potential cleavage site for furin protease [8], including four residues (Pro681 (P681), Arg682 (R682), Arg683 (R683) and Ala684 (A684)) [9, 10, 11, 12, 13].
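The reported site can be located in a sequence with a short scan. The sketch below is only an illustration and is not part of the article's workflow. It assumes a minimal polybasic pattern with Arg at the first and fourth positions of a four-residue window and Arg or Lys at the second or third position. The fragment `TNSPRRARSVAS`, its assumed starting residue number 678, the control string, and the name `find_minimal_fcs` are all introduced here for illustration.

```python
import re

# Assumed minimal pattern: Arg at positions 1 and 4 of a four-residue window,
# with Arg or Lys at position 2 or 3. The lookahead reports overlapping windows.
MINIMAL_FCS = re.compile(r"(?=(R(?:[RK].|.[RK])R))")


def find_minimal_fcs(sequence, first_residue_number=1):
    """Return (number of the first Arg, matched window) for each hit."""
    return [
        (match.start() + first_residue_number, match.group(1))
        for match in MINIMAL_FCS.finditer(sequence)
    ]


# Illustrative fragment, assumed to start at residue 678 of the S protein.
with_insert = "TNSPRRARSVAS"
# Hypothetical control: the same fragment with PRRA removed.
without_insert = "TNSRSVAS"

hits_with = find_minimal_fcs(with_insert, first_residue_number=678)
hits_without = find_minimal_fcs(without_insert, first_residue_number=678)
print(hits_with, hits_without)
```

With these inputs the scan reports a single window, RRAR starting at residue 682, and nothing for the control string.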
Functionally, R682, R683, A684 and Arg685 (R685) constitute the minimal polybasic furin cleavage site (FCS), i.e., RXYR, where X or Y is a positively charged arginine or lysine. With respect to the origin of this COVID-19 pandemic, of further interest is the fact that the S protein has a specific FCS that is absent in coronaviruses of the same clade as SARS-CoV-2 [14, 8, 15, 16]. In the midst of this COVID-19 pandemic, the FCS has reportedly been linked to a natural-selection-based, rather than purposeful-manipulation-based, hypothesis of the origin of this COVID-19 outbreak [9, 17, 18, 19, 20, 21]. Regardless of this COVID-19-origin hypothesis [9],

First, the sequence above was submitted to the SwissModel homology modelling [23] server in search of an experimental structural model of the S protein of SARS-CoV-2. In principle, it is expected that the experimental S protein structure is complete, i.e., with no experimentally uncharted territories (EUTs) [24]. In fact, however, the structural search led to a Cryo-EM structure (PDB ID: 6VSB) with a range of EUTs [24], representing the three-dimensional structure of the prefusion 2019-nCoV spike glycoprotein with a single receptor-binding domain up [1]. From an amino acid sequence alignment (Figure 1.2 in supplementary file supplement.pdf), it is obvious that those EUTs are widely scattered throughout the homotrimeric structure (PDB ID 6VSB) [1]. Yet, another amino acid sequence alignment (Figure 1.1 in supplementary file supplement.pdf) revealed that the sequence similarity between the Cryo-EM structure (PDB ID 6VSB) and QHD43416.1 is as high as 99.26%, making PDB ID 6VSB a suitable structural template for the subsequent homology modelling of the S protein of SARS-CoV-2.
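A similarity figure of this kind is typically derived from a pairwise alignment. The sketch below shows one common convention, percent identity over aligned columns in which neither sequence has a gap. It is an assumption that this matches how the 99.26% value was obtained, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gap columns (e.g. unresolved template regions) are skipped
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment, not taken from the article: one gap column, one mismatch.
target = "ACDEFG"
template = "ACD-FA"
identity = percent_identity(target, template)
print(f"{identity:.2f}%")
```

Skipping gap columns means that unresolved regions of a template lower the coverage of the alignment but not the identity value, which is one reason a template with many EUTs can still show a very high identity.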
Therefore, the Cryo-EM structure (PDB ID 6VSB) was used as the template by the SwissModel [23] homology modelling server to build a structural model of the S protein of SARS-CoV-2 with as few EUTs [24] as possible.

buried or exposed according to a comparison with its standard SASA value contained in the "standard.data" file (supplementary file standard.data) available with the Naccess [29] software distribution. With SwissModel [23] and UCSF Chimera [25], a homology homotrimeric structural model (H)

On the other hand, however, the last hydrogen bond (No. 5 in Table 1) was formed between the main chain (backbone) oxygen atom of the FCS residue Arg685 and the positively charged side chain of Lys310, for which a comprehensive electrostatic interaction analysis is included in Tables 2 and 3 below.

and C), making it unlikely that the 5th hydrogen bond (Table 1) is able to disrupt the basic residue pair Lys310-Asp663 and induce a major conformational change in the structure of the S protein of SARS-CoV-2. To sum up, the inserted FCS is only involved in a set of weak electrostatic interactions (five hydrogen bonds, Table 1) within the S protein of SARS-CoV-2, whose overall scaffold is not altered, i.e., neither stabilized nor destabilized, by the insertion of the polybasic FCS.

the inserted FCS fragment from Chain B is coloured red. This figure was prepared using PyMol [30] with supplementary file model.pdb as an input.

As shown in Figure 1, Table 4 for residues at FCS as below, observed in an ALA-X-ALA tripeptide built using the QUANTA molecular graphics software package [29].
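The buried-or-exposed classification mentioned above compares the solvent-accessible surface area (SASA) of a residue in the model with a standard value for that residue type. A minimal sketch of such a comparison follows. The threshold of 0.25, the function names, and all numerical values are assumptions made here for illustration; the real standard values are those in standard.data, and the article's actual cutoff is not stated in this text.

```python
def relative_accessibility(observed_sasa, standard_sasa):
    """Ratio of a residue's SASA in the model to its standard (tripeptide) SASA."""
    if standard_sasa <= 0:
        raise ValueError("standard SASA must be positive")
    return observed_sasa / standard_sasa


def classify_residue(observed_sasa, standard_sasa, buried_below=0.25):
    """Label a residue 'buried' or 'exposed'; buried_below is an assumed cutoff."""
    ratio = relative_accessibility(observed_sasa, standard_sasa)
    return "buried" if ratio < buried_below else "exposed"


# Placeholder numbers in square angstroms, invented for this example only.
STANDARD = 240.0
label_loop = classify_residue(observed_sasa=180.0, standard_sasa=STANDARD)
label_core = classify_residue(observed_sasa=12.0, standard_sasa=STANDARD)
print(label_loop, label_core)
```

Keeping the cutoff as a parameter makes it easy to check how sensitive the buried or exposed labels are to the chosen threshold.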
In addition to Table 4

To answer this question, a similar electrostatic interaction analysis [26] was conducted for the trimeric [1] structural model of the S protein, with the difference being that the whole S protein structure was split into two parts: the first part consists of three (chains A, B and C) S1 structural fragments, i.e., before the polybasic FCS, while the second part consists of three (chains A, B and C) S2 structural fragments, i.e., after the polybasic FCS. The results of this FCS-specific electrostatic interaction analysis are included in Tables 5, 6

(Table 5). This figure was prepared using PyMol [30] with supplementary file

respectively. From Tables 5, 6

Table 5), Arg319 of chain A also forms a side chain hydrogen bond with Asp737 of chain B (Table 7). Among all electrostatic interactions listed in Tables 5, 6

Incorporating currently available structural data (as of Thu Jun 25 09:39:40 2020) of the S protein [22], and in the hope of pushing forward a little the boundary of our scientific knowledge on COVID-19, this short article presents a comprehensive structural characterization of the FCS inserted into the S protein of SARS-CoV-2, and puts forward the structural analysis below:

1. the polybasic FCS is only involved in a set of weak electrostatic interactions, and is therefore not able to alter, i.e., neither stabilize nor destabilize, the overall scaffold of the S protein of SARS-CoV-2.
2. the polybasic FCS is located in a random coil loop region, mostly distant and solvent-exposed (Figure 1, instead of deeply buried), with no structural proximity to the other parts of the S protein of SARS-CoV-2.

Quite recently, it has been reported that this polybasic FCS is essential for SARS-CoV-2 to infect human lung cells, and that campaigns to develop therapeutics against SARS-CoV-2 should include the evaluation of furin inhibitors [16, 14].
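The S1/S2 partition used in the electrostatic interaction analysis above can be sketched as a simple filter over charged side-chain atoms. This is a minimal illustration under stated assumptions, not the article's procedure: the FCS is taken to span residues 682 to 685, a 4.0 angstrom distance cutoff is assumed, only Arg/Lys and Asp/Glu are treated as charged, and every coordinate below is invented. The residue labels are borrowed from the text purely as names.

```python
import math
from typing import NamedTuple


class ChargedAtom(NamedTuple):
    chain: str
    resname: str
    resnum: int
    atom: str
    xyz: tuple


POSITIVE = {"ARG", "LYS"}
NEGATIVE = {"ASP", "GLU"}


def side_of_fcs(resnum, fcs_first=682, fcs_last=685):
    """Assign a residue to S1 (before the FCS), S2 (after it) or the FCS itself."""
    if resnum < fcs_first:
        return "S1"
    if resnum > fcs_last:
        return "S2"
    return "FCS"


def cross_interface_pairs(atoms, cutoff=4.0):
    """Oppositely charged atom pairs that bridge S1 and S2 within the cutoff."""
    positives = [a for a in atoms if a.resname in POSITIVE]
    negatives = [a for a in atoms if a.resname in NEGATIVE]
    pairs = []
    for pos in positives:
        for neg in negatives:
            sides = {side_of_fcs(pos.resnum), side_of_fcs(neg.resnum)}
            distance = math.dist(pos.xyz, neg.xyz)
            if sides == {"S1", "S2"} and distance <= cutoff:
                pairs.append((pos, neg, distance))
    return pairs


# Invented coordinates (angstroms); only the residue labels come from the text.
atoms = [
    ChargedAtom("A", "ARG", 319, "NH1", (0.0, 0.0, 0.0)),
    ChargedAtom("B", "ASP", 737, "OD1", (3.0, 0.0, 0.0)),
    ChargedAtom("A", "LYS", 310, "NZ", (10.0, 0.0, 0.0)),
    ChargedAtom("A", "ASP", 663, "OD2", (12.5, 0.0, 0.0)),
]
pairs = cross_interface_pairs(atoms)
for pos, neg, distance in pairs:
    print(pos.chain, pos.resnum, neg.chain, neg.resnum, round(distance, 2))
```

Pairs whose two residues lie on the same side of the FCS are ignored, so only contacts that would still hold S1 and S2 together after cleavage are reported.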
In view of the reported in vitro functional relevance of the FCS [16], it is postulated here that the polybasic FCS does need to be cleaved by furin protease, leading to an action mechanism of SARS-CoV-2 different from that of its siblings, where its S1 and S2 subunits (while still strongly bonded together) undergo a major structural rearrangement before or after viral cell entry. Nevertheless, in the midst of this COVID-19 pandemic [32] with more than 5.5 million (as of Thu Jun 25 09:39:40 2020) confirmed cases globally, this short article puts forward an analysis suggesting that the net structural consequence of the FCS is the insertion of a furin cleavage site into the S protein of SARS-CoV-2, and that the insertion is thus of only limited structural biophysical relevance. Finally, along with [9], the structural biophysical analysis here makes a purposeful-manipulation-based hypothesis of the origin of SARS-CoV-2 even less likely.

[1] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
[2] Structure of 2019-nCoV chimeric receptor-binding domain complexed with its receptor human ACE2.
[3] Worldwide Protein Data Bank
[4] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[5] A new coronavirus associated with human respiratory disease in China
[6] Severe acute respiratory syndrome-related coronavirus: The species and its viruses - a statement
[7] An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases
[8] A Database of 20-Residue Furin Cleavage Site Motifs, Substrates and Their Associated Drugs
[9] The proximal origin of SARS-CoV-2
[10] Mimicking the passage of avian influenza viruses through the gastrointestinal tract of chickens. Veterinary Microbiology
[11] A Novel Activation Mechanism of Avian Influenza Virus H9N2 by Furin
[12] Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
[13] Structure, function and antigenicity of the SARS-CoV-2 spike glycoprotein
[14] The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
[15] Functional analysis of potential cleavage sites in the MERS-coronavirus spike protein
[16] A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
[17] Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
[18] Trypsin Treatment Unlocks Barrier for Zoonotic Bat Coronavirus Infection
[19] Pangolin homology associated with 2019-nCoV
[20] Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet
[21] Evidence of recombination in coronaviruses implicating pangolin origins of nCoV-2019
[22] Announcing the worldwide Protein Data Bank
[23] SWISS-MODEL: homology modelling of protein structures and complexes
[24] Visualising the Experimentally Uncharted Territories of Membrane Protein Structures inside Protein Data Bank
[25] UCSF Chimera: A visualization system for exploratory research and analysis
[26] How do SMA-linked mutations of SMN1 lead to structural/functional deficiency of the SMA protein? PLOS ONE
[27] Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features
[28] A solution for the best rotation to relate two sets of vectors
[29] Computer Program, Department of Biochemistry and Molecular Biology
[30] Pymol: An open-source molecular graphics tool
[31] Principles that Govern the Folding of Protein Chains