key: cord-351835-1s2zsqoq authors: Liu, Zhixin; Xiao, Xiao; Wei, Xiuli; Li, Jian; Yang, Jing; Tan, Huabing; Zhu, Jianyong; Zhang, Qiwei; Wu, Jianguo; Liu, Long title: Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS‐CoV‐2 date: 2020-03-11 journal: J Med Virol DOI: 10.1002/jmv.25726 sha: doc_id: 351835 cord_uid: 1s2zsqoq

In 2002 and 2012, respectively, severe acute respiratory syndrome coronavirus (SARS‐CoV) and Middle East respiratory syndrome coronavirus (MERS‐CoV) crossed the species barrier to infect humans, causing thousands of infections and hundreds of deaths. Recently, a novel coronavirus (SARS‐CoV‐2) was discovered as the cause of the outbreak of coronavirus disease 2019 (COVID‐19). As of 18 February 2020, there were 72 533 confirmed COVID‐19 cases (including 10 644 severe cases) and 1872 deaths in China. SARS‐CoV‐2 is spreading among the public and causing a substantial burden due to its human‐to‐human transmission. However, the intermediate host of SARS‐CoV‐2 is still unclear. Finding the possible intermediate host of SARS‐CoV‐2 is imperative to prevent further spread of the epidemic. In this study, we used systematic comparison and analysis to predict the interaction between the receptor‐binding domain (RBD) of coronavirus spike protein and the host receptor, angiotensin‐converting enzyme 2 (ACE2). The interaction between the key amino acids of S protein RBD and ACE2 indicated that, in addition to pangolins and snakes, as previously suggested, turtles (Chrysemys picta bellii, Chelonia mydas, and Pelodiscus sinensis) may act as potential intermediate hosts transmitting SARS‐CoV‐2 to humans.
SARS-CoV and MERS-CoV are considered highly pathogenic and are known to be transmitted from bats to humans via the intermediate hosts palm civets 4 and dromedary camels, 5 respectively.

The full-length structure of SARS-CoV-2 spike glycoprotein was simulated by the I-TASSER server online tool. 10 The spike-ACE2 binding model was predicted using PRISM 2.0. 11 The spike protein and ACE2 structure files were analyzed using PyMOL software (PyMOL v1.0).

SARS-CoV-2 encodes at least 27 proteins, including 15 nonstructural proteins, 4 structural proteins, and 8 auxiliary proteins. 12 Spike glycoprotein (S), a structural protein located on the outer envelope of the virion, binds to the host receptor angiotensin-converting enzyme 2 (ACE2). The S glycoprotein of SARS-CoV, MERS-CoV, and SARS-CoV-2 has 1104 to 1273 amino acids and contains an amino (N)-terminal S1 subunit and a carboxyl (C)-terminal S2 subunit 13 (Figure 1). In the S1 subunit, the receptor-binding domain (RBD), spanning about 200 residues, consists of two subdomains: the core and external subdomains. 14, 15 The RBD core subdomain is responsible for the formation of S trimer particles. 16 The external subdomain contains two exposed loops on the surface, which bind with ACE2. 17 Investigating the evolutionary relationship of the RBD sequence in spike protein helps in understanding trends in the origin of the virus.

FIGURE 1 Structural diagrams of spike glycoproteins of SARS-CoV, MERS-CoV, and SARS-CoV-2. All spike proteins of coronaviruses contain an S1 subunit and an S2 subunit, which are divided by the S cleavage sites. FP, fusion peptide; HR, heptad repeat 1 and heptad repeat 2; RBD, receptor-binding domain, contains core binding motif in the external subdomain; SP, signal peptide

Phylogenetic reconstruction determines the evolutionary relationship and host selection between spike glycoproteins in the human-close beta coronaviruses.
To better understand the host selection of beta coronaviruses, the relationship of spike glycoprotein between SARS-CoV-2 and other closely related beta coronaviruses was analyzed. The result showed that bat SARS-like CoV RaTG13, with 96.2% overall genome sequence identity, 18 is an inner joint neighbor of SARS-CoV-2 (Figure 2). In October 2019, it was reported that Sendai virus and coronavirus were the dominant viruses in the virome data of Malayan pangolins. 19 It is noteworthy that SARS-CoV was the most widely distributed coronavirus in the pangolin samples. We BLAST-searched the SARS-CoV-2 reads and found that the spike protein sequence was present in pangolin SARS-like CoV (numbered SRR10168377 here). The S protein in pangolin SARS-like CoV SRR10168377 (marked with a red star) possesses only 75% similarity with SARS-CoV-2, partly because more than 220 residues in the S2 subunit had not been read in the virome data. After the unread residues were removed, the similarity with SARS-CoV-2 was still 88% (data not shown); the full-length spike protein sequences of pangolin SARS-like CoV and SARS-CoV-2 therefore appear to differ somewhat. The temporal tree showed that the divergence time of the spike sequence between bat SARS-like RaTG13 and SARS-CoV-2 is 0.18, while it is 1.50 in bat SARS-like RaTG13 to the SARS-CoV-2 cluster (Figure S1).

Global expansion and deep sequencing work have led to an increased number of SARS-CoV-2 genotypes. Studying selective pressure may be helpful for assessing the variability and potential for host changes of SARS-CoV-2. Based on a report, 20 the selective pressure analysis showed that the ORF10 and ORF7a genes are under greater selective pressure, whereas Spike is under average pressure relative to the whole genome (Figure S2).

The spike RBD-receptor interaction is a key factor determining the host range of coronaviruses. The RBD sequences of spike protein from SARS-CoV, bat or pangolin SARS-like CoV, and SARS-CoV-2 were aligned (Figure 3).
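The gap between the 75% and 88% figures reported above for the pangolin spike sequence depends on whether unread residues are counted as mismatches. The following is a minimal sketch of that effect, assuming the two sequences are already aligned to equal length and that unread or deleted positions are written as `-` or `X`. The function name `percent_identity` and the toy sequences are hypothetical, and the metric shown is plain percent identity, which may differ from the similarity measure the authors actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str, skip: str = "-X") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence carries a character listed in `skip`
    (gap or unread residue) are left out of the comparison. Passing
    skip="" keeps every column, so gaps then count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in skip or b in skip:
            continue  # unread or deleted position: ignore this column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy sequences for illustration only; they are not real spike sequences.
reference = "ACDEFGHIKL"
partial = "ACDEFGHV--"  # the last two residues were not read

with_gaps_as_mismatch = percent_identity(reference, partial, skip="")
over_read_columns_only = percent_identity(reference, partial)
print(with_gaps_as_mismatch, over_read_columns_only)
```

In this toy case the value rises when unread columns are excluded, which mirrors the direction of the change described in the text, though not its magnitude.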
There are some deletions between residues 473 and 490 in bat SARS-like CoV, which are located in the external subdomain; these viruses do not appear to infect humans naturally, as there is no direct evidence that bat SARS-like CoV has ever infected humans. Interestingly, the SARS-CoV-2 RBD sequence from 329 to 521 possesses 93% identity with pangolin SARS-like CoV SRR10168377, which is higher than its 89% similarity with bat SARS-like CoV RaTG13.

Residues K479 and S487 in civet SARS-CoV can effectively recognize civet ACE2, but bind with human ACE2 much less efficiently. 23 The residue Thr487 in RBD binds to Tyr41 and Lys353 in human ACE2 by van der Waals contacts, 25 (Table 1). Mouse and dog also have multiple (≥5) substitutions, while cat and hamster contain only three mutations in the region. It was shown that Lys31, Tyr41, and Lys353 mutations substantially interfere with the S1-Ig association. 24

This study suggests that, like snakes and pangolins, turtles (C. picta bellii, C. mydas, and P. sinensis) may also act as potential intermediate hosts transmitting SARS-CoV-2 to humans, although much more needs to be confirmed.
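The species comparison of ACE2 contact residues discussed in this section amounts to counting, for each candidate host, how many key positions differ from human ACE2. A minimal sketch of that count follows. The human residues (Lys31, Tyr41, Lys353) are taken from the text; the helper names and the two toy species are hypothetical placeholders and do not reproduce Table 1 or any real ACE2 sequence.

```python
from typing import Dict, List, Sequence, Tuple

# ACE2 positions named in the text as important for spike binding.
KEY_POSITIONS: Tuple[int, ...] = (31, 41, 353)

# Human residues at those positions, as stated in the text.
HUMAN_ACE2: Dict[int, str] = {31: "K", 41: "Y", 353: "K"}

# HYPOTHETICAL placeholder residues, for illustration only (not real data).
TOY_SPECIES: Dict[str, Dict[int, str]] = {
    "toy_species_A": {31: "K", 41: "Y", 353: "K"},
    "toy_species_B": {31: "T", 41: "Y", 353: "H"},
}


def substituted_positions(
    reference: Dict[int, str],
    other: Dict[int, str],
    positions: Sequence[int] = KEY_POSITIONS,
) -> List[int]:
    """Return the key positions where `other` differs from `reference`.

    A position missing from `other` is treated as a substitution.
    """
    return [pos for pos in positions if other.get(pos) != reference[pos]]


def rank_by_substitutions(
    reference: Dict[int, str],
    species: Dict[str, Dict[int, str]],
) -> List[Tuple[str, int]]:
    """Sort species by the number of substituted key positions, fewest first."""
    counts = {
        name: len(substituted_positions(reference, residues))
        for name, residues in species.items()
    }
    return sorted(counts.items(), key=lambda item: item[1])


for name, n_subs in rank_by_substitutions(HUMAN_ACE2, TOY_SPECIES):
    print(name, n_subs)
```

Fewer substitutions at these positions would, under the reasoning of the paper, point to a receptor more similar to human ACE2 at the binding interface; the count alone does not establish that a species can be infected.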
Severe acute respiratory syndrome-related coronavirus: the species and its viruses-A statement of the Coronavirus Study Group
A novel coronavirus from patients with pneumonia in China
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
Evidence for camel-to-human transmission of MERS coronavirus
Infections: Disease Outbreak News WHO Middle East Respiratory Syndrome Coronavirus (MERS-CoV)-Saudi Arabia
Maximal viral information recovery from sequence data using VirMAP
I-TASSER server: new development for protein structure and function predictions
Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM
Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
Functional assessment of cell entry and receptor usage for lineage B β-coronaviruses, including 2019-nCoV
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica)
Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2) using whole genomic data
Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China
Evidence of recombination in coronaviruses implicating pangolin origins of nCoV-2019
Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission
Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2
Bat-to-human: spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond
Development of a colloidal gold immunochromatographic strip for the rapid detection of soft-shelled turtle systemic septicemia spherical virus
Discovery of a novel single-stranded DNA virus from a sea turtle fibropapilloma by using viral metagenomics
Transcriptome profiling analysis of lung tissue of Chinese soft-shell turtle infected by Trionyx sinensis Hemorrhagic syndrome virus
Soft-shelled turtle iridovirus enters cells via cholesterol-dependent, clathrin-mediated endocytosis as well as macropinocytosis
Identification of a novel nidovirus as a potential cause of large scale mortalities in the endangered Bellinger River snapping turtle (Myuchelys georgesi)
Homologous recombination within the spike glycoprotein of the newly identified coronavirus may boost cross-species transmission from snake to human

The authors declare that there are no conflicts of interest.