key: cord-0983073-f35cqkpf authors: Zhou, Hong; Chen, Xing; Hu, Tao; Li, Juan; Song, Hao; Liu, Yanran; Wang, Peihan; Liu, Di; Yang, Jing; Holmes, Edward C.; Hughes, Alice C.; Bi, Yuhai; Shi, Weifeng title: A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein date: 2020-05-11 journal: Curr Biol DOI: 10.1016/j.cub.2020.05.023 sha: 81f08eb50af92d224572d05886b2e5ca377715ec doc_id: 983073 cord_uid: f35cqkpf Summary The unprecedented pandemic of pneumonia caused by a novel coronavirus, SARS-CoV-2, in China and beyond has had major public health impacts on a global scale [1,2]. Although bats are regarded as the most likely natural hosts for SARS-CoV-2 [3], the origins of the virus remain unclear. Here, we report a novel bat-derived coronavirus, denoted RmYN02, identified from a metagenomics analysis of samples from 227 bats collected from Yunnan Province in China between May and October, 2019. Notably, RmYN02 shares 93.3% nucleotide identity with SARS-CoV-2 at the scale of the complete virus genome and 97.2% identity in the 1ab gene, in which it is the closest relative of SARS-CoV-2 reported to date. In contrast, RmYN02 showed low sequence identity (61.3%) to SARS-CoV-2 in the receptor binding domain (RBD) and might not bind to angiotensin-converting enzyme 2 (ACE2). Critically, and in a similar manner to SARS-CoV-2, RmYN02 was characterized by the insertion of multiple amino acids at the junction site of the S1 and S2 subunits of the spike (S) protein. This provides strong evidence that such insertion events can occur naturally in animal betacoronaviruses. 
Analysis using Simplot [10] also revealed that RmYN02 was more similar to SARS-CoV-2 than RaTG13 was in most genome regions (Figure 1A). Again, in the RBD, the pangolin/MP789/2019 virus shared the highest sequence identity to SARS-CoV-2 (Figure 1B).

Results from homology modelling [2], in vitro assays [3] and the resolved three-dimensional structure of the S protein [11] have revealed that, like SARS-CoV, SARS-CoV-2 could also use ACE2 as a cell receptor.
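The similarity-plot comparison mentioned above can be illustrated with a minimal sliding-window sketch. It assumes two sequences taken from the same multiple sequence alignment and reports raw percent identity per window. The function name, the 200-nt window and the 20-nt step are assumptions chosen for illustration and are not settings reported in this study; Simplot also offers distance models that this sketch does not attempt to reproduce.

```python
def window_identity(query, reference, window=200, step=20):
    """Return (window_start, percent_identity) pairs for two aligned sequences.

    Hypothetical helper: window and step sizes are illustrative defaults,
    not values taken from the paper.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    results = []
    for start in range(0, len(query) - window + 1, step):
        matches = 0
        compared = 0
        pairs = zip(query[start:start + window], reference[start:start + window])
        for a, b in pairs:
            if a == "-" or b == "-":
                continue  # ignore alignment columns that contain a gap
            compared += 1
            matches += a == b
        identity = 100.0 * matches / compared if compared else float("nan")
        results.append((start, identity))
    return results


# Toy aligned sequences (not real viral data).
toy_query = "ACGT" * 5
toy_reference = "ACGT" * 3 + "T" * 8
profile = window_identity(toy_query, toy_reference, window=4, step=4)
```

Plotting the second element of each pair against the first for several reference viruses gives the kind of curve in which a drop in one region, such as the RBD, stands out against an otherwise high-identity genome.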
We analyzed the RBD of RmYN02, RaTG13, and the two pangolin beta-CoVs using homology modelling [...].

The S protein of CoVs is functionally cleaved into two subunits, S1 and S2 [15], in a similar manner to the haemagglutinin (HA) protein of avian influenza viruses (AIVs). The insertion of polybasic amino acids at the cleavage site in the HAs of some AIV subtypes is associated with enhanced pathogenicity [16, 17]. Notably, SARS-CoV-2 is characterized by a four-amino-acid insertion at the junction of S1 and S2 that is not observed in other lineage B beta-CoVs [18]. This insertion, which represents a polybasic (furin) cleavage site, is unique to SARS-CoV-2 and is present in all SARS-CoV-2 genomes sequenced so far. The insertion of three residues, PAA, at the junction of S1 and S2 in RmYN02 (Figure 2H, and Figure S2A for results from Sanger sequencing) is therefore of major importance. Although the inserted residues (and hence nucleotides) are not the same as those in SARS-CoV-2, and hence are indicative of an independent insertion event, the fact that they are present in wildlife (bats) strongly suggests that they are of natural origin and have likely been acquired by recombination. As such, these data are strongly suggestive of a natural zoonotic origin of SARS-CoV-2.

We next performed a phylogenetic analysis of RmYN02, RaTG13, SARS-CoV-2 and the pangolin beta-CoVs. Consistent with previous research [6], the pangolin beta-CoVs formed two well-supported sub-lineages, representing animals seized by anti-smuggling authorities in Guangxi (Pangolin-CoV/GX) and Guangdong (Pangolin-CoV/GD) provinces (Figure 3A and Figure S4A). However, whether pangolins are natural reservoirs for these viruses, or whether they acquired these viruses independently from bats or other wildlife, requires further sampling [6].
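The S1/S2 insertions discussed above can be located in a protein alignment with a short sketch. It assumes a pairwise alignment in which the sequence lacking the insertion carries gap characters, and it tests for the minimal R-X-X-R furin pattern with a regular expression. The function names and the flanking residues in the toy fragments are hypothetical; only the inserted residues PAA (from the text) and PRRA (the insertion generally reported for SARS-CoV-2) are meant literally, and a pattern match says nothing about whether cleavage actually occurs.

```python
import re

# Minimal furin recognition pattern R-X-X-R (an assumption for illustration).
FURIN_MINIMAL = re.compile(r"R..R")


def find_insertions(aligned_ref, aligned_query):
    """Return (residues_of_ref_before_insert, inserted_residues) tuples."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("sequences must come from the same alignment")
    insertions = []
    ref_pos = 0      # number of reference residues seen so far
    current = ""     # residues of the insertion currently being read
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-" and q != "-":
            current += q
            continue
        if current:
            insertions.append((ref_pos, current))
            current = ""
        if r != "-":
            ref_pos += 1
    if current:
        insertions.append((ref_pos, current))
    return insertions


def has_minimal_furin_pattern(aligned_seq):
    """True if the ungapped sequence contains an R-X-X-R pattern."""
    return FURIN_MINIMAL.search(aligned_seq.replace("-", "")) is not None


# Toy fragments: flanking residues are illustrative, not the real alignment.
no_insert = "QTNS----RSVA"
four_insert = "QTNSPRRARSVA"
three_insert = "QTNSPAA-RSVA"

four_hits = find_insertions(no_insert, four_insert)
three_hits = find_insertions(no_insert, three_insert)
```

In this toy case both queries carry an insertion after the same reference residue, but only the four-residue insert creates an R-X-X-R pattern, which mirrors the point that the two insertions are distinct in sequence.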
More notable was that RmYN02 was the closest relative of SARS-CoV-2 in most of the virus genome, although these two viruses were still separated from each other by a relatively long branch length (Figure 3A and Figure S4A). In the spike gene tree, SARS-CoV-2 clustered with RaTG13 and was distant from RmYN02, suggesting that the latter virus has experienced recombination in this gene (Figure 3B and Figure S4B). In the phylogeny of the RBD, SARS-CoV-2 was most closely related to pangolin-CoV/GD, with the bat viruses falling in more divergent positions, again indicative of recombination (Figure 3C and Figure S4C). Finally, phylogenetic analysis of the complete RNA-dependent RNA polymerase (RdRp) gene, which is often used in the phylogenetic analysis of RNA viruses, revealed that RmYN02, RaTG13 and SARS-CoV-2 formed a well-supported sub-cluster distinct from the pangolin viruses (Figure 3D and Figure S4D).

We confirmed the bat host of RmYN02, Rhinolophus malayanus, by analyzing the sequence of the cytochrome b (Cytb) gene from the next generation sequencing data; this revealed 100% sequence identity to a Rhinolophus malayanus isolate (GenBank accession MK900703). Both Rhinolophus malayanus and Rhinolophus affinis are widely distributed in southwest China and southeast Asia. Generally, they do not migrate over long distances and are highly gregarious, such that they are likely to live in the same caves, which might facilitate the exchange of viruses between them and the occurrence of recombination. Notably, RaTG13 was identified from anal swabs and RmYN02 was identified from feces; fecal shedding is a simple but feasible way for bats to spread the virus to other animals, especially species that can utilize cave environments.
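The recombination argument in the phylogenetic results above rests on the closest relative of SARS-CoV-2 changing from one genome region to another. The following sketch records the closest relative stated in the text for three regions and flags those that disagree with the genome-wide signal. The dictionary keys and the function name are hypothetical labels; the values restate the text and are not recomputed from sequence data.

```python
# Closest relative of SARS-CoV-2 per region, as stated in the text above.
closest_relative = {
    "most of genome": "RmYN02",
    "spike gene": "RaTG13",
    "RBD": "Pangolin-CoV/GD",
}


def incongruent_regions(closest, backbone):
    """Return regions whose closest relative differs from the backbone region."""
    expected = closest[backbone]
    return {
        region: virus
        for region, virus in closest.items()
        if virus != expected
    }


flagged = incongruent_regions(closest_relative, backbone="most of genome")
```

Regions flagged in this way are candidates for recombination, which in practice would be assessed from the underlying trees and their support values rather than from a lookup table.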
Our study reaffirms that bats, particularly those of the genus Rhinolophus, are important natural reservoirs for coronaviruses and currently harbor the closest relatives of SARS-CoV-2, although this picture may change with increased wildlife sampling. In this context it is striking that the RmYN02 virus identified here in Rhinolophus malayanus is the closest relative of SARS-CoV-2 in the long 1ab replicase gene, although the virus itself has a complex history of recombination. Finally, the observation that RmYN02 contains an insertion at the S1/S2 cleavage site in the spike protein clearly indicates that events of this kind are a natural and expected component of coronavirus evolution.

[Figure legend fragments: the S1/S2 cleavage site of RmYN02 and representative beta-CoVs; Figure S4.]

[...] (n=219), lung (n=2) and liver (n=3), and feces (n=78). All but three bats were sampled alive and subsequently released. All samples were first stored in RNAlater and then kept at -80°C until use.

Next generation sequencing

Based on the bat species, primarily identified according to morphological criteria and confirmed through DNA barcoding, the 224 tissue and 78 fecal samples were merged into 38 and 18 pools, respectively, with each pool containing 1 to 11 samples of the same type (Table S1). Samples were transferred into the RNAiso Plus reagent (TAKARA) for homogenization with steel beads. Total RNA was extracted and subsequently purified using the EZNA Total RNA Kit (OMEGA). Libraries were constructed using the NEB Next Ultra RNA Library Prep Kit (NEB).
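The pooling step described above can be sketched as grouping samples by species and sample type and then splitting each group into pools of at most 11 samples. This is only one plausible reading of the description: the actual pool composition is given in Table S1, which is not reproduced here, and the function name, record layout and sample identifiers below are hypothetical.

```python
from collections import defaultdict


def make_pools(samples, max_pool_size=11):
    """Group (sample_id, species, sample_type) records into pools.

    Each pool holds samples of one species and one sample type, and at most
    max_pool_size samples (11 is the largest pool size stated in the text).
    """
    groups = defaultdict(list)
    for sample_id, species, sample_type in samples:
        groups[(species, sample_type)].append(sample_id)

    pools = []
    for (species, sample_type), ids in sorted(groups.items()):
        for i in range(0, len(ids), max_pool_size):
            pools.append({
                "species": species,
                "type": sample_type,
                "samples": ids[i:i + max_pool_size],
            })
    return pools


# Toy records (identifiers are made up for illustration).
toy_samples = [(f"F{i}", "R. malayanus", "feces") for i in range(13)]
toy_samples += [("T1", "R. affinis", "tissue"), ("T2", "R. affinis", "tissue")]
toy_pools = make_pools(toy_samples)
pool_sizes = [len(pool["samples"]) for pool in toy_pools]
```

With the counts reported in the text, 224 tissue samples in 38 pools and 78 fecal samples in 18 pools correspond to averages of roughly 5.9 and 4.3 samples per pool, respectively.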
rRNA of feces or tissues was removed using the [...]

To validate the two novel CoV genomes, the clean reads of pool 39 were then de novo assembled using Trinity [24] with default settings. The assembled contigs were compared with the consensus sequences obtained in the previous step and merged using Geneious (version 11.1.5) (https://www.geneious.com). We found that contigs with high and low abundance corresponded to RmYN02 and RmYN01, respectively, with the abundance of RmYN02 5-10 times greater than that of RmYN01. The gaps between contigs of RmYN02 were filled by re-mapping the reads to the ends of the contigs, which produced the full-length genome sequence of RmYN02. However, due to the limited number of reads available, only a partial genome sequence of RmYN01 was obtained (23395 bp [...]

[...] open reading frames (ORFs) of the verified genome sequences were predicted using Geneious (version 11.1.5). Pairwise sequence identities were also calculated using Geneious. Potential recombination events were investigated using Simplot (version 3.5.1) [10].

The three-dimensional structures of the RBD from RmYN02, RaTG13, pangolin/GD and pangolin/GX were modeled with the Swiss-Model program [20], using the SARS-CoV RBD structure (PDB: 2DD8) [12] as a template.

Multiple sequence alignment of SARS-CoV-2 and the reference sequences was performed using Mafft [26]. Phylogenetic analyses of the complete genome and major encoding regions were performed using RAxML [21] with 1000 bootstrap replicates, employing the GTR nucleotide substitution model (Figure 3). Phylogenetic analysis was also performed using MrBayes [27], employing the GTR nucleotide substitution model (Figure S4). Ten million steps were run, with trees and parameters sampled every 1,000 steps.
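The pairwise sequence identities mentioned in the methods above were calculated in Geneious. As a rough illustration of the underlying calculation, the sketch below computes percent identity for named regions of two aligned sequences, skipping columns that contain a gap. The function name, the region names and their coordinates are hypothetical, and gap handling in Geneious may differ from the choice made here.

```python
def region_identity(aligned_a, aligned_b, regions):
    """Return {region_name: percent_identity} for half-open column ranges.

    regions maps a name to (start, end) alignment columns; columns with a
    gap in either sequence are ignored (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    identities = {}
    for name, (start, end) in regions.items():
        matches = 0
        compared = 0
        for a, b in zip(aligned_a[start:end], aligned_b[start:end]):
            if a == "-" or b == "-":
                continue
            compared += 1
            matches += a == b
        if compared:
            identities[name] = round(100.0 * matches / compared, 1)
        else:
            identities[name] = float("nan")
    return identities


# Toy sequences and coordinates (not real genome annotations).
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"
toy_regions = {"full": (0, 10), "region_1": (0, 4), "region_2": (4, 10)}
toy_identities = region_identity(seq_a, seq_b, toy_regions)
```

Reporting identity per region rather than only for the full genome is what allows a virus to be the closest relative overall while being highly divergent in a short region such as the RBD.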
Sanger sequencing

Based on the spike gene sequence of RmYN02, a TaqMan-based qPCR was performed to test the feces of pool 39 (Table S2) [...] (Figure S1). To further confirm the S1/S2 cleavage site and the 1b (RdRp) gene sequence of RmYN02, five pairs of primers, F1/R1-F4/R4 and F6/R6, were designed for Sanger sequencing (Table S2).

Metagenomic analysis identified a novel coronavirus, RmYN02, from R. malayanus.
RmYN02 was the closest relative of SARS-CoV-2 in most of the virus genome.
Two loop deletions in RBD may reduce the binding of RmYN02 with ACE2.
RmYN02 contains an insertion at the S1/S2 cleavage site in the spike protein.

Table 1. Sequence identity for SARS-CoV-2 compared with RmYN02 and representative beta-CoV genomes. [Table body not recovered; first column header: Complete genome.]

References
A Novel Coronavirus from Patients with Pneumonia in China
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Epidemiology, Genetic Recombination, and Pathogenesis of Coronaviruses
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Studies of SARS Coronavirus
Evidence of recombination in coronaviruses implicating pangolin origins of nCoV-2019. bioRxiv
Receptor-binding domain of SARS-CoV spike protein induces highly potent neutralizing antibodies: implication for developing subunit vaccine
Emergence of a highly pathogenic avian influenza virus from a low-pathogenic progenitor
Human infections with recently-emerging highly pathogenic H7N9 avian influenza virus in China
The proximal origin of SARS-CoV-2
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
fastp: an ultra-fast all-in-one FASTQ preprocessor

[Figure tip labels: BtRl-BetaCoV/SC2018|China|Rhinolophus sp.; BtKY72|Kenya|Rhinolophus sp.; HKU3-7|Guangdong|R. sinicus; BtRs-BetaCoV/YN2013|Yunnan|R. sinicus]