key: cord-349015-5oisrm5s authors: Liu, Zhe; Zheng, Huanying; Yuan, Runyu; Li, Mingyue; Lin, Huifang; Peng, Jingju; Xiong, Qianlin; Sun, Jiufeng; Li, Baisheng; Wu, Jie; Hulswit, Ruben J.G.; Bowden, Thomas A.; Rambaut, Andrew; Loman, Nick; Pybus, Oliver G; Ke, Changwen; Lu, Jing title: Identification of a common deletion in the spike protein of SARS-CoV-2 date: 2020-04-02 journal: bioRxiv DOI: 10.1101/2020.03.31.015941 sha: doc_id: 349015 cord_uid: 5oisrm5s
Two notable features have been identified in the SARS-CoV-2 genome: (1) the receptor binding domain (RBD) of SARS-CoV-2; (2) a unique insertion of twelve nucleotides, or four amino acids (PRRA), at the S1 and S2 boundary. For the first feature, the similar RBD identified in a SARS-like virus from pangolin suggests that the RBD in SARS-CoV-2 may already have existed in animal host(s) before it was transmitted to humans. The remaining puzzle is the history and function of the insertion at the S1/S2 boundary, which is uniquely identified in SARS-CoV-2. In this study, we identified two variants from the first Guangdong SARS-CoV-2 cell strain, with deletion mutations on the polybasic cleavage site (PRRAR) and its flanking sites. More extensive screening indicates that the deletion at the flanking sites of PRRAR could be detected in 3 of 68 clinical samples and half of 22 in vitro isolated viral strains. These data indicate that (1) the deletion of QTQTN, at the flank of the polybasic cleavage site, likely benefits SARS-CoV-2 replication or infection in vitro but is under strong purifying selection in vivo, since it is rarely identified in clinical samples; (2) there could be a very efficient mechanism for deleting this region from the viral genome, as variants losing 23585-23599 are commonly detected after two rounds of cell passage. The mechanistic explanation for the in vitro adaptation and in vivo purification processes (or the reverse) that led to such genomic changes in SARS-CoV-2 requires further work.
Nonetheless, this study has provided valuable clues to aid further investigation of spike protein function and virus evolution. The deletion mutation identified during in vitro isolation should also be noted for current vaccine development.
SARS-CoV-2 is a novel coronavirus that was first identified at the end of December 2019 [1] and has since caused a global pandemic of COVID-19 [2]. Unlike the other two zoonotic coronaviruses, SARS-CoV-1 and MERS [3], the genetic evolutionary history of SARS-CoV-2 is mostly unknown. A recent analysis based on genetic information and protein structure highlights two notable features in the SARS-CoV-2 genome: (1) the receptor binding domain (RBD) of SARS-CoV-2 is distinct from that of the most closely related bat-origin SARS-related virus (RaTG13) and is demonstrated to have a high affinity for the human ACE2 receptor; (2) a unique insertion of 12 nucleotides (or four amino acids, PRRA) at the S1 and S2 boundary results in a polybasic (furin) cleavage site and three predicted O-linked glycans around the cleavage site [4]. With respect to the first feature, the similar RBD identified in a SARS-like virus from a pangolin suggests that the RBD in SARS-CoV-2 may already have existed in its potential animal host(s) before it was transmitted to humans [5]. The remaining question is the history and function of the insertion at the S1/S2 boundary, which is uniquely identified in SARS-CoV-2. The insertion of proline is predicted to result in the addition of three O-linked glycans. The functional consequence of the polybasic cleavage site and O-linked glycans in SARS-CoV-2 is unknown. By sequencing the whole genome of SARS-CoV-2, we identified two variants with deletion mutations on the polybasic cleavage site (PRRAR) and its flanking sites. More extensive screening indicates that the deletion at the flanking sites of PRRAR has been frequently observed in cell-isolated strains and could be verified by multiple sequencing methods.
The first COVID-19 clinical case in Guangdong was reported on 19th January, with illness onset on 1st
To investigate whether the deletions described above are random mutations occasionally identified in a strain or would commonly occur after cell passage, we performed whole genome sequencing on the other 21 SARS-CoV-2 viral strains collected after 2 rounds of cell passage in Vero-E6 or Vero cells (Supplemental Table). The corresponding original samples for these strains were collected between 19th January and 28th February 2020. Multiplex PCR combined with nanopore sequencing was used, following the general protocol described at https://artic.network/ncov-2019. The ARTIC pipeline was applied to trim primers and generate the BAM files, which included all reads mapping to the SARS-CoV-2 reference genome (MN908947.3). Variant sites were called using iVar [7] with depth >=20 as a threshold. With this method, 10 of 21 cell isolate strains carried variants, at differing ratios (>10%), with a deletion at the flank of the polybasic cleavage site (deletion at 23585-23599) (Figure C). One strain carried a variant with a deletion on the polybasic cleavage site (deletion at 23596-23617). To find out whether the deletion at 23585-23599 was restricted to a specific genetic lineage, we next investigated the phylogenetic relationship of these strains and the first 014 strain described above. As shown in Figure D, the strains with a relatively higher ratio of this deletion were dispersed across the phylogenetic tree, suggesting that the deletion mutation was not restricted to a specific genetic lineage of SARS-CoV-2 viruses. To identify whether these deletions also occurred in original clinical samples, we screened the high-throughput sequencing data from 149 clinical samples, which were collected between 6th February and 20th March in Guangdong, China. These samples were sequenced using multiplex PCR combined with nanopore sequencing.
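The screening criteria described above (a depth of at least 20 at the site, and a variant ratio above 10%) can be illustrated with a minimal Python sketch. It is not the iVar workflow itself and does not reproduce iVar's output format: the `SiteCount` record, the sample labels and the read counts are hypothetical and invented purely for illustration. The helper `spike_residues` additionally assumes that the spike coding sequence starts at position 21563 of MN908947.3, a coordinate taken from the reference annotation and not from the text; under that assumption, the interval 23585-23599 maps to five consecutive spike residues, consistent with the five-residue QTQTN deletion.

```python
from dataclasses import dataclass

# Thresholds as stated in the text: depth >= 20 and variant ratio > 10%.
MIN_DEPTH = 20
MIN_RATIO = 0.10

# Assumed 1-based start of the spike (S) coding sequence in MN908947.3.
# This value comes from the reference annotation, not from the text.
S_GENE_START = 21563


@dataclass
class SiteCount:
    sample: str          # hypothetical sample label
    depth: int           # total reads covering the site
    deletion_reads: int  # reads supporting the deletion


def deletion_ratio(site):
    """Return the deletion ratio, or None when depth is below MIN_DEPTH."""
    if site.depth < MIN_DEPTH:
        return None
    return site.deletion_reads / site.depth


def screen(sites):
    """Return {sample: ratio} for samples passing both thresholds."""
    kept = {}
    for site in sites:
        ratio = deletion_ratio(site)
        if ratio is not None and ratio > MIN_RATIO:
            kept[site.sample] = ratio
    return kept


def spike_residues(start, end):
    """Map a 1-based, inclusive genomic interval to spike residue numbers.

    Only codon-aligned, in-frame intervals are accepted.
    """
    offset = start - S_GENE_START
    length = end - start + 1
    if offset % 3 != 0 or length % 3 != 0:
        raise ValueError("interval is not codon-aligned")
    first = offset // 3 + 1
    return first, first + length // 3 - 1


if __name__ == "__main__":
    # Invented read counts, for illustration only.
    example = [
        SiteCount("isolate_A", depth=400, deletion_reads=180),
        SiteCount("isolate_B", depth=15, deletion_reads=9),
        SiteCount("isolate_C", depth=250, deletion_reads=5),
    ]
    print(screen(example))
    print(spike_residues(23585, 23599))
```

In this sketch a sample below the depth threshold is excluded from screening altogether and is not counted as negative, mirroring how only genomes with sufficient depth at the site enter the denominator.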
There were 68 SARS-CoV-2 genomes with an average sequencing depth >=20 at the sites neighboring 23585. As shown in Table 1, variants with the deletion at 23585-23599 were found in 3 (6%) of clinical samples, with ratios ranging from 8.8% to 32.8%, indicating that this deletion may also occur during in vivo infection, even though the rate was extremely low compared with the in vitro results (Figure D). To date, no genome sequences deposited in public datasets carry this deletion. However, this does not mean that the variant is absent from currently released sequences, since most variants with a lower ratio would be discarded when generating the final consensus sequences.
1. A new coronavirus associated with human respiratory disease in China
2. Coronavirus disease (COVID-2019) situation reports
3. Origin and evolution of pathogenic coronaviruses
4. The proximal origin of SARS-CoV-2
5. Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins
6. Evidence and characteristics of human-to-human transmission of SARS-CoV-2
7. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
8. The Chinese SARS Molecular Epidemiology Consortium. Molecular Evolution of the SARS Coronavirus During the Course of the SARS Epidemic in China
9. Attenuation of replication by a 29 nucleotide deletion in SARS-coronavirus acquired during the early stages of human-to-human transmission
10. A novel bat coronavirus reveals natural insertions at the S1/S2 cleavage site of the Spike protein and a possible recombinant origin of HCoV-19
11. Genome Sequence Archive