key: cord-270919-0hldozml authors: Cortey, Martí; Li, Yanli; Díaz, Ivan; Clilverd, Hepzibar; Darwich, Laila; Mateu, Enric title: SARS-CoV-2 amino acid substitutions widely spread in the human population are mainly located in highly conserved segments of the structural proteins date: 2020-05-17 journal: bioRxiv DOI: 10.1101/2020.05.16.099499 sha: doc_id: 270919 cord_uid: 0hldozml

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic offers a unique opportunity to study the introduction and evolution of a pathogen in a completely naïve human population. We identified and analysed the amino acid mutations that gained prominence worldwide in the early months of the pandemic. Eight mutations were identified along the viral genome, mostly located in conserved segments of the structural proteins that show low variability among coronaviruses, which indicates that they might have a functional impact. At the time of writing, these mutations show varied success in the SARS-CoV-2 population, ranging from a change in the spike protein that has become fully predominant, through two mutations in the nucleocapsid protein with frequencies of around 25%, to a mutation in the matrix protein that nearly faded out after reaching a frequency of 20%.

The emergence of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the subsequent pandemic have become a health problem unparalleled in the last century. SARS-CoV-2 is thought to have originated from an animal coronavirus that successfully adapted to humans.
The species of origin of SARS-CoV-2 has not been fully identified, but the virus seems to be related to, although different from, SARS-CoV and other coronaviruses found in bats and other mammal species (Chan et

The SARS-CoV-2 genome is around 30 kb in size, with the typical gene structure known in other betacoronaviruses: starting from the 5′ end, more than two-thirds of the genome comprises orf1ab, encoding polyproteins (nsp1 to nsp15), while the last third consists of genes encoding the major structural proteins, including the spike (S or ORF2), envelope (E or ORF4), membrane (M or ORF5), and nucleocapsid (N or ORF9) proteins. Additionally, SARS-CoV-2 contains at least 6 minor structural proteins, encoded by the ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 genes (Khailany et al. 2020).

The first cases of the novel coronavirus-associated disease (COVID-19) have been traced to the Chinese province of Hubei in early December 2019 (https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/). Although the actual index case is not known, the first sequence of the novel coronavirus was produced within weeks of the emergence of the disease (Zhu et al. 2019). At the time of writing, more than 16,000 sequences have been produced in less than five months since the start of the pandemic. This is a unique opportunity to gain insight into the evolution of a betacoronavirus in a completely naïve human population. In this context, efficiently transmitted viral variants will be less influenced by the selection exerted by the immune response, since most transmissions will occur from individuals who have not yet developed an efficient immune response to naïve recipients.

The aim of the present study was to determine the amino acid substitutions in viral proteins that were widely present in available sequences of SARS-CoV-2, relating them to the known chronology of the pandemic.
Also, the mutations found were assessed to try to understand their potential significance for viral fitness.

week, they were also reported in other continents (Supplementary Table S1). Interestingly, the only substitution that became fully predominant was Asp614Gly in the spike protein (ORF2-S). Gly57His in ORF3a reached a frequency of 50% at the time of writing. It is worth noting that when the sequences were analysed by continent, all mutations had spread worldwide, except 175Met in ORF5-M, which was absent in Africa (Supplementary Table S1).

Fig. S4). These clades corresponded to the L and S types reported by Tang et al. (2020). This mutation was predicted to be neutral.

Finally, the last change was found in the nsp6 protein, Leu37Phe, the significance of which was unclear. This mutation was also predicted to be neutral.

CFSSP: Chou and Fasman Secondary Structure Prediction server
Evolutionary analysis of SARS-CoV-2: how mutation of Non-Structural Protein 6 (NSP6) could affect viral autophagy
Global Spread of SARS-CoV-2 Subtype with Spike Protein Mutation D614G is Shaped by Human Genomic Variations that Regulate Expression of TMPRSS2 and MX1 Genes, bioRxiv
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Predicting the Functional Effect of Amino Acid Substitutions and Indels
The membrane M protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability
SARS-CoV nucleocapsid protein binds to hUbc9, a ubiquitin conjugating enzyme of the sumoylation system
BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
Molecular Evolution of the SARS Coronavirus During the Course of the SARS Epidemic in China
SARS-CoV-2 and ORF3a: Non-Synonymous Mutations Polyproline Regions,
mSystems
Genomic characterization of a novel SARS-CoV-2
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2
Genetic evidence for a structural interaction between the carboxy termini of the membrane and nucleocapsid proteins of mouse hepatitis virus
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
The coronavirus nucleocapsid is a multifunctional protein
It is too soon to attribute ADE to COVID-19, Microbes and Infection
On the origin and continuing evolution of SARS-CoV-2, National Science Review
Is COVID-19 receiving ADE from other coronaviruses?, Microbes & Infection
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Immunodominant SARS Coronavirus Epitopes in Humans Elicited both Enhancing and Neutralizing Effects on Infection in Non-human Primates
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A Novel Coronavirus from Patients with Pneumonia in China