key: cord-0756251-tvbnv5gz authors: Benvenuto, Domenico; Giovanetti, Marta; Ciccozzi, Alessandra; Spoto, Silvia; Angeletti, Silvia; Ciccozzi, Massimo title: The 2019‐new coronavirus epidemic: Evidence for virus evolution date: 2020-02-07 journal: J Med Virol DOI: 10.1002/jmv.25688 sha: 7142ff45dbca092b80f3e5eaf6da6d4f4b42e03a doc_id: 756251 cord_uid: tvbnv5gz
There is worldwide concern about the new coronavirus 2019‐nCoV as a global public health threat. In this article, we provide a preliminary evolutionary and molecular epidemiological analysis of this new virus. A phylogenetic tree was built using the 15 available whole genome sequences of 2019‐nCoV and 12 highly similar whole genome sequences available in GenBank (five from severe acute respiratory syndrome [SARS] coronavirus, two from Middle East respiratory syndrome [MERS] coronavirus, and five from bat SARS‐like coronavirus). Fast unconstrained Bayesian approximation (FUBAR) analysis shows that the nucleocapsid and the spike glycoprotein have some sites under positive selection pressure, whereas homology modeling revealed some molecular and structural differences between the viruses. The phylogenetic tree showed that 2019‐nCoV significantly clustered with a bat SARS‐like coronavirus sequence isolated in 2015, whereas structural analysis revealed mutations in the spike glycoprotein and the nucleocapsid protein. These results suggest that 2019‐nCoV is distinct from the SARS virus and was probably transmitted from bats after mutations conferring the ability to infect humans.
The family Coronaviridae comprises a group of large, single-stranded, positive-sense RNA viruses isolated from several species, and its members were previously known to cause the common cold and diarrheal illnesses in humans. 1 […] as well as for drug and vaccine development.
In this short report, we provide a phylogenetic tree of 2019-nCoV and identify sites under positive or negative selection pressure in distinct regions of the virus. The complete genomes of 15 2019-nCoV sequences were downloaded from GISAID (https://www.gisaid.org/) and GenBank (http://www.ncbi.nlm.nih.gov/genbank/). A dataset was built using five highly similar sequences for SARS, two sequences for Middle East respiratory syndrome (MERS), and five highly similar sequences for bat SARS-like coronavirus. Percentage similarity was determined using the Basic Local Alignment Search Tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and any duplicate sequences were excluded from the datasets. The resulting dataset of 27 sequences was aligned using the MAFFT online multiple sequence alignment service 9 and manually edited using BioEdit v7.0.5. 10 Maximum likelihood (ML) methods were employed because they allow different phylogenetic hypotheses to be tested: they calculate the probability that a given model of evolution generated the observed data, and nested models can be compared with the likelihood ratio test. The best-fitting nucleotide substitution model was chosen with jModelTest. 11 The ML tree was reconstructed in MEGA X 12 using the general time-reversible model with gamma-distributed rate variation and a proportion of invariant sites (GTR+G+I). The Datamonkey adaptive evolution server (http://www.datamonkey.org/) was used to detect possible sites under positive or negative selection, using the fast unconstrained Bayesian approximation (FUBAR) test. 13 This test allowed us to infer site-specific pervasive selection and episodic diversifying selection across the region of interest, and to identify episodic selection at individual sites. 14 Positive or negative selection was considered statistically significant at P < .05.
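The similarity screening and duplicate removal described above can be illustrated with a small sketch. It assumes the sequences are already aligned and stored in a dictionary keyed by hypothetical record names. Percent identity is computed only over columns where neither sequence has a gap, which is one convention among several (BLAST reports identity over its own local alignments), and the first copy of any sequence that is identical once gaps are removed is kept.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns.

    Gap handling is an assumption: any column with '-' in either sequence
    is skipped, which is one of several conventions in use.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def drop_duplicates(records: Dict[str, str]) -> Dict[str, str]:
    """Keep the first record of every distinct sequence (gaps and case ignored)."""
    seen = set()
    kept: Dict[str, str] = {}
    for name, seq in records.items():
        key = seq.replace("-", "").upper()
        if key not in seen:
            seen.add(key)
            kept[name] = seq
    return kept


if __name__ == "__main__":
    # Hypothetical toy records; real input would be the aligned genomes.
    records = {
        "seqA": "ATGC-TTA",
        "seqB": "ATGCATTA",
        "seqC": "ATGCATTA",  # exact duplicate of seqB
        "seqD": "ATGGATTA",
    }
    unique = drop_duplicates(records)
    print(sorted(unique))  # ['seqA', 'seqB', 'seqD']
    print(percent_identity(unique["seqB"], unique["seqD"]))  # 87.5
```

With whole genomes the same logic applies unchanged; only the input size differs.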
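The likelihood ratio test used to compare nested substitution models can be sketched as follows. The statistic is twice the difference in log-likelihood between the richer and the simpler model, referred to a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The log-likelihood values in the usage example are hypothetical, not results from this study, and the chi-square tail is computed with closed forms for integer degrees of freedom so that no external library is assumed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term = math.sqrt(x)  # r = 1 term
    acc = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, extra_params: int):
    """LRT for nested models: the alternative adds `extra_params` free parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, extra_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, e.g. GTR+G (null) versus GTR+G+I (alternative).
    stat, p = likelihood_ratio_test(lnl_null=-10234.5, lnl_alt=-10230.1, extra_params=1)
    print(f"LRT = {stat:.2f}, P = {p:.4f}")
```

A small P value favors the richer model; jModelTest can also rank non-nested models with information criteria, which this sketch does not cover.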
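The GTR+G+I model named above can be made concrete by computing its transition probability matrix P(t). This sketch assumes hypothetical exchangeabilities, base frequencies, and discrete-gamma category rates; in practice the category rates come from a discretized gamma distribution whose shape parameter is estimated from the data, and none of the values below are the ones estimated in this study. Invariant sites are handled by giving a fraction `p_inv` of sites zero rate and rescaling the remaining sites so that the mean rate stays 1.

```python
import numpy as np

# Order of the six GTR exchangeabilities: AC, AG, AT, CG, CT, GT.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def gtr_rate_matrix(exch, freqs):
    """Normalized GTR rate matrix Q for bases in the order A, C, G, T."""
    pi = np.asarray(freqs, dtype=float)
    s = np.zeros((4, 4))
    for (i, j), r in zip(PAIRS, exch):
        s[i, j] = s[j, i] = r              # symmetric exchangeabilities
    q = s * pi[np.newaxis, :]              # q_ij = s_ij * pi_j for i != j
    np.fill_diagonal(q, -q.sum(axis=1))    # each row sums to zero
    q /= -np.dot(pi, np.diag(q))           # one expected substitution per unit time
    return q, pi


def gtr_transition(q, pi, t):
    """P(t) = expm(Q t) via the symmetrized eigendecomposition of a reversible Q."""
    root = np.sqrt(pi)
    sym = root[:, None] * q / root[None, :]     # Pi^(1/2) Q Pi^(-1/2) is symmetric
    vals, vecs = np.linalg.eigh(sym)
    core = vecs @ np.diag(np.exp(vals * t)) @ vecs.T
    return core / root[:, None] * root[None, :]  # Pi^(-1/2) core Pi^(1/2)


def gtr_gamma_inv_transition(q, pi, t, p_inv, category_rates):
    """GTR+G+I transition matrix averaged over equally weighted rate categories.

    category_rates: hypothetical discrete-gamma category rates with mean 1.
    """
    rates = np.asarray(category_rates, dtype=float)
    scale = 1.0 / (1.0 - p_inv)            # variable sites evolve faster
    variable = sum(gtr_transition(q, pi, r * t * scale) for r in rates) / len(rates)
    return p_inv * np.eye(4) + (1.0 - p_inv) * variable


if __name__ == "__main__":
    q, pi = gtr_rate_matrix([1.0, 4.0, 0.8, 1.2, 4.5, 1.0], [0.30, 0.18, 0.20, 0.32])
    p = gtr_gamma_inv_transition(q, pi, t=0.1, p_inv=0.3,
                                 category_rates=[0.14, 0.48, 1.02, 2.36])
    print(np.round(p, 4))
```

The symmetrized eigendecomposition is used because a reversible Q is similar to a symmetric matrix, which keeps `eigh` numerically stable and avoids depending on SciPy's matrix exponential.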
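FUBAR evaluates the likelihood of each codon site on a fixed grid of synonymous (alpha) and non-synonymous (beta) rates and reports, for every site, the posterior probability that beta > alpha (positive selection) or beta < alpha (negative selection). The sketch below shows only that final empirical Bayes step. It assumes the per-site grid likelihoods are already available, and it estimates the grid weights with a simple expectation-maximization loop, a plainly swapped-in stand-in for the Dirichlet-prior MCMC that FUBAR actually uses. The grid and the site likelihoods in the example are synthetic toy values.

```python
import numpy as np


def grid_posteriors(site_lik, weights):
    """Posterior probability of each grid point at each site (rows sum to 1)."""
    joint = site_lik * weights             # broadcast weights across sites
    return joint / joint.sum(axis=1, keepdims=True)


def fit_grid_weights(site_lik, n_iter=500):
    """EM estimate of grid weights; a stand-in for FUBAR's Dirichlet-prior MCMC."""
    n_grid = site_lik.shape[1]
    weights = np.full(n_grid, 1.0 / n_grid)
    for _ in range(n_iter):
        weights = grid_posteriors(site_lik, weights).mean(axis=0)
    return weights


def selection_probabilities(site_lik, alpha, beta, weights):
    """Per-site posterior P(beta > alpha) and P(beta < alpha)."""
    post = grid_posteriors(site_lik, weights)
    return post[:, beta > alpha].sum(axis=1), post[:, beta < alpha].sum(axis=1)


if __name__ == "__main__":
    # Toy 3 x 3 grid of (alpha, beta) values; FUBAR's real grid is larger.
    values = np.array([0.2, 1.0, 3.0])
    a_grid, b_grid = np.meshgrid(values, values, indexing="ij")
    alpha, beta = a_grid.ravel(), b_grid.ravel()
    # Synthetic site likelihoods peaked at a chosen log(beta/alpha) per site;
    # real ones come from a codon model evaluated on the alignment and tree.
    true_log_ratio = np.array([2.0, 0.0, -2.0])
    site_lik = np.exp(-(np.log(beta / alpha)[None, :] - true_log_ratio[:, None]) ** 2)
    w = fit_grid_weights(site_lik)
    p_pos, p_neg = selection_probabilities(site_lik, alpha, beta, w)
    print(np.round(p_pos, 3), np.round(p_neg, 3))
```

In the toy example the first site comes out as likely under positive selection, the third as likely under negative selection, and the middle site as neutral.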
Homology models were built using the SWISS-MODEL web server. 15 Structural templates were searched for and validated using the software available within the SWISS-MODEL environment and HHpred. 16 Homology models were validated using the QMEAN tool. 17 Three-dimensional structures were analyzed and displayed using PyMOL. 18 The ML phylogenetic tree built from whole genome sequences is shown in Figure 1.
Figure 3. Cartoon model of the structural superposition between the 2019-nCoV homology model (blue) and the spike glycoprotein of SARS coronavirus (PDB code 6acc.1, orange). SARS, severe acute respiratory syndrome.
References
Identification of a novel coronavirus associated with severe acute respiratory syndrome.
Emerging coronaviruses: genome structure, replication, and pathogenesis.
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster.
Clinical features of patients infected with 2019 novel coronavirus in Wuhan.
A novel coronavirus from patients with pneumonia in China.
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - The latest 2019 novel coronavirus outbreak in Wuhan, China.
Outbreak of pneumonia of unknown etiology in Wuhan, China: the mystery and the miracle.
Homologous recombination within the spike glycoprotein of the newly identified coronavirus may boost cross-species transmission from snake to human.
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization.
BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT.
jModelTest 2: more models, new heuristics and parallel computing.
Molecular Evolutionary Genetics Analysis across computing platforms.
FUBAR: a fast, unconstrained Bayesian approximation for inferring selection.
Detecting individual sites subject to episodic diversifying selection.
SWISS-MODEL: homology modelling of protein structures and complexes.
A completely reimplemented MPI Bioinformatics Toolkit with a new HHpred server at its core.
QMEAN server for protein model quality estimation.
The PyMOL Molecular Graphics System, Version 1.
Codon usage pattern and prediction of gene expression level in Bungarus species.
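A structural superposition such as the one between the 2019-nCoV homology model and the SARS coronavirus spike glycoprotein (PDB code 6acc.1) rests on finding the rigid rotation and translation that best overlays two sets of paired atoms; the Kabsch algorithm solves this with a singular value decomposition. PyMOL's own `align` and `super` commands add an alignment step and iterative outlier rejection, so this minimal sketch is not their implementation. It assumes the input is two equally long, already-paired arrays of C-alpha coordinates (for example, hypothetically extracted from the model and the template) and returns the root-mean-square deviation (RMSD) after superposition.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between paired N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                           # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would mean a reflection
    fix = np.diag([1.0, 1.0, d])
    rot = vt.T @ fix @ u.T                    # proper rotation mapping p_c onto q_c
    diff = (rot @ p_c.T).T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    # Hypothetical coordinates: a random point cloud and a rotated, shifted copy.
    rng = np.random.default_rng(0)
    model = rng.normal(size=(10, 3))
    theta = np.pi / 6
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    template = model @ rz.T + np.array([1.0, 2.0, 3.0])
    print(kabsch_rmsd(model, template))       # close to zero
```

The determinant correction matters for protein structures: without it the fit could mirror one chain onto the other, giving an artificially low RMSD for a structure of the wrong handedness.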