key: cord-0053017-zetqonae authors: Kumar, Amit; Saxena, Ajit Kumar; Lee, Gwo Giun (Chris); Kashyap, Amita; Jyothsna, G. title: Genomics and Evolution of Novel Corona Virus 2019 date: 2020-09-27 journal: Novel Coronavirus 2019 DOI: 10.1007/978-981-15-7918-9_2 sha: 5fb08ae9dc67fab522831ead44bcdd4a4be43672 doc_id: 53017 cord_uid: zetqonae

The novel coronavirus has been the dominant concern of the year, disrupting the entire world with its new disease, COVID. Because the virus is entirely new, diagnosing and treating patients has been a challenge. It is therefore necessary to understand the genomics of the virus in detail and thereby predict its evolutionary relationship to other known organisms. This can provide insight for the development of diagnostic and treatment methodologies. In view of the importance of understanding the genetic makeup of NCoV, the current work was undertaken. The chapter deals in detail with the genomic sequence, important regions, and conservation and variation in the genetic makeup of the novel coronavirus.

Genome [2]

From the NCBI data it can be inferred that the major proteins involved in virus structure and function are the ORF products and the structural proteins M, S, E and N. The longest gene region in the viral genome codes for ORF1ab, followed by the gene coding for the S protein. Furthermore, certain important characteristics of the genomic region are identified as follows:

• The complete genome is from 1 to 29903 bp.

The genomic sequence of the Novel Corona Virus (NCoV) was retrieved from NCBI and subjected to a BLAST run to detect organisms sharing similarity. Data mining related to NCoV showed that the organism belongs to the SARS-related coronaviruses, which are known to cause severe acute respiratory syndrome (Fig. 2.1).

Inference: The above BLAST analysis of the complete NCoV genome reveals its association with the other SARS coronavirus 2 isolates; it shares 99% to 100% identity with most of them.
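The identity figures quoted from BLAST can be read as the share of aligned columns in which two sequences carry the same base. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names `percent_identity` and `variable_positions` are hypothetical, the two 20-base sequences are invented toy data rather than real isolate sequences, and the inputs are assumed to be already aligned to equal length, with `-` marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Assumes the two strings are already aligned (equal length, '-' for gaps).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def variable_positions(aligned_a: str, aligned_b: str) -> list[int]:
    """1-based alignment columns at which the two sequences differ."""
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


# Toy data (illustrative only): one substitution at column 18.
reference = "ATGTTTGTTTTTCTTGTTTT"
isolate = "ATGTTTGTTTTTCTTGTCTT"

print(f"{percent_identity(reference, isolate):.1f}%")  # 95.0%
print(variable_positions(reference, isolate))          # [18]
```

On real data the same two numbers would normally be read from the BLAST report itself; the sketch only makes explicit what a statement such as "99% to 100% identity" measures, and that any value below 100% implies at least one variable position.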
Furthermore, to understand the conservation pattern among the functional protein-coding genes of the NCoV genome, their nucleotide sequences were retrieved from NCBI and a similarity search was performed using BLAST. Among these genes, the one covering the greatest length of the genome is the ORF1ab polyprotein gene (Gene region: Fig. 2.2).

Inference: From the above BLAST analysis of the ORF1ab polyprotein gene sequence it can be inferred that all the isolates belonging to the SARS coronavirus 2 family share an identical sequence for this gene. ORF1ab is thus evolutionarily conserved and is unlikely to be the reason for the variation among the isolates.

A similar comparison was performed for the genes coding for surface glycoprotein S, nucleocapsid protein N, ORF3a protein, membrane glycoprotein M, envelope protein E, ORF6, ORF7a, ORF8 and ORF10. All of these sequences were found to be conserved, with no variable regions within the SARS coronavirus 2 family.

The above conservation study of SARS coronavirus 2 reveals that all 10 of these important genes share 99 to 100% sequence identity with the other SARS coronavirus 2 family isolates. Thus, none of these regions is variable, and they are not the reason for the evolution of any novel strain.

Database resources of the National Center for Biotechnology Information
A new coronavirus associated with human respiratory disease in China