key: cord-324251-wgtatr8v authors: Joshi, Madhvi; Puvar, Apurvasinh; Kumar, Dinesh; Ansari, Afzal; Pandya, Maharshi; Raval, Janvi; Patel, Zarna; Trivedi, Pinal; Gandhi, Monika; Pandya, Labdhi; Patel, Komal; Savaliya, Nitin; Bagatharia, Snehal; Kumar, Sachin; Joshi, Chaitanya title: Genomic variations in SARS-CoV-2 genomes from Gujarat: Underlying role of variants in disease epidemiology date: 2020-07-13 journal: bioRxiv DOI: 10.1101/2020.07.10.197095 sha: doc_id: 324251 cord_uid: wgtatr8v

Humanity has seen numerous pandemics over the course of its evolution, including measles, Ebola, SARS, and MERS. The latest addition to this list is COVID-19, caused by the novel coronavirus SARS-CoV-2. As of 4 July 2020, COVID-19 has affected over 10 million people in more than 170 countries and caused 528,364 deaths. Genomic technologies have enabled us to understand the genomic constitution of pathogens, their virulence, evolution, and mutation rates. To date, more than 60,000 virus genomes have been deposited in public repositories such as GISAID and NCBI. At the time of writing, India is the third most-affected country, with 0.6 million COVID-19 cases and more than 18,000 deaths. Gujarat is the fourth most-affected state, with a death rate of 5.44 percent compared to the national average of 2.8 percent. Here, 361 SARS-CoV-2 genomes from across Gujarat were sequenced and analyzed to understand their phylogenetic distribution and variants relative to global and national sequences. Further, variants from deceased and recovered patients from Gujarat and the world were analyzed to understand their role in pathogenesis. Among the missense mutations found in the Gujarat SARS-CoV-2 genomes, C28854T, a deleterious mutation in the nucleocapsid (N) gene, was found to be significantly associated with patient mortality.
The other significant deleterious variant found in deceased patients from Gujarat and the world is G25563T, which is located in Orf3a and has a potential role in viral pathogenesis. SARS-CoV-2 genomes from Gujarat form a distinct cluster under the GH clade of GISAID.

The detailed mutation frequency profile is provided as Supplemental Table S4. With reference to Indian genomes, G11083T, C28311T, C6312A, C23929T and C13730T were found to occur at frequencies of more than 24% (p-value <0.001). Of these mutations, G11083T, C28311T and C6312A were found to be missense mutations. G11083T and C6312A lie in the region of Orf1a encoding Nsp6. Further, the deceased versus recovered patient mutation profile analysis of the datasets with known patient status from Gujarat and the world is represented in Figure 6. Similarly, a comparison of the missense mutation profiles of deceased versus recovered patients, with genome count, frequency >5%, and p-value for the global dataset, is represented in

improve the adaptive behaviour of pathogenic species, thus making them highly contagious. Further laboratory and experimental studies need to be carried out to validate the exact role of this particular mutation with respect to the molecular pathways and interactions in biological systems, despite it being a strong candidate mutation found in the Gujarat region. The genomics approach has been a useful resource to identify and characterize the virulence,
The authors are grateful to the Secretary, Department of Science and