key: cord-0906142-occgc3v2 authors: Rahman, Mohammad Shaminur; Islam, Mohammad Rafiul; Hoque, Mohammad Nazmul; Alam, Abu Sayed Mohammad Rubayet Ul; Akther, Masuda; Puspo, Joynob Akter; Akter, Salma; Anwar, Azraf; Sultana, Munawar; Hossain, Mohammad Anwar title: Comprehensive annotations of the mutational spectra of SARS‐CoV‐2 spike protein: a fast and accurate pipeline date: 2020-10-06 journal: Transbound Emerg Dis DOI: 10.1111/tbed.13834 sha: 7f7356cce905c92f92e9ab77724541c445df59b9 doc_id: 906142 cord_uid: occgc3v2 Infecting millions of people, the SARS‐CoV‐2 is evolving at an unprecedented rate, demanding advanced and specified analytic pipeline to capture the mutational spectra. In order to explore mutations and deletions in the spike (S) protein — the most‐discussed protein of SARS‐CoV‐2 — we comprehensively analyzed 35,750 complete S protein‐coding sequences through a custom Python‐based pipeline. This GISAID‐collected dataset of until 24 June 2020 covered six continents and five major climate zones. We identified 27,801 (77.77% sequences) mutated strains compared to reference Wuhan‐Hu‐1 wherein 84.40% of these strains mutated by only a single amino acid (aa). An outlier strain (EPI_ISL_463893) from Bosnia and Herzegovina possessed six aa substitutions. We also identified 11 residues with high aa mutation frequency, and each contains four types of aa variations. The infamous D614G variant has spread worldwide with ever‐rising dominance and across regions with different climatic conditions alongside L5F and D936Y mutants, which have been documented throughout all regions and climate zones, respectively. We also found 988 unique aa substitutions spanned across 660 residues, which differed significantly among different continents (p = .003) and climatic zones (p = .021) as inferred with the Kruskal–Wallis test. Besides, 17 in‐frame deletions at four sites adjacent to receptor‐binding‐domain were determined that may have a possible impact on attenuation. 
This study provides a fast and accurate pipeline for identifying mutations and deletions from the large dataset for coding and also non‐coding sequences as evidenced by the representative analysis on existing S protein data. By using separate multi‐sequence alignment, removing ambiguous sequences and in‐frame stop codons, and utilizing pairwise alignment, this method can derive both synonymous and non‐synonymous mutations (strain_ID reference aa:mutation position:strain aa). We suggest that the pipeline will aid in the evolutionary surveillance of any SARS‐CoV‐2 encoded proteins and will prove to be crucial in tracking the ever‐increasing variation of many other divergent RNA viruses in the future. The code is available at https://github.com/SShaminur/Mutation-Analysis. Mutations in the viral genomes serve as the building blocks of viral evolution and remain the main reason for the novelty in evolution (Baer, 2008; Duffy, 2018) . However, mutations in the viral genomes are not restricted to their replication since they can also result from spontaneous nucleic acid damage over time in different host populations or from editing of the genetic materials. Thus, a large portion of mutations, either at nucleotides (nt) and/or change in amino acids (aa) levels, are harmful (Loewe & Hill, 2010) . RNA viruses like SARS-CoV-2 generally have higher mutation rates; however, a few of these mutations are correlated with differential virulence, evolving ability, and traits considered beneficial for viruses (Duffy, 2018; Islam et al., 2020) . Inherent high mutation rate of SARS-CoV-2 has already produced many descendants from the original Wuhan strain, which complicates its genotyping. 
The ability of the structural proteins, especially the spike protein, in different strains of SARS-CoV-2 to undergo rapid changes has enabled their genomes to emerge in novel hosts, escape vaccine-induced immunity and evolve in diverse geo-climatic conditions (Duffy, 2018; Islam et al., 2020; Loewe & Hill, 2010). Moreover, spontaneous mutation is a key parameter in modelling the genetic structure and evolution of populations (Drake & Holland, 1999). Therefore, investigation of the increased rate of non-synonymous mutations in the SARS-CoV-2 genomes could be an important tool in assessing the genetic health of the populations. SARS-CoV-2 comprises four major structural proteins: spike (S) glycoproteins, envelope (E) proteins, membrane (M) proteins and nucleocapsid (N) proteins (Ahmed et al., 2020; Rahman et al., 2020; Wu et al., 2020). The entry of SARS-CoV-2 into the host cells is mediated by the transmembrane S protein, which consists of two functional subunits responsible for binding to the host cell receptor (S1 subunit) and for fusing the viral and cellular membranes (S2 subunit) (Walls et al., 2020). The higher antigenic and surface exposure properties of the S protein facilitate the attachment and entry of viral particles into the host cells through the host angiotensin-converting enzyme 2 (ACE2) receptor (Grant et al., 2020; Shang et al., 2020; Zhou et al., 2019). Therefore, the spike contains the highest variation and determines, to some extent, the viral host range (Coutard et al., 2020; Wu et al., 2020). Furthermore, the S protein is the main target of neutralizing antibodies (Abs) upon infection and is thus one of the most important structures for therapeutics and vaccine design (Walls et al., 2020).
The continuing rapid transmission and global spread of the disease have raised intriguing questions regarding the evolution and adaptation of SARS-CoV-2 in diverse geographic and climatic conditions driven by non-synonymous mutations, deletions and/or replacements (Bal et al., 2020; Islam et al., 2020; Pachetti et al., 2020). The capability of the different strains of SARS-CoV-2 to adapt swiftly to diverse environments could be linked with their geographic distributions. Though not yet well studied, evidence suggests that the transmission of SARS-CoV-2 infections and the per-day mortality rate from this infection are positively associated with weather conditions and the diurnal temperature range (DTR) (Su et al., 2016; Islam et al., 2020). The exact role of geo-climatic conditions on SARS-CoV-2 is unknown, but it is worth keeping in mind that this novel disease originated from wildlife before spreading to humans (Harvey, 2020). Therefore, genomic mutation analysis of SARS-CoV-2 strains, integrated with geographic and climatic data, would provide a fuller understanding of the origin, dispersal and dynamics of the evolving SARS-CoV-2 virus. Although several reports predicted possible adaptations at the nucleotide and aa level, along with structural heterogeneity in viral proteins, especially in the S protein (Armijos-Jaramillo et al., 2020; Islam et al., 2020; Phan, 2020; Sardar et al., 2020), most of these studies were carried out on a few complete representative genomes from a limited geographic area. As the number of genomes is increasing day by day, regular in-house monitoring of crucial components such as the S protein is urgently necessary to understand the genomic basis and evolution of the diagnostic RT-PCR primer.
There are a few pipelines (Yin, 2020) and websites (https://mendel.bii.astar.edu.sg/METHODS/corona/beta/MUTATIONS/hCoV19_Human_2019_WuhanWIV04/hCoV-19_Spike_new_mutations_table.html) in GISAID where aa changes or substitutions can be observed. In order to provide an alternative tool with a wider range of functions, we present an easy, rapid pipeline that assists in the alignment of large volumes of viral genomes, removes low-quality sequences and in-frame stop codons, and provides in-house non-synonymous mutation analysis of large volumes of sequences while requiring minimal knowledge of the command line. This tool can perform the same analysis for any other protein as required. This study aimed to investigate the mutational spectra of aa in the S proteins, utilizing this novel methodology, in 35,750 complete genome sequences of SARS-CoV-2 belonging to 135 countries and/or regions and five climatic zones around the world, retrieved from the global initiative on sharing all influenza data (GISAID) (https://www.gisaid.org/) up to 24 June 2020 (Data S1).

To decipher the genetic variations of the S glycoprotein, we retrieved 53,981 complete (or near-complete) genome sequences of SARS-CoV-2, available at the global initiative on sharing all influenza data (GISAID) (https://www.gisaid.org/) up to 24 June 2020. These sequences belonged to infected patients from 135 countries and/or regions across six continents (Data S1). Using pyfasta (https://github.com/brentp/pyfasta), we split the full set of genomes into 6 separate files of around 8,900 sequences each. We aligned each file through the MAFFT online server (maximum limit 10,000 sequences) (https://mafft.cbrc.jp/alignment/server/add_fragments.html?frommanual) using default parameters (Katoh et al., 2002).
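The file-splitting step described above can be sketched in plain Python. This is a minimal illustration, not the pyfasta implementation the study used; the function names `read_fasta`, `split_records` and `write_fasta` are hypothetical, and the only figures taken from the text are the 53,981 input sequences, the 6 output files and the 10,000-sequence server limit.

```python
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_files):
    """Split records into n_files consecutive parts of near-equal size."""
    records = list(records)
    base, extra = divmod(len(records), n_files)
    parts, start = [], 0
    for i in range(n_files):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        parts.append(records[start:start + size])
        start += size
    return parts


def write_fasta(records, path):
    """Write (header, sequence) tuples to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
```

With 53,981 records and `n_files=6`, every part holds at most 8,997 sequences, which stays under the 10,000-sequence limit quoted for the online server.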
The complete genome sequence of the SARS-CoV-2 Wuhan-Hu-1 strain (Accession NC_045512, Version NC_045512.2) was used as the reference genome. MEGA 7 was used to separate the spike protein-coding region of SARS-CoV-2 from the multiple sequence alignment (Sudhir Kumar et al., 2016). Sequence Cleaner (https://github.com/metageni/Sequence-Cleaner), with the parameters minimum length (m = 3,822), percentage N (mn = 0), keep_all_duplicates and remove_ambiguous, was employed to remove all ambiguous and low-quality sequences. We utilized the SeqKit toolkit (seqkit grep -s -p "-" in.fa > out.fa) to capture gap-containing strains for deletion analysis (Shen et al., 2016). Sequences containing internal stop codons were removed using SEquence DAtaset builder (SEDA; https://www.sing-group.org/seda/). Amino acid mutation analysis was done with a Biopython program using pairwise alignment (https://github.com/SShaminur/Mutation-Analysis). The custom Venn diagrams server (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to make the Venn diagrams and visualize the data. Swiss-Model, a structure homology-modelling server (https://swissmodel.expasy.org/), was used to predict the 3D structure (template, PDB ID: 6VSB) of the S protein of the reference genome, and the structure was visualized in PyMOL (DeLano, 2002; Rahman et al., 2020; Waterhouse et al., 2018). Furthermore, we divided the S glycoprotein mutation data of SARS-CoV-2 according to their geographic origins into six continents (Europe, Asia, North America, South America, Africa and Australia) and five related climatic zones (temperate, tropical, diverse, dry and continental) (Kissler, Tedijanto, Goldstein, & Yonatan, 2020). To estimate the case fatality (mortality) rates of SARS-CoV-2 infections, we collected information on total infected cases and total reported deaths in these countries from the World Health Organization (WHO) COVID-19 Reports up to 24 June 2020 (WHO Reports, 2020).
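To make the mutation-calling step more concrete, here is a minimal sketch of how substitutions could be listed once a strain's S protein has been pairwise-aligned to the reference. It is not the authors' Biopython script: it assumes the two sequences are already aligned to equal length with `-` as the gap character, and the function names `has_internal_stop` and `call_mutations` are hypothetical. The output strings follow the "reference aa:mutation position:strain aa" layout mentioned in the abstract.

```python
def has_internal_stop(protein):
    """Return True if a stop symbol (*) occurs before the end of the protein."""
    return "*" in protein.rstrip("*")


def call_mutations(ref_aligned, query_aligned):
    """List substitutions as 'ref_aa:position:query_aa' strings.

    Positions are 1-based and counted on the ungapped reference.
    Gaps in the query (deletions) are skipped here, not reported.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    position = 0
    for ref_aa, query_aa in zip(ref_aligned, query_aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference coordinate
        position += 1
        if query_aa != "-" and query_aa != ref_aa:
            mutations.append(f"{ref_aa}:{position}:{query_aa}")
    return mutations


# Toy sequences (not real spike data) with substitutions at positions 5 and 6.
print(call_mutations("MFVFLDA", "MFVFFGA"))
```

Counting positions on the ungapped reference keeps the coordinates comparable across strains, which is what allows variants such as D614G to be tallied by position.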
The overview of the methods is described in Figure 1. The number of SARS-CoV-2 genomes in the Global Initiative on Sharing All Influenza Data (GISAID) is increasing very rapidly, but not all genomes are of high quality or complete. Non-synonymous mutation analysis of a particularly crucial part of the virus, such as the S or another structural protein, therefore gives statistically more significant insights than considering the complete genome of the SARS-CoV-2 virus. Of the total S protein sequences, Sequence Cleaner removed 33.77% as low-quality or ambiguous sequences. Among the remaining cleaned sequences (66.23%), we found ten sequences containing in-frame stop codons, which were removed using SEDA (https://www.sing-group.org/seda/manual/operations). The SeqKit toolkit was used to capture gap-containing sequences and identified around 453 sequences; after carefully checking these for in-frame deletions, we found 103 strains containing in-frame deletions. SNP-sites is a very efficient tool for detecting nucleotide variation in different formats such as multi-FASTA alignment, variant call format (VCF) and relaxed phylip format (Page et al., 2016), but this tool is dedicated to nucleotide data.

FIGURE 1 Workflow of the pipeline used for non-synonymous mutation analyses in this study. File splitting is needed if the number of sequences is more than 10,000. Through these methods, nucleotide mutations can also be calculated. MSA: multiple sequence alignment; ORFs: open reading frames. Steps shown in the figure: sequence retrieval (GISAID/NCBI); file splitting (pyfasta); MSA with reference (MAFFT); differentiate into ORFs (MEGA); remove ambiguous sequences (Sequence Cleaner); remove in-frame stop codon containing sequences (SEDA); translate into protein and merge split files (MEGA); output (pairwise_mutation.py); downstream analysis (Microsoft Excel and R).
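The check for in-frame deletions among gap-containing sequences can be illustrated with a short sketch. This is not the procedure used in the study, which relied on SeqKit plus manual inspection. It assumes an aligned nucleotide sequence in which `-` marks a gap and in which the reference carries no gaps, so alignment coordinates equal reference coordinates. The function name `find_deletions` is hypothetical.

```python
import re


def find_deletions(aligned_query):
    """Return (start, end, length, in_frame) for each internal gap run.

    Coordinates are 1-based and inclusive. Leading and trailing gap runs
    are ignored because they usually reflect incomplete sequencing rather
    than true deletions. A deletion is called in-frame when its length is
    a multiple of three nucleotides.
    """
    deletions = []
    for match in re.finditer(r"-+", aligned_query):
        if match.start() == 0 or match.end() == len(aligned_query):
            continue
        length = match.end() - match.start()
        deletions.append((match.start() + 1, match.end(), length, length % 3 == 0))
    return deletions


# Toy example: a 3-nt (in-frame) gap and a 1-nt (frameshift) gap.
print(find_deletions("ATG---GCT-A"))
```

A length divisible by three removes whole codons' worth of nucleotides and so preserves the downstream reading frame, which is why only such runs are flagged as in-frame.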
Snippy (Seemann, 2015) is another tool with which nucleotide and protein variation can be detected, but a large dataset with ambiguous sequences will require separate processing to en-

The Wu-Kabat variability coefficient was employed to calculate the aa position variability with regard to evolutionary adaptation (Garcia-Boronat et al., 2008; Kabat et al., 1977). The variability coefficient was calculated using the following formula: $\text{Variability} = \frac{N \times k}{n}$, where N = total number of sequences in the alignment, k = number of different aa at a given position, and n = frequency of the most common aa at that position. We used Microsoft Excel 2016 to calculate the frequencies, percentages and Wu-Kabat variability coefficients using the above-mentioned formula, and for overall data management (David, 2017). The Wu-Kabat variability coefficient plot was visualized in RStudio using the ggplot2 package (Wickham, 2011). The frequency lolliplot was also visualized in RStudio with the trackViewer Vignette package (https://bioconductor.org/packages/release/bioc/vignettes/trackViewer/inst/doc/trackViewer.htm) (Ou et al., 2020a, 2020b). To measure the morbidity and case fatality rates, and the association between the S protein mutational spectra and case fatality rates, we applied the non-parametric Kruskal-Wallis rank sum test (Hoque et

Trimming of the low-quality, ambiguous and non-human host RNA sequences resulted in 35,750 (66.23%) cleaned and full-length S protein sequences (Data S1). These sequences belonged to 135 countries and/or regions from six continents (Europe, Asia, North America, South America, Africa and Australia) and five major climatic zones (temperate, tropical, diverse, dry and continental) around the world (Data S1). European countries and/or regions had the highest percentage (58.90%) of S protein sequences, followed by North American (25.78%), Asian (9.34%), Australian (3.61%), South American (1.21%) and African (1.18%) countries or regions.
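Returning to the Wu-Kabat variability coefficient from the methods, the calculation for a single alignment column (N × k / n, with N the number of sequences, k the number of distinct aa and n the count of the most common aa) can be sketched as follows. The study performed this in Microsoft Excel; the function name `wu_kabat` and the choice to ignore gap characters are assumptions made for illustration.

```python
from collections import Counter


def wu_kabat(column):
    """Wu-Kabat variability of one alignment column: N * k / n.

    `column` is an iterable of single-letter residues, one per sequence.
    Gap characters ('-') are ignored here, which is an assumption.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    n_sequences = len(residues)                     # N
    n_distinct = len(counts)                        # k
    most_common_count = counts.most_common(1)[0][1]  # n
    return n_sequences * n_distinct / most_common_count


# Toy columns: a fully conserved position scores 1.0.
print(wu_kabat("DDDD"))
print(wu_kabat("DDDG"))
```

A fully conserved position always gives 1, and the value grows both with the number of distinct residues and as the dominant residue becomes less frequent.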
On the other hand, the temperate climatic zone covered the majority of these S protein sequences (60.18%), followed by diverse (33.08%), continental (3.25%), tropical (2.81%) and dry (0.69%) climatic conditions (Data S1). We selected the complete genome sequence of the SARS-CoV-2 Wuhan-Hu-1 strain (Accession NC_045512, Version NC_045512.2) as the reference genome. Through non-synonymous mutation analysis, we found 27,801 (77.77%) mutated strains of SARS-CoV-2 among the cleaned sequences (n = 35,750). Furthermore, country- or region-specific aa change patterns revealed the highest number of mutated SARS-CoV-2 strains in England (7,067), followed by the USA (6,501), Wales (3,002), Scotland (1,463), the Netherlands (1,194), Australia (681), Belgium (596) and Denmark (582) (Data S1). Our mutational analyses revealed a total of 988 unique amino acid substitutions (Figure 3b). Remarkably, we found eleven highly variable sites (positions: 32, 142, 146, 215, 261, 477, 529, 570, 622, 778, 791, 1146, 1162) showing four types of aa variations at a single position (Table 1). We also found that positions 52, 185 and 410 in the S glycoprotein underwent 3, 2 and 1 aa substitutions, respectively (Table 1).

Overall, the aa substitutions related to asparagine in the RBD (ACE2-binding domain) and/or in the S1/S2 domains near the glycosylated sites may affect the glycosylation shield, folding of the S protein, host-pathogen interactions, viral entry and finally immune modulation, thus affecting antibody recognition and viral pathogenicity (Ou et al., 2020a, 2020b; Watanabe et al., 2020). These variability profiles may have notable implications for therapeutic and/or prophylactic interventions targeting the S protein of SARS-CoV-2.
Besides site-specific mutations, our analysis revealed 17 in-frame deletions of varying nucleotide lengths across the SARS-CoV-2 S protein sequences originating from different countries worldwide (Table 2).

TABLE 1 (final row recovered) Position 1170; 3 variations; S1170T, S1170Y, S1170P. Note: Here, the position(s) where more than 2 variations occurred are represented.

TABLE 2 Deletion sites observed across the S glycoprotein. Note: Countries represent the origin of the strains in which the deletions were found. We considered the deletions that occurred in at least two strains at a given position.

Moreover, attenuated SARS-CoV-2 variants with 15-30-bp deletions (Del-mut) at the S1/S2 junction were reported to show less virulence in an animal model (Lau et al., 2020). These deletions may affect viral adaptation to humans, virus-host interactions for infection, attenuation, pathogenicity and immune modulation by potentially influencing the tertiary structures and functions of the associated proteins (Phan, 2020). However, further studies are required for the mechanistic clarification and functional implication of these deletions in the SARS-CoV-2 S glycoprotein. The deletion mutations identified in this study should also be considered in current vaccine development. Considering geo-climatic impacts on aa changes in the S protein of SARS-CoV-2, we sought to determine the possible residue positions and the total number of mutations in the S protein sequences from 135 countries and/or territories and five climatic zones worldwide. Nine hundred and eighty-eight (988) unique aa replacements across 660 positions along the S protein were identified, which differed significantly among different continents (p = .003, Kruskal-Wallis test) and climatic zones (p = .021, Kruskal-Wallis test).
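For readers unfamiliar with the Kruskal-Wallis test reported above, the following sketch computes the H statistic with the standard tie correction in pure Python. It illustrates the test only; the study's own computation was presumably done in R, and the function name `kruskal_wallis_h` and the toy input values are assumptions, not data from the paper.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction.

    Each argument is a non-empty sequence of numeric observations for
    one group. Raises ZeroDivisionError if every observation is identical.
    """
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = average_rank
        tie_size = j - i
        tie_term += tie_size ** 3 - tie_size
        i = j
    rank_sums = [0.0] * len(groups)
    for (_, gi), rank in zip(pooled, ranks):
        rank_sums[gi] += rank
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(group) for rs, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


# Toy data only: three fully separated groups of three observations.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom, which is how a p-value such as those quoted above is obtained.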
We found that the frequency of aa changes in the S protein remained substantially higher in the SARS-CoV-2 genome sequences of Europe (62.02%), followed by North America (25.50%), Asia (6.83%), Australia (2.89%), South America (1.41%) and Africa (1.35%) (Figure 5a, Data S1). Among these replacements, aa residues at positions 5 (L5F) and 614 (D614G) were found to be common in Asia, Europe, North (Watanabe et al., 2020). The geographic and climatic patterns of mutational evolution of the SARS-CoV-2 S protein are visualized in Figure 6.

FIGURE 4 Structural visualization of S protein deletion sites. The four aa deleted positions (61-76, 138-144, 241-244 and 675-679) in the S protein of the reference genome, SARS-CoV-2 Wuhan-Hu-1 strain (Accession NC_045512, Version NC_045512.2). The positions are visualized in the tertiary (3D) structure of the S protein using PyMOL. The smudge, cyan and light orange colours represent the A, B and C chains of the SARS-CoV-2 spike protein, respectively. Blue, yellow, magenta and red colours represent the aa deletion positions 61-76, 138-144, 241-244 and 675-679, respectively.

The genomic variability of SARS-CoV-2 strains, manifested by mutations in the spike protein scattered across the globe, underlies geographically specific aetiological effects. One important use of mapping mutations is the development of antiviral therapies targeting specific regions, for example the spike region of the SARS-CoV-2 genomes (Callaway, 2020). Our current findings corroborate the study by Deshwal (2020), who reported the highest SARS-CoV-2 infections and case fatality rates in European countries. In another study, Pachetti et al. (2020) reported two non-synonymous mutations (R203K and L3606F) that were shared across ORFs of the SARS-CoV-2 genomes of six continents, and recurrent mutations were also common in different countries along with unique mutations.
Nevertheless, mutations in the structural proteins of SARS-CoV-2, especially in the spike protein, are driven by geographic locations that diverged differently, possibly due to the environment, demography and the low fidelity of reverse transcriptase (Brassey et al., 2020; Pachetti et al., 2020; Su et al., 2016). In this study, we found 14.16%, 11.72%, 10.05%, 9.31%, 3.30%, 3.00%, 2.30%, 2.07%, 1.65% and 1.63% case fatality rates in

We compared the S protein mutations of SARS-CoV-2 with the SARS-CoV reference strain (NCBI accession no. NC_004718) and the bat coronavirus RaTG13 strain (NCBI accession no. MN996532). The identity, similarity and gap of the S protein between the Wuhan strain of SARS-CoV-2 and RaTG13 were 97.3%, 98.3% and 0.4%, respectively, and those between the Wuhan strain of SARS-CoV-2 and SARS-CoV were 76.2%, 86.9% and 2.1%, respectively (Table S1). These findings are in line with many previously published reports (Swatantra Kumar et al., 2020; Tang et al., 2020; Wrapp et al., 2020). We found mutations in the variable regions between SARS-CoV-2 and RaTG13, and these recurrent mutations (S50L, T76I, A372T, N439K) appear to convert the SARS-CoV-2 residues to those of RaTG13 (Table S1). Furthermore, we also found 45 mutation sites in the variable regions between SARS-CoV-2 and SARS-CoV that resulted in the conversion of SARS-CoV-2 residues to those of SARS-CoV (Figure 7). The RaTG13 genome possessed a deletion site (681-684 aa) with respect to the SARS-CoV-2 genome, and we also found deletions at a very close position (675-679 aa) in two strains of SARS-CoV-2 (Table 2). SARS-CoV also possessed deletions with respect to the Wuhan reference strain of SARS-CoV-2 at aa positions 72-78, 144-147, 243-247, 256-257 and 679-682. In this study, we also found deletions at different aa positions (61-76, 138-145, 241-244, 675-679) in different strains of SARS-CoV-2.
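The identity and gap percentages quoted above for the S protein comparisons come from pairwise alignments. The text does not state how they were computed, so the following is only a sketch under one common convention (both values expressed as a share of the alignment length). The function name `identity_and_gaps` is hypothetical, and similarity is omitted because it additionally requires a substitution matrix.

```python
def identity_and_gaps(seq_a, seq_b):
    """Percent identity and percent gaps for two aligned sequences.

    Both sequences must have the same aligned length, with '-' as the
    gap character. Percentages are relative to the alignment length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    length = len(seq_a)
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    gaps = sum(1 for a, b in zip(seq_a, seq_b) if a == "-" or b == "-")
    return 100 * matches / length, 100 * gaps / length


# Toy alignment (not real spike sequences).
identity, gap = identity_and_gaps("MFVF-L", "MFLFAL")
print(round(identity, 2), round(gap, 2))
```

Other tools divide by the shorter sequence or by the number of ungapped columns instead, so percentages from different programs are not always directly comparable.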
Therefore, these types of deletions suggest that different strains of SARS-CoV-2 are acquiring traits of SARS-CoV. Moreover, a recent study reported that the S1 protein of the Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13 (Uddin et al., 2020). However, this phenomenon of evolving and/or recurrent mutations should be interpreted using a larger dataset from different host populations and geo-climatic conditions.

Our findings on non-synonymous mutations in the spike protein of SARS-CoV-2 genomes suggest that the virus is continuously evolving. European, North American and Asian strains might coexist, each characterized by a different mutation pattern. Moreover, the geo-climatic distribution of the recurrent mutations in the spike suggests a plausible link to higher mutation rates and disease severity in the European temperate countries. However, the geo-climatic effects of the observed mutations in the spike protein of SARS-CoV-2 on the properties of the diverse strain variants are yet to be evaluated in clinical or experimental studies. Therefore, these results need to be interpreted cautiously, given the existing uncertainty about SARS-CoV-2 genomic data, when developing potential prophylaxis and mitigation for tackling the COVID-19 pandemic crisis. The fast and accurate pipeline will make it easy to investigate synonymous and non-synonymous mutations, mutation frequencies and deletions in large datasets in the shortest possible time, without requiring in-depth bioinformatics knowledge.

The authors sincerely appreciate the researchers worldwide who deposited and shared the complete genome data of SARS-CoV-2 and other coronaviruses in GISAID (https://www.gisaid.org/). This research utilized these precious data. The authors would also like to extend thanks to Geni Gueiros, who was kind enough to modify his tool (Sequence Cleaner) upon request from Md.
Shaminur

FIGURE 7 Lolliplot mapping of mutational conversion from SARS-CoV-2 to SARS-CoV with their frequency. We identified 45 sites in the SARS-CoV-2 S protein with substitutions resulting in aa homogeneity with the S protein of SARS-CoV.

Additional supporting information may be found online in the Supporting Information section.

domain of MERS-CoV spike glycoprotein. Nature Communications, 10(1), 1-13. https://doi.org/10.1038/s41467-019-10897-4
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
SARS-CoV-2, an evolutionary perspective of interaction with human ACE2 reveals undiscovered amino acids necessary for complex stability
Does mutation rate depend on itself
Molecular characterization of SARS-CoV-2 in the first COVID-19 cluster in France reveals an amino acid deletion in nsp2 (Asp268del)
Do weather conditions influence the transmission of the coronavirus (SARS-CoV-2)? Centre for Evidence-Based Medicine
Coronavirus vaccines: Five key questions as trials begin
Identification of variable sites in Sars-CoV-2 and their abundance profiles in time
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
books?hl=en&lr=&id=yIqqDwAAQBAJ&oi=fnd&pg=PP1&dq=Statistics+for+managers
The PyMOL molecular graphics system
Mutation rates among RNA viruses
COVID 19: A comparative study of Asian, European, American continent
Why are RNA virus mutation rates so damn high?
Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality?
PVS: A web server for protein sequence variability analysis tuned to facilitate conserved epitope discovery
3D Models of glycosylated SARS-CoV-2 spike protein suggest challenges and opportunities for vaccine development
What Could Warming Mean for Pathogens like Coronavirus? E&E News
Metagenomic deep sequencing reveals association of microbiome signature with functional biases in bovine mastitis
Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity
Unusual distributions of amino acids in complementarity determining (hypervariable) segments of heavy and light chains of immunoglobulins and their possible roles in specificity of antibody-combining sites
MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform
Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome. Osong Public Health and Research Perspectives
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Tracking changes in SARS-CoV-2 Spike: Evidence that D614G increases infectivity of the COVID-19 virus
Structural, glycosylation and antigenic variation between 2019 novel coronavirus (2019-nCoV) and SARS coronavirus (SARS-CoV). Virus Disease
MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets
Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction
Identification of common deletions in the spike protein of SARS-CoV-2
The population genetics of mutations: Good, bad and indifferent
Package 'trackViewer'
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments
Genetic diversity and evolution of SARS-CoV-2. Infection
Epitope-based chimeric peptide vaccine design against S, M and E proteins of SARS-CoV-2, the etiologic agent of COVID-19 pandemic: An in silico approach
Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis
The outbreak of SARS-CoV-2 pneumonia calls for viral vaccines
SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Snippy: rapid haploid variant calling and core SNP phylogeny
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
On the origin and continuing evolution of SARS-CoV-2
Novel Coronavirus (2019-nCoV) Situation Reports (WHO
Unveiling diffusion pattern and structural impact of the most invasive SARS-CoV-2 spike mutation
SARS-CoV-2/COVID-19: Viral genomics, epidemiology, vaccines, and therapeutic interventions
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Structural and functional basis of SARS-CoV-2 entry by using human ACE2
Site-specific glycan analysis of the SARS-CoV-2 spike
SWISS-MODEL: Homology modelling of protein structures and complexes
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods
Genotyping coronavirus SARS-CoV-2: Methods and implications
A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV