key: cord-0828868-128mghn7 authors: Alam, A. S. M. R. U.; Islam, O. K.; Hasan, M. S.; Islam, M. R.; Mahmud, S.; AlEmran, H. M.; Jahid, I. K.; Crandall, K. A.; Hossain, M. A. title: Evolving Infection Paradox of SARS-CoV-2: Fitness Costs Virulence? date: 2021-02-23 journal: nan DOI: 10.1101/2021.02.21.21252137 sha: 51b583f5b71ba913bc2061f7930f00c88ae18aa2 doc_id: 828868 cord_uid: 128mghn7

Background: SARS-CoV-2 is continuously spreading worldwide at an unprecedented scale and has evolved into seven clades according to GISAID, of which four (G, GH, GR and GV) were globally prevalent in 2020. These predominant clades of SARS-CoV-2 are continuously increasing COVID-19 cases worldwide; however, after an early rise in 2020, the death-case ratio has been decreasing to a plateau. G clade viruses contain four co-occurring mutations in their genome (C241T+C3037T+C14408T:RdRp.P323L+A23403G:spike.D614G). GR, GH, and GV strains are defined by the presence of these four mutations in addition to the clade-featured mutations GGG28881-28883AAC:N.RG203-204KR, G25563T:ORF3a.Q57H, and C22227T:spike.A222V+C28932T:N.A220V+G29645T, respectively. Research has broadly focused on the spike protein mutations that have direct roles in receptor binding and antigenicity, and thus in viral transmission and replication fitness. However, mutations in other proteins might also affect viral pathogenicity and transmissibility. How the clade-featured mutations are linked with viral evolution in this pandemic through gearing their fitness and virulence is the main question of this study.

Methodology: We thus proposed a hypothetical model, combining statistical and structural bioinformatics approaches, that endeavors to explain this infection paradox by describing the epistatic effects of the clade-featured co-occurring mutations on viral fitness and virulence.
Results and Discussion: The G and GR/GV clade strains showed a significant positive and negative association, respectively, with the death-case ratio (incidence rate ratio or IRR = 1.03, p < 0.001 and IRR = 0.99/0.97, p < 0.001), whereas GH clade strains showed no association with the death-case ratio. Docking analysis showed the higher infectiousness of a spike mutant through more favorable binding of G614 with elastase-2. The RdRp mutation p.P323L significantly increased genome-wide mutations (p < 0.0001), since a more expandable RdRp (mutant)-NSP8 interaction may accelerate replication. Superior RNA stability and structural variation at NSP3:C3037T might affect protein or RNA interactions. Another silent mutation, 5'UTR:C241T, might affect translational efficiency and viral packaging. These G-featured co-occurring mutations might increase the viral load, alter immune responses in the host, and hence modulate intra-host genomic plasticity. An additional mutation in the viroporin ORF3a (p.Q57H), which defines the GH clade, prevents ion permeability through a cysteine (C81)-histidine (H57) inter-transmembrane-domain interaction that constricts the channel pore more tightly, and possibly reduces viral release and immune response. GR strains, which carry the four G clade mutations plus N:p.RG203-204KR, would have stabilized RNA interaction through a more flexible and hypo-phosphorylated SR-rich region. GV strains seemingly gained the evolutionary advantage of superspreading events through confounding factors; nevertheless, N:p.A220V might affect RNA binding.

Conclusion: These hypotheses need further retrospective and prospective studies to understand the detailed molecular and evolutionary events featuring the fitness and virulence of SARS-CoV-2.
This study analyzed 225,526 high-coverage (<1% Ns and <0.05% unique amino acid mutations) and complete (>29,000 nucleotides) genome sequences with a specified collection date, out of a total of 316,166 sequences submitted to GISAID until January 3, 2021. Sequences generated from non-human hosts were excluded from the dataset. The Wuhan-Hu-1 isolate (Accession ID NC_045512.2) [27] was used as the reference genome.

A Python script was used to partition a significant part of the dataset into two subsets based on the RdRp:C14408T mutation and to estimate the genome-wide variations (single nucleotide changes) for each strain. For the genome-wide mutation analysis, a total of 37,179 sequences (RdRp wild type or 'C' variant: 9,815; mutant or 'T' variant: 27,364) were analyzed from our dataset. The frequency of mutations was tested for significance between the RdRp 'C' and 'T' variants with the Wilcoxon signed-rank test using IBM SPSS Statistics 25.

A random-effects Poisson regression model was fitted in Stata v13.0 to identify the association between the death-case ratio and the different clade strains (G, GH, GR, and GV); both unadjusted and adjusted incidence rate ratios (IRR) were estimated, with time introduced as a panel variable [28].

In this study, we report the prevalence of these dominant clades in 2020, both individually and in combination, together with disease progression and deaths, allowing us to infer increasing fitness of SARS-CoV-2. A weekly time plot of G, GH, GR, and GV clade frequencies with infections and deaths was generated from 23 December 2019 until 3 January 2021, a total of 54 weeks (supplementary table S1). The total numbers of infections and deaths per week were extracted from the WHO 2019-nCoV situation reports.
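The partition-and-count step described above can be sketched as follows. This is only an illustration, not the authors' script: the function names are hypothetical, and it assumes every sequence has already been aligned to the reference so that string index i corresponds to genome position i + 1. It covers the partitioning and counting only; the significance test reported in the text was run separately in SPSS.

```python
# Hypothetical helpers; assumes sequences are pre-aligned to the reference
# (same length, index i <-> genome position i + 1).
RDRP_SITE = 14408  # 1-based position of the C14408T (RdRp P323L) change


def count_snvs(reference: str, sequence: str) -> int:
    """Count single-nucleotide differences, ignoring N, gaps and other symbols."""
    return sum(
        1
        for ref_base, base in zip(reference, sequence)
        if base != ref_base and base in "ACGT" and ref_base in "ACGT"
    )


def partition_by_rdrp(reference, sequences, site=RDRP_SITE):
    """Return SNV counts split into the 'C' (wild type) and 'T' (mutant) subsets."""
    counts = {"C": [], "T": []}
    for seq in sequences:
        allele = seq[site - 1]
        if allele in counts:  # sequences with an ambiguous call at the site are skipped
            counts[allele].append(count_snvs(reference, seq))
    return counts
```

With the per-subset counts in hand, the medians of `counts["C"]` and `counts["T"]` can be compared directly, for example with `statistics.median`.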
The death-case ratio was estimated by dividing the number of deaths in a particular week by the number of cases identified in the previous week, based on the conservative assumption of a one-week interval between diagnosis and death [29].

Regional time plots of those clades were also generated monthly (from January to June) with frequencies of new infections, deaths, and death-case ratio based on the available data in the WHO situation reports [30].

The Mfold web server [52] was used with default parameters to check the folding pattern of RNA secondary structure in the mutated 5'UTR, synonymous leader (T445C), and NSP3 (C3037T) regions. The structure of the complete mutant 5'UTR (variant 'T') was compared with the wild-type (variant 'C') secondary structure described by Huston et al. [53]. Since the wild-type (variant 'C' at the 318th nucleotide) RNA structure of NSP3 was not available in the literature, we generated the structure of the mutant (variant 'T') to predict the RNA folding in the Mfold web server. From the Mfold web server, we also estimated the free energy change (∆G) for the wild-type and mutant NSP3 RNA folds.

We represented the global scenario of SARS-CoV-2 infection by the G, GH, GR, and GV clade strains and estimated the association between the clade strains and the death-case ratio. Afterwards, the possible effects of the nine mutations in S, RdRp, ORF3a, N, 5'UTR, leader protein, and NSP3 were discussed with the associated results. Whereas molecular docking of the spike protein [54,55] and RdRp [56,57] in search of potential drug targets is an ongoing line of research, our study took a different approach, docking spike with elastase-2 and RdRp with NSP8 to suit our purpose. The overall epistatic interactions of the mutant proteins and/or RNA were then depicted (Figure 1) with appropriate explanation.
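The one-week-lagged death-case ratio defined in the methods can be written as a short calculation. The sketch below is illustrative only: the function name and the `lag_weeks` parameter are assumptions, and the inputs are taken to be two equal-length lists of weekly totals in chronological order.

```python
def lagged_death_case_ratio(weekly_cases, weekly_deaths, lag_weeks=1):
    """Deaths in week t divided by cases in week t - lag_weeks.

    The result is aligned with weekly_deaths. Weeks that have no earlier
    week to compare against, or zero cases in that earlier week, get None.
    """
    ratios = []
    for t, deaths in enumerate(weekly_deaths):
        if t < lag_weeks or weekly_cases[t - lag_weeks] == 0:
            ratios.append(None)
        else:
            ratios.append(deaths / weekly_cases[t - lag_weeks])
    return ratios
```

For example, with weekly cases `[100, 200, 400]` and weekly deaths `[1, 5, 8]`, the second week's ratio is 5/100 and the third week's is 8/200; the first week has no earlier week and is left undefined.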
Finally, we endeavored to postulate, using evolutionary theory, how the virus might be changing its virulence and fitness.

Analysis of the SARS-CoV-2 genome sequences has indicated that the 241C > T, 3037C > T, 14408C > T, and 23403A > G mutations were separately identified in different viruses in China on 24 January 2020. The four mutations together in a single virus were first detected in England on 3 February 2020 (Table 1). Since then, those mutations were found to co-occur with other mutations, thus forming clades G, GH, GR, and GV, which have become the most dominant variants in other regions of the world (Figure 2b), for example, escalating to 85% in May 2020 in Southeast Asia [20]. The G clade strain circulated predominantly (30%, n=828) in Africa, whereas the GH, GR, and GV clades have become more recognized in the Americas, Western Pacific and Europe, respectively (Table 1).

medRxiv preprint doi: https://doi.org/10.1101/2021.02.21.21252137; this version posted February 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Our weekly time plot depicts a gradual increase of G, GH, and GR viruses altogether since the 10th week (24 February - 2 March 2020), recording a sudden jump to 42% in that week from a mere 12% in the previous week. Global COVID-19 cases increased exponentially from the 10th week, which had only 7,806 cases, and almost 500,000 people were infected with ca. 35,000 deaths in just seven weeks, while the G, GH, and GR strains reached 80% (supplementary table S1). The death rate peaked (14%) in mid-March (week 12) and gradually decreased to below 2% in early September.
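The weekly binning behind the time plot can be sketched as below. This is an assumption-laden illustration rather than the authors' code: it uses plain seven-day bins starting on 23 December 2019 (the start date given in the methods), and the function names and the `(collection_date, clade)` record layout are hypothetical. Under this binning, week 10 starts on 24 February 2020 and 3 January 2021 falls in week 54, which agrees with the 54 weeks stated in the text.

```python
from collections import Counter, defaultdict
from datetime import date

WEEK1_START = date(2019, 12, 23)  # first day of week 1 in the time plot


def week_index(collection_date: date) -> int:
    """1-based index of the seven-day bin that contains collection_date."""
    return (collection_date - WEEK1_START).days // 7 + 1


def weekly_clade_share(records):
    """records: iterable of (collection_date, clade) -> {week: {clade: fraction}}."""
    counts = defaultdict(Counter)
    for collected, clade in records:
        counts[week_index(collected)][clade] += 1
    return {
        week: {clade: n / sum(c.values()) for clade, n in c.items()}
        for week, c in counts.items()
    }
```

The fractions in each week sum to one over the clades present in the input, so clades outside G, GH, GR, and GV need to be included in `records` if shares of all sequences are wanted.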
Correspondingly, the number of GR strains increased from the 10th to the 32nd week with small fluctuations, whereas the G and GH strains each maintained a steady share of ~20-30%. GV clade strains entered the population in the 30th week and became the dominant strain, replacing GR clade strains, in just 9 weeks (Figure 2a).

The geographical distribution plot of the G, GH, GR, and GV clades with infections and deaths shows that new infections began to rise exponentially with the increase of co-evolved mutant variants in all regions except the Western Pacific (Figure 2b). The Western Pacific region, which includes East Asian countries as well, recorded a very low number of infections and deaths per million (Figure 2b). Europe and America have a high rate of infections as well as a high percentage of those variants. The reason may be linked with α1-antitrypsin (AAT) deficiency, which is very rare in East Asia unlike Europe and North America [58]. The AAT allele deficiency facilitates entry of the 614G subtype into host cells and accelerates the spread of the G, GH, GR, and GV clades. Our analysis has also found that the proportion of strains containing the 23403A > G mutation was 25% in East Asia and >75% in Europe and America in the first 6 months of 2020 (data not shown).

The death-case ratio was decreasing globally while the GR mutants were increasing until August (Figure 2a). Since September, GV clade strains were increasing while the death-case ratio remained low (2%). To examine the association between clade strains and the death-case ratio, both unadjusted and adjusted incidence rate ratios were estimated. The G, GR, and GV clades were found to be significantly associated with the death-case ratio in both models.
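To make the incidence rate ratios easier to read, the short sketch below shows how an IRR compounds over several percentage points of clade share. It relies on the standard property of a log-link Poisson model that a k-unit change in a covariate multiplies the expected outcome by IRR to the power k. The function name is hypothetical, and the IRR values are the ones quoted in the abstract (1.03 for G, 0.99 for GR, 0.97 for GV); whether they come from the adjusted or unadjusted model is not stated there.

```python
def expected_ratio_multiplier(irr: float, percentage_points: float) -> float:
    """Multiplicative change in the expected death-case ratio for a given
    change in clade share, assuming a log-link model where the effect of
    each additional percentage point is the same factor `irr`."""
    return irr ** percentage_points


# IRR values as quoted in the abstract
g_clade_10pt = expected_ratio_multiplier(1.03, 10)   # about 1.34
gv_clade_10pt = expected_ratio_multiplier(0.97, 10)  # about 0.74
```

Read this way, an IRR that looks close to 1 per percentage point still implies a sizeable change over the tens of percentage points by which clade shares moved during 2020.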
If G clade strains increase by one percentage point, then the death-case ratio would be expected to increase by a factor of 1.

Like other SARS-CoV-2 studies [59-61], this statistical analysis also suffers from some limitations in dealing with genomic data and in calculating the death-case ratio. The death-case ratio is believed to be underestimated because of inadequate testing capacity and

between the SARS-CoV-2 variants and mortality rates was observed in the Eastern Mediterranean Region [69]. However, another report showed that GR, GH and L clade viruses are predominant in countries with higher deaths and that the GR clade showed higher prevalence among severe/deceased patients [70]. Besides these host, pathogen and environment associated

This study found interesting structural features of the S protein while comparing and superimposing the wild type (D614) over the mutant (G614). The secondary structure prediction and surface accessibility analyses showed that there was a slight mismatch at the S1-S2 junction (681-PRRAR↓S-686), where serine at 686 (S686) was found covered in G614 and exposed to the surface in D614.
However, S686 in both G614 and D614 was exposed in an open-loop region, allowing possible contact with the proteases (supplementary Figure S1). Further investigation of the aligned 3D structures showed no conformational change at the furin or TMPRSS2 cleavage site (Figure 3c). We observed no structural variation in the residues surrounding the protease-targeted S1-S2 site (Figure 3c), which argues against the assumption of Phan [79]. The predicted 3D models and structural assessment of the D614 and G614 variants also confirmed that the cleavage site at 815-816 of the S2 subunit (812-PSKR↓S-816), or S2' [3,80], had no structural or surface topological variation (Figure 3d

Elastase-2 cleaves restrictedly at valine 615, due to the valine-dependent constriction of its catalytic groove [81]. The present sequence setting surrounding G614 (P6-610-VLYQGV↓NCTEV-620-P'5) showed a higher enzymatic activity than that of D614, which cannot be completely aligned with previous works on the sequence-based substrate specificity of elastase-2 [82]. However, the first misaligned residue of the superimposed G614, located at the P'4 position (T618), may also be important for binding with elastase-2, and the residues further downstream of T618 may affect bonding with the respective amino acids of the protease. This changed conformation at the downstream binding site of G614 may help overcome unfavorable adjacent sequence motifs in the mutated S protein as the elastase-2 substrate. Therefore, the simultaneous and/or sequential processing of the mutated S protein by TMPRSS2/furin/cathepsin and elastase-2 facilitates a more efficient SARS-CoV-2 entry into host cells and cell-cell fusion [41,58,78].

This study further observed the possible association of the S protein with elastase-2 and found an increased binding affinity in the case of G614 (Table 3).
Hence, the active sites of the mutated protein interacted efficiently with more amino acids of elastase-2 (Table 4).

The efficient cleavage by this enzyme, although located upstream of the S1-S2 junction, may assist in releasing S1 from S2 and change the conformation in a way that allows later cleavage at the S2' site, and then help in the fusion process [83,84]. The complex of mutated spike protein and elastase-2 was more flexible than the wild-type complex, and its interactions with the enzyme were also different, as shown by the RMSD deviation between the complexes (Figure 5).

Besides, this G614 amino acid replacement may have a destabilizing effect on the overall protein structure (Table 4 and Figure 3a

interactions of the spike trimer and symmetric conformation will give a better chance to bind with the ACE2 receptor and can also increase antibody-mediated neutralization [86]. S1 will release from S2 more effectively in the G614 protein because the introduction of glycine breaks the hydrogen bond between D614 and the T859 amino acid of the neighboring protomer

The binding free energy (ΔG) of the RdRp-NSP8 complexes was predicted as -10.6 and -10.5 kcal mol^-1 for the wild-type (P323) and mutant (L323) proteins, respectively, which suggests a slightly weaker interaction for the mutant protein (Table 4). The number of contacts made at the interface (IC) per property and the number of interacting amino acids increased between L323 and NSP8 (Table 3 and

and proline (P116) of NSP8 (Figure 6).
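To put the two predicted RdRp-NSP8 binding free energies on a more intuitive scale, they can be converted to dissociation constants with the standard relation ΔG = RT ln(Kd). The sketch below is a worked calculation, not part of the authors' pipeline: the function name is hypothetical, and the temperature of 298.15 K is an assumption, since the text does not state the temperature used for the prediction.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Kd in mol/L from a binding free energy, via delta_G = R*T*ln(Kd).

    temp_k defaults to 25 degrees C, which is an assumption here.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


kd_wild = dissociation_constant(-10.6)    # P323 RdRp with NSP8
kd_mutant = dissociation_constant(-10.5)  # L323 RdRp with NSP8
fold_change = kd_mutant / kd_wild         # > 1 means weaker predicted binding
```

Under these assumptions the 0.1 kcal/mol difference corresponds to a Kd change of less than 1.2-fold, so the predicted weakening of the mutant interaction is small in absolute terms.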
RdRp binds with NSP8 in its interface domain (residues alanine A250 to arginine R365), forming positively charged 'sliding poles' for RNA

showing a more expanded surface area in the interacting sites (Figure 5b) and maintained integrity throughout the simulation (Figure 5c). Another study reported that the NSP8 binding sites on RdRp and the RNA exit tunnel were comparatively neutral and conserved [90]. Besides, a zinc ion in the conserved metal-binding motif (H295, C301, C306, and C310), close to residue 323, is responsible for maintaining the integrity of the RdRp architecture [90,91]. We did not find any interaction of NSP8 with the zinc-binding residues of the RdRp proteins (Table 3 and

Overall, the mutation at position 323, to some extent, stabilized the L323 structure, made the protein more rigid, weakened its binding with NSP8, and thus expanded the interacting region with NSP8. These variations may together increase the replication speed by releasing the processed RNA genome from the RdRp groove structure more swiftly (Figure 1). The increased replication speed might be due to the perturbation of the interaction between RdRp and NSP8 [88,94] or, less likely, to the complex tripartite interactions (RdRp, NSP8, and NSP14) responsible for the speculated decrease of proof-reading efficiency [6]. Thus, RdRp mutants might increase the mutation rate through a trade-off between high replication speed and low fidelity of the mutant polymerase [95]. Another possibility could be a lower proof-reading efficiency of NSP14 that is not linked to the replication speed [6].
Analysis of our study sequences revealed that the frequency of mutations (median=8) in L323 mutants (n=27,364) is significantly higher (p<0.0001) than the frequency (median=6) in wild-type (P323) strains (n=9,815). This increased mutation rate may play a vital role in genetic drift and provide subsequent generations with better adaptation to adverse environments.

This study has found that the replacement of glutamine (

Figure S6) and also reported by Kern et al. [97]. However, bat or civet did not contain this mutation in TM1 and the flanking vicinity was not identical; whereas the TM1 (41-LPFGWLIVGVALLAVFQSASKII-63) does not have amino acid replacements in the strains of SARS-CoV-2 and pangolin coronaviruses (Figure S5a-b). The presence of this mutation in pangolin could be accidental or might reflect its impact on modulating the host-specific immune response, which needs functional experimental verification. A possible explanation for its presence might be that the virus, by being less virulent, is better suited to reverse transmission, i.e., from humans to other animals, as observed in recent reports [109,110].
Our study has observed that the combined mutation (N:p.RG203-204KR) causes no conformational change in the secondary and 3D structures (Figure S1 and Figure 8, respectively) of the conserved SR-rich region of the LKR (supplementary Figure S7), but there is a minor alteration in the degree of buried or exposed sites (Figure 3). This result contradicts the prediction of ref. [111] about a change in the length and arrangement of the alpha-helix in the SR-rich region. The superimposed 3D structures instead showed structural deviation at 231-ESKMSGKGQQQQGQTVT-247 of the LKR (Figure 9), corresponding to the high destabilization of the KR203-204 protein (Table 3). On the other hand, the A220V mutation in the N protein of the GV clade resulted in a slightly more stable mutated N protein with no change in chemical properties (Table 4), which might affect RNA binding affinity [112].

Impedance to forming a particular SR-motif due to the RG→KR mutation might disrupt the phosphorylation catalyzed by host glycogen synthase kinase-3 [113]. Similar hypo-phosphorylation events could arise from the conversion of serine to nonpolar or neutral amino acids (L188/194/197, I193, and N202), as represented in supplementary Table S2 and

with lysine that may increase the nucleocapsid (N protein-RNA complex) stability by forming stronger electrostatic and ionic interactions due to increased positive charge [118,119].
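The increase in positive charge from the RG to KR replacement can be checked with a simple side-chain count. The sketch below is a deliberately crude illustration with a hypothetical function name: it assumes near-neutral pH, counts lysine and arginine as +1 and aspartate and glutamate as -1, treats histidine as neutral, and ignores termini and phosphorylation.

```python
# Formal side-chain charges at near-neutral pH (simplifying assumption:
# histidine is treated as neutral; termini and phosphate groups are ignored).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Sum of formal side-chain charges over a one-letter peptide string."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide)


wild_motif = net_charge("RG")    # residues 203-204 in the reference N protein
mutant_motif = net_charge("KR")  # residues 203-204 in GR clade strains
```

On this count the motif goes from one positive charge (R) to two (K and R), which is the change in charge the paragraph above refers to.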
Besides, the more disordered orientation of the associated LKR [115] and the highly destabilizing property of KR203-204 may assist in the packaging of a stable RNP [112,120]. These interactions and the impact of the mutations are depicted in Figure 1.

The N protein also utilizes the dynamic nature of the intrinsically disordered linker region (LKR), which controls its affinity towards the M protein, self-monomer, 5'UTR, and cellular proteins [121-123]. Phosphorylation at the LKR site may play an essential role in regulating these interactions [118]. It was speculated that KR203-204 attained a greater selective advantage [96] over the other mutations of the N protein (Table S2), probably because of stronger RNA binding and synchronized hypo-phosphorylation.

Figure 1 represents the overall possible scenario due to these silent mutations.

activation of inflammatory response (Figure 1), such as reduced viral particle release and cytokine storm [23,151,152].

The fittest viral strains will dominate in a population considering other selective parameters associated with virulence [139,144].
Public health interventions were able to create a selective pressure that makes a strain less virulent and highly competitive [153,154], which might increase the chance of viral transmission [155]; however, there have also been arguments against this established theory [156]. SARS-CoV-2, with its multiple clades and variants, needs to be more efficient to maintain the delicate evolutionary trade-off between fitness and virulence.

The GH strains were mainly restricted to the USA and partly the Eastern Mediterranean (Fig. 1), and mostly spread by cryptic [159] and pre-/asymptomatic transmission [160]. These variants might trade off virulence through a slower release of virions and, in exchange, benefit from the induction of a low immune response in asymptomatic hosts. In theory, they were more fit at a time when people were dealing with the pandemic in panic. The GH type might be able to maximize its transmission by residing within the host unnoticed and spreading with ease. We cannot rule out that there is no significant link of the mutation with virulence or the death-case ratio, since the effect of ORF3a was not found to be significant in wet-lab experiments or in our statistical analysis.

in swabs, as well as higher binding affinity to the ACE2 receptor and escape from neutralizing antibodies due to three mutations (K417N, E484K and N501Y) at key sites of the RBD [169]

The course of the COVID-19 pandemic was continuing, although the death rate was gradually decreasing in 2020.
Our study hypothesized that this paradoxical scenario was

Molecular Cell (2020).

Msystems 5 (2020).

UTR are denoted as the wild type where mutants contain those. Throughout the diagram, the red and green color icons such as proteins, genome, and virion represent the wild and mutant type, respectively. For a generalized virion, we used the blue color. Although this theme is

(Table 3) conformational change from 231 to 247 amino acids within LKR. Other regions of the N
Wild: charged-charged (5); charged-polar (9); charged-apolar (15); polar-polar (2); polar-apolar (16); apolar-apolar (21) | charged-charged (17); charged-polar (22); charged-apolar (32); polar-polar (5); polar-apolar (31); apolar-apolar (23)
Mutant: charged-charged (5); charged-polar (16); charged-apolar (19); polar-polar (3); polar-apolar (15); apolar-apolar (23) | charged-charged (13); charged-polar (18); charged-apolar (27); polar-polar (4); polar-apolar (28); apolar-apolar (36)
Associated amino acids of Elastase-2 with possible docking interactions (for spike) or NSP8 (for RdRp)
Positive and negative values denote the increase and decrease of molecular flexibility, respectively.
PR3-three elastases with similar primary but different extended specificities and stability
Structure, function, and evolution of coronavirus spike proteins
Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion
SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
D614G spike mutation increases SARS CoV-2 susceptibility to neutralization
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structure of replicating SARS-CoV-2 polymerase
Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure
& Ray, U. Specific mutations in SARS-CoV2 RNA dependent RNA polymerase and helicase alter protein structure, dynamics and thus function: Effect on viral RNA replication
Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure
RdRp mutations are associated with SARS-CoV-2 genome evolution
Genomic diversity and divergence of SARS-CoV-2/COVID-19 from GISAID
Cryo-EM structure of the SARS-CoV-2 3a ion channel in lipid nanodiscs
Protein structure and ionic selectivity in calcium channels: Selectivity filter size, not shape, matters
Pore size matters for potassium channel conductance
Selectivity filters and cysteine-rich extracellular loops in voltage-gated sodium, calcium, and NALCN channels
Ion Channels: A novel origin for calcium selectivity
Ion channels in the regulation of apoptosis
SARS-Coronavirus Open Reading Frame-3a drives multimodal necrotic cell death
Bcl-2 and Ca 2
Viroporins: structure and biological functions
The ORF3a protein of SARS-CoV-2 induces apoptosis in cells
Role of severe acute respiratory syndrome coronavirus viroporins E, 3a, and 8a in replication and pathogenesis
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
Transmission of SARS-CoV-2 in domestic cats
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2
Reporting Two SARS-CoV-2 Strains Based on A Unique Trinucleotide-Bloc Mutation and Their Potential Pathogenic Difference
The SARS coronavirus nucleocapsid protein-forms and functions
The SR-rich motif in SARS-CoV nucleocapsid protein is important for virus replication
Evolutionary dynamics of SARS CoV 2 nucleocapsid protein and its consequences
The new (dis) order in RNA regulation
Innate immune evasion by human respiratory RNA viruses
Severe acute respiratory syndrome coronavirus open reading frame (ORF) 3b, ORF 6, and nucleocapsid proteins function as interferon antagonists
The coronavirus nucleocapsid is a multifunctional protein
A study on the effect of surface lysine to arginine mutagenesis on protein stability and structure using green fluorescent protein
Serine/arginine-rich splicing factors belong to a class of intrinsically disordered proteins
Phosphoregulation of Phase Separation by the SARS-CoV-2 N Protein Suggests a Biophysical Basis for its Dual Functions
Using the nucleocapsid protein to investigate the relationship between SARS-CoV-2 and closely related bat and pangolin coronaviruses
Molecular interactions in the assembly of coronaviruses
RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look
Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms
Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzyme
Synonymous codons influencing gene expression in organisms
An interaction between the nucleocapsid protein and a component of the replicase-transcriptase complex is crucial for the infectivity of coronavirus genomic RNA
Characterization of a critical interaction between the coronavirus nucleocapsid protein and nonstructural protein 3 of the viral replicase-transcriptase complex
Nuclear magnetic resonance structure of the N-terminal domain of nonstructural protein 3 from the severe acute respiratory syndrome coronavirus
Networks of genomic co-occurrence capture characteristics of human influenza A (H3N2) evolution
Full restoration of viral fitness by multiple compensatory co-mutations in the nucleoprotein of influenza A virus cytotoxic T-lymphocyte escape mutants
Network of co-mutations in Ebola virus genome predicts the disease lethality
Rules of co-occurring mutations characterize the antigenic evolution of human influenza A/H3N2, A/H1N1 and B viruses
Influenza virus CTL epitopes, remarkably conserved and remarkably variable
Mutation and epistasis in influenza virus evolution
Basic concepts in population, quantitative, and evolutionary genetics. (WH Freeman and Company
Viral biocontrol: grand experiments in disease emergence and evolution
The phylogenomics of evolving virus virulence.
Evolution of virulence in emerging epidemics
Temporal trends in prognostic markers of HIV-1 virulence and transmissibility: an observational cohort study
Thinking outside the triangle: replication fidelity of the largest RNA viruses
IPNV with high and low virulence: host immune responses and viral mutations during infection
Increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice
Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population
Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein
Coronavirus envelope protein: current knowledge
The membrane M protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability
Membrane binding proteins of coronaviruses
Presumed asymptomatic carrier transmission of COVID-19
Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections
Adaptive dynamics of infectious diseases: in pursuit of virulence management
On the evolutionary epidemiology of SARS-CoV-2
Challenging the trade-off model for the evolution of virulence: is virulence management feasible?
A clade of SARS-CoV-2 viruses associated with lower viral loads in patient upper airways
Routine childhood immunisation during the COVID-19 pandemic in Africa: a benefit-risk analysis of health benefits versus excess risk of SARS-CoV-2 infection. The Lancet Global Health
Dissemination and co-circulation of SARS-CoV2 subclades exhibiting enhanced transmission associated with increased mortality in Western Europe and the United States
Temporal dynamics in viral shedding and transmissibility of COVID-19
N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. bioRxiv
Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients
Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans
Coronavirus disease 2019 (COVID-19) re-infection by a phylogenetically distinct severe acute respiratory syndrome coronavirus 2 strain confirmed by whole genome sequencing
Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. medRxiv
Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
Neutralising antibodies drive Spike mediated SARS-CoV-2 evasion (medRxiv). bioRxiv
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
grinch. B.1.351_South African lineage defined by new variant 501Y.V2 - A more detailed description of the lineage. global report investigating novel coronavirus haplotypes (2021).
171 grinch. P.1_Brazilian lineage with variants of biological significance E484K, N501Y and K417T. global report investigating novel coronavirus haplotypes
Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings
N439K variant in spike protein may alter the infection efficiency and antigenicity of SARS-CoV-2 based on molecular dynamics simulation
Cell (2021).

the S1-S2 cleavage site (685-686), depicted in blue color, of the wild and mutant protein.
(d) Surface and (e) cartoon (2°) structure of the superimposed wild-type and mutant proteins, where the S2' site (pink) is situated in the surface region and does not show any change in accessibility in the residual loop region. (f) The mutant (G614) protein showed higher flexibility at G614 (sticks) and in its surroundings (red). The intra-molecular interaction

Wild Cys(114), Val(115) and Pro(116) 614 (Gly): 101 (Val)

it rose and maintained a steady state. Although the spike protein had a higher degree of deviation in its RMSD profile than RdRp, neither exceeded 3.0 Å. The RMSD demonstrated that the mutant and wild-type RdRp protein complexes had an initial rise in their RMSD profiles due to flexibility. Both RdRp complexes then stabilized after 30 ns and maintained a steady peak. The wild-type RdRp complex had a slightly higher RMSD peak than the mutant RdRp, which indicates the more flexible nature of the wild type. (b) The spike protein complexes had similar SASA profiles, did not change their surface volume, and maintained a similar trend during the whole simulation time. The higher deviation of SASA indicates that the mutant and wild-type RdRp followed a straight line, but the mutant structure had a higher SASA profile