key: cord-1045462-4sum618u authors: Chaudhari, Armi; Chaudhari, Minal; Mahera, Sapna; Saiyed, Zuber; Nathani, Neelam M.; Shukla, Shantanu; Patel, Dhaval; Patel, Chirag; Joshi, Madhvi; Joshi, Chaitanya G. title: In-Silico analysis reveals lower transcription efficiency of C241T variant of SARS-CoV-2 with host replication factors MADP1 and hnRNP-1 date: 2021-07-21 journal: Inform Med Unlocked DOI: 10.1016/j.imu.2021.100670 sha: fc97f71523a8c84b4988fc7bc426827d24e85707 doc_id: 1045462 cord_uid: 4sum618u Novel severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) has claimed more than 3.3 million lives worldwide and still counting. As per the GISAID database, the genomics of SARS-CoV-2 has been extensively studied, with more than 500 genome submissions per day. Out of several hotspot mutations within the SARS-CoV-2 genome, recent research has focused mainly on the missense variants. Moreover, significantly less attention has been accorded to delineate the role of the untranslated regions (UTRs) of the SARS-CoV-2 genome in the disease progression and etiology. One of the most frequent 5’ UTR variants in the SARS-CoV-2 genome is the C241T, with a global frequency of more than 95 %. In the present study, the effect of the C241T mutation has been studied with respect to the changes in RNA structure and its interaction with the host replication factors MADP1 Zinc finger CCHC-type and RNA-binding motif 1 (hnRNP1). The results obtained from molecular docking and molecular dynamics simulation indicated weaker interaction of C241T mutant stem-loops with the host transcription factor MADP1, indicating a reduced replication efficiency. The results are also correlated with increased recovery rates and decreased death rates of global SARS-CoV-2 cases. The host-pathogen interaction that follows an initial viral infection results in a struggle to survive between both the host and the pathogen and can result in counter-mechanisms that enable one to prevail and the other to perish. A majority of ssRNA viruses contain their genome replication through 5'cap or have some secondary structures of RNA forming internal ribosomal entry sites (IRES) to recruit host replication factors [3, 4] . For viral replication, the 5' untranslated region (UTR) makes a substantial contribution by acting as the main lever that regulates various host replication initiation factors in the RNA viruses, especially in picornaviruses [5] . Another mechanism involves non-AUG (methionine) initiation wherein a virus targets eukaryotic initiation factor-2 (eIF2), inhibiting the host cell protein synthesis. Additionally, some viruses use the CUG (leucine) initiation mode to overcome the stress of specific antigenic peptides [6] . Furthermore, some viruses employ an alternative for translation initiation, like eIF2-independent shutoff mechanism whereby these viruses use the formation of downstream helix after initiation codon for efficient translation of viral mRNA [7] . These are some of the most important mechanisms by which viruses utilize the host factors for their replication and translation. Thus, these targets can be efficient candidates for preventing viral infections by shutting down the viral translation machinery [8] . SARS-CoV-2 likely performs its replication through a 5' cap-dependent mechanism [9] . Nonstructural protein 16 (nsp16) associates with Nsp 10 to form heterodimer which performs two important functions for viral proliferation in the host, One methylates 5' end of mRNA encoded by virus and second 2' O-methylation of very first nucleotide of viral mRNA [10] . SARS-CoV-2, after entry into the human host through association with ACE2 (angiotensin converting enzyme) receptors, enters the lung epithelial cells and injects its RNA into the host cytoplasm [11, 12] . Full-length negative-sense RNA copies are generated at the initial stage of the infection, which further acts as a template for synthesizing positive-sense genomic RNA [2] . During the discontinuous replication process of the viral genome, RNA stem-loop (SL) structures SL1, SL2, TLR-S (Translation leading region) play an important role by interacting with host RNA binding proteins like the zinc finger CCHC-type and the RNA-binding motif 1 MADP1. These interactions play an important role in viral replication and translation [13] [14] [15] [16] . 5' UTR of SARS-CoV-2 have various stem loops reported as SL1, SL2, SL3, SL4A, SL4B and SL5 [17] . With respect to SARS-CoV-1, SL1 contains a bipartite sequence possibly involved in the fine-tuning of viral RNA interaction with the host protein. SL2 contains a highly conserved loop sequence of 5'-CUUCU (N)-3' among most of the coronavirus species, which is likely to form a tetra loop-like structure [18, 19] . SL3 is conserved among the group of gamma and beta coronavirus. SL3 comprises TRS-L (transcription regulating sequence) 5'-UCUCAA-3', which is exposed in the core region of SL3 and is meant to be highly involved in virus replication and transcription [20] . SL4 is one of the main hairpin loops containing domain among the coronaviruses, which is located downstream to the TRS-L sequence and is prone to mutations. Even a single or a double SNP in SL4 can change the frame of the sequences and can curtail viral replication [18, 21, 22] . The genome sequence of SARS-CoV-2 is around 29-31 kb, and more than 151,000 whole genome sequences have been deposited in GISAID (Global initiative on sharing all influenza data) by researchers across the globe. Top variants in the SARS-CoV-2 genome include highfrequency variants including C241T, C1059T, C3037T, C14408T, A23403G, G25563T, and G28883C, out of which the C241T variant had a 99% frequency with 0.505 entropy by October 2020 [23] . Though a lot of research has been performed about the missense variants of the genome, to the best of our knowledge, no detailed reports on the effect of the 5' UTR variant C241T on host factor binding are available [24] . Study also shows that mutations in spike helps the virus in escape of immune evasion [25, 26] . But overall decrease in death rate J o u r n a l P r e -p r o o f and increase in recovery rate can be due to some mutations favoring host immune system [27] . Host factors involved in coronavirus proliferation include Annexin A2, which regulates IBV (immune bronchitis virus) frameshifting [28] , hnRNP1, which binds with high affinity at MHV (-) strand of the TLR-s leader sequence of RNA, PAPBs binds at poly-A tract of 3'UTR of BCov, TBV and TEGV [29] [30] [31] , MADP1 Zinc finger CCHC-type and RNAbinding motif 1 binds at stem-loops structures at 5' UTR of SARS-CoV-1 and IBV [16] . The involvement of these factors is proven by wet-lab experiments involving in vitro pull-down assay and gene expression studies [13, 32] . In recent viral nsp1 is reported to recognize self RNA through binding with 5' UTR's SL1 and meditates recruitment of 40S ternary ribosome complex for viral proteins translation [33] . In this article, molecular docking and MD (Molecular Dynamics) simulation studies were performed using the wild type (241T) and mutant (241C) sequences from 5'UTR of SARS-CoV-2 with host factor MADP1 and hnRNP1 to decipher the effect of viral virulence. Further, the wild type and mutant RNA-Protein complexes were compared for their stability by interpreting H-bonds formations, binding energy using MMGBSA, MMBPSA, dynamics cross-correlation matrix, and Principal Component Analysis. Overall, this work helps establish the interactions of host factors with viral 5'UTR stem-loops and understanding the effect of C241T mutation on the dynamics of SL1, SL2, SL3 SL4A, SL4B and SL5 with respect to host-factors. RNA secondary and tertiary structure predictions are very important to perform during in silico experiments to reach accurate conclusions. The FASTA sequence of the 5' UTR of the wildtype SARS-CoV-2 was retrieved from the Rfam (RNA sequence family) database with a reference id RF03117 and was used to predict it's secondary structure. The mutant RNA sequence was manually generated using the wild-type sequence by replacing cytosine (C) at position 241 with Uracil (U), which is mentioned as C241T in this article. Strains of various SARS-CoV-2 have already reported the presence of various stem-loops structure and bulges which is documented in the RFAM database; we downloaded FASTA sequence (around 300 base pairs) from the RFAM database (id RF03117) for studying 5' UTR sequences of beta coronaviruses [34] . Generation of RNA 3D structure was done using stand-alone and web-based programs such as 3DNA, X3-DNA-DSSR, and RNA composer [35] [36] [37] . Mutation in RNA sequence was generated by C241T in 3DNA based mut_RNA platform [35] . For RNA secondary and tertiary structures were generated in RNA composer that uses the RNA FRABASE dictionary with the CHARM force field for generating effective minimized energy RNA structures [37] . An advantage of RNA composers is the ease of predicting the ternary structures of long RNA sequences, which is a limitation of many RNA structure prediction software. The effect of single nucleotide variation in base-pairing probabilities was checked in mutaRNA [38] . Final confirmation of folded RNA structure was confirmed in the X3DNA-DSSR Linux-based operating system which gives accurate root mean square deviation (RMSD) between predicted and experimentally verified sequences. J o u r n a l P r e -p r o o f As MADP1 binds to SL1 of SARS-CoV-2, this region was selected as a potential protein binding site for the 5'UTR region. Additionally, since it is also known that TRS-L binds to hnRNP1 sequence 5' CGGCUGC 3', this sequence was chosen as a protein binding sequence [3, 13] . Prediction of RNA residues binding to protein was also done by RNApin web server [39] , where above mentioned sites were also falling in the protein binding region. Molecular docking of host factors and RNA was performed in HADDOCK 2.2 [40] [41] . HADDOCK performs docking using a data-driven approach. Docking encompasses three steps: rigid-body docking, semi-flexible refinement, and water refinement. In stage one of the docking of protein and RNA in their bound conformation, a total of 1000 structures were generated using default settings of the program. Systematic sampling of 180 rotated solutions were used for the rigid-body docking to minimize the occurrence of false positives. The best 20% structures generated from the rigid-body docking were subsequently used in the second stage of semi-flexible refinement. The semi-flexible refinement is carried out in three-stages, followed by a rigid body torsion angle dynamic, and a simulated annealing stage at different MD steps. In the final simulated annealing stage where 1000 MD steps were performed from 300 K to 50 K with 2 fs time steps. In the final stage, both the side chain and backbone of protein residues at interface and RNA can move except the terminal base of RNA. This final stage consists of a gentle refinement (100 MD heating steps at 100, 200, and 300 K followed by 750 sampling steps at 300 K and 500 MD cooling steps at 300, 200, and 100 K all with 2 fs time steps) in an 8 Å shell of TIP3P water molecules [41] . Docking of protein-RNA complexes were also performed in various servers like PATCHDOCK [42] , NPDOCK [43] , and HDOCK [44] to compare the docking score between two structures. MADP1 is supposed to bind at SL1, SL2 region, moreover binding with residues covering SL3 region. Dynamics of SL4 and SL5 seems to be very crucial for influencing the binding of host factors with SL1, SL2 and SL3. C241T mutation is affecting SL4. We carried out unbiased simulations in recently updated forcefield OPLS4 and Amber to conform the overall dynamics of each stem loop in terms of binding of MADP1 and affinity among both variants of RNA. Molecular dynamics simulations of Protein-RNA complexes were performed in Desmond implemented in Schrodinger [45] [46] . Selected protein-RNA complexes were first added to a TIP3P water box, extending 10 Å beyond any of the complex's atoms. Counter ions (Na + , and Clions) were added to neutralize charges. Salt concentration was set to 0.15 M sodium, and chloride ions to approximate physiological condition. The MD was performed in the NPT ensemble at a temperature of 310.5 K and 1.00 bar pressure over 100 ns with recording intervals of 1.2 ps for energy and 24 ps for trajectory. Simulation box contains ~331517 atoms in Desmond trajectories, with box size in volume ( 3 ) 3413425. Simulations were run with the latest OLPS4 and OPLS3e force field compatible for Protein and nucleic acids. Plots and figures were generated in PyMOL DeLano, W. L. 2009. Additional MD simulations were performed in Gromacs (version 2018.1) for MADP1-WT complex [47] . Protein and RNA topologies were built with AMBER force filed [48] . MADP1-RNA complexes were simulated using three site TIP3P water model in rectangular box consisting of atoms ~1979400. Box configuration was set with 1nm distance from edge of RNA in all the directions. Subsequently, equal number of counter ions Na+/Cl-were added to neutralize the overall charge of the system. Bond lengths were constrained using LINCS algorithm [49] . Steepest descent algorithm was used to perform energy minimization J o u r n a l P r e -p r o o f to remove any steric clashes and bad contacts (50000 steps max). Particle Mesh Ewald (PME) summation was applied to account for the long range interactions, with a cut off of 1 nm [50] . Minimization was followed by equilibration with position restraint using NVT ( ) before starting the production run. Berendsen thermostat algorithm was used for maintaining a constant temperature of 300K in NVT based equilibrium [51] . Post NVT equilibrium followed by, NPT equilibration was performed using Parrinello Rahman barostat at a constant pressure (1 bar) for 100 ps [52] . VMD and UCSF-Chimera were used for visualization of MD trajectory [53] [54] . Gromacs analysis scripts g_rmsd, and g_rmsf were used for plotting root mean square deviation (RMSD) and root mean square fluctuations (RMSF) within the systems through their entire trajectories. VMD was used to analyze hydrogen bonds with 3Å cutoff and 30º cut off for donor-acceptor values. Xmgrace was used for plotting graphs (https://plasma-gate.weizmann.ac.il/Grace/). Wild-type RNA and mutated variant with C241T, were docked with two different host factors MADP1 and hnRNP1, total four complexes were generated. In our studies MADP1 was our major concern as it is known to bind SARS-CoV-2 5'UTR region. MADP1-RNA complexes were simulated in OPLS4 of Schrodinger and AMBER force filed in Gromacs twice. While hnRNP1-RNA complexes were simulated in Schrodinger only. Over all 10 complexes were simulated for 100ns. Binding energies of the four protein RNA complexes were calculated using PRIME module in Schrodinger using thermal_mmgbsa.py and residue-wise decomposition using breakdown_mmgbsa_by_resdiue.py script. The binding energy was calculated for equally spaced 1000 conformations from the MD trajectories. ∆G Bind = ∆G SA + ∆G Solv + ∆E MM ∆G SA is the difference in the surface area energies for the protein -RNA complex and the sum of the surface area energies in the protein and RNA. ∆E MM is the difference in the minimized energies between the protein-RNA complexes. ∆G Solv is the difference in the GBSA solvation energy of the protein-RNA complex and the sum of the solvation energies for the protein and RNA. ∆G SA is the difference in the surface area energies for the complex and sum of the surface area energies in the protein and RNA. Binding energy was also calculated for trajectories from Gromacs using MM-PBSA approach Residual decomposition and energy distribution within residues were calculated using MmPbSaDecomp.py and energy2bfac [55] . A vivid graphical presentation of dominant correlated motions of atoms present in the protein-RNA complex is obtained by the covariance matrix of the Cartesian coordinate data set. To generate covariance matrix of elements C ij for coordinates i and j is given by bellow mentioned formula. Here, r i and r j were Cartesian coordinates of i th and j th Cα atoms, 〈r i 〉 and 〈r j 〉 stood for the time average over all the configurations derived from the molecular dynamic simulation. This analysis was performed using the Bio3D library as implemented in R [56] [57] [58] [59] . A cross-correlation matrix was used to study the effect of C241T mutation on Protein-RNA complex dynamics by analyzing how atomic displacements were coupled. A cross-correlation coefficient C ij was calculated from C α atoms by the following equation. In this equation, ∆r i and ∆r j are the displacements from the mean position of the i th and j th residues (or atoms), respectively. The angular brackets "〈 〉" represents the time average of the entire trajectory. The value of C ij is from -1 to 1. A positive value is assigned for positively correlated motion atom (moving in the same direction) indicated in cyan and the negative value represents negatively correlated motion (moving in the opposite direction) indicated in magenta. The DCCM analysis is carried out using the Bio3D packages of R [56] [57] [58] [59] . Viral replication is highly dependent on the host factors, especially the transcription factors in the case of RNA viruses. Novel SARS-CoV-2, a positive-strand RNA virus from the family Coronaviridae deploys a 5' CAP-dependent discontinuous replication within the human host (24, 46) . In the present study, molecular dynamics and interaction of two human transcription factors MADP1 and hnRNP1 were studied with a most common 5' UTR mutation C241T in the viral genome. Previous reports on SARS-CoV-1 infection, MADP1 from the Zinc finger protein motif 1 plays a very crucial role in deciding the replication efficiency within host cells [16] . While another transcription factor, hnRNPA-1 is extensively studied with respect to the SARS-CoV-2, having multiple roles in viral proliferation like RNA replication, transcription, export, import, and translation. hnRNP-1 uses RGG (Arginine-glycine-glycine) motif for RNA binding; it also binds with viral nucleocapsid protein (N) [61] . While hnRNP1 is binding to the TLRS sequence (Pyrimidine rich) of most the coronaviruses like TGEV and MHV [3] , its involvement in SARS-CoV-2 is yet to be studied. MADP1 and hnRNP1 shares highly structural similarities with superimposition RMSD 1.54Å, (Supplementary Figure S1 ). Due to similarities in tertiary structure and role of hnRNP1 in binding at TLR-s of other coronaviruses leads us to incorporate this protein in our study with respect to binding at TLR-s present in SL3. Since the experimental data available for the MADP1 interaction within SL1 and SL2 region of the 5'UTR of SARS-CoV-1, docking is performed based of selecting residues within these regions. In this study, major focus was kept on MADP1 interaction within 5'UTR due to published data with respect to SARS-CoV-1, while no experimental data was available for hnRNP-1 interaction in SARS-CoV-1 and SARS CoV-2. HnRNP1 is known to bind at TLR-s other coronaviruses and they share similar sequence at TLR-s it was hypothesized that hnRNP1 might also plays important role in terms of viral transcription. Simulation with respect to MADP1 was performed into two different force fields, OPLS4 released in Schrodinger 2021.1 and AMBER99SB in Gromacs18.1 to explore the effect of mutation in terms of affinity and dynamics of stem loops SL1, SL2, SL3 were same in terms of binding with host factor or not. Simulations were performed in J o u r n a l P r e -p r o o f duplicates, because of two reasons. One, to check the reproducibility of simulations data in terms of stability of the system. Second, to understand if the dissociation of the host factor MADP1 from Mutant RNA is actually an observed phenomenon or was merely an artefact from the Desmond run. In the case of hnRNP1, simulations were performed only in Schrodinger. Nucleotide sequences of the wild type and mutant (C241T) viral 5' UTR sequences were taken from GenBank accession id MN908947.3 and secondary structures were generated (Supplementary Table S1 ). RNA secondary structure was predicted using X3-DNA-DSSR in dot-bracket format [36] and the data were visualized in Varna-GUI and structural differences due to C241T were interpreted. Further, RNA composer was used for generating tertiary structure of 5'UTR to correlate the results obtained though X3-DSSR [37] . Representation of the variation in RNA structure due to SNP is elucidated in Table 1 . Base pair number was decreased by the change of two in wild-type RNA compare to mutant one, which in turn also affects the number of helices, bulges and internal loops present in overall 5'UTR sequence. The topological difference in both RNA structures is shown in Supplementary Figure S2 . While structural features of RNA stem-loops are described in Figure 1 . In mutant RNA, one major change was observed in SL-4 (101G-111U to 101G-112U), which further leads to a change in the loop structure of SL4 ( Figure 1B ). Change in SL4 also changes the tertiary structure of RNA; further inducing difference in geometry of SL1, SL2, and SL3 ( Figure 1D & 1E). Both RNA sequences showed structural and folding differences within SL1, SL2, and TLR sequences ( Figure D1 -D3 & E1-E3). Wild-type RNA structure shared nearby geometry of SL1, SL2, and SL3, Mutated RNA shared faraway geometry for the same. SL1, SL2, SL3 and SL4 seems to form a cavity for binding of transcription factor. Based on output secondary structure generated in dot-bracket form using X3-RNA-DSSR are matching exactly with the same reported experimental data representing stem loops and helices [20, 22, 32] . To correlate these variations in RNA structure with the favourable or nonfavourable binding of host transcription factors, protein-RNA docking and the resulting binding energy was calculated to identify the effects of binding between both sequences wildtype and mutant (C241T) RNA. MADP1 is known to bind at SL1 and SL2 of RNA with help of Nsp1 protein for viral replication and transcription [16] . Another protein hnRNP1 is known to bind at the TLR-S sequence of IBV, TGEV, and MHV [3] . Invitro studies of MADP1 binding to 5' UTR of SARS-CoV-1 were available but not for hnRNP1 [16] . 5'UTR of SARS-CoV-2 is folded such that SL1, SL2, SL3, and SL4 shared adjacent folding, which may enhance the surface area for binding of these proteins. Our major focus was on MADP1 mediated binding at the SL1 and SL2 regions of both variants of the 5' UTR region due to confirmatory Invitro studies were available in SARS-CoV-2. In a majority of Coronaviridae family viruses, hnRNP1 is known to bind at TLR-S sequence of 5' UTR [13, 30] . It was hypothesized that hnRNP1 might also bind to TLR-S of SARS-CoV-2 for viral proliferation. After docking studies, a change in binding was also observed for host factors with both RNA sequences. Additionally, the stability of host factors-RNA complexes was also analyzed to contrast the affinity among both 5' UTR variants. Docking results for wild-type and mutant RNA with host factor protein from HADDOCK depicted the changes in the folding of stem-loops SL1, SL2 and SL3 in the tertiary structures of RNA, which in turn mediates difference in binding within all four complexes. For MADP1, a total of 117 structures in 8 cluster(s) were clustered in HADDOCK for wild type complex representing 58.5 % of the water-refined models, while 119 structures in 8 cluster(s), were generated for the mutant sequence (C241T) which represented 59.5 % of the water-refined models. For HNRNP1, HADDOCK clustered 107 structures in 13 cluster(s), which represented 53.5 % of the water-refined models generated for wildtype complex, while 92 structures in 14 cluster(s), which represents 46.0 % of the waterrefined models were generated for the mutant complex. Based on the docking score, wild-type RNA showed to have a better binding profile with a high binding constant value (more negative) of -143.0 +/-3.5 compared to mutant complex -138.4 +/-1.6, as well as the RMSD value for MADP1 was 0.9 +/-0.6 compared to the mutated (C241T) sequence was 1.1 +/-0.7 (see Table 2 for details). The binding pose of MADP1 with RNA structure is shown in Figure 2 . As shown in figure 2E and 2F, MADP1 is covering higher portion of SL1, SL2 and SL3 for binding compare to mutated complex. For hnRNP1, no significant change was observed for docking score and RMSD values. However, the average standard deviation (negative Z score) among clusters of wild-type RNA -2.5 with host factors indicates better docking among complexes compared to the mutant with -1.5. (Table 2 ). Docking results clearly narrate that electrostatic energy and Van der Waals energy favored the protein-RNA complex formation while desolvation energy hindered the same. Docked pose with respect to both host factors is shown in figure 2 . Docking was also performed in various platforms like NP Dock, H-Dock, and PATCH Dock (Supplementary table: 2). Based on the docking score obtained from HADDOCK and other docking platforms, Wild-type RNA displayed better binding compare to the Mutant RNA. To further validate these findings, the molecular dynamics of WT and MUT protein-RNA complexes were studied to find binding energy and stability. With respect to MADP1, RMSD (Root mean square deviation) of wild-type complex was 3.89±1.18 nm while for mutant it was 5.18±1.57 nm in opls4 force field. In the case of the Wild-type RNA-protein complex, for the first 20 ns trajectory was fluctuating but after 30ns system reached a plateau showing that the MADP1-RNA complex had formed a stable structure. While the trajectory of the mutant complex was observed to be fluctuating and stabilizing around 60 ns with an average RMSD fluctuation of ( Figure 3A ). As shown in figure 3 , orientation of SL1-SL3 is shown in grey for the wild-type complex and in pink for mutant complex. From OPLS4 trajectory it is clearly visible that wild-type complex is having MADP1 interaction with SL1-SL3 throughout the simulation and mutant showing clearly less interactions or showing pivotal binding within SL1 only. In figure 2E & 2F, initial structure at 0ns and protein was shown in licorice to show the difference area covered by MADP1 in initial structure. One can visualize the orientation of SL1, with respect to MADP1 binding is more favorable in wild-type, compare to mutant complex. From 10ns to 100ns, in wild-type complex SL1 and SL3 started moving as such to create a cavity around MADP1, while in mutant complex SL1 and SL3 were moving in opposite direction. Black arrows indicate the direction of dynamics of stem loops during the simulation (Figure: 3A) . Dynamics of SL4, where the mutation C241T was observed is important to create this enhance cavity around wild-type complex which is further elaborated from RMSF studies. At the end of simulation both duplicates of mutant complex had shown dissociation (Supplementary video 2 & 3) . RMSD values obtained through AMBER99SB were shown in figure 3B , wild-type and mutant complexes were showing 3.86±0.78 and 5.73±0.89 nm respectively. Initial structure at 0ns was same for both forcefield which is shown in figure 2E & 2F. In Figure 3B , SL3 and SL1, were also following same dynamics which was observed in Opls4. As simulation goes on, SL1 and SL3 create a cavity around host factor (madp1) which further enhance affinity for MADP1 with respect to wild-type RNA. At the same time period 75ns, mutant complex has formed a binding cavity which is not as compact as wild-type complex. In Figure 3B , MADP1 is covering more surface area in wild-type RNA as compare to mutant RNA. These findings can be related with over all dynamics of another stem loops which were SL4 and SL5. Dynamics of SL4 and SL5 is influencing the binding of MADP1 with SL1-3, which is shown by RMSF, PCA and porcupine plots for motion mode analysis (Figure 4 ). Figure 3A , is showing RMSD plotted for wild-type and mutant complex in terms of OPLS4 forcefield, which is specific for proteins. RNA structures were fluctuating at higher rate in Schrodinger trajectory, which might lead to increasing RMSD, but system get stable (equilibrate) after ~80ns. Different behavior protein and nucleic acid in two unlike forcefield may leads to this behavior of trajectories. Major aim was to justify the effect of C241T mutation in two different forcefields. From both forcefields C241T mutation affects the change in 5'UTR which leads decreased stability of 5'UTR with host transcription factors. RMSF (root mean square fluctuation) values of MADP1 and RNA from both complexes were calculated. Figure 4A , explains the MADP1 residual fluctuation in both complexes, where amino acid residues from 1 to 60 show less fluctuation in wild type complex compared to the mutant (C241T) complex. Less fluctuation in RMSF value indicates more interaction between RNA-protein complexes (49, 50, 51) which is also evident here in the case of a wildtype complex. Residues which were less fluctuating are actually interacting with RNA, which is further discussed in hydrogen bond analysis section. In figure 4A , residual dynamics crosscorrelation network for MADP1 were shown, light purple color shows correlated and red color shows anti-correlated motions with respect to RNA variants. In mutant, MADP1 is showing less correlated and more anti correlated motions were observed compare to wildtype. Also, porcupine plots are matching with the RMSF graph. In porcupine plots blue arrows shows low while red shows high and white arrows show moderate atomic displacements. In figure 4A, 1a and 1b (also 2a and 2b) showed red arrows and showing high RMSF values. While 1b and 1d in mutant and 2b and 2d in wild-type are showing (blue) low atomic displacements covering low RMSF values. In AMBER99SB, figure 4C , MADP1 in wild-type was showing high atomic displacement in region covering residues from 1-10(1a) and 50-55(1b) shows increase length of arrows in porcupine plots. Mutant complex was showing high atomic displacements in 2a region and 2b region which is narrated in RMSF plot also. Same results were observed in case of RNA from both variants. In figure 5B and 5D, RMSF of RNA variants in Opls4 and AMBER99SSB is shown respectively. Now one can clearly visualize how the dynamics of SL4 and SL5 is affecting MADP1 binding with SL1-SL3. In figure 5D , 1b (wild-type) and 2b (mutant) of porcupine plots SL4 is showing uniform diffraction in wild-type, while in mutant SL4 is showing highly disturbed motions, which can directly influence the binding of MADP1 with SL1-SL3 as all stem loops are interconnected and part of one RNA structure. Same observations can be made from 1c and 2c regions of figure 5D . Another host factor, hnRNP1 and RNA complexes were also analyzed with respect of stability. RMSD of the wild-type complex is 1.45nm which is much lower compared to mutant complex 2.35nm (Supplementary Figure 3A) . Overall residual fluctuation is shown in Figure 3B & 3D. Protein residues that are binding to RNA's TLR-S sequence have overall less fluctuation compared to non-bound residues, among them wild-type hnRNP1 have lower fluctuation compared to mutant (Supplementary Figure 3B-box A) . TLR-S (U59-C66) of the wild-type complex has significantly lower fluctuation compare to the mutant complex (Supplementary Figure 3D) . For both host replication factors, wild-type complex seems to be more favorable for the binding compare to the mutant complex based on RMSD and RMSF of both complexes in different force field. To study the interaction of amino acids in Protein-RNA complexes, hydrogen bond analysis of those amino acid residues interacting with RNA was analyzed. Protein and RNA interaction with respect to hydrogen bond occupancy is shown in Supplementary Table S3. C241T mutation leads to structural changes in the folding of RNA (Supplementary Figure S2 ) which in turn affects the binding of MADP1 with SL1 sequence of 5'UTR and hnRNP1 J o u r n a l P r e -p r o o f with TLR sequence within SL3. H-bonds occupancy within 3Å distance cutoff and 20° distance was calculated using VMD [53] . Change in the tertiary structure of RNA leads to a drastic change in the binding of amino-acid residues for both of the transcription factors. Widening of a gap within the Wild-type complex leads to the binding of diverse MADP1 amino-acid residues with RNA, enhancing the stability compared to only one amino acid binding to two nucleotides. As shown in figure 5A , hydrogen bond formation in wild-type complex (blue) is higher as compare to mutant complex (red). Positive amino acids residues Lys-45, Arg 48, Lys-9, Arg-28, Lys-51, Arg48, Gly-3, and Ser-2 are showing higher interaction with negatively charged backbone of phosphate and other nucleotide residues, favoring more spontaneous binding due to opposite charges present on binding components. Occupancy of hydrogen bonds also increases in such type of interaction which is correlated with percentage occupancy results of Hydrogen bonds with MADP1 using VMD and decreased in bond length (Supplementary table: S3) . Some hydrogen bonds existed for the only period of time, named dynamic hydrogen bonds, while those that existed throughout the simulation are named as static hydrogen bonds. In SL1 there is a major static hydrogen bonds formation within amino acids and nucleotides. Although, overall amino acid residues showed interactions were the same but change in position of interaction was observed, which is favoring higher percentage occupancy of hydrogen bonds within wild-type complexes. Other interactions were also observed in wildtype complex, forming higher occupancy hydrogen bonds with amino acid residue in Lys-35, Lys-43, Arg-48, and Lys-9 with U13, A34, C36, and A68 respectively forming static hydrogen bonds (Figure: 5C ). Wild-type complex seems to cover wide range of nucleotides with different amino acids, which might make it more prone to efficient protein RNA complex formation. Vast varieties of dynamic hydrogen bonds formed during the simulation are formed by amino acids. Interesting results were observed when MADP1 is showing binding within TLR sequence in both complexes. In the 3D folded structure of 5'UTR, TLR-S residues share a nearby position with respect to SL1 and SL2. This unique interaction leads to one of the parameters to yet study, weather MADP1 is interacting in SARS-CoV-2 TLR-S sequence for viral proliferation or another factor is required for the same. At 100ns major change involved is in the mutant complex where protein is getting dissociated from RNA, compared to wild type complex (Supplementary Video S1 & S2). This can lead to the conclusion that the mutant complex is having relatively less stable interaction with host factor MADP1 compared to the wild type complex in terms of number hydrogen bonds 25 and 34 respectively. Other hypothesized protein-RNA complex (hnRNP1) is analyzed for commenting on the structural stability of RNA with respect to protein binding and interaction. It was surprising for us that Hydrogen bond occupancy in wild-type complex is much higher compared to the Mutant complex (Supplementary Table S3 J o u r n a l P r e -p r o o f Hydrogen bond interactions were correlated with dynamics cross-correlation matrix, where the interaction between protein and RNA with respect to four complexes is calculated throughout the trajectory of 100ns. As shown in Figure 6 , Dynamics cross-correlation matrix was calculated between inter and intra protein motions in both wild-type and mutant (C241T) complexes. As previously discussed in paper, SL1 and SL2 sequences are main interaction point in host-protein MADP1 for SARS-CoV-2 RNA during viral replication. DCCM of 1000 snapshots was performed. In DCCM matrix cyan colour indicates positive cross-correlation, magenta colour indicates negative cross-correlation, and white colour indicates no cross-correlation. DCCM was created for protein-protein, Protein-RNA and RNA-RNA interaction of Cα atoms in both complexes. DCCM for MADP1-RNA complexes were shown in Figure 6 . DCCM of hnRNP1 is shown in Figure 6C and 6D, where RNA nucleotide residues on C59, U60, G61, and U62 show interaction with protein amino acid residues Gln180, Lys283, Lys206, Gln253, Ile246 and Asn-48, Gln244 by the formation of hydrogen bonds within 3Å respectively. The intensity of TLR-S sequence is clearly visible that TLR sequence shows a strong positive correlation with wild-type RNA compared to the mutant which is shown in box A, B, C of both complexes ( Figure 6 ). From DCCM it is clearly visible that residue interaction with host initiation factors shows strong positive correlation with nucleotide residue of wild-type complex compares to mutant complexes. PCA was used to study the distinct protein conformational states in a principal component (PC) phase space during the MD simulations. The conformational transitions of the complexes were studied by projecting their trajectories onto a two-dimensional subspace2spanned by the first three eigenvectors (PC1, PC2, and PC3) with respect to Cα residues. In Supplementary Figure S5 Figure S6) . Same thing is observed in case of MADP1-RNA complexes, PC1 in wild-type complex is 64.35% and 73.42% of PC1 in mutant complex. More scattered system is observed PC2/PC3 suggests those principle modes are unstable in both complexes. PC3/PC1 of wild-type shows periodic jumps with sustainable energy barrier, in mutant scatterings suggests unstable system (Supplementary Figure S5) . Overall wild-type complex seems to be more stable compared to mutant (C241T) complex with respect to both proteins. MMGBSA (generalized born surface area) and MMPBSA (Poisson-Boltzmann surface area) are calculated for both protein-RNA complexes. First output structure containing minimized energy was taken in to study for both complexes. Residual decomposition was analysed to get the contribution of each residue within the complexes for favourable or unfavourable binding. Where ∆G bind is higher for wild-type complex compared to mutant complexes for both the proteins MADP1 and hnRNP1 (Table 3) . To get insights regarding to energy components contributing more favourable binding within protein-RNA complexes like, electrostatic, Hbonds, Van der Waals and ligand strain energy were analyzed. For each protein-RNA complex, electrostatic energy is major energy component contributing to favourable binding within both complexes, while Solv GB (Generalized Born electrostatic solvation energy) significantly opposes the protein-RNA interaction and complex formation. Both are rather substantial in magnitude but compensate each other, overall resulting in favourable total electrostatic contributions. Our major aim was to study the effect the mutation C241T with respect to host replication factor MADP1, docking and molecular simulation studies shows the effect of mutation in favour of host. Binding energy calculations are very crucial for commenting on the complex formation concerning both wild-type and mutant RNA. In MMGBSA, for MADP1 binding energy for WT and MUT complexes were -73.58kcal/mole and -59.66kcal/mole respectively. From binding energy, wild-type complex seems to have more efficient complex formation compare to mutant. In figure 8A , 8A1 and 8B1, it is clearly visible that majority part of amino acid residues is falling in lime green-blue zone which represents near to zero value for energy contribution in mutant complex. While in the wild-type complex, residues favouring negative spontaneous energy are significantly high. In wild-type complex, amino acid residues interacting more spontaneous are Lys-32, Arg-69, Gly-7, Asn-66, Asn-73, Gln-65, Asn-62, Lys-45, Arg-55 and Lys-51, showing interfacial hydrogen bonds and intermolecular stacking interactions with nucleotide residues covering SL1 and SL2. Residues decomposition graph for MADP1 residues and WT-RNA shows higher negative energy compare to other basic residues within the complex. In wild type complex MADP-1 residues Lys-68 (25.32±2.3kcal/mole), Lys-38 (-16.72±3.1kcal/mole), Gln-72 (-27.30±3.1kcal/mole), and Asn-79 (-27.27±1.3kcal/mole) shows 4-5-fold higher energy decomposition compared to other amino acids residues also showing direct contact with RNA backbone (Figure 7 & 8) . In mutant complex amino acid residues interacting with more spontaneous energy are Ser-9, Gly-1, Lys-42, Tyr-34, Arg-55, and Lys53 (showing direct contact with nucleotide residues) ( Figure 8B and 7A ). In MUT, minimized structure residues number 1-11 are sharing direct contact with RNA so it is showing less flexibility which is shown in RMSF graph also while in WT there is total opposite case. From Table 3 , it is clear that the majority of amino acids residues where electrostatic potential, Vander Waals and Hbonds interaction are high also shows high MMGBSA energy. MADP1-WT complex have 20 hydrogen bonds are forming while in mutant complex only 10 hydrogen bonds are forming, which clearly narrate more compact binding in WT complex compare to MUT ( Figure 8B and 8G). MADP1 shows a higher binding affinity to SL1 region in WT RNA compare to mutant RNA. In MMPBSA, there is -385.722±22.25KJ/mole and -85.9493±30.25KJ/mole binding energy for WT and MUT complex respectively. Over all residual decomposition for WT complex is much higher compare to MUT for both protein and RNA residues (figure 7C & 7D). Scale bar showing residual contribution with respect to MADP1 and RNA residues is also showing higher energy release in WT residues compare to mutant ( Figure 8 ). Through MMGBSA and MMPBSA wild-type complex is showing much higher energy compare mutant complex, hence WT seems to have more stable protein-RNA interactions. For hnRNP1, binding energy for wild-type and mutant complex is -69.85 Kcal/mole and -45.85 Kcal/mole. Again wild-type complex seems to bind target with higher affinity compare to mutant in case of hnRNP1 also. Other major contributing energy components for complexes are Vander Waals energy and hydrogen-bond energy (Table 3) . These three energies are favoring complex formation more efficiently so energy potencies of each residue in complex were analysed. As electrostatic energy is favoring the complex formation, residues taking part in interaction are shown in term of electrostatic energy (Supplementary Figure S7D and S8D) . With respect to protein residues, heat maps were shown to correlate amino acid residues with respect to their energy for complex formation. Protein interface residues binding to RNA are showing positive electrostatic potential complimentarily to negative electrostatic potential of the phosphate backbone of RNA (Supplementary Figure S7E and S8E). Extra positive electrostatic potential of residues, Lys 218, L, Lys-259, Arg-277, Arg-254 and Arg185 favors enhanced binding to TLR-S sequence of RNA, henceforth favouring efficient binding of wild-type complex (shown in blue) (Supplementary Figure S7 & S8). To identify key residue important for binding, residual decomposition was analyzed for both complexes. For wild type complex RNA residues, U47, C59, U60, U62, U63, C64, A79, A81, C123 and G124 are interacting hnRNP1, covering TLR-S sequence -UCUU-, SL2 and SL4. In mutant complex RNA residues, U11, U10, C32, A31, C66, A69, A68, A73 and A74 seem to be interact with hnRNP1. It was quite surprising that the minimized structure of mutant was interacting within SL1 region, more efficiently compare to TLR-S region. HnRNP1 is supposed to bind at TLRs region for viral proliferation, mutated complex (energy minimized structure) is binding less efficiently to TLR-S region. Superimposition of both minimized structures gives the clear idea for the above statements, wt-hnRNP1 is near to SL3 having TLR-S and SL4, while mut-hnRNP1 is near to SL2 and SL3 (Supplementary Figure S10) . From free energy decomposition, it was unsurprised that residues which were having more hydrogen bonds energy and electrostatic energy will possess higher decomposition energy which correlating in supplementary figure S9. In wild-type complex RNA residues within the TLR-S sequence possess more spontaneous, decomposition energy U60 (21.36kcal/mole), U62 (16.89kcal/mole) and C64 (-14.56kcal/mole) compare to the average of -1.5-2 kcal/mole with same resides of the mutant complex. While in mutant residues, C66 (-11.53kcal/mole), A69 (-5.24kcaal/mole), A73 (-10.2kcal/mole), and A74 (-15.2kcal/mole) (Supplementary Figure S9) . Wild-type sequence seems to have more favourable binding in TLR-s sequence compare to mutant, so 241C seems to be favourable for viral proliferation compare to 241T. Structural analysis of both RNA, reveal changes in TLR-S region while folding in 3D form, which is affecting the binding of host replication factor MADP1. According to results obtained through MD analysis Mutant, the complex seems to be less stable compare to wild type based on RMSD and Hydrogen bonding between protein RNA complexes. In-Silico findings suggest weaker interaction of mutant RNA with host factor MADP1 compared to wild-type. This suggests lesser replication efficiency in mutant SARS-CoV-2 strain. The 5' UTR variant C241T has emerged in March, 2020 and its one of the most observed variants in genomes sequenced in 2020, with a frequency of 0.505 [20] . To further correlate this finding, epidemiological data of global cases, death rate, and the recovery rate was obtained from World Health Organization covid-19 dashboard. The recovery rate in global cases has increased from 2.2% in March 2020 to 70% to date whereas the overall decline in death rate is observed (Supplementary Figure S11) . Although there are many other reasons for increased recovery rate, but the effect of most frequent mutation in the genome can also be not neglected. According to WHO, before the ending of March death rate is 7.1% while recently in October the death rate is 1.46 %. The fact that if viral RNA is binding less efficiently with host replication factors due to structural changes, the efficiency of virus replication within the host decreases. However, these findings can be further validated experimentally using cell lines and other In-vitro methods. 5' UTR interactions with host factors were studied and it was concluded that C241T changes the SL4, which overall changes the folding of RNA. MADP1 and hnRNP-1 reduce binding shows a decrease in efficiency of the virus to replicate in a host, which in terms decrease the death rate (increasing the recovery rate). Results were correlated with the epidemiological data which is also showing an increased recovery rate all over the world. hnRNP-1 interaction which SARS-CoV-2 is not yet been studied. From our studies, it can be hypothesized that hnRNP1 is binding to the -UCUU sequence present in SARS-CoV-2. Crystal structure of a Raver1 PRI3 peptide in complex with poly-pyrimidine tract binding protein RRM2 (PDB: ID 3ZZY) Solution structure of RNA binding domain in Zinc finger CCHC-type and RNA binding motif 1, MADP1 (PDB: ID 2E5H) Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome GenBank Severe Acute Respiratory Syndrome-CoronaVirus-2 2. UTR: Untranslared Regions SARS-CoV-1: Severe Acute Respiratory Syndrome-CoronaVirus-1 GISAID: Global Initiative on Sharing All Influenza Data 6. ssRNA: Single stranded Ribonucleic Acid 7. eIF2: Eukaryotic initiation factor IRES: Internal Ribosomal Entry Sie 9. ACE2: Angiotensin Converting Enzyme Situation Report -65 -Coronavirus disease 2019, World Health Organization Coronavirus biology and replication: implications for SARS-CoV-2 RNA-RNA and RNA-protein interactions in coronavirus replication and transcription Broad-spectrum antiviral activity of the eIF4A inhibitor silvestrol against corona-and picornaviruses Insights into structural and mechanistic features of viral IRES elements Unanticipated antigens: Translation initiation at CUG with leucine Inhibition of host translation by virus infection in vivo The highly conserved 5′ untranslated region as an effective target towards the inhibition of Enterovirus 71 replication by unmodified and appropriate 2′-modified siRNAs s. makin. k.nakagawa, k.g. lokugamage, free information in English and Mandarin on the novel coronavirus COVID-Viral and Cellular mRNA Translation in Coronavirus-Infected Cells, Advances in Viral Research Structural basis of RNA cap modification by SARS-CoV-2 A pneumonia outbreak associated with a new coronavirus of probable bat origin Immune Response, Inflammation, and the Clinical Spectrum of COVID-19 Human Coronavirus : Host-Pathogen Interaction Insights into RNA Biology from an Atlas of Mammalian mRNA-Binding Proteins Host Factors in Coronavirus Replication Binding of the 50-untranslated region of coronavirus J o u r n a l P r e -p r o o f RNA to zinc finger CCHC-type and RNA-binding motif 1 enhances viral replication and transcription Secondary structure of the SARS-CoV-2 5'-UTR Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19 . The COVID-19 resource centre is hosted on Elsevier Connect , the company ' s public news and information Mouse Hepatitis Virus Stem-Loop Functions as a Spacer Element Required To Drive Subgenomic RNA Synthesis ᰔ RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look Analysis of subgenomic mRNA transcription, 3CLpro and PL2pro protease cleavage sites and protein synthesis Corresponding autor: Miguel Ramos-Pascual 5 ' -Proximal cis-Acting RNA Signals for Coronavirus Genome Replication NextStrain: Real-time tracking of pathogen evolution Geographic and Genomic Distribution of SARS-CoV-2 E156G and Arg158, Phe-157/del mutation in NTD of spike protein in B.1.617.2 lineage of SARS-CoV-2 leads to immune evasion through antibody escape The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity WHO, Covid-19 : India Situation Report Status Across States Annexin A2 binds RNA and reduces the frameshifting efficiency of Infectious Bronchitis Virus Requirement of the poly(A) tail in coronavirus genome replication Host cell proteins interacting with the 3′ end of TGEV coronavirus genome influence virus replication Regulation of Coronaviral Poly(A) Tail Length during Infection TRACE : Tennessee Research and Creative Exchange 5 ' -Proximal cis-Acting RNA Signals for Coronavirus Genome Replication Structure of the SARS-CoV-2 Nsp1/5′-Untranslated Region Complex and Implications for Potential Therapeutic Targets Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19 . The COVID-19 resource centre is hosted on Elsevier Connect , the company ' s public news and information 3DNA: A versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures 3DNA: A software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures Automated 3D structure composition for large RNAs MutaRNA : analysis and visualization of mutation-induced changes in RNA structure Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides Solvated protein -DNA docking using HADDOCK Information-driven protein-DNA docking using HADDOCK: It is a matter of flexibility PatchDock and SymmDock : servers for rigid and symmetric docking NPDock: A web server for protein-nucleic acid docking HDOCK : a web server for proteinprotein and protein -DNA / RNA docking based on a hybrid strategy Applying Physics-Based Scoring to Calculate Free Energies of Binding for Single Amino Acid Mutations in Protein-Protein Complexes GROMACS development team Improved side-chain torsion potentials for the Amber ff99SB protein force field LINCS: A linear constraint solver for molecular simulations Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems Molecular dynamics with coupling to an external bath VMD : Visual Molecular Dynamics UCSF Chimera -A visualization system for exploratory research and analysis G-mmpbsa -A GROMACS tool for high-throughput MM-PBSA calculations The Bio3D packages for structural bioinformatics Bio3d: An R package for the comparative analysis of protein structures Integrating protein structural dynamics and evolutionary analysis with Bio3D Online interactive analysis of protein structure ensembles with Bio3D-web Evaluation of the role of heterogeneous nuclear ribonucleoprotein A1 as a host factor in murine coronavirus discontinuous transcription and genome replication The nucleocapsid protein of SARS coronavirus has a high binding affinity to the human cellular heterogeneous nuclear ribonucleoprotein A1 RNA structural dynamics as captured by molecular simulations: A comprehensive overview