key: cord-0855978-ez2syc0y authors: Svetlov, Dmitri; Artsimovitch, Irina title: Reductionism Ad Absurdum: The Misadventures of Structural Biology in the Time of Coronavirus date: 2021-10-06 journal: ACS Infect Dis DOI: 10.1021/acsinfecdis.1c00492 sha: 03d0de718530b4e2c8b4d43d8fa4041fff310c45 doc_id: 855978 cord_uid: ez2syc0y

The tragic consequences of the COVID-19 pandemic have led to admirable responses by the global scientific community, including a profound acceleration in the pace of research and exchange of findings. However, this acceleration has had considerable costs of its own, as erroneous conclusions have propagated faster than researchers have been able to detect and correct them. We illustrate the specific misunderstandings that have resulted from reductionist approaches to the study of SARS-CoV-2 RNA-dependent RNA polymerase (RdRp), which are but one instance of a regrettably growing trend in structural biology. Far from merely being cautionary tales about the conduct of scientific research, these errors have had significant practical impact: they have hampered a correct understanding of RdRp structure and mechanism and of its inhibition by nucleoside analogues such as remdesivir, as well as the discovery and characterization of such analogues. After correcting these misunderstandings, we close with several recommendations for a broader correction of the course of scientific research.

The ongoing Coronavirus Disease 2019 (COVID-19) pandemic has resulted in more rapid loss of human life, and more profound economic and social consequences, than any pandemic in a century. As others have noted in this journal, 1 scientists around the world have risen to this challenge admirably, adopting new norms and streamlining existing procedures in order to accelerate research and translate their findings into preventive and therapeutic measures. We join in welcoming these responses and likewise hope that they will remain lasting legacies of this time of tragedy.
Nonetheless, such frenetic activity is a double-edged sword, particularly in scientific disciplines accustomed to the gradual acquisition of knowledge and advancement of understanding. The slower the pace of progress, frustrating as that may be, the more opportunities it affords to detect and correct the inevitable incorrect hypotheses, experimental errors, and mistaken thinking. There is no better illustration of this duality than recent research into the structural biology of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), the causative agent of COVID-19. In particular, scientists have relied heavily on reductionist approaches, by which we mean methodologies that (1) isolate a specific component of a complex and holistic system (e.g., an enzyme involved in a viral replicative cycle); (2) investigate that component independently of the system (e.g., solve the structure of that enzyme); and (3) use the results to infer conclusions about that component's effects within the broader context (e.g., propose a role for that enzyme in the replicative cycle). Such approaches are not inherently invalid, but they can easily lead researchers astray when used to address questions that are relevant or meaningful only within the context of the system as a whole. In the case of this pandemic, the truly important context is the infection of human hosts by SARS-CoV-2 and its subsequent replication therein; when researchers have divorced their work from this context, they have often obtained results that are inapplicable at best and misleading at worst. Consequently, a fraction of subsequent research efforts may have been in vain, while other opportunities were missed because resources were allocated elsewhere.
Following a synopsis of the relevant molecular virology, we will recount some of the misadventures of such reductionist research into SARS-CoV-2, showing in detail how it has hindered rather than helped scientific understanding of how the virus behaves. We will conclude with three recommendations for reforming how scientific research is conducted and communicated, not only to avoid similar misadventures in the future but also to preserve the positive developments in science spurred by this pandemic and to best prepare for the next one.

To understand how and why reductionist structural biology has erred in studying SARS-CoV-2, some knowledge of the fundamental "machinery" of the virus is required. Coronaviruses constitute one of several families in the order Nidovirales, whose members share a distinctive genome organization and, consequently, distinctive mechanisms of gene expression and viral replication. 2 Structural (i.e., virion) and accessory proteins are encoded in open reading frames (ORFs) separate from the replicase ORF, which encodes the nonstructural proteins (nsps) involved in the viral replicative cycle. The number of nsps varies among nidoviruses, but in all of them the nsps are translated from the genome as polyproteins that are subsequently cleaved into individual proteins by one or more cognate proteases. In SARS-CoV-2, polyprotein cleavage into the 16 nsps is primarily performed by the "main protease," nsp5. Several of these nsps are subsequently involved in the replication of the viral genome by the RNA-dependent RNA polymerase (RdRp). While many RdRps are single-subunit enzymes and therefore reasonably simple to understand, the SARS-CoV-2 RdRp holoenzyme requires the association of the catalytic subunit (nsp12) with the accessory proteins nsp7 and nsp8. 3 This holoenzyme in turn must bind two copies of the helicase nsp13 4 and associate with an RNA-binding protein, nsp9, the proofreading exonuclease nsp14, and various capping enzymes to form a replication-transcription complex that mediates the synthesis and modification of all coronaviral RNAs. 5

All RNA polymerases perform transcription by catalyzing the transfer of nucleoside monophosphates (NMPs) to nascent RNA chains, but this is only one physiological example of nucleotidyl transfer ("NMPylation"). Indeed, nidoviral RdRps are further complicated by the fact that their catalytic subunits contain two active sites, both of which catalyze NMPylation. One is the "primary" active site (AS1), structurally conserved across RNA viruses, which performs RNA chain synthesis; 6 the second (AS2) lies within a Nidovirus RdRp-Associated Nucleotidyl transferase (NiRAN) domain. 7 The activities of these two sites are independent, in that each can still occur if the other is abolished, 8, 9 but both are essential for nidoviral replication. 7, 8

The somewhat inelegant name of NiRAN reflects its origin in a bioinformatic analysis that identified the domain and predicted its enzymatic activity. 7 In that pioneering paper, the authors validated the existence of NiRAN and showed that its activity is essential for nidoviral replication. Although they were unable to demonstrate its exact role in the replicative cycle, they used their functional data and phylogenetic arguments to propose three potential functions: RNA ligation, 5′ capping of mRNA, and "protein priming" of transcription initiation. While the first possibility was and remains entirely speculative, the latter two have been reported in SARS-CoV-2 10, 11 and are critical for the replication of diverse viruses.
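The polyprotein processing described above fixes the exact termini of every mature nsp, which matters for nsp9, whose native first residue is the Asn that NiRAN modifies. The Python sketch below (Python 3.7+) illustrates this under strong simplifying assumptions: the cleavage rule (cut after any Leu-Gln pair) is a toy stand-in for the broader, context-dependent specificity of the main protease; the polyprotein is an invented placeholder string in which 'X' means an arbitrary residue; and the two-residue "GS" extension is a hypothetical example of non-native N-terminal residues, not the construct used in any cited study.

```python
from __future__ import annotations

import re

# Toy cleavage rule (assumption): cut immediately after every Leu-Gln (LQ) pair.
# Real main-protease (nsp5) specificity is broader and context-dependent.
CLEAVAGE_SITE = re.compile(r"(?<=LQ)")


def cleave(polyprotein: str) -> list[str]:
    """Split a polyprotein string after each 'LQ' and drop empty pieces."""
    return [piece for piece in CLEAVAGE_SITE.split(polyprotein) if piece]


# Invented placeholder polyprotein: 'X' stands for an arbitrary residue.
UPSTREAM = "XXXXXXLQ"    # hypothetical upstream nsp ending at an LQ junction
NSP9_LIKE = "NXXXXXLQ"   # hypothetical nsp9 stand-in; native first residue is Asn (N)
DOWNSTREAM = "XXXXXX"    # hypothetical downstream nsp
polyprotein = UPSTREAM + NSP9_LIKE + DOWNSTREAM

mature = cleave(polyprotein)
nsp9_native = mature[1]            # released by cleavage, so it begins with the native Asn
nsp9_tagged = "GS" + nsp9_native   # hypothetical construct with two extra N-terminal residues

for label, seq in (("native", nsp9_native), ("tagged", nsp9_tagged)):
    # Per the text, NiRAN NMPylates the N-terminal Asn; check which form keeps it first.
    print(f"{label}: {seq} starts with Asn? {seq[0] == 'N'}")
```

The tagged construct still contains the Asn, but no longer at the N-terminus, which is the position that the text identifies as the NMPylation target.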
The first cryo-EM structure of SARS-CoV-2 RdRp was of an enzyme prepared via a bacterial expression system that had been optimized for efficiency: codons that are rare in Escherichia coli were replaced with more common synonymous codons so that translation would proceed without pausing. 12 Researchers perform such codon optimization routinely, even though a single synonymous codon substitution can compromise a protein's function. 13 This loss of function is believed to result from misfolding: nascent peptides fold cotranslationally, and ribosome pause sites defined by rare codons appear to be an intentional "bottleneck" that is essential for physiologically correct folding. 13 In this particular case, the RdRp thus obtained was largely inactive, and we showed that this inactivity is indeed due to misfolding, because activity was partially rescued by attenuating translation and/or by preincubation with the accessory proteins nsp7 and nsp8. 9 We also discovered that AS1 and AS2 appear to "cross-talk" via an allosteric pathway: the binding of nucleotides and nucleoside analogues (NAs) to AS2 activates RNA chain extension at AS1.

Because RdRp is the only protein common to all (+) RNA viruses, it has been one of the two enzymes most commonly targeted by researchers seeking to repurpose existing drugs against SARS-CoV-2, the other being the main protease. 14 Consequently, many of these efforts have explored various NAs as potential therapeutics. The most famous of these candidates is undoubtedly remdesivir (RDV), an adenosine analogue with a cyano group at the 1′ position of the ribose, which had shown promise against Ebola infection (1). RDV is a prodrug that is converted into the active RDV-triphosphate in target cells, and many research groups have probed its subsequent interactions with RdRp in order to elucidate its mechanism of inhibition.
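The codon optimization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any group's actual pipeline: the codon table is a small subset of the standard genetic code, and the "rare" and "preferred" host codons are assumptions chosen for illustration (AGA, AGG, ATA, and CTA are frequently cited as rare in E. coli) rather than values from a measured usage table. The sketch shows only that the encoded protein is unchanged while the rare-codon positions, which per the text may act as folding-relevant pause sites, disappear; it does not model folding.

```python
from __future__ import annotations

# Subset of the standard genetic code (codon -> one-letter amino acid).
CODON_TABLE = {
    "ATG": "M",
    "CGT": "R", "AGA": "R", "AGG": "R",
    "CTG": "L", "CTA": "L",
    "ATT": "I", "ATA": "I",
    "AAA": "K",
}

# Assumed host preferences for illustration only (not a measured usage table).
RARE_IN_HOST = {"AGA", "AGG", "CTA", "ATA"}
PREFERRED = {"R": "CGT", "L": "CTG", "I": "ATT"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive triplets."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate using the subset table above."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def rare_positions(cds: str) -> list[int]:
    """Codon indices holding a host-rare codon (putative ribosome pause sites)."""
    return [i for i, c in enumerate(split_codons(cds)) if c in RARE_IN_HOST]


def optimize(cds: str) -> str:
    """Replace each host-rare codon with the preferred synonymous codon."""
    return "".join(
        PREFERRED[CODON_TABLE[c]] if c in RARE_IN_HOST else c
        for c in split_codons(cds)
    )


native = "ATGAGGCTGATAAAA"  # invented example ORF encoding M-R-L-I-K
optimized = optimize(native)

print(translate(native), translate(optimized))            # identical protein sequences
print(rare_positions(native), rare_positions(optimized))  # pause sites removed
```

The design choice mirrors the concern in the text: a check that compares only translated sequences would report the two constructs as equivalent, even though their translation kinetics could differ.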
Early studies led to a view, now broadly accepted, that RDV acts as a chain terminator, either immediately 15 or after incorporation of three further nucleotides, when the cyano group of RDV clashes with the Ser861 residue of nsp12. 16, 17 However, subsequent studies showed that, when physiological concentrations of NTP substrates are used, RDV-monophosphate is efficiently incorporated into the RNA. 18-20 Thus, the observed antiviral effects of RDV could be due, at least in part, to lethal mutagenesis (Figure 1), as was shown to be the case with favipiravir. 21

Figure 1. Coordinated activities of nsp12 active sites. The RdRp AS1 mediates RNA synthesis, and the NiRAN AS2 catalyzes NMP transfer to nsp9, which may prime RNA synthesis, and to the RNA 5′ end. While other protein NMPylases stably modify the hydroxyl groups of target residues, NiRAN transfers NMPs to the primary amine of the N-terminal Asn residue in nsp9. This reaction is reversible, possibly enabling nsp9 to deliver the linked nascent RNA to AS2. AS2 would then cleave NMP-nsp9, thereby "recycling" nsp9, and instead transfer a guanosine nucleotide to the nascent RNA, initiating the multistep formation of the cap structure. (C) Both active sites can utilize NAs as substrates, with diverse, and largely unknown, potential consequences for viral replication.

Beyond the specific activities of RDV and other candidate inhibitors, we contend more generally that most research on SARS-CoV-2 RdRp has focused on AS1, while the function of NiRAN has been either overlooked or profoundly misunderstood. For example, another high-profile cryo-EM study probed the role of NiRAN in the replicative cycle. 11 In a snapshot of an RdRp-helicase complex, the researchers observed the N-terminus of nsp9 buried deep within the NiRAN active site, along with a bound GDP molecule. From this, they reasoned that nsp9 must either competitively inhibit NiRAN NMPylation or serve as its target.
Because they detected capped RNA, but not NMPylated nsp9, in their functional assays, they definitively assigned the capping role to NiRAN and posited that nsp9 binds RNA to position the RNA 5′ end in a capping intermediate. 11 However, those researchers used an artificially extended nsp9, with two additional, non-native residues at the N-terminus. The fact that nsps are formed by autocatalytic cleavage of polyproteins suggests that the identities of their terminal residues may be physiologically important. Indeed, we and others showed that the very first residue of nsp9 is NMPylated by NiRAN; 8, 22 consequently, the native N-terminus is absolutely essential for nsp9 modification and, in turn, for viral replication. 8 We also showed that AS2 can transfer RDV-monophosphate to nsp9; 22 nsp9 thus mis-modified may be unable to support the viral life cycle. While the exact role of nsp9 is not yet known, its RNA-binding activity is thought to be critical. 8, 11, 22 Another recent study 23 concluded that nsp9-RNA contacts are distinct from those proposed on the basis of the cryo-EM data, 11 emphasizing the need for functional validation of structure-based models.

The unique properties of the NiRAN AS2 are not only scientifically curious; they may also hold promise for repurposing existing NAs, which have several drawbacks as RdRp inhibitors. NAs can be excised by the nsp14 exonuclease, reducing their efficacy, and can potentially interfere with host transcription and other NTP-dependent processes, raising toxicity. 24 A more subtle limitation is that AS1 recognizes only NAs that closely mimic cognate NTPs, greatly restricting the chemical space of potential inhibitors targeting that site. In contrast, the NiRAN domain could be an excellent target for diverse NAs because, unlike AS1, AS2 has no base-pairing constraint during substrate loading and makes no base-specific contacts with NTPs. 4, 11 Indeed, we found that nsp9 NMPylation is inhibited by nucleoside mono-, di-, and tetraphosphates, which cannot be used as substrates and are thus less likely to interfere with host enzymes. 22

Taken together, the complexity of the RdRp enzymatic mechanisms, which remain far from fully understood, demonstrates the unsuitability, and folly, of research that seeks to examine one aspect of such a system in isolation. Not only have efforts to discover or repurpose NAs against RdRp been needlessly restrained by an exclusive focus on AS1, but the characterization of such NAs has also been harmed by ignorance, or lack of appreciation, of the importance of AS2 and of the interplay between the two sites.

We hope that we have illustrated the dangers of over-reliance on reductionist approaches in structural biology and of overconfidence in the physiological applicability or relevance of their conclusions. These dangers can be avoided in several ways. One is simply to verify such relevance in an appropriate way: in the case of SARS-CoV-2, this would most naturally involve replication assays in cell culture. The examples above indicate that such "cross-validation" is especially important whenever a natural system has been modified by biotechnological means for the sake of expediency. A better solution is to conduct research in a more collaborative and holistic fashion. We do not advocate greater exchange of reagents or data merely so that scientists can more easily engage in myopic research, answering questions of ever narrower focus and ever less connection with the "bigger picture." Meaningful collaboration brings together diverse perspectives and complementary methodologies and expertise, allowing those involved to understand complex systems in ways that would otherwise be impossible.
We also recommend that, both within and across disciplines, efforts be made to standardize the reporting of research and results, especially of the "materials and methods" involved. Among the many significant benefits of such standardization, we highlight three here. First, we see no justification for the wildly divergent article formats mandated by the multitude of journals that currently exist; this divergence requires researchers to spend time acquiring and maintaining familiarity with multiple formats and often rewriting manuscripts to accommodate a format different from the one initially envisioned. While innovations such as bioRxiv have accelerated the dissemination of research findings, publishing work in peer-reviewed journals is an inescapable requirement for academic scientists; therefore, the publication process should be optimized to reduce unnecessary demands on researchers' time. We are not aware of any studies that have quantified the costs resulting from format proliferation, but we expect that their results would be depressing. Second, many research projects use materials and methods that are largely standard for the fields and questions involved, so attention should be focused on approaches that are nonstandard or even novel. Standardized reporting of routine methods would make such departures stand out, drawing the attention of reviewers and other researchers to precisely those aspects of a project that require greater scrutiny and perhaps helping to correct errors before publication. Third, in disciplines where research is very fast-paced, it is increasingly difficult for researchers to remain abreast of the latest findings. It is simply not humanly possible for anyone even to encounter, let alone read, all of the SARS-CoV-2 papers that have been released, which already number more than half a million.
Furthermore, this explosion of submissions presents challenges to rigorous peer review, blurring the line between a preprint and a published paper. In this situation, effective use of the literature absolutely requires informatic (in the classical sense of the term) tools, and greater standardization of the literature will only improve the power and robustness of such tools. We stress that such tools complement rather than substitute for human involvement; indeed, they enable more efficient use of the most valuable and limited resource in scientific research: human time. Every improvement of the tools and processes we use can yield benefits for research consortia, 14 curators of scientific communication, 25 and peer reviewers alike, by helping us to better focus and prioritize our efforts as we attempt to handle such avalanches of data fairly and effectively, but also quickly.

Our final recommendation is to incorporate the detection and correction of scientific errors more vigorously within the broader enterprise of research, including through bioinformatic and cheminformatic tools. Among these, we see particular promise in servers, both those that perform computational tasks and those that serve as repositories of data, and in pipelines that sequentially connect multiple in silico methodologies in order to answer complex questions. As part of the global response to the COVID-19 pandemic, some existing tools have been adapted, e.g., ensemble meta-docking for inhibitor identification, 26, 27 whereas entirely new ones have been introduced, e.g., a genome browser 28 and a drug-repurposing pipeline. 29 We are encouraged by these innovations and urge that greater efforts be expended along these avenues. In particular, such tools should be generalized as much as possible, so that they can in turn be "repurposed" to combat the next viral, or bacterial, pandemic that emerges.
Throughout that process, they should be rewritten as necessary so that users can readily identify the empirical data used in the algorithms and the impact of those data on the results. Ideally, such tools would monitor the scientific literature and incorporate new findings, and corrections of previous ones, into their algorithms. The necessity of available and accurate empirical data to inform computational studies has already been recognized in the context of drug screening and repurposing. 30 For a typical physiological target, the chemical space of candidate drugs, both existing and hypothetical, is too vast for researchers to explore exhaustively; therefore, screening proceeds like a "funnel" that excludes as many candidates deemed unpromising as quickly as possible. Only a minuscule fraction of candidates are ever evaluated empirically; consequently, the use of inaccurate information at even a single step in the screening process can prove disastrous, because candidates wrongly excluded at that step are never tested.

These recommendations are somewhat general because we believe not only that they are applicable to, and worthwhile for, many scientific fields but also that their implementations may well differ from one field, research context, or goal to another. Moreover, we believe that everyone involved in the scientific enterprise in its broadest sense (researchers, communicators, and policymakers, among others) has both a vested interest in its improvement and contributions to make toward it. This is especially the case where collaborations and communication occur between diverse groups of scientific contributors, as is increasingly happening. Therefore, we intend our recommendations to spark discussion (and, we hope, action) as we all seek to learn the lessons of this pandemic and build a better future after it.
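The screening "funnel" and the call for tools that expose the empirical basis of each step can be illustrated with a small Python sketch. Everything in it is hypothetical: the candidate names, their attributes (including a "ground truth" flag for NiRAN activity that a real funnel would not know), the docking scores, and the 0.5 threshold are invented, and the first filter simply encodes the AS1-only assumption criticized above. Each stage records the evidence it relies on and the candidates it removed, so that a flawed assumption can be traced to the hits it discarded before any empirical test.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str
    mimics_cognate_ntp: bool  # could be loaded at AS1 (base-pairs with the template)
    acts_at_niran: bool       # hypothetical "ground truth", unknown to the funnel
    docking_score: float      # hypothetical score on a 0-1 scale


@dataclass(frozen=True)
class Stage:
    name: str
    keep: Callable[[Candidate], bool]
    evidence: str  # provenance: the empirical basis (or assumption) behind the filter


def run_funnel(candidates, stages):
    """Apply stages in order; return survivors and an audit trail of removals."""
    survivors = list(candidates)
    audit = []
    for stage in stages:
        removed = [c.name for c in survivors if not stage.keep(c)]
        survivors = [c for c in survivors if stage.keep(c)]
        audit.append((stage.name, stage.evidence, removed))
    return survivors, audit


library = [
    Candidate("cand_A", mimics_cognate_ntp=True, acts_at_niran=False, docking_score=0.9),
    Candidate("cand_B", mimics_cognate_ntp=False, acts_at_niran=True, docking_score=0.95),
    Candidate("cand_C", mimics_cognate_ntp=False, acts_at_niran=False, docking_score=0.2),
]

stages = [
    Stage("AS1 substrate mimicry", lambda c: c.mimics_cognate_ntp,
          "assumption: RdRp is inhibited only at AS1"),
    Stage("docking score >= 0.5", lambda c: c.docking_score >= 0.5,
          "hypothetical docking threshold"),
]

hits, audit = run_funnel(library, stages)
for stage_name, evidence, removed in audit:
    print(f"{stage_name}: removed {removed} (basis: {evidence})")

# Candidates that act at NiRAN but never reached empirical testing.
lost = [c.name for c in library if c.acts_at_niran and c not in hits]
print("hits:", [c.name for c in hits], "| NiRAN-active candidates lost:", lost)
```

Dropping the first stage, for instance, lets cand_B pass the docking filter; the audit trail is what makes the cost of the original assumption visible.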
- Science at Its Best in the Time of the COVID-19 Pandemic
- The Nonstructural Proteins Directing Coronavirus RNA Synthesis and Processing
- Structure and function of SARS-CoV-2 polymerase
- Structural Basis for Helicase-Polymerase Coupling in the SARS-CoV-2 Replication-Transcription Complex
- A unifying structural and functional model of the coronavirus replication organelle: Tracking down RNA synthesis
- A Comprehensive Superposition of Viral Polymerase Structures
- Discovery of an essential nucleotidylating activity associated with a newly delineated conserved domain in the RNA polymerase-containing protein of all nidoviruses
- Coronavirus replication-transcription complex: Vital and selective NMPylation of a conserved site in nsp9 by the NiRAN-RdRp subunit
- Allosteric activation of SARS-CoV-2 RdRp by remdesivir triphosphate and other phosphorylated nucleotides
- Protein-primed RNA synthesis in SARS-CoVs and structural basis for inhibition by AT-527
- Cryo-EM Structure of an Extended SARS-CoV-2 Replication and Transcription Complex Reveals an Intermediate State in Cap Synthesis
- Structure of the RNA-dependent RNA polymerase from COVID-19 virus
- A code within the genetic code: codon usage regulates co-translational protein folding
- Early Returns on Small Molecule Therapeutics for SARS-CoV-2
- Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
- Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency
- Structural Basis for RNA Replication by the SARS-CoV-2 Polymerase
- Remdesivir is a delayed translocation inhibitor of SARS-CoV-2 replication
- Mechanism of SARS-CoV-2 polymerase stalling by remdesivir
- Inhibition of SARS-CoV-2 polymerase by nucleotide analogs: a single molecule perspective
- Rapid incorporation of Favipiravir by the fast and permissive viral RNA polymerase complex results in SARS-CoV-2 lethal mutagenesis
- Artsimovitch, I. NMPylation and de-NMPylation of SARS-CoV-2 nsp9 by the NiRAN domain
- A distinct ssDNA/RNA binding interface in the Nsp9 protein from SARS-CoV-2
- Coronavirus RNA Proofreading: Molecular Basis and Therapeutic Targeting
- ASM COVID-19 Research Registry
- Using parallelized incremental metadocking can solve the conformational sampling issue when docking large ligands to proteins
- The UCSC SARS-CoV-2 Genome Browser
- Prediction of potential inhibitors for RNA-dependent RNA polymerase of SARS-CoV-2 using comprehensive drug repurposing and molecular docking approach
- Artificial intelligence, drug repurposing and peer review

The authors declare no competing financial interest. We thank Natacha Ruiz and Yuri Wolf for thoughtful and constructive feedback. Our transition to studies of SARS-CoV-2 RdRp was made possible by The Ohio State University Office of Research.