key: cord-103892-v6gkubd4 authors: Mäkinen, Janne J.; Shin, Yeonoh; Vieras, Eeva; Virta, Pasi; Metsä-Ketelä, Mikko; Murakami, Katsuhiko S.; Belogurov, Georgiy A. title: The mechanism of the nucleo-sugar selection by multi-subunit RNA polymerases date: 2020-07-01 journal: bioRxiv DOI: 10.1101/2020.06.30.179606 sha: doc_id: 103892 cord_uid: v6gkubd4 RNA polymerases (RNAPs) synthesize RNA from NTPs, whereas DNA polymerases synthesize DNA from 2’dNTPs. DNA polymerases select against NTPs by using steric gates to exclude the 2’ OH, but RNAPs have to employ alternative selection strategies. In single-subunit RNAPs, a conserved Tyr residue discriminates against 2’dNTPs, whereas selectivity mechanisms of multi-subunit RNAPs remain hitherto unknown. Here we show that a conserved Arg residue uses a two-pronged strategy to select against 2’dNTPs in multi-subunit RNAPs. The conserved Arg interacts with the 2’OH group to promote NTP binding, but selectively inhibits incorporation of 2’dNTPs by interacting with their 3’OH group to favor the catalytically-inert 2’-endo conformation of the deoxyribose moiety. This deformative action is an elegant example of an active selection against a substrate that is a substructure of the correct substrate. Our findings provide important insights into the evolutionary origins of biopolymers and the design of selective inhibitors of viral RNAPs. All cellular lifeforms use two types of nucleic acids, RNA and DNA to store, propagate and utilize their genetic information. RNA polymerases (RNAPs) synthesize RNA from ribonucleoside triphosphates (NTPs), whereas DNA polymerases (DNAPs) use 2'-deoxyribonucleoside triphosphates (2'dNTPs) to synthesize DNA. The RNA building blocks precede the DNA building blocks biosynthetically and possibly also evolutionarily 1, 2 . 
Messenger RNA molecules function as information carriers in a single-stranded form, whereas ribosomal, transfer and regulatory RNAs adopt complex three-dimensional structures composed of double-stranded segments. The double stranded RNAs favor A-form geometry where the ribose moiety of each nucleotide adopts the 3'-endo conformation (Fig. 1a) . In contrast, DNA functions as a B-form double helix, where the deoxyribose of each nucleotide adopts the 2'-endo conformation (Fig. 1a, b) . Hybrid duplexes between the RNA and DNA transiently form during transcription and adopt an A-form geometry because 2'OH groups in the RNA clash with the phosphate linkages in the B-form configuration. The sugar moieties of NTPs and 2'dNTPs equilibrate freely between the 3' and 2'-endo conformations in solution with the overall bias typically shifted towards the 2'-endo conformers 3 . However, both NTPs and 2'dNTPs typically adopt the 3'-endo conformation in the active sites of the nucleic acid polymerases 4 . RNAPs and DNAPs need to discriminate efficiently against the substrates with the non-cognate sugar. The intracellular levels of NTPs are in the range of hundreds of micromoles to several millimoles per liter and exceed those of the corresponding 2'dNTPs more than tenfold [5] [6] [7] . When selecting the 2'dNTPs, most DNAPs use bulky side-chain residues in their active sites to exclude the 2'OH of the NTPs (reviewed in Ref. 8 ). These steric gate residues, typically Gln/Glu in A-family DNAPs and Tyr/Phe in Y-and B-family DNAPs, create a stacking interaction with the deoxyribose moiety of an incoming 2'dNTP and form a hydrogen bond between the backbone amide group and the 3′-OH group of the deoxyribose moiety (Fig. 1c) . Selection against the 2'dNTPs by RNAPs is a daunting challenge because 2'dNTPs are substructures of the corresponding NTPs. Single-subunit RNAPs (e.g., mitochondrial and bacteriophage T7 and N4 enzymes) are homologous and structurally similar to DNAPs. 
However, single-subunit RNAPs lack a steric gate and use a conserved Tyr residue to discriminate against 2'dNTPs 9,10. Tyr selectively facilitates the binding of NTPs by forming a hydrogen bond with the 2'OH group of the NTP ribose (Fig. 1c) 11,12. Intriguingly, the same Tyr also inhibits the incorporation of 2'dNTPs by an unknown mechanism 9,13. Notably, a homologous Tyr forms a hydrogen bond with the steric gate Gln/Glu residue in A-family DNAPs (Fig. 1c) 14,15. The mechanism of discrimination against 2'dNTPs by the multi-subunit RNAPs (bacterial, archaeal and eukaryotic nuclear RNAPs) is poorly understood. The combined structural evidence (reviewed in Ref. 16) suggests that the 2'OH group can make polar contacts with three universally conserved amino acid side chains: β'Arg425, β'Asn458 and β'Gln929 (numbering of the Escherichia coli RNAP). The β'Arg425 and the β'Asn458 are contributed by the active site cavity and can interact with the 2'OH of NTPs in the open and closed active site (see below), whereas the β'Gln929 is contributed by a mobile domain called the trigger loop (TL) and can only transiently interact with the 2'- and 3'-OH of NTPs in the closed active site 17-20 (Fig. 1c). Closure of the active site by the TL is an essential step during nucleotide incorporation by the multi-subunit RNAPs because the α-phosphate of the NTP is located 5.4-6 Å away from the RNA 3' end in the open active site 21,22. Complete closure of the active site by the folding of two alpha-helical turns of the TL positions the triphosphate moiety of the substrate NTP in line for an attack by the 3'OH group of the RNA and accelerates catalysis ~10^4-fold 17,21,23,24.
In contrast, folding of one helical turn of the TL is insufficient to promote catalysis (3'OH to αP distance 5.4 Å 19) but likely significantly reduces the rate of NTP dissociation from the active site by establishing contacts between the β'Gln929 and the ribose moiety and stacking of the β'Met932 with the nucleobase (reviewed in Ref. 16). The relative contribution of the TL (β'Gln929 and β'Met932) and the active site cavity (β'Arg425 and β'Asn458) to the discrimination against 2'dNTPs remains uncertain. The closure of the active site makes only a 5- to 10-fold contribution to an overall 500- to 5000-fold selectivity against the 2'dNTP in RNAPs from E. coli 24 and Saccharomyces cerevisiae 25. Consistently, the open active site of the E. coli RNAP retained a ~100-fold overall selectivity against 2'dNTPs 24. However, the open active site of the Thermus aquaticus RNAP has been reported to be largely unselective 23, and individual substitutions of the β'Asn458 with Ser in E. coli and S. cerevisiae resulted in a <5-fold decrease in selectivity 17,26. Most importantly, although the universally conserved β'Arg425 closely approaches the 2'OH of the NTP in several X-ray crystal structures 17-20,22 (Supplementary Table 5) and has been highlighted as the sole residue mediating the selectivity against 2'dNTP in a computational study by Roßbach and Ochsenfeld 27, the role of this residue has not been experimentally assessed. In this study, we systematically investigated the effects of individual substitutions of the active site residues on the discrimination against 2'dNTPs in single nucleotide addition (SNA) assays and during processive transcript elongation by the E. coli RNAP. This analysis demonstrated that β'Arg425 is the major determinant of the selectivity against 2'dNTPs in multi-subunit RNAPs. We further analyzed the binding of 2'-deoxy substrates by in silico docking and X-ray crystallography of Thermus thermophilus RNAP.
Our data suggest that the conserved Arg actively selects against 2'dNTPs by favoring their templated binding in the 2'-endo conformation that is poorly suitable for incorporation into RNA. To investigate the mechanism of the discrimination against the 2'-deoxy substrates, we performed time-resolved studies of the single nucleotide incorporation by the wild-type (WT) and variant E. coli RNAPs. Among several single substitutions of the key residues that contact the NTP ribose (Fig. 1c), we selected four variant RNAPs that retained at least half of the wild-type activity at saturating concentrations of NTPs. This approach minimized the possibility that the amino acid substitutions induced global rearrangements of the active site, thereby complicating the interpretation of their effects on the sugar selectivity. Transcript elongation complexes (TECs) were assembled on synthetic nucleic acid scaffolds and contained a fully complementary transcription bubble flanked by 20-nucleotide DNA duplexes upstream and downstream (Supplementary Fig. 1a). The annealing region of the 16-nucleotide RNA primer was initially 9 nucleotides, permitting the TEC extended by one nucleotide to adopt the post- and pre-translocated states, but disfavoring backtracking 28. The RNA primer was 5' labeled with the infrared fluorophore ATTO680 to monitor the RNA extension by denaturing PAGE. The template DNA strand contained the fluorescent base analogue 6-methyl-isoxanthopterin (6-MI) eight nucleotides upstream from the RNA 3' end to monitor RNAP translocation along the DNA following nucleotide incorporation 29. We first measured GTP and 2'dGTP concentration series of the WT and altered RNAPs using a time-resolved fluorescence assay performed in a stopped-flow instrument (Supplementary Fig. 1b, c).
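As an illustration of how a concentration series of this kind can be reduced to a turnover number (kcat) and a Michaelis constant (Km), the following minimal sketch fits the Michaelis-Menten hyperbola by Hanes-Woolf linearization. It is not the procedure used in this study, which relied on global analysis in KinTek Explorer; the function names, the substrate concentrations and the "true" parameters used to generate the synthetic data are all hypothetical.

```python
def mm_rate(conc, kcat, km):
    """Michaelis-Menten rate at substrate concentration `conc`."""
    return kcat * conc / (km + conc)


def fit_michaelis_menten(concs, rates):
    """Estimate (kcat, Km) by Hanes-Woolf linearization.

    The hyperbola v = kcat*S/(Km + S) rearranges to S/v = S/kcat + Km/kcat,
    so an ordinary least-squares line through (S, S/v) has
    slope = 1/kcat and intercept = Km/kcat.
    Concentrations and rates must be positive.
    """
    if len(concs) != len(rates) or len(concs) < 2:
        raise ValueError("need at least two (concentration, rate) pairs")
    xs = list(concs)
    ys = [s / v for s, v in zip(concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kcat = 1.0 / slope
    km = intercept * kcat
    return kcat, km


# Hypothetical, noise-free synthetic series (arbitrary units), NOT data from the study.
TRUE_KCAT, TRUE_KM = 30.0, 20.0
concentrations = [5.0, 10.0, 20.0, 50.0, 100.0, 200.0, 500.0]
observed = [mm_rate(s, TRUE_KCAT, TRUE_KM) for s in concentrations]

kcat_fit, km_fit = fit_michaelis_menten(concentrations, observed)
print(f"kcat = {kcat_fit:.3f}, Km = {km_fit:.3f}")
```

Linearization is used here only to keep the sketch dependency-free; with noisy data, a nonlinear least-squares fit of the hyperbola is generally preferable.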
We used the translocation assay because it allowed rapid acquisition of concentration series, whereas measurements of concentration series by monitoring RNA extension in the rapid chemical quench-flow setup would be considerably more laborious. The concentration series data allowed the estimation of kcat and the Km (Michaelis constant) for GTP and 2'dGTP. We then supplemented the concentration series with time courses of GMP and 2'dGMP incorporation obtained using a rapid chemical quench-flow technique with EDTA as a quencher. EDTA inactivates the free GTP and 2'dGTP by chelating Mg2+ but allows a fraction of the bound substrate to complete incorporation into RNA after the addition of EDTA 30,31. As a result, the EDTA quench experiment is equivalent to a pulse-chase setup and provides information about the rate of substrate dissociation from the active site of RNAP. A global analysis of the concentration series and EDTA quench experiments (i) allowed the estimation of the KD for GTP dissociation from the active site and (ii) suggested that the KD for the dissociation of the 2'dGTP from the active site approximately equals the Km for 2'dGMP incorporation (see Supplementary Note). We further used inferred values of kcat and KD to compare the capabilities of the variant RNAPs to discriminate against 2'dGTP (Fig. 2). WT RNAP displayed ~60-fold higher affinity for GTP than for 2'dGTP (Fig. 2a, Table 1). The β'R425K and β'Q929M substitutions decreased the selectivity at the binding step 17- and 30-fold, respectively, largely by decreasing the affinity for GTP. In contrast, the β'N458S decreased the selectivity only 4-fold, whereas the β'M932A increased the selectivity 2.5-fold. At saturating substrate concentrations, the WT RNAP incorporated GMP ~50-fold faster than the 2'dGMP (Fig. 2b, Table 1). The β'R425K substitution decreased the selectivity 25-fold, primarily by accelerating the incorporation of the 2'dGMP.
In comparison, the effects of other substitutions on the selectivity against the 2'dGTP at the incorporation step were relatively small (Fig. 2b, Supplementary Tables 3, 4) . The β'M932A decreased the selectivity 3-fold, whereas the β'N458S and β'Q929M increased the selectivity 1.5-and 4-fold, respectively. Noteworthy, the β'Q929M decreased the rate of 2'dGMP incorporation 10-fold. Overall, these experiments suggested that the β'Arg425 plays a central role in the discrimination against 2'-deoxy substrates ( Table 1) : the β'Arg425 selectively facilitated binding of GTP and selectively inhibited the incorporation of 2'dGMP. In contrast, the role of the β'Gln929 was complex: while the β'Gln929 selectively facilitated the binding of GTP, it also selectively facilitated the incorporation of 2'dGMP. The time-resolved SNA assays described above are superior to any other currently available techniques for the quantitative assessment of the binding and incorporation of different substrates and the effects of active site residues therein. However, these assays have several limitations: the nucleotide incorporation was measured for static complexes stabilized in the post-translocated state by the artificially limited RNA:DNA complementarity and the effects are assessed only at a single, easy to transcribe, sequence position. To test if the conclusions drawn from the SNA assay remain valid during processive transcript elongation we developed a semiquantitative assay as follows. TECs were assembled on a nucleic acid scaffold with a 49 bp-long downstream DNA and chased with NTP mixtures containing 50 µM ATP, CTP, UTP and GTP or 2'dGTP for 2 min at 25°C. Transcription with the 2'dGTP by the WT RNAP resulted in characteristic pauses at each sequence position preceding the incorporation of the 2'dGMP ( Fig. 3 , pre-G sites). We used the amplitude of these accumulations as a semi-quantitative measure of the ability of the RNAP to utilize 2'dGTP. 
Notably, the interpretation of the processive transcription by some variant RNAPs was complicated by enhanced pausing after the incorporation of cytosine (Fig. 3b, at-C sites) and 2'dGMP (Fig. 3b, at-G sites) in certain sequence contexts. However, these additional pauses were unrelated to the utilization of the 2'dGTP as a substrate and could be disregarded when comparing pre-G pauses that occurred upstream of all at-C and at-G pauses. In contrast to the WT RNAP, the β'R425K did not pause prior to the incorporation of the 2'dGMP (Fig. 3), consistent with the significantly higher 2'dGMP incorporation rate observed in SNA assays (Fig. 2b). Moreover, the β'R425L RNAP also did not accumulate at the pre-G sites despite being strongly defective during processive transcription (Fig. 3a, Supplementary Fig. 4). These data suggest that the loss of selectivity in the β'R425K is attributable to the absence of the β'R425 rather than to a gain-of-function effect of the Lys residue at the corresponding position. The β'M932A paused noticeably less, whereas the β'Q929M paused noticeably more, than the WT RNAP at the pre-G sites (Fig. 3a, Supplementary Fig. 4), consistent with the 2-fold higher (β'M932A) and 10-fold lower (β'Q929M) kcat for the 2'dGMP incorporation in the SNA experiments (Fig. 2b, Table 1). In contrast, the β'N458S was largely indistinguishable from the WT RNAP in its ability to utilize the 2'dGTP in the processive transcription assay (Fig. 3, Supplementary Fig. 4), presumably because this assay is not sensitive enough to resolve the ~1.5-fold difference in kcat for the 2'dGMP incorporation (Fig. 2b, Supplementary Tables 3, 4). Overall, the analysis of the utilization of the 2'dGTP during the processive transcription of diverse sequences by the WT and variant RNAPs recapitulated the major effects observed in the SNA experiments.
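The fold values reported for the SNA assays can be combined into an approximate overall selectivity, assuming that discrimination at the binding step (ratio of KD values) and at the incorporation step (ratio of kcat values) multiply, as they do for a kcat/KD specificity ratio. The sketch below is only a rough bookkeeping aid built from the rounded fold changes quoted in the text; the class and function names are hypothetical, and the variant values are derived arithmetic rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selectivity:
    """Fold preference for GTP over 2'dGTP at each step (rounded values from the text)."""
    binding: float        # KD(2'dGTP) / KD(GTP)
    incorporation: float  # kcat(GMP) / kcat(2'dGMP)

    @property
    def overall(self) -> float:
        # Assumption: the two steps contribute multiplicatively (kcat/KD ratio).
        return self.binding * self.incorporation


def apply_substitution(wt: Selectivity, binding_decrease: float,
                       incorporation_decrease: float) -> Selectivity:
    """Scale the WT selectivities by the reported fold decreases at each step."""
    return Selectivity(wt.binding / binding_decrease,
                       wt.incorporation / incorporation_decrease)


# WT: ~60-fold at binding, ~50-fold at incorporation.
WT = Selectivity(binding=60.0, incorporation=50.0)
# beta'R425K: selectivity decreased 17-fold at binding and 25-fold at incorporation.
R425K = apply_substitution(WT, binding_decrease=17.0, incorporation_decrease=25.0)

print(f"WT overall selectivity:    ~{WT.overall:.0f}-fold")
print(f"R425K overall selectivity: ~{R425K.overall:.1f}-fold")
```

Under this assumption, the WT product of ~3000-fold falls within the 500- to 5000-fold overall selectivity range cited in the introduction.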
Next, we tested the effects of the β'R425K, β'M932A, β'Q929M and β'N458S substitutions on utilization of 2'dATP, 2'dCTP and 2'dUTP during processive transcription (Supplementary Figs. 5-7). For each 2'dNTP, we custom-designed a template where the 2'dNTP is incorporated several times early in transcription, thereby allowing unambiguous interpretation of the accumulation of RNAPs at sites preceding the 2'dNMP incorporation. An analysis of the utilization of 2'dATP, 2'dCTP and 2'dUTP largely recapitulated the effects observed for 2'dGTP, except that the β'N458S was markedly inferior to the WT RNAP in utilizing 2'dATP and 2'dUTP. Overall, these data demonstrated that the enhanced or diminished capabilities of the variant RNAPs to utilize 2'dGTP in the SNA assays reflected, in qualitative terms, their capabilities to utilize all four 2'dNTPs. The role of the β'Arg425 in selectively promoting the binding of NTPs was easy to explain because the β'Arg425 interacts with the 2'OH of the NTP analogues in several RNAP structures (Supplementary Table 5, Fig. 1c, 4a, b). In contrast, the observation that the β'Arg425 selectively inhibited the incorporation of 2'dNTPs could not be readily explained: our results show that the β'Arg425 substitutions promote the incorporation of the substrate that lacks the 2'OH group, with which the β'Arg425 would otherwise interact. We hypothesized that, in the absence of the 2'OH, the β'Arg425 interacted with another group and that this interaction slowed down the incorporation of 2'dNMPs into the nascent RNA. We further reasoned that the 3'OH group of the 2'dNTP was the most likely interacting partner of the β'Arg425, an inference supported by MD simulations of S. cerevisiae RNAPII 27. However, the 3'OH group is positioned too far from the β'Arg425 when the sugar moiety is in the 3'-endo conformation (Supplementary Table 5).
We further hypothesized and demonstrated by in silico docking experiments that the 3'OH could move to within the hydrogen bond distance of the β'Arg425 if the deoxyribose moiety adopted a 2'-endo conformation (Supplementary Fig. 8). To test this hypothesis in crystallo, we solved the X-ray crystal structure of the initially transcribing complex containing T. thermophilus RNAP, DNA and a 3-nt RNA primer with an incoming 2'dCTP bound at the active site at 3.14 Å resolution. The structure displayed a well-resolved electron density of the 2'dCTP and the β'Arg425 closely approaching the deoxyribose moiety (Fig. 4c, d, Table 2, Supplementary Fig. 9a, Supplementary Data 4). The 2'dCTP was observed in the pre-insertion conformation, which was unsuitable for catalysis because the α-phosphate was located 5.7 Å from the 3'OH of the RNA primer. The electron density was consistent with the interaction between the β'Arg425 and the 3'OH group of the deoxyribose in the 2'-endo conformation, in agreement with the results of in silico docking. Interestingly, the density for the metal ion complexed by the β- and γ-phosphates of the 2'dCTP was weak and the coordination distances were longer than typically observed for Mg2+ in the corresponding position. We modeled this metal ion as Na+ rather than Mg2+, similarly to what has been proposed for DNA polymerase β 32. The TL was completely unfolded in the structure of the initially transcribing complex with the 2'dCTP, in contrast to a partially helical conformation typically observed in the structures of the RNAP complexes with non-hydrolysable NTP analogues (Supplementary Table 5). To test if the unavailability of the 3'OH group was indeed responsible for the destabilization of the TL folding, we solved the X-ray crystal structure of the initially transcribing complex of the T. thermophilus RNAP with a 3'dCTP at 3.0 Å resolution.
The structure displayed a well-resolved density of the 3'dCTP and the β'Arg425 closely approaching the 3'-deoxyribose moiety (Fig. 4e, f, Supplementary Fig. 9b, Supplementary Data 6). The 3'dCTP was in the pre-insertion conformation, which was unsuitable for catalysis because the α-phosphate was located 5.6 Å away from the 3'OH of the RNA primer. The overall pose of the 3'dCTP was similar to that of cytidine-5'-[(α,β)-methyleno]triphosphate (CMPCPP): the 3'-deoxyribose adopted a 3'-endo conformation and the 2'OH group interacted with the β'Arg425. However, the TL was completely unfolded, supporting our hypothesis that the unavailability of the 3'OH group was alone sufficient to significantly destabilize the folding of the first helical turn of the TL. Overall, the comparative analysis of RNAP structures with CMPCPP, 2'dCTP and 3'dCTP suggested that the β'Arg425 inhibited the incorporation of 2'dNTPs by interacting with their 3'OH group and favoring the 2'-endo conformation of the deoxyribose moiety. At the same time, the structures did not provide a decisive answer as to why the 2'-endo conformations of 2'dNTPs were less suitable for incorporation into RNA than the 3'-endo conformations. The X-ray structures and in silico modeling experiments suggested that interactions between the 3'OH of the deoxyribose moiety and the β'Arg425 or the β'Gln929 were mutually exclusive. Accordingly, the β'Arg425 could inhibit the incorporation of the 2'dNMP solely by slowing down the initial steps of the TL folding, by sequestering the 3'OH group and preventing its interaction with the β'Gln929 of the TL. To test this hypothesis, we determined the incorporation rate of 3'dGMP by the WT RNAP (Supplementary Fig. 3c). We found that the kcat for 3'dGMP incorporation was only 5-fold lower than the kcat for GMP incorporation and 10-fold higher than the kcat for 2'dGMP incorporation (Table 1).
These data demonstrated that the sequestration of the 3'OH group accounted for no more than a 5-fold inhibition of the 2'dGMP incorporation by the β'Arg425. The remaining 10-fold inhibition of the overall 50-fold inhibitory effect was contributed by some other features of the 2'-endo binding pose, as discussed below. In this study we performed a systematic analysis of the role of the amino acid residues in the active site of the multi-subunit RNAP in selecting NTPs over 2'dNTPs. We identified a conserved Arg residue, β'Arg425 (E. coli RNAP numbering) as the major determinant of the sugar selectivity. The β'Arg425 favored binding of GTP over 2'dGTP and selectively inhibited the incorporation of 2'dNMPs into RNA (Figs. 2, 3, Table 1 ). The enhancement of NTP binding by the β'Arg425 is consistent with the observation that the β'Arg425 is positioned to hydrogen bond with the 2'OH of the NTP substrate analogues in several RNAP structures (Supplementary Table 5 ) and with MD simulations of the S. cerevisiae RNAPII 27 . However, the existing data fail to explain the inhibition of the 2'dNTP incorporation by the β'Arg425. In search of an explanation, we performed in silico docking experiments and solved the X-ray crystal structures of the initially transcribing T. thermophilus RNAP with the cognate 2'dCTP and 3'dCTP. These experiments revealed that the β'Arg425 interacts with the 3'OH group of the 2'dNTP substrate and favors the 2'-endo conformation of the deoxyribose ( Fig. 4c, d, Supplementary Fig. 8e, f) . In contrast, the ribose of the cognate NTP substrate is stabilized in the 3'-endo conformation by multiple polar contacts and hydrogen bonds with the active site residues: β'Arg425, β'Asn458 and β'Gln929 (Figs. 1c, 4a, b, Supplementary Fig. 8a , b, We next considered whether the deformation of the 2'dNTP substrate, repositioning of the β'Arg425 or both were behind the slow incorporation of the 2'dNMPs by the WT RNAP. 
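The partitioning of the ~50-fold inhibitory effect into a 3'OH-sequestration component and a remainder follows from treating the contributions as multiplicative. A minimal sketch of that arithmetic, using relative kcat values normalized to GMP incorporation (rounded from the fold differences quoted in the text; the dictionary and function names are hypothetical):

```python
def residual_inhibition(total_fold: float, component_fold: float) -> float:
    """Fold inhibition left unexplained after removing one multiplicative component."""
    if total_fold <= 0 or component_fold <= 0:
        raise ValueError("fold values must be positive")
    return total_fold / component_fold


# Relative kcat values, normalized to GMP = 1 (rounded fold differences from the text).
RELATIVE_KCAT = {"GMP": 1.0, "3'dGMP": 1.0 / 5.0, "2'dGMP": 1.0 / 50.0}

# Overall inhibition of 2'dGMP incorporation relative to GMP (~50-fold).
total_fold = RELATIVE_KCAT["GMP"] / RELATIVE_KCAT["2'dGMP"]
# Upper bound on the 3'OH-sequestration component, taken from the 3'dGMP rate (~5-fold).
sequestration_fold = RELATIVE_KCAT["GMP"] / RELATIVE_KCAT["3'dGMP"]
# Remainder attributed to other features of the 2'-endo binding pose (~10-fold).
remaining_fold = residual_inhibition(total_fold, sequestration_fold)

print(f"total ~{total_fold:.0f}-fold = "
      f"sequestration ~{sequestration_fold:.0f}-fold x remainder ~{remaining_fold:.0f}-fold")
```

The multiplicative treatment corresponds to additive contributions to the activation free energy; it is an assumption of this sketch rather than a result established by the measurements.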
A hybrid quantum and molecular mechanics (QM/MM) analysis of nucleotide incorporation by the S. cerevisiae RNAPII suggested that repositioning of the β'Arg425 by the 2'dNTP substrate may increase the activation energy barrier for the nucleotide addition reaction 27 . However, a comparison of the RNAP structures with bound CMPCPP, 2'dCTP and 3'dCTP revealed very small changes in the conformation of the β'Arg425 (Fig. 4) . Similarly, a survey of the published X-ray and CryoEM structures revealed that the β'Arg425 occupies approximately the same volume irrespective of the presence or absence of the active site ligands (Supplementary Table 5 ). Accordingly, we reasoned that the preferential selection of the catalytically-inert 2'-endo conformers of 2'dNTPs and the deformation of the catalytically-labile 3'-endo conformers of 2'dNTPs by β'Arg425 were likely the major factors behind the slow incorporation of the 2'dNMPs. However, it remained unclear why the 2'-endo conformers of the substrates were less suitable for the incorporation than the 3'-endo conformers. We first explored the possibility that the sequestration of the 3'OH group by the β'Arg425 makes it unavailable for the interaction with the β'Gln929 of the TL (Fig. 4c, d, Supplementary Fig. 8e, f) , thereby destabilizing the TL-mediated closure of the active site. It is well established that the closure of the active site by two helical turns of the TL accelerates the catalysis of nucleotide incorporation by ~10 4 -fold 17,21,23,24 . Noteworthy, the TL is partially folded in most structures with ribonucleotide substrate analogues (Fig. 4a, Table 5 ) 17-19,20 , yet was completely unfolded in the structures we obtained with either a 2'dCTP ( Fig. 4c, d) or a 3'dCTP ( Fig. 4e, f) . Given that the 3'dCTP was in the conventional 3'-endo conformation, the latter result suggested that the unavailability of the 3'OH group was sufficient to significantly impair the folding of the TL and slow down the catalysis by the T. 
thermophilus RNAP in crystallo. To quantitatively estimate the contribution of the 3'OH interactions to the catalysis, we determined the rate of the 3'dGMP incorporation by E. coli RNAP. We found that the rate of the 3'dGMP incorporation was 5-fold slower than the rate of the GMP incorporation, but 10-fold faster than that of the 2'dGMP ( Table 1 , Supplementary Fig. 3c) . These results suggested that the sequestration of the 3'OH group by the β'Arg425 could account for no more than a 5-fold out of its 50-fold overall inhibitory effect. Notably, the T. thermophilus RNAP also incorporates 3'dNMPs faster than 2'dNMPs but discriminates against both types of substrates ~40-fold stronger than the E. coli RNAP 23 . Similarly, the effects of the β'Q929M substitution were inconsistent with the idea that the 3'OH capture by β'Arg425 could alone account for the slow rate of the 2'dNMP incorporation. If that were true, the β'Q929M variant should be relatively insensitive to the absence of the 2'OH group. However, the opposite was true: β'Q929M was only twofold slower in incorporating GMP than the WT RNAP, but tenfold slower in incorporating 2'dGMP. We propose that the β'Gln929 competes with the β'Arg425 for the 3'OH group of the 2'dNTP substrate: the β'Arg425 favors the catalytically-inert 2'-endo conformer (Fig. 4c, d, Supplementary Fig. 8e, f) , whereas the β'Gln929 favors the catalytically-labile 3'-endo conformer (Supplementary Fig. 8c, d) . As a result, the β'Gln929 is more important during the incorporation of 2'dNMPs than NMPs. Since the TL folding can account only for a fraction of the inhibitory effect, what other factors make the 2'-endo conformers of 2'dNTPs catalytically inert? It is noteworthy that the sugars of the attacking and substrate nucleotides adopt the 3'-endo conformation in all RNAPs and DNAPs during the nucleotide incorporation 4 . 
In other words, even the 3' ends of DNA primers adopt the 3'-endo conformation to catalyze the incorporation of the 2'dNMPs into the DNA. Apparently, the A-form geometry is much better suited for the catalysis of the nucleotide condensation than the B-form geometry 3,33. The better accessibility of the nucleophilic 3'OH group of the attacking nucleotide is likely the primary reason. The substrate then adopts the 3'-endo conformation to match the overall geometry of the A-form duplex and to avoid clashes with the attacking nucleotide 4. In general terms, the inertness of the 2'-endo conformation of the 2'dNTPs can be partially attributed to the differences in the conformations of the triphosphate moieties that in turn originate from the differences in the bond angles at C'4 of the sugar between the 3'- and 2'-endo conformers (Fig. 1b). We term this inhibitory component C'4-geometry-dependent effects. However, in our view, it is impossible to further refine this hypothesis at present: (i) the resolutions of the structures are not very high (≥ 2.9 Å, Supplementary Table 5). Notably, the conserved Arg is one of only five catalytic residues that are conserved in the superfamily of the so-called "two-β-barrel" RNAPs 34,35, which includes the multi-subunit RNAPs and very distantly related cellular RNA-dependent RNAPs (RdRps) involved in RNA interference (Supplementary Fig. 10). Accordingly, the common ancestor of the two-β-barrel RNAPs could conceivably discriminate against 2'dNTPs and therefore likely evolved in the presence of both NTPs and 2'dNTPs. This inference lends credence to the hypothesis that proteins evolved in primordial lifeforms that already possessed both RNA and DNA 36,37. Viral RdRps (members of the so-called "right-hand" superfamily of nucleic acid polymerases) are not homologous to multi-subunit RNAPs but share some elements of their sugar selection strategies.
It appears that the 3'OH of the substrate NTP facilitates the active site closure in both classes of enzymes. In multi-subunit RNAPs, 3'OH facilitates the TL folding via the interaction with β'Gln929 19 , whereas in viral RdRps, 3'OH initiates the closure by sterically clashing with Asp238 (poliovirus RdRp numbering) 38 . In both classes of enzymes, 2'dNTPs adopt a 2'-endo pose wherein the 3'OH is misplaced and cannot readily facilitate the closure of the active site, explaining low reactivities of 2'dNTPs. However, 3'dNTPs are better substrates than 2'dNTPs also for viral RdRps 30 suggesting that the low reactivity of the 2'-endo 2'dNTPs additionally relies on C'4-geometry-dependent effects (see above), which lead to a suboptimal conformation of the triphosphate moiety, a suboptimal geometry of the transition state, or both. Multi-subunit RNAPs and viral RdRps converged on using the 2'-endo binding pose to discriminate against 2'dNTPs. In doing so these enzymes accentuate the intrinsic preferences of 2'dNTPs to retain the inert 2'-endo conformation upon binding to the A-form template in the non-enzymatic system 3 In summary, our data show that a universally conserved Arg residue plays a central role in selecting NTPs over 2'dNTPs by the multi-subunit RNAPs. When NTP binds in the RNAP active site, its ribose adopts the 3'-endo conformation that positions the 3'OH group to interact with the universally conserved Gln residue of the TL domain and promotes the closure of the active site, whereas the triphosphate moiety can undergo rapid isomerization into the insertion conformation leading to efficient catalysis. The interaction of the conserved Arg residue with the 2'OH of the NTPs selectively enhances their binding more than 100-fold and renders RNAP saturated with NTPs in the physiological concentration range. 
In contrast, the interaction of the conserved Arg with the 3'OH of the 2'dNTP substrates shapes their deoxyribose moiety into the catalytically inert 2'-endo conformation where the 3'OH cannot promote closure of the active site and substrate incorporation is additionally inhibited by the unfavorable geometry of the triphosphate moiety. The deformative action of the conserved Arg on the 2'dNTP substrates is an elegant example of active selection against a substrate that is a substructure of the correct substrate. DNA and RNA oligonucleotides were purchased from Eurofins Genomics GmbH (Ebersberg, Germany) and IBA Biotech (Göttingen, Germany). DNA oligonucleotides and RNA primers are listed in Supplementary Table 1. Our initial docking trials revealed that docking of nucleoside monophosphates produced the most robust and quantitatively interpretable results. Specifically, the docking algorithm failed to recover templated poses for nucleosides without phosphate groups. The docking algorithm also failed to position the triphosphate moiety to coordinate metal ion number two and instead attempted to maximize its contacts with the protein. As a result, the recovered conformations of the triphosphate moieties differed from those observed in crystal structures. Considering the high impact of the triphosphate moiety on the ligand binding score and our assessment that the triphosphate moiety was docked incorrectly, we opted to limit the systematic investigation of the interaction between RNAP and the sugar moieties of nucleosides to docking nucleoside monophosphates. We first docked 3'-endo CMP, 3'-endo 2'dCMP and 2'-endo 2'dCMP to the RNAP fragment 1. The docking algorithm recovered high-scoring poses (-7.8 ±0.1 kcal/mol) for CMP in 10 out of 10 runs, and lower-scoring poses (-6.8 ±0.2 kcal/mol) for 3'-endo 2'dCMP in 8 out of 10 runs and for 2'-endo 2'dCMP in 5 out of 10 runs.
The β'Arg425 side chain was kept flexible in the latter case because our manual assessment suggested that a sub-angstrom repositioning of β'Arg425 would be needed to accommodate the 2'-endo deoxyribose. We then fixed the β'Arg425 Table 6 ). These in silico experiments suggested that the semi-closed active site can bind the 3'-endo and 2'-endo 2'dCMP with similar affinities. The 3'OH of the 3'-endo 2'dCMP was positioned to interact with β'Gln929 and β'Asn458, whereas the 3'OH of the 2'-endo 2'dCMP was positioned to interact with β'Arg425 and β'Asn458 ( Supplementary Fig. 8) . We further inferred that the open active site should have a preference for the 2'-endo 2'dCMP because β'Gln929 is not positioned to interact with the 3'OH of the substrate in the open active site. Well in line with our prediction, the X-ray diffraction data for the crystals of the RNAP-2'dCTP complex were consistent with the 2'-endo conformation of the 2'dCTP bound in the open active site (Fig. 4b, c, Supplementary Fig. 9a) . We further verified the binding preferences of the open active site in silico by removing the 2'dCTP from the model and docking alternative conformers of the 2'dCMP. The docking algorithm recovered higher-scoring poses (-6.6 ±0.1 kcal/mol) for the 2'-endo 2'dCMP in 9 out of 10 runs and lower-scoring poses (-6.3 ±0.1 kcal/mol) for the 3'-endo 2'dCMP in only 2 out of 10 runs (Supplementary Table 6 ).
The non-template DNA strand (5'-TATAATGGGAGCTGTCACGGATGCAGG-3') was annealed to the template DNA strand (5'-CCTGCATCCGTGAGTGCAGCCA-3') in 40 µl of 10 mM Tris-HCl (pH 8.0), 50 mM NaCl, and 1 mM EDTA to the final concentration of 1 mM. The solution was heated at 95 °C for 10 min and then gradually cooled to 22 °C.
The crystals of the RNAP and promoter DNA complex were prepared as described previously The X-ray datasets were collected at the Macromolecular Diffraction at the Cornell High Energy Synchrotron Source (MacCHESS) F1 beamline (Cornell University, Ithaca, NY) and structures were determined as previously described 51,52 using the following crystallographic software: The reaction products were modelled as sums of independent contributions by the fast and slow fractions of RNAP using numerical integration capabilities of the KinTek Explorer software. Contributions of each fraction were modeled as Scheme 1. Upper and lower bounds of the parameters were calculated at a 10% increase in Chi 2 . Table 1 and Supplementary Tables 1-2 . Error bars are ranges of duplicate measurements or SDs of the best-fit parameters, whichever values were larger. a TECs were assembled using the scaffold shown above the gel panels and chased with 50 µM ATP, CTP, UTP and GTP or 2'dGTP for 2 min at 25°C. The positions of GMPs in the resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. 16-bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using RGB palette on the right. b Lane profiles of transcription in all-NTPs and 2'dGTP chases by the wild-type and β'R425K RNAPs quantified from gels in (a). Traces were manually aligned along the X-axis and scaled along the Y-axis using several sequence positions as references. Magenta numbers are interatomic distances in Å. Panels (a) and (b) were prepared using PDB ID coli RNAPs. a TECs were assembled using the scaffold shown above the gel panels and chased with 50 µM ATP, CTP, UTP and GTP or 2'dGTP for 2 min at 25°C. The positions of GMPs in the resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. 
16-bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using the RGB palette on the right. b Lane profiles of transcription in all-NTPs and 2'dGTP chases by the wild-type and β'R425K RNAPs quantified from gels in (a). Traces were manually aligned along the X-axis and scaled along the Y-axis using several sequence positions as references.
Supplementary Fig. 4 : Utilization of 2'dUTP and 2'dATP during the processive transcript elongation by the WT and variant RNAPs. a TECs were assembled using the scaffold shown above the gel panels and chased with 100 µM CTP, GTP, UTP, ATP (all-NTPs-chase), or CTP, GTP, ATP, 2'dUTP (2'dUTP-chase), or CTP, GTP, UTP, 2'dATP (2'dATP-chase) for 5 min at 25°C. The positions of UMPs or AMPs in resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. 16-bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using the RGB palette on the right. b Lane profiles of transcription by the WT (cyan) and β'R425K (magenta) RNAPs quantified from gels in (a). Traces were manually aligned along the X-axis and scaled along the Y-axis using several sequence positions as references.
Supplementary Fig. 5 : Lane profiles of transcription in all-NTPs and 2'dGTP chases quantified from gels in main text Figure 3 .
Fig. 6 : Lane profiles of transcription in all-NTPs, 2'dUTP and 2'dATP chases quantified from gels shown in Supplementary Figure 4 .
Fig. 7 . Utilization of 2'dCTP during the processive transcript elongation by the WT and variant RNAPs. a TECs were assembled using the scaffold shown above gel panels and chased with 100 µM GTP, UTP, ATP and CTP (all-NTPs chase) or 2'dCTP (2'dCTP chase) for 2 min at 25°C. b Lane profiles of transcription by the WT (cyan) and β'R425K (magenta) RNAPs quantified from gels
Supplementary Table 1 . DNA oligonucleotides and RNA primers used in this study.
We used time-resolved single nucleotide addition experiments to estimate the equilibrium constant for GTP, 2'dGTP and 3'dGTP binding and dissociation in the active site of RNAP and to determine the first order rate constant (also known as the turnover number) for the incorporation of GMP, 2'dGMP and 3'dGMP into the nascent RNA. The TECs were assembled on synthetic nucleic acid scaffolds and contained the fully complementary transcription bubble flanked by 20-nucleotide DNA duplexes upstream and downstream (Supplementary Fig. 1a) . The annealing region of a 16-nucleotide RNA primer was initially 9 nucleotides, permitting the TEC extended by one nucleotide to adopt the post- and pre-translocated states, but disfavoring backtracking. The RNA primer was 5' labeled with the infrared fluorophore ATTO680 to monitor the RNA extension by denaturing PAGE. To facilitate the rapid acquisition of kinetic data (see below), the template DNA strand contained a fluorescent base analogue 6-methyl-isoxanthopterin (6-MI) eight nucleotides upstream from the RNA 3' end. 6-MI allowed the monitoring of RNAP translocation along the DNA following nucleotide incorporation (Supplementary Fig. 1a) .
We first measured concentration series of GMP and 2'dGMP incorporation by the wild-type and altered RNAPs using a time-resolved fluorescence assay performed in a stopped-flow instrument (Supplementary Figs. 1-3) . We used the translocation assay because it allowed the rapid acquisition of concentration series, whereas measurements of concentration series by monitoring RNA extension in a rapid chemical quench-flow setup would be considerably more laborious. We then performed a preliminary data analysis by fitting each fluorescence timetrace to a single exponential function followed by fitting the resulting individual rates to a Michaelis equation. The inferred kcat and Km generally supported all major conclusions reported in this study.
However, we proceeded to expand the datasets by including additional data and developed more elaborate analysis routines. The first reason to invoke a more elaborate analysis was the observation that most fluorescence timetraces in our datasets fitted poorly to the single exponential function. In fact, the underlying physics of a single turnover enzymatic reaction suggests that individual timetraces in the concentration series should, in a general case, be poorly described by a single exponential function (see below). The second reason to invoke a more elaborate analysis was the concern that the Michaelis constant is a lumped constant that contains a sum of the catalytic and substrate dissociation rates in the numerator and the substrate binding rate in the denominator, whereas the equilibrium binding constants are the ratios of the substrate dissociation and binding rates. Accordingly, we were concerned that comparing the Michaelis constants of reactions could potentially lead to erroneous conclusions in the cases where the Km was markedly different from the KD. For the sake of understanding our analysis workflow, it is important to acknowledge that each reaction timetrace in the concentration series describes a single turnover process: we designed the transcribed sequence so that only a single GMP (or 2'dGMP or 3'dGMP) became incorporated upon the addition of GTP (or 2'dGTP or 3'dGTP). The ease of obtaining single turnover timetraces is a significant analytical advantage natively associated with template-dependent nucleic acid polymerases. It is often possible to infer more parameters from concentration series of single-turnover reactions than from concentration series of classic multi-turnover enzymatic reactions. Next, most timetraces in the concentration series are not expected to fit a single exponential function even in the case of the simple signal, a 1-nt extended nascent RNA (RNA17 in this study).
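The distinction between Km and KD drawn above can be illustrated with a minimal sketch for a simple bind-then-react scheme (E + S ⇌ ES → product). The function names and all numerical rate constants below are hypothetical and were chosen only for illustration; they are not values measured in this study.

```python
def michaelis_constant(kcat, kon, koff):
    """Km = (kcat + koff) / kon for E + S <-> ES -> product."""
    return (kcat + koff) / kon


def dissociation_constant(kon, koff):
    """KD = koff / kon, the equilibrium binding constant."""
    return koff / kon


# Hypothetical rate constants (kon in uM^-1 s^-1, kcat and koff in s^-1).
examples = {
    "slow chemistry (koff >> kcat)": dict(kcat=0.5, kon=1.0, koff=500.0),
    "fast chemistry (kcat >> koff)": dict(kcat=30.0, kon=1.0, koff=3.0),
}

for label, p in examples.items():
    km = michaelis_constant(p["kcat"], p["kon"], p["koff"])
    kd = dissociation_constant(p["kon"], p["koff"])
    # Km / KD = 1 + kcat / koff, so Km approaches KD only when koff >> kcat.
    print(f"{label}: Km = {km:.1f} uM, KD = {kd:.1f} uM, Km/KD = {km / kd:.3f}")
```

In this sketch Km/KD equals 1 + kcat/koff, so the two constants coincide only when the substrate dissociates much faster than it is incorporated.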
The enzymatic reaction is minimally a two-step sequential reaction that consists of the that was employed by Prajapati et al. Next, kinetic heterogeneity in the TEC preparations introduced an additional level of complexity to the fitting of the data. We reported previously that the vast majority of TECs contain 5-20% of a slow fraction that manifests itself as a slow phase in reaction timetraces of both the fluorescence signal (stopped-flow assay) and the extended RNA (quench-flow assay) 29,56 . In the case of fast reactions measured in this study (GTP, 3'dGTP data), the rates of the fast and slow phases differed approximately tenfold and therefore the phases could be precisely resolved (see a dedicated section below). Importantly, the fast phase of the reaction constituted 80-90% of the signal amplitude ( Table 1, Supplementary Table 3) . Accordingly, we considered the activity of the fast fraction as a representative measure of the RNAP activity in each experiment and disregarded the minor slow fraction when comparing the wild-type and variant RNAPs (Fig. 2) . In the case of slow reactions (2'dGTP data), the fast and the slow phases were not well separated (4-fold difference in rates, When fitting data to Equation 1, each timetrace was described by a stretched exponential function (an empirical function that is often used to describe heterogeneous systems 57 ). At the same time, the exponent followed the hyperbolic dependence on the 2'dGTP concentration ( Supplementary Figs. 1c, 3) . Such fits described the data well and gave three parameters: a reaction rate constant (k), a stretching parameter (β) and the Michaelis constant (Km). When a stretched exponential function is applied to a process where the reactivity changes over time (or distance), the rate constant parameter (k) corresponds to the initial reaction rate constant.
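One functional form consistent with this description is sketched below. Equation 1 is not reproduced in this section, so the exact placement of β, the function names and the example numbers are assumptions made for illustration only: each timetrace is taken as 1 - exp(-(k_obs·t)^β), with k_obs depending hyperbolically on the substrate concentration.

```python
import math


def observed_rate(k, km, conc):
    """Hyperbolic dependence of the rate on substrate concentration (assumed form)."""
    return k * conc / (km + conc)


def stretched_exponential(t, k_obs, beta):
    """Fraction of the signal amplitude reached at time t (assumed form)."""
    return 1.0 - math.exp(-((k_obs * t) ** beta))


def half_time(k_obs, beta):
    """Time at which the assumed stretched exponential reaches half amplitude."""
    return math.log(2.0) ** (1.0 / beta) / k_obs


# Hypothetical parameters: k in s^-1, Km and concentrations in uM.
k, km, beta = 1.0, 200.0, 0.7
for conc in (50.0, 200.0, 2000.0):
    k_obs = observed_rate(k, km, conc)
    t_half = half_time(k_obs, beta)
    print(f"[S] = {conc:6.0f} uM  k_obs = {k_obs:.3f} s^-1  half-time = {t_half:.3f} s")
```

With β = 1 the function reduces to a single exponential; values of β below 1 stretch the tail of the curve, which is how such a fit can absorb heterogeneity in the sample.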
In our case, the stretched exponential fit potentially absorbed both temporal and structural heterogeneity as well as the deviations from the single exponential behavior caused by the sequential nature of the enzymatic reaction (see above). For this reason, the rate parameter (k) did not have an easily interpretable meaning. To circumvent this problem we calculated the median reaction time as (median reaction time) = (ln(2)^(1/β)) / k; then calculated the median reaction rate assuming that (median reaction rate) = ln(2) / (median reaction time) and used the median reaction rate as a measure when comparing the wild-type and variant RNAPs (Fig. 2, Supplementary Table 4) . Next, fitting the data to Equation 1 gives the Km rather than the KD. However, it is rather certain that koff >> kcat for all 2'dGMP incorporation reactions ( Table 1 , also see Scenario 2 below). If so, Km approximately equals KD for each 2'dGMP incorporation reaction. Accordingly, we used Km in place of KD for 2'dGMP addition reactions when comparing substrates and RNAPs (Fig. 2) . Finally, we emphasize that the 2'dGMP incorporation data by the wild-type and the β'R425K RNAPs were fit to both Scheme 1 and Equation 1 leading to affinities for 2'dGTP that were indistinguishable within the margin of the experimental uncertainty (compare 2'dGTP data in Table 1 and Supplementary Table 4 ). The catalytic activity of the wild-type RNAP towards 2'dGTP inferred by fitting the data to Equation 1 was, as expected, in-between the catalytic activities of the fast and slow fraction inferred by fitting the data to Scheme 1. Accordingly, we argue that the employment of different analysis routines for GTP and 2'dGTP is of little concern for the main inferences drawn in this study. We have previously shown that the nucleotide addition and the subsequent translocation along the DNA by the wild-type E. coli RNAP occur with similar rates at saturating concentrations of cognate NTPs 29 . 
As a result, (i) the translocation timetraces are delayed by a few milliseconds relative to the nucleotide addition timecurves and (ii) the translocation timetraces at saturating concentrations of cognate NTP substrates are not well described by a single exponential function because both nucleotide addition and translocation are partially rate limiting. In this study, translocation rates were tangential to the main line of investigation, but they were necessary parameters during the global fitting of the fluorescence timetraces and GMP incorporation timecurves to Scheme 1. At the same time, the translocation rates are much faster than the 2'dGMP incorporation rates and could be completely disregarded during the analysis of the 2'dGTP concentration series by fitting the data to Scheme 1 or Equation 1. Supplementary Table 3 should not be equated with the forward translocation rates. Thus, we modeled translocation as an irreversible transition in Scheme 1. As a result, the inferred translocation rates are the rates of the system approaching the translocation equilibrium after the nucleotide incorporation rather than the forward translocation rates. Somewhat counterintuitively, but following the rules of formal kinetics, the inferred equilibration rate equals the sum of the forward and the backward translocation rates. It was possible to further split the equilibration rate into the forward and backward translocation rates by assessing the completeness of the translocation, as we did in our previous studies 56 . However, we refrained from doing so in this study because the translocation process was tangential to the main line of the investigation. Fig. 1a) . As a result, both TEC17 and TEC16NTP are detected as TEC17 in the EDTA quenched samples because nearly 100% of the TEC16NTP is converted into TEC17 after the addition of EDTA, and practically no NTP dissociates back into the solution (kcat >> koff).
The above situation corresponds to 2'dGMP addition by the wild-type and variant RNAPs. Fitting the 2'dGTP concentration series to a semi-empirical Equation 1 allowed the estimation of kcat and Km 2'dGTP ≈ KD 2'dGTP for the wild-type, β'R425K, β'M932A, β'Q929M and β'N458S RNAPs Table 4 ). For the β'R425K and the wild-type RNAP we additionally measured the EDTA quench curve, fitted the data globally to Scheme 1 and inferred the lower bounds of kon and koff in addition to KD (Table 1, Supplementary Figs. 1c, 3a) . As always, kcat and Km can be inferred from the NTP concentration series, but neither kcat/Km ≈ kon (as is in Scenario 1) nor Km ≈ KD (as is in Scenario 2). In contrast, the global fit of the NTP concentration series and the EDTA quench data has the best resolving power in Scenario 3: kcat, kon, koff, and ktra (in some cases) can be inferred from the data, though the precision of the individual estimates varies greatly. The above situation corresponds to the GMP addition by the wild-type and variant RNAPs (Supplementary Table 3 , Supplementary Figs. 1b, 2 ) and the 3'dGMP addition by the wild-type RNAP (Table 1, Supplementary Fig. 3c) . Only the wild-type RNAP data allowed for precise estimates of all parameters of Scheme 1. In the case of the β'R425K and β'Q929M for the comparison of the RNAP's capabilities to bind and utilize various substrates (Fig. 2) .
Handling of the slow fraction during fitting to Scheme 1. The timecourses of the NMP incorporation by the wild-type E. coli TEC typically display a distinctive slow phase that represents 5-25% of the overall signal amplitude and has a rate of 0.1-3 s⁻¹. In contrast, the major, fast phase of the reaction is approximately tenfold faster at saturating [NTP] (20-30 s⁻¹ for GTP). The slow phase possibly represents an inactive TEC in equilibrium with the active TEC, a fraction of the TEC that slowly reacts with the NTP substrate or a combination of both.
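As a rough illustration of such a biphasic timecourse, the sketch below sums a fast and a slow exponential phase. The slow fraction and the two rates are hypothetical values picked from within the ranges quoted above, and the plain two-exponential form is an assumption made for illustration rather than the model fitted in this study.

```python
import math


def biphasic_timecourse(t, slow_fraction, k_fast, k_slow):
    """Fraction of extended RNA at time t (s) as a sum of a fast and a slow phase."""
    fast = (1.0 - slow_fraction) * (1.0 - math.exp(-k_fast * t))
    slow = slow_fraction * (1.0 - math.exp(-k_slow * t))
    return fast + slow


# Hypothetical values within the quoted ranges (rates in s^-1).
SLOW_FRACTION = 0.15
K_FAST = 25.0
K_SLOW = 2.5

for t in (0.01, 0.05, 0.2, 1.0):
    fraction = biphasic_timecourse(t, SLOW_FRACTION, K_FAST, K_SLOW)
    print(f"t = {t:4.2f} s  extended fraction = {fraction:.3f}")
```

Because the two rates differ tenfold in this sketch, the fast phase is essentially complete before the slow phase contributes appreciably, which is why the two phases can be resolved in the fast reactions.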
During the fitting of the data using the KinTek Explorer software, the slow phase can be modeled in two ways (Supplementary Note Fig. 1b) . The first option is to invoke a reversible equilibrium between the active and inactive TEC and to introduce a virtual equilibration step prior to mixing of the TEC with the NTPs. We term this approach the reversible inactivation model. The second option is to explicitly model the TEC preparation as two fractions that do not interconvert but incorporate NMP with different rates. The fractions of the slow and fast TEC are then allowed to vary as parameters during the fit. We term this approach the non-equilibrium heterogeneity model. The two models are largely indistinguishable if measurements are carried out at a single [NTP] and both models require two parameters to describe the slow phase: inactivation and recovery rates in the first case, and the slow fraction and its reaction rate in the second case ( Supplementary Note Fig. 1b) . However, the response of the slow phase to the decrease in the [NTP] differs between these two models. The reversible inactivation model predicts that the rate of the slow phase is independent of [NTP] and the slow phase is largely abolished as the [NTP] decreases. In contrast, the non-equilibrium heterogeneity model predicts that the rate of the slow phase decreases in unison with the rate of the fast phase as [NTP] decreases (both follow a hyperbolic dependence on [NTP]). In this study we analyzed all GMP and 3'dGMP incorporation datasets using the non-equilibrium heterogeneity approach to model the slow phase, because some datasets (e.g. β'Q929M, Supplementary Fig. 2) could not be adequately fit by the previously employed reversible inactivation model 28,45,56 .
Fig. 1 : Kinetic analyses of the data. a Simulation and graphic interpretation of the EDTA and HCl quench curves at saturating substrate concentrations and different values of koff. b Simulation of concentration series of a biphasic reaction using the reversible inactivation (left) and non-equilibrium catalytic heterogeneity (right) models.
Origin of life: The RNA world
The antiquity of RNA-based evolution
Activated ribonucleotides undergo a sugar pucker switch upon binding to a single-stranded RNA template
Watching DNA polymerase η make a phosphodiester bond
Physiological concentrations of purines and pyrimidines
Abundant ribonucleotide incorporation into DNA by yeast replicative polymerases
Basic mechanisms of transcript elongation and its regulation
Unlocking the sugar "steric gate" of DNA polymerases
A mutant T7 RNA polymerase as a DNA polymerase
Mechanism of ribose 2'-group discrimination by an RNA polymerase
The structural mechanism of translocation and helicase activity in T7 RNA polymerase
X-ray crystal structures elucidate the nucleotidyl transfer reaction of transcript initiation using two nucleotides
Choosing the right sugar: how polymerases select a nucleotide substrate
Crystal structures of open and closed forms of binary and ternary complexes of the large fragment of Thermus aquaticus DNA polymerase I: structural basis for nucleotide incorporation
KlenTaq polymerase replicates unnatural base pairs by inducing a Watson-Crick geometry
Interactive 3D versions of the structural figures (WebGL in browser):
Supplementary Data 4: Interactive Fig. 4a, b
Supplementary Data 5: Interactive Fig. 4c, d
Supplementary Data 6: Interactive Fig. 4e, f
Supplementary Data 2: Interactive Supplementary Fig. 8c, d
Supplementary Data 3: Interactive Supplementary Fig. 8e, f
Supplementary Data 1: Interactive Supplementary Fig
We thank Irina Artsimovitch for critically reading the manuscript, the staff at the MacCHESS for support of crystallographic data collection, Anssi M. Malinen for constructing plasmids, and Matti Turtola for his contribution to the development of the EDTA quench method.