key: cord-0997347-m285emnw authors: Flynn, Julia M.; Samant, Neha; Schneider-Nachum, Gily; Barkan, David T.; Yilmaz, Nese Kurt; Schiffer, Celia A.; Moquin, Stephanie A.; Dovala, Dustin; Bolon, Daniel N.A. title: Comprehensive fitness landscape of SARS-CoV-2 Mpro reveals insights into viral resistance mechanisms date: 2022-01-30 journal: bioRxiv DOI: 10.1101/2022.01.26.477860 sha: 9efcde23dbcd5dd99a4debc7050c59fa32ee76b9 doc_id: 997347 cord_uid: m285emnw With the continual evolution of new strains of SARS-CoV-2 that are more virulent, transmissible, and able to evade current vaccines, there is an urgent need for effective anti-viral drugs. SARS-CoV-2 main protease (Mpro) is a leading target for drug design due to its conserved and indispensable role in the viral life cycle. Drugs targeting Mpro appear promising but will elicit selection pressure for resistance. To understand resistance potential in Mpro, we performed a comprehensive mutational scan of the protease that analyzed the function of all possible single amino acid changes. We developed three separate high-throughput assays of Mpro function in yeast, based on either the ability of Mpro variants to cleave at a defined cut-site or on the toxicity of their expression to yeast. We used deep sequencing to quantify the functional effects of each variant in each screen. The protein fitness landscapes from all three screens were strongly correlated, indicating that they captured the biophysical properties critical to Mpro function. The fitness landscapes revealed a non-active site location on the surface that is extremely sensitive to mutation making it a favorable location to target with inhibitors. In addition, we found a network of critical amino acids that physically bridge the two active sites of the Mpro dimer. The clinical variants of Mpro were predominantly functional in our screens, indicating that Mpro is under strong selection pressure in the human population. Our results provide predictions of mutations that will be readily accessible to Mpro evolution and that are likely to contribute to drug resistance. This complete mutational guide of Mpro can be used in the design of inhibitors with reduced potential of evolving viral resistance. The COVID-19 pandemic, caused by the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-26 2), has had an unprecedented impact on global health, the world economy, and our overall way of life. Despite the rapid deployment of mRNA and traditional vaccines against SARS-CoV-2 which have served 28 to greatly improve patient outcome and decrease spread of the disease, vaccines remain unavailable in many parts of the world and there is hesitancy to get vaccinated among portions of the population. 30 Additionally, the virus appears to be evolving mutations in the Spike protein that reduce immune protection from both vaccines and prior infections. Additional strategies including direct-acting 32 antiviral drugs are needed to combat the SARS-CoV-2 pandemic. The main protease (M pro ) of SARS-CoV-2 is a promising target for drug development with many laboratories working collaboratively to 34 develop drugs against this protease, leading to thousands of M pro inhibitors in the pipeline and the first FDA-authorized clinical drug against this target. The use of drugs that target M pro will apply selection 36 pressure for the evolution of resistance. There is potential to design drugs with reduced likelihood of developing M pro resistance, but these efforts will require an in-depth understanding of the evolutionary 38 potential of the protease. 40 SARS-CoV-2 is a highly contagious virus responsible for the ongoing COVID-19 pandemic. SARS-CoV-2 belongs to the group of coronaviruses and has a positive-sense single-stranded RNA genome. 42 Immediately upon entry into the host cell, the SARS-CoV-2 virus translates its replicase gene (ORF1) into two overlapping large polyproteins produced in tandem by a ribosomal frameshift, pp1a and 44 pp1ab. These polyproteins are cleaved by two cysteine proteases, M pro (also known as the chymotrypsin-like protease, 3CL pro , or Nsp5) and the papain-like protease (PL pro ) to yield functional 46 replication machinery indispensable to viral replication. M pro initiates autoproteolysis from the pp1a and pp1ab polypeptides at its N-and C-terminus, through a poorly understood mechanism. 48 Subsequently, mature M pro cuts at 11 additional cleavage sites in both pp1a and pp1ab. The sites cut by M pro all include a conserved Gln at the P1 position, a small amino acid (Ser, Ala or Gly) at the P1' 50 position, and a hydrophobic residue (Leu, Phe, or Val) at the P2 position (Hegyi, Friebe et al. 2002 , Thiel, Ivanov et al. 2003 . Along with its vital role in the liberation of viral proteins, M pro also cleaves 52 specific host proteins, an activity which has been shown to enhance viral replication (Meyer, Chiaravalli et al. 2021) . Through its substrates, M pro function is required for almost every known step in the viral 54 life cycle. 56 M pro is a highly attractive target for drug development against SARS-CoV-2 and future coronavirusmediated pandemics for numerous reasons. M pro plays an essential functional role in the viral life cycle 58 so that blocking its function will impair viral propagation. M pro is highly conserved among all coronaviruses making it likely that inhibitors will have broad efficacy in potential future pandemics. 60 There are no human M pro homologs, and it shares no overlapping substrate specificity with any known human protease, minimizing the possibility of side effects. Additionally, its nucleophilic cysteine active 62 site enables the design of covalent inhibitors that provide advantages such as increased potency, selectivity, and duration of inhibition (Singh, Petter et al. 2011) . For these reasons, M pro has become 64 one of the most characterized SARS-CoV-2 drug targets (Jin, Du et al. 2020 , Zhang, Lin et al. 2020 , Biering, Van Dis et al. 2021 , Fischer, Veprek et al. 2021 . 66 Native M pro is a homodimer, and each monomer is composed of three domains (Jin, Du et al. 2020) . 68 Domain I (8-101) and Domain II (102-184) are comprised of antiparallel β-barrel structures. Cys145 and His41 make up M pro 's noncanonical catalytic dyad and are located in a cleft between Domains I and II. 70 Domain is an all α-helical domain that coordinates M pro dimerization, which is essential for M pro function (Tan, Verschueren et al. 2005) . Much of the structural and enzymatic knowledge of 72 SARS-CoV-2 M pro has been derived from studies of SARS-CoV-1 that caused the 2003 SARS outbreak (Ksiazek, Erdman et al. 2003) , as well as MERS-CoV that caused the 2012 MERS outbreak (Zaki, van 74 Boheemen et al. 2012). M pro from SARS-CoV-1 and SARS-CoV-2 differ in sequence at only 12 residues, however SARS-CoV-2 M pro exhibits increased structural flexibility and plasticity (Bzowka, Mitusinska et 76 al. 2020 , Estrada 2020 , Kneller, Phillips et al. 2020 . 78 We performed comprehensive mutational analysis of SARS-CoV-2 M pro to provide functional and structural information to aid in the design of effective inhibitors against the protease. Systematic 80 mutational scanning assesses the consequences of all point mutations in a gene providing a comprehensive picture of the relationship between protein sequence and function (Hietpas, Jensen et 82 al. 2011, Fowler and Fields 2014) . Mutational scanning requires a selection step that separates variants based on function. Following selection, the frequency of each variant is assessed by deep 84 sequencing to estimate functional effects. The resulting protein fitness landscape describes how all individual amino acid changes in a protein impact function and provides a detailed guide of the 86 biophysical properties that underlie fitness. Protein fitness landscapes identify mutation-tolerant positions that may readily contribute to drug resistance. These studies also elucidate mutation-88 sensitive residues that are critical to function, making them attractive target sites for inhibitors with reduced likelihood of developing resistance. The work described here focuses on fitness landscapes 90 without drug pressure because these provide critical information regarding M pro mechanism and evolutionary potential that we hope will be useful in the efforts to combat SARS-CoV-2. We are 92 pursuing investigations in the presence of inhibitors, but these experiments will require further optimization steps to make our yeast-based assays compatible with inhibition. Of note, mutational 94 scans of other drug targets including lactamases (Deng, Huang et al. 2012 , Firnberg, Labonte et al. 2014 ) and oncogenes (Choi, Landrette et al. 2014 , Ma, Boucher et al. 2017 ) have demonstrated the 96 potential to accurately identify and predict clinically-relevant resistance evolution. 98 In this study, we used systematic mutational scanning to analyze the functional effects of every individual amino acid change in M pro . We developed three orthogonal screens in yeast to separate 100 M pro variants based on function (Figure 1 ). The first screen measures M pro activity via loss of Fluorescence Resonance Energy Transfer (FRET) from a genetically-encoded FRET pair linked by the 102 Nsp4/5 cleavage sequence. The second screen similarly measures cleavage of the Nsp4/5 cut site; however, in this screen M pro cleavage leads to inactivation of a transcription factor driving GFP 104 expression. The final screen leverages the toxicity of wild-type (WT) M pro to yeast, which leads to depletion of active variants during growth. Following selection in the three screens, populations were 106 subjected to deep sequencing in order to quantify function based on the enrichment or depletion of each variant. 108 We found that the functional scores between screens were correlated, indicating that they all captured 110 key biophysical properties governing function. Our functional scores also correlated well with previously measured catalytic rates of purified individual mutants. Additionally, substitutions in M pro 112 from coronaviruses distantly related to SARS-CoV-2 consistently exhibited high function in our screens indicating that similar biophysical properties underlie the function of genetically-diverse M pro 114 sequences. Our study revealed mutation-sensitive sites distal to the active site and dimerization interface. These allosteric sites reveal important communication networks that may be targeted by 116 inhibitors. Our results provide a comprehensive dataset which can be used to design molecules with decreased vulnerability to resistance, by building drug-protein interactions at mutation-sensitive sites 118 while avoiding mutation-tolerant residues. 120 The main protease of SARS-CoV-2 is produced by self-cleavage of polyproteins translated from the viral 124 RNA genome, and its enzymatic activity is inhibited by the presence of additional N-and C-terminal amino acids (Xue, Yang et al. 2007 ). To express M pro with its authentic N-terminal serine residue, we 126 generated a Ubiquitin-M pro fusion protein. In yeast and other eukaryotes, Ubiquitin (Ub) fusion proteins are cleaved by Ub-specific proteases directly C-terminal to the Ub, revealing the N-terminal 128 residue of the fused protein, regardless of sequence (Bachmair, Finley et al. 1986 ). Expression of functionally-active M pro is toxic to yeast FRET-based reporter screen. M pro variants were sorted based on their ability to cleave at the M pro cut-site, separating the YFP-CFP FRET pair. Cells were separated by FACS into cleaved (low FRET) and uncleaved (high FRET) populations. B. Split transcription factor screen. M pro variants were sorted based on their ability to cleave at the M pro cut-site, separating the DNA binding domain (DBD) and activation domain (AD) of the Gal4 transcription factor. The transcription factor drives GFP expression from a galactose promoter. Cells were separated by FACS into cleaved (low GFP expression) and uncleaved (high GFP expression) populations. C. Growth screen. Yeast cells expressing functional M pro variants that cleave essential yeast proteins grow slowly and are depleted in bulk culture, while yeast cells expressing non-functional M pro variants are enriched. D. Barcoding strategy to measure frequency of all individual mutations of M pro in a single experiment. cells (Alalam, Sigurdardottir et al. 2021 ). To control the expression level of M pro while limiting its toxic side effects, we placed Ub-M pro under control of the inducible and engineered LexA-ER-AD 132 transcription factor (Ottoz, Rudolf et al. 2014) . LexA-ER-AD is a fusion of the bacterial LexA DNAbinding protein, the human estrogen rector (ER) and the B112 activation domain, and its activity is 134 tightly and precisely regulated by the hormone β-estradiol. We inserted 4 lexA boxes recognized by the LexA DNA binding domain upstream of Ub-M pro to control its expression. The Western blot in 136 Figure S1a illustrates both induction of M pro by β-estradiol and successful removal of the Ub moiety, indicating that the protease is being expressed in its mature and functional form. We performed a 138 titration curve with β-estradiol to determine the lowest concentration at which M pro can be expressed without inhibiting yeast cell growth while still enabling measurement of substrate cleavage ( Figure 140 S1b). We developed three distinct yeast screens to characterize the effects of M pro variants on function 144 ( Figure 1 ). The first screen utilized a FRET-based reporter of two fluorescent proteins, YPet and CyPet, fused together with the Nsp4/5 M pro cleavage site engineered in the middle (YPet-M pro CS-CyPet) 146 ( Figure 1a ). The YPet-CyPet pair are derivatives of the YFP-CFP proteins that have been fluorescently optimized by directed evolution for intracellular FRET (Nguyen and Daugherty 2005) and provide a 20-148 fold signal change upon cleavage. The linker between the two fluorescent proteins contains the M pro cleavage site, TSAVLQ|SGFRK, the cut-site at the N-terminus of the M pro protease. This is the most 150 commonly used cut-site for in vitro cleavage assays, which allowed us to directly compare our mutational results to those that were previously published. One advantage of this assay is that the 152 fluorescent readout directly reports on cleavage of a specific cut-site. The plasmid containing Ub-M pro under the control of β-estradiol was transformed into yeast cells expressing a chromosomally 154 integrated copy of YPet-M pro CS-CyPet. Expression of WT M pro led to a β-estradiol-dependent decrease in FRET signal as measured by fluorescence-activated single cell sorting (FACS). Mutation of the 156 essential catalytic cysteine of M pro to alanine (C145A) abolished this change in FRET signal indicating that the change in signal was dependent on the presence of functional M pro ( Figure S1c) . 158 The second screen utilized the DNA binding domain and activation domain of the Gal4 transcription 160 factor, separated by the Nsp4/5 cut site (Johnston, Zavortink et al. 1986 , Murray, Hung et al. 1993 . We used this engineered transcription factor (TF) to drive GFP expression, enabling cells with varying 162 levels of M pro protease activity to be separated by FACS (Figure 1b) . One benefit of this system is its signal amplification, as one cut transcription factor can cause a reduction of more than one GFP 164 molecule. However, due to this amplification, the fluorescent signal is indirectly related to cutting efficiency. Expression of Ub-M pro in cells engineered with the split transcriptional factor exhibited a β-166 estradiol-dependent decrease in GFP reporter activity that required the presence of catalyticallyfunctional M pro protein ( Figure S1d ). 168 Figure S1 . M pro expression in cells harboring the LexA-UbM pro plasmid construct. A. Yeast cells transformed with a plasmid expressing C145A Ub-M pro -His6 under the LexA promoter were grown to exponential phase followed by the addition of 2 µM β-estradiol to induce expression for 8 hours. M pro levels were monitored by Western blot with an anti-His6 antibody and the correct size was measured against purified M pro -His6 protein (control). B. The plasmid expressing WT Ub-M pro under control of the LexA promoter was transformed into cells expressing the split transcription factor. Cells were grown to exponential phase followed by addition of the indicated concentration of β-estradiol. Cell density was monitored based on absorbance at 600 nm at the times indicated (left panel). At the same time points, cells were washed, diluted to equal cell number, and GFP fluorescence was monitored at 525 nm (right panel). C. FACS analysis of cells expressing the CFP-M pro CS-YFP FRET pair and either WT Ub-M pro (left) or C145A Ub-M pro (right). Cell samples were collected before and after induction of M pro expression with 125 nM β-estradiol for 1.5 hours. D. FACS analysis of cells expressing the split transcription factor separated by the M pro cut-site and either WT Ub-M pro (left) or C145A Ub-M pro (right). Cell samples were collected before and after induction of M pro expression with 125 nM β-estradiol for 6 hours. E. Distribution of number of barcodes associated with all M pro variants. The final screen leverages the toxicity of M pro expression in yeast, which likely results from cleavage of essential yeast proteins by the protease (Alalam, Sigurdardottir et al. 2021) . Increasing concentrations 170 of β-estradiol correlates with a decrease in yeast growth rate that is dependent on the presence of catalytically-functional M pro (Figure 1c and S1b). At a high expression level, yeast growth rate becomes 172 tightly coupled to M pro function and can be used as a readout of the function of the expressed M pro variant. While the endogenous yeast substrates are unknown, this assay is likely reporting on M pro 174 cleavage of numerous cellular targets. Sampling of more than one cleavage site may better represent the physiologic role of M pro , which has 11 viral and numerous host cleavage sites. Comprehensive deep mutational scanning of M pro 178 We integrated our three screens with a systematic mutational scanning approach to determine the impact of each single amino acid change in M pro on its function ( Figure 1d ). A comprehensive M pro 180 single site variant library was purchased commercially (Twist Biosciences). Each position of M pro was mutated to all other 19 amino acids plus a stop codon, using the preferred yeast codon for each 182 substitution. We transferred the library to a plasmid under the LexA promoter. To efficiently track each variant of the library using deep sequencing, we employed a barcoding strategy that allowed us 184 to track mutations across the gene using a short sequence readout. We engineered the barcoded library so that each mutant was represented by 20-40 unique barcodes and used PacBio sequencing to 186 associate barcodes with M pro mutations ( Figure 1d ). 96% of library variants were linked to 10 or greater barcodes ( Figure S1e ). As a control, the library was doped with a small amount of WT M pro 188 linked to approximately 150 barcodes. 190 We transformed the plasmid library of M pro mutations into yeast strains harboring the respective reporter for each functional screen. The mutant libraries were amplified in the absence of selection 192 and subsequently β-estradiol was added to induce M pro expression. For the fluorescent screens, the cells were incubated with β-estradiol at the concentration determined to limit M pro toxicity (125 nM) 194 for the time required for WT M pro to achieve full reporter activity (1.5 hours for the FRET screen and 6 hours for the TF screen). Subsequently cells were separated by FACS into populations with either 196 uncleaved or cleaved reporter proteins (See Figure 1a and 1b) . For the growth screen, cells were incubated with a higher concentration of β-estradiol determined to slow yeast growth (2 µM) ( Figure 198 1c and S1b). Populations of cells were collected at the 0-and 16-hour time points. For each cell population in each screen, plasmids encoding the mutated M pro library were recovered, and the 200 barcoded region was sequenced using single end Illumina sequencing. For the TF and FRET screens, the functional score of each mutant was calculated as the fraction of the mutant in the cut population 202 relative to its fraction in both populations. For the growth screen, the functional score was calculated as the fraction of the mutant at the 0-hour time point relative to the fraction in the 0-hour and 16-hour 204 time points. We normalized the functional scores in all three screens to facilitate comparisons, setting the score for the average WT M pro barcode as 1 and the average stop codon as 0 (See Table 1 for all 206 functional scores). 208 To analyze the reproducibility of each screen, we performed biological replicates. For each biological replicate we separately transformed the library into yeast cells, and independently performed 210 competition experiments and sequencing. Functional scores between replicates were strongly correlated (R 2 > 0.98 for all three screens, Figure Table 234 2). For example, mutating the residues of the catalytic dyad, C145 and H41, inactivates the protease both in our screen and in in vitro biochemical assays as expected (Hegyi, Friebe et al. 2002) . 236 Additionally, in vitro assays have shown that residues at the dimer interface including S10, G11 and The average functional score at each position mapped to M pro structure for each screen. The Nsp4/5 substrate peptide is shown in green (PDB 7T70). C. The average functional score at each position compared between the three screens. D. Comparison between relative catalytic rates measured independently in various studies and functional scores measured in each screen (see Table 2 for data). Each graph is fit with a non-linear and linear regression with the best of the two fits represented with a black solid line and the worst fit represented with a red dashed line. The non-linear regression is fit to the equation Y = Ym-(Y0-Ym)e -kx . E14 are essential for SARS-CoV-1 M pro dimerization and function (Chen, Zhang et al. 2008 ). Mutations at these residues are also deleterious to M pro function in our screen. Because of the high sequence and 240 functional similarities between SARS-CoV-1 and CoV-2 M pro , we expect that the majority of the mutational analyses performed previously on SARS-CoV-1 M pro will be valid for SARS-CoV-2 M pro . We 242 observe an apparent non-linear relationship between the functional scores measured in both the FRET and TF screens and the relative catalytic activity of mutants measured independently for M pro in vitro in 244 various studies (R 2 = 0.81 for non-linear fit to TF screen and R 2 = 0.93 for non-linear fit to FRET screen) ( Figure 4d ). Compared to the fluorescent screens, there is a stronger linear relationship (R 2 = 0.86) 246 between the scores measured in our growth screen and the catalytic efficiencies of the individual mutants. The growth screen appears to more fully capture the dynamic range of mutations with slight 248 functional defects that tend to appear WT-like in the FRET and TF screens. For the remainder of this paper, we will report the functional scores collected from both the FRET and growth screens. The 250 advantage of the functional scores for each mutant from the FRET screen is that they report direct cleavage of a defined substrate, with the drawback being that they exhibit less sensitivity to mutation. 252 The advantage of the growth screen is that the functional scores show a more linear relationship with catalytic rate while the drawback is that the screen reports cleavage of undefined substrates. Because 254 of the correlation between all three screens, similar overall biophysical conclusions are supported by each screen. function: P9, S10, G11, E14, R40, H41, T111, S113, R131, C145, G146, S147, G149, F150, H163, G174, G179, G183, D187, D197, N203, D289, E290, and D295 (Figure 6a ). Only four of these mutation-294 sensitive residues contact the substrate: H41 and C145 (the catalytic residues), as well as H163, and D187. H163 interacts with the invariable P1 Gln of the substrate and D187 forms a hydrogen bond 296 with a catalytic water and a salt bridge with R40. A large body of work has previously shown that dimerization is indispensable to M pro function (Chou, Chang et al. 2004 , Hsu, Chang et al. 2005 298 Zhang et al. 2008 , Cheng, Chang et al. 2010 ). Our study also supports the critical functional role of dimerization as we see prevalent mutation-sensitivity in residues at the dimer interface, including P9, 300 S10, G11, E14, and E290, each of which cannot be altered without complete loss of function. Outside of these well-studied critical M pro sites, there are additional clusters of mutation-intolerant residues. R131, D197, N203, D289 and E290 lie at the interface of Domain II and Domain III sandwiched 304 between dimers and make up part of a surface identified by structural modeling as a possible distal drug binding pocket (Bhat, Chitara et al. 2021 , Weng, Naik et al. 2021 ) (Figure 6b ). Within this cluster, 306 a dynamic salt bridge is formed between R131 located on the loop of Domain II connecting β10-11 of the catalytic pocket, and D289 in the α-helical Domain III that has been reported to contribute to the 308 flexibility and structural plasticity of M pro (Bhat, Chitara et al. 2021) . The location of these residues at the interface of the two domains and the dimer interface, combined with the fact that they are critical 310 to M pro function suggests that they are part of an allosteric communication network. Our studies clearly indicate the critical function played by this network of residues providing motivation for further 312 examination of their potential as a mutation-resistant target for inhibitor design. A second cluster of mutation-intolerant residues appear to be part of an allosteric communication network between the active site and the dimerization interface. Prior studies of individual mutations 316 also suggest allosteric connections between the dimerization and active sites. Mutations at both E166 (Cheng, Chang et al. 2010 ) and S147 (Barrila, Bacha et al. 2006 ) were found to disrupt dimerization. 318 Both positions E166 and S147 are located distal to the dimerization site, suggesting that the properties of these two sites are interdependent. Our results show that there is a physically-interacting chain of 320 mutation-sensitive residues that bridge from the active site to the dimerization site ( Figure 6c ). This bridge is composed of H163 that directly contacts the P1 Gln of substrate, S147, L115 and S10 at the 322 dimer interface. Each of these dimer-to-active site bridging residues are critical to M pro function and are strongly conserved among M pro homologs. Based on these observations, we suggest that the physical interactions between H163, S147, L115, and S10 mediate critical communication between the active sites of both subunits in the M pro dimer. 326 All 24 of the identified mutation-intolerant residues are highly conserved among SARS-CoV-2 M pro 328 homologs (Figure 6d ). While functional hot spots accurately predict evolutionary conservation, conservation does not accurately predict functional hot spots. There are many residues in M pro that are 330 strongly conserved, but that can be mutated without strong impacts on function. This pattern has been widely observed for other proteins (Hietpas, Jensen et al. 2011 , Melamed, Young et al. 2013 small for experimental resolution can drive selection in natural evolution due to large population sizes and timescales (Ohta 1973 ). Our functional screen captures the mutations that are critical to catalytic 338 function while evolutionary conservation depicts a wide range of mutations including those that make more nuanced contributions to function. M pro function is essential for SARS-CoV-2 replication, making it a key drug target. To help further guide inhibitor design, we assessed the mutations that are compatible with function and that should be 344 readily available to the evolution of drug resistance. We focused these analyses on the active site, which is the target binding site for most inhibitors that have been generated against M pro (Cho, Rosa et 346 al. 2021 ). In Figure 7a , we highlight all the M pro residues that contact the Nsp4/5 peptide, either through hydrogen bonds or van der Waals interactions (Shaqra, Zvornicanin et al. 2022 ). In our 348 functional screens, we found dramatic variability in mutational sensitivity at these substrate-contact positions. For example, residues G143, H163, D187 and Q192 were extremely sensitive to mutation 350 while residues M49, N142, E166 and Q189 were highly tolerant. Despite the diverse sequence variation amongst M pro 's substrates, they occupy a conserved volume in the active site, known as the 352 substrate envelope, and the interactions between M pro 's residues and all of its substrates are highly conserved (Shaqra, Zvornicanin et al. 2022) indicating that our mutation results from the Nsp4/5 cut-354 site will likely translate to other cut-sites. (Owen, Allerton et al. 2021 , Zhao, Fang et al. 2021 ). PF-07321332 is shown in green, oxygens in red, nitrogens in cyan. Hydrogen bonds less than 4 Å are represented with thick yellow dashed lines and greater than 4 Å with a thin yellow dashed line. The table below lists the mutations with highest potential for being resistant against PF-07321332. Even among residues whose side chains make direct hydrogen bonds with substrates are positions that 358 are surprisingly tolerant to mutation, namely N142, E166 and Q189. N142 forms distinct hydrogen bonds with Nsp4/5 and Nsp8/9, which has been proposed as a mechanism of M pro substrate 360 recognition (MacDonald, Frey et al. 2021) . Q189 is in a flexible loop that closes over the substrates, allowing accommodation of diverse cut-sites (Shaqra, Zvornicanin et al. 2022) . In our screens, we find 362 that these proposed substrate-recognition positions are very tolerant to mutation (Figure 7a ) and have high potential for developing inhibitor resistance. Our results indicate that mutations at N142, E166 364 and Q189 are compatible with function and are readily available to the evolution of drug resistance. A recent study comprehensively examined 233 X-ray crystal structures of SARS-CoV-2 M pro in complex with a wide range of inhibitors (Cho, Rosa et al. 2021) . In 185 of these 233 structures, inhibitors lie in 368 the same binding pocket in the active site, primarily contacting M pro positions T25, H41, M49, N142, S144, C145, H163, H164, E166, P168, H172, Q189 and A191. We therefore went on to determine the 370 mutations at these key inhibitor binding residues that are compatible with M pro function and should likely be available to resistance evolution. Figure 7b illustrates a representative structure of M pro 372 bound to the N3 inhibitor with the average mutational sensitivity of each position mapped to the structure by color (Jin, Du et al. 2020 ). In addition, a heatmap is shown detailing the mutations at 374 these positions that are compatible with function. Of note, residues N142, E166, and Q189 form direct hydrogen bonds with many M pro inhibitors and most mutations at these positions result in a functional 376 protease. Additionally, T25, M49, M164, P168 and A191 form van der Waals interactions with a variety of inhibitors suggesting that mutations at these positions could disrupt inhibitor interactions while 378 maintaining M pro function. In contrast, positions H41, S144, C145, H163 and H172 are highly sensitive in our screen, as well as strongly conserved in nature, and therefore would be ideal contact positions 380 for inhibitors with reduced likelihood of evolving M pro resistance. Pfizer has developed the first FDA-authorized M pro inhibitor, PF-07321332 (Owen, Allerton et al. 2021) . We examined the structure of M pro bound to PF-07321332 to identify positions with the potential to 384 evolve resistance against this drug (Figure 7c ) (Zhao, Fang et al. 2021) . Evolutionarily-accessible resistance mutations are single base change mutations that would disrupt inhibitor binding while 386 maintaining WT-like substrate recognition and cleavage. We identified all mutations of M pro that have WT-like function in both the FRET and growth screens, would lead to a predicted decrease in inhibitor 388 binding energy upon mutation of greater than 1 kcal/mol, and are accessible with a single nucleotide base change. These criteria led to the identification of three mutations, Q189E, E166A and E166Q with 390 potential resistance against PF-07321332. These three positions are at sites where the inhibitor protrudes out of the defined substrate envelope, providing further evidence that these residues may 392 evolve inhibitor resistance while maintaining substrate recognition (Shaqra, Zvornicanin et al. 2022) . Of note, Q189E is a natural variant in both the avian infectious bronchitis virus (IBV) and the swine 394 coronavirus, HKU15 CoV, widely detected in pigs in Asia and North America and of pandemic concern due to its ability to replicate in human cells (Edwards, Yount et al. 2020 ). PF-07321332 may have 396 reduced efficacy against these concerning homologs due to its decreased interactions with Q189E M pro . 398 In addition to the impacts on side-chain properties, mutations in M pro may also impact resistance through changes in main-chain conformation and dynamics, particularly in loops. In-depth structural 400 analyses will be important to extensively assess the potential impacts of mutations on resistance through these mechanisms. Of note, mutations at N142 appears of particular interest for further 402 investigation of conformational changes that may impact resistance evolution. N142 is mutation tolerant and located in a loop over the P1 position of the substrate. The lactam ring on PF-07321332 404 protrudes outside of the substrate envelope at this location (Shaqra, Zvornicanin et al. 2022) . Mutations at position 142 should be readily available to M pro evolution and appear likely to influence 406 loop conformation at a site where PF-07321332 extends beyond the substrate envelope. Together these observations suggest that N142 warrants further attention as a potential contributor to drug 408 resistance. During the SARS-CoV-2 pandemic, intensive efforts have been launched to rapidly develop vaccines 412 and anti-viral drugs to improve human health. In this study, we provide comprehensive functional information on a promising therapeutic target, M pro , with the hopes that these results will be useful in 414 the design of more effective and long-lasting anti-SARS-CoV-2 drugs. We built three yeast screens to measure the functional effects of all individual amino acid changes in M pro . The resulting fitness 416 landscapes provide information on residues to both target and avoid in the drug design process. In the active site, the primary current target of M pro inhibitors, our results indicate both mutation-sensitive 1974, Kacser and Fell 1995, Jiang, Mishra et al. 2013 ) and it may be likely that M pro function must be decreased by a large amount in order to cause measurable changes in viral replication efficiency. This 434 relationship between M pro function and SARS-CoV-2 fitness would need to be determined in order to translate our functional scores to fitness scores. Additionally, our TF and FRET screens quantify 436 cleavage at one defined site (Nsp4/5) and it may be important to analyze all sites in order to fully understand the selection pressures acting on M pro . Another important caveat is that our fitness 438 landscape captures single amino acid changes and therefore does not provide information on the potential interdependence or epistasis between double and higher order mutations. Information 440 regarding epistasis will be important for accurately predicting the impacts of multiple mutations on fitness. Despite these caveats, the similarity in fitness landscapes for the TF and FRET screens with the 442 yeast growth screen suggests that all three capture fundamental and general aspects of M pro selection. In addition, the high function of almost all naturally occurring substitutions in the diversity of natural 444 M pro sequences indicates that estimates of fitness effects in different genetic backgrounds can be made based on our results. 446 We believe that our results will be a useful guide for the continuing intense efforts to develop drugs 448 that target M pro and the interpretation of future M pro evolution in the face of drug pressure. In particular, our results identify amino acid changes that can be functionally tolerated by M pro that are 450 likely to disrupt binding to inhibitors. In a recent study, Shaqra, Schiffer and colleges mapped the M pro substrate envelope; locations where the inhibitors protrude from this envelope is an indicator of 452 susceptibility to resistance mutations (Shaqra, Zvornicanin et al. 2022) . The information in these two studies provides a new view into resistance evolution that can be incorporated into ongoing drug 454 design efforts. Locations in the active site as well as at a likely allosteric site that cannot readily evolve without compromising function are ideal targets for anchoring inhibitors with reduced potential to 456 evolve drug resistance. 458 Our next steps involve developing efficient strategies for assaying M pro fitness landscapes in the presence of potential inhibitors in order to define structure-resistance relationships. This would 460 provide critical guidance for reducing the likelihood of resistance at earlier stages of drug development than is currently possible. For example, it would identify inhibitors with the least likelihood of 462 developing resistance. It would also provide the potential for identifying inhibitors with nonoverlapping resistance profiles that if used in combination would not be susceptible to resistance from 464 an individual mutation. There are technical hurdles to overcome in using our yeast-based screens to investigate resistance because many small-molecules are ineffective due to poor permeability and/or 466 export from yeast. We are assessing strategies to both increase the druggability of yeast and porting our assays to mammalian cells (Chinen, Hamada et al. 2017 ). The results from our current work on M pro 468 in yeast as well as previous studies using fitness landscapes to analyze drug resistance in other proteins (Deng, Huang et al. 2012 , Choi, Landrette et al. 2014 , Firnberg, Labonte et al. 2014 2017) indicates a strong potential of these approaches to improve our understanding and ability to combat resistance evolution. 472 474 476 The Ubiqutin-M pro gene fusion was constructed using overlapping PCR of the yeast ubiquitin gene and SARS-CoV-2 M pro gene (Jin, Du et al. 2020) and was inserted into the pRS416 vector after digestion with 482 SpeI and BamHI. Four LexA boxes were amplified from the LexAbox4_citrine plasmid (FRP793_insul-(lexA-box)4-PminCYC1-Citrine-TCYC1 was a gift from Joerg Stelling; Addgene plasmid # 58434; 484 http://n2t.net/addgene:58434) (Ottoz, Rudolf et al. 2014) and inserted between the SacI and SpeI sites upstream of the ubiquitin-M pro gene. The LexA_ER_B112 transcription factor was amplified from 486 Addgene_58437 (FRP880_PACT1(-1-520)-LexA-ER-haB112-TCYC1 was a gift from Joerg Stelling; Addgene plasmid # 58437; http://n2t.net/addgene:58437) (Ottoz, Rudolf et al. 2014) School, Worcester, MA). PacBio circular consensus sequences (CCS) were generated from the raw reads using SMRTLink v.10.1 and standard Read-Of-Insert (ROI) analysis parameters. After filtering low-522 quality reads (Phred scores < 10), the data was organized by barcode sequence using custom analysis scripts that are available upon request. For each barcode that was read more than three times, we 524 generated a consensus of the M pro sequence that we compared to WT to call mutations. 526 As a control for library experiments, the WT Ub-M pro gene was also barcoded with approximately 150 unique barcode sequences. The randomized 18 bp barcode sequence (N18) was cloned between the 528 NotI and AscI sites upstream of the LexA promoter sequence in the p416LexA-Ub-M pro (WT)-B112 vector with the goal of the WT sequence being represented by approximately 100 barcodes. The 530 barcoded region of the plasmid was amplified by PCR using the primers listed in Table S1 (for the WT barcoding it was not necessary to avoid strand recombination) and sequenced by EZ Amplicon deep 532 sequencing (www.genewiz.com). The GFP reporter strain was generated by integration of GFP driven by a Gal1 promoter together with a 536 HIS3 marker into the HO genomic locus. The Gal4, Gal80 and Pdr5 genes were disrupted to create the following strain: W303 HO:: gal4::trp1; gal80::leu2 pdr5::natMX. 538 The Gal4 Table S1 , was transformed into the 550 reporter stain using LiAc/PEG transformation (Gietz, Schiestl et al. 1995) , and successful integration of the module into the adenine biosynthesis gene was verified by PCR. lacking adenine and uracil to select for the presence of the M pro variant plasmid) to remove extracellular DNA and grown in 500 mL SD-A-U media at 30°C for 48 hours with repeated dilution to 564 maintain the cells in log phase of growth and to expand the library. Subsequently, the library was diluted to early log phase in 100 mL of SD-A-U, grown for two hours, the culture was split in half, and 566 125 nM β-estradiol (from a 10 mM stock in 95% ethanol) was added to one of the cultures to induce Ub-M pro expression. Cultures with and without β-estradiol were grown with shaking at 180 rpm for 6 568 hours at which point samples of ~10 7 cells were collected for FACS analysis. A sample of 10 7 cells were washed three times with 500 µL of Tris-Buffered Saline containing 0.1% 572 Tween and 0.1% bovine serum albumin (TBST-BSA). Cells were diluted to 10 6 /mL and transferred to polystyrene FACS tubes. Samples were sorted for GFP and mCherry expression on a FACS Aria II cell 574 sorter with all cells expressing cut TF (low GFP expression) in one population and uncut TF (high GFP expression) in a second population. To ensure adequate library coverage, we sorted at least 1.5 million 576 cells of each population and collected them in SD-A-U media. Sorted yeast cells were amplified in 20 mL SD-U-A media for 10 hours at 30°C. These yeast samples were collected by centrifugation and cell 578 pellets were stored at -80°C. Generating FRET strain 582 The YPet-CyPet FRET pair is a YFP-CFP fluorescent protein pair that has been fluorescently optimized by directed evolution for intracellular FRET (Nguyen and Daugherty 2005 sequence and a reverse primer containing the XhoI site (XhoI_R). The CyPet-M pro CS-YPet fusion gene was generated using the overlapping CyPet-M pro CS and M pro CS-YPet products from above as templates 592 and BamHI_F and XhoI_R primers. The resulting CyPet-M pro CS-YPet gene was inserted between the BamHI and XhoI sites downstream of the TEF promoter in the integrative bidirectional pDK-ATG 594 plasmid (kindly provided by D. Kaganovich) (Amen and Kaganovich 2017) . The fragment for genomic integration was generated by PCR with the primers listed in Table S1 , was transformed into W303 596 (leu2-3,112 trp1-1 can1-100 ura3-1 ade2-1 his3-11,15) using LiAc/PEG transformation (Gietz, Schiestl et al. 1995) , and successful integration of the module into the adenine biosynthesis gene was verified by 598 PCR. The plasmid library including the barcoded WT plasmid was transformed as above using the lithium 602 acetate procedure into W303 Ade::TEF-CyPet-M pro CS-YPet cells. Sufficient transformation reactions were performed to attain about 5 million independent yeast transformants representing a 50-fold 604 sampling of the average barcode. Cultures were grown and induced with β-estradiol as above for the transcription factor screen with the exception that cells were induced for 1.5 hours. Samples of 10 7 606 cells were collected for FACS analysis. A sample of 10 7 cells were washed three times with 500 µL of TBST-BSA. Cells were diluted to 10 6 /mL 610 and transferred to polystyrene FACS tubes. Samples were sorted for YFP and CFP expression on a FACS Aria II cell sorter with all cells expressing cut FRET pair (low FRET) in one population and uncut FRET 612 pair (high FRET) in a second population. To ensure adequate library coverage, we sorted at least 3 million cells of each population and collected them in SD-A-U media. Yeast samples were collected by 614 centrifugation and cell pellets were stored at -80°C. The plasmid library including the barcoded WT plasmid was transformed as above using the lithium 618 acetate procedure into W303 cells. Libraries were expanded as for the split TF screen, and then the library was diluted to early log phase in 100 mL of SD-A-U, grown for two hours, the culture was split in 620 half, and 2 µM β-estradiol (from a 10 mM stock in 95% ethanol) was added to one of the cultures to induce Ub-M pro expression. Cultures with and without β-estradiol were grown with shaking at 180 rpm 622 for 16 hours with dilution after 8 hours to maintain growth in exponential phase. Samples of ~10 8 cells were collected by centrifugation and cell pellets were stored at -80°C. DNA preparation and sequencing 626 We isolated plasmid DNA from each FACS cell population and the time points from the growth experiment as described (Jiang, Mishra et al. 2013) . Purified plasmid DNA was linearized with AscI. 628 Barcodes were amplified with 22 cycles of PCR using Phusion polymerase (NEB) and primers that add Illumina adapter sequences and a 6 bp identifier sequence used to distinguish cell populations. PCR 630 products were purified two times over silica columns (Zymo Research) and quantified using the KAPA SYBR FAST qPCR Master Mix (Kapa Biosystems) on a Bio-Rad CFX machine. Samples were pooled and 632 sequenced on an Illumina NextSeq instrument in single-end 75 bp mode. We analyzed the Illumina barcode reads using custom scripts that are available upon request. Illumina 636 sequence reads were filtered for Phred scores > 10 and strict matching of the sequence to the expected template and identifier sequence. Reads that passed these filters were parsed based on the 638 identifier sequence. For each screen/cell population, each unique N18 read was counted. The unique N18 count file was then used to identify the frequency of each mutant using the variant-barcode 640 association table. To generate a cumulative count for each codon and amino acid variant in the library, the counts of each associated barcode were summed. To facilitate analysis of expression levels of M pro and examine effective removal of Ubiquitin, a his tag 654 was fused to the C-terminus of M pro to create the plasmid p416LexA-UbM pro -his6-B112. In addition, the C145A mutation was created by site-directed mutagenesis to ensure cleavage by Ub specific proteases 656 and to reduce the toxicity caused by WT M pro expression. W303 cells were transformed with the p416LexA-UbM pro (C145A)-his construct and the resulting yeast cells were grown to exponential phase 658 in SD-ura media at 30°C. 125 nM β-estradiol was added when indicated and cells were grown for an additional eight hours. 10 8 yeast cells were collected by centrifugation and frozen as pellets at −80°C. Cells were lysed by vortexing the thawed pellets with glass beads in lysis buffer (50 mM Tris-HCl pH 7.5, 5 mM EDTA and 10 mM PMSF), followed by addition of 2% Sodium dodecyl sulfate (SDS). Lysed cells 662 were centrifuged at 18,000 g for 1 min to remove debris, and the protein concentration of the supernatants was determined using a BCA protein assay kit (Pierce) compared to a Bovine Serum 664 Albumin (BSA) protein standard. 15 µg of total cellular protein was resolved by SDS-PAGE, transferred to a PVDF membrane, and probed using an anti-his antibody (ref Identifying mutations in circulating SARS-COV-2 sequences 676 The complete set of SARS-COV-2 isolate genome sequences was downloaded from the GISAID database. The SARS-COV-2 M pro reference sequence (NCBI accession NC_045512.2) was used as a 678 query in a tBLASTn search against the translated nucleotide sequences of these isolates to identify the MPro region and its protein sequence for each isolate, if present. Mpro sequences were discarded if 680 they contained 10 or more ambiguous "X" amino acids, or had amino acid length less than 290. A multiple sequence alignment was performed and for each of the twenty standard amino acids, the 682 number of times it was observed at each position in the MPro sequence was calculated. 684 A Genetic 698 Trap in Yeast for Inhibitors of SARS-CoV-2 Main Protease Integrative modules for efficient genome engineering in yeast In vivo half-life of a protein is a function of its amino-terminal 702 residue Long-range cooperative interactions modulate dimerization in SARS 704 3CLpro Targeting allosteric pockets of SARS-706 CoV-2 main protease M(pro) Screening a 710 Library of FDA-Approved and Bioactive Compounds for Antiviral Activity against SARS-CoV-2 Quantifying and understanding the fitness effects of protein mutations: Laboratory versus nature Structural and Evolutionary Analysis Indicate That the SARS-CoV-2 Mpro Is a Challenging Target for Small-Molecule Inhibitor 716 Design Residues on the dimer interface of SARS 718 coronavirus 3C-like protease: dimer stability characterization and enzyme catalytic activity analysis Mutation of Glu-166 blocks the substrate-induced dimerization of SARS coronavirus main protease Multidrug Sensitive Yeast Strains, Useful Tools for Chemical Genetics. The Yeast Role in Medical Applications Dynamic Profiling of 726 beta-Coronavirus 3CL M(pro) Protease Ligand-Binding Sites Identification of PLX4032-resistance mechanisms and implications for novel RAF inhibitors Quaternary structure of the severe acute respiratory syndrome (SARS) coronavirus main protease Deep sequencing of systematic combinatorial libraries reveals beta-lactamase sequence constraints at high 734 resolution Swine acute diarrhea syndrome coronavirus replication in primary human cells reveals potential susceptibility to infection Topological analysis of SARS CoV-2 main protease A comprehensive, high-resolution map of a gene's fitness landscape De novo Design of SARS-CoV-2 Main Protease Inhibitors Deep mutational scanning: a new style of protein science Studies on the transformation of intact yeast 748 cells by the LiAc/SS-DNA/PEG procedure Mutational analysis of the active centre of 750 coronavirus 3C-like proteases A linear steady-state treatment of enzymatic chains. General properties, 752 control and effector strength Experimental illumination of a fitness landscape Critical assessment of important 756 regions in the subunit association and catalytic action of the severe acute respiratory syndrome coronavirus main protease Matplotlib: A 2D Graphics Environment Latent effects of Hsp90 mutants revealed at reduced expression levels Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors Functional domains of the yeast regulatory protein GAL4 The control of flux: 21 years on Structural plasticity of SARS-CoV-2 3CL M(pro) active site cavity revealed by room 772 temperature X-ray crystallography A novel 776 coronavirus associated with severe acute respiratory syndrome Correlation between dissociation and 778 catalysis of SARS-CoV main protease CRISPR-Cas9-mediated saturated mutagenesis screen predicts clinical drug resistance with improved accuracy Recognition 784 of Divergent Viral Substrates by the SARS-CoV-2 Main Protease Deep mutational scanning of an RRM 786 domain of the Saccharomyces cerevisiae poly(A)-binding protein Characterising proteolysis during SARS-CoV-2 infection identifies viral cleavage sites and 790 cellular targets with therapeutic potential Systematic Mutant Analyses Elucidate General and 792 Client-Specific Aspects of Hsp90 Function Inactivation of a yeast transactivator by the 794 fused HIV-1 proteinase: a simple assay for inhibitors of the viral enzyme activity Evolutionary optimization of fluorescent proteins for intracellular 796 FRET Slightly deleterious mutant substitutions in evolution Inducible, tightly regulated and growth condition-independent transcription factor in Saccharomyces cerevisiae An oral SARS-CoV-2 M(pro) inhibitor clinical candidate for the treatment of COVID-19 Analyses of the effects of all ubiquitin point mutants on yeast growth rate CoV-2 Main Protease to Predict and Avoid Drug Resistance CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation The resurgence of covalent drugs Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput 818 mutagenesis pH-dependent conformational flexibility of the SARS-CoV main proteinase (M(pro)) dimer: molecular dynamics simulations and multiple X-ray structure analyses Mechanisms and enzymes involved in SARS coronavirus genome expression Molecular dynamics and in silico mutagenesis on the reversible inhibitor-bound SARS-CoV-2 main protease complexes 828 reveal the role of lateral pocket in enhancing the ligand affinity Production of 830 authentic SARS-CoV M(pro) with enhanced activity: application as a novel tag-cleavage endopeptidase for protein overproduction Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved alpha-ketoamide inhibitors Crystal structure of SARS-CoV-2 main protease in complex with protease inhibitor PF-07321332 This work was sponsored by Novartis Institutes for BioMedical Research. We would like to thank the 686 UMass Chan Medical School Pacific Biosciences Core Enterprise for providing the PacBio NGS services and the UMass Chan Medical School Flow Cytometry Core Facility for providing the FACS services. We