key: cord-311035-s3tkbh9r
authors: Procko, Erik
title: Deep mutagenesis in the study of COVID-19: a technical overview for the proteomics community
date: 2020-10-21
journal: Expert review of proteomics
DOI: 10.1080/14789450.2020.1833721
sha:
doc_id: 311035
cord_uid: s3tkbh9r
Introduction
The spike (S) of SARS coronavirus 2 (SARS-CoV-2) engages angiotensin-converting enzyme 2 (ACE2) on a host cell to trigger viral-cell membrane fusion and infection. The extracellular region of ACE2 can be administered as a soluble decoy to compete for binding sites on the receptor-binding domain (RBD) of S, but it has only moderate affinity and efficacy. The RBD, which is targeted by neutralizing antibodies, may also change and adapt through mutation as SARS-CoV-2 becomes endemic, posing challenges for therapeutic and vaccine development.
Areas Covered
Deep mutagenesis is a Big Data approach to characterizing sequence variants. A deep mutational scan of ACE2 expressed on human cells identified mutations that increase S affinity and guided the engineering of a potent and broad soluble receptor decoy. A deep mutational scan of the RBD displayed on the surface of yeast has revealed residues tolerant of mutational changes that may act as a source for drug resistance and antigenic drift.
Expert Opinion
Deep mutagenesis requires a selection of diverse sequence variants; an in vitro evolution experiment that is tracked with next-generation sequencing. The choice of expression system, diversity of the variant library and selection strategy have important consequences for data quality and interpretation.
Investigations of protein mutations have classically been approached by precision targeting, in which a small number of mutations are deliberately introduced and tested individually. This requires preconceived ideas or hypotheses on which residues and what changes to those residues might be relevant.
When the important residues in a protein sequence are unknown, screens and selections can be used instead, in which a library of random mutations is in some way sorted to enrich for a small number of mutants with the intended phenotype. Both experiments are limited in the scale of information they provide. Deep mutagenesis or deep mutational scanning take advantage of next-generation sequencing to bring experimental protein mutagenesis to the realm of Big Data [1] . A screen or selection of a diverse library of variants is tracked by next-generation sequencing to observe how the population's genetic makeup changes. Mutations with enhanced function are enriched, while deleterious mutations are depleted; the enrichment ratio comparing frequencies in the selected population with the naive library thus acts as a proxy for relative phenotype. Now, the relative effects of thousands of mutations can be assessed simultaneously in a single experiment and a comprehensive mutational landscape can be calculated from experimental data. Deep mutagenesis has been developed by multiple groups over the past decade [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] and has proven especially invaluable to meet three goals: assisting protein engineering, understanding mutational tolerance within a protein sequence, and predicting which mutations might be associated with adverse disease outcomes, especially in the context of cancer or drug resistance. Two recent and prominent studies of SARS coronavirus 2 (SARS-CoV-2) have used deep mutagenesis to address each of these problems [14, 15] . This Special Report summarizes the two studies with a focus on experimental details and caveats that will be unfamiliar to those outside the deep mutational scanning community. Two deep mutagenesis studies have determined how thousands of mutations within the SARS-CoV-2 spike or the virus' human receptor affect their binding. 
The data have proven invaluable for engineering high affinity decoy receptors that are under preclinical development as a COVID-19 therapy, and have revealed the scope of mutational tolerance within the spike that may have bearing on genetic drift as the virus becomes endemic and changes over time. While these two studies focused on expression and binding between the viral spike and its receptor, the underlying selection strategies used in deep mutational scans are increasingly tied to more complex phenotypes, such as selections for structural stability based on protease-sensitivity [16], using competing ligands to engineer specificity into proteins including viral receptors [17] [18] [19], and selections based on catalytic or biological activity [20] [21] [22] [23]. Undoubtedly there are more questions related to SARS-CoV-2 biology and the biochemistry of its
While much attention has been given to isolating monoclonal antibodies with tight affinity for the SARS-CoV-2 spike (S) glycoprotein [24] [25] [26] [27] [28] [29] [30], an alternative is to use the entry receptor as a soluble decoy to neutralize infection. S is a class I viral fusion protein that is proteolytically processed into two subunits, S1 and S2, that are non-covalently associated and decorate the coronavirus envelope [31] [32] [33]. S recognizes angiotensin-converting enzyme 2 (ACE2) on host cells to initiate attachment and fusion of the viral and plasma membranes [33] [34] [35] [36] [37] [38]. Soluble ACE2 (sACE2) blocks receptor-binding sites on S [15, 37, 39, 40, 41, 42] and while escape mutations in S rapidly emerge in tissue culture in the presence of monoclonal antibodies [43], in principle the virus has limited mechanisms to escape a soluble decoy receptor without simultaneously losing affinity for the natural receptor.
The decoy receptor might also have a virucidal effect by inducing conformational changes and S1 shedding, such that virus particles are inactivated even if sACE2 dissociates. However, monoclonal antibodies have superior affinity and neutralization efficacy. To improve the therapeutic potential of decoy receptors, my group used deep mutagenesis to find mutations in ACE2 that enhance affinity [15] . A library of over 2,000 single amino acid substitutions in ACE2 was constructed, focused on diversification of residues at the structurally defined interface with the receptor-binding domain (RBD) of S [44, 45] and also within the ACE2 catalytic cleft. The library was expressed in a human cell line, with a c-myc epitope tag fused to the extracellular N-terminus of ACE2 for detection of surface expressed protein. Other than the presence of the epitope tag, ACE2 expressed in this experimental selection system matches native ACE2 in the human body. The culture expressing the ACE2 library was then selected by fluorescence activated cell sorting (FACS) to collect cells expressing ACE2 variants with tight affinity for fluorescently labeled RBD from S of SARS-CoV-2 ( Figure 1A ). For the artificial selection to be successful, cells must express a single protein variant from a single sequence variant, thereby providing a tight physical link between the phenotype of ACE2 expressed at the plasma membrane and a single sequence within the cell. Getting human cells in culture to acquire and express a single coding variant is no trivial feat, as transfection methods typically introduce many plasmid copies. Different methods to solve this technical challenge have included the use of episomal plasmids that randomly partition to daughter cells during division until progeny harbor a single coding variant over time [4] , the use of engineered integration sites in the genome [9, 46, 47] , or the use of viral vectors at low multiplicities-of-infection [48, 49] . 
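The cost of enforcing one variant per cell through low copy delivery can be illustrated numerically. The following is a minimal sketch, assuming that the number of library copies acquired by each cell follows a Poisson distribution; that assumption, the function names and the example mean values are introduced here for illustration and are not taken from the cited studies.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def single_copy_stats(mean_copies):
    """Return (fraction of cells carrying >= 1 variant,
    fraction of those cells carrying exactly one variant).

    Assumes copies per cell are Poisson distributed (illustrative assumption).
    """
    p_zero = poisson_pmf(0, mean_copies)
    p_one = poisson_pmf(1, mean_copies)
    frac_with_variant = 1.0 - p_zero
    frac_single = p_one / frac_with_variant
    return frac_with_variant, frac_single


# Hypothetical mean copy numbers per cell, chosen only to show the trend.
for mean in (0.05, 0.1, 0.5, 1.0):
    with_variant, single = single_copy_stats(mean)
    print(f"mean copies {mean:>4}: {with_variant:.1%} of cells carry a variant; "
          f"{single:.1%} of those carry exactly one")
```

Under this assumption, lowering the mean copy number raises the share of variant-carrying cells that hold exactly one variant, but it also raises the share of cells that carry no variant at all.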
My group used carrier DNA to sufficiently dilute the ACE2 plasmid library such that each cell typically acquired no more than a single coding variant [11]. An episomal plasmid is used for the library so that extrachromosomal replication within the cell enhances expression of the protein under investigation. (The carrier DNA, itself a modified episomal plasmid, further assists in this process [50].) The disadvantage to this simple solution for linking a single genotype to phenotype is that the coding sequence is so diluted with carrier DNA, most cells in the culture do not express ACE2 and FACS time is wasted on sorting a large number of negative cells. This has important consequences on the data, as time spent sampling negative cells is time not spent sampling cells expressing the protein under investigation, and consequently variants in the library may be under-sampled giving poor data accuracy. Undersampling becomes exceptionally concerning as the library size increases, and for this reason the library was limited to single amino acid substitutions at just 117 positions in ACE2.
Figure 1. (A) A library of ACE2 variants was expressed in human cells. Full-length ACE2 (tan) was tagged with a c-myc epitope at its extracellular N-terminus for detection of surface expression with a fluorescent antibody (red). SARS-CoV-2 RBD (pale green) genetically fused with superfolder green fluorescent protein (sfGFP; dark green) was incubated with the cell culture. FACS was used to collect fluorescent cells expressing ACE2 with bound RBD-sfGFP. (B) The isolated RBD of SARS-CoV-2 (pale green) was fused at its N-terminus to Aga2p (blue) and at its C-terminus to a c-myc epitope tag. A saturation mutagenesis library of the RBD was expressed on the yeast surface. Following induction of RBD expression, the yeast were incubated with dimeric, biotinylated sACE2 (tan). Bound ACE2 was detected with fluorescent streptavidin (purple) and surface expressed RBD was detected with a fluorescent antibody (red).
• In deep mutagenesis, the relative phenotypes of thousands of mutations in a protein sequence are determined in a single experiment.
• The experimental mutational landscape of ACE2 for binding the RBD of SARS-CoV-2 provides a blueprint for engineering high affinity decoy receptors.
• A deep mutational scan of the SARS-CoV-2 RBD reveals considerable opportunity for genetic drift without loss of receptor affinity.
• Different expression systems for selecting ACE2 or spike variants have inherent advantages and disadvantages.
• There are opportunities for deep mutagenesis to provide biochemical insights on other SARS-CoV-2 proteins.
Following FACS selection of the human culture to enrich a cell population with high binding activity for SARS-CoV-2 protein S, RNA transcripts were isolated and Illumina sequenced. An enrichment ratio is calculated for each mutation by dividing its frequency in the sorted cell transcripts by its frequency in the naive plasmid library [51]. Illumina sequencing did not cover the full length of ACE2 and instead the cDNA was sequenced as a series of fragments that together provided full coverage of the diversified regions. One assumes during analysis that there are no additional mutations outside a sequenced fragment, a reasonable assumption when a mutation is found because the library was constructed to have only one amino acid substitution per plasmid. However, the assumption breaks down when no mutations are observed in the sequenced fragment, as one cannot know whether there was a mutation elsewhere outside the sequenced region. As a consequence, the wild type sequence is not directly observed and is instead only estimated. There are strategies using the introduction and analysis of silent mutations that can resolve this issue [52].
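The enrichment ratio described above can be sketched in a few lines. This is a minimal illustration, not the analysis pipeline used in the studies: the function name, the variant labels, the read counts and the pseudocount (added here so that variants with zero reads in the sorted population still give a finite value) are all assumptions made for the example.

```python
import math


def log2_enrichment(selected_counts, naive_counts, pseudocount=1.0):
    """Return {variant: log2(frequency in selected / frequency in naive library)}.

    The pseudocount is an illustrative choice to avoid taking log2 of zero;
    it is not taken from the cited studies.
    """
    variants = set(selected_counts) | set(naive_counts)
    total_selected = sum(selected_counts.values()) + pseudocount * len(variants)
    total_naive = sum(naive_counts.values()) + pseudocount * len(variants)
    ratios = {}
    for variant in variants:
        freq_selected = (selected_counts.get(variant, 0) + pseudocount) / total_selected
        freq_naive = (naive_counts.get(variant, 0) + pseudocount) / total_naive
        ratios[variant] = math.log2(freq_selected / freq_naive)
    return ratios


# Hypothetical read counts for three made-up variants.
naive_library = {"mutA": 100, "mutB": 100, "stop": 100}
sorted_cells = {"mutA": 400, "mutB": 50, "stop": 0}

scores = log2_enrichment(sorted_cells, naive_library)
for variant in sorted(scores):
    print(f"{variant}: log2 enrichment = {scores[variant]:+.2f}")
```

In this toy input the variant that gains reads after sorting receives a positive score, while the variant that loses reads and the nonsense variant receive negative scores, matching the qualitative reading of enriched and depleted mutations.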
Overall, there was close agreement between the mutation enrichment ratios from two independent replicates of the FACS experiments, indicating that the ACE2 library was well sampled and there was high confidence in the data [15].
The enrichment ratios calculated for each variant in the sorted ACE2 library provide a mutational landscape that defines the relative phenotypes of thousands of ACE2 mutations for binding to SARS-CoV-2 S [15]. The data in this experiment are qualitative and it is unclear how a log₂ enrichment ratio of, say, −2 or +3 translates to an exact change in a biophysical parameter such as K_D. Furthermore, mutations can impact not only binding affinity for the RBD of S but also ACE2 surface expression. To filter out the contribution of mutations to expression, two populations of cells were collected by FACS. In addition to collecting cells that express ACE2 and tightly bind RBD, cells were simultaneously collected in the same experiment that express ACE2 but have weak RBD binding. ACE2 mutants that were not expressed at the cell surface would be depleted from both sorted populations, which was apparent from tracking the depletion of nonsense mutations. In this way, information was collected on how ACE2 mutations impact expression and RBD binding from a single FACS experiment.
The deep mutational scan of ACE2 revealed that mutations can indeed be found to enhance binding toward SARS-CoV-2 RBD (Figure 2), suitable for engineering high affinity soluble decoy receptors [15]. Mutations were found at the binding interface where they enhance specific atomic contacts, and were also found distally in the second shell and beyond where they may impact ACE2 conformation, folding and dynamics. A soluble ACE2 variant that combines three mutations, called sACE2₂.v2.4, was found to be highly expressed, is a stable monodisperse dimer, binds SARS-CoV-2 S with picomolar affinity and potently neutralizes infection of a susceptible cell line by authentic virus.
Its properties rival affinity-matured monoclonal antibodies under commercial development for therapy and prophylaxis. Despite only affinity toward SARS-CoV-2 being considered during the engineering process, sACE2₂.v2.4 also potently neutralizes authentic SARS-CoV-1, and we speculate that it will have broad activity against betacoronaviruses that use ACE2 as an entry receptor. In unpublished work that has yet to be peer reviewed, we have found sACE2₂.v2.4 broadly and tightly binds bat coronaviruses that may be a source for future pandemics, supporting the concept of receptor-based decoys as antiviral biologics with exceptional breadth.
Figure 2. As determined by yeast display, the effects of mutations in the RBD of SARS-CoV-2 protein S on receptor affinity are plotted in the heat map at left, with dark green indicating the mutations are deleterious and pale colors indicating the mutations are neutral. The effects of mutations in human cell-expressed ACE2 on binding to soluble RBD are plotted in the heat map at right, with depleted mutations in orange, neutral mutations in white and enriched mutations in blue. Positional scores are mapped to the atomic structure of RBD-bound ACE2 (PDB 6M17) at center. Conserved ACE2 residues for RBD binding are orange, while ACE2 residues that are hot spots for mutations with increased affinity are blue. RBD residues conserved for ACE2 binding are green. Most RBD mutations in this region of the interface are deleterious, whereas numerous mutations were found in ACE2 that increased affinity.
In Starr et al., deep mutagenesis was applied to the SARS-CoV-2 spike to assess mutational tolerance for expression and ACE2 interactions [14]. Instead of investigating the entire trimeric S protein expressed on a cellular or viral membrane, the isolated RBD was fused to the yeast mating factor Aga2p and displayed on the yeast surface [53] (Figure 1B). This is an artificial display platform that removes the RBD from its native context.
N-Glycosylation in yeast is also of high-mannose type and lacks the complex, terminally sialylated glycans produced by human cells [54], which can be important when binding interactions are glycan-dependent as is seen for some antibodies targeting viral spikes [55]. However, this display platform harnesses yeast genetics to confer tremendous advantages for in vitro selection and evolution. Using yeast display, large diverse libraries can be readily sorted by FACS to provide high-quality data. Separate selections were completed at a range of different sACE2 concentrations to simulate a titration experiment, from which the data could be converted to quantitative changes in apparent K_D on the yeast surface (Figure 2). As a surrogate for how RBD mutations may impact expression of the viral spike, the effects of mutations on RBD surface display were also assessed in a standalone FACS selection. Quality control pathways for protein secretion in yeast can be forgiving of misfolded protein sequences [16] and there are residues of the RBD that would ordinarily be buried in the context of the full S protein; it therefore remains to be seen how closely the yeast display data will correlate with equivalent experiments in more physiologically relevant expression systems. Nonetheless, the predicted effects by yeast display of some mutations were validated using full length S expressed in human cells and packaged in pseudovirus [14].
The library encoding nearly 4,000 single amino acid substitutions in the SARS-CoV-2 RBD was PacBio sequenced, providing long reads that match untranslated nucleotide barcodes to a specific protein variant. Following FACS-based selection, only the barcodes are read to determine how favorable sequence variants are enriched or deleterious sequence variants are depleted.
This resolves issues with Illumina sequencing failing to cover the full cDNA length, and because multiple barcodes are associated with any given protein variant, there are additional internal checks for data quality and consistency.
Despite the limitations of a yeast display platform, the deep mutational scan of the isolated RBD provides a high quality and useful data set from which several important conclusions were drawn. First, the ACE2 binding surface of SARS-CoV-2 RBD tolerates surprisingly high sequence diversity, even though it is a critical site for function [14]. High diversity is also seen in the ACE2-binding sites of S proteins from SARS-related bat coronaviruses, but this matches corresponding diversity in ACE2 from ecologically diverse bat species [56] and does not necessarily mean that the RBD tolerates mutations for binding ACE2 from a single species. The deep mutational scan addresses this uncertainty and is further supported by evidence showing that diverse RBD sequences from bat coronaviruses are all competent for binding human ACE2 with varying affinities [38]. Second, mutations were found in the RBD that enhance binding to ACE2, yet there does not appear to be positive selective pressure for these variants in the human population [14]. SARS-CoV-2 affinity for ACE2 is therefore 'good enough,' with no additional fitness benefit for higher affinity. It is worth noting that classical SARS-CoV-1 is also a highly infectious and virulent pathogen, despite having weaker ACE2 affinity [36, 57]. The rapid spread of SARS-CoV-2 probably has more to do with asymptomatic and presymptomatic transmission than enhanced receptor binding. Third, mutations were found within the epitopes for monoclonal antibodies but maintain high ACE2 binding, and it is likely that SARS-CoV-2 can easily mutate to escape neutralization without losing infectivity [14].
This agrees with selection experiments of pseudovirus expressing SARS-CoV-2 S variants, in which escape mutants in the viral spike rapidly emerge to neutralizing antibodies in a single passage [43]. This has profound implications for antibody therapy, where the standard has become combinations of noncompeting monoclonals in a cocktail to prevent rapid resistance. It is currently unknown whether an engineered soluble decoy receptor, such as sACE2₂.v2.4, will similarly be susceptible to the emergence of viral spike variants that can discriminate between the engineered decoy and the native receptor. We hypothesize that engineered decoys will be broadly active against SARS-CoV-2 variants and this remains an active area of investigation.
http://orcid.org/0000-0002-0028-490X
Papers of special note have been highlighted as either of interest (•) or of considerable interest
[1] Deep mutational scanning: a new style of protein science
[2] High-resolution mapping of protein sequence-function relationships
[3] A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function
[4] Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing
[5] Affinity and cross-reactivity engineering of CTLA4-Ig to modulate T cell costimulation
[6] An engineered switch in T cell receptor specificity leads to an unusual but functional binding geometry
[7] Computational design of a protein-based enzyme inhibitor
[8] Experimental estimation of the effects of all amino-acid mutations to HIV's envelope protein on viral replication in cell culture
[9] A platform for functional assessment of large variant libraries in mammalian cells
[10] Multiplex assessment of protein variant abundance by massively parallel sequencing
[11] Mapping interaction sites on human chemokine receptors by deep mutational scanning
    This study established a simple and effective method for linking phenotype to a single genotype in transfected human cells. This technical accomplishment is necessary for selection and deep mutational scanning in human cells
[12] A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain
[13] High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution
[14] Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding
    The isolated RBD of SARS-CoV-2 was displayed on yeast and deep mutationally scanned to understand the mutational landscape for yeast surface expression (a surrogate for folding) and ACE2 binding. The data reveal substantial sequence diversity is tolerated on the RBD surface
[15] Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2
    deep mutationally scanned ACE2 expressed on a human cell membrane to identify substitutions that enhance binding to S of SARS-CoV-2. This guided the engineering of high affinity and potently neutralizing decoy receptors
[16] Global analysis of protein folding using massively parallel design, synthesis, and testing
[17] A computationally designed inhibitor of an Epstein-Barr viral Bcl-2 protein induces apoptosis in infected cells
[18] Computationally designed high specificity inhibitors delineate the roles of BCL2 family proteins in cancer
[19] Engineered receptors for human cytomegalovirus that are orthogonal to normal human biology
[20] Single-mutation fitness landscapes for an enzyme on multiple substrates reveal specificity is globally encoded
[21] A comprehensive, high-resolution map of a gene's fitness landscape
[22] Comprehensive sequence-flux mapping of a levoglucosan utilization pathway in E. coli
[23] Molecular determinants of chaperone interactions on MHC-I for folding and antigen repertoire selection
[24] Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody
[25] Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail
[26] Potent neutralizing antibodies from COVID-19 patients define multiple targets of vulnerability
[27] Broad neutralization of SARS-related viruses by human monoclonal antibodies
[28] A human monoclonal antibody blocking SARS-CoV-2 infection
[29] A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2
[30] Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model
[31] Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
[32] Structural insights into coronavirus entry
[33] SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
[34] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[35] Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS
[36] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
[37] Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
    This study reports the original discovery of ACE2 as the entry receptor for classical SARS-CoV-1
[38] Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
    * ACE2 is shown to be a shared entry receptor for a clade of SARS-associated betacoronaviruses, including diverse strains from bats and human virus SARS-CoV-2
[39] Susceptibility to SARS coronavirus S protein-driven infection correlates with expression of angiotensin converting enzyme 2 and infection can be blocked by soluble receptor
[40] Neutralization of SARS-CoV-2 spike pseudotyped virus by recombinant ACE2-Ig
[41] Inhibition of SARS-CoV-2 infections in engineered human tissues using clinical-grade soluble human ACE2
[42] Novel ACE2-IgG1 fusions with improved in vitro and in vivo activity against SARS-CoV2
[43] Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies
    SARS-CoV-2 spike is able to rapidly acquire mutations to escape neutralizing monoclonal antibodies in tissue culture, necessitating the combination of multiple non-competing monoclonals in a cocktail to prevent resistance
[44] Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2
[45] Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
[46] An improved platform for functional assessment of large protein libraries in mammalian cells
[47] Mammalian cell surface display for monoclonal antibody-based FACS selection of viral envelope proteins
[48] HIV vaccine design to target germline precursors of glycan-dependent broadly neutralizing antibodies
[49] Structure-based design of native-like HIV-1 envelope trimers to silence non-neutralizing epitopes and eliminate CD4 binding
[50] Structural architecture of a dimeric class C GPCR based on co-trafficking of sweet taste receptor subunits
[51] Enrich: software for analysis of protein function by enrichment and depletion of variants
[52] HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen
[53] Isolating and engineering human antibodies using yeast surface display
[54] The humanization of N-glycosylation pathways in yeast
[55] Glycan-dependent neutralizing antibodies are frequently elicited in individuals chronically infected with HIV-1 Clade B or C. AIDS Research and Human Retroviruses
[56] Exceptional diversity and selection pressure on SARS-CoV and SARS-CoV-2 host receptor in bats compared to other mammals
[57] Structural basis of receptor recognition by SARS-CoV-2