key: cord-0965863-1qkrpvqc authors: Bhowmick, Pallab; Roome, Simon; Borchers, Christoph H.; Goodlett, David R.; Mohammed, Yassene title: An Update on MRMAssayDB: A Comprehensive Resource for Targeted Proteomics Assays in the Community date: 2021-03-08 journal: J Proteome Res DOI: 10.1021/acs.jproteome.0c00961 sha: 09f809d8b6cce773334a5e909fea39393f58cf13 doc_id: 965863 cord_uid: 1qkrpvqc

Precise multiplexed quantification of proteins in biological samples can be achieved by targeted proteomics using multiple or parallel reaction monitoring (MRM/PRM). Combined with internal standards, the method achieves very good repeatability and reproducibility, enabling excellent protein quantification and allowing longitudinal and cohort studies. A laborious part of performing such experiments lies in the preparation steps dedicated to the development and validation of individual protein assays. Several public repositories host information on targeted proteomics assays, including NCI’s Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb, and PeptideTracker, all offering varying levels of detail. We introduced MRMAssayDB in 2018 as an integrated resource for targeted proteomics assays. The Web-based application maps and links the assays from the repositories, includes comprehensive up-to-date protein and sequence annotations, and provides multiple visualization options on the peptide and protein level. We have extended MRMAssayDB with more assays and extensive annotations. Currently, it contains >828 000 assays covering >51 000 proteins from 94 organisms, of which >17 000 proteins are present in >2400 biological pathways, and >48 000 map to >21 000 Gene Ontology terms. This is an increase of about four times the number of assays since introduction. We have expanded annotations of interaction, biological pathways, and disease associations.
A newly added visualization module for coupled molecular structural annotation browsing allows the user to interactively examine peptide sequence and any known PTMs and disease mutations, and map all to available protein 3D structures. Because of its integrative approach, MRMAssayDB enables a holistic view of suitable proteotypic peptides and commonly used transitions in empirical data. Availability: http://mrmassaydb.proteincentre.com.

Targeted proteomics quantification of proteins is typically performed using a triple quadrupole operated in multiple reaction monitoring (MRM) mode, or using a mass spectrometer capable of parallel reaction monitoring (PRM), such as an Orbitrap. 1−7 In both, an initial filtering step isolates the precursor peptide ion, typically using a quadrupole filter. After fragmentation, a characteristic fragment ion is isolated using another quadrupole filter in MRM, or all fragment ions are selected for monitoring in PRM. 5, 6 In a scheduled LC/MRM-MS or LC/PRM-MS analysis, precursor and product ions are monitored according to the peptide elution time from the liquid chromatography system. 8, 9 This allows quantitation of a large number of target peptides and, by inference, the corresponding proteins. Reproducible quantitation of almost 300 proteins in 45 min is possible. 10−15 New developments in ion mobility-mass spectrometry (IM-MS) allow an additional gas-phase separation step that promises even faster quantification. 16 A crucial difference between targeted and discovery proteomics lies in the scheduled acquisition of the former.

Developing a targeted proteomics assay is a laborious multistep process and involves collecting, validating, and documenting various levels of information on each peptide assay. 17 This includes the uniqueness of the peptide (proteotypic surrogate for the protein of interest) within a particular proteome, its retention time under specific LC conditions, the corresponding precursor/fragment ion pairs, and more.
Selecting a suitable proxy peptide that can also be chemically synthesized and used as internal standard requires reviewing almost 30 rules. 18 A crucial, and probably the most important, rule is whether a peptide has been previously observed in MS/MS analyses and therefore is known to be detectable. This type of information is scattered across several online public data repositories hosting raw data and resultant identification information on previous proteomics experiments. 19−22

Sharing information on existing targeted proteomics assays allows scientists to design their targeted proteomics experiments faster and better. Multiple resources for targeted proteomics data exist and include PeptideAtlas SRM Experiment Library (PASSEL), 23 NCI's Clinical Proteomic Tumor Analysis Consortium (CPTAC), 24 PanoramaWeb, 25 SRMAtlas, 26−28 and PeptideTracker. 29 However, the information hosted in these data repositories and knowledge bases is heterogeneous because they were collected with different goals in mind. To help users address this issue, we previously introduced MRMAssayDB as an integrated Web resource with comprehensive information on all available targeted proteomics assays in these community-wide online repositories. 30 On its release date, MRMAssayDB contained 168 000 assays covering 34 000 proteins from 63 organisms. We have since added a large number of assays and updated the application interface with various annotations, which prompted us to update the resource and redesign some of its aspects. Besides the more than 4-fold increase in the total number of assays, various protein annotations related to disease, biological pathway, and interaction associations were also added. Additionally, a new integrated visualization module allows mapping of protein and peptide annotations for a better user experience. The new version of MRMAssayDB is larger in content and scope, making it an excellent starting point for designing targeted proteomics experiments.
PASSEL is a generic data repository from the Institute for Systems Biology. MRM experimental results can be submitted to it. 23 The assay portal of CPTAC 24 from the National Cancer Institute (NCI) hosts well-characterized targeted proteomic assays (http://assays.cancer.gov). Its goals are centered around standardizing targeted MS-based assays to achieve robust quantification of all human proteins. 31 Submission to the portal is done by consortium partners. PanoramaWeb 25 is a repository for storing, sharing, and analyzing targeted proteomic experiments processed by Skyline software, 32 and therefore is very popular among the targeted proteomics community (https://panoramaweb.org). SRMAtlas is a compendium of targeted proteomics assays for the quantification of annotated human proteins (http://www.srmatlas.org/). It includes assays to quantify spliced variants, nonsynonymous mutations, and post-translational modifications. 26 PeptideTracker 29 was introduced as a knowledge base for collecting and storing information on protein concentration ranges in biological tissues along with the detailed description of the assays that were used (http://peptidetracker.proteincentre.com).

New entries are added continuously to these repositories. Some repositories are specific, like CPTAC, while others are more generic, like PASSEL. Some repositories store the raw data, like PanoramaWeb, while others put emphasis on listing sample preparation protocols, like CPTAC and PeptideTracker. PeptideTracker lists the determined protein concentration ranges in samples, as measured by MRM.

Each assay is annotated with various information including UniProtKB 33 accession number, protein name, gene name, organism, peptide sequence, uniqueness in proteome, peptide presence in isoforms, modification, labeled internal standard if used, and a hyperlink to the relevant proteomics resources from which the information was obtained.
In addition, assays are annotated with biological pathway associations as present in Pathway Commons 34, 35 and KEGG, 36 known protein−protein interactions as present in STRING, 37 disease associations as documented in UniProtKB 33 and DisGeNET, 38 associations with drugs as in DrugBank, 39 available 3D structures from PDB, 40 and PTMs as well as sequence variants known from UniProtKB, 33 community proteomics experiments, and public data sets. 41, 42

We retrieved the information available from the FDA database on all approved assays, and manually checked the entries for those associated with proteins. Using UniProtKB, 33 KEGG, 36 DrugBank, 39 and NIH's PubChem, 43, 44 we manually assigned to each FDA entry the corresponding UniProtKB accession numbers. We then used MRMAssayDB and mapped the protein entries in the FDA-approved assays to available targeted proteomics assays in the community.

MRMAssayDB was written mainly in Python 2.7 (www.python.org). The user Web interface was developed using the Django 1.8 framework (https://djangoproject.com), and plots are generated using JavaScript. MolArt is used for the interactive visualization of protein annotation and 3D structures. 41 Cytoscape.js 45 was used to plot the interactive PPI networks. Data from PeptideTracker, 29 PASSEL, 23 CPTAC, 24 PanoramaWeb, 25 SRMAtlas, 26−28 UniProtKB, 33 PDB, 40 Pathway Commons, 34,35 KEGG, 36 STRING, 37 QuickGO, 46 DisGeNET, 38 and DrugBank 39 are automatically retrieved using the APIs of these resources by routines written in Java (www.oracle.com/java/index.html), Python, and Selenium Webdriver (http://www.seleniumhq.org).

Thousands of targeted proteomics assays have been used previously in various experiments. The information on these assays is largely available in different repositories including PanoramaWeb, 25 CPTAC, 24 SRMAtlas, 26−28 PASSEL, 23 and PeptideTracker. 
29 Each one of these resources was developed with a specific goal in mind; however, the information and entries hosted in each are complementary (Table 1). Together they form a valuable foundation in the targeted proteomics community. MRMAssayDB integrates this dispersed information into a single Web-based application, provides up-to-date annotations on the thousands of these assays, and makes all available for researchers.

Currently, 828 974 assay entries corresponding to 732 132 unique stripped peptides are listed, covering 51 668 proteins from 94 organisms (Table 2; Supporting Information, Table S1). These entries are based on 834 132 assays corresponding to 736 391 unique stripped peptides currently available in the targeted proteomics community repositories 23−29 (a peptide assay associates a peptide with a protein; the same peptide can, however, be associated with proteins from multiple organisms). The difference, 5710 assays corresponding to 5601 stripped peptides, failed during validation against the reference protein sequences (based on UniProtKB release 2020_6); i.e., they were found not to be part of the associated protein. However, if an organism is not present in MRMAssayDB or if researchers are interested in a specific set of proteins in a sample, they can upload their own proteome, and in this case, all 736 000 unique stripped peptides will be searched for suitable assays that enable targeted proteomics experiments in that specific proteome as provided by the researchers.

Uniqueness of a peptide assay in the proteome is documented in MRMAssayDB and is always displayed in the results. It is defined with the purpose of quantifying the protein in the specific organism background; i.e., a peptide assay will be flagged as not unique if it is present in more than one protein in the organism proteome. An exception to this is if the peptide is present in multiple isoforms; in that case, it is still considered unique.
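The validation and uniqueness rules described above can be illustrated with a minimal sketch. It is not the MRMAssayDB implementation: the function names, the toy proteome, and the accessions are hypothetical, and isoforms are assumed to follow the UniProtKB convention of a numeric suffix on the canonical accession (for example `P11111-2`).

```python
def canonical(accession):
    """Strip an isoform suffix, e.g. 'P11111-2' -> 'P11111' (assumed naming convention)."""
    return accession.split("-", 1)[0]


def check_peptide(peptide, accession, proteome):
    """Validate a stripped peptide against its protein and flag proteome uniqueness.

    proteome maps accession -> amino acid sequence (hypothetical input format).
    """
    # Validation: the peptide must occur in the sequence of the associated protein.
    valid = peptide in proteome.get(accession, "")
    # Collect every protein that carries the peptide; isoforms collapse onto
    # their canonical entry, so presence in several isoforms is not penalized.
    carriers = {canonical(acc) for acc, seq in proteome.items() if peptide in seq}
    unique = valid and carriers == {canonical(accession)}
    return {"valid": valid, "unique": unique, "carriers": sorted(carriers)}


# Toy proteome with made-up sequences: one protein with two isoforms, one other protein.
toy_proteome = {
    "P11111": "MKTAYIAKQRLLGSVEK",
    "P11111-2": "MKTAYIAKLLGSVEK",
    "P22222": "MSSAYIAKQRGGTWEK",
}

print(check_peptide("LLGSVEK", "P11111", toy_proteome))  # valid, unique (isoforms only)
print(check_peptide("AYIAK", "P11111", toy_proteome))    # valid, shared with P22222
print(check_peptide("GGTWEK", "P11111", toy_proteome))   # fails validation
```

Exact substring matching is used here for simplicity; a production check would presumably also have to handle modified residues and enzymatic cleavage context, which the text does not detail.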
A protein may have multiple proteotypic peptide assays already in use. If one's interest is a simple quantification of proteins, a logical choice is to consider those peptides used independently by different researchers. This also applies for the …

Besides the main goal of having an extensive resource for available targeted proteomics assays, MRMAssayDB provides numerous protein and assay annotations that make it a resource for researchers looking to design a targeted proteomics experiment. As of January 2021, 87% of all assays are based on unique peptides. From all of these assays, we were able to map 17 219 proteins to 2399 biological pathways, 48 225 proteins to 21 424 GO terms, 3381 proteins to 8126 drugs, and 4128 proteins to 25 543 diseases. These numbers change over time with the periodic update; however, they provide a good overview of the scope of the assays in the community. In the following, we describe some of the resource features and included annotations.

Users can benefit from simple or advanced search modes as well as post-search filtering. The search can be performed from the home page using protein name, protein UniProtKB accession, partial peptide sequence, gene name, organism, as well as protein annotation like association with biological pathways or diseases (Figure 1). For specific results, a combination of two or more aspects is also possible. Once the search results are present on the screen, the user can add filter terms to each column individually. Columns can also be blended and viewed based on the user interest. Search results can always be downloaded as a spreadsheet document for further local analysis. MRMAssayDB also has an application programming interface (API), allowing developers to send queries in an automated manner and enabling them to incorporate MRMAssayDB information in their own databases and knowledge bases.
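As an illustration of local analysis on downloaded search results, the sketch below ranks the peptides of each protein by the number of distinct repositories that report them, following the suggestion above to prefer peptides used independently by different researchers. The column names (`accession`, `peptide`, `repository`), the peptide strings, and the rows are made up for the example; the layout of an actual MRMAssayDB export is not specified in the text and may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: placeholder peptides and an assumed three-column layout.
EXPORT = """accession,peptide,repository
P00740,PEPTIDEK,CPTAC
P00740,PEPTIDEK,PanoramaWeb
P00740,PEPTIDEK,SRMAtlas
P00740,SAMPLER,SRMAtlas
P00740,SAMPLER,SRMAtlas
P02768,TIDEPEPK,PASSEL
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rank_peptides(rows):
    """Rank peptides per protein by how many distinct repositories list them."""
    support = defaultdict(set)
    for row in rows:
        # A set makes repeated entries from the same repository count once.
        support[(row["accession"], row["peptide"])].add(row["repository"])
    ranked = defaultdict(list)
    for (accession, peptide), repositories in support.items():
        ranked[accession].append((peptide, len(repositories)))
    # Most widely reported peptide first; ties broken alphabetically.
    return {acc: sorted(peps, key=lambda p: (-p[1], p[0])) for acc, peps in ranked.items()}


ranking = rank_peptides(load_rows(EXPORT))
for accession, peptides in ranking.items():
    print(accession, peptides)
```

Counting distinct repositories rather than raw rows is a deliberate choice in this sketch: several entries from a single source do not amount to independent use.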
In the new release of MRMAssayDB we have incorporated a molecular structural annotation of each protein and peptide assay using the MolArt package. 41 This allows users to interactively explore the protein annotations in the context of … Users can navigate a screen divided between the sequence annotation visualized by ProtVista 42 and the 3D structural data of the protein visualized by LiteMol. 47 Both views are coupled, allowing intuitive interactive exploration of the protein, proteotypic peptide assays, annotations, and their positions in the available 3D structures. The annotations include posttranslational modifications, protein domains and chains, as well as known amino acid variations and their relation to disease. The proteotypic peptides available as targeted proteomics assays are mapped so users can easily view the molecular annotation and structural location of all surrogate peptides with assays. This provides a simple yet powerful visualization of the various information relevant to a targeted proteomics experiment. The annotations are retrieved live from UniProtKB, community proteomics experiments, and public data sets. 41 This is important as information on protein annotation is in flux; i.e., new annotations are continuously becoming available on the positions of disease-related mutations or PTMs on the protein. 48, 49 While the majority of targeted proteomics assays are based on nonmodified peptides, it is important to have updated information on whether a selected surrogate peptide carries a mutation or PTM. MRMAssayDB provides this information in a few clicks.

Assays in MRMAssayDB are linked to documented biological pathways and known protein−protein interactions. Multiple resources host these annotations, and although sometimes redundant, complementarity between resources is an essential aspect. In our original release in 2018, we included KEGG 36 and STRING. 
37 To extend the annotation, assay entries are now also linked to Pathway Commons annotations. 34,35 Pathway Commons is an extensive integrated resource of biological pathways and includes annotations from 22 databases with more than 5700 pathways. A visualization of the pathways and protein−protein interactions is also supported. The proteins in the pathway or interaction network are color-coded, showing and linking those proteins that have available targeted proteomics assays. If a researcher is interested in designing a targeted proteomics experiment to infer information on a specific biological pathway or the interactome of a specific protein, they are able to investigate the network and possible coverage using available assays in one single interactive plot (Figure 3, Figure 4, and Supporting Information Figure S1).

The targeted protein assays are annotated with functional data as represented by Gene Ontology (GO) annotations 46 (Supporting Information Figure S2). This includes information on the three GO aspects: Biological Process, Molecular Function, and Cellular Component. 50 The assays are also annotated with known disease associations (Supporting Information Figure S3). The original application linked to the curated disease annotations as present in UniProtKB. To allow a wider view on expected and machine-curated associations, we added in the new version of MRMAssayDB disease annotations as present in DisGeNET. 38 DisGeNET is an integrated resource with disease associations inferred from expert curated repositories, GWAS, animal models, and scientific literature. Furthermore, annotations for protein−drug association are also provided in MRMAssayDB (Supporting Information Figure S4). We continue to link the assays based on the target protein associations as in DrugBank. 
39 Having targeted proteomics assays annotated with their known functional, disease, and drug associations can be very helpful for designing experiments to study disease, protein−drug interaction, or novel unknown protein function. 14,51

The US Food and Drug Administration (FDA) lists 1140 approved clinical tests, 208 of which are associated with 253 proteins. Using MRMAssayDB we have data-mined all available proteomics assays to determine those that can be used for the FDA-approved protein markers. We identified quantitative proteomics assays for 222 proteins of the 241 FDA-approved protein markers; 109 of the 222 are known to be secreted, with 64 of these proteins listed in MRMAssayDB with information on their concentration ranges in human plasma (Figure 5). A selection of current assays and annotations (as of January 2021) is included in Supporting Information Table S2. The selection was based on the assays that are most frequent among the original resources, whether possible protein concentration values are available, and whether previous information on assay suitability for targeted proteomics experiments is available. 52 Updated information on these protein assays and their annotations is available on the MRMAssayDB home page (from the navigation bar at the top of the home screen or by using the following link: http://mrmassaydb.proteincentre.com/fdaassay/).

In the wake of the COVID-19 pandemic and the increased focus of research on SARS-CoV-2, we maintain a list of proteins associated with the infection as reported in UniProtKB. Currently, there are 60 human proteins curated to be associated with COVID-19, and there are assays in MRMAssayDB for all 60 proteins (as of January 2021). These assays can also be viewed directly from the navigation bar at the top of the home screen or by using the following link: http://mrmassaydb.proteincentre.com/covid19/.

MRMAssayDB is an integrated Web-based resource for available targeted proteomics assays in the community.
It integrates targeted assay information from the five major MRM/PRM assay portals and data repositories including PASSEL, PanoramaWeb, CPTAC assay portal, SRMAtlas, and PeptideTracker. Information on which data repository reported each assay allows sorting search results to determine the most frequent assays for each protein. For each assay, the most frequent transitions among repositories and instruments are annotated. This allows informed design of targeted proteomics experiments. The assays are linked to available annotations on the protein sequence, function, and interactions. These annotations include associations with biological pathways, protein−protein interactions, Gene Ontology terms, known PTMs, disease-associated mutations, protein domains, disease involvements, and drug associations. Advanced synchronized visualization of the annotations and the 3D structure of the target protein, as well as how the proteotypic surrogate peptides map to all, empowers the user to carry out a comprehensive study of their protein of interest and choose the most suitable assay. As targeted proteomics data sets are continuously shared by scientists and researchers through the data repositories, we intend to continue integrating these into our knowledgebase. Currently, the assay entries are compiled automatically once a month.

Journal of Proteome Research pubs.acs.org/jpr Technical Note

A targeted proteomics experiment always starts with a planning phase about which targets to measure. Using MRMAssayDB, researchers can streamline this process by finding suitable assays that were previously applied. The large number of entries and annotations available to researchers in this single resource is a result of long-term archiving of experimental information that has been carried out in the targeted-proteomics community.
We hope that by maintaining MRMAssayDB, researchers can more easily design and perform targeted proteomics experiments, and that by sharing their experimental raw data in public repositories, the assay information in MRMAssayDB will continue to be up-to-date.

Access to the software is available free of charge at http://MRMAssayDB.proteincentre.com/.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00961.
Figure S1: Available assays for human in the Cholesterol Metabolism pathway as presented in MRMAssayDB
Figure S2: GO annotations for Coagulation Factor IX (F9) as presented in MRMAssayDB
Figure S3: Disease associations, here of F9, are presented in searchable and linked terms with tabs for DisGeNET and for UniProtKB
Figure S4: Drug associations as shown in MRMAssayDB search results of P00740 (coagulation factor IX, F9)
Table S1: The organisms and number of associated assays in MRMAssayDB (as of January 2021) (PDF)

Selected reaction monitoring for quantitative proteomics: a tutorial
The absolute quantification strategy: a general procedure for the quantification of proteins and post-translational modifications
Targeted proteomic strategy for clinical biomarker discovery
Advances in targeted proteomics and applications to biomedical research
Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics
Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS
Using iRT, a normalized retention time for more targeted measurement of peptides
Mass spectrometry-based proteomics
Quantification of cardiovascular biomarkers in patient plasma by targeted mass spectrometry and stable isotope dilution
Proteomics meets the scientific method
Advances in multiplexed MRM-based protein biomarker quantitation toward clinical utility
Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma
Plasma Protein Signatures of a Murine Venous Thrombosis Model and Slc44a2 Knockout Mice Using Quantitative-Targeted Proteomics
Multiplexed targeted proteomic assay to assess coagulation factor concentrations and thrombosis-associated cancer
Simultaneous Proteomic Discovery and Targeted Monitoring using Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry
Targeted quantitation of proteins by mass spectrometry
PeptidePicker: a scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments
Making proteomics data accessible and reusable: current state of proteomics databases and repositories
Proteomics data repositories: providing a safe haven for your data and acting as a springboard for further research
Proteomics data repositories
The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics
PASSEL: the PeptideAtlas SRM experiment library
CPTAC Assay Portal: a repository of targeted proteomic assays
Panorama: a targeted proteomics knowledge base
Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome
The Mtb proteome library: a resource of assays to quantify the complete proteome of Mycobacterium tuberculosis
Aebersold, R. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis
PeptideTracker: A knowledge base for collecting and storing information on protein concentrations in biological tissues
Using the CPTAC Assay Portal to Identify and Implement Highly Characterized Targeted Proteomics Assays
Skyline: an open source document editor for creating and analyzing targeted proteomics experiments
Ongoing and future developments at the Universal Protein Resource
Pathway Commons, a web resource for biological pathway data
KEGG: kyoto encyclopedia of genes and genomes
STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants
The Protein Data Bank
MolArt: a molecular structure annotation and visualization tool
ProtVista: visualization of protein sequence annotations
PubChem 2019 update: improved access to chemical data
PubChem: a public information system for analyzing bioactivities of small molecules
Cytoscape.js: a graph theory library for visualisation and analysis
QuickGO: a web-based tool for Gene Ontology searching
LiteMol suite: interactive web-based visualization of large-scale macromolecular structure data
Domain landscapes of somatic mutations in cancer
Gene3D: expanding the utility of domain assignments
Creating the gene ontology resource: design and implementation
The intestinal microbiome potentially affects thrombin generation in human subjects