key: cord-0293620-hau0xo50 authors: Obradovic, Aleksandar; Vlahos, Lukas; Laise, Pasquale; Worley, Jeremy; Tan, Xiangtian; Wang, Alec; Califano, Andrea title: PISCES: A pipeline for the Systematic, Protein Activity-based Analysis of Single Cell RNA Sequencing Data date: 2021-11-03 journal: bioRxiv DOI: 10.1101/2021.05.20.445002 sha: 2e946a17731873fed8561b6ad7612bd3f9375094 doc_id: 293620 cord_uid: hau0xo50 While single-cell RNA sequencing provides a new window on physiologic and pathologic tissue biology and heterogeneity, it suffers from low signal-to-noise ratio and a high dropout rate at the individual gene level, thus challenging quantitative analyses. To address this problem, we introduce PISCES (Protein-activity Inference for Single Cell Studies), an integrated analytical framework for the protein activity-based analysis of single cell subpopulations. PISCES leverages the assembly of lineage-specific gene regulatory networks, to accurately measure activity of each protein based on the expression its transcriptional targets (regulon), using the ARACNe and metaVIPER algorithms, respectively. It implements novel analytical and visualization functions, including activity-based cluster analysis, identification of cell state repertoires, and elucidation of master regulators of cell state and cell state transitions, with full interoperability with Seurat’s single-cell data format. Accuracy and reproducibility assessment, via technical and biological validation assays and by assessing concordance with antibody and CITE-Seq-based measurements, show dramatic improvement in the ability to identify rare subpopulations and to assess activity of key lineage markers, compared to gene expression analysis. High-throughput, droplet-based single-cell RNA Sequencing (scRNASeq) has recently emerged as a valuable tool to elucidate the diverse repertoire of cellular subpopulations comprising a broad range of mammalian tissues. Applications of this technology range from study of tissue development (He et. al., 2020) and tumor micro-environment (Qian et. al., 2020) , to the elucidation of tissue heterogeneity ) and even of tissue-level response to infectious diseases, such as COVID-19 (Xu et. al The key drawback of scRNAseq technologies is that the total number of mRNA molecules per cell, combined with low capture efficiency, fundamentally limits the number of distinct mRNA molecules that can be detected in each single cell (UMI reads). As a result, scRNASeq profiles are extremely sparse, with as many as 90% of all genes producing no reads in any given cell and the majority of detected genes producing one or two reads. This phenomenon, commonly known as gene dropout, greatly hinders downstream analysis, making quantitative assessment of differential gene expression extremely challenging. For instance, while broadly different cell types can be classified, a majority of biologically relevant genes, including the established lineage markers of specific cellular subpopulations, are undetected. As a result, cellular subpopulations presenting more subtle differences, such as different fibroblast or macrophage subpopulations, may be impossible to differentiate (Elyada et al., 2019, Obradovic et al. 2021 ). Even with cutting edge analysis tools such as the Seurat analysis pipeline (Butler et al., 2018), which can often identify individual subpopulations, scRNAseq gene expression data remains limited in its ability to elucidate fine-grain biological mechanisms due to its sparseness. Additionally, interrogation of individual genes of interest across cells is significantly impaired, particularly for transcription factors and signaling molecules, which do not need to be abundantly transcribed in order to fundamentally drive cell phenotype through their downstream effects on transcriptional state. To address these limitations, we have shown that network-based analysis of protein activity, using the VIPER and metaVIPER algorithms (Ding et. al., 2018; Obradovic et al., 2021) , can provide accurate, quantitative assessment for >6,000 proteins, including transcription factors, co-factors, chromatin remodeling enzymes, and signaling proteins. Moreover, we have shown that protein activity-based analysis can help identify rare subpopulations that are responsible for the presentation of key macroscopic phenotypes, ranging from immune evasion (Thorsson et. al., 2018) to relapse following surgery (Obradovic et. al., 2021) . It can also help identify master regulator proteins representing mechanistic, causal determinants of cell state and cell state transitions, such as to de-differentiation to a pluripotent stem cell state (Kushwaha, 2015) or transdifferentiation between distinct tumor cell states (Laise et al., 2021). However, these analyses can be extremely complex because they require assembly of lineage specific regulatory networks and master regulator analyses that are challenging for biologists who are not trained in network biology. To allow broad access to these methodologies to biologists with relatively limited network-based analyses expertise, we introduce a comprehensive pipeline for Protein Activity Inference for Single Cell Studies (PISCES), which is made available to the research community via a generaluse R package. The pipeline automates the optimal generation of lineage specific regulatory ARACNe is an information theoretic algorithm for the inference of the direct transcriptional targets of transcriptional regulator proteins, as well as the least indirect targets of signal transduction proteins. This allows reconstructing the tissue specific repertoire of transcriptional targets (regulon) of ~6,500 regulatory and signaling proteins, including surface markers (SMs). VIPER computes the activity of each protein based on the differential expression of the genes in its regulon, as assessed by weighted gene set enrichment analysis. Since regulons are generally large, containing up to several hundred genes, we prune them to include the same number of the most likely targets (between 50 and 100), to avoid biasing the statistical significance of the gene set enrichment analysis, as discussed in (Alvarez et al., 2016) . As a result, even when the specific gene encoding for a protein of interest is undetected, VIPER can still quantitatively assess its activity ( Figure 1B) . Previous work in the Califano lab has shown the accuracy and reproducibility of these algorithms when used to analyze bulk data. Indeed, ARACNe and VIPER have been used extensively to identify master regulators (MRs) that were experimentally validated as mechanistic determinants of diverse biological states, many of which have been extensively validated, see (Rajbhandari et. al To adapt these tools to the analysis of scRNASeq profiles, PISCES implements three major modifications. First, an initial gene expression-based cluster analysis is used to identify molecularly distinct cellular subpopulations representing distinct sub-lineages. Fine grain cluster analysis is not necessary as we have shown that regulatory networks for closely lineage-related cells are virtually indistinguishable (Mani et al, 2010). ARACNe is then used to generate distinct regulatory networks for each cluster containing at least N = 500 cells. Second, to increase regulon coverage, cells within each sub-lineage-related cluster are combined into "meta-cells" using a K-nearest-neighbors graph analysis. This creates pseudo-bulk samples that can then be analyzed by ARACNe, producing networks with more accurate edges, larger regulons, and greater coverage of regulatory proteins. Finally, rather than using VIPER for protein activity measurement, we use its derivative metaVIPER (Ding et al., 2018) , which is designed to optimally integrate protein activity inferences from multiple networks. This allows for the use of multiple single-cell and, when available, lineage-matched bulk-tissue-derived networks. Downstream of the ARACNe and metaVIPER analyses, PISCES provides access to a variety of novel protein-activity based clustering and data visualization algorithms, in addition to implementing interoperability with the popular Seurat single-cell data format. In order to establish the efficacy of these tools and optimal parameters for future benchmarking and improvement, we have performed both technical and biological validation experiments, first by evaluating reproducibility of protein activity assessment from progressively downsampled data, and then by assessing concordance of gene expression and protein activity to antibodybased measurements using multiplexed FACS (Cytek) and CITE-Seq (Stoeckius et. al., 2017) . Taken together, the results of these benchmarks show that the PISCES analytical pipeline dramatically outperforms gene expression-based analyses and even outperforms experimental assessment via selected antibodies, while allowing essentially proteome-wide activity quantitation. As such, these data suggest that PISCES provides a valuable and highly flexible tool for the analysis of scRNA-Seq datasets, which greatly improves the granularity of cell subpopulation detection, allowing detection of rare yet biologically relevant subpopulations that would be missed by gene expression analysis, due to gene dropout issues, and supports accurate assessment of Master Regulators of single-cell states. The PISCES pipeline takes a single-cell Unique Molecular Identifier (UMI) count matrix as input, with genes organized by row and cells by column. Initial Quality Control filtering is adjustable, with user-defined parameters. By default it will remove cells with fewer than 1,000 UMIs or more than 25% mitochondrial gene UMIs. The gene expression matrix is then normalized and scaled to generate a matrix of gene expression In parallel, the normalized gene expression profile is transformed into a gene expression signature (GES). This can be done in a number of ways, either with an internal normalization against mean and standard deviation of all cells to query differences within the dataset or with an external reference to answer experiment-specific questions (i.e. the differences between cancerous and healthy cells). By default, PISCES will perform a standard internal normalization to generate the gene expression signature, which is then transformed into a matrix of protein activity using MetaVIPER. MetaVIPER takes as input the GES and the previously generated cluster-specific networks and identifies the best network matches to each sample by maximum regulon consensus. Enrichment scores from each matched network are then integrated using a weighted-average to produce a final enrichment value that can then be used for downstream visualization and analysis. The entire pipeline is visualized in Figure 1A . Since every scRNAseq experiment is unique-depending on the specific cell types, the quality of the data, or the overarching question driving the research-PISCES allows users to fine tune the pipeline to match their specific requirements. For instance, since Seurat represents a widely used platform for scRNAseq analysis at the gene expression level, the Seurat batch-correction and SCTransform data scaling approach are incorporated as optional pre-processing steps to generate gene expression signatures before they are analyzed by PISCES. These may, however, be substituted by any user defined normalization and data scaling routine, such that effect of alternative normalization or pre-processing methods may be tested using PISCES's default technical and biological benchmarks. Output from the PISCES pipeline is converted to a Seurat object for convenient export into a variety of external visualization or processing tools, and analyzed by other commonly used tools. In particular, cell type annotation is implemented in PISCES at the single-cell level using SingleR (Looney et. al., 2019), which infers cell types represented in the dataset by correlation of gene expression to expression of sorted bulk-RNASeq reference datasets and stores these labels as metadata for downstream analysis. To benchmark PISCES reproducibility relative to gene expression and to establish an optimal UMI depth for user-driven adjustment of metacell parameters, we executed the entire pipeline using progressively down-sampled profiles from relatively high-depth scRNAseq data. For this purpose, we used the SNU-16 cell line, a relatively homogenous stomach adenocarcinoma model that is transcriptionally complex and produces high UMI counts per cell (i.e., 40,000-50,000), on the high end of the typical yield for cell lines and significantly above the yield produced by clinical samples. Average UMI count in our dataset was 41,915 across 6157 single cells ( Figure S2A ). To create synthetic data with lower depth, we down-sampled this data by first drawing each cell's total UMI-count from a multinomial distribution with mean target depth manually specified and a uniform probability weight over all cells, then drawing the gene-specific counts from a second multinomial with probabilities given by the proportions of genes in the original, full depth profile for each cell. This procedure was applied with target depths between one and ten thousand UMIs at a step-size of 1,000 and between 10,000 and 40,000 UMIs at a step-size of 5,000. We then generated meta-cells using a consistent sub-set of 500 cells for each down-sampled matrix with depth of 10,000 UMIs or fewer. These data were used to generate 27 ARACNe networks in total; one for the full data, 16 from each of the down-sampled gene expression profiles, and 10 from each of the meta-cell matrices. To generate gene expression signatures, we normalized each down-sampled matrix against the Cancer Cell Line Encyclopedia (CCLE) from The Broad. Because this data is from bulk-sequencing, we first had to apply the previously described down-sampling scheme in order to generate depth-matched reference samples for each single-cell matrix. Gene expression profiles were then normalized gene-by-gene by subtracting the mean expression from CCLE, then dividing by the standard deviation of the expression in CCLE. Finally, we generated VIPER matrices for all pairwise combinations of GES and regulatory networks, culminating in 459 VIPER matrices. A flowchart illustrating this experimental design is shown in Figure S5 . To assess the reproducibility of gene expression and protein activity signatures at different depths, we computed the cell-by-cell Pearson correlation between each down-sampled matrix and the full depth data. In each cell, we subset the comparison to those genes or proteins with significantly different expression or activity (p-value < 0.05 with Bonferroni correction) in the full-depth data, then computed the correlation coefficients cell-bycell between full-depth and down-sampled data using this subset. This reduction was performed in order to avoid inflation of correlation values based on non-significant data. In protein activity signatures generated fully from down-sampled data (down-sampled GEP as input to ARACNe, down-sampled GES as input to VIPER), we observe a statistically significant improvement in correlation to full-depth data relative to gene expression signature at all depths above 5,000 UMIs ( Figure 2A ; p-value < 0.05 by Wilcoxon signed rank test). Strikingly, when an ARACNe network generated from full-depth GEP is applied to down-sampled GES as input to VIPER, correlation to original full-depth VIPER signature is strongly conserved even at extremely low UMI counts, remaining above 0.75 on average at UMI depth of 1000, where average correlation of gene expression signature to full-depth data is below 0.1. This emphasizes the importance of constructing a high-quality ARACNe network in the VIPER inference pipeline, such that applying high-quality networks inferred for a given cell type from one dataset to a matched cell type in lower-quality data is likely to provide a significant boost to the power of protein activity inference even from very-low-depth data. Additionally, we find a significant improvement in correlation values when constructing metaCell-based ARACNe networks from lower-depth data ( Figure 2B ), such that metaCell networks applied to run VIPER on GES matrices with mean UMI count of 3000 approach the inference quality seen when running ARACNe and VIPER on gene expression matrices with a mean UMI count of 20,000. However, at the very low mean depth of 1000 UMI/cell this breaks down, and metaCell ARACNe network inference no longer offers any statistically significant improvement over inference on low-depth data. Therefore, we strongly recommend applying the metaCell ARACNe network inference option in PISCES for any datasets with data quality between 1000 and 5000 mean UMIs/cell, which is common in clinical datasets. Overall, these data show that the correlation between full-depth and down-sampled gene expression signatures is poor even at relatively high depth, and decays rapidly to a median value of less than 0.25 even at depths of 10,000 UMIs/cell (purple bars, Figure 2A ). Protein activity, by comparison, is much more robust, significantly outperforming gene expression at all depths above 5,000 UMIs/cell. Interestingly, down-sampling only the gene expression signature input to VIPER while retaining a full-depth ARACNe network had little effect (red bars, Figure 2A ) on protein activities robustness, while down-sampling either the data using to generate ARACNe networks or both ARACNe data and gene expression signature (green and blue bars respectively) had a much more significant effect on correlation to original full-depth VIPER matrix, which was partially rescued by metaCell ARACNe. The full heatmap showing mean correlation across cells comparing all VIPER matrices against full depth data is available in the supplement ( Figure S3 ). These findings indicate that the quality of the ARACNe networks is the driving force behind protein activity signatures' ability to retain signal at low UMI depths and supports the idea of using metacells to rescue signal within the ARACNe network or use context-appropriate bulk networks where available. Furthermore, when gene expression-based clustering was limited only to the genes encoding for the proteins in the CITE-Seq panel, the single-cell RNA-Seq dropout problem was so severe that cluster structure was completely lost (Figure 3D ). This suggests that critical proteins, whose role in the biology of these populations is extremely well established, are completely missed in terms of their gene expression. In sharp contrast, PISCES analysis fully recapitulated the experimentally assessed cluster structure when the analysis was limited to the proteins represented on the CITE-Seq panel ( Figure 3E) . Critically, the coefficients of variation (i.e., ), as computed for gene-expression, antibody-measured protein abundance, and VIPER-measured protein activity, shows that VIPER-measured activity dramatically outperforms gene expression (p=0.0004 by paired t-test across the entire panel) and even antibody measurements for most proteins (p = 0.0083 across the entire panel), indicating a significant improvement in reproducibility and signal-to-noise ratio ( Figure 3A) . Finally, we assessed correlation between either gene expression or VIPERmeasured protein activity against protein abundance as assessed by CITE-Seq. Across the board VIPER significantly outperformed gene expression ( Figure 3B) , with strong visual clusterseparation even on single genes (Figure 3F) , and pairwise plots of VIPER activity vs paired CITE-Seq antibody staining resembling flow cytometry plots (Figure 4) . Furthermore, we would like to point out that protein abundance, as assessed by antibodies, is a poor proxy for protein activity. This is because, even after a protein is expressed, its activity is manifested only when it is effectively post-translationally modified, it is translocated into the appropriate sub-cellular compartment, and it has formed complexes with critical cognate binding partners. By measuring activity via expression of highly multiplexed gene reporter assay, VIPER can effectively report on the activity of proteins, which has been so far elusive, especially in may be used as a benchmark of the high concordance between PISCES and measured protein abundance, and the degree to which PISCES improves signal-to-noise with respect to antibodybased measurements. The PISCES package for analysis of single-cell RNA-Sequencing data represents a comprehensive and highly generalizable pipeline for inference of protein activity to maximize utility of single-cell datasets. We have demonstrated its ability to mitigate the single-cell RNA-Seq data dropout problem and recapitulate high-depth data structure even from low UMI counts. We have also demonstrated its ability to recapitulate biological structure from CITE-Seq antibody-based protein profiling with much better gene-by-gene signal than gene expression. These technical and biological validations also serve as benchmarks for further refinement of the pipeline by which any changes can be comprehensively assessed. In addition to the technical benchmarking of correlation between down-sampled and full-depth data, the extent of improvement by PISCES in coefficient of variation, number of genes recovered, and gene-by-gene correlation to matched antibody profiling represent a critical biological benchmark for alternative workflows by PISCES users as new pre-processing methods are incorporated and existing algorithms are refined. The pipeline has been consciously designed to be highly modular, with customizable workflows and parameter optimization enabled by separate pre-processing, meta-cell, and clustering steps and interoperability with the popular Seurat workflow. We recommend targeting a median UMI depth / cell of no less than 5000, with the crucial step being inference of ARACNe network from highdepth data, applying the metaCell algorithm to improve sample depth for ARACNe network inference. Wherever a high-depth-derived ARACNe net is available, inference fidelity is high even on extremely low-depth datasets, so the increased availability of single-cell RNA-Seq datasets across a broad range of tissue contexts will continually allow construction of an expanding library of ARACNe networks which can be broadly applied to new data. PISCES is chiefly limited by the fraction of 6,500+ total proteins recoverable at low UMI depth, although the number of proteins recovered nearly always compares favorably to CITE-Seq, which requires time-consuming antibody titration and is limited to predefined cell surface proteins, whereas PISCES captures proteins with the strongest signal-to-noise from the data and can infer both cell surface and intracellular protein activity. Applying metaCell ARACNe network inference addresses this to some degree, such that nearly 100% of all proteins recoverable at full depth in SNU-16 cell line sequencing data were recovered at a UMI depth of 10,000, where only half of the proteins inferred at full-depth were recoverable without metaCell, and over half of proteins remained recoverable with metaCell even at critically low UMI depth of 1,000 ( Figure S2C ). Future iterations of the pipeline will continue to improve on the fraction of recoverable proteins by integrating and testing novel pre-processing procedures and optimization of the ARACNe and VIPER inference steps. The development version of the PISCES R package will be continually available at https://github.com/califano-lab/PISCES. Quality Control, Normalization, and Scaling: As a pre-processing step, low quality cells and genes lacking enough data to be useful are removed from the analysis. Cell quality is determined by two primary factors -read depth and mitochondrial gene percentage. Samples with too many or too few reads are likely sequencing errors (doublets or empty droplets), while a high mitochondrial gene percentage is indicative of cell stress or damage. This latter group of cells will typically have a biased transcriptome not representative of the actual cell state. For most data sets, PISCES will simply remove genes with no reads at all. For larger data sets, genes that appear in less than 1% of the total cells will be removed in order to optimize computational complexity. Cells with fewer than 1000 total UMIs or mitochondrial transcript fraction greater than 25% are also removed in quality-control filtering. Filtered data are then normalized to log10(counts per million + 1). A gene expression signature is then generated from the normalized data using either double rank transformation or Seurat SCTransform scaling function. ARACNe is available on github. Once cluster-specific networks have been generated, they will serve as the input to a final VIPER run. More accurate networks will naturally lead to more accurate inferences of protein activity, which in turn allows for more robust downstream analyses. Bulk networks can also be incorporated to fill in any gaps present in the single-cell networks, as ARACNe will typically be unable to infer regulons for some proteins even with the implementation of MetaCells. These protein activities inferred from bulk should be considered less accurate, but they can be used to follow-up on previously known proteins of interest, for instance. Once a final VIPER matrix has been inferred, the data can be re-clustered. VIPERspace will typically allow for the parsing of smaller, more transcriptionally distinct populations. These classifications can then be used for a master regulator analysis that identifies the driving regulators of the differential cell state. This can be done in several ways, with a Bootstrapped Mann Whitney-U test being the most robust. Cluster-specific Stouffer integration or a data-wide ANOVA or Kruskal-Wallis test are also viable alternatives and implemented within PISCES. Weighted VIPER: Previously, MetaVIPER was developed as an initial adaptation of VIPER to single-cell data. By using multiple networks, MetaVIPER sought to accurately recapitulate protein activity in populations for which no context-specific network was available. To briefly explain this method, protein activity would be inferred from a given gene expression signature using multiple networks, which would then be integrated on a protein-by-protein basis using the square of the NES. Since a non-relevant network would generate a protein activity NES close to zero under the null model, networks that generate more extreme NES's can be interpreted to more accurately match the given biological context and were thus weighted more heavily. This approach has been improved on further in PISCES. Rather than relying on the square of the NES to integrate networks in a protein-by-protein manner, Weighted VIPER utilizes all the proteins in a given sample to determine network accuracy. For each sample, the NES's generated by the set of networks for each protein are ranked, and the ranks are totaled across proteins. Networks are then weighted based on their frequency as the most-accurate network. As an example, if network A generates the most extreme NES for 50% of the proteins in a sample and network B generates the most extreme NES for 25% of the proteins, network A will be weighted twice as heavily in the integration. This technique utilizes all proteins as a multiplexed reporter of network accuracy, allowing for more accurate matching of samples and the most-context specific network available. Visualizing data with thousands of dimensions is a fundamental challenge of transcriptomics. PISCES has a number of pre-built plotting functions to aid in the visualization of results. Scatter plots are based in UMAP coordinates, with the starting features filtered by the most significant proteins within each sample. Functions within PISCES allow for the visualization of clustering schemes, protein activity, or gene expression in UMAP space, along with density plotting. Additionally, we provide heatmap functionality for more tractable succinct visualization of a set of genes or proteins grouped by cluster, such as a set of known markers or a list of candidate master regulators. The default clustering method implemented in Seurat is Partitioning Around Medioids (PAM). However, for large datasets aggregating hundreds of thousands of single-cells, PAM is computationally slow, requiring more computational power than is available to the average user and computation of pairwise distance matrices exceeding the vector size limit in R. In such cases, it is preferable to run a networkbased Louvain clustering, as implemented in Seurat, which optimizes network modularity score. However, practical implementations of Louvain clustering include a user-adjustable resolution parameter which allows over-clustering and under-clustering without an objective cluster quality metric. To solve this problem, we have implemented a hybrid clustering approach in PISCES which performs cluster assignment in two steps. First, Seurat Louvain clustering is performed with resolution values ranging from 0.01 to 1.0 at intervals of 0.01, then cluster quality is evaluated at each resolution value to select an optimum in this range. For each resolution value, clustered cells are subsampled to 1000, and silhouette score is computed for these 1000 cells and their corresponding cluster labels, with correlation distance metric. This procedure is repeated for 100 random samples to compute a mean and standard deviation of average silhouette score at each resolution value. The highest resolution value that maximizes mean silhouette score is selected as the optimal resolution at which to cluster the data. Optimization, PISCES further implements a Multi-Way K-Means Clustering approach. Transitioning populations, such as in a differentiation pathway, are extremely common, and such relationships will not be accurately characterized by a discrete clustering scheme. To Figure 3E , capturing all represented cell types. S1C) Gene Expression Heatmap of the top5 inferred master regulators of each VIPER cluster, scaled by row. S1D) VIPER Activity Heatmap of the top5 inferred master regulators of each VIPER cluster, scaled by row. at each down-sampling depth (x-axis) relative to full-depth data, such that fraction decreases log-linearly with down-sampling depth. S2C) Fraction of Total ARACNe network regulons relative to full-depth data (y-axis) recovered at each down-sampling depth from 1000 to 10000 UMI/cell, with metaCell approach (red) or without metaCell approach (black). Heatmap of mean correlation values compared to original full-depth VIPER matrix with full-depth ARACNe network for each combination of down-sampled ARACNe and VIPER gene expression signature depth. Each row corresponds to depth of gene expression signature input to VIPER, and each column corresponds to depth of gene expression input to ARACNe. Correlation is subset to proteins differentially up-regulated or down-regulated (p<0.05) within original full-depth VIPER matrix, on a cell-by-cell basis, and mean correlation across all cells is plotted for each box on the heatmap corresponding to a particular down-sampling approach. To-Do: Add Single-Cell Type Atlas The changing mouse embryo transcriptome at whole tissue and single-cell resolution A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling The differential immune responses to COVID-19 in peripheral and lung revealed by single-cell RNA sequencing Single-cell RNA sequencing reveals SARS-CoV-2 infection dynamics in lungs of African green monkeys Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma A Cancer Cell Program Promotes T Cell Exclusion and Resistance to Checkpoint Blockade Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer Single-Cell Protein Activity Analysis Identified Recurrence-Associated Renal Tumor Macrophages ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information Functional characterization of somatic mutations in cancer using network-based inference of protein activity Network-based assessment of HDAC6 activity is highly predictive of pre-clinical and clinical responses to the HDAC6 inhibitor ricolinostat Cross-Cohort Analysis Identifies a TEAD4-MYCN Positive Feedback Loop as the Core Regulatory Element of High-Risk Neuroblastoma Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm Simultaneous epitope and transcriptome measurement in single cells Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage Comprehensive Integration of Single-Cell Data BLUEPRINT: mapping human blood cell epigenomes An integrated encyclopedia of DNA elements in the human genome Single-cell RNA sequencing reveals the heterogeneity of liver-resident immune cells in human Interrogation of a context-specific transcription factor network identifies novel regulators of pluripotency The Immune Landscape of Cancer. Immunity Single-cell entropy for accurate estimation of differentiation potency from a cell's transcriptome This work was supported by the NCI Outstanding Investigator Award R35CA197745 to AC. P.L. is Director of Single-Cell Systems Biology at DarwinHealth, Inc., a company that has licensed some of the algorithms used in this manuscript from Columbia University. .A.C. is founder, equity holder, consultant, and director of DarwinHealth Inc., which has licensed IP related to these algorithms from Columbia University. Columbia University is an equity holder in DarwinHealth Inc.