key: cord-1050362-ne45m61x
authors: Kojima, Yasuhiro; Mii, Shinji; Hayashi, Shuto; Hirose, Haruka; Ishikawa, Masato; Akiyama, Masashi; Enomoto, Atsushi; Shimamura, Teppei
title: Single-cell colocalization analysis using a deep generative model
date: 2022-04-11
journal: bioRxiv
DOI: 10.1101/2022.04.10.487815
sha: 2630d99538b4eff8fa0b7b1eac230473fbb53d2b
doc_id: 1050362
cord_uid: ne45m61x

Analyzing the colocalization of single cells with heterogeneous molecular phenotypes is essential for understanding cell-cell interactions, cellular responses to external stimuli, and their biological functions in diseases and tissues. However, high-throughput methods for identifying spatial proximity at single-cell resolution are practically unavailable. Here, we introduce DeepCOLOR, a computational framework based on a deep generative model that recovers inter-cellular colocalization networks at single-cell resolution by integrating single-cell and spatial transcriptomes. It segregates cell populations defined by their colocalization relationships and predicts cell-cell interactions between colocalized single cells. By reconstructing spatial colocalization maps at single-cell resolution, DeepCOLOR identified plausible cell-cell interaction candidates in mouse brain tissues, human squamous cell carcinoma samples, and human lung tissues infected with SARS-CoV-2. DeepCOLOR is broadly applicable to studying cell-cell interactions in any spatial niche. Our newly developed computational framework could help uncover molecular pathways across single cells connected by colocalization networks.

[...] with the TSK population (Fig. 5-a,b). Furthermore, we confirmed that the spot-wise product of spatial assignments between the colocalized populations was specifically enriched at the tumor-stromal boundary, as expected from the leading-edge molecular phenotype of the TSK population.
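The spot-wise product of spatial assignments between two populations can be illustrated with a minimal NumPy sketch. It assumes a non-negative assignment matrix `W` laid out as spots × cells and aggregates each population by summing its cells' assignments per spot before multiplying; the layout, the sum-then-multiply aggregation, and the function name `spotwise_product` are assumptions for illustration, not necessarily DeepCOLOR's exact definition.

```python
import numpy as np

def spotwise_product(W, pop_a, pop_b):
    """Per-spot product of the summed assignments of two cell populations (hypothetical helper).

    W     : (n_spots, n_cells) non-negative assignment matrix (assumed layout).
    pop_a : indices of the cells in population A (e.g. TSKs).
    pop_b : indices of the cells in population B (e.g. paired fibroblasts).
    """
    W = np.asarray(W, dtype=float)
    a = W[:, pop_a].sum(axis=1)  # how strongly population A is assigned to each spot
    b = W[:, pop_b].sum(axis=1)  # how strongly population B is assigned to each spot
    return a * b                 # large only where both populations are present


# Toy example: only spot 2 holds cells from both populations.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
print(spotwise_product(W, [0], [1]))  # [0. 0. 1.]
```

Ranking spots by this product (for example with `np.argsort`) and comparing the top-ranked spots with annotated tumor-stromal boundary spots is one way to check the kind of enrichment described above.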
To explore the molecular profiles of the fibroblast population in paired cluster 0, we analyzed genes differentially expressed in these populations compared with other fibroblasts (Fig. 5-d). These populations demonstrated high expression of MMP14 associ[...] pair cluster 0 (P < 10⁻⁷ and P < 10⁻¹⁰, respectively). Since the glycolysis pathway is reported to be upregulated in many invasive cancers [19], this result further supports the colocalization of the fibroblast population with TSKs, which demonstrate an invasive leading-edge phenotype.

[...] and alveolar cells based on the estimated single-cell colocalization (Fig. 6-b). We found that the strongest ligand activity initiating from fibroblasts to AT2 cells was that of NAMPT, which plays an important role in activating the innate immune response [7] and is associated with the development of acute respiratory distress syndrome in lung injury [40]. PECAM1 showed the strongest ligand activity initiating from monocytes to AT2 cells, and its expression level has been associated with the severity of COVID-19 [31]. We also found that the strongest ligand activity initiating from CD8+ T cells to AT2 cells was that of TNF, whose expression level is also associated with disease severity and survival of patients with COVID-19. These results showed that the colocalization-based ligand activity analysis discerned appropriate [...]

[...] genes than 500 genes or more mitochondrial genes than 5% of total expression. We computed and visualized UMAP embeddings of the latent cell states of single cells using 'scanpy'. We also used 'scanpy' to cluster the spatial transcriptome data with the Leiden algorithm using default parameters.
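The quality-control thresholds in the fragment above (500 genes, 5% mitochondrial expression) can be expressed as a per-cell mask. This is a minimal NumPy sketch, assuming a dense cells × genes count matrix, a boolean vector marking mitochondrial genes, and that the cells removed are those with fewer than 500 detected genes or more than 5% mitochondrial counts (the first half of that criterion is cut off in the source). The name `qc_mask` is hypothetical.

```python
import numpy as np

def qc_mask(counts, is_mito, min_genes=500, max_mito_frac=0.05):
    """Return a boolean mask of cells to keep (hypothetical helper).

    counts  : (n_cells, n_genes) raw count matrix.
    is_mito : (n_genes,) boolean vector marking mitochondrial genes.
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    n_genes = (counts > 0).sum(axis=1)             # detected genes per cell
    total = counts.sum(axis=1)                     # total counts per cell
    mito = counts[:, is_mito].sum(axis=1)          # mitochondrial counts per cell
    mito_frac = np.divide(mito, total,
                          out=np.zeros(total.shape, dtype=float),
                          where=total > 0)         # avoid 0/0 for empty cells
    return (n_genes >= min_genes) & (mito_frac <= max_mito_frac)
```

After filtering, the UMAP and Leiden steps in 'scanpy' would typically go through `sc.pp.neighbors` (whose `use_rep` argument can point at a latent representation stored in `adata.obsm`), then `sc.tl.umap` and `sc.tl.leiden`; which key holds DeepCOLOR's latent states is not stated in the source.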
[...] c, $x_c \in \mathbb{R}^G$, as shown below:

[...]

where $G$ is the number of genes, $z \in \mathbb{R}^M$ is a latent cell state, $f_\theta : \mathbb{R}^M \to \mathbb{R}^G$ is a decoder neural network described in Supplementary Table 1, and $\alpha^{(sc)}_g$ is the dispersion parameter of gene $g$. We approximated the posterior distribution of the latent representation, $P(z_c \mid x_c) \propto P(x_c, z_c)$, using a Gaussian distribution as shown below:

[...]

where $\mu_\phi, \sigma^2_\phi : \mathbb{R}^G \to \mathbb{R}^M$ are encoder neural networks described in Supplementary Table 1. To approximate the true posterior distribution appropriately, we maximized the evidence lower bound (ELBO) with respect to $\theta$ and $\phi$, which is defined as follows:

[...]

where $X = (x_1, \ldots, x_N)^T$ and $N$ is the total number of cells. We maximized this ELBO using the Adam optimizer with a learning rate of 0.0004 for 500 epochs.

We assumed that the expression of gene $g$ at spatial spot $s$, $e_{s,g}$, follows a negative binomial distribution:

$$P(e_{s,g} \mid \mu_{\theta,\theta',s,g}, \alpha^{(sp)}_g) = \mathrm{NegativeBinomial}(e_{s,g} \mid \mu_{\theta,\theta',s,g}, \alpha^{(sp)}_g)$$

where $\mu_{\theta,\theta',s,g}$ is the unobserved expression level of gene $g$ at spot $s$ and $\alpha^{(sp)}_g$ [...] the mapping function, we modeled $\mu_{\theta,\theta',s,g}$ as the weighted average of the scRNA-seq expression profile, given the following approximated posterior distribution of the latent cell states:

[...]

where $q_\phi(z \mid X)$ is the posterior distribution of a latent cell state given the total scRNA-seq data set $X$, $r_g$ is the gene-wise technical capture ratio of the spatial transcriptome observation relative to that of scRNA-seq, and $l_g$ is a gene-wise shift parameter assumed to represent ambient RNA in the spatial transcriptome data. Since the exact integration in equation X is not feasible, we calculated a stratified Monte Carlo approximation of the posterior distribution as shown below:

[...]

To derive the mapping function optimized for the data, we maximized the log-likelihood

$$L = \sum_{s,g} \log P(e_{s,g} \mid \mu_{s,g}, \alpha^{(sp)}_g).$$
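The spatial log-likelihood $L$ can be sketched numerically. The block below is a minimal sketch under assumptions the source does not confirm: the negative binomial is written in mean/inverse-dispersion form (Var = μ + μ²/α), the spot mean is taken to be $\mu_{s,g} = r_g \sum_c W_{s,c}\,\hat{x}_{c,g} + l_g$ with a spots × cells weight matrix `W`, and `x_hat` stands in for the Monte Carlo estimate of the decoded expression $f_\theta(z)$ averaged over samples from $q_\phi$. All function names are hypothetical.

```python
import math
import numpy as np

# Elementwise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nb_logpmf(x, mu, alpha):
    """log NB(x | mean mu, inverse dispersion alpha); Var = mu + mu**2 / alpha.

    The mean/inverse-dispersion parameterization is an assumption.
    Requires mu > 0 and alpha > 0.
    """
    x, mu, alpha = np.broadcast_arrays(np.asarray(x, dtype=float),
                                       np.asarray(mu, dtype=float),
                                       np.asarray(alpha, dtype=float))
    log_sum = np.log(alpha + mu)
    return (_lgamma(x + alpha) - _lgamma(alpha) - _lgamma(x + 1.0)
            + alpha * (np.log(alpha) - log_sum)
            + x * (np.log(mu) - log_sum))

def spatial_mean(W, x_hat, r, l):
    """Hypothetical mapping: mu[s, g] = r[g] * sum_c W[s, c] * x_hat[c, g] + l[g].

    W     : (n_spots, n_cells) weights; rows summing to 1 give a weighted average.
    x_hat : (n_cells, n_genes) decoded expression, e.g. a Monte Carlo average of f_theta(z).
    r, l  : (n_genes,) capture ratio and ambient-RNA shift.
    """
    return r * (W @ x_hat) + l

def spatial_loglik(E, W, x_hat, r, l, alpha_sp):
    """L = sum over spots s and genes g of log P(e[s, g] | mu[s, g], alpha_sp[g])."""
    mu = spatial_mean(W, x_hat, r, l)
    return float(nb_logpmf(E, mu, alpha_sp).sum())
```

In a real model this quantity would be maximized with an autodiff framework rather than evaluated in NumPy; the sketch only makes the structure of $L$ and of the assumed mean concrete. The positive shift $l_g$ also keeps $\mu_{s,g} > 0$, which the log terms require.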
Since the computational complexity of the mean parameter defined above is proportional to the number of cells, we calculated and optimized the likelihood for spatial transcriptome observation [...] where $p$ is the specified positive rate and $\hat{W}_{s,c}$ is the estimated assignment of cell $c$ in spot $s$.

For the detection of colocalized populations, we evaluated the detection accuracy for cluster pairs belonging to the same region. As a predictor, we calculated the mean colocalization score across cell pairs within each cluster pair:

[...]

We compared the performance of DeepCOLOR with that of existing computational methods for deconvolving spot-wise spatial transcriptomes, namely Cell2location [29] and Tangram [4]. For both methods, we used default parameters in the evaluation experiments. Since Cell2location provides cluster-wise abundance for each spot, we divided each cluster's weight equally among its single cells for performance evaluation.

SPOTlight: seeded NMF regression to [...]
Deep generative modeling for single-cell transcriptomics
Chromatin potential identified by shared single-cell profiling of RNA and chromatin
Bioactive recombinant human oncostatin M for NMR-based screening in drug discovery
Integration of spatial and single-cell transcriptomics localizes epithelial cell-immune cross-talk in kidney injury
Epidermal hyperplasia and appendage abnormalities in mice lacking CD109
Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas
Endothelial eNAMPT amplifies pre-clinical acute lung injury: efficacy of an eNAMPT-neutralising monoclonal antibody
Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution
A framework for advancing our understanding of cancer-associated fibroblasts
Characterization of cell fate probabilities in single-cell data with Palantir
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics
VAE with a VampPrior
High-definition spatial transcriptomics for in situ tissue profiling

Figure 1: Schematic representation of the workflow of DeepCOLOR. DeepCOLOR takes single-cell and spatial transcriptomes as training inputs and reconstructs the spatial distribution and denoised expression profiles from noisy single-cell observations. Using the spatial distribution, we can evaluate colocalization relationships between single cells and identify the colocalization network, proximal ligand-receptor communication, and colocalized cell-pair clusters.
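Two evaluation steps described in the methods comparison above, splitting Cell2location's cluster-wise abundance equally among the cells of each cluster and averaging colocalization scores over the cell pairs of a cluster pair, can be sketched in NumPy. The array layouts, the function names, and the cell-by-cell score matrix `S` (supplied by the caller) are assumptions for illustration; how DeepCOLOR computes the colocalization score itself is not reproduced here.

```python
import numpy as np

def split_cluster_abundance(A, labels):
    """Spread cluster-wise abundance equally over the cells of each cluster (hypothetical helper).

    A      : (n_spots, n_clusters) abundance per spot, e.g. from Cell2location.
    labels : (n_cells,) integer cluster index of each single cell.
    Returns a (n_spots, n_cells) cell-level weight matrix.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    sizes = np.bincount(labels, minlength=A.shape[1])  # number of cells per cluster
    return A[:, labels] / sizes[labels]                 # each cell gets an equal share

def cluster_pair_mean(S, labels, k1, k2):
    """Mean colocalization score over all cell pairs (i in cluster k1, j in cluster k2).

    S is a (n_cells, n_cells) cell-by-cell score matrix supplied by the caller.
    """
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    return float(S[np.ix_(labels == k1, labels == k2)].mean())
```

The equal split preserves each spot's cluster totals: summing the returned columns of one cluster recovers the corresponding column of `A`, so the cell-level weights stay comparable with methods that assign individual cells directly.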