key: cord-0313707-pfhnlvv1 authors: Liao, Jie; Qian, Jingyang; Liu, Ziqi; Chi, Ying; Zheng, Yanrong; Shao, Xin; Cheng, Junyun; Cui, Yongjin; Guo, Wenbo; Yang, Penghui; Hu, Yining; Bao, Hudong; Chen, Qian; Li, Mingxiao; Zhang, Bing; Fan, Xiaohui title: Reconstruction of the cell pseudo-space from single-cell RNA sequencing data with scSpace date: 2022-05-08 journal: bioRxiv DOI: 10.1101/2022.05.07.491043 sha: 8abc91ecd860fd02247cba5d79b5a54f89b21053 doc_id: 313707 cord_uid: pfhnlvv1

Tissues are highly complex, with spatial heterogeneity in gene expression. However, cutting-edge single-cell RNA-seq technology discards the spatial information of individual cells, information that contributes to the characterization of cell identities. Herein, we propose single-cell spatial position associated co-embeddings (scSpace), an integrative algorithm to distinguish spatially variable cell subclusters by reconstructing cells onto a pseudo-space with spatial transcriptome references (Visium, STARmap, Slide-seq, etc.). We demonstrated that scSpace can define biologically meaningful cell subpopulations neglected by single-cell RNA-seq or spatially resolved transcriptomics. Using scSpace to uncover the spatial associations within single-cell data reproduced the hierarchical distribution of cells in the brain cortex and liver lobules, and the regional variation of cells in heart ventricles and the intestinal villus. scSpace identified cell subclusters in intratelencephalic neurons, which were confirmed by their biomarkers. The application of scSpace to melanoma and Covid-19 showed broad prospects for the discovery of spatial therapeutic markers.

Spatially resolved transcriptomics data are better examples than simulated data for evaluating the reconstruction performance of scSpace because cell coordinates in spatial data are biologically meaningful.
Thus, we first collected spatial transcriptomics data of human dorsolateral prefrontal cortex (DLPFC) 31 and mouse primary visual cortex (V1) 9, which were profiled by Visium and STARmap, respectively. Second, for each set of spatial transcriptomics data, we selected one slice as the spatial reference, and another slice with cell coordinates removed as test data.

As illustrated in Fig. 2a, scSpace successfully reconstructed the hierarchical structure of different human DLPFC layers in the pseudo-space, with the relative position between the layers preserved. The normalized pairwise distances between spots in the pseudo-space were calculated and found to increase monotonically with the pairwise distances in the original tissue, which is consistent with our structural correspondence assumption (Fig. 2b). Further analysis demonstrated that the pairwise distances between spots in the embedded pseudo-space and the ground truth were highly correlated (Fig. 2c). Moreover, we examined the spatial distribution of marker genes (HPCAL1, HOPX, MEFH, PCP4, KRT17, and MOBP) in different layers (Layer 2, Layer 3, Layer 4, Layer 5, Layer 6, and WM, respectively), and found that scSpace-predicted gene expression patterns correlated well with the original patterns (Fig. 2d). More examples used to validate the performance of scSpace are shown in Extended Data Fig. 2a. Similar results were reproduced when we tested scSpace using another spatially resolved single-cell transcriptomics dataset with 1020 targeted genes, which was derived from mouse V1 neocortex by STARmap (Fig. 2e and Fig. 2f).

Next, we applied scSpace to reconstruct the pseudo-space of a real scRNA-seq dataset derived from the Allen Brain Atlas (see Methods); the results confirmed the hierarchical structure of the mouse V1 neocortex.
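The comparison of pairwise distances between the pseudo-space and the original tissue (Fig. 2b, c) can be illustrated with a minimal Python sketch. It assumes Euclidean distances that are normalized by dividing by the largest pairwise distance; the helper names `pairwise_distances`, `normalize`, and `pearson` and the toy coordinates are hypothetical and are not part of scSpace.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def normalize(values):
    """Scale distances to [0, 1] by the maximum (an assumed normalization)."""
    top = max(values)
    return [v / top for v in values] if top > 0 else list(values)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: the "pseudo-space" is the true layout scaled and shifted,
# so the relative positions of the spots are preserved exactly.
true_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
pseudo_xy = [(2 * x + 5, 2 * y - 1) for x, y in true_xy]

d_true = normalize(pairwise_distances(true_xy))
d_pseudo = normalize(pairwise_distances(pseudo_xy))
r = pearson(d_true, d_pseudo)
```

Because only relative positions matter after normalization, a pseudo-space that differs from the tissue by scale or translation still yields a correlation of one in this toy case.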
The spatial distribution pattern of each layer was different, and the relative positions between these layers agreed well with the spatial reference (Fig. 2g). The pairwise distances between single cells from different layers and Layer 1 in the pseudo-space and the physical space increased synchronously (Fig. 2h). The spatial gene expression pattern of the biomarkers (Nr2f2, Rorb, Pcp4, Lamp5, and Myl4) correlated well with the spatial distribution of individual cells from corresponding layers in the pseudo-space (Fig. 2i).

Regional reconstruction of scRNA-seq data in different circumstances using scSpace.
To investigate the ability of scSpace to restore the relative spatial associations among cells, we focused on real scRNA-seq data that were obtained from seven distinct zones of an intestine 32 (Fig. 3a), nine layers of a liver lobule 33 (Extended Data Fig. 3a), and two segregated regions of an embryonic heart 34 (Fig. 3d), based on robust marker gene expression. After the cells were allocated to a pseudo-space, we found that the distribution of cells varied from region to region (Fig. 3b and Extended Data Fig. 3b, d), indicating diverse functional zonation in the tissue microenvironment. Along the intestinal villus axis from V1 to V6, the pairwise distances between cells and the villus crypt showed an increasing trend (Fig. 3c), which was consistent with expectation. Similar results were reproduced in the regional reconstruction of the liver lobule (Extended Data Fig. 3c) and embryonic heart. We further validated the expression patterns of marker genes in different regions.

Specifically, clustering analysis of scRNA-seq data of the embryonic heart was carried out using scSpace and Louvain, and then compared with the original annotations provided experimentally. As illustrated in Fig. 
3e

Discovery of spatially variable subpopulations in human cortex from scRNA-seq data.
As mentioned above, scSpace can preserve the spatial associations of human DLPFC cells by reconstructing the pseudo-space using de-coordinated spatial transcriptomics data. Here we further demonstrate its ability to decipher spatially variable subpopulations in the human cortex from experimental scRNA-seq data accessed from the Allen Brain Atlas (see Methods). As shown in Fig. 4a, scSpace first embedded single cells in a pseudo-space to reconstruct the spatial associations of cells. Consistent with previous results, after spatial reconstruction by scSpace, the distribution density of cells in different cortex layers was significantly different in the pseudo-space (Fig. 4b) and the normalized pairwise distances between cells and Layer 1 (L1) increased layer by layer from L1 to WM (Fig. 4c). The results showed that the pseudo-space reconstructed by scSpace has biological significance and justified the subsequent space-informed clustering based on it.

Next, by combining transcriptional and spatial information of single cells, scSpace classified human DLPFC cells into 19 refined subpopulations (Fig. 4d). Among these subclusters, there were two subclusters in Layer 6 (L6), cluster 10 (C10) and cluster 15 (C15), exhibiting diverse spatial patterns in the pseudo-space (Fig. 4e). However, this spatial heterogeneity was difficult to distinguish with traditional algorithms such as Louvain, which uses only transcriptional information and achieved a clustering ARI score of -0.004 and an NMI score of 0.004, compared with 0.725 and 0.637 for scSpace, respectively (Fig. 4f). Once again, the results confirmed that the spatial information of each cell is crucial for the characterization of its cellular identity.

Notably, as illustrated in Fig. 
4d, intratelencephalic (IT) neurons in the original dataset can be further classified into five subclusters (C3, C1, C9, C4, and C2) based on their spatial characteristics (Fig. 4g). scSpace analysis showed that IT neurons were distributed in all layers but accounted for different proportions (Fig. 4h). Moreover, the density centers of cell spatial distribution in C3, C1, C9, C4, and C2 moved from cortex L1 to WM gradually (Fig. 4i). Five genes selected from all differentially expressed marker genes for the subpopulations (Extended Data Fig. 4a), namely LAMP5 (C3), GRIK1 (C1), GABRG1 (C9), PCP4 (C4), and RXFP1 (C2), were validated by the histological staining images derived from the Allen Brain Atlas. As shown, the spatial expression patterns of these marker genes were consistent with the distribution of the corresponding subclusters and exhibited a hierarchical structure (Fig. 4j).

After comprehensive evaluations of the performance of scSpace with simulated and biological data, we sought to explore its application prospects further. We first applied scSpace to melanoma, a cancer with high spatial heterogeneity, to reconstruct the pseudo-space of immune cells in the tumor microenvironment from scRNA-seq data. Then, we wanted to further identify spatially variable immune cell subpopulations by combining both transcriptional and spatial information of each cell. Thus, a total of 2064 T cells were collected from melanoma scRNA-seq data 35. Next, scSpace was utilized to recover the spatial characteristics within the scRNA-seq data, referenced by spatial transcriptomics data derived from another experiment 36.

As shown in Fig. 5a, T cells were classified into five refined subclusters by scSpace. Furthermore, the differential expression of these cell subsets was analyzed to retrieve marker genes for each subpopulation (Fig. 
5b) were distributed in the same pseudo-space (Fig. 5c), and the normalized pairwise distances between T cells in each subpopulation and malignant cells were calculated (Fig. 5d). The pairwise distances between malignant cells and T cells of the C5, C1, C4, C2, and C3 subclusters in the pseudo-space increased gradually. Subsequently, we compared the two T cell subpopulations nearest to (C5) and farthest from (C3) malignant cells in the melanoma microenvironment by differential expression analysis (Fig. 5e and Extended Data Fig. 5a). The results showed that IL7R, JUNB, TXNIP, DUSP1, and PLAC8 were highly expressed in C3, while TK1, AURKB, BIRC5, KIFC1, and MKI67 were specifically expressed in C5 (Fig. 5f and Extended Data Fig. 5b).

Further, we calculated the T cell exhaustion score (TES) for the two subpopulations using exhaustion-related genes (Extended Data Table S3). T cell exhaustion is the loss of T cell function in patients with common chronic infections and cancer. As a result of long-term exposure to persistent antigen and inflammation, exhausted T cells gradually lose their effector function. Our results showed that cells from C5, which were closer to malignant cells, had higher TES (Fig. 5g), which is consistent with the T cell exhaustion hypothesis. We then performed a survival analysis of marker genes in the C5 subpopulation (Fig. 5h and Extended Data Fig. 5c). The results demonstrated that high expression of these marker genes in T cells was associated with a significantly reduced survival probability of patients, suggesting that these genes are potential therapeutic targets for precision medicine in the clinic. Indeed, therapeutic strategies targeting TK1 37, KIFC1 38, AURKB 39, TPX2 40, and BIRC5 41 have been reported to be potentially effective. Gene set enrichment analysis identified E2F targets, oxidative phosphorylation, and another four pathways as enriched in C5 (Fig. 
5i and Extended Data Fig. 5d
[Figure panel residue: AURKB]
Covid-19.
We further examined whether scSpace could distinguish the spatial variation of cells between normal and disease conditions. Two scRNA-seq datasets 42 with comparable expression states, from a Covid-19 sample and a normal sample, were collected to illustrate the performance of scSpace (Extended Data Fig. 6). By projecting single cells in normal and diseased tissues into the same pseudo-space with scSpace, we can compare the cell-type composition and proportion, the spatial distribution patterns, and the relative pairwise associations between cell subpopulations. This makes it possible to describe the occurrence and development of the disease more accurately and to find key targets for its treatment.

After reconstructing the spatial relationships of cells, the pseudo-space of the Control and Covid-19 groups was established (Fig. 6a). We found a higher proportion of myeloid cells in the Covid-19 group than in the Control group, and they were closer to the epithelial cells (Fig. 6b). Consistently, myeloid cells were reported to infiltrate from the blood into the airway in several patients with Covid-19 43, 44. As shown in Fig. 6c ... Covid-19 group compared with the Control group (Fig. 6d). Notably, AM was predicted by scSpace to be the closest cell type to the epithelial cells, which is consistent with the common knowledge that AM reside in the alveoli. Meanwhile, MDM/TMDM experienced the highest levels of invasion in Covid-19, consistent with previous research 44. The two cell types that changed the most, MDM and TMDM, were selected for downstream analysis based on their dramatic decrease in the Covid-19 group versus the Control group (Fig. 6e).

We thus clustered MDM and TMDM into six subpopulations by combining both spatial information and transcriptomes of each cell (Fig. 6f).
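The distance comparisons used in the melanoma analysis and in this section, where subpopulations are ranked by their proximity to a reference cell type, can be sketched as follows. This is an illustrative Python example that assumes proximity is summarized by the mean Euclidean distance between every cell of a subpopulation and every reference cell; the function name `mean_distance_to_reference` and the toy coordinates are hypothetical.

```python
import math

def mean_distance_to_reference(group_xy, reference_xy):
    """Mean Euclidean distance over all (group cell, reference cell) pairs."""
    total = sum(math.dist(g, r) for g in group_xy for r in reference_xy)
    return total / (len(group_xy) * len(reference_xy))

# Toy pseudo-space coordinates (hypothetical values).
reference = [(0.0, 0.0), (0.0, 1.0)]  # e.g. epithelial or malignant cells
subpopulations = {
    "C1": [(5.0, 0.0), (5.0, 1.0)],
    "C4": [(1.0, 0.0), (1.0, 1.0)],
    "C2": [(3.0, 0.0), (3.0, 1.0)],
}

mean_dist = {name: mean_distance_to_reference(xy, reference)
             for name, xy in subpopulations.items()}
# Subpopulations ordered from nearest to farthest from the reference cells.
ranking = sorted(mean_dist, key=mean_dist.get)
```

In this toy layout `ranking` places C4 first, mirroring the kind of nearest-subcluster statement made in the text; a real analysis would use the pseudo-space coordinates produced by scSpace and the normalization described in the Methods.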
The expression patterns of marker genes in each cell subpopulation indicated strong spatial heterogeneity in MDM and TMDM (Fig. 6g). Then, we calculated the normalized pairwise distances between epithelial cells and the different subpopulations and found that the C4 subcluster was nearest to epithelial cells (Fig. 6h). Compared with other cell subpopulations, C4 significantly overexpressed a large number of mitochondria-related genes (Fig. 6i), and subsequent gene ontology enrichment showed that these highly expressed genes were closely related to oxidative phosphorylation, proton transmembrane transport, energy metabolism, oxidative stress, etc. (Fig. 6j)

The authors declare no competing interests.

We applied the Seurat 52 R package (v4.1.0) to pre-process the scRNA-seq and spatial transcriptomics data used in scSpace. We first filtered low-quality cells and genes following the standard quality control step of scRNA-seq data analysis. Next, the raw count data were normalized using the 'NormalizeData' function with the default parameters, and 2000 (by default) highly variable genes were selected using the 'FindVariableFeatures' function with the 'vst' method.

The workflow of scSpace is shown in Fig. 1a and Extended Data Fig. 
1a, And the MMD distance can be presented as:

Uncovering an Organ's Molecular Architecture at Single-Cell Resolution by Spatially Resolved Transcriptomics
Embryo-scale, single-cell spatial transcriptomics
Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics
Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding
Highly Parallel Genome-wide Expression Profiling of Individual Cells Using
Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells
Mapping the Mouse Cell Atlas by Microwell-Seq
Three-dimensional intact-tissue sequencing of single-cell transcriptional states
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics
Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution
High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue
Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH
Microscopic examination of spatial transcriptome using Seq-Scope
High-definition spatial transcriptomics for in situ tissue profiling
Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2
Robust decomposition of cell type mixtures in spatial transcriptomics
SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes
SpatialDWLS: accurate deconvolution of spatial transcriptomic data
Spatial reconstruction of single-cell gene expression data
Resolving organoid brain region identities by mapping single-cell genomic data to reference atlases
Spatial Gene Enhancement using scRNA-seq
Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity
Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis
Identification of region-specific astrocyte subtypes at single cell resolution
A single-cell and spatially resolved atlas of human breast cancers
Refining the Molecular Framework for Pancreatic Cancer with Single-cell and Spatial Technologies
CellTalkDB: a manually curated database of ligand-receptor interactions in humans and mice
New avenues for systematically inferring cell-cell communication: through single-cell transcriptomics data
Cell-type modeling in spatial transcriptomics data elucidates spatially variable colocalization and communication between cell-types in mouse brain
Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex
Spatial Reconstruction of Single Enterocytes Uncovers Broad Zonation along the Intestinal Villus Axis
Single-cell spatial reconstruction reveals global division of labour in the mammalian liver
A Spatiotemporal Organ-Wide Gene Expression and Cell Atlas of the Developing Human Heart
Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq
Enables Dissection of Genetic Heterogeneity in Stage III Cutaneous Malignant Melanoma
Increased serum level of thymidine kinase 1 correlates with metastatic site in patients with malignant melanoma
KIFC1 promotes aerobic glycolysis in endometrial cancer cells by regulating the c-myc pathway
HI-511 overcomes melanoma drug resistance via targeting AURKB and BRAF V600E
Targeting of TRX2 by miR-330-3p in melanoma inhibits proliferation
Identification of potential therapeutic targets for melanoma using gene expression analysis
A molecular single-cell lung atlas of lethal COVID-19
Longitudinal profiling of respiratory and systemic immune responses reveals myeloid cell-driven lung inflammation in severe COVID-19
Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages
Elevated Glucose Levels Favor SARS-CoV-2 Infection and Monocyte Response through a HIF-1alpha/Glycolysis-Dependent Axis
Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region
Gene expression cartography
Identification of spatial expression trends in single-cell gene expression data
SpatialDE: identification of spatially variable genes
Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder
Integrated analysis of multimodal single-cell data
Domain adaptation via transfer component analysis
From Louvain to Leiden: guaranteeing well-connected communities
Splatter: simulation of single-cell RNA sequencing data
The spatial transcriptomic landscape of the healing mouse intestine following damage
Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches
Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
Metascape provides a biologist-oriented resource for the analysis of systems-level datasets

The solution is given by the leading eigenvectors of $(KLK + \mu I)^{-1} KHK$ (in the notation of transfer component analysis, with kernel matrix $K$, MMD coefficient matrix $L$, centering matrix $H$, identity matrix $I$, and trade-off parameter $\mu$), i.e., the latent feature representation across scRNA-seq data and spatial transcriptomics data with true biological characteristics.

Spatial reconstruction. Once the latent biological feature representation across scRNA-seq and spatial transcriptomics data is extracted, a multi-layer fully connected neural network model is applied to the spatial reconstruction of scRNA-seq data. The size of the input layer of this model is equal to the dimension of the latent biological feature representation, and the size of the output layer corresponds to the dimension of the spatial coordinates.
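The network shape described above, with an input layer sized to the latent representation and an output layer sized to the spatial coordinates, can be illustrated with a small pure-Python sketch trained with a mean squared error objective. This is not the scSpace implementation: the single hidden layer, ReLU activation, layer sizes, learning rate, plain stochastic gradient descent, and toy data are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

def init_layer(n_in, n_out, rng):
    """One fully connected layer with small random weights and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return {"W": weights, "b": [0.0] * n_out}

def forward(layers, x):
    """Return the activations of every layer; ReLU on hidden layers only."""
    acts = [list(x)]
    for i, layer in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + b
             for row, b in zip(layer["W"], layer["b"])]
        if i < len(layers) - 1:
            z = [max(0.0, v) for v in z]
        acts.append(z)
    return acts

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def sgd_step(layers, x, y, lr):
    """One stochastic gradient step on the MSE loss for a single sample."""
    acts = forward(layers, x)
    out = acts[-1]
    grad = [2.0 * (p - t) / len(out) for p, t in zip(out, y)]
    for i in reversed(range(len(layers))):
        layer, a_in = layers[i], acts[i]
        # Gradient w.r.t. this layer's input, using the weights before update.
        grad_in = [sum(layer["W"][j][k] * grad[j] for j in range(len(grad)))
                   for k in range(len(a_in))]
        for j, g in enumerate(grad):
            for k, a in enumerate(a_in):
                layer["W"][j][k] -= lr * g * a
            layer["b"][j] -= lr * g
        if i > 0:
            # a_in is the ReLU output of the previous layer.
            grad = [g if a > 0.0 else 0.0 for g, a in zip(grad_in, a_in)]

def mean_loss(layers, features, coords):
    return sum(mse(forward(layers, x)[-1], y)
               for x, y in zip(features, coords)) / len(features)

rng = random.Random(0)
latent_dim, hidden_dim, coord_dim = 3, 8, 2  # hypothetical sizes

# Toy "spatial reference": coordinates are a fixed function of the features.
ref_features = [[rng.random() for _ in range(latent_dim)] for _ in range(30)]
ref_coords = [[f[0] + 0.5 * f[1], f[2] - 0.5 * f[1]] for f in ref_features]

layers = [init_layer(latent_dim, hidden_dim, rng),
          init_layer(hidden_dim, coord_dim, rng)]
loss_before = mean_loss(layers, ref_features, ref_coords)
for _ in range(200):
    for x, y in zip(ref_features, ref_coords):
        sgd_step(layers, x, y, lr=0.02)
loss_after = mean_loss(layers, ref_features, ref_coords)

# Apply the trained model to a cell that has no measured coordinates.
pseudo_xy = forward(layers, [0.2, 0.4, 0.6])[-1]
```

The last line mirrors the intended use: a model fitted where coordinates are known is applied to cells whose coordinates are unknown, producing one pseudo-space coordinate pair per cell.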
We assume that the spatial locations of cells are related to their latent biological feature representation; that is, the spatial coordinates of cells (or spots) in the spatial transcriptomics data are modeled as a function of its latent representation. scSpace first trains the model on the spatial transcriptomics data using a mean squared error (MSE) loss function. Once training is finished, scSpace applies the model to the latent representation of the scRNA-seq data, and the spatial information of each single cell is reconstructed (which we term the 'pseudo-space').

Space-informed clustering. scSpace applies space-informed clustering to identify spatially heterogeneous single-cell subclusters based on the gene expression and pseudo-space information of cells in scRNA-seq data. In detail, a gene expression graph $G(V, E_1)$ and a space graph $G(V, E_2)$ are constructed, respectively, using the k-nearest neighbor (KNN) algorithm. Since our goal is to find spatially heterogeneous subclusters that may be similar in gene expression, the space graph $G(V, E_2)$ is then transformed into a spatial weight for each edge in the gene expression graph. Finally, scSpace applies unsupervised clustering to the spatially weighted gene expression graph using the Leiden algorithm 55.

We applied the Splatter R package (v1.16.1) 56 to simulate 50 paired scRNA-seq and spatial transcriptomics datasets with 5000 expressed genes (Extended Data Fig. 1b). To simulate the original batch effect between these two types of data, we set up 2 batches representing scRNA-seq and spatial transcriptomics data, respectively. Moreover, for the robustness of the results, we varied the number of cells from 500 to 2000 and the number of cell populations from 3 to 8.

Next, for each cell, a pseudo spatial coordinate was assigned based on a random sampling and normal distribution strategy.
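The space-informed clustering step described earlier in this section builds two KNN graphs and uses the space graph to weight the edges of the expression graph. A minimal Python sketch of that idea follows. The weighting rule used here (weight 1.0 if an expression edge is also a spatial edge, otherwise a smaller `far_weight`) is an assumption for illustration and is not the scSpace formula; the function names are hypothetical, and the final Leiden clustering on the weighted graph is left out.

```python
import math

def knn_edges(points, k):
    """Undirected k-nearest-neighbour edges as a set of (i, j) pairs, i < j."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def spatially_weighted_graph(expr, pseudo_xy, k_expr, k_space, far_weight=0.1):
    """Weight each expression-graph edge by whether it is also a spatial edge.

    The 1.0 / far_weight rule is an assumption made for illustration only.
    """
    expr_edges = knn_edges(expr, k_expr)
    space_edges = knn_edges(pseudo_xy, k_space)
    return {e: (1.0 if e in space_edges else far_weight) for e in expr_edges}

# Four cells with near-identical expression but two distinct spatial groups.
expr = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
pseudo_xy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
weights = spatially_weighted_graph(expr, pseudo_xy, k_expr=2, k_space=1)
```

In this toy case the expression graph connects all four cells, but only the edges within each spatial group keep full weight, so a community detection method run on `weights` would tend to separate the two groups even though their expression is similar.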
We randomly chose 2 to 4 cell populations and set the probability of a gene being differentially expressed in each of them to 0.01 (by contrast, 0.1 to 0.2 for the remaining cell populations). Then, we assigned these cell populations as different spatially associated subpopulations with similar transcriptomes.

We benchmarked scSpace against three classical clustering algorithms: Louvain, K-means, and hierarchical clustering. We used the adjusted Rand index (ARI) and normalized mutual information (NMI) to evaluate the performance of the clustering results. We also performed Pearson correlation analysis to evaluate the spatial reconstruction results of scSpace.

Human dorsolateral prefrontal cortex (DLPFC) data. The human DLPFC spatial transcriptomics data 31 (Visium, 10x Genomics) were downloaded from http://research.libd.org/spatialLIBd. We selected slice 151674 as the spatial transcriptomics reference, and the other 11 slices as simulated scRNA-seq data. The distances between cells were normalized with the following formula:

The differentially expressed genes between cortex layers were calculated by the function 'FindAllMarkers' of the Seurat R package (v4.1.0) with the default two-tailed Wilcoxon rank sum test.

Mouse V1 neocortex data. The mouse V1 neocortex spatial transcriptomics data (STARmap) 9 were downloaded from https://www.starmapresources.org/data. We selected two replicates with 1020 genes as the spatial transcriptomics reference and the simulated scRNA-seq data, respectively.

Mouse V1 neocortex data. The mouse V1 neocortex scRNA-seq data were downloaded from the RNA-seq data repository of the Allen Brain Atlas website (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq) and then down-sampled to 5000 cells. The mouse V1 neocortex STARmap data were utilized as spatial references.
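The two clustering metrics named in the benchmarking paragraph, ARI and NMI, can be computed from label vectors as in the following Python sketch. It uses the standard pair-counting definition of the ARI and, as an assumption, normalizes mutual information by the arithmetic mean of the two label entropies; the function names are hypothetical and the example labels are toy values.

```python
import math
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI; 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(math.comb(c, 2) for c in pairs.values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / math.comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    rows, cols = Counter(labels_true), Counter(labels_pred)
    pairs = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(c * n / (rows[a] * cols[b]))
             for (a, b), c in pairs.items())
    h_true = -sum(c / n * math.log(c / n) for c in rows.values())
    h_pred = -sum(c / n * math.log(c / n) for c in cols.values())
    if h_true + h_pred == 0:
        return 1.0
    return 2.0 * mi / (h_true + h_pred)

truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]  # same partition with different label names
mixed = [0, 1, 0, 1]       # every predicted cluster mixes both true groups

ari_same = adjusted_rand_index(truth, relabelled)
ari_mixed = adjusted_rand_index(truth, mixed)
nmi_same = normalized_mutual_information(truth, relabelled)
nmi_mixed = normalized_mutual_information(truth, mixed)
```

Both metrics ignore the names of the labels, and the ARI can fall below zero when agreement is worse than expected by chance, which is why a slightly negative ARI can appear for a method that fails to separate two groups.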
The differentially expressed genes between cortex layers were calculated using the 'FindAllMarkers' function in the Seurat R package (v4.1.0).

Mouse intestine data. The mouse intestine scRNA-seq data 32 were downloaded from the Gene Expression Omnibus (GEO, GSE109413). We also downloaded another set of spatial transcriptomics data 57 (GSE169749) as the spatial reference. The distances between cells were normalized as described before.

Mouse liver data. The mouse liver scRNA-seq data 33 and the spatial transcriptomics reference 58 were obtained from GEO (GSE84498) and the Liver Cell Atlas (http://www.livercellatlas.org), respectively. The distances between cells were normalized as described before.

Human developing heart data. The human developing heart scRNA-seq data and the spatial transcriptomics reference data sequenced by Asp et al. 34 were downloaded from https://www.spatialresearch.org. We used Louvain as the benchmark clustering method for comparison with scSpace. Statistical differences in the distance between cells of different subpopulations were assessed with the Mann-Whitney test. The differentially expressed genes between different cell subpopulations were calculated with the Seurat R package (v4.1.0).

The human cortex scRNA-seq data (SMART-Seq v4) were downloaded from the Allen Brain Atlas (https://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq) and then down-sampled to 4000 cells. The human DLPFC data (slice 151674) were used as the spatial reference. We extracted the C1, C2, C3, C4, and C9 clusters for IT subpopulation analysis. The layer proportion and the normalized distance to L1/WM of every cluster were calculated. We further investigated the specifically expressed genes of each IT subpopulation using the Seurat R package (v4.1.0).
The in situ hybridization (ISH) images from the visual cortex (LAMP5, GRIK1) or temporal cortex (GABRG1, PCP4, and RXFP1) of the adult human brain were downloaded from the Allen Human Brain Atlas: http://human.brain-map.org/.

The human melanoma scRNA-seq 35 and spatial transcriptomics data 36 were accessed from GEO (GSE72056) and https://www.spatialresearch.org, respectively. After assigning pseudo-space coordinates to every single cell, 2064 T cells were extracted for space-informed sub-clustering, and we calculated the normalized distance of every T cell subpopulation to malignant cells. We then selected subpopulation C5, closest to malignant cells, and subpopulation C3, farthest from malignant cells, for differential