key: cord-0266831-c2q3i3xe authors: Medrano-Trochez, Camila; Chatterjee, Paramita; Pradhan, Pallab; Ogle, Molly E; Botchwey, Edward A; Kurtzberg, Joanne; Yeago, Carolyn; Gibson, Greg; Roy, Krishnendu title: Single-Cell RNAseq of Out-of-Thaw Mesenchymal Stromal Cells Shows Striking Tissue-of-Origin Differences and Inter-donor Cell-Cycle Variations date: 2020-09-11 journal: bioRxiv DOI: 10.1101/2020.09.10.290155 sha: 8466d5f4d82ba56970c2ef30ae80acfbaf2fe420 doc_id: 266831 cord_uid: c2q3i3xe Mesenchymal stromal cells (MSCs) from a variety of tissue sources are widely investigated in clinical trials, and the MSCs are often administered immediately after thawing the cryopreserved product. While previous reports have examined the transcriptome of freshly-cultured MSCs from some tissues, little is known about the single-cell transcriptomic profiles of out-of-thaw MSCs from different tissue sources. Such understanding could help determine which tissue origins and delivery methods are best suited for specific indications. Here, we characterized cryopreserved MSCs, immediately post-thaw, from bone marrow (BM) and cord tissue (CT), using single-cell RNA sequencing (scRNA-seq). We show that out-of-thaw BM-vs. CT-MSCs have significant differences in gene expression. Gene-set enrichment analyses implied divergent functional potential. In addition, we show that MSC-batches can vary significantly in cell cycle status, suggesting different proliferative vs. immunomodulatory potentials. Our results provide a comprehensive single-cell transcriptomic landscape of clinically and industrially relevant MSC products. Highlights Single cell gene expression comparison between Bone-marrow derived MSCs and Cord-tissue derived MSCs Donor effects and cell heterogeneity on tissue-specific MSC gene expression Single Cell Pooling Enhances Differential Expression Analysis for Bone marrow and Cord tissue MSC samples Gene ontology reveals tissue specific unique molecular function and pathways 8 the MSC-identity markers established by the ISCT, namely NT5E (CD73), THY1 (CD90) and ENG (CD105) were detected in the majority of cells in the bone marrow and umbilical cord tissue derived MSCs ( Figure S3 ). Furthermore, CD34, CD14, CD19 and PECAM1 -all markers of hematopoietic or lymphoid lineages, were absent. Projecting each cell against the first two Principal Components (PC) of gene expression, three clusters of single cell profiles were observed ( Figure 1A ). PC1 is highly negatively correlated with the total UMI count per cell. Accordingly, the smallest cluster located to the right, consists of 17% of the cells all of which had low UMI counts, typically fewer than 1,000 detected genes, and low ribosomal protein transcript counts ( Figure 1B) . These low UMI counts cells were more prevalent in two donors studiedone from each of the two laboratories (Lab A and Lab B; BM1 and BM4, respectively: Figure 1C ), suggesting the low UMI count may not be related to the lab in which they were manufactured. It is not clear whether the unusual profile of these cells is a technical artefact, or has a biological basis, but they appear to be of low quality and were excluded from all subsequent analyses. Two clusters identified by SC3 in the remaining high-quality datasets largely differentiate along PC3 ( Figure 1D ). Three of the five samples expanded in laboratory B (BM3, BM4 and BM6) were predominantly found in cluster BM-High_b; the other two samples along with both of the samples expanded in laboratory A (BM1, BM2a, BM2b and BM5) were predominantly found in cluster BM-High_a ( Figure 1E ). Both samples from the donor whose cells were cultured in each of the laboratories are in cluster BM-High a (BM2a and BM2b), suggesting that the difference is more likely to be donor-related than due to a laboratory or technical effect. Nevertheless, Figure 1E shows that even between the two laboratories, the cells from this donor tend to separate along PC3. Although the above analysis follows current standard practice, it is nevertheless subject to the caveat that for a large percentage of the genes, transcripts are only observed in fewer than half of the cells, and consequently the differential expression analysis is mostly based on presence-versus-absence. In order to lend robustness to the conclusions, we elected to analyze differential expression with an alternative strategy that accounts for the impact of a high proportion of drop-outs in scRNA-seq. Our method, "scPool", is based on pooling of cells of the same type within samples, which ensures that a high proportion of genes are represented by an approximately normal count distribution. It also allows for fitting of normalization strategies initially developed for bulk microarray or RNA-seq analysis. After normalizing gene expression values to counts per 10,000 UMI, differential expression analysis between clusters BM-High_a and BM-High_b was performed in EdgeR with donor as a random covariate, yielding 4,624 genes at a FDR of 5%. There were 2,230 genes upregulated in cluster 2a, and 2,394 upregulated in cluster 2b ( Figure 2A ). Gene ontology analysis detected strong enrichment for multiple pathways involved in cell cycle regulation in cluster BM-High_b ( Figure 2B ). By contrast, cluster BM-High_a showed upregulation of multiple pathways related to immune signaling and other processes expected to be characteristic of functional MSCs ( Figure 2B ). On the basis of the cell cycle gene expression, cells in cluster BM-High_b may be preparing for or undergoing cell cycle division, whereas the cluster BM-High_a MSCs may be more likely to be in G0 phase. Alternately the two populations may simply express cell cycle related genes at different levels, without this reflecting cell cycle stages. To implement the scPool strategy, we created random pools of pseudo-cells consisting of the sum of raw read counts for random draws of 20 cells (excluding the low-quality cluster 1 cells) within a donor's 10 sample. The number of such pseudo cells ranged from 6 for sample BM5 to 20 for sample BM4. We then retained all genes with at least one read in 90% of the pseudo cells, a total of 6,422 genes. This dataset was normalized using the supervised normalization of microarrays (SNM) protocol (27) with cluster sub-type as the biological variable, adjusting for donor effects, and analysis of variance was used to detect differentially expressed genes. The procedure was repeated ten times, and the fold change and p-values were averaged to generate a robust list of cluster-specific genes. Compared with the single cell analysis, 1,290 more significant differentially expressed genes were detected. Figure 2C shows generally higher significance than the single cell-level analysis, without over-estimation of fold-changes for a large fraction of the less-significant genes. Next, we probed the magnitude of donor contributions to variance within clusters, by performing analysis of variance with donor as the fixed effect of interest. Within just the high-quality cluster BM-High_a cells, donor effects accounted for 8.5% of the variance. A similar result was observed for cluster BM-High_b. Umbilical cord tissue derived MSC (UCT-MSC) samples from four donors were used for scRNA-seq analyses. A total of six scRNA-seq samples were prepared, all from the same lab, including three biological replicates of one donor sample (UCT1a, UCT1b and UCT1c). An average of 349 cells were profiled per sample, with an average read depth of 27,417, representing 9,700 UMI and 3,057 expressed genes per cell ( Table 2 ). The UCT-MSC gene expression profiles were also analyzed with the SC3 pipeline. Two major clusters of single cell profiles were again observed in the projection of the first two principal components of the UCT-MSC data ( Figure 3A ). The smallest of these, consisting of 13% of the cells, was 11 characterized by cells with low UMI counts, typically fewer than 2,000 detected genes ( Figure 3B ), similar to the BM-MCS analysis. These low UMI-count cells were present in every sample but again were more prevalent in two of the samples (UCT1c and UCT3: Figure 3C ). It is not clear whether the origin of these cells is a technical artefact, or has a biological basis, but they also appear to be of low quality and were again excluded from all subsequent analyses. Within the high-quality cells, SC3 once more identified two clusters of cells, UCT-High_a (UCT2 and UCT3) and UCT-High_b (UCT1a, UCT1b, UCT1c and UCT4), though in this case they did not clearly correspond to one of the Principal Components. The three replicates of donor UCT1 were primarily captured within the UCT-High_b cluster, suggesting consistency of technique. Implementation of the scPool strategy for detecting differential gene expression between the two UCT-MSC clusters, after removing the low-quality cells, detected 2,526 genes at an FDR of 5%. Directional up- Direct comparison of MSCs derived from bone marrow and MSCs from umbilical cord tissue was performed by combining the analyses of the previous two datasets. As expected, Principal Component Analysis (PCA) of the raw single cell profiles again identified two major clusters of low and high UMI-abundance cells along PC1, but in this case PC2 of the joint analysis cleanly differentiates the BM and CT samples ( Figure 5A ). This result implies that there are significant differences in gene profile between MSCs derived from the two source tissues. We implemented scPool to identify differentially expressed genes between MSCs derived from the two sources, after removing the low-quality cells identified in our first series of analysis. Pools of 20 cells were again assembled computationally within donor and mixed model analysis of variance was performed, with donor as a random effect. 3,437 genes were found to be up-regulated in the BM-MSC, and 3,250 genes down-regulated, compared to UCT-MSCs. We highlighted the top 16 differentially expressed genes ( Figure S4 ). Gene ontology analysis of all the significantly differential expressed genes indicates enrichment for metabolism of lipids and lipoproteins, cholesterol biosynthesis, mitochondrial translation, and metabolic pathways in the BM-MSCs, whereas the UCT-MSCs were enriched for ECM organization, collagen biosynthesis and signal transduction ( Figure 5B ). UCT-MSC also showed relative upregulation of mitotic cell cycle pathways, but this likely reflects the greater ratio of UCT-High_b to UCT-High_a cluster cells than of BM-High_b to BM-High_a cells, rather than a consistent trend favoring cell division in the UCT-MSC. Pathway enrichment analysis also showed that the differences between the two clusters in each tissue are not consistently maintained. The chord diagram ( Figure 5C ) highlights pathways overexpressed in BM-High_a and UCT-High_a, which are not the same. These data confirm differences in the gene expression of non-dividing cells as a function of tissue of origin and suggest that the two types of MSCs are likely to have divergent regulatory and functional potentials. Importantly, the UCT-High_a population exhibit higher expression of genes involving pro inflammatory mediation, ECM organization and collagen biosynthesis, whereas the BM-High_a population had higher expression of steroid biosynthesis, the citric acid cycle and neutrophil degranulation genes ( Figure 5C ). 13 Next, we examined the expression level of genes that play important roles in the immunomodulatory response induced by MSCs. Focused comparison of expression of genes that are associated with cell adhesion, migration, immunosuppression and immunostimulation between the BM-and UCT-derived MSCs suggests tissue-of-origin and donor differences in gene activity ( Figure S5 We also looked at multiple pluripotent and stemness genes ( Figure S6 ), none of which were found to be significantly differentially expressed between bone marrow and umbilical cord tissue. MSCs from various tissue sources are the subject of 4044 registered clinical trials (ClinicalTrials.govkeyword: MSC as other terms with filters -not yet recruiting, recruiting, enrolling by invitations, and active, not recruiting; search date -August 7, 2020). It is thus important to develop robust highthroughput approaches for characterization of diverse batches from various tissue sources in order to help evaluate reasons for success or failure of individual trials or patient responses. Single cell RNA sequencing is a relatively unbiased approach to profile the molecular attributes of individual cells. Potential utility of scRNA-seq includes characterization of heterogeneity that cannot be observed with bulk RNA-seq, and monitoring of the effect of the stage of the cell-cycle on transcriptional diversity. Contrasting pre-freeze and post-thaw samples from 2 donors, we identified numerous differentially expressed genes that are associated with different types of cellular functions, such as cytokine signaling, cell proliferation, cell adhesion, cholesterol/steroid biosynthesis, and regulation of apoptosis. Previuosly, functional differences between pre-freeze (fresh) and post-thaw MSCs were also reported by others (28). In this study, however, we focused on in-depth scRNA-seq analysis of post-thaw MSCs as they are currently being tested as cell therapy products in many clinical trials. Here we describe a droplet-based scRNA-seq comparison of donor, tissue-of-origin, and expansion conditions of out-of-thaw MSC variability, concluding that bone marrow and umbilical cord tissuederived MSCs have significant differential expression that likely explains some of the documented differences between them, and that donor differences are modest yet significant. To our knowledge, six other scRNA-seq studies (11, 12, (20) (21) (22) (23) On the other hand, it appears that putative G0 cells that do not express cell cycle genes have quite different transcriptional properties that are directly relevant to their biological functions such as immunomodulatory potential. We note that each of our samples was profiled at population doubling level (PDL) ranged from 12-15, eliminating passage number as a source of variability in our study. Other authors have also chosen to regress out "batch" effects before searching for heterogeneity, even though in each case "batch" appears to be coincident with "donor" (8, 21) or "Passage" (20). In the absence of biological replication, that is, two MSC preparations obtained independently from the same donor, it is impossible to know whether differences between sample populations have a biological or technical basis. Nevertheless, we estimate from principal component variance analysis that less than 10% of the overall expression variability is among donors/batches within each of the two clusters observed in both the BM-and UCT-MSC datasets ( Figure 1C , Figure 3C ). We see this minor source of variability is donor-related in the two instances where we had technical replicates from the same donor (in the case of the two BM-MSC samples cultured in different laboratories). The cells strongly tended to be assigned to the same sub-cluster BM-High_a or UCT-High_a. Whether or not these differences impact MSC function in clinical applications remains to be seen, additional large-scale comparisons with a large set of samples with high quality data on patient outcomes will need to be analyzed. In this study the transcriptomes of human bone marrow and cord tissue-derived MSCs were analyzed via drop-seq single cell RNA-seq. Using this approach, new information about MSCs emerges. First, the differences between bone marrow-derived MSCs and cord-tissue derived MSCs were seen. Surprisingly, pathways up-regulated in G0 bone marrow-derived MSCs did not correspond to the same pathways upregulated in G0 cord tissue -derived MSC ( Figure 5C ). Further, we observed differences in various immune regulatory genes between bone marrow and cord tissue MSCs, especially for the "a" cluster cells ( Figure 5C ). Notably, BM-High_a MSCs had higher gene expression for PTGES2, and the protein encoded by this gene is known to have direct or indirect role in immunomodulation by MSCs (29, 30). PTGES2 encodes membrane-bound prostaglandin synthase E2 which converts prostaglandin H2 (PGH2) to prostaglandin E2 (PGE2) that is known to have anti-inflammatory/immunosuppressive effects on various immune cells, including macrophages, T cells and B cells (31, 32). MSC surface proteins are important for their significant roles in identification and functions (33). When we compared gene expression for surface markers that are known to have some immunomodulatory functions, BM derived MSCs showed higher expression for CD46, CD47, and CD276 whereas UCT derived MSC had higher expression for CD81. Surface expression of CD46 protein helps MSCs to inhibit complement binding and complement-mediated lysis (34). CD47 serves as a "don't eat me" signal to avoid phagocytosis by engaging its cognate ligand signal-regulatory-protein alpha (SIRP alpha) on phagocytes (35, 36), and the interaction of CD47 with SIRP alpha is reported to inhibit antigen presenting cell (APC) maturation and enhance STAT3 phosphorylation and IL10 induction in APC (37). CD276 is known to cause immune suppression by inhibiting T cell function and is currently being Figure S6 ), though we note that abundance of these transcripts was very low which reduces power to observe differential expression. In summary, this study both confirms the potential for functional differences to exist between MSCs derived from different tissues and even donors, and that within-sample heterogeneity is low. The expression of cell cycle markers is a major component of heterogeneity among donors, and manufacturing processes may need to accommodate biological and technical influences on proliferative potential. These findings will help improve the therapeutic MSC manufacturing processes and identify the most efficient cells from a heterogeneous MSC population. Even though differences in the gene expression profile between bone marrow and cord tissue G0 MSC were found, further studies are Experimental Procedures: This study was approved by the ethics committee of the institutional review boards at Georgia Institute of Technology and Duke University (IRB protocol no. H17348). All procedures involving human participants were in accordance with the ethical standards of the research committee. Informed consent was obtained from all participants. Seven frozen human bone marrow-derived MSC lots from six male donors were purchased from RoosterBio Inc., and expanded using RoosterBio expansion protocol (https://www.roosterbio.com/wpcontent/uploads/2019/10/A.-RoosterBio-MSC-001-BOM-Expansion-Protocol-IF-08022016.pdf). Briefly, a BM-hMSC high performance media kit was brought to room temperature. Then 1 vial of Media Booster GTX (RoosterBio, catalog no. SU-003) was added to 500ml hMSC high performance basal media (RoosterBio, catalog no. SU-005). The 10million BM-hMSC vial was obtained from a liquid nitrogen dewar and immediately thawed at 37 0 C for approximate 2 minutes while monitoring the process and removed from the water bath once a small bit of ice remained. Cells were aseptically transferred to a 15ml centrifuge tube and 10ml cultured media was added. The cells were spun down at 200g for 10 minutes and all the supernatant was discarded. The cell pellet was re-suspended in 10ml of culture media and transferred into 500ml media bottle. The cells were mixed by capping and gently inverting the bottle and distributed (seeded at 3500-4000 cells/cm 2 and 42 mL media/T225) equally into T-225 vessels (Corning cat no. 431082). The vessels were transferred to a 37 0 C incubator ensuring that the surfaces were covered with media. The vessels were observed microscopically from day 1 to determine percentage confluency. Once they reached >80% confluency, they were harvested the next day, and 20 cryo-preserved in Cryostor CS-10 freezing media. All single-cell RNA-sequencing was performed on the out of thaw cells directly from these frozen vials. Samples BM1 and BM2a were cultured in Laboratory A, while samples BM2b, BM3, BM4, BM5 and BM6 were cultured in Laboratory B. Samples BM2a and BM2b were from the same donor. For cord tissue derived samples, six frozen human MSC samples from four male donors were provided by the department of pediatrics, Duke University. Cryopreserved P0 vials were placed in a sterile bag which was itself placed in a 37°C water bath. Vials were thawed until the cell suspension was slushy (~2 minutes). Cell suspension from the vials were transferred to a 15 mL tube containing XSFM (Irvine Scientific, cat. no. 91149) using a sterile serological pipette and the cell suspension was mixed slowly. The cryovial was rinsed with 0.5 mL of XSFM and the rinse was transferred to the 15 mL tube. After mixing slowly, the cell count and viability was measured. The cells were mixed in the 15 mL conical using a sterile pipette and transfer the volume containing 3.4 x 10 6 cells into a HYPER flask containing 1.12 L of XSFM and the bottle was mixed gently. The HYPER Flasks were placed into a 37°C/5% CO2 incubator. The P1 cells were harvested after 5-7 days. The P2 cells were then frozen using CS-10 freezing media and cryopreserved. The Cryopreserved P2 cells were shipped to us for the downstream characterizations. All the single-cell RNA-sequencing was performed on the out of thaw cells directly from these frozen vials. Samples UCT1a, UCT1b and UCT1c come from the same donor. Frozen vials containing 1 million MSC were thawed in a 37 0 C water bath for a couple of minutes. Cells were then aseptically transferred to a 15 ml centrifuge tube. Room temperature RPMI media (1mL) was used to rinse the cell vial and added to the cells in the 15 ml tube. Another 3 ml of media was added to 21 the cells and mixed well with serological pipette. Cells were counted using Nucleocounter and spun down at 200 g for 10 minutes. The cells were re-suspended in media and counted again and processed for scRNA-sequencing. The Illumina-Bio-Rad ddSEQ platform was used to process, capture, and barcode the cells to generate single-cell Gel Beads by following the manufacturer's protocol. Cell suspensions were loaded onto a ddSEQ Cartridge along with reverse transcription master mix, and encapsulated and barcoded by the Single-Cell Isolator. Lysis and barcoding took place in each droplet. Droplets were disrupted and cDNA was pooled for second strand synthesis. Libraries were generated with direct cDNA tagmentation using Nextera technology. Tagmentation was followed by 3′ enrichment and sample indexing to prepare indexed, sequencing-ready libraries. The libraries were sequenced using Nextseq sequencing Platform on an Ilumina NextSeq in the IBB Molecular Evolution core at Georgia Tech (PE75, mid-output V2.5 kit). The library quality (check for primer dimers, adopter dimers, ethanol contamination, degradation as well as the size and concentration) was confirmed before each sequencing run using Agilent Bioanalyzer2100. All the BM-MSC samples have 2415 cells with an average 3,835 genes/cell (Table 1) Confirmation that MSCs were relatively pure populations of undifferentiated cells was revealed by FACS analysis of cell surface markers provided by the manufacturer as a release criteria. Furthermore, scRNAseq (which is less sensitive due to high drop-out rates) confirmed prevalent expression of NT5E (CD73), THY1 (CD90), and ENG (CD105) and absent expression of CD34 among other genes (Fig. 6) . ENG was expressed on 66% of the cell, NT5E on 72%, and THY1 on 92%. In contrast, each of the transcripts PTPRC, CD34, CD14, ITGAM, CD79A, CD19, and HLA-DRA were detected in less than 0.5% of the cells. Sample demultiplexing and gene counts were extracted using the Illumina Sure cell pipeline. The raw reads were trimmed, and the gene-barcode matrix was generated. Sure cell was also used to filter and align the samples and to generate gene-cell UMI count matrices. The seven samples from bone marrow were sequenced in three different batches, and the six samples from cord tissue were sequenced in two batches. Downstream analysis was initiated with SC3 software (40) for cell clustering. Owing to the high proportion of zero counts in most cells (as is typical of dropseq data), we elected to perform differential expression analysis on pools of pseudo-cells whose profiles have a distribution of read counts that closely resembles that of bulk RNAseq. Custom R scripts were used to generate pools of pseudo-cells by pooling groups of 20 cells within each sample and cluster, and then summing their gene count. The gene expression values from the pseudo cells were normalized to counts per million before using EdgeR (41) for normalization and differential expression estimation. Default likelihood ratio tests assuming negative binomial distributions were performed in EdgeR to evaluate the significance of differential expression. Ten permutations of this procedure were performed, and the average differential expression and negative-log P-values were computed, and genes significant with a FDR less than 5% were selected for downstream gene ontology analysis. Then gene ontology was performed on the differentially expressed genes using GSEA (42, 43) and ToppGene (44) tools. 32 Tables: Table 1 Mesenchymal stem cells: Cell therapy and regeneration potential Mesenchymal Stromal Cells: Clinical Challenges and Therapeutic Opportunities Engineering Stem and Stromal Cell Therapies for Musculoskeletal Tissue Repair Mesenchymal stem cell 3D encapsulation technologies for biomimetic microenvironment in tissue regeneration Mesenchymal stem cells in regenerative medicine: Focus on articular cartilage and intervertebral disc regeneration Challenges in Clinical Development of Mesenchymal Stromal/Stem Cells: Concise Review Minimal criteria for defining multipotent mesenchymal stromal cells. The International Society for Cellular Therapy position statement Single-cell RNA-seq of cultured human adipose-derived mesenchymal stem cells Automated microscopy as a quantitative method to measure differences in adipogenic differentiation in preparations of human mesenchymal stromal cells Establishing criteria for human mesenchymal stem cell potency Highly Parallel Genomewide Expression Profiling of Individual Cells Using Nanoliter Droplets Massively parallel digital transcriptional profiling of single cells Biochemical heterogeneity of mesenchymal stem cell populations: clues to their therapeutic efficacy Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation Single-cell profiling of human megakaryocyte-erythroid progenitors identifies distinct megakaryocyte and erythroid differentiation pathways Human haematopoietic stem cell lineage commitment is a continuous process Single-cell RNA sequencing to explore immune cell heterogeneity Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors