key: cord-103462-z3d9lcar
authors: Wang, Shiyu; Wang, Longlong; Liu, Ya
title: CD4+ T cell subsets present stable relationships in their T cell receptor repertoires
date: 2020-11-02
journal: bioRxiv
DOI: 10.1101/2020.11.01.364224
sha:
doc_id: 103462
cord_uid: z3d9lcar

CD4+ T cells are key components of adaptive immunity. Cell differentiation equips CD4+ T cells with new functions. However, the effect of cell differentiation on the T cell receptor (TCR) repertoire has not been investigated. Here, we examined the features of the TCR beta (TCRB) repertoire of the top clones within naïve, memory and regulatory T cell (Treg) subsets: repertoire structure, gene usage, length distribution and sequence composition. First, we found that memory subsets and Treg could be discriminated from naïve cells by the features of the TCRB repertoire. Second, we found that the correlations between the features of memory subsets and naïve cells were positively related to the differentiation levels of the memory subsets. Third, we found that public clones presented a reduced proportion and a skewed sequence composition in differentiated subsets. Furthermore, we found that public clones led naïve cells to recognize a broader spectrum of antigens than other subsets. Our findings suggest that the TCRB repertoire of CD4+ T cell subsets is skewed in a differentiation-dependent manner, that variations in public clones contribute to these changes, and that the reduction of public clones during differentiation trims the antigen specificity of CD4+ T cells. The study unveils the physiological effect of memory formation and facilitates the selection of a proper CD4+ subset for cellular therapy.

[...] (TCR), CD4+ T cells recognize the complex of epitopes and major histocompatibility complex II and then induce the activation of other cells in infections [1,2], cancer [3] and autoimmune diseases. To acquire mature functions, CD4+ T cells undergo differentiation.
NT is the prototype of the CD4+ T cell and has the greatest potential among CD4+ T subsets to differentiate into other subsets. NT usually remain quiescent and can renew themselves by proliferation. When NT encounter pathogens, they home to lymphatic organs and receive help from dendritic cells to initiate polarization. Studies of the TCR repertoire suggest that NT has the most even TCR repertoire among all CD4+ subsets [4], which indicates the greatest potential to recognize antigens. In a classical differentiation model [5,6], naive (NT) [...] SARS-CoV-2 [12]. Cross-reactivity from memory cells can provide rapid protection against a novel pathogen in some individuals, as in the case reports of COVID-19 [13]. The importance of the TCR repertoire for memory cell functions was found in tissues, where the differential composition of the TCR repertoire of CD4+ memory cells among tissues equipped them with distinct functions [14]. The function of Treg was also restricted by the TCR repertoire. An optimal TCR diversity was essential for the suppressive ability [15], and limitations on TCR diversity disturbed the self-tolerance of the immune system [16]. Although evidence shows that the features of the TCR repertoire are distinct among CD4+ T subsets, the effect of differentiation on the TCR repertoire of CD4+ T cells has not been investigated.

To unveil the influence of differentiation on the TCR repertoire, we analyzed the sequencing data of the TCR beta (TCRB) chain of NT, ET, EMT, CMT, Tscm and Treg. We examined the repertoire structure, germline gene usage, sequence composition and public clones of the TCRB repertoire of each subset. We found that NT, CMT, Tscm and Treg could be discriminated from each other by repertoire structure, gene usage and sequence composition, independently.
The TCRB [...]

The TCRB repertoire structure of NT is similar to the TCRB repertoire structure of CMT and Tscm

Frequent clones affect the immune repertoire structure [24]. We therefore performed the analyses on the top1000 clones within each subset. Renyi entropy with alpha values from zero to twenty was used to evaluate diversity. In dataset1, the TCRB repertoires of NT and Tscm present similar diversities at all alpha values and are more diverse than the TCRB repertoires of CMT and Treg (Figure 1A). In dataset2, NT has the most diverse TCRB repertoire among all subsets, whereas ET has the least diverse. The TCRB repertoire of CMT is more diverse than those of EMT and Treg (Supplemental Figure 1A). The similarity of the TCRB repertoire structure of subsets was estimated by the Jensen-Shannon distance (JSD). In dataset1, the TCRB repertoire structure of NT is similar to that of the less-differentiated subsets (CMT and Tscm), but the TCRB repertoire structures of Tscm and CMT differ from each other; the TCRB repertoire of Treg differs from those of NT and CMT, with high JSDs. This indicates that Treg has a TCRB repertoire structure like that of more-differentiated memory subsets (Figure 1B). In dataset2, NT and CMT have similar TCRB repertoire structures, and the TCRB repertoire structure of Treg is similar to that of EMT rather than that of CMT (Supplemental Figure 1B). These findings fit the trend found in dataset1. To account for the overlapping usage of CDR3 clones, we further evaluated the similarity of the TCRB repertoire among subsets with the Morisita-Horn similarity index. In this analysis, the TCRB repertoire of NT remains similar to those of Tscm and CMT, but differs from those of EMT and Treg (Figure 1C; Supplemental Figure 1C).
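The three measures named above (Renyi entropy, Jensen-Shannon distance and the Morisita-Horn index) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the Jensen-Shannon distance is taken as the square root of the base-2 divergence, and the two probability vectors passed to it are assumed to be already aligned (how the study built those vectors is not described in this section).

```python
import math

def normalize(counts):
    """Convert raw clone counts to frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def renyi_entropy(freqs, alpha):
    """Renyi entropy of order alpha (natural log); alpha = 1 is the Shannon limit."""
    p = [f for f in freqs if f > 0]
    if alpha == 1:
        return -sum(f * math.log(f) for f in p)
    return math.log(sum(f ** alpha for f in p)) / (1 - alpha)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2, range 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute nothing
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def morisita_horn(x, y):
    """Morisita-Horn similarity of two {clone: count} dicts (1 = identical usage)."""
    clones = set(x) | set(y)
    total_x, total_y = sum(x.values()), sum(y.values())
    cross = sum(x.get(c, 0) * y.get(c, 0) for c in clones)
    dx = sum(v * v for v in x.values()) / total_x ** 2
    dy = sum(v * v for v in y.values()) / total_y ** 2
    return 2 * cross / ((dx + dy) * total_x * total_y)

# Diversity profile over alpha = 0..20, as in the text (toy counts, not study data)
toy_counts = [50, 20, 10, 10, 5, 5]
profile = [renyi_entropy(normalize(toy_counts), a) for a in range(0, 21)]
```

Low alpha values weight all clones almost equally, whereas high alpha values are dominated by the most frequent clones, which is why a profile over several alpha values is more informative than a single index. Unlike the Jensen-Shannon distance, the Morisita-Horn index depends on which CDR3 clones are shared.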
In conclusion, the TCRB repertoire structure of CD4+ T [...]

The clones in the entire repertoire of NT were reported to be longer than those in the repertoire of memory cells [26]. However, we found that the top clones in NT were shorter than the clones in other subsets in all datasets (Figure 4A; Supplemental Figure 6). Pearson correlations show that the length distribution of NT differs from the length distributions of CMT and Tscm. This suggests that the length distribution of the TCRB repertoire of NT is highly skewed in these less-differentiated memory cells (Figure 4B). Since the naïve cells were sorted without an antibody against CD27 in dataset2, the length distribution of NT can be affected by contamination from cell sorting. We therefore examined the length distribution in dataset1.

To determine whether gene usage affects the CDR3 length distribution, we calculated the mean length of clones for each gene. The mean length differs among clones using different V- and J-genes (Figure 4C and D); however, for all genes, clones are shorter in NT than in CMT and Tscm. Therefore, for top clones, the clones of NT are shorter than the clones of other subsets, and gene usage contributes little to the distinct length distributions among subsets.

Public clones, which are shared by individuals, were shown to differ from private clones in sequence composition [25]. Our analyses showed that public clones were shorter than private ones within the top1000 clones (Supplemental Figure 7A). This suggests that public clones may affect the features of top clones. We defined clones found in at least two individuals as public clones. We found more public clones in NT than in other subsets: 1,400 in NT, 262 in Tscm, [...] from T1D (Figure 5A; Supplemental Figure 7B). By calculating abundance, we found that public clones made up ~70% of the top1000 clones in NT and ~5% in other subsets.
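The definition of public clones (found in at least two individuals) and their share of top-clone abundance can be illustrated as follows. This is a sketch under assumptions: the function names and the toy CDR3 strings are hypothetical, the share is assumed to be abundance-weighted within each sample's top clones, and whether the study defined publicness on top clones or on whole repertoires is not stated in this section.

```python
from collections import defaultdict

def find_public_clones(repertoires, min_individuals=2):
    """repertoires: {individual_id: {cdr3: count}} for one subset.
    Returns the CDR3 sequences observed in at least min_individuals."""
    seen_in = defaultdict(set)
    for individual, clones in repertoires.items():
        for cdr3 in clones:
            seen_in[cdr3].add(individual)
    return {c for c, ids in seen_in.items() if len(ids) >= min_individuals}

def public_fraction(clones, public, top_n=1000):
    """Share of top-clone abundance carried by public clones in one sample."""
    top = sorted(clones.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(n for _, n in top)
    return sum(n for c, n in top if c in public) / total

# Toy input with made-up sequences (not data from the study)
toy = {
    "donor1": {"CASSLGQETQYF": 5, "CASSPDRGYEQYF": 3},
    "donor2": {"CASSLGQETQYF": 2, "CASRTGNTEAFF": 8},
}
public = find_public_clones(toy)
shares = {donor: public_fraction(clones, public) for donor, clones in toy.items()}
```

Weighting by abundance rather than counting distinct sequences matters here: a subset can contain few public sequences that nevertheless dominate the top clones, or many that are all rare.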
The high abundance of public clones in NT suggests that they have a larger effect on the repertoire of top clones than the public clones in other subsets (Figure 5B). Most public clones in NT were rarely present in other subsets (Figure 5C), and about 50% of the public clones in each subset could be found in NT. This result indicates that public clones from NT are less maintained among the top clones of other subsets. To detect the differences between public clones and private clones within each subset, a support vector machine (SVM) was used. To avoid the influence of different sample sizes, we randomly down-sampled 400 public clones and 400 private clones for each subset. The prediction was repeated 100 times. The prediction accuracy (BACC) in NT was lower than the BACC in other subsets (Figure 5D). This suggests that the differences between private and public clones are larger in differentiated subsets than in naïve cells. Further analyses showed that the gene usage of public clones is similar to the gene usage of all top1000 clones (Supplemental Figure 7C), which suggests that gene usage is not skewed in public clones.

To identify whether public clones or private clones account for the increased difference in memory and Treg, we used SVM to discriminate public clones, and separately private clones, from different subsets [25]. For public clones, the BACC ranged from 50% to 60%; NT could be discriminated from Treg, CMT and Tscm with ~55% BACC, whereas CMT could not be separated from Tscm (~50% BACC) (Figure 6A). For private clones, the BACC ranged from 50% to 70%. We achieved a high prediction accuracy when discriminating private clones of NT from private clones of Treg, but we failed to separate the private clones of CMT and Tscm (Figure 6B).
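The evaluation protocol described above (equal-sized down-sampling of the two classes, repeated many times and scored by BACC) can be sketched in plain Python. Several things here are assumptions rather than the authors' method: BACC is read as balanced accuracy (the mean of per-class recall), the classifier is a simple nearest-centroid rule on k-mer counts used as a stand-in for the SVM, the half-and-half train/test split is arbitrary, and all function names are hypothetical.

```python
import random
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping k-mers in one CDR3 amino-acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so both classes weigh equally."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def centroid(seqs, k):
    """Average k-mer profile of a group of sequences."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    return {kmer: n / len(seqs) for kmer, n in total.items()}

def nearest(seq, centroids, k):
    """Label of the centroid closest (squared Euclidean) to the k-mer profile of seq."""
    profile = kmer_counts(seq, k)

    def dist(c):
        keys = set(profile) | set(c)
        return sum((profile.get(x, 0) - c.get(x, 0)) ** 2 for x in keys)

    return min(centroids, key=lambda label: dist(centroids[label]))

def repeated_bacc(public, private, n=400, repeats=100, k=3, seed=0):
    """Down-sample n clones per class, train on half, test on the rest, repeat."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        pub, pri = rng.sample(public, n), rng.sample(private, n)
        half = n // 2
        centroids = {"public": centroid(pub[:half], k),
                     "private": centroid(pri[:half], k)}
        held_out = ([(s, "public") for s in pub[half:]]
                    + [(s, "private") for s in pri[half:]])
        y_true = [label for _, label in held_out]
        y_pred = [nearest(s, centroids, k) for s, _ in held_out]
        scores.append(balanced_accuracy(y_true, y_pred))
    return sum(scores) / len(scores)
```

With two equally sized classes, a BACC near 50% means the two groups cannot be told apart from sequence composition, which is how the ~50% values reported for CMT versus Tscm should be read. Fixing the random seed keeps the repeated down-sampling reproducible.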
When we increased the sample size of private clones from 400 to 2500 for training the SVM model, the differences in BACC for discriminating private clones from different subsets persisted (Supplemental Figure 8A). These results suggest that the sequence compositions of public clones and private clones are both skewed in [...]

The reduced number of public clones narrows the antigen spectrum recognized by [...]

To unveil the functions of clones among subsets, we annotated clones with VDJdb [27]. This database records 1885 clones of CD4+ T cells targeting eight epitopes in total. Compared with CMT, Tscm and Treg, NT has more clones recognizing antigens from influenza (HA, H1 and NP), pp65 from cytomegalovirus (CMV), CFP10 from M. tuberculosis and gliadin from Triticum aestivum (Figure 6C; Supplemental Figure 8B). To estimate the spectrum of antigens targeted by the top clones, we used GLIPH2 [23] to predict the clusters recognizing diverse antigens for each subset. With a stringent filter (see Methods), we found 806 clusters in NT from HC and 836 clusters in NT from T1D, whereas fewer than 200 clusters were found in the whole of Tscm, CMT and Treg (Figure 6D). When public clones were removed from the top clones, only 14.88% of clusters remained in NT of HC and 13.25% in NT of T1D, whereas over 45% of clusters remained in other subsets (Figure 6E). This suggests that public clones enlarge the antigen spectrum recognized by top clones in NT. In conclusion, NT recognizes a broader antigen profile, to which public clones contribute.

It is essential for CD4+ T cells to recognize antigens with the TCR, which is primarily achieved by the CDR3 region. CD4+ T cells can acquire new functions via differentiation; however, it is unclear how differentiation affects their TCR repertoire.
We examined the relationships among the TCRB repertoires of the top1000 clones of naïve, memory and Treg subsets (including NT, ET, CMT, EMT, Tscm and Treg) by estimating the repertoire structure, the germline gene usage, the sequence composition (k-mer) and the public CDR3 clone usage.

We infer that the TRBV repertoire features of memory subsets are tightly regulated in differentiation. We observed that the usage of 23 of 72 genes increased or decreased in the order NT, Tscm, CMT and EMT. This indicates that a mechanism exists to regulate the variations across subsets. Furthermore, since Tscm is the least differentiated and EMT the most differentiated of the three memory subsets [28], this mechanism appears to follow the differentiation level. CMT has traditionally been considered the primary memory subset into which ET preferentially differentiate, with part of CMT then differentiating into EMT. In the past decade, Tscm has been found to mix the phenotypes of naïve and memory cells. Tscm is able to self-renew and replenish more differentiated subsets of memory T cells, and therefore acts as the key intermediary in the generation of memory [29,30]. Taken together, the differentiation levels of memory subsets reflect their differentiation order. However, memory cells can be directly generated from naïve cells by asymmetric cell division [31,32]. This indicates that the differentiation order should not be the only factor skewing TCRB [...] antigens. It implies that, for newborns, clones driven by food, self-antigens and even cytokines compose a large part of the TCRB repertoire of memory. Since the highly frequent clones in NT are self-antigen related, the features of frequent clones in NT will be passed on to memory during this period [35]. Furthermore, as shown by Graeme et al., half of memory is maintained by self-[...] EMT maintain the features of the TCRB repertoire inherited from NT in early life.
In conclusion, it is reasonable to infer that events in early life, genetic factors and differentiation order regulate the TCRB repertoire of CD4+ T subsets at different differentiation levels.

Public clones are key components that affect the features of the TCRB repertoire in differentiation. First, we found that public clone usage, rather than gene usage, shortens the length distribution of top clones within NT. Second, the sequence composition of public clones, which is skewed in differentiated subsets, contributes to the variations of the TCRB repertoire in differentiation. Third, the decrease in public clones induces a reduction in the antigen spectrum recognized by memory and Treg subsets. These results suggest that the skewed public clone usage strongly affects top clones in differentiated subsets. Furthermore, we showed that the factors affecting the generation of public clones in memory and Treg are different from those in NT. The generation of public clones was largely attributed to genetic factors and thymic positive selection in a previous study [37]. In our study, public clones from NT are less maintained in differentiated subsets, and SVM analyses indicate that sequence composition in memory [...] It suggests that the difference between public clones and private clones is enlarged in the differentiated subsets. When we performed SVM on public clones and private clones among subsets separately, the sequence compositions of both public clones and private clones were skewed in differentiation.

A small part of peripheral Treg differentiates from conventional T cells. As shown by Golding A. et al., the repertoires of Foxp3+ and Foxp3- cells did not overlap [38]. Although peripheral Tregs are differentiated from conventional T cells [39] and can introduce the features of NT into Treg, the TCR repertoire of effector and memory subsets is more similar to that of NT than to that of Treg.
This phenomenon suggests that the influx from naïve cells composes only a minor part of Treg in blood and that, compared with Treg, the features of naïve cells are maintained in effector and memory subsets during differentiation. Our study includes samples from three health states (healthy, RA and T1D individuals), and therefore highlights that our findings are consistent across health conditions and datasets.

TNF-alpha/IFN-gamma profile of HBV-specific CD4 T cells is associated with liver damage and viral clearance in chronic HBV infection
The roles of resident, central and effector memory CD4 T-cells in protective immunity following infection or vaccination
CD4(+) T Cell Help Is Required for the Formation of a Cytolytic [...]
The naive T-cell receptor repertoire has an extremely broad distribution of clone sizes
Memory T cell subsets, migration patterns, and tissue residence
Effector and memory T-cell differentiation: implications for vaccine development
Development and Function of Protective and Pathologic [...]
Diversity and clonal selection in the human T-cell repertoire
Regulatory T Cells Suppress Effector T Cell Proliferation by Limiting [...]
T cell receptor beta-chains display abnormal shortening and repertoire sharing in type 1 diabetes
Comprehensive TCR repertoire analysis of CD4(+) T-cell subsets in rheumatoid arthritis
IMonitor: A Robust Pipeline for TCR and BCR Repertoire Analysis
A model-based approach to comparative analysis of the clone size distribution of the T cell receptor repertoire
Philentropy: Information Theory and Distance Quantification with R
KeBABS: an R package for kernel-based analysis of biological sequences
Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening
Large-scale network analysis reveals the sequence space architecture of antibody repertoires
Learning the High-Dimensional Immunogenomic Features That Predict [...]
T-cell receptor repertoires share a restricted set of public and abundant CDR3 sequences that are associated with self-related immunity
Memory CD4 T cell subsets are kinetically heterogeneous and replenished from naive T cells at high levels
Crossreactive public TCR sequences undergo positive selection in the human thymic repertoire
Deep sequencing of the TCR-beta repertoire of human forkhead box protein 3 (FoxP3)(+) and FoxP3(-) T cells suggests that they are completely distinct and non-overlapping
The mechanisms shaping the repertoire of CD4(+) Foxp3(+) [...]