key: cord-0260097-ni1it38q authors: Omer, Aviv; Peres, Ayelet; Rodriguez, Oscar L; Watson, Corey T; Lees, William; Polak, Pazit; Collins, Andrew M; Yaari, Gur title: T Cell Receptor Beta (TRB) Germline Variability is Revealed by Inference From Repertoire Data date: 2021-05-17 journal: bioRxiv DOI: 10.1101/2021.05.17.444409 sha: 0b0590621d1b2bdab6495bdffa3e053600fff0cc doc_id: 260097 cord_uid: ni1it38q

T and B cell repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of T and B cell receptors (TCRs and BCRs, respectively) are challenging to decipher due to repetitive elements and undocumented structural variants. To confront this challenge, AIRR-seq-based methods have been developed recently for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. Applying these methods to AIRR-seq data reveals a plethora of undocumented genomic variations. However, this approach relies on complete coverage of the receptors’ variable regions, and most T cell studies sequence only a small fraction of the variable region. Here, we adapted BCR inference methods to full and partial TCR sequences, and identified 38 undocumented polymorphisms in TRBV, 15 of which were also observed in genomic data assemblies. Further, we identified 31 undocumented 5’ UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found the two documented TRBD2 alleles to be equally abundant in the population, and show that the single nucleotide that differentiates them is strongly associated with dramatic changes in the expressed repertoire.
Our findings expand the knowledge of genomic variation in the TRB (T Cell Receptor Beta) locus and provide a basis for annotation of TCR repertoires for future basic and clinical studies.

SNP within a stretch of four or more identical nucleotides were considered, as such a stretch is more likely to arise from a sequencing read artefact.

We applied the above approach to four data-sets, spanning different sequencing protocols and scales (see methods). From DS1, which includes 28 individuals sequenced in full length, we inferred 18 undocumented alleles (Fig. 2a).
Four of the 18 alleles were only seen at very low levels (less than 20% of all identified alleles of those genes) and showed a uni-modal distribution. This did not follow our expected usage distribution (these alleles are marked red in Fig. 2a). These candidate "alleles" therefore potentially result from sequencing errors. Two more undocumented alleles were considered erroneous due to adjacent nucleotide stretches. After discarding these six allele candidates, we were left with 12 undocumented alleles. Nine of the alleles were observed in more than a single individual's genotype, which increases our confidence in the inferences.

For further validation, we compared the undocumented alleles to three sources (Supp. Table 5). The first was a shotgun sequencing analysis of 286 individuals from Luo et al. [33], where six out of the 12 undocumented alleles were observed. The second was a pmDb [27] data-set, where we found four out of the 12 undocumented alleles. The third was a set of long-read assemblies of 35 diploid genomes from Ebert et al. [13], which confirmed six out of the 12 undocumented alleles. Altogether, two of the 12 undocumented alleles were observed in all sources.

In addition to the 18 undocumented alleles, five additional undocumented allelic variants were identified in DS1 that matched known allele references in IMGT, except in their 3' end that is missing in IMGT (Supp.

[55]. The 3' ends of the L-PART1 references of the three sequences in IMGT seem to originate from the intron, and the 5' splice sites of the introns of those three alleles were likely misidentified in IMGT (Supp. Fig. 8). We also found three variant consensus sequences associated with TRBV6-2*01/TRBV6-3*01 (Supp. Fig. 7); all three were observed in the long-read assemblies for the TRBV6-2*01 annotation. One of them, TRBV6-2*01/TRBV6-3*01 2, was also observed in the undocumented allele TRBV6-3*01 G47A.
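The two screening criteria applied to candidate alleles above (low relative usage, and a SNP lying in or next to a run of identical nucleotides) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Candidate` class, the function names, the 0-based coordinates and the treatment of "adjacent" as one position either side of a run are all assumptions made here, and the check for a uni-modal usage distribution is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a candidate undocumented allele."""
    name: str
    germline: str          # reference sequence the SNP is called against
    snp_pos: int           # 0-based SNP position in `germline` (assumed convention)
    usage_fraction: float  # candidate assignments / all assignments of the gene


def near_homopolymer(seq: str, pos: int, min_run: int = 4) -> bool:
    """True if `pos` is inside, or directly next to, a run of >= min_run identical bases."""
    i = 0
    while i < len(seq):
        j = i
        while j + 1 < len(seq) and seq[j + 1] == seq[i]:
            j += 1
        # Run spans i..j; also flag the positions flanking it.
        if j - i + 1 >= min_run and i - 1 <= pos <= j + 1:
            return True
        i = j + 1
    return False


def flag_candidate(c: Candidate, min_fraction: float = 0.2, min_run: int = 4) -> list:
    """Return the reasons (if any) for treating a candidate as a likely artefact."""
    flags = []
    if c.usage_fraction < min_fraction:
        flags.append("low_usage")
    if near_homopolymer(c.germline, c.snp_pos, min_run):
        flags.append("homopolymer")
    return flags


# Toy sequences, not real TRBV germlines.
print(flag_candidate(Candidate("toy_1", "ACGTTTTGCA", 5, 0.45)))
print(flag_candidate(Candidate("toy_2", "ACGTACGTAC", 3, 0.10)))
```

A candidate returning an empty list passes both screens; in the text, such candidates were additionally compared against independent genomic sources.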
Next, we analyzed DS2, which includes data from 25 individuals. Fifteen undocumented alleles were inferred (Fig. 2a). Nine of them were also observed in DS1. Five others were present in the short-read whole genome databases, one in Luo et al. [33] and four in pmDB [27]. Two out of those five undocumented alleles were observed in the long-read assemblies of 35 diploid genomes from Ebert et al. [13]. One out of those two undocumented alleles was observed in all sources (Supp.

The left panel shows undocumented alleles within the genotype. Each row represents an undocumented allele, each column represents an individual. The Y-axis annotations in red, green, and blue correspond to alleles not following the expected multi-modal distribution, alleles adjacent to nucleotide stretches, or both, respectively. The tile colors correspond to an individual's genotype. The right panel shows the fraction of the undocumented allele assignments out of the gene assignments. The x-axis is the fraction and the Y-axis is the same as in (a). Colors correspond to an individual's genotype of the allele. The shapes correspond to the number of gene assignments: a circle indicates fewer than 10 gene assignments, a triangle indicates between 10 and 30, and a square indicates more than 30. (b) Documented alleles versus the observed documented alleles and the undocumented alleles for each TRBV gene and in each data-set, DS1 and DS2, DS3, and DS4, respectively. The x-axis corresponds to the TRBV genes and the y-axis to the number of alleles. The color corresponds to the allele groups: dark red is the documented alleles, light red is the observed documented alleles, and blue is the observed undocumented alleles.

morphisms in four genes (Fig. 3a).
TRBV4-3 and TRBV3-2 are absent from the genotypes of eight individuals, while TRBV11-1 is absent from two individuals, and TRBV30 from one individual (Fig. 3b). In DS2, a data-set similar to DS1 in terms of the sequenced length of the coding region, double chromosome deletions of TRBV4-3 were identified in eight individuals, TRBV3-2 in two individuals, and TRBV7-3 in one individual (Supp. Fig. 3).

Interestingly, the individual from DS1 with an inferred TRBV30 deletion (HC10) was shown to be homozygous or hemizygous for the undocumented allele TRBV30*03 T285C. On the assumption that this undocumented allele is found at relatively low frequency within the human population, homozygosity is unlikely. However, it is possible that the undocumented allele has escaped more widespread detection because of its low usage level. This low usage is a consequence of TRBV30*03 T285C being a pseudogene, because its coding region includes an in-frame stop codon. HC10 therefore has at least the functional equivalent of a double chromosome deletion.

The gene TRBV4-3 and the pseudogene TRBV3-2 were always inferred as being deleted together in DS1 individuals (Fig. 3b)

age of TRBV6-2*01/TRBV6-3*01. In individuals who lack TRBV4-3 and TRBV3-2, usage of TRBV6-2*01/TRBV6-3*01 is significantly lower than in the individuals who express TRBV4-3 and TRBV3-2 (Supp. Fig. 13). It is therefore likely that detection of TRBV6-2*01/TRBV6-3*01 in these individuals is entirely a consequence of sequences utilizing TRBV6-3*01. This line of reasoning also allowed us to conclude that an undocumented polymorphism seen in sample HC4 is most likely an allele of TRBV6-3 (TRBV6-3*01 G47A) rather than TRBV6-2. The genotypes of all individuals who carry the TRBV6-2*01/TRBV6-3*01 2 or TRBV6-3*01 G47A 5'UTR sequences include TRBV7-2*02.
Since the presence of TRBV7-2*02 hints at deletion of TRBV6-2, we conclude that the TRBV6-2*01/TRBV6-3*01 2 5'UTR sequence is attributable to TRBV6-3*01.

In DS2 individuals, deletion of TRBV4-3 was not always accompanied by evidence of deletion of TRBV3-2. This is likely because DS2 was collected from different sources, and in some data-sets non-productive sequences had been filtered out. Evidence of the presence or absence of the TRBV3-2 pseudogene is therefore lacking. In other samples, analysis of TRBV3-2 usage is compromised by its low usage (Fig. 3a).

To explore J genotypes, we first checked for evidence of errors by exploring the fraction of all TRBJ1-6 assignments that are assigned to TRBJ1-6*01. This is the most likely error in TRBJ genotyping, as TRBJ1-6 is the only TRBJ gene with two known functional alleles.

erozygous individuals (Fig. 4d), indicating that the TRBJ1-6 alleles can be reliably inferred.

Of note, in heterozygous individuals TRBJ1-6*02 is used considerably more frequently than TRBJ1-6*01 (Fig. 4d). The average fraction of TRBJ1-6*01 out of all sequences assigned to TRBJ1-6 in heterozygous individuals is ∼0.39, which is comparable with the average fraction of TRBD2*01 out of all sequences assigned to TRBD2 in TRBD2 heterozygous individuals after correcting for mis-assignments (see above). The similarity between the biased usage of TRBJ1-6 and TRBD2 alleles in heterozygous individuals led us to test the genetic dependency between these loci.

The distance between TRBJ1-6 and TRBD2 is relatively short (∼6000bp), suggesting these loci could indeed be in linkage disequilibrium (LD). To test this hypothesis, we reviewed Whole Genome Sequencing (WGS) records from the 1000 Genomes Project to profile the region's variants based on TRBD2 haplotype (Supp. Fig. 12).
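As an illustration of how LD between two biallelic sites can be quantified from phased haplotypes, the sketch below computes the squared correlation coefficient r². This is one common LD measure and is used here only as an assumption; the text does not state which statistic underlies the reported "LD score". The function name and the toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Pairwise r^2 between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_A, allele_at_site_B) tuples,
    one tuple per phased chromosome.
    """
    n = len(haplotypes)
    a_alleles = sorted({a for a, _ in haplotypes})
    b_alleles = sorted({b for _, b in haplotypes})
    if len(a_alleles) != 2 or len(b_alleles) != 2:
        raise ValueError("both sites must be biallelic and polymorphic")
    a1, b1 = a_alleles[0], b_alleles[0]
    p_a = sum(a == a1 for a, _ in haplotypes) / n                 # freq of a1
    p_b = sum(b == b1 for _, b in haplotypes) / n                 # freq of b1
    p_ab = sum(a == a1 and b == b1 for a, b in haplotypes) / n    # freq of a1-b1 haplotype
    d = p_ab - p_a * p_b                                          # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Toy data: TRBD2*01 always travels with TRBJ1-6*02, and TRBD2*02 with TRBJ1-6*01.
linked = [("TRBD2*01", "TRBJ1-6*02")] * 5 + [("TRBD2*02", "TRBJ1-6*01")] * 5
print(ld_r2(linked))
```

Values near 1 indicate that the allele at one site almost fully predicts the allele at the other; values near 0 indicate independence.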
We observed SNPs with a high LD score between the genes. Further, the WGS haplotypes showed several other SNPs with high LD score scattered in the TRBD-TRBC2 genomic region, which strengthens the association between TRBD2 alleles and other markers in the locus. DS4 was unsuitable to test this hypothesis because DS4 sequences do not include the SNP that differentiates between TRBJ1-6*01 and TRBJ1-6*02. We therefore tested the LD hypothesis using DS3. Only genotypes for which we were confident of the TRBD2 genotype were taken into account. These genotypes are shown outside the gray areas of Fig. 4b. Supp. Fig. 9 shows that all of the homozygous TRBD2*01 individuals are also homozygous for TRBJ1-6*02. Also, 50 out of the 52 homozygous TRBJ1-6*01 individuals are homozygous for TRBD2*02.

Having confirmed that TRBJ1-6 and TRBD2 are in LD, we next investigated the influence of TRBD2 genotypes on TRBJ/TRBV gene usage in the repertoires. Since such an investigation requires accurate TRBD2 genotype inference, accurate annotations of TRBJ genes, and a large data-set, DS4 was used.

We found that homozygous TRBD2*02 individuals tend to use TRBD2 1.25 times more than homozygous TRBD2*01 individuals (Fig. 5a,

TRBJ2 genes normalized independently for each gene family (Fig. 5c). Surprisingly, we found that TRBJ usage within each family is also affected by the TRBD2 genotype. Since TRBD2 can rearrange only with TRBJ2 genes, we stratified the above distributions into subsets that include only biologically possible rearrangements. In particular, Fig. 5d shows the conditional probability P(TRBJ2-N|TRBD2) for all the TRBJ2 genes. The biased usage of the TRBJ2 genes observed in Fig. 5c is still present, indicating that the relative likelihood of TRBD2 to recombine with TRBJ2 genes is strongly affected by the TRBD2 genotype.
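The conditional usage P(TRBJ2-N|TRBD2) described above can be estimated for a single repertoire by counting rearrangements, as in the following minimal sketch. The input layout (a list of gene-level D and J calls per sequence) and the function name are assumptions for illustration; the filtering and statistics used in the study are not reproduced.

```python
from collections import Counter


def conditional_j_usage(rearrangements, d_gene="TRBD2", j_prefix="TRBJ2"):
    """Estimate P(J gene | D gene) within one repertoire.

    `rearrangements` is an iterable of (d_call, j_call) gene-level calls.
    Only sequences assigned to `d_gene` whose J call starts with `j_prefix`
    are counted, so the returned fractions sum to 1 over that subset.
    """
    counts = Counter(
        j for d, j in rearrangements if d == d_gene and j.startswith(j_prefix)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {j: c / total for j, c in sorted(counts.items())}


# Toy repertoire, not real data.
toy = (
    [("TRBD2", "TRBJ2-1")] * 3
    + [("TRBD2", "TRBJ2-7")] * 1
    + [("TRBD1", "TRBJ1-1")] * 4
    + [("TRBD1", "TRBJ2-1")] * 2
)
print(conditional_j_usage(toy))                                   # P(TRBJ2-N | TRBD2)
print(conditional_j_usage(toy, d_gene="TRBD1", j_prefix="TRBJ"))  # P(TRBJ1/2-N | TRBD1)
```

Computing these fractions per individual and then grouping individuals by TRBD2 genotype gives the kind of stratified comparison discussed in the text.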
We then explored the TRBJ gene fraction of the sequences assigned to TRBD1 (P(TRBJ1/2-N|TRBD1), Fig. 5e), and observed that TRBD2 genotype is associated with the TRBD1 likelihoods to rearrange with individual TRBJ genes. We further investigated the effect of TRBD2 genotype on the likelihoods to rearrange with individual TRBJ2 genes only, and the effect was mostly eliminated (P(TRBJ2-N|TRBD1), Fig. 5f).

The strong biases observed in Fig. 5a-e can result from amino acid alterations in the sequence, from non-coding regulatory variants, or from unknown structural variations in the locus that are associated with the different alleles. To discriminate between these options, we repeated the analysis for non-functional sequences that resulted from frame-shifts between the TRBV and TRBJ genes. Such non-functional sequences are commonly used to reflect the initial V(D)J usage prior to thymus selection [12, 68, 59]. In these non-functional sequences the biases are pronounced in a similar fashion (Supp. Fig. 11). Thus, we conclude that the differences between the TRBD and TRBJ rearrangements stratified by D2 genotype are most likely due to structural differences or non-coding regulatory variants between the loci rather than due to negative selection.

Table 1 BIOMED-2). In 11 out of 12 individuals with a high genotype inference likelihood, the pattern between homozygosity of TRBV7-2*bp02 (see section 4.3) and a deletion inference of TRBV4-3 was apparent (Supp. Figs. 14 and 15). Another gene with a link to TRBV7-2 is TRBV6-2/TRBV6-3. Its usage was also highly affected by the genotype of TRBV7-2. The mean usage of TRBV6-2/TRBV6-3 in TRBV7-2*bp02 homozygous individuals was less than half of the mean usage in TRBV7-2*bp01 homozygous individuals.
Since we cannot distinguish between TRBV6-2 and TRBV6-3, this observation supports the hypothesis that both TRBV6-2 and TRBV4-3 are not present in haplotypes that carry TRBV7-2*bp02.

In addition, the following association patterns between specific alleles and single chromo-

TRBV24-1*02 was present on chromosome TRBJ1-6*02, TRBV20-1*02 was also observed. Of note is a haplotype block between the genes TRBV6-4 to TRBV10-1, that was observed in DS1:CI21 on the TRBJ1-6*01 chromosome and in DS2:donor4 on the TRBJ1-6*02 chromosome (Fig. 6).

[Figure panels not recoverable from the extraction; x-axis label: TRBJ gene.]

fraction of TRBJ2 genes out of the sequences that were assigned to TRBD2 and were longer than 7nt. (e) The fraction of TRBJ2 genes out of the sequences that were assigned to TRBD1 and were longer than 7nt. TRBJ genes are shown along the X-axis in the order in which they appear in the genome. (f) The fraction of TRBJ genes out of the sequences that were assigned to TRBD1 and were longer than 7nt. TRBJ genes are shown along the X-axis in the order in which they appear in the genome. The box colors correspond to the TRBD2 genotype. Statistical significance was determined using a Mann-Whitney test and adjusted by Bonferroni correction (n.s.: not significant, *: p < 0.05, **: p < 0.01, ***: p < 0.001).
[Figure axis residue: TRBV genes listed from TRBV2 to TRBV30, with TRBV6-2/TRBV6-3 shown as a single entry.]

the reliability of this inference, we used the same individuals who were heterozygous for TRBJ1-6. Ten out of the 11 individuals from DS1 were heterozygous for TRBD2 and were used for the haplotype inference (Supp. Fig. 16). Although the number of recombinations of TRBD2 with the TRBV genes is much larger than that of TRBJ1-6 and could potentially supply a better inference, comparison of the results from both anchor genes shows a different picture. The haplotypes inferred with TRBD2 commonly show occurrences of more than one allele per gene on a single chromosome. This is most likely due to ambiguous assignment of the very short and similar TRBD2 alleles. Hence, although haplotype inference with TRBD2 is feasible, it is likely to be less accurate.

Table 9. The data-sets are described in Table 2. DS2 and DS4 were downloaded after preprocessing, DS1 was preprocessed according to the preprocessing of Eliyahu et al.
[14], and DS3 was preprocessed using pRESTO [63] according to the example workflow "Illumina MiSeq 2x250 BCR mRNA" as follows: (i) paired-end reads were assembled, (ii) sequences with low quality (mean Phred quality scores lower than 20) were removed, (iii) the 3' and 5' end primers were cut, (iv) duplicate sequences were removed and collapsed.

Sequences of two full-length TRBV genes, TRBV6-2*01 and TRBV6-3*01, are indistinguishable (Fig. 1c). We therefore refer to them here as TRBV6-2*01/TRBV6-3*01. TRBV sequences amplified using the BIOMED-2 primers are partial, yet it is still possible to differentiate most of the genes. Only TRBV6-2 and TRBV6-3, as well as TRBV12-3 and TRBV12-4, could not be differentiated (Fig. 1c). Those partial genes are referred to here as TRBV6-

and TRBV6-5/TRBV6-6 were indistinguishable (Fig. 1c).

Although there are few ambiguities in the identification of partial TRBV genes, the unambiguous identification of partial allelic variants is more problematic. Many SNPs that distinguish between alleles are located outside the regions that are generated using BIOMED-2 or Adaptive Biotechnologies primers. Thus, all alleles were collapsed into partial allelic variation groups; the sequence of each partial allelic variation group was determined to be identical to the longest allele sequence reference (out of the identical partial alleles' references). The allele patterns were named here using the following structure: [gene name]*[protocol primers][0-9][0-9]. The BIOMED-2 partial allelic variants were symbolized by bp, and the Adaptive Biotechnologies partial allelic variants were symbolized by ap.
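The naming structure [gene name]*[protocol primers][0-9][0-9] can be illustrated with a small parser. This is only a sketch of the convention as stated above; the regular expression, the function name and the returned tuple layout are assumptions introduced here.

```python
import re

# gene name, then '*', then 'bp' or 'ap', then exactly two digits
PARTIAL_ALLELE = re.compile(
    r"^(?P<gene>[^*]+)\*(?P<protocol>bp|ap)(?P<index>[0-9]{2})$"
)
PROTOCOLS = {"bp": "BIOMED-2", "ap": "Adaptive Biotechnologies"}


def parse_partial_allele(name: str):
    """Split a partial allelic variation group name into (gene, protocol, index)."""
    m = PARTIAL_ALLELE.match(name)
    if m is None:
        raise ValueError(f"not a partial allelic variation group name: {name!r}")
    return m["gene"], PROTOCOLS[m["protocol"]], int(m["index"])


print(parse_partial_allele("TRBV7-2*bp02"))
```

A full-length allele name such as TRBV7-2*02 carries no protocol prefix and is rejected by this parser, which keeps partial groups and documented alleles from being confused.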
For example, the partial sequence of the allele TRBV5-6*01 was collapsed into the partial allelic variation groups TRBV5-6*bp01 and TRBV5-6*ap01 (see Supplementary Table 8

with mismatches in the TRBD segment or with identifiable TRBD sequences shorter than 9 nucleotides were filtered out.

TIgGER's level of confidence was calculated using a Bayes factor (K) from the posterior probability for each model. The larger the K, the greater the certainty in the genotype inference. The quantity lk, used throughout the manuscript, indicates the log of K.

$X_1 \sim N_1(\mu_1, \sigma_1)$ and $X_2 \sim N_2(\mu_2, \sigma_2)$, the equilibrium point $x$ is determined as follows:

$$x\sigma_1 + x\sigma_2 = \mu_1\sigma_2 + \mu_2\sigma_1$$

$$x = \frac{\mu_1\sigma_2 + \mu_2\sigma_1}{\sigma_1 + \sigma_2}$$

4.9 Haplotype inference

RAbHIT was used as previously described [45], with TRBD2 and TRBJ1-6 anchors to infer TCR haplotypes. The epsilon error parameter was adjusted to deal with TRBD2 alignment errors, and was estimated with reference to the frequency distribution of TRBD2*01 alignments amongst all TRBD2-bearing sequences (see Fig. 4

genotypes and haplotypes would be obscured. This study shows, however, that the usage of particular genes in the expressed repertoire appears to be very similar between individuals. The 'shape' of the TCR repertoire may therefore be as predictable as has been found for the BCR repertoire [23, 52], reflecting both the carriage of individual genes and the LD that is found within the loci. Conspicuous LD identified in this study includes that of the TRBV4-3/TRBV3-2/TRBV6-2 deletion polymorphism and carriage of the TRBV7-2*02 allele, as well as linkage between the TRBD2 and TRBJ1-6 loci. These different haplotypes, in turn, are associated with significant differences in the usage of neighbouring genes.

for specific allelic variants of these genes could not be explored. Interestingly, in the present study a previously undocumented polymorphism of TRBV12-4 was identified.
The lack of disease associations with TCR genes is likely to be a reflection of our ignorance of individual genetic variation within the TCR loci. SNP coverage of these regions is even sparser than coverage of the BCR gene loci, and therefore real associations will escape detection in genome-wide association studies. Only after thorough exploration of the population genetics of the TCR genes, and of individual variation in the expressed TCR repertoire, will it be possible to determine whether or not these genes have a role in disease susceptibility. Genomic sequencing of TCR genes will contribute to this [32], but the present study demonstrates that it will also be possible to do this efficiently through the analysis of

The hardy-weinberg principle
Adaptive Biotechnologies. Adaptive biotechnologies datasets
Deconstructing the peptide-mhc specificity of t cell recognition
Individual variation in the germline ig gene repertoire inferred from variable region gene rearrangements
The impact of a large and frequent deletion in the human tcrβ locus on antiviral immunity
Determination of gene organization in the human ighv region on single chromosomes
Medical genetics: a marker for stevens-johnson syndrome
Germline immunoglobulin genes: disease susceptibility genes hidden in plain sight?
Current Opinion in Systems Biology
Néstor Vázquez Bernat Production of individualized v gene databases reveals high levels of immunoglobulin genetic diversity
iReceptor: A platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories
A model of somatic hypermutation targeting in mice based on high-throughput ig sequencing data
Haplotype-resolved diverse human genomes and integrated analysis of structural variation
Francois Vigneault, Chris Clouser, Ronen Hope, Assy Nimer, Marius Braun Antibody repertoire analysis of hepatitis c virus infections identifies immune signatures associated with spontaneous clearance
Immunosequencing identifies signatures of cytomegalovirus exposure history and hla-mediated effects on the t cell repertoire
Genotyping and copy number analysis of immunoglobin heavy chain variable genes using long reads
Identification of subject-specific immunoglobulin alleles from expressed repertoire sequencing data
Identification of subject-specific immunoglobulin alleles from expressed repertoire sequencing data
Automated analysis of high-throughput b-cell sequencing data reveals a high frequency of novel immunoglobulin v gene segment alleles
Automated analysis of high-throughput b-cell sequencing data reveals a high frequency of novel immunoglobulin v gene segment alleles
The promise and challenge of high-throughput sequencing of the antibody repertoire
Mosaic deletion patterns of the human antibody heavy chain gene locus shown by bayesian haplotyping
Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation
Unusual organization and diversity of t-cell receptor a-chain genes
Sequence relationships between putative t-cell receptor polypeptides and immunoglobulins
Identifying highly mutated ighd genes in the junctions of rearranged human immunoglobulin heavy chain genes
Population matched (pm) germline allelic variants of immunoglobulin (ig) loci: New pmig database to better understand ig repertoire and selection processes in disease and vaccination. bioRxiv
The inference of phased haplotypes for the immunoglobulin h chain v region gene loci by analysis of vdj gene rearrangements
Fredrik Levander, and Mats Ohlin. Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery
Imgt, the international immunogenetics database
Single-cell landscape of bronchoalveolar immune cells in patients with covid-19
Profiling germline adaptive immune receptor repertoire with gairr suite. bioRxiv
Worldwide genetic variation of the ighv and trbv immune receptor gene families in humans
Subtle changes in tcrα cdr1 profoundly increase the sensitivity of cd4 t cells
Sequence diversity, natural selection and linkage disequilibrium in the human t cell receptor alpha/delta locus. Human genetics
Non-productive human tcrβ chain genes represent v-d-j diversity before selection upon function: insight into biased usage of tcrbd and tcrbj genes and diversity of cdr3 region length
The mhc locus and genetic susceptibility to autoimmune and infectious diseases
Polymorphisms in human immunoglobulin heavy chain variable genes and their upstream regions
Alternative splice variants and germline polymorphisms in human immunoglobulin light chain genes. bioRxiv
Janeway's Immunobiology. Garland Science, 8 edition
Janeway's immunobiology. Garland science
Vdjbase: an adaptive immune receptor genotype and haplotype database
Pharmacogenomic advances in the prediction and prevention of cutaneous idiosyncratic drug reactions
Identification of drug-specific public tcr driving severe cutaneous adverse reactions
RAbHIT: R Antibody Haplotype Inference Tool
Consistency of vdj rearrangement and substitution parameters enables accurate b cell receptor sequence annotation
Per-sample immunoglobulin germline inference from b cell receptor deep sequencing data
Comprehensive assessment of t-cell receptor beta-chain diversity in alphabeta t cells
A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus
Overview of methodologies for t-cell receptor repertoire analysis
The complete 685-kilobase dna sequence of the human β t cell receptor locus
Individual heritable differences result in unique cell lymphocyte receptor repertoires of naïve and antigen-experienced cells
Restriction fragment length polymorphisms and single germline coding region sequence in vh182, a duplicated gene encoding autoantibody
Ethnic differences in vh gene polymorphism
Evaluation of grch38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. bioRxiv
Physical map of the 3' region of the human immunoglobulin heavy chain locus: clustering of autoantibody-related variable segments in one haplotype
Polymorphism of the human immunoglobulin variable region segment v1-4.1
T cell receptor next-generation sequencing reveals cancer-associated repertoire metrics and reconstitution after chemotherapy in patients with hematological and solid tumors. OncoImmunology
Learning the heterogeneous hypermutation landscape of immunoglobulins from high-throughput repertoire data
Sequence variation and linkage disequilibrium in the human t-cell receptor β (tcrb) locus
Organization and sequences of the diversity, joining, and constant region genes of the human t-cell receptor beta chain
Design and standardization of pcr primers and protocols for detection of clonal immunoglobulin and t-cell receptor gene recombinations in suspect lymphoproliferations: Report of the biomed-2 concerted action bmh4-ct98-3936
presto: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires
Junctional biases in the naive TCR repertoire control the CTL response to an immunodominant determinant of HSV-1. Immunity
Comment on "a database of human immune receptor alleles recovered from population sequencing data"
Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation
Immune cell profiling of covid-19 patients in the recovery stage by single-cell sequencing
Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Frontiers in Immunology
Igblast: an immunoglobulin variable domain sequence analysis tool
Organization and sequences of the variable, joining and constant region genes of the human t-cell receptor α-chain
Lymanalyzer: a tool for comprehensive analysis of next generation sequencing data of t cell receptors and immunoglobulins
A database of human immune receptor alleles recovered from population sequencing data
A genetically determined insertion/deletion related polymorphism in human T cell receptor beta chain (TCRB) includes functional variable gene segments