Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), is of critical importance for immune responses to pathogens and vaccines. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become widespread in immunology research, making it the most readily available source of information about allelic diversity in the immunoglobulin (IG) and T cell receptor (TR) loci. Here we present a novel algorithm for highly sensitive and specific variable (V) and joining (J) gene allele inference, allowing reconstruction of high-quality individual gene segment libraries. The approach can be applied to infer allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus allowing high-throughput discovery of novel alleles from a wide variety of existing datasets. The algorithm is implemented as part of the MiXCR software. We demonstrate the accuracy of this approach using AIRR-seq paired with long-read genomic sequencing data, and compare it to a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) AIRR-seq data from 450 donors of ancestrally diverse population groups, and to the largest reported full-length TCR alpha and beta chain (TRA; TRB) AIRR-seq dataset, representing 134 individuals. This allowed us to assess the genetic diversity within the IGH, TRA and TRB loci in different populations and to establish a freely accessible online database of V and J gene alleles inferred from AIRR-seq data, together with their population frequencies.
Avnir Y, Watson CT, Glanville J, Peterson EC, Tallarico AS, Bennett AS, Qin K, Fu Y, Huang C-Y, Beigel JH, et al. 2016. IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization shifts and varies by ethnicity. Sci Rep 6: 20842. doi:10.1038/srep20842
Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. 2015. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 12: 380–381. doi:10.1038/nmeth.3364
Chaisson MJ, Tesler G. 2012. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13: 238. doi:10.1186/1471-2105-13-238
Corcoran MM, Phad GE, Vázquez Bernat N, Stahl-Hennig C, Sumida N, Persson MAA, Martin M, Hedestam GBK. 2016. Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. Nat Commun 7: 13642. doi:10.1038/ncomms13642
Corcoran M, Chernyshev M, Mandolesi M, Narang S, Kaduk M, Ye K, Sundling C, Färnert A, Kreslavsky T, Bernhardsson C, et al. 2023. Archaic humans have contributed to large-scale variation in modern human T cell receptor genes. Immunity 56: 635–652.e6. doi:10.1016/j.immuni.2023.01.026
Davis CW, Jackson KJL, McElroy AK, Halfmann P, Huang J, Chennareddy C, Piper AE, Leung Y, Albariño CG, Crozier I, et al. 2019. Longitudinal analysis of the human B cell response to Ebola virus infection. Cell 177: 1566–1582.e17. doi:10.1016/j.cell.2019.04.036
Dekker J, van Dongen JJM, Reinders MJT, Khatri I. 2022. pmTR database: population matched (pm) germline allelic variants of T-cell receptor (TR) loci. Genes Immun 23: 99–110. doi:10.1038/s41435-022-00171-x
Gadala-Maria D, Yaari G, Uduman M, Kleinstein SH. 2015. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci 112: E862–E870. doi:10.1073/pnas.1417683112
Gadala-Maria D, Gidoni M, Marquez S, Vander Heiden JA, Kos JT, Watson CT, O’Connor KC, Yaari G, Kleinstein SH. 2019. Identification of subject-specific immunoglobulin alleles from expressed repertoire sequencing data. Front Immunol 10: 129. doi:10.3389/fimmu.2019.00129
Gibson WS, Rodriguez OL, Shields K, Silver CA, Dorgham A, Emery M, Deikus G, Sebra R, Eichler EE, Bashir A, et al. 2023. Characterization of the immunoglobulin lambda chain locus from diverse populations reveals extensive genetic variation. Genes Immun 24: 21–31. doi:10.1038/s41435-022-00188-2
Gidoni M, Snir O, Peres A, Polak P, Lindeman I, Mikocziova I, Sarna VK, Lundin KEA, Clouser C, Vigneault F, et al. 2019. Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping. Nat Commun 10: 628. doi:10.1038/s41467-019-08489-3
Gu Z, Eils R, Schlesner M. 2016. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32: 2847–2849. doi:10.1093/bioinformatics/btw313
Gupta NT, Vander Heiden JA, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. 2015. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358. doi:10.1093/bioinformatics/btv359
Hart A, Martínez S. 2023. spgs: Statistical patterns in genomic sequences. Retrieved from https://CRAN.R-project.org/package=spgs
Hellinger E. 1909. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J Reine Angew Math 1909: 210–271. doi:10.1515/crll.1909.136.210
Hester J, Vaughan D. 2023. bench: High precision timing of R expressions. Retrieved from https://CRAN.R-project.org/package=bench
Kassambara A. 2023. ggpubr: “ggplot2” based publication ready plots. Retrieved from https://CRAN.R-project.org/package=ggpubr
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27: 722–736. doi:10.1101/gr.215087.116
Lee JH, Toy L, Kos JT, Safonova Y, Schief WR, Havenar-Daughton C, Watson CT, Crotty S. 2021. Vaccine genetics of IGHV1-2 VRC01-class broadly neutralizing antibody precursor naïve human B cells. NPJ Vaccines 6: 113. doi:10.1038/s41541-021-00376-7
Lees W, Busse CE, Corcoran M, Ohlin M, Scheepers C, Matsen FA, Yaari G, Watson CT, Collins A, Shepherd AJ. 2020. OGRDB: a reference database of inferred immune receptor genes. Nucleic Acids Res 48: D964–D970. doi:10.1093/nar/gkz822
Leggat DJ, Cohen KW, Willis JR, Fulp WJ, deCamp AC, Kalyuzhniy O, Cottrell CA, Menis S, Finak G, Ballweber-Fleming L, et al. 2022. Vaccination induces HIV broadly neutralizing antibody precursors in humans. Science 378: eadd6502. doi:10.1126/science.add6502
Martin M, Ebert P, Marschall T. 2023. Read-based phasing and analysis of phased variants with WhatsHap. Methods Mol Biol 2590: 127–138. doi:10.1007/978-1-0716-2819-5_8
Mikocziova I, Greiff V, Sollid LM. 2021. Immunoglobulin germline gene variation and its impact on human disease. Genes Immun 22: 205–217. doi:10.1038/s41435-021-00145-5
Nielsen SCA, Roskin KM, Jackson KJL, Joshi SA, Nejad P, Lee JY, Wagar LE, Pham TD, Hoh RA, Nguyen KD, et al. 2019. Shaping of infant B cell receptor repertoires by environmental factors and infectious disease. Sci Transl Med 11: eaat2004. doi:10.1126/scitranslmed.aat2004
Nielsen SCA, Yang F, Jackson KJL, Hoh RA, Röltgen K, Jean GH, Stevens BA, Lee JY, Rustagi A, Rogers AJ, et al. 2020. Human B cell clonal expansion and convergent antibody responses to SARS-CoV-2. Cell Host Microbe 28: 516–525.e5. doi:10.1016/j.chom.2020.09.002
Ohlin M, Scheepers C, Corcoran M, Lees WD, Busse CE, Bagnara D, Thörnqvist L, Bürckert J-P, Jackson KJL, Ralph D, et al. 2019. Inferred allelic variants of immunoglobulin receptor genes: a system for their evaluation, documentation, and naming. Front Immunol 10: 435. doi:10.3389/fimmu.2019.00435
Omer A, Shemesh O, Peres A, Polak P, Shepherd AJ, Watson CT, Boyd SD, Collins AM, Lees W, Yaari G. 2020. VDJbase: an adaptive immune receptor genotype and haplotype database. Nucleic Acids Res 48: D1051–D1056. doi:10.1093/nar/gkz872
Omer A, Peres A, Rodriguez OL, Watson CT, Lees W, Polak P, Collins AM, Yaari G. 2022. T cell receptor beta germline variability is revealed by inference from repertoire data. Genome Med 14: 2. doi:10.1186/s13073-021-01008-4
Pagès H, Aboyoun P, Gentleman R, DebRoy S. 2024. Biostrings: Efficient manipulation of biological strings. Retrieved from https://bioconductor.org/packages/Biostrings
Peres A, Lees WD, Rodriguez OL, Lee NY, Polak P, Hope R, Kedmi M, Collins AM, Ohlin M, Kleinstein SH, et al. 2023. IGHV allele similarity clustering improves genotype inference from adaptive immune receptor repertoire sequencing data. Nucleic Acids Res 51: e86. doi:10.1093/nar/gkad603
Pushparaj P, Nicoletto A, Sheward DJ, Das H, Castro Dopico X, Perez Vidakovics L, Hanke L, Chernyshev M, Narang S, Kim S, et al. 2023. Immunoglobulin germline gene polymorphisms influence the function of SARS-CoV-2 neutralizing antibodies. Immunity 56: 193–206.e7. doi:10.1016/j.immuni.2022.12.005
Ralph DK, Matsen FA. 2019. Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data. PLoS Comput Biol 15: e1007133. doi:10.1371/journal.pcbi.1007133
R Core Team. 2023. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/.
Robinson D. 2020. fuzzyjoin: Join tables together on inexact matching. Retrieved from https://CRAN.R-project.org/package=fuzzyjoin
Rodriguez OL, Ritz A, Sharp AJ, Bashir A. 2020a. MsPAC: a tool for haplotype-phased structural variant detection. Bioinformatics 36: 922–924. doi:10.1093/bioinformatics/btz618
Rodriguez OL, Gibson WS, Parks T, Emery M, Powell J, Strahl M, Deikus G, Auckland K, Eichler EE, Marasco WA, et al. 2020b. A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus. Front Immunol 11: 2136. doi:10.3389/fimmu.2020.02136
Rodriguez OL, Silver CA, Shields K, Smith ML, Watson CT. 2022. Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta, and beta loci. Cell Genom 2: 100228. doi:10.1016/j.xgen.2022.100228
Rodriguez OL, Safonova Y, Silver CA, Shields K, Gibson WS, Kos JT, Tieri D, Ke H, Jackson KJL, Boyd SD, et al. 2023. Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire. Nat Commun 14: 4419. doi:10.1038/s41467-023-40070-x
Roskin KM, Simchoni N, Liu Y, Lee JY, Seo K, Hoh RA, Pham T, Park JH, Furman D, Dekker CL, et al. 2015. IgH sequences in common variable immune deficiency reveal altered B cell development and selection. Sci Transl Med 7: 302ra135. doi:10.1126/scitranslmed.aab1216
Rudis B, Gandy D. 2023. waffle: Create waffle chart visualizations. Retrieved from https://CRAN.R-project.org/package=waffle
Shugay M, Britanova OV, Merzlyak EM, Turchaninova MA, Mamedov IZ, Tuganbaev TR, Bolotin DA, Staroverov DB, Putintseva EV, Plevova K, et al. 2014. Towards error-free profiling of immune repertoires. Nat Methods 11: 653–655. doi:10.1038/nmeth.2960
Tange O. 2018. GNU Parallel 2018. In GNU Parallel 2018 (p. 112). Ole Tange. doi:10.5281/zenodo.1146014
Vander Heiden JA, Yaari G, Uduman M, Stern JNH, O’Connor KC, Hafler DA, Vigneault F, Kleinstein SH. 2014. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932. doi:10.1093/bioinformatics/btu138
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, et al. 2019. Welcome to the tidyverse. J Open Source Softw 4: 1686. doi:10.21105/joss.01686
Wilke CO. 2024. cowplot: Streamlined plot theme and plot annotations for “ggplot2”. Retrieved from https://CRAN.R-project.org/package=cowplot
Zhang W, Wang I-M, Wang C, Lin L, Chai X, Wu J, Bett AJ, Dhanasekaran G, Casimiro DR, Liu X. 2016. IMPre: an accurate and efficient software for prediction of T- and B-cell receptor germline genes and alleles from rearranged repertoire data. Front Immunol 7: 457. doi:10.3389/fimmu.2016.00457