Abstract:
MHC class I–associated peptides (MAPs), collectively referred to as the immunopeptidome, have a pivotal role in cancer immunosurveillance. While MAPs were long thought to be generated solely by the degradation of canonical proteins, recent advances in proteogenomics (genomically informed proteomics) have shown that ∼10% of MAPs originate from allegedly noncoding genomic sequences. Among these sequences, endogenous retroelements (EREs) are under intense scrutiny as a possible source of tumor-specific antigens (TSAs). With the increasing number of cancer-oriented immunopeptidomic and proteogenomic studies comes the need to accurately attribute an RNA expression level to each MAP identified by mass spectrometry. Here, we introduce BamQuery (BQ), a computational tool that counts all reads able to code for any MAP in any RNA-seq dataset chosen by the user, and annotates each MAP with all available biological features. Using BQ, we found that most canonical MAPs can derive from an average of two different genomic regions, whereas most tested ERE-derived MAPs can be generated by numerous (median of 210) different genomic regions and RNA transcripts. We show that published ERE MAPs considered as TSA candidates can be encoded by numerous genomic regions other than those previously studied, resulting in high, previously undetected expression in normal tissues. We also show that some mutated neoantigens previously published as presumably specific anti-cancer targets can in fact be encoded by other non-mutated, noncoding RNA-seq reads that are widely expressed in normal tissues. We therefore conclude that BQ could become an essential tool in any TSA identification and validation pipeline in the near future.
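The idea of counting "all reads able to code for" a MAP can be illustrated with a toy sketch. This is not BamQuery's implementation or interface: the function names (`translate`, `reverse_complement`, `read_can_code_for`, `count_coding_reads`) are hypothetical, reads are assumed to be plain DNA strings rather than aligned BAM records, and the standard genetic code is assumed. The sketch translates each read in all six reading frames and counts the reads whose translation contains the peptide, regardless of where in the genome the read came from.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# no alternative codon tables are considered in this sketch).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA string in frame 0; unknown codons (e.g. with N) become 'X'."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def read_can_code_for(read, peptide):
    """True if any of the six reading frames of the read contains the peptide."""
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            if peptide in translate(strand[frame:]):
                return True
    return False


def count_coding_reads(reads, peptide):
    """Count reads able to code for the peptide, whatever their genomic origin."""
    return sum(read_can_code_for(read, peptide) for read in reads)


if __name__ == "__main__":
    toy_reads = ["ATGAAA", "CATGAAGC", "TTTCAT", "GGGGGG"]
    print(count_coding_reads(toy_reads, "MK"))
```

Because the count is driven by the peptide sequence rather than by a single annotated locus, synonymous codons and reads from different regions all contribute, which is the property the abstract relies on when it reports that one MAP can be encoded by many genomic regions.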