Machine learning analysis of the T cell receptor repertoire identifies sequence features of self-reactivity

Johannes Textor*, Franka Buytenhuijs, Dakota Rogers, Ève Mallet Gauthier, Shabaz Sultan, Inge M.N. Wortel, Kathrin Kalies, Anke Fähnrich, René Pagel, Heather J. Melichar, Jürgen Westermann, Judith N. Mandl*

*Corresponding author for this work


The T cell receptor (TCR) determines specificity and affinity for both foreign and self-peptides presented by the major histocompatibility complex (MHC). Although the strength of TCR interactions with self-peptide-MHC (pMHC) complexes impacts T cell function, it has been challenging to identify TCR sequence features that predict T cell fate. To discern patterns distinguishing TCRs of naive CD4+ T cells with low versus high self-reactivity, we used data from 42 mice to train a machine learning (ML) algorithm that identifies population-level differences between sets of TCRβ sequences. This approach revealed that weakly self-reactive T cell populations were enriched for longer CDR3β regions and for acidic amino acids. We tested our ML predictions of self-reactivity using retrogenic mice with fixed TCRβ sequences. Extrapolating our analyses to independent datasets, we predicted high self-reactivity for regulatory T cells and slightly reduced self-reactivity for T cells responding to chronic infections. Our analyses suggest a potential trade-off between TCR repertoire diversity and self-reactivity. A record of this paper's transparent peer review process is included in the supplemental information.
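The abstract names two population-level sequence features associated with weak self-reactivity: longer CDR3β regions and enrichment for acidic amino acids (aspartic acid, D, and glutamic acid, E). The sketch below only illustrates how those two summaries could be compared between two sets of CDR3β sequences, assuming each sequence is given as a one-letter amino-acid string. It is not the authors' ML model; the function names and the toy sequences are hypothetical and are not data from the study.

```python
from statistics import mean

# Acidic amino acids in one-letter code: aspartic acid (D) and glutamic acid (E).
ACIDIC = frozenset("DE")


def acidic_fraction(cdr3: str) -> float:
    """Fraction of residues in one CDR3 amino-acid string that are acidic."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return sum(aa in ACIDIC for aa in cdr3) / len(cdr3)


def summarize(cdr3s: list[str]) -> dict[str, float]:
    """Population-level means of CDR3 length and acidic-residue fraction."""
    return {
        "mean_length": mean(len(s) for s in cdr3s),
        "mean_acidic_fraction": mean(acidic_fraction(s) for s in cdr3s),
    }


def compare(low_self: list[str], high_self: list[str]) -> dict[str, float]:
    """Per-feature difference: low-self-reactivity set minus high-self-reactivity set."""
    low, high = summarize(low_self), summarize(high_self)
    return {feature: low[feature] - high[feature] for feature in low}


# Hypothetical toy sequences, for illustration only.
low_self = ["CASSDEGGAETLYF", "CASSPDWGGDEQYF"]
high_self = ["CASSLGQYF", "CASRGNTLYF"]
print(compare(low_self, high_self))
```

Positive differences in both features would point in the direction the abstract reports for weakly self-reactive populations. A learned classifier, as used in the study, can capture sequence patterns beyond these two hand-picked summaries.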

Original language: English
Journal: Cell Systems
Issue number: 12
Pages (from-to): 1059-1073.e5
Publication status: Published, 20.12.2023

Research Areas and Centers

  • Academic Focus: Center for Infection and Inflammation Research (ZIEL)

DFG Research Classification Scheme

  • 205-33 Anatomy
  • 204-05 Immunology
