MARVIN: A Deep Generative Model for Flow Cytometry Analysis Informed by Biological Assumption

De Voeght, Adrien; Bodart, Fanny; Baron, Frédéric; Louppe, Gilles

Poster (Scientific congresses and symposiums)

2026 • 41st General Annual Meeting of the Belgian Hematology Society

Peer reviewed

Permalink
https://hdl.handle.net/2268/341221

Files

Full Text

MARVIN_Poster_BHS_GL_FB_final.pdf

Author postprint (2.33 MB)

Creative Commons License - Public Domain Dedication

Details

Keywords :

Artificial intelligence; generative model; flow cytometry; MRD; Acute leukemia

Abstract :

[en] Introduction: Flow cytometry (FCM) is widely used in research and clinical practice to characterise complex cell populations, generating high-dimensional single-cell data across numerous markers. Despite technological advances, manual gating remains the standard approach for annotating cell populations, even though it is time-consuming and operator-dependent. Deep generative models offer the potential to perform classification and discovery tasks simultaneously, improving efficiency and consistency.

Methods: Model: MARVIN is a semi-supervised deep generative model for cytometry analysis. Its architecture is structured around the biological assumption that the immune system consists of mixtures of cell populations. This assumption constrains the latent space to reflect the population structure, enabling biologically interpretable representations. MARVIN can perform multiple tasks: classification of known populations, discovery of novel or rare subpopulations, and exploration of immune system dynamics.

Dataset and experiments: The dataset comprises 5,480,065 cells from three patients without active disease and 10,222 malignant lymphoblastic cells from four additional patients. In total, the dataset includes 12 annotated cell populations profiled with 8 markers. All measurements were transformed and standardised using an auto-logicle transformation.
Classification task: We trained the model on a large dataset combining labelled cells from one patient and unlabelled cells from the others.
Cell-discovery tasks: Healthy and pathological cells were merged, and two analyses were conducted: (i) Subpopulation discovery: the number of clusters in the latent space was increased and malignant cells were masked during training, to evaluate whether MARVIN isolates pathological cells into additional clusters. (ii) Anomaly detection: a previously unseen cell population was provided to the model without adding new clusters, and reconstruction error was used to assess its dissimilarity from the learned populations.

Results: Classification task: For patient 2, accuracy, F1 score and balanced accuracy were 99.21%, 94.83% and 96.83%, respectively; for patient 3, they were 75.88%, 78.41% and 92.26%, respectively.
Discovery/anomaly detection: MARVIN successfully highlighted rare pathological populations (<0.1%). Through cluster expansion, it identified new pathological populations as distinct from healthy cells. It grouped two small MRD populations (MRD2 and MRD4) into the same cluster while still detecting subtle differences between them, and it mapped the blast groups of patients 1 and 3 into separate clusters. MARVIN detected 99.2% of leukemic cells and correctly assigned them to new clusters. Using reconstruction error, MARVIN identified all pathological populations as previously unseen and suitable for further characterisation.

Conclusion: MARVIN is a semi-supervised generative model for FCM data grounded in biological assumptions. It can be trained on routinely standardised datasets and applied across instruments, supporting broad laboratory implementation. MARVIN achieves high classification accuracy and detects novel populations through expanded clustering and reconstruction-loss evaluation. Ongoing work focuses on biological refinement to improve rare-population clustering and on applying MARVIN to study MRD dynamics in acute leukemia.
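The three classification metrics reported in the abstract can diverge considerably when cell populations differ greatly in size. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code. The function name `classification_metrics`, the toy population labels, and the choice of macro-averaged F1 are assumptions made for illustration; the abstract does not state which F1 averaging was used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro_f1, balanced_accuracy) for two label sequences.

    Balanced accuracy is the mean of per-class recall; macro F1 is the
    unweighted mean of per-class F1 (an assumed averaging choice).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # a cell of class `truth` was missed
            fp[pred] += 1   # and wrongly counted as class `pred`

    labels = sorted(set(y_true))  # every label here has at least one true cell
    accuracy = sum(tp.values()) / len(y_true)

    recalls, f1_scores = [], []
    for label in labels:
        recalls.append(tp[label] / (tp[label] + fn[label]))
        denominator = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denominator if denominator else 0.0)

    macro_f1 = sum(f1_scores) / len(labels)
    balanced_accuracy = sum(recalls) / len(labels)
    return accuracy, macro_f1, balanced_accuracy


# Hypothetical toy data: one large population with some errors and two small,
# perfectly classified populations.
y_true = ["A"] * 80 + ["B"] * 10 + ["C"] * 10
y_pred = ["A"] * 56 + ["B"] * 24 + ["B"] * 10 + ["C"] * 10

acc, f1, bal = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.2%}  macro F1={f1:.2%}  balanced accuracy={bal:.2%}")
```

In this toy example, errors concentrated in the largest population pull overall accuracy down to 0.76, while balanced accuracy stays at 0.90 because each population contributes equally to it. This only illustrates how the metrics behave and is not an explanation of the figures reported for any patient.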

Disciplines :

Hematology

Author, co-author :

De Voeght, Adrien ; Université de Liège - ULiège > Département des sciences cliniques

Bodart, Fanny ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data

Baron, Frédéric ; Université de Liège - ULiège > Département des sciences cliniques

Louppe, Gilles ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data

Language :

English

Title :

MARVIN: A Deep Generative Model for Flow Cytometry Analysis Informed by Biological Assumption

Publication date :

06 February 2026

Event name :

41st General Annual Meeting of the Belgian Hematology Society

Event date :

6-7 February

Audience :

International

Peer review/Selection committee :

Peer reviewed

Available on ORBi :

since 09 February 2026
