Can we predict the host of a virus ? A study case on plant viruses applying machine learning approaches on curated viral proteomics features

Soyeurt, Hélène; Massart, Sébastien

Unpublished conference/Abstract (Scientific congresses and symposiums)

Simankov, Nikolay; Soyeurt, Hélène; Massart, Sébastien

2026 • BELVIR 2025

Permalink
https://hdl.handle.net/2268/345482

Files

Full Text

BELVIR_simankov.pptx

Author postprint (13.64 MB)

Creative Commons License - Attribution, Non-Commercial, No Derivative

All documents in ORBi are protected by a user license.

Details

Keywords :

physicochemical properties; proteome-associated functional features; epidemiological control; Plant viruses; Host prediction

Abstract :

Plant viruses represent a vast and phylogenetically diverse group of pathogens that threaten global agricultural productivity. To leverage the recent expansion of high-throughput sequencing, we collected, curated and clustered over 150,000 plant viral proteins, including coat proteins, movement proteins, RNA silencing suppressors, RNA-dependent RNA polymerases (RdRp), and polyprotein complexes. For each protein, we generated about 1,000 physicochemical and structural features using, among other tools, the ProtLearn and Bio2Byte algorithms. In parallel, we built a database of more than 21,000 plant host-virus relationships, covering 3,820 virus species and 4,223 plant species, through large-scale data gathering and data mining. This database was peer-reviewed by experts from multiple partner laboratories. Next, we designed a machine learning pipeline called Holistic AutoML-driven Robust pipeline optimization tool for Applied Multi-Omics (HARAMO). HARAMO was applied to the curated databases to identify key physicochemical signatures of proteins (amino acid composition and properties, backbone dynamics, early folding regions, properties of secondary structures, and intrinsic disorder) involved in virus-host specificity in plants. Our protein-based approach predicted more than 1,500 host plants with Matthews correlation coefficient (MCC) scores (≈ robust balanced accuracy, expressed here as percentages) ranging from 80% to 99%, depending on the input viral protein and the target plant. Overall, our integrative framework offers a robust protein-based host prediction tool for elucidating complex virus-host interactions. It opens new perspectives for studying the host range of virus species and for guiding both fundamental and applied research. Indeed, the HARAMO results raise new experimental questions about the interactions between a virus and its plant host. In addition, knowing the putative host range of a virus is a significant asset for epidemiological studies, providing critical insights for monitoring and managing viral spread.
We are currently developing a user-friendly dashboard based on our database, framework and prediction models, which will be available soon. This dashboard will guide future experimental research for biological validation with partner laboratories.
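
As a reading aid for the MCC scores reported in the abstract, the following minimal sketch computes MCC (taken here to mean the Matthews correlation coefficient) and balanced accuracy for a single binary "is this plant a host?" task. It is not the HARAMO code: the function names and the toy labels are hypothetical, chosen only to illustrate how the two metrics are derived from a confusion matrix and how an MCC can be expressed as a percentage.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is set to 0 when a row or column
        # of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, in [0, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 1 = virus infects the target plant, 0 = it does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"MCC: {100 * mcc(y_true, y_pred):.1f}%")
print(f"Balanced accuracy: {100 * balanced_accuracy(y_true, y_pred):.1f}%")
```

Unlike balanced accuracy, MCC ranges from -1 to 1 and reaches high values only when all four cells of the confusion matrix are well predicted, which is why it is often preferred when hosts and non-hosts are unevenly represented.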

Disciplines :

Biochemistry, biophysics & molecular biology
Agriculture & agronomy
Computer science

Author, co-author :

Soyeurt, Hélène ✱; Université de Liège - ULiège > Département GxABT > Modélisation et développement

Massart, Sébastien ✱; Université de Liège - ULiège > TERRA Research Centre > Entomologie, Phytopathologie et Productions Innovantes (EPPI)

✱ These authors have contributed equally to this work.

Speaker :

Simankov, Nikolay; Université de Liège - ULiège > TERRA Research Centre > Entomologie, Phytopathologie et Productions Innovantes (EPPI)

Language :

English

Publication date :

2026

Event name :

BELVIR 2025

Event date :

3rd of December 2025

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Funding number :

FRIA grant No. FC 52719

Available on ORBi :

since 28 May 2026
