Unpublished conference/Abstract (Scientific congresses and symposiums)
Can we predict the host of a virus? A study case on plant viruses applying machine learning approaches on curated viral proteomics features
Simankov, Nikolay; Soyeurt, Hélène; Massart, Sébastien
2026, BELVIR 2025
 

Files


Full Text
BELVIR_simankov.pptx
Author postprint (13.64 MB) Creative Commons License - Attribution, Non-Commercial, No Derivative

Details



Keywords :
physicochemical properties; proteome-associated functional features; epidemiological control; Plant viruses; Host prediction
Abstract :
[en] Plant viruses represent a vast and phylogenetically diverse group of pathogens that threaten global agricultural productivity. To leverage the recent expansion of high-throughput sequencing, we collected, curated and clustered over 150,000 plant viral proteins, including coat proteins, movement proteins, RNA silencing suppressors, RNA-dependent RNA polymerases (RdRp), and polyprotein complexes. For each protein, we generated about 1,000 physicochemical and structural features using, among other tools, the ProtLearn and Bio2Byte algorithms. In parallel, we built a database of more than 21,000 plant host-virus relationships, covering 3,820 virus species and 4,223 plant species, through large-scale data gathering and data mining. This database was reviewed by experts from multiple partner laboratories. We then designed a machine learning pipeline called Holistic AutoML-driven Robust pipeline optimization tool for Applied Multi-Omics (HARAMO). HARAMO was applied to the curated databases to identify key physicochemical signatures of proteins (amino acid composition and properties, backbone dynamics, early folding regions, properties of secondary structures, and intrinsic disorder) involved in virus-host specificity in plants. Our protein-based approach predicted more than 1,500 host plants with Matthews correlation coefficient (MCC) scores (≈ robust balanced accuracy, expressed here as percentages) ranging from 80% to 99%, depending on the input viral protein and the target plant. Overall, our integrative framework offers a robust protein-based host prediction tool for elucidating complex virus–host interactions. It opens new perspectives for studying the host range of virus species and for guiding both fundamental and applied research. Indeed, the HARAMO results raise new experimental questions about the interactions between a virus and its plant host. In addition, knowing the putative host range of a virus is a significant asset for epidemiological studies, providing critical insights for monitoring and managing viral spread.
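As an illustration of the simplest feature family named in the abstract (amino acid composition), the sketch below computes the fraction of each standard residue in a protein sequence. It is a minimal sketch in plain Python and not the ProtLearn or Bio2Byte API; the function name `amino_acid_composition` and the toy sequence are assumptions made for this example only.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper for illustration; it is not the ProtLearn or
    Bio2Byte API. Non-standard symbols (e.g. X or *) are ignored.
    """
    counts = Counter(
        residue for residue in sequence.upper() if residue in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Counter returns 0 for absent residues, so every amino acid gets a value.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence, not a real viral protein; the trailing X is ignored.
features = amino_acid_composition("MKKLAAGX")
```

Each protein then contributes a fixed-length vector of 20 values, which is one way such composition features could sit alongside the other descriptors in a feature table.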
Currently, we are developing a user-friendly dashboard based on our database, framework and prediction models, which will be available soon. This dashboard will guide future experimental research for biological validation with partner laboratories.
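The MCC scores quoted in the abstract can be read against the standard binary definition of the Matthews correlation coefficient, which ranges from -1 to 1 and is reported there as a percentage. The sketch below computes it from a confusion matrix next to balanced accuracy for comparison. The confusion-matrix counts are invented toy values, not results from HARAMO, and treating each host plant as a separate binary task is an assumption of this example.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity and specificity (both classes must be present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy counts for one hypothetical host plant: 100 true hosts, 900 non-hosts.
tp, fn, tn, fp = 90, 10, 880, 20

mcc = matthews_corrcoef(tp, tn, fp, fn)
bal_acc = balanced_accuracy(tp, tn, fp, fn)

print(f"MCC: {100 * mcc:.1f}%  balanced accuracy: {100 * bal_acc:.1f}%")
```

On imbalanced data such as this, MCC stays below balanced accuracy because it also penalizes false positives relative to the small positive class.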
Disciplines :
Biochemistry, biophysics & molecular biology
Agriculture & agronomy
Computer science
Author, co-author :
Soyeurt, Hélène; Université de Liège - ULiège > Département GxABT > Modélisation et développement
Massart, Sébastien; Université de Liège - ULiège > TERRA Research Centre > Entomologie, Phytopathologie et Productions Innovantes (EPPI)
These authors have contributed equally to this work.
Speaker :
Simankov, Nikolay; Université de Liège - ULiège > TERRA Research Centre > Entomologie, Phytopathologie et Productions Innovantes (EPPI)
Language :
English
Title :
Can we predict the host of a virus? A study case on plant viruses applying machine learning approaches on curated viral proteomics features
Publication date :
2026
Event name :
BELVIR 2025
Event date :
3 December 2025
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding number :
FRIA grant No. FC 52719
Available on ORBi :
since 28 May 2026

Statistics


Number of views
39 (3 by ULiège)
Number of downloads
18 (1 by ULiège)
