Article (Scientific journals)
Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records.
González-Castro, Lorena; Chávez, Marcela; Duflot, Patrick et al.
2023. In Cancers, 15 (10), p. 2741
Peer Reviewed verified by ORBi
 

Files


Full Text
cancers-15-02741.pdf
Author postprint (318.99 kB) Creative Commons License - Public Domain Dedication
Details



Keywords :
breast cancer; machine learning; patient stratification; recurrence prediction; secondary use; structured data; unstructured data; Oncology; Cancer Research
Abstract :
[en] Recurrence is a critical aspect of breast cancer (BC) that is inexorably tied to mortality. Reuse of healthcare data through Machine Learning (ML) algorithms offers great opportunities to improve the stratification of patients at risk of cancer recurrence. We hypothesized that combining features from structured and unstructured sources would provide better prediction results for 5-year cancer recurrence than either source alone. We collected and preprocessed clinical data from a cohort of BC patients, resulting in 823 valid subjects for analysis. We derived three sets of features: structured information, features from free text, and a combination of both. We evaluated the performance of five ML algorithms to predict 5-year cancer recurrence and selected the best-performing one to test our hypothesis. The XGB (eXtreme Gradient Boosting) model yielded the best performance among the five evaluated algorithms, with precision = 0.900, recall = 0.907, F1-score = 0.897, and area under the receiver operating characteristic curve (AUROC) = 0.807. The best prediction results were achieved with the structured dataset, followed by the unstructured dataset, while the combined dataset achieved the poorest performance. ML algorithms for BC recurrence prediction are valuable tools to improve patient risk stratification, help with post-cancer monitoring, and plan more effective follow-up. Structured data provides the best results when fed to ML algorithms. However, an approach based on natural language processing offers comparable results while potentially requiring less mapping effort.
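The four metrics quoted in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names and the toy labels and scores below are hypothetical, and the abstract does not state which averaging scheme was used for precision, recall, and F1-score, so the sketch simply computes positive-class values (1 = recurrence) and a rank-based AUROC.

```python
from typing import Sequence


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> tuple[float, float, float]:
    """Positive-class precision, recall, and F1 for binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true 5-year recurrence labels and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
# Threshold of 0.5 is an assumption made for this illustration only.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auroc(y_true, y_score)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} AUROC={area:.4f}")
```

Precision, recall, and F1-score depend on the decision threshold applied to the model's scores, whereas AUROC is computed from the ranking of the scores alone, which is one reason the two kinds of figures are reported side by side.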
Disciplines :
Computer science
Author, co-author :
González-Castro, Lorena ;  School of Telecommunication Engineering, University of Vigo, 36310 Vigo, Spain
Chávez, Marcela ;  Department of Information System Management, Centre Hospitalier Universitaire de Liège, 4000 Liège, Belgium
Duflot, Patrick ;  Centre Hospitalier Universitaire de Liège - CHU > Secteur Appui méthodologique aux Projets GSI et Planification (APP)
Bleret, Valérie ;  Université de Liège - ULiège > Département des sciences cliniques
Martin, Alistair G;  Science Department, Symptoma GmbH, 1030 Vienna, Austria
Zobel, Marc;  Science Department, Symptoma GmbH, 1030 Vienna, Austria
Nateqi, Jama;  Science Department, Symptoma GmbH, 1030 Vienna, Austria ; Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
Lin, Simon ;  Science Department, Symptoma GmbH, 1030 Vienna, Austria ; Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
Pazos-Arias, José J ;  atlanTTic Research Center, Department of Telematics Engineering, University of Vigo, 36310 Vigo, Spain
Del Fiol, Guilherme ;  Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108, USA
López-Nores, Martín ;  atlanTTic Research Center, Department of Telematics Engineering, University of Vigo, 36310 Vigo, Spain
Language :
English
Title :
Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records.
Alternative titles :
[fr] Algorithmes de Machine Learning pour prédire la récidive du cancer du sein à l’aide de sources structurées et non structurées provenant de dossiers médicaux électroniques.
Publication date :
13 May 2023
Journal title :
Cancers
eISSN :
2072-6694
Publisher :
MDPI, Switzerland
Volume :
15
Issue :
10
Pages :
2741
Peer reviewed :
Peer Reviewed verified by ORBi
Name of the research project :
Patients-centered SurvivorShIp care plan after Cancer treatments based on Big Data and Artificial Intelligence technologies
Funders :
EU - European Union
Funding text :
Part of this work was supported by the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 875406. The authors from the University of Vigo received support from the European Regional Development Fund (ERDF) and the Galician Regional Government under an agreement to fund the atlanTTic Research Center for Telecommunication Technologies.
Available on ORBi :
since 02 December 2024

Statistics


Number of views
5 (0 by ULiège)
Number of downloads
2 (0 by ULiège)

Scopus citations®
10
Scopus citations® without self-citations
9
OpenCitations
3
OpenAlex citations
15
