Abstract :
Accurate and early prediction of breast cancer recurrence is crucial for guiding medical decisions and improving treatment success. Machine learning (ML) has shown promise in this domain. However, its effectiveness depends critically on proper hyperparameter tuning, a step that is not always performed systematically when ML models are developed. In this study, we aimed to highlight the impact of this process on the final performance of ML models through a real-world case study: predicting five-year recurrence in breast cancer patients. We compared the performance of five ML algorithms, Logistic Regression (LR), Decision Tree (DT), Gradient Boosting (GB), eXtreme Gradient Boosting (XGB), and Deep Neural Network (DNN), before and after optimizing their hyperparameters. With default hyperparameters, the simpler algorithms performed better. After optimization, however, the more complex algorithms demonstrated superior performance. The areas under the curve (AUCs) obtained before and after tuning were 0.7 vs. 0.84 for XGB, 0.64 vs. 0.75 for DNN, 0.7 vs. 0.8 for GB, 0.62 vs. 0.7 for DT, and 0.77 vs. 0.72 for LR. The results underscore the critical importance of hyperparameter selection in the development of ML algorithms for predicting cancer recurrence. Neglecting this step can undermine the potential of more powerful algorithms and lead to the choice of suboptimal models.
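The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the model is a one-feature k-nearest-neighbour scorer used as a stand-in (it is not one of the five algorithms compared), the grid of `k` values and the "default" of `k = 1` are hypothetical, and AUC is assumed here to mean the area under the ROC curve.

```python
import random

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train, queries, k):
    """Score each query by the share of positives among its k nearest training rows."""
    scores = []
    for q in queries:
        nearest = sorted(train, key=lambda row: abs(row[0] - q))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores

def make_data(n, rng):
    """Synthetic (feature, label) rows: about 30% positives with a shifted feature mean."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        rows.append((rng.gauss(1.0 if y else 0.0, 1.0), y))
    return rows

rng = random.Random(0)  # fixed seed so the comparison is repeatable
train, valid = make_data(200, rng), make_data(200, rng)
valid_x = [x for x, _ in valid]
valid_y = [y for _, y in valid]

DEFAULT_K = 1                 # hypothetical "default" hyperparameter value
grid = [1, 3, 5, 11, 21, 41]  # hypothetical search grid (includes the default)

# Evaluate every candidate on the validation split and keep the best one.
auc_by_k = {k: roc_auc(valid_y, knn_scores(train, valid_x, k)) for k in grid}
best_k = max(auc_by_k, key=auc_by_k.get)
default_auc, tuned_auc = auc_by_k[DEFAULT_K], auc_by_k[best_k]

print(f"default k={DEFAULT_K}: AUC={default_auc:.2f}")
print(f"tuned   k={best_k}: AUC={tuned_auc:.2f}")
```

Because the best value is selected on the same validation split that reports the score, the tuned AUC in this sketch is optimistic; a real comparison would confirm it on a separate held-out test set or with nested cross-validation.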
Name of the research project :
Patients-centered SurvivorShIp care plan after Cancer treatments based on Big Data and Artificial Intelligence technologies
Funding text :
Part of this work was supported by the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 875406. The authors from the University of Vigo received support from the European Regional Development Fund (ERDF) and the Galician Regional Government under an agreement to fund the atlanTTic Research Center for Telecommunication Technologies.
Scopus citations® (without self-citations) : 5