Is the generalizability of a developed artificial intelligence algorithm for COVID-19 on chest CT sufficient for clinical use? Results from the International Consortium for COVID-19 Imaging AI (ICOVAI).
Topff, Laurens; Groot Lipman, Kevin B W; Guffens, Frederic et al.
Artificial intelligence; COVID-19; Computed tomography; Reproducibility of results; Validation study; Radiology, Nuclear Medicine and imaging; General Medicine
Abstract :
OBJECTIVES: Only a few published artificial intelligence (AI) studies for COVID-19 imaging have been externally validated. Assessing the generalizability of developed models is essential, especially when considering clinical implementation. We report the development of the International Consortium for COVID-19 Imaging AI (ICOVAI) model and perform independent external validation.
METHODS: The ICOVAI model was developed using multicenter data (n = 1286 CT scans) to quantify disease extent and assess COVID-19 likelihood using the COVID-19 Reporting and Data System (CO-RADS). A ResUNet model was modified to automatically delineate lung contours and infectious lung opacities on CT scans, after which a random forest predicted the CO-RADS score. After internal testing, the model was externally validated on a multicenter dataset (n = 400) by independent researchers. CO-RADS classification performance was calculated using linearly weighted Cohen's kappa and segmentation performance using Dice Similarity Coefficient (DSC).
RESULTS: Regarding internal versus external testing, segmentation performance of lung contours was equally excellent (DSC = 0.97 vs. DSC = 0.97, p = 0.97). Lung opacities segmentation performance was adequate internally (DSC = 0.76), but significantly worse on external validation (DSC = 0.59, p < 0.0001). For CO-RADS classification, agreement with radiologists on the internal set was substantial (kappa = 0.78), but significantly lower on the external set (kappa = 0.62, p < 0.0001).
CONCLUSION: In this multicenter study, a model developed for CO-RADS score prediction and quantification of COVID-19 disease extent was found to have a significant reduction in performance on independent external validation versus internal testing. The limited reproducibility of the model restricted its potential for clinical use. The study demonstrates the importance of independent external validation of AI models.
KEY POINTS: • The ICOVAI model for prediction of CO-RADS and quantification of disease extent on chest CT of COVID-19 patients was developed using a large sample of multicenter data. • There was substantial performance on internal testing; however, performance was significantly reduced on external validation, performed by independent researchers. The limited generalizability of the model restricts its potential for clinical use. • Results of AI models for COVID-19 imaging on internal tests may not generalize well to external data, demonstrating the importance of independent external validation.
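The two evaluation metrics named in the abstract, the Dice Similarity Coefficient and linearly weighted Cohen's kappa, can be illustrated with a minimal Python sketch. This is not the ICOVAI evaluation code: the function names, the toy masks, the choice of NumPy, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
import numpy as np


def dice_similarity(mask_a, mask_b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def linear_weighted_kappa(rater_1, rater_2, categories):
    """Cohen's kappa with linear disagreement weights for ordinal scores.

    `categories` lists the ordinal levels in order (hypothetical argument;
    at least two levels are required). The result is undefined (NaN) if the
    chance-expected weighted disagreement is zero.
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)

    # Joint distribution of the two raters' scores.
    observed = np.zeros((k, k))
    for x, y in zip(rater_1, rater_2):
        observed[index[x], index[y]] += 1
    observed /= observed.sum()

    # Distribution expected by chance from the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Linear weights: disagreement grows with the distance between levels.
    i, j = np.indices((k, k))
    weights = np.abs(i - j) / (k - 1)

    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())


# Toy data, invented for illustration only.
reference_mask = [1, 1, 0, 0]
predicted_mask = [1, 0, 0, 0]
dsc = dice_similarity(reference_mask, predicted_mask)

scores = [1, 2, 3, 4, 5]  # illustrative ordinal levels
kappa_perfect = linear_weighted_kappa(scores, scores, categories=scores)
kappa_chance = linear_weighted_kappa([1, 1, 2, 2], [1, 2, 1, 2], categories=[1, 2])
```

With linear weights, a disagreement of one level is penalized less than a disagreement of several levels, which suits an ordinal scale such as CO-RADS better than unweighted kappa.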
Disciplines :
Cardiovascular & respiratory systems; Radiology, nuclear medicine & imaging
Author, co-author :
Topff, Laurens ; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands. l.topff@nki.nl ; GROW School for Oncology and Reproduction, Maastricht University, Universiteitssingel 40, 6229 ER, Maastricht, The Netherlands. l.topff@nki.nl
Groot Lipman, Kevin B W; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands ; GROW School for Oncology and Reproduction, Maastricht University, Universiteitssingel 40, 6229 ER, Maastricht, The Netherlands ; Department of Thoracic Oncology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands
Guffens, Frederic; Department of Radiology, University Hospitals Leuven, Herestraat 49, 3000, Leuven, Belgium
Wittenberg, Rianne; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands
Bartels-Rutten, Annemarieke; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands
van Veenendaal, Gerben; Aidence, Amsterdam, The Netherlands
Hess, Mirco; Aidence, Amsterdam, The Netherlands
Lamerigts, Kay; Aidence, Amsterdam, The Netherlands
Wakkie, Joris; Aidence, Amsterdam, The Netherlands
Ranschaert, Erik; Department of Radiology, St. Nikolaus Hospital, Hufengasse 4-8, 4700, Eupen, Belgium ; Ghent University, C. Heymanslaan 10, 9000, Ghent, Belgium
Trebeschi, Stefano; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands
Visser, Jacob J; Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Dr. Molewaterplein 40, 3015, GD, Rotterdam, The Netherlands
Beets-Tan, Regina G H; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066, CX, Amsterdam, The Netherlands ; GROW School for Oncology and Reproduction, Maastricht University, Universiteitssingel 40, 6229 ER, Maastricht, The Netherlands ; Institute of Regional Health Research, University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
ICOVAI, International Consortium for COVID-19 Imaging AI; Julien Guiot, Annemiek Snoeckx, Peter Kint, Lieven Van Hoe, Carlo Cosimo Quattrocchi, Dennis Dickerscheid, Samir Lounis, Eric Schulze, Arnout Eric-bart Sjer, Niels van Vucht, Jeroen A.W. Tielbeek, Frank Raat, Daniël Eijspaart & Ausami Abbas
Guiot, Julien ; Centre Hospitalier Universitaire de Liège - CHU > Service de pneumologie - allergologie