Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept
JOCHEMS, Arthur; DEIST, Timo M.; VAN SOEST, Johan et al.
Distributed learning Developing a predictive model based on data from multiple hospitals without data leaving the hosital - a real life proof of concept.pdf
[en] Purpose: One of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital.
Patients and methods: Clinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone, were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model was adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer.
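The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of the general idea, under strong assumptions: all variables are discrete and fully observed, and each hospital shares nothing but aggregate counts. In that setting the conditional probability table of a Bayesian network node can be estimated from the sum of per-site counts. The function names `local_counts` and `pooled_cpt`, the variable names `chemo` and `dyspnea`, and the toy records are hypothetical and not taken from the paper.

```python
from collections import Counter


def local_counts(records, child, parents):
    """Run inside one hospital: count (parent values, child value) pairs.

    Only these aggregate counts leave the site, never the patient rows.
    """
    counts = Counter()
    for row in records:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] += 1
    return counts


def pooled_cpt(site_counts, child_states, alpha=1.0):
    """Run at the coordinator: sum site counts, then normalise them into
    P(child | parents). `alpha` is an additive smoothing pseudo-count.
    """
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    parent_configs = {parent_values for parent_values, _ in total}
    cpt = {}
    for pc in parent_configs:
        denom = sum(total[(pc, s)] + alpha for s in child_states)
        cpt[pc] = {s: (total[(pc, s)] + alpha) / denom for s in child_states}
    return cpt


# Toy, invented records for two hypothetical sites.
site_a = [{"chemo": "yes", "dyspnea": "yes"}, {"chemo": "yes", "dyspnea": "no"}]
site_b = [{"chemo": "yes", "dyspnea": "yes"}, {"chemo": "no", "dyspnea": "no"}]

per_site = [local_counts(s, "dyspnea", ["chemo"]) for s in (site_a, site_b)]
cpt = pooled_cpt(per_site, child_states=["yes", "no"], alpha=0.0)
print(cpt[("yes",)])
```

Because counts are additive, the pooled table in this sketch is identical to the one that would be obtained by first merging all records in one place. Handling missing values or learning the network structure would need more than this simple count exchange.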
Results: We show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95% CI, 0.51–0.70) in 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets.
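To make the reported metric concrete: the AUC can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it that way on invented numbers. The function name `auc` and the example labels and scores are illustrative assumptions and have no connection to the study data or to the software the authors used.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for patients with the outcome, 0 for patients without it.
    scores: predicted risks, higher meaning more likely to have the outcome.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: three of the four positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which puts the reported values in context.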
Conclusion: Distributed learning allows predictive models to be learned on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while complying with the various national and European privacy laws.
Disciplines :
Oncology
Author, co-author :
JOCHEMS, Arthur; Department of Radiation Oncology (MAASTRO Clinic)
DEIST, Timo M.; Department of Radiation Oncology (MAASTRO Clinic)
VAN SOEST, Johan; Department of Radiation Oncology (MAASTRO Clinic)
ELBE, Michael; University clinic Aachen
BULENS, Paul; Jessa Hospital > Department of Radiation Oncology
COUCKE, Philippe; Centre Hospitalier Universitaire de Liège - CHU > Service médical de radiothérapie
DRIES, Wim; Catharina-Hospital Eindhoven
LAMBIN, Philippe; Universiteit Maastricht > Oncology and developmental biology
Alternative titles :
[fr] Apprentissage distribué: Élaboration d'un modèle prédictif basé sur les données de plusieurs hôpitaux sans données sortant de l'hôpital - Une preuve de concept réelle