Global multi-output decision trees for interaction prediction

[en] Interaction data are characterized by two sets of objects, each described by their own set of features. They are often modeled as networks and the values of interest are the possible interactions between two instances, represented usually as a matrix. Here, a novel global decision tree learning method is proposed, where multi-output decision trees are constructed over the global interaction setting, addressing the problem of interaction prediction as a multi-label classiﬁcation task. More speciﬁcally, the tree is constructed bysplitting the interaction matrix both row-wise and column-wise, incorporating this way both interaction dataset features in the learning procedure. Experiments are conducted across several heterogeneous interaction datasets from the biomedical domain. The experimental results indicate the superiority of the proposed method against other decision tree approaches in terms of predictive accuracy, model size and computational efﬁciency. The performance is boosted by fully exploiting the multi-output structure of the model. We conclude that the proposed method should be considered in interaction prediction tasks, especially where interpretable models are desired.

Disciplines :

Computer science

Author, co-author :

Pliakos, Konstantinos; KU Leuven > Department of Public Health and Primary Care

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique

Vens, Celine; KU Leuven > Department of Public Health and Primary Care

Language :

English

Title :

Global multi-output decision trees for interaction prediction

Publication date :

2018

Journal title :

Machine Learning

ISSN :

0885-6125

eISSN :

1573-0565

Publisher :

Wolters Kluwer, Netherlands

Volume :

107

Pages :

1257-1281

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://link.springer.com/epdf/10.1007/s10994-018-5700-x?author_access_token=OpLYp-4kCBQXOcENXZ_f3Pe4RwlQNchNByi7wbcMAY4qFr660Y0nic8HSKI57goLkTr8FpfgPCd3FNHaO4fbKdn8DHyC5GTXb6bHA-lcRUW2WjEpOmq0yPBWlS5Cgjgyk7Y3aTZX4yVGB4P6EPdiBA%3D%3D

Available on ORBi :

since 17 January 2019

Statistics

Number of views

158 (4 by ULiège)

Number of downloads

2 (2 by ULiège)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

OpenCitations

OpenAlex citations

Bibliography

Barutcuoglu, Z., Schapire, R. E., & Troyanskaya, O. G. (2006). Hierarchical multi-label prediction of gene function. Bioinformatics, 22(7), 830–836
Ben-Hur, A., & Noble, W. S. (2005). Kernel methods for predicting protein-protein interactions. Bioinformatics, 21(SUPPL. 1), i38–i46
Berge, C. (1973). Graphs and hypergraphs. Amsterdam, The Netherlands: North-Holland
Bleakley, K., Biau, G., & Vert, J. P. (2007). Supervised reconstruction of biological networks with local models. Bioinformatics, 23(13), i57–i65
Blockeel, H., Raedt, L. D., & Ramon, J.: Top-down induction of clustering trees. In Proceedings of the 15th international conference on machine learning (ICML) (pp. 55–63). Morgan Kaufmann Publishers Inc., San Francisco (1998)
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32
Davis, J. & Goadrich, M.: The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on machine learning (ICML) (pp. 233–240). New York, USA (2006)
Dembczynski, K., Waegeman, W., Cheng, W., & Hellermeier, E. (2012). On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1–2), 5–45
Faith, J. J., Hayete, B., Thaden, J. T., Mogno, I., Wierzbowski, J., Cottarel, G., et al. (2007). Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology, 5(1), e8
Fan, W., & Bifet, A. (2013). Mining big data: Current status, and forecast to the future. ACM SIGKDD Explorations Newsletter, 14(2), 1–5
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42
Geurts, P., Irrthum, A., & Wehenkel, L. (2009). Supervised learning with decision tree-based methods in computational and systems biology. Molecular Biosystems, 5(12), 1593–1605
Guo, X., Liu, F., Ju, Y., Wang, Z., & Wang, C. (2016). Human protein subcellular localization with integrated source and multi-label ensemble classifier. Scientific Reports, 6, 28087
Henriques, R., Antunes, C., & Madeira, S. C. (2015). A structured view on pattern mining-based biclustering. Pattern Recognition, 48(12), 3941–3958
Huang, L., Liao, L., & Wu, C. H. (2016). Protein-protein interaction prediction based on multiple kernels and partial network with linear programming. BMC Systems Biology, 10(S2), 45
Joly, A., Geurts, P., & Wehenkel, L.: Random forests with random projections of the output space for high dimensional multi-label classification. In Proceedings of the European conference on machine learning and knowledge discovery in databases, (ECML PKDD) (Vol. 8724, pp. 607–622) (2014)
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260
Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2013). Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3), 817–833
Kuhn, M., von Mering, C., Campillos, M., Jensen, L. J., & Bork, P. (2007). Stitch: Interaction networks of chemicals and proteins. Nucleic Acids Research, 36(suppl–1), D684–D688
Lanckriet, G., & Cristianini, N. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5(Jan), 27–72
Li, X., & Chen, H. (2013). Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach. Decision Support Systems, 54(2), 880–890
MacIsaac, K. D., Wang, T., Gordon, D. B., Gifford, D. K., Stormo, G. D., & Fraenkel, E. (2006). An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics, 7(1), 113
Mayer-Schönberger, V., & Cukier, K. (2014). Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt
Menon, A. K., & Elkan, C. (2010). Predicting labels for dyadic data. Data Mining and Knowledge Discovery, 21(2), 327–343
Nascimento, A. C. A., Prudêncio, R. B. C., & Costa, I. G. (2016). A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics, 17(1), 46
Papagiannopoulou, C., Tsoumakas, G., & Tsamardinos, I.: Discovering and exploiting deterministic label relationships in multi-label learning. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, (KDD) (pp. 915–924) (2015)
Park, Y., & Marcotte, E. M. (2012). Flaws in evaluation schemes for pair-input computational predictions. Nature Methods, 9(12), 1134–1136
Pratanwanich, N., Lio, P., & Stegle, O.: Warped matrix factorisation for multi-view data integration. In Joint European conference on machine learning and knowledge discovery in databases (pp. 789–804). Springer (2016)
Qi, G. J., Hua, X. S., Rui, Y., Tang, J., Mei, T., & Zhang, H. J.: Correlative multi-label video annotation. In Proceedings of the 15th ACM international conference on Multimedia (pp. 17–26). New York, USA (2007)
Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333–359
Ruan, J., & Zhang, W. (2006). A bi-dimensional regression tree approach to the modeling of gene expression regulation. Bioinformatics, 22(3), 332–340
Schrynemackers, M., Kueffner, R., & Geurts, P. (2013). On protocols and measures for the validation of supervised methods for the inference of biological networks. Frontiers in Genetics, 4, 262
Schrynemackers, M., Wehenkel, L., Babu, M. M., & Geurts, P. (2015). Classifying pairs with trees for supervised biological network inference. Molecular Biosystems, 11(8), 2116–25
Seal, A., Ahn, Y. Y., & Wild, D. J. (2015). Optimizing drug target interaction prediction based on random walk on heterogeneous networks. Journal of Cheminformatics, 7(1), 40
Stock, M., Pahikkala, T., Airola, A., De Baets, B., & Waegeman, W. (2016). Efficient pairwise learning using kernel ridge regression: An exact two-step method. arXiv preprint arXiv:1606.04275
Stojanova, D., Ceci, M., Appice, A., & Džeroski, S. (2012). Network regression with predictive clustering trees. Data Mining and Knowledge Discovery, 25(2), 378–413
Sun, Y., & Han, J. (2012). Mining heterogeneous information networks: Principles and methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery, 3(2), 1–159
Sun, Y., & Han, J. (2013). Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explorations Newsletter, 14(2), 20–28
Tang, L., Rajan, S., & Narayanan, V.K.: Large scale multi-label classification via metalabeler. In Proceedings of the 18th international conference on World wide web (WWW) (pp. 211–220). New York, USA (2009)
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2009). Mining multi-label data. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook. Boston: Springer
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2011). Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079–1089
Tsoumakas, G., Zhang, M. L., & Zhou, Z. H. (2012). Introduction to the special issue on learning from multi-label data. Machine Learning, 88(1–2), 1–4
Vens, C., Struyf, J., Schietgat, L., Džeroski, S., & Blockeel, H. (2008). Decision trees for hierarchical multi-label classification. Machine Learning, 73(2), 185–214
Vert, J. P. (2010). Reconstruction of biological networks by supervised machine learning approaches. In H. M. Lodhi & S. H. Muggleton (Eds.), Elements of computational systems biology (pp. 165–188). New York: Wiley
Witten, I. H., Frank, E., & Hall, M. A. (2016). Data mining: Practical machine learning tools and techniques (4th ed.). Burlington: Morgan Kaufmann
Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W., & Kanehisa, M. (2008). Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24(13), i232–i240
Yin, S., Li, X., Gao, H., & Kaynak, O. (2015). Data-based techniques focused on modern industry: An overview. IEEE Transactions on Industrial Electronics, 62(1), 657–667
Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048
Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837
Zhang, W., Liu, F., Luo, L., & Zhang, J. (2015). Predicting drug side effects by multi-label learning and ensemble learning. BMC Bioinformatics, 16(1), 365