General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary
Abstract :
Comprehensive understanding of the human protein-protein interaction (PPI) network, also known as the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of previously uncharacterized PPIs. Many such methods have been proposed, yet a systematic evaluation of how well existing network-based methods predict PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 26 representative network-based methods to predict PPIs across six different interactomes of four different organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. Through extensive computational and experimental validations, we found that advanced similarity-based methods, which leverage the underlying network characteristics of PPIs, show superior performance over other general link prediction methods in the interactomes we considered.
WBI - Wallonia-Brussels International; Télévie; Léon Frédéricq Foundation; F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding text :
L.M., A.F., and L.B. were partially supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets”, the EC H2020RIA project “SoBigData++” (871042), and the MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”. F.L. was supported by a Wallonia-Brussels International (WBI)-World Excellence Fellowship, a Fonds de la Recherche Scientifique (FRS-FNRS)-Télévie Grant (FC31747, Crédit no. 7459421F), a Herman-van Beneden Prize, and a Léon Frédéricq Foundation-Josée & Jean Schmets Prize. M.V. is a Chercheur Qualifié Honoraire from the Fonds de la Recherche Scientifique (FRS-FNRS, Wallonia-Brussels Federation, Belgium). M.V. acknowledges support from the National Institutes of Health (R01 GM130885). P.F. and B.Á. were supported by the National Research, Development and Innovation Office of Hungary (National Heart Program NVKP 16-1-2016-0017) and the Thematic Excellence Programme (2020-4.1.1.-TKP2020) of the Ministry for Innovation and Technology in Hungary, within the framework of the Therapeutic Development and Bioimaging thematic programmes of the Semmelweis University. Project no. RRF-2.3.1-21-2022-00003 has been implemented with the support provided by the European Union. J.L. acknowledges support from the National Institutes of Health (R01 HL155107, R01 HL155096, U01 HG007690, and U54 HL119145) and from the American Heart Association (D700382 and CV-19). A.-L.B. is supported by the Veteran’s Affairs Medical Center of Boston Contract #36C24122N0769, the NIH grant #1P01HL132825, and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 810115 – DYNASNET. Y.-Y.L. acknowledges grants from the National Institutes of Health (R01AI141529, R01HD093761, RF1AG067744, UH3OD023268, U19AI095219, and U01HL089856).