Ames test; deep neural networks; mutagenicity; Mutagens; Models, Chemical; Mutagens/chemistry; Deep Learning; Neural Networks, Computer; Quantitative Structure-Activity Relationship; Drug Discovery
Abstract :
[en] Assessing chemical toxicity is a multidisciplinary process that traditionally involves in vivo, in vitro and in silico tests. A current goal in toxicology is to reduce new testing of chemicals by exploiting all the information already available. Recent advances in machine learning and deep neural networks allow computers to mine patterns automatically and learn from data. Applied to (Q)SAR model development, this technology makes it possible to learn structural-chemical-biological relationships and emergent properties directly from the data. Starting from Toxception, a deep neural network that predicts activity from the image of the chemical graph, we designed SmilesNet, a recurrent neural network that takes SMILES as its only input. We then integrated the two networks into the C-Tox network, which makes the final classification. The results of our networks, trained on a dataset of about 20,000 molecules with experimental Ames test values, match or even outperform the current state of the art. We also extract knowledge from the networks and compare it with the available structural alerts for mutagenicity. The advantage over traditional QSAR modelling is that our models extract the features automatically, without using descriptors. However, the models succeed only when a large number of examples is provided, and the computation is more demanding than in classical methods.
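The abstract states that SmilesNet takes SMILES strings as its only input but does not say how those strings are turned into a sequence for a recurrent network. The following is a minimal sketch of one common approach, not the authors' implementation: the token pattern, the function names (`tokenize`, `build_vocab`, `encode`) and the use of index 0 for padding are all assumptions made for illustration.

```python
import re

# Assumed token pattern: bracket atoms, the two-letter halogens Br and Cl,
# two-digit ring closures (%NN), then any other single character.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return _TOKEN_RE.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the data to an integer index.

    Index 0 is reserved for padding (an illustrative convention).
    """
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Encode a SMILES string as a fixed-length list of integer indices.

    Sequences longer than max_len are truncated, shorter ones are
    right-padded with 0. Tokens missing from vocab raise KeyError.
    """
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    data = ["CCO", "c1ccccc1N"]
    vocab = build_vocab(data)
    print(tokenize("ClCC[N+](=O)[O-]"))
    print(encode("CCO", vocab, max_len=5))
```

Integer sequences of this kind would typically be passed to an embedding layer placed before the recurrent layers. Whether SmilesNet uses this encoding or a different one is not stated in the abstract.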
Disciplines :
Environmental sciences & ecology
Author, co-author :
Gini, Giuseppina; DEIB, Politecnico di Milano, Milan, Italy
Zanoli, Francesco; DEIB, Politecnico di Milano, Milan, Italy
Gamba, Alessio ; Université de Liège - ULiège > GIGA > GIGA In silico medecine - Biomechanics Research Unit ; Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Laboratory of Environmental Chemistry and Toxicology, Milan, Italy
Raitano, Giuseppa; Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Laboratory of Environmental Chemistry and Toxicology, Milan, Italy
Benfenati, Emilio; Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Laboratory of Environmental Chemistry and Toxicology, Milan, Italy
Language :
English
Title :
Could deep learning in neural networks improve the QSAR models?