Explainable machine learning; LIME; Local model-agnostic explanations; SHAP; Additive models; Black-box modelling; Local model; Local model-agnostic explanation; Local permutations; Logistic regression; Machine learning; Non-additive; Information Systems; Computer Science Applications; Computer Networks and Communications
Abstract :
[en] Local model-agnostic additive explanation techniques decompose the predicted output of a black-box model into additive feature importance scores. Questions have been raised about the accuracy of the resulting local additive explanations. We investigate this by studying whether some of the most popular explanation techniques can accurately explain the decisions of linear additive models. We show that, even though the explanations generated by these techniques are linear and additive, they can fail to accurately explain linear additive models. In the experiments, we measure the accuracy of additive explanations, as produced by, e.g., LIME and SHAP, along with the non-additive explanations of Local Permutation Importance (LPI), when explaining linear regression, logistic regression, and Gaussian naive Bayes models over 40 tabular datasets. We also investigate the degree to which different factors, such as the number of numerical, categorical, or correlated features, the predictive performance of the black-box model, the explanation sample size, the similarity metric, and the pre-processing technique applied to the dataset, directly affect the accuracy of local explanations.
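As a minimal sketch of the kind of comparison the abstract describes, the following Python snippet fits a logistic regression on a tabular dataset, generates LIME and SHAP explanations for one instance, and measures their rank agreement with the model's own coefficient-based contributions. The choice of dataset, background sample, ground-truth definition (coefficient times feature value in log-odds space), and Spearman correlation as the agreement measure are illustrative assumptions, not the paper's exact protocol; it assumes the scikit-learn, lime, shap, and scipy packages.

import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from lime.lime_tabular import LimeTabularExplainer
import shap

# Fit the "black-box" model, which here is itself a linear additive model.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X, y)

# One simple ground-truth additive attribution for a linear model:
# the contribution of feature j to the log-odds is coef_j * x_j.
x = X[0]
ground_truth = model.coef_[0] * x

# LIME explanation of the same instance (weights on the raw features).
lime_explainer = LimeTabularExplainer(X, mode="classification",
                                      discretize_continuous=False)
lime_exp = lime_explainer.explain_instance(x, model.predict_proba,
                                           num_features=X.shape[1])
lime_scores = np.zeros(X.shape[1])
for idx, weight in lime_exp.as_map()[1]:
    lime_scores[idx] = weight

# SHAP values from the model-agnostic KernelExplainer; the positive-class
# probability is explained against a background sample of the data.
background = shap.sample(X, 100)
shap_explainer = shap.KernelExplainer(lambda z: model.predict_proba(z)[:, 1],
                                      background)
shap_scores = shap_explainer.shap_values(x)

# Rank agreement between each local explanation and the ground truth.
print("LIME vs. ground truth:", spearmanr(lime_scores, ground_truth)[0])
print("SHAP vs. ground truth:", spearmanr(shap_scores, ground_truth)[0])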
Disciplines :
Computer science
Author, co-author :
Rahnama, Amir Hossein Akhavan; Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
Bütepage, Judith; Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
Geurts, Pierre; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Algorithmique des systèmes en interaction avec le monde physique
Boström, Henrik; Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
Language :
English
Title :
Can local explanation techniques explain linear additive models?
References :
Aas K, Jullum M, Løland A (2021) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. Artif Intell 298:103502 DOI: 10.1016/j.artint.2021.103502
Adebayo J, Gilmer J, Muelly M, Goodfellow I, Hardt M, Kim B (2018) Sanity checks for saliency maps. arXiv preprint arXiv:1810.03292
Agarwal C, Krishna S, Saxena E, Pawelczyk M, Johnson N, Puri I, Zitnik M, Lakkaraju H (2022) OpenXAI: towards a transparent evaluation of model explanations. Adv Neural Inf Process Syst 35:15784–15799
Alvarez Melis D, Jaakkola T (2018) Towards robust interpretability with self-explaining neural networks. Advances in neural information processing systems 31
Alvarez-Melis D, Jaakkola TS (2018) On the robustness of interpretability methods. ICML Workshop on human interpretability in machine learning
Amparore E, Perotti A, Bajardi P (2021) To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods. PeerJ Comput Sci 7:479 DOI: 10.7717/peerj-cs.479
Breiman L (2001) Random forests. Mach Learn 45(1):5–32 DOI: 10.1023/A:1010933404324
Casalicchio G, Molnar C, Bischl B (2018) Visualizing the feature importance for black box models. In: Joint European conference on machine learning and knowledge discovery in databases, pp. 655–670. Springer
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608
Faber L, Moghaddam AK, Wattenhofer R (2021) When comparing to ground truth is wrong: on evaluating GNN explanation methods. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp. 332–341
Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE international conference on computer vision, pp. 3429–3437
Freitas AA (2014) Comprehensible classification models: a position paper. ACM SIGKDD Explorat Newsl 15(1):1–10 DOI: 10.1145/2594473.2594475
Ghorbani A, Abid A, Zou J (2019) Interpretation of neural networks is fragile. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33, pp. 3681–3688
Gosiewska A, Biecek P (2019) Do not trust additive explanations. arXiv preprint arXiv:1903.11420
Guidotti R (2021) Evaluating local explanation methods on ground truth. Artif Intell 291:103428 DOI: 10.1016/j.artint.2020.103428
Hakkoum H, Abnane I, Idri A (2022) Interpretability in the medical field: a systematic mapping and review study. Appl Soft Comput 117:108391 DOI: 10.1016/j.asoc.2021.108391
Hooker S, Erhan D, Kindermans P-J, Kim B (2019) A benchmark for interpretability methods in deep neural networks. Advances in Neural Information Processing Systems 32 (NeurIPS)
Hsieh C-Y, Yeh C-K, Liu X, Ravikumar P, Kim S, Kumar S, Hsieh C-J (2020) Evaluations and methods for explanation through robustness analysis. arXiv preprint arXiv:2006.00442
Kramer O (2016) Scikit-learn. In: Machine learning for evolution strategies, pp. 45–53. Springer
Lakkaraju H, Arsov N, Bastani O (2020) Robust and stable black box explanations. In: International conference on machine learning, pp. 5628–5638. PMLR
Laugel T, Renard X, Lesot M-J, Marsala C, Detyniecki M (2018) Defining locality for surrogates in post-hoc interpretability. arXiv preprint arXiv:1806.07498
Liu Y, Khandagale S, White C, Neiswanger W (2021) Synthetic benchmarks for scientific research in explainable machine learning. arXiv preprint arXiv:2106.12543
Liu M, Mroueh Y, Ross J, Zhang W, Cui X, Das P, Yang T (2019) Towards better understanding of adaptive gradient algorithms in generative adversarial nets. arXiv preprint arXiv:1912.11940
Lundberg S, Lee S-I (2017) A unified approach to interpreting model predictions. Advances in neural information processing systems 30 (NeurIPS)
Molnar C, König G, Herbinger J, Freiesleben T, Dandl S, Scholbeck CA, Casalicchio G, Grosse-Wentrup M, Bischl B (2022) General pitfalls of model-agnostic interpretation methods for machine learning models. In: International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers, pp. 39–68. Springer
Montavon G, Samek W, Müller K-R (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process 73:1–15 DOI: 10.1016/j.dsp.2017.10.011
Nguyen A-p, Martínez MR (2020) On quantitative aspects of model interpretability. arXiv preprint arXiv:2007.07584
Omeiza D, Speakman S, Cintas C, Weldermariam K (2019) Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models. arXiv preprint arXiv:1908.01224
Plumb G, Molitor D, Talwalkar AS (2018) Model agnostic supervised local explanations. Advances in neural information processing systems 31
Poursabzi-Sangdeh F, Goldstein DG, Hofman JM, Wortman Vaughan JW, Wallach H (2021) Manipulating and measuring model interpretability. In: Proceedings of the 2021 CHI conference on human factors in computing systems, pp. 1–52
Rahnama AHA, Boström H (2019) A study of data and label shift in the LIME framework. NeurIPS 2019 Workshop on human-centric machine learning
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144
Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning. ICML Workshop on human interpretability in machine learning
Ribeiro MT, Singh S, Guestrin C (2018) Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI conference on artificial intelligence, vol. 32
Ross SM (2017) Introductory statistics. Academic Press, Cambridge DOI: 10.1016/B978-0-12-804317-2.00031-X
Rudin C (2018) Please stop explaining black box models for high stakes decisions. Stat 1050:26
Samek W, Binder A, Montavon G, Lapuschkin S, Müller K-R (2016) Evaluating the visualization of what a deep neural network has learned. IEEE Trans Neural Netw Learn Syst 28(11):2660–2673 DOI: 10.1109/TNNLS.2016.2599820
Shrikumar A, Greenside P, Shcherbina A, Kundaje A (2016) Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713
Sturmfels P, Lundberg S, Lee S-I (2020) Visualizing the impact of feature attribution baselines. Distill 5(1):22 DOI: 10.23915/distill.00022
van der Waa J, Nieuwburg E, Cremers A, Neerincx M (2021) Evaluating XAI: a comparison of rule-based and example-based explanations. Artif Intell 291:103404 DOI: 10.1016/j.artint.2020.103404
Wang C, Han B, Patel B, Rudin C (2022) In pursuit of interpretable, fair and accurate machine learning for criminal recidivism prediction. J Quant Criminol 39(2):519–581 DOI: 10.1007/s10940-022-09545-w
Yang M, Kim B (2019) Benchmarking attribution methods with relative feature importance. NeurIPS 2019 Workshop on human-centric machine learning
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision, pp. 818–833. Springer