attribution; traceback; rationale; citation; language model; LLM; natural language processing; NLP; explainability; trustworthiness; auditability; probability drop; contributive attribution
Abstract :
[en] The development of large language models for question answering has benefited from understanding which context sentences are responsible for their answer. This set of sentences is commonly called the contributive attribution. Recent works use the drop in the probability of the answer under a modified context to estimate how well sentences in the context match the attribution. Unfortunately, this metric does not convey the necessity and sufficiency qualities that the natural language processing community has defined in previous works. To fill this gap, we propose a metric composed of a necessity score and a sufficiency score, both based on probability drops. Then, to illustrate the soundness of the metric in practice, we develop a hierarchical method, TreeFinder, which uses the metric to progressively select finer parts of the context through tree-based pruning. It begins with a few coarse-grained chunks and iteratively narrows the top-k chunks according to our metric down to sentence-level granularity. At each iteration, we calculate our metric using ablation-based log-probability differences and filter out irrelevant chunks. Experimental results on HotpotQA demonstrate that TreeFinder outperforms ContextCite and TracLLM in contributive attribution quality when the attribution is composed of a few sentences. Further experiments on LooGLE and LongBench-v2 show that TreeFinder ranks sentences by attribution score better than ContextCite in long contexts.
Disciplines :
Computer science
Author, co-author :
Pirenne, Lize ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Marlier, Norman ; NRB > Data&AI
de la Brassinne Bonardeaux, Maxence ; NRB > Data&AI
Louppe, Gilles ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data
Ernst, Damien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Contributive Attribution for Question Answering via Tree-based Context Pruning
Publication date :
2025
Number of pages :
16
Funders :
Walloon region
Funding number :
2010235; 1910247
Funding text :
Lize Pirenne and Damien Ernst gratefully acknowledge the financial support of the Walloon Region for Grant No. 2010235 – ARIAC by DW4AI and the NRB Research Chair on Large Language Models for the Computer Software Industry.
Gaspard Lambrechts gratefully acknowledges the financial support of the Wallonia-Brussels Federation and the Fund for Scientific Research for his FRIA and CR grants.
The present research benefited from computational resources made available on Lucia, the Tier-1 supercomputer of the Walloon Region, infrastructure funded by the Walloon Region under the grant agreement n°1910247.