Abstract:
In the computer vision and machine learning communities, as well as in many
other research domains, rigorous evaluation of any new method, including
classifiers, is essential. One key component of the evaluation process is the
ability to compare and rank methods. However, ranking classifiers and
accurately comparing their performances, especially when application-specific
preferences are taken into account, remain challenging. For
instance, commonly used evaluation tools like Receiver Operating Characteristic
(ROC) and Precision/Recall (PR) spaces display performances based on two
scores. Hence, they are inherently limited in their ability to compare
classifiers across a broader range of scores and cannot establish a clear
ranking among them. In this paper, we present a novel
versatile tool, named the Tile, that organizes infinitely many ranking scores in
a single 2D map for two-class classifiers, including common evaluation scores
such as the accuracy, the true positive rate, the positive predictive value,
Jaccard's coefficient, and all F-beta scores. Furthermore, we study the
properties of the underlying ranking scores, such as the influence of the
priors or the correspondences with the ROC space, and show how to
characterize any other score by comparing it to the Tile. Overall, we
demonstrate that the Tile is a powerful tool that effectively captures all the
rankings in a single visualization and allows for their interpretation.
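For reference, below is a minimal sketch (in Python, not taken from the paper or describing the Tile itself) of the standard definitions of the two-class evaluation scores named in the abstract, written in terms of the confusion-matrix counts TP, FP, FN, and TN; the counts in the usage example are hypothetical.

```python
# Standard two-class evaluation scores, computed from confusion-matrix counts.
# These are the textbook definitions of the scores cited in the abstract,
# not an implementation of the Tile.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def true_positive_rate(tp, fp, fn, tn):
    # Also known as recall or sensitivity.
    return tp / (tp + fn)

def positive_predictive_value(tp, fp, fn, tn):
    # Also known as precision.
    return tp / (tp + fp)

def jaccard(tp, fp, fn, tn):
    # Intersection over union of the positive class.
    return tp / (tp + fp + fn)

def f_beta(tp, fp, fn, tn, beta=1.0):
    # Weighted harmonic mean of precision and recall; beta > 1 favors recall.
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

# Usage example with hypothetical confusion-matrix counts.
tp, fp, fn, tn = 80, 10, 20, 90
print(accuracy(tp, fp, fn, tn), f_beta(tp, fp, fn, tn, beta=2.0))
```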