Abstract:
The development of AI methods rests on three pillars: (1) the availability of data, at least partially annotated; (2) the development of algorithms; and (3) the evaluation of performance. From a scientific perspective, this development becomes more delicate when we have to agree on the metrics used to assess inference performance and to decide on a ranking score. For example, the CDnet site, designed to evaluate the performance of change detection methods for video, proposes no fewer than nine ranking metrics.
The aim of this presentation is to outline the elements of a methodology for evaluating the performance of tools that detect moving objects in a video.
First, we present the probabilistic framework for motion detection viewed as a segmentation problem. This framework makes it possible to define a series of scores that are widely used in practice for performance evaluation. Since the result is assumed to be a segmentation, the indicators are computed at the pixel level. Next, after defining the notion of ranking, we show that infinitely many performance indicators can be used for ranking, including the F score (also known as the F1 score).
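As an illustration of such pixel-level indicators, the sketch below computes the confusion counts and a few derived scores, including F1, for one binary segmentation. The function name `pixel_scores` and the NumPy representation of the masks are assumptions made for this example, not part of the presentation.

```python
import numpy as np

def pixel_scores(gt, pred):
    """Pixel-level confusion counts and common scores for one frame or video.

    gt, pred: boolean arrays of the same shape, True = "moving object" pixel.
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)

    tp = np.sum(pred & gt)       # detected moving pixels
    fp = np.sum(pred & ~gt)      # false alarms
    fn = np.sum(~pred & gt)      # missed moving pixels
    tn = np.sum(~pred & ~gt)     # correctly labelled background

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # F (F1) score

    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "recall": recall, "precision": precision, "F1": f1}
```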
Second, we study the practical case of an evaluation involving several videos, or even several categories of videos. If all these videos are regarded as coming from the same source, we explain why a summarization is more appropriate than an arithmetic average. In contrast, we also discuss the case of performance evaluation over a set of sources (which might be videos, domains, or datasets). Both cases apply to video surveillance, including all-weather surveillance.
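To make the distinction concrete, the hypothetical sketch below contrasts one common reading of summarization, namely pooling the pixel-level counts over all videos of a source before computing the score, with the arithmetic average of per-video F1 values. It reuses the illustrative `pixel_scores` helper defined above and is only a sketch of the idea, not necessarily the exact aggregation discussed in the presentation.

```python
def summarized_f1(pairs):
    """Pool confusion counts over all videos of one source, then compute F1."""
    tp = fp = fn = 0
    for gt, pred in pairs:                     # (ground-truth, prediction) mask pairs
        s = pixel_scores(gt, pred)
        tp += s["TP"]; fp += s["FP"]; fn += s["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def averaged_f1(pairs):
    """Arithmetic mean of per-video F1 scores (one score per video or source)."""
    scores = [pixel_scores(gt, pred)["F1"] for gt, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

The two aggregations generally differ when the videos contain very different amounts of foreground, which is why the choice between them matters when ranking algorithms.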
We conclude by presenting rules of good practice that help avoid misleading interpretations drawn, for example, from redundant scores or from scores obtained by inadequate averaging. These rules are offered as a basis for choosing an algorithm to implement in practice.