Abstract:
This thesis addresses the computer vision-based automation of specific face perception tasks, for specific applications in which they are essential. These tasks, and the applications in which they are automated, deal with the interpretation of facial expressions.
Our first application of interest is the automatic recognition of sign language, carried out via a chain of automatic systems that extract visual communication cues from the image of a signer, transcribe these visual cues into an intermediary semantic notation, and translate this semantic notation into comprehensible text in a spoken language. For the visual cue extraction part of such a system chain, we propose a computer vision system that automatically extracts facial communication cues from the image of a signer, based on a pre-existing facial landmark tracking method and its various robust refinements. Our contribution notably lies in the fruitful use of this tracking method and its refinements within a sign language recognition system chain. We consider the facial communication cues extracted by our system as facial expressions with a specific interpretation useful to this application.
Our second application of interest is the objective assessment of visual pursuit in patients with a disorder of consciousness. In clinical practice, this delicate assessment is performed by a clinician who manually moves a handheld mirror in front of the patient's face while simultaneously estimating the patient's ability to track this visual stimulus. This clinical setup is appropriate, but the assessment outcome has been shown to be sensitive to the clinician's subjectivity. For use with a head-mounted device, we propose a computer vision system that integrates into the clinical procedure without disrupting it, and automatically estimates, in an objective way, the patient's ability to perform visual pursuit. Our system, combined with a head-mounted device, therefore takes the form of an assistive technology for the clinician. It is based on tracking the patient's pupil and the mirror moved by the clinician, and on comparing the obtained trajectories. All methods used within our system are simple yet specific instantiations of general methods, tailored to the objective assessment of visual pursuit. We consider the visual pursuit ability extracted by our system as a facial expression with a specific interpretation useful to this application.
Our third application of interest is, to some extent, the general-purpose automatic recognition of facial expression codes in a muscle-based taxonomic coding system. We do not provide a new computer vision system for this application. Instead, we consider a supervised classification problem relevant to it, and empirically compare the performance of two general classification approaches for solving this problem: hierarchical classification and standard classification ("flat" classification, in this comparative context). We also compare these approaches on a classification problem relevant to 3D shape recognition, as well as on artificial classification problems generated in a simulation framework of our own design. Our contribution lies in the general theoretical conclusions we draw from our empirical study of hierarchical vs. flat classification, which are of interest for the proper use of hierarchical classification in vision-based recognition problems, such as facial expression recognition.