Abstract:
[en] In recent years, unmanned aerial vehicles (UAVs), also known as drones, have become increasingly popular. In particular, small UAVs, such as quadcopters, have gained attention thanks to their agility and maneuverability, which make them ideal for tasks such as aerial photography, surveying, and delivery. However, remotely controlling such vehicles requires specific piloting skills; operators with these skills are scarce and expensive to hire, which creates a demand for automated or assisted flight.
Planning a motion or a trajectory in an uncontrolled environment while avoiding undesired collisions can hardly be achieved without knowing the distance to potential obstacles along the way. However, the dedicated sensors typically used in robotics to estimate distances can hardly be mounted on small UAVs because of their size, weight, or power requirements. Since drones are usually equipped with one or several cameras for various purposes, using this visual information to infer distances to objects in the environment has emerged as a compelling, though challenging, alternative to these sensors.
Depth estimation, the process of determining the distance from a camera to an object in the environment, is a computer vision task with varied applications. When used for autonomous piloting, depth estimation methods need to fulfill a series of requirements that are not necessarily needed for other applications, and that had been left unaddressed as a whole. This dissertation aims to address reliable monocular depth estimation for UAVs.
As in many other computer vision fields, the state of the art in depth estimation has been led by methods based on deep learning, which require large datasets for training. In the first part of this work, we note the lack of a dataset suitable for training and testing depth estimation methods for outdoor UAV applications, and introduce Mid-Air, a new multipurpose synthetic dataset of low-altitude drone flights in unstructured environments. The ground-truth data provided with this new dataset makes it useful not only for depth estimation, but also for various other computer vision tasks such as visual odometry, semantic segmentation, or even surface normal estimation.
In the other parts of this work, we address the challenge of depth estimation itself. First, we identify the weaknesses of existing depth estimation methods with respect to a major requirement for UAV applications: the ability to handle a wide range of environments, even unseen ones. We use this analysis to propose a new method, called M4Depth, that relies on a notion of visual parallax, which we define, to avoid the weaknesses identified in other methods. Second, we consider the reliability requirement for depth estimation. To this end, we investigate the use of uncertainty as a means to anticipate erroneous data in the depth estimates. We then present a new method, called M4Depth+U, that extends M4Depth to jointly estimate depth and its uncertainty, and show that the obtained uncertainty is indeed representative of the error on the depth estimates.
Our tests on several datasets and in various conditions, including zero-shot cross-dataset transfer, show that our methods are robust to visual changes and generalize better than existing methods, while being more computationally efficient. With these results, M4Depth+U emerges as an excellent and reliable joint depth and uncertainty estimator, and exhibits the properties expected from a depth estimation method targeting UAV applications.