Estimating the distance to objects is crucial for autonomous vehicles, but cost, weight or power constraints sometimes prevent the use of dedicated depth sensors. In this case, the distance has to be estimated from on-board RGB cameras, which is a complex task, especially in environments such as natural outdoor landscapes. In this paper, we present a new depth estimation method suitable for use in such landscapes. First, we establish a bijective relationship between depth and the visual parallax of two consecutive frames and show how to exploit it to perform motion-invariant pixel-wise depth estimation. Then, we detail our architecture, which is based on a pyramidal convolutional neural network where each level refines an input parallax map estimate by using two customized cost volumes. We use these cost volumes to leverage the visual spatio-temporal constraints imposed by motion and make the network robust to varied scenes. We benchmarked our approach both in test and generalization modes on public datasets featuring synthetic camera trajectories recorded in a wide variety of outdoor scenes. Results show that our network outperforms the state of the art on these datasets, while also performing well on a standard depth estimation benchmark.
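For readers unfamiliar with the term, the following is a minimal sketch of a generic correlation cost volume between the feature maps of two frames. It is not the customized cost volumes of the paper, whose exact construction is not given in this record. The function name `correlation_cost_volume`, the horizontal-only search, and the `max_shift` parameter are assumptions made purely for illustration.

```python
import numpy as np


def correlation_cost_volume(feat_ref, feat_tgt, max_shift):
    """Generic correlation cost volume (illustrative, hypothetical helper).

    feat_ref, feat_tgt: feature maps of shape (H, W, C).
    Returns an array of shape (H, W, 2 * max_shift + 1) whose slice i holds
    the channel-averaged correlation between feat_ref[y, x] and
    feat_tgt[y, x + d], with d = i - max_shift. Out-of-image positions
    are treated as zero features.
    """
    h, w, _ = feat_ref.shape
    # Zero-pad the target horizontally so that every shift stays in bounds.
    padded = np.pad(feat_tgt, ((0, 0), (max_shift, max_shift), (0, 0)))
    volume = np.empty((h, w, 2 * max_shift + 1), dtype=feat_ref.dtype)
    for i in range(2 * max_shift + 1):
        shifted = padded[:, i:i + w, :]  # equals feat_tgt[y, x + i - max_shift]
        volume[:, :, i] = (feat_ref * shifted).mean(axis=-1)
    return volume


# Toy usage: each column carries a unique one-hot feature, and the target
# frame is the reference frame moved 2 pixels to the right.
H, W, MAX_SHIFT = 4, 8, 3
ref = np.broadcast_to(np.eye(W), (H, W, W)).copy()
tgt = np.roll(ref, 2, axis=1)
cv = correlation_cost_volume(ref, tgt, MAX_SHIFT)
best_shift = cv.argmax(axis=-1) - MAX_SHIFT  # per-pixel best-matching shift
```

In a learned pipeline, a volume like this is typically fed to convolutional layers rather than reduced with an argmax; the argmax here only makes the toy result easy to inspect.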
Research center :
Montefiore Institute - Montefiore Institute of Electrical Engineering and Computer Science - ULiège Telim
Disciplines :
Computer science
Author, co-author :
Fonder, Michaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Van Droogenbroeck, Marc ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Télécommunications
Language :
English
Title :
Parallax Inference for Robust Temporal Monocular Depth Estimation in Unstructured Environments
Publication date :
01 December 2022
Journal title :
Sensors
ISSN :
1424-8220
eISSN :
1424-3210
Publisher :
Multidisciplinary Digital Publishing Institute (MDPI), Switzerland
Special issue title :
Advances in Intelligent Transportation Systems Based Sensor Fusion