Deep 3D Dynamic Object Detection towards Successful and Safe Navigation for Full Autonomous Driving
Patikiri Arachchige Don Shehan Nilmantha Wijesekara1, *
Identifiers and Pagination:
Year: 2022
E-location ID: e187444782208191
Publisher ID: e187444782208191
Article History:
Received Date: 31/3/2022
Revision Received Date: 11/5/2022
Acceptance Date: 29/6/2022
Electronic publication date: 25/10/2022
Collection year: 2022
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Infractions other than collisions are also crucial in autonomous driving because they too can lead to accidents. Most existing work addresses navigation and collisions; fewer studies address other infractions such as off-road driving and disobeying road signs. Furthermore, state-of-the-art driving models do not perform dynamic 3D object detection in the imitation learning stage, so the performance of a model that does so is unknown. In addition, no research has investigated the computational complexity of driving models.
The objective of this research is to study the effect of 3D dynamic object detection on autonomous driving and to derive an optimized driving model with superior performance on navigation and safety benchmarks.
We propose two driving models. The first is an imitation learning-based model called Conditional Imitation Learning Dynamic Objects (CILDO), which performs dynamic object detection using image segmentation, depth prediction, and speed prediction. The second, Conditional Imitation Learning Dynamic Objects Low Infractions-Reinforcement Learning (CILDOLI-RL), optimizes this base model by adding a traffic light detection branch and deep deterministic policy gradient-based reinforcement learning.
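To illustrate how a segmentation mask and a depth map can jointly give the 3D position of a dynamic object, the following minimal sketch back-projects the pixels labelled as dynamic through a pinhole camera model. It is an illustration under stated assumptions, not the network described in this paper: the function names, the class id 10, and the camera intrinsics (fx, fy, cx, cy) are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point3D = Tuple[float, float, float]


def backproject_dynamic_pixels(
    seg: Sequence[Sequence[int]],      # per-pixel class ids (hypothetical labelling)
    depth: Sequence[Sequence[float]],  # per-pixel depth in metres
    dynamic_ids: Set[int],             # class ids treated as dynamic objects (assumed)
    fx: float, fy: float,              # focal lengths in pixels (assumed intrinsics)
    cx: float, cy: float,              # principal point in pixels (assumed intrinsics)
) -> List[Point3D]:
    """Return camera-frame 3D points for every pixel labelled as dynamic."""
    points: List[Point3D] = []
    for v, (seg_row, depth_row) in enumerate(zip(seg, depth)):
        for u, (label, z) in enumerate(zip(seg_row, depth_row)):
            if label in dynamic_ids and z > 0.0:
                # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def centroid(points: Sequence[Point3D]) -> Point3D:
    """Average the 3D points to get a single object position estimate."""
    if not points:
        raise ValueError("no dynamic pixels found")
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


# Toy 3x3 frame: two pixels in the middle row carry the hypothetical vehicle id 10.
seg_map = [[0, 0, 0], [0, 10, 10], [0, 0, 0]]
depth_map = [[50.0, 50.0, 50.0], [50.0, 10.0, 10.0], [50.0, 50.0, 50.0]]
vehicle_points = backproject_dynamic_pixels(
    seg_map, depth_map, {10}, fx=2.0, fy=2.0, cx=1.0, cy=1.0
)
vehicle_position = centroid(vehicle_points)
```

Tracking such a position estimate across frames, together with a speed estimate, is one way a driving policy could reason about moving objects; how CILDO actually fuses these predictions is not specified in this abstract.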
An ablation study shows that using image segmentation and depth prediction together to enable three-dimensional object vision improves navigation performance compared with making decisions from the image alone. The CILDOLI-RL model achieves the highest score on the newly introduced No-Other-Infraction benchmark and on the No-Crash benchmark. It also achieves a moderate score on the Car Learning to Act (CARLA) benchmark in both the training town and the testing town, ensuring safe autonomous driving. The base CILDO model achieves the best navigation performance and moderate scores on the safety benchmarks under urban or rural dense traffic environments in both towns. Both proposed models are relatively computationally complex.
For safety-critical driving, where both navigation performance and safety are crucial, the proposed CILDOLI-RL is the better of the two proposed models. For applications where driving safety is of less concern, the proposed CILDO is the better model.