Data Collection

Our sensor platform consists of a stereo pair of cameras and two 77 GHz FMCW radar antenna arrays. The sensors are well-calibrated and time-synchronized.

Our sensor unit:

  • Camera (x2): FLIR BFS-U3-16S2C-CS

  • Radar (x2): TI AWR1843 + DCA1000
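
Since the sensors are synchronized, frames from the two modalities can be paired by timestamp. The sketch below is a rough illustration of such pairing and is not the released CRUW toolkit; the function name, the timestamp format (seconds), and the 50 ms tolerance are all assumptions.

    # Pair each camera frame with the nearest radar frame by timestamp.
    # Assumes radar timestamps are sorted in ascending order.
    import numpy as np

    def sync_radar_to_camera(cam_ts, radar_ts, max_gap=0.05):
        """Return, for each camera timestamp, the index of the closest radar
        frame, or -1 if the time gap exceeds max_gap (50 ms, an assumption)."""
        cam_ts, radar_ts = np.asarray(cam_ts), np.asarray(radar_ts)
        idx = np.clip(np.searchsorted(radar_ts, cam_ts), 1, len(radar_ts) - 1)
        left, right = radar_ts[idx - 1], radar_ts[idx]
        nearest = np.where(cam_ts - left < right - cam_ts, idx - 1, idx)
        gap = np.abs(radar_ts[nearest] - cam_ts)
        return np.where(gap <= max_gap, nearest, -1)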

Scenarios:

  • Parking lot (PL)

  • Campus road (CR)

  • City street (CS)

  • Highway (HW)

Object Annotations

The annotations provided by the CRUW dataset include the following:

  • Camera: object classes, bboxes, masks, etc.

  • Radar: object classes, centers, etc.

The detailed format description can be found here.
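
As a purely hypothetical illustration of what per-object records might contain (the linked format description is authoritative; every field name and value below is an assumption):

    # Hypothetical annotation records for one object in one frame.
    camera_annotation = {
        "frame_id": 120,
        "class": "car",
        "bbox": [412, 188, 596, 301],   # x1, y1, x2, y2 in pixels
        "mask": "path/to/mask.png",     # instance segmentation mask
    }
    radar_annotation = {
        "frame_id": 120,
        "class": "car",
        "center": [15.3, -0.12],        # e.g. range (m) and azimuth (rad)
    }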

Note: The annotation format for the ROD2021 Challenge is different; it is described on our evaluation server hosted on CodaLab.

Radio Frequency (RF) Data Annotation Methods

Camera-Only (CO) Annotation

Problems to address:

  • Accurately localize the 3D positions of the objects in videos captured by a camera mounted on an autonomous vehicle.

  • Adaptively estimate the ground plane in each frame for more robust 3D object localization.

Solution:

  • Monocular depth estimation or other 3D sensors to obtain depth information.

  • Object depth histogram analysis or 3D point cloud clustering for object depth initialization (a minimal sketch follows this list).

  • Adaptive ground plane estimation taking advantage of both sparse and dense ground features (see the plane-fitting sketch below).

  • Tracklet smoothing using the results from multi-object tracking.
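
The depth-initialization step can be illustrated with a short sketch. This is not the exact CO annotation procedure: it assumes a per-frame monocular depth map and a 2D bounding box, and the histogram bin count is arbitrary.

    # Pick the dominant depth inside a 2D box from a histogram, which is
    # more robust to background pixels than a plain mean over the box.
    import numpy as np

    def init_object_depth(depth_map, bbox, num_bins=50):
        x1, y1, x2, y2 = bbox
        patch = depth_map[y1:y2, x1:x2].reshape(-1)
        patch = patch[np.isfinite(patch) & (patch > 0)]
        if patch.size == 0:
            return None
        hist, edges = np.histogram(patch, bins=num_bins)
        k = np.argmax(hist)                      # most populated depth bin
        return 0.5 * (edges[k] + edges[k + 1])   # bin center as initial depth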

Detailed information: http://yizhouwang.net/blog/2019/07/15/object-3d-localization/
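
For the ground-plane step, a minimal per-frame plane fit over 3D ground points (e.g. back-projected road pixels) might look like the sketch below; the adaptive weighting of sparse and dense ground features used in the actual method is omitted.

    # Least-squares plane n·x + d = 0 through an Nx3 array of ground points.
    import numpy as np

    def fit_ground_plane(points):
        centroid = points.mean(axis=0)
        # The singular vector with the smallest singular value of the
        # centered points is the plane normal.
        _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
        normal = vt[-1]
        d = -normal.dot(centroid)
        return normal, d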

Reference for CO annotation method:

@inproceedings{wang2019monocular,
  title={Monocular Visual Object 3D Localization in Road Scenes},
  author={Wang, Yizhou and Huang, Yen-Ting and Hwang, Jenq-Neng},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={917--925},
  year={2019}
}

Camera-Radar Fusion (CRF) Annotation

An intuitive way to improve the above camera-only annotation is to take advantage of radar, which can estimate range reliably and without systematic bias.

Heuristic Fusion Algorithm:

We fuse the location estimates from the camera and the radar based on the distance between each camera-radar pair. During fusion, we trust the range from the radar and the azimuth from the camera. The pipeline of this algorithm can be summarized as follows:

  1. Calculate the fused locations for each pair of the camera-radar locations.

  2. Remove nearby redundant radar locations according to their distances.

  3. First, find and keep the best matching pair for each radar detection. Then, find and keep the best matching pair for each camera detection.

  4. After these two passes, the matching becomes a one-to-one mapping. Collect all matched pairs as the final CRF annotations.
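
A minimal sketch of this idea is shown below. It is a simplified, greedy variant: the radar de-duplication of step 2 is omitted, and the distance threshold and coordinate convention (range in meters, azimuth in radians) are assumptions.

    # Greedy camera-radar matching; each fused location keeps the radar
    # range and the camera azimuth.
    import numpy as np

    def to_xy(rng, az):
        """Polar (range, azimuth) to Cartesian (x, z) on the ground plane."""
        return np.array([rng * np.sin(az), rng * np.cos(az)])

    def match_and_fuse(cam_locs, radar_locs, max_dist=3.0):
        pairs = []
        for i, c in enumerate(cam_locs):
            for j, r in enumerate(radar_locs):
                d = np.linalg.norm(to_xy(*c) - to_xy(*r))
                if d <= max_dist:
                    pairs.append((d, i, j))
        pairs.sort()                      # closest candidate pairs first
        used_c, used_r, fused = set(), set(), []
        for _, i, j in pairs:             # keep only the best pair per detection
            if i in used_c or j in used_r:
                continue
            used_c.add(i)
            used_r.add(j)
            # trust the radar range and the camera azimuth
            fused.append((radar_locs[j][0], cam_locs[i][1]))
        return fused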

Probabilistic Fusion Algorithm:

  1. Align camera and radar coordinates with sensor calibration results.

  2. Generate two probability maps for camera and radar locations separately.

  3. Fuse two probability maps by element-wise product.

  4. The fused annotations are derived from the fused probability maps by peak detection.
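
A minimal sketch of steps 2-4 on a discretized range-azimuth grid is given below; the grid size, Gaussian widths, peak threshold, and the use of SciPy for local-maximum detection are all assumptions.

    # Gaussian confidence maps per sensor, element-wise product, then
    # local-maximum peak picking as the fused detections.
    import numpy as np
    from scipy.ndimage import maximum_filter

    def prob_map(locs, shape=(128, 128), sigma=2.0):
        """Sum of isotropic Gaussians centered at (row, col) detections."""
        rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
        m = np.zeros(shape)
        for r0, c0 in locs:
            m += np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma ** 2))
        return m

    def fuse_and_detect(cam_locs, radar_locs, thresh=0.1):
        fused = prob_map(cam_locs) * prob_map(radar_locs)   # element-wise product
        peaks = (fused == maximum_filter(fused, size=5)) & (fused > thresh)
        return np.argwhere(peaks)                           # (row, col) peak indices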

The paper that introduces the above CRF annotation method was accepted at WACV 2021:

@inproceedings{wang2021rodnet,
  author={Wang, Yizhou and Jiang, Zhongyu and Gao, Xiangyu and Hwang, Jenq-Neng and Xing, Guanbin and Liu, Hui},
  title={RODNet: Radar Object Detection Using Cross-Modal Supervision},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month={January},
  year={2021},
  pages={504--513}
}

Human Annotation

For the ground truth needed for evaluation, we manually annotate the testing sequences across the different scenarios. The annotations are made on the RF images by labeling object classes and locations, with reference to the corresponding RGB and RF images.