Support multi-modal 3D detection on NuScenes #1339
Add support for multi-modal NuScenes Detection
sshaoshuai authored May 13, 2023
2 parents ad9c25c + fcfa077 commit 02ac3e1
Showing 41 changed files with 3,863 additions and 33 deletions.
19 changes: 14 additions & 5 deletions README.md
@@ -10,6 +10,7 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
* `OpenPCDet` has been updated to `v0.6.0` (Sep. 2022).
* The code of PV-RCNN++ is now supported.
* The code of MPPNet is now supported.
* Multi-modal 3D detection approaches on NuScenes are now supported.

## Overview
- [Changelog](#changelog)
@@ -22,10 +23,15 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18


## Changelog
[2023-05-13] **NEW:** Added support for multi-modal 3D object detection models on the NuScenes dataset.
* Support multi-modal NuScenes detection (see [GETTING_STARTED.md](docs/GETTING_STARTED.md) for how to process the data).
* Support the [TransFusion-Lidar](https://arxiv.org/abs/2203.11496) head, which achieves 69.43% NDS on the NuScenes validation set.
* Support [`BEVFusion`](https://arxiv.org/abs/2205.13542), which fuses multi-modal information in BEV space and reaches 70.98% NDS on the NuScenes validation set (see the [guideline](docs/guidelines_of_approaches/bevfusion.md) on how to train/test with BEVFusion).

[2023-04-02] Added support for [`VoxelNeXt`](https://arxiv.org/abs/2303.11301) on the NuScenes, Waymo, and Argoverse2 datasets. VoxelNeXt is a fully sparse 3D object detection network built on clean sparse CNNs that predicts 3D objects directly from voxels.

[2022-09-02] **NEW:** Update `OpenPCDet` to v0.6.0:
* Official code release of [`MPPNet`](https://arxiv.org/abs/2205.05979) for temporal 3D object detection, which supports long-term multi-frame 3D object detection and ranked 1st on the [3D detection leaderboard](https://waymo.com/open/challenges/2020/3d-detection) of the Waymo Open Dataset as of Sept. 2nd, 2022. On the validation set, MPPNet achieves 74.96%, 75.06% and 74.52% mAPH@Level_2 for the vehicle, pedestrian and cyclist classes, respectively (see the [guideline](docs/guidelines_of_approaches/mppnet.md) on how to train/test with MPPNet).
* Support multi-frame training/testing on Waymo Open Dataset (see the [change log](docs/changelog.md) for more details on how to process data).
* Support saving training details (e.g., loss, iter, epoch) to a file (the previous tqdm progress bar is still supported via `--use_tqdm_to_record`). Please run `pip install gpustat` if you also want to log GPU-related information.
* Support saving the latest model every 5 minutes, so you can resume training from the latest state instead of the previous epoch.
@@ -38,10 +44,10 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18

[2022-02-07] Added support for CenterPoint models on the NuScenes dataset.

[2022-01-14] Added support for dynamic pillar voxelization, following the implementation proposed in [`H^23D R-CNN`](https://arxiv.org/abs/2107.14391), using a `unique` operation and the [`torch_scatter`](https://github.com/rusty1s/pytorch_scatter) package.
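
For readers unfamiliar with the idea, here is a minimal sketch of dynamic pillar voxelization with `torch.unique` and `torch_scatter` (illustrative only; the helper name and argument layout are not the exact `pcdet` implementation):

```python
import torch
from torch_scatter import scatter_mean

def dynamic_pillar_mean(points, voxel_size, pc_range):
    """points: (N, C) with xyz in the first 3 columns.
    Returns per-pillar mean features and the unique pillar coordinates."""
    # map each point to its pillar index on the x/y grid
    coords = ((points[:, :2] - points.new_tensor(pc_range[:2])) /
              points.new_tensor(voxel_size[:2])).long()
    # collapse duplicate pillar indices; `inv` maps each point to its pillar
    unq_coords, inv = torch.unique(coords, dim=0, return_inverse=True)
    # average all point features that fall into the same pillar
    pillar_feats = scatter_mean(points, inv, dim=0)
    return pillar_feats, unq_coords
```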

[2022-01-05] **NEW:** Update `OpenPCDet` to v0.5.2:
* The code of [`PV-RCNN++`](https://arxiv.org/abs/2102.00463) has been released in this repo, with higher performance, faster training/inference speed and lower memory consumption than PV-RCNN.
* Added the performance of several models trained with the full training set of the [Waymo Open Dataset](#waymo-open-dataset-baselines).
* Support Lyft dataset, see the pull request [here](https://github.com/open-mmlab/OpenPCDet/pull/720).

@@ -199,7 +205,7 @@ We could not provide the above pretrained models due to [Waymo Dataset License A
but you could easily achieve similar performance by training with the default configs.

### NuScenes 3D Object Detection Baselines
All models are trained with 8 GPUs and are available for download. For training BEVFusion, please refer to the [guideline](docs/guidelines_of_approaches/bevfusion.md).

| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
@@ -209,7 +215,10 @@
| [CenterPoint (voxel_size=0.1)](tools/cfgs/nuscenes_models/cbgs_voxel01_res3d_centerpoint.yaml) | 30.11 | 25.55 | 38.28 | 21.94 | 18.87 | 56.03 | 64.54 | [model-34M](https://drive.google.com/file/d/1Cz-J1c3dw7JAWc25KRG1XQj8yCaOlexQ/view?usp=sharing) |
| [CenterPoint (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_res3d_centerpoint.yaml) | 28.80 | 25.43 | 37.27 | 21.55 | 18.24 | 59.22 | 66.48 | [model-34M](https://drive.google.com/file/d/1XOHAWm1MPkCKr1gqmc3TWi5AYZgPsgxU/view?usp=sharing) |
| [VoxelNeXt (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_voxelnext.yaml) | 30.11 | 25.23 | 40.57 | 21.69 | 18.56 | 60.53 | 66.65 | [model-31M](https://drive.google.com/file/d/1IV7e7G9X-61KXSjMGtQo579pzDNbhwvf/view?usp=share_link) |
| [TransFusion-L*](tools/cfgs/nuscenes_models/transfusion_lidar.yaml) | 27.96 | 25.37 | 29.35 | 27.31 | 18.55 | 64.58 | 69.43 | [model-32M](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link) |
| [BEVFusion](tools/cfgs/nuscenes_models/bevfusion.yaml) | 28.03 | 25.43 | 30.19 | 26.76 | 18.48 | 67.75 | 70.98 | [model-157M](https://drive.google.com/file/d/1X50b-8immqlqD8VPAUkSKI0Ls-4k37g9/view?usp=share_link) |

*: Uses the fade strategy, which disables data augmentation during the last several epochs of training (see the sketch below).
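
The `disable_augmentation` hook added to `DataAugmentor` in this commit can drive the fade strategy; a minimal sketch, assuming an epoch-based training loop (the helper name and `fade_epochs` value are illustrative, not part of the repo):

```python
def apply_fade_strategy(train_set, augmentor_configs, cur_epoch, total_epochs, fade_epochs=5):
    """Rebuild the augmentor queue without the entries listed in
    DISABLE_AUG_LIST once training enters the final `fade_epochs` epochs."""
    if cur_epoch == total_epochs - fade_epochs:
        train_set.data_augmentor.disable_augmentation(augmentor_configs)
```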

### ONCE 3D Object Detection Baselines
All models are trained with 8 GPUs.
7 changes: 7 additions & 0 deletions docs/GETTING_STARTED.md
@@ -53,9 +53,16 @@ pip install nuscenes-devkit==1.0.5

* Generate the data infos by running the following command (it may take several hours):
```shell
# for the lidar-only setting
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos \
    --cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml \
    --version v1.0-trainval

# for the multi-modal setting
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos \
    --cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml \
    --version v1.0-trainval \
    --with_cam
```

### Waymo Open Dataset
35 changes: 35 additions & 0 deletions docs/guidelines_of_approaches/bevfusion.md
@@ -0,0 +1,35 @@

## Installation

Please refer to [INSTALL.md](../INSTALL.md) for the installation of `OpenPCDet`.
* We recommend checking your pillow version and using pillow==8.4.0 to avoid a bug in BEV pooling.

## Data Preparation
Please refer to [GETTING_STARTED.md](../GETTING_STARTED.md) to process the multi-modal NuScenes dataset.

## Training

1. Train the lidar branch for BEVFusion:
```shell
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/transfusion_lidar.yaml
```
The checkpoint will be saved in `../output/nuscenes_models/transfusion_lidar/default/ckpt`, or you can download the pretrained checkpoint directly from [here](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link).

2. To train BEVFusion, you need to download the pretrained image-backbone parameters from [here](https://drive.google.com/file/d/1v74WCt4_5ubjO7PciA5T0xhQc9bz_jZu/view?usp=share_link) and specify their path in the [config](../../tools/cfgs/nuscenes_models/bevfusion.yaml#L88). Then run the following command:
```shell
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/bevfusion.yaml \
    --pretrained_model path_to_pretrained_lidar_branch_ckpt
```
## Evaluation
* Test with a pretrained model:
```shell
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/bevfusion.yaml \
    --ckpt ../output/cfgs/nuscenes_models/bevfusion/default/ckpt/checkpoint_epoch_6.pth
```
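
For quick debugging without the distributed launcher, the standard OpenPCDet single-GPU test entry point should also work (the checkpoint path here is illustrative):

```shell
# single-GPU alternative, run from the tools/ directory
python test.py --cfg_file cfgs/nuscenes_models/bevfusion.yaml \
    --ckpt path_to_bevfusion_ckpt
```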

## Performance
All models are trained with spconv 1.0, but you can directly load them for testing regardless of the spconv version.
| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
| [TransFusion-L](../../tools/cfgs/nuscenes_models/transfusion_lidar.yaml) | 27.96 | 25.37 | 29.35 | 27.31 | 18.55 | 64.58 | 69.43 | [model-32M](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link) |
| [BEVFusion](../../tools/cfgs/nuscenes_models/bevfusion.yaml) | 28.03 | 25.43 | 30.19 | 26.76 | 18.48 | 67.75 | 70.98 | [model-157M](https://drive.google.com/file/d/1X50b-8immqlqD8VPAUkSKI0Ls-4k37g9/view?usp=share_link) |
36 changes: 36 additions & 0 deletions pcdet/datasets/augmentor/data_augmentor.py
@@ -1,6 +1,7 @@
from functools import partial

import numpy as np
from PIL import Image

from ...utils import common_utils
from . import augmentor_utils, database_sampler
@@ -23,6 +24,18 @@ def __init__(self, root_path, augmentor_configs, class_names, logger=None):
            cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
            self.data_augmentor_queue.append(cur_augmentor)

    def disable_augmentation(self, augmentor_configs):
        """Rebuild the augmentor queue, skipping any augmentor named in DISABLE_AUG_LIST."""
        self.data_augmentor_queue = []
        aug_config_list = augmentor_configs if isinstance(augmentor_configs, list) \
            else augmentor_configs.AUG_CONFIG_LIST

        for cur_cfg in aug_config_list:
            if not isinstance(augmentor_configs, list):
                if cur_cfg.NAME in augmentor_configs.DISABLE_AUG_LIST:
                    continue
            cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
            self.data_augmentor_queue.append(cur_augmentor)

    def gt_sampling(self, config=None):
        db_sampler = database_sampler.DataBaseSampler(
            root_path=self.root_path,
@@ -139,6 +152,7 @@ def random_world_translation(self, data_dict=None, config=None):

        data_dict['gt_boxes'] = gt_boxes
        data_dict['points'] = points
        data_dict['noise_translate'] = noise_translate
        return data_dict

    def random_local_translation(self, data_dict=None, config=None):
@@ -251,6 +265,28 @@ def random_local_pyramid_aug(self, data_dict=None, config=None):
        data_dict['points'] = points
        return data_dict

    def imgaug(self, data_dict=None, config=None):
        if data_dict is None:
            return partial(self.imgaug, config=config)
        imgs = data_dict["camera_imgs"]
        img_process_infos = data_dict['img_process_infos']
        new_imgs = []
        for img, img_process_info in zip(imgs, img_process_infos):
            flip = False
            if config.RAND_FLIP and np.random.choice([0, 1]):
                flip = True
            rotate = np.random.uniform(*config.ROT_LIM)
            # augment the image: optional horizontal flip, then rotation
            if flip:
                img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
            img = img.rotate(rotate)
            # record the applied transforms for later geometric correction
            img_process_info[2] = flip
            img_process_info[3] = rotate
            new_imgs.append(img)

        data_dict["camera_imgs"] = new_imgs
        return data_dict

    def forward(self, data_dict):
        """
        Args:
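
The `imgaug` augmentor above flips and rotates each camera image and records the applied transforms so the camera-to-lidar geometry can be corrected downstream. A standalone sketch of the same operation (the function name and default config values are illustrative, not the repo defaults):

```python
import numpy as np
from PIL import Image

def augment_camera_img(img: Image.Image, rand_flip: bool = True,
                       rot_lim: tuple = (-5.4, 5.4)):
    """Randomly flip and rotate one camera image, returning the image
    together with the applied (flip, rotate) parameters."""
    flip = bool(rand_flip and np.random.choice([0, 1]))
    rotate = float(np.random.uniform(*rot_lim))  # rotation angle in degrees
    if flip:
        img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
    img = img.rotate(rotate)
    return img, flip, rotate
```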
28 changes: 28 additions & 0 deletions pcdet/datasets/dataset.py
@@ -2,6 +2,7 @@
from pathlib import Path

import numpy as np
import torch
import torch.utils.data as torch_data

from ..utils import common_utils
@@ -130,6 +131,30 @@ def __getitem__(self, index):
"""
raise NotImplementedError

    def set_lidar_aug_matrix(self, data_dict):
        """
        Get the lidar augmentation matrix (4 x 4), which is used to recover the original point coordinates.
        """
        lidar_aug_matrix = np.eye(4)
        if 'flip_y' in data_dict.keys():
            flip_x = data_dict['flip_x']
            flip_y = data_dict['flip_y']
            if flip_x:
                lidar_aug_matrix[:3, :3] = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]]) @ lidar_aug_matrix[:3, :3]
            if flip_y:
                lidar_aug_matrix[:3, :3] = np.array([[-1, 0, 0], [0, 1, 0], [0, 0, 1]]) @ lidar_aug_matrix[:3, :3]
        if 'noise_rot' in data_dict.keys():
            noise_rot = data_dict['noise_rot']
            lidar_aug_matrix[:3, :3] = common_utils.angle2matrix(torch.tensor(noise_rot)) @ lidar_aug_matrix[:3, :3]
        if 'noise_scale' in data_dict.keys():
            noise_scale = data_dict['noise_scale']
            lidar_aug_matrix[:3, :3] *= noise_scale
        if 'noise_translate' in data_dict.keys():
            noise_translate = data_dict['noise_translate']
            lidar_aug_matrix[:3, 3:4] = noise_translate.T
        data_dict['lidar_aug_matrix'] = lidar_aug_matrix
        return data_dict

    def prepare_data(self, data_dict):
        """
        Args:
@@ -165,6 +190,7 @@ def prepare_data(self, data_dict):
            )
        if 'calib' in data_dict:
            data_dict['calib'] = calib
        data_dict = self.set_lidar_aug_matrix(data_dict)
        if data_dict.get('gt_boxes', None) is not None:
            selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
            data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
@@ -287,6 +313,8 @@ def collate_batch(batch_list, _unused=False):
                                                  constant_values=pad_value)
                        points.append(points_pad)
                    ret[key] = np.stack(points, axis=0)
                elif key in ['camera_imgs']:
                    ret[key] = torch.stack([torch.stack(imgs, dim=0) for imgs in val], dim=0)
                else:
                    ret[key] = np.stack(val, axis=0)
            except:
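
Since `lidar_aug_matrix` stores the composed flip/rotation/scale/translation, downstream code can invert it to map augmented points back to the original sensor frame. A minimal sketch of that usage (the helper name is illustrative):

```python
import numpy as np

def recover_original_points(points_aug: np.ndarray, lidar_aug_matrix: np.ndarray) -> np.ndarray:
    """points_aug: (N, 3) xyz in the augmented frame; returns (N, 3) in the original frame."""
    inv = np.linalg.inv(lidar_aug_matrix)    # invert the 4x4 transform
    ones = np.ones((points_aug.shape[0], 1))
    xyz1 = np.hstack([points_aug, ones])     # homogeneous coordinates
    return (xyz1 @ inv.T)[:, :3]
```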
