3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

NeurIPS 2023

¹Stanford University, ²University of Southern California, ³Bosch Research North America, Bosch Center for Artificial Intelligence (BCAI)

3D Copy-Paste automatically copies virtual objects and pastes them into real scenes. The inserted objects have 3D bounding boxes with physically plausible locations and appearances. A monocular 3D detection model trained on the generated data achieves state-of-the-art performance.

Abstract

A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method for complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, and sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach that automatically copies virtual objects and pastes them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object detection models and achieves state-of-the-art performance.

Method

Where and how to place the object. 3D Copy-Paste first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout.
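
To make the idea of a collision-free insertion location concrete, here is a minimal sketch of one way such a search could look. It is not the authors' implementation: it assumes the floor and all existing objects have already been reduced to axis-aligned rectangles on the floor plane, ignores object rotation, and uses plain rejection sampling. The names `Footprint`, `overlaps`, `contains`, and `sample_placement` are hypothetical and introduced only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned rectangle on the floor plane, in metres (illustrative type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a: Footprint, b: Footprint) -> bool:
    """True if the interiors of the two rectangles intersect."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def contains(outer: Footprint, inner: Footprint) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return (outer.x_min <= inner.x_min and inner.x_max <= outer.x_max
            and outer.y_min <= inner.y_min and inner.y_max <= outer.y_max)


def sample_placement(floor, obstacles, width, depth, max_tries=1000, rng=None):
    """Rejection-sample a footprint of size width x depth that stays on the
    floor and does not overlap any obstacle. Returns None if none is found."""
    rng = rng or random.Random()
    # The object cannot fit if it is larger than the floor itself.
    if width > floor.x_max - floor.x_min or depth > floor.y_max - floor.y_min:
        return None
    for _ in range(max_tries):
        x = rng.uniform(floor.x_min, floor.x_max - width)
        y = rng.uniform(floor.y_min, floor.y_max - depth)
        candidate = Footprint(x, y, x + width, y + depth)
        if contains(floor, candidate) and not any(
            overlaps(candidate, o) for o in obstacles
        ):
            return candidate
    return None
```

A call such as `sample_placement(Footprint(0, 0, 4, 4), [Footprint(0, 0, 2, 4)], 1.0, 1.0)` would return a 1 m by 1 m footprint somewhere in the free half of a 4 m by 4 m floor. A fuller version would also sample the object's orientation and test oriented boxes rather than axis-aligned ones.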

What illumination is on the object. 3D Copy-Paste estimates spatially varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows.
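
As a rough illustration of the final blending step, the sketch below composites one rendered object pixel over the original image while darkening the background where a cast shadow falls. It is a generic compositing formula, not the authors' rendering pipeline, and it assumes a renderer has already used the estimated illumination to produce three per-pixel inputs: the object colour, the object's coverage `alpha`, and a `shadow_ratio` (ground brightness with the object present divided by brightness without it). The function name `composite_pixel` and its parameters are hypothetical.

```python
def composite_pixel(background, obj_rgb, alpha, shadow_ratio):
    """Blend one rendered object pixel into the original image.

    background   -- (r, g, b) of the real image, each in [0, 1]
    obj_rgb      -- (r, g, b) of the rendered virtual object, each in [0, 1]
    alpha        -- object coverage at this pixel, in [0, 1]
    shadow_ratio -- assumed ground brightness with the object divided by
                    brightness without it, in [0, 1]; 1.0 means no shadow
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not 0.0 <= shadow_ratio <= 1.0:
        raise ValueError("shadow_ratio must be in [0, 1]")
    # Darken the real background where the virtual object casts a shadow.
    shaded = [c * shadow_ratio for c in background]
    # Standard "over" compositing of the object on the shaded background,
    # clamped to the valid colour range.
    return tuple(
        min(1.0, max(0.0, alpha * o + (1.0 - alpha) * s))
        for o, s in zip(obj_rgb, shaded)
    )
```

With `alpha = 0` and `shadow_ratio = 1` the original pixel is returned unchanged, and with `alpha = 1` the object colour fully replaces it. Pixels in between capture soft object edges and shadowed floor regions.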

BibTeX

@inproceedings{ge2023d,
title={3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection},
author={Yunhao Ge and Hong-Xing Yu and Cheng Zhao and Yuliang Guo and Xinyu Huang and Liu Ren and Laurent Itti and Jiajun Wu},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=d86B6Mdweq}
}