Authors:
Tripicchio Paolo, D’Avella Salvatore, Avizzano Carlo Alberto
Abstract
Several datasets for object detection and pose estimation have been proposed in the literature. Most of them focus on recognizing isolated objects or estimating the pose of objects in well-organized scenarios. This work introduces a novel dataset designed to stress vision algorithms on the difficult task of object detection and pose estimation in highly cluttered scenes, targeting the specific case of bin picking for the Cluttered Environment Picking Benchmark (CEPB). The dataset provides about 1.5M virtually generated photo-realistic images (RGB + depth + normals + segmentation) of 50K annotated cluttered scenes that mix rigid, soft, and deformable objects of varying sizes used in existing robotic picking benchmarks, together with their 3D models (40 objects). The images cover three camera positions, three light conditions, and multiple High Dynamic Range Imaging (HDRI) maps for domain randomization. The annotations contain the 2D and 3D bounding boxes of the objects, the poses of their centroids (translation + quaternion), and the visibility percentage of the objects’ surfaces. Nearly 10K images of isolated objects are also provided, so that simple tests can be run and compared with tests on the more complex cluttered scenarios. A baseline obtained with the DOPE neural network is reported to highlight the challenges introduced by the novel dataset.
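As an illustration of the annotation content listed above, the following minimal sketch shows how one per-object record might be represented and used. It is not the dataset's actual schema: the class `ObjectAnnotation`, all field names, the pixel convention for the 2D box, the eight-corner layout of the 3D box, and the (x, y, z, w) quaternion ordering are assumptions made for the example, as are the helper functions `quaternion_to_matrix` and `filter_visible`.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectAnnotation:
    # Hypothetical record layout; the dataset's real schema may differ.
    label: str
    bbox_2d: Tuple[float, float, float, float]      # assumed (x_min, y_min, x_max, y_max) in pixels
    bbox_3d: List[Vec3]                             # assumed 8 corner points of the 3D box
    translation: Vec3                               # centroid position
    quaternion: Tuple[float, float, float, float]   # assumed (x, y, z, w) order, unit norm
    visibility_pct: float                           # visible share of the object surface, 0 to 100


def quaternion_to_matrix(q: Tuple[float, float, float, float]) -> List[List[float]]:
    """Convert a unit quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def filter_visible(annotations: List[ObjectAnnotation],
                   min_visibility_pct: float = 50.0) -> List[ObjectAnnotation]:
    """Keep only objects whose visible surface share reaches the threshold."""
    return [a for a in annotations if a.visibility_pct >= min_visibility_pct]


# Made-up example values, used only to exercise the helpers.
unit_cube = [(float(i), float(j), float(k)) for i in (0, 1) for j in (0, 1) for k in (0, 1)]
example_scene = [
    ObjectAnnotation("object_a", (10.0, 20.0, 110.0, 140.0), unit_cube,
                     (0.1, 0.0, 0.6), (0.0, 0.0, 0.0, 1.0), 82.0),
    ObjectAnnotation("object_b", (90.0, 60.0, 150.0, 130.0), unit_cube,
                     (0.0, 0.1, 0.7), (0.0, 0.0, 0.0, 1.0), 12.5),
]
mostly_visible = filter_visible(example_scene, min_visibility_pct=50.0)
```

Filtering on the visibility percentage is one way such annotations could be used to separate lightly occluded objects from heavily occluded ones when comparing results on isolated and cluttered scenes; the 50% threshold here is arbitrary.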