Object Pose and Shape Estimation for Grasping: Does it Work?

Research Overview

Abstract

This paper studies whether recent object pose and shape estimation methods are mature enough to serve as a reliable front end for robotic grasping. The authors compare a state-of-the-art end-to-end grasp synthesis approach against modular pipelines that first estimate object pose and shape, then generate 7-DoF parallel-jaw grasps with antipodal sampling from single-view RGB or RGB-D input. Across the experiments, the modular approaches generate more viable grasps, including for small objects where the end-to-end baseline struggles. The results also show that performance depends strongly on pose and shape accuracy, with cluttered scenes exposing current limitations. Finally, the paper demonstrates that single-view pose and shape estimates can be paired with vision-language models to produce language-conditioned grasps from RGB-D input.
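As a rough illustration of the antipodal criterion behind the grasp sampling mentioned above, the sketch below tests whether two surface contact points could form a parallel-jaw grasp. It is not the authors' implementation. The function name `is_antipodal`, the Coulomb friction-cone test, and the `friction_coeff` and `max_width` defaults are all assumptions made for illustration.

```python
import numpy as np


def is_antipodal(p1, n1, p2, n2, friction_coeff=0.5, max_width=0.08):
    """Check whether two contacts form an antipodal parallel-jaw grasp.

    p1, p2: contact points on the object surface (metres).
    n1, n2: outward unit surface normals at those points.
    friction_coeff, max_width: hypothetical defaults, not values from the paper.
    Returns (ok, width), where width is the distance between the contacts.
    """
    p1, n1, p2, n2 = (np.asarray(v, dtype=float) for v in (p1, n1, p2, n2))

    axis = p2 - p1
    width = float(np.linalg.norm(axis))
    # Reject degenerate pairs and pairs wider than the gripper can open.
    if width < 1e-9 or width > max_width:
        return False, width
    axis = axis / width  # unit closing direction from contact 1 to contact 2

    # Cosine of the friction-cone half-angle, atan(mu).
    cos_half_angle = np.cos(np.arctan(friction_coeff))

    # The finger at contact 1 pushes along +axis, which must lie inside the
    # friction cone around the inward normal -n1.
    ok1 = np.dot(-n1, axis) >= cos_half_angle
    # The finger at contact 2 pushes along -axis, which must lie inside the
    # friction cone around the inward normal -n2.
    ok2 = np.dot(-n2, -axis) >= cos_half_angle
    return bool(ok1 and ok2), width


# Two opposite faces of a 4 cm wide box: a valid antipodal pair.
ok, width = is_antipodal([-0.02, 0, 0], [-1, 0, 0], [0.02, 0, 0], [1, 0, 0])
print(ok, width)
```

In a modular pipeline of the kind described, contact pairs like these would be sampled from the estimated object shape, so errors in the pose or shape estimate shift the points and normals that this test sees.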
