Publication

Object Pose and Shape Estimation for Grasping: Does it Work?

arXiv 2026

Authors: Pavan Karke, Kushal Shah, Gaurav Singh, Md Faizal Karim, K Madhava Krishna, Rajat Talak

Research Overview

Abstract

This paper studies whether recent object pose and shape estimation methods are mature enough to serve as a reliable front end for robotic grasping. The authors compare a state-of-the-art end-to-end grasp synthesis approach against modular pipelines that first estimate object pose and shape, then generate 7-DoF parallel-jaw grasps with antipodal sampling from single-view RGB or RGB-D input. Across the experiments, the modular approaches generate more viable grasps, including for small objects where the end-to-end baseline struggles. The results also show that performance depends strongly on pose and shape accuracy, with cluttered scenes exposing current limitations. Finally, the paper demonstrates that single-view pose and shape estimates can be paired with vision-language models to produce language-conditioned grasps from RGB-D input.
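As a rough illustration of the antipodal sampling step mentioned in the abstract, the sketch below tests pairs of surface points on an estimated shape for a friction-cone antipodal condition and keeps those that fit within a gripper opening. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `is_antipodal` and `sample_antipodal_pairs`, the friction coefficient `mu`, the `max_width` limit, and the brute-force pairwise search are all hypothetical choices made for illustration. Normals are assumed to be unit length and to point outward from the object.

```python
import math
from itertools import combinations


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def is_antipodal(p1, n1, p2, n2, mu=0.5, max_width=0.08):
    """Check whether two contacts form an antipodal parallel-jaw grasp.

    p1, p2: contact points (x, y, z) in meters.
    n1, n2: unit outward surface normals at those points (assumed).
    mu: assumed Coulomb friction coefficient (hypothetical default).
    max_width: assumed maximum gripper opening in meters (hypothetical).
    """
    axis = _sub(p2, p1)
    width = _norm(axis)
    if width == 0.0 or width > max_width:
        return False
    u = tuple(x / width for x in axis)  # unit grasp axis from p1 to p2
    cos_limit = math.cos(math.atan(mu))  # cosine of friction-cone half angle
    # The closing direction at p1 is +u and at p2 is -u; each must lie
    # inside the friction cone around the inward normal at its contact.
    return -_dot(n1, u) >= cos_limit and _dot(n2, u) >= cos_limit


def sample_antipodal_pairs(points, normals, mu=0.5, max_width=0.08):
    """Return (i, j, center, width) for every antipodal point pair."""
    grasps = []
    for i, j in combinations(range(len(points)), 2):
        if is_antipodal(points[i], normals[i], points[j], normals[j],
                        mu, max_width):
            center = tuple((a + b) / 2 for a, b in zip(points[i], points[j]))
            width = _norm(_sub(points[j], points[i]))
            grasps.append((i, j, center, width))
    return grasps
```

In a modular pipeline of the kind the abstract describes, `points` and `normals` would come from the estimated object shape placed at the estimated pose, so errors in either estimate shift the contacts and can invalidate a grasp. The sketch returns only the contact pair, grasp center, and opening width; turning that into a full gripper pose requires additionally choosing an approach direction about the grasp axis.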
