publications
2025
- FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection. Hongsuk Choi*, Isaac Kasahara*, Selim Engin, and 3 more authors. WACV, 2025.
The recently introduced ControlNet can steer the text-driven image generation process with geometric input such as human 2D poses or edge features. While ControlNet provides control over the geometric form of the instances in the generated image, it lacks the capability to dictate the visual appearance of each instance. We present FineControlNet to provide fine control over each instance’s appearance while maintaining the precise pose control capability. Specifically, we develop and demonstrate FineControlNet with geometric control via human pose images and appearance control via instance-level text prompts. The spatial alignment of instance-specific text prompts and 2D poses in latent space enables the fine control capabilities of FineControlNet. We evaluate the performance of FineControlNet through rigorous comparisons against state-of-the-art pose-conditioned text-to-image diffusion models. FineControlNet achieves superior performance in generating images that follow the user-provided instance-specific text prompts and poses.
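The key mechanism, spatially aligning each instance-level prompt with its 2D pose in latent space, can be pictured as compositing per-instance denoising predictions under pose-derived masks. Below is a minimal sketch of that idea; the `unet` and `text_encode` callables and the mask format are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of spatially aligned per-instance text control.
# `unet` and `text_encode` are hypothetical callables; masks are assumed
# to be rendered from the instance 2D poses and to overlap at most once.
import numpy as np

def composite_noise_pred(latent, t, instance_prompts, instance_masks,
                         unet, text_encode):
    """latent: (C, H, W); instance_masks: list of (H, W) arrays in {0, 1}."""
    eps = np.zeros_like(latent)
    coverage = np.zeros(latent.shape[-2:])
    for prompt, mask in zip(instance_prompts, instance_masks):
        eps_i = unet(latent, t, text_encode(prompt))  # instance-specific prediction
        eps += mask[None] * eps_i                     # keep it only inside its mask
        coverage += mask
    eps_bg = unet(latent, t, text_encode("background"))  # fill uncovered pixels
    eps += (1.0 - np.clip(coverage, 0.0, 1.0))[None] * eps_bg
    return eps
```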
2024
- simPLE: a Visuotactile Method Learned in Simulation to Precisely Pick, Localize, Regrasp, and Place Objects. Maria Bauza, Antonia Bronars, Yifan Hou, and 3 more authors. Science Robotics, 2024.
Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e., kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp, and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest-path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes, and simPLE achieves successful placements into structured arrangements with 1 mm clearance over 90% of the time for 6 objects and over 80% of the time for 11 objects.
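The regrasp-planning component, described above as a shortest-path problem on a graph of hand-to-hand regrasps, can be sketched as a plain graph search. The grasp names, edge costs, and use of networkx below are illustrative assumptions, not the paper's implementation.

```python
# Sketch: regrasp planning as shortest path on a graph of hand-to-hand regrasps.
import networkx as nx

G = nx.DiGraph()
# Nodes are candidate grasps; edges are feasible hand-to-hand transfers,
# weighted by, e.g., expected localization error or motion time (made up here).
G.add_weighted_edges_from([
    ("pick_A", "regrasp_B", 1.0),
    ("pick_A", "regrasp_C", 2.5),
    ("regrasp_B", "place_goal", 1.2),
    ("regrasp_C", "place_goal", 0.4),
])
plan = nx.shortest_path(G, "pick_A", "place_goal", weight="weight")
print(plan)  # ['pick_A', 'regrasp_B', 'place_goal'] (total cost 2.2)
```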
- VioLA: Aligning Videos to 2D LiDAR Scans. Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle, and 2 more authors. ICRA, 2024.
We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts by building a semantic map of the local scene from the image sequence and then extracts points at a fixed height for registration to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model to fill in the missing scene content in a geometrically consistent fashion and support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.
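The fixed-height extraction step can be pictured as slicing the reconstructed point cloud in a thin band around the LiDAR's scan height and projecting it to 2D for registration. A minimal sketch, with the height, band width, and function name as assumptions:

```python
# Sketch: slice a reconstructed semantic point cloud at the LiDAR scan height.
import numpy as np

def slice_at_height(points_xyz, z_lidar=0.3, band=0.05):
    """points_xyz: (N, 3) points in a gravity-aligned scene frame (meters)."""
    keep = np.abs(points_xyz[:, 2] - z_lidar) < band
    return points_xyz[keep][:, :2]  # 2D points to register against the LiDAR map
```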
- RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction. Isaac Kasahara, Shubham Agrawal, Selim Engin, and 3 more authors. ICRA, 2024.
General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making scene reconstruction very challenging. In this paper, we present a method for scene reconstruction that structurally breaks the problem into two steps: rendering novel views via inpainting, and lifting the 2D scene to 3D. Specifically, we leverage the generalization capability of large language models to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting normals instead of depth directly, our method is robust to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while generalizing to novel objects and scenes.
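The normals-to-depth step rests on the standard relation between a surface normal and the depth gradient; one common orthographic form is shown below. The notation is mine, and the paper's exact formulation may differ.

```latex
% For a unit normal n = (n_x, n_y, n_z) of a depth surface z(x, y):
\frac{\partial z}{\partial x} = -\frac{n_x}{n_z}, \qquad
\frac{\partial z}{\partial y} = -\frac{n_y}{n_z}
% Missing depth can then be recovered, up to boundary conditions, by solving
% the Poisson problem \nabla^2 z = \partial_x(-n_x/n_z) + \partial_y(-n_y/n_z).
```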
- HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image. Hongsuk Choi*, Nikhil Chavan-Dafle, Jiacheng Yuan, and 2 more authors. ICRA, 2024.
This paper presents a method to learn a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. Both inference and training-data generation for 3D hand-object scene reconstruction are challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping and motion planning for robotic hand-over and manipulation.
2023
- Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction. Shubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara, and 3 more authors. IROS, 2023.
Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene along with other semantic information such as object labels. This information is then used for choosing feasible grasps on relevant objects. In this paper, we present a novel method that simultaneously provides this geometric and semantic information for all objects in the scene, together with feasible grasps on those objects. The main advantage of our method is its speed, as it avoids sequential perception and grasp planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to state-of-the-art dedicated methods for object shape, pose, and grasp predictions while providing fast inference at 30 frames per second.
- Pick2Place: Task-aware 6DoF Grasp Estimation via Object-Centric Perspective Affordance. Zhanpeng He, Nikhil Chavan-Dafle, Jinwook Huh, and 2 more authors. ICRA, 2023.
The choice of a grasp plays a critical role in the success of downstream manipulation tasks. Consider the task of placing an object in a cluttered scene; the majority of possible grasps may not be suitable for the desired placement. In this paper, we study the synergy between the picking and placing of an object in a cluttered scene to develop an algorithm for task-aware grasp estimation. We present an object-centric action space that encodes the relationship between the geometry of the placement scene and the object to be placed, providing placement affordance maps directly from perspective views of the placement scene. This action space enables the computation of a one-to-one mapping between placement and picking actions, allowing the robot to generate a diverse set of pick-and-place proposals and to optimize for a grasp under other task constraints such as robot kinematics and collision avoidance. With experiments both in simulation and on a real robot, we demonstrate that our method enables the robot to complete placement-aware grasping with over 89% accuracy in a way that generalizes to novel objects and scenes.
2022
- Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation. Nikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal, and 2 more authors. IROS, 2022.
Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell", which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization for object reconstruction and accurate grasp quality estimation that implicitly accounts for the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with a success rate of more than 93%.
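Because the shell pairs each observed "entry" depth pixel with a predicted "exit" depth along the same camera ray, turning it into 3D geometry reduces to two back-projections. A minimal sketch, assuming pinhole intrinsics K; the function name is hypothetical:

```python
# Sketch: back-project entry/exit depth images of an object shell to 3D points.
import numpy as np

def shell_to_points(entry_depth, exit_depth, K):
    """entry_depth, exit_depth: (H, W) z-depth maps along the same camera rays."""
    H, W = entry_depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    rays = np.linalg.inv(K) @ pix                  # rays with unit z component
    pts_entry = rays * entry_depth.reshape(1, -1)  # observed front surface
    pts_exit = rays * exit_depth.reshape(1, -1)    # predicted back surface
    return np.concatenate([pts_entry, pts_exit], axis=1).T  # (2*H*W, 3)
```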
2020
- PnuGrip: An Active Two-Phase Gripper for Dexterous Manipulation. Ian H. Taylor, Nikhil Chavan-Dafle, Godric Li, and 2 more authors. IROS, 2020.
We present the design of an active two-phase finger for mechanically mediated dexterous manipulation. The finger enables re-orientation of a grasped object by using a pneumatic braking mechanism to transition between free-rotating and fixed (i.e., braked) phases. Our design allows controlled high-bandwidth (5 Hz) phase transitions independent of the grasping force for manipulation of a variety of objects. Moreover, its thin profile (1 cm) facilitates picking and placing in clutter. Finally, the design features a sensor for measuring fingertip rotation to support feedback control. We experimentally characterize the finger’s load handling capacity in the brake phase and rotational resistance in the free phase. We also demonstrate several pick-and-place manipulations common to industrial and laboratory automation settings that are simplified by our design.
- Dexterous Manipulation with Simple Grippers. Nikhil Chavan-Dafle. PhD Thesis, Massachusetts Institute of Technology, 2020.
This thesis focuses on enabling robots, especially those with simple grippers, to dexterously manipulate an object in a grasp. The dexterity of a robot is not limited to the intrinsic capability of a gripper. The robot can roll the object in the gripper using gravity, adjust the object’s pose by pressing it against a surface, or even toss the object in the air and catch it in a different pose. All these techniques rely on resources extrinsic to the hand: gravity, external contacts, or dynamic arm motions. We refer to such techniques collectively as "extrinsic dexterity". We focus on empowering robots to autonomously reason about using extrinsic dexterity, particularly pushes against external contacts. We develop the mechanics and algorithms for simulating, planning, and controlling the motion of an object pushed in a grasp. We show that the force-motion relationship at contacts can be captured well with complementarity constraints, and that the mechanics of prehensile pushing in a general setting can be formulated as a mixed nonlinear complementarity problem. For computational efficiency, we derive an abstraction of the mechanics in the form of motion cones. A motion cone defines the set of object motions a pusher can induce using frictional contact. Building upon these mechanics models, we develop a sampling-based planner and an MPC-based controller for in-hand manipulation. The planner generates a series of pushes, possibly from different sides of the object, to move the object to a desired grasp. The controller generates local corrective pushes to keep the object close to the planned pushing strategy. With a variety of regrasp examples, we demonstrate that our planner-controller framework allows the robot to handle uncertainty in physical parameters and external disturbances during manipulation and successfully move the object to a desired grasp.
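The complementarity constraints mentioned above build on the standard contact condition that a contact either carries normal force while touching or separates force-free. A schematic form, with notation mine rather than the thesis's exact statement:

```latex
% Normal-contact complementarity: f_n is the normal contact force and
% \phi(q) the signed separation distance at configuration q.
0 \le f_n \;\perp\; \phi(q) \ge 0
% Adding Coulomb friction constraints at every contact turns the inverse
% dynamics into a mixed (nonlinear) complementarity problem.
```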
- Planar In-hand Manipulation via Motion Cones. Nikhil Chavan-Dafle, Rachel Holladay, and Alberto Rodriguez. IJRR [Invited Paper | RSS 2018 Best Student Paper Award], 2020.
In this article, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its construction to a broader set of planar tasks, such as those where external forces including gravity influence the dynamics of pushing, or prehensile tasks, where there are complex frictional interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and approximate it by a polyhedral cone. We verify its validity with thousands of pushing experiments recorded with a motion tracking system. Motion cones abstract the algebra involved in the dynamics of frictional pushing and can be used for simulation, planning, and control. In this article, we demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm. By constraining the planner to explore only through the interior of motion cones, we obtain manipulation strategies that are robust against bounded uncertainties in the frictional parameters of the system. Our planner generates in-hand manipulation trajectories that involve sequences of continuous pushes, from different sides of the object when necessary, with 5–1,000 times speed improvements over equivalent algorithms.
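Inside a planner, a polyhedral motion cone acts as a cheap feasibility test: a sampled object twist is kept only if it satisfies all of the cone's half-space constraints. A minimal sketch, where the facet matrix A is a placeholder:

```python
# Sketch: test whether a planar object twist lies inside a polyhedral motion cone.
import numpy as np

def in_motion_cone(twist, A, tol=1e-9):
    """twist: (3,) planar twist [vx, vy, omega].
    A: (m, 3) facet normals, oriented so A @ twist <= 0 means inside."""
    return bool(np.all(A @ np.asarray(twist) <= tol))

# A sampling-based planner would reject (or project) sampled pushes whose
# induced twists fall outside the cone, keeping propagation robust to slip.
```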
2019
- Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching. Andy Zeng, Shuran Song, Kuan-Ting Yu, and 18 more authors. IJRR [Amazon Robotics Best Systems Paper Award in Manipulation], 2019.
This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with the highest affordance and recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional data collection or re-training. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT–Princeton Team system that took first place in the stowing task at the 2017 Amazon Robotics Challenge.
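The "execute the action with the highest affordance" step amounts to an argmax over the primitives' dense pixel-wise maps. A minimal sketch; the primitive names and map sizes below are illustrative:

```python
# Sketch: pick the (primitive, pixel) pair with the highest affordance.
import numpy as np

def select_action(affordances):
    """affordances: dict mapping primitive name -> (H, W) probability map."""
    return max(
        ((name, np.unravel_index(np.argmax(m), m.shape), float(m.max()))
         for name, m in affordances.items()),
        key=lambda x: x[2],
    )

maps = {p: np.random.rand(48, 64) for p in
        ["suction_down", "suction_side", "grasp_down", "flush_grasp"]}
primitive, (row, col), score = select_action(maps)
```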
2018
- In-Hand Manipulation via Motion Cones. Nikhil Chavan-Dafle, Rachel Holladay, and Alberto Rodriguez. RSS [Best Student Paper Award Winner], 2018.
In this paper, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its geometric construction to a broader set of planar tasks, where external forces such as gravity influence the dynamics of pushing, and to prehensile tasks, where there are complex interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and provide a polyhedral cone approximation to it. We verify its validity with 2,000 pushing experiments recorded with a motion tracking system. Motion cones abstract the algebra involved in simulating frictional pushing by providing bounds on the set of feasible motions and by characterizing which pushes will stick or slip. We demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm for in-hand manipulation. The planner generates trajectories that involve sequences of continuous pushes with 5–1,000x speed improvements over equivalent algorithms.
- Pneumatic Shape-shifting Fingers to Reorient and Grasp. Nikhil Chavan-Dafle, Kyubin Lee, and Alberto Rodriguez. CASE, 2018.
We present pneumatic shape-shifting fingers that enable a simple parallel-jaw gripper to perform different manipulation modalities. By changing the finger geometry, the gripper effectively changes the contact type between the fingers and an object to facilitate distinct manipulation primitives. In this paper, we demonstrate the development and application of shape-shifting fingers to reorient and grasp cylindrical objects. The shape of the fingers changes based on the air pressure inside them, attaining two distinct geometric forms at high and low pressure values. In our implementation, the finger shape switches between a wedge-shaped geometry and a V-shaped geometry at high and low pressure, respectively. Using the wedge-shaped geometry, the fingers provide a point contact on a cylindrical object to pivot it to a vertical pose under the effect of gravity. By changing to the V-shaped geometry, the fingers localize the object in the vertical pose and securely hold it. Experimental results show that the smooth transition between the two contact types allows a robot with a simple gripper to reorient a cylindrical object lying horizontally on the ground and to grasp it in a vertical pose.
- Regrasping by Fixtureless Fixturing. Nikhil Chavan-Dafle and Alberto Rodriguez. CASE, 2018.
This paper presents a fixturing strategy for regrasping that does not require a physical fixture. To regrasp an object in a gripper, a robot pushes the object against one or more external contacts in the environment such that the external contact keeps the object stationary while the fingers slide over the object. We call this manipulation technique fixtureless fixturing. Exploiting the mechanics of pushing, we characterize a convex polyhedral set of pushes that results in fixtureless fixturing. These pushes are robust against uncertainty in the object inertia, the grasping force, and the friction at the contacts. We propose a sampling-based planner that uses the sets of robust pushes to rapidly build a tree of reachable grasps. A path in this tree is a pushing strategy, possibly involving pushes from different sides, to regrasp the object. We demonstrate the experimental validity and robustness of the proposed manipulation technique with different regrasp examples on a manipulation platform. Such a fast regrasp planner facilitates versatile and flexible automation solutions.
- Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking Contacts. Nikhil Chavan-Dafle and Alberto Rodriguez. ICRA, 2018.
This paper presents an approach to in-hand manipulation planning that exploits the mechanics of alternating sticking contact. In particular, we consider the problem of manipulating a grasped object using external pushes for which the pusher sticks to the object. Given the physical properties of the object, the friction coefficients at the contacts, and a desired regrasp on the object, we propose a sampling-based planning framework that builds a pushing strategy by concatenating different feasible stable pushes to achieve the desired regrasp. An efficient dynamics formulation allows us to plan in-hand manipulations 100-1,000 times faster than our previous work, which builds upon a complementarity formulation. Experimental observations for the generated plans show that the object moves in the grasp precisely as predicted by the planner.
2017
- Sampling-based Planning of In-Hand Manipulation with External Pushes. Nikhil Chavan-Dafle and Alberto Rodriguez. ISRR, 2017.
This paper presents a sampling-based planning algorithm for in-hand manipulation of a grasped object using a series of external pushes. A high-level sampling-based planning framework, in tandem with a low-level inverse contact dynamics solver, effectively explores the space of continuous pushes with discrete pusher contact switch-overs. We model the frictional interaction between gripper, grasped object, and pusher by discretizing complex surface/line contacts into arrays of hard frictional point contacts. The inverse dynamics problem of finding an instantaneous pusher motion that yields a desired instantaneous object motion takes the form of a mixed nonlinear complementarity problem. Building upon this dynamics solver, our planner generates a sequence of pushes that steers the object to a goal grasp. We evaluate the performance of the planner for the case of a parallel-jaw gripper manipulating different objects, both in simulation and in real experiments. Through these examples, we highlight the important properties of the planner: it respects and exploits the hybrid dynamics of contact sticking/sliding/rolling, and it is efficient with respect to discrete contact switch-overs.
2016
- Experimental Validation of Contact Dynamics for In-Hand Manipulation. Roman Kolbert, Nikhil Chavan-Dafle, and Alberto Rodriguez. ISER, 2016.
This paper evaluates state-of-the-art contact models at predicting the motions and forces involved in simple in-hand robotic manipulations. In particular, it focuses on three primitive actions (linear sliding, pivoting, and rolling) that involve contacts between a gripper, a rigid object, and their environment. The evaluation is done through thousands of controlled experiments designed to capture the motion of object and gripper, and all contact forces and torques, at 250 Hz. We demonstrate that a contact modeling approach based on Coulomb’s friction law and the maximum energy principle is effective at reasoning about interaction to first order, but limited for making accurate predictions. We attribute the major limitations to 1) the non-uniqueness of force resolution inherent to grasps with multiple hard contacts of complex geometries, 2) unmodeled dynamics due to contact compliance, and 3) unmodeled geometries due to manufacturing defects.
- A Summary of Team MIT’s Approach to the Amazon Picking Challenge 2015. Kuan-Ting Yu, Nima Fazeli, Nikhil Chavan-Dafle, and 4 more authors. arXiv, 2016.
The Amazon Picking Challenge (APC), held alongside the International Conference on Robotics and Automation in May 2015 in Seattle, challenged roboticists from academia and industry to demonstrate fully automated solutions to the problem of picking objects from shelves in a warehouse fulfillment scenario. Packing density, object variability, speed, and reliability are the main complexities of the task. The picking challenge serves both as a motivation and an instrument to focus research efforts on a specific manipulation problem. In this document, we describe Team MIT’s approach to the competition, including design considerations, contributions, and performance, and we compile the lessons learned. We also describe what we think are the main remaining challenges.
2015
- Prehensile Pushing: In-hand Manipulation with Push-Primitives. Nikhil Chavan-Dafle and Alberto Rodriguez. IROS, 2015.
This paper explores the manipulation of a grasped object by pushing it against its environment. Relying on precise arm motions and detailed models of frictional contact, prehensile pushing enables dexterous manipulation with simple manipulators, such as those currently available in industrial settings and those likely affordable by service and field robots. This paper is concerned with the mechanics of the forceful interaction between a gripper, a grasped object, and its environment. In particular, we describe the quasi-dynamic motion of an object held by a set of point, line, or planar rigid frictional contacts and forced by an external pusher (the environment). Our model predicts the force required by the external pusher to “break” the equilibrium of the grasp and estimates the instantaneous motion of the object in the grasp. It also captures interesting behaviors such as the constraining effect of line or planar contacts and the guiding effect of the pusher’s motion on the object’s motion. We evaluate the algorithm with three primitive prehensile pushing actions (straight sliding, pivoting, and rolling) that have the potential to combine into a broader in-hand manipulation capability.
- A Two-Phase Gripper to Reorient and Grasp. Nikhil Chavan-Dafle, Matthew T. Mason, Harald Staab, and 2 more authors. CASE, 2015.
This paper introduces the design of novel two-phase fingers that passively reorient objects while picking them up. Two-phase refers to a change in the finger-object contact geometry, from a free-spinning point contact to a firm multipoint contact, as the gripping force increases. We exploit the two phases to passively reorient prismatic objects from a horizontal resting pose to an upright secure grasp. This problem is particularly relevant to industrial assembly applications, where parts are often presented lying on trays or conveyor belts and need to be assembled vertically. Each two-phase finger is composed of a small hard contact point attached to an elastic strip mounted over a V-groove cavity. When grasped between two parallel fingers with low gripping force, the object pivots about the axis between the contact points on the strips and aligns upright with gravity. A subsequent increase in the gripping force makes the elastic strips recede into the cavities, letting the part seat in the V-grooves to secure the grasp. The design is compatible with any type of parallel-jaw gripper and can be reconfigured for specific objects by changing the geometry of the cavity. The two-phase gripper provides robots with the capability to accurately position and manipulate parts, reducing the need for dedicated part feeders or time-demanding regrasp procedures.
2014
- Extrinsic Dexterity: In-hand Manipulation with External Forces. Nikhil Chavan-Dafle, Alberto Rodriguez, Robert Paolini, and 7 more authors. ICRA [Best Research Video Award Finalist], 2014.
In-hand manipulation is the ability to reposition an object in the hand, for example when adjusting the grasp of a hammer before hammering a nail. The common approach to in-hand manipulation with robotic hands, known as dexterous manipulation [1], is to hold an object within the fingertips of the hand and wiggle the fingers or walk them along the object’s surface. Dexterous manipulation, however, is just one of the many techniques available to the robot. The robot can also roll the object in the hand using gravity, adjust the object’s pose by pressing it against a surface, or, if fast enough, even toss the object in the air and catch it in a different pose. All these techniques have one thing in common: they rely on resources extrinsic to the hand, whether gravity, external contacts, or dynamic arm motions. We refer to them as “extrinsic dexterity”. In this paper, we study extrinsic dexterity in the context of regrasp operations, for example when switching from a power to a precision grasp, and we demonstrate that even simple grippers are capable of ample in-hand manipulation. We develop twelve regrasp actions, all open-loop and hand-scripted, and evaluate their effectiveness with over 1,200 trials of regrasps and sequences of regrasps for three different objects (see video [2]). The long-term goal of this work is to develop a general repertoire of these behaviors and to understand how such a repertoire might eventually constitute a general-purpose in-hand manipulation capability.