5 · arXiv cs.LG (Machine Learning) · 8d ago

Mana framework achieves zero-shot sim-to-real transfer for dexterous articulated tool manipulation

Researchers introduce Mana (Manipulation Animator), a sim-to-real framework that reframes dexterous robotic manipulation as an animation problem, using a coarse-to-fine pipeline of procedurally generated grasp keyframes, motion planning, and reinforcement learning. The system requires minimal human input (under one minute per tool) and achieves zero-shot sim-to-real transfer across four articulated tools with varying joint types and scales. The work addresses a longstanding gap in dexterous robotics: articulated tool use, which requires coordinating internal degrees of freedom and contact-rich interactions, has been underexplored relative to rigid object manipulation.

Agent and Tool Ecosystem Mana Mana: Dexterous Manipulation of Articulated Tools

Related guides (1)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

6 · OpenAI Blog · 1mo ago

Learning Dexterity: OpenAI Trains Robot Hand for Physical Object Manipulation

OpenAI announced the training of a human-like robot hand capable of manipulating physical objects with what they describe as unprecedented dexterity. The system uses reinforcement learning to develop fine motor control in a dexterous robotic hand. This work introduced the system OpenAI calls Dactyl and represents an early milestone in its robotics research program, predating the later work that used a robot hand to solve a Rubik's Cube.

Agent and Tool Ecosystem OpenAI Dexterous Hand Reinforcement Learning OpenAI

5 · arXiv cs.AI · 23d ago

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation (CoP)

Researchers introduce Center-of-Pressure (CoP), a tactile representation grounded in physical principles designed to bridge the sim-to-real gap in contact-rich dexterous manipulation. CoP preserves dense contact information while remaining robust for sim-to-real transfer, supported by a differentiable-dynamics-based sensor calibration scheme that estimates taxel orientations without ground-truth force measurements. Evaluated on peg-in-hole insertion and ball balancing tasks, CoP-conditioned policies achieve zero-shot sim-to-real transfer on a multi-fingered robotic hand, outperforming binary-contact and raw-taxel baselines. An emergent finding is that CoP-conditioned policies implicitly encode task-relevant physical properties such as object mass.

Evaluation and Benchmarking Agent and Tool Ecosystem multi-fingered dexterous hand Center-of-Pressure (CoP) tactile representation ball balancing +5 more
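
To make the contrast in the CoP summary above concrete, here is a minimal sketch using the textbook definition of center of pressure: the force-weighted mean of the sensing-element positions. It is an illustration only, not the paper's formulation. The function names, the 2D taxel layout, and the contact threshold are all assumptions, and the paper's calibration scheme is not modeled.

```python
def center_of_pressure(taxel_positions, normal_forces, eps=1e-9):
    """Force-weighted mean of taxel positions (textbook CoP definition).

    taxel_positions: list of coordinate tuples, one per taxel (assumed layout).
    normal_forces:   list of per-taxel normal force readings.
    Returns (cop, total_force); cop is None when there is no contact.
    """
    # Negative readings are treated as sensor noise and clipped to zero.
    forces = [max(float(f), 0.0) for f in normal_forces]
    total = sum(forces)
    if total < eps:
        return None, 0.0
    dims = len(taxel_positions[0])
    cop = tuple(
        sum(f * p[d] for f, p in zip(forces, taxel_positions)) / total
        for d in range(dims)
    )
    return cop, total


def binary_contact(normal_forces, threshold=0.05):
    """Binary-contact baseline: a single bit, with location and magnitude discarded."""
    return max(normal_forces) > threshold


# Two taxels on a line; the second one carries three times the load.
positions = [(0.0, 0.0), (1.0, 0.0)]
forces = [1.0, 3.0]
cop, total = center_of_pressure(positions, forces)
print(cop, total, binary_contact(forces))
```

The binary signal is identical for any load distribution above the threshold, while the CoP pair (location, total force) changes with it, which is one plausible reading of why a CoP-style input keeps more contact information than a binary flag.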

6 · arXiv cs.LG · 22d ago

DynaFLIP: Dynamics-Aware Multimodal Pre-Training for Robot Manipulation Perception

DynaFLIP is a pre-training framework that injects motion understanding into visual encoders for robot manipulation by constructing image-language-3D flow triplets from human and robot videos. The method encourages tri-modal alignment via simplex-volume minimization in a shared hyperspherical space, combined with cosine regularization and contrastive objectives. The resulting dynamics-aware visual backbone consistently outperforms baselines across diverse downstream policies including VLAs, with gains up to +22.5% in out-of-distribution scenarios. The work argues that robot generalization requires encoding how the world changes under action, not just static scene content.

Frontier Model Releases Agent and Tool Ecosystem Vision-Language-Action models simplex-volume minimization DynaFLIP +3 more

5 · OpenAI Blog · 1mo ago

Sim-to-real transfer of robotic control with dynamics randomization

OpenAI published research on transferring robotic control policies trained in simulation to real-world robots using dynamics randomization. The technique involves varying physical parameters during simulation training so that the real world appears as just another variation, enabling zero-shot sim-to-real transfer. This was an early foundational contribution to the sim-to-real robotics research thread.

Agent and Tool Ecosystem dynamics randomization sim-to-real transfer OpenAI
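
The idea in the summary above, varying physical parameters during simulation training, can be sketched as resampling a set of dynamics parameters at the start of every training episode. The parameter names and ranges below are hypothetical and chosen only for illustration; they are not taken from the OpenAI work, and the simulator itself is left out.

```python
import random

# Hypothetical parameter ranges (illustrative values, not from the source).
PARAM_RANGES = {
    "link_mass_scale": (0.5, 1.5),
    "joint_damping_scale": (0.3, 3.0),
    "friction_coefficient": (0.5, 1.25),
    "action_latency_s": (0.0, 0.04),
}


def sample_dynamics(rng, ranges=PARAM_RANGES):
    """Draw one value per parameter, uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def training_episodes(num_episodes, seed=0):
    """Yield (episode index, sampled dynamics) pairs.

    In a real pipeline the sampled values would be written into the
    simulator before each rollout; that step is omitted here.
    """
    rng = random.Random(seed)
    for episode in range(num_episodes):
        yield episode, sample_dynamics(rng)


for episode, params in training_episodes(3):
    print(episode, {name: round(value, 3) for name, value in params.items()})
```

Because the policy never trains against a single fixed set of dynamics, it has to succeed across the whole range, and the real robot's dynamics are then intended to fall somewhere inside that range.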

6 · arXiv cs.AI · 11d ago

AHA-WAM: Asynchronous world-action modeling with temporal decoupling for robot manipulation

AHA-WAM introduces a dual Diffusion Transformer architecture that decouples world prediction (low-frequency) from action execution (high-frequency) in robot manipulation policies, addressing the inefficiency of existing world-action models that force both branches to operate at the same temporal resolution. The system uses a rolling key-value memory video DiT as a long-horizon scene planner and a fast action DiT that queries layerwise latent context via joint attention, with Observation-Guided Video-Context Routing enabling asynchronous execution. On RoboTwin benchmarks, AHA-WAM achieves 92.80% average success and 78.3% on real-world tasks at 24.17 Hz, a 4.59x speedup over Fast-WAM, without robot-data pretraining.

Inference Economics RoboTwin Linear Diffusion Transformer Observation-Guided Video-Context Routing +2 more

6 · arXiv cs.LG · 4d ago

Geometric Action Model (GAM) repurposes geometric foundation models for 3D-aware robot manipulation

Researchers propose the Geometric Action Model (GAM), a language-conditioned robot manipulation policy that splits a pretrained geometric foundation model (GFM) to serve simultaneously as an observation encoder, causal future predictor, and action decoder. Unlike existing vision-language-action models that operate on 2D image frames, GAM explicitly incorporates 3D geometric priors for contact-rich manipulation. The authors claim improvements in accuracy, robustness, speed, and model size over foundation-model-scale baselines across simulation and real-robot benchmarks.

Agent and Tool Ecosystem Multimodal Progress Geometric Action Model for Robot Policy Learning Geometric Action Model

7 · OpenAI Blog · 1mo ago

Solving Rubik's Cube with a Robot Hand via Reinforcement Learning and Automatic Domain Randomization

OpenAI trained neural networks to solve a Rubik's Cube using a dexterous robot hand, with training conducted entirely in simulation via reinforcement learning. A new technique called Automatic Domain Randomization (ADR) enables the system to generalize to real-world physical perturbations not seen during training. The work demonstrates that sim-to-real transfer can achieve unprecedented dexterity in manipulation tasks.

Frontier Model Releases Agent and Tool Ecosystem Automatic Domain Randomization Dactyl OpenAI Five +1 more
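
As a rough illustration of the ADR idea in the summary above, the sketch below widens a randomization range when the policy does well at the edge of that range and narrows it when the policy does poorly. It is a heavily simplified, symmetric, single-parameter version written for this note. The class name, thresholds, and step size are assumptions, not values from OpenAI's work.

```python
from dataclasses import dataclass


@dataclass
class RandomizationRange:
    """One randomized parameter with an adjustable [low, high] range."""
    low: float
    high: float
    step: float = 0.05  # assumed amount by which each side moves

    def update(self, boundary_success_rate, expand_above=0.8, shrink_below=0.4):
        """Adjust the range from the success rate measured at its boundary.

        Thresholds are illustrative. Between them the range is left unchanged.
        """
        if boundary_success_rate >= expand_above:
            # Policy copes with the current extremes: make the task harder.
            self.low -= self.step
            self.high += self.step
        elif boundary_success_rate <= shrink_below:
            # Policy struggles: pull both sides in, never past the midpoint.
            mid = (self.low + self.high) / 2
            half = max((self.high - self.low) / 2 - self.step, 0.0)
            self.low, self.high = mid - half, mid + half
        return self.low, self.high


# Start with no randomization at all: friction fixed at 1.0.
friction = RandomizationRange(low=1.0, high=1.0)
for success in [0.9, 0.85, 0.5, 0.2]:
    print(success, friction.update(success))
```

Starting from a single fixed value and letting measured performance drive the range means the difficulty of training grows only as fast as the policy can absorb it.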

6 · arXiv cs.CL · 11d ago

PhysTool-Bench reveals severe gaps in MLLM physical tool use and embodied planning

Researchers introduce PhysTool-Bench, the first benchmark evaluating multimodal LLMs on physical tool use across 2,510 queries and 2,678 real-world tools spanning manufacturing, electrical work, agriculture, and healthcare. Evaluation of 13 leading MLLMs shows even the best model (Gemini-3.1-Pro) identifies only 58.7% of tools in a scene and completes just 21.0% of queries end-to-end. The results expose a two-level deficit: poor tool perception in realistic scenes and a much larger drop at the planning stage, indicating a lack of functional commonsense for mapping tools to task semantics. This pinpoints a critical bottleneck for embodied AI development.

Evaluation and Benchmarking Agent and Tool Ecosystem Google PhysTool-Bench Gemini-3.1-Pro +1 more