DexHoldem

benchmark · active · dexholdem-d6b0a068 · 1 event · first seen 29d ago

Aliases: DexHoldem

Recent events (1)

5 · arXiv · cs.AI · 29d ago

DexHoldem: A Real-World Benchmark for Dexterous Embodied Agents Using Texas Hold'em Manipulation

DexHoldem is a new system-level benchmark for evaluating dexterous embodied agents on a ShadowHand robot performing Texas Hold'em card manipulation tasks. It provides 1,470 teleoperated demonstrations across 14 manipulation primitives, a physical policy benchmark, and an agentic perception benchmark for structured game-state recovery. Top performers include π₀.₅ at 61.2% task completion and Claude Opus 4.7 at 34.3% strict perception accuracy, with GPT 5.5 achieving 66.8% field-wise accuracy. The benchmark exposes gaps between isolated visual sub-capabilities and full closed-loop embodied decision-making.
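The gap between the strict and field-wise perception scores is easier to read with a concrete scoring rule in hand. The summary above does not define either metric, so the following is only a minimal sketch under one common reading: a prediction counts toward strict accuracy only if every field of the recovered game state matches the reference, while field-wise accuracy gives credit per matching field. The field names (`pot`, `board`, `to_act`) and the function `perception_accuracy` are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List, Tuple

# Hypothetical flat representation of a recovered game state.
GameState = Dict[str, str]


def perception_accuracy(
    preds: List[GameState], golds: List[GameState]
) -> Tuple[float, float]:
    """Return (strict, field-wise) accuracy under the assumed definitions."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")

    strict_hits = 0   # samples where every reference field matches
    field_hits = 0    # matching fields, summed over all samples
    field_total = 0   # reference fields, summed over all samples

    for pred, gold in zip(preds, golds):
        matches = [pred.get(key) == value for key, value in gold.items()]
        field_hits += sum(matches)
        field_total += len(matches)
        strict_hits += all(matches)

    if field_total == 0:
        raise ValueError("reference states contain no fields")

    return strict_hits / len(golds), field_hits / field_total


# Illustrative data only: one wrong field in the first sample.
golds = [
    {"pot": "40", "board": "Ah Kd 7c", "to_act": "player_1"},
    {"pot": "10", "board": "", "to_act": "player_2"},
]
preds = [
    {"pot": "45", "board": "Ah Kd 7c", "to_act": "player_1"},
    {"pot": "10", "board": "", "to_act": "player_2"},
]

strict, fieldwise = perception_accuracy(preds, golds)
print(f"strict={strict:.3f} field-wise={fieldwise:.3f}")
```

Under this reading a single wrong field fails the whole sample for the strict score but costs only one field in the field-wise score, so a field-wise score well above a strict score is what one would expect when errors are spread across many samples.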