DexHoldem: A Real-World Benchmark for Dexterous Embodied Agents Using Texas Hold'em Manipulation
DexHoldem is a new system-level benchmark for evaluating dexterous embodied agents on a ShadowHand robot performing Texas Hold'em card manipulation tasks. It provides 1,470 teleoperated demonstrations across 14 manipulation primitives, a physical policy benchmark, and an agentic perception benchmark for structured game-state recovery. Top performers include π₀.₅ at 61.2% task completion and Claude Opus 4.7 at 34.3% strict perception accuracy, with GPT 5.5 achieving 66.8% field-wise accuracy. The benchmark exposes gaps between isolated visual sub-capabilities and full closed-loop embodied decision-making.
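The gap between the strict and field-wise perception figures is easier to read with a concrete scoring sketch. The summary does not define either metric, so the following is only an illustration under an assumed reading: strict accuracy counts a recovered game state as correct only when every field matches the reference, while field-wise accuracy gives credit for each matching field. The field names (`pot`, `hole_cards`, `board`) and both function names are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List

# Hypothetical structured game state: field name -> recovered value.
GameState = Dict[str, str]


def strict_accuracy(preds: List[GameState], golds: List[GameState]) -> float:
    """Fraction of states where every reference field is recovered exactly (assumed definition)."""
    if not golds:
        return 0.0
    hits = sum(
        1
        for pred, gold in zip(preds, golds)
        if all(pred.get(key) == value for key, value in gold.items())
    )
    return hits / len(golds)


def fieldwise_accuracy(preds: List[GameState], golds: List[GameState]) -> float:
    """Fraction of individual reference fields recovered exactly (assumed definition)."""
    total = 0
    correct = 0
    for pred, gold in zip(preds, golds):
        for key, value in gold.items():
            total += 1
            if pred.get(key) == value:
                correct += 1
    return correct / total if total else 0.0


# Two made-up states: the first prediction gets one board card wrong.
golds = [
    {"pot": "40", "hole_cards": "As Kd", "board": "2c 7h 9s"},
    {"pot": "10", "hole_cards": "Qh Qs", "board": ""},
]
preds = [
    {"pot": "40", "hole_cards": "As Kd", "board": "2c 7h 9d"},
    {"pot": "10", "hole_cards": "Qh Qs", "board": ""},
]

print(strict_accuracy(preds, golds))
print(fieldwise_accuracy(preds, golds))
```

Under this reading, a single wrong field fails the whole state for the strict score but costs only one field in the field-wise score, which is one way a model can score much higher field-wise than strictly.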