MobileGym-Bench

benchmark · active · provisional · mobilegym-bench-1c31ca7b · 1 event · first seen 22d ago

Aliases: MobileGym-Bench

Recent events (1)

7 · arXiv · cs.CL · 22d ago

MobileGym: Verifiable Parallel Simulation Platform for Mobile GUI Agent Training

MobileGym is a browser-hosted simulation environment for mobile GUI agent research. It enables deterministic outcome verification through structured JSON state, and it supports scalable online RL by running hundreds of instances in parallel (~400 MB per instance, ~3 s cold start). The accompanying MobileGym-Bench provides 416 parameterized task templates across 28 apps with deterministic judges. In a sim-to-real case study, training Qwen3-VL-4B-Instruct with GRPO yields a gain of 12.8 percentage points on the 256-task test set, and real-device execution retains 95.1% of the training gains measured in simulation.
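
To make the idea of a parameterized task template with a deterministic judge concrete, here is a minimal Python sketch. It is not MobileGym's actual API: the names `TaskTemplate`, `lookup`, and `judge`, the dotted-path convention, and the alarm-clock state layout are all hypothetical, chosen only to illustrate how a final state serialized as JSON could be checked against expected values without any model-based grading.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskTemplate:
    """Hypothetical parameterized task (not the MobileGym-Bench schema)."""

    instruction: str            # natural-language goal with {placeholders}
    expected: dict[str, str]    # dotted state path -> expected value template

    def instantiate(self, **params: str) -> tuple[str, dict[str, str]]:
        """Fill the placeholders to get one concrete task and its checks."""
        goal = self.instruction.format(**params)
        checks = {path: value.format(**params) for path, value in self.expected.items()}
        return goal, checks


def lookup(state: Any, dotted_path: str) -> Any:
    """Walk nested dicts and lists using a dotted path such as 'clock.alarms.0.time'."""
    node = state
    for key in dotted_path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def judge(state_json: str, checks: dict[str, str]) -> bool:
    """Deterministic verdict: every expected path must hold the expected value."""
    state = json.loads(state_json)
    try:
        return all(str(lookup(state, path)) == want for path, want in checks.items())
    except (KeyError, IndexError, ValueError, TypeError):
        # A missing or malformed path counts as task failure.
        return False


# Assumed example: an alarm task and an invented final-state layout.
template = TaskTemplate(
    instruction="Set an alarm for {time} labeled {label}",
    expected={"clock.alarms.0.time": "{time}", "clock.alarms.0.label": "{label}"},
)
goal, checks = template.instantiate(time="07:30", label="Gym")
final_state = json.dumps(
    {"clock": {"alarms": [{"time": "07:30", "label": "Gym", "enabled": True}]}}
)
print(goal)
print(judge(final_state, checks))
```

Because the verdict depends only on the serialized state and the instantiated checks, the same episode always receives the same result, which is the property a reward signal for online RL needs.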