Almanac
technique

Improvement Dynamics Curve

technique · active · provisional · improvement-dynamics-curve-9a583ff5 · 1 event · first seen 8d ago

Aliases: Improvement Dynamics Curve

Recent events (1)

5 · arXiv · cs.AI · 8d ago

OmniGameArena: UE5 benchmark for VLM game agents with multi-round improvement dynamics

Researchers introduce OmniGameArena, a real-time benchmark of twelve Unreal Engine 5 games spanning solo, PvP, and cooperative play, designed to evaluate vision-language model agents under unified protocols across commercial VLMs, open-weight VLMs, and specialized game policies. The benchmark introduces the Improvement Dynamics Curve (IDC), an agentic-reflection harness where a tool-using LLM autonomously refines skill prompts across multiple rounds, exposing how agent performance evolves and generalizes beyond a single cold-start score. Twelve VLM agents are evaluated on the leaderboard, with four top agents further analyzed under IDC. The work addresses gaps in existing game benchmarks that report only single-attempt scores and lack multi-agent or cooperative evaluation modes.
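The multi-round idea behind IDC can be illustrated with a minimal sketch. It is not the OmniGameArena harness: the function `improvement_dynamics_curve`, its parameters, and the toy `toy_evaluate` and `toy_refine` stand-ins are all hypothetical names chosen for illustration. The sketch assumes only what the summary states, namely that a skill prompt is scored, refined, and scored again over several rounds, so that the result is a curve rather than a single cold-start number.

```python
from typing import Callable, List, Tuple


def improvement_dynamics_curve(
    initial_prompt: str,
    evaluate: Callable[[str], float],
    refine: Callable[[str, float], str],
    rounds: int,
) -> List[Tuple[int, float]]:
    """Return (round, score) pairs; round 0 is the cold-start score.

    `evaluate` and `refine` are hypothetical callbacks: in a real harness
    they would run game episodes and call a tool-using LLM, respectively.
    """
    prompt = initial_prompt
    curve: List[Tuple[int, float]] = []
    for r in range(rounds + 1):
        score = evaluate(prompt)           # score the current skill prompt
        curve.append((r, score))
        if r < rounds:                     # no refinement after the final round
            prompt = refine(prompt, score)
    return curve


# Toy stand-ins (assumptions, for illustration only): the "score" counts
# how many tips the prompt contains, and each refinement appends one tip.
def toy_evaluate(prompt: str) -> float:
    return float(prompt.count("tip"))


def toy_refine(prompt: str, score: float) -> str:
    return prompt + " tip"


curve = improvement_dynamics_curve(
    "play the game", toy_evaluate, toy_refine, rounds=3
)
print(curve)
```

Recording the round-0 score separately keeps the cold-start result comparable with single-attempt benchmarks, while the later points show whether refinement actually helps.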