Almanac
benchmark

MiniGrid

benchmarkactiveprovisionalminigrid-06f510ae·1 events·first seen 2d ago

Aliases: MiniGrid

Co-occurring entities

More like this (12)

Recent events (1)

6arXiv · cs.CL·2d ago·source ↗

AgentSpec: A modular framework for controlled composition and analysis of embodied LLM agent scaffolds

AgentSpec is a new modular specification framework that represents embodied LLM agents as typed compositions of reusable policy components with standardized interfaces across perception, memory, reasoning, reflection, action, and learning modules. The framework enables controlled swapping and recombination of components, instantiated across four benchmarks (DeliveryBench, ALFRED, MiniGrid, RoboTHOR). Key findings include that agent performance is governed by scaffold compatibility and interaction effects rather than isolated module strength, and that RL-trained policies compose best when optimized with deployment-time scaffold structure. Code, baselines, and an interactive playground are publicly released.