Entity · benchmark

ALFRED

benchmark · active · alfred-8d760aa1 · 1 event · first seen Jun 15, 2026

Aliases: ALFRED

Co-occurring entities

DeliveryBench · AgentSpec · MiniGrid · RoboTHOR

More like this (12)

ALFWorld · Alfred Lin · AMALIA · ALX · ABBEL · ARDY · AMEL · ALOHA · ALMANAC · AlphaOracle · XAlpha · Albert Gu

Recent events (1)

6 · arXiv · cs.CL · Jun 15, 2026

AgentSpec: A modular framework for controlled composition and analysis of embodied LLM agent scaffolds

AgentSpec is a new modular specification framework that represents embodied LLM agents as typed compositions of reusable policy components, with standardized interfaces across perception, memory, reasoning, reflection, action, and learning modules. The framework supports controlled swapping and recombination of components and is instantiated on four benchmarks (DeliveryBench, ALFRED, MiniGrid, RoboTHOR). Key findings: agent performance is governed by scaffold compatibility and interaction effects between components rather than by the strength of any single module, and RL-trained policies compose best when they are optimized within the same scaffold structure used at deployment. Code, baselines, and an interactive playground are publicly released.

Evaluation and Benchmarking · Agent and Tool Ecosystem · DeliveryBench · AgentSpec · MiniGrid (+2 more)
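To make "typed composition of reusable components with standardized interfaces" concrete, here is a minimal Python sketch. It is not AgentSpec's released API: the interface names (`Perception`, `Memory`, `Reasoning`, `Reflection`, `Action`, `Learning`), the `AgentScaffold` class, the `swap` helper, and all toy components are hypothetical. The sketch shows only the general idea that each module slot has a fixed interface, so one component can be replaced while the others stay fixed.

```python
from dataclasses import dataclass, replace
from typing import Protocol

# Hypothetical slot interfaces; AgentSpec's real type signatures are not given here.
class Perception(Protocol):
    def observe(self, raw: str) -> str: ...

class Memory(Protocol):
    def write(self, item: str) -> None: ...
    def read(self) -> list[str]: ...

class Reasoning(Protocol):
    def plan(self, obs: str, context: list[str]) -> str: ...

class Reflection(Protocol):
    def critique(self, plan: str) -> str: ...

class Action(Protocol):
    def act(self, plan: str) -> str: ...

class Learning(Protocol):
    def update(self, obs: str, action: str, reward: float) -> None: ...

@dataclass(frozen=True)
class AgentScaffold:
    """Hypothetical scaffold: one component per slot, wired in a fixed order."""
    perception: Perception
    memory: Memory
    reasoning: Reasoning
    reflection: Reflection
    action: Action
    learning: Learning

    def step(self, raw_obs: str, reward: float = 0.0) -> str:
        obs = self.perception.observe(raw_obs)
        plan = self.reasoning.plan(obs, self.memory.read())
        plan = self.reflection.critique(plan)
        act = self.action.act(plan)
        self.memory.write(f"{obs} -> {act}")
        self.learning.update(obs, act, reward)
        return act

def swap(agent: AgentScaffold, **components) -> AgentScaffold:
    """Return a new scaffold with the named slots replaced; other slots are shared."""
    return replace(agent, **components)

# Toy components (hypothetical) that satisfy the interfaces structurally.
class NormalizePerception:
    def observe(self, raw: str) -> str:
        return raw.strip().lower()

class ListMemory:
    def __init__(self) -> None:
        self.items: list[str] = []
    def write(self, item: str) -> None:
        self.items.append(item)
    def read(self) -> list[str]:
        return list(self.items)

class GoToPlanner:
    def plan(self, obs: str, context: list[str]) -> str:
        return f"go to {obs}"

class NoReflection:
    def critique(self, plan: str) -> str:
        return plan

class CheckReflection:
    def critique(self, plan: str) -> str:
        return plan + " (checked)"

class UpperCaseAction:
    def act(self, plan: str) -> str:
        return plan.upper()

class NoLearning:
    def update(self, obs: str, action: str, reward: float) -> None:
        pass

def make_agent() -> AgentScaffold:
    return AgentScaffold(NormalizePerception(), ListMemory(), GoToPlanner(),
                         NoReflection(), UpperCaseAction(), NoLearning())
```

For example, `swap(base, reflection=CheckReflection(), memory=ListMemory())` yields a variant that differs from `base` only in its reflection module and a fresh memory. Giving each variant its own memory keeps state from leaking between the two runs being compared, which matters when the goal is to attribute a performance difference to one swapped component.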