benchmark
TextQuests
benchmark · active
textquests-f4d57013 · 1 event · first seen 28d ago
Aliases: TextQuests
Recent events (1)
TextQuests: How Good are LLMs at Text-Based Video Games?
A Hugging Face blog post introduces TextQuests, an evaluation framework that tests LLMs on text-based video games as a proxy for interactive reasoning, planning, and language understanding. The benchmark assesses how well models can navigate, solve puzzles, and maintain state across multi-turn interactions in classic interactive fiction environments. This type of evaluation targets agentic capabilities including long-horizon planning and grounded language understanding.
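The multi-turn setup described above can be illustrated with a minimal sketch. It is not the TextQuests harness or its API: `ToyTextGame`, `scripted_agent`, and `run_episode` are hypothetical names invented for this example, and the two-room game is a stand-in for a real interactive fiction environment. The sketch only shows the general shape of such an evaluation: the agent sees the history of commands and observations, issues one command per turn, and succeeds only if it tracks state (here, picking up a key before trying the door).

```python
from dataclasses import dataclass


@dataclass
class ToyTextGame:
    """Hypothetical two-room game; a stand-in, not a TextQuests environment."""
    room: str = "cellar"
    has_key: bool = False
    done: bool = False

    def step(self, command: str) -> str:
        """Apply one text command and return the resulting observation."""
        command = command.strip().lower()
        if command == "take key" and self.room == "cellar" and not self.has_key:
            self.has_key = True
            return "You pick up the rusty key."
        if command == "go up" and self.room == "cellar":
            self.room = "hall"
            return "You climb into the hall. A locked door blocks the exit."
        if command == "unlock door" and self.room == "hall":
            if self.has_key:
                self.done = True
                return "The door swings open. You escape!"
            return "You have no key."
        return "Nothing happens."


def scripted_agent(history):
    """Placeholder for an LLM call: follows a fixed plan, one command per turn."""
    plan = ["take key", "go up", "unlock door"]
    return plan[min(len(history), len(plan) - 1)]


def run_episode(game, agent, max_turns=10):
    """Run the agent-environment loop and return the (command, observation) log."""
    history = []
    for _ in range(max_turns):
        command = agent(history)          # agent conditions on the full history
        observation = game.step(command)  # environment updates its hidden state
        history.append((command, observation))
        if game.done:
            break
    return history


if __name__ == "__main__":
    for command, observation in run_episode(ToyTextGame(), scripted_agent):
        print(f"> {command}\n{observation}")
```

In a real evaluation the scripted policy would be replaced by a model call that receives the accumulated history as context, and the score would come from game progress rather than a single success flag.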