Almanac
benchmark

TextQuests

benchmark · active · textquests-f4d57013 · 1 event · first seen 28d ago

Aliases: TextQuests

Recent events (1)

4 · Hugging Face Blog · 28d ago

TextQuests: How Good are LLMs at Text-Based Video Games?

A Hugging Face blog post introduces TextQuests, an evaluation framework that tests LLMs on text-based video games as a proxy for interactive reasoning, planning, and language understanding. The benchmark assesses how well models can navigate, solve puzzles, and maintain state across multi-turn interactions in classic interactive fiction environments. This type of evaluation targets agentic capabilities including long-horizon planning and grounded language understanding.