TriggerBench

benchmark · active · provisional · triggerbench-c7f56ef6 · 1 event · first seen 3h ago

Aliases: TriggerBench

Recent events (1)

6 · arXiv · cs.CL · 3h ago

TriggerBench: A benchmark for evaluating prospective memory in LLMs

Researchers introduce TriggerBench, a benchmark for evaluating prospective memory (PM) in LLMs: the ability to spontaneously recall and act on latent constraints without explicit prompting. The benchmark spans five dimensions across daily-assistant and professional-workflow scenarios. It shows that PM is substantially harder than retrospective memory: PM decays sharply with context length, while retrospective memory near-saturates at 100K tokens. Key findings include a precision-recall trade-off in PM, attentional fragility under concurrent requests, and a novel result that PM accuracy correlates with spare reasoning capacity, as measured against AIME-2025 math performance.