TriggerBench
benchmark · active · provisional
triggerbench-c7f56ef6 · 1 event · first seen 3h ago
Aliases: TriggerBench
Recent events (1)
TriggerBench: A benchmark for evaluating prospective memory in LLMs
Researchers introduce TriggerBench, a benchmark for evaluating prospective memory (PM) in LLMs: the ability to spontaneously recall and act on latent constraints without explicit prompting. The benchmark spans five dimensions across daily-assistant and professional-workflow scenarios. It shows that PM is substantially harder than retrospective memory: PM decays sharply as context length grows, while retrospective memory is near saturation at 100K tokens. Key findings include a precision-recall trade-off in PM, attentional fragility under concurrent requests, and a novel result that PM accuracy correlates with spare reasoning capacity, as measured against AIME-2025 math performance.
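The precision-recall trade-off mentioned in the summary can be made concrete with a small sketch. This is not TriggerBench's scoring code; the `Turn` record, its `should_act` and `did_act` labels, and the `pm_precision_recall` helper are hypothetical names introduced here for illustration. It assumes each conversation turn is labeled with whether the latent constraint applied and whether the model acted on it.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical labels, not taken from the benchmark itself.
    should_act: bool  # the latent constraint applies on this turn
    did_act: bool     # the model acted on the constraint on this turn


def pm_precision_recall(turns):
    """Return (precision, recall) for acting on a latent constraint."""
    tp = sum(t.should_act and t.did_act for t in turns)      # correct triggers
    fp = sum(t.did_act and not t.should_act for t in turns)  # spurious triggers
    fn = sum(t.should_act and not t.did_act for t in turns)  # missed triggers
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: one correct trigger, one miss, one spurious trigger, one correct pass.
turns = [
    Turn(should_act=True, did_act=True),
    Turn(should_act=True, did_act=False),
    Turn(should_act=False, did_act=True),
    Turn(should_act=False, did_act=False),
]

# An "eager" model that acts on every turn never misses a trigger,
# but it does not gain any precision by doing so.
eager = [Turn(should_act=t.should_act, did_act=True) for t in turns]

print(pm_precision_recall(turns))
print(pm_precision_recall(eager))
```

Under these assumptions, a model that acts on the constraint more readily raises recall by missing fewer triggers, while every spurious trigger counts against precision, which is one way to read the trade-off the summary describes.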