benchmark

TriggerBench

benchmark · active · provisional · triggerbench-c7f56ef6 · 1 event · first seen 3h ago

Aliases: TriggerBench

Co-occurring entities

AIME 2025

More like this (12)

CursorBench TokenBench ProgramBench LiveBench RepoBench DeliveryBench MTBench PseudoBench MemBench SpecBench IFBench T1-Bench

Recent events (1)

6 · arXiv · cs.CL · 3h ago

TriggerBench: A benchmark for evaluating prospective memory in LLMs

Researchers introduce TriggerBench, a benchmark evaluating prospective memory (PM) in LLMs: the ability to spontaneously recall and act on latent constraints without explicit prompting. The benchmark spans five dimensions across daily-assistant and professional-workflow scenarios and shows that PM is substantially harder than retrospective memory: PM decays sharply with context length, while retrospective memory nearly saturates at 100K tokens. Key findings include a precision-recall trade-off in PM, attentional fragility under concurrent requests, and a novel result that PM accuracy correlates with spare reasoning capacity, as measured against AIME 2025 math performance.

Long Context Evolution Evaluation and Benchmarking TriggerBench AIME 2025