FutureBench

benchmark · active · futurebench-74e9f409 · 1 event · first seen 28d ago

Aliases: FutureBench

Recent events (1)

5 · Hugging Face Blog · 28d ago

Back to The Future: Evaluating AI Agents on Predicting Future Events

This Hugging Face blog post introduces FutureBench, a benchmark that evaluates AI agents on their ability to predict future events. Standard benchmarks are vulnerable to data contamination, since their answers may already appear in a model's training data. FutureBench addresses this with temporally forward-looking tasks, which test whether agents can reason about and forecast outcomes beyond their training data cutoff. This framing positions future-event prediction as a rigorous, contamination-resistant evaluation methodology for frontier models and agents.
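
The contamination-resistance idea can be illustrated with a minimal Python sketch. It is not FutureBench's actual code or scoring method: the `Question` type, the `contamination_free` and `accuracy` helpers, the dates, and the plain accuracy metric are all hypothetical and chosen only for illustration. The sketch keeps only questions that resolve after an assumed training cutoff, then scores predictions once the outcomes are known.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Question:
    # Hypothetical record type; not taken from FutureBench.
    text: str
    resolves_on: date  # date the real-world outcome becomes known
    outcome: bool      # resolved answer, available only after resolves_on


def contamination_free(questions, training_cutoff):
    """Keep only questions that resolve strictly after the training cutoff."""
    return [q for q in questions if q.resolves_on > training_cutoff]


def accuracy(questions, predictions):
    """Fraction of questions whose predicted outcome matches the resolved one."""
    if not questions:
        raise ValueError("no questions to score")
    hits = sum(predictions[q.text] == q.outcome for q in questions)
    return hits / len(questions)


# Illustrative data only; the cutoff and events are made up.
cutoff = date(2024, 6, 1)
questions = [
    Question("Event A happens", date(2024, 3, 1), True),   # resolved before cutoff
    Question("Event B happens", date(2024, 9, 1), False),
    Question("Event C happens", date(2025, 1, 15), True),
]
predictions = {
    "Event A happens": True,
    "Event B happens": False,
    "Event C happens": False,
}

eligible = contamination_free(questions, cutoff)
score = accuracy(eligible, predictions)
```

Here "Event A" is excluded because its outcome could already be in the training data, so only the two later questions count toward the score.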