Directed Accuracy

other · active · provisional · directed-accuracy-4e886b6e · 1 event · first seen 22d ago

Aliases: Directed Accuracy

Recent events (1)

5 · arXiv · cs.CL · 22d ago

StakeBench: A Market-Commitment-Grounded Benchmark for Financial Language Understanding

StakeBench is a new evaluation framework linking 560,876 comments from 2,261 resolved prediction markets (Polymarket and Manifold) to verified trading positions, actions, and market-odds records, replacing human annotation with observable market behavior as supervision. Four diagnostic tasks test commitment detection, side identification, action anticipation, and collective odds projection, evaluated across 15 LLMs. Results reveal structural failures: models partially recover position-side signals (Directed Accuracy 0.506–0.599) but collapse on action anticipation and fail to beat naive baselines on odds projection. Notably, model scale shows no correlation with performance, and finance-domain fine-tuning does not improve revealed-side identification.
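
The summary above reports Directed Accuracy as a range but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of comments with a known revealed side for which the model's predicted side matches. The function name `directed_accuracy`, the `"YES"`/`"NO"` labels, and the use of `None` for comments without a revealed position are all hypothetical and not taken from StakeBench.

```python
from typing import Optional, Sequence


def directed_accuracy(
    predicted: Sequence[Optional[str]],
    revealed: Sequence[Optional[str]],
) -> float:
    """Share of side-bearing comments whose predicted side matches the revealed side.

    ASSUMPTION: this definition is illustrative; the source does not give one.
    A revealed side of None means the commenter held no directed position,
    so that comment is excluded from scoring.
    """
    if len(predicted) != len(revealed):
        raise ValueError("predicted and revealed must have the same length")

    # Keep only comments that have a revealed (market-verified) side.
    scored = [(p, r) for p, r in zip(predicted, revealed) if r is not None]
    if not scored:
        raise ValueError("no comments with a revealed side to score")

    # A missing prediction (None) on a scored comment counts as a miss.
    hits = sum(1 for p, r in scored if p == r)
    return hits / len(scored)


if __name__ == "__main__":
    preds = ["YES", "NO", "YES", None]
    truth = ["YES", "YES", None, "NO"]
    print(directed_accuracy(preds, truth))
```

Under this reading, always guessing one side on a balanced two-sided market would score about 0.5, which is the context in which the reported 0.506–0.599 range reads as only partial recovery of the signal.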