Polymarket
Recent events (3)
StakeBench: A Market-Commitment-Grounded Benchmark for Financial Language Understanding
StakeBench is a new evaluation framework that links 560,876 comments from 2,261 resolved prediction markets (Polymarket and Manifold) to verified trading positions, trading actions, and market-odds records, using observable market behavior as supervision in place of human annotation. Its four diagnostic tasks (commitment detection, side identification, action anticipation, and collective odds projection) are used to evaluate 15 LLMs. Results reveal structural failures: models partially recover position-side signals (Directed Accuracy 0.506–0.599) but collapse on action anticipation and fail to beat naive baselines on odds projection. Notably, performance shows no correlation with model scale, and finance-domain fine-tuning does not improve revealed-side identification.
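The odds-projection result is a comparison against naive baselines. The summary does not say which baselines or error metric StakeBench uses, so this sketch assumes a simple persistence baseline (project each market's last observed odds forward) scored by mean absolute error; all function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Average absolute gap between projected and realized market odds."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def persistence_baseline(history: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical naive baseline: project each market's last observed odds forward."""
    return [h[-1] for h in history]


def beats_baseline(
    model_preds: Sequence[float],
    history: Sequence[Sequence[float]],
    targets: Sequence[float],
) -> bool:
    """True only if the model's MAE is strictly lower than the persistence baseline's."""
    baseline_mae = mean_absolute_error(persistence_baseline(history), targets)
    return mean_absolute_error(model_preds, targets) < baseline_mae
```

A persistence baseline is hard to beat when odds move little between observations, which is one reason "fails to beat a naive baseline" is a meaningful bar for odds projection.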
PolyGnosis 2.0: Multi-Agent Architecture for Prediction Market Intelligence via Harness Engineering
PolyGnosis 2.0 introduces a multi-agent system that combines Polymarket prediction-market signals with GDELT OSINT streams to identify 'Perspective Mismatches' as trading signals. The paper evaluates agentic harness engineering techniques (reflection loops, tool calling, divide-and-conquer partitioning, and chain-of-thought prompting) in high-noise financial domains. Its key empirical findings are that structural partitioning is necessary for multi-dimensional alignment, that unconstrained terminal reflection induces logical drift, and that a pervasive consensus bias emerges across agent configurations. The authors identify a Pareto-optimal configuration that achieves professional-grade analytical precision while minimizing latency and token overhead.
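The notion of a Pareto-optimal configuration can be made concrete with a small sketch. It assumes each harness configuration is scored on three hypothetical axes (analytical precision, where higher is better; latency and token count, where lower is better) and keeps every configuration that no other configuration dominates. The `HarnessConfig` fields and function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessConfig:
    name: str
    precision: float  # higher is better (hypothetical analytical-precision score)
    latency_s: float  # lower is better
    tokens: int  # lower is better


def dominates(a: HarnessConfig, b: HarnessConfig) -> bool:
    """a dominates b if it is no worse on every axis and strictly better on at least one."""
    no_worse = (
        a.precision >= b.precision and a.latency_s <= b.latency_s and a.tokens <= b.tokens
    )
    strictly_better = (
        a.precision > b.precision or a.latency_s < b.latency_s or a.tokens < b.tokens
    )
    return no_worse and strictly_better


def pareto_front(configs: list[HarnessConfig]) -> list[HarnessConfig]:
    """Keep configs no other config dominates (a config never dominates itself)."""
    return [c for c in configs if not any(dominates(other, c) for other in configs)]
```

A Pareto front usually contains several configurations; choosing a single one from it (for example, the lowest-latency option above a precision floor) is a separate decision layered on top.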
Stance Detection in Prediction Market Commentary via Counterfactual Augmentation and Market Context
This paper presents what it describes as the first stance detection system for prediction market commentary (Polymarket). It addresses extreme class imbalance (only 8.7% of comments are anti-market) through LLM-driven counterfactual augmentation generated with the Anthropic API. RoBERTa-base is fine-tuned in a 4×3 ablation over input configurations and augmentation doses. Key findings: market context is the dominant factor (raising 3-class Anti recall from 0.10 to 0.45), 50% synthetic augmentation is optimal, and full (100%) augmentation consistently degrades performance. An attention-based interpretability analysis provides mechanistic support for all three findings.
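The headline metric here is per-class recall on the minority Anti class. A minimal sketch of how that recall is computed in a 3-class setting follows; the label names `Pro`, `Neutral`, and `Anti` are assumptions, and `per_class_recall` is a hypothetical helper rather than the paper's evaluation code.

```python
from collections import Counter
from typing import Sequence


def per_class_recall(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> dict[str, float]:
    """Recall per label: correct predictions of that label / gold occurrences of it."""
    support = Counter(y_true)  # number of gold examples per label
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    # Labels with no gold examples get 0.0 here (a convention, not the only option).
    return {lbl: (hits[lbl] / support[lbl] if support[lbl] else 0.0) for lbl in labels}
```

Because recall is computed per class, it exposes minority-class failures that overall accuracy hides: when Anti is only 8.7% of the data, a model that never predicts Anti can still score high accuracy while its Anti recall is 0.0.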