Entity · product

Polymarket

product · active · polymarket-73085a9d · 5 events · first seen May 26, 2026

Aliases: Polymarket

Co-occurring entities

Reddit · Hindcast · Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting · RoBERTa · counterfactual data augmentation · stance detection · Anthropic · PolyGnosis 2.0 · Divide-and-Conquer Partitioning · Harness Engineering · Chain-of-Thought Reasoning · GDELT · Reflection Loops · Manifold · StakeBench · Directed Accuracy

More like this (12)

AWS Marketplace · MetaTrader · Mercado Libre · Unipol Group · Google Cloud Marketplace · Mercor · ShopX · Phenopackets · Agentopia · Multica · Repomix · AgentMob

Recent events (5)

6 · arXiv · cs.CL · Jul 16, 2026

Hindcast: A benchmark framework that closes data-leakage loopholes in LLM forecasting evaluation

A new arXiv preprint introduces Hindcast, an evaluation framework for LLM forecasters that closes two systematic data-leakage channels in standard backtesting: post-event retrieval and training-data contamination from newer models. The system replays resolved Polymarket prediction markets against a frozen Reddit snapshot, grades models only on information available before a chosen cutoff date, and compares their forecasts with contemporaneous market prices, which serve as a human-forecast baseline. Key finding: retrieval improves forecasts only when pre-event Reddit discussion existed; where only speculation was available, retrieval hurts performance. The framework is designed to stay valid, rather than go stale, as new models and markets appear.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Reddit · Polymarket · Hindcast
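The cutoff-date discipline described for Hindcast can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not Hindcast's actual code: the `Post` and `Market` records, the strict before-cutoff filter, and the use of the Brier score as the grading rule (the summary does not name a scoring rule). It only addresses the retrieval-leakage channel; training-data contamination in newer models cannot be fixed by filtering a snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical record for one item in the frozen Reddit snapshot.
    created: date
    text: str


@dataclass
class Market:
    # Hypothetical record for one resolved prediction market.
    question: str
    resolved_yes: bool       # outcome after resolution
    price_at_cutoff: float   # market-implied P(yes) at the cutoff, used as the human baseline


def visible_posts(snapshot: list[Post], cutoff: date) -> list[Post]:
    """Keep only posts created strictly before the cutoff, so no post-event text leaks in."""
    return [p for p in snapshot if p.created < cutoff]


def brier(p_yes: float, resolved_yes: bool) -> float:
    """Squared error between a probability and the 0/1 outcome (lower is better).

    Assumption: Brier scoring is used here for illustration only.
    """
    outcome = 1.0 if resolved_yes else 0.0
    return (p_yes - outcome) ** 2


def score(markets: list[Market], model_probs: list[float]) -> tuple[float, float]:
    """Return (model mean Brier, market-price mean Brier) over the same markets."""
    if len(markets) != len(model_probs) or not markets:
        raise ValueError("need one model probability per market")
    n = len(markets)
    model = sum(brier(p, m.resolved_yes) for m, p in zip(markets, model_probs)) / n
    market = sum(brier(m.price_at_cutoff, m.resolved_yes) for m in markets) / n
    return model, market
```

Because the cutoff price is scored with the same rule as the model, the model "beats the market" only when its mean score is lower than the market's.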

6 · arXiv · cs.AI · Jul 3, 2026

Human capital traits, not model benchmarks, predict effective human-AI collaboration in forecasting

A pilot study using Polymarket as an externally resolved benchmark finds that the value of human-AI collaboration in forecasting depends heavily on the individual, with outcomes following a trimodal distribution: most users either defer to the model or rubber-stamp their prior beliefs, while a minority engage in genuinely complementary reasoning that matches or beats market accuracy. Collaborative traits (perspective-taking, intellectual humility, and curiosity) predicted who reached the high-performance mode; raw cognitive ability and model benchmark scores did not. The results challenge the common practice of reporting human-AI collaboration effects as a single average. A pre-registered replication is in preparation.

Evaluation and Benchmarking · Polymarket · Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting

4 · arXiv · cs.CL · May 28, 2026

Stance Detection in Prediction Market Commentary via Counterfactual Augmentation and Market Context

This paper introduces what the authors describe as the first stance detection system for prediction market commentary (Polymarket). It tackles extreme class imbalance (only 8.7% of comments are anti-market) with LLM-driven counterfactual augmentation generated through the Anthropic API. RoBERTa-base is fine-tuned across a 4×3 ablation over input configurations and augmentation doses. Key findings: market context is the dominant factor (raising Anti-class recall in the 3-class setting from 0.10 to 0.45), 50% synthetic augmentation is optimal, and full (100%) augmentation consistently degrades performance. An attention-based interpretability analysis offers mechanistic support for all three findings.

Agent and Tool Ecosystem · RoBERTa · counterfactual data augmentation · Polymarket · +2 more
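The augmentation dose and the per-class metric in the stance detection summary can be made concrete with a small Python sketch. The meaning of "dose" used here (the share of the generated counterfactual pool added to the real training set) is an assumption, as are the label names other than Anti; the summary defines neither, and no model training is shown.

```python
from __future__ import annotations

import random

Example = tuple[str, str]  # (comment text, stance label); hypothetical format


def augment(real: list[Example], synthetic: list[Example], dose: float, seed: int = 0) -> list[Example]:
    """Add a fraction `dose` of the synthetic counterfactual pool to the real training set.

    Assumption: dose=0.5 means half of the generated examples are used, dose=1.0 means all.
    """
    if not 0.0 <= dose <= 1.0:
        raise ValueError("dose must be in [0, 1]")
    rng = random.Random(seed)              # fixed seed for a reproducible subset
    k = int(dose * len(synthetic))         # number of synthetic examples to keep
    return real + rng.sample(synthetic, k)


def recall(gold: list[str], pred: list[str], label: str) -> float:
    """Per-class recall: correct predictions of `label` divided by gold instances of `label`."""
    hits = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    total = sum(1 for g in gold if g == label)
    return hits / total if total else 0.0
```

Tracking recall for the minority class alone, rather than overall accuracy, is what makes an imbalance such as 8.7% Anti visible: a model that never predicts Anti can still score high accuracy.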

5 · arXiv · cs.CL · May 26, 2026

PolyGnosis 2.0: Multi-Agent Architecture for Prediction Market Intelligence via Harness Engineering

PolyGnosis 2.0 is a multi-agent system that combines Polymarket prediction market signals with GDELT OSINT streams to identify 'Perspective Mismatches' that can serve as trading signals. The paper evaluates agentic harness engineering techniques (reflection loops, tool calling, divide-and-conquer partitioning, and chain-of-thought) in high-noise financial domains. Key empirical findings: structural partitioning is necessary for multi-dimensional alignment; unconstrained terminal reflection induces logical drift; and a pervasive consensus bias emerges across agent configurations. The authors identify a Pareto-optimal configuration that, by their account, achieves professional-grade analytical precision while minimizing latency and token overhead.

Evaluation and Benchmarking · Agent and Tool Ecosystem · PolyGnosis 2.0 · Divide-and-Conquer Partitioning · Harness Engineering · +4 more
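The two harness techniques the PolyGnosis 2.0 findings hinge on, divide-and-conquer partitioning and reflection loops, can be sketched generically in Python. This is not the paper's architecture: the `Agent` type, the per-dimension prompt, the "OK" stop signal, and the `max_rounds` cap are hypothetical placeholders, and LLM calls are stubbed as plain functions. The round cap shows one simple way to keep reflection from running unconstrained.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical agent interface: takes a prompt, returns text (stands in for an LLM call).
Agent = Callable[[str], str]
# Hypothetical reviser interface: takes (draft, critique), returns a revised draft.
Reviser = Callable[[str, str], str]


def partitioned_analysis(question: str, dimensions: list[str], agent: Agent) -> dict[str, str]:
    """Divide-and-conquer: run one focused sub-analysis per dimension instead of one big prompt."""
    return {d: agent(f"{question}\nFocus only on: {d}") for d in dimensions}


def bounded_reflection(draft: str, critic: Agent, reviser: Reviser, max_rounds: int = 2) -> str:
    """Revise a draft at most `max_rounds` times; stop early once the critic answers "OK"."""
    for _ in range(max_rounds):
        critique = critic(draft)
        if critique.strip().upper() == "OK":
            break  # critic is satisfied; no further revision
        draft = reviser(draft, critique)
    return draft
```

A fixed cap trades some potential quality for predictable latency and token cost, the same trade-off the paper's Pareto analysis is described as exploring.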

5 · arXiv · cs.CL · May 26, 2026

StakeBench: A Market-Commitment-Grounded Benchmark for Financial Language Understanding

StakeBench is a new evaluation framework that links 560,876 comments from 2,261 resolved prediction markets (Polymarket and Manifold) to verified trading positions, actions, and market-odds records, using observable market behavior instead of human annotation as supervision. Four diagnostic tasks (commitment detection, side identification, action anticipation, and collective odds projection) are evaluated across 15 LLMs. The results reveal structural failures: models partially recover position-side signals (Directed Accuracy 0.506–0.599) but collapse on action anticipation and fail to beat naive baselines on odds projection. Notably, model scale shows no correlation with performance, and finance-domain fine-tuning does not improve revealed-side identification.

Frontier Model Releases · Evaluation and Benchmarking · Manifold · StakeBench · Polymarket · +1 more
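For StakeBench's odds-projection task, "failing to beat a naive baseline" can be illustrated with a short Python sketch. The summary does not say which naive baseline or error metric StakeBench uses; this sketch assumes a persistence baseline (odds stay where they are) scored by mean absolute error, and it does not attempt Directed Accuracy, whose definition is not given here.

```python
from __future__ import annotations


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error between projected and realized market odds (as probabilities)."""
    if len(pred) != len(actual) or not pred:
        raise ValueError("need equal-length, non-empty lists")
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def persistence_baseline(current_odds: list[float]) -> list[float]:
    """Assumed naive baseline: predict that each market's odds do not move."""
    return list(current_odds)


def beats_baseline(model_pred: list[float], current_odds: list[float], future_odds: list[float]) -> bool:
    """True only if the model's error is strictly below the no-change baseline's error."""
    baseline_error = mae(persistence_baseline(current_odds), future_odds)
    return mae(model_pred, future_odds) < baseline_error
```

A model that simply echoes the current odds ties the baseline and therefore does not count as beating it; it has to anticipate how the crowd will move.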