Long-context Reasoning Benchmarks
benchmark · active · provisional
long-context-reasoning-benchmarks-23403fa9 · 1 event · first seen 16d ago
Aliases: Long-context Reasoning Benchmarks
More like this (12)
Reasoning Enhancement
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Reasoning Language Models
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Large Reasoning Models
latent reasoning
Bias Benchmark for Question Answering
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
CharXiv Reasoning
temporally grounded QA benchmark
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Reasoning in Memory (RiM)
Recent events (1)
LongTraceRL: Reinforcement Learning for Long-Context Reasoning via Search Agent Trajectories and Rubric Rewards
LongTraceRL is a new RL training framework for improving long-context reasoning in LLMs, addressing limitations of existing reinforcement learning with verifiable rewards (RLVR) methods. It constructs challenging training data from multi-hop questions generated by knowledge graph random walks, paired with tiered distractors derived from search agent trajectories (high-confusability: read but uncited; low-confusability: seen but unopened). A rubric reward provides entity-level process supervision along reasoning chains and is applied only to correct responses to prevent reward hacking. Experiments across three LLMs (4B–30B parameters) on five long-context benchmarks show consistent improvements over strong baselines.
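
The two mechanisms named in the summary, distractor tiering and a rubric reward gated on correctness, can be illustrated with a minimal sketch. It is not the LongTraceRL implementation: the `Rollout` type, the `seen`/`opened`/`cited` flags, exact-match answer checking, substring-based entity matching, and the `rubric_weight` value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    """Hypothetical container for one sampled model response."""
    response: str      # full response text, including the reasoning
    final_answer: str  # answer extracted from the response


def distractor_tier(seen: bool, opened: bool, cited: bool) -> Optional[str]:
    """Assign a confusability tier to a document from a search trajectory.

    Assumed flags: `seen` (appeared in search results), `opened` (the agent
    read it), `cited` (the agent used it as evidence).
    """
    if opened and not cited:
        return "high"  # read but uncited
    if seen and not opened:
        return "low"   # seen but unopened
    return None        # cited evidence or never seen: not a distractor


def outcome_reward(rollout: Rollout, gold_answer: str) -> float:
    """1.0 on a normalized exact match, else 0.0 (assumed verifier)."""
    same = rollout.final_answer.strip().lower() == gold_answer.strip().lower()
    return 1.0 if same else 0.0


def rubric_score(rollout: Rollout, chain_entities: List[str]) -> float:
    """Fraction of reasoning-chain entities mentioned in the response."""
    if not chain_entities:
        return 0.0
    text = rollout.response.lower()
    hits = sum(1 for entity in chain_entities if entity.lower() in text)
    return hits / len(chain_entities)


def total_reward(rollout: Rollout, gold_answer: str,
                 chain_entities: List[str],
                 rubric_weight: float = 0.5) -> float:
    """Add the rubric bonus only when the final answer is correct."""
    outcome = outcome_reward(rollout, gold_answer)
    if outcome == 0.0:
        # Gate: an incorrect response earns nothing for naming entities,
        # so entity mentions alone cannot be used to farm reward.
        return 0.0
    return outcome + rubric_weight * rubric_score(rollout, chain_entities)
```

Under these assumptions, a wrong answer that mentions every chain entity scores the same as a wrong answer that mentions none, which is the reward-hacking safeguard the summary describes.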