Knowledge Graph Random Walk

technique · active · provisional · knowledge-graph-random-walk-bef1aff1 · 1 event · first seen 16d ago

Aliases: Knowledge Graph Random Walk

Recent events (1)

arXiv · cs.CL · 16d ago

LongTraceRL: Reinforcement Learning for Long-Context Reasoning via Search Agent Trajectories and Rubric Rewards

LongTraceRL is a new reinforcement learning (RL) training framework for improving long-context reasoning in LLMs, designed to address limitations of existing reinforcement learning with verifiable rewards (RLVR) methods. It builds challenging training data from two sources: multi-hop questions generated by random walks over a knowledge graph, and tiered distractors mined from search agent trajectories. High-confusability distractors are documents the agent read but did not cite; low-confusability distractors are documents it saw but never opened. A rubric reward provides entity-level process supervision along the reasoning chain and is applied only to correct responses, to prevent reward hacking. Experiments with three LLMs (4B–30B parameters) on five long-context benchmarks show consistent improvements over strong baselines.
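As a rough illustration of the knowledge-graph random walk step described in the summary, the sketch below samples a non-repeating path of relation triples from a toy graph and turns it into a nested multi-hop question whose answer is the final entity. The graph layout (a `dict` mapping a head entity to `(relation, tail)` edges), the function names `random_walk_path` and `path_to_question`, and the question template are all hypothetical. The summary does not say how LongTraceRL samples walks or verbalizes them.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy KG layout: head entity -> list of (relation, tail entity) edges.
KG = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]


def random_walk_path(kg: KG, start: str, hops: int,
                     rng: random.Random) -> Optional[List[Triple]]:
    """Sample a path of `hops` (head, relation, tail) triples starting at `start`.

    Entities are never revisited, so the chain cannot loop back on itself.
    Returns None if the walk reaches a dead end before `hops` edges.
    """
    path: List[Triple] = []
    visited = {start}
    current = start
    for _ in range(hops):
        # Only consider edges that lead to entities not yet on the path.
        candidates = [(r, t) for r, t in kg.get(current, []) if t not in visited]
        if not candidates:
            return None
        relation, tail = rng.choice(candidates)
        path.append((current, relation, tail))
        visited.add(tail)
        current = tail
    return path


def path_to_question(path: List[Triple]) -> Tuple[str, str]:
    """Wrap each relation around the start entity; the answer is the last tail."""
    phrase = path[0][0]
    for _, relation, _ in path:
        phrase = f"the {relation} of {phrase}"
    return f"What is {phrase}?", path[-1][2]


if __name__ == "__main__":
    toy_kg: KG = {
        "Marie Curie": [("spouse", "Pierre Curie"), ("birthplace", "Warsaw")],
        "Pierre Curie": [("birthplace", "Paris")],
        "Warsaw": [("country", "Poland")],
    }
    walk = random_walk_path(toy_kg, "Marie Curie", hops=2, rng=random.Random(0))
    if walk is not None:
        question, answer = path_to_question(walk)
        print(question, "->", answer)
```

Depending on the seed, the toy walk yields either "What is the birthplace of the spouse of Marie Curie?" (answer Paris) or "What is the country of the birthplace of Marie Curie?" (answer Poland). Forbidding revisits is a design choice here so that a two-hop question cannot collapse back to its own start entity.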
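The tiered-distractor construction in the summary can be sketched as a partition of the documents a search agent touched. Cited documents are treated as gold evidence, documents the agent opened but did not cite become high-confusability distractors, and documents that only appeared in results become low-confusability distractors. The `Trajectory` record, its field names, and the `build_context` mixing step (with its `n_high` and `n_low` counts) are hypothetical. The summary does not specify the trajectory format or how many distractors of each tier are used.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    """Hypothetical log of one search-agent episode (doc ids, in first-seen order)."""
    seen: List[str]                                   # appeared in search results
    opened: List[str] = field(default_factory=list)   # actually read by the agent
    cited: List[str] = field(default_factory=list)    # cited in the final answer


def tier_documents(traj: Trajectory) -> Dict[str, List[str]]:
    """Split every document the agent touched into gold and two distractor tiers."""
    opened, cited = set(traj.opened), set(traj.cited)
    # Ordered, de-duplicated union, so opened/cited docs missing from `seen` still count.
    all_docs = list(dict.fromkeys(traj.seen + traj.opened + traj.cited))
    return {
        "gold": [d for d in all_docs if d in cited],
        "high_confusability": [d for d in all_docs if d in opened and d not in cited],
        "low_confusability": [d for d in all_docs if d not in opened and d not in cited],
    }


def build_context(tiers: Dict[str, List[str]], n_high: int, n_low: int,
                  rng: random.Random) -> List[str]:
    """Keep all gold docs, add sampled distractors, and shuffle their positions."""
    high = tiers["high_confusability"]
    low = tiers["low_confusability"]
    docs = (list(tiers["gold"])
            + rng.sample(high, min(n_high, len(high)))
            + rng.sample(low, min(n_low, len(low))))
    rng.shuffle(docs)
    return docs
```

For example, a trajectory with `seen=["d1", "d2", "d3", "d4"]`, `opened=["d1", "d2"]`, and `cited=["d1"]` gives gold `["d1"]`, high-confusability `["d2"]`, and low-confusability `["d3", "d4"]`. Shuffling the assembled context is intended to keep the gold documents from sitting at a predictable position in the long input.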
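A correctness-gated rubric reward of the kind the summary describes might look like the sketch below. An incorrect final answer earns zero regardless of its reasoning, while a correct one earns an outcome reward plus a bonus proportional to how many entities on the gold reasoning chain (for example, the intermediate entities of a random-walk path) are mentioned in the response. The additive formula, the `weight` value, and the case-insensitive substring matching are assumptions for illustration. The summary states only that the rubric supplies entity-level process supervision and is applied to correct responses alone.

```python
from typing import List


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace for loose string matching."""
    return " ".join(text.lower().split())


def rubric_reward(response: str, predicted: str, gold_answer: str,
                  chain_entities: List[str], weight: float = 0.5) -> float:
    """Outcome reward plus an entity-coverage bonus, gated on correctness.

    Hypothetical formula: 0.0 if wrong, else 1.0 + weight * coverage, where
    coverage is the fraction of chain entities mentioned in the response.
    """
    if _norm(predicted) != _norm(gold_answer):
        # Gate: no process credit for wrong answers, so a policy cannot farm
        # reward by listing plausible entities without solving the question.
        return 0.0
    if not chain_entities:
        return 1.0
    text = _norm(response)
    hits = sum(1 for entity in chain_entities if _norm(entity) in text)
    return 1.0 + weight * hits / len(chain_entities)
```

With `chain_entities=["Pierre Curie", "Paris"]` and a correct answer of "Paris", a response naming both entities scores 1.5, one naming only Paris scores 1.25, and any response with a wrong final answer scores 0.0 even if it mentions every chain entity.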