technique
reinforcement learning with belief-state rewards
technique · active · provisional
reinforcement-learning-with-belief-state-rewards-d5939c34 · 1 event · first seen 18d ago
Aliases: reinforcement learning with belief-state rewards
More like this (12)
rule-based reinforcement learning rewards
reinforcement learning from verifier feedback
shielded reinforcement learning
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
sim-to-real reinforcement learning
curiosity-driven reinforcement learning
Reinforcement Learning for Code
Reinforcement Learning with Verifiable Rewards
Entropy-Regularized Reinforcement Learning
decoupled reinforcement learning
Reinforcement Learning from Rich Feedback with Distributional DAgger
Gradient-Guided Reward Optimization
Recent events (1)
BeliefTrack: Benchmarking and Improving Contextual Belief Management in LLMs
This paper introduces Contextual Belief Management (CBM), a framework for studying how LLMs should update, preserve, or ignore information across long-horizon interactions. The authors release BeliefTrack, a closed-world benchmark whose symbolic verifiers enable exact turn-level evaluation across the Rule Discovery and Circuit Diagnosis tasks. Vanilla LLMs show severe CBM failures; reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average, while representation-level steering achieves a 46.1% reduction. Probing experiments reveal latent belief-state dynamics underlying these failures.
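To make the idea of a turn-level belief-state reward concrete, here is a minimal sketch of how a symbolic verifier in a closed world could score a model's stated beliefs at each turn. It is not the paper's implementation: the toy hypothesis space `HYPOTHESES`, the function names, and the choice of Jaccard overlap as the reward are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical closed world: the hidden rule is one of these named predicates.
HYPOTHESES: Dict[str, Callable[[int], bool]] = {
    "even": lambda x: x % 2 == 0,
    "positive": lambda x: x > 0,
    "multiple_of_3": lambda x: x % 3 == 0,
    "less_than_10": lambda x: x < 10,
}

# One observation per turn: (input, label assigned by the hidden rule).
Observation = Tuple[int, bool]


def verifier_belief(observations: Sequence[Observation]) -> FrozenSet[str]:
    """Exact set of hypotheses consistent with every observation so far."""
    return frozenset(
        name
        for name, rule in HYPOTHESES.items()
        if all(rule(x) == label for x, label in observations)
    )


def belief_reward(stated: FrozenSet[str], reference: FrozenSet[str]) -> float:
    """Jaccard overlap in [0, 1] (assumed reward shape, not from the paper)."""
    if not stated and not reference:
        return 1.0
    return len(stated & reference) / len(stated | reference)


def turn_level_rewards(
    observations: Sequence[Observation],
    stated_beliefs: Sequence[Set[str]],
) -> List[float]:
    """Score the belief stated at turn t against the verifier's belief at turn t."""
    rewards = []
    for t, stated in enumerate(stated_beliefs):
        reference = verifier_belief(observations[: t + 1])
        rewards.append(belief_reward(frozenset(stated), reference))
    return rewards


if __name__ == "__main__":
    obs = [(4, True), (9, False), (-2, True)]
    # At turn 2 the model fails to drop "positive" after the contradicting example.
    stated = [{"even", "positive", "less_than_10"}, {"even", "positive"}, {"even"}]
    print(turn_level_rewards(obs, stated))
```

In this toy episode the second turn gets a lower reward because the stated belief keeps a hypothesis the new observation rules out. A per-turn signal like this could then feed a policy-gradient update, though how the rewards are aggregated and optimized is left open here.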