Entity · paper

Predicting Future Behaviors in Reasoning Models Enables Better Steering

paper · active · predicting-future-behaviors-in-reasoning-models-enables-better-steering-05d4f39d · 1 event · first seen Jun 10, 2026

Co-occurring entities

Future Probe Controlled Generation

More like this (12)

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Reasoning Language Models
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
Agentic Chain-of-Thought Steering
Leveraging Instruction Tuning and Merging for Reasoning Model Adaptation
Fixed-Point Reasoning Model
Can We Break LLMs Out of Self-Loops? Fine-Grained Reasoning Control with Activation Steering
Forecasting With LLMs: Improved Generalization Through Feature Steering
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Reasoning Enhancement

Recent events (1)

6 · arXiv · cs.LG · Jun 10, 2026

Future Probe Controlled Generation enables steering of reasoning models with minimal quality degradation

Researchers introduce Future Probe Controlled Generation (FPCG), a text-level steering method for large reasoning models (LRMs). Instead of detecting a behavior in text that has already been generated, FPCG trains activation probes to predict, from intermediate reasoning steps, how likely each behavior is to occur later. The probes predict the most likely future behavior with 64–91% accuracy, which suggests that models carry internal prediction features distinct from detection features. To steer, FPCG samples candidate sentences and keeps the one the probes score highest. This steers the model with minimal degradation in output quality and succeeds in cases where activation steering fails. The work draws a principled distinction between detection features and prediction features as intervention targets for controlling LRM behavior.
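The sample-then-select loop described in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear (logistic) probe, and the names `probe_probability`, `select_next_sentence`, `steer`, `toy_sample`, and `toy_activation` are hypothetical. The toy functions stand in for a real model's sentence sampler and hidden-state extractor.

```python
import math
from typing import Callable, List, Sequence, Tuple


def probe_probability(activation: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Assumed linear probe: estimated probability that the target
    behavior appears later in the reasoning, given an activation."""
    logit = sum(a * w for a, w in zip(activation, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def select_next_sentence(text: str,
                         candidates: List[str],
                         get_activation: Callable[[str], Sequence[float]],
                         weights: Sequence[float],
                         bias: float) -> Tuple[float, str]:
    """Score each candidate continuation with the probe; return the best."""
    scored = [
        (probe_probability(get_activation(text + " " + c), weights, bias), c)
        for c in candidates
    ]
    return max(scored, key=lambda pair: pair[0])


def steer(prefix: str,
          sample_candidates: Callable[[str, int], List[str]],
          get_activation: Callable[[str], Sequence[float]],
          weights: Sequence[float],
          bias: float,
          num_steps: int,
          num_candidates: int) -> str:
    """Build the output sentence by sentence, keeping the candidate
    that the probe rates most likely to lead to the target behavior."""
    text = prefix
    for _ in range(num_steps):
        candidates = sample_candidates(text, num_candidates)
        _, best = select_next_sentence(text, candidates, get_activation,
                                       weights, bias)
        text = text + " " + best
    return text


# Hypothetical stand-ins for a real model (assumptions for illustration only).
def toy_sample(text: str, n: int) -> List[str]:
    return ["Let me check that step.", "I will just guess.", "Moving on."][:n]


def toy_activation(text: str) -> List[float]:
    # Fake 2-dimensional "activation": counts of two cue words.
    return [float(text.count("check")), float(text.count("guess"))]


result = steer("Problem:", toy_sample, toy_activation,
               weights=[2.0, -2.0], bias=0.0,
               num_steps=2, num_candidates=3)
print(result)  # Problem: Let me check that step. Let me check that step.
```

Because selection happens over sampled text rather than by editing activations, every emitted sentence is one the model itself generated, which is one plausible reading of why the summary reports little quality loss.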

Frontier Model Releases · AI Safety Research · Predicting Future Behaviors in Reasoning Models Enables Better Steering · Future Probe Controlled Generation