Probe Trajectories
Probe Trajectories Reveal Reasoning Dynamics in Large Reasoning Models
This paper investigates whether the hidden representations of Large Reasoning Models (LRMs) can predict future model behavior by analyzing probe trajectories: the continuous evolution of concept probabilities across Chain-of-Thought (CoT) reasoning tokens. The authors find that temporal trajectory features (volatility, trend, steady-state) significantly outperform single static probes, with max-pooling achieving up to 95% AUROC across safety and mathematics domains. Two methodological insights are offered: template-based training data matches dynamically generated responses in quality, and pooling strategy is critical to probe performance. The work positions probe trajectories as a complementary safety monitoring framework for LRMs where CoT faithfulness cannot be assumed.
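To make the trajectory features concrete, here is a minimal Python sketch of how they might be computed from a sequence of per-token probe probabilities. The summary does not give the paper's exact definitions, so everything here is an assumption for illustration: the function name `trajectory_features`, the `tail_frac` parameter, and the specific choices of volatility as the standard deviation of step-to-step changes, trend as a least-squares slope, and steady-state as the mean over the final tokens.

```python
from statistics import mean, pstdev


def trajectory_features(probs, tail_frac=0.2):
    """Summarize one probe trajectory (a list of per-token probabilities).

    Hypothetical feature definitions, not necessarily the paper's own.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    n = len(probs)

    # Volatility (assumed): spread of consecutive token-to-token changes.
    diffs = [b - a for a, b in zip(probs, probs[1:])]
    volatility = pstdev(diffs) if diffs else 0.0

    # Trend (assumed): least-squares slope of probability against token index.
    if n > 1:
        x_mean = (n - 1) / 2
        y_mean = mean(probs)
        num = sum((i - x_mean) * (p - y_mean) for i, p in enumerate(probs))
        den = sum((i - x_mean) ** 2 for i in range(n))
        trend = num / den
    else:
        trend = 0.0

    # Steady state (assumed): mean probability over the final tail of tokens.
    tail = max(1, int(n * tail_frac))
    steady_state = mean(probs[-tail:])

    return {
        "volatility": volatility,
        "trend": trend,
        "steady_state": steady_state,
        # Pooling baselines: collapse the whole trajectory to a single score.
        "max_pool": max(probs),
        "mean_pool": mean(probs),
    }


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    example = [0.05, 0.10, 0.40, 0.35, 0.80, 0.85, 0.90]
    for name, value in trajectory_features(example).items():
        print(f"{name}: {value:.3f}")
```

In a monitoring setup, a feature dictionary like this could be fed to a downstream classifier, whereas `max_pool` alone corresponds to scoring a response by its single highest probe activation.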