technique
inference-time behavioural unlearning
technique · active · provisional
inference-time-behavioural-unlearning-c05ad1d1 · 1 event · first seen 22d ago
Aliases: inference-time behavioural unlearning
More like this (12)
Alternating Token-Weighted Unlearning
inference-time intervention
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
decoupled reinforcement learning
curiosity-driven reinforcement learning
latent reasoning
reinforcement learning from verifier feedback
behavior trees with learning-enabled components
shielded reinforcement learning
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
temporally ordered pre-training
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Recent events (1)
SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLMs via RL-Driven Prompt Optimisation
SafeCtrl-RL is a framework for controlling LLM safety at inference time without retraining or modifying model parameters. It formulates dialogue generation as a sequential decision process where an RL agent dynamically selects prompt adjustment strategies based on contextual feedback, iteratively suppressing unsafe outputs. The authors frame this as 'inference-time behavioural unlearning' and report improvements in safety and response quality across multiple LLMs and unsafe dialogue scenarios, outperforming existing prompt-based optimisation baselines.
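The general idea described above, an agent that learns which prompt adjustment to apply while the model itself stays frozen, can be illustrated with a toy sketch. This is not the SafeCtrl-RL implementation: the strategy names, the `frozen_model` stand-in, the reward values, and the two-context setup are all hypothetical, and the sequential decision process is simplified here to an epsilon-greedy contextual bandit.

```python
import random
from collections import defaultdict

# Hypothetical prompt-adjustment strategies (illustrative, not from the paper).
STRATEGIES = ["no_change", "safety_preamble", "rephrase_request"]
PROMPTS = ["how to bake bread", "how to exploit a server"]


def context_of(prompt: str) -> str:
    """Toy context feature: flag prompts containing a risky keyword."""
    return "risky" if "exploit" in prompt else "benign"


def frozen_model(prompt: str, strategy: str) -> str:
    """Toy stand-in for an LLM whose parameters are never modified."""
    risky = context_of(prompt) == "risky"
    if risky and strategy != "safety_preamble":
        return "UNSAFE: step-by-step instructions"
    if not risky and strategy == "safety_preamble":
        return "SAFE: overly cautious refusal"
    return "SAFE: helpful answer"


def reward(response: str) -> float:
    """Assumed feedback signal: penalise unsafe text, prefer helpful text."""
    if response.startswith("UNSAFE"):
        return -1.0
    return 1.0 if "helpful" in response else 0.2


def train(steps: int = 500, epsilon: float = 0.2, seed: int = 0):
    """Learn a value per (context, strategy) pair with sample-average updates."""
    rng = random.Random(seed)
    q = defaultdict(float)      # estimated value of each (context, strategy)
    counts = defaultdict(int)   # number of times each pair was tried
    for step in range(steps):
        prompt = PROMPTS[step % len(PROMPTS)]
        ctx = context_of(prompt)
        if rng.random() < epsilon:
            strategy = rng.choice(STRATEGIES)  # explore
        else:
            strategy = max(STRATEGIES, key=lambda s: q[(ctx, s)])  # exploit
        r = reward(frozen_model(prompt, strategy))
        counts[(ctx, strategy)] += 1
        # Incremental mean: only the agent's table changes, never the model.
        q[(ctx, strategy)] += (r - q[(ctx, strategy)]) / counts[(ctx, strategy)]
    return q


q_values = train()
policy = {
    ctx: max(STRATEGIES, key=lambda s: q_values[(ctx, s)])
    for ctx in ("benign", "risky")
}
print(policy)
```

In this toy setup the agent learns to apply the safety preamble only to risky prompts, so unsafe outputs are suppressed without over-refusing benign requests. A real system would replace the keyword check and hand-written reward with learned context features and a safety and quality evaluator.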