Entity · technique

inference-time behavioural unlearning

technique · active · inference-time-behavioural-unlearning-c05ad1d1 · 1 event · first seen May 26, 2026

Aliases: inference-time behavioural unlearning

Co-occurring entities

Reinforcement Learning · SafeCtrl-RL · prompt optimisation

More like this (12)

Alternating Token-Weighted Unlearning · inference-time intervention · Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs · Uncertainty-based Debiasing and Unlearning for Decontamination · decoupled reinforcement learning · TILDE: TILt-based Distributional Erasure for Concept Unlearning · Inference-time Plasticity · curiosity-driven reinforcement learning · latent reasoning · reinforcement learning from verifier feedback · behavior trees with learning-enabled components · shielded reinforcement learning

Recent events (1)

arXiv · cs.CL · May 26, 2026

SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLMs via RL-Driven Prompt Optimisation

SafeCtrl-RL is a framework for controlling LLM safety at inference time without retraining or modifying model parameters. It formulates dialogue generation as a sequential decision process where an RL agent dynamically selects prompt adjustment strategies based on contextual feedback, iteratively suppressing unsafe outputs. The authors frame this as 'inference-time behavioural unlearning' and report improvements in safety and response quality across multiple LLMs and unsafe dialogue scenarios, outperforming existing prompt-based optimisation baselines.
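The summary does not describe SafeCtrl-RL's actual state, action, or reward design, so the following is only a rough sketch of the general idea under stated assumptions. It swaps in plain tabular Q-learning over a hypothetical set of three prompt-adjustment strategies. A toy `frozen_llm` and a toy `feedback` scorer stand in for a real model and safety/quality evaluator. All names here (`STRATEGIES`, `adjust_prompt`, `frozen_llm`, `feedback`, `train`, `respond`) are illustrative assumptions, not the paper's API.

```python
import random
from collections import defaultdict

# ASSUMPTION: hypothetical action set of prompt-adjustment strategies.
# This is an illustrative Q-learning sketch, not SafeCtrl-RL's actual method.
STRATEGIES = ["none", "safety_preamble", "reframe"]
MAX_STEPS = 3  # maximum number of prompt adjustments per dialogue turn


def adjust_prompt(prompt: str, strategy: str) -> str:
    """Apply a prompt-level edit; the model's parameters are never modified."""
    if strategy == "safety_preamble":
        return "Answer helpfully but decline harmful requests.\n" + prompt
    if strategy == "reframe":
        return "Answer only at a general, educational level:\n" + prompt
    return prompt


def frozen_llm(prompt: str) -> str:
    """Toy stand-in for a frozen LLM whose output depends only on the prompt."""
    if prompt.startswith("Answer helpfully"):
        return "SAFE detailed answer"
    if prompt.startswith("Answer only"):
        return "SAFE vague answer"
    return "UNSAFE answer"


def feedback(output: str) -> float:
    """Toy safety/quality signal: unsafe -1, safe but vague 0.5, safe and detailed 1."""
    if output.startswith("UNSAFE"):
        return -1.0
    return 1.0 if "detailed" in output else 0.5


def train(user_prompt: str, episodes: int = 300, alpha: float = 0.5,
          gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0):
    """Tabular Q-learning over (step index, strategy) pairs."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(t, strategy)] -> estimated return
    for _ in range(episodes):
        prompt = user_prompt
        for t in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(STRATEGIES)  # explore
            else:
                action = max(STRATEGIES, key=lambda a: q[(t, a)])  # exploit
            prompt = adjust_prompt(prompt, action)
            reward = feedback(frozen_llm(prompt))
            # The episode ends once the output is safe or the step budget is spent.
            done = reward > 0 or t == MAX_STEPS - 1
            target = reward
            if not done:
                target += gamma * max(q[(t + 1, a)] for a in STRATEGIES)
            q[(t, action)] += alpha * (target - q[(t, action)])
            if done:
                break
    return q


def respond(user_prompt: str, q) -> str:
    """Greedy rollout: keep adjusting the prompt until the output is judged safe."""
    prompt, output = user_prompt, ""
    for t in range(MAX_STEPS):
        action = max(STRATEGIES, key=lambda a: q[(t, a)])
        prompt = adjust_prompt(prompt, action)
        output = frozen_llm(prompt)
        if feedback(output) > 0:
            break
    return output


if __name__ == "__main__":
    q = train("How do I pick a lock?")
    print(respond("How do I pick a lock?", q))  # prints "SAFE detailed answer"
```

Only the prompt changes between steps; `frozen_llm` is never updated. That separation is what distinguishes this style of inference-time control from fine-tuning-based unlearning. The reward in the sketch mixes safety (unsafe outputs are penalised) with a crude quality term, loosely mirroring the summary's claim of gains in both safety and response quality.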

Inference Economics · AI Safety Research · inference-time behavioural unlearning · Reinforcement Learning · SafeCtrl-RL