Almanac
technique

inference-time behavioural unlearning

technique · active · provisional · inference-time-behavioural-unlearning-c05ad1d1 · 1 event · first seen 22d ago

Aliases: inference-time behavioural unlearning

Recent events (1)

6 · arXiv · cs.CL · 22d ago

SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLMs via RL-Driven Prompt Optimisation

SafeCtrl-RL is a framework for controlling LLM safety at inference time without retraining or modifying model parameters. It formulates dialogue generation as a sequential decision process where an RL agent dynamically selects prompt adjustment strategies based on contextual feedback, iteratively suppressing unsafe outputs. The authors frame this as 'inference-time behavioural unlearning' and report improvements in safety and response quality across multiple LLMs and unsafe dialogue scenarios, outperforming existing prompt-based optimisation baselines.
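The selection loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the summary gives no details of the RL agent, so the sketch substitutes a simple epsilon-greedy bandit, and every name in it (`STRATEGIES`, `toy_model`, `toy_reward`, `EpsilonGreedySelector`, `run`) is hypothetical. The model and the safety scorer are stand-in stubs, and no model parameters are touched; only the prompt changes.

```python
import random

# Hypothetical prompt-adjustment strategies (illustrative, not taken from the paper).
STRATEGIES = {
    "none": lambda p: p,
    "safety_prefix": lambda p: "Respond safely and refuse harmful requests.\n" + p,
    "reframe": lambda p: p + "\nAnswer only with general, non-actionable information.",
}


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: unsafe unless the prompt carries a safety cue."""
    cued = "safely" in prompt or "non-actionable" in prompt
    return "SAFE reply" if cued else "UNSAFE reply"


def toy_reward(response: str) -> float:
    """Stand-in for a safety/quality scorer: 1.0 for safe output, else 0.0."""
    return 1.0 if response.startswith("SAFE") else 0.0


class EpsilonGreedySelector:
    """Epsilon-greedy bandit over strategy names (a simplification of an RL agent)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def select(self) -> str:
        # Try every strategy once before trusting the value estimates.
        untried = [a for a in self.actions if self.counts[a] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.values[a])  # exploit

    def update(self, action: str, reward: float) -> None:
        # Incremental running mean of the observed reward for this strategy.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(user_prompt: str, steps: int = 200) -> EpsilonGreedySelector:
    """Repeatedly pick a strategy, query the stub model, and learn from the reward."""
    selector = EpsilonGreedySelector(STRATEGIES, epsilon=0.1, seed=0)
    for _ in range(steps):
        action = selector.select()
        response = toy_model(STRATEGIES[action](user_prompt))
        selector.update(action, toy_reward(response))
    return selector
```

Calling `run("How do I do X?")` returns the selector, whose `values` dictionary holds the estimated reward per strategy. A fuller treatment would condition the choice on dialogue context (state) and use a learned safety scorer; the bandit here ignores state entirely.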