technique
steering vectors
technique · active · provisional
steering-vectors-83924bbe · 1 event · first seen 9d ago
Aliases: steering vectors
Recent events (1)
SV-Detect: AI-generated text detection via steering vectors in representation space
SV-Detect detects machine-generated text by extracting steering vectors from the hidden representations of a frozen language model and constructing, at each layer, a direction that separates human-written from AI-written text. A lightweight classifier trained on the projections of text representations onto these directions performs strongly both in-distribution and under distribution shift across domains, source models, and editing attacks such as polishing and rewriting. The approach reframes AI-text detection as a representation-space probing problem; interpretation analyses indicate that the learned directions capture stylistic cues beyond surface features.
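The pipeline described above (layer-wise directions, projection features, lightweight classifier) can be illustrated with a minimal sketch. This is not the SV-Detect implementation: the summary does not say how the steering vectors are extracted, so the sketch assumes a simple difference of class means per layer, uses synthetic Gaussian arrays in place of real hidden states from a frozen model, and uses a small hand-written logistic regression as the classifier. All function names (`steering_vectors`, `projection_features`, `fit_logistic`, `predict`) are hypothetical.

```python
import numpy as np

def steering_vectors(h_human, h_ai):
    """One unit direction per layer, assumed here to be mean(AI) - mean(human).
    Inputs have shape (n_texts, n_layers, hidden_dim)."""
    diff = h_ai.mean(axis=0) - h_human.mean(axis=0)        # (n_layers, hidden_dim)
    return diff / np.linalg.norm(diff, axis=1, keepdims=True)

def projection_features(h, directions):
    """Project each layer's hidden state onto that layer's direction.
    Returns one scalar per layer: shape (n_texts, n_layers)."""
    return np.einsum("nld,ld->nl", h, directions)

def fit_logistic(x, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression (stand-in lightweight classifier)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def predict(x, w, b):
    """Label 1 = AI-written, 0 = human-written."""
    return (x @ w + b > 0).astype(int)

# Synthetic stand-in for hidden states; a real setup would read them
# from a frozen language model, one vector per layer per text.
rng = np.random.default_rng(0)
n_texts, n_layers, hidden_dim, n_train = 200, 4, 16, 150
shift = rng.normal(size=(n_layers, hidden_dim))            # assumed class offset
h_human = rng.normal(size=(n_texts, n_layers, hidden_dim))
h_ai = rng.normal(size=(n_texts, n_layers, hidden_dim)) + shift

# Directions are estimated from the training split only.
directions = steering_vectors(h_human[:n_train], h_ai[:n_train])

x_train = projection_features(
    np.concatenate([h_human[:n_train], h_ai[:n_train]]), directions)
y_train = np.concatenate([np.zeros(n_train), np.ones(n_train)])
w, b = fit_logistic(x_train, y_train)

x_test = projection_features(
    np.concatenate([h_human[n_train:], h_ai[n_train:]]), directions)
y_test = np.concatenate([np.zeros(n_texts - n_train), np.ones(n_texts - n_train)])
accuracy = (predict(x_test, w, b) == y_test).mean()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The classifier sees only one number per layer, which is what keeps it lightweight: the heavy lifting is done by the frozen model's representations and the choice of direction. The accuracy printed here reflects only the synthetic data and says nothing about results on real text.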