TraitFactory

benchmark · active · provisional · traitfactory-1e23f74f · 1 event · first seen 41h ago

Aliases: TraitFactory

Recent events (1)

5 · arXiv · cs.CL · 41h ago

ORBIT: Training-free multi-attribute behavioral steering via orthogonal subspace rotation

Researchers introduce ORBIT (Orthogonal Rotation-Based Intervention Technique), a training-free activation steering method that controls multiple behavioral attributes in a language model simultaneously. The approach constructs a joint subspace from per-attribute steering planes via SVD and applies a single norm-preserving rotation, avoiding the norm imbalance and directional cancellation that arise when steering vectors are naively summed. The authors also release TraitFactory, a new multi-attribute behavioral benchmark, and evaluate ORBIT on Llama-3.2-3B, Qwen-2.5-7B, and Llama-3.1-8B. They report that ORBIT outperforms existing training-free baselines on multi-attribute steering while better preserving output coherence.
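
The idea in the summary above (joint subspace via SVD, then one norm-preserving rotation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each attribute's steering plane is given as a (d, 2) matrix and that steering means rotating the in-subspace part of a hidden state toward a combined target direction. The names `joint_subspace` and `rotate_in_subspace`, the choice of target, the angle `theta`, and the random demo data are all hypothetical.

```python
import numpy as np


def joint_subspace(planes, tol=1e-8):
    """Orthonormal basis for the span of all per-attribute planes.

    planes: list of (d, 2) arrays, one steering plane per attribute (assumed format).
    """
    stacked = np.concatenate(planes, axis=1)            # (d, 2 * n_attributes)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))                  # drop numerically null directions
    return u[:, :rank]                                  # (d, rank), orthonormal columns


def rotate_in_subspace(h, basis, target, theta):
    """Rotate the in-subspace part of h by angle theta toward target.

    The component of h orthogonal to the subspace is left untouched, and the
    in-subspace component keeps its length, so the norm of h is preserved.
    """
    c = basis.T @ h                                     # subspace coordinates of h
    t = basis.T @ target                                # subspace coordinates of target
    h_perp = h - basis @ c                              # part outside the subspace
    c_norm = np.linalg.norm(c)
    if c_norm < 1e-12:
        return h.copy()                                 # nothing to rotate
    u = c / c_norm
    v = t - (t @ u) * u                                 # part of target orthogonal to u
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-12:
        return h.copy()                                 # target is parallel to c
    v = v / v_norm
    c_rot = c_norm * (np.cos(theta) * u + np.sin(theta) * v)
    return h_perp + basis @ c_rot


# Hypothetical demo data: 3 attributes in a 16-dimensional activation space.
rng = np.random.default_rng(0)
d = 16
planes = [rng.standard_normal((d, 2)) for _ in range(3)]
basis = joint_subspace(planes)

# Assumed combined target: sum of the normalized first direction of each plane.
target = sum(p[:, 0] / np.linalg.norm(p[:, 0]) for p in planes)

h = rng.standard_normal(d)
h_rotated = rotate_in_subspace(h, basis, target, theta=np.pi / 6)
h_summed = h + target                                   # naive vector summation

norm_before = np.linalg.norm(h)
norm_rotated = np.linalg.norm(h_rotated)                # equals norm_before up to rounding
norm_summed = np.linalg.norm(h_summed)                  # generally differs from norm_before
```

In this sketch the rotated state keeps the original norm by construction, whereas the naive sum `h + target` generally does not, which is the norm imbalance the summary refers to. How ORBIT actually chooses the rotation is not specified in the summary.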