Entity · technique

ALIGNBEAM

technique · active · alignbeam-d45de65f · 1 event · first seen Jun 11, 2026

Aliases: ALIGNBEAM

More like this (12)

Beam Search · ALIGN · BrightbeamAI · Constrained Beam Search · Andy Beam · BEIR · MedAlign · BEA-Dialogue · BITEMBED · AlignAtt · alignment tax · AI alignment

Recent events (1)

6 · arXiv · cs.CL · Jun 11, 2026

ALIGNBEAM: Training-free safety alignment transfer across model families at inference time

ALIGNBEAM is a training-free inference-time method that transfers safety alignment from a safe anchor model to a domain-fine-tuned target model, even when the two models have different vocabularies. It works by translating anchor logits into the target model's vocabulary token-by-token at each decoding step, then using a small LLM judge to select the safest among K candidate continuations. The method addresses a known vulnerability where domain fine-tuning degrades safety, and demonstrates substantial refusal improvements on adversarial benchmarks without retraining either model or incurring prohibitive inference overhead.
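The decoding step described above can be illustrated with a toy sketch. This is not the paper's implementation: the exact-string-match translation rule, the 50/50 mixing weight, the single-token candidates, and the keyword-based `toy_judge` are all assumptions made for illustration, and every function name here is hypothetical. It only shows the general shape of the idea: project the anchor's distribution onto the target vocabulary, form K candidates, and let a judge pick the safest.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def translate_anchor_probs(anchor_probs, anchor_vocab, target_vocab):
    """Project anchor probabilities onto the target vocabulary.

    Assumption: tokens are matched by exact string; anchor mass on tokens
    missing from the target vocabulary is dropped, then we renormalize.
    """
    index = {tok: i for i, tok in enumerate(target_vocab)}
    out = [0.0] * len(target_vocab)
    for p, tok in zip(anchor_probs, anchor_vocab):
        if tok in index:
            out[index[tok]] += p
    total = sum(out)
    return [p / total for p in out] if total > 0 else out


def top_k_candidates(target_logits, translated_probs, target_vocab, k, weight=0.5):
    """Mix target and translated anchor distributions, return the K best tokens.

    Assumption: a simple linear mix with a fixed weight.
    """
    target_probs = softmax(target_logits)
    mixed = [(1 - weight) * t + weight * a
             for t, a in zip(target_probs, translated_probs)]
    ranked = sorted(range(len(mixed)), key=lambda i: mixed[i], reverse=True)
    return [target_vocab[i] for i in ranked[:k]]


def toy_judge(candidate):
    """Stand-in for the small LLM judge: a higher score means safer."""
    return 1.0 if candidate in {"Sorry", "cannot"} else 0.0


def select_safest(candidates, judge):
    """Return the candidate the judge scores as safest."""
    return max(candidates, key=judge)


# Two models with different vocabularies (toy data).
anchor_vocab = ["Sorry", "Sure", "the"]
target_vocab = ["Sure", "Sorry", "here", "the"]

anchor_logits = [3.0, 0.0, 0.0]       # safe anchor prefers to refuse
target_logits = [3.0, 0.0, 1.0, 0.0]  # fine-tuned target prefers to comply

translated = translate_anchor_probs(softmax(anchor_logits), anchor_vocab, target_vocab)
candidates = top_k_candidates(target_logits, translated, target_vocab, k=2)
chosen = select_safest(candidates, toy_judge)
print(candidates, chosen)
```

In this toy setup the target model alone would rank "Sure" first, but the translated anchor distribution brings "Sorry" into the candidate set, where the judge can select it. A real system would score multi-token continuations with an actual LLM judge rather than a keyword check.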

Inference Economics · AI Safety Research · ALIGNBEAM · +1 more