Entity · technique

SecAlign

techniqueactivesecalign-227860c6·1 events·first seen May 18, 2026

Aliases: SecAlign

Co-occurring entities

StruQ Berkeley AI Research (BAIR)Instruction Hierarchy Llama3-8B-Instruct Direct Preference Optimization (DPO)AlpacaEval 2 OpenAI Sizhe Chen

More like this (12)

ALIGN VecAlign MedAlign AlignAtt G-IdiomAlign AI alignment Positive Alignment hidden misalignment misalignment detection CLIPSeg AlignAtt4LLM The Alignment Project

Recent events (1)

6Berkeley Ai Research (Bair) Blog·May 18, 2026·source ↗

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Researchers from BAIR propose two fine-tuning-based defenses against prompt injection attacks: StruQ (Structured Instruction Tuning) and SecAlign (Special Preference Optimization). Both methods use a Secure Front-End with special delimiter tokens to separate trusted prompts from untrusted data, then fine-tune LLMs to ignore injected instructions. SecAlign, which uses DPO-style preference optimization, reduces attack success rates to under 15% against strong optimization-based attacks—more than 4x better than prior SOTA—while preserving model utility on AlpacaEval2.

AI Safety Research Agent and Tool Ecosystem StruQ SecAlign Berkeley AI Research (BAIR)+7 more