technique
Jailbreak
techniqueactive
jailbreak-0be9e8ca·1 events·first seen 28d agoAliases: Jailbreak
Co-occurring entities
More like this (12)
Recent events (1)
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
OpenAI published research on the 'instruction hierarchy,' a training approach that teaches LLMs to prioritize instructions based on their source privilege level (system prompt > user > third-party). The method aims to make models more robust against prompt injection, jailbreaks, and adversarial instruction overrides. By training models to recognize and respect a hierarchy of instruction authority, OpenAI seeks to reduce the attack surface for multi-agent and deployed LLM systems.