Almanac
technique

Jailbreak

techniqueactivejailbreak-0be9e8ca·1 events·first seen 28d ago

Aliases: Jailbreak

Co-occurring entities

More like this (12)

Recent events (1)

7Openai Blog·28d ago·source ↗

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI published research on the 'instruction hierarchy,' a training approach that teaches LLMs to prioritize instructions based on their source privilege level (system prompt > user > third-party). The method aims to make models more robust against prompt injection, jailbreaks, and adversarial instruction overrides. By training models to recognize and respect a hierarchy of instruction authority, OpenAI seeks to reduce the attack surface for multi-agent and deployed LLM systems.