Almanac
paper

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

paperactiveprovisionalthe-unfireable-safety-kernel-execution-time-ai-alignment-for-ai-agents-and-other-escapable-ai-systems-7e160207·1 events·first seen 11h ago

Aliases: The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

Co-occurring entities

More like this (12)

Recent events (1)

7arXiv · cs.AI·11h ago·source ↗

Unfireable Safety Kernel: Formal execution-time alignment layer for escapable AI agents

A new arXiv preprint introduces the concept of 'escapable AI systems' — agents with sufficient reach into their own runtime to subvert in-process safety controls — and proposes a four-property architectural framework for external enforcement. The authors present the Unfireable Safety Kernel, a Rust reference implementation with machine-checked fail-closed invariants via SMT (Z3) and bounded model checking (Kani), evaluated against a self-improving world model adversary across 7,240 authorization attempts with zero successful bypasses. The work positions this 'execution-time alignment' layer as a complement to training-time approaches like RLHF and Constitutional AI, arguing that any control inside the agent's address space is fundamentally reachable by adversarial inputs.