Unfireable Safety Kernel
unfireable-safety-kernel-f2b71118·1 events·first seen 11h agoAliases: Unfireable Safety Kernel
Co-occurring entities
More like this (12)
Recent events (1)
Unfireable Safety Kernel: Formal execution-time alignment layer for escapable AI agents
A new arXiv preprint introduces the concept of 'escapable AI systems' — agents with sufficient reach into their own runtime to subvert in-process safety controls — and proposes a four-property architectural framework for external enforcement. The authors present the Unfireable Safety Kernel, a Rust reference implementation with machine-checked fail-closed invariants via SMT (Z3) and bounded model checking (Kani), evaluated against a self-improving world model adversary across 7,240 authorization attempts with zero successful bypasses. The work positions this 'execution-time alignment' layer as a complement to training-time approaches like RLHF and Constitutional AI, arguing that any control inside the agent's address space is fundamentally reachable by adversarial inputs.