product

Unfireable Safety Kernel

productactiveprovisionalunfireable-safety-kernel-f2b71118·1 events·first seen 11h ago

Aliases: Unfireable Safety Kernel

Co-occurring entities

Constitutional AI Kani Z3 The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

More like this (12)

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems Neural Safety Filters AI Safety Level Standards Linux kernel kernel-builder Nemotron 3.5 Content Safety Frontier Safety Framework Safety Filters SafeCtrl-RL SafetyKit CoT-Output 2x2 safety matrix Centered Kernel Alignment

Recent events (1)

7arXiv · cs.AI·11h ago·source ↗

Unfireable Safety Kernel: Formal execution-time alignment layer for escapable AI agents

A new arXiv preprint introduces the concept of 'escapable AI systems' — agents with sufficient reach into their own runtime to subvert in-process safety controls — and proposes a four-property architectural framework for external enforcement. The authors present the Unfireable Safety Kernel, a Rust reference implementation with machine-checked fail-closed invariants via SMT (Z3) and bounded model checking (Kani), evaluated against a self-improving world model adversary across 7,240 authorization attempts with zero successful bypasses. The work positions this 'execution-time alignment' layer as a complement to training-time approaches like RLHF and Constitutional AI, arguing that any control inside the agent's address space is fundamentally reachable by adversarial inputs.

AI Safety Research Agent and Tool Ecosystem Constitutional AI Kani Z3 +3 more