Almanac
technique

shielded reinforcement learning

technique · active · provisional · shielded-reinforcement-learning-116ea138 · 1 event · first seen 5d ago

Aliases: shielded reinforcement learning

Recent events (1)

5 · arXiv · cs.LG · 5d ago

Shield synthesis reframed as design-time defensibility analysis for adversarial network security games

A new arXiv preprint argues that the automata-theoretic machinery of shielded reinforcement learning is better used as a design-time analytical tool than as a runtime safety enforcer. The authors instantiate this with a two-player safety game for network defense, producing a 'defensibility verdict' (a formal certificate of whether a topology-specification pair can be defended) along with a 'defensibility fingerprint' that combines formal safety properties with operational behavior under adaptive play. A what-if analysis shows that formal defensibility and operational effectiveness are distinct dimensions: small architectural changes can shift operational outcomes dramatically while leaving formal safety margins nearly unchanged. The work reframes shield synthesis as an architectural analysis framework rather than a deployment mechanism.
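
For intuition only: a defensibility verdict of this kind can be pictured as the result of solving a two-player safety game on a finite graph. The sketch below is not the preprint's model or algorithm. It is a generic fixed-point safety-game solver, and the toy topology, the state names, and the functions solve_safety_game and shield are all hypothetical. It assumes every state has at least one successor.

```python
from typing import Dict, List, Set


def solve_safety_game(
    edges: Dict[str, List[str]],
    defender_states: Set[str],
    unsafe: Set[str],
) -> Set[str]:
    """Return the states from which the defender can avoid `unsafe` forever.

    Greatest fixed point: start from all safe states and repeatedly drop
    defender states with no successor left in the set, and attacker states
    with at least one successor outside it.
    """
    winning = set(edges) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for state in list(winning):
            successors = edges[state]
            if state in defender_states:
                keeps_safe = any(t in winning for t in successors)
            else:
                keeps_safe = all(t in winning for t in successors)
            if not keeps_safe:
                winning.discard(state)
                changed = True
    return winning


def shield(edges: Dict[str, List[str]], winning: Set[str], state: str) -> List[str]:
    """Moves from `state` that stay inside the winning region."""
    return [t for t in edges[state] if t in winning]


# Hypothetical toy topology: the defender either patches or idles;
# if it idles, the attacker may reach a breach state.
edges = {
    "d0": ["a_patch", "a_idle"],   # defender chooses
    "a_patch": ["d0"],             # attacker has no harmful move
    "a_idle": ["d0", "breach"],    # attacker can breach
    "breach": ["breach"],          # unsafe sink
}
defender_states = {"d0"}
unsafe = {"breach"}

winning = solve_safety_game(edges, defender_states, unsafe)
verdict = "d0" in winning                 # defensible from the initial state?
allowed = shield(edges, winning, "d0")    # moves the shield would permit

# What-if: remove the patch option and re-run the analysis.
edges_no_patch = dict(edges, d0=["a_idle"])
verdict_no_patch = "d0" in solve_safety_game(edges_no_patch, defender_states, unsafe)
```

In this toy case the verdict is positive while the patch move exists and negative once it is removed. This covers only the formal side; the operational behavior under adaptive play that the summary describes is not modeled here.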