4 · arXiv · cs.LG (Machine Learning) · 3d ago

Imitation learning technique infers red agent policy in partially observable cyber-defense environments

Researchers propose a Policy Learning Technique using imitation learning to infer attacker (red agent) policies from network observations and defender actions in partially observable autonomous cyber environments. The method integrates with neurosymbolic cyber-defense agents that use behavior trees with learning-enabled components. Evaluated across diverse simulated scenarios, the approach achieves high prediction accuracy for red agent actions, improving the defender's ability to anticipate intrusions.

AI Safety Research · Agent and Tool Ecosystem · Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents · behavior trees with learning-enabled components

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

5 · arXiv · cs.AI · 11d ago

DARP: Semi-parametric retrieval-based imitation learning reduces compounding errors by 15-46%

Researchers introduce DARP (Difference-Aware Retrieval Policies), a semi-parametric imitation learning method that retrieves k-nearest neighbor demonstrations at inference time and predicts actions based on relative distance vectors between neighbor and query states. The approach reparameterizes behavior cloning around local neighborhood structure rather than global state-to-action mappings, requiring no additional data collection or online expert feedback. Across continuous control and robotic manipulation tasks, DARP shows 15-46% performance improvements over standard behavior cloning, including on high-dimensional visual inputs.

Agent and Tool Ecosystem · DARP · Difference-Aware Retrieval Policies for Imitation Learning
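
The retrieval step described in the DARP summary can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 1-D data, and especially `linear_correction` (a hand-written stand-in for whatever learned model the paper uses) are assumptions made for illustration. It only shows the general idea of retrieving the k nearest demonstrations and predicting from the difference vector between the query state and each neighbor state.

```python
import math


def k_nearest(demos, query, k):
    """Return the k (state, action) demonstrations closest to the query state."""
    return sorted(demos, key=lambda d: math.dist(d[0], query))[:k]


def linear_correction(action, diff, gain=1.0):
    """Hypothetical stand-in for a learned model: shift the neighbor's
    action in proportion to the state difference (same dimensionality)."""
    return [a + gain * d for a, d in zip(action, diff)]


def predict_action(demos, query, k, correction):
    """Average per-neighbor predictions, each computed from the neighbor's
    action and the difference vector (query state - neighbor state)."""
    neighbors = k_nearest(demos, query, k)
    total = [0.0] * len(neighbors[0][1])
    for state, action in neighbors:
        diff = [q - s for q, s in zip(query, state)]
        adjusted = correction(action, diff)
        total = [t + a for t, a in zip(total, adjusted)]
    return [t / len(neighbors) for t in total]


# Toy demonstrations from an assumed expert with action = 2 * state.
demos = [([0.0], [0.0]), ([1.0], [2.0]), ([2.0], [4.0]), ([3.0], [6.0])]

plain = predict_action(demos, [1.25], 2, lambda a, d: linear_correction(a, d, gain=0.0))
aware = predict_action(demos, [1.25], 2, lambda a, d: linear_correction(a, d, gain=2.0))
print(plain, aware)
```

With `gain=0.0` the sketch reduces to plain nearest-neighbor averaging and returns 3.0 for the query state 1.25. With `gain=2.0` each neighbor's action is corrected by its difference vector and the prediction is 2.5, which matches the assumed expert.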

4 · OpenAI Blog · 1mo ago

Adversarial Attacks on Neural Network Policies

OpenAI published research examining adversarial attacks on neural network-based reinforcement learning policies. The work investigates how small, carefully crafted perturbations to observations can cause trained RL agents to fail catastrophically. This represents an early investigation into the robustness and safety of learned policies under adversarial conditions.

Evaluation and Benchmarking · AI Safety Research · adversarial examples · Adversarial Attacks on Neural Network Policies · Reinforcement Learning
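
To make "small, carefully crafted perturbations to observations" concrete, here is a minimal sketch of one standard way to build such a perturbation, the fast gradient sign method, applied to a toy linear softmax policy. The summary above does not say which attack the authors used, so the method choice, the weight matrix `W`, the observation `x`, and the budget `eps` are all assumptions for illustration.

```python
import math


def logits(W, x):
    """Linear policy: one logit per action."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def greedy_action(W, x):
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)


def fgsm(W, x, eps):
    """Perturb x by eps in the sign of the gradient of the cross-entropy
    loss taken with respect to the policy's own greedy action."""
    p = softmax(logits(W, x))
    target = greedy_action(W, x)
    # For logits z = W x, d(loss)/dx = W^T (p - onehot(target)).
    err = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    grad = [sum(W[i][j] * err[i] for i in range(len(W))) for j in range(len(x))]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical two-action policy and observation.
W = [[1.0, 0.0], [0.0, 1.0]]
x = [0.55, 0.45]
x_adv = fgsm(W, x, eps=0.1)
print(greedy_action(W, x), greedy_action(W, x_adv))
```

In this toy setup a perturbation of at most 0.1 per observation dimension is enough to flip the policy's greedy action from 0 to 1, because the original observation sits close to the decision boundary.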

4 · arXiv · cs.LG · 11d ago

Agency-transferring technique improves RL policy training by bootstrapping from baseline policies

A new arXiv paper proposes a model-free reinforcement learning method that embeds an existing suboptimal baseline policy into training via an arbitration mechanism, progressively transferring control from the baseline to a trainable neural network. The approach yields high goal-reaching rates from the start of training and produces a standalone policy that outperforms the baseline without requiring it at inference time. Theoretical bounds on goal-reaching probability are derived, and empirical results on continuous-control benchmarks show competitive or superior returns compared to existing methods.

Alignment and RLHF · An Agency-Transferring Model-Free Policy Enhancement Technique

6 · arXiv · cs.AI · 9d ago

Ambient Diffusion Policy: imitation learning from suboptimal robot data via noise-dependent co-training

Researchers introduce Ambient Diffusion Policy, a method for robot imitation learning that extracts useful features from suboptimal demonstrations by restricting their contribution to specific diffusion timesteps (high and low noise levels). The approach is grounded in the observation that robot action data follows a spectral power law, inducing global-to-local hierarchy and locality properties in diffusion models. Evaluated across six tasks and four types of suboptimal data, it outperforms co-training baselines by up to 33% when scaled to the Open X-Embodiment dataset.

Training Infrastructure · Diffusion Policy · Ambient Diffusion Policy · Open X-Embodiment

4 · OpenAI Blog · 1mo ago

One-shot imitation learning

OpenAI published research on one-shot imitation learning, a technique enabling agents to learn new tasks from a single demonstration. The approach allows a policy network to observe a demonstration and immediately generalize to new instances of the same task without additional training. This was an early contribution to the field of meta-learning and few-shot generalization in robotics and sequential decision-making.

Agent and Tool Ecosystem · Alignment and RLHF · One-Shot Imitation Learning · meta-learning · Imitation Learning

5 · arXiv · cs.CL · 11d ago

RedAct framework protects procedural skills in agent execution traces via selective redaction and watermarking

Researchers introduce RedAct, a framework for releasing agent execution traces without exposing proprietary procedural skills (tool invocations, decision logic, error-recovery strategies). The system localizes sensitive information, rewrites traces while preserving audit-critical evidence, and embeds behavioral watermarks for provenance tracking. To evaluate the approach, the authors construct CapTraceBench, a benchmark of 75 long-horizon tasks and 154 skills across seven domains. RedAct reduces normalized skill transfer from 44.7–67.1% on raw traces to below the no-skill baseline, while watermark detection achieves 93.6–100% true positive rate with under 2% false alarms.

Evaluation and Benchmarking · AI Safety Research · RedAct · CapTraceBench · Xu Shuwen

6 · OpenAI Blog · 1mo ago

Advancing Red Teaming with People and AI

OpenAI published a blog post describing advances in their red teaming methodology, combining human red teamers with AI-assisted approaches. The post outlines how AI tools are being integrated into the red teaming pipeline to improve coverage and efficiency of safety evaluations. This represents an evolution in OpenAI's pre-deployment safety testing practices.

Evaluation and Benchmarking · AI Safety Research · AI-assisted red teaming · OpenAI · human red teaming

4 · Hugging Face Blog · 1mo ago

Red-Teaming Large Language Models

This Hugging Face blog post introduces red-teaming as a safety evaluation methodology for large language models, explaining how adversarial testing can surface harmful outputs, biases, and failure modes before deployment. It covers techniques for systematically probing LLMs to elicit problematic behaviors and discusses the role of red-teaming in responsible AI development. The post serves as an educational overview aimed at practitioners working on LLM safety.

Evaluation and Benchmarking · AI Safety Research · large language models · Hugging Face · red-teaming