Entity · technique

adversarial training

technique · active · adversarial-training-4bbe420e · 3 events · first seen May 19, 2026

Aliases: adversarial training

Co-occurring entities

KV Cache · representation-level sensitive information leakage · LCGuard · multi-agent systems · latent communication · adversarial robustness · L-infinity perturbation · L2 perturbation · OpenAI · MNIST · Hugging Face

More like this (12)

adversarial examples · adversarial refinement · adversarial robustness · collaborative distributed training · distributed training · adversarial pragmatics · Adversarial Attacks on Neural Network Policies · self-training · ART (Agent Reinforcement Trainer) · Confessions (training method) · automated red teaming · Task-Agnostic Pretraining (TAP)

Recent events (3)

6 · arXiv · cs.AI · May 22, 2026

LCGuard: Adversarial Training Framework for Safe KV Cache Sharing in Multi-Agent LLM Systems

LCGuard introduces a framework for preventing sensitive information leakage when multi-agent LLM systems share KV caches as a latent communication channel. The approach formalizes leakage operationally via reconstruction: a shared cache artifact is deemed unsafe if an adversarial decoder can recover sensitive inputs from it. An adversarial training loop pits a reconstructor against LCGuard's representation-level transformations, which aim to preserve task-relevant semantics while suppressing recoverable sensitive content. Empirical results across multiple model families and multi-agent benchmarks show reduced reconstruction-based leakage and attack success rates with competitive task performance.
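The reconstructor-versus-transformation loop described above has the general shape of a min-max objective. The following is a generic sketch of such an objective, not the formulation from the LCGuard paper, which this summary does not give. All symbols are assumptions introduced for illustration: $c$ is a KV cache artifact, $s$ the sensitive input, $T_\theta$ the protective transformation, $R_\phi$ the adversarial reconstructor, and $\lambda$ a trade-off weight.

```latex
\min_{\theta} \; \max_{\phi} \;
  \mathcal{L}_{\text{task}}\bigl(T_\theta(c)\bigr)
  \;-\; \lambda \, \mathcal{L}_{\text{rec}}\bigl(R_\phi(T_\theta(c)),\, s\bigr)
```

Under this reading, the inner maximization trains the reconstructor to drive its reconstruction loss down, while the outer minimization updates the transformation to keep the task loss low and the reconstruction loss high.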

Inference Economics · AI Safety Research · KV Cache · representation-level sensitive information leakage · LCGuard · +4 more

4 · OpenAI Blog · May 20, 2026

Transfer of Adversarial Robustness Between Perturbation Types

OpenAI published research examining whether robustness gained by adversarial training against one perturbation type (e.g., L-infinity) transfers to other perturbation types (e.g., L2, L1). The work investigates how well adversarial training generalizes across threat models. It is an early safety and robustness contribution from OpenAI that predates the modern LLM era.
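To make the difference between perturbation types concrete, here is a small sketch of projecting a perturbation onto an L-infinity ball and an L2 ball. It is illustrative only and not taken from the OpenAI work; the function names, the radius 0.1, and the input dimension 784 (a flattened 28x28 image) are assumptions.

```python
import math

def linf_norm(v):
    """Largest absolute coordinate."""
    return max(abs(x) for x in v)

def l2_norm(v):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in v))

def project_linf(delta, eps):
    """Clip each coordinate into [-eps, eps]."""
    return [max(-eps, min(eps, x)) for x in delta]

def project_l2(delta, eps):
    """Rescale delta onto the L2 ball of radius eps if it lies outside it."""
    n = l2_norm(delta)
    if n <= eps:
        return list(delta)
    return [x * eps / n for x in delta]

d = 784  # hypothetical input dimension (flattened 28x28 image)
corner = project_linf([1.0] * d, 0.1)  # a corner of the L-infinity ball
shrunk = project_l2(corner, 0.1)       # same direction, forced into the L2 ball

print("corner:", linf_norm(corner), l2_norm(corner))
print("shrunk:", linf_norm(shrunk), l2_norm(shrunk))
```

A corner of the L-infinity ball of radius 0.1 has an L2 norm of about 2.8 in 784 dimensions, so the two threat models admit very different perturbations at the same nominal radius. This is one reason transfer between them is not automatic.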

Evaluation and Benchmarking · AI Safety Research · adversarial robustness · L-infinity perturbation · adversarial training · +2 more

3 · Hugging Face Blog · May 19, 2026

How to Train Your Model Dynamically Using Adversarial Data

This Hugging Face blog post describes a method for training models dynamically on adversarial data, likely to improve robustness against adversarial examples. It covers techniques for generating adversarial inputs and incorporating them into the training loop. Published in mid-2022, it targets practitioners who want to harden ML models against distribution shift and adversarial attacks.
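One common way to generate adversarial inputs inside a training loop is the fast gradient sign method (FGSM). The sketch below applies it to a tiny logistic-regression model in pure Python. It is a generic illustration of adversarial training, not the method from the Hugging Face post; the toy data, the function names, and the hyperparameters (`eps`, `lr`, `epochs`) are all assumptions.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """Shift x by eps per coordinate along the sign of the loss gradient."""
    g = predict(w, b, x) - y  # for logistic loss, d(loss)/dx = g * w
    return [xi + eps * sign(g * wi) for wi, xi in zip(w, x)]

def adversarial_train(data, eps, epochs=50, lr=0.1, seed=0):
    """SGD in which every update uses an FGSM-perturbed copy of the input."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x, y in order:
            x_adv = fgsm(w, b, x, y, eps)  # attack the current model
            g = predict(w, b, x_adv) - y   # gradient at the adversarial point
            w = [wi - lr * g * xi for wi, xi in zip(w, x_adv)]
            b -= lr * g
    return w, b

def robust_accuracy(w, b, data, eps):
    """Accuracy measured on FGSM-perturbed inputs."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += int((predict(w, b, x_adv) >= 0.5) == (y == 1))
    return hits / len(data)

def make_data(n, seed=0):
    """Toy 2-D data: class 1 near (2, 2), class 0 near (-2, -2)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        center = 2.0 if y == 1 else -2.0
        data.append(([center + rng.uniform(-0.5, 0.5) for _ in range(2)], y))
    return data

data = make_data(40)
w, b = adversarial_train(data, eps=0.5)
print("robust accuracy:", robust_accuracy(w, b, data, eps=0.5))
```

The key design choice is that the attack is recomputed against the current weights at every step, so the model always trains on inputs that are adversarial for its present state rather than on a fixed, precomputed set.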

AI Safety Research · MNIST · Hugging Face · adversarial training