adversarial training
adversarial-training-4bbe420e·3 events·first seen 28d agoAliases: adversarial training
Co-occurring entities
More like this (12)
Recent events (3)
Transfer of Adversarial Robustness Between Perturbation Types
OpenAI published research examining whether adversarial robustness trained against one type of perturbation (e.g., L-infinity) transfers to other perturbation types (e.g., L2, L1). The work investigates the generalization properties of adversarial training across different threat models. This is an early safety and robustness research contribution from OpenAI predating the modern LLM era.
How to Train Your Model Dynamically Using Adversarial Data
This Hugging Face blog post describes a methodology for dynamically training models using adversarial data, likely in the context of improving robustness against adversarial examples. The post covers techniques for generating and incorporating adversarial inputs during the training loop to improve model resilience. Published in mid-2022, it targets practitioners looking to harden ML models against distribution shift and adversarial attacks.
LCGuard: Adversarial Training Framework for Safe KV Cache Sharing in Multi-Agent LLM Systems
LCGuard introduces a framework for preventing sensitive information leakage when multi-agent LLM systems share KV caches as a latent communication channel. The approach formalizes leakage operationally via reconstruction: a shared cache artifact is deemed unsafe if an adversarial decoder can recover sensitive inputs from it. An adversarial training loop pits a reconstructor against LCGuard's representation-level transformations, which aim to preserve task-relevant semantics while suppressing recoverable sensitive content. Empirical results across multiple model families and multi-agent benchmarks show reduced reconstruction-based leakage and attack success rates with competitive task performance.