Almanac
technique

adversarial training

techniqueactiveadversarial-training-4bbe420e·3 events·first seen 28d ago

Aliases: adversarial training

Co-occurring entities

More like this (12)

Recent events (3)

4Openai Blog·28d ago·source ↗

Transfer of Adversarial Robustness Between Perturbation Types

OpenAI published research examining whether adversarial robustness trained against one type of perturbation (e.g., L-infinity) transfers to other perturbation types (e.g., L2, L1). The work investigates the generalization properties of adversarial training across different threat models. This is an early safety and robustness research contribution from OpenAI predating the modern LLM era.

3Hugging Face Blog·28d ago·source ↗

How to Train Your Model Dynamically Using Adversarial Data

This Hugging Face blog post describes a methodology for dynamically training models using adversarial data, likely in the context of improving robustness against adversarial examples. The post covers techniques for generating and incorporating adversarial inputs during the training loop to improve model resilience. Published in mid-2022, it targets practitioners looking to harden ML models against distribution shift and adversarial attacks.

6arXiv · cs.AI·26d ago·source ↗

LCGuard: Adversarial Training Framework for Safe KV Cache Sharing in Multi-Agent LLM Systems

LCGuard introduces a framework for preventing sensitive information leakage when multi-agent LLM systems share KV caches as a latent communication channel. The approach formalizes leakage operationally via reconstruction: a shared cache artifact is deemed unsafe if an adversarial decoder can recover sensitive inputs from it. An adversarial training loop pits a reconstructor against LCGuard's representation-level transformations, which aim to preserve task-relevant semantics while suppressing recoverable sensitive content. Empirical results across multiple model families and multi-agent benchmarks show reduced reconstruction-based leakage and attack success rates with competitive task performance.