
adversarial robustness
adversarial-robustness-ba26e331·5 events·first seen 28d agoAliases: adversarial robustness
Co-occurring entities
More like this (12)
Recent events (5)
Transfer of Adversarial Robustness Between Perturbation Types
OpenAI published research examining whether adversarial robustness trained against one type of perturbation (e.g., L-infinity) transfers to other perturbation types (e.g., L2, L1). The work investigates the generalization properties of adversarial training across different threat models. This is an early safety and robustness research contribution from OpenAI predating the modern LLM era.
Trading Inference-Time Compute for Adversarial Robustness
OpenAI published research exploring the trade-off between inference-time compute and adversarial robustness. The work investigates whether allocating more compute at inference time can improve a model's resistance to adversarial attacks. This connects to the broader trend of using test-time compute scaling as a lever for capability and safety improvements.
Testing Robustness Against Unforeseen Adversaries
OpenAI published a method to evaluate whether neural network classifiers can defend against adversarial attacks not encountered during training. The approach introduces a new metric called UAR (Unforeseen Attack Robustness) to quantify a model's resilience to unanticipated attacks. The work argues for measuring robustness across a broader, more diverse set of attack types rather than only those seen in training.
Computational limitations in robust classification and win-win results
OpenAI published research examining computational limitations in robust classification, exploring theoretical bounds on adversarially robust machine learning. The work investigates so-called 'win-win' results where both standard and robust accuracy can be achieved simultaneously. This is a foundational safety and robustness research contribution from 2019, addressing hardness results in adversarial ML.
Attacking Machine Learning with Adversarial Examples
This 2017 OpenAI blog post introduces adversarial examples — inputs intentionally crafted to cause machine learning models to make mistakes, analogized to optical illusions for machines. It surveys how adversarial examples manifest across different input modalities and discusses the fundamental difficulties in defending against them. The post is an early foundational explainer on adversarial robustness from OpenAI.