Robust Adversarial Inputs: Multi-Scale Fooling of Neural Network Classifiers
OpenAI researchers created adversarial images that reliably fool neural network classifiers even when viewed from varied scales and perspectives. This directly challenges the assumption that self-driving car vision systems are robust to adversarial attacks due to their multi-angle image capture. The finding has implications for the security of deployed vision systems in safety-critical applications.
Related guides (3)
Related events (8)
Attacking Machine Learning with Adversarial Examples
This 2017 OpenAI blog post introduces adversarial examples — inputs intentionally crafted to cause machine learning models to make mistakes, analogized to optical illusions for machines. It surveys how adversarial examples manifest across different input modalities and discusses the fundamental difficulties in defending against them. The post is an early foundational explainer on adversarial robustness from OpenAI.
Testing Robustness Against Unforeseen Adversaries
OpenAI published a method to evaluate whether neural network classifiers can defend against adversarial attacks not encountered during training. The approach introduces a new metric called UAR (Unforeseen Attack Robustness) to quantify a model's resilience to unanticipated attacks. The work argues for measuring robustness across a broader, more diverse set of attack types rather than only those seen in training.
Adversarial Attacks on Neural Network Policies
OpenAI published research examining adversarial attacks on neural network-based reinforcement learning policies. The work investigates how small, carefully crafted perturbations to observations can cause trained RL agents to fail catastrophically. This represents an early investigation into the robustness and safety of learned policies under adversarial conditions.
Acoustic adversarial attacks on computer vision using audible frequencies disrupt YOLO11 object detection
Researchers demonstrate physical acoustic attacks on AI-based computer vision systems using audible frequencies (<20 kHz), extending prior ultrasonic work to longer effective ranges. By resonating a commercial camera, they induce motion artifacts that cause YOLO11 to misclassify, miss targets, or hallucinate objects. The paper characterizes which image and object features increase vulnerability, offering a foundation for future mitigation strategies. The attack vector is physically realizable against deployed systems including autonomous vehicles and security cameras.
Computational limitations in robust classification and win-win results
OpenAI published research examining computational limitations in robust classification, exploring theoretical bounds on adversarially robust machine learning. The work investigates so-called 'win-win' results where both standard and robust accuracy can be achieved simultaneously. This is a foundational safety and robustness research contribution from 2019, addressing hardness results in adversarial ML.
Adversarial methodology improves detection of AI-generated social bot content
Researchers introduce an adversarial framework that simulates malicious actors impersonating real social media users to generate training data for AI-content detection. The approach produces a multilingual, cross-platform dataset of paired human and AI-generated messages. Models trained on this adversarial data significantly outperform existing content-based bot detection systems on out-of-distribution real-world data.
How to Train Your Model Dynamically Using Adversarial Data
This Hugging Face blog post describes a methodology for dynamically training models using adversarial data, likely in the context of improving robustness against adversarial examples. The post covers techniques for generating and incorporating adversarial inputs during the training loop to improve model resilience. Published in mid-2022, it targets practitioners looking to harden ML models against distribution shift and adversarial attacks.
How AI Training Scales: Gradient Noise Scale Predicts Batch Parallelizability
OpenAI researchers report that the gradient noise scale — a statistical metric measuring gradient variance relative to mean — reliably predicts the optimal batch size and degree of parallelizability across a wide range of neural network training tasks. The finding suggests that more complex tasks with noisier gradients can benefit from increasingly large batch sizes, removing a potential ceiling on scaling. The work frames training dynamics as a systematic, measurable process rather than empirical art.


