OpenAI Blog · 1mo ago

Testing Robustness Against Unforeseen Adversaries

OpenAI published a method to evaluate whether neural network classifiers can defend against adversarial attacks not encountered during training. The approach introduces a new metric called UAR (Unforeseen Attack Robustness) to quantify a model's resilience to unanticipated attacks. The work argues for measuring robustness across a broader, more diverse set of attack types rather than only those seen in training.

Evaluation and Benchmarking · AI Safety Research · adversarial robustness · OpenAI · UAR (Unforeseen Attack Robustness)
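The UAR normalization can be sketched in a few lines. As we understand the paper's definition (treat the exact form as an assumption), UAR for an attack A compares the evaluated model M's accuracy against A at a set of calibrated distortion sizes ε_1 to ε_n with the accuracy of reference models M_k^A, each adversarially trained against A at size ε_k: UAR(A) = 100 × Σ_k Acc(A, ε_k, M) / Σ_k Acc(A, ε_k, M_k^A). The helper name `uar` is hypothetical, and the accuracies in the usage note are illustrative placeholders, not reported results.

```python
from typing import Sequence


def uar(defended_acc: Sequence[float], reference_acc: Sequence[float]) -> float:
    """Hypothetical UAR-style score (assumed definition, see lead-in).

    defended_acc[k]:  accuracy of the evaluated model under attack A at eps_k.
    reference_acc[k]: accuracy of a model adversarially trained against A at eps_k.
    """
    if len(defended_acc) != len(reference_acc) or not defended_acc:
        raise ValueError("need matching, non-empty accuracy lists")
    denom = sum(reference_acc)
    if denom <= 0:
        raise ValueError("reference accuracies must sum to a positive value")
    # Ratio of summed accuracies, scaled so that 100 means "as robust as
    # attack-specific adversarial training".
    return 100.0 * sum(defended_acc) / denom
```

For example, `uar([0.6, 0.4, 0.2], [0.8, 0.6, 0.4])` gives about 66.7: the evaluated model retains two thirds of the accuracy that training directly against A would achieve. Normalizing by attack-specific reference models is what lets scores for very different attack types be compared on one scale.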

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

OpenAI Blog · 1mo ago

Adversarial Attacks on Neural Network Policies

OpenAI published research examining adversarial attacks on neural network-based reinforcement learning policies. The work investigates how small, carefully crafted perturbations to observations can cause trained RL agents to fail catastrophically. This represents an early investigation into the robustness and safety of learned policies under adversarial conditions.

Evaluation and Benchmarking · AI Safety Research · adversarial examples · Adversarial Attacks on Neural Network Policies · Reinforcement Learning
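One standard way to craft such observation perturbations is the fast gradient sign method (FGSM): move every input feature by a fixed budget ε in the direction that most increases the policy's loss. The sketch below applies it to a hypothetical linear softmax policy over discrete actions; the function names and the toy weights are illustrative assumptions, and real agents use deep networks over raw frames, so this only shows the core mechanism.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over action logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def policy_logits(W: Matrix, obs: Vector) -> Vector:
    """Hypothetical linear policy: one weight row per action."""
    return [sum(w * o for w, o in zip(row, obs)) for row in W]


def _sign(g: float) -> float:
    return float((g > 0) - (g < 0))


def fgsm_observation(W: Matrix, obs: Vector, eps: float) -> Vector:
    """Shift each observation feature by +/-eps (an L-infinity budget) to
    lower the probability of the action the policy currently prefers."""
    p = softmax(policy_logits(W, obs))
    best = max(range(len(p)), key=p.__getitem__)
    # Gradient of -log p[best] w.r.t. obs[i] is sum_a (p[a] - 1[a == best]) * W[a][i].
    grad = [
        sum((p[a] - (1.0 if a == best else 0.0)) * W[a][i] for a in range(len(W)))
        for i in range(len(obs))
    ]
    return [o + eps * _sign(g) for o, g in zip(obs, grad)]
```

With `W = [[1.0, 0.0], [0.0, 1.0]]` and `obs = [0.6, 0.4]`, the policy prefers action 0; `fgsm_observation(W, obs, 0.15)` returns roughly `[0.45, 0.55]`, and the preferred action flips to 1 even though no feature moved by more than 0.15.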

OpenAI Blog · 1mo ago

Computational limitations in robust classification and win-win results

OpenAI published research examining computational limitations in robust classification, exploring when adversarially robust classifiers exist but are computationally hard to learn. Its so-called 'win-win' results argue that any such gap between efficient non-robust and efficient robust learning would yield cryptographic primitives such as one-way functions, so either robust classifiers can be learned efficiently or new cryptography follows. This is a foundational safety and robustness research contribution from 2019, addressing hardness results in adversarial ML.

Evaluation and Benchmarking · AI Safety Research · adversarial robustness · Robust Classification · OpenAI

OpenAI Blog · 1mo ago

Robust Adversarial Inputs: Multi-Scale Fooling of Neural Network Classifiers

OpenAI researchers created adversarial images that reliably fool neural network classifiers even when viewed from varied scales and perspectives. This directly challenges the assumption that self-driving car vision systems are robust to adversarial attacks due to their multi-angle image capture. The finding has implications for the security of deployed vision systems in safety-critical applications.

Evaluation and Benchmarking · AI Safety Research · adversarial examples · self-driving cars · OpenAI
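The key idea behind scale-robust adversarial inputs is to optimize the perturbation against an ensemble of rescaled copies of the input rather than a single view. The toy sketch below assumes a hypothetical linear classifier on 1D signals and nearest-neighbour zoom as the transformation family, then takes one sign-gradient step on the score averaged over all scales; the real work used image classifiers and iterative optimization, so treat this only as an illustration of averaging over transformations.

```python
from typing import List, Sequence


def zoom_source(n: int, j: int, scale: float) -> int:
    """Input index read by output position j when zooming a length-n signal
    by `scale` about its centre (nearest neighbour, clamped at the edges)."""
    centre = (n - 1) / 2
    src = round(centre + (j - centre) / scale)
    return min(max(src, 0), n - 1)


def zoom(x: Sequence[float], scale: float) -> List[float]:
    n = len(x)
    return [x[zoom_source(n, j, scale)] for j in range(n)]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical linear classifier: score > 0 means the 'positive' class."""
    return sum(wi * xi for wi, xi in zip(w, x))


def multi_scale_perturbation(
    w: Sequence[float], x: Sequence[float], scales: Sequence[float], eps: float
) -> List[float]:
    """One sign-gradient step that lowers the score averaged over all scales,
    so the perturbation is not tuned to a single zoom level."""
    n = len(x)
    grad = [0.0] * n
    for s in scales:
        # score(w, zoom(x, s)) = sum_j w[j] * x[src(j)], so its gradient
        # w.r.t. x[i] collects w[j] for every output j that reads input i.
        for j in range(n):
            grad[zoom_source(n, j, s)] += w[j] / len(scales)
    return [xi - eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
```

For instance, with `w = [1.0] * 5`, `x = [0.2] * 5`, and scales `[1.0, 1.5, 2.0]`, the clean input scores positive at every scale, while the output of `multi_scale_perturbation(w, x, scales, eps=0.3)` scores negative at all three. Because the gradient is accumulated over every scale, no single zoom level can "miss" the perturbation.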

OpenAI Blog · 1mo ago

Transfer of Adversarial Robustness Between Perturbation Types

OpenAI published research examining whether adversarial robustness trained against one type of perturbation (e.g., L-infinity) transfers to other perturbation types (e.g., L2, L1). The work investigates the generalization properties of adversarial training across different threat models. This is an early safety and robustness research contribution from OpenAI predating the modern LLM era.

Evaluation and Benchmarking · AI Safety Research · adversarial robustness · L-infinity perturbation · adversarial training
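Part of why robustness need not transfer is that each threat model permits a different set of perturbations. The sketch below computes L1, L2, and L-infinity norms and the projections onto L-infinity and L2 balls that projected-gradient adversarial training typically uses; the helper names and the budget of 0.03 are illustrative assumptions, not values from the research.

```python
import math
from typing import List, Sequence


def lp_norm(v: Sequence[float], p: float) -> float:
    """L_p norm; pass p=math.inf for the L-infinity (max-abs) norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def project_linf(v: Sequence[float], eps: float) -> List[float]:
    """Clip each coordinate into [-eps, eps] (projection onto the L-inf ball)."""
    return [min(max(x, -eps), eps) for x in v]


def project_l2(v: Sequence[float], eps: float) -> List[float]:
    """Rescale v onto the L2 ball of radius eps if it lies outside it."""
    norm = lp_norm(v, 2)
    if norm <= eps:
        return list(v)
    return [x * eps / norm for x in v]
```

A perturbation at a corner of the L-infinity ball, such as `[0.03] * 100`, has L-infinity norm 0.03 but L2 norm about 0.3 and L1 norm about 3.0: in d dimensions the L2 and L1 sizes grow by factors of √d and d. A model trained only against small L-infinity perturbations is therefore never shown the large, concentrated changes an L1 or L2 attacker can make with a comparable-looking budget.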

OpenAI Blog · 1mo ago

Strengthening cyber resilience as AI capabilities advance

OpenAI published a post outlining its approach to cybersecurity risk as its models grow more capable, covering risk assessment frameworks, misuse mitigation, and collaboration with the security community. The piece addresses both offensive risk (AI-enabled attacks) and defensive applications. It represents OpenAI's public positioning on responsible deployment in a high-stakes domain.

AI Safety Research · Enterprise Deployment Patterns · OpenAI

OpenAI Blog · 1mo ago

Attacking Machine Learning with Adversarial Examples

This 2017 OpenAI blog post introduces adversarial examples — inputs intentionally crafted to cause machine learning models to make mistakes, analogized to optical illusions for machines. It surveys how adversarial examples manifest across different input modalities and discusses the fundamental difficulties in defending against them. The post is an early foundational explainer on adversarial robustness from OpenAI.

AI Safety Research · adversarial examples · adversarial robustness · OpenAI

Hugging Face Blog · 1mo ago

How to Train Your Model Dynamically Using Adversarial Data

This Hugging Face blog post describes a methodology for dynamically training models on adversarial data, using MNIST digit classification as its running example. The post covers collecting adversarial inputs that the current model gets wrong and incorporating them into successive training rounds to improve model resilience. Published in mid-2022, it targets practitioners looking to harden ML models against distribution shift and adversarial attacks.

AI Safety Research · MNIST · Hugging Face · adversarial training
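The collect-and-retrain loop can be sketched as a toy. This version assumes a perceptron on 2D points instead of an MNIST network, and hypothetical `oracle` and `propose` callables standing in for whoever supplies candidate adversarial inputs and their true labels; it is an illustration of the loop's shape, not the post's code.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, int]  # label is -1 or +1
Model = Tuple[List[float], float]  # (weights, bias)


def train_perceptron(data: Sequence[Example], epochs: int = 50) -> Model:
    """Train a perceptron from scratch; stop early after a mistake-free epoch."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(model: Model, p: Point) -> int:
    w, b = model
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1


def dynamic_adversarial_rounds(
    seed: Sequence[Example],
    oracle: Callable[[Point], int],
    propose: Callable[[], List[Point]],
    rounds: int,
) -> Tuple[Model, List[Example]]:
    """Each round: keep proposed inputs the current model gets wrong, label
    them with the oracle, add them to the training data, and retrain."""
    data = list(seed)
    model = train_perceptron(data)
    for _ in range(rounds):
        fooling = [(p, oracle(p)) for p in propose() if predict(model, p) != oracle(p)]
        if not fooling:
            break  # the adversary can no longer fool the model
        data.extend(fooling)
        model = train_perceptron(data)
    return model, data
```

Starting from two seed points, a model trained only on them misclassifies inputs such as `(1.0, -3.0)`; after a few rounds of folding such failures back into the data, it classifies the whole proposal pool correctly. Only examples that currently fool the model are added, so each round spends labeling effort where the model is weakest.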

OpenAI Blog · 1mo ago

OpenAI Red Teaming Network

OpenAI is launching an open call for a Red Teaming Network, inviting domain experts to participate in ongoing safety evaluations of its models. The initiative aims to build a structured community of external red teamers who can help identify risks and failure modes across OpenAI's model releases. This represents a formalization of OpenAI's external adversarial testing program beyond one-off pre-release red teaming exercises.

Evaluation and Benchmarking · AI Safety Research · OpenAI Red Teaming Network · OpenAI