Acoustic adversarial attacks on computer vision using audible frequencies disrupt YOLO11 object detection
Researchers demonstrate physical acoustic attacks on AI-based computer vision systems using audible frequencies (<20 kHz), extending prior ultrasonic work to longer effective ranges. By resonating a commercial camera, they induce motion artifacts that cause YOLO11 to misclassify, miss targets, or hallucinate objects. The paper characterizes which image and object features increase vulnerability, offering a foundation for future mitigation strategies. The attack vector is physically realizable against deployed systems including autonomous vehicles and security cameras.
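To make the idea of a motion artifact concrete, the sketch below smears a sharp edge with a horizontal box blur. It is only a stand-in: the summary does not describe the actual artifact model, so the box kernel, the helper names `motion_blur_rows` and `max_edge_strength`, and the toy image are all assumptions chosen for illustration, not the paper's method.

```python
def motion_blur_rows(image, length):
    """Blur each row with a horizontal box kernel (assumed artifact model)."""
    if length < 1:
        raise ValueError("length must be >= 1")
    blurred = []
    for row in image:
        n = len(row)
        out = []
        for x in range(n):
            # Window of up to `length` pixels around x, kept inside the row.
            lo = max(0, x - length // 2)
            hi = min(n, lo + length)
            window = row[lo:hi]
            out.append(sum(window) / len(window))
        blurred.append(out)
    return blurred


def max_edge_strength(image):
    """Largest absolute difference between horizontally adjacent pixels."""
    return max(
        abs(row[x + 1] - row[x])
        for row in image
        for x in range(len(row) - 1)
    )


# Hypothetical 4x12 grayscale image: dark left half, bright right half.
sharp = [[0.0] * 6 + [1.0] * 6 for _ in range(4)]
shaken = motion_blur_rows(sharp, length=5)

print(max_edge_strength(sharp), max_edge_strength(shaken))
```

The blurred edge is spread over several pixels, so the strongest local contrast drops sharply. Loss of edge contrast of this kind is one plausible reason a detector could miss or mislabel an object, though the summary does not state the mechanism in these terms.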
Related events (8)
Robust Adversarial Inputs: Multi-Scale Fooling of Neural Network Classifiers
OpenAI researchers created adversarial images that reliably fool neural network classifiers even when viewed from varied scales and perspectives. This directly challenges the assumption that self-driving car vision systems are robust to adversarial attacks due to their multi-angle image capture. The finding has implications for the security of deployed vision systems in safety-critical applications.
Attacking Machine Learning with Adversarial Examples
This 2017 OpenAI blog post introduces adversarial examples: inputs intentionally crafted to cause machine learning models to make mistakes, which the post likens to optical illusions for machines. It surveys how adversarial examples manifest across different input modalities and discusses the fundamental difficulties in defending against them. The post is an early foundational explainer on adversarial robustness from OpenAI.
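As a rough illustration of what "intentionally crafted" can mean, the sketch below applies the fast gradient sign method (FGSM) to a tiny logistic-regression classifier. FGSM is a standard technique picked here for brevity; the summary above does not say which methods the post covers, and the weights, input, and `epsilon` are hypothetical values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, label, epsilon):
    """Move each feature by +/- epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - label) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Hypothetical model and input, chosen so the clean input is class 1.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.5, -0.2, 0.1]

clean_p = predict(weights, bias, x)
adv_x = fgsm(weights, bias, x, label=1, epsilon=0.3)
adv_p = predict(weights, bias, adv_x)

print(round(clean_p, 3), round(adv_p, 3))
```

No feature moves by more than `epsilon`, yet the predicted probability crosses the 0.5 decision threshold. Attacks on deep networks follow the same pattern, with the gradient obtained by backpropagation instead of a closed form.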
Explainability pipeline reveals divergent cues used by deepfake speech detectors
Researchers propose an audio-native explainability pipeline using Integrated Gradients on time-aligned self-supervised representations to localize decision evidence in deepfake speech detectors. Applied to three WavLM-based detectors (AASIST, CA-MHFA, SLS) on the ASVspoof 5 benchmark, the method reveals that despite similar performance, each detector relies on fundamentally different cues: environmental noise, phoneme artifacts, and word boundaries respectively. Findings are validated via causal masking experiments that confirm performance degrades when primary cues are removed. The work advances interpretability of audio deepfake detection, relevant to AI safety and media authenticity.
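For readers unfamiliar with Integrated Gradients, the sketch below implements the generic method on a toy sigmoid model: it averages the gradient along a straight path from a baseline to the input and scales by the input difference. This is not the paper's pipeline, which operates on time-aligned self-supervised representations; the model, weights, baseline, and step count here are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(weights, x):
    """Toy differentiable scorer standing in for a detector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def model_grad(weights, x):
    """Analytic gradient of the toy model with respect to x."""
    p = model(weights, x)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, x, baseline, steps=200):
    """Approximate the path integral of gradients with the midpoint rule."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(weights, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical weights; the third feature has zero weight on purpose.
weights = [1.5, -2.0, 0.0]
x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(weights, x, baseline)
gap = sum(attributions) - (model(weights, x) - model(weights, baseline))
print(attributions, gap)
```

The attributions sum to the difference between the model output at the input and at the baseline (the completeness property), and the feature the model ignores receives zero attribution. Localizing which time frames carry the attribution mass is the kind of analysis the summary describes.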
Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
This paper presents a controlled robustness study of Vision-Language-Action (VLA) models in autonomous driving, evaluating Alpamayo R1 (10B parameters) across ~18,000 inference trials under eight sensor perturbation types including noise, lighting extremes, and fog. The key finding is that Chain-of-Causation (CoC) reasoning consistency is a high-fidelity proxy for trajectory reliability: when CoC explanations change post-perturbation, trajectory deviation spikes 5.3× (r=0.99 across attack types). Enabling CoC generation is associated with an 11.8% average improvement in trajectory accuracy. Degradation under noise is approximately linear (R²=0.957), and standard preprocessing defenses offer only marginal benefit.
Adversarial methodology improves detection of AI-generated social bot content
Researchers introduce an adversarial framework that simulates malicious actors impersonating real social media users to generate training data for AI-content detection. The approach produces a multilingual, cross-platform dataset of paired human and AI-generated messages. Models trained on this adversarial data significantly outperform existing content-based bot detection systems on out-of-distribution real-world data.
Adversarial Attacks on Neural Network Policies
OpenAI published research examining adversarial attacks on neural network-based reinforcement learning policies. The work investigates how small, carefully crafted perturbations to observations can cause trained RL agents to fail catastrophically. This represents an early investigation into the robustness and safety of learned policies under adversarial conditions.
Testing Robustness Against Unforeseen Adversaries
OpenAI published a method to evaluate whether neural network classifiers can defend against adversarial attacks not encountered during training. The approach introduces a new metric called UAR (Unforeseen Attack Robustness) to quantify a model's resilience to unanticipated attacks. The work argues for measuring robustness across a broader, more diverse set of attack types rather than only those seen in training.
RING attack exploits differential privacy to amplify backdoor success in federated learning
A new arXiv paper challenges the assumption that differential privacy (DP) inherently protects federated learning (FL) against backdoor attacks, demonstrating that DP's noise mechanism actually masks the statistical signatures that defenses rely on to detect malicious updates. The authors propose RING, an attack that exploits this masking effect by having compromised clients collaboratively craft adversarial perturbations that reconstruct a strong backdoor signal at aggregation time. Evaluated across four datasets against six state-of-the-art defenses, RING achieves a 90.3% average attack success rate under moderate privacy budgets, up to 26x better than baselines. Proposed countermeasures incur significant utility trade-offs, exposing a fundamental security gap in DP-FL deployments.

