5arXiv cs.LG (Machine Learning)·47h ago

Predictability as a Fine-Grained Privacy Metric Complementary to Differential Privacy

A new arXiv preprint introduces 'privacy via predictability,' a framework that measures privacy leakage as the incremental gain in an attacker's ability to predict sensitive information after observing an algorithm's output, conditioned on the attacker's prior knowledge. The authors show predictability and differential privacy are generally incomparable, but that predictability implies mutual-information DP in worst-case regimes. They develop a generalized method of moments framework for asymptotic analysis and derive a predictability-calibrated output perturbation scheme for empirical risk minimization. The work positions predictability as a complementary, finer-grained alternative to DP for settings where attacker knowledge and query families can be specified.

Evaluation and Benchmarking AI Safety Research Differential Privacy Generalized Method of Moments Predictability as a Fine-Grained Measure for Privacy

Related guides (2)

AI Safety ResearchTopic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Read asBeginner In-depth

Evaluation and BenchmarkingTopic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Read asBeginner In-depth

Related events (8)

4arXiv · cs.LG·1mo ago·source ↗

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

This paper characterizes how differential privacy affects the statistical complexity of CVaR (Conditional Value at Risk) optimization, showing that the effective sample size governing private tail-risk learning is εnτ rather than n, where τ is the tail mass. Complete minimax rates are derived for scalar estimation and finite classes under pure DP, with lower bounds extending to approximate DP. For convex Lipschitz learning, the CVaR-specific privacy cost necessarily scales as 1/(εnτ), with dimension dependence inherited from private stochastic convex optimization. The results reduce private CVaR learning to private learning on Θ(nτ) tail records as the canonical hard subproblem.

AI Safety Research Differential Privacy Approximate DP Private Stochastic Convex Optimization +1 more

5arXiv · cs.LG·26d ago·source ↗

Perturbation Theory for Spherical Hellinger-Kantorovich Flows with Differential Privacy Guarantees

This paper develops a perturbation theory for Spherical Hellinger-Kantorovich (SHK) gradient flows, which couple transport and reaction dynamics and coincide with birth-death Langevin dynamics. The authors derive dimension-free bounds on log-likelihood ratios and Rényi/KL divergences when two potentials differ, quantifying how perturbations propagate over time. These results are applied to differential privacy: the likelihood-ratio control yields explicit Pure-DP guarantees for SHK-based samplers implementing the exponential mechanism, while KL bounds provide Approximate-DP certificates. A utility bound is also derived that separates intrinsic exponential-mechanism suboptimality from finite-time sampling error.

AI Safety Research Alignment and RLHF Differential Privacy KL Divergence Spherical Hellinger-Kantorovich geometry +4 more

6arXiv · cs.LG·4d ago·source ↗

RING attack exploits differential privacy to amplify backdoor success in federated learning

A new arXiv paper challenges the assumption that differential privacy (DP) inherently protects federated learning (FL) against backdoor attacks, demonstrating that DP's noise mechanism actually masks the statistical signatures that defenses rely on to detect malicious updates. The authors propose RING, an attack that exploits this masking effect by having compromised clients collaboratively craft adversarial perturbations that reconstruct a strong backdoor signal at aggregation time. Evaluated across four datasets against six state-of-the-art defenses, RING achieves a 90.3% average attack success rate under moderate privacy budgets, up to 26x better than baselines. Proposed countermeasures incur significant utility trade-offs, exposing a fundamental security gap in DP-FL deployments.

AI Safety Research RING Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning

5arXiv · cs.LG·47h ago·source ↗

Optimal deterministic multicalibration achieved, resolving open problem on randomization necessity

A new arXiv preprint resolves an open problem in multicalibration theory by constructing a minimax-optimal multicalibration algorithm that outputs a deterministic predictor, achieving the same O(ε⁻³) sample complexity previously only attainable by randomized predictors. The result extends to outcome indistinguishability, deterministic omnipredictors, and panpredictors with optimal sample complexity, resolving multiple open problems from recent works. Multicalibration is a fairness and reliability property requiring calibration to hold across reweighted subgroups, making this relevant to trustworthy ML research.

Evaluation and Benchmarking AI Safety Research outcome indistinguishability Optimal Deterministic Multicalibration and Omniprediction multicalibration

5arXiv · cs.LG·18d ago·source ↗

IntraShuffler: Privacy-Preserving Framework for Heterogeneous DP Federated Learning

This paper identifies a novel Privacy Inference Attack against heterogeneous differential privacy federated learning (HDP-FL) systems, where an honest-but-curious server exploits epsilon-aware aggregation and gradient denoising to infer client data distributions and link updates across rounds. To counter this, the authors propose IntraShuffler, a middleware framework that groups clients into privacy-compatible buckets and performs parameter-level shuffling within buckets, preserving epsilon-aware aggregation while disrupting persistent gradient structure. Experiments on four datasets show IntraShuffler reduces gradient recoverability by over 60% and drops surrogate inference accuracy from 0.78 to 0.33 with minimal utility loss.

AI Safety Research Enterprise Deployment Patterns Differential Privacy Privacy Inference Attack IntraShuffler +2 more

7Google Deepmind Blog·1mo ago·source ↗

VaultGemma: The world's most capable differentially private LLM

DeepMind introduces VaultGemma, a large language model trained from scratch using differential privacy (DP), claiming it as the most capable DP-trained model to date. The announcement positions VaultGemma as a significant advance in privacy-preserving AI, combining strong utility with formal privacy guarantees. The blog post is brief and likely precedes a more detailed technical disclosure.

Open Weights Progress AI Safety Research Differential Privacy Gemma Google DeepMind +2 more

5arXiv · cs.CL·15d ago·source ↗

PropMe framework distinguishes memorization capability from propensity in LLMs

A new arXiv preprint introduces PropMe, a framework that separates whether LLMs can be forced to reproduce training data (capability) from whether they do so under ordinary use (propensity). The authors also release SimpleTrace, a lightweight pipeline using infini-gram to attribute model outputs to training corpora. Evaluating two open models on Common Pile and Dynaword, they find a consistent gap: adversarial prefix attacks elicit strong memorization, but propensity scores remain low in non-adversarial settings. The paper argues memorization audits should report both worst-case extractability and ordinary leakage propensity.

Evaluation and Benchmarking AI Safety Research PropMe SimpleTrace Dynaword +4 more

6arXiv · cs.AI·4d ago·source ↗

Causal auditing framework detects privacy disclosures in synthetic data without model access

A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data generated by LLMs and generative AI systems for privacy leakage. The framework distinguishes 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (incidental generation), using held-out control sets and statistical hypothesis testing without requiring model access, canary insertion, or shadow model training. It functions as a membership inference attack and provides empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. The approach is computationally lightweight and applicable to any synthetic data generation mechanism.

Evaluation and Benchmarking AI Safety Research Differential Privacy Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data