6 · arXiv cs.AI (Artificial Intelligence) · 4d ago

Causal auditing framework detects privacy disclosures in synthetic data without model access

A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data generated by LLMs and other generative AI systems for privacy leakage. The framework distinguishes 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (incidental generation), using held-out control sets and statistical hypothesis testing. It does not require model access, canary insertion, or shadow model training. It functions as a membership inference attack and provides empirical lower bounds on privacy leakage that are tighter than those of prior data-based auditing methods. The approach is computationally lightweight and applicable to any synthetic data generation mechanism.
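The control-set idea can be illustrated with a minimal sketch. This is not the preprint's procedure: it assumes exact string matching as the disclosure criterion and a one-sided Fisher exact test as the hypothesis test, and the names `count_matches`, `one_sided_fisher_p`, and `audit` are hypothetical. Matches against the held-out control set estimate how often a record appears incidentally (a 'phantom'), so only the excess match rate on the training records is treated as evidence of leakage.

```python
from math import comb


def count_matches(records, synthetic):
    """Count records that appear verbatim in the synthetic data (assumed criterion)."""
    synthetic_set = set(synthetic)
    return sum(1 for record in records if record in synthetic_set)


def one_sided_fisher_p(train_hits, n_train, control_hits, n_control):
    """One-sided Fisher exact p-value for H0: train match rate <= control match rate.

    Computed as the hypergeometric tail P(X >= train_hits), conditioning on
    the total number of matches across both sets.
    """
    total_hits = train_hits + control_hits
    denom = comb(n_train + n_control, total_hits)
    upper = min(n_train, total_hits)
    tail = sum(
        comb(n_train, k) * comb(n_control, total_hits - k)
        for k in range(train_hits, upper + 1)
    )
    return tail / denom


def audit(train, control, synthetic):
    """Compare match rates on training records against held-out control records."""
    train_hits = count_matches(train, synthetic)
    control_hits = count_matches(control, synthetic)
    return {
        "train_hits": train_hits,
        "control_hits": control_hits,  # incidental ("phantom") matches
        "excess_rate": train_hits / len(train) - control_hits / len(control),
        "p_value": one_sided_fisher_p(
            train_hits, len(train), control_hits, len(control)
        ),
    }


# Toy data (fabricated for illustration only): 8 training records and
# 1 control record reappear in the synthetic output.
train = [f"train-{i}" for i in range(20)]
control = [f"control-{i}" for i in range(20)]
synthetic = train[:8] + control[:1] + [f"novel-{i}" for i in range(30)]

result = audit(train, control, synthetic)
print(result)
```

The sketch needs only the synthetic output and two record sets drawn from the same population, which mirrors the summary's point that no model access is required. A real audit would likely use a fuzzier match criterion than exact equality.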

Evaluation and Benchmarking · AI Safety Research · Differential Privacy · Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

6 · arXiv · cs.AI · 4d ago

Bayesian audit framework for public AI evaluation archives challenges frontier model claims

A new arXiv preprint proposes a Bayesian inference and decision-audit framework for interpreting public AI evaluation archives (LiveBench, Open LLM Leaderboard v2, LMArena, GAIA, tau-bench) as longitudinal time series rather than terminal leaderboards. The paper demonstrates that a single terminal snapshot is compatible with multiple distinct performance histories, yielding ambiguous timing estimates for reaching capability ceilings. A candidate selection-aware frontier model is shown to fail synthetic recovery, objective-archive prediction, preference transfer, and uncertainty calibration, with fixed audit gates rejecting its stronger claims. The work proposes an archive-and-adjudication protocol to reconstruct evaluation histories and falsify unsupported frontier capability claims.

Evaluation and Benchmarking · AI Safety Research · Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations · GAIA · Open LLM Leaderboard

6 · arXiv · cs.LG · 8d ago

Task exchangeability framework enables statistically valid inference from synthetic data

A new arXiv preprint proposes a statistical framework for using synthetic data in scientific research with provable validity guarantees, centered on a condition called 'task exchangeability.' The framework requires identifying historical tasks with real data that are exchangeable with the current task of interest, enabling valid inference even when synthetic data is biased or misspecified. The authors demonstrate the approach on LLM-generated 'silicon samples' for public opinion surveys and LLM-as-a-judge AI evaluation settings. This addresses a foundational concern about the reliability of synthetic data pipelines increasingly used across AI evaluation and scientific research.

Evaluation and Benchmarking · AI Safety Research · Valid Inference with Synthetic Data via Task Exchangeability · task exchangeability

5 · OpenAI Blog · 1mo ago

A Hazard Analysis Framework for Code Synthesis Large Language Models

OpenAI published a hazard analysis framework specifically targeting code synthesis LLMs, addressing the safety and risk dimensions of models that generate executable code. The framework likely identifies threat categories, failure modes, and mitigation strategies relevant to deploying code-generating AI systems. This represents an early structured attempt to apply safety engineering methodology to a specific LLM capability domain. The work is relevant to both AI safety research and enterprise deployment considerations for coding assistants.

AI Safety Research · Agent and Tool Ecosystem · hazard analysis framework · code synthesis LLMs · OpenAI

6 · arXiv · cs.CL · 11d ago

Clinically grounded privacy evaluation framework reveals high memorization risk in medical LMs

Researchers introduce a tiered adversarial framework for evaluating privacy leakage in medical language models, moving beyond simple training-text recovery to realistic clinical threat models. Applied to an LM pretrained on 378k clinical notes, the framework finds that routine encounter metadata (name, DOB, provider, visit date) elicits high verbatim memorization and sensitive-diagnosis recovery (AUROC 0.91 for abortion, 0.81 for HIV). The study also finds that exact-match memorization overstates disclosure risk because 36% of memorized tokens reflect templated documentation. The work provides a practical contextual privacy evaluation methodology for medical LMs trained on longitudinal patient data.

Evaluation and Benchmarking · AI Safety Research · Clinically Grounded Privacy Evaluation of Medical LMs

4 · arXiv · cs.CL · 5d ago

Synthetic data generation method enables small LLMs to match large models on Text-To-Cypher tasks

A new arXiv paper presents an automatic synthetic data generation method for fine-tuning small LLMs on Text-To-Cypher (Text2Cypher) parsing, enabling natural language interfaces to property graph databases. Experiments across major Text-To-Cypher benchmarks show that small fine-tuned models can compete with much larger proprietary models. The approach is positioned as a solution for local deployment scenarios requiring data sovereignty without expensive annotation.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Cypher · Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

6 · arXiv · cs.LG · 4d ago

RING attack exploits differential privacy to amplify backdoor success in federated learning

A new arXiv paper challenges the assumption that differential privacy (DP) inherently protects federated learning (FL) against backdoor attacks, demonstrating that DP's noise mechanism actually masks the statistical signatures that defenses rely on to detect malicious updates. The authors propose RING, an attack that exploits this masking effect by having compromised clients collaboratively craft adversarial perturbations that reconstruct a strong backdoor signal at aggregation time. Evaluated across four datasets against six state-of-the-art defenses, RING achieves a 90.3% average attack success rate under moderate privacy budgets, up to 26x better than baselines. Proposed countermeasures incur significant utility trade-offs, exposing a fundamental security gap in DP-FL deployments.

AI Safety Research · RING · Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning

4 · Hugging Face Blog · 1mo ago

Creating Privacy Preserving AI with Substra

This Hugging Face blog post covers Substra, a federated learning framework developed by Owkin for privacy-preserving AI. The post describes how Substra enables collaborative model training across institutions without sharing raw data, targeting healthcare and biomedical use cases. It represents a practical deployment pattern for federated learning in sensitive data environments.

AI Safety Research · Enterprise Deployment Patterns · Federated Learning · Substra · Owkin

6 · arXiv · cs.CL · 18d ago

Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools

This paper identifies a privacy vulnerability in tool-augmented language agents that speculatively issue future tool calls to reduce latency. These 'ghost tool calls' leak inferred user intent to external services before the agent commits to a branch, and they cannot be unsent after the fact. The authors argue that timing, not authorization, is the core issue, and propose Speculative Tool Privacy Contracts, a runtime abstraction that treats pre-commitment observation as a distinct first-class effect. They implement a prototype runtime and evaluate twelve policies across three corpora, finding that only suppressing or modifying a call's arguments or destination at issue time actually reduces inference leakage.
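A rough sketch of what issue-time suppression could look like follows. It is an illustration under stated assumptions, not the paper's runtime or its contract abstraction: the `ToolCall` type, the `apply_issue_time_policy` function, the example destinations, and the notion of a fixed set of sensitive argument keys are all hypothetical. The point it shows is that the filter runs before a speculative call leaves the agent, since a call that has already been sent cannot be recalled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical representation of an outgoing tool call."""
    destination: str
    arguments: dict
    speculative: bool  # True if the agent has not yet committed to this branch


def apply_issue_time_policy(
    call: ToolCall,
    sensitive_keys: set,
    blocked_destinations: set,
) -> Optional[ToolCall]:
    """Decide what, if anything, may be sent at issue time.

    Committed calls pass through unchanged. Speculative calls are either
    suppressed entirely (destination suppression) or sent with sensitive
    arguments removed (argument suppression).
    """
    if not call.speculative:
        return call
    if call.destination in blocked_destinations:
        return None  # never issue speculative calls to this service
    redacted = {
        key: value
        for key, value in call.arguments.items()
        if key not in sensitive_keys
    }
    return replace(call, arguments=redacted)


# Usage with made-up destinations and arguments.
speculative_call = ToolCall(
    destination="maps.example",
    arguments={"query": "clinic near me", "units": "km"},
    speculative=True,
)
sent = apply_issue_time_policy(
    speculative_call,
    sensitive_keys={"query"},
    blocked_destinations={"ads.example"},
)
print(sent)
```

Redacting arguments may make a speculative call useless for prefetching, so a real policy would have to weigh the latency benefit against the leakage it avoids.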

Inference Economics · AI Safety Research · tool-augmented language agents · Speculative Tool Privacy Contracts · Ghost Tool Calls