Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data
Causal auditing framework detects privacy disclosures in synthetic data without model access
A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data produced by LLMs and other generative AI systems for privacy leakage. The framework distinguishes 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (content generated incidentally rather than copied from user data), using held-out control sets and statistical hypothesis testing; it requires no model access, canary insertion, or shadow model training. The audit functions as a membership inference attack and yields empirical lower bounds on privacy leakage that are tighter than those of prior data-based auditing methods. The approach is computationally lightweight and applies to any synthetic data generation mechanism.
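The control-set idea can be illustrated with a minimal sketch. This is not the preprint's actual procedure, whose test statistic is not given in this summary. The sketch assumes exact-match comparison between records and a one-sided Fisher exact test, and the names `match_count`, `fisher_one_sided`, and `audit` are hypothetical. Matches against the held-out control set estimate the phantom rate, so a significant excess of matches among training records suggests true disclosure.

```python
from math import comb


def match_count(records, synthetic):
    """Count records that appear verbatim in the synthetic data (assumed criterion)."""
    synthetic_set = set(synthetic)
    return sum(1 for r in records if r in synthetic_set)


def fisher_one_sided(k_train, n_train, k_ctrl, n_ctrl):
    """One-sided Fisher exact p-value.

    Null hypothesis: training and control records are equally likely to be
    matched, i.e. every match is a phantom. Returns the hypergeometric
    probability of seeing at least k_train matches among training records.
    """
    matched = k_train + k_ctrl
    denom = comb(n_train + n_ctrl, matched)
    upper = min(n_train, matched)
    tail = sum(
        comb(n_train, k) * comb(n_ctrl, matched - k)
        for k in range(k_train, upper + 1)
    )
    return tail / denom


def audit(train, control, synthetic, alpha=0.05):
    """Compare match rates of training records and held-out control records."""
    k_t = match_count(train, synthetic)
    k_c = match_count(control, synthetic)
    p = fisher_one_sided(k_t, len(train), k_c, len(control))
    # Excess of the training match rate over the control (phantom) rate.
    excess = max(0.0, k_t / len(train) - k_c / len(control))
    return {
        "train_matches": k_t,
        "control_matches": k_c,
        "p_value": p,
        "excess_rate": excess,
        "reject_null": p < alpha,
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    train = [f"t{i}" for i in range(20)]
    control = [f"c{i}" for i in range(20)]
    synthetic = train[:10] + control[:1] + ["unrelated"]
    print(audit(train, control, synthetic))
```

The audit needs only the three datasets, which is consistent with the summary's point that no model access is required. A real audit would likely replace exact matching with a similarity criterion suited to the data type.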