Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data


Recent events (1)

arXiv · cs.AI · 25h ago

Causal auditing framework detects privacy disclosures in synthetic data without model access

A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data produced by LLMs and other generative AI systems for privacy leakage. The framework separates 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (matches that arise incidentally during generation). It does so with held-out control sets and statistical hypothesis testing, and requires no model access, canary insertion, or shadow-model training. The audit functions as a membership inference attack and provides empirical lower bounds on privacy leakage that are tighter than those of prior data-based auditing methods. The approach is computationally lightweight and applies to any synthetic data generation mechanism.
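The control-set idea can be illustrated with a minimal sketch. This is not the preprint's procedure: it assumes exact-match comparison between records, a one-sided Fisher exact test, and hypothetical helper names (`count_hits`, `fisher_one_sided`, `audit`). Matches against the held-out control set stand in for phantom disclosures, since the generator never saw those records. A significantly higher match count for the training set is then taken as evidence of true disclosure.

```python
from math import comb


def count_hits(records, synthetic):
    """Count records that appear verbatim in the synthetic output."""
    synthetic_set = set(synthetic)
    return sum(1 for r in records if r in synthetic_set)


def fisher_one_sided(train_hits, n_train, control_hits, n_control):
    """P-value for seeing at least `train_hits` training matches by chance.

    Null hypothesis: training and control records are equally likely to be
    matched, so every match is a phantom. Given the total number of matches,
    the training share then follows a hypergeometric distribution.
    """
    total_hits = train_hits + control_hits
    denom = comb(n_train + n_control, total_hits)
    upper = min(n_train, total_hits)
    tail = sum(
        comb(n_train, k) * comb(n_control, total_hits - k)
        for k in range(train_hits, upper + 1)
    )
    return tail / denom


def audit(train, control, synthetic, alpha=0.05):
    """Compare match counts for training records vs. held-out controls."""
    train_hits = count_hits(train, synthetic)
    control_hits = count_hits(control, synthetic)
    p_value = fisher_one_sided(train_hits, len(train), control_hits, len(control))
    # Control match rate is used as a naive estimate of the phantom rate.
    phantom_rate = control_hits / len(control)
    # Matches beyond the expected phantom count (a point estimate only,
    # not a statistically valid lower bound).
    excess = max(0.0, train_hits - phantom_rate * len(train))
    return {
        "train_hits": train_hits,
        "control_hits": control_hits,
        "p_value": p_value,
        "reject_null": p_value < alpha,
        "excess_matches": excess,
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    train = [f"t{i}" for i in range(10)]
    control = [f"c{i}" for i in range(10)]
    synthetic = train[:6] + control[:1] + ["x1", "x2"]
    print(audit(train, control, synthetic))
```

The sketch needs only the three datasets, which mirrors the claim that no model access is required. A real audit would likely replace exact matching with a similarity criterion suited to the data and derive a confidence bound rather than a point estimate.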