6 · arXiv cs.AI (Artificial Intelligence) · 2d ago

MADreMIA: Chained Regeneration Framework for Amplifying Membership Inference Signals

Researchers introduce MADreMIA, a model-agnostic framework for membership inference attacks (MIA) and dataset inference (DI) that uses iterative chained regeneration across modalities rather than shadow model training. The key insight is that memorized training samples exhibit higher coherence and slower degradation under repeated regeneration than non-member samples, yielding stronger membership signals at low false positive rates. The framework is evaluated across image autoregressive models, diffusion models, language models, and audio models, supporting white-, gray-, and black-box threat models. This work advances privacy auditing and copyright enforcement capabilities for large generative models.
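
The intuition that members degrade more slowly under repeated regeneration can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "model" is a hypothetical regenerator that pulls its input halfway toward the nearest stored training vector and adds noise, the score is simply the negative mean drift from the original sample, and the names `make_toy_regenerator`, `chained_regeneration_score`, and `tpr_at_fpr` are invented here. The summary does not describe MADreMIA's actual scoring function.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def make_toy_regenerator(
    train: List[Vector], noise: float, rng: random.Random
) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a generative model that memorized `train`."""

    def regenerate(x: Vector) -> Vector:
        # Pull the input halfway toward the closest memorized vector, then add noise.
        nearest = min(train, key=lambda t: distance(t, x))
        return [n + 0.5 * (xi - n) + rng.gauss(0.0, noise) for n, xi in zip(nearest, x)]

    return regenerate


def chained_regeneration_score(
    sample: Vector, regenerate: Callable[[Vector], Vector], steps: int = 5
) -> float:
    """Feed each output back in as the next input and track drift from the original.

    Returns the negative mean drift, so a higher score means slower degradation.
    """
    current = list(sample)
    total_drift = 0.0
    for _ in range(steps):
        current = regenerate(current)
        total_drift += distance(sample, current)
    return -total_drift / steps


def tpr_at_fpr(
    member_scores: Sequence[float],
    non_member_scores: Sequence[float],
    max_fpr: float = 0.01,
) -> float:
    """True positive rate at a threshold that keeps the false positive rate <= max_fpr."""
    ranked = sorted(non_member_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # false positives we may tolerate
    threshold = ranked[min(allowed, len(ranked) - 1)]
    # Only scores strictly above the threshold are flagged as members.
    return sum(s > threshold for s in member_scores) / len(member_scores)


def demo(seed: int = 0) -> float:
    """Score toy members and non-members, then report TPR at 1% FPR."""
    rng = random.Random(seed)
    dim, n = 8, 50
    train = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    held_out = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    regenerate = make_toy_regenerator(train, noise=0.05, rng=rng)
    members = [chained_regeneration_score(x, regenerate) for x in train]
    non_members = [chained_regeneration_score(x, regenerate) for x in held_out]
    return tpr_at_fpr(members, non_members, max_fpr=0.01)


if __name__ == "__main__":
    print(f"TPR at 1% FPR on toy data: {demo():.2f}")
```

In this toy setting a member stays close to itself across the chain, while a non-member drifts toward some other memorized point, which is why the drift-based score separates the two groups. Reporting the true positive rate at a fixed low false positive rate mirrors the evaluation regime the summary mentions.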

Evaluation and Benchmarking · AI Safety Research · Model Autophagy Disorder · MADreMIA

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Principles to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

AI Evaluation and Benchmarking: From Leaderboards to the Limits of Measurement

Related events (8)

5 · arXiv · cs.AI · Jun 25, 2026

AMIA: Attention-based membership inference attacks on tabular foundation models with k-anonymity defense

Researchers demonstrate that tabular foundation models using in-context learning are vulnerable to membership inference attacks (MIAs) via attention mechanism leakage, even when pre-trained on synthetic data. They introduce AMIA, a shadow-model-free attack exploiting transformer attention concentration patterns, achieving a 7.7% average gain over confidence-based attacks. A k-anonymity-inspired inference-time defense reduces membership leakage by 50% against AMIA and 25% against confidence-based attacks with only 3.9% performance degradation. The paper also shows fine-tuning amplifies memorization risk through confidence shifts.

Evaluation and Benchmarking · AI Safety Research · Membership Inference Attack · Privacy Vulnerabilities of Attention Layers in Tabular Foundation Models and Protection of High-Risk Queries · AMIA

5 · arXiv · cs.CL · Jun 5, 2026

PropMe framework distinguishes memorization capability from propensity in LLMs

A new arXiv preprint introduces PropMe, a framework that separates whether LLMs can be forced to reproduce training data (capability) from whether they do so under ordinary use (propensity). The authors also release SimpleTrace, a lightweight pipeline using infini-gram to attribute model outputs to training corpora. Evaluating two open models on Common Pile and Dynaword, they find a consistent gap: adversarial prefix attacks elicit strong memorization, but propensity scores remain low in non-adversarial settings. The paper argues memorization audits should report both worst-case extractability and ordinary leakage propensity.

Evaluation and Benchmarking · AI Safety Research · PropMe · SimpleTrace · Dynaword

5 · arXiv · cs.LG · May 26, 2026

Squeezing Capacity from MLLMs for Subject-driven Image Generation via Dual Layer Aggregation

This paper proposes conditioning diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, augmented with VAE-based identity conditioning to address copy-paste artifacts and identity preservation failures in subject-driven image generation. A Dual Layer Aggregation (DLA) module aggregates multi-level MLLM features, and a multi-stage denoising strategy progressively balances semantic and fine-detail identity signals during inference. Experiments show improved human preference scores on subject-driven generation benchmarks compared to prior approaches that encode text and reference images separately.

Agent and Tool Ecosystem · Multimodal Progress · Multimodal Large Language Models · Dual Layer Aggregation (DLA) · Subject-driven Image Generation

6 · arXiv · cs.AI · Jun 16, 2026

Causal auditing framework detects privacy disclosures in synthetic data without model access

A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data generated by LLMs and generative AI systems for privacy leakage. The framework distinguishes 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (incidental generation), using held-out control sets and statistical hypothesis testing without requiring model access, canary insertion, or shadow model training. It functions as a membership inference attack and provides empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. The approach is computationally lightweight and applicable to any synthetic data generation mechanism.
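
The role of the held-out control set can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the preprint's procedure: it assumes a "hit" means a record appears verbatim in the synthetic output, it uses a one-sided Fisher exact test as a stand-in for whatever hypothesis test the authors apply, and the function names are hypothetical. Hits on control records, which the generator never saw, estimate the phantom rate; hits on real user records beyond that rate suggest true disclosure.

```python
from math import comb
from typing import Iterable, Set


def count_hits(records: Iterable[str], synthetic: Set[str]) -> int:
    """Number of records reproduced verbatim in the synthetic data (assumed match rule)."""
    return sum(1 for r in records if r in synthetic)


def one_sided_p_value(
    train_hits: int, train_n: int, control_hits: int, control_n: int
) -> float:
    """One-sided Fisher exact test.

    Null hypothesis: a record is equally likely to be hit whether it was used by
    the generator or held out. Returns P(X >= train_hits) under the hypergeometric
    distribution implied by the observed totals.
    """
    total_hits = train_hits + control_hits
    total_n = train_n + control_n
    denom = comb(total_n, train_n)
    upper = min(total_hits, train_n)
    tail = sum(
        comb(total_hits, k) * comb(total_n - total_hits, train_n - k)
        for k in range(train_hits, upper + 1)
    )
    return tail / denom


def excess_disclosure_rate(
    train_hits: int, train_n: int, control_hits: int, control_n: int
) -> float:
    """Hit rate on used records minus the phantom rate estimated from the control set."""
    return train_hits / train_n - control_hits / control_n


if __name__ == "__main__":
    # Made-up counts, purely to show the call pattern.
    p = one_sided_p_value(train_hits=12, train_n=100, control_hits=2, control_n=100)
    rate = excess_disclosure_rate(12, 100, 2, 100)
    print(f"p-value: {p:.4f}, excess disclosure rate: {rate:.2f}")
```

A small p-value indicates that used records show up in the synthetic data more often than chance reproduction alone would explain. Only the synthetic output and the two record sets are needed, which matches the summary's point that no model access is required.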

Evaluation and Benchmarking · AI Safety Research · Differential Privacy · Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

4 · arXiv · cs.LG · 19h ago

Theoretical analysis shows MIM more robust than contrastive learning in distributed self-supervised learning under non-IID data

A new arXiv preprint provides a rigorous theoretical analysis of distributed self-supervised learning (D-SSL) frameworks under non-IID (heterogeneous) data conditions. The key findings are that Masked Image Modeling (MIM) is inherently more robust to data heterogeneity than Contrastive Learning (CL), and that federated learning is no less robust than fully decentralized learning due to network connectivity effects. The authors also introduce MAR loss, a refinement of the MIM objective with local-to-global alignment regularization, validated across multiple architectures and distributed settings.

Training Infrastructure · Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data · MAR loss · Masked Image Modeling

5 · arXiv · cs.CL · Jun 23, 2026

Gazer: Training-free semantic correction for autoregressive visual models using MLLM feedback

Researchers introduce Gazer, a training-free framework that integrates multimodal large language model feedback into the sampling loop of autoregressive visual models (AVMs) to correct semantic errors during generation. The system operates in two stages: Reflective Diagnosis identifies semantic errors in intermediate generation states, and Semantic Correction rewinds and adjusts the generation trajectory to better match the target prompt. Experiments on compositional image and video benchmarks show improved semantic alignment and compositional accuracy across multiple AVMs without additional training. The work addresses a known weakness of next-scale prediction AVMs, where semantic errors accumulate across discrete generation scales.

Evaluation and Benchmarking · Multimodal Progress · Gazer · Training-Free Semantic Correction for Autoregressive Visual Models

5 · arXiv · cs.CL · Jun 10, 2026

Provenance-grounded gating and adaptive recovery improve synthetic post-training data curation

A controlled study examines two underexplored practices in synthetic post-training data pipelines: grounding filtering signals in source provenance and systematically recovering rejected samples rather than discarding them. Using adversarially injected corpora as ground-truth failure labels, the authors find that exact source provenance improves faithfulness gating for stronger judges, that hallucination and reward gates reject largely disjoint populations (making both necessary), and that adaptive recovery via failure diagnosis and targeted regeneration outperforms naive resampling. Generator scale is the primary driver of downstream fine-tuning quality, with filtration and recovery contributing meaningfully but secondarily.

Evaluation and Benchmarking · Alignment and RLHF · Provenance-Grounded Gating and Adaptive Recovery in Synthetic Post-Training Data Curation

6 · arXiv · cs.AI · May 29, 2026

Reasoning in Memory (RiM): Latent Reasoning via Working Memory Blocks in LLMs

RiM introduces a latent reasoning method that replaces autoregressive chain-of-thought token generation with fixed sequences of special 'memory block' tokens, allowing LLMs to perform internal computation without externalizing intermediate steps. These memory blocks are processed in a single forward pass rather than generated autoregressively, improving compute efficiency at test time. Training uses a two-stage curriculum: first grounding memory blocks by predicting explicit reasoning steps, then discarding step-level supervision and refining answers iteratively. Experiments across multiple model families and sizes show RiM matches or exceeds existing latent reasoning methods.

Evaluation and Benchmarking · Inference Economics · latent reasoning · Chain-of-Thought Reasoning · Reasoning in Memory (RiM)