KLIP: Localized OOD Detection in Inverse Problems via KL-Divergence with Diffusion Priors
KLIP proposes an out-of-distribution detection metric for computational imaging that computes KL-divergence between a diffusion model prior and the posterior distribution. Unlike prior approaches, it requires no calibration data or knowledge of the shifted distribution, and can both flag whole images and localize OOD patches within images. The method is validated on medical imaging tasks such as detecting liver tumors in CT scans and generalizes across diffusion model architectures, datasets, and inverse problem types.
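As a rough illustration of how a KL-based score can both flag a whole image and localize patches, here is a minimal sketch. It is not the KLIP algorithm: it assumes, purely for illustration, that the prior and posterior are summarized as per-pixel diagonal Gaussians, and the KL direction (posterior relative to prior), the helper names `gaussian_kl` and `patch_scores`, and the toy 8x8 data are all hypothetical choices.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """Elementwise KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def patch_scores(kl_map, patch):
    """Sum a per-pixel KL map over non-overlapping patch x patch blocks."""
    h, w = kl_map.shape
    hb, wb = h // patch, w // patch
    blocks = kl_map[: hb * patch, : wb * patch].reshape(hb, patch, wb, patch)
    return blocks.sum(axis=(1, 3))

# Toy data (assumption): prior and posterior agree everywhere except the
# bottom-right quadrant, where the posterior mean is shifted by 3.
mu_p, var_p = np.zeros((8, 8)), np.ones((8, 8))
mu_q, var_q = np.zeros((8, 8)), np.ones((8, 8))
mu_q[4:, 4:] = 3.0

scores = patch_scores(gaussian_kl(mu_q, var_q, mu_p, var_p), patch=4)
image_score = scores.sum()                                   # whole-image score
worst_patch = np.unravel_index(scores.argmax(), scores.shape)  # patch-level localization
```

In this toy setup only the shifted quadrant receives a nonzero score, so the same per-pixel map yields both an image-level number and a patch location.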
Related events (8)
Finite-Sample Lens for Understanding Diffusion Posterior Sampler Failures
This paper introduces a finite-sample theoretical framework for analyzing diffusion model posterior samplers used in imaging inverse problems. The authors show that popular likelihood approximations at intermediate timesteps systematically under- or over-estimate posterior spread, leading to failure modes including sensitivity to early stopping, incorrect weighting of posterior modes, and hallucination of prior or likelihood modes. Crucially, they demonstrate these failures can arise from a multimodal prior alone, without requiring nonlinear measurement models or multimodal posteriors. The framework is model-agnostic and can serve as a diagnostic tool for evaluating existing and future posterior samplers.
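One way to see what "incorrect weighting of posterior modes" is measured against is a case where the exact posterior is known. The sketch below is not the paper's framework; it assumes a 1-D Gaussian-mixture prior and a measurement y = x + Gaussian noise, for which the posterior is again a Gaussian mixture with closed-form weights, means, and variances. The function name `mixture_posterior` and the toy numbers are hypothetical.

```python
import numpy as np

def mixture_posterior(y, weights, means, variances, noise_var):
    """Exact posterior of x given y = x + N(0, noise_var) under a Gaussian-mixture prior."""
    weights, means, variances = map(np.asarray, (weights, means, variances))
    marg_var = variances + noise_var  # variance of y under each prior component
    # Posterior weight of component k is proportional to w_k * N(y; m_k, s_k^2 + noise_var).
    log_w = (np.log(weights)
             - 0.5 * np.log(2 * np.pi * marg_var)
             - 0.5 * (y - means) ** 2 / marg_var)
    log_w -= log_w.max()  # stabilize before exponentiating
    post_w = np.exp(log_w)
    post_w /= post_w.sum()
    gain = variances / marg_var
    post_means = means + gain * (y - means)
    post_vars = variances * noise_var / marg_var
    return post_w, post_means, post_vars

# Bimodal prior with a linear measurement: the posterior stays bimodal.
post_w, post_means, post_vars = mixture_posterior(
    y=0.0, weights=[0.5, 0.5], means=[-2.0, 2.0],
    variances=[0.25, 0.25], noise_var=1.0,
)
```

A sampler's empirical mode frequencies and spreads can then be compared with `post_w` and `post_vars` in this toy setting.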
AUDITS: A Comprehensive Benchmark for Image Manipulation Localization Across Multiple Analysis Axes
Researchers introduce AUDITS (Analysis Under Domain-shifts, qualIty, Type, and Size), a benchmark of over 530K images designed to evaluate image manipulation detection across multiple axes including domain shift, manipulation type, and size. The dataset draws from user and news photos and incorporates recent diffusion-based inpaintings. Experiments assess the robustness of existing manipulation detection methods under various domain shifts, aiming to advance development of more generalizable detection approaches.
Exact Posterior Score (EPS): Closed-form posterior sampling for linear inverse problems with diffusion models
A new arXiv preprint derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under anisotropic noise covariance. The authors convert this identity into a training objective called Exact Posterior Score (EPS) that preserves the input/output structure of standard diffusion pretraining, enabling training from scratch or fine-tuning from a pretrained denoiser. EPS is evaluated on five linear inverse problems across FFHQ and ImageNet, outperforming both training-free and training-based baselines while requiring roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.
Vision-OPD: On-Policy Self-Distillation for Fine-Grained Visual Understanding in MLLMs
Vision-OPD addresses a 'regional-to-global perception gap' in multimodal LLMs, where models answer fine-grained visual questions more accurately when given cropped evidence regions than full images. The method instantiates a crop-conditioned teacher and full-image-conditioned student from the same MLLM, minimizing token-level divergence along on-policy rollouts to transfer regional perception to the full-image policy. This self-distillation requires no external teacher models, ground-truth labels, reward verifiers, or inference-time tools. Benchmarks show competitive or superior performance against larger open-source, closed-source, and agentic 'Thinking-with-Images' models.
PTL-Diffusion: Diffusion framework with periodic terminal laws for manifold-aware generation
PTL-Diffusion is a new diffusion modeling framework that replaces the standard single Gaussian terminal distribution with a periodic family of Gaussian terminal laws, embedding phase structure directly into the forward noising dynamics rather than only in the denoising network. The authors derive closed-form forward marginals and reverse posteriors for a periodically forced Ornstein-Uhlenbeck process, enabling standard noise-prediction training. Experiments on torus, cylinder, and face datasets show improvements in manifold-level distributional matching over DDPM baselines. The work is a proof-of-concept motivating structured terminal reference laws as a direction for geometry-aware generative modeling.
DRPO: Smooth divergence regularization replaces hard masking in LLM RL training
A new arXiv preprint proposes Divergence Regularized Policy Optimization (DRPO), a method that replaces the hard trust-region mask used in DPPO with a smooth advantage-weighted quadratic regularizer on policy shift. The approach addresses a known weakness in PPO and GRPO where importance ratios poorly proxy distributional shift in long-tailed vocabularies, and in DPPO where gradient signals are discarded rather than corrected at trust-region boundaries. Experiments across model scales, architectures, and precision settings show improved stability and efficiency in LLM RL post-training.
d-OPSD: First on-policy self-distillation framework tailored for diffusion LLMs
Researchers introduce d-OPSD, the first on-policy self-distillation (OPSD) framework designed specifically for diffusion large language models (dLLMs). The method addresses a fundamental mismatch between existing autoregressive OPSD approaches and dLLMs' arbitrary-order generation by using suffix conditioning on self-generated answers and step-level rather than token-level divergence supervision. Across four reasoning benchmarks, d-OPSD outperforms RLVR and SFT baselines while requiring only ~10% of the optimization steps of RLVR, suggesting strong sample efficiency gains for dLLM post-training.
Information-Driven Design of Imaging Systems
In a NeurIPS 2025 paper, researchers from Berkeley present a framework for evaluating and optimizing imaging systems based on mutual information content rather than traditional metrics like resolution or SNR. The method estimates mutual information directly from noisy measurements using known noise physics and learned probabilistic models (including transformers and PixelCNN), avoiding the need for task-specific decoders. Validated across four domains (color photography, radio astronomy, lensless imaging, and microscopy), the information metric predicts downstream decoder performance and enables hardware optimization with less compute and memory than end-to-end neural approaches.
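A minimal sketch of how known noise physics and a fitted probabilistic model can combine into a mutual information estimate, assuming the standard decomposition I(X;Y) = H(Y) - H(Y|X). This is not the authors' code: the scalar Gaussian signal, the additive Gaussian noise, and the use of a fitted Gaussian in place of a learned model such as PixelCNN are all simplifying assumptions, chosen so the result can be checked against the closed form 0.5 * log(1 + signal_var / noise_var).

```python
import numpy as np

rng = np.random.default_rng(0)
signal_var, noise_var, n = 4.0, 1.0, 200_000

# Simulated noisy measurements y = x + noise (toy scalar "imaging system").
x = rng.normal(0.0, np.sqrt(signal_var), n)
y = x + rng.normal(0.0, np.sqrt(noise_var), n)

# H(Y|X) follows analytically from the known Gaussian noise model (in nats).
h_y_given_x = 0.5 * np.log(2 * np.pi * np.e * noise_var)

# H(Y) is estimated as the average negative log-likelihood of the measurements
# under a model fit to them; a Gaussian stands in for a learned model here.
mu, var = y.mean(), y.var()
h_y = np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mu) ** 2 / var)

mi_est = h_y - h_y_given_x
mi_true = 0.5 * np.log(1.0 + signal_var / noise_var)  # closed form for this toy case
```

Only measurements and the noise model enter the estimate, so no decoder is needed; in general the quality of the estimate depends on how well the fitted model matches the measurement distribution.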


