technique

Diffusion Models

technique · active · diffusion-models-5b88d83f · 9 events · first seen 1mo ago

Aliases: Diffusion Models, diffusion model

More like this (12)

latent diffusion model, discrete diffusion models, Representation-Conditioned Diffusion Models, diffusion-based generative models, Masked Diffusion Models, Diffusion Language Models, Denoising Diffusion Probabilistic Models, diffusion-based policy, continuous diffusion language model, Model Distillation, Diffusers, Survival Diffusion Probabilistic Model (SDPM)

Guides (1)

Diffusion Models · Concept

Diffusion Models: How AI Learns to Paint by Unpainting

Recent events (9)

5 · arXiv · cs.LG · 24d ago

Representation-Conditioned Diffusion Models for Controllable Image Generation

This paper explores conditioning diffusion models on representations from pre-trained self-supervised models as an alternative to text prompts or semantic maps, which require large annotated datasets. The self-conditioning mechanism improves unconditional image generation quality and provides a controllable representation space. The authors identify directions of variation in this space and demonstrate smoothness and disentanglement properties, suggesting potential for fine-grained generative control without heavy annotation overhead.

Frontier Model Releases, Multimodal Progress, Representation-Conditioned Diffusion Models, Self-Supervised Learning, Disentangled Representation Learning, +1 more

6 · arXiv · cs.LG · 22d ago

Finite-Sample Lens for Understanding Diffusion Posterior Sampler Failures

This paper introduces a finite-sample theoretical framework for analyzing diffusion model posterior samplers used in imaging inverse problems. The authors show that popular likelihood approximations at intermediate timesteps systematically under- or over-estimate posterior spread, leading to failure modes including sensitivity to early stopping, incorrect weighting of posterior modes, and hallucination of prior or likelihood modes. Crucially, they demonstrate these failures can arise from a multimodal prior alone, without requiring nonlinear measurement models or multimodal posteriors. The framework is model-agnostic and can serve as a diagnostic tool for evaluating existing and future posterior samplers.

Evaluation and Benchmarking, AI Safety Research, finite-sample posterior sampling framework, likelihood approximation, imaging inverse problems, +3 more
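
To make "weighting of posterior modes" concrete, here is a minimal sketch of the exact posterior that a sampler would need to reproduce in a toy 1-D case. It assumes a two-component Gaussian-mixture prior and a linear measurement `y = x + noise`; these choices, the function names, and the numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_posterior(y, noise_var, weights, means, variances):
    """Exact posterior for y = x + noise under a Gaussian-mixture prior.

    Returns (posterior_weights, posterior_means, posterior_variances),
    one entry per mixture component. Toy setup, assumed for illustration.
    """
    # Each component's evidence: prior weight times marginal likelihood of y.
    evidence = [
        w * normal_pdf(y, m, v + noise_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(evidence)
    post_w = [e / total for e in evidence]
    # Standard Gaussian conditioning, applied per component.
    post_m = [m + v / (v + noise_var) * (y - m) for m, v in zip(means, variances)]
    post_v = [v * noise_var / (v + noise_var) for v in variances]
    return post_w, post_m, post_v


post_w, post_m, post_v = mixture_posterior(
    y=0.5,
    noise_var=1.0,
    weights=[0.5, 0.5],
    means=[-2.0, 2.0],
    variances=[0.25, 0.25],
)
print(post_w, post_m, post_v)
```

In this toy case the mode near +2 carries roughly 83% of the posterior mass even though the prior is symmetric. An approximate sampler that misjudges posterior spread at intermediate timesteps could report a different split, which is the kind of mode-weighting failure the summary describes.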

5 · arXiv · cs.LG · 19d ago

KLIP: Localized OOD Detection in Inverse Problems via KL-Divergence with Diffusion Priors

KLIP proposes an out-of-distribution detection metric for computational imaging that computes KL-divergence between a diffusion model prior and the posterior distribution. Unlike prior approaches, it requires no calibration data or knowledge of the shifted distribution, and can both flag whole images and localize OOD patches within images. The method is validated on medical imaging tasks such as detecting liver tumors in CT scans and generalizes across diffusion model architectures, datasets, and inverse problem types.

Evaluation and Benchmarking, AI Safety Research, KLIP, out-of-distribution detection, computational imaging, +3 more
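
The idea of scoring patches by a KL-divergence can be sketched with a stand-in: the closed-form KL between per-pixel Gaussians, summed over non-overlapping patches. This is not KLIP's estimator, which the summary does not specify. The Gaussian approximation, the direction KL(posterior || prior), and the names `gaussian_kl` and `patch_kl_map` are all assumptions made for illustration.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between two 1-D Gaussians."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def patch_kl_map(post_mean, post_var, prior_mean, prior_var, patch=2):
    """Sum per-pixel KL(posterior || prior) over non-overlapping patches.

    Inputs are 2-D lists of equal shape; returns a 2-D list of patch scores.
    Hypothetical helper: higher scores mark patches where the posterior
    departs most from the prior.
    """
    height, width = len(post_mean), len(post_mean[0])
    scores = []
    for i in range(0, height, patch):
        row = []
        for j in range(0, width, patch):
            total = 0.0
            for a in range(i, min(i + patch, height)):
                for b in range(j, min(j + patch, width)):
                    total += gaussian_kl(
                        post_mean[a][b], post_var[a][b],
                        prior_mean[a][b], prior_var[a][b],
                    )
            row.append(total)
        scores.append(row)
    return scores


# Toy 4x4 "image": the posterior matches the prior everywhere except the
# bottom-right 2x2 patch, whose mean is shifted by 3.
prior_mean = [[0.0] * 4 for _ in range(4)]
prior_var = [[1.0] * 4 for _ in range(4)]
post_mean = [[3.0 if (r >= 2 and c >= 2) else 0.0 for c in range(4)] for r in range(4)]
post_var = [[1.0] * 4 for _ in range(4)]

scores = patch_kl_map(post_mean, post_var, prior_mean, prior_var, patch=2)
print(scores)
```

Only the shifted patch receives a nonzero score here, which mirrors the localization behavior described in the summary: a whole-image flag could be derived by aggregating patch scores, while the per-patch map points at where the shift occurs.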

7 · OpenAI Blog · 1mo ago

Simplifying, Stabilizing, and Scaling Continuous-Time Consistency Models

OpenAI has published research advancing continuous-time consistency models (sCMs), achieving sample quality comparable to leading diffusion models while requiring only two sampling steps. The work addresses prior instability and complexity issues in consistency model training. This represents a significant efficiency improvement for generative image synthesis, potentially enabling faster inference pipelines.

Inference Economics, Multimodal Progress, OpenAI, Continuous-Time Consistency Models, Diffusion Models
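
Two-step sampling with a consistency function can be sketched on a toy problem. The sketch below assumes 1-D Gaussian data and a noise process `x_t = x_0 + t * z`, for which the function mapping a noisy point back to the start of its probability-flow trajectory is known in closed form. It does not reproduce sCM training or OpenAI's code, and `EPS`, `T_MAX`, `DATA_STD`, and `t_mid` are illustrative assumptions.

```python
import math
import random
import statistics

EPS = 0.002      # smallest noise level (assumed)
T_MAX = 80.0     # largest noise level (assumed)
DATA_STD = 0.5   # toy data distribution: N(0, DATA_STD^2)


def consistency_fn(x, t):
    """Analytic consistency function for Gaussian data.

    Maps a point at noise level t to the point at level EPS on the same
    probability-flow trajectory. A trained network would stand in for
    this closed form in practice.
    """
    return x * math.sqrt((DATA_STD ** 2 + EPS ** 2) / (DATA_STD ** 2 + t ** 2))


def two_step_sample(rng, t_mid=0.8):
    """Draw one sample using two evaluations of the consistency function."""
    # Step 1: jump from pure noise at T_MAX straight to a clean estimate.
    x = consistency_fn(rng.gauss(0.0, T_MAX), T_MAX)
    # Re-noise to an intermediate level, then step 2: denoise again.
    x_mid = x + math.sqrt(t_mid ** 2 - EPS ** 2) * rng.gauss(0.0, 1.0)
    return consistency_fn(x_mid, t_mid)


rng = random.Random(0)
samples = [two_step_sample(rng) for _ in range(20000)]
sample_std = statistics.pstdev(samples)
print(sample_std)
```

The sample standard deviation should land close to `DATA_STD`, since each evaluation maps a noisy point directly to the low-noise end of its trajectory instead of integrating through many small denoising steps.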

6 · OpenAI Blog · 1mo ago

Consistency Models

OpenAI introduces Consistency Models, a generative modeling framework designed to address the slow iterative sampling inherent in diffusion models. The approach targets single-step or few-step generation for image, audio, and video synthesis. The post reads as a research announcement summarizing the underlying technique.

Inference Economics, Multimodal Progress, Latent Consistency Models, OpenAI, Diffusion Models

5 · arXiv · cs.LG · 18d ago

Review: Generative Models, Multimodal Learning, and Closed-Loop Workflows in Inverse Materials Design

This arXiv review surveys recent advances in generative modeling for inverse materials design, covering variational autoencoders, normalizing flows, autoregressive models, and diffusion models applied to crystalline solid discovery. It examines how multimodal learning fuses crystal structures, thermodynamic data, spectroscopy, microscopy, and scientific text into transferable chemical-space representations. The paper also reviews closed-loop design pipelines integrating conditional generation with Bayesian optimization, reinforcement learning, and active learning, and identifies recurring failure modes including surrogate exploitation, diversity collapse, and the stability-synthesizability gap.

Evaluation and Benchmarking, Agent and Tool Ecosystem, Bayesian Optimization, Multimodal Learning, active learning, +6 more

7 · OpenAI Blog · 1mo ago

Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2 / unCLIP)

OpenAI published research on hierarchical text-conditional image generation using CLIP latents, the technique underlying DALL-E 2. The approach uses a prior network to map text embeddings to image embeddings, then a diffusion decoder to generate images from those embeddings. This represented a significant advance in text-to-image generation quality and semantic fidelity at the time of release.

Frontier Model Releases, Multimodal Progress, DALL·E 3, unCLIP, OpenAI, +2 more

5 · arXiv · cs.LG · 25d ago

Squeezing Capacity from MLLMs for Subject-driven Image Generation via Dual Layer Aggregation

This paper proposes conditioning diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, augmented with VAE-based identity conditioning to address copy-paste artifacts and identity preservation failures in subject-driven image generation. A Dual Layer Aggregation (DLA) module aggregates multi-level MLLM features, and a multi-stage denoising strategy progressively balances semantic and fine-detail identity signals during inference. Experiments show improved human preference scores on subject-driven generation benchmarks compared to prior approaches that encode text and reference images separately.

Agent and Tool Ecosystem, Multimodal Progress, Multimodal Large Language Models, Dual Layer Aggregation (DLA), Subject-driven Image Generation, +3 more

5 · OpenAI Blog · 1mo ago

Improved Techniques for Training Consistency Models

OpenAI presents improved training techniques for consistency models, a class of generative models capable of producing high-quality samples in a single step without adversarial training. The work advances a nascent alternative to diffusion-based generation that trades multi-step sampling for single-step inference. The post originates from OpenAI's research blog, indicating continued investment in efficient generative modeling.

Inference Economics, Multimodal Progress, Latent Consistency Models, OpenAI, Diffusion Models