Entity · technique

Flow Matching

technique · active · flow-matching-30edbf16 · 6 events · first seen May 18, 2026

Aliases: Flow Matching, flow-matching

More like this (12)

flow-matching diffusion transformer · Flow · flow-matching decoder · Riemannian conditional flow matching · normalizing flows · pyMatching · Positive-Direction Matching · Positive-Direction Matching · Matching Principle · Self-Flow · MeanFlow · SINT-Flow

Recent events (6)

4 · arXiv · cs.LG · 4d ago

TemporalSinkhorn: Parallel-in-Time Certified Sinkhorn for Dynamic Entropic Optimal Transport

A new arXiv preprint introduces TemporalSinkhorn, a parallel-in-time algorithm for solving dynamic entropic optimal transport problems with certified correctness guarantees. The method batches future Sinkhorn candidates and uses a deterministic safe-prefix certificate to ensure no inaccurate outputs are authorized, achieving 1.42x–3.55x speedups over sequential baselines on synthetic streams and 3.05x–3.63x speedups on Flow Matching minibatch streams on 4 A100 GPUs. The work is directly relevant to optimal-transport Flow Matching, a technique used in generative model training pipelines.
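For orientation, the sequential building block that such methods accelerate can be sketched as plain Sinkhorn iterations on a single minibatch. This is a minimal NumPy sketch of standard entropic optimal transport, not the TemporalSinkhorn algorithm or its certificate; the function name `sinkhorn_plan`, the rescaled cost, and the values of `eps` and `n_iters` are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.2, n_iters=1000):
    """Entropic OT coupling between two uniformly weighted point sets."""
    # Squared Euclidean cost, rescaled to [0, 1] so eps is unit-free (an assumption).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / (eps * cost.max()))
    a = np.full(len(x), 1.0 / len(x))  # source marginal
    b = np.full(len(y), 1.0 / len(y))  # target marginal
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
noise = rng.normal(size=(8, 2))          # toy "source" minibatch
data = rng.normal(loc=3.0, size=(8, 2))  # toy "target" minibatch
plan = sinkhorn_plan(noise, data)

# Draw (noise, data) index pairs from the coupling, as a minibatch OT pairing step might.
flat = rng.choice(plan.size, size=8, p=plan.ravel() / plan.sum())
i, j = np.unravel_index(flat, plan.shape)
pairs = np.stack([noise[i], data[j]], axis=1)  # shape (8, 2, 2)
```

Each minibatch needs its own solve of this kind, which is the per-step cost that a parallel-in-time scheme would aim to amortize.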

Training Infrastructure · Inference Economics · TemporalSinkhorn · Flow Matching · Certified Parallel-in-Time Sinkhorn for Dynamic Entropic Optimal Transport

4 · arXiv · cs.AI · Jun 29, 2026

DEFAR framework uses exposure bias signals to self-rectify Flow Matching during training

A new arXiv preprint introduces DEFAR (DirEctional-Frequency Adaptive Rectification), a training framework for Flow Matching generative models that addresses exposure bias (the train/inference discrepancy) by extracting dynamic correction signals from the bias itself. The method has two components: Anti-Drift Rectification (ADR), which steers deviated inference states back toward targets, and Frequency Compensation (FC), which reinforces missing low-frequency components using bias as a self-feedback weight. Experiments on CIFAR-10, CelebA-64, and ImageNet-256/512 show improvements over prior baselines with favorable scalability and inference robustness.
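The train/inference gap mentioned above can be seen in a toy Flow Matching setup. The sketch below is not DEFAR and implements neither ADR nor FC; it assumes a straight-line interpolant, a 2-D toy data distribution, and a linear velocity model fitted by least squares as a stand-in for gradient training of a neural network. All names (`features`, `sample`, `W`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, t):
    # Toy linear model input: state, time, and a bias term.
    return np.concatenate([x, t, np.ones_like(t)], axis=1)

# Training pairs: x0 from noise, x1 from a toy "data" distribution.
n = 4096
x0 = rng.normal(size=(n, 2))
x1 = rng.normal(loc=[4.0, -2.0], scale=0.5, size=(n, 2))
t = rng.uniform(size=(n, 1))

# Training states lie on the straight-line interpolant between x0 and x1;
# the regression target is the constant velocity along that line.
xt = (1.0 - t) * x0 + t * x1
target = x1 - x0

# Least-squares fit of the velocity field (stand-in for neural network training).
W, *_ = np.linalg.lstsq(features(xt, t), target, rcond=None)

def sample(num, steps=50):
    # Euler integration from t=0 to t=1. Each step consumes the model's own
    # previous output rather than a ground-truth interpolant state.
    x = rng.normal(size=(num, 2))
    dt = 1.0 / steps
    for k in range(steps):
        tk = np.full((num, 1), k * dt)
        x = x + dt * (features(x, tk) @ W)
    return x

samples = sample(2000)
```

During fitting the model only ever sees `xt`, while `sample` feeds it states produced by its own earlier steps, so any velocity error shifts later inputs away from the training distribution.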

Frequency Compensation · CIFAR-10 · Anti-Drift Rectification

5 · arXiv · cs.AI · Jun 25, 2026

Two-stage action prior pretraining improves cross-embodiment VLA robot manipulation

Researchers propose a two-stage training framework for Vision-Language-Action (VLA) models that pretrains the action module with motion priors before cross-modal alignment begins. Stage 1 uses a flow-matching-based encoder-decoder to learn temporal motion structure from unconditioned action trajectories alone; Stage 2 transfers this prior to VLA training via decoder reuse and latent distillation. Evaluated across 13 cross-embodiment tasks in simulation and real-world settings, the approach achieves faster convergence, higher success rates, and notably better performance in data-scarce real-world scenarios compared to VLA training without action priors.

Agent and Tool Ecosystem · Multimodal Progress · Learning Action Priors for Cross-embodiment Robot Manipulation · Vision-Language-Action model · Flow Matching

5 · The Batch · May 29, 2026

Meta Research Improves Image Generation via Staged Planning and Self-Revision Fine-Tuning

Researchers from Meta and collaborating universities propose a fine-tuning method that teaches image generators to compose images through discrete plan-sketch-inspect-refine cycles rather than generating all at once. Starting from BAGEL-7B, they construct ~62,000 training examples using GPT-4o and FLUX.1 Kontext to supervise each stage, achieving 83% on GenEval versus 77% for the base model and a competing method (PARM) that required 11x more training data and ~8x more inference steps. The approach improves spatial relationship accuracy, object attribute fidelity, and real-world knowledge grounding in generated images.

Evaluation and Benchmarking · Agent and Tool Ecosystem · University of California San Diego · WISE · FLUX.1 Kontext

7 · arXiv · cs.AI · May 29, 2026

GPIC: Stanford Releases 28-Trillion-Pixel Permissively Licensed Image Corpus for Visual Generation Research

Stanford Vision Lab introduces GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels comprising 100M training, 200K validation, and 1M test images, all permissively licensed for research and commercial use. Images are captioned by a state-of-the-art vision-language model, safety-filtered, deduplicated, and hosted on Hugging Face. The release includes a benchmarking protocol for generative modeling and a reference baseline using pixel-space flow matching. The dataset addresses a key gap in scalable visual generative modeling research by providing a large, stable, and openly licensed resource.

Training Infrastructure · Evaluation and Benchmarking · GPIC · Stanford Vision Lab · Flow Matching

5 · arXiv · cs.LG · May 18, 2026

Dynamics-Level Watermarking of Flow Matching Models with Random Codes

This paper proposes embedding watermarks directly into the velocity field (continuous dynamics) of flow matching generative models, rather than into weights or outputs. The method uses key-dependent perturbations added during training, formulated as random coding over a continuous channel, allowing black-box message recovery at detection time. The perturbation is designed to leave the generated distribution unchanged. Experiments on MNIST and CIFAR-10 demonstrate reliable message recovery, preserved generation quality, and chance-level decoding without the secret key.

Evaluation and Benchmarking · AI Safety Research · MNIST · CIFAR-10 · Random Coding