DEFAR framework uses exposure bias signals to self-rectify Flow Matching during training
A new arXiv preprint introduces DEFAR (DirEctional-Frequency Adaptive Rectification), a training framework for Flow Matching generative models. It addresses exposure bias, the discrepancy between the states a model sees in training and those it reaches at inference, by extracting dynamic correction signals from the bias itself. The method has two components: Anti-Drift Rectification (ADR), which steers deviated inference states back toward their targets, and Frequency Compensation (FC), which reinforces missing low-frequency components using the bias as a self-feedback weight. Experiments on CIFAR-10, CelebA-64, and ImageNet-256/512 show improvements over prior baselines, along with favorable scalability and inference robustness.
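A toy one-dimensional example can make the exposure-bias gap concrete. The sketch below is not DEFAR and implements neither ADR nor FC. It only compares the interpolated state used in training with the state reached by Euler integration of an imperfect velocity. The constant velocity error `err` and all function names are assumptions made for illustration.

```python
def training_state(x0, x1, t):
    """State seen in training: linear interpolation between noise x0 and data x1."""
    return (1.0 - t) * x0 + t * x1


def rollout_states(x0, x1, steps, err):
    """Euler rollout with a hypothetical imperfect velocity (x1 - x0) + err."""
    x = x0
    dt = 1.0 / steps
    states = [x]
    for _ in range(steps):
        # The exact velocity for a straight path is x1 - x0; err models learned error.
        x = x + dt * ((x1 - x0) + err)
        states.append(x)
    return states


def exposure_gap(x0, x1, steps, err):
    """Gap between the inference state and the training state at each step."""
    states = rollout_states(x0, x1, steps, err)
    return [s - training_state(x0, x1, k / steps) for k, s in enumerate(states)]


if __name__ == "__main__":
    for k, gap in enumerate(exposure_gap(0.0, 1.0, steps=4, err=0.2)):
        print(k, round(gap, 6))
```

In this toy setting the gap grows linearly with the step index, which is the kind of accumulated drift that a rectification signal would need to counteract.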
Related events (8)
FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory
FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.
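The retrieval step can be sketched with the standard Modern Hopfield update, in which a query is compared against stored keys and the stored values are mixed by softmax weights. This is a minimal illustration and not FlowEdit's implementation: the key and value layout, the inverse temperature `beta`, and the function names are assumptions, and fuzzy morphological matching is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(query, keys, values, beta=8.0):
    """Return the softmax-weighted mix of stored values for a query vector.

    keys:   stored pattern vectors (hypothetically, token embeddings)
    values: stored perturbation vectors, one per key
    beta:   inverse temperature; larger values give sharper retrieval
    """
    sims = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(sims)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[0.5, 0.0], [0.0, -0.5]]
    print(hopfield_retrieve([1.0, 0.0], keys, values, beta=20.0))
```

With a large `beta`, a query that matches one stored key returns almost exactly that key's perturbation. A smaller `beta` blends several stored corrections.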
FMLM+ introduces Posterior Refinement for fast non-autoregressive language generation
Researchers introduce FMLM+, a framework combining Flow Map Language Models with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32x fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing longstanding factorization error problems in non-autoregressive generation.
Dynamics-Level Watermarking of Flow Matching Models with Random Codes
This paper proposes embedding watermarks directly into the velocity field (continuous dynamics) of flow matching generative models, rather than into weights or outputs. The method uses key-dependent perturbations added during training, formulated as random coding over a continuous channel, allowing black-box message recovery at detection time. The perturbation is designed to leave the generated distribution unchanged. Experiments on MNIST and CIFAR-10 demonstrate reliable message recovery, preserved generation quality, and chance-level decoding without the secret key.
Reducing bias and improving safety in DALL·E 2
OpenAI announced a new technique applied to DALL·E 2 that adjusts image generation of people to better reflect global demographic diversity. The intervention targets representational bias in the model's outputs when generating human subjects. This is an early public example of a major lab deploying a post-training bias mitigation technique in a production image generation system.
ADAS: Attention-Discounted Adaptive Sampler improves parallel decoding for masked diffusion language models
Researchers propose ADAS, a training-free reranking rule for masked diffusion language model decoding that addresses token interaction failures in parallel token commitment. The method greedily penalizes candidates that attend strongly to already-selected uncertain positions, using attention weights as soft marginal penalties rather than hard constraints. Evaluated on LLaDA-8B-Base and Dream-7B-Base across GSM8K, MATH500, HumanEval, and MBPP, ADAS improves low-NFE performance by 9–10 percentage points on average when plugged into existing samplers with only 3.1% runtime overhead.
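The greedy reranking idea can be sketched as follows. The summary does not give ADAS's scoring rule, so the penalty used here (attention weight times the uncertainty of each already-selected position, scaled by `lam`) is an assumed form chosen for illustration. The function and parameter names are hypothetical.

```python
def greedy_discounted_select(conf, attn, k, lam=1.0):
    """Greedily pick up to k positions to commit in one parallel decoding step.

    conf: per-position confidence in [0, 1]
    attn: attn[i][j] is how strongly position i attends to position j
    lam:  penalty strength (assumed hyperparameter)
    """
    selected = []
    remaining = set(range(len(conf)))
    while remaining and len(selected) < k:

        def score(i):
            # Soft penalty: attention to positions already selected,
            # weighted by how uncertain those positions are.
            penalty = sum(attn[i][j] * (1.0 - conf[j]) for j in selected)
            return conf[i] - lam * penalty

        # Sorting first makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    conf = [0.7, 0.65, 0.6]
    attn = [[0.0, 0.0, 0.0], [0.9, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(greedy_discounted_select(conf, attn, k=2))
```

In the example, position 1 has higher confidence than position 2 but attends strongly to the uncertain position 0, so the penalized ranking commits position 2 instead. Because the penalty is soft, a sufficiently confident candidate can still be selected.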
DanceOPD: On-policy generative field distillation for composing image generation capabilities
DanceOPD is a new training framework for flow-matching image generation models that addresses the conflict between text-to-image generation, local editing, and global editing capabilities. The approach routes each sample to a capability-specific velocity field, queries student-induced rollout states on-policy, and trains with a velocity MSE objective to compose multiple expert capabilities without mutual degradation. Experiments show improvements in multi-capability composition while preserving baseline generation quality. The method also absorbs operator-defined fields such as classifier-free guidance.
ASRD: Training-free anchor-guided revocable decoding for diffusion LLMs improves accuracy and throughput
A new arXiv preprint introduces ASRD (Anchor Supervised Revocable Decoding), a training-free framework for improving decoding quality in diffusion large language models. The method addresses error propagation and local error reinforcement in revocable decoding by separating trusted 'anchor tokens' (identified via temporal consistency) from uncertain candidates, then applying anchor-guided generation and anchor-perturbed verification. Experiments on math and coding benchmarks show up to 6.4% accuracy improvement and 7.2× inference throughput gains over remasking baselines.
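One simple reading of "temporal consistency" is that a position counts as an anchor when its predicted token has not changed over the last few decoding steps. The sketch below implements only that reading. The window size and the function name are assumptions, and the paper's actual criterion may differ.

```python
def find_anchor_positions(history, window=3):
    """Return positions whose predicted token is identical over the last `window` steps.

    history: list of per-step predictions, each a list of tokens of equal length
    window:  number of most recent steps that must agree (assumed hyperparameter)
    """
    if window < 1 or len(history) < window:
        return []
    recent = history[-window:]
    first = recent[0]
    return [
        pos
        for pos in range(len(first))
        if all(step[pos] == first[pos] for step in recent)
    ]


if __name__ == "__main__":
    history = [
        ["def", "foo", "("],
        ["def", "bar", "("],
        ["def", "baz", "("],
    ]
    print(find_anchor_positions(history, window=3))
```

Positions outside the returned list would be the uncertain candidates that remain eligible for revision.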
AGDO: Attention-guided denoising and optimization framework improves diffusion language model reasoning
Researchers propose AGDO, a framework that replaces random masking in diffusion large language models (dLLMs) with attention-guided denoising order and token weighting during fine-tuning and reinforcement learning. The work is motivated by an empirical finding that tokens with stronger attention to unmasked context are more stable and critical for reasoning. Experiments on math and coding benchmarks show AGDO outperforms existing post-training methods for dLLMs, advancing the case for attention-aware training in parallel-decoding language models.