Weave of Formal Thought: Sound-and-complete constrained decoding with learned latent syntax for code LLMs
The paper introduces Weave of Formal Thought (WoFT), a framework combining a formally sound-and-complete constrained decoder for code generation with a latent-variable fine-tuning method that teaches LLMs to interleave grammar non-terminals during generation. The constrained decoder extends generalized LR (GLR) parsing with speculative lexing to handle context-sensitive lexing and maximal-munch tokenization, addressing gaps in prior constrained-decoding work. A reweighted wake-sleep (RWS) fine-tuning objective on StarCoder2-3B achieves a 14.3% relative reduction in per-token cross-entropy over a text-only SFT baseline on Python, suggesting that explicit structural scaffolding recovers information lost in flat autoregressive training.
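Maximal munch means the lexer always commits to the longest lexeme that matches at the current position. The toy lexer below illustrates the rule and shows why it complicates constrained decoding. After the prefix `x *` the lexer emits a multiplication operator, but one more `*` turns it into `**`, so a decoder cannot finalize the lexeme until later characters arrive. The token set and the `maximal_munch` function are hypothetical illustrations. They are not WoFT's speculative lexer.

```python
import re

# Hypothetical toy token set (a small slice of Python-like operators).
TOKEN_SPECS = [
    ("POW_ASSIGN", r"\*\*="),
    ("POW", r"\*\*"),
    ("MUL_ASSIGN", r"\*="),
    ("MUL", r"\*"),
    ("EQ", r"=="),
    ("ASSIGN", r"="),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("WS", r"[ \t]+"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPECS]

def maximal_munch(text):
    """Tokenize text, always taking the longest match at the current position."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in COMPILED:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier spec on equal-length matches.
            if m and m.end() - pos > best_len:
                best_kind, best_len = kind, m.end() - pos
        if best_kind is None:
            raise ValueError(f"no token matches at position {pos}")
        if best_kind != "WS":  # whitespace separates tokens but is not emitted
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

For example, `maximal_munch("x *")` ends in `("MUL", "*")`, while `maximal_munch("x **")` ends in `("POW", "**")`. The last lexeme of a partial program stays provisional until the input disambiguates it.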
Related events (8)
NF-CoT: Latent reasoning with normalizing flows preserves autoregressive LLM advantages
Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.
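The exact-likelihood claim rests on the change-of-variables formula for invertible maps. The sketch below assumes an affine autoregressive flow with a standard-normal base. Each x_i = mu_i + exp(s_i) * z_i, with (mu_i, s_i) computed only from x_<i, so log p(x) = sum_i [log N(z_i; 0, 1) - s_i]. TARFlow-style models compute the shift and log-scale with a causal Transformer. The `conditioner` here is a hypothetical toy stand-in, and nothing in this code reproduces NF-CoT itself.

```python
import math

def conditioner(prefix):
    """Hypothetical toy conditioner: (shift, log_scale) from the previous element only."""
    prev = prefix[-1] if prefix else 0.0
    return 0.5 * prev, 0.1 * prev

def sample(z):
    """Generate x left to right; step i only needs the already-generated prefix x[:i]."""
    x = []
    for zi in z:
        mu, s = conditioner(x)
        x.append(mu + math.exp(s) * zi)
    return x

def invert(x):
    """Map data back to base noise; given all of x, every step could run in parallel."""
    z = []
    for i, xi in enumerate(x):
        mu, s = conditioner(x[:i])
        z.append((xi - mu) * math.exp(-s))
    return z

def log_likelihood(x):
    """Exact log p(x) = sum_i log N(z_i; 0, 1) - sum_i s_i (change of variables)."""
    total = 0.0
    for i, xi in enumerate(x):
        mu, s = conditioner(x[:i])
        zi = (xi - mu) * math.exp(-s)
        total += -0.5 * (zi * zi + math.log(2 * math.pi))  # standard-normal log density
        total -= s  # log |dz_i / dx_i| = -s_i
    return total
```

Sampling runs left to right, and each step needs only the prefix. This is the property that keeps such flows compatible with KV-cached decoding.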
IS-CoT framework addresses long-form generation collapse in LLMs via interleaved structural thinking
Researchers introduce IS-CoT (Interleaved Structural Chain-of-Thought), a framework that embeds a dynamic Plan-Write-Reflect cycle into LLM generation. It targets the severe length collapse that reasoning-enhanced models exhibit on open-ended writing tasks longer than 2,000 words. The authors build a multi-teacher training dataset of interleaved reasoning traces and train IS-Writer-8B, which achieves state-of-the-art results on LongBench-Write and outperforms DeepSeek-V3.2 by 3.08 points. The work identifies static hierarchical planning as a root cause of long-form degradation and proposes an in-model alternative to external agentic workflows.
ASRD: Training-free anchor-guided revocable decoding for diffusion LLMs improves accuracy and throughput
A new arXiv preprint introduces ASRD (Anchor Supervised Revocable Decoding), a training-free framework for improving decoding quality in diffusion large language models. The method addresses error propagation and local error reinforcement in revocable decoding. It separates trusted "anchor tokens" (identified via temporal consistency) from uncertain candidates, then applies anchor-guided generation and anchor-perturbed verification. On math and coding benchmarks, it improves accuracy by up to 6.4% and inference throughput by up to 7.2× over remasking baselines.
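The summary does not give ASRD's exact criteria. The sketch below assumes that "temporal consistency" means a position's predicted token has stayed the same for the last `k` denoising steps. Such positions are treated as anchors and protected from remasking, while low-confidence non-anchor positions remain revocable. The functions, `k`, and `threshold` are all hypothetical.

```python
def find_anchors(history, k):
    """Positions whose predicted token id was identical over the last k denoising steps.

    history: one list of predicted token ids per step, all of the same length.
    """
    if k < 1 or len(history) < k:
        return set()
    recent = history[-k:]
    return {
        pos
        for pos in range(len(recent[0]))
        if all(step[pos] == recent[0][pos] for step in recent)
    }

def positions_to_revoke(history, confidences, k, threshold):
    """Non-anchor positions whose current confidence is below threshold (remask candidates)."""
    anchors = find_anchors(history, k)
    return sorted(
        pos for pos, conf in enumerate(confidences)
        if pos not in anchors and conf < threshold
    )
```

In this toy rule, a position with low confidence but a stable prediction (an anchor) is kept. A position whose prediction has recently flipped stays open to revision.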
Variance-Calibrated Modulation (VCM): training-free decoding intervention to address LLM likelihood trap
Researchers propose Variance-Calibrated Modulation (VCM), a training-free pre-decoding method that reshapes LLM probability distributions before truncation to combat repetitive degeneration and vocabulary dullness. VCM combines two mechanisms: Contextual Searchlight via PMI (suppressing stopwords, elevating context-relevant tokens) and Adaptive Self-Debiasing (scale-invariant penalization using real-time logit standard deviation). Evaluated across open-ended generation, factual QA, and mathematical reasoning, VCM improves diversity, coherence, and reasoning accuracy at higher temperatures with negligible overhead. The method is compatible with existing decoding strategies like Top-p and Min-p.
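The summary does not give VCM's formulas. The sketch below illustrates only the general idea of a penalty measured in units of the current logit standard deviation, applied before an ordinary top-p truncation. The names `std_scaled_penalty` and `beta`, and the choice to penalize already-generated token ids, are assumptions. The PMI-based Contextual Searchlight is not shown. Because the penalty is `beta * sigma`, rescaling all logits by a positive constant rescales the penalty by the same constant, so the ranking of adjusted logits does not change. A fixed additive penalty lacks that property.

```python
import math
import statistics

def std_scaled_penalty(logits, repeated, beta=1.0):
    """Lower the logits of token ids in `repeated` by beta standard deviations (hypothetical rule)."""
    sigma = statistics.pstdev(logits)
    return [l - beta * sigma if i in repeated else l for i, l in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    """Keep the smallest most-probable set whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

With logits `[2.0, 1.0, 0.5, -1.0]` and token 0 marked as repeated, the penalty moves token 1 to the top. Doubling every logit yields exactly doubled adjusted logits.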
Trajectory Analysis of Masked Diffusion LMs for Graph-to-Text Generation with Lambda-Scaled Structural Decoding
This paper presents the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation, analyzing the order in which tokens are unmasked during iterative decoding. The authors find that MDLMs naturally unmask entities first, then relational and function words, then structural tokens. Supervised fine-tuning disrupts this order by anchoring structural tokens prematurely, which causes hallucination or omission. They propose lambda-scaled structural decoding, a training-free inference-time fix that recovers 9.4 BLEU-4 points, and introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process. Cross-dataset evaluation on the LAGRANGE benchmark shows that prior baselines overfit to dataset-specific patterns, while MDLM-based approaches generalize better.
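The summary does not spell out the lambda-scaled rule. One plausible reading, sketched here as an assumption, works within confidence-ordered unmasking, where each step commits the most confident masked positions. Multiplying the confidence of positions whose candidate token is structural by a factor `lam < 1` pushes structural tokens later and restores the entities-first order. The function name, `lam`, and the way tokens are classified as structural are hypothetical.

```python
def choose_positions_to_unmask(confidences, is_structural, masked, n_unmask, lam=0.5):
    """Pick the n_unmask masked positions with the highest lambda-scaled confidence.

    Positions whose candidate token is structural have their confidence multiplied
    by lam (< 1), so entities and content words tend to be committed first.
    """
    scored = []
    for pos in masked:
        score = confidences[pos] * (lam if is_structural[pos] else 1.0)
        scored.append((score, pos))
    scored.sort(reverse=True)
    return sorted(pos for _, pos in scored[:n_unmask])
```

With `lam=1.0` the rule reduces to plain confidence-ordered unmasking. Smaller values delay structural tokens more strongly.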
FMLM+ introduces Posterior Refinement for fast non-autoregressive language generation
Researchers introduce FMLM+, a framework that combines Flow Map Language Models (FMLMs) with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32× fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing the factorization error that has long limited non-autoregressive generation.
StarCoder: A State-of-the-Art LLM for Code
Hugging Face and ServiceNow released StarCoder, a large language model for code trained on permissively licensed data from The Stack dataset. The model targets code generation, completion, and understanding tasks and is positioned as an open-weights alternative to proprietary code models. The release includes model weights, training details, and an associated technical report.
LeVo 2: Hybrid LLM-Diffusion framework for stable full-length song generation with hierarchical modeling
LeVo 2 is a new hybrid LLM-Diffusion system for controllable full-length song generation that addresses the coherence-vs-acoustics trade-off through hierarchical token prediction: a language model handles semantic planning via mixed tokens, then predicts vocal and accompaniment tracks in parallel, while a diffusion-based codec reconstructs waveforms. A key contribution is an aesthetics-guided progressive post-training schedule combining SFT, offline DPO, and semi-online DPO to separately optimize quality, controllability, and musicality. Expert listening tests show LeVo 2 outperforms open-source baselines across six subjective dimensions and approaches leading commercial systems on several metrics.

