OpenWebText
openwebtext-cd4a59d7 · 2 events · first seen 28d ago
Aliases: OpenWebText
Recent events (2)
RePlaid: Continuous Diffusion Language Models Scale Competitively with Discrete Diffusion
This paper revisits continuous diffusion language models (DLMs) by introducing RePlaid, an updated version of Plaid that aligns its architecture with modern discrete DLMs. RePlaid establishes the first scaling law for continuous DLMs competitive with discrete approaches, achieving a compute gap of only 20× versus autoregressive models and a state-of-the-art perplexity bound of 22.1 on OpenWebText among continuous DLMs. The authors provide theoretical analysis showing that likelihood-based training naturally yields linear cross-entropy over time and creates structured embedding geometries, explaining the performance gains.
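As background on the metric quoted above: perplexity is the exponential of the mean per-token negative log-likelihood (NLL), so an upper bound on NLL gives an upper bound on perplexity. A minimal sketch of that conversion follows; the function name and the token count are illustrative assumptions, not taken from the paper.

```python
import math

def perplexity_from_nll(total_nll_nats: float, num_tokens: int) -> float:
    """Return exp(mean per-token NLL), with NLL measured in nats."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll_nats / num_tokens)

# Illustrative only: a mean NLL bound of ln(22.1) nats per token
# corresponds to a perplexity bound of 22.1.
num_tokens = 1000  # hypothetical evaluation set size
mean_nll_bound = math.log(22.1)
ppl_bound = perplexity_from_nll(mean_nll_bound * num_tokens, num_tokens)
```

If the NLL is reported in bits rather than nats, the same relation holds with base 2 in place of e.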
K-Forcing: Joint multi-token decoding via push-forward language modeling distillation
K-Forcing is a new inference acceleration paradigm that distills an autoregressive (AR) model into a push-forward mapping that generates k tokens per forward pass rather than one. The method uses progressive self-forcing distillation to match the teacher's sequence distribution, achieving a 2.4–3.5× speedup at k=4 with modest quality degradation. Unlike speculative decoding, K-Forcing targets the high-load batch serving scenarios common in industrial deployment, while remaining compatible with standard AR infrastructure.
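The arithmetic behind "k tokens per forward pass" can be sketched as a pass count. This is only a back-of-the-envelope illustration under the assumption that every pass emits exactly k tokens; the function names are hypothetical and this is not the paper's implementation.

```python
def forward_passes(num_tokens: int, k: int) -> int:
    """Forward passes needed to emit num_tokens when each pass yields k tokens."""
    if num_tokens < 1 or k < 1:
        raise ValueError("num_tokens and k must be at least 1")
    # Integer ceiling division: the last pass may be only partly used.
    return -(-num_tokens // k)

def pass_reduction(num_tokens: int, k: int) -> float:
    """Ratio of one-token-per-pass count to k-tokens-per-pass count."""
    return forward_passes(num_tokens, 1) / forward_passes(num_tokens, k)

passes_ar = forward_passes(1000, 1)  # one token per pass
passes_k4 = forward_passes(1000, 4)  # four tokens per pass
ceiling = pass_reduction(1000, 4)    # reduction in pass count at k=4
```

At k=4 the pass count drops by a factor of 4, which is a ceiling on what pass count alone can explain. The reported 2.4–3.5× speedup is below that ceiling; the summary does not say where the difference comes from.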