paper
Sumi: Open Uniform Diffusion Language Model from Scratch
paper · active · provisional
sumi-open-uniform-diffusion-language-model-from-scratch-d824a709 · 1 event · first seen 2d ago
Aliases: Sumi: Open Uniform Diffusion Language Model from Scratch
More like this (12)
Diffusion Language Models
LESS: Mutual-Stability Sampling for Diffusion Language Models
continuous diffusion language model
Self-Augmenting Retrieval for Diffusion Language Models
Knowledge Editing in Masked Diffusion Language Models
Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
discrete diffusion models
Masked Diffusion Models
Transformer Language Models
Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Block-Size Curriculum Learning for Diffusion Reasoning Models
Civil Court Simulation with Large Language Models
Recent events (1)
Sumi: First open 7B uniform diffusion language model pretrained from scratch at scale
Researchers introduce Sumi, a fully open 7B uniform diffusion language model (UDLM) pretrained from scratch on 1.5 trillion tokens, making it the first UDLM at both a large parameter scale and a large token budget. Sumi performs competitively with autoregressive models on knowledge, reasoning, and coding benchmarks but underperforms on commonsense tasks, a gap attributed partly to an education-heavy data mixture. The model weights, checkpoints, and full training recipe, including the data mixture specification, are publicly released. The work fills a gap in the diffusion language model landscape by providing a reference point for studying scaling behavior and generation dynamics in uniform diffusion.
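The summary does not spell out what "uniform" refers to. In uniform diffusion, as opposed to masked diffusion, the forward process corrupts a token by resampling it uniformly from the vocabulary rather than replacing it with a [MASK] token. The sketch below shows that generic forward corruption step. It is an illustration under stated assumptions, not Sumi's training code: the linear schedule and the names `linear_alpha`, `uniform_corrupt`, and `marginal_prob_unchanged` are hypothetical, and the source does not say which noise schedule Sumi uses.

```python
import random
from typing import List, Sequence


def linear_alpha(t: float) -> float:
    """Hypothetical linear schedule: alpha(0) = 1 (clean), alpha(1) = 0 (pure noise)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return 1.0 - t


def uniform_corrupt(
    tokens: Sequence[int], t: float, vocab_size: int, rng: random.Random
) -> List[int]:
    """Sample x_t ~ q(x_t | x_0) for uniform-noise discrete diffusion.

    Each position independently keeps its token with probability alpha(t);
    otherwise it is resampled uniformly from the whole vocabulary, which can
    coincidentally return the original token.
    """
    alpha = linear_alpha(t)
    noisy = []
    for tok in tokens:
        if rng.random() < alpha:
            noisy.append(tok)  # keep the clean token
        else:
            # Uniform diffusion: draw any vocabulary id. A masked diffusion
            # model would instead write a dedicated [MASK] id here.
            noisy.append(rng.randrange(vocab_size))
    return noisy


def marginal_prob_unchanged(t: float, vocab_size: int) -> float:
    """Probability that a position still equals x_0: alpha + (1 - alpha) / V."""
    alpha = linear_alpha(t)
    return alpha + (1.0 - alpha) / vocab_size


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [5, 17, 3, 42, 8, 99]
    print(uniform_corrupt(x0, t=0.0, vocab_size=100, rng=rng))  # t=0: returns x0 unchanged
    print(uniform_corrupt(x0, t=0.5, vocab_size=100, rng=rng))  # roughly half the positions resampled
```

Because corrupted positions still hold ordinary tokens, a uniform-diffusion denoiser must decide at every step whether each visible token is correct, whereas a masked-diffusion denoiser only fills positions explicitly marked as [MASK].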