
Sumi: Open Uniform Diffusion Language Model from Scratch

paper · active · provisional · 1 event · first seen 2d ago



More like this (12)

Diffusion Language Models
LESS: Mutual-Stability Sampling for Diffusion Language Models
continuous diffusion language model
Self-Augmenting Retrieval for Diffusion Language Models
Knowledge Editing in Masked Diffusion Language Models
Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
discrete diffusion models
Masked Diffusion Models
Transformer Language Models
Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Block-Size Curriculum Learning for Diffusion Reasoning Models
Civil Court Simulation with Large Language Models

Recent events (1)

arXiv · cs.CL · 2d ago

Sumi: First open 7B uniform diffusion language model pretrained from scratch at scale

Researchers introduce Sumi, a fully open 7B-parameter uniform diffusion language model (UDLM) pretrained from scratch on 1.5 trillion tokens, making it the first UDLM trained at both a large parameter scale and a large token budget. Sumi performs competitively with autoregressive models on knowledge, reasoning, and coding benchmarks but underperforms on commonsense tasks, a gap attributed partly to its education-heavy data mixture. The model weights, checkpoints, and full training recipe, including the data mixture specification, are released publicly. The work fills a gap in the diffusion language model landscape by providing a reference point for studying scaling behavior and generation dynamics in uniform diffusion.
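The summary does not describe Sumi's noise schedule or training objective, so the following is only a generic sketch of what "uniform" refers to in uniform discrete diffusion in general, not Sumi's code. In the forward (noising) process, each token is kept with probability alpha_t and otherwise resampled uniformly from the vocabulary, rather than being replaced by a [MASK] token as in masked diffusion. The function name `uniform_corrupt` and its parameters are hypothetical.

```python
import random


def uniform_corrupt(tokens, alpha_t, vocab_size, rng):
    """Hypothetical sketch of the uniform-noise forward process.

    Each token is kept with probability alpha_t; otherwise it is replaced by a
    token drawn uniformly from range(vocab_size). The replacement can coincide
    with the original token, so the per-token marginal is
    q(z_t | x) = Cat(alpha_t * onehot(x) + (1 - alpha_t) / vocab_size).
    """
    if not 0.0 <= alpha_t <= 1.0:
        raise ValueError("alpha_t must be in [0, 1]")
    corrupted = []
    for tok in tokens:
        if rng.random() < alpha_t:
            corrupted.append(tok)  # keep the clean token
        else:
            corrupted.append(rng.randrange(vocab_size))  # uniform resample
    return corrupted


# Usage: lightly noise a short token sequence with a seeded RNG.
rng = random.Random(0)
noisy = uniform_corrupt([5, 17, 42, 8], alpha_t=0.5, vocab_size=100, rng=rng)
```

Because corrupted positions hold ordinary vocabulary tokens instead of a dedicated mask symbol, a uniform diffusion model must learn to revise any token during denoising, not only fill in masked slots.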

Tags: Frontier Model Releases · Open Weights Progress · Sumi