Almanac
model

Sumi

modelactiveprovisionalsumi-81bb6eb5·1 events·first seen 2d ago

Aliases: Sumi

Co-occurring entities

More like this (12)

Recent events (1)

6arXiv · cs.CL·2d ago·source ↗

Sumi: First open 7B uniform diffusion language model pretrained from scratch at scale

Researchers introduce Sumi, a fully open 7B uniform diffusion language model (UDLM) pretrained from scratch on 1.5 trillion tokens — the first UDLM at both large parameter scale and large token budget. Sumi performs competitively with autoregressive models on knowledge, reasoning, and coding benchmarks, though underperforms on commonsense tasks, attributed partly to an education-heavy data mixture. Model weights, checkpoints, and full training recipe including data mixture specification are released publicly. The work fills a gap in the diffusion language model landscape, providing a reference point for studying scaling behavior and generation dynamics in uniform diffusion.