Entity · dataset

FineWeb-Edu

dataset · active · fineweb-edu-470afb4f · 3 events · first seen May 22, 2026

Aliases: FineWeb-Edu

Co-occurring entities

Möbius RoPE · Anti-Periodic Positional Encoding: Möbius Boundary Conditions Make In-Context Retrieval Reliable · Co-LMLM: Continuous-Query Limited Memory Language Models · GPT-4o mini · Claude Sonnet 4.5 · CO-LMLM · SimpleQA · NVIDIA Labs · Mamba · WY Algorithm · Delta Rule · Gated DeltaNet-2 · Kimi Delta Attention · RULER

More like this (12)

FineWeb · FineVideo · Online-Mind2Web · Mind2Web · OpenSciEd · iLearn-Lab · ChatGPT Edu · FNet · FedLAB · FEniCS · DigitalCoach · Canva Education

Recent events (3)

5 · arXiv · cs.CL · Jul 24, 2026

Möbius RoPE: Anti-periodic positional encoding improves in-context retrieval reliability

A new arXiv preprint introduces Möbius RoPE, a rotary positional encoding variant using an anti-periodic frequency ladder (θ_i = π(2i+1)/N) that couples sequence endpoints via a closed-form Dirichlet 'dipole' — claimed as the first anti-periodic boundary condition in positional encoding. The authors pretrain 48 models (160M and 410M parameter classes) on 2B FineWeb-Edu tokens and find that a hybrid Möbius arm raises needle-in-a-haystack retrieval from 63.3% to 90.3% at context length 512 with no perplexity cost (29.66 vs. 29.72). The effect is isolated to the anti-periodic structure specifically, as aperiodic and periodic ladder controls do not reproduce it, and swapping the frequency table back to standard RoPE collapses retrieval performance.
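The ladder formula quoted above can be checked numerically. The following is a minimal sketch, not the preprint's implementation: it assumes N is the context length and that each θ_i is applied as a rotary phase exp(j · position · θ_i). The function names and the choice of 32 frequencies are illustrative assumptions. Under those assumptions, shifting a position by N multiplies every phase by exp(jπ(2i+1)) = -1, which is the anti-periodic property the summary refers to.

```python
import numpy as np


def mobius_ladder(n_positions: int, n_freqs: int) -> np.ndarray:
    """Ladder theta_i = pi * (2i + 1) / N for i = 0 .. n_freqs - 1.

    Assumption: N (n_positions) is the context length.
    """
    i = np.arange(n_freqs)
    return np.pi * (2 * i + 1) / n_positions


def rotation_phases(thetas: np.ndarray, position: int) -> np.ndarray:
    """Unit complex phases exp(j * position * theta_i), one per frequency."""
    return np.exp(1j * position * thetas)


N = 512                                  # context length used in the summary
thetas = mobius_ladder(N, n_freqs=32)    # 32 frequencies is an arbitrary choice

phase_start = rotation_phases(thetas, 0)
phase_wrap = rotation_phases(thetas, N)

# A shift by N flips the sign of every phase (anti-periodicity).
assert np.allclose(phase_wrap, -phase_start)
# A shift by 2N returns to the starting phase.
assert np.allclose(rotation_phases(thetas, 2 * N), phase_start)
```

A periodic ladder such as θ_i = 2πi/N would instead return every phase to +1 after N positions, which is one way to read the contrast with the periodic control mentioned in the summary.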

Long Context Evolution · Evaluation and Benchmarking · Möbius RoPE · FineWeb-Edu · Anti-Periodic Positional Encoding: Möbius Boundary Conditions Make In-Context Retrieval Reliable

6 · arXiv · cs.LG · Jul 9, 2026

Co-LMLM: Continuous-query limited memory language models outperform vanilla LLMs on factual tasks at small scale

Researchers introduce CO-LMLM, a limited memory language model that externalizes factual knowledge to a knowledge base during pretraining and retrieves it at inference via continuous vector queries paired with human-readable text values. The approach removes prior restrictions to relational knowledge bases and Wikipedia-only data by introducing an annotation pipeline for arbitrary text. At 360M parameters, CO-LMLM achieves lower perplexity than models trained on 40x more data and SimpleQA factual performance comparable to GPT-4o mini and above Claude Sonnet 4.5, suggesting significant efficiency gains for factual grounding.

Evaluation and Benchmarking · Open Weights Progress · Co-LMLM: Continuous-Query Limited Memory Language Models · GPT-4o mini · Claude Sonnet 4.5 · +4 more

7 · arXiv · cs.AI · May 22, 2026

Gated DeltaNet-2: Decoupling Erase and Write Gates in Linear Attention

Gated DeltaNet-2 is a new linear attention architecture from NVIDIA Labs that separates the erase and write operations in the delta-rule update into independent channel-wise gates, generalizing both Gated DeltaNet and Kimi Delta Attention (KDA). The model introduces a chunkwise WY algorithm with channel-wise decay and a gate-aware backward pass for efficient parallel training. At 1.3B parameters trained on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants on language modeling, commonsense reasoning, and long-context RULER needle-in-a-haystack retrieval benchmarks. Code is publicly released via NVlabs on GitHub.
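To make the idea of separating erase and write concrete, here is a small sketch of a single recurrent step. The first function is the classic delta rule, S ← S − β(Sk − v)kᵀ, in which one scalar β controls both how much of the old association is erased and how much of the new value is written. The second function is a hypothetical illustration of decoupling, with separate per-channel `erase` and `write` vectors over the key channels. It is not the paper's formulation; the names, shapes, and the choice of key channels are assumptions made for this example.

```python
import numpy as np


def delta_rule_step(S, k, v, beta):
    """Classic delta rule: S <- S - beta * (S k - v) k^T.

    S has shape (d_v, d_k); k has shape (d_k,); v has shape (d_v,).
    """
    return S - beta * np.outer(S @ k - v, k)


def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled update (illustrative, not the paper's equations).

    erase and write are per-key-channel gate vectors of shape (d_k,):
        S <- S - (S k)(erase * k)^T + v (write * k)^T
    """
    return S - np.outer(S @ k, erase * k) + np.outer(v, write * k)


rng = np.random.default_rng(0)
d_v, d_k = 4, 6
S = rng.normal(size=(d_v, d_k))
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)          # unit-norm key
v = rng.normal(size=d_v)

# With both gates tied to the same scalar, the sketch reduces to the delta rule.
beta = 0.7
tied = decoupled_step(S, k, v, np.full(d_k, beta), np.full(d_k, beta))
assert np.allclose(tied, delta_rule_step(S, k, v, beta))

# With beta = 1 and a unit-norm key, the updated state maps k exactly to v.
assert np.allclose(delta_rule_step(S, k, v, 1.0) @ k, v)
```

The tied-gate check shows why such a form can be described as a generalization: the shared-β update is the special case where the erase and write gates coincide.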

Training Infrastructure · Long Context Evolution · NVIDIA Labs · Mamba · WY Algorithm · +7 more