Entity · other

H100

other · active · h100-395383c8 · 2 events · first seen May 19, 2026

Aliases: H100

Co-occurring entities

NVIDIA · University of California San Diego · Mamba · Stanford University · Arnuv Tandon · UC Berkeley · Gated DeltaNet-2 · Sliding Window Attention · Astera Institute · Needle-in-a-Haystack · Karan Dalal · TTT-E2E · The Pile · NVIDIA DGX Cloud · Hugging Face

More like this (12)

NVIDIA H100 · H800 · Nvidia A100 · Horizon 1000 · M2M100 · Falcon-H1 · GPT-OSS 120B · Hcompany · Qwen1.5-110B · Apriel-H1 · H Company · HPRO

Recent events (2)

6 · The Batch · Jun 1, 2026

Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs

Researchers from Astera Institute, Nvidia, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into a transformer's weights by continuing to train the model at inference time, with meta-learning used during training to prepare the model for these updates. The model uses sliding-window attention restricted to 8,000 tokens, and at inference time it updates only the fully connected layers in the last quarter of the network, once per 1,000-token chunk. Because both the attention window and the set of updated weights stay fixed in size, per-token generation latency remains roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures such as Mamba 2 and Gated DeltaNet on inference speed. However, it fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.
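The summary does not include code, so the following is only a toy sketch of the inference-time loop it describes, not the authors' implementation. It assumes a window of 8,000 tokens and 1,000-token chunks (both from the summary). It replaces the MLP weights of the last quarter of the network with a single hypothetical scalar `w` in a linear next-token predictor, and it omits the meta-learning outer loop entirely. All function names are hypothetical. The point it illustrates is that each chunk triggers one gradient step, so information from earlier tokens survives in `w` after those tokens leave the attention window, while the attention span per token never exceeds the window.

```python
import random

WINDOW = 8_000  # sliding-window attention span (from the summary)
CHUNK = 1_000   # tokens consumed per test-time gradient step (from the summary)


def layers_updated_at_test_time(num_layers: int) -> list[int]:
    """Indices of the last quarter of the blocks (hypothetical helper)."""
    start = num_layers - num_layers // 4
    return list(range(start, num_layers))


def keys_attended(position: int, window: int = WINDOW) -> int:
    """Tokens visible to the token at `position` (itself included) under a sliding window."""
    return min(position + 1, window)


def make_ar1_sequence(n: int, phi: float, seed: int = 0) -> list[float]:
    """Hypothetical toy 'document': x[t+1] = phi * x[t] + Gaussian noise."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def ttt_inference(tokens: list[float], lr: float = 0.05) -> tuple[float, list[float]]:
    """Read the sequence chunk by chunk, taking one SGD step on each chunk.

    The toy model predicts x[t+1] = w * x[t]. The scalar w stands in for
    the fast weights that TTT-E2E updates at inference time (assumption).
    Returns the final w and the loss measured on each chunk *before* the
    update for that chunk is applied.
    """
    w = 0.0
    chunk_losses = []
    for start in range(0, len(tokens) - 1, CHUNK):
        end = min(start + CHUNK, len(tokens) - 1)
        pairs = [(tokens[t], tokens[t + 1]) for t in range(start, end)]
        # Mean squared next-token error using what was learned from earlier chunks.
        loss = sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)
        chunk_losses.append(loss)
        # Gradient of that loss with respect to w, then one SGD step.
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w, chunk_losses


if __name__ == "__main__":
    seq = make_ar1_sequence(16_001, phi=0.8, seed=0)
    w, losses = ttt_inference(seq)
    print(f"learned w={w:.3f}, first-chunk loss={losses[0]:.2f}, last-chunk loss={losses[-1]:.2f}")
    print("max keys attended at position 127,999:", keys_attended(127_999))
```

In this toy setting the per-chunk loss falls as `w` absorbs the sequence's statistics, while `keys_attended` stays capped at the window however long the input grows. That cap is the property behind the roughly constant per-token latency described above.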

Training Infrastructure · Long Context Evolution · University of California San Diego · Mamba · Stanford University · +13 more

5 · Hugging Face Blog · May 19, 2026

Easily Train Models with H100 GPUs on NVIDIA DGX Cloud

Hugging Face announced integration with NVIDIA DGX Cloud, enabling users to train models on H100 GPU clusters directly through the Hugging Face platform. The partnership simplifies access to high-end training infrastructure without requiring users to manage cloud provisioning themselves. This represents a continued push to lower the barrier to large-scale model training for the broader ML community.

Training Infrastructure · Inference Economics · NVIDIA · NVIDIA DGX Cloud · H100 · +2 more