H100
h100-395383c8 · 2 events · first seen 28d ago
Aliases: H100
Recent events (2)
Easily Train Models with H100 GPUs on NVIDIA DGX Cloud
Hugging Face announced integration with NVIDIA DGX Cloud, enabling users to train models on H100 GPU clusters directly through the Hugging Face platform. The partnership simplifies access to high-end training infrastructure without requiring users to manage cloud provisioning themselves. This represents a continued push to lower the barrier to large-scale model training for the broader ML community.
Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs
Researchers from Astera Institute, NVIDIA, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into transformer weights by training the model during inference via meta-learning. The approach uses sliding-window attention restricted to 8,000 tokens and updates only the fully connected layers of the last quarter of the network on each 1,000-token chunk at inference time, keeping per-token generation latency roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures like Mamba 2 and Gated DeltaNet on inference speed. However, it fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.
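The numbers in the summary above can be made concrete with a little bookkeeping. The sketch below is not the authors' implementation and does no training; it only computes which layers would be updated, how a long input splits into 1,000-token chunks, and which tokens a sliding window of 8,000 can attend to. The function names, the 24-layer depth used in the usage lines, and the handling of a shorter trailing chunk are assumptions made for illustration.

```python
def updated_layers(num_layers: int) -> list[int]:
    """Indices of the last quarter of the network (layer count is hypothetical)."""
    first = num_layers - num_layers // 4
    return list(range(first, num_layers))


def chunk_bounds(context_len: int, chunk: int = 1_000) -> list[tuple[int, int]]:
    """(start, end) token ranges, one per test-time update step.

    Assumption: a shorter trailing chunk still gets its own step.
    """
    return [(s, min(s + chunk, context_len)) for s in range(0, context_len, chunk)]


def attention_span(position: int, window: int = 8_000) -> tuple[int, int]:
    """Half-open range of tokens visible to attention at a given position."""
    end = position + 1
    return (max(0, end - window), end)


# Usage with the figures from the summary and a hypothetical 24-layer model.
layers = updated_layers(24)
steps = chunk_bounds(128_000)
span = attention_span(127_999)
print(len(layers), len(steps), span)
```

Because the attention span never exceeds the window size, the attention cost per generated token stays bounded regardless of context length; anything older than the window is available only through the updated weights, which is consistent with the retrieval failures reported beyond 8,000 tokens.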