5 · Hugging Face Blog · 1mo ago

We Got Claude to Build CUDA Kernels and Teach Open Models

A Hugging Face blog post describes using Claude to generate CUDA kernels and then distilling that knowledge into open-weight models. The approach combines LLM-assisted low-level GPU programming with knowledge transfer to smaller open models. This sits at the intersection of AI-assisted systems programming and open-weights capability improvement.

Related events (8)

5 · Hugging Face Blog · 1mo ago

Custom CUDA Kernels for All from Codex and Claude

A Hugging Face blog post describes using AI coding agents (Codex and Claude) to automatically generate custom CUDA kernels, lowering the barrier to GPU kernel development. The piece demonstrates agent-assisted GPU programming as a practical workflow for ML practitioners. This represents a concrete application of AI coding tools to the specialized domain of CUDA/GPU optimization.

5 · Hugging Face Blog · 1mo ago

We Got Claude to Fine-Tune an Open Source LLM

Hugging Face demonstrates using Claude (Anthropic's model) as an orchestrating agent to autonomously fine-tune an open-source LLM, showcasing an agentic workflow for model training. The post illustrates how a frontier model can handle the end-to-end process of dataset preparation, training configuration, and execution for a smaller open-weights model. This represents a practical example of AI-assisted ML engineering and agent-tool ecosystem development.

4 · Hugging Face Blog · 1mo ago

Generate Images with Claude and Hugging Face via MCP

Hugging Face published a blog post demonstrating how to use Claude with the Model Context Protocol (MCP) to generate images through Hugging Face's inference infrastructure. The integration allows Claude to call Hugging Face image generation models as tools via MCP, connecting frontier LLMs with open-weight diffusion models. This represents a practical example of the agent-tool ecosystem pattern where LLMs orchestrate specialized model endpoints.

5 · Hugging Face Blog · 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.

4 · Hugging Face Blog · 1mo ago

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

Hugging Face published a guide on building and scaling production-ready CUDA kernels, covering the full workflow from development to deployment. The post targets ML engineers who need to write custom GPU kernels for inference optimization and production workloads. It addresses practical concerns around kernel compilation, testing, and integration with existing ML frameworks.

4 · OpenAI Blog · 1mo ago

OpenAI Releases Block-Sparse GPU Kernels for Sparse Neural Networks

OpenAI released optimized GPU kernels targeting block-sparse neural network architectures, claiming orders-of-magnitude speedups over cuBLAS and cuSPARSE depending on sparsity level. The kernels were applied to achieve state-of-the-art results in text sentiment analysis and generative modeling of text and images. This release represents an early infrastructure contribution toward efficient sparse computation in deep learning.

7 · The Batch · 38h ago

Nvidia Nemotron 3 Ultra: hybrid Mamba-transformer open-weights model targeting agentic workloads

Nvidia released Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-transformer mixture-of-experts model with a 1M-token context window, publishing weights, training data, and RL environments under an open license. The model ranks as the highest-scoring U.S. open-weights model on the Artificial Analysis Intelligence Index (47.7-48.2) and is approximately three times faster than comparable open-weights rivals, though it trails leading Chinese models such as Kimi K2.6 and DeepSeek V4 Pro on intelligence benchmarks. Nvidia used a novel Multi-Teacher On-Policy Distillation approach with 10+ specialized teacher models and trained using NVFP4 quantization. The release is strategically motivated by Nvidia's interest in a healthy open-weights ecosystem that drives AI semiconductor adoption.

6 · Hugging Face Blog · 1mo ago

NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets

NVIDIA announced new open models and datasets for physical AI development at GTC 2025, covered via the Hugging Face blog. The release targets robotics and embodied AI developers with open-weights resources. This represents NVIDIA's continued push into the physical AI ecosystem alongside its hardware dominance.