PyTorch

product · active · pytorch-c54a4cdc · 14 events · first seen 1mo ago

Aliases: PyTorch

Recent events (14)

5 · OpenAI Blog · 28d ago

OpenAI standardizes on PyTorch

OpenAI announced in January 2020 that it was standardizing its deep learning framework on PyTorch. The company had previously built projects in several different frameworks, and it is now consolidating on the widely adopted open-source library. The move aligns its teams on shared tooling for future research and development.

4 · Hugging Face Blog · 5d ago

Hugging Face blog: Profiling PyTorch nn.Linear toward a fused MLP implementation

A Hugging Face blog post (Part 2 of a profiling series) walks through optimizing PyTorch's nn.Linear layers toward a fused MLP kernel. The post covers profiling methodology and kernel fusion techniques relevant to inference and training efficiency. This is a practical deep-dive into low-level PyTorch optimization for ML practitioners.
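
One common fusion step for gated (SwiGLU-style) MLPs is to stack the gate and up projections into a single `nn.Linear`, so that one matrix multiply replaces two. The sketch below illustrates that idea under our own assumptions and is not the post's implementation. The class names are hypothetical, and `torch.profiler` is used here only to count `aten::linear` calls per forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.profiler import ProfilerActivity, profile


class UnfusedGatedMLP(nn.Module):
    """Hypothetical SwiGLU-style MLP: gate and up projections read the same input separately."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class FusedGatedMLP(nn.Module):
    """Same math, but gate and up weights are stacked so one GEMM computes both."""

    def __init__(self, unfused: UnfusedGatedMLP):
        super().__init__()
        d_hidden, d_model = unfused.gate.weight.shape  # nn.Linear stores (out, in)
        self.d_hidden = d_hidden
        self.gate_up = nn.Linear(d_model, 2 * d_hidden, bias=False)
        with torch.no_grad():
            # Rows [0, d_hidden) hold the gate weights, rows [d_hidden, 2*d_hidden) the up weights.
            self.gate_up.weight.copy_(torch.cat([unfused.gate.weight, unfused.up.weight], dim=0))
        self.down = unfused.down  # the down projection is reused unchanged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = self.gate_up(x).split(self.d_hidden, dim=-1)
        return self.down(F.silu(gate) * up)


def count_linear_calls(module: nn.Module, x: torch.Tensor) -> int:
    """Profile one forward pass on CPU and count aten::linear invocations."""
    with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
        module(x)
    return sum(evt.count for evt in prof.key_averages() if evt.key == "aten::linear")


torch.manual_seed(0)
x = torch.randn(8, 64)
unfused = UnfusedGatedMLP(d_model=64, d_hidden=256)
fused = FusedGatedMLP(unfused)
print(torch.allclose(unfused(x), fused(x), atol=1e-5))
print(count_linear_calls(unfused, x), count_linear_calls(fused, x))
```

Fewer, larger GEMMs tend to use the hardware better. A full fused-MLP kernel would go further and also fold the activation and elementwise multiply into the same kernel.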

4 · Hugging Face Blog · 28d ago

Visualize and Understand GPU Memory in PyTorch

A Hugging Face blog post explains how to visualize and analyze GPU memory usage during PyTorch model training. The post covers tools and techniques for understanding memory allocation patterns, helping practitioners diagnose and reduce memory bottlenecks. This is practical infrastructure knowledge relevant to training large models efficiently.
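
A memory investigation often starts from two things, sketched below:

- A static estimate of what weights, gradients, and Adam's two moment buffers occupy. This assumes all four share the parameter dtype and ignores activations and allocator overhead.
- A CUDA allocator snapshot, recorded with PyTorch's private `torch.cuda.memory._record_memory_history` and `_dump_snapshot` helpers. The resulting file can be loaded into the viewer at pytorch.org/memory_viz.

Those helpers are underscore-prefixed and their signatures have changed between releases, so this sketch assumes PyTorch 2.1 or later. The function names are hypothetical.

```python
import torch
import torch.nn as nn


def static_training_bytes(model: nn.Module) -> int:
    """Estimate weights + grads + Adam exp_avg + exp_avg_sq, all in the parameter dtype.

    Assumption: no mixed precision, no activations, no allocator fragmentation.
    """
    weight_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return 4 * weight_bytes  # weights, gradients, and two Adam moment buffers


def record_cuda_snapshot(step_fn, path: str = "snapshot.pickle"):
    """Record allocator history around step_fn and dump it for pytorch.org/memory_viz.

    Returns None on machines without CUDA. Uses private, version-dependent APIs.
    """
    if not torch.cuda.is_available():
        return None
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    try:
        step_fn()
        torch.cuda.memory._dump_snapshot(path)
    finally:
        torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    return path


model = nn.Linear(10, 10)  # 10*10 weights + 10 biases = 110 float32 parameters
print(static_training_bytes(model))  # 110 params * 4 bytes * 4 copies = 1760
```

On a GPU machine, `step_fn` would wrap one or more real training steps, so the timeline shows when activations, gradients, and optimizer state are allocated and freed.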

4 · Hugging Face Blog · 28d ago

Accelerating PyTorch Transformers with Intel Sapphire Rapids - Part 2

This Hugging Face blog post covers inference optimization techniques for PyTorch Transformer models on Intel Sapphire Rapids (4th Gen Xeon) CPUs. It likely demonstrates performance gains using hardware-specific features such as AMX (Advanced Matrix Extensions) and BF16 support. The post is part of a series focused on making transformer inference more efficient on Intel server hardware without requiring GPU acceleration.
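
On CPUs with native BF16 support, a first step that needs no extra libraries is running inference under PyTorch's CPU autocast in BF16. Whether the matrix multiplies actually dispatch to AMX depends on the PyTorch build and its oneDNN backend, and this sketch does not check that. The small feed-forward block and its sizes are placeholders, not the models benchmarked in the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder for a Transformer feed-forward block (real workloads use larger, pretrained models).
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)).eval()
tokens = torch.randn(2, 16, 64)  # (batch, sequence, d_model)

with torch.inference_mode():
    reference = ffn(tokens)  # FP32 baseline
    # CPU autocast runs eligible ops, such as nn.Linear's matmul, in bfloat16.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        bf16_out = ffn(tokens)
    # BF16 keeps FP32's 8-bit exponent but only 7 explicit mantissa bits, so expect small drift.
    max_abs_diff = (reference - bf16_out.float()).abs().max().item()

print(reference.dtype, bf16_out.dtype, max_abs_diff)
```

Libraries such as Intel Extension for PyTorch or Optimum Intel can layer further CPU-specific optimizations on top. Their APIs are not shown here.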

4 · Hugging Face Blog · 28d ago

Accelerating PyTorch Transformers with Intel Sapphire Rapids - Part 1

This Hugging Face blog post, the first in a series on Intel's Sapphire Rapids (4th Gen Xeon) processors, focuses on training rather than inference. It shows how to distribute a PyTorch Transformer fine-tuning job across a cluster of Sapphire Rapids servers without GPU hardware. The speedups come from CPU features such as AMX (Advanced Matrix Extensions), which is new in this generation, and BF16 support, on top of the AVX-512 instructions already present in earlier Xeons. Part 2 of the series turns to inference.

4 · Hugging Face Blog · 28d ago

Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel

This Hugging Face blog post explains how to use PyTorch's Fully Sharded Data Parallel (FSDP) to train large models that exceed single-GPU memory limits. It covers the integration of FSDP with the Hugging Face Accelerate library, enabling distributed sharding of model parameters, gradients, and optimizer states across multiple GPUs. The post provides practical guidance on configuration and usage for scaling large model training.
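
Back-of-the-envelope arithmetic shows why sharding helps. The sketch below assumes the common mixed-precision Adam accounting of 16 bytes per parameter:

- 2 bytes for BF16/FP16 weights
- 2 bytes for gradients
- 12 bytes for FP32 master weights and the two Adam moments

It ignores activations, buffers, and the temporarily all-gathered parameters of the layer currently being computed. The function name is hypothetical.

```python
# Per-parameter bytes: low-precision weights, grads, FP32 master copy, Adam m, Adam v.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def per_gpu_state_gb(n_params: int, world_size: int, full_shard: bool) -> float:
    """Model + optimizer state per GPU in GB (1e9 bytes).

    Replicated data parallelism keeps a full copy on every GPU; full sharding
    splits parameters, gradients, and optimizer states evenly across world_size ranks.
    """
    total = n_params * BYTES_PER_PARAM
    per_rank = total / world_size if full_shard else total
    return per_rank / 1e9


n = 7_000_000_000  # a 7B-parameter model
print(per_gpu_state_gb(n, world_size=8, full_shard=False))  # 112.0
print(per_gpu_state_gb(n, world_size=8, full_shard=True))   # 14.0
```

Under these assumptions, the replicated layout needs more than an 80 GB GPU holds. The fully sharded layout across 8 GPUs leaves most of each device free for activations.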

6 · arXiv (cs.AI) · 27d ago

torchtune: PyTorch Native Post-Training Library for LLMs

Meta's PyTorch team introduces torchtune, a PyTorch-native library for post-training LLMs that emphasizes modularity, hackability, and direct access to underlying PyTorch components. The library supports fine-tuning, experimentation, and deployment-oriented workflows across distributed training settings. Benchmarked against popular frameworks Axolotl and Unsloth, torchtune demonstrates competitive performance and memory efficiency while maintaining flexibility for research iteration. The paper presents design principles, model builders, training recipes, and distributed training stack details.

5 · Hugging Face Blog · 1mo ago

Safetensors is Joining the PyTorch Foundation

The safetensors format, developed by Hugging Face as a secure and fast alternative to pickle-based model serialization, is joining the PyTorch Foundation. This move formalizes safetensors as part of the broader PyTorch ecosystem, signaling growing standardization around safe model-weight storage. The transition reflects increasing industry concern about supply-chain security in ML model distribution.
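
Part of why safetensors is considered safe is that loading a file only parses JSON and copies raw bytes, so nothing is unpickled or executed. The sketch below writes and reads a file following the published layout:

1. An 8-byte little-endian header length.
2. A JSON header giving `dtype`, `shape`, and `data_offsets` for each tensor.
3. The raw data buffer.

It handles only 1-D float32 data. It is a teaching aid, not the `safetensors` library. With PyTorch installed, the library's own entry points are `safetensors.torch.save_file` and `load_file`.

```python
import json
import os
import struct
import sys
import tempfile
from array import array


def save_f32(path: str, tensors: dict) -> None:
    """Write 1-D float32 tensors (lists of floats) in the safetensors layout."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        arr = array("f", values)  # 'f' = 4-byte C float on mainstream platforms
        if sys.byteorder == "big":
            arr.byteswap()  # tensor data is stored little-endian
        data = arr.tobytes()
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(data)]}
        chunks.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad header to 8-byte alignment
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # u64 little-endian header size
        f.write(header_bytes)
        f.write(b"".join(chunks))


def load_f32(path: str) -> dict:
    """Parse the JSON header and slice the byte buffer; no code from the file is executed."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        buffer = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string metadata entry
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"sketch only handles F32, got {info['dtype']}")
        start, end = info["data_offsets"]  # offsets are relative to the data buffer
        arr = array("f")
        arr.frombytes(buffer[start:end])
        if sys.byteorder == "big":
            arr.byteswap()
        tensors[name] = arr.tolist()
    return tensors


path = os.path.join(tempfile.mkdtemp(), "toy.safetensors")
save_f32(path, {"bias": [0.5, -2.0, 1.25]})
print(load_f32(path))  # {'bias': [0.5, -2.0, 1.25]}
```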

5 · Hugging Face Blog · 28d ago

How Hugging Face Accelerate Runs Very Large Models Thanks to PyTorch

This Hugging Face blog post explains how the Accelerate library runs models too large for a single GPU's memory. It builds on PyTorch features such as the meta device, which lets a model be instantiated without allocating memory for its weights. Accelerate combines this with its own device maps, CPU/disk offloading, and sharded checkpoints to spread a model transparently across multiple GPUs, CPU RAM, and disk. The post serves as both documentation and a technical explainer for practitioners working on large-scale inference and deployment.
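
Both ideas can be sketched in plain PyTorch:

1. Build the model on the `meta` device so no parameter memory is allocated. Accelerate exposes this as `init_empty_weights`.
2. Assign whole layers to devices in order until each device's memory budget is used up.

The `greedy_device_map` helper, the byte budgets, and the `"disk"` label are illustrative assumptions, and Accelerate's actual `device_map="auto"` logic is more involved. Using `torch.device` as a context manager assumes PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn


def param_bytes(module: nn.Module) -> int:
    # numel() and element_size() only read metadata, so they work on meta tensors.
    return sum(p.numel() * p.element_size() for p in module.parameters())


def greedy_device_map(model: nn.Module, budgets: dict) -> dict:
    """Hypothetical helper: place top-level children in order, moving on to the next
    device whenever the current device's byte budget would be exceeded."""
    devices = list(budgets.items())
    device_map, idx, used = {}, 0, 0
    for name, child in model.named_children():
        size = param_bytes(child)
        while idx < len(devices) and used + size > devices[idx][1]:
            idx, used = idx + 1, 0
        if idx == len(devices):
            raise MemoryError(f"layer {name} ({size} bytes) does not fit anywhere")
        device_map[name] = devices[idx][0]
        used += size
    return device_map


# Each Linear(1024, 1024) holds 1024*1024 + 1024 = 1,049,600 float32 params (≈ 4.2 MB).
with torch.device("meta"):
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])

print(all(p.is_meta for p in model.parameters()))  # True: no real storage allocated yet
budgets = {"cuda:0": 10_000_000, "cpu": 10_000_000, "disk": 100_000_000}  # bytes
print(greedy_device_map(model, budgets))
```

Loading real weights afterwards requires materializing each layer on its assigned device, for example with `Module.to_empty(device=...)` followed by `load_state_dict`.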

4 · Hugging Face Blog · 28d ago

Block Sparse Matrices for Smaller and Faster Language Models

This Hugging Face blog post introduces block sparse matrix techniques as a way to shrink language models and speed up their inference. Block sparsity enforces structured zero patterns in weight matrices: entire blocks are zero, which maps onto hardware more efficiently than the scattered zeros of unstructured sparsity. The post likely covers implementation details and benchmarks showing efficiency gains for transformer-based models.
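
To make "structured zero patterns" concrete, the sketch below keeps or drops whole `block × block` tiles of a weight matrix. It then computes a linear layer by visiting only the kept tiles. This is a slow reference loop written for clarity, and the function names are our own; practical implementations rely on dedicated GPU kernels. The density and sizes are arbitrary assumptions.

```python
import torch


def random_block_mask(out_features: int, in_features: int, block: int,
                      density: float, generator: torch.Generator) -> torch.Tensor:
    """Boolean (out/block, in/block) grid saying which weight tiles are kept."""
    assert out_features % block == 0 and in_features % block == 0
    grid = (out_features // block, in_features // block)
    return torch.rand(grid, generator=generator) < density


def block_sparse_linear(x: torch.Tensor, weight: torch.Tensor,
                        keep: torch.Tensor, block: int) -> torch.Tensor:
    """Compute x @ weight.T touching only kept tiles (weight is (out, in), like nn.Linear)."""
    out = x.new_zeros(x.shape[0], weight.shape[0])
    for bi, bj in keep.nonzero().tolist():
        rows = slice(bi * block, (bi + 1) * block)  # output features covered by this tile
        cols = slice(bj * block, (bj + 1) * block)  # input features covered by this tile
        out[:, rows] += x[:, cols] @ weight[rows, cols].T
    return out


gen = torch.Generator().manual_seed(0)
block = 32
keep = random_block_mask(128, 256, block, density=0.25, generator=gen)
weight = torch.randn(128, 256, generator=gen)
x = torch.randn(4, 256, generator=gen)

# Dense reference: zero out the dropped tiles, then do an ordinary matmul.
dense_mask = keep.float().repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
reference = x @ (weight * dense_mask).T
print(torch.allclose(block_sparse_linear(x, weight, keep, block), reference, atol=1e-4))
print(f"fraction of weight tiles stored: {keep.float().mean().item():.2f}")
```

Only the kept tiles need to be stored and multiplied. Because each tile is a small dense GEMM, the work stays regular enough for hardware to run efficiently.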

5 · Hugging Face Blog · 28d ago

Accelerate 1.0.0 Released

Hugging Face has released Accelerate 1.0.0, marking the library's first stable major version. Accelerate is a widely used PyTorch training library that abstracts distributed training across hardware configurations, including multi-GPU, TPU, and mixed-precision setups. The 1.0.0 milestone signals API stability and production readiness for the training infrastructure ecosystem.

4 · Hugging Face Blog · 28d ago

nanoVLM: Minimal Pure-PyTorch Repository for Training Vision-Language Models

Hugging Face published nanoVLM, a minimal open-source repository designed to make training vision-language models (VLMs) as simple as possible using pure PyTorch. The project aims to lower the barrier to entry for VLM research and experimentation by providing a clean, readable codebase without heavy abstractions. It follows in the tradition of educational ML repositories like nanoGPT, targeting researchers and practitioners who want to understand or customize VLM training from scratch.

5 · Hugging Face Blog · 28d ago

Quanto: a PyTorch quantization backend for Optimum

Hugging Face introduced Quanto, a new PyTorch-based quantization backend integrated into the Optimum library. Quanto supports multiple quantization schemes and data types, targeting efficient inference for large language models and other neural networks. The tool is designed to work across hardware backends and integrates with the Hugging Face ecosystem.
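
The arithmetic behind weight-only integer quantization is independent of Quanto itself. It is sketched here with symmetric int8 scales, one per output channel. Quanto's own API, supported schemes, and kernels are not reproduced, and the function names are hypothetical.

```python
import torch


def quantize_int8_per_channel(w: torch.Tensor):
    """Symmetric int8 quantization with one scale per output row of an (out, in) weight."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale


torch.manual_seed(0)
w = torch.randn(16, 64)
q, scale = quantize_int8_per_channel(w)
w_hat = dequantize(q, scale)

# Rounding to the nearest step bounds each element's error by half a step (scale / 2).
print(q.dtype, bool(((w - w_hat).abs() <= scale / 2 + 1e-5).all()))
print(f"{w.element_size()} bytes -> {q.element_size()} byte per weight, plus one scale per row")
```

Per-channel scales limit the damage from a single outlier to its own row. That is one reason per-channel scales are a common default for weights.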

5 · Hugging Face Blog · 28d ago

Introducing 🤗 Accelerate

Hugging Face introduced Accelerate, a library designed to simplify distributed training of PyTorch models across multiple GPUs and TPUs with minimal code changes. The library abstracts away the complexity of multi-device training setups, allowing researchers to scale training with a few lines of code. This was a notable contribution to the ML training infrastructure ecosystem at the time of release.