Almanac
4 · Hugging Face Blog · 1mo ago

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

NVIDIA's LogitsProcessorZoo is a library providing a collection of logits processors for fine-grained control over language model text generation. The blog post, published on Hugging Face, covers how these processors can constrain, guide, or modify token sampling distributions at inference time. This tooling is relevant for applications requiring structured outputs, constrained decoding, or specialized generation behaviors without retraining.
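
As a rough illustration of what a logits processor does, the sketch below modifies a toy logit vector before it is turned into a sampling distribution. It is written in plain Python and does not use the LogitsProcessorZoo API; the names `ban_tokens` and `apply_processors`, the four-token vocabulary, and the logit values are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ban_tokens(logits, banned_ids):
    """Hypothetical processor: make banned token ids impossible to sample."""
    return [float("-inf") if i in banned_ids else x
            for i, x in enumerate(logits)]

def apply_processors(logits, processors):
    """Run each processor in order on the logits for one decoding step."""
    for process in processors:
        logits = process(logits)
    return logits

# Toy logits over a 4-token vocabulary (values are made up).
logits = [2.0, 1.0, 0.5, -1.0]

# Forbid token 0, then renormalize over the remaining tokens.
processed = apply_processors(logits, [lambda l: ban_tokens(l, {0})])
probs = softmax(processed)
```

Because the change is applied to logits at each decoding step, the model weights are untouched, which is why this kind of control needs no retraining.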

Related events (8)

6 · arXiv · cs.CL · 5d ago

LOGOS: A unified autoregressive foundation model for natural science tasks across domains

Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.

4 · Hugging Face Blog · 1mo ago

Optimizing your LLM in production

A Hugging Face blog post covering practical techniques for optimizing large language models in production environments. The post likely addresses inference efficiency methods such as quantization, batching, caching, and hardware utilization strategies. It serves as a practitioner-oriented guide for deploying LLMs at scale.

5 · Hugging Face Blog · 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.

5 · Hugging Face Blog · 29d ago

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

In a Hugging Face blog post, NVIDIA's Nemotron-Labs introduces diffusion-based language models targeting extremely fast text generation. The piece covers the use of diffusion processes for language modeling as an alternative to autoregressive generation, with a focus on inference speed. It continues the push by NVIDIA's research arm into non-autoregressive generation paradigms.

4 · Hugging Face Blog · 1mo ago

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Hugging Face published a blog post surveying the open-source LLM ecosystem as of mid-2023, covering text generation models, tooling, and deployment patterns available on the platform. The post highlights the breadth of open-weight models and associated infrastructure for inference and fine-tuning. It serves as a reference overview of the state of open-source LLMs at that point in time.

5 · Hugging Face Blog · 1mo ago

Assisted Generation: a new direction toward low-latency text generation

Hugging Face introduces assisted generation (speculative decoding) as a practical technique for reducing LLM inference latency. The approach uses a smaller draft model to propose token candidates that a larger model then verifies in parallel, enabling multiple tokens to be accepted per forward pass. The blog post explains the mechanism and demonstrates integration into the Hugging Face Transformers library.
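
The draft-and-verify loop described above can be sketched with two toy "models". This is a minimal greedy variant under stated assumptions, not the Transformers implementation: `target_next` and `draft_next` are hypothetical stand-ins for the large and small models, and the verification loop stands in for what would be a single batched forward pass of the large model.

```python
def target_next(prefix):
    """Hypothetical large model: next token is the prefix sum modulo 5."""
    return sum(prefix) % 5

def draft_next(prefix):
    """Hypothetical small model: agrees with the target unless the last token is 3."""
    return 0 if prefix[-1] == 3 else sum(prefix) % 5

def greedy_decode(next_fn, prompt, n):
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_fn(seq))
    return seq

def assisted_decode(prompt, n, k=4):
    """Draft k tokens, keep the prefix the target agrees with, then add one target token."""
    seq = list(prompt)
    goal = len(prompt) + n
    target_passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes k candidate tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The target checks every candidate position (one pass in a real system).
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction ends this round
                break
            accepted.append(tok)
        else:
            # All candidates accepted: the same pass also yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], target_passes

prompt = [1, 2]
n = 12
baseline = greedy_decode(target_next, prompt, n)
assisted, passes = assisted_decode(prompt, n)
```

In this greedy variant the assisted output matches what the target model would produce on its own, while the number of target passes stays below the number of generated tokens whenever the draft guesses correctly at least some of the time.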

6 · Hugging Face Blog · 1mo ago

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

A Hugging Face blog post surveys 16 open-source reinforcement learning libraries for LLM training, analyzing their architectural approaches to asynchronous and synchronous token generation pipelines. The piece distills practical lessons about throughput, scalability, and design trade-offs across the ecosystem. It serves as a comparative landscape analysis for practitioners building or choosing RL training infrastructure for language models.

4 · Hugging Face Blog · 1mo ago

Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator

This Hugging Face blog post covers deploying BLOOMZ, a large multilingual language model, on Intel's Habana Gaudi2 accelerator for inference. It benchmarks throughput and latency performance on Gaudi2 as an alternative to GPU-based inference. The post is part of ongoing work to demonstrate non-NVIDIA hardware options for large model deployment.