Lightricks releases LTX-2 official Python inference and LoRA trainer package for audio-video generation
Lightricks has published the official Python package for LTX-2, an audio-video generative model, with support for both inference and LoRA fine-tuning. The repository has 7,474 stars and continues to draw community interest. The release is notable as an open-source multimodal generative model that synthesizes audio and video together.
Related events (8)
AudioLDM 2, but faster ⚡️
Hugging Face published a blog post on AudioLDM 2, a latent diffusion model for audio generation, with a focus on inference speed improvements. The post likely covers integration into the Diffusers library and optimization techniques for faster audio synthesis. AudioLDM 2 supports text-to-audio, text-to-music, and text-to-speech generation tasks.
AuRA: Distilling audio understanding into LLMs via LoRA adaptation
AuRA is a new method for integrating speech understanding into LLMs by distilling audio encoding capability directly into LoRA-adapted model weights, bypassing cascaded ASR-LLM pipelines. A lightweight audio embedding layer feeds speech to both an ASR encoder (teacher) and a LoRA-adapted LLM (student), with layer-wise distillation aligning hidden states. The approach claims to outperform cascaded systems, bridge-based adaptation baselines, and large-scale multimodal models on multiple speech-language benchmarks while enabling parallel end-to-end inference without large-scale multimodal training.
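The summary does not say which loss or layer pairing AuRA uses. As a minimal sketch, assuming a mean-squared-error loss over chosen (teacher layer, student layer) pairs and a learned linear projection to reconcile the two hidden sizes, layer-wise distillation of hidden states could be computed as below. `layerwise_distill_loss`, `layer_map`, and the projection matrices are hypothetical names, not AuRA's code. The sketch also assumes both models produce the same number of positions for a given audio input, and it shows only the forward loss; training would need an autodiff framework.

```python
import numpy as np

def layerwise_distill_loss(teacher_states, student_states, layer_map, projections):
    """Average MSE between mapped teacher and student hidden states.

    teacher_states: list of (seq_len, d_teacher) arrays, one per ASR-encoder layer
    student_states: list of (seq_len, d_student) arrays, one per LoRA-adapted LLM layer
    layer_map: list of (teacher_idx, student_idx) pairs (hypothetical alignment)
    projections: dict student_idx -> (d_student, d_teacher) matrix (hypothetical)
    """
    losses = []
    for t_idx, s_idx in layer_map:
        # Project student states into the teacher's hidden size before comparing.
        projected = student_states[s_idx] @ projections[s_idx]
        losses.append(np.mean((projected - teacher_states[t_idx]) ** 2))
    # Average the per-layer losses so the scale does not grow with len(layer_map).
    return float(np.mean(losses))

# Usage with random stand-ins: 4 teacher layers (d=8), 6 student layers (d=16).
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(5, 8)) for _ in range(4)]
student = [rng.normal(size=(5, 16)) for _ in range(6)]
projections = {2: rng.normal(size=(16, 8)), 5: rng.normal(size=(16, 8))}
loss = layerwise_distill_loss(teacher, student, [(1, 2), (3, 5)], projections)
```

Minimizing this quantity pushes the student's intermediate representations toward the teacher's, which is the sense in which audio encoding ability is "distilled" into the adapted weights.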
LoRA Training Scripts of the World, Unite!
Hugging Face published a blog post consolidating and comparing advanced LoRA fine-tuning scripts for Stable Diffusion XL, covering techniques such as pivotal tuning, custom captions, and various regularization strategies. The post aims to unify fragmented community training approaches into a more coherent set of best practices. It serves as a practical guide for practitioners fine-tuning SDXL models with LoRA adapters.
TGI Multi-LoRA: Deploy Once, Serve 30 Models
Hugging Face's Text Generation Inference (TGI) introduces Multi-LoRA serving, which lets a single base model deployment serve many fine-tuned LoRA adapters at once (30 in the post's headline example). This reduces infrastructure costs by removing the need to deploy a separate model instance for each fine-tune. The feature targets enterprise use cases that need multiple task-specific variants of one base model in production.
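Conceptually, multi-LoRA serving keeps one copy of the base weights and adds a small low-rank correction chosen per request. The NumPy sketch below shows that idea for a single linear layer; it is not TGI's implementation and makes no attempt at its batching optimizations. `MultiLoRALinear` and the adapter names are hypothetical, and the `alpha / rank` scaling follows the usual LoRA convention.

```python
import numpy as np

class MultiLoRALinear:
    """One shared base weight plus many low-rank adapters, selected per request."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # (d_out, d_in), loaded once for all tenants
        self.adapters = {}              # name -> (A, B, scale)

    def add_adapter(self, name, A, B, alpha):
        rank = A.shape[0]
        # LoRA update: delta_W = (alpha / rank) * B @ A, with A (r, d_in), B (d_out, r)
        self.adapters[name] = (A, B, alpha / rank)

    def forward(self, x, adapter_names):
        """x: (batch, d_in); adapter_names: one adapter name (or None) per row."""
        out = x @ self.base_weight.T  # shared base computation for the whole batch
        for i, name in enumerate(adapter_names):
            if name is None:
                continue  # this request is served by the plain base model
            A, B, scale = self.adapters[name]
            out[i] += scale * (B @ (A @ x[i]))  # cheap rank-r correction
        return out

# Usage: three requests, two different adapters and one base-only request.
rng = np.random.default_rng(0)
layer = MultiLoRALinear(rng.normal(size=(4, 6)))
layer.add_adapter("sql", rng.normal(size=(2, 6)), rng.normal(size=(4, 2)), alpha=4.0)
layer.add_adapter("summarize", rng.normal(size=(2, 6)), rng.normal(size=(4, 2)), alpha=4.0)
batch = rng.normal(size=(3, 6))
outputs = layer.forward(batch, ["sql", "summarize", None])
```

Each adapter adds only `r * (d_in + d_out)` parameters per layer, so many adapters fit alongside one base model; that is where the infrastructure savings come from.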
Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation
This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.
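The summary does not reproduce the recipe itself. For reference, DoRA (Weight-Decomposed Low-Rank Adaptation) splits each pretrained weight into a magnitude vector and a direction, and applies the LoRA update only to the direction: `W' = m * (W0 + BA) / ||W0 + BA||`. The NumPy sketch below computes that recomposition for one layer, assuming the weight is stored as `(d_out, d_in)` and normalized per output unit. `dora_weight` and `init_dora` are hypothetical helper names, and this is not the Cosmos fine-tuning code.

```python
import numpy as np

def dora_weight(W0, A, B, m, scale=1.0):
    """Recompose a DoRA weight: magnitude m times the normalized direction.

    W0: (d_out, d_in) frozen pretrained weight
    A: (r, d_in), B: (d_out, r) low-rank LoRA factors applied to the direction
    m: (d_out,) trainable magnitude, one entry per output unit
    """
    V = W0 + scale * (B @ A)                          # updated, unnormalized direction
    norms = np.linalg.norm(V, axis=1, keepdims=True)  # one norm per output unit
    return m[:, None] * V / norms

def init_dora(W0, rank, rng):
    """Initialize so the recomposed weight equals W0 before any training."""
    d_out, d_in = W0.shape
    A = rng.normal(scale=0.01, size=(rank, d_in))
    B = np.zeros((d_out, rank))      # zero B means no initial update to the direction
    m = np.linalg.norm(W0, axis=1)   # magnitude starts at the pretrained row norms
    return A, B, m

rng = np.random.default_rng(0)
W0 = rng.normal(size=(6, 4))
A, B, m = init_dora(W0, rank=2, rng=rng)
W_init = dora_weight(W0, A, B, m)  # equals W0 up to floating-point error
```

Compared with plain LoRA, the extra trainable vector `m` lets fine-tuning change how strongly each output unit responds independently of the direction change.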
Lumos-Nexus: Efficient Frequency Bridging for Reasoning-Driven Video Generation
Lumos-Nexus is a training-efficient unified video generation framework that decouples training and inference to achieve high visual fidelity without prohibitive compute costs. During training, a lightweight generator is aligned with an understanding block; at inference, Unified Progressive Frequency Bridging (UPFB) hands off generation to a high-capacity pretrained generator in a shared latent space for coarse-to-fine refinement. The authors also introduce VR-Bench, a new benchmark for evaluating reasoning-driven video generation. Code and models are publicly released.
Code2LoRA: Hypernetwork generates repository-specific LoRA adapters for code models with zero token overhead
Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models, eliminating the inference-time token overhead of RAG or long-context injection. It supports both static repository snapshots and evolving codebases via a GRU-backed adapter updated per code diff. The authors introduce RepoPeftBench, a new benchmark of 604 Python repositories with static and evolution tracks, on which Code2LoRA-Static matches per-repository LoRA fine-tuning upper bounds and Code2LoRA-Evo outperforms a shared LoRA by 5.2 percentage points.
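The summary gives no architecture details. A minimal sketch of its two ideas follows, assuming (hypothetically) that repository snapshots and code diffs are already embedded as fixed-size vectors, that the hypernetwork is a pair of linear heads, and that the GRU hidden state doubles as the repository state fed to those heads. `GRUCell`, `LoRAHypernetwork`, and all dimensions are illustrative, not the paper's. Because the generated factors act as a low-rank weight update, no repository context has to be placed in the prompt, which is the source of the zero token overhead.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class GRUCell:
    """Plain GRU cell (biases omitted for brevity) that folds one diff into the state."""

    def __init__(self, d_in, d_hidden, rng):
        def init(*shape):
            return rng.normal(scale=0.1, size=shape)
        self.Wz, self.Uz = init(d_hidden, d_in), init(d_hidden, d_hidden)
        self.Wr, self.Ur = init(d_hidden, d_in), init(d_hidden, d_hidden)
        self.Wn, self.Un = init(d_hidden, d_in), init(d_hidden, d_hidden)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)        # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)        # reset gate
        n = np.tanh(self.Wn @ x + r * (self.Un @ h))  # candidate state
        return (1.0 - z) * n + z * h                  # blend old state and candidate

class LoRAHypernetwork:
    """Maps a repository state vector to the LoRA factors of one target layer."""

    def __init__(self, d_state, d_out, d_in, rank, rng):
        self.shape_A, self.shape_B = (rank, d_in), (d_out, rank)
        self.head_A = rng.normal(scale=0.01, size=(rank * d_in, d_state))
        self.head_B = rng.normal(scale=0.01, size=(d_out * rank, d_state))

    def __call__(self, state):
        A = (self.head_A @ state).reshape(self.shape_A)
        B = (self.head_B @ state).reshape(self.shape_B)
        return A, B

rng = np.random.default_rng(0)
d_emb, d_out, d_in, rank = 16, 8, 8, 2
gru = GRUCell(d_emb, d_emb, rng)
hyper = LoRAHypernetwork(d_emb, d_out, d_in, rank, rng)

state = rng.normal(size=d_emb)  # stand-in for a repository snapshot embedding
A, B = hyper(state)             # "static" adapter for the snapshot
for diff_embedding in rng.normal(size=(3, d_emb)):  # stand-ins for three commits
    state = gru.step(diff_embedding, state)
A_new, B_new = hyper(state)     # adapter regenerated after the diffs
delta_W = B_new @ A_new         # (d_out, d_in) low-rank update for the code model
```

Updating a recurrent state per diff means the adapter can track an evolving codebase without regenerating it from the full repository each time.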
torchtune: PyTorch Native Post-Training Library for LLMs
Meta's PyTorch team introduces torchtune, a PyTorch-native library for post-training LLMs that emphasizes modularity, hackability, and direct access to underlying PyTorch components. The library supports fine-tuning, experimentation, and deployment-oriented workflows across distributed training settings. Benchmarked against the popular frameworks Axolotl and Unsloth, torchtune shows competitive performance and memory efficiency while remaining flexible for research iteration. The paper presents its design principles, model builders, training recipes, and distributed training stack.