MolmoMotion: Language-guided 3D motion forecasting from Allen AI
Allen AI published a blog post on Hugging Face introducing MolmoMotion, a system for language-guided 3D motion forecasting. The work extends the Molmo model family into motion prediction tasks, combining natural language conditioning with 3D spatial reasoning. The post appears to be an announcement or demonstration of the capability, though the body content was not available for detailed review.
Related events (8)
smolagents Now Supports Vision-Language Models
Hugging Face has added vision-language model (VLM) support to its smolagents framework, enabling agents to process and reason over visual inputs alongside text. This update extends the agentic tooling ecosystem to multimodal workflows. The announcement comes from the Hugging Face blog, which serves as the primary communication channel for the smolagents project.
AnyMo: Geometry-Aware Setup-Agnostic Framework for Wearable IMU Human Motion Understanding
AnyMo is a geometry-aware framework that addresses the setup-dependence problem in wearable IMU-based human motion modeling by using physics-grounded simulation over dense body-surface placements to generate synthetic training signals. It pre-trains a graph encoder from synthetic placement views and masked partial observations, then tokenizes multi-position IMU data into full-body motion tokens aligned with an LLM for motion-language understanding. Evaluated across zero-shot activity recognition (14 unseen datasets), cross-modal retrieval, and motion captioning, AnyMo improves average Accuracy/F1 by ~11.7%/11.6%, zero-shot retrieval MRR by 15.9–28.6%, and captioning BERT-F1 by 18.8%. The work positions itself as a generalist model for wearable motion understanding transferable across devices and sensing configurations.
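To make the retrieval figure above concrete, here is a minimal sketch of mean reciprocal rank (MRR) as it is commonly defined. For each query, take 1 divided by the rank of the first correct candidate, then average over queries. The function name and the toy motion/caption labels are hypothetical illustrations, not AnyMo's evaluation code.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Average of 1/rank of the correct item per query (hypothetical helper).

    Ranks are 1-based. A query whose target never appears in its
    ranked list contributes 0 to the average.
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per query")
    if not targets:
        raise ValueError("need at least one query")
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        # Stop at the first match; only its rank counts.
        for position, candidate in enumerate(ranking, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(targets)


if __name__ == "__main__":
    # Toy example: each query (e.g. a motion clip) gets a ranked list of captions.
    rankings = [["walk", "run", "sit"], ["run", "sit", "walk"], ["sit", "walk", "run"]]
    targets = ["walk", "walk", "jump"]  # ranks 1, 3, and "not retrieved"
    print(round(mean_reciprocal_rank(rankings, targets), 4))  # prints 0.4444
```

In this toy case the per-query scores are 1, 1/3, and 0, so MRR is 4/9. A relative gain such as the reported 15.9–28.6% would be measured on this kind of averaged score.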
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data
Hugging Face introduces SmolVLA, a compact Vision-Language-Action model designed for robotics control, trained on community-contributed data from the LeRobot ecosystem. The model targets efficient deployment on resource-constrained hardware while maintaining competitive manipulation performance. This release represents a continuation of Hugging Face's strategy to democratize robotics AI through open community data pipelines.
SmolVLM2: Bringing Video Understanding to Every Device
Hugging Face introduces SmolVLM2, a family of compact vision-language models designed for video understanding on resource-constrained devices. The models extend the SmolVLM line with video comprehension while keeping footprints small enough for edge and on-device deployment. The release aims to make multimodal video understanding available beyond cloud-only inference.
SmolLM3: Hugging Face Releases Small Multilingual Long-Context Reasoning Model
Hugging Face has released SmolLM3, a compact language model designed for multilingual support, long-context processing, and reasoning capabilities. The model targets the small/efficient model segment while incorporating reasoning features typically associated with larger models. This release continues Hugging Face's SmolLM series aimed at capable but deployable open-weight models.
AllenAI releases olmo-eval evaluation workbench for model development
AllenAI published a blog post on Hugging Face introducing olmo-eval, an evaluation workbench designed to integrate into the model development loop. The tool appears aimed at streamlining evaluation workflows for researchers iterating on open-weights models. This is relevant to the OLMo model family ecosystem and the broader open-weights evaluation infrastructure space.
GLM-5.2 announced as model built for long-horizon tasks
ZAI.org published a blog post on Hugging Face announcing GLM-5.2, a model positioned for long-horizon tasks. The post appears to be a model release announcement from the GLM (General Language Model) lineage. Limited body content is available, but the framing suggests capabilities relevant to extended reasoning or agentic workflows.
Vision Language Models (Better, faster, stronger)
A Hugging Face blog post surveys the state of vision-language models (VLMs) in 2025, covering advances in architecture, training, efficiency, and deployment. The post reviews progress across major open and closed VLMs, highlighting trends in multimodal capability, speed improvements, and practical deployment patterns. As a commentary piece, it synthesizes the current landscape rather than announcing new research.


