5Hugging Face Blog·4d ago

MolmoMotion: Language-guided 3D motion forecasting from Allen AI

Allen AI published a blog post on Hugging Face introducing MolmoMotion, a system for language-guided 3D motion forecasting. The work extends the Molmo model family to motion prediction, combining natural language conditioning with 3D spatial reasoning. The post appears to announce or demonstrate the capability; its body content was not available for detailed review.

Frontier Model Releases · Multimodal Progress · MolmoMotion · Molmo · Hugging Face · Allen Institute for AI

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Frontier Model Releases · Topic guide

Frontier Model Releases: The Race From Language to Action

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Related events (8)

5Hugging Face Blog·1mo ago

smolagents Now Supports Vision-Language Models

Hugging Face has added vision-language model (VLM) support to its smolagents framework, enabling agents to process and reason over visual inputs alongside text. This update extends the agentic tooling ecosystem to multimodal workflows. The announcement comes from the Hugging Face blog, which serves as the primary communication channel for the smolagents project.

Agent and Tool Ecosystem · Multimodal Progress · Vision-Language Models · Hugging Face · smolagents

5arXiv · cs.CL·1mo ago

AnyMo: Geometry-Aware Setup-Agnostic Framework for Wearable IMU Human Motion Understanding

AnyMo is a geometry-aware framework that addresses the setup-dependence problem in wearable IMU-based human motion modeling by using physics-grounded simulation over dense body-surface placements to generate synthetic training signals. It pre-trains a graph encoder from synthetic placement views and masked partial observations, then tokenizes multi-position IMU data into full-body motion tokens aligned with an LLM for motion-language understanding. Evaluated across zero-shot activity recognition (14 unseen datasets), cross-modal retrieval, and motion captioning, AnyMo improves average Accuracy/F1 by ~11.7%/11.6%, zero-shot retrieval MRR by 15.9–28.6%, and captioning BERT-F1 by 18.8%. The work positions itself as a generalist model for wearable motion understanding transferable across devices and sensing configurations.

Agent and Tool Ecosystem · Multimodal Progress · large language models · BERT-F1 · Baiyu Chen · +4 more

5Hugging Face Blog·1mo ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Hugging Face introduces SmolVLA, a compact Vision-Language-Action model designed for robotics control, trained on community-contributed data from the LeRobot ecosystem. The model targets efficient deployment on resource-constrained hardware while maintaining competitive manipulation performance. The release continues Hugging Face's strategy of democratizing robotics AI through open community data pipelines.

Open Weights Progress · Agent and Tool Ecosystem · LeRobot · Vision-Language-Action model · Hugging Face · +2 more

5Hugging Face Blog·1mo ago

SmolVLM2: Bringing Video Understanding to Every Device

Hugging Face introduces SmolVLM2, a family of compact vision-language models designed for video understanding on resource-constrained devices. The models extend the SmolVLM line with video comprehension capabilities while keeping footprints small enough for edge and on-device deployment. The release aims to make multimodal video understanding available beyond cloud-only inference.

Open Weights Progress · Inference Economics · SmolVLM · SmolVLM2 · Hugging Face · +1 more

5Hugging Face Blog·1mo ago

SmolLM3: Hugging Face Releases Small Multilingual Long-Context Reasoning Model

Hugging Face has released SmolLM3, a compact language model designed for multilingual support, long-context processing, and reasoning capabilities. The model targets the small/efficient model segment while incorporating reasoning features typically associated with larger models. This release continues Hugging Face's SmolLM series aimed at capable but deployable open-weight models.

Long Context Evolution · Frontier Model Releases · SmolLM · Hugging Face · SmolLM3 · +2 more

5Hugging Face Blog·9d ago

AllenAI releases olmo-eval evaluation workbench for model development

AllenAI published a blog post on Hugging Face introducing olmo-eval, an evaluation workbench designed to integrate into the model development loop. The tool appears aimed at streamlining evaluation workflows for researchers iterating on open-weights models. This is relevant to the OLMo model family ecosystem and the broader open-weights evaluation infrastructure space.

Evaluation and Benchmarking · Open Weights Progress · OLMo · AllenAI · Hugging Face · +1 more

5Hugging Face Blog·4d ago

GLM-5.2 announced as model built for long-horizon tasks

ZAI.org published a blog post on Hugging Face announcing GLM-5.2, a model positioned for long-horizon tasks. The post appears to be a model release announcement from the GLM (General Language Model) lineage. Limited body content is available, but the framing suggests capabilities relevant to extended reasoning or agentic workflows.

Long Context Evolution · Frontier Model Releases · zai-org · Hugging Face · GLM-5.1

5Hugging Face Blog·1mo ago

Vision Language Models (Better, faster, stronger)

A Hugging Face blog post surveys the state of vision-language models (VLMs) in 2025, covering advances in architecture, training, efficiency, and deployment. The post reviews progress across major open and closed VLMs, highlighting trends in multimodal capability, speed improvements, and practical deployment patterns. As a commentary piece, it synthesizes the current landscape rather than announcing new research.

Open Weights Progress · Inference Economics · Vision-Language Models · Hugging Face · +1 more