Vision-Language-Action model

technique · active · vision-language-action-model-3ce380f4 · 6 events · first seen 1mo ago

Aliases: Vision-Language-Action model

Recent events (6)

5 · Hugging Face Blog · 1mo ago

Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine-Tuning, and On-Device Optimizations

NXP and Hugging Face describe a pipeline for deploying Vision-Language-Action (VLA) models on embedded/edge hardware, covering dataset recording, fine-tuning, and on-device optimization techniques. The post targets robotics applications where inference must run on resource-constrained microcontrollers or SoCs rather than cloud GPUs. Key topics include quantization, model compression, and integration with the LeRobot ecosystem. The post connects frontier VLA research to practical embedded robotics deployment.

6 · arXiv · cs.AI · 27d ago

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

This paper presents a controlled robustness study of Vision-Language-Action (VLA) models in autonomous driving, evaluating Alpamayo R1 (10B parameters) across ~18,000 inference trials under eight sensor perturbation types including noise, lighting extremes, and fog. The key finding is that Chain-of-Causation (CoC) reasoning consistency is a high-fidelity proxy for trajectory reliability: when CoC explanations change after perturbation, trajectory deviation spikes 5.3× (r=0.99 across perturbation types). Enabling CoC generation is associated with an 11.8% average improvement in trajectory accuracy. Degradation under noise is approximately linear (R²=0.957), and standard preprocessing defenses offer only marginal benefit.
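As a rough illustration of what an "approximately linear" degradation with a reported R² means, the sketch below fits a least-squares line to deviation-versus-noise points and computes the coefficient of determination. The function name `linear_fit_r2` and the sample values are assumptions made for illustration only; the numbers are synthetic placeholders and are not taken from the paper.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic placeholder values, NOT results from the paper.
noise_levels = [0.0, 0.1, 0.2, 0.3, 0.4]   # hypothetical noise strengths
deviation = [1.0, 1.9, 3.2, 3.9, 5.1]      # hypothetical trajectory deviations

slope, intercept, r2 = linear_fit_r2(noise_levels, deviation)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

An R² close to 1 says that a straight line explains most of the variance in deviation across noise levels; it does not by itself say how steep the degradation is, which is what the slope captures.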

6 · Hugging Face Blog · 28d ago

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Hugging Face published a blog post covering π0 and π0-FAST, vision-language-action (VLA) models developed for general-purpose robot control. These models combine vision and language understanding with action generation to enable robots to perform a broad range of manipulation tasks. The post appears to be a technical overview or release commentary on Physical Intelligence's robotics foundation models, situating them within the broader VLA research landscape.

7 · arXiv · cs.LG · 20d ago

Ω-QVLA: Training-Free W4A4 Quantization for Full Vision-Language-Action Models Including Diffusion Action Heads

Ω-QVLA (Omega-QVLA) is a post-training quantization framework that compresses both the LLM backbone and the diffusion-based action head of VLA models to uniform W4A4 precision (4-bit weights and 4-bit activations) without mixed-precision schemes or fine-tuning. It combines composite SVD-Hadamard rotation for weight energy equalization with per-step DiT activation scaling to handle dynamic-range drift across denoising steps. On the LIBERO benchmark, it achieves 98.0% and 87.8% task success on Pi 0.5 and GR00T N1.5 respectively, matching or exceeding FP16 baselines, while reducing static memory footprint by 71.3%. Real-world manipulation experiments confirm the approach generalizes beyond simulation.
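To give a feel for why a rotation can help 4-bit quantization, here is a minimal sketch, not the paper's method. It uses a plain orthonormal Walsh-Hadamard transform in place of the composite SVD-Hadamard rotation, covers only a single weight vector (no activations, no per-step DiT scaling), and runs on a synthetic vector with one outlier. All function names and values are hypothetical.

```python
import math


def quantize_int4(values):
    """Symmetric per-tensor 4-bit quantization: integer codes in [-8, 7] plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def hadamard(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic weights with a single large outlier (illustrative only).
weights = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5, 8.0]

# Direct quantization: the outlier sets the scale, so small weights collapse to zero.
plain = dequantize(*quantize_int4(weights))

# Rotate, quantize, dequantize, rotate back (the orthonormal transform is its own inverse).
rotated = hadamard(weights)
restored = hadamard(dequantize(*quantize_int4(rotated)))

plain_error = mse(weights, plain)
rotated_error = mse(weights, restored)
print(f"plain MSE={plain_error:.4f}  rotated MSE={rotated_error:.4f}")
```

On this synthetic vector the rotated path gives a lower reconstruction error, because the rotation spreads the outlier's energy across all coordinates and shrinks the range that the 16 quantization levels have to cover.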

5 · Hugging Face Blog · 28d ago

Asynchronous Robot Inference: Decoupling Action Prediction and Execution

Hugging Face published a blog post on asynchronous robot inference, a technique that decouples the timing of action prediction from action execution in robotic systems. This approach addresses latency bottlenecks that arise when large neural network inference times exceed the real-time control loop requirements of physical robots. The post likely covers architectural patterns and implementation strategies for deploying vision-language-action models or similar policies on robot hardware without blocking the control pipeline.
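The decoupling idea can be sketched with a producer thread and a queue: a slow policy fills a buffer with action chunks while the control loop keeps ticking at its own rate and never blocks on inference. This is a generic illustration under stated assumptions, not the implementation described in the post; `slow_policy`, `run_async`, and the timing values are hypothetical.

```python
import queue
import threading
import time


def slow_policy(observation):
    """Hypothetical stand-in for a VLA forward pass; returns a chunk of 5 actions."""
    time.sleep(0.05)  # simulated inference latency
    return [f"action-{observation}-{i}" for i in range(5)]


def run_async(num_chunks=3, control_period=0.005):
    actions = queue.Queue()
    executed = []
    done = threading.Event()

    def predictor():
        # Prediction runs in the background and only communicates via the queue.
        for obs in range(num_chunks):
            for action in slow_policy(obs):
                actions.put(action)
        done.set()

    worker = threading.Thread(target=predictor, daemon=True)
    worker.start()

    idle_ticks = 0
    # Control loop: runs at a fixed period regardless of inference time.
    while not (done.is_set() and actions.empty()):
        try:
            executed.append(actions.get_nowait())  # "execute" the next action
        except queue.Empty:
            idle_ticks += 1  # nothing ready yet; keep ticking instead of blocking
        time.sleep(control_period)
    worker.join()
    return executed, idle_ticks


if __name__ == "__main__":
    executed, idle_ticks = run_async()
    print(len(executed), "actions executed;", idle_ticks, "idle control ticks")
```

In a real system the idle ticks would typically hold the last command or a safe default, and the predictor would be fed fresh observations while the previous chunk is still executing. Both are left out here to keep the sketch short.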

5 · Hugging Face Blog · 28d ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Hugging Face introduces SmolVLA, a compact Vision-Language-Action model for robot control, trained on community-contributed data from the LeRobot ecosystem. The model targets efficient deployment on resource-constrained hardware while maintaining competitive manipulation performance. The release continues Hugging Face's strategy of democratizing robotics AI through open community data pipelines.