visual language model
technique · active · provisional
visual-language-model-b2fd45f4 · 1 event · first seen 20d ago
Aliases: visual language model
More like this (12)
Vision-Language Models
Vision-Language-Action model
Vision-Language-Action models
vision-language grounding
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
VisualMem
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
contrastive vision-language pretraining
large language model agents
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
Multimodal Large Language Models
large language models
Recent events (1)
MaskClaw: Edge-Side Privacy Arbitration System for GUI Agents with Behavior-Driven Skill Evolution
MaskClaw is an edge-side privacy arbitration framework for GUI agents. It intercepts screenshots before they leave a trusted environment and applies Allow/Mask/Ask decisions based on local visual evidence and user-specific policy memory. The system targets two gaps: static PII detectors miss context-dependent privacy boundaries, and cloud-side VLM pipelines may upload raw screens before deciding what to protect. The authors introduce P-GUI-Evo, a new benchmark built from real UI patterns with sanitized labels, and show that pattern matching, cloud reasoning, and routing, each used alone, exhibit systematic failure modes. The artifact is open-sourced on GitHub.
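A minimal sketch of an edge-side Allow/Mask/Ask step of the kind described above. Everything here is hypothetical: the `Region`, `PolicyMemory`, `arbitrate`, and `sanitize` names, the 0.5 `ask_threshold`, the category labels, and the rule "remembered user decision first, then detector confidence" are illustrative assumptions, not MaskClaw's published design. The local detector is stubbed out as precomputed regions, and the screenshot is a plain grayscale pixel grid so the example has no dependencies.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    MASK = "mask"
    ASK = "ask"


@dataclass
class Region:
    # Hypothetical output of a local (on-device) detector.
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixels
    category: str                   # hypothetical label, e.g. "bank_balance"
    confidence: float               # detector confidence in [0, 1]


@dataclass
class PolicyMemory:
    # Hypothetical per-user memory: category -> previously chosen decision.
    rules: dict[str, Decision] = field(default_factory=dict)


def arbitrate(region: Region, memory: PolicyMemory,
              ask_threshold: float = 0.5) -> Decision:
    """Assumed rule: a remembered user decision wins; otherwise mask
    confident detections and ask the user about uncertain ones."""
    remembered = memory.rules.get(region.category)
    if remembered is not None:
        return remembered
    return Decision.MASK if region.confidence >= ask_threshold else Decision.ASK


def sanitize(pixels: list[list[int]], regions: list[Region],
             memory: PolicyMemory, ask_threshold: float = 0.5):
    """Return (masked copy of the screenshot, regions awaiting the user).
    ASK regions are also blacked out (fail-closed) so nothing uncertain
    leaves the device before the user decides."""
    out = [row[:] for row in pixels]  # never mutate the original screen
    pending = []
    for r in regions:
        d = arbitrate(r, memory, ask_threshold)
        if d is Decision.ALLOW:
            continue
        if d is Decision.ASK:
            pending.append(r)
        x0, y0, x1, y1 = r.box
        for y in range(max(0, y0), min(len(out), y1)):
            row = out[y]
            for x in range(max(0, x0), min(len(row), x1)):
                row[x] = 0
    return out, pending


# Usage: the user previously allowed usernames to be shared.
memory = PolicyMemory(rules={"username": Decision.ALLOW})
screen = [[255] * 4 for _ in range(3)]
regions = [
    Region((0, 0, 2, 1), "bank_balance", 0.9),   # no rule, confident -> MASK
    Region((2, 0, 4, 1), "username", 0.95),      # remembered -> ALLOW
    Region((0, 2, 1, 3), "chat_message", 0.3),   # no rule, uncertain -> ASK
]
masked, pending = sanitize(screen, regions, memory)
print(masked)
print([r.category for r in pending])
```

Masking ASK regions in the outgoing copy is one possible design choice; a real system might instead hold the whole screenshot until the user answers, and would record the answer back into the policy memory so the same question is not asked twice.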