Entity · technique

visual language model

technique · active · visual-language-model-b2fd45f4 · 1 event · first seen May 28, 2026

Aliases: visual language model

Co-occurring entities

GUI Agents · MaskClaw · P-GUI-Evo · Theodora-Y

More like this (12)

Vision-Language Models · Vision-Language-Action model · Vision-Language-Action models · vision-language grounding · LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories · VisualMem · Test-Time Training for Modality Order Consistency in Vision-Language Models · Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning · Benchmarking Multimodal Large Language Models for Scientific Visualization Literacy · Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models · From Fixed to Free Cameras: Calibration-Free View-Robust Vision-Language-Action Model

Recent events (1)

6 · arXiv · cs.CL · May 28, 2026

MaskClaw: Edge-Side Privacy Arbitration System for GUI Agents with Behavior-Driven Skill Evolution

MaskClaw is an edge-side privacy arbitration framework for GUI agents. It intercepts screenshots before they leave a trusted environment and applies an Allow, Mask, or Ask decision based on local visual evidence and user-specific policy memory. The system targets two gaps: static PII detectors miss context-dependent privacy boundaries, and cloud-side VLMs may upload raw screens before deciding what to protect. The authors also introduce P-GUI-Evo, a new benchmark built from real UI patterns with sanitized labels, and show that pattern matching, cloud reasoning, and routing each exhibit systematic failure modes when used alone. The artifact is open-sourced on GitHub.
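One way to picture an Allow/Mask/Ask gate is as a small decision function that runs on-device before any upload. The sketch below is hypothetical and is not MaskClaw's implementation. The `Detection` type, the label names, the `(app, label)`-keyed policy memory, the confidence thresholds, and the rule that any pending Ask holds the whole screenshot are all assumptions made here for illustration. The paper's actual arbitration logic and its behavior-driven skill evolution are not reproduced.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # region may leave the device unchanged
    MASK = "mask"    # region is redacted before upload
    ASK = "ask"      # decision is deferred to the user


@dataclass(frozen=True)
class Detection:
    # Hypothetical output of a local visual detector (assumption).
    label: str                      # e.g. "account_number"; label set is illustrative
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
    confidence: float               # local detector confidence in [0, 1]


def arbitrate(det, app, policy_memory, mask_threshold=0.8, ask_threshold=0.4):
    """Return a Decision for one detection (hypothetical rule, thresholds assumed)."""
    # 1. A remembered user preference for this (app, label) pair takes priority.
    remembered = policy_memory.get((app, det.label))
    if remembered is not None:
        return remembered
    # 2. Otherwise fall back to confidence thresholds on the local evidence.
    if det.confidence >= mask_threshold:
        return Decision.MASK
    if det.confidence >= ask_threshold:
        return Decision.ASK
    return Decision.ALLOW


def apply_masks(screen, detections, decisions, fill=0):
    """Return a copy of `screen` (a list of pixel rows) with MASK regions filled."""
    out = [row[:] for row in screen]  # never modify the raw screen in place
    for det, dec in zip(detections, decisions):
        if dec is Decision.MASK:
            x0, y0, x1, y1 = det.box
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = fill
    return out


def gate_screenshot(screen, detections, app, policy_memory):
    """Decide what, if anything, may leave the trusted environment."""
    decisions = [arbitrate(d, app, policy_memory) for d in detections]
    if any(d is Decision.ASK for d in decisions):
        # Assumed policy: keep the raw screen local until the user resolves ASK items.
        return None, decisions
    return apply_masks(screen, detections, decisions), decisions
```

For example, with `policy_memory = {("banking_app", "account_number"): Decision.MASK}`, an account-number detection is masked even at low confidence because the remembered preference wins. A mid-confidence email detection instead triggers Ask, so `gate_screenshot` returns `None` rather than an uploadable image. Returning nothing while any item is unresolved reflects the stated goal of deciding what to protect before the screen is uploaded, rather than after.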

Evaluation and Benchmarking · AI Safety Research · visual language model · GUI Agents · MaskClaw · +4 more