Entity · person

XinhaoS0101

person · active · xinhaos0101-c39ad925 · 1 event · first seen Jun 2, 2026

Aliases: XinhaoS0101

Co-occurring entities

multimodal agents · GUI grounding · CAPTCHA · HLL (Humanity's Last Line of Verification)

More like this (12)

YutongWang1216 yifanfeng97 linshenkx simonlin1212 Yuxiao Qu Xu Kevin Xu Reynold Xin NoXi Haitao Wu Xin Ye Yuxi

Recent events (1)

arXiv · cs.CL · Jun 2, 2026

HLL: Benchmark for Evaluating Multimodal Agents on CAPTCHA Human-Verification Boundaries

The paper introduces Humanity's Last Line of Verification (HLL), a controlled benchmark that tests whether multimodal agents can solve CAPTCHA challenges through grounded, human-like GUI interaction rather than mere recognition. Eight frontier multimodal agents are evaluated in a closed-loop environment across diverse CAPTCHA types with realism stressors including cluttered interfaces, harder variants, and trace-conditioned validation. Results show current agents remain brittle at this human-substitution boundary, with performance degrading under realistic conditions and when action traces must be consistent with correct answers. The benchmark exposes specific gaps in localization, action calibration, state tracking, and process consistency.

Evaluation and Benchmarking · Enterprise Deployment Patterns · multimodal agents · GUI grounding · XinhaoS0101