Almanac
VISTA

product · active · provisional · vista-802aa418 · 1 event · first seen 7d ago

Aliases: VISTA

Recent events (1)

5 · arXiv · cs.CL · 7d ago

VISTA: Hybrid user simulation toolkit for interactive agent evaluation

Researchers introduce VISTA, a user simulation framework designed to address a limitation of current agent evaluation methods: they rely on static benchmarks that miss dynamic, multi-step failure modes. VISTA provides six metrics for measuring realism, capability coverage, and interaction effectiveness, and its hybrid simulator combines UI-based and API-based interactions. Evaluated in e-commerce and education customer service settings, the toolkit shows more realistic interactions and more comprehensive coverage than existing approaches.