product
Inspect Scout
product · active · provisional
inspect-scout-5e357dc5 · 1 event · first seen 6h ago · Aliases: Inspect Scout
Recent events (1)
TAC benchmark finds frontier AI agents systematically book animal-exploitative travel options, scoring below chance
Researchers introduce TAC (Travel Agent Compassion), the first agentic benchmark testing whether AI agents avoid animal-exploitative options when booking travel on behalf of users. Across 48 scenarios spanning six exploitation categories, all seven evaluated frontier models score below the 64% chance baseline; the best performer, Claude Opus 4.7, reaches 53%. Adding a single welfare-aware sentence to the system prompt yields gains of 47–63 percentage points for Claude and GPT-5.5 but has minimal effect on DeepSeek and Gemini models. The study highlights a gap between the welfare reasoning models show in text responses and their decision-making when acting as agents.
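A small sketch can make "below the chance baseline" concrete. It rests on these assumptions:

- Each scenario offers several booking options, only some of which avoid animal exploitation.
- The agent makes one booking per scenario.
- The chance baseline is the expected score of an agent that picks uniformly at random.

The summary does not say how TAC computes its 64% figure. The scenario list, the agent outcomes and the function names below are all hypothetical toy values, not TAC data or TAC code.

```python
from fractions import Fraction

# Hypothetical toy scenarios (NOT TAC data): each entry is
# (number of non-exploitative options, total number of options).
toy_scenarios = [(2, 3), (1, 2), (3, 4), (2, 3)]


def chance_baseline(scenarios):
    """Expected score of an agent that picks one option uniformly at random
    per scenario, scoring 1 when its pick avoids animal exploitation."""
    return sum(Fraction(ok, total) for ok, total in scenarios) / len(scenarios)


def score(choices_ok):
    """Fraction of scenarios in which the agent's booking avoided exploitation."""
    return Fraction(sum(choices_ok), len(choices_ok))


# Hypothetical agent outcomes, one boolean per toy scenario.
toy_outcomes = [True, False, True, False]

baseline = chance_baseline(toy_scenarios)
model = score(toy_outcomes)
print(f"chance baseline: {float(baseline):.0%}")
print(f"model score: {float(model):.0%}")
print("below chance" if model < baseline else "at or above chance")
```

Under these assumptions, a model that scores below the baseline avoids exploitative options less often than random picking would. The gap to the baseline, like the prompt-induced gains above, is a difference in percentage points, not a relative percentage.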