benchmark
TAC (Travel Agent Compassion)
benchmark · active · provisional
tac-travel-agent-compassion--07a98639 · 1 event · first seen 6h ago
Aliases: TAC (Travel Agent Compassion)
Recent events (1)
TAC benchmark finds frontier AI agents avoid animal-exploitative travel options at below-chance rates
Researchers introduce TAC (Travel Agent Compassion), the first agentic benchmark testing whether AI agents avoid animal-exploitative options when booking travel on behalf of users. Across 48 scenarios spanning six exploitation categories, all seven evaluated frontier models score below the 64% chance baseline, with the best performer (Claude Opus 4.7) at 53%. A single welfare-aware sentence in the system prompt yields dramatic gains in Claude and GPT-5.5 (47-63 percentage points) but minimal effect on DeepSeek and Gemini models. The study highlights a gap between models' text-response welfare reasoning and their agentic decision-making behavior.
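To make the "below chance" comparison concrete, here is a minimal sketch of how a chance baseline and the gap to it could be computed. It assumes the score is the share of scenarios in which the agent avoids the exploitative option, and that chance means picking uniformly among the offered options. The summary states neither explicitly. The function names and the toy scenario data are hypothetical; only the 53% and 64% figures come from the summary.

```python
from fractions import Fraction


def chance_baseline(scenarios):
    """Expected avoidance rate for an agent choosing uniformly at random.

    `scenarios` is a list of (safe_options, total_options) pairs. This layout
    is an assumption for illustration, not the benchmark's data format.
    """
    return sum(Fraction(safe, total) for safe, total in scenarios) / len(scenarios)


def gap_to_chance(score_pct, baseline_pct):
    """Signed gap in percentage points; a negative value means below chance."""
    return score_pct - baseline_pct


# Toy scenarios, invented purely for illustration.
toy = [(2, 3), (3, 4), (1, 2)]
toy_baseline = chance_baseline(toy)

# Figures reported in the summary: best model at 53%, chance baseline at 64%.
best_gap = gap_to_chance(53, 64)
print(f"toy baseline: {float(toy_baseline):.1%}, best-model gap: {best_gap} points")
```

Under these assumptions the best reported model sits 11 percentage points below chance, which is why the reported prompt-driven gains of 47-63 points are large enough to lift a model well above the baseline.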