Almanac
product

Inspect Scout

product · active · provisional · inspect-scout-5e357dc5 · 1 event · first seen 6h ago

Aliases: Inspect Scout

Recent events (1)

5 · arXiv · cs.CL · 6h ago

TAC benchmark finds frontier AI agents avoid animal-exploitative travel options less often than chance

Researchers introduce TAC (Travel Agent Compassion), the first agentic benchmark testing whether AI agents avoid animal-exploitative options when booking travel on behalf of users. Across 48 scenarios spanning six exploitation categories, all seven evaluated frontier models score below the 64% chance baseline, with the best performer (Claude Opus 4.7) at 53%. A single welfare-aware sentence in the system prompt yields large gains for Claude and GPT-5.5 (47 to 63 percentage points) but has minimal effect on DeepSeek and Gemini models. The study highlights a gap between the welfare reasoning models show in text responses and the decisions they make when acting as agents.