Andon Labs
andon-labs-e3e7dea4·2 events·first seen 15d agoAliases: Andon Labs
Co-occurring entities
More like this (12)
Recent events (2)
Andon Labs on building frontier evals: VendingBench and evaluating Claude models
Latent Space interviews Lukas Petersson and Axel Backlund of Andon Labs, the creators of VendingBench, about their approach to building real-world AI evaluations. The conversation covers their experience evaluating Claude models across the capability spectrum from Haiku to Mythos, and their methodology for constructing durable frontier evals. The episode is notable for touching on a speculative or unreleased Claude model tier called 'Mythos.'
Insurance Companies Carve Out AI Risk Exceptions; GPT-Rosalind, Claude Design, and Agentic Retail Deployments Highlighted
Major insurers including Berkshire Hathaway units, Travelers Group, and Chubb are excluding or restricting AI-related liability coverage, signaling growing concern over hard-to-model AI-driven claims. OpenAI introduced GPT-Rosalind, a domain-specific LLM fine-tuned for life sciences workflows, while Anthropic launched Claude Design for visual asset generation targeting non-designers. Additional items cover an AI-run San Francisco retail store exposing agentic system limitations, Wall Street banks cutting junior roles via AI deployment, and Anthropic's continued engagement with the Trump administration despite prior Pentagon restrictions.