SimpleQA
simpleqa-f4314ada · 2 events · first seen 1mo ago
Aliases: SimpleQA
Recent events (2)
Introducing SimpleQA: OpenAI's Factuality Benchmark for Language Models
OpenAI has released SimpleQA, a benchmark designed to measure language model factuality on short, fact-seeking questions. The benchmark targets a specific and well-defined capability: answering direct factual queries accurately. It is intended to provide a clean signal on model truthfulness and calibration for this class of questions.
Mistral AI Launches Agents API with Built-in Connectors, MCP Tools, and Persistent Memory
Mistral AI has released a dedicated Agents API that extends beyond chat completion with built-in connectors for code execution, web search, image generation, and document retrieval, alongside support for Model Context Protocol (MCP) tools. The API features stateful conversation management with branching, streaming output, and multi-agent orchestration. Reported benchmark results show substantial gains from web search augmentation on SimpleQA: with search enabled, Mistral Large rises from 23% to 75% and Mistral Medium from 22% to 82%. The release targets enterprise-grade agentic workflows and is accompanied by cookbooks covering GitHub coding assistants, financial analysis, and travel planning use cases.
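To put the reported SimpleQA figures in perspective, the short Python sketch below computes the gain from enabling web search, both in percentage points and as a multiple of the baseline score. The scores are the ones quoted in the summary; the dictionary layout and the `search_gain` helper are illustrative names chosen here, not part of any Mistral or OpenAI tooling.

```python
# SimpleQA accuracy (%) as quoted in the summary: (without search, with search).
reported_scores = {
    "Mistral Large": (23, 75),
    "Mistral Medium": (22, 82),
}


def search_gain(baseline: float, with_search: float) -> tuple[float, float]:
    """Return (gain in percentage points, ratio of with-search to baseline score)."""
    return with_search - baseline, with_search / baseline


for model, (baseline, with_search) in reported_scores.items():
    points, ratio = search_gain(baseline, with_search)
    print(f"{model}: +{points} points ({ratio:.1f}x the baseline)")
```

On the quoted numbers this works out to gains of 52 and 60 percentage points for Mistral Large and Mistral Medium respectively, so both models more than triple their baseline score.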