Entity · model

Gemini 3.5 Pro

modelactiveprovisionalgemini-3-5-pro-1bc3d45a·5 events·first seen May 20, 2026

Aliases: Gemini 3.5 Pro, Gemini 3 Pro

Co-occurring entities

More like this (12)

Gemini 3.1 Pro Gemini-3.1-Pro Gemini-3.0-Pro Gemini-3 Pro Gemini-2.5-Pro Gemini 1.5 Pro Gemini Gemini 3.5 Flash Gemini 3.1 Pro Thinking Gemini 2.5 Gemini 3.5 Flash-Lite Gemini Flash 3.5

Recent events (5)

5The Batch·Jul 24, 2026·source ↗

Stanford/Together AI study finds retrieval is the weakest link for LLM web-search agents

Researchers at Stanford University and Together AI tested six LLMs equipped with web-search tools on daily news questions across six languages, finding that retrieval failures account for the majority of errors (38.8%) rather than reasoning or comprehension failures. Top models exceeded 90% accuracy on well-formed English multiple-choice questions, but performance degraded significantly for Hindi, free-response formats, and questions containing false premises. The study identifies three retrieval improvement levers—indexing coverage, source ranking, and multilingual query handling—and suggests retrieval optimization may yield larger gains than model scaling for time-sensitive queries.

Evaluation and Benchmarking Agent and Tool Ecosystem Gemini 3.5 Pro GPT-4o mini Stanford University +10 more

7The Batch·Jun 3, 2026·source ↗

Google's Aletheia agent uses Gemini 3 Deep Think to generate novel solutions to unsolved Erdős problems

Google researchers introduced Aletheia, an agentic workflow using Gemini 3 Deep Think that generates, verifies, and revises solutions to previously unsolved mathematical problems. Applied to Erdős problems, Aletheia produced 13 correct solutions out of 200 evaluated, with 4 being genuinely novel contributions not found in existing literature. The announcement also reveals Gemini 3 Deep Think's benchmark performance: 48.4% on HLE, 84.6% on ARC-AGI-2, and 93.8% on GPQA Diamond. The system demonstrates both the promise and current limitations of AI-assisted mathematical research, with a 6.5% correct-under-intended-interpretation rate on a hard problem set.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.5 Pro Gemini Deep Think Tony Feng +9 more

6The Batch·May 29, 2026·source ↗

Gemini 3.5 Flash Launch, AI FDE Job Trends, AI Act Delays, and Agent-Driven Web Traffic

Google launched Gemini 3.5 Flash, a mid-tier multimodal mixture-of-experts model with improved agentic capabilities, visual understanding, and speed, priced at $1.50/$9.00 per million input/output tokens — three times the cost of its predecessor Gemini 3 Flash. The model supports up to 1M token context, adjustable reasoning levels, and thought preservation across multi-turn conversations, and tops the Artificial Analysis APEX-Agents-AA and MMMU-Pro benchmarks. The issue also covers Andrew Ng's commentary on the rise of AI Forward Deployed Engineers versus the broader AI Engineer role, plus news items on EU AI Act implementation delays and AI agents driving measurable online traffic shifts.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.5 Pro Palantir Artificial Analysis Intelligence Index +18 more

6arXiv · cs.CL·May 22, 2026·source ↗

Systematic 14-Day Evaluation of Six AI Chatbots as News Intermediaries Across Languages and Regions

Researchers evaluated six commercial AI chatbots (Gemini 3 Flash/Pro, Grok 4, Claude 4.5 Sonnet, GPT-5, GPT-4o mini) on 2,100 factual questions derived from same-day BBC News reporting across six regional services over 14 days in February 2026. Top systems exceed 90% multiple-choice accuracy on breaking news but lose 11-17% under free-response conditions. Key findings include systematic Hindi-language underperformance (79% vs. 89-91% elsewhere) driven by Anglophone retrieval bias, retrieval failures accounting for over 70% of errors, and dramatic accuracy collapse (to 19-70%) on questions containing subtle false premises. A detection-accuracy paradox is identified: the best false-premise detector does not yield the best adversarial accuracy, suggesting premise detection and answer recovery are partially independent capabilities.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.5 Pro BBC News GPT-4o mini +11 more

6The Batch·May 20, 2026·source ↗

Data Points: Cursor Composer 2.5, Gemini 3.5 Flash, Antigravity 2.0, Omni Flash, AI Search, and Corti Symphony

This edition covers several notable AI product and model releases: Cursor shipped Composer 2.5 (built on Kimi K2.5) scoring 79.8% on SWE-Bench Multilingual at significantly lower cost than frontier competitors; Google released Gemini 3.5 Flash with claimed 4x speed advantage and launched Antigravity 2.0 as an agent-first desktop app replacing its IDE; Google also introduced Gemini Omni Flash for multimodal video generation and overhauled its search interface with Gemini 3.5. Additionally, Copenhagen-based Corti launched Symphony for Speech-to-Text achieving 1.4% word error rate on medical terminology versus 17-19% for generalist models.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.5 Pro Gemini Spark Cursor Composer 2.5 +23 more