7The Batch (DeepLearning.AI)·17d ago

Google's Aletheia agent uses Gemini 3 Deep Think to generate novel solutions to unsolved Erdős problems

Google researchers introduced Aletheia, an agentic workflow using Gemini 3 Deep Think that generates, verifies, and revises solutions to previously unsolved mathematical problems. Applied to Erdős problems, Aletheia produced 13 correct solutions out of 200 evaluated, with 4 being genuinely novel contributions not found in existing literature. The announcement also reveals Gemini 3 Deep Think's benchmark performance: 48.4% on HLE, 84.6% on ARC-AGI-2, and 93.8% on GPQA Diamond. The system demonstrates both the promise and current limitations of AI-assisted mathematical research, with a 6.5% correct-under-intended-interpretation rate on a hard problem set.

Frontier Model Releases Evaluation and Benchmarking Agent and Tool Ecosystem Gemini 3.5 Pro Gemini Deep Think Tony Feng Thang Luong Google Aletheia AlphaEvolve GPQA Diamond Quoc V. Le HLE ARC-AGI

Related guides (4)

Frontier Model ReleasesTopic guide

Frontier Model Releases: The Race From Language to Action

Read asBeginner In-depth

Google

Google: The AI Lab That Builds Everything from DNA Models to Your Phone's Assistant

Read asBeginner

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Evaluation and BenchmarkingTopic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Read asBeginner In-depth

Related events (8)

8Google Deepmind Blog·1mo ago·source ↗

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

DeepMind has announced AlphaEvolve, a coding agent powered by Gemini that autonomously evolves algorithms for mathematical and practical computing applications. The system combines large language model creativity with automated evaluators to iteratively improve algorithmic solutions. It represents a significant step in AI-driven algorithm discovery, extending DeepMind's prior work in this space (e.g., AlphaTensor, FunSearch). The announcement comes from DeepMind's official blog, indicating a substantive capability release rather than a research preview.

Frontier Model Releases Evaluation and Benchmarking AlphaEvolve Google DeepMind AlphaTensor +3 more

6Google Deepmind Blog·1mo ago·source ↗

Accelerating Mathematical and Scientific Discovery with Gemini Deep Think

DeepMind published a blog post highlighting the research impact of Gemini Deep Think across mathematical and scientific domains. The post references multiple research papers demonstrating the model's growing utility in technical discovery workflows. This appears to be a capability showcase for DeepMind's extended-thinking variant of Gemini, positioning it as a tool for frontier scientific research.

Long Context Evolution Frontier Model Releases Gemini Deep Think Google DeepMind Gemini +1 more

7Google Deepmind Blog·1mo ago·source ↗

AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

DeepMind published a blog post detailing the real-world impact of AlphaEvolve, a Gemini-powered coding agent designed to discover and optimize algorithms. The post covers applications spanning business operations, infrastructure, and scientific research. AlphaEvolve represents a deployment of LLM-driven evolutionary algorithm search at scale across multiple domains.

Frontier Model Releases Inference Economics AlphaEvolve Google DeepMind Gemini +1 more

6The Batch·17d ago·source ↗

Data Points: Perplexity Computer expands, Google Aletheia math agent, DeepSeek chip strategy, Nvidia retrieval pipeline, Stargate cancellation

The Batch's weekly data points roundup covers five significant AI developments: Perplexity expanded its Computer agentic platform to desktop, mobile, and enterprise with new APIs and financial data tools; Google released Aletheia, a Gemini-based math research agent achieving 95.1% on IMO-Proof Bench Advanced (up from 65.7%); DeepSeek withheld pre-release access to its V4 model from Nvidia and AMD while giving domestic Chinese chipmakers early access; Nvidia's NeMo Retriever topped the ViDoRe v3 leaderboard using a ReACT-based agentic retrieval loop; and OpenAI and Oracle cancelled plans to expand the Abilene Stargate campus from 1.2 GW to 2.0 GW due to financing and reliability issues.

Training Infrastructure Frontier Model Releases ViDoRe v3 Crusoe BRIGHT +19 more

9Google Deepmind Blog·1mo ago·source ↗

Gemini 3.5: Frontier Intelligence with Action

Google DeepMind has announced Gemini 3.5, a new model generation positioned around agentic capabilities and complex workflow execution. The announcement emphasizes action-oriented AI, suggesting a focus on tool use, multi-step reasoning, and autonomous task completion. The blog post is brief, indicating this may be an initial announcement with further details to follow.

Frontier Model Releases Agent and Tool Ecosystem Google DeepMind Gemini +1 more

7Google Deepmind Blog·1mo ago·source ↗

Google DeepMind Rolls Out Deep Think in Gemini App for Ultra Subscribers

Google DeepMind is making Deep Think available in the Gemini app for Google AI Ultra subscribers, marking a broader consumer rollout of its advanced reasoning capability. Additionally, select mathematicians are being granted access to the full Gemini 2.5 Deep Think model that was entered into the International Mathematical Olympiad (IMO) competition. This deployment follows DeepMind's earlier IMO-related capability demonstrations and represents a step toward productizing frontier mathematical reasoning.

Frontier Model Releases Evaluation and Benchmarking Gemini Deep Think International Mathematical Olympiad Google DeepMind +3 more

8Google Deepmind Blog·1mo ago·source ↗

Gemini 2.5 Deep Think Achieves Gold-Medal Level at ICPC World Finals

Google DeepMind reports that Gemini 2.5 Deep Think has achieved gold-medal-level performance at the International Collegiate Programming Contest (ICPC) World Finals, one of the most prestigious competitive programming competitions globally. The announcement frames this as a significant advance in abstract problem-solving capability. This follows a pattern of frontier labs using competitive programming benchmarks to demonstrate reasoning breakthroughs, similar to prior milestones at IOI and Codeforces. The specific score, problem set, and evaluation methodology are not detailed in the announcement body.

Frontier Model Releases Evaluation and Benchmarking Gemini Deep Think International Collegiate Programming Contest Gemini 2.5 +2 more

8Google Deepmind Blog·1mo ago·source ↗

Gemini 3.1 Pro: A smarter model for your most complex tasks

Google DeepMind has announced Gemini 3.1 Pro, a new model positioned for complex reasoning tasks where simple answers are insufficient. The announcement comes from the official DeepMind blog, indicating a flagship-tier release. The body content is minimal, providing little technical detail beyond the positioning statement.

Frontier Model Releases Enterprise Deployment Patterns Gemini 3.1 Pro Google DeepMind Gemini