Almanac
model

Gemini 3.5 Pro

modelactiveprovisionalgemini-3-5-pro-1bc3d45a·4 events·first seen 27d ago

Aliases: Gemini 3.5 Pro, Gemini 3 Pro

Co-occurring entities

More like this (12)

Recent events (4)

6arXiv · cs.CL·25d ago·source ↗

Systematic 14-Day Evaluation of Six AI Chatbots as News Intermediaries Across Languages and Regions

Researchers evaluated six commercial AI chatbots (Gemini 3 Flash/Pro, Grok 4, Claude 4.5 Sonnet, GPT-5, GPT-4o mini) on 2,100 factual questions derived from same-day BBC News reporting across six regional services over 14 days in February 2026. Top systems exceed 90% multiple-choice accuracy on breaking news but lose 11-17% under free-response conditions. Key findings include systematic Hindi-language underperformance (79% vs. 89-91% elsewhere) driven by Anglophone retrieval bias, retrieval failures accounting for over 70% of errors, and dramatic accuracy collapse (to 19-70%) on questions containing subtle false premises. A detection-accuracy paradox is identified: the best false-premise detector does not yield the best adversarial accuracy, suggesting premise detection and answer recovery are partially independent capabilities.

7The Batch·13d ago·source ↗

Google's Aletheia agent uses Gemini 3 Deep Think to generate novel solutions to unsolved Erdős problems

Google researchers introduced Aletheia, an agentic workflow using Gemini 3 Deep Think that generates, verifies, and revises solutions to previously unsolved mathematical problems. Applied to Erdős problems, Aletheia produced 13 correct solutions out of 200 evaluated, with 4 being genuinely novel contributions not found in existing literature. The announcement also reveals Gemini 3 Deep Think's benchmark performance: 48.4% on HLE, 84.6% on ARC-AGI-2, and 93.8% on GPQA Diamond. The system demonstrates both the promise and current limitations of AI-assisted mathematical research, with a 6.5% correct-under-intended-interpretation rate on a hard problem set.

6The Batch·27d ago·source ↗

Data Points: Cursor Composer 2.5, Gemini 3.5 Flash, Antigravity 2.0, Omni Flash, AI Search, and Corti Symphony

This edition covers several notable AI product and model releases: Cursor shipped Composer 2.5 (built on Kimi K2.5) scoring 79.8% on SWE-Bench Multilingual at significantly lower cost than frontier competitors; Google released Gemini 3.5 Flash with claimed 4x speed advantage and launched Antigravity 2.0 as an agent-first desktop app replacing its IDE; Google also introduced Gemini Omni Flash for multimodal video generation and overhauled its search interface with Gemini 3.5. Additionally, Copenhagen-based Corti launched Symphony for Speech-to-Text achieving 1.4% word error rate on medical terminology versus 17-19% for generalist models.

6The Batch·18d ago·source ↗

Gemini 3.5 Flash Launch, AI FDE Job Trends, AI Act Delays, and Agent-Driven Web Traffic

Google launched Gemini 3.5 Flash, a mid-tier multimodal mixture-of-experts model with improved agentic capabilities, visual understanding, and speed, priced at $1.50/$9.00 per million input/output tokens — three times the cost of its predecessor Gemini 3 Flash. The model supports up to 1M token context, adjustable reasoning levels, and thought preservation across multi-turn conversations, and tops the Artificial Analysis APEX-Agents-AA and MMMU-Pro benchmarks. The issue also covers Andrew Ng's commentary on the rise of AI Forward Deployed Engineers versus the broader AI Engineer role, plus news items on EU AI Act implementation delays and AI agents driving measurable online traffic shifts.