
GPT-5.2
gpt-5-2-5ad34f86 · 12 events · first seen 28d ago
Aliases: GPT-5.2
Recent events (12)
GPT-5.2 derives a new result in theoretical physics
A new preprint demonstrates GPT-5.2 proposing a novel formula for a gluon amplitude in theoretical physics, which was subsequently formally proved and verified by OpenAI researchers and academic collaborators. This represents a claimed instance of an AI system producing a genuinely new scientific result rather than reproducing known work. The result was published as a preprint and announced via the OpenAI blog.
Advancing science and math with GPT-5.2
OpenAI has released GPT-5.2, described as its strongest model for mathematics and science, achieving state-of-the-art results on GPQA Diamond and FrontierMath benchmarks. The announcement highlights practical research applications including solving an open theoretical problem and generating verified mathematical proofs. The post positions GPT-5.2 as a meaningful step toward AI-assisted scientific discovery.
Introducing GPT-5.2
OpenAI has released GPT-5.2, described as its most advanced frontier model for professional use, featuring state-of-the-art reasoning, long-context understanding, coding, and vision capabilities. The model is available through ChatGPT and the OpenAI API. It is positioned to support faster and more reliable agentic workflows.
OpenAI Releases GPT-5.2 System Card Update
OpenAI has published a system card update for GPT-5.2, the latest model family in the GPT-5 series. The safety mitigation approach is described as largely consistent with the prior GPT-5 and GPT-5.1 system cards. Training data sources follow the same pattern as other OpenAI models: publicly available internet data, third-party partnerships, and user/researcher-generated content.
Introducing Prism: OpenAI's LaTeX-Native Research Workspace with GPT-5.2
OpenAI has launched Prism, a free LaTeX-native workspace designed for researchers that integrates GPT-5.2 directly into the writing and collaboration environment. The product targets academic and scientific workflows, combining document authoring with AI-assisted reasoning in a single interface. The launch also publicly references GPT-5.2, a model iteration beyond GPT-5.
Netomi's lessons for scaling agentic systems into the enterprise
Netomi, an enterprise AI customer service platform, shares operational lessons from deploying agentic systems at scale using OpenAI's GPT-4.1 and GPT-5.2 models. The case study covers concurrency management, governance frameworks, and multi-step reasoning in production workflows. This represents a real-world deployment pattern for frontier models in enterprise agentic contexts.
Addendum to GPT-5.2 System Card: GPT-5.2-Codex
OpenAI published a system card addendum for GPT-5.2-Codex, a specialized variant of GPT-5.2 focused on coding capabilities. The document provides safety evaluations, capability assessments, and deployment considerations specific to this coding-oriented model. As a Tier 1 source system card, it represents official documentation of a frontier coding model's properties and risk profile.
Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks
Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. Anthropic reports top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, including a 144 Elo point lead over OpenAI's GPT-5.2 on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.
GPT-5.3-Codex System Card
OpenAI has released the system card for GPT-5.3-Codex, described as the most capable agentic coding model to date. It combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. The release represents a continuation of OpenAI's Codex line of specialized coding models within the GPT-5 family.
Benchmarking Local LLMs for Confidential Translation Workflows
This paper evaluates locally runnable LLMs (via Ollama) for offline, privacy-constrained translation workflows targeting freelance translators and smaller language service providers. The authors expand their Reeve Foundation corpus to include German and Simplified Chinese, then benchmark local models across four language directions against commercial NMTs (DeepL, Baidu), a frontier LLM (GPT-5.2), and professional local NMT systems. Results show substantial performance variation by language direction and model size, with the best local LLMs matching or exceeding local NMT systems and the frontier LLM, though falling short of top commercial NMTs. The study supports the viability of local LLMs for confidentiality-sensitive translation use cases.
TAC benchmark finds frontier AI agents score below chance at avoiding animal-exploitative travel options
Researchers introduce TAC (Travel Agent Compassion), the first agentic benchmark testing whether AI agents avoid animal-exploitative options when booking travel on behalf of users. Across 48 scenarios spanning six exploitation categories, all seven evaluated frontier models score below the 64% chance baseline, with the best performer (Claude Opus 4.7) at 53%. A single welfare-aware sentence in the system prompt yields dramatic gains in Claude and GPT-5.5 (47-63 percentage points) but minimal effect on DeepSeek and Gemini models. The study highlights a gap between models' text-response welfare reasoning and their agentic decision-making behavior.
Alibaba releases Qwen3.5 open-weights vision-language model family with MoE architecture across eight sizes
Alibaba released the Qwen3.5 family of eight open-weights vision-language models ranging from 0.8B to 397B parameters, built on a mixture-of-experts architecture with mixed attention and Gated DeltaNet layers. The flagship Qwen3.5-397B-A17B outperforms GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro on 28 of 44 vision benchmarks, while the 9B model surpasses OpenAI's gpt-oss-120B on most language tasks. Open weights are available under Apache 2.0, with hosted agentic variants (Qwen3.5-Plus, Qwen3.5-Flash) available via Alibaba Cloud. The release is notable for strong small-model efficiency and comes amid reported team departures following the Qwen3 rollout.