Entity · model

GPT-5.4 mini

model · active · provisional · gpt-5-4-mini-fafb0963 · 14 events · first seen May 19, 2026

Aliases: GPT-5.4 mini, GPT-5 mini, GPT-5.4-Mini

More like this (12)

GPT-4.1 mini · GPT-4o mini · GPT-5.4 nano · GPT-5.5 · GPT-5.2 · GPT-4b micro · GPT-4.1 nano · GPT Pro · GPT-4V · GPT-5.3 · GPT-4.1 · GPT-5.5 System Card

Recent events (14)

6 · arXiv · cs.CL · Jul 23, 2026

SLAI T-Rex: Full-parameter post-training of DeepSeek-V4 family on Ascend NPU SuperPOD achieves 34% MFU

Researchers present SLAI T-Rex, an end-to-end optimization framework for full-parameter post-training of trillion-parameter MoE models on Huawei Ascend NPU SuperPOD infrastructure, using the DeepSeek-V4 model family as the target workload. The system achieves 34.22% Model FLOPs Utilization, a 2.93x improvement over the open-source baseline, through hierarchical optimizations spanning model parallelism, communication orchestration, and kernel execution. Building on this infrastructure, the team develops a domain-specialized CPT and SFT pipeline for Operations Research tasks using DeepSeek-V4-Flash, producing a model that achieves 71.81% zero-shot Pass@1 on OR benchmarks, outperforming GPT-5.4-Mini by ~4 percentage points. The work is notable both as a non-GPU large-scale training system report and as a demonstration of domain specialization for complex mathematical reasoning.

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · SLAI T-Rex · DeepSeek-V4-Flash · +3 more
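
The two headline figures in the summary above imply a baseline that the summary does not state. A minimal sketch of that arithmetic follows, assuming the 2.93x figure is a ratio of MFU values (it could instead refer to throughput) and using the usual definition of MFU as achieved model FLOPs per second divided by hardware peak FLOPs per second. The function names are illustrative and do not come from the paper.

```python
def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: fraction of hardware peak spent on model FLOPs."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def implied_baseline(reported_value: float, improvement_factor: float) -> float:
    """Back out the baseline from a reported value and an 'Nx improvement' claim."""
    if improvement_factor <= 0:
        raise ValueError("improvement_factor must be positive")
    return reported_value / improvement_factor


# Figures quoted in the summary: 34.22% MFU, 2.93x over the open-source baseline.
# Assumption: the 2.93x ratio applies to MFU itself.
baseline_mfu_percent = implied_baseline(34.22, 2.93)
print(f"implied baseline MFU: {baseline_mfu_percent:.2f}%")  # 11.68%
```

Under that assumption the open-source baseline would sit near 11.7% MFU; the paper itself is the authority on the actual baseline number.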

3 · arXiv · cs.CL · Jul 20, 2026

GPT-5 variants benchmarked for summarization-assisted automated essay scoring on ASAP 2.0

Researchers propose a hybrid framework for automated essay scoring (AES) that uses GPT-5, GPT-5 mini, and GPT-5 nano to generate controlled-length summaries of long essays, addressing transformer input-length limitations. Summaries are combined with handcrafted linguistic features and fed into downstream AES models evaluated on the ASAP 2.0 dataset using quadratic weighted kappa. GPT-5 mini achieves the best human-rating agreement while GPT-5 produces higher summarization quality, revealing cost-performance trade-offs. The study also finds that higher-scoring, more complex essays are harder to compress without information loss, raising fairness concerns for educational deployment.

Enterprise Deployment Patterns · ASAP 2.0 · OpenAI · GPT-5.4 mini · +2 more
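
For readers unfamiliar with the metric named in the summary above, here is a minimal pure-Python sketch of quadratic weighted kappa. It is the textbook formulation, not the paper's evaluation code, and the example ratings and the 1 to 4 scale are made up for illustration. The usual normalization of the weights by (number of categories - 1) squared is omitted because it cancels in the ratio.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two lists of integer ratings; 1.0 is perfect agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts O[i, j]
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    ratings = range(min_rating, max_rating + 1)
    for i in ratings:
        for j in ratings:
            weight = (i - j) ** 2  # quadratic penalty for disagreement
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        # Both raters gave the same single rating to every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-4 scale (not ASAP 2.0 data).
human_scores = [1, 2, 3, 4, 4, 2]
model_scores = [1, 2, 3, 4, 3, 2]
print(quadratic_weighted_kappa(human_scores, model_scores, 1, 4))
```

Because the penalty grows with the square of the score gap, a model that is off by one point on a few essays is punished far less than one that is off by three points on a single essay.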

3 · arXiv · cs.CL · Jul 16, 2026

Cross-rubric generalization for automated essay scoring using LLM fine-tuning with trait-based representations

A new arXiv paper introduces a framework for automated essay scoring (AES) that generalizes to previously unseen scoring rubrics, rather than just unseen prompts. The approach uses rubric-agnostic intermediate representations called 'traits' combined with target-essay supervision, achieving a 5.0% macro F1 improvement over a baseline in the hardest generalization setting. A fine-tuned Llama-based model outperforms GPT-5-mini prompting by 2.1% macro F1 and trails GPT-5 by only 1.9%, demonstrating that structured intermediate representations improve rubric generalization.

Evaluation and Benchmarking · When Rubrics Change: Cross-Rubric Generalization for Critical Thinking Essay Scoring · Llama · GPT-5.4 mini · +1 more

5 · arXiv · cs.CL · Jul 10, 2026

Benchmarking LLM judges for citation verification in deep-research systems

A new arXiv paper evaluates 8 LLM judges from 3 model families on citation quality assessment for deep-research systems, testing across 1,248 rubric decisions with human-reviewed gold labels. The study finds that cheaper models remain competitive with frontier models — GPT-5-mini achieves the strongest source-relevance F1 at 0.908 — but judges differ substantially in directional bias (pass-rate drift, false positive/negative rates) even when scalar F1 scores are similar. The key finding is that scalar F1 obscures biases that would be directly reinforced in an RL training loop, making judge calibration a prerequisite before using citation rubrics as reward signals.

Evaluation and Benchmarking · Alignment and RLHF · OpenAI · GPT-5.4 mini · Do You Need a Frontier Model as a Citation Verifier? Benchmarking Rubric LLMs for Deep-Research Source Attribution
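
The claim that scalar F1 can hide directional bias is easy to reproduce with toy numbers. The sketch below uses two hypothetical judges with invented confusion counts (not data from the paper) that reach the same F1 while erring in opposite directions. "Pass-rate drift" is defined here as the judge's pass rate minus the gold pass rate, which is an assumption about the paper's terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeCounts:
    """Confusion counts for a judge against gold labels (positive = 'pass')."""
    tp: int
    fp: int
    fn: int
    tn: int


def f1(c: JudgeCounts) -> float:
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def false_positive_rate(c: JudgeCounts) -> float:
    return c.fp / (c.fp + c.tn)


def false_negative_rate(c: JudgeCounts) -> float:
    return c.fn / (c.fn + c.tp)


def pass_rate_drift(c: JudgeCounts) -> float:
    """Judge pass rate minus gold pass rate (assumed definition)."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.fp) / total - (c.tp + c.fn) / total


# Hypothetical judges over 20 gold passes and 20 gold fails.
strict_judge = JudgeCounts(tp=14, fp=1, fn=6, tn=19)
lenient_judge = JudgeCounts(tp=20, fp=10, fn=0, tn=10)

for name, counts in [("strict", strict_judge), ("lenient", lenient_judge)]:
    print(
        f"{name}: F1={f1(counts):.3f} "
        f"FPR={false_positive_rate(counts):.2f} "
        f"FNR={false_negative_rate(counts):.2f} "
        f"drift={pass_rate_drift(counts):+.3f}"
    )
```

Both judges score F1 = 0.800, yet the lenient one passes half of the citations that should fail. Used as a reward signal, that judge would pay out for unsupported citations, which is the calibration concern the summary describes.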

5 · The Batch · Jul 3, 2026

RoboReward: Vision-Language Reward Models for Robot Training via RL

Researchers at Stanford and UC Berkeley developed RoboReward, a family of 4B and 8B vision-language reward models designed to provide reward signals for robot reinforcement learning across diverse robot types and tasks. The team built a novel dataset by augmenting successful robot demonstrations with synthetically generated failure examples using GPT-5 mini and Qwen3-4B, then fine-tuned Qwen3-VL models to predict task progress scores. RoboReward 8B outperformed GPT-5, GPT-5 mini, and Gemini Robotics-ER 1.5 on the new RoboRewardBench evaluation, and in real-world robot trials substantially exceeded prior reward model baselines while still falling short of human-assigned rewards. The authors also release RoboRewardBench as a community benchmark for reward model evaluation.

Evaluation and Benchmarking · Agent and Tool Ecosystem · DeepLearning.AI · Stanford University · UC Berkeley · +12 more

6 · OpenAI Release Notes · Jul 1, 2026

OpenAI introduces GPT-5.4 mini in Codex for fast, efficient coding tasks

OpenAI has made GPT-5.4 mini available in Codex across the CLI, IDE extension, app, and API, positioning it as a faster and cheaper alternative to GPT-5.4 for lighter coding and subagent work. The model runs more than 2x faster than GPT-5.4 and consumes only 30% of the usage limits, allowing roughly 3.3x more work within the same quota. It improves over GPT-5 mini on coding, reasoning, image understanding, and tool use. Recommended use cases include codebase exploration, large-file review, and document processing, while GPT-5.4 remains preferred for complex planning and final judgment.

Frontier Model Releases · Inference Economics · OpenAI · GPT-5.4 mini · Codex · +2 more
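
The "roughly 3.3x more work" figure in the summary above follows directly from the 30% usage figure. A minimal sketch of that arithmetic, assuming usage scales linearly with work done (the function name is illustrative):

```python
def work_multiplier(usage_fraction: float) -> float:
    """How many times more work fits in a fixed quota when each unit of work
    consumes `usage_fraction` of what the baseline model would consume."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    return 1.0 / usage_fraction


# GPT-5.4 mini is reported to consume 30% of the usage limits of GPT-5.4.
multiplier = work_multiplier(0.30)
print(f"{multiplier:.2f}x")  # 3.33x
```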

6 · OpenAI Release Notes · Jul 1, 2026

OpenAI releases GPT-5.4 mini and GPT-5.4 nano to Chat Completions and Responses API

OpenAI has released two smaller variants of GPT-5.4 — GPT-5.4 mini and GPT-5.4 nano — to the Chat Completions and Responses API. GPT-5.4 mini targets high-volume workloads with GPT-5.4-class capabilities and supports tool search, built-in computer use, and compaction; GPT-5.4 nano is optimized for simple, high-volume, speed- and cost-sensitive tasks and supports only compaction. The release extends the GPT-5.4 family into smaller, more economical tiers for production deployment.

Frontier Model Releases · Inference Economics · OpenAI · GPT-5.4 mini · GPT-5.4 nano · +2 more

5 · OpenAI Release Notes · Jul 1, 2026

OpenAI rolls out GPT-5.4 mini in ChatGPT as reasoning fallback

OpenAI is deploying GPT-5.4 mini in ChatGPT, making it available to Free and Go users via the 'Thinking' feature and as a rate-limit fallback for GPT-5.4 Thinking for paid users. The model will not appear as a selectable option in the model picker, positioning it as a background capacity-management tool rather than a user-facing choice. Enterprise customers retain the option to set Auto routing to default to GPT-5.4 mini.

Frontier Model Releases · Inference Economics · ChatGPT · GPT-5.4 Thinking · OpenAI · +1 more

4 · OpenAI Release Notes · Jul 1, 2026

OpenAI deprecates older GPT-5 and Codex model variants in ChatGPT

OpenAI is removing six model variants from the ChatGPT model picker starting April 7, 2026, including gpt-5.2-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, gpt-5.1-codex, gpt-5.1, and gpt-5, with full removal from Codex on April 14. Users are directed toward gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, and gpt-5.2 as the supported options, with gpt-5.3-codex-spark available to Pro subscribers. The update signals OpenAI consolidating its model lineup around newer GPT-5.x variants and retiring earlier iterations.

Frontier Model Releases · Inference Economics · GPT-5.3-Codex · GPT-5.2 · ChatGPT · +5 more

6 · arXiv · cs.AI · Jun 24, 2026

LLM-guided search framework discovers new quantum LDPC code families via structured concept evolution

Researchers introduce Structured Concept Evolution (SCE), a framework pairing an LLM with an algebraic mutation grammar to discover lifted-product quantum LDPC code families. The system evolves structured algebraic specifications rather than asking the LLM to design codes from scratch, enabling discovery of both abelian and non-abelian code families competitive with standard designs like bivariate-bicycle codes. Results are achieved using lightweight models (GPT-5.4-mini and GPT-5.4-nano), suggesting LLM-guided combinatorial search can be effective for hard discrete design problems in quantum error correction.

Agent and Tool Ecosystem · OpenAI · Structured Concept Evolution · GPT-5.4 mini · +1 more

6 · arXiv · cs.AI · Jun 10, 2026

Frontier coding agents use metaprogramming to handle esoteric programming languages

A new arXiv paper evaluates six LLM-based coding agents on four esoteric programming languages (including Brainfuck and Befunge-98), finding that the strongest agents—Claude Opus 4.6 and GPT-5.4 xhigh—often avoid writing the target language directly, instead generating it via Python metaprograms. Forbidding this strategy causes large performance drops, and text guidance alone does not transfer the capability to weaker models, though sharing Opus-derived Python helper code does sharply improve mid-tier agents. The study reveals capability stratification that mainstream benchmarks like SWE-Bench Verified compress into narrow bands, suggesting frontier agents succeed by constructing and debugging working models of unfamiliar environments rather than pattern-matching to training data.

Frontier Model Releases · Evaluation and Benchmarking · Claude Sonnet 4 · Claude Opus 4.6 · SWE-Bench Verified · +8 more

6 · The Batch · Jun 3, 2026

Data Points: NemoClaw enterprise stack, GPT-5.4 mini/nano, Nemotron 3 Nano 4B, Midjourney V8, and Mamba-3

A multi-item roundup covers several AI developments. Nvidia unveiled NemoClaw at GTC 2026, an enterprise software stack that integrates with OpenClaw to add security and governance for agentic deployments, with launch partners including Salesforce, Cisco, and CrowdStrike. OpenAI released GPT-5.4 mini and nano, smaller variants optimized for speed, with benchmark results on SWE-Bench Pro and OSWorld-Verified; they are priced at $0.75 and $0.20 per million input tokens respectively. Nvidia also released Nemotron 3 Nano 4B, a hybrid Mamba-Transformer on-device model with 4B parameters. Additional items cover Midjourney V8 alpha (5x faster, diffusion-only) and Mamba-3, a 1.5B state space model from CMU and Together AI with improved accuracy over Mamba-2.

Frontier Model Releases · Inference Economics · Midjourney · Mamba · Carnegie Mellon University · +19 more
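
To make the quoted input prices concrete, the sketch below estimates input-token cost for a workload. Only the two per-million input prices come from the roundup; output-token pricing is not given there and is left out, the dictionary keys are illustrative labels rather than confirmed API model identifiers, and the 50-million-token workload is hypothetical.

```python
# Input prices quoted in the roundup, in USD per million input tokens.
# Keys are illustrative labels, not confirmed API model identifiers.
INPUT_PRICE_PER_MILLION_USD = {
    "gpt-5.4-mini": 0.75,
    "gpt-5.4-nano": 0.20,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-token cost only; output tokens are priced separately."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return INPUT_PRICE_PER_MILLION_USD[model] * input_tokens / 1_000_000


# Hypothetical workload: 50 million input tokens.
workload_tokens = 50_000_000
for model in INPUT_PRICE_PER_MILLION_USD:
    print(f"{model}: ${input_cost_usd(model, workload_tokens):.2f}")
# gpt-5.4-mini: $37.50
# gpt-5.4-nano: $10.00
```

At these prices mini costs 3.75 times as much as nano per input token, so the choice between them turns on whether a task needs the larger model's capabilities.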

8 · OpenAI Blog · May 19, 2026

Introducing GPT-5.4 mini and nano

OpenAI has released GPT-5.4 mini and nano, smaller and faster variants of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads. These models are positioned for efficiency-sensitive deployment scenarios including agentic pipelines. The release extends the GPT-5.4 family with tiered model options targeting different cost and latency tradeoffs.

Frontier Model Releases · Inference Economics · OpenAI · GPT-5.4 mini · GPT-5.4 nano · +3 more

5 · OpenAI Blog · May 19, 2026

Gradient Labs gives every bank customer an AI account manager

Gradient Labs is deploying AI agents for banking support workflows, powered by OpenAI's GPT-4.1 and GPT-5.4 mini and nano models. The system targets low latency and high reliability for automating customer-facing banking operations. This represents a concrete enterprise deployment of frontier OpenAI models in a regulated financial services context.

Inference Economics · Enterprise Deployment Patterns · Gradient Labs · OpenAI · GPT-4.1 · +3 more