Thinking with images
OpenAI announced a new capability allowing its reasoning models to incorporate images directly into their chain-of-thought process, enabling visual reasoning during intermediate thinking steps rather than only at input/output boundaries. This extends multimodal reasoning to the internal computation layer, potentially improving performance on tasks requiring visual analysis combined with multi-step reasoning. The announcement comes from OpenAI's official blog, indicating a product-level capability update.
Related guides (3)
Related events (8)
ETCHR: Decoupled Image Editing for Visual Chain-of-Thought Reasoning in MLLMs
ETCHR introduces a question-conditioned, reasoning-aware image editing model that decouples visual transformation from downstream understanding in multimodal LLMs. It addresses two identified gaps—language-side (mapping abstract questions to visual edits) and generation-side (edit quality degrading with reasoning depth)—via a two-stage training recipe combining supervised fine-tuning on edit trajectories and VLM-derived reward signals. Because the editor is decoupled, it plugs into arbitrary MLLMs without retraining, yielding Pass@1 gains of roughly +4.6 to +5.5 points across five task families when paired with Qwen3-VL-8B, Gemini-3.1-Flash-Lite, and Kimi K2.5. The work advances the 'think with images' paradigm beyond fixed toolkits and unified multimodal approaches.
QVQ-Max: Alibaba Qwen Releases Visual Reasoning Model with Multimodal Chain-of-Thought
Alibaba's Qwen team has officially released QVQ-Max, a visual reasoning model succeeding the December 2024 QVQ-72B-Preview. The model is designed to analyze and reason over images and videos, covering domains including mathematics, programming, and creative tasks. It represents a step beyond the exploratory preview, positioning as a production-grade multimodal reasoning system.
Learning to Reason with LLMs
OpenAI announced a new model or capability focused on reasoning in large language models, published on September 12, 2024. The post, hosted on the OpenAI blog, describes advances in training LLMs to perform complex multi-step reasoning. This likely corresponds to the release of the o1 (formerly 'Strawberry') model series, which uses chain-of-thought reasoning trained via reinforcement learning to achieve significantly improved performance on math, science, and coding benchmarks.
OpenAI o3 and o4-mini System Card
OpenAI has published the system card for its o3 and o4-mini models, which combine advanced reasoning capabilities with a full suite of integrated tools including web browsing, Python execution, image and file analysis, image generation, canvas, automations, file search, and memory. The system card documents safety evaluations and deployment considerations for these frontier reasoning models. This represents a significant capability expansion over prior o-series models by natively integrating tool use alongside chain-of-thought reasoning.
OneReason: Activating Chain-of-Thought Reasoning in Generative Recommendation Models
Researchers from the OneRec team introduce OneReason, a framework for enabling reasoning capabilities in generative recommendation models deployed across short-video, live-streaming, advertising, and e-commerce. The work identifies a key failure mode — that naive thinking-mode integration does not outperform non-thinking baselines — and diagnoses this as a deficit in two factors: itemic token perception and user behavior cognition. The proposed solution combines perception-focused pre-training, a three-level cognition-enhanced CoT format for supervised fine-tuning, and a specialize-then-unify RL training recipe.
Reasoning models struggle to control their chains of thought, and that's good
OpenAI introduces CoT-Control, a framework for evaluating how well reasoning models can deliberately manipulate or suppress their chain-of-thought outputs. The finding that models struggle to control their CoT is framed as a positive safety property, reinforcing the argument that visible reasoning traces serve as a meaningful monitorability safeguard. This contributes to ongoing research on whether chain-of-thought transparency is a reliable alignment and oversight tool.
Introducing OpenAI o1
OpenAI announced o1, a new series of AI models designed to spend more time 'thinking' before responding, using chain-of-thought reasoning to tackle complex problems in science, coding, and mathematics. The o1-preview and o1-mini models are being released, with o1-preview representing the most capable version and o1-mini offering a faster, cheaper alternative optimized for coding and reasoning tasks. OpenAI claims o1-preview ranks in the 89th percentile on competitive programming problems and performs at a PhD level on science benchmarks. This release marks a significant shift in OpenAI's approach to scaling, moving from purely training-time compute to inference-time compute as a new axis of capability improvement.
ATLAS: Unified Agentic and Latent Visual Reasoning via Functional Tokens
ATLAS proposes a framework where a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks with maintained interpretability.


