Researchers introduce LOTUS (Looped Transformers with parallel supervision on latents), a latent chain-of-thought method that processes reasoning steps in hidden states rather than decoded tokens. LOTUS is claimed to be the first latent-CoT approach to match explicit CoT performance at the 3B parameter scale, while reducing thought-phase latency by 2.5x–6.9x. The method uses a looped (recurrent-depth) Transformer backbone with parallel cross-entropy supervision on gold CoT-step tokens at each latent position, and the latent space is shown to be interpretable by projecting through the base LM head to recover reasoning steps.
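The two mechanisms named above, per-position cross-entropy against gold CoT tokens and readout of latents through the LM head, can be illustrated with a toy sketch. This is not the LOTUS implementation: the vocabulary, the `LM_HEAD` weights, the latent vectors, and every function name are made-up values chosen only so the arithmetic can be followed by hand.

```python
import math

# Hypothetical 5-token vocabulary and a tiny LM head (vocab_size x hidden_dim).
VOCAB = ["3", "+", "4", "=", "7"]
LM_HEAD = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
]

def logits_for(hidden):
    """Project one latent vector through the LM head."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in LM_HEAD]

def decode_latents(latents):
    """Readout: take the argmax token at every latent position."""
    decoded = []
    for hidden in latents:
        logits = logits_for(hidden)
        best = max(range(len(logits)), key=logits.__getitem__)
        decoded.append(VOCAB[best])
    return decoded

def parallel_ce_loss(latents, gold_ids):
    """Mean cross-entropy over all latent positions.

    Each position is scored independently against its own gold token,
    so the terms could be computed in parallel.
    """
    total = 0.0
    for hidden, gold in zip(latents, gold_ids):
        logits = logits_for(hidden)
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[gold]
    return total / len(latents)

# Three made-up latent states, one per reasoning position.
latents = [[2.0, 0.1, 0.0], [0.0, 3.0, 0.2], [0.1, 0.0, 1.5]]
decoded = decode_latents(latents)
loss_gold = parallel_ce_loss(latents, [0, 1, 2])   # gold tokens "3", "+", "4"
loss_wrong = parallel_ce_loss(latents, [2, 0, 1])  # deliberately mismatched targets
```

With these toy values the readout recovers the tokens "3", "+", "4", and the loss is lower for the matching targets than for the mismatched ones.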
Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.
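The "exact likelihoods" property follows from the change-of-variables formula that all normalizing flows share. The sketch below shows it for a one-dimensional affine flow applied left to right over a sequence. It is not the TARFlow head used in NF-CoT: the `conditioner` function, its constants, and the scalar latents are hypothetical stand-ins for a learned network over high-dimensional states.

```python
import math

def standard_normal_logpdf(z):
    """Log density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow_logpdf(x, shift, log_scale):
    """Exact log p(x) for the flow x = z * exp(log_scale) + shift, z ~ N(0, 1).

    Change of variables: log p(x) = log p_z(z) + log|dz/dx|, with log|dz/dx| = -log_scale.
    """
    z = (x - shift) * math.exp(-log_scale)
    return standard_normal_logpdf(z) - log_scale

def affine_flow_inverse(z, shift, log_scale):
    """Map a base sample z back to data space (used for sampling)."""
    return z * math.exp(log_scale) + shift

def conditioner(prev):
    """Hypothetical conditioner: parameters depend only on the previous element."""
    return 0.5 * prev, -0.5

def sequence_logpdf(xs):
    """Left-to-right exact log-likelihood of a sequence of scalar latents."""
    total, prev = 0.0, 0.0
    for x in xs:
        shift, log_scale = conditioner(prev)
        total += affine_flow_logpdf(x, shift, log_scale)
        prev = x
    return total

# Sanity check against the closed-form Gaussian N(mu, sigma^2) log density.
mu, sigma, x = 1.0, 2.0, 0.3
closed_form = (-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))
flow_value = affine_flow_logpdf(x, mu, math.log(sigma))
seq_value = sequence_logpdf([0.2, -0.1, 0.4])
```

Because each step conditions only on earlier elements, the same loop structure works for incremental decoding, which is the sense in which such a flow is compatible with left-to-right generation.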
This paper analyzes latent chain-of-thought (CoT) reasoning, in which reasoning occurs in continuous hidden states rather than discrete text, through an information-theoretic lens and identifies a 'dual collapse' failure mode involving gradient attenuation and representational drift. The authors decompose process supervision into Trajectory Supervision and Space Supervision, and introduce the Unified Latent Probe (ULP) to quantify mutual information between latent trajectories and explicit reasoning steps. Experiments reveal an 'Information-Performance Binding': reasoning accuracy depends on information fidelity in the latent chain, which suggests that supervision should shift from geometric imitation toward mutual information maximization.
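For readers unfamiliar with the quantity being measured, the following is a minimal plug-in estimator of mutual information between two discrete variables. It is not the ULP, whose construction is not described here. It assumes, purely for illustration, that latent states have already been discretized into cluster ids and paired with explicit reasoning-step labels; the function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) / (p(a) * p(b)) simplifies to count * n / (left[a] * right[b])
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi

# Latent cluster id fully determines the step label: 1 bit of shared information.
informative = [(0, "add"), (1, "carry"), (0, "add"), (1, "carry")]
# Cluster id says nothing about the step label: 0 bits.
uninformative = [(0, "add"), (0, "carry"), (1, "add"), (1, "carry")]

mi_informative = mutual_information(informative)
mi_uninformative = mutual_information(uninformative)
```

The plug-in estimator is biased upward on small samples, so a probe for real latent trajectories would need a more careful estimator than this one.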
Researchers propose Implicit Visual Chain-of-Thought (IV-CoT), a latent visual reasoning framework that decomposes visual conditioning queries into a structural-to-semantic cascade for text-to-image generation. The method uses training-only sketch supervision to guide structural queries without requiring sketch extraction at inference time, enabling implicit CoT reasoning in a single forward pass. IV-CoT achieves improved results on GenEval and T2I-CompBench benchmarks, targeting persistent weaknesses in multimodal LLMs around object counts, spatial relations, and attribute binding.
A new arXiv preprint introduces the concept of a 'commitment boundary' in chain-of-thought reasoning: a sharp transition point where a model's answer stabilizes, after which subsequent reasoning steps are 'epiphenomenal' and causally inert. The authors use early-exit probing and attention probes to detect this boundary, finding that it can be linearly decoded from intermediate steps and generalizes across tasks. Exiting reasoning blocks at the commitment boundary reduces average CoT length by up to 55% with negligible performance loss, with direct implications for inference efficiency in large reasoning models.
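The idea of a commitment boundary can be made concrete with a small offline sketch. It assumes that an early-exit probe has already produced one answer per reasoning step, and it locates the boundary in hindsight as the first step from which the probed answer matches the final answer and never changes again. The function names and the example trace are hypothetical, and the preprint's probes predict the boundary during generation rather than looking back from the end.

```python
def commitment_boundary(probed_answers):
    """Index of the first step from which the probed answer stays fixed to the end."""
    if not probed_answers:
        raise ValueError("need at least one probed answer")
    final = probed_answers[-1]
    boundary = len(probed_answers) - 1
    # Walk backward while the preceding step already agrees with the final answer.
    while boundary > 0 and probed_answers[boundary - 1] == final:
        boundary -= 1
    return boundary

def length_reduction(probed_answers):
    """Fraction of reasoning steps saved by exiting right at the boundary."""
    kept = commitment_boundary(probed_answers) + 1
    return 1.0 - kept / len(probed_answers)

# Made-up probe outputs for an 8-step chain of thought.
trace = ["12", "15", "15", "42", "42", "42", "42", "42"]
boundary = commitment_boundary(trace)
saved = length_reduction(trace)
```

In this made-up trace the answer stabilizes at step index 3, so exiting there keeps 4 of 8 steps and saves half of the chain.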
OpenAI introduces CoT-Control, a framework for evaluating how well reasoning models can deliberately manipulate or suppress their chain-of-thought outputs. The finding that models struggle to control their CoT is framed as a positive safety property, reinforcing the argument that visible reasoning traces serve as a meaningful monitorability safeguard. This contributes to ongoing research on whether chain-of-thought transparency is a reliable alignment and oversight tool.
Researchers introduce IS-CoT (Interleaved Structural Chain-of-Thought), a framework that embeds a dynamic Plan-Write-Reflect cycle into LLM generation to overcome severe length collapse observed in reasoning-enhanced models for open-ended writing tasks beyond 2,000 words. The authors construct a multi-teacher training dataset of interleaved reasoning traces and train IS-Writer-8B, which achieves state-of-the-art results on LongBench-Write, outperforming DeepSeek-V3.2 by 3.08 points. The work identifies static hierarchical planning as a root cause of long-form degradation and proposes an in-model alternative to external agentic workflows.
RiM introduces a latent reasoning method that replaces autoregressive chain-of-thought token generation with fixed sequences of special 'memory block' tokens, allowing LLMs to perform internal computation without externalizing intermediate steps. These memory blocks are processed in a single forward pass rather than generated autoregressively, improving compute efficiency at test time. Training uses a two-stage curriculum: first grounding memory blocks by predicting explicit reasoning steps, then discarding step-level supervision and refining answers iteratively. Experiments across multiple model families and sizes show RiM matches or exceeds existing latent reasoning methods.
ATLAS proposes a framework where a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks with maintained interpretability.
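As background for the reinforcement-learning component, the sketch below computes the group-relative advantages used by standard GRPO: each sampled response is scored against the mean and standard deviation of its own group. This is the vanilla formulation only. The latent-anchoring mechanism of LA-GRPO is not described here and is not reproduced. The function name, the `eps` constant, and the use of the population standard deviation are assumptions, and implementations differ on these details.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (vanilla GRPO style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, with binary task rewards.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every response in the group gets the same reward, all advantages are zero
# and the group contributes no gradient signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

The second case shows why groups with identical rewards are uninformative under this normalization, which is one general source of instability when the rewarded behavior is rare.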