Researchers propose replacing the standard transformer feed-forward sublayer with explicit fuzzy set operations (intersection and set-difference), creating a negation-capable FFN (NC-FFN) whose hidden units carry interpretable logical form. At 125M scale on OpenWebText, NC-FFN matches the GELU baseline's perplexity while remaining legible by construction. Adding soft sequence quantifiers with learned forgetting rates remedies deficits in grammatical licensing and produces units that detectably fire on grammatical licensors (comparatives, passive participles, negative-polarity items) without dictionary learning. The work advances mechanistic interpretability by providing a parameter-neutral architecture whose computations are readable as grammatical mechanisms.
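To make the set-operation idea concrete, here is a minimal sketch of one hidden unit computed as a fuzzy intersection or set-difference of two memberships. It assumes sigmoid memberships and the product t-norm (A and B = a*b, A and not B = a*(1-b)). The summary does not say which fuzzy operators or parameterization NC-FFN uses, so every name below is hypothetical.

```python
import math

def membership(x, w, b):
    """Sigmoid membership in [0, 1] for input vector x (assumed parameterization)."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fuzzy_intersection(a, b):
    """Product t-norm: degree to which both A and B hold (assumed operator)."""
    return a * b

def fuzzy_difference(a, b):
    """A and not B under the product t-norm: high only when A holds and B does not."""
    return a * (1.0 - b)

x = [1.0, -2.0]
a = membership(x, [3.0, 0.0], 0.0)   # feature A: strongly present for this input
b = membership(x, [0.0, 3.0], 0.0)   # feature B: strongly absent for this input
both = fuzzy_intersection(a, b)      # near 0, because B is absent
a_not_b = fuzzy_difference(a, b)     # near 1, because A holds and B does not
```

The set-difference unit is what gives such a layer a direct handle on negation: it responds to "A without B" in a single readable operation.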
Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.
A new arXiv preprint analyzes how well large language models handle negation from two angles: behavioral systematicity (whether models correctly recognize negation expressions and scope) and representational systematicity (whether function vectors can be reliably constructed from in-context examples). Results show LLMs partially succeed at negation cue recognition via in-context learning but struggle with scope recognition, with performance varying by output format. Function vectors can be composed for cue extraction but are harder to extract for scope recognition tasks.
Researchers introduce FMLM+, a framework combining Flow Map Language Models (FMLMs) with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32x fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing longstanding factorization error problems in non-autoregressive generation.
Researchers introduce FPRM, a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as a halting mechanism in looped architectures, addressing signal propagation problems through pre-norm layers and residual scaling. Looped architectures provide inductive bias for compositional reasoning, but suffer from depth-induced signal degradation when halting is deferred; FPRM resolves this while enabling compute to scale with task difficulty. The model is evaluated on Sudoku, Maze, state-tracking, and ARC-AGI benchmarks. This contributes to the growing body of work on adaptive-compute and iterative-refinement architectures for reasoning.
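The halting idea can be illustrated with a toy sketch: apply the same update repeatedly and stop once successive states barely change. This is not FPRM's Transformer block. The step function, tolerance, and names below are assumptions chosen so that an "easy" and a "hard" instance need different numbers of iterations.

```python
def iterate_to_fixed_point(step, state, tol=1e-6, max_iters=1000):
    """Apply `step` until successive states differ by less than `tol`.

    Returns (final_state, iterations_used, converged).
    """
    for i in range(1, max_iters + 1):
        new_state = step(state)
        # Largest per-coordinate change between consecutive iterates.
        delta = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, i, True
    return state, max_iters, False

def make_step(rate, target):
    """Toy contraction toward `target`; a smaller `rate` converges faster."""
    return lambda s: [rate * v + (1.0 - rate) * t for v, t in zip(s, target)]

target = [2.0, -1.0]
easy, easy_iters, easy_ok = iterate_to_fixed_point(make_step(0.1, target), [0.0, 0.0])
hard, hard_iters, hard_ok = iterate_to_fixed_point(make_step(0.9, target), [0.0, 0.0])
```

Both runs reach the same fixed point, but the slowly contracting one uses many more iterations, which is the sense in which convergence-based halting lets compute scale with difficulty.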
A new arXiv preprint proposes a framework for making transformer-based detection of cognitive impairment from speech clinically interpretable by combining SHAP token attribution, linguistic feature analysis, and a four-stage LLM reasoning pipeline using LLaMA-3.1-70B-Instruct. The system is built on the SpeechCARE-Adaptive Gating Network multimodal model (F1=72.11% on NIA PREPARE) and maps outputs to four cognitive-linguistic dimensions. Physician evaluation on 70 samples showed strong alignment with clinical profiles and a System Usability Scale score of 82/100, suggesting potential for integration into clinical workflows.
The paper introduces Weave of Formal Thought (WoFT), a framework combining a formally sound-and-complete constrained decoder for code generation with a latent-variable fine-tuning method that teaches LLMs to interleave grammar non-terminals during generation. The constrained decoder extends generalized LR (GLR) parsing with speculative lexing to handle context-sensitive lexing and maximal-munch tokenization, addressing gaps in prior constrained-decoding work. A reweighted wake-sleep (RWS) fine-tuning objective on StarCoder2-3B achieves a 14.3% relative reduction in per-token cross-entropy over a text-only SFT baseline on Python, suggesting that explicit structural scaffolding recovers information lost in flat autoregressive training.
Hugging Face announces native integration of AutoGPTQ into the transformers library, enabling 4-bit quantized inference for large language models. The integration allows users to load and run GPTQ-quantized models directly through the standard transformers API with minimal code changes. This lowers the hardware barrier for deploying LLMs by significantly reducing VRAM requirements while maintaining competitive performance.
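A back-of-envelope calculation shows where the VRAM saving comes from. The sketch below counts weight storage only and assumes a hypothetical 7B-parameter model. It ignores quantization scales and zero-points, activations, and the KV cache, so real footprints are somewhat higher.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

params_7b = 7_000_000_000                      # hypothetical model size
fp16_gib = weight_memory_gib(params_7b, 16)    # 16-bit weights
int4_gib = weight_memory_gib(params_7b, 4)     # 4-bit GPTQ-style weights
reduction = fp16_gib / int4_gib                # ratio of the two footprints
```

Under these assumptions the weights shrink from about 13.0 GiB to about 3.3 GiB, a 4x reduction.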
Researchers introduce LOTUS (Looped Transformers with parallel supervision on latents), a latent chain-of-thought method that processes reasoning steps in hidden states rather than decoded tokens. LOTUS is claimed to be the first latent-CoT approach to match explicit CoT performance at the 3B parameter scale, while reducing thought-phase latency by 2.5x–6.9x. The method uses a looped (recurrent-depth) Transformer backbone with parallel cross-entropy supervision on gold CoT-step tokens at each latent position, and the latent space is shown to be interpretable by projecting through the base LM head to recover reasoning steps.