Physics-informed Fourier-wavelet transformer improves multiscale CFD surrogate modeling
A new arXiv preprint introduces a physics-informed Fourier-wavelet transformer for next-step velocity-field reconstruction in computational fluid dynamics (CFD), combining hybrid spectral encoding with PDE-residual-guided self-attention and self-supervised pretraining. Evaluated on cylinder-wake and fluid-structure interaction benchmarks, the model achieves the lowest normalized mean-squared error on both tasks and recovers localized flow structures better than spectral, transformer, and physics-informed neural network baselines. The work targets a persistent gap in surrogate models: accurate global flow patterns but poor recovery of fine-grained multiscale structure.
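For readers unfamiliar with the headline metric, here is a minimal sketch of one common definition of normalized mean-squared error: the summed squared error divided by the summed squared target. Normalization conventions vary, and the summary does not say which one the preprint uses, so the function name and the choice of denominator are assumptions.

```python
def normalized_mse(pred, target):
    """Summed squared error divided by the target's summed squares (one common convention)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    err = sum((p - t) ** 2 for p, t in zip(pred, target))
    ref = sum(t ** 2 for t in target)
    if ref == 0.0:
        raise ValueError("target field is identically zero; NMSE is undefined")
    return err / ref


# Toy velocity samples (made-up numbers, for illustration only).
target = [1.0, 2.0, 2.0]
pred = [1.0, 2.0, 1.0]
print(normalized_mse(pred, target))
```

Dividing by the target's energy makes scores comparable across flows with different velocity magnitudes, which is presumably why benchmarks of this kind report a normalized error.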
Related events (8)
Walrus: A 1.3B-Parameter General Transformer for Fluid Dynamics Simulation
The Polymathic AI Collaboration released Walrus, a 1.3 billion-parameter transformer model that simulates fluids, gases, and plasmas across 19 physical domains and outperforms prior specialized physics models. The model addresses aliasing artifacts in transformers, errors that compound at specific spatial locations over time, by randomly jittering the input data at each time step before encoding, which spreads the errors out instead of letting them accumulate. Walrus achieved the lowest VRMSE in 18 of 19 domains for one-step predictions, reducing error by 63.6% on average versus the best competing models. The jittering technique may generalize to vision and video transformer architectures, where similar pixelation artifacts occur.
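The jittering idea can be illustrated with a minimal sketch. It assumes the jitter is a random cyclic shift of a periodic 1-D field that is undone after the prediction step; the summary does not describe how Walrus actually implements it, and `roll`, `jittered_step`, and `model_step` are hypothetical names.

```python
import random


def roll(field, shift):
    """Cyclically shift a 1-D periodic field to the right by `shift` cells."""
    n = len(field)
    s = shift % n
    return field[-s:] + field[:-s] if s else list(field)


def jittered_step(model_step, field, rng):
    """Run one prediction step in a randomly shifted frame, then shift back."""
    shift = rng.randrange(len(field))   # a fresh random offset at every call
    shifted = roll(field, shift)
    predicted = model_step(shifted)     # model_step stands in for the surrogate
    return roll(predicted, -shift)      # undo the shift so outputs stay aligned


# Stand-in "model": each cell becomes the sum of its two periodic neighbours.
def neighbour_sum(f):
    return [f[i - 1] + f[(i + 1) % len(f)] for i in range(len(f))]


rng = random.Random(0)
field = [1, 5, 2, 7, 3]
print(jittered_step(neighbour_sum, field, rng) == neighbour_sum(field))
```

For a perfectly shift-equivariant step like the stand-in above, the jitter changes nothing. The point of the technique is that a real model's grid-locked errors land on different cells at each step instead of piling up in the same place.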
Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations
A new arXiv preprint presents a framework for modeling energy consumption during Transformer training on multiple GPUs, using BERT architectural sweeps to relate measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling with a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. The resulting scaling law accurately predicts training energy across heterogeneous configurations, targeting sustainable and cost-aware system design.
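As background, the classic roofline bound that the preprint adapts can be sketched in a few lines. This is only the textbook model (runtime is limited by the slower of compute and memory traffic). The preprint's energy scaling law and its hardware-efficiency factor are not reproduced here, and the hardware numbers are made up for illustration.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Lower-bound runtime in seconds: the slower of compute and memory traffic."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    return max(compute_time, memory_time)


def is_compute_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """True when arithmetic intensity meets or exceeds the machine balance point."""
    return flops / bytes_moved >= peak_flops / peak_bandwidth


# Hypothetical workload and accelerator figures.
flops, bytes_moved = 2e12, 1e10          # work and memory traffic of one step
peak_flops, peak_bandwidth = 1e14, 1e12  # FLOP/s and bytes/s
print(roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth))
print(is_compute_bound(flops, bytes_moved, peak_flops, peak_bandwidth))
```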
Dynamic short convolutions yield 1.33–1.60× compute advantage over standard Transformers
A new arXiv preprint introduces dynamic short convolutions as an architectural primitive for Transformers, using input-dependent filters to combine locality bias with increased expressivity. Experiments across 150M–2B parameter language models show consistent perplexity improvements over standard Transformers and static convolution variants, with scaling-law fits indicating a 1.33× compute advantage when applied to key/query/value vectors and 1.60× when added after every linear layer. The technique also improves linear RNNs (Mamba-2, Gated DeltaNet) and mixture-of-experts architectures, with custom Triton kernels making training practical.
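To make "input-dependent filters" concrete, here is a minimal sketch of a causal short convolution over a scalar sequence whose filter taps are recomputed at every position from the current input. The linear-projection-plus-softmax parameterization and the names `dynamic_short_conv`, `proj_weight`, and `proj_bias` are assumptions for illustration, not the preprint's formulation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_short_conv(x, proj_weight, proj_bias):
    """Causal short convolution with per-position filters.

    The taps at position t are computed from x[t] itself:
    taps_t = softmax(proj_weight * x[t] + proj_bias),
    so the filter changes with the input (assumed parameterization).
    """
    out = []
    for t, xt in enumerate(x):
        taps = softmax([w * xt + b for w, b in zip(proj_weight, proj_bias)])
        acc = 0.0
        for j, tap in enumerate(taps):
            if t - j >= 0:  # causal: only the current and earlier positions
                acc += tap * x[t - j]
        out.append(acc)
    return out


x = [2.0, 4.0, 6.0]
# Zero projection weights make the taps input-independent (a static filter).
print(dynamic_short_conv(x, [0.0, 0.0], [0.0, 0.0]))
# Non-zero weights let each position choose how much to look back.
print(dynamic_short_conv(x, [1.0, -1.0], [0.0, 0.0]))
```

Setting the projection weights to zero recovers a static short convolution, which is the variant the preprint compares against.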
Conditional Scale Entropy: A Wavelet-Derived Tool for Mechanistic Interpretability of Metaphor Processing in Transformers
This paper introduces Conditional Scale Entropy (CSE), a wavelet-derived measure of how transformer computation engages across frequency scales at each layer, and applies it to study metaphor processing in decoder-only language models. The authors prove CSE is invariant to update magnitude, isolating structural computation patterns from intensity. Across architectures ranging from GPT-2 (124M) to LLaMA-2 7B and GPT-oss 20B, metaphorical tokens consistently produce higher spectral breadth than literal tokens in early-to-mid layers, with the effect surviving permutation correction and specificity controls. The work establishes multi-scale coordination as a consistent mechanistic signature of metaphorical language processing and positions CSE as a general interpretability tool for cross-depth structure in transformers.
Looped World Models introduce iterative latent depth as a new scaling axis for world simulation
A new arXiv preprint introduces Looped World Models (LoopWM), a parameter-shared transformer architecture that iteratively refines latent environment states to achieve up to 100× parameter efficiency over conventional world models. The approach uses adaptive computation to scale depth dynamically per prediction step, addressing the tension between long-horizon simulation fidelity and deployment cost. The authors position iterative latent depth as a new scaling axis orthogonal to model size and training data.
SURGE: Approximation-free Training-Free Particle Filter for Diffusion Surrogate
The paper introduces URGE (Unbiased Resampling via Girsanov Estimation), a derivative-free inference-time scaling algorithm for diffusion models that performs path-wise importance reweighting using a Girsanov change of measure. Unlike existing inference-time guidance methods, URGE requires no score, Hessian, or PDE evaluations, attaching multiplicative weights to simulated trajectories and periodically resampling. The authors establish a theoretical equivalence between path-wise and particle-wise sequential Monte Carlo (SMC), guaranteeing unbiased terminal distributions. Empirically, URGE outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks while being simpler to implement.
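The general pattern of attaching multiplicative weights to simulated trajectories and periodically resampling can be sketched as follows. This is a generic sequential Monte Carlo loop with multinomial resampling, not the paper's algorithm: the Girsanov-based weight is replaced by a user-supplied `incr_weight` function, and every name here is hypothetical.

```python
import random


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights.

    Weights are reset to 1.0 afterwards, which is enough for self-normalized
    estimates. Raises ValueError if every weight is zero.
    """
    n = len(particles)
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0] * n


def run_weighted_paths(init, step, incr_weight, n_steps, resample_every, rng):
    """Simulate paths, multiply in per-step weights, and resample periodically."""
    particles = list(init)
    weights = [1.0] * len(particles)
    for t in range(1, n_steps + 1):
        new = [step(p, rng) for p in particles]
        weights = [w * incr_weight(old, nxt)
                   for w, old, nxt in zip(weights, particles, new)]
        particles = new
        if t % resample_every == 0:
            particles, weights = resample(particles, weights, rng)
    return particles, weights


# Toy usage: a random walk reweighted toward positive values.
rng = random.Random(1)
init = [0.0] * 8
walk = lambda p, r: p + r.gauss(0.0, 1.0)
favour_positive = lambda old, new: 2.0 if new > 0 else 1.0
particles, weights = run_weighted_paths(init, walk, favour_positive, 6, 3, rng)
print(len(particles), weights)
```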
Energy-based transformers as unified predictors of reading difficulty in computational psycholinguistics
A new arXiv preprint introduces energy-based transformer measures as predictors of human reading difficulty, evaluated across three reading-time corpora (Natural Stories, UCL eye-tracking, UCL self-paced reading). The energy measure outperforms surprisal alone and appears to subsume both surprisal and attention entropy effects, suggesting it could serve as a single unified predictor. The work connects transformer language models to Hopfield networks and dense associative memory literature, marking the first application of energy-based transformer measures in computational psycholinguistics.
Dynamics-Level Watermarking of Flow Matching Models with Random Codes
This paper proposes embedding watermarks directly into the velocity field (continuous dynamics) of flow matching generative models, rather than into weights or outputs. The method uses key-dependent perturbations added during training, formulated as random coding over a continuous channel, allowing black-box message recovery at detection time. The perturbation is designed to leave the generated distribution unchanged. Experiments on MNIST and CIFAR-10 demonstrate reliable message recovery, preserved generation quality, and chance-level decoding without the secret key.