3·arXiv cs.LG (Machine Learning)·10h ago

Physics-informed Fourier-wavelet transformer improves multiscale CFD surrogate modeling

A new arXiv preprint introduces a physics-informed Fourier-wavelet transformer for next-step velocity-field reconstruction in computational fluid dynamics. It combines hybrid spectral encoding with PDE-residual-guided self-attention and self-supervised pretraining. On cylinder-wake and fluid-structure interaction benchmarks, the model reports the lowest normalized mean-squared error on both tasks and better recovery of localized flow structures than spectral, transformer, and physics-informed neural network baselines. The work targets a persistent gap in surrogate models: they tend to capture global flow patterns more accurately than fine-grained multiscale structure.
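For readers unfamiliar with the metric, normalized mean-squared error is commonly computed as the squared prediction error divided by the squared magnitude of the reference field. The summary does not say which normalization the preprint uses, so the sketch below assumes this common form; the name `nmse` and the flat-list field representation are illustrative only.

```python
def nmse(predicted, reference):
    """Normalized MSE: sum of squared errors over sum of squared reference values.

    Assumed definition; the preprint may normalize differently
    (for example per sample or per velocity component).
    """
    if len(predicted) != len(reference):
        raise ValueError("fields must have the same number of points")
    error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    scale = sum(r ** 2 for r in reference)
    if scale == 0.0:
        raise ValueError("reference field is identically zero")
    return error / scale
```

Under this definition a perfect prediction scores 0, and predicting zero everywhere scores 1, which makes values comparable across flows with different velocity magnitudes.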

A Physics-Informed Fourier-Wavelet Transformer for Multiscale Computational Fluid Dynamics Surrogate Modeling

Related events (8)

6·The Batch·22d ago

Walrus: A 1.3B-Parameter General Transformer for Fluid Dynamics Simulation

The Polymathic AI Collaboration released Walrus, a 1.3 billion-parameter transformer model that simulates fluids, gases, and plasmas across 19 physical domains and outperforms prior specialized physics models. The model addresses aliasing artifacts in transformers, errors that compound at specific spatial locations over time, by randomly jittering input data at each time step before encoding, which distributes errors rather than letting them accumulate. Walrus achieved the lowest VRMSE in 18 of 19 domains for one-step predictions, reducing error by 63.6% on average versus the best competing models. The jittering technique may generalize to vision and video transformer architectures where similar pixelation artifacts occur.
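The jittering idea can be illustrated with a small sketch. It assumes the jitter is a random periodic shift of the grid that is undone after the model predicts; the summary does not specify the exact form Walrus uses, and `roll2d`, `jittered_step`, and the `model` callable are hypothetical names for illustration.

```python
import random

def roll2d(field, dy, dx):
    """Periodically shift a 2D grid (list of rows) by dy rows and dx columns."""
    h, w = len(field), len(field[0])
    return [[field[(i - dy) % h][(j - dx) % w] for j in range(w)]
            for i in range(h)]

def jittered_step(model, field, rng):
    """One prediction step with a random shift applied before the model
    and the inverse shift applied afterwards (assumed jitter scheme)."""
    h, w = len(field), len(field[0])
    dy, dx = rng.randrange(h), rng.randrange(w)
    shifted = roll2d(field, dy, dx)      # jitter the input
    predicted = model(shifted)           # model sees the shifted frame
    return roll2d(predicted, -dy, -dx)   # map the output back
```

If `model` introduces an error at a fixed grid cell, the inverse shift places that error at a different physical location on each step, so it spreads out over a rollout instead of piling up in one place.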

Frontier Model Releases · Evaluation and Benchmarking · temporal jittering · Polymathic AI Collaboration · VRMSE

5·arXiv · cs.CL·34h ago

Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations

A new arXiv preprint presents a framework for modeling energy consumption during Transformer training on multiple GPUs, using BERT architectural sweeps to relate measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling with a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. The resulting scaling law accurately predicts training energy across heterogeneous configurations, targeting sustainable and cost-aware system design.
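As background, a classic roofline estimate bounds runtime by whichever of compute or memory traffic is slower. The sketch below pairs that bound with a very simple energy estimate in which a measured multi-GPU speedup shortens wall-clock time. This is an illustrative assumption, not the preprint's scaling law; all function and parameter names are hypothetical.

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Classic roofline bound: the slower of the compute and memory limits."""
    return max(flops / peak_flops_per_s, bytes_moved / bandwidth_bytes_per_s)

def estimated_energy_j(single_gpu_time_s, n_gpus, speedup, avg_power_w):
    """Illustrative estimate: n_gpus devices each drawing avg_power_w
    for the wall-clock time single_gpu_time_s / speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    wall_time_s = single_gpu_time_s / speedup
    return n_gpus * avg_power_w * wall_time_s
```

For example, a compute-bound job with a 2 s single-GPU estimate costs the same energy on four GPUs as on one if the speedup is a perfect 4×, and twice as much if the speedup is only 2×, which is the kind of effect a hardware-efficiency factor is meant to capture.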

Training Infrastructure · Inference Economics · The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model · BERT

6·arXiv · cs.CL·20d ago

Dynamic short convolutions yield 1.33–1.60× compute advantage over standard Transformers

A new arXiv preprint introduces dynamic short convolutions as an architectural primitive for Transformers, using input-dependent filters to combine locality bias with increased expressivity. Experiments across 150M–2B parameter language models show consistent perplexity improvements over standard Transformers and static convolution variants, with scaling-law fits indicating a 1.33× compute advantage when applied to key/query/value vectors and 1.60× when added after every linear layer. The technique also improves linear RNNs (Mamba-2, Gated DeltaNet) and mixture-of-experts architectures, with custom Triton kernels making training practical.
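A minimal single-channel sketch of an input-dependent short convolution follows. It assumes the filter taps at each position are produced from the current input by some function `make_filter`; in a real model this would presumably be a learned projection, and the preprint's exact parameterization is not given in the summary.

```python
def dynamic_short_conv(xs, make_filter, width=3):
    """Causal short convolution whose taps depend on the current input.

    xs: sequence of floats (one channel).
    make_filter: hypothetical function mapping x_t to `width` tap weights.
    Computes y_t = sum_k make_filter(x_t)[k] * x_{t-k},
    treating positions before the start of the sequence as zero.
    """
    ys = []
    for t, x_t in enumerate(xs):
        taps = make_filter(x_t)
        if len(taps) != width:
            raise ValueError("make_filter must return `width` taps")
        y = 0.0
        for k, w in enumerate(taps):
            if t - k >= 0:
                y += w * xs[t - k]
        ys.append(y)
    return ys
```

A static convolution is the special case where `make_filter` ignores its argument; letting the taps vary with the input keeps the short receptive field (the locality bias) while allowing each position to mix its recent neighbours differently.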

Training Infrastructure · Frontier Model Releases · Triton · Mamba · Gated DeltaNet-2

5·arXiv · cs.CL·1mo ago

Conditional Scale Entropy: A Wavelet-Derived Tool for Mechanistic Interpretability of Metaphor Processing in Transformers

This paper introduces Conditional Scale Entropy (CSE), a wavelet-derived measure of how transformer computation engages across frequency scales at each layer, and applies it to study metaphor processing in decoder-only language models. The authors prove CSE is invariant to update magnitude, isolating structural computation patterns from intensity. Across architectures ranging from GPT-2 (124M) to LLaMA-2 7B and GPT-oss 20B, metaphorical tokens consistently produce higher spectral breadth than literal tokens in early-to-mid layers, with the effect surviving permutation correction and specificity controls. The work establishes multi-scale coordination as a consistent mechanistic signature of metaphorical language processing and positions CSE as a general interpretability tool for cross-depth structure in transformers.

Evaluation and Benchmarking · AI Safety Research · Conditional Scale Entropy · mechanistic interpretability · GPT-2

6·arXiv · cs.AI·7d ago

Looped World Models introduce iterative latent depth as a new scaling axis for world simulation

A new arXiv preprint introduces Looped World Models (LoopWM), a parameter-shared transformer architecture that iteratively refines latent environment states to achieve up to 100× parameter efficiency over conventional world models. The approach uses adaptive computation to scale depth dynamically per prediction step, addressing the tension between long-horizon simulation fidelity and deployment cost. The authors position iterative latent depth as a new scaling axis orthogonal to model size and training data.
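The combination of parameter sharing and adaptive depth can be sketched as a loop that applies one shared update function repeatedly and stops once the latent state settles. The scalar state, the `step` callable, and the tolerance-based stopping rule are assumptions for illustration; the summary does not describe LoopWM's actual halting mechanism.

```python
def looped_refine(step, state, max_loops=32, tol=1e-6):
    """Apply one shared `step` function repeatedly, stopping early
    once the update is smaller than `tol` (assumed halting rule).

    Returns (refined_state, loops_used).
    """
    for loops_used in range(1, max_loops + 1):
        new_state = step(state)
        if abs(new_state - state) < tol:
            return new_state, loops_used
        state = new_state
    return state, max_loops
```

With a toy update such as `lambda z: 0.5 * z + 1.0`, the loop converges toward the fixed point 2.0 and uses fewer iterations when started near it, which mirrors the idea of spending more depth only on harder prediction steps.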

Training Infrastructure · Frontier Model Releases · Looped World Models · LoopWM

6·arXiv · cs.LG·1mo ago

SURGE: Approximation-free Training-Free Particle Filter for Diffusion Surrogate

The paper introduces URGE (Unbiased Resampling via Girsanov Estimation), a derivative-free inference-time scaling algorithm for diffusion models that performs path-wise importance reweighting using a Girsanov change of measure. Unlike existing inference-time guidance methods, URGE requires no score, Hessian, or PDE evaluations; instead, it attaches multiplicative weights to simulated trajectories and periodically resamples them. The authors establish a theoretical equivalence between path-wise and particle-wise sequential Monte Carlo (SMC), guaranteeing unbiased terminal distributions. Empirically, URGE outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks while being simpler to implement.
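The general mechanism of Girsanov-weighted trajectories with periodic resampling can be shown on a toy problem. The sketch below is a generic sequential Monte Carlo example, not URGE itself: it simulates driftless Brownian paths, accumulates the Girsanov log-weight for a constant target drift, and resamples periodically. The constant drift, the function names, and all parameter defaults are assumptions.

```python
import math
import random

def normalized(log_w):
    """Turn log-weights into probabilities, subtracting the max for stability."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def girsanov_smc_mean(drift, n_particles=2000, n_steps=10, horizon=1.0,
                      resample_every=5, seed=0):
    """Estimate E[X_T] under dX = drift dt + dW using only driftless paths."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    xs = [0.0] * n_particles
    log_w = [0.0] * n_particles
    for step in range(1, n_steps + 1):
        for i in range(n_particles):
            dw = rng.gauss(0.0, math.sqrt(dt))
            xs[i] += dw  # simulate under the base (driftless) process
            log_w[i] += drift * dw - 0.5 * drift ** 2 * dt  # Girsanov increment
        if step % resample_every == 0 and step < n_steps:
            # multinomial resampling, then reset to uniform weights
            xs = rng.choices(xs, weights=normalized(log_w), k=n_particles)
            log_w = [0.0] * n_particles
    return sum(w * x for w, x in zip(normalized(log_w), xs))
```

For a constant drift the estimate should land near `drift * horizon`, up to Monte Carlo noise, even though no particle is ever simulated with that drift; the weights and resampling carry the change of measure.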

Frontier Model Releases · Inference Economics · diffusion-based generative models · URGE (Unbiased Resampling via Girsanov Estimation) · Girsanov change of measure

4·arXiv · cs.CL·34h ago

Energy-based transformers as unified predictors of reading difficulty in computational psycholinguistics

A new arXiv preprint introduces energy-based transformer measures as predictors of human reading difficulty, evaluated across three reading-time corpora (Natural Stories, UCL eye-tracking, UCL self-paced reading). The energy measure outperforms surprisal alone and appears to subsume both surprisal and attention entropy effects, suggesting it could serve as a single unified predictor. The work connects transformer language models to Hopfield networks and dense associative memory literature, marking the first application of energy-based transformer measures in computational psycholinguistics.

Evaluation and Benchmarking · Natural Stories · Energy-Based Transformers as Predictors of Reading Difficulty · Hopfield Networks

5·arXiv · cs.LG·1mo ago

Dynamics-Level Watermarking of Flow Matching Models with Random Codes

This paper proposes embedding watermarks directly into the velocity field (continuous dynamics) of flow matching generative models, rather than into weights or outputs. The method uses key-dependent perturbations added during training, formulated as random coding over a continuous channel, allowing black-box message recovery at detection time. The perturbation is designed to leave the generated distribution unchanged. Experiments on MNIST and CIFAR-10 demonstrate reliable message recovery, preserved generation quality, and chance-level decoding without the secret key.

Evaluation and Benchmarking · AI Safety Research · MNIST · CIFAR-10 · Random Coding