4 · arXiv cs.LG (Machine Learning) · 8d ago

Theoretical analysis of truncated positional encodings for graph neural networks

A new arXiv paper initiates a formal study of truncated positional encodings (PEs) for graph neural networks, showing that truncation breaks the theoretical equivalence between spectral and walk-based PE families. Key findings include that truncated spectral PEs lose their expressivity advantage over the 1-WL test, and that k-harmonic distances differ meaningfully from other closely related truncated spectral PEs. Experiments on real-world datasets suggest that mixing truncated PE families outperforms any single family.
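To make the walk-based side concrete, here is a minimal sketch of a truncated random-walk encoding in which each node stores its return probabilities for walks of length 1 to k. The function name `random_walk_pe`, the adjacency-list input, and the two example graphs are illustrative assumptions, not the paper's definitions or experiments. The example pair (a 6-cycle and two disjoint triangles) is a standard case that 1-WL cannot separate, since both graphs are 2-regular.

```python
def random_walk_pe(adj, k):
    """Truncated walk-based encoding: return probabilities for walk lengths 1..k.

    adj is an adjacency list (hypothetical input format); node i's encoding is
    [P[i][i], (P^2)[i][i], ..., (P^k)[i][i]], where P is the random-walk matrix.
    """
    n = len(adj)
    # Row-normalised transition matrix P = D^-1 A.
    P = [[0.0] * n for _ in range(n)]
    for i, neighbours in enumerate(adj):
        for j in neighbours:
            P[i][j] = 1.0 / len(neighbours)

    encodings = [[] for _ in range(n)]
    power = [row[:] for row in P]  # holds P^t, starting at t = 1
    for _ in range(k):
        for i in range(n):
            encodings[i].append(power[i][i])
        # Advance to the next power of P.
        power = [[sum(power[i][m] * P[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return encodings


# Two 2-regular graphs on six nodes.
hexagon = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
two_triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]

# Truncated at k = 2 the encodings coincide; at k = 3 the triangles show up.
print(random_walk_pe(hexagon, 2) == random_walk_pe(two_triangles, 2))  # True
print(random_walk_pe(hexagon, 3) == random_walk_pe(two_triangles, 3))  # False
```

In this toy case the encoding only becomes informative once the truncation length reaches 3, the shortest odd cycle present in one graph but not the other.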

Evaluation and Benchmarking Understanding Truncated Positional Encodings for Graph Neural Networks

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

4 · Hugging Face Blog · 1mo ago

You Could Have Designed State of the Art Positional Encoding

A Hugging Face blog post walks through the design space of positional encoding for transformer models, building intuition for why modern schemes like RoPE emerged. The post takes a pedagogical approach, showing how one could derive state-of-the-art positional encoding from first principles. It covers the evolution from absolute to relative positional encodings and the properties that make certain schemes preferable for long-context generalization.
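The property that makes RoPE a relative scheme can be checked in a few lines. The sketch below uses the commonly cited RoPE formulation (rotating consecutive coordinate pairs by position-dependent angles); it is not code from the blog post, and the helper names `rope` and `dot` and the sample vectors are assumptions made for illustration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive coordinate pairs of vec by angles proportional to pos."""
    d = len(vec)  # assumed even
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # one frequency per coordinate pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 2.0]

# Same offset (4 positions apart), very different absolute positions.
near = dot(rope(q, 3), rope(k, 7))
far = dot(rope(q, 103), rope(k, 107))
print(abs(near - far) < 1e-9)
```

Because each pair is rotated, the query-key dot product depends only on the difference between the two positions, which is the relative-position behaviour the post builds toward.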

Long Context Evolution Transformers Rotary Position Embedding (RoPE) Positional Encoding +1 more

6 · arXiv · cs.LG · 19d ago

Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads specialize into positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of 'pure' heads—exclusively positional or symbolic—and provide theoretical constructions showing how single-layer RoPE-based attention realizes these functions geometrically. A novel 'discrepancy' metric formalizes the robustness difference between the two head types, with symbolic mechanisms shown to extrapolate more reliably to longer sequences than positional ones. The findings have implications for understanding length generalization failures in RoPE-based models.

Long Context Evolution Evaluation and Benchmarking Transformers multi-hop reasoning Rotary Position Embedding (RoPE) +5 more

3 · arXiv · cs.CL · 12d ago

Comparative study of semantic geometry in transformer embeddings vs. graph-based lexical models

An arXiv preprint compares the geometric and topological properties of transformer-based vector embeddings (CamemBERT) against lexical co-occurrence graphs for representing semantic structure. Applied to a French civic debate corpus, the study finds similar local topology but divergent global structure between the two approaches. The authors argue that graph-based models offer more interpretable semantic organization and suggest that graphs could guide neural architectures toward more stable, interpretable convergence.

Evaluation and Benchmarking CamemBERT Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models

5 · arXiv · cs.AI · 1mo ago

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

This paper critiques the standard practice of regularizing Joint-Embedding Predictive Architecture (JEPA) encoders toward isotropic Gaussian marginals, showing that this Euclidean symmetry assumption incurs a quantifiable 'price of isotropy' and that no geometry-independent fixed marginal target is universally canonical. The authors prove that oracle one-view marginals do not identify the view-to-view predictive coupling, arguing structural bias should enter the cross-view coupling instead. They introduce HamJEPA, which encodes views as phase-space states and uses a learned Hamiltonian leapfrog map for view-to-view prediction, with symplectic coupling identified as the key driver of gains. HamJEPA outperforms SIGReg on CIFAR-100 by up to +6.45 kNN@20 and +10.64 linear-probe points at 80 epochs, with similar improvements on ImageNet-100.
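For readers unfamiliar with the leapfrog map mentioned above, the following is a minimal sketch of a single-variable leapfrog integrator. It assumes a separable Hamiltonian H(q, p) = p²/2 + V(q) with a fixed quadratic potential standing in for whatever HamJEPA learns; the function names and the potential are hypothetical and are not taken from the paper.

```python
def leapfrog(q, p, grad_V, dt, steps=1):
    """Leapfrog (kick-drift-kick) update for H(q, p) = p**2 / 2 + V(q)."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad_V(q)  # half kick
        q = q + dt * p                # full drift
        p = p - 0.5 * dt * grad_V(q)  # half kick
    return q, p


def grad_V(q):
    # Gradient of the placeholder potential V(q) = q**2 / 2.
    return q


def energy(q, p):
    return 0.5 * p * p + 0.5 * q * q


q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, grad_V, dt=0.1, steps=100)

# Running the same map with a negative step undoes it (time reversibility).
q_back, p_back = leapfrog(q1, p1, grad_V, dt=-0.1, steps=100)
print(abs(q_back - q0) < 1e-9 and abs(p_back - p0) < 1e-9)

# Energy stays close to its initial value instead of drifting.
print(abs(energy(q1, p1) - energy(q0, p0)) < 1e-2)
```

The map is invertible and preserves phase-space volume by construction, which is the kind of structural bias a symplectic coupling brings to view-to-view prediction.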

Evaluation and Benchmarking Alignment and RLHF ImageNet-100 HamJEPA CIFAR-100 +4 more

4 · arXiv · cs.LG · 18d ago

Expressivity Limits of Congruence-Based Architectures for Neural Networks on Positive-Definite Matrices

This paper analyzes neural network architectures designed to classify symmetric positive-definite (SPD) matrices, focusing on congruence-like layers as used in SPDNet. The authors prove that imposing semi-orthogonality constraints on weight matrices limits expressivity, causing deep architectures to collapse to single-hidden-layer equivalents due to spectral diversity loss—a consequence of Poincaré's separation theorem. The work also compares Riemannian classifiers for compatibility with congruence-based feature maps.

Evaluation and Benchmarking congruence-based layers SPDNet Poincaré separation theorem +2 more

5 · arXiv · cs.AI · 9d ago

Reroute: Training-free recoverable visual token routing for vision-language models

A new arXiv preprint proposes Reroute, a training-free plug-in that replaces the standard rank-and-remove visual token pruning paradigm in VLMs with a recoverable routing mechanism. Instead of permanently discarding low-ranked tokens, Reroute defers them to re-enter the candidate pool at later decoder stages, addressing the problem that token importance shifts across decoder depth. Evaluated on LLaVA-1.5 and Qwen backbones augmented with FastV, PDrop, and Nüwa pruning methods, Reroute improves grounding performance under aggressive token reduction without sacrificing general VQA accuracy. The approach preserves the theoretical compute and KV-cache budget of the underlying pruning method.

Inference Economics Multimodal Progress FastV PDrop Qwen +4 more

6 · arXiv · cs.AI · 1mo ago

Graft: Hybrid Tree Construction for Speculative Decoding via Prune-Then-Retrieve

Graft is a training-free framework that improves speculative decoding by coupling dynamic-depth pruning with retrieval-based token compensation. Pruning reduces VRAM and compute overhead while freeing budget for retrieval, which fills topological gaps in the draft tree with near-zero additional cost. On short-context benchmarks, Graft achieves up to 5.41× speedup and improves average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B. The method is evaluated across short- and long-context settings and extended to block-drafting paradigms.

Frontier Model Releases Inference Economics speculative decoding DFlash EAGLE-3 +2 more

3 · arXiv · cs.LG · 2d ago

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for spatiotemporal super-resolution

Researchers propose P-K-GCN, a framework combining graph convolutional networks, Koopman operator theory, and physics-informed loss functions for spatiotemporal super-resolution on irregular geometries. The method linearizes nonlinear dynamics in a latent space and enforces physical constraints to improve reconstruction fidelity. Theoretical analysis claims guaranteed error reduction via Rademacher complexity bounds. The framework is evaluated on reconstructing high-resolution cardiac electrodynamics from sparse 3D heart geometry measurements.

P-K-GCN