Almanac
4 · arXiv cs.LG (Machine Learning) · 8d ago

Theoretical analysis of truncated positional encodings for graph neural networks

A new arXiv paper initiates a formal study of truncated positional encodings (PEs) for graph neural networks, showing that truncation breaks the theoretical equivalence between the spectral and walk-based PE families. Key findings include that truncated spectral PEs lose their expressivity advantage over the 1-WL test, and that k-harmonic distances differ meaningfully from other closely related truncated spectral PEs. Experiments on real-world datasets suggest that combining truncated PE families outperforms any single family.
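
To make the two families concrete, here is a minimal NumPy sketch of one common instance of each: a spectral PE that keeps only the k Laplacian eigenvectors with the smallest eigenvalues, and a walk-based PE (random-walk return probabilities) truncated after k steps. These specific definitions are illustrative assumptions; the paper's exact PE definitions, including its k-harmonic distances, may differ and are not shown.

```python
import numpy as np

def laplacian_eigvec_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Spectral PE truncated to k: eigenvectors of the symmetric normalized
    Laplacian with the k smallest eigenvalues, one row per node."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    np.divide(1.0, np.sqrt(deg), out=d_inv_sqrt, where=deg > 0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return eigvecs[:, :k]

def random_walk_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Walk-based PE truncated to k steps: column t holds the probability that a
    random walk starting at each node is back there after t + 1 steps."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    rw = np.zeros_like(adj)
    np.divide(adj, deg, out=rw, where=deg > 0)  # row-stochastic transition matrix
    pe = np.zeros((len(adj), k))
    power = np.eye(len(adj))
    for t in range(k):
        power = power @ rw
        pe[:, t] = np.diag(power)
    return pe

# Usage: a 6-cycle, where every node gets the same walk-based PE.
n = 6
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1.0
print(random_walk_pe(cycle, 4)[0])          # return probabilities 0, 0.5, 0, 0.375
print(laplacian_eigvec_pe(cycle, 3).shape)  # (6, 3)
```

Both encodings are cut off at k, which is exactly the truncation regime the paper studies.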

Related events (8)

4 · Hugging Face Blog · 1mo ago

You Could Have Designed State of the Art Positional Encoding

A Hugging Face blog post walks through the design space of positional encoding for transformer models, building intuition for why modern schemes like RoPE emerged. The post takes a pedagogical approach, showing how one could derive state-of-the-art positional encoding from first principles. It covers the evolution from absolute to relative positional encodings and the properties that make certain schemes preferable for long-context generalization.
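
As a concrete reference point for the RoPE discussion, here is a minimal pure-Python sketch of rotary position embedding applied to a single vector. It assumes the common base of 10000 and pairs adjacent dimensions, as in the original RoFormer formulation; many implementations instead pair dimension i with i + d/2. It checks the property that makes RoPE a relative scheme: the dot product between a query rotated to position m and a key rotated to position n depends only on the offset m - n.

```python
import math

def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each adjacent pair (x[2j], x[2j+1]) by the angle pos * base**(-2j/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i = 2j
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# Same offset (m - n = 3) at different absolute positions gives the same score.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(abs(s1 - s2) < 1e-9)  # True
```

Because each pair is rotated rather than shifted, vector norms are also preserved, so positional information never changes the magnitude of queries or keys.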

6 · arXiv · cs.LG · 19d ago

Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads specialize into positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of 'pure' heads—exclusively positional or symbolic—and provide theoretical constructions showing how single-layer RoPE-based attention realizes these functions geometrically. A novel 'discrepancy' metric formalizes the robustness difference between the two head types, with symbolic mechanisms shown to extrapolate more reliably to longer sequences than positional ones. The findings have implications for understanding length generalization failures in RoPE-based models.

3 · arXiv · cs.CL · 12d ago

Comparative study of semantic geometry in transformer embeddings vs. graph-based lexical models

A preprint from arXiv compares the geometric and topological properties of transformer-based vector embeddings (CamemBERT) against lexical co-occurrence graphs for representing semantic structure. Applied to a French civic debate corpus, the study finds similar local topology but divergent global structure between the two approaches. The authors argue graph-based models offer more interpretable semantic organization and suggest graphs could guide neural architectures toward more stable, interpretable convergence.
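
One simple way to compare local topology between the two representations is to ask how far each word's neighbors agree. The sketch below builds a window-based co-occurrence graph and a cosine k-nearest-neighbor set from toy embedding vectors, then reports the Jaccard overlap for a word. The window size, the Jaccard measure, the helper names, and the toy vectors are all illustrative assumptions, not the preprint's methodology (which uses CamemBERT embeddings on a real corpus).

```python
import math
from collections import defaultdict

def cooccurrence_graph(sentences, window=2):
    """Undirected graph: two words are linked if they occur within `window` tokens."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] != w:
                    graph[w].add(tokens[j])
                    graph[tokens[j]].add(w)
    return graph

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def knn(embeddings, word, k):
    """The k words whose vectors are most cosine-similar to `word`'s vector."""
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(embeddings[word], embeddings[w]), reverse=True)
    return set(others[:k])

def neighbor_overlap(graph, embeddings, word, k):
    """Jaccard overlap between graph neighbors and embedding k-NN for one word."""
    a, b = graph[word], knn(embeddings, word, k)
    return len(a & b) / len(a | b) if a | b else 0.0

sentences = [["vote", "citizen", "debate"], ["citizen", "debate", "law"]]
emb = {"vote": [1.0, 0.0], "citizen": [0.9, 0.1],
       "debate": [0.8, 0.3], "law": [-1.0, 0.2]}
graph = cooccurrence_graph(sentences, window=2)
print(neighbor_overlap(graph, emb, "vote", k=2))  # 1.0
```

Averaging this score over the vocabulary gives one local-agreement number; global structure (clusters, paths between distant words) would need separate measures.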

5 · arXiv · cs.AI · 1mo ago

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

This paper critiques the standard practice of regularizing Joint-Embedding Predictive Architecture (JEPA) encoders toward isotropic Gaussian marginals, showing that this Euclidean symmetry assumption incurs a quantifiable 'price of isotropy' and that no geometry-independent fixed marginal target is universally canonical. The authors prove that oracle one-view marginals do not identify the view-to-view predictive coupling, arguing structural bias should enter the cross-view coupling instead. They introduce HamJEPA, which encodes views as phase-space states and uses a learned Hamiltonian leapfrog map for view-to-view prediction, with symplectic coupling identified as the key driver of gains. HamJEPA outperforms SIGReg on CIFAR-100 by up to +6.45 kNN@20 and +10.64 linear-probe points at 80 epochs, with similar improvements on ImageNet-100.
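
HamJEPA's predictor is described as a learned Hamiltonian leapfrog map. The sketch below shows only the underlying numerical scheme: a leapfrog (Störmer-Verlet) step for a separable Hamiltonian H(q, p) = p²/2 + V(q), with a fixed double-well potential standing in for the learned one (an assumption for illustration; the paper's learned Hamiltonian and phase-space encoder are not shown). Because leapfrog is symplectic, its step map preserves phase-space area, which the finite-difference Jacobian check verifies.

```python
def leapfrog(q, p, grad_v, dt, steps=1):
    """Leapfrog integration for H(q, p) = p**2 / 2 + V(q), given grad_v = dV/dq."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad_v(q)  # half kick (shear in p, determinant 1)
        q = q + dt * p                # full drift (shear in q, determinant 1)
        p = p - 0.5 * dt * grad_v(q)  # half kick
    return q, p

def jacobian_det(step, q, p, eps=1e-6):
    """Central-difference determinant of the map (q, p) -> step(q, p)."""
    q1, p1 = step(q + eps, p)
    q0, p0 = step(q - eps, p)
    dq_dq, dp_dq = (q1 - q0) / (2 * eps), (p1 - p0) / (2 * eps)
    q1, p1 = step(q, p + eps)
    q0, p0 = step(q, p - eps)
    dq_dp, dp_dp = (q1 - q0) / (2 * eps), (p1 - p0) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

grad_v = lambda q: q ** 3 - q  # double-well potential V(q) = q**4/4 - q**2/2
step = lambda q, p: leapfrog(q, p, grad_v, dt=0.1, steps=10)
print(jacobian_det(step, 0.7, -0.3))  # close to 1.0: area is preserved
```

Each kick and drift is a shear, so the composed map has determinant exactly 1 regardless of the potential; this structural guarantee is what a symplectic coupling builds into the predictor.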

4 · arXiv · cs.LG · 18d ago

Expressivity Limits of Congruence-Based Architectures for Neural Networks on Positive-Definite Matrices

This paper analyzes neural network architectures designed to classify symmetric positive-definite (SPD) matrices, focusing on congruence-like layers as used in SPDNet. The authors prove that imposing semi-orthogonality constraints on weight matrices limits expressivity, causing deep architectures to collapse to single-hidden-layer equivalents due to spectral diversity loss—a consequence of Poincaré's separation theorem. The work also compares Riemannian classifiers for compatibility with congruence-based feature maps.
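
The collapse argument rests on Poincaré's separation theorem: if W is an n×k matrix with orthonormal columns (WᵀW = I) and X is symmetric with eigenvalues λ_1 ≥ ... ≥ λ_n, then the eigenvalues μ_1 ≥ ... ≥ μ_k of the congruence WᵀXW satisfy λ_(i+n−k) ≤ μ_i ≤ λ_i. The NumPy sketch below applies one SPDNet-style congruence layer with a random semi-orthogonal W and checks these bounds numerically; the random SPD input, the dimensions, and the helper names are arbitrary choices for illustration.

```python
import numpy as np

def random_semi_orthogonal(n: int, k: int, rng) -> np.ndarray:
    """n x k matrix with orthonormal columns, via reduced QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def congruence_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Map an n x n SPD matrix to the k x k SPD matrix W^T X W."""
    return w.T @ x @ w

rng = np.random.default_rng(0)
n, k = 6, 3
a = rng.standard_normal((n, n))
x = a @ a.T + n * np.eye(n)  # random SPD input

lam = np.sort(np.linalg.eigvalsh(x))[::-1]  # lambda_1 >= ... >= lambda_n
w = random_semi_orthogonal(n, k, rng)
mu = np.sort(np.linalg.eigvalsh(congruence_layer(x, w)))[::-1]

# Poincare separation (0-based): lam[i + n - k] <= mu[i] <= lam[i]
for i in range(k):
    assert lam[i + n - k] - 1e-9 <= mu[i] <= lam[i] + 1e-9
print(mu)
```

The output spectrum is always squeezed inside the input spectrum, so stacking such layers can only keep narrowing it, which is the intuition behind the spectral diversity loss described above.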

5 · arXiv · cs.AI · 9d ago

Reroute: Training-free recoverable visual token routing for vision-language models

A new arXiv preprint proposes Reroute, a training-free plug-in that replaces the standard rank-and-remove visual token pruning paradigm in VLMs with a recoverable routing mechanism. Instead of permanently discarding low-ranked tokens, Reroute defers them to re-enter the candidate pool at later decoder stages, addressing the problem that token importance shifts across decoder depth. Evaluated on LLaVA-1.5 and Qwen backbones augmented with FastV, PDrop, and Nüwa pruning methods, Reroute improves grounding performance under aggressive token reduction without sacrificing general VQA accuracy. The approach preserves the theoretical compute and KV-cache budget of the underlying pruning method.
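
The difference between rank-and-remove pruning and recoverable routing can be illustrated with a toy scheduler. In the hypothetical sketch below, each decoder stage supplies fresh importance scores. Rank-and-remove shrinks the candidate set permanently, while the recoverable variant keeps every token in the pool and re-selects the top `budget` tokens at each stage, so the number of active tokens per stage is the same in both cases. The function names, per-stage scores, and budgets are assumptions, and the sketch ignores how hidden states of deferred tokens are handled; it is not Reroute's actual algorithm.

```python
def rank_and_remove(scores_per_stage, budgets):
    """Permanent pruning: a token dropped at one stage never returns."""
    candidates = set(range(len(scores_per_stage[0])))
    active_per_stage = []
    for scores, budget in zip(scores_per_stage, budgets):
        ranked = sorted(candidates, key=lambda t: scores[t], reverse=True)
        candidates = set(ranked[:budget])
        active_per_stage.append(sorted(candidates))
    return active_per_stage

def recoverable_routing(scores_per_stage, budgets):
    """Deferred tokens stay in the pool and can be re-selected at later stages."""
    pool = range(len(scores_per_stage[0]))
    active_per_stage = []
    for scores, budget in zip(scores_per_stage, budgets):
        ranked = sorted(pool, key=lambda t: scores[t], reverse=True)
        active_per_stage.append(sorted(ranked[:budget]))
    return active_per_stage

# Token 2 looks unimportant at stage 0 but matters at stage 1.
scores = [[0.9, 0.8, 0.1, 0.7, 0.2],
          [0.1, 0.6, 0.95, 0.5, 0.3]]
budgets = [3, 2]
print(rank_and_remove(scores, budgets))      # [[0, 1, 3], [1, 3]]
print(recoverable_routing(scores, budgets))  # [[0, 1, 3], [1, 2]]
```

Both schedules process 3 tokens at stage 0 and 2 at stage 1, mirroring the claim that recoverable routing keeps the underlying pruning method's budget while letting importance shift with depth.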

6 · arXiv · cs.AI · 1mo ago

Graft: Hybrid Tree Construction for Speculative Decoding via Prune-Then-Retrieve

Graft is a training-free framework that improves speculative decoding by coupling dynamic-depth pruning with retrieval-based token compensation. Pruning reduces VRAM and compute overhead while freeing budget for retrieval, which fills topological gaps in the draft tree with near-zero additional cost. On short-context benchmarks, Graft achieves up to 5.41× speedup and improves average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B. The method is evaluated across short- and long-context settings and extended to block-drafting paradigms.
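
Graft builds on the standard draft-then-verify loop of speculative decoding. As background, the sketch below shows greedy verification of a single draft chain: the target model's greedy predictions are compared with the drafted tokens, the longest agreeing prefix is accepted, and one extra token from the target is appended. The `target_next` callable and the toy target are hypothetical stand-ins, and Graft's tree construction, pruning, and retrieval are not shown.

```python
from typing import Callable, List

def verify_greedy(prefix: List[int], draft: List[int],
                  target_next: Callable[[List[int]], int]) -> List[int]:
    """Accept the longest draft prefix matching the target's greedy choices,
    then append one target token (a correction, or a bonus if all matched).
    Real systems score every draft position in a single target forward pass;
    calling target_next per position here is only for clarity."""
    accepted = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if expected != tok:
            accepted.append(expected)  # target's correction ends this step
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token: whole draft accepted
    return accepted

# Toy target model: always predicts the previous token + 1.
target = lambda ctx: ctx[-1] + 1
print(verify_greedy([1, 2], [3, 4, 9, 10], target))  # [3, 4, 5]
print(verify_greedy([1, 2], [3, 4, 5], target))      # [3, 4, 5, 6]
```

The speedup comes from the length of the returned list per target pass; a draft tree raises the chance that some branch matches, which is where tree-construction methods like Graft operate.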

3 · arXiv · cs.LG · 2d ago

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for spatiotemporal super-resolution

Researchers propose P-K-GCN, a framework combining graph convolutional networks, Koopman operator theory, and physics-informed loss functions for spatiotemporal super-resolution on irregular geometries. The method linearizes nonlinear dynamics in a latent space and enforces physical constraints to improve reconstruction fidelity. Theoretical analysis claims guaranteed error reduction via Rademacher complexity bounds. The framework is evaluated on reconstructing high-resolution cardiac electrodynamics from sparse 3D heart geometry measurements.
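
The Koopman idea of linearizing dynamics can be illustrated with dynamic mode decomposition (DMD), a standard finite-dimensional Koopman approximation: given snapshot pairs (x_t, x_{t+1}), fit the linear operator A minimizing Σ‖x_{t+1} − A x_t‖² by least squares. P-K-GCN learns its latent space with a graph convolutional network and adds physics-informed losses, neither of which is shown; here the "latent" state is simply the raw state of a toy linear system, an assumption made so the recovered operator can be checked.

```python
import numpy as np

def fit_koopman_dmd(snapshots: np.ndarray) -> np.ndarray:
    """Least-squares linear operator A with x_{t+1} ~= A x_t.
    `snapshots` has shape (T, d), one state per row."""
    x_now, x_next = snapshots[:-1], snapshots[1:]
    # Row form: x_next ~= x_now @ A.T, so lstsq returns A.T.
    a_t, *_ = np.linalg.lstsq(x_now, x_next, rcond=None)
    return a_t.T

# Toy data: a damped rotation, so the true operator is known.
theta, r = 0.3, 0.95
a_true = r * np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(20):
    states.append(a_true @ states[-1])
a_hat = fit_koopman_dmd(np.stack(states))
print(np.allclose(a_hat, a_true))  # True
# Eigenvalue moduli give each mode's per-step decay (both 0.95 here).
print(np.abs(np.linalg.eigvals(a_hat)))
```

For genuinely nonlinear dynamics the same least-squares fit is applied to lifted features of the state rather than the raw state, which is the role a learned latent space plays.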