Entity · paper

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

paper · active · via-sd-verification-via-intra-model-routing-for-speculative-decoding-5902fb98 · 1 event · first seen Jun 11, 2026

Aliases: VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Co-occurring entities

speculative decoding

More like this (12)

A Practical Investigation of Training-free Relaxed Speculative Decoding
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
speculative decoding
Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens
Parallel Decoding Distillation
Self-Trained Verification (STV)
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
blockwise decoding
RAT: Reference-Augmented Training for ASV Anti-Spoofing
Partially Correlated Verifier Cascades in LLM Harnesses: Concave Log-Odds, Polynomial Reliability, and Blind-Spot Ceilings
MODUS: Decoder-Only Any-to-Any Modeling of Diverse Modalities
SV-Detect

Recent events (1)

5 · arXiv · cs.CL · Jun 11, 2026

VIA-SD: Multi-tier speculative decoding via intra-model routing cuts rejection rates and boosts inference speed

VIA-SD introduces a three-tier verification framework for speculative decoding. It routes medium-confidence draft tokens to a lightweight "slim verifier" submodel and reserves full-model verification for uncertain tokens. Across four tasks and multiple model families, the method reduces rejection rates by 0.10–0.22 and achieves 10–20% speedups over strong speculative decoding baselines, with 2.5–3× acceleration over standard decoding. The approach is compatible with existing speculative decoding frameworks and requires no retraining. The work proposes multi-tier speculative decoding as a general paradigm for scalable LLM inference.
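
The routing idea in the summary above can be illustrated with a toy sketch. This is not the paper's implementation. The summary does not say what happens to high-confidence tokens, so treating them as a direct-accept tier is an assumption, as are the threshold values (0.9 and 0.5), the function names, and the per-token callback interface. Real systems would also verify tokens in batches rather than one at a time.

```python
from typing import Callable, List, Sequence

# Tier labels (hypothetical names, chosen for this illustration only).
ACCEPT, SLIM, FULL = "accept", "slim", "full"


def route(confidence: float, high: float = 0.9, low: float = 0.5) -> str:
    """Pick a verification tier from a draft token's confidence.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if confidence >= high:
        return ACCEPT  # assumed tier: accept without verification
    if confidence >= low:
        return SLIM    # medium confidence: lightweight slim verifier
    return FULL        # uncertain: full-model verification


def verify_draft(
    tokens: Sequence[str],
    confidences: Sequence[float],
    slim_ok: Callable[[str], bool],
    full_ok: Callable[[str], bool],
    high: float = 0.9,
    low: float = 0.5,
) -> List[str]:
    """Return the accepted prefix of a draft, stopping at the first rejection."""
    accepted: List[str] = []
    for token, confidence in zip(tokens, confidences):
        tier = route(confidence, high, low)
        if tier == SLIM and not slim_ok(token):
            break  # slim verifier rejected this token
        if tier == FULL and not full_ok(token):
            break  # full model rejected this token
        accepted.append(token)
    return accepted


if __name__ == "__main__":
    draft = ["the", "cat", "sat"]
    conf = [0.95, 0.70, 0.20]
    # Stand-in verifiers: the slim verifier accepts everything,
    # the full model rejects everything.
    print(verify_draft(draft, conf, lambda t: True, lambda t: False))
```

In this sketch, only the third token reaches the full model, which is the point of the tiering: the expensive verifier runs on the smallest possible share of draft tokens.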

Inference Economics · speculative decoding · VIA-SD: Verification via Intra-Model Routing for Speculative Decoding