paper
VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
paper · active · provisional
via-sd-verification-via-intra-model-routing-for-speculative-decoding-5902fb98 · 1 event · first seen 6d ago
Aliases: VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
More like this (12)
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
speculative decoding
Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens
Self-Trained Verification (STV)
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
blockwise decoding
RAT: Reference-Augmented Training for ASV Anti-Spoofing
SV-Detect
Observation-Guided Video-Context Routing
Verifier-in-the-Loop Training (ViL)
multimodal meta-verification
Parallel Box Decoding
Recent events (1)
VIA-SD: Multi-tier speculative decoding via intra-model routing cuts rejection rates and boosts inference speed
VIA-SD introduces a three-tier verification framework for speculative decoding. Draft tokens in medium-confidence cases are routed to a lightweight 'slim verifier' submodel, and full-model verification is reserved for uncertain tokens. Across four tasks and multiple model families, the method reduces rejection rates by 0.10–0.22 and achieves 10–20% speedups over strong speculative decoding baselines, along with 2.5–3x acceleration over standard decoding. The approach is compatible with existing speculative decoding frameworks without retraining. The work proposes multi-tier speculative decoding as a general paradigm for scalable LLM inference.
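The routing idea in this summary can be illustrated with a minimal sketch. It is not the paper's implementation: the threshold values, the function names `route_token` and `verify_draft`, and the verifier callables are all hypothetical. The summary also does not say how the top tier is handled, so the sketch assumes high-confidence draft tokens are accepted without verification.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical thresholds; the summary does not report actual values.
HIGH_CONF = 0.9
LOW_CONF = 0.5

# A verifier takes (position, token_id) and returns True to accept the token.
Verifier = Callable[[int, int], bool]


def route_token(confidence: float) -> str:
    """Choose a verification tier from the drafter's confidence in a token."""
    if confidence >= HIGH_CONF:
        return "accept"  # assumed top tier: accepted without verification
    if confidence >= LOW_CONF:
        return "slim"    # medium confidence: lightweight slim verifier
    return "full"        # uncertain token: full-model verification


def verify_draft(
    tokens: Sequence[int],
    confidences: Sequence[float],
    slim_verify: Verifier,
    full_verify: Verifier,
) -> Tuple[List[int], List[str]]:
    """Return the accepted prefix of the draft and the tier used per token."""
    accepted: List[int] = []
    tiers: List[str] = []
    for pos, (tok, conf) in enumerate(zip(tokens, confidences)):
        tier = route_token(conf)
        tiers.append(tier)
        if tier == "accept":
            ok = True
        elif tier == "slim":
            ok = slim_verify(pos, tok)
        else:
            ok = full_verify(pos, tok)
        if not ok:
            break  # a rejection ends the accepted prefix
        accepted.append(tok)
    return accepted, tiers


if __name__ == "__main__":
    # Toy verifiers: the slim verifier accepts everything,
    # the full verifier rejects token id 13.
    accepted, tiers = verify_draft(
        tokens=[11, 12, 13, 14],
        confidences=[0.95, 0.70, 0.30, 0.99],
        slim_verify=lambda pos, tok: True,
        full_verify=lambda pos, tok: tok != 13,
    )
    print(accepted, tiers)
```

In this toy run the third token falls in the lowest tier and is rejected by the full verifier, so only the first two draft tokens are kept and the fourth is never examined. A real system would batch the verifier calls rather than loop token by token; the loop is kept here only to make the routing decision explicit.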