VIA-SD: Verification via Intra-Model Routing for Speculative Decoding


arXiv · cs.CL · 6d ago

VIA-SD: Multi-tier speculative decoding via intra-model routing cuts rejection rates and boosts inference speed

VIA-SD introduces a three-tier verification framework for speculative decoding. Draft tokens of medium confidence are routed to a lightweight "slim verifier" submodel, and full-model verification is reserved for uncertain tokens. Across four tasks and multiple model families, the method lowers rejection rates by 0.10–0.22 and runs 10–20% faster than strong speculative decoding baselines, reaching 2.5–3x acceleration over standard decoding. The approach is compatible with existing speculative decoding frameworks and requires no retraining. The authors propose multi-tier speculative decoding as a general paradigm for scalable LLM inference.
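
The routing idea in the summary can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary gives no thresholds or confidence definition, so `HIGH_CONF`, `LOW_CONF`, the `DraftToken` type, and the `route` and `verify_draft` helpers are all hypothetical names chosen for illustration. The sketch also assumes that the first tier accepts high-confidence tokens without further verification, which the summary implies but does not state, and that verification stops at the first rejected token as in ordinary speculative decoding.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical thresholds: the source does not give numeric values.
HIGH_CONF = 0.9
LOW_CONF = 0.5


@dataclass
class DraftToken:
    token_id: int
    confidence: float  # assumed signal, e.g. the draft model's token probability


def route(token: DraftToken, high: float = HIGH_CONF, low: float = LOW_CONF) -> str:
    """Pick a verification tier for one draft token."""
    if token.confidence >= high:
        return "accept"  # assumed tier 1: accepted without extra verification
    if token.confidence >= low:
        return "slim"    # tier 2: medium confidence goes to the slim verifier
    return "full"        # tier 3: uncertain tokens get full-model verification


Verifier = Callable[[DraftToken], bool]


def verify_draft(
    tokens: Sequence[DraftToken],
    slim_verifier: Verifier,
    full_verifier: Verifier,
) -> List[int]:
    """Return the accepted prefix of draft token ids."""
    accepted: List[int] = []
    for tok in tokens:
        tier = route(tok)
        if tier == "accept":
            ok = True
        elif tier == "slim":
            ok = slim_verifier(tok)
        else:
            ok = full_verifier(tok)
        if not ok:
            break  # discard this token and everything drafted after it
        accepted.append(tok.token_id)
    return accepted


if __name__ == "__main__":
    draft = [
        DraftToken(1, 0.95),
        DraftToken(2, 0.70),
        DraftToken(3, 0.20),
        DraftToken(4, 0.99),
    ]
    # Stand-in verifiers: the slim one accepts, the full one rejects.
    print(verify_draft(draft, lambda t: True, lambda t: False))
```

In this toy run the third token falls below `LOW_CONF`, is sent to the full verifier, and is rejected, so only the first two token ids survive. In a real system the two verifiers would be forward passes through the submodel and the full model rather than boolean callbacks.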