Graft: Hybrid Tree Construction for Speculative Decoding via Prune-Then-Retrieve
Graft is a training-free framework that improves speculative decoding by coupling dynamic-depth pruning with retrieval-based token compensation. Pruning reduces VRAM and compute overhead while freeing budget for retrieval, which fills topological gaps in the draft tree with near-zero additional cost. On short-context benchmarks, Graft achieves up to 5.41× speedup and improves average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B. The method is evaluated across short- and long-context settings and extended to block-drafting paradigms.
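The prune-then-retrieve idea can be illustrated with a toy sketch. This is not Graft's implementation: the summary above does not specify the pruning rule or the retrieval source, so the sketch assumes a fixed score threshold for pruning and a simple n-gram lookup over the existing context for retrieval. All names (`build_ngram_index`, `prune`, `graft`, `min_score`, `max_extra`) are hypothetical, and a draft "tree" is simplified to a list of scored token paths.

```python
from collections import defaultdict


def build_ngram_index(tokens, n=2):
    """Map each n-token window in `tokens` to the tokens that followed it."""
    index = defaultdict(list)
    for i in range(len(tokens) - n):
        index[tuple(tokens[i:i + n])].append(tokens[i + n])
    return index


def prune(paths, min_score):
    """Keep only draft paths whose score clears the threshold (assumed rule)."""
    return [(list(p), s) for p, s in paths if s >= min_score]


def graft(paths, context, budget, min_score=0.2, n=2, max_extra=2):
    """Prune weak draft paths, then spend the freed token budget on
    continuations retrieved from the context (assumed n-gram lookup)."""
    kept = prune(paths, min_score)
    used = sum(len(p) for p, _ in kept)  # tokens still occupying the budget
    index = build_ngram_index(context, n)
    result = []
    for path, score in kept:
        extra = 0
        while used < budget and extra < max_extra:
            key = tuple((context + path)[-n:])
            candidates = index.get(key)
            if not candidates:
                break  # nothing retrievable for this suffix
            path.append(candidates[0])  # first seen continuation
            used += 1
            extra += 1
        result.append((path, score))
    return result


context = [1, 2, 3, 4, 1, 2, 3, 5]
drafts = [([1, 2], 0.9), ([7], 0.05), ([9], 0.5)]
print(graft(drafts, context, budget=6))
```

In this sketch the low-score path `[7]` is dropped, and the token slots it would have used are spent extending `[1, 2]` with tokens copied from the context. Retrieval here is a dictionary lookup, which is the sense in which it adds very little cost relative to running the draft model.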