Entity · technique

Graft

technique · active · graft-bda63d87 · 1 event · first seen May 20, 2026

Aliases: Graft

Co-occurring entities

speculative decoding · DFlash · EAGLE-3 · Qwen3-235B

More like this (12)

Splice Grab GRASP FORGE lesioning technique Abridge SyGra AraGen Jailbreak GPT Cross-Domain Transfer GPTs

Recent events (1)

arXiv · cs.AI · May 20, 2026

Graft: Hybrid Tree Construction for Speculative Decoding via Prune-Then-Retrieve

Graft is a training-free framework that improves speculative decoding by combining dynamic-depth pruning of the draft tree with retrieval-based token compensation. Pruning reduces VRAM and compute overhead and frees budget for retrieval, which fills topological gaps in the draft tree at near-zero additional cost. On short-context benchmarks, Graft achieves speedups of up to 5.41× and improves average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B. The method is evaluated in both short- and long-context settings and is also extended to block-drafting paradigms.
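The prune-then-retrieve idea can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the summary does not say how Graft scores nodes or selects retrieval candidates. The `Node` structure, the cumulative-probability threshold used for dynamic-depth pruning, the n-gram suffix lookup over the context used as the retrieval source, the fixed node budget, and the policy of extending the most confident leaves first are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    token: int
    score: float  # hypothetical: cumulative draft probability of the path ending here
    children: list["Node"] = field(default_factory=list)
    source: str = "draft"  # "draft" or "retrieval"


def count_nodes(node: Node) -> int:
    """Number of nodes in the subtree rooted at node (including node)."""
    return 1 + sum(count_nodes(c) for c in node.children)


def prune(node: Node, threshold: float) -> None:
    """Dynamic-depth pruning (assumed form): drop subtrees whose score < threshold.

    Cumulative probabilities never increase along a path, so each branch is
    cut at its own depth rather than at one fixed global depth.
    """
    node.children = [c for c in node.children if c.score >= threshold]
    for c in node.children:
        prune(c, threshold)


def leaf_paths(node: Node, prefix: list[Node]):
    """Yield every root-to-leaf path as a list of nodes."""
    path = prefix + [node]
    if not node.children:
        yield path
    for c in node.children:
        yield from leaf_paths(c, path)


def ngram_lookup(context: list[int], suffix: list[int], max_len: int) -> list[int]:
    """Hypothetical retriever: tokens that followed the latest earlier match of suffix."""
    n = len(suffix)
    for start in range(len(context) - n - 1, -1, -1):
        if context[start:start + n] == suffix:
            return context[start + n:start + n + max_len]
    return []


def build_hybrid_tree(root: Node, context: list[int], total_budget: int,
                      threshold: float, ngram: int = 2) -> tuple[int, int]:
    """Prune the draft tree, then spend the freed node budget on retrieved tokens.

    The root is the last committed token and does not count toward the budget.
    Returns (kept_draft_nodes, retrieved_nodes).
    """
    prune(root, threshold)
    kept = count_nodes(root) - 1
    budget = total_budget - kept
    retrieved = 0
    # Hypothetical policy: extend the most confident surviving leaves first.
    for path in sorted(leaf_paths(root, []), key=lambda p: -p[-1].score):
        if budget <= 0:
            break
        seq = context + [n.token for n in path[1:]]
        tail = path[-1]
        for tok in ngram_lookup(context, seq[-ngram:], budget):
            # Retrieved tokens carry no draft probability; 0.0 is a placeholder.
            child = Node(tok, 0.0, source="retrieval")
            tail.children.append(child)
            tail = child
            budget -= 1
            retrieved += 1
    return kept, retrieved


# Toy example: the root is token 2, the last token of the context.
context = [1, 2, 3, 4, 1, 2]
root = Node(2, 1.0, [
    Node(3, 0.6, [Node(9, 0.05), Node(4, 0.4)]),
    Node(7, 0.02, [Node(8, 0.01)]),
])
kept, retrieved = build_hybrid_tree(root, context, total_budget=5, threshold=0.1)
print(kept, retrieved)  # 2 2
```

In this toy tree, pruning removes three low-confidence draft nodes (`9`, `7`, `8`), and retrieval fills two of the three freed slots by grafting the tokens that followed `3, 4` earlier in the context, giving the path `3 → 4 → 1 → 2`. Because the node budget is fixed, every pruned node becomes a slot that retrieval can reuse, which is the trade-off the summary describes.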

Frontier Model Releases · Inference Economics · speculative decoding · DFlash · EAGLE-3