Almanac
technique

Pair-In, Pair-Out (PIPO)

technique · active · provisional · pair-in-pair-out-pipo--3e80aba0 · 1 event · first seen 21d ago

Aliases: Pair-In, Pair-Out (PIPO)

Recent events (1)

6 · arXiv · cs.CL · 21d ago

Pair-In, Pair-Out (PIPO): Unified Latent Compression and Multi-Token Prediction for Efficient LLM Inference

PIPO is a new inference efficiency framework that unifies input-side latent compression with output-side multi-token prediction (MTP) by treating them as mirror operations: a compressor folds two input tokens into one latent, while an MTP head unfolds one hidden state into an additional output token. To avoid the expensive verifier pass typically required by speculative decoding, PIPO trains a lightweight confidence head using On-Policy Distillation (OPD), which naturally aligns with rejection-sampling criteria. Experiments on Qwen3.5-4B and 9B backbones across AIME 2025, GPQA-Diamond, LiveCodeBench v6, and LongBench v2 show up to 2.64× first-token-latency speedup and +7.15 pass@4 improvement over regular decoding.
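The mirror relationship described above can be illustrated with a toy sketch. This is not the paper's implementation: mean pooling stands in for the learned compressor, a fixed threshold stands in for the OPD-trained confidence head, and the names `fold_pairs`, `unfold_step`, and `decode` are hypothetical, chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def fold_pairs(embeddings: Sequence[Vector]) -> List[Vector]:
    """Pair-in: fold each consecutive pair of token vectors into one latent.

    Mean pooling is an assumed stand-in for the learned compressor.
    A trailing unpaired vector is passed through unchanged.
    """
    latents: List[Vector] = []
    for i in range(0, len(embeddings) - 1, 2):
        a, b = embeddings[i], embeddings[i + 1]
        latents.append([(x + y) / 2.0 for x, y in zip(a, b)])
    if len(embeddings) % 2 == 1:
        latents.append(list(embeddings[-1]))
    return latents


def unfold_step(primary: int, extra: int, confidence: float,
                threshold: float = 0.5) -> List[int]:
    """Pair-out: always emit the primary token; emit the extra token
    only when the (assumed) confidence score clears the threshold."""
    return [primary, extra] if confidence >= threshold else [primary]


def decode(steps: Sequence[Tuple[int, int, float]],
           threshold: float = 0.5) -> Tuple[List[int], int]:
    """Replay precomputed (primary, extra, confidence) steps.

    Returns the emitted tokens and the number of forward passes used,
    one pass per step, with no separate verifier pass.
    """
    tokens: List[int] = []
    for primary, extra, confidence in steps:
        tokens.extend(unfold_step(primary, extra, confidence, threshold))
    return tokens, len(steps)


if __name__ == "__main__":
    # Three input vectors fold into two latents (one pair plus a leftover).
    print(fold_pairs([[1.0, 3.0], [3.0, 5.0], [10.0, 0.0]]))
    # Two decoding steps; only the first extra token is accepted.
    print(decode([(1, 2, 0.9), (3, 4, 0.1)]))
```

In this toy, folding roughly halves the input sequence length, and each accepted extra token saves one decoding step. How much of either gain carries over to a real model depends on the trained compressor and confidence head, which the sketch does not model.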