NVIDIA RTX 5090

other · active · provisional · nvidia-rtx-5090-91a4100a · 1 event · first seen 2d ago

Aliases: NVIDIA RTX 5090

Recent events (1)

5 · arXiv · cs.LG · 2d ago

Execution-State Capsules: Graph-bound checkpoint/restore for low-latency on-device LLM serving

Researchers introduce execution-state capsules, a checkpoint-and-restore mechanism that snapshots the complete execution state (KV cache, recurrent state, convolution state, MTP state, and metadata) at graph boundaries rather than managing only KV fragments. The FlashRT runtime implements this on NVIDIA CUDA with sub-millisecond GPU-resident snapshot/restore, achieving time-to-first-token (TTFT) speedups of 3.9x at 2k tokens and 27x at 16k tokens over cold prefill on an RTX 5090. The work targets low-latency, small-batch, on-device physical-AI scenarios (interactive agents, speech systems, robot policies) where branching, rollback, and re-entry are common. The authors position it as complementary to, not a replacement for, high-throughput KV-cache serving.
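
The snapshot-and-branch pattern described above can be illustrated with a small CPU-only toy. This is a sketch under stated assumptions, not FlashRT's API or the paper's implementation: `ExecutionState`, `ToyRuntime`, `snapshot`, `restore`, and `demo` are hypothetical names, the "state" is plain Python lists rather than GPU tensors, and a deep copy stands in for a GPU-resident snapshot. It only shows why capturing the whole state as one unit lets a runtime branch or roll back without re-running prefill.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ExecutionState:
    """Toy stand-in for the state components named in the summary (hypothetical layout)."""
    kv_cache: list = field(default_factory=list)              # one entry per processed token
    recurrent_state: list = field(default_factory=lambda: [0.0])
    conv_state: list = field(default_factory=lambda: [0.0, 0.0])
    mtp_state: list = field(default_factory=list)             # unused placeholder in this toy
    metadata: dict = field(default_factory=lambda: {"position": 0})


class ToyRuntime:
    def __init__(self):
        self.state = ExecutionState()
        self.tokens_processed = 0  # how many tokens were actually computed

    def step(self, token: int) -> None:
        """Process one token, updating every state component together."""
        s = self.state
        s.kv_cache.append(token)
        s.recurrent_state[0] = 0.5 * s.recurrent_state[0] + token
        s.conv_state = [s.conv_state[-1], float(token)]
        s.metadata["position"] += 1
        self.tokens_processed += 1

    def snapshot(self) -> ExecutionState:
        """Capture the complete state as a single capsule (deep copy in this toy)."""
        return copy.deepcopy(self.state)

    def restore(self, capsule: ExecutionState) -> None:
        """Replace the live state with a copy of the capsule; the capsule stays reusable."""
        self.state = copy.deepcopy(capsule)


def demo():
    rt = ToyRuntime()
    for tok in [1, 2, 3, 4]:        # shared prefix, processed once
        rt.step(tok)
    capsule = rt.snapshot()

    rt.step(10)                     # branch A continues from the prefix
    branch_a = copy.deepcopy(rt.state)

    rt.restore(capsule)             # roll back to the prefix without recomputing it
    rt.step(20)                     # branch B
    return capsule, branch_a, rt.state, rt.tokens_processed
```

In this toy, the two branches cost six token steps in total (four for the shared prefix, one per branch), whereas recomputing the prefix for the second branch would cost ten. The real system's gains depend on GPU snapshot cost and prompt length, which this sketch does not model.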