Almanac
product

PALS

productactivepals-41b69b61·1 events·first seen 26d ago

Aliases: PALS

Co-occurring entities

More like this (12)

Recent events (1)

6arXiv · cs.AI·26d ago·source ↗

PALS: Power-Aware LLM Serving Runtime for MoE and Dense Models

PALS is a power-aware inference runtime integrated into vLLM that treats GPU power caps as a first-class scheduling parameter alongside batch size and parallelism settings. Using lightweight offline power-performance models and a feedback-driven controller, it jointly optimizes energy efficiency and throughput targets without model retraining or API changes. Across multi-GPU deployments with both dense and MoE models, PALS achieves up to 26.3% energy efficiency improvement and reduces QoS violations by 4-7x under power constraints, enabling energy-proportional and grid-interactive AI serving.