Almanac
model

MobileLLM-Pro

modelactiveprovisionalmobilellm-pro-82ac0c5a·1 events·first seen 21d ago

Aliases: MobileLLM-Pro

Co-occurring entities

More like this (12)

Recent events (1)

7arXiv · cs.CL·21d ago·source ↗

MobileMoE: Scaling Mixture-of-Experts for Sub-Billion Parameter On-Device Deployment

MobileMoE introduces a family of on-device MoE language models with 0.3–0.9B active parameters and 1.3–5.3B total parameters, targeting mobile deployment under memory and compute constraints. The authors derive an on-device MoE scaling law identifying a sweet spot of moderate sparsity with fine-grained and shared experts, then train models through a four-stage recipe including quantization-aware training on open-source data. Across 14 benchmarks, MobileMoE matches or exceeds leading dense on-device LLMs with 2–4× fewer inference FLOPs, and delivers 1.8–3.8× faster prefill and 2.2–3.4× faster decode than dense baselines on commodity smartphones at comparable INT4 memory.