paper
Token-Operations-Oriented Inference Optimization Techniques for Large Models
paper·active·provisional
token-operations-oriented-inference-optimization-techniques-for-large-models-f39b75c4·1 event·first seen 47h ago
Aliases: Token-Operations-Oriented Inference Optimization Techniques for Large Models
More like this (12)
GraphPO: Graph-based Policy Optimization for Reasoning Models
Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models
CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference
Large Reasoning Models
Language Model Finetuning
Bayesian Optimization
distributionally robust optimization
Scaling Laws for Reward Model Overoptimization
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models
Quantifying Faithful Confidence Expression in Large Reasoning Models
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Recent events (1)
Survey proposes four-layer architecture for token-operations-oriented LLM inference optimization
A new arXiv preprint proposes a four-layer technical architecture for systematically organizing LLM inference optimization techniques. The four layers are Multi-model Fusion, Model Optimization, Compute-Model Fusion, and Compute-Network-Model Fusion. The paper reviews the key technologies and industry status at each layer and analyzes how they are applied in real-world business scenarios. Its 'token operations' framing positions inference optimization as an operational discipline analogous to traditional IT operations.