5 · arXiv cs.LG (Machine Learning) · 19d ago

Tight Convergence Theory for Error Feedback Algorithms in Distributed Optimization

This paper provides tight convergence analyses for two major error-feedback algorithms—classic Error Feedback (EF) and Error Feedback 21 (EF21)—used to mitigate communication bottlenecks in distributed learning. The authors identify optimal step-size choices and construct tailored Lyapunov functions for each method, yielding guarantees that hold independently of the number of agents and recover the best known single-agent bounds. The work clarifies the relative performance of these gradient compression variants, which has remained poorly understood despite widespread use.
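For readers unfamiliar with the two methods, here is a minimal sketch of the update rules as they are commonly stated in the literature. It is not taken from the paper: the top-k compressor, the toy quadratic objectives, the step size `gamma`, and every name (`top_k`, `ef`, `ef21`, `TARGETS`, `GRADS`) are illustrative assumptions. The paper's optimal step sizes and Lyapunov functions are not reproduced.

```python
from typing import Callable, List

Vector = List[float]
Grad = Callable[[Vector], Vector]


def top_k(v: Vector, k: int) -> Vector:
    """Keep the k largest-magnitude entries of v and zero the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def mean(vs: List[Vector]) -> Vector:
    """Coordinate-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]


def ef(grads: List[Grad], x0: Vector, gamma: float, k: int, steps: int) -> Vector:
    """Classic error feedback: compress (memory + scaled gradient), keep the residual."""
    x = list(x0)
    errors = [[0.0] * len(x) for _ in grads]  # one error memory per agent
    for _ in range(steps):
        msgs = []
        for i, grad in enumerate(grads):
            p = [e + gamma * g for e, g in zip(errors[i], grad(x))]
            c = top_k(p, k)                    # what the agent transmits
            errors[i] = [pj - cj for pj, cj in zip(p, c)]
            msgs.append(c)
        x = [xj - sj for xj, sj in zip(x, mean(msgs))]
    return x


def ef21(grads: List[Grad], x0: Vector, gamma: float, k: int, steps: int) -> Vector:
    """EF21: each agent tracks a gradient estimate and sends a compressed correction."""
    x = list(x0)
    est = [top_k(grad(x), k) for grad in grads]  # initial estimates (one common choice)
    for _ in range(steps):
        x = [xj - gamma * gj for xj, gj in zip(x, mean(est))]
        for i, grad in enumerate(grads):
            diff = [g - e for g, e in zip(grad(x), est[i])]
            c = top_k(diff, k)                 # what the agent transmits
            est[i] = [e + cj for e, cj in zip(est[i], c)]
    return x


def make_grad(b: Vector) -> Grad:
    """Gradient of the toy objective f_i(x) = 0.5 * ||x - b||^2."""
    return lambda x: [xj - bj for xj, bj in zip(x, b)]


# Hypothetical toy problem: three agents with different local minimizers.
TARGETS = [[1.0, 2.0, 3.0], [3.0, 0.0, -1.0], [-1.0, 1.0, 4.0]]
GRADS = [make_grad(b) for b in TARGETS]

if __name__ == "__main__":
    x0 = [0.0, 0.0, 0.0]
    print("minimizer of the average objective:", mean(TARGETS))
    print("EF,   top-1:", ef(GRADS, x0, gamma=0.05, k=1, steps=2000))
    print("EF21, top-1:", ef21(GRADS, x0, gamma=0.05, k=1, steps=2000))
```

In both loops each agent transmits only the compressed vector `c`. The difference is what gets compressed: EF compresses the accumulated error plus the scaled gradient, while EF21 compresses the gap between the fresh gradient and the agent's running estimate. With `k` equal to the dimension the compressor is the identity and both functions reduce to plain gradient descent.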

Training Infrastructure · Inference Economics · Error Feedback 21 (EF21) · Error Feedback (EF) · Lyapunov function · gradient compression · distributed optimization

Related guides (2)

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production

Related events (8)

5 · arXiv · cs.LG · 25d ago

Global Convergence Theory for Wasserstein Policy Gradient in Entropy-Regularized RL

This paper establishes the first global convergence theory for Wasserstein Policy Gradient (WPG), a continuous-control RL optimization method that uses optimal-transport geometry over action distributions. The authors show that the Bellman recursion structure of entropy-regularized RL induces a Polyak–Łojasiewicz (PL) geometry that substitutes for classical convexity, enabling global convergence analysis. Key technical contributions include a statewise KL representation of the soft Bellman residual, a Bellman resolvent identity linking value improvement to relative Fisher information, and a uniform log-Sobolev inequality for the evolving Gibbs policy family. The result yields geometric contraction up to discretization bias, providing theoretical grounding for WPG in continuous-action settings.

AI Safety Research · Optimal Transport · Langevin Dynamics · Soft Q-Function · +4 more

6 · arXiv · cs.AI · 23d ago

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

This paper investigates whether extrapolative weight averaging of RL-trained checkpoints can extend Pareto frontiers between competing objectives (correctness vs. computational efficiency) without additional training. Starting from a shared initialization, the authors train checkpoints under nested unit-test coverage regimes for competitive programming tasks, revealing a correctness-efficiency frontier where higher-coverage rewards reduce optimization failures but increase correctness failures. Extrapolation beyond trained endpoints produces complementary policies that, when ensembled, improve pass@250 on LCB/hard by 3.3% over the best single checkpoint at matched sample budget. Results hold across 7B and 32B model scales and three inference settings: pure reasoning, tool use, and agentic coding.

Evaluation and Benchmarking · Inference Economics · LCB/hard benchmark · Competitive Programming RL · LeetCode Hard (LCB/hard) · +9 more

4 · arXiv · cs.LG · 12d ago

MG-ADSGD achieves optimal communication complexity for decentralized stochastic strongly convex optimization

Researchers propose Multi-Gossip Accelerated DSGD (MG-ADSGD), a decentralized stochastic optimization algorithm that simultaneously achieves accelerated dependence on both the condition number (√κ) and the network spectral gap (1/√(1-β)), a combination no prior stochastic method had attained. The algorithm couples gossip depth with mini-batch size so that additional communication rounds improve both consensus accuracy and gradient variance reduction. The resulting communication complexity is claimed to be the best currently known for decentralized stochastic strongly convex optimization up to logarithmic factors.

Training Infrastructure · Multi-Gossip Accelerated DSGD · Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

5 · arXiv · cs.LG · 3d ago

Kolmogorov Regression lifts diffusion policies to Cameron-Martin space for robust long-horizon control

Researchers introduce a backward Kolmogorov equation framework that reformulates diffusion policy training as a deterministic boundary-value PDE problem in Cameron-Martin space, replacing stochastic score matching. The approach uses a precision-weighted Cameron-Martin loss and a Kolmogorov residual as an inference-time failure detector, yielding convergence guarantees tied to kernel effective rank rather than action dimension. Validation on the PushT manipulation benchmark shows a 17% improvement in episode reward and a 67.6% reduction in inter-step drift. A 6-station manufacturing scheduling task shows 28.4% lower RMSE than LSTM baselines and a 96% reduction in deadlock events via Hamilton-Jacobi reachability certification.

Agent and Tool Ecosystem · Hamilton-Jacobi reachability · Kolmogorov Regression for Robust Diffusion Policies · PushT · +1 more

5 · arXiv · cs.CL · 5d ago

OPCoD: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

Researchers introduce On-Policy Co-Distillation (OPCoD), a training framework where two LLMs, each stronger in a different domain, iteratively tutor each other using on-policy rollouts and peer feedback. The method uses cognizance-based gating to control when feedback is given and feedback anchoring to ground it in the problem context. On Science Q&A tasks, OPCoD achieves Pareto improvement for both models across all evaluated domain pairs, outperforming one-way distillation and single-model fine-tuning baselines.

Evaluation and Benchmarking · Alignment and RLHF · On-Policy Co-Distillation · Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

5 · arXiv · cs.LG · 19d ago

KAFFEE: Addressing the Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling

This paper identifies a 'dynamic-probabilistic consistency (DPC) gap' in dynamical systems reconstruction (DSR), where optimizing finite-horizon probabilistic objectives can degrade learned dynamics or decouple predictive uncertainty from local tangent dynamics. Three failure mechanisms are isolated: core collapse, noise masking, and blind uncertainty. The authors propose KAFFEE, a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals while transporting covariance through learned Jacobians, reducing these failure modes on stochastic hyperchaotic Lorenz-96 and across 13 chaotic systems when adapting a DSR foundation model.

Evaluation and Benchmarking · AI Safety Research · Dynamic-Probabilistic Consistency Gap · Extended Kalman Filter · Lorenz-96 · +3 more

6 · arXiv · cs.AI · 47h ago

Contagion Networks: formal framework for measuring evaluator bias propagation in multi-agent LLM systems

A new arXiv preprint introduces Contagion Networks, a formal framework for quantifying how systematic evaluation biases spread across interacting LLM agents in multi-agent systems. Using a controlled 3-agent experiment with DeepSeek-chat, the authors measure a Cross-Agent Contagion Matrix and find that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model settings. A key practical finding is that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, offering a concrete mitigation strategy. The authors release an open-source experimental framework alongside the paper.

Evaluation and Benchmarking · AI Safety Research · Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems · MM-EPC · deepseek-chat · +1 more

4 · arXiv · cs.AI · 23d ago

Preference-Shaped Expected Hypervolume and R2 Improvement: Exact Computation and Monotonicity

This paper analyzes preference-shaped expected improvement criteria for Bayesian multiobjective optimization, focusing on hypervolume (EHVI) and R2 indicator families. The authors establish which preference transformations preserve exact computation, Pareto compatibility, and monotonicity, and which alter the underlying geometry. A key result is that exact integral R2 improvement is not generally an objective-space weighted hypervolume but is exactly a scalarization-space volume (Tchebycheff shadow measure), enabling new finite-sum and quadrature algorithms for ER2I. The work also provides an achievement-space Gaussian surrogate formulation reducing ER2I to an integral of scalar Gaussian expected improvements.

Evaluation and Benchmarking · Tchebycheff Scalarization · Bayesian Multiobjective Optimization · Expected Hypervolume Improvement (EHVI) · +2 more