arXiv cs.LG (Machine Learning) · Jun 8, 2026

Second-order path kernel interpolation formulas extend Domingos' gradient-descent characterization

This paper extends Pedro Domingos' 2020 first-order path-kernel interpolation formula for gradient-descent-trained models to second-order forms. The authors derive curvature-weighted correction terms for standard SGD, an additional sampling-induced component coupling prediction curvature with mini-batch gradient noise covariance, and an extension to SGD with momentum. A concentration estimate for the terminal prediction is also established, quantifying fluctuation around the expected second-order representation.
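To make the first-order versus second-order distinction concrete, here is a minimal sketch under toy assumptions that are ours, not the paper's: a scalar-parameter model `tanh(w * x)`, full-batch gradient descent on squared loss, and one test point. It accumulates Domingos-style path-kernel increments, where each step's change in the test prediction is a residual-weighted sum of tangent-kernel products K_t(x_test, x_i). It then adds a generic Taylor curvature term ½ f''(w_t) Δw_t². All function names and data are hypothetical. This illustrates the idea of a curvature correction, not the preprint's actual formulas.

```python
import math


def f(w, x):
    # Toy scalar-parameter model: prediction tanh(w * x)
    return math.tanh(w * x)


def df_dw(w, x):
    # First parameter derivative: x * sech^2(w x)
    return x / math.cosh(w * x) ** 2


def d2f_dw2(w, x):
    # Second parameter derivative: -2 x^2 tanh(w x) sech^2(w x)
    return -2.0 * x * x * math.tanh(w * x) / math.cosh(w * x) ** 2


def path_kernel_demo(steps=100, lr=0.2, w0=0.5, x_test=1.0):
    xs = [0.5, 0.75, 1.0, 1.25, 1.5]
    ys = [math.tanh(1.5 * x) for x in xs]  # hypothetical targets generated by w = 1.5
    n = len(xs)
    w = w0
    first_order = 0.0
    second_order = 0.0
    for _ in range(steps):
        residuals = [f(w, x) - y for x, y in zip(xs, ys)]
        # Tangent-kernel values K_t(x_test, x_i) at the current parameter
        kernel = [df_dw(w, x_test) * df_dw(w, x) for x in xs]
        # Full-batch GD step on (1 / 2n) * sum_i residual_i^2
        dw = -lr * sum(r * df_dw(w, x) for r, x in zip(residuals, xs)) / n
        # First-order increment: residual-weighted kernel sum (equals df_dw(w, x_test) * dw)
        inc1 = -lr * sum(r * k for r, k in zip(residuals, kernel)) / n
        first_order += inc1
        # Second-order increment adds the curvature term 0.5 * f'' * dw^2
        second_order += inc1 + 0.5 * d2f_dw2(w, x_test) * dw ** 2
        w += dw
    exact = f(w, x_test) - f(w0, x_test)
    return exact, first_order, second_order


exact, first_order, second_order = path_kernel_demo()
print(f"exact={exact:.6f} first={first_order:.6f} second={second_order:.6f}")
```

With a small step size the curvature-corrected sum tracks the true change in the test prediction more closely than the purely kernel-based sum, because its per-step error is third order in Δw rather than second order.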

Pedro Domingos · Second-Order Path Kernel Interpolation Formulas in Machine Learning

arXiv · cs.LG · 5d ago

Complexity bounds for learning projected gradient descent solver iterates via k-neighborhood data augmentation

A new arXiv preprint derives Rademacher complexity-based generalization bounds for learning to predict intermediate iterates of projected gradient descent solvers applied to box-constrained quadratic programs. The authors propose a k-neighborhood data collection strategy that augments converged-solution datasets with intermediate solver states, increasing training data without additional solver runs. The work connects to GLENS, a data-efficient global search method, and frames the approach within the Dynamic Data-Driven Application Systems (DDDAS) paradigm for tightening data-model-optimization loops.
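The summary does not define the k-neighborhood strategy precisely, so the sketch below assumes one plausible reading: run projected gradient descent on a box-constrained QP, keep the converged solution as the usual training sample, and add the k iterates that immediately precede it, so one solver run yields k + 1 samples. The helper names `pgd_box_qp` and `k_neighborhood_samples` and the problem data are hypothetical.

```python
def pgd_box_qp(Q, c, lo, hi, x0, step, iters):
    """Projected gradient descent for min 0.5 x^T Q x + c^T x s.t. lo <= x <= hi."""
    d = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(iters):
        grad = [sum(Q[i][j] * x[j] for j in range(d)) + c[i] for i in range(d)]
        # Gradient step followed by coordinate-wise projection onto the box
        x = [min(max(x[i] - step * grad[i], lo[i]), hi[i]) for i in range(d)]
        trajectory.append(list(x))
    return trajectory


def k_neighborhood_samples(problem_id, trajectory, k):
    """Hypothetical augmentation: converged solution plus the k iterates preceding it."""
    solution = trajectory[-1]
    neighbors = trajectory[-(k + 1):-1]
    samples = [(problem_id, solution)]  # the usual converged-solution sample
    samples += [(problem_id, state) for state in neighbors]  # extra samples, no extra solver runs
    return samples


Q = [[2.0, 0.0], [0.0, 1.0]]
c = [-4.0, 1.0]
lo, hi = [0.0, 0.0], [1.0, 1.0]
traj = pgd_box_qp(Q, c, lo, hi, [0.5, 0.5], step=0.1, iters=30)
data = k_neighborhood_samples("qp-0", traj, k=3)
print(traj[-1], len(data))
```

Because every sample comes from a trajectory the solver already produced, the augmentation costs only storage, which matches the summary's point about growing the dataset without additional solver runs.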

GLENS · Complexity Bounds and Approaches to Learning Projected Gradient Descent Solver Iterates · Rademacher Complexity

arXiv · cs.LG · Jun 17, 2026

Kolmogorov Regression lifts diffusion policies to Cameron-Martin space for robust long-horizon control

Researchers introduce a backward Kolmogorov equation framework that reformulates diffusion policy training as a deterministic boundary-value PDE problem in Cameron-Martin space, replacing stochastic score matching. The approach uses a precision-weighted Cameron-Martin loss and a Kolmogorov residual as an inference-time failure detector, yielding convergence guarantees tied to kernel effective rank rather than action dimension. Validation on the PushT manipulation benchmark shows 17% improvement in episode reward and 67.6% reduction in inter-step drift; a 6-station manufacturing scheduling task shows 28.4% lower RMSE than LSTM baselines and 96% reduction in deadlock events via Hamilton-Jacobi reachability certification.

Agent and Tool Ecosystem · Hamilton-Jacobi reachability · Kolmogorov Regression for Robust Diffusion Policies · PushT

arXiv · cs.AI · Jun 29, 2026

Theoretical analysis of generalization scaling laws in quadratic two-layer neural networks

A new arXiv preprint derives explicit characterizations of generalization error as a joint function of model width, sample count, and regularization in a quadratic two-layer network with structured data. The analysis reveals a phase diagram with distinct scaling regimes governed by data-dependent power laws tied to the spectral structure of the target function. The work extends scaling law theory beyond fixed-feature or infinite-width regimes by operating in a finite-sample, feature-learning setting, and characterizes interpolation threshold transitions.

Evaluation and Benchmarking · How Width and Data Shape Generalization Scaling Laws in Quadratic Neural Networks

arXiv · cs.LG · 5d ago

Mixed-sign spectral regularization via negative-shifted gradient descent for overparameterized linear regression

A new arXiv preprint introduces negative-shifted gradient descent as a method for mixed-sign spectral regularization in overparameterized linear regression, escaping structural limitations of the negative-ridge endpoint. The authors identify a Marchenko-Pastur barrier in a Gaussian spike-plus-flat model and prove that early-stopped paths improve on all admissible endpoints by a polynomial factor in risk under explicit conditions. The main theorem handles general high-effective-rank tails and recovers all head scales simultaneously, with technical control via localized Duhamel integrals and a finite-grid hold-out inequality for validation-selected algorithms.

Evaluation and Benchmarking · Beyond Negative-Ridge Endpoints: Mixed-Sign Spectral Regularization via Negative-Shifted Gradient Descent · Marchenko-Pastur distribution

arXiv · cs.LG · Jun 5, 2026

Large deviation analysis shows most interpolating classifiers share the same generalization performance

A new arXiv preprint establishes a large deviation principle characterizing the generalization performance of interpolating linear classifiers in the overparameterized regime (n/d → α, small α). The key result is a concentration phenomenon: all but an exponentially small fraction of interpolators achieve approximately the same generalization error, determined by a unique rate-function maximizer. This concentration result offers theoretical grounding for benign overfitting in overparameterized models. Empirically, gradient descent and a natural linear program both outperform this typical interpolator.
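A quick way to build intuition for "the typical interpolator" is to sample interpolators directly. The sketch below assumes a teacher-student setup of our own choosing: isotropic Gaussian inputs labeled by a random teacher direction, with candidate classifiers drawn uniformly from the unit sphere and kept only if they separate every training point (rejection sampling). Under isotropic Gaussian inputs, the test error of a direction w is angle(w, teacher) / π, so no test set is needed. It does not reproduce the paper's large deviation analysis; it only shows the spread of errors among random interpolators at n/d = 5/40.

```python
import math
import random


def unit(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_interpolator_errors(n=5, d=40, draws=5000, seed=0):
    rng = random.Random(seed)
    # Teacher direction and n Gaussian training inputs labeled by it
    teacher = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if dot(x, teacher) >= 0 else -1 for x in X]
    errors = []
    for _ in range(draws):
        # Uniformly random direction on the unit sphere
        w = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        # Keep w only if it interpolates (correctly classifies) every training point
        if all(yi * dot(xi, w) > 0 for xi, yi in zip(X, y)):
            cos = max(-1.0, min(1.0, dot(w, teacher)))
            # Test error under isotropic Gaussian inputs: angle(w, teacher) / pi
            errors.append(math.acos(cos) / math.pi)
    return errors


errors = sample_interpolator_errors()
mean = sum(errors) / len(errors)
spread = max(errors) - min(errors)
print(f"accepted={len(errors)} mean_error={mean:.3f} spread={spread:.3f}")
```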

How abundant are good interpolators?

arXiv · cs.LG · May 25, 2026

Perturbation Theory for Spherical Hellinger-Kantorovich Flows with Differential Privacy Guarantees

This paper develops a perturbation theory for Spherical Hellinger-Kantorovich (SHK) gradient flows, which couple transport and reaction dynamics and coincide with birth-death Langevin dynamics. The authors derive dimension-free bounds on log-likelihood ratios and Rényi/KL divergences when two potentials differ, quantifying how perturbations propagate over time. These results are applied to differential privacy: the likelihood-ratio control yields explicit Pure-DP guarantees for SHK-based samplers implementing the exponential mechanism, while KL bounds provide Approximate-DP certificates. A utility bound is also derived that separates intrinsic exponential-mechanism suboptimality from finite-time sampling error.
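The samplers discussed here target the exponential mechanism, whose pure-DP guarantee holds exactly for the target distribution. The sketch below computes exact exponential-mechanism probabilities over a small finite candidate set and checks the pure ε-DP condition: the log-likelihood ratio between neighboring datasets stays within ε. The utility values are hypothetical, and the finite-time SHK sampling error that the paper actually bounds is not modeled.

```python
import math


def exp_mech_probs(utilities, epsilon, sensitivity):
    """Exponential mechanism: p(r) proportional to exp(epsilon * u(r) / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def max_abs_log_ratio(p, q):
    """Largest |log p(r) - log q(r)|; pure epsilon-DP requires this to be <= epsilon."""
    return max(abs(math.log(a) - math.log(b)) for a, b in zip(p, q))


# Hypothetical utilities on two neighboring datasets (each entry differs by at most 1)
u_d1 = [3.0, 1.0, 0.5, 2.0]
u_d2 = [2.2, 1.5, 0.5, 2.9]
eps = 1.0
p1 = exp_mech_probs(u_d1, eps, sensitivity=1.0)
p2 = exp_mech_probs(u_d2, eps, sensitivity=1.0)
print(f"max log-ratio = {max_abs_log_ratio(p1, p2):.4f} (bound: {eps})")
```

The bound splits into two halves: the unnormalized weights change by at most a factor of exp(ε/2), and so does the normalizing constant. A sampler that only approximates this distribution must add its own likelihood-ratio error on top, which is the quantity the perturbation theory controls.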

AI Safety Research · Alignment and RLHF · Differential Privacy · KL Divergence · Spherical Hellinger-Kantorovich geometry

arXiv · cs.LG · Jun 17, 2026

SDE approximation for TD learning with linear features under Markovian noise

A new arXiv preprint replaces the classical ODE description of linear TD(0) learning with a stochastic differential equation (SDE) approximation that accounts for Markovian sampling noise. The model separates contraction dynamics governed by the projected Bellman operator from the influence of Markovian long-run covariance, providing a theoretical explanation for the constant-stepsize error floor. The work is a theoretical contribution to the foundations of reinforcement learning policy evaluation.
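The constant-stepsize error floor is easy to observe numerically. The sketch below runs linear TD(0) with a fixed step size on a small Markov chain, sampling transitions along a single trajectory rather than i.i.d. It compares the iterates with the TD fixed point θ* solving Aθ = b, where A = Φᵀ D (Φ − γPΦ) and b = Φᵀ D r. The chain, features, and rewards are hypothetical. The averaged iterate sits near θ*, while individual iterates keep fluctuating around it; this residual spread is what an SDE description captures and an ODE description does not.

```python
import random


def td0_constant_step(steps=20000, alpha=0.05, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Hypothetical 3-state chain; doubly stochastic, so the stationary law is uniform
    P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # linear features per state
    reward = [1.0, 0.0, -1.0]
    theta = [0.0, 0.0]
    s = 0
    trace = []
    for _ in range(steps):
        s_next = rng.choices(range(3), weights=P[s])[0]  # Markovian, not i.i.d., sampling
        v = sum(a * b for a, b in zip(phi[s], theta))
        v_next = sum(a * b for a, b in zip(phi[s_next], theta))
        delta = reward[s] + gamma * v_next - v  # TD error
        theta = [t + alpha * delta * f for t, f in zip(theta, phi[s])]
        trace.append(theta)
        s = s_next
    return trace, P, phi, reward


def td_fixed_point(P, phi, reward, gamma):
    # Solve A theta = b with A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r, D uniform
    n, k = len(phi), len(phi[0])
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s in range(n):
        exp_next = [sum(P[s][t] * phi[t][j] for t in range(n)) for j in range(k)]
        for i in range(k):
            b[i] += phi[s][i] * reward[s] / n
            for j in range(k):
                A[i][j] += phi[s][i] * (phi[s][j] - gamma * exp_next[j]) / n
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2x2 solve via Cramer's rule
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


trace, P, phi, reward = td0_constant_step()
theta_star = td_fixed_point(P, phi, reward, gamma=0.9)
tail = trace[len(trace) // 2:]
avg = [sum(t[i] for t in tail) / len(tail) for i in range(2)]
floor = sum((t[0] - theta_star[0]) ** 2 + (t[1] - theta_star[1]) ** 2 for t in tail) / len(tail)
print(f"theta*={theta_star} tail average={avg} mean squared deviation={floor:.4f}")
```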

Alignment and RLHF · TD(0) · A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

arXiv · cs.LG · Jun 16, 2026

Exact Posterior Score (EPS): Closed-form posterior sampling for linear inverse problems with diffusion models

A new arXiv preprint derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under anisotropic noise covariance. The authors convert this identity into a training objective called Exact Posterior Score (EPS) that preserves the input/output structure of standard diffusion pretraining, enabling training from scratch or fine-tuning from a pretrained denoiser. EPS is evaluated on five linear inverse problems across FFHQ and ImageNet, outperforming both training-free and training-based baselines while requiring roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.
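In the fully Gaussian special case, everything above has a closed form, which makes a useful sanity check. By Bayes' rule the posterior score is the prior score plus the likelihood score, and it vanishes at the Gaussian posterior mean. The sketch assumes a diagonal Gaussian prior and a single linear measurement y = h·x + noise, with hypothetical numbers. It does not model EPS's learned denoiser, Gaussian interpolants, or operator-dependent shifted pivot.

```python
def posterior_mean(h, prior_var, noise_var, y):
    """Mean of p(x | y) for x ~ N(0, diag(prior_var)) and y = h.x + N(0, noise_var)."""
    c = sum(hi * hi * v for hi, v in zip(h, prior_var))  # h Sigma h^T
    gain = y / (c + noise_var)
    return [v * hi * gain for hi, v in zip(h, prior_var)]


def posterior_score(x, h, prior_var, noise_var, y):
    """grad_x log p(x | y) = prior score + likelihood score (Bayes' rule)."""
    resid = y - sum(hi * xi for hi, xi in zip(h, x))
    return [-xi / v + hi * resid / noise_var for xi, hi, v in zip(x, h, prior_var)]


h, prior_var, noise_var, y = [1.0, 2.0], [1.0, 0.5], 0.1, 1.3
m = posterior_mean(h, prior_var, noise_var, y)
print(m, posterior_score(m, h, prior_var, noise_var, y))
```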

Evaluation and Benchmarking · Exact Posterior Score Estimation for Solving Linear Inverse Problems · FFHQ · ImageNet
