Almanac
← Events
5arXiv cs.LG (Machine Learning)·19d ago

KAFFEE: Addressing the Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling

This paper identifies a 'dynamic-probabilistic consistency (DPC) gap' in dynamical systems reconstruction (DSR), where optimizing finite-horizon probabilistic objectives can degrade learned dynamics or decouple predictive uncertainty from local tangent dynamics. Three failure mechanisms are isolated: core collapse, noise masking, and blind uncertainty. The authors propose KAFFEE, a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals while transporting covariance through learned Jacobians, reducing these failure modes on stochastic hyperchaotic Lorenz-96 and across 13 chaotic systems when adapting a DSR foundation model.

Related guides (2)

Related events (8)

5arXiv · cs.LG·3d ago·source ↗

Kolmogorov Regression lifts diffusion policies to Cameron-Martin space for robust long-horizon control

Researchers introduce a backward Kolmogorov equation framework that reformulates diffusion policy training as a deterministic boundary-value PDE problem in Cameron-Martin space, replacing stochastic score matching. The approach uses a precision-weighted Cameron-Martin loss and a Kolmogorov residual as an inference-time failure detector, yielding convergence guarantees tied to kernel effective rank rather than action dimension. Validation on the PushT manipulation benchmark shows 17% improvement in episode reward and 67.6% reduction in inter-step drift; a 6-station manufacturing scheduling task shows 28.4% lower RMSE than LSTM baselines and 96% reduction in deadlock events via Hamilton-Jacobi reachability certification.

5arXiv · cs.LG·26d ago·source ↗

Perturbation Theory for Spherical Hellinger-Kantorovich Flows with Differential Privacy Guarantees

This paper develops a perturbation theory for Spherical Hellinger-Kantorovich (SHK) gradient flows, which couple transport and reaction dynamics and coincide with birth-death Langevin dynamics. The authors derive dimension-free bounds on log-likelihood ratios and Rényi/KL divergences when two potentials differ, quantifying how perturbations propagate over time. These results are applied to differential privacy: the likelihood-ratio control yields explicit Pure-DP guarantees for SHK-based samplers implementing the exponential mechanism, while KL bounds provide Approximate-DP certificates. A utility bound is also derived that separates intrinsic exponential-mechanism suboptimality from finite-time sampling error.

6arXiv · cs.AI·11d ago·source ↗

WorldKernel: Formalizing world models as coupling kernels over counterfactual worlds

A new arXiv preprint identifies a structural failure mode in prediction-based world models: strong predictors can recover the diagonal of a counterfactual coupling kernel (ordinary posteriors) but systematically fail on off-diagonal cross-world couplings, collapsing to point estimates that are sometimes provably inadmissible. The authors formalize a world model as a positive semidefinite kernel K(T,T') over admissible possible worlds, showing the off-diagonal encodes counterfactual structure that more data cannot resolve. They demonstrate that PSD constraints provide partial identification bounds computable in polynomial time, that ontological axioms tighten these bounds, and that targeted constraint learning ('scars') closes the gap faster than untargeted approaches. The work has implications for causal reasoning in AI systems and the theoretical limits of learned world models.

5arXiv · cs.LG·11d ago·source ↗

DRPO: Smooth divergence regularization replaces hard masking in LLM RL training

A new arXiv preprint proposes Divergence Regularized Policy Optimization (DRPO), a method that replaces the hard trust-region mask used in DPPO with a smooth advantage-weighted quadratic regularizer on policy shift. The approach addresses a known weakness in PPO and GRPO where importance ratios poorly proxy distributional shift in long-tailed vocabularies, and in DPPO where gradient signals are discarded rather than corrected at trust-region boundaries. Experiments across model scales, architectures, and precision settings show improved stability and efficiency in LLM RL post-training.

6arXiv · cs.AI·1mo ago·source ↗

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

This paper presents a controlled robustness study of Vision-Language-Action (VLA) models in autonomous driving, evaluating Alpamayo R1 (10B parameters) across ~18,000 inference trials under eight sensor perturbation types including noise, lighting extremes, and fog. The key finding is that Chain-of-Causation (CoC) reasoning consistency is a high-fidelity proxy for trajectory reliability: when CoC explanations change post-perturbation, trajectory deviation spikes 5.3× (r=0.99 across attack types). Enabling CoC generation is associated with 11.8% average improvement in trajectory accuracy, and degradation under noise is approximately linear (R²=0.957), while standard preprocessing defenses offer only marginal benefit.

4arXiv · cs.AI·11d ago·source ↗

IA-VQC-DPC: Intervention-aware quantum predictive control with safety attribution for learned policies

A new arXiv preprint introduces Intervention-Aware Variational Quantum Differentiable Predictive Control (IA-VQC-DPC), a framework that trains variational quantum circuit policies under a primal-dual intervention budget to penalize over-reliance on downstream safety filters (Control-Barrier-Function projections). The work also proposes a safety-attribution protocol that decomposes trajectory corrections into policy-level versus filter-level contributions, enabling measurement of whether a policy has genuinely learned safe behavior or is merely being silently repaired by its safety layer. Experiments on BOPTEST building-control emulators show the quantum policy achieves significantly lower pre-filter violations than a matched classical policy at equal parameter budget, with a notable negative result: a learned energy head is only safe when paired with a distribution-aware runtime guard.

6arXiv · cs.CL·22d ago·source ↗

Canonical-Context On-Policy Distillation (CCOPD) for Multi-Turn LLM Consistency

This paper identifies 'self-anchored drift' as a key failure mode in multi-turn LLMs: when information is revealed incrementally across turns, models produce unsupported assumptions that distort final answers, even when the total evidence is identical to a single-prompt setting. The authors propose Canonical-Context On-Policy Distillation (CCOPD), which trains a student model on incremental multi-turn conversations to match the output distribution of a frozen teacher conditioned on the full clean prompt. Trained only on math conversations, CCOPD achieves a 32% average relative improvement on multi-turn (RAW-SHARDED) tasks and generalizes zero-shot to five out-of-domain task families while preserving single-prompt performance.

5arXiv · cs.AI·11d ago·source ↗

DARP: Semi-parametric retrieval-based imitation learning reduces compounding errors by 15-46%

Researchers introduce DARP (Difference-Aware Retrieval Policies), a semi-parametric imitation learning method that retrieves k-nearest neighbor demonstrations at inference time and predicts actions based on relative distance vectors between neighbor and query states. The approach reparameterizes behavior cloning around local neighborhood structure rather than global state-to-action mappings, requiring no additional data collection or online expert feedback. Across continuous control and robotic manipulation tasks, DARP shows 15-46% performance improvements over standard behavior cloning, including on high-dimensional visual inputs.