Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization
This paper addresses miscalibration in Gaussian process predictive distributions used for Bayesian optimization, focusing specifically on the lower tail relevant to minimization objectives. The authors introduce a framework for 'goal-oriented' spatial calibration below a threshold t, defining occurrence calibration and thresholded μ-calibration on sublevel sets. They propose tcGP, a post-hoc calibration method, and prove that the sequence of points queried by the resulting EI-based optimizer remains dense in the design space. Experiments on standard benchmarks show that tcGP improves both lower-tail calibration and overall BO performance compared with standard and globally calibrated GP models.
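To make the two quantities in this summary concrete, here is a minimal sketch, assuming a Gaussian predictive distribution N(mu, sigma^2) at each point. It computes the textbook closed-form expected improvement (EI) for minimization, the predicted probability that f(x) falls in the sublevel set {f <= t}, and a crude average occurrence gap between predicted and observed frequencies below t. It is not tcGP's calibration procedure, and the helper names (expected_improvement, prob_below, occurrence_gap) are hypothetical.

```python
import math

def norm_pdf(z: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """EI for minimization: E[max(f_best - f(x), 0)] with f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def prob_below(mu: float, sigma: float, t: float) -> float:
    """Predicted probability that f(x) lies in the sublevel set {f <= t}."""
    return norm_cdf((t - mu) / sigma)

def occurrence_gap(mus, sigmas, ys, t):
    """Mean predicted P(f <= t) minus the observed fraction of y <= t (0 means calibrated on average)."""
    predicted = sum(prob_below(m, s, t) for m, s in zip(mus, sigmas)) / len(ys)
    observed = sum(1 for y in ys if y <= t) / len(ys)
    return predicted - observed
```

Because EI depends directly on sigma and on the lower tail of the predictive distribution, an over- or under-dispersed lower tail shifts which points EI favors; this is the motivation for calibrating below t rather than globally.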
Related events
GGRO: Gradient-Guided Reward Optimization for inference-time LLM alignment
Researchers introduce Gradient-Guided Reward Optimization (GGRO), an inference-time alignment method that uses gradient signals from a reward model to inject 'nudging tokens' at high-uncertainty decoding steps, rather than relying on sampling-intensive re-ranking approaches like Best-of-N. The method monitors token-level entropy to detect distribution drift and steers generation trajectories directly, claiming improved robustness to reward hacking with minimal computational overhead. Experiments show gains across safety, helpfulness, and reasoning benchmarks compared to standard inference-time alignment baselines.
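The entropy-monitoring step described above can be illustrated with a short sketch, assuming access to the raw next-token logits at each decoding step. It computes the Shannon entropy (in nats) of the softmax distribution and flags steps whose entropy exceeds a threshold. The threshold value and the function names are hypothetical; GGRO's actual drift criterion and its gradient-based nudging are not reproduced here.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_uncertainty_steps(step_logits, threshold):
    """Indices of decoding steps whose next-token entropy exceeds `threshold` (hypothetical rule)."""
    return [i for i, logits in enumerate(step_logits) if token_entropy(logits) > threshold]
```

A flat distribution over four tokens has entropy log 4 (about 1.39 nats), while a sharply peaked one is near zero, so only the first step below would be flagged with a threshold of 1.0: `high_uncertainty_steps([[0, 0, 0, 0], [50, 0, 0, 0]], 1.0)`.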
Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization
This paper proposes a generalized probabilistic smoothing framework for global optimization that replaces Gaussian kernels with flexible symmetric unimodal kernels combined with monotonic ratio-based transformations. The authors prove that the smoothed objective preserves the global maximizer and that stationary points concentrate near the true optimum under large amplification, without requiring a decreasing smoothing schedule. Explicit complexity bounds for stochastic gradient ascent are derived, and a leave-one-out baseline is shown to provably reduce variance. Experiments on high-dimensional benchmarks and black-box adversarial attacks demonstrate improved robustness over existing methods.
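For orientation, the sketch below shows the standard Gaussian-smoothing setup that this paper generalizes: a Monte Carlo estimate of the gradient of the smoothed objective E_u[f(x + sigma*u)], u ~ N(0, I), with a leave-one-out baseline, where each sample's baseline is the mean of the other samples' function values. The paper's symmetric unimodal kernels and ratio-monotone transforms are not modeled, and the exact form of its leave-one-out baseline is assumed, not taken from the source. The function name is hypothetical.

```python
import random

def smoothed_grad_loo(f, x, sigma, n, rng):
    """Monte Carlo estimate of grad E_u[f(x + sigma*u)], u ~ N(0, I), with a leave-one-out baseline."""
    d = len(x)
    us = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    vals = [f([xj + sigma * uj for xj, uj in zip(x, u)]) for u in us]
    total = sum(vals)
    grad = [0.0] * d
    for u, v in zip(us, vals):
        # Mean of the other n-1 samples: independent of u, so the estimator stays unbiased.
        baseline = (total - v) / (n - 1)
        for j in range(d):
            grad[j] += (v - baseline) * u[j]
    return [g / (n * sigma) for g in grad]
```

For a concave quadratic such as f(x) = -(x0 - 1)^2 - (x1 + 2)^2, Gaussian smoothing only adds a constant, so at x = [0, 0] the estimate should approach the true gradient [2, -4] as n grows. Subtracting the baseline leaves the expectation unchanged but removes much of the variance caused by the offset of f.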
GoBOED: Goal-Driven Bayesian Optimal Experimental Design for Decision-Focused Robustness
GoBOED is a new framework for Bayesian optimal experimental design (BOED) that replaces information-gain maximization with direct optimization for a specified downstream decision objective. It combines an amortized variational posterior surrogate with a differentiable convex decision layer to enable gradient-based, decision-focused design optimization. The authors prove that GoBOED gradients are insensitive to parameter directions irrelevant to the decision goal, formally justifying why goal-driven design achieves equivalent decision quality over a wider range of experimental designs. Empirical results across source localization, epidemic management, and pharmacokinetic control show improved alignment with decision objectives compared to goal-agnostic BOED.
Optimal deterministic multicalibration achieved, resolving open problem on randomization necessity
A new arXiv preprint resolves an open problem in multicalibration theory by constructing a minimax-optimal multicalibration algorithm that outputs a deterministic predictor, achieving the same O(ε⁻³) sample complexity previously only attainable by randomized predictors. The result extends to outcome indistinguishability, deterministic omnipredictors, and panpredictors with optimal sample complexity, resolving multiple open problems from recent works. Multicalibration is a fairness and reliability property requiring calibration to hold across reweighted subgroups, making this relevant to trustworthy ML research.
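As a concrete reading of "calibration across reweighted subgroups", here is a minimal sketch of a multicalibration-style error check, assuming binary labels, predictions in [0, 1], and each subgroup given as a vector of weights in [0, 1]. For every (subgroup, prediction-bin) cell it measures the weighted residual sum(w * (y - p)) normalized by the sample size, and reports the worst cell. The binning scheme and function name are illustrative assumptions; this is a diagnostic, not the preprint's algorithm.

```python
def multicalibration_error(preds, labels, group_weights, n_bins=10):
    """Largest |sum_i w_i * (y_i - p_i)| / N over (group, prediction-bin) cells."""
    N = len(preds)
    worst = 0.0
    for weights in group_weights:  # one weight vector per (re)weighted subgroup
        cells = {}
        for p, y, w in zip(preds, labels, weights):
            b = min(int(p * n_bins), n_bins - 1)  # bin index of the prediction
            cells[b] = cells.get(b, 0.0) + w * (y - p)
        for residual_sum in cells.values():
            worst = max(worst, abs(residual_sum) / N)
    return worst
```

A constant predictor of 0.5 on labels [1, 0, 1, 0] is perfectly calibrated on the whole population, yet a subgroup that up-weights the positives exposes an error of 0.25: `multicalibration_error([0.5] * 4, [1, 0, 1, 0], [[1, 1, 1, 1], [1, 0, 1, 0]])`.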
Unified MAIR framework bridges GP-UCB and DEC approaches in kernel bandits
A new arXiv preprint unifies two major theoretical frameworks for frequentist RKHS bandits — Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods — under a common algorithmic-information language called MAIR. The paper generalizes both the GP-UCB analysis and the MAMS algorithm, proposes a safeguarded master algorithm combining their advantages, and demonstrates that algorithmic complexity can be more informative than class-wide minimax certificates in overparameterized models. The work clarifies a foundational distinction between algorithmic information and minimax coefficients in bandit theory.
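For readers unfamiliar with the GP-UCB side of this comparison, the sketch below shows the standard GP-UCB selection rule on a 1-D problem: fit a GP posterior and pick the candidate maximizing mu + sqrt(beta) * sigma (reward maximization). The RBF kernel, fixed lengthscale, noise level, and constant beta are all illustrative assumptions; the MAIR framework, DEC methods, and the safeguarded master algorithm are not represented.

```python
import math

def rbf(a, b, lengthscale=1.0):
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    # Solve L z = b.
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def back_sub_t(L, z):
    # Solve L^T x = z.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_star, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = back_sub_t(L, forward_sub(L, y))
    k_star = [rbf(a, x_star) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def gp_ucb_select(X, y, candidates, beta):
    """Return the candidate maximizing mu + sqrt(beta) * sigma."""
    scores = []
    for c in candidates:
        mu, sd = gp_posterior(X, y, c)
        scores.append(mu + math.sqrt(beta) * sd)
    return candidates[scores.index(max(scores))]
```

With one observation y = 0 at x = 0, both candidates 0 and 3 have posterior mean near zero, but the unexplored point 3 has posterior standard deviation near 1, so the rule picks it: `gp_ucb_select([0.0], [0.0], [0.0, 3.0], beta=4.0)`.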
Normal Guidance: Bell-Curve Regularization for Attention-Based MIL in 3D Medical Imaging
This paper addresses weakly supervised classification of 3D medical images where only volume-level binary labels are available. The authors find that a simple center-focused baseline outperforms attention-based and transformer-based multiple instance learning (MIL) at slice-level classification across brain, thoracic, and abdominal CT datasets. They propose Normal Guidance, a regularization technique that constrains learned attention distributions to follow a bell-shaped curve, achieving superior slice-level localization over state-of-the-art MIL methods across datasets totaling over 4 million 2D slices.
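One way to express "constrain attention to follow a bell-shaped curve" is sketched below, under explicit assumptions: the target is a discretized Gaussian over slice indices centered on the middle slice, its width is a hypothetical fraction of the volume depth, and the penalty is the KL divergence from that target to the model's attention weights. The paper's actual target, centering, and divergence may differ; all names here are hypothetical.

```python
import math

def bell_target(n_slices, width_frac=0.25):
    """Discretized Gaussian over slice indices, centered on the middle slice (hypothetical prior)."""
    center = (n_slices - 1) / 2.0
    s = max(width_frac * n_slices, 1e-6)
    w = [math.exp(-0.5 * ((i - center) / s) ** 2) for i in range(n_slices)]
    total = sum(w)
    return [v / total for v in w]

def bell_regularizer(attention, width_frac=0.25, eps=1e-12):
    """KL(target || attention): zero when attention matches the bell-shaped target."""
    target = bell_target(len(attention), width_frac)
    return sum(t * math.log((t + eps) / (a + eps)) for t, a in zip(target, attention))
```

In training, such a term would typically be added to the bag-level classification loss with a weight lambda (also an assumption), pushing attention mass toward central slices without fixing it to a single slice.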
Calibrated Mixture-of-Experts under distribution shift: adversarial reweighting approach
A new arXiv preprint analyzes how mixture-of-experts (MoE) models maintain calibration under distribution shift, examining the interaction between routing mechanisms and expert-level calibration. The authors prove that expert calibration is sufficient for overall model calibration in hard-routed MoE but insufficient for soft-routed variants. To address the soft-routing gap, they propose an adversarial reweighting method that penalizes calibration errors of the routed aggregate under distribution shift, demonstrating improved accuracy-calibration tradeoffs across model classes and tasks.
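To make the hard- versus soft-routing distinction concrete, the sketch below shows the two aggregation rules for binary-probability experts alongside a standard binned expected calibration error (ECE), which is one common way to measure the calibration the summary refers to. The function names, the equal-width binning, and the use of ECE as the metric are assumptions; the preprint's adversarial reweighting method is not reproduced.

```python
def soft_route(gates, expert_probs):
    """Soft routing: gate-weighted average of the experts' predicted probabilities."""
    return sum(g * p for g, p in zip(gates, expert_probs))

def hard_route(gates, expert_probs):
    """Hard routing: the prediction of the single highest-gated expert."""
    k = max(range(len(gates)), key=lambda i: gates[i])
    return expert_probs[k]

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: sum over bins of (n_b / N) * |mean label - mean prediction|."""
    N = len(probs)
    bins = {}
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)
        cnt, sum_p, sum_y = bins.get(b, (0, 0.0, 0.0))
        bins[b] = (cnt + 1, sum_p + p, sum_y + y)
    # (n_b / N) * |sum_y / n_b - sum_p / n_b| simplifies to |sum_y - sum_p| / N.
    return sum(abs(sum_y - sum_p) / N for _, sum_p, sum_y in bins.values())
```

Under hard routing each input inherits one expert's prediction unchanged, so calibrated experts yield calibrated outputs on the inputs they receive; soft routing produces new mixed probabilities that fall into different bins, which is why expert-level calibration no longer guarantees calibration of the aggregate.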
General Preference Reinforcement Learning (GPRL): Bridging Online RL and Preference Optimization for Open-Ended Tasks
GPRL proposes a new alignment framework that replaces scalar reward models with a General Preference Model (GPM) embedding responses into k skew-symmetric subspaces to capture multi-dimensional, intransitivity-aware preferences. The method computes per-dimension group-relative advantages, normalizes across axes, and uses a closed-loop drift monitor to detect and correct single-axis reward hacking during training. Starting from Llama-3-8B-Instruct, GPRL achieves a 56.51% length-controlled win rate on AlpacaEval 2.0 and outperforms SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench. The work directly addresses the gap between verifiable-reward online RL (strong on math/code) and preference optimization (strong on open-ended tasks).
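The "per-dimension group-relative advantages, normalized across axes" step can be sketched as follows, assuming each sampled response in a prompt's group receives a k-dimensional preference score. Each axis is standardized within the group (subtract the group mean, divide by the group standard deviation), and the axes are then combined with equal weight. The equal-weight combination and the function name are assumptions; GPRL's exact aggregation and its drift monitor are not modeled here.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """rewards[i][d]: score of response i on preference axis d, for one prompt's group."""
    k = len(rewards[0])
    per_axis = []
    for d in range(k):
        col = [r[d] for r in rewards]
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        # Standardize within the group so every axis contributes on the same scale.
        per_axis.append([(v - mu) / (sd + eps) for v in col])
    # Combine axes with equal weight (an assumption; the paper's aggregation may differ).
    return [statistics.fmean(axis[i] for axis in per_axis) for i in range(len(rewards))]
```

Because each axis is standardized separately, an axis with a large raw scale (here 10 vs. 30) cannot dominate one with a small scale (1 vs. 3): `group_relative_advantages([[1, 10], [3, 30]])` gives approximately [-1.0, 1.0].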

