Entity · technique

Uncertainty Quantification

technique · active · uncertainty-quantification-02290263 · 2 events · first seen May 26, 2026

Aliases: Uncertainty Quantification

Co-occurring entities

Reverse Probing · delta energy · AUPRC · clinical text summarization · Expected Calibration Error · Activation Oracles · Qwen3-4B · Federico Torrielli · Qwen3.6-27B · Bootstrap Mode Frequency

More like this (12)

Subjective Risk Decomposition: A New View for Uncertainty Quantification
BeyondUncertainty
Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets
Uncertainty Calibration
Verbal Uncertainty Expression
Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA
Uncertainty-Aware Generation and Decision-Making Under Ambiguity
epistemic uncertainty
Estimating Uncertainty from Reasoning: A Large-Scale Study of Multi- and Crosslingual MCQA Performance in LLMs
Code Is More Than Text: Uncertainty Estimation for Code Generation
Quantium
Uncertainty-aware Multi-Granularity RAG

Recent events (2)

5 · arXiv · cs.AI · May 28, 2026

Reverse Probing: Supervised Token-level Uncertainty Quantification for LLMs in Clinical Text

The paper introduces Reverse Probing, a novel uncertainty quantification framework designed specifically for clinical text summarization that estimates token-level uncertainty from pre-existing labeled summaries rather than sampling new outputs. It extracts uncertainty signals from four categories of internal model activations, treating text as a probe into the model's internal state. Evaluated on two expert-annotated clinical datasets, it outperforms eight adapted baselines on all metrics, achieving up to 4× higher AUPRC while reducing inference time and compute. Feature analysis identifies delta energy and neighborhood context as the most consistent predictors of uncertainty across models.

Evaluation and Benchmarking · AI Safety Research · Reverse Probing · delta energy · AUPRC
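
The summary above reports results in AUPRC for token-level uncertainty. As a rough illustration of what that metric measures, here is a minimal sketch of average precision, a common step-wise estimate of the area under the precision-recall curve. It is not the paper's evaluation code: the function name, the toy labels, and the toy scores are all hypothetical, and ties between scores are broken by input order rather than handled specially.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: a step-wise estimate of the area under the PR curve.

    labels: 1 if the token is annotated as erroneous/uncertain, else 0.
    scores: the uncertainty score assigned to each token (higher = less sure).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank tokens from most to least uncertain.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank where this positive is retrieved.
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: six tokens, two of them annotated as errors.
token_is_error = [0, 1, 0, 0, 1, 0]
token_uncertainty = [0.10, 0.80, 0.30, 0.20, 0.60, 0.05]
ap = average_precision(token_is_error, token_uncertainty)
```

In this toy case both annotated tokens receive the two highest uncertainty scores, so the ranking is perfect. With rare positives, as is typical for token-level errors, AUPRC is more sensitive than AUROC to how well those few positives are ranked.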

5 · arXiv · cs.CL · May 26, 2026

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

This paper investigates uncertainty quantification (UQ) for activation oracles, systems that make LLM internal activations human-legible, by evaluating six confidence estimation methods across 6,000 samples per oracle. The authors find that bootstrap mode frequency achieves the best calibration (ECE of 5.7% vs. 25.5% for the log-probability baseline on Qwen3-8B), while the log-probability baseline remains useful as a cheap triage signal. Experiments vary verbalizer and context prompts across two Qwen3 model sizes. Code and a patched trainer are released publicly.

Evaluation and Benchmarking · AI Safety Research · Expected Calibration Error · Activation Oracles · Qwen3-4B
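
The ECE figures quoted above compare stated confidence against observed accuracy. The sketch below shows the standard equal-width-bin form of Expected Calibration Error, assuming each sample carries a confidence in [0, 1] and a correct/incorrect flag. The paper's binning scheme and bin count are not given in this summary, so `n_bins=10` and the toy inputs are assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; the paper's bin count is not stated here
) -> float:
    """ECE = sum over bins of (bin share) * |bin accuracy - bin mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one sample")

    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Equal-width bins; a confidence of exactly 1.0 falls in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hit_sum[b] += int(ok)
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        mean_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: an estimator that says 0.9 but is right 3 times out of 4.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

Here every sample lands in one bin with mean confidence 0.9 and accuracy 0.75, so the gap, and hence the ECE, is about 0.15. A lower value means stated confidence tracks accuracy more closely, which is the sense in which 5.7% is better calibrated than 25.5%.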