
Quantifying Faithful Confidence Expression in Large Reasoning Models

paper · active · quantifying-faithful-confidence-expression-in-large-reasoning-models-3f5f4d11 · 1 event · first seen Jun 3, 2026


More like this (12)

Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
Future Confidence Distillation in Large Language Models
Estimating Uncertainty from Reasoning: A Large-Scale Study of Multi- and Crosslingual MCQA Performance in LLMs
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Large Reasoning Models
The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans
Reasoning Language Models
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
Long-context Reasoning Benchmarks
Relaxing Faithfulness with Intervention-Only Causal Discovery
DAIS: Dependency-Aware Intermediate QA Supervision for Complex Reasoning

Recent events (1)

6 · arXiv · cs.CL · Jun 3, 2026

Framework for quantifying faithful confidence expression in large reasoning models

A new arXiv preprint introduces a framework to measure faithful calibration (FC) in large reasoning models (LRMs), defined as the alignment between a model's intrinsic confidence and its linguistically expressed confidence. The authors compare linguistic decisiveness against three internal uncertainty sources (token probabilities, hidden states, and sampled response consistency) and introduce prefix-conditioned sampling to handle structural variation in chain-of-thought traces. Applying the framework across leading models, they find FC is a significant and distinct failure mode for LRMs: extended reasoning traces do not automatically improve calibration, prompt interventions that help non-reasoning models fail in the reasoning setting, and different confidence estimators produce divergent assessments of the same traces.
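To make the idea of comparing expressed and intrinsic confidence concrete, here is a minimal Python sketch. It is not the preprint's method or metric: it assumes a decisiveness score in [0, 1] is already available for each response (for example from a human or model judge), uses only the sampled-response-consistency source as the intrinsic estimate, and defines a simple absolute gap for illustration. The names `consistency_confidence`, `faithfulness_gap`, and `mean_faithfulness_gap` are hypothetical.

```python
from collections import Counter


def consistency_confidence(sampled_answers):
    """Intrinsic confidence estimate: share of samples matching the majority answer."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(sampled_answers)


def faithfulness_gap(decisiveness, intrinsic_confidence):
    """Absolute gap between expressed and intrinsic confidence (illustrative only).

    Both inputs are assumed to lie in [0, 1]; 0.0 means the expressed
    confidence matches the intrinsic estimate exactly.
    """
    for value in (decisiveness, intrinsic_confidence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in [0, 1]")
    return abs(decisiveness - intrinsic_confidence)


def mean_faithfulness_gap(records):
    """Average gap over (decisiveness, sampled_answers) pairs."""
    if not records:
        raise ValueError("need at least one record")
    gaps = [
        faithfulness_gap(decisiveness, consistency_confidence(answers))
        for decisiveness, answers in records
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical data: a decisive-sounding response whose samples disagree,
    # and a hedged response whose samples all agree.
    records = [
        (0.9, ["A", "A", "B", "C"]),
        (0.4, ["B", "B", "B", "B"]),
    ]
    print(mean_faithfulness_gap(records))
```

Swapping in a different intrinsic estimator (token probabilities or a hidden-state probe) would only change how the second argument to `faithfulness_gap` is produced, which is one way to see how different estimators could give divergent assessments of the same trace.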

Frontier Model Releases · Evaluation and Benchmarking · Quantifying Faithful Confidence Expression in Large Reasoning Models