
Bootstrap Mode Frequency

technique · active · provisional · bootstrap-mode-frequency-59504269 · 1 event · first seen 22d ago

Aliases: Bootstrap Mode Frequency

Recent events (1)

5 · arXiv · cs.CL · 22d ago

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

This paper investigates uncertainty quantification (UQ) for activation oracles, systems that make an LLM's internal activations human-legible. It evaluates six confidence estimation methods across 6,000 samples per oracle. The authors find that bootstrap mode frequency achieves the best calibration, with an expected calibration error (ECE) of 5.7% versus 25.5% for the log-probability baseline on Qwen3-8B, while the log-probability baseline remains useful as a cheap triage signal. Experiments vary verbalizer and context prompts across two Qwen3 model sizes. Code and a patched trainer are publicly released.
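The summary does not spell out how bootstrap mode frequency is computed, so the following is only a minimal sketch of one plausible reading: draw several answers from the oracle for the same input, resample them with replacement, and report how often the resampled mode agrees with the overall modal answer. The function names, the `n_boot` and `n_bins` parameters, and the equal-width binning used for ECE are assumptions for illustration, not the paper's released code.

```python
import random
from collections import Counter


def mode(items):
    # Most frequent item; ties go to the item seen first (Counter ordering).
    return Counter(items).most_common(1)[0][0]


def bootstrap_mode_frequency(answers, n_boot=1000, seed=0):
    """Return (modal answer, confidence) for a list of sampled answers.

    Assumed definition: confidence is the fraction of bootstrap resamples
    whose mode equals the mode of the original sample.
    """
    rng = random.Random(seed)
    top = mode(answers)
    hits = 0
    for _ in range(n_boot):
        # Resample with replacement, same size as the original sample.
        resample = rng.choices(answers, k=len(answers))
        if mode(resample) == top:
            hits += 1
    return top, hits / n_boot


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard equal-width-bin ECE: weighted mean of |confidence - accuracy|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical oracle outputs for one activation, sampled five times.
    samples = ["negation", "negation", "negation", "sarcasm", "negation"]
    answer, confidence = bootstrap_mode_frequency(samples)
    print(answer, confidence)
```

Under this reading, the confidence rises toward 1 as the sampled answers agree more strongly. Scoring those confidences against correctness labels with `expected_calibration_error` gives a figure comparable in kind to the ECE values quoted above, though the paper's exact binning and sampling setup may differ.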