Min-p sampling

technique · active · provisional · min-p-sampling-d5831621 · 1 events · first seen 21d ago

Aliases: Min-p sampling

Recent events (1)

5 · arXiv · cs.CL · 21d ago

Word Coverage Score (WCS): Measuring Lexical Suppression from LLM Sampling Filters

This paper introduces the Word Coverage Score (WCS), a metric that quantifies how much contextually appropriate, low-frequency vocabulary is pruned by standard LLM sampling strategies (Top-p, Top-k, Min-p). The authors audit open-weight models against human-authored corpora to measure the 'lexical survival rate' of high-information words under typical decoding defaults. They report quantitative evidence that industry-standard sampling parameters act as unintended censorship mechanisms, suppressing linguistic diversity even when the model's probability distribution assigns mass to rare words. The authors propose WCS as a diagnostic tool for tuning the trade-off between coherence and lexical richness.
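To make the pruning concrete, here is a minimal sketch of the three filters named in the summary, applied to a made-up four-token distribution. The function names, the toy probabilities, and the parameter values are all assumptions chosen for illustration. The sketch only shows which tokens each filter keeps; it does not reproduce the paper's WCS computation, which the summary does not define.

```python
from typing import Dict, Set


def top_k_keep(probs: Dict[str, float], k: int) -> Set[str]:
    """Keep the k most probable tokens."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])


def top_p_keep(probs: Dict[str, float], top_p: float) -> Set[str]:
    """Keep the smallest high-probability prefix whose mass reaches top_p."""
    kept: Set[str] = set()
    cumulative = 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.add(token)
        cumulative += probs[token]
        if cumulative >= top_p:
            break
    return kept


def min_p_keep(probs: Dict[str, float], min_p: float) -> Set[str]:
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs.values())
    return {token for token, p in probs.items() if p >= threshold}


# Hypothetical next-token distribution (illustrative values, not from the paper).
toy_probs = {"the": 0.50, "a": 0.30, "luminous": 0.15, "effulgent": 0.05}

# Which tokens survive each filter under assumed, illustrative settings.
survivors = {
    "top_k(k=2)": top_k_keep(toy_probs, k=2),
    "top_p(0.9)": top_p_keep(toy_probs, top_p=0.9),
    "min_p(0.2)": min_p_keep(toy_probs, min_p=0.2),
}

for name, kept in survivors.items():
    pruned = set(toy_probs) - kept
    print(f"{name}: kept={sorted(kept)} pruned={sorted(pruned)}")
```

In this toy case the rarest word is removed by all three filters even though the distribution gives it nonzero probability, which is the kind of suppression the summary describes. Real decoders apply these filters to logits over a full vocabulary and renormalise before sampling; that step is omitted here.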