dataset
Dynaword
dataset · active · provisional
dynaword-2343d64d · 1 event · first seen 11d ago
Aliases: Dynaword
Recent events (1)
PropMe framework distinguishes memorization capability from propensity in LLMs
A new arXiv preprint introduces PropMe, a framework that separates whether LLMs can be forced to reproduce training data (capability) from whether they do so under ordinary use (propensity). The authors also release SimpleTrace, a lightweight pipeline using infini-gram to attribute model outputs to training corpora. Evaluating two open models on Common Pile and Dynaword, they find a consistent gap: adversarial prefix attacks elicit strong memorization, but propensity scores remain low in non-adversarial settings. The paper argues memorization audits should report both worst-case extractability and ordinary leakage propensity.
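To make the capability/propensity distinction concrete, here is a minimal toy sketch. It is not the paper's PropMe metric or the SimpleTrace pipeline: the function names, the n-gram overlap score, the in-memory set index (standing in for an infini-gram lookup), and the toy model are all assumptions chosen for illustration.

```python
from typing import Callable, List, Set, Tuple

NGram = Tuple[str, ...]


def build_index(corpus: List[str], n: int) -> Set[NGram]:
    """Collect every whitespace-token n-gram in the corpus.

    Hypothetical stand-in for a real n-gram search engine.
    """
    index: Set[NGram] = set()
    for doc in corpus:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            index.add(tuple(tokens[i:i + n]))
    return index


def overlap_fraction(text: str, index: Set[NGram], n: int) -> float:
    """Fraction of n-grams in `text` that occur verbatim in the corpus."""
    tokens = text.split()
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    hits = sum(tuple(tokens[i:i + n]) in index for i in range(total))
    return hits / total


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def capability_score(model: Callable[[str], str], corpus: List[str],
                     index: Set[NGram], n: int, prefix_len: int) -> float:
    """Adversarial setting: prompt with a verbatim training prefix."""
    scores = []
    for doc in corpus:
        prefix = " ".join(doc.split()[:prefix_len])
        scores.append(overlap_fraction(model(prefix), index, n))
    return mean(scores)


def propensity_score(model: Callable[[str], str], prompts: List[str],
                     index: Set[NGram], n: int) -> float:
    """Ordinary setting: prompt with everyday, non-adversarial requests."""
    return mean([overlap_fraction(model(p), index, n) for p in prompts])


# Toy data and a toy "model" that regurgitates only when given a training prefix.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "a stitch in time saves nine they say often",
]


def toy_model(prompt: str) -> str:
    for doc in corpus:
        if doc.startswith(prompt):
            return doc[len(prompt):].strip()
    return "here is an original answer to your question"


index = build_index(corpus, n=3)
cap = capability_score(toy_model, corpus, index, n=3, prefix_len=3)
prop = propensity_score(toy_model, ["tell me about foxes", "what is a proverb"],
                        index, n=3)
print(f"capability={cap:.2f} propensity={prop:.2f}")
```

In this toy setup the model reproduces training text whenever it is handed a training prefix but never under ordinary prompts, so the two scores diverge. That is the kind of gap the summary describes, and a reason an audit might report both numbers.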