KDoS

technique · active · kdos-b0db06cf · 1 event · first seen 7d ago

Aliases: KDoS

Recent events (1)

5 · arXiv · cs.CL · 7d ago

KDoS framework proposes distribution-optimized synthetic data for LLM knowledge injection

Researchers introduce KDoS (Knowledge Distribution-optimized Synthesis), a framework that uses a three-stage feedback mechanism guided by "knowledge density" to optimize the distribution of synthetic training data for LLMs. Rather than stopping at a preset token count or a fixed ratio, KDoS dynamically adjusts synthesis to avoid sparse or redundant domain coverage. Experiments on Qwen, Ling, and LLaMA models (0.6B–16B parameters) at data scales of 1B–5B tokens show consistent improvements over baselines on six knowledge benchmarks. A key finding is that an optimal knowledge distribution exists and remains stable across model families and scales.
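
The summary does not define "knowledge density" or describe the three stages, so the following is only a toy sketch of the general idea of feedback-driven stopping, not the KDoS method. It assumes a hypothetical target share per topic and uses each topic's share of generated samples as a stand-in density signal. All names (`coverage_density`, `synthesize_until_balanced`, `target`, `tolerance`) are illustrative assumptions.

```python
from collections import Counter


def coverage_density(counts, topics):
    """Toy proxy (an assumption, not the paper's metric): share of samples per topic."""
    total = sum(counts[t] for t in topics)
    if total == 0:
        return {t: 0.0 for t in topics}
    return {t: counts[t] / total for t in topics}


def synthesize_until_balanced(topics, target, tolerance=0.05, batch=10, max_rounds=100):
    """Add batches to the most under-covered topic until every topic's share
    is within `tolerance` of its target, instead of stopping at a fixed count.

    Returns the per-topic sample counts and the number of batches generated.
    """
    counts = Counter({t: 0 for t in topics})
    for round_idx in range(max_rounds):
        density = coverage_density(counts, topics)
        gaps = {t: target[t] - density[t] for t in topics}
        has_data = sum(counts.values()) > 0
        if has_data and max(abs(g) for g in gaps.values()) <= tolerance:
            return counts, round_idx
        # Feedback step: the topic with the largest positive gap is the sparsest.
        sparsest = max(gaps, key=gaps.get)
        counts[sparsest] += batch  # stand-in for generating one batch of synthetic text
    return counts, max_rounds


if __name__ == "__main__":
    topics = ["law", "medicine", "finance"]
    target = {"law": 0.6, "medicine": 0.3, "finance": 0.1}  # hypothetical target shares
    counts, rounds = synthesize_until_balanced(topics, target)
    print(dict(counts), rounds)
```

The stopping condition depends on the measured distribution rather than on a token budget, which is the contrast the summary draws with preset token counts and fixed ratios.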