technique
Data Mixture Surgery
technique · active · provisional
data-mixture-surgery-b4bc51f9 · 1 event · first seen 19d ago
Aliases: Data Mixture Surgery
More like this (12)
Sparse Mixture-of-Experts
counterfactual data augmentation
mixture-density networks
Mixture of Experts
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Data Is Better Together
Semantic Neighbor Mixing
MDA (Mixture-Density Ambiguity)
Heterogeneous Differential Privacy Federated Learning
Synthetic Data Generator
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Recent events (1)
LLMSurgeon: Post-Hoc Auditing of LLM Pretraining Data Mixtures
LLMSurgeon formalizes Data Mixture Surgery (DMS), a framework for estimating the domain-level distribution of an LLM's pretraining corpus using only generated text from the target model. The method casts DMS as an inverse problem under the label-shift assumption, using a calibrated soft confusion matrix to correct domain confusion and recover the latent mixture prior. The authors also introduce LLMScan, a verifiable evaluation suite built from open-source LLMs with known pretraining mixtures, on which LLMSurgeon demonstrates high-fidelity recovery of domain compositions without access to training data.
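The inverse problem described above can be illustrated with a small sketch. It is not LLMSurgeon's actual implementation: it assumes a hypothetical domain classifier with a known row-stochastic soft confusion matrix C, where C[i, j] is the probability that text from true domain i is labelled as domain j. Under label shift, the observed label distribution q over generated text satisfies q = Cᵀπ, and the mixture prior π is recovered here with a standard EM-style multiplicative update that keeps π on the probability simplex. The function name `estimate_mixture` and all numbers are illustrative assumptions.

```python
import numpy as np

def estimate_mixture(confusion, observed, iters=2000, tol=1e-12):
    """Estimate a domain mixture prior pi from observed classifier outputs.

    confusion[i, j]: assumed P(predicted domain j | true domain i), rows sum to 1.
    observed[j]:     fraction of generated samples labelled as domain j.
    Uses an EM-style fixed-point update (an illustrative choice, not
    necessarily the solver used by LLMSurgeon).
    """
    C = np.asarray(confusion, dtype=float)
    q = np.asarray(observed, dtype=float)
    pi = np.full(C.shape[0], 1.0 / C.shape[0])  # start from a uniform prior
    for _ in range(iters):
        predicted = C.T @ pi                    # label distribution implied by pi
        new_pi = pi * (C @ (q / predicted))     # reweight by observed/implied ratio
        if np.max(np.abs(new_pi - pi)) < tol:
            pi = new_pi
            break
        pi = new_pi
    return pi

# Hypothetical three-domain example (e.g. web, code, academic text).
C = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
])
true_pi = np.array([0.6, 0.3, 0.1])
q = C.T @ true_pi          # what the classifier would report on generated text
estimated_pi = estimate_mixture(C, q)
print(np.round(estimated_pi, 4))
```

In this noise-free setting the raw classifier output q differs visibly from the true mixture, while the corrected estimate should match it closely. With real data, q and C are both estimated from finite samples, so the recovered mixture is only approximate.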