technique
malicious fine-tuning
Status: active
malicious-fine-tuning-101c8d73 · 1 event · first seen 28d ago
Aliases: malicious fine-tuning
Recent events (1)
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
OpenAI introduces a methodology called malicious fine-tuning (MFT) to estimate the worst-case risks of releasing open-weight models, and applies it to its open-weight model gpt-oss. In MFT, the model is fine-tuned specifically to elicit its maximum dangerous capabilities in two domains, biology and cybersecurity. The study is a systematic effort to quantify uplift risk before an open-weight release, and it informs OpenAI's open-weight release policy.