malicious fine-tuning

technique · active · malicious-fine-tuning-101c8d73 · 1 event · first seen 28d ago

Aliases: malicious fine-tuning

Recent events (1)

8 · OpenAI Blog · 28d ago

Estimating Worst-Case Frontier Risks of Open-Weight LLMs

OpenAI introduces malicious fine-tuning (MFT), a methodology for assessing the worst-case risks of releasing open-weight models, and applies it to its open-weight model gpt-oss. The study fine-tunes the model to elicit its maximum dangerous capabilities in two domains, biology and cybersecurity. It is a systematic effort to quantify uplift risk before an open-weight release, and its results inform OpenAI's open-weight release policy.