Entity · technique

malicious fine-tuning

technique · active · malicious-fine-tuning-101c8d73 · 1 event · first seen May 20, 2026

Aliases: malicious fine-tuning

Co-occurring entities

cybersecurity risk uplift · biology risk uplift · GPT-OSS · OpenAI

More like this (12)

fine-tuning · finetuning · supervised fine-tuning · behavioral fine-tuning · reinforcement fine-tuning · Parameter-Efficient Fine-Tuning · instruction tuning · adapter fine-tuning · Prefix Tuning · Prompt Tuning · Chain-of-Thought Fine-Tuning · adversarial refinement

Recent events (1)

8 · OpenAI Blog · May 20, 2026

Estimating Worst-Case Frontier Risks of Open-Weight LLMs

OpenAI introduces a methodology called malicious fine-tuning (MFT) to assess the worst-case risks of releasing open-weight models, and applies it to its gpt-oss model. The study tries to elicit the maximum dangerous capability in two domains, biology and cybersecurity, through targeted fine-tuning. It is a systematic effort to quantify risk uplift before an open-weight release and to inform OpenAI's open-weight release policy.

Evaluation and Benchmarking · Open Weights · Progress · cybersecurity risk uplift · biology risk uplift · malicious fine-tuning