Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

paper · active · provisional · paved-with-true-intents-intent-aware-training-improves-llm-safety-classification-across-training-regimes-b741b7e7 · 1 event · first seen 2d ago

Recent events (1)

5 · arXiv · cs.CL · 2d ago

AIMS dataset and intent-aware training improve LLM safety classification across multiple regimes

Researchers introduce AIMS, a human-annotated dataset of 1,724 difficult safety prompts, each paired with an intent description and a harm label, built to study intent-aware training for LLM safety classifiers. The paper evaluates intent-aware training under four regimes: SFT, DPO, reasoning distillation, and GRPO reinforcement learning. Directly rewarding intent faithfulness via GRPO yields the strongest average performance across five external safety benchmarks. Intent-conditioned distillation also outperforms reasoning-only distillation in most teacher-student pairs, and intent-aware models make up the Pareto frontier of inference latency versus F1. The work argues that explicit modeling of user intent is a compact, high-quality supervision signal for more robust safety classification.
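
For readers unfamiliar with the latency-F1 Pareto frontier mentioned above, here is a minimal sketch of how such a frontier can be computed. It is not the paper's code: the `Point` type, the `pareto_frontier` helper, the model names, and all numbers are hypothetical placeholders chosen only to illustrate the idea that a model is on the frontier when no other model is both at least as fast and at least as accurate.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    f1: float          # higher is better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both latency and F1."""
    frontier = []
    best_f1 = float("-inf")
    # Sweep from fastest to slowest; on equal latency, visit higher F1 first.
    for p in sorted(points, key=lambda p: (p.latency_ms, -p.f1)):
        # A slower model stays only if it improves on the best F1 seen so far.
        if p.f1 > best_f1:
            frontier.append(p)
            best_f1 = p.f1
    return frontier


# Hypothetical placeholder values, not results from the paper.
models = [
    Point("model_a", 10.0, 0.70),
    Point("model_b", 20.0, 0.80),
    Point("model_c", 25.0, 0.75),
    Point("model_d", 40.0, 0.85),
    Point("model_e", 40.0, 0.82),
]

frontier = pareto_frontier(models)
print([p.name for p in frontier])
```

In this toy data, `model_c` is slower and less accurate than `model_b`, and `model_e` matches `model_d` on latency with lower F1, so neither is on the frontier. Exact duplicates of a frontier point are also dropped by this sketch.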