
AIMS

dataset · active · provisional · aims-8454362c · 1 event · first seen 2d ago

Aliases: AIMS


Recent events (1)

arXiv · cs.CL · 2d ago

AIMS dataset and intent-aware training improve LLM safety classification across multiple regimes

Researchers introduce AIMS, a human-annotated dataset of 1,724 difficult safety prompts, each paired with an intent description and a harm label, designed to study intent-aware training for LLM safety classifiers. The paper evaluates intent-aware training under four regimes: supervised fine-tuning (SFT), Direct Preference Optimization (DPO), reasoning distillation, and reinforcement learning with Group Relative Policy Optimization (GRPO). Directly rewarding intent faithfulness via GRPO yields the strongest average performance across five external safety benchmarks. Intent-conditioned distillation also outperforms reasoning-only distillation in most teacher-student pairs, and intent-aware models lie on the Pareto frontier of inference latency versus F1. The work argues that explicit modeling of user intent is a compact, high-quality supervision signal for more robust safety classification.
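The latency-F1 Pareto frontier claim can be made concrete. A model is on the frontier if no other model is at least as fast and at least as accurate while being strictly better on one of the two axes. The sketch below is a minimal, generic illustration of that definition, not the paper's evaluation code. The model names and numbers are hypothetical toy values, not results reported for AIMS.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    latency_ms: float  # inference latency; lower is better
    f1: float          # classification F1; higher is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.f1 >= b.f1
    strictly_better = a.latency_ms < b.latency_ms or a.f1 > b.f1
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the non-dominated models, sorted from fastest to slowest."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.latency_ms)


# Hypothetical classifiers with toy values, for illustration only.
candidates = [
    Model("classifier-a", 40.0, 0.78),
    Model("classifier-b", 55.0, 0.75),   # slower and less accurate than a
    Model("classifier-c", 90.0, 0.84),
    Model("classifier-d", 120.0, 0.83),  # slower and less accurate than c
]

for m in pareto_frontier(candidates):
    print(m.name, m.latency_ms, m.f1)
```

With these toy values, only `classifier-a` and `classifier-c` survive. Every other model is beaten on both latency and F1 by one of them. The all-pairs check is quadratic, which is fine for the handful of models a comparison like this usually involves.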