Inverse Constitutional AI

technique · active · provisional · inverse-constitutional-ai-5aa01b0f · 1 event · first seen 45h ago

Aliases: Inverse Constitutional AI

Recent events (1)

5 · arXiv · cs.LG · 45h ago

Democratic ICAI uses structured persona debate to derive richer alignment steering principles

Researchers introduce Democratic ICAI, an extension of Inverse Constitutional AI (ICAI) that gathers multiple competing rationales through structured persona debate rather than single-pass explanations. The method derives natural-language steering principles from these richer signals and applies them via LLM-based and decision-tree judges. Experiments on the creative preference benchmarks MuCE-Pref and LiTBench show improved preference prediction over deliberative prompting and principle-based baselines, and LLM annotators prefer the resulting constitutions. The work addresses a core limitation of pairwise preference labels: they reveal final choices but not the underlying reasoning.
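
To make the "apply principles via a judge" step concrete, here is a minimal sketch of principle-based pairwise judging. It is not the paper's implementation. Each principle is a hypothetical keyword or length heuristic standing in for an LLM call, and the judge is a plain majority vote swapped in for the LLM-based and decision-tree judges mentioned above. All names (prefers_longer, prefers_imagery, judge, PRINCIPLES, IMAGERY_WORDS) are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional

# A principle votes for response "A", "B", or abstains (None).
# Assumption: in the described method an LLM would cast this vote;
# simple heuristics stand in here so the sketch runs offline.
Vote = Optional[str]
Principle = Callable[[str, str], Vote]

# Hypothetical word list used by the imagery principle below.
IMAGERY_WORDS = ("glow", "whisper", "crimson", "salt")


def prefers_longer(a: str, b: str) -> Vote:
    """Hypothetical principle: prefer the more developed (longer) response."""
    if len(a) == len(b):
        return None
    return "A" if len(a) > len(b) else "B"


def prefers_imagery(a: str, b: str) -> Vote:
    """Hypothetical principle: prefer the response with more sensory words."""
    score_a = sum(word in a.lower() for word in IMAGERY_WORDS)
    score_b = sum(word in b.lower() for word in IMAGERY_WORDS)
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


def judge(a: str, b: str, principles: List[Principle]) -> Vote:
    """Majority vote over principles; returns None when the vote is tied."""
    votes: Counter = Counter()
    for principle in principles:
        vote = principle(a, b)
        if vote is not None:
            votes[vote] += 1
    if votes["A"] == votes["B"]:
        return None
    return "A" if votes["A"] > votes["B"] else "B"


# The "constitution" in this sketch is just an ordered list of principles.
PRINCIPLES: List[Principle] = [prefers_longer, prefers_imagery]
```

Calling judge(response_a, response_b, PRINCIPLES) returns the predicted preferred response, or None when the principles split evenly or all abstain. A learned judge could replace the majority vote by treating the per-principle votes as input features.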