Democratic ICAI: Debating Our Way to Steering Principles from Preferences

arXiv · cs.LG · 44h ago

Democratic ICAI uses structured persona debate to derive richer alignment steering principles

Researchers introduce Democratic ICAI, an extension of Inverse Constitutional AI (ICAI) that gathers multiple competing rationales through structured persona debate rather than single-pass explanations. The method derives natural-language steering principles from these richer signals and applies them via LLM-based and decision-tree judges. On the creative preference benchmarks MuCE-Pref and LiTBench, it improves preference prediction over deliberative prompting and principle-based baselines, and LLM annotators prefer the resulting constitutions. The work addresses a core limitation of pairwise preference labels: they reveal final choices but not the underlying reasoning.
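
To make the idea of applying natural-language principles through a judge more concrete, here is a minimal sketch. It is not the paper's implementation: the `Principle` class, the `judge` and `accuracy` helpers, the keyword and length heuristics standing in for an LLM judge, and the toy pairs are all hypothetical, chosen only to show how a set of principles might be turned into a pairwise preference prediction and scored against labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Principle:
    text: str                      # natural-language steering principle
    score: Callable[[str], float]  # hypothetical stand-in for an LLM judge's rating


def judge(principles: List[Principle], a: str, b: str) -> str:
    """Predict the preferred response by majority vote over principles."""
    votes_a = sum(1 for p in principles if p.score(a) > p.score(b))
    votes_b = sum(1 for p in principles if p.score(b) > p.score(a))
    # Ties default to "A"; a real judge would need a considered tie-break.
    return "A" if votes_a >= votes_b else "B"


def accuracy(principles: List[Principle], pairs: List[Tuple[str, str, str]]) -> float:
    """Fraction of labeled pairs on which the judge matches the label."""
    correct = sum(judge(principles, a, b) == label for a, b, label in pairs)
    return correct / len(pairs)


# Hypothetical principles with crude heuristic scorers (assumptions, not from the paper).
PRINCIPLES = [
    Principle(
        "Prefer vivid sensory detail.",
        lambda t: sum(w in t.lower() for w in ("glow", "scent", "whisper")),
    ),
    Principle(
        "Prefer concise responses.",
        lambda t: -len(t.split()),
    ),
]

# Toy labeled preference pairs: (response A, response B, preferred label).
PAIRS = [
    (
        "The lamp's glow met the scent of rain.",
        "It was a night and there was a lamp and also some rain outside.",
        "A",
    ),
    (
        "Then some things happened for a while in the room.",
        "A whisper, then silence.",
        "B",
    ),
]

if __name__ == "__main__":
    print(judge(PRINCIPLES, PAIRS[0][0], PAIRS[0][1]))
    print(accuracy(PRINCIPLES, PAIRS))
```

In this sketch each principle casts one vote per pair; swapping the heuristic `score` functions for calls to an LLM, or feeding per-principle votes into a decision tree, would be closer in spirit to the judges the summary mentions, though the details of either are not given here.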