Democratic ICAI: Debating Our Way to Steering Principles from Preferences
Democratic ICAI uses structured persona debate to derive richer alignment steering principles
Researchers introduce Democratic ICAI, an extension of Inverse Constitutional AI (ICAI) that gathers multiple competing rationales through structured persona debate instead of relying on single-pass explanations. The method derives natural-language steering principles from these richer signals and applies them through LLM-based and decision-tree judges. In experiments on the creative preference benchmarks MuCE-Pref and LiTBench, it predicts preferences better than deliberative prompting and principle-based baselines, and LLM annotators prefer the resulting constitutions. The work addresses a core limitation of pairwise preference labels: they reveal the final choice but not the reasoning behind it.
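To make the idea of a decision-tree judge over steering principles concrete, here is a minimal toy sketch. It is not the authors' implementation: the principle names, the vote encoding (+1 favors response A, -1 favors response B, 0 abstains), the data, and the `Stump` and `fit_stump` helpers are all hypothetical, and the "tree" is only a depth-1 stump. In a real pipeline an LLM would presumably produce the per-principle votes; here they are hard-coded.

```python
from dataclasses import dataclass

# Hypothetical principles; the summary does not list the actual ones.
PRINCIPLES = ["vivid imagery", "consistent tone", "surprising ending"]

# Hypothetical votes, one row per response pair, one column per principle:
# +1 = principle favors response A, -1 = favors response B, 0 = no opinion.
VOTES = [
    [+1, +1, -1],
    [+1, -1, -1],
    [-1, +1, +1],
    [-1, -1, +1],
    [+1, 0, +1],
    [-1, 0, -1],
]
# Hypothetical human preference label for each pair.
LABELS = ["A", "A", "B", "B", "A", "B"]


@dataclass
class Stump:
    """Depth-1 decision tree: follow the vote of a single principle."""
    principle_index: int
    accuracy: float

    def predict(self, votes):
        # Positive vote -> prefer A; zero or negative -> prefer B.
        return "A" if votes[self.principle_index] > 0 else "B"


def fit_stump(votes, labels):
    """Pick the principle whose vote best matches the preference labels."""
    best = None
    for j in range(len(votes[0])):
        correct = sum(
            ("A" if row[j] > 0 else "B") == label
            for row, label in zip(votes, labels)
        )
        accuracy = correct / len(labels)
        if best is None or accuracy > best.accuracy:
            best = Stump(principle_index=j, accuracy=accuracy)
    return best


judge = fit_stump(VOTES, LABELS)
print(PRINCIPLES[judge.principle_index], judge.accuracy)
```

On this toy data the stump selects the first principle, since its votes match every label. A deeper tree would combine several principles, which is closer in spirit to a judge that weighs a whole constitution.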