
Democratic ICAI: Debating Our Way to Steering Principles from Preferences

paper · active · provisional · democratic-icai-debating-our-way-to-steering-principles-from-preferences-e6c40e74 · 1 event · first seen 44h ago

Aliases: Democratic ICAI: Debating Our Way to Steering Principles from Preferences

Co-occurring entities

MuCE-Pref; Inverse Constitutional AI; LiTBench

More like this (12)

Democratic Inputs to AI; AI Persuasive Framing in Collective Dilemmas; deliberative alignment; Towards Value-Constrained Credit Assignment in Fully Delegated AI Cooperatives; Democratic Governance of Frontier AI: A Blueprint For A Federal Framework; AI Safety via Debate; Agentic Chain-of-Thought Steering; Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning; FAIR Principles; Debate (AI safety technique); Conformal Decision Theory; REAR: Test-time Preference Realignment through Reward Decomposition

Recent events (1)

5 · arXiv · cs.LG · 44h ago

Democratic ICAI uses structured persona debate to derive richer alignment steering principles

Researchers introduce Democratic ICAI, an extension of Inverse Constitutional AI (ICAI) that gathers multiple competing rationales through structured persona debate rather than single-pass explanations. The method derives natural-language steering principles from these richer signals and applies them via LLM-based and decision-tree judges. Experiments on the creative preference benchmarks MuCE-Pref and LiTBench show improved preference prediction over deliberative prompting and principle-based baselines, and LLM annotators prefer the resulting constitutions. The work addresses a core limitation of pairwise preference labels: they reveal final choices but not the underlying reasoning.
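To make the idea of applying steering principles to preference pairs concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary above does not specify how the LLM-based or decision-tree judges work, so this example substitutes a simple accuracy filter followed by a majority vote. The principle names, the votes, the labels, and the `min_acc` threshold are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List

# Hypothetical toy data (not from the paper).
# votes[p][i] is principle p's vote on preference pair i:
#   +1 = prefers response A, -1 = prefers response B, 0 = abstains.
votes: Dict[str, List[int]] = {
    "prefer vivid imagery": [1, 1, -1, 1],
    "prefer shorter text": [-1, 1, 1, -1],
    "prefer consistent tone": [1, 0, -1, 0],
}
# Human preference labels for the same four pairs (+1 = A, -1 = B).
labels: List[int] = [1, 1, -1, -1]


def principle_accuracy(v: List[int], y: List[int]) -> float:
    """Fraction of non-abstaining votes that agree with the labels."""
    decided = [(a, b) for a, b in zip(v, y) if a != 0]
    if not decided:
        return 0.0
    return sum(a == b for a, b in decided) / len(decided)


def select_principles(
    votes: Dict[str, List[int]], labels: List[int], min_acc: float = 0.7
) -> List[str]:
    """Keep principles whose votes agree with the labels often enough."""
    return [p for p, v in votes.items() if principle_accuracy(v, labels) >= min_acc]


def judge(votes: Dict[str, List[int]], kept: List[str], i: int) -> int:
    """Majority vote of the kept principles on pair i (0 means a tie)."""
    total = sum(votes[p][i] for p in kept)
    return 1 if total > 0 else -1 if total < 0 else 0


kept = select_principles(votes, labels)
predictions = [judge(votes, kept, i) for i in range(len(labels))]
```

In this toy setup the filter drops the principle that agrees with the labels only half the time, and the remaining principles vote on each pair. A decision-tree judge of the kind the summary mentions would presumably learn how to combine such votes rather than weighting them equally, but that detail is an assumption here.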

Evaluation and Benchmarking; Alignment and RLHF; Democratic ICAI: Debating Our Way to Steering Principles from Preferences; MuCE-Pref; Inverse Constitutional AI