Entity · other

multi-turn language models

other · active · multi-turn-language-models-95ed38d9 · 1 event · first seen May 29, 2026

Aliases: multi-turn language models

Co-occurring entities

on-policy distillation · self-anchored drift · Canonical-Context On-Policy Distillation (CCOPD)

More like this (12)

Transformer Language Models · Understanding Large Language Models · Tapered Language Models · Random Language Model · Reinforcement Learning for Language Models · encoder-only language models · Reasoning Language Models · unsupervised language modeling · Multimodal Large Language Models · large language models · generative language modeling · large language model agents

Recent events (1)

arXiv · cs.CL · May 29, 2026

Canonical-Context On-Policy Distillation (CCOPD) for Multi-Turn LLM Consistency

This paper identifies "self-anchored drift" as a key failure mode in multi-turn LLMs: when information is revealed incrementally across turns, models make unsupported assumptions that distort their final answers, even when the total evidence is identical to that of a single-prompt setting. The authors propose Canonical-Context On-Policy Distillation (CCOPD), which trains a student model on incremental multi-turn conversations to match the output distribution of a frozen teacher conditioned on the full, clean prompt. Trained only on math conversations, CCOPD achieves a 32% average relative improvement on multi-turn (RAW-SHARDED) tasks, generalizes zero-shot to five out-of-domain task families, and preserves single-prompt performance.
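A minimal sketch of the kind of objective described above, under stated assumptions: the student generates a response from the incremental (sharded) conversation, the frozen teacher scores those same response tokens while conditioned on the full canonical prompt, and the student is penalized by a per-token divergence between the two distributions. The choice of per-token reverse KL averaged over response positions is an assumption (the summary does not say which divergence CCOPD uses), toy logit lists stand in for real model outputs, and the names `log_softmax`, `reverse_kl`, and `ccopd_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single response position."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def ccopd_loss(
    student_logits_seq: Sequence[Sequence[float]],
    teacher_logits_seq: Sequence[Sequence[float]],
) -> float:
    """Hypothetical CCOPD-style loss (assumption: reverse KL, mean over positions).

    student_logits_seq[t]: student logits at response position t, computed with the
        incremental multi-turn conversation as context (the response itself is
        sampled from the student, i.e. on-policy).
    teacher_logits_seq[t]: frozen-teacher logits for the same position and the same
        response prefix, but with the full canonical prompt as context.
    """
    if len(student_logits_seq) != len(teacher_logits_seq) or not student_logits_seq:
        raise ValueError("need equal-length, non-empty logit sequences")
    total = sum(reverse_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq))
    return total / len(student_logits_seq)


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2 response positions.
    student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # drifted: sharded context
    teacher = [[0.1, 0.5, 2.0], [0.2, 1.5, 0.3]]  # canonical-context targets
    print(ccopd_loss(student, teacher))
```

The loss is zero when the student, seeing only the incremental conversation, already predicts what the teacher would predict given the whole problem at once. Any self-anchored assumption that shifts the student's distribution shows up as a positive divergence at that position.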

Evaluation and Benchmarking · Agent and Tool Ecosystem · on-policy distillation · multi-turn language models · self-anchored drift