4 · arXiv cs.CL (Computation and Language) · 7d ago

CURIOBOT framework uses LLM tutoring dialogues to operationalize curiosity-driven learning interventions

Researchers introduce CURIOBOT, a conversational tutoring framework that implements Berlyne's collative variables (novelty, complexity, conflict, uncertainty) as adaptive linguistic interventions via LLMs. Across 270 tutoring conversations, curiosity-oriented prompting strategies produced up to 2.4x more exploratory conversational turns under fixed time budgets. The study also introduces a learner-centered evaluation framework measuring exploratory questioning, conversational agency, and productive struggle. Results suggest curiosity functions as a partially independent interaction-level mechanism, and that LLM-mediated dialogue can serve as a scalable experimental platform for studying language's effect on cognition.

Agent and Tool Ecosystem CURIOBOT

Related guides (1)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

4 · arXiv · cs.CL · 11d ago

Adaptive LLM tutoring system with subject-aware prompt routing improves high-school student engagement

Researchers develop and evaluate an LLM-based tutoring system that uses a learned prompt routing model to dynamically select pedagogical strategies based on 14 features extracted from conversation transcripts. The system was trained in simulation and deployed in an A/B test with 359 high-school students (656 conversations), showing sim-to-real transfer and reducing required interactions by ~3 turns. A stochastic routing strategy achieved a notably higher exercise conversion rate (28.1%) compared to a greedy router (19.1%) and static baseline (19.6%).
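
The contrast between a greedy and a stochastic router can be illustrated with a minimal Python sketch. The summary does not say how the stochastic router samples, so the softmax sampling, the `temperature` parameter, the strategy names, and the scores below are all assumptions made for illustration, not the authors' method.

```python
import math
import random

def greedy_route(scores):
    """Always pick the single highest-scoring strategy."""
    return max(scores, key=scores.get)

def stochastic_route(scores, temperature=1.0, rng=random):
    """Sample a strategy with probability proportional to
    softmax(score / temperature). Assumed sampling rule."""
    names = list(scores)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[n] - top) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical strategy names and router scores.
scores = {"hint": 1.2, "worked_example": 1.0, "socratic_question": 0.4}

rng = random.Random(0)  # seeded so repeated runs give the same picks
picks = [stochastic_route(scores, rng=rng) for _ in range(1000)]

print("greedy:", greedy_route(scores))
print("stochastic counts:", {name: picks.count(name) for name in scores})
```

Under these assumptions the greedy router returns the same strategy for a given set of scores, while the stochastic router still tries lower-scoring strategies some of the time. That extra exploration is one plausible reading of why the two routers could behave differently in deployment.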

Enterprise Deployment Patterns Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

5 · OpenAI Blog · 1mo ago

Large-scale Study of Curiosity-Driven Learning

OpenAI published research on curiosity-driven learning, exploring intrinsic motivation as a reward signal for reinforcement learning agents at scale. The study investigates how curiosity-based exploration can enable agents to learn useful behaviors without extrinsic rewards. This represents an early foundational contribution to reward-free and self-supervised RL research.

AI Safety Research Alignment and RLHF Reinforcement Learning OpenAI Curiosity-Driven Learning

5 · arXiv · cs.AI · 42h ago

LLawCo framework teaches embodied multi-agent LLMs to derive and follow cooperation laws

Researchers from MERL propose LLawCo (Learning Laws of Cooperation), a framework that enables embodied LLM-based agents to autonomously align with partners and task objectives in decentralized, partially observable environments. Agents reflect on past failures to extract misaligned behavioral patterns and derive high-level behavioral laws (e.g., 'Talk when necessary', 'Wait for partner'), which are incorporated into reasoning via supervised fine-tuning. The authors also introduce PARTNR-Dialog, a new large-scale multi-agent communicative planning benchmark, and report average success rate improvements of 4.5% on PARTNR-Dialog and 6.8% on TDW-MAT over state-of-the-art open-source communicative agent frameworks across four backbone LLMs.

Evaluation and Benchmarking Agent and Tool Ecosystem LLawCo MERL PARTNR +2 more

4 · arXiv · cs.CL · 15d ago

LoSoNA benchmark evaluates LLM adaptation to implicit local social norms in group chats

Researchers introduce LoSoNA, a benchmark for testing whether LLM-based agents can infer and adapt to unstated local conversational norms in multi-party chat scenarios. Each scenario presents a group-chat transcript where non-subject participants implicitly demonstrate a hidden norm, followed by an elicitor turn. Eight frontier and open-weight models are evaluated under four prompting conditions; naive prompting performs poorly for most models, while explicit norm-aware prompting yields uneven gains—Gemini 3.1 Pro reaches 84.2% and Claude Fable 5 reaches 81.6%. The work contributes to growing interest in evaluating LLM social and pragmatic capabilities beyond factual or reasoning tasks.

Evaluation and Benchmarking Agent and Tool Ecosystem Gemini 3.1 Pro Claude Fable 5 LoSoNA

6 · arXiv · cs.CL · 28d ago

AgentCL: A Rigorous Evaluation Framework for Continual Learning in Language Agents

AgentCL is a new benchmark and evaluation framework designed to rigorously assess continual learning in language agents, addressing gaps in existing benchmarks that focus on retrieval over long-context documents or use naive task streams with limited cross-task analysis. The framework constructs compositional task streams where earlier sub-solutions, evidence, or workflows are intentionally reusable in later tasks, contrasting them with naive streams to measure transfer gains. The authors also introduce MemProbe, a probing method that stores interactions, insights, and skills while filtering unreliable experiences during consolidation. Empirical results across coding, deep research, and language understanding tasks show that controlled streams better distinguish memory design quality, and that naive streams can mask memory-induced degradation.

Long Context Evolution Evaluation and Benchmarking AgentCL MemProbe Continual Learning +3 more

5 · arXiv · cs.CL · 6d ago

Cross-lingual prompting strategies unlock hidden parametric knowledge in LLMs

A new arXiv preprint investigates how cross-lingual prompting can surface factual knowledge that standard inference techniques fail to retrieve in multilingual LLMs. The authors identify four dimensions of cross-lingual exploration governing parametric knowledge retrieval and evaluate them on multilingual factual benchmarks across 17 typologically diverse languages. Results show cross-lingual exploration improves both factual recall and cross-lingual consistency, and is claimed to be a more compute-efficient approach than scaling native-language inference.

Evaluation and Benchmarking Cross-Lingual Exploration for Parametric Knowledge

5 · Hugging Face Blog · 1mo ago

Consilium: When Multiple LLMs Collaborate

Hugging Face introduces Consilium, a framework for multi-LLM collaboration where multiple language models work together on tasks rather than relying on a single model. The approach explores how ensembling or deliberation among diverse LLMs can improve output quality and robustness. This fits into the broader agent-tool ecosystem trend of orchestrating multiple AI models for better results.

Frontier Model Releases Agent and Tool Ecosystem Hugging Face Consilium

4 · arXiv · cs.CL · 4d ago

Conceptual framework for analyzing dialogue dynamics in human-AI and multi-agent collaborative problem-solving

A new arXiv preprint proposes a hierarchical two-layer coding scheme for analyzing dialogue in collaborative problem-solving, integrating cognitive and metacognitive dimensions. The framework is validated across nine datasets spanning multiple domains and is positioned to apply to both human-AI and multi-agent collaboration contexts. A key finding is that metacognitive regulation is a strong discriminator of deeper collaboration quality.

Evaluation and Benchmarking Agent and Tool Ecosystem Bridging Talk and Thought: Understanding Dialogue Dynamics Across Collaborative Problem-Solving Contexts