Entity · technique

ContextRL

technique · active · contextrl-f5e56683 · 1 event · first seen Jun 16, 2026

Aliases: ContextRL

Co-occurring entities

GRPO

More like this (12)

PrefixRL · ReuseRL · CheckRLM · ExpRL · MedRLM · PipelineRL · SafeRL-Lab · AgenticRL · LongTraceRL · MemRL · RL² · PivotRL

Recent events (1)

5 · arXiv · cs.CL · Jun 16, 2026

ContextRL: Context-aware reinforcement learning improves grounding in agentic and multimodal LLMs

Researchers introduce ContextRL, a reinforcement learning method that trains LLMs to pick, from two highly similar candidate contexts, the one that supports a given query-answer pair, rather than supervising only final answers. The approach constructs contrastive context pairs in two domains: coding-agent trajectories (1k pairs) and multimodal image pairs (7k pairs). ContextRL outperforms standard GRPO by +2.2% on average across 5 long-horizon benchmarks and by +1.8% across 12 visual QA benchmarks; ablations indicate that the gains come from the context-selection objective rather than from the contrastive data alone.

Agent and Tool Ecosystem · Alignment and RLHF · GRPO · ContextRL
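The context-selection objective described in the summary can be illustrated with a minimal Python sketch. It assumes the task is posed as a two-way text choice with the candidate order shuffled, scored with a binary verifiable reward, and optimized with GRPO-style group-normalized advantages. The prompt format and every name here (`ContrastivePair`, `build_selection_prompt`, `selection_reward`, `grpo_advantages`) are hypothetical and not taken from the ContextRL paper; multimodal pairs would replace the text contexts with images.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class ContrastivePair:
    """Hypothetical container for one contrastive example."""
    query: str
    answer: str
    supporting: str  # context that supports the query-answer pair
    distractor: str  # highly similar context that does not


def build_selection_prompt(pair: ContrastivePair, rng: random.Random) -> tuple[str, str]:
    """Build a two-way selection prompt (hypothetical format).

    Candidates are shuffled so that position carries no signal.
    Returns the prompt and the label ("A" or "B") of the supporting context.
    """
    candidates = [(pair.supporting, True), (pair.distractor, False)]
    rng.shuffle(candidates)
    lines = [
        f"Query: {pair.query}",
        f"Answer: {pair.answer}",
        "Which context supports this query-answer pair?",
    ]
    gold = ""
    for label, (text, is_support) in zip(["A", "B"], candidates):
        lines.append(f"[{label}] {text}")
        if is_support:
            gold = label
    lines.append("Reply with A or B.")
    return "\n".join(lines), gold


def selection_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the first non-space character names the supporting context."""
    choice = completion.strip().upper()[:1]
    return 1.0 if choice == gold else 0.0


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: (r - group mean) / (group std + eps).

    Uses the population standard deviation; a group with identical rewards gets all-zero advantages.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    pair = ContrastivePair(
        query="Which function raises the error?",
        answer="parse_config",
        supporting="Traceback ... in parse_config: KeyError",
        distractor="Traceback ... in load_config: KeyError",
    )
    prompt, gold = build_selection_prompt(pair, rng)
    # Stand-in for a group of completions sampled from the policy for this prompt.
    completions = ["A", "B", "A", "B"]
    rewards = [selection_reward(c, gold) for c in completions]
    print(prompt)
    print("gold:", gold, "rewards:", rewards, "advantages:", grpo_advantages(rewards))
```

In this sketch, completions that pick the supporting context receive positive advantages and those that pick the near-duplicate distractor receive negative ones. The learning signal therefore comes from the selection task itself, not just from exposure to the paired data, which is the distinction the summary's ablation draws.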