person

deokhk

person · active · provisional · deokhk-0fc0492c · 1 event · first seen 15d ago

Aliases: deokhk

Co-occurring entities

GRPO · Luar · Reasoning Language Models

More like this (12)

AGDO · DPPO · iKraph · DOCCI · Opik · best@k · khoj-ai · Doubao-2.0-lite · cheahjs · DAgger · adk-samples · d-OPSD

Recent events (1)

6 · arXiv · cs.CL · 15d ago

Luar: Selective Translation via Reinforcement Learning for Multilingual Reasoning

Luar is a reinforcement learning framework that trains reasoning language models to selectively invoke English translation only when direct understanding of a non-English input is deemed unreliable. The approach, built on top of GRPO, outperforms standard multilingual baselines across reasoning benchmarks, with especially large gains on low-resource languages. Analysis confirms the model learns to avoid unnecessary translation when direct reasoning suffices, and generalizes the translation-call behavior to unseen low-resource languages.
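As a rough illustration of how a GRPO-style objective could favor translating only when it helps, the sketch below computes group-relative advantages (each rollout's reward normalized by the mean and standard deviation of its sampling group) over a toy reward. The reward shaping is an assumption made for illustration: the summary above does not state Luar's actual reward, and `toy_reward`, `translation_cost`, and the rollout tuples are hypothetical names and values, not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation is used; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(correct, used_translation, translation_cost=0.1):
    """Hypothetical reward: 1 for a correct answer, minus a small cost
    whenever the rollout invoked English translation."""
    reward = 1.0 if correct else 0.0
    if used_translation:
        reward -= translation_cost
    return reward


# One group of rollouts for the same prompt: (answer correct?, translation called?)
rollouts = [(True, False), (True, True), (False, False), (False, True)]
rewards = [toy_reward(c, t) for c, t in rollouts]
advantages = group_relative_advantages(rewards)

for (correct, translated), adv in zip(rollouts, advantages):
    print(f"correct={correct!s:5} translated={translated!s:5} advantage={adv:+.3f}")
```

Under this toy shaping, a correct answer without translation receives the highest advantage, and translation is only worth its cost when it changes an incorrect answer into a correct one.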

Frontier Model Releases · Evaluation and Benchmarking · GRPO · Luar · Reasoning Language Models · +3 more