3 · arXiv cs.CL (Computation and Language) · 19h ago

HULAT2 multi-agent LangGraph system for Spanish Easy-to-Read text simplification at MER-TRANS 2026

Researchers from HULAT2-UC3M describe their submission to the MER-TRANS 2026 shared task on multilingual Easy-to-Read translation, which uses a LangGraph-based multi-agent workflow combining Gemini 2.5 Flash and RigoChat-7B-v2. The best run (RUN1) achieved a SARI score of 44.05 using Event-Condition-Action routing and internal quality signals, outperforming a LoRA-adapted generate-evaluate-regenerate baseline. The results indicate that signal-guided multi-agent routing outperforms linear regeneration, while adding lexical support does not automatically improve reference-based scores.
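
To make the Event-Condition-Action idea concrete, here is a minimal plain-Python sketch of signal-guided routing. It is an illustration only, not the authors' implementation: the event name, the signal names (`meaning`, `readability`), the thresholds, and the action labels are all hypothetical, and no LangGraph API is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical internal quality signals, each assumed to lie in [0, 1].
Signals = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    event: str                            # the event that triggers the rule
    condition: Callable[[Signals], bool]  # guard evaluated on the signals
    action: str                           # name of the next step to run


# Assumed rule set; rules are checked in order and the first match wins.
RULES: List[Rule] = [
    Rule("draft_scored", lambda s: s["meaning"] < 0.7, "regenerate"),
    Rule("draft_scored", lambda s: s["readability"] < 0.6, "add_lexical_support"),
    Rule("draft_scored", lambda s: True, "accept"),
]


def route(event: str, signals: Signals, rules: List[Rule] = RULES) -> str:
    """Return the action of the first rule whose event and condition match."""
    for rule in rules:
        if rule.event == event and rule.condition(signals):
            return rule.action
    return "accept"  # fallback when no rule is registered for the event
```

A linear generate-evaluate-regenerate loop would always take the same next step after evaluation; in this sketch the next step depends on which signal fails, so `route("draft_scored", {"meaning": 0.9, "readability": 0.4})` selects `add_lexical_support` rather than a full regeneration.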

Agent and Tool Ecosystem · HULAT2-UC3M · SARI · LoRA · LangGraph · Gemini-2.5-Flash-Lite · MER-TRANS 2026 · RigoChat-7B-v2

Related guides (2)

LoRA · Concept

LoRA: How to Teach a Giant AI New Tricks Without Rebuilding It

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

5 · arXiv · cs.CL · May 29, 2026

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Loong is a long document translation agent that uses a 3E memory module (Essence-Exemplar-Entity) to store structured historical context, replacing passive full-context attention with RL-optimized adaptive context selection. The agent learns its context retrieval policy via reinforcement learning on self-sampled reasoning trajectories. Evaluations show average gains of up to 13.0 points across three metrics in English↔Chinese, German, and French translation directions, with strong generalization and robustness to noise in ultra-long documents.

Long Context Evolution · Agent and Tool Ecosystem · YutongWang1216 · 3E Memory Module · Reinforcement Learning

6 · arXiv · cs.CL · May 22, 2026

LANG: Reinforcement Learning Framework for Multilingual Reasoning with Language-Adaptive Hint Guidance

LANG is a new RL-based framework for improving multilingual reasoning in LLMs that addresses the trade-off between input-language consistency and reasoning quality. It uses language-conditioned hints with a progressive decay schedule and a language-adaptive switch to tailor learning to per-language difficulty. Empirical results on multilingual mathematical benchmarks show improved reasoning without language drift toward English, and the approach generalizes beyond mathematics.

Evaluation and Benchmarking · Alignment and RLHF · large language models · LANG · multilingual mathematical benchmarks

4 · arXiv · cs.CL · May 19, 2026

Ancient Greek to Modern Greek Machine Translation: Novel Benchmark and Fine-Tuning Experiments

Researchers introduce the AG-MG Parallel Corpus, a 132,481 sentence-pair dataset for Ancient Greek to Modern Greek machine translation, created via a pipeline combining web scraping, VecAlign with LaBSE embeddings, and Gemini 2.5 Flash-based alignment correction. The paper benchmarks NMT models (NLLB, M2M100) and a Greek LLM (Llama-Krikri-8B) under three fine-tuning strategies. Full-parameter fine-tuning of Llama-Krikri-8B achieves the best BLEU score of 13.16, while QLoRA-adapted M2M100-1.2B shows the largest relative gains (+10.3 BLEU). This represents the first comprehensive MT benchmark for this low-resource language pair.

Evaluation and Benchmarking · Open Weights Progress · M2M100 · VecAlign · NLLB

3 · arXiv · cs.CL · 2d ago

Cross-lingual relation extraction for Romanian: QLoRA fine-tuning narrows gap but small encoders remain competitive

Researchers evaluate cross-lingual relation extraction for Romanian by translating the SemEval-2010 Task 8 benchmark and testing Gemma 4 31B under zero-shot, few-shot, and QLoRA fine-tuned settings against encoder baselines (XLM-RoBERTa, Romanian BERT, RoBERTa-large). QLoRA fine-tuning improves macro F1 by over 22 percentage points and reduces the English-Romanian cross-lingual gap from 3.3 to 1.4 pp, but encoder baselines of 125M–560M parameters come within 1–4 pp of the fine-tuned 31B model. The study concludes that large LLMs offer limited advantage over compact encoders for single-task relation extraction in compute-constrained deployment scenarios, and releases the translated dataset, code, and models.

Evaluation and Benchmarking · Open Weights Progress · Romanian BERT · QLoRA · XLM-RoBERTa

6 · arXiv · cs.CL · Jun 2, 2026

AgentCL: A Rigorous Evaluation Framework for Continual Learning in Language Agents

AgentCL is a new benchmark and evaluation framework designed to rigorously assess continual learning in language agents, addressing gaps in existing benchmarks that focus on retrieval over long-context documents or use naive task streams with limited cross-task analysis. The framework constructs compositional task streams where earlier sub-solutions, evidence, or workflows are intentionally reusable in later tasks, contrasting them with naive streams to measure transfer gains. The authors also introduce MemProbe, a probing method that stores interactions, insights, and skills while filtering unreliable experiences during consolidation. Empirical results across coding, deep research, and language understanding tasks show that controlled streams better distinguish memory design quality, and that naive streams can mask memory-induced degradation.

Long Context Evolution · Evaluation and Benchmarking · AgentCL · MemProbe · Continual Learning

5 · arXiv · cs.CL · 3d ago

Multi-agent system using open-source LLMs outperforms GPT-4 on disinformation detection

A new arXiv preprint proposes a multi-agent system for automated disinformation detection that emulates human annotator decision-making through consensus mechanisms, cognitive diversity, and hierarchical structure. The system uses open-source models (LLaMA, Kimi, Qwen, DeepSeek, LLaMA-Nemotron) and is evaluated on English, Polish, Slovak, and Bulgarian datasets across three fact-checking tasks. The authors report better performance than individual LLMs, including GPT-4 and GPT-3.5, and point to the transparency benefits of using open-weight models.

Open Weights Progress · Agent and Tool Ecosystem · Llama Nemotron · Kimi · DeepSeek V4

4 · arXiv · cs.CL · Jun 24, 2026

Multi-agent semantic rewriting framework for privacy-preserving RAG

A new arXiv preprint proposes a three-agent framework for sanitizing retrieved content in RAG pipelines by performing privacy extraction, semantic analysis, and reconstruction as an offline preprocessing step. Evaluated on ChatDoctor and Wiki-PII datasets across six LLMs, the approach reduces targeted information exposure in LLaMA-3-8B from 144 baseline instances to 1, while maintaining contextual fidelity (BLEU-1 of 0.122 vs. SAGE's 0.117). The framework introduces no additional online inference latency since rewriting is done offline. Source code is publicly released.

AI Safety Research · Enterprise Deployment Patterns · Privacy-Preserving RAG via Multi-Agent Semantic Rewriting · Wiki-PII · SAGE

6 · arXiv · cs.CL · Jun 2, 2026

Luar: Selective Translation via Reinforcement Learning for Multilingual Reasoning

Luar is a reinforcement learning framework that trains reasoning language models to selectively invoke English translation only when direct understanding of a non-English input is deemed unreliable. The approach, built on top of GRPO, outperforms standard multilingual baselines across reasoning benchmarks, with especially large gains on low-resource languages. Analysis confirms the model learns to avoid unnecessary translation when direct reasoning suffices, and generalizes the translation-call behavior to unseen low-resource languages.

Frontier Model Releases · Evaluation and Benchmarking · GRPO · Luar · Reasoning Language Models
