5 · arXiv cs.CL (Computation and Language) · 8d ago

ArogyaSutra: Multi-agent framework for multimodal medical reasoning in Indic languages

Researchers introduce ArogyaSutra, an actor-critic multi-agent framework for multilingual, multimodal medical reasoning in Indic languages, alongside ArogyaBodha, a large-scale dataset spanning 31 body systems, six imaging modalities, and 21 clinical domains across English and seven Indian languages. The framework integrates tool grounding with dual-memory mechanisms and uses actor-critic simulation trajectories for distillation. The work addresses a critical gap in AI healthcare access for low-resource, multilingual settings such as rural India, where English-centric multimodal LLMs (MLLMs) fall short.

Agent and Tool Ecosystem · Multimodal Progress · ArogyaSutra · IIT Patna · ArogyaBodha

Related guides (2)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

4 · arXiv · cs.CL · 46h ago

MedRLM: Recursive multimodal agent framework for long-context clinical decision support

MedRLM is a proposed framework for clinical decision support that uses recursive multi-agent reasoning over heterogeneous patient data including EHRs, medical images, physiological sensor streams, and clinical guidelines. Rather than single-step prompting, it decomposes patient cases into an inspectable external environment coordinated by specialized agents, with a Clinical Evidence Graph Memory and sensor-triggered deeper reasoning. The paper outlines an evaluation design using public and credentialed clinical datasets spanning radiology, ECG, ICU time series, and referral outcomes. The work targets a gap between static medical QA benchmarks and real-world longitudinal clinical workflows.

Agent and Tool Ecosystem · Multimodal Progress · MedRLM · Clinical Evidence Graph Memory

6 · arXiv · cs.CL · 9d ago

OpenMedReason: Large-scale multimodal medical reasoning corpus with 450K instances for clinical VLM training

Researchers introduce OpenMedReason, a 450K-instance open multimodal medical reasoning corpus with reasoning traces derived from human-authored biomedical literature rather than synthetic chains of thought. The dataset covers diverse medical imaging modalities and is paired with OpenMedReason-Bench, a held-out benchmark evaluating LVLMs on perception, medical knowledge, and rationale axes. Training with OpenMedReason yields a 20% average VQA accuracy improvement over base models and achieves performance within 4.2% of leading comparable-scale medical VLMs. Both the dataset and code are publicly released.

Evaluation and Benchmarking · Alignment and RLHF · OpenMedReason · OpenMedReason-Bench

3 · GitHub Trending · 12d ago

agent-teams-ai: multi-agent orchestration framework with kanban-style oversight

A TypeScript open-source project on GitHub implements a multi-agent system where autonomous agents handle tasks, communicate with each other, and review each other's work, while the user supervises via a kanban board. The framework supports 200+ models across 75+ LLM providers including Codex, Claude, and OpenCode. It has accumulated 1,189 stars with 56 added today, suggesting growing community interest.

Agent and Tool Ecosystem · agent-teams-ai · OpenAI · Anthropic

6 · arXiv · cs.CL · 3d ago

RubricsTree: Scalable hierarchical rubric framework for evaluating personal health AI agents

RubricsTree is a new evaluation framework for LLM-powered personal health agents, built around a hierarchical taxonomy of over 100 clinically verifiable Boolean rubrics derived from 4,000 real user queries and curated with physician oversight. A context-aware router activates only the rubrics relevant to each query, enabling scalable yet expert-aligned evaluation. The framework outperforms strong LLM-as-a-judge baselines on expert alignment and, when used as a training signal, yields relative gains of up to about 66% on HealthBench across the Gemini, GPT, and Qwen model families. The work addresses a concrete bottleneck in clinical deployment of health AI: the cost-quality tradeoff in evaluation.

Evaluation and Benchmarking · AI Safety Research · HealthBench · RubricsTree · Qwen

5 · OpenAI Blog · 1mo ago

OpenAI Introduces IndQA: Multilingual Benchmark for Indian Languages

OpenAI has released IndQA, a benchmark designed to evaluate AI systems across 12 Indian languages and 10 knowledge domains. The benchmark was developed with domain experts and focuses on cultural understanding and reasoning capabilities. It targets a significant gap in multilingual evaluation coverage for South Asian languages.

Evaluation and Benchmarking · Multimodal Progress · IndQA · OpenAI

5 · GitHub Trending · 11d ago

ARIS: Lightweight autonomous ML research agent using Markdown-only skills

ARIS (Auto-Research-In-Sleep) is an open-source Python project providing lightweight, framework-free Markdown-based skills for autonomous ML research workflows, including cross-model review loops, idea discovery, and experiment automation. It is designed to work with any LLM agent backend including Claude Code, Codex, or others. The project has accumulated 11,791 GitHub stars with notable daily traction (+106), suggesting meaningful community adoption.

Agent and Tool Ecosystem · wanshuiyin · ARIS · Claude Code

4 · arXiv · cs.CL · 17d ago

Training-free mixture-of-agents framework combines LLMs and knowledge graphs for multi-document summarization

A new arXiv preprint proposes a training-free multi-agent framework for multi-document summarization (MDS) that decomposes the task into specialized agents for extractive selection, knowledge-aware abstraction, and iterative refinement, unified via a multi-perspective consistency mechanism. The system integrates LLMs with knowledge graphs without task-specific fine-tuning. Experiments across four datasets in English and Vietnamese show state-of-the-art or competitive performance, with the authors emphasizing cross-domain and cross-lingual generalization.

Evaluation and Benchmarking · Agent and Tool Ecosystem · A Training-Free Mixture-of-Agents Framework for Multi-Document Summarization using LLMs and Knowledge Graphs

6 · Hugging Face Blog · 1mo ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Cohere's Aya Vision is a multilingual multimodal model designed to extend vision-language capabilities beyond English-centric systems. The blog post provides a technical deep-dive into the model's architecture, training approach, and multilingual evaluation results. It represents a notable push toward broader language coverage in multimodal AI, targeting underrepresented languages in the vision-language space.

Evaluation and Benchmarking · Open Weights Progress · Aya · Cohere · Hugging Face