Agents-K1: End-to-end knowledge orchestration pipeline for agent-native scientific knowledge graphs
Agents-K1 is a new pipeline that converts raw scientific documents into structured knowledge graphs for use by LLM-based research agents, addressing the gap where existing systems reduce papers to abstracts and flat citation edges. The system integrates a multimodal parser, a 4B information-extraction model trained with GRPO, and a tri-source agent interface combining web search, graph retrieval, and cross-document traversal. The authors process 2.46 million scientific papers to produce Scholar-KG, releasing a one-million-paper subset. Experiments show improvements in scientific information extraction, knowledge graph construction, and multi-hop reasoning.
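To illustrate why typed graph edges support multi-hop reasoning where flat citation edges do not, here is a minimal sketch of cross-document traversal over subject-relation-object triples. Everything in it is an assumption made for illustration: the triples, the relation names, and the helper functions `build_adjacency` and `multi_hop` are hypothetical and do not come from Agents-K1 or Scholar-KG.

```python
from collections import deque

# Hypothetical typed triples (subject, relation, object).
# Illustrative only; not taken from Scholar-KG.
TRIPLES = [
    ("paper:A", "proposes", "method:M1"),
    ("method:M1", "evaluated_on", "dataset:D1"),
    ("paper:B", "uses", "dataset:D1"),
    ("paper:B", "reports", "metric:F1"),
]


def build_adjacency(triples):
    """Index triples in both directions so traversal can cross documents."""
    adj = {}
    for subj, rel, obj in triples:
        adj.setdefault(subj, []).append((rel, obj))
        adj.setdefault(obj, []).append((f"inverse_{rel}", subj))
    return adj


def multi_hop(adj, start, max_hops):
    """Breadth-first search; returns {node: path} for nodes within max_hops.

    Each path is a list of (source, relation, target) steps from `start`.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) == max_hops:
            continue  # do not expand beyond the hop budget
        for rel, nxt in adj.get(node, []):
            if nxt not in paths:
                paths[nxt] = paths[node] + [(node, rel, nxt)]
                queue.append(nxt)
    return paths


if __name__ == "__main__":
    graph = build_adjacency(TRIPLES)
    for step in multi_hop(graph, "paper:A", max_hops=3)["paper:B"]:
        print(step)
```

In this toy graph, paper:A and paper:B share no direct edge; they are linked only through a method and a dataset, which is the kind of connection an abstract-plus-citation representation would miss.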
Related events (8)
K-Dense-AI/scientific-agent-skills: Ready-to-Use Agent Skills Library for Research and Engineering
A Python repository providing a collection of pre-built agent skills for research, science, engineering, analysis, finance, and writing tasks. The project has accumulated 24,087 stars, including a single-day gain of 762, indicating significant community traction. The snippet includes no detailed technical documentation, but the scope suggests a modular agent tooling library.
Training-free mixture-of-agents framework combines LLMs and knowledge graphs for multi-document summarization
A new arXiv preprint proposes a training-free multi-agent framework for multi-document summarization (MDS) that decomposes the task into specialized agents for extractive selection, knowledge-aware abstraction, and iterative refinement, unified via a multi-perspective consistency mechanism. The system integrates LLMs with knowledge graphs without task-specific fine-tuning. Experiments across four datasets in English and Vietnamese show state-of-the-art or competitive performance, with the authors emphasizing cross-domain and cross-lingual generalization.
EurekAgent: Environment Engineering as the Key Bottleneck for Autonomous Scientific Discovery
EurekAgent is a new LLM-based agent system that reframes autonomous scientific discovery around 'environment engineering' — designing the resources, constraints, and interfaces that shape agent behavior — rather than prescribing agent workflows. The system engineers four dimensions: permissions, artifact management (filesystem/Git), budget awareness, and human-in-the-loop oversight. It achieves state-of-the-art results on mathematics, kernel engineering, and ML tasks, including new 26-circle packing results at under $11 in API cost, and is fully open-sourced.
Benchmark Agent: Autonomous system for end-to-end benchmark construction
Researchers introduce Benchmark Agent, a fully autonomous agentic system that orchestrates the complete benchmark construction pipeline — from query analysis and subtask design to data annotation and quality control. The system was used to produce 15 benchmarks spanning text understanding, multimodal understanding, and domain-specific reasoning, with evaluation via human judges, LLM-as-a-judge, and consistency checks. The work addresses two persistent problems in the field: the labor intensity of benchmark creation and rapid performance saturation after release. Code and a demo will be publicly released.
AgentSpec: A modular framework for controlled composition and analysis of embodied LLM agent scaffolds
AgentSpec is a new modular specification framework that represents embodied LLM agents as typed compositions of reusable policy components with standardized interfaces across perception, memory, reasoning, reflection, action, and learning modules. The framework enables controlled swapping and recombination of components, instantiated across four benchmarks (DeliveryBench, ALFRED, MiniGrid, RoboTHOR). Key findings include that agent performance is governed by scaffold compatibility and interaction effects rather than isolated module strength, and that RL-trained policies compose best when optimized with deployment-time scaffold structure. Code, baselines, and an interactive playground are publicly released.
agent-teams-ai: multi-agent orchestration framework with kanban-style oversight
A TypeScript open-source project on GitHub implements a multi-agent system where autonomous agents handle tasks, communicate with each other, and review each other's work, while the user supervises via a kanban board. The framework supports 200+ models across 75+ LLM providers including Codex, Claude, and OpenCode. It has accumulated 1,189 stars with 56 added today, suggesting growing community interest.
CodeGraph: Pre-indexed Local Code Knowledge Graph for AI Coding Agents
CodeGraph is an open-source TypeScript tool that builds a pre-indexed knowledge graph of a codebase to reduce token usage and tool calls for AI coding agents including Claude Code, Codex, Cursor, OpenCode, and Hermes Agent. It runs entirely locally, positioning itself as an efficiency layer between codebases and LLM-based coding assistants. The project gained significant traction with 3,688 stars in a single day, reaching 16,371 total stars.
KATE framework improves LLM tool calling via experiential knowledge integration and parallel reasoning
Researchers present KATE (Knowledge-Augmented Tool Execution), a framework addressing LLM failures in multi-step tool use by systematically studying knowledge acquisition, activation, and internalization. Key findings include that instance-level experiential knowledge outperforms abstract intent-level knowledge, that expanding reasoning width via parallel sampling with aggregation beats deeper chain-of-thought, and that reinforcement learning outperforms supervised fine-tuning for knowledge internalization. KATE is evaluated on BFCL-V3 and AppWorld benchmarks, showing consistent improvements over strong baselines across model scales.
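The KATE summary mentions widening reasoning through parallel sampling with aggregation. The summary does not say how KATE aggregates, so the sketch below substitutes plain majority voting over sampled tool calls as one possible reading. The function `aggregate_tool_calls` and the call format (a dict with `name` and `arguments`) are hypothetical and are not KATE's API.

```python
import json
from collections import Counter


def aggregate_tool_calls(candidates):
    """Pick the most frequent tool call among parallel samples.

    `candidates` is a list of dicts such as
    {"name": "search", "arguments": {"q": "..."}}.
    Majority voting is an assumption here, not KATE's documented method.
    Returns (winning_call, fraction_of_samples_that_agree).
    """
    if not candidates:
        raise ValueError("need at least one candidate tool call")
    # Canonical JSON makes calls with reordered keys compare equal.
    keys = [json.dumps(call, sort_keys=True) for call in candidates]
    winner, votes = Counter(keys).most_common(1)[0]
    return json.loads(winner), votes / len(candidates)


if __name__ == "__main__":
    samples = [
        {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "C"}},
        {"arguments": {"unit": "C", "city": "Hanoi"}, "name": "get_weather"},
        {"name": "get_weather", "arguments": {"city": "Hue", "unit": "C"}},
    ]
    call, agreement = aggregate_tool_calls(samples)
    print(call, agreement)
```

The agreement fraction can serve as a rough confidence signal: a caller might fall back to asking the model again when no candidate wins a clear majority.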


