5 · GitHub Trending (AI/LLM filtered) · 11d ago

ARIS: Lightweight autonomous ML research agent using Markdown-only skills

ARIS (Auto-Research-In-Sleep) is an open-source Python project providing lightweight, framework-free, Markdown-based skills for autonomous ML research workflows, including cross-model review loops, idea discovery, and experiment automation. It is designed to work with any LLM agent backend, such as Claude Code or Codex. The project has accumulated 11,791 GitHub stars, with 106 added in the latest daily count, suggesting meaningful community adoption.

Agent and Tool Ecosystem wanshuiyin ARIS Claude Code Anthropic

Related guides (3)

Claude Code

Claude Code: Anthropic's Autonomous Coding Agent

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

4 · GitHub Trending · 1mo ago

AutoResearchClaw: Fully Autonomous Self-Evolving Research Agent (Idea to Paper)

AutoResearchClaw is an open-source Python project from aiming-lab that claims to automate the full research pipeline from idea to paper, positioning itself as a fully autonomous and self-evolving research agent. The repository has accumulated 12,426 stars with 55 added today, indicating notable community traction. It represents a concrete implementation in the growing space of AI agents designed to conduct and write scientific research autonomously.

Agent and Tool Ecosystem AutoResearchClaw aiming-lab

6 · arXiv · cs.AI · 12d ago

AARRI-Bench evaluates frontier LLMs and agents on granular research-intern-level tasks

Researchers introduce AARR (Act As a Real Researcher), a new benchmark series that tests whether AI agents can emulate the professionalism, thoroughness, and nuanced judgment of human researchers in granular research scenarios, rather than only in macro-level task execution. The first benchmark, AARRI-Bench, tests frontier models and agentic harnesses and finds that even the best configuration (Mini-SWE-Agent with Claude Opus 4.7) achieves only 68.3% success, frequently missing subtle but critical details that are obvious to human researchers. The work argues that closing the gap requires deeper modeling of research behavior rather than more complex scaffolding.

Evaluation and Benchmarking Agent and Tool Ecosystem Claude Opus 4.6 SWE-bench AARRI-Bench +2 more

4 · GitHub Trending · 1mo ago

K-Dense-AI/scientific-agent-skills: Ready-to-Use Agent Skills Library for Research and Engineering

A Python repository providing a collection of pre-built agent skills targeting research, science, engineering, analysis, finance, and writing tasks. The project has accumulated 24,087 stars with a notable single-day gain of 762 stars, indicating significant community traction. The source snippet includes no detailed technical documentation, but the scope suggests a modular agent tooling library.

Agent and Tool Ecosystem K-Dense-AI/scientific-agent-skills K-Dense-AI

5 · GitHub Trending · 3d ago

Microsoft RD-Agent: automated AI-driven R&D for data and model development

Microsoft has released RD-Agent, an open-source Python framework aimed at automating high-value R&D processes in AI, with a focus on data and model development. The project frames AI as the driver of data-driven AI development and targets industrial productivity use cases. With 13,500 GitHub stars, it has attracted meaningful community interest, and a technical report is available.

Enterprise Deployment Patterns Agent and Tool Ecosystem Microsoft RD-Agent

4 · GitHub Trending · 16d ago

last30days-skill: AI agent skill for multi-source research synthesis

A Python-based AI agent skill on GitHub that queries Reddit, X, YouTube, Hacker News, Polymarket, and the web to research any topic, then synthesizes a grounded summary. The repository has accumulated 27,522 stars with 173 added today, indicating significant community traction. It represents a practical agent tool for multi-source information aggregation.

Agent and Tool Ecosystem last30days-skill mvanhorn

5 · arXiv · cs.LG · 3d ago

ReproRepo: Scalable LLM agent framework for reproducibility auditing using GitHub issues

ReproRepo is a new framework for evaluating LLM agents on reproducibility auditing of ML research, using naturally occurring GitHub issues as supervision signals rather than costly manual curation. The framework is instantiated on 1,149 recent ML papers from major conferences and benchmarks four frontier model-agent configurations. The best-performing agent (Codex with GPT-5.5) surfaces at least one semantically related human-reported reproduction blocker for ~90% of papers, though exact localization of issues remains a weakness. The work provides a reusable, scalable evaluation harness for this underexplored agentic task.

Evaluation and Benchmarking Agent and Tool Ecosystem OpenAI ReproRepo Codex +1 more

4 · GitHub Trending · 1mo ago

agent-skills: Secure Validated Skill Registry for AI Coding Agents

A TypeScript-based open-source skill registry designed to extend AI coding agents including Claude Code, Cursor, GitHub Copilot, and Antigravity with validated, reusable capabilities. The project provides a structured way to add skills to multiple coding agent platforms with a focus on security and validation. It is gaining notable traction with 3,767 total stars and 225 stars added today.

Enterprise Deployment Patterns Agent and Tool Ecosystem Cursor Claude Code Antigravity +2 more

5 · GitHub Trending · 1mo ago

karpathy/autoresearch: AI Agents for Automated Single-GPU Research

Andrej Karpathy's autoresearch repository on GitHub has accumulated over 82,000 stars, with 332 new stars today. The project focuses on AI agents that autonomously run research experiments on single-GPU nanochat training setups. The high star count and trending activity suggest significant community interest in automated ML research tooling.

Training Infrastructure Agent and Tool Ecosystem autoresearch Andrej Karpathy nanochat