ARIS: Lightweight autonomous ML research agent using Markdown-only skills
ARIS (Auto-Research-In-Sleep) is an open-source Python project that provides lightweight, framework-free skills written in Markdown for autonomous ML research workflows, including cross-model review loops, idea discovery, and experiment automation. It is designed to work with any LLM agent backend, such as Claude Code or Codex. The project has accumulated 11,791 GitHub stars, 106 of them added today, suggesting meaningful community interest.
Related events (8)
AutoResearchClaw: Fully Autonomous Self-Evolving Research Agent (Idea to Paper)
AutoResearchClaw is an open-source Python project from aiming-lab that claims to automate the full research pipeline from idea to paper, positioning itself as a fully autonomous and self-evolving research agent. The repository has accumulated 12,426 stars with 55 added today, indicating notable community traction. It represents a concrete implementation in the growing space of AI agents designed to conduct and write scientific research autonomously.
AARRI-Bench evaluates frontier LLMs and agents on granular research-intern-level tasks
Researchers introduce AARR (Act As a Real Researcher), a new benchmark series targeting whether AI agents can emulate the professionalism, thoroughness, and nuanced judgment of human researchers in granular research scenarios, not just macro-level task execution. The first benchmark, AARRI-Bench, tests frontier models and agentic harnesses, finding that even the best configuration (Mini-SWE-Agent with Claude Opus 4.7) achieves only 68.3% success and frequently misses subtle but critical details that are obvious to human researchers. The work argues that closing the gap requires deeper modeling of research behavior rather than more complex scaffolding.
K-Dense-AI/scientific-agent-skills: Ready-to-Use Agent Skills Library for Research and Engineering
K-Dense-AI/scientific-agent-skills is a Python repository that provides a collection of pre-built agent skills for research, science, engineering, analysis, finance, and writing tasks. The project has accumulated 24,087 stars, including a single-day gain of 762, indicating significant community traction. The available snippet includes no detailed technical documentation, but the scope suggests a modular agent tooling library.
Microsoft RD-Agent: automated AI-driven R&D for data and model development
Microsoft has released RD-Agent, an open-source Python framework aimed at automating high-value R&D processes in AI, with a focus on data and model development. The project positions AI as the driver of data-driven AI workflows, targeting industrial productivity use cases. With 13,500 GitHub stars, it has attracted meaningful community interest, and a technical report is available.
last30days-skill: AI agent skill for multi-source research synthesis
last30days-skill is a Python-based AI agent skill on GitHub that queries Reddit, X, YouTube, Hacker News, Polymarket, and the web to research any topic, then synthesizes a grounded summary. The repository has accumulated 27,522 stars with 173 added today, indicating significant community traction. It represents a practical agent tool for multi-source information aggregation.
ReproRepo: Scalable LLM agent framework for reproducibility auditing using GitHub issues
ReproRepo is a new framework for evaluating LLM agents on reproducibility auditing of ML research, using naturally occurring GitHub issues as supervision signals rather than costly manual curation. The framework is instantiated on 1,149 recent ML papers from major conferences and benchmarks four frontier model-agent configurations. The best-performing agent (Codex with GPT-5.5) surfaces at least one semantically related human-reported reproduction blocker for ~90% of papers, though exact localization of issues remains a weakness. The work provides a reusable, scalable evaluation harness for this underexplored agentic task.
agent-skills: Secure Validated Skill Registry for AI Coding Agents
agent-skills is a TypeScript-based open-source skill registry designed to extend AI coding agents, including Claude Code, Cursor, GitHub Copilot, and Antigravity, with validated, reusable capabilities. The project provides a structured way to add skills to multiple coding agent platforms, with a focus on security and validation. It has 3,767 total stars, 225 of them added today.
karpathy/autoresearch: AI Agents for Automated Single-GPU Research
Andrej Karpathy's autoresearch repository on GitHub has accumulated over 82,000 stars, with 332 new stars today. The project focuses on AI agents that autonomously run research experiments on single-GPU nanochat training setups. The high star count and trending activity suggest significant community interest in automated ML research tooling.


