4 · GitHub Trending (AI/LLM filtered) · 3h ago

VulnClaw: AI agent + MCP toolchain for automated penetration testing workflows

VulnClaw is an open-source Python project that orchestrates an AI agent pipeline using MCP tooling and LLMs to automate the full penetration testing lifecycle from natural language input: information gathering, vulnerability discovery, exploitation, and report generation. The project has accumulated 1,049 GitHub stars with 105 added today, indicating notable community traction. It represents a concrete application of the Model Context Protocol (MCP) to security automation.

Agent and Tool Ecosystem Unclecheng-li MCP VulnClaw

Related guides (1)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

5 · GitHub Trending · 25d ago

HexStrike AI: MCP server exposing 150+ cybersecurity tools to AI agents

HexStrike AI is an open-source MCP server that enables AI agents (Claude, GPT, Copilot, and others) to autonomously invoke over 150 offensive security tools for penetration testing, vulnerability discovery, and bug bounty automation. The project bridges LLMs with real-world offensive security capabilities via the Model Context Protocol. With 9,221 GitHub stars, it represents a notable community signal around agentic security tooling and the expanding attack surface of AI-driven automation.

AI Safety Research Agent and Tool Ecosystem Claude HexStrike AI MCP +1 more

6 · arXiv · cs.AI · 1mo ago

Claw-Anything: Benchmark for Always-On Personal Assistants with Broad Digital World Access

Claw-Anything is a new benchmark designed to evaluate LLM agents acting as always-on personal assistants with access to long-horizon activity histories, interdependent backend services, and multi-device GUI/CLI interaction. The benchmark simulates months of user activity to create complex, noisy world states and evaluates both reactive and proactive assistance. GPT-5.5 achieves only 34.5% pass@1, revealing a substantial capability gap versus prior narrower benchmarks. An accompanying automated data-generation pipeline produces 2,000 training environments and yields a 23.7% improvement over the base model.

Long Context Evolution Evaluation and Benchmarking multi-round event injection Claw-Anything large language model agents +3 more

4 · GitHub Trending · 1mo ago

Deep Eye: Multi-Provider AI-Orchestrated Vulnerability Scanner

Deep Eye is an open-source Python tool that orchestrates multiple AI providers (OpenAI, Claude, Grok, Gemini, Ollama, Groq, Mistral, and others) to generate attack payloads and scan targets for 45+ vulnerability types. It produces professional security reports with compliance mapping. The project has accumulated 1,572 GitHub stars with 42 added today, indicating growing community interest in AI-augmented offensive security tooling.

AI Safety Research Agent and Tool Ecosystem Ollama Grok zakirkun +5 more

4 · GitHub Trending · 1mo ago

PentestAgent: AI Agent Framework for Black-Box Security Testing

PentestAgent is an open-source Python framework that applies AI agent techniques to penetration testing, bug bounty, and red-team workflows. The project has accumulated 2,497 GitHub stars with modest daily traction (+30 stars). It represents a practical deployment of autonomous agent architectures in offensive security contexts.

Agent and Tool Ecosystem PentestAgent GH05TCREW

7 · Anthropic News · 27d ago

Anthropic Launches Claude Code Security: AI-Powered Vulnerability Detection for Defenders

Anthropic has released Claude Code Security in limited research preview for Enterprise and Team customers, a capability built into Claude Code that scans codebases for security vulnerabilities and suggests patches for human review. Unlike rule-based static analysis tools, it uses Claude's reasoning to understand code context, trace data flows, and detect complex vulnerabilities including novel ones. Built on Claude Opus 4.6, the system found over 500 previously undetected vulnerabilities in production open-source codebases during internal research. The release is framed as a defensive measure to put AI-enabled vulnerability discovery in the hands of defenders before attackers can exploit the same capabilities.

Frontier Model Releases AI Safety Research Claude Opus 4.6 Anthropic Policy Frontier Red Team Pacific Northwest National Laboratory +5 more

5 · GitHub Trending · 6d ago

AWS releases official agent toolkit with MCP servers and plugins for AI agents

AWS has published an official, AWS-supported toolkit on GitHub providing MCP servers, skills, and plugins to enable AI agents to interact with AWS services. The repository is Python-based and has accumulated 966 stars. This represents AWS's formal entry into the MCP server ecosystem for agentic workloads.

Enterprise Deployment Patterns Agent and Tool Ecosystem agent-toolkit-for-aws MCP Amazon Web Services

4 · GitHub Trending · 16d ago

claude-bug-bounty: autonomous bug bounty hunting tool built on Claude Code

claude-bug-bounty is a Python tool on GitHub that integrates Claude Code to automate bug bounty hunting workflows from the terminal, covering reconnaissance, 20 vulnerability classes, autonomous hunting, and report generation. The project has accumulated 2,745 stars with 203 added today, indicating significant community traction. It represents a concrete agentic use case of Claude Code for offensive security automation.

Agent and Tool Ecosystem claude-bug-bounty Claude Code Anthropic

6 · arXiv · cs.AI · 17d ago

SpatialClaw: Code-as-action interface for agentic 3D/4D spatial reasoning with VLMs

SpatialClaw is a training-free framework that uses code execution as the action interface for vision-language model agents performing spatial reasoning tasks. The system maintains a stateful Python kernel with perception and geometry primitives, allowing the VLM to write iterative executable cells conditioned on prior outputs rather than committing to a full strategy upfront. Evaluated across 20 spatial reasoning benchmarks covering static and dynamic 3D/4D tasks, SpatialClaw achieves 59.9% average accuracy, outperforming the prior state-of-the-art spatial agent by 11.2 points across six VLM backbones.
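
The stateful-kernel idea in the summary above can be illustrated with a minimal sketch. This is not SpatialClaw's implementation: the `StatefulKernel` class, the `distance` primitive, and the two hard-coded cells are hypothetical stand-ins, and a real agent would have the VLM write each cell after reading the previous cell's output.

```python
import contextlib
import io
import traceback


class StatefulKernel:
    """Hypothetical stand-in for a persistent Python kernel."""

    def __init__(self, primitives=None):
        # One namespace is shared by every cell, so variables persist between cells.
        self.namespace = dict(primitives or {})

    def run_cell(self, source):
        """Execute one cell and return what it printed (or its traceback)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            try:
                # exec runs arbitrary code; a real system would need sandboxing.
                exec(source, self.namespace)
            except Exception:
                buffer.write(traceback.format_exc())
        return buffer.getvalue()


def distance(a, b):
    """Hypothetical geometry primitive: Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


kernel = StatefulKernel({"distance": distance})

# Cell 1: compute an intermediate quantity and inspect it.
first = kernel.run_cell("d = distance((0, 0, 0), (3, 4, 0))\nprint(d)")

# Cell 2: written only after seeing cell 1's output; it reuses `d` from the kernel state.
second = kernel.run_cell("print('far' if d > 2 else 'near')")
```

Because both cells share one namespace, the second cell can build on `d` without recomputing it, which is the property that lets an agent refine its approach step by step instead of committing to a full program upfront.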

Evaluation and Benchmarking Agent and Tool Ecosystem SpatialClaw +1 more