5 · Hacker News (AI-filtered, score >= 200) · 2h ago

Ornith-1.0: self-improving open-source models for agentic coding

DeepReinforce AI has released Ornith-1.0, an open-source model series targeting agentic coding tasks with a self-improvement mechanism. The project is hosted on GitHub and has attracted moderate community attention on Hacker News (172 points, 32 comments). Self-improving open-weights coding agents represent an active area of development as the field pushes toward autonomous software engineering.

Open Weights Progress · Agent and Tool Ecosystem · Ornith-1.0 · DeepReinforce AI

Related guides (2)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Related events (8)

4 · Simon Willison's Weblog · 8h ago

Simon Willison highlights Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

Simon Willison links to or comments on Ornith-1.0, a system described as enabling self-scaffolding LLMs for agentic coding tasks. The item appears to be a brief pointer or commentary from Willison's blog. Self-scaffolding approaches are relevant to the broader agent-tool ecosystem as they reduce reliance on externally-authored harnesses.

Agent and Tool Ecosystem · Simon Willison · Ornith-1.0

4 · OpenAI Blog · 1mo ago

Coding with OpenAI o1

OpenAI published a brief feature in which Scott Wu, CEO of Cognition (maker of the Devin AI software engineer), describes how o1 approaches coding decisions in a more human-like, reasoning-oriented manner. The piece is a short promotional commentary tied to the o1 model launch, highlighting o1's potential impact on AI-assisted software development. No new technical benchmarks or capability details are disclosed.

Frontier Model Releases · Agent and Tool Ecosystem · Scott Wu · Devin · Cognition · +1 more

4 · GitHub Trending · 15d ago

Open Interpreter: lightweight coding agent for open models (Deepseek, Kimi, Qwen)

Open Interpreter is an open-source Python coding agent framework supporting open-weight models including DeepSeek, Kimi, and Qwen. The project has accumulated nearly 64,000 GitHub stars, with 45 new stars on the trending day. It provides a lightweight harness for running code-executing agents on locally hosted or open models.

Open Weights Progress · Agent and Tool Ecosystem · Kimi · DeepSeek V4 · Qwen · +1 more

5 · The Batch · 3d ago

The Batch Issue 359: Loop Engineering for Agentic Coding, GLM-5.2 Open-Weights Release, Apple On-Device Models

Andrew Ng's weekly letter introduces a framework of three nested loops for agentic software development (engineering loop, developer feedback loop, external feedback loop), contextualizing the 'loop engineering' trend popularized by Claude Code and OpenClaw creators. The issue also covers Z.ai's GLM-5.2, a 753B MoE open-weights model with 1M token context that claims first place among open models on Artificial Analysis Intelligence Index v4.1 and leads all models on PostTrainBench for long-running agentic tasks. Additional coverage includes Apple's recipe for on-device models and AI education trends.

Frontier Model Releases · Evaluation and Benchmarking · DeepLearning.AI · Artificial Analysis Intelligence Index · Boris Cherny · +8 more

4 · GitHub Trending · 19d ago

Archon: open-source harness builder for deterministic AI coding workflows

Archon is an open-source TypeScript project positioning itself as a harness builder for AI coding, aiming to make AI-assisted code generation deterministic and repeatable. The repository has accumulated 22,323 stars with modest daily momentum (+38). It targets a known pain point in agentic coding workflows: reproducibility and controllability of AI-generated outputs.

Agent and Tool Ecosystem · Archon · coleam00

6 · arXiv · cs.AI · 6d ago

OpenThoughts-Agent: Open data curation pipeline for broadly capable agentic models

The OpenThoughts-Agent (OT-Agent) project releases a fully open data curation pipeline for training agentic language models, addressing the gap left by prior efforts (SWE-Smith, SERA, Nemotron-Terminal) that target single benchmarks. The team conducts over 100 controlled ablation experiments and assembles a 100K-example training set, fine-tuning Qwen3-32B to achieve 44.8% average accuracy across seven agentic benchmarks — a 3.9 percentage point improvement over the strongest existing open agentic model (Nemotron-Terminal-32B at 40.9%). Training data, pipeline, experimental data, and models are publicly released at openthoughts.ai.
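The improvement quoted above is an absolute gap in percentage points, which is not the same as a relative improvement. A minimal sketch of the arithmetic, using only the two accuracy figures from the summary (the variable names are illustrative, not from the project):

```python
# Average accuracy across seven agentic benchmarks, as quoted in the summary.
ot_agent_acc = 44.8   # fine-tuned Qwen3-32B (OT-Agent), percent
nemotron_acc = 40.9   # Nemotron-Terminal-32B, percent

# Absolute gap, in percentage points.
absolute_gain_pp = ot_agent_acc - nemotron_acc

# Relative improvement over the baseline, in percent.
relative_gain_pct = absolute_gain_pp / nemotron_acc * 100

print(round(absolute_gain_pp, 1), round(relative_gain_pct, 1))  # prints: 3.9 9.5
```

So the 3.9-point gap corresponds to roughly a 9.5% relative gain over the baseline.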

Evaluation and Benchmarking · Open Weights Progress · Nemotron-Terminal-32B · SWE-Smith · SERA · +4 more

7 · The Batch · 28d ago

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves top open-weights scores on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, while leading SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on Hugging Face under the MIT license, with API pricing roughly 40 percent higher than its predecessor but still below comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6 · +14 more

7 · Mistral AI News · 28d ago

Mistral AI Releases Devstral Medium and Devstral Small 1.1 for Agentic Coding

Mistral AI, in collaboration with All Hands AI, has released two new agentic coding models: Devstral Small 1.1 (24B parameters, Apache 2.0, 53.6% on SWE-Bench Verified) and Devstral Medium (61.6% on SWE-Bench Verified, API-only). Devstral Medium is positioned as a cost-performance leader, claiming to surpass Gemini 2.5 Pro and GPT-4.1 at roughly one-quarter the price, priced at $0.4/M input and $2/M output tokens. Devstral Small 1.1 sets a new state-of-the-art among open models for code agents without test-time scaling, and supports both Mistral function calling and XML formats for broad agentic scaffold compatibility.
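To make the quoted Devstral Medium prices concrete, here is a rough cost estimate for a single workload. It is only a sketch: the two per-million-token prices come from the summary above, while the function name and the example token counts are hypothetical and chosen purely for illustration.

```python
# List prices quoted in the summary (USD per million tokens).
INPUT_PRICE_PER_M = 0.4
OUTPUT_PRICE_PER_M = 2.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD at the quoted prices (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Assumed example session: 2M input tokens, 200k output tokens.
cost = estimate_cost_usd(2_000_000, 200_000)
print(f"${cost:.2f}")  # prints: $1.20
```

Agentic sessions tend to be input-heavy because the context is re-sent on every step, so the lower input price usually dominates the total.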

Frontier Model Releases · Evaluation and Benchmarking · Devstral 2 Small · Mistral AI · All Hands AI · +10 more