4 · GitHub Trending (AI/LLM filtered) · 10d ago

Opik: open-source LLM observability and evaluation platform by Comet ML

Opik is an open-source toolkit from Comet ML for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It provides tracing, automated evaluations, and production dashboards. The project has accumulated nearly 20K GitHub stars, indicating meaningful adoption in the practitioner community.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Comet ML · Opik

Related guides (2)

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

4 · GitHub Trending · 24d ago

Langfuse: Open Source LLM Engineering Platform Trending on GitHub

Langfuse is an open-source LLM engineering platform providing observability, metrics, evaluations, prompt management, and dataset tooling. It integrates with OpenTelemetry, LangChain, OpenAI SDK, and LiteLLM. The project has accumulated 28,075 GitHub stars with 89 new stars today, indicating sustained community traction. Backed by Y Combinator (W23), it represents a notable entry in the LLM ops/tooling ecosystem.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenTelemetry · Langfuse · Y Combinator

3 · GitHub Trending · 27d ago

Onyx: Open Source AI Chat Platform with Multi-LLM Support

Onyx is an open-source AI chat platform written in Python that supports multiple LLMs with advanced features. The repository has accumulated 29,665 total stars with modest daily traction (+28 today). It positions itself as an enterprise-ready AI assistant that integrates with various language model backends.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Onyx · onyx-dot-app

4 · GitHub Trending · 10d ago

LiteLLM AI gateway trending: 50K stars, unified interface for 100+ LLM APIs

LiteLLM is a Python SDK and proxy server providing a unified OpenAI-compatible interface to 100+ LLM APIs including Bedrock, Azure, OpenAI, VertexAI, Anthropic, and others. It includes cost tracking, guardrails, load balancing, and logging. The project is trending on GitHub with ~50K total stars and 141 new stars today, signaling continued strong adoption as an AI gateway layer.

Inference Economics · Agent and Tool Ecosystem · Amazon Bedrock · BerriAI · LiteLLM

3 · GitHub Trending · 9d ago

MLflow trending on GitHub as open-source AI engineering platform

MLflow, an open-source platform for managing AI/ML workflows, is trending on GitHub with 26,442 total stars and 22 new stars today. The project supports agents, LLMs, and traditional ML models, offering debugging, evaluation, monitoring, and optimization capabilities for production AI applications. It is a mature, widely-used tooling platform in the MLOps space.

Agent and Tool Ecosystem · Linux Foundation · MLflow

3 · GitHub Trending · 35h ago

OpenKB: Open-source LLM knowledge base library gains traction on GitHub

VectifyAI has released OpenKB, an open-source Python library for building LLM-powered knowledge bases. The repository is trending on GitHub with 2,389 total stars and 208 new stars in a single day, suggesting meaningful community interest. No detailed technical description is available from the source snippet.

Agent and Tool Ecosystem · OpenKB · VectifyAI

6 · arXiv · cs.LG · 8d ago

Operadic consistency: a label-free signal for detecting compositional reasoning failures in LLMs

Researchers introduce operadic consistency (OC), a label-free inference-time signal that checks whether an LLM's direct answer to a compositional query agrees with the answer produced by composing its own stated decomposition of that query. Evaluated across 12 instruction-tuned LLMs (4B–671B parameters) on four multi-hop QA datasets, OC correlates with accuracy at Pearson r ∈ [0.86, 0.94] on every dataset, and it is more robust across datasets than self-consistency, semantic entropy, and P(True). At the per-question level, OC provides information beyond these baselines and improves selective prediction (AUARC lifts +0.086–0.096, AUROC lifts +0.092–0.164) at equal sampling cost. The results extend to frontier thinking models using chain-of-thought decompositions.

Evaluation and Benchmarking · AI Safety Research · operadic consistency · Chain-of-Thought · Self-Consistency · MuSiQue
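
The consistency check described in the summary above can be sketched roughly as follows. This is only an illustration of the idea (compare the direct answer with the answer obtained by chaining the model's own sub-questions), not the paper's protocol: the function names, the `{prev}` placeholder convention, the exact-match comparison, and the stub model are all assumptions made for this example.

```python
from typing import Callable, Dict, List


def normalize(answer: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def compose_answer(sub_questions: List[str], ask: Callable[[str], str]) -> str:
    """Answer sub-questions in order, feeding each answer into the next one.

    Hypothetical convention: a sub-question may contain "{prev}", which is
    replaced by the answer to the previous sub-question.
    """
    prev = ""
    for template in sub_questions:
        prev = ask(template.replace("{prev}", prev))
    return prev


def is_consistent(
    question: str,
    ask: Callable[[str], str],
    decompose: Callable[[str], List[str]],
) -> bool:
    """True when the direct answer matches the composed answer."""
    direct = ask(question)
    composed = compose_answer(decompose(question), ask)
    return normalize(direct) == normalize(composed)


if __name__ == "__main__":
    # Stub "model": a lookup table standing in for real LLM calls.
    facts: Dict[str, str] = {
        "Where was the director of Inception born?": "London.",
        "Who directed Inception?": "Christopher Nolan",
        "Where was Christopher Nolan born?": "London",
    }
    print(is_consistent(
        "Where was the director of Inception born?",
        ask=lambda q: facts[q],
        decompose=lambda q: ["Who directed Inception?", "Where was {prev} born?"],
    ))
```

In practice `ask` and `decompose` would wrap calls to the same model, and the boolean would likely be averaged over several samples; how the paper does either is not stated in the snippet.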

5 · Hugging Face Blog · 8d ago

AllenAI releases olmo-eval evaluation workbench for model development

AllenAI published a blog post on Hugging Face introducing olmo-eval, an evaluation workbench designed to integrate into the model development loop. The tool appears aimed at streamlining evaluation workflows for researchers iterating on open-weights models. This is relevant to the OLMo model family ecosystem and the broader open-weights evaluation infrastructure space.

Evaluation and Benchmarking · Open Weights Progress · OLMo · AllenAI · Hugging Face

4 · GitHub Trending · 6d ago

Open Interpreter: lightweight coding agent for open models (DeepSeek, Kimi, Qwen)

Open Interpreter is an open-source Python coding agent framework supporting open-weight models including DeepSeek, Kimi, and Qwen. The project has accumulated nearly 64,000 GitHub stars, with 45 new stars on the trending day. It provides a lightweight harness for running code-executing agents on locally hosted or open models.

Open Weights Progress · Agent and Tool Ecosystem · Kimi · DeepSeek V4 · Qwen