Opik: open-source LLM observability and evaluation platform by Comet ML
Opik is an open-source toolkit from Comet ML for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It provides tracing, automated evaluations, and production dashboards. The project has accumulated nearly 20K GitHub stars, indicating meaningful adoption in the practitioner community.
Related events (8)
Langfuse: Open Source LLM Engineering Platform Trending on GitHub
Langfuse is an open-source LLM engineering platform providing observability, metrics, evaluations, prompt management, and dataset tooling. It integrates with OpenTelemetry, LangChain, OpenAI SDK, and LiteLLM. The project has accumulated 28,075 GitHub stars with 89 new stars today, indicating sustained community traction. Backed by Y Combinator (W23), it represents a notable entry in the LLM ops/tooling ecosystem.
Onyx: Open Source AI Chat Platform with Multi-LLM Support
Onyx is an open-source AI chat platform written in Python that supports multiple LLMs with advanced features. The repository has accumulated 29,665 total stars with modest daily traction (+28 today). It positions itself as an enterprise-ready AI assistant that integrates with various language model backends.
LiteLLM AI gateway trending: 50K stars, unified interface for 100+ LLM APIs
LiteLLM is a Python SDK and proxy server providing a unified OpenAI-compatible interface to 100+ LLM APIs including Bedrock, Azure, OpenAI, VertexAI, Anthropic, and others. It includes cost tracking, guardrails, load balancing, and logging. The project is trending on GitHub with ~50K total stars and 141 new stars today, signaling continued strong adoption as an AI gateway layer.
MLflow trending on GitHub as open-source AI engineering platform
MLflow, an open-source platform for managing AI/ML workflows, is trending on GitHub with 26,442 total stars and 22 new stars today. The project supports agents, LLMs, and traditional ML models, offering debugging, evaluation, monitoring, and optimization capabilities for production AI applications. It is a mature, widely-used tooling platform in the MLOps space.
OpenKB: Open-source LLM knowledge base library gains traction on GitHub
VectifyAI has released OpenKB, an open-source Python library for building LLM-powered knowledge bases. The repository is trending on GitHub with 2,389 total stars and 208 new stars in a single day, suggesting meaningful community interest. No detailed technical description is available from the source snippet.
Operadic consistency: a label-free signal for detecting compositional reasoning failures in LLMs
Researchers introduce operadic consistency (OC), a label-free inference-time signal that checks whether an LLM's direct answer to a compositional query agrees with the answer produced by composing its own stated decomposition of that query. Evaluated across 12 instruction-tuned LLMs (4B–671B parameters) on four multi-hop QA datasets, OC correlates with accuracy at Pearson r ∈ [0.86, 0.94] on every dataset and outperforms self-consistency, semantic entropy, and P(True) in cross-dataset robustness. At the per-question level, OC carries information beyond those baselines and improves selective prediction (AUARC gains of +0.086–0.096, AUROC gains of +0.092–0.164) at equal sampling cost. The results extend to frontier thinking models that use chain-of-thought decompositions.
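The consistency check described above can be illustrated with a minimal sketch. This is not the researchers' implementation: the `Model` callable, the `{prev}` slot convention for chaining sub-questions, the exact-match comparison after normalization, and the toy lookup model are all assumptions made for illustration. The sketch also takes the decomposition as given, whereas the summary describes it as stated by the model itself.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a model is any callable mapping a prompt to answer text.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def composed_answer(model: Model, sub_questions: List[str]) -> str:
    """Answer sub-questions in order, feeding each answer into the next.

    Assumption: a later sub-question refers to the previous answer
    through a literal '{prev}' slot.
    """
    prev = ""
    for sub_question in sub_questions:
        prev = model(sub_question.replace("{prev}", prev))
    return prev


def is_consistent(model: Model, query: str, sub_questions: List[str]) -> bool:
    """True when the direct answer matches the answer composed from the parts."""
    direct = model(query)
    composed = composed_answer(model, sub_questions)
    return normalize(direct) == normalize(composed)


def make_toy_model(table: Dict[str, str]) -> Model:
    """Stand-in for an LLM call: a fixed lookup table (illustration only)."""
    return lambda prompt: table.get(prompt, "unknown")


QUERY = "Where was the director of Inception born?"
SUB_QUESTIONS = ["Who directed Inception?", "Where was {prev} born?"]

# The direct answer here is deliberately wrong, so it disagrees with the composition.
inconsistent_model = make_toy_model({
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
    QUERY: "Paris",
})

# Same sub-answers, but the direct answer agrees with the composed one.
consistent_model = make_toy_model({
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
    QUERY: "London.",
})

flag_inconsistent = is_consistent(inconsistent_model, QUERY, SUB_QUESTIONS)
flag_consistent = is_consistent(consistent_model, QUERY, SUB_QUESTIONS)
```

No ground-truth label is consulted at any point, which is what makes a signal of this kind label-free: a disagreement between the two routes flags the question without the correct answer being known. A real system would likely need a softer comparison than exact string match.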
AllenAI releases olmo-eval evaluation workbench for model development
AllenAI published a blog post on Hugging Face introducing olmo-eval, an evaluation workbench designed to integrate into the model development loop. The tool appears aimed at streamlining evaluation workflows for researchers iterating on open-weights models. This is relevant to the OLMo model family ecosystem and the broader open-weights evaluation infrastructure space.
Open Interpreter: lightweight coding agent for open models (Deepseek, Kimi, Qwen)
Open Interpreter is an open-source Python coding agent framework supporting open-weight models including Deepseek, Kimi, and Qwen. The project has accumulated nearly 64,000 GitHub stars, with 45 new stars on the trending day. It provides a lightweight harness for running code-executing agents on locally-hosted or open models.

