GitHub Trending (AI/LLM filtered) · 17d ago

NVIDIA NeMo Gym: framework for evaluating and improving models and agents using environments

NVIDIA's NeMo team has published a Python library called NeMo Gym on GitHub, designed to evaluate and improve models and agents through environment-based interaction. The repository has 941 stars with minimal recent traction (+1 today). It appears to be an RL-style evaluation and training harness within the NeMo ecosystem.
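The item does not document NeMo Gym's actual interfaces, so what follows is only a minimal sketch of the general idea behind environment-based evaluation. An environment poses tasks, a verifier turns each agent response into a scalar reward, and the mean reward is the score. Every name here (`Task`, `exact_match_reward`, `evaluate`, `toy_agent`) is hypothetical and is not NeMo Gym's API.

```python
# Hypothetical sketch of environment-based evaluation; NOT the NeMo Gym API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    prompt: str    # what the environment shows the agent
    expected: str  # reference answer the verifier checks against


def exact_match_reward(response: str, task: Task) -> float:
    """Hypothetical verifier: 1.0 if the normalized response matches, else 0.0."""
    return 1.0 if response.strip().lower() == task.expected.strip().lower() else 0.0


def evaluate(agent: Callable[[str], str], tasks: List[Task],
             verifier: Callable[[str, Task], float] = exact_match_reward) -> float:
    """Run the agent on every task, score each response, return the mean reward."""
    rewards = [verifier(agent(t.prompt), t) for t in tasks]
    return sum(rewards) / len(rewards) if rewards else 0.0


# Toy usage: an "agent" that only knows one answer.
tasks = [Task("2+2?", "4"), Task("Capital of France?", "Paris")]


def toy_agent(prompt: str) -> str:
    return {"2+2?": "4"}.get(prompt, "unknown")


score = evaluate(toy_agent, tasks)
print(score)  # 0.5
```

Because each task yields a reward rather than only a pass/fail log, the same per-task rewards could in principle feed an RL-style training loop as well as a benchmark score. That is an assumption about the general design pattern, not a description of how NeMo Gym works.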

Evaluation and Benchmarking · Agent and Tool Ecosystem · NVIDIA NeMo Gym

Related guides (3)

NVIDIA

NVIDIA: The Hardware Backbone of the AI Era

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

GitHub Trending · 7d ago

NVIDIA PhysicsNeMo: open-source Physics-ML deep learning framework

NVIDIA has published PhysicsNeMo, an open-source Python framework for building, training, and fine-tuning deep learning models using Physics-ML methods. The repository has accumulated 2,933 stars on GitHub. Physics-informed ML is a growing area relevant to scientific computing and simulation workloads.

Agent and Tool Ecosystem · PhysicsNeMo · NVIDIA

Hugging Face Blog · 1mo ago

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

NVIDIA and Hugging Face present an evaluation methodology for the Nemotron 3 Nano model using the NeMo Evaluator framework. The post describes benchmark results and an open evaluation recipe intended to standardize how small/nano-scale models are assessed. It positions NeMo Evaluator as a reproducible, open evaluation stack for the community.

Evaluation and Benchmarking · Open Weights Progress · Nemotron 3 Nano Omni · NeMo Evaluator · NVIDIA

Hugging Face Blog · 1mo ago

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

NVIDIA evaluates its open-source Llama Nemotron models on the DeepResearch Bench, a benchmark designed to assess deep research agent capabilities. The post appears to report competitive performance of the Nemotron models in agentic research tasks. This is relevant to the ongoing development of open-weights models capable of multi-step research and reasoning workflows.

Evaluation and Benchmarking · Open Weights Progress · Llama Nemotron · NVIDIA · DeepResearch Bench

OpenAI Blog · 1mo ago

OpenAI Gym Beta Release

OpenAI released the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms. The toolkit includes a suite of environments ranging from simulated robots to Atari games, along with a site for comparing and reproducing results. This represented a significant early infrastructure contribution to the RL research community.
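What made Gym useful for comparing algorithms was a simple environment contract: `reset()` returns an initial observation and `step(action)` returns `(observation, reward, done, info)`, the convention used by the original releases. The sketch below is a self-contained toy corridor environment written to that classic convention (it does not import `gym`); `CorridorEnv` and `run_episode` are illustrative names, and the reward scheme is an arbitrary assumption.

```python
import random


class CorridorEnv:
    """Toy 1-D corridor following the classic Gym reset()/step() convention.

    Illustrative only: the agent starts at cell 0 and earns reward 1.0 on
    reaching the last cell; episodes also end after max_steps.
    """

    def __init__(self, length: int = 5, max_steps: int = 20):
        self.length = length
        self.max_steps = max_steps

    def reset(self) -> int:
        self.pos = 0
        self.steps = 0
        return self.pos  # observation = current cell index

    def step(self, action: int):
        # Action 1 moves right, anything else moves left; clamp to the corridor.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.pos == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.pos, reward, done, {}  # (observation, reward, done, info)


def run_episode(env, policy) -> float:
    """Roll out one episode with `policy(obs) -> action`; return total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total


env = CorridorEnv()
print(run_episode(env, lambda obs: 1))  # always move right: 1.0
rng = random.Random(0)
print(run_episode(env, lambda obs: rng.choice([0, 1])))  # random policy
```

Because any policy can be run against the same environment and scored by total reward, different algorithms become directly comparable. The maintained successor, Gymnasium, changed this contract: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`.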

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI · Atari · OpenAI Gym

The Batch · 35h ago

Nvidia Nemotron 3 Ultra: hybrid Mamba-transformer open-weights model targeting agentic workloads

Nvidia released Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-transformer mixture-of-experts model with a 1M-token context window, publishing weights, training data, and RL environments under an open license. It ranks as the highest-scoring U.S. open-weights model on the Artificial Analysis Intelligence Index (47.7 to 48.2) and is approximately three times faster than comparable open-weights rivals, though it trails leading Chinese models such as Kimi K2.6 and DeepSeek V4 Pro on intelligence benchmarks. Nvidia used a novel Multi-Teacher On-Policy Distillation approach with 10+ specialized teacher models and trained the model using NVFP4 quantization. Strategically, the release serves Nvidia's interest in a healthy open-weights ecosystem that drives adoption of its AI semiconductors.

Frontier Model Releases · Open Weights Progress · Mamba · IFBench · Artificial Analysis Intelligence Index

Hugging Face Blog · 1mo ago

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA has released Nemotron 3 Nano Omni, a multimodal model targeting long-context understanding across documents, audio, and video modalities. The model is positioned for agentic use cases requiring cross-modal reasoning. It is published via the Hugging Face blog as part of NVIDIA's Nemotron model family. No detailed technical specifications or benchmark results are provided in the available body text.

Long Context Evolution · Open Weights Progress · Nemotron 3 Nano Omni · NVIDIA · Hugging Face

OpenAI Blog · 1mo ago

OpenAI Releases Neural MMO: Massively Multiagent RL Game Environment

OpenAI released Neural MMO, a massively multiagent game environment designed for reinforcement learning research. The platform supports a large and variable number of agents operating within a persistent, open-ended task structure. The environment is designed to encourage emergent behaviors including better exploration, divergent niche formation, and improved overall agent competence through multi-species competition.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Reinforcement Learning · OpenAI · Neural MMO

GitHub Trending · 24d ago

NVIDIA NeMo Megatron-Bridge: Bidirectional Hugging Face Conversion for Megatron-Based Training

Megatron-Bridge is an NVIDIA NeMo training library for Megatron-based models that supports bidirectional conversion between Megatron and Hugging Face formats. The repository has accumulated 670 stars with modest daily growth (+5). It addresses a practical interoperability gap between the high-performance Megatron training stack and the broader Hugging Face ecosystem.

Training Infrastructure · Agent and Tool Ecosystem · NVIDIA NeMo · Hugging Face · Megatron-Bridge