Entity · benchmark

PinchBench

benchmarkactivepinchbench-2e9533f8·3 events·first seen Jun 2, 2026

Aliases: PinchBench

Co-occurring entities

More like this (12)

FoldBench SelectBench SorryBench CursorBench PaperBench ClawBench PhantomBench RepoBench RuBench JailbreakBench TriggerBench PIE-Bench

Recent events (3)

7The Batch·Jun 19, 2026·source ↗

Nvidia Nemotron 3 Ultra: hybrid Mamba-transformer open-weights model targeting agentic workloads

Nvidia released Nemotron 3 Ultra, a 550B parameter (55B active) hybrid Mamba-transformer mixture-of-experts model with a 1M token context window, publishing weights, training data, and RL environments under an open license. The model ranks as the highest-scoring U.S. open-weights model on the Artificial Analysis Intelligence Index (47.7-48.2) and is approximately three times faster than comparable open-weights rivals, though it trails leading Chinese models like Kimi K2.6 and DeepSeek V4 Pro on intelligence benchmarks. Nvidia used a novel Multi-Teacher On-Policy Distillation approach with 10+ specialized teacher models and trained using NVFP4 quantization. The release is strategically motivated by Nvidia's interest in a healthy open-weights ecosystem that drives AI semiconductor adoption.

Frontier Model Releases Open Weights Progress Mamba IFBench Artificial Analysis Intelligence Index +17 more

6arXiv · cs.AI·Jun 16, 2026·source ↗

TokenPilot: Dual-granularity context management cuts LLM agent inference costs by up to 87%

TokenPilot is a cache-efficient context management framework for LLM agents that addresses the trade-off between token sparsity and prompt cache continuity. It combines Ingestion-Aware Compaction (global prefix stabilization) with Lifecycle-Aware Eviction (local segment offloading) to reduce inference costs by 56–87% across benchmarks while maintaining competitive task performance. The system is evaluated on PinchBench and Claw-Eval and has been integrated into the open-source LightMem2 library.

Inference Economics Agent and Tool Ecosystem PinchBench Claw-Eval LightMem +2 more

7The Batch·Jun 2, 2026·source ↗

Nvidia releases Nemotron 3 Super 120B-A12B open-weights model with hybrid Mamba-2/MoE architecture

Nvidia released Nemotron 3 Super 120B-A12B, an open-weights LLM with a hybrid Mamba-2/transformer/MoE architecture that activates only 12B parameters per token and supports up to 1 million token context. The model claims the fastest inference speed in its size class at 442 tokens/second and leads open-weights models on PinchBench agentic task evaluation, outperforming larger models including Kimi K2.5 (1T parameters). Nvidia is releasing weights, training data, and recipes under a permissive commercial license, and plans a $26B five-year investment in open-weights models — framed partly as a strategic response to Chinese labs building capable open-weights models on non-Nvidia hardware.

Frontier Model Releases Open Weights Progress Nemotron 3 Super 120B-A12B Nemotron 3 Ultra-500B-A50B PivotRL +18 more