DeepResearch Bench

benchmark · active · deepresearch-bench-90a53530 · 2 events · first seen May 19, 2026

Aliases: DeepResearch Bench

Co-occurring entities

DeepSeek V4 · cognitive-graph · Qwen3.6-27B · DeepConsult · VeriTrace · Llama Nemotron · NVIDIA · Hugging Face · Meta Llama

More like this (12)

Deep Research · DeepWeb-Bench · Hebbia Deep Research · Open Deep Research · FutureBench · deep research agents · WildBench · PaperBench · RepoBench · SpecBench · EdgeBench · HealthBench

Recent events (2)

6 · arXiv · cs.AI · May 26, 2026

VeriTrace: Cognitive-Graph Framework with Explicit Regulatory Loops for Deep Research Agents

VeriTrace introduces a cognitive-graph framework for deep research agents that replaces implicit LLM reasoning over intermediate representations with three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The system targets contamination and error propagation in the agent's evolving mental model during complex multi-step research tasks. With Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on the DeepResearch Bench Insight score and by 5.9 pp in Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DeepResearch Bench.

Frontier Model Releases · Evaluation and Benchmarking · DeepSeek V4 · cognitive-graph · DeepResearch Bench · +4 more

5 · Hugging Face Blog · May 19, 2026

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

NVIDIA evaluates its open-source Llama Nemotron models on DeepResearch Bench, a benchmark designed to assess the capabilities of deep research agents. The post appears to report competitive performance for the Nemotron models on agentic research tasks. This is relevant to the ongoing development of open-weights models capable of multi-step research and reasoning workflows.

Evaluation and Benchmarking · Open Weights Progress · Llama Nemotron · NVIDIA · DeepResearch Bench · +3 more