DeepResearch Bench
deepresearch-bench-90a53530 · 2 events · first seen 28d ago
Aliases: DeepResearch Bench
Recent events (2)
Measuring Open-Source Llama Nemotron Models on DeepResearch Bench
NVIDIA evaluates its open-source Llama Nemotron models on DeepResearch Bench, a benchmark designed to assess the capabilities of deep research agents. The post appears to report competitive performance of the Nemotron models on agentic research tasks. This is relevant to the ongoing development of open-weights models capable of multi-step research and reasoning workflows.
VeriTrace: Cognitive-Graph Framework with Explicit Regulatory Loops for Deep Research Agents
VeriTrace introduces a cognitive-graph framework for deep research agents that replaces implicit LLM reasoning over intermediate representations with three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The system addresses contamination and error propagation in evolving mental models during complex multi-step research tasks. With Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 percentage points (pp) on DeepResearch Bench Insight and by 5.9 pp in Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DeepResearch Bench.