person
Rishabh Sabharwal
person · active · provisional
rishabh-sabharwal-c8c80997 · 1 event · first seen 8d ago
Aliases: Rishabh Sabharwal
Recent events (1)
Multi-turn evaluation reveals deep research agents fail to compound gains from process-level feedback
A new arXiv paper evaluates deep research agents (DRAs) over multiple feedback turns, comparing self-reflection with process-level feedback via a novel method called Research Gap Inference (RGI). Key findings: self-reflection yields negligible net improvement; a single round of process-level feedback raises normalized scores by 8-15 points (an incorporation rate of roughly 35-40%); but the gains do not compound across turns, because agents regress on up to 24% of previously satisfied criteria. The results suggest that reliable multi-turn improvement remains out of reach for current DRA architectures, which points to a fundamental limitation of iterative agentic research workflows.