DRFLOW
benchmark · active · provisional
drflow-e2a746ad · 1 event · first seen 7h ago
Aliases: DRFLOW
Recent events (1)
DRFLOW: Benchmark for Evaluating Agent Workflow Prediction from Heterogeneous Sources
Researchers introduce DRFLOW, a benchmark targeting a gap in the evaluation of deep research (DR) agents: predicting concrete, personalized action-step workflows rather than generating summaries or reports. The benchmark contains 100 tasks across five domains, grounded in over 3,900 sources, and defines seven diagnostic metrics covering factual grounding, step recovery, structural ordering, and personalization. The authors also present a reference agent, DRFA, which improves over strong baselines by up to 10% in average F1 but leaves substantial headroom, indicating that workflow prediction remains a hard open problem for DR agents.
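The summary does not define the seven metrics, so the following is only an illustrative sketch of what step recovery and structural ordering scores could look like for a predicted workflow. The function names (`step_f1`, `order_agreement`), the exact-match rule after lowercasing, and the example steps are all assumptions made here, not DRFLOW's actual definitions or data.

```python
from itertools import combinations
from typing import Sequence


def _norm(step: str) -> str:
    # Assumed matching rule: case-insensitive, whitespace-normalized equality.
    return " ".join(step.lower().split())


def step_f1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Set-based F1 over workflow steps (hypothetical step-recovery score)."""
    pred = {_norm(s) for s in predicted}
    ref = {_norm(s) for s in reference}
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def order_agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of matched step pairs kept in reference order (hypothetical)."""
    ref_pos = {_norm(s): i for i, s in enumerate(reference)}
    # Reference positions of the predicted steps that matched, in predicted order.
    matched = [ref_pos[_norm(s)] for s in predicted if _norm(s) in ref_pos]
    pairs = list(combinations(matched, 2))
    if not pairs:
        return 0.0
    return sum(a < b for a, b in pairs) / len(pairs)


# Made-up example workflow, for illustration only.
reference = ["Book flight", "Reserve hotel", "Pack bags", "Check in"]
predicted = ["book flight", "Pack bags", "Reserve hotel", "Buy insurance"]

f1 = step_f1(predicted, reference)
order = order_agreement(predicted, reference)
print(f"step F1 = {f1:.2f}, order agreement = {order:.2f}")
```

Here three of the four predicted steps match the reference, and two of the three matched pairs appear in the reference order. A real benchmark would likely need softer matching than exact string equality, since agents rarely phrase a step identically to the reference.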