Entity · benchmark

DRFLOW

benchmark · active · drflow-e2a746ad · 1 event · first seen Jun 17, 2026

Aliases: DRFLOW

Co-occurring entities

DRFLOW-Agent

More like this (12)

DRFLOW-Agent · FlowDPO · ShellFlow · DataFlow · MLflow · MeanFlow · RAGFlow · FlowPipe · DFly · Flow-GRPO · LatentFlow · SINT-Flow

Recent events (1)

4 · arXiv · cs.AI · Jun 17, 2026

DRFLOW: Benchmark for Evaluating Agent Workflow Prediction from Heterogeneous Sources

Researchers introduce DRFLOW, a benchmark targeting a gap in deep research (DR) agent evaluation: predicting concrete, personalized action-step workflows rather than generating summaries or reports. The benchmark contains 100 tasks across five domains, grounded in over 3,900 sources, with seven diagnostic metrics covering factual grounding, step recovery, structural ordering, and personalization. A reference agent (DRFA) is also presented, improving over strong baselines by up to 10% average F1 but leaving substantial headroom, indicating workflow prediction remains a hard open problem for DR agents.
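To make the "step recovery" and "structural ordering" ideas concrete, here is a minimal Python sketch of how a predicted workflow could be scored against a reference one. It is an illustration only, not DRFLOW's actual metric definitions, which this summary does not give. The function names (step_f1, order_agreement), the exact-match comparison of step labels, and the example step names are all assumptions made for this sketch.

```python
from itertools import combinations


def step_f1(predicted, reference):
    """Set-based precision, recall, and F1 over exact-match step labels.

    Assumption: steps are compared as exact strings. A real benchmark would
    likely need fuzzy or semantic matching.
    """
    pred, ref = set(predicted), set(reference)
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def order_agreement(predicted, reference):
    """Fraction of shared step pairs that keep their reference order.

    Assumption: step labels are unique within each workflow.
    """
    ref_pos = {step: i for i, step in enumerate(reference)}
    # Shared steps, listed in the order the prediction gives them.
    shared = [step for step in predicted if step in ref_pos]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    kept = sum(1 for a, b in pairs if ref_pos[a] < ref_pos[b])
    return kept / len(pairs)


if __name__ == "__main__":
    # Hypothetical step labels, invented for illustration.
    reference = ["collect_sources", "extract_facts", "draft_plan", "personalize"]
    predicted = ["collect_sources", "draft_plan", "extract_facts", "summarize"]
    print(step_f1(predicted, reference))
    print(order_agreement(predicted, reference))
```

Keeping the two scores separate mirrors the summary's distinction between recovering the right steps and ordering them correctly: a prediction can contain most of the reference steps yet still sequence them poorly.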

Evaluation and Benchmarking · Agent and Tool Ecosystem · DRFLOW-Agent · DRFLOW