Entity · technique

ReAct

technique · active · react-09364d28 · 8 events · first seen May 18, 2026

Aliases: ReAct

More like this (12)

RedAct · RELEX · ProAct · Replit · RAG · ProActEval · RePro · ARC-AGI · ReaORE · ACKTR · RD-Agent · Reversa

Guides (1)

ReAct · Concept

ReAct: How AI Agents Think and Act at the Same Time

Recent events (8)

5 · arXiv · cs.CL · Jul 10, 2026

WebSwarm: Recursive multi-agent framework for deep-and-wide web search

WebSwarm is a new multi-agent orchestration framework for LLM-based web search that uses progressive recursive delegation to handle both depth and breadth simultaneously. Each agent node couples a local objective with a search mode and can either solve its task or delegate to child nodes, passing evidence upward for parent-level expansion and aggregation. The system outperforms single-agent and multi-agent baselines on BrowseComp-Plus, WideSearch, DeepWideSearch, and GISA benchmarks. The work addresses known limitations of ReAct-style single-agent search and flat parallel multi-agent approaches.

Evaluation and Benchmarking · Agent and Tool Ecosystem · BrowseComp-Plus · WebSwarm · WideSearch · +3 more
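
The solve-or-delegate pattern described in the WebSwarm summary can be illustrated with a small sketch. This is not WebSwarm's implementation: the `Node` fields, the "deep"/"wide" mode labels, the `max_depth` cutoff, and the `toy_split`/`toy_solve` stubs are all assumptions made for illustration. In a real system both stubs would be LLM or search calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    objective: str          # local objective for this agent node
    mode: str               # search mode; "deep"/"wide" are hypothetical labels
    depth: int = 0
    evidence: List[str] = field(default_factory=list)


def run_node(
    node: Node,
    solve: Callable[[Node], str],
    split: Callable[[Node], List[Tuple[str, str]]],
    max_depth: int = 2,
) -> List[str]:
    """Solve the objective locally, or delegate to children and collect their evidence."""
    subtasks = split(node) if node.depth < max_depth else []
    if not subtasks:
        # Leaf case: the node handles its own objective.
        node.evidence.append(solve(node))
        return node.evidence
    for objective, mode in subtasks:
        child = Node(objective, mode, node.depth + 1)
        # Evidence flows upward: the parent aggregates what each child returns.
        node.evidence.extend(run_node(child, solve, split, max_depth))
    return node.evidence


def toy_split(node: Node) -> List[Tuple[str, str]]:
    # Assumption: a "wide" node fans out over comma-separated items.
    if node.mode == "wide" and "," in node.objective:
        return [(part.strip(), "deep") for part in node.objective.split(",")]
    return []


def toy_solve(node: Node) -> str:
    return f"found:{node.objective}"


root = Node("alpha, beta, gamma", "wide")
result = run_node(root, toy_solve, toy_split)
print(result)
```

The depth cutoff is one simple way to bound recursion; the paper's own stopping and expansion rules are not given in the summary above.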

5 · arXiv · cs.AI · Jul 7, 2026

SovereignPA-Bench: A benchmark for evaluating user sovereignty in personal AI agents

Researchers introduce SovereignPA-Bench, an executable benchmark designed to evaluate personal AI agents on dimensions beyond task completion, including privacy preservation, consent compliance, evidence grounding, manipulation resistance, and user burden. The benchmark tests 120 sovereignty stress scenarios across 4 model families and 8 policy baselines, generating 3,840 frozen-prompt trajectories with a blinded human audit component. Results show that full-sovereign scaffolding outperforms all baselines on sovereignty metrics while reducing privacy leakage and consent violations. The work argues that personal-agent evaluation must shift from task success toward consent-aware, evidence-grounded action.

Evaluation and Benchmarking · AI Safety Research · ReAct · SovereignPA-Bench · +1 more

5 · arXiv · cs.CL · Jun 10, 2026

TRACE: Tree-structured rollout budget allocation for efficient agentic RL training

TRACE (Tree Rollout Allocation for Contrastive Exploration) is a new framework for improving reinforcement learning with verifiable rewards (RLVR) in multi-turn agentic LLM settings. The method models each ReAct-style thought-action-observation turn as a distinct node, enabling budget allocation across both prompt-level and turn-level prefixes in a tree structure, rather than only at the prompt level. A shared predictor estimates conditional success probability at each anchor to guide allocation, enriching reward contrast within a fixed sampling budget. Empirically, TRACE improves Qwen3-14B multi-hop QA accuracy by 2.8 points over baselines at equal sampling cost.

Evaluation and Benchmarking · Agent and Tool Ecosystem · RLVR · TRACE · ReAct · +2 more

6 · The Batch · Jun 3, 2026

Data Points: Perplexity Computer expands, Google Aletheia math agent, DeepSeek chip strategy, Nvidia retrieval pipeline, Stargate cancellation

The Batch's weekly data points roundup covers five significant AI developments: Perplexity expanded its Computer agentic platform to desktop, mobile, and enterprise with new APIs and financial data tools; Google released Aletheia, a Gemini-based math research agent achieving 95.1% on IMO-Proof Bench Advanced (up from 65.7%); DeepSeek withheld pre-release access to its V4 model from Nvidia and AMD while giving domestic Chinese chipmakers early access; Nvidia's NeMo Retriever topped the ViDoRe v3 leaderboard using a ReAct-based agentic retrieval loop; and OpenAI and Oracle cancelled plans to expand the Abilene Stargate campus from 1.2 GW to 2.0 GW due to financing and reliability issues.

Training Infrastructure · Frontier Model Releases · ViDoRe v3 · Crusoe · BRIGHT · +19 more

4 · The Batch · Jun 1, 2026

Coding Agents Accelerate Some Software Tasks More Than Others

Andrew Ng offers a practitioner framework ranking how much coding agents accelerate different software work: frontend development benefits most (agents close the loop via browser feedback), followed by backend, infrastructure, and research in decreasing order. Backend work still requires skilled developers to handle corner cases and security; infrastructure decisions remain largely human-driven due to complex tradeoffs and limited LLM knowledge in that domain; research is least accelerated because ideation and hypothesis iteration are not primarily coding tasks. The commentary is aimed at helping engineering managers set realistic expectations and organize teams accordingly.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · TypeScript · DeepLearning.AI · coding agents · +2 more

5 · arXiv · cs.AI · May 27, 2026

Maat: ReAct-Based Agentic Legal Research Assistant for Competition Law

Maat is a ReAct agent designed specifically for competition law research, orchestrating tools for RAG-based retrieval, web search fallback, and citation generation. Built iteratively with domain experts, it addresses hallucination and citation gaps found in general assistants (Claude, ChatGPT) and legal-specific models (SaulLM-7B, LegalGPT). Maat significantly outperforms baselines on case-specific tasks and matches top baselines on theoretical questions. The evaluation dataset is publicly released on GitHub.

Evaluation and Benchmarking · Enterprise Deployment Patterns · ChatGPT · Maat · Claude · +5 more
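
For readers new to the pattern, the thought-action-observation loop behind a tool-orchestrating ReAct agent such as the one in the Maat summary can be sketched roughly as follows. Everything here is a hypothetical stand-in: `retrieve`, `web_search`, the toy corpus, and `scripted_policy` (which replaces the LLM) are assumptions for illustration and do not reflect Maat's actual tools or prompts.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str, str, str]  # (thought, action, argument, observation)


def retrieve(query: str) -> str:
    # Hypothetical RAG tool backed by a toy in-memory corpus.
    corpus = {"article 102": "toy passage on Article 102"}
    return corpus.get(query.lower(), "")


def web_search(query: str) -> str:
    # Hypothetical fallback tool; a real agent would call a search API here.
    return f"web result for '{query}'"


TOOLS: Dict[str, Callable[[str], str]] = {"retrieve": retrieve, "web_search": web_search}


def scripted_policy(question: str, history: List[Turn]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, argument) for the next turn."""
    if not history:
        return ("Check the indexed corpus first.", "retrieve", question)
    _, last_action, _, last_obs = history[-1]
    if last_action == "retrieve" and not last_obs:
        return ("Nothing retrieved, so fall back to web search.", "web_search", question)
    return ("Evidence found, answer with a citation.", "finish",
            f"{last_obs} [source: {last_action}]")


def react(question: str, policy=scripted_policy, max_turns: int = 5):
    """Alternate reasoning and tool calls until the policy emits 'finish'."""
    history: List[Turn] = []
    for _ in range(max_turns):
        thought, action, arg = policy(question, history)
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)   # act, then observe the tool output
        history.append((thought, action, arg, observation))
    return "no answer within turn budget", history


answer, trace = react("Article 102")
fallback_answer, fallback_trace = react("unknown case")
print(answer)
print(fallback_answer)
```

Tagging the final answer with the tool that produced the evidence is one simple way to keep citations attached to observations; the turn budget guards against loops that never reach `finish`.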

6 · arXiv · cs.CL · May 26, 2026

Semantic vs. Surface Noise in LLM Agents: 68-Cell Measurement Study with Held-Out Validation

This paper documents an empirical phenomenon across 10 LLMs from 7 architecture families: meaning-bearing perturbations (paraphrase, synonym substitution) cause final-answer inconsistency ~19.69 percentage points more often than presentation-level perturbations (formatting, reordering) of comparable severity, across the GSM8K, MATH, and HotpotQA benchmarks. The effect is validated on a held-out 11th model (Qwen2.5-14B-Instruct) with 1,800 trajectories. Trace-level analysis supports a 'stealth-divergence' picture in which semantic perturbations preserve the first action but induce divergence in intermediate reasoning steps, while two prior mechanism claims are explicitly retracted. The study is notable for its honest reporting of stress-test failures and its pre-registered replication.

Evaluation and Benchmarking · AI Safety Research · Qwen2.5-7B-Instruct-1M · ReAct · stealth-divergence · +5 more

6 · arXiv · cs.LG · May 18, 2026

FORGE: Self-Evolving Agent Memory via Population Broadcast Without Weight Updates

FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents without any gradient updates. It wraps a Reflexion-style inner loop where a reflection agent converts failed trajectories into textual heuristics or few-shot demonstrations, then propagates the best-performing instance's memory across a population between stages. Evaluated on CybORG CAGE-2 (a stochastic network-defense POMDP), FORGE improves average return by 1.7–7.7× over zero-shot and 29–72% over Reflexion across all 12 model-representation conditions tested with four LLM families. Notably, weaker models benefit disproportionately, suggesting the method may help close capability gaps rather than amplify already-strong models.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Reflexion · Grok-4-Fast · ReAct · +6 more
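
The staged structure in the FORGE summary (a reflection inner loop per agent, then broadcasting the best instance's memory across the population) might look roughly like the toy sketch below. It is not the paper's protocol: the environment, the return threshold in `reflect`, the function names, and all parameter values are assumptions chosen so the example runs on its own. Memory is a plain list of strings, so nothing here involves weight updates.

```python
import random
from typing import List


def run_episode(memory: List[str], rng: random.Random) -> float:
    # Toy environment (assumption): each stored heuristic raises the return by 1.
    return len(memory) + rng.random()


def reflect(memory: List[str], episode_return: float) -> List[str]:
    # Stand-in for the reflection agent: a weak episode yields a new textual heuristic.
    if episode_return < 3.0:
        return memory + [f"heuristic-{len(memory) + 1}"]
    return memory


def forge_like(population_size: int = 4, stages: int = 3,
               inner_steps: int = 2, seed: int = 0) -> List[List[str]]:
    rng = random.Random(seed)
    population: List[List[str]] = [[] for _ in range(population_size)]
    for _stage in range(stages):
        scores = []
        for i, memory in enumerate(population):
            episode_return = 0.0
            for _step in range(inner_steps):
                # Inner loop: act, then reflect on the outcome.
                episode_return = run_episode(memory, rng)
                memory = reflect(memory, episode_return)
            population[i] = memory
            scores.append(episode_return)
        # Broadcast: every instance starts the next stage from the best memory.
        best = population[scores.index(max(scores))]
        population = [list(best) for _ in population]
    return population


final_population = forge_like()
print(final_population[0])
```

Copying the best memory (rather than sharing one list object) keeps instances independent in the next stage, so their memories can diverge again before the following broadcast.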