FORGE
forge-98639926 · 3 events · first seen 1mo ago
Aliases: FORGE
Recent events (3)
FORGE: Self-Evolving Agent Memory via Population Broadcast Without Weight Updates
FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents without any gradient updates. It wraps a Reflexion-style inner loop where a reflection agent converts failed trajectories into textual heuristics or few-shot demonstrations, then propagates the best-performing instance's memory across a population between stages. Evaluated on CybORG CAGE-2 (a stochastic network-defense POMDP), FORGE improves average return by 1.7–7.7× over zero-shot and 29–72% over Reflexion across all 12 model-representation conditions tested with four LLM families. Notably, weaker models benefit disproportionately, suggesting the method may help close capability gaps rather than amplify already-strong models.
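The staged loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Instance`, `run_stage`, `forge_sketch`, `run_episode`, and `reflect` are hypothetical. Two details are assumptions, since the summary does not specify them: the best instance is chosen by mean return, and broadcast overwrites each instance's memory rather than merging it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signature: given the current memory, return (return, failed, trajectory).
EpisodeFn = Callable[[List[str]], Tuple[float, bool, str]]
# Hypothetical signature: turn a failed trajectory into one textual heuristic.
ReflectFn = Callable[[str], str]


@dataclass
class Instance:
    memory: List[str] = field(default_factory=list)  # prompt-injected text, no weights
    score: float = 0.0


def run_stage(inst: Instance, run_episode: EpisodeFn, reflect: ReflectFn, episodes: int) -> None:
    """Reflexion-style inner loop: act, then convert failures into heuristics."""
    returns = []
    for _ in range(episodes):
        ret, failed, trajectory = run_episode(inst.memory)
        returns.append(ret)
        if failed:
            inst.memory.append(reflect(trajectory))
    inst.score = sum(returns) / len(returns)  # assumption: rank by mean return


def forge_sketch(population_size: int, stages: int, episodes: int,
                 run_episode: EpisodeFn, reflect: ReflectFn) -> List[Instance]:
    population = [Instance() for _ in range(population_size)]
    for _ in range(stages):
        for inst in population:
            run_stage(inst, run_episode, reflect, episodes)
        best = max(population, key=lambda i: i.score)
        for inst in population:
            # Broadcast between stages: copy so instances do not share one list.
            inst.memory = list(best.memory)
    return population
```

In a real setting `run_episode` would call the LLM agent in the environment and `reflect` would call the reflection agent; here both are left as injected callables so that the control flow can be read on its own.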
FORGE benchmark reveals search-augmented LLMs vulnerable to fake product promotion via web content pollution
Researchers introduce FORGE, a benchmark measuring how often search-augmented LLMs recommend fake products when retrieval results are polluted with fabricated reviews or promotional pages. Across 12 commercial and open-weights models, a single polluted page causes fooled rates of up to 27%, rising to 73.8% when all top-3 results are replaced. Notably, chain-of-thought reasoning does not mitigate the vulnerability and often generates spurious social proof to justify false recommendations. The three defenses tested (skepticism prompting, model-prior filtering, and cross-document consensus) each carry significant drawbacks.
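To make the "fooled rate" figures concrete, here is a small sketch under an assumed definition: the share of queries for which the model's recommendation is the fabricated product. The benchmark's exact scoring rule is not given in this summary, so the function name `fooled_rate` and the toy data below are illustrative only and are not results from the benchmark.

```python
from typing import Sequence


def fooled_rate(recommendations: Sequence[str], fake_product: str) -> float:
    """Fraction of queries whose recommendation is the fabricated product.

    Assumed definition for illustration; the benchmark may score differently.
    """
    if not recommendations:
        return 0.0
    fooled = sum(1 for rec in recommendations if rec == fake_product)
    return fooled / len(recommendations)


# Toy data only: one recommendation per query, before and after pollution.
clean_run = ["BrandA", "BrandB", "BrandA", "BrandC"]
polluted_run = ["BrandA", "FakeCo", "FakeCo", "BrandC"]

clean_rate = fooled_rate(clean_run, "FakeCo")
polluted_rate = fooled_rate(polluted_run, "FakeCo")
```

Comparing the rate across pollution levels (for example, one polluted page versus all top-3 results replaced) is what yields figures of the kind quoted in the summary.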
Forge: Python Framework for Self-Hosted LLM Tool-Calling and Multi-Step Agentic Workflows
Forge is an open-source Python framework designed for self-hosted LLM deployments with tool-calling and multi-step agentic workflow capabilities. It has accumulated 1,396 total stars with a notable single-day spike of +449 stars, suggesting growing community interest. The project targets developers building local or self-hosted agent pipelines without reliance on managed API services.