5 · arXiv cs.AI (Artificial Intelligence) · 19d ago

LLM Agent Framework for Last-Mile Time Series Forecasting Revision

This paper introduces a 'last-mile forecasting' framework in which an LLM agent sits atop a statistical forecasting backbone and incorporates weakly structured business context (holidays, campaigns, expert feedback, external events) into decision-ready forecasts. The system uses tool invocation for contextual retrieval, converts reasoning into explicit revision actions under safety constraints, and supports long-horizon forecasting via map-reduce decomposition with a memory bank for post-hoc reflection. The authors validate the approach through real-world case studies, positioning it as a bridge between statistical prediction and operationally usable forecasts.
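To make "explicit revision actions under safety constraints" concrete, here is a minimal sketch, not the authors' implementation. It assumes a revision is a multiplicative adjustment to one step of the baseline forecast, and that the safety constraint is a simple cap on the relative change. The names RevisionAction, apply_revisions, and max_change are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RevisionAction:
    """Hypothetical explicit revision: scale one forecast step by a factor."""
    step: int      # index into the forecast horizon
    factor: float  # multiplicative adjustment proposed by the agent
    reason: str    # justification, kept so it can be reviewed later


def apply_revisions(baseline, actions, max_change=0.3):
    """Apply revisions to a copy of the baseline forecast.

    Assumed safety constraint: each factor is clamped to
    [1 - max_change, 1 + max_change]. The paper's constraints may differ.
    """
    revised = list(baseline)
    low, high = 1.0 - max_change, 1.0 + max_change
    for action in actions:
        if not 0 <= action.step < len(revised):
            continue  # skip revisions that point outside the horizon
        factor = min(max(action.factor, low), high)
        revised[action.step] = baseline[action.step] * factor
    return revised


baseline = [100.0, 100.0, 100.0]
actions = [
    RevisionAction(step=1, factor=1.2, reason="holiday promotion"),
    RevisionAction(step=2, factor=2.0, reason="campaign launch, will be clamped"),
]
revised = apply_revisions(baseline, actions)
```

Keeping the revision as a structured record rather than free text is what lets a deterministic layer enforce the cap before any number reaches a downstream planner.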

Related events (8)

4 · arXiv · cs.AI · 17d ago

AgentMob: Training-free LLM agent framework for evidence-grounded mobility prediction

AgentMob is a training-free LLM-driven agent framework that formulates next-location prediction as adaptive evidence-controlled decision making, using a fast path for routine cases and iterative tool use for ambiguous ones. Evaluated on three mobility datasets, it achieves the strongest overall performance among training-free LLM-based methods, with GPT-5.4 reaching 71.42% Acc@1 on the BW dataset. The framework demonstrates that LLM controllers add the most value when resolving ambiguous predictions through adaptive evidence gathering, not when handling routine cases.

4 · GitHub Trending · 20d ago

TradingAgents: Multi-Agent LLM Financial Trading Framework

TradingAgents is an open-source Python framework by TauricResearch that applies multi-agent LLM architectures to financial trading tasks. The repository has accumulated 81,650 GitHub stars with 284 added today, indicating strong community traction. It represents a concrete deployment pattern for agentic AI systems in quantitative finance.

5 · Hugging Face Blog · 1mo ago

Jupyter Agents: Training LLMs to Reason with Notebooks

Hugging Face published a blog post on training LLMs to operate as Jupyter notebook agents, enabling models to reason and execute code iteratively within notebook environments. The work covers dataset construction, training methodology, and evaluation for notebook-native agentic behavior. This represents a step toward LLMs that can conduct multi-step data analysis and experimentation autonomously within a familiar scientific computing interface.

6 · arXiv · cs.CL · 13d ago

Agentopia: Long-term multi-agent life simulation framework for training LLMs on social behavior

Researchers introduce Agentopia, a framework for simulating 10 years of social life across 100 LLM-powered agents, enabling study of emergent social behaviors and long-term personal growth dynamics. The system defines a 'life reward' metric mirroring human well-being and uses it to train LLMs via rejection sampling. Training on simulated social experience yields a +15.6% improvement on downstream role-playing benchmarks, suggesting that synthetic social simulation can generalize to real capability gains.

5 · arXiv · cs.AI · 12d ago

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Role-Agent is a new framework that uses a single LLM simultaneously as both agent and environment, enabling self-bootstrapped co-evolution without external environment feedback. The system has two components: World-In-Agent (WIA), which uses the alignment between predicted and actual states as a process reward, and Agent-In-World (AIW), which reshapes training data by retrieving tasks with similar failure patterns. Experiments across multiple benchmarks show an average performance gain of more than 4% relative to strong baselines. The approach addresses two key limitations in LLM agent training: inefficient feedback and static environments.

5 · arXiv · cs.CL · 3d ago

Multi-Agent Fictitious Play (MAFP) applies game-theoretic equilibrium-seeking to LLM decision-making

Researchers propose Multi-Agent Fictitious Play (MAFP), a multi-agent system paradigm that frames LLM-based decision-making as an equilibrium-seeking process borrowed from game theory. Each agent represents a stakeholder stance and iteratively best-responds to the empirical mixture of other agents' past decisions. This addresses what the authors call 'stance entanglement': mutual interdependence among stakeholder decisions that cannot be decomposed into independent subtasks. MAFP is evaluated on competitive strategy tasks and outperforms single-round and multi-round baselines on tournament strength and robustness metrics. The work extends the multi-agent systems (MAS) literature beyond divide-and-conquer execution patterns into interdependent decision scenarios.
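For readers unfamiliar with the underlying idea, the following is a sketch of classical fictitious play on a two-player matrix game, not of MAFP itself. MAFP uses LLM agents as the best-responders, whereas here the best response is computed directly from payoff matrices. The game (matching pennies), the uniform prior counts, and all function names are assumptions chosen for illustration.

```python
def best_response(payoff, opponent_counts):
    """Pick the action with the highest expected payoff against the
    empirical mixture of the opponent's past actions.

    payoff[i][j] is this player's payoff for playing i when the opponent plays j.
    """
    total = sum(opponent_counts)
    mixture = [count / total for count in opponent_counts]
    expected = [sum(p * q for p, q in zip(row, mixture)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff_a, payoff_b, rounds=2000):
    """Run simultaneous fictitious play and return each player's
    empirical action frequencies."""
    counts_a = [1] * len(payoff_a)  # assumed uniform prior over A's actions
    counts_b = [1] * len(payoff_b)  # assumed uniform prior over B's actions
    for _ in range(rounds):
        action_a = best_response(payoff_a, counts_b)
        action_b = best_response(payoff_b, counts_a)
        counts_a[action_a] += 1
        counts_b[action_b] += 1
    freq_a = [count / sum(counts_a) for count in counts_a]
    freq_b = [count / sum(counts_b) for count in counts_b]
    return freq_a, freq_b


# Matching pennies: A wins on a match, B wins on a mismatch.
payoff_a = [[1, -1], [-1, 1]]
payoff_b = [[-1, 1], [1, -1]]
freq_a, freq_b = fictitious_play(payoff_a, payoff_b)
```

In this zero-sum game the empirical frequencies drift toward the mixed equilibrium, in which each player uses both actions about half the time, even though every individual move is a deterministic best response.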

6 · arXiv · cs.AI · 1mo ago

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

This paper introduces the stochastic-deterministic boundary (SDB) as a foundational architectural primitive for production LLM agent runtimes, defining it as a four-part contract (proposer, verifier, commit step, reject signal) governing how LLM outputs become system actions. The authors organize agent runtime design around Coordination, State, and Control concerns, presenting a catalog of six runtime patterns applicable to conversational, autonomous, and long-horizon agents. They also contribute a five-step pattern-selection methodology and a diagnostic procedure that maps production failures to pattern weaknesses. In addition, they name a new failure mode, replay divergence, in which LLM consumers of deterministic event logs produce inconsistent outputs across model versions or prompt changes. The paper argues that as model variance decreases, architectural pattern choice and SDB strength become the dominant reliability levers.
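The four-part contract can be illustrated with a small sketch. This is one possible reading of the summary above, not the paper's reference design: the retry loop, the Rejection type, the refund example, and every function name here are hypothetical, and a stub stands in for the LLM proposer.

```python
from dataclasses import dataclass


@dataclass
class Rejection:
    """Reject signal: tells the caller (and the next proposal) what failed."""
    reason: str


def run_boundary(propose, verify, commit, max_attempts=3):
    """Route a stochastic proposal through a deterministic verifier.

    propose(feedback) -> proposal      (stochastic side, e.g. an LLM call)
    verify(proposal)  -> None or str   (deterministic check, str = error)
    commit(proposal)  -> result        (the only step with side effects)
    """
    feedback = None
    for _ in range(max_attempts):
        proposal = propose(feedback)
        error = verify(proposal)
        if error is None:
            return commit(proposal)
        feedback = Rejection(error)  # fed back into the next proposal
    return feedback  # all attempts rejected, nothing was committed


def make_stub_proposer():
    """Stand-in for an LLM: returns a bad proposal, then a good one."""
    outputs = iter([{"refund": -5}, {"refund": 20}])
    return lambda feedback: next(outputs)


def verify(proposal):
    amount = proposal.get("refund")
    if not isinstance(amount, int) or not 0 < amount <= 100:
        return "refund must be an integer in (0, 100]"
    return None


ledger = []


def commit(proposal):
    ledger.append(proposal["refund"])
    return proposal


result = run_boundary(make_stub_proposer(), verify, commit)
```

The point of the structure is that only commit touches system state, so a rejected proposal leaves no trace other than the reject signal.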

5 · arXiv · cs.CL · 6d ago

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.