Entity · other

LLM agents

other · active · llm-agents-d25620a4 · 3 events · first seen May 19, 2026

Aliases: LLM agents, LLM Agent

Co-occurring entities

Map-Reduce Decomposition · Last-Mile Forecasting Framework · Time Series Foundation Models · Memory Bank · Agentic CLEAR · multi-level agent evaluation · task-conditioned generation · task-agnostic generation · SkillGenBench · skill generation

More like this (12)

LLM Bargaining Agents · LLM · LLM Agent Classroom · Arabic LLMs · vLLM · Multi-Component LLM Agent · LLM CLI · LLM inference · LLM (CLI tool) · LLM-as-a-Judge · whichllm · Progress Advantage for LLM Agents

Recent events (3)

5 · arXiv · cs.AI · Jun 2, 2026

LLM Agent Framework for Last-Mile Time Series Forecasting Revision

This paper introduces a 'last-mile forecasting' framework in which an LLM agent sits atop a statistical forecasting backbone and incorporates weakly structured business context (holidays, campaigns, expert feedback, external events) into decision-ready forecasts. The system uses tool invocation for contextual retrieval, converts reasoning into explicit revision actions under safety constraints, and supports long-horizon forecasting via map-reduce decomposition with a memory bank for post-hoc reflection. The authors validate the approach through real-world case studies, positioning it as a bridge between statistical prediction and operationally usable forecasts.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Map-Reduce Decomposition · Last-Mile Forecasting Framework · Time Series Foundation Models · +2 more
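
As a rough illustration of the mechanism summarized above, the following Python sketch splits a long forecast horizon into chunks (map), applies bounded revision actions to each chunk, concatenates the results (reduce), and logs every action in a memory bank. This is not the paper's implementation. Every name here (`Revision`, `MemoryBank`, `propose_revisions`, `revise_forecast`, the 20% cap) is a hypothetical assumption, and the LLM agent is replaced by a simple lookup in a dictionary of known events.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """A hypothetical explicit revision action on one forecast step."""
    step: int          # index into the full forecast horizon
    multiplier: float  # proposed scaling of the baseline value
    reason: str        # business context that motivated the change


@dataclass
class MemoryBank:
    """Hypothetical log of applied revisions, kept for later reflection."""
    entries: list = field(default_factory=list)

    def record(self, revision, applied_multiplier):
        self.entries.append((revision, applied_multiplier))


def clamp(multiplier, max_change):
    """Assumed safety constraint: cap any revision at +/- max_change."""
    return max(1 - max_change, min(1 + max_change, multiplier))


def propose_revisions(start, length, events):
    """Stand-in for the LLM agent: look up events that fall in this chunk."""
    return [
        Revision(step=step, multiplier=mult, reason=reason)
        for step, (mult, reason) in events.items()
        if start <= step < start + length
    ]


def revise_forecast(baseline, events, chunk_size=7, max_change=0.2):
    """Map over horizon chunks, revise each, then concatenate (reduce)."""
    memory = MemoryBank()
    revised = []
    for start in range(0, len(baseline), chunk_size):
        chunk = list(baseline[start:start + chunk_size])
        for rev in propose_revisions(start, len(chunk), events):
            applied = clamp(rev.multiplier, max_change)
            chunk[rev.step - start] *= applied
            memory.record(rev, applied)
        revised.extend(chunk)
    return revised, memory
```

For example, with a flat baseline of ten steps at 100.0 and a proposed 1.5x holiday uplift at step 2, the cap limits the applied multiplier to 1.2, and the memory bank keeps both the proposed and the applied value for later review.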

6 · arXiv · cs.CL · May 22, 2026

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

Agentic CLEAR is an automatic evaluation framework for LLM-based agentic systems that analyzes behavior at three granularity levels: system, trace, and node. Unlike existing tools that rely on static error taxonomies or focus only on observability, it dynamically generates textual insights, sits above the observability layer, and provides an accessible UI. Experiments across four benchmarks and seven agentic settings show strong alignment with human-annotated errors and predictive accuracy for task success rates.

Evaluation and Benchmarking · AI Safety Research · Agentic CLEAR · multi-level agent evaluation · LLM agents · +1 more

5 · arXiv · cs.AI · May 19, 2026

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

SkillGenBench is a new benchmark designed to evaluate the ability of LLM agents to generate correct, reusable, and executable skills from raw repositories and documents, rather than merely using pre-provided skills. It covers two generation regimes (task-conditioned and task-agnostic) and two procedural sources (repository-grounded and document-grounded), with standardized execution-based evaluation protocols. Experiments across multiple skill-generation methods reveal substantial performance variation and distinct failure modes depending on source type. The benchmark aims to establish skill generation as an independent research problem within agent systems.

Evaluation and Benchmarking · Agent and Tool Ecosystem · task-conditioned generation · task-agnostic generation · SkillGenBench · +2 more
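
To make the SkillGenBench setup above more concrete, here is a minimal Python sketch of the evaluation grid and of what an execution-based check could look like. It is not the benchmark's actual protocol. It assumes the two regimes and the two sources are crossed into four settings, which the summary does not state explicitly, and `settings` and `passes_execution_check` are hypothetical helpers. A real harness would run generated code in a sandbox rather than call `exec` directly.

```python
from itertools import product

# Labels taken from the summary; crossing them is an assumption.
REGIMES = ("task-conditioned", "task-agnostic")
SOURCES = ("repository-grounded", "document-grounded")


def settings():
    """Enumerate every (regime, source) pair in the assumed grid."""
    return list(product(REGIMES, SOURCES))


def passes_execution_check(skill_source, func_name, cases):
    """Run a generated skill against test cases.

    The skill passes only if it loads, defines func_name, and returns
    the expected value for every (args, expected) pair.
    """
    namespace = {}
    try:
        exec(skill_source, namespace)  # unsafe outside a sandbox
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing names, and runtime errors all count as failures.
        return False
```

A skill that defines the requested function but computes the wrong result fails in the same way as one that does not load at all, which is the sense in which the check is execution-based rather than text-based.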