5 · arXiv cs.AI (Artificial Intelligence) · 1mo ago

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

This paper introduces an agentic framework where an LLM acts as an operations research expert, translating natural-language user prompts into structured updates ('patches') to deployed optimization models and selecting appropriate re-optimization techniques from a toolbox. The toolbox leverages primal information—historical solutions, valid inequalities, solver configurations, and metaheuristics—to accelerate re-optimization while preserving solution quality. Experiments on supply chain re-optimization and university exam scheduling demonstrate computational efficiency gains and improved interpretability through patch-based model modifications. The framework aims to reduce dependence on OR experts for maintaining dynamic decision-support systems.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · LLM-Guided Model Patches · agentic re-optimization framework · supply chain re-optimization · operations research (OR) · university exam scheduling
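
To make the idea of a 'patch' concrete, here is a minimal sketch of a structured update applied to a toy constraint model, followed by a check of whether a historical solution still satisfies the patched model (one simple way primal information can be reused). Everything in it is an assumption for illustration: the `Constraint`, `Model`, and `Patch` types, the `add`/`remove`/`set_rhs` operations, and the `apply_patch` and `violated` helpers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Constraint:
    """Linear constraint of the form sum(coeffs[v] * x[v]) <= rhs."""
    name: str
    coeffs: Dict[str, float]
    rhs: float


@dataclass
class Model:
    """Toy stand-in for a deployed optimization model (hypothetical)."""
    constraints: Dict[str, Constraint] = field(default_factory=dict)


@dataclass
class Patch:
    """One structured edit; op is 'add', 'remove', or 'set_rhs' (hypothetical schema)."""
    op: str
    name: str
    constraint: Optional[Constraint] = None
    rhs: Optional[float] = None


def apply_patch(model: Model, patch: Patch) -> Model:
    """Return a patched copy, leaving the deployed model untouched."""
    updated = dict(model.constraints)
    if patch.op == "add":
        if patch.constraint is None:
            raise ValueError("'add' needs a constraint")
        updated[patch.name] = patch.constraint
    elif patch.op == "remove":
        updated.pop(patch.name)
    elif patch.op == "set_rhs":
        if patch.rhs is None:
            raise ValueError("'set_rhs' needs a new right-hand side")
        old = updated[patch.name]
        updated[patch.name] = Constraint(old.name, old.coeffs, patch.rhs)
    else:
        raise ValueError(f"unknown patch op: {patch.op}")
    return Model(updated)


def violated(model: Model, solution: Dict[str, float], tol: float = 1e-9) -> List[str]:
    """Names of the constraints that the given solution breaks."""
    names = []
    for c in model.constraints.values():
        lhs = sum(coef * solution.get(var, 0.0) for var, coef in c.coeffs.items())
        if lhs > c.rhs + tol:
            names.append(c.name)
    return names


# A capacity of 10 is tightened to 8, as a prompt such as
# "warehouse capacity drops to 8 units" might be translated.
model = Model({"capacity": Constraint("capacity", {"x": 1.0, "y": 1.0}, 10.0)})
previous = {"x": 6.0, "y": 4.0}  # historical solution
patched = apply_patch(model, Patch("set_rhs", "capacity", rhs=8.0))

print(violated(model, previous))    # []
print(violated(patched, previous))  # ['capacity']
```

In this sketch the list of violated constraints tells a caller whether the old solution can be reused directly or only as a starting point that needs repair. Keeping each change as a small, named record is also what would make the modification easy to review.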

Related guides (2)

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

5 · arXiv · cs.CL · 25d ago

Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

This paper investigates multi-objective prompt optimization for LLM-as-judge systems, testing five decomposition modes of textual gradient optimizers across varying levels of cross-task information sharing. In 6 of 10 configurations, optimization fails to improve over the initial prompt, with gradient specificity dropping 59% when multiple criteria are processed jointly. The authors identify two separable failure modes: gradient dilution at optimization time and instruction interference at inference time. These findings constrain the design space for customizing LLM judges via textual feedback across multiple evaluation criteria simultaneously.

Evaluation and Benchmarking · Agent and Tool Ecosystem · MGDA · Multi-Task Learning · LLM-as-a-Judge

4 · Hugging Face Blog · 1mo ago

Optimizing your LLM in production

A Hugging Face blog post covering practical techniques for optimizing large language models in production environments. The post likely addresses inference efficiency methods such as quantization, batching, caching, and hardware utilization strategies. It serves as a practitioner-oriented guide for deploying LLMs at scale.

Inference Economics · Enterprise Deployment Patterns · Hugging Face

4 · Hugging Face Blog · 1mo ago

Investing in Performance: Fine-tune small models with LLM insights (a CFM case study)

This Hugging Face blog post presents a case study from CFM (Capital Fund Management) on using large language model outputs to guide fine-tuning of smaller, more efficient models for financial applications. The approach leverages LLM-generated signals or labels to train compact models that can be deployed at lower cost and latency. The case study illustrates an enterprise pattern of distilling LLM capabilities into task-specific smaller models for production use.

Inference Economics · Enterprise Deployment Patterns · knowledge distillation · Hugging Face · Capital Fund Management

5 · arXiv · cs.CL · 5d ago

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.

Agent and Tool Ecosystem · Alignment and RLHF · ALFWorld · Sokoban · RePro

4 · arXiv · cs.CL · 47h ago

Survey proposes four-layer architecture for token-operations-oriented LLM inference optimization

A new arXiv preprint introduces a four-layer technical architecture—Multi-model Fusion, Model Optimization, Compute-Model Fusion, and Compute-Network-Model Fusion—for systematically organizing LLM inference optimization techniques. The paper reviews key technologies and industry status at each layer and analyzes their application in real-world business scenarios. The framing around 'token operations' positions inference optimization as an operational discipline analogous to traditional IT operations.

Training Infrastructure · Inference Economics · Token-Operations-Oriented Inference Optimization Techniques for Large Models

6 · arXiv · cs.LG · 9d ago

APPO: Fine-grained branching and credit assignment for agentic RL in LLMs

Researchers introduce Agentic Procedural Policy Optimization (APPO), a reinforcement learning method that shifts branching and credit assignment from coarse tool-call boundaries to fine-grained decision points within generated sequences. APPO uses a Branching Score combining token uncertainty with policy-induced likelihood gains to select exploration points, plus procedure-level advantage scaling for credit distribution. Evaluated on 13 benchmarks, APPO improves strong agentic RL baselines by nearly 4 points while maintaining efficient tool use and interpretability. The work addresses a known weakness in multi-turn agentic RL: that influential decisions are distributed throughout sequences, not concentrated at tool-call boundaries.

Agent and Tool Ecosystem · Alignment and RLHF · APPO: Agentic Procedural Policy Optimization

5 · Hugging Face Blog · 1mo ago

We Got Claude to Fine-Tune an Open Source LLM

Hugging Face demonstrates using Claude (Anthropic's model) as an orchestrating agent to autonomously fine-tune an open-source LLM, showcasing an agentic workflow for model training. The post illustrates how a frontier model can handle the end-to-end process of dataset preparation, training configuration, and execution for a smaller open-weights model. This represents a practical example of AI-assisted ML engineering and agent-tool ecosystem development.

Open Weights Progress · Agent and Tool Ecosystem · Claude · Hugging Face · Anthropic

5 · Hugging Face Blog · 1mo ago

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Hugging Face published a detailed tutorial demonstrating how to fine-tune Meta's LLaMA model using Reinforcement Learning from Human Feedback (RLHF) on StackExchange data. The guide covers the full pipeline: supervised fine-tuning, reward model training, and PPO-based RL optimization. It serves as a practical reference for practitioners seeking to replicate RLHF workflows on open-weight models using the TRL library.

Open Weights Progress · Agent and Tool Ecosystem · Reinforcement Learning from Human Feedback · PPO · StackLLaMA