6arXiv cs.AI (Artificial Intelligence)·29d ago

LCGuard: Adversarial Training Framework for Safe KV Cache Sharing in Multi-Agent LLM Systems

LCGuard introduces a framework for preventing sensitive information leakage when multi-agent LLM systems share KV caches as a latent communication channel. The approach formalizes leakage operationally via reconstruction: a shared cache artifact is deemed unsafe if an adversarial decoder can recover sensitive inputs from it. An adversarial training loop pits a reconstructor against LCGuard's representation-level transformations, which aim to preserve task-relevant semantics while suppressing recoverable sensitive content. Empirical results across multiple model families and multi-agent benchmarks show reduced reconstruction-based leakage and attack success rates with competitive task performance.

Inference Economics AI Safety Research Agent and Tool Ecosystem KV Cache representation-level sensitive information leakage LCGuard multi-agent systems adversarial training latent communication

Related guides (3)

AI Safety ResearchTopic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Read asBeginner In-depth

Inference EconomicsTopic guide

Inference Economics: The Cost of Running AI in Production

Read asBeginner In-depth

Related events (8)

7arXiv · cs.CL·11d ago·source ↗

Latent Context Language Models (LCLMs) achieve competitive encoder-decoder KV cache compression at scale

Researchers introduce Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that map long token sequences to shorter latent embeddings consumed by a decoder, targeting the KV cache memory bottleneck in long-context inference. The authors conduct architecture search and continually pre-train 0.6B-encoder/4B-decoder models on over 350B tokens at compression ratios of 1:4, 1:8, and 1:16. LCLMs improve the Pareto frontier across general-task performance, compression speed, and peak memory, and are demonstrated as efficient backbones for long-horizon agents that can skim compressed context and expand relevant segments on demand. The work closes a previously noted gap between encoder-decoder approaches and KV cache compression methods on the accuracy-efficiency frontier.

Long Context Evolution Inference Economics End-to-End Context Compression at Scale Latent Context Language Models +1 more

4Github Trending·8d ago·source ↗

LMCache: KV cache layer for LLM inference acceleration

LMCache is an open-source Python library providing a KV cache layer designed to accelerate LLM inference. The project has accumulated 8,613 GitHub stars with modest daily growth (+17). It targets inference efficiency by offloading or sharing KV cache state across requests.

Inference Economics Agent and Tool Ecosystem LMCache

5Hugging Face Blog·1mo ago·source ↗

An Introduction to AI Secure LLM Safety Leaderboard

Hugging Face introduces the DecodingTrust-based LLM Safety Leaderboard, a benchmark framework for evaluating large language models across multiple safety and trustworthiness dimensions. The leaderboard aims to provide standardized, reproducible safety assessments covering areas such as toxicity, stereotype bias, adversarial robustness, and privacy. It offers a public ranking of models to help researchers and practitioners compare safety properties across different LLMs.

Evaluation and Benchmarking AI Safety Research LLM Safety Leaderboard Hugging Face DecodingTrust

5arXiv · cs.CL·4d ago·source ↗

KVEraser: Learned KV cache editing for efficient localized context erasing in LLMs

KVEraser is a learned method for efficiently erasing specific spans from an LLM's KV cache without full recomputation of subsequent tokens. The approach replaces only the KV states of the erased interval with learned steering states, using a two-stage training pipeline of generic pre-training followed by task-specific fine-tuning. On contexts from 1K–32K tokens, KVEraser nearly matches full recomputation quality while incurring only 24% latency overhead versus a 17.6x increase for exact recomputation, with demonstrated generalization to long-document QA with harmful factual distractors.

Long Context Evolution Inference Economics KVEraser +1 more

4arXiv · cs.LG·12d ago·source ↗

SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs

Researchers introduce SETA (Mixture of Sparse Experts for Task Agnostic Continual Learning), a framework addressing catastrophic forgetting in LLMs via adaptive sparse subspace decomposition into task-specific and shared expert modules. The approach uses adaptive elastic anchoring and routing-aware regularization to protect shared knowledge at both weight and routing levels. Experiments on LLaMA-2 7B and Qwen3-4B show competitive or superior performance versus continual learning baselines, with strong retention of early-task knowledge.

Evaluation and Benchmarking Open Weights Progress LLaMA-7B Qwen3-4B Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning +1 more

5arXiv · cs.CL·11d ago·source ↗

AdvGRPO: Stable co-training framework for adaptive red teaming of language models

Researchers introduce AdvGRPO, a co-training framework that makes GRPO viable for joint attacker-defender optimization in LLM red teaming, addressing previously reported instability. The method uses dense multi-channel rewards and decoupled advantage normalization, with a curriculum progressing from single-turn to multi-turn attacks before bootstrapping co-training. Co-trained defenders outperform baselines on safety benchmarks, and the attacks show transferability across models.

AI Safety Research Alignment and RLHF AdvGRPO GRPO PPO +1 more

7arXiv · cs.AI·22d ago·source ↗

Bounding Compositional Incoherence in Multi-Component LLM Agents

This paper formalizes a failure mode in multi-component LLM agent systems where individual components are locally probabilistically coherent but their composition violates basic probability axioms. The authors introduce the 'compositional residual' (eps*) as a runtime-computable measure of this incoherence, finding it positive in 33–94% of ensemble cliques across 1,876 tested configurations on a four-LLM panel. A hierarchical Boyle-Dykstra projection is proposed as a deterministic repair, and an anytime-valid e-process enables sequential monitoring. Notably, three intuitive LLM-side mitigations—retrieval, partition-aware prompting, and aggregator-LLM—each fail or regress.

Evaluation and Benchmarking AI Safety Research Compositional Residual (eps*)Proportional Allocation Rule Multi-Component LLM Agent +4 more

6arXiv · cs.CL·29d ago·source ↗

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

Agentic CLEAR is an automatic evaluation framework for LLM-based agentic systems that analyzes behavior at three granularity levels: system, trace, and node. Unlike existing tools that rely on static error taxonomies or focus only on observability, it dynamically generates textual insights and integrates above the observability layer with an accessible UI. Experiments across four benchmarks and seven agentic settings demonstrate strong alignment with human-annotated errors and predictive accuracy for task success rates.

Evaluation and Benchmarking AI Safety Research Agentic CLEAR multi-level agent evaluation LLM agents +1 more