LCGuard: Adversarial Training Framework for Safe KV Cache Sharing in Multi-Agent LLM Systems
LCGuard introduces a framework for preventing sensitive information leakage when multi-agent LLM systems share KV caches as a latent communication channel. The approach formalizes leakage operationally via reconstruction: a shared cache artifact is deemed unsafe if an adversarial decoder can recover sensitive inputs from it. An adversarial training loop pits a reconstructor against LCGuard's representation-level transformations, which aim to preserve task-relevant semantics while suppressing recoverable sensitive content. Empirical results across multiple model families and multi-agent benchmarks show reduced reconstruction-based leakage and attack success rates with competitive task performance.
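The reconstruction-based definition of leakage above can be illustrated with a toy check. This is a minimal sketch, not LCGuard's method: the least-squares decoder stands in for the adversarial reconstructor, the one-dimensional "cache" values are synthetic, and the names `fit_decoder`, `recovered_fraction`, `is_unsafe`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import random

def fit_decoder(artifacts, secrets):
    """Least-squares fit secret ~ w * artifact + b (a stand-in adversary)."""
    n = len(artifacts)
    mean_a = sum(artifacts) / n
    mean_s = sum(secrets) / n
    var_a = sum((a - mean_a) ** 2 for a in artifacts)
    if var_a == 0.0:
        return 0.0, mean_s  # constant artifact: best guess is the mean secret
    cov = sum((a - mean_a) * (s - mean_s) for a, s in zip(artifacts, secrets))
    w = cov / var_a
    return w, mean_s - w * mean_a

def recovered_fraction(artifacts, secrets):
    """In-sample R^2: share of the secret's variance the decoder reconstructs."""
    w, b = fit_decoder(artifacts, secrets)
    mean_s = sum(secrets) / len(secrets)
    var_s = sum((s - mean_s) ** 2 for s in secrets)
    resid = sum((s - (w * a + b)) ** 2 for a, s in zip(artifacts, secrets))
    return 1.0 - resid / var_s

def is_unsafe(artifacts, secrets, threshold=0.5):
    """Flag the shared artifact if the decoder recovers too much (assumed threshold)."""
    return recovered_fraction(artifacts, secrets) > threshold

rng = random.Random(0)
secrets = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Raw artifact: carries the secret plus a little noise.
raw_cache = [s + rng.gauss(0.0, 0.1) for s in secrets]
# Guarded artifact: toy stand-in for a transformation that removed the secret.
guarded_cache = [rng.gauss(0.0, 0.1) for _ in secrets]

print("raw unsafe:", is_unsafe(raw_cache, secrets))
print("guarded unsafe:", is_unsafe(guarded_cache, secrets))
```

In this sketch the decoder is fitted and scored on the same samples. A more careful evaluation would score it on held-out data, and an adversarial training loop would alternate between refitting the decoder and updating the transformation.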
Related events (8)
Latent Context Language Models (LCLMs) achieve competitive encoder-decoder KV cache compression at scale
Researchers introduce Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that map long token sequences to shorter latent embeddings consumed by a decoder, targeting the KV cache memory bottleneck in long-context inference. The authors conduct architecture search and continually pre-train 0.6B-encoder/4B-decoder models on over 350B tokens at compression ratios of 1:4, 1:8, and 1:16. LCLMs improve the Pareto frontier across general-task performance, compression speed, and peak memory, and are demonstrated as efficient backbones for long-horizon agents that can skim compressed context and expand relevant segments on demand. The work closes a previously noted gap between encoder-decoder approaches and KV cache compression methods on the accuracy-efficiency frontier.
LMCache: KV cache layer for LLM inference acceleration
LMCache is an open-source Python library providing a KV cache layer designed to accelerate LLM inference. The project has accumulated 8,613 GitHub stars with modest daily growth (+17). It targets inference efficiency by offloading or sharing KV cache state across requests.
An Introduction to AI Secure LLM Safety Leaderboard
Hugging Face introduces the DecodingTrust-based LLM Safety Leaderboard, a benchmark framework for evaluating large language models across multiple safety and trustworthiness dimensions. The leaderboard aims to provide standardized, reproducible safety assessments covering areas such as toxicity, stereotype bias, adversarial robustness, and privacy. It offers a public ranking of models to help researchers and practitioners compare safety properties across different LLMs.
KVEraser: Learned KV cache editing for efficient localized context erasing in LLMs
KVEraser is a learned method for efficiently erasing specific spans from an LLM's KV cache without full recomputation of subsequent tokens. The approach replaces only the KV states of the erased interval with learned steering states, using a two-stage training pipeline of generic pre-training followed by task-specific fine-tuning. On contexts from 1K–32K tokens, KVEraser nearly matches full recomputation quality while incurring only 24% latency overhead versus a 17.6x increase for exact recomputation, with demonstrated generalization to long-document QA with harmful factual distractors.
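The span-replacement idea described above can be sketched on a toy cache. This is an illustrative sketch only: the cache is modeled as a plain list of per-position `(key, value)` pairs, the steering states are placeholder constants rather than learned states, the replacement is assumed to keep the cache length unchanged, and `erase_span` is a hypothetical helper name.

```python
def erase_span(kv_cache, start, end, steering_states):
    """Return a new cache whose positions [start, end) hold steering states.

    Positions outside the span are reused as-is, which is what avoids
    recomputing the states of later tokens.
    """
    if not 0 <= start <= end <= len(kv_cache):
        raise ValueError("span must lie inside the cache")
    if len(steering_states) != end - start:
        raise ValueError("need one steering state per erased position")
    return kv_cache[:start] + list(steering_states) + kv_cache[end:]

# Toy cache: six positions, each a (key, value) pair of 1-element tuples.
cache = [((float(i),), (float(-i),)) for i in range(6)]
# Placeholder steering states (a trained model would supply these).
steering = [((9.0,), (9.0,)), ((9.0,), (9.0,))]

edited = erase_span(cache, 2, 4, steering)
print(edited)
```

The original `cache` list is left unmodified, so the edited and unedited caches can be compared side by side.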
SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs
Researchers introduce SETA (Mixture of Sparse Experts for Task Agnostic Continual Learning), a framework addressing catastrophic forgetting in LLMs via adaptive sparse subspace decomposition into task-specific and shared expert modules. The approach uses adaptive elastic anchoring and routing-aware regularization to protect shared knowledge at both weight and routing levels. Experiments on LLaMA-2 7B and Qwen3-4B show competitive or superior performance versus continual learning baselines, with strong retention of early-task knowledge.
AdvGRPO: Stable co-training framework for adaptive red teaming of language models
Researchers introduce AdvGRPO, a co-training framework that makes GRPO viable for joint attacker-defender optimization in LLM red teaming, addressing previously reported instability. The method uses dense multi-channel rewards and decoupled advantage normalization, with a curriculum progressing from single-turn to multi-turn attacks before bootstrapping co-training. Co-trained defenders outperform baselines on safety benchmarks, and the attacks show transferability across models.
Bounding Compositional Incoherence in Multi-Component LLM Agents
This paper formalizes a failure mode in multi-component LLM agent systems where individual components are locally probabilistically coherent but their composition violates basic probability axioms. The authors introduce the 'compositional residual' (eps*) as a runtime-computable measure of this incoherence, finding it positive in 33–94% of ensemble cliques across 1,876 tested configurations on a four-LLM panel. A hierarchical Boyle-Dykstra projection is proposed as a deterministic repair, and an anytime-valid e-process enables sequential monitoring. Notably, three intuitive LLM-side mitigations (retrieval, partition-aware prompting, and an aggregator LLM) each fail or regress.
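To give a feel for projection-based repair, the following sketch runs the plain two-set Dykstra alternating-projection scheme on a toy vector. It is not the paper's hierarchical variant: the two constraint sets (entries sum to one, entries are non-negative), the input values, and the function names are assumptions chosen for illustration.

```python
def project_sum_to_one(v):
    """Euclidean projection onto the hyperplane sum(v) == 1."""
    shift = (sum(v) - 1.0) / len(v)
    return [x - shift for x in v]

def project_nonnegative(v):
    """Euclidean projection onto the non-negative orthant."""
    return [max(x, 0.0) for x in v]

def dykstra(point, proj_a, proj_b, iterations=200):
    """Dykstra's algorithm: project `point` onto the intersection of two convex sets."""
    x = list(point)
    p = [0.0] * len(x)  # correction term carried for set A
    q = [0.0] * len(x)  # correction term carried for set B
    for _ in range(iterations):
        y = proj_a([xi + pi for xi, pi in zip(x, p)])
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x = proj_b([yi + qi for yi, qi in zip(y, q)])
        q = [yi + qi - xi for yi, qi, xi in zip(y, q, x)]
    return x

# Toy "incoherent" estimates: they sum to 1.2 and one entry is negative.
incoherent = [0.7, 0.6, -0.1]
repaired = dykstra(incoherent, project_sum_to_one, project_nonnegative)
print(repaired)
```

For this input the iterates approach roughly `[0.55, 0.45, 0.0]`, the nearest point that satisfies both constraints. The correction terms `p` and `q` are what distinguish Dykstra's scheme from naive alternating projection, which reaches a feasible point but not necessarily the closest one.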
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents
Agentic CLEAR is an automatic evaluation framework for LLM-based agentic systems that analyzes behavior at three granularity levels: system, trace, and node. Unlike existing tools that rely on static error taxonomies or focus only on observability, it dynamically generates textual insights and integrates above the observability layer with an accessible UI. Experiments across four benchmarks and seven agentic settings demonstrate strong alignment with human-annotated errors and predictive accuracy for task success rates.


