Entity · organization

Zhejiang University NLP Group (ZJUNLP)

organization · active · zhejiang-university-nlp-group-zjunlp--6af31bbf · 5 events · first seen May 28, 2026

Aliases: Zhejiang University NLP Group (ZJUNLP), Zhejiang University NLP Group

Co-occurring entities

More like this (12)

Zhejiang University NLP Lab · Tianjin University NLP Lab · OSU NLP Group · Yale NLP · EIT-NLP · Natural Language Processing · LLM-augmented clinical NLP pipeline · EMNLP 2025 · Yanjun Zhao · Z.ai · Yuan Lab AI · Clinical NLP

Recent events (5)

6 · arXiv · cs.AI · Jun 16, 2026

TokenPilot: Dual-granularity context management cuts LLM agent inference costs by up to 87%

TokenPilot is a cache-efficient context management framework for LLM agents that addresses the trade-off between token sparsity and prompt cache continuity. It combines Ingestion-Aware Compaction (global prefix stabilization) with Lifecycle-Aware Eviction (local segment offloading) to reduce inference costs by 56–87% across benchmarks while maintaining competitive task performance. The system is evaluated on PinchBench and Claw-Eval and has been integrated into the open-source LightMem2 library.

Inference Economics · Agent and Tool Ecosystem · PinchBench · Claw-Eval · LightMem · +2 more
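The tension between token sparsity and prompt cache continuity can be illustrated with a toy model. The sketch below is not TokenPilot's algorithm. It assumes a cache that can reuse only the longest shared prefix between two consecutive prompts, and the function names and token lists are hypothetical.

```python
from typing import Sequence


def shared_prefix_len(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Number of leading tokens that two prompts have in common."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def cache_hit_ratio(prev: Sequence[str], curr: Sequence[str]) -> float:
    """Fraction of the current prompt that a prefix cache could reuse."""
    if not curr:
        return 0.0
    return shared_prefix_len(prev, curr) / len(curr)


# Toy prompts: a stable system prefix followed by per-turn segments.
turn_1 = ["sys", "tools", "obs1", "obs2"]

# Appending keeps the prefix intact, so every earlier token is reusable.
append_only = turn_1 + ["obs3"]

# Deleting an early segment shortens the prompt but shifts everything
# after it, so only the tokens before the edit can be reused.
early_eviction = ["sys", "tools", "obs2", "obs3"]

print(cache_hit_ratio(turn_1, append_only))     # 0.8
print(cache_hit_ratio(turn_1, early_eviction))  # 0.5
```

Under this assumption, removing tokens near the start of the context saves prompt length but forfeits most of the cached prefix, which is the kind of trade-off the summary says the framework is designed to manage.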

6 · arXiv · cs.CL · May 29, 2026

BeliefTrack: Benchmarking and Improving Contextual Belief Management in LLMs

This paper introduces Contextual Belief Management (CBM) as a framework for studying how LLMs should update, preserve, or ignore information across long-horizon interactions. The authors release BeliefTrack, a closed-world benchmark with symbolic verifiers enabling exact turn-level evaluation across Rule Discovery and Circuit Diagnosis tasks. Vanilla LLMs show severe CBM failures; reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average, while representation-level steering achieves a 46.1% reduction. Probing experiments reveal latent belief-state dynamics underlying these failures.

Evaluation and Benchmarking · Agent and Tool Ecosystem · reinforcement learning with belief-state rewards · Contextual Belief Management (CBM) · BeliefTrack · +3 more

6 · arXiv · cs.CL · May 29, 2026

Parametric Memory Law for LoRA Finetuning: Quantifying LLM Memory Capacity

This paper introduces the Parametric Memory Law, a power-law relationship linking loss reduction to effective parameters and sequence length during LoRA-based LLM finetuning. The authors identify a phase transition at the token level where prediction probability p > 0.5 constitutes a sufficient condition for verbatim recall under greedy decoding. Building on these findings, they propose MemFT, a threshold-guided optimization strategy that dynamically reallocates training budget toward sub-threshold tokens, improving memory fidelity and efficiency.

Evaluation and Benchmarking · Agent and Tool Ecosystem · large language models · MemFT · Parametric Memory Law · +3 more
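The p > 0.5 threshold in the Parametric Memory Law summary follows from how greedy decoding works: token probabilities sum to 1, so once the target token holds more than half of the mass, no rival token can match it and the argmax must select the target. The sketch below illustrates only this reasoning; the function names and toy distributions are hypothetical and are not taken from the paper.

```python
from typing import Dict


def greedy_pick(dist: Dict[str, float]) -> str:
    """Greedy decoding step: return the highest-probability token."""
    return max(dist, key=dist.get)


def target_is_guaranteed(dist: Dict[str, float], target: str) -> bool:
    """True when the target's probability alone forces greedy decoding to pick it."""
    return dist.get(target, 0.0) > 0.5


# Above the threshold: the remaining mass is below 0.5, so no rival can win.
above = {"Paris": 0.51, "Lyon": 0.30, "Nice": 0.19}
# Below the threshold: the outcome depends on how the rest of the mass is split.
below_wins = {"Paris": 0.40, "Lyon": 0.35, "Nice": 0.25}
below_loses = {"Paris": 0.40, "Lyon": 0.45, "Nice": 0.15}

for dist in (above, below_wins, below_loses):
    print(target_is_guaranteed(dist, "Paris"), greedy_pick(dist))
```

The second toy distribution shows why the condition is sufficient rather than necessary: a target token below 0.5 can still be recalled when the remaining probability is spread across several rivals.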

6 · arXiv · cs.CL · May 28, 2026

MemTrace: Framework for Tracing and Attributing Errors in LLM Memory Systems

MemTrace introduces a framework that converts LLM memory pipelines into executable memory evolution graphs to enable fine-grained error tracing and root-cause attribution. The authors construct MemTraceBench, a benchmark covering Long-Context, RAG, Mem0, and EverMemOS memory systems, to systematically characterize memory failure modes such as information loss and retrieval misalignment. An automatic attribution method iteratively traces operation subgraphs to pinpoint failures, and the resulting signals are used to guide prompt optimization in a closed-loop system that improves end-task performance by up to 7.62%.

Long Context Evolution · Evaluation and Benchmarking · Mem0 · memory evolution graph · MemTrace · +5 more
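The idea of tracing a failure back through a graph of memory operations can be sketched in a few lines. This is a simplified illustration, not MemTrace's attribution method: the `Op` schema, the substring check for whether a fact survived, and the example pipeline are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Op:
    """One memory operation (hypothetical schema)."""
    name: str
    output: str
    inputs: List[str] = field(default_factory=list)  # names of upstream ops


def find_loss_point(graph: Dict[str, Op], start: str, fact: str) -> Optional[str]:
    """Walk upstream from `start` and return the op that dropped `fact`.

    An op is blamed when at least one of its inputs still contained the
    fact but its own output did not. Returns None if no such op exists.
    """
    stack, seen = [start], set()
    while stack:
        op = graph[stack.pop()]
        if op.name in seen:
            continue
        seen.add(op.name)
        if fact in op.output:
            continue  # the fact survived this op, nothing to attribute
        upstream = [graph[i] for i in op.inputs]
        if any(fact in u.output for u in upstream):
            return op.name  # inputs had the fact, output lost it
        stack.extend(op.inputs)
    return None


# Toy pipeline: the summarization step drops the city name.
graph = {
    "ingest": Op("ingest", "user moved to Berlin in 2024"),
    "summarize": Op("summarize", "user moved cities", ["ingest"]),
    "retrieve": Op("retrieve", "user moved cities", ["summarize"]),
    "answer": Op("answer", "unknown city", ["retrieve"]),
}

print(find_loss_point(graph, "answer", "Berlin"))  # summarize
```

In this toy case the wrong answer is attributed to an information-loss failure at the summarization step rather than to retrieval, which only passed along what it received.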

6 · arXiv · cs.AI · May 28, 2026

FluxMem: Connectivity-Evolving Memory Framework for LLM Agents

FluxMem proposes a heterogeneous graph-based memory framework for LLM agents that continuously evolves its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. Unlike static memory repositories, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills successful trajectories into reusable procedural circuits. The system is guided by a single metric for memory generalizability and evolutionary maturity, achieving state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.

Long Context Evolution · Evaluation and Benchmarking · heterogeneous graph memory · LightMem · GAIA · +6 more