Entity · benchmark

LoCoMo

benchmark · active · locomo-0edfe856 · 4 events · first seen May 28, 2026

Aliases: LoCoMo

Co-occurring entities

GAIA · LongMemEval · Supra Cognitive Modes · MemoryAgentBench · EvoArena · EvoMem · ConvMemory v2 · mxbai-rerank-large-v1 · ms-marco-MiniLM-L-6-v2 · heterogeneous graph memory · LightMem · FluxMem · Mind2Web · procedural circuits · Zhejiang University NLP Group (ZJUNLP)

More like this (12)

LoMo · DiLoCo · MoDiCoL · CoInCo · MS COCO · CO-LMLM · MuJoCo · OLMo2 · OLMo-3 · OLMo · MoE²-LoRA · COCO

Recent events (4)

5 · arXiv · cs.CL · Jul 22, 2026

Supra Cognitive Modes: routed architecture for agent memory across factual, relational, and synthesis workloads

A new arXiv preprint introduces Supra Cognitive Modes (SCM), an agent memory architecture that routes queries to specialized retrieval and synthesis pipelines over a shared ingest substrate combining dense embeddings, knowledge graph triples, and fact-version metadata. A frozen semantic classifier dispatches queries among three modes: lexical/dense lookup, multi-hop graph reasoning, and long-form synthesis. The system is evaluated on three agent memory benchmarks: LoCoMo (84.87% on factoid questions), MemoryAgentBench (61.49%), and LongMemEval (86.00%). The authors note, however, that causal routing effects, efficiency gains, and statistical significance have not yet been established.

Long Context Evolution · Evaluation and Benchmarking · LongMemEval · LoCoMo · Supra Cognitive Modes
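The routing idea in the SCM summary can be sketched as a dispatcher that classifies a query once and hands it to exactly one pipeline. This is a minimal sketch under stated assumptions, not SCM's implementation: `Mode`, `toy_classifier`, `Router`, and the placeholder pipelines are hypothetical, and the keyword rules merely stand in for the paper's frozen semantic classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Mode(Enum):
    """The three query modes named in the SCM summary."""
    LOOKUP = "lexical/dense lookup"
    GRAPH = "multi-hop graph reasoning"
    SYNTHESIS = "long-form synthesis"


def toy_classifier(query: str) -> Mode:
    """Hypothetical stand-in for SCM's frozen semantic classifier.

    Keyword rules are used only so the sketch runs without a model;
    the paper's classifier is semantic, not rule-based.
    """
    q = query.lower()
    if q.startswith(("summarize", "describe", "explain")):
        return Mode.SYNTHESIS
    if any(w in q for w in ("relationship", "related", "connected", "between")):
        return Mode.GRAPH
    return Mode.LOOKUP


@dataclass
class Router:
    classify: Callable[[str], Mode]
    pipelines: Dict[Mode, Callable[[str], str]]

    def answer(self, query: str) -> str:
        # Classify once, then run exactly one pipeline for this query.
        mode = self.classify(query)
        return self.pipelines[mode](query)


# Hypothetical placeholder pipelines; in SCM all modes would read from the
# same ingest substrate (dense embeddings, KG triples, fact-version metadata).
router = Router(
    classify=toy_classifier,
    pipelines={
        Mode.LOOKUP: lambda q: f"[lookup] {q}",
        Mode.GRAPH: lambda q: f"[graph] {q}",
        Mode.SYNTHESIS: lambda q: f"[synthesis] {q}",
    },
)
```

For example, `router.answer("How is Bob related to Alice?")` returns `"[graph] How is Bob related to Alice?"`. One plausible benefit of keeping the classifier frozen and separate from the pipelines is that each mode can be changed without retraining the router.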

5 · arXiv · cs.CL · Jun 12, 2026

EvoArena benchmark and EvoMem memory paradigm for LLM agents in dynamic environments

Researchers introduce EvoArena, a benchmark suite that evaluates LLM agents in dynamic environments by modeling changes as progressive update sequences across terminal, software, and social domains. Alongside it, they propose EvoMem, a patch-based memory paradigm that records memory evolution as structured update histories to help agents reason about environmental change. Current agents reach only 39.6% average accuracy on EvoArena, while EvoMem yields consistent gains on EvoArena and also improves performance on the GAIA and LoCoMo benchmarks. The work highlights a significant gap between performance on static benchmarks and the requirements of real-world dynamic deployment.

Evaluation and Benchmarking · Agent and Tool Ecosystem · EvoArena · GAIA · LoCoMo
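A patch-based memory, as described for EvoMem, stores each change rather than only the latest value. The sketch below assumes a simple key/value memory; `Patch`, `PatchMemory`, and their fields are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Patch:
    """One recorded change to a memory entry (hypothetical schema)."""
    step: int            # position in the environment's update sequence
    key: str             # which memory entry changed
    old: Optional[Any]   # value before the change (None if the entry is new)
    new: Any             # value after the change
    note: str = ""       # optional reason, e.g. "runtime upgraded"


@dataclass
class PatchMemory:
    """Append-only store: current values plus the full patch history."""
    current: Dict[str, Any] = field(default_factory=dict)
    history: List[Patch] = field(default_factory=list)

    def apply(self, step: int, key: str, new: Any, note: str = "") -> Patch:
        patch = Patch(step, key, self.current.get(key), new, note)
        self.history.append(patch)  # keep the change itself, not just the result
        self.current[key] = new
        return patch

    def timeline(self, key: str) -> List[Patch]:
        # Lets an agent answer "how did X change?" and not only "what is X?"
        return [p for p in self.history if p.key == key]

    def value_at(self, key: str, step: int) -> Optional[Any]:
        # Replay patches up to `step` to recover a past state.
        # Assumes patches were applied in increasing step order.
        value = None
        for p in self.history:
            if p.key == key and p.step <= step:
                value = p.new
        return value
```

After `mem.apply(1, "python_version", "3.10")` and `mem.apply(4, "python_version", "3.12")`, `mem.current["python_version"]` is `"3.12"` while `mem.value_at("python_version", 2)` still returns `"3.10"`, so an agent can reason about what changed and when.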

3 · arXiv · cs.CL · Jun 10, 2026

ConvMemory v2: Recall-preserving cross-encoder reranker for conversational memory retrieval

ConvMemory v2 is a fine-tuned cross-encoder reranker (22M parameters, based on ms-marco-MiniLM-L-6-v2) that reorders the top-10 candidates from the prior ConvMemory v1 system without changing which memories are retrieved, preserving Recall@10 by construction. On the LoCoMo conversational memory benchmark, v2 raises MRR from 0.5824 to 0.6560 and Hit@1 from 0.4440 to 0.5474, closing most of the gap to a much more expensive full-pool cross-encoder baseline. An ablation study confirms that candidate-specific memory text is the key mechanism driving the improvement.

Evaluation and Benchmarking · Agent and Tool Ecosystem · ConvMemory v2 · LoCoMo · mxbai-rerank-large-v1

6 · arXiv · cs.AI · May 28, 2026

FluxMem: Connectivity-Evolving Memory Framework for LLM Agents

FluxMem proposes a heterogeneous graph-based memory framework for LLM agents that continuously evolves its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. Unlike static memory repositories, FluxMem repairs missing links, prunes interfering connections, aligns abstraction granularity, and distills successful trajectories into reusable procedural circuits. Its evolution is guided by a single metric that captures both memory generalizability and evolutionary maturity, and the authors report state-of-the-art results on the LoCoMo, Mind2Web, and GAIA benchmarks.

Long Context Evolution · Evaluation and Benchmarking · heterogeneous graph memory · LightMem · GAIA