Almanac
technique

state space model

technique · active · state-space-model-ce3a98b7 · 4 events · first seen 28d ago

Aliases: state space model, State-Space Model (SSM), State Space Model (SSM)

Recent events (4)

6 · arXiv · cs.CL · 22d ago

Language Models Need Sleep: Periodic Context Consolidation via Fast Weights and SSM Blocks

This paper proposes a sleep-like consolidation mechanism for transformer-based LLMs to address the quadratic scaling of attention with context length. During "sleep" phases, the model performs N offline recurrent passes over the accumulated context, updates fast weights in state-space model (SSM) blocks via a learned local rule, and then clears the KV cache. The approach is evaluated on synthetic tasks (cellular automata, multi-hop graph retrieval) and math reasoning where standard transformers and SSM-attention hybrids fail, and its performance scales with the sleep duration N.

3 · arXiv · cs.AI · 20d ago

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

CaMBRAIN is a Mamba-based causal state space model designed for real-time, continuous inference on variable-length EEG signals, addressing quadratic scaling limitations of attention-based models. It introduces a multi-stage self-supervised training pipeline for long-range memory retention and achieves state-of-the-art results across three EEG datasets with over 10x throughput improvement.
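As a rough illustration of why a causal state space model suits continuous, variable-length streams, the sketch below runs a toy diagonal linear SSM one sample at a time and carries a fixed-size state between chunks. It is not CaMBRAIN or Mamba: the function names and the parameters `a`, `b`, `c` are hypothetical and chosen only to show that chunked streaming reproduces the full-sequence output with constant memory.

```python
# Toy diagonal linear SSM (illustrative assumption, not the CaMBRAIN model).
# Recurrence: h_t = a * h_{t-1} + b * x_t, output: y_t = sum(c * h_t).

def ssm_step(state, x, a, b):
    """Apply one causal update to the hidden state for input sample x."""
    return [a_i * h_i + b_i * x for a_i, b_i, h_i in zip(a, b, state)]


def ssm_readout(state, c):
    """Project the hidden state to a scalar output."""
    return sum(c_i * h_i for c_i, h_i in zip(c, state))


def run_stream(samples, state, a, b, c):
    """Process a chunk of samples; return the outputs and the carried state."""
    outputs = []
    for x in samples:
        state = ssm_step(state, x, a, b)
        outputs.append(ssm_readout(state, c))
    return outputs, state


# Hypothetical parameters and input signal.
a = [0.9, 0.5]
b = [1.0, 1.0]
c = [0.5, 0.5]
signal = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0]

# Process the whole signal at once.
full_out, _ = run_stream(signal, [0.0, 0.0], a, b, c)

# Process the same signal in chunks of two, carrying only the state.
state = [0.0, 0.0]
chunked_out = []
for start in range(0, len(signal), 2):
    out, state = run_stream(signal[start:start + 2], state, a, b, c)
    chunked_out.extend(out)
```

Because each output depends only on the current sample and the carried state, the per-sample cost and memory stay fixed regardless of how long the stream has been running, whereas attention over the full history grows with sequence length.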

7 · Hugging Face Blog · 28d ago

Falcon Mamba: First Strong Attention-Free 7B Model

Technology Innovation Institute (TII) releases Falcon Mamba, a 7B-parameter state space model (SSM) based on the Mamba architecture, announced as the first attention-free model at this scale to match or exceed transformer-based models on standard benchmarks. The model is hosted on Hugging Face, and the release advances the case for pure SSM models as viable alternatives to attention-based LLMs at the 7B scale.

5 · Hugging Face Blog · 28d ago

Bamba: Inference-Efficient Hybrid Mamba2 Model

A Hugging Face blog post introduces Bamba, a hybrid architecture that combines Mamba2 state-space layers with attention layers and is designed for inference efficiency. The model targets reduced KV-cache memory and improved throughput compared with pure transformer architectures. The post covers architecture details, the training approach, and benchmarking results that position Bamba as a practical alternative for deployment-constrained settings.
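To make the KV-cache claim concrete, the following back-of-the-envelope sketch estimates cache size as a function of the number of attention layers. All configuration values (layer counts, KV heads, head dimension, context length) are hypothetical and are not Bamba's published configuration; the point is only that the cache grows linearly with the count of attention layers, so replacing most of them with state-space layers shrinks it proportionally.

```python
# Rough KV-cache size estimate. All configuration numbers are hypothetical
# and do not describe Bamba or any specific released model.

def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Bytes needed to cache keys and values for every attention layer."""
    keys_and_values = 2  # one tensor for keys, one for values
    return (keys_and_values * attn_layers * kv_heads * head_dim
            * seq_len * bytes_per_elem * batch)


MIB = 1024 ** 2

# Assumed pure-transformer stack: 32 attention layers.
full_attention = kv_cache_bytes(attn_layers=32, kv_heads=8,
                                head_dim=128, seq_len=4096)

# Assumed hybrid stack: only 3 of the layers use attention; the
# state-space layers keep a fixed-size state instead of a KV cache.
hybrid = kv_cache_bytes(attn_layers=3, kv_heads=8,
                        head_dim=128, seq_len=4096)

print(f"pure attention: {full_attention / MIB:.0f} MiB")
print(f"hybrid:         {hybrid / MIB:.0f} MiB")
```

Under these assumed numbers the cache drops from 512 MiB to 48 MiB per sequence at a 4096-token context in 16-bit precision; the state-space layers add a small state whose size does not depend on context length.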