Almanac
RWKV

model · active · rwkv-10c3f159 · 2 events · first seen 28d ago

Aliases: RWKV

Recent events (2)

5 · Hugging Face Blog · 28d ago

Introducing RWKV - An RNN with the advantages of a transformer

The Hugging Face blog post introduces RWKV, a recurrent neural network architecture designed to combine the parallelizable training of transformers with the linear-time inference of RNNs. Because it carries a fixed-size recurrent state instead of attending over all previous tokens, the model avoids the quadratic attention cost of standard transformers, and the post presents its performance as competitive with them. RWKV is an alternative to the dominant transformer architecture for language modeling.
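
The linear-versus-quadratic contrast can be illustrated with a toy example. The sketch below is not the RWKV formulation itself; it assumes a simplified scalar, exponentially decayed weighted average (the function names, the `decay` parameter, and the sample inputs are all hypothetical) and shows that the same outputs can be computed either by re-reading every earlier position, as attention does, or by carrying two running sums, as an RNN does.

```python
import math


def decayed_average_quadratic(keys, values, decay):
    """Direct form: each position re-reads all earlier positions (O(L^2) total)."""
    outputs = []
    for t in range(len(keys)):
        num = 0.0
        den = 0.0
        for i in range(t + 1):
            # Older positions are down-weighted by decay^(t - i).
            weight = (decay ** (t - i)) * math.exp(keys[i])
            num += weight * values[i]
            den += weight
        outputs.append(num / den)
    return outputs


def decayed_average_recurrent(keys, values, decay):
    """Recurrent form: two running sums, constant work per token (O(L) total)."""
    outputs = []
    num = 0.0  # running decayed sum of exp(k) * v
    den = 0.0  # running decayed sum of exp(k)
    for k, v in zip(keys, values):
        e = math.exp(k)
        num = decay * num + e * v
        den = decay * den + e
        outputs.append(num / den)
    return outputs


# Illustrative inputs only; any real-valued sequences of equal length work.
keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, 3.0, 4.0]
quadratic = decayed_average_quadratic(keys, values, decay=0.9)
recurrent = decayed_average_recurrent(keys, values, decay=0.9)
```

Both functions return the same sequence up to floating-point rounding. The recurrent version only ever stores `num` and `den`, so its per-token cost and memory do not grow with sequence length, which is the property the summary refers to as linear-time inference.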

6 · arXiv · cs.CL · 22d ago

Triplet-Block Diffusion RWKV: Unifying Linear-Time Causal Models with Bidirectional Discrete Diffusion

The paper introduces B³D-RWKV, a 7.2B-parameter language model that combines RWKV's O(L) linear-time inference with parallel bidirectional discrete diffusion via a triplet-block layout. The architecture is intended to reconcile the causal (unidirectional) context that RWKV relies on with the bidirectional context that diffusion requires. On an 8-task evaluation suite, the paper reports that B³D-RWKV-7.2B achieves accuracy comparable to existing models while delivering an average 1.6× decoding throughput speedup over baselines.