RWKV
rwkv-10c3f159 · 2 events · first seen 28d ago
Aliases: RWKV
Recent events (2)
Introducing RWKV - An RNN with the advantages of a transformer
Hugging Face introduces RWKV, a recurrent neural network architecture said to combine the parallelizable training of transformers with the efficient inference of RNNs, whose cost grows linearly with sequence length. The model avoids the quadratic attention bottleneck of standard transformers while maintaining competitive performance. RWKV represents an alternative architectural direction to the dominant transformer paradigm for language modeling.
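The contrast between quadratic attention and linear-time recurrence can be illustrated with a toy example. The sketch below is not RWKV's actual formulation; it assumes a simplified, single-channel, exponentially decayed weighted average, and the function names and the `decay` parameter are hypothetical. It computes the same outputs two ways: once by re-scanning all past positions at every step (quadratic in sequence length), and once by carrying two running sums as recurrent state (linear in sequence length, constant work per token).

```python
import math


def decayed_average_quadratic(keys, values, decay):
    """Attention-style form: each step re-scans every earlier position, O(L^2)."""
    outputs = []
    for t in range(len(keys)):
        # Weight of position i seen from step t: older positions decay more.
        weights = [math.exp(-(t - i) * decay + keys[i]) for i in range(t + 1)]
        total = sum(weights)
        # zip stops at the shorter list, so only values[0..t] are used.
        outputs.append(sum(w * v for w, v in zip(weights, values)) / total)
    return outputs


def decayed_average_recurrent(keys, values, decay):
    """RNN-style form: two running sums are the whole state, O(L) overall."""
    num, den = 0.0, 0.0           # running numerator and denominator
    shrink = math.exp(-decay)     # per-step decay applied to the past
    outputs = []
    for k, v in zip(keys, values):
        weight = math.exp(k)
        num = shrink * num + weight * v
        den = shrink * den + weight
        outputs.append(num / den)
    return outputs


if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.0]
    values = [1.0, 2.0, -1.0, 0.5]
    print(decayed_average_quadratic(keys, values, decay=0.7))
    print(decayed_average_recurrent(keys, values, decay=0.7))
```

Both functions produce the same sequence up to floating-point error. The recurrent form keeps only two numbers of state per channel, which is the kind of property that makes RNN-style inference cheap on long sequences. This sketch omits numerical-stability handling, which a practical implementation would need for large keys.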
Triplet-Block Diffusion RWKV: Unifying Linear-Time Causal Models with Bidirectional Discrete Diffusion
The paper introduces B³D-RWKV, a 7.2B-parameter language model that combines RWKV's O(L) linear-time inference with parallel bidirectional discrete diffusion via a triplet-block layout. According to the paper, this architecture resolves the fundamental tension between causal (unidirectional) and diffusion (bidirectional) attention requirements. On an 8-task evaluation suite, B³D-RWKV-7.2B achieves accuracy comparable to existing models while delivering an average 1.6× decoding throughput speedup over baselines.