Almanac
technique

FlashMorph

technique · active · provisional · flashmorph-85b427c5 · 1 event · first seen 14h ago

Aliases: FlashMorph

Recent events (1)

5 · arXiv · cs.CL · 14h ago

FlashMorph: Learned layer selection for converting Transformers to hybrid attention models

This arXiv paper introduces FlashMorph, a method for converting standard Transformer models into hybrid attention architectures by learning which layers retain full attention and which switch to linear attention. Rather than relying on heuristic placement patterns, FlashMorph frames layer selection as a budget-constrained subset optimization: it jointly learns layerwise gates on synthetic long-context retrieval data, with a linearization regularization term. According to the paper's experiments, FlashMorph finds more effective hybrid configurations that preserve long-context recall and general benchmark performance, and does so at lower selection cost than prior methods. The work addresses a practical efficiency problem in deploying long-context models at scale.
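To make the budget-constrained selection idea concrete, here is a minimal sketch of only the final step: given per-layer gate scores that have already been learned, keep full attention in the highest-scoring layers up to a fixed budget and linearize the rest. This is an illustration under stated assumptions, not the paper's implementation. The function names, the example gate values, and the form of the penalty (a weighted sum of sigmoid gate activations) are all hypothetical; the summary above does not specify how the gates are trained or how the regularizer is defined.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a gate logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_full_attention_layers(gate_logits, budget):
    """Pick the `budget` layers with the highest gate logits.

    Hypothetical helper: returns the sorted indices of layers that
    keep full attention; all other layers would use linear attention.
    """
    if not 0 <= budget <= len(gate_logits):
        raise ValueError("budget must be between 0 and the number of layers")
    ranked = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )
    return sorted(ranked[:budget])


def linearization_penalty(gate_logits, weight=0.01):
    """Assumed regularizer form: penalize total 'full attention' gate mass.

    This only illustrates the idea of pushing gates toward linear
    attention; the paper's actual term may differ.
    """
    return weight * sum(sigmoid(g) for g in gate_logits)


# Made-up gate logits for an 8-layer model (illustrative values only).
gate_logits = [2.1, -0.7, 0.3, -1.5, 1.8, -0.2, 0.9, -2.0]

kept = select_full_attention_layers(gate_logits, budget=3)
layout = ["full" if i in kept else "linear" for i in range(len(gate_logits))]
penalty = linearization_penalty(gate_logits)

print(kept)    # indices of layers that retain full attention
print(layout)  # per-layer attention type
```

With these example values, the three layers with the largest logits (indices 0, 4, and 6) retain full attention. A heuristic placement, by contrast, would fix such indices in advance, for example every fourth layer, without consulting any learned score.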