technique
EmbedFilter
technique · active · provisional
embedfilter-419bcec0 · 1 event · first seen 9d ago
Aliases: EmbedFilter
Recent events (1)
EmbedFilter: Using the unembedding matrix to suppress high-frequency token noise in LLM text embeddings
Researchers find that LLM text embeddings, when projected onto vocabulary space, over-express tokens that are high-frequency but semantically uninformative, which degrades embedding quality. They introduce EmbedFilter, a simple linear transformation that filters out the subspace of the unembedding matrix responsible for writing these tokens into the embedding space. The method improves zero-shot performance on text embedding benchmarks across multiple LLM backbones and, as a byproduct, reduces embedding dimensionality without loss of quality. The code is publicly released.
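The idea of filtering a subspace of the unembedding matrix can be illustrated with a toy sketch. This is not the released EmbedFilter code: the summary does not say how the noisy subspace is chosen, so the sketch assumes it is simply the span of the unembedding rows for a hand-picked set of high-frequency tokens. The matrix `W`, the list `noisy_token_ids`, and the helper names are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def filter_embedding(embedding, basis):
    """Linear map x -> x - B B^T x: remove the part of x inside span(basis)."""
    out = list(embedding)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

# Hypothetical toy unembedding matrix: 4 vocabulary tokens, hidden size 3.
W = [
    [1.0, 0.0, 0.0],  # token 0: assumed high-frequency, uninformative
    [1.0, 1.0, 0.0],  # token 1: assumed high-frequency, uninformative
    [0.0, 0.0, 1.0],  # token 2: content token
    [0.0, 1.0, 1.0],  # token 3: content token
]
noisy_token_ids = [0, 1]  # assumption: chosen by hand for this illustration

basis = orthonormal_basis([W[t] for t in noisy_token_ids])
embedding = [0.5, -2.0, 3.0]
filtered = filter_embedding(embedding, basis)

# After filtering, the embedding no longer writes to the noisy tokens:
noisy_logits = [dot(filtered, W[t]) for t in noisy_token_ids]
print(filtered, noisy_logits)
```

Because the filtered vector is orthogonal to a k-dimensional subspace, it lies in a subspace of dimension d minus k and can be re-expressed in fewer coordinates, which is consistent with the dimensionality reduction the summary describes as a byproduct. With a real model one would use a numerical library and the model's actual unembedding weights instead of Python lists.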