EmbedFilter

technique · active · provisional · embedfilter-419bcec0 · 1 event · first seen 9d ago

Aliases: EmbedFilter

Recent events (1)

5 · arXiv · cs.CL · 9d ago

EmbedFilter: Using the unembedding matrix to suppress high-frequency token noise in LLM text embeddings

Researchers find that LLM text embeddings, when projected onto vocabulary space, over-express high-frequency but semantically uninformative tokens, which degrades embedding quality. They introduce EmbedFilter, a simple linear transformation that filters out the subspace of the unembedding matrix responsible for writing these tokens into embedding space. The method improves zero-shot performance on text embedding benchmarks across multiple LLM backbones and, as a byproduct, reduces embedding dimensionality without quality loss. Code is publicly released.
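
The general idea of filtering a subspace of the unembedding matrix can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function name `build_filter`, the choice of SVD to find the subspace, the hand-picked list of "noisy" token ids, and the random toy matrices are all assumptions made for illustration. The sketch drops the `k` directions that the selected unembedding rows use most, which also shrinks the embedding from `d` to `d - k` dimensions.

```python
import numpy as np

def build_filter(unembed, noisy_token_ids, k):
    """Hypothetical helper: return a (d, d-k) linear map that removes the
    k directions most used by the unembedding rows of the given tokens."""
    rows = unembed[noisy_token_ids]                      # (n, d)
    # Right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(rows, full_matrices=True)   # vt: (d, d)
    keep = vt[k:]                                        # orthonormal basis of the retained subspace
    return keep.T                                        # (d, d-k)

# Toy setup (all values are random placeholders, not real model weights).
rng = np.random.default_rng(0)
vocab_size, d = 50, 8
W_U = rng.normal(size=(vocab_size, d))    # stand-in unembedding matrix
noisy_ids = np.array([0, 1, 2])           # assumed high-frequency token ids

F = build_filter(W_U, noisy_ids, k=3)
embeddings = rng.normal(size=(4, d))      # stand-in text embeddings
filtered = embeddings @ F                 # reduced to d - k dimensions

# Logits of the filtered tokens vanish once embeddings are mapped back.
noisy_logits = (filtered @ F.T) @ W_U[noisy_ids].T
```

In this toy case the filtered embeddings no longer express the selected tokens at all when projected onto vocabulary space, since the retained basis is orthogonal to their unembedding rows. How the actual method selects the subspace and its size is not stated in the summary above.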