Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Recent events (1)

5 · arXiv · cs.CL · 9d ago

EmbedFilter: Using the unembedding matrix to suppress high-frequency token noise in LLM text embeddings

Researchers find that LLM text embeddings, when projected onto vocabulary space, over-express high-frequency but semantically uninformative tokens, which degrades embedding quality. They introduce EmbedFilter, a simple linear transformation that filters out the subspace of the unembedding matrix responsible for writing these tokens into embedding space. The method improves zero-shot performance on text embedding benchmarks across multiple LLM backbones and, as a byproduct, reduces embedding dimensionality without quality loss. Code is publicly released.
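
The general idea of removing an unembedding subspace with a linear map can be illustrated with a small NumPy sketch. This is not the released EmbedFilter code: the helper name `build_filter`, the choice of which token IDs count as "noisy", and the `rank` parameter are all assumptions made for illustration, and the matrices are random stand-ins for a real model. The sketch takes the unembedding rows of the selected tokens, finds the directions they span, and keeps only the orthogonal complement, which also shrinks the embedding dimension.

```python
from typing import Sequence

import numpy as np


def build_filter(unembed: np.ndarray, noisy_token_ids: Sequence[int], rank: int) -> np.ndarray:
    """Hypothetical helper: return an orthonormal basis of shape (d, d - rank)
    for the complement of the top-`rank` directions spanned by the
    unembedding rows of the selected tokens."""
    rows = unembed[list(noisy_token_ids)]                  # (k, d)
    # Rows of vt are orthonormal; the first ones span the row space of `rows`.
    _, _, vt = np.linalg.svd(rows, full_matrices=True)     # vt: (d, d)
    return vt[rank:].T                                     # (d, d - rank)


# Toy stand-ins (assumed shapes, random values): vocabulary of 50, hidden size 16.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(50, 16))          # unembedding matrix, one row per token
embeddings = rng.normal(size=(4, 16))    # four text embeddings
noisy_ids = [0, 1, 2]                    # assumed high-frequency token IDs

basis = build_filter(W_U, noisy_ids, rank=len(noisy_ids))
filtered = embeddings @ basis            # (4, 13): filtered and lower-dimensional

# Mapping back to the original space shows the selected tokens now get ~0 logits.
noisy_logits = W_U[noisy_ids] @ (basis @ filtered.T)
```

Because the filter is a single matrix multiplication, it can be applied to precomputed embeddings without rerunning the model; how the subspace is actually chosen in the paper is not specified in the summary above.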