5 · arXiv cs.CL (Computation and Language) · 3d ago

BitNet Text Embeddings: Ternary-weight LLM encoders with multi-precision output vectors

BITEMBED is a new framework that converts pretrained LLM backbones (tested on Qwen3-0.6B and Gemma3-270M) into BitNet-style text embedding encoders using ternary weights, quantized activations, and lightweight normalization refinement. The system applies continual contrastive pre-training and supervised fine-tuning with similarity-distribution and attention-relation distillation from a full-precision teacher. Evaluated on MMTEB (English v2), BITEMBED achieves performance largely comparable to full-precision teacher embedders while supporting flexible output embedding precisions to trade off storage cost. The work targets the dual bottleneck of inference compute and vector index storage in large-scale retrieval systems.
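
As a rough illustration of the storage trade-off described above, the sketch below quantizes one embedding vector to a chosen bit width and computes its per-vector index footprint. It is not BITEMBED's actual procedure: the helper names `quantize_embedding` and `bytes_per_vector`, the symmetric max-abs scaling, and the 1024-dimension example are all assumptions made for illustration.

```python
def quantize_embedding(vec, bits):
    """Hypothetical helper: quantize one embedding to `bits` bits per dimension.

    bits == 1 keeps only the sign of each dimension (stored as 0/1);
    otherwise a symmetric uniform quantizer scaled by the max absolute value is used.
    """
    if bits == 1:
        return [1 if x >= 0 else 0 for x in vec]
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(x / scale))) for x in vec]


def bytes_per_vector(dim, bits):
    """Storage for one vector of `dim` dimensions at `bits` bits each, rounded up."""
    return (dim * bits + 7) // 8


embedding = [0.5, -1.0, 0.25]             # toy vector, not real model output
q8 = quantize_embedding(embedding, 8)     # int8-style codes
q1 = quantize_embedding(embedding, 1)     # sign bits

# Assumed embedding width of 1024, chosen only to make the arithmetic concrete.
for bits in (32, 8, 1):
    print(bits, "bits ->", bytes_per_vector(1024, bits), "bytes per vector")
```

At an assumed 1024 dimensions, a 32-bit vector takes 4096 bytes, an 8-bit vector 1024 bytes, and a 1-bit vector 128 bytes, which is where the index-size savings come from.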

Evaluation and Benchmarking Inference Economics Qwen3.5-0.8B BitNet Gemma3-270M MMTEB BITEMBED

Related guides (2)

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · Hugging Face Blog · 1mo ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Hugging Face published a blog post describing how to fine-tune large language models down to 1.58-bit precision using the BitNet b1.58 quantization scheme. The post covers tooling and workflows in the Hugging Face ecosystem that make extreme quantization more accessible. It serves as a practical guide to applying ternary-weight quantization ({-1, 0, 1}) to existing models through fine-tuning rather than training from scratch.
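
For intuition, here is a minimal sketch of the absmean ternary rounding associated with BitNet b1.58: divide the weights by their mean absolute value, round to the nearest integer, and clip to {-1, 0, 1}. Three possible values carry log2(3) ≈ 1.58 bits each, hence the name. The function name `ternarize` and the plain-list weight matrix are illustrative assumptions; this is not the Hugging Face tooling the post describes, and it omits the fine-tuning step entirely.

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternary quantization of a 2-D weight matrix (list of rows).

    Returns (ternary_weights, scale), where every entry of ternary_weights
    is -1, 0, or 1 and scale is the mean absolute weight (plus eps).
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return ternary, scale


def dequantize(ternary, scale):
    """Approximate reconstruction: each ternary value times the shared scale."""
    return [[t * scale for t in row] for row in ternary]


# Toy weights, chosen only for illustration.
W = [[0.9, -0.05, -1.2],
     [0.1, 0.6, -0.4]]
W_t, s = ternarize(W)
W_hat = dequantize(W_t, s)
```

Small weights collapse to 0 and large ones saturate at ±1, so a single shared scale is all that is stored alongside the ternary values.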

Open Weights Progress Inference Economics Transformers 1.58-bit quantization Hugging Face +1 more

4 · GitHub Trending · 23d ago

Microsoft BitNet: official inference framework for 1-bit LLMs trending on GitHub

Microsoft's BitNet repository, the official inference framework for 1-bit large language models, is trending on GitHub with over 39,000 total stars. The project enables efficient inference for extremely quantized models. Continued community interest suggests that 1-bit quantization remains relevant as an inference-efficiency approach.

Open Weights Progress Inference Economics Microsoft BitNet

5 · arXiv · cs.CL · 2d ago

BINEVAL: Binary question decomposition for interpretable LLM evaluation and prompt optimization

Researchers introduce BINEVAL, a framework that decomposes LLM evaluation criteria into atomic binary yes/no questions, aggregating answers into multi-dimensional interpretable scores. The approach matches or outperforms baselines including UniEval and G-Eval on SummEval, Topical-Chat, and QAGS benchmarks, with particular strength on factual consistency. Beyond evaluation, the binary question feedback is shown to support iterative prompt optimization in both self-update and cross-model settings on IFBench. The framework is training-free and task-agnostic, addressing opacity and ceiling-effect problems common in holistic LLM judges.
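
The summary does not say how the binary answers are combined, so the sketch below assumes the simplest possible rule: each dimension's score is the fraction of its yes/no questions answered "yes". The function name `aggregate_binary_scores`, the dimension names, and the example answers are all hypothetical and are not taken from BINEVAL.

```python
def aggregate_binary_scores(answers):
    """Map {dimension: [bool, ...]} to {dimension: fraction answered 'yes'}.

    Dimensions with no questions are skipped rather than scored.
    """
    scores = {}
    for dimension, votes in answers.items():
        if not votes:
            continue
        scores[dimension] = sum(votes) / len(votes)
    return scores


# Hypothetical judge output for one summary: True means the judge answered "yes".
example_answers = {
    "factual_consistency": [True, True, False, True],
    "coherence": [True, False],
}
scores = aggregate_binary_scores(example_answers)
```

Because every score traces back to individual questions, a low value can be explained by listing exactly which questions were answered "no".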

Evaluation and Benchmarking Alignment and RLHF IFBench G-Eval SummEval +3 more

7 · Qwen Research · 1mo ago

Qwen3 Embedding: State-of-the-Art Text Embedding and Reranking Models Released

Alibaba's Qwen team has released the Qwen3 Embedding series, a set of open-weights text embedding and reranking models built on the Qwen3 foundation model. The models are designed for retrieval and reranking tasks and claim state-of-the-art performance across multiple benchmarks. They are released under the Apache 2.0 license and are available on Hugging Face and ModelScope.

Evaluation and Benchmarking Open Weights Progress Qwen3 Embedding Alibaba Qwen Apache 2.0 +5 more

6 · Hugging Face Blog · 1mo ago

MTEB: Massive Text Embedding Benchmark

MTEB (Massive Text Embedding Benchmark) is introduced as a large-scale benchmark for evaluating text embedding models across a wide variety of tasks and datasets. The benchmark covers multiple embedding task types including classification, clustering, retrieval, and semantic similarity, enabling systematic comparison of embedding models. It provides a public leaderboard to track progress in the text embedding space. The work addresses the lack of a unified, comprehensive evaluation framework for text embeddings.

Evaluation and Benchmarking MTEB Hugging Face

5 · arXiv · cs.CL · 20d ago

EmbedFilter: Using the unembedding matrix to suppress high-frequency token noise in LLM text embeddings

Researchers identify that LLM text embeddings over-express high-frequency but semantically uninformative tokens when projected onto vocabulary space, degrading embedding quality. They introduce EmbedFilter, a simple linear transformation that filters out the subspace of the unembedding matrix responsible for writing these tokens into embedding space. The method improves zero-shot performance on text embedding benchmarks across multiple LLM backbones and yields a byproduct of dimensionality reduction without quality loss. Code is publicly released.
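
To make "filtering out a subspace" concrete, the sketch below removes from an embedding its component along a given set of directions, using a generic orthogonal projection. This is an assumption about the general shape of such a linear filter, not EmbedFilter's published construction: how the paper selects the directions from the unembedding matrix is not stated in the summary, and the names `orthonormalize` and `filter_embedding` are hypothetical.

```python
def orthonormalize(directions, tol=1e-12):
    """Gram-Schmidt: turn a list of vectors into an orthonormal basis of their span."""
    basis = []
    for d in directions:
        v = list(d)
        for b in basis:
            coeff = sum(x * y for x, y in zip(v, b))
            v = [x - coeff * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol:                      # drop directions already in the span
            basis.append([x / norm for x in v])
    return basis


def filter_embedding(embedding, directions):
    """Subtract the embedding's projection onto the span of `directions`."""
    out = list(embedding)
    for b in orthonormalize(directions):
        coeff = sum(x * y for x, y in zip(out, b))
        out = [x - coeff * y for x, y in zip(out, b)]
    return out


# Toy example: treat the third axis as the "noisy" direction to suppress.
noisy_directions = [[0.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
filtered = filter_embedding([1.0, 2.0, 3.0], noisy_directions)
```

The filtered vector has no component left along the removed directions, so it can be stored in fewer coordinates, which is one way a filter like this could yield dimensionality reduction as a byproduct.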

Evaluation and Benchmarking Inference Economics Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings EmbedFilter

5 · arXiv · cs.CL · 20d ago

TEVI: Sparse autoencoders for text-conditioned editing of CLIP image embeddings to improve vision-language alignment

TEVI is a framework that uses sparse autoencoders to disentangle CLIP image embeddings and a learned masking module to selectively reconstruct embeddings conditioned on a given caption, addressing the information imbalance between images and their captions. The approach improves image-text retrieval on both coarse-grained benchmarks (MS COCO, Flickr) and fine-grained long-caption benchmarks (IIW, DOCCI), with larger gains on richer captions. The work also shows improved robustness on the RoCOCO benchmark.

Evaluation and Benchmarking Multimodal Progress DOCCI MS COCO IIW +4 more

7 · Google DeepMind Blog · 18d ago

Google DeepMind releases Gemma 4 12B, a unified encoder-free multimodal open model

Google DeepMind has released Gemma 4 12B, a new open-weights multimodal model that uses a unified, encoder-free architecture. The model is positioned as a capable multimodal system at the 12B parameter scale. The release is notable as an open-weights model from a frontier lab that eliminates the separate vision encoder common in most multimodal models.

Frontier Model Releases Open Weights Progress Google Google DeepMind Gemma 4 +1 more