Entity · paper

TEVI

paper · active · tevi-2bebde0a · 1 event · first seen Jun 8, 2026

Aliases: TEVI

Co-occurring entities

DOCCI MS COCO IIW Flickr30k RoCOCO CLIP

More like this (12)

TIES T-Eval TTT-E2E NTREX TEQST MIRA-Ev EEVEE TIMIT τ-Voice MMTEB ETTm2 TREK

Recent events (1)

arXiv · cs.CL · Jun 8, 2026

TEVI: Sparse autoencoders for text-conditioned editing of CLIP image embeddings to improve vision-language alignment

TEVI is a framework that uses sparse autoencoders to disentangle CLIP image embeddings and a learned masking module to selectively reconstruct those embeddings conditioned on a given caption. It targets the information imbalance between images and their captions: an image typically carries more detail than its caption describes. The approach improves image-text retrieval on both coarse-grained benchmarks (MS COCO, Flickr30k) and fine-grained long-caption benchmarks (IIW, DOCCI), with larger gains on richer captions. The work also shows improved robustness on the RoCOCO benchmark.
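The pipeline described in the summary (encode the image embedding into sparse latents, derive a caption-dependent mask, decode only the kept latents) can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the dimensions, the random weights, the sigmoid gate, and every name (`sae_encode`, `caption_mask`, `edit_image_embedding`, `W_mask`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, LATENT_DIM = 8, 32  # toy sizes (assumption); real CLIP embeddings are larger

# Hypothetical, randomly initialised parameters; a trained system would learn these.
W_enc = rng.normal(scale=0.1, size=(EMBED_DIM, LATENT_DIM))
b_enc = np.zeros(LATENT_DIM)
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, EMBED_DIM))
b_dec = np.zeros(EMBED_DIM)
W_mask = rng.normal(scale=0.1, size=(EMBED_DIM, LATENT_DIM))


def sae_encode(image_emb):
    """Map an image embedding to non-negative latents (ReLU encoder)."""
    return np.maximum(image_emb @ W_enc + b_enc, 0.0)


def sae_decode(latents):
    """Linearly reconstruct an embedding from the latents."""
    return latents @ W_dec + b_dec


def caption_mask(text_emb):
    """Soft gate in (0, 1) per latent, computed from the caption embedding.

    A sigmoid over a linear projection is an assumed stand-in for the
    learned masking module mentioned in the summary.
    """
    return 1.0 / (1.0 + np.exp(-(text_emb @ W_mask)))


def edit_image_embedding(image_emb, text_emb):
    """Reconstruct the image embedding from caption-gated latents."""
    latents = sae_encode(image_emb)
    mask = caption_mask(text_emb)
    return sae_decode(latents * mask), mask


# Stand-ins for CLIP image and text embeddings (random vectors here).
image_emb = rng.normal(size=EMBED_DIM)
text_emb = rng.normal(size=EMBED_DIM)
edited_emb, mask = edit_image_embedding(image_emb, text_emb)
```

In a retrieval setting, the edited embedding would presumably replace the raw image embedding when scoring against the caption. With a mask of all ones the sketch reduces to a plain autoencoder reconstruction.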

Evaluation and Benchmarking Multimodal Progress DOCCI MS COCO IIW +4 more