Entity · benchmark

IIW

benchmark · active · iiw-267b5b70 · 1 event · first seen Jun 8, 2026

Aliases: IIW

Co-occurring entities

DOCCI, MS COCO, TEVI, Flickr30k, RoCOCO, CLIP

More like this (12)

AIEWF IWSLT 2026 Web IQ WISE IFLLM IUVM IAAN CWQ AWQ UST IntelliOps DeepSWIP IQL

Recent events (1)

5 · arXiv · cs.CL · Jun 8, 2026

TEVI: Sparse autoencoders for text-conditioned editing of CLIP image embeddings to improve vision-language alignment

TEVI is a framework that addresses the information imbalance between images and their captions. It uses sparse autoencoders to disentangle CLIP image embeddings, and a learned masking module to selectively reconstruct each embedding conditioned on a given caption. The approach improves image-text retrieval on both coarse-grained benchmarks (MS COCO, Flickr30k) and fine-grained long-caption benchmarks (IIW, DOCCI), with larger gains on richer captions. The work also shows improved robustness on the RoCOCO benchmark.
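
The mechanism summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the weight shapes, the sigmoid gate computed by a linear projection of the caption embedding, and the random untrained weights are all assumptions made for illustration. It only shows the general shape of the idea: encode an image embedding into sparse latents, gate those latents with a caption-dependent mask, and decode the result back into embedding space.

```python
import numpy as np

def edit_image_embedding(img_emb, txt_emb, W_enc, b_enc, W_dec, b_dec, W_mask):
    """Hypothetical text-conditioned edit of an image embedding.

    img_emb, txt_emb: (d,) image and caption embeddings.
    W_enc: (d, m) and W_dec: (m, d) are sparse-autoencoder weights (assumed).
    W_mask: (d, m) maps the caption embedding to one gate per latent (assumed).
    """
    # Sparse autoencoder encoder: ReLU gives a non-negative latent code.
    latents = np.maximum(img_emb @ W_enc + b_enc, 0.0)
    # Caption-conditioned mask: one gate in (0, 1) per latent.
    mask = 1.0 / (1.0 + np.exp(-(txt_emb @ W_mask)))
    # Decode only the gated latents back into embedding space.
    edited = (latents * mask) @ W_dec + b_dec
    # Unit-normalize so the result can be compared by cosine similarity.
    return edited / (np.linalg.norm(edited) + 1e-12)

# Toy usage with random, untrained weights (illustrative sizes only).
rng = np.random.default_rng(0)
d, m = 8, 32
img_emb = rng.normal(size=d)
txt_emb = rng.normal(size=d)
W_enc, b_enc = rng.normal(size=(d, m)), np.zeros(m)
W_dec, b_dec = rng.normal(size=(m, d)), np.zeros(d)
W_mask = rng.normal(size=(d, m))

edited = edit_image_embedding(img_emb, txt_emb, W_enc, b_enc, W_dec, b_dec, W_mask)
print(edited.shape)
```

In a real system the encoder, decoder, and mask weights would be learned, and the embeddings would come from a CLIP model rather than a random generator.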

Evaluation and Benchmarking · Multimodal Progress · DOCCI · MS COCO · IIW