TEVI
TEVI: Sparse autoencoders for text-conditioned editing of CLIP image embeddings to improve vision-language alignment
TEVI is a framework that uses sparse autoencoders to disentangle CLIP image embeddings and a learned masking module to selectively reconstruct embeddings conditioned on a given caption, addressing the information imbalance between images and their captions. The approach improves image-text retrieval on both coarse-grained benchmarks (MS COCO, Flickr) and fine-grained long-caption benchmarks (IIW, DOCCI), with larger gains on richer captions. The work also shows improved robustness on the RoCOCO benchmark.
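The pipeline described above (sparse encoding of the image embedding, a caption-dependent mask over the sparse latents, then reconstruction) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the ReLU encoder, the sigmoid gate computed from the text embedding, the dimensions, the random untrained weights, and all function and parameter names (`sae_encode`, `caption_mask`, `text_conditioned_edit`, `params`) are hypothetical. Real CLIP embeddings and trained weights would replace the random vectors.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Random Gaussian weights standing in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def sae_encode(w_enc, b_enc, image_emb):
    """Assumed SAE encoder: linear map plus ReLU, giving non-negative latents."""
    pre = matvec(w_enc, image_emb)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def caption_mask(w_mask, text_emb):
    """Assumed masking module: one sigmoid gate per latent, driven by the caption."""
    return [1.0 / (1.0 + math.exp(-s)) for s in matvec(w_mask, text_emb)]


def text_conditioned_edit(image_emb, text_emb, params):
    """Encode the image, gate latents by the caption, decode the kept latents."""
    latents = sae_encode(params["w_enc"], params["b_enc"], image_emb)
    mask = caption_mask(params["w_mask"], text_emb)
    kept = [z * m for z, m in zip(latents, mask)]
    edited = l2_normalize(matvec(params["w_dec"], kept))
    return edited, latents, mask


# Toy setup: 8-dim stand-ins for CLIP embeddings, 32 sparse latents.
rng = random.Random(0)
dim, n_latents = 8, 32
params = {
    "w_enc": random_matrix(n_latents, dim, rng),
    "b_enc": [0.0] * n_latents,
    "w_dec": random_matrix(dim, n_latents, rng),
    "w_mask": random_matrix(n_latents, dim, rng),
}
image_emb = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
text_emb = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

edited, latents, mask = text_conditioned_edit(image_emb, text_emb, params)

# Retrieval would score the edited image embedding against the caption embedding.
similarity = sum(a * b for a, b in zip(edited, text_emb))
```

In this sketch the gate is soft (values strictly between 0 and 1); a hard top-k or thresholded mask would be an equally plausible reading of "selectively reconstruct", and the source does not say which form the method uses.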