ViT (Vision Transformer)
vit-vision-transformer--f25a37b0 · 3 events · first seen 28d ago
Aliases: ViT (Vision Transformer), Vision Transformers (ViTs), Vision Transformer (ViT), Vision Transformer
Recent events (3)
Information-theoretic formalization of the binding problem in Vision Transformers
Researchers introduce a formal information-theoretic framework for the binding problem: the challenge of associating features such as color and shape with the correct objects in multi-object scenes. They develop a probing method that measures binding information in model representations and apply it to several pre-trained Vision Transformers, examining components such as the [CLS] token and the spatial tokens across datasets with feature sharing, occlusion, and natural features. The results position binding information as a key factor in the quality of visual recognition and reasoning, and suggest that current ViT architectures have limited binding capability, consistent with known failure modes.
New ViT and ALIGN Models From Kakao Brain
Kakao Brain released new Vision Transformer (ViT) and ALIGN models, announced via the Hugging Face blog. The post covers multimodal vision-language models contributed to the open ecosystem. These models expand the available open-weights options for image-text tasks.
OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
This paper introduces Orthogonal Residual Projection (ORP), an algorithm-hardware co-design framework for ultra-low-bit quantization of LLMs and Vision Transformers targeting edge deployment. ORP addresses the structural limitations of Power-of-Two (PoT) quantization by formulating quantization as a dual-basis geometric projection that synthesizes higher-resolution residual lattices using only shift-and-add operations, eliminating multipliers. With 3-bit weights and 16-bit activations (W3/A16), ORP achieves a perplexity of 6.10 on LLaMA-2-7B, competitive with MAC-intensive baselines such as AWQ, while reducing full-model calibration time to ~15 minutes. RTL synthesis at 28nm confirms the hardware efficiency gained by mitigating the timing bottlenecks of dense multiplier trees.
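To make the power-of-two idea in the summary above concrete, here is a minimal sketch of generic PoT quantization with one residual term. It is not ORP's dual-basis projection, which the summary does not specify. The function names (`pot_quantize`, `two_term_pot`, `shift_mul`) and the exponent range are assumptions chosen for illustration. It shows only why a second PoT term on the residual can tighten the approximation while every multiply remains a bit shift.

```python
import math

def pot_quantize(w, min_exp=-8, max_exp=0):
    """Round w to the nearest value in {0, +/-2**k} for min_exp <= k <= max_exp.

    The exponent range is an illustrative assumption, not a value from the paper.
    """
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    mag = abs(w)
    # The two powers of two that bracket |w|, clamped to the allowed range.
    lo = max(min_exp, min(max_exp, math.floor(math.log2(mag))))
    hi = min(max_exp, lo + 1)
    candidates = [0.0, 2.0 ** lo, 2.0 ** hi]
    return sign * min(candidates, key=lambda c: abs(c - mag))

def two_term_pot(w, min_exp=-8, max_exp=0):
    """Approximate w as a sum of two PoT terms: one for w, one for its residual."""
    q1 = pot_quantize(w, min_exp, max_exp)
    q2 = pot_quantize(w - q1, min_exp, max_exp)
    return q1 + q2

def shift_mul(x, k):
    """Multiply the integer x by 2**k using only a bit shift (no multiplier)."""
    return x << k if k >= 0 else x >> -k

if __name__ == "__main__":
    for w in (0.3, -0.7, 0.11):
        one, two = pot_quantize(w), two_term_pot(w)
        print(f"w={w:+.3f}  one-term={one:+.5f}  two-term={two:+.5f}")
```

Because zero is always a candidate for the residual term, the two-term approximation is never worse than the single-term one. In hardware, each term costs one shift and the sum costs one add, which is the general sense in which such schemes are described as multiplier-free.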