
Data Selection Through Iterative Self-Filtering for Vision-Language Settings

paper · active · provisional · data-selection-through-iterative-self-filtering-for-vision-language-settings-bfa887b3 · 1 event · first seen 43h ago


Co-occurring entities

CLIP

More like this (12)

Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
contrastive vision-language pretraining
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
Vision-Language Models
visual language model
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Connecting Speech to Words through Images
Training-Free Semantic Correction for Autoregressive Visual Models
vision-language grounding

Recent events (1)

5 · arXiv · cs.AI · 43h ago

Self-Filtering: Iterative bootstrapped data selection for vision-language model training

Researchers propose Self-Filtering, a bootstrapped data curation method for vision-language models in which a CLIP model iteratively trains on and re-selects its own training data. The approach alternates between filtering high-confidence clean samples and preserving distributional diversity, without requiring curated reference datasets or pre-trained external models. Experiments show downstream performance improvements over standard noisy training pipelines.
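The alternation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the scoring rule, the diversity mechanism, or the schedule. The sketch assumes that "confidence" is the cosine similarity between paired image and text embeddings, and that diversity is preserved by keeping the top fraction of samples within each group (for example, clusters of image embeddings) rather than globally. The names `cosine_scores`, `select_indices`, `self_filter`, `train_fn`, `embed_fn`, `groups`, and `keep_frac` are all hypothetical.

```python
import numpy as np


def cosine_scores(img_emb, txt_emb):
    """Row-wise cosine similarity between paired image and text embeddings."""
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return np.sum(img_emb * txt_emb, axis=1)


def select_indices(scores, groups, keep_frac):
    """Keep the highest-scoring keep_frac of samples inside each group.

    Selecting per group instead of globally is an assumed stand-in for
    "preserving distributional diversity": no group is filtered out entirely.
    """
    kept = []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        k = max(1, int(np.ceil(keep_frac * members.size)))
        order = np.argsort(scores[members])[::-1]  # descending by score
        kept.extend(members[order[:k]].tolist())
    return np.array(sorted(kept))


def self_filter(images, texts, groups, train_fn, embed_fn, rounds=3, keep_frac=0.5):
    """Alternate between training on the current subset and re-selecting data.

    train_fn(model, images, texts) -> model       (hypothetical training step)
    embed_fn(model, images, texts) -> (img, txt)  (hypothetical encoder call)
    """
    selected = np.arange(len(images))  # round 0: train on everything
    model = None
    for _ in range(rounds):
        model = train_fn(model, images[selected], texts[selected])
        # Re-score the full pool, so previously dropped samples can return.
        img_emb, txt_emb = embed_fn(model, images, texts)
        scores = cosine_scores(img_emb, txt_emb)
        selected = select_indices(scores, groups, keep_frac)
    return model, selected
```

In a real pipeline `train_fn` would run contrastive training of the CLIP model and `embed_fn` would call its image and text encoders. For a quick check, both can be replaced by stubs that pass precomputed embeddings through unchanged.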

Training Infrastructure · Multimodal Progress · Data Selection Through Iterative Self-Filtering for Vision-Language Settings · CLIP