Almanac
paper

Data Selection Through Iterative Self-Filtering for Vision-Language Settings

paper · active · provisional · data-selection-through-iterative-self-filtering-for-vision-language-settings-bfa887b3 · 1 event · first seen 43h ago

Aliases: Data Selection Through Iterative Self-Filtering for Vision-Language Settings

Recent events (1)

5 · arXiv · cs.AI · 43h ago

Self-Filtering: Iterative bootstrapped data selection for vision-language model training

Researchers propose Self-Filtering, a bootstrapped data curation method for vision-language models in which a CLIP model iteratively trains on and re-selects its own training data. The approach alternates between filtering high-confidence clean samples and preserving distributional diversity, without requiring curated reference datasets or pre-trained external models. Experiments show downstream performance improvements over standard noisy training pipelines.
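
The summary gives no implementation details, so the following is only a toy sketch of the general shape of such a loop, not the authors' method. It assumes precomputed image and text feature arrays, uses a least-squares linear map as a stand-in for CLIP training, and uses a per-cluster quota as one possible way to preserve distributional diversity. All function names and parameters (`iterative_self_filter`, `select_per_cluster`, `keep_frac`, `clusters`) are hypothetical.

```python
import numpy as np


def cosine_scores(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return np.sum(a * b, axis=1)


def select_per_cluster(scores, clusters, keep_frac):
    """Keep the highest-scoring `keep_frac` of samples inside each cluster.

    Selecting within clusters (an assumption of this sketch) keeps every
    region of the data represented instead of only the globally easiest pairs.
    """
    keep = np.zeros(scores.shape[0], dtype=bool)
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        k = max(1, int(np.ceil(keep_frac * idx.size)))
        top = idx[np.argsort(scores[idx])[::-1][:k]]
        keep[top] = True
    return keep


def iterative_self_filter(img, txt, clusters, rounds=3, keep_frac=0.5):
    """Alternate between fitting on the kept pairs and re-selecting the pool."""
    keep = np.ones(img.shape[0], dtype=bool)  # round 0: train on everything
    for _ in range(rounds):
        # Stand-in for model training: least-squares map from image features
        # to text features, fitted only on the currently kept pairs.
        W, *_ = np.linalg.lstsq(img[keep], txt[keep], rcond=None)
        # The fitted model re-scores the whole pool, not just the kept subset.
        scores = cosine_scores(img @ W, txt)
        keep = select_per_cluster(scores, clusters, keep_frac)
    return keep
```

In this sketch each round re-scores the full pool, so a pair dropped early can be re-admitted once the model improves. Whether the proposed method works this way is not stated in the summary.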