Entity · paper

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

paper · active · who-needs-labels-adapting-vision-foundation-models-with-the-metadata-you-already-have-76a6d1d2 · 1 event · first seen Jun 4, 2026

Co-occurring entities

FINO

More like this (12)

Beyond Independent Labels: Schwartz-Geometry Decoding for Human Value Detection
Evidence Attribution in Visual Document Understanding without Coordinates or Region Labels
Vision-Language Models
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Selective Disclosure Watermarking for Large Language Models
Soft Label Supervision
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Benchmarking Multimodal Large Language Models for Scientific Visualization Literacy
Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval
MLSkip: Data Skipping for ML Filters via Lightweight Metadata
Gaze Heads: How VLMs Look at What They Describe

Recent events (1)

5 · arXiv · cs.AI · Jun 4, 2026

FINO: Label-free adaptation of vision foundation models using metadata in scientific domains

Researchers propose FINO, a self-supervised method for adapting vision foundation models to specialized scientific domains without task labels, using metadata as a guidance signal instead. The approach combines a standard self-supervised objective with flexible handling of both discrete and continuous metadata to preserve informative factors while suppressing spurious ones. Evaluated across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO outperforms both unsupervised domain adaptation and fully supervised fine-tuning, including domain-specific state-of-the-art models.

Evaluation and Benchmarking · FINO · Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have