When to Align, When to Predict: A Phase Diagram for Multimodal Learning

paper · active · when-to-align-when-to-predict-a-phase-diagram-for-multimodal-learning-c5aeedec · 1 event · first seen Jun 10, 2026

Recent events (1)

6 · arXiv · cs.LG · Jun 10, 2026

Phase diagram framework for choosing between cross-modal alignment and prediction in multimodal learning

A new arXiv preprint develops a unified linear framework for deciding when cross-modal alignment (CA) or cross-modal prediction (CP) is the better objective for multimodal representation learning. Under a spiked signal-plus-noise model, the authors derive separation ratios that expose complementary failure modes for each paradigm, producing a four-regime phase diagram (Both, CA only, CP only, Neither). A data-driven procedure lets practitioners locate their dataset in this diagram using a small labeled subsample before committing to training. Experiments on synthetic data, stereo vision, image-caption pairs, and astrophysical data validate the framework, including a 'Neither' regime where cross-modal training is actively harmful.
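
To make the four-regime idea concrete, here is a minimal sketch of how two separation ratios could map onto the regimes named above. It is an illustration only, not the authors' procedure: the summary does not give the ratio formulas or the decision boundary, so the function name `classify_regime`, the inputs `ca_ratio` and `cp_ratio`, and the default `threshold` of 1.0 are all hypothetical placeholders.

```python
from enum import Enum


class Regime(Enum):
    """The four regimes named in the summary."""
    BOTH = "Both"
    CA_ONLY = "CA only"
    CP_ONLY = "CP only"
    NEITHER = "Neither"


def classify_regime(ca_ratio: float, cp_ratio: float, threshold: float = 1.0) -> Regime:
    """Map two separation ratios to a regime.

    ASSUMPTION: a ratio above `threshold` means the corresponding objective
    is expected to help. The real ratios and boundary come from the paper
    and are not reproduced here.
    """
    ca_helps = ca_ratio > threshold
    cp_helps = cp_ratio > threshold
    if ca_helps and cp_helps:
        return Regime.BOTH
    if ca_helps:
        return Regime.CA_ONLY
    if cp_helps:
        return Regime.CP_ONLY
    return Regime.NEITHER


# Hypothetical ratios, as if estimated from a small labeled subsample.
print(classify_regime(ca_ratio=2.3, cp_ratio=0.4).value)
```

In practice the two inputs would be estimated from the small labeled subsample mentioned above; a result of "Neither" would be a signal to skip cross-modal training altogether.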
