Entity · model

LLaVA-1.5-13B

model · active · llava-1-5-13b-6b0fb3f5 · 2 events · first seen May 26, 2026

Aliases: LLaVA-1.5-13B, LLaVA-1.5

Co-occurring entities

FastV · PDrop · Qwen · Nüwa · Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models · Reroute · MAGIC · LLaVA-1.5-7B · LLaVA-665K · Skill-Neuron Signatures · Vision-Flan · Multimodal Gain

More like this (12)

LLaVA-1.5-7B · LLaMA-2-13B · LLaVA-665K · LLaDA-1.5-8B · LLaVA-OneVision-1.5-8B · LLaVA 1.6 · LLaVA-v1.5-Instruct · LLaDA-8B · LLaDA-8B-Base · LLaVA OneVision 72B · LLaMA-2-7B-32K-Instruct · LLaMA-7B

Recent events (2)

5 · arXiv · cs.AI · Jun 11, 2026

Reroute: Training-free recoverable visual token routing for vision-language models

A new arXiv preprint proposes Reroute, a training-free plug-in that replaces the standard rank-and-remove visual token pruning paradigm in VLMs with a recoverable routing mechanism. Instead of permanently discarding low-ranked tokens, Reroute defers them to re-enter the candidate pool at later decoder stages, addressing the problem that token importance shifts across decoder depth. Evaluated on LLaVA-1.5 and Qwen backbones augmented with FastV, PDrop, and Nüwa pruning methods, Reroute improves grounding performance under aggressive token reduction without sacrificing general VQA accuracy. The approach preserves the theoretical compute and KV-cache budget of the underlying pruning method.
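
The contrast between rank-and-remove pruning and recoverable routing can be illustrated with a toy sketch. This is not the preprint's implementation: the function names, the per-stage score arrays, and the rule that every deferred token competes again at each later stage are assumptions made for illustration only. Both functions keep the same number of tokens per stage, mirroring the claim that the compute budget is preserved.

```python
import numpy as np

def rank_and_remove(scores_per_stage, budgets):
    """Baseline sketch: a token dropped at one stage never returns."""
    alive = np.arange(scores_per_stage[0].shape[0])
    kept = []
    for scores, k in zip(scores_per_stage, budgets):
        # Rank only the tokens that survived earlier stages.
        order = np.argsort(-scores[alive], kind="stable")[:k]
        alive = np.sort(alive[order])
        kept.append(alive.tolist())
    return kept

def recoverable_routing(scores_per_stage, budgets):
    """Hypothetical sketch: unselected tokens are deferred, not deleted."""
    n = scores_per_stage[0].shape[0]
    active = np.arange(n)
    deferred = np.array([], dtype=int)
    kept = []
    for scores, k in zip(scores_per_stage, budgets):
        # Assumption: deferred tokens re-enter the candidate pool here.
        pool = np.sort(np.concatenate([active, deferred]))
        order = np.argsort(-scores[pool], kind="stable")[:k]
        active = np.sort(pool[order])
        deferred = np.setdiff1d(pool, active)
        kept.append(active.tolist())
    return kept

# Token 2 looks unimportant at stage 1 but becomes the top token at stage 2.
stage_scores = [
    np.array([0.9, 0.8, 0.1, 0.2]),
    np.array([0.1, 0.8, 0.9, 0.2]),
]
print(rank_and_remove(stage_scores, budgets=[2, 1]))
print(recoverable_routing(stage_scores, budgets=[2, 1]))
```

In this toy input the baseline can never select token 2 at the second stage because it was discarded at the first, while the routing variant can, using the same per-stage budget.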

Inference Economics · Multimodal Progress · FastV · PDrop · Qwen · +4 more

6 · arXiv · cs.CL · May 26, 2026

MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models

MAGIC is a training-free coreset selection method for multimodal instruction tuning. It uses three intrinsic signals (Multimodal Gain, Bridging Relevance, and Skill-Neuron Signatures) to identify compact, behaviorally faithful training subsets without backpropagation. The method runs as a three-stage pipeline: filtering out low-gain examples, ranking the remainder by a quality objective, and allocating the budget bucket-wise over neuron signatures. On the LLaVA-665K and Vision-Flan datasets with a 20% data budget, MAGIC matches or slightly exceeds full-data fine-tuning performance (100.3% and 101.6% relative) while reducing wall-clock training time by 73.7%. Results transfer to the LLaVA-1.5-7B and LLaVA-1.5-13B target models.
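
The three-stage shape of such a pipeline (filter, rank, bucket-wise allocation) can be sketched as follows. This is a rough illustration, not MAGIC's actual procedure: the field names `gain`, `quality`, and `signature`, the fixed gain threshold, and the round-robin allocation across buckets are all assumptions, since the summary does not specify how the signals are computed or how the budget is split.

```python
from collections import defaultdict

def select_coreset(examples, budget, gain_threshold):
    """Hypothetical filter -> rank -> bucket-wise allocation sketch."""
    # Stage 1: filter out low-gain examples (threshold is an assumption).
    survivors = [e for e in examples if e["gain"] >= gain_threshold]

    # Stage 2: rank the survivors by a scalar quality score, best first.
    ranked = sorted(survivors, key=lambda e: e["quality"], reverse=True)

    # Stage 3: group by signature bucket, then spend the budget
    # round-robin so no single bucket dominates (assumed policy).
    buckets = defaultdict(list)
    for e in ranked:
        buckets[e["signature"]].append(e)
    queues = list(buckets.values())

    selected = []
    while len(selected) < budget and any(queues):
        for queue in queues:
            if queue and len(selected) < budget:
                selected.append(queue.pop(0))
    return [e["id"] for e in selected]

# Made-up toy data; real signals would come from the model.
toy = [
    {"id": "a", "gain": 0.9, "quality": 0.5, "signature": "x"},
    {"id": "b", "gain": 0.1, "quality": 0.9, "signature": "x"},
    {"id": "c", "gain": 0.8, "quality": 0.7, "signature": "x"},
    {"id": "d", "gain": 0.7, "quality": 0.2, "signature": "y"},
    {"id": "e", "gain": 0.6, "quality": 0.6, "signature": "x"},
]
print(select_coreset(toy, budget=3, gain_threshold=0.5))
```

In the toy data, example `b` is dropped by the gain filter despite its high quality score, and example `d` is selected ahead of `a` because the allocation step spreads the budget across signature buckets rather than taking the global top three.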

Training Infrastructure · Inference Economics · MAGIC · LLaVA-1.5-7B · LLaVA-665K · +5 more