4 · arXiv cs.AI (Artificial Intelligence) · 18h ago

TailorMind: Preference-aligned multimodal content generation via hypergraph collaborative filtering

TailorMind is a new system for personalized multimodal content generation that translates user behavioral traces into generation-ready preferences without relying on existing item pools or user-generated content. The approach combines hypergraph collaborative filtering for sparse user histories, ranking-error feedback with textual gradient descent for profile optimization, and retrieval-augmented style control. The authors also introduce TailorBench, a benchmark spanning three platforms that covers coherence, novelty, aesthetic quality, hallucination, and profiling, and they report Recall gains of up to 29% in reranking over baselines.

Evaluation and Benchmarking · Multimodal Progress · TailorBench · iLearn-Lab · TailorMind
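
The summary above reports Recall gains in reranking. As a point of reference, here is a minimal sketch of how Recall@k is commonly computed for a ranked list. The function name `recall_at_k`, the item IDs, and the cutoff `k=2` are illustrative assumptions; none of them come from the TailorMind paper or from TailorBench.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of a ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    # Use a set so a duplicated item in the ranking is not counted twice.
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical toy data: the same candidates before and after reranking.
relevant_items = {"a", "b"}
baseline_order = ["a", "x", "y", "b", "z"]
reranked_order = ["a", "b", "x", "y", "z"]

baseline_recall = recall_at_k(baseline_order, relevant_items, k=2)
reranked_recall = recall_at_k(reranked_order, relevant_items, k=2)
print(baseline_recall, reranked_recall)
```

In this toy case the reranked list moves both relevant items into the top 2, so its Recall@2 is higher than the baseline's. How the paper defines its cutoff and baselines is not stated in the summary.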

Related guides (2)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · arXiv · cs.LG · 28d ago

Active Query Synthesis for Preference Learning via Mutual Information Maximization

This paper introduces Info-Synth, an active query synthesis framework for preference learning that generates optimal pairwise queries by maximizing a mutual information objective in continuous space, bypassing the computational cost of pool-based evaluation. A confidence-aware response model is proposed to handle ambiguous comparisons between nearly identical or highly dissimilar items. Two finite-pool extensions (Pair M-dist and Pair Opt-dist) are also introduced. The framework is validated on synthetic preference tasks, text summarization datasets, and robotic controller tuning.

Evaluation and Benchmarking · Alignment and RLHF · active learning · Pair Opt-dist · mutual information

4 · arXiv · cs.AI · 7d ago

TuneJury: Open pairwise reward model for text-to-music preference alignment

Researchers introduce TuneJury, an open-source instance-level pairwise reward model for text-to-music generation that predicts preference scores from text prompts and audio clips. The model is trained on publicly available human-preference labels spanning arena votes, crowdsourced comparisons, and expert ratings. A post-hoc anchor calibration method enables efficient adaptation to new generators without full retraining. The reward model drives gains across best-of-N selection, latent optimization, and expert-iteration post-training.

Alignment and RLHF · Multimodal Progress · DITTO · Bradley-Terry · TuneJury
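
The item above describes a pairwise reward model used for best-of-N selection, and its tags mention Bradley-Terry. The sketch below shows the generic Bradley-Terry preference probability and a plain best-of-N pick over scalar scores. It is an illustration under assumptions, not TuneJury's implementation: the function names, the clip labels, and the scores are all hypothetical, and a real reward model would produce the scores from a text prompt and audio.

```python
import math
from typing import List, Sequence


def bradley_terry_prob(score_a: float, score_b: float) -> float:
    """P(a is preferred over b) when scores are log-strengths: sigmoid(a - b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_nll(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of one observed preference (chosen over rejected)."""
    return -math.log(bradley_terry_prob(score_chosen, score_rejected))


def best_of_n(candidates: Sequence[str], scores: List[float]) -> str:
    """Return the candidate with the highest reward score."""
    if len(candidates) != len(scores) or not candidates:
        raise ValueError("need one score per candidate and at least one candidate")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index]


# Hypothetical scores for three generated clips.
clips = ["clip_a", "clip_b", "clip_c"]
clip_scores = [0.1, 1.3, -0.4]
print(best_of_n(clips, clip_scores))
print(round(bradley_terry_prob(1.3, 0.1), 3))
```

Training a reward model in this style typically minimizes `pairwise_nll` over labeled comparisons, so that preferred items receive higher scores than rejected ones.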

6 · arXiv · cs.AI · 1mo ago

Semantic Generative Tuning (SGT) for Unified Multimodal Models

This paper introduces Semantic Generative Tuning (SGT), a post-training paradigm for unified multimodal models (UMMs) that bridges the gap between visual understanding and visual generation. The authors find that image segmentation tasks serve as optimal generative proxies, providing structural semantics that improve both perception and generative layout fidelity. SGT aligns representation spaces across understanding and generation objectives, improving feature linear separability and visual-textual attention allocation. Evaluations show consistent gains on multimodal comprehension and generative fidelity benchmarks.

Frontier Model Releases · Alignment and RLHF · Semantic Generative Tuning (SGT) · image segmentation · generative post-training

4 · arXiv · cs.CL · 13d ago

GenAIR: LLM-grounded archetype representations improve sequential recommendation

GenAIR is a framework that uses LLMs to infer 'archetype' profiles of items' ideal target audiences, generating richer item embeddings for sequential recommendation systems. A behavioral calibration objective aligns these semantic embeddings with actual user interaction patterns, closing the gap between language-space representations and real-world behavior. Experiments on three datasets show consistent improvements over state-of-the-art baselines across multiple sequential recommendation models.

Enterprise Deployment Patterns · GenAIR

5 · arXiv · cs.CL · 15d ago

M³Exam: Benchmark for Multimodal Memory in Realistic User-Agent Interactions

Researchers introduce M³Exam, a query-centric multimodal conversational memory benchmark designed to evaluate language agents on realistic user-agent interactions, including cross-modal grounding and implicit information inference. Existing benchmarks are critiqued for assuming sparse visuals and human-human interaction formats. The paper also proposes M³Proctor, a companion memory method that detects query modality bias and retrieves raw visual sources on demand, achieving a 13% accuracy improvement while reducing index-construction time and retrieved tokens by over 70%.

Evaluation and Benchmarking · Agent and Tool Ecosystem · M³Exam · M³Proctor

6 · arXiv · cs.CL · 20d ago

Taiji: Pareto Optimal Policy Optimization for LLM-enhanced recommendation at Kuaishou scale

Researchers from Kuaishou present Taiji, an LLM-as-Enhancer framework for industrial recommender systems that addresses two bottlenecks: generating high-quality chain-of-thought data via reverse-engineered reasoning and rejection sampling during SFT, and balancing semantic vs. ID-based rewards during RL alignment via a new algorithm called Pareto Optimal Policy Optimization (POPO). The system has been deployed on Kuaishou's advertising platform since May 2026, serving over 400 million daily users. The paper contributes both a practical deployment case study and a novel RL alignment technique for the LLM4Rec paradigm.

Enterprise Deployment Patterns · Alignment and RLHF · Taiji · Pareto Optimal Policy Optimization · Kuaishou

5 · arXiv · cs.CL · 21d ago

CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning

CRAM is a new method for Multimodal Continual Instruction Tuning (MCIT) that addresses the tension between catastrophic forgetting and parameter efficiency in MLLMs. It combines adaptive-rank instantiation to dynamically allocate parameters based on capability gaps, centroid-guided routing to reuse existing expert knowledge, and an orthogonality penalty to confine new updates to task-specific directions. The approach uses a Mixture-of-Experts architecture where task-specific patterns are isolated into independent modules, avoiding both the interference of shared updates and the parameter bloat of fully isolated expansion. Experiments across diverse benchmarks show consistent improvements over existing MCIT methods.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Multimodal Large Language Models · CRAM · centroid-guided routing

5 · Hugging Face Blog · 1mo ago

Preference Optimization for Vision Language Models

This Hugging Face blog post covers the application of Direct Preference Optimization (DPO) to vision-language models (VLMs). It likely discusses how preference learning techniques originally developed for text-only LLMs can be adapted to multimodal settings. The post addresses training methodology for aligning VLMs with human preferences across both visual and textual modalities.

Alignment and RLHF · Multimodal Progress · Direct Preference Optimization (DPO) · Vision-Language Models · Hugging Face
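
For reference, the standard DPO objective named in the item above can be written as follows. Here $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model, $y_w$ and $y_l$ are the preferred and rejected responses, $\sigma$ is the logistic function, and $\beta$ is a scaling hyperparameter. Reading the prompt $x$ as an image paired with text is an assumption about the vision-language setting; the summary does not confirm how the post formulates it.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the likelihood of the preferred response relative to the rejected one, measured against the reference model, without training a separate reward model.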