dataset

IFLLM

dataset · active · provisional · ifllm-0c576414 · 1 event · first seen 47h ago

Aliases: IFLLM

Co-occurring entities

Direct Preference Optimization (DPO)

More like this (12)

LiteLLM InfLLMv2 ICLR DistIL IIW FAISS whichllm Fast-dLLM FID MaFI FigSIM MedRLM

Recent events (1)

5 · arXiv · cs.CL · 47h ago

IFLLM dataset uses mouse and eye-tracking signals to improve LLM alignment via implicit feedback

Researchers introduce IFLLM, a dataset of 1,336 multi-turn interactions from 59 Mechanical Turk workers that records mouse trajectories and webcam-derived eye gaze, built to study implicit user feedback for LLM alignment. Adding this implicit feedback to a text-based reward model raises its accuracy from 55% to 64%, and combining the resulting reward model with DPO nearly triples the relative improvement in response quality across eight LLMs. The work addresses the scarcity and cost of explicit preference annotations by mining behavioral signals already present in user interactions.
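For readers unfamiliar with the two quantities the summary mentions, the sketch below shows how reward model accuracy on preference pairs and the standard DPO loss for a single pair are commonly computed. It is a minimal illustration and not the paper's code: the function names, the toy scores, and the `beta` value are assumptions, and the summary does not say how IFLLM's implicit signals are fed into the reward model.

```python
import math


def pairwise_accuracy(score_pairs):
    """Fraction of (chosen, rejected) score pairs ranked correctly.

    A pair counts as correct only when the reward model scores the
    human-preferred ("chosen") response strictly higher than the other.
    """
    correct = sum(1 for chosen, rejected in score_pairs if chosen > rejected)
    return correct / len(score_pairs)


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model. beta=0.1 is an illustrative value.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy reward scores (made up for illustration): two of four pairs are ranked correctly.
toy_pairs = [(1.0, 0.5), (0.2, 0.9), (0.7, 0.1), (0.3, 0.3)]
toy_accuracy = pairwise_accuracy(toy_pairs)

# When the policy equals the reference model, the margin is zero and the loss is log(2).
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)
```

In a pipeline of the kind the summary describes, a reward model would typically rank candidate responses to form the chosen/rejected pairs that DPO then trains on, so a more accurate reward model yields cleaner pairs.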

Evaluation and Benchmarking · Alignment and RLHF · Direct Preference Optimization (DPO) · IFLLM