product

DramaSR-LRM

product · active · provisional · dramasr-lrm-0bef7b9c · 1 event · first seen 16h ago

Aliases: DramaSR-LRM

Co-occurring entities

DramaSR-532K

More like this (12)

DramaSR-532K, MedRLM, G-RRM, CheckRLM, LuxASR, DreamLM, RL², SE-RRM, ContextRL, AdaSR, RLVR, ModeratorLM

Recent events (1)

4 · arXiv · cs.CL · 16h ago

DramaSR-LRM: Reasoning LLM with multimodal tool-use for speaker recognition in TV dramas

Researchers introduce DramaSR-532K, a large-scale benchmark of 532K annotated dialogue lines across 900+ characters from long-form TV dramas, targeting multimodal speaker recognition. They also propose DramaSR-LRM, a system built on a large reasoning model that uses multimodal tool-use to aggregate auditory, linguistic, and visual cues for speaker attribution. The approach significantly outperforms baselines, especially on short utterances where acoustic biometrics alone are unreliable. Data and code are to be publicly released.

Evaluation and Benchmarking · Multimodal Progress · DramaSR-LRM · DramaSR-532K