DramaSR-LRM

product · active · provisional · dramasr-lrm-0bef7b9c · 1 event · first seen 16h ago

Aliases: DramaSR-LRM

Recent events (1)

4 · arXiv · cs.CL · 16h ago

DramaSR-LRM: Reasoning LLM with multimodal tool-use for speaker recognition in TV dramas

Researchers introduce DramaSR-532K, a large-scale benchmark for multimodal speaker recognition comprising 532K annotated dialogue lines spoken by 900+ characters in long-form TV dramas. They also propose DramaSR-LRM, a system built on a large reasoning model that uses multimodal tool-use to aggregate auditory, linguistic, and visual cues for speaker attribution. The approach significantly outperforms baselines, especially on short utterances, where acoustic biometrics alone are unreliable. The authors plan to release the data and code publicly.