UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning
UNIEGO: Hierarchical multi-teacher distillation for unified egocentric video representation
Researchers introduce UNIEGO, an egocentric video encoder trained with a hierarchical multi-teacher distillation framework. The framework draws on nine teachers that span ego and exo viewpoints, RGB, depth, and skeleton modalities, and four foundation models. A key contribution is an intermediate layer of proxy models that translate heterogeneous teacher knowledge into a homogeneous space. Selective Proxy Distillation (SPD) then adaptively selects reliable supervision signals for each training sample. UNIEGO achieves state-of-the-art results on action recognition, video retrieval, and action segmentation across three ego-exo benchmarks. The work also addresses a practical deployment constraint: although it is trained with multi-modal, multi-viewpoint supervision, the unified model needs only egocentric video at inference time.
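The summary does not say how SPD decides which proxy signals are "reliable". The following is a minimal sketch of one plausible per-sample selection rule, not the authors' method. It makes three assumptions. First, each proxy emits class logits in a shared label space, which stands in for the "homogeneous space". Second, a proxy counts as reliable for a sample when its softmax probability on the ground-truth label reaches a threshold, and the single most confident proxy is used as a fallback when none qualifies. Third, the student is distilled with temperature-scaled KL divergence, averaged over the selected proxies. The names `select_reliable_proxies`, `selective_proxy_distillation_loss`, `threshold`, and `temperature` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def select_reliable_proxies(
    proxy_logits: Sequence[Sequence[float]], label: int, threshold: float = 0.5
) -> List[int]:
    """Hypothetical reliability rule: keep proxies whose probability on the
    ground-truth label is >= threshold; fall back to the most confident one."""
    confidences = [softmax(logits)[label] for logits in proxy_logits]
    selected = [i for i, c in enumerate(confidences) if c >= threshold]
    if not selected:
        selected = [max(range(len(confidences)), key=confidences.__getitem__)]
    return selected


def selective_proxy_distillation_loss(
    student_logits: Sequence[float],
    proxy_logits: Sequence[Sequence[float]],
    label: int,
    threshold: float = 0.5,
    temperature: float = 2.0,
) -> Tuple[float, List[int]]:
    """Average temperature-scaled KL from each selected proxy to the student.
    The T^2 factor follows the common knowledge-distillation convention."""
    selected = select_reliable_proxies(proxy_logits, label, threshold)
    student_probs = softmax(student_logits, temperature)
    losses = [
        kl_divergence(softmax(proxy_logits[i], temperature), student_probs)
        for i in selected
    ]
    return temperature ** 2 * sum(losses) / len(losses), selected


if __name__ == "__main__":
    # Three hypothetical proxies over three classes; the true label is 0.
    proxies = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
    loss, chosen = selective_proxy_distillation_loss([2.0, 0.5, 0.0], proxies, label=0)
    print(chosen, round(loss, 4))
```

In this toy input only the first proxy puts more than half its probability mass on the true class, so it alone supervises the student for that sample. The second proxy is confidently wrong and is skipped, which is the kind of per-sample filtering the summary attributes to SPD. Averaging KL over the selected set, instead of over all proxies, keeps unreliable teachers from diluting the signal. The actual selection criterion in UNIEGO may differ.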