Entity · person

Jiasen Lu

person · active · jiasen-lu-74869b5e · 1 event · first seen Jun 2, 2026

Aliases: Jiasen Lu

Co-occurring entities

FLUX.1-dev, Rotary Position Embedding (RoPE), TokenBench, Liangchen Song, Trellis-SLAT, AToken, Apple, ImageNet, UniTok, SigLIP2

More like this (12)

Jiyuan Tan, Christina Lu, Jiazheng Xing, Jiaheng Hu, Jocelyn Shen, Hanjiang Hu, Yanjun Zhao, Jiatao Gu, Yuxiao Qu, Hanlin Zhu, Jason Liu, Baiyu Chen

Recent events (1)

The Batch · Jun 2, 2026

Apple's AToken: A Unified Multimodal Tokenizer and Encoder for Images, Videos, and 3D Objects

Apple researchers introduced AToken, a transformer model with a single 4D tokenizer and encoder-decoder architecture that handles images, videos, and 3D objects in a shared token space. The model is trained to both reconstruct and classify all three media types. It builds on a pretrained SigLIP2 vision encoder, extended to four dimensions with 4D Rotary Position Embedding. AToken approaches or matches specialized models on image classification (82.2% accuracy on ImageNet), image reconstruction (0.21 rFID), and 3D reconstruction (28.28 PSNR), while remaining competitive on video tasks. The work addresses a longstanding tension between generation-focused and classification-focused encoders by forcing the embeddings to retain both fine visual detail and semantic content.
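To make the "4D Rotary Position Embedding" idea concrete, here is a minimal sketch of one common way to extend RoPE to several axes: split the feature vector into four equal chunks and rotate each chunk by one coordinate. This is an illustrative assumption, not AToken's actual implementation; the function names `rope_1d` and `rope_4d`, the (t, x, y, z) coordinate order, the equal four-way split, and the base of 10000 are all hypothetical choices made for the example.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Apply standard 1D rotary embedding to an even-length list of floats.

    Each consecutive pair (vec[i], vec[i+1]) is rotated by an angle
    proportional to `pos`, with a frequency that decays along the vector.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.append(vec[i] * c - vec[i + 1] * s)
        out.append(vec[i] * s + vec[i + 1] * c)
    return out


def rope_4d(vec, coord, base=10000.0):
    """Axial 4D rotary embedding (assumed layout, for illustration only).

    vec:   list of floats whose length is divisible by 8
    coord: four numbers, here interpreted as (t, x, y, z)
    The vector is split into four chunks; chunk k is rotated by coord[k].
    """
    d = len(vec)
    if d % 8 != 0:
        raise ValueError("feature dimension must be divisible by 8")
    chunk = d // 4
    out = []
    for axis in range(4):
        part = vec[axis * chunk:(axis + 1) * chunk]
        out.extend(rope_1d(part, coord[axis], base))
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))
```

Because each chunk is only rotated, the vector's length is unchanged, and the dot product between a rotated query and a rotated key depends only on the difference of their coordinates. Under this assumed layout, an image patch could use t = 0 and z = 0, a video patch z = 0, and a 3D patch t = 0, which is one way a single position scheme can cover all three media types.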

Frontier Model Releases, Multimodal Progress, FLUX.1-dev, Rotary Position Embedding (RoPE), Jiasen Lu