ImageNet

dataset · active · imagenet-04eee4ed · 7 events · first seen 29d ago

Aliases: ImageNet, ImageNet-1K

Recent events (7)

6 · OpenAI Blog · 28d ago

AI and Efficiency: Algorithmic Progress Halving Training Compute Every 16 Months Since 2012

OpenAI released an analysis showing that the compute required to match AlexNet-level ImageNet performance has decreased 44x since 2012, with algorithmic efficiency doubling every 16 months. This outpaces Moore's Law, which would have yielded only an 11x improvement over the same period. The findings suggest that for AI tasks that attract heavy investment, algorithmic progress is a larger driver of efficiency gains than hardware improvements alone.
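The relationship between the 44x figure, the 16-month doubling time, and the 11x Moore's Law comparison can be checked with a few lines of arithmetic. The sketch below assumes a seven-year window (2012 to 2019) and a two-year doubling period for Moore's Law; neither number is stated in the summary above, and the function names are illustrative only.

```python
import math

def doubling_time_months(total_gain: float, period_months: float) -> float:
    """Months per doubling implied by a total gain over a period."""
    return period_months / math.log2(total_gain)

def gain_from_doubling(period_months: float, doubling_months: float) -> float:
    """Total gain after a period, given a fixed doubling time."""
    return 2.0 ** (period_months / doubling_months)

# Assumption: the comparison window is 7 years (2012 to 2019).
PERIOD_MONTHS = 7 * 12

# Doubling time implied by a 44x reduction in training compute.
algo_doubling = doubling_time_months(44.0, PERIOD_MONTHS)

# Assumption: Moore's Law modeled as a doubling every 24 months.
moore_gain = gain_from_doubling(PERIOD_MONTHS, 24.0)

print(f"implied algorithmic doubling time: {algo_doubling:.1f} months")
print(f"Moore's Law gain over the window:  {moore_gain:.1f}x")
```

Under these assumptions the endpoint calculation gives a doubling time of a little over 15 months and a Moore's Law gain of about 11x, which is roughly in line with the rounded figures quoted in the summary.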

5 · arXiv · cs.LG · 12h ago

Large-scale benchmarking finds dataset distillation methods fail to outperform coresets on ImageNet-scale tasks

A new arXiv paper critically evaluates seven state-of-the-art dataset distillation (DD) methods against coreset selection (CS) strategies using standardized protocols on ImageNet-1K, ImageNet100, and ImageNette. Results show that some DD methods fail to beat random subsets, and that the strongest DD approaches perform comparably to or worse than coresets on large-scale datasets while incurring substantially higher construction costs. The paper also finds that coresets cover the original data distribution better in terms of representativeness and diversity, challenging the prevailing assumption that synthetic samples are inherently more expressive than real-data subsets.
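To make the distinction between a random subset and a coreset concrete, here is a minimal sketch using k-center greedy (farthest-point) selection, one commonly used coreset strategy. It is not necessarily among the methods the paper evaluates, the 2D Gaussian points stand in for image features, and `coverage_radius` is a hypothetical proxy for coverage rather than the paper's metric.

```python
import math
import random

def random_subset(points, k, rng):
    """Baseline: pick k distinct indices uniformly at random."""
    return rng.sample(range(len(points)), k)

def k_center_greedy(points, k, rng):
    """Farthest-point selection: repeatedly add the point that is
    farthest from everything selected so far."""
    first = rng.randrange(len(points))
    selected = [first]
    # Distance from each point to its nearest selected point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

def coverage_radius(points, subset):
    """Largest distance from any point to its nearest selected point
    (smaller means the subset covers the data more evenly)."""
    return max(min(math.dist(p, points[j]) for j in subset) for p in points)

rng = random.Random(0)
# Toy stand-in for feature vectors: 300 points from a 2D Gaussian.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
k = 10

rand_idx = random_subset(points, k, rng)
core_idx = k_center_greedy(points, k, rng)

print("random subset radius:", round(coverage_radius(points, rand_idx), 3))
print("k-center radius:     ", round(coverage_radius(points, core_idx), 3))
```

Both selections keep only real samples; the difference is whether they are chosen blindly or to spread over the data. Dataset distillation instead synthesizes new samples, which is where the additional construction cost mentioned above comes from.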

5 · Hugging Face Blog · 29d ago

LeRobot Community Datasets: The "ImageNet" of Robotics — When and How?

Hugging Face's LeRobot blog post discusses the vision and current state of building a large-scale community robotics dataset analogous to ImageNet for computer vision. The post examines what it would take to create a standardized, scalable dataset repository for robot learning, drawing on the LeRobot ecosystem. It addresses data collection formats, community contribution workflows, and the open challenges in making such a resource practically useful for training generalizable robot policies.

6 · The Batch · 15d ago

Apple's AToken: A Unified Multimodal Tokenizer and Encoder for Images, Videos, and 3D Objects

Apple researchers introduced AToken, a transformer model with a single 4D tokenizer and encoder-decoder architecture that handles images, videos, and 3D objects in a shared token space. The model is trained to both reconstruct and classify all three media types, using a pretrained SigLIP2 vision encoder extended to four dimensions with 4D Rotary Position Embedding. AToken approaches or matches specialized models on image classification (82.2% accuracy on ImageNet), image generation (0.21 rFID), and 3D reconstruction (28.28 PSNR), while remaining competitive on video tasks. The work addresses a longstanding tension between generation-focused and classification-focused encoders by forcing embeddings to retain both fine visual detail and semantic content.
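For readers unfamiliar with the reconstruction metric quoted above, PSNR (peak signal-to-noise ratio) is conventionally defined as 10 · log10(MAX² / MSE) and reported in decibels. The sketch below shows that textbook definition on flat lists of pixel values; the `psnr` helper and the 8-bit peak value of 255 are assumptions for illustration, and the summary does not say how AToken's 3D reconstruction score was computed.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values. Higher means a closer reconstruction."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example: four 8-bit pixels, two of them off by 10 levels.
original = [100, 100, 100, 100]
decoded = [110, 90, 100, 100]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in mean squared error.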

5 · The Batch · 14d ago

Apple researchers propose Feature Auto-Encoder to speed diffusion training via compressed DINOv2 embeddings

Researchers at Apple introduced Feature Auto-Encoder (FAE), a latent diffusion image generator that compresses DINOv2 vision encoder embeddings before learning to denoise them, then expands them back for decoding. The approach achieves comparable image quality to state-of-the-art diffusion models while training roughly 7x faster on ImageNet class-conditional generation. The key insight is that shrinking semantically rich vision embeddings reduces compute during diffusion training without sacrificing the representational benefits of large pretrained encoders.

4 · arXiv · cs.LG · 14d ago

SEAOTTER: Learned compression framework for cloud robotics combining autoencoder latents with JPEG compatibility

SEAOTTER is a compression framework for cloud robotics that pairs a sensor-embedded autoencoder with a one-time JPEG transcode step, enabling extreme compression ratios while remaining compatible with standard JPEG infrastructure. Compared with AVIF at a 200:1 compression ratio, the system achieves 7x faster encoding, 3.5x faster decoding, and 8% higher ImageNet top-1 accuracy. The approach targets the asymmetric power and bandwidth constraints of the sensor, cloud, and consumer stages in robotic vision pipelines, and supports general-purpose and task-aware transcoding for dense and vision-language perception tasks.

5 · arXiv · cs.LG · 36h ago

Exact Posterior Score (EPS): Closed-form posterior sampling for linear inverse problems with diffusion models

A new arXiv preprint derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under anisotropic noise covariance. The authors convert this identity into a training objective called Exact Posterior Score (EPS) that preserves the input/output structure of standard diffusion pretraining, enabling training from scratch or fine-tuning from a pretrained denoiser. EPS is evaluated on five linear inverse problems across FFHQ and ImageNet, outperforming both training-free and training-based baselines while requiring roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.
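As background for the summary above, the standard setup for a linear Gaussian inverse problem with a Gaussian interpolant can be written as follows. The symbols ($A$, $\Sigma$, $\alpha_t$, $\sigma_t$) are generic notation assumed here, not necessarily the preprint's, and the preprint's closed-form result is not reproduced.

```latex
\begin{align}
  y   &= A x_0 + n, & n &\sim \mathcal{N}(0, \Sigma)
      && \text{(linear measurement with Gaussian noise)} \\
  x_t &= \alpha_t x_0 + \sigma_t \varepsilon, & \varepsilon &\sim \mathcal{N}(0, I)
      && \text{(Gaussian interpolant)} \\
  \nabla_{x_t} \log p_t(x_t \mid y)
      &= \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t)
      &&&& \text{(Bayes' rule)}
\end{align}
```

The first term on the right of the last line is the unconditional score that a pretrained denoiser already approximates. The second term is generally intractable, which is why gradient-based posterior samplers approximate it at sampling time; the summary describes EPS as instead targeting the left-hand side directly through a denoising-style training objective.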