Entity · dataset

ImageNet

dataset · active · imagenet-04eee4ed · 11 events · first seen May 19, 2026

Aliases: ImageNet, ImageNet-1K

More like this (12)

ImageNet-100 · ImageNet100 · ImageNet-256 · ResNet · ImageNette · AlexNet · neural network image classifiers · MNIST · CIFAR-100 · CIFAR-10 · AI image verification · PixelCNN

Recent events (11)

4 · arXiv · cs.LG · 45h ago

MixFrag: Fragility-guided mixed-precision post-training quantization for Vision Transformers

MixFrag is a new post-training quantization (PTQ) framework for Vision Transformers that assigns mixed bit-widths per layer based on quantization fragility, measured as the KL divergence between full-precision and quantized output distributions. Bit allocation is formulated as a Multiple-Choice Knapsack Problem that optimizes precision under a target bit budget. Evaluated on ImageNet-1K classification and COCO detection/segmentation, MixFrag claims state-of-the-art results among mixed-precision PTQ methods, improving on the prior best by up to 9.6 AP in a challenging low-bit setting.

Evaluation and Benchmarking · Inference Economics · COCO · MixFrag · ImageNet
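
The MixFrag summary describes two ingredients: a KL-divergence fragility score and a Multiple-Choice Knapsack formulation that picks one bit-width per layer under a budget. The sketch below illustrates that general idea only; it is not the paper's implementation. The function names, the fragility table, and the cost model (layer size times bit-width) are all hypothetical assumptions made for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def allocate_bits(fragility, sizes, bit_choices, budget):
    """Choose exactly one bit-width per layer, minimizing summed fragility.

    fragility[l][k]: assumed fragility of layer l at bit_choices[k]
    sizes[l]:        relative size of layer l (integer units)
    budget:          maximum total cost, with cost = sizes[l] * bits (assumed)
    Returns (total_fragility, bits_per_layer), or None if no choice fits.
    """
    # Dynamic program: total cost so far -> best (fragility, chosen bits).
    best = {0: (0.0, [])}
    for layer, size in enumerate(sizes):
        nxt = {}
        for cost, (frag, chosen) in best.items():
            for k, bits in enumerate(bit_choices):
                new_cost = cost + size * bits
                if new_cost > budget:
                    continue  # this choice would exceed the bit budget
                cand = (frag + fragility[layer][k], chosen + [bits])
                if new_cost not in nxt or cand[0] < nxt[new_cost][0]:
                    nxt[new_cost] = cand
        best = nxt
        if not best:
            return None
    return min(best.values(), key=lambda t: t[0])

# Toy example: three layers, 4-bit or 8-bit, average budget of 6 bits.
sizes = [1, 2, 1]
bit_choices = [4, 8]
fragility = [[0.9, 0.1], [0.2, 0.05], [0.5, 0.1]]  # made-up numbers
total, chosen = allocate_bits(fragility, sizes, bit_choices, budget=6 * sum(sizes))
print(chosen, round(total, 3))
```

In this toy setup the large middle layer tolerates 4 bits well, so the budget is spent on the two small, fragile layers. In practice the fragility entries would come from `kl_divergence` applied to full-precision and quantized model outputs on calibration data.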

5 · arXiv · cs.LG · 3d ago

Spectral-norm SAM combined with Muon optimizer achieves strong ImageNet results

A new arXiv preprint introduces a matrix-aware variant of Sharpness-Aware Minimization (SAM) that uses a layerwise spectral inner perturbation for hidden-layer weights, combined with either AdamW/SGDW or the Muon optimizer for the outer update. Experiments on ImageNet-1K with ViT-Small/16 and ResNet-50 show the spectral SAM + Muon combination achieves the best validation accuracy among evaluated methods. The work connects the recently popular Muon optimizer's matrix-structure philosophy to the SAM generalization framework.

Training Infrastructure · Evaluation and Benchmarking · Sharpness-Aware Minimization · AdamW · ImageNet · +2 more

4 · arXiv · cs.LG · Jul 10, 2026

SLORR: Stateless in-training low-rank regularization for compressible neural networks

Researchers introduce SLORR, a framework for in-training low-rank regularization that avoids SVD computation, architectural modifications, and stateful caching. Two variants based on Hoyer sparsity and nuclear norm are evaluated on ImageNet-1K (ResNet-50, ViT-B/16, ViT-L/16) and LLM pretraining at 135M and 560M parameter scales. SLORR-trained models compress more effectively than unregularized baselines while adding under 8% training overhead for vision models and under 1% for LLM pretraining, making post-training compression more viable.

Training Infrastructure · Inference Economics · ViT-B/16 · SLORR · Hoyer sparsity · +2 more
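
For readers unfamiliar with the two regularizers named in the SLORR summary, the sketch below shows what the Hoyer sparsity measure and the nuclear norm report when applied to a list of singular values. Applying them to a weight matrix's singular values is an assumption about how low-rankness is scored; the sketch takes the spectrum as given and does not reproduce SLORR's SVD-free computation. Function and variable names are illustrative.

```python
import math

def hoyer_sparsity(values):
    """Hoyer measure in [0, 1]: 0 for a flat vector, 1 for one nonzero entry."""
    n = len(values)
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

def nuclear_norm(singular_values):
    """Nuclear norm of a matrix, given its singular values (their sum)."""
    return sum(abs(s) for s in singular_values)

# Two spectra with the same Frobenius norm (2.0) but different rank.
flat_spectrum = [1.0, 1.0, 1.0, 1.0]    # full rank, energy spread evenly
peaked_spectrum = [2.0, 0.0, 0.0, 0.0]  # rank one

for name, spectrum in [("flat", flat_spectrum), ("peaked", peaked_spectrum)]:
    print(name, hoyer_sparsity(spectrum), nuclear_norm(spectrum))
```

At equal Frobenius norm, the rank-one spectrum has the higher Hoyer score and the lower nuclear norm, which is why either quantity can serve as a training-time signal that pushes weights toward compressible, low-rank structure.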

4 · arXiv · cs.AI · Jun 29, 2026

DEFAR framework uses exposure bias signals to self-rectify Flow Matching during training

A new arXiv preprint introduces DEFAR (DirEctional-Frequency Adaptive Rectification), a training framework for Flow Matching generative models that addresses exposure bias (the train/inference discrepancy) by extracting dynamic correction signals from the bias itself. The method has two components: Anti-Drift Rectification (ADR), which steers deviated inference states back toward targets, and Frequency Compensation (FC), which reinforces missing low-frequency components using bias as a self-feedback weight. Experiments on CIFAR-10, CelebA-64, and ImageNet-256/512 show improvements over prior baselines with favorable scalability and inference robustness.

Frequency Compensation · CIFAR-10 · Anti-Drift Rectification · +3 more

5 · arXiv · cs.LG · Jun 17, 2026

Large-scale benchmarking finds dataset distillation methods fail to outperform coresets on ImageNet-scale tasks

A new arXiv paper critically evaluates seven state-of-the-art dataset distillation (DD) methods against coreset selection (CS) strategies using standardized protocols on ImageNet-1K, ImageNet100, and ImageNette. Results show that some DD methods fail to beat random subsets, and SOTA DD approaches are comparable to or worse than coresets on large-scale datasets while incurring substantially higher construction costs. The paper also finds coresets achieve better coverage of the original data distribution in terms of representativeness and diversity, challenging the prevailing assumption that synthetic samples are inherently more expressive than real-data subsets.

Training Infrastructure · Evaluation and Benchmarking · Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets? · ImageNette · ImageNet · +1 more

5 · arXiv · cs.LG · Jun 16, 2026

Exact Posterior Score (EPS): Closed-form posterior sampling for linear inverse problems with diffusion models

A new arXiv preprint derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under anisotropic noise covariance. The authors convert this identity into a training objective called Exact Posterior Score (EPS) that preserves the input/output structure of standard diffusion pretraining, enabling training from scratch or fine-tuning from a pretrained denoiser. EPS is evaluated on five linear inverse problems across FFHQ and ImageNet, outperforming both training-free and training-based baselines while requiring roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.

Evaluation and Benchmarking · Exact Posterior Score Estimation for Solving Linear Inverse Problems · FFHQ · ImageNet · +1 more

4 · arXiv · cs.LG · Jun 3, 2026

SEAOTTER: Learned compression framework for cloud robotics combining autoencoder latents with JPEG compatibility

SEAOTTER is a compression framework for cloud robotics that pairs a sensor-embedded autoencoder with a one-time JPEG transcode step, enabling extreme compression ratios while remaining compatible with standard JPEG infrastructure. Compared with AVIF at 200:1 compression, the system achieves 7x faster encoding, 3.5x faster decoding, and an 8% gain in ImageNet top-1 accuracy. The approach targets the asymmetric power/bandwidth constraints of sensor, cloud, and consumer stages in robotic vision pipelines, and supports general-purpose and task-aware transcoding for dense and vision-language perception tasks.

Inference Economics · Multimodal Progress · SEAOTTER · University of Texas SysML Lab · ImageNet

5 · The Batch · Jun 3, 2026

Apple researchers propose Feature Auto-Encoder to speed diffusion training via compressed DINOv2 embeddings

Researchers at Apple introduced Feature Auto-Encoder (FAE), a latent diffusion image generator that compresses DINOv2 vision encoder embeddings before learning to denoise them, then expands them back for decoding. The approach achieves comparable image quality to state-of-the-art diffusion models while training roughly 7x faster on ImageNet class-conditional generation. The key insight is that shrinking semantically rich vision embeddings reduces compute during diffusion training without sacrificing the representational benefits of large pretrained encoders.

Training Infrastructure · Multimodal Progress · DINOv2 · Yuan Gao · MS COCO · +7 more

6 · The Batch · Jun 2, 2026

Apple's AToken: A Unified Multimodal Tokenizer and Encoder for Images, Videos, and 3D Objects

Apple researchers introduced AToken, a transformer model with a single 4D tokenizer and encoder-decoder architecture that handles images, videos, and 3D objects in a shared token space. The model is trained to both reconstruct and classify all three media types, using a pretrained SigLIP2 vision encoder extended to four dimensions with 4D Rotary Position Embedding. AToken approaches or matches specialized models on image classification (82.2% on ImageNet), image generation (0.21 rFID), and 3D reconstruction (28.28 PSNR), while remaining competitive on video tasks. The work addresses a longstanding tension between generation-focused and classification-focused encoders by forcing embeddings to retain both fine visual detail and semantic content.

Frontier Model Releases · Multimodal Progress · FLUX.1-dev · Rotary Position Embedding (RoPE) · Jiasen Lu · +8 more

6 · OpenAI Blog · May 20, 2026

AI and Efficiency: Algorithmic Progress Halving Training Compute Every 16 Months Since 2012

OpenAI released an analysis showing that compute required to match AlexNet-level ImageNet performance has decreased 44x since 2012, with algorithmic efficiency doubling every 16 months. This outpaces Moore's Law, which would have yielded only an 11x improvement over the same period. The findings suggest that for heavily-invested AI tasks, algorithmic progress is a larger driver of efficiency gains than hardware improvements alone.

Training Infrastructure · Evaluation and Benchmarking · AlexNet · Moore's Law · OpenAI · +2 more
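
The headline figures in this item can be sanity-checked with simple exponent arithmetic. The sketch below rests on two assumptions that the summary does not state: a seven-year window (2012 to 2019, the period covered by OpenAI's original analysis) and a 24-month doubling period for Moore's Law. The function names are illustrative.

```python
import math

def implied_doubling_months(total_gain, years):
    """Doubling period (in months) that yields total_gain over the given years."""
    return years * 12 / math.log2(total_gain)

def gain_from_doubling(years, doubling_months):
    """Total improvement factor after `years` at a fixed doubling period."""
    return 2 ** (years * 12 / doubling_months)

YEARS = 7  # assumption: 2012 to 2019

algo_doubling = implied_doubling_months(44, YEARS)  # from the 44x figure
moore_gain = gain_from_doubling(YEARS, 24)          # assumed 2-year doubling

print(f"44x over {YEARS} years implies doubling every {algo_doubling:.1f} months")
print(f"Moore's Law over {YEARS} years gives about {moore_gain:.1f}x")
```

Under these assumptions a 44x gain works out to a doubling period of a little over 15 months, in line with the roughly 16-month figure, and a two-year doubling gives about 11x, matching the Moore's Law comparison.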

5 · Hugging Face Blog · May 19, 2026

LeRobot Community Datasets: The "ImageNet" of Robotics — When and How?

Hugging Face's LeRobot blog post discusses the vision and current state of building a large-scale community robotics dataset analogous to ImageNet for computer vision. The post examines what it would take to create a standardized, scalable dataset repository for robot learning, drawing on the LeRobot ecosystem. It addresses data collection formats, community contribution workflows, and the open challenges in making such a resource practically useful for training generalizable robot policies.

Evaluation and Benchmarking · Open Weights Progress · LeRobot · Hugging Face · ImageNet · +1 more