5 · Berkeley AI Research (BAIR) Blog · 1mo ago

Information-Driven Design of Imaging Systems

Researchers from Berkeley present a framework, published at NeurIPS 2025, for evaluating and optimizing imaging systems by their mutual information content rather than by traditional metrics such as resolution or SNR. The method estimates mutual information directly from noisy measurements using known noise physics and learned probabilistic models (including transformers and PixelCNN), so no task-specific decoder is needed. Validated across four domains (color photography, radio astronomy, lensless imaging, and microscopy), the information metric predicts downstream decoder performance and enables hardware optimization with less compute and memory than end-to-end neural approaches.

Evaluation and Benchmarking · Inference Economics · UC Berkeley · information-driven imaging framework · mutual information · autoregressive transformer · NeurIPS 2025 · PixelCNN
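The estimation idea in the summary above can be illustrated with the standard decomposition I(X; Y) = H(Y) - H(Y | X), where H(Y | X) comes from the known noise model and H(Y) comes from a probabilistic model fitted to the measurements. The sketch below is a minimal illustration, not the authors' implementation: it assumes additive Gaussian noise with a known standard deviation, and it substitutes a fitted multivariate Gaussian for the learned models (transformers, PixelCNN) mentioned in the summary. The function names `gaussian_entropy_nats` and `estimate_mutual_information_bits` are hypothetical.

```python
import numpy as np


def gaussian_entropy_nats(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def estimate_mutual_information_bits(measurements, noise_sigma):
    """Estimate I(X; Y) in bits as H(Y) - H(Y | X) from noisy measurements Y.

    Assumptions (illustrative, not taken from the source):
      * noise is additive, independent Gaussian with known std noise_sigma
      * H(Y) is approximated by the entropy of a Gaussian fitted to Y
    """
    y = np.asarray(measurements, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    d = y.shape[1]
    # H(Y): entropy of the Gaussian stand-in model fitted to the measurements.
    h_y = gaussian_entropy_nats(np.cov(y, rowvar=False).reshape(d, d))
    # H(Y | X): entropy of the known noise distribution alone.
    h_y_given_x = gaussian_entropy_nats(noise_sigma**2 * np.eye(d))
    return (h_y - h_y_given_x) / np.log(2)


# Usage: the same synthetic scene measured at three noise levels.
rng = np.random.default_rng(0)
scene = rng.normal(0.0, 3.0, size=(200_000, 1))
for sigma in (0.5, 1.0, 2.0):
    noisy = scene + rng.normal(0.0, sigma, size=scene.shape)
    print(sigma, estimate_mutual_information_bits(noisy, sigma))
```

Because a Gaussian has the largest entropy of any distribution with a given covariance, this stand-in tends to overestimate H(Y) for non-Gaussian data; a more expressive fitted model would tighten the estimate.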

Related guides (2)

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

5 · arXiv · cs.CL · 9d ago

Information-theoretic metric for measuring semantic progress in multi-turn dialogue

A new arXiv preprint formalizes 'semantic progress' in multi-turn dialogue as question-conditioned uncertainty reduction and introduces an information-theoretic metric approximated in embedding space using a Gaussian formulation with closed-form updates. The metric has desirable theoretical properties (monotonicity, additive decomposition, diminishing returns) and requires no autoregressive inference at evaluation time, making it reproducible and lightweight. Experiments on MT-Bench, Chatbot Arena, and UltraFeedback show competitive or improved agreement with human judgments compared to several LLM-as-a-judge baselines. The approach works with lightweight embedding models under CPU-only execution.

Evaluation and Benchmarking · Chatbot Arena · MT-Bench · UltraFeedback · +1 more

4 · Hugging Face Blog · 1mo ago

Instruction-tuning Stable Diffusion with InstructPix2Pix

This Hugging Face blog post describes a methodology for instruction-tuning Stable Diffusion using the InstructPix2Pix framework, enabling image editing via natural language instructions. The approach adapts techniques from language model instruction-tuning to the image generation domain. The post covers dataset construction, training procedures, and evaluation of the resulting models.

Alignment and RLHF · Multimodal Progress · Stable Diffusion 3 · InstructPix2Pix · Hugging Face · +1 more

5 · Hugging Face Blog · 1mo ago

Perceiver IO: a scalable, fully-attentional model that works on any modality

Hugging Face published a blog post introducing Perceiver IO, a general-purpose transformer-based architecture designed to handle arbitrary input and output modalities by using a small latent array to avoid quadratic attention scaling. The model decouples input size from the attention bottleneck, enabling it to process images, audio, video, text, and multimodal data within a single unified framework. The post covers the architecture's design principles and its integration into the Hugging Face ecosystem.

Long Context Evolution · Multimodal Progress · DeepMind · Hugging Face · Perceiver IO
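The latent bottleneck described in the Perceiver IO summary above can be sketched as two cross-attention steps: a small latent array attends over the inputs, and a set of output queries then attends over the latents. The code below is a single-head illustration with random weights, assuming no MLP blocks, residual connections, or latent self-attention layers; the names `cross_attend` and `rand_proj` are hypothetical, and none of this is the Hugging Face API.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, inputs, w_q, w_k, w_v):
    """Single-head cross-attention: each query row attends over all input rows."""
    q = queries @ w_q                          # (n_queries, d)
    k = inputs @ w_k                           # (n_inputs, d)
    v = inputs @ w_v                           # (n_inputs, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_queries, n_inputs)
    return softmax(scores, axis=-1) @ v        # (n_queries, d)


rng = np.random.default_rng(0)
n_inputs, n_latents, n_outputs, d = 4096, 64, 10, 32


def rand_proj():
    """Random projection matrix standing in for learned weights."""
    return rng.normal(size=(d, d)) / np.sqrt(d)


inputs = rng.normal(size=(n_inputs, d))            # e.g. flattened pixels or tokens
latents = rng.normal(size=(n_latents, d))          # small latent array
output_queries = rng.normal(size=(n_outputs, d))   # one query per desired output

# Encode: the score matrix is n_latents x n_inputs, not n_inputs x n_inputs.
encoded = cross_attend(latents, inputs, rand_proj(), rand_proj(), rand_proj())
# Decode: output queries read from the latents, so output size is independent of input size.
decoded = cross_attend(output_queries, encoded, rand_proj(), rand_proj(), rand_proj())
print(encoded.shape, decoded.shape)
```

With these sizes the encoder's attention matrix has 64 x 4096 entries, whereas full self-attention over the inputs would need 4096 x 4096.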

5 · arXiv · cs.LG · 17d ago

Information-theoretic formalization of the binding problem in Vision Transformers

Researchers introduce a formal information-theoretic framework for the binding problem — the challenge of associating features (color, shape) with the correct objects in multi-object scenes. They develop a probing method to measure binding information in model representations and apply it to several pre-trained Vision Transformers, examining components like the [CLS] token and spatial tokens across datasets with feature sharing, occlusion, and natural features. Results position binding information as a key factor in visual recognition and reasoning quality, and suggest current ViT architectures have limited binding capability, consistent with known failure modes.

Evaluation and Benchmarking · Multimodal Progress · ViT (Vision Transformer) · Formalizing the Binding Problem

5 · arXiv · cs.AI · 4d ago

Internal Oppenheim-Lim test reveals phase/sign identity codes shared across image classifier architectures

A new arXiv preprint applies a causal intervention inspired by Oppenheim and Lim (1981) to probe whether trained image classifiers encode identity in Fourier phase rather than magnitude within their hidden layers. By transplanting phase or sign components between images at chosen layers in PRISM2D, GFNet, ViT-B/16, and ResNet-50, the authors find that predictions follow the phase/sign donor across all tested architectures, with image-specific magnitude largely dispensable. ResNet-50 requires a pre-ReLU intervention to reveal a latent sign code, exposing how rectification and readout geometry shape the basis in which the code is expressed. The findings offer a mechanistic account of the texture–shape gap between CNNs and attention-based models.

Evaluation and Benchmarking · ViT-B/16 · GFNet · PRISM2D · +2 more

7 · arXiv · cs.AI · 29d ago

The Matching Principle: A Geometric Theory Unifying Robustness, Domain Adaptation, and Alignment via Nuisance Covariance

This paper proposes the 'matching principle': a unified geometric framework arguing that robustness methods (CORAL, IRM, adversarial training, augmentation, metric learning, Jacobian penalties, alignment constraints) are all estimators of the same object—the covariance of label-preserving deployment nuisance—and that regularizing the encoder Jacobian along this covariance's range is the core statistical problem. The authors prove closed-form optimality results in a linear-Gaussian model, introduce the Trajectory Deviation Index (TDI) as a label-free embedding sensitivity probe, and validate predictions across 13 pre-registered experimental blocks including Qwen2.5-7B. At 7B scale, matched style-PMH improves selective honesty while standard DPO degrades Style TDI, connecting the theory to alignment safety.

Evaluation and Benchmarking · AI Safety Research · Invariant Risk Minimization · Matching Principle · Qwen2.5-7B · +5 more

5 · arXiv · cs.LG · 25d ago

Active Query Synthesis for Preference Learning via Mutual Information Maximization

This paper introduces Info-Synth, an active query synthesis framework for preference learning that generates optimal pairwise queries by maximizing a mutual information objective in continuous space, bypassing the computational cost of pool-based evaluation. A confidence-aware response model is proposed to handle ambiguous comparisons between nearly identical or highly dissimilar items. Two finite-pool extensions (Pair M-dist and Pair Opt-dist) are also introduced. The framework is validated on synthetic preference tasks, text summarization datasets, and robotic controller tuning.

Evaluation and Benchmarking · Alignment and RLHF · active learning · Pair Opt-dist · mutual information · +2 more

5 · The Batch · 22d ago

Meta Research Improves Image Generation via Staged Planning and Self-Revision Fine-Tuning

Researchers from Meta and collaborating universities propose a fine-tuning method that teaches image generators to compose images through discrete plan-sketch-inspect-refine cycles rather than generating all at once. Starting from BAGEL-7B, they construct ~62,000 training examples using GPT-4o and FLUX.1 Kontext to supervise each stage, achieving 83% on GenEval versus 77% for the base model and a competing method (PARM) that required 11x more training data and ~8x more inference steps. The approach improves spatial relationship accuracy, object attribute fidelity, and real-world knowledge grounding in generated images.

Evaluation and Benchmarking · Agent and Tool Ecosystem · University of California San Diego · WISE · FLUX.1 Kontext · +10 more