Entity · organization

UC Berkeley

organization · active · uc-berkeley-0ab8aaaf · 9 events · first seen May 18, 2026

Aliases: UC Berkeley, UC Berkeley (BAIR), UC Berkeley BAIR Lab

More like this (12)

Berkeley AI Research (BAIR) · University of California, Berkeley · Berkeley Artificial Intelligence Research · UCLA · University of California Los Angeles · Stanford University · UC Berkeley Sky Lab · California State University · University of Pennsylvania · University of Texas Austin · California · Carnegie Mellon University

Recent events (9)

5 · Berkeley AI Research (BAIR) Blog · 4d ago

BAIR introduces ABBEL: teaching LLMs to update beliefs for efficient long-horizon interaction

Researchers from UC Berkeley's BAIR Lab present ABBEL, a method for training LLMs to perform principled belief updating during long-horizon interactive tasks, reducing the number of clarifying questions needed while maintaining task accuracy. The approach targets a core inefficiency in current LLM-based agents: failure to maintain and revise a coherent model of user intent across multi-turn interactions. The work is positioned as a step toward more efficient human-AI collaboration in agentic settings.

Agent and Tool Ecosystem · Alignment and RLHF · ABBEL · UC Berkeley

6 · Berkeley AI Research (BAIR) Blog · Jul 7, 2026

BAIR perspective: data systems must be redesigned for, of, and by AI agents as inference costs approach zero

UC Berkeley EECS professor Aditya Parameswaran and collaborators publish a landscape survey and perspective on the implications of near-zero AI inference costs for data systems, arguing that agents will soon become the dominant workload. The piece identifies three research challenges: redesigning databases for agentic query patterns (including 'agentic speculation' generating thousands of SQL queries per user request), building infrastructure to manage and coordinate agent swarms over long-running tasks, and verifying data systems synthesized by agents. Concrete findings include that 80-90% of sub-queries from multi-agent text-to-SQL workloads are redundant, motivating new multi-query optimization and approximate query processing approaches. The post draws on the authors' own ongoing research directions including structured memory and agent-synthesized data systems.

Training Infrastructure · Inference Economics · Berkeley AI Research (BAIR) · UC Berkeley · Aditya G. Parameswaran · +3 more
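The redundancy figure quoted in the summary above suggests that sharing work across an agent's sub-queries could pay off. The following is a minimal sketch of how such redundancy might be measured, assuming duplicates are detected by a crude textual normalization. The function names and sample queries are hypothetical and are not taken from the post, whose authors point to multi-query optimization and approximate query processing rather than string matching.

```python
import re


def normalize_sql(query: str) -> str:
    """Crude canonical form: collapse whitespace, drop a trailing semicolon, lowercase.

    Assumption for illustration only: lowercasing also changes string
    literals, so this is not safe for real workloads.
    """
    collapsed = re.sub(r"\s+", " ", query.strip())
    return collapsed.rstrip(";").strip().lower()


def redundancy_ratio(queries: list[str]) -> float:
    """Fraction of queries that repeat another query after normalization."""
    if not queries:
        return 0.0
    unique = len({normalize_sql(q) for q in queries})
    return 1.0 - unique / len(queries)


def dedupe(queries: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized query, in order."""
    seen: dict[str, str] = {}
    for q in queries:
        seen.setdefault(normalize_sql(q), q)
    return list(seen.values())


# Hypothetical sub-queries an agent might issue for one user request.
sample = [
    "SELECT COUNT(*) FROM orders;",
    "select count(*)  from orders",
    "SELECT COUNT(*) FROM orders WHERE status = 'open'",
    "SELECT count(*) FROM orders ;",
    "SELECT MAX(total) FROM orders",
]
print(redundancy_ratio(sample))
print(dedupe(sample))
```

Exact-match deduplication like this only catches syntactic repeats. Queries that overlap semantically (for example, one result being a subset of another) would need the optimizer-level techniques the summary mentions.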

5 · The Batch · Jul 3, 2026

RoboReward: Vision-Language Reward Models for Robot Training via RL

Researchers at Stanford and UC Berkeley developed RoboReward, a family of 4B and 8B vision-language reward models designed to provide reward signals for robot reinforcement learning across diverse robot types and tasks. The team built a novel dataset by augmenting successful robot demonstrations with synthetically generated failure examples using GPT-5 mini and Qwen3-4B, then fine-tuned Qwen3-VL models to predict task progress scores. RoboReward 8B outperformed GPT-5, GPT-5 mini, and Gemini Robotics-ER 1.5 on the new RoboRewardBench evaluation, and in real-world robot trials substantially exceeded prior reward model baselines while still falling short of human-assigned rewards. The authors also release RoboRewardBench as a community benchmark for reward model evaluation.

Evaluation and Benchmarking · Agent and Tool Ecosystem · DeepLearning.AI · Stanford University · UC Berkeley · +12 more

4 · Berkeley AI Research (BAIR) Blog · Jul 1, 2026

BAIR Lab 2026 PhD Graduate Showcase: Placements at OpenAI, Physical Intelligence, Mistral, and Academia

Berkeley Artificial Intelligence Research (BAIR) Lab published its 2026 graduate showcase, highlighting PhD completions across LLMs, robotics, AI safety, computer vision, and human-AI interaction. Notable placements include a graduate joining OpenAI as Member of Technical Staff (LLM reasoning), one joining Physical Intelligence (generalist vision/robotics), one joining Mistral AI as AI Scientist, and one becoming an Assistant Professor at UCLA. The cohort's research themes span test-time vs. pretraining scaling tradeoffs, LLM fairness and calibration, dexterous manipulation, and generative modeling for proteins.

AI Safety Research · Agent and Tool Ecosystem · Baifeng Shi · Mistral AI · Eve Fleisig · +8 more

6 · The Batch · Jun 1, 2026

Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs

Researchers from Astera Institute, Nvidia, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into transformer weights by training the model during inference via meta-learning. The approach uses sliding-window attention restricted to 8,000 tokens and updates only the fully connected layers of the last quarter of the network on each 1,000-token chunk at inference time, keeping per-token generation latency roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures like Mamba 2 and Gated DeltaNet on inference speed, but fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.

Training Infrastructure · Long Context Evolution · University of California San Diego · Mamba · Stanford University · +13 more
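The numbers in the TTT-E2E summary above (an 8,000-token attention window, weight updates per 1,000-token chunk, contexts up to 128,000 tokens) can be made concrete with a little bookkeeping. This is a rough sketch under assumptions not stated in the summary: chunks are non-overlapping, and the sliding window is causal and includes the current token. The helper names are hypothetical.

```python
def chunk_spans(n_tokens: int, chunk_size: int = 1_000) -> list[tuple[int, int]]:
    """Half-open (start, end) spans, one per inference-time weight update."""
    return [
        (start, min(start + chunk_size, n_tokens))
        for start in range(0, n_tokens, chunk_size)
    ]


def attention_span(position: int, window: int = 8_000) -> tuple[int, int]:
    """Half-open range of token indices visible to `position` under a
    causal sliding window that includes the current token."""
    return (max(0, position - window + 1), position + 1)


context_length = 128_000
spans = chunk_spans(context_length)
print(len(spans))                          # number of weight-update steps
print(attention_span(context_length - 1))  # what the last token can attend to
```

Under this reading, the final token attends directly only to the most recent 8,000 tokens. Anything earlier has to be carried by the per-chunk weight updates, which may help explain the retrieval weakness beyond 8,000 tokens that the summary reports.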

7 · OpenAI Blog · May 20, 2026

Concrete Problems in AI Safety

OpenAI, Google Brain, Berkeley, and Stanford researchers co-authored 'Concrete Problems in AI Safety,' a foundational paper exploring research challenges in ensuring modern ML systems operate as intended. The paper identifies and frames specific technical safety problems for the field. Published in June 2016, it became a landmark reference for AI safety research agendas.

AI Safety Research · Alignment and RLHF · Concrete Problems in AI Safety · Stanford University · UC Berkeley · +2 more

5 · Hugging Face Blog · May 18, 2026

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM Research and UC Berkeley have released IT-Bench and MAST, a benchmark suite and diagnostic framework aimed at evaluating why AI agents fail in enterprise IT environments. The work targets realistic IT operations tasks such as incident response, service management, and infrastructure automation. By categorizing failure modes systematically, MAST provides a structured taxonomy for understanding agent shortcomings beyond simple pass/fail metrics. This addresses a gap in enterprise-focused agent evaluation, where general benchmarks often fail to capture domain-specific complexity.

IBM Research · UC Berkeley · IT-Bench

5 · Berkeley AI Research (BAIR) Blog · May 18, 2026

Information-Driven Design of Imaging Systems

In work published at NeurIPS 2025, researchers from Berkeley present a framework for evaluating and optimizing imaging systems based on mutual information content rather than traditional metrics like resolution or SNR. The method estimates mutual information directly from noisy measurements using known noise physics and learned probabilistic models (including transformers and PixelCNN), avoiding the need for task-specific decoders. Validated across four domains (color photography, radio astronomy, lensless imaging, and microscopy), the information metric predicts downstream decoder performance and enables hardware optimization with less compute and memory than end-to-end neural approaches.

Evaluation and Benchmarking · Inference Economics · UC Berkeley · information-driven imaging framework · mutual information · +3 more
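The quantity described in the imaging summary above can be written out using standard information-theoretic definitions. This is a sketch of one plausible decomposition, not necessarily the authors' exact estimator. The symbols are assumptions introduced here: X is the noiseless image, Y the noisy measurement, q a learned probabilistic model of the measurements, and the Gaussian noise case is chosen purely for illustration.

```latex
\begin{align}
I(X;Y) &= H(Y) - H(Y \mid X), \\
H(Y) &\le \mathbb{E}_{y \sim p(y)}\left[-\log q(y)\right], \\
H(Y \mid X) &= \frac{D}{2}\,\log\left(2\pi e\,\sigma^{2}\right)
  \quad \text{(i.i.d. Gaussian noise with variance } \sigma^{2} \text{ on } D \text{ pixels)}.
\end{align}
```

With a known noise model, the conditional term needs no learning, and the learned model only has to fit the measurement distribution. Substituting the cross-entropy bound for H(Y) yields an upper bound on I(X;Y) that tightens as q improves.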

6 · Berkeley AI Research (BAIR) Blog · May 18, 2026

GRASP: Gradient-based Planning for World Models at Longer Horizons

Researchers from Berkeley, Meta, and collaborators introduce GRASP, a gradient-based planner designed to make long-horizon planning with learned world models more robust. The method addresses three core failure modes: ill-conditioned computation graphs from backpropagation through time, non-greedy loss landscapes with many local minima, and brittle gradients through high-dimensional vision models. GRASP lifts trajectory optimization into virtual states for parallel optimization across time, injects stochasticity into state iterates for exploration, and reshapes gradients to avoid problematic state-input gradient paths. The work is positioned in the context of scaling world models toward general-purpose simulators usable for control and planning.

Long Context Evolution · Frontier Model Releases · Mike Rabbat · backpropagation through time · Meta AI · +7 more
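Two of the ideas in the GRASP summary above, lifting the trajectory into free "virtual" states and injecting noise into the state iterates, can be illustrated on a toy problem. The sketch below is not GRASP itself: it assumes trivially linear dynamics s_{t+1} = s_t + a_t in place of a learned world model, uses hand-derived gradients, omits the gradient-reshaping step, and every name and hyperparameter is hypothetical.

```python
import numpy as np


def plan_lifted(s0, goal, horizon=10, steps=2000, lr=0.05, noise0=0.05, seed=0):
    """Jointly optimize virtual states and actions for toy dynamics s' = s + a."""
    rng = np.random.default_rng(seed)
    d = s0.shape[0]
    S = np.zeros((horizon, d))  # virtual states s_1 .. s_T (free variables)
    A = np.zeros((horizon, d))  # actions a_0 .. a_{T-1}
    for k in range(steps):
        prev = np.vstack([s0[None, :], S[:-1]])  # s_0 .. s_{T-1}
        r = S - (prev + A)          # dynamics residuals for all time steps at once
        grad_A = -2.0 * r
        grad_S = 2.0 * r            # s_{t+1} as the target of step t
        grad_S[:-1] -= 2.0 * r[1:]  # s_t as the input of step t
        grad_S[-1] += 2.0 * (S[-1] - goal)  # terminal goal cost
        S -= lr * grad_S
        A -= lr * grad_A
        # Noise on the state iterates, annealed to zero over the first half.
        sigma = noise0 * max(0.0, 1.0 - k / (0.5 * steps))
        S += sigma * rng.standard_normal(S.shape)
    return S, A


def rollout(s0, A):
    """Apply the planned actions sequentially through the true toy dynamics."""
    s = s0.copy()
    for a in A:
        s = s + a
    return s


s0 = np.array([0.0, 0.0])
goal = np.array([1.0, -2.0])
S, A = plan_lifted(s0, goal)
print(rollout(s0, A))  # should land close to `goal`
```

Because each residual couples only neighboring time steps, no gradient has to flow through the whole horizon, which is the conditioning benefit the summary attributes to lifting. The toy dynamics are convex, so this example cannot show the exploration benefit of the injected noise.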