Almanac
8·OpenAI Blog·5d ago

OpenAI and Broadcom unveil Jalapeño, a custom LLM inference chip

OpenAI and Broadcom have jointly announced Jalapeño, a custom chip designed specifically for LLM inference workloads. The chip targets improvements in performance, efficiency, and scale across AI systems. It marks OpenAI's entry into custom inference silicon, a move intended to reduce its dependence on third-party GPU suppliers for inference at scale.

Related events (8)

8·Hacker News·5d ago

OpenAI unveils first custom AI chip, manufactured by Broadcom

OpenAI has announced its first custom chip, built in partnership with Broadcom. The move is a significant strategic step toward reducing OpenAI's dependence on NVIDIA and controlling its own inference and training infrastructure. Custom chip development is a major capital and engineering commitment, one that signals OpenAI's long-term infrastructure ambitions.

7·The Batch·4d ago

The Batch: Jalapeño inference chip, Fugu multi-agent system, Claude Tag, Robin bio-agent, and Getty-OpenAI deal

OpenAI and Broadcom announced Jalapeño, OpenAI's first custom inference chip. It was developed in nine months using AI-assisted design and shows better performance-per-watt than current accelerators. Engineering samples are already running GPT-5.3-Codex-Spark, with datacenter deployment planned by the end of 2026. Sakana AI released Fugu, a multi-agent routing system that scored 73.7% on SWE-Bench Pro, outperforming Claude Opus 4.8 and GPT-5.5 while remaining below the inaccessible Fable 5. Additional items cover Anthropic's Claude Tag Slack integration for async team collaboration, improvements to the Seedance 2.5 video model, the Robin autonomous biology research agent, which identified a novel drug candidate, and a Getty Images licensing partnership with OpenAI.

8·OpenAI Blog·1mo ago

OpenAI and Broadcom Announce Strategic Collaboration to Deploy 10 GW of OpenAI-Designed AI Accelerators

OpenAI and Broadcom have announced a multi-year strategic partnership targeting deployment of 10 gigawatts of OpenAI-designed AI accelerators by 2029. The collaboration involves co-developing next-generation AI accelerator systems and Ethernet networking solutions aimed at scalable, energy-efficient AI infrastructure. This represents OpenAI's continued push into custom silicon, reducing dependence on third-party chip suppliers like NVIDIA.

7·OpenAI Blog·1mo ago

OpenAI partners with Cerebras for 750 MW of high-speed AI compute

OpenAI has announced a partnership with Cerebras Systems to add 750 MW of AI compute capacity. The collaboration is aimed at reducing inference latency and improving response speeds for ChatGPT and other real-time AI workloads. Cerebras is known for its wafer-scale chip architecture optimized for fast inference.

5·GitHub Trending·25d ago

omlx: LLM inference server with continuous batching and SSD caching for Apple Silicon

omlx is an open-source Python LLM inference server optimized for Apple Silicon. It features continuous batching and SSD caching, and is managed from a macOS menu bar interface. The project has accumulated nearly 16,000 GitHub stars with strong daily momentum. It targets local inference on Apple hardware, a growing niche as consumer-grade silicon becomes increasingly capable of running open-weights models.

4·Hugging Face Blog·1mo ago

Accelerating LLM Inference with TGI on Intel Gaudi

Hugging Face's Text Generation Inference (TGI) framework has added a backend for Intel Gaudi accelerators, enabling LLM inference on Intel's AI hardware. The integration allows users to deploy large language models on Gaudi hardware using TGI's serving infrastructure. This expands the hardware ecosystem for LLM inference beyond NVIDIA GPUs, offering an alternative accelerator option for enterprise deployments.

4·Hugging Face Blog·1mo ago

Make your llama generation time fly with AWS Inferentia2

This Hugging Face blog post covers deploying and optimizing Llama 2 inference on AWS Inferentia2 accelerators. It demonstrates integration between Hugging Face's Optimum Neuron library and AWS's custom silicon to achieve competitive inference throughput and latency. The post serves as a practical guide for enterprise teams looking to reduce inference costs by moving off GPU-based infrastructure.

4·Hugging Face Blog·1mo ago

Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration

Intel and Hugging Face announced a partnership aimed at making hardware acceleration for machine learning more accessible. The collaboration focuses on optimizing Hugging Face models and tools to run efficiently on Intel hardware. This represents an early-stage industry alignment between a major chip manufacturer and the dominant open-source ML model hub.