Skip to content

OpenAI and Broadcom unveil Jalapeno, a custom inference chip built in nine months with AI-assisted design

· by Pondero Newsdesk

The short version

OpenAI revealed its first custom silicon on June 24, 2026: Jalapeno, an LLM inference accelerator co-developed with Broadcom that targets gigawatt-scale deployment by end of 2026 and is already running GPT-5.3-Codex-Spark in lab testing.

OpenAI and Broadcom unveil Jalapeno, a custom inference chip built in nine months with AI-assisted design

OpenAI has spent years depending entirely on Nvidia for the compute behind ChatGPT and its API. On June 24, 2026, that changed: the company unveiled Jalapeno, its first custom silicon, built with Broadcom in what the two companies described as a multi-generation partnership aimed at driving down the cost of AI inference at scale.

What

OpenAI and Broadcom jointly announced Jalapeno on June 24, per OpenAI's official announcement. The chip is an LLM inference accelerator designed from a blank slate rather than adapted from a general-purpose GPU architecture. Engineering samples are already running ML workloads in the lab at production target frequency and power, including GPT-5.3-Codex-Spark.

The development timeline is the most striking claim: OpenAI and Broadcom say the chip went from initial design to manufacturing tape-out in nine months, which they describe as "the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors." OpenAI's own models accelerated parts of the design and optimization process.

Early testing shows performance per watt substantially better than current state-of-the-art alternatives, though final benchmark figures are not published yet. A detailed technical report is promised within months. The architecture reduces data movement and balances compute, memory, and networking to achieve realized utilization closer to theoretical peak. Broadcom's Tomahawk networking silicon handles the high-performance interconnect layer. Manufacturing partner Celestica handles board, rack, and system integration.

Initial deployment targets gigawatt scale by end of 2026 alongside data center partners including Microsoft, per Broadcom CEO Hock Tan's statements in the announcement. The roadmap extends across multiple chip generations.

Why it matters

Inference is where AI spending actually accumulates. Training is a one-time cost per model run; inference runs continuously every time a user sends a message or a developer makes an API call. If Jalapeno delivers the claimed efficiency gains on production workloads, that directly cuts what OpenAI spends to serve each ChatGPT and API request.

The competitive framing matters too. Google built TPUs and Amazon built Trainium specifically to reduce dependence on Nvidia and capture vertical-stack advantages in their own cloud businesses. OpenAI entering custom silicon follows the same logic, but from a different starting position: the company is a pure AI lab and software operator, not a cloud provider, so it has to partner rather than fab in-house. Broadcom is the chip implementation partner; Celestica handles systems; Microsoft provides the data center capacity.

For operators building on OpenAI's API, the near-term implication is indirect. Cheaper inference for OpenAI does not automatically mean cheaper API pricing, but it does improve the company's margin picture and reduces a structural cost that has been cited as a drag on long-term pricing sustainability. A faster, more efficient inference stack also means tighter response latency, which matters for agentic products like Codex that chain multiple calls.

The Nvidia angle is the one to watch for competitive purposes. Jalapeno is explicitly inference-only, and OpenAI's announcement notes that intensive pretraining will likely still run on Nvidia hardware. But inference is a large and growing portion of total AI compute spend as deployed models scale.

Context

The Broadcom partnership was publicly announced in October 2025, but Jalapeno is the first physical chip to emerge from it. OpenAI president Greg Brockman framed the move as part of a deliberate full-stack strategy: "We have a deep understanding of the workload," Brockman said on OpenAI's podcast after the October announcement. "We've really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what's possible?"

The June 24 announcement was made in person: Broadcom CEO Hock Tan and President Charlie Kawwas delivered the chip to OpenAI CEO Sam Altman and Brockman at the unveiling.

What to watch next

Two concrete milestones define the near-term story. First, the detailed technical report OpenAI promised for "coming months" will show whether the efficiency claims hold up at scale. Second, the end-of-2026 target for gigawatt-scale deployment with Microsoft will indicate whether the production ramp runs on schedule. Any Nvidia pricing move or new GPU product announcement in the inference tier should be read in relation to this program.

Sources