FuriosaAI Teams with Broadcom on Third‑Gen Inference Accelerator

FuriosaAI announced a strategic partnership with Broadcom (NASDAQ: AVGO) to co‑develop its third‑generation AI accelerator. The collaboration extends Furiosa’s Tensor Contraction Processor (TCP) architecture and software stack to Broadcom’s scale‑up AI networking solutions, targeting rack‑scale inference platforms for large‑scale “agentic” AI systems. For enterprise data‑center leaders, the joint effort promises higher performance per watt and tighter inter‑chip communication in hyperscale environments.

Partnership Details and Timeline

The agreement builds on FuriosaAI’s existing data‑center inference chip, RNGD, which is already in mass production on TSMC’s 5 nm process. RNGD is a 180 W, PCIe‑based accelerator validated by Samsung SDS and LG AI Research for large language model (LLM) and agentic AI workloads. Under the new partnership, the two companies will combine Furiosa’s AI architecture and software with Broadcom’s XPU technology, scale‑up Ethernet, and fabric switches to create a rack‑scale inference platform. Sampling of the third‑generation chip is scheduled to begin in the first half of 2028.

Architecture, Packaging, and Networking

The third‑generation accelerator will employ a 2 nm compute die paired with a dedicated I/O die for scale‑up networking, and will integrate HBM4/4E memory. Broadcom’s advanced packaging will enable multiple silicon dies to be assembled into a single high‑performance inference accelerator. By leveraging Broadcom’s Ethernet technologies, the design aims to provide low‑latency, high‑bandwidth all‑to‑all interconnect across hundreds of chips at rack scale, addressing “the key bottlenecks of large‑scale agentic AI,” according to Charlie Kawwas, Ph.D., president of Broadcom’s Semiconductor Solutions Group.

Software Stack and Developer Experience

Furiosa’s SDK abstracts the hardware layer, letting developers compile high‑level PyTorch code for the chip with a general‑purpose compiler. For teams that need finer control, the Virtual ISA offers a declarative programming model designed to avoid the non‑deterministic complexity typical of GPU kernels. The approach is intended to reduce hand‑tuning effort and speed the deployment of new frontier models and optimization techniques.

Key Takeaways

FuriosaAI’s RNGD chip, already in mass production, serves as the foundation for the third‑generation platform co‑developed with Broadcom.
The new accelerator will use a 2 nm compute die, a dedicated I/O die, and HBM4/4E memory, with Broadcom’s packaging and Ethernet fabric delivering rack‑scale, low‑latency interconnect.
Sampling of the third‑generation solution is planned for the first half of 2028, targeting the next decade of AI data‑center deployments.

TechInsyte's Take

The Furiosa‑Broadcom effort signals a shift toward tightly integrated compute‑and‑network designs for rack‑scale inference, where inter‑chip communication matters as much as per‑chip performance. With sampling not due until the first half of 2028, enterprises should watch the sampling results and the maturity of the software stack before committing to large‑scale deployments.

Source: Business Wire