Amazon Web Services announced a partnership with AI chipmaker Cerebras Systems to offer ultra-fast inference computing, diversifying beyond Nvidia’s dominant processors.
The move signals AWS’s strategy to provide customers with specialized hardware options for real-time AI applications that demand minimal latency.
Key Takeaways
- AWS partners with Cerebras for lightning-fast AI inference computing
- OpenAI signed a $10 billion deal with Cerebras earlier this year
- Cerebras chips deliver 15x faster code generation than traditional GPU-based systems
Market Context and Strategic Implications
The partnership comes as major cloud providers seek alternatives to Nvidia’s GPU dominance in AI processing. Cerebras has emerged as a key challenger with its wafer-scale processors that eliminate traditional chip-to-chip communication bottlenecks [1].
OpenAI’s earlier $10 billion agreement with Cerebras validated the company’s technology for high-speed inference workloads. The deal involves deploying 750 megawatts of Cerebras systems through 2028 [2].
Technical Advantages
Cerebras’s Wafer Scale Engine 3 (WSE-3) contains approximately 4 trillion transistors on a single dinner-plate-sized chip. This architecture enables what the company calls “near-instant” AI responses for coding applications [3].
OpenAI’s GPT-5.3-Codex-Spark model, running on Cerebras hardware, generates code at over 1,000 tokens per second, a roughly 15x speed improvement over traditional GPU-based systems [4].
Industry Momentum
“Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models,” said Andrew Feldman, co-founder and CEO of Cerebras [5].
The partnership reflects broader industry trends toward specialized AI processors. While Nvidia remains dominant in AI training workloads, companies like Cerebras focus specifically on inference optimization for production applications.
Competitive Landscape
Cerebras recently raised $1 billion in Series H funding at a $23 billion valuation, triple its valuation of six months earlier [6]. The company had previously relied heavily on UAE-based customer G42, which accounted for 87% of its revenue in the first half of 2024.
AWS joins other major cloud providers exploring chip diversification strategies. The partnership gives AWS customers access to ultra-low latency inference capabilities while providing Cerebras with expanded market reach beyond direct enterprise sales.
Future Outlook
The collaboration positions AWS to compete more effectively in real-time AI applications where response speed is critical. Industries requiring instant AI feedback, from autonomous vehicles to financial trading, represent growing market opportunities.
For investors, the partnership validates the emerging market for specialized AI inference processors beyond Nvidia’s training-focused GPUs.
Not investment advice. For informational purposes only.
References
[1] John Koetsier (Feb 5, 2026). “OpenAI Now Using Cerebras’ AI Chips To Code At 1,000 Tokens Per Second”. Forbes. Retrieved March 13, 2026.
[2] Stephanie Palazzolo and Rocket Drew (Feb 25, 2026). “Why OpenAI’s Cerebras Chip Deal Matters; What Anthropic Wants to Know About Chinese Rivals”. The Information. Retrieved March 13, 2026.
[3] Ashley Capoot (Jan 16, 2026). “OpenAI has committed billions to recent chip deals. Some big names have been left out”. CNBC. Retrieved March 13, 2026.
[4] (Feb 12, 2026). “OpenAI deploys Cerebras chips for ‘near-instant’ code generation in first major move beyond Nvidia”. VentureBeat. Retrieved March 13, 2026.
[5] (Jan 14, 2026). “OpenAI partners with Cerebras”. OpenAI. Retrieved March 13, 2026.
[6] Suhasini Srinivasaragavan (Jan 15, 2026). “OpenAI inks AI chips deal with Nvidia challenger Cerebras”. Silicon Republic. Retrieved March 13, 2026.