Google and OpenAI Are Frenemies Now
Arming the Enemy
One of the great themes of the modern AI era is that the need for raw computing power is so vast and so desperate that it makes for some truly strange bedfellows. The latest example is a particularly weird one: OpenAI, the company whose ChatGPT poses the most direct existential threat to Google’s search dominance, is now a Google Cloud customer. This is like Ford announcing a major strategic engine partnership with General Motors. It’s awkward. Here is Scotiabank, via Reuters, on the deal:
The deal ... underscores the fact that the two are willing to overlook heavy competition between them to meet the massive computing demands. Ultimately, we view this as a big win for Google’s cloud unit, but ... there are continued worries that ChatGPT is becoming an incrementally larger threat to Google’s search dominance.
On the surface, OpenAI is simply trying to quench an insatiable thirst for compute. It has already partnered with Microsoft, Oracle, and CoreWeave, and now it’s adding Google to the list to avoid bottlenecks and ensure it has enough processing power to train its next generation of models. For Google’s cloud division, which has long been a distant third to Amazon and Microsoft, landing the world’s most famous AI company is a monumental win.
But the really interesting story isn’t just about the cloud contract; it’s about the silicon underneath. The infrastructure that OpenAI will be using at Google isn’t primarily built on the Nvidia GPUs that power much of the AI world. Instead, it relies on Google’s own custom-designed chips: Tensor Processing Units, or TPUs.
We have talked before about the “Nvidia tax.” Nvidia sells the essential picks and shovels for the AI gold rush, and it does so at gross margins that analysts estimate are in the 80% to 90% range. If you are OpenAI, you pay that tax (via Microsoft Azure). Google, on the other hand, makes its own shovels. By designing and deploying its own TPUs, it effectively bypasses the Nvidia markup, giving it a fundamental cost advantage that industry watchers believe could be as high as 4x-6x per unit of compute.
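To see how that could pencil out, here is a toy back-of-the-envelope calculation. Every number in it is an illustrative assumption, not a reported figure:

```python
# Back-of-the-envelope sketch of the "Nvidia tax."
# All numbers below are illustrative assumptions, not reported figures.

NVIDIA_GROSS_MARGIN = 0.85  # midpoint of the 80%-90% analyst estimates
GPU_PRICE = 30_000          # assumed street price of a high-end GPU, in USD

# What the chip would cost Nvidia to make at that margin:
gpu_cogs = GPU_PRICE * (1 - NVIDIA_GROSS_MARGIN)  # ~$4,500

# A cloud that designs its own accelerator pays roughly manufacturing
# cost plus design amortization; assume 1.5x the GPU's cost of goods
# to be conservative:
in_house_cost = gpu_cogs * 1.5  # ~$6,750

print(f"Buying GPUs:        ${GPU_PRICE:,.0f} per chip")
print(f"Building your own:  ${in_house_cost:,.0f} per chip")
print(f"Cost advantage:     {GPU_PRICE / in_house_cost:.1f}x")
# -> about 4.4x, in the same ballpark as the 4x-6x industry estimates
```

The point is not the precise ratio. The point is that when your supplier keeps roughly 85 cents of every dollar, your own shovel factory doesn’t have to be very efficient to be much cheaper.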
OpenAI’s decision to use Google’s TPU-powered infrastructure is the ultimate validation that there is now a powerful, scalable, non-Nvidia ecosystem capable of handling the most demanding AI workloads in the world.
This, of course, creates a fascinatingly messy situation inside Google. The Google Cloud team just landed the whale of all whales. Meanwhile, the Google Search team has to watch its biggest global competitor get stronger by renting out its own company’s most advanced, cost-efficient hardware. It is a spectacular case of one hand of a company feeding the other hand’s mortal enemy, all because the demand for compute is so absolute that it overrides the traditional rules of competition.
Feeding the Beast
Memory chipmaker Micron announced yesterday that it plans to invest about $200 billion in the US, most of it in manufacturing. Its stock went down on the news, which is a bit weird but also sort of the least interesting thing about it.
The interesting thing is what Micron is building, and why it matters for the great artificial intelligence bubble. The dirty secret of the AI boom is that it’s not just a software problem; it’s a plumbing problem. A modern AI chip from, say, Nvidia is like a world-champion speed-eater. It can perform trillions of calculations per second. But to do that, it needs to be fed a constant, fire-hose-like stream of data. If you feed it data with a regular teaspoon, it will spend most of its time sitting around waiting for the next spoonful, and you will have wasted $40,000 on a very bored GPU.
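The speed-eater metaphor maps onto what chip designers call arithmetic intensity: how many calculations a chip must do per byte of data it fetches, just to keep its math units busy. Here is a rough sketch using ballpark, spec-sheet-level numbers for an H100-class GPU (both hardware figures are approximations):

```python
# Roofline-style sketch of why memory bandwidth gates AI chips.
# Both hardware numbers are ballpark figures for an H100-class GPU.

peak_flops = 1.0e15      # ~1 petaFLOP/s of dense 16-bit compute
mem_bandwidth = 3.35e12  # ~3.35 TB/s of HBM memory bandwidth

# How many FLOPs the chip must perform per byte fetched to stay busy:
breakeven_intensity = peak_flops / mem_bandwidth  # ~300 FLOPs/byte

# A memory-bound workload: fetch a 2-byte model weight, do 2 FLOPs
# with it (one multiply, one add), which is roughly what generating
# one token at a time from a large model looks like:
workload_intensity = 2 / 2  # 1 FLOP per byte

utilization = min(1.0, workload_intensity / breakeven_intensity)
print(f"Chip needs ~{breakeven_intensity:.0f} FLOPs per byte to stay fed")
print(f"This workload supplies {workload_intensity:.0f}; "
      f"math units are busy ~{utilization:.1%} of the time")
```

At one calculation per byte, the math units sit idle more than 99% of the time. That is the teaspoon problem, in numerical form.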
This is the memory bandwidth problem. The solution, for now, is a technology called High-Bandwidth Memory, or HBM, which is basically a whole bunch of memory chips stacked vertically, like a memory-chip skyscraper, to create a much fatter pipe for data to flow through. But it’s not enough to just build the skyscraper; you have to put it right next to the speed-eater’s mouth. This is a delicate and fantastically expensive process called “advanced packaging,” where you take the main AI processor and surgically attach the HBM skyscrapers to it on a single piece of silicon.
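To put rough numbers on the skyscraper: HBM’s trick is a very wide, moderately fast bus, where conventional graphics memory uses a narrow, very fast one. A sketch with approximate headline figures from the memory standards (shipping parts often run a bit slower):

```python
# Why stacking makes a fatter pipe: HBM trades pin speed for bus width.
# Per-pin rates are the standards' headline numbers; real parts vary.

# One HBM3 stack exposes an extremely wide interface:
hbm_bus_bits = 1024  # data pins per stack
hbm_pin_gbps = 6.4   # Gbit/s per pin
hbm_stack_gbs = hbm_bus_bits * hbm_pin_gbps / 8  # ~819 GB/s per stack

# A conventional GDDR6X chip, for comparison:
gddr_bus_bits = 32
gddr_pin_gbps = 21.0
gddr_chip_gbs = gddr_bus_bits * gddr_pin_gbps / 8  # ~84 GB/s per chip

stacks_on_package = 5  # high-end AI processors carry several stacks
total_tbs = stacks_on_package * hbm_stack_gbs / 1000
print(f"One HBM3 stack:       ~{hbm_stack_gbs:.0f} GB/s")
print(f"One GDDR6X chip:      ~{gddr_chip_gbs:.0f} GB/s")
print(f"{stacks_on_package} stacks on package: ~{total_tbs:.1f} TB/s")
```

That 1,024-pin interface is also why proximity matters: you cannot route thousands of traces across an ordinary circuit board, so the stacks have to sit millimeters from the processor on a slab of silicon, which is exactly the packaging surgery described above.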
And the thing is, there’s really only one place in the world that is a master of this particular surgery at massive scale, and that is Taiwan Semiconductor Manufacturing Co (TSMC). This makes people in Washington, who are funding a domestic AI boom while also worrying about a risk of war over Taiwan, a little bit nervous.
Which brings us back to Micron. The company’s $200 billion plan isn’t just about making more memory chips. A huge part of the bet is building out advanced packaging capabilities in the US. It’s an attempt to solve one of the most critical physical bottlenecks in the AI supply chain, right here at home.
Nvidia CEO Jensen Huang, who needs all the HBM he can get, blessed the move, calling it “an important step forward for the AI ecosystem.” And he’s right. For all the talk of digital transformation and intelligent agents, the AI race currently comes down to who can build the most, and the most complicated, stuff. It’s a battle over foundries, cooling systems, power grids and, yes, the incredibly complex business of putting memory chip skyscrapers right next to processor brains. It is a profoundly physical, capital-intensive industrial project.
The Manufacturing Chokepoint
A top U.S. official said something interesting this week about Huawei. The Chinese tech giant has been hailed as a formidable competitor in the AI chip race, a national champion shrugging off American sanctions to build powerful processors. But according to the Commerce Department, there’s a hard limit to this success. Here’s Reuters:
"Our assessment is that Huawei Ascend chip production capacity for 2025 will be at or below 200,000 and we project that most or all of that will be delivered to companies within China," Jeffrey Kessler, Under Secretary of Commerce for Industry and Security at the Commerce Department, told a congressional hearing.
Two hundred thousand chips is a lot of chips, but in the world of hyperscale AI, it’s a rounding error. Companies like Microsoft and Meta are buying Nvidia GPUs by the hundreds of thousands, if not millions. OpenAI alone was using around 720,000 Nvidia H100 GPUs as of 2024.
So what’s going on here? If Huawei’s chips are so good—and even Nvidia’s CEO admits they are—why can’t they just make more? This gets to a fundamental truth about the semiconductor business. There is a huge difference between designing a chip and manufacturing a chip. It is one thing to have the architectural blueprint for a state-of-the-art processor. It is another thing entirely to have the factory that can print trillions of its transistors onto silicon wafers, reliably and at a reasonable cost. Huawei is very good at the first part. The second part is a problem.
As we’ve discussed around here, most chip companies are “fabless.” Nvidia doesn’t have its own factories; it sends its designs to Taiwan Semiconductor Manufacturing Co. (TSMC), the world’s most advanced contract manufacturer. TSMC is the best at this because it’s an impossibly hard business that requires staggering capital—a new fab can cost $30 billion—and decades of accumulated, specialized, hard-to-replicate expertise.
The U.S. government, being aware of this, has made it clear that TSMC is not to produce advanced chips for Huawei. This is a pretty effective chokepoint. So Huawei has to rely on China’s domestic champion, SMIC. But SMIC is also under sanctions, which leads us to the real chokepoint.
The single most important piece of equipment in this whole global chess match is the photolithography machine, which uses light to etch circuit patterns onto silicon. The Dutch company ASML has a global monopoly on the most advanced of these machines, which use extreme ultraviolet (EUV) light and cost hundreds of millions of dollars apiece. You cannot mass-produce the world’s best chips without them. And the U.S. has leaned on the Dutch government to make sure ASML doesn’t sell its EUV machines to China.
This is why Huawei has a 200,000-chip problem. Its domestic foundries are stuck using older DUV machines, which are less precise. You can push them to make more advanced chips, but it requires multiple, complicated, expensive exposure passes per layer, which kills your yield and jacks up your costs. You just can’t do it at the scale needed to compete with the TSMC-Nvidia machine.
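Here’s a toy model of why those extra passes hurt so much. The step counts and the per-step success rate below are made-up assumptions, but the compounding is the real mechanism:

```python
# Toy model: multi-patterning compounds yield loss per critical step.
# Step counts and per-step yield are illustrative assumptions.

def die_yield(critical_steps: int, yield_per_step: float) -> float:
    """Fraction of dies that survive if each step succeeds independently."""
    return yield_per_step ** critical_steps

euv_steps = 10         # assume one EUV exposure per critical layer
duv_steps = 40         # assume ~4 DUV exposures per critical layer
yield_per_step = 0.98  # assumed success rate of a single exposure

euv = die_yield(euv_steps, yield_per_step)  # ~82%
duv = die_yield(duv_steps, yield_per_step)  # ~45%

print(f"EUV-style flow:     ~{euv:.0%} of dies survive")
print(f"DUV multi-pass:     ~{duv:.0%} of dies survive")
print(f"Cost per good die:  ~{euv / duv:.1f}x worse for the DUV flow")
```

And that understates the pain, because every extra pass also consumes expensive tool time, so each wafer costs more even before you throw away the bad dies.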
The U.S. strategy isn’t just about blocking the sale of American chips to China. It’s about blocking China’s ability to make its own advanced chips at scale. Controlling the company that makes the machines that equip the factories that make the chips is a much deeper and more durable form of power. Huawei has a great blueprint, but the U.S. and its allies have a lock on the printing press.
The Scoreboard
- AI: Is Apple Struggling with AI? (ARPU)
- SaaS: Why Salesforce is Hoarding Your Slack Messages (ARPU)
- Cloud Infra: Oracle Will Build More Cloud Infrastructure Data Centers Than All Its Competitors Combined, Says Larry Ellison (Data Center Dynamics)
- Robotics: Nvidia, Samsung Plan Investments in Robotics Startup Skild AI (Bloomberg)