Apple Challenges AI "Reasoning"
The Illusion of Thinking
For the past year, the AI industry has been captivated by a new frontier: reasoning models. Led by OpenAI's powerful "o-series," these models promised to do more than just generate text; they could "think," breaking down complex problems and arriving at more accurate answers. This was a great business model: you take your regular Large Language Model, you make it "think" for a lot longer (using a lot more of your expensive Nvidia chips), and you charge customers a premium for the supposedly better, more reasoned responses.
But what if the "thinking" is just an illusion? Earlier this month, in a deliciously ironic twist, Apple—the company everyone agrees is playing catch-up in AI—published a research paper suggesting that its competitors' fancy reasoning models might not be so smart after all.
The core of Apple's argument is that the standard tests for AI reasoning, which usually involve math and coding problems, are flawed. It's hard to know if a model is genuinely solving a problem or if it just memorized the answer from the vast library of human knowledge it was trained on. So Apple's researchers designed a new test: a series of controllable puzzles, like the Tower of Hanoi, that the models have never seen before. This setup allowed them to systematically increase the complexity to see when, and if, the models would break.
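To see why puzzles make such a clean ruler for complexity, consider the Tower of Hanoi itself: the optimal solution for n disks takes exactly 2^n - 1 moves, so adding a single disk roughly doubles the difficulty. Here is a minimal Python sketch of that scaling (an illustration, not Apple's actual test harness):

```python
# Minimal Tower of Hanoi sketch (an illustration, not Apple's test harness).
# The optimal solution for n disks takes 2**n - 1 moves, so each extra disk
# roughly doubles the difficulty -- a single, clean dial for complexity.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move sequence for n disks."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park n-1 disks on the spare peg
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top
    return moves

for n in range(1, 11):
    assert len(hanoi(n)) == 2**n - 1            # difficulty doubles per disk
    print(f"{n:2d} disks -> {2**n - 1:4d} optimal moves")
```

Ask a model for the move sequence at ever-higher disk counts and you have one dial to turn; turning it is how Apple's researchers located the point where each model breaks.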
And break they did. The results were stark. Here is the abstract from Apple's paper:
Through extensive experimentation across diverse puzzles, we show that frontier [Large Reasoning Models] face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget.
This is a polite, academic way of saying that not only do the models fail when the problems get hard, but they seem to know they're going to fail and just give up trying. They don't use their full compute budget; they just stop "thinking."
What Apple's results suggest is that reasoning models may simply be very good at pattern matching that looks like reasoning, rather than reasoning in any general sense. If that's right, it calls into question the entire premise of the reasoning model revolution.
This finding adds to a growing sense that the AI hype cycle might be getting ahead of reality. We have talked before about the "AGI bubble bursting," with leading labs struggling to make significant leaps in model performance. Apple's paper provides a rigorous, empirical basis for that feeling. If the primary benefit of these expensive, high-compute reasoning models disappears when they face a genuinely novel and complex task, then what exactly are customers paying for? It suggests that the path to smarter AI isn't just about throwing more compute time at the problem. The next real breakthrough, it seems, won't come from making models that appear to think longer, but from inventing models that can actually think.
Amazon's Revolt
For a while now, if you wanted to build serious AI, you paid the Nvidia tax. The company’s powerful GPUs and entrenched software ecosystem created a situation where the path to AI runs straight through Santa Clara, and Nvidia collects a toll. A very, very profitable toll, with gross margins on its high-end data center chips reportedly approaching 90%. This is a nice business if you can get it.
The problem for Nvidia is that its biggest customers are companies like Amazon, who are pretty good at getting nice businesses for themselves. This week, we got a clearer picture of Amazon Web Services' quiet revolt. The company is set to announce an upgraded Graviton4 central processing unit (CPU) and debut a new addition to its Trainium chip series; Trainium chips are already being used at scale to power Anthropic's models in a deployment of more than 400,000 chips. This isn't a science experiment; it's a direct challenge.
The reason for the rebellion is simple: money. When hyperscalers like Amazon, Google, and Microsoft spend tens of billions of dollars a year on Nvidia GPUs, they are effectively funding Nvidia's massive R&D budget and its spectacular profit margins. At some point, you look at your bill and think, "For this much money, couldn't I just build the thing myself?"
Of course, if it were that easy, everyone would have done it years ago. Nvidia's real advantage isn't just the hardware; it's CUDA. It's the software ecosystem that millions of developers know and the libraries that make everything just work. As one analyst put it, the difference between programming for CUDA and its alternatives is "night-and-day." This is why AMD, despite having respectable hardware, has struggled to make a serious dent. Building a chip is hard; getting an entire generation of developers to use it is harder.
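To make the "ecosystem" point concrete, here's a small Python illustration (my example, not the analyst's) of how deeply Nvidia's platform is baked into everyday AI code. In PyTorch, "cuda" is effectively the default spelling for "fast," and the libraries underneath quietly dispatch to Nvidia's kernels:

```python
# A taste of the lock-in: in everyday PyTorch code, "cuda" is the idiomatic
# spelling for "run this fast," and the layers below dispatch to Nvidia's
# cuBLAS/cuDNN kernels. Multiply this pattern across millions of codebases.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # Nvidia path is the default idiom
model = torch.nn.Linear(1024, 1024).to(device)            # weights move to GPU memory
x = torch.randn(8, 1024, device=device)                   # so does the data
y = model(x)                                              # matmul runs via cuBLAS on Nvidia hardware
print(y.shape, "on", device)
```

Displacing that isn't a hardware problem; it's a convince-millions-of-developers-to-change-their-defaults problem.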
But Amazon isn't just anyone. It has the capital, the engineering talent, and, most importantly, a captive customer for its chips: itself. Its strategy isn’t necessarily to build a chip that’s faster than Nvidia’s best, but one that offers better price-performance. In his annual shareholder letter, Amazon CEO Andy Jassy laid out the thinking:
AI does not have to be as expensive as it is today, and it won’t be in the future. Chips are the biggest culprit. Most AI to date has been built on one chip provider. It’s pricey. Trainium should help, as our new Trainium2 chips offer 30-40% better price-performance than the current GPU-powered compute instances generally available today.
This is the key. Amazon doesn't need an Nvidia-killer chip; it just needs to create a viable alternative that forces the "Nvidia tax" down. By building its own silicon, AWS can lower its own operational costs, offer more competitive pricing to its cloud customers, and claw back some of the margin that was previously flowing to Nvidia. Google has been doing this for years with its TPUs, reportedly running its AI workloads at a fraction of the cost of its rivals.
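To put rough numbers on it: the 30-40% figure is Jassy's, but the spend below is a hypothetical placeholder, and "better price-performance" is read here as more work per dollar, so the same workload costs 1/(1+X) as much:

```python
# Back-of-the-envelope math on Jassy's claim. The 30-40% price-performance
# figure is his; the annual GPU bill below is a hypothetical placeholder.

gpu_spend = 100_000_000  # hypothetical yearly spend on GPU instances ($)

for improvement in (0.30, 0.40):
    # "X% better price-performance" read as X% more work per dollar,
    # so the same workload costs 1 / (1 + X) of the original bill.
    trainium_spend = gpu_spend / (1 + improvement)
    savings = gpu_spend - trainium_spend
    print(f"{improvement:.0%} better price-performance: "
          f"${trainium_spend:,.0f} for the same work (${savings:,.0f} saved)")
```

Even on the conservative reading, that's roughly a quarter of the bill back, and when the bill is tens of billions of dollars a year, a quarter is real money.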
This represents a fundamental shift in the AI arms race. The battle is no longer just between model providers like OpenAI and Anthropic, or even cloud providers like AWS and Azure. It's becoming a vertical conflict, where the biggest players are integrating down the stack, from the application layer all the way to the silicon. The race is no longer just to buy the best shovels; it’s to build your own shovel factory. And Amazon just rolled the first one off its assembly line.
The Scoreboard
- AI: Nvidia-backed Startup Bets on Synthetic Data for AI (ARPU)
- Semiconductor: Texas Instruments Plans $60 Billion US Investment Under Trump Push (Reuters)
- Semiconductor: Intel Appoints Engineering Hires as Part of CEO Tan’s Turnaround Strategy (Reuters)
- E-commerce: Temu Battles TikTok-Like Backlash Over Data (WSJ)