Apple looks like an AI loser: late to ship, no frontier model, no ChatGPT competitor. But it might be the only company building a complete on-device AI system without burning billions in the cloud.

  • Silicon thesis since 2017: the Neural Engine went from 600B ops/sec (A11) to 38T ops/sec (M4), roughly a 63x increase in seven years. M5 now embeds neural accelerators directly inside GPU cores. Transistors don’t lie.

  • Unified memory: CPU, GPU, and Neural Engine share one physical memory pool. LLM inference is memory-bandwidth-bound; generating each token streams the entire weight set through memory, so no VRAM limits, no bus bottlenecks, and no copy operations matter more than raw compute (see the Metal sketch after this list).

  • Full vertical stack: Apple controls the chip, the memory, the OS, the ML frameworks (Core ML, MLX), the dev tools, and distribution across 2.5B devices. Nobody else optimizes across all of these boundaries (see the Core ML sketch after this list).

  • Context > intelligence: model intelligence is commoditizing; what stays scarce is personal context. Apple holds 2.5B devices’ worth of messages, photos, health data, and calendars, all on-device and encrypted.

  • Privacy as architecture: on-device inference means no API costs, no network latency, and no data leaving the device. Users will hand AI their personal data only if they trust it stays private (see the Foundation Models sketch after this list).

  • Strategic optionality: Apple routes cloud queries to Google’s Gemini and skipped the cash-burning phase entirely. Let others commoditize intelligence; Apple owns the layer where it meets personal context.

  • The gap is the opportunity: on most devices the Neural Engine sits idle most of the time, because the apps that would justify 38T ops/sec don’t exist yet. When software catches up, the hardware is already in 2.5B pockets.
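
The unified-memory point is concrete at the API level. Here is a minimal Swift sketch using Metal (buffer size and contents are illustrative): a single buffer allocated with `.storageModeShared` is written by the CPU and is immediately visible to the GPU, with no staging copy. On a discrete-GPU machine the same pattern pays bus traffic; on Apple silicon it is genuinely zero-copy.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let count = 1_024
// One allocation, visible to both CPU and GPU: on Apple silicon,
// .storageModeShared maps the same physical pages into both.
let weights = device.makeBuffer(
    length: count * MemoryLayout<Float>.stride,
    options: .storageModeShared
)!

// The CPU writes directly into the buffer...
let ptr = weights.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { ptr[i] = Float(i) }

// ...and any GPU compute pass can bind `weights` as-is. There is no
// upload step, no VRAM mirror, and no cudaMemcpy-style transfer.
```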
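
The vertical stack shows up in how little code it takes to target the Neural Engine. A hedged Core ML sketch, assuming a hypothetical compiled model named `SummarizerModel.mlmodelc`; the configuration API itself (`MLModelConfiguration.computeUnits`) is real:

```swift
import CoreML
import Foundation

// Load a model and let Core ML partition its graph across the CPU and
// the Neural Engine. The developer never schedules ANE work by hand;
// the framework decides which layers run where.
func loadSummarizer() throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine  // skip the GPU, prefer the ANE

    // "SummarizerModel.mlmodelc" is a hypothetical compiled model path.
    let url = URL(fileURLWithPath: "SummarizerModel.mlmodelc")
    return try MLModel(contentsOf: url, configuration: config)
}
```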
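
And the privacy bullet is exactly what Apple’s Foundation Models framework (announced at WWDC 2025) packages up: inference against the built-in on-device model, so the prompt and any personal data it references never leave the device. A hedged sketch; API names are as announced and may shift across OS versions:

```swift
import FoundationModels

// Summarize text entirely on-device: no API key, no network call,
// no data leaving the machine.
func summarizeLocally(_ note: String) async throws -> String {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarize this note in one sentence: \(note)"
    )
    return response.content
}
```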

If intelligence becomes a commodity, the winner won’t be whoever trained the biggest model. It will be whoever deploys AI on-device, with the user’s data, under the user’s control.
