Apple is abandoning its 2024 promise to keep AI queries on Apple silicon, routing heavier requests to Google Cloud instead. According to @kimmonismus, a distilled Gemini model runs locally on iPhone while offloaded queries use Nvidia confidential-compute tech in Google Cloud.
Key facts
- Local model is a distilled version of Google's Gemini
- Heavy queries route to Google Cloud, not Apple silicon
- Apple signed off on Nvidia confidential-compute tech for Google Cloud
- Apple looked at Liquid AI for on-device model shrinking
- Private Cloud Compute name unchanged despite cloud switch
Apple's WWDC next month is expected to center on long-delayed Siri and on-device AI upgrades, but the infrastructure story is more complicated than the marketing suggests.
What Apple is actually doing
Apple will run a smaller, distilled version of Google's Gemini locally on iPhone silicon, pitched on privacy and lower token costs. [According to @kimmonismus] Most of that stack is sourced from elsewhere. The local model is distilled from Gemini.
Queries too heavy for the device route to Google Cloud, where Apple has now signed off on Nvidia's confidential-compute tech to process them. Apple is also reportedly hunting for small on-device-AI startups to speed up the model-shrinking work, having looked at Liquid AI among others.
The broken promise
One quiet shift from the 2024 rollout: Apple promised then that anything leaving your iPhone would run on Apple silicon inside Private Cloud Compute. It couldn't get the full Gemini running there, so those queries now sit in Google Cloud. The Private Cloud Compute name is staying anyway. [The Information]
The unique take
This is the first major crack in Apple's vertical-integration AI narrative. The company that spent years building custom silicon and marketing privacy as a differentiator is now routing its most sensitive user queries — voice commands, personal context — through a competitor's cloud running Nvidia GPUs. The "Private Cloud Compute" label becomes branding, not architecture.
Apple's reliance on Google for both the model (Gemini) and the compute (Google Cloud) creates an awkward dependency. If Google changes pricing, terms, or model access, Apple has limited leverage. The startup acquisition hunt suggests Apple knows this and is trying to build internal model-shrinking capability, but that takes time.
Key Takeaways
- Apple routes AI queries to Google Cloud, breaking 2024 Apple silicon pledge.
- Distilled Gemini runs locally; heavier queries use Nvidia tech in Google Cloud.
What to watch
Watch for WWDC 2026 (next month) where Apple will detail the Siri upgrade. Key metrics: whether Apple discloses the cloud provider arrangement, and whether it announces an acquisition of an on-device-AI startup like Liquid AI to reduce Google dependency.







