

One of the first announcements at this year’s WWDC was that, for the first time, third‑party developers will get to tap directly into Apple’s on‑device AI with the new Foundation Models framework. But how do these models actually compare against what’s already out there?

With the new Foundation Models framework, third-party developers can now build on the same on-device AI stack used by Apple’s native apps.

In practice, this means developers will be able to integrate AI features like summarizing documents, pulling key information from user text, or even generating structured content, entirely offline and with zero API cost.
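For a sense of what that looks like in code, here’s a minimal sketch of an on-device summarization call using the framework’s session API. It’s based on the API Apple showed at WWDC (exact names and signatures may differ in the shipping SDK), and the summarize helper is just an illustrative example:

```swift
import FoundationModels

// Hypothetical helper: summarize a document entirely on-device.
func summarize(_ document: String) async throws -> String? {
    // The on-device model requires Apple Intelligence to be enabled
    // on supported hardware, so check availability first.
    guard case .available = SystemLanguageModel.default.availability else {
        return nil // Fall back gracefully (or hide the feature) if unavailable.
    }

    // A session represents a conversation with the on-device model.
    let session = LanguageModelSession(
        instructions: "Summarize documents in two or three sentences."
    )

    // Everything runs locally: no network call, no API key, no per-token cost.
    let response = try await session.respond(to: "Summarize:\n\(document)")
    return response.content
}
```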
But how good are Apple’s models, really?
Competitive where it counts
Based on Apple’s own human evaluations, the answer is: pretty solid, especially when you consider the balance (which some might call ‘tradeoff’) between size, speed, and efficiency.
In Apple’s testing, its ~3B parameter on-device model outperformed similar lightweight vision-language models like InternVL-2.5 and Qwen-2.5-VL-3B in image tasks, winning over 46% and 50% of prompts, respectively.

And in text, it held its ground against larger models like Gemma-3-4B, even edging ahead in some international English locales and multilingual evaluations (Portuguese, French, Japanese, etc.).
In other words, Apple’s new local models seem set to deliver consistent results for many real-world uses without resorting to the cloud or requiring data to leave the device.

When it comes to Apple’s server model (which, unlike the local models, won’t be accessible to third-party developers), it compared favorably to LLaMA-4-Scout and even outperformed Qwen-2.5-VL-32B in image understanding. That said, GPT-4o still comfortably leads the pack overall.
The “free and offline” part really matters
The real story here isn’t just how well Apple’s new models perform. It’s that they’re built in. With the Foundation Models framework, developers no longer need to bundle heavy language models in their apps for offline processing. That means leaner app sizes and no need to fall back on the cloud for most tasks.
The result? A more private experience for users and no API costs for developers. Those savings can ultimately benefit everyone.
Apple says the models are optimized for structured outputs using a Swift-native “guided generation” system, which lets developers constrain model responses to the Swift types their apps already use, so outputs feed directly into app logic. For apps in education, productivity, and communication, this could be a game-changer, offering the benefits of LLMs without the latency, cost, or privacy tradeoffs.
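As an illustration, here’s a rough sketch of what guided generation looks like in Swift, again based on what Apple showed at WWDC. The Flashcard type is a hypothetical example for a study app, and the exact macro and method names may vary by SDK version:

```swift
import FoundationModels

// Hypothetical type for a study app. The @Generable macro tells the
// framework the exact shape the output must take, and @Guide adds
// per-field descriptions that steer generation.
@Generable
struct Flashcard {
    @Guide(description: "A short question about the source material")
    var question: String

    @Guide(description: "A concise answer to the question")
    var answer: String
}

// Ask the model for a Flashcard rather than free-form text, so the
// result slots directly into the app's existing data model.
func makeFlashcard(from notes: String) async throws -> Flashcard {
    let session = LanguageModelSession(
        instructions: "You create study flashcards from the user's notes."
    )
    let response = try await session.respond(
        to: "Create one flashcard from these notes:\n\(notes)",
        generating: Flashcard.self
    )
    return response.content
}
```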
Ultimately, Apple’s models aren’t the most powerful in the world, but they don’t need to be. They’re good, they’re fast, and now they’re available to every developer for free, on-device, and offline.
That might not grab the same headlines as more powerful models do, but in practice, it could lead to a wave of genuinely useful AI features in third-party iOS apps that don’t require the cloud. And for Apple, that may very well be the point.