How to Evaluate a Mobile Vendor's On-Device AI Capability Before You Sign: 2026 Guide for US Enterprise
Most mobile vendors claim on-device AI capability. Fewer than 5% have shipped it. These questions separate real from claimed.
Fewer than 5% of mobile development agencies have shipped a production on-device AI application. The other 95% will tell you they can do it. The difference between those two groups is not what they say in a sales call. It is whether they can show you a live app, a GitHub link, and a straight answer to five technical questions that only direct experience can answer.
This guide gives you those five questions, explains what a real answer looks like versus a practiced non-answer, and provides a scorecard to evaluate any vendor you are considering.
Key findings
Fewer than 5% of mobile agencies globally have shipped a production on-device AI app with text, image, and voice capabilities simultaneously.
Asking for an App Store link or GitHub page for a shipped on-device AI product eliminates 95% of vendors who claim capability they do not have.
The five technical questions in this guide can be answered in under 10 minutes by a team that has shipped on-device AI. A team that has not will stall, deflect, or answer incorrectly.
Wednesday's Off Grid app is one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars — the benchmark for what shipped on-device AI looks like.
Why claimed capability is not shipped capability
On-device AI is genuinely hard. Not hard in the way that "mobile development is complicated" is hard. Hard in the way that the problems you encounter are not documented anywhere, and you only find them by shipping.
When a vendor says they have on-device AI capability, they almost always mean one of the following:
They have integrated a cloud AI API (OpenAI, Gemini, Claude) into a mobile app. That is cloud AI. It requires internet access and sends data to a remote server. It has nothing to do with on-device inference.
They have read about on-device AI frameworks like Core ML, ONNX Runtime, or llama.cpp and understand them theoretically. Reading and shipping are different skills.
They have built a demo or prototype in a controlled environment that runs a model on a simulator. Simulators have access to the full machine's RAM. A real iPhone 15 with 6GB of RAM and 50 other running apps is a different environment.
The only way to know whether a vendor has real on-device AI experience is to ask questions that only hands-on production shipping can answer correctly.
The five questions that reveal the truth
Question 1: Show me an on-device AI app you have shipped to production.
Not a demo. Not a client reference you have to call. An App Store link or a GitHub page for a live app that users have downloaded. This is the single most important question. A vendor with real capability answers in under two minutes.
Question 2: What happens on a 4GB iPhone when your model exceeds available RAM?
On iOS, when an app allocates more memory than the device can provide, the Metal framework calls abort() before JavaScript or any application-level code can catch the signal. The process terminates without warning and without the ability to show a user-facing error. The only way to handle this is to pre-check available RAM against model size before loading begins and degrade gracefully. A vendor who has shipped on-device AI knows this and has an architecture for it. A vendor who has not will give a generic answer about error handling.
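A minimal sketch of that pre-flight check, in TypeScript for a React Native context. Every name here is illustrative, and the headroom and budget figures are assumptions for the example; a real app would read available RAM from a native module (for instance via Apple's `os_proc_available_memory` on iOS) rather than take it as a parameter.

```typescript
// Hypothetical pre-flight check: decide whether a model can be loaded
// safely BEFORE Metal begins allocating GPU buffers, so the app can
// degrade gracefully instead of being killed by abort().

type LoadDecision = { ok: true } | { ok: false; reason: string };

// Apple's guidance: keep the app's footprint under ~50% of device RAM.
const MAX_FOOTPRINT_RATIO = 0.5;
// Illustrative reserve for the non-model parts of the app.
const APP_HEADROOM_BYTES = 800 * 1024 * 1024;

function canLoadModel(
  deviceRamBytes: number,
  modelBytes: number
): LoadDecision {
  const budget = deviceRamBytes * MAX_FOOTPRINT_RATIO;
  if (modelBytes + APP_HEADROOM_BYTES > budget) {
    return {
      ok: false,
      reason: `model needs ${(modelBytes / 2 ** 30).toFixed(1)}GiB, budget is ${(budget / 2 ** 30).toFixed(1)}GiB`,
    };
  }
  return { ok: true };
}

// A ~3.2GB model on a 4GB device: refuse up front and fall back to a
// smaller model or a degraded feature, rather than crash mid-load.
console.log(canLoadModel(4 * 2 ** 30, 3.2 * 2 ** 30).ok); // false
```

The point a real vendor will make immediately: the decision has to happen before any Metal allocation starts, because once the allocator overruns there is no application-level recovery path.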
Question 3: How do you handle chipset-specific model variants on Android?
Android devices run on chipsets from multiple manufacturers. Qualcomm's NPU (Neural Processing Unit), accessed via QNN, requires a separately compiled model variant for each Snapdragon generation — the 8 Gen 1, 8 Gen 2, and 8 Gen 3 each require a different binary. An app shipping on-device AI to Android users needs a compatibility matrix and fallback logic when the device's chipset requires a variant you have not compiled. A vendor who has shipped on Android production apps knows this. A vendor who has not will describe generic "device compatibility" without specifics.
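A sketch of what that routing logic can look like. The chipset identifiers and artifact filenames below are hypothetical; a real app would detect the SoC through a native module and bundle or download the matching compiled QNN binary.

```typescript
// Illustrative chipset-to-artifact routing for Android on-device AI.

type InferenceBackend =
  | { backend: "qnn"; artifact: string }
  | { backend: "cpu"; artifact: string };

// Each Snapdragon generation needs its own compiled QNN model variant
// (hypothetical filenames).
const QNN_VARIANTS: Record<string, string> = {
  "snapdragon-8-gen-1": "model.sd8gen1.qnn.bin",
  "snapdragon-8-gen-2": "model.sd8gen2.qnn.bin",
  "snapdragon-8-gen-3": "model.sd8gen3.qnn.bin",
};

function selectBackend(chipset: string): InferenceBackend {
  const artifact = QNN_VARIANTS[chipset];
  if (artifact) return { backend: "qnn", artifact };
  // Exynos, MediaTek, Tensor, or a Snapdragon generation without a
  // compiled variant: fall back to CPU inference. The UX must adapt,
  // because tokens-per-second drops substantially on CPU.
  return { backend: "cpu", artifact: "model.generic.gguf" };
}

console.log(selectBackend("snapdragon-8-gen-2").backend); // "qnn"
console.log(selectBackend("exynos-2400").backend);        // "cpu"
```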
Question 4: How do you manage background generation state independently of the UI component lifecycle?
In React Native and Flutter, if the component that triggered an AI generation request is unmounted (the user navigates away), the generation request does not automatically stop. If the architecture ties generation state to the component lifecycle, navigating away mid-generation creates memory leaks and orphaned generation threads. The correct architecture runs generation in a background service independent of any component. A vendor who has shipped this correctly can describe the architecture immediately. A vendor who has not will describe component-level state management and miss the issue entirely.
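A minimal sketch of the decoupled architecture: generation jobs live in a module-level service, so an unmounting screen only removes its listener, never the job. `runInference` is a stand-in for the real on-device inference call, and all names are illustrative.

```typescript
// Generation state held in a service, not in component state.

type Listener = (jobId: number, result: string) => void;

class GenerationService {
  private nextId = 1;
  private results = new Map<number, string>();
  private listeners = new Set<Listener>();

  // Screens subscribe on mount and unsubscribe on unmount. The job
  // itself never belongs to a component, so unmounting cannot orphan it.
  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }

  enqueue(
    prompt: string,
    runInference: (p: string) => Promise<string>
  ): number {
    const id = this.nextId++;
    runInference(prompt).then((result) => {
      this.results.set(id, result); // survives navigation
      this.listeners.forEach((fn) => fn(id, result));
    });
    return id;
  }

  // Lets a screen that re-mounts later pick up a finished result.
  resultFor(id: number): string | undefined {
    return this.results.get(id);
  }
}

const generationService = new GenerationService();
```

A screen calls `generationService.enqueue(...)`, keeps only the job id, and re-reads `resultFor(id)` when it comes back into focus; navigating away mid-generation neither cancels nor orphans the work.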
Question 5: What is your device compatibility matrix and minimum device specification?
On-device AI has hard floor requirements. A 3B parameter model requires approximately 4GB of RAM and an NPU or GPU with sufficient TFLOPS for real-time inference. A 7B parameter model requires 6GB RAM minimum. Voice transcription at production quality requires specific audio hardware capabilities. A vendor with a real device matrix will give you a table with device names, RAM requirements, and chipset minimums. A vendor without will say "most modern devices" or "flagship phones."
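To make the contrast concrete, this is the shape of answer a real matrix enables: a lookup with named devices and hard floors. The entries below are illustrative examples built from the RAM floors above, not any vendor's actual support matrix.

```typescript
// Illustrative device compatibility check against hard RAM floors.

interface DeviceSpec {
  device: string;
  ramGb: number;
  chipset: string;
}

// Floors from the model-size requirements discussed above.
const MIN_RAM_GB = {
  "3b": 4, // ~4GB minimum for a 3B parameter model
  "7b": 6, // ~6GB minimum for a 7B parameter model
} as const;

function supportsModel(
  spec: DeviceSpec,
  modelSize: keyof typeof MIN_RAM_GB
): boolean {
  return spec.ramGb >= MIN_RAM_GB[modelSize];
}

const iphone15: DeviceSpec = { device: "iPhone 15", ramGb: 6, chipset: "A16" };
console.log(supportsModel(iphone15, "3b")); // true
console.log(supportsModel(iphone15, "7b")); // true (meets the 6GB floor)
```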
30 minutes with a Wednesday engineer gives you a complete on-device AI feasibility assessment for your specific app and user base.
Get my recommendation →
What a real answer looks like
A vendor with genuine on-device AI experience answers question one with a link, not a story. The app exists and is downloadable.
For questions two through five, a real answer is immediate, specific, and technical. You do not need to understand the technical details yourself. What you are listening for is whether the answer is pre-loaded and specific or whether the vendor is constructing it in real time. Vendors who have shipped on-device AI have war stories. They remember the night the Metal abort() call took down the app with no error log. They remember finding the chipset-specific QNN variant issue in production after shipping. They answer with the texture of experience.
A non-answer looks like this: "We have extensive experience with AI integration and have worked with the latest frameworks." This says nothing about on-device AI specifically. It is the answer of a team that integrates cloud AI APIs.
Another non-answer: "We can research the best approach for your specific requirements." Research is not the same as experience. You are not paying a vendor to learn on your timeline.
The technical hard problems only experience reveals
Three specific problems in on-device AI development are not documented in any framework guide, SDK documentation, or developer forum. They are discovered by shipping.
Metal buffer allocation on 4GB devices. iOS devices with 4GB RAM represent a large portion of the active iPhone install base. Apple recommends keeping app memory footprint below 50% of available RAM. A 3B parameter model quantized to 8-bit precision occupies approximately 3.2GB (at float16 it would need roughly 6GB and would not fit at all). On a 4GB device, loading this model leaves less than 800MB for the rest of the app. The Metal framework pre-allocates GPU buffers before the model fully loads, which means the actual RAM ceiling is reached earlier than naive calculations suggest. Managing this requires pre-flight RAM checks, model quantization to 4-bit or 8-bit precision, and intelligent model offloading when the app moves to background.
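The RAM arithmetic above can be sketched as a back-of-envelope estimator: parameters times bytes per weight, plus a fixed runtime overhead. The 0.2GB overhead figure is an illustrative assumption; real overhead (KV cache, pre-allocated Metal buffers) varies by runtime and context length.

```typescript
// Back-of-envelope model footprint: parameters x bytes per weight,
// plus an assumed fixed runtime overhead (illustrative, not measured).

const BYTES_PER_WEIGHT = {
  fp16: 2,   // full half-precision
  int8: 1,   // 8-bit quantization
  int4: 0.5, // 4-bit quantization
} as const;
type Precision = keyof typeof BYTES_PER_WEIGHT;

function modelFootprintGb(
  paramsBillions: number,
  precision: Precision,
  overheadGb = 0.2
): number {
  return paramsBillions * BYTES_PER_WEIGHT[precision] + overheadGb;
}

// A 3B model: ~6.2GB at fp16, ~3.2GB at int8, ~1.7GB at int4.
// This is why quantization is mandatory on 4GB devices.
console.log(modelFootprintGb(3, "int8").toFixed(1)); // "3.2"
```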
QNN chipset variant management. On Android, Qualcomm's NPU requires compiled model artifacts specific to each Snapdragon generation. This means maintaining separate model artifacts for 8 Gen 1, 8 Gen 2, and 8 Gen 3 devices, plus fallback CPU inference for devices without Qualcomm chipsets (Exynos, MediaTek, Tensor). The device detection and model routing logic is non-trivial, and the fallback to CPU inference for non-Qualcomm Android devices changes performance characteristics enough that the UX needs to adapt.
Background generation lifecycle management. Users do not wait for AI outputs. They navigate, switch apps, and return. An on-device AI feature that ties inference to a screen's lifecycle will either lose generation state on navigation or create memory leaks from orphaned inference threads. The correct architecture runs inference in a persistent background service with a queue, returning results via callback regardless of which screen the user is on when the result is ready.
Vendor scorecard: on-device AI
Use this scorecard to evaluate any vendor you are considering for on-device AI work. Score each item 0, 1, or 2.
| Evaluation criterion | Score (0-2) |
|---|---|
| Can show a production on-device AI app (App Store or GitHub link) | 0-2 |
| Can describe Metal RAM handling on 4GB devices specifically | 0-2 |
| Can describe chipset-specific model variant management on Android | 0-2 |
| Has a defined device compatibility matrix with specific device names | 0-2 |
| Can describe background generation architecture independent of UI lifecycle | 0-2 |
| Has shipped text, voice, and image AI on the same device simultaneously | 0-2 |
| Has published open-source on-device AI work (verifiable) | 0-2 |
A perfect score is 14. Vendors with genuine on-device AI experience typically score 10-14. Vendors with cloud AI experience claiming on-device AI typically score 2-6. If a vendor scores below 6 and you proceed, build their learning time into your project timeline.
The Off Grid benchmark
Wednesday built Off Grid: a complete on-device AI suite with text generation, image generation, voice transcription, and vision capabilities running simultaneously on iOS, Android, and macOS. No cloud. No telemetry. Every inference runs on the device.
Off Grid has 50,000+ users and 1,700+ GitHub stars. It is downloadable on the App Store today. The GitHub page is public and auditable.
Building Off Grid required solving every problem described in this guide: Metal abort() handling on 4GB iPhones, chipset-specific QNN variant routing for Android flagship devices, background generation state management independent of component lifecycles, and a device compatibility matrix covering three years of flagship hardware.
Off Grid is not a demo. It is Wednesday's proof that on-device AI at production quality is achievable — and our benchmark for what every on-device AI engagement starts from.
When a vendor claims on-device AI capability, ask them to show you the equivalent of Off Grid. If they cannot, factor their learning curve into your project plan.
How Wednesday approaches vendor evaluation
When enterprises come to Wednesday evaluating their current vendor's on-device AI capability, the first step is the same: ask for the App Store link. If the link exists, the rest of the evaluation is a technical conversation about architecture. If the link does not exist, the capability is unverified and the project needs contingency time.
Wednesday's pre-engagement process for on-device AI projects includes a device compatibility assessment against your user base's hardware profile, a RAM budget analysis for your target model size, a compliance review to confirm on-device architecture satisfies your privacy requirements, and a scope estimate that accounts for chipset-specific variants and device minimum testing.
The output is a one-page scope document with a specific timeline and device support matrix before any contract is signed.
Wednesday is one of fewer than 5% of mobile agencies that have shipped production on-device AI. The capability is verifiable.
Book my 30-min call →
More vendor evaluation frameworks, cost analyses, and decision guides are in the writing archive.
Read more decision guides →
About the author
Ali Hafizji
CEO, Wednesday Solutions
LinkedIn →
Ali founded Wednesday Solutions and has led on-device AI product development including Off Grid, one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.
Get your start date →