How to Evaluate a Mobile Vendor's On-Device AI Capability Before You Sign: 2026 Guide for US Enterprise
Most mobile vendors claim on-device AI capability. Fewer than 5% have shipped it. These questions separate real from claimed.
Fewer than 5% of mobile development agencies have shipped a production on-device AI application. The other 95% will tell you they can do it. The difference between those two groups is not what they say in a sales call. It is whether they can show you a live app, a GitHub link, and a straight answer to five technical questions that only direct experience can answer.
This guide gives you those five questions, explains what a real answer looks like versus a practiced non-answer, and provides a scorecard to evaluate any vendor you are considering.
Key findings
Fewer than 5% of mobile agencies globally have shipped a production on-device AI app with text, image, and voice capabilities simultaneously.
Asking for an App Store link or GitHub page for a shipped on-device AI product eliminates 95% of vendors who claim capability they do not have.
The five technical questions in this guide can be answered in under 10 minutes by a team that has shipped on-device AI. A team that has not will stall, deflect, or answer incorrectly.
Wednesday's Off Grid app is one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars — the benchmark for what shipped on-device AI looks like.
Why claimed capability is not shipped capability
On-device AI is genuinely hard. Not hard in the way that "mobile development is complicated" is hard. Hard in the way that the problems you encounter are not documented anywhere, and you only find them by shipping.
When a vendor says they have on-device AI capability, they almost always mean one of the following:
They have integrated a cloud AI API (OpenAI, Gemini, Claude) into a mobile app. That is cloud AI. It requires internet access and sends data to a remote server. It has nothing to do with on-device inference.
They have read about on-device AI frameworks like Core ML, ONNX Runtime, or llama.cpp and understand them theoretically. Reading and shipping are different skills.
They have built a demo or prototype in a controlled environment that runs a model on a simulator. Simulators have access to the full machine's RAM. A real iPhone 15 with 6GB of RAM and 50 other running apps is a different environment.
The only way to know whether a vendor has real on-device AI experience is to ask questions that only hands-on production shipping can answer correctly.
The five questions that reveal the truth
Question 1: Show me an on-device AI app you have shipped to production.
Not a demo. Not a client reference you have to call. An App Store link or a GitHub page for a live app that users have downloaded. This is the single most important question. A vendor with real capability answers in under two minutes.
Question 2: What happens on a 4GB iPhone when your model exceeds available RAM?
On iOS, when an app allocates more memory than the device can provide, the Metal framework calls abort() before JavaScript or any application-level code can catch the signal. The process terminates without warning and without the ability to show a user-facing error. The only way to handle this is to pre-check available RAM against model size before loading begins and degrade gracefully. A vendor who has shipped on-device AI knows this and has an architecture for it. A vendor who has not will give a generic answer about error handling.
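A minimal sketch of that pre-flight check, in TypeScript for a React Native context. Every name here is illustrative, and the headroom and budget figures are assumptions for the example; a real app would read available RAM from a native module (for instance via Apple's `os_proc_available_memory` on iOS) rather than take it as a parameter.

```typescript
// Hypothetical pre-flight check: decide whether a model can be loaded
// safely BEFORE Metal begins allocating GPU buffers, so the app can
// degrade gracefully instead of being killed by abort().

type LoadDecision = { ok: true } | { ok: false; reason: string };

// Apple's guidance: keep the app's footprint under ~50% of device RAM.
const MAX_FOOTPRINT_RATIO = 0.5;
// Illustrative reserve for the non-model parts of the app.
const APP_HEADROOM_BYTES = 800 * 1024 * 1024;

function canLoadModel(
  deviceRamBytes: number,
  modelBytes: number
): LoadDecision {
  const budget = deviceRamBytes * MAX_FOOTPRINT_RATIO;
  if (modelBytes + APP_HEADROOM_BYTES > budget) {
    return {
      ok: false,
      reason: `model needs ${(modelBytes / 2 ** 30).toFixed(1)}GiB, budget is ${(budget / 2 ** 30).toFixed(1)}GiB`,
    };
  }
  return { ok: true };
}

// A ~3.2GB model on a 4GB device: refuse up front and fall back to a
// smaller model or a degraded feature, rather than crash mid-load.
console.log(canLoadModel(4 * 2 ** 30, 3.2 * 2 ** 30).ok); // false
```

The point a real vendor will make immediately: the decision has to happen before any Metal allocation starts, because once the allocator overruns there is no application-level recovery path.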
Question 3: How do you handle chipset-specific model variants on Android?
Android devices run on chipsets from multiple manufacturers. Qualcomm's NPU (Neural Processing Unit), accessed via QNN, requires a separately compiled model variant for each Snapdragon generation — the 8 Gen 1, 8 Gen 2, and 8 Gen 3 each require a different binary. An app shipping on-device AI to Android users needs a compatibility matrix and fallback logic when the device's chipset requires a variant you have not compiled. A vendor who has shipped on Android production apps knows this. A vendor who has not will describe generic "device compatibility" without specifics.
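A sketch of what that routing logic can look like. The chipset identifiers and artifact filenames below are hypothetical; a real app would detect the SoC through a native module and bundle or download the matching compiled QNN binary.

```typescript
// Illustrative chipset-to-artifact routing for Android on-device AI.

type InferenceBackend =
  | { backend: "qnn"; artifact: string }
  | { backend: "cpu"; artifact: string };

// Each Snapdragon generation needs its own compiled QNN model variant
// (hypothetical filenames).
const QNN_VARIANTS: Record<string, string> = {
  "snapdragon-8-gen-1": "model.sd8gen1.qnn.bin",
  "snapdragon-8-gen-2": "model.sd8gen2.qnn.bin",
  "snapdragon-8-gen-3": "model.sd8gen3.qnn.bin",
};

function selectBackend(chipset: string): InferenceBackend {
  const artifact = QNN_VARIANTS[chipset];
  if (artifact) return { backend: "qnn", artifact };
  // Exynos, MediaTek, Tensor, or a Snapdragon generation without a
  // compiled variant: fall back to CPU inference. The UX must adapt,
  // because tokens-per-second drops substantially on CPU.
  return { backend: "cpu", artifact: "model.generic.gguf" };
}

console.log(selectBackend("snapdragon-8-gen-2").backend); // "qnn"
console.log(selectBackend("exynos-2400").backend);        // "cpu"
```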
Question 4: How do you manage background generation state independently of the UI component lifecycle?
In React Native and Flutter, if the component that triggered an AI generation request is unmounted (the user navigates away), the generation request does not automatically stop. If the architecture ties generation state to the component lifecycle, navigating away mid-generation creates memory leaks and orphaned generation threads. The correct architecture runs generation in a background service independent of any component. A vendor who has shipped this correctly can describe the architecture immediately. A vendor who has not will describe component-level state management and miss the issue entirely.
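A minimal sketch of the decoupled architecture: generation jobs live in a module-level service, so an unmounting screen only removes its listener, never the job. `runInference` is a stand-in for the real on-device inference call, and all names are illustrative.

```typescript
// Generation state held in a service, not in component state.

type Listener = (jobId: number, result: string) => void;

class GenerationService {
  private nextId = 1;
  private results = new Map<number, string>();
  private listeners = new Set<Listener>();

  // Screens subscribe on mount and unsubscribe on unmount. The job
  // itself never belongs to a component, so unmounting cannot orphan it.
  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }

  enqueue(
    prompt: string,
    runInference: (p: string) => Promise<string>
  ): number {
    const id = this.nextId++;
    runInference(prompt).then((result) => {
      this.results.set(id, result); // survives navigation
      this.listeners.forEach((fn) => fn(id, result));
    });
    return id;
  }

  // Lets a screen that re-mounts later pick up a finished result.
  resultFor(id: number): string | undefined {
    return this.results.get(id);
  }
}

const generationService = new GenerationService();
```

A screen calls `generationService.enqueue(...)`, keeps only the job id, and re-reads `resultFor(id)` when it comes back into focus; navigating away mid-generation neither cancels nor orphans the work.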
Question 5: What is your device compatibility matrix and minimum device specification?
On-device AI has hard floor requirements. A 3B parameter model requires approximately 4GB of RAM and an NPU or GPU with sufficient TFLOPS for real-time inference. A 7B parameter model requires 6GB RAM minimum. Voice transcription at production quality requires specific audio hardware capabilities. A vendor with a real device matrix will give you a table with device names, RAM requirements, and chipset minimums. A vendor without will say "most modern devices" or "flagship phones."
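To make the contrast concrete, this is the shape of answer a real matrix enables: a lookup with named devices and hard floors. The entries below are illustrative examples built from the RAM floors above, not any vendor's actual support matrix.

```typescript
// Illustrative device compatibility check against hard RAM floors.

interface DeviceSpec {
  device: string;
  ramGb: number;
  chipset: string;
}

// Floors from the model-size requirements discussed above.
const MIN_RAM_GB = {
  "3b": 4, // ~4GB minimum for a 3B parameter model
  "7b": 6, // ~6GB minimum for a 7B parameter model
} as const;

function supportsModel(
  spec: DeviceSpec,
  modelSize: keyof typeof MIN_RAM_GB
): boolean {
  return spec.ramGb >= MIN_RAM_GB[modelSize];
}

const iphone15: DeviceSpec = { device: "iPhone 15", ramGb: 6, chipset: "A16" };
console.log(supportsModel(iphone15, "3b")); // true
console.log(supportsModel(iphone15, "7b")); // true (meets the 6GB floor)
```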
30 minutes with a Wednesday engineer gives you a complete on-device AI feasibility assessment for your specific app and user base.
Get my recommendation →
What a real answer looks like
A vendor with genuine on-device AI experience answers question one with a link, not a story. The app exists and is downloadable.
For questions two through five, a real answer is immediate, specific, and technical. You do not need to understand the technical details yourself. What you are listening for is whether the answer is pre-loaded and specific or whether the vendor is constructing it in real time. Vendors who have shipped on-device AI have war stories. They remember the night the Metal abort() call took down the app with no error log. They remember finding the chipset-specific QNN variant issue in production after shipping. They answer with the texture of experience.
A non-answer looks like this: "We have extensive experience with AI integration and have worked with the latest frameworks." This says nothing about on-device AI specifically. It is the answer of a team that integrates cloud AI APIs.
Another non-answer: "We can research the best approach for your specific requirements." Research is not the same as experience. You are not paying a vendor to learn on your timeline.
The technical hard problems only experience reveals
Three specific problems in on-device AI development are not documented in any framework guide, SDK documentation, or developer forum. They are discovered by shipping.
Metal buffer allocation on 4GB devices. iOS devices with 4GB RAM represent a large portion of the active iPhone install base. Apple recommends keeping app memory footprint below 50% of available RAM. A 3B parameter model quantized to 8-bit precision occupies approximately 3.2GB (at float16 it would need roughly 6GB and would not fit at all). On a 4GB device, loading this model leaves less than 800MB for the rest of the app. The Metal framework pre-allocates GPU buffers before the model fully loads, which means the actual RAM ceiling is reached earlier than naive calculations suggest. Managing this requires pre-flight RAM checks, model quantization to 4-bit or 8-bit precision, and intelligent model offloading when the app moves to background.
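The RAM arithmetic above can be sketched as a back-of-envelope estimator: parameters times bytes per weight, plus a fixed runtime overhead. The 0.2GB overhead figure is an illustrative assumption; real overhead (KV cache, pre-allocated Metal buffers) varies by runtime and context length.

```typescript
// Back-of-envelope model footprint: parameters x bytes per weight,
// plus an assumed fixed runtime overhead (illustrative, not measured).

const BYTES_PER_WEIGHT = {
  fp16: 2,   // full half-precision
  int8: 1,   // 8-bit quantization
  int4: 0.5, // 4-bit quantization
} as const;
type Precision = keyof typeof BYTES_PER_WEIGHT;

function modelFootprintGb(
  paramsBillions: number,
  precision: Precision,
  overheadGb = 0.2
): number {
  return paramsBillions * BYTES_PER_WEIGHT[precision] + overheadGb;
}

// A 3B model: ~6.2GB at fp16, ~3.2GB at int8, ~1.7GB at int4.
// This is why quantization is mandatory on 4GB devices.
console.log(modelFootprintGb(3, "int8").toFixed(1)); // "3.2"
```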
QNN chipset variant management. On Android, Qualcomm's NPU requires compiled model artifacts specific to each Snapdragon generation. This means maintaining separate model artifacts for 8 Gen 1, 8 Gen 2, and 8 Gen 3 devices, plus fallback CPU inference for devices without Qualcomm chipsets (Exynos, MediaTek, Tensor). The device detection and model routing logic is non-trivial, and the fallback to CPU inference for non-Qualcomm Android devices changes performance characteristics enough that the UX needs to adapt.
Background generation lifecycle management. Users do not wait for AI outputs. They navigate, switch apps, and return. An on-device AI feature that ties inference to a screen's lifecycle will either lose generation state on navigation or create memory leaks from orphaned inference threads. The correct architecture runs inference in a persistent background service with a queue, returning results via callback regardless of which screen the user is on when the result is ready.
Vendor scorecard: on-device AI
Use this scorecard to evaluate any vendor you are considering for on-device AI work. Score each item 0, 1, or 2.
| Evaluation criterion | Score (0-2) |
|---|---|
| Can show a production on-device AI app (App Store or GitHub link) | 0-2 |
| Can describe Metal RAM handling on 4GB devices specifically | 0-2 |
| Can describe chipset-specific model variant management on Android | 0-2 |
| Has a defined device compatibility matrix with specific device names | 0-2 |
| Can describe background generation architecture independent of UI lifecycle | 0-2 |
| Has shipped text, voice, and image AI on the same device simultaneously | 0-2 |
| Has published open-source on-device AI work (verifiable) | 0-2 |
A perfect score is 14. Vendors with genuine on-device AI experience typically score 10-14. Vendors with cloud AI experience claiming on-device AI typically score 2-6. If a vendor scores below 6 and you proceed, build their learning time into your project timeline.
The Off Grid benchmark
Wednesday built Off Grid: a complete on-device AI suite with text generation, image generation, voice transcription, and vision capabilities running simultaneously on iOS, Android, and macOS. No cloud. No telemetry. Every inference runs on the device.
Off Grid has 50,000+ users and 1,700+ GitHub stars. It is downloadable on the App Store today. The GitHub page is public and auditable.
Building Off Grid required solving every problem described in this guide: Metal abort() handling on 4GB iPhones, chipset-specific QNN variant routing for Android flagship devices, background generation state management independent of component lifecycles, and a device compatibility matrix covering three years of flagship hardware.
Off Grid is not a demo. It is Wednesday's proof that on-device AI at production quality is achievable — and our benchmark for what every on-device AI engagement starts from.
When a vendor claims on-device AI capability, ask them to show you the equivalent of Off Grid. If they cannot, factor their learning curve into your project plan.
How Wednesday approaches vendor evaluation
When enterprises come to Wednesday evaluating their current vendor's on-device AI capability, the first step is the same: ask for the App Store link. If the link exists, the rest of the evaluation is a technical conversation about architecture. If the link does not exist, the capability is unverified and the project needs contingency time.
Wednesday's pre-engagement process for on-device AI projects includes a device compatibility assessment against your user base's hardware profile, a RAM budget analysis for your target model size, a compliance review to confirm on-device architecture satisfies your privacy requirements, and a scope estimate that accounts for chipset-specific variants and device minimum testing.
The output is a one-page scope document with a specific timeline and device support matrix before any contract is signed.
Wednesday is one of fewer than 5% of mobile agencies that have shipped production on-device AI. The capability is verifiable.
Book my 30-min call →
More vendor evaluation frameworks, cost analyses, and decision guides are in the writing archive.
Read more decision guides →
About the author
Ali Hafizji
CEO, Wednesday Solutions
LinkedIn →
Ali founded Wednesday Solutions and has led on-device AI product development including Off Grid, one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.
Get your start date →