Why Most Mobile Vendors Cannot Deliver On-Device AI: What US Enterprise Buyers Need to Know in 2026
Fewer than 5% of mobile agencies have shipped on-device AI in production. Here is why, and what to ask before you commit.
Fewer than 5% of mobile development agencies have shipped a production on-device AI app with text inference. This is not a guess. It is the outcome of asking hundreds of vendors a simple question: show me the App Store link. The ones who can answer immediately are in the 5%. The ones who cannot are in the 95% who claim capability they do not have.
This matters to you because on-device AI is categorically harder than cloud AI integration. If your vendor has not shipped it, you are paying for their first attempt. This guide explains why the gap exists and what the hard problems look like up close.
Key findings
Fewer than 5% of mobile development agencies have shipped a production on-device AI app with text inference.
Wednesday's Off Grid is one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars — and the hard problems it solved are not documented anywhere.
The three hard problems (Metal abort() handling, chipset-specific QNN variants, background generation state) are discovered by shipping, not by reading documentation.
A vendor discovering these problems for the first time on your engagement adds 4-8 weeks of unplanned time to your project.
The gap between claiming and shipping
Cloud AI integration is a skill most mobile vendors have. The pattern is well-documented: add an API client, send user input to a cloud endpoint, receive a response, display it. Any competent mobile engineer can do this in a few days. The frameworks are mature, the documentation is thorough, and the error handling is straightforward.
On-device AI is a different discipline. The model runs on the device's hardware. The developer is responsible for memory management, hardware acceleration, model loading and offloading, device compatibility, and the failure modes that emerge when those constraints interact. The failure modes are not in any documentation because they are discovered in production.
When vendors say "we have AI capability," they almost always mean cloud AI integration. They have built features that call AI APIs. That is genuine work and it has value. It is not on-device AI.
The confusion is not always intentional. Some vendors do not understand the distinction themselves. They have shipped AI features, those features work, and when a client asks "can you do on-device AI?" the answer is "yes, we do AI" without recognising that the question was about a different class of problem.
What most vendors actually mean by "AI capable"
When a mobile vendor tells you they are "AI capable" or have "shipped AI features," ask one follow-up question: was the AI inference running on the device or on a remote server?
If the answer is "it called the OpenAI API" or "we used Claude" or "it connected to Google AI" — that is cloud AI. The model ran on someone else's server. The mobile app was a thin client that sent data and received a response.
Cloud AI is not a lesser capability. For many use cases, it is the right architecture. But it shares almost nothing technically with on-device AI development. The skills, the failure modes, the architecture decisions, and the device constraints are completely different.
A vendor with cloud AI experience building their first on-device AI feature will encounter the same problems every first-time on-device team encounters. Those problems will surface mid-project. They will take time to solve. Your timeline will absorb the cost.
The hard problems vendors do not talk about
Three specific problems in on-device AI development are not in any SDK documentation, framework guide, or developer forum. They are discovered by shipping to real users on real devices.
A vendor who has shipped on-device AI can describe these problems specifically and explain how they handled them. A vendor who has not will give a generic answer about device compatibility and memory management without naming the specific failure modes.
Metal abort(): the problem nobody documents
On iOS, the Metal framework handles GPU memory allocation. When an app tries to allocate more GPU memory than the device can provide — which happens when loading a large on-device model on a device with limited RAM — Metal does not return an error code. It calls abort().
abort() terminates the process immediately. It does not throw an exception. It does not return an error. It does not allow application-level code to catch the failure and show a user-facing message. The app simply terminates.
This means that if you load a model that is too large for the available RAM, the app crashes silently with no recoverable error state and no user-visible explanation. From the user's perspective, the app closed. From the developer's perspective, there is no error log entry that explains why.
The only way to handle this correctly is to pre-flight available RAM before loading the model. The app checks device RAM, checks current available RAM (which changes depending on how many other apps are open), checks the model's memory requirement, and makes a go/no-go decision before the model load begins. If the available RAM is insufficient, the app degrades gracefully — showing a lower-capability fallback or an informative message — rather than crashing.
This pre-flight logic is not documented anywhere. It is discovered when an app crashes in testing on a 4GB iPhone and the developer traces the crash through the Metal layer to understand why abort() was called.
Wednesday solved this in Off Grid's iOS implementation. The pre-flight RAM check is part of every on-device AI feature Wednesday now ships.
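The go/no-go decision itself is simple once the inputs are gathered. Below is a minimal sketch of that decision logic, written in Kotlin purely as a platform-neutral illustration; the function name, the safety margin, and the byte figures are illustrative assumptions, not Off Grid's actual values.

```kotlin
// Hypothetical sketch of the pre-flight go/no-go decision before a model load.
// Thresholds and names are illustrative, not taken from Off Grid.

enum class LoadDecision { LOAD_FULL, LOAD_FALLBACK, REFUSE }

fun preflightModelLoad(
    availableRamBytes: Long,
    modelRequirementBytes: Long,
    fallbackRequirementBytes: Long,
    safetyMargin: Double = 1.2  // headroom so the GPU layer is never pushed to abort()
): LoadDecision = when {
    // Enough free RAM (with margin) for the full model: load it.
    availableRamBytes >= (modelRequirementBytes * safetyMargin).toLong() ->
        LoadDecision.LOAD_FULL
    // Not enough for the full model, but enough for a smaller fallback model.
    availableRamBytes >= (fallbackRequirementBytes * safetyMargin).toLong() ->
        LoadDecision.LOAD_FALLBACK
    // Neither fits: refuse the load and show an informative message instead of crashing.
    else -> LoadDecision.REFUSE
}
```

On a 4GB device with roughly 1.5GB free, a 2GB model fails the first check but a 900MB fallback passes the second, so the app degrades gracefully instead of terminating. The hard part in practice is the gathering, not the arithmetic: available RAM must be re-read immediately before the load, because it changes as other apps come and go.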
QNN chipset variants on Android
Android devices run on chipsets from Qualcomm, Samsung (Exynos), Google (Tensor), and MediaTek. Each chipset manufacturer has a different NPU (Neural Processing Unit) architecture and different SDK for hardware-accelerated AI inference.
Qualcomm's SDK is QNN (Qualcomm Neural Networks). QNN provides access to the Snapdragon NPU, which is the fastest path for on-device AI inference on Android flagship devices. But QNN requires a compiled model artifact specific to each Snapdragon generation. The Snapdragon 8 Gen 1, 8 Gen 2, and 8 Gen 3 each require a different compiled binary. An app that ships one QNN artifact will only accelerate inference on one generation of Snapdragon chips.
The practical implication: an Android app shipping on-device AI to a wide user base needs a model routing system that detects the device's chipset, selects the appropriate compiled artifact, and falls back to CPU inference for devices without a matching Qualcomm variant. Exynos devices (some Samsung flagships), Tensor devices (Pixel), and MediaTek devices all need CPU inference unless specific artifacts are compiled for their respective NPU SDKs.
CPU inference is 4-8x slower than NPU inference for typical model sizes. An app that runs at full speed on Snapdragon and falls back to CPU inference on Exynos delivers a noticeably different user experience across the Android ecosystem.
Wednesday's Off Grid Android implementation includes chipset detection, model routing across QNN variants, and CPU fallback with UX adaptation to reflect the performance difference. Building this routing system the first time requires trial and error across physical devices — not simulators, not emulators, physical devices from each chipset generation.
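At its core, the routing described above reduces to a lookup from the detected SoC to a compiled artifact, with CPU inference as the default. Here is a hedged Kotlin sketch; the SoC model strings and artifact filenames are illustrative assumptions, and on a real device the SoC identifier would come from a platform API such as `Build.SOC_MODEL` (Android 12+).

```kotlin
// Hypothetical routing table: detected SoC model -> compiled QNN artifact.
// SoC identifiers and artifact names are illustrative, not Off Grid's actual files.

sealed class InferencePath {
    data class Npu(val artifact: String) : InferencePath()  // hardware-accelerated path
    object Cpu : InferencePath()                            // universal fallback
}

private val qnnArtifacts = mapOf(
    "SM8450" to "model-sd8gen1.qnn.bin",  // Snapdragon 8 Gen 1
    "SM8550" to "model-sd8gen2.qnn.bin",  // Snapdragon 8 Gen 2
    "SM8650" to "model-sd8gen3.qnn.bin"   // Snapdragon 8 Gen 3
)

// Returns the NPU path when a matching artifact exists, otherwise CPU fallback.
// Exynos, Tensor, and MediaTek SoCs fall through to Cpu unless artifacts are added.
fun selectInferencePath(socModel: String): InferencePath =
    qnnArtifacts[socModel]?.let { InferencePath.Npu(it) } ?: InferencePath.Cpu
```

The table is the easy half. The expensive half, as noted above, is validating each artifact on physical hardware and adapting the UX for the slower CPU path.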
Background generation state management
Users do not stay on a screen while AI generates a response. They navigate away, switch apps, receive notifications, and return. An on-device AI feature that does not handle this correctly produces one of two failure modes.
The first failure mode: when the user navigates away, the generation stops. The user returns to find the output incomplete. No error, no explanation — the feature simply stopped generating when the screen changed.
The second failure mode: the generation continues, but the result is never delivered because the component that would display it has been unmounted. The AI ran for several seconds (or minutes, for longer generations), consumed battery and RAM, and produced output that went nowhere.
Both failure modes have the same root cause: the generation was tied to the component lifecycle. When the component unmounted, the generation's connection to the UI was severed.
The correct architecture runs inference in a background service that is independent of any component lifecycle. The service maintains a queue. When a generation completes, the service delivers the result via a callback to whatever screen the user is currently on — even if it is a different screen than the one that triggered the generation. If the user is not in the app, the result is stored and delivered when the app resumes.
Building this background service requires understanding both the platform's background execution rules (iOS and Android have different limits on background processing) and the on-device AI framework's threading model. It is not a single engineer's afternoon — it is a week of architecture work the first time.
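The delivery half of that architecture can be sketched as a small buffer-and-callback object that outlives any screen. This is a simplified Kotlin sketch under stated assumptions: the names are hypothetical, and a production service would also own the inference queue, threading, and each platform's background-execution limits.

```kotlin
// Simplified sketch of generation-result delivery decoupled from screen lifecycle.
// Class and method names are illustrative, not Off Grid's actual API.

class GenerationService {
    private val pending = ArrayDeque<String>()        // results finished while no screen was attached
    private var listener: ((String) -> Unit)? = null  // the currently visible screen's callback

    // Called by the inference worker when a generation finishes,
    // regardless of which screen (if any) is showing.
    fun onGenerationComplete(result: String) {
        val current = listener
        if (current != null) current(result) else pending.addLast(result)
    }

    // Called when a screen appears; flushes anything generated while detached,
    // so results produced on other screens (or while backgrounded) still arrive.
    fun attach(newListener: (String) -> Unit) {
        listener = newListener
        while (pending.isNotEmpty()) newListener(pending.removeFirst())
    }

    // Called when a screen disappears. Generation keeps running; only delivery pauses.
    fun detach() {
        listener = null
    }
}
```

The key property is that `onGenerationComplete` never depends on a UI component existing: if nothing is attached, the result is buffered rather than lost, which avoids both failure modes described above.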
Wednesday built this correctly in Off Grid, where users trigger image and text generation and then use other features while the generation runs. The result is delivered to whatever screen they are on when the generation completes.
Off Grid as the verification standard
Off Grid is Wednesday's on-device AI application. It runs complete AI inference — text generation, image generation, voice transcription, and vision analysis — on the device with no cloud connection and no telemetry. It is available on iOS, Android, and macOS.
50,000+ users have downloaded Off Grid. The GitHub page has 1,700+ stars. The code is public and auditable.
Building Off Grid required discovering and solving all three problems described above: Metal abort() handling on 4GB iPhones, chipset-specific QNN variant routing on Android, and background generation state management across both platforms. The solutions are in the app, visible to anyone who audits it.
When Wednesday evaluates a vendor's on-device AI claim, the standard is simple: show the equivalent of Off Grid. A live app, a public record, and specific technical answers to the three questions above. Any vendor who meets this standard has real experience. Any vendor who cannot describe the Metal abort() problem has not shipped on-device AI on iPhone.
Why this matters for your engagement
If your board has mandated AI in the mobile app and you are evaluating vendors, the on-device AI question is the one filter that most quickly segments the field.
95% of vendors who claim AI capability have cloud AI experience. That capability is genuine but irrelevant if your compliance requirements, data residency rules, or connectivity constraints make cloud AI impractical.
The 5% who have shipped on-device AI in production have solved problems that cannot be solved by reading. They bring working solutions to your engagement. Your timeline absorbs their experience, not their learning curve.
Wednesday is in the 5%. Off Grid is the public record.
Wednesday's on-device AI capability is verifiable before you sign. The App Store link and GitHub page exist today.
More guides on evaluating mobile vendor capability are in the writing archive.
About the author
Praveen Kumar
Technical Lead, Wednesday Solutions
Praveen leads on-device AI engineering at Wednesday Solutions and was part of the team that built Off Grid, one of fewer than 10 open-source on-device AI mobile apps globally with 1,000+ GitHub stars.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.