AI in Your Mobile App Without Internet: What Is Possible, What Is Not, and What It Costs in 2026
A plain-language capability table covering what on-device AI can do offline, what still requires a server, and what each capability costs to add to an enterprise app.
Forty-three percent of enterprise mobile users work in locations with unreliable or no connectivity at least once a week. That number comes from field operations, hospital networks, construction sites, and office basements where Wi-Fi drops and cellular is blocked. If your AI features stop working the moment the connection drops, you have not built AI into your app. You have built a dependency on someone else's server.
On-device AI runs entirely on the phone or tablet. No request leaves the device. No connection is required. The AI processes text, voice, images, and documents using the device processor and a model stored in local memory. This is not a workaround. It is how the most privacy-sensitive and connectivity-constrained enterprise use cases get solved.
This guide covers what on-device AI can do today, what it cannot, device requirements by capability, and realistic cost ranges for adding each capability to an existing enterprise app.
Key findings
Voice transcription, text generation, document scanning, and image classification all work fully offline on modern iOS and Android devices.
Real-time language translation, complex multi-step reasoning, and web search require a server connection and cannot be replicated on-device at equivalent quality today.
Adding a single on-device AI capability to an existing app typically costs between $30,000 and $90,000 depending on the capability, its complexity, and integration depth.
Wednesday's Off Grid project shipped all four core capabilities (text, voice, image, vision) on iOS and Android from a single app with 50,000+ users and zero cloud AI dependency.
Why offline AI matters now
Your board's mandate to "add AI" usually means one of two things. Either they want the app to get smarter and more useful, or they want to reduce manual work for users. Both goals are achievable without a cloud AI dependency, and in many cases the on-device path is faster to approve and faster to ship.
There are three drivers pushing enterprise teams toward on-device AI specifically.
First, compliance. Any data sent to a third-party AI service is data leaving your control. For healthcare, financial services, legal, and government applications, that triggers review processes that can add months to a launch timeline. On-device processing eliminates the data transfer entirely.
Second, reliability. Field operations, clinical settings, and manufacturing floors have poor connectivity. An AI feature that drops out when the signal drops is worse than no AI feature at all. It trains users to distrust it.
Third, cost. Cloud AI charges per request. An app with millions of active users running AI features can generate six-figure monthly API bills. On-device AI has zero marginal cost per inference once the model is on the device.
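To make the cost argument concrete, here is a minimal break-even sketch. The per-request price, user count, and usage figures are illustrative assumptions, not quotes from any provider.

```python
def monthly_cloud_cost(active_users, requests_per_user, price_per_request):
    """Cloud AI bills scale linearly with usage."""
    return active_users * requests_per_user * price_per_request

def breakeven_months(ondevice_build_cost, monthly_cloud_bill):
    """Months until a one-time on-device build is recouped from avoided API fees."""
    return ondevice_build_cost / monthly_cloud_bill

# Illustrative numbers: 1M users, 20 AI requests each per month, $0.005/request.
bill = monthly_cloud_cost(1_000_000, 20, 0.005)  # a six-figure monthly bill
months = breakeven_months(90_000, bill)          # a $90k build pays back fast
print(bill, round(months, 2))
```

With these assumed numbers, a $90,000 on-device build pays for itself in under a month of avoided API fees; at lower usage the payback period stretches, which is why the arithmetic is worth running against your own traffic.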
What on-device AI can do today
The table below covers the capabilities your team is most likely to ask about. "On-device" means the capability works with no network connection. "Device minimum" is the oldest device that produces acceptable results.
| Capability | On-device | Notes | Device minimum |
|---|---|---|---|
| Voice transcription | Yes | Whisper models, 3-5% word error rate in English | iPhone 12 / Android 2021 flagship |
| Document scanning and OCR | Yes | Extracts structured text from photos of documents | iPhone 11 / Android 2020 flagship |
| Image classification | Yes | Labels images from a fixed category list | iPhone 11 / Android 2020 flagship |
| Text summarization | Yes | Summarizes documents up to ~10,000 words | iPhone 13 / Android 2022 flagship |
| Form auto-fill from photo | Yes | Reads a form image and populates fields | iPhone 12 / Android 2021 flagship |
| Short text generation | Yes | Replies, descriptions, notes under 500 words | iPhone 13 / Android 2022 flagship |
| Language detection | Yes | Identifies the language of a text passage | iPhone 11 / Android 2020 flagship |
| On-device image generation | Yes (slow) | 30-90 seconds per image on 2023 devices | iPhone 14 Pro / Android 2023 flagship |
| Named entity extraction | Yes | Extracts names, dates, amounts from text | iPhone 12 / Android 2021 flagship |
| Sentiment classification | Yes | Positive/neutral/negative at sentence level | iPhone 11 / Android 2020 flagship |
Every capability in the table above runs with the device in airplane mode. No API key. No monthly bill. No data leaving the device.
What still requires a server
On-device AI has real limits. The table below is equally important.
| Capability | Requires server | Why |
|---|---|---|
| Web search and real-time data | Yes | The model has no knowledge of events after its training cutoff and no access to the internet |
| Complex multi-step reasoning | Often | 3B-7B parameter models produce noticeably weaker results on tasks requiring long chains of logic |
| Real-time language translation (50+ languages) | Often | High-quality translation for rare language pairs needs larger models that don't fit comfortably on-device |
| Large document analysis (100+ pages) | Often | Context window limits on smaller models affect quality on very long documents |
| High-resolution image generation | Yes | Generating high-quality images at 1024px and above takes minutes on-device vs seconds in the cloud |
| Custom model training or fine-tuning | Yes | Training always happens server-side; only inference runs on-device |
The practical rule: on-device AI is excellent for well-defined, focused tasks. When the task requires open-ended reasoning over large amounts of information, a server connection gives better results.
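The rule above often lands as a hybrid router in practice: run focused tasks on-device unconditionally, and send only the tasks that need a server over the network, degrading gracefully when offline. The capability sets below are a hypothetical sketch, not a fixed API.

```python
# Hypothetical capability sets for illustration only.
ON_DEVICE = {"transcription", "ocr", "summarization", "classification"}
SERVER_ONLY = {"web_search", "long_document_analysis", "image_generation_hd"}

def route(task, online):
    """Decide where an AI task runs. On-device tasks always succeed;
    server-only tasks degrade gracefully when the device is offline."""
    if task in ON_DEVICE:
        return "on_device"
    if task in SERVER_ONLY:
        return "server" if online else "unavailable_offline"
    # Unknown tasks: prefer the server when available, on-device otherwise.
    return "server" if online else "on_device"

print(route("transcription", online=False))  # works in airplane mode
print(route("web_search", online=False))     # degrades instead of failing silently
```

The important design choice is the explicit "unavailable offline" state: telling the user a capability needs connectivity is better than a spinner that never resolves.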
Not sure which capabilities fit your use case? A 30-minute conversation with a Wednesday engineer will give you a clear list.
Device requirements by capability
Not every user will have a 2023 flagship. Your device minimum decision affects what percentage of your user base gets the full AI experience.
Voice transcription with Whisper small runs on iPhone 12 and equivalent Android. That covers roughly 85% of enterprise iOS users and 70% of enterprise Android users based on typical enterprise device refresh cycles.
Text generation with a 3B parameter model requires an iPhone 13 or equivalent. That is about 75% of enterprise iOS users today. The gap closes every year as devices age out of enterprise fleets.
Image generation is the most demanding. Producing a single image takes 30-90 seconds on an iPhone 14 Pro. On most Android devices it takes longer. This capability is worth adding only when your use case truly requires it.
For apps where your user base skews toward newer devices (financial services, healthcare with clinical staff devices, managed enterprise deployments), the device requirements are rarely a barrier. For consumer-facing apps, they matter more.
Cost to add each capability
These ranges assume an existing native iOS and Android app with a modern architecture. Greenfield apps or apps with significant technical debt cost more. Ranges cover design, engineering, testing, and release.
| Capability | Engineering cost range | Timeline |
|---|---|---|
| Voice transcription (Whisper) | $35,000 - $55,000 | 4-6 weeks |
| Document scanning and OCR | $30,000 - $50,000 | 4-6 weeks |
| Image classification (custom categories) | $40,000 - $70,000 | 5-8 weeks |
| Text summarization | $45,000 - $75,000 | 6-8 weeks |
| Short text generation | $55,000 - $90,000 | 7-10 weeks |
| Full on-device AI suite (text + voice + vision) | $150,000 - $250,000 | 14-20 weeks |
The full suite cost is not simply the sum of individual capabilities. Shared infrastructure (model loading, memory management, on-device storage) is built once and used across all features, which reduces the marginal cost of each additional capability.
The Off Grid reference point
Wednesday built Off Grid as an open-source proof of concept that these capabilities work at scale. Off Grid runs on iOS, Android, and macOS from a single app. It includes:
- Text generation via llama.cpp
- Image generation via MNN/QNN/Core ML
- Voice transcription via Whisper
- Vision (image understanding and description)
Zero cloud dependency. Zero ongoing API cost. The project has 50,000+ users and 1,700+ GitHub stars. It is not a demo. It is a working application that Wednesday's engineering team built and maintains.
When a client asks whether on-device AI is real or a marketing claim, Off Grid is the answer. The source code is public and the app is in the App Store.
How to pick what to build first
Start with the capability that solves a problem your users have today, not the most technically impressive feature on the list.
The fastest path to a shipped on-device AI feature is voice transcription. The infrastructure is well-understood, the Whisper model is proven, and the use case is clear in almost every enterprise context: a field technician filing a report by voice, a clinician logging a patient note, a sales rep capturing a meeting summary. One capability, clear value, four to six weeks to ship.
The second-fastest is document scanning with OCR. Most enterprise apps involve some form of paperwork. Scanning a document and extracting the data into a structured form removes a manual step that users dislike. The implementation is straightforward and the device requirements are low.
Text generation and summarization are worth adding once voice and document scanning are live. They require more careful design because the output is generative and needs guardrails. Budget an extra two weeks for prompt design and output validation.
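A guardrail in this context can be as simple as validating generated text before it reaches the user. The checks and banned terms below are an illustrative sketch; real validation rules depend on your domain and compliance requirements.

```python
def validate_generated_text(text, max_words=500, banned_terms=("guarantee", "diagnosis")):
    """Reject generative output that breaks simple, auditable rules.

    Returns (ok, reason). The rules here are placeholders: a word-count
    cap matching the feature spec and a banned-term list for claims the
    app must never make on its own.
    """
    words = text.split()
    if not words:
        return False, "empty output"
    if len(words) > max_words:
        return False, f"exceeds {max_words} words"
    lowered = text.lower()
    for term in banned_terms:
        if term in lowered:
            return False, f"contains banned term: {term}"
    return True, "ok"

ok, reason = validate_generated_text(
    "Customer reported a cracked valve; replacement scheduled."
)
print(ok, reason)  # True ok
```

Validation like this is cheap to run on every generation, and rejected outputs can fall back to a retry or a manual-entry path instead of showing the user something the model should not have said.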
The decision framework is simple: pick the capability that removes the most manual work for your users, confirm it works on the devices your users carry, and get it shipped before attempting the next one.
Wednesday engineers have shipped all four on-device AI capabilities in production apps. Book a call to scope your first feature.
About the author
Anurag Rathod
Technical Lead, Wednesday Solutions
Anurag builds on-device AI features at Wednesday Solutions and contributed to Off Grid, Wednesday's open-source on-device AI suite with 1,700+ GitHub stars.