AI in Your Mobile App Without Internet: What Is Possible, What Is Not, and What It Costs in 2026
A plain-language guide to what on-device AI can do today, what it cannot, and what each capability costs to add.
A 7B parameter AI model fits on any flagship device with 6GB of RAM shipped in the last three years. That covers the iPhone 15 Pro, the Samsung S24+, and the Pixel 8 Pro — all of which are in your users' pockets right now. What that model can do without touching the internet is a question most enterprise buyers have never gotten a straight answer to.
This guide answers it in plain language: what is working on-device today, what is not, what hardware it requires, and what it costs to add each capability to your app.
Key findings
A 7B parameter model fits on any flagship device with 6GB+ RAM — iPhone 15 Pro, Samsung S24+, Pixel 8 Pro. A 3B model fits on 4GB devices, covering most 2022+ flagships.
Voice transcription (Whisper) works on any device from 2020 onward. Image generation requires NPU acceleration: Apple A15+ or Snapdragon 8 Gen 1+.
What does not work on-device: real-time knowledge, models above 7B parameters, reliable support for rare languages.
Wednesday's Off Grid ships text AI, voice transcription, image generation, and vision analysis simultaneously on iOS and Android — the reference for what is achievable today.
What on-device AI actually means
On-device AI means the model runs on the phone or tablet. No internet connection is needed during inference. The user types a query, the device processes it using its own chip, and the response appears — all without any data leaving the device.
The model is installed on the device, either bundled with the app or downloaded to local storage on first launch. Once installed, it works offline indefinitely. Updates to the model come through app updates or background downloads, but day-to-day use requires no connectivity.
This is different from a cloud AI feature that "feels" fast because it has a low-latency API. Even a 100ms cloud AI response requires an internet connection and transmits user data to a remote server. On-device AI needs neither.
The practical implication for enterprise apps: on-device AI works in hospitals where phones are in airplane mode, on job sites with no cellular coverage, in areas with unreliable connectivity, and in any situation where sending data to a server is prohibited by policy.
What works on-device today
Six categories of AI capability run reliably on-device on current enterprise hardware.
Text generation and summarisation (up to 7B parameters). A 7B parameter language model handles text summarisation, question answering, document analysis, writing assistance, and conversational interfaces. The output quality matches cloud AI for most enterprise use cases. Wednesday's Off Grid ships 3B and 7B models; the 7B model produces output that users frequently cannot distinguish from a cloud API.
Voice transcription. OpenAI's Whisper model transcribes speech to text entirely on-device with accuracy equivalent to cloud transcription services. It runs on any device from 2020 onward. Battery impact during active transcription is under 3% per hour. Supported languages: English, Spanish, French, German, and 35 other commonly spoken languages.
Image classification. Identifying what is in a photo — object category, scene type, product identification — runs on-device with classification accuracy above 90% on standard categories. Inference time is under 200ms on NPU-equipped devices. Useful for field inspection apps, inventory apps, and any workflow where users photograph physical objects.
Object detection. Locating and labelling multiple objects within a single image, including their position. Runs on-device at 15-30 frames per second on NPU-equipped devices. Useful for assembly line inspection, safety compliance documentation, and augmented reality overlays.
Document Q&A. Asking questions about a PDF or document and receiving a specific answer. A 3B parameter model with retrieval-augmented generation handles documents up to approximately 50 pages without cloud processing. Useful for policy lookup, contract review, and field procedure reference.
Image generation. Generating images from text descriptions runs on-device on NPU-equipped devices (Apple A15+, Snapdragon 8 Gen 1+). Inference time for a 512x512 image is 15-45 seconds on current hardware. Off Grid ships image generation on-device for both iOS and Android.
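The document Q&A pattern described above combines two steps: retrieve the relevant chunks of a document, then assemble a prompt for the on-device model. A minimal sketch of that flow, using a toy keyword-overlap retriever in place of a real on-device embedding model (all function names are illustrative, not a specific library's API):

```python
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase word counts — a stand-in for real embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(chunks: list[str], question: str, top_k: int = 2) -> list[str]:
    """Rank document chunks by keyword overlap with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: sum((tokenize(c) & q).values()), reverse=True)
    return ranked[:top_k]

def build_prompt(chunks: list[str], question: str) -> str:
    """Assemble the prompt that would be passed to the on-device 3B model."""
    context = "\n---\n".join(retrieve(chunks, question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In production the keyword scorer would be replaced by an on-device embedding model and a vector index over the ~50-page document, but the retrieve-then-prompt shape is the same.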
What does not work on-device today
Three categories of capability run into hard limits on current on-device models and hardware, making on-device AI the wrong architecture for them.
Real-time knowledge. An on-device model knows nothing beyond its training data cutoff. It cannot look up current prices, recent news, live inventory, or any information that changes after the model was trained. Cloud AI features that use retrieval-augmented generation against live data sources cannot be replicated on-device without an internet connection. For apps that require current information, cloud AI is the correct architecture.
Models above 7B parameters. Current flagship devices support models up to approximately 7B parameters at practical inference speeds. Larger models (13B, 70B, and above) produce higher-quality outputs for complex reasoning tasks but do not fit in device RAM or run at usable inference speeds on current hardware. If your use case requires complex multi-step reasoning, nuanced writing, or advanced code generation, a 7B model may not meet the quality bar. Cloud AI may be necessary.
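The RAM ceiling follows from simple arithmetic: a model's weight footprint is its parameter count times bytes per weight, plus runtime overhead. A rough sketch — assuming 4-bit quantization and a fixed 1GB overhead for the KV cache and runtime, both illustrative figures rather than numbers stated in this guide:

```python
def model_ram_gb(params_b: float, bits_per_weight: int, overhead_gb: float = 1.0) -> float:
    """Approximate RAM needed: quantized weights plus fixed runtime overhead."""
    weights_gb = params_b * bits_per_weight / 8  # params (billions) x bytes per weight
    return round(weights_gb + overhead_gb, 1)

# Under these assumptions: a 7B model at 4-bit needs ~4.5GB (fits a 6GB device),
# while a 13B model needs ~7.5GB (does not).
```

This is why the practical cutoff sits near 7B: the next common size up overshoots the RAM available on even current flagships, before considering inference speed.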
Reliable rare language support. Current small on-device models are trained primarily on English and major European languages. Support for Arabic, Hindi, Japanese, Korean, and other languages has improved significantly but is not yet equivalent to cloud models in output quality. For enterprise apps serving multilingual user bases, language-specific evaluation is required before committing to on-device AI for text generation.
A 30-minute call with a Wednesday engineer maps which on-device capabilities are feasible for your specific app and user base.
Device requirements by capability
Not all on-device AI works on all devices. The table below shows the minimum device specifications for each capability category, based on Wednesday's production testing data across Off Grid deployments.
| Capability | Minimum device | RAM requirement | NPU required? |
|---|---|---|---|
| Text generation (3B model) | iPhone 12 / Samsung S21 | 4GB | Recommended, not required |
| Text generation (7B model) | iPhone 15 Pro / Samsung S24+ | 6GB | Required for acceptable speed |
| Voice transcription (Whisper) | iPhone X / any 2020+ Android | 2GB | No — CPU sufficient |
| Image classification | iPhone X / any 2019+ Android | 2GB | Recommended |
| Object detection | iPhone X / any 2019+ Android | 2GB | Recommended |
| Document Q&A (3B model) | iPhone 12 / Samsung S21 | 4GB | Recommended |
| Image generation | iPhone 13 Pro / Samsung S22+ | 6GB | Required |
The practical implication for enterprise deployment: before committing to an on-device AI feature, profile your actual user base's device mix. If 40% of your users are on three-year-old devices with 4GB RAM, a 7B model will not serve them. A 3B model or voice transcription will.
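The table's thresholds can be encoded as a simple capability check for profiling a device fleet. A minimal sketch under the requirements above — capability names are illustrative, and the check treats the table's "recommended" NPU entries as optional:

```python
def supported_capabilities(ram_gb: int, has_npu: bool) -> set[str]:
    """Map a device's specs to the capability categories it can run."""
    caps: set[str] = set()
    if ram_gb >= 2:
        caps |= {"voice_transcription", "image_classification", "object_detection"}
    if ram_gb >= 4:
        caps |= {"text_generation_3b", "document_qa"}
    if ram_gb >= 6 and has_npu:
        # NPU is required for image generation, and for acceptable 7B speed
        caps |= {"text_generation_7b", "image_generation"}
    return caps
```

Running this over your analytics export of device models gives the percentage of users each planned feature can actually reach, before any scoping decision is made.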
Cost to add each capability
The following engagement-time ranges, which drive cost, are based on Wednesday's delivery data across enterprise mobile engagements. They assume an existing production app on iOS and Android and include model integration, device compatibility testing, RAM management, background state architecture, and App Store submission preparation.
| Capability | Estimated engagement time | Complexity notes |
|---|---|---|
| Voice transcription (both platforms) | 4-6 weeks | Low complexity; Whisper is well-documented |
| Image classification (both platforms) | 3-5 weeks | Low complexity; mature model ecosystem |
| Object detection (both platforms) | 5-7 weeks | Moderate; requires bounding box UI |
| Text generation — 3B model (both platforms) | 8-12 weeks | High; RAM management, chipset variants |
| Text generation — 7B model (both platforms) | 10-14 weeks | High; device compatibility matrix is narrower |
| Document Q&A (both platforms) | 8-12 weeks | High; retrieval system plus model |
| Image generation (both platforms) | 12-16 weeks | Very high; NPU requirement and long inference time |
| Full multi-capability suite | 16-24 weeks | Very high; Off Grid is the reference |
These ranges assume a vendor who has shipped on-device AI in production. A vendor shipping on-device AI for the first time should add 4-8 weeks to each range for problem discovery time.
The Off Grid reference point
Wednesday's Off Grid app ships all six capability categories simultaneously on iOS and Android. Text generation (3B and 7B), voice transcription, image classification, object detection, document Q&A, and image generation — all running on-device, offline, with no telemetry.
50,000+ users have downloaded Off Grid. The GitHub page is public and has 1,700+ stars. The implementation handles Metal abort() on 4GB iPhones, chipset-specific QNN variants on Android, and background generation state across all capabilities.
Off Grid is not a demonstration. It is a production application that Wednesday built to validate what on-device AI requires at production quality. Every enterprise on-device AI engagement Wednesday takes on starts from the architecture Off Grid validated.
How to decide what to build
The decision framework for on-device AI is three questions.
Does your use case require features that only work with internet connectivity? Real-time data, rare languages, or complex reasoning above a 7B parameter quality bar all require cloud AI. If your answer is yes for the primary use case, on-device AI may be a supplement but not the core architecture.
Does your compliance or privacy context make cloud AI complicated? If your users' inputs contain sensitive data — patient information, financial records, privileged communications — cloud AI creates compliance overhead that on-device AI avoids entirely. If the answer is yes, on-device AI is worth the implementation complexity.
What is your users' device mix? If your user base has significant numbers on 4GB RAM devices and the use case requires a 7B parameter model, the feature will not be available to those users. Profiling your actual user base before scoping determines whether on-device AI can serve enough of your users to justify the build.
Wednesday's pre-scope process covers all three questions for your specific app, compliance context, and user base before a line of code is written.
Wednesday has shipped every on-device AI capability in this guide in production. The assessment for your app takes 30 minutes.
More on-device AI guides, device requirement analyses, and cost frameworks are in the writing archive.
About the author
Anurag Rathod
Technical Lead, Wednesday Solutions
Anurag builds on-device AI features at Wednesday Solutions and contributed to Off Grid, Wednesday's open-source on-device AI mobile application.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.