On-Device AI for Android Enterprise Apps: What US Companies Can Ship Without Cloud Dependency 2026
Snapdragon NPUs, Samsung Galaxy AI hardware, and ARM64 inference now make real on-device AI possible for enterprise Android. Here is what you can ship today.
Snapdragon 8 Gen 1 launched in 2022 with a dedicated NPU running 23 TOPS. Snapdragon 8 Gen 3 in 2024 runs 98 TOPS. Every Android flagship phone sold since 2022 has hardware built for on-device AI. Most enterprise apps are not using it.
Key findings
Every Snapdragon 8 Gen 1 or newer Android device has a dedicated NPU. Enterprise device fleets purchased after 2022 have hardware that can run on-device AI today.
QNN enables NPU-accelerated inference on Snapdragon devices. MNN runs on any ARM64 Android device. Production Android on-device AI requires both backends: QNN for flagship devices, MNN as the universal fallback.
Each Snapdragon generation requires a different compiled model file. The Snapdragon 8 Gen 1, Gen 2, and Gen 3 NPUs have different instruction sets. Shipping on-device AI to the full Android fleet requires chipset detection and per-generation model variants.
Wednesday shipped all of this in Off Grid: on-device text AI, image generation via MNN/QNN, and voice transcription via Whisper on Android, with 50,000 users and zero server calls.
The Android on-device AI landscape
On-device AI on Android is not a single solution. It is a family of inference backends, each with different device coverage, performance characteristics, and integration complexity.
The landscape divides into three tiers.
Tier one: NPU-accelerated inference on Qualcomm Snapdragon. Qualcomm's AI Engine Direct SDK (QNN) enables inference on the dedicated NPU in Snapdragon 8 Gen 1 and newer chipsets. These are the fastest Android AI implementations — the NPU is purpose-built for matrix operations, runs more efficiently than the GPU, and does not interfere with the CPU or screen rendering.
Tier two: GPU-accelerated inference using frameworks like MNN or MediaPipe. These work on any ARM64 Android device with a GPU, which covers essentially every Android phone made after 2018. Performance is slower than NPU on Snapdragon flagship devices, but faster than CPU-only inference and consistent across the broad Android device fleet.
Tier three: CPU-only inference. Works on any Android device. Slowest. Viable for small models like image classification or object detection. Not viable for language models or image generation at acceptable quality.
Enterprise Android AI deployment requires understanding all three tiers, because enterprise device fleets include all three. A deployment to 500 field workers with a mix of Samsung Galaxy A-series (tier two) and Galaxy S-series (potentially tier one, depending on year and model) requires an app that serves both correctly.
What you can ship today
Four on-device AI capability classes are production-ready for enterprise Android in 2026.
Text AI: question answering, document summarization, on-device classification, translation without a network call. Quantized language models in the 1.5B-8B parameter range (Phi-3 Mini, Mistral 7B, Llama 3.1 8B) run on Snapdragon 8 Gen 1+ devices at 3-15 tokens per second depending on chipset generation. On NPU-enabled devices, responses feel conversational. On mid-range ARM64 devices via MNN, they are slower but usable for async tasks.
Image generation: stable diffusion variants (SDXL-Turbo, FLUX-schnell) run on Snapdragon 8 Gen 1+ via QNN and on any ARM64 Android device via MNN. Quality is production-grade. Generation time ranges from 3 seconds (Snapdragon 8 Gen 3, NPU) to 30+ seconds (mid-range ARM64, CPU). Enterprise use cases: report generation with images, product visualization, inspection documentation.
Voice transcription: Whisper runs on Android in quantized form. The small model (150MB) runs on CPU or GPU on any Android device. Transcription quality for clear speech in a quiet environment is comparable to cloud-based transcription services. Enterprise use cases: field service voice notes, mobile dictation, accessibility features.
Object detection and classification: MobileNet, YOLO variants, and MediaPipe solutions run fast on any Android device. Inference time under 50 milliseconds for image classification is standard. Enterprise use cases: industrial inspection (detecting defects, reading gauges), warehouse picking (identifying packages and bin locations), safety monitoring (PPE compliance).
Qualcomm QNN: the Snapdragon NPU path
QNN (Qualcomm AI Engine Direct) is Qualcomm's SDK for running AI models on Snapdragon NPUs. It is the right backend for Snapdragon 8 Gen 1 and newer devices when maximum throughput matters.
The integration path starts with model conversion. Most publicly available AI models are in ONNX, PyTorch, or TensorFlow format. QNN requires these to be compiled into QNN-compatible format using the QNN SDK tools. The compilation step is offline — it produces model artifacts that are bundled with the app or downloaded post-install.
The NPU acceleration is significant. A 3.8B parameter quantized language model running on the Snapdragon 8 Gen 3 NPU delivers 10-15 tokens per second. The same model on the GPU of the same device delivers 5-8 tokens per second. The same model on CPU delivers under 3 tokens per second.
For enterprise use cases where responsiveness matters — on-device question answering, real-time translation, voice-to-action transcription — the NPU path is meaningful.
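To make the throughput gap concrete, a small back-of-the-envelope sketch. The answer length and the per-backend rates below are illustrative values taken from the ranges quoted above, not measurements:

```kotlin
// Rough response-latency estimate for a fixed-length answer at the
// token throughputs quoted above (all figures illustrative).
fun responseSeconds(tokens: Int, tokensPerSecond: Double): Double =
    tokens / tokensPerSecond

fun main() {
    val answerTokens = 200 // assumed answer length
    val backends = mapOf(
        "NPU via QNN (~12 tok/s)" to 12.0,
        "GPU (~6 tok/s)" to 6.0,
        "CPU (~2.5 tok/s)" to 2.5,
    )
    for ((name, rate) in backends) {
        println("%s: %.0f s".format(name, responseSeconds(answerTokens, rate)))
    }
}
```

At these assumed rates, the same 200-token answer takes roughly 17 seconds on the NPU and 80 seconds on the CPU, which is the difference between a conversational feature and an async-only one.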
The integration is not simple. QNN requires native Android code, not a JavaScript or Dart wrapper. It requires the QNN runtime libraries to be bundled with the app. And it requires solving the per-chipset model variant problem described below.
MNN for any ARM64 Android device
MNN (Mobile Neural Network) is Alibaba's open-source inference framework. It runs on any ARM64 Android device using CPU or GPU acceleration, without requiring a specific chipset.
MNN's role in an Android on-device AI deployment is the universal fallback. For a device fleet that includes both Snapdragon 8 Gen 2 flagship phones (QNN-capable) and Samsung Galaxy A34 mid-range phones (no Qualcomm NPU), MNN delivers consistent on-device AI performance across both tiers — at different speeds.
MNN's ONNX compatibility means the same model that runs on QNN can run on MNN with minimal conversion. The same quantized LLM or image generation model serves both paths. The app detects the device chipset at runtime and routes inference to the correct backend: QNN for Snapdragon NPU devices, MNN GPU for other Android devices, MNN CPU as the final fallback.
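The routing described above reduces to a small selection function. A minimal sketch, assuming hypothetical capability flags — in a real app these would come from probing the QNN and MNN runtimes, not from these placeholder fields:

```kotlin
// Backend routing sketch: QNN for Snapdragon NPU devices, MNN GPU for
// other ARM64 devices, MNN CPU as the final fallback. DeviceCaps and
// its fields are hypothetical stand-ins for real runtime probes.
enum class InferenceBackend { QNN_NPU, MNN_GPU, MNN_CPU }

data class DeviceCaps(
    val hasSnapdragonNpu: Boolean, // Snapdragon 8 Gen 1+ with a usable QNN runtime
    val hasGpu: Boolean,           // GPU available to MNN
)

fun selectBackend(caps: DeviceCaps): InferenceBackend = when {
    caps.hasSnapdragonNpu -> InferenceBackend.QNN_NPU
    caps.hasGpu -> InferenceBackend.MNN_GPU
    else -> InferenceBackend.MNN_CPU
}
```

Under this sketch, a Galaxy S23 (8 Gen 2) resolves to the QNN path while a Galaxy A34 resolves to MNN on GPU — the same model artifacts, two speeds.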
Wednesday used MNN as the primary inference backend for Off Grid on Android. MNN handles all AI inference on Android devices that are not Snapdragon 8 Gen 1+. The production dataset is 50,000 users across a diverse Android device fleet — the MNN backend has been validated at scale.
The per-chipset model problem
This is the part that is not documented anywhere.
Qualcomm's QNN SDK compiles models against a specific NPU instruction set. The Snapdragon 8 Gen 1, 8 Gen 2, 8 Gen 3, and 8 Elite each have different NPU architectures with different instruction sets. A model compiled for one generation will, at best, run suboptimally on another, and may not run at all.
This means that deploying QNN-accelerated models to the full Snapdragon fleet requires:
- A separate compiled model artifact for each Snapdragon generation
- Runtime chipset detection to select the correct artifact
- An app that bundles or downloads multiple model variants
- Testing on physical devices from each generation
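In code, the requirements above show up as explicit per-generation bookkeeping. A sketch of a variant registry — the artifact naming scheme is hypothetical, not a QNN convention:

```kotlin
// Per-generation artifact bookkeeping. Each Snapdragon generation gets
// its own compiled QNN model file; other ARM64 devices fall back to a
// single ONNX model served via MNN. The file-naming scheme below is a
// hypothetical convention, not part of the QNN SDK.
enum class SnapdragonGen { GEN1, GEN2, GEN3, ELITE }

fun qnnArtifactName(gen: SnapdragonGen, model: String): String = when (gen) {
    SnapdragonGen.GEN1  -> "$model.gen1.qnn.bin"
    SnapdragonGen.GEN2  -> "$model.gen2.qnn.bin"
    SnapdragonGen.GEN3  -> "$model.gen3.qnn.bin"
    SnapdragonGen.ELITE -> "$model.elite.qnn.bin"
}
```

Whether the variants are bundled in the APK or fetched post-install, every download manager, cache key, and integrity check has to be keyed by generation, which is exactly the management overhead the bullet list implies.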
Wednesday discovered this during Off Grid development. The QNN SDK documentation describes the general compilation process. It does not clearly explain that each chipset generation produces incompatible artifacts, or that production deployment requires per-generation model management.
The solution is runtime chipset detection using Android's Build API to identify the SoC model, mapping that SoC model to a Snapdragon generation, and loading the corresponding QNN model artifact. The logic must handle devices where the chipset detection is ambiguous — some budget Android phones misreport their chipset information.
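A minimal sketch of that mapping. On API 31+ the SoC identifier comes from `android.os.Build.SOC_MODEL`; here it is passed in as a parameter so the mapping stays testable off-device. The Qualcomm part numbers below (SM8450 and similar) are believed correct but should be verified against your fleet, and the null branch is what routes ambiguous or misreporting devices to the MNN fallback:

```kotlin
// Chipset detection sketch. Pass in android.os.Build.SOC_MODEL (API 31+).
// Returns the Snapdragon generation tag for artifact selection, or null
// when the chipset is unknown or misreported — null should route the
// device to the MNN fallback backend.
fun snapdragonGeneration(socModel: String?): String? = when (socModel?.uppercase()) {
    "SM8450", "SM8475" -> "gen1"   // 8 Gen 1 / 8+ Gen 1
    "SM8550"           -> "gen2"   // 8 Gen 2
    "SM8650"           -> "gen3"   // 8 Gen 3
    "SM8750"           -> "elite"  // 8 Elite
    else               -> null     // unknown or misreported: use MNN
}
```

Treating "unknown" as a first-class result is the important design choice: a device that lies about its chipset degrades to a slower backend instead of crashing on an incompatible NPU artifact.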
This is hard-won engineering knowledge. Most mobile vendors will pitch QNN-based on-device AI without knowing this constraint exists. Wednesday has solved it in production.
| Snapdragon generation | NPU TOPS | Typical device (2026) | QNN model variant required |
|---|---|---|---|
| 8 Gen 1 | 23 TOPS | Samsung S22 series | Gen 1 artifact |
| 8 Gen 2 | 45 TOPS | Samsung S23 series | Gen 2 artifact |
| 8 Gen 3 | 98 TOPS | Samsung S24 series | Gen 3 artifact |
| 8 Elite | 45 TOPS (HTP v79) | Samsung S25 series | Elite artifact |
| Other ARM64 | N/A | Most mid-range Android | MNN backend |
Enterprise use cases that work today
Five Android on-device AI implementations that are production-ready for enterprise in 2026.
Field inspection with defect detection: a warehouse or manufacturing worker points their Android device at a component or surface. The app runs object detection on-device (MNN or MediaPipe, under 50ms per frame) and highlights detected defects in real time. No network connection required. Works in a Faraday cage, in a cold storage facility, underground.
Document processing without cloud OCR: a field service engineer photographs an equipment label or a form. The app runs on-device OCR (Tesseract or a MediaPipe text recognition model) and extracts text without sending the image to a cloud service. Relevant for environments where document images contain sensitive information.
Voice-to-action in the field: a field service worker speaks a work order update instead of typing it. Whisper on-device transcribes the voice note to text. The text is processed by an on-device instruction-following model that extracts structured data (work order number, time, action taken) from the natural language input. No cloud dependency. Works offline.
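A sketch of the post-transcription step. In the pipeline described above, an on-device instruction-following model extracts the structured fields; a deterministic pre-pass like the one below can pull out unambiguous identifiers first. The field names and the work-order pattern (`WO-` plus digits) are hypothetical:

```kotlin
// Post-transcription extraction sketch. A regex pre-pass pulls the
// work-order number out of the Whisper transcript before the rest of
// the note goes to the on-device language model for structured
// extraction. Field names and the WO-#### pattern are hypothetical.
data class WorkOrderUpdate(val orderNumber: String?, val note: String)

private val WO_PATTERN = Regex("""\bWO-\d+\b""", RegexOption.IGNORE_CASE)

fun parseVoiceNote(transcript: String): WorkOrderUpdate {
    val number = WO_PATTERN.find(transcript)?.value?.uppercase()
    return WorkOrderUpdate(orderNumber = number, note = transcript.trim())
}
```

Splitting the work this way keeps the language model's job small — it never has to be trusted with an exact identifier that a regex can capture deterministically.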
On-device translation for multilingual field teams: a field operations app serving workers who speak different languages runs a compact translation model on-device to translate work instructions from English into Spanish, Portuguese, or other languages. No cloud translation API, no per-translation cost, no data privacy issue.
Intelligent form pre-fill: a field service app uses an on-device language model to analyze previous work orders and suggest fields for a new work order based on location, equipment type, and recent history. The model runs locally on the device. The suggestion quality improves as the local history grows.
Off Grid: the proof
Wednesday built Off Grid as proof that production on-device AI on Android and iOS is achievable today.
Off Grid is a complete on-device AI application: text generation, image generation, voice transcription, and vision — all running without server calls. On Android, the image generation backend runs on three code paths: MNN for any ARM64 device, QNN for Snapdragon 8 Gen 1, and the per-generation QNN variants for newer Snapdragon chipsets.
50,000 users. 1,700+ GitHub stars. Zero paid acquisition spend. Zero server calls.
Wednesday's Android on-device AI engineers built Off Grid's Android backend in React Native, using native modules to integrate MNN and QNN. The architecture is production-ready and can be adapted for enterprise use cases that require on-device AI on Android.
If your enterprise Android app needs on-device AI — text, vision, voice, or image generation — Wednesday has already solved the hard problems: per-chipset QNN model management, MNN fallback integration, offline inference pipeline, and App Store and Play Store distribution of large model assets.
Talk to Wednesday's on-device AI team about your Android enterprise requirements.
About the author
Anurag Rathod
Technical Lead, Wednesday Solutions
Anurag leads AI-native mobile development at Wednesday Solutions and built the Android on-device AI backend for Off Grid, Wednesday's open-source on-device AI app.